I have two variants of the code.
The first:
struct pair_fiodat { public string fio; public string dat; }
List<pair_fiodat> list_fiodat = new List<pair_fiodat>();
// list filled 200.000 records, omitted.
foreach(string fname in XML_files)
{
// Get FullName and BirthDate from the file. Omitted.
var usersLookUp = list_fiodat.ToLookup(u => u.fio, u => u.dat); // create map
var dates = usersLookUp[FullName];
if (dates.Count() > 0)
{
foreach (var dt in dates)
{
if (dt == BirthDate) return true;
}
}
}
And the second:
struct pair_fiodat { public string fio; public string dat; }
List<pair_fiodat> list_fiodat = new List<pair_fiodat>();
// list filled 200.000 records, omitted.
foreach(string fname in XML_files)
{
// Get FullName and BirthDate from the file. Omitted.
var members = from s in list_fiodat where s.fio == FullName & s.dat == BirthDate select s;
if (members.Count() > 0) return true;
}
They do the same job: searching for a user by name and birthday.
The first one works very quickly.
The second is very slow (10x-50x slower).
Could you tell me whether it is possible to speed up the second one?
Perhaps the list needs some special preparation?
I tried sorting it with list_fiodat_sorted = list_fiodat.OrderBy(x => x.fio).ToList();, but...
I skipped your first test and changed Count() to Any() (Count() iterates over the whole list, while Any() stops as soon as it finds an element).
public bool Test1(List<pair_fiodat> list_fiodat)
{
foreach (string fname in XML_files)
{
var members = from s in list_fiodat
where s.fio == fname & s.dat == BirthDate
select s;
if (members.Any())
return true;
}
return false;
}
If you want to optimize something, you must give up some of the conveniences the language offers, because they are usually not free; they have a cost.
For example, for can be faster than foreach. It is a bit uglier, and you need an extra statement to get the variable, but it can be faster. If you iterate over a very large collection, the cost of each iteration adds up.
LINQ is very powerful and wonderful to work with, but it has a cost. If you replace it with another for loop, you save time.
public bool Test2(List<pair_fiodat> list_fiodat)
{
for (int i = 0; i < XML_files.Count; i++)
{
string fname = XML_files[i];
for (int j = 0; j < list_fiodat.Count; j++)
{
var s = list_fiodat[j];
if (s.fio == fname & s.dat == BirthDate)
{
return true;
}
}
}
return false;
}
With ordinary collections there is no noticeable difference, and you usually use foreach, LINQ and so on, but in extreme cases you must go low level.
In your first test, ToLookup is the key. It takes a long time. Think about it: you iterate over the whole list, creating and filling the map. That is costly in any case, but consider the case where the item you are looking for is at the start of the list: you would need only a few iterations to find it, yet you spend time on every item of the list while creating the map. Only in the worst case is the time similar, and even then the map version is slower because of the creation itself.
The map is useful if you need, for example, all the items that match some condition, that is, a list instead of a single item. You spend time creating the map once, but then you use it many times, and each time you save time (a map is "direct access", whereas the for loop is "sequential").
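A minimal sketch of that "build once, query many times" idea. It assumes the pair_fiodat struct from the question (with public fields) and C# 7 value tuples; BirthdayIndex is a hypothetical helper name, not something from the original code. The set is built once before the loop over the files, so each check becomes a hash lookup instead of a scan of the list:

```csharp
using System.Collections.Generic;

public struct pair_fiodat
{
    public string fio;
    public string dat;
}

// Hypothetical helper: builds the index once, then answers many queries.
public sealed class BirthdayIndex
{
    private readonly HashSet<(string fio, string dat)> _pairs =
        new HashSet<(string fio, string dat)>();

    public BirthdayIndex(IEnumerable<pair_fiodat> records)
    {
        // One pass over the records; duplicates are silently ignored.
        foreach (var r in records)
            _pairs.Add((r.fio, r.dat));
    }

    // Average O(1) per call, independent of how many records were indexed.
    public bool Contains(string fullName, string birthDate)
        => _pairs.Contains((fullName, birthDate));
}
```

You would create the BirthdayIndex from list_fiodat before the foreach over XML_files and call Contains(FullName, BirthDate) inside it.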
I would like to know how I can sort the names and surnames in my list alphabetically.
I'm not sure, but from what I found by searching, I'm guessing it only sorts by the name.
public void FilterParticipants(List<string> players, PlayerContainer allPlayers)
{
for (int i = 0; i < allPlayers.Count; i++)
{
if (!players.Contains(allPlayers.FindName(i) + " " + allPlayers.FindSurname(i)))
{
players.Add(allPlayers.FindName(i) + " " + allPlayers.FindSurname(i));
}
}
players.Sort();
}
If you want to sort your player names by Surname and then Name and cannot change your design to have a List<Player> passed in, then here's one solution.
Note there's a slight design change, as it's usually better to return a new list rather than modifying the input list. Also, the method name is a little misleading. "Filter" implies that you're reducing the set of items based on some criteria, but in this case we're adding items if they don't exist, so I renamed it to GetCombinedParticipants.
Given that, here's one way you could implement it. Note that this design uses LastIndexOf to find the last space in the name, which is used as a delimiter between the first name and the last name (which therefore assumes that there are no spaces in the last name). If there are, then I don't know how you could possibly identify them from a List<string>, which is another good reason to create a Player class with separate FirstName and Surname properties...
public List<string> GetCombinedParticipants(List<string> players,
PlayerContainer allPlayers)
{
// Make a copy of the input list
var results = players.ToList();
for (int i = 0; i < allPlayers.Count; i++)
{
var fullName = $"{allPlayers.FindName(i)} {allPlayers.FindSurname(i)}";
if (!results.Contains(fullName)) results.Add(fullName);
}
// Order by last name, then by first name
return results
.OrderBy(name => name.Substring(name.LastIndexOf(" ") + 1))
.ThenBy(name => name.Substring(0, name.LastIndexOf(" ")))
.ToList();
}
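For comparison, here is a rough sketch of what the sorting could look like with a Player class that has separate properties. The Player type and the PlayerSorting helper are hypothetical (the question only shows FindName and FindSurname), and the ordinal comparer is an assumption; you may prefer a culture-aware one for display:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type: the original code only has names as strings.
public sealed class Player
{
    public string FirstName { get; }
    public string Surname { get; }

    public Player(string firstName, string surname)
    {
        FirstName = firstName;
        Surname = surname;
    }

    public override string ToString() => $"{FirstName} {Surname}";
}

public static class PlayerSorting
{
    // Orders by surname, then by first name; returns a new list.
    public static List<Player> SortBySurname(IEnumerable<Player> players)
    {
        return players
            .OrderBy(p => p.Surname, StringComparer.Ordinal)
            .ThenBy(p => p.FirstName, StringComparer.Ordinal)
            .ToList();
    }
}
```

With separate properties, a surname that contains a space (for example "De Vries") no longer confuses the sort, because nothing has to be parsed back out of a combined string.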
I have an ArrayList with multiple items in it; each of them is a comma-separated String (a "loglog"), and its first three fields are the location (Destin, Lat and Long). I need to put the Strings of these loglogs into buttons, in the button's Tooltip or text, programmatically, depending on their location (based on those three fields). I have all the button creation done and only need to add the strings, but there are more loglogs than buttons, so...
I need to "filter" the ArrayList into another ArrayList based on these three initial coordinates: I want to create another ArrayList in which the strings whose first three elements are identical are appended together. That way I will combine the "loglogs" into "loglogsCondensed", where every "location" is unique, so I can use it for my button and index creation.
foreach (String log in logslogs)
{
String[] colContent = log.Split(','); // split the content on commas
Loglog entry = new Loglog(); // Loglog is a class of logs with information in specific columns; named "entry" because "log" is already the loop variable
entry.Destin = colContent[0];
entry.Lat = Convert.ToChar(colContent[1]);
entry.Long = colContent[2];
entry.Barcode = colContent[6];
entry.Source = colContent[7];
entry.SampleName = colContent[9];
AllLogs.Add(entry);
}
I need to go from logslogs with 1000 members to an ArrayList with fewer items, where the ones with the same location (based on the first three fields) are appended into one item.
I suppose this is fairly easy if you know how to code properly (not my case). A thousand thanks just for reading this, and even more to the people who try to help.
Best,
I have the solution! It is probably not going to win any contest for cleanliness, but it does what I need. I keep an index and compare consecutive items by their three coordinates: Destin, Lat and Long. If they are the same, I remove the last item and put the appended line in its place, and so on...
int c = 0; //Just to go the first time
//We create an index to compare the former with the "actual"
//log in every loop of the "foreach"
String IndiceDestin0 = string.Empty;
String IndiceLat0 = string.Empty;
String IndiceLong0 = string.Empty;
String IndiceDestin1;
String IndiceLat1;
String IndiceLong1;
foreach (String log in logslogs)
{
String[] LongContent = log.Split(',');
Loglog entry = new Loglog(); //named "entry" because "log" is already the loop variable
entry.Destin = LongContent[0];
entry.Lat = Convert.ToChar(LongContent[1]);
entry.Long = LongContent[2];
entry.Barcode = LongContent[6];
entry.Source = LongContent[7];
entry.DestDestinBarcode = LongContent[8];
entry.SampleName = LongContent[9];
AllLogs.Add(entry);
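//This branch only runs once: the first time there is no "former" data to compare against, so we bypass the comparison
if (c == 0)
{
IndiceDestin0 = LongContent[0];
IndiceLat0 = LongContent[1];
IndiceLong0 = LongContent[2];
//The first log has to be stored too; otherwise logsToButtons is empty when the next log is compared (assuming it is not filled elsewhere)
logsToButtons.Add(log);
c++;
}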
else
{
IndiceDestin1 = LongContent[0];
IndiceLat1 = LongContent[1];
IndiceLong1 = LongContent[2];
if (IndiceDestin0.Equals(IndiceDestin1) && IndiceLat0.Equals(IndiceLat1) && IndiceLong0.Equals(IndiceLong1))
{
int last = logsToButtons.Count - 1;
string oldLog = logsToButtons[last].ToString();
string appendedLog = oldLog + log;
//We remove the last "single" log to add the aggregated log
logsToButtons.RemoveAt(last);
logsToButtons.Add(appendedLog);
}
else
{
logsToButtons.Add(log);
}
IndiceDestin0 = IndiceDestin1;
IndiceLat0 = IndiceLat1;
IndiceLong0 = IndiceLong1;
c++;
}
}
I end up with a shorter version of the array in which the items that have the same coordinates are appended together. Thank you everybody for your help; I know it is messy, but it works!
Best,
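For what it's worth, the same condensing can probably be expressed with LINQ's GroupBy. This is only a sketch under a few assumptions: LogCondenser is a made-up helper name, every line has at least three comma-separated fields, and the lines of a group are concatenated without a separator, as in the code above. Unlike the loop above, it also merges lines with the same coordinates that are not next to each other:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LogCondenser
{
    // Groups raw comma-separated lines by their first three fields
    // (Destin, Lat, Long) and concatenates the lines of each group.
    public static List<string> Condense(IEnumerable<string> lines)
    {
        return lines
            .GroupBy(line =>
            {
                string[] cols = line.Split(',');
                // Value tuple as composite key (needs C# 7 or later).
                return (cols[0], cols[1], cols[2]);
            })
            // Groups come out in order of first appearance of their key.
            .Select(group => string.Concat(group))
            .ToList();
    }
}
```

Since logslogs is an ArrayList, it would need logslogs.Cast<string>() before being passed in.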
I have a C# program with a list (List<string>) of unique strings. These strings represent the names of different cases. It is not important what they are, but they have to be unique.
cases = new List<string> { "case1", "case3", "case4" };
Sometimes I read cases saved in a text format into my program. Sometimes a case stored in a file has the same name as a case already in my program, and I have to rename the new case. Let's say that the name of the case I load from a file is case1.
The trouble is how to rename it without adding a large random string. In my example it should ideally be called case2. I have not found a good algorithm that can do that. I want to find the smallest number I can append that makes the name unique.
I would use a HashSet, which only accepts unique values.
List<string> cases = new List<string>() { "case1", "case3", "case4" };
HashSet<string> hcases = new HashSet<string>(cases);
string Result = Enumerable.Range(1, 100).Select(x => "case" + x).First(x => hcases.Add(x));
// Result is "case2"
In this sample I try to add the names case1 to case100 to the HashSet and take the first one for which Add() succeeds.
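If the upper bound of 100 is a concern, the same idea can be written as a plain loop. This is a small sketch, with CaseNames and NextFreeName as made-up names; it does not modify the input collection and keeps counting until it finds a free name:

```csharp
using System.Collections.Generic;

public static class CaseNames
{
    // Returns prefix + n for the smallest n >= 1 that is not already taken.
    public static string NextFreeName(IEnumerable<string> existing,
                                      string prefix = "case")
    {
        // Copy into a set so each membership check is O(1) on average.
        var taken = new HashSet<string>(existing);
        int n = 1;
        while (taken.Contains(prefix + n))
            n++;
        return prefix + n;
    }
}
```

For the list from the question, NextFreeName(cases) gives "case2".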
If you have a list of unique strings, consider using a HashSet<string> instead. Since you want incrementing numbers, it sounds as if you should actually use a custom class instead of a string, one that contains a name and a number property. Then you can increment the number, and if you want the full name, use Name + Number (or override ToString).
Let's say that class is Case; you could fill a HashSet<Case>. HashSet.Add returns false on duplicates. Then use a loop which increments the number until the case can be added.
Something like this:
var cases = new HashSet<Case>();
// fill it ...
// later you want to add one from file:
while(!cases.Add(caseFromFile))
{
// you will get here if the set already contained one with this name+number
caseFromFile.Number++;
}
A possible implementation:
public class Case
{
public string Name { get; set; }
public int Number { get; set; }
// other properties
public override string ToString()
{
return Name + Number;
}
public override bool Equals(object obj)
{
Case other = obj as Case;
if (other == null) return false;
return other.ToString() == this.ToString();
}
public override int GetHashCode()
{
return (ToString() ?? "").GetHashCode();
}
// other methods
}
The solution is quite simple. Get the max number of case currently stored in the list, increment by one and add the new value:
var max = myList.Max(x => Convert.ToInt32(x.Substring("case".Length))) + 1;
myList.Add("case" + max);
Working fiddle.
EDIT: To fill any "holes" in your collection, you may use this:
// Assumes myList is sorted by case number.
var expected = Convert.ToInt32(myList[0].Substring("case".Length));
for (int i = 0; i < myList.Count; i++, expected++) {
var curIndex = Convert.ToInt32(myList[i].Substring("case".Length));
if (curIndex != expected)
{
// expected is the first missing number
myList.Add("case" + expected);
break;
}
}
It checks for every element in your list whether the number after "case" equals the next expected number, counting up from the first element. The loop stops at the very first element where the condition is broken, which is where the list has a hole.
I'm trying to achieve a super-fast search, and decided to rely heavily on caching to achieve this. The order of events is as follows:
1) Cache what can be cached (from entire database, around 3000 items)
2) When a search is performed, pull the entire result set out of the cache
3) Filter that result set based on the search criteria. Give each search result a "relevance" score.
4) Send the filtered results down to the database via xml to get the bits that can't be cached (e.g. prices)
5) Display the final results
This is all working and going at lightning speed, but in order to achieve (3) I've given each result a "relevance" score. This is just a member integer on each search result object. I iterate through the entire result set and update this score accordingly, then order-by it at the end.
The problem I am having is that the "relevance" member is retaining this value from search to search. I assume this is because what I am updating is a reference to the search results in the cache, rather than a new object, so updating it also updates the cached version. What I'm looking for is a tidy solution to get around this. What I've come up with so far is either:
a) Clone the cache when I get it.
b) Create a separate dictionary to store relevances in and match them up at the end.
Am I missing a really obvious and clean solution, or should I go down one of these routes? I'm using C# and .NET.
Hopefully it should be obvious from the description what I'm getting at, here's some code anyway; this first one is the iteration through the cached results in order to do the filtering;
private List<QuickSearchResult> performFiltering(string keywords, string regions, List<QuickSearchResult> cachedSearchResults)
{
List<QuickSearchResult> filteredItems = new List<QuickSearchResult>();
string upperedKeywords = keywords.ToUpper();
string[] keywordsArray = upperedKeywords.Split(' ');
string[] regionsArray = regions.Split(',');
foreach (var item in cachedSearchResults)
{
//Check for keywords
if (keywordsArray != null)
{
if (!item.ContainsKeyword(upperedKeywords, keywordsArray))
continue;
}
//Check for regions
if (regionsArray != null)
{
if (!item.IsInRegion(regionsArray))
continue;
}
filteredItems.Add(item);
}
return filteredItems.OrderBy(t=> t.Relevance).Take(_maxSearchResults).ToList<QuickSearchResult>();
}
and here is an example of the "IsInRegion" method of the QuickSearchResult object;
public bool IsInRegion(string[] regions)
{
int relevanceScore = 0;
foreach (var region in regions)
{
int parsedRegion = 0;
if (int.TryParse(region, out parsedRegion))
{
foreach (var thisItemsRegion in this.Regions)
{
if (thisItemsRegion.ID == parsedRegion)
relevanceScore += 10;
}
}
}
Relevance += relevanceScore;
return relevanceScore > 0;
}
And basically, if I search for "london" I get a score of "10" the first time, "20" the second time...
If you use the NetDataContractSerializer to serialize your objects in the cache, you could use a [DataMember] attribute to control what gets serialized and what doesn't. For instance, you could store your temporary calculated relevance value in a field that is not serialized.
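Option (b) from the question, a separate dictionary for the relevances, could look roughly like the sketch below. CachedItem is a simplified stand-in for QuickSearchResult and SearchFilter is a made-up name; the scoring only covers the region check, and ordering by descending score is an assumption. The point is that the cached objects are never written to, so nothing carries over between searches:

```csharp
using System.Collections.Generic;
using System.Linq;

// Simplified stand-in for QuickSearchResult: no mutable Relevance member.
public sealed class CachedItem
{
    public int Id { get; }
    public IReadOnlyList<int> RegionIds { get; }

    public CachedItem(int id, IReadOnlyList<int> regionIds)
    {
        Id = id;
        RegionIds = regionIds;
    }

    // Pure function: returns the score instead of storing it on the item.
    public int RegionScore(IEnumerable<int> wantedRegions)
        => 10 * wantedRegions.Count(r => RegionIds.Contains(r));
}

public static class SearchFilter
{
    public static List<CachedItem> Filter(IEnumerable<CachedItem> cached,
                                          IReadOnlyCollection<int> wantedRegions,
                                          int maxResults)
    {
        // Per-search scores live here, not in the cache.
        var relevance = new Dictionary<CachedItem, int>();
        foreach (var item in cached)
        {
            int score = item.RegionScore(wantedRegions);
            if (score > 0)
                relevance[item] = score;
        }

        return relevance
            .OrderByDescending(pair => pair.Value)
            .Take(maxResults)
            .Select(pair => pair.Key)
            .ToList();
    }
}
```

Running the same search twice gives the same scores, because the dictionary is thrown away after each call.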
I've produced a function to get back a random set of submissions depending on the amount passed to it, but I worry that even though it works now with a small amount of data, when a large amount is passed through it would become inefficient and cause problems.
Is there a more efficient way of doing the following?
public List<Submission> GetRandomWinners(int id)
{
List<Submission> submissions = new List<Submission>();
int amount = (DbContext().Competitions
.Where(s => s.CompetitionId == id).FirstOrDefault()).NumberWinners;
for (int i = 1 ; i <= amount; i++)
{
bool added = false;
while (!added)
{
bool found = false;
var randSubmissions = DbContext().Submissions
.Where(s => s.CompetitionId == id && s.CorrectAnswer).ToList();
int count = randSubmissions.Count();
int index = new Random().Next(count);
foreach (var sub in submissions)
{
if (sub == randSubmissions.Skip(index).FirstOrDefault())
found = true;
}
if (!found)
{
submissions.Add(randSubmissions.Skip(index).FirstOrDefault());
added = true;
}
}
}
return submissions;
}
As I say, I have this fully working and bringing back the wanted result. It is just that I'm not liking the foreach and while checks in there and my head has just turned to mush now trying to come up with the above solution.
(Please read all the way through, as there are different aspects of efficiency to consider.)
There are definitely simpler ways of doing this - and in particular, you really don't need to perform the query for correct answers repeatedly. Why are you fetching randSubmissions inside the loop? You should also look at ElementAt to avoid the Skip and FirstOrDefault - and bear in mind that as randSubmissions is a list, you can use normal list operations, like the Count property and the indexer!
The option which comes to mind first is to perform a partial shuffle. There are loads of examples on Stack Overflow of a modified Fisher-Yates shuffle. You can modify that code very easily to avoid shuffling the whole list - just shuffle it until you've got as many random elements as you need. In fact, these days I'd probably implement that shuffle slightly differently so that you could just call:
return correctSubmissions.Shuffle(random).Take(amount).ToList();
For example:
public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source, Random rng)
{
T[] elements = source.ToArray();
for (int i = 0; i < elements.Length; i++)
{
// Find an item we haven't returned yet
int swapIndex = i + rng.Next(elements.Length - i);
T tmp = elements[i];
yield return elements[swapIndex];
elements[swapIndex] = tmp;
// Note that we don't need to copy the value into elements[i],
// as we'll never use that value again.
}
}
Given the above method, your GetRandomWinners method would look like this:
public List<Submission> GetRandomWinners(int competitionId, Random rng)
{
List<Submission> submissions = new List<Submission>();
int winnerCount = DbContext().Competitions
.Single(s => s.CompetitionId == competitionId)
.NumberWinners;
var correctEntries = DbContext().Submissions
.Where(s => s.CompetitionId == competitionId &&
s.CorrectAnswer)
.ToList();
return correctEntries.Shuffle(rng).Take(winnerCount).ToList();
}
I would advise against creating a new instance of Random in your method. I have an article on preferred ways of using Random which you may find useful.
One alternative you may want to consider is working out the count of the correct entries without fetching them all, then working out the winning entries by computing a random selection of "row IDs" and using ElementAt repeatedly (with a consistent order). Alternatively, instead of pulling the complete submissions, pull just their IDs. Shuffle the IDs to pick n random ones (which you put into a List<T>), then use something like:
return DbContext().Submissions
.Where(s => winningIds.Contains(s.Id))
.ToList();
I believe this will use an "IN" clause in the SQL, although there are limits as to how many entries can be retrieved like this.
That way even if you have 100,000 correct entries and 3 winners, you'll only fetch 100,000 IDs, but 3 complete records. Hope that makes sense!
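The "pick n random IDs" step could be sketched as a partial Fisher-Yates shuffle over the fetched IDs. WinnerPicker and PickRandomIds are hypothetical names, and the IDs are assumed to be ints that have already been loaded into memory; the Random instance is passed in rather than created inside the method:

```csharp
using System;
using System.Collections.Generic;

public static class WinnerPicker
{
    // Partial Fisher-Yates: picks up to `count` distinct IDs at random,
    // stopping as soon as enough have been chosen.
    public static List<int> PickRandomIds(IReadOnlyList<int> ids,
                                          int count, Random rng)
    {
        // Work on a copy so the caller's list is left untouched.
        int[] pool = new int[ids.Count];
        for (int i = 0; i < pool.Length; i++)
            pool[i] = ids[i];

        int take = Math.Min(count, pool.Length);
        var winners = new List<int>(take);
        for (int i = 0; i < take; i++)
        {
            // Choose among the elements not yet selected.
            int swapIndex = i + rng.Next(pool.Length - i);
            (pool[i], pool[swapIndex]) = (pool[swapIndex], pool[i]);
            winners.Add(pool[i]);
        }
        return winners;
    }
}
```

The resulting list would play the role of winningIds in the Contains query above.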