Searching in ListArray C#

What is the fastest method for searching data in a list array in C#?
My code:
public class fruits
{
    public string Initial;
    public string Fruit;

    public fruits(string initials, string names)
    {
        Initial = initials;
        Fruit = names;
    }
}

// load
List<fruits> list = new List<fruits>();
list.Add(new fruits("A", "Apple"));
list.Add(new fruits("P", "Pineapple"));
list.Add(new fruits("AP", "Apple Pineapple"));

// combo box selected text
var text = comboBox.Text;
for (int i = 0; i < list.Count; i++)
{
    if (list[i].Fruit == text)
    {
        MessageBox.Show(list[i].Initial);
    }
}
I know this search method is not good if the list contains a lot of data.

If you want a "fast" solution, you should use a foreach instead of LINQ. This solution can improve your performance a lot:
fruits firstOrDefault = null;
foreach (fruits f in list)
{
    if (f.Fruit == text)
    {
        firstOrDefault = f;
        break;
    }
}
You can find more information about LINQ performance in posts like:
Is a LINQ statement faster than a 'foreach' loop?
http://geekswithblogs.net/BlackRabbitCoder/archive/2010/04/23/c-linq-vs-foreach---round-1.aspx
https://blogs.msdn.microsoft.com/ericlippert/2009/05/18/foreach-vs-foreach/

You can use LINQ:
var result = list.FirstOrDefault(q => q.Fruit == text);
if (result != null)
    MessageBox.Show(result.Initial);

The best (and only) way to tell which method is fastest in a given situation is to actually benchmark/measure it with different algorithms. You already have two answers/approaches here (LINQ and foreach). Time both of them and then pick the faster one.
Or in other words: Measuring your code gives you an advantage over those people who think they are too smart to measure. ;)
To speed things up further you might want to consider keeping the list sorted and then doing a binary search on it. It increases the time for insertion, because you have to keep the list sorted, but it should speed up the search process. But then again: do not just take my word for it, measure it!
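A minimal sketch of that sorted-list idea, reusing the fruits class from the question (the comparer class and the empty-string probe value are illustrative assumptions, not part of the original code):
using System;
using System.Collections.Generic;

class FruitNameComparer : IComparer<fruits>
{
    public int Compare(fruits x, fruits y) =>
        string.Compare(x.Fruit, y.Fruit, StringComparison.Ordinal);
}

// Sort once (and re-sort after inserts); each search is then O(log n).
var comparer = new FruitNameComparer();
list.Sort(comparer);

// BinarySearch needs a probe object carrying the key being compared;
// the "" initial is just a dummy value.
int index = list.BinarySearch(new fruits("", text), comparer);
if (index >= 0)
    MessageBox.Show(list[index].Initial);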

Related

Is it possible to accelerate a From-Where-Select query?

I have two variants of code:
first:
struct pair_fiodat { public string fio; public string dat; }

List<pair_fiodat> list_fiodat = new List<pair_fiodat>();
// list filled with 200,000 records (omitted)
foreach (string fname in XML_files)
{
    // get FullName and BirthDate from file (omitted)
    var usersLookUp = list_fiodat.ToLookup(u => u.fio, u => u.dat); // create map
    var dates = usersLookUp[FullName];
    if (dates.Count() > 0)
    {
        foreach (var dt in dates)
        {
            if (dt == BirthDate) return true;
        }
    }
}
and the second:
struct pair_fiodat { public string fio; public string dat; }

List<pair_fiodat> list_fiodat = new List<pair_fiodat>();
// list filled with 200,000 records (omitted)
foreach (string fname in XML_files)
{
    // get FullName and BirthDate from file (omitted)
    var members = from s in list_fiodat
                  where s.fio == FullName && s.dat == BirthDate
                  select s;
    if (members.Count() > 0) return true;
}
They do the same job - searching for a user by name and birth date.
The first one works very quickly.
The second is very slow (10x-50x slower).
Please tell me, is it possible to accelerate the second one?
Maybe the list needs some special preparation beforehand?
I tried sorting: list_fiodat_sorted = list_fiodat.OrderBy(x => x.fio).ToList();, but...
I skipped your first test and changed Count() to Any() (Count() iterates the whole list, while Any() stops as soon as there is a matching element):
public bool Test1(List<pair_fiodat> list_fiodat)
{
    foreach (string fname in XML_files)
    {
        var members = from s in list_fiodat
                      where s.fio == fname && s.dat == BirthDate
                      select s;
        if (members.Any())
            return true;
    }
    return false;
}
If you want to optimize something, you must give up the comfortable things the language offers, because these things are usually not free; they have a cost.
For example, for is faster than foreach. It is a bit uglier and you need an extra statement to get the variable, but it is faster. If you iterate over a very big collection, every iteration adds up.
LINQ is very powerful and it's wonderful to work with, but it has a cost. If you replace it with another "for", you save time.
public bool Test2(List<pair_fiodat> list_fiodat)
{
    for (int i = 0; i < XML_files.Count; i++)
    {
        string fname = XML_files[i];
        for (int j = 0; j < list_fiodat.Count; j++)
        {
            var s = list_fiodat[j];
            if (s.fio == fname && s.dat == BirthDate)
            {
                return true;
            }
        }
    }
    return false;
}
With normal collections there is no difference and you would normally use foreach, LINQ and so on, but in extreme cases you must go down to a lower level.
In your first test, ToLookup is the key: it takes a long time. Think about it: you iterate over your whole list, creating and filling the map. That is bad in any case, but consider the case where the item you are looking for is at the start of the list: you would only need a few iterations to find it, yet you spend time on every item in your list creating the map. Only in the worst case are the times similar, and even then the map version is worse because of the cost of creating the map itself.
The map is interesting if you need it repeatedly - for example, to get all the items that match some condition, or to look up many different keys, instead of finding a single item. You spend time creating the map once, but you use it many times, and each use saves time (the map is "direct access", whereas the for loop is sequential).
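To make that concrete, here is a sketch of the first variant with the expensive part hoisted: the lookup is built once, before the loop, and reused for every file (FullName and BirthDate still come from each file, as in the question):
// Build the lookup once, up front: O(n) a single time instead of per file.
var usersLookUp = list_fiodat.ToLookup(u => u.fio, u => u.dat);

foreach (string fname in XML_files)
{
    // get FullName and BirthDate from the file (omitted, as in the question)
    foreach (var dt in usersLookUp[FullName])
    {
        if (dt == BirthDate) return true;
    }
}
return false;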

How to sort a list of objects by a property

I've got this chunk of code:
DataTable dt = new DataTable();
dt.Columns.Add("Status");
dt.Columns.Add("File");
dt.Columns.Add("Revision");

int i = 0;
foreach (SvnStatusEventArgs status in statuses) // statuses is a Collection
{
    dt.Rows.Add();
    switch (status.LocalContentStatus)
    {
        case SvnStatus.NotVersioned:
            dt.Rows[i]["Status"] = "Not Versioned";
            break;
        default:
            dt.Rows[i]["Status"] = status.LocalContentStatus.ToString();
            break;
    }
    dt.Rows[i]["File"] = status.Path;

    foreach (SvnInfoEventArgs info in infos) // infos is a Collection
    {
        if (status.Path.Equals(info.Path))
        {
            dt.Rows[i]["Revision"] = info.Revision;
            break;
        }
    }
    i++;
}
statuses and infos can have up to 20K rows each, however, so my nested foreach could take a long time.
I thought I could maybe speed this up if I converted these Collections to Lists and then try to Sort them both by Path.
Looking over the MSDN page for the Sort method, I have no idea how I'd go about comparing the Path fields of SvnStatusEventArgs[n] and SvnStatusEventArgs[n+1]. Then I also started to wonder: since I'd be iterating over both of these groups of objects in their entirety and sorting them anyway, would that really be any more efficient than my existing code? I suppose it would be n*2 as opposed to n*n, right?
For what it's worth, the Path field I'm trying to sort by is just a string.
You could create a Dictionary<string, int> (the key is the path, the value the revision):
Dictionary<string, int> pathRevisions = infos
    .GroupBy(info => info.Path)
    .ToDictionary(group => group.Key, group => group.First().Revision);
....
in the loop:
int revision;
if (pathRevisions.TryGetValue(status.Path, out revision))
    dt.Rows[i].SetField("Revision", revision);
Your question was fairly unclear, but since you said in the comments that this is what you meant:
foreach (SvnStatusEventArgs status in statuses.OrderBy(x => x.Path))
This is a very basic approach, though. If you want a more optimal one, you should use Tim Schmelter's solution.
The best option would be to build a dictionary on infos, keyed by path. That would be the most efficient overall.
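If you do still want to sort in place, List<T>.Sort accepts a Comparison<T> delegate, so comparing two items by their Path is a one-liner. A sketch; statusList is assumed to be a List<SvnStatusEventArgs> built from statuses:
// Hypothetical statusList; sorts in place by the Path string.
statusList.Sort((a, b) => string.Compare(a.Path, b.Path, StringComparison.Ordinal));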

List operations getting a list's index

I have a class in my Windows application like so:
public class Pets
{
    public string Name { get; set; }
    public int Amount { get; set; }
}
In one of my other classes I made a List of that class, like so:
List<Pets> myPets = new List<Pets>();
myPets.Add(new Pets{ Name = "Fish", Amount = 8});
myPets.Add(new Pets{ Name = "Dogs", Amount = 2});
myPets.Add(new Pets{ Name = "Cats", Amount = 2});
Is there a way I can get the index of the Pets whose Name is "Fish"?
I realize I can do this:
int pos = 0;
for (int x = 0; x < myPets.Count; x++)
{
    if (myPets[x].Name == "Fish")
    {
        pos = x;
    }
}
But in the case that I have a lot of items in myPets, it would take a long time to loop through them all to find the one I am looking for. Is there another way to complete the task above that would make my application run quicker when myPets has a lot of items in it?
The way you have your data structured at the moment does not lend itself well to searching by a pet's name if the list is large.
Iterating manually like you suggest - which is also what FindIndex does internally - is known as a linear search, a brute-force algorithm. If you have N items in your collection, the worst case scenario for finding an item is N iterations. This is known as O(N) in Big O notation; the search time grows linearly with the number of items in your collection.
For faster searching you need to change to a different data structure (like a hash table), use a database, or implement a different search algorithm, such as a binary search (O(log N) complexity).
Have a look at this question for an example:
Can LINQ use binary search when the collection is ordered?
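As an aside, the manual loop from the question collapses to a single FindIndex call - still a linear O(N) search, but shorter:
// Linear search, but concise; returns -1 when nothing matches.
int pos = myPets.FindIndex(p => p.Name == "Fish");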
If you want to find the index only to access an item of the List, you can use a Dictionary instead:
var pets = new Dictionary<string, int>();
pets.Add("Dogs", 2);
pets.Add("Fish", 8);
int amount = pets["Fish"];
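One caveat worth knowing: the indexer throws a KeyNotFoundException when the key is absent, so TryGetValue is the safer lookup:
// Avoids an exception for missing keys.
if (pets.TryGetValue("Fish", out int fishCount))
    MessageBox.Show(fishCount.ToString());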

Cache only parts of an object

I'm trying to achieve a super-fast search, and I've decided to rely heavily on caching to achieve this. The order of events is as follows:
1) Cache what can be cached (from entire database, around 3000 items)
2) When a search is performed, pull the entire result set out of the cache
3) Filter that result set based on the search criteria. Give each search result a "relevance" score.
4) Send the filtered results down to the database via XML to get the bits that can't be cached (e.g. prices)
5) Display the final results
This is all working and going at lightning speed, but in order to achieve (3) I've given each result a "relevance" score. This is just an integer member on each search result object. I iterate through the entire result set, update this score accordingly, and then order by it at the end.
The problem I am having is that the "relevance" member retains its value from search to search. I assume this is because what I am updating is a reference to the search results in the cache, rather than a new object, so updating it also updates the cached version. What I'm looking for is a tidy way around this. What I've come up with so far is either:
a) Clone the cache when I get it.
b) Create a separate dictionary to store relevances in and match them up at the end.
Am I missing a really obvious and clean solution, or should I go down one of these routes? I'm using C# and .NET.
Hopefully it should be obvious from the description what I'm getting at. Here's some code anyway; this first method iterates through the cached results in order to do the filtering:
private List<QuickSearchResult> performFiltering(string keywords, string regions, List<QuickSearchResult> cachedSearchResults)
{
    List<QuickSearchResult> filteredItems = new List<QuickSearchResult>();

    string upperedKeywords = keywords.ToUpper();
    string[] keywordsArray = upperedKeywords.Split(' ');
    string[] regionsArray = regions.Split(',');

    foreach (var item in cachedSearchResults)
    {
        // Check for keywords
        if (keywordsArray != null)
        {
            if (!item.ContainsKeyword(upperedKeywords, keywordsArray))
                continue;
        }
        // Check for regions
        if (regionsArray != null)
        {
            if (!item.IsInRegion(regionsArray))
                continue;
        }
        filteredItems.Add(item);
    }

    return filteredItems.OrderBy(t => t.Relevance).Take(_maxSearchResults).ToList<QuickSearchResult>();
}
And here is an example of the IsInRegion method of the QuickSearchResult object:
public bool IsInRegion(string[] regions)
{
    int relevanceScore = 0;
    foreach (var region in regions)
    {
        int parsedRegion = 0;
        if (int.TryParse(region, out parsedRegion))
        {
            foreach (var thisItemsRegion in this.Regions)
            {
                if (thisItemsRegion.ID == parsedRegion)
                    relevanceScore += 10;
            }
        }
    }
    Relevance += relevanceScore;
    return relevanceScore > 0;
}
And basically, if I search for "london" I get a score of 10 the first time, 20 the second time...
If you use the NetDataContractSerializer to serialize your objects in the cache, you could use the [DataMember] attribute to control what gets serialized and what doesn't. For instance, you could store your temporary calculated relevance value in a field that is not serialized.
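Alternatively, the asker's option (b) can be sketched like this: relevance lives in a local dictionary keyed by result, so the cached objects are never mutated. ComputeRelevance is a hypothetical pure scoring helper here, and ordering descending assumes a higher score means more relevant:
var relevance = new Dictionary<QuickSearchResult, int>();
foreach (var item in cachedSearchResults)
{
    // Hypothetical helper: scores the item without touching its state.
    int score = ComputeRelevance(item, keywordsArray, regionsArray);
    if (score > 0)
        relevance[item] = score;
}
return relevance.OrderByDescending(kv => kv.Value)
                .Take(_maxSearchResults)
                .Select(kv => kv.Key)
                .ToList();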

Is there a more efficient way to randomise a set of LINQ results?

I've produced a function to get back a random set of submissions depending on the amount passed to it, but I worry that even though it works now with a small amount of data, it would become inefficient and cause problems once a large amount is passed through.
Is there a more efficient way of doing the following?
public List<Submission> GetRandomWinners(int id)
{
    List<Submission> submissions = new List<Submission>();

    int amount = (DbContext().Competitions
        .Where(s => s.CompetitionId == id).FirstOrDefault()).NumberWinners;

    for (int i = 1; i <= amount; i++)
    {
        bool added = false;
        while (!added)
        {
            bool found = false;
            var randSubmissions = DbContext().Submissions
                .Where(s => s.CompetitionId == id && s.CorrectAnswer).ToList();
            int count = randSubmissions.Count();
            int index = new Random().Next(count);
            foreach (var sub in submissions)
            {
                if (sub == randSubmissions.Skip(index).FirstOrDefault())
                    found = true;
            }
            if (!found)
            {
                submissions.Add(randSubmissions.Skip(index).FirstOrDefault());
                added = true;
            }
        }
    }
    return submissions;
}
As I say, I have this fully working and bringing back the desired result. It's just that I don't like the foreach and while checks in there, and my head has turned to mush trying to come up with something better.
(Please read all the way through, as there are different aspects of efficiency to consider.)
There are definitely simpler ways of doing this - and in particular, you really don't need to perform the query for correct answers repeatedly. Why are you fetching randSubmissions inside the loop? You should also look at ElementAt to avoid the Skip and FirstOrDefault - and bear in mind that as randSubmissions is a list, you can use normal list operations, like the Count property and the indexer!
The option which comes to mind first is to perform a partial shuffle. There are loads of examples on Stack Overflow of a modified Fisher-Yates shuffle. You can modify that code very easily to avoid shuffling the whole list - just shuffle it until you've got as many random elements as you need. In fact, these days I'd probably implement that shuffle slightly differently, so that you could just call:
return correctSubmissions.Shuffle(random).Take(amount).ToList();
For example:
public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> source, Random rng)
{
    T[] elements = source.ToArray();
    for (int i = 0; i < elements.Length; i++)
    {
        // Find an item we haven't returned yet
        int swapIndex = i + rng.Next(elements.Length - i);
        T tmp = elements[i];
        yield return elements[swapIndex];
        elements[swapIndex] = tmp;
        // Note that we don't need to copy the value into elements[i],
        // as we'll never use that value again.
    }
}
Given the above method, your GetRandomWinners method would look like this:
public List<Submission> GetRandomWinners(int competitionId, Random rng)
{
    int winnerCount = DbContext().Competitions
                                 .Single(s => s.CompetitionId == competitionId)
                                 .NumberWinners;
    var correctEntries = DbContext().Submissions
                                    .Where(s => s.CompetitionId == competitionId &&
                                                s.CorrectAnswer)
                                    .ToList();
    return correctEntries.Shuffle(rng).Take(winnerCount).ToList();
}
I would advise against creating a new instance of Random in your method. I have an article on preferred ways of using Random which you may find useful.
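A sketch of that advice (setting aside the thread-safety caveats the linked article covers): let the caller own a single Random instance and pass it in, rather than constructing one per call:
// One instance reused across calls; creating new Random() instances in
// quick succession can yield identical seeds and repeated sequences.
private static readonly Random _rng = new Random();

// ...
var winners = GetRandomWinners(competitionId, _rng);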
One alternative you may want to consider is working out the count of the correct entries without fetching them all, then working out the winning entries by computing a random selection of "row IDs" and then using ElementAt repeatedly (with a consistent order). Alternatively, instead of pulling the complete submissions, pull just their IDs. Shuffle the IDs to pick n random ones (which you put into a List<T>), then use something like:
return DbContext().Submissions
                  .Where(s => winningIds.Contains(s.Id))
                  .ToList();
I believe this will use an "IN" clause in the SQL, although there are limits as to how many entries can be retrieved like this.
That way even if you have 100,000 correct entries and 3 winners, you'll only fetch 100,000 IDs, but 3 complete records. Hope that makes sense!
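A sketch of building that winningIds list, reusing the same Shuffle extension method from above (the Id property name is an assumption):
// Fetch only the IDs of correct entries, shuffle them, and keep the first n.
var correctIds = DbContext().Submissions
                            .Where(s => s.CompetitionId == competitionId && s.CorrectAnswer)
                            .Select(s => s.Id)
                            .ToList();
var winningIds = correctIds.Shuffle(rng).Take(winnerCount).ToList();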
