LINQ Filtering - Optimization - c#

I make one database trip to get a list of entities.
I then would like to separate this list into 2 lists, one for the entities that have not expired (using a start and end) which i call TopListings and another which are regular listings, those that have expired or have start/end date as null (the ones that are not TopListings)
I am not entirely sure which filtering is fasted to separate into 2 lists, should I get the toplist first, then filter second list based on what is NOT in the top list for second?
var listings = ListingAdapter.GetMapListings(criteria);
var topListings = listings.Where(x => x.TopStartDate >= DateTime.Now && x.TopExpireDate >= DateTime.Now);
//I AM NOT SURE WHAT THIS LINE SHOULD BE
var regularListings = listings.Where(x => x.TopStartDate < DateTime.Now || x.TopExpireDate < DateTime.Now || x.TopStartDate == null || x.TopExpireDate == null );
Thank you

You might want to use a LookUp
like this:
var lookup = listings.ToLookup(x => x.TopStartDate >= DateTime.Now && x.TopExpireDate >= DateTime.Now);
var topListings = lookup[true];
var regularListings = lookup[false]; // I assume everything not a topListing is a regular listing.
If this isnt enough, you could create an enum
enum ListingType { Top, Regular, WhatEver };
...
var lookup = listings.ToLookUp(determineListingType); // pass a methoddelegate that determines the listingtype for an element.
...
var topListings = lookup[ListingType.Top];
var regularListings = lookup[ListingType.Regular];
var whateverListings = lookup[ListingType.WhatEver];

In this case, it would probably be easier to use a loop, instead of Linq operators:
var topListings = new List<Listing>();
var regularListings = new List<Listing>();
foreach (var x in listings)
{
if (x.TopStartDate >= DateTime.Now && x.TopExpireDate >= DateTime.Now)
topListings.Add(x);
else
regularListings.Add(x);
}
This is also more efficient, because the list is enumerated only once.

Take a look at the 'Except' operator to make things a little easier. You might have to add a .ToList() on topListings first though.
var regularListings = listings.Except(topListings);
http://blogs.msdn.com/b/charlie/archive/2008/07/12/the-linq-set-operators.aspx

Make use of regular foreach loop that's straight forward. You can iterate through listing with one go and add items to appropriate collections. If you are LINQ kind of guy, ForEach extension is what you are looking for:
var topListings = new List<Listing>();
var regularListings = new List<Listing>();
listing.ForEach(item=>{
if (x.TopStartDate < DateTime.Now
|| // I've inverted the condition, since it is faster-one or two conditions will be checked, instead of always two
x.TopExpireDate < DateTime.Now)
regularListings.Add(x);
else
topListings.Add(x);
});

Related

Most efficient way to search enumerable

I am writing a small program that takes in a .csv file as input with about 45k rows. I am trying to compare the contents of this file with the contents of a table on a database (SQL Server through dynamics CRM using Xrm.Sdk if it makes a difference).
In my current program (which takes about 25 minutes to compare - the file and database are the exact same here both 45k rows with no differences), I have all existing records on the database in a DataCollection<Entity> which inherits Collection<T> and IEnumerable<T>
In my code below I am filtering using the Where method and then doing a logic based the count of matches. The Where seems to be the bottleneck here. Is there a more efficient approach than this? I am by no means a LINQ expert.
foreach (var record in inputDataLines)
{
var fields = record.Split(',');
var fund = fields[0];
var bps = Convert.ToDecimal(fields[1]);
var withdrawalPct = Convert.ToDecimal(fields[2]);
var percentile = Convert.ToInt32(fields[3]);
var age = Convert.ToInt32(fields[4]);
var bombOutTerm = Convert.ToDecimal(fields[5]);
var matchingRows = existingRecords.Entities.Where(r => r["field_1"].ToString() == fund
&& Convert.ToDecimal(r["field_2"]) == bps
&& Convert.ToDecimal(r["field_3"]) == withdrawalPct
&& Convert.ToDecimal(r["field_4"]) == percentile
&& Convert.ToDecimal(r["field_5"]) == age);
entitiesFound.AddRange(matchingRows);
if (matchingRows.Count() == 0)
{
rowsToAdd.Add(record);
}
else if (matchingRows.Count() == 1)
{
if (Convert.ToDecimal(matchingRows.First()["field_6"]) != bombOutTerm)
{
rowsToUpdate.Add(record);
entitiesToUpdate.Add(matchingRows.First());
}
}
else
{
entitiesToDelete.AddRange(matchingRows);
rowsToAdd.Add(record);
}
}
EDIT: I can confirm that all existingRecords are in memory before this code is executed. There is no IO or DB access in the above loop.
Himbrombeere is right, you should execute the query first and put the result into a collection before you use Any, Count, AddRange or whatever method will execute the query again. In your code it's possible that the query is executed 5 times in every loop iteration.
Watch out for the term deferred execution in the documentation. If a method is implemented in that way, then it means that this method can be used to construct a LINQ query(so you can chain it with other methods and at the end you have a query). But only methods that don't use deferred execution like Count, Any, ToList(or a plain foreach) will actually execute it. If you dont want that the whole query is executed everytime and you have to access this query multiple times , it's better to store the result in a collection(.f.e with ToList).
However, you could use a different approach which should be much more efficient, a Lookup<TKey, TValue> which is similar to a dictionary and can be used with an anonymous type as key:
var lookup = existingRecords.Entities.ToLookup(r => new
{
fund = r["field_1"].ToString(),
bps = Convert.ToDecimal(r["field_2"]),
withdrawalPct = Convert.ToDecimal(r["field_3"]),
percentile = Convert.ToDecimal(r["field_4"]),
age = Convert.ToDecimal(r["field_5"])
});
Now you can access this lookup in the loop very efficiently.
foreach (var record in inputDataLines)
{
var fields = record.Split(',');
var fund = fields[0];
var bps = Convert.ToDecimal(fields[1]);
var withdrawalPct = Convert.ToDecimal(fields[2]);
var percentile = Convert.ToInt32(fields[3]);
var age = Convert.ToInt32(fields[4]);
var bombOutTerm = Convert.ToDecimal(fields[5]);
var matchingRows = lookup[new {fund, bps, withdrawalPct, percentile, age}].ToList();
entitiesFound.AddRange(matchingRows);
if (matchingRows.Count() == 0)
{
rowsToAdd.Add(record);
}
else if (matchingRows.Count() == 1)
{
if (Convert.ToDecimal(matchingRows.First()["field_6"]) != bombOutTerm)
{
rowsToUpdate.Add(record);
entitiesToUpdate.Add(matchingRows.First());
}
}
else
{
entitiesToDelete.AddRange(matchingRows);
rowsToAdd.Add(record);
}
}
Note that this will work even if the key does not exist(an empty list is returned).
Add a ToList after your Convert.ToDecimal(r["field_5"]) == age);-line to force an immediate execution of the query.
var matchingRows = existingRecords.Entities.Where(r => r["field_1"].ToString() == fund
&& Convert.ToDecimal(r["field_2"]) == bps
&& Convert.ToDecimal(r["field_3"]) == withdrawalPct
&& Convert.ToDecimal(r["field_4"]) == percentile
&& Convert.ToDecimal(r["field_5"]) == age)
.ToList();
The Where doesn´t actually execute your query, it just prepares it. The actual execution happens later in a delayed way. In your case that happens when calling Count which itself will iterate the entire collection of items. But if the first condition fails, the second one is checked leading to a second iteration of the complete collection when calling Count. In this case you actually execute that query a thrird time when calling matchingRows.First().
When forcing an immediate execution you´re executing the query only once and thus iterating the entire collection only once also which will decrease your overall-time.
Another option, which is basically along the same lines as the other answers, is to prepare your data first, so that you're not repeatedly calling things like r["field_2"] (which are relatively slow to look up).
This is a (1) clean your data, (2) query/join your data, (3) process your data approach.
Do this:
(1)
var inputs =
inputDataLines
.Select(record =>
{
var fields = record.Split(',');
return new
{
fund = fields[0],
bps = Convert.ToDecimal(fields[1]),
withdrawalPct = Convert.ToDecimal(fields[2]),
percentile = Convert.ToInt32(fields[3]),
age = Convert.ToInt32(fields[4]),
bombOutTerm = Convert.ToDecimal(fields[5]),
record
};
})
.ToArray();
var entities =
existingRecords
.Entities
.Select(entity => new
{
fund = entity["field_1"].ToString(),
bps = Convert.ToDecimal(entity["field_2"]),
withdrawalPct = Convert.ToDecimal(entity["field_3"]),
percentile = Convert.ToInt32(entity["field_4"]),
age = Convert.ToInt32(entity["field_5"]),
bombOutTerm = Convert.ToDecimal(entity["field_6"]),
entity
})
.ToArray()
.GroupBy(x => new
{
x.fund,
x.bps,
x.withdrawalPct,
x.percentile,
x.age
}, x => new
{
x.bombOutTerm,
x.entity,
});
(2)
var query =
from i in inputs
join e in entities on new { i.fund, i.bps, i.withdrawalPct, i.percentile, i.age } equals e.Key
select new { input = i, matchingRows = e };
(3)
foreach (var x in query)
{
entitiesFound.AddRange(x.matchingRows.Select(y => y.entity));
if (x.matchingRows.Count() == 0)
{
rowsToAdd.Add(x.input.record);
}
else if (x.matchingRows.Count() == 1)
{
if (x.matchingRows.First().bombOutTerm != x.input.bombOutTerm)
{
rowsToUpdate.Add(x.input.record);
entitiesToUpdate.Add(x.matchingRows.First().entity);
}
}
else
{
entitiesToDelete.AddRange(x.matchingRows.Select(y => y.entity));
rowsToAdd.Add(x.input.record);
}
}
I would suspect that this will be the among the fastest approaches presented.

How to use if else in linq

i have a table in which i have record of users.table have field lastLoginMonth and LastLoginYear ...i want to fetch that user who have login time more than 5 month ..but here i found two case ..
1)current year and lastLoginYear same
2)current year and lastLoginYear different
to handle this i have to use different conditions but i don't know how to handle this in query.....
var year = db.UserManagers.ToList();
foreach (var y in year)
{
if (y.LastLoginYear == mydate.Year)
{
var modell = (from ummm in db.UserManagers
where ((mydate.Month - ummm.LastLoginMonth) >5
&& ummm.LoginWarning==false)
select ummm).ToList();
return View(modell);
}
var model = (from ummm in db.UserManagers
where (((12 - y.LastLoginMonth) + mydate.Month) >5
&& ummm.LoginWarning == false)
select ummm).ToList();
return View(model);
}
how i can organize this query in a simple way ...
Use ternary operator:
var modell = (from ummm in db.UserManagers
where (((y.LastLoginYear == mydate.Year)
? ((mydate.Month - ummm.LastLoginMonth) >5)
: ((12 - y.LastLoginMonth) + mydate.Month) >5)
&& ummm.LoginWarning==false)
select ummm).ToList();
Take a look at this example to understand what does this mean like:
var list = new List<string> { "1", "abc", "5"};
var sel = (from s in list where ((s.Length > 1) ? true : false) select s);
As you can see, we take each string s stored in the list and apply to it the next filter: If it’s Length more then 1, we take it (as it will be where true), otherwise, we don’t take it. Thus we will take only those strings that have Length more then 1.
Also pay attention that you make return inside the foreach loop. That means that the foreach will iterate only 1 time and then will exit on the return you wrote. So you might expect this code to make something different from what you have written.
First approach that came to my mind could be with simple inline if:
var year = db.UserManagers.ToList();
foreach (var y in year)
{
var model = (from ummm in db.UserManagers
where (((y.LastLoginYear == mydate.Year)?(mydate.Month - ummm.LastLoginMonth):((12 - y.LastLoginMonth) + mydate.Month)) >5 && ummm.LoginWarning==false)
select ummm).ToList();
return View(model);
}
}
I assume that you realize that a return inside a foreach loop would make it execute only once and then return the result?

Checking If Intervals Are of Equal Length in a List

I have a List<foo> that has two properties:
start_time and end_time
Assume that we have 100 records in that list. How can I check if all intervals are of equal length? In other terms, I'd like to know if the values of the difference between end and start times for all foo objects are equal.
Where (value = end_time-start_time).
Is it possible to achieve this in a single LINQ line?
Thanks, appreciate it.
Sure, you can write something like this:
var list = new List<foo>();
var areAllEqual = list.GroupBy(l => (l.end_time - l.start_time)).Count() == 1;
Alternatively, if you want to do more with that information:
var differences = list.GroupBy(l => (l.end_time - l.start_time)).ToList();
var numDifferences = differences.Count();
var areAllEqual = numDifferences == 1;
var firstDifference = differences.First().Key;
var allDifferences = differences.Select(g => g.Key);
Something like this should work.
var first = items.First;
var duration = first.end_time - first.start_time;
var allEqual = items.All(i => duration == (i.end_time - i.start_time))

Find a dupe in a List with Linq

I am building a list of Users. each user has a FullName.
I'm comparing users on FullName.
i'm taking a DataTable with the users from the old DB and parsing them to a 'User' Object. and adding them in a List<Users>. which in the code is a List<Deelnemer>
It goes like this:
List<Deelnemer> tempDeeln = new List<Deelnemer>();
bool dupes = false;
foreach (DataRow rij in deeln.Rows) {
Deelnemer dln = new Deelnemer();
dln.Dln_Creatiedatum = DateTime.Now;
dln.Dln_Email = rij["Ler_Email"].ToString();
dln.Dln_Inst_ID = inst.Inst_ID;
dln.Dln_Naam = rij["Ler_Naam"].ToString();
dln.Dln_Username = rij["LerLog_Username"].ToString();
dln.Dln_Voornaam = rij["Ler_Voornaam"].ToString();
dln.Dln_Update = (DateTime)rij["Ler_Update"];
if (!dupes && tempDeeln.Count(q => q.FullName.ToLower() == dln.FullName.ToLower()) > 0)
dupes = true;
tempDeeln.Add(dln);
}
then when the foreach is done, i look if the bool is true, check which ones are the doubles, and remove the oldest ones.
now, i think this part of the code is very bad:
if (!dupes && tempDeeln.Count(q => q.FullName.ToLower() == dln.FullName.ToLower()) > 0)
it runs for every user added, and runs over all the already created users.
my question: how would I optimize this.
You can use a set such as a HashSet<T> to track unique names observed so far. A hash-set supports constant-time insertion and lookup, so a full linear-search will not be required for every new item unlike you exising solution.
var uniqueNames = new HashSet<string>(StringComparer.CurrentCultureIgnoreCase);
...
foreach(...)
{
...
if(!dupes)
{
// Expression is true only if the set already contained the string.
dupes = !uniqueNames.Add(dln.FullName);
}
}
If you want to "remove" dupes (i.e. produce one representative element for each name) after you have assembled the list (without using a hash-set), you can do:
var distinctItems = tempDeeln.GroupBy(dln => dln.FullName,
StringComparer.CurrentCultureIgnoreCase)
.Select(g => g.First());
Try this out--
http://blogs.msdn.com/b/ericwhite/archive/2008/08/19/find-duplicates-using-linq.aspx
Count will go through whole set of items. Try to use Any, this way it will only check for first occurrence of the item.
if (!dupes && tempDeeln.Any(q => q.FullName.ToLower() == dln.FullName.ToLower()))
dupes = true;

Foreach loop problem for IQueryable object

Can we use foreach loop for IQueryable object?
I'd like to do something as follow:
query = IQueryable<Myclass> = objDataContext.Myclass; // objDataContext is an object of LINQ datacontext class
int[] arr1 = new int[] { 3, 4, 5 };
foreach (int i in arr1)
{
query = query.Where(q => (q.f_id1 == i || q.f_id2 == i || q.f_id3 == i));
}
I get a wrong output as each time value of i is changed.
The problem you're facing is deferred execution, you should be able to find a lot of information on this but basically none of the code s being executed until you actually try to read data from the IQueryable (Convert it to an IEnumerable or a List or other similar operations). This means that this all happens after the foreach is finished when i is set to the final value.
If I recall correctly one thing you can do is initialize a new variable inside the for loop like this:
foreach (int i in arr1)
{
int tmp = i;
query = query.Where(q => (q.f_id1 == tmp || q.f_id2 == tmp || q.f_id3 == tmp));
}
By putting it in a new variable which is re-created each loop, the variable should not be changed before you execute the IQueryable.
You dont need a for each, try it like this:
query = objDataContext.Myclass.Where(q => (arr1.Contains(q.f_id1) || arr1.Contains(q.f_id2) || arr1.Contains(q.f_id3));
this is because "i" is not evaluated until you really use the iterate the query collectionif not by that time I believe "i" will be the last.

Categories