Why does this linq extension method hit the database twice? - c#

I have an extension method called ToListIfNotNullOrEmpty(), which is hitting the DB twice, instead of once. The first time it returns one result, the second time it returns all the correct results.
I'm pretty sure the first time it hits the database, is when the .Any() method is getting called.
here's the code.
public static IList<T> ToListIfNotNullOrEmpty<T>(this IEnumerable<T> value)
{
if (value.IsNullOrEmpty())
{
return null;
}
if (value is IList<T>)
{
return (value as IList<T>);
}
return new List<T>(value);
}
public static bool IsNullOrEmpty<T>(this IEnumerable<T> value)
{
if (value != null)
{
return !value.Any();
}
return true;
}
I'm hoping to refactor it so that, before the .Any() method is called, it actually enumerates through the entire list.
If i do the following, only one DB call is made, because the list is already enumerated.
var pewPew = (from x in whatever
select x)
.ToList() // This enumerates.
.ToListIsNotNullOrEmpty(); // This checks the enumerated result.
I sorta don't really want to call ToList() then my extension method.
Any ideas, folks?

I confess that I see little point in this method. Surely if you simply do a ToList(), a check to see if the list is empty suffices as well. It's arguably harder to handle the null result when you expect a list because then you always have to check for null before you iterate over it.
I think that:
var query = (from ...).ToList();
if (query.Count == 0) {
...
}
works as well and is less burdensome than
var query = (from ...).ToListIfNotNullOrEmpty();
if (query == null) {
...
}
and you don't have to implement (and maintain) any code.

How about something like this?
public static IList<T> ToListIfNotNullOrEmpty<T>(this IEnumerable<T> value)
{
if (value == null)
return null;
var list = value.ToList();
return (list.Count > 0) ? list : null;
}

To actually answer your question:
This method hits the database twice because the extension methods provided by the System.Linq.Enumerable class exhibit what is called deferred execution. Essentially, this is to eliminate the need for constructing a string of temporarily cached collections for every part of a query. To understand this, consider the following example:
var firstMaleTom = people
.Where(p => p.Gender = Gender.Male)
.Where(p => p.FirstName == "Tom")
.FirstOrDefault();
Without deferred execution, the above code might require that the entire collection people be enumerated over, populating a temporary buffer array with all the individuals whose Gender is Male. Then it would need to be enumerated over again, populating another buffer array with all of the individuals from the first buffer whose first name is Tom. After all that work, the last part would return the first item from the resulting array.
That's a lot of pointless work. The idea with deferred execution is that the above code really just sets up the firstMaleTom variable with the information it needs to return what's being requested with the minimal amount of work.
Now, there's a flip side to this: in the case of querying a database, deferred execution means that the database gets queried when the return value is evaluated. So, in your IsNullOrEmpty method, when you call Any, the value parameter is actually being evaluated right then and there -- hence a database query. After this, in your ToListIfNotNullOrEmpty method, the line return new List<T>(value) also evaluates the value parameter -- because it's enumerating over the values and adding them to the newly created List<T>.

You could stick the .ToList() call inside the extension, the effect is slightly different, but does this still work in the cases you have?
public static IList<T> ToListIfNotNullOrEmpty<T>(this IEnumerable<T> value)
{
if(value == null)
{
return null;
}
var result = value.ToList();
return result.IsNullOrEmpty() ? null : result;
}

Related

How to pass a predicate as parameter c#

How can I pass a predicate into a method but also have it work if no predicate is passed? I thought maybe something like this, but it doesn't seem to be correct.
private bool NoFilter() { return true; }
private List<thing> GetItems(Predicate<thing> filter = new Predicate<thing>(NoFilter))
{
return rawList.Where(filter).ToList();
}
private List<thing> GetItems(Func<thing, bool> filter = null)
{
return rawList.Where(filter ?? (s => true)).ToList();
}
In this expression s => true is the fallback filter which is evaluated if the argument filter is null. It just takes each entry of the list (as s) and returns true.
There are two parts to this.
First, you need to adjust the NoFilter() function to be compatible with Predicate<T>. Notice the latter is generic, but the former is not. Make NoFilter() look like this:
private bool NoFilter<T>(T item) { return true; }
I know you never use the generic type argument, but it's necessary to make this compatible with your predicate signature.
For fun, you could also define NoFilter this way:
private Predicate<T> NoFilter = x => true;
Now the second part: we can look at using the new generic method as the default argument for GetItems(). The trick here is you can only use constants. While NoFilter() will never change, from the compiler's view that's not quite the same things a a formal constant. In fact, there is only one possible constant you can use for this: null. That means your method signature must look like this:
private List<thing> GetItems(Predicate<thing> filter = null)
Then you can check for null at the beginning of your function and replace it with NoFilter:
private List<thing> GetItems(Predicate<thing> filter = null)
{
if (filter == null) filter = NoFilter;
return rawList.Where(filter).ToList();
}
And if you also do want to explicitly pass this to the method when calling it, that would look like this:
var result = GetItems(NoFilter);
That should fully answer the original question, but I don't want to stop here. Let's look deeper.
Since you need the if condition anyway now, at this point I would tend to remove the NoFilter<T>() method entirely, and just do this:
private IEnumerable<thing> GetItems(Predicate<thing> filter = null)
{
if (filter == null) return rawList;
return rawList.Where(filter);
}
Note that I also changed the return type and removed the ToList() call at the end. If you find yourself calling ToList() at the end of a function just to match a List<T> return type, it's almost always much better to change the method signature to return IEnumerable<T> instead. If you really need a list (and usually, you don't), you can always call ToList() after calling the function.
This change makes your method more useful, by giving you a more abstract type that will be more compatible with other interfaces, and it potentially sets you up for a significant performance bump, both in terms of lowered memory use and in terms of lazy evaluation.
One final addition here is, if you do pare down to just IEnumerable, we can see now this method does not really provide much value at all beyond the base rawItems field. You might look at converting to a property, like this:
public IEnumerable<T> Items {get {return rawList;}}
This still allows the consumer of your type use a predicate (or not) if they want via the existing .Where() method, while also continuing to hide the underlying raw data (you can't directly just call .Add() etc on this).

Is returning a null generic list preferred when no data is found?

I have client code that is virtually identical in four different methods (the difference being the particular Web API RESTful method being called and the corresponding manipulated generic list).
In three of the four cases, I can break out of the while loop (see How can I safely loop until there is nothing more to do without using a "placeholder" while conditon?) like so:
if (arr.Count <= 0) break;
...but in one case, that causes an NRE once there is no more data returned from the RESTful method. In that method, I have to use:
if (null == arr) break;
I now know why, thus this:
UPDATE
The reason for the different behavior was because the Repository code differs. Therefore, I am changing the question from "Why would checking JArray.Count work in most instances, but cause an NRE in one specific case?"
Here is how it's done in the three methods where checking for array count works:
public IEnumerable<Subdepartment> Get(int ID, int CountToFetch)
{
return subdepartments.Where(i => i.Id > ID).Take(CountToFetch);
}
...and here's the "alternate version" contained in RedemptionRepository:
public IEnumerable<Redemption> Get(int ID, int CountToFetch)
{
IEnumerable<Redemption> redempts = null;
if (redemptions.Where(i => i.Id > ID).Take(CountToFetch).Count() > 0)
{
redempts = redemptions.Where(i => i.Id > ID).Take(CountToFetch);
}
return redempts;
}
So, to be consistent with all four methods, I can either make all the other Repository methods like the above (returning null when no data is found), and change the test condition in the client to nullification, OR I can revert the Redemption repository code to be like it was formerly/like the others.
So the question: Which is the preferred method (no pun intended)?
You should definitely change the last method to match the previous ones:
public IEnumerable<Redemption> Get(int ID, int CountToFetch)
{
return redemptions.Where(i => i.Id > ID).Take(CountToFetch);
}
And NullReferenceException is not the only reason. Because LINQ is lazy and execution is deferred the other approach executes the query twice! Once to get Count() and second one to get actual collection of results. If you really want to return null instead of empty collection use should get following:
public IEnumerable<Redemption> Get(int ID, int CountToFetch)
{
var redempts = redemptions.Where(i => i.Id > ID).Take(CountToFetch).ToList();
if (redemptions.Any())
{
return redempts;
}
return null;
}
You should return an empty collection.
All Linq methods (that I am aware of) that have an IEnumerable-based return type return empty collections rather than null. Returning null from a method prevents you from chaining method calls since you now need to check for null to avoid an NullReferenceExcpetion (as you've discovered).

Calling FirstOrDefault multiple times in LINQ to set object properties

I have a LINQ query like so:
var Item = (from s in contextD.Items where s.Id == Id select s).ToList();
Further down the Item object properties are set like so:
Item.FirstOrDefault().Qty = qtyToUpdate;
Item.FirstOrDefault().TotalPrice = UItem.FirstOrDefault().Qty * UItem.FirstOrDefault().Price;
...
My question is, will calling FirstOrDefault always loop through the result set returned by the query?
Shouldn't a single call be made and put in an object like so:
MyObject objMyObject = new MyObject;
objMyObject = Item.FirstOrDefault();
and then go about setting objMyObject properties.
The first part using FirstOrDefault is actually in the production, I am trying to find out if that's the right way.
Regards.
will calling FirstOrDefault always loop through the result set
returned by the query?
FirstOrDefault() never loops through all result set - it either returns first item, or default if set is empty. Also in this particular case even enumerator will not be created - thus you are calling FirstOrDefault() on variable of List<T> type, simply item at index 0 will be returned (if list is not empty). If you will investigate Enumerable.FirstOrDefault() implementation:
IList<TSource> list = source as IList<TSource>;
if (list != null)
{
if (list.Count > 0)
{
return list[0];
}
}
But this will be invoked each time you are calling FirstOrDefault().
Also you are missing 'default' case. If you are chaining methods like in your first sample, you can get NullReferenceException if list is empty. So, make sure something was returned by query:
var item = Item.FirstOrDefault();
if (item == null)
return; // or throw
item.Qty = qtyToUpdate;
var uitem = UItem.FirstOrDefault();
if (uitem == null)
return; // or throw
item.TotalPrice = uitem.Qty * uitem.Price;
One more note - you have little difference in performance if you are performing FirstOrDefault() on in-memory collection. But difference will be huge if you will perform it without saving query results into list. In that case each FirstOrDefault() call will cause new database query.

ToList method in Linq

If I am not wrong, the ToList() method iterate on each element of provided collection and add them to new instance of List and return this instance.Suppose an example
//using linq
list = Students.Where(s => s.Name == "ABC").ToList();
//traditional way
foreach (var student in Students)
{
if (student.Name == "ABC")
list.Add(student);
}
I think the traditional way is faster, as it loops only once, where as of above of Linq iterates twice once for Where method and then for ToList() method.
The project I am working on now has extensive use of Lists all over and I see there is alot of such kind of use of ToList() and other Methods that can be made better like above if I take list variable as IEnumerable and remove .ToList() and use it further as IEnumerable.
Do these things make any impact on performance?
Do these things make any impact on performance?
That depends on your code. Most of the time, using LINQ does cause a small performance hit. In some cases, this hit can be significant for you, but you should avoid LINQ only when you know that it is too slow for you (i.e. if profiling your code showed that LINQ is reason why your code is slow).
But you're right that using ToList() too often can cause significant performance problems. You should call ToList() only when you have to. Be aware that there are also cases where adding ToList() can improve performance a lot (e.g. when the collection is loaded from database every time it's iterated).
Regarding the number of iterations: it depends on what exactly do you mean by “iterates twice”. If you count the number of times MoveNext() is called on some collection, then yes, using Where() this way leads to iterating twice. The sequence of operations goes like this (to simplify, I'm going to assume that all items match the condition):
Where() is called, no iteration for now, Where() returns a special enumerable.
ToList() is called, calling MoveNext() on the enumerable returned from Where().
Where() now calls MoveNext() on the original collection and gets the value.
Where() calls your predicate, which returns true.
MoveNext() called from ToList() returns, ToList() gets the value and adds it to the list.
…
What this means is that if all n items in the original collection match the condition, MoveNext() will be called 2n times, n times from Where() and n times from ToList().
var list = Students.Where(s=>s.Name == "ABC");
This will only create a query and not loop the elements until the query is used. By calling ToList() will first then execute the query and thus only loop your elements once.
List<Student> studentList = new List<Student>();
var list = Students.Where(s=>s.Name == "ABC");
foreach(Student s in list)
{
studentList.add(s);
}
this example will also only iterate once. Because its only used once. Keep in mind that list will iterate all students everytime its called.. Not only just those whose names are ABC. Since its a query.
And for the later discussion Ive made a testexample. Perhaps its not the very best implementation of IEnumable but it does what its supposed to do.
First we have our list
public class TestList<T> : IEnumerable<T>
{
private TestEnumerator<T> _Enumerator;
public TestList()
{
_Enumerator = new TestEnumerator<T>();
}
public IEnumerator<T> GetEnumerator()
{
return _Enumerator;
}
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
throw new NotImplementedException();
}
internal void Add(T p)
{
_Enumerator.Add(p);
}
}
And since we want to count how many times MoveNext is called we have to implement our custom enumerator aswel. Observe in MoveNext we have a counter that is static in our program.
public class TestEnumerator : IEnumerator
{
public Item FirstItem = null;
public Item CurrentItem = null;
public TestEnumerator()
{
}
public T Current
{
get { return CurrentItem.Value; }
}
public void Dispose()
{
}
object System.Collections.IEnumerator.Current
{
get { throw new NotImplementedException(); }
}
public bool MoveNext()
{
Program.Counter++;
if (CurrentItem == null)
{
CurrentItem = FirstItem;
return true;
}
if (CurrentItem != null && CurrentItem.NextItem != null)
{
CurrentItem = CurrentItem.NextItem;
return true;
}
return false;
}
public void Reset()
{
CurrentItem = null;
}
internal void Add(T p)
{
if (FirstItem == null)
{
FirstItem = new Item<T>(p);
return;
}
Item<T> lastItem = FirstItem;
while (lastItem.NextItem != null)
{
lastItem = lastItem.NextItem;
}
lastItem.NextItem = new Item<T>(p);
}
}
And then we have a custom item that just wraps our value
public class Item<T>
{
public Item(T item)
{
Value = item;
}
public T Value;
public Item<T> NextItem;
}
To use the actual code we create a "list" with 3 entries.
public static int Counter = 0;
static void Main(string[] args)
{
TestList<int> list = new TestList<int>();
list.Add(1);
list.Add(2);
list.Add(3);
var v = list.Where(c => c == 2).ToList(); //will use movenext 4 times
var v = list.Where(c => true).ToList(); //will also use movenext 4 times
List<int> tmpList = new List<int>(); //And the loop in OP question
foreach(var i in list)
{
tmpList.Add(i);
} //Also 4 times.
}
And conclusion? How does it hit performance?
The MoveNext is called n+1 times in this case. Regardless of how many items we have.
And also the WhereClause does not matter, he will still run MoveNext 4 times. Because we always run our query on our initial list.
The only performance hit we will take is the actual LINQ framework and its calls. The actual loops made will be the same.
And before anyone asks why its N+1 times and not N times. Its because he returns false the last time when he is out of elements. Making it the number of elements + end of list.
To answer this completely, it depends on the implementation. If you are talking about LINQ to SQL/EF, there will be only one iteration in this case when .ToList is called, which internally calls .GetEnumerator. The query expression is then parsed into TSQL and passed to the database. The resulting rows are then iterated over (once) and added to the list.
In the case of LINQ to Objects, there is only one pass through the data as well. The use of yield return in the where clause sets up a state machine internally which keeps track of where the process is in the iteration. Where does NOT do a full iteration creating a temporary list and then passing those results to the rest of the query. It just determines if an item meets a criteria and only passes on those that match.
First of all, Why are you even asking me? Measure for yourself and see.
That said, Where, Select, OrderBy and the other LINQ IEnumerable extension methods, in general, are implemented as lazy as possible (the yield keyword is used often). That means that they do not work on the data unless they have to. From your example:
var list = Students.Where(s => s.Name == "ABC");
won't execute anything. This will return momentarily even if Students is a list of 10 million objects. The predicate won't be called at all until the result is actually requested somewhere, and that is practically what ToList() does: It says "Yes, the results - all of them - are required immediately".
There is however, some initial overhead in calling of the LINQ methods, so the traditional way will, in general, be faster, but composability and the ease-of-use of the LINQ methods, IMHO, more than compensate for that.
If you like to take a look at how these methods are implemented, they are available for reference from Microsoft Reference Sources.

Is this achievable with a single LINQ query?

Suppose I have a given object of type IEnumerable<string> which is the return value of method SomeMethod(), and which contains no repeated elements. I would like to be able to "zip" the following lines in a single LINQ query:
IEnumerable<string> someList = SomeMethod();
if (someList.Contains(givenString))
{
return (someList.Where(givenString));
}
else
{
return (someList);
}
Edit: I mistakenly used Single instead of First. Corrected now.
I know I can "zip" this by using the ternary operator, but that's just not the point. I would just list to be able to achieve this with a single line. Is that possible?
This will return items with given string or all items if given is not present in the list:
someList.Where(i => i == givenString || !someList.Contains(givenString))
The nature of your desired output requires that you either make two requests for the data, like you are now, or buffer the non-matches to return if no matches are found. The later would be especially useful in cases where actually getting the data is a relatively expensive call (eg: database query or WCF service). The buffering method would look like this:
static IEnumerable<T> AllIfNone<T>(this IEnumerable<T> source,
Func<T, bool> predicate)
{
//argument checking ignored for sample purposes
var buffer = new List<T>();
bool foundFirst = false;
foreach (var item in source)
{
if (predicate(item))
{
foundFirst = true;
yield return item;
}
else if (!foundFirst)
{
buffer.Add(item);
}
}
if (!foundFirst)
{
foreach (var item in buffer)
{
yield return item;
}
}
}
The laziness of this method is either that of Where or ToList depending on if the collection contains a match or not. If it does, you should get execution similar to Where. If not, you will get roughly the execution of calling ToList (with the overhead of all the failed filter checks) and iterating the result.
What is wrong with the ternary operator?
someList.Any(s => s == givenString) ? someList.Where(s => s == givenString) : someList;
It would be better to do the Where followed by the Any but I can't think of how to one-line that.
var reducedEnumerable = someList.Where(s => s == givenString);
return reducedEnumerable.Any() ? reducedEnumerable : someList;
It is not possible to change the return type on the method, which is what you're asking. The first condition returns a string and the second condition returns a collection of strings.
Just return the IEnumerable<string> collection, and call Single on the return value like this:
string test = ReturnCollectionOfStrings().Single(x => x == "test");

Categories