Are the comments below correct about deferred execution?
1. var x = dc.myTables.Select(r=>r);//yes
2. var x = dc.myTables.Where(..).Select(r=>new {..});//yes
3. var x = dc.myTables.Where(..).Select(r=>new MyCustomClass {..});//no
In other words, I always thought that projecting into a custom class would always cause eager execution, but I couldn't find references supporting or denying it (and I am seeing results that contradict it, hence the post).
Every statement in your question is an example of deferred execution. The contents of the Select and Where calls have no effect on whether the resulting value is deferred. The Select and Where methods themselves dictate that.
As a counter example consider the Sum method. This is always eagerly executed irrespective of what the input is.
var sum = dc.myTables.Sum(...); // Always eager
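The difference can be observed with an in-memory list instead of a database table. The following is a minimal sketch, assuming LINQ to Objects; the `DeferredVsEager` class, the `Twice` helper and the `Calls` counter are hypothetical names introduced only to count selector invocations.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredVsEager
{
    // Number of times the selector has run (hypothetical instrumentation).
    public static int Calls;

    public static int Twice(int n)
    {
        Calls++;
        return n * 2;
    }

    public static void Main()
    {
        var numbers = new List<int> { 1, 2, 3 };

        // Select only builds the query; the selector has not run yet.
        IEnumerable<int> query = numbers.Select(n => Twice(n));
        Console.WriteLine(Calls); // 0

        // Sum must produce a single value, so it enumerates immediately.
        int sum = numbers.Sum(n => Twice(n));
        Console.WriteLine(Calls); // 3
        Console.WriteLine(sum);   // 12

        // Enumerating the deferred query runs the selector now.
        int[] doubled = query.ToArray();
        Console.WriteLine(doubled.Length); // 3
        Console.WriteLine(Calls);          // 6
    }
}
```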
To prove your point, your test should look like this:
var tracer = string.Empty;
Func<inType, outType> map = r => {
tracer = "set";
return new outType(...);
};
var x = dc.myTables.Where(..).Select(map);
// this confirms x was never enumerated; otherwise tracer would be "set".
Assert.AreEqual(string.Empty, tracer);
// confirm that it would have enumerated if it could
CollectionAssert.IsNotEmpty(x);
It has been my observation that the only way to force execution right away is to force iteration of the collection. I do this by calling .ToArray() on the LINQ query.
Generally methods that return a sequence use deferred execution:
IEnumerable<X> ---> Select ---> IEnumerable<Y>
and methods that return a single object don't:
IEnumerable<X> ---> First ---> Y
So, methods like Where, Select, Take, Skip, GroupBy and OrderBy use deferred execution because they can, while methods like First, Single, ToList and ToArray don't because they can't.
.Select(...) is always deferred.
When you're working with IQueryable<T>, this and other deferred execution methods build up an expression tree and this isn't ever compiled into an actual executable expression until it's iterated. That is, you need to:
Do a for-each on the projected enumerable.
Call a method that internally enumerates the enumerable (e.g. .Any(...), .Count(...), .ToList(...), ...).
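This can be tried without a database by wrapping a list with AsQueryable(). The sketch below assumes the in-memory queryable provider rather than a real data context; `QueryableDemo` and `BuildQuery` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class QueryableDemo
{
    public static IQueryable<int> BuildQuery(List<int> source)
    {
        // Only an expression tree is built here; nothing is evaluated.
        return source.AsQueryable()
                     .Where(n => n > 1)
                     .Select(n => n * 10);
    }

    public static void Main()
    {
        var source = new List<int> { 1, 2, 3 };
        IQueryable<int> query = BuildQuery(source);

        // The query is still just a description of the work (a method call node).
        Console.WriteLine(query.Expression.NodeType); // Call

        // Added after the query was defined, but before it runs.
        source.Add(4);

        // ToList enumerates, so the query executes here and sees the new item.
        List<int> result = query.ToList();
        Console.WriteLine(string.Join(", ", result)); // 20, 30, 40
    }
}
```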
I have a linq query which returns a task object and stores it in an IEnumerable. For some reason the select query keeps enumerating until the task is started or finished (I think, it's hard to debug).
The query is pretty straightforward:
Context.RetrieveDataTasks = retrievableProducts.Select(product => Context.HostController.RetrieveProductDataFiles(product));
Where the signature for RetrieveProductDataFiles is :
public Task RetrieveProductDataFiles(IProduct product)
The retrievableProducts is in this case a list of 1 product:
var retrievableProducts = products
.Where(product => AFancyButIrrelevantClause)
.ToList();
I don't mind to rewrite the code to a foreach loop where I fill a new list manually to avoid this problem, but I'd like to understand why the select query keeps executing. I think it has something to do with the task which is waiting for activation, but I have no idea why that would cause this.
Edit:
Just to be complete, I'd expect that above code works exactly the same as :
var retrievableDataTasks = new List<Task>();
foreach (var product in retrievableProducts)
{
retrievableDataTasks.Add(Context.HostController.RetrieveProductDataFiles(product));
}
Context.RetrieveDataTasks = retrievableDataTasks;
The construction with a foreach does exactly what I expect: it populates a list with tasks (in this specific case a list of 1 task) and this task is executed once. In the construction with the Select query, however, that same task is started over and over again.
I hope it's clear enough with the code I provided. I'm looking forward to learning why the Select query behaves differently (and, if possible, how to prevent it).
The correct answer to the posted question is by "bas" himself. Every time you enumerate the IEnumerable, it re-evaluates the expression inside Select and thus starts the tasks again. ToList would actually fix the problem because it evaluates the expression once and stores the resulting tasks, so later enumerations read the list instead of re-running the selector.
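The re-evaluation is easy to reproduce in isolation. The following is a minimal sketch; `StartWork` is a hypothetical stand-in for RetrieveProductDataFiles, and the `Started` counter exists only to show how often the selector runs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ReEnumeration
{
    public static int Started;

    // Hypothetical stand-in for RetrieveProductDataFiles.
    public static Task StartWork(string product)
    {
        Started++;
        return Task.FromResult(0);
    }

    public static void Main()
    {
        var products = new List<string> { "A" };

        // Deferred: every enumeration runs the selector again.
        IEnumerable<Task> lazy = products.Select(p => StartWork(p));
        Task.WaitAll(lazy.ToArray());
        Task.WaitAll(lazy.ToArray());
        Console.WriteLine(Started); // 2

        // Materialized: the selector runs once, the list is reused afterwards.
        Started = 0;
        List<Task> once = products.Select(p => StartWork(p)).ToList();
        Task.WaitAll(once.ToArray());
        Task.WaitAll(once.ToArray());
        Console.WriteLine(Started); // 1
    }
}
```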
Using 'ToList' forces the iterator to iterate through all the collection, even though you think you said 'simply give me the first two items in the collection'. If that said collection has 1000 elements, you'll iterate on that collection until you've reached the last item, and it'll still give you 2 elements.
You consume an iterator method by using a foreach statement or LINQ query. Each iteration of the foreach loop calls the iterator method. When a yield return statement is reached in the iterator method, expression is returned, and the current location in code is retained. Execution is restarted from that location the next time that the iterator function is called.
Your method that instantiates a list and adds to it could be improved by using yield return, so that it does not allocate data that doesn't need to be allocated. LINQ methods are lazily evaluated, which means that there won't be any memory allocation for data until you try to materialize the results (ToList for instance). While you're in your LINQ method, the only memory usage you get is for the current iteration, not for everything that's found in your collection.
Consider the following code snippet.
private static IEnumerable<Product> GetMyProducts(IEnumerable<Product> products, bool AFancyButIrrelevantClause)
{
foreach(var product in products)
{
if(AFancyButIrrelevantClause)
yield return product;
}
}
or directly in LINQ to be more concise:
products.Where(product => AFancyButIrrelevantClause)
I need to call the method for each item in the list. So, I have used Where<> query as follows,
List<string> list = new List<string>();
list.Add("Name1");
list.Add("Name2");
list.Add("Name3");
var name = list.Where(n =>
{
return CheckName(n);
});
But in the above case, CheckName() is not hit. The same method is triggered if I use FirstOrDefault<>. I don't know whether it is a framework break or I am going in a wrong way.
As additional info, I am using .NET Framework 4.5.
Has anyone experienced this error? If so, is there any solution to overcome this issue?
You are misunderstanding the result of the Where call. As LINQ uses deferred execution, it will only evaluate the Where condition when the query is materialized (by ToList, FirstOrDefault, Sum and the like).
The Where is never actually materialized in your current code (it was when you used FirstOrDefault), so it never enters the CheckName method. Also, Where never returns null; in the worst case it returns an empty collection, so comparing its result against null always yields true.
If you debug you will see that name equals true at the end of this. To "overcome" this depends on what is your desired output:
If you want to know if you have any item that matched the predicate then:
var result = list.Any(CheckName);
If you want to retrieve those that match the predicate:
var result = list.Where(CheckName);
If later you want to query and check if results contains anything then:
if(result.Any()) { /* ... */ }
If you only want the results (and thus materializing the query):
list.Where(CheckName).ToList();
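The options above differ in when, and how often, the predicate runs. The sketch below counts the calls; the body of `CheckName` is an assumption (the question does not show it), and the `Calls` counter is hypothetical instrumentation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WhereDemo
{
    public static int Calls;

    // Hypothetical predicate; the question does not show CheckName's body.
    public static bool CheckName(string name)
    {
        Calls++;
        return name == "Name2";
    }

    public static void Main()
    {
        var list = new List<string> { "Name1", "Name2", "Name3" };

        // Building the query does not call the predicate.
        IEnumerable<string> query = list.Where(n => CheckName(n));
        Console.WriteLine(Calls); // 0

        // FirstOrDefault enumerates, but stops at the first match.
        string first = query.FirstOrDefault();
        Console.WriteLine(first + " after " + Calls + " calls"); // Name2 after 2 calls

        // ToList enumerates the whole source.
        Calls = 0;
        List<string> all = query.ToList();
        Console.WriteLine(all.Count + " match after " + Calls + " calls"); // 1 match after 3 calls
    }
}
```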
Read more about LINQ's deferred execution here:
Linq and deffered execution
What are the benefits of a Deferred Execution in LINQ?
Just as a side note see how you can change your current code from:
var name = list.Where(n =>
{
return CheckName(n);
})
To:
var name = list.Where(n => CheckName(n));
And eventually to:
var name = list.Where(CheckName);
LINQ follows the principle of deferred execution, which means the query is not executed until you actually enumerate the name variable. If you want to execute it immediately, add (for example) .ToList() at the end. FirstOrDefault has the same effect: it executes immediately instead of deferring.
var name = list.Where(n =>
{
return CheckName(n);
}).ToList() != null;
Also where condition result will never be null. Even if there is no object in list satisfying your condition(s) in CheckName, where will return an empty collection.
The CheckName() method is not executed because of Deferred execution of Linq. The actual statement is not executed till you actually access it. So in your case, for the CheckName(), you should do something like:
var name = list.Where(n =>
{
return CheckName(n);
}).ToList();
When you look at a simplified version of the Where method's source code, you can easily see why:
internal static IEnumerable<T> Where<T>(this IEnumerable<T> enumerable, Func<T, bool> where) {
foreach (T t in enumerable) {
if (where(t)) {
yield return t;
}
}
}
The yield will cause the execution to only happen once the returned IEnumerable<T> is actually accessed. That is what is called deferred execution.
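To see that an iterator body does not start until the first MoveNext call, a flag can be set on its first line. This is a minimal sketch; `EvenOnly` and `BodyStarted` are hypothetical names, not part of the framework.

```csharp
using System;
using System.Collections.Generic;

public static class IteratorDemo
{
    public static bool BodyStarted;

    // Hypothetical hand-written filter in the style of the method above.
    public static IEnumerable<int> EvenOnly(IEnumerable<int> source)
    {
        BodyStarted = true; // runs on the first MoveNext, not on the call
        foreach (int n in source)
        {
            if (n % 2 == 0)
            {
                yield return n;
            }
        }
    }

    public static void Main()
    {
        // Calling the iterator method only creates the enumerable.
        IEnumerable<int> evens = EvenOnly(new[] { 1, 2, 3, 4 });
        Console.WriteLine(BodyStarted); // False

        using (IEnumerator<int> e = evens.GetEnumerator())
        {
            Console.WriteLine(BodyStarted); // False
            e.MoveNext();                   // the body runs up to the first yield
            Console.WriteLine(BodyStarted); // True
            Console.WriteLine(e.Current);   // 2
        }
    }
}
```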
If you need to call a method for each item in a list then you should use a simple foreach loop:
foreach (var name in list)
    CheckName(name);
Just because LINQ is available doesn't mean it should be used everywhere there is a collection. It is important to write code that makes sense and is self-commenting, and using LINQ here has simultaneously introduced a flaw into your logic and made your code harder to read, understand and maintain. It's the wrong tool for the stated purpose.
Doubtless you have additional requirements not stated here, like "I want to check every name in a list and make sure that none are null". You can and possibly should use linq for this but it looks more like
bool allNamesOK = list.All(n => n != null);
This code is compact and reads well; we can clearly see the intention (though I wouldn't call the list "list"; "names" would be better).
I was looking into Enumerable.ToLookup API which converts an enumerable sequence into a dictionary type data structure. More details can be found here:
https://msdn.microsoft.com/en-us/library/system.linq.enumerable.tolookup(v=vs.110).aspx
The only difference from the ToDictionary API is that it won't give an error if the key selector produces duplicate keys. I need a comparison of the deferred execution semantics of these two APIs. As far as I know, ToDictionary results in immediate execution of the sequence, i.e. it doesn't follow the deferred execution semantics of LINQ queries. Can anyone help me with the deferred execution behavior of ToLookup? Is it the same as ToDictionary, or is there some difference?
Easy enough to test...
void Main()
{
var lookup = Inf().ToLookup(i => i / 100);
Console.WriteLine("if you see this, ToLookup is deferred"); //never happens
}
IEnumerable<int> Inf()
{
unchecked
{
for(var i=0;;i++)
{
yield return i;
}
}
}
To recap, ToLookup greedily consumes the source sequence without deferring.
In contrast, the GroupBy operator is deferred, so you can write the following to no ill-effect:
var groups = Inf().GroupBy(i => i / 100); // returns immediately; nothing is enumerated yet
However, GroupBy is greedy, so when you enumerate, the entire source sequence is consumed.
This means that
groups.SelectMany(g=>g).First();
also fails to complete.
When you think about the problem of grouping, it quickly becomes apparent that when separating a sequence into a sequence of groups, it would be impossible to know if even just one of the groups were complete without completely consuming the entire sequence.
This was sort of covered here, but it was hard to find!
In short -- ToLookup does not defer execution!
ToLookup() -> immediate execution
GroupBy() (and other query methods) -> deferred execution
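The same comparison can be made with a finite source, so that the program terminates and the number of consumed items can be printed. This is a sketch under that assumption; `Numbers` and the `Pulled` counter are hypothetical replacements for the infinite Inf() above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupingDemo
{
    public static int Pulled;

    // Finite, instrumented source; counts every item handed out.
    public static IEnumerable<int> Numbers()
    {
        for (int i = 0; i < 5; i++)
        {
            Pulled++;
            yield return i;
        }
    }

    public static void Main()
    {
        // GroupBy is deferred: nothing is pulled from the source yet.
        var groups = Numbers().GroupBy(i => i % 2);
        Console.WriteLine(Pulled); // 0

        // Asking for even one group consumes the whole source.
        var firstGroup = groups.First();
        Console.WriteLine(firstGroup.Key); // 0
        Console.WriteLine(Pulled);         // 5

        // ToLookup is immediate: the source is consumed by the call itself.
        Pulled = 0;
        var lookup = Numbers().ToLookup(i => i % 2);
        Console.WriteLine(Pulled);            // 5
        Console.WriteLine(lookup[0].Count()); // 3
    }
}
```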
If you look at the reference implementation source code for both the Enumerable.ToDictionary() and the Enumerable.ToLookup() methods, you will see that both end up executing a foreach loop over the source enumerable. That's one way to confirm that the execution of the source enumerable is not deferred in both cases.
But I mean, the answer is pretty self evident in that if you start off with an enumerable, and the return value of the function is no longer an enumerable, then clearly, it must have been executed (consumed), no?
(That last paragraph was not accurate, as pointed out by @spender in the comments.)
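The duplicate-key difference mentioned in the question can be checked the same way. The sketch below uses hypothetical sample words and a hypothetical helper `ToDictionaryThrows`; both calls consume the source immediately, but only ToDictionary rejects a repeated key.

```csharp
using System;
using System.Linq;

public static class DuplicateKeys
{
    // Returns true if ToDictionary rejects the input because of a duplicate key.
    public static bool ToDictionaryThrows(string[] words)
    {
        try
        {
            words.ToDictionary(w => w[0]);
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }

    public static void Main()
    {
        // "apple" and "avocado" share the key 'a'.
        var words = new[] { "apple", "avocado", "banana" };

        Console.WriteLine(ToDictionaryThrows(words)); // True

        // ToLookup groups the duplicates under one key instead of throwing.
        ILookup<char, string> lookup = words.ToLookup(w => w[0]);
        Console.WriteLine(lookup['a'].Count()); // 2

        // A missing key yields an empty sequence rather than an exception.
        Console.WriteLine(lookup['z'].Count()); // 0
    }
}
```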
Will LINQ use deferred execution when we map the result to an object?
var x = from rcrd in DBContext.FooTable
select new Model.BOFoo()
{ Bar = rcrd.Bar };
The above code maps rcrd to a Model.BOFoo object. Will this mapping cause LINQ to fetch the actual data from the database? Or will it wait until I call x.ToList()?
I'd answer yes. Unless I'm missing something, LINQ will still use deferred execution even if we map the result to an object. The object initialization will also be deferred.
Unless your Linq query includes a method that executes the query, then no, it will not execute and will be deferred.
Examples of methods that execute the query include First(), FirstOrDefault(), ToList(), ToArray(), etc.
select is not such a method, not even select new.
I always find it useful to think of the code as it would be in a method chain:
DBContext.FooTable
.Select(rcrd=>new Model.BOFoo{Bar=rcrd.Bar});
Here it is easier to visualize the deferred execution as you are only passing a function into the Select that will be evaluated as needed. The new is merely a part of that function.
So, as har07 and Erik already mentioned, if it is a deferred execution method, then it will remain that way unless forced via another method such as ToList
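A constructor counter makes the deferred object creation visible. This is a minimal sketch that assumes LINQ to Objects over a list instead of a database table (a database provider would translate the projection rather than run it in memory); the `BOFoo` class below is a hypothetical stand-in for Model.BOFoo.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for Model.BOFoo that counts constructions.
public class BOFoo
{
    public static int Created;

    public BOFoo()
    {
        Created++;
    }

    public string Bar { get; set; }
}

public static class ProjectionDemo
{
    public static void Main()
    {
        var records = new List<string> { "a", "b" };

        // The projection is only a description; no BOFoo exists yet.
        var x = from rcrd in records
                select new BOFoo { Bar = rcrd };
        Console.WriteLine(BOFoo.Created); // 0

        // ToList enumerates the query and runs the object initializers.
        List<BOFoo> list = x.ToList();
        Console.WriteLine(list.Count);    // 2
        Console.WriteLine(BOFoo.Created); // 2
    }
}
```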
I have following code:
IEnumerable<TreeItem> rootTreeItems = BuildRootTreeItems();
BuildTreeView(rootTreeItems.ElementAt(0));
private static void BuildTreeView(TreeItem treeItem)
{
TreeItem subMenuTreeItem = new TreeItem();
subMenuTreeItem.Header = "1";
TreeItem subMenuTreeItem2 = new TreeItem();
subMenuTreeItem2.Header = "2";
treeItem.TreeItems.Add(subMenuTreeItem);
treeItem.TreeItems.Add(subMenuTreeItem2);
}
The weird thing is after the BuildTreeView returns, the first element of rootTreeItems doesn't have any children nodes, while it really has when debugging into the BuildTreeView method.
This problem has confused me for quite a long time. Does anyone have any idea? Thanks so much.
You're most likely hitting a deferred execution issue with IEnumerable<>. The thing to remember is that your IEnumerable<TreeItem> rootTreeItems is not a list, instead it is a promise to get a list each and every time it is asked to do so.
So, if BuildRootTreeItems() creates the IEnumerable<TreeItem> using a LINQ query and it doesn't force the execution of the query using .ToList() or .ToArray() then each time that you use rootTreeItems you are re-executing the query inside BuildRootTreeItems()
Calling rootTreeItems.ElementAt(0) will cause the query to execute. If later you try to call rootTreeItems.ElementAt(0) again then you are re-executing the query and getting back a different instance of the first TreeItem.
Try changing the first line like so:
IEnumerable<TreeItem> rootTreeItems = BuildRootTreeItems().ToArray();
This forces the execution and will prevent re-execution later. I'll bet your problem goes away.
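The effect can be reproduced with a small stand-in. In the sketch below, the `TreeItem` class and the body of `BuildRootTreeItems` are assumptions (the question does not show them); the method is written to return a lazy Select query, which is the situation described above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal TreeItem; the real class is not shown in the question.
public class TreeItem
{
    public string Header { get; set; }
    public List<TreeItem> TreeItems { get; } = new List<TreeItem>();
}

public static class ElementAtDemo
{
    // Assumed implementation: returns a lazy query, not a stored collection.
    public static IEnumerable<TreeItem> BuildRootTreeItems()
    {
        return new[] { "root" }.Select(h => new TreeItem { Header = h });
    }

    public static void Main()
    {
        // Lazy: each ElementAt(0) re-runs the query and creates a new TreeItem.
        IEnumerable<TreeItem> lazy = BuildRootTreeItems();
        lazy.ElementAt(0).TreeItems.Add(new TreeItem { Header = "1" });
        Console.WriteLine(lazy.ElementAt(0).TreeItems.Count); // 0

        // Materialized: ElementAt(0) returns the same stored instance each time.
        IEnumerable<TreeItem> stored = BuildRootTreeItems().ToArray();
        stored.ElementAt(0).TreeItems.Add(new TreeItem { Header = "1" });
        Console.WriteLine(stored.ElementAt(0).TreeItems.Count); // 1
    }
}
```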
It is possible that your BuildRootTreeItems() method returns an IEnumerable whose elements are created via yield statements or, as Gabe mentioned in a comment above, an IEnumerable created by a chain of LINQ expressions containing a Select call.
This can lead to recreating elements of IEnumerable on each access to any element from enumeration or iterating via it using Linq expressions or foreach statement.
I would go simpler:
This also could happen if
A) TreeItem is a value type(struct)
B) TreeItem.TreeItems returns a new collection
But whether this is the cause is difficult to deduce from just the code provided.