Deferred execution value unchanged - c#

Say I have IQueryable<Car> cars.
I iterate it such as:
foreach (Car c in cars)
{
c.Doors = 2;
}
Why does c.Doors contain the original value after the foreach, instead of the changed value?
Thank you in advance

You indicated that IQueryable is the result of a linq to entities query which is not a correct statement. MyDatabaseContext.Cars.Where(x => x.Name == "Test") returns a IQueryable which on iteration perfoms a query on the database. (Iteration is when you perform a foreach over it). So it doesnt contain the result set yet, just the query.
looping twice over cars generates 2 identical queries to the database and returns 2 identical result sets. If you want to preserve the data. you need to call ToArray or ToList or after manipulation, perform a save of the changes before you iterate again.

Iterating over an IQueryable retrieves the result set from the database as it goes, as I'm sure you know. What I've observed in this and other situations is that this result is not cached, so you'll often find that iterating over the IQueryable again will actually run the query again, thus your modifications aren't preserved because you're getting a new result set which was never affected by that code.
The reason it works when you call ToList() first and iterate over the result of that is because ToList() retrieves and materialises the entire result set, which is then no longer linked to the database content and is really just a copy of what the database returned.
The only way to be sure your changes will stick is to operate on the local copy of the data, i.e. lift it out of IQueryable land. This can be as simple as saving an IEnumerable of the result set (i.e. the result of ToEnumerable()), which will then return the same result set each time you enumerate it, but unlike ToList() it won't cause evaluation immediately.

Related

IQueryable being 3 times as slow with Contain-method vs Array

I have some records which I fetch from database (normally about 100-200). I also need to get the corresponding Place for every record and fill in the Description of the Place in the record. I would normally do that in my .Select function but I need to check if Place isn't null before trying to take the Description. My code goes like this:
var places = db.Places.Where(p => p.Active && p.CustomerID == cust_ID).ToArray();
foreach (var result in query)
result.Description =
places.Where(Place.Q.Contains(result.Latitude, result.Longitude).Compile())
.FirstOrDefault()?.Description;
query is IQueryable.
If I take places as IQueryable or IEnumerable and remove the Compile() from my Expression, my code runs 3x (!!!) as slow as when I run the code as shown here.
Does anyone have an explanation for that? Does places get fetched from database every loop of foreach?
(Edit as my first question was answered)
Also is there any way I could check if Place is null in my Select function (not taking the results into memory, keeping it IQueryable) so I don't have to loop over my results afterwards?

Entity Framework - Performance in count

I've a little question about performance with Entity Framework.
Something like
using (MyContext context = new MyContext())
{
Document DocObject = context.Document.Find(_id);
int GroupCount = context.Document.Where(w=>w.Group == DocObject.Group).ToList().Count();
}
takes about 2 seconds in my database (about 30k datasets), while this one
using (MyContext context = new MyContext())
{
Document DocObject = context.Document.Find(_id);
int GroupCount = context.Document.Where(w=>w.Group == DocObject.Group).Count();
}
takes 0,02 seconds.
When my filter for 10 documents had 20 seconds to wait, I checked my code, and changed this to not use ToList() before Count().
Any ideas why it needs 2 seconds for this line with the ToList()?
Calling ToList() then Count() will:
execute the whole SELECT FROM WHERE against your database
then materialize all the resulting entities as .Net objects
create a new List<T> object containing all the results
return the result of the Count property of the .Net list you just created
Calling Count() against an IQueryable will:
execute SELECT COUNT FROM WHERE against your database
return an Int32 with the number of rows
Obviously, if you're only interested in the number of items (not the items themselves), then you shouldn't ever call ToList() first, as it will require a lot of resources for nothing.
Yes, ToList() will evaluate the results (retrieving the objects from database), if you do not use ToList(), the objects arenĀ“t retrieved from the database.
Linq-To-Entities uses LazyLoading per default.
It works something like this;
When you query your underlying DB connection using Linq-To-Entities you will get a proxy object back on which you can perform a number of operations (count being one). This means that you do not get all the data from the DB at once but rather the objects are retrieved from the DB at the time of evaluation. One way of evaluating the object is by using ToList().
Maybe you should read this.
Because ToList() will query the database for the whole objects (will do a SELECT * so to say), and then you'll use Count() on the list in memory with all the records, whereas if you use Count() on the IQueryable (and not on the List), EF will translate it to a simple SELECT COUNT(*) SQL query
Your first query isnt fully transalted to sql - when you call .ToList().Count(), you are basically saying "download all, materialize it to POCO and call extension method named Count()" which, of course, take some time.
Your second query is, however, transalted to something like select count(*) from Documents where GroupId = #DocObjectGroup which is much faster to execute and you arent materializing anything, just simple scalar.
Using the extension method Enumerable.ToList() will construct a new List object from the IEnumerable<T> source collection which means that there is an associated cost with doing ToList().

Which methods use in Linq to force a retrieving data from database

When i use this query, i don't retrieve data from database:
var query = db.Table.Where(x => x.id_category.HasValue).Select(x => x.Id);
But if i use this, i'll retrieve data:
var dataRetrieved = query.ToList();
Which others methods, like ToList(), i can force a retrieving data?
LINQ works by building up a series of query commands, and waits until the last possible moment to execute them. In order to force execution, you have to ask the query for "real" data, not just a description of the steps to get it.
According to this MSDN article (emphasis mine):
LINQ queries are always executed when the query variable is iterated over, not when the query variable is created....
To force immediate execution of a query that does not produce a singleton value, you can call the ToList method, the ToDictionary method, or the ToArray method on a query or query variable....
You could also force execution by putting the foreach or For Each loop immediately after the query expression, but by calling ToList or ToArray you cache all the data in a single collection object.
The select returns an IEnumerable, which is a sequence of values to invoke a transform function on. So it depends on you what transformation you want? you want list or array? you can also traverse through and manipulate each piece independently using a foreach loop. Following page will give you an idea of what you can do with your values.
Besides what BJ Myers and Ebad Massod already explained, there are also operators which return single value and that are executed immediately, such as:
First, FirstOrDefault, Last, LastOrDefault, Single, SingleOrDefault
Also, LINQ Aggregate Functions return a single value and are executed immediately, such as:
Average, Count, Max, Min, Sum
Here is a nice MSDN article: Classification of Standard Query Operators by Manner of Execution

Does foreach execute the query only once?

I have a list of items and a LINQ query over them. Now, with LINQ's deferred execution, would a subsequent foreach loop execute the query only once or for each turn in the loop?
Given this example (Taken from Introduction to LINQ Queries (C#), on MSDN)
// The Three Parts of a LINQ Query:
// 1. Data source.
int[] numbers = new int[7] { 0, 1, 2, 3, 4, 5, 6 };
// 2. Query creation.
// numQuery is an IEnumerable<int>
var numQuery =
from num in numbers
where (num % 2) == 0
select num;
// 3. Query execution.
foreach (int num in numQuery)
{
Console.Write("{0,1} ", num);
}
Or, in other words, would there be any difference if I had:
foreach (int num in numQuery.ToList())
And, would it matter, if the underlying data is not in an array, but in a Database?
Now, with LINQ's deferred execution, would a subsequent foreach loop execute the query only once or for each turn in the loop?
Yes, once for the loop. Actually, it may execute the query less than once - you could abort the looping part way through and the (num % 2) == 0 test wouldn't be performed on any remaining items.
Or, in other words, would there be any difference if I had:
foreach (int num in numQuery.ToList())
Two differences:
In the case above, ToList() wastes time and memory, because it first does the same thing as the initial foreach, builds a list from it, and then foreachs that list. The differences will be somewhere between trivial and preventing the code from ever working, depending on the size of the results.
However, in the case where you are going to repeatedly do foreach on the same results, or otherwise use it repeatedly, the then while the foreach only runs the query once, the next foreach runs it again. If the query is expensive, then the ToList() approach (and storing that list) can be a massive saving.
No, it makes no difference. The in expression is evaluated once. More specifically, the foreach construct invokes the GetEnumerator() method on the in expression and repeatedly calls MoveNext() and accesses the Current property in order to traverse the IEnumerable.
OTOH, calling ToList() is redundant. You shouldn't bother calling it.
If the input is a database, the situation is slightly different, since LINQ outputs IQueryable, but I'm pretty sure that foreach still treats it as an IEnumerable (which IQueryable inherits).
As written, each iteration of the loop would do exactly as much work as it needed to fetch the next result. So the answer would technically be "none of the above". The query would execute "in pieces".
If you use ToList() or any other materialization method (ToArray() etc) then the query will be evaluated once on the spot and subsequent operations (such as iterating over the results) will simply work on a "dumb" list.
If numbers were an IQueryable instead of an IEnumerable -- as it would likely be in a database scenario -- then the above is still close to the truth although not a perfectly accurate description. In particular, on the first attempt to materialize a result the queryable provider would talk to the database and produce a result set; then, rows from this result set would be pulled on each iteration.
The linq query will be executed when it is enumerated (either as the result of a .ToList() call or doing a foreach over the results.
If you are enumerating the results of the linq query twice, both times will cause it to query the data source (in your example, enumerating the collection) as it is itself only returning an IEnumerable. However, depending on the linq query, it may not always enumerate the entire collection (e.g .Any() and .Single() will stop on the first object or the first matching object if there is a .Where()).
The implementation details of a linq provider may differ so the usual behaviour when the data source is a database is to call .ToList() straight away to cache the results of the query & also ensure that the query (in the case of EF, L2S or NHibernate) is executed once there and then rather than when the collection is enumerated at some point later in the code and to prevent the query being executed multiple times if the results are enumerated multiple times.

Converting IEnumerable<T> to List<T> on a LINQ result, huge performance loss

On a LINQ-result you like this:
var result = from x in Items select x;
List<T> list = result.ToList<T>();
However, the ToList<T> is Really Slow, does it make the list mutable and therefore the conversion is slow?
In most cases I can manage to just have my IEnumerable or as Paralell.DistinctQuery but now I want to bind the items to a DataGridView, so therefore I need to as something else than IEnumerable, suggestions on how I will gain performance on ToList or any replacement?
On 10 million records in the IEnumerable, the .ToList<T> takes about 6 seconds.
.ToList() is slow in comparison to what?
If you are comparing
var result = from x in Items select x;
List<T> list = result.ToList<T>();
to
var result = from x in Items select x;
you should note that since the query is evaluated lazily, the first line doesn't do much at all. It doesn't retrieve any records. Deferred execution makes this comparison completely unfair.
It's because LINQ likes to be lazy and do as little work as possible. This line:
var result = from x in Items select x;
despite your choice of name, isn't actually a result, it's just a query object. It doesn't fetch any data.
List<T> list = result.ToList<T>();
Now you've actually requested the result, hence it must fetch the data from the source and make a copy of it. ToList guarantees that a copy is made.
With that in mind, it's hardly surprising that the second line is much slower than the first.
No, it's not creating the list that takes time, it's fetching the data that takes time.
Your first code line doesn't actually fetch the data, it only sets up an IEnumerable that is capable of fetching the data. It's when you call the ToList method that it will actually get all the data, and that is why all the execution time is in the second line.
You should also consider if having ten million lines in a grid is useful at all. No user is ever going to look through all the lines, so there isn't really any point in getting them all. Perhaps you should offer a way to filter the result before getting any data at all.
I think it's because of memory reallocations: ToList cannot know the size of the collection beforehand, so that it could allocate enough storage to keep all items. Therefore, it has to reallocate the List<T> as it grows.
If you can estimate the size of your resultset, it'll be much faster to preallocate enough elements using List<T>(int) constructor overload, and then manually add items to it.

Categories