How to use LinQ to increase performance - c#

I have this LINQ query:
var IPI = item.INV_TAXES.Where(t => t.TAXTYPES.TAXNAME == "IPI")
.Select(t => new {TOT_AMT = t.TAXVALUE, t.TAXFACTOR, t.TAXBASE})
.First();
Later in the code I call the following lines about 10 times:
PerformSomeCalculation(IPI.TOT_AMT);
PerformAnotherStuff(IPI.TOT_AMT, IPI.TAXFACTOR);
PerformSomethingElse(IPI.TAXBASE);
I wonder whether the LINQ query executes every time I access a member of IPI, or only the first time, when I assign it.
Is it better to assign the IPI members to variables first?
decimal IPI_TOT_AMT = IPI.TOT_AMT,
IPI_TAXFACTOR = IPI.TAXFACTOR,
IPI_TAXBASE = IPI.TAXBASE;
And then use them.
Thanks for all the advice.

First off, STOP SHOUTING IN YOUR CODE.
You are right to be concerned. A query is executed fresh every time you access it, because the results might change. But the result of First is not a query, it is a value.
That is, if you did this:
var query = whatever.Where(whatever).Select(whatever);
Console.WriteLine(query.First());
Console.WriteLine(query.First());
Then the query is created by the first line, executed by the second line, and executed again by the third line. The query does not know whether the first result is different the second time you call First, so it runs the query again.
By contrast, if you do this:
var query = whatever.Where(whatever).Select(whatever);
var first = query.First();
Console.WriteLine(first);
Console.WriteLine(first);
Then the first line creates the query, the second line executes the query and stores the result, and the third and fourth lines report the stored result.
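The difference can be made visible by counting how often the filter predicate runs. The following is only a minimal sketch against an in-memory list; the class name DeferredExecutionDemo and the counter are illustrative and not part of the original question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Returns the number of predicate calls for each of the two approaches.
    public static (int viaQuery, int viaStoredValue) Run()
    {
        var source = new List<int> { 1, 2, 3 };
        int calls = 0;

        // Creating the query runs nothing yet.
        var query = source.Where(n => { calls++; return n > 1; });

        // Approach 1: call First() on the query twice, so it executes twice.
        _ = query.First();
        _ = query.First();
        int viaQuery = calls;

        // Approach 2: execute once, then reuse the stored value.
        calls = 0;
        int first = query.First();
        _ = first;
        _ = first;
        int viaStoredValue = calls;

        return (viaQuery, viaStoredValue);
    }
}
```

Each execution tests the elements 1 and 2 before First() can return, so the first approach makes twice as many predicate calls as the second.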

In the code you provided, the LINQ query will only run when you call the .First() method.
There will be no noticeable performance improvement by assigning the members to variables before accessing them.
Note that this is not always the case with all LINQ statements. For example, if you had said:
var IPIs = item.INV_TAXES.Where(t => t.TAXTYPES.TAXNAME == "IPI")
.Select(t => new {TOT_AMT = t.TAXVALUE, t.TAXFACTOR, t.TAXBASE});
... and then passed IPIs into several methods, you would likely end up with separate database round-trips. In order to avoid that issue, you would call .ToList() to force immediate evaluation before assigning the variable. But in this case calling .First() effectively does the same thing.

I wonder if every time that I call each member of IPI, the LINQ query
executes every time or just the first time when I assign it?
No, the query is only performed once - IPI is an instance of an anonymous type with a bunch of primitive properties (decimals in fact). This object is not connected to a query anymore - your query was executed and returned this object as a result of the First() extension method which forces immediate execution and returns the first item in the input collection.

The answer by Eric Lippert explains everything you wanted to know. However, if you want to go further with the optimization, you can try using the CompiledQuery.Compile method to store and reuse the queries.
For more information, check MSDN. You can start here:
http://msdn.microsoft.com/cs-cz/library/bb399335.aspx

The First() method executes the LINQ query and returns the first object.
This object is typed like any other object you use; the only difference is that the class definition is written by the compiler.
So nothing is executed any more when you use the object.

Related

Which methods in LINQ force retrieving data from the database

When I use this query, I don't retrieve data from the database:
var query = db.Table.Where(x => x.id_category.HasValue).Select(x => x.Id);
But if I use this, I do retrieve data:
var dataRetrieved = query.ToList();
Which other methods, like ToList(), can I use to force the data to be retrieved?
LINQ works by building up a series of query commands, and waits until the last possible moment to execute them. In order to force execution, you have to ask the query for "real" data, not just a description of the steps to get it.
According to this MSDN article (emphasis mine):
LINQ queries are always executed when the query variable is iterated over, not when the query variable is created....
To force immediate execution of a query that does not produce a singleton value, you can call the ToList method, the ToDictionary method, or the ToArray method on a query or query variable....
You could also force execution by putting the foreach or For Each loop immediately after the query expression, but by calling ToList or ToArray you cache all the data in a single collection object.
Select returns an IEnumerable, a sequence of the values produced by applying a transform function to each element. So it depends on what you want to do with the results: do you want a list or an array? You can also traverse the sequence and handle each item independently using a foreach loop. The following page will give you an idea of what you can do with your values.
Besides what BJ Myers and Ebad Massod already explained, there are also operators which return a single value and are executed immediately, such as:
First, FirstOrDefault, Last, LastOrDefault, Single, SingleOrDefault
Also, LINQ Aggregate Functions return a single value and are executed immediately, such as:
Average, Count, Max, Min, Sum
Here is a nice MSDN article: Classification of Standard Query Operators by Manner of Execution
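As a rough illustration of the deferred versus immediate distinction, the sketch below counts how many elements the Select projection has visited after each step. It uses an in-memory array rather than a database, and the names ImmediateOperatorsDemo and visited are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ImmediateOperatorsDemo
{
    // Returns the running count of projected elements after each step.
    public static int[] Run()
    {
        int[] data = { 1, 2, 3, 4 };
        int visited = 0;

        // Deferred: building the query visits nothing.
        var query = data.Select(n => { visited++; return n * 10; });
        int afterCreation = visited;

        // ToList() executes immediately and visits every element.
        _ = query.ToList();
        int afterToList = visited;

        // Aggregates such as Max() also execute immediately, over all elements.
        _ = query.Max();
        int afterMax = visited;

        // First() executes immediately but stops after the first element.
        _ = query.First();
        int afterFirst = visited;

        return new[] { afterCreation, afterToList, afterMax, afterFirst };
    }
}
```

The counter stays at zero until an executing operator is called, and each such call starts the query from scratch.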

Overhead of using linq queries on non-changing list

While debugging my code, I found that if you assign a LINQ query to a property, the query is executed every time that property is used. So if you are doing something like:
Myprop = MyList.Where(p => p.Status == "ABKI BAAR ..");
then whenever you do foreach(var prop in Myprop), Myprop.ToList().Count(), or make any other use of Myprop, the LINQ query is executed again.
Isn't executing the LINQ query every time a bad idea, especially when I am not modifying MyList?
You can circumvent this by calling ToList after your expression:
Myprop = MyList.Where(p => p.Status == "ABKI BAAR ..").ToList(); // or .ToArray()
That way it will immediately execute the Linq query and you won't be executing the query every time, since now you have a list.
This obviously does not take into account changes of the source list later on.
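The trade-off can be sketched as follows: a deferred query sees items added to the source later, while a list produced by ToList() is a snapshot. The class name SnapshotDemo and the sample strings are placeholders, not taken from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SnapshotDemo
{
    // Returns the number of matches seen by the live query and by the snapshot.
    public static (int live, int snapshot) Run()
    {
        var myList = new List<string> { "open", "closed" };

        // Deferred query: re-evaluated against myList on every use.
        IEnumerable<string> live = myList.Where(s => s == "open");

        // Materialized result: evaluated once, right here.
        List<string> snapshot = myList.Where(s => s == "open").ToList();

        // A later change to the source list.
        myList.Add("open");

        return (live.Count(), snapshot.Count);
    }
}
```

Here the live query reports two matches after the Add, while the snapshot still holds the single match that existed when ToList() ran.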
Materialize your query result using .ToArray() or .ToList().
So, instead of
Myprop = MyList.Where(p => p.Status == "ABKI BAAR ..");
Use
Myprop = MyList.Where(p => p.Status == "ABKI BAAR ..").ToArray();
See the MSDN documentation for more information on ToList or ToArray.
You should certainly never write Myprop.ToList().Count(). That would materialize the query (or copy an already materialized result), count the results, and then throw away the materialized result of the query (or copy thereof). If you only need to count, just do Myprop.Count().

Is there a wildcard for the .Take method in LINQ?

I am trying to create a method using LINQ that takes X amount of products from the DB, so I am using the .Take method for that.
The thing is, in some situations I need to take all the products, so is there a wildcard I can give to .Take, or some other method that would bring me all the products in the DB?
Also, what happens if I do a .Take(50) and there are only 10 products in the DB?
My code looks something like:
var ratingsToPick = context.RatingAndProducts
.ToList()
.OrderByDescending(c => c.WeightedRating)
.Take(pAmmount);
You could separate it into a separate call based on your flag:
IQueryable<RatingAndProducts> ratingsToPick = context.RatingAndProducts
    .OrderByDescending(c => c.WeightedRating);
if (!takeAll)
    ratingsToPick = ratingsToPick.Take(pAmmount);
var results = ratingsToPick.ToList();
If you don't include the Take, then it will simply take everything.
Note that you may need to type your original query as IQueryable<MyType> (or IEnumerable<MyType> for an in-memory source), as OrderByDescending returns an ordered type (IOrderedQueryable or IOrderedEnumerable) that the result of the Take call can't be assigned back to. (Or you can simply work around this as appropriate based on your actual code.)
Also, as @Rene147 pointed out, you should move your ToList to the end. Otherwise it retrieves all items from the database every time, and the OrderByDescending and Take then operate on a List<> of objects in memory instead of being performed as a database query, which I assume is unintended.
Regarding your second question, about performing a Take(50) when only 10 entries are available: Take returns at most the requested number of items, so you simply get whatever items are available and no exception is thrown. (I would still suggest a quick test to confirm this for your specific provider.)
Your current solution always loads all products from the database, because you are calling ToList() first. Only after loading everything do you take the first N in memory. To conditionally load only the first N products, build up the query before executing it:
int? countToTake = 50;
// AsQueryable() types the variable as IQueryable<T>, so the result of Take() can be assigned back to it
var ratingsToPick = context.RatingAndProducts
    .OrderByDescending(c => c.WeightedRating)
    .AsQueryable();
// conditionally take only first results
if (countToTake.HasValue)
    ratingsToPick = ratingsToPick.Take(countToTake.Value);
var result = ratingsToPick.ToList(); // execute query
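The same pattern can be tried without a database. The sketch below assumes an in-memory sequence exposed through AsQueryable() as a stand-in for the real context; the method name TopRatings and the integer ratings are hypothetical. It also shows what happens when more items are requested than exist.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ConditionalTakeDemo
{
    // Builds the query step by step and executes it once at the end.
    public static List<int> TopRatings(IQueryable<int> ratings, int? countToTake)
    {
        // Typed as IQueryable<int> so the result of Take() can be assigned back.
        IQueryable<int> query = ratings.OrderByDescending(r => r);

        // A null count means "take everything", so Take is simply skipped.
        if (countToTake.HasValue)
            query = query.Take(countToTake.Value);

        return query.ToList();
    }

    public static (int all, int top3, int top50) Run()
    {
        // Ten in-memory values standing in for ten products.
        IQueryable<int> ratings = Enumerable.Range(1, 10).AsQueryable();

        return (TopRatings(ratings, null).Count,
                TopRatings(ratings, 3).Count,
                TopRatings(ratings, 50).Count);
    }
}
```

With ten items available, asking for 50 returns all ten rather than throwing.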

Does foreach execute the query only once?

I have a list of items and a LINQ query over them. Now, with LINQ's deferred execution, would a subsequent foreach loop execute the query only once or for each turn in the loop?
Given this example (Taken from Introduction to LINQ Queries (C#), on MSDN)
// The Three Parts of a LINQ Query:
// 1. Data source.
int[] numbers = new int[7] { 0, 1, 2, 3, 4, 5, 6 };
// 2. Query creation.
// numQuery is an IEnumerable<int>
var numQuery =
from num in numbers
where (num % 2) == 0
select num;
// 3. Query execution.
foreach (int num in numQuery)
{
Console.Write("{0,1} ", num);
}
Or, in other words, would there be any difference if I had:
foreach (int num in numQuery.ToList())
And, would it matter, if the underlying data is not in an array, but in a Database?
Now, with LINQ's deferred execution, would a subsequent foreach loop execute the query only once or for each turn in the loop?
Yes, once for the loop. Actually, it may execute the query less than once - you could abort the looping part way through and the (num % 2) == 0 test wouldn't be performed on any remaining items.
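A small sketch of the "less than once" case, reusing the numbers array from the question: the predicate is counted, and the loop breaks out early. The class name EarlyExitDemo and the counter are illustrative additions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EarlyExitDemo
{
    // Returns how many times the predicate ran before the loop was abandoned.
    public static int Run()
    {
        int[] numbers = { 0, 1, 2, 3, 4, 5, 6 };
        int tests = 0;

        var numQuery = numbers.Where(num => { tests++; return num % 2 == 0; });

        foreach (int num in numQuery)
        {
            // Stop as soon as the value 2 is produced.
            if (num == 2)
                break;
        }

        return tests;
    }
}
```

Only 0, 1 and 2 are tested; the remaining four elements are never examined.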
Or, in other words, would there be any difference if I had:
foreach (int num in numQuery.ToList())
Two differences:
In the case above, ToList() wastes time and memory, because it first does the same thing as the initial foreach, builds a list from it, and then foreachs that list. The differences will be somewhere between trivial and preventing the code from ever working, depending on the size of the results.
However, if you are going to foreach over the same results repeatedly, or otherwise use them more than once, then while each foreach runs the query only once, the next foreach runs it again. If the query is expensive, then the ToList() approach (and storing that list) can be a massive saving.
No, it makes no difference. The in expression is evaluated once. More specifically, the foreach construct invokes the GetEnumerator() method on the in expression and repeatedly calls MoveNext() and accesses the Current property in order to traverse the IEnumerable.
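Roughly speaking, the foreach loop from the question behaves like the hand-written loop below. This is a simplified sketch of the expansion, not the exact code the compiler emits, and the class name ForeachExpansionDemo is made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ForeachExpansionDemo
{
    // Walks the query by hand, the way foreach does behind the scenes.
    public static List<int> Run()
    {
        int[] numbers = { 0, 1, 2, 3, 4, 5, 6 };
        IEnumerable<int> numQuery = numbers.Where(num => num % 2 == 0);
        var seen = new List<int>();

        // GetEnumerator() is called once, on the expression after "in".
        using (IEnumerator<int> e = numQuery.GetEnumerator())
        {
            // Each MoveNext() does just enough work to find the next match.
            while (e.MoveNext())
            {
                int num = e.Current;
                seen.Add(num);
            }
        }

        return seen;
    }
}
```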
OTOH, calling ToList() is redundant. You shouldn't bother calling it.
If the input is a database, the situation is slightly different, since LINQ outputs IQueryable, but I'm pretty sure that foreach still treats it as an IEnumerable (which IQueryable inherits).
As written, each iteration of the loop would do exactly as much work as it needed to fetch the next result. So the answer would technically be "none of the above". The query would execute "in pieces".
If you use ToList() or any other materialization method (ToArray() etc) then the query will be evaluated once on the spot and subsequent operations (such as iterating over the results) will simply work on a "dumb" list.
If numbers were an IQueryable instead of an IEnumerable -- as it would likely be in a database scenario -- then the above is still close to the truth although not a perfectly accurate description. In particular, on the first attempt to materialize a result the queryable provider would talk to the database and produce a result set; then, rows from this result set would be pulled on each iteration.
The LINQ query will be executed when it is enumerated (either as the result of a .ToList() call or by doing a foreach over the results).
If you enumerate the results of the LINQ query twice, both enumerations will query the data source (in your example, enumerate the collection), as the query itself only returns an IEnumerable. However, depending on the LINQ query, it may not always enumerate the entire collection (e.g. .Any() and .First() will stop on the first object, or on the first matching object if there is a .Where()).
The implementation details of a LINQ provider may differ, so the usual practice when the data source is a database is to call .ToList() straight away. This caches the results of the query, ensures that the query (in the case of EF, L2S or NHibernate) is executed there and then rather than whenever the collection happens to be enumerated later in the code, and prevents the query from being executed multiple times if the results are enumerated multiple times.

Deferred execution value unchanged

Say I have IQueryable<Car> cars.
I iterate it such as:
foreach (Car c in cars)
{
c.Doors = 2;
}
Why does c.Doors contain the original value after the foreach, instead of the changed value?
Thank you in advance
You indicated that the IQueryable is the result of a LINQ to Entities query, which is not quite correct. MyDatabaseContext.Cars.Where(x => x.Name == "Test") returns an IQueryable, which performs a query on the database when it is iterated (iteration is when you perform a foreach over it). So it doesn't contain the result set yet, just the query.
Looping twice over cars generates two identical queries to the database and returns two identical result sets. If you want to preserve the data, you need to call ToArray or ToList, or, after the manipulation, save the changes before you iterate again.
Iterating over an IQueryable retrieves the result set from the database as it goes, as I'm sure you know. What I've observed in this and other situations is that this result is not cached, so you'll often find that iterating over the IQueryable again will actually run the query again, thus your modifications aren't preserved because you're getting a new result set which was never affected by that code.
The reason it works when you call ToList() first and iterate over the result of that is because ToList() retrieves and materialises the entire result set, which is then no longer linked to the database content and is really just a copy of what the database returned.
The only way to be sure your changes will stick is to operate on a local copy of the data, i.e. lift it out of IQueryable land by materializing it with ToList() or ToArray(), which then returns the same objects each time you enumerate it. Note that AsEnumerable() is not enough for this: it only changes the static type to IEnumerable and defers evaluation, so each enumeration still runs the query again.
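The effect can be imitated without a database. In the sketch below, an in-memory Select that creates a new Car on every enumeration stands in for a query that is re-run each time; the Car class and the MaterializeDemo name are assumptions made for the example, and a real provider (for instance one with change tracking) may behave differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Car
{
    public int Doors { get; set; }
}

public static class MaterializeDemo
{
    // Returns the Doors value seen after modifying a re-run query vs. a list.
    public static (int requeried, int materialized) Run()
    {
        // Stand-in for a database query: every enumeration builds fresh objects.
        IEnumerable<Car> cars = Enumerable.Range(0, 3).Select(_ => new Car { Doors = 4 });

        // These changes hit objects that are discarded after the loop.
        foreach (Car c in cars)
            c.Doors = 2;
        int requeried = cars.First().Doors; // query runs again: fresh Car

        // Materialize once, then modify the stored objects.
        List<Car> list = cars.ToList();
        foreach (Car c in list)
            c.Doors = 2;
        int materialized = list.First().Doors; // same objects as modified

        return (requeried, materialized);
    }
}
```

The re-run query hands back a car with the original four doors, while the list keeps the modified value.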
