When using .FromCache() on an IQueryable result set, should I additionally call .ToList(), or can I just return the IEnumerable<> returned by the materialized query with FromCache?
I am assuming you are using a derivative of the code from http://petemontgomery.wordpress.com/2008/08/07/caching-the-results-of-linq-queries/ . If you look at the FromCache implementation, you will see the that the query.ToList() is already called. This means that the evaluated list is what is cached. So,
You do NOT need to call ToList()
That depends entirely on what you want to do with it. If you're just going to foreach over it once then you may as well just leave it as an IEnumerable. There's no need to build up a list just to discard it right away.
If you plan to iterate over it multiple times it's probably best to ToList it, so that you're not accessing the underlying IQueryable multiple times. You should also ToList it if it's possible for the underlying query to change over time and you don't want those changes to be reflected in your query.
If you are likely to not need to iterate all of the items (you may end up stopping after the first item, or half way, or something like that) then it's probably best to leave it as an IEnumerable to potentially avoid even fetching some amount of data in the first place.
If the method has no idea how it's going to be used, and it's just a helper method that will be used by not-yet-written code, then consider returning IEnumerable. The caller can call ToList on it if they have a compelling reason to turn it into a list.
For me, as a general rule, I leave such queries as IEnumerable unless I have some compelling reason to make it a List.
Related
I have a comma delimited string called ctext which I want to split and put into a List<string>.
Would using LINQ,
List<string> f = ctext.Split(',').ToList();
be slower than not using LINQ?
List<string> f;
f.AddRange(ctext.Split(','));
It seems that LINQ would actually copy something somewhere at some point which would make it slower, whereas AddRange() would just check the size of the list once, expand it, and dump it in.
Or is there an even faster way? (Like using a for loop, but I doubt it.)
Fortunately, we can easily look at what ToList does now that it's open source. (Follow the link for the latest source...)
I haven't seen IListProvider<T> before, but I doubt that an array implements it, which means we've basically got new List<TSource>(source). Looking at the List<T> source shows that both the constructor and AddRange basically end up using CopyTo.
In other words, other than a few levels of indirection, I'd expect them both to do the same thing.
It seems that LINQ would actually copy something somewhere at some point which would make it slower, whereas AddRange() would just check the size of the list once, expand it, and dump it in.
You are correct that both of these things are happening, but incorrect in thinking that each are specific to that one operation. Both ToList and AddRange do both of those things. Both operations copy all of the values from the input sequence into the list, and since both of them are adding multiple items at the same time they're able to see how much to expand the internal capacity of the list all at once, rather than needing to perform multiple expansions.
If I run against dlinq, for example
var custq = data.StoreDBHelper.DataContext.Customers as IEnumerable <Data.SO.Customer>;
I Thought it was not much of a difference against running:
var custq = data.StoreDBHelper.DataContext.Customers as IQueryable <Data.SO.Customer>;
As, IQueryable inherits from IEnumerable.
But I discovered the following, if you call:
custq.Sum()
then the program will process this as you called .toList() it you use the 'as IEnumerable'
because the memory on the progam raised to the same level, when i tried, custq.ToList.Sum()
but not on the 'as IQueryable' (because the issue then running on sql server) and did not affect the memory usage of the program.
My question is simply this, should you not use 'as IEnumerable' on Dlinq? But 'as IQueryable' as an general rule? I know that if you are running standard iterations, it gets the same result, between 'as IEnumerable'and 'as IQueryable'.
But is it just the summary functions and delegate expressions in where statement that there will be a difference - or will you in general get better performance if you use 'as IQueryable'? (for standard iteration and filter functions on DLinq entities)
Thanks !
Well, depends on what you want to do...
Casting it as IEnumerable will return an object you can enumerate... and nothing more.
So yes, if you call Count on an IEnumerable, then you enumerate the list (so you actually perform your Select query) and count each iteration.
On the other hand, if you keep an IQueryable, then you may enumerate it, but you could also perform database operations like Were, OrderBy or count. Then this will delay execution of the query and eventually modify it before running it.
Calling OrderBy on an enumerable browse all results and order them in memory. Calling OrderBy on a queryable simply adds ORDER BY at the end of your SQL and let database do the ordering.
In general, it is better to keep it as an IQueryable, yes... Unless you want to count them by actually browsing them (instead of doing a SELECT COUNT(*)...)
I am wondering about general performance of LINQ. I admit, that it comes handy but how performant is LINQ? I know that is a broad question. So I want to ask about a particular example:
I have an anonymous type:
var users = reader.Select(user => new MembershipUser(reader.Name, reader Age));
And now, I want to convert it to the MembershipUserCollection.
So I do it like this:
MembershipUserCollection membershipUsers = new MembershipUserCollection();
users.ToList().ForEach(membershipUsers.Add); //what is the complexity of this line?
What is the complexity of the last line? Is it n^2 ?
Is ToList() method iterates for each element of the users and adds it to the list?
Or does ToList() works differently? Because if it is not, I find hard to justice the reason of using the last line of the code instead of simply:
foreach (var user in users)
{
membershipUsers.Add(user);
}
Your example isn't particularly good for your question because ToList() isn't really in the same class of extension methods as the other ones supporting LINQ. The ToList() extension method is a conversion operation, not a query operation. The real values in LINQ are deferred execution of a composite query built by combining several LINQ query operations and improved readability. In LINQ2SQL you also get the advantage of constructing arbitrary queries that get pushed to the DB server for actual execution, taking advantage of optimizations that the DB may have in place to improve performance.
In general, I would expect that the question of performance largely comes down to how well you construct the actual queries and has a lot more to do with how well the programmer knows the tools and data than how well the tool is implemented. In your case, it makes no sense to construct a temporary list just to be able to invoke the convenience ForEach method on it if all you care about is performance. You'd be better off simply iterating over the enumeration you already have (as you suspect). LINQ won't stop a programmer from writing bad code, though it may disguise bad code for the person who doesn't understand how LINQ works.
It's always the case that you can construct an equivalent program not using LINQ for any program using LINQ. It may be that you can actually improve on the performance. I would submit, though, that LINQ makes it much easier to write readable code than non-LINQ solutions. By that, I mean more compact and understandable. It also makes it easier to write composable code, which when executed in a deferred manner performs better than, non-LINQ compositions. By breaking the code into composable parts, you simplify it and improve understandability.
I think the trick here is to really understand where LINQ makes sense rather than treat it as a shiny, new tool that you need to now use for every problem you have. The nice part of this shiny, new tool, though, is that it really does come in handy in a lot of situations.
It's O(n) - since .ToList() iterates once through the enumeration and copys the elements into the resulting List<T> (whose insertion is O(1)). Thus the complexity is fine.
The actual issue you might see is that you create a completely new, temporary List<T> just to copy its contents into another list (and afterwards discard it).
I suspect that's just due to the convenience of having a .ForEach()-method on List<T>s. One could nonetheless code a direct implementation for IEnumerable<T>s, which would save this one superfluous copying - or just write
foreach (var user in users) membershipUsers.Add(user)
which is basically what you want to express after all ;-)
Converting to a list will have the same complexity as iterating over the sequence, which may really by anything depending on how the sequence is generated. A normal Select over an in-memory list is O(n).
The performance of using ForEach on a List vs a foreach loop comes down to the overhead of invoking a delegate vs the overhead of creating and using a enumerator, I cannot say which one is quicker, but if both are used on an in-memory list, the complexity is the same.
regarding linq to objects, if i use a .Where(x => x....) and then straight afterwards use a .SkipWhile(x => x...) does this incur a performance penalty because i am going over the collection twice?
Should i find a way to put everything in the Where clause or in the SkipWhile clause?
There will be a minor inefficiency due to chaining iterators together, but it really will be very minor. (In particular, although each matching item will be seen by both operators, they won't be buffered up or anything like that. LINQ to Object isn't going to create a new list of all matching items and then run SkipWhile over it.)
If this is so performance-critical, you'd probably get a very slight speed bump by not using LINQ in the first place. In every other case, write the simplest code first and only worry about micro-optimisations like this when you've proved it's a bottleneck.
Using a Where and a SkipWhile doesn't result in "going over the collection twice." LINQ to Objects works on a pull model. When you enumerate the combined query, the SkipWhile will start asking its source for elements. Its source is the Where, so this will cause the Where to start asking its source in turn for elements. So the SkipWhile will see all elements that pass the Where clause, but it's getting them as it goes. The upshot is that LINQ does a foreach over the original collection, returning only elements that pass both the Where and SkipWhile filters -- and this involves only a single pass over the collection.
There may be a trivial loss of efficiency because there are two iterators involved, but it is unlikely to be significant. You should write the code to be clear (as you are doing at the moment), and if you suspect that the clear version is causing a performance issue, measure to make sure, and only then try combining the clauses.
As with most things the answer is it depends on what you're doing. If you have multiple where's the operate on the same object it's probably worth combining them with &&'s for example.
Most of the LINQ operators won't iterate over the entire collection per operator, they merely process one item and pass it onto the next operator. There are exceptions to this such as Reverse and OrderBy, but generally if you were using Where and SkipWhile for example you would have a chain that would process one item at a time. Now your first Where statement could obviously filter out some items, so SkipWhile wouldn't see an item until it passed through the preceding operator.
My personal preference is to keep operators separate for clarity and combine them only if performance becomes an issue.
You aren't going over the collection twice when you use Where and SkipWhile.
The Where method will stream its output to the SkipWhile method one item at a time, and likewise the SkipWhile method will stream its output to any subsequent method one item at a time.
(There will be a small overhead because the compiler generates separate iterator objects for each method behind the scenes. But if I was worried about the overhead of compiler-generated iterators then I probably wouldn't be using LINQ in the first place.)
No, there's (essentially) no performance penalty. That's what lazy (deferred) execution is all about.
I'm getting into using the Repository Pattern for data access with the Entity Framework and LINQ as the underpinning of implementation of the non-Test Repository. Most samples I see return AsQueryable() when the call returns N records instead of List<T>. What is the advantage of doing this?
AsQueryable just creates a query, the instructions needed to get a list. You can make futher changes to the query later such as adding new Where clauses that get sent all the way down to the database level.
AsList returns an actual list with all the items in memory. If you add a new Where cluse to it, you don't get the fast filtering the database provides. Instead you get all the information in the list and then filter out what you don't need in the application.
So basically it comes down to waiting until the last possible momment before committing yourself.
Returning IQueryable<T> has the advantage, that the execution is defferer until you really start enumerating the result and you can compose the query with other queries and still get server side execution.
The problem is that you cannot control the lifetime of the database context in this method - you need an open context and must ensure that it stays open until the query gets executed. And then you must ensure that the context will be disposed. If you return the result as a List<T>, T[], or something similar, you loose deffered execution and server side execution of composed queries, but you win control over the lifetime of the database context.
What fits best, of course, depends on the actual requirements. It's another question without a single truth.
AsQueryable is an extension method for IEnumerable<T> that could do two things:
If the IEnumerable<T> implements IQueryable<T> justs casts, doing nothing.
Otherwise creates a 'fake' IEnumerable<T> (EnumerableQuery<T>) that implements every method compiling the lambdas and calling to Enumerable extension methods.
So in most of the cases using AsQueryable is useless, unless u are forced to pass a IQueryable to a method and u have a IEnumerable instead, it's a hack.
NOTE: AsQueryable is a hack, IQueryable of course is not!
Returning IQueryable<T> will defer execution of the query until its results are actually used. Until then, you can also perform additional database query operations on the IQueryable<T>; on a List you're restricted to generally less-efficient in-memory operations.
IQueryable: This is kind of deferred execution - lazy loading so your query is evaluated and hit to database. If we add any additional clause it will hit the Db with the required query with filters.
IEnumerable: Is eager loading thus records will be loaded first in-memory operation and then the operation will be performed.
So depending on use case like while paging on a huge number of records we should use IQueryable<T>, if short operation and doesn't create huge memory storage use IEnumerable.