If the IQueryable interface executes the query expression on the server rather than fetching all records like IEnumerable, why hasn't IQueryable replaced IEnumerable everywhere, given that it can be faster and more efficient?
DbSet<T> has two flavors of Where (IQueryable and IEnumerable). Since the IQueryable version is called by default, is there a way to call the IEnumerable version without calling ToList()?
"If IQueryable executes the query expression on the server rather than fetching all records like IEnumerable, why hasn't IQueryable replaced IEnumerable everywhere, given that it can be faster and more efficient?"
IQueryable and IEnumerable represent two different things. Think of an IQueryable as a "question": it does not have any results itself. An IEnumerable is an "answer": it only has data connected to it, and you can't tell what generated that data.
It is not that an IQueryable is "faster" per se. It just lets you put your filtering and projections into the "question" you ask the SQL server, so that it returns only the answers you need (in the form of an IEnumerable, by calling .ToList() or similar).
If you only use an IEnumerable, the only question you can ask is "Give me everything you know", and you then perform your filtering and projections on the answer it gives you. That is why IQueryable is considered faster: much less data needs to be transferred and processed, because you were able to ask the server a more specific question.
The reason IQueryable has not replaced IEnumerable everywhere is that the thing you are asking has to be able to understand the question. It takes a lot of work to parse every possible thing you could ask it to filter or project on, so most implementations limit themselves to the common things they know they need to answer. For example, in Entity Framework, when you ask a question it does not know how to handle, you will get an error that says something similar to "Specified method is not supported" when you try to get an IEnumerable (an answer) from the IQueryable.
"DbSet<T> has two flavors of Where (IQueryable and IEnumerable). Since the IQueryable version is called by default, is there a way to call the IEnumerable version without calling ToList()?"
The class DbSet<T> has no Where method on it at all. The two Where functions come from two extension methods, Enumerable.Where and Queryable.Where. You can force the compiler to pick the Enumerable overload by casting the object to an IEnumerable<T> (or calling AsEnumerable()) before you call the extension method. However, do remember that Queryable.Where filters the question, while Enumerable.Where only filters the result.
It is wasteful to ask the server for results only to throw them away, so I would not recommend doing this.
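To make the overload selection concrete, here is a minimal sketch. It assumes an in-memory array wrapped with AsQueryable() as a stand-in for a DbSet<T>, so that it runs without a database; the variable names are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for a DbSet<T> (assumption): an in-memory source exposed as IQueryable<int>.
IQueryable<int> numbers = new[] { 1, 2, 3, 4 }.AsQueryable();

// Static type is IQueryable<int>, so the compiler binds to Queryable.Where
// and the filter becomes part of the expression tree (the "question").
var asQuestion = numbers.Where(n => n > 2);

// Static type is IEnumerable<int>, so the compiler binds to Enumerable.Where
// and the filter runs in memory over whatever the source yields (the "answer").
var asAnswer = ((IEnumerable<int>)numbers).Where(n => n > 2);

bool firstIsQueryable = asQuestion is IQueryable<int>;
bool secondIsQueryable = asAnswer is IQueryable<int>;

// The expression tree of the first query includes the Where call.
Console.WriteLine(asQuestion.Expression);
Console.WriteLine($"{firstIsQueryable} {secondIsQueryable}");
```

Both variables yield the same elements here. The difference only matters with a real provider, where the first form is translated to SQL and the second pulls every row before filtering.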
Let's say I need an extension method that selects only the required properties from different sources. The source could be a database or an in-memory collection. So I have defined the following extension method:
public static IQueryable<TResult> SelectDynamic<T, TResult>(
    this IQueryable<T> source,
    ...)
This works fine for IQueryables. However, I also have to call this function for IEnumerables.
In that case, I can call it with the help of .AsQueryable():
myEnumerable.AsQueryable()
.SelectDynamic(...)
.ToList();
Both work fine. And if both work fine, under which conditions do I have to create two different extension methods for the same purpose, one for IEnumerable and another for IQueryable?
My method has to send a query to the database in the IQueryable case.
For example, here is the source of .Select extension method inside System.Linq namespace:
.Select for IEnumerable
.Select for IQueryable
To repeat my main question:
My method must send a query to the database in the IQueryable case, but not when working with an IEnumerable. For now, I am using AsQueryable() for the enumerables, because I don't want to write the same code again for IEnumerable. Can this have any side effects?
If your code only actually works when the objects it's dealing with are loaded in memory, just supply the IEnumerable variant and let your callers decide when they want to convert an IQueryable into an in-memory IEnumerable.
Generally, you won't implement new variations around IQueryable unless you're writing a new database provider.
myEnumerable.AsQueryable() returns a custom object: new EnumerableQuery<TElement>(myEnumerable); (source code)
This EnumerableQuery class implements IEnumerable<T> and IQueryable<T>
When using the EnumerableQuery result of .AsQueryable() as an IEnumerable, the implementation of the interface method IEnumerable<T>.GetEnumerator() simply returns the original source's enumerator, so there is no change and minimal overhead.
When using the result of .AsQueryable() as an IQueryable, the implementation of the interface property IQueryable.Expression simply returns Expression.Constant(this), ready to be evaluated later as an IEnumerable when the whole expression tree is consumed.
(All the other methods and code paths of EnumerableQuery are not really relevant, when the EnumerableQuery is constructed directly from an IEnumerable, as far as I can tell)
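The behaviour described above can be observed directly. This is a small sketch using only System.Linq; the variable names are illustrative and not taken from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

var source = new List<int> { 1, 2, 3, 4 };
IQueryable<int> query = source.AsQueryable();

// AsQueryable() on a plain collection wraps it in EnumerableQuery<T>.
bool wrappedInEnumerableQuery = query is EnumerableQuery<int>;

// Before any operator is appended, the expression tree is a single constant node.
bool rootIsConstant = query.Expression is ConstantExpression;

// Appending operators grows the tree; enumerating it (here via ToList)
// compiles the lambdas and runs them with LINQ to Objects.
List<int> result = query.Where(n => n % 2 == 0).Select(n => n * 10).ToList();

Console.WriteLine($"{wrappedInEnumerableQuery} {rootIsConstant}"); // True True
Console.WriteLine(string.Join(", ", result));                      // 20, 40
```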
If I understand you correctly, you have implemented your method SelectDynamic<TResult>() in such a way that you construct an expression tree inside the method, which produces the desired result when compiled.
As far as I understand the source code, when you call e.g. myEnumerable.AsQueryable().SelectDynamic().ToList(), the expression tree you constructed is compiled and executed on myEnumerable, and the total overhead should be fairly minimal, since all this work is only done once per query, not once per element.
So I think there is nothing wrong with implementing your IEnumerable extension method like this:
public static IEnumerable<TResult> SelectDynamic<T, TResult>(
    this IEnumerable<T> source, ...)
{
    // Forward to the IQueryable version; arguments elided as in the question.
    return source.AsQueryable().SelectDynamic(...);
}
There is some slight overhead, since the expression tree is compiled each time the resulting query is executed, and I am not sure that compilation is cached anywhere. But I think that will not be noticeable in most circumstances, unless you execute this query a thousand times per second.
There should be no other side effects, apart from slight performance issues, in implementing the IEnumerable extension method in this way.
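As an illustration of the idea (not the asker's actual SelectDynamic, whose parameters are elided), here is a sketch that builds a selector expression at run time and applies it to an in-memory sequence through AsQueryable(). Projecting a string to its Length is an assumption chosen only to keep the example self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Build the selector "s => s.Length" at run time instead of writing it as a lambda.
ParameterExpression s = Expression.Parameter(typeof(string), "s");
Expression<Func<string, int>> selector =
    Expression.Lambda<Func<string, int>>(Expression.Property(s, "Length"), s);

IEnumerable<string> words = new List<string> { "a", "bcd", "ef" };

// The same expression could be handed to a database-backed IQueryable;
// for an in-memory source, AsQueryable() compiles it and runs it with LINQ to Objects.
List<int> lengths = words.AsQueryable().Select(selector).ToList();

Console.WriteLine(string.Join(", ", lengths)); // 1, 3, 2
```

The compilation cost is paid when the query is enumerated, not per element, which matches the reasoning above.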
The implementation of Enumerable.AsEnumerable<T>(this IEnumerable<T> source) simply returns source. However Observable.AsObservable<T>(this IObservable<T> source) returns an AnonymousObservable<T> subscribing to the source rather than simply returning the source.
I understand these methods are really useful for changing the monad within a single query (going from IQueryable to IEnumerable). So why do the implementations differ?
The Observable version is more defensive, in that you can't cast the result back to some known type (if it were originally implemented as a Subject<T>, you'd never be able to cast it as such). So why does the Enumerable version not do something similar? If my underlying type is a List<T> but I expose it as IEnumerable<T> through AsEnumerable, it will still be possible to cast it back to a List<T>.
Please note that this isn't a question on how to expose IEnumerable<T> without being able to cast to the underlying, but why the implementations between Enumerable and Observable are semantically different.
Your question is answered by the documentation, which I encourage you to read when you have such questions.
The purpose of AsEnumerable is to hint to the compiler "please stop using IQueryable and start treating this as an in-memory collection".
As the documentation states:
The AsEnumerable<TSource>(IEnumerable<TSource>) method has no effect other than to change the compile-time type of source from a type that implements IEnumerable<T> to IEnumerable<T> itself. AsEnumerable<TSource>(IEnumerable<TSource>) can be used to choose between query implementations when a sequence implements IEnumerable<T> but also has a different set of public query methods available.
If you want to hide the implementation of an underlying sequence, use sequence.Select(x=>x) or ToList or ToArray if you don't care that you're making a mutable sequence.
The purpose of AsObservable is to hide the implementation of the underlying collection. As the documentation says:
Observable.AsObservable<TSource> ... Hides the identity of an observable sequence.
Since the two methods have completely different purposes, they have completely different implementations.
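The difference on the Enumerable side can be checked with a few lines. This sketch covers only LINQ to Objects (Rx is left out so that no extra package is needed); the variable names are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var list = new List<int> { 1, 2, 3 };

// AsEnumerable only changes the compile-time type; it hands back the same object.
IEnumerable<int> viaAsEnumerable = list.AsEnumerable();
bool sameInstance = ReferenceEquals(list, viaAsEnumerable);
bool canCastBack = viaAsEnumerable is List<int>;

// Select(x => x) wraps the source in an iterator, which hides the List<int>.
IEnumerable<int> hidden = list.Select(x => x);
bool hiddenIsList = hidden is List<int>;

Console.WriteLine($"{sameInstance} {canCastBack} {hiddenIsList}"); // True True False
```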
You're right about the relationship between AsEnumerable and AsObservable wrt the aspect of switching from expression tree based queries to in-memory queries.
At the same time, exposing an Rx sequence based on a Subject<T> is very common, and we needed a way to hide it (otherwise the user could cast to IObserver<T> and inject elements).
A long while ago in the history of Rx pre-releases, we did have a separate Hide method, which was merely a Select(x => x) alias. We never quite liked it, so we decided to deviate from precisely mirroring LINQ to Objects in this one place and made AsObservable play the role of Hide, partly because some users believed this was what it did to begin with.
Notice though, we do have an extension method called AsObservable on IQbservable<T> as well. That one does simply what AsEnumerable does too: it acts as the hint to the compiler to forget about the expression tree based querying mode and switch to in-memory queries.
My question is how and when it makes sense to overload (if that's possible?) the Where() extension method of IQueryable when you are writing your own IQueryable implementation.
For example, in Entity Framework it's my understanding that a Where() call made against an ObjectSet will change the actual SQL that's being passed to the database. Alternatively, if you call AsEnumerable() first, the filtering is done with LINQ to Objects rather than LINQ to Entities.
For instance:
new MyDBEntities().MyDbTable.Where(x => x.SomeProperty == "SomeValue");
// this is linq-to-entities and changes the database-level SQL
Versus:
new MyDBEntities().MyDbTable.AsEnumerable().Where(x => x.SomeProperty == "SomeValue");
// this is linq-to-objects and filters the IEnumerable
How do you deal with this when implementing your own IQueryable, given that IQueryable already has predefined extension methods such as Where()? Specifically, I want to make my own very simple ORM that uses native SQL and SqlDataReader under the hood, and I want a .Where() method that changes the native SQL before passing it to the database.
Should I even use IQueryable, or just create my own class entirely? I want to be able to use lambda syntax and have my SQL command altered based on the lambda function(s) used for filtering.
You could create your own type analogous to IQueryable<T>, but that's probably not a good idea. You should probably write your own implementation of the interface. There is a nice series of articles on how to do it, but be prepared, doing so is not one of the easier tasks.
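To give a feel for what such an implementation has to do, here is a deliberately tiny sketch of the core step: receiving a lambda as an expression tree and reading its parts. It handles exactly one shape (member == constant), uses a string's Length as a stand-in for a column, and interpolates the value into the text purely for display; a real provider would walk arbitrary trees and emit parameterized SQL. All names are hypothetical.

```csharp
using System;
using System.Linq.Expressions;

// Because the target type is Expression<Func<...>>, the compiler emits a data
// structure describing the lambda instead of executable code.
Expression<Func<string, bool>> predicate = s => s.Length == 5;

// Take the tree apart: left side is a member access, right side a constant.
var comparison = (BinaryExpression)predicate.Body;
string column = ((MemberExpression)comparison.Left).Member.Name;
object value = ((ConstantExpression)comparison.Right).Value;

// Illustration only: real code should use SQL parameters, not string interpolation.
string sqlFragment = $"WHERE {column} = {value}";

Console.WriteLine(sqlFragment); // WHERE Length = 5
```

This is why implementing the interface is hard: every operator and expression shape the caller might write has to be recognised and translated, or rejected.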
I know these questions have been asked before, I'll start by listing a few of them (the ones I've read so far):
IEnumerable vs IQueryable
List, IList, IEnumerable, IQueryable, ICollection, which is most flexible return type?
Returning IEnumerable<T> vs. IQueryable<T>
IEnumerable<T> as return type
https://stackoverflow.com/questions/2712253/ienumerable-and-iqueryable
Views with business logic vs code
WPF IEnumerable<T> vs IQueryable<T> as DataSource
IEnumerable<T> VS IList<T> VS IQueryable<T>
What interface should my service return? IQueryable, IList, IEnumerable?
Should I return IEnumerable<T> or IQueryable<T> from my DAL?
As you can see, there are some great resources on SO alone on the subject, but there is one part of the question I'm still not sure about after reading them all through.
I'm primarily concerned with the IEnumerable vs IQueryable question, and more specifically with the coupling between the DAL and its consumers.
I've found varying opinions regarding the two interfaces, which have been great. However, I'm concerned about the implications of a DAL returning IQueryable. As I understand it, IQueryable suggests/implies that there is a LINQ provider under the hood. That's concern number one: what if the DAL suddenly requires data from a source that has no LINQ provider? The following would work, but is it more of a hack?
public static IQueryable<Product> GetAll()
{
    // this function used to use a L2S context or similar to return data
    // from a database, however, now it uses a non linq provider
    // simulate the non linq provider...
    List<Product> results = new List<Product> { new Product() };
    return results.AsQueryable();
}
So I can use the AsQueryable() extension, although I admit I don't know exactly what it does. I always imagine IQueryables as being the underlying expression trees, which we can append to as necessary until we're ready to perform our query and fetch the results.
I could rectify this by changing the return type of the function to IEnumerable. I can then return IQueryable from the function because it inherits IEnumerable, and I get to keep the deferred loading. What I lose is the ability to append to the query expression:
var results = SomeClass.GetAll().Where(x => x.ProductTypeId == 5);
When returning IQueryable, as I understand it, this would simply append the expression. When returning IEnumerable, despite maintaining the deferred loading, the expression has to be evaluated so the results will be brought to memory and enumerated through to filter out incorrect ProductTypeIds.
How do other people get round this?
Provide more functions in the DAL - GetAllByProductType, GetAllByStartDate,... etc
Provide an overload that accepts predicates? i.e.
public static IEnumerable<Product> GetAll(Predicate<Product> predicate)
{
    List<Product> results = new List<Product> { new Product() };
    return results.Where(x => predicate(x));
}
One last part (Sorry, I know, really long question!).
I found IEnumerable to be the most recommended across all the questions I checked, but what about deferred loading's requirement for a data context to be available? As I understand it, if your function is declared to return IEnumerable but actually returns an IQueryable, that IQueryable is reliant on an underlying data context. Because the result at this stage is really an expression and nothing has been brought into memory, you cannot guarantee that the consumer of the DAL function will execute the query, nor when. So do I have to keep the context instance that the results were derived from available somehow? Is this how/why the Unit of Work pattern comes into play?
Summary of the questions for clarity (I did a search for "?"...):
1. If using IQueryable as a return type, are you too tightly coupling your UI/business logic to LINQ providers?
2. Is using the AsQueryable() extension a good idea if you suddenly need to return data from a source that has no LINQ provider?
3. Does anyone have a good link describing how, for example, converting a standard list with AsQueryable works, and what it actually does?
4. How do you handle additional filtering requirements supplied by business logic to your DAL?
5. It seems the deferred loading of both IEnumerable and IQueryable depends on keeping the underlying provider alive. Should I be using a Unit of Work pattern or something else to handle this?
Thanks a lot in advance!
1. Well, you aren't strictly coupled to any specific provider, but as a re-phrasing of that: you can't easily test the code, since each provider has different supported features (meaning: what works for one might not work for another, even something like .Single()).
2. I don't think so, if there is any question in your mind about ever changing provider. See above.
3. It just provides a decorated wrapper that uses .Compile() on any lambdas, and uses LINQ-to-Objects instead. Note that LINQ-to-Objects has more support than any other provider, so this won't be an issue, except that it means any "mocks" using this approach don't really test your actual code at all and are largely pointless (IMO).
4. Yeah, tricky. See below.
5. Yeah, tricky. See below.
Personally, I'd prefer well defined APIs here that take known parameters and return a loaded List<T> or IList<T> (or similar) of results; this gives you a testable/mockable API, and doesn't leave you at the mercy of deferred execution (closed connection hell, etc). It also means that any differences between providers is handled internally to the implementation of your data layer. It also makes a much closer fit for calling scenarios such as web-services, etc.
In short; given a choice between IEnumerable<T> and IQueryable<T>, I choose neither - opting instead to use IList<T> or List<T>. If I need additional filtering, then either:
I'll add that to the existing API via parameters, and do the filtering inside my data layer
I'll accept that oversized data is coming back, which I then need to filter out at the caller
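A sketch of the first option, assuming a hypothetical GetAllByProductType method and an in-memory array standing in for the real data store. Tuples are used instead of a Product class only to keep the example short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data-layer method: takes a known parameter, filters inside the
// data layer, and returns an already-loaded List<T>, so nothing is deferred.
List<(int Id, int ProductTypeId)> GetAllByProductType(int productTypeId)
{
    // Stand-in for the real data source (assumption).
    var table = new[]
    {
        (Id: 1, ProductTypeId: 5),
        (Id: 2, ProductTypeId: 7),
        (Id: 3, ProductTypeId: 5),
    };
    return table.Where(p => p.ProductTypeId == productTypeId).ToList();
}

List<(int Id, int ProductTypeId)> products = GetAllByProductType(5);

// The caller gets concrete data; no context or connection needs to stay open.
Console.WriteLine(products.Count); // 2
```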
What is the quickest way to find out which .NET Framework LINQ methods (e.g. the IEnumerable LINQ methods) are implemented using deferred execution and which are not?
While coding, I often wonder whether a given method will be executed right away. The only way to find out is to go to the MSDN documentation to make sure. Is there any quicker way: a directory, a list somewhere on the web, a cheat sheet, or any other trick up your sleeve that you can share? If so, please do. This will help many LINQ noobs (like me) make fewer mistakes. The only other option is to check the documentation until one has used the methods enough to remember them (which is hard for me; I tend not to remember "anything" which is documented somewhere and can be looked up :D).
Generally methods that return a sequence use deferred execution:
IEnumerable<X> ---> Select ---> IEnumerable<Y>
and methods that return a single object don't:
IEnumerable<X> ---> First ---> Y
So, methods like Where, Select, Take, Skip, GroupBy and OrderBy use deferred execution because they can, while methods like First, Single, ToList and ToArray don't because they can't.
There are also two types of deferred execution. For example the Select method will only get one item at a time when it's asked to produce an item, while the OrderBy method will have to consume the entire source when asked to return the first item. So, if you chain an OrderBy after a Select, the execution will be deferred until you get the first item, but then the OrderBy will ask the Select for all the items.
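The two kinds of deferral can be made visible by counting how often a selector runs. This is a minimal sketch over an in-memory array; the counter and variable names are illustrative.

```csharp
using System;
using System.Linq;

int pulled = 0;
var source = new[] { 3, 1, 2 };

// Defining the query runs nothing; the selector only counts its invocations.
var projected = source.Select(n => { pulled++; return n; });
int afterDefinition = pulled;

// First() asks Select for a single item, so the selector runs once.
int first = projected.First();
int afterFirst = pulled;

// OrderBy must see every element before it can yield the smallest one,
// so asking for the first ordered item pulls the whole source through Select.
pulled = 0;
int smallest = projected.OrderBy(n => n).First();
int afterOrderedFirst = pulled;

Console.WriteLine($"{afterDefinition} {afterFirst} {afterOrderedFirst}"); // 0 1 3
```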
The guidelines I use:
Always assume any API that returns IEnumerable<T> or IQueryable<T> can and probably will use deferred execution. If you're consuming such an API and need to iterate through the results more than once (e.g. to get a Count), then convert to a collection before doing so (usually by calling the .ToList() extension method).
If you're exposing an enumeration, always expose it as a collection (ICollection<T> or IList<T>) if that is what your clients will normally use. For example, a data access layer will often return a collection of domain objects. Only expose IEnumerable<T> if deferred execution is a reasonable option for the API you're exposing.
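The first guideline can be illustrated by counting how many times the underlying work runs. This sketch assumes a simple projection as the "expensive" operation; the names are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int evaluations = 0;

// A deferred sequence: the selector runs again every time it is enumerated.
IEnumerable<int> deferred =
    Enumerable.Range(1, 3).Select(n => { evaluations++; return n * 2; });

// Two passes over the deferred sequence do the work twice.
int sum = deferred.Sum();
int max = deferred.Max();
int evaluationsWhenDeferred = evaluations;

// Materializing once with ToList() does the work a single time;
// later passes read the stored results.
evaluations = 0;
List<int> materialized = deferred.ToList();
int sumAgain = materialized.Sum();
int maxAgain = materialized.Max();
int evaluationsWhenMaterialized = evaluations;

Console.WriteLine($"{evaluationsWhenDeferred} {evaluationsWhenMaterialized}"); // 6 3
```

With a database-backed source, each extra pass would typically mean another round trip, not just extra CPU work.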
Actually, there's more; in addition you need to consider buffered vs non-buffered. OrderBy can be deferred, but when iterated must consume the entire stream.
In general, anything in LINQ that returns IEnumerable tends to be deferred - while Min etc (which return values) are not deferred. The buffering (vs not) can usually be reasoned, but frankly reflector is a pretty quick way of finding out for sure. But note that often this is an implementation detail anyway.
For actual "deferred execution", you want methods that work on an IQueryable. Method chains based on an IQueryable build an expression tree representing your query. Only when you enumerate the query or call a method that produces a concrete result (ToList() and similar) is the tree evaluated by the LINQ provider (LINQ to Objects is built into the Framework, as are LINQ to SQL and now Entity Framework; other ORMs and persistence-layer frameworks also offer LINQ providers) and the actual result returned. AsEnumerable() on its own does not run anything; it only makes the operators that follow it run in memory once enumeration starts. Any IEnumerable<T> in the framework can be converted to an IQueryable<T> using the AsQueryable() extension method, and LINQ providers that translate the expression tree, like ORMs, will provide an IQueryable as a jump-off point for a LINQ query against their data.
Even against an IEnumerable, some of the LINQ methods are "lazy". Because the beauty of an IEnumerable is that you don't have to know about all of it, only the current element and whether there's another, LINQ methods that act on an IEnumerable often return an iterator class that produces an object from its source whenever methods later in the chain ask for one. Any operation that doesn't require knowledge of the entire set can be lazily evaluated (Select and Where are two big ones; there are others). Ones that do require knowing the entire collection will consume their whole source, forcing evaluation of all elements through all earlier operators in the chain: sorting via OrderBy and grouping with GroupBy buffer the source internally, while aggregates like Min and Max scan it once. Generally, you want these to come late in a method chain if you can help it.
Here's a summary of different ways you can know if your query will be deferred or not:
If you're using query expression syntax instead of query method syntax, then it will be deferred.
If you're using query method syntax, it MIGHT be deferred depending on what it returns.
Hover over the var key word (if that's what you're using as the type for the variable used to store the query). If it says IEnumerable<T> then it'll be deferred.
Try to iterate over the query using a foreach. If you get an error saying it cannot iterate over your variable because it does not support GetEnumerator(), you know the query is not deferred.
Source: Essential Linq
If you cast the collection to an IQueryable using .AsQueryable(), your LINQ calls will use the deferred execution.
See here: Using IQueryable with Linq