I have a LINQ to Entities query.
Will Any() force LINQ execution (like ToList() does)?
There is a very good MSDN article, Classification of Standard Query Operators by Manner of Execution, which describes all the standard LINQ query operators. As you can see from its table, Any is executed immediately (as are all operators that return a single value). You can always refer to this table if you have doubts about an operator's manner of execution.
Yes and no. The Any method will read items from the source right away, but it's not guaranteed to read all of them.
The Any method will enumerate items from the source, but only as many as needed to determine the result.
Without any parameter, the Any method will only try to read the first item from the source.
With a parameter, the Any method will read items from the source only until it finds one that satisfies the condition. All items are read only if no item satisfies the condition, or if the first match is the last item.
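To make the short-circuiting visible, here is a small LINQ to Objects sketch. The iterator Source and the itemsRead counter are made up for illustration; in a LINQ to Entities query the provider translates Any() to SQL instead, so this only shows the in-memory behavior.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Counts how many items the consumer actually pulls from the source.
int itemsRead = 0;

// Hypothetical lazy source: each element is produced only when requested.
IEnumerable<int> Source()
{
    foreach (var value in new[] { 1, 2, 3, 4, 5 })
    {
        itemsRead++;
        yield return value;
    }
}

// Without a predicate, Any() only needs the first element.
bool hasAny = Source().Any();
int readForAny = itemsRead;

// With a predicate, Any() stops at the first match (3 is the third element).
itemsRead = 0;
bool hasMatch = Source().Any(n => n == 3);
int readForMatch = itemsRead;

// When nothing matches, every element has to be read.
itemsRead = 0;
bool hasNoMatch = Source().Any(n => n > 10);
int readForNoMatch = itemsRead;

Console.WriteLine($"{hasAny}/{readForAny} {hasMatch}/{readForMatch} {hasNoMatch}/{readForNoMatch}");
```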
This is easy to discover: Any() returns a simple bool. Since a bool is always a bool, and not an IQueryable or IEnumerable (or any other type) that can have a custom implementation, we must conclude that Any() itself must calculate the boolean value to return.
The exception, of course, is when Any() is used inside a subquery on an IQueryable; in that case the LINQ provider will typically just analyse the presence of the call to Any() and convert it to corresponding SQL (for example).
Short question, short answer: Yes it will.
To find out whether any element of the list matches the given condition (or whether there is any element at all), the list has to be enumerated. As MSDN states:
This method does not return any one element of a collection. Instead, it determines whether the collection contains any elements.
The enumeration of source is stopped as soon as the result can be determined.
Deferred execution does not apply here, because this method delivers the result of an enumeration, not another IEnumerable.
I've been doing some research into the differences between the two methods Find() and First(). The only differences I could find (pun intended) were that Find() uses a for loop while First() uses a foreach loop, and that First() can be called without a parameter.
So is there any reason that I should use Find() instead of First()?
EDIT: I have already read C# Difference between First() and Find(), but it does not give any reason to use one over the other. It merely discusses how the two iterate over the list differently.
It's mostly a style preference, but in some cases there is a difference.
Find is defined only on a limited set of types (List<T> has an instance Find method; arrays have the static Array.Find), while First is defined as an extension method for all IEnumerable<T> and IQueryable<T> types. Using First lets you change the underlying collection type easily, including using the results of .Where and .Select. Converting an enumerable to a type that supports .Find is always slower than just calling .First.
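A rough LINQ to Objects sketch of these points, with arbitrary sample numbers. It also shows one behavioral difference worth knowing: when nothing matches, List<T>.Find returns default(T), whereas First throws InvalidOperationException (FirstOrDefault behaves like Find).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var numbers = new List<int> { 4, 8, 15, 16, 23, 42 };

// List<T>.Find is an instance method of List<T> taking a Predicate<T>.
int viaFind = numbers.Find(n => n > 10);

// Enumerable.First works on any IEnumerable<T>, including Where/Select results.
int viaFirst = numbers.First(n => n > 10);
int viaChain = numbers.Where(n => n % 2 == 0).Select(n => n * 10).First(n => n > 100);

// Arrays use the static Array.Find rather than an instance method.
int[] array = numbers.ToArray();
int viaArrayFind = Array.Find(array, n => n > 10);

// No match: Find returns default(T) (0 for int), First throws.
int notFound = numbers.Find(n => n > 100);
bool firstThrew = false;
try
{
    _ = numbers.First(n => n > 100);
}
catch (InvalidOperationException)
{
    firstThrew = true;
}

// FirstOrDefault mirrors Find's "not found" behavior.
int viaFirstOrDefault = numbers.FirstOrDefault(n => n > 100);

Console.WriteLine($"{viaFind} {viaFirst} {viaChain} {viaArrayFind} {notFound} {firstThrew} {viaFirstOrDefault}");
```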
On types where both are defined, performance is roughly the same, since both simply do a linear search through the elements. There is more information in the question you linked: C# Difference between First() and Find()
If you have a "queryable" enumeration (when using LINQ to SQL, for example), using .First could be significantly faster than converting the result to a collection that supports .Find (i.e. using .ToList) and then calling .Find. Such a queryable enumeration will likely convert .First into a database-specific query that returns a single result, while .ToList will likely have to pull in many more results for client-side filtering.
Is ToList() required after Select() in this code?
var names = someStorage.GetItems().Select(x => x.Name).ToList();
Calling Enumerable.ToList causes the data to be fetched; if you don't call it, no data is fetched and names remains a query.
The ToList(IEnumerable) method forces immediate query evaluation and returns a List that contains the query results. You can append this method to your query in order to obtain a cached copy of the query results. (MSDN)
It completely depends on what your code does subsequently. The ToList() method causes the query that you defined using Select() to run immediately against the datastore. Without it, execution is delayed until you first enumerate the names variable.
The other aspect is that, if you don't use ToList(), the query will run against the datastore each time you enumerate names, not just once as it does with ToList(). So it also depends heavily on how often you use names: if you enumerate it only once (e.g. in a single loop), there is no difference; otherwise ToList() is much more efficient.
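The re-evaluation can be observed with an in-memory stand-in. GetItems below is a hypothetical replacement for someStorage.GetItems(), and the queryRuns counter plays the role of a round trip to the datastore; a real database provider is not exercised here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for someStorage.GetItems(): counts how often it is enumerated.
int queryRuns = 0;
IEnumerable<(int Id, string Name)> GetItems()
{
    queryRuns++; // runs on the first MoveNext of each enumeration
    yield return (1, "Alice");
    yield return (2, "Bob");
}

// Without ToList(): defining the query runs nothing; each enumeration re-runs it.
var lazyNames = GetItems().Select(x => x.Name);
int runsAfterDefining = queryRuns;
int count = lazyNames.Count();                // first run
string joined = string.Join(",", lazyNames);  // second run
int runsWithoutToList = queryRuns;

// With ToList(): the source runs once and the list is reused afterwards.
queryRuns = 0;
var names = GetItems().Select(x => x.Name).ToList();
int countCached = names.Count;
string joinedCached = string.Join(",", names);
int runsWithToList = queryRuns;

Console.WriteLine($"{runsAfterDefining} {runsWithoutToList} {runsWithToList}");
```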
It depends on the variable you assign to: if you are assigning to a list, then you need to convert.
If you do not call ToList, it will be an IEnumerable<TSource>, which exposes an enumerator that supports a simple iteration over a collection of a specified type.
ToList converts the source sequence into a list. Some points to note:
The signature specifies List, not just IList. Of course it could return a subclass of List, but there seems little point.
It uses immediate execution: nothing is deferred here
The parameter (source) mustn't be null
It's optimized for the case when source implements ICollection
It always creates a new, independent list.
The last two points are worth a bit more discussion. Firstly, the optimization for ICollection isn't documented, but it makes a lot of sense:
List stores its data in an array internally
ICollection exposes a Count property so the List can create an array of exactly the right size to start with
ICollection exposes a CopyTo method so that the List can copy all the elements into the newly created array in bulk
Source to refer
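A quick check of the "new, independent list" point, using an array as the ICollection<T> source. The sample values are arbitrary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int[] source = { 1, 2, 3 };

// An array implements ICollection<T>, which lets ToList size the list up front.
List<int> copy = source.ToList();

// The list is an independent copy: changing one side doesn't affect the other.
source[0] = 100;
copy.Add(4);

// ToList on a List<T> also returns a different List<T> instance.
List<int> copyOfCopy = copy.ToList();
bool sameInstance = ReferenceEquals(copy, copyOfCopy);

Console.WriteLine($"{string.Join(",", source)} | {string.Join(",", copy)} | {sameInstance}");
```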
Given a huge collection of objects, is there a performance difference between the following?
Collection.Contains:
myCollection.Contains(myElement)
Enumerable.Any:
myCollection.Any(currentElement => currentElement == myElement)
Contains() is an instance method, and its performance depends largely on the collection itself. For instance, Contains() on a List is O(n), while Contains() on a HashSet is O(1).
Any() is an extension method, and will simply go through the collection, applying the delegate to each element until one matches. It therefore has a complexity of O(n).
Any() is more flexible however since you can pass a delegate. Contains() can only accept an object.
It depends on the collection. If you have an ordered or hashed collection, then Contains might do a smart search (binary, hash, B-tree, etc.), while with Any() you are basically stuck with enumerating until you find it (assuming LINQ to Objects).
Also note that in your example, Any() uses the == operator, which (unless the element type overloads ==) checks for reference equality, while Contains uses IEquatable<T> or the Equals() method, which might be overridden.
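The equality difference can be reproduced without defining a custom class: two equal strings held in object-typed variables behave like a type that overrides Equals but not ==. This is an illustrative stand-in, not the asker's element type.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Two equal strings stored in two different objects.
object original = "ab";
object myElement = new string(new[] { 'a', 'b' });

var myCollection = new List<object> { original };

// List<T>.Contains uses EqualityComparer<T>.Default, which calls the overridden
// string.Equals, so value equality is used.
bool viaContains = myCollection.Contains(myElement);

// With both operands typed as object, == compares references.
bool viaAnyOperator = myCollection.Any(currentElement => currentElement == myElement);

// Calling Equals in the predicate matches the behavior of Contains.
bool viaAnyEquals = myCollection.Any(currentElement => object.Equals(currentElement, myElement));

Console.WriteLine($"{viaContains} {viaAnyOperator} {viaAnyEquals}");
```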
I suppose that depends on the type of myCollection, which dictates how Contains() is implemented. If it is a sorted binary tree, for example, it could search more efficiently. It may also take the element's hash code into account. Any(), on the other hand, will enumerate through the collection until it finds the first element that satisfies the condition; it cannot take advantage of a smarter search method even if the collection has one.
Contains() is also available as an extension method (Enumerable.Contains), and it can be fast if you use it in the correct way.
For example:
var result = context.Projects.Where(x => lstBizIds.Contains(x.businessId)).Select(x => x.projectId).ToList();
With Entity Framework, this is translated into a single SQL query along these lines:
SELECT projectId
FROM Projects
INNER JOIN (VALUES (1), (2), (3), (4), (5)) AS Data(Item) ON Projects.businessId = Data.Item
Any(), on the other hand, iterates through the collection until it finds a match, which is O(n) (assuming LINQ to Objects).
Hope this will work....
Could someone explain how Union works in LINQ?
I've been told that it merges two sequences and removes duplicates.
But can I somehow customize the duplicate-removal behavior? Let's say I want to use the element from the second sequence in case of a duplicate, or the one from the first sequence.
Or what if I want to somehow combine those values in the resulting sequence?
How should that be implemented?
Update
I guess I described the problem incorrectly, let's say we have some value:
class Value {
    public string name;
    public int whatever;
}
and the comparer used performs an x.name == y.name check.
And let's say that sometimes I know I should take the element from the second sequence, because its whatever field is newer / better than the whatever field of the first sequence.
Anyway, I would use the sequence1.Union(sequence2) or sequence2.Union(sequence1) variation of the methods.
Thank you
You can use second.Union(first) instead of first.Union(second). That way, it will keep the items from second rather than the items from first.
When the object returned by this method is enumerated, Union enumerates first and second in that order and yields each element that has not already been yielded.
http://msdn.microsoft.com/en-us/library/bb341731.aspx
So the elements from whichever sequence you use as the left parameter take precedence over the elements from the right parameter.
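A minimal sketch of that ordering rule. StringComparer.OrdinalIgnoreCase is used here only as a convenient comparer that treats non-identical elements as duplicates, similar to the name-only comparer in the question.

```csharp
using System;
using System.Linq;

// "apple" and "APPLE" count as duplicates under this comparer.
string[] first = { "apple", "Banana" };
string[] second = { "APPLE", "cherry" };

// The left-hand sequence wins: "apple" is kept and "APPLE" is dropped.
string[] firstWins = first.Union(second, StringComparer.OrdinalIgnoreCase).ToArray();

// Swapping the receiver keeps the element from the other sequence instead.
string[] secondWins = second.Union(first, StringComparer.OrdinalIgnoreCase).ToArray();

Console.WriteLine(string.Join(",", firstWins));  // apple,Banana,cherry
Console.WriteLine(string.Join(",", secondWins)); // APPLE,cherry,Banana
```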
The important thing about this is that it's well-defined and documented behavior, not just an implementation detail that might change in the next version of .NET.
As a side note, when you implement an IEqualityComparer<T> it's important to keep Equals and GetHashCode consistent. In this case I prefer to explicitly supply an equality comparer to the Union method, rather than having the object's own Equals return true for objects that are not identical for all purposes.
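One way to supply such a comparer explicitly, assuming .NET 8 or later for EqualityComparer<T>.Create. Value tuples stand in for the question's Value class, and the byName comparer is hypothetical; both its equality check and its hash code use name only, so they stay consistent.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Tuples stand in for the question's Value class (name, whatever).
var sequence1 = new[] { (name: "a", whatever: 1), (name: "b", whatever: 1) };
var sequence2 = new[] { (name: "a", whatever: 2), (name: "c", whatever: 2) };

// Hypothetical name-only comparer (EqualityComparer<T>.Create needs .NET 8+).
// Equal names must give equal hash codes, so both delegates look at name only.
var byName = EqualityComparer<(string name, int whatever)>.Create(
    (x, y) => x.name == y.name,
    v => v.name.GetHashCode());

// Passing the comparer explicitly leaves the tuple's own equality (all fields) alone.
var preferSecond = sequence2.Union(sequence1, byName).ToList();

// Default tuple equality compares every field, so ("a", 1) and ("a", 2) both survive.
var defaultEquality = sequence2.Union(sequence1).ToList();

Console.WriteLine(string.Join(" ", preferSecond));
```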
If the elements are duplicates, then it doesn't matter which list they are taken from, unless of course your equality comparer doesn't take all properties of the element into account.
If they aren't really duplicates then they'll both appear in the resultant union.
UPDATE
Given your new information, at a minimum you should write a new equality comparer that takes whatever into account. You can't just use sequence1.Union(sequence2) or sequence2.Union(sequence1) unless all the elements should be taken from one sequence or the other.
At the extreme you'll have to write your own Union extension method which does this for you.
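Instead of writing a Union extension, a common alternative (swapped in here, not what the answer above proposes) is to concatenate the sequences and resolve each group of duplicates yourself. This sketch assumes .NET 6 or later for MaxBy and uses value tuples in place of the question's Value class.

```csharp
using System;
using System.Linq;

var sequence1 = new[] { (name: "a", whatever: 5), (name: "b", whatever: 1) };
var sequence2 = new[] { (name: "a", whatever: 2), (name: "c", whatever: 3) };

// Group duplicates by name and keep the element with the largest whatever.
// MaxBy requires .NET 6 or later.
var merged = sequence1
    .Concat(sequence2)
    .GroupBy(v => v.name)
    .Select(g => g.MaxBy(v => v.whatever))
    .ToList();

// A group can also be combined into a new value, e.g. by summing whatever.
var combined = sequence1
    .Concat(sequence2)
    .GroupBy(v => v.name)
    .Select(g => (name: g.Key, whatever: g.Sum(v => v.whatever)))
    .ToList();

Console.WriteLine(string.Join(" ", merged));
Console.WriteLine(string.Join(" ", combined));
```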
What is the quickest way to find out which .NET Framework LINQ methods (e.g. the IEnumerable LINQ methods) are implemented using deferred execution and which are not?
While coding, I often wonder whether a given method will be executed right away. The only way I know to find out is to check the MSDN documentation. Is there a quicker way: a directory, a list somewhere on the web, a cheat sheet, or any other trick up your sleeve that you can share? If so, please do. This would help many LINQ noobs (like me) make fewer mistakes. The only other option is to keep checking the documentation until I've used the methods enough to remember them (which is hard for me; I tend not to remember "anything" that is documented somewhere and can be looked up :D).
Generally methods that return a sequence use deferred execution:
IEnumerable<X> ---> Select ---> IEnumerable<Y>
and methods that return a single object don't:
IEnumerable<X> ---> First ---> Y
So, methods like Where, Select, Take, Skip, GroupBy and OrderBy use deferred execution because they can, while methods like First, Single, ToList and ToArray don't because they can't.
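A small sketch of this rule of thumb: the Where query is only a description until it is enumerated, while Count, First and ToList run immediately. The sample data is arbitrary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var source = new List<int> { 1, 2, 3 };

// Sequence-returning operators only describe the query.
IEnumerable<int> deferred = source.Where(n => n > 1);

// Single-value and materializing operators run right away.
int countNow = source.Count(n => n > 1);
int firstNow = source.First(n => n > 1);
List<int> listNow = source.Where(n => n > 1).ToList();

// Change the source after the queries have been defined.
source.Add(4);

// The deferred query sees the change when it is finally enumerated;
// the values computed earlier do not.
Console.WriteLine(string.Join(",", deferred)); // 2,3,4
Console.WriteLine($"{countNow} {firstNow} {string.Join(",", listNow)}"); // 2 2 2,3
```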
There are also two types of deferred execution. For example the Select method will only get one item at a time when it's asked to produce an item, while the OrderBy method will have to consume the entire source when asked to return the first item. So, if you chain an OrderBy after a Select, the execution will be deferred until you get the first item, but then the OrderBy will ask the Select for all the items.
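The streaming versus buffering difference can be observed by logging selector calls. This is a LINQ to Objects sketch with arbitrary data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var log = new List<int>();
int[] numbers = { 3, 1, 2 };

// Select streams: the selector runs only for the items actually requested.
var selected = numbers.Select(n => { log.Add(n); return n * 10; });
int firstSelected = selected.First();
int callsForSelect = log.Count; // 1

// OrderBy is deferred too, but it must pull every item from the Select
// before it can return the first (smallest) one.
log.Clear();
var ordered = numbers.Select(n => { log.Add(n); return n * 10; }).OrderBy(n => n);
int callsBeforeEnumerating = log.Count; // 0: nothing has run yet
int firstOrdered = ordered.First();
int callsForOrderBy = log.Count; // 3

Console.WriteLine($"{firstSelected}/{callsForSelect} {callsBeforeEnumerating} {firstOrdered}/{callsForOrderBy}");
```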
The guidelines I use:
Always assume any API that returns IEnumerable<T> or IQueryable<T> can and probably will use deferred execution. If you're consuming such an API and need to iterate through the results more than once (e.g. to get a Count), then convert to a collection before doing so (usually by calling the .ToList() extension method).
If you're exposing an enumeration, always expose it as a collection (ICollection<T> or IList<T>) if that is what your clients will normally use. For example, a data access layer will often return a collection of domain objects. Only expose IEnumerable<T> if deferred execution is a reasonable option for the API you're exposing.
Actually, there's more; in addition you need to consider buffered vs non-buffered. OrderBy can be deferred, but when iterated must consume the entire stream.
In general, anything in LINQ that returns IEnumerable tends to be deferred, while Min etc. (which return values) are not deferred. Whether a method buffers can usually be reasoned out, but frankly Reflector is a pretty quick way of finding out for sure. Note, though, that this is often an implementation detail anyway.
For actual "deferred execution", you want methods that work on an IQueryable. Method chains based on an IQueryable build an expression tree representing your query. Only when you call a method that takes the IQueryable and produces a concrete result (ToList() and similar), or when you enumerate it (including after AsEnumerable()), is the tree evaluated by the LINQ provider and the actual result returned. LINQ to Objects is built into the Framework, as are LINQ to SQL and now the Entity Framework; other ORMs and persistence-layer frameworks also offer LINQ providers. Any IEnumerable in the framework can be wrapped as an IQueryable using the AsQueryable() extension method, and LINQ providers that translate the expression tree, like ORMs, provide an IQueryable as a jump-off point for a LINQ query against their data.
Even against an IEnumerable, some of the LINQ methods are "lazy". Because the beauty of an IEnumerable is that you don't have to know about all of it, only the current element and whether there's another, LINQ methods that act on an IEnumerable often return an iterator class that spits out an object from its source whenever methods later in the chain ask for one. Any operation that doesn't require knowledge of the entire set can be lazily evaluated (Select and Where are two big ones; there are others). Ones that do require knowing the entire collection (sorting via OrderBy, grouping with GroupBy, and aggregates like Min and Max) must read their entire source enumerable (OrderBy and GroupBy buffer it into an array or lookup first), forcing evaluation of all elements through all upstream nodes. Generally, you want these to come late in a method chain if you can help it.
Here's a summary of different ways you can know if your query will be deferred or not:
If you're using query expression syntax instead of query method syntax, then it will be deferred.
If you're using query method syntax, it MIGHT be deferred depending on what it returns.
Hover over the var keyword (if that's what you're using as the type of the variable that stores the query). If it says IEnumerable<T>, then it'll be deferred.
Try to iterate over the query using a foreach. If you get an error saying it cannot iterate over your variable because it does not support GetEnumerator(), you know the query is not deferred.
Source: Essential LINQ
If you convert the collection to an IQueryable using .AsQueryable(), your LINQ calls will use deferred execution.
See here: Using IQueryable with Linq
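A minimal sketch of what AsQueryable does with an in-memory array: Where only adds a node to the expression tree, and the tree is evaluated when the query is enumerated, so it sees a change made to the array after the query was built. With LINQ to Objects the wrapper is EnumerableQuery<T>; a database provider would translate the tree instead.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

int[] numbers = { 1, 2, 3 };

// AsQueryable wraps the array in an IQueryable (an EnumerableQuery<int> here).
IQueryable<int> query = numbers.AsQueryable().Where(n => n > 1);

// Queryable.Where only records a method-call node in the expression tree.
bool isMethodCall = query.Expression.NodeType == ExpressionType.Call;

// The tree is evaluated on enumeration, so the later change is visible.
numbers[0] = 10;
int[] results = query.ToArray();

Console.WriteLine(string.Join(",", results)); // 10,2,3
```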