I've been doing some research into the differences between the two methods Find() and First(). The only differences I could find (pun intended) were that Find() uses a for loop while First() uses a foreach loop, and that First() does not require a parameter to be called.
So is there any reason that I should use Find() instead of First()?
EDIT: I have already read "C# Difference between First() and Find()", but it does not give any reason to use one over the other. It merely discusses how the two iterate over the list differently.
It is mostly a style preference, but in some cases there is a real difference.
Find is defined on a limited set of types (List<T>, and arrays via Array.Find), while First is defined as an extension method for all IEnumerable<T> and IQueryable<T> types. Using First lets you change the underlying collection type easily, including using the results of .Where and .Select. Converting an enumerable to a type that supports .Find is always slower than just calling .First.
Performance of both methods is roughly the same on the types that offer both, since both simply do a linear search through the elements. More info is in the question you've linked: C# Difference between First() and Find().
If you have a "queryable" sequence (when using LINQ-to-SQL, for example), using .First can be significantly faster than converting the result to a collection that supports .Find (i.e. calling .ToList) and then calling .Find. Such a queryable will likely translate .First into a database-specific query that returns one result, while .ToList will likely have to pull in many more results for client-side filtering.
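As a rough illustration of the points above, here is a minimal sketch; the list contents and variable names are invented for the example. It also shows one behavioural difference worth knowing: when nothing matches, List<T>.Find returns default(T), whereas First throws InvalidOperationException (FirstOrDefault is the closer equivalent).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var numbers = new List<int> { 3, 8, 15, 22 };

// List<T>.Find is an instance method on List<T> and takes a Predicate<T>.
int viaFind = numbers.Find(n => n > 10);

// Enumerable.First is an extension method available on any IEnumerable<T>.
int viaFirst = numbers.First(n => n > 10);

// First also works on the result of other LINQ operators, where Find is not available.
IEnumerable<int> doubled = numbers.Select(n => n * 2);
int firstDoubled = doubled.First(n => n > 10);

// No match: Find returns default(int), First throws.
int missingViaFind = numbers.Find(n => n > 100);
bool firstThrew = false;
try
{
    numbers.First(n => n > 100);
}
catch (InvalidOperationException)
{
    firstThrew = true;
}

Console.WriteLine($"{viaFind} {viaFirst} {firstDoubled} {missingViaFind} {firstThrew}");
// prints "15 15 16 0 True"
```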
Is it required or not to use ToList() after Select() in this code:
var names = someStorage.GetItems().Select(x => x.Name).ToList();
The Enumerable.ToList method causes the data to be fetched; if you do not call it, the data won't be fetched and names will remain an unexecuted query.
The ToList<TSource>(IEnumerable<TSource>) method forces immediate query evaluation and returns a List<T> that contains the query results. You can append this method to your query in order to obtain a cached copy of the query results. (MSDN)
It completely depends on what your code does subsequently. The ToList() method causes the query that you defined by using Select() to run immediately against the datastore. Without it, its execution would be delayed until you access the names variable for the first time.
The other aspect is that, if you don't use ToList(), the query will be run against the datastore each time you enumerate the names variable, not just once as is the case with ToList(). So it also depends heavily on how often you use the names variable: if you use it only once (e.g. in a loop), there is no difference; otherwise ToList() is much more efficient.
It depends on the variable you are assigning to: if you are assigning to a list, then you need to convert.
If you do not call ToList, the result will be an IEnumerable<TSource>, which exposes an enumerator that supports simple iteration over a collection of a specified type.
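The re-execution described above can be made visible with a counter. This is only a sketch against an in-memory array (the items array and the projections counter are invented here), but the same principle applies to a query against a datastore.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int projections = 0;
var items = new[] { "a", "b", "c" };

// Deferred: defining the query does not run the lambda.
IEnumerable<string> lazyNames = items.Select(x => { projections++; return x.ToUpper(); });
int afterDefinition = projections;

// Each enumeration runs the whole query again.
foreach (var name in lazyNames) { }
foreach (var name in lazyNames) { }
int afterTwoLoops = projections;

// With ToList the query runs once; later loops read the cached list.
projections = 0;
List<string> cachedNames = items.Select(x => { projections++; return x.ToUpper(); }).ToList();
foreach (var name in cachedNames) { }
foreach (var name in cachedNames) { }
int afterToList = projections;

Console.WriteLine($"{afterDefinition} {afterTwoLoops} {afterToList}");
// prints "0 6 3"
```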
ToList converts the source sequence into a list. Some points to note:
The signature specifies List<T>, not just IList<T>. Of course it could return a subclass of List<T>, but there seems little point.
It uses immediate execution - nothing is deferred here
The parameter (source) mustn't be null
It's optimized for the case when source implements ICollection<T>
It always creates a new, independent list.
The last two points are worth a bit more discussion. Firstly, the optimization for ICollection<T> isn't documented, but it makes a lot of sense:
List<T> stores its data in an array internally
ICollection<T> exposes a Count property so the List<T> can create an array of exactly the right size to start with
ICollection<T> exposes a CopyTo method so that the List<T> can copy all the elements into the newly created array in bulk
Source to refer
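The "new, independent list" point can be checked directly. A small sketch, with the sample values chosen arbitrarily:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var source = new List<int> { 1, 2, 3 };

// source is itself an ICollection<int>, which is the case the
// (undocumented) optimization described above is aimed at.
List<int> copy = source.ToList();

// The copy is a different object...
bool sameInstance = object.ReferenceEquals(source, copy);

// ...so later changes to the source do not show up in it.
source.Add(4);

Console.WriteLine($"{sameInstance} {source.Count} {copy.Count}");
// prints "False 4 3"
```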
I have a linq to entity query.
Will Any() force LINQ execution (like ToList() does)?
There is a very good MSDN article, Classification of Standard Query Operators by Manner of Execution, which describes all the standard LINQ operators. As you can see from its table, Any is executed immediately (as are all operators that return a single value). You can always refer to this table if you have doubts about how an operator executes.
Yes, and no. The Any method will read items from the source right away, but it's not guaranteed to read all items.
The Any method will enumerate items from the source, but only as many as needed to determine the result.
Without any parameter, the Any method will only try to read the first item from the source.
With a parameter, the Any method will only read items from the source until it finds one that satisfies the condition. All items are read from the source only if none before the last one satisfies the condition.
This is easy to discover: Any() returns a simple bool. Since a bool is always a bool, and not an IQueryable or IEnumerable (or any other type) that can have a custom implementation, we must conclude that Any() itself must calculate the boolean value to return.
The exception is of course if the Any() is used inside a subquery on an IQueryable, in which case the LINQ provider will typically just analyse the presence of the call to Any() and convert it to corresponding SQL (for example).
Short question, short answer: Yes it will.
To find out if any element of the list matches the given condition (or if there is any element at all), the list will have to be enumerated. As MSDN states:
This method does not return any one element of a collection. Instead, it determines whether the collection contains any elements.
The enumeration of source is stopped as soon as the result can be determined.
Deferred execution does not apply here, because this method delivers the result of an enumeration, not another IEnumerable.
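The "executes immediately, but stops as early as it can" behaviour described in the answers above can be observed with a counting iterator. This is a sketch for LINQ to Objects only; Numbers and produced are names invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int produced = 0;

// Yields 1..5 and counts how many items were actually requested.
IEnumerable<int> Numbers()
{
    for (int i = 1; i <= 5; i++)
    {
        produced++;
        yield return i;
    }
}

// No predicate: only the first item is requested.
bool anyAtAll = Numbers().Any();
int readForAny = produced;

// Predicate satisfied by the third item: reading stops there.
produced = 0;
bool anyAboveTwo = Numbers().Any(n => n > 2);
int readForMatch = produced;

// Predicate never satisfied: the whole source is read.
produced = 0;
bool anyAboveTen = Numbers().Any(n => n > 10);
int readForNoMatch = produced;

Console.WriteLine($"{anyAtAll}/{readForAny} {anyAboveTwo}/{readForMatch} {anyAboveTen}/{readForNoMatch}");
// prints "True/1 True/3 False/5"
```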
I have an ILookup<TKey,TElement> lookup from which I fairly often get elements and iterate through them using LINQ or foreach. I look up like this: IEnumerable<TElement> results = lookup[key];.
Thus, results needs to be enumerated at least once every time I use lookup results (and even more if I iterate multiple times without calling .ToList() first).
Even though it's not as "clean", wouldn't it be better (performance-wise) to use a Dictionary<TKey,List<TElement>>, so that all results for a key are only enumerated on construction of the dictionary? Just how taxing is ToList()?
ToLookup, like all the other ToXXX LINQ methods, uses immediate execution. The resulting object has no reference to the original source. It effectively does build a Dictionary<TKey, List<TElement>> - not those exact types, perhaps, but equivalent to it.
Note that there's a difference though, which may or may not be useful to you - the indexer for a lookup returns an empty sequence if you give it a key which doesn't exist, rather than throwing an exception. That can make life much easier if you want to be able to just index it by any key and iterate over the corresponding values.
Also note that although it's not explicitly documented, the implementation used for the value sequences does implement ICollection<T>, so calling the LINQ Count() method is O(1) - it doesn't need to iterate over all the elements.
See my Edulinq post on ToLookup for more details.
Assuming the implementation is System.Linq.Lookup (does ILookup have any other implementations?), the elements presented in lookup[key] are stored in an array of elements as a field of System.Linq.Lookup.Grouping. Repeatedly looking them up won't cause a re-iteration of source. Of course, rebuilding the Lookup will be more costly, but once built, the source is no longer accessed.
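A short sketch of the behaviour described in these answers, using made-up sample words: the lookup is built eagerly, does not track later changes to the source, and returns an empty sequence for a missing key where a dictionary throws.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var words = new[] { "apple", "avocado", "banana" };

// ToLookup runs immediately and keeps its own copy of the grouping.
ILookup<char, string> byInitial = words.ToLookup(w => w[0]);

// Changing the source afterwards does not affect the lookup.
words[0] = "cherry";
int aCount = byInitial['a'].Count();

// A missing key yields an empty sequence rather than an exception.
bool missingIsEmpty = !byInitial['z'].Any();

// The roughly equivalent dictionary throws for a missing key.
Dictionary<char, List<string>> dict = words
    .GroupBy(w => w[0])
    .ToDictionary(g => g.Key, g => g.ToList());
bool dictionaryThrew = false;
try
{
    _ = dict['z'];
}
catch (KeyNotFoundException)
{
    dictionaryThrew = true;
}

Console.WriteLine($"{aCount} {missingIsEmpty} {dictionaryThrew}");
// prints "2 True True"
```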
Just in case you're wondering how this came up, I'm working with some resultsets from Entity Framework.
I have an object that is an IEnumerable<IEnumerable<string>>; basically, a list of lists of strings.
I want to merge all the lists of strings into one big list of strings.
What is the best way to do this in C#.net?
Use the LINQ SelectMany method:
IEnumerable<IEnumerable<string>> myOuterList = // some IEnumerable<IEnumerable<string>>...
IEnumerable<String> allMyStrings = myOuterList.SelectMany(sl => sl);
To be very clear about what's going on here (since I hate the thought of people thinking this is some kind of sorcery, and I feel bad that some other folks deleted the same answer):
SelectMany is an extension method (a static method that through syntactic sugar looks like an instance method on a specific type) on IEnumerable<T>. It takes your original enumeration of enumerations and a function for converting each item of that into an enumeration.
Because the items are already enumerations, the conversion function is simple: just return the input (sl => sl means "take a parameter named sl and return it"). SelectMany then provides an enumeration over each of these in turn, resulting in your "flattened" list.
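For a version of the snippet above that can be run as-is, here is a sketch with some sample data filled in (the nested lists are invented for the example):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sample data standing in for the real list of lists.
IEnumerable<IEnumerable<string>> myOuterList = new List<List<string>>
{
    new List<string> { "a", "b" },
    new List<string>(),            // empty inner lists simply contribute nothing
    new List<string> { "c", "d" },
};

// sl => sl returns each inner sequence unchanged; SelectMany walks them in order.
List<string> allMyStrings = myOuterList.SelectMany(sl => sl).ToList();

Console.WriteLine(string.Join(",", allMyStrings));
// prints "a,b,c,d"
```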
Use the Concat method:
firstEnumerable.Concat(secondEnumerable)
Using SelectMany will force an additional evaluation of each element of both enumerations that you don't need.
What is the quickest way to find out which .NET Framework LINQ methods (e.g. the IEnumerable LINQ methods) are implemented using deferred execution and which are not?
While coding, I often wonder whether a given method will be executed right away. The only way to find out is to go to the MSDN documentation to make sure. Would there be any quicker way, any directory, any list somewhere on the web, any cheat sheet, any other trick up your sleeve that you can share? If yes, please do so. This will help many LINQ noobs (like me) to make fewer mistakes. The only other option is to check the documentation until one has used the methods enough to remember (which is hard for me, I tend not to remember "anything" which is documented somewhere and can be looked up :D).
Generally methods that return a sequence use deferred execution:
IEnumerable<X> ---> Select ---> IEnumerable<Y>
and methods that return a single object don't:
IEnumerable<X> ---> First ---> Y
So, methods like Where, Select, Take, Skip, GroupBy and OrderBy use deferred execution because they can, while methods like First, Single, ToList and ToArray don't because they can't.
There are also two types of deferred execution. For example the Select method will only get one item at a time when it's asked to produce an item, while the OrderBy method will have to consume the entire source when asked to return the first item. So, if you chain an OrderBy after a Select, the execution will be deferred until you get the first item, but then the OrderBy will ask the Select for all the items.
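The two kinds of deferred execution described above can be told apart by counting how many items are pulled from the source. A sketch for LINQ to Objects, where Source and pulled are names invented for the example:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int pulled = 0;

// Counts every item that a downstream operator actually requests.
IEnumerable<int> Source()
{
    foreach (var n in new[] { 5, 3, 9, 1 })
    {
        pulled++;
        yield return n;
    }
}

// Streaming: Select asks the source for one item per item it produces.
IEnumerable<int> streamed = Source().Select(n => n * 2);
int pulledAfterDefinition = pulled;
int firstStreamed = streamed.First();
int pulledByStreaming = pulled;

// Buffering: still deferred, but the first result needs the whole source.
pulled = 0;
IEnumerable<int> ordered = Source().Select(n => n * 2).OrderBy(n => n);
int pulledBeforeEnumeration = pulled;
int smallest = ordered.First();
int pulledByOrdering = pulled;

Console.WriteLine($"{pulledAfterDefinition} {firstStreamed}/{pulledByStreaming} {pulledBeforeEnumeration} {smallest}/{pulledByOrdering}");
// prints "0 10/1 0 2/4"
```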
The guidelines I use:
Always assume any API that returns IEnumerable<T> or IQueryable<T> can and probably will use deferred execution. If you're consuming such an API, and need to iterate through the results more than once (e.g. to get a Count), then convert to a collection before doing so (usually by calling the .ToList() extension method).
If you're exposing an enumeration, always expose it as a collection (ICollection<T> or IList<T>) if that is what your clients will normally use. For example, a data access layer will often return a collection of domain objects. Only expose IEnumerable<T> if deferred execution is a reasonable option for the API you're exposing.
Actually, there's more; in addition you need to consider buffered vs non-buffered. OrderBy can be deferred, but when iterated must consume the entire stream.
In general, anything in LINQ that returns IEnumerable tends to be deferred - while Min etc (which return values) are not deferred. The buffering (vs not) can usually be reasoned, but frankly reflector is a pretty quick way of finding out for sure. But note that often this is an implementation detail anyway.
For actual "deferred execution", you want methods that work on an IQueryable. Method chains based on an IQueryable build an expression tree representing your query. Only when you call a method that takes the IQueryable and produces a concrete result (ToList() and similar), or when you enumerate it, is the tree evaluated by the LINQ provider (LINQ to Objects is built into the Framework, as are LINQ to SQL and now Entity Framework; other ORMs and persistence-layer frameworks also offer LINQ providers) and the actual result returned. Note that AsEnumerable() does not run the query by itself; it only makes the operators that follow it run in memory once the sequence is enumerated. Any IEnumerable in the framework can be wrapped as an IQueryable using the AsQueryable() extension method, and LINQ providers that translate the expression tree, like ORMs, will provide an AsQueryable() as a jump-off point for a LINQ query against their data.
Even against an IEnumerable, some of the Linq methods are "lazy". Because the beauty of an IEnumerable is that you don't have to know about all of it, only the current element and whether there's another, Linq methods that act on an IEnumerable often return an iterator class that spits out an object from its source whenever methods later in the chain ask for one. Any operation that doesn't require knowledge of the entire set can be lazily evaluated (Select and Where are two big ones; there are others). Ones that do require knowing the entire collection (sorting via OrderBy, grouping with GroupBy, and aggregates like Min and Max) will slurp their entire source enumerable into a List or Array and work on it, forcing evaluation of all elements through all higher nodes. Generally, you want these to come late in a method chain if you can help it.
Here's a summary of different ways you can know if your query will be deferred or not:
If you're using query expression syntax instead of query method syntax, then it will be deferred.
If you're using query method syntax, it MIGHT be deferred depending on what it returns.
Hover over the var keyword (if that's what you're using as the type for the variable used to store the query). If it says IEnumerable<T> then it'll be deferred.
Try to iterate over the query using a foreach. If you get an error saying it cannot iterate over your variable because it does not support GetEnumerator(), you know the query is not deferred.
Source: Essential Linq
If you cast the collection to an IQueryable using .AsQueryable(), your LINQ calls will use the deferred execution.
See here: Using IQueryable with Linq