How to take the first element of IEnumerable without iteration? - c#

I need to take the first element of an IEnumerable, but without iterating it. I used First(), but it can cause some bugs because it iterates. I know I can do it with the enumerator, but how?

IEnumerable<T> is what the name says, an enumerable thing. The only method it provides is IEnumerator<T> GetEnumerator(). Every extension method will enumerate the IEnumerable<T> to some extent.
If lazy evaluation or multiple enumeration is problematic (I've had that with result sets from the database that were evaluated after the connection had been disposed, see here), consider enumerating once, e.g. by converting the IEnumerable<T> to a List<T> with Enumerable.ToList().
Remarks: If enumerating the enumerable causes errors, your design is flawed. Consider using another interface.
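The disposed-connection problem described above can be reproduced without a database. The following is only a sketch: FakeConnection, ReadRows, LoadLazy and LoadMaterialized are hypothetical names invented for illustration, standing in for a real connection and query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a database connection.
public sealed class FakeConnection : IDisposable
{
    private bool _disposed;

    // Lazy iterator: nothing in this body runs until the caller enumerates.
    public IEnumerable<int> ReadRows()
    {
        for (int i = 1; i <= 3; i++)
        {
            if (_disposed) throw new ObjectDisposedException(nameof(FakeConnection));
            yield return i;
        }
    }

    public void Dispose() => _disposed = true;
}

public static class Queries
{
    // Returns a lazy sequence that outlives the connection,
    // so enumerating the result later throws ObjectDisposedException.
    public static IEnumerable<int> LoadLazy()
    {
        using (var connection = new FakeConnection())
        {
            return connection.ReadRows();
        }
    }

    // Enumerates exactly once, while the connection is still open.
    public static List<int> LoadMaterialized()
    {
        using (var connection = new FakeConnection())
        {
            return connection.ReadRows().ToList();
        }
    }
}
```

Calling Queries.LoadLazy().First() fails because the iterator body only starts running after the using block has ended, whereas Queries.LoadMaterialized() returns a list that can be read as often as needed.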

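You can try FirstOrDefault. It returns the first element, or the default value of the element type if the sequence is empty.
using System;
using System.Linq;
int[] numbers = { };
int first = numbers.FirstOrDefault(); // empty array, so this is default(int)
Console.WriteLine(first); // prints 0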
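Since the question mentions doing this "with the enumerator", here is a rough sketch of that approach. TryGetFirst is a hypothetical helper name, not a framework method. It advances the enumerator exactly once, which is the minimum needed to obtain a first element from an IEnumerable<T>.

```csharp
using System;
using System.Collections.Generic;

public static class FirstElement
{
    // Hypothetical helper: reads at most one element from the source.
    public static bool TryGetFirst<T>(IEnumerable<T> source, out T first)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));

        using (IEnumerator<T> enumerator = source.GetEnumerator())
        {
            if (enumerator.MoveNext())
            {
                first = enumerator.Current;
                return true;
            }
        }

        first = default(T); // nothing to return for an empty sequence
        return false;
    }
}
```

Usage would look like `if (FirstElement.TryGetFirst(items, out var head)) { ... }`. Note that this still starts an enumeration; it just stops after the first step.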

Related

FOR-EACH over an IEnumerable vs a List

Is there any benefit or difference if my for-each loop is going through the method argument if I pass in that argument as an IEnumerable or if I pass that argument as a List?
If your IEnumerable is implemented by a List then no; there is no difference. There is a big conceptual difference though: IEnumerable says "I can be enumerated", which also means that the number of items is not known and the enumeration cannot be reversed or randomly accessed. List says "I am a fully formed list, already populated; I can be reversed and randomly accessed".
So you should generally design your function's interface to accept the least functionality compatible with your operation; if you are only going to enumerate forwards, then accept IEnumerable - this allows your function to be used in more scenarios.
If you made your function accept only List<T>, then any caller holding an array or an IEnumerable would have to convert the input into a List<T> before calling your function, which may well perform worse than simply passing the array or IEnumerable through directly. In this sense, accepting an IEnumerable invites better-performing code.
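To illustrate the point about accepting the least specific type, here is a small sketch. Totals, Sum and Demo are hypothetical names made up for this example.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class Totals
{
    // Only enumerates forwards, so IEnumerable<int> is all it needs.
    public static int Sum(IEnumerable<int> values)
    {
        int total = 0;
        foreach (int value in values)
        {
            total += value;
        }
        return total;
    }

    // The same method serves an array, a List<int> and a lazy sequence
    // without any conversion on the caller's side.
    public static int[] Demo()
    {
        return new[]
        {
            Sum(new[] { 1, 2, 3 }),
            Sum(new List<int> { 1, 2, 3 }),
            Sum(Enumerable.Range(1, 3)),
        };
    }
}
```

Had Sum been declared as taking List<int>, the first and third calls would each need a ToList() first.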
In the general case, there can be a difference if the collection has an explicit interface implementation of IEnumerable.
List has the explicit implementation, but does not change behavior. There is no difference in your case.
See: https://referencesource.microsoft.com/#mscorlib/system/collections/generic/list.cs looking at GetEnumerator and similar
No, there isn't. In both cases the for-each is translated to something like this:
var enumerator = input.GetEnumerator();
while (enumerator.MoveNext())
{
    // loop body.
    // The current value is accessed through: enumerator.Current
}
Additionally, if the enumerator is disposable, it will be disposed after the loop.
Jon Skeet gives a detailed description here.
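For completeness, here is a sketch of the expansion including the disposal step mentioned above. It approximates what the compiler produces for an IEnumerable<T> and is not the exact generated code; ForEachExpansion and Collect are hypothetical names.

```csharp
using System.Collections.Generic;

public static class ForEachExpansion
{
    // Hand-written equivalent of: foreach (T item in input) result.Add(item);
    public static List<T> Collect<T>(IEnumerable<T> input)
    {
        var result = new List<T>();
        IEnumerator<T> enumerator = input.GetEnumerator();
        try
        {
            while (enumerator.MoveNext())
            {
                T item = enumerator.Current; // the loop variable
                result.Add(item);            // the loop body
            }
        }
        finally
        {
            // IEnumerator<T> is IDisposable, so it is disposed even if the body throws.
            enumerator.Dispose();
        }
        return result;
    }
}
```

The method behaves the same whether the argument is a List<T>, an array or an iterator, which is the point made in this answer.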
If you pass the same object, it doesn't matter whether your method accepts IEnumerable or List.
However, if all you're going to do inside the method is enumerate the object, it's best to accept an IEnumerable as the method argument; you don't want to limit the caller of the method by expecting a List.
No, there is no benefit or difference as to how the foreach loop would go through the collection.
As Olivier Jacot-Descombes has pointed out, the foreach loop will simply go through the elements one by one using the enumerator.
However, it can make a difference if your logic goes through the same collection at least twice. In this case if IEnumerable<> is used, you might end up regenerating the elements each time you go over the iterator.
ReSharper even has a special warning for this type of code: PossibleMultipleEnumeration
I am not saying that you should not use IEnumerable<>. Everything has its time and place and it's not always a good idea to use the most generic interface. Be careful with your choice.
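The regeneration effect is easy to observe with a counter. This is a sketch only; MultipleEnumeration, Generate, AverageTwice and AverageOnce are hypothetical names, and the public counter exists purely for demonstration.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class MultipleEnumeration
{
    // How many elements the lazy source has produced so far.
    public static int Generated;

    // Lazy source: every new enumeration re-runs this body from the start.
    public static IEnumerable<int> Generate()
    {
        for (int i = 1; i <= 3; i++)
        {
            Generated++;
            yield return i;
        }
    }

    // Enumerates the argument twice: once for Sum(), once for Count().
    public static double AverageTwice(IEnumerable<int> values)
    {
        return (double)values.Sum() / values.Count();
    }

    // Materializes first, so the source is enumerated exactly once.
    public static double AverageOnce(IEnumerable<int> values)
    {
        List<int> list = values.ToList();
        return (double)list.Sum() / list.Count;
    }
}
```

Passing Generate() to AverageTwice produces each element twice, while AverageOnce produces each element once; with an expensive source such as a database query the difference matters.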

How to count the items of an IEnumerable?

Given an instance IEnumerable o how can I get the item Count? (without enumerating through all the items)
For example, if the instance is an ICollection, ICollection<T> or IReadOnlyCollection<T>, each of these interfaces has its own Count property.
Is getting the Count property by reflection the only way?
Instead, can I check and cast o to ICollection<T>, for example, so I can then call Count?
It depends how badly you want to avoid enumerating the items if the count is not available otherwise.
If you can enumerate the items, you can use the LINQ method Enumerable.Count. It will look for a quick way to get the item count by casting into one of the interfaces. If it can't, it will enumerate.
If you want to avoid enumeration at any cost, you will have to perform a type cast. In a real life scenario you often will not have to consider all the interfaces you have named, since you usually use one of them (IReadOnlyCollection is rare and ICollection only used in legacy code). If you have to consider all of the interfaces, try them all in a separate method, which can be an extension:
using System.Collections;
using System.Collections.Generic;
static class CountExtensions {
    // Returns the count if it is available without enumerating, otherwise null.
    public static int? TryCount<T>(this IEnumerable<T> items) {
        switch (items) {
            case ICollection<T> genCollection:
                return genCollection.Count;
            case ICollection legacyCollection:
                return legacyCollection.Count;
            case IReadOnlyCollection<T> roCollection:
                return roCollection.Count;
            default:
                return null;
        }
    }
}
Access the extension method with:
int? count = myEnumerable.TryCount();
IEnumerable doesn't promise a count. What if it were a random sequence or a real-time data feed from a sensor? It is entirely possible for the sequence to be infinite. The only way to count the elements is to start at zero and increment for each element that the enumerator provides, which is exactly what LINQ does, so don't reinvent the wheel. LINQ is smart enough to use the Count property of collections that support it.
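A minimal sketch of such an endless sequence, assuming a hypothetical Naturals iterator:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class InfiniteSequences
{
    // Hypothetical endless source: yields 0, 1, 2, ... and wraps around on overflow.
    public static IEnumerable<int> Naturals()
    {
        int n = 0;
        while (true)
        {
            yield return unchecked(n++);
        }
    }

    // Taking a bounded prefix is fine because enumeration stops after 'count' items.
    public static List<int> FirstFew(int count)
    {
        return Naturals().Take(count).ToList();
    }
}
```

Any() and Take() work on this sequence because they stop early. Count() has no way to finish with a meaningful answer, which is why no count can be promised at the IEnumerable level.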
The only way to really cover all possible collection types is to use the generic interface and call the Count() method. This also covers other types such as streams or plain iterators. Furthermore, it will use the Count property where available, as described in "Count property vs Count() method?", to avoid unnecessary overhead.
If, however, you have a non-generic collection, you'd have to use reflection to get at the right property. This is cumbersome and may fail if your collection doesn't even have the property (e.g. an endless stream or a plain iterator). On the other hand, Enumerable.Count() handles those types with the optimization mentioned above and iterates the entire collection only if necessary.
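For a non-generic IEnumerable, one possible alternative to reflection is to test for the non-generic ICollection interface and otherwise fall back to Cast<object>().Count(). The following is a sketch under that assumption; NonGenericCount and CountItems are hypothetical names.

```csharp
using System;
using System.Collections;
using System.Linq;

public static class NonGenericCount
{
    public static int CountItems(IEnumerable items)
    {
        if (items == null) throw new ArgumentNullException(nameof(items));

        // Non-generic collections such as ArrayList expose Count directly.
        if (items is ICollection collection)
        {
            return collection.Count;
        }

        // Anything else has to be enumerated to be counted.
        return items.Cast<object>().Count();
    }
}
```

This does not help with endless sequences; the fallback branch still enumerates everything.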

Why IReadOnlyCollection has ElementAt but not IndexOf

I am working with an IReadOnlyCollection of objects.
Now I'm a bit surprised, because I can use the LINQ extension method ElementAt(), but I don't have access to IndexOf().
This to me looks a bit illogical: I can get the element at a given position, but I cannot get the position of that very same element.
Is there a specific reason for it?
I've already read -> How to get the index of an element in an IEnumerable? and I'm not totally happy with the response.
IReadOnlyCollection is a collection, not a list, so strictly speaking, it should not even have ElementAt(). This method is defined as an extension method on IEnumerable<T> as a convenience, and IReadOnlyCollection<T> has it because it extends IEnumerable<T>. If you look at the source code, it checks whether the IEnumerable is in fact an IList, and if so it returns the element at the requested index; otherwise it does a linear traversal of the IEnumerable up to the requested index, which is inefficient.
So, you might ask why IEnumerable has an ElementAt() but not IndexOf(), but I do not find this question very interesting, because it should not have either of these methods. An IEnumerable is not supposed to be indexable.
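The two code paths of ElementAt() described above can be made visible with a counting iterator. This is only a sketch; ElementAtDemo and its members are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ElementAtDemo
{
    // How many elements the iterator has handed out.
    public static int Visited;

    public static IEnumerable<string> Letters()
    {
        foreach (string letter in new[] { "a", "b", "c", "d" })
        {
            Visited++;
            yield return letter;
        }
    }

    // Not an IList<T>: ElementAt has to walk the first three elements.
    public static string ThirdFromIterator()
    {
        return Letters().ElementAt(2);
    }

    // A List<T> is an IList<T>: ElementAt can use the indexer directly.
    public static string ThirdFromList()
    {
        return new List<string> { "a", "b", "c", "d" }.ElementAt(2);
    }
}
```

Both calls return the same element; only the iterator version pays for the traversal.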
Now, a very interesting question is why IReadOnlyList has no IndexOf() either.
IReadOnlyList<T> has no IndexOf() for no good reason whatsoever.
If you really want to find a reason to mention, then the reason is historical:
Back when the generic collection interfaces were laid down (IList<T> shipped with .NET 2.0 in 2005), people had not quite started to realize the benefits of immutability and read-onlyness, so the IList<T> interface that was baked into the framework was, unfortunately, mutable.
The right thing would have been to come up with IReadOnlyList<T> as the base interface, and make IList<T> extend it, adding mutation methods only, but that's not what happened.
IReadOnlyList<T> was invented a considerable time after IList<T>, and by that time it was too late to redefine IList<T> and make it extend IReadOnlyList<T>. So, IReadOnlyList<T> was built from scratch.
They could not make IReadOnlyList<T> extend IList<T>, because then it would have inherited the mutation methods, so they based it on IReadOnlyCollection<T> and IEnumerable<T> instead. They added the this[i] indexer, but then they either forgot to add other methods like IndexOf(), or they intentionally omitted them since they can be implemented as extension methods, thus keeping the interface simpler. But they did not provide any such extension methods.
So, here is an extension method that adds IndexOf() to IReadOnlyList<T>:
using Collections = System.Collections.Generic;
public static class ReadOnlyListExtensions
{
    // Extension methods must live in a static, non-generic class.
    public static int IndexOf<T>( this Collections.IReadOnlyList<T> self, T elementToFind )
    {
        int i = 0;
        foreach( T element in self )
        {
            if( Equals( element, elementToFind ) )
                return i;
            i++;
        }
        return -1;
    }
}
Be aware of the fact that this extension method is not as powerful as a method built into the interface would be. For example, if you are implementing a collection which expects an IEqualityComparer<T> as a construction (or otherwise separate) parameter, this extension method will be blissfully unaware of it, and this will of course lead to bugs. (Thanks to Grx70 for pointing this out in the comments.)
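One way to soften this limitation is to let the caller pass the comparer explicitly. The overload below is a sketch and not part of the original answer; the class name ReadOnlyListComparerExtensions is hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class ReadOnlyListComparerExtensions
{
    // Like IndexOf above, but equality is decided by the supplied comparer.
    public static int IndexOf<T>(
        this IReadOnlyList<T> self, T elementToFind, IEqualityComparer<T> comparer)
    {
        if (self == null) throw new ArgumentNullException(nameof(self));
        comparer = comparer ?? EqualityComparer<T>.Default;

        for (int i = 0; i < self.Count; i++)
        {
            if (comparer.Equals(self[i], elementToFind))
                return i;
        }
        return -1;
    }
}
```

For example, with `IReadOnlyList<string> names = new[] { "a", "B" };`, calling `names.IndexOf("b", StringComparer.OrdinalIgnoreCase)` finds the element at index 1. The caller still has to know which comparer the collection uses, so the caveat above is reduced, not removed.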
It is because IReadOnlyCollection (which extends IEnumerable) does not necessarily support indexing, which is often required when you want to numerically order a List. IndexOf comes from IList.
Think of a collection without index like Dictionary for example, there is no concept of numeric index in Dictionary. In Dictionary, the order is not guaranteed, only one to one relation between key and value. Thus, collection does not necessarily imply numeric indexing.
Another reason is that IEnumerable is a one-way street. Think of it this way: an IEnumerable can advance through its items x times and hand you the element at position x (that is, ElementAt), but it cannot efficiently tell you at which index a given element is located (that is, IndexOf).
But yes, it is still pretty weird even if you think of it this way, as one would expect it to have either both ElementAt and IndexOf or neither.
IndexOf is a method defined on List, whereas IReadOnlyCollection inherits just IEnumerable.
This is because IEnumerable is just for iterating entities. However an index doesn't apply to this concept, because the order is arbitrary and is not guaranteed to be identical between calls to IEnumerable. Furthermore the interface simply states that you can iterate a collection, whereas List states you can perform adding and removing also.
ElementAt does give you positional access. However, I wouldn't use it, as it re-iterates the enumeration up to the requested position just to find one single element. Better to use First or a list-based approach.
Anyway the API design seems odd to me as it allows an (inefficient) approach on getting an element at n-th position but does not allow to get the index of an arbitrary element which would be the same inefficient search leading to up to n iterations. I'd agree with Ian on either both (which I wouldn't recommend) or neither.
IReadOnlyCollection<T> has ElementAt<T>() because it extends IEnumerable<T>, for which that extension method is defined. ElementAt<T>() steps through the IEnumerable<T> a specified number of times and returns the value at that position.
IReadOnlyCollection<T> lacks IndexOf<T>() because, as an IEnumerable<T>, it does not have any specified order and thus the concept of an index does not apply. Nor does IReadOnlyCollection<T> add any concept of order.
I would recommend IReadOnlyList<T> when you want an indexable version of IReadOnlyCollection<T>. This allows you to correctly represent an unchangeable collection of objects with an index.
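As a sketch of that recommendation, a class can keep a mutable List<T> private and expose it as IReadOnlyList<T>. Playlist and Tracks are hypothetical names.

```csharp
using System.Collections.Generic;

public sealed class Playlist
{
    private readonly List<string> _tracks =
        new List<string> { "intro", "verse", "outro" };

    // Callers get Count and the indexer, but no Add, Remove or Clear.
    public IReadOnlyList<string> Tracks
    {
        get { return _tracks; }
    }
}
```

A caller can then write `playlist.Tracks[1]` or loop with `for (int i = 0; i < playlist.Tracks.Count; i++)`, so positions are well defined, which is what an IndexOf needs.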
This extension method is almost the same as Mike's. The only difference is that it includes a predicate, so you can use it like this: var index = list.IndexOf(obj => obj.Id == id)
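using System;
using System.Collections.Generic;
public static class ReadOnlyListPredicateExtensions
{
    // Returns the index of the first element matching the predicate, or -1.
    public static int IndexOf<T>(this IReadOnlyList<T> self, Func<T, bool> predicate)
    {
        for (int i = 0; i < self.Count; i++)
        {
            if (predicate(self[i]))
                return i;
        }
        return -1;
    }
}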

Count property vs Count() method?

Working with a collection I have the two ways of getting the count of objects; Count (the property) and Count() (the method). Does anyone know what the key differences are?
I might be wrong, but I always use the Count property in any conditional statements because I'm assuming the Count() method performs some sort of query against the collection, whereas Count must have already been assigned prior to me 'getting' it. But that's a guess - I don't know if performance will be affected if I'm wrong.
EDIT: Out of curiosity then, will Count() throw an exception if the collection is null? Because I'm pretty sure the Count property simply returns 0.
Decompiling the source for the Count() extension method reveals that it tests whether the object is an ICollection (generic or otherwise) and, if so, simply returns the underlying Count property.
So, if your code accesses Count instead of calling Count(), you can bypass the type checking - a theoretical performance benefit, but I doubt it would be a noticeable one! The decompiled method:
// System.Linq.Enumerable
public static int Count<TSource>(this IEnumerable<TSource> source)
{
    checked
    {
        if (source == null)
        {
            throw Error.ArgumentNull("source");
        }
        ICollection<TSource> collection = source as ICollection<TSource>;
        if (collection != null)
        {
            return collection.Count;
        }
        ICollection collection2 = source as ICollection;
        if (collection2 != null)
        {
            return collection2.Count;
        }
        int num = 0;
        using (IEnumerator<TSource> enumerator = source.GetEnumerator())
        {
            while (enumerator.MoveNext())
            {
                num++;
            }
        }
        return num;
    }
}
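The null check at the top of the decompiled method also bears on the EDIT in the question. A small sketch, with the hypothetical names NullCounts, CountMethodOnNull and CountPropertyOnNull, shows what happens for a null reference in each case:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NullCounts
{
    // The extension method receives null as its argument and rejects it.
    public static string CountMethodOnNull()
    {
        List<int> list = null;
        try
        {
            return list.Count().ToString();
        }
        catch (ArgumentNullException)
        {
            return "ArgumentNullException";
        }
    }

    // The property is an instance member, so it cannot be read on a null reference.
    public static string CountPropertyOnNull()
    {
        List<int> list = null;
        try
        {
            return list.Count.ToString();
        }
        catch (NullReferenceException)
        {
            return "NullReferenceException";
        }
    }
}
```

So neither form returns 0 for a null collection; they differ only in the exception type.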
Performance is only one reason to choose one or the other. Choosing .Count() means that your code will be more generic. I've had occasions where I refactored some code that no longer produced a collection, but instead something more generic like an IEnumerable, but other code broke as a result because it depended on .Count and I had to change it to .Count(). If I made a point to use .Count() everywhere, the code would likely be more reusable and maintainable. Usually opting to utilize the more generic interfaces if you can get away with it is your best bet. By more generic, I mean the simpler interface that is implemented by more types, and thus netting you greater compatibility between code.
I'm not saying .Count() is better, I'm just saying there's other considerations that deal more with the reusability of the code you are writing.
The .Count() method might be smart enough, or know about the type in question, and if so, it might use the underlying .Count property.
Then again, it might not.
I would say it is safe to assume that if the collection has a .Count property itself, that's going to be your best bet when it comes to performance.
If the .Count() method doesn't know about the collection, it will enumerate over it, which will be an O(n) operation.
Short Version: If you have the choice between a Count property and a Count() method always choose the property.
The difference is mainly around the efficiency of the operation. All BCL collections which expose a Count property do so in an O(1) fashion. The Count() method though can, and often will, cost O(N). There are some checks to try and get it to O(1) for some implementations but it's by no means guaranteed.
The Count() method is the LINQ method that works on any IEnumerable<>. You would expect the Count() method to iterate over the whole collection to find the count, but I believe the LINQ code actually has some optimizations in there to detect if a Count property exists and if so use that.
So they should both do almost identical things. The Count property is probably slightly better since there doesn't need to be a type check in there.
Count() is an extension method that iterates over each element of an IEnumerable<> and returns how many elements there are. If the IEnumerable instance is actually a List<>, it is optimized to return the Count property instead of iterating over all elements.
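One way to convince yourself of this optimization is a collection that refuses to be enumerated. The sketch below uses a hypothetical NoEnumerationCollection whose GetEnumerator throws; Count() still succeeds because it reads the Count property through ICollection<T>.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical collection for demonstration: it knows its size but cannot be iterated.
public sealed class NoEnumerationCollection : ICollection<int>
{
    public int Count { get { return 42; } }
    public bool IsReadOnly { get { return true; } }

    public IEnumerator<int> GetEnumerator()
    {
        throw new InvalidOperationException("This collection was enumerated.");
    }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }

    public void Add(int item) { throw new NotSupportedException(); }
    public void Clear() { throw new NotSupportedException(); }
    public bool Contains(int item) { return false; }
    public void CopyTo(int[] array, int arrayIndex) { }
    public bool Remove(int item) { throw new NotSupportedException(); }
}

public static class CountDemo
{
    // Calls the LINQ extension method, not the property.
    public static int CountViaLinq()
    {
        IEnumerable<int> items = new NoEnumerationCollection();
        return items.Count();
    }
}
```

If Count() had to enumerate, CountViaLinq would throw InvalidOperationException instead of returning the stored count.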
Count() is there as an extension method from LINQ - Count is a property on Lists, actual .NET collection objects.
As such, Count() will almost always be slower, since it will enumerate the collection / queryable object. On a list, queue, stack etc, use Count. Or for an array - Length.
If there is a Count or Length property, you should always prefer that to the Count() method, which generally iterates the entire collection to count the number of elements within. Exceptions would be when the Count() method is against a LINQ to SQL or LINQ to Entities source, for example, in which case it would perform a count query against the datasource. Even then, if there is a Count property, you would want to prefer that, since it likely has less work to do.
The Count() method has an optimisation for ICollection<T> which results in the Count property being called. In this case there is probably no significant difference in performance.
There are types other than ICollection<T> which have more efficient alternatives to the Count() extension method though. This code analysis performance rule fires on the following types.
CA1829: Use Length/Count property instead of Enumerable.Count method
System.Array
System.Collections.Immutable.ImmutableArray<T>
System.Collections.ICollection
System.Collections.Generic.ICollection<T>
System.Collections.Generic.IReadOnlyCollection<T>
So, we should use Count and Length properties if they are available and fallback to the Count() extension method otherwise.
.Count is a property of a collection and returns the number of elements in it, unlike .Count(), which is a LINQ extension method that counts the elements.
Generally .Count is faster than .Count() because it does not require the overhead of creating and enumerating a LINQ query.
It's better to use the .Count property unless you need the additional functionality provided by the .Count() method, such as the ability to specify a filtering predicate, e.g.
int count = numbers.Count(n => n.Id == 100);

Interview Question: .Any() vs if (.Length > 0) for testing if a collection has elements

In a recent interview I was asked what the difference between .Any() and .Length > 0 was and why I would use either when testing to see if a collection had elements.
This threw me a little as it seems a little obvious but feel I may be missing something.
I suggested that you use .Length when you simply need to know that a collection has elements and .Any() when you wish to filter the results.
Presumably .Any() takes a performance hit too as it has to do a loop / query internally.
Length only exists for some collection types such as Array.
Any is an extension method that can be used with any collection that implements IEnumerable<T>.
If Length is present then you can use it, otherwise use Any.
Regarding "Presumably .Any() takes a performance hit too as it has to do a loop / query internally":
Enumerable.Any does not loop. It fetches an iterator and checks if MoveNext returns true. Here is the source code from .NET Reflector.
public static bool Any<TSource>(this IEnumerable<TSource> source)
{
    if (source == null)
    {
        throw Error.ArgumentNull("source");
    }
    using (IEnumerator<TSource> enumerator = source.GetEnumerator())
    {
        if (enumerator.MoveNext())
        {
            return true;
        }
    }
    return false;
}
I'm guessing the interviewer may have meant to ask about checking Any() versus Count() > 0 (as opposed to Length > 0).
Basically, here's the deal.
Any() will effectively try to determine if a collection has any members by enumerating over a single item. (There is an overload to check for a given criterion using a Func<T, bool>, but I'm guessing the interviewer was referring to the version of Any() that takes no arguments.) This makes it O(1).
Count() will check for a Length or Count property (from a T[] or an ICollection or ICollection<T>) first. This would generally be O(1). If that isn't available, however, it will count the items in a collection by enumerating over the entire thing. This would be O(n).
A Count or Length property, if available, would most likely be O(1) just like Any(), and would probably perform better as it would require no enumerating at all. But the Count() extension method does not ensure this. Therefore it is sometimes O(1), sometimes O(n).
Presumably, if you're dealing with a nondescript IEnumerable<T> and you don't know whether it implements ICollection<T> or not, you are much better off using Any() than Count() > 0 if your intention is simply to ensure the collection is not empty.
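The O(1) versus O(n) difference can be observed with a counting iterator. This is a sketch; AnyVersusCount, Pulled and Source are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class AnyVersusCount
{
    // How many elements consumers have pulled from the source.
    public static int Pulled;

    // Lazy source that does not implement ICollection<T>.
    public static IEnumerable<int> Source(int size)
    {
        for (int i = 0; i < size; i++)
        {
            Pulled++;
            yield return i;
        }
    }

    // Stops after the first successful MoveNext.
    public static bool HasItemsViaAny(int size)
    {
        return Source(size).Any();
    }

    // Has to walk the whole sequence before it can compare against zero.
    public static bool HasItemsViaCount(int size)
    {
        return Source(size).Count() > 0;
    }
}
```

Both methods return the same answer, but for a source of 1000 items the first pulls one element and the second pulls all of them.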
Length is a property of array types, while Any() is an extension method of Enumerable. Therefore, you can use Length only when working with arrays. When working with more abstract types (IEnumerable<T>), you can use Any().
.Length: System.Array
.Any(): IEnumerable (extension method)
I would prefer using Length whenever it is available. A property is in any case more lightweight than a method call.
That said, the implementation of Any does little more than the code below.
private static bool Any<T>(this IEnumerable<T> items)
{
    // Unlike Enumerable.Any, this sketch treats null as empty instead of throwing.
    if (items == null) return false;
    using (var enumerator = items.GetEnumerator()) // dispose the enumerator when done
    {
        return enumerator.MoveNext();
    }
}
Also, a better question might have been the difference between .Count and .Length, what say :).
I think this is a more general question of what to choose when we have two ways to express something.
In those situations I would suggest the maxim "Be specific", from Peter Norvig's book PAIP.
Being specific means using whatever best describes what you are doing.
Thus what you want to say is something like:
collection.isEmpty()
If you don't have such a construct, I would choose the common idiom that the community uses.
For me .Length > 0 is not the best one, since it requires that you can size the object.
Suppose you implement an infinite list: .Length would obviously not work.
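LINQ has no IsEmpty method for IEnumerable<T>, but the intent can be expressed with a small extension. The following is a sketch; EnumerableEmptiness and IsEmpty are hypothetical names, not framework members.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerableEmptiness
{
    // Says what is meant: "is there nothing in here?"
    public static bool IsEmpty<T>(this IEnumerable<T> source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));

        // Collections know their size, so no enumerator is needed.
        if (source is ICollection<T> collection)
        {
            return collection.Count == 0;
        }

        // Otherwise look at one element at most; this also works for endless sequences.
        return !source.Any();
    }
}
```

Call sites then read `if (items.IsEmpty())` regardless of whether items is an array, a list or a lazy sequence.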
Sounds quite similar to this Stackoverflow question about difference between .Count and .Any for checking for existence of a result: Check for Existence of a Result in Linq-to-xml
In that case it is better to use Any than Count, as Count will iterate over all elements of an IEnumerable.
We know that .Length is only used for arrays and .Any() is used for anything that implements IEnumerable.
You can swap .Count for .Length and you have the same question for working with IEnumerable collections.
Both .Any() and .Count() perform a null check before obtaining an enumerator, so in that respect they are the same.
As for the array, let's assume we have the following line:
int[] foo = new int[10];
Here foo.Length is 10. While this is correct, it may not be the answer you're looking for, because we haven't assigned anything to the array yet. If foo is null, accessing .Length will throw an exception.
.Length returns the size the array was created with. It is a stored value, so no iteration is needed and the complexity is O(1); it is .Count() on a sequence that is not a collection that iterates, with O(n) complexity.
.Any() checks whether the collection has at least one item. Complexity is O(1).
