Working with a collection, I have two ways of getting the count of objects: Count (the property) and Count() (the method). Does anyone know what the key differences are?
I might be wrong, but I always use the Count property in any conditional statements because I'm assuming the Count() method performs some sort of query against the collection, whereas Count must have already been assigned prior to me 'getting' it. But that's a guess - I don't know if performance will be affected if I'm wrong.
EDIT: Out of curiosity then, will Count() throw an exception if the collection is null? Because I'm pretty sure the Count property simply returns 0.
Decompiling the source for the Count() extension method reveals that it tests whether the object is an ICollection (generic or otherwise) and if so simply returns the underlying Count property:
// System.Linq.Enumerable
public static int Count<TSource>(this IEnumerable<TSource> source)
{
    checked
    {
        if (source == null)
        {
            throw Error.ArgumentNull("source");
        }
        ICollection<TSource> collection = source as ICollection<TSource>;
        if (collection != null)
        {
            return collection.Count;
        }
        ICollection collection2 = source as ICollection;
        if (collection2 != null)
        {
            return collection2.Count;
        }
        int num = 0;
        using (IEnumerator<TSource> enumerator = source.GetEnumerator())
        {
            while (enumerator.MoveNext())
            {
                num++;
            }
        }
        return num;
    }
}
So, if your code accesses Count instead of calling Count(), you can bypass the type checking - a theoretical performance benefit but I doubt it would be a noticeable one!
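On the question in the EDIT about null: a minimal sketch of what each form does when the reference is null. The class and method names here (NullCountDemo, ThrownBy and so on) are made up for illustration; the behaviour follows from the null check in the decompiled source above and from ordinary null dereferencing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NullCountDemo
{
    // Returns the name of the exception type thrown by the action, or "none".
    public static string ThrownBy(Action action)
    {
        try { action(); return "none"; }
        catch (Exception ex) { return ex.GetType().Name; }
    }

    public static string ExtensionOnNull()
    {
        List<int> list = null;
        // Extension method: the explicit null check inside Enumerable.Count throws.
        return ThrownBy(() => list.Count());
    }

    public static string PropertyOnNull()
    {
        List<int> list = null;
        // Property: reading a member through a null reference throws.
        return ThrownBy(() => { var unused = list.Count; });
    }
}
```

ExtensionOnNull() returns "ArgumentNullException" and PropertyOnNull() returns "NullReferenceException", so neither form returns 0 for a null collection.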
Performance is only one reason to choose one or the other. Choosing .Count() means that your code will be more generic. I've had occasions where I refactored some code that no longer produced a collection, but instead something more generic like an IEnumerable, but other code broke as a result because it depended on .Count and I had to change it to .Count(). If I made a point to use .Count() everywhere, the code would likely be more reusable and maintainable. Usually opting to utilize the more generic interfaces if you can get away with it is your best bet. By more generic, I mean the simpler interface that is implemented by more types, and thus netting you greater compatibility between code.
I'm not saying .Count() is better; I'm just saying there are other considerations that have more to do with the reusability of the code you are writing.
The .Count() method might be smart enough, or know about the type in question, and if so, it might use the underlying .Count property.
Then again, it might not.
I would say it is safe to assume that if the collection has a .Count property itself, that's going to be your best bet when it comes to performance.
If the .Count() method doesn't know about the collection, it will enumerate over it, which will be an O(n) operation.
Short Version: If you have the choice between a Count property and a Count() method always choose the property.
The difference is mainly around the efficiency of the operation. All BCL collections which expose a Count property do so in an O(1) fashion. The Count() method though can, and often will, cost O(N). There are some checks to try and get it to O(1) for some implementations but it's by no means guaranteed.
The Count() method is the LINQ method that works on any IEnumerable<>. You would expect the Count() method to iterate over the whole collection to find the count, but I believe the LINQ code actually has some optimizations in there to detect if a Count property exists and if so use that.
So they should both do almost identical things. The Count property is probably slightly better since there doesn't need to be a type check in there.
The Count() method is an extension method that iterates over each element of an IEnumerable<> and returns how many elements there are. If the instance of IEnumerable<> is actually a List<> (or another ICollection<>), it is optimized to return the Count property instead of iterating over all the elements.
Count() is there as an extension method from LINQ - Count is a property on Lists, actual .NET collection objects.
As such, Count() will be slower whenever it has to enumerate the collection / queryable object, which it does for any source it does not recognise as a collection. On a list, queue, stack etc, use Count. Or for an array - Length.
If there is a Count or Length property, you should always prefer that to the Count() method, which may have to iterate the entire collection to count the number of elements within. Exceptions would be when the Count() method is called against a LINQ to SQL or LINQ to Entities source, for example, in which case it would perform a count query against the data source. Even then, if there is a Count property, you would want to prefer that, since it likely has less work to do.
The Count() method has an optimisation for ICollection<T> which results in the Count property being called. In this case there is probably no significant difference in performance.
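One way to see this optimisation in action is a collection whose enumerator always throws. The sketch below uses a hypothetical NoEnumerationCollection type invented for this purpose: if Count() tried to count by enumerating, the call would fail, so a normal return shows that the Count property was used instead.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical collection: Count is known, but enumeration is forbidden.
public sealed class NoEnumerationCollection : ICollection<int>
{
    public int Count => 42;
    public bool IsReadOnly => true;
    public void Add(int item) => throw new NotSupportedException();
    public void Clear() => throw new NotSupportedException();
    public bool Contains(int item) => false;
    public void CopyTo(int[] array, int arrayIndex) { }
    public bool Remove(int item) => throw new NotSupportedException();

    // Any attempt to enumerate is made visible as an exception.
    public IEnumerator<int> GetEnumerator() =>
        throw new InvalidOperationException("enumerated");
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public static class FastPathDemo
{
    // Calls the LINQ extension method, not the property.
    public static int CountViaLinq() =>
        new NoEnumerationCollection().Count();
}
```

FastPathDemo.CountViaLinq() returns 42 without throwing, which is consistent with the ICollection<T> check described above.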
There are types other than ICollection<T> which have more efficient alternatives to the Count() extension method though. This code analysis performance rule fires on the following types.
CA1829: Use Length/Count property instead of Enumerable.Count method
System.Array
System.Collections.Immutable.ImmutableArray<T>
System.Collections.ICollection
System.Collections.Generic.ICollection<T>
System.Collections.Generic.IReadOnlyCollection<T>
So, we should use Count and Length properties if they are available and fallback to the Count() extension method otherwise.
.Count is a property of a collection that gets the number of elements in it, unlike .Count(), which is a LINQ extension method that counts the elements.
Generally .Count is faster than .Count() because it does not require the overhead of creating and enumerating a LINQ query.
It's better to use the .Count property unless you need the additional functionality provided by the .Count() method, such as the ability to specify a filtering predicate, e.g.
int count = numbers.Count(n => n.Id == 100);
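A small self-contained variant of the predicate overload, using plain integers rather than the objects with an Id from the line above (PredicateCountDemo and CountEven are illustrative names only):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class PredicateCountDemo
{
    public static int CountEven(IEnumerable<int> numbers)
    {
        // The predicate overload has to test every element,
        // so it enumerates the source even when it is a List<T>.
        return numbers.Count(n => n % 2 == 0);
    }
}
```

For example, PredicateCountDemo.CountEven(new[] { 1, 2, 3, 4 }) returns 2.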
Given an instance IEnumerable o how can I get the item Count? (without enumerating through all the items)
For example, the instance might implement ICollection, ICollection<T> or IReadOnlyCollection<T>, and each of these interfaces has its own Count property.
Is getting the Count property by reflection the only way?
Instead, can I check and cast o to ICollection<T> for example, so I can then call Count ?
It depends how badly you want to avoid enumerating the items if the count is not available otherwise.
If you can enumerate the items, you can use the LINQ method Enumerable.Count. It will look for a quick way to get the item count by casting into one of the interfaces. If it can't, it will enumerate.
If you want to avoid enumeration at any cost, you will have to perform a type cast. In a real life scenario you often will not have to consider all the interfaces you have named, since you usually use one of them (IReadOnlyCollection is rare and ICollection only used in legacy code). If you have to consider all of the interfaces, try them all in a separate method, which can be an extension:
static class CountExtensions {
    public static int? TryCount<T>(this IEnumerable<T> items) {
        switch (items) {
            case ICollection<T> genCollection:
                return genCollection.Count;
            case ICollection legacyCollection:
                return legacyCollection.Count;
            case IReadOnlyCollection<T> roCollection:
                return roCollection.Count;
            default:
                return null;
        }
    }
}
Access the extension method with:
int? count = myEnumerable.TryCount();
IEnumerable doesn't promise a count. What if it was a random sequence or a real-time data feed from a sensor? It is entirely possible for the collection to be infinitely sized. The only way to count the elements is to start at zero and increment for each element that the enumerator provides. That is exactly what LINQ does, so don't reinvent the wheel. LINQ is smart enough to use the .Count properties of collections that support this.
The only way to really cover all possible collection types is to use the generic interface and call the Count() method. This also covers other types such as streams or plain iterators. Furthermore, it will use the Count property where available, as described in "Count property vs Count() method?", to avoid unnecessary overhead.
If, however, you have a non-generic collection, you'd have to use reflection to get at the correct property. This is cumbersome and may fail if your collection doesn't even have the property (e.g. an endless stream or just an iterator). On the other hand, IEnumerable<T>.Count() handles those types with the optimization mentioned above; it iterates the entire collection only if necessary.
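For the non-generic case, a type test against System.Collections.ICollection may be enough in many situations, without reaching for reflection. The following is only a sketch under that assumption; NonGenericCount and CountItems are hypothetical names, and the fallback still enumerates, so it must not be used on endless sequences.

```csharp
using System.Collections;

public static class NonGenericCount
{
    // Counts a non-generic sequence: uses ICollection.Count when the
    // object offers it, otherwise falls back to enumerating.
    public static int CountItems(IEnumerable items)
    {
        if (items is ICollection collection)
            return collection.Count;

        int n = 0;
        foreach (object item in items)
            n++;
        return n;
    }
}
```

NonGenericCount.CountItems(new ArrayList { 1, 2, 3 }) takes the ICollection path, while NonGenericCount.CountItems("abc") enumerates the string's characters; both return 3.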
Could the order of items in list1 and list2 be different when
list2 = list1.ToList() and both are of type List?
If list1 is consistent in its ordering, then list2 will be in the same order.
It's possible that list1 is some type that doesn't itself promise to have the same order every time it is enumerated, in which case of course the two might differ, but it is the enumeration logic of list1 that is responsible for that, not ToList(). (The name of list1 suggests that it is itself a list, in which case the orders would certainly be the same).
One answer here already includes the source of one of the implementations of ToList(). It is not the only version of ToList() that exists, and corefx optimises for many more cases than netfx does, but it remains the case that all versions produce the list in the same order as the source would deliver its elements on enumeration.
Another answer says that this is not guaranteed in the documentation, only by the description of the overload of the List<T> constructor that takes an enumeration (which is not, incidentally the only constructor used by all implementations of ToList() in all cases).
However, a change to ToList() that did not promise to maintain the order would not be accepted.
Consider the case of someSource.OrderBy(x => x.ID).ToList(). In such a case (which incidentally, is a case that is optimised in corefx) if ToList() could change the order it would obviously remove the point of the OrderBy().
Okay, so what if someone changed ToList() in a way that didn't promise to maintain order, but treated OrderBy() as a special case? (After all, it's already a special case for performance reasons in one version). Well, that would still break say someSource.OrderBy(x => x.ID).Where(x => !x.Deleted).ToList(). In all, if we had a version of ToList() that didn't maintain order we'd be able to come up with some sort of linq query where a given order was promised by another part of the query and such an implementation of ToList() broke the promise of the query as a whole.
So, barring special-casing a source that explicitly doesn't promise to maintain order (ParallelEnumerable doesn't unless you use AsOrdered(), since there are a lot of advantages of not maintaining an order unless really necessary when it comes to parallel processing) we can't make a change to ToList() that doesn't maintain order without breaking the promises of linq queries as a whole.
So while the guarantee isn't called out in the documentation of ToList(), it is nevertheless guaranteed and will not be changed in a later version.
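The OrderBy/Where case discussed above can be checked with a few lines. This is just an illustration with integers in place of the ID and Deleted properties; ToListOrderDemo and SortedEvens are made-up names.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ToListOrderDemo
{
    // Sorts, filters, then materialises; the list keeps the sorted order.
    public static List<int> SortedEvens(IEnumerable<int> source)
    {
        return source.OrderBy(x => x)
                     .Where(x => x % 2 == 0)
                     .ToList();
    }
}
```

ToListOrderDemo.SortedEvens(new[] { 5, 4, 1, 2, 8, 3 }) yields 2, 4, 8 in that order.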
The general answer is no: order is not guaranteed to be preserved even if both lists are of type List.
That is because List is not a sealed class. Another class could derive from it, supply its own GetEnumerator and possibly return items out of order.
It sounds strange, yes, but it's possible. So you can't say ToList will return the exact same list unless both are the concrete type List and not some derived type.
The other answer says that it's an implementation detail that could change in the future. I don't think so. List is a very essential part of the .NET collections; such an unreasonable breaking change is very unlikely.
Don't worry: as long as you use the concrete List, order is always preserved.
The simple answer is no, ToList will just loop over the source enumerable and keep the same order. List<T> guarantees order, so calling ToList on it won't change it.
The more nuanced answer, however, is that you may not be starting with a List<T> and may have a more general IEnumerable<T>, which does not guarantee order at all. This means that multiple calls to source.ToList() may produce different outputs.
In practice however, virtually all implementations of IEnumerable will preserve order.
For starters: it's safe to say everybody expects that. But why?
According to the documentation, the constructor of List<T> that takes an IEnumerable<T> is guaranteeing the order is preserved:
The elements are copied onto the List in the same order they are read by the enumerator of the collection.
While the documentation of .ToList() makes no such promises (doesn't say anything to the contrary either though).
Internally, one uses the other, so you are safe, but you are not guaranteed to be safe should the internal implementation of .ToList() change. So if you want to be sure, you should call new List<T>(oldList) directly.
Smallprint: if you are nit picky about it... I could not find a guarantee that the IEnumerable<T> interface would return the elements of a list in order either. So both ways, you have to look at what is, and if you need to rely on it, maybe write some unit tests asserting this behavior so you get notified immediately should the current behavior change.
There should be no difference. Check out the source code.
public static List<TSource> ToList<TSource>(this IEnumerable<TSource> source) {
    if (source == null) throw Error.ArgumentNull("source");
    return new List<TSource>(source);
}
And the part where it creates the list:
// Constructs a List, copying the contents of the given collection. The
// size and capacity of the new list will both be equal to the size of the
// given collection.
//
public List(IEnumerable<T> collection) {
    if (collection==null)
        ThrowHelper.ThrowArgumentNullException(ExceptionArgument.collection);
    Contract.EndContractBlock();
    ICollection<T> c = collection as ICollection<T>;
    if( c != null) {
        int count = c.Count;
        if (count == 0)
        {
            _items = _emptyArray;
        }
        else {
            _items = new T[count];
            c.CopyTo(_items, 0);
            _size = count;
        }
    }
    else {
        _size = 0;
        _items = _emptyArray;
        // This enumerable could be empty. Let Add allocate a new array, if needed.
        // Note it will also go to _defaultCapacity first, not 1, then 2, etc.
        using(IEnumerator<T> en = collection.GetEnumerator()) {
            while(en.MoveNext()) {
                Add(en.Current);
            }
        }
    }
}
https://github.com/Microsoft/referencesource/blob/master/System.Core/System/Linq/Enumerable.cs
Given a huge collection of objects, is there a performance difference between the following?
Collection.Contains:
myCollection.Contains(myElement)
Enumerable.Any:
myCollection.Any(currentElement => currentElement == myElement)
Contains() is an instance method, and its performance depends largely on the collection itself. For instance, Contains() on a List is O(n), while Contains() on a HashSet is O(1).
Any() is an extension method, and will simply go through the collection, applying the delegate on every object. It therefore has a complexity of O(n).
Any() is more flexible however since you can pass a delegate. Contains() can only accept an object.
It depends on the collection. If you have an ordered collection, then Contains might do a smart search (binary, hash, b-tree, etc.), while with Any() you are basically stuck with enumerating until you find it (assuming LINQ-to-Objects).
Also note that in your example, Any() is using the == operator, which checks for referential equality on reference types unless the operator is overloaded, while Contains will use IEquatable<T> or the Equals() method, which might be overridden.
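The equality difference is easy to overlook, so here is a sketch of it. The Point class is a hypothetical type that overrides Equals but deliberately does not overload ==; EqualityDemo is likewise just an illustrative name.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical type with value equality via Equals only.
public sealed class Point
{
    public int X { get; }
    public int Y { get; }
    public Point(int x, int y) { X = x; Y = y; }

    public override bool Equals(object obj) =>
        obj is Point p && p.X == X && p.Y == Y;
    public override int GetHashCode() => (X * 397) ^ Y;
}

public static class EqualityDemo
{
    // List<T>.Contains compares with Equals.
    public static bool ViaContains(List<Point> points, Point target) =>
        points.Contains(target);

    // == on a class without an overloaded operator compares references.
    public static bool ViaAnyOperator(List<Point> points, Point target) =>
        points.Any(p => p == target);
}
```

With a list holding new Point(1, 2) and a separate new Point(1, 2) as the target, ViaContains returns true and ViaAnyOperator returns false, so the two forms in the question are not always interchangeable.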
I suppose that would depend on the type of myCollection, which dictates how Contains() is implemented. If it is a sorted binary tree, for example, it could search smarter. It may also take the element's hash into account. Any(), on the other hand, will enumerate through the collection until the first element that satisfies the condition is found. There are no optimizations for the case where the object has a smarter search method.
Contains() is also an extension method which can work fast if you use it in the correct way.
For ex:
var result = context.Projects.Where(x => lstBizIds.Contains(x.businessId)).Select(x => x.projectId).ToList();
This will give the query
SELECT Id
FROM Projects
INNER JOIN (VALUES (1), (2), (3), (4), (5)) AS Data(Item) ON Projects.UserId = Data.Item
while Any(), on the other hand, always iterates through the collection, which is O(n).
Hope this will work....
In a recent interview I was asked what the difference between .Any() and .Length > 0 was and why I would use either when testing to see if a collection had elements.
This threw me a little as it seems a little obvious but feel I may be missing something.
I suggested that you use .Length when you simply need to know that a collection has elements and .Any() when you wish to filter the results.
Presumably .Any() takes a performance hit too as it has to do a loop / query internally.
Length only exists for some collection types such as Array.
Any is an extension method that can be used with any collection that implements IEnumerable<T>.
If Length is present then you can use it, otherwise use Any.
Presumably .Any() takes a performance hit too as it has to do a loop / query internally.
Enumerable.Any does not loop. It fetches an iterator and checks if MoveNext returns true. Here is the source code from .NET Reflector.
public static bool Any<TSource>(this IEnumerable<TSource> source)
{
    if (source == null)
    {
        throw Error.ArgumentNull("source");
    }
    using (IEnumerator<TSource> enumerator = source.GetEnumerator())
    {
        if (enumerator.MoveNext())
        {
            return true;
        }
    }
    return false;
}
I'm guessing the interviewer may have meant to ask about checking Any() versus Count() > 0 (as opposed to Length > 0).
Basically, here's the deal.
Any() will effectively try to determine if a collection has any members by enumerating over a single item. (There is an overload to check for a given criterion using a Func<T, bool>, but I'm guessing the interviewer was referring to the version of Any() that takes no arguments.) This makes it O(1).
Count() will check for a Length or Count property (from a T[] or an ICollection or ICollection<T>) first. This would generally be O(1). If that isn't available, however, it will count the items in a collection by enumerating over the entire thing. This would be O(n).
A Count or Length property, if available, would most likely be O(1) just like Any(), and would probably perform better as it would require no enumerating at all. But the Count() extension method does not ensure this. Therefore it is sometimes O(1), sometimes O(n).
Presumably, if you're dealing with a nondescript IEnumerable<T> and you don't know whether it implements ICollection<T> or not, you are much better off using Any() than Count() > 0 if your intention is simply to ensure the collection is not empty.
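To make the difference concrete, the sketch below counts how many elements a lazy sequence actually produces under each check. The names AnyVersusCountDemo, Lazy and Produced are invented for this example; the iterator is an ordinary yield-based method, which does not implement ICollection<T>.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class AnyVersusCountDemo
{
    // Number of elements the iterator has produced since the last reset.
    public static int Produced;

    // A lazy sequence with no Count property to fall back on.
    private static IEnumerable<int> Lazy(int n)
    {
        for (int i = 0; i < n; i++)
        {
            Produced++;
            yield return i;
        }
    }

    public static int ProducedByAny(int n)
    {
        Produced = 0;
        bool nonEmpty = Lazy(n).Any();          // stops after the first element
        return Produced;
    }

    public static int ProducedByCount(int n)
    {
        Produced = 0;
        bool nonEmpty = Lazy(n).Count() > 0;    // walks the whole sequence
        return Produced;
    }
}
```

ProducedByAny(1000) returns 1, whereas ProducedByCount(1000) returns 1000. On an unbounded sequence the Count() version would never finish.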
Length is a property of array types, while Any() is an extension method of Enumerable. Therefore, you can use Length only when working with arrays. When working with more abstract types (IEnumerable<T>), you can use Any().
.Length... System.Array
.Any ... IEnumerable (extension method).
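I would prefer using "Length" whenever I can find it. A property is in any case lighter-weight than a method call.
That said, the implementation of "Any" does little more than the code below (unlike this sketch, the real Enumerable.Any throws an ArgumentNullException for a null source).
private static bool Any<T>(this IEnumerable<T> items)
{
    if (items == null) return false;
    // Dispose the enumerator once the first MoveNext has answered the question.
    using (IEnumerator<T> e = items.GetEnumerator())
        return e.MoveNext();
}
Also, a better question could have been the difference between ".Count" and ".Length", what say :).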
I think this is a more general question of what to choose when we have two ways to express something.
In those situations I would suggest the advice "Be specific", a quote from Peter Norvig in his book PAIP.
Being specific means using what best describes what you are doing.
Thus what you want to say is something like:
collection.isEmpty()
If you don't have such a construct, I would choose the common idiom that the community uses.
For me .Length > 0 is not the best one, since it requires that you can size the object.
Suppose you implement an infinite list: .Length would obviously not work.
Sounds quite similar to this Stackoverflow question about difference between .Count and .Any for checking for existence of a result: Check for Existence of a Result in Linq-to-xml
In that case it is better to use Any than Count, as Count will iterate all elements of an IEnumerable
We know that .Length is only used for Arrays and .Any() is used for collections of IEnumerable.
You can swap .Count for .Length and you have the same question for working with collections of IEnumerable
Both .Any() and .Count() perform a null check before obtaining an enumerator. They differ in what happens next: .Any() stops after the first element, while .Count() may have to enumerate everything, so their performance is only comparable when the source exposes its count directly.
As for the array lets assume we have the following line:
int[] foo = new int[10];
Here foo.Length is 10. While this is correct, it may not be the answer you're looking for, because we haven't added anything to the array yet. If foo is null, it will throw an exception.
.Length returns the number of elements the array was created with. It does not iterate, so its complexity is O(1); it is the LINQ .Count() method that may iterate, at O(n).
.Any checks whether the collection has at least one item. Complexity is O(1).
I'm writing a cache-eject method that essentially looks like this:
while ( myHashSet.Count > MAX_ALLOWED_CACHE_MEMBERS )
{
    EjectOldestItem( myHashSet );
}
My question is about how Count is determined: is it just a private or protected int, or is it calculated by counting the elements each time it's called?
From http://msdn.microsoft.com/en-us/library/ms132433.aspx:
Retrieving the value of this property is an O(1) operation.
This guarantees that accessing the Count won't iterate over the whole collection.
Edit: as many other posters suggested, IEnumerable<...>.Count() is however not guaranteed to be O(1). Use with care!
IEnumerable<...>.Count() is an extension method defined in System.Linq.Enumerable. The current implementation explicitly tests whether the counted IEnumerable<T> is in fact an instance of ICollection<T>, and uses ICollection<T>.Count if possible. Otherwise it traverses the IEnumerable<T> (possibly forcing lazy evaluation) and counts the items one by one.
I've not however found in the documentation whether it's guaranteed that IEnumerable<...>.Count() uses O(1) if possible, I only checked the implementation in .NET 3.5 with Reflector.
Necessary late addition: many popular containers are not derived from Collection<T>, but nevertheless their Count property is O(1) (that is, won't iterate over the whole collection). Examples are HashSet<T>.Count (this one is most likely what the OP wanted to ask about), Dictionary<K, V>.Count, LinkedList<T>.Count, List<T>.Count, Queue<T>.Count, Stack<T>.Count and so on.
All these collections implement ICollection<T> or just ICollection, so their Count is an implementation of ICollection<T>.Count (or ICollection.Count). It's not required for an implementation of ICollection<T>.Count to be an O(1) operation, but the ones mentioned above are doing that way, according to the documentation.
(Side note: some containers, for instance Queue<T>, implement the non-generic ICollection but not ICollection<T>, so they "inherit" the Count property only from ICollection.)
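The side note about Queue<T> can be checked with simple type tests. This is a small sketch with made-up helper names (QueueInterfacesDemo and its methods); it relies only on the interfaces Queue<T> is documented to implement.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Linq;

public static class QueueInterfacesDemo
{
    public static bool IsGenericCollection(object o) => o is ICollection<int>;

    public static bool IsNonGenericCollection(object o) => o is ICollection;

    // Count() still agrees with the property; it can take the
    // non-generic ICollection branch shown in the decompiled source.
    public static bool LinqCountMatchesProperty(Queue<int> queue) =>
        queue.Count() == queue.Count;
}
```

For a Queue<int>, IsGenericCollection returns false and IsNonGenericCollection returns true, while LinqCountMatchesProperty returns true.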
Your question does not specify a specific Collection class so...
It depends on the Collection class. ArrayList has an internal variable that tracks the count, as does List. However, it is implementation specific, and depending on the type of the collection, it could theoretically get recalculated on each call.
It is an internal value, and is not calculated. The documentation states that getting the value is an O(1) operation.
As others have noted, Count is maintained when modifying the collection. This is nearly always the case with every collection type in the framework. This is considerably different from using the Count() extension method on an IEnumerable that is not also a collection, which will enumerate the sequence each time.
Also, with the newer collection classes the Count property is not virtual which means that the jitter can inline the call to the Count accessor which makes it practically the same as accessing a field. In other words, very quick.
In case of a HashSet it's just an internal int field and even SortedSet (a binary tree based set for .net 4) has its count in an internal field.
According to Reflector, it is implemented as
public int Count{ get; }
so it is defined by the derived type
Just a quick note. Beware that there are two ways to count a collection in .NET 3.5 when System.Linq is used. For a normal collection, the first choice should be to use the Count property, for the reasons already described in other answers.
An alternative method, via the LINQ .Count() extension method, is also available. The intriguing thing about .Count() is that it can be called on ANY enumerable, regardless of whether the underlying class implements ICollection or not, or whether it has a Count property. If you ever do call .Count(), however, be aware that unless the source implements ICollection or ICollection<T> it WILL iterate over the sequence to dynamically generate a count. That results in O(n) complexity.
The only reason I wanted to note this is, using IntelliSense, it is often easy to accidentally end up using the Count() extension rather than the Count property.
It's an internal int that gets incremented each time a new item is added to the collection.