I work on applications developed in C#/.NET with Visual Studio. Very often ReSharper advises me to replace the types of my methods' input parameters with more generic ones, for instance List<> with IEnumerable<> if I only use the list in a foreach in the body of my method. I can understand why it looks smarter to write it that way, but I'm quite concerned about the performance. I fear that the performance of my apps will decrease if I listen to ReSharper...
Can someone explain to me precisely (more or less) what's happening behind the scenes (i.e. in the CLR) when I write:
public static void myMethod(IEnumerable<string> list)
{
    foreach (string s in list)
    {
        Console.WriteLine(s);
    }
}

static void Main()
{
    List<string> list = new List<string>(new string[] { "a", "b", "c" });
    myMethod(list);
}
and what is the difference with:
public static void myMethod(List<string> list)
{
    foreach (string s in list)
    {
        Console.WriteLine(s);
    }
}

static void Main()
{
    List<string> list = new List<string>(new string[] { "a", "b", "c" });
    myMethod(list);
}
You're worried about performance - but do you have any grounds for that concern? My guess is that you haven't benchmarked the code at all. Always benchmark before replacing readable, clean code with more performant code.
In this case the call to Console.WriteLine will utterly dominate the performance anyway.
While there may be a theoretical difference in performance between using List<T> and IEnumerable<T> here, I suspect the number of cases where it's significant in real-world apps is vanishingly small.
It's not even as if the sequence type is being used for many operations - there's a single call to GetEnumerator() which is declared to return IEnumerator<T> anyway. As the list gets larger, any difference in performance between the two will get even smaller, because it will only have any impact at all at the very start of the loop.
Ignoring the analysis though, the thing to take away from this is to measure performance before you base coding decisions on it.
As for what happens behind the scenes - you'd have to dig into the deep details of exactly what's in the metadata in each case. I suspect that in the case of an interface there's one extra level of redirection, at least in theory - the CLR would have to work out where in the target object's type the vtable for IEnumerable<T> was, and then call into the appropriate method's code. In the case of List<T>, the JIT would know the right offset into the vtable to start with, without the extra lookup. This is just based on my somewhat hazy understanding of JITting, thunking, vtables and how they apply to interfaces. It may well be slightly wrong, but more importantly it's an implementation detail.
You'd have to look at the generated code to be certain, but in this case I doubt there's much difference. The foreach statement is pattern-based rather than tied to the interface: when the parameter is declared as List<T>, the compiler binds to List<T>.GetEnumerator(), which returns a lightweight struct enumerator; when it's declared as IEnumerable<T>, enumeration goes through the interface. Either way, in this example the difference is unlikely to matter.
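If you do want numbers, a minimal Stopwatch sketch along these lines can be used (the Console.WriteLine call is replaced by a cheap sum so the loop itself is what gets timed; the method names and sizes are just illustrative, and the results will vary by machine and runtime):
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

static class ForeachBenchmark
{
    static long SumViaInterface(IEnumerable<int> items)
    {
        long sum = 0;
        foreach (int i in items) sum += i;   // enumerates through IEnumerator<int>
        return sum;
    }

    static long SumViaList(List<int> items)
    {
        long sum = 0;
        foreach (int i in items) sum += i;   // enumerates with List<int>.Enumerator (a struct)
        return sum;
    }

    static void Main()
    {
        List<int> data = Enumerable.Range(0, 1000000).ToList();

        Stopwatch sw = Stopwatch.StartNew();
        long a = SumViaInterface(data);
        sw.Stop();
        Console.WriteLine("IEnumerable<int>: {0} ms (sum {1})", sw.ElapsedMilliseconds, a);

        sw.Restart();
        long b = SumViaList(data);
        sw.Stop();
        Console.WriteLine("List<int>:        {0} ms (sum {1})", sw.ElapsedMilliseconds, b);
    }
}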
In general, I'd say that if you replace a non-generic interface with its generic flavour (say IList --> IList<T>), you're bound to get better or equivalent performance.
One unique selling point is that, unlike Java, .NET does not use type erasure and supports true value types (struct), so one of the main differences is in how something like a List<int> is stored internally. This can quickly become a big difference depending on how intensively the list is used.
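To make the storage point concrete, here is a small sketch (not from the original answer) showing the boxing that ArrayList forces on value types and that List<int> avoids:
using System;
using System.Collections;
using System.Collections.Generic;

class BoxingDemo
{
    static void Main()
    {
        // ArrayList stores object references, so every int added is boxed
        // on the heap and must be unboxed (with a cast) on the way out.
        ArrayList untyped = new ArrayList();
        untyped.Add(42);                    // boxes 42
        int fromUntyped = (int)untyped[0];  // unboxes

        // List<int> stores the ints directly in its internal int[] array:
        // no boxing, no casts, better locality.
        List<int> typed = new List<int>();
        typed.Add(42);
        int fromTyped = typed[0];

        Console.WriteLine("{0} {1}", fromUntyped, fromTyped);
    }
}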
A braindead synthetic benchmark showed:
for (int j = 0; j < 1000; j++)
{
    List<int> list = new List<int>();
    for (int i = 1 << 12; i > 0; i--)
        list.Add(i);
    list.Sort();
}
to be faster by a factor of 3.2x than the semi-equivalent non-generic:
for (int j = 0; j < 1000; j++)
{
    ArrayList list = new ArrayList();
    for (int i = 1 << 12; i > 0; i--)
        list.Add(i);
    list.Sort();
}
Disclaimer: I realize this benchmark is synthetic and doesn't actually exercise interfaces at the call site (it dispatches virtual method calls directly on a specific type), etc. However, it illustrates the point I'm making. Don't fear generics (at least not for performance reasons).
In general, the increased flexibility will be worth what minor performance difference it would incur.
In the first version (IEnumerable) the method is more generic: you're saying it accepts any argument that implements the interface.
In the second version you restrict the method to accept a specific class type, which is not recommended. The performance is mostly the same either way.
The basic reason for this recommendation is that creating a method that works on IEnumerable rather than List gives you future flexibility. If in the future you need to create a MySpecialStringsCollection, you could have it implement IEnumerable and still use the same method.
Essentially, I think it comes down to this: unless you notice a significant, meaningful performance hit (and I'd be shocked if you did), prefer the more tolerant interface that accepts more than what you're expecting today.
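As a sketch of that flexibility (the type is hypothetical and only illustrative), such a collection merely needs to implement IEnumerable<string> to be accepted by the IEnumerable<string> version of myMethod:
using System.Collections;
using System.Collections.Generic;

// Hypothetical custom collection: because it implements IEnumerable<string>,
// it can be passed to myMethod(IEnumerable<string>) without changing that method.
public class MySpecialStringsCollection : IEnumerable<string>
{
    private readonly List<string> _items = new List<string>();

    public void Add(string item)
    {
        _items.Add(item);
    }

    public IEnumerator<string> GetEnumerator()
    {
        return _items.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}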
The definition for List<T> is:
[SerializableAttribute]
public class List<T> : IList<T>, ICollection<T>,
IEnumerable<T>, IList, ICollection, IEnumerable
So List<T> implements IList, ICollection, IList<T>, and ICollection<T>, in addition to IEnumerable and IEnumerable<T>.
The IEnumerable<T> interface exposes the GetEnumerator method, which returns an IEnumerator<T> with a MoveNext method and a Current property. This mechanism is what foreach uses to iterate through the list.
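Roughly speaking, the foreach in the question expands to something like the following (a sketch of what the compiler generates, not the exact emitted code):
// foreach (string s in list) { Console.WriteLine(s); } is roughly equivalent to:
IEnumerator<string> e = list.GetEnumerator();
try
{
    while (e.MoveNext())
    {
        string s = e.Current;
        Console.WriteLine(s);
    }
}
finally
{
    e.Dispose();
}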
It follows that, if IList, ICollection, IList<T>, and ICollection<T> are not required to do the job, then it's sensible to use IEnumerable or IEnumerable<T> instead, thereby eliminating the additional plumbing.
An interface simply defines the presence and signature of public methods and properties implemented by the class. Since the interface does not "stand on its own", there should be no performance difference for the method itself, and any "casting" penalty - if any - should be almost too small to measure.
There is no performance penalty for a static-upcast. It's a logical construct in program text.
As other people have said, premature optimization is the root of all evil. Write your code, run it through a hotspot analysis before you worry about performance tuning things.
Accepting an IEnumerable<> might create some trouble, as you could receive a LINQ expression with deferred execution, or a yield return iterator. In both cases you won't have a collection, but something you can iterate over.
So when you would like to set some boundaries, you could request an array. It's not a problem to call collection.ToArray() before passing the parameter, and you'll be sure there are no hidden deferred-execution caveats.
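A sketch of the kind of surprise deferred execution can cause (the Where predicate and its logging are only illustrative):
using System;
using System.Collections.Generic;
using System.Linq;

class DeferredDemo
{
    static void PrintTwice(IEnumerable<int> items)
    {
        // Each foreach re-runs a deferred query from scratch.
        foreach (int i in items) Console.WriteLine(i);
        foreach (int i in items) Console.WriteLine(i);
    }

    static void Main()
    {
        List<int> source = new List<int> { 1, 2, 3, 4 };
        IEnumerable<int> query = source.Where(i =>
        {
            Console.WriteLine("evaluating " + i);   // the side effect shows re-evaluation
            return i % 2 == 0;
        });

        PrintTwice(query);            // predicate runs twice per element (8 evaluations)
        PrintTwice(query.ToArray());  // materialized once up front (4 evaluations), then just iterated
    }
}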
Related
What is the difference between returning IList vs List, or IEnumerable vs List?
I want to know which is better to return.
When we need to use one, what effect will it have on performance?
There is no one type that is always better to return. It's a decision you should make based on your design/performance/etc. goals.
IEnumerable<T> is nice when you want to represent a sequence of items that you can iterate over, but you don't want to allow modifications (Add, Delete, etc.).
IList<T> gives you everything you could get using IEnumerable<T>, plus operations that give you more control over a collection: Add, Delete, Count, Index access etc.
List<T> is a concrete implementation of IList<T>. I would say that it's almost always better to expose the IList<T> interface from your methods rather than the List<T> implementation. And it's not just about lists - it's a basic design principle to prefer interfaces over concrete implementations.
Ok, now about non-generic versions IEnumerable, IList, List:
They actually come from very early versions of the .NET Framework, and life is much better with the generic equivalents.
And a few words about performance:
IEnumerable<T> (with IEnumerator<T>) is actually an iterator, which allows you to defer some computations until later. That means there is no need to allocate memory right away to store all the data (of course, this isn't the case when you have, say, an array behind the iterator). You can compute data gradually, as needed. But it also means those computations may be performed over and over again (say, with every foreach loop). On the other hand, with List<T> you have fixed data in memory, with cheap indexing and Count operations. As you can see, it's all about compromise.
Using concrete classes for the parameters and results of methods creates a strong dependency, while using interfaces doesn't. What does that mean?
If in the future you change the implementation of your class and use SynchronizedCollection, LinkedList, or something else instead of List, then you have to change your method signatures, specifically the return types.
After that you not only have to rebuild the assemblies that used this class, you may have to rewrite them.
However, if you use one of the IEnumerable, IReadOnlyCollection, ICollection, or IList interfaces, you won't have to rewrite and recompile client assemblies. Thus, interfaces are always preferred over classes for parameters and results. (But remember, we're talking about dependencies between different assemblies. Within the same assembly this rule is not so important.)
The question is, which interface to use? It depends on the requirements of the client classes (use cases). For example, if you're processing elements one by one, use IEnumerable<T>, and if you need a count of elements, use IReadOnlyCollection<T>. Both of these interfaces are covariant, which is convenient for type casting.
If you need write abilities (Add, Remove, Clear) or non-covariant read abilities (Contains), use ICollection<T>. Finally, if you need random indexed access, use IList<T>.
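For example, that guidance might translate into signatures like these (the service and Order type are made up for illustration):
using System.Collections.Generic;

public class Order
{
    public decimal Total { get; set; }
}

public interface IOrderService
{
    // Processed one by one: IEnumerable<T> is enough (and covariant).
    void PrintAll(IEnumerable<Order> orders);

    // Needs Count but no modification: IReadOnlyCollection<T>.
    decimal AverageTotal(IReadOnlyCollection<Order> orders);

    // Needs Add/Remove/Contains: ICollection<T>.
    void Merge(ICollection<Order> target, IEnumerable<Order> extra);

    // Needs random indexed access: IList<T>.
    Order Middle(IList<Order> orders);
}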
As for performance, invoking a method through an interface is a bit slower, but the difference is insignificant. You shouldn't worry about it.
Must .NET's IList be finite? Suppose I write a class FibonacciList implementing IList<BigInteger>
The property Item[n] returns the nth Fibonacci number.
The property IsReadOnly returns true.
The methods IndexOf and Contains we can implement easily enough because the Fibonacci sequence is increasing - to test if the number m is Fibonacci, we need only to compute the finite sequence of Fibonacci numbers up to m.
The method GetEnumerator() does the right thing.
We've now implemented all the methods expected of read-only ILists except Count().
Is this cool, or an abuse of IList?
Fibonacci numbers get impractically big quickly (hence IList<BigInteger> above). A bounded infinite sequence might be more sensible; it could implement IList<long> or IList<double>.
Addendum II: The Fibonacci sequence may have been a bad example, because computing distant values is expensive - to find the nth value one has to compute all earlier values. Thus, as Mošmondor said, one might as well make it an IEnumerable and use .ElementAt. However, there exist other sequences where one can compute distant values quickly without computing earlier values. (Surprisingly, the digits of pi are such a sequence.) These sequences are more 'listy'; they truly support random access.
Edit: No-one argues against infinite IEnumerables. How do they handle Count()?
To most developers, IList and ICollection imply that you have a pre-evaluated, in-memory collection to work with. With IList specifically, there is an implicit contract of constant-time Add* and indexing operations. This is why LinkedList<T> does not implement IList<T>. I would consider a FibonacciList to be a violation of this implied contract.
Note the following paragraph from a recent MSDN Magazine article discussing the reasons for adding read-only collection interfaces to .NET 4.5:
IEnumerable<T> is sufficient for most scenarios that deal with collections of types, but sometimes you need more power than it provides:
Materialization: IEnumerable<T> does not allow you to express whether the collection is already available (“materialized”) or whether it’s computed every time you iterate over it (for example, if it represents a LINQ query). When an algorithm requires multiple iterations over the collection, this can result in performance degradation if computing the sequence is expensive; it can also cause subtle bugs because of identity mismatches when objects are being generated again on subsequent passes.
As others have pointed out, there is also the question of what you would return for .Count.
It's perfectly fine to use IEnumerable or IQueryable for such collections of data, because there is an expectation that these types can be lazily evaluated.
Regarding Edit 1: .Count() is not implemented by the IEnumerable<T> interface: it is an extension method. As such, developers need to expect that it can take any amount of time, and they need to avoid calling it in cases where they don't actually need to know the number of items. For example, if you just want to know whether an IEnumerable<T> has any items, it's better to use .Any(). If you know that there's a maximum number of items you want to deal with, you can use .Take(). If a collection has more than int.MaxValue items in it, .Count() will throw an OverflowException. So there are some workarounds that can help to reduce the danger associated with infinite sequences. Obviously if programmers haven't taken these possibilities into account, it can still cause problems, though.
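For instance, with an infinite generator (a sketch):
using System;
using System.Collections.Generic;
using System.Linq;

class InfiniteDemo
{
    // An endless sequence of naturals, handed out as a plain IEnumerable<int>.
    static IEnumerable<int> Naturals()
    {
        for (int i = 0; ; i++)
            yield return i;
    }

    static void Main()
    {
        IEnumerable<int> naturals = Naturals();

        Console.WriteLine(naturals.Any());            // true, returns immediately
        Console.WriteLine(naturals.Take(5).Count());  // 5, bounded by Take
        // Console.WriteLine(naturals.Count());       // would enumerate (nearly) forever
    }
}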
Regarding Edit 2: If you're planning to implement your sequence in a way that indexing is constant-time, that addresses my main point pretty handily. Sixlettervariables's answer still holds true, though.
*Obviously there's more to this: Add is only expected to work if IList.IsFixedSize returns false. Modification is only possible if IsReadOnly returns false, etc. IList was a poorly-thought-out interface in the first place: a fact which may finally be remedied by the introduction of read-only collection interfaces in .NET 4.5.
Update
Having given this some additional thought, I've come to the personal opinion that IEnumerable<>s should not be infinite either. In addition to materializing methods like .ToList(), LINQ has several non-streaming operations like .OrderBy() which must consume the entire IEnumerable<> before the first result can be returned. Since so many methods assume IEnumerable<>s are safe to traverse in their entirety, it would be a violation of the Liskov Substitution Principle to produce an IEnumerable<> that is inherently unsafe to traverse indefinitely.
If you find that your application often requires segments of the Fibonacci sequence as IEnumerables, I'd suggest creating a method with a signature similar to Enumerable.Range(int, int), which allows the user to define a starting and ending index.
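A sketch of what such a method could look like (the names are illustrative, not an established API):
using System.Collections.Generic;
using System.Numerics;

public static class FibonacciSequence
{
    // Returns the Fibonacci numbers with indices [start, start + count),
    // in the spirit of Enumerable.Range(int start, int count).
    public static IEnumerable<BigInteger> Range(int start, int count)
    {
        BigInteger a = 0, b = 1;
        for (int i = 0; i < start; i++)
        {
            BigInteger t = a;
            a = b;
            b = t + b;
        }
        for (int i = 0; i < count; i++)
        {
            yield return a;
            BigInteger t = a;
            a = b;
            b = t + b;
        }
    }
}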
If you'd like to embark on a Gee-Whiz project, you could conceivably develop a Fibonacci-based IQueryable<> provider, where users could use a limited subset of LINQ query syntax, like so:
// LINQ to Fibonacci!
var fibQuery = from n in Fibonacci.Numbers // (returns an IQueryable<>)
               where n.Index > 5 && n.Value < 20000
               select n.Value;

var fibCount = fibQuery.Count();
var fibList = fibQuery.ToList();
Since your query provider would have the power to evaluate the where clauses as lambda expressions, you could have enough control to implement Count methods and .GetEnumerator() in such a way as to ensure that the query is restrictive enough to produce a real answer, or to throw an exception as soon as the method is called.
But this reeks of being clever, and would probably be a really bad idea for any real-life software.
I would imagine that a conforming implementation must be finite, otherwise what would you return for ICollection<T>.Count?
/// <summary>
/// Gets the number of elements contained in the <see cref="ICollection{T}" />.
/// </summary>
int Count { get; }
Another consideration is CopyTo, which under its normal overload would never stop in a Fibonacci case.
What this means is an appropriate implementation of a Fibonacci Sequence would be simply IEnumerable<int> (using a generator pattern). (Ab)use of an IList<T> would just cause problems.
In your case, I would rather 'violate' IEnumerable and have my way with yield return.
:)
An infinite collection would probably best be implemented as an IEnumerable<T>, not an IList<T>. You could also make use of the yield return syntax when implementing, like so (ignore overflow issues, etc.):
public IEnumerable<long> Fib()
{
    yield return 1;
    yield return 1;

    long l1 = 1;
    long l2 = 1;

    while (true)
    {
        long t = l1;
        l1 = l2;
        l2 = t + l1;
        yield return l2;
    }
}
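A caller can then take a finite prefix without ever forcing the whole sequence (using System.Linq):
foreach (long f in Fib().Take(10))
    Console.WriteLine(f);   // 1 1 2 3 5 8 13 21 34 55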
As #CodeInChaos pointed out in the comments, the Item property of IList has signature
T this[ int index ] { get; set; }
We see that ILists are indexed by ints, so their length is bounded by Int32.MaxValue; elements at greater indexes would be inaccessible. This occurred to me when writing the question, but I left it out because the problem is fun to think about otherwise.
EDIT
Having had a day to reflect on my answer, and in light of @StriplingWarrior's comment, I fear I have to make a reversal. I started trying this out last night, and now I wonder what I would really lose by abandoning IList?
I think it would be wiser to implement just IEnumerable and declare a local Count() method that throws a NotSupportedException, to prevent the enumerator running until an OutOfMemoryException occurs. I would still add IndexOf and Contains methods and an Item indexer property to expose higher-performance alternatives like Binet's formula, but I'd be free to change the signatures of these members to use extended datatypes, potentially even System.Numerics.BigInteger.
If I were implementing multiple series I would declare an ISeries interface for these members. Who knows, perhaps something like this will eventually be part of the framework.
I disagree with what appears to be a consensus view. Whilst IList has many members that cannot be implemented for an infinite series it does have an IsReadOnly member. It seems acceptable, certainly in the case of ReadOnlyCollection<>, to implement the majority of members with a NotSupportedException. Following this precedent, I don't see why this should be unacceptable if it is a side effect of some other gain in function.
In this specific Fibonacci case, there are established algorithms, see here and here, for short-circuiting the normal cumulative enumeration approach, which I think would yield significant performance benefits. Exposing these benefits through IList seems warranted to me.
Ideally, .NET would support some other, more appropriate super-interface, somewhat closer to IEnumerable<>, but until that arrives in some future version, this has got to be a sensible approach.
I'm working on an implementation of IList<BigInteger> to illustrate
Summarising what I've seen so far:
You can fulfil 5 out of 6, throwing a NotSupportedException on Count()
I would have said this is probably good enough to go for it; however, as servy has pointed out, the indexer is incredibly inefficient for any number that hasn't already been calculated and cached.
In this case, I would say the only contract that fits your continual stream of calculations is IEnumerable.
The other option you have is to create something that looks a lot like an IList but isn't actually.
I just realized that maybe I was mistaken all this time in exposing T[] to my views instead of IEnumerable<T>.
Usually, for this kind of code:
foreach (var item in items) {}
should items be T[] or IEnumerable<T>?
Then, if I need the count of the items, would Array.Length be faster than IEnumerable<T>.Count()?
IEnumerable<T> is generally a better choice here, for the reasons listed elsewhere. However, I want to bring up one point about Count(). Quintin is incorrect when he says that the type itself implements Count(). It's actually implemented in Enumerable.Count() as an extension method, which means other types don't get to override it to provide more efficient implementations.
By default, Count() has to iterate over the whole sequence to count the items. However, it does know about ICollection<T> and ICollection, and is optimised for those cases. (In .NET 3.5 IIRC it's only optimised for ICollection<T>.) Now the array does implement that, so Enumerable.Count() defers to ICollection<T>.Count and avoids iterating over the whole sequence. It's still going to be slightly slower than calling Length directly, because Count() has to discover that it implements ICollection<T> to start with - but at least it's still O(1).
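In spirit, the fast path looks something like this simplified sketch (not the actual framework source, which differs in details and error handling):
using System.Collections;
using System.Collections.Generic;

static class EnumerableSketch
{
    public static int Count<TSource>(IEnumerable<TSource> source)
    {
        // Fast path: collections that already track their size.
        ICollection<TSource> genericCollection = source as ICollection<TSource>;
        if (genericCollection != null) return genericCollection.Count;

        ICollection collection = source as ICollection;
        if (collection != null) return collection.Count;

        // Slow path: walk the whole sequence.
        int count = 0;
        using (IEnumerator<TSource> e = source.GetEnumerator())
        {
            checked
            {
                while (e.MoveNext()) count++;
            }
        }
        return count;
    }
}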
The same kind of thing is true for performance in general: the JITted code may well be somewhat tighter when iterating over an array rather than a general sequence. You'd basically be giving the JIT more information to play with, and even the C# compiler itself treats arrays differently for iteration (using the indexer directly).
However, these performance differences are going to be inconsequential for most applications - I'd definitely go with the more general interface until I had good reason not to.
It's partially inconsequential, but standard theory would dictate "Program against an interface, not an implementation". With the interface model you can change the actual datatype being passed without affecting the caller, as long as it conforms to the same interface.
The contrast to that is that you might have a reason for exposing an array specifically and in which case would want to express that.
For your example I think IEnumerable<T> would be desirable. It's also worth noting, for testing purposes, that using an interface can reduce the headache of having to re-create particular classes all the time; collections aren't generally that bad, but having an interface contract you can mock easily is very nice.
Added for edit:
This is mostly inconsequential, because the underlying datatype is what will implement the Count() method; for an array it should access the known length, so I would not worry about any perceived overhead of the method.
See Jon Skeet's answer for an explanation of the Count() implementation.
T[] (single-dimensional, zero-based) also implements ICollection<T> and IList<T> along with IEnumerable<T>.
Therefore, if you want less coupling in your application, IEnumerable<T> is preferable - unless you want indexed access inside the foreach.
Since the Array class implements the System.Collections.Generic.IList<T>, System.Collections.Generic.ICollection<T>, and System.Collections.Generic.IEnumerable<T> generic interfaces, I would use IEnumerable, unless you need members from those other interfaces.
http://msdn.microsoft.com/en-us/library/system.array.aspx
Your gut feeling is correct, if all the view cares about, or should care about, is having an enumerable, that's all it should demand in its interfaces.
What is it logically (conceptually) from the outside?
If it's an array, then return the array. If the only point is to enumerate, then return IEnumerable. Otherwise IList or ICollection may be the way to go.
If you want to offer lots of functionality but not allow it to be modified, then perhaps use a List<T> internally and return the ReadOnlyCollection<T> produced by its .AsReadOnly() method.
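For example (a sketch; the class and property names are made up):
using System.Collections.Generic;
using System.Collections.ObjectModel;

public class Catalog
{
    private readonly List<string> _items = new List<string>();

    // Callers get counted, indexed, read-only access but cannot modify the list.
    public ReadOnlyCollection<string> Items
    {
        get { return _items.AsReadOnly(); }
    }
}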
Given that changing the code from an array to IEnumerable at a later date is easy, but changing it the other way is not, I would go with IEnumerable until you know you need the small speed benefit of returning an array.
When I'm writing my DAL or other code that returns a set of items, should I always make my return statement:
public IEnumerable<FooBar> GetRecentItems()
or
public IList<FooBar> GetRecentItems()
Currently, in my code I have been trying to use IEnumerable as much as possible but I'm not sure if this is best practice? It seemed right because I was returning the most generic datatype while still being descriptive of what it does, but perhaps this isn't correct to do.
The Framework Design Guidelines recommend using the class Collection<T> when you need to return a collection that is modifiable by the caller, or ReadOnlyCollection<T> for read-only collections.
The reason this is preferred over a simple IList is that IList does not inform the caller whether it's read-only or not.
If you return an IEnumerable<T> instead, certain operations may be a little trickier for the caller to perform. Also you no longer will give the caller the flexibility to modify the collection, something that you may or may not want.
Keep in mind that LINQ contains a few tricks up its sleeve and will optimize certain calls based on the type they are performed on. So, for example, if you perform a Count and the underlying collection is a List it will NOT walk through all the elements.
Personally, for an ORM I would probably stick with Collection<T> as my return value.
It really depends on why you are using that specific interface.
For example, IList<T> has several methods that aren't present in IEnumerable<T>:
IndexOf(T item)
Insert(int index, T item)
RemoveAt(int index)
and Properties:
T this[int index] { get; set; }
If you need these methods in any way, then by all means return IList<T>.
Also, if the method that consumes your IEnumerable<T> result is expecting an IList<T>, it will save the CLR from considering any conversions required, thus optimizing the compiled code.
In general, you should require the most generic and return the most specific thing that you can. So if you have a method that takes a parameter, and you only really need what's available in IEnumerable, then that should be your parameter type. If your method could return either an IList or an IEnumerable, prefer returning IList. This ensures that it is usable by the widest range of consumers.
Be loose in what you require, and explicit in what you provide.
That depends...
Returning the least derived type (IEnumerable) will leave you the most leeway to change the underlying implementation down the track.
Returning a more derived type (IList) provides the users of your API with more operations on the result.
I would always suggest returning the least derived type that has all the operations your users are going to need... so basically, you first have to determine what operations on the result make sense in the context of the API you are defining.
One thing to consider is that if you're using a deferred-execution LINQ statement to generate your IEnumerable<T>, calling .ToList() before you return from your method means that your items may be iterated twice - once to create the List, and once when the caller loops through, filters, or transforms your return value. When practical, I like to avoid converting the results of LINQ-to-Objects to a concrete List or Dictionary until I have to. If my caller needs a List, that's a single easy method call away - I don't need to make that decision for them, and that makes my code slightly more efficient in the cases where the caller is just doing a foreach.
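A sketch of that trade-off, reusing the GetRecentItems example (the repository and the IsRecent filter are hypothetical):
using System.Collections.Generic;
using System.Linq;

public class FooBar
{
    public bool IsRecent { get; set; }
}

public class FooBarRepository
{
    private readonly List<FooBar> _all = new List<FooBar>();

    // Deferred: nothing is iterated until the caller enumerates, and a caller
    // that only does a foreach never pays for building an intermediate List.
    public IEnumerable<FooBar> GetRecentItems()
    {
        return _all.Where(f => f.IsRecent);
    }

    // Materialized: the items are iterated here to build the List,
    // and the caller typically iterates them again afterwards.
    public List<FooBar> GetRecentItemsList()
    {
        return _all.Where(f => f.IsRecent).ToList();
    }
}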
List<T> offers the calling code many more features, such as modifying the returned object and access by index. So the question boils down to: in your application's specific use case, do you WANT to support such uses (presumably by returning a freshly constructed collection!), for the caller's convenience -- or do you want speed for the simple case when all the caller needs is to loop through the collection and you can safely return a reference to a real underlying collection without fearing this will get it erroneously changed, etc?
Only you can answer this question, and only by understanding well what your callers will want to do with the return value, and how important performance is here (how big are the collections you would be copying, how likely is this to be a bottleneck, etc).
I think you can use either, but each has its use. Basically List is an IEnumerable, but you also get Count, Add element, and Remove element.
IEnumerable is not efficient for counting elements.
If the collection is intended to be read-only, or modification of the collection is controlled by the parent, then returning an IList just for Count is not a good idea.
In LINQ, there is a Count() extension method on IEnumerable<T> which will short-circuit to the collection's Count property when the underlying type implements ICollection<T>, so the performance difference is negligible.
Generally I feel (opinion) it is better practice to return IEnumerable where possible. If you need to allow additions, add those methods to the parent class; otherwise the consumer ends up managing the collection inside the model, which violates the principles - e.g. manufacturer.Models.Add(model) violates the Law of Demeter. Of course these are just guidelines and not hard and fast rules, but until you have a full grasp of their applicability, following them blindly is better than not following them at all.
public interface IManufacturer
{
    IEnumerable<Model> Models { get; }
    void AddModel(Model model);
}
(Note: If using NHibernate you might need to map to a private IList using different accessors.)
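A sketch of an implementing class for the interface above (the details are illustrative):
using System.Collections.Generic;

public class Model { }

public class Manufacturer : IManufacturer
{
    private readonly List<Model> _models = new List<Model>();

    // Exposed as IEnumerable<Model>: callers can iterate but cannot Add/Remove directly.
    public IEnumerable<Model> Models
    {
        get { return _models; }
    }

    // Mutation goes through the parent, which keeps control in one place.
    public void AddModel(Model model)
    {
        _models.Add(model);
    }
}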
It's not so simple when you are talking about return values instead of input parameters. When it's an input parameter, you know exactly what you need to do. So, if you need to be able to iterate over the collection, you take an IEnumerable, whereas if you need to add or remove, you take an IList.
In the case of a return value, it's tougher. What does your caller expect? If you return an IEnumerable, then he will not know a priori that he can make an IList out of it. But, if you return an IList, he will know that he can iterate over it. So, you have to take into account what your caller is going to do with the data. The functionality that your caller needs/expects is what should govern when making the decision on what to return.
TL;DR – summary
If you develop in-house software, do use the specific type (like List<T>) for return values and the most generic type for input parameters, even in the case of collections.
If a method is part of a redistributable library's public API, use interfaces instead of concrete collection types for both return values and input parameters.
If a method returns a read-only collection, show that by using IReadOnlyList<T> or IReadOnlyCollection<T> as the return value type.
More
As everyone has said, it depends.
If you don't want Add/Remove functionality at the calling layer, then I will vote for IEnumerable, as it provides only iteration and basic functionality, which I like from a design perspective.
Returning IList - my vote is always against it, but it's mainly about what you like and what you don't.
In performance terms, I think they are much the same.
If you do not need to count items in your external code, it is always better to return IEnumerable, because later you can change your implementation (without impact on the external code), for example to yield-iterator logic, and conserve memory resources (a very good language feature, by the way).
However, if you need the item count, don't forget that there is another layer between IEnumerable and IList - ICollection.
I might be a bit off here, seeing that no one else suggested it so far, but why don't you return an (I)Collection<T>?
From what I remember, Collection<T> was the preferred return type over List<T> because it abstracts away the implementation. They all implement IEnumerable, but that sounds to me a bit too low-level for the job.
I think you can use either, but each has its use. Basically List is an IEnumerable, but you also get Count, Add element, and Remove element.
IEnumerable is not efficient for counting elements or for getting a specific element in the collection.
List is a collection that is ideally suited to finding specific elements and to adding or removing elements.
Generally I try to use List where possible, as this gives me more flexibility.
Use
List<FooBar> GetRecentItems()
rather than
IList<FooBar> GetRecentItems()
I think the general rule is to use the more specific class to return, to avoid doing unneeded work and give your caller more options.
That said, I think it's more important to consider the code in front of you which you are writing than the code the next guy will write (within reason.) This is because you can make assumptions about the code that already exists.
Remember that moving UP to a collection from IEnumerable in an interface will work, moving down to IEnumerable from a collection will break existing code.
If these opinions all seem conflicted, it's because the decision is subjective.
IEnumerable<T> exposes only a small subset of what List<T> offers; List<T> gives you everything IEnumerable<T> does, and more. Use IEnumerable<T> if you only need the smaller set of features; use List<T> if you plan to use the larger, richer set of features.
The Pizza Explanation
Here is a much more comprehensive explanation of why you would use an Interface like IEnumerable<T> versus List<T>, or vice versa, when instantiating objects in C-family languages like Microsoft C#.
Think of Interfaces like IEnumerable<T> and IList<T> as the individual ingredients in a pizza (pepperoni, mushrooms, black olives...) and concrete classes like List<T> as the pizza. List<T> is in fact a Supreme Pizza that always contains all the Interface ingredients combined (ICollection, IEnumerable, IList, etc).
What you get as far as a pizza and its toppings is determined by how you "type" your list when you create its object reference in memory. You have to declare the type of pizza you are cooking as follows:
// Pepperoni Pizza: This gives you a single Interface's members,
// or a pizza with one topping because List<T> is limited to
// acting like an IEnumerable<T> type.
IEnumerable<string> pepperoniPizza = new List<string>();
// Supreme Pizza: This gives you access to ALL 8 Interface
// members combined or a pizza with ALL the ingredients
// because List type uses all Interfaces!!
IList<string> supremePizza = new List<string>();
Note you cannot instantiate an Interface as itself (or eat raw pepperoni). When you declare your List<T> variable as a single Interface type like IEnumerable<T>, you only have access to that Interface's members and get the pepperoni pizza with one topping. You can only call IEnumerable<T> members and cannot see all the other Interface members of List<T>.
When the List<T> is declared as IList<T>, you have access to the members of IList<T> and of everything it inherits (ICollection<T>, IEnumerable<T>) - much more of the Supreme Pizza's toppings!
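Concretely (a sketch), the compile-time type of the variable is what decides which members you can call; the runtime object is a List<string> in both cases:
IEnumerable<string> pepperoniPizza = new List<string>();
// pepperoniPizza.Add("anchovies");        // does not compile: IEnumerable<string> has no Add

IList<string> supremePizza = new List<string>();
supremePizza.Add("anchovies");             // fine: IList<string> exposes Add

// The underlying object is still a List<string>, so you can cast back if you must:
((List<string>)pepperoniPizza).Add("anchovies");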
Here is the List<T> class, showing you WHY that is. Notice that List<T> in the .NET Library implements all the other Interfaces, including IList! But IEnumerable<T> declares just a small subset of those members.
public class List<T> :
    ICollection<T>,
    IEnumerable<T>,
    IEnumerable,
    IList<T>,
    IReadOnlyCollection<T>,
    IReadOnlyList<T>,
    ICollection,
    IList
{
    // List<T> types implement all these goodies and more!
    public List();
    public List(IEnumerable<T> collection);
    public List(int capacity);

    public T this[int index] { get; set; }
    public int Count { get; }
    public int Capacity { get; set; }

    public void Add(T item);
    public void AddRange(IEnumerable<T> collection);
    public ReadOnlyCollection<T> AsReadOnly();
    public bool Exists(Predicate<T> match);
    public T Find(Predicate<T> match);
    public void ForEach(Action<T> action);
    public void RemoveAt(int index);
    public void Sort(Comparison<T> comparison);

    // ......and much more....
}
So why NOT instantiate List<T> as List<T> ALL THE TIME?
Instantiating a List<T> as List<T> gives you access to all the Interface members (and List<T>'s own extras)! But you might not need everything. Declaring the variable as a single Interface type limits what your code depends on and keeps your application loosely coupled. Who needs Supreme Pizza every time?
But there is a second reason for using Interface types: Flexibility. Because other types in .NET, including your own custom ones, might use the same "popular" Interface type, it means you can later substitute your List<T> type with any other type that implements, say IEnumerable<T>. If your variable is an Interface type, you can now switch out the object created with something other than List<T>. Dependency Injection is a good example of this type of flexibility using Interfaces rather than concrete types, and why you might want to create objects using Interfaces.
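For example (a sketch), code written against IEnumerable<string> will accept any of these without changing:
IEnumerable<string> names = new List<string> { "a", "b", "c" };
names = new string[] { "a", "b", "c" };
names = new HashSet<string> { "a", "b", "c" };
names = new Queue<string>(new[] { "a", "b", "c" });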
EDIT:
From the answers given, it's been made rather clear to me how the design I'm asking about below should actually be implemented. With those suggestions in mind (and in response to a comment politely pointing out that my example code does not even compile), I've edited the following code to reflect what the general consensus seems to be. The question that remains may no longer make sense in light of the code, but I'm leaving it as it is for posterity.
Suppose I have three overloads of a function, one taking IEnumerable<T>, one taking ICollection<T>, and one taking IList<T>, something like the following:
public static T GetMiddle<T>(IEnumerable<T> values) {
    IList<T> list = values as IList<T>;
    if (list != null) return GetMiddle(list);

    int count = GetCount<T>(values);
    T middle = default(T);
    int index = 0;
    foreach (T value in values) {
        if (index++ >= count / 2) {
            middle = value;
            break;
        }
    }
    return middle;
}

private static T GetMiddle<T>(IList<T> values) {
    int middleIndex = values.Count / 2;
    return values[middleIndex];
}

private static int GetCount<T>(IEnumerable<T> values) {
    // if values is actually an ICollection<T> (e.g., List<T>),
    // we can get the count quite cheaply
    ICollection<T> genericCollection = values as ICollection<T>;
    if (genericCollection != null) return genericCollection.Count;

    // same for ICollection (e.g., Queue<T>, Stack<T>)
    ICollection collection = values as ICollection;
    if (collection != null) return collection.Count;

    // otherwise, we've got to count values ourselves
    int count = 0;
    foreach (T value in values) count++;
    return count;
}
The idea here is that, if I've got an IList<T>, that makes my job easiest; on the other hand, I can still do the job with an ICollection<T> or even an IEnumerable<T>; the implementation for those interfaces just isn't as efficient.
I wasn't sure if this would even work (if the runtime would be able to choose an overload based on the parameter passed), but I've tested it and it seems to.
My question is: is there a problem with this approach that I haven't thought of? Alternately, is this in fact a good approach, but there's a better way of accomplishing it (maybe by attempting to cast the values argument up to an IList<T> first and running the more efficient overload if the cast works)? I'm just interested to know others' thoughts.
If you have a look at how LINQ extension methods are implemented using Reflector, you can see that a few extension methods on IEnumerable<T>, such as Count(), attempt to cast the sequence to an ICollection<T> or an IList<T> to optimize the operation (for example, using the ICollection<T>.Count property instead of iterating through an IEnumerable<T> and counting the elements). So your best bet is most likely to accept an IEnumerable<T> and then do this kind of optimizations if ICollection<T> or IList<T> are available.
I think one version accepting IEnumerable<T> would be the way to go, and check inside the method if the parameter is one of the more derived collection types. With three versions as you propose, you lose the efficiency benefit if someone passes you a (runtime) IList<T> that the compiler statically considers an IEnumerable<T>:
IList<string> stringList = new List<string> { "A", "B", "C" };
IEnumerable<string> seq = stringList;
Extensions.GetMiddle(stringList); // calls IList version
Extensions.GetMiddle(seq); // calls IEnumerable version
I'd say it's uncommon, and potentially confusing, so would be unlikely to be a good choice for a public API.
You could accept an IEnumerable<T> parameter, and internally check if it is in fact an ICollection<T> or IList<T>, and optimize accordingly.
This might be analogous to some of the optimizations in some of the IEnumerable<T> extension methods in the .NET 3.5 Framework.
I am really indifferent. If I saw it your way I would not think anything of it. But Joe's idea has merit. It might look like the following.
public static T GetMiddle<T>(IEnumerable<T> values)
{
    if (values is IList<T>) return GetMiddle((IList<T>)values);
    if (values is ICollection<T>) return GetMiddle((ICollection<T>)values);
    // Use the default implementation here.
}

private static T GetMiddle<T>(ICollection<T> values)
{
    // Implementation that can take advantage of Count (elided).
}

private static T GetMiddle<T>(IList<T> values)
{
    // Implementation that can use indexed access (elided).
}
While it is legal to overload a method to accept either a base type or a derived type, with all other parameters being otherwise identical, it is only advantageous to do so if the compiler will often be able to identify the latter form as being a better match. Because it would be very common for objects which implement ICollection<T> to be passed around by code which only needs an IEnumerable<T>, it would be very common for implementations of ICollection<T> to be passed into the IEnumerable<T> overload. Consequently, the IEnumerable<T> overload should probably check whether a passed-in object implements ICollection<T> and handle it specially if so.
If the most natural way of implementing the logic for an ICollection<T> would be to write a special method for it, there would be nothing particularly wrong with having a public overload which accepts an ICollection<T>, and having the IEnumerable<T> overload call the ICollection<T> one if given an object that implements ICollection<T>. Having such an overload be public wouldn't add much value, but it likely wouldn't hurt anything either. On the other hand, in situations where an object implements both IEnumerable<T> and ICollection, but not ICollection<T> (for example, a List<Cat> implements IEnumerable<Animal> and ICollection, but not ICollection<Animal>), one might want to use both interfaces, but that could not be done without either typecasting in the method that uses them, or passing the method which uses them both an ICollection reference and an IEnumerable<T> reference. The latter would be very ugly in a public method, and the former approach would lose the benefits of overloading.
Usually when designing interfaces you want to accept a 'lowest common denominator' type for the arguments. For return types it is a matter of some debate. I generally think creating the above overloads is overkill. Its biggest problem is the introduction of unneeded code paths that now must be tested. Better to have one method that performs the operation one way and works 100% of the time. With the given overloads above you might have an inconsistency in behavior and not even realize it, or worse yet you may accidentally introduce a change in one and not in the other copies.
If you can do it with IEnumerable<T> then use that, if not then use the least interface needed.
No. It's certainly uncommon.
Anyway.
Since IList<T> inherits from ICollection<T> and IEnumerable<T>, and ICollection<T> inherits from IEnumerable<T>, your only concern would be performance in IEnumerable<T> types.
I just see no reason to overload the function in that way, providing different signatures to achieve exactly the same result and accepting exactly the same types as parameters (no matter whether you have an IEnumerable<T> or an IList<T>, you would be able to pass it to any of the three overloads); that would just cause confusion.
When you overload a function, it is just to provide a way to pass a different type of parameter that you could not pass to the function if it did not have that signature.
Don't optimize unless it's really necessary.
If you want to optimize, do it undercover.
You wouldn't expect someone using your class to have to be aware of that "optimization" in order to decide which method signature to use, right?