Is IList really a better alternative to an array - c#

Today I read a post from Eric Lippert that describes the harm of arrays. It mentions that when we need a collection of values we should provide values, not a variable that points to a list of values. Thus Eric suggests that whenever we want to return a collection of items from a method, we should return an IList<T>, which provides the same as an array, namely it:
enables indexed access
enables iterating the items
is strongly typed
However, in contrast to an array, the list also provides members that add or remove items and thus modify the collection object. We could of course wrap the collection in a ReadOnlyCollection and return an IEnumerable<T>, but then we'd lose indexed access. Moreover, a caller can't know whether the ReSharper warning "Possible multiple enumeration of IEnumerable" applies, as they don't know that internally the enumeration is just a list wrapped in a ReadOnlyCollection. So a caller can't know whether the collection has already been materialized.
So what I want is a collection of items where the collection itself is immutable (the items, however, don't have to be; they wouldn't be with an IList either), meaning we can't add, remove or insert items in the underlying list. However, it seems weird to me to return a ReadOnlyCollection from my API; at least I've never seen an API doing so.
Thus an array seems perfect for my needs, doesn't it?

We could of course wrap the collection into a ReadOnlyCollection and return an IEnumerable<T>
Why do that? ReadOnlyCollection<T> implements IList<T>, so unless there's some better approach, declaring a return type of IList<T> and returning an instance of ReadOnlyCollection<T> seems like a good way to go.
However, it just so happens that in current .NET Framework versions, there is a slightly better way: return an instance of ReadOnlyCollection<T>, but specify a return type of IReadOnlyList<T>. While IList<T> doesn't really promise to allow modification by the caller, IReadOnlyList<T> is explicit about the intent.
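A minimal sketch of that suggestion, written as C# top-level statements (C# 9 or later). The method name GetNames and its sample data are hypothetical, chosen only for illustration; the point is that the declared return type is IReadOnlyList<T> while the instance is a ReadOnlyCollection<T>.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical API method: builds a list privately and exposes it read-only.
IReadOnlyList<string> GetNames()
{
    var names = new List<string> { "Ada", "Grace", "Linus" };
    // AsReadOnly() returns a ReadOnlyCollection<string>, which implements IReadOnlyList<string>.
    return names.AsReadOnly();
}

IReadOnlyList<string> result = GetNames();
Console.WriteLine(result[1]);    // indexed access is still available
Console.WriteLine(result.Count); // the count is known without enumerating

// A caller that casts to IList<string> still cannot add to the wrapper.
bool addRejected = false;
try
{
    ((IList<string>)result).Add("Eve");
}
catch (NotSupportedException)
{
    addRejected = true;
}
Console.WriteLine(addRejected);
```

The caller keeps indexing and Count, and the signature itself says that no Add or Remove is on offer.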

Related

IEnumerable to IReadOnlyCollection

I have an IEnumerable<Object> and need to pass it to a method as a parameter, but this method takes an IReadOnlyCollection<Object>.
Is it possible to convert an IEnumerable<Object> to an IReadOnlyCollection<Object>?
One way would be to construct a list, and call AsReadOnly() on it:
IReadOnlyCollection<Object> rdOnly = orig.ToList().AsReadOnly();
This produces ReadOnlyCollection<object>, which implements IReadOnlyCollection<Object>.
Note: Since List<T> implements IReadOnlyCollection<T> as well, the call to AsReadOnly() is optional. Although it is possible to call your method with the result of ToList(), I prefer using AsReadOnly(), so that the readers of my code would see that the method that I am calling has no intention to modify my list. Of course they could find out the same thing by looking at the signature of the method that I am calling, but it is nice to be explicit about it.
Since the other answers seem to steer in the direction of wrapping the collections in a truly read-only type, let me add this.
I have rarely, if ever, seen a situation where the caller is so scared that an IEnumerable<T>-taking method might maliciously try to cast that IEnumerable<T> back to a List or other mutable type, and start mutating it. Cue organ music and evil laughter!
No. If the code you are working with is even remotely reasonable, then if it asks for a type that only has read functionality (IEnumerable<T>, IReadOnlyCollection<T>...), it will only read.
Use ToList() and be done with it.
As a side note, if you are creating the method in question, it is generally best to ask for no more than an IEnumerable<T>, indicating that you "just want a bunch of items to read". Whether or not you need its Count or need to enumerate it multiple times is an implementation detail, and is certainly prone to change. If you need multiple enumeration, simply do this:
items = items as IReadOnlyCollection<T> ?? items.ToList(); // Avoid multiple enumeration
This keeps the responsibility where it belongs (as locally as possible) and the method signature clean.
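As a rough illustration of that one-liner in context, here is a sketch using C# top-level statements. The Average function and its inputs are made up for the example; it needs both a count and a second pass over the items, so it materializes the sequence only when the argument is not already a collection.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical method: asks only for IEnumerable<int>, but needs Count and Sum.
double Average(IEnumerable<int> items)
{
    // Reuse the argument if it is already a materialized collection;
    // otherwise copy it once so the source is enumerated a single time.
    var materialized = items as IReadOnlyCollection<int> ?? items.ToList();

    if (materialized.Count == 0)
    {
        return 0;
    }
    return (double)materialized.Sum() / materialized.Count;
}

// An array is already an IReadOnlyCollection<int>, so no copy is made.
double fromArray = Average(new[] { 1, 2, 3 });

// A lazy LINQ query is not a collection, so it is copied into a List<int> first.
double fromQuery = Average(Enumerable.Range(1, 4).Where(n => n % 2 == 0));

Console.WriteLine(fromArray);
Console.WriteLine(fromQuery);
```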
When returning a bunch of items, on the other hand, I prefer to return an IReadOnlyCollection<T>. Why? The goal is to give the caller something that fulfills reasonable expectations - no more, no less. Those expectations are usually that the collection is materialized and that the Count is known - precisely what IReadOnlyCollection<T> provides (and a simple IEnumerable<T> does not). By being no more specific than this, our contract matches expectations, and the method is still free to change the underlying collection. (In contrast, if a method returns a List<T>, it makes me wonder what context there is that I should want to index into the list and mutate it... and the answer is usually "none".)
As an alternative to dasblinkenlight's answer, to prevent the caller casting to List<T>, instead of doing orig.ToList().AsReadOnly(), the following might be better:
ReadOnlyCollection<object> rdOnly = Array.AsReadOnly(orig.ToArray());
It's the same number of method calls, but one takes the other as a parameter instead of being called on the return value.
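To see what a caller could actually cast back to, the following sketch (top-level statements, with a made-up orig sequence) compares the three options discussed in these answers. Only the bare ToList() result is still a List<T>; both AsReadOnly() variants produce a ReadOnlyCollection<T> wrapper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sample input; in the question this would be whatever IEnumerable<object> you have.
IEnumerable<object> orig = new object[] { 1, "two", 3.0 };

IReadOnlyCollection<object> plainList = orig.ToList();
IReadOnlyCollection<object> wrappedList = orig.ToList().AsReadOnly();
IReadOnlyCollection<object> wrappedArray = Array.AsReadOnly(orig.ToArray());

// Can the receiver get a mutable List<object> back with a cast?
bool plainIsList = plainList is List<object>;           // true: it is the list itself
bool wrappedListIsList = wrappedList is List<object>;   // false: it is a wrapper
bool wrappedArrayIsList = wrappedArray is List<object>; // false: it is a wrapper

Console.WriteLine($"{plainIsList} {wrappedListIsList} {wrappedArrayIsList}");
```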

Returning 'IList' vs 'ICollection' vs 'Collection'

I am confused about which collection type that I should return from my public API methods and properties.
The collections that I have in mind are IList, ICollection and Collection.
Is returning one of these types always preferred over the others, or does it depend on the specific situation?
ICollection<T> is an interface that exposes collection semantics such as Add(), Remove(), and Count.
Collection<T> is a concrete implementation of the ICollection<T> interface.
IList<T> is essentially an ICollection<T> with random order-based access.
In this case you should decide whether or not your results require list semantics such as order based indexing (then use IList<T>) or whether you just need to return an unordered "bag" of results (then use ICollection<T>).
Generally you should return a type that is as general as possible, i.e. one that knows just enough of the returned data that the consumer needs to use. That way you have greater freedom to change the implementation of the API, without breaking the code that is using it.
Consider also the IEnumerable<T> interface as return type. If the result is only going to be iterated, the consumer doesn't need more than that.
The main difference between the IList<T> and ICollection<T> interfaces is that IList<T> allows you to access elements via an index. IList<T> describes array-like types. Elements in an ICollection<T> can only be accessed through enumeration. Both allow the insertion and deletion of elements.
If you only need to enumerate a collection, then IEnumerable<T> is to be preferred. It has three advantages over the others:
It disallows changes to the collection (but not to the elements, if they are of reference type).
It allows the largest possible variety of sources, including enumerations that are generated algorithmically and are not collections at all.
Allows lazy evaluation and can be queried with LINQ.
Collection<T> is a base class that is mainly useful to implementers of collections. If you expose it in interfaces (APIs), many useful collections not deriving from it will be excluded.
One disadvantage of IList<T> is that arrays implement it but do not allow you to add or remove items (i.e. you cannot change the array length). An exception will be thrown if you call IList<T>.Add(item) on an array. The situation is somewhat defused as IList<T> has a Boolean property IsReadOnly that you can check before attempting to do so. But in my eyes, this is still a design flaw in the library. Therefore, I use List<T> directly, when the possibility to add or remove items is required.
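The behaviour described above can be sketched as follows (top-level statements; the helper name TryAdd is hypothetical). Note that an array seen through IList<T> reports IsReadOnly as true even though its existing elements can still be overwritten through the indexer.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: checks IsReadOnly before attempting to add.
bool TryAdd(IList<int> target, int item)
{
    if (target.IsReadOnly)
    {
        return false;
    }
    target.Add(item);
    return true;
}

IList<int> array = new[] { 1, 2, 3 };
IList<int> list = new List<int> { 1, 2, 3 };

bool addedToArray = TryAdd(array, 4); // the array reports IsReadOnly == true
bool addedToList = TryAdd(list, 4);

// Existing array elements can still be replaced through the interface.
array[0] = 10;

// Calling Add without the check throws NotSupportedException.
bool addThrew = false;
try
{
    array.Add(5);
}
catch (NotSupportedException)
{
    addThrew = true;
}

Console.WriteLine($"{addedToArray} {addedToList} {array[0]} {addThrew}");
```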
Which one should I choose? Let's consider just List<T> and IEnumerable<T> as examples for specialized / generalized types:
Method input parameter
IEnumerable<T>: Greatest flexibility for the caller. Restrictive for the implementer, read-only.
List<T>: Restrictive for the caller. Gives flexibility to the implementer, who can manipulate the collection.
Method output parameter or return value
IEnumerable<T>: Restrictive for the caller, read-only. Greatest flexibility for the implementer, who can return almost any collection or implement an iterator (yield return).
List<T>: Greatest flexibility for the caller, who can manipulate the returned collection. Restrictive for the implementer.
Well, at this point you may be disappointed because I don't give you a simple answer. A statement like "always use this for input and that for output" would not be constructive. The reality is that it depends on use case. A method like void AddMissingEntries(TColl collection) will have to provide a collection type having an Add method or may even require a HashSet<T> for efficiency. A method void PrintItems(TColl collection) can happily live with an IEnumerable<T>.
IList<T> is the base interface for all generic lists. Since it is an ordered collection, the implementation can decide on the ordering, ranging from sorted order to insertion order. Moreover, IList<T> has an Item property (the indexer) that allows methods to read and edit entries in the list based on their index.
It also makes it possible to insert a value into, or remove a value from, the list at a given index.
Also, since IList<T> : ICollection<T>, all the members of ICollection<T> are available here as well.
ICollection<T> is the base interface for all generic collections. It defines the size, the enumerator, and the Add/Remove/Contains members (the synchronization members SyncRoot and IsSynchronized belong to the non-generic ICollection). You can add an item to or remove an item from a collection, but you cannot choose at which position this happens, because there is no indexer.
Collection<T> provides an implementation for IList<T>, IList and IReadOnlyList<T>.
If you expose a narrower interface type such as ICollection<T> instead of IList<T>, you promise less and so protect callers against breaking changes. If you expose a wider interface type such as IList<T>, later changes to the implementation are more likely to break calling code.
Quoting from a source,
ICollection, ICollection<T>: You want to modify the collection or you care about its size.
IList, IList<T>: You want to modify the collection and you care about the ordering and/or positioning of the elements in the collection.
Returning an interface type is more general, so (lacking further information on your specific use case) I'd lean towards that. If you want to expose indexing support, choose IList<T>, otherwise ICollection<T> will suffice. Finally, if you want to indicate that the returned types are read only, choose IEnumerable<T>.
And, in case you haven't read it before, Brad Abrams and Krzysztof Cwalina wrote a great book titled "Framework Design Guidelines: Conventions, Idioms, and Patterns for Reusable .NET Libraries" (you can download a digest from here).
There are several subjects that come out of this question:
interfaces versus classes
which specific class to pick from several similar ones (collection, list, array)
common classes versus typed-element ("generics") collections
You may want to highlight that it's an object-oriented API.
interfaces versus classes
If you don't have much experience with interfaces, I recommend sticking to classes.
I often see developers jump to interfaces even when it isn't necessary, and end up with a poor interface design instead of a good class design, which, by the way, can eventually be migrated to a good interface design.
You'll see a lot of interfaces in APIs, but don't rush to them if you don't need them.
You will eventually learn how to apply interfaces to your code.
which specific class to pick from several similar ones (collection, list, array)
There are several classes in C# (.NET) that can be interchanged. As already mentioned, if you need something from a more specific class, such as "CanBeSortedClass", then make it explicit in your API.
Does your API user really need to know that your class can be sorted, or that some format can be applied to its elements? Then use "CanBeSortedClass" or "ElementsCanBePaintedClass".
Otherwise, use a more general class such as "GenericBrandClass".
common collection classes versus typed-element ("generics") collections
You'll find that there are classes that contain other elements, and you can specify that all elements should be of a specific type.
Generic collections are classes that let you reuse the same collection in several places without having to create a new collection class for each element type, like this: Collection<T>.
Is your API user going to need one very specific type, the same for all elements?
Use something like List<WashingtonApple>.
Is your API user going to need several related types?
Expose List<Fruit> in your API, and use List<Orange>, List<Banana> and List<Strawberry> internally, where Orange, Banana and Strawberry descend from Fruit. (Note that a List<Orange> is not itself convertible to a List<Fruit>; the items have to be copied, or the result exposed through a covariant interface such as IEnumerable<Fruit>.)
Is your API user going to need a collection of arbitrary types?
Use List<object>, where all items are objects.
Cheers.

Why array implements IList?

See the definition of System.Array class
public abstract class Array : IList, ...
Theoretically, I should be able to write this bit and be happy
int[] list = new int[] {};
IList iList = (IList)list;
I also should be able to call any method from the iList
iList.Add(1); // exception here
My question is not why I get an exception, but rather why Array implements IList?
Because an array allows fast access by index, and IList/IList<T> are the only collection interfaces that support this. So perhaps your real question is "Why is there no interface for constant collections with indexers?" And to that I have no answer.
There are no read-only interfaces for collections either, and I miss those even more than a constant-size interface with an indexer.
IMO there should be several more (generic) collection interfaces depending on the features of a collection. And the names should have been different too, List for something with an indexer is really stupid IMO.
Just Enumeration IEnumerable<T>
Readonly but no indexer (.Count, .Contains,...)
Resizable but no indexer, i.e. set like (Add, Remove,...) current ICollection<T>
Readonly with indexer (indexer, indexof,...)
Constant size with indexer (indexer with a setter)
Variable size with indexer (Insert,...) current IList<T>
I think the current collection interfaces are bad design. But since they have properties telling you which methods are valid (and this is part of the contract of these methods), it doesn't break the substitution principle.
The remarks section of the documentation for IList says:
IList is a descendant of the ICollection interface and is the base interface of all non-generic lists. IList implementations fall into three categories: read-only, fixed-size, and variable-size. A read-only IList cannot be modified. A fixed-size IList does not allow the addition or removal of elements, but it allows the modification of existing elements. A variable-size IList allows the addition, removal, and modification of elements.
Obviously, arrays fall into the fixed-size category, so by the definition of the interface it makes sense.
Because not all ILists are mutable (see IList.IsFixedSize and IList.IsReadOnly), and arrays certainly behave like fixed-size lists.
If your question is really "why does it implement a non-generic interface", then the answer is that these were around before generics came along.
It's a legacy from the times when it wasn't clear how to deal with read-only collections and whether or not Array is read-only. There are IsFixedSize and IsReadOnly flags in the IList interface. The IsReadOnly flag means that the collection can't be changed at all, and IsFixedSize means that the collection allows existing items to be modified, but not items to be added or removed.
By the time of .NET 4.5 it was clear that some "intermediate" interfaces were required to work with read-only collections, so IReadOnlyCollection<T> and IReadOnlyList<T> were introduced.
Here is a great blog post describing the details: Read only collections in .NET
The definition of the IList interface is "Represents a non-generic collection of objects that can be individually accessed by index." Array completely satisfies this definition, so it makes sense for it to implement the interface.
The exception thrown when calling the Add() method is "System.NotSupportedException: Collection was of a fixed size", and it occurs because an array cannot increase its capacity dynamically. Its capacity is fixed when the array object is created.
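A small sketch of how an array behaves through the non-generic IList, using top-level statements and made-up sample values. It checks the two flags mentioned in these answers before calling Add.

```csharp
using System;
using System.Collections;

int[] array = { 1, 2, 3 };
IList list = array;

// Through the non-generic IList, an array is fixed-size but not read-only.
bool fixedSize = list.IsFixedSize;
bool readOnly = list.IsReadOnly;

// Modifying an existing element is allowed.
list[0] = 42;

// Adding is not: the array cannot grow.
bool addThrew = false;
try
{
    list.Add(4);
}
catch (NotSupportedException)
{
    addThrew = true;
}

Console.WriteLine($"IsFixedSize={fixedSize}, IsReadOnly={readOnly}, array[0]={array[0]}, Add threw={addThrew}");
```

Code that receives an arbitrary IList can test IsFixedSize first and skip the Add call instead of catching the exception.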
Having an array implement IList (and transitively, ICollection) simplified the LINQ to Objects engine, since casting the IEnumerable to IList/ICollection would also work for arrays.
For example, a Count() ends up calling Array.Length under the hood, since the sequence is cast to ICollection and the array's implementation returns Length.
Without this, the LINQ to Objects engine would either have no special treatment for arrays and perform horribly, or it would need to double the code by adding special-case treatment for arrays (like it does for IList). They must have opted to make array implement IList instead.
That's my take on "Why".
As a further implementation detail, LINQ's Last checks for IList. If arrays did not implement it, LINQ would need either two checks, slowing down all Last calls, or Last on an array would take O(N).
An Array is just one of many possible implementations of IList.
As code should be loosely coupled, depend on abstractions and what not... The concrete implementation of IList that uses consecutive memory (an array) to store its values is called Array. We do not "add" IList to the Array class; that's just the wrong order of reasoning. Array implements IList as an array.
The exception is exactly what the interface defines. It is not a surprise if you know the whole interface and not just a single method. The interface also gives you the opportunity to check the IsFixedSize property and see if it is safe to call the Add method.

Why create an IEnumerable?

I don't understand why I'd create an IEnumerable. Or why it's important.
I'm looking at the example for IEnumerable:
http://msdn.microsoft.com/en-us/library/system.collections.ienumerable.aspx
But I can basically do the same thing if I just went:
List<Person> people = new List<Person>();
so what's IEnumerable good for? Can you give me a situation where I'd need to create a class that implements IEnumerable?
IEnumerable is an interface; it exposes certain things to the outside. You are completely right that you could just use a List<T>, but List<T> is a concrete class that promises far more than iteration. What exactly does a List<T> do? It stores items, and it offers methods to Add and Remove them. Now, what if you only need the "item-keeping" feature of a List<T>? That's what an IEnumerable<T> is - an abstract way of saying "I want to get a sequence of items I can iterate over". A list is "I want to get a collection which I can modify, access by index and iterate". List<T> offers a lot more functionality than IEnumerable<T> does, but it also has to hold all of its items in memory. So if a method takes an IEnumerable<T>, it doesn't care what exactly it gets, as long as the object offers the capabilities of IEnumerable<T>.
Also, you don't have to create your own IEnumerable<T>, a List<T> IS an IEnumerable<T>!
Lists are, of course, IEnumerable. As a general rule, you want to be specific about what you output but broad about what you accept as input, e.g.:
You have a method which loops through a list of objects and writes something to the console...
You could declare the parameter as either IEnumerable<T> or IList<T> (or even List<T>). Since you don't need to add to the input list, all you actually need to do is enumerate, so use IEnumerable<T>. Then your method will also accept other types which implement IEnumerable<T>, including IQueryable<T>, linked lists, etc.
You're making your methods more generic for no cost.
Today, you generally wouldn't use the non-generic IEnumerable anymore unless you were supporting software on an older version of the framework. Today, you'd normally use IEnumerable<T>. Among other benefits, the LINQ operations are defined as extension methods on IEnumerable<T>, so you can easily query any collection type that implements IEnumerable<T> using LINQ.
Additionally, it doesn't tie the consumer of your code to a particular collection implementation.
It's rare nowadays that you need to create your own container classes; as you rightly say, many good implementations already exist.
However, if you do create your own container class for some specific reason, you may like to implement IEnumerable or IEnumerable<T>, because they are a standard "contract" for iteration, and by providing an implementation you can take advantage of methods/APIs that want an IEnumerable or IEnumerable<T>. LINQ, for example, will give you a bunch of useful extension methods for free.
An IList can be thought of as a particular kind of IEnumerable (one that can be added to and removed from easily). There are others, such as IDictionary, which performs an entirely different function but can still be enumerated over. Generally, I would use IEnumerable as a more generic type reference when I only need an enumeration to satisfy a requirement and don't particularly care what kind it is. I can pass it an IList, and more often than not I do just that, but the flexibility exists to pass it other enumerations as well.
Here is one situation where I think I have to implement IEnumerable rather than use List<>:
I want to get all items from a remote server. Let's say it is going to return one million items. If you use the List<> approach, you need to hold all one million items in memory first. In some cases, you don't really want to do that because you don't want to use up too much memory. Using IEnumerable allows you to display each item on the screen and then dispose of it right away. Therefore, with the IEnumerable approach, the memory footprint of the program is much smaller.
It's my understanding that IEnumerable is provided to you as an interface for creating your own enumerable class types.
I believe a simple example of this would be recreating the List type, if you wanted to have your own set of features (or lack thereof) for it.
What if you want to enumerate over a collection that is potentially of infinite size, such as the Fibonacci numbers? You couldn't do that easily with a list, but if you had a class that implemented IEnumerable or IEnumerable<T>, it becomes easy.
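A minimal sketch of that idea, using an iterator (yield return) written as a local function in top-level statements. The function name Fibonacci is illustrative. The sequence is conceptually endless; with long arithmetic the values overflow after roughly ninety terms, so treat this as a demonstration rather than a numerically complete implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// An endless, lazily evaluated sequence: nothing is computed until a caller asks for the next item.
IEnumerable<long> Fibonacci()
{
    long current = 0;
    long next = 1;
    while (true)
    {
        yield return current;
        // Advance the pair; long overflows eventually, which this sketch does not guard against.
        (current, next) = (next, current + next);
    }
}

// The consumer decides how much of the sequence to materialize.
List<long> firstTen = Fibonacci().Take(10).ToList();
Console.WriteLine(string.Join(", ", firstTen)); // 0, 1, 1, 2, 3, 5, 8, 13, 21, 34
```

A List<long> could never hold the whole sequence, whereas the iterator only produces the items that Take(10) requests.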
When a built-in container fits your needs you should definitely use that, and then IEnumerable comes for free. When for whatever reason you have to implement your own container, for example if it must be backed by a DB, then you should make sure to implement both IEnumerable and IEnumerable<T> for two reasons:
It makes foreach work, which is awesome
It enables almost all LINQ goodness. For example you will be able to filter your container down to objects that match a condition with an elegant one liner.
IEnumerable provides a means for your API users (including yourself) to use your collection with foreach. For example, I implemented IEnumerable in my binary tree class so I could just foreach over all of the items in the tree without having to Ctrl+C Ctrl+V all the logic required to traverse the tree in order.
Hope it helps :)
IEnumerable is useful if you have a collection or method which can return a bunch of things, but isn't a Dictionary, List, array, or other such predefined collection. It is especially useful in cases where the set of things to be returned might not be available when one starts outputting it. For example, an object that accesses records in a database might implement IEnumerable. While it might be possible for such an object to read all appropriate records into an array and return that, that may be impractical if there are a lot of records. Instead, the object could return an enumerator which reads the records in small groups and returns them individually.

IEnumerable<T> vs T[]

I just realized that maybe I have been mistaken all along in exposing T[] to my views instead of IEnumerable<T>.
Usually, for this kind of code:
foreach (var item in items) {}
should items be T[] or IEnumerable<T>?
Then, if I need to get the count of the items, would Array.Length be faster than IEnumerable<T>.Count()?
IEnumerable<T> is generally a better choice here, for the reasons listed elsewhere. However, I want to bring up one point about Count(). Quintin is incorrect when he says that the type itself implements Count(). It's actually implemented in Enumerable.Count() as an extension method, which means other types don't get to override it to provide more efficient implementations.
By default, Count() has to iterate over the whole sequence to count the items. However, it does know about ICollection<T> and ICollection, and is optimised for those cases. (In .NET 3.5 IIRC it's only optimised for ICollection<T>.) Now the array does implement that, so Enumerable.Count() defers to ICollection<T>.Count and avoids iterating over the whole sequence. It's still going to be slightly slower than calling Length directly, because Count() has to discover that it implements ICollection<T> to start with - but at least it's still O(1).
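The difference can be observed with a small sketch (top-level statements; the sample values and the Lazy function are made up). Counting an array through Enumerable.Count() does not need to walk the elements, whereas counting an iterator-generated sequence runs the iterator for every item, which the enumerated counter below makes visible.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int[] items = { 4, 8, 15, 16, 23, 42 };
IEnumerable<int> asSequence = items;

int viaLength = items.Length;      // direct property access on the array
int viaCount = asSequence.Count(); // Enumerable.Count(); the array is an ICollection<int>

// A generated sequence has no stored count, so Count() must run the iterator to the end.
int enumerated = 0;
IEnumerable<int> Lazy()
{
    foreach (int item in items)
    {
        enumerated++; // records how many items were actually produced
        yield return item;
    }
}

int lazyCount = Lazy().Count();

Console.WriteLine($"{viaLength} {viaCount} {lazyCount} (iterator produced {enumerated} items)");
```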
The same kind of thing is true for performance in general: the JITted code may well be somewhat tighter when iterating over an array rather than a general sequence. You'd basically be giving the JIT more information to play with, and even the C# compiler itself treats arrays differently for iteration (using the indexer directly).
However, these performance differences are going to be inconsequential for most applications - I'd definitely go with the more general interface until I had good reason not to.
It's partially inconsequential, but standard theory would dictate "Program against an interface, not an implementation". With the interface model you can change the actual datatype being passed without affecting the caller, as long as it conforms to the same interface.
The counterpoint is that you might have a reason for exposing an array specifically, in which case you would want to express that.
For your example I think IEnumerable<T> would be desirable. It's also worth noting that, for testing purposes, using an interface can reduce the headache you would incur if you had particular classes you would have to re-create all the time. Collections generally aren't as bad, but having an interface contract you can mock easily is very nice.
Added for edit:
This is more inconsequential because the underlying datatype is what will implement the Count() method, for an array it should access the known length, I would not worry about any perceived overhead of the method.
See Jon Skeet's answer for an explanation of the Count() implementation.
T[] (one-dimensional, zero-based) also implements ICollection<T> and IList<T>, along with IEnumerable<T>.
Therefore, if you want looser coupling in your application, IEnumerable<T> is preferable, unless you want indexed access inside the foreach.
Since the Array class implements the System.Collections.Generic.IList<T>, System.Collections.Generic.ICollection<T>, and System.Collections.Generic.IEnumerable<T> generic interfaces, I would use IEnumerable<T>, unless you need the members of the other interfaces.
http://msdn.microsoft.com/en-us/library/system.array.aspx
Your gut feeling is correct, if all the view cares about, or should care about, is having an enumerable, that's all it should demand in its interfaces.
What is it logically (conceptually) from the outside?
If it's an array, then return the array. If the only point is to enumerate, then return IEnumerable. Otherwise IList or ICollection may be the way to go.
If you want to offer lots of functionality but not allow it to be modified, then perhaps use a List<T> internally and return the ReadOnlyCollection<T> returned from its AsReadOnly() method.
Given that changing the code from an array to IEnumerable at a later date is easy, but changing it the other way is not, I would go with an IEnumerable until you know you need the small speed benefit of returning an array.
