Why is IList not deferred execution? - c#

As I understand it IEnumerable and IQueryable are deferred execution. Why wouldn't it be of benefit for IList to also support deferred execution?

The longer I think about it, the more I question whether the whole idea of "deferred execution" is actually of pedagogic value at all.
I'll answer your question by denying it. IEnumerable<T>, IQueryable<T> and IList<T> do not in any way represent "deferred" or "eager" calculations.
Rather, interfaces represent an ability to provide a service. IEnumerable<T> represents the service "I can provide a sequence, possibly infinite, of items of type T, one at a time". IQueryable<T> represents the service "I can represent a query against a data source, and provide the results of that query on demand". IList<T> represents the service "I can provide random access to a possibly mutable, finite-sized list of items of type T".
None of those services say anything about the implementation details of the service providers. The provider of an IList<T> service could be entirely lazy; the provider of an IQueryable<T> service could be entirely eager. If you want to make a deferred-execution IList<T>, you go right ahead. No one is stopping you!

IList<T> supports random access via the indexer - as well as the Count property. Both of these go against the spirit of deferred execution with streaming data, unless you're suggesting that you really just want something which loads the entire list as soon as you access anything. If that's what you're after, Lazy<T> might be the ticket...
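To make the "loads the entire list as soon as you access anything" idea concrete, here is a minimal sketch built on Lazy<T>. LazyList<T>, its loader parameter, and the IsLoaded property are hypothetical names invented for this illustration; nothing like this ships in the framework.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical type for illustration: an IList<T> that defers loading
// until the first member access, then behaves like an ordinary list.
public sealed class LazyList<T> : IList<T>
{
    private readonly Lazy<List<T>> _items;

    public LazyList(Func<IEnumerable<T>> loader)
    {
        // The loader is not invoked here; Lazy<T> runs it on first use of Value.
        _items = new Lazy<List<T>>(() => new List<T>(loader()));
    }

    // True once the underlying list has been materialized.
    public bool IsLoaded => _items.IsValueCreated;

    public T this[int index]
    {
        get => _items.Value[index];
        set => _items.Value[index] = value;
    }

    public int Count => _items.Value.Count;
    public bool IsReadOnly => false;

    public void Add(T item) => _items.Value.Add(item);
    public void Clear() => _items.Value.Clear();
    public bool Contains(T item) => _items.Value.Contains(item);
    public void CopyTo(T[] array, int arrayIndex) => _items.Value.CopyTo(array, arrayIndex);
    public int IndexOf(T item) => _items.Value.IndexOf(item);
    public void Insert(int index, T item) => _items.Value.Insert(index, item);
    public bool Remove(T item) => _items.Value.Remove(item);
    public void RemoveAt(int index) => _items.Value.RemoveAt(index);

    public IEnumerator<T> GetEnumerator() => _items.Value.GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

Note that the first call to Count or the indexer pays the full loading cost, which is exactly the tension with random access described above.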

As @Joe pointed out, IList<T> abstracts a noun. IEnumerable<T> and IQueryable<T> are abstractions of verbs.
IList<T> is an abstraction of a collection that can be treated as a list. IEnumerable<T> and IQueryable<T> abstract actions: the enumeration over, or querying of, the items contained in the underlying collection.

Think through the consequences of what you're saying.
What would be deferred, and until when?
In the case of IEnumerable and IQueryable what's deferred is the enumeration. And the interfaces don't expose anything else that depends on the enumeration having been done.
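A small sketch of what "the enumeration is deferred" means in practice, using LINQ to Objects. DeferredDemo is a made-up name for this illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    public static List<int> Run()
    {
        var source = new List<int> { 1, 2, 3 };

        // Building the query does not touch the source yet.
        IEnumerable<int> evens = source.Where(n => n % 2 == 0);

        // Changes made before enumeration are visible to the query.
        source.Add(4);

        // Enumeration (here via ToList) is the point where the work happens.
        return evens.ToList(); // contains 2 and 4
    }
}
```

The 4 appears in the result even though it was added after the query was written, because nothing was evaluated until ToList ran.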

There are a multitude of reasons; here are just some:
An IList should be independent of the underlying data source that was used to create it.
Accessing an IList member by index is expected to be O(1). With deferred execution this cannot be guaranteed: access might be very slow, at least the first time.
Accessing an IList member should not throw an exception because of how the IList is being constructed at the time of access. All sorts of problems can arise with deferred execution (e.g., the original data source no longer being accessible), which goes back to the first point.

Related

Why I should not always be using ICollection instead of IEnumerable?

I ended up in this post while searching for solutions to my problem - which led me to propose a new answer there - and - to be confronted with the following question:
Considering ICollection implements IEnumerable, and all linq extensions apply to both interfaces, is there any scenario where I would benefit from working with an IEnumerable instead of an ICollection ?
The non generic IEnumerable, for instance, does not provide a Count extension.
Both ICollection interfaces do.
Given all ICollection, in any case, provide all functionality IEnumerable implement - since it itself implements it - why then would I opt for IEnumerable in place of ICollection ?
Backward compatibility with previous frameworks where ICollection was not available ?
I think there are actually two questions to answer here.
When would I want IEnumerable<T>?
Unlike other collection types and the language in general, queries on IEnumerables are executed using lazy evaluation. That means you can potentially perform several related queries in only one enumeration.
It's worth noting that lazy evaluation doesn't interact nicely with side effects, because multiple enumeration could then give different results; you need to keep this in mind when using it. In a language like C#, lazy evaluation can be a very powerful tool but also a source of unexplained behaviour if you aren't careful.
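To illustrate the side-effect problem, here is a minimal sketch in which the projection mutates a captured counter. SideEffectDemo is a hypothetical name used only for this example.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SideEffectDemo
{
    public static (List<int> First, List<int> Second) Run()
    {
        int counter = 0;

        // The lambda mutates captured state each time an element is produced.
        IEnumerable<int> numbered = new[] { "a", "b", "c" }.Select(_ => ++counter);

        List<int> first = numbered.ToList();   // 1, 2, 3
        List<int> second = numbered.ToList();  // 4, 5, 6: the query ran again

        return (first, second);
    }
}
```

Enumerating the same query twice gives two different lists, which is the kind of unexplained behaviour mentioned above. Materializing once with ToList and reusing the result avoids it.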
When would I not want ICollection<T>?
ICollection<T> is a mutable interface, it exposes, amongst other things, add and remove methods. Unless you want external things to be mutating your object's contents, you don't want to be returning it. Likewise, you generally don't want to be passing it in as an argument for the same reason.
If you do want explicit mutability of the collection, by all means use ICollection<T>.
Additionally, unlike IEnumerable<T> or IReadOnlyCollection<T>, ICollection<T> is not covariant which reduces the flexibility of the type in certain use cases.
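A short sketch of the covariance difference. CovarianceDemo and CountObjects are made-up names for illustration.

```csharp
using System.Collections.Generic;

public static class CovarianceDemo
{
    public static int CountObjects(IEnumerable<object> items)
    {
        int count = 0;
        foreach (object item in items) count++;
        return count;
    }

    public static int Run()
    {
        var strings = new List<string> { "a", "b" };

        // Allowed: IEnumerable<out T> and IReadOnlyCollection<out T> are covariant.
        IEnumerable<object> asSequence = strings;
        IReadOnlyCollection<object> asReadOnly = strings;

        // Not allowed: ICollection<T> is invariant, so this line would not compile.
        // ICollection<object> asCollection = strings;

        return CountObjects(asSequence) + asReadOnly.Count; // 2 + 2
    }
}
```

The invariance is unavoidable for a mutable interface: if the commented-out assignment were legal, a caller could Add an arbitrary object to a list of strings.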
Non-generic versions
Things change a bit when it comes to the non-generic versions of these interfaces. In this case, the only real difference between the two is the lazy evaluation offered by IEnumerable and the eager evaluation of ICollection.
I personally would tend to avoid the non-generic versions due to the lack of type safety and poor performance from boxing/unboxing in the case of value types.
Summary
If you want lazy evaluation, use IEnumerable<T>. For eager evaluation and immutability use IReadOnlyCollection<T>. For explicit mutability use ICollection<T>.
IEnumerable provides a read-only interface to a collection and ICollection allows modification. Also IEnumerable needs just to know how to iterate over elements. ICollection has to provide more information.
This is semantically different. You don't always want to provide a functionality for modification of a collection.
There is an IReadOnlyCollection, but it doesn't implement ICollection. This is by design: the read-only version is a different, stripped-down interface.
The point made by Tim is quite important. The internal working for Count might be dramatically different. IEnumerable does not need to know how many elements it spans over. Collection has a Property, so it has to know how many elements it contains. That is another crucial difference.
The idea is to use the simplest contract (interface) which fulfills the requirements: ICollection is a collection, IEnumerable is a sequence. A sequence could have deferred execution, it could be infinite, etc. The interface IEnumerable just tells you that you can enumerate the sequence, that is all. This is different from ICollection, which represents an actual collection containing a finite number of items.
As you can see, these are quite different. You cannot ignore the semantics of these contracts, and just focus on which interface inherits which other one.
If your algorithm only involves enumeration of the input data, then it should take an IEnumerable. If you are, by contract, dealing with collections (i.e., you expect collections and nothing else), then you should use ICollection.

Why Are AsObservable and AsEnumerable Implemented Differently?

The implementation of Enumerable.AsEnumerable<T>(this IEnumerable<T> source) simply returns source. However Observable.AsObservable<T>(this IObservable<T> source) returns an AnonymousObservable<T> subscribing to the source rather than simply returning the source.
I understand these methods are really useful for changing the monad within a single query (going from IQueryable => IEnumerable). So why do the implementations differ?
The Observable version is more defensive, in that you can't cast it to some known type (if it were originally implemented as a Subject<T>, you'd never be able to cast it as such). So why does the Enumerable version not do something similar? If my underlying type is a List<T> but I expose it as IEnumerable<T> through AsEnumerable, it will be possible to cast back to a List<T>.
Please note that this isn't a question on how to expose IEnumerable<T> without being able to cast to the underlying, but why the implementations between Enumerable and Observable are semantically different.
Your question is answered by the documentation, which I encourage you to read when you have such questions.
The purpose of AsEnumerable is to hint to the compiler "please stop using IQueryable and start treating this as an in-memory collection".
As the documentation states:
The AsEnumerable<TSource>(IEnumerable<TSource>) method has no effect other than to change the compile-time type of source from a type that implements IEnumerable<T> to IEnumerable<T> itself. AsEnumerable<TSource>(IEnumerable<TSource>) can be used to choose between query implementations when a sequence implements IEnumerable<T> but also has a different set of public query methods available.
If you want to hide the implementation of an underlying sequence, use sequence.Select(x=>x) or ToList or ToArray if you don't care that you're making a mutable sequence.
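The difference between the two is easy to observe. A minimal sketch, with HidingDemo as a hypothetical name:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class HidingDemo
{
    public static (bool SameObject, bool CastableAfterAsEnumerable, bool CastableAfterSelect) Run()
    {
        var list = new List<int> { 1, 2, 3 };

        IEnumerable<int> viaAsEnumerable = list.AsEnumerable();
        IEnumerable<int> viaSelect = list.Select(x => x);

        return (
            object.ReferenceEquals(list, viaAsEnumerable), // true: same object
            viaAsEnumerable is List<int>,                  // true: caller can cast back
            viaSelect is List<int>);                       // false: wrapped in an iterator
    }
}
```

AsEnumerable only changes the compile-time type, so the caller can still cast back to List<int>; the Select wrapper is a different object that exposes nothing but enumeration.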
The purpose of AsObservable is to hide the implementation of the underlying collection. As the documentation says:
Observable.AsObservable<TSource> ... Hides the identity of an observable sequence.
Since the two methods have completely different purposes, they have completely different implementations.
You're right about the relationship between AsEnumerable and AsObservable wrt the aspect of switching from expression tree based queries to in-memory queries.
At the same time, exposing an Rx sequence based on a Subject<T> is very common, and we needed a way to hide it (otherwise the user could cast to IObserver<T> and inject elements).
A long while ago in the history of Rx pre-releases, we did have a separate Hide method, which was merely a Select(x => x) alias. We never quite liked it and decided to have a place where we deviated from precisely mirroring LINQ to Objects, and made AsObservable play the role of Hide, also based on users who believed this was what it did to begin with.
Notice though, we do have an extension method called AsObservable on IQbservable<T> as well. That one does simply what AsEnumerable does too: it acts as the hint to the compiler to forget about the expression tree based querying mode and switch to in-memory queries.

IEnumerable & Good Practices (& WCF)

Is it a good practice to use IEnumerable application-wide whenever you don't need to actually add or remove things but only enumerate them?
Side question: Did you ever have any problems returning IEnumerable<T> from a WCF service? Can that cause problems to client applications? After all, I think that will be serialized to an array.
I tend to only return IEnumerable<T> when I want to hint to the caller that the implementation may use lazy evaluation. Otherwise, I'd usually return IList<T> or ICollection<T>, and implement as a ReadOnlyCollection<T> if the result should be readonly.
Lazy evaluation can be an important consideration: if your implementation can throw an exception, this won't be thrown until the caller starts enumerating the result. By returning IList<T> or ICollection<T>, you're guaranteeing that any exception will be thrown at the point the method is called.
In the case of a WCF method, returning IEnumerable<T> from a method that uses lazy evaluation means any exception might not be thrown until your response is being serialized - giving you less opportunity to handle it server-side.
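The timing difference can be sketched with a pair of parsing methods. ExceptionTimingDemo, LazyParse and EagerParse are hypothetical names for this illustration; no WCF is involved here, only the point at which the exception surfaces.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExceptionTimingDemo
{
    // Iterator method: the body does not run until enumeration starts.
    public static IEnumerable<int> LazyParse(IEnumerable<string> inputs)
    {
        foreach (string s in inputs)
            yield return int.Parse(s);
    }

    // Eager method: all parsing happens before the method returns.
    public static IList<int> EagerParse(IEnumerable<string> inputs)
    {
        return inputs.Select(s => int.Parse(s)).ToList();
    }

    public static string Run()
    {
        var inputs = new[] { "1", "oops" };
        string log = "";

        try { EagerParse(inputs); }
        catch (FormatException) { log += "eager threw at call; "; }

        IEnumerable<int> lazy = LazyParse(inputs); // no exception here
        try { lazy.ToList(); }
        catch (FormatException) { log += "lazy threw at enumeration"; }

        return log;
    }
}
```

With the lazy version, a try/catch around the method call itself would never see the FormatException; only whoever enumerates the result does.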
I don't have any Good Practices sources, but I often tend to rely on List for my collections. It implements IEnumerable, but I do pass it around as a List and not an IEnumerable; if I need it to be read-only, I'd rather pass a ReadOnlyCollection.
I don't like to return or accept IList<T> or List<T> because they imply the ability to modify a collection.
So I prefer to return T[] as a fixed-size collection. Also, an array can be easily mapped to any other framework, platform, etc.
And I prefer to accept IEnumerable<T> to emphasize that a method will enumerate that collection.

Why create an IEnumerable?

I don't understand why I'd create an IEnumerable. Or why it's important.
I'm looking at the example for IEnumerable:
http://msdn.microsoft.com/en-us/library/system.collections.ienumerable.aspx
But I can basically do the same thing if I just went:
List<Person> people = new List<Person>();
so what's IEnumerable good for? Can you give me a situation where I'd need to create a class that implements IEnumerable?
IEnumerable is an interface; it exposes certain things to the outside. You are completely right that you could just use a List<T>, but List<T> is very deep in the inheritance tree. What exactly does a List<T> do? It stores items, and it offers certain methods to Add and Remove. Now, what if you only need the "item-keeping" feature of a List<T>? That's what an IEnumerable<T> is - an abstract way of saying "I want to get a list of items I can iterate over". A list is "I want to get a collection which I can modify, can access by index and iterate". List<T> offers a lot more functionality than IEnumerable<T> does, but it takes up more memory. So if a method is taking an IEnumerable<T>, it doesn't care what exactly it gets, as long as the object offers the possibilities of IEnumerable<T>.
Also, you don't have to create your own IEnumerable<T>, a List<T> IS an IEnumerable<T>!
Lists are, of course IEnumerable - As a general rule, you want to be specific on what you output but broad on what you accept as input eg:
You have a sub which loops through a list of objects and writes something to the console...
You could declare the parameter as either IEnumerable<T> or IList<T> (or even List<T>). Since you don't need to add to the input list, all you actually need to do is enumerate - so use IEnumerable - then your method will also accept other types which implement IEnumerable, including IQueryable, linked lists, etc...
You're making your methods more generic for no cost.
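A minimal sketch of the "broad on input" rule. Printer and Describe are hypothetical names, and the method returns a string rather than writing to the console so the result is easy to inspect.

```csharp
using System.Collections.Generic;
using System.Text;

public static class Printer
{
    // Only enumeration is needed, so IEnumerable<T> is the narrowest sufficient contract.
    public static string Describe<T>(IEnumerable<T> items)
    {
        var sb = new StringBuilder();
        foreach (T item in items)
            sb.Append(item).Append(';');
        return sb.ToString();
    }
}
```

The same method then accepts a List<int>, an int[], a LinkedList<string>, or the result of a LINQ query without any overloads or conversions at the call site.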
Today, you generally wouldn't use the non-generic IEnumerable anymore unless you were supporting software on an older version of the framework. Today, you'd normally use IEnumerable<T>. Amongst other benefits, IEnumerable<T> is what the LINQ operations/extensions are written against, so you can easily query any collection type that implements IEnumerable<T> using LINQ.
Additionally, it doesn't tie the consumer of your code to a particular collection implementation.
It's rare nowadays that you need to create your own container classes; you are right that many good implementations already exist.
However, if you do create your own container class for some specific reason, you may like to implement IEnumerable or IEnumerable<T>, because they are a standard "contract" for iteration. By providing an implementation you can take advantage of methods/APIs that want an IEnumerable or IEnumerable<T>; LINQ, for example, will give you a bunch of useful extension methods for free.
An IList can be thought of as a particular implementation of IEnumerable. (One that can be added to and removed from easily.) There are others, such as IDictionary, which performs an entirely different function but can still be enumerated over. Generally, I would use IEnumerable as a more generic type reference when I only need an enumeration to satisfy a requirement and don't particularly care what kind it is. I can pass it an IList and more often than not I do just that, but the flexibility exists to pass it other enumerations as well.
Here is one situation that I think I have to implement IEnumerable but not using List<>
I want to get all items from a remote server. Let say I have one million items going to return. If you use List<> approach, you need to cache all one million items in the memory first. In some cases, you don't really want to do that because you don't want to use up too much memory. Using IEnumerable allows you to display the data on the screen and then dispose it right away. Therefore, using IEnumerable approach, the memory footprint of the program is much smaller.
It's my understanding that IEnumerable is provided to you as an interface for creating your own enumerable class types.
I believe a simple example of this would be recreating the List type, if you wanted to have your own set of features (or lack thereof) for it.
What if you want to enumerate over a collection that is potentially of infinite size, such as the Fibonacci numbers? You couldn't do that easily with a list, but if you had a class that implemented IEnumerable or IEnumerable<T>, it becomes easy.
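For example, a sketch of the Fibonacci case using an iterator method. Sequences is a made-up class name, and BigInteger is my choice here so the values never overflow.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Numerics;

public static class Sequences
{
    // Unbounded sequence: each value is computed only when the caller asks for it.
    public static IEnumerable<BigInteger> Fibonacci()
    {
        BigInteger a = 0, b = 1;
        while (true)
        {
            yield return a;
            (a, b) = (b, a + b);
        }
    }
}
```

The caller decides how much of the sequence to consume, e.g. Sequences.Fibonacci().Take(10) yields 0, 1, 1, 2, 3, 5, 8, 13, 21, 34. Calling ToList() on the unbounded sequence itself would never finish.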
When a built-in container fits your needs you should definitely use that, and then IEnumerable comes for free. When for whatever reason you have to implement your own container, for example if it must be backed by a DB, then you should make sure to implement both IEnumerable and IEnumerable<T> for two reasons:
It makes foreach work, which is awesome
It enables almost all LINQ goodness. For example you will be able to filter your container down to objects that match a condition with an elegant one liner.
IEnumerable provides a means for your API users (including yourself) to use your collection with a foreach. For example, I implemented IEnumerable in my Binary Tree class so I could just foreach over all of the items in the tree without having to Ctrl+C Ctrl+V all the logic required to traverse the tree in order.
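Something along these lines, as a rough sketch. This BinaryTree<T> is a hypothetical, unbalanced search tree written for illustration, not the class from the answer above.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical minimal binary search tree, for illustration only.
public sealed class BinaryTree<T> : IEnumerable<T> where T : IComparable<T>
{
    private sealed class Node
    {
        public T Value;
        public Node Left, Right;
        public Node(T value) { Value = value; }
    }

    private Node _root;

    public void Add(T value)
    {
        if (_root == null) { _root = new Node(value); return; }
        Node current = _root;
        while (true)
        {
            if (value.CompareTo(current.Value) < 0)
            {
                if (current.Left == null) { current.Left = new Node(value); return; }
                current = current.Left;
            }
            else
            {
                if (current.Right == null) { current.Right = new Node(value); return; }
                current = current.Right;
            }
        }
    }

    // In-order traversal written once; every foreach and LINQ call reuses it.
    public IEnumerator<T> GetEnumerator()
    {
        var stack = new Stack<Node>();
        Node current = _root;
        while (current != null || stack.Count > 0)
        {
            while (current != null) { stack.Push(current); current = current.Left; }
            current = stack.Pop();
            yield return current.Value;
            current = current.Right;
        }
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

With that in place, foreach (var item in tree) visits the items in sorted order, and LINQ operators such as Where or First work on the tree directly.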
Hope it helps :)
IEnumerable is useful if you have a collection or method which can return a bunch of things, but isn't a Dictionary, List, array, or other such predefined collection. It is especially useful in cases where the set of things to be returned might not be available when one starts outputting it. For example, an object to access records in a database might implement IEnumerable. While it might be possible for such an object to read all appropriate records into an array and return that, that may be impractical if there are a lot of records. Instead, the object could return an enumerator which could read the records in small groups and return them individually.

IEnumerable<T> vs T[]

I just realize that maybe I was mistaken all the time in exposing T[] to my views, instead of IEnumerable<T>.
Usually, for this kind of code:
foreach (var item in items) {}
should items be T[] or IEnumerable<T>?
Then, if I need to get the count of the items, would the array's Length be faster than IEnumerable<T>.Count()?
IEnumerable<T> is generally a better choice here, for the reasons listed elsewhere. However, I want to bring up one point about Count(). Quintin is incorrect when he says that the type itself implements Count(). It's actually implemented in Enumerable.Count() as an extension method, which means other types don't get to override it to provide more efficient implementations.
By default, Count() has to iterate over the whole sequence to count the items. However, it does know about ICollection<T> and ICollection, and is optimised for those cases. (In .NET 3.5 IIRC it's only optimised for ICollection<T>.) Now the array does implement that, so Enumerable.Count() defers to ICollection<T>.Count and avoids iterating over the whole sequence. It's still going to be slightly slower than calling Length directly, because Count() has to discover that it implements ICollection<T> to start with - but at least it's still O(1).
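One way to observe the ICollection<T> shortcut is with a collection that knows its size but throws if anyone tries to enumerate it. CountOnly and CountDemo are hypothetical types written only for this experiment.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical collection that knows its size but refuses to be enumerated,
// used only to observe which path Enumerable.Count() takes.
public sealed class CountOnly : ICollection<int>
{
    public int Count => 42;
    public bool IsReadOnly => true;
    public void Add(int item) => throw new NotSupportedException();
    public void Clear() => throw new NotSupportedException();
    public bool Contains(int item) => false;
    public void CopyTo(int[] array, int arrayIndex) => throw new NotSupportedException();
    public bool Remove(int item) => throw new NotSupportedException();
    public IEnumerator<int> GetEnumerator() => throw new InvalidOperationException("enumerated");
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public static class CountDemo
{
    // Goes through the extension method, which uses ICollection<int>.Count
    // instead of calling GetEnumerator.
    public static int CountViaLinq() => ((IEnumerable<int>)new CountOnly()).Count();

    // For a plain iterator there is no shortcut: every element is visited.
    public static int ElementsVisitedForIterator()
    {
        int visited = 0;
        IEnumerable<int> Generate()
        {
            for (int i = 0; i < 5; i++) { visited++; yield return i; }
        }
        Generate().Count();
        return visited;
    }
}
```

CountViaLinq returns 42 without the enumerator ever being requested, while the iterator case has to walk all five elements to produce its count.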
The same kind of thing is true for performance in general: the JITted code may well be somewhat tighter when iterating over an array rather than a general sequence. You'd basically be giving the JIT more information to play with, and even the C# compiler itself treats arrays differently for iteration (using the indexer directly).
However, these performance differences are going to be inconsequential for most applications - I'd definitely go with the more general interface until I had good reason not to.
It's partially inconsequential, but standard theory would dictate "Program against an interface, not an implementation". With the interface model you can change the actual datatype being passed without affecting the caller, as long as it conforms to the same interface.
The contrast to that is that you might have a reason for exposing an array specifically and in which case would want to express that.
For your example I think IEnumerable<T> would be desirable. It's also worthy to note that for testing purposes using an interface could reduce the amount of headache you would incur if you had particular classes you would have to re-create all the time, collections aren't as bad generally, but having an interface contract you can mock easily is very nice.
Added for edit:
This is more inconsequential because the underlying datatype is what will implement the Count() method, for an array it should access the known length, I would not worry about any perceived overhead of the method.
See Jon Skeet's answer for an explanation of the Count() implementation.
T[] (one-dimensional, zero-based) also implements ICollection<T> and IList<T> along with IEnumerable<T>.
Therefore if you want lesser coupling in your application IEnumerable<T> is preferable. Unless you want indexed access inside foreach.
Since Array class implements the System.Collections.Generic.IList<T>, System.Collections.Generic.ICollection<T>, and System.Collections.Generic.IEnumerable<T> generic interfaces, I would use IEnumerable, unless you need to use these interfaces.
http://msdn.microsoft.com/en-us/library/system.array.aspx
Your gut feeling is correct, if all the view cares about, or should care about, is having an enumerable, that's all it should demand in its interfaces.
What is it logically (conceptually) from the outside?
If it's an array, then return the array. If the only point is to enumerate, then return IEnumerable. Otherwise IList or ICollection may be the way to go.
If you want to offer lots of functionality but not allow it to be modified, then perhaps use a List internally and return the ReadOnlyCollection<T> returned from its .AsReadOnly() method.
Given that changing the code from an array to IEnumerable at a later date is easy, but changing it the other way is not, I would go with an IEnumerable until you know you need the small speed benefit of returning an array.
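A minimal sketch of that pattern. Roster and its members are hypothetical names for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

public sealed class Roster
{
    private readonly List<string> _names = new List<string> { "Ann", "Bob" };

    // Callers get indexing and Count, but no way to mutate through this wrapper.
    public ReadOnlyCollection<string> Names => _names.AsReadOnly();

    // Mutation stays under the owner's control.
    public void Add(string name) => _names.Add(name);

    public static bool MutationIsRejected()
    {
        var roster = new Roster();
        IList<string> view = roster.Names;
        try { view.Add("Eve"); return false; }
        catch (NotSupportedException) { return true; }
    }
}
```

The wrapper is a live view rather than a copy, so items added later through Roster.Add show up in a previously returned Names collection.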
