Why Are AsObservable and AsEnumerable Implemented Differently? - c#

The implementation of Enumerable.AsEnumerable<T>(this IEnumerable<T> source) simply returns source. However, Observable.AsObservable<T>(this IObservable<T> source) returns an AnonymousObservable<T> that subscribes to the source, rather than simply returning the source.
I understand these methods are really useful for changing the monad within a single query (going from IQueryable => IEnumerable). So why do the implementations differ?
The Observable version is more defensive, in that you can't cast the result to some known type (if it were originally implemented as a Subject<T>, you'd never be able to cast it back to one). So why does the Enumerable version not do something similar? If my underlying type is a List<T> but I expose it as IEnumerable<T> through AsEnumerable, it will be possible to cast it back to a List<T>.
Please note that this isn't a question on how to expose IEnumerable<T> without being able to cast to the underlying, but why the implementations between Enumerable and Observable are semantically different.

Your question is answered by the documentation, which I encourage you to read when you have such questions.
The purpose of AsEnumerable is to hint to the compiler "please stop using IQueryable and start treating this as an in-memory collection".
As the documentation states:
The AsEnumerable<TSource>(IEnumerable<TSource>) method has no effect other than to change the compile-time type of source from a type that implements IEnumerable<T> to IEnumerable<T> itself. AsEnumerable<TSource>(IEnumerable<TSource>) can be used to choose between query implementations when a sequence implements IEnumerable<T> but also has a different set of public query methods available.
If you want to hide the implementation of an underlying sequence, use sequence.Select(x=>x) or ToList or ToArray if you don't care that you're making a mutable sequence.
The purpose of AsObservable is to hide the implementation of the underlying collection. As the documentation says:
Observable.AsObservable<TSource> ... Hides the identity of an observable sequence.
Since the two methods have completely different purposes, they have completely different implementations.
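The difference between "changing the compile-time type" and "hiding the implementation" can be checked directly. The following is a minimal sketch using only LINQ to Objects; the class and method names (IdentityDemo, CanCastBack, ViaAsEnumerable, ViaSelect) are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class IdentityDemo
{
    // True when the exposed sequence is still the original List<int> object.
    public static bool CanCastBack(IEnumerable<int> exposed) => exposed is List<int>;

    // AsEnumerable only changes the static type; the same reference comes back.
    public static IEnumerable<int> ViaAsEnumerable(List<int> list) => list.AsEnumerable();

    // Select(x => x) wraps the list in an iterator object, so the list itself is hidden.
    public static IEnumerable<int> ViaSelect(List<int> list) => list.Select(x => x);
}
```

With a List<int> named list, CanCastBack(ViaAsEnumerable(list)) is true and CanCastBack(ViaSelect(list)) is false, while both sequences still yield the same elements.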

You're right about the relationship between AsEnumerable and AsObservable wrt the aspect of switching from expression tree based queries to in-memory queries.
At the same time, exposing an Rx sequence based on a Subject<T> is very common, and we needed a way to hide it (otherwise the user could cast to IObserver<T> and inject elements).
A long while ago in the history of Rx pre-releases, we did have a separate Hide method, which was merely a Select(x => x) alias. We never quite liked it, so we decided this would be a place where we deviated from precisely mirroring LINQ to Objects, and made AsObservable play the role of Hide. This was also based on feedback from users who believed this was what it did to begin with.
Notice though, we do have an extension method called AsObservable on IQbservable<T> as well. That one does simply what AsEnumerable does too: it acts as the hint to the compiler to forget about the expression tree based querying mode and switch to in-memory queries.
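To make the "cast and inject elements" concern concrete without depending on the Rx library, here is a rough sketch built only on the BCL interfaces IObservable<T> and IObserver<T>. MiniSubject<T>, HiddenObservable<T>, ListObserver<T> and ActionDisposable are hypothetical stand-ins written for this example; they are not Rx's Subject<T> or its AsObservable implementation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a subject: it is both a source and a sink.
public sealed class MiniSubject<T> : IObservable<T>, IObserver<T>
{
    private readonly List<IObserver<T>> _observers = new List<IObserver<T>>();

    public IDisposable Subscribe(IObserver<T> observer)
    {
        _observers.Add(observer);
        return new ActionDisposable(() => _observers.Remove(observer));
    }

    // Copy the list before iterating so observers may unsubscribe during a callback.
    public void OnNext(T value) { foreach (var o in _observers.ToArray()) o.OnNext(value); }
    public void OnError(Exception error) { foreach (var o in _observers.ToArray()) o.OnError(error); }
    public void OnCompleted() { foreach (var o in _observers.ToArray()) o.OnCompleted(); }
}

// Hypothetical wrapper: forwards subscriptions but exposes nothing except IObservable<T>.
public sealed class HiddenObservable<T> : IObservable<T>
{
    private readonly IObservable<T> _source;
    public HiddenObservable(IObservable<T> source) { _source = source; }
    public IDisposable Subscribe(IObserver<T> observer) => _source.Subscribe(observer);
}

// Records every value it receives.
public sealed class ListObserver<T> : IObserver<T>
{
    public List<T> Items { get; } = new List<T>();
    public void OnNext(T value) => Items.Add(value);
    public void OnError(Exception error) { }
    public void OnCompleted() { }
}

public sealed class ActionDisposable : IDisposable
{
    private readonly Action _onDispose;
    public ActionDisposable(Action onDispose) { _onDispose = onDispose; }
    public void Dispose() => _onDispose();
}
```

If a class returns its MiniSubject<int> typed as IObservable<int>, a consumer can test it with "is IObserver<int>" and call OnNext on it. The HiddenObservable<int> wrapper still delivers the values, but that cast no longer succeeds.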

Related

Using IReadOnlyCollection<T> instead of IEnumerable<T> for parameters to avoid possible multiple enumeration

My question is related to this one concerning the use of IEnumerable<T> vs IReadOnlyCollection<T>.
I too have always used IEnumerable<T> to expose collections as both return types and parameters because it benefits from being both immutable and lazily executed.
However, I am becoming increasingly concerned about the proliferation of places in my code where I must enumerate a parameter to avoid the possible multiple enumeration warning that ReSharper gives. I understand why ReSharper suggests this, and I agree with the code it suggests (below) in order to ensure encapsulation (i.e., no assumptions about the caller).
Foo[] arr = col as Foo[] ?? col.ToArray();
However, I find the repetitiveness of this code pollutive, and I agree with some sources that IReadOnlyCollection<T> is a better alternative, particularly the points made in this article, which states:
Lately, I’ve been considering the merits and demerits of returning
IEnumerable<T>.
On the plus side, it is about as minimal as an interface gets, so it
leaves you as method author more flexibility than committing to a
heavier alternative like IList<T> or (heaven forbid) an array.
However, as I outlined in the last post, an IEnumerable<T> return
entices callers to violate the Liskov Substitution Principle. It’s too
easy for them to use LINQ extension methods like Last() and Count(),
whose semantics IEnumerable<T> does not promise.
What’s needed is a better way to lock down a returned collection
without making such temptations so prominent. (I am reminded of Barney
Fife learning this lesson the hard way.)
Enter IReadOnlyCollection<T>, new in .NET 4.5. It adds just one
property to IEnumerable<T>: the Count property. By promising a count,
you assure your callers that your IEnumerable<T> really does have a
terminus. They can then use LINQ extension methods like Last() with a
clear conscience.
However, as the observant may have noticed, this article only talks about using IReadOnlyCollection<T> for return types. My question is, would the same arguments equally apply to using it for parameters also? Any theoretical thoughts or comments on this would also be appreciated.
In fact, I'm thinking a general rule of thumb to use IReadOnlyCollection<T> would be where there would be possible multiple enumeration (vis-à-vis the ReSharper warning) if IEnumerable<T> is used. Otherwise, use IEnumerable<T>.
Having thought about this further, I have come to the conclusion, based on the article I mentioned in my Question, that it is indeed OK to use IReadOnlyCollection<T> as a parameter, but only in functions where it will definitely be enumerated. If enumeration is conditional based on other parameters, object state, or workflow, then it should still be passed in as IEnumerable<T> so that lazy evaluation is semantically ensured.
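A small sketch of that rule of thumb follows. The class and method names are invented for the example, and the choice of int elements is arbitrary.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ParameterChoiceDemo
{
    // Always enumerates and needs the count, so the parameter type says so.
    public static double Average(IReadOnlyCollection<int> values)
    {
        if (values.Count == 0) return 0;
        long sum = 0;
        foreach (int v in values) sum += v;
        return (double)sum / values.Count;
    }

    // Enumeration is conditional, so the parameter stays IEnumerable<int>:
    // when useFallback is true the sequence is never touched.
    public static int FirstOrFallback(IEnumerable<int> values, bool useFallback, int fallback)
        => useFallback ? fallback : values.First();
}
```

Arrays and List<T> both implement IReadOnlyCollection<T>, so callers holding either can call Average directly. A caller holding only a lazy query has to materialize it first, which makes the cost visible at the call site.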

Why I should not always be using ICollection instead of IEnumerable?

I ended up in this post while searching for solutions to my problem - which led me to propose a new answer there - and - to be confronted with the following question:
Considering ICollection implements IEnumerable, and all LINQ extensions apply to both interfaces, is there any scenario where I would benefit from working with an IEnumerable instead of an ICollection?
The non generic IEnumerable, for instance, does not provide a Count extension.
Both ICollection interfaces do.
Given that every ICollection provides all the functionality of IEnumerable (since it extends it), why would I opt for IEnumerable in place of ICollection?
Backward compatibility with previous frameworks where ICollection was not available ?
I think there are actually two questions to answer here.
When would I want IEnumerable<T>?
Unlike other collection types and the language in general, queries on IEnumerables are executed using lazy evaluation. That means you can potentially perform several related queries in only one enumeration.
It's worth noting that lazy evaluation doesn't interact nicely with side effects, because multiple enumeration could then give different results. You need to keep this in mind when using it. In a language like C#, lazy evaluation can be a very powerful tool but also a source of unexplained behaviour if you aren't careful.
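As an illustration of how a side effect interacts with repeated enumeration, here is a small sketch. LazyDemo and CountSelectorCalls are hypothetical names chosen for the example.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LazyDemo
{
    // Builds a deferred query whose selector has a side effect, enumerates it the
    // requested number of times, and returns how often the selector actually ran.
    public static int CountSelectorCalls(int enumerations)
    {
        int calls = 0;
        IEnumerable<int> doubled = new[] { 1, 2, 3 }.Select(x => { calls++; return x * 2; });

        for (int i = 0; i < enumerations; i++)
        {
            doubled.Sum(); // every pass re-runs the selector for each element
        }
        return calls;
    }
}
```

Defining the query alone runs nothing (zero calls for zero enumerations), and each further enumeration runs the selector three more times.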
When would I not want ICollection<T>?
ICollection<T> is a mutable interface: it exposes, amongst other things, Add and Remove methods. Unless you want external code to be mutating your object's contents, you don't want to be returning it. Likewise, you generally don't want to be passing it in as an argument for the same reason.
If you do want explicit mutability of the collection, by all means use ICollection<T>.
Additionally, unlike IEnumerable<T> or IReadOnlyCollection<T>, ICollection<T> is not covariant which reduces the flexibility of the type in certain use cases.
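The covariance point can be sketched as follows; VarianceDemo and its members are names invented for this example.

```csharp
using System.Collections.Generic;

public static class VarianceDemo
{
    // IReadOnlyCollection<out T> is covariant, so a List<string> can be passed here.
    public static int CountAll(IReadOnlyCollection<object> items) => items.Count;

    // ICollection<T> is invariant: a List<string> is not an ICollection<object>,
    // so this returns false for it (and passing one statically would not compile).
    public static bool IsObjectCollection(object candidate) => candidate is ICollection<object>;
}
```

Calling CountAll(new List<string> { "a", "b" }) compiles and returns 2, whereas a method declared with an ICollection<object> parameter would reject the same argument at compile time.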
Non-generic versions
Things change a bit when it comes to the non-generic versions of these interfaces. In this case, the only real difference between the two is the lazy evaluation offered by IEnumerable and the eager evaluation of ICollection.
I personally would tend to avoid the non-generic versions due to the lack of type safety and poor performance from boxing/unboxing in the case of value types.
Summary
If you want lazy evaluation, use IEnumerable<T>. For eager evaluation and immutability use IReadOnlyCollection<T>. For explicit mutability use ICollection<T>.
IEnumerable provides a read-only interface to a collection and ICollection allows modification. Also IEnumerable needs just to know how to iterate over elements. ICollection has to provide more information.
This is semantically different. You don't always want to provide a functionality for modification of a collection.
There is an IReadOnlyCollection<T>, but it is not related to ICollection<T> by inheritance. This is a design decision in .NET: the read-only interface is a separate, stripped-down interface.
The point made by Tim is quite important. The internal working for Count might be dramatically different. IEnumerable does not need to know how many elements it spans over. Collection has a Property, so it has to know how many elements it contains. That is another crucial difference.
The idea is to use the simplest contract (interface) which fulfills the requirements: ICollection is a collection, IEnumerable is a sequence. A sequence could have deferred execution, it could be infinite, etc. The interface IEnumerable just tells you that you can enumerate the sequence, that is all. This is different from ICollection, which represents an actual collection containing a finite number of items.
As you can see, these are quite different. You cannot ignore the semantics of these contracts, and just focus on which interface inherits which other one.
If your algorithm only involves enumeration of the input data, then it should take an IEnumerable. If you are, by contract, dealing with collections (i.e, you expect collections and nothing else), then you should use ICollection.
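To show why "sequence" and "collection" are different contracts, here is a sketch of a sequence that no ICollection could represent. SequenceDemo, Naturals and FirstThree are illustrative names.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SequenceDemo
{
    // An endless, deferred sequence. This is a valid IEnumerable<int>, but it
    // could never be an ICollection<int>, which has to report a Count.
    public static IEnumerable<int> Naturals()
    {
        int n = 0;
        while (true)
        {
            yield return n++;
        }
    }

    // Consumers that only enumerate can still work with it by taking a prefix.
    public static int[] FirstThree() => Naturals().Take(3).ToArray();
}
```

FirstThree() yields 0, 1, 2. Calling Count() or Last() on Naturals() would never return, which is exactly the kind of assumption IEnumerable<T> does not license.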

IEnumerable to IReadOnlyCollection

I have IEnumerable<Object> and need to pass to a method as a parameter but this method takes IReadOnlyCollection<Object>
Is it possible to convert IEnumerable<Object> to IReadOnlyCollection<Object> ?
One way would be to construct a list, and call AsReadOnly() on it:
IReadOnlyCollection<Object> rdOnly = orig.ToList().AsReadOnly();
This produces ReadOnlyCollection<object>, which implements IReadOnlyCollection<Object>.
Note: Since List<T> implements IReadOnlyCollection<T> as well, the call to AsReadOnly() is optional. Although it is possible to call your method with the result of ToList(), I prefer using AsReadOnly(), so that the readers of my code would see that the method that I am calling has no intention to modify my list. Of course they could find out the same thing by looking at the signature of the method that I am calling, but it is nice to be explicit about it.
Since the other answers seem to steer in the direction of wrapping the collections in a truly read-only type, let me add this.
I have rarely, if ever, seen a situation where the caller is so scared that an IEnumerable<T>-taking method might maliciously try to cast that IEnumerable<T> back to a List or other mutable type, and start mutating it. Cue organ music and evil laughter!
No. If the code you are working with is even remotely reasonable, then if it asks for a type that only has read functionality (IEnumerable<T>, IReadOnlyCollection<T>...), it will only read.
Use ToList() and be done with it.
As a side note, if you are creating the method in question, it is generally best to ask for no more than an IEnumerable<T>, indicating that you "just want a bunch of items to read". Whether or not you need its Count or need to enumerate it multiple times is an implementation detail, and is certainly prone to change. If you need multiple enumeration, simply do this:
items = items as IReadOnlyCollection<T> ?? items.ToList(); // Avoid multiple enumeration
This keeps the responsibility where it belongs (as locally as possible) and the method signature clean.
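Placed inside a complete method, that pattern might look like the sketch below. MaterializeDemo and Summarize are hypothetical names, and int is used only to keep the example short.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class MaterializeDemo
{
    // Accepts any sequence, but takes a single snapshot when the argument
    // is not already a materialized collection.
    public static string Summarize(IEnumerable<int> items)
    {
        IReadOnlyCollection<int> snapshot = items as IReadOnlyCollection<int> ?? items.ToList();

        if (snapshot.Count == 0) return "empty";
        // Count and Max both read the snapshot, so a lazy source is enumerated only once.
        return snapshot.Count + " items, max " + snapshot.Max();
    }
}
```

Passing an array or a List<int> costs nothing extra, because the cast succeeds and no copy is made. Passing a lazy query results in exactly one pass over the source.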
When returning a bunch of items, on the other hand, I prefer to return an IReadOnlyCollection<T>. Why? The goal is to give the caller something that fulfills reasonable expectations - no more, no less. Those expectations are usually that the collection is materialized and that the Count is known - precisely what IReadOnlyCollection<T> provides (and a simple IEnumerable<T> does not). By being no more specific than this, our contract matches expectations, and the method is still free to change the underlying collection. (In contrast, if a method returns a List<T>, it makes me wonder what context there is that I should want to index into the list and mutate it... and the answer is usually "none".)
As an alternative to dasblinkenlight's answer, to prevent the caller casting to List<T>, instead of doing orig.ToList().AsReadOnly(), the following might be better:
ReadOnlyCollection<object> rdOnly = Array.AsReadOnly(orig.ToArray());
It's the same number of method calls, but one takes the other as a parameter instead of being called on the return value.
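The three conversions discussed in these answers behave differently when the receiver tries to get at a mutable interface. The sketch below compares them; ReadOnlyDemo and its method names are invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Linq;

public static class ReadOnlyDemo
{
    // A plain List<int>: satisfies the parameter type, but is still mutable underneath.
    public static IReadOnlyCollection<int> AsPlainList(IEnumerable<int> source) => source.ToList();

    // ReadOnlyCollection<int> wrapping a list.
    public static IReadOnlyCollection<int> AsWrapper(IEnumerable<int> source) => source.ToList().AsReadOnly();

    // ReadOnlyCollection<int> wrapping an array.
    public static IReadOnlyCollection<int> AsArrayWrapper(IEnumerable<int> source) => Array.AsReadOnly(source.ToArray());

    // Simulates a misbehaving callee that tries to add through ICollection<int>.
    public static bool TryAdd(IReadOnlyCollection<int> items, int value)
    {
        if (!(items is ICollection<int> mutable)) return false;
        try
        {
            mutable.Add(value);
            return true;
        }
        catch (NotSupportedException)
        {
            return false; // the read-only wrapper rejects mutation
        }
    }
}
```

TryAdd succeeds on the result of AsPlainList and fails on both wrappers, since ReadOnlyCollection<T> throws NotSupportedException from its explicit ICollection<T>.Add implementation.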

Why is IList not deferred execution?

As I understand it IEnumerable and IQueryable are deferred execution. Why wouldn't it be of benefit for IList to also support deferred execution?
The longer I think about it, the more I question whether the whole idea of "deferred execution" is actually of pedagogic value at all.
I'll answer your question by denying it. IEnumerable<T>, IQueryable<T> and IList<T> do not in any way represent "deferred" or "eager" calculations.
Rather, interfaces represent an ability to provide a service. IEnumerable<T> represents the service "I can provide a sequence, possibly infinite, of items of type T, one at a time". IQueryable<T> represents the service "I can represent a query against a data source, and provide the results of that query on demand". IList<T> represents the service "I can provide random access to a possibly mutable, finite-sized list of items of type T".
None of those services say anything about the implementation details of the service providers. The provider of an IList<T> service could be entirely lazy; the provider of an IQueryable<T> service could be entirely eager. If you want to make a deferred-execution IList<T>, you go right ahead. No one is stopping you!
IList<T> supports random access via the indexer - as well as the Count property. Both of these go against the spirit of deferred execution with streaming data, unless you're suggesting that you really just want something which loads the entire list as soon as you access anything. If that's what you're after, Lazy<T> might be the ticket...
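The Lazy<T> suggestion could be sketched roughly like this. LazyNumbers is a hypothetical class, and the Enumerable.Range call merely stands in for an expensive load.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class LazyNumbers
{
    private readonly Lazy<IReadOnlyList<int>> _items;

    // Counts how many times the (stand-in) expensive load has run.
    public int LoadCount { get; private set; }

    public LazyNumbers()
    {
        _items = new Lazy<IReadOnlyList<int>>(() =>
        {
            LoadCount++;
            return Enumerable.Range(1, 5).ToList();
        });
    }

    // The whole list is loaded on first access, then served from memory.
    public IReadOnlyList<int> Items => _items.Value;
}
```

Nothing is loaded until Items is first read. After that, indexing and Count behave like those of any in-memory list, and the load does not run again.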
As @Joe pointed out, IList<T> abstracts a noun. IEnumerable<T> and IQueryable<T> are abstractions of verbs.
IList<T> is an abstraction of a collection that can be treated as a list. IEnumerable<T> and IQueryable<T> abstract actions: the enumeration over, or querying of, the items contained in the underlying collection.
Think through the consequences of what you're saying.
What would be deferred, and until when?
In the case of IEnumerable and IQueryable what's deferred is the enumeration. And the interfaces don't expose anything else that depends on the enumeration having been done.
There are a multitude of reasons; here are just some:
An IList should be independent of the underlying data source that was used to create it.
Accessing an IList member by index is expected to be O(1). With deferred execution this cannot be guaranteed: access might be very slow, at least the first time.
Accessing an IList member should not throw an exception because of how the IList happens to be constructed at the moment you access it. All sorts of problems can arise with deferred execution (for example, the original data source may no longer be accessible), which goes back to the first point.

Why does IList<T> not provide all the methods that List<T> does? Which should I use?

I have always been taught that programming against an interface is better, so I would declare the parameters on my methods as IList<T> rather than List<T>.
But this means I have to cast to List<T> just to use some methods; one that comes to mind is Find, for example.
Why is this? Should I continue to program against interfaces, but continue to cast or revert?
I am a little bit confused why Find (for example) isn't available on the IList<T> which List<T> inherits from.
Personally I would use IList<T> rather than List<T>, but then use LINQ (Select, Where etc) instead of the List-specific methods.
Casting to List<T> removes much of the point of using IList<T> in the first place - and actually makes it more dangerous, as the implementation may be something other than List<T> at execution time.
In the case of lists you could continue programming against interfaces and use LINQ to filter your objects. You could even work with IEnumerable<T> which is even higher in the object hierarchy.
But more generally if the consumer of your API needs to call a specific method you probably haven't chosen the proper interface to expose.
"I am a little bit confused why Find (for example) isn't available on the IList<T> which List<T> inherits from."
While I'm not privy to the decision process of the designers, there are a few things they were probably thinking.
1) Not putting these methods on IList keeps the intent of the contract clearer. According to MSDN, IList "Represents a collection of objects that can be individually accessed by index." Adding Find would change the contract to a searchable, indexable collection.
2) Every method you put on an interface makes it harder to implement the interface. If all of those methods were on IList, it would be much more tedious to implement IList. Especially since:
3) Most implementations of these methods would be the same. Find and several of the others on List would really be better placed on a helper class. Take for example, ReadOnlyCollection, Collection, ObservableCollection, and ReadOnlyObservableCollection. If I had to implement Find on all of those (pre-LINQ), I would make a helper class that takes IEnumerable and a predicate and just loop over the collections and have the implementations call the helper method.
4) LINQ (Not so much a reason why it didn't happen, more of why it isn't needed in the future.) With LINQ and extension methods, all IEnumerable's now "have" Find as an extension method (only they called it Where).
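The helper-class idea from point 3 and its LINQ replacement from point 4 might look like the sketch below. SequenceHelper and both method names are hypothetical. Note that List<T>.Find returns the first match, so FirstOrDefault is the closest single-result counterpart, while Where returns all matches.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SequenceHelper
{
    // Pre-LINQ style helper: the first matching item, or default(T) when none matches.
    public static T Find<T>(IEnumerable<T> items, Func<T, bool> predicate)
    {
        foreach (T item in items)
        {
            if (predicate(item)) return item;
        }
        return default(T);
    }

    // The same behaviour expressed with a LINQ extension method.
    public static T FindWithLinq<T>(IEnumerable<T> items, Func<T, bool> predicate)
        => items.FirstOrDefault(predicate);
}
```

Because both methods take IEnumerable<T>, they work on a List<T>, an array, a ReadOnlyCollection<T> or any other sequence without each of those types having to implement Find.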
I think it's because an IList can be backed by different collection types (e.g. a list of some sort, an array, and so on).
You can use the Where extension method from System.Linq. Avoid casting back to List from IList.
If you find that the IList<T> parameter being passed between various classes is consistently being recast into List<T>, this indicates that there is a fundamental problem with your design.
From what you're describing, it's clear that you want to use polymorphism, but recasting on a consistent basis to List<T> would mean that IList<T> does not have the level of polymorphism you need.
On the other side of the coin, you simply might be targeting the wrong polymorphic method (e.g., Find rather than FirstOrDefault).
In either case, you should review your design and see what exactly you want to accomplish, and make the choice of List<T> or IList<T> based on the actual requirements, rather than conformity to style.
If you expose your method with an IList<> parameter, someone can pass, for example, a ReadOnlyCollection<>, which is an IList<> but is not a List<>. If your method then casts the parameter to List<>, your API will fail at runtime with an InvalidCastException.
If you expose a public method with an IList<> parameter, you cannot assume that it is a specific implementation of IList<>. You must use it as an IList<> and nothing more.
If the list is part of an API or service that is exposed, then it is probably better to declare it as an IList so that the implementation can change internally.
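A short sketch of that failure mode, with hypothetical names (ListParameterDemo, FirstEvenByCasting, FirstEvenByInterface):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ListParameterDemo
{
    // Fragile: assumes every IList<int> is really a List<int>.
    // Throws InvalidCastException for a ReadOnlyCollection<int> or any other implementation.
    public static int FirstEvenByCasting(IList<int> numbers)
        => ((List<int>)numbers).Find(n => n % 2 == 0);

    // Robust: relies only on what the interface promises.
    public static int FirstEvenByInterface(IList<int> numbers)
        => numbers.FirstOrDefault(n => n % 2 == 0);
}
```

Both methods return the same result for a List<int>. Only the second one also works when the caller passes a ReadOnlyCollection<int>.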
There is already much discussion on this topic.
No, in this case it makes no sense to program to interfaces, because what you need is NOT just an IList: you rely on the extra methods that List has.
