IEnumerable and order - c#

I have got a question about the order in IEnumerable.
As far as I am aware, iterating through an IEnumerable can be written roughly as follows:
IEnumerator enumerator = enumerable.GetEnumerator();
while (enumerator.MoveNext())
{
    object obj = enumerator.Current;
    ...
}
Now assume that one needs to operate on a sorted collection. Can IEnumerable be used in this case, or is it better to use something else (e.g. IList) with indexing support?
In other words: does the contract of IEnumerable make any guarantees about the order in general?
So, IEnumerable is not a proper means of expressing a generic interface that guarantees ordering. The new question is: what interface or class should be used for an immutable collection with order? ReadOnlyCollection? IList? Both of them contain an Add() method (even though it is not implemented in the former).
My own thoughts: IEnumerable does not provide any guarantees about the ordering. A correct implementation could return the same elements in a different order on different enumerations (consider an SQL query).
I am aware of LINQ First(), but if IEnumerable does not say a word about its ordering, this extension is pretty useless.

IEnumerable/IEnumerable<T> makes no guarantees about ordering, but the implementations behind IEnumerable/IEnumerable<T> may or may not guarantee ordering.
For instance, if you enumerate List<T>, order is guaranteed, but if you enumerate HashSet<T> no such guarantee is provided, yet both will be enumerated using the IEnumerable<T> interface.
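A minimal sketch of the point above, assuming a hypothetical helper named Describe that only ever sees IEnumerable<T>. The same method consumes a List<int> and a HashSet<int>, but only the list promises the order in which the items come back.

```csharp
using System.Collections.Generic;

static class OrderDemo
{
    // Hypothetical helper: it sees only IEnumerable<T>, so it cannot know
    // whether the source promises any particular order.
    public static string Describe<T>(IEnumerable<T> items)
    {
        return string.Join(",", items);
    }
}
```

Calling OrderDemo.Describe(new List<int> { 3, 1, 2 }) gives "3,1,2". For a HashSet<int> with the same elements the result contains the same three items, but their order is not part of the contract and should not be relied on.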

That is an implementation detail. IEnumerable will enumerate the items; how that happens is up to the implementation. MOST lists etc. run along their natural order (index 0 upward etc.).
does the contract of IEnumerable guarantee us some order in general case?
No, it guarantees enumeration only (every item one time etc.). IEnumerable has no guaranteed order because it is also usable on unordered items.
I know about LINQ First(), but if IEnumerable does not say a word about its order, this extension is rather useless.
No, it is not, because you may have an intrinsic order. You give SQL as an example: the result is an IEnumerable, but if I have enforced an ordering beforehand (by using OrderBy()), then the IEnumerable is ordered by definition of LINQ. AsEnumerable().First() then gets me the first item in that order.
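As a rough illustration of why First() is still useful, here is a sketch with a hypothetical method Smallest. It imposes an order with OrderBy() before calling First(), so the result no longer depends on how the source happens to enumerate.

```csharp
using System.Collections.Generic;
using System.Linq;

static class FirstDemo
{
    // First() returns whatever the sequence yields first; after OrderBy()
    // that is, by definition, the element with the smallest key.
    public static int Smallest(IEnumerable<int> values)
    {
        return values.OrderBy(v => v).First();
    }
}
```

FirstDemo.Smallest(new[] { 5, 2, 9 }) returns 2, and the same holds when the input is an unordered collection such as a HashSet<int>.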

Perhaps you are looking for the IOrderedEnumerable<T> interface? It is returned by extension methods like OrderBy() and allows for subsequent sorting with ThenBy().
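A short sketch of that interface in use; the method name ByLengthThenAlphabet is made up for illustration. The return type is IOrderedEnumerable<string>, which is what makes the ThenBy() call possible.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SortDemo
{
    // OrderBy() returns IOrderedEnumerable<string>; ThenBy() refines that
    // ordering for elements whose first key (the length) is equal.
    public static IOrderedEnumerable<string> ByLengthThenAlphabet(IEnumerable<string> words)
    {
        return words.OrderBy(w => w.Length).ThenBy(w => w, StringComparer.Ordinal);
    }
}
```

For the input "pear", "fig", "kiwi", "apple" the sequence comes back as fig, kiwi, pear, apple.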

You are mixing two things: enumerating and ordering.
When you enumerate over an IEnumerable you should not care about order. You work with the interface, and its implementation is responsible for the order.
For instance:
void Enumerate(IEnumerable sequence)
{
    // loop
}
SortedList<TKey, TValue> sortedList = ...
Enumerate(sortedList);
Inside the method it is still a list with a fixed order, but the method doesn't know about the particular interface implementation and its peculiarities.

Related

How to count the items of an IEnumerable?

Given an instance IEnumerable o, how can I get the item count (without enumerating through all the items)?
For example, if the instance is an ICollection, ICollection<T> or IReadOnlyCollection<T>, each of these interfaces has its own Count property.
Is getting the Count property by reflection the only way?
Or can I instead check and cast o to ICollection<T>, for example, so that I can then call Count?
It depends how badly you want to avoid enumerating the items if the count is not available otherwise.
If you can enumerate the items, you can use the LINQ method Enumerable.Count. It will look for a quick way to get the item count by casting into one of the interfaces. If it can't, it will enumerate.
If you want to avoid enumeration at any cost, you will have to perform a type cast. In a real life scenario you often will not have to consider all the interfaces you have named, since you usually use one of them (IReadOnlyCollection is rare and ICollection only used in legacy code). If you have to consider all of the interfaces, try them all in a separate method, which can be an extension:
using System.Collections;
using System.Collections.Generic;

static class CountExtensions {
    // Returns the count if it is available without enumerating, otherwise null.
    public static int? TryCount<T>(this IEnumerable<T> items) {
        switch (items) {
            case ICollection<T> genCollection:
                return genCollection.Count;
            case ICollection legacyCollection:
                return legacyCollection.Count;
            case IReadOnlyCollection<T> roCollection:
                return roCollection.Count;
            default:
                return null;
        }
    }
}
Access the extension method with:
int? count = myEnumerable.TryCount();
IEnumerable doesn't promise a count. What if it was a random sequence or a real-time data feed from a sensor? It is entirely possible for the collection to be infinitely sized. The only way to count the elements is to start at zero and increment for each element that the enumerator provides, which is exactly what LINQ does, so don't reinvent the wheel. LINQ is smart enough to use the Count properties of collections that support this.
The only way to really cover all possible collection types is to use the generic interface and call the Count() method. This also covers other types such as streams or plain iterators. Furthermore, it will use the Count property where one is available (see "Count property vs Count() method?") to avoid unnecessary overhead.
If, however, you have a non-generic collection, you'd have to use reflection to find the right property. This is cumbersome and may fail if your collection doesn't even have the property (e.g. an endless stream or a plain iterator). On the other hand, Enumerable.Count() on an IEnumerable<T> will handle those types with the optimization mentioned above. Only if necessary will it iterate the entire collection.
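To make the difference concrete, here is a small sketch. The names CountDemo, Generate and Produced are invented for the example. It counts a lazily generated sequence that has no Count property, and records how many items the iterator had to produce for that.

```csharp
using System.Collections.Generic;
using System.Linq;

static class CountDemo
{
    // Number of items the iterator below has produced so far.
    public static int Produced;

    // A lazily generated sequence; it has no Count property to consult.
    public static IEnumerable<int> Generate(int n)
    {
        for (int i = 0; i < n; i++)
        {
            Produced++;
            yield return i;
        }
    }

    // Enumerable.Count() works for any IEnumerable<T>, collection or not.
    public static int CountAll<T>(IEnumerable<T> source)
    {
        return source.Count();
    }
}
```

CountDemo.CountAll(new List<int> { 1, 2, 3 }) and CountDemo.CountAll(CountDemo.Generate(3)) both return 3, but in the second case Produced goes up by 3, because the only way to count an iterator is to run it to the end.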

What is the IEnumerable interface in C#? What if we don't use it?

I searched the internet for: What is the IEnumerable interface in C#? What problem does it solve? What if we don't use it? But I never really found much. Lots of posts explain how to implement it.
I've also found the following example
List<string> List = new List<string>();
List.Add("Sourav");
List.Add("Ram");
List.Add("Sachin");
IEnumerable names = from n in List where (n.StartsWith("S")) select n;
// var names = from n in List where (n.StartsWith("S")) select n;
foreach (string name in names)
{
Console.WriteLine(name);
}
The above example outputs:
Sourav
Sachin
I wanted to know the advantage of using IEnumerable in the above example. I can achieve the same using 'var' (commented line).
I would appreciate it if anyone could help me understand this: what is the benefit of using IEnumerable, with an example? What if we don't use it?
Beyond what the documentation says, I'd describe IEnumerable<T> as a collection of Ts. It can be iterated over, and many other operations can be carried out on it (such as Where(), Any() and Count()); however, it's not designed for adding and removing elements. For that, use a List<T>.
It's useful because it's a fundamental interface for many collections; various data access layers and ORMs use it, and many extension methods are automatically available for it.
Many concrete implementations of lists, arrays, bags, queues and stacks all implement it, allowing a wide variety of collections to use its extension methods.
Also, collections implementing either IEnumerable or IEnumerable<T> can be used in a foreach loop.
From MSDN:
for each element in an array or an object collection that implements the System.Collections.IEnumerable or System.Collections.Generic.IEnumerable<T> interface.
In your code example you've got a variable called names, which refers to an IEnumerable<string>. It's important to understand that the query produces an IEnumerable<string> regardless of whether you use the var keyword or not. var just lets you avoid writing the type out explicitly each time.
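A small sketch of that last point, using a hypothetical method StartingWithS built around the query from the question. The local is declared with var, yet it can be returned as IEnumerable<string> without a cast, because that is the type the compiler infers for it.

```csharp
using System.Collections.Generic;
using System.Linq;

static class VarDemo
{
    public static IEnumerable<string> StartingWithS(List<string> source)
    {
        // var is resolved at compile time; here it stands for IEnumerable<string>.
        var names = from n in source where n.StartsWith("S") select n;

        // No cast needed: the return statement compiles only because
        // the inferred type of names is IEnumerable<string>.
        return names;
    }
}
```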
TLDR
It's a common base interface for many different types of collections that lets you use your collection in foreach loops and provides a lot of extra extension methods for free.
IEnumerable and the much-preferred IEnumerable<T> are the standard way to handle the 'sequence of elements' pattern.
The idea is that each type implementing IEnumerable<T> looks as if it carries a label: "ENUMERATE ME". No matter what's there (a queue of order items, a collection of controls, records from a SQL query, XML element subnodes, etc.), it's all the same from the enumerable's point of view: you've got a sequence and you can do something for each item in the sequence.
Note that IEnumerable is somewhat limited: there's no count, no indexed access, no guarantee of repeatable results, and no way to check whether the enumerable is empty other than getting the enumerator and checking whether there is anything. This simplicity makes it possible to cover almost all use cases, from collections to ad-hoc sequences (custom iterators, LINQ queries, etc.).
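The emptiness check described above can be sketched as follows; IsEmpty is a hypothetical helper name. It gets the enumerator and asks for a first element, which is essentially what LINQ's Any() does as well.

```csharp
using System.Collections.Generic;

static class EmptyCheck
{
    // Asks the enumerator for a first element and disposes it again.
    // Note that this starts an enumeration, which may have side effects
    // for sources that are not repeatable.
    public static bool IsEmpty<T>(IEnumerable<T> source)
    {
        using (IEnumerator<T> enumerator = source.GetEnumerator())
        {
            return !enumerator.MoveNext();
        }
    }
}
```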
The question was asked multiple times, here're some answers: 1, 2, 3
MSDN
"The disadvantage of omitting IEnumerable and IEnumerator is that the collection class is no longer interoperable with the foreach statements, or equivalent statements, of other common language runtime languages."
So you need to implement this interface so your custom collection type can be used with other CLR languages. It seems like a CLS requirement.

Returning 'IList' vs 'ICollection' vs 'Collection'

I am confused about which collection type I should return from my public API methods and properties.
The collections that I have in mind are IList, ICollection and Collection.
Is returning one of these types always preferred over the others, or does it depend on the specific situation?
ICollection<T> is an interface that exposes collection semantics such as Add(), Remove(), and Count.
Collection<T> is a concrete implementation of the ICollection<T> interface.
IList<T> is essentially an ICollection<T> with index-based random access.
In this case you should decide whether or not your results require list semantics such as order based indexing (then use IList<T>) or whether you just need to return an unordered "bag" of results (then use ICollection<T>).
Generally you should return a type that is as general as possible, i.e. one that knows just enough of the returned data that the consumer needs to use. That way you have greater freedom to change the implementation of the API, without breaking the code that is using it.
Consider also the IEnumerable<T> interface as return type. If the result is only going to be iterated, the consumer doesn't need more than that.
The main difference between the IList<T> and ICollection<T> interfaces is that IList<T> allows you to access elements via an index. IList<T> describes array-like types. Elements in an ICollection<T> can only be accessed through enumeration. Both allow the insertion and deletion of elements.
If you only need to enumerate a collection, then IEnumerable<T> is to be preferred. It has the following advantages over the others:
It disallows changes to the collection (but not to the elements, if they are of reference type).
It allows the largest possible variety of sources, including enumerations that are generated algorithmically and are not collections at all.
It allows lazy evaluation and can be queried with LINQ.
Collection<T> is a base class that is mainly useful to implementers of collections. If you expose it in interfaces (APIs), many useful collections not deriving from it will be excluded.
One disadvantage of IList<T> is that arrays implement it but do not allow you to add or remove items (i.e. you cannot change the array length). An exception will be thrown if you call IList<T>.Add(item) on an array. The situation is somewhat defused as IList<T> has a Boolean property IsReadOnly that you can check before attempting to do so. But in my eyes, this is still a design flaw in the library. Therefore, I use List<T> directly, when the possibility to add or remove items is required.
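The array behaviour described above can be reproduced with a short sketch. TryAdd and AddThrows are hypothetical helpers, not library methods. An array viewed as IList<T> reports IsReadOnly as true and throws NotSupportedException from Add.

```csharp
using System;
using System.Collections.Generic;

static class ArrayAsList
{
    // Adds the item only if the list reports that it can be modified.
    public static bool TryAdd<T>(IList<T> list, T item)
    {
        if (list.IsReadOnly) return false;
        list.Add(item);
        return true;
    }

    // Returns true if calling Add throws NotSupportedException.
    public static bool AddThrows<T>(IList<T> list, T item)
    {
        try
        {
            list.Add(item);
            return false;
        }
        catch (NotSupportedException)
        {
            return true;
        }
    }
}
```

With an int[] argument, TryAdd returns false and AddThrows returns true; with a List<int>, TryAdd returns true.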
Which one should I choose? Let's consider just List<T> and IEnumerable<T> as examples for specialized / generalized types:
Method input parameter
IEnumerable<T>: Greatest flexibility for the caller. Restrictive for the implementer, read-only.
List<T>: Restrictive for the caller. Gives flexibility to the implementer, who can manipulate the collection.
Method output parameter or return value
IEnumerable<T>: Restrictive for the caller, read-only. Greatest flexibility for the implementer. Allows returning just about any collection or implementing an iterator (yield return).
List<T>: Greatest flexibility for the caller, who can manipulate the returned collection. Restrictive for the implementer.
Well, at this point you may be disappointed because I don't give you a simple answer. A statement like "always use this for input and that for output" would not be constructive. The reality is that it depends on use case. A method like void AddMissingEntries(TColl collection) will have to provide a collection type having an Add method or may even require a HashSet<T> for efficiency. A method void PrintItems(TColl collection) can happily live with an IEnumerable<T>.
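A sketch of how those two methods might look with concrete parameter types. The signatures are assumptions made for illustration; FormatItems stands in for PrintItems and returns the text instead of printing it. The read-only method asks for IEnumerable<T>, while the mutating one asks for ICollection<T> because it needs Add and Contains.

```csharp
using System.Collections.Generic;

static class CollectionApi
{
    // Only enumerates, so IEnumerable<T> is enough for the parameter.
    public static string FormatItems<T>(IEnumerable<T> items)
    {
        return string.Join(", ", items);
    }

    // Needs Contains and Add, so it requires ICollection<T>.
    // Returns the number of entries that were added.
    public static int AddMissingEntries<T>(ICollection<T> target, IEnumerable<T> candidates)
    {
        int added = 0;
        foreach (T candidate in candidates)
        {
            if (!target.Contains(candidate))
            {
                target.Add(candidate);
                added++;
            }
        }
        return added;
    }
}
```

Passing a HashSet<T> as target keeps the Contains check cheap, which is the efficiency argument made above.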
IList<T> is the base interface for all generic lists. Since it is an ordered collection, the implementation can decide on the ordering, ranging from sorted order to insertion order. Moreover, IList<T> has an Item property (the indexer) that allows methods to read and edit entries in the list based on their index.
This makes it possible to insert a value into the list, or remove one from it, at a given index.
Also, since IList<T> : ICollection<T>, all the methods from ICollection<T> are available here as well.
ICollection<T> is the base interface for all generic collections. It defines the size, the enumerator and the methods for adding, removing and looking up items (the synchronization members belong to the non-generic ICollection). You can add an item to or remove an item from a collection, but you cannot choose the position at which this happens, because there is no indexer.
Collection<T> provides an implementation for IList<T>, IList and IReadOnlyList<T>.
If you use a narrower interface type such as ICollection<T> instead of IList<T>, you protect your code against breaking changes. If you use a wider interface type such as IList<T>, you are more in danger of breaking code changes.
Quoting from a source:
ICollection, ICollection<T>: You want to modify the collection or you care about its size.
IList, IList<T>: You want to modify the collection and you care about the ordering and/or positioning of the elements in the collection.
Returning an interface type is more general, so (lacking further information on your specific use case) I'd lean towards that. If you want to expose indexing support, choose IList<T>, otherwise ICollection<T> will suffice. Finally, if you want to indicate that the returned types are read only, choose IEnumerable<T>.
And, in case you haven't read it before, Brad Abrams and Krzysztof Cwalina wrote a great book titled "Framework Design Guidelines: Conventions, Idioms, and Patterns for Reusable .NET Libraries" (you can download a digest from here).
There are several subjects that come out of this question:
interfaces versus classes
which specific class to use, out of several similar classes (collection, list, array)?
common collection classes versus subitem ("generics") collections
You may want to highlight that it's an object-oriented API.
interfaces versus classes
If you don't have much experience with interfaces, I recommend sticking to classes.
I often see developers jumping to interfaces even when it's not necessary,
and they end up with a poor interface design instead of a good class design,
which, by the way, can eventually be migrated to a good interface design ...
You'll see a lot of interfaces in APIs, but don't rush into them
if you don't need them.
You will eventually learn how to apply interfaces to your code.
which specific class to use, out of several similar classes (collection, list, array)?
There are several classes in C# (.NET) that can be interchanged. As already mentioned, if you need something from a more specific class, such as "CanBeSortedClass", then make it explicit in your API.
Does your API user really need to know that your class can be sorted, or that some format can be applied to its elements? Then use "CanBeSortedClass" or "ElementsCanBePaintedClass".
Otherwise, use a more general class such as "GenericBrandClass".
common collection classes versus subitem ("generics") collections
You'll find that there are classes that contain other elements,
and you can specify that all elements should be of a specific type.
Generic collections are classes that let you use the same collection
for several purposes, without having to create a new collection class
for each new subitem type, like this: Collection<T>.
Is your API user going to need a very specific type, the same for all elements?
Use something like List<WashingtonApple>.
Is your API user going to need several related types?
Expose List<Fruit> in your API, and use List<Orange>, List<Banana> and List<Strawberry> internally, where Orange, Banana and Strawberry are descendants of Fruit.
Is your API user going to need a collection of arbitrary types?
Use List<object>, where all items are objects.
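One detail worth keeping in mind for the Fruit example: a List<Orange> cannot be handed out directly as a List<Fruit>, because List<T> is not covariant. The sketch below uses the Fruit and Orange names from the answer, with everything else assumed for illustration, and shows two ways around that.

```csharp
using System.Collections.Generic;

class Fruit
{
    public virtual string Name { get { return "fruit"; } }
}

class Orange : Fruit
{
    public override string Name { get { return "orange"; } }
}

static class FruitApi
{
    private static readonly List<Orange> oranges =
        new List<Orange> { new Orange(), new Orange() };

    // IEnumerable<T> is covariant, so the internal List<Orange>
    // can be returned as IEnumerable<Fruit> without copying.
    public static IEnumerable<Fruit> AllFruit()
    {
        return oranges;
    }

    // List<T> is not covariant: a List<Fruit> has to be built as a new list.
    public static List<Fruit> AllFruitAsList()
    {
        return new List<Fruit>(oranges);
    }
}
```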
Cheers.

Does an ICollection<T> have an order?

Following the rule that a public API should never return a list, I'm blindly converting all code that returned lists to return ICollection<T> instead:
public IList<T> CommaSeparate(String value) {...}
becomes
public ICollection<T> CommaSeparate(String value) {...}
And although an ICollection has a Count, there is no way to get items by index.
And although an ICollection exposes an enumerator (allowing foreach), I see no guarantee that the order of enumeration starts at the "top" of the list, as opposed to the "bottom".
I could mitigate this by avoiding the use of ICollection, and instead use Collection:
public Collection<T> CommaSeparate(String value) {...}
This allows the use of an Items[index] syntax.
Unfortunately, my internal implementation constructs an array, which can be cast to IList or ICollection, but not to Collection.
Is there a way to access the items of a collection in order?
This begs the wider question: Does an ICollection even have an order?
Conceptually, imagine I want to parse a command line string. It is critical that the order of items be maintained.
Conceptually, I require a contract that indicates an "ordered" set of string tuples. In the case of an API contract, to indicate order, which of the following is correct:
IEnumerable<String> Grob(string s)
ICollection<String> Grob(string s)
IList<String> Grob(string s)
Collection<String> Grob(string s)
List<String> Grob(string s)
The ICollection<T> interface doesn't specify anything about an order. The objects will be in the order specified by the object returned. For example, if you return the Values collection of a SortedDictionary, the objects will be in the order defined by the dictionary's comparer.
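The SortedDictionary case can be sketched like this, with a hypothetical method ValuesByKey. The caller only sees an ICollection<string>, yet the values arrive sorted by key because the underlying object decides the order.

```csharp
using System.Collections.Generic;

static class SortedValuesDemo
{
    // The static type says nothing about order; the SortedDictionary behind it does.
    public static ICollection<string> ValuesByKey()
    {
        var dictionary = new SortedDictionary<int, string>
        {
            { 3, "c" },
            { 1, "a" },
            { 2, "b" }
        };
        return dictionary.Values;
    }
}
```

Enumerating the result yields a, b, c, following the keys 1, 2, 3, regardless of the order in which the entries were added.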
If you need the method to return, by contract, a type whose contract requires a certain order, then you should express that in the method's signature by returning the more specific type.
Regardless of the runtime type of the object returned, consider the behavior when the static reference is IList<T> or ICollection<T>: When you call GetEnumerator() (perhaps implicitly in a foreach loop), you're going to call the same method and get the same object regardless of the static type of the reference. It will therefore behave the same way regardless of the CommaSeparate() method's return type.
Additional thought:
As someone else pointed out, the FXCop rule warns against using List<T>, not IList<T>; the question you linked to is asking why FXCop doesn't recommend using IList<T> in place of List<T>, which is another matter. If I imagine that you are parsing a command-line string where order is important, I would stick with IList<T> if I were you.
ICollection does not have a guaranteed order, but the class that actually implements it may (or may not).
If you want to return an ordered collection, then return an IList<T> and don't get too hung up on FxCop's generally sound, but very generic, advice.
No, ICollection does not imply an order.
The ICollection instance has the "order" of whatever class that implements it. That is, referencing a List<T> as an ICollection will not alter its order at all.
Likewise, if you access an unordered collection as an ICollection, it will not impose an order on the unordered collection either.
So, to your question:
Does an ICollection even have an order?
The answer is: it depends solely on the class that implements it.
ICollection<T> may have an order, but the actual ordering depends on the class implementing it.
It does not have an accessor for the item at a given index. IList<T> specializes this interface to provide access by index.
An ICollection<T> is just an interface; whether it's ordered or not is entirely dependent upon the implementation underlying it (which is supposed to be opaque).
If you want to be able to access it by index, you'd want to return things as an IList<T>, which is both IEnumerable<T> and ICollection<T>. One should bear in mind, though, that depending on the underlying implementation, getting at an arbitrary item in the collection could require O(N) time (about N/2 steps on average).
My inclination would be to avoid the 'collection' interfaces altogether and instead use a custom type representing the collection in terms of the problem domain and exposing the appropriate logical operations suitable for that type.
ICollection is just an interface—there is no implementation or explicit specification about ordering. That means if you return something that enumerates in an ordered manner, whatever is consuming your ICollection will do so in an ordered manner.
Order is only implied by the underlying, implementing, object. There is no specification in ICollection that says it should be ordered or not. Enumerating over a result multiple times will invoke the underlying object's enumerator, which is the only place that those rules would be set. An object doesn't change the way it is enumerated just because it inherits this interface. Had the interface specified that it is an ordered result, then you could safely rely on the ordered result of the implementing object.
It depends on the implementation of the instance. An ICollection that happens to be a List has an order; an ICollection that happens to be an unordered collection (a HashSet, say) does not.
All ICollections implement IEnumerable, which returns the items one at a time, ordered or otherwise.
EDIT: In reply to your additional example about command line parsing in the question, I would argue that the appropriate return type depends on what you are doing with those arguments afterward, but IEnumerable is probably the right one.
My reasoning is that IList, ICollection, and their concrete implementations permit modification of the list returned from Grob, which you probably don't want. Since .NET doesn't have an Indexed Sequence interface, IEnumerable is the best bet to prevent your callers from doing something weird like trying to modify the parameter list that they get back.
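Since that answer was written, .NET 4.5 added IReadOnlyList<T>, which offers an indexer and Count without Add or Remove. Here is a sketch of Grob returning it; the whitespace-splitting body is an assumption made purely for illustration.

```csharp
using System;
using System.Collections.Generic;

static class CommandLine
{
    // Hypothetical implementation: splits on spaces and keeps the original order.
    // A string[] can be returned directly because arrays implement IReadOnlyList<T>.
    public static IReadOnlyList<string> Grob(string s)
    {
        return s.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

Callers can write CommandLine.Grob("copy a b")[1] and read Count, but the interface offers no way to add or remove items. It is a read-only view rather than a guarantee of immutability, since a caller could still cast the result back to string[].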
If you expect that all present and future versions of your method will have no difficulty returning an object that will be able to quickly and easily retrieve the Nth item, use type IList<T> to return a reference to something that implements both IList<T> and non-generic ICollection. If you expect that some present or future versions might not be able to quickly and easily return the Nth item, but would be able to instantly report the number of items, use type ICollection<T> to return a reference to something that implements ICollection<T> and non-generic ICollection. If you expect that present or future versions may have trouble even knowing how many items there are, return IEnumerable<T>. The question of sequencing is irrelevant; the ability to access the Nth thing implies that a defined sequence exists, but ICollection<T> says neither more nor less about sequencing than IEnumerable<T>.

Why create an IEnumerable?

I don't understand why I'd create an IEnumerable. Or why it's important.
I'm looking at the example for IEnumerable:
http://msdn.microsoft.com/en-us/library/system.collections.ienumerable.aspx
But I can basically do the same thing if I just went:
List<Person> people = new List<Person>();
so what's IEnumerable good for? Can you give me a situation where I'd need to create a class that implements IEnumerable?
IEnumerable is an interface; it exposes certain things to the outside. You are completely right that you could just use a List<T>, but List<T> is a very specific type that implements many interfaces. What exactly does a List<T> do? It stores items, and it offers methods to Add and Remove them. Now, what if you only need the "item-keeping" feature of a List<T>? That's what an IEnumerable<T> is: an abstract way of saying "I want to get a sequence of items I can iterate over". A list says "I want to get a collection which I can modify, access by index and iterate". List<T> offers a lot more functionality than IEnumerable<T> does, but it takes up more memory. So if a method takes an IEnumerable<T>, it doesn't care what exactly it gets, as long as the object offers the capabilities of IEnumerable<T>.
Also, you don't have to create your own IEnumerable<T>, a List<T> IS an IEnumerable<T>!
Lists are, of course, IEnumerable. As a general rule, you want to be specific about what you output but broad about what you accept as input, e.g.:
You have a method which loops through a list of objects and writes something to the console...
You could declare the parameter as either IEnumerable<T> or IList<T> (or even List<T>). Since you don't need to add to the input list, all you actually need to do is enumerate, so use IEnumerable<T>. Your method will then also accept other types which implement IEnumerable<T>, including IQueryable<T>, linked lists, etc.
You're making your methods more generic for no cost.
Today, you generally wouldn't use the non-generic IEnumerable anymore unless you were supporting software on an older version of the framework. Today, you'd normally use IEnumerable<T>. Amongst other benefits, IEnumerable<T> supports all of the LINQ operations/extensions, so you can easily query any collection type that implements IEnumerable<T> using LINQ.
Additionally, it doesn't tie the consumer of your code to a particular collection implementation.
It's rare nowadays that you need to create your own container classes; as you rightly say, many good implementations already exist.
However, if you do create your own container class for some specific reason, you may like to implement IEnumerable or IEnumerable<T>, because they are a standard "contract" for iteration, and by providing an implementation you can take advantage of methods/APIs that want an IEnumerable or IEnumerable<T>. LINQ, for example, will give you a bunch of useful extension methods for free.
An IList can be thought of as a particular implementation of IEnumerable. (One that can be added to and removed from easily.) There are others, such as IDictionary, which performs an entirely different function but can still be enumerated over. Generally, I would use IEnumerable as a more generic type reference when I only need an enumeration to satisfy a requirement and don't particularly care what kind it is. I can pass it an IList and more often than not I do just that, but the flexibility exists to pass it other enumerations as well.
Here is one situation where I think I have to implement IEnumerable rather than use List<>:
I want to get all items from a remote server. Let's say one million items are going to be returned. If you use the List<> approach, you need to cache all one million items in memory first. In some cases you don't really want to do that, because you don't want to use up too much memory. Using IEnumerable allows you to display the data on the screen and then dispose of it right away. Therefore, with the IEnumerable approach, the memory footprint of the program is much smaller.
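The streaming idea can be sketched with an iterator method. The remote server is simulated here by a plain loop, and the names RemoteItems, Stream and Fetched are invented for the example. Items are produced one at a time, only when the consumer asks for them.

```csharp
using System.Collections.Generic;

static class RemoteItems
{
    // Counts how many items have actually been "fetched" so far.
    public static int Fetched;

    // Stand-in for a remote source: each item is produced on demand
    // instead of being collected into a list first.
    public static IEnumerable<int> Stream(int total)
    {
        for (int i = 0; i < total; i++)
        {
            Fetched++;
            yield return i;
        }
    }
}
```

Taking only the first three items of RemoteItems.Stream(1000000) with LINQ's Take(3) leaves Fetched at 3. The remaining items are never produced, let alone held in memory.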
It's my understanding that IEnumerable is provided to you as an interface for creating your own enumerable class types.
I believe a simple example of this would be recreating the List type, if you wanted to have your own set of features (or lack thereof) for it.
What if you want to enumerate over a collection that is potentially of infinite size, such as the Fibonacci numbers? You couldn't do that easily with a list, but if you had a class that implemented IEnumerable or IEnumerable<T>, it becomes easy.
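A minimal sketch of such a sequence, written as an iterator method rather than a hand-written class. BigInteger is used here, as an implementation choice, so that the values do not overflow.

```csharp
using System.Collections.Generic;
using System.Numerics;

static class Sequences
{
    // An endless sequence: the loop never terminates on its own, so the
    // consumer decides how many items to take.
    public static IEnumerable<BigInteger> Fibonacci()
    {
        BigInteger current = 0;
        BigInteger next = 1;
        while (true)
        {
            yield return current;
            BigInteger sum = current + next;
            current = next;
            next = sum;
        }
    }
}
```

Combined with LINQ, Sequences.Fibonacci().Take(8) yields 0, 1, 1, 2, 3, 5, 8, 13. Calling Count() or ToList() on the unbounded sequence would never finish.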
When a built-in container fits your needs you should definitely use that, and then IEnumerable comes for free. When for whatever reason you have to implement your own container, for example if it must be backed by a DB, then you should make sure to implement both IEnumerable and IEnumerable<T>, for two reasons:
It makes foreach work, which is awesome
It enables almost all LINQ goodness. For example you will be able to filter your container down to objects that match a condition with an elegant one liner.
IEnumerable provides a means for your API users (including yourself) to use your collection by means of a foreach. For example, I implemented IEnumerable in my Binary Tree class so I could just foreach over all of the items in the tree without having to Ctrl+C Ctrl+V all the logic required to traverse the tree in order.
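A binary tree along those lines might look like the sketch below. The TreeNode<T> class is an assumption for illustration, not the answerer's actual code. The in-order traversal lives in GetEnumerator(), so every foreach and every LINQ call reuses it.

```csharp
using System.Collections;
using System.Collections.Generic;

sealed class TreeNode<T> : IEnumerable<T>
{
    private readonly T value;
    private readonly TreeNode<T> left;
    private readonly TreeNode<T> right;

    public TreeNode(T value, TreeNode<T> left = null, TreeNode<T> right = null)
    {
        this.value = value;
        this.left = left;
        this.right = right;
    }

    // In-order traversal: left subtree, then this node, then right subtree.
    public IEnumerator<T> GetEnumerator()
    {
        if (left != null)
        {
            foreach (T item in left) yield return item;
        }
        yield return value;
        if (right != null)
        {
            foreach (T item in right) yield return item;
        }
    }

    // The non-generic interface simply forwards to the generic enumerator.
    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}
```

A tree built as new TreeNode<int>(2, new TreeNode<int>(1), new TreeNode<int>(3)) enumerates as 1, 2, 3.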
Hope it helps :)
IEnumerable is useful if you have a collection or method which can return a bunch of things, but isn't a Dictionary, List, array, or other such predefined collection. It is especially useful in cases where the set of things to be returned might not be available when one starts outputting it. For example, an object to access records in a database might implement IEnumerable. While it might be possible for such an object to read all appropriate records into an array and return that, that may be impractical if there are a lot of records. Instead, the object could return an enumerator which could read the records in small groups and return them individually.
