I'm trying to understand the difference between sequences and lists.
In F# there is a clear distinction between the two. However in C# I have seen programmers refer to IEnumerable collections as a sequence. Is what makes IEnumerable a sequence the fact that it returns an object to iterate through the collection?
Perhaps the real distinction is purely found in functional languages?
Not really - you tend to have random access to a list, as well as being able to get its count quickly etc. Admittedly linked lists don't have the random access nature... but then they don't implement IList<T>. There's a grey area between the facilities provided by a particular platform and the general concepts.
Sequences (as represented by IEnumerable<T>) are read-only, forward-only, one item at a time, and potentially infinite. Of course any one implementation of a sequence may also be a list (e.g. List<T>) but when you're treating it as a sequence, you can basically iterate over it (repeatedly) and that's it.
I think that the confusion may arise from the fact that collections like List<T> implement the interface IEnumerable<T>. If you have a subtype relationship in general (e.g. supertype Shape with two subtypes Rectangle and Circle), you can interpret the relation as an "is-a" hierarchy.
This means that it is perfectly fine to say that "Circle is a Shape" and similarly, people would say that "List<T> is an IEnumerable<T>" that is, "list is a sequence". This makes some sense, because a list is a special type of a sequence. In general, sequences can be also lazily generated and infinite (and these types cannot also be lists). An example of a (perfectly valid) sequence that cannot be generated by a list would look like this:
// C# version
IEnumerable<int> Numbers() {
    int i = 0;
    while (true) yield return i++;
}

// F# version
let rec loop n = seq {
    yield n
    yield! loop(n + 1) }
let numbers = loop(0)
This would be also true for F#, because F# list type also implements IEnumerable<T>, but functional programming doesn't put that strong emphasis on object oriented point of view (and implicit conversions that enable the "is a" interpretation are used less frequently in F#).
Sequence content is calculated on demand, so you can implement, for example, an infinite sequence without exhausting memory.
So in C# you can write a sequence, for example
IEnumerable<int> Null() {
    while (true) yield return 0;
}
It will return an infinite sequence of zeros.
You can write
int[] array = Null().Take(10).ToArray();
and it will take only 10*4 bytes of memory for the elements, even though the sequence is infinite.
So, as you can see, C# does have a distinction between sequence and collection.
I searched the internet for "What is the IEnumerable interface in C#? What problem does it solve? What if we don't use it?" but never really found much. Lots of posts explain how to implement it.
I've also found the following example
List<string> List = new List<string>();
List.Add("Sourav");
List.Add("Ram");
List.Add("Sachin");
IEnumerable names = from n in List where (n.StartsWith("S")) select n;
// var names = from n in List where (n.StartsWith("S")) select n;
foreach (string name in names)
{
Console.WriteLine(name);
}
The above ex outputs:
Sourav
Sachin
I wanted to know: what is the advantage of using IEnumerable in the above example? I can achieve the same using 'var' (commented line).
I would appreciate it if anyone could help me understand this. What is the benefit of using IEnumerable, with an example? What if we don't use it?
Beyond reading the documentation I'd describe IEnumerable<T> as a collection of Ts, it can be iterated over and many other functions can be carried out (such as Where(), Any() and Count()) however it's not designed for adding and removing elements. That's a List<T>.
It's useful because it's a fundamental interface for many collections, various data access layers and ORMs use it and many extension methods are automatically included for it.
Many concrete implementations of lists, arrays, bags, queues and stacks all implement it, allowing a wide variety of collections to use its extension methods.
Also, collections implementing either IEnumerable or IEnumerable<T> can be used in a foreach loop.
From MSDN:
for each element in an array or an object collection that implements
the System.Collections.IEnumerable or
System.Collections.Generic.IEnumerable<T> interface.
In your code example you've got a variable called names which will be an IEnumerable<string>, it's important to understand that it will be an IEnumerable<string> regardless of whether you use the var keyword or not. var just allows you to avoid writing the type so explicitly each time.
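To make the point about var concrete, here is a minimal sketch based on the question's example (the variable names untyped and inferred are mine, not from the original). One caveat worth hedging: the question declares the variable as the non-generic IEnumerable, so while the runtime object is the same either way, the compile-time type of the variable differs from what var infers.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

var list = new List<string> { "Sourav", "Ram", "Sachin" };

// Explicit non-generic type: elements are seen as object at compile time.
IEnumerable untyped = from n in list where n.StartsWith("S") select n;

// var: the compiler infers IEnumerable<string> from the query.
var inferred = from n in list where n.StartsWith("S") select n;

// The same kind of runtime object sits behind both variables.
Console.WriteLine(untyped.GetType() == inferred.GetType()); // True
Console.WriteLine(untyped is IEnumerable<string>);          // True

// Only the generic static type lets LINQ operators chain without a cast.
Console.WriteLine(inferred.Count());                // 2
Console.WriteLine(untyped.Cast<string>().Count());  // 2
```

So var is purely a compile-time convenience; what matters for the available extension methods is whether the static type is the generic or the non-generic interface.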
TLDR
It's a common base interface for many different types of collections which let you use your collection in foreach loops and provides a lot of extra extension methods for free.
IEnumerable and much more preferred IEnumerable<T> are the standard way to handle the 'sequence of elements' pattern.
The idea is that every type implementing IEnumerable<T> looks as if it carries a label: "ENUMERATE ME". No matter what's there (a queue of order items, a collection of controls, records from a SQL query, XML element subnodes, and so on), it's all the same from the enumerable's point of view: you've got a sequence and you can do something for each item in the sequence.
Note that IEnumerable is somewhat limited: there's no count, no indexed access, no guarantee of repeatable results, and no way to check whether the enumerable is empty other than getting the enumerator and checking whether there is anything. This simplicity lets it cover almost all use cases, from collections to ad-hoc sequences (custom iterators, LINQ queries etc).
The question was asked multiple times, here're some answers: 1, 2, 3
MSDN
"The disadvantage of omitting IEnumerable and IEnumerator is that the collection class is no longer interoperable with the foreach statements, or equivalent statements, of other common language runtime languages."
So you need to implement this interface so your custom collection type can be used with other CLR languages. It seems like a CLS requirement.
The different attributes of Tuples and Lists;
Tuples are heterogeneous and Lists are homogeneous,
Tuples are immutable while Lists are mutable,
often dictate the use of one type over the other. In other scenarios, however, either data type could be equally appropriate. As such, what are the memory and/or performance implications of Tuples versus Lists that might also guide our decision?
Thanks,
Well, in addition to what you've mentioned there's the rather significant difference that a tuple can only contain up to eight items. (OK, you can technically make arbitrarily large tuples by making the last tuple argument type another tuple, but I can't help but feel that you'd have to be slightly insane to actually do that.)
Unlike tuples in a language like Python, tuples in C# can't really be used as a general purpose data structure. One of the most common use cases for a tuple in C# is for returning multiple values from a function, or passing multiple values to functions that for some reasons can only take one (e.g. when passing e.Argument to a BackgroundWorker), or any other situation where you can't be bothered to make a custom class, and you can't use an anonymous type.
Since you need to know exactly how many items (and what types of items) they will contain at compile time, tuples are really of severely limited use. Lists, on the other hand, are for general purpose storage of homogeneous data, where you don't necessarily know how many items you're going to have. I'd love to see an example of a piece of code where, as you put it, "either data type could be equally appropriate".
Furthermore, since tuples and lists solve completely different problems, it's probably of fairly limited interest to compare the memory/performance implications. But for what it's worth, tuples are implemented as classes, not as structs, so they're stored on the heap just like lists, and they aren't copied when you pass them between functions, unlike value types. They do however implement the IStructuralEquatable and IStructuralComparable interfaces, and their Equals method is implemented such that this will return true: new Tuple<int>(1).Equals(new Tuple<int>(1)) (meanwhile, new List<int>() { 1 }.Equals(new List<int>() { 1 }) is false).
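The equality difference described above can be checked with a short sketch. The variable names are illustrative only; the calls used (Tuple.Create, Equals, ReferenceEquals, SequenceEqual) are standard framework members.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var t1 = Tuple.Create(1, "a");
var t2 = Tuple.Create(1, "a");
var l1 = new List<int> { 1 };
var l2 = new List<int> { 1 };

Console.WriteLine(t1.Equals(t2));            // True: tuples compare item by item
Console.WriteLine(ReferenceEquals(t1, t2));  // False: two separate heap objects
Console.WriteLine(l1.Equals(l2));            // False: List<T> uses reference equality
Console.WriteLine(l1.SequenceEqual(l2));     // True: element-wise comparison via LINQ
```

If element-wise comparison of lists is what you need, SequenceEqual is the usual way to ask for it explicitly.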
Must .NET's IList be finite? Suppose I write a class FibonacciList implementing IList<BigInteger>
The property Item[n] returns the nth Fibonacci number.
The property IsReadOnly returns true.
The methods IndexOf and Contains we can implement easily enough because the Fibonacci sequence is increasing - to test if the number m is Fibonacci, we need only to compute the finite sequence of Fibonacci numbers up to m.
The method GetEnumerator() doing the right thing
We've now implemented all the methods expected of read-only ILists except Count().
Is this cool, or an abuse of IList?
Fibonacci numbers get impractically big quickly (hence IList<BigInteger> above) . A bounded infinite sequence might be more sensible, it could implement IList<long> or IList<double>.
Addendum II: Fibonacci sequence may have been a bad example, because computing distant values is expensive - to find the nth value one has to compute all earlier values. Thus as Mošmondor said, one might as well make it an IEnumerable and use .ElementAt. However there exist other sequences where one can compute distant values quickly without computing earlier values. (Surprisingly the digits of pi are such a sequence). These sequences are more 'listy', they truly support random access.
Edit: No-one argues against infinite IEnumerables. How do they handle Count()?
To most developers, IList and ICollection imply that you have a pre-evaluated, in-memory collection to work with. With IList specifically, there is an implicit contract of constant-time Add* and indexing operations. This is why LinkedList<T> does not implement IList<T>. I would consider a FibonacciList to be a violation of this implied contract.
Note the following paragraph from a recent MSDN Magazine article discussing the reasons for adding read-only collection interfaces to .NET 4.5:
IEnumerable<T> is sufficient for most scenarios that deal with collections of types, but sometimes you need more power than it provides:
Materialization: IEnumerable<T> does not allow you to express whether the collection is already available (“materialized”) or whether it’s computed every time you iterate over it (for example, if it represents a LINQ query). When an algorithm requires multiple iterations over the collection, this can result in performance degradation if computing the sequence is expensive; it can also cause subtle bugs because of identity mismatches when objects are being generated again on subsequent passes.
As others have pointed out, there is also the question of what you would return for .Count.
It's perfectly fine to use IEnumerable or IQueryable for such collections of data, because there is an expectation that these types can be lazily evaluated.
Regarding Edit 1: .Count() is not implemented by the IEnumerable<T> interface: it is an extension method. As such, developers need to expect that it can take any amount of time, and they need to avoid calling it in cases where they don't actually need to know the number of items. For example, if you just want to know whether an IEnumerable<T> has any items, it's better to use .Any(). If you know that there's a maximum number of items you want to deal with, you can use .Take(). If a collection has more than int.MaxValue items in it, .Count() will throw an OverflowException. So there are some workarounds that can help to reduce the danger associated with infinite sequences. Obviously if programmers haven't taken these possibilities into account, it can still cause problems, though.
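As a rough illustration of those workarounds, the sketch below uses a hypothetical endless generator called Naturals (not part of the question) and shows that .Any() and .Take() return promptly where an unbounded .Count() would not.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical endless sequence, used only for illustration.
static IEnumerable<int> Naturals()
{
    int i = 0;
    while (true)
        yield return i++;
}

// Any() stops after the first element, so it returns even for an endless sequence.
bool hasItems = Naturals().Any();

// Take() bounds the sequence, which makes Count() and ToList() safe afterwards.
int boundedCount = Naturals().Take(1000).Count();
List<int> firstFive = Naturals().Take(5).ToList();

Console.WriteLine(hasItems);                      // True
Console.WriteLine(boundedCount);                  // 1000
Console.WriteLine(string.Join(", ", firstFive));  // 0, 1, 2, 3, 4

// Naturals().Count() is deliberately not called: it would have to walk the whole sequence.
```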
Regarding Edit 2: If you're planning to implement your sequence in a way that indexing is constant-time, that addresses my main point pretty handily. Sixlettervariables's answer still holds true, though.
*Obviously there's more to this: Add is only expected to work if IList.IsFixedSize returns false. Modification is only possible if IsReadOnly returns false, etc. IList was a poorly-thought-out interface in the first place: a fact which may finally be remedied by the introduction of read-only collection interfaces in .NET 4.5.
Update
Having given this some additional thought, I've come to the personal opinion that IEnumerable<>s should not be infinite either. In addition to materializing methods like .ToList(), LINQ has several non-streaming operations like .OrderBy() which must consume the entire IEnumerable<> before the first result can be returned. Since so many methods assume IEnumerable<>s are safe to traverse in their entirety, it would be a violation of the Liskov Substitution Principle to produce an IEnumerable<> that is inherently unsafe to traverse indefinitely.
If you find that your application often requires segments of the Fibonacci sequence as IEnumerables, I'd suggest creating a method with a signature similar to Enumerable.Range(int, int), which allows the user to define a starting and ending index.
If you'd like to embark on a Gee-Whiz project, you could conceivably develop a Fibonacci-based IQueryable<> provider, where users could use a limited subset of LINQ query syntax, like so:
// LINQ to Fibonacci!
var fibQuery = from n in Fibonacci.Numbers // (returns an IQueryable<>)
where n.Index > 5 && n.Value < 20000
select n.Value;
var fibCount = fibQuery.Count();
var fibList = fibQuery.ToList();
Since your query provider would have the power to evaluate the where clauses as lambda expressions, you could have enough control to implement Count methods and .GetEnumerator() in a way as to ensure that the query is restrictive enough to produce a real answer, or throw an exception as soon as the method is called.
But this reeks of being clever, and would probably be a really bad idea for any real-life software.
I would imagine that a conforming implementation must be finite, otherwise what would you return for ICollection<T>.Count?
/// <summary>
/// Gets the number of elements contained in the <see cref="ICollection{T}" />.
/// </summary>
int Count { get; }
Another consideration is CopyTo, which under its normal overload would never stop in a Fibonacci case.
What this means is an appropriate implementation of a Fibonacci Sequence would be simply IEnumerable<int> (using a generator pattern). (Ab)use of an IList<T> would just cause problems.
In your case, I would rather 'violate' IEnumerable and have my way with yield return.
:)
An infinite collection would probably best be implemented as an IEnumerable<T>, not an IList<T>. You could also make use of the yield return syntax when implementing, like so (ignore overflow issues, etc.):
public IEnumerable<long> Fib()
{
yield return 1;
yield return 1;
long l1 = 1;
long l2 = 1;
while (true)
{
long t = l1;
l1 = l2;
l2 = t + l1;
yield return l2;
}
}
As @CodeInChaos pointed out in the comments, the Item property of IList has signature
T this[ int index ] { get; set; }
We see ILists are indexed by ints, so their length is bounded by Int32.MaxValue. Elements of greater index would be inaccessible. This occurred to me when writing the question, but I left it out, because the problem is fun to think about otherwise.
EDIT
Having had a day to reflect on my answer and in light of @StriplingWarrior's comment, I fear I have to make a reversal. I started trying this out last night, and now I wonder what I would really lose by abandoning IList.
I think it would be wiser to implement just IEnumerable and declare a local Count() method that throws a NotSupportedException, to prevent the enumerator running until an OutOfMemoryException occurs. I would still add IndexOf and Contains methods and an Item indexer property to expose higher-performance alternatives like Binet's Formula, but I'd be free to change the signatures of these members to use extended datatypes, potentially even System.Numerics.BigInteger.
If I were implementing multiple series, I would declare an ISeries interface for these members. Who knows, perhaps something like this will eventually be part of the framework.
I disagree with what appears to be a consensus view. Whilst IList has many members that cannot be implemented for an infinite series it does have an IsReadOnly member. It seems acceptable, certainly in the case of ReadOnlyCollection<>, to implement the majority of members with a NotSupportedException. Following this precedent, I don't see why this should be unacceptable if it is a side effect of some other gain in function.
In this specific Fibonacci series case, there are established algorithms, see here and here, for short-circuiting the normal cumulative enumeration approach, which I think would yield significant performance benefits. Exposing these benefits through IList seems warranted to me.
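The linked algorithms are not reproduced here, so as an assumption on my part the sketch below uses the well-known fast-doubling identities as one example of computing the nth Fibonacci number without enumerating all earlier ones. The names FibPair and Fib are hypothetical, and the indexing convention is F(0) = 0, F(1) = 1, which may differ from other snippets in this thread.

```csharp
using System;
using System.Numerics;

// Returns (F(n), F(n+1)) using the fast-doubling identities:
//   F(2k)   = F(k) * (2*F(k+1) - F(k))
//   F(2k+1) = F(k)^2 + F(k+1)^2
static (BigInteger Fn, BigInteger Fn1) FibPair(int n)
{
    if (n < 0) throw new ArgumentOutOfRangeException(nameof(n));
    if (n == 0) return (BigInteger.Zero, BigInteger.One);

    var (a, b) = FibPair(n / 2);       // a = F(k), b = F(k+1), with k = n / 2
    BigInteger c = a * (2 * b - a);    // F(2k)
    BigInteger d = a * a + b * b;      // F(2k+1)
    return n % 2 == 0 ? (c, d) : (d, c + d);
}

static BigInteger Fib(int n) => FibPair(n).Fn;

Console.WriteLine(Fib(10));   // 55
Console.WriteLine(Fib(1000)); // a 209-digit number, reached in about ten recursive steps
```

The recursion depth grows with log2(n), which is what would make an indexer built on it feel closer to random access than a plain enumeration does.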
Ideally, .Net would support some other, more appropriate super class of interface, somewhat closer to IEnumerable<> but, until that arrives in some future version, this has got to be a sensible approach.
I'm working on an implementation of IList<BigInteger> to illustrate
Summarising what I've seen so far:
You can fulfil 5 out of 6, throwing a NotSupportedException on Count()
I would have said this is probably good enough to go for it, however as servy has pointed out, the indexer is incredibly inefficient for any non-calculated and cached number.
In this case, I would say the only contract that fits your continual stream of calculations is IEnumerable.
The other option you have is to create something that looks a lot like an IList but isn't actually.
Can anyone describe IEnumerable, the difference between IEnumerable and an array,
and where to use each? Any information about it and how to use it would help.
An array is a collection of objects with a set size.
int[] array = { 0, 1, 2 };
This makes it very useful in situations where you may want to access an item in a particular spot in the collection since the location in memory of each element is already known
array[1];
Also, the size of the array can be calculated quickly.
IEnumerable, on the other hand, basically says that given a start position it is possible to get the next value. One example of this may be an infinite series of numbers:
public IEnumerable<int> Infinite()
{
int i = 0;
while(true)
yield return i++;
}
Unlike an array, an enumerable collection can be any size, and its elements can be created as they are required rather than upfront. This allows for powerful constructs and is used extensively by LINQ to facilitate complex queries.
//This line won't do anything until you actually enumerate the created variable
IEnumerable<int> firstTenOddNumbers = Infinite().Where(x => x % 2 == 1).Take(10);
However the only way to get a specific element is to start at the beginning and enumerate through to the one you want. This will be considerably more expensive than getting the element from a pre-generated array.
Of course you can enumerate through an array, so an array implements the IEnumerable interface.
.NET has its IEnumerable interface misnamed - it should be IIterable. Basically a System.Collections.IEnumerable or (since generics) System.Collections.Generic.IEnumerable<T> allows you to use foreach on the object implementing these interfaces.
(Side note: actually .NET is using duck typing for foreach, so you are not required to implement these interfaces - it's enough if you provide the suitable method implementations.)
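The side note about pattern-based foreach can be demonstrated with a small sketch. Countdown and CountdownEnumerator are hypothetical types invented for this illustration; neither implements IEnumerable or IEnumerator, yet foreach compiles because the expected members are present.

```csharp
using System;

// foreach compiles because Countdown exposes a suitable GetEnumerator() method.
foreach (int n in new Countdown(3))
    Console.WriteLine(n);   // prints 3, 2, 1

// Hypothetical type: implements neither IEnumerable nor IEnumerable<T>.
class Countdown
{
    private readonly int _start;
    public Countdown(int start) => _start = start;

    public CountdownEnumerator GetEnumerator() => new CountdownEnumerator(_start);
}

// foreach only needs a public Current property and a bool MoveNext() method.
class CountdownEnumerator
{
    private int _next;
    public CountdownEnumerator(int start) => _next = start + 1;

    public int Current => _next;
    public bool MoveNext() => --_next > 0;
}
```

The trade-off is that such a type cannot be passed to LINQ operators or any API that asks for IEnumerable<T>, so implementing the interface is still the normal choice.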
An array (System.Array) is a type of a sequence (where by sequence I mean an iterable data structure, i.e. anything that implements IEnumerable), with some important differences.
For example, an IEnumerable can be - and is often - lazy-loaded. That means that until you explicitly iterate over it, the items won't be created. This can lead to strange behaviour if you're not aware of it.
As a consequence, an IEnumerable has no means of telling you how many items it contains until you actually iterate over it (which the Count extension method in System.Linq.Enumerable class does).
An array has a Length property, and with this we have arrived at the most important difference: an array is a sequence of fixed (and known) items. It also provides an indexer, so you can conveniently access its items without actually iterating over it.
And just for the record, the "real" enumerations in .NET are types defined with the enum keyword. They allow you to express a set of choices without using magic numbers or strings. They can also be used as flags, when marked with the FlagsAttribute.
I suggest you use your favourite search engine to get more details about these concepts - my brief summary clearly doesn't aim to provide a deep insight into these features.
An Array is a collection of data. It's implied that the items are stored contiguously, and are directly addressable.
IEnumerable is a description of a collection of data. It isn't a collection itself. Specifically, it means that the collection can be stepped through, one item at a time.
If you define a variable as type IEnumerable, then it can reference a collection of any type that fits that description.
Arrays are Enumerable. So are Lists, Dictionaries, Sets and other collection types. Also, things which don't appear to be collections can be Enumerable, such as a string (which is IEnumerable<char>), or the object returned by Enumerable.Range(), which generates a new item for each step without ever actually holding it anywhere.
Arrays
A .Net array is a collection of multiple values stored consecutively in memory. Individual elements in an array can be randomly accessed by index (and doing that is quite efficient). Important members of an array are:
this[Int32 index] (indexing operator)
Length
C# has built-in support for arrays and they can be initialized directly from code:
var array = new[] { 1, 2, 3, 4 };
Arrays can also be multidimensional and implement several interfaces including IEnumerable<T> (where T is the element type of the array).
IEnumerable<T>
The IEnumerable<T> interface defines the method GetEnumerator() but that method is rarely used directly. Instead the foreach loop is used to iterate through the enumeration:
IEnumerable<T> enumerable = ...;
foreach (T element in enumerable)
...
If the enumeration is done over an array or a list, all the elements in the enumeration exist during the enumeration, but it is also possible to enumerate elements that are created on the fly. The yield return construct is very useful for this.
It is possible to create an array from an enumeration:
var array = enumerable.ToArray();
This will get all elements from the enumeration and store them consecutively in a single array.
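A minimal sketch of the two ideas together, assuming a hypothetical generator named Squares: the elements are produced on the fly by yield return, and ToArray() then materializes them so that indexing becomes possible.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical generator: each square is computed only when the consumer asks for it.
static IEnumerable<int> Squares(int count)
{
    for (int i = 1; i <= count; i++)
    {
        Console.WriteLine($"computing {i}^2");
        yield return i * i;
    }
}

IEnumerable<int> lazy = Squares(3);   // nothing has been computed yet
int[] array = lazy.ToArray();          // runs the generator to completion

Console.WriteLine(array[2]);           // 9: random access now works
Console.WriteLine(array.Length);       // 3
```

The "computing" lines are printed during ToArray(), not when Squares(3) is called, which is the deferred execution the text describes.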
To sum it up:
Arrays are collection of elements that can be randomly accessed by index
Enumerations are abstraction over a collection of elements that can be accessed one after the other in a forward moving manner
One difference is that arrays allow random access to fixed-size content, whereas the IEnumerable interface provides the data sequentially: you pull items from the IEnumerable one at a time until the data source is exhausted.
I have been thinking about the IEnumerator.Reset() method. I read in the MSDN documentation that it is only there for COM interop. As a C++ programmer it looks to me like an IEnumerator which supports Reset is what I would call a forward iterator, while an IEnumerator which does not support Reset is really an input iterator.
So part one of my question is, is this understanding correct?
The second part of my question is, would it be of any benefit in C# if there was a distinction made between input iterators and forward iterators (or "enumerators" if you prefer)? Would it not help eliminate some confusion among programmers, like the one found in this SO question about cloning iterators?
EDIT: Clarification on forward and input iterators. An input iterator only guarantees that you can enumerate the members of a collection (or from a generator function or an input stream) only once. This is exactly how IEnumerator works in C#. Whether or not you can enumerate a second time, is determined by whether or not Reset is supported. A forward iterator, does not have this restriction. You can enumerate over the members as often as you want.
Some C# programmers don't understand why an IEnumerator cannot be reliably used in a multipass algorithm. Consider the following case:
void PrintContents(IEnumerator<int> iter)
{
    while (iter.MoveNext())
        Console.WriteLine(iter.Current);
    iter.Reset();
    while (iter.MoveNext())
        Console.WriteLine(iter.Current);
}
If we call PrintContents in this context, no problem:
List<int> ys = new List<int>() { 1, 2, 3 };
PrintContents(ys.GetEnumerator());
However look at the following:
IEnumerable<int> GenerateInts() {
    System.Random rnd = new System.Random();
    for (int i = 0; i < 10; ++i)
        yield return rnd.Next();
}
PrintContents(GenerateInts().GetEnumerator());
If the IEnumerator supported Reset, in other words supported multi-pass algorithms, then each time you iterated over the collection it would be different. This would be undesirable, because it would be surprising behavior. This example is a bit faked, but it does occur in the real world (e.g. reading from file streams).
Reset was a big mistake. I call shenanigans on Reset. In my opinion, the correct way to reflect the distinction you are making between "forward iterators" and "input iterators" in the .NET type system is with the distinction between IEnumerable<T> and IEnumerator<T>.
See also this answer, where Microsoft's Eric Lippert (in an unofficial capacity, no doubt; my point is only that he's someone with more credentials than I have to make the claim that this was a design mistake) makes a similar point in comments. Also see his awesome blog.
Interesting question. My take is that of course C# would benefit. However, it wouldn't be easy to add.
The distinction exists in C++ because of its much more flexible type system. In C#, you don't have a robust generic way to clone objects, which is necessary to represent forward iterators (to support multi-pass iteration). And of course, for this to be really useful, you'd also need to support bidirectional and random-access iterators/enumerators. And to get them all working smoothly, you really need some form of duck-typing, like C++ templates have.
Ultimately, the scopes of the two concepts are different.
In C++, iterators are supposed to represent everything you need to know about a range of values. Given a pair of iterators, I don't need the original container. I can sort, I can search, I can manipulate and copy elements as much as I like. The original container is out of the picture.
In C#, enumerators are not meant to do quite as much. Ultimately, they're just designed to let you run through the sequence in a linear manner.
As for Reset(), it is widely accepted that it was a mistake to add it in the first place. If it had worked, and been implemented correctly, then yes, you could say your enumerator was analogous to forward iterators, but in general, it's best to ignore it as a mistake. And then all enumerators are similar only to input iterators.
Unfortunately.
Coming from the C# perspective:
You almost never use IEnumerator directly. Usually you write a foreach statement, which expects an IEnumerable.
IEnumerable _myCollection;
...
foreach (var item in _myCollection) { /* Do something */ }
You don't pass around IEnumerator either. If you want to pass a collection which needs iteration, you pass IEnumerable. Since IEnumerable has a single function, which returns an IEnumerator, it can be used to iterate the collection multiple times (multiple passes).
There's no need for a Reset() function on IEnumerator because if you want to start over, you just throw away the old one (garbage collected) and get a new one.
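Here is a small sketch of that multi-pass idea, using a hypothetical iterator method OneTwoThree. It also shows, as a side observation, that the enumerator the compiler generates for a yield-based method does not support Reset() and throws NotSupportedException instead.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical iterator method, used only for illustration.
static IEnumerable<int> OneTwoThree()
{
    yield return 1;
    yield return 2;
    yield return 3;
}

IEnumerable<int> source = OneTwoThree();

// Two passes: each foreach calls GetEnumerator() and gets a fresh enumerator.
int firstPass = 0, secondPass = 0;
foreach (int x in source) firstPass += x;
foreach (int x in source) secondPass += x;
Console.WriteLine($"{firstPass} {secondPass}");   // 6 6

// Reset() on a compiler-generated iterator is not supported.
bool resetSupported = true;
using (IEnumerator<int> e = source.GetEnumerator())
{
    try { e.Reset(); }
    catch (NotSupportedException) { resetSupported = false; }
}
Console.WriteLine(resetSupported);                // False
```

Whether the second pass yields the same items still depends on the source; the interface only guarantees that you can ask for a new enumerator.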
The .NET framework would benefit immensely if there were a means of asking an IEnumerator<T> about what abilities it could support and what promises it could make. Such features would also be helpful in IEnumerable<T>, but being able to ask the questions of an enumerator would allow code that receives an enumerator from wrappers like ReadOnlyCollection to use the underlying collection in improved ways without having to involve the wrapper.
Given any enumerator for a collection that is capable of being enumerated in its entirety and isn't too big, one could produce from it an IEnumerable<T> that would always yield the same sequence of items (specifically the set of items remaining in the enumerator) by reading its entire content into an array, disposing and discarding the enumerator, getting an enumerator from the array (using that in place of the original abandoned enumerator), wrapping the array in a ReadOnlyCollection<T>, and returning that. Although such an approach would work with any kind of enumerable collection meeting the above criteria, it would be horribly inefficient with most of them. Having a means of asking an enumerator to yield its remaining contents in an immutable IEnumerable<T> would allow many kinds of enumerators to perform the indicated action much more efficiently.
I don't think so. I would call IEnumerable a forward iterator, and an input iterator. It does not allow you to go backwards, or modify the underlying collection. With the addition of the foreach keyword, iterators are almost a non-thought most of the time.
Opinion:
The difference between input iterators (get each one) vs. output iterators (do something to each one) is too trivial to justify an addition to the framework. Also, in order to do an output iterator, you would need to pass a delegate to the iterator. The input iterator seems more natural to C# programmers.
There's also IList<T> if the programmer wants random access.