Can anyone describe IEnumerable, explain the difference between IEnumerable and an array,
and say where and how to use each?
An array is a collection of objects with a set size.
int[] array = { 0, 1, 2 };
This makes it very useful in situations where you may want to access an item in a particular spot in the collection, since the location in memory of each element is already known.
int second = array[1];
Also, the size of the array can be calculated quickly.
IEnumerable, on the other hand, basically says that given a start position it is possible to get the next value. One example of this may be an infinite series of numbers:
public IEnumerable<int> Infinite()
{
    int i = 0;
    while (true)
        yield return i++;
}
Unlike an array, an enumerable collection can be any size, and its elements can be created as they are required rather than upfront. This allows for powerful constructs and is used extensively by LINQ to facilitate complex queries.
//This line won't do anything until you actually enumerate the created variable
IEnumerable<int> firstTenOddNumbers = Infinite().Where(x => x % 2 == 1).Take(10);
However the only way to get a specific element is to start at the beginning and enumerate through to the one you want. This will be considerably more expensive than getting the element from a pre-generated array.
Of course you can enumerate through an array, so an array implements the IEnumerable interface.
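To make the "nothing happens until you enumerate" point concrete, here is a minimal sketch. The LazyDemo class and its Produced counter are hypothetical names added only for illustration; the generator is the same Infinite() as above with a counter bolted on.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyDemo
{
    // Hypothetical counter: how many values the generator has actually produced.
    public static int Produced;

    // Same infinite generator as above, instrumented with the counter.
    public static IEnumerable<int> Infinite()
    {
        int i = 0;
        while (true)
        {
            Produced++;
            yield return i++;
        }
    }

    public static List<int> FirstTenOddNumbers()
    {
        Produced = 0;

        // Building the query runs none of the generator's code.
        IEnumerable<int> query = Infinite().Where(x => x % 2 == 1).Take(10);
        if (Produced != 0) throw new InvalidOperationException("query ran early");

        // ToList() enumerates the query; values are pulled only until
        // ten odd numbers have been found.
        return query.ToList();
    }
}
```

Calling LazyDemo.FirstTenOddNumbers() should return 1, 3, ..., 19, and the generator only has to produce the values 0 through 19 to get there, even though the underlying sequence never ends.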
Arguably .NET has its IEnumerable interface misnamed; IIterable would describe it better. Basically a System.Collections.IEnumerable or (since generics) System.Collections.Generic.IEnumerable<T> allows you to use foreach on the object implementing these interfaces.
(Side note: actually .NET is using duck typing for foreach, so you are not required to implement these interfaces - it's enough if you provide the suitable method implementations.)
An array (System.Array) is a type of a sequence (where by sequence I mean an iterable data structure, i.e. anything that implements IEnumerable), with some important differences.
For example, an IEnumerable can be - and is often - lazy-loaded. That means that until you explicitly iterate over it, the items won't be created. This can lead to strange behaviour if you're not aware of it.
As a consequence, an IEnumerable has no means of telling you how many items it contains until you actually iterate over it (which the Count extension method in System.Linq.Enumerable class does).
An array has a Length property, and with this we have arrived at the most important difference: an array is a sequence of fixed (and known) size. It also provides an indexer, so you can conveniently access its items without actually iterating over it.
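A small sketch of the difference between Length and Count(), assuming a hypothetical CountDemo class with a Steps counter (neither is part of .NET). It also shows one of the "strange behaviours" of lazy sequences: every enumeration re-runs the generator.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CountDemo
{
    // Hypothetical counter: how many times the generator body has yielded.
    public static int Steps;

    public static IEnumerable<int> ThreeNumbers()
    {
        for (int i = 1; i <= 3; i++)
        {
            Steps++;
            yield return i;
        }
    }

    public static (int ArrayLength, int SequenceCount, int StepsAfterTwoCounts) Run()
    {
        int[] array = { 1, 2, 3 };
        Steps = 0;

        IEnumerable<int> sequence = ThreeNumbers();
        int count = sequence.Count(); // has to walk the whole sequence
        sequence.Count();             // walks it again from the start

        // array.Length is simply read; no iteration is involved.
        return (array.Length, count, Steps);
    }
}
```

Both counts are 3, but the generator ran twice to answer the two Count() calls, so Steps ends up at 6. If the sequence is expensive to produce, call ToArray() or ToList() once and work with the result.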
And just for the record, the "real" enumerations in .NET are types defined with the enum keyword. They allow you to express a set of choices without using magic numbers or strings. They can also be used as flags, when marked with the FlagsAttribute.
I suggest you use your favourite search engine to get more details about these concepts; my brief summary clearly doesn't aim to provide deep insight into these features.
An Array is a collection of data. It's implied that the items are stored contiguously, and are directly addressable.
IEnumerable is a description of a collection of data. It isn't a collection itself. Specifically, it means that the collection can be stepped through, one item at a time.
If you define a variable as type IEnumerable, then it can reference a collection of any type that fits that description.
Arrays are Enumerable. So are Lists, Dictionaries, Sets and other collection types. Also, things which don't appear to be collections can be Enumerable, such as a string (which is IEnumerable<char>), or the object returned by Enumerable.Range(), which generates a new item for each step without ever actually holding it anywhere.
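As a rough illustration of "any type that fits that description", the sketch below passes an array, a List, a string and an Enumerable.Range() result to the same method. AnySequence and Describe are made-up names for this example.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class AnySequence
{
    // Accepts anything that can be stepped through one item at a time
    // and joins the items into a comma-separated string.
    public static string Describe<T>(IEnumerable<T> items)
    {
        return string.Join(",", items);
    }
}
```

Describe(new[] { 1, 2, 3 }), Describe(new List<int> { 4, 5 }), Describe("abc") and Describe(Enumerable.Range(7, 3)) all compile against the same signature; the method neither knows nor cares what is behind the interface.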
Arrays
A .NET array is a collection of multiple values stored consecutively in memory. Individual elements in an array can be randomly accessed by index (and doing that is quite efficient). Important members of an array are:
this[Int32 index] (indexing operator)
Length
C# has built-in support for arrays and they can be initialized directly from code:
var array = new[] { 1, 2, 3, 4 };
Arrays can also be multidimensional and implement several interfaces including IEnumerable<T> (where T is the element type of the array).
IEnumerable<T>
The IEnumerable<T> interface defines the method GetEnumerator() but that method is rarely used directly. Instead the foreach loop is used to iterate through the enumeration:
IEnumerable<T> enumerable = ...;
foreach (T element in enumerable)
...
If the enumeration is done over an array or a list, all the elements in the enumeration exist during the enumeration, but it is also possible to enumerate elements that are created on the fly. The yield return construct is very useful for this.
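For the curious, this is roughly what a foreach loop over an IEnumerable<T> boils down to. It is a simplified sketch (the compiler's actual output differs in details), and ForeachExpansion is just an illustrative name.

```csharp
using System.Collections.Generic;

public static class ForeachExpansion
{
    // The usual way: let foreach drive the enumerator.
    public static int SumWithForeach(IEnumerable<int> source)
    {
        int sum = 0;
        foreach (int value in source)
            sum += value;
        return sum;
    }

    // Approximately the same loop written by hand with GetEnumerator().
    public static int SumWithEnumerator(IEnumerable<int> source)
    {
        int sum = 0;
        using (IEnumerator<int> e = source.GetEnumerator())
        {
            while (e.MoveNext())   // advance to the next element, if any
                sum += e.Current;  // read the element at the current position
        }
        return sum;
    }
}
```

Both methods give the same result for any source; the second one just makes the MoveNext()/Current protocol visible.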
It is possible to create an array from an enumeration:
var array = enumerable.ToArray();
This will get all elements from the enumeration and store them consecutively in a single array.
To sum it up:
Arrays are collections of elements that can be randomly accessed by index.
Enumerations are an abstraction over a collection of elements that can be accessed one after the other in a forward-moving manner.
One thing is that arrays allow random access to some fixed-size content, whereas the IEnumerable interface provides the data sequentially: you pull items from the IEnumerable one at a time until the data source is exhausted.
Related
I just found out by chance that Add(T) is defined in ICollection<T> rather than in IEnumerable<T>. The extension methods in Enumerable.cs don't contain Add(T) either, which I think is really weird. Since an object is enumerable, it must "look like" a collection of items. Can anyone tell me why?
An IEnumerable<T> is just a sequence of elements; see it as a forward only cursor. Because a lot of those sequences are generating values, streams of data, or record sets from a database, it makes no sense to Add items to them.
IEnumerable is for reading, not for writing.
An enumerable is exactly that - something you can enumerate over and discover all the items. It does not imply that you can add to it.
Being able to enumerate is universal to many types of objects. For example, it is shared by arrays and collections. But you can't 'add' to an array without messing about with its structure, whereas a Collection is specifically built to be added to and removed from.
Technically you can 'add' to an enumerable by using Concat<>. However, all this does is create an enumerator that enumerates from one enumerable to the next, giving the illusion of a single contiguous set.
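A quick sketch of the Concat point, with ConcatDemo as a made-up wrapper class: the original array is untouched, and the "added" item only shows up when the combined sequence is enumerated.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ConcatDemo
{
    public static (int OriginalLength, int[] Combined) Run()
    {
        int[] original = { 1, 2, 3 };

        // Concat builds a new lazy sequence over both sources;
        // nothing is copied and 'original' is not modified.
        IEnumerable<int> extended = original.Concat(new[] { 4 });

        // ToArray() enumerates the combined sequence into a new array.
        return (original.Length, extended.ToArray());
    }
}
```

After Run(), the original array still has three elements, while the combined result has four.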
Each ICollection should be IEnumerable (I think, and the .NET Framework team seems to agree with me ;-)), but the other way around does not always make sense. There is a hierarchy of "collection like objects" in this world, and your assumption that an enumerable would be a collection you can add items to does not hold true in that hierarchy.
Example: a list of primary color names would be an IEnumerable returning "Red", "Blue" and "Green". It would make no logical sense at all to be able to do a primaryColors.Add("Bright Purple") on a "collection" filled like this:
...whatever...
{
...
var primaryColors = EnumeratePrimaryColors();
...
}
private static IEnumerable<string> EnumeratePrimaryColors() {
yield return "Red";
yield return "Blue";
yield return "Green";
}
As its name says, you can enumerate (loop) over an IEnumerable, and that's about it.
When you want to be able to Add something to it, it wouldn't be just an enumerable anymore, since it has extra features.
For instance, an array is an IEnumerable, but an array has a fixed length, so you can't add new items to it.
IEnumerable is just the 'base' for all kinds of collections (even read-only collections, which obviously have no Add() method).
The more functionality you'd add to such 'base interface', the more specific it would be.
The name says it all. IEnumerable is for enumerating items only. ICollection is the actual collection of items and thus supports the Add method.
I have the following which is an array
package.Resources
If I use
package.Resources.ToList().Add(resource);
package.Resources doesn't actually contain the new item.
I have to use
var packageList = package.Resources.ToList();
packageList.Add(resource);
package.Resources = packageList.ToArray();
Why is that?
ToList() creates a completely new, different list based on the original array.
LINQ is read-only: it stands for Language INtegrated Query, and it only queries the data; it does not modify it. All LINQ methods produce a projection, i.e. they project the original sequence into a new one, so you're always working against that.
When you call package.Resources.ToList().Add(resource);, it's functionally the same as doing this:
var resourcesAsList = new List<WhateverTypeResourcesContains>();
foreach(var item in package.Resources)
{
resourcesAsList.Add(item);
}
resourcesAsList.Add(resource);
What this means is that package.Resources hasn't been modified; you've created a List<T> that contains each item that package.Resources contained, added the new item to that list, and then thrown the list away.
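If you do need "array plus one item", the copy has to be assigned back somewhere. Here is a minimal sketch of a helper that does the ToList/Add/ToArray round trip; ArrayAppend and Append are hypothetical names, not framework APIs.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ArrayAppend
{
    // Returns a NEW array containing the source items followed by 'item'.
    // The source array itself is never modified.
    public static T[] Append<T>(T[] source, T item)
    {
        List<T> list = source.ToList(); // copy into a fresh list
        list.Add(item);                 // grows the copy, not the source
        return list.ToArray();          // copy again into a fresh array
    }
}
```

Usage would look like package.Resources = ArrayAppend.Append(package.Resources, resource);. Note the two copies per call: fine for an occasional add, wasteful in a loop.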
Expanding upon what Rex said, when you call .ToList() on an array object, what you are saying is, "Make me a brand new List object that's a distinct reference from the original array, but use the objects from my original array as the objects in my list."
Because arrays are fixed in size, there's really no way to Add() an item to one without copying the existing array to a new array that has an increased capacity (even ReDim Preserve in the VB world copies the array behind the scenes).
Because of this, you should ask yourself why you are using an array if you don't know the total number of items in the array when you instantiate it. When you know the number of items or when you are instantiating the array from an existing List or IEnumerable, arrays can be a decent light-weight way of holding a collection of objects. However, when you don't know the capacity until runtime and the source is not an IEnumerable, a List might be a better option because it is built to grow in capacity as items are added.
I use the following collection types as properties in my classes:
IEnumerable
I use IEnumerable when I'm going to be iterating over a collection that already exists and I won't be adding to it. For example, if I query a directory for its files and I want to have those files as a property in a class, I might use public IEnumerable<FileInfo> DirectoryFiles {get; set;}.
IEnumerable allows me to query over an existing set of data, but I don't necessarily care what collection it is, as long as it implements IEnumerable.
List<T>
I use a List<T> when I need a collection that could grow or shrink dynamically, i.e. the collection doesn't already exist somewhere else and I may need to add or remove items from it.
A good example of this might be if you are allowing a user to add and remove items to a list box in the user interface of your application. Since the items in the list box might grow or shrink, a List<T> makes sense.
ObservableCollection<T>
I use this type of collection in the Silverlight/WPF world because it has built-in events for when the number of items in the collection changes. This is especially handy for data-binding scenarios when you want a UI element to automatically update when the list changes.
I almost never use Arrays explicitly unless I'm consuming an object that already has arrays. Granted, they're a nice lightweight way to store a collection, but I can usually get by with the 3 types I listed above.
The property Resources is returning an IEnumerable. Enumerable.ToList() builds a new list from this IEnumerable. You are then adding an item to a new list, not the original collection that Resources is accessing, hence your update is having no effect.
At the moment I'm using a List<short> as a buffer to hold things for a while, while a calculation is made to each value based on other values further down the buffer. I then realised that this probably wasn't very efficient, as I have been told that List<> is a linked list, so every time I do whatever = myList[100]; the poor thing has to jump down all the other nodes first to get to the value I want. I don't want to use a regular array because I have loads of Add() and Remove() calls kicking around in other places in the code. So I need a class that implements IList<T> but uses a regular array data structure. Does anyone know a class in .NET that works this way, so I don't have to write my own? I tried using ArrayList but it isn't generic!
List<T> doesn't use a linked list implementation. Internally it uses an array, so it appears to be exactly what you need. Note that, because it's an array, Remove/Insert could be an expensive operation, O(n), depending on the size of the list and the position of the item being removed/inserted. Without knowing more about how you are using it, though, it's hard to recommend a better data structure.
Quoting from the Remarks section of the docs.
The List(T) class is the generic equivalent of the ArrayList class. It implements the IList(T) generic interface using an array whose size is dynamically increased as required.
List<T> is backed by an array, not a linked list. Indexed accesses of a List<T> happen in constant time.
In addition to tvanfosson's correct answer, if you're ever unsure of how something works internally, just load up the .NET Reflector and you can see exactly how things are implemented. In this case, drilling down to the indexer of List<T> shows us the following code:
public T this[int index]
{
get
{
if (index >= this._size)
{
ThrowHelper.ThrowArgumentOutOfRangeException();
}
return this._items[index];
}
// ...
where you can see that this._items is an array of the generic type T, and the indexer simply reads this._items[index].
No, a List<T> is a generic collection, not a linked list. If you need add and remove functionality then List<T> is the implementation most people default to.
I'm trying to understand the difference between sequences and lists.
In F# there is a clear distinction between the two. However in C# I have seen programmers refer to IEnumerable collections as a sequence. Is what makes IEnumerable a sequence the fact that it returns an object to iterate through the collection?
Perhaps the real distinction is purely found in functional languages?
Not really - you tend to have random access to a list, as well as being able to get its count quickly etc. Admittedly linked lists don't have the random access nature... but then they don't implement IList<T>. There's a grey area between the facilities provided by a particular platform and the general concepts.
Sequences (as represented by IEnumerable<T>) are read-only, forward-only, one item at a time, and potentially infinite. Of course any one implementation of a sequence may also be a list (e.g. List<T>) but when you're treating it as a sequence, you can basically iterate over it (repeatedly) and that's it.
I think that the confusion may arise from the fact that collections like List<T> implement the interface IEnumerable<T>. If you have a subtype relationship in general (e.g. supertype Shape with two subtypes Rectangle and Circle), you can interpret the relation as an "is-a" hierarchy.
This means that it is perfectly fine to say that "Circle is a Shape" and similarly, people would say that "List<T> is an IEnumerable<T>" that is, "list is a sequence". This makes some sense, because a list is a special type of a sequence. In general, sequences can be also lazily generated and infinite (and these types cannot also be lists). An example of a (perfectly valid) sequence that cannot be generated by a list would look like this:
// C# version
IEnumerable<int> Numbers() {
    int i = 0;
    while (true) yield return i++;
}

// F# version
let rec loop n = seq {
    yield n
    yield! loop(n + 1) }
let numbers = loop(0)
This would be also true for F#, because F# list type also implements IEnumerable<T>, but functional programming doesn't put that strong emphasis on object oriented point of view (and implicit conversions that enable the "is a" interpretation are used less frequently in F#).
Sequence content is calculated on demand, so you can implement, for example, an infinite sequence without it consuming memory.
So in C# you can write a sequence, for example
IEnumerable<int> Null() {
    while (true)
        yield return 0;
}
It will return an infinite sequence of zeros.
You can write
int[] array = Null().Take(10).ToArray();
And the array will hold only 10 * 4 bytes of element data even though the sequence is infinite.
So, as you see, C# does have a distinction between a sequence and a collection.
I've always been told that adding an element to an array happens like this:
An empty copy of the array with one extra element is created, the data from the original array is copied into it, and then the data for the new element is loaded.
If this is true, then using an array within a scenario that requires a lot of element activity is contra-indicated due to memory and CPU utilization, correct?
If that is the case, shouldn't you avoid using an array as much as possible when you will be adding a lot of elements? Should you use iStringMap instead? If so, what happens if you need more than two dimensions AND need to add a lot of elements? Do you just take the performance hit, or is there something else that should be used?
Look at the generic List<T> as a replacement for arrays. They support most of the same things arrays do, including allocating an initial storage size if you want.
This really depends on what you mean by "add."
If you mean:
T[] array;
int i;
T value;
...
if (i >= 0 && i < array.Length)
array[i] = value;
Then, no, this does not create a new array, and is in fact the fastest way to alter any kind of IList in .NET.
If, however, you're using something like ArrayList, List, Collection, etc., then calling the "Add" method may create a new array, but they are smart about it: they don't just resize by 1 element, they grow geometrically, so if you're adding lots of values it will only have to allocate a new array every once in a while. Even then, you can use the "Capacity" property to force it to grow beforehand, if you know how many elements you're adding (list.Capacity += numberOfAddedElements).
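You can observe the geometric growth yourself by watching Capacity while adding. This is only a sketch; CapacityDemo is a made-up name, and the exact capacities List<T> picks are an implementation detail, so the code only counts how often the capacity changes.

```csharp
using System.Collections.Generic;

public static class CapacityDemo
{
    // Adds n items and counts how often the backing array was replaced,
    // detected as a change in the Capacity property.
    public static int CountReallocations(int n)
    {
        var list = new List<int>();
        int reallocations = 0;
        int lastCapacity = list.Capacity;

        for (int i = 0; i < n; i++)
        {
            list.Add(i);
            if (list.Capacity != lastCapacity)
            {
                reallocations++;
                lastCapacity = list.Capacity;
            }
        }
        return reallocations;
    }

    // With the capacity given up front, adding n items never reallocates.
    public static bool PresizedListNeverGrows(int n)
    {
        var list = new List<int>(n);
        int initialCapacity = list.Capacity;
        for (int i = 0; i < n; i++)
            list.Add(i);
        return list.Capacity == initialCapacity;
    }
}
```

For a thousand adds, the number of reallocations should come out as a small number (on the order of ten), not a thousand, and the pre-sized list should not reallocate at all.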
In general, I prefer to avoid array usage. Just use List<T>. It uses a dynamically-sized array internally, and is fast enough for most usage. If you're using multi-dimensional arrays, use List<List<List<T>>> if you have to. It's not that much worse in terms of memory, and is much simpler to add items to.
If you're in the 0.1% of usage that requires extreme speed, make sure it's your list accesses that are really the problem before you try to optimize it.
If you're going to be adding/removing elements a lot, just use a List. If it's multidimensional, you can always use a List<List<int>> or something.
On the other hand, nested lists are less efficient than arrays if what you're mostly doing is traversing, because an array's elements sit together in one block of memory (which is friendly to your CPU cache), whereas the inner lists of a List<List<T>>, like any reference-type elements, can be scattered all over the heap.
If you want to use an array for efficient reading but you're going to be "adding" elements frequently, you have two main options:
1) Generate it as a List (or List of Lists) and then use ToArray() to turn it into an efficient array structure.
2) Allocate the array to be larger than you need, then put the objects into the pre-allocated cells. If you end up needing even more elements than you pre-allocated, you can just reallocate the array when it fills, doubling the size each time. This means O(log n) reallocations instead of the O(n) you would get with a reallocate-once-per-add array. Note that StringBuilder relies on a similar grow-ahead idea, giving you a faster way to continually append to a string.
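Option 2 can be sketched in a few lines. GrowableBuffer<T> below is a hypothetical class written for this answer (in real code List<T> already does this for you); the starting size of 4 is an arbitrary assumption.

```csharp
using System;

public sealed class GrowableBuffer<T>
{
    private T[] _items = new T[4]; // arbitrary starting size

    public int Count { get; private set; }
    public int Reallocations { get; private set; }

    public void Add(T item)
    {
        if (Count == _items.Length)
        {
            // Full: allocate an array twice as large and copy the contents over.
            Array.Resize(ref _items, _items.Length * 2);
            Reallocations++;
        }
        _items[Count++] = item;
    }

    public T this[int index]
    {
        get
        {
            if (index < 0 || index >= Count)
                throw new ArgumentOutOfRangeException(nameof(index));
            return _items[index];
        }
    }

    // Trims the spare capacity away, as in option 1's ToArray().
    public T[] ToArray()
    {
        var result = new T[Count];
        Array.Copy(_items, result, Count);
        return result;
    }
}
```

Adding 1000 items to this buffer triggers 8 reallocations (4, 8, 16, ... up to 1024 slots) rather than one per add.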
When to abandon the use of arrays
First and foremost, when the semantics of arrays don't match your intent. Need a dynamically growing collection? A set which doesn't allow duplicates? A collection that has to remain immutable? Avoid arrays in all those cases. That's 99% of the cases. Just stating the obvious basic point.
Secondly, when you are not coding for absolute performance criticality. That's about 95% of the cases. Arrays perform marginally better, especially in iteration. It almost never matters.
When you're not forced by an argument with the params keyword. I just wish params accepted any IEnumerable<T>, or even better, that there were a language construct itself to denote a sequence (and not a framework type).
When you are not writing legacy code, or dealing with interop.
In short, it's very rare that you would actually need an array. I will add why one might avoid it.
The biggest reason to avoid arrays imo is conceptual. Arrays are closer to implementation and farther from abstraction. Arrays convey more how it is done than what is done, which is against the spirit of high-level languages. That's not surprising, considering arrays are closer to the metal; they are straight out of a special type (though internally an array is a class). Not to be pedagogical, but arrays really do translate to a semantic meaning that is very, very rarely required. The most useful and frequent semantics are those of collections with any entries, sets with distinct items, key-value maps etc., with any combination of addable, read-only, immutable and order-respecting variants. Think about this: you might want an addable collection, or a read-only collection with predefined items and no further modification, but how often does your logic look like "I want a collection that is not addable, with only a fixed number of items, but they should be modifiable too"? Very rarely, I would say.
Array was designed in the pre-generics era; it mimics genericity with a lot of run-time hacks, and its oddities show here and there. Some of the catches I found:
Broken covariance.
string[] strings = ...
object[] objects = strings;
objects[0] = 1; //compiles, but gives a runtime exception.
Arrays can give you a reference to a struct! That's unlike anywhere else. A sample:
struct Value { public int mutable; }
var array = new[] { new Value() };
array[0].mutable = 1; //<-- compiles!
//with a List<Value>, list[0].mutable = 1; doesn't compile, since editing a copy makes no sense
Console.WriteLine(array[0].mutable); // 1, expected or unexpected? confusing surely
Run time implemented methods like ICollection<T>.Contains can be different for structs and classes. It's not a big deal, but if you forget to override non generic Equals correctly for reference types expecting generic collection to look for generic Equals, you will get incorrect results.
public class Class : IEquatable<Class>
{
public bool Equals(Class other)
{
Console.WriteLine("generic");
return true;
}
public override bool Equals(object obj)
{
Console.WriteLine("non generic");
return true;
}
}
public struct Struct : IEquatable<Struct>
{
public bool Equals(Struct other)
{
Console.WriteLine("generic");
return true;
}
public override bool Equals(object obj)
{
Console.WriteLine("non generic");
return true;
}
}
class[].Contains(test); //prints "non generic"
struct[].Contains(test); //prints "generic"
The Length property and [] indexer on T[] seem to be regular properties that you can access through reflection (which should involve some magic), but when it comes to expression trees you have to spit out the exact same code the compiler does. There are ArrayLength and ArrayIndex methods to do that separately. One such question here. Another example:
Expression<Func<string>> e = () => new[] { "a" }[0];
//e.Body.NodeType == ExpressionType.ArrayIndex
Expression<Func<string>> e = () => new List<string>() { "a" }[0];
//e.Body.NodeType == ExpressionType.Call;
Yet another one. string[].IsReadOnly returns false, but if you are casting, IList<string>.IsReadOnly returns true.
Type checking gone wrong: (object)new ConsoleColor[0] is int[] returns true, whereas new ConsoleColor[0] is int[] returns false. Same is true for uint[] and int[] comparisons. No such problems if you use any other collection types.
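The broken covariance listed among the catches above is easy to reproduce. A minimal sketch, with CovarianceDemo as an illustrative name:

```csharp
using System;

public static class CovarianceDemo
{
    // Returns true if writing an int through an object[] view
    // of a string[] fails at run time.
    public static bool WriteThrows()
    {
        string[] strings = { "a" };
        object[] objects = strings; // allowed: arrays are covariant

        try
        {
            objects[0] = 1; // compiles, but the array is really a string[]
            return false;
        }
        catch (ArrayTypeMismatchException)
        {
            return true;
        }
    }
}
```

The generic collections don't have this hole: a List<string> simply cannot be assigned to a List<object>, so the mistake is caught at compile time instead.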
How to abandon the use of arrays.
The most commonly used substitute is List<T> which has a cleaner API. But it is a dynamically growing structure which means you can add to a List<T> at the end or insert anywhere to any capacity. There is no substitute for the exact behaviour of an array, but people mostly use arrays as readonly collection where you can't add anything to its end. A substitute is ReadOnlyCollection<T>.
When the array is resized, a new array must be allocated, and the contents copied. If you are only modifying the contents of the array, it is just a memory assignment.
So, you should not use arrays when you don't know the size of the array, or the size is likely to change. However, if you have a fixed length array, they are an easy way of retrieving elements by index.
ArrayList and List grow the array by more than one when needed (I think it's by doubling the size, but I haven't checked the source). They are generally the best choice when you are building a dynamically sized array.
When your benchmarks indicate that array resize is seriously slowing down your application (remember - premature optimization is the root of all evil), you can evaluate writing a custom array class with tweaked resizing behavior.
Generally, if you must have the BEST indexed lookup performance, it's best to build a List first and then turn it into an array, thus paying a small penalty at first but avoiding any later. If the issue is that you will be continually adding new data and removing old data, then you may want to use an ArrayList or List for convenience, but keep in mind that they are just special-case arrays. When they "grow" they allocate a completely new array and copy everything into it, which is slow for large collections.
ArrayList is just an Array which grows when needed.
Add is amortized O(1), just be careful to make sure the resize won't happen at a bad time.
Insert is O(n): all items to the right must be moved over.
Remove is O(n): all items to the right must be moved over.
Also important to keep in mind that List is not a linked list. It's just a typed ArrayList. The List documentation does note that it performs better in most cases but does not say why.
The best thing to do is to pick a data structure which is appropriate to your problem. This depends on a LOT of things, so you may want to browse the System.Collections.Generic namespace.
In this particular case I would say that if you can come up with a good key value, Dictionary would be your best bet. It has insert and remove that approach O(1). However, even with a Dictionary you have to be careful not to let it resize its internal array (an O(n) operation). It's best to give it a lot of room by specifying a larger-than-you-expect-to-use initial capacity in the constructor.
-Rick
A standard array should be defined with a length, which reserves all of the memory that it needs in a contiguous block. Adding an item to the array would put it inside of the block of already reserved memory.
Arrays are great for few writes and many reads, particularly those of an iterative nature - for anything else, use one of the many other data structures.
You are correct: an array is great for lookups. However, modifications to the size of the array are costly.
You should use a container that supports incremental size adjustments in the scenario where you're modifying the size of the array. You could use an ArrayList, which allows you to set the initial size, and you could continually check the size versus the capacity and then increment the capacity by a large chunk to limit the number of resizes.
Or you could just use a linked list. Then, however, lookups are slow...
If I think I'm going to be adding items to the collection a lot over its lifetime, then I'll use a List. If I know for sure what the size of the collection will be when it's declared, then I'll use an array.
Another time I generally use an array over a List is when I need to return a collection as a property of an object. I don't want callers adding items to that collection via List's Add methods, but instead want them to add items to the collection via my object's interface. In that case, I'll take the internal List and call ToArray and return an array.
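Here is roughly what that pattern looks like. Package, AddResource and Resources are names invented for the example; the point is only that callers get a snapshot, not the internal list.

```csharp
using System.Collections.Generic;

public sealed class Package
{
    // The real storage stays private.
    private readonly List<string> _resources = new List<string>();

    // Callers must go through the object's own interface to add items.
    public void AddResource(string resource) => _resources.Add(resource);

    // Each read returns a fresh copy; changing the returned array
    // has no effect on the package's internal list.
    public string[] Resources => _resources.ToArray();
}
```

The trade-off is that every read of Resources allocates and copies, so for large collections read in a loop you may prefer to cache the array or expose the list as IEnumerable<string> instead.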
If you are going to be doing a lot of adding, and you will not be doing random access (such as myArray[i]), you could consider using a linked list (LinkedList<T>), because it will never have to "grow" like the List<T> implementation. Keep in mind, though, that you can only really access items in a LinkedList<T> implementation using the IEnumerable<T> interface.
The best thing you can do is to allocate as much memory as you need upfront if possible. This will prevent .NET from having to make additional calls to get memory on the heap. Failing that then it makes sense to allocate in chunks of five or whatever number makes sense for your application.
This is a rule you can apply to anything really.