List Data Structure C# Efficiency - c#

At the moment I'm using a List<short> as a buffer to hold values for a while, while a calculation is made on each value based on other values further down the buffer. I then realised that this probably wasn't very efficient, as I have been told that List<> is a linked list, so every time I do whatever = myList[100]; the poor thing has to jump down all the other nodes first to get to the value I want. I don't want to use a regular array because I have loads of Add() and Remove() calls kicking around in other places in the code. So I need a class that implements IList<T> but uses a regular array data structure. Does anyone know a class in .NET that works this way so I don't have to write my own? I tried using ArrayList but it isn't generic!

List<T> doesn't use a linked list implementation. Internally it uses an array, so it appears to be exactly what you need. Note that, because it's an array, Remove/Insert can be an expensive operation, O(n), depending on the size of the list and the position of the item being removed or inserted. Without knowing more about how you are using it, though, it's hard to recommend a better data structure.
Quoting from the Remarks section of the docs.
The List(T) class is the generic equivalent of the ArrayList class. It implements the IList(T) generic interface using an array whose size is dynamically increased as required.

List<T> is backed by an array, not a linked list. Indexed accesses of a List<T> happen in constant time.
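To make the constant-time indexing concrete, here is a minimal sketch of the kind of look-ahead buffer the question describes. The class name, the method name and the "add the value lookAhead slots later" calculation are all made up for illustration; only List<short>, its indexer and Add come from the question.

```csharp
using System.Collections.Generic;

public static class BufferSketch
{
    // Hypothetical calculation: each output value is the input value plus
    // the value `lookAhead` positions further down the buffer (0 past the end).
    public static List<short> AddLookAhead(List<short> buffer, int lookAhead)
    {
        // Pre-sizing the result avoids repeated growth of its backing array.
        var result = new List<short>(buffer.Count);
        for (int i = 0; i < buffer.Count; i++)
        {
            int j = i + lookAhead;
            // buffer[j] is a direct array access, not a walk over nodes.
            short later = j < buffer.Count ? buffer[j] : (short)0;
            result.Add((short)(buffer[i] + later));
        }
        return result;
    }
}
```

Calling BufferSketch.AddLookAhead(new List<short> { 1, 2, 3, 4 }, 2) gives 4, 6, 3, 4. The indexed reads cost the same wherever they land in the buffer; it is Insert and Remove away from the end that shift elements.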

In addition to tvanfosson's correct answer, if you're ever unsure of how something works internally, just load up the .NET Reflector and you can see exactly how things are implemented. In this case, drilling down to the indexer of List<T> shows us the following code:
public T this[int index]
{
get
{
if (index >= this._size)
{
ThrowHelper.ThrowArgumentOutOfRangeException();
}
return this._items[index];
}
// ...
}
where you can see that this._items is an array of the generic type T, so this._items[index] is a direct array access.

No, a List<T> is a generic collection, not a linked list. If you need add and remove functionality then List<T> is the implementation most people default to.

Related

C# List .ConvertAll Efficiency and overhead

I recently learned about List's .ConvertAll extension. I used it a couple times in code today at work to convert a large list of my objects to a list of some other object. It seems to work really well. However I'm unsure how efficient or fast this is compared to just iterating the list and converting the object. Does .ConvertAll use anything special to speed up the conversion process or is it just a short hand way of converting Lists without having to set up a loop?
No better way to find out than to go directly to the source, literally :)
http://referencesource.microsoft.com/#mscorlib/system/collections/generic/list.cs#dbcc8a668882c0db
As you can see, there's no special magic going on. It just iterates over the list and creates a new item by the converter function that you specify.
To be honest, I was not aware of this method. The more idiomatic .NET way to do this kind of projection is through the use of the Select extension method on IEnumerable<T> like so: source.Select(input => new Something(input.Name)). The advantage of this is threefold:
It's more idiomatic, as I said; ConvertAll is likely a remnant of the pre-C# 3.0 days. It's not a very arcane method by any means, and ConvertAll is a pretty clear description, but it might still be better to stick to what other people know, which is Select.
It's available on all IEnumerable<T>, while ConvertAll only works on instances of List<T>. It doesn't matter if it's an array, a list or a dictionary, Select works with all of them.
Select is lazy. It doesn't do anything until you iterate over it. This means that it returns an IEnumerable<TOutput> which you can then convert to a list by calling ToList() or not if you don't actually need a list. Or if you just want to convert and retrieve the first two items out of a list of a million items, you can simply do source.Select(input => new Something(input.Name)).Take(2).
But if your question is purely about the performance of converting a whole list to another list, then ConvertAll is likely to be somewhat faster as it's less generic than a Select followed by a ToList (it knows that a list has a size and can directly access elements by index from the underlying array for instance).
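A small sketch of the two approaches side by side may help here. The class and method names are invented for the example, and the "#" + n conversion stands in for whatever projection you actually need; ConvertAll, Select, Take and ToList are the real framework methods.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ProjectionSketch
{
    // Eager: only available on List<T>, returns a fully built List<TOutput>.
    public static List<string> WithConvertAll(List<int> source)
    {
        return source.ConvertAll(n => "#" + n);
    }

    // Works on any IEnumerable<T>; nothing runs until ToList() enumerates it.
    public static List<string> WithSelect(IEnumerable<int> source)
    {
        return source.Select(n => "#" + n).ToList();
    }

    // Counts how often the converter runs when only two results are taken.
    public static int ConversionsForFirstTwo(IEnumerable<int> source)
    {
        int calls = 0;
        source.Select(n => { calls++; return "#" + n; }).Take(2).ToList();
        return calls;
    }
}
```

Both conversions produce the same list. ConversionsForFirstTwo shows the laziness point: the converter is invoked only for the items that are actually pulled through, however long the source is.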
Decompiled using ILSpy:
public List<TOutput> ConvertAll<TOutput>(Converter<T, TOutput> converter)
{
if (converter == null)
{
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.converter);
}
List<TOutput> list = new List<TOutput>(this._size);
for (int i = 0; i < this._size; i++)
{
list._items[i] = converter(this._items[i]);
}
list._size = this._size;
return list;
}
Create a new list.
Populate the new list by iterating over the current instance, executing the specified delegate.
Return the new list.
Does .ConvertAll use anything special to speed up the conversion
process or is it just a short hand way of converting Lists without
having to set up a loop?
It doesn't do anything special with regards to conversion (what "special" thing could it do?) It is directly modifying the private _items and _size members, so it might be trivially faster under some circumstances.
As usual, if the solution makes you more productive, code easier to read, etc. use it until profiling reveals a compelling performance reason to not use it.
It's the second way you described it - basically a short-hand way without setting up a loop.
Here's the guts of ConvertAll():
List<TOutput> list = new List<TOutput>(this._size);
for (int index = 0; index < this._size; ++index)
list._items[index] = converter(this._items[index]);
list._size = this._size;
return list;
Where TOutput is whatever type you're converting to, and converter is a delegate indicating the method that will do the conversion.
So it loops through the List you passed in, running each element through the method you specify, and then returns a new List of the specified type.
For precise timing in your scenarios you need to measure yourself.
Do not expect any miracles - it has to be an O(n) operation, since each element needs to be converted and added to the destination list.
Consider using Enumerable.Select instead, as it does lazy evaluation that may let you avoid a second copy of a large list, especially if you need to do any filtering of items along the way.

In .net is linked list an underlying class for other Lists?

I was asked in a .net interview the significance of linkedlist in .net. I answered that linkedlist is used where you have to do a lot of inserts, but I have never had to use linkedlist in any of the code I wrote. The interviewer then told me that all the lists in .net use linkedlist as its underlying type. When I came home, I couldn't find anything online to support his statement. Can anyone comment on the validity of his statement?
I think your interviewer is simply wrong.
A LinkedList, by definition, is a list of entities connected to each other, so in order to get to some item X you need to traverse the list all the way until that item. There is no way to access that item via an index (just as an example).
LinkedList is just a different data structure, and it certainly isn't used by all BCL list types.
It's a very convenient choice when you're going to have linked entities and want each element to carry no additional data other than the pointers to its neighbours, but you pay a cost in speed whenever you have to traverse the list to pick, remove or update an item.
Sounds like BS to me. If you use reflection or check the .net / mono source code you can see they use an array as base type:
private T[] _items;
MSDN says that the C# List<T> is like an ArrayList:
The List class is the generic equivalent of the ArrayList class. It
implements the IList generic interface using an array whose size is
dynamically increased as required.
This implies that the plain List<T> is not a linked list.
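For what it's worth, the two types also behave differently from the outside, which is easy to check. The sketch below uses made-up class and method names; LinkedList<T>.Find, AddAfter and List<T>.Insert are the real framework members.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ListKindsSketch
{
    public static int[] InsertIntoLinkedList()
    {
        var linked = new LinkedList<int>(new[] { 1, 2, 4 });
        var two = linked.Find(2);   // walks the nodes: O(n)
        linked.AddAfter(two, 3);    // relinks two nodes: O(1)
        // int x = linked[2];       // does not compile: LinkedList<T> has no indexer
        return linked.ToArray();
    }

    public static int[] InsertIntoList()
    {
        var list = new List<int> { 1, 2, 4 };
        list.Insert(2, 3);          // shifts later elements in the array: O(n)
        // list[2] is now 3, read in O(1) straight from the backing array
        return list.ToArray();
    }
}
```

Both methods return 1, 2, 3, 4; the difference is in which step is cheap. If List<T> really were a linked list underneath, the indexer would not be constant time.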

Why create an IEnumerable?

I don't understand why I'd create an IEnumerable. Or why it's important.
I'm looking at the example for IEnumerable:
http://msdn.microsoft.com/en-us/library/system.collections.ienumerable.aspx
But I can basically do the same thing if I just went:
List<Person> people = new List<Person>();
so what's IEnumerable good for? Can you give me a situation where I'd need to create a class that implements IEnumerable?
IEnumerable is an interface; it exposes certain things to the outside. You are completely right that you could just use a List<T>, but List<T> is very deep in the inheritance tree. What exactly does a List<T> do? It stores items, and it offers certain methods to Add and Remove. Now, what if you only need the "item-keeping" feature of a List<T>? That's what an IEnumerable<T> is - an abstract way of saying "I want to get a list of items I can iterate over". A list is "I want to get a collection which I can modify, can access by index and iterate". List<T> offers a lot more functionality than IEnumerable<T> does, but it takes up more memory. So if a method is taking an IEnumerable<T>, it doesn't care what exactly it gets, as long as the object offers the possibilities of IEnumerable<T>.
Also, you don't have to create your own IEnumerable<T>, a List<T> IS an IEnumerable<T>!
Lists are, of course IEnumerable - As a general rule, you want to be specific on what you output but broad on what you accept as input eg:
You have a sub which loops through a list of objects and writes something to the console...
You could declare the parameter as either IEnumerable<T> or IList<T> (or even List<T>). Since you don't need to add to the input list, all you actually need to do is enumerate - so use IEnumerable - then your method will also accept other types which implement IEnumerable, including IQueryable, linked lists, etc...
You're making your methods more generic for no cost.
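As a rough sketch of that rule (the class and method names are invented, and the method returns a string instead of writing to the console so the result is easy to inspect):

```csharp
using System.Collections.Generic;
using System.Text;

public static class BroadInputSketch
{
    // Only enumerates its input, so IEnumerable<string> is all it asks for.
    public static string Describe(IEnumerable<string> names)
    {
        var sb = new StringBuilder();
        foreach (string name in names)
        {
            sb.Append(name).Append(';');
        }
        return sb.ToString();
    }
}
```

The same method then accepts a string[], a List<string> or a LinkedList<string> without any overloads or conversions at the call site.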
Today, you generally wouldn't use the non-generic IEnumerable anymore unless you were supporting software on an older version of the framework. Today, you'd normally use IEnumerable<T>. Among other benefits, the LINQ operations are extension methods on IEnumerable<T>, so you can easily query any collection type that implements IEnumerable<T> using LINQ.
Additionally, it doesn't tie the consumer of your code to a particular collection implementation.
It's rare nowadays that you need to create your own container classes; as you say, many good implementations already exist.
However, if you do create your own container class for some specific reason, you may like to implement IEnumerable or IEnumerable<T>, because they are a standard "contract" for iteration, and by providing an implementation you can take advantage of methods/APIs that want an IEnumerable or IEnumerable<T>. LINQ, for example, will give you a bunch of useful extension methods for free.
An IList can be thought of as a more specific kind of IEnumerable. (One that can be added to and removed from easily.) There are others, such as IDictionary, which performs an entirely different function but can still be enumerated over. Generally, I would use IEnumerable as a more generic type reference when I only need an enumeration to satisfy a requirement and don't particularly care what kind it is. I can pass it an IList and more often than not I do just that, but the flexibility exists to pass it other enumerations as well.
Here is one situation where I think I have to implement IEnumerable rather than use List<>:
I want to get all items from a remote server. Let's say there are one million items to return. If you use the List<> approach, you need to cache all one million items in memory first. In some cases you don't really want to do that, because you don't want to use up too much memory. Using IEnumerable allows you to display each item on the screen and then dispose of it right away. Therefore, using the IEnumerable approach, the memory footprint of the program is much smaller.
It's my understanding that IEnumerable is provided to you as an interface for creating your own enumerable class types.
I believe a simple example of this would be recreating the List type, if you wanted to have your own set of features (or lack thereof) for it.
What if you want to enumerate over a collection that is potentially of infinite size, such as the Fibonacci numbers? You couldn't do that easily with a list, but if you had a class that implemented IEnumerable or IEnumerable<T>, it becomes easy.
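A minimal sketch of such a sequence, written as an iterator method rather than a hand-written class (the compiler generates the enumerator class for you). The names are invented for the example, and BigInteger is my choice so that the values never overflow.

```csharp
using System.Collections.Generic;
using System.Numerics;

public static class FibonacciSketch
{
    // Never terminates on its own; callers decide how many values to pull.
    public static IEnumerable<BigInteger> Numbers()
    {
        BigInteger a = 0;
        BigInteger b = 1;
        while (true)
        {
            yield return a;
            BigInteger next = a + b;
            a = b;
            b = next;
        }
    }
}
```

Something like FibonacciSketch.Numbers().Take(8) (with System.Linq) yields 0, 1, 1, 2, 3, 5, 8, 13 and then stops asking. Calling ToList() on the unbounded sequence would never finish, which is exactly why a List can't represent it.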
When a built-in container fits your needs you should definitely use that, and then IEnumerable comes for free. When for whatever reason you have to implement your own container, for example if it must be backed by a DB, then you should make sure to implement both IEnumerable and IEnumerable<T> for two reasons:
It makes foreach work, which is awesome
It enables almost all LINQ goodness. For example you will be able to filter your container down to objects that match a condition with an elegant one liner.
IEnumerable gives your API users (including yourself) a way to consume your collection with a foreach. For example, I implemented IEnumerable in my binary tree class so I could just foreach over all of the items in the tree without having to Ctrl+C Ctrl+V all the logic required to traverse the tree in order.
Hope it helps :)
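The answer's own tree class isn't shown, so the following is only a guess at the general shape: a bare-bones, unbalanced binary search tree (all names hypothetical) whose GetEnumerator does the in-order walk once, so callers never have to.

```csharp
using System.Collections;
using System.Collections.Generic;

public sealed class BinaryTreeSketch<T> : IEnumerable<T>
{
    private sealed class Node
    {
        public T Value;
        public Node Left, Right;
        public Node(T value) { Value = value; }
    }

    private readonly IComparer<T> _comparer = Comparer<T>.Default;
    private Node _root;

    // Plain unbalanced insert; duplicates go to the right subtree.
    public void Add(T value)
    {
        if (_root == null) { _root = new Node(value); return; }
        Node current = _root;
        while (true)
        {
            if (_comparer.Compare(value, current.Value) < 0)
            {
                if (current.Left == null) { current.Left = new Node(value); return; }
                current = current.Left;
            }
            else
            {
                if (current.Right == null) { current.Right = new Node(value); return; }
                current = current.Right;
            }
        }
    }

    // In-order traversal with an explicit stack, yielding one value at a time.
    public IEnumerator<T> GetEnumerator()
    {
        var stack = new Stack<Node>();
        Node current = _root;
        while (current != null || stack.Count > 0)
        {
            while (current != null) { stack.Push(current); current = current.Left; }
            current = stack.Pop();
            yield return current.Value;
            current = current.Right;
        }
    }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}
```

After adding 5, 2, 8, 1, 3, a foreach over the tree visits 1, 2, 3, 5, 8, and the LINQ extension methods work on it as well because it implements IEnumerable<T>.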
IEnumerable is useful if you have a collection or method which can return a bunch of things, but isn't a Dictionary, List, array, or other such predefined collection. It is especially useful in cases where the set of things to be returned might not be available when one starts outputting it. For example, an object to access records in a database might implement IEnumerable. While it might be possible for such an object to read all appropriate records into an array and return that, that may be impractical if there are a lot of records. Instead, the object could return an enumerator which could read the records in small groups and return them individually.

What is the difference between IEnumerable and arrays?

Can anyone describe IEnumerable, explain the difference between IEnumerable and an array, and say where to use each? Any information about it and how to use it would help.
An array is a collection of objects with a set size.
int[] array = { 0, 1, 2 };
This makes it very useful in situations where you may want to access an item in a particular spot in the collection since the location in memory of each element is already known
array[1];
Also, the size of the array can be calculated quickly.
IEnumerable, on the other hand, basically says that given a start position it is possible to get the next value. One example of this may be an infinite series of numbers:
public IEnumerable<int> Infinite()
{
int i = 0;
while(true)
yield return i++;
}
Unlike an array, an enumerable collection can be any size, and it is possible to create the elements as they are required rather than upfront. This allows for powerful constructs and is used extensively by LINQ to facilitate complex queries.
//This line won't do anything until you actually enumerate the created variable
IEnumerable<int> firstTenOddNumbers = Infinite().Where(x => x % 2 == 1).Take(10);
However the only way to get a specific element is to start at the beginning and enumerate through to the one you want. This will be considerably more expensive than getting the element from a pre-generated array.
Of course you can enumerate through an array, so an array implements the IEnumerable interface.
.NET has its IEnumerable interface misnamed - it should be IIterable. Basically a System.Collections.IEnumerable or (since generics) System.Collections.Generic.IEnumerable<T> allows you to use foreach on the object implementing these interfaces.
(Side note: actually .NET is using duck typing for foreach, so you are not required to implement these interfaces - it's enough if you provide the suitable method implementations.)
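To illustrate the side note, here is a sketch of a type that implements neither interface yet still works with foreach, because it offers a public GetEnumerator() whose result has MoveNext() and Current. The countdown type and its names are invented for the example.

```csharp
public sealed class CountdownSketch
{
    private readonly int _start;

    public CountdownSketch(int start) { _start = start; }

    // No IEnumerable here: foreach only needs this method to exist.
    public Enumerator GetEnumerator() { return new Enumerator(_start); }

    public struct Enumerator
    {
        private int _value;

        internal Enumerator(int start) { _value = start + 1; }

        public int Current { get { return _value; } }

        // Steps down by one and reports whether a value is available.
        public bool MoveNext()
        {
            _value--;
            return _value > 0;
        }
    }

    // Small helper showing that foreach compiles against the pattern.
    public static int Sum(CountdownSketch countdown)
    {
        int total = 0;
        foreach (int n in countdown)
        {
            total += n;
        }
        return total;
    }
}
```

CountdownSketch.Sum(new CountdownSketch(3)) visits 3, 2, 1 and returns 6. The catch is that LINQ and any method taking IEnumerable<T> will not accept such a type, so implementing the interface is still the normal choice.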
An array (System.Array) is a type of a sequence (where by sequence I mean an iterable data structure, i.e. anything that implements IEnumerable), with some important differences.
For example, an IEnumerable can be - and is often - lazy-loaded. That means that until you explicitly iterate over it, the items won't be created. This can lead to strange behaviour if you're not aware of it.
As a consequence, an IEnumerable has no means of telling you how many items it contains until you actually iterate over it (which the Count extension method in System.Linq.Enumerable class does).
An array has a Length property, and with this we have arrived at the most important difference: an array is a sequence with a fixed (and known) number of items. It also provides an indexer, so you can conveniently access its items without actually iterating over it.
And just for the record, the "real" enumerations in .NET are types defined with the enum keyword. They allow you to express a choice without using magic numbers or strings. They can also be used as flags, when marked with the FlagsAttribute.
I suggest you use your favourite search engine to get more details about these concepts - my brief summary clearly doesn't aim to provide a deep insight into these features.
An array is a collection of data. It's implied that the items are stored contiguously and are directly addressable.
IEnumerable is a description of a collection of data; it isn't a collection itself. Specifically, it means that the collection can be stepped through, one item at a time.
If you define a variable as type IEnumerable, then it can reference a collection of any type that fits that description.
Arrays are enumerable. So are Lists, Dictionaries, Sets and other collection types. Also, things which don't appear to be collections can be enumerable, such as a string (which is IEnumerable<char>), or the object returned by Enumerable.Range(), which generates a new item for each step without ever actually holding it anywhere.
Arrays
A .Net array is a collection of multiple values stored consecutively in memory. Individual elements in an array can be randomly accessed by index (and doing that is quite efficient). Important members of an array are:
this[Int32 index] (indexing operator)
Length
C# has built-in support for arrays and they can be initialized directly from code:
var array = new[] { 1, 2, 3, 4 };
Arrays can also be multidimensional and implement several interfaces including IEnumerable<T> (where T is the element type of the array).
IEnumerable<T>
The IEnumerable<T> interface defines the method GetEnumerator() but that method is rarely used directly. Instead the foreach loop is used to iterate through the enumeration:
IEnumerable<T> enumerable = ...;
foreach (T element in enumerable)
...
If the enumeration is done over an array or a list all the elements in the enumeration exists during the enumeration but it is also possible to enumerate elements that are created on the fly. The yield return construct is very useful for this.
It is possible to create an array from an enumeration:
var array = enumerable.ToArray();
This will get all elements from the enumeration and store them consecutively in a single array.
To sum it up:
Arrays are collections of elements that can be randomly accessed by index
Enumerations are abstractions over a collection of elements that can be accessed one after the other in a forward moving manner
One difference is that arrays allow random access to fixed-size content, whereas the IEnumerable interface provides the data sequentially: you pull from the IEnumerable one item at a time until the data source is exhausted.

ArrayList versus an array of objects versus Collection of T

I have a class Customer (with typical customer properties) and I need to pass around, and databind, a "chunk" of Customer instances. Currently I'm using an array of Customer, but I've also used Collection of T (and List of T before I knew about Collection of T). I'd like the thinnest way to pass this chunk around using C# and .NET 3.5.
Currently, the array of Customer is working just fine for me. It data binds well and seems to be as lightweight as it gets. I don't need the stuff List of T offers and Collection of T still seems like overkill. The array does require that I know ahead of time how many Customers I'm adding to the chunk, but I always know that in advance (given rows in a page, for example).
Am I missing something fundamental or is the array of Customer OK? Is there a tradeoff I'm missing?
Also, I'm assuming that Collection of T makes the old loosely-typed ArrayList obsolete. Am I right there?
Yes, Collection<T> (or List<T> more commonly) makes ArrayList pretty much obsolete. In particular, I believe ArrayList isn't even supported in Silverlight 2.
Arrays are okay in some cases, but should be considered somewhat harmful - they have various disadvantages. (They're at the heart of the implementation of most collections, of course...) I'd go into more details, but Eric Lippert does it so much better than I ever could in the article referenced by the link. I would summarise it here, but that's quite hard to do. It really is worth just reading the whole post.
No one has mentioned the Framework Guidelines advice: Don't use List<T> in public API's:
We don’t recommend using List in public APIs for two reasons.
List<T> is not designed to be extended. i.e. you cannot override any members. This for example means that an object returning List<T> from a property won’t be able to get notified when the collection is modified. Collection<T> lets you overrides SetItem protected member to get “notified” when a new items is added or an existing item is changed.
List has lots of members that are not relevant in many scenarios. We say that List<T> is too “busy” for public object models. Imagine ListView.Items property returning List<T> with all its richness. Now, look at the actual ListView.Items return type; it’s way simpler and similar to Collection<T> or ReadOnlyCollection<T>
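As a sketch of the extension point the guideline is talking about: Collection<T> exposes protected virtual members that a subclass can override. In the current framework, Add and Insert go through InsertItem, while assigning through the indexer goes through SetItem, so the example overrides both. The class name and the Changed event are made up for illustration.

```csharp
using System;
using System.Collections.ObjectModel;

public sealed class NotifyingCollectionSketch<T> : Collection<T>
{
    // Hypothetical event: reports what happened and which item was involved.
    public event Action<string, T> Changed;

    // Called by Add() and Insert().
    protected override void InsertItem(int index, T item)
    {
        base.InsertItem(index, item);
        if (Changed != null) Changed("added", item);
    }

    // Called when an existing slot is overwritten via the indexer.
    protected override void SetItem(int index, T item)
    {
        base.SetItem(index, item);
        if (Changed != null) Changed("replaced", item);
    }
}
```

A property typed as Collection<T> can return an instance of this class and observe every modification callers make. With List<T> there is nothing to override, which is the first point of the quoted guideline. RemoveItem and ClearItems can be overridden in the same way.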
Also, if your goal is two-way Databinding, have a look at BindingList<T> (with the caveat that it is not sortable 'out of the box'!)
Generally, you should 'pass around' IEnumerable<T> or ICollection<T> (depending on whether it makes sense for your consumer to add items).
If you have an immutable list of customers, that is... your list of customers will not change, it's relatively small, and you will always iterate over it first to last and you don't need to add to the list or remove from it, then an array is probably just fine.
If you're unsure, however, then your best bet is a collection of some type. What collection you choose depends on the operations you wish to perform on it. Collections are all about inserts, manipulations, lookups, and deletes. If you do frequent searches for a given element, then a dictionary may be best. If you need to sort the data, then perhaps a SortedList will work better.
I wouldn't worry about "lightweight", unless you're talking a massive number of elements, and even then the advantages of O(1) lookups outweigh the costs of resources.
When you "pass around" a collection, you're only passing a reference, which is basically a pointer. So there is no performance difference between passing a collection and an array.
I'm going to put in a dissenting argument to both Jon and Eric Lippert (which means that you should be very wary of my answer, indeed!).
The heart of Eric Lippert's arguments against arrays is that the contents are mutable, while the size of the data structure itself is not. With regards to returning them from methods, the contents of a List are just as mutable. In fact, because you can add or remove elements from a List, I would argue that this makes the return value more mutable than an array.
The other reason I'm fond of Arrays is because sometime back I had a small section of performance critical code, so I benchmarked the performance characteristics of the two, and arrays blew Lists out of the water. Now, let me caveat this by saying it was a narrow test for how I was going to use them in a specific situation, and it goes against what I understand of both, but the numbers were wildly different.
Anyway, listen to Jon and Eric =), and I agree that List almost always makes more sense.
I agree with Alun, with one addition. If you may want to address the return value by subscript myArray[n], then use an IList.
An Array inherently supports IList (as well as IEnumerable and ICollection, for that matter). So if you pass by interface, you can still use the array as your underlying data structure. In this way, the methods that you are passing the array into don't have to "know" that the underlying datastructure is an array:
public void Test()
{
IList<Item> test = MyMethod();
}
public IList<Item> MyMethod()
{
Item[] items = new Item[] {new Item()};
return items;
}
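One thing worth knowing before handing an array out as an IList<T>: reading and overwriting elements through the interface works, but the array is still fixed-size, so Add and Remove throw NotSupportedException at run time. A small sketch with invented names:

```csharp
using System;
using System.Collections.Generic;

public static class ArrayAsListSketch
{
    // The array is converted to IList<int> implicitly; no copy is made.
    public static IList<int> MakeList()
    {
        return new[] { 10, 20, 30 };
    }

    // Returns true when the list refuses to grow, as a wrapped array does.
    public static bool AddIsRejected(IList<int> list)
    {
        try
        {
            list.Add(40);
            return false;
        }
        catch (NotSupportedException)
        {
            return true;
        }
    }
}
```

So MakeList()[1] reads 20 and can be assigned to, while AddIsRejected(MakeList()) is true. If callers are expected to add items, return a List<T> or Collection<T> behind the interface instead.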
