Which C# containers are contiguous and which are node-based?


In Item 1 of Effective STL, Scott Meyers makes a distinction between contiguous and node-based containers. Vectors, strings, and deques are contiguous, while linked lists and associative containers are node-based. This is useful for performance considerations (speed of insertion or deletion from the start, middle, or end, iteration, large memory allocation considerations, etc.).
In particular, I'm interested in List<T> and a list such as: BaseList : CollectionBase, ITypedList.
I heard somewhere that List<T> is more like a std::vector<T> than a linked list. So are both these C# containers contiguous? What node containers are available besides LinkedList<T>? Is there a comparison on MSDN somewhere, perhaps?

There is a table in this article that lists which containers are contiguous (scroll down):
http://geekswithblogs.net/BlackRabbitCoder/archive/2011/06/16/c.net-fundamentals-choosing-the-right-collection-class.aspx
Excerpt:
The List is a basic contiguous storage container. Some people may call this a vector or dynamic array. Essentially it is an array of items that grow once its current capacity is exceeded. Because the items are stored contiguously as an array, you can access items in the List by index very quickly. However inserting and removing in the beginning or middle of the List are very costly because you must shift all the items up or down as you delete or insert respectively. However, adding and removing at the end of a List is an amortized constant operation - O(1). Typically List is the standard go-to collection when you don't have any other constraints, and typically we favor a List even over arrays unless we are sure the size will remain absolutely fixed.

The most reliable way to check is by browsing the source code.
For instance, here's the code for List<T>, which states in a comment at the top of the file:
** Purpose: Implements a generic, dynamically sized list as an
** array.

According to MSDN,
The List class is the generic equivalent of the ArrayList class. It
implements the IList generic interface by using an array whose size
is dynamically increased as required.
So, a List<T> is contiguous.
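To make the "array with spare room" picture concrete, here is a minimal sketch. It assumes a recent C# compiler with top-level statements, and the variable names are only illustrative. It shows that Count and Capacity are separate numbers, and that inserting at the front shifts the existing elements:

```csharp
using System;
using System.Collections.Generic;

// Ask for room for 8 elements up front; only 3 slots are used so far.
var list = new List<int>(capacity: 8) { 10, 20, 30 };

int second = list[1];   // O(1) indexed read, same as an array
list.Insert(0, 5);      // O(n): every existing element is shifted up by one
list.RemoveAt(1);       // O(n): elements after index 1 are shifted down
list.Add(40);           // amortized O(1): appends into the spare capacity

Console.WriteLine($"Items: {string.Join(", ", list)}");
Console.WriteLine($"Count={list.Count}, Capacity={list.Capacity}");
```

Count tracks the elements you added; Capacity is the length of the backing array, which here never needs to grow.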

Related

When will one prefer array, LinkedList or ArrayList over List<T>?

Is there any point in using those data types other than in legacy code? Other data types like Dictionary or Graph are understandably used because they provide extra or different functionality. But array, LinkedList and ArrayList have less functionality and sometimes worse performance than List<T> (ArrayList is less memory-efficient for value types).
Then why use them at all?
Note: this is not an opinion - based question. All I want to know is use cases for these types
Another Note: I know about Linked list's O(1) insert time. I am asking when should it be utilized over the standard List, which has O(1) access time?
When it is better to use? (and the question about ArrayList and array remains)

ArrayList? sure: don't use it, basically ever (unless you don't want to migrate some legacy code, or can't because somebody has unwisely used BinaryFormatter).
LinkedList<T>, however, is not in the same category - it is niche, but it has uses due to cheap insertion/removal/etc, unlike List<T> which would need to move data around to perform insertion/removal. In most scenarios, you probably don't need that feature, so: don't use it unless you do?

LinkedList
Here is a list of differentiators from the List implementation. You use it when the items in the list need to maintain a specific order (hence the next and previous references).
Represents a doubly linked list.
LinkedList<T> provides separate nodes of type LinkedListNode<T>, so insert and removal are O(1) operations.
You can remove nodes and reinsert them, either in the same list or in another list, which results in no additional objects allocated on the heap. Because the list also maintains an internal count, getting the Count property is an O(1) operation.
Each node in a LinkedList<T> object is of the type LinkedListNode<T>. Because the LinkedList<T> is doubly linked, each node points forward to the Next node and backward to the Previous node.
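As a rough illustration of the node behaviour described above (a sketch only; the list contents are made up), a node can be unlinked from one list and re-linked into another without allocating a new node:

```csharp
using System;
using System.Collections.Generic;

var source = new LinkedList<string>(new[] { "x", "y", "z" });
var target = new LinkedList<string>();

// Finding the node is an O(n) walk; everything after that is O(1).
var node = source.Find("y");
if (node != null)
{
    source.Remove(node);     // unlinks the node from 'source'
    target.AddFirst(node);   // re-links the very same node object into 'target'
}

Console.WriteLine($"source: {string.Join(", ", source)}");
Console.WriteLine($"target: {string.Join(", ", target)}");
Console.WriteLine($"Same node reused: {ReferenceEquals(target.First, node)}");
```

Note that the O(1) figure only applies once you already hold the LinkedListNode<T>; locating it by value still means walking the list.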
List Vs ArrayList
ArrayList is a deprecated implementation used in the past. Prefer List<T> generic implementation in any new code.
As a generic collection, List<T> implements the generic IEnumerable<T> interface and can be used easily in LINQ
ArrayList belongs to the days that C# didn't have generics. It's deprecated in favor of List<T>. You shouldn't use ArrayList in new code that targets .NET >= 2.0 unless you have to interface with an old API that uses it.
Array vs List
Array is a fixed-size collection and it supports multiple dimensions. It is the most efficient of the three for indexed reads, writes, and iteration, but it cannot grow or shrink after creation.

Are List<> elements sequentially located in heap like array?

I'm learning C# and know the basic difference between arrays and Lists: a List is generic and can grow dynamically. But I'm wondering:
are List elements sequentially located in heap like array or is each element located "randomly" in a different locations?
and if that is true, does that affect the speed of access & data retrieval from memory?
and if that is true, is this what makes arrays a little faster than Lists?

Let's see the second and the third questions first:
and if that true does that affect the speed of access & data retrieval from memory ?
and if that true is this what makes array little faster than list ?
There is only a single type of "native" collection in .NET (by .NET I mean the CLR, so the runtime): the array. (Technically, if you consider a string a type of collection, then there are two native types of collections :-). Technically, part 2: not all the arrays you think of as arrays are "native" arrays. Only the one-dimensional, zero-based arrays are. Arrays of type T[,] aren't, and arrays whose first element doesn't have an index of 0 aren't.)
Every other collection (other than LinkedList<>) is built atop it. If you look at List<T> with ILSpy you'll see that at the base of it there is a T[] with an added int for the Count (the T[].Length is the Capacity). An array is a little faster than a List<T> because using it involves one less indirection: you access the array directly, instead of going through the list, which then accesses its array.
Let's see the first question:
does List elements sequentially located in heap like array or each element is located randomly in different locations?
Being based on an array internally, the List<> stores its elements like an array, so in a contiguous block of memory. Be aware that with a List<SomeObject> where SomeObject is a reference type, the list is a list of references, not of objects, so it is the references that are put in a contiguous block of memory. (We will ignore that, with the advanced memory management of modern computers, the phrase "contiguous block of memory" isn't exact; it would be better to say "a contiguous block of addresses".)
(yes, even Dictionary<> and HashSet<> are built atop arrays. Conversely a tree-like collection could be built without using an array, because it's more similar to a LinkedList)
Some additional details: there are four groups of instructions in the CIL language (the intermediate language used in compiled .NET programs) that are used with "native" arrays:
Newarr
Ldelem and family Ldelem_*
Stelem and family Stelem_*
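Readonly (a prefix for the array-address instruction ldelema; according to the OpCodes.Readonly documentation, it specifies that the address operation performs no type check at run time and returns a managed pointer whose mutability is restricted)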
if you look at OpCodes.Newarr you'll see this comment in the XML documentation:
// Summary:
// Pushes an object reference to a new zero-based, one-dimensional array whose
// elements are of a specific type onto the evaluation stack.

Yes, elements in a List are stored contiguously, just like an array. A List actually uses arrays internally, but that is an implementation detail that you shouldn't really need to be concerned with.
Of course, in order to get the correct impression from that statement, you also have to understand a bit about memory management in .NET. Namely, the difference between value types and reference types, and how objects of those types are stored. Value types will be stored in contiguous memory. With reference types, the references will be stored in contiguous memory, but not the instances themselves.
The advantage of using a List is that the logic inside of the class handles allocating and managing the items for you. You can add elements anywhere, remove elements from anywhere, and grow the entire size of the collection without having to do any extra work. This is, of course, also what makes a List slightly slower than an array. If any reallocation has to happen in order to comply with your request, there'll be a performance hit as a new, larger-sized array is allocated and the elements are copied to it. But it won't be any slower than if you wrote the code to do it manually with a raw array.
If your length requirement is fixed (i.e., you never need to grow/expand the total capacity of the array), you can go ahead and use a raw array. It might even be marginally faster than a List because it avoids the extra overhead and indirection (although that is subject to being optimized out by the JIT compiler).
If you need to be able to dynamically resize the collection, or you need any of the other features provided by the List class, just use a List. The performance difference will be virtually imperceptible.
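The value-type versus reference-type point can be observed from ordinary code. In this small sketch (my own example, not from the answers above) a tuple stands in for a value type and StringBuilder for a reference type:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Value type: the list's backing array holds the tuple data itself,
// so reading an element hands back a copy.
var points = new List<(int X, int Y)> { (1, 2) };
var copy = points[0];
copy.X = 99;                      // changes only the local copy

// Reference type: the backing array holds references,
// so reading an element hands back a reference to the same object.
var builders = new List<StringBuilder> { new StringBuilder("a") };
StringBuilder same = builders[0];
same.Append("b");                 // changes the object the list also points to

Console.WriteLine($"points[0].X = {points[0].X}");
Console.WriteLine($"builders[0] = {builders[0]}");
```

The first line printed still shows the original X, while the second shows the appended text, which matches the "references are contiguous, instances are not" description.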

List<long> vs long[], memory usage

Regarding the size in memory for the
List<long> ListOfLongs;
long[] ArrayOfLongs;
If each has N elements, how much memory do they use?
I am asking because, to my knowledge, .NET has no template (generics) specialization.

Practically the same amount of memory (technically, the List will probably consume some more because it has over-allocated so that it can grow more easily).
Generic collections in .NET do not need to box the items they hold, which would be a massive memory and performance sink.

The List<T> owns an array T[]. It uses an exponential growth strategy for this array, so a list with n elements usually has a backing array with size larger than n. Also the smaller arrays need to be garbage collected, which can be annoying if they are large enough to be on the LOH (large object heap).
But you can avoid this by specifying a capacity manually, for example as a constructor parameter. Then a single array with the desired capacity will be allocated, so you avoid both of the above problems.
In addition List<T> has a small O(1) overhead for the list object itself.
But there is no per element overhead when using generics. The runtime creates a specialized version for each value type you pass in. No boxing of the elements occurs.
But you can't use C++ style template specialization, where you effectively overload the implementation for certain type parameters. All generic instantiations share the same C# code.
i.e. there is no specialized IL code, but each value type gets a specialized machine code implementation based on the same source-code.

I am asking because, to my knowledge, .NET has no template (generics) specialization.
.NET doesn't have template specialization in the sense that you (as the programmer) can supply different code depending on the type arguments. But the JIT compiler still can (and does) produce different machine code for value types than for reference types, i.e. (unlike in Java) value types are not boxed when put into a generic container. They're stored efficiently.

Using lists is more practical than using plain arrays. The key to performance and memory consumption is the Capacity of a list. In current implementations an empty list starts with a capacity of 0, the first Add allocates room for 4 elements, and the capacity then doubles to 8, 16, 32, 64, ... whenever the number of elements reaches the current capacity. Each increase translates to an internal re-allocation and an Array.Copy of the existing elements. So if you have a list with 1000 items and you expect 100 more items in a day, you can instantiate the list with a capacity of 1200 (a 100% error margin on the prediction). This way you avoid the re-allocation to a 2000-item array that a list created with capacity 1000 would perform when you add the 1001st item, along with the Array.Copy of the existing 1000 items into it.
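A quick way to see this for yourself is to watch the Capacity property while adding items. The sketch below records each capacity value a default-constructed list passes through and compares it with a pre-sized list; the exact growth sequence is an implementation detail and may differ between runtimes, so the code only prints it:

```csharp
using System;
using System.Collections.Generic;

// Let the list grow on its own and record every capacity it passes through.
var grown = new List<long>();
var seenCapacities = new List<int>();
for (long i = 0; i < 1000; i++)
{
    grown.Add(i);
    if (seenCapacities.Count == 0 ||
        seenCapacities[seenCapacities.Count - 1] != grown.Capacity)
    {
        seenCapacities.Add(grown.Capacity);   // a re-allocation just happened
    }
}

// Pre-size the list: one backing array, no re-allocation while it fits.
var presized = new List<long>(1200);
for (long i = 0; i < 1100; i++) presized.Add(i);

Console.WriteLine($"Capacities seen: {string.Join(", ", seenCapacities)}");
Console.WriteLine($"Pre-sized: Count={presized.Count}, Capacity={presized.Capacity}");
```

Each entry in the first printed line corresponds to one allocation plus a copy of the existing elements; the pre-sized list keeps the capacity it was given.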

what is difference between string array and list of string in c#

I read on MSDN that an array is faster than a collection.
Can you tell me how string[] is faster than List<string>?

Arrays are a lower level abstraction than collections such as lists. The CLR knows about arrays directly, so there's slightly less work involved in iterating, accessing etc.
However, this should almost never dictate which you actually use. The performance difference will be negligible in most real-world applications. I rarely find it appropriate to use arrays rather than the various generic collection classes, and indeed some consider arrays somewhat harmful. One significant downside is that there's no such thing as an immutable array (other than an empty one)... whereas you can expose read-only collections through an API relatively easily.
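For the read-only point, one option (a sketch, assuming you hold the data in a List<T>; the names are invented) is to hand out the wrapper returned by AsReadOnly rather than the list or an array:

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

var names = new List<string> { "Ada", "Linus" };

// A read-only wrapper around the same list; no copy is made.
ReadOnlyCollection<string> view = names.AsReadOnly();

bool rejected = false;
try
{
    // The wrapper still implements IList<string>, but mutation throws.
    ((IList<string>)view).Add("Grace");
}
catch (NotSupportedException)
{
    rejected = true;
}

names.Add("Grace");   // the owner can still change the underlying list

Console.WriteLine($"Mutation through the view rejected: {rejected}");
Console.WriteLine($"View now sees {view.Count} items");
```

With a string[] there is no equivalent: anyone who receives the array can overwrite its elements.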

The article is from 2004, which means it's about .NET 1.1, when there were no generics.
Array vs collection performance actually was a problem back then because collection types caused a lot of extra boxing and unboxing operations. But since .NET 2.0, where generics were introduced, the difference in performance is almost gone.

An array is not resizable. This means that when it is created one block of memory is allocated, large enough to hold as many elements as you specify.
A List on the other hand is implicitly resizable. Each time you Add an item, the framework may need to allocate more memory to hold the item you just added. This is an expensive operation, so we end up saying "List is slower than array".
Of course this is a very simplified explanation, but hopefully enough to paint the picture.

An array is the simplest form of collection, so it's faster than other collections. A List (and many other collections) actually uses an array internally to hold its items.
An array is of course also limited by its simplicity. Most notably you can't change the size of an array. If you want a dynamic collection you would use a List.

List<string> is a class with a private member that is a string[]. The MSDN documentation states this fact in several places. The List class is basically a wrapper class around an array that gives the array other functionality.
The answer of which is faster all depends on what you are trying to do with the list/array. For accessing and assigning values to elements, the array is probably negligibly faster since the List is an abstraction of the array (as Jon Skeet has said).
If you intend on having a data structure that grows over time (gets more and more elements), performance (ave. speed) wise the List will start to shine. That is because each time you resize an array to add another element it is an O(n) operation. When you add an element to a List (and the list is already at capacity) the list will double itself in size. I won't get into the nitty gritty details, but basically this means that increasing the size of a List is on average a O(log n) operation. Of course this has drawbacks too (you could have almost twice the amount of memory allocated as you really need if you only go a couple items past its last capacity).
Edit: I got a little mixed up in the paragraph above. As Eric has said below, the number of resizes for a List is O(log n), but the actual cost associated with resizing the array is amortized to O(1).
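The amortized argument can be checked with simple arithmetic. The two local functions below are my own simulation (they are not List<T> internals): one counts element copies when an array is re-allocated on every add, the other when capacity starts at 4 and doubles:

```csharp
using System;

// Growing by exactly one slot per add: each add copies all existing elements.
long CopiesGrowByOne(int n)
{
    long copies = 0;
    for (int size = 0; size < n; size++) copies += size;
    return copies;
}

// Doubling strategy (assumed: start at 4, then double when full).
long CopiesDoubling(int n)
{
    long copies = 0;
    int capacity = 0, count = 0;
    for (int i = 0; i < n; i++)
    {
        if (count == capacity)
        {
            copies += count;                              // copy into the new array
            capacity = capacity == 0 ? 4 : capacity * 2;
        }
        count++;
    }
    return copies;
}

Console.WriteLine($"Grow by one, 1000 adds: {CopiesGrowByOne(1000)} copies");
Console.WriteLine($"Doubling,    1000 adds: {CopiesDoubling(1000)} copies");
```

For 1000 adds the first strategy performs 0 + 1 + ... + 999 = 499500 copies, while doubling performs 4 + 8 + ... + 512 = 1020, i.e. roughly one copy per element, which is where the amortized O(1) comes from.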

.NET Data structures: ArrayList, List, HashTable, Dictionary, SortedList, SortedDictionary -- Speed, memory, and when to use each?

.NET has a lot of complex data structures. Unfortunately, some of them are quite similar and I'm not always sure when to use one and when to use another. Most of my C# and VB books talk about them to a certain extent, but they never really go into any real detail.
What's the difference between Array, ArrayList, List, Hashtable, Dictionary, SortedList, and SortedDictionary?
Which ones are enumerable (IList -- can do 'foreach' loops)? Which ones use key/value pairs (IDict)?
What about memory footprint? Insertion speed? Retrieval speed?
Are there any other data structures worth mentioning?
I'm still searching for more details on memory usage and speed (Big-O notation)

Off the top of my head:
Array - represents an old-school memory array - kind of like an alias for a normal type[] array. Can enumerate. Can't grow automatically. I would assume very fast insert and retrieval speed.
ArrayList - automatically growing array. Adds more overhead. Can enum., probably slower than a normal array but still pretty fast. These are used a lot in .NET
List - one of my favs - can be used with generics, so you can have a strongly typed array, e.g. List<string>. Other than that, acts very much like ArrayList
Hashtable - plain old hashtable. O(1) to O(n) worst case. Can enumerate the value and keys properties, and do key/val pairs
Dictionary - same as above only strongly typed via generics, such as Dictionary<string, string>
SortedList - a sorted generic list. Slower on insertion since it has to figure out where to put things. Can enum., probably the same on retrieval since it doesn't have to re-sort, but deletion will be slower than a plain old list.
I tend to use List and Dictionary all the time - once you start using them strongly typed with generics, it's really hard to go back to the standard non-generic ones.
There are lots of other data structures too - there's KeyValuePair which you can use to do some interesting things, there's a SortedDictionary which can be useful as well.

If at all possible, use generics. This includes:
List instead of ArrayList
Dictionary instead of Hashtable

First, all collections in .NET implement IEnumerable.
Second, a lot of the collections are duplicates because generics were added in version 2.0 of the framework.
So, although the generic collections likely add features, for the most part:
List is a generic implementation of ArrayList.
Dictionary<T,K> is a generic implementation of Hashtable
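Arrays are fixed-size collections in which you can change the value stored at a given index.
SortedDictionary is an IDictionary<T,K> that is sorted based on the keys; it is implemented as a binary search tree.
SortedList is also an IDictionary<T,K> that is sorted based on the keys (using the keys' default ordering or an optional IComparer), but it keeps them in arrays.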
So, the IDictionary implementations (those supporting KeyValuePairs) are:
Hashtable
Dictionary<T,K>
SortedList<T,K>
SortedDictionary<T,K>
Another collection that was added in .NET 3.5 is the HashSet<T>. It is a collection that supports set operations.
Also, the LinkedList is a standard linked-list implementation (the List is an array-list for faster retrieval).
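A short sketch of how three of these behave in practice (the data is made up): Dictionary for keyed lookup, SortedDictionary for enumeration in key order, and HashSet for set operations:

```csharp
using System;
using System.Collections.Generic;

// Dictionary: keyed lookup, enumeration order is not something to rely on.
var ages = new Dictionary<string, int> { ["bob"] = 30, ["alice"] = 25 };
bool found = ages.TryGetValue("alice", out int aliceAge);

// SortedDictionary: same key/value idea, but enumerates in key order.
var sorted = new SortedDictionary<string, int>(ages);
string keysInOrder = string.Join(",", sorted.Keys);

// HashSet: no duplicates, supports set operations in place.
var evens = new HashSet<int> { 2, 4, 6 };
evens.IntersectWith(new[] { 4, 6, 8 });

Console.WriteLine($"alice found: {found}, age {aliceAge}");
Console.WriteLine($"Sorted keys: {keysInOrder}");
Console.WriteLine($"Intersection has {evens.Count} items");
```

All three can be used in foreach; the dictionaries yield KeyValuePair items rather than bare values.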

Here are a few general tips for you:
You can use foreach on types that implement IEnumerable. IList is essentially an IEnumerable with Count and Item (accessing items using a zero-based index) properties. IDictionary on the other hand means you can access items by key rather than by position.
Array, ArrayList and List all implement IList.
Dictionary, SortedDictionary, and Hashtable implement IDictionary.
If you are using .NET 2.0 or higher, it is recommended that you use generic counterparts of mentioned types.
For time and space complexity of various operations on these types, you should consult their documentation.
.NET data structures are in the System.Collections and System.Collections.Generic namespaces. There are third-party libraries such as PowerCollections which offer additional data structures.
To get a thorough understanding of data structures, consult resources such as CLRS.

.NET data structures:
More to conversation about why ArrayList and List are actually different
Arrays
As one user states, Arrays are the "old school" collection (yes, arrays are considered a collection though not part of System.Collections). But, what is "old school" about arrays in comparison to other collections, i.e the ones you have listed in your title (here, ArrayList and List(Of T))? Let's start with the basics by looking at Arrays.
To start, Arrays in Microsoft .NET are, "mechanisms that allow you to treat several [logically-related] items as a single collection," (see linked article). What does that mean? Arrays store individual members (elements) sequentially, one after the other in memory with a starting address. By using the array, we can easily access the sequentially stored elements beginning at that address.
Beyond that and contrary to programming 101 common conceptions, Arrays really can be quite complex:
Arrays can be single dimension, multidimensional, or jagged (jagged arrays are worth reading about). Arrays themselves are not dynamic: once initialized, an array of n size reserves enough space to hold n number of objects. The number of elements in the array cannot grow or shrink. Dim _array As Int32() = New Int32(99) {} reserves enough space on the memory block for the array to contain 100 Int32 primitive type objects (in VB the number in parentheses is the upper bound, not the length; the array is initialized to contain 0s). The address of this block is returned to _array.
According to the article, Common Language Specification (CLS) requires that all arrays be zero-based. .NET itself supports non-zero-based arrays; however, this is less common. As a result of the "common-ness" of zero-based arrays, Microsoft has spent a lot of time optimizing their performance; therefore, single dimension, zero-based (SZs) arrays are "special" - and really the best implementation of an array (as opposed to multidimensional, etc.) - because SZs have specific intermediate language instructions for manipulating them.
Arrays are reference types, so what gets passed around is a reference (a memory address) to the array rather than a copy of its elements - an important piece of the Array puzzle to know. Arrays do bounds checking (an out-of-range index throws an exception), although the check can be bypassed in unsafe code.
Again, the biggest hindrance to arrays is that they are not re-sizable. They have a "fixed" capacity. Introducing ArrayList and List(Of T) to our history:
ArrayList - non-generic list
The ArrayList (along with List(Of T) - though there are some critical differences, here, explained later) - is perhaps best thought of as the next addition to collections (in the broad sense). ArrayList implements the IList interface (a descendant of 'ICollection'). ArrayLists, themselves, are bulkier - requiring more overhead - than Lists.
IList does enable the implementation to treat ArrayLists as fixed-sized lists (like Arrays); however, beyond the additional functionality added by ArrayLists, there are no real advantages to using fixed-size ArrayLists, as ArrayLists (compared with Arrays) are markedly slower in this case.
From my reading, ArrayLists cannot be jagged: "Using multidimensional arrays as elements... is not supported". Again, another nail in the coffin of ArrayLists. ArrayLists are also not "typed" - meaning that, underneath everything, an ArrayList is simply a dynamic Array of Objects: Object[]. This requires a lot of boxing (implicit) and unboxing (explicit) when storing value types in ArrayLists, again adding to their overhead.
Unsubstantiated thought: I think I remember either reading or having heard from one of my professors that ArrayLists are sort of the bastard conceptual child of the attempt to move from Arrays to List-type Collections, i.e. while once having been a great improvement to Arrays, they are no longer the best option as further development has been done with respect to collections
List(Of T): What ArrayList became (and hoped to be)
The difference in memory usage is significant enough to where a List(Of Int32) consumed 56% less memory than an ArrayList containing the same primitive type (8 MB vs. 19 MB in the above gentleman's linked demonstration: again, linked here) - though this is a result compounded by the 64-bit machine. This difference really demonstrates two things: first (1), a boxed Int32-type "object" (ArrayList) is much bigger than a pure Int32 primitive type (List); second (2), the difference is amplified on a 64-bit machine, where object headers and references are larger.
So, what's the difference and what is a List(Of T)? MSDN defines a List(Of T) as, "... a strongly typed list of objects that can be accessed by index." The importance here is the "strongly typed" bit: a List(Of T) 'recognizes' types and stores the objects as their type. So, an Int32 is stored as an Int32 and not an Object type. This eliminates the issues caused by boxing and unboxing.
MSDN specifies this difference only comes into play when storing value types and not reference types. Also, the difference really shows at scale: over 500 elements. What's more interesting is that the MSDN documentation reads, "It is to your advantage to use the type-specific implementation of the List(Of T) class instead of using the ArrayList class...."
Essentially, List(Of T) is ArrayList, but better. It is the "generic equivalent" of ArrayList. Like ArrayList, it is not guaranteed to be sorted until sorted (go figure). List(Of T) also has some added functionality.
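The typing difference is easy to see in code. This is a small C# sketch of my own (the same applies to List(Of T) in VB): the ArrayList accepts anything and needs a cast on the way out, while the generic list is checked at compile time and stores the integers unboxed:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// ArrayList stores object references, so each int is boxed on the way in.
var untyped = new ArrayList { 1, 2 };
untyped.Add("three");                 // compiles: any object is accepted
int first = (int)untyped[0];          // explicit cast (unboxing) on the way out

bool castFailed = false;
try
{
    _ = (int)untyped[2];              // the mistake only surfaces at run time
}
catch (InvalidCastException)
{
    castFailed = true;
}

// List<int> stores the ints directly; no boxing, no casts.
var typed = new List<int> { 1, 2 };
int sum = typed[0] + typed[1];
// typed.Add("three");                // would not compile

Console.WriteLine($"first={first}, castFailed={castFailed}, sum={sum}");
```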

I found "Choose a Collection" section of Microsoft Docs on Collection and Data Structure page really useful
C# Collections and Data Structures : Choose a collection
And also the following matrix to compare some other features

I sympathise with the question - I too found (find?) the choice bewildering, so I set out scientifically to see which data structure is the fastest (I did the test using VB, but I imagine C# would be the same, since both languages do the same thing at the CLR level). You can see some benchmarking results conducted by me here (there's also some discussion of which data type is best to use in which circumstances).

They're spelled out pretty well in intellisense. Just type System.Collections. or System.Collections.Generic (preferred) and you'll get a list and short description of what's available.

Hashtables/Dictionaries are O(1) performance, meaning that performance is not a function of size. That's important to know.
EDIT: In practice, the average time complexity for Hashtable/Dictionary<> lookups is O(1).

The generic collections will perform better than their non-generic counterparts, especially when iterating through many items. This is because boxing and unboxing no longer occurs.

An important note about Hashtable vs Dictionary for high frequency systematic trading engineering: Thread Safety Issue
Hashtable is thread safe for use by multiple reader threads and a single writing thread.
Dictionary public static members are thread safe, but any instance members are not guaranteed to be so.
So Hashtable remains the 'standard' choice in this regard.

There are subtle and not-so-subtle differences between generic and non-generic collections. They merely use different underlying data structures. For example, Hashtable guarantees one-writer-many-readers without sync. Dictionary does not.
