The different attributes of Tuples and Lists;
Tuples are heterogeneous and Lists are homogeneous,
Tuples are immutable while Lists are mutable,
often dictate the use of one type over the other. In other scenarios, however, either data type could be equally appropriate. As such , what are the memory and/or performance implications of Tuples versus Lists that might also guide our decision?
Thanks,
Well, in addition to what you've mentioned there's the rather significant difference that a tuple can only contain up to eight items. (OK, you can technically make arbitrarily large tuples by making the last tuple argument type another tuple, but I can't help but feel that you'd have to be slightly insane to actually do that.)
Unlike tuples in a language like Python, tuples in C# can't really be used as a general purpose data structure. One of the most common use cases for a tuple in C# is for returning multiple values from a function, or passing multiple values to functions that for some reasons can only take one (e.g. when passing e.Argument to a BackgroundWorker), or any other situation where you can't be bothered to make a custom class, and you can't use an anonymous type.
Since you need to know exactly how many items (and what types of items) they will contain at compile time, tuples are really of severely limited use. Lists, on the other hand, are for general purpose storage of homogeneous data, where you don't necessarily know how many items you're going to have. I'd love to see an example of a piece of code where, as you put it, "either data type could be equally appropriate".
Furthermore, since tuples and lists solve completely different problems, it's probably of fairly limited interest to compare the memory/performance implications. But for what it's worth, tuples are implemented as classes, not as structs, so they're stored on the heap just like lists, and they aren't copied when you pass them between functions, unlike value types. They do however implement the IStructuralEquatable and IStructuralComparable interfaces, and their Equals method is implemented such that this will return true: new Tuple<int>(1).Equals(new Tuple<int>(1)) (meanwhile, new List<int>() { 1 }.Equals(new List<int>() { 1 }) is false).
Related
In the .NET Framework, there are interfaces for immutable dictionaries, lists, queues, sets, stacks.
But there is no interface for immutable arrays to be found.
Also, immutable arrays are implemented as a struct in contrast to others which are class.
Can you explain why such design?
ImmutableArray<T> and ImmutableList<T> are functionally compatible types, so implementing just IImmutableList<T> is enough for both.
They differ only in characteristics, which is described in the documentation:
ImmutableArray<T> uses an actual underlying array, so accessing an element by index is an O(1) operation, whereas adding a new element (and thus creating a new array and copying all items from the previous one) has an O(n) cost
In contrast, ImmutableList<T> is represented as a binary tree, so both element access by index and adding a new item have O(log N) cost.
Additionally, ImmutableArray<T> is a struct so if used directly, accessing the underlying array is very fast. However, if it is boxed (eg. using as an interface implementation, including LINQ), its usage will be somewhat slower.
Conclusion: you must decide which one fits better for your needs. If you typically create it once and then just access elements ImmutableArray<T> (or even a simple List<T> can be better). But if you add new elements more frequently than access them by index, then ImmutableList<T> can be a more optimal choice.
In general interfaces and classes don't have an one-to-one relationship in .NET. For example see a recent comment by a Microsoft engineer on GitHib:
We generally don't expose collection abstractions, unless the BCL contains a reasonable number of implementations that share the same feature set.
By "collection abstractions" they mean interfaces. Some immutable collections do have such a relationship, like the ImmutableQueue<T> collection and the IImmutableQueue<T> interface, but you could see them more as an exception than the rule.
The ImmutableArray<T> struct is interesting. You can learn why it was implemented as a struct in this blog post by Immo Landwerth. Quote:
ImmutableArray<T> is a very thin wrapper around a regular array and thus shares all the benefits with them. We even made it a value type (struct) as it only has a single field which holds the array it wraps. This makes the size of the value type identical to the reference of the array. In other words: passing around an immutable array is as cheap as passing around the underlying array. Since it’s a value type, there is also no additional object allocation necessary to create the immutable wrapper, which can reduce GC pressure.
The ImmutableArray<T> implements many interfaces, but if you store it as an interface in a variable or field, it will be boxed. So you generally don't want to do this. Preferably you'd want to store it and pass it around as is.
CA1023: Indexers should not be multidimensional
Indexers, that is, indexed properties, should use a single index.
Multi-dimensional indexers can significantly reduce the usability of
the library. If the design requires multiple indexes, reconsider
whether the type represents a logical data store. If not, use a
method.
To fix a violation of this rule, change the design to use a lone integer or string index, or use a method instead of the indexer.
This seems very strange to me, why does this significantly affect anything? other than making a multidimensional index much less intuitive?
Change the design to use a string? And do what with it? Parse the numbers out at the other end and lose strong typing?
Can someone give me some reasons why there is something wrong with Multidimensional indexers?
If the design requires multiple indexes, reconsider whether the type
represents a logical data store. If not, use a method.
The problem is that indexers cannot be specifically named (except, as #volpav comments, for interfacing with languages that don't natively support them - by default, they are called Item) whereas methods must be. This means that a client may have problems guessing at the meaning of an indexer if it is not immediately obvious ("represents a logical data store"). This can be particularly vexing if an indexer has multiple parameters and/or multiple overloads of the indexer exist. Of course, naming indexer parameters can help matters (although the indexer itself cannot be specifically named in C#), but consider that methods can hint at the meanings of the return value and the parameters in the name of the method itself (consider GetCustomersByCountryAndAge vs this[string, int] ). This greatly helps readability when browsing source code.
Well-written XML documentation for your indexer would also help.
To fix a violation of this rule, change the design to use a lone
integer or string index
This somewhat poorly phrased sentence appears to be advice combined with the observation that int and string are the most common parameter types for indexers. It should likely just read "change the design to use a lone index."
I have a class Customer (with typical customer properties) and I need to pass around, and databind, a "chunk" of Customer instances. Currently I'm using an array of Customer, but I've also used Collection of T (and List of T before I knew about Collection of T). I'd like the thinnest way to pass this chunk around using C# and .NET 3.5.
Currently, the array of Customer is working just fine for me. It data binds well and seems to be as lightweight as it gets. I don't need the stuff List of T offers and Collection of T still seems like overkill. The array does require that I know ahead of time how many Customers I'm adding to the chunk, but I always know that in advance (given rows in a page, for example).
Am I missing something fundamental or is the array of Customer OK? Is there a tradeoff I'm missing?
Also, I'm assuming that Collection of T makes the old loosely-typed ArrayList obsolete. Am I right there?
Yes, Collection<T> (or List<T> more commonly) makes ArrayList pretty much obsolete. In particular, I believe ArrayList isn't even supported in Silverlight 2.
Arrays are okay in some cases, but should be considered somewhat harmful - they have various disadvantages. (They're at the heart of the implementation of most collections, of course...) I'd go into more details, but Eric Lippert does it so much better than I ever could in the article referenced by the link. I would summarise it here, but that's quite hard to do. It really is worth just reading the whole post.
No one has mentioned the Framework Guidelines advice: Don't use List<T> in public API's:
We don’t recommend using List in
public APIs for two reasons.
List<T> is not designed to be extended. i.e. you cannot override any
members. This for example means that
an object returning List<T> from a
property won’t be able to get notified
when the collection is modified.
Collection<T> lets you overrides
SetItem protected member to get
“notified” when a new items is added
or an existing item is changed.
List has lots of members that are not relevant in many scenarios. We
say that List<T> is too “busy” for
public object models. Imagine
ListView.Items property returning
List<T> with all its richness. Now,
look at the actual ListView.Items
return type; it’s way simpler and
similar to Collection<T> or
ReadOnlyCollection<T>
Also, if your goal is two-way Databinding, have a look at BindingList<T> (with the caveat that it is not sortable 'out of the box'!)
Generally, you should 'pass around' IEnumerable<T> or ICollection<T> (depending on whether it makes sense for your consumer to add items).
If you have an immutable list of customers, that is... your list of customers will not change, it's relatively small, and you will always iterate over it first to last and you don't need to add to the list or remove from it, then an array is probably just fine.
If you're unsure, however, then your best bet is a collection of some type. What collection you choose depends on the operations you wish to perform on it. Collections are all about inserts, manipulations, lookups, and deletes. If you do frequent frequent searches for a given element, then a dictionary may be best. If you need to sort the data, then perhaps a SortedList will work better.
I wouldn't worry about "lightweight", unless you're talking a massive number of elements, and even then the advantages of O(1) lookups outweigh the costs of resources.
When you "pass around" a collection, you're only passing a reference, which is basically a pointer. So there is no performance difference between passing a collection and an array.
I'm going to put in a dissenting argument to both Jon and Eric Lippert )which means that you should be very weary of my answer, indeed!).
The heart of Eric Lippert's arguments against arrays is that the contents are immutable, while the data structure itself is not. With regards to returning them from methods, the contents of a List are just as mutable. In fact, because you can add or subtract elements from a List, I would argue that this makes the return value more mutable than an array.
The other reason I'm fond of Arrays is because sometime back I had a small section of performance critical code, so I benchmarked the performance characteristics of the two, and arrays blew Lists out of the water. Now, let me caveat this by saying it was a narrow test for how I was going to use them in a specific situation, and it goes against what I understand of both, but the numbers were wildly different.
Anyway, listen to Jon and Eric =), and I agree that List almost always makes more sense.
I agree with Alun, with one addition. If you may want to address the return value by subscript myArray[n], then use an IList.
An Array inherently supports IList (as well as IEnumerable and ICollection, for that matter). So if you pass by interface, you can still use the array as your underlying data structure. In this way, the methods that you are passing the array into don't have to "know" that the underlying datastructure is an array:
public void Test()
{
IList<Item> test = MyMethod();
}
public IList<Item> MyMethod()
{
Item[] items = new Item[] {new Item()};
return items;
}
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 1 year ago.
Improve this question
.NET has a lot of complex data structures. Unfortunately, some of them are quite similar and I'm not always sure when to use one and when to use another. Most of my C# and VB books talk about them to a certain extent, but they never really go into any real detail.
What's the difference between Array, ArrayList, List, Hashtable, Dictionary, SortedList, and SortedDictionary?
Which ones are enumerable (IList -- can do 'foreach' loops)? Which ones use key/value pairs (IDict)?
What about memory footprint? Insertion speed? Retrieval speed?
Are there any other data structures worth mentioning?
I'm still searching for more details on memory usage and speed (Big-O notation)
Off the top of my head:
Array* - represents an old-school memory array - kind of like a alias for a normal type[] array. Can enumerate. Can't grow automatically. I would assume very fast insert and retrival speed.
ArrayList - automatically growing array. Adds more overhead. Can enum., probably slower than a normal array but still pretty fast. These are used a lot in .NET
List - one of my favs - can be used with generics, so you can have a strongly typed array, e.g. List<string>. Other than that, acts very much like ArrayList
Hashtable - plain old hashtable. O(1) to O(n) worst case. Can enumerate the value and keys properties, and do key/val pairs
Dictionary - same as above only strongly typed via generics, such as Dictionary<string, string>
SortedList - a sorted generic list. Slowed on insertion since it has to figure out where to put things. Can enum., probably the same on retrieval since it doesn't have to resort, but deletion will be slower than a plain old list.
I tend to use List and Dictionary all the time - once you start using them strongly typed with generics, its really hard to go back to the standard non-generic ones.
There are lots of other data structures too - there's KeyValuePair which you can use to do some interesting things, there's a SortedDictionary which can be useful as well.
If at all possible, use generics. This includes:
List instead of ArrayList
Dictionary instead of HashTable
First, all collections in .NET implement IEnumerable.
Second, a lot of the collections are duplicates because generics were added in version 2.0 of the framework.
So, although the generic collections likely add features, for the most part:
List is a generic implementation of ArrayList.
Dictionary<T,K> is a generic implementation of Hashtable
Arrays are a fixed size collection that you can change the value stored at a given index.
SortedDictionary is an IDictionary<T,K> that is sorted based on the keys.
SortedList is an IDictionary<T,K> that is sorted based on a required IComparer.
So, the IDictionary implementations (those supporting KeyValuePairs) are:
Hashtable
Dictionary<T,K>
SortedList<T,K>
SortedDictionary<T,K>
Another collection that was added in .NET 3.5 is the Hashset. It is a collection that supports set operations.
Also, the LinkedList is a standard linked-list implementation (the List is an array-list for faster retrieval).
Here are a few general tips for you:
You can use foreach on types that implement IEnumerable. IList is essentially an IEnumberable with Count and Item (accessing items using a zero-based index) properties. IDictionary on the other hand means you can access items by any-hashable index.
Array, ArrayList and List all implement IList.
Dictionary, SortedDictionary, and Hashtable implement IDictionary.
If you are using .NET 2.0 or higher, it is recommended that you use generic counterparts of mentioned types.
For time and space complexity of various operations on these types, you should consult their documentation.
.NET data structures are in System.Collections namespace. There are type libraries such as PowerCollections which offer additional data structures.
To get a thorough understanding of data structures, consult resources such as CLRS.
.NET data structures:
More to conversation about why ArrayList and List are actually different
Arrays
As one user states, Arrays are the "old school" collection (yes, arrays are considered a collection though not part of System.Collections). But, what is "old school" about arrays in comparison to other collections, i.e the ones you have listed in your title (here, ArrayList and List(Of T))? Let's start with the basics by looking at Arrays.
To start, Arrays in Microsoft .NET are, "mechanisms that allow you to treat several [logically-related] items as a single collection," (see linked article). What does that mean? Arrays store individual members (elements) sequentially, one after the other in memory with a starting address. By using the array, we can easily access the sequentially stored elements beginning at that address.
Beyond that and contrary to programming 101 common conceptions, Arrays really can be quite complex:
Arrays can be single dimension, multidimensional, or jadded (jagged arrays are worth reading about). Arrays themselves are not dynamic: once initialized, an array of n size reserves enough space to hold n number of objects. The number of elements in the array cannot grow or shrink. Dim _array As Int32() = New Int32(100) reserves enough space on the memory block for the array to contain 100 Int32 primitive type objects (in this case, the array is initialized to contain 0s). The address of this block is returned to _array.
According to the article, Common Language Specification (CLS) requires that all arrays be zero-based. Arrays in .NET support non-zero-based arrays; however, this is less common. As a result of the "common-ness" of zero-based arrays, Microsoft has spent a lot of time optimizing their performance; therefore, single dimension, zero-based (SZs) arrays are "special" - and really the best implementation of an array (as opposed to multidimensional, etc.) - because SZs have specific intermediary language instructions for manipulating them.
Arrays are always passed by reference (as a memory address) - an important piece of the Array puzzle to know. While they do bounds checking (will throw an error), bounds checking can also be disabled on arrays.
Again, the biggest hindrance to arrays is that they are not re-sizable. They have a "fixed" capacity. Introducing ArrayList and List(Of T) to our history:
ArrayList - non-generic list
The ArrayList (along with List(Of T) - though there are some critical differences, here, explained later) - is perhaps best thought of as the next addition to collections (in the broad sense). ArrayList inherit from the IList (a descendant of 'ICollection') interface. ArrayLists, themselves, are bulkier - requiring more overhead - than Lists.
IList does enable the implementation to treat ArrayLists as fixed-sized lists (like Arrays); however, beyond the additional functionallity added by ArrayLists, there are no real advantages to using ArrayLists that are fixed size as ArrayLists (over Arrays) in this case are markedly slower.
From my reading, ArrayLists cannot be jagged: "Using multidimensional arrays as elements... is not supported". Again, another nail in the coffin of ArrayLists. ArrayLists are also not "typed" - meaning that, underneath everything, an ArrayList is simply a dynamic Array of Objects: Object[]. This requires a lot of boxing (implicit) and unboxing (explicit) when implementing ArrayLists, again adding to their overhead.
Unsubstantiated thought: I think I remember either reading or having heard from one of my professors that ArrayLists are sort of the bastard conceptual child of the attempt to move from Arrays to List-type Collections, i.e. while once having been a great improvement to Arrays, they are no longer the best option as further development has been done with respect to collections
List(Of T): What ArrayList became (and hoped to be)
The difference in memory usage is significant enough to where a List(Of Int32) consumed 56% less memory than an ArrayList containing the same primitive type (8 MB vs. 19 MB in the above gentleman's linked demonstration: again, linked here) - though this is a result compounded by the 64-bit machine. This difference really demonstrates two things: first (1), a boxed Int32-type "object" (ArrayList) is much bigger than a pure Int32 primitive type (List); second (2), the difference is exponential as a result of the inner-workings of a 64-bit machine.
So, what's the difference and what is a List(Of T)? MSDN defines a List(Of T) as, "... a strongly typed list of objects that can be accessed by index." The importance here is the "strongly typed" bit: a List(Of T) 'recognizes' types and stores the objects as their type. So, an Int32 is stored as an Int32 and not an Object type. This eliminates the issues caused by boxing and unboxing.
MSDN specifies this difference only comes into play when storing primitive types and not reference types. Too, the difference really occurs on a large scale: over 500 elements. What's more interesting is that the MSDN documentation reads, "It is to your advantage to use the type-specific implementation of the List(Of T) class instead of using the ArrayList class...."
Essentially, List(Of T) is ArrayList, but better. It is the "generic equivalent" of ArrayList. Like ArrayList, it is not guaranteed to be sorted until sorted (go figure). List(Of T) also has some added functionality.
I found "Choose a Collection" section of Microsoft Docs on Collection and Data Structure page really useful
C# Collections and Data Structures : Choose a collection
And also the following matrix to compare some other features
I sympathise with the question - I too found (find?) the choice bewildering, so I set out scientifically to see which data structure is the fastest (I did the test using VB, but I imagine C# would be the same, since both languages do the same thing at the CLR level). You can see some benchmarking results conducted by me here (there's also some discussion of which data type is best to use in which circumstances).
They're spelled out pretty well in intellisense. Just type System.Collections. or System.Collections.Generics (preferred) and you'll get a list and short description of what's available.
Hashtables/Dictionaries are O(1) performance, meaning that performance is not a function of size. That's important to know.
EDIT: In practice, the average time complexity for Hashtable/Dictionary<> lookups is O(1).
The generic collections will perform better than their non-generic counterparts, especially when iterating through many items. This is because boxing and unboxing no longer occurs.
An important note about Hashtable vs Dictionary for high frequency systematic trading engineering: Thread Safety Issue
Hashtable is thread safe for use by multiple threads.
Dictionary public static members are thread safe, but any instance members are not guaranteed to be so.
So Hashtable remains the 'standard' choice in this regard.
There are subtle and not-so-subtle differences between generic and non-generic collections. They merely use different underlying data structures. For example, Hashtable guarantees one-writer-many-readers without sync. Dictionary does not.
In C# There seem to be quite a few different lists. Off the top of my head I was able to come up with a couple, however I'm sure there are many more.
List<String> Types = new List<String>();
ArrayList Types2 = new ArrayList();
LinkedList<String> Types4 = new LinkedList<String>();
My question is when is it beneficial to use one over the other?
More specifically I am returning lists of unknown size from functions and I was wondering if there is a particular list that was better at this.
List<String> Types = new List<String>();
LinkedList<String> Types4 = new LinkedList<String>();
are generic lists, i.e. you define the data type that would go in there which decreased boxing and un-boxing.
for difference in list vs linklist, see this --> When should I use a List vs a LinkedList
ArrayList is a non-generic collection, which can be used to store any type of data type.
99% of the time List is what you'll want. Avoid the non-generic collections at all costs.
LinkedList is useful for adding or removing without shuffling items around, although you have to forego random access as a result. One advantage it does have is you can remove items whilst iterating through the nodes.
ArrayList is a holdover from before Generics. There's really no reason to use them ... they're slow and use more memory than List<>. In general, there's probably no reason to use LinkedList either unless you are inserting midway through VERY large lists.
The only thing you'll find in .NET faster than a List<> is a fixed array ... but the performance difference is surprisingly small.
See the article on Commonly Used Collection Types from MSDN for a list of the the various types of collections available to you, and their intended uses.
ArrayList is a .Net 1.0 list type.
List is a generic list introduced with generics in .Net 2.0.
Generic lists provide better compile time support. Generics lists are type safe. You cannot add objects of wrong type. Therefor you know which type the stored objects has. There are no typechecks and typecasts nessecary.
I dont know about performance differences.
This questions says something about the difference of List and LinkedList.
As mentioned, don't use ArrayList if at all possible.
Here's an bit on Wikipedia about the differences between arrays and linked lists.
In summary:
Arrays
Fast random access
Fast inserting/deleting at end
Good memory locality
Linked Lists
Fast inserting/deleting at beginning
Fast inserting/deleting at end
Fast inserting/deleting at middle (with enumerator)
Generally, use List. Don't use ArrayList; it's obsolete. Use LinkedList in the rare cases where you need to be able to add without resizing and don't mind the overhead and loss of random access.
ArrayList is probably smaller, memory-wise, since it is based on an array. It also has fast random-access to elements. However, adding or removing to the list will take longer. This might be sped up slightly if the object over-allocates under the assumption that you are going to keep adding. (That will, of course, reduce the memory advantage.)
The other lists will be slightly larger (4-to-8 bytes more memory per element), and will have poor random access times. However, it is very fast to add or remove objects to the ends of the list. Also, memory usage is usually spot-on for what you need.