The underlay of C# where() method and C# HashTable - c#

I am new in C#, having got used to Java and its data structures. Recently, when I wrote C#, trying to get the list of results based on users' choices. I used HashTable in C# but people say I can call where() method in List.
I am wondering the underlay of where method (array? or some other data structure?)so that I can figure out the real cost of where() function and HashTable.
Also, I know most of HashTable is created based on array or BST, so how does HashTable work in C#?

In general, I wouldn't use Hashtable in C#, but instead use Dictionary<TKey, TValue>, as this provides type safety.
Both classes have access that approaches O(1) for accessing items in the collection. This is mentioned in the documentation:
Getting or setting the value of this property approaches an O(1) operation.
Note that using a List<T> will effectively use an array internally, which means .Where becomes an O(n) filter. For finding a single element, FirstOrDefault is typically a better choice. If there aren't many items in the collection, this is often fine, but if you need fast access, then a Dictionary<T,U> is a better option.

Related

When will one prefer array, LinkedList or ArrayList over List<T>?

Is there any point of using those data types other then legacy code? Other data types like Dictionary or Graph are understandably used because they provide extra / different functionality. But array, LinkedList or ArrayList have less of a functionality and sometimes worst performance then List (ArrayList is less memory efficient in value types)
Then why use them at all?
Note: this is not an opinion - based question. All I want to know is use cases for these types
Another Note: I know about Linked list's O(1) insert time. I am asking when should it be utilized over the standard List, which has O(1) access time?
When it is better to use? (and the question about ArrayList and array remains)
ArrayList? sure: don't use it, basically ever (unless you don't want to migrate some legacy code, or can't because somebody has unwisely used BinaryFormatter).
LinkedList<T>, however, is not in the same category - it is niche, but it has uses due to cheap insertion/removal/etc, unlike List<T> which would need to move data around to perform insertion/removal. In most scenarios, you probably don't need that feature, so: don't use it unless you do?
LinkedList
Here is a list of differentiators from the List implementation. You use it when the items in the list need to maintain a specific order (hence the next and previous references).
Represents a doubly linked list.
LinkedList<T> provides separate nodes of type LinkedListNode<T>, so insert and removal are O(1) operations.
You can remove nodes and reinsert them, either in the same list or in another list, which results in no additional objects allocated on the heap. Because the list also maintains an internal count, getting the Count property is an O(1) operation.
Each node in a LinkedList<T> object is of the type LinkedListNode<T>. Because the LinkedList<T> is doubly linked, each node points forward to the Next node and backward to the Previous node.
List Vs ArrayList
ArrayList is a deprecated implementation used in the past. Prefer List<T> generic implementation in any new code.
As a generic collection, List<T> implements the generic IEnumerable<T> interface and can be used easily in LINQ
ArrayList belongs to the days that C# didn't have generics. It's deprecated in favor of List<T>. You shouldn't use ArrayList in new code that targets .NET >= 2.0 unless you have to interface with an old API that uses it.
Array vs List
Array is a fixed size collection and it supports multiple dimensions. It is the most efficient of the three for simple insert and iterations.

Cost of enumerating ILookup results? Better to always use Dictionary<TKey,List<TElement>>?

I have a ILookup<TKey,TElement> lookup from which I fairly often get elements and iterate trough them using LINQ or foreach. I look up like this IEnumerable<TElement> results = lookup[key];.
Thus, results needs to be enumerated at least once every time I use lookup results (and even more if I'm iterating multiple times if I don't use .ToList() first).
Even though its not as "clean", wouldn't it be better (performance-wise) to use a Dictionary<TKey,List<TElement>>, so that all results from a key are only enumerated on construction of the dictionary? Just how taxing is ToList()?
ToLookup, like all the other ToXXX LINQ methods, uses immediate execution. The resulting object has no reference to the original source. It effectively does build a Dictionary<TKey, List<TElement>> - not those exact types, perhaps, but equivalent to it.
Note that there's a difference though, which may or may not be useful to you - the indexer for a lookup returns an empty sequence if you give it a key which doesn't exist, rather than throwing an exception. That can make life much easier if you want to be able to just index it by any key and iterate over the corresponding values.
Also note that although it's not explicitly documented, the implementation used for the value sequences does implement ICollection<T>, so calling the LINQ Count() method is O(1) - it doesn't need to iterate over all the elements.
See my Edulinq post on ToLookup for more details.
Assuming the implementation is System.Linq.Lookup (does ILookup have any other implementations?), the elements presented in lookup[key] are stored in an array of elements as a field of System.Linq.Lookup.Grouping. Repeatedly looking them up won't cause a re-iteration of source. Of course, rebuilding the Lookup will be more costly, but once built, the source is no longer accessed.

Why create an IEnumerable?

I don't understand why I'd create an IEnumerable. Or why it's important.
I'm looking at the example for IEnumerable:
http://msdn.microsoft.com/en-us/library/system.collections.ienumerable.aspx
But I can basically do the same thing if I just went:
List<Person> people = new List<Person>();
so what's IEnumerable good for? Can you give me a situation where I'd need to create a class that implements IEnumerable?
IEnumerable is an interface, it exposes certain things to the outside. While you are completely right, you could just use a List<T>, but List<T> is very deep in the inheritance tree. What exactly does a List<T>? It stores items, it offers certain methods to Add and Remove. Now, what if you only need the "item-keeping" feature of a List<T>? That's what an IEnumerable<T> is - an abstract way of saying "I want to get a list of items I can iterate over". A list is "I want to get a collection which I can modify, can access by index and iterate". List<T> offers a lot more functionality than IEnumerable<T> does, but it takes up more memory. So if a method is taking an IEnumerable<T>, it doesn't care what exactly it gets, as long as the object offers the possibilites of IEnumerable<T>.
Also, you don't have to create your own IEnumerable<T>, a List<T> IS an IEnumerable<T>!
Lists are, of course IEnumerable - As a general rule, you want to be specific on what you output but broad on what you accept as input eg:
You have a sub which loops through a list of objects and writes something to the console...
You could declare the parameter is as either IEnumerable<T> or IList<T> (or even List<T>). Since you don't need to add to the input list, all you actually need to do is enumerate - so use IEnumerable - then your method will also accept other types which implement IEnumerable including IQueryable, Linked Lists, etc...
You're making your methods more generic for no cost.
Today, you generally wouldn't use IEnumerable anymore unless you were supporting software on an older version of the framework. Today, you'd normally use IEnumerable<T>. Amongst other benefits, IEnumerable fully implements all of the LINQ operations/extensions so that you can easily query any List type that implements IEnumerable<T> using LINQ.
Additionally, it doesn't tie the consumer of your code to a particular collection implementation.
It's rare that nowdays you need to create your own container classes, as you are right there alreay exists many good implementations.
However if you do create your own container class for some specific reason, you may like to implement IEnumerable or IEnumerable<T> because they are a standard "contract" for itteration and by providing an implementation you can take advantage of methods/apis that want an IEnumerable or IEnumerable<T> Linq for example will give you a bunch of useful extension methods for free.
An IList can be thought of as a particular implementation of IEnumerable. (One that can be added to and removed from easily.) There are others, such as IDictionary, which performs an entirely different function but can still be enumerated over. Generally, I would use IEnumerable as a more generic type reference when I only need an enumeration to satisfy a requirement and don't particularly care what kind it is. I can pass it an IList and more often than not I do just that, but the flexibility exists to pass it other enumerations as well.
Here is one situation that I think I have to implement IEnumerable but not using List<>
I want to get all items from a remote server. Let say I have one million items going to return. If you use List<> approach, you need to cache all one million items in the memory first. In some cases, you don't really want to do that because you don't want to use up too much memory. Using IEnumerable allows you to display the data on the screen and then dispose it right away. Therefore, using IEnumerable approach, the memory footprint of the program is much smaller.
It's my understanding that IEnumerable is provided to you as an interface for creating your own enumerable class types.
I believe a simple example of this would be recreating the List type, if you wanted to have your own set of features (or lack thereof) for it.
What if you want to enumerate over a collection that is potentially of infinite size, such as the Fibonacci numbers? You couldn't do that easily with a list, but if you had a class that implemented IEnumerable or IEnumerable<T>, it becomes easy.
When a built in container fits your needs you should definitely use that, and than IEnumerable comes for free. When for whatever reason you have to implement your own container, for example if it must be backed by a DB, than you should make sure to implement both IEnumerable and IEnumerable<T> for two reasons:
It makes foreach work, which is awesome
It enables almost all LINQ goodness. For example you will be able to filter your container down to objects that match a condition with an elegant one liner.
IEnumerable provides means for your API users (including yourself) to use your collection by the means of a foreach. For example, i implemented IENumerable in my Binary Tree class so i could just foreach over all of the items in the tree without having to Ctrl+C Ctrl+V all the logic required to traverse the tree InOrder.
Hope it helps :)
IEnumerable is useful if you have a collection or method which can return a bunch of things, but isn't a Dictionary, List, array, or other such predefined collection. It is especially useful in cases where the set of things to be returned might not be available when one starts outputting it. For example, an object to access records in a database might implement iEnumerable. While it might be possible for such an object to read all appropriate records into an array and return that, that may be impractical if there are a lot of records. Instead, the object could return an enumerator which could read the records in small groups and return them individually.

.Net Data structures: ArrayList, List, HashTable, Dictionary, SortedList, SortedDictionary -- Speed, memory, and when to use each? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 1 year ago.
Improve this question
.NET has a lot of complex data structures. Unfortunately, some of them are quite similar and I'm not always sure when to use one and when to use another. Most of my C# and VB books talk about them to a certain extent, but they never really go into any real detail.
What's the difference between Array, ArrayList, List, Hashtable, Dictionary, SortedList, and SortedDictionary?
Which ones are enumerable (IList -- can do 'foreach' loops)? Which ones use key/value pairs (IDict)?
What about memory footprint? Insertion speed? Retrieval speed?
Are there any other data structures worth mentioning?
I'm still searching for more details on memory usage and speed (Big-O notation)
Off the top of my head:
Array* - represents an old-school memory array - kind of like a alias for a normal type[] array. Can enumerate. Can't grow automatically. I would assume very fast insert and retrival speed.
ArrayList - automatically growing array. Adds more overhead. Can enum., probably slower than a normal array but still pretty fast. These are used a lot in .NET
List - one of my favs - can be used with generics, so you can have a strongly typed array, e.g. List<string>. Other than that, acts very much like ArrayList
Hashtable - plain old hashtable. O(1) to O(n) worst case. Can enumerate the value and keys properties, and do key/val pairs
Dictionary - same as above only strongly typed via generics, such as Dictionary<string, string>
SortedList - a sorted generic list. Slowed on insertion since it has to figure out where to put things. Can enum., probably the same on retrieval since it doesn't have to resort, but deletion will be slower than a plain old list.
I tend to use List and Dictionary all the time - once you start using them strongly typed with generics, its really hard to go back to the standard non-generic ones.
There are lots of other data structures too - there's KeyValuePair which you can use to do some interesting things, there's a SortedDictionary which can be useful as well.
If at all possible, use generics. This includes:
List instead of ArrayList
Dictionary instead of HashTable
First, all collections in .NET implement IEnumerable.
Second, a lot of the collections are duplicates because generics were added in version 2.0 of the framework.
So, although the generic collections likely add features, for the most part:
List is a generic implementation of ArrayList.
Dictionary<T,K> is a generic implementation of Hashtable
Arrays are a fixed size collection that you can change the value stored at a given index.
SortedDictionary is an IDictionary<T,K> that is sorted based on the keys.
SortedList is an IDictionary<T,K> that is sorted based on a required IComparer.
So, the IDictionary implementations (those supporting KeyValuePairs) are:
Hashtable
Dictionary<T,K>
SortedList<T,K>
SortedDictionary<T,K>
Another collection that was added in .NET 3.5 is the Hashset. It is a collection that supports set operations.
Also, the LinkedList is a standard linked-list implementation (the List is an array-list for faster retrieval).
Here are a few general tips for you:
You can use foreach on types that implement IEnumerable. IList is essentially an IEnumberable with Count and Item (accessing items using a zero-based index) properties. IDictionary on the other hand means you can access items by any-hashable index.
Array, ArrayList and List all implement IList.
Dictionary, SortedDictionary, and Hashtable implement IDictionary.
If you are using .NET 2.0 or higher, it is recommended that you use generic counterparts of mentioned types.
For time and space complexity of various operations on these types, you should consult their documentation.
.NET data structures are in System.Collections namespace. There are type libraries such as PowerCollections which offer additional data structures.
To get a thorough understanding of data structures, consult resources such as CLRS.
.NET data structures:
More to conversation about why ArrayList and List are actually different
Arrays
As one user states, Arrays are the "old school" collection (yes, arrays are considered a collection though not part of System.Collections). But, what is "old school" about arrays in comparison to other collections, i.e the ones you have listed in your title (here, ArrayList and List(Of T))? Let's start with the basics by looking at Arrays.
To start, Arrays in Microsoft .NET are, "mechanisms that allow you to treat several [logically-related] items as a single collection," (see linked article). What does that mean? Arrays store individual members (elements) sequentially, one after the other in memory with a starting address. By using the array, we can easily access the sequentially stored elements beginning at that address.
Beyond that and contrary to programming 101 common conceptions, Arrays really can be quite complex:
Arrays can be single dimension, multidimensional, or jadded (jagged arrays are worth reading about). Arrays themselves are not dynamic: once initialized, an array of n size reserves enough space to hold n number of objects. The number of elements in the array cannot grow or shrink. Dim _array As Int32() = New Int32(100) reserves enough space on the memory block for the array to contain 100 Int32 primitive type objects (in this case, the array is initialized to contain 0s). The address of this block is returned to _array.
According to the article, Common Language Specification (CLS) requires that all arrays be zero-based. Arrays in .NET support non-zero-based arrays; however, this is less common. As a result of the "common-ness" of zero-based arrays, Microsoft has spent a lot of time optimizing their performance; therefore, single dimension, zero-based (SZs) arrays are "special" - and really the best implementation of an array (as opposed to multidimensional, etc.) - because SZs have specific intermediary language instructions for manipulating them.
Arrays are always passed by reference (as a memory address) - an important piece of the Array puzzle to know. While they do bounds checking (will throw an error), bounds checking can also be disabled on arrays.
Again, the biggest hindrance to arrays is that they are not re-sizable. They have a "fixed" capacity. Introducing ArrayList and List(Of T) to our history:
ArrayList - non-generic list
The ArrayList (along with List(Of T) - though there are some critical differences, here, explained later) - is perhaps best thought of as the next addition to collections (in the broad sense). ArrayList inherit from the IList (a descendant of 'ICollection') interface. ArrayLists, themselves, are bulkier - requiring more overhead - than Lists.
IList does enable the implementation to treat ArrayLists as fixed-sized lists (like Arrays); however, beyond the additional functionallity added by ArrayLists, there are no real advantages to using ArrayLists that are fixed size as ArrayLists (over Arrays) in this case are markedly slower.
From my reading, ArrayLists cannot be jagged: "Using multidimensional arrays as elements... is not supported". Again, another nail in the coffin of ArrayLists. ArrayLists are also not "typed" - meaning that, underneath everything, an ArrayList is simply a dynamic Array of Objects: Object[]. This requires a lot of boxing (implicit) and unboxing (explicit) when implementing ArrayLists, again adding to their overhead.
Unsubstantiated thought: I think I remember either reading or having heard from one of my professors that ArrayLists are sort of the bastard conceptual child of the attempt to move from Arrays to List-type Collections, i.e. while once having been a great improvement to Arrays, they are no longer the best option as further development has been done with respect to collections
List(Of T): What ArrayList became (and hoped to be)
The difference in memory usage is significant enough to where a List(Of Int32) consumed 56% less memory than an ArrayList containing the same primitive type (8 MB vs. 19 MB in the above gentleman's linked demonstration: again, linked here) - though this is a result compounded by the 64-bit machine. This difference really demonstrates two things: first (1), a boxed Int32-type "object" (ArrayList) is much bigger than a pure Int32 primitive type (List); second (2), the difference is exponential as a result of the inner-workings of a 64-bit machine.
So, what's the difference and what is a List(Of T)? MSDN defines a List(Of T) as, "... a strongly typed list of objects that can be accessed by index." The importance here is the "strongly typed" bit: a List(Of T) 'recognizes' types and stores the objects as their type. So, an Int32 is stored as an Int32 and not an Object type. This eliminates the issues caused by boxing and unboxing.
MSDN specifies this difference only comes into play when storing primitive types and not reference types. Too, the difference really occurs on a large scale: over 500 elements. What's more interesting is that the MSDN documentation reads, "It is to your advantage to use the type-specific implementation of the List(Of T) class instead of using the ArrayList class...."
Essentially, List(Of T) is ArrayList, but better. It is the "generic equivalent" of ArrayList. Like ArrayList, it is not guaranteed to be sorted until sorted (go figure). List(Of T) also has some added functionality.
I found "Choose a Collection" section of Microsoft Docs on Collection and Data Structure page really useful
C# Collections and Data Structures : Choose a collection
And also the following matrix to compare some other features
I sympathise with the question - I too found (find?) the choice bewildering, so I set out scientifically to see which data structure is the fastest (I did the test using VB, but I imagine C# would be the same, since both languages do the same thing at the CLR level). You can see some benchmarking results conducted by me here (there's also some discussion of which data type is best to use in which circumstances).
They're spelled out pretty well in intellisense. Just type System.Collections. or System.Collections.Generics (preferred) and you'll get a list and short description of what's available.
Hashtables/Dictionaries are O(1) performance, meaning that performance is not a function of size. That's important to know.
EDIT: In practice, the average time complexity for Hashtable/Dictionary<> lookups is O(1).
The generic collections will perform better than their non-generic counterparts, especially when iterating through many items. This is because boxing and unboxing no longer occurs.
An important note about Hashtable vs Dictionary for high frequency systematic trading engineering: Thread Safety Issue
Hashtable is thread safe for use by multiple threads.
Dictionary public static members are thread safe, but any instance members are not guaranteed to be so.
So Hashtable remains the 'standard' choice in this regard.
There are subtle and not-so-subtle differences between generic and non-generic collections. They merely use different underlying data structures. For example, Hashtable guarantees one-writer-many-readers without sync. Dictionary does not.

ASP.NET C# Lists Which and When?

In C# There seem to be quite a few different lists. Off the top of my head I was able to come up with a couple, however I'm sure there are many more.
List<String> Types = new List<String>();
ArrayList Types2 = new ArrayList();
LinkedList<String> Types4 = new LinkedList<String>();
My question is when is it beneficial to use one over the other?
More specifically I am returning lists of unknown size from functions and I was wondering if there is a particular list that was better at this.
List<String> Types = new List<String>();
LinkedList<String> Types4 = new LinkedList<String>();
are generic lists, i.e. you define the data type that would go in there which decreased boxing and un-boxing.
for difference in list vs linklist, see this --> When should I use a List vs a LinkedList
ArrayList is a non-generic collection, which can be used to store any type of data type.
99% of the time List is what you'll want. Avoid the non-generic collections at all costs.
LinkedList is useful for adding or removing without shuffling items around, although you have to forego random access as a result. One advantage it does have is you can remove items whilst iterating through the nodes.
ArrayList is a holdover from before Generics. There's really no reason to use them ... they're slow and use more memory than List<>. In general, there's probably no reason to use LinkedList either unless you are inserting midway through VERY large lists.
The only thing you'll find in .NET faster than a List<> is a fixed array ... but the performance difference is surprisingly small.
See the article on Commonly Used Collection Types from MSDN for a list of the the various types of collections available to you, and their intended uses.
ArrayList is a .Net 1.0 list type.
List is a generic list introduced with generics in .Net 2.0.
Generic lists provide better compile time support. Generics lists are type safe. You cannot add objects of wrong type. Therefor you know which type the stored objects has. There are no typechecks and typecasts nessecary.
I dont know about performance differences.
This questions says something about the difference of List and LinkedList.
As mentioned, don't use ArrayList if at all possible.
Here's an bit on Wikipedia about the differences between arrays and linked lists.
In summary:
Arrays
Fast random access
Fast inserting/deleting at end
Good memory locality
Linked Lists
Fast inserting/deleting at beginning
Fast inserting/deleting at end
Fast inserting/deleting at middle (with enumerator)
Generally, use List. Don't use ArrayList; it's obsolete. Use LinkedList in the rare cases where you need to be able to add without resizing and don't mind the overhead and loss of random access.
ArrayList is probably smaller, memory-wise, since it is based on an array. It also has fast random-access to elements. However, adding or removing to the list will take longer. This might be sped up slightly if the object over-allocates under the assumption that you are going to keep adding. (That will, of course, reduce the memory advantage.)
The other lists will be slightly larger (4-to-8 bytes more memory per element), and will have poor random access times. However, it is very fast to add or remove objects to the ends of the list. Also, memory usage is usually spot-on for what you need.

Categories