I'm new to C#, and I've never understood what the number in the () at the end of a new Dictionary means!
Let's say I have a class, and I declare a property like this:
public Dictionary<InventoryType, Inventory> Inventory { get; private set; }
And then, on the constructor, I have:
Inventory = new Dictionary<InventoryType, Inventory>(6);
What does the 6 at the end mean? Thank you!
From MSDN:
Dictionary(Int32) Initializes a new instance of the Dictionary class that is empty, has the specified initial capacity, and uses the default equality comparer for the key type.
So it's the initial capacity of the dictionary, not a fixed size.
Look at this question for why you would want to set the initial capacity.
That 6 is the initial capacity. It's a hint to the collection about how big it will become, not a limit of any kind.
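A minimal sketch showing that the capacity argument is not a maximum size. It uses int keys instead of the question's InventoryType/Inventory types, and the class name CapacityHintDemo is hypothetical.

```csharp
using System.Collections.Generic;

public static class CapacityHintDemo
{
    // Creates a dictionary with an initial capacity of 6, then adds
    // itemCount entries (which may be far more than 6). Returns the Count.
    public static int AddMoreThanCapacity(int itemCount)
    {
        // 6 only pre-sizes the internal storage; it is not an upper bound.
        var inventory = new Dictionary<int, string>(6);
        for (int i = 0; i < itemCount; i++)
        {
            // Once the initial space is used up, the dictionary resizes itself.
            inventory.Add(i, "item " + i);
        }
        return inventory.Count;
    }
}
```

Calling AddMoreThanCapacity(100) returns 100: the dictionary simply grows past its initial capacity.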
When you know that a collection will hold a certain number of elements, it is slightly more efficient to allocate space for all of them at once. Otherwise the collection uses a growth algorithm, and every growth step involves copying data.
Many collections use a doubling strategy: a List<T>, for example, allocates room for 4 items on the first Add, then grows to 8, then 16, and so on (Dictionary<TKey, TValue> grows to prime-number sizes instead). Each time, a new internal array is allocated and the existing data has to be copied.
That means 6 is rather low, and pre-sizing to it is hardly worthwhile.
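To see the growth pattern for yourself, this sketch records every distinct List<T>.Capacity value while items are added. It assumes the current .NET behavior (0 before the first Add, then 4, then doubling), which is an implementation detail that could change between versions; GrowthDemo is a hypothetical name.

```csharp
using System.Collections.Generic;

public static class GrowthDemo
{
    // Adds itemCount elements to an empty List<int> and returns every
    // distinct Capacity value observed, starting with the initial one.
    public static List<int> ObservedCapacities(int itemCount)
    {
        var list = new List<int>();
        var capacities = new List<int> { list.Capacity }; // 0 before the first Add
        for (int i = 0; i < itemCount; i++)
        {
            list.Add(i);
            // A changed Capacity means a larger internal array was allocated
            // and the existing elements were copied into it.
            if (list.Capacity != capacities[capacities.Count - 1])
            {
                capacities.Add(list.Capacity);
            }
        }
        return capacities;
    }
}
```

On current .NET, ObservedCapacities(17) yields 0, 4, 8, 16, 32: four reallocations (and copies) to store 17 items.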
Related
Certain collection types in .NET have an optional "initial capacity" constructor parameter. For example:
Dictionary<string, string> something = new Dictionary<string,string>(20);
List<string> anything = new List<string>(50);
I can't seem to find what the default initial capacity is for these objects on MSDN.
If I know I will only be storing 12 or so items in a dictionary, doesn't it make sense to set the initial capacity to something like 20?
My reasoning is, assuming that the capacity grows like it does for a StringBuilder, which doubles each time the capacity is hit, and each reallocation is costly, why not pre-set the size to something you know will hold your data, with some extra room just in case? If the initial capacity is 100, and I know I will only need a dozen or so, it seems as though the rest of that memory is allocated for nothing.
If the default values are not documented, the reason is likely that the optimal initial capacity is an implementation detail and subject to change between framework versions. That is, you shouldn't write code that assumes a certain default value.
The constructor overloads with a capacity are for cases in which you know better than the class what number of items are to be expected. For example, if you create a collection of 50 values and know that this number will never increase, you can initialize the collection with a capacity of 50, so it won't have to resize if the default capacity is lower.
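A short sketch of that case, assuming the final count is known before the list is filled; PresizeDemo is a hypothetical name.

```csharp
using System.Collections.Generic;

public static class PresizeDemo
{
    // Fills a list that was created with exactly the capacity it needs.
    public static List<string> FillPresized(int count)
    {
        // The backing array is allocated once, so no intermediate arrays
        // are allocated and copied while the list fills up.
        var values = new List<string>(count);
        for (int i = 0; i < count; i++)
        {
            values.Add("value " + i);
        }
        return values;
    }
}
```

With FillPresized(50), both Count and Capacity end at 50, so the list never had to resize.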
That said, you can determine the default values using Reflector. For example, in .NET 4.0 (and probably previous versions as well),
a List<T> is initialized with a capacity of 0. When the first item is added, its capacity becomes 4. After that, whenever the capacity is reached, it is doubled.
a Dictionary<TKey, TValue> is initialized with a capacity of 0 as well, but it uses a completely different growth algorithm: it always resizes to a prime number at least double the previous size.
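The prime-size idea can be illustrated with a hypothetical growth rule: double the current size, then round up to the next prime. Primes are a common choice because a hash table maps a key to a bucket with hash modulo size. This is an assumption-labeled sketch, not the BCL code: the real Dictionary picks sizes from a precomputed prime table, so its exact sequence can differ.

```csharp
public static class PrimeGrowthSketch
{
    // Trial-division primality test; adequate for illustrative sizes.
    static bool IsPrime(int n)
    {
        if (n < 2) return false;
        for (int d = 2; (long)d * d <= n; d++)
        {
            if (n % d == 0) return false;
        }
        return true;
    }

    // Hypothetical rule: the smallest prime that is at least twice the current size.
    public static int NextSize(int current)
    {
        int candidate = current * 2;
        while (!IsPrime(candidate)) candidate++;
        return candidate;
    }

    // Returns the first `steps` sizes produced by the rule, starting at `initial`.
    public static int[] Sizes(int initial, int steps)
    {
        if (steps <= 0) return new int[0];
        var sizes = new int[steps];
        sizes[0] = initial;
        for (int i = 1; i < steps; i++)
        {
            sizes[i] = NextSize(sizes[i - 1]);
        }
        return sizes;
    }
}
```

Starting from 3, this rule produces 3, 7, 17, 37, 79.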
If you know the size, then tell it; a minor optimisation in most "small" cases, but useful for bigger collections. I would mainly worry about this if I am throwing a "decent" amount of data in, as it can then avoid having to allocate, copy and collect multiple arrays.
Most collections indeed use a doubling strategy.
Checking the source, the default capacity for both List<T> and Dictionary<TKey, TValue> is 0.
Another issue (currently) concerns ConcurrentDictionary: using its constructor to set an initial size appears to hinder its performance rather than help it.
For example, here's some example code and benchmarks I tried.
I ran the code on my machine and got similar results.
That is, specifying the initial size does nothing to speed up adding objects to the ConcurrentDictionary. In theory, I think it should, because the dictionary doesn't have to spend time or resources resizing itself.
Yes, it may not run as fast as a normal Dictionary, but I would still expect a ConcurrentDictionary with its initial size set to perform consistently faster than one without, especially when the number of items to be added is known in advance.
So the moral of the story is that setting the initial size doesn't guarantee a performance improvement.
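The original benchmark code isn't reproduced here, but a comparison of this kind could be sketched as follows. It uses the real (concurrencyLevel, capacity) constructor overload; the class name, the choice of Environment.ProcessorCount as the concurrency level, and the single-threaded loop are assumptions of this sketch.

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;

public static class ConcurrentPresizeBenchmark
{
    // Adds `count` items to a ConcurrentDictionary that is either pre-sized
    // or default-constructed, and times the adds with a Stopwatch.
    public static (int Count, long Milliseconds) TimeAdds(int count, bool presize)
    {
        ConcurrentDictionary<int, int> dict = presize
            ? new ConcurrentDictionary<int, int>(Environment.ProcessorCount, count) // (concurrencyLevel, capacity)
            : new ConcurrentDictionary<int, int>();

        var watch = Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
        {
            dict.TryAdd(i, i);
        }
        watch.Stop();

        return (dict.Count, watch.ElapsedMilliseconds);
    }
}
```

A single Stopwatch run is noisy (JIT warm-up, GC pauses), so run each variant several times and discard the first run, or use a harness such as BenchmarkDotNet, before drawing conclusions.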
Use this regular expression new List[<].*[>][(\(\))?[ ]+[{] in Visual Studio's Find in Files (Ctrl+Shift+F), with the regular-expression option enabled, to find lists that you might want to give an initial capacity ;-)
I have a class, CircularBuffer, which has a method CreateBuffer. The class does a number of things but occasionally I need to change the size of an array that is used in the class. I do not need the data any longer. Here is the class:
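static class CircularBuffer
{
    static float[,] buffer;   // up to 100 x 2048 floats
    static int columns, rows;

    public static void CreateBuffer(int columns, int rows)
    {
        // The parameters shadow the static fields, so qualify the fields explicitly.
        CircularBuffer.columns = columns;
        CircularBuffer.rows = rows;
        buffer = new float[rows, columns];
    }

    //other methods that use the buffer
}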
Now the size of the buffer is up to 100 x 2048 floats. Is this going to cause any memory issues, or will it be automatically replaced with no issues?
Thanks
Technically speaking, you are not recreating anything. You are simply creating a new array and overwriting the variable's value (the address, so to speak, of the array it references).
It's important, therefore, to distinguish what you are really replacing: you are not replacing the array, only the reference to it.
Problems? None. With your code, the old array will no longer be reachable and will therefore be eligible for collection by the GC. Whether and when the collection happens is up to the GC, but it's not something you should worry about.
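A small sketch of the point above, using a hypothetical CircularBufferDemo class modeled on the question's code: CreateBuffer only swaps the reference held in the static field, and a WeakReference to the previous array lets you check afterwards whether the GC has reclaimed it.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class CircularBufferDemo
{
    static float[,] buffer = new float[0, 0];

    public static void CreateBuffer(int columns, int rows)
    {
        // Only the reference stored in `buffer` changes; the previous array
        // becomes unreachable and therefore eligible for garbage collection.
        buffer = new float[rows, columns];
    }

    public static int Rows => buffer.GetLength(0);
    public static int Columns => buffer.GetLength(1);

    // Replaces the buffer and returns a weak reference to the old array.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static WeakReference ReplaceBuffer(int columns, int rows)
    {
        var old = new WeakReference(buffer);
        CreateBuffer(columns, rows);
        return old;
    }
}
```

After calling GC.Collect(), old.IsAlive will typically be false in an optimized build, but the runtime makes no promise about when (or whether) a given collection reclaims the array, which is exactly why you don't need to manage it yourself.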
What data structure is used to build ArrayList, given that we can add and delete values dynamically?
I assumed it used a linked list, but after some googling I found that it uses a vector, with no further details.
On modern processors, the memory cache is king. Using the cache efficiently makes an enormous difference: the processor can easily be stalled for hundreds of cycles when the program accesses an address whose content isn't cached, waiting for the very slow memory bus to supply the data.
Accessing memory is most efficient when you access it sequentially. The odds that a byte is already in the cache are then greatest, because it is very likely to be in the same cache line as the previous one. That makes arrays by far the most efficient collection object, assuming you index the array elements sequentially.
Accordingly, almost all .NET collection classes use arrays to store their data. That includes ArrayList and the hashed collections (Hashtable, Dictionary, HashSet), which keep their buckets and entries in arrays. The notable exceptions are LinkedList and the tree-based SortedSet and SortedDictionary. LinkedList should be avoided due to its very poor cache locality, except when cheap inserts and deletes at arbitrary known positions are the primary concern.
A problem with arrays is that their size is fixed, which makes it difficult to implement auto-sizing collections like ArrayList. This is solved by intentionally wasting address space: whenever the array fills up to capacity, a new array of double the previous size is allocated and the elements are copied into it. You can observe this from the Capacity property. While this sounds expensive, the cost per Add is amortized O(1), and the virtual memory sub-system in the operating system ensures that you don't actually pay for memory that you don't use.
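The doubling strategy can be sketched as a minimal growable array. GrowableArray<T> is a hypothetical class written for illustration, not the ArrayList source; it mirrors the "4 first, then double" policy described for List<T> and ArrayList.

```csharp
using System;

// Hypothetical minimal growable array illustrating capacity doubling.
public class GrowableArray<T>
{
    private T[] items = new T[0];
    private int count;

    public int Count => count;
    public int Capacity => items.Length;

    public T this[int index]
    {
        get
        {
            // Reject indexes outside the filled part of the array.
            if ((uint)index >= (uint)count) throw new ArgumentOutOfRangeException(nameof(index));
            return items[index];
        }
    }

    public void Add(T item)
    {
        if (count == items.Length)
        {
            // Full: allocate a larger array (4 first, then double) and copy.
            int newCapacity = items.Length == 0 ? 4 : items.Length * 2;
            T[] larger = new T[newCapacity];
            Array.Copy(items, larger, count);
            items = larger;
        }
        items[count++] = item;
    }
}
```

Adding 10 items leaves Count at 10 and Capacity at 16: the array was copied only three times (at 4, 8 and 16), which is why the per-Add cost averages out to a constant.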
You can avoid the not-so-cheap copying by guessing the Capacity up front. More details about that in this answer.
ArrayList internally uses an array to store the data and resizes the array whenever needed.
The Java implementation of ArrayList internally creates an array with an initial size and resizes it as needed.
You can see the implementation here:
http://www.docjar.com/html/api/java/util/ArrayList.java.html
This is for Java, but the concepts are the same for .NET.
From the MSDN page:
Implements the IList interface using an array whose size is dynamically increased as required.
Some of the benefits of using the class instead of an array directly are:
it can be used anywhere an IList is expected
it handles the resizing and copying for you when adding/removing items from the middle of the array
it keeps track of the 'last' item in the array
it provides methods for binary searching for items in the array
See here: ArrayList source
As already mentioned, it is an array underneath.
private object[] _items;
Here is the Add() method:
public virtual int Add(object value)
{
    // Grow the backing array first if it is already full.
    if (_size == _items.Length)
    {
        EnsureCapacity(_size + 1);
    }
    _items[_size] = value;
    _version++;       // invalidates any enumerators in progress
    return _size++;   // return the new item's index, then increment the count
}
As you can see, when the array is full, Add calls EnsureCapacity, which ends up calling set_Capacity (the Capacity setter):
// The setter below is what compiles to set_Capacity.
public virtual int Capacity
{
    get { return _items.Length; }
    set
    {
        if (value < _size)
        {
            throw new ArgumentOutOfRangeException("value", Environment.GetResourceString("ArgumentOutOfRange_SmallCapacity"));
        }
        if (value != _items.Length)
        {
            if (value > 0)
            {
                // Allocate the new array and copy the existing elements into it.
                object[] array = new object[value];
                if (_size > 0)
                {
                    Array.Copy(_items, 0, array, 0, _size);
                }
                _items = array;
            }
            else
            {
                // A requested capacity of 0 falls back to the default of 4.
                _items = new object[4];
            }
        }
    }
}
In other words, when the capacity needs to increase, a new, larger array is allocated and all existing elements are copied into it.
The ArrayList stores values internally as an array of objects and exposes some public helper methods to make working with the array easier (exposed via the IList interface).
When items are inserted, all the elements to the right of the insertion point are moved to the right, making inserts rather inefficient. Appends, on the other hand, are quick because there is no need to shift elements (unless the internal array has reached capacity, in which case it is replaced with a larger array).
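The cost of an insert can be seen in a small sketch of the shifting step. ArrayInsertSketch.Insert is a hypothetical helper, not ArrayList's own Insert; it assumes the array already has a free slot. Array.Copy is documented to handle overlapping source and destination ranges within the same array correctly.

```csharp
using System;

public static class ArrayInsertSketch
{
    // Inserts `value` at `index` among the first `count` slots of `items`,
    // which must have at least one free slot. Returns the new count.
    public static int Insert<T>(T[] items, int count, int index, T value)
    {
        if (count >= items.Length) throw new InvalidOperationException("No free slot; grow the array first.");
        if (index < 0 || index > count) throw new ArgumentOutOfRangeException(nameof(index));

        // Shift every element from `index` onward one slot to the right.
        // This copy is O(count - index), so inserting near the front is slow.
        Array.Copy(items, index, items, index + 1, count - index);
        items[index] = value;
        return count + 1;
    }
}
```

Appending corresponds to index == count, where nothing needs to be shifted.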
Because the values are stored internally as an array, it provides the advantages of arrays (such as efficient searches if the values are sorted).
Say I have a rolling collection of values where I specify the size of the collection and any time a new value is added, any old values beyond this specified size are dropped off. Obviously (and I've tested this) the best type of collection to use for this behavior is a Queue:
myQueue.Enqueue(newValue)
If myQueue.Count > specifiedSize Then myQueue.Dequeue()
However, what if I want to calculate the difference between the first and last items in the Queue? Obviously I can't access the items by index. But to switch from a Queue to something implementing IList seems like overkill, as does writing a new Queue-like class. Right now I've got:
Dim firstValue As Integer = myQueue.Peek()
Dim lastValue As Integer = myQueue.ToArray()(myQueue.Count - 1)
Dim diff As Integer = lastValue - firstValue
That call to ToArray() bothers me, but a superior alternative isn't coming to me. Any suggestions?
One thing you could do is keep a variable that stores the value you just enqueued: that is always the last value in the queue, so you can read it from the variable instead of calling ToArray().
Seems to me that if you need quick access to both the first and the last item, you're using the wrong data structure. Switch to a LinkedList instead, which conveniently has First and Last properties.
Be sure you only add and remove items using AddLast and RemoveFirst, to maintain the queue property. To prevent yourself from inadvertently violating it, consider creating a wrapper class around the linked list that exposes only the members you need from your queue.
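A minimal sketch of such a wrapper, adapted to the question's rolling-window use case. RollingWindow<T> and its members are hypothetical names; internally it relies only on LinkedList<T>'s AddLast, RemoveFirst, First and Last.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical fixed-size rolling window backed by LinkedList<T>.
// Only AddLast/RemoveFirst are used, so FIFO (queue) order is preserved.
public class RollingWindow<T>
{
    private readonly LinkedList<T> items = new LinkedList<T>();
    private readonly int maxSize;

    public RollingWindow(int maxSize)
    {
        if (maxSize <= 0) throw new ArgumentOutOfRangeException(nameof(maxSize));
        this.maxSize = maxSize;
    }

    public int Count => items.Count;

    public void Add(T value)
    {
        items.AddLast(value);
        if (items.Count > maxSize)
        {
            items.RemoveFirst(); // drop the oldest value
        }
    }

    // Oldest value; throws if the window is empty.
    public T First => items.Count > 0 ? items.First.Value : throw new InvalidOperationException("Window is empty.");

    // Newest value; throws if the window is empty.
    public T Last => items.Count > 0 ? items.Last.Value : throw new InvalidOperationException("Window is empty.");
}
```

With a window of 3 and the values 1 to 5 added, First is 3 and Last is 5, so the difference the question asks for is window.Last - window.First, with no ToArray() call.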
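using System.Collections.Generic;

// Queue<T>.Enqueue is not virtual, so "new" hides it rather than overriding it:
// calls made through a Queue<T> reference bypass this version.
public class LastQ<T> : Queue<T>
{
    // The most recently enqueued item.
    public T Last { get; private set; }

    public new void Enqueue(T item)
    {
        Last = item;
        base.Enqueue(item);
    }
}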
Edit:
Obviously this basic class should be more robust to do things like protect the Last property on an empty queue. But this should be enough for the basic idea.
You could use a deque (double-ended queue).
I don't think there is one built into System.Collections(.Generic) but here's some info on the data structure. If you implemented something like this you could just use PeekLeft() and PeekRight() to get the first and last values.
Of course, it will be up to you whether or not implementing your own deque is preferable to dealing with the unsexiness of ToArray(). :)
http://www.codeproject.com/KB/recipes/deque.aspx
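For comparison, a deque can be sketched on a circular array, so both ends are O(1) to read. ArrayDeque<T> and its PeekFirst/PeekLast methods are hypothetical names (not the linked article's API and not a BCL type); the sketch grows by doubling when full.

```csharp
using System;

// Hypothetical minimal deque on a circular array.
public class ArrayDeque<T>
{
    private T[] buffer = new T[4];
    private int head;   // index of the first element
    private int count;

    public int Count => count;

    public void AddFirst(T item)
    {
        EnsureSpace();
        head = (head - 1 + buffer.Length) % buffer.Length;
        buffer[head] = item;
        count++;
    }

    public void AddLast(T item)
    {
        EnsureSpace();
        buffer[(head + count) % buffer.Length] = item;
        count++;
    }

    public T RemoveFirst()
    {
        T item = PeekFirst();
        buffer[head] = default(T); // release the reference for the GC
        head = (head + 1) % buffer.Length;
        count--;
        return item;
    }

    public T PeekFirst()
    {
        if (count == 0) throw new InvalidOperationException("Deque is empty.");
        return buffer[head];
    }

    public T PeekLast()
    {
        if (count == 0) throw new InvalidOperationException("Deque is empty.");
        return buffer[(head + count - 1) % buffer.Length];
    }

    private void EnsureSpace()
    {
        if (count < buffer.Length) return;
        // Full: copy the elements, in order, into an array twice as large.
        var larger = new T[buffer.Length * 2];
        for (int i = 0; i < count; i++)
        {
            larger[i] = buffer[(head + i) % buffer.Length];
        }
        buffer = larger;
        head = 0;
    }
}
```

For the rolling-window case, AddLast plus RemoveFirst (once Count exceeds the limit) behaves like the queue, and PeekLast() - PeekFirst() gives the difference.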
Your best bet would be to keep track of the last value added to the Queue, then use the myQueue.Peek() function to see the "first" (meaning next) item in the list without removing it.