What data structure is used to build ArrayList, given that we can add and delete values on it dynamically?
I assumed it uses a linked list, but after some googling I found that it uses a vector, with no further details.
On modern processors, the memory cache is king. Using the cache efficiently makes an enormous difference: the processor can easily be stalled for hundreds of cycles when the program accesses an address whose content isn't cached, waiting for the very slow memory bus to supply the data.
Accessing memory is most efficient when you access it sequentially. The odds that a byte will be available in the cache are then the greatest, since it is very likely to be present in the same cache line. This makes arrays by far the most efficient collection object, assuming you index the array elements sequentially.
Accordingly, all .NET collection classes except LinkedList use arrays to store data. That includes the hashed collections (Hashtable, Dictionary, HashSet), which keep their buckets in arrays, and it includes ArrayList. LinkedList should be avoided due to its very poor cache locality, except when cheap inserts and deletes at random known locations are the primary concern.
A problem with arrays is that their size is fixed, which makes it difficult to implement auto-sizing collections like ArrayList. This is solved by intentionally wasting address space. Whenever the array fills up to capacity, the array is reallocated and the elements are copied. The new array is double the previous size; you can observe this from the Capacity property. While this sounds expensive, the algorithm is amortized O(1), and the virtual memory sub-system in the operating system ensures that you don't actually pay for memory that you don't use.
You can avoid the not-so-cheap copying by guessing the Capacity up front. More details about that in this answer.
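A minimal sketch of how the growth described above could be observed, assuming a current .NET runtime; `CapacityDemo` and `ObserveGrowth` are illustrative names, and the starting capacity of an empty ArrayList may differ between framework versions.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class CapacityDemo
{
    // Adds `count` items and records every distinct Capacity value seen.
    public static List<int> ObserveGrowth(int count)
    {
        var list = new ArrayList();
        var capacities = new List<int> { list.Capacity };
        for (int i = 0; i < count; i++)
        {
            list.Add(i);
            if (list.Capacity != capacities[capacities.Count - 1])
            {
                capacities.Add(list.Capacity);
            }
        }
        return capacities;
    }

    public static void Main()
    {
        // Each reallocation shows up as a jump in Capacity.
        Console.WriteLine(string.Join(", ", ObserveGrowth(100)));

        // Pre-sizing avoids the intermediate reallocations and copies.
        var presized = new ArrayList(100);
        Console.WriteLine(presized.Capacity);
    }
}
```

If the final size is roughly known, passing it to the constructor means the backing array is allocated once instead of being regrown several times.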
ArrayList internally uses an array to store the data and resizes the array whenever needed.
The Java implementation of ArrayList internally creates an array with an initial size and resizes it as needed.
You can see the implementation here:
http://www.docjar.com/html/api/java/util/ArrayList.java.html
This is for Java, but the concepts are the same for .NET.
From the MSDN page:
Implements the IList interface using an array whose size is dynamically increased as required.
Some of the benefits of using the class instead of an array directly are:
it can be used anywhere an IList is expected
it handles the resizing and copying for you when adding/removing items from the middle of the array
it keeps track of the 'last' item in the array
it provides methods for binary searching for items in the array
See here: ArrayList source
As already mentioned, it is an array underneath.
private object[] _items;
Here is the Add() method:
public virtual int Add(object value)
{
    if (this._size == this._items.Length)
    {
        this.EnsureCapacity(this._size + 1);
    }
    this._items[this._size] = value;
    this._version++;
    // Returns the index at which the value was stored.
    return this._size++;
}
As you can see, EnsureCapacity is called...which ends up calling set_Capacity:
public virtual void set_Capacity(int value)
{
    if (value < this._size)
    {
        throw new ArgumentOutOfRangeException("value", Environment.GetResourceString("ArgumentOutOfRange_SmallCapacity"));
    }
    if (value != this._items.Length)
    {
        if (value > 0)
        {
            object[] array = new object[value];
            if (this._size > 0)
            {
                Array.Copy(this._items, 0, array, 0, this._size);
            }
            this._items = array;
        }
        else
        {
            // A requested capacity of zero falls back to four slots.
            this._items = new object[4];
        }
    }
}
Here the entire array is copied to a larger array if the capacity needs to be increased.
The ArrayList stores values internally as an array of objects and exposes some public helper methods to make working with the array easier (exposed via the IList interface).
When items are inserted, all the elements to the right of the insertion point are moved to the right, making inserts rather inefficient. Appends, on the other hand, are quick because there is no need to shift elements (unless the internal array has reached capacity, in which case it is replaced with a larger array).
Because the values are stored internally as an array, it provides the advantages of arrays (such as efficient searches if the values are sorted).
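To make the difference between appending and inserting concrete, here is a simplified sketch. `SimpleArrayList` is a hypothetical stand-in written for illustration, not the framework's code, and the starting capacity of 4 and the doubling policy are assumptions taken from the discussion above.

```csharp
using System;

// Hypothetical, simplified stand-in for ArrayList; not the framework's code.
public class SimpleArrayList
{
    private object[] _items = new object[4];
    private int _size;

    public int Count => _size;
    public int Capacity => _items.Length;

    public object this[int index]
    {
        get
        {
            if (index < 0 || index >= _size) throw new ArgumentOutOfRangeException(nameof(index));
            return _items[index];
        }
    }

    // Append: no shifting, only an occasional grow-and-copy.
    public void Add(object value)
    {
        EnsureCapacity(_size + 1);
        _items[_size++] = value;
    }

    // Insert: every element from index onwards moves one slot right, O(n).
    public void Insert(int index, object value)
    {
        if (index < 0 || index > _size) throw new ArgumentOutOfRangeException(nameof(index));
        EnsureCapacity(_size + 1);
        Array.Copy(_items, index, _items, index + 1, _size - index);
        _items[index] = value;
        _size++;
    }

    private void EnsureCapacity(int min)
    {
        if (min <= _items.Length) return;
        var bigger = new object[Math.Max(_items.Length * 2, min)];
        Array.Copy(_items, bigger, _size);
        _items = bigger;
    }
}

public static class SimpleArrayListDemo
{
    public static void Main()
    {
        var list = new SimpleArrayList();
        for (int i = 0; i < 5; i++) list.Add(i);   // the fifth Add triggers a grow
        list.Insert(0, "first");                   // shifts all five elements right
        Console.WriteLine($"{list[0]}, {list[1]}, count={list.Count}, capacity={list.Capacity}");
    }
}
```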
Related
I have a class, CircularBuffer, which has a method CreateBuffer. The class does a number of things but occasionally I need to change the size of an array that is used in the class. I do not need the data any longer. Here is the class:
static class CircularBuffer
{
    static Array[,] buffer;
    static int columns, rows;
    public static void CreateBuffer(int columns, int rows)
    {
        buffer = new Array[rows,columns];
    }
    //other methods that use the buffer
}
Now the size of the buffer is up to 100 x 2048 floats. Is this going to cause any memory issues, or will it be automatically replaced with no issues?
Thanks
You are, technically speaking, not recreating anything. You are simply creating a new array and overwriting the variable's value (the address, so to speak, of the array it's referencing).
It's therefore important that you distinguish what you are really replacing: you are not replacing the array, only the reference to it.
Problems? None. In your code, the old array will no longer be reachable and will therefore be eligible for collection by the GC. Whether the collection ever happens is up to the GC, but it's not something you should worry about.
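A small sketch of the distinction between the array and the reference to it. `BufferDemo` and its `Current` property are hypothetical names, and the element type `float` is an assumption based on the question's mention of floats.

```csharp
using System;

public static class BufferDemo
{
    private static float[,] buffer;

    public static float[,] Current => buffer;

    public static void CreateBuffer(int columns, int rows)
    {
        // Allocates a fresh array; the field now refers to the new one.
        buffer = new float[rows, columns];
    }

    public static void Main()
    {
        CreateBuffer(2048, 100);
        float[,] old = Current;      // a second reference keeps the old array alive
        CreateBuffer(1024, 50);

        // Two distinct arrays exist; only the field's reference was replaced.
        Console.WriteLine(ReferenceEquals(old, Current));
        Console.WriteLine(old.Length);

        old = null; // with no references left, the old array is eligible for collection
    }
}
```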
I'm trying to debug the reason for high CPU usage in our legacy website, and from looking at some analysis in DebugDiag, I suspect that the number of objects on the LOH, and the subsequent GC collections, could be a reason. In one .dbg file we have ~3.5 GB on the LOH, with the majority of those objects being strings.
I'm aware that for objects to go on the LOH, they must be over 85000 bytes.
What I'm not sure of is if this refers to, for example, a single array. Or can it refer to a large object graph?
What I mean is: suppose I have an object Foo that contains n other objects, each containing n objects themselves, with each of these objects containing strings. If the total size of Foo (and all its child objects) is greater than 85000 bytes, would Foo be placed on the LOH? Or, if somewhere in the Foo object graph there was a single array greater than 85000 bytes, would it just be that array that got placed on the LOH?
Thanks.
You are right: if an array is larger than 85,000 bytes, only that array is placed on the LOH, not the entire object that references it. To illustrate, I created this example.
class Program
{
    static void Main(string[] args)
    {
        Obj o = new Obj();
        o.Allocate(85000);
        Console.WriteLine(System.GC.GetGeneration(o));
        Console.WriteLine(System.GC.GetGeneration(o.items));
        Console.WriteLine(System.GC.GetGeneration(o.items2));
        Console.WriteLine(System.GC.GetGeneration(o.Data));
        Console.ReadLine();
    }
    class Obj
    {
        public byte[] items = null;
        public byte[] items2 = null;
        public string Data = string.Empty;
        public void Allocate(int i)
        {
            items = new byte[i];
            items2 = new byte[10];
            Data = System.Text.Encoding.UTF8.GetString(items);
        }
    }
}
Notice the string Data here: it is also placed on the LOH, because a string is essentially a character array. items2 is not on the LOH and items is, but the object o itself is not on the LOH.
Only individual objects greater than 85,000 bytes go into the LOH. An object graph can total more than that but not be in the LOH so long as no individual object crosses the limit. Byte arrays and strings are the most typical culprits.
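As a rough illustration of the point about object graphs, the sketch below compares one large array with a graph of small arrays carrying the same total payload. `LohGraphDemo` and its method names are made up for this example, and it assumes the usual behaviour that objects on the LOH are reported as belonging to the oldest generation.

```csharp
using System;
using System.Collections.Generic;

public static class LohGraphDemo
{
    // One large object: a single 100,000-byte array crosses the LOH threshold.
    public static int GenerationOfSingleLargeArray()
    {
        var big = new byte[100000];
        return GC.GetGeneration(big);
    }

    // A graph with the same total payload, split into 100 pieces of 1,000 bytes.
    public static List<byte[]> BuildGraphOfSmallArrays()
    {
        var pieces = new List<byte[]>();
        for (int i = 0; i < 100; i++)
        {
            pieces.Add(new byte[1000]);
        }
        return pieces;
    }

    public static void Main()
    {
        // The large array is reported in the oldest generation.
        Console.WriteLine(GenerationOfSingleLargeArray());

        // Freshly allocated small arrays normally start in a young generation,
        // even though the graph as a whole holds about 100,000 bytes.
        foreach (byte[] piece in BuildGraphOfSmallArrays().GetRange(0, 3))
        {
            Console.WriteLine(GC.GetGeneration(piece));
        }
    }
}
```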
I agree with @daspek and @dotnetstep.
For instance, in @dotnetstep's example, the three reference fields of o take up 12 bytes on a 32-bit machine and 24 bytes on a 64-bit machine (not counting the object header).
All the fields defined in the Obj class are reference types, so all Obj stores are pointers to the heap locations where its child elements (in this case arrays) are created.
Each of these objects is an array (a string being essentially an array of char), so they can be placed on the LOH if they exceed the limit of 85,000 bytes.
The only way a regular object can be placed on the LOH every time it is created is if it has 21,251 or more fields on a 32-bit machine, or 10,626 or more on a 64-bit machine, assuming all the fields are reference types. For all intents and purposes, that is not a practical kind of class to create; it will be extremely rare to come across such a class.
When the contained fields are structs, they are considered part of the class's definition. So rather than holding a 4-byte heap address, the class stores the contents of the struct as part of its own layout. A good number of fields with large structs may not quickly get one to the limit of 85,000 bytes, but it can still be breached.
I'm new to C#, and I never understood what the value inside the () at the end of a dictionary constructor means!
Let's say I have a class, and I declare a property like this:
public Dictionary<InventoryType, Inventory> Inventory { get; private set; }
And then, in the constructor, I have:
Inventory = new Dictionary<InventoryType, Inventory>(6);
What does the 6 at the end mean? Thank you!
From MSDN:
Dictionary(Int32) Initializes a new instance of the Dictionary class that is empty, has the specified initial capacity, and uses the default equality comparer for the key type.
So it's the initial size of the dictionary.
Look at this question for why you would want to set the initial capacity.
That 6 is the initial capacity. It's a soft indication to the collection of how big it will be; it's not a limit of any kind.
When you know that a collection will hold a certain number of elements, it is slightly more efficient to allocate the space for those elements at once. Otherwise the collection will use an algorithm to grow, but that involves copying data.
Most collections start with capacity=4, and when that is filled they grow to 8, then to 16, etc. Each time, a new internal array (depending on the actual collection) is allocated and the existing data has to be copied.
And that means that 6 is rather low, and hardly worthwhile.
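A short sketch showing that the constructor argument is a starting capacity rather than a limit. `InitialCapacityDemo` and `FillPastCapacity` are illustrative names, and the key and value types are chosen arbitrarily for the example.

```csharp
using System;
using System.Collections.Generic;

public static class InitialCapacityDemo
{
    // Creates a dictionary with the given initial capacity and adds `items` entries.
    public static int FillPastCapacity(int capacity, int items)
    {
        var inventory = new Dictionary<int, string>(capacity);
        for (int i = 0; i < items; i++)
        {
            inventory[i] = "item " + i; // grows transparently once capacity is exceeded
        }
        return inventory.Count;
    }

    public static void Main()
    {
        // Twenty entries fit even though the initial capacity was 6.
        Console.WriteLine(FillPastCapacity(6, 20));

        // The same constructor argument on List<T>: capacity is reserved, Count stays 0.
        var list = new List<int>(6);
        Console.WriteLine(list.Capacity);
        Console.WriteLine(list.Count);
    }
}
```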
I have been looking at .NET libraries using ILSpy and have come across List<T> class definition in System.Collections.Generic namespace. I see that the class uses methods like this one:
// System.Collections.Generic.List<T>
/// <summary>Removes all elements from the <see cref="T:System.Collections.Generic.List`1" />.</summary>
public void Clear()
{
    if (this._size > 0)
    {
        Array.Clear(this._items, 0, this._size);
        this._size = 0;
    }
    this._version++;
}
So, the Clear() method of the List<T> class actually uses Array.Clear method. I have seen many other List<T> methods that use Array stuff in the body.
Does this mean that List<T> is actually an undercover Array or List only uses some part of Array methods?
I know lists are type safe and don't require boxing/unboxing but this has confused me a bit.
The list class is not itself an array. In other words, it does not derive from an array. Instead it encapsulates an array that is used by the implementation to hold the list's member elements.
Since List<T> offers random access to its elements, and those elements are indexed 0..Count-1, using an array to store the elements is the obvious implementation.
This tends to surprise C++ programmers who know std::list, which is a linked list. That is covered in .NET as well, by the LinkedList class, and it has the same perf characteristics: O(1) for inserts and deletes.
You should, however, avoid it in general. Linked lists do not perform well on modern processors, which depend greatly on the CPU caches to get reasonable performance with memory that's many times slower than the execution core. A simple array is by far the data structure that takes most advantage of the cache: accessing an element gives very high odds that subsequent elements are present in the cache as well. That is not the case for a linked list, whose elements tend to be scattered throughout the address space, making a cache miss likely. Such misses can be very expensive, as much as 200 cycles with the CPU doing nothing but waiting on the memory sub-system to supply the data.
But do keep the perf characteristics in mind: adding or removing an element that is not at the end of the List costs O(n), just like an array. And a large List can generate a lot of garbage as the array needs to be expanded; setting the Capacity property up front can help a lot to avoid that. More about that in this answer. Otherwise, the exact same concerns apply as for std::vector<>.
Yes, List<T> uses an array internally to store the items, although in most cases the array is actually larger than the number of elements in the collection -- it has some extra "padding" at the end so that you can add new items without it having to reallocate memory every time. It keeps track of the actual size of the collection with a separate field (you can see this._size in your generated code). When you add more elements than the current array has room for, it will automatically allocate a new larger array -- twice as big, I think -- and copy over all the existing elements.
If you're concerned about a List<T> using more memory than necessary, you can set the size of the array explicitly with the constructor overload that accepts a capacity parameter, if you know the size in advance, or call the TrimExcess() method to make sure the array is (close to) the actual size of the collection.
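The relationship between Count, Capacity and TrimExcess() can be sketched as follows; `TrimDemo` and `Measure` are illustrative names, and the sizes are arbitrary.

```csharp
using System;
using System.Collections.Generic;

public static class TrimDemo
{
    // Returns the item count plus the capacity before and after trimming.
    public static (int Count, int Before, int After) Measure()
    {
        var list = new List<int>(1000);   // room for 1000 items up front
        for (int i = 0; i < 10; i++)
        {
            list.Add(i);
        }
        int before = list.Capacity;
        list.TrimExcess();                // shrinks the backing array to fit
        return (list.Count, before, list.Capacity);
    }

    public static void Main()
    {
        var (count, before, after) = Measure();
        Console.WriteLine($"Count={count}, Capacity before={before}, after={after}");
    }
}
```

Trimming allocates a smaller array and copies the elements, so it only pays off when the list is not expected to grow again soon.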
Random access memory is an array, so in that sense all data structures from linked-lists to heaps and beyond, that rely on random-access to memory for their performance behaviour, are built on the array that is system memory. It is more a question of how many-levels of abstraction are in between.
Of course in a modern virtual memory machine, the random-access system memory is itself an abstraction built on a complicated virtual-memory model of multi-tier pipelined caches, non-cached RAM, and disk.
So, I need an array of items, and I was wondering which one would be the fastest/best to use (in C#). I'll be doing the following things:
Adding elements at the end
Removing elements at the start
Looking at the first and last element (every frame)
Clearing it occasionally
Converting it to a normal array (not a list; I'm using iTween and it asks for a normal array). I'll do this almost every frame.
So, what would be the best to use considering these things? Especially the last one, since I'm doing that every frame. Should I just use an array, or is there something else that converts very fast to array and also has easy adding/removing of elements at the start & end?
Requirements 1) and 2) point to a Queue<T>, it is the only standard collection optimized for these 2 operations.
3) You'll need a little trickery for getting at the Last element, First is Peek().
4) is simple (.Clear())
5) The standard .ToArray() method will do this.
You will not escape copying all elements (O(n)) for item 5)
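One possible form of the "little trickery" for the last element is to remember it on every Enqueue. `FrameQueue<T>` below is a hypothetical wrapper written for illustration, not a framework type.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper around Queue<T> that also remembers the newest element.
public class FrameQueue<T>
{
    private readonly Queue<T> _queue = new Queue<T>();
    private T _last;

    public int Count => _queue.Count;

    public void Enqueue(T item)
    {
        _queue.Enqueue(item);
        _last = item;               // the most recently added item is always the last
    }

    public T Dequeue() => _queue.Dequeue();

    public T First => _queue.Peek();

    public T Last
    {
        get
        {
            if (_queue.Count == 0) throw new InvalidOperationException("Queue is empty.");
            return _last;
        }
    }

    public void Clear()
    {
        _queue.Clear();
        _last = default(T);
    }

    // Still an O(n) copy, as noted above.
    public T[] ToArray() => _queue.ToArray();
}

public static class FrameQueueDemo
{
    public static void Main()
    {
        var q = new FrameQueue<int>();
        q.Enqueue(1);
        q.Enqueue(2);
        q.Enqueue(3);
        q.Dequeue();
        Console.WriteLine($"first={q.First}, last={q.Last}, array length={q.ToArray().Length}");
    }
}
```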
You could take a look at LinkedList<T>.
It has O(1) support for inspecting, adding and removing items at the beginning or end. It requires O(n) to copy to an array, but that seems unavoidable. The copy could be avoided if the API you were using accepted an ICollection<T> or IEnumerable<T>, but if that can't be changed then you may be stuck with using ToArray.
If your list changes less than once per frame then you could cache the array and only call ToArray again if the list has changed since the previous frame. Here's an implementation of a few of the methods, to give you an idea of how this potential optimization can work:
private LinkedList<T> list = new LinkedList<T>();
private bool isDirty = true;
private T[] array;
public void Enqueue(T t)
{
    list.AddLast(t);
    isDirty = true;
}
public T[] ToArray()
{
    if (isDirty)
    {
        array = list.ToArray();
        isDirty = false;
    }
    return array;
}
I'm assuming you are using classes (and not structs)? (If you are using structs (value type) then that changes things a bit.)
The System.Collections.Generic.List class lets you do all that, and quickly. The only part that could be done better with a LinkedList is removing from the start, but a single block memory copy isn't much pain, and it will create arrays without any hassle.
I wouldn't recommend using a Linked List, especially if you are only removing from the start or end. Each addition (with the standard LinkedList collection) requires a memory allocation (it has to build an object to reference what you actually want to add).
Lists also have lots of convenient functions, which you need to be careful when using if performance is an issue. Lists are essentially arrays which get bigger as you add stuff (every time you overfill them, they get much bigger, which saves excessive memory operations). Clearing them requires no effort, and leaves the memory allocated to be used another day.
In personal experience, .NET isn't suited to generic Linked Lists, you need to be writing your code specifically to work with them throughout. Lists:
Are easy to use
Do everything you want
Won't leave your memory looking like swiss cheese (well, as best you can do when you are allocating a new array every frame - I recommend you give the garbage collector the chance to get rid of any old arrays before making a new one if these Arrays are going to be big by re-using any array references and nulling any you don't need).
The right choice will depend heavily on the specifics of the application, but List is always a safe bet if you ask me, and you won't have to write any structure specific code to get it working.
If you do feel like using Lists, you'll want to look into these methods and properties:
ToArray() // Makes those arrays you want
Clear() // Clears the array
Add(item) // Adds an item to the end
RemoveAt(index) // index 0 for the first item, .Count - 1 for the last
Count // Retrieves the number of items in the list; on List<T> this is a cheap O(1) property read
Sorry if this whole post is overkill.
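How about a circular array? If you keep the indices of the first and last elements, adding at the end, removing from the start, and looking at either end are all O(1); only the conversion to a plain array still has to copy the elements.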
EDIT: You could take a C++ vector approach for capacity: double the size when it gets full.
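A minimal sketch of such a circular array, assuming a starting capacity of 4 and doubling when full. `RingBuffer<T>` and its member names are hypothetical, and ToArray() remains an O(n) copy.

```csharp
using System;

// Hypothetical circular array; grows by doubling when it is full.
public class RingBuffer<T>
{
    private T[] _items = new T[4];
    private int _head;   // index of the first element
    private int _count;  // number of stored elements

    public int Count => _count;

    public T First => _count > 0 ? _items[_head] : throw new InvalidOperationException("Buffer is empty.");

    public T Last => _count > 0
        ? _items[(_head + _count - 1) % _items.Length]
        : throw new InvalidOperationException("Buffer is empty.");

    public void AddLast(T item)
    {
        if (_count == _items.Length) Grow();
        _items[(_head + _count) % _items.Length] = item;
        _count++;
    }

    public T RemoveFirst()
    {
        if (_count == 0) throw new InvalidOperationException("Buffer is empty.");
        T item = _items[_head];
        _items[_head] = default(T);              // release the reference for the GC
        _head = (_head + 1) % _items.Length;
        _count--;
        return item;
    }

    public void Clear()
    {
        Array.Clear(_items, 0, _items.Length);
        _head = 0;
        _count = 0;
    }

    public T[] ToArray()
    {
        var result = new T[_count];
        CopyTo(result);
        return result;
    }

    // Copies the elements in logical order, unwrapping the ring if needed.
    private void CopyTo(T[] destination)
    {
        int firstPart = Math.Min(_count, _items.Length - _head);
        Array.Copy(_items, _head, destination, 0, firstPart);
        Array.Copy(_items, 0, destination, firstPart, _count - firstPart);
    }

    private void Grow()
    {
        var bigger = new T[_items.Length * 2];
        CopyTo(bigger);
        _items = bigger;
        _head = 0;
    }
}

public static class RingBufferDemo
{
    public static void Main()
    {
        var ring = new RingBuffer<int>();
        for (int i = 1; i <= 4; i++) ring.AddLast(i);
        ring.RemoveFirst();
        ring.RemoveFirst();
        for (int i = 5; i <= 7; i++) ring.AddLast(i);   // wraps around, then grows
        Console.WriteLine(string.Join(", ", ring.ToArray()));
    }
}
```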
A regular List will do the work, and it is faster than LinkedList for inserts.
Adding elements at the end -> myList.Add(item)
Removing elements at the start -> myList.RemoveAt(0)
Looking at the first and last element (every frame) -> myList[0] or myList[myList.Count - 1]
Clearing it occasionally -> myList.Clear()
Converting it to a normal array (not a list. I'm using iTween and it asks a normal array.) I'll do this almost every frame. -> myList.ToArray()