I have the following question:
List<int> list = new List<int>(10);
for (int i = 0; i < 10; i++)
list.Add(i);
Now list.Count and list.Capacity are both 10. That's fine. But what happens when I try to remove the first item?
list.RemoveAt(0);
Count is now 9 and Capacity is still 10, but what happened inside the list? Did it have to walk through all the elements, like this:
list[0] = list[1];
list[1] = list[2];
// etc...
list[9] = default(T); // 0 for int here; null for reference types
?
Maybe it would be better to just do something like this myself:
list[0] = list[list.Count - 1];
? But the item order would change in this case.
And how long will list.RemoveAt(0) take if I have a List with 10,000,000 elements and a preinitialized capacity? Will there be any difference if the List does not have a preinitialized capacity?
UPD:
Looked into the source (didn't know it was freely available o.O):
// Removes the element at the given index. The size of the list is
// decreased by one.
//
public void RemoveAt(int index) {
if ((uint)index >= (uint)_size) {
ThrowHelper.ThrowArgumentOutOfRangeException();
}
_size--;
if (index < _size) {
Array.Copy(_items, index + 1, _items, index, _size - index);
}
_items[_size] = default(T);
_version++;
}
So it really has Array.Copy inside. What a pity.
Thanks to #TomTom.
Why don't you go into the source of List and check, and then write some tests? Obviously this is highly important to you. Anyhow, the ton of questions you have makes this quite too broad.
In general, since the sources are public, it often helps to just look into them.
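Taking that advice, a minimal test sketch (the size comes from the question; Stopwatch is fully qualified to avoid a using directive, and timings will vary by machine):
var list = new List<int>(10000000);        // preinitialized capacity
for (int i = 0; i < 10000000; i++)
    list.Add(i);
var sw = System.Diagnostics.Stopwatch.StartNew();
list.RemoveAt(0);                          // shifts ~10M elements left via Array.Copy
sw.Stop();
Console.WriteLine(sw.Elapsed);
// The capacity only affects the Add phase; RemoveAt(0) is O(n) either way.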
Take a look at LinkedList. Removing the first item from it is only O(1).
As you pointed out, for a generic List a RemoveAt(0) operation takes O(N) for a list of N items (as it has to shift the remaining items). This is because a List is backed by an array.
Per MSDN, removing index I from a List with count C takes C - I operations. You can use this to answer your question about the initial capacity (no, it doesn't help).
You can use other data structures, like a LinkedList, which is implemented as a linked list (as the name suggests) and will remove the first item in O(1). However, other operations are significantly worse than on a List.
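A rough sketch of that trade-off (the size is arbitrary; Enumerable.Range and ElementAt need using System.Linq):
var linked = new LinkedList<int>(Enumerable.Range(0, 1000000));
linked.RemoveFirst();                  // O(1): just relinks the head node
int middle = linked.ElementAt(500000); // but no indexer: this walks the nodes, O(n)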
This is what happens:
public void RemoveAt(int index) {
if ((uint)index >= (uint)_size) {
ThrowHelper.ThrowArgumentOutOfRangeException();
}
Contract.EndContractBlock();
_size--;
if (index < _size) {
Array.Copy(_items, index + 1, _items, index, _size - index);
}
_items[_size] = default(T);
_version++;
}
Look it up at:
http://referencesource.microsoft.com/#mscorlib/system/collections/generic/list.cs,3d46113cc199059a
A doubly linked list is the fastest here; alternatively, use unsafe pointer manipulation.
I am exploring the fastest way to iterate through three sorted lists to find the position of the first item which is equal to or less than a double value. The lists contain two columns of doubles.
I have the two following working examples attached below; these are wrapped in a bigger while loop (which also modifies the currentPressure list, changing the [0] value). But considering the number of rows (500,000+) being parsed by the bigger while loop, the code below is too slow (one iteration of the three while loops takes >20 ms).
"allPressures" contains all rows while currentPressure is modified by the remaining code. The while loops are used to align the time from the Flow, Rpm and Position lists to the Time in the pressure list.
In other words, I am trying to find the quickest way to determine the x for which, for instance,
FlowList[x].Time <= currentPressure[0].Time
Any suggestions are greatly appreciated!
Examples:
for (int i = 0; i < allPressures.Count; i++)
{
if (FlowList[i].Time >= currentPressure[0].Time)
{
fl = i;
break;
}
}
for (int i = 0; i < allPressures.Count; i++)
{
if (RpmList[i].Time >= currentPressure[0].Time)
{
rp = i;
break;
}
}
for (int i = 0; i < allPressures.Count; i++)
{
if (PositionList[i].Time >= currentPressure[0].Time)
{
bp = i;
break;
}
}
Using while loops:
while (FlowList[fl].Time < currentPressure[0].Time)
{
fl++;
}
while (RpmList[rp].Time < currentPressure[0].Time)
{
rp++;
}
while (PositionList[bp].Time < currentPressure[0].Time)
{
bp++;
}
The problem is that you are doing a linear search. This means that in the worst case you are iterating over all the elements in your lists. This gives you a computational complexity of O(3*n), where n is the length of your lists and 3 is the number of lists you are searching.
Since your lists are sorted, you can use the much faster binary search, which has a complexity of O(log(n)), or O(3*log(n)) in your case.
Luckily you don't have to implement it yourself, because .NET offers the helper method List<T>.BinarySearch(). You will need the overload that takes a custom comparer, because you want to compare PressureData objects.
Since you are looking for the index of the closest value that's less than your search value, you'll have to use a little trick: when BinarySearch() doesn't find a matching value, it returns the bitwise complement of the index of the next element that is larger than the search value (that's what the ~index below undoes). From this it's easy to find the previous element that is smaller than the search value.
Here is an extension method that implements this:
public static int FindMaxIndex<T>(
this List<T> sortedList, T inclusiveUpperBound, IComparer<T> comparer = null)
{
var index = sortedList.BinarySearch(inclusiveUpperBound, comparer);
// The max value was found in the list. Just return its index.
if (index >= 0)
return index;
// The max value was not found and "~index" is the index of the
// next value greater than the search value.
index = ~index;
// There are values in the list less than the search value.
// Return the index of the closest one.
if (index > 0)
return index - 1;
// All values in the list are greater than the search value.
return -1;
}
Test it at https://dotnetfiddle.net/kLZsM5
Use this method with a comparer that understands PressureData objects:
var pdc = Comparer<PressureData>.Create((x, y) => x.Time.CompareTo(y.Time));
var fl = FlowList.FindMaxIndex(currentPressure[0], pdc);
Here is a working example: https://dotnetfiddle.net/Dmgzsv
Let's say I have a list, messages, with three items. I want to loop through them and remove one item at a time.
for (int i = 0; i < messages.Count; i++)
{
messages.RemoveAt(i);
}
(I've removed lots of irrelevant code)
What happens to the remaining messages after the first iteration? Are they moved to another index, or can I do it like this to remove all three messages?
Thank you
The indexes of all elements after the one you remove will be decremented.
If you want to avoid this in your loop, let it run in reverse (delete from the highest index to the lowest):
for (int i = messages.Count - 1; i >= 0; i--)
{
messages.RemoveAt(i);
}
or just use
messages.Clear()
to delete all elements at once without taking care about any indices.
If you just want to clear the List, it's also more efficient to use Clear, since it is an O(n) operation. RemoveAt is O(n) as well, but it sits inside another O(n) loop, which makes the whole thing O(n^2). Not that it would matter with the 3 elements mentioned in your example, but with larger lists it would certainly make a difference.
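A sketch that makes the difference visible (assuming using System.Diagnostics and using System.Linq; the sizes and timings are purely illustrative):
var slow = new List<int>(Enumerable.Range(0, 100000));
var fast = new List<int>(Enumerable.Range(0, 100000));

var sw = Stopwatch.StartNew();
while (slow.Count > 0)
    slow.RemoveAt(0);          // each call shifts the remainder: O(n^2) overall
Console.WriteLine(sw.Elapsed);

sw.Restart();
fast.Clear();                  // a single O(n) pass
Console.WriteLine(sw.Elapsed);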
In your code, it's simpler to just call messages.Clear();. There's no need to remove each element separately.
Your code will skip every other element as it removes them, until the for loop's condition is no longer met. It will remove the elements originally at indexes 0 and 2, because you said your collection has three elements.
Let's step through your algorithm:
Initially, the list has three items, listed with their indexes: 0: "Hello", 1: "World", and 2: "Foo".
Your loop removes the element at index 0. The list now looks like this:
0: "World", 1: "Foo"
However, your loop executes again, since i now equals 1 and 1 < 2. The element at index 1 is then removed:
0: "World"
i is incremented to 2 and the conditional is no longer met (i is not less than 1). Your list now consists of what used to be the second element.
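A quick demo of that walkthrough:
var messages = new List<string> { "Hello", "World", "Foo" };
for (int i = 0; i < messages.Count; i++)
    messages.RemoveAt(i);     // removes "Hello" (at index 0), then "Foo" (at index 1)
Console.WriteLine(string.Join(", ", messages));   // prints: World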
You need to iterate backward:
for (int i = messages.Count - 1; i >=0; i--)
{
messages.RemoveAt(i);
}
This is because with your current loop, you will be left with one item if your list contains 3 items.
If you want to remove all items from your list, there is also the List<T>.RemoveAll method (it takes a predicate).
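For example (an always-true predicate empties the list):
messages.RemoveAll(m => true);            // removes every element
// or, with a real condition:
messages.RemoveAll(string.IsNullOrEmpty); // removes only null/empty strings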
They're moved; see MSDN on the List<T>.RemoveAt method:
When you call RemoveAt to remove an item, the remaining items in the
list are renumbered to replace the removed item. For example, if you
remove the item at index 3, the item at index 4 is moved to the 3
position.
To remove all elements, the Clear method is more suitable.
Go through the loop in reverse:
for(int i = messages.Count - 1; i >= 0 ; i--) {
messages.RemoveAt(i);
}
You could just change it to always delete the first one:
List<string> messages = new List<string>();
messages.Add("a");
messages.Add("b");
messages.Add("c");
for (int i = 0; i < messages.Count; i++)
{
messages.RemoveAt(0);
}
or clear the whole list in one statement:
messages.Clear()
The .NET Reference Source has the following definition of the RemoveAt method:
public void RemoveAt(int index)
{
if ((uint)index >= (uint)_size)
ThrowHelper.ThrowArgumentOutOfRangeException();
Contract.EndContractBlock();
_size--;
if (index < _size)
Array.Copy(_items, index + 1, _items, index, _size - index);
_items[_size] = default(T);
_version++;
}
As you can see, if you remove an item which is not the last one, a copy of the array items occurs (all items from index + 1 to the end are moved). So in your case it's better to remove items from the end, to avoid array copying on each iteration:
for (int i = messages.Count - 1; i >= 0; i--)
{
messages.RemoveAt(i);
}
Or simply call messages.Clear() if you want to remove them all without additional logic; in that case the internal array is just cleared and the size is set to zero.
Like the other posts say, you need to iterate backward.
You've got several options for removing the items:
messages.Clear();
or
while (messages.Count != 0)
{
messages.RemoveAt(0);
}
I am initializing my list as below -
List<string> lFiles = new List<string>(12);
and now I want to add/insert my string at a specific index,
like I am doing below -
lFiles.Insert(6,"File.log.6");
It is throwing an exception: "Index must be within the bounds of the List."
While initializing, I declared the capacity of the List, but I am still not able to insert strings at arbitrary indexes.
Does anybody know what I am missing?
The constructor that takes an Int32 as a parameter doesn't add items to the list; it just pre-allocates some capacity for better performance (this is an implementation detail). In your case, the list is still empty.
You are initializing the capacity of the list (basically setting the initial size of the internal array for performance purposes), but it does not actually add any elements to the list.
The easiest way to check this is to try the following:
var list1 = new List<int>();
var list2 = new List<int>(12);
Console.WriteLine(list1.Count); //output is 0
Console.WriteLine(list2.Count); //output is 0
This shows that you still don't have any elements in your list.
In order to populate the list with default or blank elements, you need to actually put something into it:
int count = 12;
int value = 0;
List<int> list = new List<int>(count);
list.AddRange(Enumerable.Repeat(value, count));
There is a small confusion with the list. When you provide a capacity to the constructor, it creates an internal array of the provided size, filled with default values of T:
public List(int capacity)
{
if (capacity < 0)
throw new ArgumentOutOfRangeException();
if (capacity == 0)
this._items = List<T>._emptyArray;
else
this._items = new T[capacity];
}
But the list does not treat those default values as items added to the list. Yep, that is a little confusing. Memory is allocated for the array, but the count of items in the list will still be zero. You can check it:
List<string> lFiles = new List<string>(12);
Console.WriteLine(lFiles.Count); // 0
Console.WriteLine(lFiles.Capacity); // 12
Count does not return the size of the internal data structure; it returns the 'logical' size of the list (i.e. the number of items which were added and not removed):
public int Count
{
get { return this._size; }
}
And the size changes only when you add items to or remove items from the list. E.g.:
public void Add(T item)
{
if (this._size == this._items.Length)
this.EnsureCapacity(this._size + 1); // resize items array
this._items[this._size++] = item; // change size
this._version++;
}
When you insert an item at a specific index, the list does not check whether enough space is allocated in the items array (well, it checks, but only to resize the inner array if the current capacity is not enough). The list verifies that enough items are already contained in the list (i.e. added, but not removed):
public void Insert(int index, T item)
{
if (index > this._size) // here you get an exception, because size is zero
throw new ArgumentOutOfRangeException();
if (this._size == this._items.Length)
this.EnsureCapacity(this._size + 1); // resize items
if (index < this._size)
Array.Copy(_items, index, this._items, index + 1, this._size - index);
this._items[index] = item;
this._size++;
this._version++;
}
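So, to make the Insert(6, ...) call from the question succeed, the list must already contain at least six items. A minimal sketch (the empty-string placeholders are an arbitrary choice; Enumerable.Repeat needs using System.Linq):
var lFiles = new List<string>(12);
lFiles.AddRange(Enumerable.Repeat(string.Empty, 6)); // items at indices 0..5 now exist
lFiles.Insert(6, "File.log.6");                      // index == Count, so this is allowed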
The capacity is just a hint about how many elements to expect. There are still no elements in your list.
I think you might want to use a new Dictionary<int, string>(), not a list.
That will let you use the int as a key to set and look up values.
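For instance (a minimal sketch reusing the file name from the question):
var files = new Dictionary<int, string>();
files[6] = "File.log.6";    // works no matter which other keys exist
string name = files[6];     // look it up by the same key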
Otherwise, if you want a position-based "list", you should just use a string array instead (but note that it will not let you adjust the size automatically):
var arr = new string[12];
arr[6] = "string at position 6";
Edit: I will add some benchmark results. Up to about 1,000-5,000 items in the list, IList with RemoveAt beats ISet with Remove, but that's not something to worry about since the differences are marginal. The real fun begins when the collection size extends to 10,000 and more. I'm posting only that data.
I was answering a question here last night and faced a bizarre situation.
First a set of simple methods:
static Random rnd = new Random();
public static int GetRandomIndex<T>(this ICollection<T> source)
{
return rnd.Next(source.Count);
}
public static T GetRandom<T>(this IList<T> source)
{
return source[source.GetRandomIndex()];
}
Let's say I'm removing N number of items from a collection randomly. I would write this function:
public static void RemoveRandomly1<T>(this ISet<T> source, int countToRemove)
{
int countToRemain = source.Count - countToRemove;
var inList = source.ToList();
int i = 0;
while (source.Count > countToRemain)
{
source.Remove(inList.GetRandom());
i++;
}
}
or
public static void RemoveRandomly2<T>(this IList<T> source, int countToRemove)
{
int countToRemain = source.Count - countToRemove;
int j = 0;
while (source.Count > countToRemain)
{
source.RemoveAt(source.GetRandomIndex());
j++;
}
}
As you can see, the first function is written for an ISet and the second for a normal IList. In the first function I'm removing by item from the ISet, and in the second by index from the IList, both of which I believe are O(1). Why is the second function performing so much worse than the first, especially when the lists get bigger?
Odds (my take):
1) In the first function, the ISet is converted to an IList (to get the random item from the IList), whereas nothing of the sort is performed in the second function.
Advantage IList.
2) In the first function a call to GetRandom is made (which itself calls GetRandomIndex), whereas in the second, GetRandomIndex is called directly; that's one step less.
Though trivial, advantage IList.
3) In the first function, the random item is taken from a separate list, so the obtained item might already have been removed from the ISet. This leads to more iterations of the while loop in the first function. In the second function, the random index is taken from the source being iterated on, so there are never repeated iterations. I have tested and verified this.
i > j always, advantage IList.
I thought the reason for this behaviour was that a List needs constant resizing when items are added or removed. But apparently not, according to some other testing. I ran:
public static void Remove1(this ISet<int> set)
{
int count = set.Count;
for (int i = 0; i < count; i++)
{
set.Remove(i + 1);
}
}
public static void Remove2(this IList<int> lst)
{
for (int i = lst.Count - 1; i >= 0; i--)
{
lst.RemoveAt(i);
}
}
and found that the second function runs faster.
Test bed:
var f = Enumerable.Range(1, 100000);
var s = new HashSet<int>(f);
var l = new List<int>(f);
Benchmark(() =>
{
//some examples...
s.RemoveRandomly1(2500);
l.RemoveRandomly2(2500);
s.Remove1();
l.Remove2();
}, 1);
public static void Benchmark(Action method, int iterations = 10000)
{
Stopwatch sw = new Stopwatch();
sw.Start();
for (int i = 0; i < iterations; i++)
method();
sw.Stop();
MsgBox.ShowDialog(sw.Elapsed.TotalMilliseconds.ToString());
}
Just trying to understand what's going on with the two structures. Thanks.
Results:
var f = Enumerable.Range(1, 10000);
s.RemoveRandomly1(7500); => 5ms
l.RemoveRandomly2(7500); => 20ms
var f = Enumerable.Range(1, 100000);
s.RemoveRandomly1(7500); => 7ms
l.RemoveRandomly2(7500); => 275ms
var f = Enumerable.Range(1, 1000000);
s.RemoveRandomly1(75000); => 50ms
l.RemoveRandomly2(75000); => 925000ms
For most typical needs a list will do, though!
First off, IList and ISet are interfaces, not implementations. I can write an IList or an ISet implementation that runs very differently, so the concrete implementations are what matter (List and HashSet in your case).
Accessing a List item by index is O(1), but removing it with RemoveAt is O(n).
Removing from the end of a List is fast because it doesn't have to copy anything; it just decrements its internal counter that stores how many items it has and clears the last slot. Once you hit the max capacity of the underlying array, it creates a new array of double the size and copies the elements over. (Note that a List never shrinks the underlying array automatically when items are removed; you have to call TrimExcess for that.) It tracks how large it is with a size field, so unused slots appear as if they aren't there.
Randomly removing from a list means that it has to copy all the array entries that come after the removed index so that they slide down one spot, which is inherently slow, particularly as the size of the list gets bigger. If you have a List with 1 million entries and you remove something at index 500,000, it has to copy the second half of the array down a spot.
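A sketch showing the asymmetry (assuming using System.Diagnostics and using System.Linq; actual timings vary by machine):
var list = new List<int>(Enumerable.Range(0, 1000000));

var sw = Stopwatch.StartNew();
list.RemoveAt(list.Count - 1);  // end: just clears the last slot
sw.Stop();

sw.Restart();
list.RemoveAt(500000);          // middle: Array.Copy shifts ~500k elements
sw.Stop();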
var fillData = new List<int>();
for (var i = 0; i < 100000; i++)
fillData.Add(i);
var stopwatch1 = new Stopwatch();
stopwatch1.Start();
var autoFill = new List<int>();
autoFill.AddRange(fillData);
stopwatch1.Stop();
var stopwatch2 = new Stopwatch();
stopwatch2.Start();
var manualFill = new List<int>();
foreach (var i in fillData)
manualFill.Add(i);
stopwatch2.Stop();
When I take 4 results from stopwatch1 and stopwatch2, stopwatch1 always has a lower value than stopwatch2. That means AddRange is always faster than foreach.
Does anyone know why?
Potentially, AddRange can check whether the value passed to it implements IList or IList<T>. If it does, it can find out how many values are in the range, and thus how much space it needs to allocate... whereas the foreach loop may need to reallocate several times.
Additionally, even after allocation, List<T> can use IList<T>.CopyTo to perform a bulk copy into the underlying array (for ranges which implement IList<T>, of course.)
I suspect you'll find that if you try your test again but using Enumerable.Range(0, 100000) for fillData instead of a List<T>, the two will take about the same time.
If you are using Add, it resizes the inner array gradually as needed (doubling), from the default starting capacity of 4. If you use:
var manualFill = new List<int>(fillData.Count);
I expect it'll change radically (no more resizes / data copy).
From Reflector, AddRange does this internally, rather than growing by doubling:
ICollection<T> is2 = collection as ICollection<T>;
if (is2 != null)
{
int count = is2.Count;
if (count > 0)
{
this.EnsureCapacity(this._size + count);
// ^^^ this is the key bit, and prevents slow growth when possible ^^^
// ... (remainder of the method omitted here)
Because AddRange checks the size of the added items and increases the size of the internal array only once.
The disassembly from Reflector for the List<T>.AddRange method has the following code:
ICollection<T> is2 = collection as ICollection<T>;
if (is2 != null)
{
int count = is2.Count;
if (count > 0)
{
this.EnsureCapacity(this._size + count);
if (index < this._size)
{
Array.Copy(this._items, index, this._items, index + count, this._size - index);
}
if (this == is2)
{
Array.Copy(this._items, 0, this._items, index, index);
Array.Copy(this._items, (int) (index + count), this._items, (int) (index * 2), (int) (this._size - index));
}
else
{
T[] array = new T[count];
is2.CopyTo(array, 0);
array.CopyTo(this._items, index);
}
this._size += count;
}
}
As you can see, there are some optimizations here, like the EnsureCapacity() call and the use of Array.Copy().
When using AddRange, the collection can increase the size of the array once and then copy the values into it.
Using a foreach statement, the collection needs to increase the size of the array more than once.
Increasing the size means copying the complete array, which takes time.
This is like asking the waiter to bring you one beer ten times versus asking him to bring you 10 beers at once.
Which do you think is faster? :)
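You can watch that growth happen (a small sketch; the exact growth factors are an implementation detail):
var list = new List<int>();
int lastCapacity = -1;
for (int i = 0; i < 100; i++)
{
    list.Add(i);
    if (list.Capacity != lastCapacity)
    {
        Console.WriteLine("Count={0}, Capacity={1}", list.Count, list.Capacity);
        lastCapacity = list.Capacity;
    }
}
// typically prints capacities 4, 8, 16, 32, 64, 128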
I suppose this is the result of memory-allocation optimisation.
With AddRange, memory is allocated only once, while with foreach reallocations happen repeatedly as the list grows.
There may also be some optimisations in the AddRange implementation (memcpy, for example).
Try initializing the list capacity before manually adding items:
var manualFill = new List<int>(fillData.Count);
It is because the foreach loop adds the values it gets one at a time, while the AddRange() method gathers all the values it is given as a "chunk" and adds that chunk at once to the specified location.
Simply put, it is just like having a list of 10 items to bring from the market: which would be faster, bringing them one by one or all in a single trip?