Does List.ToArray() preserve indices? - c#

Will the index of an item in a List theList be the same as its index in theList.ToArray()? That is, does theList[i] == theList.ToArray()[i] for sure? I assume the answer is yes; I just want to double-check.
Update:
Here is a use case, if you can't imagine this is useful: I am using an optimization routine that accepts an array as input. Currently my variables are in a list. I want to run the optimization routine on the variables, then take the result of the optimization routine (which is an array obviously), and put them back in the list. I want to be sure I can just put them back at their original indices.

ToArray uses Array.Copy under the hood which preserves order.
From Array.Copy
Copies a range of elements from an Array starting at the first element
and pastes them into another Array starting at the first element. The
length is specified as a 32-bit integer.
From ToArray
The elements are copied using Array.Copy, which is an O(n) operation,
where n is Count.

LINQ's ToArray just enumerates the collection and stuffs the results into an array. List<T> enumerates linearly, so yes; the indices will match.
Please note that this is not guaranteed by the .NET specification/documentation, but is unlikely to change.
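For the use case in the question, here is a minimal sketch of the round trip from list to array and back. RoundTrip, OptimizeInPlace and Optimize are hypothetical names, and Optimize is only a placeholder for the real optimization routine; the sketch assumes that routine returns an array of the same length as its input.

```csharp
using System;
using System.Collections.Generic;

static class RoundTrip
{
    // Hypothetical stand-in for the optimization routine (assumption:
    // it returns an array with the same length and ordering as its input).
    private static double[] Optimize(double[] input)
    {
        var result = new double[input.Length];
        for (int i = 0; i < input.Length; i++)
            result[i] = input[i] * 0.5; // placeholder computation
        return result;
    }

    // Copies the list to an array, runs the routine, and writes each
    // result back to the index it came from.
    public static void OptimizeInPlace(List<double> variables)
    {
        double[] snapshot = variables.ToArray(); // snapshot[i] == variables[i]
        double[] optimized = Optimize(snapshot);

        if (optimized.Length != variables.Count)
            throw new InvalidOperationException("Result length does not match the list.");

        for (int i = 0; i < optimized.Length; i++)
            variables[i] = optimized[i];
    }
}
```

Calling RoundTrip.OptimizeInPlace(theList) leaves theList with the same Count, with each element replaced by the value the routine produced for that index.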

Related

Native Array, Stack, ArrayList or List for creating 'low pass' filter

Trying to create a type of low pass filter where I constantly average the previous 10 float values. Not sure whether to use builtin arrays (native .NET arrays), the .NET Stack operator, or perhaps an ArrayList or List.
In pseudocode I need to
1- Define the array or Stack containing 10 floats
2- Every update Push a new value to the Array, Stack, List
3- Check the length and if greater than 10 remove the first or oldest float value from the Array, Stack, List
4- Get the average of all float values in the Array, Stack, List
5- Repeat steps 2-4
First question: should I be using built-in arrays, Stack, ArrayList or List instead? I notice in the Stack documentation there is no method for removing the oldest (bottom) item from the stack, but perhaps I am missing something.
https://msdn.microsoft.com/en-us/library/3278tedw(v=vs.100).aspx
What I need is all the functionality of the (Javascript only) Array class but in C#.
Second question: can anyone help with actual syntax using either approach? Any help appreciated!
You've probably missed the Queue data structure. With it, you can put new elements to the end and delete old ones from the front.
Additionally, you don't need to look at all the elements in the array to compute the average if there are always exactly 10 elements. Knowing the previous average, you can compute the new average like this:
newAvg = oldAvg + (newElem - deletedElem)/10;
or, more briefly:
avg += (newElem - deletedElem)/10;
You could use the generic Queue<T> collection to store the values:
var queue = new Queue<float>();
queue.Enqueue(1.0f); // pushes new item
queue.Dequeue(); // removes oldest item
To obtain the average, use the LINQ Average() extension method:
var average = queue.Average();
Stacks are LIFO(Last In First Out).
If you want to remove the oldest value, then you'd want to use a queue (First In First Out).
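Putting the Queue<float> and running-average suggestions together, here is a minimal sketch. MovingAverage and its members are hypothetical names. It keeps a running sum rather than a running average so that it also works while the window is still filling up.

```csharp
using System;
using System.Collections.Generic;

sealed class MovingAverage
{
    private readonly Queue<float> _window = new Queue<float>();
    private readonly int _size;
    private float _sum;

    public MovingAverage(int size)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        _size = size;
    }

    // Adds a sample, drops the oldest one once the window is full,
    // and returns the average of the samples currently in the window.
    public float Add(float value)
    {
        _window.Enqueue(value);
        _sum += value;

        if (_window.Count > _size)
            _sum -= _window.Dequeue(); // oldest sample leaves the window

        return _sum / _window.Count;
    }
}
```

For the filter in the question you would create new MovingAverage(10) once and call Add on every update. One caveat: a running float sum can accumulate rounding error over a very long run, so recomputing the sum from the queue now and then may be worthwhile.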

when to use List<T>.BinarySearch?

The generic List<T> in .NET has a BinarySearch() method. BinarySearch() is an efficient algorithm for searching large datasets. I think I read that if everyone in the world was listed in a phone book then binary search could find any person within 35 steps. At what point should a BinarySearch() be used on a List as opposed to using the standard .Where clause with a lambda? How big should the data set be before switching from Where to BinarySearch? Or does Where already use a binary search behind the scenes?
When to use List<T>.BinarySearch?
As you can read in the documentation manual:
Searches the entire sorted List<T> for an element using the default comparer and returns the zero-based index of the element.
Furthermore it can only be used to match a given element, not a predicate since a generic predicate would defeat the order constraint.
So the list has to be sorted, either by the default comparator, or by a given comparator:
public int BinarySearch(T item) //default comparator
public int BinarySearch(T item, IComparer<T> comparer) //given comparator
The algorithm runs in O(log n) time whereas the where clause runs in O(n) time, which means in practice it will nearly always outperform the second (unless it is very likely the element will be found in the front of the list).
Or does .Where already use a binary search behind the scenes?
No, it can't, since a List<T> is not always sorted. First checking whether the list is sorted, or sorting the list, would require a computational effort of O(n) or O(n log n) respectively, which would be the same as or even more expensive than a linear search. In other words, first checking whether the list is sorted and then, if that's the case, performing a binary search would be more expensive than performing a linear search. A linear search can handle both unordered lists and predicates, but at a much larger cost.
At what point should a BinarySearch() be used on a List as opposed to using the standard Where clause with a lambda?
Any time the list is sorted relative to the value you're searching for, BinarySearch will (on average) give you better performance than Where regardless of size.
If the list is unordered, or in an order that does not correspond to the value you're looking for (you can't use a binary search to find a person in the phone book by first name) then BinarySearch will not give you the right results.
Note that it only returns one index, while Where will give you all items that match the criteria, but you can search on either side of the found element if there are duplicates (BinarySearch gives you one index that matches, not necessarily the first index).
Obviously the bigger the list, the more improvement BinarySearch is going to give you.
does Where already use a binary search behind the scenes?
No - it iterates through the list in physical order.
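To make the difference concrete, here is a small sketch comparing the two calls. SearchDemo and its method names are hypothetical. It relies on the documented behaviour that BinarySearch returns a negative number (the bitwise complement of the insertion point) when the item is not found.

```csharp
using System.Collections.Generic;
using System.Linq;

static class SearchDemo
{
    // O(log n), but only correct if 'sorted' is ordered by the default comparer.
    // Returns the index of a match, or a negative number if there is none.
    public static int FindSorted(List<int> sorted, int value)
    {
        return sorted.BinarySearch(value);
    }

    // O(n); works on any list and with any predicate, and returns every match.
    public static List<int> FindAll(List<int> items, int value)
    {
        return items.Where(x => x == value).ToList();
    }

    // Uses the negative result to keep the list sorted while inserting.
    public static void InsertSorted(List<int> sorted, int value)
    {
        int index = sorted.BinarySearch(value);
        if (index < 0)
            index = ~index; // bitwise complement gives the insertion point
        sorted.Insert(index, value);
    }
}
```

InsertSorted shows one way to keep the list in the order BinarySearch needs, so that later lookups stay O(log n) without a separate sort.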

C# Increasing an array by one element at the end

In my program I have a bunch of growing arrays where new elements are appended one at a time to the end of the array. I identified Lists to be a speed bottleneck in a critical part of my program due to their slow access time in comparison with an array - switching to an array increased performance tremendously to an acceptable level. So to grow the array I'm using Array.Resize. This works well as my implementation restricts the array size to approximately 20 elements, so the O(N) performance of Array.Resize is bounded.
But it would be better if there was a way to just increase an array by one element at the end without having to use Array.Resize; which I believe does a copy of the old array to the newly sized array.
So my question is, is there a more efficient method for adding one element to the end of an array without using List or Array.Resize?
A List has constant time access just like an array. For 'growing arrays' you really should be using List.
When you know that you may be adding elements to an array-backed structure, you don't want to grow it by one element at a time. Usually it is best to grow an array by doubling its size when it fills up.
As has been previously mentioned, List<T> is what you are looking for. If you know the initial size of the list, you can supply an initial capacity to the constructor, which will increase your performance for your initial allocations:
List<int> values = new List<int>(5);
values.Add(1);
values.Add(2);
values.Add(3);
values.Add(4);
values.Add(5);
Lists allocate room for 4 elements on the first Add (unless you specify a capacity when you construct the list) and then double their capacity each time they fill up.
Why don't you try a similar thing with an array? I.e. create it with 4 elements, then when you insert the fifth element, first grow the array to 8 elements.
There is no way to resize an array, so the only way to get a larger array is to use Array.Resize to create a new array.
Why not just create the arrays to have 20 elements from start (or whatever capacity you need at most), and use a variable to keep track of how many elements are used in the array? That way you never have to resize any arrays.
Growing an array AFAIK means that a new array is allocated, the existing content being copied to the new instance. I doubt that this should be faster than using List...?
It's much faster to resize an array in chunks (like 10), store the allocated size in a separate variable (e.g. capacity), and then only resize the array when the capacity is reached. This is how a List works, but if you prefer to use arrays then you should look into resizing them in larger chunks, especially if you have a large number of Array.Resize calls.
I think any approach that relies on growing a raw array will be hard to optimize, because an array is a fixed-size structure, so I think it's better to use a dynamic structure like List.
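The suggestion to pre-allocate the array and track how many slots are in use could look roughly like this. SmallBuffer is a hypothetical name, and the capacity (about 20 in the question) is whatever upper bound you know in advance.

```csharp
using System;

sealed class SmallBuffer
{
    private readonly int[] _items; // allocated once, never resized
    public int Count { get; private set; }

    public SmallBuffer(int capacity)
    {
        _items = new int[capacity];
    }

    // Appending is O(1): write into the next free slot and bump the counter.
    public void Add(int value)
    {
        if (Count == _items.Length)
            throw new InvalidOperationException("Buffer is full.");
        _items[Count++] = value;
    }

    // Only the first Count slots hold real data.
    public int this[int index]
    {
        get
        {
            if (index < 0 || index >= Count)
                throw new ArgumentOutOfRangeException(nameof(index));
            return _items[index];
        }
    }
}
```

This is essentially what List<T> does internally, minus the growth step, so it is worth measuring before assuming it is faster than a List<int> constructed with an initial capacity.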

.NET: How to efficiently check for uniqueness in a List<string> of 50,000 items?

In some library code, I have a List that can contain 50,000 items or more.
Callers of the library can invoke methods that result in strings being added to the list. How do I efficiently check for uniqueness of the strings being added?
Currently, just before adding a string, I scan the entire list and compare each string to the to-be-added string. This starts showing scale problems above 10,000 items.
I will benchmark this, but interested in insight.
If I replace the List<> with a Dictionary<>, will ContainsKey() be appreciably faster as the list grows to 10,000 items and beyond?
If I defer the uniqueness check until after all items have been added, will it be faster? At that point I would need to check every element against every other element, still an n^2 operation.
EDIT
Some basic benchmark results. I created an abstract class that exposes 2 methods: Fill and Scan. Fill just fills the collection with n items (I used 50,000). Scan scans the list m times (I used 5000) to see if a given value is present. Then I built an implementation of that class for List, and another for HashSet.
The strings used were uniformly 11 characters in length, and randomly generated via a method in the abstract class.
A very basic micro-benchmark.
Hello from Cheeso.Tests.ListTester
filling 50000 items...
scanning 5000 items...
Time to fill: 00:00:00.4428266
Time to scan: 00:00:13.0291180
Hello from Cheeso.Tests.HashSetTester
filling 50000 items...
scanning 5000 items...
Time to fill: 00:00:00.3797751
Time to scan: 00:00:00.4364431
So, for strings of that length, HashSet is roughly 25x faster than List when scanning for uniqueness. Also, for this size of collection, HashSet has zero penalty over List when adding items to the collection.
The results are interesting but not rigorous. To get valid results, I'd need to do warmup intervals and multiple trials, with random selection of the implementation. But I feel confident that that would move the bar only slightly.
Thanks everyone.
EDIT2
After adding randomization and multiple trials, HashSet consistently outperforms List in this case, by about 20x.
These results don't necessarily hold for strings of variable length, more complex objects, or different collection sizes.
You should use the HashSet<T> class, which is specifically designed for what you're doing.
Use HashSet<string> instead of List<string>, then it should scale very well.
From my tests, HashSet<string> takes no time compared to List<string> :)
Possibly off-topic, but if you want to scale very large unique sets of strings (millions+) in a language-independent way, you might check out Bloom Filters.
Does the Contains(T) function not work for you?
I have read that Dictionary<> is implemented as an associative array. In some languages (not necessarily anything related to .NET), string indexes are stored as a tree structure that forks at each node based upon the character in the node. Please see http://en.wikipedia.org/wiki/Associative_arrays.
A similar data structure was devised by Aho and Corasick in 1973 (I think). If you store 50,000 strings in such a structure, then it matters not how many strings you are storing. It matters more the length of the strings. If they are about the same length, then you will likely never see a slow-down in lookups because the search algorithm is linear in run-time with respect to the length of the string you are searching for. Even for a red-black tree or AVL tree, the search run-time depends more upon the length of the string you are searching for rather than the number of elements in the index. However, if you choose to implement your index keys with a hash function, you now incur the cost of hashing the string (going to be O(m), m = string length) and also the lookup of the string in the index, which will likely be on the order of O(log(n)), n = number of elements in the index.
edit: I'm not a .NET guru. Other more experienced people suggest another structure. I would take their word over mine.
edit2: your analysis is a little off for comparing uniqueness. If you use a hashing structure or dictionary, then it will not be an O(n^2) operation because of the reasoning I posted above. If you continue to use a list, then you are correct that it is O(n^2) * (max length of a string in your set) because you must examine each element in the list each time.
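As a sketch of the HashSet<string> suggestion: HashSet<T>.Add returns false when the item is already present, so the uniqueness check and the insert can be a single call. UniqueStrings and TryAdd are hypothetical names. The separate List is kept only on the assumption that callers still need the items in insertion order, and the default string comparison used here is case-sensitive.

```csharp
using System.Collections.Generic;

sealed class UniqueStrings
{
    private readonly HashSet<string> _seen = new HashSet<string>();
    private readonly List<string> _items = new List<string>();

    // Returns true if the string was new and has been added,
    // false if an equal string was already present.
    public bool TryAdd(string value)
    {
        if (!_seen.Add(value))
            return false; // duplicate: Add left the set unchanged

        _items.Add(value);
        return true;
    }

    // Items in the order they were first added.
    public IReadOnlyList<string> Items
    {
        get { return _items; }
    }
}
```

If insertion order does not matter, the List can be dropped and the HashSet<string> used on its own.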

How to shift items in an array?

I have an array of items that are time sensitive. After an amount of time, the last item needs to fall off and a new item is put at the beginning.
What is the best way to do this?
I would suggest using a queue, just a special instance of an array or list. When your timed event occurs, pop the last item from the queue, and then push your new item on.
Probably the easiest way to do this with an array is to use a circular index. Rather than always looking at array[n], you would reference array[cIndex], where cIndex refers to the item in the array being indexed and is incremented modulo the array size (cIndex % arraySize).
When you choose to drop the oldest item in the array, you would simply reference the element located at ((cIndex + (arraySize - 1)) % arraySize).
Alternatively, you could use a linkedList approach.
Use a Queue instead.
By using a Queue, preferably one implemented using a linked-list.
Have a look at using a Queue rather than a simple array.
A queue would work if there is a fixed number of items.
Given that the 'amount of time' is known, how about a SortedDictionary with a DateTime key and override the Add method to remove all items with keys that are too old.
LinkedList<T> has AddFirst and RemoveLast members that should work perfectly.
EDIT: Looking at the Queue docs, it seems they use an internal array. As long as the implementation uses a circular-array type algorithm performance should be fine.
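A minimal sketch of the LinkedList<T> approach, using the AddFirst and RemoveLast members mentioned above. RecentItems, Push and maxCount are hypothetical names; maxCount is the number of items you want to keep.

```csharp
using System.Collections.Generic;

static class RecentItems
{
    // Puts the new item at the front and lets the oldest items
    // fall off the back once the list exceeds maxCount.
    public static void Push<T>(LinkedList<T> items, T newItem, int maxCount)
    {
        items.AddFirst(newItem);
        while (items.Count > maxCount)
            items.RemoveLast();
    }
}
```

Both AddFirst and RemoveLast are O(1) on a LinkedList<T>, so no elements are shifted, at the cost of one node allocation per item.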
In C# 3 you can do:
original = new[] { newItem }.Concat(
    original.Take(original.Count() - 1)).ToArray();
But you are probably better off using a specialised data structure.
Queue is great for FIFO arrays. For generic array handling, use List(T)'s
Insert(0, x) and RemoveAt(0) methods to put or remove items in front of the list, for example.
Technically you need a deque. A queue has items pushed and popped off one end only. A deque is open at both ends.
Most languages will allow array manipulation, just remove the first element and put another one on the end.
Alternatively you can shift every element, by looping. Just replace each element (starting from the oldest) with its neighbour. Then place the new item in the last element.
If you know that your deque won't go above a certain size, then you can make it circular. You'll need two pointers to tell you where the two ends are though. Adding and removing items, will increase/decrease your pointers accordingly. You'll have to detect a buffer overflow condition (i.e. your pointers 'cross'). And you'll have to use modular arithmetic so your pointers go in a circle around the array.
Or you could time stamp each element in the array and remove them when they become too 'old'. You can either do this by keeping a separate array indexed in the same way, or by having an array of two element arrays, with the time stamp stored in one of the sub-elements.
If you're looking for the fastest way of doing this, it's going to be a circular array: you keep track of your current position in the array (ndx), and the end of the array (end), so when you insert an item, you implicitly eliminate the oldest item.
A circular array is the fastest implementation of a fixed-size queue that I know of.
For example, in C/C++ it would look like this for ints (quitting when you get a 0):
int queue[SIZE];
int ndx = 0;        // start at the beginning of the array
int end = SIZE - 1; // last valid index
int newitem;
while (1) {
    cin >> newitem;
    if (!newitem)   // quit if it's a 0
        break;
    if (ndx > end)  // need to loop around the end of the array
        ndx = 0;
    queue[ndx] = newitem; // overwrites the oldest item once the array has wrapped
    ndx++;
}
Lots of optimization could be done, but if you want to build it yourself, this is the fastest route.
If you don't care about performance, use a shipped Queue object because it should be generalized.
It may or may not be optimized, and it may not support a fixed size list, so be sure to check the documentation on it before using.
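For comparison, the same circular-array idea could be sketched in C# like this. RingBuffer<T> and its members are hypothetical names, not a framework type. Adding an item overwrites the oldest one once the buffer is full, and the indexer counts back from the newest item.

```csharp
using System;

sealed class RingBuffer<T>
{
    private readonly T[] _items;
    private int _next; // slot the next Add will write to

    public int Count { get; private set; }

    public RingBuffer(int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        _items = new T[capacity];
    }

    // O(1): once the buffer is full, the slot being written
    // is the one that holds the oldest item.
    public void Add(T item)
    {
        _items[_next] = item;
        _next = (_next + 1) % _items.Length; // wrap around the end
        if (Count < _items.Length) Count++;
    }

    // age 0 is the newest item, age Count - 1 is the oldest.
    public T this[int age]
    {
        get
        {
            if (age < 0 || age >= Count)
                throw new ArgumentOutOfRangeException(nameof(age));
            int index = (_next - 1 - age + _items.Length) % _items.Length;
            return _items[index];
        }
    }
}
```

Nothing is shifted or reallocated when an item falls off, which is the property the answers above are after.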
