Why do I get an index of -1 for C# array? - c#

I get the following result from the code, which I have added right after the output:
Initializing
2.4
Searching
Index: 55504605
Time: 0.0374
Index: 21891944
Time: 0.0178
Index: 56663763
Time: 0.0425
Index: 37441319
Time: 0.0261
Index: -1
Time: 0.0676
Index: 9344095
Time: 0.0062
using System;
using System.Diagnostics;

namespace ConsoleApplication2
{
class Program
{
static void Main(string[] args)
{
var sw = new Stopwatch();
int[] big = new int[100000000];
Console.WriteLine("Initializing");
sw.Start();
var r = new Random(0);
for (int i=0; i < big.Length; ++i)
{
big[i] = r.Next(big.Length);
}
sw.Stop();
Console.WriteLine(sw.Elapsed.ToString("s\\.f"));
Console.WriteLine();
Console.WriteLine("Searching");
for (int i=0; i<6; ++i)
{
int searchFor = r.Next(big.Length);
sw.Reset();
sw.Start();
int index = Array.IndexOf(big, searchFor);
sw.Stop();
Console.WriteLine("Index: {0}", index);
Console.WriteLine("Time: {0:s\\.ffff}", sw.Elapsed);
}
Console.Read();
}
}
}
I do not understand why I get an index of -1 for the 5th iteration. Isn't the code supposed to return the location of the match within the array? Isn't the array numbered from 0 to 100,000,000?
I am not sure if I can ask a follow-up question here. The question is about the array binary search implemented in the CLR. According to this link, if the number is not found, the search returns the bitwise complement of an index. If the value is less than one or more elements in the array, the result is the bitwise complement of the index of the first element that is larger. If the value is greater than all elements, the result is the bitwise complement of (the index of the last element plus 1). I want to know what good that result is and what the logic is behind returning the bitwise complement of these values.
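To see what good the complement is (a hedged sketch of my own, not from the linked documentation): since valid indices are never negative, a single int can carry both the "not found" signal (the sign) and the insertion point, which you recover with the ~ operator:

```csharp
using System;

class ComplementDemo
{
    static void Main()
    {
        int[] sorted = { 10, 20, 30, 40 };

        int found = Array.BinarySearch(sorted, 30);
        Console.WriteLine(found); // 2: the value is at index 2

        int missing = Array.BinarySearch(sorted, 25);
        // Negative: 25 is absent. ~missing is the index of the first
        // element larger than 25, i.e. the correct insertion point.
        Console.WriteLine(missing);  // -3 (the bitwise complement of 2)
        Console.WriteLine(~missing); // 2: insert 25 before the 30
    }
}
```

So instead of a flat -1, the caller gets enough information to insert the missing value while keeping the array sorted, at no extra cost.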
Below are my results and my modified code:
using System;
using System.Diagnostics;

namespace ConsoleApplication2
{
class Program
{
static void Main(string[] args)
{
var sw = new Stopwatch();
int[] big = new int[100000000];
Console.WriteLine("Initializing");
sw.Start();
var r = new Random(0);
for (int i=0; i < big.Length; ++i)
{
big[i] = r.Next(big.Length);
}
sw.Stop();
Console.WriteLine(sw.Elapsed.ToString("s\\.f"));
Console.WriteLine(sw.ElapsedTicks); // ElapsedTicks is a long; the TimeSpan format "s\.f" does not apply to it
Console.WriteLine();
Console.WriteLine("Searching");
for (int i=0; i<6; ++i)
{
int searchFor = r.Next(big.Length);
sw.Reset();
sw.Start();
int index = Array.IndexOf(big, searchFor);
sw.Stop();
Console.WriteLine("Index: {0}", index);
Console.WriteLine("Time: {0:s\\.ffff}", sw.Elapsed);
}
Console.WriteLine();
Console.WriteLine("Sorting");
sw.Reset();
sw.Start();
Array.Sort(big);
sw.Stop();
Console.WriteLine(sw.Elapsed.ToString("s\\.f"));
Console.WriteLine();
Console.WriteLine("Searching (binary)");
for (int i=0; i<6; ++i)
{
int searchFor = r.Next() % big.Length;
sw.Reset();
sw.Start();
int index = Array.BinarySearch(big, searchFor);
sw.Stop();
Console.WriteLine("Index: {0}", index);
Console.WriteLine("Time: {0:s\\.fffffff}", sw.Elapsed);
}
Console.ReadLine();
}
}
}
Index: 55504605
Time: 0.0460
Index: 21891944
Time: 0.0147
Index: 56663763
Time: 0.0377
Index: 37441319
Time: 0.0248
Index: -1
Time: 0.0755
Index: 9344095
Time: 0.0068
Sorting
16.5
Searching (binary)
Index: 8990721
Time: 0.0000844
Index: 4404823
Time: 0.0000046
Index: 52683151
Time: 0.0000059
Index: -37241611
Time: 0.0000238
Index: -49384544
Time: 0.0000021
Index: 88243160
Time: 0.0000064
Just a couple of qualifying statements: (1) the above code is not mine, I am just trying to understand it; (2) if I used any terms incorrectly, please let me know, as I am still learning.

Isn't array numbered from 0 to 100,000,000?
No. You initialize an array of 100000000 items with 100000000 random numbers.
int[] big = new int[100000000];
for (int i=0; i < big.Length; ++i)
{
big[i] = r.Next(big.Length);
}
You then try to find the index of six random numbers:
for (int i=0; i<6; ++i)
{
int searchFor = r.Next(big.Length);
int index = Array.IndexOf(big, searchFor);
However, there's no guarantee that the big array contains every number between 0 and 99,999,999. Random.Next can return duplicate values, which means other numbers from that range are missing.
So there is a chance that r.Next(big.Length) in the second loop returns a number that isn't in the array, hence the return value of -1.
If you actually want to shuffle the numbers from 0-99999999, then instead generate a list containing those numbers, and shuffle it.
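A minimal sketch of that suggestion (my illustration, not part of the original answer), using a Fisher-Yates shuffle so every index 0..N-1 appears exactly once and IndexOf can never return -1:

```csharp
using System;

class ShuffleDemo
{
    static void Main()
    {
        const int n = 10; // use 100000000 for the original scenario
        var big = new int[n];
        for (int i = 0; i < n; i++)
            big[i] = i; // every value 0..n-1 exactly once

        // Fisher-Yates shuffle: a uniform random permutation in O(n)
        var r = new Random(0);
        for (int i = n - 1; i > 0; i--)
        {
            int j = r.Next(i + 1);
            (big[i], big[j]) = (big[j], big[i]);
        }

        // Every value in 0..n-1 is guaranteed to be present somewhere.
        Console.WriteLine(Array.IndexOf(big, 5) >= 0); // True
    }
}
```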

Related

Create Hashset with a large number of elements (1M)

I have to create a HashSet with the elements from 1 to N+1, where N is a large number (1M).
For example, if N = 5, the HashSet will have then integers {1, 2, 3, 4, 5, 6 }.
The only way I have found is:
HashSet<int> numbers = new HashSet<int>(N);
for (int i = 1; i <= (N + 1) ; i++)
{
numbers.Add(i);
}
Are there another faster (more efficient) ways to do it?
Six items is a tiny number, so I suspect the real problem is adding many thousands of items. The delays in that case are caused by buffer reallocations, not by the speed of Add itself.
The solution to this is to specify even an approximate capacity when constructing the HashSet:
var set=new HashSet<int>(1000);
If, and only if, the input implements ICollection<T>, the HashSet<T>(IEnumerable<T>) constructor will check the size of input collection and use it as its capacity:
if (collection is ICollection<T> coll)
{
int count = coll.Count;
if (count > 0)
{
Initialize(count);
}
}
Explanation
Most containers in .NET use buffers (arrays) internally to store data. This is far faster than pointer-based containers built from nodes, because of CPU caches and RAM access latency: on every CPU, reading the next item from the cache is far faster than chasing a pointer into RAM.
The downside is that each time the buffer fills up, a new one has to be allocated, typically twice the size of the old one. Adding items one by one can therefore cause log2(N) reallocations. This works fine for a moderate number of items, but adding e.g. a few thousand items one by one leaves behind a lot of orphaned buffers, all of which have to be garbage collected at some point, causing additional delays.
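The doubling behaviour is easy to observe on List&lt;T&gt;, which exposes its buffer size through Capacity (HashSet&lt;T&gt; hides its internal arrays but follows a similar growth strategy; the exact growth factors are implementation details):

```csharp
using System;
using System.Collections.Generic;

class GrowthDemo
{
    static void Main()
    {
        var list = new List<int>();
        int lastCapacity = -1;
        for (int i = 0; i < 1000; i++)
        {
            list.Add(i);
            if (list.Capacity != lastCapacity)
            {
                // Each line marks a reallocation: the old buffer is
                // orphaned and its contents copied into a larger one.
                Console.WriteLine($"count={list.Count,4} capacity={list.Capacity,4}");
                lastCapacity = list.Capacity;
            }
        }
    }
}
```

With a preallocated capacity, `new List<int>(1000)`, the loop would print a single line: no reallocations happen at all.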
Here's the code to test the three options:
var N = 1000000;
var trials = new List<(int method, TimeSpan duration)>();
for (var i = 0; i < 100; i++)
{
var sw = Stopwatch.StartNew();
HashSet<int> numbers1 = new HashSet<int>(Enumerable.Range(1, N + 1));
sw.Stop();
trials.Add((1, sw.Elapsed));
sw = Stopwatch.StartNew();
HashSet<int> numbers2 = new HashSet<int>(N);
for (int n = 1; n < N + 1; n++)
numbers2.Add(n);
sw.Stop();
trials.Add((2, sw.Elapsed));
sw = Stopwatch.StartNew(); // restart the timer, otherwise trial 3 reports trial 2's elapsed time
HashSet<int> numbers3 = new HashSet<int>(N);
foreach (int n in Enumerable.Range(1, N + 1))
numbers3.Add(n);
sw.Stop();
trials.Add((3, sw.Elapsed));
}
for (int j = 1; j <= 3; j++)
Console.WriteLine(trials.Where(x => x.method == j).Average(x => x.duration.TotalMilliseconds));
Typical output is this:
31.314788
16.493208
16.493208
It is nearly twice as fast to preallocate the capacity of the HashSet<int>.
There is no difference between the traditional loop and a LINQ foreach option.
To build on @Enigmativity's answer, here's a proper benchmark using BenchmarkDotNet:
public class Benchmark
{
private const int N = 1000000;
[Benchmark]
public HashSet<int> EnumerableRange() => new HashSet<int>(Enumerable.Range(1, N + 1));
[Benchmark]
public HashSet<int> NoPreallocation()
{
var result = new HashSet<int>();
for (int n = 1; n < N + 1; n++)
{
result.Add(n);
}
return result;
}
[Benchmark]
public HashSet<int> Preallocation()
{
var result = new HashSet<int>(N);
for (int n = 1; n < N + 1; n++)
{
result.Add(n);
}
return result;
}
}
public class Program
{
public static void Main(string[] args)
{
BenchmarkRunner.Run(typeof(Program).Assembly);
}
}
With the results:
Method
Mean
Error
StdDev
EnumerableRange
29.17 ms
0.743 ms
2.179 ms
NoPreallocation
23.96 ms
0.471 ms
0.775 ms
Preallocation
11.68 ms
0.233 ms
0.665 ms
As we can see, using LINQ is a bit slower than not using LINQ (as expected), and pre-allocating saves a significant amount of time.

Get set of random numbers from input List having fixed sum using C#

I am looking for a C# algorithm that would give me a set of random integers from an input List, such that the sum of the obtained integers is N.
For example:
If the list is {1,2,3,4,5,6...100} and N is 20, then the algorithm should return a set of random numbers like {5,6,9} or {9,11} or {1,2,3,4,10} etc.
Note that the count of integers in the result set need not be fixed. Also, the input list can contain duplicate integers. Performance is one of my priorities, as the input list can be large (around 1000 integers) and I need to randomize about 2-3 times in a single web request. I am flexible about not sticking to List as the datatype if Lists pose a performance issue.
I have tried the method below, which is very rudimentary and inefficient:
Use the Random class to get a random index from the input list
Get the integer from input list present at index obtained in #1. Lets call this integer X.
Sum = Sum + X.
Remove X from input list so that it does not get selected next.
If Sum is less than required total N, add X to outputList and go back to #1.
If the Sum is more than required total N, reinitialize everything and restart the process.
If the Sum is equal to required total N, return outputList
while (!reachedTotal)
{
// Initialize everything
inputList.Clear();
inputList.AddRange(originalInputList);
outputList = new List<int>();
sum = 0;
while (!reachedTotal)
{
random = r.Next(inputList.Count);
int x = inputList[random];
sum += x;
if (sum < N)
{
outputList.Add(x);
inputList.RemoveAt(random);
}
else if (sum > N)
break; // overshot N: reinitialize and restart
else
{
outputList.Add(x); // sum == N exactly: keep the final pick too
reachedTotal = true;
}
}
}
This is a stochastic approach that gives you a solution within a 10% range of N, assuming one exists:
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
namespace StackOverflowSnippets
{
class Program
{
static void Main(string[] args)
{
// ----------------------------------------------------------------------------------
// The code you are interested in starts below this line
const Int32 N = 100;
Int32 nLowerBound = (90 * N) / 100; Int32 nUpperBound = (110 * N) / 100;
Random rnd = new Random();
Int32 runningSum = 0;
Int32 nextIndex = 0;
List<Int32> inputList = GenerateRandomList( /* entries = */ 1000);
List<Int32> o = new List<Int32>();
while (runningSum < nLowerBound)
{
nextIndex = rnd.Next(inputList.Count); if (nUpperBound < (runningSum + inputList[nextIndex])) continue;
runningSum += inputList[nextIndex];
o.Add(inputList[nextIndex]);
inputList.RemoveAt(nextIndex);
}
// The code you are interested in ends above this line
// ----------------------------------------------------------------------------------
StringBuilder b = new StringBuilder();
for(Int32 i = 0; i < o.Count;i++)
{
if (b.Length != 0) b.Append(",");
b.Append(o[i].ToString());
}
Console.WriteLine("Exact N : " + N);
Console.WriteLine("Upper Bound: " + nUpperBound);
Console.WriteLine("Lower Bound: " + nLowerBound);
Console.WriteLine();
Console.WriteLine("sum(" + b.ToString() + ")=" + GetSum(o).ToString());
Console.ReadLine();
}
// -------------------------------------------------------------------
#region Helper methods
private static Int32 GetSum(List<Int32> o)
{
Int32 sum = 0;
foreach (Int32 i in o) sum += i;
return sum;
}
// Note: despite the name, this fills the list with the sequential values 1..entries-1.
private static List<Int32> GenerateRandomList(Int32 entries)
{
List<Int32> l = new List<Int32>();
for(Int32 i = 1; i < entries; i++)
{
l.Add(i);
}
return l;
}
#endregion
}
}
EDIT
Forgot to remove the element from the input-list so it cannot be selected twice
Fixed the 'remove element' insertion

What is the fastest implementation of sql like 'x%' in c# collections on a key

I need to do very quick prefix "sql like" searches over hundreds of thousands of keys. I have tried performance tests using a SortedList, a Dictionary, and a SortedDictionary, like so:
var dictionary = new Dictionary<string, object>();
// add a million random strings
var results = dictionary.Where(x=>x.Key.StartsWith(prefix));
I find that they all take a long time; Dictionary is the fastest and SortedDictionary the slowest.
Then I tried a Trie implementation from http://www.codeproject.com/Articles/640998/NET-Data-Structures-for-Prefix-String-Search-and-S which is orders of magnitude faster, i.e. milliseconds instead of seconds.
So my question is, is there no .NET collection I can use for the said requirement? I would have assumed that this would be a common requirement.
My basic test :
class Program
{
static readonly Dictionary<string, object> dictionary = new Dictionary<string, object>();
static Trie<object> trie = new Trie<object>();
static void Main(string[] args)
{
var random = new Random();
for (var i = 0; i < 100000; i++)
{
var randomstring = RandomString(random, 7);
dictionary.Add(randomstring, null);
trie.Add(randomstring, null);
}
var lookups = new string[10000];
for (var i = 0; i < lookups.Length; i++)
{
lookups[i] = RandomString(random, 3);
}
// compare searching
var sw = new Stopwatch();
sw.Start();
foreach (var lookup in lookups)
{
var exists = dictionary.Any(k => k.Key.StartsWith(lookup));
}
sw.Stop();
Console.WriteLine("dictionary.Any(k => k.Key.StartsWith(randomstring)) took : {0} ms", sw.ElapsedMilliseconds);
// test other collections
sw.Restart();
foreach (var lookup in lookups)
{
var exists = trie.Retrieve(lookup).Any();
}
sw.Stop();
Console.WriteLine("trie.Retrieve(lookup) took : {0} ms", sw.ElapsedMilliseconds);
Console.ReadKey();
}
public static string RandomString(Random random,int length)
{
const string chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
return new string(Enumerable.Repeat(chars, length)
.Select(s => s[random.Next(s.Length)]).ToArray());
}
}
Results:
dictionary.Any(k => k.Key.StartsWith(randomstring)) took : 80990 ms
trie.Retrieve(lookup) took : 115 ms
If sorting matters, try to use a SortedList instead of SortedDictionary. They both have the same functionality but they are implemented differently. SortedList is faster when you want to enumerate the elements (and you can access the elements by index), and SortedDictionary is faster if there are a lot of elements and you want to insert a new element in the middle of the collection.
So try this:
var sortedList = new SortedList<string, object>();
// populate list...
sortedList.Keys.Any(k => k.StartsWith(lookup));
If you have a million elements, but you don't want to re-order them once the dictionary is populated, you can combine their advantages: populate a SortedDictionary with the random elements, and then create a new List<KeyValuePair<,>> or SortedList<,> from that.
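A sketch of that combination (my illustration, with hypothetical data, assuming the keys are sorted once and then queried repeatedly):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class CombineDemo
{
    static void Main()
    {
        // Populate a SortedDictionary while inserting in arbitrary order...
        var sorted = new SortedDictionary<string, object>
        {
            ["banana"] = null, ["apple"] = null, ["apricot"] = null
        };

        // ...then freeze it into a SortedList, which keeps its keys in a
        // flat, index-accessible array (cheap enumeration and binary search).
        var list = new SortedList<string, object>(sorted);

        bool anyAp = list.Keys.Any(k => k.StartsWith("ap"));
        Console.WriteLine(anyAp); // True
    }
}
```

The SortedList<TKey, TValue>(IDictionary<TKey, TValue>) constructor copies the already-sorted entries in one pass, so you pay the insertion cost only once.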
So, after a little testing I found something close enough using BinarySearch; the only con is that you have to sort the keys from a to z. But the bigger the list, the slower it gets, so a ternary search tree is about the fastest you can actually get on a binary PC architecture.
Method (credit should go to @Guffa):
public static int BinarySearchStartsWith(List<string> words, string prefix, int min, int max)
{
while (max >= min)
{
var mid = (min + max) / 2;
var comp = string.CompareOrdinal(words[mid], 0, prefix, 0, prefix.Length); // avoids Substring, and no exception when words[mid] is shorter than prefix
if (comp >= 0)
{
if (comp > 0)
max = mid - 1;
else
return mid;
}
else
min = mid + 1;
}
return -1;
}
and test implementation
var keysToList = dictionary.Keys.OrderBy(q => q).ToList();
sw = new Stopwatch();
sw.Start();
foreach (var lookup in lookups)
{
bool exists = BinarySearchStartsWith(keysToList, lookup, 0, keysToList.Count - 1) != -1;
}
sw.Stop();
If you can sort the keys once and then use them repeatedly to look up the prefixes, then you can use a binary search to speed things up.
To get the maximum performance, I use two arrays, one for keys and one for values, and use the overload of Array.Sort() which sorts a main and an adjunct array in tandem.
Then you can use Array.BinarySearch() to search for the nearest key which starts with a given prefix, and return the indices for those that match.
When I try it, it seems to only take around 0.003ms per check if there are one or more matching prefixes.
Here's a runnable console application to demonstrate (remember to do your timings on a RELEASE build):
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Diagnostics;
using System.Linq;
namespace Demo
{
class Program
{
public static void Main()
{
int count = 1000000;
object obj = new object();
var keys = new string[count];
var values = new object[count];
for (int i = 0; i < count; ++i)
{
keys[i] = randomString(5, 16);
values[i] = obj;
}
// Sort key array and value arrays in tandem to keep the relation between keys and values.
Array.Sort(keys, values);
// Now you can use StartsWith() to return the indices of strings in keys[]
// that start with a specific string. The indices can be used to look up the
// corresponding values in values[].
Console.WriteLine("Count of ZZ = " + StartsWith(keys, "ZZ").Count());
// Test a load of times with 1000 random prefixes.
var prefixes = new string[1000];
for (int i = 0; i < 1000; ++i)
prefixes[i] = randomString(1, 8);
var sw = Stopwatch.StartNew();
for (int i = 0; i < 1000; ++i)
for (int j = 0; j < 1000; ++j)
StartsWith(keys, prefixes[j]).Any();
Console.WriteLine("1,000,000 checks took {0} for {1} ms each.", sw.Elapsed, sw.ElapsedMilliseconds/1000000.0);
}
public static IEnumerable<int> StartsWith(string[] array, string prefix)
{
int index = Array.BinarySearch(array, prefix);
if (index < 0)
index = ~index;
// We might have landed partway through a set of matches, so find the first match.
if (index < array.Length)
while ((index > 0) && array[index-1].StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
--index;
while ((index < array.Length) && array[index].StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
yield return index++;
}
static string randomString(int minLength, int maxLength)
{
int length = rng.Next(minLength, maxLength);
const string CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
return new string(Enumerable.Repeat(CHARS, length)
.Select(s => s[rng.Next(s.Length)]).ToArray());
}
static readonly Random rng = new Random(12345);
}
}

Formatting an array output in c#

I am trying to create a method that sums and outputs integers in an array. I understand how to sum the integers, however I am having trouble producing the desired output.
If I pass into the method 8, 3, 3
I need the output to look as follows.
For the list = <8, 3, 3> the sum is: 14
Once again, I am familiar with how to sum; I am unfamiliar with how to format this.
Here is my method so far...
public static void Sum(params int[] number)
{
int total = 0;
for (int i = 0; i < number.Length; ++i)
{
total = total + number[i];
}
Console.Write("For the list =, the sum of its elements is : {0}.", total);
Console.Write("\n");
}
Use the String.Join method to make a string with the values in the array:
string values = String.Join(", ", number);
Then just add it to the output:
Console.Write("For the list = <{0}> the sum of its elements is : {1}.", values, total);
Prior to .NET Framework 4 there is no overload of String.Join that can take an array of anything other than string, so if you are using an older framework you need to turn the integers into strings first:
string values = String.Join(", ", number.Select(n => n.ToString()));
public static void Sum(params int[] number)
{
int total=0;
for (int i = 0; i < number.Length; ++i) total = total + number[i];
string ext = String.Format ("<{0}>", String.Join (",", number));
Console.Write("For the list ={0} the sum is: {1}.",ext, total);
Console.Write("\n");
}
There are several ways you could go to produce the desired output; one would be to build the string while summing, using a StringBuilder (requires using System.Text;):
public static void Sum(params int[] number)
{
int total = 0;
var tmp = new StringBuilder("For the list = <");
for (int i = 0; i < number.Length; ++i)
{
if (i > 0) tmp.Append(", ");
tmp.Append(number[i]);
total = total + number[i];
}
tmp.AppendFormat("> the sum is: {0}", total);
Console.WriteLine(tmp.ToString());
}
This will do it:
using System;
using System.Linq;
public class Test
{
public static void Main()
{
Sum(8,3,3);
}
public static void Sum(params int[] number)
{
Console.WriteLine("For the list <{0}>, the sum of its elements is: {1}",
string.Join(", ", number),
number.Sum());
}
}
outputs:
For the list <8, 3, 3>, the sum of its elements is: 14

Performance differences... so dramatic?

Just now I read some posts about List<T> vs LinkedList<T>, so I decided to benchmark some structures myself. I benchmarked Stack<T>, Queue<T>, List<T> and LinkedList<T> by adding data and removing data to/from the front/end. Here's the benchmark result:
Pushing to Stack... Time used: 7067 ticks
Poping from Stack... Time used: 2508 ticks
Enqueue to Queue... Time used: 7509 ticks
Dequeue from Queue... Time used: 2973 ticks
Insert to List at the front... Time used: 5211897 ticks
RemoveAt from List at the front... Time used: 5198380 ticks
Add to List at the end... Time used: 5691 ticks
RemoveAt from List at the end... Time used: 3484 ticks
AddFirst to LinkedList... Time used: 14057 ticks
RemoveFirst from LinkedList... Time used: 5132 ticks
AddLast to LinkedList... Time used: 9294 ticks
RemoveLast from LinkedList... Time used: 4414 ticks
Code:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Diagnostics;
namespace Benchmarking
{
static class Collections
{
public static void run()
{
Random rand = new Random();
Stopwatch sw = new Stopwatch();
Stack<int> stack = new Stack<int>();
Queue<int> queue = new Queue<int>();
List<int> list1 = new List<int>();
List<int> list2 = new List<int>();
LinkedList<int> linkedlist1 = new LinkedList<int>();
LinkedList<int> linkedlist2 = new LinkedList<int>();
int dummy;
sw.Reset();
Console.Write("{0,40}", "Pushing to Stack...");
sw.Start();
for (int i = 0; i < 100000; i++)
{
stack.Push(rand.Next());
}
sw.Stop();
Console.WriteLine(" Time used: {0,9} ticks", sw.ElapsedTicks);
sw.Reset();
Console.Write("{0,40}", "Poping from Stack...");
sw.Start();
for (int i = 0; i < 100000; i++)
{
dummy = stack.Pop();
dummy++;
}
sw.Stop();
Console.WriteLine(" Time used: {0,9} ticks\n", sw.ElapsedTicks);
sw.Reset();
Console.Write("{0,40}", "Enqueue to Queue...");
sw.Start();
for (int i = 0; i < 100000; i++)
{
queue.Enqueue(rand.Next());
}
sw.Stop();
Console.WriteLine(" Time used: {0,9} ticks", sw.ElapsedTicks);
sw.Reset();
Console.Write("{0,40}", "Dequeue from Queue...");
sw.Start();
for (int i = 0; i < 100000; i++)
{
dummy = queue.Dequeue();
dummy++;
}
sw.Stop();
Console.WriteLine(" Time used: {0,9} ticks\n", sw.ElapsedTicks);
sw.Reset();
Console.Write("{0,40}", "Insert to List at the front...");
sw.Start();
for (int i = 0; i < 100000; i++)
{
list1.Insert(0, rand.Next());
}
sw.Stop();
Console.WriteLine(" Time used: {0,9} ticks", sw.ElapsedTicks);
sw.Reset();
Console.Write("{0,40}", "RemoveAt from List at the front...");
sw.Start();
for (int i = 0; i < 100000; i++)
{
dummy = list1[0];
list1.RemoveAt(0);
dummy++;
}
sw.Stop();
Console.WriteLine(" Time used: {0,9} ticks\n", sw.ElapsedTicks);
sw.Reset();
Console.Write("{0,40}", "Add to List at the end...");
sw.Start();
for (int i = 0; i < 100000; i++)
{
list2.Add(rand.Next());
}
sw.Stop();
Console.WriteLine(" Time used: {0,9} ticks", sw.ElapsedTicks);
sw.Reset();
Console.Write("{0,40}", "RemoveAt from List at the end...");
sw.Start();
for (int i = 0; i < 100000; i++)
{
dummy = list2[list2.Count - 1];
list2.RemoveAt(list2.Count - 1);
dummy++;
}
sw.Stop();
Console.WriteLine(" Time used: {0,9} ticks\n", sw.ElapsedTicks);
sw.Reset();
Console.Write("{0,40}", "AddFirst to LinkedList...");
sw.Start();
for (int i = 0; i < 100000; i++)
{
linkedlist1.AddFirst(rand.Next());
}
sw.Stop();
Console.WriteLine(" Time used: {0,9} ticks", sw.ElapsedTicks);
sw.Reset();
Console.Write("{0,40}", "RemoveFirst from LinkedList...");
sw.Start();
for (int i = 0; i < 100000; i++)
{
dummy = linkedlist1.First.Value;
linkedlist1.RemoveFirst();
dummy++;
}
sw.Stop();
Console.WriteLine(" Time used: {0,9} ticks\n", sw.ElapsedTicks);
sw.Reset();
Console.Write("{0,40}", "AddLast to LinkedList...");
sw.Start();
for (int i = 0; i < 100000; i++)
{
linkedlist2.AddLast(rand.Next());
}
sw.Stop();
Console.WriteLine(" Time used: {0,9} ticks", sw.ElapsedTicks);
sw.Reset();
Console.Write("{0,40}", "RemoveLast from LinkedList...");
sw.Start();
for (int i = 0; i < 100000; i++)
{
dummy = linkedlist2.Last.Value;
linkedlist2.RemoveLast();
dummy++;
}
sw.Stop();
Console.WriteLine(" Time used: {0,9} ticks\n", sw.ElapsedTicks);
}
}
}
The differences are so dramatic!
As you can see, the performance of Stack<T> and Queue<T> are fast and comparable, that's expected.
For List<T>, there is a huge difference between using the front and the end! And to my surprise, the performance of adding/removing at the end is actually comparable to that of Stack<T>.
For LinkedList<T>, manipulating the front is fast (faster than List<T>), but manipulating the end appeared to be incredibly slow.
So... can any experts account on:
the similarity in performance of using Stack<T> and the end of List<T>,
the differences in using the front and the end of List<T>, and
the reason that using the end of LinkedList<T> is so slow (no longer applicable: it was a coding error caused by using LINQ's Last() extension method, thanks to CodesInChaos)?
I think I know why List<T> doesn't handle the front so well: List<T> needs to shift the whole list back and forth when doing that. Correct me if I am wrong.
P.S. My System.Diagnostics.Stopwatch.Frequency is 2435947, and the program is targeted to .NET 4 Client Profile and compiled with C# 4.0, on Windows 7 Visual Studio 2010.
Concerning 1:
Stack<T>'s and List<T>'s performance being similar isn't surprising. I'd expect both of them to use arrays with a doubling strategy. This leads to amortized constant-time additions.
You can use List<T> everywhere you can use Stack<T>, but it leads to less expressive code.
Concerning 2:
I think I know why List<T> doesn't handle the front so well... because List<T> needs to move the whole list back and fro when doing that.
That's correct. Inserting/removing elements at the beginning is expensive because it moves all elements. Getting or replacing elements at the beginning on the other hand is cheap.
Concerning 3:
Your slow LinkedList<T>.RemoveLast result comes from a mistake in your benchmarking code.
Removing or getting the last item of a doubly linked list is cheap. In the case of LinkedList<T> that means that RemoveLast and Last are cheap.
But you weren't using the Last property; you were using LINQ's extension method Last(). On collections that don't implement IList<T>, it iterates the whole collection, giving it O(n) runtime.
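The difference is easy to miss in code (a sketch of my own to illustrate the point): both lines compile and return the same value, but one is an O(1) property and the other an O(n) LINQ extension method:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class LastDemo
{
    static void Main()
    {
        var list = new LinkedList<int>();
        for (int i = 0; i < 1000000; i++)
            list.AddLast(i);

        var sw = Stopwatch.StartNew();
        int a = list.Last.Value;   // O(1): LinkedList<T> keeps a tail reference
        sw.Stop();
        Console.WriteLine($"Last property:    {sw.ElapsedTicks} ticks");

        sw.Restart();
        int b = list.Last();       // O(n): LINQ enumerates all million nodes
        sw.Stop();
        Console.WriteLine($"Last() extension: {sw.ElapsedTicks} ticks");
    }
}
```

LINQ's Last() only special-cases IList<T>, which LinkedList<T> does not implement, so it falls back to full enumeration.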
List<T> is a dynamically over-allocating array (a data structure you'll also see in many other languages' standard libraries). This means it internally uses a "static" array (an array that can't be resized, known simply as "array" in .NET) which may be, and often is, larger than the size of the list. Appending then simply increments a counter and uses the next, previously unused, slot of the internal array. The array is only re-allocated (which requires copying all elements) if it becomes too small to accommodate all items. When that happens, the size of the array is increased by a factor (not a constant amount), usually 2.
This ensures that amortized time complexity (basically, the average time per operation over a long sequence of operations) for appending is O(1) even in the worst case. For adding at the front, no such optimization is feasible (at least not while keeping both random access and O(1) appending at the end). It always has to copy all elements to move them into their new slots (making space for the added element in the first slot). Stack<T> does the same thing, you just don't notice the discrepancy with adding to the front because you only ever operate on one end (the fast one).
Getting the end of a linked list depends a lot on the internals of your list. One can maintain a reference to the last element, but this makes all operations on the list more complicated, and may (I don't have an example at hand) make some operations much more expensive. Lacking such a reference, appending to the end requires walking through all elements of the linked list to find the last node, which is of course awfully slow for lists of nontrivial size.
As pointed out by @CodesInChaos, your linked list manipulation was flawed. The fast retrieval of the end you see now is most likely caused by LinkedList<T> explicitly maintaining a reference to the last node, as mentioned above. Note that getting an element that is not at either end is still slow.
The speed essentially comes from the number of operations needed to insert, delete, or search an item. You already noticed that the list needs memory transfers.
Stack is a list that is accessible only at the top element, and the computer always knows where it is.
The linked list is another thing: the start of the list is known, thus it's very fast to add or remove from the start -- but finding the last element takes time. Caching the location of the last element OTOH is only worthwhile for addition. For deletion one needs to traverse the complete list minus one element to find the 'hook' or pointer to the last one.
Just looking at the numbers, one can make some educated guesses of the internals of each data structure:
pop from a stack is fast, as expected
push to stack is slower, and it's slower than adding to the end of the list. Why?
apparently the allocation unit size for the stack is smaller: it may only grow the stack by 100 slots, while the list could grow in units of 1000.
A list seems to be backed by an array. Accessing the list at the front requires memory transfers that take time in proportion to the list length.
Basic linked list operations shouldn't take that much longer, it's generally only required to
new_item.next = list_start; list_start = new_item; // to add
list_start = list_start.next; // to remove
however, as AddLast is so fast, it means that when adding to or deleting from the linked list, the pointer to the last element also has to be updated; so there's extra bookkeeping.
Doubly linked lists OTOH make it relatively fast to insert and delete at both ends of the list (I've been informed that a better implementation uses a doubly linked list); however,
links to previous and next item also double the work for the bookkeeping
the similarity in performance of using Stack and the end of List,
As explained by delnan, they both use a simple array internally, so they behave very similar when working at the end. You could see a stack being a list with just access to the last object.
the differences in using the front and the end of List
You already suspected it correctly. Manipulating the beginning of a list means that the underlying array needs to change: adding an item usually means you need to shift all other elements by one, and the same goes for removing. If you know that you will be manipulating both ends of a list, you're better off using a linked list.
the reason that using the end of LinkedList is so slow?
Usually, element insertion and deletion for linked lists at any position can be done in constant time, as you just need to change at most two pointers. The problem is just getting to the position. A normal linked list has just a pointer to its first element. So if you want to get to the last element, you need to iterate through all elements. A queue implemented with a linked list usually solves this problem by having an additional pointer to the last element, so adding elements is possible in constant time as well. The more sophisticated data structure would be a double linked list that has both pointers to the first and last element, and where each element also contains a pointer to the next and previous element.
What you should learn about this is that there are many different data structures that are made for a single purpose, which they can handle very efficiently. Choosing the correct structure depends a lot on what you want to do.
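As an illustration of that point (a sketch with hypothetical names, not from any of the answers): a singly linked list with an extra tail pointer gets O(1) AddLast, but RemoveLast still has to walk the list to find the node before the tail, which is exactly why a doubly linked list such as .NET's LinkedList<T> is needed for fast removal at both ends:

```csharp
using System;

// Minimal singly linked list with a tail pointer (illustrative only).
class SinglyLinkedList
{
    class Node { public int Value; public Node Next; }

    Node head, tail;

    public void AddLast(int value) // O(1) thanks to the tail pointer
    {
        var node = new Node { Value = value };
        if (head == null) head = tail = node;
        else { tail.Next = node; tail = node; }
    }

    public int RemoveLast() // O(n): must find the node before the tail
    {
        int value = tail.Value;
        if (head == tail) { head = tail = null; return value; }
        var node = head;
        while (node.Next != tail) node = node.Next;
        node.Next = null;
        tail = node;
        return value;
    }
}

class Demo
{
    static void Main()
    {
        var list = new SinglyLinkedList();
        list.AddLast(1); list.AddLast(2); list.AddLast(3);
        Console.WriteLine(list.RemoveLast()); // 3
        Console.WriteLine(list.RemoveLast()); // 2
    }
}
```

In a doubly linked list each node also stores a Previous reference, so RemoveLast becomes O(1) at the cost of extra bookkeeping on every operation.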
I have a Java background, and I guess your question relates more to general data structures than a specific language. Also, I apologize if my statements are incorrect.
1. the similarity in performance of using Stack and the end of List
2. the differences in using the front and the end of List, and
At least in Java, Stacks are implemented using arrays (apologies if that is not the case in C#; you could refer to the source for the implementation), and the same is the case for Lists. As is typical with an array, all insertions at the end take less time than at the beginning, because the pre-existing values in the array need to be shifted down to accommodate an insertion at the beginning.
Link to Stack.java source and its superclass Vector
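The array-backed claim for Java's `Stack` can be checked directly: it subclasses `Vector`, which stores its elements in an internal array.

```java
import java.util.Stack;
import java.util.Vector;

public class StackIsVector {
    public static void main(String[] args) {
        Stack<Integer> stack = new Stack<>();
        // java.util.Stack subclasses java.util.Vector, an array-backed
        // list, so pushes land at the end of an internal array.
        System.out.println(stack instanceof Vector);                      // true
        System.out.println(Stack.class.getSuperclass().getSimpleName()); // Vector
    }
}
```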
3. the reason that using the end of LinkedList is so slow?
A LinkedList does not allow random access and has to traverse the nodes to reach your insertion point. If you find that the performance is slower for the last nodes, then I suppose the LinkedList implementation is a singly linked list. You would want a doubly linked list for optimal performance when accessing elements at the end.
http://en.wikipedia.org/wiki/Linked_list
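A minimal doubly linked list, as suggested, keeps backward links so removing at the tail never needs a traversal (illustrative Java sketch; `Deque2` is not a library type — both Java's `java.util.LinkedList` and C#'s `LinkedList<T>` are doubly linked and work on the same principle):

```java
// A doubly linked list keeps head and tail pointers, and each node
// links both ways, so insert/remove at either end is O(1).
public class Deque2 {
    static final class Node {
        int value;
        Node prev, next;
        Node(int value) { this.value = value; }
    }

    private Node head, tail;

    public void addLast(int value) {
        Node node = new Node(value);
        if (tail == null) {
            head = tail = node;        // empty list: node is both ends
        } else {
            node.prev = tail;          // link backward to the old tail
            tail.next = node;          // link forward from the old tail
            tail = node;
        }
    }

    public int removeLast() {          // O(1): tail.prev is one hop away
        int value = tail.value;
        tail = tail.prev;
        if (tail == null) head = null;
        else tail.next = null;         // detach the removed node
        return value;
    }

    public static void main(String[] args) {
        Deque2 d = new Deque2();
        d.addLast(1);
        d.addLast(2);
        d.addLast(3);
        System.out.println(d.removeLast()); // 3
        System.out.println(d.removeLast()); // 2
    }
}
```

The `prev` pointer is exactly what the singly linked version lacks: without it, `removeLast` would have to walk from the head to find the second-to-last node.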
I just improved some of the deficiencies of the previous code, especially the influence of Random and the dummy calculations. Array still tops everything, but the performance of List is impressive, and LinkedList is very good for random insertions.
The sorted results are:
12 array[i]
40 list2[i]
62 FillArray
68 list2.RemoveAt
78 stack.Pop
126 list2.Add
127 queue.Dequeue
159 stack.Push
161 foreach_linkedlist1
191 queue.Enqueue
218 linkedlist1.RemoveFirst
219 linkedlist2.RemoveLast
2470 linkedlist2.AddLast
2940 linkedlist1.AddFirst
The code is:
using System;
using System.Collections.Generic;
using System.Diagnostics;
//
namespace Benchmarking {
    //
    static class Collections {
        //
        public static void Main() {
            const int limit = 9000000;
            Stopwatch sw = new Stopwatch();
            Stack<int> stack = new Stack<int>();
            Queue<int> queue = new Queue<int>();
            List<int> list1 = new List<int>();
            List<int> list2 = new List<int>();
            LinkedList<int> linkedlist1 = new LinkedList<int>();
            LinkedList<int> linkedlist2 = new LinkedList<int>();
            int dummy;
            sw.Reset();
            Console.Write( "{0,40} ", "stack.Push" );
            sw.Start();
            for ( int i = 0; i < limit; i++ ) {
                stack.Push( i );
            }
            sw.Stop();
            Console.WriteLine( sw.ElapsedMilliseconds.ToString() );
            sw.Reset();
            Console.Write( "{0,40} ", "stack.Pop" );
            sw.Start();
            for ( int i = 0; i < limit; i++ ) {
                stack.Pop();
            }
            sw.Stop();
            Console.WriteLine( sw.ElapsedMilliseconds.ToString() );
            sw.Reset();
            Console.Write( "{0,40} ", "queue.Enqueue" );
            sw.Start();
            for ( int i = 0; i < limit; i++ ) {
                queue.Enqueue( i );
            }
            sw.Stop();
            Console.WriteLine( sw.ElapsedMilliseconds.ToString() );
            sw.Reset();
            Console.Write( "{0,40} ", "queue.Dequeue" );
            sw.Start();
            for ( int i = 0; i < limit; i++ ) {
                queue.Dequeue();
            }
            sw.Stop();
            Console.WriteLine( sw.ElapsedMilliseconds.ToString() );
            //sw.Reset();
            //Console.Write( "{0,40} ", "Insert to List at the front..." );
            //sw.Start();
            //for ( int i = 0; i < limit; i++ ) {
            //    list1.Insert( 0, i );
            //}
            //sw.Stop();
            //Console.WriteLine( sw.ElapsedMilliseconds.ToString() );
            //
            //sw.Reset();
            //Console.Write( "{0,40} ", "RemoveAt from List at the front..." );
            //sw.Start();
            //for ( int i = 0; i < limit; i++ ) {
            //    dummy = list1[ 0 ];
            //    list1.RemoveAt( 0 );
            //    dummy++;
            //}
            //sw.Stop();
            //Console.WriteLine( sw.ElapsedMilliseconds.ToString() );
            sw.Reset();
            Console.Write( "{0,40} ", "list2.Add" );
            sw.Start();
            for ( int i = 0; i < limit; i++ ) {
                list2.Add( i );
            }
            sw.Stop();
            Console.WriteLine( sw.ElapsedMilliseconds.ToString() );
            sw.Reset();
            Console.Write( "{0,40} ", "list2.RemoveAt" );
            sw.Start();
            for ( int i = 0; i < limit; i++ ) {
                list2.RemoveAt( list2.Count - 1 );
            }
            sw.Stop();
            Console.WriteLine( sw.ElapsedMilliseconds.ToString() );
            sw.Reset();
            Console.Write( "{0,40} ", "linkedlist1.AddFirst" );
            sw.Start();
            for ( int i = 0; i < limit; i++ ) {
                linkedlist1.AddFirst( i );
            }
            sw.Stop();
            Console.WriteLine( sw.ElapsedMilliseconds.ToString() );
            sw.Reset();
            Console.Write( "{0,40} ", "linkedlist1.RemoveFirst" );
            sw.Start();
            for ( int i = 0; i < limit; i++ ) {
                linkedlist1.RemoveFirst();
            }
            sw.Stop();
            Console.WriteLine( sw.ElapsedMilliseconds.ToString() );
            sw.Reset();
            Console.Write( "{0,40} ", "linkedlist2.AddLast" );
            sw.Start();
            for ( int i = 0; i < limit; i++ ) {
                linkedlist2.AddLast( i );
            }
            sw.Stop();
            Console.WriteLine( sw.ElapsedMilliseconds.ToString() );
            sw.Reset();
            Console.Write( "{0,40} ", "linkedlist2.RemoveLast" );
            sw.Start();
            for ( int i = 0; i < limit; i++ ) {
                linkedlist2.RemoveLast();
            }
            sw.Stop();
            Console.WriteLine( sw.ElapsedMilliseconds.ToString() );
            // Fill again
            for ( int i = 0; i < limit; i++ ) {
                list2.Add( i );
            }
            sw.Reset();
            Console.Write( "{0,40} ", "list2[i]" );
            sw.Start();
            for ( int i = 0; i < limit; i++ ) {
                dummy = list2[ i ];
            }
            sw.Stop();
            Console.WriteLine( sw.ElapsedMilliseconds.ToString() );
            // Fill array
            sw.Reset();
            Console.Write( "{0,40} ", "FillArray" );
            sw.Start();
            var array = new int[ limit ];
            for ( int i = 0; i < limit; i++ ) {
                array[ i ] = i;
            }
            sw.Stop();
            Console.WriteLine( sw.ElapsedMilliseconds.ToString() );
            sw.Reset();
            Console.Write( "{0,40} ", "array[i]" );
            sw.Start();
            for ( int i = 0; i < limit; i++ ) {
                dummy = array[ i ];
            }
            sw.Stop();
            Console.WriteLine( sw.ElapsedMilliseconds.ToString() );
            // Fill again
            for ( int i = 0; i < limit; i++ ) {
                linkedlist1.AddFirst( i );
            }
            sw.Reset();
            Console.Write( "{0,40} ", "foreach_linkedlist1" );
            sw.Start();
            foreach ( var item in linkedlist1 ) {
                dummy = item;
            }
            sw.Stop();
            Console.WriteLine( sw.ElapsedMilliseconds.ToString() );
            //
            Console.WriteLine( "Press Enter to end." );
            Console.ReadLine();
        }
    }
}
