I want to initialize a list of N objects with zeros (0.0f). I thought of doing it like this:
var TempList = new List<float>(new float[(int)N]);
Is there any better (more efficient) way to do that?
Your current solution creates an array with the sole purpose of initialising a list with zeros, and then throws that array away. This might appear inefficient. However, as we shall see, it is in fact very efficient!
Here's a method that doesn't create an intermediary array:
int n = 100;
var list = new List<float>(n);
for (int i = 0; i < n; ++i)
list.Add(0f);
Alternatively, you can use Enumerable.Repeat() to provide 0f "n" times, like so:
var list = new List<float>(n);
list.AddRange(Enumerable.Repeat(0f, n));
But both these methods turn out to be slower!
Here's a little test app to do some timings.
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
namespace Demo
{
public class Program
{
private static void Main()
{
var sw = new Stopwatch();
int n = 1024*1024*16;
int count = 10;
int dummy = 0;
for (int trial = 0; trial < 4; ++trial)
{
sw.Restart();
for (int i = 0; i < count; ++i)
dummy += method1(n).Count;
Console.WriteLine("Enumerable.Repeat() took " + sw.Elapsed);
sw.Restart();
for (int i = 0; i < count; ++i)
dummy += method2(n).Count;
Console.WriteLine("list.Add() took " + sw.Elapsed);
sw.Restart();
for (int i = 0; i < count; ++i)
dummy += method3(n).Count;
Console.WriteLine("(new float[n]) took " + sw.Elapsed);
Console.WriteLine("\n");
}
}
private static List<float> method1(int n)
{
var list = new List<float>(n);
list.AddRange(Enumerable.Repeat(0f, n));
return list;
}
private static List<float> method2(int n)
{
var list = new List<float>(n);
for (int i = 0; i < n; ++i)
list.Add(0f);
return list;
}
private static List<float> method3(int n)
{
return new List<float>(new float[n]);
}
}
}
Here are my results for a RELEASE build:
Enumerable.Repeat() took 00:00:02.9508207
list.Add() took 00:00:01.1986594
(new float[n]) took 00:00:00.5318123
So it turns out that creating an intermediary array is quite a lot faster. However, be aware that this testing code is flawed because it doesn't account for garbage collection overhead caused by allocating the intermediary array (which is very hard to time properly).
Finally, there is a REALLY EVIL, NASTY way you can optimise this using reflection. But this is brittle, will probably fail to work in the future, and should never, ever be used in production code.
I present it here only as a curiosity:
private static List<float> method4(int n)
{
var list = new List<float>(n);
list.GetType().GetField("_size", BindingFlags.NonPublic | BindingFlags.Instance).SetValue(list, n);
return list;
}
Doing this reduces the time to less than a tenth of a second, compared to the next fastest method which takes half a second. But don't do it.
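As an aside, if you are on .NET 8 or later, there is now a supported API that achieves the same effect as the reflection hack: `CollectionsMarshal.SetCount()`. A minimal sketch, assuming .NET 8+:

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

class Program
{
    static void Main()
    {
        int n = 100;
        var list = new List<float>(n); // backing array of n floats

        // .NET 8+: grows Count without writing each element. For value types
        // the newly exposed elements are whatever is in the backing array -
        // here a freshly allocated array, which the runtime zero-initializes.
        CollectionsMarshal.SetCount(list, n);

        Console.WriteLine(list.Count); // 100
    }
}
```

Unlike poking `_size` via reflection, this is a documented API and won't break across framework versions, though you must still be careful: growing a previously shrunk list this way can expose stale data.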
What is wrong with
float[] A = new float[N];
or
List<float> A = new List<float>(N);
Note that trying to micromanage the compiler is not optimization. Start with the cleanest code that does what you want and let the compiler do its thing.
Edit 1
The solution with List<float> produces an empty list: the capacity reserves room for N items internally, but Count is still 0. So we can trick it with some reflection:
static void Main(string[] args)
{
int N=100;
float[] array = new float[N];
List<float> list=new List<float>(N);
var size=typeof(List<float>).GetField("_size", BindingFlags.Instance|BindingFlags.NonPublic);
size.SetValue(list, N);
// Now list has 100 zero items
}
Why not:
var itemsWithZeros = new float[length];
Related
I have to create a HashSet with the elements from 1 to N+1, where N is a large number (1M).
For example, if N = 5, the HashSet will then contain the integers {1, 2, 3, 4, 5, 6}.
The only way I have found is:
HashSet<int> numbers = new HashSet<int>(N);
for (int i = 1; i <= (N + 1) ; i++)
{
numbers.Add(i);
}
Is there another, faster (more efficient) way to do it?
6 is a tiny number of items, so I suspect the real problem is adding a few thousand items. The delays in that case are caused by buffer reallocations, not by the speed of Add itself.
The solution to this is to specify a capacity, even an approximate one, when constructing the HashSet:
var set=new HashSet<int>(1000);
If, and only if, the input implements ICollection<T>, the HashSet<T>(IEnumerable<T>) constructor will check the size of input collection and use it as its capacity:
if (collection is ICollection<T> coll)
{
int count = coll.Count;
if (count > 0)
{
Initialize(count);
}
}
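You can take advantage of that fast path by handing the constructor something that already implements ICollection<T>. A sketch, assuming current .NET, where Enumerable.Range itself does not implement ICollection<int> and so must be materialized first:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main()
    {
        int N = 5;
        // Enumerable.Range returns a lazy sequence that does NOT implement
        // ICollection<int>, so the HashSet cannot pre-size itself from it.
        // ToArray() produces an ICollection<int>, letting the
        // HashSet(IEnumerable<T>) constructor call Initialize(count) once.
        var numbers = new HashSet<int>(Enumerable.Range(1, N + 1).ToArray());
        Console.WriteLine(numbers.Count); // 6
    }
}
```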
Explanation
Most containers in .NET use buffers (arrays) internally to store data. This is far faster than implementing containers with pointers and nodes, because of CPU caching and RAM access latency: on any CPU, accessing the next item in the cache is far faster than chasing a pointer into RAM.
The downside is that each time the buffer fills up, a new one has to be allocated, typically twice the size of the old one. Adding items one by one can therefore cause log2(N) reallocations. That works fine for a moderate number of items, but adding e.g. 1000 items one by one leaves a trail of orphaned buffers behind, all of which have to be garbage collected at some point, causing additional delays.
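You can watch the doubling happen by observing List<T>.Capacity as items are added one by one:

```csharp
using System;
using System.Collections.Generic;

class Program
{
    static void Main()
    {
        var list = new List<int>();
        int lastCapacity = -1;
        for (int i = 0; i < 1000; i++)
        {
            if (list.Capacity != lastCapacity)
            {
                // Each line printed marks a reallocation: the old buffer is
                // orphaned and a new, larger one is allocated.
                // Typically prints capacities 0, 4, 8, 16, ... 1024.
                Console.WriteLine($"Count={list.Count}, Capacity={list.Capacity}");
                lastCapacity = list.Capacity;
            }
            list.Add(i);
        }
    }
}
```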
Here's the code to test the three options:
var N = 1000000;
var trials = new List<(int method, TimeSpan duration)>();
for (var i = 0; i < 100; i++)
{
var sw = Stopwatch.StartNew();
HashSet<int> numbers1 = new HashSet<int>(Enumerable.Range(1, N + 1));
sw.Stop();
trials.Add((1, sw.Elapsed));
sw = Stopwatch.StartNew();
HashSet<int> numbers2 = new HashSet<int>(N);
for (int n = 1; n < N + 1; n++)
numbers2.Add(n);
sw.Stop();
trials.Add((2, sw.Elapsed));
sw = Stopwatch.StartNew();
HashSet<int> numbers3 = new HashSet<int>(N);
foreach (int n in Enumerable.Range(1, N + 1))
numbers3.Add(n);
sw.Stop();
trials.Add((3, sw.Elapsed));
}
for (int j = 1; j <= 3; j++)
Console.WriteLine(trials.Where(x => x.method == j).Average(x => x.duration.TotalMilliseconds));
Typical output is this:
31.314788
16.493208
16.493208
It is nearly twice as fast to preallocate the capacity of the HashSet<int>.
There is no difference between the traditional loop and a LINQ foreach option.
To build on @Enigmativity's answer, here's a proper benchmark using BenchmarkDotNet:
public class Benchmark
{
private const int N = 1000000;
[Benchmark]
public HashSet<int> EnumerableRange() => new HashSet<int>(Enumerable.Range(1, N + 1));
[Benchmark]
public HashSet<int> NoPreallocation()
{
var result = new HashSet<int>();
for (int n = 1; n < N + 1; n++)
{
result.Add(n);
}
return result;
}
[Benchmark]
public HashSet<int> Preallocation()
{
var result = new HashSet<int>(N);
for (int n = 1; n < N + 1; n++)
{
result.Add(n);
}
return result;
}
}
public class Program
{
public static void Main(string[] args)
{
BenchmarkRunner.Run(typeof(Program).Assembly);
}
}
With the results:
| Method          | Mean     | Error    | StdDev   |
|---------------- |---------:|---------:|---------:|
| EnumerableRange | 29.17 ms | 0.743 ms | 2.179 ms |
| NoPreallocation | 23.96 ms | 0.471 ms | 0.775 ms |
| Preallocation   | 11.68 ms | 0.233 ms | 0.665 ms |
As we can see, using LINQ is a bit slower than not using it (as expected), and pre-allocating saves a significant amount of time.
I've read other SO posts on the complexity of LINQ's OrderBy, such as this one, and so I'm wondering why the following test I made
using System;
using System.Linq;
using System.Collections.Generic;
using System.Diagnostics;
public class Program
{
public static void Main()
{
double[] avgs = new double[100];
int tests_per_size = 1000;
Random rnd = new Random();
Stopwatch stpw = new Stopwatch();
for(int i = 1; i <= avgs.Length; ++i)
{
double sum = 0;
int[] arr = new int[i];
for(int j = 0; j < tests_per_size; ++j)
{
for(int k = 0; k < arr.Length; ++k)
arr[k] = rnd.Next(Int32.MinValue, Int32.MaxValue);
stpw.Start();
var slist = arr.OrderBy(x => x).ToList();
stpw.Stop();
sum += stpw.ElapsedTicks;
}
avgs[i-1] = sum / (double)tests_per_size;
}
foreach(var t in avgs)
Console.WriteLine(t);
}
}
gave me the following results
15076,327
17261,652
19528,579
21993,155
24674,83
26927,163
29332,665
32018,45
35143,727
38955,111
43188,589
47605,542
52243,952
57166,918
63454,059
70261,749
75997,727
82249,885
88953,873
96958,163
104520,145
112432,1
120746,806
129694,464
138588,981
148007,988
157616,249
167493,94
177748,543
188904,677
200761,557
212235,986
225877,753
239173,783
252288,474
265901,092
279629,762
294529,835
309429,827
326944,916
343254,802
361306,427
378797,508
395831,364
413546,694
431166,319
449165,652
467562,618
487180,928
505969,021
525013,641
544555,831
564859,752
585357,237
606849,766
628464,581
651009,432
673865,517
697340,663
720709,903
744837,668
769024,863
793921,415
819441,534
845185,441
873421,004
901587,713
928140,083
955403,824
983023,284
1011295,028
1040868,504
1070366,748
1100416,455
1131158,53
1162260,852
1193641,253
1225165,58
1257410,12
1289450,658
1322668,533
1358718,074
1400162,62
1440996,876
1483102,815
1531781,127
1581157,377
1627831,867
1673969,553
1713026,287
1750012,667
1787497,946
1825893,268
1864184,643
1902912,621
1942420,978
1982395,399
2023052,109
2063803,114
2106027,85
Notice how it approximately doubles every 10 numbers.
Well, for one thing, you're never restarting your stopwatch, so the timings you're seeing are cumulative. If you change your Start() call to Restart(), you'll get some saner values.
Another important point to make is that you're only testing arrays up to a size of 100, which is not nearly enough to clearly see the algorithm's asymptotic behavior.
Finally, note that you're not just testing OrderBy(): you're also testing ToList(). The effect won't be huge, but a good test should isolate the parts that you're really interested in.
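Putting those fixes together, the measured part might look like the sketch below. Enumerating with foreach forces the lazy sort while avoiding ToList(); note that on newer runtimes shortcuts such as Count() or Last() are special-cased for ordered sequences and may skip the sort entirely, so they are not reliable ways to force it.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

class Program
{
    static void Main()
    {
        var rnd = new Random();
        var stpw = new Stopwatch();
        var arr = new int[1000];
        for (int k = 0; k < arr.Length; ++k)
            arr[k] = rnd.Next(int.MinValue, int.MaxValue);

        stpw.Restart(); // Restart, not Start: timings must not accumulate
        long checksum = 0;
        foreach (var x in arr.OrderBy(v => v)) // enumerating performs the sort
            checksum += x;
        stpw.Stop();

        Console.WriteLine($"{stpw.ElapsedTicks} ticks (checksum {checksum})");
    }
}
```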
I need to do very quick prefix ("SQL LIKE") searches over hundreds of thousands of keys. I have tried performance tests using a SortedList, a Dictionary, and a SortedDictionary, like so:
var dictionary = new Dictionary<string, object>();
// add a million random strings
var results = dictionary.Where(x=>x.Key.StartsWith(prefix));
I find that they all take a long time; Dictionary is the fastest and SortedDictionary the slowest.
Then I tried the Trie implementation from http://www.codeproject.com/Articles/640998/NET-Data-Structures-for-Prefix-String-Search-and-S which is an order of magnitude faster, i.e. milliseconds instead of seconds.
So my question is: is there no .NET collection I can use for this requirement? I would have assumed it would be a common one.
My basic test:
class Program
{
static readonly Dictionary<string, object> dictionary = new Dictionary<string, object>();
static Trie<object> trie = new Trie<object>();
static void Main(string[] args)
{
var random = new Random();
for (var i = 0; i < 100000; i++)
{
var randomstring = RandomString(random, 7);
dictionary.Add(randomstring, null);
trie.Add(randomstring, null);
}
var lookups = new string[10000];
for (var i = 0; i < lookups.Length; i++)
{
lookups[i] = RandomString(random, 3);
}
// compare searching
var sw = new Stopwatch();
sw.Start();
foreach (var lookup in lookups)
{
var exists = dictionary.Any(k => k.Key.StartsWith(lookup));
}
sw.Stop();
Console.WriteLine("dictionary.Any(k => k.Key.StartsWith(randomstring)) took : {0} ms", sw.ElapsedMilliseconds);
// test other collections
sw.Restart();
foreach (var lookup in lookups)
{
var exists = trie.Retrieve(lookup).Any();
}
sw.Stop();
Console.WriteLine("trie.Retrieve(lookup) took : {0} ms", sw.ElapsedMilliseconds);
Console.ReadKey();
}
public static string RandomString(Random random, int length)
{
const string chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
return new string(Enumerable.Repeat(chars, length)
.Select(s => s[random.Next(s.Length)]).ToArray());
}
}
Results:
dictionary.Any(k => k.Key.StartsWith(randomstring)) took : 80990 ms
trie.Retrieve(lookup) took : 115 ms
If sorting matters, try to use a SortedList instead of SortedDictionary. They both have the same functionality but they are implemented differently. SortedList is faster when you want to enumerate the elements (and you can access the elements by index), and SortedDictionary is faster if there are a lot of elements and you want to insert a new element in the middle of the collection.
So try this:
var sortedList = new SortedList<string, object>();
// populate list...
sortedList.Keys.Any(k => k.StartsWith(lookup));
If you have a million elements, but you don't want to re-order them once the dictionary is populated, you can combine their advantages: populate a SortedDictionary with the random elements, and then create a new List<KeyValuePair<,>> or SortedList<,> from that.
So, after a little testing, I found something close enough using a binary search. The only con is that you have to sort the keys from A to Z first, and the bigger the list, the slower it gets; a ternary search tree is about the fastest you can actually get on binary PC architecture.
Method (credit should go to @Guffa):
public static int BinarySearchStartsWith(List<string> words, string prefix, int min, int max)
{
while (max >= min)
{
var mid = min + (max - min) / 2; // avoids overflow on very large lists
var comp = string.CompareOrdinal(words[mid], 0, prefix, 0, prefix.Length); // no Substring: safe when the word is shorter than the prefix
if (comp >= 0)
{
if (comp > 0)
max = mid - 1;
else
return mid;
}
else
min = mid + 1;
}
return -1;
}
And the test implementation:
var keysToList = dictionary.Keys.OrderBy(q => q).ToList();
sw = new Stopwatch();
sw.Start();
foreach (var lookup in lookups)
{
bool exists = BinarySearchStartsWith(keysToList, lookup, 0, keysToList.Count - 1) != -1;
}
sw.Stop();
If you can sort the keys once and then use them repeatedly to look up prefixes, you can use a binary search to speed things up.
To get maximum performance, I use two arrays, one for keys and one for values, and the overload of Array.Sort() which sorts a main array and an adjunct array in tandem.
Then you can use Array.BinarySearch() to find the nearest key that starts with a given prefix, and return the indices of those that match.
When I try it, it seems to take only around 0.003 ms per check when there are one or more matching prefixes.
Here's a runnable console application to demonstrate (remember to do your timings on a RELEASE build):
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Diagnostics;
using System.Linq;
namespace Demo
{
class Program
{
public static void Main()
{
int count = 1000000;
object obj = new object();
var keys = new string[count];
var values = new object[count];
for (int i = 0; i < count; ++i)
{
keys[i] = randomString(5, 16);
values[i] = obj;
}
// Sort key array and value arrays in tandem to keep the relation between keys and values.
Array.Sort(keys, values);
// Now you can use StartsWith() to return the indices of strings in keys[]
// that start with a specific string. The indices can be used to look up the
// corresponding values in values[].
Console.WriteLine("Count of ZZ = " + StartsWith(keys, "ZZ").Count());
// Test a load of times with 1000 random prefixes.
var prefixes = new string[1000];
for (int i = 0; i < 1000; ++i)
prefixes[i] = randomString(1, 8);
var sw = Stopwatch.StartNew();
for (int i = 0; i < 1000; ++i)
for (int j = 0; j < 1000; ++j)
StartsWith(keys, prefixes[j]).Any();
Console.WriteLine("1,000,000 checks took {0} for {1} ms each.", sw.Elapsed, sw.ElapsedMilliseconds/1000000.0);
}
public static IEnumerable<int> StartsWith(string[] array, string prefix)
{
int index = Array.BinarySearch(array, prefix);
if (index < 0)
index = ~index;
// We might have landed partway through a set of matches, so find the first match.
if (index < array.Length)
while ((index > 0) && array[index-1].StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
--index;
while ((index < array.Length) && array[index].StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
yield return index++;
}
static string randomString(int minLength, int maxLength)
{
int length = rng.Next(minLength, maxLength);
const string CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
return new string(Enumerable.Repeat(CHARS, length)
.Select(s => s[rng.Next(s.Length)]).ToArray());
}
static readonly Random rng = new Random(12345);
}
}
I have implemented a B-Tree and now I am trying to find the best size per node. I am using time benchmarking to measure the speed.
The problem is that it crashes on the second number tested in the benchmarking method.
For the example below, the output in the console is
Benchmarking 10
Benchmarking 11
The crash is in the insert method of the Node class, but that doesn't seem to be the cause: when I tested any single number as SIZE, it worked fine. Maybe I don't understand how the class objects are created, or something like that.
I also tried calling the benchmarking method with various numbers from Main, but with the same result: it crashed during the second call.
Can somebody please look at it and explain to me what I am doing wrong? Thanks in advance!
I've put part of the code here; the whole thing is at http://pastebin.com/AcihW1Qk
public static void benchmark(String filename, int d, int h)
{
using (System.IO.StreamWriter file = new System.IO.StreamWriter(filename))
{
Stopwatch sw = new Stopwatch();
for(int i = d; i <= h; i++)
{
Console.WriteLine("Benchmarking SIZE = " + i.ToString());
file.WriteLine("SIZE = " + i.ToString());
sw.Start();
// code here
Node.setSize(i);
Tree tree = new Tree();
for (int k = 0; k < 10000000; k++)
tree.insert(k);
Random r = new Random(10);
for (int k = 0; k < 10000; k++)
{
int x = r.Next(10000000);
tree.contains(x);
}
file.WriteLine("Depth of tree is " + tree.depth.ToString());
// end of code
sw.Stop();
file.WriteLine("TIME = " + sw.ElapsedMilliseconds.ToString());
file.WriteLine();
sw.Reset();
}
}
}
static void Main(string[] args)
{
benchmark("benchmark 10-11.txt", 10,11);
}
Does anyone know of any speed difference between Where and FindAll on a List? I know Where is part of IEnumerable and FindAll is part of List; I'm just curious which is faster.
The FindAll method of the List<T> class actually constructs a new list object, and adds results to it. The Where extension method for IEnumerable<T> will simply iterate over an existing list and yield an enumeration of the matching results without creating or adding anything (other than the enumerator itself.)
Given a small set, the two would likely perform comparably. However, given a larger set, Where should outperform FindAll, as the new List created to contain the results has to grow dynamically to hold them. FindAll's memory usage also grows with the number of matching results, whereas Where by itself has constant, minimal memory usage (excluding whatever you do with the results).
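A quick way to see the behavioral (not speed) difference is that FindAll is eager while Where is deferred; a small sketch:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main()
    {
        var list = new List<int> { 1, 2, 3 };

        // FindAll runs immediately and snapshots the matches into a new List.
        List<int> found = list.FindAll(x => x > 1);

        // Where only builds a lazy query; nothing is scanned yet.
        IEnumerable<int> query = list.Where(x => x > 1);

        list.Add(4); // mutate the source after both calls

        Console.WriteLine(found.Count);   // 2: snapshot taken before the Add
        Console.WriteLine(query.Count()); // 3: the query sees the source as it is now
    }
}
```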
FindAll is obviously slower than Where, because it needs to create a new list.
Anyway, I think you should really consider Jon Hanna's comment: you'll probably need to perform some operations on your results, and a List would be more useful than an IEnumerable in many cases.
I wrote a small test; just paste it into a Console App project. It measures the time/ticks of the function execution and of operations on the results collection (to get the performance of 'real' usage, and to make sure the compiler won't optimize away unused data, etc.). I'm new to C# and don't know how it works yet, sorry.
Note: every measured function except WhereIEnumerable() creates a new List of elements. I might be doing something wrong, but clearly iterating an IEnumerable takes much more time than iterating a List.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Diagnostics;
namespace Tests
{
public class Dummy
{
public int Val;
public Dummy(int val)
{
Val = val;
}
}
public class WhereOrFindAll
{
const int ElCount = 20000000;
const int FilterVal =1000;
const int MaxVal = 2000;
const bool CheckSum = true; // Checks sum of elements in list of results
static List<Dummy> list = new List<Dummy>();
public delegate void FuncToTest();
public static long TestTicks(FuncToTest function, string msg)
{
Stopwatch watch = new Stopwatch();
watch.Start();
function();
watch.Stop();
Console.Write("\r\n"+msg + "\t ticks: " + (watch.ElapsedTicks));
return watch.ElapsedTicks;
}
static void Check(List<Dummy> list)
{
if (!CheckSum) return;
Stopwatch watch = new Stopwatch();
watch.Start();
long res=0;
int count = list.Count;
for (int i = 0; i < count; i++) res += list[i].Val;
for (int i = 0; i < count; i++) res -= (long)(list[i].Val * 0.3);
watch.Stop();
Console.Write("\r\n\nCheck sum: " + res.ToString() + "\t iteration ticks: " + watch.ElapsedTicks);
}
static void Check(IEnumerable<Dummy> ieNumerable)
{
if (!CheckSum) return;
Stopwatch watch = new Stopwatch();
watch.Start();
IEnumerator<Dummy> ieNumerator = ieNumerable.GetEnumerator();
long res = 0;
while (ieNumerator.MoveNext()) res += ieNumerator.Current.Val;
ieNumerator=ieNumerable.GetEnumerator();
while (ieNumerator.MoveNext()) res -= (long)(ieNumerator.Current.Val * 0.3);
watch.Stop();
Console.Write("\r\n\nCheck sum: " + res.ToString() + "\t iteration ticks :" + watch.ElapsedTicks);
}
static void Generate()
{
if (list.Count > 0)
return;
var rand = new Random();
for (int i = 0; i < ElCount; i++)
list.Add(new Dummy(rand.Next(MaxVal)));
}
static void For()
{
List<Dummy> resList = new List<Dummy>();
int count = list.Count;
for (int i = 0; i < count; i++)
{
if (list[i].Val < FilterVal)
resList.Add(list[i]);
}
Check(resList);
}
static void Foreach()
{
List<Dummy> resList = new List<Dummy>();
int count = list.Count;
foreach (Dummy dummy in list)
{
if (dummy.Val < FilterVal)
resList.Add(dummy);
}
Check(resList);
}
static void WhereToList()
{
List<Dummy> resList = list.Where(x => x.Val < FilterVal).ToList<Dummy>();
Check(resList);
}
static void WhereIEnumerable()
{
IEnumerable<Dummy> iEnumerable = list.Where(x => x.Val < FilterVal);
Check(iEnumerable);
}
static void FindAll()
{
List<Dummy> resList = list.FindAll(x => x.Val < FilterVal);
Check(resList);
}
public static void Run()
{
Generate();
long[] ticks = { 0, 0, 0, 0, 0 };
for (int i = 0; i < 10; i++)
{
ticks[0] += TestTicks(For, "For \t\t");
ticks[1] += TestTicks(Foreach, "Foreach \t");
ticks[2] += TestTicks(WhereToList, "Where to list \t");
ticks[3] += TestTicks(WhereIEnumerable, "Where Ienum \t");
ticks[4] += TestTicks(FindAll, "FindAll \t");
Console.Write("\r\n---------------");
}
for (int i = 0; i < 5; i++)
Console.Write("\r\n"+ticks[i].ToString());
}
}
class Program
{
static void Main(string[] args)
{
WhereOrFindAll.Run();
Console.Read();
}
}
}
Results(ticks) - CheckSum enabled(some operations on results), mode: release without debugging(CTRL+F5):
- 16,222,276 (for ->list)
- 17,151,121 (foreach -> list)
- 4,741,494 (where ->list)
- 27,122,285 (where ->ienum)
- 18,821,571 (findall ->list)
CheckSum disabled (not using returned list at all):
- 10,885,004 (for ->list)
- 11,221,888 (foreach ->list)
- 18,688,433 (where ->list)
- 1,075 (where ->ienum)
- 13,720,243 (findall ->list)
Your results may differ slightly; to get reliable numbers you need more iterations.
UPDATE (from a comment): Looking through that code, I agree; .Where should have, at worst, equal performance, but almost always better.
Original answer:
.FindAll() should be faster: it takes advantage of already knowing the List's size and loops through the internal array with a simple for loop. .Where() has to fire up an enumerator (a sealed framework class called WhereIterator in this case) and do the same job in a less specific way.
Keep in mind, though, that .Where() is enumerable: it does not actively create and fill a List in memory. It's more like a stream, so the memory use on something very large can differ significantly. Also, in .NET 4.0 you could start consuming the results in a parallel fashion much sooner using the .Where() approach.
Where is much, much faster than FindAll. No matter how big the list is, Where takes exactly the same amount of time.
Of course Where just creates a query. It doesn't actually do anything, unlike FindAll which does create a list.
The answer from jrista makes sense. However, the new list adds the same objects, so it grows only with references to existing objects, which should not be that slow.
As long as .NET 3.5 / LINQ extensions are available, Where remains the better choice anyway.
FindAll makes much more sense when you're limited to .NET 2.0.