Consider this code:
var str1 = "1234567890qwertyuiop[asdfghjkl;zxcvbnm1,.";
Dictionary<string, object> objects = new Dictionary<string, object> { { str1, new object() } };
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
for (int i = 0; i < 100000000; i++)
{
    object result;
    objects.TryGetValue(str1, out result);
}
stopwatch.Stop();
Console.WriteLine(stopwatch.ElapsedMilliseconds);
stopwatch.Reset();
//////////////////////////////////////////////////////////////
var list = new List<string>();
list.Add(str1);
stopwatch.Start();
for (int i = 0; i < 100000000; i++)
{
    foreach (var item in list)
    {
        var result = "";
        if (item == str1)
        {
            result = item;
        }
    }
}
stopwatch.Stop();
Console.WriteLine(stopwatch.ElapsedMilliseconds);
When you run this code, you will see this result:
5157 // Dictionary
3881 // List
So the list is faster than the dictionary.
I want to know why. Does it have something to do with the length of the string?
Because you only have one element, your list code simply skips a hash check and does no extra work.
Because you have such a long key, the dictionary is doing even more work to compute the key's hash, whereas the list implementation skips all of that because the strings are reference-equal (it doesn't need to compare the content at all).
Add more items and the performance will change dramatically.
In short, O(1) is not faster than O(n) if n is 1.
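To see the reference-equality shortcut at work, here is a minimal sketch reusing str1, list, and stopwatch from the code above; the copy variable is my addition. It forces a full character comparison by probing with a string that is content-equal to str1 but not the same object:
// Sketch: same loop, but the probe string is content-equal to str1
// without being the same reference, so == must compare every character.
var copy = new string(str1.ToCharArray());
stopwatch.Restart();
for (int i = 0; i < 100000000; i++)
{
    foreach (var item in list)
    {
        if (item == copy)
        {
            // full character-by-character comparison happens here
        }
    }
}
stopwatch.Stop();
Console.WriteLine(stopwatch.ElapsedMilliseconds); // expect this to be much slower than before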
A user can pass any number of lists, each with the same number of elements. For example, say the user has passed the three lists below (the number of lists could be dynamic, but each has the same number of elements):
hospitalId - H11, H12, H13...n
patientId - P11, P12, P13...n
statusId - S11, S13, S11...n
What is an efficient way of creating a set out of them and storing it as a string in the format below? I need C# code for it.
Expected output:
"((H11, P11, S11), (H12, P12, S13), (H13, P13, S11))"
You should iterate through the lists and append the elements index-wise to build the result.
StringBuilder builder = new StringBuilder();
builder.Append("(");
for (var index = 0; index < hospitalId.Count; index++) // one iteration per element
{
    if (index > 0)
        builder.Append(", "); // separator between groups, matching the expected output
    builder.AppendFormat("({0}, {1}, {2})", hospitalId[index], patientId[index], statusId[index]);
}
builder.Append(")");
var result = builder.ToString();
If you have n List<T> instances of the same length, a basic loop will do the trick. Here's a version as an extension method that takes any number of lists as input (remember that extension methods must live in a static class):
public static IEnumerable<IEnumerable<T>> ZipMultiple<T>(this List<List<T>> source)
{
    var counts = source.Select(s => s.Count).Distinct();
    if (counts.Count() != 1)
    {
        throw new ArgumentException("Lists aren't the same length");
    }
    for (var i = 0; i < counts.First(); i++)
    {
        var item = new List<T>();
        for (var j = 0; j < source.Count; j++)
        {
            item.Add(source[j][i]);
        }
        yield return item;
    }
}
After that, it's pretty simple to convert the result into a string in another loop, or you can do it as a one-liner:
var zipped = lists.ZipMultiple();
var output = $"({string.Join(", ", zipped.Select(x => $"({string.Join(",", x)})"))})";
I need to do very quick prefix ("SQL LIKE"-style) searches over hundreds of thousands of keys. I have tried performance tests using a SortedList, a Dictionary, and a SortedDictionary, like so:
var dictionary = new Dictionary<string, object>();
// add a million random strings
var results = dictionary.Where(x => x.Key.StartsWith(prefix));
I find that they all take a long time; Dictionary is the fastest and SortedDictionary the slowest.
Then I tried a Trie implementation from http://www.codeproject.com/Articles/640998/NET-Data-Structures-for-Prefix-String-Search-and-S which is orders of magnitude faster, i.e. milliseconds instead of seconds.
So my question is, is there no .NET collection I can use for this requirement? I would have assumed it to be a common one.
My basic test:
class Program
{
    static readonly Dictionary<string, object> dictionary = new Dictionary<string, object>();
    static Trie<object> trie = new Trie<object>();

    static void Main(string[] args)
    {
        var random = new Random();
        for (var i = 0; i < 100000; i++)
        {
            var randomstring = RandomString(random, 7);
            dictionary.Add(randomstring, null);
            trie.Add(randomstring, null);
        }
        var lookups = new string[10000];
        for (var i = 0; i < lookups.Length; i++)
        {
            lookups[i] = RandomString(random, 3);
        }
        // compare searching
        var sw = new Stopwatch();
        sw.Start();
        foreach (var lookup in lookups)
        {
            var exists = dictionary.Any(k => k.Key.StartsWith(lookup));
        }
        sw.Stop();
        Console.WriteLine("dictionary.Any(k => k.Key.StartsWith(randomstring)) took : {0} ms", sw.ElapsedMilliseconds);
        // test other collections
        sw.Restart();
        foreach (var lookup in lookups)
        {
            var exists = trie.Retrieve(lookup).Any();
        }
        sw.Stop();
        Console.WriteLine("trie.Retrieve(lookup) took : {0} ms", sw.ElapsedMilliseconds);
        Console.ReadKey();
    }

    public static string RandomString(Random random, int length)
    {
        const string chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
        return new string(Enumerable.Repeat(chars, length)
            .Select(s => s[random.Next(s.Length)]).ToArray());
    }
}
Results:
dictionary.Any(k => k.Key.StartsWith(randomstring)) took : 80990 ms
trie.Retrieve(lookup) took : 115 ms
If sorting matters, try to use a SortedList instead of SortedDictionary. They both have the same functionality but they are implemented differently. SortedList is faster when you want to enumerate the elements (and you can access the elements by index), and SortedDictionary is faster if there are a lot of elements and you want to insert a new element in the middle of the collection.
So try this:
var sortedList = new SortedList<string, object>();
// populate list...
sortedList.Keys.Any(k => k.StartsWith(lookup));
If you have a million elements, but you don't want to re-order them once the dictionary is populated, you can combine their advantages: populate a SortedDictionary with the random elements, and then create a new List<KeyValuePair<,>> or SortedList<,> from that.
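A minimal sketch of that combination, assuming the keys already live in the dictionary from the question:
// Cheap inserts while building, then a snapshot that is fast to enumerate.
var sortedDict = new SortedDictionary<string, object>(dictionary); // O(log n) inserts
var snapshot = new SortedList<string, object>(sortedDict);         // index-accessible, fast enumeration
// or: var pairs = sortedDict.ToList(); // List<KeyValuePair<string, object>>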
So, after a little testing I found something close enough using BinarySearch. The only con is that you have to sort the keys from a to z first. And the bigger the list, the slower it gets, so a ternary search is still the fastest you can actually find on a binary PC architecture.
Method (credits should go to @Guffa):
public static int BinarySearchStartsWith(List<string> words, string prefix, int min, int max)
{
    while (max >= min)
    {
        var mid = (min + max) / 2;
        // Compare only the prefix-length portion; guard against words shorter than the prefix.
        var len = Math.Min(words[mid].Length, prefix.Length);
        var comp = string.CompareOrdinal(words[mid].Substring(0, len), prefix);
        if (comp >= 0)
        {
            if (comp > 0)
                max = mid - 1;
            else
                return mid;
        }
        else
        {
            min = mid + 1;
        }
    }
    return -1;
}
And the test implementation:
var keysToList = dictionary.Keys.OrderBy(q => q).ToList();
sw = new Stopwatch();
sw.Start();
foreach (var lookup in lookups)
{
    bool exists = BinarySearchStartsWith(keysToList, lookup, 0, keysToList.Count - 1) != -1;
}
sw.Stop();
If you can sort the keys once and then use them repeatedly to look up the prefixes, then you can use a binary search to speed things up.
To get the maximum performance, I shall use two arrays, one for keys and one for values, and use the overload of Array.Sort() which sorts a main and an adjunct array.
Then you can use Array.BinarySearch() to search for the nearest key which starts with a given prefix, and return the indices for those that match.
When I try it, it seems to only take around 0.003ms per check if there are one or more matching prefixes.
Here's a runnable console application to demonstrate (remember to do your timings on a RELEASE build):
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Diagnostics;
using System.Linq;

namespace Demo
{
    class Program
    {
        public static void Main()
        {
            int count = 1000000;
            object obj = new object();
            var keys = new string[count];
            var values = new object[count];
            for (int i = 0; i < count; ++i)
            {
                keys[i] = randomString(5, 16);
                values[i] = obj;
            }
            // Sort key array and value arrays in tandem to keep the relation between keys and values.
            Array.Sort(keys, values);
            // Now you can use StartsWith() to return the indices of strings in keys[]
            // that start with a specific string. The indices can be used to look up the
            // corresponding values in values[].
            Console.WriteLine("Count of ZZ = " + StartsWith(keys, "ZZ").Count());
            // Test a load of times with 1000 random prefixes.
            var prefixes = new string[1000];
            for (int i = 0; i < 1000; ++i)
                prefixes[i] = randomString(1, 8);
            var sw = Stopwatch.StartNew();
            for (int i = 0; i < 1000; ++i)
                for (int j = 0; j < 1000; ++j)
                    StartsWith(keys, prefixes[j]).Any();
            Console.WriteLine("1,000,000 checks took {0} for {1} ms each.", sw.Elapsed, sw.ElapsedMilliseconds / 1000000.0);
        }

        public static IEnumerable<int> StartsWith(string[] array, string prefix)
        {
            int index = Array.BinarySearch(array, prefix);
            if (index < 0)
                index = ~index;
            // We might have landed partway through a set of matches, so find the first match.
            if (index < array.Length)
                while ((index > 0) && array[index - 1].StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
                    --index;
            while ((index < array.Length) && array[index].StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
                yield return index++;
        }

        static string randomString(int minLength, int maxLength)
        {
            int length = rng.Next(minLength, maxLength);
            const string CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
            return new string(Enumerable.Repeat(CHARS, length)
                .Select(s => s[rng.Next(s.Length)]).ToArray());
        }

        static readonly Random rng = new Random(12345);
    }
}
So I noticed that a treeview took unusually long to sort. At first I figured that most of the time was spent repainting the control after adding each sorted item. But either way, I had a gut feeling that List<T>.Sort() was taking longer than reasonable, so I wrote a custom sort method to benchmark it against. The results were interesting: List<T>.Sort() took ~20 times longer. That's the biggest performance disappointment I've ever encountered in .NET for such a simple task.
My question is, what could be the reason for this? My guess is the overhead of invoking the comparison delegate, which in turn has to call String.Compare() (in the case of string sorting). Increasing the size of the list appears to widen the performance gap. Any ideas? I'm trying to use .NET classes as much as possible, but in cases like this I just can't.
Edit:
static List<string> Sort(List<string> list)
{
    if (list.Count == 0)
    {
        return new List<string>();
    }
    List<string> _list = new List<string>(list.Count);
    _list.Add(list[0]);
    int length = list.Count;
    for (int i = 1; i < length; i++)
    {
        string item = list[i];
        int j;
        for (j = _list.Count - 1; j >= 0; j--)
        {
            if (String.Compare(item, _list[j]) > 0)
            {
                _list.Insert(j + 1, item);
                break;
            }
        }
        if (j == -1)
        {
            _list.Insert(0, item);
        }
    }
    return _list;
}
Answer: It's not.
I ran the following benchmark in a simple console app and your code was slower:
static void Main(string[] args)
{
    long totalListSortTime = 0;
    long totalCustomSortTime = 0;
    for (int c = 0; c < 100; c++)
    {
        List<string> list1 = new List<string>();
        List<string> list2 = new List<string>();
        for (int i = 0; i < 5000; i++)
        {
            var rando = RandomString(15);
            list1.Add(rando);
            list2.Add(rando);
        }
        Stopwatch watch1 = new Stopwatch();
        Stopwatch watch2 = new Stopwatch();
        watch2.Start();
        list2 = Sort(list2);
        watch2.Stop();
        totalCustomSortTime += watch2.ElapsedMilliseconds;
        watch1.Start();
        list1.Sort();
        watch1.Stop();
        totalListSortTime += watch1.ElapsedMilliseconds;
    }
    Console.WriteLine("totalListSortTime = " + totalListSortTime);
    Console.WriteLine("totalCustomSortTime = " + totalCustomSortTime);
    Console.ReadLine();
}
Result:
I haven't had time to fully test it because I had a blackout (writing from my phone now), but it seems your code (from Pastebin) sorts an already-ordered list several times, so your algorithm may simply be faster at sorting an already-sorted list. Your method is an insertion sort, which runs in roughly O(n) on sorted input but O(n²) in the general case. And if the standard .NET implementation is a quicksort, this would be natural, since quicksort hits its worst-case scenario on already-sorted lists.
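As a side note (my suggestion, not from the original answer): to level the playing field, shuffle the input before each run, and, if culture-aware ordering isn't required, compare ordinally, since the default string sort uses culture-sensitive comparison, which is considerably slower:
// Sketch: shuffle so neither algorithm benefits from already-sorted input.
var rng = new Random(42);
var shuffled = list1.OrderBy(_ => rng.Next()).ToList(); // list1 as in the benchmark above

// If culture rules don't matter, an ordinal comparer speeds up Sort() considerably.
shuffled.Sort(StringComparer.Ordinal);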
A question posted earlier got me thinking. Would Any() and Count() perform similarly when used on an empty list?
As explained here, both should go through the same steps of GetEnumerator()/MoveNext()/Dispose().
I tested this out using a quick program in LINQPad:
static void Main()
{
    var list = new List<int>();
    Stopwatch stopwatch = new Stopwatch();
    stopwatch.Start();
    for (int i = 0; i < 10000; i++)
        list.Any();
    stopwatch.Stop();
    Console.WriteLine("Time elapsed for Any() : {0}", stopwatch.Elapsed);
    stopwatch = new Stopwatch();
    stopwatch.Start();
    for (int i = 0; i < 10000; i++)
        list.Count();
    stopwatch.Stop();
    Console.WriteLine("Time elapsed for Count(): {0}", stopwatch.Elapsed);
}
And the general result seems to indicate that Count() is faster in this situation. Why is that?
I'm not sure if I got the benchmark right, I would appreciate any correction if not.
Edit: I understand that it would make more sense semantically. The first link I posted in the question shows a situation where it does make sense to use Count() directly, since the value would be used; hence the question.
The Count() method is optimized for the ICollection<T> type, so the pattern GetEnumerator()/MoveNext()/Dispose() is not used.
list.Count();
is translated to
((ICollection)list).Count;
whereas Any() has to build an enumerator.
So the Count() method is faster.
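Roughly, the fast path has the shape below. This is an illustrative sketch, not the actual framework source:
// Illustrative only: the shape of Count()'s collection shortcut.
public static int CountSketch<TSource>(IEnumerable<TSource> source)
{
    ICollection<TSource> collection = source as ICollection<TSource>;
    if (collection != null)
        return collection.Count; // O(1): no enumerator is allocated

    int count = 0;
    using (IEnumerator<TSource> e = source.GetEnumerator()) // fallback: walk the sequence
    {
        while (e.MoveNext())
            count++;
    }
    return count;
}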
Here are benchmarks for 4 different IEnumerable instances. MyEmpty looks like IEnumerable<T> MyEmpty<T>() { yield break; }
iterations : 100000000
Function                      Any()     Count()
new List<int>()               4.310     2.252
Enumerable.Empty<int>()       3.623     6.975
new int[0]                    3.960     7.036
MyEmpty<int>()                5.631     7.194
As casperOne said in the comments, Enumerable.Empty<int>() is an ICollection<int> because it is backed by an array, but arrays don't do well with the Count() extension because the cast to ICollection<int> is not trivial for arrays.
Anyway, for a homemade empty IEnumerable, we can see what we expected: Count() is slower than Any(), due to the overhead of testing whether the IEnumerable is an ICollection.
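A quick check (my addition, a sketch) confirms which of these sequences implement ICollection<int>; MyEmpty<T>() is the iterator defined in the benchmark below:
// Which empties take Count()'s O(1) shortcut?
Console.WriteLine(new List<int>() is ICollection<int>);         // True
Console.WriteLine(Enumerable.Empty<int>() is ICollection<int>); // True: backed by an empty array
Console.WriteLine(new int[0] is ICollection<int>);              // True: arrays implement ICollection<T>
Console.WriteLine(MyEmpty<int>() is ICollection<int>);          // False: compiler-generated iterator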
Complete benchmark:
class Program
{
    public const long Iterations = (long)1e8;

    static void Main()
    {
        var results = new Dictionary<string, Tuple<TimeSpan, TimeSpan>>();
        results.Add("new List<int>()", Benchmark(new List<int>(), Iterations));
        results.Add("Enumerable.Empty<int>()", Benchmark(Enumerable.Empty<int>(), Iterations));
        results.Add("new int[0]", Benchmark(new int[0], Iterations));
        results.Add("MyEmpty<int>()", Benchmark(MyEmpty<int>(), Iterations));
        Console.WriteLine("Function".PadRight(30) + "Any()".PadRight(10) + "Count()");
        foreach (var result in results)
        {
            Console.WriteLine("{0}{1}{2}", result.Key.PadRight(30), Math.Round(result.Value.Item1.TotalSeconds, 3).ToString().PadRight(10), Math.Round(result.Value.Item2.TotalSeconds, 3));
        }
        Console.ReadLine();
    }

    public static Tuple<TimeSpan, TimeSpan> Benchmark(IEnumerable<int> source, long iterations)
    {
        var anyWatch = new Stopwatch();
        anyWatch.Start();
        for (long i = 0; i < iterations; i++) source.Any();
        anyWatch.Stop();
        var countWatch = new Stopwatch();
        countWatch.Start();
        for (long i = 0; i < iterations; i++) source.Count();
        countWatch.Stop();
        return new Tuple<TimeSpan, TimeSpan>(anyWatch.Elapsed, countWatch.Elapsed);
    }

    public static IEnumerable<T> MyEmpty<T>() { yield break; }
}
Does anyone know of any speed difference between Where and FindAll on a List? I know Where is part of IEnumerable and FindAll is part of List; I'm just curious which is faster.
The FindAll method of the List<T> class actually constructs a new list object, and adds results to it. The Where extension method for IEnumerable<T> will simply iterate over an existing list and yield an enumeration of the matching results without creating or adding anything (other than the enumerator itself.)
Given a small set, the two would likely perform comparably. However, given a larger set, Where should outperform FindAll, as the new List created to contain the results will have to dynamically grow to contain additional results. FindAll's memory usage will also climb as the number of matching results increases (its backing array doubles whenever it fills), whereas Where has constant, minimal memory usage (in and of itself... excluding whatever you do with the results).
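To make the deferred-versus-eager distinction concrete, here is a small illustrative sketch (the names are mine, not from the answer):
var numbers = Enumerable.Range(0, 1000000).ToList();

// Deferred: no filtering happens until the result is enumerated.
IEnumerable<int> lazy = numbers.Where(n => n % 2 == 0);

// Eager: allocates and fills a new List<int> immediately.
List<int> eager = numbers.FindAll(n => n % 2 == 0);

// The work for 'lazy' happens here, one element at a time.
foreach (var n in lazy) { /* consume */ }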
FindAll is obviously slower than Where, because it needs to create a new list.
Anyway, I think you really should consider Jon Hanna's comment - you'll probably need to do some operations on your results, and a list would be more useful than an IEnumerable in many cases.
I wrote a small test; just paste it into a console app project. It measures the time/ticks of the function execution and of operations on the results collection (to get the performance of 'real' usage, and to be sure the compiler won't optimize away unused data, etc. - I'm new to C# and don't know how it works yet, sorry).
Note: every measured function except WhereIEnumerable() creates a new List of elements. I might be doing something wrong, but clearly iterating an IEnumerable takes much more time than iterating a List.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Diagnostics;

namespace Tests
{
    public class Dummy
    {
        public int Val;
        public Dummy(int val)
        {
            Val = val;
        }
    }

    public class WhereOrFindAll
    {
        const int ElCount = 20000000;
        const int FilterVal = 1000;
        const int MaxVal = 2000;
        const bool CheckSum = true; // Checks sum of elements in list of results

        static List<Dummy> list = new List<Dummy>();

        public delegate void FuncToTest();

        public static long TestTicks(FuncToTest function, string msg)
        {
            Stopwatch watch = new Stopwatch();
            watch.Start();
            function();
            watch.Stop();
            Console.Write("\r\n" + msg + "\t ticks: " + (watch.ElapsedTicks));
            return watch.ElapsedTicks;
        }

        static void Check(List<Dummy> list)
        {
            if (!CheckSum) return;
            Stopwatch watch = new Stopwatch();
            watch.Start();
            long res = 0;
            int count = list.Count;
            for (int i = 0; i < count; i++) res += list[i].Val;
            for (int i = 0; i < count; i++) res -= (long)(list[i].Val * 0.3);
            watch.Stop();
            Console.Write("\r\n\nCheck sum: " + res.ToString() + "\t iteration ticks: " + watch.ElapsedTicks);
        }

        static void Check(IEnumerable<Dummy> ieNumerable)
        {
            if (!CheckSum) return;
            Stopwatch watch = new Stopwatch();
            watch.Start();
            IEnumerator<Dummy> ieNumerator = ieNumerable.GetEnumerator();
            long res = 0;
            while (ieNumerator.MoveNext()) res += ieNumerator.Current.Val;
            ieNumerator = ieNumerable.GetEnumerator();
            while (ieNumerator.MoveNext()) res -= (long)(ieNumerator.Current.Val * 0.3);
            watch.Stop();
            Console.Write("\r\n\nCheck sum: " + res.ToString() + "\t iteration ticks: " + watch.ElapsedTicks);
        }

        static void Generate()
        {
            if (list.Count > 0)
                return;
            var rand = new Random();
            for (int i = 0; i < ElCount; i++)
                list.Add(new Dummy(rand.Next(MaxVal)));
        }

        static void For()
        {
            List<Dummy> resList = new List<Dummy>();
            int count = list.Count;
            for (int i = 0; i < count; i++)
            {
                if (list[i].Val < FilterVal)
                    resList.Add(list[i]);
            }
            Check(resList);
        }

        static void Foreach()
        {
            List<Dummy> resList = new List<Dummy>();
            foreach (Dummy dummy in list)
            {
                if (dummy.Val < FilterVal)
                    resList.Add(dummy);
            }
            Check(resList);
        }

        static void WhereToList()
        {
            List<Dummy> resList = list.Where(x => x.Val < FilterVal).ToList<Dummy>();
            Check(resList);
        }

        static void WhereIEnumerable()
        {
            IEnumerable<Dummy> iEnumerable = list.Where(x => x.Val < FilterVal);
            Check(iEnumerable);
        }

        static void FindAll()
        {
            List<Dummy> resList = list.FindAll(x => x.Val < FilterVal);
            Check(resList);
        }

        public static void Run()
        {
            Generate();
            long[] ticks = { 0, 0, 0, 0, 0 };
            for (int i = 0; i < 10; i++)
            {
                ticks[0] += TestTicks(For, "For \t\t");
                ticks[1] += TestTicks(Foreach, "Foreach \t");
                ticks[2] += TestTicks(WhereToList, "Where to list \t");
                ticks[3] += TestTicks(WhereIEnumerable, "Where Ienum \t");
                ticks[4] += TestTicks(FindAll, "FindAll \t");
                Console.Write("\r\n---------------");
            }
            for (int i = 0; i < 5; i++)
                Console.Write("\r\n" + ticks[i].ToString());
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            WhereOrFindAll.Run();
            Console.Read();
        }
    }
}
Results (ticks) - CheckSum enabled (some operations on results), mode: release without debugging (CTRL+F5):
- 16,222,276 (for ->list)
- 17,151,121 (foreach -> list)
- 4,741,494 (where ->list)
- 27,122,285 (where ->ienum)
- 18,821,571 (findall ->list)
CheckSum disabled (not using returned list at all):
- 10,885,004 (for ->list)
- 11,221,888 (foreach ->list)
- 18,688,433 (where ->list)
- 1,075 (where ->ienum)
- 13,720,243 (findall ->list)
Your results can be slightly different; to get real results you need more iterations.
UPDATE (from a comment): Looking through that code, I agree; .Where should have, at worst, equal performance, but almost always better.
Original answer:
.FindAll() should be faster; it takes advantage of already knowing the List's size and loops through the internal array with a simple for loop. .Where() has to fire up an enumerator (a sealed framework class called WhereIterator in this case) and do the same job in a less specific way.
Keep in mind, though, that .Where() is enumerable: it's not actively creating a List in memory and filling it. It's more like a stream, so the memory use on something very large can make a significant difference. Also, you could start consuming the results in a parallel fashion much faster using the .Where() approach in .NET 4.0.
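For the parallel consumption mentioned above, a hedged sketch using PLINQ (available from .NET 4.0; list and FilterVal as in the benchmark earlier in this thread):
// PLINQ sketch: matches can be consumed as they are produced, across threads.
var matches = list.AsParallel().Where(x => x.Val < FilterVal);
foreach (var match in matches)
{
    // process each match while filtering continues on other threads
}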
Where is much, much faster than FindAll. No matter how big the list is, Where takes exactly the same amount of time.
Of course Where just creates a query. It doesn't actually do anything, unlike FindAll which does create a list.
The answer from jrista makes sense. However, the new list holds the same objects, so it only grows a collection of references to existing objects, which should not be that slow.
As long as .NET 3.5 / LINQ extensions are available, Where stays the better choice anyway.
FindAll makes much more sense when you're limited to .NET 2.0.