Linq Optimization for Count And Group By


I've written code that counts the frequency of each byte in a binary file using LINQ. The code seems slow when evaluating the LINQ expression, and it seems hard to parallelize this kind of logic. Building the frequency table for a 475MB file took approximately 1 minute.
class Program
{
static void Main(string[] args)
{
Dictionary<byte, int> freq = new Dictionary<byte, int>();
Stopwatch sw = new Stopwatch();
sw.Start();
//File Size 478.668 KB
byte[] ltext = File.ReadAllBytes(@"D:\Setup.exe");
sw.Stop();
Console.WriteLine("Reading File {0}", GetTime(sw));
sw.Restart(); // Start() alone would add to the file-read time already recorded
Dictionary<byte, int> result = (from i in ltext
group i by i into g
orderby g.Count() descending
select new { Key = g.Key, Freq = g.Count() })
.ToDictionary(x => x.Key, x => x.Freq);
sw.Stop();
Console.WriteLine("Generating Freq Table {0}", GetTime(sw));
foreach (var i in result)
{
Console.WriteLine(i);
}
Console.WriteLine(result.Count);
Console.ReadLine();
}
static string GetTime(Stopwatch sw)
{
TimeSpan ts = sw.Elapsed;
string elapsedTime = String.Format("{0} min {1} sec {2} ms",ts.Minutes, ts.Seconds, ts.Milliseconds);
return elapsedTime;
}
}
I've tried a non-LINQ solution using a few loops, but the performance is about the same. Any advice on optimizing this would be appreciated. Sorry for my bad English.

This took a bit over a second on a 442MB file on my otherwise poky Dell laptop:
byte[] ltext = File.ReadAllBytes(@"c:\temp\bigfile.bin");
var freq = new long[256];
var sw = Stopwatch.StartNew();
foreach (byte b in ltext) {
freq[b]++;
}
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds);
Very hard to beat the raw perf of an array.
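If one core is not enough, the same array idea can be spread over several threads. The following is only a sketch under assumptions, not a measured result: `ByteHistogram` and `Count` are names made up for this illustration. Each worker fills its own private `long[256]`, and the private tables are merged under a lock when the worker finishes. A loop this cheap may well be limited by memory bandwidth rather than CPU, so measure before keeping it.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical helper for illustration; not part of the original answers.
public static class ByteHistogram
{
    public static long[] Count(byte[] data)
    {
        var total = new long[256];
        if (data.Length == 0)
            return total; // Partitioner.Create rejects an empty range

        Parallel.ForEach(
            Partitioner.Create(0, data.Length),   // splits the index range into chunks
            () => new long[256],                  // one private table per worker
            (range, loopState, local) =>
            {
                for (int i = range.Item1; i < range.Item2; i++)
                    local[data[i]]++;
                return local;
            },
            local =>
            {
                // Merge the private table into the shared one exactly once per worker.
                lock (total)
                {
                    for (int b = 0; b < 256; b++)
                        total[b] += local[b];
                }
            });

        return total;
    }
}
```

Calling `ByteHistogram.Count(ltext)` would replace the `foreach` loop above; the result has the same shape as `freq`.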

The following displays the frequency of bytes in descending order in a 465MB file on my machine in under 9 seconds when built in release mode.
Note, I've made it faster by reading the file in 100000 byte blocks (you can experiment with this - 16K blocks made no appreciable difference on my machine). The point is that the inner loop is the one supplying bytes. Calling Stream.ReadByte() is fast but not nearly as fast as indexing a byte in an array.
Also, reading the whole file into memory exerts extreme memory pressure which will hamper performance and will fail completely if the file is large enough.
using System;
using System.Diagnostics;
using System.IO;
using System.Linq;
class Program
{
static void Main( string[] args )
{
Console.WriteLine( "Reading file..." );
var sw = Stopwatch.StartNew();
var frequency = new long[ 256 ];
using ( var input = File.OpenRead( @"c:\Temp\TestFile.dat" ) )
{
var buffer = new byte[ 100000 ];
int bytesRead;
do
{
bytesRead = input.Read( buffer, 0, buffer.Length );
for ( var i = 0; i < bytesRead; i++ )
frequency[ buffer[ i ] ]++;
} while ( bytesRead > 0 ); // Read may return fewer bytes than requested before the end of the file
}
Console.WriteLine( "Read file in " + sw.ElapsedMilliseconds + "ms" );
var result = frequency.Select( ( f, i ) => new ByteFrequency { Byte = i, Frequency = f } )
.OrderByDescending( x => x.Frequency );
foreach ( var byteCount in result )
Console.WriteLine( byteCount.Byte + " " + byteCount.Frequency );
}
public class ByteFrequency
{
public int Byte { get; set; }
public long Frequency { get; set; }
}
}

Why not just
int[] freq = new int[256];
foreach (byte b in ltext)
freq[b]++;
?

Related

C# OpenCL GPU implementation for double array math

How can I make the for loop of this function to use the GPU with OpenCL?
public static double[] Calculate(double[] num, int period)
{
var final = new double[num.Length];
double sum = num[0];
double coeff = 2.0 / (1.0 + period);
for (int i = 0; i < num.Length; i++)
{
sum += coeff * (num[i] - sum);
final[i] = sum;
}
return final;
}

Your problem as written does not fit well with something that would work on a GPU. You cannot parallelize (in a way that improves performance) the operation on a single array because the value of the nth element depends on elements 1 to n. However, you can utilize the GPU to process multiple arrays, where each GPU core operates on a separate array.
The full code for the solution is at the end of the answer. Running the test on 10,000 arrays of up to 10,000 elements each produces the following (on a GTX1080M and an i7 7700k with 32GB RAM):
Task Generating Data: 1096.4583ms
Task CPU Single Thread: 596.2624ms
Task CPU Parallel: 179.1717ms
GPU CPU->GPU: 89ms
GPU Execute: 86ms
GPU GPU->CPU: 29ms
Task Running GPU: 921.4781ms
Finished
In this test, we measure the speed at which we can generate results into a managed C# array using the CPU with one thread, the CPU with all threads, and finally the GPU using all cores. We validate that the results from each test are identical, using the function AreTheSame.
The fastest time is processing the arrays on the CPU using all threads (Task CPU Parallel: 179ms).
The GPU is actually the slowest (Task Running GPU: 922ms), but this is because of the time taken to reformat the C# arrays in a way that they can be transferred onto the GPU.
If this bottleneck were removed (which is quite possible, depending on your use case), the GPU could potentially be the fastest. If the data were already formatted in a manner that can be transferred onto the GPU immediately, the total processing time for the GPU would be 204ms (CPU->GPU: 89ms + Execute: 86ms + GPU->CPU: 29ms = 204ms). This is still slower than the parallel CPU option, but on a different sort of data set, it might be faster.
To get the data back from the GPU (the most important part of actually using the GPU), we use the function ComputeCommandQueue.Read. This transfers the altered array on the GPU back to the CPU.
To run the following code, reference the Cloo Nuget Package (I used 0.9.1). And make sure to compile on x64 (you will need the memory). You may need to update your graphics card driver too if it fails to find an OpenCL device.
class Program
{
static string CalculateKernel
{
get
{
return @"
kernel void Calc(global int* offsets, global int* lengths, global double* doubles, double periodFactor)
{
int id = get_global_id(0);
int start = offsets[id];
int length = lengths[id];
int end = start + length;
double sum = doubles[start];
for(int i = start; i < end; i++)
{
sum = sum + periodFactor * ( doubles[i] - sum );
doubles[i] = sum;
}
}";
}
}
public static double[] Calculate(double[] num, int period)
{
var final = new double[num.Length];
double sum = num[0];
double coeff = 2.0 / (1.0 + period);
for (int i = 0; i < num.Length; i++)
{
sum += coeff * (num[i] - sum);
final[i] = sum;
}
return final;
}
static void Main(string[] args)
{
int maxElements = 10000;
int numArrays = 10000;
int computeCores = 2048;
double[][] sets = new double[numArrays][];
using (Timer("Generating Data"))
{
Random elementRand = new Random(1);
for (int i = 0; i < numArrays; i++)
{
sets[i] = GetRandomDoubles(elementRand.Next((int)(maxElements * 0.9), maxElements), randomSeed: i);
}
}
int period = 14;
double[][] singleResults;
using (Timer("CPU Single Thread"))
{
singleResults = CalculateCPU(sets, period);
}
double[][] parallelResults;
using (Timer("CPU Parallel"))
{
parallelResults = CalculateCPUParallel(sets, period);
}
if (!AreTheSame(singleResults, parallelResults)) throw new Exception();
double[][] gpuResults;
using (Timer("Running GPU"))
{
gpuResults = CalculateGPU(computeCores, sets, period);
}
if (!AreTheSame(singleResults, gpuResults)) throw new Exception();
Console.WriteLine("Finished");
Console.ReadKey();
}
public static bool AreTheSame(double[][] a1, double[][] a2)
{
if (a1.Length != a2.Length) return false;
for (int i = 0; i < a1.Length; i++)
{
var ar1 = a1[i];
var ar2 = a2[i];
if (ar1.Length != ar2.Length) return false;
for (int j = 0; j < ar1.Length; j++)
if (Math.Abs(ar1[j] - ar2[j]) > 0.0000001) return false;
}
return true;
}
public static double[][] CalculateGPU(int partitionSize, double[][] sets, int period)
{
ComputeContextPropertyList cpl = new ComputeContextPropertyList(ComputePlatform.Platforms[0]);
ComputeContext context = new ComputeContext(ComputeDeviceTypes.Gpu, cpl, null, IntPtr.Zero);
ComputeProgram program = new ComputeProgram(context, new string[] { CalculateKernel });
program.Build(null, null, null, IntPtr.Zero);
ComputeCommandQueue commands = new ComputeCommandQueue(context, context.Devices[0], ComputeCommandQueueFlags.None);
ComputeEventList events = new ComputeEventList();
ComputeKernel kernel = program.CreateKernel("Calc");
double[][] results = new double[sets.Length][];
double periodFactor = 2d / (1d + period);
Stopwatch sendStopWatch = new Stopwatch();
Stopwatch executeStopWatch = new Stopwatch();
Stopwatch recieveStopWatch = new Stopwatch();
int offset = 0;
while (true)
{
int first = offset;
int last = Math.Min(offset + partitionSize, sets.Length);
int length = last - first;
var merged = Merge(sets, first, length);
sendStopWatch.Start();
ComputeBuffer<int> offsetBuffer = new ComputeBuffer<int>(
context,
ComputeMemoryFlags.ReadWrite | ComputeMemoryFlags.UseHostPointer,
merged.Offsets);
ComputeBuffer<int> lengthsBuffer = new ComputeBuffer<int>(
context,
ComputeMemoryFlags.ReadWrite | ComputeMemoryFlags.UseHostPointer,
merged.Lengths);
ComputeBuffer<double> doublesBuffer = new ComputeBuffer<double>(
context,
ComputeMemoryFlags.ReadWrite | ComputeMemoryFlags.UseHostPointer,
merged.Doubles);
kernel.SetMemoryArgument(0, offsetBuffer);
kernel.SetMemoryArgument(1, lengthsBuffer);
kernel.SetMemoryArgument(2, doublesBuffer);
kernel.SetValueArgument(3, periodFactor);
sendStopWatch.Stop();
executeStopWatch.Start();
commands.Execute(kernel, null, new long[] { merged.Lengths.Length }, null, events);
executeStopWatch.Stop();
using (var pin = Pinned(merged.Doubles))
{
recieveStopWatch.Start();
commands.Read(doublesBuffer, false, 0, merged.Doubles.Length, pin.Address, events);
commands.Finish();
recieveStopWatch.Stop();
}
for (int i = 0; i < merged.Lengths.Length; i++)
{
int len = merged.Lengths[i];
int off = merged.Offsets[i];
var res = new double[len];
Array.Copy(merged.Doubles,off,res,0,len);
results[first + i] = res;
}
offset += partitionSize;
if (offset >= sets.Length) break;
}
Console.WriteLine("GPU CPU->GPU: " + recieveStopWatch.ElapsedMilliseconds + "ms");
Console.WriteLine("GPU Execute: " + executeStopWatch.ElapsedMilliseconds + "ms");
Console.WriteLine("GPU GPU->CPU: " + sendStopWatch.ElapsedMilliseconds + "ms");
return results;
}
public static PinnedHandle Pinned(object obj) => new PinnedHandle(obj);
public class PinnedHandle : IDisposable
{
public IntPtr Address => handle.AddrOfPinnedObject();
private GCHandle handle;
public PinnedHandle(object val)
{
handle = GCHandle.Alloc(val, GCHandleType.Pinned);
}
public void Dispose()
{
handle.Free();
}
}
public class MergedResults
{
public double[] Doubles { get; set; }
public int[] Lengths { get; set; }
public int[] Offsets { get; set; }
}
public static MergedResults Merge(double[][] sets, int offset, int length)
{
List<int> lengths = new List<int>(length);
List<int> offsets = new List<int>(length);
for (int i = 0; i < length; i++)
{
var arr = sets[i + offset];
lengths.Add(arr.Length);
}
var totalLength = lengths.Sum();
double[] doubles = new double[totalLength];
int dataOffset = 0;
for (int i = 0; i < length; i++)
{
var arr = sets[i + offset];
Array.Copy(arr, 0, doubles, dataOffset, arr.Length);
offsets.Add(dataOffset);
dataOffset += arr.Length;
}
return new MergedResults()
{
Doubles = doubles,
Lengths = lengths.ToArray(),
Offsets = offsets.ToArray(),
};
}
public static IDisposable Timer(string name)
{
return new SWTimer(name);
}
public class SWTimer : IDisposable
{
private Stopwatch _sw;
private string _name;
public SWTimer(string name)
{
_name = name;
_sw = Stopwatch.StartNew();
}
public void Dispose()
{
_sw.Stop();
Console.WriteLine("Task " + _name + ": " + _sw.Elapsed.TotalMilliseconds + "ms");
}
}
public static double[][] CalculateCPU(double[][] arrays, int period)
{
double[][] results = new double[arrays.Length][];
for (var index = 0; index < arrays.Length; index++)
{
var arr = arrays[index];
results[index] = Calculate(arr, period);
}
return results;
}
public static double[][] CalculateCPUParallel(double[][] arrays, int period)
{
double[][] results = new double[arrays.Length][];
Parallel.For(0, arrays.Length, i =>
{
var arr = arrays[i];
results[i] = Calculate(arr, period);
});
return results;
}
static double[] GetRandomDoubles(int num, int randomSeed)
{
Random r = new Random(randomSeed);
var res = new double[num];
for (int i = 0; i < num; i++)
res[i] = r.NextDouble() * 0.9 + 0.05;
return res;
}
}

As commenter Cory stated, refer to this link for setup:
How to use your GPU in .NET
Here is how you would use this project:
Add the Nuget Package Cloo
Add reference to OpenCLlib.dll
Download OpenCLLib.zip
Add using OpenCL
static void Main(string[] args)
{
int[] Primes = { 1,2,3,4,5,6,7 };
EasyCL cl = new EasyCL();
cl.Accelerator = AcceleratorDevice.GPU;
cl.LoadKernel(IsPrime);
cl.Invoke("GetIfPrime", 0, Primes.Length, Primes, 1.0);
}
static string IsPrime
{
get
{
return @"
kernel void GetIfPrime(global int* num, int period)
{
int index = get_global_id(0);
int sum = (2.0 / (1.0 + period)) * (num[index] - num[0]);
printf("" %d \n"",sum);
}";
}
}

for (int i = 0; i < num.Length; i++)
{
sum += coeff * (num[i] - sum);
final[i] = sum;
}
This means the first element is multiplied by coeff once and subtracted from the 2nd element. The first element is also multiplied by the square of coeff and this time added to the 3rd element. Then the first element is multiplied by the cube of coeff and subtracted from the 4th element.
The pattern continues like this:
-e0*c*c*c + e1*c*c - e2*c = f3
e0*c*c*c*c - e1*c*c*c + e2*c*c - e3*c = f4
-e0*c*c*c*c*c + e1*c*c*c*c - e2*c*c*c + e3*c*c - e4*c =f5
For all elements, scan through for all smaller id elements and compute this:
If the difference between the id values of two elements (let's call it k) is odd, subtract; otherwise add. Before the addition or subtraction, multiply that value by the k-th power of coeff. Lastly, multiply the current num value by coeff and add it to the current cell. The current cell value is final(i).
This is O(N*N) and looks like an all-pairs compute kernel. An example using an open-source C# OpenCL project:
ClNumberCruncher cruncher = new ClNumberCruncher(ClPlatforms.all().gpus(), @"
__kernel void foo(__global double * num, __global double * final, __global int *parameters)
{
int threadId = get_global_id(0);
int period = parameters[0];
double coeff = 2.0 / (1.0 + period);
double sumOfElements = 0.0;
for(int i=0;i<threadId;i++)
{
// negativity of coeff is to select addition or subtraction for different powers of coeff
double powKofCoeff = pow(-coeff,threadId-i);
sumOfElements += powKofCoeff * num[i];
}
final[threadId] = sumOfElements + num[threadId] * coeff;
}
");
cruncher.performanceFeed = true; // getting benchmark feedback on console
double[] numArray = new double[10000];
double[] finalArray = new double[10000];
int[] parameters = new int[10];
int period = 15;
parameters[0] = period;
ClArray<double> numGpuArray = numArray;
numGpuArray.readOnly = true; // gpus read this from host
ClArray<double> finalGpuArray = finalArray; // finalArray will have results
finalGpuArray.writeOnly = true; // gpus write this to host
ClArray<int> parametersGpu = parameters;
parametersGpu.readOnly = true;
// calculate kernels with exact same ordering of parameters
// num(double),final(double),parameters(int)
// finalGpuArray points to __global double * final
numGpuArray.nextParam(finalGpuArray, parametersGpu).compute(cruncher, 1, "foo", 10000, 100);
// first compute always lags because of compiling the kernel so here are repeated computes to get actual performance
numGpuArray.nextParam(finalGpuArray, parametersGpu).compute(cruncher, 1, "foo", 10000, 100);
numGpuArray.nextParam(finalGpuArray, parametersGpu).compute(cruncher, 1, "foo", 10000, 100);
The results end up in finalArray for 10000 elements, using 100 work items per work group.
The GPGPU part takes 82ms on an RX 550 GPU, which has a very low ratio of 64-bit to 32-bit compute performance (recent consumer gaming cards are not good at double precision). An Nvidia Tesla or an AMD Vega would compute this kernel without that handicap. An FX-8150 (8 cores) completes it in 683ms. If you need to specifically select only an integrated GPU and its CPU, you can use
ClPlatforms.all().gpus().devicesWithHostMemorySharing() + ClPlatforms.all().cpus() when creating ClNumberCruncher instance.
Binaries of the API:
https://www.codeproject.com/Articles/1181213/Easy-OpenCL-Multiple-Device-Load-Balancing-and-Pip
Or the source code to compile on your PC:
https://github.com/tugrul512bit/Cekirdekler
If you have multiple GPUs, it uses them without any extra code. Including a CPU in the computation pulls GPU effectiveness down in this sample for the first iteration (repetitions complete in 76ms with CPU+GPU), so it is better to use 2-3 GPUs instead of CPU+GPU.
I didn't check numerical stability or the correctness of any values; you should do that. (You should use Kahan summation when adding millions or more values into the same variable. I left it out for readability, and I don't know whether 64-bit values need it the way 32-bit ones do.) Also, the foo kernel is not optimized. It leaves the cores idle about 50% of the time, so the work would be better scheduled like this:
thread-0: compute element 0 and element N-1
thread-1: compute element 1 and element N-2
thread-m: compute element N/2-1 and element N/2
so all workitems get similar amount of work. On top of this, using 100 for workgroup size is not optimal. It should be something like 128,256,512 or 1024(for Nvidia) but this means array size should also be an integer multiple of this too. Then it would need extra control logic in the kernel to not go out of array borders. For even more performance, for loop could have multiple partial sums to do a "loop unrolling".

Why is my ElapsedMilliseconds always zero here?

So I'm trying to measure the performance of the hash set I created versus the performance of the same elements in a List and in the following block of code
Stopwatch Watch = new Stopwatch();
long tList = 0, tHset = 0; // ms
foreach ( string Str in Copy )
{
// measure time to look up string in ordinary list
Watch.Start();
if ( ListVersion.Contains(Str) ) { }
Watch.Stop();
tList += Watch.ElapsedMilliseconds;
// now measure time to look up same string in my hash set
Watch.Reset();
Watch.Start();
if ( this.Contains(Str) ) { }
Watch.Stop();
tHset += Watch.ElapsedMilliseconds;
Watch.Reset();
}
int n = Copy.Count;
Console.WriteLine("Average milliseconds to look up in List: {0}", tList / n);
Console.WriteLine("Average milliseconds to look up in hashset: {0}", tHset / n);
it is outputting 0 for both. Any idea why this is? Relevant documentation: https://msdn.microsoft.com/en-us/library/system.diagnostics.stopwatch(v=vs.110).aspx

That's because each operation completes in far less than a millisecond, and ElapsedMilliseconds only reports whole milliseconds from the Stopwatch.
Instead of measuring each Contains call separately, measure a group of them:
Stopwatch Watch = new Stopwatch();
long tList = 0, tHset = 0; // ms
// measure time to look up string in ordinary list
Watch.Start();
foreach ( string Str in Copy )
{
if ( ListVersion.Contains(Str) ) { }
}
Watch.Stop();
tList = Watch.ElapsedMilliseconds;
// now measure time to look up same string in my hash set
Watch.Reset();
Watch.Start();
foreach ( string Str in Copy )
{
if ( this.Contains(Str) ) { }
}
Watch.Stop();
tHset = Watch.ElapsedMilliseconds;
Console.WriteLine("Total milliseconds to look up in List: {0}", tList);
Console.WriteLine("Total milliseconds to look up in hashset: {0}", tHset);
As you can see, I also changed the code to print total time spent instead of average. With operations so fast performance is usually presented in Xs per Y operations instead of average. E.g. 40ms per 10 million lookups.
Also, it's possible that in Release mode parts of your code will be optimized away, because it doesn't actually do anything. Consider counting number of elements for which Contains returns true and printing that number out at the end.
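One way to act on that last suggestion is sketched below. `LookupBenchmark` and `CountHits` are hypothetical names introduced only for this illustration, and the sketch assumes the collection under test implements `ICollection<string>` (true for `List<string>` and `HashSet<string>`; the asker's own hash set may need an adapter). The whole batch is timed once, and the hit count is returned so the `Contains` calls have a visible result.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical helper for illustration; not part of the original answer.
public static class LookupBenchmark
{
    // Times every lookup as one batch and returns how many probes were found.
    public static int CountHits(ICollection<string> collection,
                                IEnumerable<string> probes,
                                out TimeSpan elapsed)
    {
        int hits = 0;
        var watch = Stopwatch.StartNew();
        foreach (string probe in probes)
        {
            if (collection.Contains(probe))
                hits++; // using the result keeps the call from being dead code
        }
        watch.Stop();
        elapsed = watch.Elapsed;
        return hits;
    }
}
```

Printing both the returned count and `elapsed.TotalMilliseconds` at the end gives a number that reflects work actually done.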

You can keep your code as it is and, instead of using:
Watch.ElapsedMilliseconds
use this:
Watch.Elapsed.TotalMilliseconds
This way you get the fractional part of the millisecond.
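The difference comes from integer truncation: `ElapsedMilliseconds` is a `long`, while `Elapsed.TotalMilliseconds` is a `double`. As a rough sketch of the same conversion done by hand from raw ticks (the class and method names here are invented for illustration):

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper for illustration; not part of the original answer.
public static class StopwatchUnits
{
    // Converts raw Stopwatch ticks to milliseconds, keeping the fraction.
    // Stopwatch.Frequency is the number of ticks per second on this machine.
    public static double TicksToMilliseconds(long stopwatchTicks)
    {
        return stopwatchTicks * 1000.0 / Stopwatch.Frequency;
    }
}
```

`StopwatchUnits.TicksToMilliseconds(Watch.ElapsedTicks)` should report the same fractional value that `Watch.Elapsed.TotalMilliseconds` does, up to rounding.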

What is the fastest implementation of sql like 'x%' in c# collections on a key

I need to do very quick prefix "SQL LIKE" searches over hundreds of thousands of keys. I have run performance tests using a SortedList, a Dictionary, and a SortedDictionary, like so:
var dictionary = new Dictionary<string, object>();
// add a million random strings
var results = dictionary.Where(x=>x.Key.StartsWith(prefix));
I find that they all take a long time; Dictionary is the fastest and SortedDictionary the slowest.
Then I tried a Trie implementation from http://www.codeproject.com/Articles/640998/NET-Data-Structures-for-Prefix-String-Search-and-S which is an order of magnitude faster, i.e. milliseconds instead of seconds.
So my question is: is there no .NET collection I can use for this requirement? I would have assumed that this would be a common requirement.
My basic test:
class Program
{
static readonly Dictionary<string, object> dictionary = new Dictionary<string, object>();
static Trie<object> trie = new Trie<object>();
static void Main(string[] args)
{
var random = new Random();
for (var i = 0; i < 100000; i++)
{
var randomstring = RandomString(random, 7);
dictionary.Add(randomstring, null);
trie.Add(randomstring, null);
}
var lookups = new string[10000];
for (var i = 0; i < lookups.Length; i++)
{
lookups[i] = RandomString(random, 3);
}
// compare searching
var sw = new Stopwatch();
sw.Start();
foreach (var lookup in lookups)
{
var exists = dictionary.Any(k => k.Key.StartsWith(lookup));
}
sw.Stop();
Console.WriteLine("dictionary.Any(k => k.Key.StartsWith(randomstring)) took : {0} ms", sw.ElapsedMilliseconds);
// test other collections
sw.Restart();
foreach (var lookup in lookups)
{
var exists = trie.Retrieve(lookup).Any();
}
sw.Stop();
Console.WriteLine("trie.Retrieve(lookup) took : {0} ms", sw.ElapsedMilliseconds);
Console.ReadKey();
}
public static string RandomString(Random random,int length)
{
const string chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
return new string(Enumerable.Repeat(chars, length)
.Select(s => s[random.Next(s.Length)]).ToArray());
}
}
Results:
dictionary.Any(k => k.Key.StartsWith(randomstring)) took : 80990 ms
trie.Retrieve(lookup) took : 115 ms

If sorting matters, try to use a SortedList instead of SortedDictionary. They both have the same functionality but they are implemented differently. SortedList is faster when you want to enumerate the elements (and you can access the elements by index), and SortedDictionary is faster if there are a lot of elements and you want to insert a new element in the middle of the collection.
So try this:
var sortedList = new SortedList<string, object>();
// populate list...
sortedList.Keys.Any(k => k.StartsWith(lookup));
If you have a million elements, but you don't want to re-order them once the dictionary is populated, you can combine their advantages: populate a SortedDictionary with the random elements, and then create a new List<KeyValuePair<,>> or SortedList<,> from that.
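Because a sorted collection keeps keys that share a prefix next to each other, a range query can stand in for the linear `Any(...)` scan. The sketch below uses `SortedSet<string>.GetViewBetween` with ordinal comparison; `PrefixIndex`, `Build` and `WithPrefix` are hypothetical names, and the upper bound `prefix + '\uffff'` assumes no key contains the character U+FFFF right after the prefix. How fast the view is depends on the runtime, so treat this as an idea to benchmark rather than a known winner over the trie.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of the original answer.
public static class PrefixIndex
{
    // Ordinal comparison makes "starts with prefix" a contiguous key range.
    public static SortedSet<string> Build(IEnumerable<string> keys)
    {
        return new SortedSet<string>(keys, StringComparer.Ordinal);
    }

    // Returns a view over the keys between prefix and prefix + U+FFFF (inclusive).
    // Assumption: no key has U+FFFF immediately after the prefix.
    public static SortedSet<string> WithPrefix(SortedSet<string> index, string prefix)
    {
        return index.GetViewBetween(prefix, prefix + char.MaxValue);
    }
}
```

A prefix existence check then becomes `PrefixIndex.WithPrefix(index, lookup).Count > 0`.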

After a little testing I found something close enough using a binary search. The only downside is that the keys have to be sorted first (a to z). The bigger the list, the slower it gets, so a ternary search is still the fastest option you are likely to find.
Method (credit should go to @Guffa):
public static int BinarySearchStartsWith(List<string> words, string prefix, int min, int max)
{
while (max >= min)
{
var mid = (min + max) / 2;
var comp = string.CompareOrdinal(words[mid], 0, prefix, 0, prefix.Length); // no Substring, so a word shorter than the prefix cannot throw
if (comp >= 0)
{
if (comp > 0)
max = mid - 1;
else
return mid;
}
else
min = mid + 1;
}
return -1;
}
and test implementation
var keysToList = dictionary.Keys.OrderBy(q => q, StringComparer.Ordinal).ToList(); // sort order must match CompareOrdinal
sw = new Stopwatch();
sw.Start();
foreach (var lookup in lookups)
{
bool exist = BinarySearchStartsWith(keysToList, lookup, 0, keysToList.Count - 1) != -1;
}
sw.Stop();

If you can sort the keys once and then use them repeatedly to look up the prefixes, then you can use a binary search to speed things up.
To get the maximum performance, I would use two arrays, one for keys and one for values, and use the overload of Array.Sort() which sorts a main and an adjunct array.
Then you can use Array.BinarySearch() to search for the nearest key which starts with a given prefix, and return the indices for those that match.
When I try it, it seems to only take around 0.003ms per check if there are one or more matching prefixes.
Here's a runnable console application to demonstrate (remember to do your timings on a RELEASE build):
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Diagnostics;
using System.Linq;
namespace Demo
{
class Program
{
public static void Main()
{
int count = 1000000;
object obj = new object();
var keys = new string[count];
var values = new object[count];
for (int i = 0; i < count; ++i)
{
keys[i] = randomString(5, 16);
values[i] = obj;
}
// Sort key array and value arrays in tandem to keep the relation between keys and values.
Array.Sort(keys, values);
// Now you can use StartsWith() to return the indices of strings in keys[]
// that start with a specific string. The indices can be used to look up the
// corresponding values in values[].
Console.WriteLine("Count of ZZ = " + StartsWith(keys, "ZZ").Count());
// Test a load of times with 1000 random prefixes.
var prefixes = new string[1000];
for (int i = 0; i < 1000; ++i)
prefixes[i] = randomString(1, 8);
var sw = Stopwatch.StartNew();
for (int i = 0; i < 1000; ++i)
for (int j = 0; j < 1000; ++j)
StartsWith(keys, prefixes[j]).Any();
Console.WriteLine("1,000,000 checks took {0} for {1} ms each.", sw.Elapsed, sw.ElapsedMilliseconds/1000000.0);
}
public static IEnumerable<int> StartsWith(string[] array, string prefix)
{
int index = Array.BinarySearch(array, prefix);
if (index < 0)
index = ~index;
// We might have landed partway through a set of matches, so find the first match.
if (index < array.Length)
while ((index > 0) && array[index-1].StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
--index;
while ((index < array.Length) && array[index].StartsWith(prefix, StringComparison.OrdinalIgnoreCase))
yield return index++;
}
static string randomString(int minLength, int maxLength)
{
int length = rng.Next(minLength, maxLength);
const string CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
return new string(Enumerable.Repeat(CHARS, length)
.Select(s => s[rng.Next(s.Length)]).ToArray());
}
static readonly Random rng = new Random(12345);
}
}

Sort a List<StringBuilder>

Requirement: Iterate through a sorted list of strings, adding a char at the beginning of each string, then re-sorting. This may need to be done a few thousand times. I tried using a regular List of strings but, as expected, the process was way too slow.
I was going to try a List of StringBuilders but there is no direct way to sort the list. Any workarounds come to mind?

You've stated you can't sort a List<StringBuilder>; however, you can if you supply your own sort comparison:
List<StringBuilder> strings = new List<StringBuilder>();
// ...
strings.Sort((s1, s2) => s1.ToString().CompareTo(s2.ToString()));
The problem here, as @phoog notes, is that in order to do so it allocates a lot of extra strings and isn't very efficient. The sort he provides is better. What we can do to figure out which approach is better is supply a test. You can see the fiddle here: http://dotnetfiddle.net/Px4fys
The fiddle uses very few strings and very few iterations because it's in a fiddle and there's a memory limit. If you paste this into a console app and run in Release you'll find there are huge differences. As @phoog also suggests, LinkedList<char> wins hands-down. StringBuilder is the slowest.
If we bump up the values and run in Release mode:
const int NumStrings= 1000;
const int NumIterations= 1500;
We'll find the results:
List<StringBuilder> - Elapsed Milliseconds: 27,678
List<string> - Elapsed Milliseconds: 2,932
LinkedList<char> - Elapsed Milliseconds: 912
EDIT 2: When I bumped both values up to 3000 and 3000
List<StringBuilder> - Elapsed Milliseconds: // Had to comment out - was taking several minutes
List<string> - Elapsed Milliseconds: 45,928
LinkedList<char> - Elapsed Milliseconds: 6,823

The string builders will be a bit quicker than strings, but still slow, since you have to copy the entire buffer to add a character at the beginning.
You can create a custom comparison method (or comparer object if you prefer) and pass it to the List.Sort method:
int CompareStringBuilders(StringBuilder a, StringBuilder b)
{
for (int i = 0; i < a.Length && i < b.Length; i++)
{
var comparison = a[i].CompareTo(b[i]);
if (comparison != 0)
return comparison;
}
return a.Length.CompareTo(b.Length);
}
Invoke it like this:
var list = new List<StringBuilder>();
//...
list.Sort(CompareStringBuilders);
You would probably do better to look for a different solution to your problem, however.
Linked lists offer quick prepending, so how about using LinkedList<char>? This might not work if you need other StringBuilder functions, of course.
StringBuilder was rewritten for .NET 4, so I've struck out my earlier comments about slow prepending of characters. If performance is an issue, you should test to see where the problems actually lie.
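To make the LinkedList<char> suggestion concrete, here is a minimal sketch. The names `LinkedListStrings`, `Compare` and `PrependAndSort` are invented for this illustration, and it assumes one character is supplied per list. Prepending is a single `AddFirst` call, and the comparison walks both lists node by node, just as the StringBuilder comparer above walks indexes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of the original answers.
public static class LinkedListStrings
{
    // Lexicographic comparison by char value; a shorter list sorts first on a tie.
    public static int Compare(LinkedList<char> a, LinkedList<char> b)
    {
        LinkedListNode<char> x = a.First, y = b.First;
        while (x != null && y != null)
        {
            int comparison = x.Value.CompareTo(y.Value);
            if (comparison != 0)
                return comparison;
            x = x.Next;
            y = y.Next;
        }
        if (x == null && y == null)
            return 0;
        return x == null ? -1 : 1;
    }

    // Prepends prefixes[i] to items[i], then re-sorts the whole list.
    public static void PrependAndSort(List<LinkedList<char>> items, IList<char> prefixes)
    {
        for (int i = 0; i < items.Count; i++)
            items[i].AddFirst(prefixes[i]); // no buffer copy, unlike inserting at index 0
        items.Sort(Compare);
    }
}
```

Each node carries its own object overhead, so memory use is much higher than for a string or StringBuilder of the same length.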

Thanks to all for the suggestions posted. I checked these, and I have to say that I'm astonished that LinkedList works incredibly well, except for memory usage.
Another surprise is the slow sorting speed of the StringBuilder list. It works quickly as expected for the char insert phase. But the posted benchmarks above reflect what I've found: StringBuilder sorts very slowly for some reason. Painfully slow.
List of strings sorts faster. But counter to intuition, List of LinkedList sorts very fast. I have no idea how navigating a linked list could possibly be faster than simple indexing of a buffer (as in strings and StringBuilder), but it is. I would never have thought to try LinkedList. Compliments to McAden for the insight!
But unfortunately, LinkedList runs the system out of RAM. So, back to the drawing board.

Sort the StringBuilders as described in Phoog's answer, but keep the strings in reverse order in the StringBuilder instances - this way, you can optimize the "prepending" of each new character by appending it to the end of the StringBuilder's current value:
Update: with test program
class Program
{
static readonly Random _rng = new Random();
static void Main(string[] args)
{
int stringCount = 2500;
int initialStringSize = 100;
int maxRng = 4;
int numberOfPrepends = 2500;
int iterations = 5;
Console.WriteLine( "String Count: {0}; # of Prepends: {1}; # of Unique Chars: {2}", stringCount, numberOfPrepends, maxRng );
var startingStrings = new List<string>();
for( int i = 0; i < stringCount; ++i )
{
var sb = new StringBuilder( initialStringSize );
for( int j = 0; j < initialStringSize; ++j )
{
sb.Append( _rng.Next( 0, maxRng ) );
}
startingStrings.Add( sb.ToString() );
}
for( int i = 0; i < iterations; ++i )
{
TestUsingStringBuilderAppendWithReversedStrings( startingStrings, maxRng, numberOfPrepends );
TestUsingStringBuilderPrepend( startingStrings, maxRng, numberOfPrepends );
}
var input = Console.ReadLine();
}
private static void TestUsingStringBuilderAppendWithReversedStrings( IEnumerable<string> startingStrings, int maxRng, int numberOfPrepends )
{
var builders = new List<StringBuilder>();
var start = DateTime.Now;
foreach( var str in startingStrings )
{
builders.Add( new StringBuilder( str ).Reverse() );
}
for( int i = 0; i < numberOfPrepends; ++i )
{
foreach( var sb in builders )
{
sb.Append( _rng.Next( 0, maxRng ) );
}
builders.Sort( ( x, y ) =>
{
var comparison = 0;
var xOffset = x.Length;
var yOffset = y.Length;
while( 0 < xOffset && 0 < yOffset && 0 == comparison )
{
--xOffset;
--yOffset;
comparison = x[ xOffset ].CompareTo( y[ yOffset ] );
}
if( 0 != comparison )
{
return comparison;
}
return xOffset.CompareTo( yOffset );
} );
}
builders.ForEach( sb => sb.Reverse() );
var end = DateTime.Now;
Console.WriteLine( "StringBuilder Reverse Append - Total Milliseconds: {0}", end.Subtract( start ).TotalMilliseconds );
}
private static void TestUsingStringBuilderPrepend( IEnumerable<string> startingStrings, int maxRng, int numberOfPrepends )
{
var builders = new List<StringBuilder>();
var start = DateTime.Now;
foreach( var str in startingStrings )
{
builders.Add( new StringBuilder( str ) );
}
for( int i = 0; i < numberOfPrepends; ++i )
{
foreach( var sb in builders )
{
sb.Insert( 0, _rng.Next( 0, maxRng ) );
}
builders.Sort( ( x, y ) =>
{
var comparison = 0;
for( int offset = 0; offset < x.Length && offset < y.Length && 0 == comparison; ++offset )
{
comparison = x[ offset ].CompareTo( y[ offset ] );
}
if( 0 != comparison )
{
return comparison;
}
return x.Length.CompareTo( y.Length );
} );
}
var end = DateTime.Now;
Console.WriteLine( "StringBuilder Prepend - Total Milliseconds: {0}", end.Subtract( start ).TotalMilliseconds );
}
}
public static class Extensions
{
public static StringBuilder Reverse( this StringBuilder stringBuilder )
{
var endOffset = stringBuilder.Length - 1;
char a;
for( int beginOffset = 0; beginOffset < endOffset; ++beginOffset, --endOffset )
{
a = stringBuilder[ beginOffset ];
stringBuilder[ beginOffset ] = stringBuilder[ endOffset ];
stringBuilder[ endOffset ] = a;
}
return stringBuilder;
}
}
results:
2500 strings initially at 100 characters, 2500 prepends:

C# - What is the most efficient way to generate 10 character random alphanumeric string in?

What is the most efficient way to generate a 10-character random alphanumeric string in C#?

http://msdn.microsoft.com/en-us/library/system.io.path.getrandomfilename.aspx
string randomName = Path.GetRandomFileName();
randomName = randomName.Replace(".", string.Empty);
// take substring...
For example:
?Path.GetRandomFileName();
"rlwi1uew.5ha"
?Path.GetRandomFileName();
"gcwhcoiy.vxl"
?Path.GetRandomFileName();
"2pzyljzf.k41"
?Path.GetRandomFileName();
"kyjzcccf.d3c"
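The "take substring" step above can be sketched as follows. The class and method names are hypothetical, chosen only for illustration. The sketch relies on Path.GetRandomFileName returning an 8.3-style name, which leaves 11 characters once the dot is removed.

```csharp
using System;
using System.IO;

// Hypothetical helper class, not part of the original answer.
public static class RandomFileNameExample
{
    // Returns the first 10 characters of a random file name
    // after removing the dot between name and extension.
    public static string NextId()
    {
        // GetRandomFileName returns an 8.3-style name such as "rlwi1uew.5ha".
        string name = Path.GetRandomFileName().Replace(".", string.Empty);

        // 11 characters remain once the dot is removed; keep the first 10.
        return name.Substring(0, 10);
    }
}
```

Calling RandomFileNameExample.NextId() yields a 10-character string of lowercase letters and digits, so it does not cover upper-case letters.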

var buffer = new byte[5];
new Random().NextBytes(buffer);
Console.WriteLine(string.Join("", buffer.Select(b => b.ToString("X2"))));

var buffer = new byte[15];
new Random().NextBytes(buffer);
string rnd = Convert.ToBase64String (buffer).Substring (10);
The only problem I see with this is that it also uses + and /, so you'll have to replace them with something, too.
string rnd = Convert.ToBase64String (buffer)
.Substring (10)
.Replace ('/', '0')
.Replace ('+', '1');

Guid is pretty fast
Guid.NewGuid().ToString("N").Substring(0, 10);
From MSDN
A GUID is a 128-bit integer (16 bytes) that can be used across all computers and networks wherever a unique identifier is required. Such an identifier has a very low probability of being duplicated.
It might not be unique across a billion requests, since you keep only 10 of its characters. Also note that it produces only hexadecimal digits (0 to 9 and a to f), not the full alphanumeric range.
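Because the GUID approach yields only hexadecimal digits, here is a minimal sketch of one way to draw from the full alphanumeric range instead. It is not taken from any answer above: the class name, the alphabet constant, and the method signature are all assumptions made for illustration. It uses System.Random, which is not cryptographically secure, and takes the Random instance as a parameter so that one instance can be reused across calls.

```csharp
using System;

// Hypothetical helper class, introduced only for illustration.
public static class AlphanumericExample
{
    // Assumed alphabet: upper-case letters, lower-case letters, digits.
    private const string Alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

    // Builds a string of the requested length by picking each character
    // at a random index of the alphabet.
    public static string Next(Random rng, int length)
    {
        var chars = new char[length];
        for (int i = 0; i < length; ++i)
        {
            chars[i] = Alphabet[rng.Next(Alphabet.Length)];
        }
        return new string(chars);
    }
}
```

A call such as AlphanumericExample.Next(rng, 10) with a shared rng avoids creating a new Random per string. On .NET Framework, instances created in quick succession can end up with the same time-based seed and therefore the same output.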
Performance
Tested using
public static void Test(Action a)
{
Stopwatch sw = new Stopwatch();
sw.Start();
for (var i = 0; i < 10000; ++i)
a();
sw.Stop();
Console.WriteLine("ms: {0} ticks: {1}", sw.ElapsedMilliseconds, sw.ElapsedTicks);
}
Guid method
Test(() =>
{
var xxx = Guid.NewGuid().ToString("N").Substring(0, 10);
});
// Result
// 6 ms
// 17273 ticks
Bytes method
Test(() =>
{
var buffer = new byte[5];
new Random().NextBytes(buffer);
var x = string.Join("", buffer.Select(b => b.ToString("X2")));
});
// Result:
// 57 ms
// 165642 ticks
It is up to you to pick between high speed and high reliability.


Linq Optimization for Count And Group By - c#

Why not just int[] freq = new int[256]; foreach (byte b in ltext) freq[b]++; ?
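The array suggestion can be extended to give the descending-by-frequency ordering the question asked for. This is a sketch under assumptions: the class and method names are hypothetical, and it returns a list rather than a Dictionary because Dictionary does not guarantee any enumeration order. It uses long counters so that very large inputs cannot overflow a count.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class, introduced only for illustration.
public static class ByteFrequencyExample
{
    // Counts every byte value with a 256-slot array, then sorts only
    // the (at most 256) non-zero counters in descending order.
    public static List<KeyValuePair<byte, long>> Count(byte[] data)
    {
        var freq = new long[256];
        foreach (byte b in data)
        {
            freq[b]++;
        }

        return Enumerable.Range(0, 256)
            .Where(i => freq[i] > 0)
            .OrderByDescending(i => freq[i])
            .Select(i => new KeyValuePair<byte, long>((byte)i, freq[i]))
            .ToList();
    }
}
```

The LINQ part here touches at most 256 items, so the cost is dominated by the single pass over the byte array.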
