Ways to improve string memory allocation

Ways to improve string memory allocation - c#

This question is more theoretical than practical, but still.
I've been looking for a chance to improve the following code from the string memory allocation standpoint:
/* Output for n = 3:
*
* ' #'
* ' ##'
* '###'
*
*/
public static string[] staircase(int n) {
string[] result = new string[n];
for(var i = 0; i < result.Length; i++) {
var spaces = string.Empty.PadLeft(n - i - 1, ' ');
var sharpes = string.Empty.PadRight(i + 1, '#');
result[i] = spaces + sharpes;
}
return result;
}
PadHelper is the method, that is eventually called under the hood twice per iteration.
So, correct me if I'm wrong, but it seems like memory is allocated at least 3 times per iteration.
Any code improvements will be highly appreciated.

how about:
result[i] = new string('#',i).PadLeft(n)
?
Note that this still allocates two strings internally, but I honestly don't see that as a problem. The garbage collector will take care of it for you.

StringBuilder is always an answer when it comes to string allocations; I'm sure you know that so apparently you want something else. Well, since your strings are all the same length, you can declare a single char[] array, populate it every time (only requires changing one array element on each iteration) and then use the string(char[]) constructor:
public static string[] staircase(int n)
{
char[] buf = new char[n];
string[] result = new string[n];
for (var i = 0; i < n - 1; i++)
{
buf[i] = ' ';
}
for (var i = 0; i < n; i++)
{
buf[n - i - 1] = '#';
result[i] = new string(buf);
}
return result;
}

You can save on both allocations and speed by starting with a string that contains all the Spaces and all the Sharpes you're ever going to need, and then taking substrings from that, as follows:
public string[] Staircase2()
{
string allChars = new string(' ', n - 1) + new string('#', n); // n-1 spaces + n sharpes
string[] result = new string[n];
for (var i = 0; i < result.Length; i++)
result[i] = allChars.Substring(i, n);
return result;
}
I used BenchmarkDotNet to compare Staircase1 (your original approach) with Staircase2 (my approach above) from n=2 upto n=8, see the results below.
It shows that Staircase2 is always faster (see the Mean column), and it allocates fewer bytes starting from n=3.
| Method | n | Mean | Error | StdDev | Allocated |
|----------- |-- |------------:|-----------:|-----------:|----------:|
| Staircase1 | 2 | 229.36 ns | 4.3320 ns | 4.0522 ns | 92 B |
| Staircase2 | 2 | 92.00 ns | 0.7200 ns | 0.6735 ns | 116 B |
| Staircase1 | 3 | 375.06 ns | 3.3043 ns | 3.0908 ns | 156 B |
| Staircase2 | 3 | 114.12 ns | 2.8933 ns | 3.2159 ns | 148 B |
| Staircase1 | 4 | 507.32 ns | 3.8995 ns | 3.2562 ns | 236 B |
| Staircase2 | 4 | 142.78 ns | 1.4575 ns | 1.3634 ns | 196 B |
| Staircase1 | 5 | 650.03 ns | 15.1515 ns | 25.7284 ns | 312 B |
| Staircase2 | 5 | 169.25 ns | 1.9076 ns | 1.6911 ns | 232 B |
| Staircase1 | 6 | 785.75 ns | 16.9353 ns | 15.8413 ns | 412 B |
| Staircase2 | 6 | 195.91 ns | 2.9852 ns | 2.4928 ns | 292 B |
| Staircase1 | 7 | 919.15 ns | 11.4145 ns | 10.6771 ns | 500 B |
| Staircase2 | 7 | 237.55 ns | 4.6380 ns | 4.9627 ns | 332 B |
| Staircase1 | 8 | 1,075.66 ns | 26.7013 ns | 40.7756 ns | 620 B |
| Staircase2 | 8 | 255.50 ns | 2.6894 ns | 2.3841 ns | 404 B |
This doesn't mean that Staircase2 is the absolute best possible, but certainly there is a way that is better than the original.

You can project your desired results using the Linq Select method. For example, something like this:
public static string[] staircase(int n) {
return Enumerable.Range(1, n).Select(i => new string('#', i).PadLeft(n)).ToArray();
}
Alternate approach using an int array:
public static string[] staircase(int n) {
return (new int[n]).Select((x,i) => new string('#', i+1).PadLeft(n)).ToArray();
}
HTH

Related

What causes the difference of different copy method of different array length in C#?

In C#, there are some different ways to copy the elements of an array to another. To the best of my knowledge, they are "For" loop, Array.CopyTo, Span<T>.CopyTo, T[].CopyTo and Buffer.BlockCopy.
Since looping to copy the elements is always the slowest way, I skip it and run benchmark test for the other four methods. However, it seems that the speed of them are related with the length of the array, which really confused me.
My code of benchmark test is shown below. My experiment environment is Windows 11, .NET 6, Intel 12700 CPU, 64bits, using "BenchmarkDotnet" as the benchmark test framework.
public class UnitTest1
{
static readonly int times = 1000;
static readonly int arrayLength = 8;
int[] src = GetRandomArray(arrayLength);
int[] dst = new int[arrayLength];
public static int[] GetRandomArray(int length)
{
int[] array = new int[length];
for (int i = 0; i < length; i++)
{
array[i] = new Random(DateTime.Now.Millisecond).Next(int.MinValue, int.MaxValue);
}
System.Threading.Thread.Sleep(2000);
return array;
}
[Benchmark]
public void TestArrayCopy()
{
for (var j = 0; j < times; j++)
{
src.CopyTo(dst, 0);
}
}
[Benchmark]
public void TestSingleSpanCopy()
{
var dstSpan = dst.AsSpan();
for (var j = 0; j < times; j++)
{
src.CopyTo(dstSpan);
}
}
[Benchmark]
public void TestDoubleSpanCopy()
{
var srcSpan = src.AsSpan();
var dstSpan = dst.AsSpan();
for (var j = 0; j < times; j++)
{
srcSpan.CopyTo(dstSpan);
}
}
[Benchmark]
public void BufferCopy()
{
for (var j = 0; j < times; j++)
{
System.Buffer.BlockCopy(src, 0, dst, 0, sizeof(int) * src.Length);
}
}
}
Here are the test results.
times = 1000, arrayLength = 8
| Method | Mean | Error | StdDev |
|------------------- |---------:|----------:|----------:|
| TestArrayCopy | 3.061 us | 0.0370 us | 0.0543 us |
| TestSingleSpanCopy | 1.297 us | 0.0041 us | 0.0038 us |
| TestDoubleSpanCopy | 1.113 us | 0.0190 us | 0.0203 us |
| BufferCopy | 7.162 us | 0.1250 us | 0.1044 us |
times = 1000, arrayLength = 16
| Method | Mean | Error | StdDev |
|------------------- |---------:|----------:|----------:|
| TestArrayCopy | 3.426 us | 0.0677 us | 0.0806 us |
| TestSingleSpanCopy | 1.609 us | 0.0264 us | 0.0206 us |
| TestDoubleSpanCopy | 1.478 us | 0.0228 us | 0.0202 us |
| BufferCopy | 7.465 us | 0.0866 us | 0.0723 us |
times = 1000, arrayLength = 32
| Method | Mean | Error | StdDev | Median |
|------------------- |----------:|----------:|----------:|----------:|
| TestArrayCopy | 4.063 us | 0.0417 us | 0.0390 us | 4.076 us |
| TestSingleSpanCopy | 4.115 us | 0.3552 us | 1.0473 us | 4.334 us |
| TestDoubleSpanCopy | 3.576 us | 0.3391 us | 0.9998 us | 3.601 us |
| BufferCopy | 12.922 us | 0.7339 us | 2.1640 us | 13.814 us |
times = 1000, arrayLength = 128
| Method | Mean | Error | StdDev | Median |
|------------------- |----------:|----------:|----------:|----------:|
| TestArrayCopy | 7.865 us | 0.0919 us | 0.0815 us | 7.842 us |
| TestSingleSpanCopy | 7.036 us | 0.2694 us | 0.7900 us | 7.256 us |
| TestDoubleSpanCopy | 7.351 us | 0.0914 us | 0.0855 us | 7.382 us |
| BufferCopy | 10.955 us | 0.1157 us | 0.1083 us | 10.947 us |
times = 1000, arrayLength = 1024
| Method | Mean | Error | StdDev | Median |
|------------------- |---------:|---------:|----------:|---------:|
| TestArrayCopy | 45.16 us | 3.619 us | 10.670 us | 48.95 us |
| TestSingleSpanCopy | 36.85 us | 3.608 us | 10.638 us | 34.77 us |
| TestDoubleSpanCopy | 38.88 us | 3.378 us | 9.960 us | 39.91 us |
| BufferCopy | 48.83 us | 4.352 us | 12.833 us | 53.65 us |
times = 1000, arrayLength = 16384
| Method | Mean | Error | StdDev |
|------------------- |---------:|----------:|----------:|
| TestArrayCopy | 1.417 ms | 0.1096 ms | 0.3233 ms |
| TestSingleSpanCopy | 1.487 ms | 0.1012 ms | 0.2983 ms |
| TestDoubleSpanCopy | 1.438 ms | 0.1115 ms | 0.3287 ms |
| BufferCopy | 1.423 ms | 0.1147 ms | 0.3383 ms |
times = 100, arrayLength = 65536
| Method | Mean | Error | StdDev |
|------------------- |---------:|---------:|----------:|
| TestArrayCopy | 630.9 us | 47.01 us | 138.61 us |
| TestSingleSpanCopy | 629.5 us | 46.83 us | 138.08 us |
| TestDoubleSpanCopy | 655.4 us | 47.23 us | 139.25 us |
| BufferCopy | 419.0 us | 3.31 us | 2.93 us |
When the arrayLength is 8 or 16, the Span<T>.CopyTo() is the fastest. When the arrayLength is 32 or 128, the first three way are almost the same and all faster than Buffer.BlockCopy.Ehen the arrayLength is 1024, however, the Span<T>.CopyTo and T[].CopyTo are again faster than the other two ways. When the arrayLength is 16384, these four ways are almost the same. But when the arrayLength is 65536, the Buffer.BlockCopy is the fastest! Besides, the Span<T>.CopyTo here is a bit slower than the first two ways.
I really can't understand the results. At first I guess it's the cpu cache that matters. However, the L1 Cache of my CPU is 960KB, which is larger than the space of the array of any test case. Maybe it's the different implementation that causes this?
I will appreciate it if you are willing to explain it for me or discuss with me. I will also think about it and update the question if I get an idea.
As #Ralf mentioned, the source and destination of the array in each time are all the same, which could impact on the results. I modified my code and tried the test again, as is shown below. To avoid the time consume, I just declare a new array each time instead of randomize it manually.
using System.Buffers;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
public class Program
{
public static void Main(string[] args)
{
var summary = BenchmarkRunner.Run(typeof(Program).Assembly);
Console.WriteLine(summary);
}
}
public class UnitTest1
{
static readonly int times = 1000;
static readonly int arrayLength = 8;
public static int[] GetRandomArray(int length)
{
int[] array = new int[length];
//for (int i = 0; i < length; i++)
//{
// array[i] = new Random(DateTime.Now.Millisecond).Next(int.MinValue, int.MaxValue);
//}
return array;
}
[Benchmark]
public void ArrayCopy()
{
for (var j = 0; j < times; j++)
{
int[] src = GetRandomArray(arrayLength);
int[] dst = new int[arrayLength];
src.CopyTo(dst, 0);
}
}
[Benchmark]
public void SingleSpanCopy()
{
for (var j = 0; j < times; j++)
{
int[] src = GetRandomArray(arrayLength);
int[] dst = new int[arrayLength];
src.CopyTo(dst.AsSpan());
}
}
[Benchmark]
public void DoubleSpanCopy()
{
for (var j = 0; j < times; j++)
{
int[] src = GetRandomArray(arrayLength);
int[] dst = new int[arrayLength];
src.AsSpan().CopyTo(dst.AsSpan());
}
}
[Benchmark]
public void BufferCopy()
{
for (var j = 0; j < times; j++)
{
int[] src = GetRandomArray(arrayLength);
int[] dst = new int[arrayLength];
System.Buffer.BlockCopy(src, 0, dst, 0, sizeof(int) * src.Length);
}
}
}
times = 1000, arrayLength = 8
| Method | Mean | Error | StdDev | Median |
|--------------- |----------:|----------:|----------:|----------:|
| ArrayCopy | 8.843 us | 0.1762 us | 0.3040 us | 8.843 us |
| SingleSpanCopy | 6.864 us | 0.1366 us | 0.1519 us | 6.880 us |
| DoubleSpanCopy | 10.543 us | 0.9496 us | 2.7999 us | 10.689 us |
| BufferCopy | 21.270 us | 1.3477 us | 3.9738 us | 22.630 us |
times = 1000, arrayLength = 16
| Method | Mean | Error | StdDev | Median |
|--------------- |---------:|---------:|---------:|---------:|
| ArrayCopy | 16.94 us | 0.952 us | 2.808 us | 17.27 us |
| SingleSpanCopy | 12.54 us | 1.054 us | 3.109 us | 12.32 us |
| DoubleSpanCopy | 13.23 us | 0.930 us | 2.741 us | 13.25 us |
| BufferCopy | 23.43 us | 1.218 us | 3.591 us | 24.99 us |
times = 1000, arrayLength = 32
| Method | Mean | Error | StdDev | Median |
|--------------- |---------:|---------:|---------:|---------:|
| ArrayCopy | 24.35 us | 1.774 us | 5.229 us | 26.23 us |
| SingleSpanCopy | 20.64 us | 1.726 us | 5.089 us | 21.09 us |
| DoubleSpanCopy | 19.97 us | 1.915 us | 5.646 us | 20.08 us |
| BufferCopy | 26.24 us | 2.547 us | 7.511 us | 24.59 us |
times = 1000, arrayLength = 128
| Method | Mean | Error | StdDev |
|--------------- |---------:|---------:|---------:|
| ArrayCopy | 39.11 us | 0.529 us | 0.495 us |
| SingleSpanCopy | 39.14 us | 0.782 us | 1.070 us |
| DoubleSpanCopy | 40.24 us | 0.798 us | 1.398 us |
| BufferCopy | 42.20 us | 0.480 us | 0.426 us |
times = 1000, arrayLength = 1024
| Method | Mean | Error | StdDev |
|--------------- |---------:|--------:|--------:|
| ArrayCopy | 254.6 us | 4.92 us | 8.87 us |
| SingleSpanCopy | 241.4 us | 2.98 us | 2.78 us |
| DoubleSpanCopy | 243.7 us | 4.75 us | 4.66 us |
| BufferCopy | 243.0 us | 2.85 us | 2.66 us |
times = 1000, arayLength = 16384
| Method | Mean | Error | StdDev |
|--------------- |---------:|----------:|----------:|
| ArrayCopy | 4.325 ms | 0.0268 ms | 0.0250 ms |
| SingleSpanCopy | 4.300 ms | 0.0120 ms | 0.0112 ms |
| DoubleSpanCopy | 4.307 ms | 0.0348 ms | 0.0325 ms |
| BufferCopy | 4.293 ms | 0.0238 ms | 0.0222 ms |
times = 100, arrayLength = 65536
| Method | Mean | Error | StdDev | Median |
|--------------- |---------:|---------:|---------:|---------:|
| ArrayCopy | 153.6 ms | 1.46 ms | 1.29 ms | 153.1 ms |
| SingleSpanCopy | 213.4 ms | 8.78 ms | 25.87 ms | 218.2 ms |
| DoubleSpanCopy | 221.2 ms | 9.51 ms | 28.04 ms | 229.7 ms |
| BufferCopy | 203.1 ms | 10.92 ms | 32.18 ms | 205.6 ms |
#Ralf is right, there is indeed some differences. The most significant one is that when arrayLength = 65536, Array.Copy instead of Buffer.BlockCopy is the fastest.
But still, the results are very confusing..

Are you sure you can repeat the same benchmark and get the same results? Perhaps it was just a one time occurence, maybe caused by heat issues or another app taking processor time. When I try it on my machine, the values I get are more in line with what you'd expect.
It says Windows 10 for some reason, I'm using 11 too.
BenchmarkDotNet=v0.13.1, OS=Windows 10.0.22000
11th Gen Intel Core i9-11980HK 2.60GHz, 1 CPU, 16 logical and 8 physical cores
.NET SDK=6.0.201
[Host] : .NET 6.0.3 (6.0.322.12309), X64 RyuJIT
DefaultJob : .NET 6.0.3 (6.0.322.12309), X64 RyuJIT
| Method | Mean | Error | StdDev |
|------------------- |---------:|--------:|--------:|
| TestArrayCopy | 466.6 us | 0.69 us | 0.61 us |
| TestSingleSpanCopy | 444.7 us | 1.07 us | 1.00 us |
| TestDoubleSpanCopy | 443.8 us | 0.62 us | 0.52 us |
| BufferCopy | 447.1 us | 7.28 us | 6.08 us |
Just before posting this, I realized: Your CPU, 12700, has performance and efficiency cores. What if it ran most of the benchmark on efficiency cores and just so happened to run the BufferCopy part on performance cores? Can you try disabling your efficiency cores in BIOS?

Is there a way to reverse each 2 bytes of a file?

Basic summary of what I'm trying to achieve in this idea (As far as that I know there isn't exactly a function to do what I'm doing).
What I need to do is idea is to reverse every 2 bytes of a file. Reading a file bytes and reversing each 2 bytes.
Example: 05 04 82 FF
Output: 04 05 FF 82
I have some idea of it. But I know my attempts are WAY off.
To clarify.
I'm trying to take a bin file.
Read the bytes inside the file.
And reversing every 2 inside that file and close it.
If anyone can clear this complicated way up that would be great?

There are many approaches you could take to achieve this.
Here is a fairly efficient streaming approach with low allocations, using all the characters we know and love.... ArrayPool, Span<T>, and FileStream.
Note 1 : Adjust the buffer size to something that suits your hardware if needed.
Note 2 : This lacks basic sanity checks and fault tolerance, it will also die miserably if the size of the file isn't devisable by 2.
Given
private static ArrayPool<byte> pool = ArrayPool<byte>.Shared;
private const int BufferSize = 4096;
public static void Swap(string fileName)
{
var tempFileName = Path.ChangeExtension(fileName, "bob");
var buffer = pool.Rent(BufferSize);
try
{
var span = new Span<byte>(buffer);
using var oldFs = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.None, BufferSize);
using var newFs = new FileStream(tempFileName, FileMode.Create, FileAccess.Write, FileShare.None, BufferSize);
var size = 0;
while ((size = oldFs.Read(span)) > 0)
{
for (var i = 0; i < size; i += 2)
{
var temp = span[i];
span[i] = span[i + 1];
span[i + 1] = temp;
}
newFs.Write(span.Slice(0,size));
}
}
finally
{
pool.Return(buffer);
}
File.Move(tempFileName, fileName,true);
}
Test
File.WriteAllText(#"D:\Test1.txt","1234567890abcdef");
Swap(#"D:\Test1.txt");
var result = File.ReadAllText(#"D:\Test1.txt");
Console.WriteLine(result == "2143658709badcfe");
Output
True
Benchmarks
This was just a simple benchmark comparing the current solution with a simple array approach and pointers, varying the buffer size which you might do to increase HDD throughput. Technically it's only benchmarking one run through of a 10Mb data block, However the allocations would sky rocket if the methods got run more than once.
Environment
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.18363.1198 (1909/November2018Update/19H2)
AMD Ryzen 9 3900X, 1 CPU, 24 logical and 12 physical cores
.NET Core SDK=5.0.100
[Host] : .NET Core 5.0.0 (CoreCLR 5.0.20.51904, CoreFX 5.0.20.51904), X64 RyuJIT [AttachedDebugger]
.NET Core 5.0 : .NET Core 5.0.0 (CoreCLR 5.0.20.51904, CoreFX 5.0.20.51904), X64 RyuJIT
Job=.NET Core 5.0 Runtime=.NET Core 5.0
Results
| Method | N | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|------------- |------ |------------:|---------:|---------:|-------:|------:|------:|----------:|
| SwapSpanPool | 4094 | 25.89 ns | 0.078 ns | 0.069 ns | - | - | - | - |
| SwapArray | 4094 | 157.70 ns | 0.516 ns | 0.483 ns | 0.4923 | - | - | 4120 B |
| SwapUnsafe | 4094 | 154.71 ns | 0.293 ns | 0.274 ns | 0.4923 | - | - | 4120 B |
|------------- |------ |------------:|---------:|---------:|-------:|------:|------:|----------:|
| SwapSpanPool | 16384 | 25.82 ns | 0.048 ns | 0.043 ns | - | - | - | - |
| SwapArray | 16384 | 520.62 ns | 1.186 ns | 1.109 ns | 1.9569 | - | - | 16408 B |
| SwapUnsafe | 16384 | 518.82 ns | 1.361 ns | 1.273 ns | 1.9569 | - | - | 16408 B |
|------------- |------ |------------:|---------:|---------:|-------:|------:|------:|----------:|
| SwapSpanPool | 65536 | 25.81 ns | 0.049 ns | 0.043 ns | - | - | - | - |
| SwapArray | 65536 | 1,840.41 ns | 5.792 ns | 5.418 ns | 7.8106 | - | - | 65560 B |
| SwapUnsafe | 65536 | 1,846.57 ns | 3.715 ns | 3.475 ns | 7.8106 | - | - | 65560 B |
Setup
[MemoryDiagnoser]
[SimpleJob(RuntimeMoniker.NetCoreApp50)]
public class DumbTest
{
private static readonly ArrayPool<byte> pool = ArrayPool<byte>.Shared;
private MemoryStream ms1;
private MemoryStream ms2;
[Params(4094, 16384, 65536)] public int N;
[GlobalSetup]
public void Setup()
{
var data = new byte[10 * 1024 * 1024];
new Random(42).NextBytes(data);
ms1 = new MemoryStream(data);
ms2 = new MemoryStream(new byte[10 * 1024 * 1024]);
}
public void SpanPool()
{
var buffer = pool.Rent(N);
try
{
var span = new Span<byte>(buffer);
var size = 0;
while ((size = ms1.Read(span)) > 0)
{
for (var i = 0; i < size; i += 2)
{
var temp = span[i];
span[i] = span[i + 1];
span[i + 1] = temp;
}
ms2.Write(span.Slice(0, size));
}
}
finally
{
pool.Return(buffer);
}
}
public void Array()
{
var buffer = new byte[N];
var size = 0;
while ((size = ms1.Read(buffer)) > 0)
{
for (var i = 0; i < size; i += 2)
{
var temp = buffer[i];
buffer[i] = buffer[i + 1];
buffer[i + 1] = temp;
}
ms2.Write(buffer, 0, size);
}
}
public unsafe void Unsafe()
{
var buffer = new byte[N];
fixed (byte* p = buffer)
{
var size = 0;
while ((size = ms1.Read(buffer)) > 0)
{
for (var i = 0; i < size; i += 2)
{
var temp = buffer[i];
p[i] = p[i + 1];
p[i + 1] = temp;
}
ms2.Write(buffer, 0, size);
}
}
}
[Benchmark]
public void SwapSpanPool()
{
SpanPool();
}
[Benchmark]
public void SwapArray()
{
Array();
}
[Benchmark]
public void SwapUnsafe()
{
Unsafe();
}
}

Why is BitArray faster than array of bools?

I have this implementation of Sieve of Eratosthenes in C#:
public static BitArray Count()
{
const int halfSize = MaxSize / 2;
var mark = new BitArray(halfSize);
const int max = halfSize - 2;
var maxFactor = (int) Math.Sqrt(MaxSize + 1) / 2;
for (var i = 1; i <= maxFactor; ++i)
{
if (mark[i]) continue;
var p = i + i + 1;
var k = p * p >> 1;
for (; k <= max; k += p)
{
mark[k] = true;
}
}
return mark;
}
It gives results good enough for me. Nonetheless, I decided to test this algorithm using arrays of bools, expecting it to use more memory but be faster. And to my surprise that wasn't the result. Benchmark.NET on .NET Core 3.1 shows that bool array is more than two times slower than BitArray. Considering that latter uses more method calls and gives much longer asm (BitArray vs. bool array), how is it possible?
+---------------+----------+---------+---------+----------+----------+----------+-------+-----------+
| Method | Mean | Error | StdDev | Median | Min | Max | Op/s | Allocated |
+---------------+----------+---------+---------+----------+----------+----------+-------+-----------+
| SieveBool | 294.7 ms | 4.00 ms | 3.74 ms | 293.5 ms | 290.8 ms | 304.0 ms | 3.393 | 33.38 MB |
+---------------+----------+---------+---------+----------+----------+----------+-------+-----------+
| SieveBitArray | 130.2 ms | 1.03 ms | 0.97 ms | 130.3 ms | 128.5 ms | 132.1 ms | 7.680 | 4.17 MB |
+---------------+----------+---------+---------+----------+----------+----------+-------+-----------+
Results are similar when using fields instead of initializing arrays in methods (except there is no allocation of course).

Generic performance vs. Type performance

I have several methods where I need to convert data (arrays) from one type into the other.
Sometimes I can work with generics, sometimes not, because the type is loaded from a configuration after the object is created. Since I need many different type conversions I created a ArrayConvert class that is able to handle this for me. My data is extremely large and I have to do it very that of course I try as much as possible to prevent the conversion, but in my situation this is not always possible.
The ArrayConvert class looks like following:
public static class ArrayConvert
{
public delegate void Converter(Array a1, Array a2);
static readonly Dictionary<(Type srcType, Type tgtType), Converter> converters = new Dictionary<(Type fromType, Type toType), Converter>();
static ArrayConvert()
{
converters.Add((typeof(float), typeof(int)), FloatToInt);
}
[MethodImpl(MethodImplOptions.AggressiveInlining | MethodImplOptions.AggressiveOptimization)]
public static void FloatToInt(Array a1, Array a2)
{
int N = a1.Length;
var srcArray = (float[])a1;
var tgtArray = (int[])a2;
for (int i = 0; i < N; i++)
tgtArray[i] = (int)srcArray[i];
}
[MethodImpl(MethodImplOptions.AggressiveInlining | MethodImplOptions.AggressiveOptimization)]
public static void FloatToInt(float[] a1, int[] a2)
{
int N = a1.Length;
var srcArray = a1;
var tgtArray = a2;
for (int i = 0; i < N; i++)
tgtArray[i] = (int)srcArray[i];
}
[MethodImpl(MethodImplOptions.AggressiveInlining | MethodImplOptions.AggressiveOptimization)]
public static void Convert(Type srcType, Array srcArray, Type tgtType, Array tgtArray)
{
if (converters.TryGetValue((srcType, tgtType), out var converter))
{
converter(srcArray, tgtArray);
return;
}
throw new NotImplementedException();
}
[MethodImpl(MethodImplOptions.AggressiveInlining | MethodImplOptions.AggressiveOptimization)]
public static void ConvertGenericFast<TSrcType, TTgtType>(TSrcType[] srcArray, TTgtType[] tgtArray)
{
if (converters.TryGetValue((typeof(TSrcType), typeof(TTgtType)), out var converter))
{
converter(srcArray, tgtArray);
return;
}
throw new NotImplementedException();
}
[MethodImpl(MethodImplOptions.AggressiveInlining | MethodImplOptions.AggressiveOptimization)]
public static void ConvertGenericSlow<TSrcType, TTgtType>(TSrcType[] srcArray, TTgtType[] tgtArray)
{
Convert(typeof(TSrcType), srcArray, typeof(TTgtType), tgtArray);
}
}
When I know write a benchmark around this conversion methods, I can see pretty weird results.
Here's the benchmark class:
public class Tester
{
public readonly static int N = 100000;
public readonly static float[] SrcData;
public readonly static int[] TgtData;
static Tester()
{
SrcData = new float[N];
TgtData = new int[N];
for (int i = 0; i < N; i++)
{
SrcData[i] = i;
TgtData[i] = i;
}
}
[Benchmark]
public void ConvertWithType() => ArrayConvert.Convert(typeof(float), SrcData, typeof(int), TgtData);
[Benchmark]
public void ConvertWithGenericFast() => ArrayConvert.ConvertGenericFast<float, int>(SrcData, TgtData);
[Benchmark]
public void ConvertWithGenericSlow() => ArrayConvert.ConvertGenericSlow<float, int>(SrcData, TgtData);
[Benchmark]
public void ConvertWithKnownDirectWithType() => ArrayConvert.FloatToInt(SrcData, TgtData);
[Benchmark]
public void ConvertWithKnownDirectWithArray() => ArrayConvert.FloatToInt((Array)SrcData, (Array)TgtData);
}
Benchmarks:
Runtime = .NET Core 3.1.2 (CoreCLR 4.700.20.6602, CoreFX 4.700.20.6702), X64 RyuJIT; GC = Concurrent Workstation
Array Size: 10.000
| Method | Mean | Error | StdDev |
|-------------------------------- |---------:|----------:|----------:|
| ConvertWithType | 8.518 us | 0.0080 us | 0.0071 us |
| ConvertWithGenericFast | 8.684 us | 0.1163 us | 0.1088 us |
| ConvertWithGenericSlow | 8.482 us | 0.0028 us | 0.0023 us |
| ConvertWithKnownDirectWithType | 8.334 us | 0.0027 us | 0.0024 us |
| ConvertWithKnownDirectWithArray | 8.562 us | 0.0893 us | 0.0746 us |
Array Size: 100.000
| Method | Mean | Error | StdDev | Median |
|-------------------------------- |---------:|---------:|---------:|---------:|
| ConvertWithType | 68.40 us | 0.772 us | 1.372 us | 67.77 us |
| ConvertWithGenericFast | 68.03 us | 0.627 us | 0.770 us | 67.83 us |
| ConvertWithGenericSlow | 69.11 us | 0.944 us | 0.883 us | 68.90 us |
| ConvertWithKnownDirectWithType | 67.45 us | 0.689 us | 0.611 us | 67.34 us |
| ConvertWithKnownDirectWithArray | 67.24 us | 0.425 us | 0.398 us | 67.20 us |
Array Size: 1.000.000
| Method | Mean | Error | StdDev | Median |
|-------------------------------- |---------:|---------:|---------:|---------:|
| ConvertWithType | 693.9 us | 8.06 us | 7.54 us | 693.4 us |
| ConvertWithGenericFast | 800.2 us | 26.99 us | 79.58 us | 865.8 us |
| ConvertWithGenericSlow | 872.7 us | 6.27 us | 5.86 us | 870.1 us |
| ConvertWithKnownDirectWithType | 743.3 us | 24.66 us | 71.94 us | 704.1 us |
| ConvertWithKnownDirectWithArray | 870.9 us | 7.82 us | 7.32 us | 866.5 us |
Array Size: 10.000.000
| Method | Mean | Error | StdDev |
|-------------------------------- |----------:|----------:|----------:|
| ConvertWithType | 8.739 ms | 0.1120 ms | 0.0993 ms |
| ConvertWithGenericFast | 10.052 ms | 0.0918 ms | 0.0859 ms |
| ConvertWithGenericSlow | 10.015 ms | 0.0563 ms | 0.0439 ms |
| ConvertWithKnownDirectWithType | 10.070 ms | 0.0058 ms | 0.0045 ms |
| ConvertWithKnownDirectWithArray | 10.096 ms | 0.0996 ms | 0.0931 ms |
Why is the ConvertWithType always faster except the first two with 10.000 and 100.000 elements?
Why is the ConvertWithKnownDirectWithType not the fastest?
Why is there almost no difference between ConvertWithGenericFast and ConvertWithGenericSlow?
Why is there in 1.000.000 elements such a high standard deviation and error?
Furthermore, with Span and Memory I do not have a common "typeless" interface anymore as I had with array, since there is no common "base". So would there be a way to use the Span including the Array as the example above, or is there even a better and faster solution?

Looping for vs foreach difference in iterating dictionary c#

I have a below foreach loop which does the job. I am curious to know for my below case - is it better to use for loop instead of foreach loop for performance issues?
Since I read it that for loop is faster than foreach loop so I am kinda confuse as well.
foreach (KeyValuePair<string, StringValues> v in values)
{
string key = v.Key;
StringValues val = v.Value;
if (val.Count > 0)
{
if (!string.IsNullOrWhiteSpace(val[0]))
{
switch (key)
{
case ABC:
One = val[0];
break;
case PQR:
Two = val[0];
break;
//.. bunch of other case block here with similar stuff
}
}
}
}

The lack of any kind of indexer in the IDictionary<> interface due to dictionaries not having a defined ordering can make it difficult to iterate without the use of foreach/GetEnumerator(). Given...
Dictionary<int, int> dictionary = Enumerable.Range(0, 10).ToDictionary(i => i, i => -i);
...since you know the keys comprise a contiguous range of integers, you can use for to loop through all possible key values...
// This exploits the fact that we know keys from 0..9 exist in dictionary
for (int key = 0; key < dictionary.Count; key++)
{
int value = dictionary[key];
// ...
}
If you can't make that assumption, however, it becomes much trickier. You could iterate the Keys collection property to get each element's key...but that collection doesn't allow indexing, either, so you're back where you started with the foreach vs. for dilemma. If you insist on using for, though, one way to do so is by copying Keys to an array and then iterating that...
// Copy the Keys property to an array to allow indexing
int[] keys = new int[dictionary.Count];
dictionary.Keys.CopyTo(keys, 0);
// This makes no assumptions about the distribution of keys in dictionary
for (int index = 0; index < dictionary.Count; index++)
{
int key = keys[index];
int value = Source[key];
// ...
}
Of course, CopyTo() will enumerate Keys one complete time before you even have a chance to do so yourself, so that can only hurt performance.
If you are working with a fixed set of keys that's known ahead of time, or you don't mind having to maintain a separate collections of keys every time the dictionary's keys change, a slightly better way is to cache the keys in a structure that can be indexed...
int[] keyCache = Enumerable.Range(0, 10).ToArray();
// ...
// This retrieves known keys stored separately from dictionary
for (int index = 0; index < keyCache.Length; index++)
{
int key = keyCache[index];
int value = dictionary[key];
// ...
}
It might be tempting to use the LINQ ElementAt() method instead; after all, it's easy enough to use...
for (int index = 0; index < dictionary.Count; index++)
{
KeyValuePair<int, int> pair = dictionary.ElementAt(index);
// ...
}
This is very bad for performance, however. ElementAt() can only special-case for indexing when the input collection implements IList<>, which Dictionary<> does not nor does IDictionary<> inherit from it. Otherwise, for every index you try to retrieve it has to start from the beginning. Consider enumerating the entire 10-element dictionary defined above...
| Index requested | Elements enumerated | Total elements enumerated |
|:---------------:|:----------------------------:|:-------------------------:|
| 0 | 0 | 1 |
| 1 | 0, 1 | 3 |
| 2 | 0, 1, 2 | 6 |
| 3 | 0, 1, 2, 3 | 10 |
| 4 | 0, 1, 2, 3, 4 | 15 |
| 5 | 0, 1, 2, 3, 4, 5 | 21 |
| 6 | 0, 1, 2, 3, 4, 5, 6 | 28 |
| 7 | 0, 1, 2, 3, 4, 5, 6, 7 | 36 |
| 8 | 0, 1, 2, 3, 4, 5, 6, 7, 8 | 45 |
| 9 | 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 | 55 |
Add all that up and it will take 55 enumerations to step through a 10-element dictionary! So, in an effort to improve performance by eliminating foreach/GetEnumerator() this has only moved the GetEnumerator() call under the covers and made performance worse.
As for the actual performance differences of these approaches, here are the results I got...
// * Summary *
BenchmarkDotNet=v0.12.1, OS=Windows 10.0.18363.657 (1909/November2018Update/19H2)
Intel Core i7 CPU 860 2.80GHz (Nehalem), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.1.201
[Host] : .NET Core 3.1.3 (CoreCLR 4.700.20.11803, CoreFX 4.700.20.12001), X64 RyuJIT
.NET 4.8 : .NET Framework 4.8 (4.8.4121.0), X64 RyuJIT
.NET Core 3.1 : .NET Core 3.1.3 (CoreCLR 4.700.20.11803, CoreFX 4.700.20.12001), X64 RyuJIT
| Method | Job | Runtime | Size | Mean | Error | StdDev | Ratio | RatioSD |
|---------------------------------- |-------------- |-------------- |------- |--------------------:|------------------:|------------------:|----------:|--------:|
| GetEnumerator | .NET 4.8 | .NET 4.8 | 10 | 118.4 ns | 1.71 ns | 1.76 ns | 1.02 | 0.02 |
| ForEach | .NET 4.8 | .NET 4.8 | 10 | 116.4 ns | 1.44 ns | 1.28 ns | 1.00 | 0.00 |
| For_Indexer_ConsecutiveKeys | .NET 4.8 | .NET 4.8 | 10 | 147.6 ns | 2.96 ns | 3.17 ns | 1.26 | 0.02 |
| While_Indexer_ConsecutiveKeys | .NET 4.8 | .NET 4.8 | 10 | 149.2 ns | 1.72 ns | 1.61 ns | 1.28 | 0.02 |
| For_TryGetValue_ConsecutiveKeys | .NET 4.8 | .NET 4.8 | 10 | 154.5 ns | 1.16 ns | 0.97 ns | 1.33 | 0.01 |
| While_TryGetValue_ConsecutiveKeys | .NET 4.8 | .NET 4.8 | 10 | 160.8 ns | 1.93 ns | 1.71 ns | 1.38 | 0.01 |
| For_Indexer_CopyToKeys | .NET 4.8 | .NET 4.8 | 10 | 177.5 ns | 1.37 ns | 1.14 ns | 1.53 | 0.02 |
| While_Indexer_CopyToKeys | .NET 4.8 | .NET 4.8 | 10 | 185.6 ns | 3.69 ns | 4.80 ns | 1.59 | 0.05 |
| For_Indexer_CachedKeys | .NET 4.8 | .NET 4.8 | 10 | 154.5 ns | 2.83 ns | 2.64 ns | 1.33 | 0.03 |
| While_Indexer_CachedKeys | .NET 4.8 | .NET 4.8 | 10 | 155.3 ns | 2.35 ns | 2.08 ns | 1.33 | 0.02 |
| For_ElementAt | .NET 4.8 | .NET 4.8 | 10 | 1,009.2 ns | 8.61 ns | 7.19 ns | 8.67 | 0.12 |
| While_ElementAt | .NET 4.8 | .NET 4.8 | 10 | 1,140.9 ns | 14.36 ns | 13.43 ns | 9.81 | 0.16 |
| | | | | | | | | |
| GetEnumerator | .NET Core 3.1 | .NET Core 3.1 | 10 | 118.6 ns | 2.39 ns | 3.19 ns | 0.98 | 0.03 |
| ForEach | .NET Core 3.1 | .NET Core 3.1 | 10 | 120.3 ns | 1.28 ns | 1.14 ns | 1.00 | 0.00 |
| For_Indexer_ConsecutiveKeys | .NET Core 3.1 | .NET Core 3.1 | 10 | 126.1 ns | 0.67 ns | 0.56 ns | 1.05 | 0.01 |
| While_Indexer_ConsecutiveKeys | .NET Core 3.1 | .NET Core 3.1 | 10 | 135.5 ns | 2.28 ns | 2.02 ns | 1.13 | 0.02 |
| For_TryGetValue_ConsecutiveKeys | .NET Core 3.1 | .NET Core 3.1 | 10 | 131.0 ns | 2.41 ns | 2.25 ns | 1.09 | 0.02 |
| While_TryGetValue_ConsecutiveKeys | .NET Core 3.1 | .NET Core 3.1 | 10 | 133.9 ns | 1.42 ns | 1.19 ns | 1.11 | 0.01 |
| For_Indexer_CopyToKeys | .NET Core 3.1 | .NET Core 3.1 | 10 | 162.4 ns | 2.32 ns | 2.06 ns | 1.35 | 0.02 |
| While_Indexer_CopyToKeys | .NET Core 3.1 | .NET Core 3.1 | 10 | 166.3 ns | 1.29 ns | 1.21 ns | 1.38 | 0.02 |
| For_Indexer_CachedKeys | .NET Core 3.1 | .NET Core 3.1 | 10 | 136.0 ns | 1.27 ns | 1.19 ns | 1.13 | 0.02 |
| While_Indexer_CachedKeys | .NET Core 3.1 | .NET Core 3.1 | 10 | 142.3 ns | 2.84 ns | 4.59 ns | 1.14 | 0.02 |
| For_ElementAt | .NET Core 3.1 | .NET Core 3.1 | 10 | 952.4 ns | 10.08 ns | 8.94 ns | 7.92 | 0.13 |
| While_ElementAt | .NET Core 3.1 | .NET Core 3.1 | 10 | 953.8 ns | 8.86 ns | 7.40 ns | 7.93 | 0.12 |
| | | | | | | | | |
| GetEnumerator | .NET 4.8 | .NET 4.8 | 1000 | 9,344.9 ns | 80.50 ns | 75.30 ns | 1.00 | 0.01 |
| ForEach | .NET 4.8 | .NET 4.8 | 1000 | 9,360.2 ns | 82.04 ns | 64.05 ns | 1.00 | 0.00 |
| For_Indexer_ConsecutiveKeys | .NET 4.8 | .NET 4.8 | 1000 | 15,122.4 ns | 81.71 ns | 68.23 ns | 1.62 | 0.01 |
| While_Indexer_ConsecutiveKeys | .NET 4.8 | .NET 4.8 | 1000 | 15,106.4 ns | 85.68 ns | 75.96 ns | 1.61 | 0.02 |
| For_TryGetValue_ConsecutiveKeys | .NET 4.8 | .NET 4.8 | 1000 | 16,160.3 ns | 270.09 ns | 252.64 ns | 1.73 | 0.03 |
| While_TryGetValue_ConsecutiveKeys | .NET 4.8 | .NET 4.8 | 1000 | 16,452.4 ns | 146.51 ns | 129.88 ns | 1.76 | 0.02 |
| For_Indexer_CopyToKeys | .NET 4.8 | .NET 4.8 | 1000 | 17,407.1 ns | 251.38 ns | 222.84 ns | 1.86 | 0.03 |
| While_Indexer_CopyToKeys | .NET 4.8 | .NET 4.8 | 1000 | 17,034.0 ns | 295.71 ns | 404.77 ns | 1.85 | 0.05 |
| For_Indexer_CachedKeys | .NET 4.8 | .NET 4.8 | 1000 | 16,277.5 ns | 69.91 ns | 58.38 ns | 1.74 | 0.02 |
| While_Indexer_CachedKeys | .NET 4.8 | .NET 4.8 | 1000 | 15,131.9 ns | 55.97 ns | 46.74 ns | 1.62 | 0.01 |
| For_ElementAt | .NET 4.8 | .NET 4.8 | 1000 | 4,859,257.3 ns | 18,862.72 ns | 15,751.22 ns | 519.24 | 4.36 |
| While_ElementAt | .NET 4.8 | .NET 4.8 | 1000 | 3,837,001.5 ns | 7,396.43 ns | 6,556.74 ns | 409.85 | 3.11 |
| | | | | | | | | |
| GetEnumerator | .NET Core 3.1 | .NET Core 3.1 | 1000 | 9,029.9 ns | 21.69 ns | 18.12 ns | 1.00 | 0.00 |
| ForEach | .NET Core 3.1 | .NET Core 3.1 | 1000 | 9,022.4 ns | 13.08 ns | 10.92 ns | 1.00 | 0.00 |
| For_Indexer_ConsecutiveKeys | .NET Core 3.1 | .NET Core 3.1 | 1000 | 11,396.9 ns | 18.42 ns | 15.38 ns | 1.26 | 0.00 |
| While_Indexer_ConsecutiveKeys | .NET Core 3.1 | .NET Core 3.1 | 1000 | 12,504.6 ns | 13.82 ns | 10.79 ns | 1.39 | 0.00 |
| For_TryGetValue_ConsecutiveKeys | .NET Core 3.1 | .NET Core 3.1 | 1000 | 12,244.1 ns | 73.90 ns | 69.13 ns | 1.36 | 0.01 |
| While_TryGetValue_ConsecutiveKeys | .NET Core 3.1 | .NET Core 3.1 | 1000 | 12,437.4 ns | 22.48 ns | 18.77 ns | 1.38 | 0.00 |
| For_Indexer_CopyToKeys | .NET Core 3.1 | .NET Core 3.1 | 1000 | 13,717.9 ns | 36.98 ns | 30.88 ns | 1.52 | 0.00 |
| While_Indexer_CopyToKeys | .NET Core 3.1 | .NET Core 3.1 | 1000 | 14,099.6 ns | 20.44 ns | 18.12 ns | 1.56 | 0.00 |
| For_Indexer_CachedKeys | .NET Core 3.1 | .NET Core 3.1 | 1000 | 12,640.4 ns | 23.31 ns | 19.47 ns | 1.40 | 0.00 |
| While_Indexer_CachedKeys | .NET Core 3.1 | .NET Core 3.1 | 1000 | 12,610.5 ns | 20.97 ns | 17.51 ns | 1.40 | 0.00 |
| For_ElementAt | .NET Core 3.1 | .NET Core 3.1 | 1000 | 3,402,799.3 ns | 15,880.59 ns | 14,077.73 ns | 377.13 | 1.73 |
| While_ElementAt | .NET Core 3.1 | .NET Core 3.1 | 1000 | 3,399,305.2 ns | 5,822.01 ns | 5,161.06 ns | 376.76 | 0.74 |
| | | | | | | | | |
| GetEnumerator | .NET 4.8 | .NET 4.8 | 100000 | 885,621.4 ns | 1,617.29 ns | 1,350.51 ns | 1.00 | 0.00 |
| ForEach | .NET 4.8 | .NET 4.8 | 100000 | 884,591.8 ns | 1,781.29 ns | 1,390.72 ns | 1.00 | 0.00 |
| For_Indexer_ConsecutiveKeys | .NET 4.8 | .NET 4.8 | 100000 | 1,424,062.0 ns | 2,791.28 ns | 2,474.39 ns | 1.61 | 0.00 |
| While_Indexer_ConsecutiveKeys | .NET 4.8 | .NET 4.8 | 100000 | 1,435,667.1 ns | 3,696.89 ns | 3,277.19 ns | 1.62 | 0.00 |
| For_TryGetValue_ConsecutiveKeys | .NET 4.8 | .NET 4.8 | 100000 | 1,502,486.1 ns | 3,750.98 ns | 3,325.15 ns | 1.70 | 0.00 |
| While_TryGetValue_ConsecutiveKeys | .NET 4.8 | .NET 4.8 | 100000 | 1,558,335.7 ns | 4,619.63 ns | 3,857.60 ns | 1.76 | 0.00 |
| For_Indexer_CopyToKeys | .NET 4.8 | .NET 4.8 | 100000 | 1,685,000.7 ns | 4,676.88 ns | 3,651.40 ns | 1.90 | 0.01 |
| While_Indexer_CopyToKeys | .NET 4.8 | .NET 4.8 | 100000 | 1,722,418.0 ns | 3,431.67 ns | 3,042.08 ns | 1.95 | 0.01 |
| For_Indexer_CachedKeys | .NET 4.8 | .NET 4.8 | 100000 | 1,499,782.0 ns | 2,951.84 ns | 2,616.73 ns | 1.70 | 0.00 |
| While_Indexer_CachedKeys | .NET 4.8 | .NET 4.8 | 100000 | 1,583,570.2 ns | 3,880.57 ns | 3,440.03 ns | 1.79 | 0.00 |
| For_ElementAt | .NET 4.8 | .NET 4.8 | 100000 | 37,917,621,633.3 ns | 47,744,618.60 ns | 44,660,345.86 ns | 42,868.63 | 93.80 |
| While_ElementAt | .NET 4.8 | .NET 4.8 | 100000 | 38,343,003,642.9 ns | 262,502,616.47 ns | 232,701,732.10 ns | 43,315.66 | 229.53 |
| | | | | | | | | |
| GetEnumerator | .NET Core 3.1 | .NET Core 3.1 | 100000 | 900,980.9 ns | 2,477.29 ns | 2,068.65 ns | 1.00 | 0.00 |
| ForEach | .NET Core 3.1 | .NET Core 3.1 | 100000 | 899,775.7 ns | 1,040.30 ns | 868.70 ns | 1.00 | 0.00 |
| For_Indexer_ConsecutiveKeys | .NET Core 3.1 | .NET Core 3.1 | 100000 | 1,177,153.8 ns | 1,689.80 ns | 1,411.06 ns | 1.31 | 0.00 |
| While_Indexer_ConsecutiveKeys | .NET Core 3.1 | .NET Core 3.1 | 100000 | 1,255,795.4 ns | 2,562.23 ns | 2,139.58 ns | 1.40 | 0.00 |
| For_TryGetValue_ConsecutiveKeys | .NET Core 3.1 | .NET Core 3.1 | 100000 | 1,226,163.3 ns | 2,317.36 ns | 1,809.25 ns | 1.36 | 0.00 |
| While_TryGetValue_ConsecutiveKeys | .NET Core 3.1 | .NET Core 3.1 | 100000 | 1,245,130.0 ns | 4,146.38 ns | 3,237.22 ns | 1.38 | 0.00 |
| For_Indexer_CopyToKeys | .NET Core 3.1 | .NET Core 3.1 | 100000 | 1,430,340.4 ns | 7,834.82 ns | 6,945.37 ns | 1.59 | 0.01 |
| While_Indexer_CopyToKeys | .NET Core 3.1 | .NET Core 3.1 | 100000 | 1,472,807.7 ns | 5,363.80 ns | 4,754.87 ns | 1.64 | 0.01 |
| For_Indexer_CachedKeys | .NET Core 3.1 | .NET Core 3.1 | 100000 | 1,289,902.4 ns | 2,739.78 ns | 2,139.04 ns | 1.43 | 0.00 |
| While_Indexer_CachedKeys | .NET Core 3.1 | .NET Core 3.1 | 100000 | 1,276,484.8 ns | 4,652.23 ns | 3,884.82 ns | 1.42 | 0.00 |
| For_ElementAt | .NET Core 3.1 | .NET Core 3.1 | 100000 | 33,717,209,257.1 ns | 200,565,125.50 ns | 177,795,759.65 ns | 37,460.45 | 216.07 |
| While_ElementAt | .NET Core 3.1 | .NET Core 3.1 | 100000 | 34,064,932,086.7 ns | 225,399,893.36 ns | 210,839,200.10 ns | 37,841.10 | 204.02 |
...from this little program I wrote using BenchmarkDotNet...
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Jobs;
namespace SO61507883
{
[SimpleJob(RuntimeMoniker.Net48)]
[SimpleJob(RuntimeMoniker.NetCoreApp31)]
public class Benchmarks
{
public static IReadOnlyList<int> DictionarySizes
{
get;
} = Array.AsReadOnly(new int[] { 10, 1_000 });
[ParamsSource(nameof(DictionarySizes))]
public int Size
{
get; set;
}
public Dictionary<int, int> Source
{
get; set;
}
// Only used by the *_CachedKeys() benchmark methods
public int[] KeyCache
{
get; set;
}
[GlobalSetup()]
public void Setup()
{
Source = Enumerable.Range(0, Size)
.ToDictionary(i => i, i => -i);
KeyCache = new int[Size];
Source.Keys.CopyTo(KeyCache, 0);
}
[Benchmark()]
public (int keySum, int valueSum) GetEnumerator()
{
int keySum = 0;
int valueSum = 0;
using (Dictionary<int, int>.Enumerator enumerator = Source.GetEnumerator())
while (enumerator.MoveNext())
{
KeyValuePair<int, int> pair = enumerator.Current;
keySum += pair.Key;
valueSum += pair.Value;
}
return (keySum, valueSum);
}
[Benchmark(Baseline = true)]
public (int keySum, int valueSum) ForEach()
{
int keySum = 0;
int valueSum = 0;
foreach (KeyValuePair<int, int> pair in Source)
{
keySum += pair.Key;
valueSum += pair.Value;
}
return (keySum, valueSum);
}
[Benchmark()]
public (int keySum, int valueSum) For_Indexer_ConsecutiveKeys()
{
int keySum = 0;
int valueSum = 0;
// This exploits the fact that we know keys from 0..Size-1 exist in Source
for (int key = 0; key < Size; key++)
{
int value = Source[key];
keySum += key;
valueSum += value;
}
return (keySum, valueSum);
}
[Benchmark()]
public (int keySum, int valueSum) While_Indexer_ConsecutiveKeys()
{
int key = 0;
int keySum = 0;
int valueSum = 0;
// This exploits the fact that we know keys from 0..Size-1 exist in Source
while (key < Size)
{
int value = Source[key];
keySum += key++;
valueSum += value;
}
return (keySum, valueSum);
}
[Benchmark()]
public (int keySum, int valueSum) For_TryGetValue_ConsecutiveKeys()
{
int keySum = 0;
int valueSum = 0;
// This exploits the fact that we know keys from 0..Size-1 exist in Source
for (int key = 0; key < Size; key++)
if (Source.TryGetValue(key, out int value))
{
keySum += key;
valueSum += value;
}
return (keySum, valueSum);
}
[Benchmark()]
public (int keySum, int valueSum) While_TryGetValue_ConsecutiveKeys()
{
int key = 0;
int keySum = 0;
int valueSum = 0;
// This exploits the fact that we know keys from 0..Size-1 exist in Source
while (key < Size)
if (Source.TryGetValue(key, out int value))
{
keySum += key++;
valueSum += value;
}
return (keySum, valueSum);
}
[Benchmark()]
public (int keySum, int valueSum) For_Indexer_CopyToKeys()
{
// Copy the Keys property to an array to allow indexing
int[] keys = new int[Size];
Source.Keys.CopyTo(keys, 0);
int keySum = 0;
int valueSum = 0;
// This makes no assumptions about the distribution of keys in Source
for (int index = 0; index < Size; index++)
{
int key = keys[index];
int value = Source[key];
keySum += key;
valueSum += value;
}
return (keySum, valueSum);
}
[Benchmark()]
public (int keySum, int valueSum) While_Indexer_CopyToKeys()
{
// Copy the Keys property to an array to allow indexing
int[] keys = new int[Size];
Source.Keys.CopyTo(keys, 0);
int index = 0;
int keySum = 0;
int valueSum = 0;
// This makes no assumptions about the distribution of keys in Source
while (index < Size)
{
int key = keys[index++];
int value = Source[key];
keySum += key;
valueSum += value;
}
return (keySum, valueSum);
}
[Benchmark()]
public (int keySum, int valueSum) For_Indexer_CachedKeys()
{
int keySum = 0;
int valueSum = 0;
// This retrieves known keys stored separately from Source
for (int index = 0; index < Size; index++)
{
int key = KeyCache[index];
int value = Source[key];
keySum += key;
valueSum += value;
}
return (keySum, valueSum);
}
[Benchmark()]
public (int keySum, int valueSum) While_Indexer_CachedKeys()
{
int index = 0;
int keySum = 0;
int valueSum = 0;
// This retrieves known keys stored separately from Source
while (index < Size)
{
int key = KeyCache[index++];
int value = Source[key];
keySum += key;
valueSum += value;
}
return (keySum, valueSum);
}
[Benchmark()]
public (int keySum, int valueSum) For_ElementAt()
{
int keySum = 0;
int valueSum = 0;
for (int index = 0; index < Size; index++)
{
KeyValuePair<int, int> pair = Source.ElementAt(index);
keySum += pair.Key;
valueSum += pair.Value;
}
return (keySum, valueSum);
}
[Benchmark()]
public (int keySum, int valueSum) While_ElementAt()
{
int index = 0;
int keySum = 0;
int valueSum = 0;
while (index < Size)
{
KeyValuePair<int, int> pair = Source.ElementAt(index++);
keySum += pair.Key;
valueSum += pair.Value;
}
return (keySum, valueSum);
}
}
static class Program
{
static void Main(string[] args)
{
switch (args?.FirstOrDefault()?.ToUpper())
{
case "BENCHMARK":
BenchmarkMethods();
break;
case "TEST":
TestMethods();
break;
default:
DisplayUsage();
break;
}
}
static void DisplayUsage()
{
string assemblyLocation = Assembly.GetEntryAssembly().Location;
string assemblyFileName = System.IO.Path.GetFileName(assemblyLocation);
Console.WriteLine($"{assemblyFileName} {{ BENCHMARK | TEST }}");
Console.WriteLine("\tBENCHMARK - Benchmark dictionary enumeration methods.");
Console.WriteLine("\t TEST - Display results of dictionary enumeration methods.");
}
static void BenchmarkMethods()
{
BenchmarkDotNet.Running.BenchmarkRunner.Run<Benchmarks>();
}
static void TestMethods()
{
// Find, setup, and call the benchmark methods the same way BenchmarkDotNet would
Benchmarks benchmarks = new Benchmarks();
IEnumerable<MethodInfo> benchmarkMethods = benchmarks.GetType()
.GetMethods()
.Where(
method => method.CustomAttributes.Any(
attributeData => typeof(BenchmarkAttribute).IsAssignableFrom(attributeData.AttributeType)
)
);
foreach (MethodInfo method in benchmarkMethods)
{
Console.WriteLine($"{method.Name}():");
foreach (int size in Benchmarks.DictionarySizes)
{
benchmarks.Size = size;
benchmarks.Setup();
(int, int) result = ((int, int)) method.Invoke(benchmarks, Array.Empty<object>());
Console.WriteLine($"\t{size:N0} elements => {result}");
}
}
}
}
}
Note that the code above omits 100_000 from the Benchmarks.DictionarySizes property because it adds more than an hour to the run time.
Conclusions:
foreach/GetEnumerator() are the fastest ways to iterate a dictionary.
Depending on the runtime it's, at best, slightly slower using a for or while loop when you can make some assumptions about your keys, but it's still slower.
Using ElementAt() inside a for loop has terrible performance that only gets slower the bigger the dictionary gets.

It would matter only in very extreme cases. The performance draw back in foreach is that you have to write into another variable, which in for you don't.
The foreach is basically this:
for(int i = 0, i < something.Length; i++)
{
var item = something[i]; //which is why you can just use the item from collection
//your code using the item var...
}

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Ways to improve string memory allocation - c#

how about: result[i] = new string('#',i).PadLeft(n) ? Note that this still allocates two strings internally, but I honestly don't see that as a problem. The garbage collector will take care of it for you.

Related

What causes the difference of different copy method of different array length in C#?

Is there a way to reverse each 2 bytes of a file?

Why is BitArray faster than array of bools?

Generic performance vs. Type performance

Looping for vs foreach difference in iterating dictionary c#

Categories

Resources