Comparing Byte Arrays In C# (without for loop) [duplicate] - c#
How can I do this fast?
Sure I can do this:
static bool ByteArrayCompare(byte[] a1, byte[] a2)
{
    if (a1.Length != a2.Length)
        return false;

    for (int i = 0; i < a1.Length; i++)
        if (a1[i] != a2[i])
            return false;

    return true;
}
But I'm looking for either a BCL function or some highly optimized proven way to do this.
java.util.Arrays.equals((sbyte[])(Array)a1, (sbyte[])(Array)a2);
works nicely, but it doesn't look like that would work for x64.
Note my super-fast answer here.
You can use the Enumerable.SequenceEqual method.
using System;
using System.Linq;
...
var a1 = new int[] { 1, 2, 3};
var a2 = new int[] { 1, 2, 3};
var a3 = new int[] { 1, 2, 4};
var x = a1.SequenceEqual(a2); // true
var y = a1.SequenceEqual(a3); // false
If you can't use .NET 3.5 for some reason, your method is OK.
The compiler/run-time environment will optimize your loop, so you don't need to worry about performance.
P/Invoke powers activate!
[DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl)]
static extern int memcmp(byte[] b1, byte[] b2, long count);

static bool ByteArrayCompare(byte[] b1, byte[] b2)
{
    // Validate buffers are the same length.
    // This also ensures that the count does not exceed the length of either buffer.
    return b1.Length == b2.Length && memcmp(b1, b2, b1.Length) == 0;
}
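One side note on the declaration above: memcmp's third parameter is natively a size_t, whose width differs between 32- and 64-bit processes. A hedged variant (assuming Windows, where msvcrt.dll is available) marshals it as UIntPtr, which always matches the native pointer width:

```csharp
using System;
using System.Runtime.InteropServices;

static class NativeCompare
{
    // size_t maps to UIntPtr: 4 bytes in a 32-bit process, 8 bytes in a 64-bit one
    [DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl)]
    static extern int memcmp(byte[] b1, byte[] b2, UIntPtr count);

    public static bool ByteArrayCompare(byte[] b1, byte[] b2)
    {
        // Length check first, so count never exceeds either buffer
        return b1.Length == b2.Length
            && memcmp(b1, b2, (UIntPtr)b1.Length) == 0;
    }
}
```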
Span<T> offers an extremely competitive alternative without having to throw confusing and/or non-portable fluff into your own application's code base:
// byte[] is implicitly convertible to ReadOnlySpan<byte>
static bool ByteArrayCompare(ReadOnlySpan<byte> a1, ReadOnlySpan<byte> a2)
{
    return a1.SequenceEqual(a2);
}
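A quick usage sketch: thanks to the implicit conversion, callers can pass plain byte[] arguments (or slices) directly:

```csharp
byte[] x = { 1, 2, 3, 4 };
byte[] y = { 1, 2, 3, 4 };

bool whole = ByteArrayCompare(x, y);                            // true
bool slice = ByteArrayCompare(x.AsSpan(0, 2), y.AsSpan(0, 2));  // compare just a slice
```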
The (guts of the) implementation as of .NET 6.0.4 can be found here.
I've revised @EliArbel's gist to add this method as SpansEqual, drop most of the less interesting performers from others' benchmarks, run it with different array sizes, output graphs, and mark SpansEqual as the baseline so that it reports how the other methods compare to SpansEqual.
The below numbers are from the results, lightly edited to remove "Error" column.
| Method | ByteCount | Mean | StdDev | Ratio | RatioSD |
|-------------- |----------- |-------------------:|----------------:|------:|--------:|
| SpansEqual | 15 | 2.074 ns | 0.0233 ns | 1.00 | 0.00 |
| LongPointers | 15 | 2.854 ns | 0.0632 ns | 1.38 | 0.03 |
| Unrolled | 15 | 12.449 ns | 0.2487 ns | 6.00 | 0.13 |
| PInvokeMemcmp | 15 | 7.525 ns | 0.1057 ns | 3.63 | 0.06 |
| | | | | | |
| SpansEqual | 1026 | 15.629 ns | 0.1712 ns | 1.00 | 0.00 |
| LongPointers | 1026 | 46.487 ns | 0.2938 ns | 2.98 | 0.04 |
| Unrolled | 1026 | 23.786 ns | 0.1044 ns | 1.52 | 0.02 |
| PInvokeMemcmp | 1026 | 28.299 ns | 0.2781 ns | 1.81 | 0.03 |
| | | | | | |
| SpansEqual | 1048585 | 17,920.329 ns | 153.0750 ns | 1.00 | 0.00 |
| LongPointers | 1048585 | 42,077.448 ns | 309.9067 ns | 2.35 | 0.02 |
| Unrolled | 1048585 | 29,084.901 ns | 428.8496 ns | 1.62 | 0.03 |
| PInvokeMemcmp | 1048585 | 30,847.572 ns | 213.3162 ns | 1.72 | 0.02 |
| | | | | | |
| SpansEqual | 2147483591 | 124,752,376.667 ns | 552,281.0202 ns | 1.00 | 0.00 |
| LongPointers | 2147483591 | 139,477,269.231 ns | 331,458.5429 ns | 1.12 | 0.00 |
| Unrolled | 2147483591 | 137,617,423.077 ns | 238,349.5093 ns | 1.10 | 0.00 |
| PInvokeMemcmp | 2147483591 | 138,373,253.846 ns | 288,447.8278 ns | 1.11 | 0.01 |
I was originally surprised to see SpansEqual not come out on top for the max-array-size runs, but the difference was so minor that I didn't think it would ever matter. After refreshing the benchmarks on .NET 6.0.4 with my newer hardware, SpansEqual now comfortably outperforms all the others at every array size.
My system info:
BenchmarkDotNet=v0.13.1, OS=Windows 10.0.22000
AMD Ryzen 9 5900X, 1 CPU, 24 logical and 12 physical cores
.NET SDK=6.0.202
[Host] : .NET 6.0.4 (6.0.422.16404), X64 RyuJIT
DefaultJob : .NET 6.0.4 (6.0.422.16404), X64 RyuJIT
There's a new built-in solution for this in .NET 4 - IStructuralEquatable
static bool ByteArrayCompare(byte[] a1, byte[] a2)
{
    return StructuralComparisons.StructuralEqualityComparer.Equals(a1, a2);
}
Edit: the modern, fast way is to use a1.SequenceEqual(a2)
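For reference, a minimal usage sketch; note that the structural comparer is by far the slowest option in the benchmarks further down this page, because it compares element-by-element through object:

```csharp
using System.Collections;

var a1 = new byte[] { 1, 2, 3 };
var a2 = new byte[] { 1, 2, 3 };

// Works for arrays of any element type, at the cost of boxing each element
bool equal = StructuralComparisons.StructuralEqualityComparer.Equals(a1, a2); // true
```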
User gil suggested unsafe code which spawned this solution:
// Copyright (c) 2008-2013 Hafthor Stefansson
// Distributed under the MIT/X11 software license
// Ref: http://www.opensource.org/licenses/mit-license.php.
static unsafe bool UnsafeCompare(byte[] a1, byte[] a2)
{
    unchecked
    {
        if (a1 == a2) return true;
        if (a1 == null || a2 == null || a1.Length != a2.Length)
            return false;

        fixed (byte* p1 = a1, p2 = a2)
        {
            byte* x1 = p1, x2 = p2;
            int l = a1.Length;

            for (int i = 0; i < l / 8; i++, x1 += 8, x2 += 8)
                if (*((long*)x1) != *((long*)x2)) return false;

            if ((l & 4) != 0) { if (*((int*)x1) != *((int*)x2)) return false; x1 += 4; x2 += 4; }
            if ((l & 2) != 0) { if (*((short*)x1) != *((short*)x2)) return false; x1 += 2; x2 += 2; }
            if ((l & 1) != 0) if (*((byte*)x1) != *((byte*)x2)) return false;

            return true;
        }
    }
}
which does a 64-bit comparison for as much of the array as possible. This counts on the arrays starting qword-aligned; it still works if they're not, just not as fast.
It performs about seven times faster than the simple `for` loop. Using the J# library performed equivalently to the original `for` loop. Using .SequenceEqual runs around seven times slower, I think because it uses IEnumerator.MoveNext. I imagine LINQ-based solutions being at least that slow or worse.
If you are not opposed to doing it, you can import the J# assembly "vjslib.dll" and use its Arrays.equals(byte[], byte[]) method...
Don't blame me if someone laughs at you though...
EDIT: For what little it is worth, I used Reflector to disassemble the code for that, and here is what it looks like:
public static bool equals(sbyte[] a1, sbyte[] a2)
{
    if (a1 == a2)
    {
        return true;
    }
    if ((a1 != null) && (a2 != null))
    {
        if (a1.Length != a2.Length)
        {
            return false;
        }
        for (int i = 0; i < a1.Length; i++)
        {
            if (a1[i] != a2[i])
            {
                return false;
            }
        }
        return true;
    }
    return false;
}
.NET 3.5 and newer have a new public type, System.Data.Linq.Binary that encapsulates byte[]. It implements IEquatable<Binary> that (in effect) compares two byte arrays. Note that System.Data.Linq.Binary also has implicit conversion operator from byte[].
MSDN documentation:System.Data.Linq.Binary
Reflector decompile of the Equals method:
private bool EqualsTo(Binary binary)
{
    if (this != binary)
    {
        if (binary == null)
        {
            return false;
        }
        if (this.bytes.Length != binary.bytes.Length)
        {
            return false;
        }
        if (this.hashCode != binary.hashCode)
        {
            return false;
        }
        int index = 0;
        int length = this.bytes.Length;
        while (index < length)
        {
            if (this.bytes[index] != binary.bytes[index])
            {
                return false;
            }
            index++;
        }
    }
    return true;
}
An interesting twist is that they only proceed to the byte-by-byte comparison loop if the hashes of the two Binary objects are equal. This, however, comes at the cost of computing the hash in the constructor of Binary objects (by traversing the array with a for loop :-) ).
The above implementation means that in the worst case you may have to traverse the arrays three times: first to compute the hash of array1, then to compute the hash of array2, and finally (because this is the worst-case scenario, lengths and hashes equal) to compare the bytes in array1 with the bytes in array2.
Overall, even though System.Data.Linq.Binary is built into the BCL, I don't think it is the fastest way to compare two byte arrays :-|.
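For completeness, a usage sketch (assuming a reference to the System.Data.Linq assembly):

```csharp
using System.Data.Linq;

byte[] a1 = { 1, 2, 3 };
byte[] a2 = { 1, 2, 3 };

Binary b1 = a1; // implicit conversion from byte[]
Binary b2 = a2;

// Compares lengths, then hashes, then (only if both match) the bytes
bool equal = b1.Equals(b2); // true
```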
I posted a similar question about checking whether a byte[] is full of zeroes. (SIMD code was beaten, so I removed it from this answer.) Here is the fastest code from my comparisons:
static unsafe bool EqualBytesLongUnrolled(byte[] data1, byte[] data2)
{
    if (data1 == data2)
        return true;
    if (data1.Length != data2.Length)
        return false;

    fixed (byte* bytes1 = data1, bytes2 = data2)
    {
        int len = data1.Length;
        int rem = len % (sizeof(long) * 16);
        long* b1 = (long*)bytes1;
        long* b2 = (long*)bytes2;
        long* e1 = (long*)(bytes1 + len - rem);

        while (b1 < e1)
        {
            if (*(b1) != *(b2) || *(b1 + 1) != *(b2 + 1) ||
                *(b1 + 2) != *(b2 + 2) || *(b1 + 3) != *(b2 + 3) ||
                *(b1 + 4) != *(b2 + 4) || *(b1 + 5) != *(b2 + 5) ||
                *(b1 + 6) != *(b2 + 6) || *(b1 + 7) != *(b2 + 7) ||
                *(b1 + 8) != *(b2 + 8) || *(b1 + 9) != *(b2 + 9) ||
                *(b1 + 10) != *(b2 + 10) || *(b1 + 11) != *(b2 + 11) ||
                *(b1 + 12) != *(b2 + 12) || *(b1 + 13) != *(b2 + 13) ||
                *(b1 + 14) != *(b2 + 14) || *(b1 + 15) != *(b2 + 15))
                return false;
            b1 += 16;
            b2 += 16;
        }

        for (int i = 0; i < rem; i++)
            if (data1[len - 1 - i] != data2[len - 1 - i])
                return false;

        return true;
    }
}
Measured on two 256MB byte arrays:
UnsafeCompare : 86,8784 ms
EqualBytesSimd : 71,5125 ms
EqualBytesSimdUnrolled : 73,1917 ms
EqualBytesLongUnrolled : 39,8623 ms
using System.Linq; // SequenceEqual

byte[] ByteArray1 = MyFunct1();
byte[] ByteArray2 = MyFunct2();

if (ByteArray1.SequenceEqual(ByteArray2))
{
    MessageBox.Show("Match");
}
else
{
    MessageBox.Show("Don't match");
}
Let's add one more!
Recently Microsoft released a special NuGet package, System.Runtime.CompilerServices.Unsafe. It's special because it's written in IL and provides low-level functionality not directly available in C#.
One of its methods, Unsafe.As<T>(object), allows casting any reference type to another reference type, skipping any safety checks. This is usually a very bad idea, but if both types have the same memory layout, it can work. So we can use this to cast a byte[] to a long[]:
bool CompareWithUnsafeLibrary(byte[] a1, byte[] a2)
{
    if (a1.Length != a2.Length) return false;

    var longSize = a1.Length / 8; // count of complete 8-byte chunks
    var long1 = Unsafe.As<long[]>(a1);
    var long2 = Unsafe.As<long[]>(a2);

    for (var i = 0; i < longSize; i++)
    {
        if (long1[i] != long2[i]) return false;
    }

    for (var i = longSize * 8; i < a1.Length; i++)
    {
        if (a1[i] != a2[i]) return false;
    }

    return true;
}
Note that long1.Length would still return the original array's length, since it's stored in a field in the array's memory structure.
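A quick sketch illustrating that caveat (assuming the System.Runtime.CompilerServices.Unsafe package is referenced):

```csharp
using System;
using System.Runtime.CompilerServices;

var bytes = new byte[16];
var longs = Unsafe.As<long[]>(bytes);

// Reports 16, not 2: the array header's length field is untouched by the
// reinterpreting cast, so Length still returns the original element count.
Console.WriteLine(longs.Length);
```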
This method is not quite as fast as other methods demonstrated here, but it is a lot faster than the naive method, doesn't use unsafe code or P/Invoke or pinning, and the implementation is quite straightforward (IMO). Here are some BenchmarkDotNet results from my machine:
BenchmarkDotNet=v0.10.3.0, OS=Microsoft Windows NT 6.2.9200.0
Processor=Intel(R) Core(TM) i7-4870HQ CPU 2.50GHz, ProcessorCount=8
Frequency=2435775 Hz, Resolution=410.5470 ns, Timer=TSC
[Host] : Clr 4.0.30319.42000, 64bit RyuJIT-v4.6.1637.0
DefaultJob : Clr 4.0.30319.42000, 64bit RyuJIT-v4.6.1637.0
Method | Mean | StdDev |
----------------------- |-------------- |---------- |
UnsafeLibrary | 125.8229 ns | 0.3588 ns |
UnsafeCompare | 89.9036 ns | 0.8243 ns |
JSharpEquals | 1,432.1717 ns | 1.3161 ns |
EqualBytesLongUnrolled | 43.7863 ns | 0.8923 ns |
NewMemCmp | 65.4108 ns | 0.2202 ns |
ArraysEqual | 910.8372 ns | 2.6082 ns |
PInvokeMemcmp | 52.7201 ns | 0.1105 ns |
I've also created a gist with all the tests.
I developed a method that slightly beats memcmp() (plinth's answer) and very slightly beats EqualBytesLongUnrolled() (Arek Bulski's answer) on my PC. Basically, it unrolls the loop by 4 instead of 8.
Update 30 Mar. 2019:
Starting in .NET core 3.0, we have SIMD support!
This solution is fastest by a considerable margin on my PC:
#if NETCOREAPP3_0
using System.Runtime.Intrinsics.X86;
#endif
…
public static unsafe bool Compare(byte[] arr0, byte[] arr1)
{
if (arr0 == arr1)
{
return true;
}
if (arr0 == null || arr1 == null)
{
return false;
}
if (arr0.Length != arr1.Length)
{
return false;
}
if (arr0.Length == 0)
{
return true;
}
fixed (byte* b0 = arr0, b1 = arr1)
{
#if NETCOREAPP3_0
if (Avx2.IsSupported)
{
return Compare256(b0, b1, arr0.Length);
}
else if (Sse2.IsSupported)
{
return Compare128(b0, b1, arr0.Length);
}
else
#endif
{
return Compare64(b0, b1, arr0.Length);
}
}
}
#if NETCOREAPP3_0
public static unsafe bool Compare256(byte* b0, byte* b1, int length)
{
byte* lastAddr = b0 + length;
byte* lastAddrMinus128 = lastAddr - 128;
const int mask = -1;
while (b0 < lastAddrMinus128) // unroll the loop so that we are comparing 128 bytes at a time.
{
if (Avx2.MoveMask(Avx2.CompareEqual(Avx.LoadVector256(b0), Avx.LoadVector256(b1))) != mask)
{
return false;
}
if (Avx2.MoveMask(Avx2.CompareEqual(Avx.LoadVector256(b0 + 32), Avx.LoadVector256(b1 + 32))) != mask)
{
return false;
}
if (Avx2.MoveMask(Avx2.CompareEqual(Avx.LoadVector256(b0 + 64), Avx.LoadVector256(b1 + 64))) != mask)
{
return false;
}
if (Avx2.MoveMask(Avx2.CompareEqual(Avx.LoadVector256(b0 + 96), Avx.LoadVector256(b1 + 96))) != mask)
{
return false;
}
b0 += 128;
b1 += 128;
}
while (b0 < lastAddr)
{
if (*b0 != *b1) return false;
b0++;
b1++;
}
return true;
}
public static unsafe bool Compare128(byte* b0, byte* b1, int length)
{
byte* lastAddr = b0 + length;
byte* lastAddrMinus64 = lastAddr - 64;
const int mask = 0xFFFF;
while (b0 < lastAddrMinus64) // unroll the loop so that we are comparing 64 bytes at a time.
{
if (Sse2.MoveMask(Sse2.CompareEqual(Sse2.LoadVector128(b0), Sse2.LoadVector128(b1))) != mask)
{
return false;
}
if (Sse2.MoveMask(Sse2.CompareEqual(Sse2.LoadVector128(b0 + 16), Sse2.LoadVector128(b1 + 16))) != mask)
{
return false;
}
if (Sse2.MoveMask(Sse2.CompareEqual(Sse2.LoadVector128(b0 + 32), Sse2.LoadVector128(b1 + 32))) != mask)
{
return false;
}
if (Sse2.MoveMask(Sse2.CompareEqual(Sse2.LoadVector128(b0 + 48), Sse2.LoadVector128(b1 + 48))) != mask)
{
return false;
}
b0 += 64;
b1 += 64;
}
while (b0 < lastAddr)
{
if (*b0 != *b1) return false;
b0++;
b1++;
}
return true;
}
#endif
public static unsafe bool Compare64(byte* b0, byte* b1, int length)
{
byte* lastAddr = b0 + length;
byte* lastAddrMinus32 = lastAddr - 32;
while (b0 < lastAddrMinus32) // unroll the loop so that we are comparing 32 bytes at a time.
{
if (*(ulong*)b0 != *(ulong*)b1) return false;
if (*(ulong*)(b0 + 8) != *(ulong*)(b1 + 8)) return false;
if (*(ulong*)(b0 + 16) != *(ulong*)(b1 + 16)) return false;
if (*(ulong*)(b0 + 24) != *(ulong*)(b1 + 24)) return false;
b0 += 32;
b1 += 32;
}
while (b0 < lastAddr)
{
if (*b0 != *b1) return false;
b0++;
b1++;
}
return true;
}
I would use unsafe code and run the for loop comparing Int32 pointers.
Maybe you should also consider checking the arrays to be non-null.
If you look at how .NET does string.Equals, you see that it uses a private method called EqualsHelper which has an "unsafe" pointer implementation. .NET Reflector is your friend to see how things are done internally.
This can be used as a template for byte array comparison which I did an implementation on in blog post Fast byte array comparison in C#. I also did some rudimentary benchmarks to see when a safe implementation is faster than the unsafe.
That said, unless you really need killer performance, I'd go for a simple for loop comparison.
For those of you who care about order (i.e. want your memcmp to return an int like it should, instead of nothing), .NET Core 3.0 (and .NET Standard 2.1) includes a Span.SequenceCompareTo(...) extension method (plus Span.SequenceEqual) that can be used to compare two ReadOnlySpan<T> instances (where T : IComparable<T>).
In the original GitHub proposal, the discussion included approach comparisons with jump table calculations, reading a byte[] as long[], SIMD usage, and p/invoke to the CLR implementation's memcmp.
Going forward, this should be your go-to method for comparing byte arrays or byte ranges (as should using Span<byte> instead of byte[] in your .NET Standard 2.1 APIs), and it is fast enough that you should no longer care about optimizing it (and no, despite the similarity in name, it does not perform as abysmally as the horrid Enumerable.SequenceEqual).
#if NETCOREAPP3_0_OR_GREATER
// Using the platform-native Span<T>.SequenceEqual<T>(..)
public static int Compare(byte[] range1, int offset1, byte[] range2, int offset2, int count)
{
    var span1 = range1.AsSpan(offset1, count);
    var span2 = range2.AsSpan(offset2, count);

    return span1.SequenceCompareTo(span2);
    // or, if you don't care about ordering
    // return span1.SequenceEqual(span2);
}
#else
// The most basic implementation, in platform-agnostic, safe C#
public static bool Compare(byte[] range1, int offset1, byte[] range2, int offset2, int count)
{
    // Working backwards lets the compiler optimize away bound checking after the first loop
    for (int i = count - 1; i >= 0; --i)
    {
        if (range1[offset1 + i] != range2[offset2 + i])
        {
            return false;
        }
    }
    return true;
}
#endif
I did some measurements using the attached program, a .NET 4.7 release build without the debugger attached. I think people have been using the wrong metric, since what you care about here, if speed matters, is how long it takes to figure out whether two byte arrays are equal, i.e. throughput in bytes.
StructuralComparison : 4.6 MiB/s
for : 274.5 MiB/s
ToUInt32 : 263.6 MiB/s
ToUInt64 : 474.9 MiB/s
memcmp : 8500.8 MiB/s
As you can see, there's no better way than memcmp and it's orders of magnitude faster. A simple for loop is the second best option. And it still boggles my mind why Microsoft cannot simply include a Buffer.Compare method.
[Program.cs]:
using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Runtime.InteropServices;
using System.Text;
using System.Threading.Tasks;
namespace memcmp
{
class Program
{
static byte[] TestVector(int size)
{
var data = new byte[size];
using (var rng = new System.Security.Cryptography.RNGCryptoServiceProvider())
{
rng.GetBytes(data);
}
return data;
}
static TimeSpan Measure(string testCase, TimeSpan offset, Action action, bool ignore = false)
{
var t = Stopwatch.StartNew();
var n = 0L;
while (t.Elapsed < TimeSpan.FromSeconds(10))
{
action();
n++;
}
var elapsed = t.Elapsed - offset;
if (!ignore)
{
Console.WriteLine($"{testCase,-16} : {n / elapsed.TotalSeconds,16:0.0} MiB/s");
}
return elapsed;
}
[DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl)]
static extern int memcmp(byte[] b1, byte[] b2, long count);
static void Main(string[] args)
{
// how quickly can we establish if two sequences of bytes are equal?
// note that we are testing the speed of different comparison methods
var a = TestVector(1024 * 1024); // 1 MiB
var b = (byte[])a.Clone();
// a prior attempt to measure and subtract per-call overhead turned out
// to be a mistake, so no offset is applied here
var offset = TimeSpan.Zero;
Measure("StructuralComparison", offset, () =>
{
StructuralComparisons.StructuralEqualityComparer.Equals(a, b);
});
Measure("for", offset, () =>
{
for (int i = 0; i < a.Length; i++)
{
if (a[i] != b[i]) break;
}
});
Measure("ToUInt32", offset, () =>
{
for (int i = 0; i < a.Length; i += 4)
{
if (BitConverter.ToUInt32(a, i) != BitConverter.ToUInt32(b, i)) break;
}
});
Measure("ToUInt64", offset, () =>
{
for (int i = 0; i < a.Length; i += 8)
{
if (BitConverter.ToUInt64(a, i) != BitConverter.ToUInt64(b, i)) break;
}
});
Measure("memcmp", offset, () =>
{
memcmp(a, b, a.Length);
});
}
}
}
I couldn't find a solution I'm completely happy with (reasonable performance, but no unsafe code/P/Invoke), so I came up with this. Nothing really original, but it works:
/// <summary>
/// Compares two byte arrays in 8-byte chunks via BitConverter.
/// </summary>
/// <param name="array1">First array.</param>
/// <param name="array2">Second array.</param>
/// <param name="bytesToCompare">0 means compare entire arrays</param>
/// <returns>true if the compared bytes are equal</returns>
public static bool ArraysEqual(byte[] array1, byte[] array2, int bytesToCompare = 0)
{
    if (array1.Length != array2.Length) return false;

    var length = (bytesToCompare == 0) ? array1.Length : bytesToCompare;
    var tailIdx = length - length % sizeof(Int64);

    // check in 8 byte chunks
    for (var i = 0; i < tailIdx; i += sizeof(Int64))
    {
        if (BitConverter.ToInt64(array1, i) != BitConverter.ToInt64(array2, i)) return false;
    }

    // check the remainder of the array, always shorter than 8 bytes
    for (var i = tailIdx; i < length; i++)
    {
        if (array1[i] != array2[i]) return false;
    }

    return true;
}
Performance compared with some of the other solutions on this page:
Simple Loop: 19837 ticks, 1.00
BitConverter (this answer): 4886 ticks, 4.06
UnsafeCompare: 1636 ticks, 12.12
EqualBytesLongUnrolled: 637 ticks, 31.09
P/Invoke memcmp: 369 ticks, 53.67
Tested in linqpad, 1000000 bytes identical arrays (worst case scenario), 500 iterations each.
It seems that EqualBytesLongUnrolled is the best of the above suggestions.
I skipped some methods (Enumerable.SequenceEqual, StructuralComparisons.StructuralEqualityComparer.Equals) because I wasn't patient enough to wait for them. On 265 MB arrays I measured this:
Host Process Environment Information:
BenchmarkDotNet.Core=v0.9.9.0
OS=Microsoft Windows NT 6.2.9200.0
Processor=Intel(R) Core(TM) i7-3770 CPU 3.40GHz, ProcessorCount=8
Frequency=3323582 ticks, Resolution=300.8802 ns, Timer=TSC
CLR=MS.NET 4.0.30319.42000, Arch=64-bit RELEASE [RyuJIT]
GC=Concurrent Workstation
JitModules=clrjit-v4.6.1590.0
Type=CompareMemoriesBenchmarks Mode=Throughput
Method | Median | StdDev | Scaled | Scaled-SD |
----------------------- |------------ |---------- |------- |---------- |
NewMemCopy | 30.0443 ms | 1.1880 ms | 1.00 | 0.00 |
EqualBytesLongUnrolled | 29.9917 ms | 0.7480 ms | 0.99 | 0.04 |
msvcrt_memcmp | 30.0930 ms | 0.2964 ms | 1.00 | 0.03 |
UnsafeCompare | 31.0520 ms | 0.7072 ms | 1.03 | 0.04 |
ByteArrayCompare | 212.9980 ms | 2.0776 ms | 7.06 | 0.25 |
OS=Windows
Processor=?, ProcessorCount=8
Frequency=3323582 ticks, Resolution=300.8802 ns, Timer=TSC
CLR=CORE, Arch=64-bit ? [RyuJIT]
GC=Concurrent Workstation
dotnet cli version: 1.0.0-preview2-003131
Type=CompareMemoriesBenchmarks Mode=Throughput
Method | Median | StdDev | Scaled | Scaled-SD |
----------------------- |------------ |---------- |------- |---------- |
NewMemCopy | 30.1789 ms | 0.0437 ms | 1.00 | 0.00 |
EqualBytesLongUnrolled | 30.1985 ms | 0.1782 ms | 1.00 | 0.01 |
msvcrt_memcmp | 30.1084 ms | 0.0660 ms | 1.00 | 0.00 |
UnsafeCompare | 31.1845 ms | 0.4051 ms | 1.03 | 0.01 |
ByteArrayCompare | 212.0213 ms | 0.1694 ms | 7.03 | 0.01 |
For comparing short byte arrays the following is an interesting hack:

if (myByteArray1.Length != myByteArray2.Length) return false;
if (myByteArray1.Length == 8)
    return BitConverter.ToInt64(myByteArray1, 0) == BitConverter.ToInt64(myByteArray2, 0);
else if (myByteArray1.Length == 4)
    return BitConverter.ToInt32(myByteArray1, 0) == BitConverter.ToInt32(myByteArray2, 0);

Then I would probably fall out to the solution listed in the question.
It'd be interesting to do a performance analysis of this code.
I have not seen many linq solutions here.
I am not sure of the performance implications, however I generally stick to linq as rule of thumb and then optimize later if necessary.
public bool CompareTwoArrays(byte[] array1, byte[] array2)
{
    return !array1.Where((t, i) => t != array2[i]).Any();
}
Please note this only works if the arrays are the same size. An extended version could look like this:

public bool CompareTwoArrays(byte[] array1, byte[] array2)
{
    if (array1.Length != array2.Length) return false;
    return !array1.Where((t, i) => t != array2[i]).Any();
}
I thought about block-transfer acceleration methods built into many graphics cards. But then you would have to copy over all the data byte-wise, so this doesn't help you much if you don't want to implement a whole portion of your logic in unmanaged and hardware-dependent code...
Another way of optimization similar to the approach shown above would be to store as much of your data as possible in a long[] rather than a byte[] right from the start, for example if you are reading it sequentially from a binary file, or if you use a memory mapped file, read in data as long[] or single long values. Then, your comparison loop will only need 1/8th of the number of iterations it would have to do for a byte[] containing the same amount of data.
It is a matter of when and how often you need to compare vs. when and how often you need to access the data in a byte-by-byte manner, e.g. to use it in an API call as a parameter in a method that expects a byte[]. In the end, you only can tell if you really know the use case...
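The idea above can be sketched roughly as follows (hypothetical helper names; assumes, for brevity, that the byte count is a multiple of 8):

```csharp
using System;

static class LongPacking
{
    // Pack a byte[] into a long[] once, up front...
    public static long[] Pack(byte[] bytes)
    {
        var packed = new long[bytes.Length / 8];
        Buffer.BlockCopy(bytes, 0, packed, 0, bytes.Length);
        return packed;
    }

    // ...then later comparisons need only 1/8th of the iterations
    public static bool LongArraysEqual(long[] a, long[] b)
    {
        if (a.Length != b.Length) return false;
        for (int i = 0; i < a.Length; i++)
            if (a[i] != b[i]) return false;
        return true;
    }
}
```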
Sorry, if you're looking for a managed way, you're already doing it correctly and, to my knowledge, there's no built-in method in the BCL for doing this.
You should add some initial null checks and then just reuse it as if it were in the BCL.
I settled on a solution inspired by the EqualBytesLongUnrolled method posted by ArekBulski, with an additional optimization. In my case, differences between arrays tend to be near the tails of the arrays. In testing, I found that when this is the case for large arrays, being able to compare array elements in reverse order gives this solution a huge performance gain over the memcmp-based solution. Here is that solution:
public enum CompareDirection { Forward, Backward }
private static unsafe bool UnsafeEquals(byte[] a, byte[] b, CompareDirection direction = CompareDirection.Forward)
{
// returns when a and b are same array or both null
if (a == b) return true;
// if either is null or different lengths, can't be equal
if (a == null || b == null || a.Length != b.Length)
return false;
const int UNROLLED = 16; // count of longs 'unrolled' in optimization
int size = sizeof(long) * UNROLLED; // 128 bytes (min size for 'unrolled' optimization)
int len = a.Length;
int n = len / size; // count of full 128 byte segments
int r = len % size; // count of remaining 'unoptimized' bytes
// pin the arrays and access them via pointers
fixed (byte* pb_a = a, pb_b = b)
{
if (r > 0 && direction == CompareDirection.Backward)
{
byte* pa = pb_a + len - 1;
byte* pb = pb_b + len - 1;
byte* phead = pb_a + len - r;
while(pa >= phead)
{
if (*pa != *pb) return false;
pa--;
pb--;
}
}
if (n > 0)
{
int nOffset = n * size;
if (direction == CompareDirection.Forward)
{
long* pa = (long*)pb_a;
long* pb = (long*)pb_b;
long* ptail = (long*)(pb_a + nOffset);
while (pa < ptail)
{
if (*(pa + 0) != *(pb + 0) || *(pa + 1) != *(pb + 1) ||
*(pa + 2) != *(pb + 2) || *(pa + 3) != *(pb + 3) ||
*(pa + 4) != *(pb + 4) || *(pa + 5) != *(pb + 5) ||
*(pa + 6) != *(pb + 6) || *(pa + 7) != *(pb + 7) ||
*(pa + 8) != *(pb + 8) || *(pa + 9) != *(pb + 9) ||
*(pa + 10) != *(pb + 10) || *(pa + 11) != *(pb + 11) ||
*(pa + 12) != *(pb + 12) || *(pa + 13) != *(pb + 13) ||
*(pa + 14) != *(pb + 14) || *(pa + 15) != *(pb + 15)
)
{
return false;
}
pa += UNROLLED;
pb += UNROLLED;
}
}
else
{
long* pa = (long*)(pb_a + nOffset);
long* pb = (long*)(pb_b + nOffset);
long* phead = (long*)pb_a;
while (phead < pa)
{
if (*(pa - 1) != *(pb - 1) || *(pa - 2) != *(pb - 2) ||
*(pa - 3) != *(pb - 3) || *(pa - 4) != *(pb - 4) ||
*(pa - 5) != *(pb - 5) || *(pa - 6) != *(pb - 6) ||
*(pa - 7) != *(pb - 7) || *(pa - 8) != *(pb - 8) ||
*(pa - 9) != *(pb - 9) || *(pa - 10) != *(pb - 10) ||
*(pa - 11) != *(pb - 11) || *(pa - 12) != *(pb - 12) ||
*(pa - 13) != *(pb - 13) || *(pa - 14) != *(pb - 14) ||
*(pa - 15) != *(pb - 15) || *(pa - 16) != *(pb - 16)
)
{
return false;
}
pa -= UNROLLED;
pb -= UNROLLED;
}
}
}
if (r > 0 && direction == CompareDirection.Forward)
{
byte* pa = pb_a + len - r;
byte* pb = pb_b + len - r;
byte* ptail = pb_a + len;
while(pa < ptail)
{
if (*pa != *pb) return false;
pa++;
pb++;
}
}
}
return true;
}
This is almost certainly much slower than any other version given here, but it was fun to write.
static bool ByteArrayEquals(byte[] a1, byte[] a2)
{
    // Zip stops at the shorter sequence, so the length check is required
    return a1.Length == a2.Length && a1.Zip(a2, (l, r) => l == r).All(x => x);
}
This is similar to others, but the difference here is that there is no falling through to the next-highest number of bytes I can check at once. For example, if I have 63 bytes (in my SIMD example), I can check the equality of the first 32 bytes and then the last 32 bytes, which is faster than checking 32 bytes, then 16 bytes, then 8 bytes, and so on. The first check you enter is the only check you will need to compare all of the bytes.
This does come out on top in my tests, but just by a hair.
The following code is exactly how I tested it in airbreather/ArrayComparePerf.cs.
public unsafe bool SIMDNoFallThrough() // requires System.Runtime.Intrinsics.X86
{
if (a1 == null || a2 == null)
return false;
int length0 = a1.Length;
if (length0 != a2.Length) return false;
fixed (byte* b00 = a1, b01 = a2)
{
byte* b0 = b00, b1 = b01, last0 = b0 + length0, last1 = b1 + length0, last32 = last0 - 31;
if (length0 > 31)
{
while (b0 < last32)
{
if (Avx2.MoveMask(Avx2.CompareEqual(Avx.LoadVector256(b0), Avx.LoadVector256(b1))) != -1)
return false;
b0 += 32;
b1 += 32;
}
return Avx2.MoveMask(Avx2.CompareEqual(Avx.LoadVector256(last0 - 32), Avx.LoadVector256(last1 - 32))) == -1;
}
if (length0 > 15)
{
if (Sse2.MoveMask(Sse2.CompareEqual(Sse2.LoadVector128(b0), Sse2.LoadVector128(b1))) != 65535)
return false;
return Sse2.MoveMask(Sse2.CompareEqual(Sse2.LoadVector128(last0 - 16), Sse2.LoadVector128(last1 - 16))) == 65535;
}
if (length0 > 7)
{
if (*(ulong*)b0 != *(ulong*)b1)
return false;
return *(ulong*)(last0 - 8) == *(ulong*)(last1 - 8);
}
if (length0 > 3)
{
if (*(uint*)b0 != *(uint*)b1)
return false;
return *(uint*)(last0 - 4) == *(uint*)(last1 - 4);
}
if (length0 > 1)
{
if (*(ushort*)b0 != *(ushort*)b1)
return false;
return *(ushort*)(last0 - 2) == *(ushort*)(last1 - 2);
}
return *b0 == *b1;
}
}
If no SIMD is preferred, the same method applied to the existing LongPointers algorithm:
public unsafe bool LongPointersNoFallThrough()
{
    if (a1 == null || a2 == null || a1.Length != a2.Length)
        return false;
    fixed (byte* p1 = a1, p2 = a2)
    {
        byte* x1 = p1, x2 = p2;
        int l = a1.Length;
        if (l >= 8)
        {
            for (int i = 0; i < l / 8; i++, x1 += 8, x2 += 8)
                if (*(long*)x1 != *(long*)x2) return false;
            // compare the (possibly overlapping) last 8 bytes
            return *(long*)(p1 + l - 8) == *(long*)(p2 + l - 8);
        }
        if (l >= 4)
        {
            if (*(int*)x1 != *(int*)x2) return false;
            return *(int*)(p1 + l - 4) == *(int*)(p2 + l - 4);
        }
        if (l >= 2)
        {
            if (*(short*)x1 != *(short*)x2) return false;
            return *(short*)(p1 + l - 2) == *(short*)(p2 + l - 2);
        }
        return l == 0 || *x1 == *x2;
    }
}
If you are looking for a very fast byte array equality comparer, I suggest you take a look at this STSdb Labs article: Byte array equality comparer. It features some of the fastest implementations for byte[] array equality comparing, which are presented, performance tested and summarized.
You can also focus on these implementations:
BigEndianByteArrayComparer - fast byte[] array comparer from left to right (BigEndian)
BigEndianByteArrayEqualityComparer - fast byte[] equality comparer from left to right (BigEndian)
LittleEndianByteArrayComparer - fast byte[] array comparer from right to left (LittleEndian)
LittleEndianByteArrayEqualityComparer - fast byte[] equality comparer from right to left (LittleEndian)
Use SequenceEqual for this comparison.
The short answer is this:

public bool Compare(byte[] b1, byte[] b2)
{
    return Encoding.ASCII.GetString(b1) == Encoding.ASCII.GetString(b2);
}

In such a way you can use the optimized .NET string comparison to compare byte arrays without writing unsafe code. Beware, though: Encoding.ASCII replaces every byte above 0x7F with '?', so two different arrays can compare equal; this trick is only safe for pure 7-bit ASCII data. This is how string comparison is done in the background:
private unsafe static bool EqualsHelper(String strA, String strB)
{
Contract.Requires(strA != null);
Contract.Requires(strB != null);
Contract.Requires(strA.Length == strB.Length);
int length = strA.Length;
fixed (char* ap = &strA.m_firstChar) fixed (char* bp = &strB.m_firstChar)
{
char* a = ap;
char* b = bp;
// Unroll the loop
#if AMD64
// For the AMD64 bit platform we unroll by 12 and
// check three qwords at a time. This is less code
// than the 32 bit case and is shorter
// pathlength.
while (length >= 12)
{
if (*(long*)a != *(long*)b) return false;
if (*(long*)(a+4) != *(long*)(b+4)) return false;
if (*(long*)(a+8) != *(long*)(b+8)) return false;
a += 12; b += 12; length -= 12;
}
#else
while (length >= 10)
{
if (*(int*)a != *(int*)b) return false;
if (*(int*)(a+2) != *(int*)(b+2)) return false;
if (*(int*)(a+4) != *(int*)(b+4)) return false;
if (*(int*)(a+6) != *(int*)(b+6)) return false;
if (*(int*)(a+8) != *(int*)(b+8)) return false;
a += 10; b += 10; length -= 10;
}
#endif
// This depends on the fact that the String objects are
// always zero terminated and that the terminating zero is not included
// in the length. For odd string sizes, the last compare will include
// the zero terminator.
while (length > 0)
{
if (*(int*)a != *(int*)b) break;
a += 2; b += 2; length -= 2;
}
return (length <= 0);
}
}
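One caveat with the ASCII round-trip above: because every byte outside the 7-bit ASCII range decodes to '?', distinct arrays can collide. A quick sketch of the failure mode:

```csharp
using System;
using System.Text;

class Program
{
    static void Main()
    {
        var a = new byte[] { 0x80 };
        var b = new byte[] { 0xFF };
        // Both byte values fall outside 7-bit ASCII, so both decode to "?",
        // and the string comparison wrongly reports the arrays as equal.
        Console.WriteLine(Encoding.ASCII.GetString(a) == Encoding.ASCII.GetString(b)); // True
    }
}
```

So the string trick is only safe when the data is guaranteed to be 7-bit ASCII.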
Since many of the fancy solutions above don't work with UWP, and because I love LINQ and functional approaches, I present you my version of this problem.
To exit the comparison at the first difference, I use .Any(), which short-circuits:
public static bool CompareByteArrays(byte[] ba0, byte[] ba1) =>
    ba0.Length == ba1.Length &&
    !Enumerable.Range(0, ba0.Length).Any(n => ba0[n] != ba1[n]);
Related
How to convert a float/double/half to a minifloat the optimal way (improve my already working code)?
I've written an IEEE 754 "quarter" 8-bit minifloat in a 1.3.4.−3 format in C#. It was mostly a fun little side-project, testing whether or not I understand floats. Actually, though, I find myself using it more than I'd like to admit :) (bandwidth > clock ticks) Here's my code for converting the minifloat to a 32-bit float: public static implicit operator float(quarter q) { int sign = (q.value & 0b1000_0000) << 24; int fusedExponentMantissa = (q.value & 0b0111_1111) << (23 - MANTISSA_BITS); if ((q.value & 0b0111_0000) == 0b0111_0000) // NaN/Infinity { return asfloat(sign | (255 << 23) | fusedExponentMantissa); } else // normal and subnormal { float magic = asfloat((255 - 1 + EXPONENT_BIAS) << 23); return magic * asfloat(sign | fusedExponentMantissa); } } where quarter.value is the stored byte and "asfloat" is simply *(float*)&myUInt.The "magic" number makes use of mantissa overflow in the subnormal case, which affects the f_32 exponent (integer multiplication and mask + add is slower than FPU-switch and float multiplication). I guess one could optimize away the branch, too. But here comes the problematic code - float_32 to float_8: public static explicit operator quarter(float f) { byte f8_sign = (byte)((asuint(f) & 0x8000_0000u) >> 24); uint f32_exponent = asuint(f) & 0x7F80_0000u; uint f32_mantissa = asuint(f) & 0x007F_FFFFu; if (f32_exponent < (120 << 23)) // underflow => preserve +/- 0 { return new quarter { value = f8_sign }; } else if (f32_exponent > (130 << 23)) // overflow => +/- infinity or preserve NaN { return new quarter { value = (byte)(f8_sign | PositiveInfinity.value | touint8(isnan(f))) }; } else { switch (f32_exponent) { case 120 << 23: // 2^(-7) * 1.(mantissa > 0) means the value is closer to quarter.epsilon than 0 { return new quarter { value = (byte)(f8_sign | touint8(f32_mantissa != 0)) }; } case 121 << 23: // 2^(-6) * (1 + mantissa): return +/- quarter.epsilon = 2^(-2) * (0 + 2^(-4)); if the mantissa is > 0.5 i.e. 
2^(-6) * max(mantissa, 1.75), return 2^(-2) * 2^(-3) { return new quarter { value = (byte)(f8_sign | (Epsilon.value + touint8(f32_mantissa > 0x0040_0000))) }; } case 122 << 23: { return new quarter { value = (byte)(f8_sign | 0b0000_0010u | (f32_mantissa >> 22)) }; } case 123 << 23: { return new quarter { value = (byte)(f8_sign | 0b0000_0100u | (f32_mantissa >> 21)) }; } case 124 << 23: { return new quarter { value = (byte)(f8_sign | 0b0000_1000u | (f32_mantissa >> 20)) }; } default: { const uint exponentDelta = (127 + EXPONENT_BIAS) << 23; return new quarter { value = (byte)(f8_sign | (((f32_exponent - exponentDelta) | f32_mantissa) >> 19)) }; } } } } ... where the function "asuint" is simply *(uint*)&myFloat and "touint8" is simply *(byte*)&myBoolean i.e. myBoolean ? 1 : 0. The first five cases deal with numbers that can only be represented as subnormals in a "quarter". I want to get rid of the switch at the very least. There's obviously a pattern (same as with float8_to_float32) but I haven't been able to figure out how I could unify the entire switch for days... I tried to google how hardware converts doubles to floats but that yielded no results either. My requirements are to hold on to the IEEE-754 standard, meaning: NaN, infinity preservation and clamping to infinity/zero in case of over-/underflow, aswell as rounding to epsilon when the larger type's value is closer to epsilon than 0 (first switch case aswell as the underflow limit in the first if statement). Can anyone at least push me in the right direction please?
This may not be optimal, but it uses strictly conforming C code except as noted in the first comment, so no pointer aliasing or other manipulation of the bits of a floating-point object. A thorough test program is included. #include <inttypes.h> #include <math.h> #include <stdint.h> #include <stdio.h> #include <stdlib.h> /* Notes on portability: uint8_t is an optional type. Its use here is easily replaced by unsigned char. Round-to-nearest is required in FloatToMini. Floating-point must be base two, and the constant in the Dekker-Veltkamp split is hardcoded for IEEE-754 binary64 but could be adopted to other formats. (Change the exponent in 0x1p48 to the number of bits in the significand minus five.) */ /* Convert a double to a 1-3-4 floating-point format. Round-to-nearest is required. */ static uint8_t FloatToMini(double x) { // Extract the sign bit of x, moved into its position in a mini-float. uint8_t s = !!signbit(x) << 7; x = fabs(x); /* If x is a NaN, return a quiet NaN with the copied sign. Significand bits are not preserved. */ if (x != x) return s | 0x78; /* If |x| is greater than or equal to the rounding point between the maximum finite value and infinity, return infinity with the copied sign. (0x1.fp0 is the largest representable significand, 0x1.f8 is that plus half an ULP, and the largest exponent is 3, so 0x1.f8p3 is that rounding point.) */ if (0x1.f8p3 <= x) return s | 0x70; // If x is subnormal, encode with zero exponent. if (x < 0x1p-2 - 0x1p-7) return s | (uint8_t) nearbyint(x * 0x1p6); /* Round to five significand bits using the Dekker-Veltkamp Split. (The cast eliminates the excess precision that the C standard allows.) */ double d = x * (0x1p48 + 1); x = d - (double) (d-x); /* Separate the significand and exponent. C's frexp scales the exponent so the significand is in [.5, 1), hence the e-1 below. 
*/ int e; x = frexp(x, &e) - .5; return s | (e-1+3) << 4 | (uint8_t) (x*0x1p5); } static void Show(double x) { printf("%g -> 0x%02" PRIx8 ".\n", x, FloatToMini(x)); } static void Test(double x, uint8_t expected) { uint8_t observed = FloatToMini(x); if (expected != observed) { printf("Error, %.9g (%a) produced 0x%02" PRIx8 " but expected 0x%02" PRIx8 ".\n", x, x, observed, expected); exit(EXIT_FAILURE); } } int main(void) { // Set the value of an ULP in [1, 2). static const double ULP = 0x1p-4; // Test all even significands with normal exponents. for (double s = 1; s < 2; s += 2*ULP) // Test with trailing bits less than or equal to 1/2 ULP in magnitude. for (double t = -ULP / (s == 1 ? 4 : 2); t <= +ULP/2; t += ULP/16) // Test with all normal exponents. for (int e = 1-3; e < 7-3; ++e) // Test with both signs. for (int sign = -1; sign <= +1; sign += 2) { // Prepare the expected encoding. uint8_t expected = (0 < sign ? 0 : 1) << 7 | (e+3) << 4 | (uint8_t) ((s-1) * 0x1p4); Test(sign * ldexp(s+t, e), expected); } // Test all odd significands with normal exponents. for (double s = 1 + 1*ULP; s < 2; s += 2*ULP) // Test with trailing bits less than or equal to 1/2 ULP in magnitude. for (double t = -ULP/2+ULP/16; t < +ULP/2; t += ULP/16) // Test with all normal exponents. for (int e = 1-3; e < 7-3; ++e) // Test with both signs. for (int sign = -1; sign <= +1; sign += 2) { // Prepare the expected encoding. uint8_t expected = (0 < sign ? 0 : 1) << 7 | (e+3) << 4 | (uint8_t) ((s-1) * 0x1p4); Test(sign * ldexp(s+t, e), expected); } // Set the value of an ULP in the subnormal range. static const double subULP = ULP * 0x1p-2; // Test all even significands with the subnormal exponent. for (double s = 0; s < 0x1p-2; s += 2*subULP) // Test with trailing bits less than or equal to 1/2 ULP in magnitude. for (double t = s == 0 ? 0 : -subULP/2; t <= +subULP/2; t += subULP/16) { // Test with both signs. for (int sign = -1; sign <= +1; sign += 2) { // Prepare the expected encoding. 
uint8_t expected = (0 < sign ? 0 : 1) << 7 | (uint8_t) (s/subULP); Test(sign * (s+t), expected); } } // Test all odd significands with the subnormal exponent. for (double s = 0 + 1*subULP; s < 0x1p-2; s += 2*subULP) // Test with trailing bits less than or equal to 1/2 ULP in magnitude. for (double t = -subULP/2 + subULP/16; t < +subULP/2; t += subULP/16) { // Test with both signs. for (int sign = -1; sign <= +1; sign += 2) { // Prepare the expected encoding. uint8_t expected = (0 < sign ? 0 : 1) << 7 | (uint8_t) (s/subULP); Test(sign * (s+t), expected); } } // Test at and slightly under the point of rounding to infinity. Test(+15.75, 0x70); Test(-15.75, 0xf0); Test(nexttoward(+15.75, 0), 0x6f); Test(nexttoward(-15.75, 0), 0xef); // Test infinities and NaNs. Test(+INFINITY, 0x70); Test(-INFINITY, 0xf0); Test(+NAN, 0x78); Test(-NAN, 0xf8); Show(0); Show(0x1p-6); Show(0x1p-2); Show(0x1.1p-2); Show(0x1.2p-2); Show(0x1.4p-2); Show(0x1.8p-2); Show(0x1p-1); Show(15.5); Show(15.75); Show(16); Show(NAN); Show(1./6); Show(1./3); Show(2./3); }
I hate to answer my own question... But this may still not be the optimal solution. Although #Eric Postpischil's solution uses an established algorithm, it is not very well suited for minifloats, since there are so few denormals in 4 mantissa bits. Additionally, the overhead of multiple float arithmetic operations - and because of the actual code behind frexp in particular, it only has one branch less (or two when inlined and optimized) than my original solution and is also not that great in regards to instruction level parallelism. So here's my current solution: public static explicit operator quarter(float f) { byte f8_sign = (byte)((asuint(f) >> 31) << 7); uint f32_exponent = (asuint(f) >> 23) & 0x00FFu; uint f32_mantissa = asuint(f) & 0x007F_FFFFu; if (f32_exponent < 120) // underflow => preserve +/- 0 { return new quarter { value = f8_sign }; } else if (f32_exponent > 130) // overflow => +/- infinity or preserve NaN { return new quarter { value = (byte)(f8_sign | PositiveInfinity.value | touint8(isnan(f))) }; } else { int cmp = 125 - (int)f32_exponent; int cmpIsZeroOrNegativeMask = (cmp - 1) >> 31; int denormalExponent = andnot(0b0001_0000 >> cmp, cmpIsZeroOrNegativeMask); // special case 121: sets it to quarter.Epsilon denormalExponent += touint8((f32_exponent == 121) & (f32_mantissa >= 0x0040_0000)); // case 121: 2^(-6) * (1 + mantissa): return +/- quarter.Epsilon = 2^(-2) * 2^(-4); if the mantissa is >= 0.5 return 2^(-2) * 2^(-3) denormalExponent |= touint8((f32_exponent == 120) & (f32_mantissa != 0)); // case 120: 2^(-7) * 1.(mantissa > 0) means the value is closer to quarter.epsilon than 0 int normalExponent = (cmpIsZeroOrNegativeMask & ((int)f32_exponent - (127 + EXPONENT_BIAS))) << 4; int mantissaShift = 19 + andnot(cmp, cmpIsZeroOrNegativeMask); return new quarter { value = (byte)((f8_sign | normalExponent) | (denormalExponent | (f32_mantissa >> mantissaShift))) }; } } But note that the particular andnot(int a, int b) function I use returns a & ~b 
and not ~a & b. Thanks for your help :) I'm keeping this open since, as mentioned, this may very well not be the best solution, but at least it's my own... PS: This is probably a good example of why premature optimization is bad; your code becomes much less readable. Make sure you have the functionality backed up by unit tests, and make sure you even need the optimization in the first place.
...And after some time and in the spirit of transparent progression, I want to show the final version, since I believe to have found the optimal implementation; more later. First off, here it is (the code should speak for itself, which is why it is this "much"): unsafe struct quarter { const bool IEEE_754_STANDARD = true; //standard: true const bool SIGN_BIT = IEEE_754_STANDARD || true; //standard: true const int BITS = 8 * sizeof(byte); //standard: 8 const int EXPONENT_BITS = 3 + (SIGN_BIT ? 0 : 1); //standard: 3 const int MANTISSA_BITS = BITS - EXPONENT_BITS - (SIGN_BIT ? 1 : 0); //standard: 4 const int EXPONENT_BIAS = -(((1 << BITS) - 1) >> (BITS - (EXPONENT_BITS - 1))); //standard: -3 const int MAX_EXPONENT = EXPONENT_BIAS + ((1 << EXPONENT_BITS) - 1) - (IEEE_754_STANDARD ? 1 : 0); //standard: 3 const int SIGNALING_EXPONENT = (MAX_EXPONENT - EXPONENT_BIAS + (IEEE_754_STANDARD ? 1 : 0)) << MANTISSA_BITS; //standard: 0b0111_0000 const int F32_BITS = 8 * sizeof(float); const int F32_EXPONENT_BITS = 8; const int F32_MANTISSA_BITS = 23; const int F32_EXPONENT_BIAS = -(int)(((1L << F32_BITS) - 1) >> (F32_BITS - (F32_EXPONENT_BITS - 1))); const int F32_MAX_EXPONENT = F32_EXPONENT_BIAS + ((1 << F32_EXPONENT_BITS) - 1 - 1); const int F32_SIGNALING_EXPONENT = (F32_MAX_EXPONENT - F32_EXPONENT_BIAS + 1) << F32_MANTISSA_BITS; const int F32_SHL_LOSE_SIGN = (F32_BITS - (MANTISSA_BITS + EXPONENT_BITS)); const int F32_SHR_PLACE_MANTISSA = MANTISSA_BITS + ((1 + F32_EXPONENT_BITS) - (MANTISSA_BITS + EXPONENT_BITS)); const int F32_MAGIC = (((1 << F32_EXPONENT_BITS) - 1) - (1 + EXPONENT_BITS)) << F32_MANTISSA_BITS; byte _value; static quarter Epsilon => new quarter { _value = 1 }; static quarter MaxValue => new quarter { _value = (byte)(SIGNALING_EXPONENT - 1) }; static quarter NaN => new quarter { _value = (byte)(SIGNALING_EXPONENT | 1) }; static quarter PositiveInfinity => new quarter { _value = (byte)SIGNALING_EXPONENT }; static uint asuint(float f) => *(uint*)&f; static float 
asfloat(uint u) => *(float*)&u; static byte tobyte(bool b) => *(byte*)&b; static float ToFloat(quarter q, bool promiseInRange) { uint fusedExponentMantissa = ((uint)q._value << F32_SHL_LOSE_SIGN) >> F32_SHR_PLACE_MANTISSA; uint sign = ((uint)q._value >> (BITS - 1)) << (F32_BITS - 1); if (!promiseInRange) { bool nanInf = (q._value & SIGNALING_EXPONENT) == SIGNALING_EXPONENT; uint ifNanInf = asuint(float.PositiveInfinity) & (uint)(-tobyte(nanInf)); return (nanInf ? 1f : asfloat(F32_MAGIC)) * asfloat(sign | fusedExponentMantissa | ifNanInf); } else { return asfloat(F32_MAGIC) * asfloat(sign | fusedExponentMantissa); } } static quarter ToQuarter(float f, bool promiseInRange) { float inRange = f * (1f / asfloat(F32_MAGIC)); uint q = asuint(inRange) >> (F32_MANTISSA_BITS - (1 + EXPONENT_BITS)); uint f8_sign = asuint(f) >> (F32_BITS - 1); if (!promiseInRange) { uint f32_exponent = asuint(f) & F32_SIGNALING_EXPONENT; bool overflow = f32_exponent > (uint)(-F32_EXPONENT_BIAS + MAX_EXPONENT << F32_MANTISSA_BITS); bool notNaNInf = f32_exponent != F32_SIGNALING_EXPONENT; f8_sign ^= tobyte(!notNaNInf); if (overflow & notNaNInf) { q = PositiveInfinity._value; } } f8_sign <<= (BITS - 1); return new quarter{ _value = (byte)(q ^ f8_sign) }; } } Turns out that in fact, the reverse operation of converting the mini-float to a 32 bit float by multiplying with a magic constant is also the reverse operation of a multiplication (wow...): a floating point division by that constant. Luckily "by that constant" and not the other way around; we can calculate the reciprocal at compile time and multiply by it instead. This only fails, as with the reverse operation, when converting to- and from 'INF' and 'NaN'. Absolute overflow with any biased 32 exponent with exponent % (MAX_EXPONENT + 1) != 0 is not translated into 'INF' and positive 'INF' is translated into negative 'INF'. 
Although this enables some optimizations through the bool parameter, this mostly just reduces code size and, more importantly (especially for SIMD versions, where small data types really shine), reduces the need for constants. Speaking of SIMD: this scalar version can be optimized a little by using SSE/SSE2 intrinsics. The (disabled) optimizations (would) run completely in parallel to the floating-point multiplication followed by a shift, taking a total of 5 to 6+ clock cycles (very CPU dependent), which is astonishingly close to native hardware instructions (~4 to 5 clock cycles).
How to parse signed zero?
Is it possible to parse signed zero? I tried several approaches, but none of them gives the proper result:
float test1 = Convert.ToSingle("-0.0");
float test2 = float.Parse("-0.0");
float test3;
float.TryParse("-0.0", out test3);
If I initialize the value directly, it is just fine:
float test4 = -0.0f;
So the problem seems to be in the parsing procedures of C#. I hope somebody can tell me if there is some option or workaround for this. The difference can only be seen by converting to binary:
var bin = BitConverter.GetBytes(test4);
I think there is no way to force float.Parse (or Convert.ToSingle) to respect negative zero. It just works like this (it ignores the sign in this case), so you have to check for it yourself, for example:
string target = "-0.0";
float result = float.Parse(target, CultureInfo.InvariantCulture);
if (result == 0f && target.TrimStart().StartsWith("-"))
    result = -0f;
If we look at the source code for coreclr, we'll see (skipping all irrelevant parts):
private static bool NumberBufferToDouble(ref NumberBuffer number, ref double value)
{
    double d = NumberToDouble(ref number);
    uint e = DoubleHelper.Exponent(d);
    ulong m = DoubleHelper.Mantissa(d);
    if (e == 0x7FF)
    {
        return false;
    }
    if (e == 0 && m == 0)
    {
        d = 0; // < relevant part
    }
    value = d;
    return true;
}
As you can see, if the mantissa and exponent are both zero, the value is explicitly assigned 0, so there is no way you can change that. The full .NET Framework implementation has NumberBufferToDouble as an InternalCall (implemented in C/C++), but I assume it does something similar.
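Once you do have a negative zero in a float, you can detect it by inspecting the sign bit, since == treats the two zeros as equal. A small sketch (the byte index assumes a little-endian platform):

```csharp
using System;

class Program
{
    static void Main()
    {
        float negZero = -0.0f;
        float posZero = 0.0f;

        // The two zeros compare equal, so == cannot distinguish them...
        Console.WriteLine(negZero == posZero); // True

        // ...but the raw bits differ: only negative zero has the sign bit set.
        // On little-endian hardware the sign bit lives in the last byte.
        Console.WriteLine(BitConverter.GetBytes(negZero)[3]); // 128
        Console.WriteLine(BitConverter.GetBytes(posZero)[3]); // 0

        // Dividing by the zero also reveals its sign:
        // 1f / -0f is negative infinity, 1f / +0f is positive infinity.
        Console.WriteLine(float.IsNegativeInfinity(1f / negZero)); // True
    }
}
```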
Updated Results Summary
Mode : Release
Test Framework : .NET Framework 4.7.1
Benchmark runs : 100 times (averaged/scale)

Tests limited to 10 digits
Name            | Time      | Range    | StdDev | Cycles      | Pass
-----------------------------------------------------------------------
Mine Unchecked  |  9.645 ms | 0.259 ms |   0.30 |  32,815,064 | Yes
Mine Unchecked2 | 10.863 ms | 1.337 ms |   0.35 |  36,959,457 | Yes
Mine Safe       | 11.908 ms | 0.993 ms |   0.53 |  40,541,885 | Yes
float.Parse     | 26.973 ms | 0.525 ms |   1.40 |  91,755,742 | Yes
Evk             | 31.513 ms | 1.515 ms |   7.96 | 103,288,681 | Base

Tests limited to 38 digits
Name            | Time      | Range    | StdDev | Cycles      | Pass
-----------------------------------------------------------------------
Mine Unchecked  | 17.694 ms | 0.276 ms |   0.50 |  60,178,511 | No
Mine Unchecked2 | 23.980 ms | 0.417 ms |   0.34 |  81,641,998 | Yes
Mine Safe       | 25.078 ms | 0.124 ms |   0.63 |  85,306,389 | Yes
float.Parse     | 36.985 ms | 0.052 ms |   1.60 | 125,929,286 | Yes
Evk             | 39.159 ms | 0.406 ms |   3.26 | 133,043,100 | Base

Tests limited to 98 digits (way over the range of a float)
Name            | Time      | Range    | StdDev | Cycles      | Pass
-----------------------------------------------------------------------
Mine Unchecked2 | 46.780 ms | 0.580 ms |   0.57 | 159,272,055 | Yes
Mine Safe       | 48.048 ms | 0.566 ms |   0.63 | 163,601,133 | Yes
Mine Unchecked  | 48.528 ms | 1.056 ms |   0.58 | 165,238,857 | No
float.Parse     | 55.935 ms | 1.461 ms |   0.95 | 190,456,039 | Yes
Evk             | 56.636 ms | 0.429 ms |   1.75 | 192,531,045 | Base

Verifiably, Mine Unchecked is good for smaller numbers; however, when using powers at the end of the calculation to handle fractional numbers, it doesn't work for larger digit combinations. Also, because it only deals in powers of 10, it uses just a big switch statement, which makes it marginally faster.

Background
OK, because of the various comments I got and the work I put into this, I thought I'd rewrite this post with the most accurate benchmarks I could get.
And all the logic behind them.

When this question first came up, I had already written my own benchmark framework, and I often like writing a quick parser for these things using unsafe code; 9 times out of 10 I can get this stuff faster than the framework equivalent.

At first this was easy: just write simple logic to parse numbers with decimal places, and I did pretty well. However, the initial results weren't as accurate as they could have been, because my test data was just using the 'f' format specifier, which turns larger-precision numbers into short formats with only 0's. In the end I just couldn't write a reliable parser to deal with exponent notation, i.e. 1.2324234233E+23. The only way I could get the maths to work was using BigInteger and lots of hacks to force the right precision into a floating-point value. This turned out to be super slow. I even went to the float IEEE spec and tried to do the maths to construct the value in bits; this wasn't that hard, however the formula has loops in it and was complicated to get right. In the end I had to give up on exponent notation.

So this is what I ended up with: my testing framework runs on input data of a list of 10000 floats as strings, which is shared across the tests and generated for each test run. A test run just goes through each test (remembering it's the same data for each test), adds up the results and then averages them. This is about as good as it can get. I can increase the runs to 1000 or factors more, however the results don't really change. In this case, because we are testing a method that takes basically one variable (a string representation of a float), there is no point scaling this as it's not set based; however, I can tweak the input to cater for different lengths of floats, i.e. strings that are 10, 20, right up to 98 digits, remembering a float only goes up to 38 anyway.
To check the results I used the following, I have previously written a test unit that covers every float conceivable, and they work, except for a variation where I use Powers to calculate the decimal part of the number. Note, my framework only tests 1 result set, and its not part of the framework private bool Action(List<float> floats, List<float> list) { if (floats.Count != list.Count) return false; // sanity check for (int i = 0; i < list.Count; i++) { // nan is a special case as there is more than one possible bit value // for it if ( floats[i] != list[i] && !float.IsNaN(floats[i]) && !float.IsNaN(list[i])) return false; } return true; } In this case im testing again 3 types of input as shown below Setup // numberDecimalDigits specifies how long the output will be private static NumberFormatInfo GetNumberFormatInfo(int numberDecimalDigits) { return new NumberFormatInfo { NumberDecimalSeparator = ".", NumberDecimalDigits = numberDecimalDigits }; } // generate a random float by create an int, and converting it to float in pointers private static unsafe string GetRadomFloatString(IFormatProvider formatInfo) { var val = Rand.Next(0, int.MaxValue); if (Rand.Next(0, 2) == 1) val *= -1; var f = *(float*)&val; return f.ToString("f", formatInfo); } Test Data 1 // limits the out put to 10 characters // also because of that it has to check for trunced vales and // regenerates them public static List<string> GenerateInput10(int scale) { var result = new List<string>(scale); while (result.Count < scale) { var val = GetRadomFloatString(GetNumberFormatInfo(10)); if (val != "0.0000000000") result.Add(val); } result.Insert(0, (-0f).ToString("f", CultureInfo.InvariantCulture)); result.Insert(0, "-0"); result.Insert(0, "0.00"); result.Insert(0, float.NegativeInfinity.ToString("f", CultureInfo.InvariantCulture)); result.Insert(0, float.PositiveInfinity.ToString("f", CultureInfo.InvariantCulture)); return result; } Test Data 2 // basically that max value for a float public static 
List<string> GenerateInput38(int scale) { var result = Enumerable.Range(1, scale) .Select(x => GetRadomFloatString(GetNumberFormatInfo(38))) .ToList(); result.Insert(0, (-0f).ToString("f", CultureInfo.InvariantCulture)); result.Insert(0, "-0"); result.Insert(0, float.NegativeInfinity.ToString("f", CultureInfo.InvariantCulture)); result.Insert(0, float.PositiveInfinity.ToString("f", CultureInfo.InvariantCulture)); return result; } Test Data 3 // Lets take this to the limit public static List<string> GenerateInput98(int scale) { var result = Enumerable.Range(1, scale) .Select(x => GetRadomFloatString(GetNumberFormatInfo(98))) .ToList(); result.Insert(0, (-0f).ToString("f", CultureInfo.InvariantCulture)); result.Insert(0, "-0"); result.Insert(0, float.NegativeInfinity.ToString("f", CultureInfo.InvariantCulture)); result.Insert(0, float.PositiveInfinity.ToString("f", CultureInfo.InvariantCulture)); return result; } These are the tests I used Evk private float ParseMyFloat(string value) { var result = float.Parse(value, CultureInfo.InvariantCulture); if (result == 0f && value.TrimStart() .StartsWith("-")) { result = -0f; } return result; } Mine safe I call it safe as it tries to check for invalid strings [MethodImpl(MethodImplOptions.AggressiveInlining)] private unsafe float ParseMyFloat(string value) { double result = 0, dec = 0; if (value[0] == 'N' && value == "NaN") return float.NaN; if (value[0] == 'I' && value == "Infinity")return float.PositiveInfinity; if (value[0] == '-' && value[1] == 'I' && value == "-Infinity")return float.NegativeInfinity; fixed (char* ptr = value) { char* l, e; char* start = ptr, length = ptr + value.Length; if (*ptr == '-') start++; for (l = start; *l >= '0' && *l <= '9' && l < length; l++) result = result * 10 + *l - 48; if (*l == '.') { char* r; for (r = length - 1; r > l && *r >= '0' && *r <= '9'; r--) dec = (dec + (*r - 48)) / 10; if (l != r) throw new FormatException($"Invalid float : {value}"); } else if (l != length) throw new 
FormatException($"Invalid float : {value}"); result += dec; return *ptr == '-' ? (float)result * -1 : (float)result; } } Unchecked This fails for larger strings, but is ok for smaller ones [MethodImpl(MethodImplOptions.AggressiveInlining)] private unsafe float ParseMyFloat(string value) { if (value[0] == 'N' && value == "NaN") return float.NaN; if (value[0] == 'I' && value == "Infinity") return float.PositiveInfinity; if (value[0] == '-' && value[1] == 'I' && value == "-Infinity") return float.NegativeInfinity; fixed (char* ptr = value) { var point = 0; double result = 0, dec = 0; char* c, start = ptr, length = ptr + value.Length; if (*ptr == '-') start++; for (c = start; c < length && *c != '.'; c++) result = result * 10 + *c - 48; if (*c == '.') { point = (int)(length - 1 - c); for (c++; c < length; c++) dec = dec * 10 + *c - 48; } // MyPow is just a massive switch statement if (dec > 0) result += dec / MyPow(point); return *ptr == '-' ? (float)result * -1 : (float)result; } } Unchecked 2 [MethodImpl(MethodImplOptions.AggressiveInlining)] private unsafe float ParseMyFloat(string value) { if (value[0] == 'N' && value == "NaN") return float.NaN; if (value[0] == 'I' && value == "Infinity") return float.PositiveInfinity; if (value[0] == '-' && value[1] == 'I' && value == "-Infinity") return float.NegativeInfinity; fixed (char* ptr = value) { double result = 0, dec = 0; char* c, start = ptr, length = ptr + value.Length; if (*ptr == '-') start++; for (c = start; c < length && *c != '.'; c++) result = result * 10 + *c - 48; // this division seems unsafe for a double, // however i have tested it with every float and it works if (*c == '.') for (var d = length - 1; d > c; d--) dec = (dec + (*d - 48)) / 10; result += dec; return *ptr == '-' ? (float)result * -1 : (float)result; } } Float.parse float.Parse(t, CultureInfo.InvariantCulture) Original Answer Assuming you don't need a TryParse method, i managed to use pointers and custom parsing to achieve what i think you want. 
The benchmark uses a list of 1,000,000 random floats and runs each version 100 times, all versions use the same data Test Framework : .NET Framework 4.7.1 Scale : 1000000 Name | Time | Delta | Deviation | Cycles ---------------------------------------------------------------------- Mine Unchecked2 | 45.585 ms | 1.283 ms | 1.70 | 155,051,452 Mine Unchecked | 46.388 ms | 1.812 ms | 1.17 | 157,751,710 Mine Safe | 46.694 ms | 2.651 ms | 1.07 | 158,697,413 float.Parse | 173.229 ms | 4.795 ms | 5.41 | 589,297,449 Evk | 287.931 ms | 7.447 ms | 11.96 | 979,598,364 Chopped for brevity Note, Both these version cant deal with extended format, NaN, +Infinity, or -Infinity. However, it wouldn't be hard to implement at little overhead. I have checked this pretty well, though i must admit i havent written any unit tests, so use at your own risk. Disclaimer, I think Evk's StartsWith version could probably be more optimized, however it will still be (at best) slightly slower than float.Parse
You can try this:
string target = "-0.0";
decimal result = decimal.Parse(target,
    System.Globalization.NumberStyles.AllowParentheses |
    System.Globalization.NumberStyles.AllowLeadingWhite |
    System.Globalization.NumberStyles.AllowTrailingWhite |
    System.Globalization.NumberStyles.AllowThousands |
    System.Globalization.NumberStyles.AllowDecimalPoint |
    System.Globalization.NumberStyles.AllowLeadingSign);
Checking if Bytes are 0x00
What is the most readable (and idiomatic) way to write this method?
private bool BytesAreValid(byte[] bytes)
{
    var t = (bytes[0] | bytes[1] | bytes[2]);
    return t != 0;
}
I need a function which tests the first three bytes of a file to verify that it doesn't begin with 00 00 00. I haven't done much byte manipulation. The code above doesn't seem correct to me, since t is inferred to be of type Int32.
t is type-inferred to be an Int32
Yup, because the | operator (like most operators) isn't defined for byte; the bytes are promoted to int values. (See section 7.11.1 of the C# 4 spec for details.) But given that you only want to compare the result with 0, that's fine anyway. Personally, I'd just write the equivalent check as:
return bytes[0] != 0 || bytes[1] != 0 || bytes[2] != 0;
Or even:
return (bytes[0] != 0) || (bytes[1] != 0) || (bytes[2] != 0);
Both of these seem clearer to me.
private bool BytesAreValid(byte[] bytes)
{
    return !bytes.Take(3).SequenceEqual(new byte[] { 0, 0, 0 });
}
To anticipate variable array lengths and avoid null reference exceptions:
private bool BytesAreValid(byte[] bytes)
{
    if (bytes == null) return false;
    return !Array.Exists(bytes, x => x == 0);
}
Non-LINQ version:
private bool BytesAreValid(byte[] bytes)
{
    if (bytes == null) return false;
    for (int i = 0; i < bytes.Length; i++)
    {
        if (bytes[i] == 0) return false;
    }
    return true;
}
Convert bool[] to byte[]
I have a List<bool> which I want to convert to a byte[]. How do I do this? list.ToArray() creates a bool[].
Here are two approaches, depending on whether you want to pack the bits into bytes, or have as many bytes as original bits:
bool[] bools = { true, false, true, false, false, true, false, true, true };

// basic - same count
byte[] arr1 = Array.ConvertAll(bools, b => b ? (byte)1 : (byte)0);

// pack (in this case, using the first bool as the lsb - if you want
// the first bool as the msb, reverse things ;-p)
int bytes = bools.Length / 8;
if ((bools.Length % 8) != 0) bytes++;
byte[] arr2 = new byte[bytes];
int bitIndex = 0, byteIndex = 0;
for (int i = 0; i < bools.Length; i++)
{
    if (bools[i])
    {
        arr2[byteIndex] |= (byte)(((byte)1) << bitIndex);
    }
    bitIndex++;
    if (bitIndex == 8)
    {
        bitIndex = 0;
        byteIndex++;
    }
}
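As an aside, the BCL can already do the packing for you via System.Collections.BitArray, which also stores the first bool in the least significant bit. A short sketch:

```csharp
using System;
using System.Collections;

class Program
{
    static void Main()
    {
        bool[] bools = { true, false, true, false, false, true, false, true, true };

        // BitArray packs bools lsb-first; CopyTo supports byte[] targets
        // and fills them with the packed bits.
        var bits = new BitArray(bools);
        byte[] packed = new byte[(bools.Length + 7) / 8];
        bits.CopyTo(packed, 0);

        // bits 0, 2, 5, 7 of the first byte are set -> 0xA5; bit 0 of the second -> 0x01
        Console.WriteLine(BitConverter.ToString(packed)); // A5-01
    }
}
```

This trades a little control (no msb-first option) for not having to maintain the bit-twiddling yourself.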
Marc's answer is good already, but... Assuming you are the kind of person that is comfortable doing bit-twiddling, or just want to write less code and squeeze out some more performance, then this here code is for you good sir / madame:
byte[] PackBoolsInByteArray(bool[] bools)
{
    int len = bools.Length;
    int bytes = len >> 3;
    if ((len & 0x07) != 0) ++bytes;
    byte[] arr2 = new byte[bytes];
    for (int i = 0; i < bools.Length; i++)
    {
        if (bools[i])
            arr2[i >> 3] |= (byte)(1 << (i & 0x07));
    }
    return arr2;
}
It does the exact same thing as Marc's code, it's just more succinct. Of course if we really want to go all out we could unroll it too... ...and while we are at it let's throw in a curve ball on the return type!
IEnumerable<byte> PackBoolsInByteEnumerable(bool[] bools)
{
    int len = bools.Length;
    int rem = len & 0x07; // hint: rem = len % 8.
    /*
    byte[] byteArr = rem == 0       // length is a multiple of 8? (no remainder?)
        ? new byte[len >> 3]        // -yes-
        : new byte[(len >> 3) + 1]; // -no-
    */
    const byte BZ = 0,
        B0 = 1 << 0, B1 = 1 << 1, B2 = 1 << 2, B3 = 1 << 3,
        B4 = 1 << 4, B5 = 1 << 5, B6 = 1 << 6, B7 = 1 << 7;
    byte b;
    int i = 0;
    for (int mul = len & ~0x07; i < mul; i += 8) // hint: len = mul + rem.
    {
        b = bools[i] ? B0 : BZ;
        if (bools[i + 1]) b |= B1;
        if (bools[i + 2]) b |= B2;
        if (bools[i + 3]) b |= B3;
        if (bools[i + 4]) b |= B4;
        if (bools[i + 5]) b |= B5;
        if (bools[i + 6]) b |= B6;
        if (bools[i + 7]) b |= B7;
        //byteArr[i >> 3] = b;
        yield return b;
    }
    if (rem != 0) // take care of the remainder...
    {
        b = bools[i] ? B0 : BZ; // (there is at least one more bool.)
        switch (rem) // rem is [1:7] (fall-through switch!)
        {
            case 7: if (bools[i + 6]) b |= B6; goto case 6;
            case 6: if (bools[i + 5]) b |= B5; goto case 5;
            case 5: if (bools[i + 4]) b |= B4; goto case 4;
            case 4: if (bools[i + 3]) b |= B3; goto case 3;
            case 3: if (bools[i + 2]) b |= B2; goto case 2;
            case 2: if (bools[i + 1]) b |= B1; break;
            // case 1 is the statement above the switch!
        }
        //byteArr[i >> 3] = b; // write the last byte to the array.
        yield return b; // yield the last byte.
    }
    //return byteArr;
}
Tip: As you can see, I included the code for returning a byte[] as comments. Simply comment out the two yield statements instead if that is what you want/need.
Twiddling hints:
Shifting x >> 3 is a cheaper x / 8.
Masking x & 0x07 is a cheaper x % 8.
Masking x & ~0x07 is a cheaper x - x % 8.
Edit: Here is some example documentation:
/// <summary>
/// Bit-packs an array of booleans into bytes, one bit per boolean.
/// </summary><remarks>
/// Booleans are bit-packed into bytes, in order, from least significant
/// bit to most significant bit of each byte.<br/>
/// If the length of the input array isn't a multiple of eight, then one
/// or more of the most significant bits in the last byte returned will
/// be unused. Unused bits are zero / unset.
/// </remarks>
/// <param name="bools">An array of booleans to pack into bytes.</param>
/// <returns>
/// An IEnumerable<byte> of bytes each containing (up to) eight
/// bit-packed booleans.
/// </returns>
You can use LINQ. This won't be efficient, but will be simple. I'm assuming that you want one byte per bool. bool[] a = new bool[] { true, false, true, true, false, true }; byte[] b = (from x in a select x ? (byte)0x1 : (byte)0x0).ToArray();
Or the IEnumerable counterpart to AnorZaken's answer: static IEnumerable<byte> PackBools(IEnumerable<bool> bools) { int bitIndex = 0; byte currentByte = 0; foreach (bool val in bools) { if (val) currentByte |= (byte)(1 << bitIndex); if (++bitIndex == 8) { yield return currentByte; bitIndex = 0; currentByte = 0; } } if (bitIndex != 0) { yield return currentByte; } } And the corresponding unpacking, where paddingEnd means the number of bits to discard from the last byte when unpacking: static IEnumerable<bool> UnpackBools(IEnumerable<byte> bytes, int paddingEnd = 0) { using (var enumerator = bytes.GetEnumerator()) { bool last = !enumerator.MoveNext(); while (!last) { byte current = enumerator.Current; last = !enumerator.MoveNext(); for (int i = 0; i < 8 - (last ? paddingEnd : 0); i++) { yield return (current & (1 << i)) != 0; } } } }
If you have any control over the type of list, try to make it a List<byte>, which will then produce the byte[] on ToArray(). If you have an ArrayList, you can use: (byte[])list.ToArray(typeof(byte)); To get the List<byte>, you could create one with your unspecified list iterator as an input to the constructor, and then produce the ToArray()? Or copy each item, casting to a new byte from bool? Some info on what type of list it is might help.
Have a look at the BitConverter class. Depending on the exact nature of your requirement, it may solve your problem quite neatly.
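For instance, a minimal sketch (note that BitConverter.GetBytes(bool) returns a one-byte array, so this gives one byte per bool rather than bit-packing):

```csharp
// Sketch: one byte per bool via BitConverter (not bit-packed).
bool[] bools = { true, false, true };
byte[] bytes = new byte[bools.Length];
for (int i = 0; i < bools.Length; i++)
    bytes[i] = BitConverter.GetBytes(bools[i])[0]; // true -> 1, false -> 0
```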
Another LINQ approach, less efficient than @hfcs101's, but it would easily work for other value types as well: var a = new [] { true, false, true, true, false, true }; byte[] b = a.Select(BitConverter.GetBytes).SelectMany(x => x).ToArray();
Comparing two byte arrays in .NET
Span<T> offers an extremely competitive alternative without having to throw confusing and/or non-portable fluff into your own application's code base: // byte[] is implicitly convertible to ReadOnlySpan<byte> static bool ByteArrayCompare(ReadOnlySpan<byte> a1, ReadOnlySpan<byte> a2) { return a1.SequenceEqual(a2); } The (guts of the) implementation as of .NET 6.0.4 can be found here. I've revised #EliArbel's gist to add this method as SpansEqual, drop most of the less interesting performers in others' benchmarks, run it with different array sizes, output graphs, and mark SpansEqual as the baseline so that it reports how the different methods compare to SpansEqual. The below numbers are from the results, lightly edited to remove "Error" column. | Method | ByteCount | Mean | StdDev | Ratio | RatioSD | |-------------- |----------- |-------------------:|----------------:|------:|--------:| | SpansEqual | 15 | 2.074 ns | 0.0233 ns | 1.00 | 0.00 | | LongPointers | 15 | 2.854 ns | 0.0632 ns | 1.38 | 0.03 | | Unrolled | 15 | 12.449 ns | 0.2487 ns | 6.00 | 0.13 | | PInvokeMemcmp | 15 | 7.525 ns | 0.1057 ns | 3.63 | 0.06 | | | | | | | | | SpansEqual | 1026 | 15.629 ns | 0.1712 ns | 1.00 | 0.00 | | LongPointers | 1026 | 46.487 ns | 0.2938 ns | 2.98 | 0.04 | | Unrolled | 1026 | 23.786 ns | 0.1044 ns | 1.52 | 0.02 | | PInvokeMemcmp | 1026 | 28.299 ns | 0.2781 ns | 1.81 | 0.03 | | | | | | | | | SpansEqual | 1048585 | 17,920.329 ns | 153.0750 ns | 1.00 | 0.00 | | LongPointers | 1048585 | 42,077.448 ns | 309.9067 ns | 2.35 | 0.02 | | Unrolled | 1048585 | 29,084.901 ns | 428.8496 ns | 1.62 | 0.03 | | PInvokeMemcmp | 1048585 | 30,847.572 ns | 213.3162 ns | 1.72 | 0.02 | | | | | | | | | SpansEqual | 2147483591 | 124,752,376.667 ns | 552,281.0202 ns | 1.00 | 0.00 | | LongPointers | 2147483591 | 139,477,269.231 ns | 331,458.5429 ns | 1.12 | 0.00 | | Unrolled | 2147483591 | 137,617,423.077 ns | 238,349.5093 ns | 1.10 | 0.00 | | PInvokeMemcmp | 2147483591 | 138,373,253.846 ns | 
288,447.8278 ns | 1.11 | 0.01 | I was surprised to see SpansEqual not come out on top for the max-array-size methods, but the difference is so minor that I don't think it'll ever matter. After refreshing to run on .NET 6.0.4 with my newer hardware, SpansEqual now comfortably outperforms all others at all array sizes. My system info: BenchmarkDotNet=v0.13.1, OS=Windows 10.0.22000 AMD Ryzen 9 5900X, 1 CPU, 24 logical and 12 physical cores .NET SDK=6.0.202 [Host] : .NET 6.0.4 (6.0.422.16404), X64 RyuJIT DefaultJob : .NET 6.0.4 (6.0.422.16404), X64 RyuJIT
There's a new built-in solution for this in .NET 4 - IStructuralEquatable static bool ByteArrayCompare(byte[] a1, byte[] a2) { return StructuralComparisons.StructuralEqualityComparer.Equals(a1, a2); }
Edit: the modern fast way is to use a1.SequenceEqual(a2) User gil suggested unsafe code which spawned this solution: // Copyright (c) 2008-2013 Hafthor Stefansson // Distributed under the MIT/X11 software license // Ref: http://www.opensource.org/licenses/mit-license.php. static unsafe bool UnsafeCompare(byte[] a1, byte[] a2) { unchecked { if(a1==a2) return true; if(a1==null || a2==null || a1.Length!=a2.Length) return false; fixed (byte* p1=a1, p2=a2) { byte* x1=p1, x2=p2; int l = a1.Length; for (int i=0; i < l/8; i++, x1+=8, x2+=8) if (*((long*)x1) != *((long*)x2)) return false; if ((l & 4)!=0) { if (*((int*)x1)!=*((int*)x2)) return false; x1+=4; x2+=4; } if ((l & 2)!=0) { if (*((short*)x1)!=*((short*)x2)) return false; x1+=2; x2+=2; } if ((l & 1)!=0) if (*((byte*)x1) != *((byte*)x2)) return false; return true; } } } which does 64-bit based comparison for as much of the array as possible. This kind of counts on the fact that the arrays start qword aligned. It'll work if not qword aligned, just not as fast as if it were. It performs about seven times faster than the simple `for` loop. Using the J# library performed equivalently to the original `for` loop. Using .SequenceEqual runs around seven times slower; I think just because it is using IEnumerator.MoveNext. I imagine LINQ-based solutions being at least that slow or worse.
If you are not opposed to doing it, you can import the J# assembly "vjslib.dll" and use its Arrays.equals(byte[], byte[]) method... Don't blame me if someone laughs at you though... EDIT: For what little it is worth, I used Reflector to disassemble the code for that, and here is what it looks like: public static bool equals(sbyte[] a1, sbyte[] a2) { if (a1 == a2) { return true; } if ((a1 != null) && (a2 != null)) { if (a1.Length != a2.Length) { return false; } for (int i = 0; i < a1.Length; i++) { if (a1[i] != a2[i]) { return false; } } return true; } return false; }
.NET 3.5 and newer have a new public type, System.Data.Linq.Binary that encapsulates byte[]. It implements IEquatable<Binary> that (in effect) compares two byte arrays. Note that System.Data.Linq.Binary also has implicit conversion operator from byte[]. MSDN documentation:System.Data.Linq.Binary Reflector decompile of the Equals method: private bool EqualsTo(Binary binary) { if (this != binary) { if (binary == null) { return false; } if (this.bytes.Length != binary.bytes.Length) { return false; } if (this.hashCode != binary.hashCode) { return false; } int index = 0; int length = this.bytes.Length; while (index < length) { if (this.bytes[index] != binary.bytes[index]) { return false; } index++; } } return true; } Interesting twist is that they only proceed to byte-by-byte comparison loop if hashes of the two Binary objects are the same. This, however, comes at the cost of computing the hash in constructor of Binary objects (by traversing the array with for loop :-) ). The above implementation means that in the worst case you may have to traverse the arrays three times: first to compute hash of array1, then to compute hash of array2 and finally (because this is the worst case scenario, lengths and hashes equal) to compare bytes in array1 with bytes in array 2. Overall, even though System.Data.Linq.Binary is built into BCL, I don't think it is the fastest way to compare two byte arrays :-|.
I posted a similar question about checking if byte[] is full of zeroes. (SIMD code was beaten so I removed it from this answer.) Here is fastest code from my comparisons: static unsafe bool EqualBytesLongUnrolled (byte[] data1, byte[] data2) { if (data1 == data2) return true; if (data1.Length != data2.Length) return false; fixed (byte* bytes1 = data1, bytes2 = data2) { int len = data1.Length; int rem = len % (sizeof(long) * 16); long* b1 = (long*)bytes1; long* b2 = (long*)bytes2; long* e1 = (long*)(bytes1 + len - rem); while (b1 < e1) { if (*(b1) != *(b2) || *(b1 + 1) != *(b2 + 1) || *(b1 + 2) != *(b2 + 2) || *(b1 + 3) != *(b2 + 3) || *(b1 + 4) != *(b2 + 4) || *(b1 + 5) != *(b2 + 5) || *(b1 + 6) != *(b2 + 6) || *(b1 + 7) != *(b2 + 7) || *(b1 + 8) != *(b2 + 8) || *(b1 + 9) != *(b2 + 9) || *(b1 + 10) != *(b2 + 10) || *(b1 + 11) != *(b2 + 11) || *(b1 + 12) != *(b2 + 12) || *(b1 + 13) != *(b2 + 13) || *(b1 + 14) != *(b2 + 14) || *(b1 + 15) != *(b2 + 15)) return false; b1 += 16; b2 += 16; } for (int i = 0; i < rem; i++) if (data1 [len - 1 - i] != data2 [len - 1 - i]) return false; return true; } } Measured on two 256MB byte arrays: UnsafeCompare : 86,8784 ms EqualBytesSimd : 71,5125 ms EqualBytesSimdUnrolled : 73,1917 ms EqualBytesLongUnrolled : 39,8623 ms
using System.Linq; //SequenceEqual byte[] ByteArray1 = null; byte[] ByteArray2 = null; ByteArray1 = MyFunct1(); ByteArray2 = MyFunct2(); if (ByteArray1.SequenceEqual<byte>(ByteArray2) == true) { MessageBox.Show("Match"); } else { MessageBox.Show("Don't match"); }
Let's add one more! Recently Microsoft released a special NuGet package, System.Runtime.CompilerServices.Unsafe. It's special because it's written in IL, and provides low-level functionality not directly available in C#. One of its methods, Unsafe.As<T>(object) allows casting any reference type to another reference type, skipping any safety checks. This is usually a very bad idea, but if both types have the same structure, it can work. So we can use this to cast a byte[] to a long[]: bool CompareWithUnsafeLibrary(byte[] a1, byte[] a2) { if (a1.Length != a2.Length) return false; var longSize = (int)Math.Floor(a1.Length / 8.0); var long1 = Unsafe.As<long[]>(a1); var long2 = Unsafe.As<long[]>(a2); for (var i = 0; i < longSize; i++) { if (long1[i] != long2[i]) return false; } for (var i = longSize * 8; i < a1.Length; i++) { if (a1[i] != a2[i]) return false; } return true; } Note that long1.Length would still return the original array's length, since it's stored in a field in the array's memory structure. This method is not quite as fast as other methods demonstrated here, but it is a lot faster than the naive method, doesn't use unsafe code or P/Invoke or pinning, and the implementation is quite straightforward (IMO). 
Here are some BenchmarkDotNet results from my machine: BenchmarkDotNet=v0.10.3.0, OS=Microsoft Windows NT 6.2.9200.0 Processor=Intel(R) Core(TM) i7-4870HQ CPU 2.50GHz, ProcessorCount=8 Frequency=2435775 Hz, Resolution=410.5470 ns, Timer=TSC [Host] : Clr 4.0.30319.42000, 64bit RyuJIT-v4.6.1637.0 DefaultJob : Clr 4.0.30319.42000, 64bit RyuJIT-v4.6.1637.0 Method | Mean | StdDev | ----------------------- |-------------- |---------- | UnsafeLibrary | 125.8229 ns | 0.3588 ns | UnsafeCompare | 89.9036 ns | 0.8243 ns | JSharpEquals | 1,432.1717 ns | 1.3161 ns | EqualBytesLongUnrolled | 43.7863 ns | 0.8923 ns | NewMemCmp | 65.4108 ns | 0.2202 ns | ArraysEqual | 910.8372 ns | 2.6082 ns | PInvokeMemcmp | 52.7201 ns | 0.1105 ns | I've also created a gist with all the tests.
I developed a method that slightly beats memcmp() (plinth's answer) and very slighly beats EqualBytesLongUnrolled() (Arek Bulski's answer) on my PC. Basically, it unrolls the loop by 4 instead of 8. Update 30 Mar. 2019: Starting in .NET core 3.0, we have SIMD support! This solution is fastest by a considerable margin on my PC: #if NETCOREAPP3_0 using System.Runtime.Intrinsics.X86; #endif … public static unsafe bool Compare(byte[] arr0, byte[] arr1) { if (arr0 == arr1) { return true; } if (arr0 == null || arr1 == null) { return false; } if (arr0.Length != arr1.Length) { return false; } if (arr0.Length == 0) { return true; } fixed (byte* b0 = arr0, b1 = arr1) { #if NETCOREAPP3_0 if (Avx2.IsSupported) { return Compare256(b0, b1, arr0.Length); } else if (Sse2.IsSupported) { return Compare128(b0, b1, arr0.Length); } else #endif { return Compare64(b0, b1, arr0.Length); } } } #if NETCOREAPP3_0 public static unsafe bool Compare256(byte* b0, byte* b1, int length) { byte* lastAddr = b0 + length; byte* lastAddrMinus128 = lastAddr - 128; const int mask = -1; while (b0 < lastAddrMinus128) // unroll the loop so that we are comparing 128 bytes at a time. { if (Avx2.MoveMask(Avx2.CompareEqual(Avx.LoadVector256(b0), Avx.LoadVector256(b1))) != mask) { return false; } if (Avx2.MoveMask(Avx2.CompareEqual(Avx.LoadVector256(b0 + 32), Avx.LoadVector256(b1 + 32))) != mask) { return false; } if (Avx2.MoveMask(Avx2.CompareEqual(Avx.LoadVector256(b0 + 64), Avx.LoadVector256(b1 + 64))) != mask) { return false; } if (Avx2.MoveMask(Avx2.CompareEqual(Avx.LoadVector256(b0 + 96), Avx.LoadVector256(b1 + 96))) != mask) { return false; } b0 += 128; b1 += 128; } while (b0 < lastAddr) { if (*b0 != *b1) return false; b0++; b1++; } return true; } public static unsafe bool Compare128(byte* b0, byte* b1, int length) { byte* lastAddr = b0 + length; byte* lastAddrMinus64 = lastAddr - 64; const int mask = 0xFFFF; while (b0 < lastAddrMinus64) // unroll the loop so that we are comparing 64 bytes at a time. 
{ if (Sse2.MoveMask(Sse2.CompareEqual(Sse2.LoadVector128(b0), Sse2.LoadVector128(b1))) != mask) { return false; } if (Sse2.MoveMask(Sse2.CompareEqual(Sse2.LoadVector128(b0 + 16), Sse2.LoadVector128(b1 + 16))) != mask) { return false; } if (Sse2.MoveMask(Sse2.CompareEqual(Sse2.LoadVector128(b0 + 32), Sse2.LoadVector128(b1 + 32))) != mask) { return false; } if (Sse2.MoveMask(Sse2.CompareEqual(Sse2.LoadVector128(b0 + 48), Sse2.LoadVector128(b1 + 48))) != mask) { return false; } b0 += 64; b1 += 64; } while (b0 < lastAddr) { if (*b0 != *b1) return false; b0++; b1++; } return true; } #endif public static unsafe bool Compare64(byte* b0, byte* b1, int length) { byte* lastAddr = b0 + length; byte* lastAddrMinus32 = lastAddr - 32; while (b0 < lastAddrMinus32) // unroll the loop so that we are comparing 32 bytes at a time. { if (*(ulong*)b0 != *(ulong*)b1) return false; if (*(ulong*)(b0 + 8) != *(ulong*)(b1 + 8)) return false; if (*(ulong*)(b0 + 16) != *(ulong*)(b1 + 16)) return false; if (*(ulong*)(b0 + 24) != *(ulong*)(b1 + 24)) return false; b0 += 32; b1 += 32; } while (b0 < lastAddr) { if (*b0 != *b1) return false; b0++; b1++; } return true; }
I would use unsafe code and run the for loop comparing Int32 pointers. Maybe you should also consider checking the arrays to be non-null.
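A sketch of what that could look like (hypothetical helper name; compares four bytes at a time and then finishes the zero to three remaining bytes one by one):

```csharp
// Sketch: unsafe Int32-pointer comparison, with the suggested null checks.
static unsafe bool Int32Compare(byte[] a1, byte[] a2)
{
    if (a1 == null || a2 == null) return a1 == a2; // both null counts as equal
    if (a1.Length != a2.Length) return false;
    fixed (byte* p1 = a1, p2 = a2)
    {
        int* i1 = (int*)p1, i2 = (int*)p2;
        int n = a1.Length / 4;
        for (int i = 0; i < n; i++)
            if (i1[i] != i2[i]) return false;
        for (int i = n * 4; i < a1.Length; i++) // remaining 0-3 bytes
            if (p1[i] != p2[i]) return false;
        return true;
    }
}
```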
If you look at how .NET does string.Equals, you see that it uses a private method called EqualsHelper which has an "unsafe" pointer implementation. .NET Reflector is your friend to see how things are done internally. This can be used as a template for byte array comparison, which I did an implementation of in the blog post Fast byte array comparison in C#. I also did some rudimentary benchmarks to see when a safe implementation is faster than the unsafe one. That said, unless you really need killer performance, I'd go for a simple for loop comparison.
For those of you who care about order (i.e. want your memcmp to return an int like it should instead of nothing), .NET Core 3.0 (and presumably .NET Standard 2.1 aka .NET 5.0) will include a Span.SequenceCompareTo(...) extension method (plus a Span.SequenceEqual) that can be used to compare two ReadOnlySpan<T> instances (where T: IComparable<T>). In the original GitHub proposal, the discussion included approach comparisons with jump table calculations, reading a byte[] as long[], SIMD usage, and p/invoke to the CLR implementation's memcmp. Going forward, this should be your go-to method for comparing byte arrays or byte ranges (as should using Span<byte> instead of byte[] for your .NET Standard 2.1 APIs), and it is fast enough that you should no longer care about optimizing it (and no, despite the similarities in name it does not perform as abysmally as the horrid Enumerable.SequenceEqual). #if NETCOREAPP3_0_OR_GREATER // Using the platform-native Span<T>.SequenceCompareTo(..) public static int Compare(byte[] range1, int offset1, byte[] range2, int offset2, int count) { var span1 = range1.AsSpan(offset1, count); var span2 = range2.AsSpan(offset2, count); return span1.SequenceCompareTo(span2); // or, if you don't care about ordering // return span1.SequenceEqual(span2); } #else // The most basic implementation, in platform-agnostic, safe C# public static bool Compare(byte[] range1, int offset1, byte[] range2, int offset2, int count) { // Working backwards lets the compiler optimize away bound checking after the first loop for (int i = count - 1; i >= 0; --i) { if (range1[offset1 + i] != range2[offset2 + i]) { return false; } } return true; } #endif
I did some measurements using the attached program, built for .NET 4.7 in release mode and run without the debugger attached. I think people have been using the wrong metric, since what matters here, if you care about speed, is how long it takes to figure out whether two byte arrays are equal, i.e. throughput in bytes. StructuralComparison : 4.6 MiB/s for : 274.5 MiB/s ToUInt32 : 263.6 MiB/s ToUInt64 : 474.9 MiB/s memcmp : 8500.8 MiB/s As you can see, there's no better way than memcmp and it's orders of magnitude faster. A simple for loop is the second best option. And it still boggles my mind why Microsoft cannot simply include a Buffer.Compare method. [Program.cs]: using System; using System.Collections; using System.Collections.Generic; using System.Diagnostics; using System.Linq; using System.Runtime.InteropServices; using System.Text; using System.Threading.Tasks; namespace memcmp { class Program { static byte[] TestVector(int size) { var data = new byte[size]; using (var rng = new System.Security.Cryptography.RNGCryptoServiceProvider()) { rng.GetBytes(data); } return data; } static TimeSpan Measure(string testCase, TimeSpan offset, Action action, bool ignore = false) { var t = Stopwatch.StartNew(); var n = 0L; while (t.Elapsed < TimeSpan.FromSeconds(10)) { action(); n++; } var elapsed = t.Elapsed - offset; if (!ignore) { Console.WriteLine($"{testCase,-16} : {n / elapsed.TotalSeconds,16:0.0} MiB/s"); } return elapsed; } [DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl)] static extern int memcmp(byte[] b1, byte[] b2, long count); static void Main(string[] args) { // how quickly can we establish if two sequences of bytes are equal? // note that we are testing the speed of different comparison methods var a = TestVector(1024 * 1024); // 1 MiB var b = (byte[])a.Clone(); // was meant to offset the overhead of everything but copying but my attempt was a horrible mistake... should have reacted sooner due to the initially ridiculous throughput values... 
// Measure("offset", new TimeSpan(), () => { return; }, ignore: true); var offset = TimeSpan.Zero; Measure("StructuralComparison", offset, () => { StructuralComparisons.StructuralEqualityComparer.Equals(a, b); }); Measure("for", offset, () => { for (int i = 0; i < a.Length; i++) { if (a[i] != b[i]) break; } }); Measure("ToUInt32", offset, () => { for (int i = 0; i < a.Length; i += 4) { if (BitConverter.ToUInt32(a, i) != BitConverter.ToUInt32(b, i)) break; } }); Measure("ToUInt64", offset, () => { for (int i = 0; i < a.Length; i += 8) { if (BitConverter.ToUInt64(a, i) != BitConverter.ToUInt64(b, i)) break; } }); Measure("memcmp", offset, () => { memcmp(a, b, a.Length); }); } } }
Couldn't find a solution I'm completely happy with (reasonable performance, but no unsafe code/pinvoke) so I came up with this, nothing really original, but works: /// <summary> /// /// </summary> /// <param name="array1"></param> /// <param name="array2"></param> /// <param name="bytesToCompare"> 0 means compare entire arrays</param> /// <returns></returns> public static bool ArraysEqual(byte[] array1, byte[] array2, int bytesToCompare = 0) { if (array1.Length != array2.Length) return false; var length = (bytesToCompare == 0) ? array1.Length : bytesToCompare; var tailIdx = length - length % sizeof(Int64); //check in 8 byte chunks for (var i = 0; i < tailIdx; i += sizeof(Int64)) { if (BitConverter.ToInt64(array1, i) != BitConverter.ToInt64(array2, i)) return false; } //check the remainder of the array, always shorter than 8 bytes for (var i = tailIdx; i < length; i++) { if (array1[i] != array2[i]) return false; } return true; } Performance compared with some of the other solutions on this page: Simple Loop: 19837 ticks, 1.00 *BitConverter: 4886 ticks, 4.06 UnsafeCompare: 1636 ticks, 12.12 EqualBytesLongUnrolled: 637 ticks, 31.09 P/Invoke memcmp: 369 ticks, 53.67 Tested in linqpad, 1000000 bytes identical arrays (worst case scenario), 500 iterations each.
It seems that EqualBytesLongUnrolled is the best of the methods suggested above. I skipped Enumerable.SequenceEqual and StructuralComparisons.StructuralEqualityComparer.Equals because they were too slow to be worth waiting for. On 265MB arrays I have measured this: Host Process Environment Information: BenchmarkDotNet.Core=v0.9.9.0 OS=Microsoft Windows NT 6.2.9200.0 Processor=Intel(R) Core(TM) i7-3770 CPU 3.40GHz, ProcessorCount=8 Frequency=3323582 ticks, Resolution=300.8802 ns, Timer=TSC CLR=MS.NET 4.0.30319.42000, Arch=64-bit RELEASE [RyuJIT] GC=Concurrent Workstation JitModules=clrjit-v4.6.1590.0 Type=CompareMemoriesBenchmarks Mode=Throughput Method | Median | StdDev | Scaled | Scaled-SD | ----------------------- |------------ |---------- |------- |---------- | NewMemCopy | 30.0443 ms | 1.1880 ms | 1.00 | 0.00 | EqualBytesLongUnrolled | 29.9917 ms | 0.7480 ms | 0.99 | 0.04 | msvcrt_memcmp | 30.0930 ms | 0.2964 ms | 1.00 | 0.03 | UnsafeCompare | 31.0520 ms | 0.7072 ms | 1.03 | 0.04 | ByteArrayCompare | 212.9980 ms | 2.0776 ms | 7.06 | 0.25 | OS=Windows Processor=?, ProcessorCount=8 Frequency=3323582 ticks, Resolution=300.8802 ns, Timer=TSC CLR=CORE, Arch=64-bit ? [RyuJIT] GC=Concurrent Workstation dotnet cli version: 1.0.0-preview2-003131 Type=CompareMemoriesBenchmarks Mode=Throughput Method | Median | StdDev | Scaled | Scaled-SD | ----------------------- |------------ |---------- |------- |---------- | NewMemCopy | 30.1789 ms | 0.0437 ms | 1.00 | 0.00 | EqualBytesLongUnrolled | 30.1985 ms | 0.1782 ms | 1.00 | 0.01 | msvcrt_memcmp | 30.1084 ms | 0.0660 ms | 1.00 | 0.00 | UnsafeCompare | 31.1845 ms | 0.4051 ms | 1.03 | 0.01 | ByteArrayCompare | 212.0213 ms | 0.1694 ms | 7.03 | 0.01 |
For comparing short byte arrays the following is an interesting hack: if(myByteArray1.Length != myByteArray2.Length) return false; if(myByteArray1.Length == 8) return BitConverter.ToInt64(myByteArray1, 0) == BitConverter.ToInt64(myByteArray2, 0); else if(myByteArray1.Length == 4) return BitConverter.ToInt32(myByteArray1, 0) == BitConverter.ToInt32(myByteArray2, 0); Then I would probably fall back to the solution listed in the question. It'd be interesting to do a performance analysis of this code.
I have not seen many LINQ solutions here. I am not sure of the performance implications, however I generally stick to LINQ as a rule of thumb and then optimize later if necessary. public bool CompareTwoArrays(byte[] array1, byte[] array2) { return !array1.Where((t, i) => t != array2[i]).Any(); } Please do note this only works if they are the same size arrays. An extension could look like so: public bool CompareTwoArrays(byte[] array1, byte[] array2) { if (array1.Length != array2.Length) return false; return !array1.Where((t, i) => t != array2[i]).Any(); }
I thought about block-transfer acceleration methods built into many graphics cards. But then you would have to copy over all the data byte-wise, so this doesn't help you much if you don't want to implement a whole portion of your logic in unmanaged and hardware-dependent code... Another way of optimization similar to the approach shown above would be to store as much of your data as possible in a long[] rather than a byte[] right from the start, for example if you are reading it sequentially from a binary file, or if you use a memory mapped file, read in data as long[] or single long values. Then, your comparison loop will only need 1/8th of the number of iterations it would have to do for a byte[] containing the same amount of data. It is a matter of when and how often you need to compare vs. when and how often you need to access the data in a byte-by-byte manner, e.g. to use it in an API call as a parameter in a method that expects a byte[]. In the end, you only can tell if you really know the use case...
Sorry, if you're looking for a managed way you're already doing it correctly and to my knowledge there's no built-in method in the BCL for doing this. You should add some initial null checks and then just reuse it as if it were in the BCL.
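In other words, the loop from the question with the suggested null checks added:

```csharp
// Sketch: the question's loop plus initial null handling.
static bool ByteArrayCompare(byte[] a1, byte[] a2)
{
    if (ReferenceEquals(a1, a2)) return true;   // same instance, or both null
    if (a1 == null || a2 == null) return false; // exactly one is null
    if (a1.Length != a2.Length) return false;
    for (int i = 0; i < a1.Length; i++)
        if (a1[i] != a2[i]) return false;
    return true;
}
```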
I settled on a solution inspired by the EqualBytesLongUnrolled method posted by ArekBulski with an additional optimization. In my instance, array differences in arrays tend to be near the tail of the arrays. In testing, I found that when this is the case for large arrays, being able to compare array elements in reverse order gives this solution a huge performance gain over the memcmp based solution. Here is that solution: public enum CompareDirection { Forward, Backward } private static unsafe bool UnsafeEquals(byte[] a, byte[] b, CompareDirection direction = CompareDirection.Forward) { // returns when a and b are same array or both null if (a == b) return true; // if either is null or different lengths, can't be equal if (a == null || b == null || a.Length != b.Length) return false; const int UNROLLED = 16; // count of longs 'unrolled' in optimization int size = sizeof(long) * UNROLLED; // 128 bytes (min size for 'unrolled' optimization) int len = a.Length; int n = len / size; // count of full 128 byte segments int r = len % size; // count of remaining 'unoptimized' bytes // pin the arrays and access them via pointers fixed (byte* pb_a = a, pb_b = b) { if (r > 0 && direction == CompareDirection.Backward) { byte* pa = pb_a + len - 1; byte* pb = pb_b + len - 1; byte* phead = pb_a + len - r; while(pa >= phead) { if (*pa != *pb) return false; pa--; pb--; } } if (n > 0) { int nOffset = n * size; if (direction == CompareDirection.Forward) { long* pa = (long*)pb_a; long* pb = (long*)pb_b; long* ptail = (long*)(pb_a + nOffset); while (pa < ptail) { if (*(pa + 0) != *(pb + 0) || *(pa + 1) != *(pb + 1) || *(pa + 2) != *(pb + 2) || *(pa + 3) != *(pb + 3) || *(pa + 4) != *(pb + 4) || *(pa + 5) != *(pb + 5) || *(pa + 6) != *(pb + 6) || *(pa + 7) != *(pb + 7) || *(pa + 8) != *(pb + 8) || *(pa + 9) != *(pb + 9) || *(pa + 10) != *(pb + 10) || *(pa + 11) != *(pb + 11) || *(pa + 12) != *(pb + 12) || *(pa + 13) != *(pb + 13) || *(pa + 14) != *(pb + 14) || *(pa + 15) != *(pb + 15) ) 
{ return false; } pa += UNROLLED; pb += UNROLLED; } } else { long* pa = (long*)(pb_a + nOffset); long* pb = (long*)(pb_b + nOffset); long* phead = (long*)pb_a; while (phead < pa) { if (*(pa - 1) != *(pb - 1) || *(pa - 2) != *(pb - 2) || *(pa - 3) != *(pb - 3) || *(pa - 4) != *(pb - 4) || *(pa - 5) != *(pb - 5) || *(pa - 6) != *(pb - 6) || *(pa - 7) != *(pb - 7) || *(pa - 8) != *(pb - 8) || *(pa - 9) != *(pb - 9) || *(pa - 10) != *(pb - 10) || *(pa - 11) != *(pb - 11) || *(pa - 12) != *(pb - 12) || *(pa - 13) != *(pb - 13) || *(pa - 14) != *(pb - 14) || *(pa - 15) != *(pb - 15) || *(pa - 16) != *(pb - 16) ) { return false; } pa -= UNROLLED; pb -= UNROLLED; } } } if (r > 0 && direction == CompareDirection.Forward) { byte* pa = pb_a + len - r; byte* pb = pb_b + len - r; byte* ptail = pb_a + len; while(pa < ptail) { if (*pa != *pb) return false; pa++; pb++; } } } return true; }
This is almost certainly much slower than any other version given here, but it was fun to write. Note that Zip truncates to the shorter of the two sequences, so a length check is required to avoid treating a prefix as equal: static bool ByteArrayEquals(byte[] a1, byte[] a2) { return a1.Length == a2.Length && a1.Zip(a2, (l, r) => l == r).All(x => x); }
This is similar to others, but the difference here is that there is no falling through to the next highest number of bytes I can check at once, e.g. if I have 63 bytes (in my SIMD example) I can check the equality of the first 32 bytes, and then the last 32 bytes, which is faster than checking 32 bytes, 16 bytes, 8 bytes, and so on. The first check you enter is the only check you will need to compare all of the bytes. This does come out on top in my tests, but just by a hair. The following code is exactly how I tested it in airbreather/ArrayComparePerf.cs. public unsafe bool SIMDNoFallThrough() // requires System.Runtime.Intrinsics.X86 { if (a1 == null || a2 == null) return false; int length0 = a1.Length; if (length0 != a2.Length) return false; fixed (byte* b00 = a1, b01 = a2) { byte* b0 = b00, b1 = b01, last0 = b0 + length0, last1 = b1 + length0, last32 = last0 - 31; if (length0 > 31) { while (b0 < last32) { if (Avx2.MoveMask(Avx2.CompareEqual(Avx.LoadVector256(b0), Avx.LoadVector256(b1))) != -1) return false; b0 += 32; b1 += 32; } return Avx2.MoveMask(Avx2.CompareEqual(Avx.LoadVector256(last0 - 32), Avx.LoadVector256(last1 - 32))) == -1; } if (length0 > 15) { if (Sse2.MoveMask(Sse2.CompareEqual(Sse2.LoadVector128(b0), Sse2.LoadVector128(b1))) != 65535) return false; return Sse2.MoveMask(Sse2.CompareEqual(Sse2.LoadVector128(last0 - 16), Sse2.LoadVector128(last1 - 16))) == 65535; } if (length0 > 7) { if (*(ulong*)b0 != *(ulong*)b1) return false; return *(ulong*)(last0 - 8) == *(ulong*)(last1 - 8); } if (length0 > 3) { if (*(uint*)b0 != *(uint*)b1) return false; return *(uint*)(last0 - 4) == *(uint*)(last1 - 4); } if (length0 > 1) { if (*(ushort*)b0 != *(ushort*)b1) return false; return *(ushort*)(last0 - 2) == *(ushort*)(last1 - 2); } return *b0 == *b1; } } If no SIMD is preferred, the same method applied to the existing LongPointers algorithm: public unsafe bool LongPointersNoFallThrough() { if (a1 == null || a2 == null || a1.Length != a2.Length) return false; 
fixed (byte* p1 = a1, p2 = a2) { byte* x1 = p1, x2 = p2; int l = a1.Length; if ((l & 8) != 0) { for (int i = 0; i < l / 8; i++, x1 += 8, x2 += 8) if (*(long*)x1 != *(long*)x2) return false; return *(long*)(x1 + (l - 8)) == *(long*)(x2 + (l - 8)); } if ((l & 4) != 0) { if (*(int*)x1 != *(int*)x2) return false; x1 += 4; x2 += 4; return *(int*)(x1 + (l - 4)) == *(int*)(x2 + (l - 4)); } if ((l & 2) != 0) { if (*(short*)x1 != *(short*)x2) return false; x1 += 2; x2 += 2; return *(short*)(x1 + (l - 2)) == *(short*)(x2 + (l - 2)); } return *x1 == *x2; } }
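The same overlapping-tail idea can be sketched in safe code, with no SIMD and no unsafe pointers, using BitConverter. EqualsNoFallThrough is a hypothetical name, and this sketch trades the intrinsics for portability:

```csharp
using System;

static bool EqualsNoFallThrough(byte[] a1, byte[] a2)
{
    if (a1 == null || a2 == null || a1.Length != a2.Length)
        return false;

    int l = a1.Length;
    if (l < sizeof(ulong))
        return a1.AsSpan().SequenceEqual(a2); // too short for one full 8-byte window

    // Compare full 8-byte windows from the front.
    for (int i = 0; i + sizeof(ulong) <= l; i += sizeof(ulong))
        if (BitConverter.ToUInt64(a1, i) != BitConverter.ToUInt64(a2, i))
            return false;

    // Re-read the last 8 bytes anchored to the end; this window may overlap
    // the previous one, so no 4/2/1-byte fall-through cases are needed.
    return BitConverter.ToUInt64(a1, l - 8) == BitConverter.ToUInt64(a2, l - 8);
}

Console.WriteLine(EqualsNoFallThrough(
    new byte[] { 1, 2, 3, 4, 5, 6, 7, 8, 9 },
    new byte[] { 1, 2, 3, 4, 5, 6, 7, 8, 9 })); // True
```

For 9 bytes, this compares bytes 0..7 and then bytes 1..8; the one byte of overlap is cheaper than a separate 1-byte tail case.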
If you are looking for a very fast byte array equality comparer, I suggest you take a look at this STSdb Labs article: Byte array equality comparer. It features some of the fastest implementations for byte[] array equality comparison, which are presented, performance tested and summarized.

You can also focus on these implementations:

BigEndianByteArrayComparer - fast byte[] array comparer from left to right (BigEndian)
BigEndianByteArrayEqualityComparer - fast byte[] equality comparer from left to right (BigEndian)
LittleEndianByteArrayComparer - fast byte[] array comparer from right to left (LittleEndian)
LittleEndianByteArrayEqualityComparer - fast byte[] equality comparer from right to left (LittleEndian)
Use SequenceEqual for this comparison.
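Applied to byte arrays specifically, that looks like the following (SequenceEqual is the LINQ extension method from System.Linq):

```csharp
using System;
using System.Linq;

var b1 = new byte[] { 1, 2, 3 };
var b2 = new byte[] { 1, 2, 3 };

Console.WriteLine(b1.SequenceEqual(b2));                    // True
Console.WriteLine(b1.SequenceEqual(new byte[] { 1, 2 }));   // False (lengths differ)
```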
The short answer is this:

public bool Compare(byte[] b1, byte[] b2)
{
    return Encoding.ASCII.GetString(b1) == Encoding.ASCII.GetString(b2);
}

In such a way you can use the optimized .NET string comparison to compare byte arrays without the need to write unsafe code. Be aware, though, that Encoding.ASCII replaces every byte above 0x7F with '?', so two different arrays can decode to the same string; this trick is only safe for data known to be 7-bit ASCII. This is how the string comparison is done in the background:

private unsafe static bool EqualsHelper(String strA, String strB)
{
    Contract.Requires(strA != null);
    Contract.Requires(strB != null);
    Contract.Requires(strA.Length == strB.Length);

    int length = strA.Length;

    fixed (char* ap = &strA.m_firstChar) fixed (char* bp = &strB.m_firstChar)
    {
        char* a = ap;
        char* b = bp;

        // Unroll the loop
#if AMD64
        // For the AMD64 bit platform we unroll by 12 and
        // check three qwords at a time. This is less code
        // than the 32 bit case and is shorter pathlength.
        while (length >= 12)
        {
            if (*(long*)a != *(long*)b) return false;
            if (*(long*)(a+4) != *(long*)(b+4)) return false;
            if (*(long*)(a+8) != *(long*)(b+8)) return false;
            a += 12; b += 12; length -= 12;
        }
#else
        while (length >= 10)
        {
            if (*(int*)a != *(int*)b) return false;
            if (*(int*)(a+2) != *(int*)(b+2)) return false;
            if (*(int*)(a+4) != *(int*)(b+4)) return false;
            if (*(int*)(a+6) != *(int*)(b+6)) return false;
            if (*(int*)(a+8) != *(int*)(b+8)) return false;
            a += 10; b += 10; length -= 10;
        }
#endif

        // This depends on the fact that the String objects are
        // always zero terminated and that the terminating zero is not included
        // in the length. For odd string sizes, the last compare will include
        // the zero terminator.
        while (length > 0)
        {
            if (*(int*)a != *(int*)b) break;
            a += 2; b += 2; length -= 2;
        }

        return (length <= 0);
    }
}
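One caveat worth verifying before relying on the ASCII round-trip trick above: Encoding.ASCII maps every byte above 0x7F to '?', so two distinct non-ASCII arrays decode to identical strings and compare as "equal":

```csharp
using System;
using System.Text;

var b1 = new byte[] { 0x80 };
var b2 = new byte[] { 0x81 };

// Encoding.ASCII substitutes '?' for every byte above 0x7F, so both arrays
// decode to the string "?" and the comparison reports a false positive.
Console.WriteLine(Encoding.ASCII.GetString(b1) == Encoding.ASCII.GetString(b2)); // True
```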
Since many of the fancy solutions above don't work with UWP, and because I love Linq and functional approaches, I present you my version of this problem. To escape the comparison when the first difference occurs, I chose Any, which stops enumerating at the first mismatch (the original FirstOrDefault version both read past the end of the array with Range(1, ...) and missed a difference at index 0, since FirstOrDefault returns 0 for "no match" as well):

public static bool CompareByteArrays(byte[] ba0, byte[] ba1)
    => ba0.Length == ba1.Length
       && !Enumerable.Range(0, ba0.Length).Any(i => ba0[i] != ba1[i]);