How to parse signed zero? - c#

Is it possible to parse signed zero?
I tried several approaches, but none of them gives the proper result:
float test1 = Convert.ToSingle("-0.0");
float test2 = float.Parse("-0.0");
float test3;
float.TryParse("-0.0", out test3);
If I use the value directly initialized it is just fine:
float test4 = -0.0f;
So the problem seems to be in C#'s parsing routines. I hope somebody can tell me if there is some option or workaround for that.
The difference could only be seen by converting to binary:
var bin = BitConverter.GetBytes(test4);
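For example, comparing the raw bytes makes the lost sign visible (a minimal sketch, using CultureInfo from System.Globalization; output shown for a little-endian machine, where negative zero is 00-00-00-80):
float direct = -0.0f;
float parsed = float.Parse("-0.0", CultureInfo.InvariantCulture);
Console.WriteLine(BitConverter.ToString(BitConverter.GetBytes(direct))); // 00-00-00-80
Console.WriteLine(BitConverter.ToString(BitConverter.GetBytes(parsed))); // 00-00-00-00 (sign lost)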

I think there is no way to force float.Parse (or Convert.ToSingle) to respect negative zero. It just works like this (ignores sign in this case). So you have to check that yourself, for example:
string target = "-0.0";
float result = float.Parse(target, CultureInfo.InvariantCulture);
if (result == 0f && target.TrimStart().StartsWith("-"))
    result = -0f;
If we look at source code for coreclr, we'll see (skipping all irrelevant parts):
private static bool NumberBufferToDouble(ref NumberBuffer number, ref double value)
{
    double d = NumberToDouble(ref number);
    uint e = DoubleHelper.Exponent(d);
    ulong m = DoubleHelper.Mantissa(d);
    if (e == 0x7FF)
    {
        return false;
    }
    if (e == 0 && m == 0)
    {
        d = 0; // <- relevant part
    }
    value = d;
    return true;
}
As you can see, if the mantissa and exponent are both zero, the value is explicitly assigned 0. So there is no way you can change that.
The full .NET Framework implementation has NumberBufferToDouble as an InternalCall (implemented in C/C++), but I assume it does something similar.

Updated Results
Summary
Mode : Release
Test Framework : .NET Framework 4.7.1
Benchmark runs : 100 times (averaged per scale)
Tests limited to 10 digits
Name | Time | Range | StdDev | Cycles | Pass
-----------------------------------------------------------------------
Mine Unchecked | 9.645 ms | 0.259 ms | 0.30 | 32,815,064 | Yes
Mine Unchecked2 | 10.863 ms | 1.337 ms | 0.35 | 36,959,457 | Yes
Mine Safe | 11.908 ms | 0.993 ms | 0.53 | 40,541,885 | Yes
float.Parse | 26.973 ms | 0.525 ms | 1.40 | 91,755,742 | Yes
Evk | 31.513 ms | 1.515 ms | 7.96 | 103,288,681 | Base
Tests limited to 38 digits
Name | Time | Range | StdDev | Cycles | Pass
-----------------------------------------------------------------------
Mine Unchecked | 17.694 ms | 0.276 ms | 0.50 | 60,178,511 | No
Mine Unchecked2 | 23.980 ms | 0.417 ms | 0.34 | 81,641,998 | Yes
Mine Safe | 25.078 ms | 0.124 ms | 0.63 | 85,306,389 | Yes
float.Parse | 36.985 ms | 0.052 ms | 1.60 | 125,929,286 | Yes
Evk | 39.159 ms | 0.406 ms | 3.26 | 133,043,100 | Base
Tests limited to 98 digits (way over the range of a float)
Name | Time | Range | StdDev | Cycles | Pass
-----------------------------------------------------------------------
Mine Unchecked2 | 46.780 ms | 0.580 ms | 0.57 | 159,272,055 | Yes
Mine Safe | 48.048 ms | 0.566 ms | 0.63 | 163,601,133 | Yes
Mine Unchecked | 48.528 ms | 1.056 ms | 0.58 | 165,238,857 | No
float.Parse | 55.935 ms | 1.461 ms | 0.95 | 190,456,039 | Yes
Evk | 56.636 ms | 0.429 ms | 1.75 | 192,531,045 | Base
Verifiably, Mine Unchecked is good for smaller numbers; however, because it applies powers of 10 at the end of the calculation to build the fractional part, it doesn't work for larger digit combinations. Also, because it only deals in powers of 10, it can use a big switch statement (MyPow), which makes it marginally faster.
Background
OK, because of the various comments I got and the work I put into this, I thought I'd rewrite this post with the most accurate benchmarks I could get, and all the logic behind them.
When this question first came up, I had already written my own benchmark framework, and I often just like writing a quick parser for these things using unsafe code; 9 times out of 10 I can get this stuff faster than the framework equivalent.
At first this was easy: just write simple logic to parse numbers with decimal places. I did pretty well; however, the initial results weren't as accurate as they could have been, because my test data was just using the 'f' format specifier, which turned higher-precision numbers into short strings with only 0s.
In the end I just couldn't write a reliable parser to deal with exponent notation, i.e. 1.2324234233E+23. The only way I could get the maths to work was using BigInteger and lots of hacks to force the right precision into a floating-point value, which turned out to be super slow. I even went to the IEEE float spec and tried to do the maths to construct the value bit by bit; this wasn't that hard, however the formula has loops in it and was complicated to get right. In the end I had to give up on exponent notation.
So this is what I ended up with
My testing framework takes as input a list of 10,000 floats as strings, which is shared across the tests and regenerated for each test run. A test run just goes through each test (remembering it's the same data for every test), adds up the results, then averages them. This is about as good as it can get. I can increase the runs to 1,000 or factors more; however, the results don't really change. Because we are testing a method that takes basically one variable (a string representation of a float), there is no point scaling this, as it's not set-based. However, I can tweak the input to cater for different lengths of floats, i.e. strings of 10, 20, right up to 98 digits, remembering a float only goes up to 38 digits anyway.
To check the results I used the following. I have previously written a unit test that covers every conceivable float, and the parsers pass, except for the variation where I use powers to calculate the decimal part of the number.
Note: my framework only checks one result set, and this check is not part of the
framework itself.
private bool Action(List<float> floats, List<float> list)
{
if (floats.Count != list.Count)
return false; // sanity check
for (int i = 0; i < list.Count; i++)
{
// nan is a special case as there is more than one possible bit value
// for it
if ( floats[i] != list[i] && !float.IsNaN(floats[i]) && !float.IsNaN(list[i]))
return false;
}
return true;
}
In this case I'm testing against 3 types of input, as shown below.
Setup
// numberDecimalDigits specifies how long the output will be
private static NumberFormatInfo GetNumberFormatInfo(int numberDecimalDigits)
{
return new NumberFormatInfo
{
NumberDecimalSeparator = ".",
NumberDecimalDigits = numberDecimalDigits
};
}
// generate a random float by creating an int and reinterpreting its bits as a float via pointers
private static unsafe string GetRandomFloatString(IFormatProvider formatInfo)
{
var val = Rand.Next(0, int.MaxValue);
if (Rand.Next(0, 2) == 1)
val *= -1;
var f = *(float*)&val;
return f.ToString("f", formatInfo);
}
Test Data 1
// limits the output to 10 decimal digits
// also because of that it has to check for truncated values and
// regenerate them
public static List<string> GenerateInput10(int scale)
{
var result = new List<string>(scale);
while (result.Count < scale)
{
var val = GetRandomFloatString(GetNumberFormatInfo(10));
if (val != "0.0000000000")
result.Add(val);
}
result.Insert(0, (-0f).ToString("f", CultureInfo.InvariantCulture));
result.Insert(0, "-0");
result.Insert(0, "0.00");
result.Insert(0, float.NegativeInfinity.ToString("f", CultureInfo.InvariantCulture));
result.Insert(0, float.PositiveInfinity.ToString("f", CultureInfo.InvariantCulture));
return result;
}
Test Data 2
// basically the max precision for a float
public static List<string> GenerateInput38(int scale)
{
var result = Enumerable.Range(1, scale)
.Select(x => GetRandomFloatString(GetNumberFormatInfo(38)))
.ToList();
result.Insert(0, (-0f).ToString("f", CultureInfo.InvariantCulture));
result.Insert(0, "-0");
result.Insert(0, float.NegativeInfinity.ToString("f", CultureInfo.InvariantCulture));
result.Insert(0, float.PositiveInfinity.ToString("f", CultureInfo.InvariantCulture));
return result;
}
Test Data 3
// Lets take this to the limit
public static List<string> GenerateInput98(int scale)
{
var result = Enumerable.Range(1, scale)
.Select(x => GetRandomFloatString(GetNumberFormatInfo(98)))
.ToList();
result.Insert(0, (-0f).ToString("f", CultureInfo.InvariantCulture));
result.Insert(0, "-0");
result.Insert(0, float.NegativeInfinity.ToString("f", CultureInfo.InvariantCulture));
result.Insert(0, float.PositiveInfinity.ToString("f", CultureInfo.InvariantCulture));
return result;
}
These are the tests I used
Evk
private float ParseMyFloat(string value)
{
var result = float.Parse(value, CultureInfo.InvariantCulture);
if (result == 0f && value.TrimStart()
.StartsWith("-"))
{
result = -0f;
}
return result;
}
Mine safe
I call it safe because it tries to check for invalid strings.
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private unsafe float ParseMyFloat(string value)
{
double result = 0, dec = 0;
if (value[0] == 'N' && value == "NaN") return float.NaN;
if (value[0] == 'I' && value == "Infinity") return float.PositiveInfinity;
if (value[0] == '-' && value[1] == 'I' && value == "-Infinity") return float.NegativeInfinity;
fixed (char* ptr = value)
{
char* l, e;
char* start = ptr, length = ptr + value.Length;
if (*ptr == '-') start++;
for (l = start; l < length && *l >= '0' && *l <= '9'; l++)
result = result * 10 + *l - 48;
if (*l == '.')
{
char* r;
for (r = length - 1; r > l && *r >= '0' && *r <= '9'; r--)
dec = (dec + (*r - 48)) / 10;
if (l != r)
throw new FormatException($"Invalid float : {value}");
}
else if (l != length)
throw new FormatException($"Invalid float : {value}");
result += dec;
return *ptr == '-' ? (float)result * -1 : (float)result;
}
}
Unchecked
This fails for larger strings, but is OK for smaller ones.
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private unsafe float ParseMyFloat(string value)
{
if (value[0] == 'N' && value == "NaN") return float.NaN;
if (value[0] == 'I' && value == "Infinity") return float.PositiveInfinity;
if (value[0] == '-' && value[1] == 'I' && value == "-Infinity") return float.NegativeInfinity;
fixed (char* ptr = value)
{
var point = 0;
double result = 0, dec = 0;
char* c, start = ptr, length = ptr + value.Length;
if (*ptr == '-') start++;
for (c = start; c < length && *c != '.'; c++)
result = result * 10 + *c - 48;
if (*c == '.')
{
point = (int)(length - 1 - c);
for (c++; c < length; c++)
dec = dec * 10 + *c - 48;
}
// MyPow is just a massive switch statement (sketched after this method)
if (dec > 0)
result += dec / MyPow(point);
return *ptr == '-' ? (float)result * -1 : (float)result;
}
}
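MyPow isn't shown in the original; a minimal sketch of the idea it describes, a plain switch over powers of ten (the cases here are truncated, and the Math.Pow fallback is my own assumption), might look like:
private static double MyPow(int n)
{
    // one case per supported number of decimal places; avoids calling Math.Pow in the hot path
    switch (n)
    {
        case 0: return 1d;
        case 1: return 10d;
        case 2: return 100d;
        case 3: return 1000d;
        case 4: return 10000d;
        case 5: return 100000d;
        // ... more cases up to the longest supported input ...
        default: return Math.Pow(10, n);
    }
}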
Unchecked 2
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private unsafe float ParseMyFloat(string value)
{
if (value[0] == 'N' && value == "NaN") return float.NaN;
if (value[0] == 'I' && value == "Infinity") return float.PositiveInfinity;
if (value[0] == '-' && value[1] == 'I' && value == "-Infinity") return float.NegativeInfinity;
fixed (char* ptr = value)
{
double result = 0, dec = 0;
char* c, start = ptr, length = ptr + value.Length;
if (*ptr == '-') start++;
for (c = start; c < length && *c != '.'; c++)
result = result * 10 + *c - 48;
// this division seems unsafe for a double,
// however I have tested it with every float and it works
if (*c == '.')
for (var d = length - 1; d > c; d--)
dec = (dec + (*d - 48)) / 10;
result += dec;
return *ptr == '-' ? (float)result * -1 : (float)result;
}
}
float.Parse
float.Parse(t, CultureInfo.InvariantCulture)
Original Answer
Assuming you don't need a TryParse method, I managed to use pointers and custom parsing to achieve what I think you want.
The benchmark uses a list of 1,000,000 random floats and runs each version 100 times; all versions use the same data.
Test Framework : .NET Framework 4.7.1
Scale : 1000000
Name | Time | Delta | Deviation | Cycles
----------------------------------------------------------------------
Mine Unchecked2 | 45.585 ms | 1.283 ms | 1.70 | 155,051,452
Mine Unchecked | 46.388 ms | 1.812 ms | 1.17 | 157,751,710
Mine Safe | 46.694 ms | 2.651 ms | 1.07 | 158,697,413
float.Parse | 173.229 ms | 4.795 ms | 5.41 | 589,297,449
Evk | 287.931 ms | 7.447 ms | 11.96 | 979,598,364
Chopped for brevity.
Note: both these versions can't deal with the extended formats NaN, +Infinity, or -Infinity. However, it wouldn't be hard to implement with little overhead.
I have checked this pretty well, though I must admit I haven't written any unit tests, so use at your own risk.
Disclaimer: I think Evk's StartsWith version could probably be optimized further; however, it will still be (at best) slightly slower than float.Parse.

You can try this:
string target = "-0.0";
decimal result = decimal.Parse(target,
    System.Globalization.NumberStyles.AllowParentheses |
    System.Globalization.NumberStyles.AllowLeadingWhite |
    System.Globalization.NumberStyles.AllowTrailingWhite |
    System.Globalization.NumberStyles.AllowThousands |
    System.Globalization.NumberStyles.AllowDecimalPoint |
    System.Globalization.NumberStyles.AllowLeadingSign);
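Note that decimal keeps a sign flag even for zero, so the sign can be carried back into a float. A minimal sketch using decimal.GetBits to read that flag (worth verifying on your target framework that parsing actually preserves the flag for "-0.0"):
int[] bits = decimal.GetBits(result);
bool negative = (bits[3] & int.MinValue) != 0; // bit 31 of the flags element is the sign
float f = (float)result;
if (negative && f == 0f)
    f = -0f;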


How to convert a float/double/half to a minifloat the optimal way (improve my already working code)?

I've written an IEEE 754 "quarter" 8-bit minifloat in a 1.3.4.−3 (sign.exponent.mantissa.bias) format in C#.
It was mostly a fun little side-project, testing whether or not I understand floats.
Actually, though, I find myself using it more than I'd like to admit :) (bandwidth > clock ticks)
Here's my code for converting the minifloat to a 32-bit float:
public static implicit operator float(quarter q)
{
int sign = (q.value & 0b1000_0000) << 24;
int fusedExponentMantissa = (q.value & 0b0111_1111) << (23 - MANTISSA_BITS);
if ((q.value & 0b0111_0000) == 0b0111_0000) // NaN/Infinity
{
return asfloat(sign | (255 << 23) | fusedExponentMantissa);
}
else // normal and subnormal
{
float magic = asfloat((255 - 1 + EXPONENT_BIAS) << 23);
return magic * asfloat(sign | fusedExponentMantissa);
}
}
where quarter.value is the stored byte and "asfloat" is simply *(float*)&myUInt. The "magic" number makes use of mantissa overflow in the subnormal case, which affects the f_32 exponent (integer multiplication and mask + add is slower than FPU-switch and float multiplication). I guess one could optimize away the branch, too.
But here comes the problematic code - float_32 to float_8:
public static explicit operator quarter(float f)
{
byte f8_sign = (byte)((asuint(f) & 0x8000_0000u) >> 24);
uint f32_exponent = asuint(f) & 0x7F80_0000u;
uint f32_mantissa = asuint(f) & 0x007F_FFFFu;
if (f32_exponent < (120 << 23)) // underflow => preserve +/- 0
{
return new quarter { value = f8_sign };
}
else if (f32_exponent > (130 << 23)) // overflow => +/- infinity or preserve NaN
{
return new quarter { value = (byte)(f8_sign | PositiveInfinity.value | touint8(isnan(f))) };
}
else
{
switch (f32_exponent)
{
case 120 << 23: // 2^(-7) * 1.(mantissa > 0) means the value is closer to quarter.epsilon than 0
{
return new quarter { value = (byte)(f8_sign | touint8(f32_mantissa != 0)) };
}
case 121 << 23: // 2^(-6) * (1 + mantissa): return +/- quarter.epsilon = 2^(-2) * (0 + 2^(-4)); if the mantissa is > 0.5 i.e. 2^(-6) * max(mantissa, 1.75), return 2^(-2) * 2^(-3)
{
return new quarter { value = (byte)(f8_sign | (Epsilon.value + touint8(f32_mantissa > 0x0040_0000))) };
}
case 122 << 23:
{
return new quarter { value = (byte)(f8_sign | 0b0000_0010u | (f32_mantissa >> 22)) };
}
case 123 << 23:
{
return new quarter { value = (byte)(f8_sign | 0b0000_0100u | (f32_mantissa >> 21)) };
}
case 124 << 23:
{
return new quarter { value = (byte)(f8_sign | 0b0000_1000u | (f32_mantissa >> 20)) };
}
default:
{
const uint exponentDelta = (127 + EXPONENT_BIAS) << 23;
return new quarter { value = (byte)(f8_sign | (((f32_exponent - exponentDelta) | f32_mantissa) >> 19)) };
}
}
}
}
... where the function
"asuint" is simply *(uint*)&myFloat and
"touint8" is simply *(byte*)&myBoolean i.e. myBoolean ? 1 : 0.
The first five cases deal with numbers that can only be represented as subnormals in a "quarter".
I want to get rid of the switch at the very least. There's obviously a pattern (same as with float8_to_float32) but I haven't been able to figure out how I could unify the entire switch for days... I tried to google how hardware converts doubles to floats but that yielded no results either.
My requirements are to hold on to the IEEE-754 standard, meaning:
NaN and infinity preservation, clamping to infinity/zero in case of over-/underflow, as well as rounding to epsilon when the larger type's value is closer to epsilon than 0 (the first switch case, as well as the underflow limit in the first if statement).
Can anyone at least push me in the right direction please?
This may not be optimal, but it uses strictly conforming C code except as noted in the first comment, so no pointer aliasing or other manipulation of the bits of a floating-point object. A thorough test program is included.
#include <inttypes.h>
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
/* Notes on portability:
uint8_t is an optional type. Its use here is easily replaced by
unsigned char.
Round-to-nearest is required in FloatToMini.
Floating-point must be base two, and the constant in the
Dekker-Veltkamp split is hardcoded for IEEE-754 binary64 but could be
adapted to other formats. (Change the exponent in 0x1p48 to the number
of bits in the significand minus five.)
*/
/* Convert a double to a 1-3-4 floating-point format. Round-to-nearest is
required.
*/
static uint8_t FloatToMini(double x)
{
// Extract the sign bit of x, moved into its position in a mini-float.
uint8_t s = !!signbit(x) << 7;
x = fabs(x);
/* If x is a NaN, return a quiet NaN with the copied sign. Significand
bits are not preserved.
*/
if (x != x)
return s | 0x78;
/* If |x| is greater than or equal to the rounding point between the
maximum finite value and infinity, return infinity with the copied sign.
(0x1.fp0 is the largest representable significand, 0x1.f8 is that plus
half an ULP, and the largest exponent is 3, so 0x1.f8p3 is that
rounding point.)
*/
if (0x1.f8p3 <= x)
return s | 0x70;
// If x is subnormal, encode with zero exponent.
if (x < 0x1p-2 - 0x1p-7)
return s | (uint8_t) nearbyint(x * 0x1p6);
/* Round to five significand bits using the Dekker-Veltkamp Split. (The
cast eliminates the excess precision that the C standard allows.)
*/
double d = x * (0x1p48 + 1);
x = d - (double) (d-x);
/* Separate the significand and exponent. C's frexp scales the exponent
so the significand is in [.5, 1), hence the e-1 below.
*/
int e;
x = frexp(x, &e) - .5;
return s | (e-1+3) << 4 | (uint8_t) (x*0x1p5);
}
static void Show(double x)
{
printf("%g -> 0x%02" PRIx8 ".\n", x, FloatToMini(x));
}
static void Test(double x, uint8_t expected)
{
uint8_t observed = FloatToMini(x);
if (expected != observed)
{
printf("Error, %.9g (%a) produced 0x%02" PRIx8
" but expected 0x%02" PRIx8 ".\n",
x, x, observed, expected);
exit(EXIT_FAILURE);
}
}
int main(void)
{
// Set the value of an ULP in [1, 2).
static const double ULP = 0x1p-4;
// Test all even significands with normal exponents.
for (double s = 1; s < 2; s += 2*ULP)
// Test with trailing bits less than or equal to 1/2 ULP in magnitude.
for (double t = -ULP / (s == 1 ? 4 : 2); t <= +ULP/2; t += ULP/16)
// Test with all normal exponents.
for (int e = 1-3; e < 7-3; ++e)
// Test with both signs.
for (int sign = -1; sign <= +1; sign += 2)
{
// Prepare the expected encoding.
uint8_t expected =
(0 < sign ? 0 : 1) << 7
| (e+3) << 4
| (uint8_t) ((s-1) * 0x1p4);
Test(sign * ldexp(s+t, e), expected);
}
// Test all odd significands with normal exponents.
for (double s = 1 + 1*ULP; s < 2; s += 2*ULP)
// Test with trailing bits less than or equal to 1/2 ULP in magnitude.
for (double t = -ULP/2+ULP/16; t < +ULP/2; t += ULP/16)
// Test with all normal exponents.
for (int e = 1-3; e < 7-3; ++e)
// Test with both signs.
for (int sign = -1; sign <= +1; sign += 2)
{
// Prepare the expected encoding.
uint8_t expected =
(0 < sign ? 0 : 1) << 7
| (e+3) << 4
| (uint8_t) ((s-1) * 0x1p4);
Test(sign * ldexp(s+t, e), expected);
}
// Set the value of an ULP in the subnormal range.
static const double subULP = ULP * 0x1p-2;
// Test all even significands with the subnormal exponent.
for (double s = 0; s < 0x1p-2; s += 2*subULP)
// Test with trailing bits less than or equal to 1/2 ULP in magnitude.
for (double t = s == 0 ? 0 : -subULP/2; t <= +subULP/2; t += subULP/16)
{
// Test with both signs.
for (int sign = -1; sign <= +1; sign += 2)
{
// Prepare the expected encoding.
uint8_t expected =
(0 < sign ? 0 : 1) << 7
| (uint8_t) (s/subULP);
Test(sign * (s+t), expected);
}
}
// Test all odd significands with the subnormal exponent.
for (double s = 0 + 1*subULP; s < 0x1p-2; s += 2*subULP)
// Test with trailing bits less than or equal to 1/2 ULP in magnitude.
for (double t = -subULP/2 + subULP/16; t < +subULP/2; t += subULP/16)
{
// Test with both signs.
for (int sign = -1; sign <= +1; sign += 2)
{
// Prepare the expected encoding.
uint8_t expected =
(0 < sign ? 0 : 1) << 7
| (uint8_t) (s/subULP);
Test(sign * (s+t), expected);
}
}
// Test at and slightly under the point of rounding to infinity.
Test(+15.75, 0x70);
Test(-15.75, 0xf0);
Test(nexttoward(+15.75, 0), 0x6f);
Test(nexttoward(-15.75, 0), 0xef);
// Test infinities and NaNs.
Test(+INFINITY, 0x70);
Test(-INFINITY, 0xf0);
Test(+NAN, 0x78);
Test(-NAN, 0xf8);
Show(0);
Show(0x1p-6);
Show(0x1p-2);
Show(0x1.1p-2);
Show(0x1.2p-2);
Show(0x1.4p-2);
Show(0x1.8p-2);
Show(0x1p-1);
Show(15.5);
Show(15.75);
Show(16);
Show(NAN);
Show(1./6);
Show(1./3);
Show(2./3);
}
I hate to answer my own question... But this may still not be the optimal solution.
Although @Eric Postpischil's solution uses an established algorithm, it is not very well suited to minifloats, since there are so few denormals in 4 mantissa bits. Additionally, there is the overhead of multiple float arithmetic operations; and because of the actual code behind frexp in particular, it only has one branch less (or two when inlined and optimized) than my original solution, and it is also not that great in regards to instruction-level parallelism.
So here's my current solution:
public static explicit operator quarter(float f)
{
byte f8_sign = (byte)((asuint(f) >> 31) << 7);
uint f32_exponent = (asuint(f) >> 23) & 0x00FFu;
uint f32_mantissa = asuint(f) & 0x007F_FFFFu;
if (f32_exponent < 120) // underflow => preserve +/- 0
{
return new quarter { value = f8_sign };
}
else if (f32_exponent > 130) // overflow => +/- infinity or preserve NaN
{
return new quarter { value = (byte)(f8_sign | PositiveInfinity.value | touint8(isnan(f))) };
}
else
{
int cmp = 125 - (int)f32_exponent;
int cmpIsZeroOrNegativeMask = (cmp - 1) >> 31;
int denormalExponent = andnot(0b0001_0000 >> cmp, cmpIsZeroOrNegativeMask); // special case 121: sets it to quarter.Epsilon
denormalExponent += touint8((f32_exponent == 121) & (f32_mantissa >= 0x0040_0000)); // case 121: 2^(-6) * (1 + mantissa): return +/- quarter.Epsilon = 2^(-2) * 2^(-4); if the mantissa is >= 0.5 return 2^(-2) * 2^(-3)
denormalExponent |= touint8((f32_exponent == 120) & (f32_mantissa != 0)); // case 120: 2^(-7) * 1.(mantissa > 0) means the value is closer to quarter.epsilon than 0
int normalExponent = (cmpIsZeroOrNegativeMask & ((int)f32_exponent - (127 + EXPONENT_BIAS))) << 4;
int mantissaShift = 19 + andnot(cmp, cmpIsZeroOrNegativeMask);
return new quarter { value = (byte)((f8_sign | normalExponent) | (denormalExponent | (f32_mantissa >> mantissaShift))) };
}
}
But note that the particular andnot(int a, int b) function I use returns a & ~b and...not ~a & b.
Thanks for your help :) I'm keeping this open since, as mentioned, this may very well not be the best solution - but at least it's my own...
PS: This is probably a good example of why PREMATURE optimization is bad; your code becomes much less readable. Make sure you have the functionality backed up by unit tests, and make sure you even need the optimization in the first place.
...And after some time, in the spirit of transparent progression, I want to show the final version, since I believe I have found the optimal implementation; more on that later.
First off, here it is (the code should speak for itself, which is why it is this "much"):
unsafe struct quarter
{
const bool IEEE_754_STANDARD = true; //standard: true
const bool SIGN_BIT = IEEE_754_STANDARD || true; //standard: true
const int BITS = 8 * sizeof(byte); //standard: 8
const int EXPONENT_BITS = 3 + (SIGN_BIT ? 0 : 1); //standard: 3
const int MANTISSA_BITS = BITS - EXPONENT_BITS - (SIGN_BIT ? 1 : 0); //standard: 4
const int EXPONENT_BIAS = -(((1 << BITS) - 1) >> (BITS - (EXPONENT_BITS - 1))); //standard: -3
const int MAX_EXPONENT = EXPONENT_BIAS + ((1 << EXPONENT_BITS) - 1) - (IEEE_754_STANDARD ? 1 : 0); //standard: 3
const int SIGNALING_EXPONENT = (MAX_EXPONENT - EXPONENT_BIAS + (IEEE_754_STANDARD ? 1 : 0)) << MANTISSA_BITS; //standard: 0b0111_0000
const int F32_BITS = 8 * sizeof(float);
const int F32_EXPONENT_BITS = 8;
const int F32_MANTISSA_BITS = 23;
const int F32_EXPONENT_BIAS = -(int)(((1L << F32_BITS) - 1) >> (F32_BITS - (F32_EXPONENT_BITS - 1)));
const int F32_MAX_EXPONENT = F32_EXPONENT_BIAS + ((1 << F32_EXPONENT_BITS) - 1 - 1);
const int F32_SIGNALING_EXPONENT = (F32_MAX_EXPONENT - F32_EXPONENT_BIAS + 1) << F32_MANTISSA_BITS;
const int F32_SHL_LOSE_SIGN = (F32_BITS - (MANTISSA_BITS + EXPONENT_BITS));
const int F32_SHR_PLACE_MANTISSA = MANTISSA_BITS + ((1 + F32_EXPONENT_BITS) - (MANTISSA_BITS + EXPONENT_BITS));
const int F32_MAGIC = (((1 << F32_EXPONENT_BITS) - 1) - (1 + EXPONENT_BITS)) << F32_MANTISSA_BITS;
byte _value;
static quarter Epsilon => new quarter { _value = 1 };
static quarter MaxValue => new quarter { _value = (byte)(SIGNALING_EXPONENT - 1) };
static quarter NaN => new quarter { _value = (byte)(SIGNALING_EXPONENT | 1) };
static quarter PositiveInfinity => new quarter { _value = (byte)SIGNALING_EXPONENT };
static uint asuint(float f) => *(uint*)&f;
static float asfloat(uint u) => *(float*)&u;
static byte tobyte(bool b) => *(byte*)&b;
static float ToFloat(quarter q, bool promiseInRange)
{
uint fusedExponentMantissa = ((uint)q._value << F32_SHL_LOSE_SIGN) >> F32_SHR_PLACE_MANTISSA;
uint sign = ((uint)q._value >> (BITS - 1)) << (F32_BITS - 1);
if (!promiseInRange)
{
bool nanInf = (q._value & SIGNALING_EXPONENT) == SIGNALING_EXPONENT;
uint ifNanInf = asuint(float.PositiveInfinity) & (uint)(-tobyte(nanInf));
return (nanInf ? 1f : asfloat(F32_MAGIC)) * asfloat(sign | fusedExponentMantissa | ifNanInf);
}
else
{
return asfloat(F32_MAGIC) * asfloat(sign | fusedExponentMantissa);
}
}
static quarter ToQuarter(float f, bool promiseInRange)
{
float inRange = f * (1f / asfloat(F32_MAGIC));
uint q = asuint(inRange) >> (F32_MANTISSA_BITS - (1 + EXPONENT_BITS));
uint f8_sign = asuint(f) >> (F32_BITS - 1);
if (!promiseInRange)
{
uint f32_exponent = asuint(f) & F32_SIGNALING_EXPONENT;
bool overflow = f32_exponent > (uint)(-F32_EXPONENT_BIAS + MAX_EXPONENT << F32_MANTISSA_BITS);
bool notNaNInf = f32_exponent != F32_SIGNALING_EXPONENT;
f8_sign ^= tobyte(!notNaNInf);
if (overflow & notNaNInf)
{
q = PositiveInfinity._value;
}
}
f8_sign <<= (BITS - 1);
return new quarter{ _value = (byte)(q ^ f8_sign) };
}
}
It turns out that the reverse of the minifloat-to-32-bit conversion (a multiplication with a magic constant) is, in fact, exactly the reverse of that multiplication (wow...): a floating-point division by that constant.
Luckily it is "a division by that constant" and not the other way around; we can calculate the reciprocal at compile time and multiply by it instead. This only fails, as with the reverse operation, when converting to and from 'INF' and 'NaN': absolute overflow with any biased 32-bit exponent where exponent % (MAX_EXPONENT + 1) != 0 is not translated into 'INF', and positive 'INF' is translated into negative 'INF'.
Although this enables some optimizations through the bool parameter, it mostly just reduces code size and, more importantly (especially for SIMD versions, where small data types really shine), reduces the need for constants. Speaking of SIMD: this scalar version can be optimized a little by using SSE/SSE2 intrinsics.
The (disabled) optimizations (would) run completely in parallel with the floating-point multiplication followed by a shift, taking a total of 5 to 6+ clock cycles (very CPU dependent), which is astonishingly close to native hardware instructions (~4 to 5 clock cycles).

Why is BitArray faster than array of bools?

I have this implementation of Sieve of Eratosthenes in C#:
public static BitArray Count()
{
const int halfSize = MaxSize / 2;
var mark = new BitArray(halfSize);
const int max = halfSize - 2;
var maxFactor = (int) Math.Sqrt(MaxSize + 1) / 2;
for (var i = 1; i <= maxFactor; ++i)
{
if (mark[i]) continue;
var p = i + i + 1;
var k = p * p >> 1;
for (; k <= max; k += p)
{
mark[k] = true;
}
}
return mark;
}
It gives results good enough for me. Nonetheless, I decided to test this algorithm using arrays of bools, expecting it to use more memory but be faster. And to my surprise, that wasn't the result. BenchmarkDotNet on .NET Core 3.1 shows that the bool array is more than two times slower than BitArray. Considering that the latter uses more method calls and produces much longer asm (BitArray vs. bool array), how is this possible?
+---------------+----------+---------+---------+----------+----------+----------+-------+-----------+
| Method | Mean | Error | StdDev | Median | Min | Max | Op/s | Allocated |
+---------------+----------+---------+---------+----------+----------+----------+-------+-----------+
| SieveBool | 294.7 ms | 4.00 ms | 3.74 ms | 293.5 ms | 290.8 ms | 304.0 ms | 3.393 | 33.38 MB |
+---------------+----------+---------+---------+----------+----------+----------+-------+-----------+
| SieveBitArray | 130.2 ms | 1.03 ms | 0.97 ms | 130.3 ms | 128.5 ms | 132.1 ms | 7.680 | 4.17 MB |
+---------------+----------+---------+---------+----------+----------+----------+-------+-----------+
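For reference, a minimal sketch of the bool[] variant being compared, assuming it mirrors the BitArray version above (MaxSize is the same constant):
public static bool[] CountBool()
{
    const int halfSize = MaxSize / 2;
    var mark = new bool[halfSize];
    const int max = halfSize - 2;
    var maxFactor = (int) Math.Sqrt(MaxSize + 1) / 2;
    for (var i = 1; i <= maxFactor; ++i)
    {
        if (mark[i]) continue;
        var p = i + i + 1;
        for (var k = p * p >> 1; k <= max; k += p)
        {
            mark[k] = true;
        }
    }
    return mark;
}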
Results are similar when using fields instead of initializing arrays in methods (except there is no allocation of course).

I want to find the maximum frequency of a common digit in a consecutive subset of integer array

A partial digit subsequence of an array A is a subsequence of integers in which each pair of consecutive integers has at least 1 digit in common.
I keep a dictionary with the characters 0 to 9 and the count of each consecutive occurrence. Then I loop through all values in the integer array, take each digit, and check my dictionary for the count of that digit.
public static void Main(string[] args)
{
Dictionary<char, int> dct = new Dictionary<char, int>
{
{ '0', 0 },
{ '1', 0 },
{ '2', 0 },
{ '3', 0 },
{ '4', 0 },
{ '5', 0 },
{ '6', 0 },
{ '7', 0 },
{ '8', 0 },
{ '9', 0 }
};
string[] arr = Console.ReadLine().Split(' ');
for (int i = 0; i < arr.Length; i++)
{
string str = string.Join("", arr[i].Distinct());
for (int j = 0; j < str.Length; j++)
{
int count = dct[str[j]];
if (count == i || (i > 0 && arr[i - 1].Contains(str[j])))
{
count++;
dct[str[j]] = count;
}
else dct[str[j]] = 1;
}
}
string s = dct.Aggregate((l, r) => l.Value > r.Value ? l : r).Key.ToString();
Console.WriteLine(s);
}
For example, for
12 23 231
the answer would be 2 because it occurs 3 times.
The array can contain 10^18 elements.
Can someone help me with an optimal solution? This algorithm is not fit to handle large amounts of data in an array.
All the posted answers are wrong because all of them have ignored the most important part of the question:
The array can contain 10^18 elements.
This array is being read from disk? Supposing each element is two bytes, that's two million terabyte drives just for the array. I don't think that's going to fit into memory. You'll have to go with a streaming solution.
How long will the streaming solution take? If you can process a billion array items a second, which seems within reason, your program will take 32 years to execute.
Your requirements are not realistic, and so the problem cannot feasibly be solved with the resources of a single person. You'll need the resources of a large corporation or nation to attack this problem, and you'll need a lot of funding for hardware acquisition and management.
The linear algorithm is trivial; it's the size of the data that is the entire problem. Start building your data center somewhere with cheap power and friendly tax laws, because you are going to be importing a lot of disks.
You shouldn't need to go through the array elements one by one; you can simply merge the entire string array into one string and go through the characters:
12 23 231 -> "1223231", then loop through and count.
It should be fast enough at O(n), and it requires only 10 entries in your dictionary. How fast do you exactly need it to be?
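A minimal sketch of that merged-string idea, using an int[10] in place of the dictionary (arr is the same string[] as in the question; counts.Max() needs using System.Linq):
var counts = new int[10];
foreach (var ch in string.Concat(arr)) // "12 23 231" -> "1223231" after the Split
    if (char.IsDigit(ch))
        counts[ch - '0']++;
Console.WriteLine((char)(Array.IndexOf(counts, counts.Max()) + '0')); // 2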
I didn't use arrays. I'm not sure if you must use arrays; if not, check this solution.
static void Main(string[] args)
{
List<char> numbers = new List<char>();
Dictionary<char, int> dct = new Dictionary<char, int>()
{
{ '0',0 },
{ '1',0 },
{ '2',0 },
{ '3',0 },
{ '4',0 },
{ '5',0 },
{ '6',0 },
{ '7',0 },
{ '8',0 },
{ '9',0 },
};
string option;
do
{
Console.Write("Enter number: ");
string number = Console.ReadLine();
numbers.AddRange(number);
Console.Write("Enter 'X' if u want to finish work: ");
option = Console.ReadLine();
} while (option.ToLower() != "x");
foreach(char c in numbers)
{
if(dct.ContainsKey(c))
{
dct[c]++;
}
}
foreach(var keyValue in dct)
{
Console.WriteLine($"Char {keyValue.Key} was used {keyValue.Value} times");
}
Console.ReadKey(true);
}
Certainly not an efficient solution but this will work.
public class Program
{
public static int arrLength = 0;
public static string[] arr;
public static Dictionary<char, int> dct = new Dictionary<char, int>();
public static void Main(string[] args)
{
dct.Add('0', 0);
dct.Add('1', 0);
dct.Add('2', 0);
dct.Add('3', 0);
dct.Add('4', 0);
dct.Add('5', 0);
dct.Add('6', 0);
dct.Add('7', 0);
dct.Add('8', 0);
dct.Add('9', 0);
arr = Console.ReadLine().Split(' ');
arrLength = arr.Length;
foreach (string str in arr)
{
char[] ch = str.ToCharArray();
ch = ch.Distinct<char>().ToArray();
foreach (char c in ch)
{
Exists(c, Array.IndexOf(arr, str));
}
}
int val = dct.Values.Max();
foreach(KeyValuePair<char,int> v in dct.Where(x => x.Value == val))
{
Console.WriteLine("Common digit {0} with frequency {1} ",v.Key,v.Value+1);
}
Console.ReadLine();
}
public static bool Exists(char c, int pos)
{
int count = 0;
if (pos == arrLength - 1)
return false;
for (int i = pos; i < arrLength - 1; i++)
{
if (arr[i + 1].ToCharArray().Contains(c))
{
count++;
if (count > dct[c])
dct[c] = count;
}
else
break;
}
return true;
}
}
As somebody else pointed out, if you have 10^18 numbers then this is going to be a lot more data than you can fit into memory. So you need a streaming solution. You also don't want to spend a lot of time on memory allocation or converting strings to character arrays, calling functions to de-duplicate digits, etc. Ideally, you need a solution that looks at each character once.
The memory requirement of the program below is very small: just two small arrays of long integers.
The algorithm I developed maintains two arrays of counts per digit. One is the maximum number of consecutive occurrences of a digit, and the other is the most recent count of consecutive occurrences.
The code itself reads the file character-by-character, accumulating digits until it encounters a character that is not a digit, then it updates the current counts array for each digit encountered. If the current count exceeds the maximum count, then the max count for that digit is updated. If a digit doesn't appear in a number, then its current count is reset to 0.
The occurrence of individual digits in a number is maintained by setting bits in the digits variable. That way, a number like 1221 will not count the digits twice.
using (var input = File.OpenText("filename"))
{
    var maxCounts = new long[] { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
    var latestCounts = new long[] { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
    char prevChar = ' ';
    int digits = 0;
    while (!input.EndOfStream)
    {
        var c = (char)input.Read();
        // If the character is a digit, set the corresponding bit
        if (char.IsDigit(c))
        {
            digits |= (1 << (c - '0'));
            prevChar = c;
            continue;
        }
        // test here to prevent resetting counts when there are multiple non-digit
        // characters between numbers.
        if (!char.IsDigit(prevChar))
        {
            continue;
        }
        prevChar = c;
        // digits has a bit set for every digit that occurred in the number.
        // Update the counts arrays.
        // For each of the first 10 bits, update the corresponding count.
        for (int i = 0; i < 10; ++i)
        {
            if ((digits & 1) == 1)
            {
                ++latestCounts[i];
                if (latestCounts[i] > maxCounts[i])
                {
                    maxCounts[i] = latestCounts[i];
                }
            }
            else
            {
                latestCounts[i] = 0;
            }
            // Shift the next bit into place.
            digits >>= 1;
        }
        digits = 0;
    }
    // Note: if the file ends in a digit, the bits for the final number are still
    // in 'digits' here and need one last pass of the update loop above.
}
This code minimizes the processing required, but the program's running time will be dominated by the speed at which you can read the file. There are optimizations you can make to increase the input speed, but ultimately you're limited to your system's data transfer speed.
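As one example of such an input-speed optimization, here is a sketch that reads the file in large chunks and runs the same per-character logic over a buffer instead of calling Read() once per character (the buffer sizes are arbitrary choices of mine; Encoding comes from System.Text):
using (var stream = File.OpenRead("filename"))
using (var input = new StreamReader(stream, Encoding.ASCII, false, 1 << 20))
{
    var buffer = new char[1 << 16];
    int read;
    while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
    {
        for (int i = 0; i < read; i++)
        {
            char c = buffer[i];
            // ... same per-character digit/bit logic as above ...
        }
    }
}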
I'll give you three versions.
Basically, I just loaded up a list of random ints as strings (the scale is how many) and ran it on Core and Framework to see. Each test was run 10 times and averaged.
Mine1
Uses Distinct
public unsafe class Mine : Benchmark<List<string>, char>
{
protected override char InternalRun()
{
var result = new int[10];
var asd = Input.Select(x => new string(x.Distinct().ToArray())).ToList();
var raw = string.Join("", asd);
fixed (char* pInput = raw)
{
var len = pInput + raw.Length;
for (var p = pInput; p < len; p++)
{
result[*p - 48]++;
}
}
return (char)(result.ToList().IndexOf(result.Max()) + '0');
}
}
Mine2
Basically this uses a second array to work things out
public unsafe class Mine2 : Benchmark<List<string>, char>
{
protected override char InternalRun()
{
var result = new int[10];
var current = new int[10];
var raw = string.Join(" ", Input);
fixed (char* pInput = raw)
{
var len = pInput + raw.Length;
for (var p = pInput; p < len; p++)
if (*p != ' ')
current[*p - 48] = 1;
else
for (var i = 0; i < 10; i++)
{
result[i] += current[i];
current[i] = 0;
}
}
return (char)(result.ToList().IndexOf(result.Max()) + '0');
}
}
Mine3
No Joins or string allocation
public unsafe class Mine3 : Benchmark<List<string>, char>
{
protected override char InternalRun()
{
var result = new int[10];
foreach (var item in Input)
fixed (char* pInput = item)
{
var current = new int[10];
var len = pInput + item.Length;
for (var p = pInput; p < len; p++)
current[*p - 48] = 1;
for (var i = 0; i < 10; i++)
{
result[i] += current[i];
current[i] = 0;
}
}
return (char)(result.ToList().IndexOf(result.Max()) + '0');
}
}
Results: .NET Framework 4.7.1
Mode : Release
Test Framework : .NET Framework 4.7.1
Benchmarks runs : 10 times (averaged)
Scale : 10,000
Name | Average | Fastest | StDv | Cycles | Pass | Gain
--------------------------------------------------------------------------
Mine3 | 0.533 ms | 0.431 ms | 0.10 | 1,751,372 | Base | 0.00 %
Mine2 | 0.994 ms | 0.773 ms | 0.38 | 3,100,896 | Yes | -86.63 %
Mine | 8.122 ms | 7.012 ms | 1.29 | 27,480,083 | Yes | -1,424.78 %
Original | 20.729 ms | 16.044 ms | 4.56 | 65,316,558 | No | -3,791.47 %
Scale : 100,000
Name | Average | Fastest | StDv | Cycles | Pass | Gain
------------------------------------------------------------------------------
Mine3 | 4.766 ms | 4.475 ms | 0.34 | 16,140,716 | Base | 0.00 %
Mine2 | 8.424 ms | 7.890 ms | 0.33 | 28,577,416 | Yes | -76.76 %
Mine | 96.650 ms | 93.066 ms | 3.35 | 327,615,266 | Yes | -1,927.94 %
Original | 163.342 ms | 154.070 ms | 12.61 | 550,038,934 | No | -3,327.32 %
Scale : 1,000,000
Name | Average | Fastest | StDv | Cycles | Pass | Gain
------------------------------------------------------------------------------------
Mine3 | 49.827 ms | 48.600 ms | 1.19 | 169,162,589 | Base | 0.00 %
Mine2 | 106.334 ms | 97.641 ms | 6.53 | 359,773,719 | Yes | -113.41 %
Mine | 1,051.600 ms | 1,000.731 ms | 35.75 | 3,511,515,189 | Yes | -2,010.51 %
Original | 1,640.385 ms | 1,588.431 ms | 65.50 | 5,538,915,638 | No | -3,192.18 %
Results: .NET Core 2.0
Mode : Release
Test Framework : .NET Core 2.0
Benchmarks runs : 10 times (averaged)
Scale : 10,000
Name | Average | Fastest | StDv | Cycles | Pass | Gain
--------------------------------------------------------------------------
Mine3 | 0.476 ms | 0.353 ms | 0.12 | 1,545,995 | Base | 0.00 %
Mine2 | 0.554 ms | 0.551 ms | 0.00 | 1,883,570 | Yes | -16.23 %
Mine | 7.585 ms | 5.875 ms | 1.27 | 25,580,339 | Yes | -1,492.28 %
Original | 21.380 ms | 16.263 ms | 6.46 | 65,741,909 | No | -4,388.14 %
Scale : 100,000
Name | Average | Fastest | StDv | Cycles | Pass | Gain
------------------------------------------------------------------------------
Mine3 | 3.946 ms | 3.685 ms | 0.25 | 13,409,181 | Base | 0.00 %
Mine2 | 6.203 ms | 5.796 ms | 0.33 | 21,042,340 | Yes | -57.21 %
Mine | 72.975 ms | 68.599 ms | 4.13 | 246,471,960 | Yes | -1,749.41 %
Original | 161.400 ms | 145.664 ms | 19.37 | 544,703,761 | Yes | -3,990.40 %
Scale : 1,000,000
Name | Average | Fastest | StDv | Cycles | Pass | Gain
------------------------------------------------------------------------------------
Mine3 | 41.036 ms | 38.928 ms | 2.47 | 139,045,736 | Base | 0.00 %
Mine2 | 71.283 ms | 68.777 ms | 2.49 | 241,525,269 | Yes | -73.71 %
Mine | 749.250 ms | 720.809 ms | 27.79 | 2,479,171,863 | Yes | -1,725.84 %
Original | 1,517.240 ms | 1,477.321 ms | 48.94 | 5,142,422,700 | No | -3,597.35 %
Summary
String allocation, Join, and Distinct suck for performance. If you need more performance, you could probably break the list up into workloads and smash this in parallel.
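A sketch of that parallel idea, using Parallel.ForEach with per-thread tallies merged at the end (Input is the same List<string> the benchmarks use; System.Threading.Tasks and System.Linq are assumed, and the strings are digit-only, as in Mine3):
var result = new int[10];
Parallel.ForEach(
    Input,
    () => new int[10],                // per-thread tallies
    (item, state, local) =>
    {
        var seen = new bool[10];      // de-duplicate digits within one number
        foreach (var ch in item)
            seen[ch - '0'] = true;
        for (var i = 0; i < 10; i++)
            if (seen[i]) local[i]++;
        return local;
    },
    local =>
    {
        lock (result)
            for (var i = 0; i < 10; i++)
                result[i] += local[i];
    });
var answer = (char)(result.ToList().IndexOf(result.Max()) + '0');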

Comparing Byte Arrays In C# (without for loop) [duplicate]

How can I do this fast?
Sure I can do this:
static bool ByteArrayCompare(byte[] a1, byte[] a2)
{
if (a1.Length != a2.Length)
return false;
for (int i=0; i<a1.Length; i++)
if (a1[i]!=a2[i])
return false;
return true;
}
But I'm looking for either a BCL function or some highly optimized proven way to do this.
java.util.Arrays.equals((sbyte[])(Array)a1, (sbyte[])(Array)a2);
works nicely, but it doesn't look like that would work for x64.
Note my super-fast answer here.
You can use Enumerable.SequenceEqual method.
using System;
using System.Linq;
...
var a1 = new int[] { 1, 2, 3};
var a2 = new int[] { 1, 2, 3};
var a3 = new int[] { 1, 2, 4};
var x = a1.SequenceEqual(a2); // true
var y = a1.SequenceEqual(a3); // false
If you can't use .NET 3.5 for some reason, your method is OK.
The compiler/run-time environment will optimize your loop, so you don't need to worry about performance.
P/Invoke powers activate!
[DllImport("msvcrt.dll", CallingConvention=CallingConvention.Cdecl)]
static extern int memcmp(byte[] b1, byte[] b2, long count);
static bool ByteArrayCompare(byte[] b1, byte[] b2)
{
// Validate buffers are the same length.
// This also ensures that the count does not exceed the length of either buffer.
return b1.Length == b2.Length && memcmp(b1, b2, b1.Length) == 0;
}
Span<T> offers an extremely competitive alternative without having to throw confusing and/or non-portable fluff into your own application's code base:
// byte[] is implicitly convertible to ReadOnlySpan<byte>
static bool ByteArrayCompare(ReadOnlySpan<byte> a1, ReadOnlySpan<byte> a2)
{
return a1.SequenceEqual(a2);
}
The (guts of the) implementation as of .NET 6.0.4 can be found here.
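Usage is direct, since byte[] converts implicitly (a small illustration):
byte[] x = { 1, 2, 3 };
byte[] y = { 1, 2, 3 };
Console.WriteLine(ByteArrayCompare(x, y));                   // True
Console.WriteLine(ByteArrayCompare(x, new byte[] { 1, 2 })); // False: lengths differ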
I've revised @EliArbel's gist to add this method as SpansEqual, drop most of the less interesting performers in others' benchmarks, run it with different array sizes, output graphs, and mark SpansEqual as the baseline so that it reports how the different methods compare to SpansEqual.
The below numbers are from the results, lightly edited to remove "Error" column.
| Method | ByteCount | Mean | StdDev | Ratio | RatioSD |
|-------------- |----------- |-------------------:|----------------:|------:|--------:|
| SpansEqual | 15 | 2.074 ns | 0.0233 ns | 1.00 | 0.00 |
| LongPointers | 15 | 2.854 ns | 0.0632 ns | 1.38 | 0.03 |
| Unrolled | 15 | 12.449 ns | 0.2487 ns | 6.00 | 0.13 |
| PInvokeMemcmp | 15 | 7.525 ns | 0.1057 ns | 3.63 | 0.06 |
| | | | | | |
| SpansEqual | 1026 | 15.629 ns | 0.1712 ns | 1.00 | 0.00 |
| LongPointers | 1026 | 46.487 ns | 0.2938 ns | 2.98 | 0.04 |
| Unrolled | 1026 | 23.786 ns | 0.1044 ns | 1.52 | 0.02 |
| PInvokeMemcmp | 1026 | 28.299 ns | 0.2781 ns | 1.81 | 0.03 |
| | | | | | |
| SpansEqual | 1048585 | 17,920.329 ns | 153.0750 ns | 1.00 | 0.00 |
| LongPointers | 1048585 | 42,077.448 ns | 309.9067 ns | 2.35 | 0.02 |
| Unrolled | 1048585 | 29,084.901 ns | 428.8496 ns | 1.62 | 0.03 |
| PInvokeMemcmp | 1048585 | 30,847.572 ns | 213.3162 ns | 1.72 | 0.02 |
| | | | | | |
| SpansEqual | 2147483591 | 124,752,376.667 ns | 552,281.0202 ns | 1.00 | 0.00 |
| LongPointers | 2147483591 | 139,477,269.231 ns | 331,458.5429 ns | 1.12 | 0.00 |
| Unrolled | 2147483591 | 137,617,423.077 ns | 238,349.5093 ns | 1.10 | 0.00 |
| PInvokeMemcmp | 2147483591 | 138,373,253.846 ns | 288,447.8278 ns | 1.11 | 0.01 |
I was surprised to see SpansEqual not come out on top for the max-array-size methods, but the difference is so minor that I don't think it'll ever matter. After refreshing to run on .NET 6.0.4 with my newer hardware, SpansEqual now comfortably outperforms all others at all array sizes.
My system info:
BenchmarkDotNet=v0.13.1, OS=Windows 10.0.22000
AMD Ryzen 9 5900X, 1 CPU, 24 logical and 12 physical cores
.NET SDK=6.0.202
[Host] : .NET 6.0.4 (6.0.422.16404), X64 RyuJIT
DefaultJob : .NET 6.0.4 (6.0.422.16404), X64 RyuJIT
There's a new built-in solution for this in .NET 4 - IStructuralEquatable
static bool ByteArrayCompare(byte[] a1, byte[] a2)
{
return StructuralComparisons.StructuralEqualityComparer.Equals(a1, a2);
}
Edit: the modern fast way is to use a1.SequenceEqual(a2).
User gil suggested unsafe code which spawned this solution:
// Copyright (c) 2008-2013 Hafthor Stefansson
// Distributed under the MIT/X11 software license
// Ref: http://www.opensource.org/licenses/mit-license.php.
static unsafe bool UnsafeCompare(byte[] a1, byte[] a2) {
unchecked {
if(a1==a2) return true;
if(a1==null || a2==null || a1.Length!=a2.Length)
return false;
fixed (byte* p1=a1, p2=a2) {
byte* x1=p1, x2=p2;
int l = a1.Length;
for (int i=0; i < l/8; i++, x1+=8, x2+=8)
if (*((long*)x1) != *((long*)x2)) return false;
if ((l & 4)!=0) { if (*((int*)x1)!=*((int*)x2)) return false; x1+=4; x2+=4; }
if ((l & 2)!=0) { if (*((short*)x1)!=*((short*)x2)) return false; x1+=2; x2+=2; }
if ((l & 1)!=0) if (*((byte*)x1) != *((byte*)x2)) return false;
return true;
}
}
}
which does 64-bit based comparison for as much of the array as possible. This kind of counts on the fact that the arrays start qword aligned. It'll work if not qword aligned, just not as fast as if it were.
It performs about seven times faster than the simple `for` loop. Using the J# library performed equivalently to the original `for` loop. Using .SequenceEqual runs around seven times slower, I think just because it is using IEnumerator.MoveNext. I imagine LINQ-based solutions being at least that slow or worse.
If you are not opposed to doing it, you can import the J# assembly "vjslib.dll" and use its Arrays.equals(byte[], byte[]) method...
Don't blame me if someone laughs at you though...
EDIT: For what little it is worth, I used Reflector to disassemble the code for that, and here is what it looks like:
public static bool equals(sbyte[] a1, sbyte[] a2)
{
if (a1 == a2)
{
return true;
}
if ((a1 != null) && (a2 != null))
{
if (a1.Length != a2.Length)
{
return false;
}
for (int i = 0; i < a1.Length; i++)
{
if (a1[i] != a2[i])
{
return false;
}
}
return true;
}
return false;
}
.NET 3.5 and newer have a new public type, System.Data.Linq.Binary, that encapsulates byte[]. It implements IEquatable<Binary>, which (in effect) compares two byte arrays. Note that System.Data.Linq.Binary also has an implicit conversion operator from byte[].
MSDN documentation: System.Data.Linq.Binary
Reflector decompile of the Equals method:
private bool EqualsTo(Binary binary)
{
if (this != binary)
{
if (binary == null)
{
return false;
}
if (this.bytes.Length != binary.bytes.Length)
{
return false;
}
if (this.hashCode != binary.hashCode)
{
return false;
}
int index = 0;
int length = this.bytes.Length;
while (index < length)
{
if (this.bytes[index] != binary.bytes[index])
{
return false;
}
index++;
}
}
return true;
}
An interesting twist is that they only proceed to the byte-by-byte comparison loop if the hashes of the two Binary objects are the same. This, however, comes at the cost of computing the hash in the constructor of Binary objects (by traversing the array with a for loop :-) ).
The above implementation means that in the worst case you may have to traverse the arrays three times: first to compute the hash of array1, then to compute the hash of array2, and finally (because this is the worst-case scenario, lengths and hashes equal) to compare the bytes in array1 with the bytes in array2.
Overall, even though System.Data.Linq.Binary is built into BCL, I don't think it is the fastest way to compare two byte arrays :-|.
I posted a similar question about checking if byte[] is full of zeroes. (SIMD code was beaten, so I removed it from this answer.) Here is the fastest code from my comparisons:
static unsafe bool EqualBytesLongUnrolled (byte[] data1, byte[] data2)
{
if (data1 == data2)
return true;
if (data1.Length != data2.Length)
return false;
fixed (byte* bytes1 = data1, bytes2 = data2) {
int len = data1.Length;
int rem = len % (sizeof(long) * 16);
long* b1 = (long*)bytes1;
long* b2 = (long*)bytes2;
long* e1 = (long*)(bytes1 + len - rem);
while (b1 < e1) {
if (*(b1) != *(b2) || *(b1 + 1) != *(b2 + 1) ||
*(b1 + 2) != *(b2 + 2) || *(b1 + 3) != *(b2 + 3) ||
*(b1 + 4) != *(b2 + 4) || *(b1 + 5) != *(b2 + 5) ||
*(b1 + 6) != *(b2 + 6) || *(b1 + 7) != *(b2 + 7) ||
*(b1 + 8) != *(b2 + 8) || *(b1 + 9) != *(b2 + 9) ||
*(b1 + 10) != *(b2 + 10) || *(b1 + 11) != *(b2 + 11) ||
*(b1 + 12) != *(b2 + 12) || *(b1 + 13) != *(b2 + 13) ||
*(b1 + 14) != *(b2 + 14) || *(b1 + 15) != *(b2 + 15))
return false;
b1 += 16;
b2 += 16;
}
for (int i = 0; i < rem; i++)
if (data1 [len - 1 - i] != data2 [len - 1 - i])
return false;
return true;
}
}
Measured on two 256MB byte arrays:
UnsafeCompare : 86,8784 ms
EqualBytesSimd : 71,5125 ms
EqualBytesSimdUnrolled : 73,1917 ms
EqualBytesLongUnrolled : 39,8623 ms
using System.Linq; //SequenceEqual
byte[] ByteArray1 = null;
byte[] ByteArray2 = null;
ByteArray1 = MyFunct1();
ByteArray2 = MyFunct2();
if (ByteArray1.SequenceEqual<byte>(ByteArray2) == true)
{
MessageBox.Show("Match");
}
else
{
MessageBox.Show("Don't match");
}
Let's add one more!
Recently Microsoft released a special NuGet package, System.Runtime.CompilerServices.Unsafe. It's special because it's written in IL, and provides low-level functionality not directly available in C#.
One of its methods, Unsafe.As<T>(object) allows casting any reference type to another reference type, skipping any safety checks. This is usually a very bad idea, but if both types have the same structure, it can work. So we can use this to cast a byte[] to a long[]:
bool CompareWithUnsafeLibrary(byte[] a1, byte[] a2)
{
if (a1.Length != a2.Length) return false;
var longSize = (int)Math.Floor(a1.Length / 8.0);
var long1 = Unsafe.As<long[]>(a1);
var long2 = Unsafe.As<long[]>(a2);
for (var i = 0; i < longSize; i++)
{
if (long1[i] != long2[i]) return false;
}
for (var i = longSize * 8; i < a1.Length; i++)
{
if (a1[i] != a2[i]) return false;
}
return true;
}
Note that long1.Length would still return the original array's length, since it's stored in a field in the array's memory structure.
This method is not quite as fast as other methods demonstrated here, but it is a lot faster than the naive method, doesn't use unsafe code or P/Invoke or pinning, and the implementation is quite straightforward (IMO). Here are some BenchmarkDotNet results from my machine:
BenchmarkDotNet=v0.10.3.0, OS=Microsoft Windows NT 6.2.9200.0
Processor=Intel(R) Core(TM) i7-4870HQ CPU 2.50GHz, ProcessorCount=8
Frequency=2435775 Hz, Resolution=410.5470 ns, Timer=TSC
[Host] : Clr 4.0.30319.42000, 64bit RyuJIT-v4.6.1637.0
DefaultJob : Clr 4.0.30319.42000, 64bit RyuJIT-v4.6.1637.0
Method | Mean | StdDev |
----------------------- |-------------- |---------- |
UnsafeLibrary | 125.8229 ns | 0.3588 ns |
UnsafeCompare | 89.9036 ns | 0.8243 ns |
JSharpEquals | 1,432.1717 ns | 1.3161 ns |
EqualBytesLongUnrolled | 43.7863 ns | 0.8923 ns |
NewMemCmp | 65.4108 ns | 0.2202 ns |
ArraysEqual | 910.8372 ns | 2.6082 ns |
PInvokeMemcmp | 52.7201 ns | 0.1105 ns |
I've also created a gist with all the tests.
I developed a method that slightly beats memcmp() (plinth's answer) and very slightly beats EqualBytesLongUnrolled() (Arek Bulski's answer) on my PC. Basically, it unrolls the loop by 4 instead of 8.
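That unrolled-by-4 method isn't shown in the answer; a minimal reconstruction of the idea as described (my sketch, not the author's exact code) would be:
static unsafe bool CompareUnrolled4(byte[] a1, byte[] a2)
{
    if (a1 == a2) return true;
    if (a1 == null || a2 == null || a1.Length != a2.Length) return false;
    fixed (byte* p1 = a1, p2 = a2)
    {
        int len = a1.Length;
        int rem = len % (sizeof(long) * 4);
        long* b1 = (long*)p1, b2 = (long*)p2;
        long* e1 = (long*)(p1 + len - rem);
        while (b1 < e1)
        {
            // compare 32 bytes (4 longs) per iteration
            if (*b1 != *b2 || *(b1 + 1) != *(b2 + 1) ||
                *(b1 + 2) != *(b2 + 2) || *(b1 + 3) != *(b2 + 3))
                return false;
            b1 += 4;
            b2 += 4;
        }
        // compare the remaining tail byte by byte
        for (int i = len - rem; i < len; i++)
            if (a1[i] != a2[i]) return false;
        return true;
    }
}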
Update 30 Mar. 2019:
Starting in .NET core 3.0, we have SIMD support!
This solution is fastest by a considerable margin on my PC:
#if NETCOREAPP3_0
using System.Runtime.Intrinsics.X86;
#endif
…
public static unsafe bool Compare(byte[] arr0, byte[] arr1)
{
if (arr0 == arr1)
{
return true;
}
if (arr0 == null || arr1 == null)
{
return false;
}
if (arr0.Length != arr1.Length)
{
return false;
}
if (arr0.Length == 0)
{
return true;
}
fixed (byte* b0 = arr0, b1 = arr1)
{
#if NETCOREAPP3_0
if (Avx2.IsSupported)
{
return Compare256(b0, b1, arr0.Length);
}
else if (Sse2.IsSupported)
{
return Compare128(b0, b1, arr0.Length);
}
else
#endif
{
return Compare64(b0, b1, arr0.Length);
}
}
}
#if NETCOREAPP3_0
public static unsafe bool Compare256(byte* b0, byte* b1, int length)
{
byte* lastAddr = b0 + length;
byte* lastAddrMinus128 = lastAddr - 128;
const int mask = -1;
while (b0 < lastAddrMinus128) // unroll the loop so that we are comparing 128 bytes at a time.
{
if (Avx2.MoveMask(Avx2.CompareEqual(Avx.LoadVector256(b0), Avx.LoadVector256(b1))) != mask)
{
return false;
}
if (Avx2.MoveMask(Avx2.CompareEqual(Avx.LoadVector256(b0 + 32), Avx.LoadVector256(b1 + 32))) != mask)
{
return false;
}
if (Avx2.MoveMask(Avx2.CompareEqual(Avx.LoadVector256(b0 + 64), Avx.LoadVector256(b1 + 64))) != mask)
{
return false;
}
if (Avx2.MoveMask(Avx2.CompareEqual(Avx.LoadVector256(b0 + 96), Avx.LoadVector256(b1 + 96))) != mask)
{
return false;
}
b0 += 128;
b1 += 128;
}
while (b0 < lastAddr)
{
if (*b0 != *b1) return false;
b0++;
b1++;
}
return true;
}
public static unsafe bool Compare128(byte* b0, byte* b1, int length)
{
byte* lastAddr = b0 + length;
byte* lastAddrMinus64 = lastAddr - 64;
const int mask = 0xFFFF;
while (b0 < lastAddrMinus64) // unroll the loop so that we are comparing 64 bytes at a time.
{
if (Sse2.MoveMask(Sse2.CompareEqual(Sse2.LoadVector128(b0), Sse2.LoadVector128(b1))) != mask)
{
return false;
}
if (Sse2.MoveMask(Sse2.CompareEqual(Sse2.LoadVector128(b0 + 16), Sse2.LoadVector128(b1 + 16))) != mask)
{
return false;
}
if (Sse2.MoveMask(Sse2.CompareEqual(Sse2.LoadVector128(b0 + 32), Sse2.LoadVector128(b1 + 32))) != mask)
{
return false;
}
if (Sse2.MoveMask(Sse2.CompareEqual(Sse2.LoadVector128(b0 + 48), Sse2.LoadVector128(b1 + 48))) != mask)
{
return false;
}
b0 += 64;
b1 += 64;
}
while (b0 < lastAddr)
{
if (*b0 != *b1) return false;
b0++;
b1++;
}
return true;
}
#endif
public static unsafe bool Compare64(byte* b0, byte* b1, int length)
{
byte* lastAddr = b0 + length;
byte* lastAddrMinus32 = lastAddr - 32;
while (b0 < lastAddrMinus32) // unroll the loop so that we are comparing 32 bytes at a time.
{
if (*(ulong*)b0 != *(ulong*)b1) return false;
if (*(ulong*)(b0 + 8) != *(ulong*)(b1 + 8)) return false;
if (*(ulong*)(b0 + 16) != *(ulong*)(b1 + 16)) return false;
if (*(ulong*)(b0 + 24) != *(ulong*)(b1 + 24)) return false;
b0 += 32;
b1 += 32;
}
while (b0 < lastAddr)
{
if (*b0 != *b1) return false;
b0++;
b1++;
}
return true;
}
I would use unsafe code and run the for loop comparing Int32 pointers.
Maybe you should also consider checking that the arrays are non-null.
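A sketch of that suggestion, Int32-sized reads with a byte-by-byte tail, including the null checks:
static unsafe bool CompareInt32(byte[] a1, byte[] a2)
{
    if (ReferenceEquals(a1, a2)) return true;
    if (a1 == null || a2 == null || a1.Length != a2.Length) return false;
    fixed (byte* p1 = a1, p2 = a2)
    {
        int* x1 = (int*)p1;
        int* x2 = (int*)p2;
        int words = a1.Length / 4;
        for (int i = 0; i < words; i++)
            if (x1[i] != x2[i]) return false;
        // leftover 0-3 bytes
        for (int i = words * 4; i < a1.Length; i++)
            if (p1[i] != p2[i]) return false;
        return true;
    }
}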
If you look at how .NET does string.Equals, you see that it uses a private method called EqualsHelper which has an "unsafe" pointer implementation. .NET Reflector is your friend to see how things are done internally.
This can be used as a template for byte array comparison, which I implemented in the blog post Fast byte array comparison in C#. I also did some rudimentary benchmarks to see when a safe implementation is faster than the unsafe one.
That said, unless you really need killer performance, I'd go for a simple for loop comparison.
For those of you that care about order (i.e. want your comparison to return an int like memcmp does, instead of just a bool), .NET Core 3.0 (and presumably .NET Standard 2.1) includes a Span.SequenceCompareTo(...) extension method (plus Span.SequenceEqual) that can be used to compare two ReadOnlySpan<T> instances (where T: IComparable<T>).
In the original GitHub proposal, the discussion included approach comparisons with jump table calculations, reading a byte[] as long[], SIMD usage, and p/invoke to the CLR implementation's memcmp.
Going forward, this should be your go-to method for comparing byte arrays or byte ranges (as should using Span<byte> instead of byte[] for your .NET Standard 2.1 APIs), and it is fast enough that you should no longer care about optimizing it (and no, despite the similarities in name, it does not perform as abysmally as the horrid Enumerable.SequenceEqual).
#if NETCOREAPP3_0_OR_GREATER
// Using the platform-native Span<T>.SequenceEqual<T>(..)
public static int Compare(byte[] range1, int offset1, byte[] range2, int offset2, int count)
{
var span1 = range1.AsSpan(offset1, count);
var span2 = range2.AsSpan(offset2, count);
return span1.SequenceCompareTo(span2);
// or, if you don't care about ordering
// return span1.SequenceEqual(span2);
}
#else
// The most basic implementation, in platform-agnostic, safe C#
public static bool Compare(byte[] range1, int offset1, byte[] range2, int offset2, int count)
{
// Working backwards lets the compiler optimize away bound checking after the first loop
for (int i = count - 1; i >= 0; --i)
{
if (range1[offset1 + i] != range2[offset2 + i])
{
return false;
}
}
return true;
}
#endif
I did some measurements using attached program .net 4.7 release build without the debugger attached. I think people have been using the wrong metric since what you are about if you care about speed here is how long it takes to figure out if two byte arrays are equal. i.e. throughput in bytes.
StructuralComparison : 4.6 MiB/s
for : 274.5 MiB/s
ToUInt32 : 263.6 MiB/s
ToUInt64 : 474.9 MiB/s
memcmp : 8500.8 MiB/s
As you can see, there's no better way than memcmp and it's orders of magnitude faster. A simple for loop is the second best option. And it still boggles my mind why Microsoft cannot simply include a Buffer.Compare method.
[Program.cs]:
using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Runtime.InteropServices;
using System.Text;
using System.Threading.Tasks;
namespace memcmp
{
class Program
{
static byte[] TestVector(int size)
{
var data = new byte[size];
using (var rng = new System.Security.Cryptography.RNGCryptoServiceProvider())
{
rng.GetBytes(data);
}
return data;
}
static TimeSpan Measure(string testCase, TimeSpan offset, Action action, bool ignore = false)
{
var t = Stopwatch.StartNew();
var n = 0L;
while (t.Elapsed < TimeSpan.FromSeconds(10))
{
action();
n++;
}
var elapsed = t.Elapsed - offset;
if (!ignore)
{
Console.WriteLine($"{testCase,-16} : {n / elapsed.TotalSeconds,16:0.0} MiB/s");
}
return elapsed;
}
[DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl)]
static extern int memcmp(byte[] b1, byte[] b2, long count);
static void Main(string[] args)
{
// how quickly can we establish if two sequences of bytes are equal?
// note that we are testing the speed of different comparsion methods
var a = TestVector(1024 * 1024); // 1 MiB
var b = (byte[])a.Clone();
// was meant to offset the overhead of everything but copying but my attempt was a horrible mistake... should have reacted sooner due to the initially ridiculous throughput values...
// Measure("offset", new TimeSpan(), () => { return; }, ignore: true);
var offset = TimeZone.Zero
Measure("StructuralComparison", offset, () =>
{
StructuralComparisons.StructuralEqualityComparer.Equals(a, b);
});
Measure("for", offset, () =>
{
for (int i = 0; i < a.Length; i++)
{
if (a[i] != b[i]) break;
}
});
Measure("ToUInt32", offset, () =>
{
for (int i = 0; i < a.Length; i += 4)
{
if (BitConverter.ToUInt32(a, i) != BitConverter.ToUInt32(b, i)) break;
}
});
Measure("ToUInt64", offset, () =>
{
for (int i = 0; i < a.Length; i += 8)
{
if (BitConverter.ToUInt64(a, i) != BitConverter.ToUInt64(b, i)) break;
}
});
Measure("memcmp", offset, () =>
{
memcmp(a, b, a.Length);
});
}
}
}
Couldn't find a solution I'm completely happy with (reasonable performance, but no unsafe code/pinvoke) so I came up with this, nothing really original, but works:
/// <summary>
///
/// </summary>
/// <param name="array1"></param>
/// <param name="array2"></param>
/// <param name="bytesToCompare"> 0 means compare entire arrays</param>
/// <returns></returns>
public static bool ArraysEqual(byte[] array1, byte[] array2, int bytesToCompare = 0)
{
if (array1.Length != array2.Length) return false;
var length = (bytesToCompare == 0) ? array1.Length : bytesToCompare;
var tailIdx = length - length % sizeof(Int64);
//check in 8 byte chunks
for (var i = 0; i < tailIdx; i += sizeof(Int64))
{
if (BitConverter.ToInt64(array1, i) != BitConverter.ToInt64(array2, i)) return false;
}
//check the remainder of the array, always shorter than 8 bytes
for (var i = tailIdx; i < length; i++)
{
if (array1[i] != array2[i]) return false;
}
return true;
}
Performance compared with some of the other solutions on this page:
Simple Loop: 19837 ticks, 1.00
*BitConverter: 4886 ticks, 4.06
UnsafeCompare: 1636 ticks, 12.12
EqualBytesLongUnrolled: 637 ticks, 31.09
P/Invoke memcmp: 369 ticks, 53.67
Tested in linqpad, 1000000 bytes identical arrays (worst case scenario), 500 iterations each.
It seems that EqualBytesLongUnrolled is the best from the above suggested.
Skipped methods (Enumerable.SequenceEqual,StructuralComparisons.StructuralEqualityComparer.Equals), were not-patient-for-slow. On 265MB arrays I have measured this:
Host Process Environment Information:
BenchmarkDotNet.Core=v0.9.9.0
OS=Microsoft Windows NT 6.2.9200.0
Processor=Intel(R) Core(TM) i7-3770 CPU 3.40GHz, ProcessorCount=8
Frequency=3323582 ticks, Resolution=300.8802 ns, Timer=TSC
CLR=MS.NET 4.0.30319.42000, Arch=64-bit RELEASE [RyuJIT]
GC=Concurrent Workstation
JitModules=clrjit-v4.6.1590.0
Type=CompareMemoriesBenchmarks Mode=Throughput
Method | Median | StdDev | Scaled | Scaled-SD |
----------------------- |------------ |---------- |------- |---------- |
NewMemCopy | 30.0443 ms | 1.1880 ms | 1.00 | 0.00 |
EqualBytesLongUnrolled | 29.9917 ms | 0.7480 ms | 0.99 | 0.04 |
msvcrt_memcmp | 30.0930 ms | 0.2964 ms | 1.00 | 0.03 |
UnsafeCompare | 31.0520 ms | 0.7072 ms | 1.03 | 0.04 |
ByteArrayCompare | 212.9980 ms | 2.0776 ms | 7.06 | 0.25 |
OS=Windows
Processor=?, ProcessorCount=8
Frequency=3323582 ticks, Resolution=300.8802 ns, Timer=TSC
CLR=CORE, Arch=64-bit ? [RyuJIT]
GC=Concurrent Workstation
dotnet cli version: 1.0.0-preview2-003131
Type=CompareMemoriesBenchmarks Mode=Throughput
Method | Median | StdDev | Scaled | Scaled-SD |
----------------------- |------------ |---------- |------- |---------- |
NewMemCopy | 30.1789 ms | 0.0437 ms | 1.00 | 0.00 |
EqualBytesLongUnrolled | 30.1985 ms | 0.1782 ms | 1.00 | 0.01 |
msvcrt_memcmp | 30.1084 ms | 0.0660 ms | 1.00 | 0.00 |
UnsafeCompare | 31.1845 ms | 0.4051 ms | 1.03 | 0.01 |
ByteArrayCompare | 212.0213 ms | 0.1694 ms | 7.03 | 0.01 |
For comparing short byte arrays the following is an interesting hack:
if(myByteArray1.Length != myByteArray2.Length) return false;
if(myByteArray1.Length == 8)
return BitConverter.ToInt64(myByteArray1, 0) == BitConverter.ToInt64(myByteArray2, 0);
else if(myByteArray.Length == 4)
return BitConverter.ToInt32(myByteArray2, 0) == BitConverter.ToInt32(myByteArray2, 0);
Then I would probably fall out to the solution listed in the question.
It'd be interesting to do a performance analysis of this code.
I have not seen many linq solutions here.
I am not sure of the performance implications, however I generally stick to linq as rule of thumb and then optimize later if necessary.
public bool CompareTwoArrays(byte[] array1, byte[] array2)
{
return !array1.Where((t, i) => t != array2[i]).Any();
}
Please do note this only works if they are the same size arrays.
an extension could look like so
public bool CompareTwoArrays(byte[] array1, byte[] array2)
{
if (array1.Length != array2.Length) return false;
return !array1.Where((t, i) => t != array2[i]).Any();
}
I thought about block-transfer acceleration methods built into many graphics cards. But then you would have to copy over all the data byte-wise, so this doesn't help you much if you don't want to implement a whole portion of your logic in unmanaged and hardware-dependent code...
Another way of optimization similar to the approach shown above would be to store as much of your data as possible in a long[] rather than a byte[] right from the start, for example if you are reading it sequentially from a binary file, or if you use a memory mapped file, read in data as long[] or single long values. Then, your comparison loop will only need 1/8th of the number of iterations it would have to do for a byte[] containing the same amount of data.
It is a matter of when and how often you need to compare vs. when and how often you need to access the data in a byte-by-byte manner, e.g. to use it in an API call as a parameter in a method that expects a byte[]. In the end, you only can tell if you really know the use case...
Sorry, if you're looking for a managed way you're already doing it correctly and to my knowledge there's no built in method in the BCL for doing this.
You should add some initial null checks and then just reuse it as if it where in BCL.
I settled on a solution inspired by the EqualBytesLongUnrolled method posted by ArekBulski with an additional optimization. In my instance, array differences in arrays tend to be near the tail of the arrays. In testing, I found that when this is the case for large arrays, being able to compare array elements in reverse order gives this solution a huge performance gain over the memcmp based solution. Here is that solution:
public enum CompareDirection { Forward, Backward }
private static unsafe bool UnsafeEquals(byte[] a, byte[] b, CompareDirection direction = CompareDirection.Forward)
{
// returns when a and b are same array or both null
if (a == b) return true;
// if either is null or different lengths, can't be equal
if (a == null || b == null || a.Length != b.Length)
return false;
const int UNROLLED = 16; // count of longs 'unrolled' in optimization
int size = sizeof(long) * UNROLLED; // 128 bytes (min size for 'unrolled' optimization)
int len = a.Length;
int n = len / size; // count of full 128 byte segments
int r = len % size; // count of remaining 'unoptimized' bytes
// pin the arrays and access them via pointers
fixed (byte* pb_a = a, pb_b = b)
{
if (r > 0 && direction == CompareDirection.Backward)
{
byte* pa = pb_a + len - 1;
byte* pb = pb_b + len - 1;
byte* phead = pb_a + len - r;
while(pa >= phead)
{
if (*pa != *pb) return false;
pa--;
pb--;
}
}
if (n > 0)
{
int nOffset = n * size;
if (direction == CompareDirection.Forward)
{
long* pa = (long*)pb_a;
long* pb = (long*)pb_b;
long* ptail = (long*)(pb_a + nOffset);
while (pa < ptail)
{
if (*(pa + 0) != *(pb + 0) || *(pa + 1) != *(pb + 1) ||
*(pa + 2) != *(pb + 2) || *(pa + 3) != *(pb + 3) ||
*(pa + 4) != *(pb + 4) || *(pa + 5) != *(pb + 5) ||
*(pa + 6) != *(pb + 6) || *(pa + 7) != *(pb + 7) ||
*(pa + 8) != *(pb + 8) || *(pa + 9) != *(pb + 9) ||
*(pa + 10) != *(pb + 10) || *(pa + 11) != *(pb + 11) ||
*(pa + 12) != *(pb + 12) || *(pa + 13) != *(pb + 13) ||
*(pa + 14) != *(pb + 14) || *(pa + 15) != *(pb + 15)
)
{
return false;
}
pa += UNROLLED;
pb += UNROLLED;
}
}
else
{
long* pa = (long*)(pb_a + nOffset);
long* pb = (long*)(pb_b + nOffset);
long* phead = (long*)pb_a;
while (phead < pa)
{
if (*(pa - 1) != *(pb - 1) || *(pa - 2) != *(pb - 2) ||
*(pa - 3) != *(pb - 3) || *(pa - 4) != *(pb - 4) ||
*(pa - 5) != *(pb - 5) || *(pa - 6) != *(pb - 6) ||
*(pa - 7) != *(pb - 7) || *(pa - 8) != *(pb - 8) ||
*(pa - 9) != *(pb - 9) || *(pa - 10) != *(pb - 10) ||
*(pa - 11) != *(pb - 11) || *(pa - 12) != *(pb - 12) ||
*(pa - 13) != *(pb - 13) || *(pa - 14) != *(pb - 14) ||
*(pa - 15) != *(pb - 15) || *(pa - 16) != *(pb - 16)
)
{
return false;
}
pa -= UNROLLED;
pb -= UNROLLED;
}
}
}
if (r > 0 && direction == CompareDirection.Forward)
{
byte* pa = pb_a + len - r;
byte* pb = pb_b + len - r;
byte* ptail = pb_a + len;
while(pa < ptail)
{
if (*pa != *pb) return false;
pa++;
pb++;
}
}
}
return true;
}
This is almost certainly much slower than any other version given here, but it was fun to write.
static bool ByteArrayEquals(byte[] a1, byte[] a2)
{
return a1.Zip(a2, (l, r) => l == r).All(x => x);
}
This is similar to others, but the difference here is that there is no falling through to the next highest number of bytes I can check at once, e.g. if I have 63 bytes (in my SIMD example) I can check the equality of the first 32 bytes, and then the last 32 bytes, which is faster than checking 32 bytes, 16 bytes, 8 bytes, and so on. The first check you enter is the only check you will need to compare all of the bytes.
This does come out on top in my tests, but just by a hair.
The following code is exactly how I tested it in airbreather/ArrayComparePerf.cs.
public unsafe bool SIMDNoFallThrough() #requires System.Runtime.Intrinsics.X86
{
if (a1 == null || a2 == null)
return false;
int length0 = a1.Length;
if (length0 != a2.Length) return false;
fixed (byte* b00 = a1, b01 = a2)
{
byte* b0 = b00, b1 = b01, last0 = b0 + length0, last1 = b1 + length0, last32 = last0 - 31;
if (length0 > 31)
{
while (b0 < last32)
{
if (Avx2.MoveMask(Avx2.CompareEqual(Avx.LoadVector256(b0), Avx.LoadVector256(b1))) != -1)
return false;
b0 += 32;
b1 += 32;
}
return Avx2.MoveMask(Avx2.CompareEqual(Avx.LoadVector256(last0 - 32), Avx.LoadVector256(last1 - 32))) == -1;
}
if (length0 > 15)
{
if (Sse2.MoveMask(Sse2.CompareEqual(Sse2.LoadVector128(b0), Sse2.LoadVector128(b1))) != 65535)
return false;
return Sse2.MoveMask(Sse2.CompareEqual(Sse2.LoadVector128(last0 - 16), Sse2.LoadVector128(last1 - 16))) == 65535;
}
if (length0 > 7)
{
if (*(ulong*)b0 != *(ulong*)b1)
return false;
return *(ulong*)(last0 - 8) == *(ulong*)(last1 - 8);
}
if (length0 > 3)
{
if (*(uint*)b0 != *(uint*)b1)
return false;
return *(uint*)(last0 - 4) == *(uint*)(last1 - 4);
}
if (length0 > 1)
{
if (*(ushort*)b0 != *(ushort*)b1)
return false;
return *(ushort*)(last0 - 2) == *(ushort*)(last1 - 2);
}
return *b0 == *b1;
}
}
If no SIMD is preferred, the same method applied to the the existing LongPointers algorithm:
public unsafe bool LongPointersNoFallThrough()
{
if (a1 == null || a2 == null || a1.Length != a2.Length)
return false;
fixed (byte* p1 = a1, p2 = a2)
{
byte* x1 = p1, x2 = p2;
int l = a1.Length;
if ((l & 8) != 0)
{
for (int i = 0; i < l / 8; i++, x1 += 8, x2 += 8)
if (*(long*)x1 != *(long*)x2) return false;
return *(long*)(x1 + (l - 8)) == *(long*)(x2 + (l - 8));
}
if ((l & 4) != 0)
{
if (*(int*)x1 != *(int*)x2) return false; x1 += 4; x2 += 4;
return *(int*)(x1 + (l - 4)) == *(int*)(x2 + (l - 4));
}
if ((l & 2) != 0)
{
if (*(short*)x1 != *(short*)x2) return false; x1 += 2; x2 += 2;
return *(short*)(x1 + (l - 2)) == *(short*)(x2 + (l - 2));
}
return *x1 == *x2;
}
}
If you are looking for a very fast byte array equality comparer, I suggest you take a look at this STSdb Labs article: Byte array equality comparer. It features some of the fastest implementations for byte[] array equality comparing, which are presented, performance tested and summarized.
You can also focus on these implementations:
BigEndianByteArrayComparer - fast byte[] array comparer from left to right (BigEndian)
BigEndianByteArrayEqualityComparer - - fast byte[] equality comparer from left to right (BigEndian)
LittleEndianByteArrayComparer - fast byte[] array comparer from right to left (LittleEndian)
LittleEndianByteArrayEqualityComparer - fast byte[] equality comparer from right to left (LittleEndian)
Use SequenceEquals for this to comparison.
The short answer is this:
public bool Compare(byte[] b1, byte[] b2)
{
return Encoding.ASCII.GetString(b1) == Encoding.ASCII.GetString(b2);
}
In such a way you can use the optimized .NET string compare to make a byte array compare without the need to write unsafe code. This is how it is done in the background:
private unsafe static bool EqualsHelper(String strA, String strB)
{
Contract.Requires(strA != null);
Contract.Requires(strB != null);
Contract.Requires(strA.Length == strB.Length);
int length = strA.Length;
fixed (char* ap = &strA.m_firstChar) fixed (char* bp = &strB.m_firstChar)
{
char* a = ap;
char* b = bp;
// Unroll the loop
#if AMD64
// For the AMD64 bit platform we unroll by 12 and
// check three qwords at a time. This is less code
// than the 32 bit case and is shorter
// pathlength.
while (length >= 12)
{
if (*(long*)a != *(long*)b) return false;
if (*(long*)(a+4) != *(long*)(b+4)) return false;
if (*(long*)(a+8) != *(long*)(b+8)) return false;
a += 12; b += 12; length -= 12;
}
#else
while (length >= 10)
{
if (*(int*)a != *(int*)b) return false;
if (*(int*)(a+2) != *(int*)(b+2)) return false;
if (*(int*)(a+4) != *(int*)(b+4)) return false;
if (*(int*)(a+6) != *(int*)(b+6)) return false;
if (*(int*)(a+8) != *(int*)(b+8)) return false;
a += 10; b += 10; length -= 10;
}
#endif
// This depends on the fact that the String objects are
// always zero terminated and that the terminating zero is not included
// in the length. For odd string sizes, the last compare will include
// the zero terminator.
while (length > 0)
{
if (*(int*)a != *(int*)b) break;
a += 2; b += 2; length -= 2;
}
return (length <= 0);
}
}
Since many of the fancy solutions above don't work with UWP and because I love Linq and functional approaches I pressent you my version to this problem.
To escape the comparison when the first difference occures, I chose .FirstOrDefault()
public static bool CompareByteArrays(byte[] ba0, byte[] ba1) =>
!(ba0.Length != ba1.Length || Enumerable.Range(1,ba0.Length)
.FirstOrDefault(n => ba0[n] != ba1[n]) > 0);

Comparing two byte arrays in .NET

How can I do this fast?
Sure I can do this:
static bool ByteArrayCompare(byte[] a1, byte[] a2)
{
if (a1.Length != a2.Length)
return false;
for (int i=0; i<a1.Length; i++)
if (a1[i]!=a2[i])
return false;
return true;
}
But I'm looking for either a BCL function or some highly optimized proven way to do this.
java.util.Arrays.equals((sbyte[])(Array)a1, (sbyte[])(Array)a2);
works nicely, but it doesn't look like that would work for x64.
Note my super-fast answer here.
You can use Enumerable.SequenceEqual method.
using System;
using System.Linq;
...
var a1 = new int[] { 1, 2, 3};
var a2 = new int[] { 1, 2, 3};
var a3 = new int[] { 1, 2, 4};
var x = a1.SequenceEqual(a2); // true
var y = a1.SequenceEqual(a3); // false
If you can't use .NET 3.5 for some reason, your method is OK.
The compiler/run-time environment will optimize your loop, so you don't need to worry about performance.
P/Invoke powers activate!
[DllImport("msvcrt.dll", CallingConvention=CallingConvention.Cdecl)]
static extern int memcmp(byte[] b1, byte[] b2, long count);
static bool ByteArrayCompare(byte[] b1, byte[] b2)
{
// Validate buffers are the same length.
// This also ensures that the count does not exceed the length of either buffer.
return b1.Length == b2.Length && memcmp(b1, b2, b1.Length) == 0;
}
Span<T> offers an extremely competitive alternative without having to throw confusing and/or non-portable fluff into your own application's code base:
// byte[] is implicitly convertible to ReadOnlySpan<byte>
static bool ByteArrayCompare(ReadOnlySpan<byte> a1, ReadOnlySpan<byte> a2)
{
return a1.SequenceEqual(a2);
}
The (guts of the) implementation as of .NET 6.0.4 can be found here.
I've revised @EliArbel's gist to add this method as SpansEqual, drop most of the less interesting performers from others' benchmarks, run it with different array sizes, output graphs, and mark SpansEqual as the baseline so that it reports how the different methods compare to SpansEqual.
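For readers who want to reproduce a similar setup, here is a minimal BenchmarkDotNet skeleton of that shape (the class and field names are my own invention, not taken from the gist):
using System;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
public class CompareBenchmarks
{
    [Params(15, 1026, 1048585)]
    public int ByteCount;
    private byte[] _a, _b;
    [GlobalSetup]
    public void Setup()
    {
        _a = new byte[ByteCount];
        new Random(42).NextBytes(_a);
        _b = (byte[])_a.Clone();
    }
    // Baseline = true makes BenchmarkDotNet report every other method's Ratio relative to this one
    [Benchmark(Baseline = true)]
    public bool SpansEqual() => _a.AsSpan().SequenceEqual(_b);
}
public static class Program
{
    public static void Main() => BenchmarkRunner.Run<CompareBenchmarks>();
}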
The numbers below are from the results, lightly edited to remove the "Error" column.
| Method | ByteCount | Mean | StdDev | Ratio | RatioSD |
|-------------- |----------- |-------------------:|----------------:|------:|--------:|
| SpansEqual | 15 | 2.074 ns | 0.0233 ns | 1.00 | 0.00 |
| LongPointers | 15 | 2.854 ns | 0.0632 ns | 1.38 | 0.03 |
| Unrolled | 15 | 12.449 ns | 0.2487 ns | 6.00 | 0.13 |
| PInvokeMemcmp | 15 | 7.525 ns | 0.1057 ns | 3.63 | 0.06 |
| | | | | | |
| SpansEqual | 1026 | 15.629 ns | 0.1712 ns | 1.00 | 0.00 |
| LongPointers | 1026 | 46.487 ns | 0.2938 ns | 2.98 | 0.04 |
| Unrolled | 1026 | 23.786 ns | 0.1044 ns | 1.52 | 0.02 |
| PInvokeMemcmp | 1026 | 28.299 ns | 0.2781 ns | 1.81 | 0.03 |
| | | | | | |
| SpansEqual | 1048585 | 17,920.329 ns | 153.0750 ns | 1.00 | 0.00 |
| LongPointers | 1048585 | 42,077.448 ns | 309.9067 ns | 2.35 | 0.02 |
| Unrolled | 1048585 | 29,084.901 ns | 428.8496 ns | 1.62 | 0.03 |
| PInvokeMemcmp | 1048585 | 30,847.572 ns | 213.3162 ns | 1.72 | 0.02 |
| | | | | | |
| SpansEqual | 2147483591 | 124,752,376.667 ns | 552,281.0202 ns | 1.00 | 0.00 |
| LongPointers | 2147483591 | 139,477,269.231 ns | 331,458.5429 ns | 1.12 | 0.00 |
| Unrolled | 2147483591 | 137,617,423.077 ns | 238,349.5093 ns | 1.10 | 0.00 |
| PInvokeMemcmp | 2147483591 | 138,373,253.846 ns | 288,447.8278 ns | 1.11 | 0.01 |
I was surprised to see SpansEqual not come out on top for the max-array-size methods, but the difference is so minor that I don't think it'll ever matter. After refreshing to run on .NET 6.0.4 with my newer hardware, SpansEqual now comfortably outperforms all others at all array sizes.
My system info:
BenchmarkDotNet=v0.13.1, OS=Windows 10.0.22000
AMD Ryzen 9 5900X, 1 CPU, 24 logical and 12 physical cores
.NET SDK=6.0.202
[Host] : .NET 6.0.4 (6.0.422.16404), X64 RyuJIT
DefaultJob : .NET 6.0.4 (6.0.422.16404), X64 RyuJIT
There's a new built-in solution for this in .NET 4 - IStructuralEquatable
static bool ByteArrayCompare(byte[] a1, byte[] a2)
{
return StructuralComparisons.StructuralEqualityComparer.Equals(a1, a2);
}
Edit: the modern fast way is to use a1.SequenceEqual(a2)
User gil suggested unsafe code which spawned this solution:
// Copyright (c) 2008-2013 Hafthor Stefansson
// Distributed under the MIT/X11 software license
// Ref: http://www.opensource.org/licenses/mit-license.php.
static unsafe bool UnsafeCompare(byte[] a1, byte[] a2) {
unchecked {
if(a1==a2) return true;
if(a1==null || a2==null || a1.Length!=a2.Length)
return false;
fixed (byte* p1=a1, p2=a2) {
byte* x1=p1, x2=p2;
int l = a1.Length;
for (int i=0; i < l/8; i++, x1+=8, x2+=8)
if (*((long*)x1) != *((long*)x2)) return false;
if ((l & 4)!=0) { if (*((int*)x1)!=*((int*)x2)) return false; x1+=4; x2+=4; }
if ((l & 2)!=0) { if (*((short*)x1)!=*((short*)x2)) return false; x1+=2; x2+=2; }
if ((l & 1)!=0) if (*((byte*)x1) != *((byte*)x2)) return false;
return true;
}
}
}
which does 64-bit based comparison for as much of the array as possible. This kind of counts on the fact that the arrays start qword aligned. It'll work if not qword aligned, just not as fast as if it were.
It performs about seven times faster than the simple `for` loop. Using the J# library performed equivalently to the original `for` loop. Using .SequenceEqual runs around seven times slower; I think just because it is using IEnumerator.MoveNext. I imagine LINQ-based solutions being at least that slow or worse.
If you are not opposed to doing it, you can import the J# assembly "vjslib.dll" and use its Arrays.equals(byte[], byte[]) method...
Don't blame me if someone laughs at you though...
EDIT: For what little it is worth, I used Reflector to disassemble the code for that, and here is what it looks like:
public static bool equals(sbyte[] a1, sbyte[] a2)
{
if (a1 == a2)
{
return true;
}
if ((a1 != null) && (a2 != null))
{
if (a1.Length != a2.Length)
{
return false;
}
for (int i = 0; i < a1.Length; i++)
{
if (a1[i] != a2[i])
{
return false;
}
}
return true;
}
return false;
}
.NET 3.5 and newer have a new public type, System.Data.Linq.Binary, that encapsulates byte[]. It implements IEquatable<Binary>, which (in effect) compares two byte arrays. Note that System.Data.Linq.Binary also has an implicit conversion operator from byte[].
MSDN documentation: System.Data.Linq.Binary
Reflector decompile of the Equals method:
private bool EqualsTo(Binary binary)
{
if (this != binary)
{
if (binary == null)
{
return false;
}
if (this.bytes.Length != binary.bytes.Length)
{
return false;
}
if (this.hashCode != binary.hashCode)
{
return false;
}
int index = 0;
int length = this.bytes.Length;
while (index < length)
{
if (this.bytes[index] != binary.bytes[index])
{
return false;
}
index++;
}
}
return true;
}
An interesting twist is that they only proceed to the byte-by-byte comparison loop if the hashes of the two Binary objects are the same. This, however, comes at the cost of computing the hash in the constructor of Binary objects (by traversing the array with a for loop :-) ).
The above implementation means that in the worst case you may have to traverse the arrays three times: first to compute the hash of array1, then to compute the hash of array2, and finally (because this is the worst-case scenario: lengths and hashes equal) to compare the bytes in array1 with the bytes in array2.
Overall, even though System.Data.Linq.Binary is built into the BCL, I don't think it is the fastest way to compare two byte arrays :-|.
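To make that trade-off concrete, here is a minimal sketch of the hash-gating idea (a hypothetical class of my own, not the decompiled BCL source):
using System;
sealed class HashedBytes
{
    private readonly byte[] bytes;
    private readonly int hashCode; // precomputed in the constructor: the extra array traversal mentioned above
    public HashedBytes(byte[] data)
    {
        bytes = data;
        unchecked
        {
            int h = 17;
            foreach (byte b in data) h = h * 31 + b;
            hashCode = h;
        }
    }
    public bool EqualsTo(HashedBytes other) =>
        other != null
        && bytes.Length == other.bytes.Length
        && hashCode == other.hashCode        // cheap rejection before the O(n) loop
        && bytes.AsSpan().SequenceEqual(other.bytes);
}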
I posted a similar question about checking if byte[] is full of zeroes. (SIMD code was beaten so I removed it from this answer.) Here is the fastest code from my comparisons:
static unsafe bool EqualBytesLongUnrolled (byte[] data1, byte[] data2)
{
if (data1 == data2)
return true;
if (data1.Length != data2.Length)
return false;
fixed (byte* bytes1 = data1, bytes2 = data2) {
int len = data1.Length;
int rem = len % (sizeof(long) * 16);
long* b1 = (long*)bytes1;
long* b2 = (long*)bytes2;
long* e1 = (long*)(bytes1 + len - rem);
while (b1 < e1) {
if (*(b1) != *(b2) || *(b1 + 1) != *(b2 + 1) ||
*(b1 + 2) != *(b2 + 2) || *(b1 + 3) != *(b2 + 3) ||
*(b1 + 4) != *(b2 + 4) || *(b1 + 5) != *(b2 + 5) ||
*(b1 + 6) != *(b2 + 6) || *(b1 + 7) != *(b2 + 7) ||
*(b1 + 8) != *(b2 + 8) || *(b1 + 9) != *(b2 + 9) ||
*(b1 + 10) != *(b2 + 10) || *(b1 + 11) != *(b2 + 11) ||
*(b1 + 12) != *(b2 + 12) || *(b1 + 13) != *(b2 + 13) ||
*(b1 + 14) != *(b2 + 14) || *(b1 + 15) != *(b2 + 15))
return false;
b1 += 16;
b2 += 16;
}
for (int i = 0; i < rem; i++)
if (data1 [len - 1 - i] != data2 [len - 1 - i])
return false;
return true;
}
}
Measured on two 256MB byte arrays:
UnsafeCompare : 86,8784 ms
EqualBytesSimd : 71,5125 ms
EqualBytesSimdUnrolled : 73,1917 ms
EqualBytesLongUnrolled : 39,8623 ms
using System.Linq; //SequenceEqual
byte[] ByteArray1 = null;
byte[] ByteArray2 = null;
ByteArray1 = MyFunct1();
ByteArray2 = MyFunct2();
if (ByteArray1.SequenceEqual(ByteArray2))
{
MessageBox.Show("Match");
}
else
{
MessageBox.Show("Don't match");
}
Let's add one more!
Recently Microsoft released a special NuGet package, System.Runtime.CompilerServices.Unsafe. It's special because it's written in IL, and provides low-level functionality not directly available in C#.
One of its methods, Unsafe.As<T>(object) allows casting any reference type to another reference type, skipping any safety checks. This is usually a very bad idea, but if both types have the same structure, it can work. So we can use this to cast a byte[] to a long[]:
bool CompareWithUnsafeLibrary(byte[] a1, byte[] a2)
{
if (a1.Length != a2.Length) return false;
var longSize = (int)Math.Floor(a1.Length / 8.0);
var long1 = Unsafe.As<long[]>(a1);
var long2 = Unsafe.As<long[]>(a2);
for (var i = 0; i < longSize; i++)
{
if (long1[i] != long2[i]) return false;
}
for (var i = longSize * 8; i < a1.Length; i++)
{
if (a1[i] != a2[i]) return false;
}
return true;
}
Note that long1.Length would still return the original array's length, since it's stored in a field in the array's memory structure.
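A quick hypothetical snippet illustrating that quirk (requires using System; and the System.Runtime.CompilerServices.Unsafe package):
byte[] bytes = new byte[16];
long[] longs = Unsafe.As<long[]>(bytes);
Console.WriteLine(longs.Length); // prints 16 (the byte count), not 2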
This method is not quite as fast as other methods demonstrated here, but it is a lot faster than the naive method, doesn't use unsafe code or P/Invoke or pinning, and the implementation is quite straightforward (IMO). Here are some BenchmarkDotNet results from my machine:
BenchmarkDotNet=v0.10.3.0, OS=Microsoft Windows NT 6.2.9200.0
Processor=Intel(R) Core(TM) i7-4870HQ CPU 2.50GHz, ProcessorCount=8
Frequency=2435775 Hz, Resolution=410.5470 ns, Timer=TSC
[Host] : Clr 4.0.30319.42000, 64bit RyuJIT-v4.6.1637.0
DefaultJob : Clr 4.0.30319.42000, 64bit RyuJIT-v4.6.1637.0
Method | Mean | StdDev |
----------------------- |-------------- |---------- |
UnsafeLibrary | 125.8229 ns | 0.3588 ns |
UnsafeCompare | 89.9036 ns | 0.8243 ns |
JSharpEquals | 1,432.1717 ns | 1.3161 ns |
EqualBytesLongUnrolled | 43.7863 ns | 0.8923 ns |
NewMemCmp | 65.4108 ns | 0.2202 ns |
ArraysEqual | 910.8372 ns | 2.6082 ns |
PInvokeMemcmp | 52.7201 ns | 0.1105 ns |
I've also created a gist with all the tests.
I developed a method that slightly beats memcmp() (plinth's answer) and very slightly beats EqualBytesLongUnrolled() (Arek Bulski's answer) on my PC. Basically, it unrolls the loop by 4 instead of 8.
Update 30 Mar. 2019:
Starting in .NET Core 3.0, we have SIMD support!
This solution is fastest by a considerable margin on my PC:
#if NETCOREAPP3_0
using System.Runtime.Intrinsics.X86;
#endif
…
public static unsafe bool Compare(byte[] arr0, byte[] arr1)
{
if (arr0 == arr1)
{
return true;
}
if (arr0 == null || arr1 == null)
{
return false;
}
if (arr0.Length != arr1.Length)
{
return false;
}
if (arr0.Length == 0)
{
return true;
}
fixed (byte* b0 = arr0, b1 = arr1)
{
#if NETCOREAPP3_0
if (Avx2.IsSupported)
{
return Compare256(b0, b1, arr0.Length);
}
else if (Sse2.IsSupported)
{
return Compare128(b0, b1, arr0.Length);
}
else
#endif
{
return Compare64(b0, b1, arr0.Length);
}
}
}
#if NETCOREAPP3_0
public static unsafe bool Compare256(byte* b0, byte* b1, int length)
{
byte* lastAddr = b0 + length;
byte* lastAddrMinus128 = lastAddr - 128;
const int mask = -1;
while (b0 < lastAddrMinus128) // unroll the loop so that we are comparing 128 bytes at a time.
{
if (Avx2.MoveMask(Avx2.CompareEqual(Avx.LoadVector256(b0), Avx.LoadVector256(b1))) != mask)
{
return false;
}
if (Avx2.MoveMask(Avx2.CompareEqual(Avx.LoadVector256(b0 + 32), Avx.LoadVector256(b1 + 32))) != mask)
{
return false;
}
if (Avx2.MoveMask(Avx2.CompareEqual(Avx.LoadVector256(b0 + 64), Avx.LoadVector256(b1 + 64))) != mask)
{
return false;
}
if (Avx2.MoveMask(Avx2.CompareEqual(Avx.LoadVector256(b0 + 96), Avx.LoadVector256(b1 + 96))) != mask)
{
return false;
}
b0 += 128;
b1 += 128;
}
while (b0 < lastAddr)
{
if (*b0 != *b1) return false;
b0++;
b1++;
}
return true;
}
public static unsafe bool Compare128(byte* b0, byte* b1, int length)
{
byte* lastAddr = b0 + length;
byte* lastAddrMinus64 = lastAddr - 64;
const int mask = 0xFFFF;
while (b0 < lastAddrMinus64) // unroll the loop so that we are comparing 64 bytes at a time.
{
if (Sse2.MoveMask(Sse2.CompareEqual(Sse2.LoadVector128(b0), Sse2.LoadVector128(b1))) != mask)
{
return false;
}
if (Sse2.MoveMask(Sse2.CompareEqual(Sse2.LoadVector128(b0 + 16), Sse2.LoadVector128(b1 + 16))) != mask)
{
return false;
}
if (Sse2.MoveMask(Sse2.CompareEqual(Sse2.LoadVector128(b0 + 32), Sse2.LoadVector128(b1 + 32))) != mask)
{
return false;
}
if (Sse2.MoveMask(Sse2.CompareEqual(Sse2.LoadVector128(b0 + 48), Sse2.LoadVector128(b1 + 48))) != mask)
{
return false;
}
b0 += 64;
b1 += 64;
}
while (b0 < lastAddr)
{
if (*b0 != *b1) return false;
b0++;
b1++;
}
return true;
}
#endif
public static unsafe bool Compare64(byte* b0, byte* b1, int length)
{
byte* lastAddr = b0 + length;
byte* lastAddrMinus32 = lastAddr - 32;
while (b0 < lastAddrMinus32) // unroll the loop so that we are comparing 32 bytes at a time.
{
if (*(ulong*)b0 != *(ulong*)b1) return false;
if (*(ulong*)(b0 + 8) != *(ulong*)(b1 + 8)) return false;
if (*(ulong*)(b0 + 16) != *(ulong*)(b1 + 16)) return false;
if (*(ulong*)(b0 + 24) != *(ulong*)(b1 + 24)) return false;
b0 += 32;
b1 += 32;
}
while (b0 < lastAddr)
{
if (*b0 != *b1) return false;
b0++;
b1++;
}
return true;
}
I would use unsafe code and run the for loop comparing Int32 pointers.
Maybe you should also consider checking that the arrays are non-null.
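A minimal sketch of that suggestion (a hypothetical helper of mine, with the null checks included):
static unsafe bool IntPointerCompare(byte[] a, byte[] b)
{
    if (a == null || b == null) return a == b; // both null counts as equal
    if (a.Length != b.Length) return false;
    fixed (byte* pa = a, pb = b)
    {
        int* ia = (int*)pa, ib = (int*)pb;
        int ints = a.Length / sizeof(int);
        for (int i = 0; i < ints; i++)          // compare 4 bytes per iteration
            if (ia[i] != ib[i]) return false;
        for (int i = ints * sizeof(int); i < a.Length; i++) // 0-3 remaining bytes
            if (pa[i] != pb[i]) return false;
        return true;
    }
}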
If you look at how .NET does string.Equals, you see that it uses a private method called EqualsHelper which has an "unsafe" pointer implementation. .NET Reflector is your friend to see how things are done internally.
This can be used as a template for byte array comparison which I did an implementation on in blog post Fast byte array comparison in C#. I also did some rudimentary benchmarks to see when a safe implementation is faster than the unsafe.
That said, unless you really need killer performance, I'd go for a simple for loop comparison.
For those of you who care about ordering (i.e. want your memcmp to return an int like it should, instead of nothing), .NET Core 3.0 (and presumably .NET Standard 2.1) includes a Span.SequenceCompareTo(...) extension method (plus a Span.SequenceEqual) that can be used to compare two ReadOnlySpan<T> instances (where T : IComparable<T>).
In the original GitHub proposal, the discussion included approach comparisons with jump table calculations, reading a byte[] as long[], SIMD usage, and p/invoke to the CLR implementation's memcmp.
Going forward, this should be your go-to method for comparing byte arrays or byte ranges (as should using Span<byte> instead of byte[] for your .NET Standard 2.1 APIs), and it is sufficiently fast that you should no longer care about optimizing it (and no, despite the similarity in name, it does not perform as abysmally as the horrid Enumerable.SequenceEqual).
#if NETCOREAPP3_0_OR_GREATER
// Using the platform-native Span<T>.SequenceEqual<T>(..)
public static int Compare(byte[] range1, int offset1, byte[] range2, int offset2, int count)
{
var span1 = range1.AsSpan(offset1, count);
var span2 = range2.AsSpan(offset2, count);
return span1.SequenceCompareTo(span2);
// or, if you don't care about ordering
// return span1.SequenceEqual(span2);
}
#else
// The most basic implementation, in platform-agnostic, safe C#
public static bool Compare(byte[] range1, int offset1, byte[] range2, int offset2, int count)
{
// Working backwards lets the compiler optimize away bound checking after the first loop
for (int i = count - 1; i >= 0; --i)
{
if (range1[offset1 + i] != range2[offset2 + i])
{
return false;
}
}
return true;
}
#endif
I did some measurements using the attached program, built as a .NET 4.7 release build and run without the debugger attached. I think people have been using the wrong metric, because what you care about, if speed matters here, is how long it takes to figure out whether two byte arrays are equal, i.e. throughput in bytes.
StructuralComparison : 4.6 MiB/s
for : 274.5 MiB/s
ToUInt32 : 263.6 MiB/s
ToUInt64 : 474.9 MiB/s
memcmp : 8500.8 MiB/s
As you can see, there's no better way than memcmp and it's orders of magnitude faster. A simple for loop is the second best option. And it still boggles my mind why Microsoft cannot simply include a Buffer.Compare method.
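As an aside, on newer runtimes you can get a Buffer.Compare-shaped helper from spans (a hypothetical wrapper of mine, .NET Core 3.0 or later, so not applicable to the .NET 4.7 build measured above):
static int BufferCompare(byte[] a, byte[] b) =>
    a.AsSpan().SequenceCompareTo(b);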
[Program.cs]:
using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Runtime.InteropServices;
using System.Text;
using System.Threading.Tasks;
namespace memcmp
{
class Program
{
static byte[] TestVector(int size)
{
var data = new byte[size];
using (var rng = new System.Security.Cryptography.RNGCryptoServiceProvider())
{
rng.GetBytes(data);
}
return data;
}
static TimeSpan Measure(string testCase, TimeSpan offset, Action action, bool ignore = false)
{
var t = Stopwatch.StartNew();
var n = 0L;
while (t.Elapsed < TimeSpan.FromSeconds(10))
{
action();
n++;
}
var elapsed = t.Elapsed - offset;
if (!ignore)
{
Console.WriteLine($"{testCase,-16} : {n / elapsed.TotalSeconds,16:0.0} MiB/s");
}
return elapsed;
}
[DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl)]
static extern int memcmp(byte[] b1, byte[] b2, long count);
static void Main(string[] args)
{
// how quickly can we establish if two sequences of bytes are equal?
// note that we are testing the speed of different comparison methods
var a = TestVector(1024 * 1024); // 1 MiB
var b = (byte[])a.Clone();
// was meant to offset the overhead of everything but copying but my attempt was a horrible mistake... should have reacted sooner due to the initially ridiculous throughput values...
// Measure("offset", new TimeSpan(), () => { return; }, ignore: true);
var offset = TimeSpan.Zero;
Measure("StructuralComparison", offset, () =>
{
StructuralComparisons.StructuralEqualityComparer.Equals(a, b);
});
Measure("for", offset, () =>
{
for (int i = 0; i < a.Length; i++)
{
if (a[i] != b[i]) break;
}
});
Measure("ToUInt32", offset, () =>
{
for (int i = 0; i < a.Length; i += 4)
{
if (BitConverter.ToUInt32(a, i) != BitConverter.ToUInt32(b, i)) break;
}
});
Measure("ToUInt64", offset, () =>
{
for (int i = 0; i < a.Length; i += 8)
{
if (BitConverter.ToUInt64(a, i) != BitConverter.ToUInt64(b, i)) break;
}
});
Measure("memcmp", offset, () =>
{
memcmp(a, b, a.Length);
});
}
}
}
I couldn't find a solution I'm completely happy with (reasonable performance, but no unsafe code/pinvoke), so I came up with this. Nothing really original, but it works:
/// <summary>
/// Compares two byte arrays for equality, reading 8-byte chunks where possible.
/// </summary>
/// <param name="array1">first array</param>
/// <param name="array2">second array</param>
/// <param name="bytesToCompare">0 means compare entire arrays</param>
/// <returns>true if the compared ranges are equal</returns>
public static bool ArraysEqual(byte[] array1, byte[] array2, int bytesToCompare = 0)
{
if (array1.Length != array2.Length) return false;
var length = (bytesToCompare == 0) ? array1.Length : bytesToCompare;
var tailIdx = length - length % sizeof(Int64);
//check in 8 byte chunks
for (var i = 0; i < tailIdx; i += sizeof(Int64))
{
if (BitConverter.ToInt64(array1, i) != BitConverter.ToInt64(array2, i)) return false;
}
//check the remainder of the array, always shorter than 8 bytes
for (var i = tailIdx; i < length; i++)
{
if (array1[i] != array2[i]) return false;
}
return true;
}
Performance compared with some of the other solutions on this page:
Simple Loop: 19837 ticks, 1.00
*BitConverter: 4886 ticks, 4.06
UnsafeCompare: 1636 ticks, 12.12
EqualBytesLongUnrolled: 637 ticks, 31.09
P/Invoke memcmp: 369 ticks, 53.67
Tested in LINQPad, with 1,000,000-byte identical arrays (worst-case scenario), 500 iterations each.
It seems that EqualBytesLongUnrolled is the best of the methods suggested above.
I skipped Enumerable.SequenceEqual and StructuralComparisons.StructuralEqualityComparer.Equals because they were too slow to wait for. On 265MB arrays I measured this:
Host Process Environment Information:
BenchmarkDotNet.Core=v0.9.9.0
OS=Microsoft Windows NT 6.2.9200.0
Processor=Intel(R) Core(TM) i7-3770 CPU 3.40GHz, ProcessorCount=8
Frequency=3323582 ticks, Resolution=300.8802 ns, Timer=TSC
CLR=MS.NET 4.0.30319.42000, Arch=64-bit RELEASE [RyuJIT]
GC=Concurrent Workstation
JitModules=clrjit-v4.6.1590.0
Type=CompareMemoriesBenchmarks Mode=Throughput
Method | Median | StdDev | Scaled | Scaled-SD |
----------------------- |------------ |---------- |------- |---------- |
NewMemCopy | 30.0443 ms | 1.1880 ms | 1.00 | 0.00 |
EqualBytesLongUnrolled | 29.9917 ms | 0.7480 ms | 0.99 | 0.04 |
msvcrt_memcmp | 30.0930 ms | 0.2964 ms | 1.00 | 0.03 |
UnsafeCompare | 31.0520 ms | 0.7072 ms | 1.03 | 0.04 |
ByteArrayCompare | 212.9980 ms | 2.0776 ms | 7.06 | 0.25 |
OS=Windows
Processor=?, ProcessorCount=8
Frequency=3323582 ticks, Resolution=300.8802 ns, Timer=TSC
CLR=CORE, Arch=64-bit ? [RyuJIT]
GC=Concurrent Workstation
dotnet cli version: 1.0.0-preview2-003131
Type=CompareMemoriesBenchmarks Mode=Throughput
Method | Median | StdDev | Scaled | Scaled-SD |
----------------------- |------------ |---------- |------- |---------- |
NewMemCopy | 30.1789 ms | 0.0437 ms | 1.00 | 0.00 |
EqualBytesLongUnrolled | 30.1985 ms | 0.1782 ms | 1.00 | 0.01 |
msvcrt_memcmp | 30.1084 ms | 0.0660 ms | 1.00 | 0.00 |
UnsafeCompare | 31.1845 ms | 0.4051 ms | 1.03 | 0.01 |
ByteArrayCompare | 212.0213 ms | 0.1694 ms | 7.03 | 0.01 |
For comparing short byte arrays the following is an interesting hack:
if(myByteArray1.Length != myByteArray2.Length) return false;
if(myByteArray1.Length == 8)
return BitConverter.ToInt64(myByteArray1, 0) == BitConverter.ToInt64(myByteArray2, 0);
else if(myByteArray1.Length == 4)
return BitConverter.ToInt32(myByteArray1, 0) == BitConverter.ToInt32(myByteArray2, 0);
Then I would probably fall back to the solution listed in the question.
It'd be interesting to do a performance analysis of this code.
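In that spirit, a quick-and-dirty harness one could start from (my own sketch with arbitrary arrays and iteration count; a proper tool like BenchmarkDotNet would give more trustworthy numbers):
var myByteArray1 = new byte[] { 1, 2, 3, 4, 5, 6, 7, 8 };
var myByteArray2 = new byte[] { 1, 2, 3, 4, 5, 6, 7, 8 };
var sw = System.Diagnostics.Stopwatch.StartNew();
bool allEqual = true;
for (int i = 0; i < 10_000_000; i++)
    allEqual &= BitConverter.ToInt64(myByteArray1, 0) == BitConverter.ToInt64(myByteArray2, 0);
sw.Stop();
Console.WriteLine($"{sw.ElapsedMilliseconds} ms (allEqual={allEqual})");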
I have not seen many LINQ solutions here.
I am not sure of the performance implications; however, I generally stick to LINQ as a rule of thumb and then optimize later if necessary.
public bool CompareTwoArrays(byte[] array1, byte[] array2)
{
return !array1.Where((t, i) => t != array2[i]).Any();
}
Please do note this only works if the arrays are the same size.
An extended version could look like this:
public bool CompareTwoArrays(byte[] array1, byte[] array2)
{
if (array1.Length != array2.Length) return false;
return !array1.Where((t, i) => t != array2[i]).Any();
}
I thought about block-transfer acceleration methods built into many graphics cards. But then you would have to copy over all the data byte-wise, so this doesn't help you much if you don't want to implement a whole portion of your logic in unmanaged and hardware-dependent code...
Another way of optimization similar to the approach shown above would be to store as much of your data as possible in a long[] rather than a byte[] right from the start, for example if you are reading it sequentially from a binary file, or if you use a memory mapped file, read in data as long[] or single long values. Then, your comparison loop will only need 1/8th of the number of iterations it would have to do for a byte[] containing the same amount of data.
It is a matter of when and how often you need to compare vs. when and how often you need to access the data in a byte-by-byte manner, e.g. to pass it as a parameter in an API call that expects a byte[]. In the end, only you can tell, because you know the use case...
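A minimal sketch of that read-as-long[] idea (a hypothetical helper; assumes the file length is a multiple of 8 for brevity):
using System;
using System.IO;
static class LongReader
{
    public static long[] ReadAsLongs(string path)
    {
        byte[] raw = File.ReadAllBytes(path);
        long[] data = new long[raw.Length / sizeof(long)];
        // copy the raw bytes into the long[]; a comparison loop over this
        // needs only 1/8th of the iterations of a byte-by-byte loop
        Buffer.BlockCopy(raw, 0, data, 0, data.Length * sizeof(long));
        return data;
    }
}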
Sorry, but if you're looking for a managed way, you're already doing it correctly, and to my knowledge there's no built-in method in the BCL for doing this.
You should add some initial null checks and then just reuse it as if it were in the BCL.
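For instance, a sketch of such a wrapper (hypothetical; just the question's loop with the checks added):
static bool ByteArrayCompareSafe(byte[] a1, byte[] a2)
{
    if (ReferenceEquals(a1, a2)) return true;  // same array, or both null
    if (a1 == null || a2 == null) return false;
    if (a1.Length != a2.Length) return false;
    for (int i = 0; i < a1.Length; i++)
        if (a1[i] != a2[i]) return false;
    return true;
}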
I settled on a solution inspired by the EqualBytesLongUnrolled method posted by ArekBulski, with an additional optimization. In my case, differences tend to be near the tails of the arrays. In testing, I found that when this is the case for large arrays, being able to compare array elements in reverse order gives this solution a huge performance gain over the memcmp-based solution. Here is that solution:
public enum CompareDirection { Forward, Backward }
private static unsafe bool UnsafeEquals(byte[] a, byte[] b, CompareDirection direction = CompareDirection.Forward)
{
// returns when a and b are same array or both null
if (a == b) return true;
// if either is null or different lengths, can't be equal
if (a == null || b == null || a.Length != b.Length)
return false;
const int UNROLLED = 16; // count of longs 'unrolled' in optimization
int size = sizeof(long) * UNROLLED; // 128 bytes (min size for 'unrolled' optimization)
int len = a.Length;
int n = len / size; // count of full 128 byte segments
int r = len % size; // count of remaining 'unoptimized' bytes
// pin the arrays and access them via pointers
fixed (byte* pb_a = a, pb_b = b)
{
if (r > 0 && direction == CompareDirection.Backward)
{
byte* pa = pb_a + len - 1;
byte* pb = pb_b + len - 1;
byte* phead = pb_a + len - r;
while(pa >= phead)
{
if (*pa != *pb) return false;
pa--;
pb--;
}
}
if (n > 0)
{
int nOffset = n * size;
if (direction == CompareDirection.Forward)
{
long* pa = (long*)pb_a;
long* pb = (long*)pb_b;
long* ptail = (long*)(pb_a + nOffset);
while (pa < ptail)
{
if (*(pa + 0) != *(pb + 0) || *(pa + 1) != *(pb + 1) ||
*(pa + 2) != *(pb + 2) || *(pa + 3) != *(pb + 3) ||
*(pa + 4) != *(pb + 4) || *(pa + 5) != *(pb + 5) ||
*(pa + 6) != *(pb + 6) || *(pa + 7) != *(pb + 7) ||
*(pa + 8) != *(pb + 8) || *(pa + 9) != *(pb + 9) ||
*(pa + 10) != *(pb + 10) || *(pa + 11) != *(pb + 11) ||
*(pa + 12) != *(pb + 12) || *(pa + 13) != *(pb + 13) ||
*(pa + 14) != *(pb + 14) || *(pa + 15) != *(pb + 15)
)
{
return false;
}
pa += UNROLLED;
pb += UNROLLED;
}
}
else
{
long* pa = (long*)(pb_a + nOffset);
long* pb = (long*)(pb_b + nOffset);
long* phead = (long*)pb_a;
while (phead < pa)
{
if (*(pa - 1) != *(pb - 1) || *(pa - 2) != *(pb - 2) ||
*(pa - 3) != *(pb - 3) || *(pa - 4) != *(pb - 4) ||
*(pa - 5) != *(pb - 5) || *(pa - 6) != *(pb - 6) ||
*(pa - 7) != *(pb - 7) || *(pa - 8) != *(pb - 8) ||
*(pa - 9) != *(pb - 9) || *(pa - 10) != *(pb - 10) ||
*(pa - 11) != *(pb - 11) || *(pa - 12) != *(pb - 12) ||
*(pa - 13) != *(pb - 13) || *(pa - 14) != *(pb - 14) ||
*(pa - 15) != *(pb - 15) || *(pa - 16) != *(pb - 16)
)
{
return false;
}
pa -= UNROLLED;
pb -= UNROLLED;
}
}
}
if (r > 0 && direction == CompareDirection.Forward)
{
byte* pa = pb_a + len - r;
byte* pb = pb_b + len - r;
byte* ptail = pb_a + len;
while(pa < ptail)
{
if (*pa != *pb) return false;
pa++;
pb++;
}
}
}
return true;
}
This is almost certainly much slower than any other version given here, but it was fun to write.
static bool ByteArrayEquals(byte[] a1, byte[] a2)
{
return a1.Zip(a2, (l, r) => l == r).All(x => x);
}
This is similar to the others, but the difference here is that there is no falling through to the next-highest number of bytes I can check at once. For example, with 63 bytes (in my SIMD example) I can check the equality of the first 32 bytes and then the last 32 bytes, which is faster than checking 32 bytes, then 16, then 8, and so on. The first check you enter is the only check you need to compare all of the bytes.
This does come out on top in my tests, but just by a hair.
The following code is exactly how I tested it in airbreather/ArrayComparePerf.cs.
public unsafe bool SIMDNoFallThrough() // requires System.Runtime.Intrinsics.X86
{
if (a1 == null || a2 == null)
return false;
int length0 = a1.Length;
if (length0 != a2.Length) return false;
fixed (byte* b00 = a1, b01 = a2)
{
byte* b0 = b00, b1 = b01, last0 = b0 + length0, last1 = b1 + length0, last32 = last0 - 31;
if (length0 > 31)
{
while (b0 < last32)
{
if (Avx2.MoveMask(Avx2.CompareEqual(Avx.LoadVector256(b0), Avx.LoadVector256(b1))) != -1)
return false;
b0 += 32;
b1 += 32;
}
return Avx2.MoveMask(Avx2.CompareEqual(Avx.LoadVector256(last0 - 32), Avx.LoadVector256(last1 - 32))) == -1;
}
if (length0 > 15)
{
if (Sse2.MoveMask(Sse2.CompareEqual(Sse2.LoadVector128(b0), Sse2.LoadVector128(b1))) != 65535)
return false;
return Sse2.MoveMask(Sse2.CompareEqual(Sse2.LoadVector128(last0 - 16), Sse2.LoadVector128(last1 - 16))) == 65535;
}
if (length0 > 7)
{
if (*(ulong*)b0 != *(ulong*)b1)
return false;
return *(ulong*)(last0 - 8) == *(ulong*)(last1 - 8);
}
if (length0 > 3)
{
if (*(uint*)b0 != *(uint*)b1)
return false;
return *(uint*)(last0 - 4) == *(uint*)(last1 - 4);
}
if (length0 > 1)
{
if (*(ushort*)b0 != *(ushort*)b1)
return false;
return *(ushort*)(last0 - 2) == *(ushort*)(last1 - 2);
}
return *b0 == *b1;
}
}
If SIMD is not preferred, the same method can be applied to the existing LongPointers algorithm:
public unsafe bool LongPointersNoFallThrough()
{
    if (a1 == null || a2 == null || a1.Length != a2.Length)
        return false;
    if (a1.Length == 0)
        return true;
    fixed (byte* p1 = a1, p2 = a2)
    {
        byte* x1 = p1, x2 = p2;
        int l = a1.Length;
        if (l >= 8)
        {
            for (int i = 0; i < l / 8; i++, x1 += 8, x2 += 8)
                if (*(long*)x1 != *(long*)x2) return false;
            // overlapping read of the final 8 bytes covers any remainder
            return *(long*)(p1 + l - 8) == *(long*)(p2 + l - 8);
        }
        if (l >= 4)
        {
            if (*(int*)x1 != *(int*)x2) return false;
            return *(int*)(p1 + l - 4) == *(int*)(p2 + l - 4);
        }
        if (l >= 2)
        {
            if (*(short*)x1 != *(short*)x2) return false;
            return *(short*)(p1 + l - 2) == *(short*)(p2 + l - 2);
        }
        return *x1 == *x2;
    }
}
If you are looking for a very fast byte array equality comparer, I suggest you take a look at this STSdb Labs article: Byte array equality comparer. It features some of the fastest implementations for byte[] equality comparison, presented, performance-tested, and summarized.
You can also focus on these implementations:
BigEndianByteArrayComparer - fast byte[] array comparer from left to right (BigEndian)
BigEndianByteArrayEqualityComparer - fast byte[] equality comparer from left to right (BigEndian)
LittleEndianByteArrayComparer - fast byte[] array comparer from right to left (LittleEndian)
LittleEndianByteArrayEqualityComparer - fast byte[] equality comparer from right to left (LittleEndian)
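For context, a minimal byte[] equality comparer of this general kind (my own sketch, not the STSdb implementation) plugs into collections as a key comparer:
using System;
using System.Collections.Generic;
sealed class ByteArrayEqualityComparer : IEqualityComparer<byte[]>
{
    public bool Equals(byte[] x, byte[] y) =>
        ReferenceEquals(x, y) ||
        (x != null && y != null && x.AsSpan().SequenceEqual(y));
    public int GetHashCode(byte[] obj)
    {
        unchecked
        {
            int h = 17;
            foreach (byte b in obj) h = h * 31 + b;
            return h;
        }
    }
}
// usage: new Dictionary<byte[], string>(new ByteArrayEqualityComparer());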
Use SequenceEqual for this comparison.
The short answer is this:
public bool Compare(byte[] b1, byte[] b2)
{
return Encoding.ASCII.GetString(b1) == Encoding.ASCII.GetString(b2);
}
This way you can use the optimized .NET string comparison to compare byte arrays without needing to write unsafe code. Beware, though, that Encoding.ASCII is lossy for byte values above 0x7F, so this is only reliable for 7-bit data. This is how the string comparison is done in the background:
private unsafe static bool EqualsHelper(String strA, String strB)
{
Contract.Requires(strA != null);
Contract.Requires(strB != null);
Contract.Requires(strA.Length == strB.Length);
int length = strA.Length;
fixed (char* ap = &strA.m_firstChar) fixed (char* bp = &strB.m_firstChar)
{
char* a = ap;
char* b = bp;
// Unroll the loop
#if AMD64
// For the AMD64 bit platform we unroll by 12 and
// check three qwords at a time. This is less code
// than the 32 bit case and is shorter
// pathlength.
while (length >= 12)
{
if (*(long*)a != *(long*)b) return false;
if (*(long*)(a+4) != *(long*)(b+4)) return false;
if (*(long*)(a+8) != *(long*)(b+8)) return false;
a += 12; b += 12; length -= 12;
}
#else
while (length >= 10)
{
if (*(int*)a != *(int*)b) return false;
if (*(int*)(a+2) != *(int*)(b+2)) return false;
if (*(int*)(a+4) != *(int*)(b+4)) return false;
if (*(int*)(a+6) != *(int*)(b+6)) return false;
if (*(int*)(a+8) != *(int*)(b+8)) return false;
a += 10; b += 10; length -= 10;
}
#endif
// This depends on the fact that the String objects are
// always zero terminated and that the terminating zero is not included
// in the length. For odd string sizes, the last compare will include
// the zero terminator.
while (length > 0)
{
if (*(int*)a != *(int*)b) break;
a += 2; b += 2; length -= 2;
}
return (length <= 0);
}
}
Since many of the fancy solutions above don't work with UWP, and because I love LINQ and functional approaches, I present you my version of a solution to this problem.
To exit the comparison as soon as the first difference occurs, I chose .FirstOrDefault().
public static bool CompareByteArrays(byte[] ba0, byte[] ba1) =>
    ba0.Length == ba1.Length && Enumerable.Range(0, ba0.Length)
        .Select(n => (int?)n) // nullable, so "no difference found" (null) is distinct from a difference at index 0
        .FirstOrDefault(n => ba0[n.Value] != ba1[n.Value]) == null;
