Checking if Bytes are 0x00 - c#

What is the most readable (and idiomatic) way to write this method?
private bool BytesAreValid(byte[] bytes) {
var t = (bytes[0] | bytes[1] | bytes[2]);
return t != 0;
}
I need a function that tests whether the first three bytes of a file do not begin with 00 00 00.
I haven't done much byte manipulation. The code above doesn't seem correct to me, since t is inferred to be of type Int32.

t is type-inferred to be an Int32
Yup, because the | operator (like most operators) isn't defined for byte - the bytes are promoted to int values. (See section 7.11.1 of the C# 4 spec for details.)
But given that you only want to compare it with 0, that's fine anyway.
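For instance, a minimal standalone snippet (purely illustrative) shows the promotion described above:
byte b0 = 0x12, b1 = 0x00;
var t = b0 | b1;                 // byte | byte is evaluated as int | int
Console.WriteLine(t.GetType());  // System.Int32
Console.WriteLine(t != 0);       // True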
Personally I'd just write it as:
return bytes[0] != 0 || bytes[1] != 0 || bytes[2] != 0;
Or even:
return (bytes[0] | bytes[1] | bytes[2]) != 0;
Both of these seem clearer to me.

private bool BytesAreValid(byte[] bytes) {
return !bytes.Take(3).SequenceEqual(new byte[] { 0, 0, 0 });
}

To anticipate variable array lengths and avoid null reference exceptions:
private bool BytesAreValid(byte[] bytes)
{
if (bytes == null) return false;
return !Array.Exists(bytes, x => x == 0);
}
Non-Linq version:
private bool BytesAreValid(byte[] bytes)
{
if (bytes == null) return false;
for (int i = 0; i < bytes.Length; i++)
{
if (bytes[i] == 0) return false;
}
return true;
}

Related

Decode cyrillic quoted-printable content

I'm using this sample for getting mail from the server. The problem is that the response contains Cyrillic symbols that I cannot decode.
Here is a header:
Content-type: text/html; charset="koi8-r"
Content-Transfer-Encoding: quoted-printable
And receive response function:
static void receiveResponse(string command)
{
try
{
if (command != "")
{
if (tcpc.Connected)
{
dummy = Encoding.ASCII.GetBytes(command);
ssl.Write(dummy, 0, dummy.Length);
}
else
{
throw new ApplicationException("TCP CONNECTION DISCONNECTED");
}
}
ssl.Flush();
byte[] bigBuffer = new byte[1024*16];
int bites = ssl.Read(bigBuffer, 0, bigBuffer.Length);
byte[] buffer = new byte[bites];
Array.Copy(bigBuffer, 0, buffer, 0, bites);
sb.Append(Encoding.ASCII.GetString(buffer));
string result = sb.ToString();
// here is an unsuccessful attempt at decoding
result = Regex.Replace(result, @"=([0-9a-fA-F]{2})",
m => m.Groups[1].Success
? Convert.ToChar(Convert.ToInt32(m.Groups[1].Value, 16)).ToString()
: "");
byte[] bytes = Encoding.Default.GetBytes(result);
result = Encoding.GetEncoding("koi8r").GetString(bytes);
}
catch (Exception ex)
{
throw new ApplicationException(ex.ToString());
}
}
How do I decode the stream correctly? In the result string I get <p>=F0=D2=C9=D7=C5=D4 =D1 =F7=C1=CE=D1</p> instead of <p>Привет я Ваня</p>.
As @Max pointed out, you will need to decode the content using the encoding algorithm declared in the Content-Transfer-Encoding header.
In your case, it is the quoted-printable encoding.
You will need to decode the text of the message into an array of bytes and then you’ll need to convert that array of bytes into a string using the appropriate System.Text.Encoding. The name of the encoding to use will typically be specified in the Content-Type header as the charset parameter (in your case, koi8-r).
Since you already have the text as bytes in the bigBuffer variable, simply perform the decoding on that:
byte[] buffer = new byte[bites];
int decodedLength = 0;
for (int i = 0; i < bites; i++) {
if (bigBuffer[i] == (byte) '=') {
if (i + 2 < bites) {
// possible hex sequence
byte b1 = bigBuffer[i + 1];
byte b2 = bigBuffer[i + 2];
if (IsXDigit (b1) && IsXDigit (b2)) {
// decode
buffer[decodedLength++] = (byte) ((ToXDigit (b1) << 4) | ToXDigit (b2));
i += 2;
} else if (b1 == (byte) '\r' && b2 == (byte) '\n') {
// folded line, drop the '=\r\n' sequence
i += 2;
} else {
// error condition, just pass it through
buffer[decodedLength++] = bigBuffer[i];
}
} else {
// truncated? just pass it through
buffer[decodedLength++] = bigBuffer[i];
}
} else {
buffer[decodedLength++] = bigBuffer[i];
}
}
string result = Encoding.GetEncoding ("koi8-r").GetString (buffer, 0, decodedLength);
Custom functions:
static byte ToXDigit (byte c)
{
if (c >= 0x41) {
if (c >= 0x61)
return (byte) (c - (0x61 - 0x0a));
return (byte) (c - (0x41 - 0x0A));
}
return (byte) (c - 0x30);
}
static bool IsXDigit (byte c)
{
return (c >= (byte) 'A' && c <= (byte) 'F') || (c >= (byte) 'a' && c <= (byte) 'f') || (c >= (byte) '0' && c <= (byte) '9');
}
Of course, instead of writing your own hodge podge IMAP library, you could just use MimeKit and MailKit ;-)
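For completeness, a minimal sketch of that route (illustrative only; it assumes the raw message is available as a Stream named stream and uses MimeKit's MimeMessage API):
using MimeKit;
// MimeKit applies the Content-Transfer-Encoding and the charset declared in the headers for you.
var message = MimeMessage.Load(stream);
string html = message.HtmlBody; // already quoted-printable-decoded and converted from koi8-r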

Encoding errors in embedded Json file

I have run into an issue and can't quite get my head around it.
I have this code:
public List<NavigationModul> LoadNavigation()
{
byte[] navBytes = NavigationResources.Navigation;
var encoding = GetEncoding(navBytes);
string json = encoding.GetString(navBytes);
List<NavigationModul> navigation = JsonConvert.DeserializeObject<List<NavigationModul>>(json);
return navigation;
}
public static Encoding GetEncoding(byte [] textBytes)
{
if (textBytes[0] == 0x2b && textBytes[1] == 0x2f && textBytes[2] == 0x76) return Encoding.UTF7;
if (textBytes[0] == 0xef && textBytes[1] == 0xbb && textBytes[2] == 0xbf) return Encoding.UTF8;
if (textBytes[0] == 0xff && textBytes[1] == 0xfe) return Encoding.Unicode; //UTF-16LE
if (textBytes[0] == 0xfe && textBytes[1] == 0xff) return Encoding.BigEndianUnicode; //UTF-16BE
if (textBytes[0] == 0 && textBytes[1] == 0 && textBytes[2] == 0xfe && textBytes[3] == 0xff) return Encoding.UTF32;
return Encoding.ASCII;
}
The goal is to load an embedded JSON file (NavigationResources.Navigation) from a resource file. We are just using the ResourceManager to avoid magic strings.
After loading the bytes of the embedded file and checking its encoding, I read the string from the file and pass it to the JsonConvert.DeserializeObject function.
But unfortunately this fails due to invalid JSON. Long story short: the loaded JSON string still contains the encoding identification bytes, and I can't figure out how to get rid of them.
I also tried to convert the UTF-8 byte array to the default encoding before loading the string, but this only makes the encoding bytes become a visible character.
I talked to my peers and they told me that they have run into the same problem reading embedded batch files, leading to broken batch files. They didn't know how to fix the problem either, but came up with a workaround for the batch files themselves (adding a blank line to the batch file to make it work).
Any suggestions on how to fix this?
Thanks to Alex K. I have a solution:
Cutting off the identification bytes (the BOM) before calling Encoding.GetString did the trick.
Here is my function I now use to do the Task:
public static string GetStringFromEncodedBytes(byte[] bytes) {
Encoding encoding = Encoding.Default;
int skipBytes = 0;
if (bytes[0] == 0x2b && bytes[1] == 0x2f && bytes[2] == 0x76) {
encoding = Encoding.UTF7;
skipBytes = 3;
}
if (bytes[0] == 0xef && bytes[1] == 0xbb && bytes[2] == 0xbf)
{
encoding = Encoding.UTF8;
skipBytes = 3;
}
if (bytes[0] == 0xff && bytes[1] == 0xfe)
{
encoding = Encoding.Unicode;
skipBytes = 2;
}
if (bytes[0] == 0xfe && bytes[1] == 0xff)
{
encoding = Encoding.BigEndianUnicode;
skipBytes = 2;
}
if (bytes[0] == 0 && bytes[1] == 0 && bytes[2] == 0xfe && bytes[3] == 0xff)
{
encoding = Encoding.UTF32;
skipBytes = 4;
}
return encoding.GetString(bytes.Skip(skipBytes).ToArray());
}
Here's a simpler approach, removing the BOM after decoding:
// Your data is always in UTF-8 apparently, so just rely on that.
string text = Encoding.UTF8.GetString(data);
if (text.StartsWith("\ufeff"))
{
text = text.Substring(1);
}
This has the downside of copying the string, of course.
Or if you do want to skip the bytes:
// Again, we're assuming UTF-8
int start = (data.Length >= 3 && data[0] == 0xef &&
             data[1] == 0xbb && data[2] == 0xbf)
    ? 3 : 0;
string text = Encoding.UTF8.GetString(data, start, data.Length - start);
That way you don't need to use Skip and ToArray, and it avoids doing any extraneous copying.
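If you would rather not deal with the BOM yourself at all, note that StreamReader detects and strips a BOM by default; a minimal sketch, assuming the resource bytes are in a byte[] named data:
using System.IO;
using System.Text;
// StreamReader recognizes UTF-8/UTF-16/UTF-32 BOMs and omits them from the decoded text.
using (var reader = new StreamReader(new MemoryStream(data), Encoding.UTF8, detectEncodingFromByteOrderMarks: true))
{
    string text = reader.ReadToEnd(); // no leading "\ufeff"
}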

Comparing Byte Arrays In C# (without for loop) [duplicate]

How can I do this fast?
Sure I can do this:
static bool ByteArrayCompare(byte[] a1, byte[] a2)
{
if (a1.Length != a2.Length)
return false;
for (int i=0; i<a1.Length; i++)
if (a1[i]!=a2[i])
return false;
return true;
}
But I'm looking for either a BCL function or some highly optimized proven way to do this.
java.util.Arrays.equals((sbyte[])(Array)a1, (sbyte[])(Array)a2);
works nicely, but it doesn't look like that would work for x64.
Note my super-fast answer here.
You can use Enumerable.SequenceEqual method.
using System;
using System.Linq;
...
var a1 = new int[] { 1, 2, 3};
var a2 = new int[] { 1, 2, 3};
var a3 = new int[] { 1, 2, 4};
var x = a1.SequenceEqual(a2); // true
var y = a1.SequenceEqual(a3); // false
If you can't use .NET 3.5 for some reason, your method is OK.
The compiler/run-time environment will optimize your loop, so you don't need to worry about performance.
P/Invoke powers activate!
[DllImport("msvcrt.dll", CallingConvention=CallingConvention.Cdecl)]
static extern int memcmp(byte[] b1, byte[] b2, long count);
static bool ByteArrayCompare(byte[] b1, byte[] b2)
{
// Validate buffers are the same length.
// This also ensures that the count does not exceed the length of either buffer.
return b1.Length == b2.Length && memcmp(b1, b2, b1.Length) == 0;
}
Span<T> offers an extremely competitive alternative without having to throw confusing and/or non-portable fluff into your own application's code base:
// byte[] is implicitly convertible to ReadOnlySpan<byte>
static bool ByteArrayCompare(ReadOnlySpan<byte> a1, ReadOnlySpan<byte> a2)
{
return a1.SequenceEqual(a2);
}
The (guts of the) implementation as of .NET 6.0.4 can be found here.
I've revised @EliArbel's gist to add this method as SpansEqual, drop most of the less interesting performers in others' benchmarks, run it with different array sizes, output graphs, and mark SpansEqual as the baseline so that it reports how the different methods compare to SpansEqual.
The below numbers are from the results, lightly edited to remove "Error" column.
| Method | ByteCount | Mean | StdDev | Ratio | RatioSD |
|-------------- |----------- |-------------------:|----------------:|------:|--------:|
| SpansEqual | 15 | 2.074 ns | 0.0233 ns | 1.00 | 0.00 |
| LongPointers | 15 | 2.854 ns | 0.0632 ns | 1.38 | 0.03 |
| Unrolled | 15 | 12.449 ns | 0.2487 ns | 6.00 | 0.13 |
| PInvokeMemcmp | 15 | 7.525 ns | 0.1057 ns | 3.63 | 0.06 |
| | | | | | |
| SpansEqual | 1026 | 15.629 ns | 0.1712 ns | 1.00 | 0.00 |
| LongPointers | 1026 | 46.487 ns | 0.2938 ns | 2.98 | 0.04 |
| Unrolled | 1026 | 23.786 ns | 0.1044 ns | 1.52 | 0.02 |
| PInvokeMemcmp | 1026 | 28.299 ns | 0.2781 ns | 1.81 | 0.03 |
| | | | | | |
| SpansEqual | 1048585 | 17,920.329 ns | 153.0750 ns | 1.00 | 0.00 |
| LongPointers | 1048585 | 42,077.448 ns | 309.9067 ns | 2.35 | 0.02 |
| Unrolled | 1048585 | 29,084.901 ns | 428.8496 ns | 1.62 | 0.03 |
| PInvokeMemcmp | 1048585 | 30,847.572 ns | 213.3162 ns | 1.72 | 0.02 |
| | | | | | |
| SpansEqual | 2147483591 | 124,752,376.667 ns | 552,281.0202 ns | 1.00 | 0.00 |
| LongPointers | 2147483591 | 139,477,269.231 ns | 331,458.5429 ns | 1.12 | 0.00 |
| Unrolled | 2147483591 | 137,617,423.077 ns | 238,349.5093 ns | 1.10 | 0.00 |
| PInvokeMemcmp | 2147483591 | 138,373,253.846 ns | 288,447.8278 ns | 1.11 | 0.01 |
I was surprised to see SpansEqual not come out on top for the max-array-size methods, but the difference is so minor that I don't think it'll ever matter. After refreshing to run on .NET 6.0.4 with my newer hardware, SpansEqual now comfortably outperforms all others at all array sizes.
My system info:
BenchmarkDotNet=v0.13.1, OS=Windows 10.0.22000
AMD Ryzen 9 5900X, 1 CPU, 24 logical and 12 physical cores
.NET SDK=6.0.202
[Host] : .NET 6.0.4 (6.0.422.16404), X64 RyuJIT
DefaultJob : .NET 6.0.4 (6.0.422.16404), X64 RyuJIT
There's a new built-in solution for this in .NET 4 - IStructuralEquatable
static bool ByteArrayCompare(byte[] a1, byte[] a2)
{
return StructuralComparisons.StructuralEqualityComparer.Equals(a1, a2);
}
Edit: the modern fast way is to use a1.SequenceEqual(a2)
User gil suggested unsafe code which spawned this solution:
// Copyright (c) 2008-2013 Hafthor Stefansson
// Distributed under the MIT/X11 software license
// Ref: http://www.opensource.org/licenses/mit-license.php.
static unsafe bool UnsafeCompare(byte[] a1, byte[] a2) {
unchecked {
if(a1==a2) return true;
if(a1==null || a2==null || a1.Length!=a2.Length)
return false;
fixed (byte* p1=a1, p2=a2) {
byte* x1=p1, x2=p2;
int l = a1.Length;
for (int i=0; i < l/8; i++, x1+=8, x2+=8)
if (*((long*)x1) != *((long*)x2)) return false;
if ((l & 4)!=0) { if (*((int*)x1)!=*((int*)x2)) return false; x1+=4; x2+=4; }
if ((l & 2)!=0) { if (*((short*)x1)!=*((short*)x2)) return false; x1+=2; x2+=2; }
if ((l & 1)!=0) if (*((byte*)x1) != *((byte*)x2)) return false;
return true;
}
}
}
which does 64-bit based comparison for as much of the array as possible. This kind of counts on the fact that the arrays start qword aligned. It'll work if not qword aligned, just not as fast as if it were.
It performs about seven times faster than the simple `for` loop. Using the J# library performed equivalently to the original `for` loop. Using .SequenceEqual runs around seven times slower; I think just because it is using IEnumerator.MoveNext. I imagine LINQ-based solutions being at least that slow or worse.
If you are not opposed to doing it, you can import the J# assembly "vjslib.dll" and use its Arrays.equals(byte[], byte[]) method...
Don't blame me if someone laughs at you though...
EDIT: For what little it is worth, I used Reflector to disassemble the code for that, and here is what it looks like:
public static bool equals(sbyte[] a1, sbyte[] a2)
{
if (a1 == a2)
{
return true;
}
if ((a1 != null) && (a2 != null))
{
if (a1.Length != a2.Length)
{
return false;
}
for (int i = 0; i < a1.Length; i++)
{
if (a1[i] != a2[i])
{
return false;
}
}
return true;
}
return false;
}
.NET 3.5 and newer have a new public type, System.Data.Linq.Binary that encapsulates byte[]. It implements IEquatable<Binary> that (in effect) compares two byte arrays. Note that System.Data.Linq.Binary also has implicit conversion operator from byte[].
MSDN documentation: System.Data.Linq.Binary
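As a quick usage sketch (illustrative only; it assumes a reference to System.Data.Linq and non-null inputs, and relies on the implicit byte[]-to-Binary conversion):
using System.Data.Linq;
static bool ByteArrayCompare(byte[] a1, byte[] a2)
{
    // byte[] converts implicitly to Binary; Equals does the length/hash/byte-wise comparison.
    Binary b1 = a1;
    Binary b2 = a2;
    return b1.Equals(b2);
}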
Reflector decompile of the Equals method:
private bool EqualsTo(Binary binary)
{
if (this != binary)
{
if (binary == null)
{
return false;
}
if (this.bytes.Length != binary.bytes.Length)
{
return false;
}
if (this.hashCode != binary.hashCode)
{
return false;
}
int index = 0;
int length = this.bytes.Length;
while (index < length)
{
if (this.bytes[index] != binary.bytes[index])
{
return false;
}
index++;
}
}
return true;
}
An interesting twist is that they only proceed to the byte-by-byte comparison loop if the hashes of the two Binary objects are the same. This, however, comes at the cost of computing the hash in the constructor of the Binary objects (by traversing the array with a for loop :-) ).
The above implementation means that in the worst case you may have to traverse the arrays three times: first to compute the hash of array1, then to compute the hash of array2, and finally (because this is the worst case scenario: lengths and hashes equal) to compare the bytes in array1 with the bytes in array2.
Overall, even though System.Data.Linq.Binary is built into BCL, I don't think it is the fastest way to compare two byte arrays :-|.
I posted a similar question about checking if byte[] is full of zeroes. (SIMD code was beaten so I removed it from this answer.) Here is the fastest code from my comparisons:
static unsafe bool EqualBytesLongUnrolled (byte[] data1, byte[] data2)
{
if (data1 == data2)
return true;
if (data1.Length != data2.Length)
return false;
fixed (byte* bytes1 = data1, bytes2 = data2) {
int len = data1.Length;
int rem = len % (sizeof(long) * 16);
long* b1 = (long*)bytes1;
long* b2 = (long*)bytes2;
long* e1 = (long*)(bytes1 + len - rem);
while (b1 < e1) {
if (*(b1) != *(b2) || *(b1 + 1) != *(b2 + 1) ||
*(b1 + 2) != *(b2 + 2) || *(b1 + 3) != *(b2 + 3) ||
*(b1 + 4) != *(b2 + 4) || *(b1 + 5) != *(b2 + 5) ||
*(b1 + 6) != *(b2 + 6) || *(b1 + 7) != *(b2 + 7) ||
*(b1 + 8) != *(b2 + 8) || *(b1 + 9) != *(b2 + 9) ||
*(b1 + 10) != *(b2 + 10) || *(b1 + 11) != *(b2 + 11) ||
*(b1 + 12) != *(b2 + 12) || *(b1 + 13) != *(b2 + 13) ||
*(b1 + 14) != *(b2 + 14) || *(b1 + 15) != *(b2 + 15))
return false;
b1 += 16;
b2 += 16;
}
for (int i = 0; i < rem; i++)
if (data1 [len - 1 - i] != data2 [len - 1 - i])
return false;
return true;
}
}
Measured on two 256MB byte arrays:
UnsafeCompare : 86,8784 ms
EqualBytesSimd : 71,5125 ms
EqualBytesSimdUnrolled : 73,1917 ms
EqualBytesLongUnrolled : 39,8623 ms
using System.Linq; //SequenceEqual
byte[] ByteArray1 = null;
byte[] ByteArray2 = null;
ByteArray1 = MyFunct1();
ByteArray2 = MyFunct2();
if (ByteArray1.SequenceEqual<byte>(ByteArray2) == true)
{
MessageBox.Show("Match");
}
else
{
MessageBox.Show("Don't match");
}
Let's add one more!
Recently Microsoft released a special NuGet package, System.Runtime.CompilerServices.Unsafe. It's special because it's written in IL, and provides low-level functionality not directly available in C#.
One of its methods, Unsafe.As<T>(object) allows casting any reference type to another reference type, skipping any safety checks. This is usually a very bad idea, but if both types have the same structure, it can work. So we can use this to cast a byte[] to a long[]:
bool CompareWithUnsafeLibrary(byte[] a1, byte[] a2)
{
if (a1.Length != a2.Length) return false;
var longSize = (int)Math.Floor(a1.Length / 8.0);
var long1 = Unsafe.As<long[]>(a1);
var long2 = Unsafe.As<long[]>(a2);
for (var i = 0; i < longSize; i++)
{
if (long1[i] != long2[i]) return false;
}
for (var i = longSize * 8; i < a1.Length; i++)
{
if (a1[i] != a2[i]) return false;
}
return true;
}
Note that long1.Length would still return the original array's length, since it's stored in a field in the array's memory structure.
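A tiny sketch of that gotcha (assuming the System.Runtime.CompilerServices.Unsafe package is referenced; reading Length is harmless here, but indexing past the real byte count would not be):
using System;
using System.Runtime.CompilerServices;
byte[] bytes = new byte[16];
long[] longs = Unsafe.As<long[]>(bytes);
Console.WriteLine(longs.Length); // prints 16 (the original byte count), not 2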
This method is not quite as fast as other methods demonstrated here, but it is a lot faster than the naive method, doesn't use unsafe code or P/Invoke or pinning, and the implementation is quite straightforward (IMO). Here are some BenchmarkDotNet results from my machine:
BenchmarkDotNet=v0.10.3.0, OS=Microsoft Windows NT 6.2.9200.0
Processor=Intel(R) Core(TM) i7-4870HQ CPU 2.50GHz, ProcessorCount=8
Frequency=2435775 Hz, Resolution=410.5470 ns, Timer=TSC
[Host] : Clr 4.0.30319.42000, 64bit RyuJIT-v4.6.1637.0
DefaultJob : Clr 4.0.30319.42000, 64bit RyuJIT-v4.6.1637.0
Method | Mean | StdDev |
----------------------- |-------------- |---------- |
UnsafeLibrary | 125.8229 ns | 0.3588 ns |
UnsafeCompare | 89.9036 ns | 0.8243 ns |
JSharpEquals | 1,432.1717 ns | 1.3161 ns |
EqualBytesLongUnrolled | 43.7863 ns | 0.8923 ns |
NewMemCmp | 65.4108 ns | 0.2202 ns |
ArraysEqual | 910.8372 ns | 2.6082 ns |
PInvokeMemcmp | 52.7201 ns | 0.1105 ns |
I've also created a gist with all the tests.
I developed a method that slightly beats memcmp() (plinth's answer) and very slightly beats EqualBytesLongUnrolled() (Arek Bulski's answer) on my PC. Basically, it unrolls the loop by 4 instead of 8.
Update 30 Mar. 2019:
Starting in .NET core 3.0, we have SIMD support!
This solution is fastest by a considerable margin on my PC:
#if NETCOREAPP3_0
using System.Runtime.Intrinsics.X86;
#endif
…
public static unsafe bool Compare(byte[] arr0, byte[] arr1)
{
if (arr0 == arr1)
{
return true;
}
if (arr0 == null || arr1 == null)
{
return false;
}
if (arr0.Length != arr1.Length)
{
return false;
}
if (arr0.Length == 0)
{
return true;
}
fixed (byte* b0 = arr0, b1 = arr1)
{
#if NETCOREAPP3_0
if (Avx2.IsSupported)
{
return Compare256(b0, b1, arr0.Length);
}
else if (Sse2.IsSupported)
{
return Compare128(b0, b1, arr0.Length);
}
else
#endif
{
return Compare64(b0, b1, arr0.Length);
}
}
}
#if NETCOREAPP3_0
public static unsafe bool Compare256(byte* b0, byte* b1, int length)
{
byte* lastAddr = b0 + length;
byte* lastAddrMinus128 = lastAddr - 128;
const int mask = -1;
while (b0 < lastAddrMinus128) // unroll the loop so that we are comparing 128 bytes at a time.
{
if (Avx2.MoveMask(Avx2.CompareEqual(Avx.LoadVector256(b0), Avx.LoadVector256(b1))) != mask)
{
return false;
}
if (Avx2.MoveMask(Avx2.CompareEqual(Avx.LoadVector256(b0 + 32), Avx.LoadVector256(b1 + 32))) != mask)
{
return false;
}
if (Avx2.MoveMask(Avx2.CompareEqual(Avx.LoadVector256(b0 + 64), Avx.LoadVector256(b1 + 64))) != mask)
{
return false;
}
if (Avx2.MoveMask(Avx2.CompareEqual(Avx.LoadVector256(b0 + 96), Avx.LoadVector256(b1 + 96))) != mask)
{
return false;
}
b0 += 128;
b1 += 128;
}
while (b0 < lastAddr)
{
if (*b0 != *b1) return false;
b0++;
b1++;
}
return true;
}
public static unsafe bool Compare128(byte* b0, byte* b1, int length)
{
byte* lastAddr = b0 + length;
byte* lastAddrMinus64 = lastAddr - 64;
const int mask = 0xFFFF;
while (b0 < lastAddrMinus64) // unroll the loop so that we are comparing 64 bytes at a time.
{
if (Sse2.MoveMask(Sse2.CompareEqual(Sse2.LoadVector128(b0), Sse2.LoadVector128(b1))) != mask)
{
return false;
}
if (Sse2.MoveMask(Sse2.CompareEqual(Sse2.LoadVector128(b0 + 16), Sse2.LoadVector128(b1 + 16))) != mask)
{
return false;
}
if (Sse2.MoveMask(Sse2.CompareEqual(Sse2.LoadVector128(b0 + 32), Sse2.LoadVector128(b1 + 32))) != mask)
{
return false;
}
if (Sse2.MoveMask(Sse2.CompareEqual(Sse2.LoadVector128(b0 + 48), Sse2.LoadVector128(b1 + 48))) != mask)
{
return false;
}
b0 += 64;
b1 += 64;
}
while (b0 < lastAddr)
{
if (*b0 != *b1) return false;
b0++;
b1++;
}
return true;
}
#endif
public static unsafe bool Compare64(byte* b0, byte* b1, int length)
{
byte* lastAddr = b0 + length;
byte* lastAddrMinus32 = lastAddr - 32;
while (b0 < lastAddrMinus32) // unroll the loop so that we are comparing 32 bytes at a time.
{
if (*(ulong*)b0 != *(ulong*)b1) return false;
if (*(ulong*)(b0 + 8) != *(ulong*)(b1 + 8)) return false;
if (*(ulong*)(b0 + 16) != *(ulong*)(b1 + 16)) return false;
if (*(ulong*)(b0 + 24) != *(ulong*)(b1 + 24)) return false;
b0 += 32;
b1 += 32;
}
while (b0 < lastAddr)
{
if (*b0 != *b1) return false;
b0++;
b1++;
}
return true;
}
I would use unsafe code and run the for loop comparing Int32 pointers.
Maybe you should also consider checking the arrays to be non-null.
If you look at how .NET does string.Equals, you see that it uses a private method called EqualsHelper which has an "unsafe" pointer implementation. .NET Reflector is your friend to see how things are done internally.
This can be used as a template for byte array comparison, which I implemented in the blog post Fast byte array comparison in C#. I also did some rudimentary benchmarks to see when a safe implementation is faster than the unsafe one.
That said, unless you really need killer performance, I'd go for a simple for loop comparison.
For those of you that care about order (i.e. want your memcmp to return an int like it should instead of nothing), .NET Core 3.0 (and presumably .NET Standard 2.1 aka .NET 5.0) will include a Span.SequenceCompareTo(...) extension method (plus a Span.SequenceEqual) that can be used to compare two ReadOnlySpan<T> instances (where T: IComparable<T>).
In the original GitHub proposal, the discussion included approach comparisons with jump table calculations, reading a byte[] as long[], SIMD usage, and p/invoke to the CLR implementation's memcmp.
Going forward, this should be your go-to method for comparing byte arrays or byte ranges (as should using Span<byte> instead of byte[] for your .NET Standard 2.1 APIs), and it is fast enough that you should no longer care about optimizing it (and no, despite the similarities in name it does not perform as abysmally as the horrid Enumerable.SequenceEqual).
#if NETCOREAPP3_0_OR_GREATER
// Using the platform-native Span<T>.SequenceEqual<T>(..)
public static int Compare(byte[] range1, int offset1, byte[] range2, int offset2, int count)
{
var span1 = range1.AsSpan(offset1, count);
var span2 = range2.AsSpan(offset2, count);
return span1.SequenceCompareTo(span2);
// or, if you don't care about ordering
// return span1.SequenceEqual(span2);
}
#else
// The most basic implementation, in platform-agnostic, safe C#
public static bool Compare(byte[] range1, int offset1, byte[] range2, int offset2, int count)
{
// Working backwards lets the compiler optimize away bound checking after the first loop
for (int i = count - 1; i >= 0; --i)
{
if (range1[offset1 + i] != range2[offset2 + i])
{
return false;
}
}
return true;
}
#endif
I did some measurements using the attached program, a .NET 4.7 release build without the debugger attached. I think people have been using the wrong metric, since what you care about here, if speed matters, is how long it takes to figure out whether two byte arrays are equal, i.e. throughput in bytes.
StructuralComparison : 4.6 MiB/s
for : 274.5 MiB/s
ToUInt32 : 263.6 MiB/s
ToUInt64 : 474.9 MiB/s
memcmp : 8500.8 MiB/s
As you can see, there's no better way than memcmp and it's orders of magnitude faster. A simple for loop is the second best option. And it still boggles my mind why Microsoft cannot simply include a Buffer.Compare method.
[Program.cs]:
using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Runtime.InteropServices;
using System.Text;
using System.Threading.Tasks;
namespace memcmp
{
class Program
{
static byte[] TestVector(int size)
{
var data = new byte[size];
using (var rng = new System.Security.Cryptography.RNGCryptoServiceProvider())
{
rng.GetBytes(data);
}
return data;
}
static TimeSpan Measure(string testCase, TimeSpan offset, Action action, bool ignore = false)
{
var t = Stopwatch.StartNew();
var n = 0L;
while (t.Elapsed < TimeSpan.FromSeconds(10))
{
action();
n++;
}
var elapsed = t.Elapsed - offset;
if (!ignore)
{
Console.WriteLine($"{testCase,-16} : {n / elapsed.TotalSeconds,16:0.0} MiB/s");
}
return elapsed;
}
[DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl)]
static extern int memcmp(byte[] b1, byte[] b2, long count);
static void Main(string[] args)
{
// how quickly can we establish if two sequences of bytes are equal?
// note that we are testing the speed of different comparison methods
var a = TestVector(1024 * 1024); // 1 MiB
var b = (byte[])a.Clone();
// was meant to offset the overhead of everything but copying but my attempt was a horrible mistake... should have reacted sooner due to the initially ridiculous throughput values...
// Measure("offset", new TimeSpan(), () => { return; }, ignore: true);
var offset = TimeSpan.Zero;
Measure("StructuralComparison", offset, () =>
{
StructuralComparisons.StructuralEqualityComparer.Equals(a, b);
});
Measure("for", offset, () =>
{
for (int i = 0; i < a.Length; i++)
{
if (a[i] != b[i]) break;
}
});
Measure("ToUInt32", offset, () =>
{
for (int i = 0; i < a.Length; i += 4)
{
if (BitConverter.ToUInt32(a, i) != BitConverter.ToUInt32(b, i)) break;
}
});
Measure("ToUInt64", offset, () =>
{
for (int i = 0; i < a.Length; i += 8)
{
if (BitConverter.ToUInt64(a, i) != BitConverter.ToUInt64(b, i)) break;
}
});
Measure("memcmp", offset, () =>
{
memcmp(a, b, a.Length);
});
}
}
}
Couldn't find a solution I'm completely happy with (reasonable performance, but no unsafe code/pinvoke) so I came up with this, nothing really original, but works:
/// <summary>
///
/// </summary>
/// <param name="array1"></param>
/// <param name="array2"></param>
/// <param name="bytesToCompare"> 0 means compare entire arrays</param>
/// <returns></returns>
public static bool ArraysEqual(byte[] array1, byte[] array2, int bytesToCompare = 0)
{
if (array1.Length != array2.Length) return false;
var length = (bytesToCompare == 0) ? array1.Length : bytesToCompare;
var tailIdx = length - length % sizeof(Int64);
//check in 8 byte chunks
for (var i = 0; i < tailIdx; i += sizeof(Int64))
{
if (BitConverter.ToInt64(array1, i) != BitConverter.ToInt64(array2, i)) return false;
}
//check the remainder of the array, always shorter than 8 bytes
for (var i = tailIdx; i < length; i++)
{
if (array1[i] != array2[i]) return false;
}
return true;
}
Performance compared with some of the other solutions on this page:
Simple Loop: 19837 ticks, 1.00
*BitConverter: 4886 ticks, 4.06
UnsafeCompare: 1636 ticks, 12.12
EqualBytesLongUnrolled: 637 ticks, 31.09
P/Invoke memcmp: 369 ticks, 53.67
Tested in linqpad, 1000000 bytes identical arrays (worst case scenario), 500 iterations each.
It seems that EqualBytesLongUnrolled is the best of the suggestions above.
Skipped methods (Enumerable.SequenceEqual, StructuralComparisons.StructuralEqualityComparer.Equals) were left out because I was not patient enough to wait for their slow runs. On 265MB arrays I have measured this:
Host Process Environment Information:
BenchmarkDotNet.Core=v0.9.9.0
OS=Microsoft Windows NT 6.2.9200.0
Processor=Intel(R) Core(TM) i7-3770 CPU 3.40GHz, ProcessorCount=8
Frequency=3323582 ticks, Resolution=300.8802 ns, Timer=TSC
CLR=MS.NET 4.0.30319.42000, Arch=64-bit RELEASE [RyuJIT]
GC=Concurrent Workstation
JitModules=clrjit-v4.6.1590.0
Type=CompareMemoriesBenchmarks Mode=Throughput
Method | Median | StdDev | Scaled | Scaled-SD |
----------------------- |------------ |---------- |------- |---------- |
NewMemCopy | 30.0443 ms | 1.1880 ms | 1.00 | 0.00 |
EqualBytesLongUnrolled | 29.9917 ms | 0.7480 ms | 0.99 | 0.04 |
msvcrt_memcmp | 30.0930 ms | 0.2964 ms | 1.00 | 0.03 |
UnsafeCompare | 31.0520 ms | 0.7072 ms | 1.03 | 0.04 |
ByteArrayCompare | 212.9980 ms | 2.0776 ms | 7.06 | 0.25 |
OS=Windows
Processor=?, ProcessorCount=8
Frequency=3323582 ticks, Resolution=300.8802 ns, Timer=TSC
CLR=CORE, Arch=64-bit ? [RyuJIT]
GC=Concurrent Workstation
dotnet cli version: 1.0.0-preview2-003131
Type=CompareMemoriesBenchmarks Mode=Throughput
Method | Median | StdDev | Scaled | Scaled-SD |
----------------------- |------------ |---------- |------- |---------- |
NewMemCopy | 30.1789 ms | 0.0437 ms | 1.00 | 0.00 |
EqualBytesLongUnrolled | 30.1985 ms | 0.1782 ms | 1.00 | 0.01 |
msvcrt_memcmp | 30.1084 ms | 0.0660 ms | 1.00 | 0.00 |
UnsafeCompare | 31.1845 ms | 0.4051 ms | 1.03 | 0.01 |
ByteArrayCompare | 212.0213 ms | 0.1694 ms | 7.03 | 0.01 |
For comparing short byte arrays the following is an interesting hack:
if(myByteArray1.Length != myByteArray2.Length) return false;
if(myByteArray1.Length == 8)
return BitConverter.ToInt64(myByteArray1, 0) == BitConverter.ToInt64(myByteArray2, 0);
else if(myByteArray1.Length == 4)
return BitConverter.ToInt32(myByteArray1, 0) == BitConverter.ToInt32(myByteArray2, 0);
Then I would probably fall back to the solution listed in the question.
It'd be interesting to do a performance analysis of this code.
I have not seen many LINQ solutions here.
I am not sure of the performance implications; however, I generally stick to LINQ as a rule of thumb and then optimize later if necessary.
public bool CompareTwoArrays(byte[] array1, byte[] array2)
{
return !array1.Where((t, i) => t != array2[i]).Any();
}
Please do note this only works if the arrays are the same size.
An extension that checks this first could look like so:
public bool CompareTwoArrays(byte[] array1, byte[] array2)
{
if (array1.Length != array2.Length) return false;
return !array1.Where((t, i) => t != array2[i]).Any();
}
I thought about block-transfer acceleration methods built into many graphics cards. But then you would have to copy over all the data byte-wise, so this doesn't help you much if you don't want to implement a whole portion of your logic in unmanaged and hardware-dependent code...
Another way of optimization similar to the approach shown above would be to store as much of your data as possible in a long[] rather than a byte[] right from the start, for example if you are reading it sequentially from a binary file, or if you use a memory mapped file, read in data as long[] or single long values. Then, your comparison loop will only need 1/8th of the number of iterations it would have to do for a byte[] containing the same amount of data.
It is a matter of when and how often you need to compare vs. when and how often you need to access the data in a byte-by-byte manner, e.g. to use it in an API call as a parameter in a method that expects a byte[]. In the end, only you can tell if you really know the use case...
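A rough sketch of the long[] idea above (illustrative only: it copies with Buffer.BlockCopy for brevity, which of course costs more than it saves, hence the suggestion to keep the data in long[] from the start; it also assumes the length is a multiple of 8 and ignores the tail):
using System;
static bool LongArraysEqual(byte[] a, byte[] b)
{
    if (a.Length != b.Length || a.Length % 8 != 0) return false; // simplification for the sketch
    var la = new long[a.Length / 8];
    var lb = new long[b.Length / 8];
    Buffer.BlockCopy(a, 0, la, 0, a.Length);
    Buffer.BlockCopy(b, 0, lb, 0, b.Length);
    for (int i = 0; i < la.Length; i++)        // 1/8th of the iterations of a byte-wise loop
        if (la[i] != lb[i]) return false;
    return true;
}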
Sorry, if you're looking for a managed way you're already doing it correctly and to my knowledge there's no built in method in the BCL for doing this.
You should add some initial null checks and then just reuse it as if it were in the BCL.
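A minimal sketch of what that could look like, i.e. just the question's loop with the null checks added:
static bool ByteArrayCompare(byte[] a1, byte[] a2)
{
    if (ReferenceEquals(a1, a2)) return true;   // same instance, or both null
    if (a1 == null || a2 == null) return false; // exactly one is null
    if (a1.Length != a2.Length) return false;
    for (int i = 0; i < a1.Length; i++)
        if (a1[i] != a2[i]) return false;
    return true;
}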
I settled on a solution inspired by the EqualBytesLongUnrolled method posted by ArekBulski, with an additional optimization. In my case, differences tend to be near the tail of the arrays. In testing, I found that when this is the case for large arrays, being able to compare array elements in reverse order gives this solution a huge performance gain over the memcmp-based solution. Here is that solution:
public enum CompareDirection { Forward, Backward }
private static unsafe bool UnsafeEquals(byte[] a, byte[] b, CompareDirection direction = CompareDirection.Forward)
{
// returns when a and b are same array or both null
if (a == b) return true;
// if either is null or different lengths, can't be equal
if (a == null || b == null || a.Length != b.Length)
return false;
const int UNROLLED = 16; // count of longs 'unrolled' in optimization
int size = sizeof(long) * UNROLLED; // 128 bytes (min size for 'unrolled' optimization)
int len = a.Length;
int n = len / size; // count of full 128 byte segments
int r = len % size; // count of remaining 'unoptimized' bytes
// pin the arrays and access them via pointers
fixed (byte* pb_a = a, pb_b = b)
{
if (r > 0 && direction == CompareDirection.Backward)
{
byte* pa = pb_a + len - 1;
byte* pb = pb_b + len - 1;
byte* phead = pb_a + len - r;
while(pa >= phead)
{
if (*pa != *pb) return false;
pa--;
pb--;
}
}
if (n > 0)
{
int nOffset = n * size;
if (direction == CompareDirection.Forward)
{
long* pa = (long*)pb_a;
long* pb = (long*)pb_b;
long* ptail = (long*)(pb_a + nOffset);
while (pa < ptail)
{
if (*(pa + 0) != *(pb + 0) || *(pa + 1) != *(pb + 1) ||
*(pa + 2) != *(pb + 2) || *(pa + 3) != *(pb + 3) ||
*(pa + 4) != *(pb + 4) || *(pa + 5) != *(pb + 5) ||
*(pa + 6) != *(pb + 6) || *(pa + 7) != *(pb + 7) ||
*(pa + 8) != *(pb + 8) || *(pa + 9) != *(pb + 9) ||
*(pa + 10) != *(pb + 10) || *(pa + 11) != *(pb + 11) ||
*(pa + 12) != *(pb + 12) || *(pa + 13) != *(pb + 13) ||
*(pa + 14) != *(pb + 14) || *(pa + 15) != *(pb + 15)
)
{
return false;
}
pa += UNROLLED;
pb += UNROLLED;
}
}
else
{
long* pa = (long*)(pb_a + nOffset);
long* pb = (long*)(pb_b + nOffset);
long* phead = (long*)pb_a;
while (phead < pa)
{
if (*(pa - 1) != *(pb - 1) || *(pa - 2) != *(pb - 2) ||
*(pa - 3) != *(pb - 3) || *(pa - 4) != *(pb - 4) ||
*(pa - 5) != *(pb - 5) || *(pa - 6) != *(pb - 6) ||
*(pa - 7) != *(pb - 7) || *(pa - 8) != *(pb - 8) ||
*(pa - 9) != *(pb - 9) || *(pa - 10) != *(pb - 10) ||
*(pa - 11) != *(pb - 11) || *(pa - 12) != *(pb - 12) ||
*(pa - 13) != *(pb - 13) || *(pa - 14) != *(pb - 14) ||
*(pa - 15) != *(pb - 15) || *(pa - 16) != *(pb - 16)
)
{
return false;
}
pa -= UNROLLED;
pb -= UNROLLED;
}
}
}
if (r > 0 && direction == CompareDirection.Forward)
{
byte* pa = pb_a + len - r;
byte* pb = pb_b + len - r;
byte* ptail = pb_a + len;
while(pa < ptail)
{
if (*pa != *pb) return false;
pa++;
pb++;
}
}
}
return true;
}
This is almost certainly much slower than any other version given here, but it was fun to write.
static bool ByteArrayEquals(byte[] a1, byte[] a2)
{
return a1.Zip(a2, (l, r) => l == r).All(x => x);
}
This is similar to others, but the difference here is that there is no falling through to the next highest number of bytes I can check at once, e.g. if I have 63 bytes (in my SIMD example) I can check the equality of the first 32 bytes, and then the last 32 bytes, which is faster than checking 32 bytes, 16 bytes, 8 bytes, and so on. The first check you enter is the only check you will need to compare all of the bytes.
This does come out on top in my tests, but just by a hair.
The following code is exactly how I tested it in airbreather/ArrayComparePerf.cs.
public unsafe bool SIMDNoFallThrough() // requires System.Runtime.Intrinsics.X86
{
if (a1 == null || a2 == null)
return false;
int length0 = a1.Length;
if (length0 != a2.Length) return false;
fixed (byte* b00 = a1, b01 = a2)
{
byte* b0 = b00, b1 = b01, last0 = b0 + length0, last1 = b1 + length0, last32 = last0 - 31;
if (length0 > 31)
{
while (b0 < last32)
{
if (Avx2.MoveMask(Avx2.CompareEqual(Avx.LoadVector256(b0), Avx.LoadVector256(b1))) != -1)
return false;
b0 += 32;
b1 += 32;
}
return Avx2.MoveMask(Avx2.CompareEqual(Avx.LoadVector256(last0 - 32), Avx.LoadVector256(last1 - 32))) == -1;
}
if (length0 > 15)
{
if (Sse2.MoveMask(Sse2.CompareEqual(Sse2.LoadVector128(b0), Sse2.LoadVector128(b1))) != 65535)
return false;
return Sse2.MoveMask(Sse2.CompareEqual(Sse2.LoadVector128(last0 - 16), Sse2.LoadVector128(last1 - 16))) == 65535;
}
if (length0 > 7)
{
if (*(ulong*)b0 != *(ulong*)b1)
return false;
return *(ulong*)(last0 - 8) == *(ulong*)(last1 - 8);
}
if (length0 > 3)
{
if (*(uint*)b0 != *(uint*)b1)
return false;
return *(uint*)(last0 - 4) == *(uint*)(last1 - 4);
}
if (length0 > 1)
{
if (*(ushort*)b0 != *(ushort*)b1)
return false;
return *(ushort*)(last0 - 2) == *(ushort*)(last1 - 2);
}
return *b0 == *b1;
}
}
If no SIMD is preferred, the same method applied to the existing LongPointers algorithm:
public unsafe bool LongPointersNoFallThrough()
{
if (a1 == null || a2 == null || a1.Length != a2.Length)
return false;
fixed (byte* p1 = a1, p2 = a2)
{
byte* x1 = p1, x2 = p2;
int l = a1.Length;
if ((l & 8) != 0)
{
for (int i = 0; i < l / 8; i++, x1 += 8, x2 += 8)
if (*(long*)x1 != *(long*)x2) return false;
return *(long*)(x1 + (l - 8)) == *(long*)(x2 + (l - 8));
}
if ((l & 4) != 0)
{
if (*(int*)x1 != *(int*)x2) return false; x1 += 4; x2 += 4;
return *(int*)(x1 + (l - 4)) == *(int*)(x2 + (l - 4));
}
if ((l & 2) != 0)
{
if (*(short*)x1 != *(short*)x2) return false; x1 += 2; x2 += 2;
return *(short*)(x1 + (l - 2)) == *(short*)(x2 + (l - 2));
}
return *x1 == *x2;
}
}
If you are looking for a very fast byte array equality comparer, I suggest you take a look at this STSdb Labs article: Byte array equality comparer. It features some of the fastest implementations for byte[] array equality comparison, which are presented, performance-tested and summarized.
You can also focus on these implementations:
BigEndianByteArrayComparer - fast byte[] array comparer from left to right (BigEndian)
BigEndianByteArrayEqualityComparer - fast byte[] equality comparer from left to right (BigEndian)
LittleEndianByteArrayComparer - fast byte[] array comparer from right to left (LittleEndian)
LittleEndianByteArrayEqualityComparer - fast byte[] equality comparer from right to left (LittleEndian)
Use SequenceEqual for this comparison.
The short answer is this:
public bool Compare(byte[] b1, byte[] b2)
{
return Encoding.ASCII.GetString(b1) == Encoding.ASCII.GetString(b2);
}
In such a way you can use the optimized .NET string compare to make a byte array compare without the need to write unsafe code. This is how it is done in the background:
private unsafe static bool EqualsHelper(String strA, String strB)
{
Contract.Requires(strA != null);
Contract.Requires(strB != null);
Contract.Requires(strA.Length == strB.Length);
int length = strA.Length;
fixed (char* ap = &strA.m_firstChar) fixed (char* bp = &strB.m_firstChar)
{
char* a = ap;
char* b = bp;
// Unroll the loop
#if AMD64
// For the AMD64 bit platform we unroll by 12 and
// check three qwords at a time. This is less code
// than the 32 bit case and is shorter
// pathlength.
while (length >= 12)
{
if (*(long*)a != *(long*)b) return false;
if (*(long*)(a+4) != *(long*)(b+4)) return false;
if (*(long*)(a+8) != *(long*)(b+8)) return false;
a += 12; b += 12; length -= 12;
}
#else
while (length >= 10)
{
if (*(int*)a != *(int*)b) return false;
if (*(int*)(a+2) != *(int*)(b+2)) return false;
if (*(int*)(a+4) != *(int*)(b+4)) return false;
if (*(int*)(a+6) != *(int*)(b+6)) return false;
if (*(int*)(a+8) != *(int*)(b+8)) return false;
a += 10; b += 10; length -= 10;
}
#endif
// This depends on the fact that the String objects are
// always zero terminated and that the terminating zero is not included
// in the length. For odd string sizes, the last compare will include
// the zero terminator.
while (length > 0)
{
if (*(int*)a != *(int*)b) break;
a += 2; b += 2; length -= 2;
}
return (length <= 0);
}
}
Since many of the fancy solutions above don't work with UWP, and because I love LINQ and functional approaches, I present my version of this problem.
To escape the comparison when the first difference occurs, I chose .FirstOrDefault()
public static bool CompareByteArrays(byte[] ba0, byte[] ba1) =>
!(ba0.Length != ba1.Length || Enumerable.Range(1,ba0.Length)
.FirstOrDefault(n => ba0[n] != ba1[n]) > 0);

Binary pattern comparison shortcut / fastest implementation in C#

I need to check a given byte or series of bytes for a particular sequence of bits as follows:
Can start with zero or more 0s.
Followed by zero or more 1s.
Must contain at least one 0 at the end.
In other words, if the value of bytes is not 0, then we are only interested in values that contain consecutive 1s followed by at least one 0 at the end.
I wrote the following code to do just that, but wanted to make sure that it is highly optimized. I feel that the multiple checks within the if branches could be optimized, but am not sure how. Please advise.
// The parameter [number] will NEVER be negative.
public static bool ConformsToPattern (System.Numerics.BigInteger number)
{
byte [] bytes = null;
bool moreOnesPossible = true;
if (number == 0) // 00000000
{
return (true); // All bits are zero.
}
else
{
bytes = number.ToByteArray();
if ((bytes [bytes.Length - 1] & 1) == 1)
{
return (false);
}
else
{
for (byte b=0; b < bytes.Length; b++)
{
if (moreOnesPossible)
{
if
(
(bytes [b] == 1) // 00000001
|| (bytes [b] == 3) // 00000011
|| (bytes [b] == 7) // 00000111
|| (bytes [b] == 15) // 00001111
|| (bytes [b] == 31) // 00011111
|| (bytes [b] == 63) // 00111111
|| (bytes [b] == 127) // 01111111
|| (bytes [b] == 255) // 11111111
)
{
// So far so good. Continue to the next byte with
// a possibility of more consecutive 1s.
}
else if
(
(bytes [b] == 128) // 10000000
|| (bytes [b] == 192) // 11000000
|| (bytes [b] == 224) // 11100000
|| (bytes [b] == 240) // 11110000
|| (bytes [b] == 248) // 11111000
|| (bytes [b] == 252) // 11111100
|| (bytes [b] == 254) // 11111110
)
{
moreOnesPossible = false;
}
else
{
return (false);
}
}
else
{
if (bytes [b] > 0)
{
return (false);
}
}
}
}
}
return (true);
}
IMPORTANT: The argument [number] sent to the function will NEVER be negative so no need to check for the sign bit.
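(Editorial illustration, not from the original thread: the whole check can also be expressed with a couple of BigInteger bit tricks. A non-negative number conforms exactly when it is zero, or when it is even and setting all of its trailing zero bits to 1 yields a value one less than a power of two.)
public static bool ConformsToPatternBitTrick (System.Numerics.BigInteger number)
{
    if (number.IsZero) return true;            // all bits are zero
    if (!number.IsEven) return false;          // must end with at least one 0
    var filled = number | (number - 1);        // set the trailing zeros to 1 -> 0...011...1
    return (filled & (filled + 1)).IsZero;     // true iff filled + 1 is a power of two
}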
I'm going to say that none of these answers are accounting for
00000010
00000110
00001110
00011110
00111110
01111110
00000100
00001100
00011100
00111100
01111100
etc, etc, etc.
Here's my byte array method:
public static bool ConformsToPattern(System.Numerics.BigInteger number)
{
bool foundStart = false, foundEnd = false;
int startPosition, stopPosition, increment;
if (number.IsZero || number.IsPowerOfTwo)
return true;
if (!number.IsEven)
return false;
byte[] bytes = number.ToByteArray();
if(BitConverter.IsLittleEndian)
{
startPosition = 0;
stopPosition = bytes.Length;
increment = 1;
}
else
{
startPosition = bytes.Length - 1;
stopPosition = -1;
increment = -1;
}
for(int i = startPosition; i != stopPosition; i += increment)
{
byte n = bytes[i];
for(int shiftCount = 0; shiftCount < 8; shiftCount++)
{
if (!foundEnd)
{
if ((n & 1) == 1)
foundEnd = true;
n = (byte)(n >> 1);
continue;
}
if (!foundStart)
{
if ((n & 1) == 0)
foundStart = true;
n = (byte)(n >> 1);
continue;
}
if (n == 0)
continue;
return false;
}
}
if (foundEnd)
return true;
return false;
}
Here's my BigInteger method:
public static bool ConformsToPattern(System.Numerics.BigInteger number)
{
bool foundStart = false;
bool foundEnd = false;
if (number.IsZero || number.IsPowerOfTwo)
return true;
if (!number.IsEven)
return false;
while (!number.IsZero)
{
if (!foundEnd)
{
if (!number.IsEven)
foundEnd = true;
number = number >> 1;
continue;
}
if (!foundStart)
{
if (number.IsEven)
foundStart = true;
number = number >> 1;
continue;
}
return false;
}
if (foundEnd)
return true;
return false;
}
Choose whichever works better for you. The byte array version is faster as of now. The BigInteger code is the 100% accurate reference.
If you're not worried about native endianness, remove that part of the code, but leaving it in there will ensure portability to systems other than just x86. BigInteger already gives me IsZero, IsEven and IsPowerOfTwo, so that's not an extra calculation. I'm not sure if that's the fastest way to bitshift right, since there is a byte-to-int cast, but right now I couldn't find another way. As for the use of byte vs short vs int vs long for loop variables, that's up to you to change if you feel it'll work better. I'm not sure what kind of BigIntegers you'll be sending, so I think int would be safe. You can modify the code to remove the for loop and just copy-paste the body 8 times, and it might be faster. Or you can throw that into a static method.
How about something like this? If you find a one, the only things after that can be 1s until a 0 is found. After that, only 0s. This looks like it'll do the trick a little faster because it doesn't evaluate unnecessary OR conditions.
// The parameter [number] will NEVER be negative.
public static bool ConformsToPattern (System.Numerics.BigInteger number)
{
byte [] bytes = null;
bool moreOnesPossible = true;
bool foundFirstOne = false;
if (number == 0) // 00000000
{
return (true); // All bits are zero.
}
else
{
bytes = number.ToByteArray();
if ((bytes [bytes.Length - 1] & 1) == 1)
{
return (false);
}
else
{
for (byte b=0; b < bytes.Length; b++)
{
if (moreOnesPossible)
{
if(!foundFirstOne)
{
if
(
(bytes [b] == 1) // 00000001
|| (bytes [b] == 3) // 00000011
|| (bytes [b] == 7) // 00000111
|| (bytes [b] == 15) // 00001111
|| (bytes [b] == 31) // 00011111
|| (bytes [b] == 63) // 00111111
|| (bytes [b] == 127) // 01111111
|| (bytes [b] == 255) // 11111111
)
{
foundFirstOne = true;
// So far so good. Continue to the next byte with
// a possibility of more consecutive 1s.
}
else if
(
(bytes [b] == 128) // 10000000
|| (bytes [b] == 192) // 11000000
|| (bytes [b] == 224) // 11100000
|| (bytes [b] == 240) // 11110000
|| (bytes [b] == 248) // 11111000
|| (bytes [b] == 252) // 11111100
|| (bytes [b] == 254) // 11111110
)
{
moreOnesPossible = false;
}
else
{
return (false);
}
}
else
{
if(bytes [b] != 255) // 11111111
{
if
(
(bytes [b] == 128) // 10000000
|| (bytes [b] == 192) // 11000000
|| (bytes [b] == 224) // 11100000
|| (bytes [b] == 240) // 11110000
|| (bytes [b] == 248) // 11111000
|| (bytes [b] == 252) // 11111100
|| (bytes [b] == 254) // 11111110
)
{
moreOnesPossible = false;
}
}
}
}
else
{
if (bytes [b] > 0)
{
return (false);
}
}
}
}
}
return (true);
}
Here is the method I wrote myself. Not very elegant but pretty fast.
/// <summary>
/// Checks to see if this cell lies on a major diagonal of a power of 2.
/// ^[0]*[1]*[0]+$ denotes the regular expression of the binary pattern we are looking for.
/// </summary>
public bool IsDiagonalMajorToPowerOfTwo ()
{
byte [] bytes = null;
bool moreOnesPossible = true;
System.Numerics.BigInteger number = 0;
number = System.Numerics.BigInteger.Abs(this.X - this.Y);
if ((number == 0) || (number == 1)) // 00000000
{
return (true); // All bits are zero.
}
else
{
// The last bit should always be 0.
if (number.IsEven)
{
bytes = number.ToByteArray();
for (byte b=0; b < bytes.Length; b++)
{
if (moreOnesPossible)
{
switch (bytes [b])
{
case 001: // 00000001
case 003: // 00000011
case 007: // 00000111
case 015: // 00001111
case 031: // 00011111
case 063: // 00111111
case 127: // 01111111
case 255: // 11111111
{
// So far so good.
// Carry on testing subsequent bytes.
break;
}
case 128: // 10000000
case 064: // 01000000
case 032: // 00100000
case 016: // 00010000
case 008: // 00001000
case 004: // 00000100
case 002: // 00000010
case 192: // 11000000
case 096: // 01100000
case 048: // 00110000
case 024: // 00011000
case 012: // 00001100
case 006: // 00000110
case 224: // 11100000
case 112: // 01110000
case 056: // 00111000
case 028: // 00011100
case 014: // 00001110
case 240: // 11110000
case 120: // 01111000
case 060: // 00111100
case 030: // 00011110
case 248: // 11111000
case 124: // 01111100
case 062: // 00111110
case 252: // 11111100
case 126: // 01111110
case 254: // 11111110
{
moreOnesPossible = false;
break;
}
default:
{
return (false);
}
}
}
else
{
if (bytes [b] > 0)
{
return (false);
}
}
}
}
else
{
return (false);
}
}
return (true);
}
If I understand you correctly, you must have only one consecutive run of 1s followed by consecutive zeros.
Since it has to end in a zero, it has to be even.
All the bytes in the middle must be all 1's and the first and last byte are your only special cases.
if (number.IsZero)
return true;
if (!number.IsEven)
return false;
var bytes = number.ToByteArray();
for (int i = 0; i < bytes.Length; i++)
{
if (i == 0) //first byte case
{
if (!(
(bytes[i] == 1) // 00000001
|| (bytes[i] == 3) // 00000011
|| (bytes[i] == 7) // 00000111
|| (bytes[i] == 15) // 00001111
|| (bytes[i] == 31) // 00011111
|| (bytes[i] == 63) // 00111111
|| (bytes[i] == 127) // 01111111
|| (bytes[i] == 255) // 11111111
))
{
return false;
}
}
else if (i == bytes.Length - 1) //last byte case
{
if (!(
(bytes[i] == 128) // 10000000
|| (bytes[i] == 192) // 11000000
|| (bytes[i] == 224) // 11100000
|| (bytes[i] == 240) // 11110000
|| (bytes[i] == 248) // 11111000
|| (bytes[i] == 252) // 11111100
|| (bytes[i] == 254) // 11111110
))
{
return false;
}
}
else //all bytes in the middle
{
if (bytes[i] != 255)
return false;
}
}
return true;
I'm a big fan of regular expressions, so I thought about simply converting each byte to a string and testing it against a regex. However, it's important to carefully define the pattern. By reading your question, I've come up with this one:
^(?:1*)(?:0+)$
Please, check it out:
public static bool ConformsToPattern(System.Numerics.BigInteger number)
{
byte[] ByteArray = number.ToByteArray();
Regex BinaryRegex = new Regex("^(?:1*)(?:0+)$", RegexOptions.Compiled);
return ByteArray.Where<byte>(x => !BinaryRegex.IsMatch(Convert.ToString(x, 2))).Count() == 0;
}
Not sure if this will be faster or slower than what you already have, but it's something to try (hope I didn't botch the logic)...
public bool ConformsToPattern(System.Numerics.BigInteger number) {
bool moreOnesPossible = true;
if (number == 0) {
return true;
}
else {
byte[] bytes = number.ToByteArray();
if ((bytes[bytes.Length - 1] & 1) == 1) {
return false;
}
else {
for (byte b = 0; b < bytes.Length; b++) {
if (moreOnesPossible) {
switch (bytes[b]) {
case 1:
case 3:
case 7:
case 15:
case 31:
case 63:
case 127:
case 255:
continue;
default:
switch (bytes[b]) {
case 128:
case 192:
case 224:
case 240:
case 248:
case 252:
case 254:
moreOnesPossible = false;
continue;
default:
return false;
}
}
}
else {
if (bytes[b] > 0) { return (false); }
}
}
}
}
return true;
}

GetType() and Typeof() in C#

itemVal = "0";
res = int.TryParse(itemVal, out num);
if ((res == true) && (num.GetType() == typeof(byte)))
return true;
else
return false; // goes here when I'm debugging.
Why does num.GetType() == typeof(byte) not return true?
Because num is an int, not a byte.
GetType() gets the System.Type of the object at runtime. In this case, it's the same as typeof(int), since num is an int.
typeof() gets the System.Type object of a type at compile-time.
Your comment indicates you're trying to determine if the number fits into a byte or not; the contents of the variable do not affect its type (actually, it's the type of the variable that restricts what its contents can be).
You can check if the number would fit into a byte this way:
if ((num >= 0) && (num < 256)) {
// ...
}
Or this way, using a cast:
if (unchecked((byte)num) == num) {
// ...
}
It seems your entire code sample could be replaced by the following, however:
byte num;
return byte.TryParse(itemVal, out num);
Simply because you are comparing a byte with an int
If you want to know the number of bytes, try this simple snippet:
int i = 123456;
Int64 j = 123456;
byte[] bytesi = BitConverter.GetBytes(i);
byte[] bytesj = BitConverter.GetBytes(j);
Console.WriteLine(bytesi.Length);
Console.WriteLine(bytesj.Length);
Output:
4
8
Because an int and a byte are different data types.
An int (as it is commonly known) is 4 bytes (32 bits); an Int64 or an Int16 is 64 or 16 bits respectively.
A byte is only 8 bits.
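A quick way to see those sizes directly (sizeof works on these primitive types in safe code):
Console.WriteLine(sizeof(byte));  // 1
Console.WriteLine(sizeof(short)); // 2 (Int16)
Console.WriteLine(sizeof(int));   // 4 (Int32)
Console.WriteLine(sizeof(long));  // 8 (Int64)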
If num is an int, it will never return true.
If you want to check whether this int value would fit in a byte, you might test the following:
int num = 0;
byte b = 0;
if (int.TryParse(itemVal, out num) && byte.TryParse(itemVal, out b))
{
return true; //Could be converted to Int32 and also to Byte
}
