I use the following code to read BigEndian information using BinaryReader, but I'm not sure whether it is an efficient way of doing it. Is there any better solution?
Here is my code:
// some code to initialize the stream value
// set the length value to the Int32 size
BinaryReader reader = new BinaryReader(stream);
byte[] bytes = reader.ReadBytes(length);
Array.Reverse(bytes);
int result = System.BitConverter.ToInt32(bytes, 0);
BitConverter.ToInt32 isn't very fast in the first place. I'd simply use
public static int ToInt32BigEndian(byte[] buf, int i)
{
    return (buf[i] << 24) | (buf[i + 1] << 16) | (buf[i + 2] << 8) | buf[i + 3];
}
You could also consider reading more than 4 bytes at a time.
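For example, a rough sketch of that idea (count is an assumed variable for how many big-endian ints you expect, and ToInt32BigEndian is the helper above):
// Read one larger block, then decode each 4-byte big-endian value from it.
byte[] block = reader.ReadBytes(4 * count);
int[] values = new int[count];
for (int i = 0; i < count; i++)
{
    values[i] = ToInt32BigEndian(block, i * 4);
}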
As of 2019 (in fact, since .NET Core 2.1), there is now BinaryPrimitives in the System.Buffers.Binary namespace:
using System.Buffers.Binary;

byte[] buffer = ...;
int result = BinaryPrimitives.ReadInt32BigEndian(buffer.AsSpan());
You could use IPAddress.NetworkToHostOrder, but I have no idea if it's actually more efficient. You'd have to profile it.
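If you go that route, a minimal sketch (using the bytes array from the question) could look like this; network order is big-endian, so NetworkToHostOrder just swaps the byte order on little-endian machines:
using System.Net;

// Read the int as-is, then convert from network (big-endian) to host byte order.
int raw = BitConverter.ToInt32(bytes, 0);
int result = IPAddress.NetworkToHostOrder(raw);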
I have the following situation in C#:
ZipFile zip1 = ZipFile.Read("f1.zip");
ZipFile zip2 = ZipFile.Read("f2.zip");
MemoryStream ms1 = new MemoryStream();
MemoryStream ms2 = new MemoryStream();
ZipEntry zipentry1 = zip1["f1.dll"];
ZipEntry zipentry2 = zip2["f2.dll"];
zipentry1.Extract(ms1);
zipentry2.Extract(ms2);
byte[] b1 = new byte[ms1.Length];
byte[] b2 = new byte[ms2.Length];
ms1.Seek(0, SeekOrigin.Begin);
ms2.Seek(0, SeekOrigin.Begin);
What I have done here is open two zip files, f1.zip and f2.zip. Then I extract one file from each of them (f1.dll and f2.dll from f1.zip and f2.zip respectively) into the MemoryStream objects. I now want to compare the files and find out whether they are the same. I had two ways in mind:
1) Read the memory streams byte by byte and compare them.
For this I would use
ms1.BeginRead(b1, 0, (int) ms1.Length, null, null);
ms2.BeginRead(b2, 0, (int) ms2.Length, null, null);
and then run a for loop and compare each byte in b1 and b2.
2) Get the string values for both the memory streams and then do a string compare. For this I would use
string str1 = Encoding.UTF8.GetString(ms1.GetBuffer(), 0, (int)ms1.Length);
string str2 = Encoding.UTF8.GetString(ms2.GetBuffer(), 0, (int)ms2.Length);
and then do a simple string compare.
Now, I know comparing byte by byte will always give me a correct result. The problem is that it will take a lot of time, as I have to do this for thousands of files. That is why I am considering the string-compare method, which looks like it could tell me very quickly whether the files are equal. But I am not sure the string comparison will give me the correct result, since the files are DLLs, media files, etc., and will certainly contain special characters.
Can anyone tell me whether the string-compare method will work correctly?
Thanks in advance.
P.S.: I am using the DotNetZip library.
The baseline for this question is the native way to compare arrays: Enumerable.SequenceEqual. You should use that unless you have good reason to do otherwise.
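For the buffers in the question, that is a one-liner (assuming using System.Linq and the b1/b2 arrays from the posted code):
bool same = b1.SequenceEqual(b2);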
If you care about speed, you could attempt to p/invoke to memcmp in msvcrt.dll and compare the byte arrays that way. I find it hard to imagine that could be beaten. Obviously you'd do a comparison of the lengths first and only call memcmp if the two byte arrays had the same length.
The p/invoke looks like this:
[DllImport("msvcrt.dll", CallingConvention=CallingConvention.Cdecl)]
static extern int memcmp(byte[] lhs, byte[] rhs, UIntPtr count);
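A small sketch of the length-check-then-memcmp wrapper described above (the helper name is made up here):
static bool ByteArraysEqual(byte[] a, byte[] b)
{
    // Only call memcmp when the lengths already match.
    if (a.Length != b.Length)
        return false;
    return memcmp(a, b, new UIntPtr((ulong)a.Length)) == 0;
}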
But you should only contemplate this if you really do care about speed, and the pure managed alternatives are too slow for you. So, do some timings to make sure you are not optimising prematurely. Well, even to make sure that you are optimising at all.
I'd be very surprised if converting to string was fast. I'd expect it to be slow. And in fact I'd expect your code to fail because there's no reason for your byte arrays to be valid UTF-8. Just forget you ever had that idea!
Compare ZipEntry.Crc and ZipEntry.UncompressedSize of the two files; only if they match, uncompress and do the byte comparison. If the two files are the same, their CRC and size will be the same too. This strategy will save you a ton of CPU cycles.
ZipEntry zipentry1 = zip1["f1.dll"];
ZipEntry zipentry2 = zip2["f2.dll"];
if (zipentry1.Crc == zipentry2.Crc && zipentry1.UncompressedSize == zipentry2.UncompressedSize)
{
    // uncompress
    zipentry1.Extract(ms1);
    zipentry2.Extract(ms2);
    byte[] b1 = new byte[ms1.Length];
    byte[] b2 = new byte[ms2.Length];
    ms1.Seek(0, SeekOrigin.Begin);
    ms2.Seek(0, SeekOrigin.Begin);
    // synchronous Read fills the buffers before the comparison runs
    ms1.Read(b1, 0, (int)ms1.Length);
    ms2.Read(b2, 0, (int)ms2.Length);
    // perform a byte comparison
    if (Enumerable.SequenceEqual(b1, b2)) // or a simple for loop
    {
        // files are the same
    }
    else
    {
        // files are different
    }
}
else
{
    // files are different
}
For certain reasons, I have to create a 1024 kb .txt file.
Below is my current code:
int size = 1024000; // 1024 kb..
byte[] bytearray = new byte[size];
foreach (byte bit in bytearray)
{
bit = 0;
}
string tobewritten = string.Empty;
foreach (byte bit in bytearray)
{
tobewritten += bit.ToString();
}
//newPath is local directory, where I store the created file
using (System.IO.StreamWriter sw = File.CreateText(newPath))
{
sw.WriteLine(tobewritten);
}
I have to wait at least 30 minutes to execute this piece of code, which I consider too long.
Now, I would like to ask for advice on how to actually achieve my mentioned objective effectively. Are there any alternatives to do this task? Am I writing bad code? Any help is appreciated.
There are several misunderstandings in the code you provided:
byte[] bytearray = new byte[size];
foreach (byte bit in bytearray)
{
bit = 0;
}
You seem to think that you are initializing each byte in your array bytearray with zero. Instead you just set the loop variable bit (unfortunate naming) to zero size times. Actually this code wouldn't even compile, since you cannot assign to the foreach iteration variable.
Also, you don't need initialization here in the first place: byte array elements are automatically initialized to 0.
string tobewritten = string.Empty;
foreach (byte bit in bytearray)
{
tobewritten += bit.ToString();
}
You want to append the string representation of each byte in your array to the string variable tobewritten. Since strings are immutable, each concatenation creates a new string that has to be garbage collected, along with the string you created for bit; this is relatively expensive, especially when you create 2,048,000 of them. Use a StringBuilder instead.
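A rough sketch of that StringBuilder version, using the bytearray and size variables from the question (requires using System.Text):
// Pre-size the builder so it does not have to grow repeatedly.
StringBuilder sb = new StringBuilder(size);
foreach (byte b in bytearray)
{
    sb.Append(b); // appends the decimal representation, i.e. "0" for each zero byte
}
string tobewritten = sb.ToString();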
Lastly, none of that is needed anyway: it seems you just want to write a bunch of "0" characters to a text file. If you are not worried about creating a single large string of zeros (whether that makes sense depends on the value of size), you can just create that string directly and write it in one go, or alternatively write a smaller string to the stream a number of times.
using (var file = File.CreateText(newPath))
{
    file.WriteLine(new string('0', size));
}
Replace the string with a pre-sized StringBuilder to avoid unnecessary allocations.
Or, better yet, write each piece directly to the StreamWriter instead of pointlessly building a large in-memory string first.
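A minimal sketch of that chunked approach, assuming the newPath and size variables from the question (the 4096-character chunk size is an arbitrary choice):
using (StreamWriter file = File.CreateText(newPath))
{
    char[] chunk = new char[4096];
    for (int i = 0; i < chunk.Length; i++) chunk[i] = '0';

    int remaining = size;
    while (remaining > 0)
    {
        int n = Math.Min(chunk.Length, remaining);
        file.Write(chunk, 0, n); // TextWriter.Write(char[], int, int)
        remaining -= n;
    }
    file.WriteLine();
}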
I have a byte[] array of one size, and I would like to truncate it into a smaller array.
I just want to chop the end off.
Arrays are fixed-size in C# (.NET).
You'll have to copy the contents to a new one.
byte[] sourceArray = ...
byte[] truncArray = new byte[10];
Array.Copy(sourceArray, truncArray, truncArray.Length);
You could use Array.Resize, but all this really does is make a truncated copy of the original array and then replaces the original array with the new one.
private static void Truncate() {
    byte[] longArray = new byte[] {1,2,3,4,5,6,7,8,9,10};
    Array.Resize(ref longArray, 5); // longArray = {1,2,3,4,5}

    // if you like LINQ
    byte[] shortArray = longArray.Take(5).ToArray();
}
I usually create an extension method:
public static byte[] SubByteArray(this byte[] byteArray, int len)
{
    byte[] tmp = new byte[len];
    Array.Copy(byteArray, tmp, len);
    return tmp;
}
Which can be called on the byte array easily like this:
buffer.SubByteArray(len)
You can't truncate an array in C#. They are fixed in length.
If you want a data structure that you can truncate and acts like an array, you should use List<T>. You can use the List<T>.RemoveRange method to achieve this.
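A minimal sketch of that approach (building the list from the sourceArray used earlier and keeping only the first 10 items):
List<byte> data = new List<byte>(sourceArray);
data.RemoveRange(10, data.Count - 10); // everything after index 9 is removed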
By the way, the Array.Resize method takes much more time to complete. In my simple case, I just needed to resize an array of bytes (~8000 items down to ~20 items):
Array.Resize // 1728 ticks
Array.Copy // 8 ticks
You can now use range syntax (the C# 8 range operator) in C#:
var truncArray = sourceArray[..10];
I have a question about the safety of a cast from long to int. I fear that the method I wrote might fail at this cast. Can you please take a look at the code below and tell me if it is possible to write something that would avoid a possible fail?
Thank you in advance.
public static string ReadDecrypted(string fileFullPath)
{
    string result = string.Empty;
    using (FileStream fs = new FileStream(fileFullPath, FileMode.Open, FileAccess.Read))
    {
        int fsLength = (int)fs.Length;
        byte[] decrypted;
        byte[] read = new byte[fsLength];
        if (fs.CanRead)
        {
            fs.Read(read, 0, fsLength);
            decrypted = ProtectedData.Unprotect(read, CreateEntropy(), DataProtectionScope.CurrentUser);
            result = Utils.AppDefaultEncoding.GetString(decrypted, 0, decrypted.Length);
        }
    }
    return result;
}
The short answer is: yes, this way you will have problems with any file whose length is >= 2 GB!
If you don't expect any files that big, then you can insert this directly at the start of the using block:
if (((int)fs.Length) != fs.Length) throw new Exception("too big");
Otherwise you should NOT cast to int, but change byte[] read = new byte[fsLength];
to byte[] read = new byte[fs.Length]; and use a loop to read the file content in "chunks" of at most 2 GB per chunk.
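A rough sketch of such a chunked read loop (what you do with each chunk depends on how your decryption can consume the data; the 80 KB buffer size is an arbitrary choice):
byte[] chunk = new byte[81920];
int bytesRead;
while ((bytesRead = fs.Read(chunk, 0, chunk.Length)) > 0)
{
    // process chunk[0 .. bytesRead) here, e.g. append it to another stream
}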
Another alternative (available in .NET4) is to use MemoryMappedFile (see http://msdn.microsoft.com/en-us/library/dd997372.aspx) - this way you don't need to call Read at all :-)
Well, int is 32-bit and long is 64-bit, so there's always the possibility of losing some data with the cast if you're opening up 2GB files; on the other hand, that allocation of a byte array of fsLength would seem to indicate you're not expecting files that big. Put a check in to make sure that fs.Length isn't greater than 2,147,483,647, and you should be fine.
I have a byte array that represents a complete TCP/IP packet. For clarification, the byte array is ordered like this:
(IP Header - 20 bytes)(TCP Header - 20 bytes)(Payload - X bytes)
I have a Parse function that accepts a byte array and returns a TCPHeader object. It looks like this:
TCPHeader Parse( byte[] buffer );
Given the original byte array, here is the way I'm calling this function right now.
byte[] tcpbuffer = new byte[ 20 ];
System.Buffer.BlockCopy( packet, 20, tcpbuffer, 0, 20 );
TCPHeader tcp = Parse( tcpbuffer );
Is there a convenient way to pass the TCP byte array, i.e., bytes 20-39 of the complete TCP/IP packet, to the Parse function without extracting it to a new byte array first?
In C++, I could do the following:
TCPHeader tcp = Parse( &packet[ 20 ] );
Is there anything similar in C#? I want to avoid the creation and subsequent garbage collection of the temporary byte array if possible.
A common practice you can see in the .NET framework, and that I recommend using here, is specifying the offset and length. So make your Parse function also accept the offset in the passed array, and the number of elements to use.
Of course, the same rules apply as if you were to pass a pointer like in C++ - the array shouldn't be modified or else it may result in undefined behavior if you are not sure when exactly the data will be used. But this is no problem if you are no longer going to be modifying the array.
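A small sketch of that convention (the helper name and the field being read are illustrative, not from the question's TCPHeader type):
// Reads from buffer[offset .. offset + length) and never copies the packet.
static ushort ReadSourcePort(byte[] buffer, int offset, int length)
{
    if (length < 2) throw new ArgumentException("header too short");
    // The TCP source port is the first two bytes of the header, big-endian.
    return (ushort)((buffer[offset] << 8) | buffer[offset + 1]);
}

// Call site: the TCP header starts 20 bytes into the packet.
ushort sourcePort = ReadSourcePort(packet, 20, 20);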
I would pass an ArraySegment<byte> in this case.
You would change your Parse method to this:
// Changed TCPHeader to TcpHeader to adhere to public naming conventions.
TcpHeader Parse(ArraySegment<byte> buffer)
And then you would change the call to this:
// Create the array segment.
ArraySegment<byte> seg = new ArraySegment<byte>(packet, 20, 20);
// Call parse.
TcpHeader header = Parse(seg);
Using the ArraySegment<T> will not copy the array, and it will do the bounds checking for you in the constructor (so that you don't specify incorrect bounds). Then you change your Parse method to work with the bounds specified in the segment, and you should be ok.
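For completeness, here is a rough sketch (illustrative only, not from the answer) of what working with the segment's bounds inside Parse could look like:
TcpHeader Parse(ArraySegment<byte> buffer)
{
    byte[] array = buffer.Array;  // the original packet; nothing was copied
    int start = buffer.Offset;    // where this segment begins
    int count = buffer.Count;     // how many bytes belong to the segment

    if (count < 20)
        throw new ArgumentException("A TCP header is at least 20 bytes.");

    // All field reads are relative to start, e.g. the big-endian destination
    // port in bytes 2-3 of the header (the TcpHeader members are illustrative):
    var header = new TcpHeader();
    header.DestinationPort = (ushort)((array[start + 2] << 8) | array[start + 3]);
    return header;
}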
You can even create a convenience overload that will accept the full byte array:
// Accepts full array.
TcpHeader Parse(byte[] buffer)
{
// Call the overload.
return Parse(new ArraySegment<byte>(buffer));
}
// Changed TCPHeader to TcpHeader to adhere to public naming conventions.
TcpHeader Parse(ArraySegment<byte> buffer)
If an IEnumerable<byte> is acceptable as an input rather than byte[], and you're using C# 3.0, then you could write:
tcpbuffer.Skip(20).Take(20);
Note that this still allocates enumerator instances under the covers, so you don't escape allocation altogether, and so for a small number of bytes it may actually be slower than allocating a new array and copying the bytes into it.
I wouldn't worry too much about allocation and GC of small temporary arrays to be honest though. The .NET garbage collected environment is extremely efficient at this type of allocation pattern, particularly if the arrays are short lived, so unless you've profiled it and found GC to be a problem then I'd write it in the most intuitive way and fix up performance issues when you know you have them.
If you really need this kind of control, you have to look at the unsafe feature of C#. It allows you to take a pointer and pin the array so that the GC doesn't move it:
fixed (byte* b = &packet[20])
{
    // b now points directly at the start of the TCP header
}
However, this practice is not recommended for managed-only code unless there is a real performance problem. Otherwise you can pass an offset and length, as the Stream class does.
If you can change the parse() method, change it to accept the offset where the processing should begin.
TCPHeader Parse( byte[] buffer , int offset);
You could use LINQ to do something like:
tcpbuffer.Skip(20).Take(20);
But System.Buffer.BlockCopy / System.Array.Copy are probably more efficient.
This is how I solved it, coming from being a C programmer to a C# programmer. I like to use MemoryStream to convert it to a stream and then BinaryReader to break apart the binary block of data. I had to add two helper functions to convert from network order to little endian. Also, for building a byte[] to send, see "Is there a way cast an object back to it original type without specifing every case?", which has a function that allows converting from an array of objects to a byte[].
Hashtable parse(byte[] buf, int offset)
{
    Hashtable tcpheader = new Hashtable();
    if (buf.Length < (20 + offset)) return tcpheader;
    System.IO.MemoryStream stm = new System.IO.MemoryStream(buf, offset, buf.Length - offset);
    System.IO.BinaryReader rdr = new System.IO.BinaryReader(stm);
    tcpheader["SourcePort"] = ReadUInt16BigEndian(rdr);
    tcpheader["DestPort"] = ReadUInt16BigEndian(rdr);
    tcpheader["SeqNum"] = ReadUInt32BigEndian(rdr);
    tcpheader["AckNum"] = ReadUInt32BigEndian(rdr);
    tcpheader["Offset"] = rdr.ReadByte() >> 4;
    tcpheader["Flags"] = rdr.ReadByte() & 0x3f;
    tcpheader["Window"] = ReadUInt16BigEndian(rdr);
    tcpheader["Checksum"] = ReadUInt16BigEndian(rdr);
    tcpheader["UrgentPointer"] = ReadUInt16BigEndian(rdr);
    // ignoring tcp options in header might be dangerous
    return tcpheader;
}

UInt16 ReadUInt16BigEndian(BinaryReader rdr)
{
    UInt16 res = (UInt16)(rdr.ReadByte());
    res <<= 8;
    res |= rdr.ReadByte();
    return (res);
}

UInt32 ReadUInt32BigEndian(BinaryReader rdr)
{
    UInt32 res = (UInt32)(rdr.ReadByte());
    res <<= 8;
    res |= rdr.ReadByte();
    res <<= 8;
    res |= rdr.ReadByte();
    res <<= 8;
    res |= rdr.ReadByte();
    return (res);
}
I don't think you can do something like that in C#. You could either make the Parse() function use an offset, or create 3 byte arrays to begin with; one for the IP Header, one for the TCP Header and one for the Payload.
There is no way using verifiable code to do this. If your Parse method can deal with having an IEnumerable<byte> then you can use a LINQ expression
TCPHeader tcp = Parse(packet.Skip(20));
Some people who answered
tcpbuffer.Skip(20).Take(20);
did it wrong. This is an excellent solution, but the code should look like:
packet.Skip(20).Take(20);
You should use the Skip and Take methods on your main packet array; tcpbuffer should not exist in the code you posted. You then don't have to use System.Buffer.BlockCopy either.
JaredPar was almost correct, but he forgot the Take method
TCPHeader tcp = Parse(packet.Skip(20));
But he did not make the tcpbuffer mistake.
The last line of your posted code should look like:
TCPHeader tcp = Parse(packet.Skip(20).Take(20));
But perhaps you want to use System.Buffer.BlockCopy anyway instead of Skip and Take, because it may perform better (as Steven Robbins answered: "But System.Buffer.BlockCopy / System.Array.Copy are probably more efficient"), because your Parse function cannot deal with IEnumerable<byte>, or because you are simply more used to System.Buffer.BlockCopy. In that case I would recommend making tcpbuffer a field rather than a local variable (private, protected, internal or public; static or not), in other words defining and creating it outside the method where your posted code runs. That way tcpbuffer is created only once, and its bytes are simply overwritten each time execution reaches the System.Buffer.BlockCopy line.
This way your code can look like:
class Program
{
    // Your defined fields, properties, methods, constructors, delegates, events and etc.
    private byte[] tcpbuffer = new byte[20];

    Your unposted method title(arguments/parameters...)
    {
        // Your unposted code before your posted code
        // byte[] tcpbuffer = new byte[ 20 ]; No need anymore! This line can be removed.
        System.Buffer.BlockCopy( packet, 20, this.tcpbuffer, 0, 20 );
        TCPHeader tcp = Parse( this.tcpbuffer );
        // Your unposted code after your posted code
    }

    // Your defined fields, properties, methods, constructors, delegates, events and etc.
}
or simply only the necessary part:
private byte[] tcpbuffer = new byte[20];
...
{
...
//byte[] tcpbuffer = new byte[ 20 ]; No need anymore! This line can be removed.
System.Buffer.BlockCopy( packet, 20, this.tcpbuffer, 0, 20 );
TCPHeader tcp = Parse( this.tcpbuffer );
...
}
If you did:
private byte[] tcpbuffer;
instead, then you must add this line in your constructor(s):
this.tcpbuffer = new byte[20];
or
tcpbuffer = new byte[20];
Note that you don't have to type this. before tcpbuffer; it is optional. But if you defined the field as static, then you cannot use this. at all: either qualify it with the class name and a dot, or just use the field name on its own.
Why not flip the problem and create classes that overlay the buffer to pull bits out?
// member variables
IPHeader ipHeader = new IPHeader();
TCPHeader tcpHeader = new TCPHeader();
// passing in the buffer, an offset and a length allows you
// to move the header over the buffer
ipHeader.SetBuffer( buffer, 0, 20 );
if( ipHeader.Protocol == TCP )
{
tcpHeader.SetBuffer( buffer, ipHeader.ProtocolOffset, 20 );
}
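To make that concrete, here is a rough sketch of what such an overlay class could look like (the class shape and the single property are invented for illustration, not taken from the answer):
class TCPHeader
{
    private byte[] buffer;
    private int offset;

    // Point the header at a region of an existing buffer; nothing is copied.
    public void SetBuffer(byte[] buffer, int offset, int length)
    {
        this.buffer = buffer;
        this.offset = offset;
    }

    // The source port is the first two bytes of the TCP header, big-endian.
    public ushort SourcePort
    {
        get { return (ushort)((buffer[offset] << 8) | buffer[offset + 1]); }
    }
}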