How to write bits to a file? - c#

How to write bits (not bytes) to a file with c#, .net? I'm preety stuck with it.
Edit: i'm looking for a different way that just writing every 8 bits as a byte

The smallest amount of data you can write at one time is a byte.
If you need to write individual bit-values. (Like for instance a binary format that requires a 1 bit flag, a 3 bit integer and a 4 bit integer); you would need to buffer the individual values in memory and write to the file when you have a whole byte to write. (For performance, it makes sense to buffer more and write larger chunks to the file).

Accumulate the bits in a buffer (a single byte can qualify as a "buffer")
When adding a bit, left-shift the buffer and put the new bit in the lowest position using OR
Once the buffer is full, append it to the file

I've made something like this to emulate a BitsWriter.
private BitArray bitBuffer = new BitArray(new byte[65536]);
private int bitCount = 0;
// Write one int. In my code, this is a byte
public void write(int b)
{
BitArray bA = new BitArray((byte)b);
int[] pattern = new int[8];
writeBitArray(bA);
}
// Write one bit. In my code, this is a binary value, and the amount of times
public void write(int b, int len)
{
int[] pattern = new int[len];
BitArray bA = new BitArray(len);
for (int i = 0; i < len; i++)
{
bA.Set(i, (b == 1));
}
writeBitArray(bA);
}
private void writeBitArray(BitArray bA)
{
for (int i = 0; i < bA.Length; i++)
{
bitBuffer.Set(bitCount + i, bA[i]);
bitCount++;
}
if (bitCount % 8 == 0)
{
BitArray bitBufferWithLength = new BitArray(new byte[bitCount / 8]);
byte[] res = new byte[bitBuffer.Count / 8];
for (int i = 0; i < bitCount; i++)
{
bitBufferWithLength.Set(i, (bitBuffer[i]));
}
bitBuffer.CopyTo(res, 0);
bitCount = 0;
base.BaseStream.Write(res, 0, res.Length);
}
}

You will have to use bitshifts or binary arithmetic, as you can only write one byte at a time, not individual bits.

Related

C# Byte[Hex] Array to String - No Conversion

so I am using following Code to read from Memory:
public static extern bool ReadProcessMemory(IntPtr handle, IntPtr baseAddress, [Out] byte[] buffer, int size, out IntPtr numberOfBytesRead);
It outputs the memory Hex code as Byte Array using the code above:
buffer[0] = 01;
buffer[1] = 2D;
buffer[2] = F2;
I want to search in a certain range of the Hex Code for a certain Hex Array.
To do that I would like to use the "KMP Algorithm for Pattern Searching".
Currently I am using the following code to achieve that:
byte[] moduleBytes = {0};
IntPtr bytesRead;
ReadProcessMemory(process.Handle, baseAddress, moduleBytes, moduleBytes.Length, out bytesRead);
string buffer = "";
foreach (byte bytesfrommemory in moduleBytes)
{
buffer += bytesfrommemory.ToString("X");;
}
//algorithm
string data = buffer;
int[] value = SearchString(data, pattern);
foreach (int entry in value)
{
Console.WriteLine(entry); //Outputs the offset where it found the code
}
The issue with that is that looping trough each byte to add it to the buffer string takes ages with +1000 bytes. Is there a faster way to "convert" the byte array from array to string without actually converting it as I still need the raw byte array just as string?
I tried it with the following code, but it converts it to something different:
char[] characters = moduleBytes.Select(o => (char)o).ToArray();
string buffer = new string(characters);
Thanks for any help :)
Alexei's comment is on point; you are converting bytes to a string to run a search algorithm that will convert the string to char array (i.e. a byte array, i.e. what you started with) in order to do its work. Finding a byte array within a byte array using KMP is the same as finding a string within a string
To demonstrate my point I casted around for an implementation of KMP that works on strings. I found one at Geeks For Geeks and swapped it from working on strings, to working on bytes, literally just just editing the type in the method calls; string has a Length and can be indexed like an array, byte array has a Length and can be indexed because it is an array etc - there is no more needed to make this version work:
// C# program for implementation of KMP pattern
// searching algorithm
using System;
public class GFG {
void KMPSearch(byte[] pat, byte[] txt)
{
int M = pat.Length;
int N = txt.Length;
// create lps[] that will hold the longest
// prefix suffix values for pattern
int[] lps = new int[M];
int j = 0; // index for pat[]
// Preprocess the pattern (calculate lps[]
// array)
computeLPSArray(pat, M, lps);
int i = 0; // index for txt[]
while (i < N) {
if (pat[j] == txt[i]) {
j++;
i++;
}
if (j == M) {
Console.Write("Found pattern "
+ "at index " + (i - j));
j = lps[j - 1];
}
// mismatch after j matches
else if (i < N && pat[j] != txt[i]) {
// Do not match lps[0..lps[j-1]] characters,
// they will match anyway
if (j != 0)
j = lps[j - 1];
else
i = i + 1;
}
}
}
void computeLPSArray(byte[] pat, int M, int[] lps)
{
// length of the previous longest prefix suffix
int len = 0;
int i = 1;
lps[0] = 0; // lps[0] is always 0
// the loop calculates lps[i] for i = 1 to M-1
while (i < M) {
if (pat[i] == pat[len]) {
len++;
lps[i] = len;
i++;
}
else // (pat[i] != pat[len])
{
// This is tricky. Consider the example.
// AAACAAAA and i = 7. The idea is similar
// to search step.
if (len != 0) {
len = lps[len - 1];
// Also, note that we do not increment
// i here
}
else // if (len == 0)
{
lps[i] = len;
i++;
}
}
}
}
// Driver program to test above function
public static void Main()
{
string txt = System.Text.Encoding.ASCII.GetBytes("ABABDABACDABABCABAB");
string pat = System.Text.Encoding.ASCII.GetBytes("ABABCABAB");
new GFG().KMPSearch(pat, txt);
}
}
// This code has been contributed by Amit Khandelwal.
The biggest work (most number of keys typed) is using System.Text.Encoding.ASCII.GetBytes to get a pair of byte arrays to feed in 😀
Take away point; don't convert your memory bytes at all - convert your search term to bytes and search it in the memory bytes
Disclaimer: I offer zero guarantee that this is a correct implementation of KMP that functions adequately for your use case. It seems to work, but I'm only pointing out that you don't need to convert to strings and back again to operate code that fundamentally can search bytes in exactly the same way as it searches strings

Bitwise operation performance, how to improve

I have a simple task: determine how many bytes is necessary to encode some number (byte array length) to byte array and encode final value (implement this article: Encoded Length and Value Bytes).
Originally I wrote a quick method that accomplish the task:
public static Byte[] Encode(Byte[] rawData, Byte enclosingtag) {
if (rawData == null) {
return new Byte[] { enclosingtag, 0 };
}
List<Byte> computedRawData = new List<Byte> { enclosingtag };
// if array size is less than 128, encode length directly. No questions here
if (rawData.Length < 128) {
computedRawData.Add((Byte)rawData.Length);
} else {
// convert array size to a hex string
String hexLength = rawData.Length.ToString("x2");
// if hex string has odd length, align it to even by prepending hex string
// with '0' character
if (hexLength.Length % 2 == 1) { hexLength = "0" + hexLength; }
// take a pair of hex characters and convert each octet to a byte
Byte[] lengthBytes = Enumerable.Range(0, hexLength.Length)
.Where(x => x % 2 == 0)
.Select(x => Convert.ToByte(hexLength.Substring(x, 2), 16))
.ToArray();
// insert padding byte, set bit 7 to 1 and add byte count required
// to encode length bytes
Byte paddingByte = (Byte)(128 + lengthBytes.Length);
computedRawData.Add(paddingByte);
computedRawData.AddRange(lengthBytes);
}
computedRawData.AddRange(rawData);
return computedRawData.ToArray();
}
This is an old code and was written in an awful way.
Now I'm trying to optimize the code by using either, bitwise operators, or BitConverter class. Here is an example of bitwise-edition:
public static Byte[] Encode2(Byte[] rawData, Byte enclosingtag) {
if (rawData == null) {
return new Byte[] { enclosingtag, 0 };
}
List<Byte> computedRawData = new List<Byte>(rawData);
if (rawData.Length < 128) {
computedRawData.Insert(0, (Byte)rawData.Length);
} else {
// temp number
Int32 num = rawData.Length;
// track byte count, this will be necessary further
Int32 counter = 1;
// simply make bitwise AND to extract byte value
// and shift right while remaining value is still more than 255
// (there are more than 8 bits)
while (num >= 256) {
counter++;
computedRawData.Insert(0, (Byte)(num & 255));
num = num >> 8;
}
// compose final array
computedRawData.InsertRange(0, new[] { (Byte)(128 + counter), (Byte)num });
}
computedRawData.Insert(0, enclosingtag);
return computedRawData.ToArray();
}
and the final implementation with BitConverter class:
public static Byte[] Encode3(Byte[] rawData, Byte enclosingtag) {
if (rawData == null) {
return new Byte[] { enclosingtag, 0 };
}
List<Byte> computedRawData = new List<Byte>(rawData);
if (rawData.Length < 128) {
computedRawData.Insert(0, (Byte)rawData.Length);
} else {
// convert integer to a byte array
Byte[] bytes = BitConverter.GetBytes(rawData.Length);
// start from the end of a byte array to skip unnecessary zero bytes
for (int i = bytes.Length - 1; i >= 0; i--) {
// once the byte value is non-zero, take everything starting
// from the current position up to array start.
if (bytes[i] > 0) {
// we need to reverse the array to get the proper byte order
computedRawData.InsertRange(0, bytes.Take(i + 1).Reverse());
// compose final array
computedRawData.Insert(0, (Byte)(128 + i + 1));
computedRawData.Insert(0, enclosingtag);
return computedRawData.ToArray();
}
}
}
return null;
}
All methods do their work as expected. I used an example from Stopwatch class page to measure performance. And performance tests surprised me. My test method performed 1000 runs of the method to encode a byte array (actually, only array sixe) with 100 000 elements and average times are:
Encode -- around 200ms
Encode2 -- around 270ms
Encode3 -- around 320ms
I personally like method Encode2, because the code looks more readable, but its performance isn't that good.
The question: what you woul suggest to improve Encode2 method performance or to improve Encode readability?
Any help will be appreciated.
===========================
Update: Thanks to all who participated in this thread. I took into consideration all suggestions and ended up with this solution:
public static Byte[] Encode6(Byte[] rawData, Byte enclosingtag) {
if (rawData == null) {
return new Byte[] { enclosingtag, 0 };
}
Byte[] retValue;
if (rawData.Length < 128) {
retValue = new Byte[rawData.Length + 2];
retValue[0] = enclosingtag;
retValue[1] = (Byte)rawData.Length;
} else {
Byte[] lenBytes = new Byte[3];
Int32 num = rawData.Length;
Int32 counter = 0;
while (num >= 256) {
lenBytes[counter] = (Byte)(num & 255);
num >>= 8;
counter++;
}
// 3 is: len byte and enclosing tag
retValue = new byte[rawData.Length + 3 + counter];
rawData.CopyTo(retValue, 3 + counter);
retValue[0] = enclosingtag;
retValue[1] = (Byte)(129 + counter);
retValue[2] = (Byte)num;
Int32 n = 3;
for (Int32 i = counter - 1; i >= 0; i--) {
retValue[n] = lenBytes[i];
n++;
}
}
return retValue;
}
Eventually I moved away from lists to fixed-sized byte arrays. Avg time against the same data set is now about 65ms. It appears that lists (not bitwise operations) gives me a significant penalty in performance.
The main problem here is almost certainly the allocation of the List, and the allocation needed when you are inserting new elements, and when the list is converted to an array in the end. This code probably spend most of its time in the garbage collector and memory allocator. The use vs non-use of bitwise operators probably means very little in comparison, and I would have looked into ways to reduce the amount of memory you allocate first.
One way is to send in a reference to a byte array allocated in advance and and an index to where you are in the array instead of allocating and returning the data, and then return an integer telling how many bytes you have written. Working on large arrays is usually more efficient than working on many small objects. As others have mentioned, use a profiler, and see where your code spend its time.
Of cause the optimization I mentioned will makes your code more low level in nature, and more close to what you would typically do in C, but there is often a trade off between readability and performance.
Using "reverse, append, reverse" instead of "insert at front", and preallocating everything, it might be something like this: (not tested)
public static byte[] Encode4(byte[] rawData, byte enclosingtag) {
if (rawData == null) {
return new byte[] { enclosingtag, 0 };
}
List<byte> computedRawData = new List<byte>(rawData.Length + 6);
computedRawData.AddRange(rawData);
if (rawData.Length < 128) {
computedRawData.InsertRange(0, new byte[] { enclosingtag, (byte)rawData.Length });
} else {
computedRawData.Reverse();
// temp number
int num = rawData.Length;
// track byte count, this will be necessary further
int counter = 1;
// simply cast to byte to extract byte value
// and shift right while remaining value is still more than 255
// (there are more than 8 bits)
while (num >= 256) {
counter++;
computedRawData.Add((byte)num);
num >>= 8;
}
// compose final array
computedRawData.Add((byte)num);
computedRawData.Add((byte)(counter + 128));
computedRawData.Add(enclosingtag);
computedRawData.Reverse();
}
return computedRawData.ToArray();
}
I don't know for sure whether it's going to be faster, but it would make sense - now the expensive "insert at front" operation is mostly avoided, except in the case where there would be only one of them (probably not enough to balance with the two reverses).
An other idea is to limit the insert at front to only one time in an other way: collect all the things that have to be inserted there and then do it once. Could look something like this: (not tested)
public static byte[] Encode5(byte[] rawData, byte enclosingtag) {
if (rawData == null) {
return new byte[] { enclosingtag, 0 };
}
List<byte> computedRawData = new List<byte>(rawData);
if (rawData.Length < 128) {
computedRawData.InsertRange(0, new byte[] { enclosingtag, (byte)rawData.Length });
} else {
// list of all things that will be inserted
List<byte> front = new List<byte>(8);
// temp number
int num = rawData.Length;
// track byte count, this will be necessary further
int counter = 1;
// simply cast to byte to extract byte value
// and shift right while remaining value is still more than 255
// (there are more than 8 bits)
while (num >= 256) {
counter++;
front.Insert(0, (byte)num); // inserting in tiny list, not so bad
num >>= 8;
}
// compose final array
front.InsertRange(0, new[] { (byte)(128 + counter), (byte)num });
front.Insert(0, enclosingtag);
computedRawData.InsertRange(0, front);
}
return computedRawData.ToArray();
}
If it's not good enough or didn't help (or if this is worse - hey, could be), I'll try to come up with more ideas.

Dotnet Hex string to Java

Have a problem, much like this post: How to read a .NET Guid into a Java UUID.
Except, from a remote svc I get a hex str formatted like this: ABCDEFGH-IJKL-MNOP-QRST-123456.
I need to match the GUID.ToByteArray() generated .net byte array GH-EF-CD-AB-KL-IJ-OP-MN- QR- ST-12-34-56 in Java for hashing purposes.
I'm kinda at a loss as to how to parse this. Do I cut off the QRST-123456 part and perhaps use something like the Commons IO EndianUtils on the other part, then stitch the 2 arrays back together as well? Seems way too complicated.
I can rearrange the string, but I shouldn't have to do any of these. Mr. Google doesn't wanna help me neither..
BTW, what is the logic in Little Endian land that keeps those last 6 char unchanged?
Yes, for reference, here's what I've done {sorry for 'answer', but had trouble formatting it properly in comment}:
String s = "3C0EA2F3-B3A0-8FB0-23F0-9F36DEAA3F7E";
String[] splitz = s.split("-");
String rebuilt = "";
for (int i = 0; i < 3; i++) {
// Split into 2 char chunks. '..' = nbr of chars in chunks
String[] parts = splitz[i].split("(?<=\\G..)");
for (int k = parts.length -1; k >=0; k--) {
rebuilt += parts[k];
}
}
rebuilt += splitz[3]+splitz[4];
I know, it's hacky, but it'll do for testing.
Make it into a byte[] and skip the first 3 bytes:
package guid;
import java.util.Arrays;
public class GuidConvert {
static byte[] convertUuidToBytes(String guid) {
String hexdigits = guid.replaceAll("-", "");
byte[] bytes = new byte[hexdigits.length()/2];
for (int i = 0; i < bytes.length; i++) {
int x = Integer.parseInt(hexdigits.substring(i*2, (i+1)*2), 16);
bytes[i] = (byte) x;
}
return bytes;
}
static String bytesToHexString(byte[] bytes) {
StringBuilder buf = new StringBuilder();
for (byte b : bytes) {
int i = b >= 0 ? b : (int) b + 256;
buf.append(Integer.toHexString(i / 16));
buf.append(Integer.toHexString(i % 16));
}
return buf.toString();
}
public static void main(String[] args) {
String guid = "3C0EA2F3-B3A0-8FB0-23F0-9F36DEAA3F7E";
byte[] bytes = convertUuidToBytes(guid);
System.err.println("GUID = "+ guid);
System.err.println("bytes = "+ bytesToHexString(bytes));
byte[] tail = Arrays.copyOfRange(bytes, 3, bytes.length);
System.err.println("tail = "+ bytesToHexString(tail));
}
}
The last group of 6 bytes is not reversed because it is an array of bytes. The first four groups are reversed because they are a four-byte integer followed by three two-byte integers.

C# - Converting a Sequence of Numbers into Bytes

I am trying to send a UDP packet of bytes corresponding to the numbers 1-1000 in sequence. How do I convert each number (1,2,3,4,...,998,999,1000) into the minimum number of bytes required and put them in a sequence that I can send as a UDP packet?
I've tried the following with no success. Any help would be greatly appreciated!
List<byte> byteList = new List<byte>();
for (int i = 1; i <= 255; i++)
{
byte[] nByte = BitConverter.GetBytes((byte)i);
foreach (byte b in nByte)
{
byteList.Add(b);
}
}
for (int g = 256; g <= 1000; g++)
{
UInt16 st = Convert.ToUInt16(g);
byte[] xByte = BitConverter.GetBytes(st);
foreach (byte c in xByte)
{
byteList.Add(c);
}
}
byte[] sendMsg = byteList.ToArray();
Thank you.
You need to use :
BitConverter.GetBytes(INTEGER);
Think about how you are going to be able to tell the difference between:
260, 1 -> 0x1, 0x4, 0x1
1, 4, 1 -> 0x1, 0x4, 0x1
If you use one byte for numbers up to 255 and two bytes for the numbers 256-1000, you won't be able to work out at the other end which number corresponds to what.
If you just need to encode them as described without worrying about how they are decoded, it smacks to me of a contrived homework assignment or test, and I'm uninclined to solve it for you.
I think you are looking for something along the lines of a 7-bit encoded integer:
protected void Write7BitEncodedInt(int value)
{
uint num = (uint) value;
while (num >= 0x80)
{
this.Write((byte) (num | 0x80));
num = num >> 7;
}
this.Write((byte) num);
}
(taken from System.IO.BinaryWriter.Write(String)).
The reverse is found in the System.IO.BinaryReader class and looks something like this:
protected internal int Read7BitEncodedInt()
{
byte num3;
int num = 0;
int num2 = 0;
do
{
if (num2 == 0x23)
{
throw new FormatException(Environment.GetResourceString("Format_Bad7BitInt32"));
}
num3 = this.ReadByte();
num |= (num3 & 0x7f) << num2;
num2 += 7;
}
while ((num3 & 0x80) != 0);
return num;
}
I do hope this is not homework, even though is really smells like it.
EDIT:
Ok, so to put it all together for you:
using System;
using System.IO;
namespace EncodedNumbers
{
class Program
{
protected static void Write7BitEncodedInt(BinaryWriter bin, int value)
{
uint num = (uint)value;
while (num >= 0x80)
{
bin.Write((byte)(num | 0x80));
num = num >> 7;
}
bin.Write((byte)num);
}
static void Main(string[] args)
{
MemoryStream ms = new MemoryStream();
BinaryWriter bin = new BinaryWriter(ms);
for(int i = 1; i < 1000; i++)
{
Write7BitEncodedInt(bin, i);
}
byte[] data = ms.ToArray();
int size = data.Length;
Console.WriteLine("Total # of Bytes = " + size);
Console.ReadLine();
}
}
}
The total size I get is 1871 bytes for numbers 1-1000.
Btw, could you simply state whether or not this is homework? Obviously, we will still help either way. But we would much rather you try a little harder so you can actually learn for yourself.
EDIT #2:
If you want to just pack them in ignoring the ability to decode them back, you can do something like this:
protected static void WriteMinimumInt(BinaryWriter bin, int value)
{
byte[] bytes = BitConverter.GetBytes(value);
int skip = bytes.Length-1;
while (bytes[skip] == 0)
{
skip--;
}
for (int i = 0; i <= skip; i++)
{
bin.Write(bytes[i]);
}
}
This ignores any bytes that are zero (from MSB to LSB). So for 0-255 it will use one byte.
As states elsewhere, this will not allow you to decode the data back since the stream is now ambiguous. As a side note, this approach crams it down to 1743 bytes (as opposed to 1871 using 7-bit encoding).
A byte can only hold 256 distinct values, so you cannot store the numbers above 255 in one byte. The easiest way would be to use short, which is 16 bits. If you realy need to conserve space, you can use 10 bit numbers and pack that into a byte array ( 10 bits = 2^10 = 1024 possible values).
Naively (also, untested):
List<byte> bytes = new List<byte>();
for (int i = 1; i <= 1000; i++)
{
byte[] nByte = BitConverter.GetBytes(i);
foreach(byte b in nByte) bytes.Add(b);
}
byte[] byteStream = bytes.ToArray();
Will give you a stream of bytes were each group of 4 bytes is a number [1, 1000].
You might be tempted to do some work so that i < 256 take a single byte, i < 65535 take two bytes, etc. However, if you do this you can't read the values out of the stream. Instead, you'd add length encoding or sentinels bits or something of the like.
I'd say, don't. Just compress the stream, either using a built-in class, or gin up a Huffman encoding implementation using an agree'd upon set of frequencies.

What is the fastest way to convert a float[] to a byte[]?

I would like to get a byte[] from a float[] as quickly as possible, without looping through the whole array (via a cast, probably). Unsafe code is fine. Thanks!
I am looking for a byte array 4 time longer than the float array (the dimension of the byte array will be 4 times that of the float array, since each float is composed of 4 bytes). I'll pass this to a BinaryWriter.
EDIT:
To those critics screaming "premature optimization":
I have benchmarked this using ANTS profiler before I optimized. There was a significant speed increase because the file has a write-through cache and the float array is exactly sized to match the sector size on the disk. The binary writer wraps a file handle created with pinvoke'd win32 API. The optimization occurs since this lessens the number of function calls.
And, with regard to memory, this application creates massive caches which use plenty of memory. I can allocate the byte buffer once and re-use it many times--the double memory usage in this particular instance amounts to a roundoff error in the overall memory consumption of the app.
So I guess the lesson here is not to make premature assumptions ;)
There is a dirty fast (not unsafe code) way of doing this:
[StructLayout(LayoutKind.Explicit)]
struct BytetoDoubleConverter
{
[FieldOffset(0)]
public Byte[] Bytes;
[FieldOffset(0)]
public Double[] Doubles;
}
//...
static Double Sum(byte[] data)
{
BytetoDoubleConverter convert = new BytetoDoubleConverter { Bytes = data };
Double result = 0;
for (int i = 0; i < convert.Doubles.Length / sizeof(Double); i++)
{
result += convert.Doubles[i];
}
return result;
}
This will work, but I'm not sure of the support on Mono or newer versions of the CLR. The only strange thing is that the array.Length is the bytes length. This can be explained because it looks at the array length stored with the array, and because this array was a byte array that length will still be in byte length. The indexer does think about the Double being eight bytes large so no calculation is necessary there.
I've looked for it some more, and it's actually described on MSDN, How to: Create a C/C++ Union by Using Attributes (C# and Visual Basic), so chances are this will be supported in future versions. I am not sure about Mono though.
Premature optimization is the root of all evil! #Vlad's suggestion to iterate over each float is a much more reasonable answer than switching to a byte[]. Take the following table of runtimes for increasing numbers of elements (average of 50 runs):
Elements BinaryWriter(float) BinaryWriter(byte[])
-----------------------------------------------------------
10 8.72ms 8.76ms
100 8.94ms 8.82ms
1000 10.32ms 9.06ms
10000 32.56ms 10.34ms
100000 213.28ms 739.90ms
1000000 1955.92ms 10668.56ms
There is little difference between the two for small numbers of elements. Once you get into the huge number of elements range, the time spent copying from the float[] to the byte[] far outweighs the benefits.
So go with what is simple:
float[] data = new float[...];
foreach(float value in data)
{
writer.Write(value);
}
There is a way which avoids memory copying and iteration.
You can use a really ugly hack to temporary change your array to another type using (unsafe) memory manipulation.
I tested this hack in both 32 & 64 bit OS, so it should be portable.
The source + sample usage is maintained at https://gist.github.com/1050703 , but for your convenience I'll paste it here as well:
public static unsafe class FastArraySerializer
{
[StructLayout(LayoutKind.Explicit)]
private struct Union
{
[FieldOffset(0)] public byte[] bytes;
[FieldOffset(0)] public float[] floats;
}
[StructLayout(LayoutKind.Sequential, Pack = 1)]
private struct ArrayHeader
{
public UIntPtr type;
public UIntPtr length;
}
private static readonly UIntPtr BYTE_ARRAY_TYPE;
private static readonly UIntPtr FLOAT_ARRAY_TYPE;
static FastArraySerializer()
{
fixed (void* pBytes = new byte[1])
fixed (void* pFloats = new float[1])
{
BYTE_ARRAY_TYPE = getHeader(pBytes)->type;
FLOAT_ARRAY_TYPE = getHeader(pFloats)->type;
}
}
public static void AsByteArray(this float[] floats, Action<byte[]> action)
{
if (floats.handleNullOrEmptyArray(action))
return;
var union = new Union {floats = floats};
union.floats.toByteArray();
try
{
action(union.bytes);
}
finally
{
union.bytes.toFloatArray();
}
}
public static void AsFloatArray(this byte[] bytes, Action<float[]> action)
{
if (bytes.handleNullOrEmptyArray(action))
return;
var union = new Union {bytes = bytes};
union.bytes.toFloatArray();
try
{
action(union.floats);
}
finally
{
union.floats.toByteArray();
}
}
public static bool handleNullOrEmptyArray<TSrc,TDst>(this TSrc[] array, Action<TDst[]> action)
{
if (array == null)
{
action(null);
return true;
}
if (array.Length == 0)
{
action(new TDst[0]);
return true;
}
return false;
}
private static ArrayHeader* getHeader(void* pBytes)
{
return (ArrayHeader*)pBytes - 1;
}
private static void toFloatArray(this byte[] bytes)
{
fixed (void* pArray = bytes)
{
var pHeader = getHeader(pArray);
pHeader->type = FLOAT_ARRAY_TYPE;
pHeader->length = (UIntPtr)(bytes.Length / sizeof(float));
}
}
private static void toByteArray(this float[] floats)
{
fixed(void* pArray = floats)
{
var pHeader = getHeader(pArray);
pHeader->type = BYTE_ARRAY_TYPE;
pHeader->length = (UIntPtr)(floats.Length * sizeof(float));
}
}
}
And the usage is:
var floats = new float[] {0, 1, 0, 1};
floats.AsByteArray(bytes =>
{
foreach (var b in bytes)
{
Console.WriteLine(b);
}
});
If you do not want any conversion to happen, I would suggest Buffer.BlockCopy().
public static void BlockCopy(
Array src,
int srcOffset,
Array dst,
int dstOffset,
int count
)
For example:
float[] floatArray = new float[1000];
byte[] byteArray = new byte[floatArray.Length * 4];
Buffer.BlockCopy(floatArray, 0, byteArray, 0, byteArray.Length);
You're better-off letting the BinaryWriter do this for you. There's going to be iteration over your entire set of data regardless of which method you use, so there's no point in playing with bytes.
Although you can obtain a byte* pointer using unsafe and fixed, you cannot convert the byte* to byte[] in order for the writer to accept it as a parameter without performing data copy. Which you do not want to do as it will double your memory footprint and add an extra iteration over the inevitable iteration that needs to be performed in order to output the data to disk.
Instead, you are still better off iterating over the array of floats and writing each float to the writer individually, using the Write(double) method. It will still be fast because of buffering inside the writer. See sixlettervariables's numbers.
Using the new Span<> in .Net Core 2.1 or later...
byte[] byteArray2 = MemoryMarshal.Cast<float, byte>(floatArray).ToArray();
Or, if Span can be used instead, then a direct reinterpret cast can be done: (very fast - zero copying)
Span<byte> byteArray3 = MemoryMarshal.Cast<float, byte>(floatArray);
// with span we can get a byte, set a byte, iterate, and more.
byte someByte = byteSpan[2];
byteSpan[2] = 33;
I did some crude benchmarks. The time taken for each is in the comments. [release/no debugger/x64]
float[] floatArray = new float[100];
for (int i = 0; i < 100; i++) floatArray[i] = i * 7.7777f;
Stopwatch start = Stopwatch.StartNew();
for (int j = 0; j < 100; j++)
{
start.Restart();
for (int k = 0; k < 1000; k++)
{
Span<byte> byteSpan = MemoryMarshal.Cast<float, byte>(floatArray);
}
long timeTaken1 = start.ElapsedTicks; ////// 0 ticks //////
start.Restart();
for (int k = 0; k < 1000; k++)
{
byte[] byteArray2 = MemoryMarshal.Cast<float, byte>(floatArray).ToArray();
}
long timeTaken2 = start.ElapsedTicks; ////// 26 ticks //////
start.Restart();
for (int k = 0; k < 1000; k++)
{
byte[] byteArray = new byte[sizeof(float) * floatArray.Length];
for (int i = 0; i < floatArray.Length; i++)
BitConverter.GetBytes(floatArray[i]).CopyTo(byteArray, i * sizeof(float));
}
long timeTaken3 = start.ElapsedTicks; ////// 1310 ticks //////
start.Restart();
for (int k = 0; k < 1000; k++)
{
byte[] byteArray = new byte[sizeof(float) * floatArray.Length];
Buffer.BlockCopy(floatArray, 0, byteArray, 0, byteArray.Length);
}
long timeTaken4 = start.ElapsedTicks; ////// 33 ticks //////
start.Restart();
for (int k = 0; k < 1000; k++)
{
byte[] byteArray = new byte[sizeof(float) * floatArray.Length];
MemoryStream memStream = new MemoryStream();
BinaryWriter writer = new BinaryWriter(memStream);
foreach (float value in floatArray)
writer.Write(value);
writer.Close();
}
long timeTaken5 = start.ElapsedTicks; ////// 1080 ticks //////
Console.WriteLine($"{timeTaken1/10,6} {timeTaken2 / 10,6} {timeTaken3 / 10,6} {timeTaken4 / 10,6} {timeTaken5 / 10,6} ");
}
We have a class called LudicrousSpeedSerialization and it contains the following unsafe method:
static public byte[] ConvertFloatsToBytes(float[] data)
{
int n = data.Length;
byte[] ret = new byte[n * sizeof(float)];
if (n == 0) return ret;
unsafe
{
fixed (byte* pByteArray = &ret[0])
{
float* pFloatArray = (float*)pByteArray;
for (int i = 0; i < n; i++)
{
pFloatArray[i] = data[i];
}
}
}
return ret;
}
Although it basically does do a for loop behind the scenes, it does do the job in one line
byte[] byteArray = floatArray.Select(
f=>System.BitConverter.GetBytes(f)).Aggregate(
(bytes, f) => {List<byte> temp = bytes.ToList(); temp.AddRange(f); return temp.ToArray(); });

Categories