For serialization of primitive arrays, I'm wondering how to convert a Primitive[] to its corresponding byte[] (e.g., an int[128] to a byte[512], or a ushort[] to a byte[]...).
The destination can be a MemoryStream, a network message, a file, anything.
The goal is performance (serialization & deserialization time): being able to write a byte[] to a stream in one shot instead of looping through all values, or allocating through some converter.
Some solutions already explored:
Regular Loop to write/read
// array = any int[];
myStreamWriter.WriteInt32(array.Length);
for (int i = 0; i < array.Length; ++i)
    myStreamWriter.WriteInt32(array[i]);
This solution works for both serialization and deserialization, and is roughly 100 times faster than using the standard System.Runtime.Serialization machinery with a BinaryFormatter to serialize/deserialize a single int, or a couple of them.
But this solution becomes slow once array.Length exceeds roughly 200-300 values (for Int32).
Cast?
It seems C# can't directly cast an int[] to a byte[], or a bool[] to a byte[].
BitConverter.GetBytes()
This solution works, but it allocates a new byte[] on every iteration of the loop through my int[]. Performance is of course horrible.
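For reference, this is the allocating pattern being described; a minimal sketch, assuming the destination is a Stream named myStream:
// One small allocation per element: fine for a few values,
// terrible for a large array.
foreach (int value in array)
{
    byte[] tmp = BitConverter.GetBytes(value); // allocates a new byte[4] on every call
    myStream.Write(tmp, 0, tmp.Length);
}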
Marshal.Copy
Yup, this solution works too, but it has the same problem as the previous BitConverter one.
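For completeness, this is roughly what the Marshal.Copy route looks like; a sketch only, pinning the array with GCHandle (the helper name ToBytesViaMarshal is made up for illustration):
using System;
using System.Runtime.InteropServices;

// Pin the int[] so the GC can't move it, then copy its raw bytes
// into a freshly allocated byte[] (the allocation being complained about).
static byte[] ToBytesViaMarshal(int[] array)
{
    byte[] result = new byte[array.Length * sizeof(int)];
    GCHandle handle = GCHandle.Alloc(array, GCHandleType.Pinned);
    try
    {
        Marshal.Copy(handle.AddrOfPinnedObject(), result, 0, result.Length);
    }
    finally
    {
        handle.Free();
    }
    return result;
}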
C++ hack
Because a direct cast is not allowed in C#, I tried a C++ hack after seeing in memory that the array length is stored 4 bytes before the array data starts:
ARRAYCAST_API void Cast(int* input, unsigned char** output)
{
    // get the address of the input (this is a pointer to the data)
    int* count = input;
    // the size of the buffer is located just before the data (4 bytes before, as this is an int)
    count--;
    // multiply the number of elements by 4, as an int is 4 bytes
    *count = *count * 4;
    // set the address of the byte array
    *output = (unsigned char*)(input);
}
and the C# that calls it:
byte[] arrayB = null;
int[] arrayI = new int[128];
for (int i = 0; i < 128; ++i)
    arrayI[i] = i;
// delegate call
fptr(arrayI, out arrayB);
I successfully retrieve my int[128] in C++, switch the array length, and assign the right address to my 'output' var, but C# only sees a byte[1] coming back; presumably the P/Invoke marshaler, having no way to know the length of the buffer behind the returned pointer, materializes a one-element managed array instead of adopting the native memory. It seems I can't hack a managed variable that easily.
So I'm really starting to think that all these casts I want to achieve are just impossible in C# (int[] -> byte[], bool[] -> byte[], double[] -> byte[]...) without allocating/copying...
What am I missing?
How about using Buffer.BlockCopy?
// serialize
var intArray = new[] { 1, 2, 3, 4, 5, 6, 7, 8 };
var byteArray = new byte[intArray.Length * 4];
Buffer.BlockCopy(intArray, 0, byteArray, 0, byteArray.Length);
// deserialize and test
var intArray2 = new int[byteArray.Length / 4];
Buffer.BlockCopy(byteArray, 0, intArray2, 0, byteArray.Length);
Console.WriteLine(intArray.SequenceEqual(intArray2)); // true
Note that BlockCopy still copies behind the scenes; it just does it in bulk instead of element by element. I'm fairly sure the copy is unavoidable in managed code, and BlockCopy is probably about as good as it gets for this.
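(For what it's worth, on newer runtimes a span-based reinterpretation can avoid the copy entirely; a minimal sketch, assuming .NET Core 2.1+ or the System.Memory package, and note it yields a Span<byte> view over the same memory, not a standalone byte[]:)
using System;
using System.Runtime.InteropServices;

int[] intArray = { 1, 2, 3, 4, 5, 6, 7, 8 };

// Reinterpret the int[] as bytes without copying.
Span<byte> byteView = MemoryMarshal.AsBytes(intArray.AsSpan());
Console.WriteLine(byteView.Length); // 32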
Related
My application receives data from a serial port, which is sent in packets. The packets are defined as follows:
1 byte - identifier
2 bytes - length of data
n bytes - data
1 byte - checksum
For example, if the length is specified as 508, there will be 508 bytes of data, which would be 127 uint32_t values.
Currently I use the following code to assemble the uint32_t values from the data that is sent as bytes:
private UInt32[] number_array = new UInt32[16384];

private void decodePacket(int startpos, byte[] data, int lenght)
{
    /* Starting position */
    int pos = startpos;
    for (int i = 0; i < lenght; i++)
    {
        /* Convert 4 bytes to one uint32_t value */
        int value = data[i] | data[i + 1] << 8 | data[i + 2] << 16 | data[i + 3] << 24;
        /* Write to array */
        number_array[pos] = Convert.ToUInt32(value);
        /* Advance i by 4 (bytes) */
        i += 4;
        /* Advance pos */
        pos++;
    }
}
It does work fine, but I'm thinking it's very inefficient. There are usually 16384 uint32_t values to process, so this function is called a lot of times.
Is there a more efficient / faster way to do this?
Look at this simple code:
static void Main(string[] args)
{
    byte[] data = new byte[]
    {
        1, 10, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 5
    };
    byte id = data[0];
    byte[] len = new byte[4];
    Array.Copy(data, 1, len, 0, 2);
    int dataLen = BitConverter.ToInt32(len, 0);
    byte[] dataRead = new byte[dataLen];
    Array.Copy(data, 3, dataRead, 0, dataLen);
    byte checksum = data[data.Length - 1];
    Console.ReadKey();
}
Now, first you get the identifier that is at the first position.
Next, you get the length of the data. You have to create a 4-byte array and copy the two length bytes from the data array into it. It should be 4 bytes long because you want to convert those bytes into an Int32. You could make this array 2 bytes long and convert it to an Int16, but Int32 should perform better.
So, when you have the length bytes in this len array, you can convert them into an Int32. But be careful of endianness; it may differ between systems.
At the end you create an array that will contain the REAL data, and then copy the real data into that new array.
This solution should be faster than yours. But... be careful of endianness.
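If you want to guard against that endianness caveat, BitConverter exposes the machine's byte order; a sketch of a defensive length read, assuming the packet layout above (identifier at index 0, length bytes at indices 1 and 2, little-endian on the wire):
byte[] len = { data[1], data[2], 0, 0 }; // two length bytes, zero-padded to 4

// The wire format is little-endian; flip the bytes if this machine
// is big-endian so ToInt32 interprets them correctly.
if (!BitConverter.IsLittleEndian)
    Array.Reverse(len);

int dataLen = BitConverter.ToInt32(len, 0);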
Here is the best possible way using C#
private void decodePacket(int startpos, byte[] data, int lenght)
{
    /* Starting position */
    int pos = startpos;
    for (int i = 0; i < lenght; i += 4)
    {
        /* Convert 4 bytes to one uint32_t value */
        int value = BitConverter.ToInt32(data, i);
        /* Write to array */
        number_array[pos] = Convert.ToUInt32(value);
        /* Advance pos */
        pos++;
    }
}
This is the same code as in the question, but with two changes:
The index i was being incremented in two places, which resulted in increments of 5 instead of 4. It is now incremented in just one place, by 4.
Use of BitConverter instead of bitwise logic. Although it might not provide any significant performance boost, it is better to use BitConverter for platform independence.
UPDATE
If your bytes are stored at 32-bit-aligned memory addresses, BitConverter gives you maximum-performance conversion. But in C# you cannot guarantee the alignment of a memory location, and in that case bit shifting is the only way.
When it falls back to bit shifting, BitConverter uses the same logic for little-endian systems as shown in the question. But it helps keep your code platform independent by using a different bit-shifting pattern on big-endian systems.
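For illustration, a hedged sketch of an explicitly little-endian read that produces the same result regardless of the host's byte order (the helper name is made up):
// Assemble the value byte by byte so the result does not depend
// on the machine's native endianness.
static uint ReadUInt32LittleEndian(byte[] data, int offset)
{
    return (uint)(data[offset]
                | data[offset + 1] << 8
                | data[offset + 2] << 16
                | data[offset + 3] << 24);
}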
Here by "directly" I mean without temporary byte[] array.
The problem is, for example I have array of ints or doubles on the disk, so currently I create two arrays -- byte array and int array (in case of ints). The former is just for reading, the latter is the actual output.
Since Stream can read only to byte array I read it to the first array, than copy all the data to the second. It works, but it really hurts me (I am not talking about performance here).
So, how to read the array without temporary array? Using C# unsafe context is fine for me.
So far I tried two approaches: I looked if it is possible to create an array reusing allocated memory and second which looked more promising -- I could get pointer to the result/second array and in unsafe context I could cast it to byte* pointer. Considering my needs it is 100% safe and valid, however byte* pointer is not a byte[] array in C# world and I cannot find the way to cast pointer to array.
Code:
void ReadStuff(Stream stream, double[] data)
{
    var dataBytes = new byte[data.Length * sizeof(double)];
    stream.Read(dataBytes, 0, dataBytes.Length);
    Buffer.BlockCopy(dataBytes, 0, data, 0, dataBytes.Length);
    // ...
}
There is no way I know of to copy data directly from a stream to a typed array, but you can process the data in chunks, limiting your memory overhead to a fixed amount. The memory will still be copied twice, but this is unavoidable as far as I know.
For example:
public static void ReadArrayDataChunked(BinaryReader binaryReader, Array target, Type type, int bufferSize = 4096)
{
    var buffer = new byte[bufferSize];
    var tSize = Marshal.SizeOf(type);
    var remainingBytes = target.Length * tSize;
    var targetPosition = 0;
    while (remainingBytes > 0)
    {
        var toRead = Math.Min(remainingBytes, buffer.Length);
        var bytesRead = binaryReader.Read(buffer, 0, toRead);
        if (bytesRead == 0)
            throw new EndOfStreamException(); // guard against a truncated stream
        Buffer.BlockCopy(buffer, 0, target, targetPosition, bytesRead);
        targetPosition += bytesRead;
        remainingBytes -= bytesRead;
    }
}
Note that this only works for arrays of primitive types because of the BlockCopy, but the bulk copy is what improves copy performance. You will need to read other types item by item.
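Usage might look like this (a sketch, assuming the double[] scenario from the question; the file name is made up):
using (var stream = File.OpenRead("data.bin"))
using (var reader = new BinaryReader(stream))
{
    var data = new double[10000];
    // fills data 4096 bytes at a time
    ReadArrayDataChunked(reader, data, typeof(double));
}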
I need to combine two bytes into one int value.
I receive a 16-bit image from my camera where two successive bytes hold the intensity value of one pixel. My goal is to combine these two bytes into one int value.
I managed to do this using the following code:
for (int i = 0; i < VectorLength * 2; i = i + 2)
{
    NewImageVector[ImagePointer] = ((int)(buffer.Array[i + 1]) << 8) | ((int)(buffer.Array[i]));
    ImagePointer++;
}
My image is 1280*960, so VectorLength == 1228800 and the incoming buffer holds 2 * 1228800 = 2457600 elements...
Is there any way I can speed this up?
Maybe there is another way, so I don't need to use a for loop.
Thank you
You could use the equivalent of a C union. I'm not sure if it's faster, but it's more elegant:
// requires using System.Runtime.InteropServices;
[StructLayout(LayoutKind.Explicit)]
struct byte_array
{
    [FieldOffset(0)]
    public byte byte1;
    [FieldOffset(1)]
    public byte byte2;
    [FieldOffset(0)]
    public short int0;
}
use it like this:
byte_array ba = new byte_array();
// insert the two bytes
ba.byte1 = (byte)(buffer.Array[i]);
ba.byte2 = (byte)(buffer.Array[i + 1]);
// get the combined value (the short overlays both bytes)
NewImageVector[ImagePointer] = ba.int0;
You can fill the two bytes and read out the combined value. To find out which way is faster, use the Stopwatch class and compare the two approaches like this:
Stopwatch stopWatch = new Stopwatch();
stopWatch.Start();
// The code
stopWatch.Stop();
MessageBox.Show(stopWatch.ElapsedTicks.ToString()); // or milliseconds, ...
Assuming you can (re)define NewImageVector as a short[], and every two consecutive bytes in buffer should be transformed into a short (which is basically what you're doing now, except you cast to an int afterwards), you can use Buffer.BlockCopy to do it for you.
As the documentation tells you, Buffer.BlockCopy copies bytes from one array to another, so to copy your bytes in buffer you need to do the following:
Buffer.BlockCopy(buffer.Array, 0, NewImageVector, 0, [NumberOfExpectedShorts] * 2)
This tells BlockCopy that you want to start copying bytes from buffer.Array, starting at index 0, to NewImageVector starting at index 0, and that you want to copy [NumberOfExpectedShorts] * 2 bytes (since every short is two bytes long).
No loops, but it does depend on being able to use a short[] array instead of an int[] array (and indeed, on using an array to begin with).
Note that this also requires the bytes in buffer to be in little-endian order (i.e., buffer[index] contains the low byte, buffer[index + 1] the high byte).
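Plugged into the numbers from the question (1280 * 960 = 1228800 pixels, two bytes each), a sketch might look like this:
// copy all 2457600 bytes into 1228800 shorts in one bulk operation
short[] NewImageVector = new short[1228800];
Buffer.BlockCopy(buffer.Array, 0, NewImageVector, 0, NewImageVector.Length * 2);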
You can achieve a small performance increase by using unsafe pointers to iterate the arrays. The following code assumes that source is the input byte array (buffer.Array in your case) and that source has an even number of elements; in production code you would obviously have to check these things. It also has to sit in an unsafe context (a method marked unsafe, compiled with /unsafe).
int[] output = new int[source.Length / 2];
fixed (byte* pSource = source)
fixed (int* pDestination = output)
{
    byte* sourceIterator = pSource;
    int* destIterator = pDestination;
    for (int i = 0; i < output.Length; i++)
    {
        (*destIterator) = ((*sourceIterator) | (*(sourceIterator + 1) << 8));
        destIterator++;
        sourceIterator += 2;
    }
}
return output;
I'm looking for a smooth/fast way to retrieve every nth short in a byte array and copy it to a new array.
l = lower byte
u = upper byte
My data is of the following form:
byte[] bytes = {a0l, a0u, b0l, b0u, c0l, ..., n0l, n0u, a1l, a1u, b1l, ..., nXl, nXu}
What I need is to get n byte arrays of length X (e.g., a[0..X], b[0..X], ... or M[a..n][0..X])
I was thinking of the following steps:
convert the values to short (=> short[] shorts = { a0, b0, c0, ... n0, a1, .. nX })
by using something like
short[] shorts = new short[(int)Math.Ceiling(bytes.Length / 2.0)];
Buffer.BlockCopy(bytes, 0, shorts, 0, bytes.Length);
retrieve every second value from shorts
I'm looking for some fast code here... something like BlockCopy with a skip/stride.
I'm completely aware that I could use a loop, but maybe there's a better solution, as I need to do this for 80 MB/s of data...
convert it back to byte arrays (same thing in reverse, using BlockCopy)
byte[] arrayX = new byte[shorts.Length * 2];
Buffer.BlockCopy(shorts, 0, arrayX, 0, arrayX.Length);
Thank you so much!
I think you might just as well copy the bytes directly to the new byte arrays, calculating the correct offset for each short.
Converting it all to a separate short array and back to byte arrays is a waste of time IMO.
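A minimal sketch of that direct approach (the names are made up; channelCount is the number n of interleaved streams, each sample two bytes wide):
// De-interleave without the intermediate short[]: for each of the
// n streams, copy its two bytes per sample straight to the output.
static byte[][] Deinterleave(byte[] bytes, int channelCount)
{
    int samples = bytes.Length / (2 * channelCount);
    var result = new byte[channelCount][];
    for (int c = 0; c < channelCount; c++)
    {
        var channel = new byte[samples * 2];
        for (int s = 0; s < samples; s++)
        {
            int src = (s * channelCount + c) * 2;
            channel[s * 2] = bytes[src];         // low byte
            channel[s * 2 + 1] = bytes[src + 1]; // high byte
        }
        result[c] = channel;
    }
    return result;
}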
I have 3 *.dat files (346KB, 725KB, 1762KB) that are filled with JSON strings of "big" int arrays.
Each time my object is created (several times), I take those three files and use JsonConvert.DeserializeObject to deserialize the arrays into the object.
I thought about using binary files instead of a JSON string, or could I even save these arrays directly? I don't need to use these files; it's just the location where the data is currently saved. I would gladly switch to anything faster.
What are the different ways to speed up the initialization of these objects?
The fastest way is to manually serialize the data.
An easy way to do this is by creating a FileStream and then wrapping it in a BinaryWriter/BinaryReader.
You then have access to functions for writing the basic data structures (numbers, string, char, byte[] and char[]).
An easy way to write an int[] (unnecessary if it's of fixed size) is to prepend the length of the array as either an int or a long (depending on the size; unsigned doesn't really give any advantage, since arrays use signed data types for their length storage), and then write all the ints.
Two ways to write all the ints would be:
1. Simply loop over the entire array.
2. Convert it into a byte[] and write it using BinaryWriter.Write(byte[])
This is how you can implement them both:
// Writing
BinaryWriter writer = new BinaryWriter(new FileStream(...));
int[] intArr = new int[1000];
writer.Write(intArr.Length);
for (int i = 0; i < intArr.Length; i++)
    writer.Write(intArr[i]);

// Reading
BinaryReader reader = new BinaryReader(new FileStream(...));
int[] intArr = new int[reader.ReadInt32()];
for (int i = 0; i < intArr.Length; i++)
    intArr[i] = reader.ReadInt32();
// Writing, method 2
BinaryWriter writer = new BinaryWriter(new FileStream(...));
int[] intArr = new int[1000];
byte[] byteArr = new byte[intArr.Length * sizeof(int)];
Buffer.BlockCopy(intArr, 0, byteArr, 0, intArr.Length * sizeof(int));
writer.Write(intArr.Length);
writer.Write(byteArr);
// Reading, method 2
BinaryReader reader = new BinaryReader(new FileStream(...));
int[] intArr = new int[reader.ReadInt32()];
byte[] byteArr = reader.ReadBytes(intArr.Length * sizeof(int));
Buffer.BlockCopy(byteArr, 0, intArr, 0, byteArr.Length);
I decided to put this all to the test: with an array of 10000 integers, I ran each method 10000 times.
Method 1 consumed on average 888200 ns on my system (about 0.89 ms).
Method 2 consumed on average only 568600 ns on my system (about 0.57 ms).
Both times include the work the garbage collector has to do.
Obviously method 2 is faster than method 1, though possibly less readable.
Another reason why method 1 can be better than method 2 is that method 2 requires double the amount of free RAM relative to the data you're going to write (the original int[] plus the byte[] converted from it), which matters when dealing with limited RAM or extremely large files (talking about 512MB+). If that is the case, you can always make a hybrid solution, for example writing away 128MB at a time, as shown below.
Note that method 1 also requires this extra space, but because it's split into one operation per item of the int[], the memory can be released a lot earlier.
Something like this will write 128MB of an int[] at a time:
const int WRITECOUNT = 32 * 1024 * 1024; // ints per chunk; 32M ints = 128MB
int[] intArr = new int[140 * 1024 * 1024]; // 140M ints = 560MB
for (int i = 0; i < intArr.Length; i++)
    intArr[i] = i;
byte[] byteArr = new byte[WRITECOUNT * sizeof(int)]; // 128MB
int dataDone = 0;
using (Stream fileStream = new FileStream("data.dat", FileMode.Create))
using (BinaryWriter writer = new BinaryWriter(fileStream))
{
    while (dataDone < intArr.Length)
    {
        int dataToWrite = intArr.Length - dataDone;
        if (dataToWrite > WRITECOUNT) dataToWrite = WRITECOUNT;
        // BlockCopy offsets and counts are in bytes, not elements
        Buffer.BlockCopy(intArr, dataDone * sizeof(int), byteArr, 0, dataToWrite * sizeof(int));
        // write only the bytes actually filled for this chunk
        writer.Write(byteArr, 0, dataToWrite * sizeof(int));
        dataDone += dataToWrite;
    }
}
Note that this is just the writing side; reading works a little differently too :P.
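For completeness, a hedged sketch of the reading counterpart (assuming the same data.dat layout: no length prefix and a known element count):
const int READCOUNT = 32 * 1024 * 1024; // ints per chunk, mirrors WRITECOUNT
int[] intArr = new int[140 * 1024 * 1024];
byte[] byteArr = new byte[READCOUNT * sizeof(int)];
int dataDone = 0;
using (Stream fileStream = new FileStream("data.dat", FileMode.Open))
{
    while (dataDone < intArr.Length)
    {
        int dataToRead = Math.Min(intArr.Length - dataDone, READCOUNT);
        int bytesWanted = dataToRead * sizeof(int);
        int got = 0;
        while (got < bytesWanted) // Stream.Read may return fewer bytes than asked
        {
            int n = fileStream.Read(byteArr, got, bytesWanted - got);
            if (n == 0) throw new EndOfStreamException();
            got += n;
        }
        Buffer.BlockCopy(byteArr, 0, intArr, dataDone * sizeof(int), bytesWanted);
        dataDone += dataToRead;
    }
}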
I hope this gives you some more insight into dealing with very large data files :).
If you've just got a bunch of integers, then using JSON will indeed be pretty inefficient in terms of parsing. You can use BinaryReader and BinaryWriter to write binary files efficiently... but it's not clear to me why you need to read the file every time you create an object anyway. Why can't each new object keep a reference to the original array, which has been read once? Or if they need to mutate the data, you could keep one "canonical source" and just copy that array in memory each time you create an object.
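A minimal sketch of that caching idea, assuming the data can be loaded once per process (the class and member names are made up):
using System;

class BigArrayHolder
{
    // Loaded once, lazily and thread-safely; all instances share it.
    private static readonly Lazy<int[]> canonical =
        new Lazy<int[]>(LoadFromDisk);

    private readonly int[] values;

    public BigArrayHolder()
    {
        // Clone only if instances mutate the data;
        // otherwise just keep the shared reference.
        values = (int[])canonical.Value.Clone();
    }

    private static int[] LoadFromDisk()
    {
        // hypothetical: read the binary file as described in the other answers
        return new int[0];
    }
}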
The fastest way to create a byte array from an array of integers is to use Buffer.BlockCopy:
byte[] result = new byte[a.Length * sizeof(int)];
Buffer.BlockCopy(a, 0, result, 0, result.Length);
// write result to FileStream or wherever
If you store the size of the array in the first element, you can use it again to deserialize. Make sure everything fits into memory; looking at your file sizes, it should.
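The write side with that length prefix might look like this (a sketch; a is the int[] from the snippet above, and the data.dat path is made up):
// prefix the payload with the element count so the reader below
// knows how big an int[] to allocate
byte[] payload = new byte[sizeof(int) + a.Length * sizeof(int)];
Buffer.BlockCopy(BitConverter.GetBytes(a.Length), 0, payload, 0, sizeof(int));
Buffer.BlockCopy(a, 0, payload, sizeof(int), a.Length * sizeof(int));
File.WriteAllBytes("data.dat", payload);
Reading it back then looks like this: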
var buffer = File.ReadAllBytes(@"...");
int size = BitConverter.ToInt32(buffer, 0);
var result = new int[size];
// skip the 4-byte length prefix; BlockCopy counts are in bytes
Buffer.BlockCopy(buffer, sizeof(int), result, 0, size * sizeof(int));
Binary is not human-readable, but it is definitely faster than JSON.