How to read a binary array directly from the stream? - C#

Here by "directly" I mean without temporary byte[] array.
The problem is, for example I have array of ints or doubles on the disk, so currently I create two arrays -- byte array and int array (in case of ints). The former is just for reading, the latter is the actual output.
Since Stream can read only to byte array I read it to the first array, than copy all the data to the second. It works, but it really hurts me (I am not talking about performance here).
So, how to read the array without temporary array? Using C# unsafe context is fine for me.
So far I tried two approaches: I looked if it is possible to create an array reusing allocated memory and second which looked more promising -- I could get pointer to the result/second array and in unsafe context I could cast it to byte* pointer. Considering my needs it is 100% safe and valid, however byte* pointer is not a byte[] array in C# world and I cannot find the way to cast pointer to array.
Code:
void ReadStuff(Stream stream, double[] data)
{
    var dataBytes = new byte[data.Length * sizeof(double)];
    stream.Read(dataBytes, 0, dataBytes.Length);
    Buffer.BlockCopy(dataBytes, 0, data, 0, dataBytes.Length);
    // ...
}

There is no way I know of to copy data directly from a stream into a typed array with the classic Stream API. But you can process the data in chunks, limiting your memory overhead to a fixed amount. The memory still gets copied twice, but as far as I know this is unavoidable here.
For example:
public static void ReadArrayDataChunked(BinaryReader binaryReader, Array target, Type type, int bufferSize = 4096)
{
    // Marshal lives in System.Runtime.InteropServices.
    var buffer = new byte[bufferSize];
    var tSize = Marshal.SizeOf(type);
    var remainingBytes = target.Length * tSize;
    var targetPosition = 0;
    while (remainingBytes > 0)
    {
        var toRead = Math.Min(remainingBytes, buffer.Length);
        var bytesRead = binaryReader.Read(buffer, 0, toRead);
        if (bytesRead == 0)
            throw new EndOfStreamException(); // avoid looping forever on a truncated stream
        Buffer.BlockCopy(buffer, 0, target, targetPosition, bytesRead);
        targetPosition += bytesRead;
        remainingBytes -= bytesRead;
    }
}
Note that this only works for primitive element types because of the BlockCopy, but that restriction is also what makes the copy fast. Other element types will need to be read item by item.
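As an aside: on runtimes that provide the Stream.Read(Span&lt;byte&gt;) overload (.NET Core 2.1 and later), the temporary array can be avoided entirely by viewing the target array's memory as bytes. A minimal sketch, assuming such a runtime and a file laid out in native byte order:
using System;
using System.IO;
using System.Runtime.InteropServices;

static void ReadDoubles(Stream stream, double[] data)
{
    // View the double[]'s memory as bytes -- no temporary array, no unsafe code.
    Span<byte> bytes = MemoryMarshal.AsBytes(data.AsSpan());
    while (!bytes.IsEmpty)
    {
        int read = stream.Read(bytes); // reads directly into the double[]'s memory
        if (read == 0)
            throw new EndOfStreamException();
        bytes = bytes.Slice(read);
    }
}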

Related

C# - Stream.Read offset is working incorrectly

I have test code like the one below. I am reading from a stream, offsetting by 2 positions, and then taking the next 2 bytes. I would hope that the result would be an array with 2 elements. This does not work, though: the offset is completely ignored, and a full-sized array is always returned, with only the offset block having values. But this means my result array is still very large; it just has a lot of unwanted zeroes.
How can I rework the code below so that file.Read() returns only an array of 2 bytes instead of 10 when length = 2 and offset = 2? In the real-world scenario I am dealing with large files (>2 GB), so filtering the result array is not an option.
Edit: as the issue is unclear -- the code below requires me to always define an output array that is the same size as the stream. Instead, I would like an output the size of length (in the example below I would like var buffer = new byte[2], but that throws an exception because file.Read seems to ignore offset and length and always returns 10 elements, with only 2 of them actually read and the rest dummy zeroes).
private byte[] GetFilePart(int length, int offset)
{
    // build some dummy content
    var content = new byte[10];
    for (int i = 0; i < 10; i++)
    {
        content[i] = 1;
    }
    // read the data from content
    var buffer = new byte[10];
    using (Stream file = new MemoryStream(content))
    {
        file.Read(buffer, offset, length);
    }
    return buffer;
}
Looks like it's working properly to me; maybe your confusion would clear up a bit if you initialized your content array with something like:
for (int i = 1; i <= 10; i++)
{
    content[i - 1] = (byte)i;
}
Then each byte would have a different value, and the behaviour would be easier to see.
offset relates to where into buffer the Stream will write the bytes (it always reads from the current position of content). It does not relate to which bytes are read out of content.
Imagine Read as being called WriteBytesInto(byte[] whatBuffer, int whereToStartWriting, int howManyBytesToWrite): you provide the buffer it will write into and tell it where to start and how many bytes to write.
If you did this, having initialized content with incrementing numbers:
file.Read(buffer, 2, 3); // read 3 bytes from stream and write to buffer at index 2
file.Read(buffer, 0, 2); // read 2 bytes from stream and write to buffer at index 0
Your buffer would end up looking like:
4,5,1,2,3,0,0,0,0,0
The 1,2,3 having been written first, then the 4,5 written next
If you want to skip two bytes of the stream (i.e. read the 3rd and 4th bytes from content), Seek() the stream or set its Position (or, as canton7 advises in the comments, if the stream is not seekable, read and discard some bytes).
How can I rework below code, so that file.Read() returns only an array of 2 bytes instead of 10 when length = 2 and offset = 2?
Well, file.Read doesn't return an array at all; it modifies an array you give it. If you want a 2-byte array, give it a 2-byte array:
byte[] buf = new byte[2];
file.Read(buf, 0, buf.Length);
If you want to open a file, skip the first 7 bytes, and then read the 8th and 9th bytes into your 2-byte array:
byte[] buf = new byte[2];
file.Position = 7; // absolute skip to the 8th byte
file.Read(buf, 0, buf.Length);
For more on seeking in streams see Stream.Seek(0, SeekOrigin.Begin) or Position = 0
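Putting that together, a minimal rework of the original method that returns exactly length bytes (a sketch; argument validation and checking Read's return value are omitted for brevity):
private byte[] GetFilePart(int length, int offset)
{
    // build some dummy content: 1, 2, 3, ..., 10
    var content = new byte[10];
    for (int i = 0; i < 10; i++)
    {
        content[i] = (byte)(i + 1);
    }

    var buffer = new byte[length];       // output is only as large as requested
    using (Stream file = new MemoryStream(content))
    {
        file.Position = offset;          // skip 'offset' bytes of the *stream*
        file.Read(buffer, 0, length);    // write starting at index 0 of 'buffer'
    }
    return buffer;
}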

How to Convert Primitive[] to byte[]

For serialization of a primitive array, I'm wondering how to convert a Primitive[] to its corresponding byte[] (i.e. an int[128] to a byte[512], or a ushort[] to a byte[]...).
The destination can be a MemoryStream, a network message, a file, anything.
The goal is performance (serialization & deserialization time): writing a byte[] to a stream in one shot instead of looping through all the values or allocating with some converter.
Some solutions already explored:
Regular Loop to write/read
// array = any int[];
myStreamWriter.WriteInt32(array.Length);
for (int i = 0; i < array.Length; ++i)
    myStreamWriter.WriteInt32(array[i]);
This solution works for serialization and deserialization, and is something like 100 times faster than using standard System.Runtime.Serialization combined with a BinaryFormatter to serialize/deserialize a single int, or a couple of them.
But this solution becomes slow once array.Length exceeds roughly 200-300 values (for Int32).
Cast?
It seems C# can't directly cast an int[] to a byte[], or a bool[] to a byte[].
BitConverter.GetBytes()
This solution works, but it allocates a new byte[] on every iteration of the loop through my int[]. Performance is, of course, horrible.
Marshal.Copy
Yup, this solution works too, but it has the same problem as BitConverter.
C++ hack
Because a direct cast is not allowed in C#, I tried a C++ hack after seeing in memory that the array length is stored 4 bytes before the array data starts:
ARRAYCAST_API void Cast(int* input, unsigned char** output)
{
    // get the address of the input (this is a pointer to the data)
    int* count = input;
    // the size of the buffer is located just before the data (4 bytes before, as this is an int)
    count--;
    // multiply the number of elements by 4, as an int is 4 bytes
    *count = *count * 4;
    // set the address of the byte array
    *output = (unsigned char*)(input);
}
and the C# that calls it:
byte[] arrayB = null;
int[] arrayI = new int[128];
for (int i = 0; i < 128; ++i)
    arrayI[i] = i;
// delegate call
fptr(arrayI, out arrayB);
I successfully retrieve my int[128] in C++, switch the array length, and assign the right address to my output variable, but C# only sees a byte[1] on return. It seems that I can't hack a managed variable that easily.
So I am really starting to think that all these casts I want to achieve (int[] -> byte[], bool[] -> byte[], double[] -> byte[]...) are just impossible in C# without allocating/copying...
What am I missing?
How about using Buffer.BlockCopy?
// serialize
var intArray = new[] { 1, 2, 3, 4, 5, 6, 7, 8 };
var byteArray = new byte[intArray.Length * 4];
Buffer.BlockCopy(intArray, 0, byteArray, 0, byteArray.Length);
// deserialize and test
var intArray2 = new int[byteArray.Length / 4];
Buffer.BlockCopy(byteArray, 0, intArray2, 0, byteArray.Length);
Console.WriteLine(intArray.SequenceEqual(intArray2)); // true
Note that BlockCopy is still allocating/copying behind the scenes. I'm fairly sure that this is unavoidable in managed code, and BlockCopy is probably about as good as it gets for this.
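As a side note: on newer runtimes (.NET Core 2.1+, or older targets with the System.Memory package), MemoryMarshal.AsBytes can view the array's memory as bytes with no allocation or copy at all, provided a Span&lt;byte&gt; is acceptable to the consumer. A minimal sketch; note that, like BlockCopy, this writes the machine's native byte order:
using System;
using System.IO;
using System.Runtime.InteropServices;

static void WriteInts(Stream stream, int[] values)
{
    // Reinterpret the int[]'s memory as bytes -- no allocation, no copy.
    Span<byte> bytes = MemoryMarshal.AsBytes(values.AsSpan());
    stream.Write(bytes); // Stream.Write(ReadOnlySpan<byte>) overload, .NET Core 2.1+
}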

Split up a MemoryStream into byte arrays

I'm trying to split up a MemoryStream into chunks by reading parts into a byte array, but I think I have got something fundamentally wrong. I can read the first chunk, but when I try to read the rest of the MemoryStream I get an index-out-of-bounds error, even though there are more bytes to read. It seems the receiving byte buffer needs to be as large as the MemoryStream itself. I need to split it into chunks because the code runs in a web service.
Does anyone know what's wrong with this code? fb.Buffer is a MemoryStream.
long bytesLeft = fb.Buffer.Length;
fb.Buffer.Position = 0;
int offset = 0;
int BUFF_SIZE = 8196;
while (bytesLeft > 0)
{
    byte[] fs = new byte[BUFF_SIZE];
    fb.Buffer.Read(fs, offset, BUFF_SIZE);
    offset += BUFF_SIZE;
    bytesLeft -= BUFF_SIZE;
}
offset here is the offset into the array; it should be zero here. You should also be looking at the return value of Read: it is not guaranteed to fill the buffer, even if more data is available.
However, since this is a MemoryStream, a better option might be ArraySegment<byte>, which requires no duplication of data.
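A minimal sketch of that ArraySegment<byte> idea, assuming the MemoryStream was created with its buffer exposable (e.g. via the parameterless constructor, or publiclyVisible: true; otherwise TryGetBuffer returns false):
using System;
using System.Collections.Generic;
using System.IO;

static List<ArraySegment<byte>> Split(MemoryStream ms, int chunkSize)
{
    var segments = new List<ArraySegment<byte>>();
    if (ms.TryGetBuffer(out ArraySegment<byte> whole))
    {
        for (int pos = 0; pos < whole.Count; pos += chunkSize)
        {
            int len = Math.Min(chunkSize, whole.Count - pos);
            // Each segment is a view over the same backing array -- nothing is copied.
            segments.Add(new ArraySegment<byte>(whole.Array, whole.Offset + pos, len));
        }
    }
    return segments;
}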
Please look at the documentation for Stream.Read on MSDN. At a glance: you shouldn't be incrementing offset; it should always be zero here, unless you happen to know the exact length of the stream in advance (in which case you would create the array with that exact size).
You should also always capture the number of bytes read from Read (the return value).
If you're looking to split it into chunks, do you mean you want n 8K chunks? Then you might want to do something like this:
List<byte[]> chunks = new List<byte[]>();
byte[] chunk = new byte[BUFF_SIZE];
int bytesRead = fb.Buffer.Read(chunk, 0, BUFF_SIZE);
while (bytesRead > 0)
{
    if (bytesRead != BUFF_SIZE)
    {
        // short read (usually the tail of the stream): trim to the bytes actually read
        byte[] tail = new byte[bytesRead];
        Array.Copy(chunk, tail, bytesRead);
        chunks.Add(tail);
    }
    else
    {
        chunks.Add(chunk);
        chunk = new byte[BUFF_SIZE]; // fresh buffer, so the stored chunk isn't overwritten
    }
    bytesRead = fb.Buffer.Read(chunk, 0, BUFF_SIZE);
}
Note in particular that the last chunk is more than likely not going to be exactly BUFF_SIZE in length.

Better/faster way to fill a big array in C#

I have 3 *.dat files (346 KB, 725 KB, 1762 KB) that are filled with a JSON string of "big" int arrays.
Each time my object is created (which happens several times), I take those three files and use JsonConvert.DeserializeObject to deserialize the arrays into the object.
I thought about using binary files instead of a JSON string, or could I even save these arrays directly? I don't need to keep these files; it's just where the data is currently saved. I would gladly switch to anything faster.
What are the different ways to speed up the initialization of these objects?
The fastest way is to manually serialize the data.
An easy way to do this is by creating a FileStream, and then wrapping it in a BinaryWriter/BinaryReader.
This gives you access to functions for writing the basic data types (numbers, string, char, byte[] and char[]).
An easy way to write an int[] (unnecessary if it's a fixed size) is to prepend the length of the array as an int or long (depending on the size; unsigned doesn't really give any advantage, since arrays use signed data types for their length anyway), and then write all the ints.
Two ways to write all the ints would be:
1. Simply loop over the entire array.
2. Convert it into a byte[] and write it using BinaryWriter.Write(byte[])
This is how you can implement them both:
// Writing
BinaryWriter writer = new BinaryWriter(new FileStream(...));
int[] intArr = new int[1000];
writer.Write(intArr.Length);
for (int i = 0; i < intArr.Length; i++)
    writer.Write(intArr[i]);

// Reading
BinaryReader reader = new BinaryReader(new FileStream(...));
int[] intArr = new int[reader.ReadInt32()];
for (int i = 0; i < intArr.Length; i++)
    intArr[i] = reader.ReadInt32();

// Writing, method 2
BinaryWriter writer = new BinaryWriter(new FileStream(...));
int[] intArr = new int[1000];
byte[] byteArr = new byte[intArr.Length * sizeof(int)];
Buffer.BlockCopy(intArr, 0, byteArr, 0, intArr.Length * sizeof(int));
writer.Write(intArr.Length);
writer.Write(byteArr);

// Reading, method 2
BinaryReader reader = new BinaryReader(new FileStream(...));
int[] intArr = new int[reader.ReadInt32()];
byte[] byteArr = reader.ReadBytes(intArr.Length * sizeof(int));
Buffer.BlockCopy(byteArr, 0, intArr, 0, byteArr.Length);
I decided to put this all to the test: with an array of 10000 integers, I ran each test 10000 times.
Method 1 consumed on average 888,200 ns on my system (about 0.89 ms).
Method 2 consumed on average only 568,600 ns on my system (about 0.57 ms).
Both times include the work the garbage collector has to do.
Obviously method 2 is faster than method 1, though possibly less readable.
Another reason method 1 can be better than method 2 is that method 2 requires twice the amount of free RAM as the data you're going to write (the original int[] plus the byte[] converted from it), which matters when dealing with limited RAM or extremely large files (talking about 512 MB+). In that case you can always make a hybrid solution, for example writing 128 MB at a time.
Note that method 1 also requires this extra space, but because the work is split into one operation per item of the int[], the memory can be released a lot earlier.
Something like this, will write 128MB of an int[] at a time:
const int WRITECOUNT = 32 * 1024 * 1024; // 32 * sizeof(int)MB
int[] intArr = new int[140 * 1024 * 1024]; // 140 * sizeof(int)MB
for (int i = 0; i < intArr.Length; i++)
intArr[i] = i;
byte[] byteArr = new byte[WRITECOUNT * sizeof(int)]; // 128MB
int dataDone = 0;
using (Stream fileStream = new FileStream("data.dat", FileMode.Create))
using (BinaryWriter writer = new BinaryWriter(fileStream))
{
while (dataDone < intArr.Length)
{
int dataToWrite = intArr.Length - dataDone;
if (dataToWrite > WRITECOUNT) dataToWrite = WRITECOUNT;
Buffer.BlockCopy(intArr, dataDone, byteArr, 0, dataToWrite * sizeof(int));
writer.Write(byteArr);
dataDone += dataToWrite;
}
}
Note that this is just for writing; reading back works analogously (read chunks into the byte[], then BlockCopy them into the int[]).
I hope this gives you some more insight into dealing with very large data files :).
If you've just got a bunch of integers, then using JSON will indeed be pretty inefficient in terms of parsing. You can use BinaryReader and BinaryWriter to write binary files efficiently... but it's not clear to me why you need to read the file every time you create an object anyway. Why can't each new object keep a reference to the original array, which has been read once? Or if they need to mutate the data, you could keep one "canonical source" and just copy that array in memory each time you create an object.
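A minimal sketch of that "canonical source" idea; the class and file names here are hypothetical:
using System;
using System.IO;

// Hypothetical cache: deserialize the file once, then hand each new object
// its own copy of the cached array.
static class DataCache
{
    private static readonly int[] canonical = Load();

    private static int[] Load()
    {
        using (var reader = new BinaryReader(File.OpenRead("data.dat")))
        {
            var arr = new int[reader.ReadInt32()];
            var bytes = reader.ReadBytes(arr.Length * sizeof(int));
            Buffer.BlockCopy(bytes, 0, arr, 0, bytes.Length);
            return arr;
        }
    }

    // Each caller gets its own mutable copy; skip the Clone if callers only read.
    public static int[] GetCopy() => (int[])canonical.Clone();
}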
The fastest way to create a byte array from an array of integers is to use Buffer.BlockCopy
byte[] result = new byte[a.Length * sizeof(int)];
Buffer.BlockCopy(a, 0, result, 0, result.Length);
// write result to FileStream or wherever
If you store the size of the array at the start of the buffer (e.g. write a.Length first), you can use it to deserialize. Make sure everything fits into memory, but looking at your file sizes it should.
var buffer = File.ReadAllBytes(@"...");
int size = BitConverter.ToInt32(buffer, 0);
var result = new int[size];
Buffer.BlockCopy(buffer, sizeof(int), result, 0, size * sizeof(int));
Binary is not human-readable, but it is definitely faster than JSON.

Retrieve data from const int * const buffer[]

I marshalled correctly to:
IntPtr buffer
buffer is a pointer to an array of 2 pointers, each pointing to an array with the respective data.
The problem is that the data I get is not accurate, as if something were missing from the retrieved data (e.g. missing samples from a stream of audio data).
// length is parameter
IntPtr[] temp = new IntPtr[2];
Marshal.Copy(buffer, temp, 0, 2);
bufferedData = new byte[bufferSize];
byte[] a = new byte[length];
byte[] b = new byte[length];
Marshal.Copy(temp[0], a, 0, length);
Marshal.Copy(temp[1], b, 0, length);
edit: sorry I forgot to write those 2 lines :)
I finally solved it. I wasn't reading full input buffer by mistake. Thanks for all your help!
Yes, you would need to copy the byte buffers too :)
Update: That looks better!
If buffer is a pointer to an array, you would need to read the pointer once more.
Effectively:
buffer = Marshal.ReadIntPtr(buffer);
I don't know anything about C#, so this is a complete guess, but you seem to be copying from ints to bytes. Is length the count in ints, or the count in bytes? Could there be a mix-up there? This can be a problem in regular old C++ sometimes, too.
