Does converting between byte[] and MemoryStream cause overhead? - c#

I want to know if there's overhead when converting between byte arrays and Streams (specifically MemoryStreams, using MemoryStream.ToArray() and new MemoryStream(byte[])). I assume it temporarily doubles memory usage.
For example, I read as a stream, convert to bytes, and then convert to stream again.
But getting rid of that byte conversion will require a bit of a rewrite. I don't want to waste time rewriting it if it doesn't make a difference.

So, yes, you are correct in assuming that ToArray duplicates the memory in the stream.
If you do not want to do this (for efficiency reasons), you could modify the bytes directly in the stream. Take a look at this:
// create some bytes: 0,1,2,3,4,5,6,7...
var originalBytes = Enumerable.Range(0, 256).Select(Convert.ToByte).ToArray();

// ms references the originalBytes array, it does not duplicate it.
// Note: publiclyVisible: true is required here, otherwise GetBuffer throws
// (and TryGetBuffer returns false) for a stream wrapping a caller-supplied array.
using (var ms = new MemoryStream(originalBytes, 0, originalBytes.Length, writable: true, publiclyVisible: true))
{
    // var duplicatedBytes = ms.ToArray(); // copy of originalBytes array

    // If you don't want to duplicate the bytes but want to
    // modify the buffer directly, you could do this:
    var bufRef = ms.GetBuffer();
    for (var i = 0; i < bufRef.Length; ++i)
    {
        bufRef[i] = Convert.ToByte(bufRef[i] ^ 0x55);
    }

    // or this:
    /*
    ms.TryGetBuffer(out var buf);
    for (var i = 0; i < buf.Count; ++i)
    {
        buf.Array[buf.Offset + i] = Convert.ToByte(buf.Array[buf.Offset + i] ^ 0x55);
    }
    */

    // or this:
    /*
    for (var i = 0; i < ms.Length; ++i)
    {
        ms.Position = i;
        var b = ms.ReadByte();
        ms.Position = i;
        ms.WriteByte(Convert.ToByte(b ^ 0x55));
    }
    */
}
// originalBytes will now be 85,84,87,86...
ETA:
Edited to add in Blindy's examples. Thanks! I'd totally forgotten about GetBuffer and had no idea about TryGetBuffer.

Does MemoryStream(byte[]) cause a memory copy?
No, it's a non-resizable stream, and as such no copy is necessary.
Does MemoryStream.ToArray() cause a memory copy?
Yes, by design it creates a copy of the active buffer. This is to cover the resizable case, where the buffer used by the stream is not the same buffer that was initially provided due to reallocations to increase/decrease its size.
Alternatives to MemoryStream.ToArray() that don't cause a memory copy?
Sure, you have MemoryStream.TryGetBuffer(out ArraySegment<byte> buffer), which returns a segment pointing to the internal buffer, whether or not it's resizable. If it's non-resizable, it's a segment into your original array.
You also have MemoryStream.GetBuffer, which returns the entire internal buffer. Note that in the resizable case, this will be a lot larger than the actual used stream space, and you'll have to adjust for that in code.
Note that both of these require the buffer to be exposable: a stream created over a caller-supplied byte[] only exposes it if you used the constructor overload with publiclyVisible: true (otherwise TryGetBuffer returns false and GetBuffer throws).
And lastly, you don't always actually need a byte array, sometimes you just need to write it to another stream (a file, a socket, a compression stream, an Http response, etc). For this, you have MemoryStream.CopyTo[Async], which also doesn't perform any copies.
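As an illustration of that last point, a minimal sketch (the file name is hypothetical) of forwarding a MemoryStream's contents without ever materializing a byte[]:

using System.IO;

// Sketch: forward the stream's contents to another stream instead of
// calling ToArray(). "output.bin" is a hypothetical destination.
using (var ms = new MemoryStream())
using (var file = File.Create("output.bin"))
{
    // ... fill ms here ...
    ms.Position = 0;  // rewind before copying
    ms.CopyTo(file);  // streams the bytes across without building a second full byte[]
}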

Related

Copying pointer data to byte array and writing to MemoryStream results in tons of allocations in LOH

I have a WPF app using the ffmpeg library, with a video recording preview done via SDL2. The SDL pixel format used is PixelFormat_UYVY, so every frame is converted to YUV420P.
The conversion is done with:
MemoryStream ms = null;
using (ms = new MemoryStream())
{
    int shift = 0;
    byte* yuv_factor;
    for (uint i = 0; i < 3; i++)
    {
        shift = (i == 0 ? 0 : 1);
        yuv_factor = frame->data[i];
        for (int j = 0; j < (frame->height >> shift); j++)
        {
            byte[] frameData = new byte[frame->width >> shift];
            Marshal.Copy((IntPtr)yuv_factor, frameData, 0, frameData.Length);
            yuv_factor += frame->linesize[i];
            ms.Write(frameData, 0, frameData.Length);
        }
    }
}
return ms.ToArray();
Then this byte[] is simply cast to an IntPtr and passed to SDL.
sdl.Preview((IntPtr)data);
The problem is that I can see tons of GC pressure and a lot of System.Byte[] allocations in the LOH. Is there a way to fix that?
I would suggest you begin by analyzing the number of allocations in your code:
- The MemoryStream is essentially backed by an expanding byte array under the hood (see here). There is also a recyclable version of it (Microsoft.IO.RecyclableMemoryStream); that might help as well.
- As already suggested in the comments, renting an array from the ArrayPool might be an easy way to reduce memory very quickly, especially in the case of the frameData array (see the sketch after this list).
- The ToArray() call at the end of your method also creates a new array (look at the implementation using the above link).
I would try to target these three spots first and then reevaluate.
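As a sketch of the ArrayPool suggestion: the helper below is hypothetical, and rowLength stands in for frame->width >> shift from the snippet above.

using System;
using System.Buffers;
using System.IO;
using System.Runtime.InteropServices;

static class FramePacking // hypothetical helper, not from the original post
{
    // Copies one row of pixel data into the stream through a pooled buffer,
    // so no per-row byte[] lands on the managed heap (or the LOH).
    public static void WriteRow(Stream ms, IntPtr source, int rowLength)
    {
        byte[] buffer = ArrayPool<byte>.Shared.Rent(rowLength);
        try
        {
            Marshal.Copy(source, buffer, 0, rowLength);
            ms.Write(buffer, 0, rowLength); // Rent may return a larger array; write only rowLength bytes
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(buffer);
        }
    }
}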

How to read file bytes from byte offset?

If I am given a .cmp file and a byte offset 0x598, how can I read a file from this offset?
I can of course read the file's bytes like this:
byte[] fileBytes = File.ReadAllBytes("upgradefile.cmp");
But how can I read it starting from byte offset 0x598?
To explain a bit more: the actual data that I have to read starts at this offset, and everything before it is just header data, so basically I have to read the file from that offset to the end.
Try code like this:
using (BinaryReader reader = new BinaryReader(File.Open("upgradefile.cmp", FileMode.Open)))
{
    long offset = 0x598;
    if (reader.BaseStream.Length > offset)
    {
        reader.BaseStream.Seek(offset, SeekOrigin.Begin);
        byte[] fileBytes = reader.ReadBytes((int)(reader.BaseStream.Length - offset));
    }
}
If you are not familiar with Streams, LINQ, or whatever, I have the simplest solution for you:
Read the entire file into memory (I hope you are dealing with small files):
byte[] fileBytes = File.ReadAllBytes("upgradefile.cmp");
Calculate how many bytes are present in the array after the given offset:
long startOffset = 0x598; // hexadecimal for human readability; it could be decimal or whatever
long howManyBytesToRead = fileBytes.Length - startOffset;
Then just copy the data to a new array:
byte[] newArray = new byte[howManyBytesToRead];
long pos = 0;
for (long i = startOffset; i < fileBytes.Length; i++) // long, because startOffset is a long
{
    newArray[pos] = fileBytes[i];
    pos = pos + 1;
}
If you understand how this works, you can look at the Array.Copy method in the Microsoft documentation.
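For reference, the copy loop above collapses into a single Array.Copy call (same variables as above):

byte[] newArray = new byte[howManyBytesToRead];
Array.Copy(fileBytes, startOffset, newArray, 0, howManyBytesToRead);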
By not using ReadAllBytes.
Get a stream, move to the position, read the rest of the file.
You are basically complaining that a convenience method, made to allow a one-line read of a whole file, is not what you want, ignoring that it is just that: a convenience method. The normal way to deal with files is to open them and use a Stream.
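A minimal sketch of that stream-based approach; the read loop guards against short reads, which the next question covers in detail:

using (FileStream fs = File.OpenRead("upgradefile.cmp"))
{
    const long offset = 0x598;
    fs.Seek(offset, SeekOrigin.Begin);

    byte[] data = new byte[fs.Length - offset];
    int total = 0;
    while (total < data.Length)
    {
        int read = fs.Read(data, total, data.Length - total);
        if (read == 0) break; // end of stream
        total += read;
    }
}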

Failing to read it all into memory when a read returns a short count

My textbook says:
Read receives a block of data from the stream into an array. It returns the number of bytes received, which is always either less than or equal to the count argument. If it's less than count, it means either that the end of the stream has been reached or the stream is giving you the data in smaller chunks (as is often the case with network streams). In either case, the balance of bytes in the array will remain unwritten, their previous values preserved.
With Read, you can be certain you've reached the end of the stream only when the method returns 0. So, if you have a 1,000-byte stream, the following code may fail to read it all into memory:
// Assuming s is a stream:
byte[] data = new byte [1000];
s.Read (data, 0, data.Length);
The Read method could read anywhere from 1 to 1,000 bytes,
leaving the balance of the stream unread.
I'm confused, so I wrote a simple console application to verify:
// read.txt only contains 3 chars: abc
using (FileStream s = new FileStream("read.txt", FileMode.Open))
{
    byte[] data = new byte[5];
    int num = s.Read(data, 0, data.Length);
    foreach (byte b in data)
    {
        Console.WriteLine(b);
    }
}
and the output is:
97
98
99
0
0
So the data array has been written to. Why does the textbook say "the balance of bytes in the array will remain unwritten" and that it "may fail to read it all into memory"?
EDIT: Please disregard my console application above; the real question I have is this:
A file has 1,000 bytes and I'm requesting to read all 1,000 bytes:
byte[] data = new byte [1000];
s.Read (data, 0, data.Length);
Why may it fail to read it all into memory? The textbook also provides a correct method to read all 1,000 bytes:
byte[] data = new byte[1000];
int bytesRead = 0;
int chunkSize = 1;
while (bytesRead < data.Length && chunkSize > 0)
    bytesRead += chunkSize = s.Read(data, bytesRead, data.Length - bytesRead);
With regard to:
byte[] data = new byte[5];
int num = s.Read(data, 0, data.Length);
foreach (byte b in data)
that foreach will process every single byte in data (and there are always five of those, regardless of how many were written to by the Read() call).
The number of bytes that were actually read by the Read() call is stored in the num variable, so that's what you should use to process your data, something like (untested, but probably simple enough not to warrant it):
for (int i = 0; i < num; i++) {
    Console.WriteLine(data[i]);
}
As per the text you quote, everything after the bytes that were read can be any arbitrary value because the Read() call did not change them. In your case, they were obviously zero before the Read() and so remained so.
If you were to run that foreach loop before the Read() as well as after, you should see only the first three bytes change (assuming the content of the file is different to what was in memory, of course).
If it’s less than count, it means either that the end of the stream has been reached or the stream is giving you the data in smaller chunks (as is often the case with network streams). In either case, the balance of bytes in the array will remain unwritten, their previous values preserved.
When the text says "In either case", it refers back to "If it's less than count" (it being the return value, i.e. the number of bytes read from the stream). In this case, the bytes read don't fill the whole buffer area you designated for the read. The rest of the buffer area is, of course, untouched ("remain unwritten, their previous values preserved").
In your case, it is that the end of the stream has been reached because you only have 3 bytes in your file stream. The trailing zeros in your array are the untouched buffer elements.
So, if you have a 1,000-byte stream, the following code may fail to read it all into memory.
The text is talking about a 1,000-byte stream, while your stream contains just 3 bytes. For a short file stream that has only 3 bytes, it is likely that all the data will be loaded into memory at once, which is the case you observed (just "likely", not "guaranteed"). For a longer stream (or other types of stream, e.g. NetworkStream), you are not guaranteed to read enough bytes to fill your designated buffer in one call (you may even get only 1 byte at a time!).
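To make the textbook's loop reusable, here is a sketch of a helper that keeps calling Read until the buffer is full or the stream ends (newer runtimes also ship Stream.ReadExactly, added in .NET 7, for the same purpose):

static int ReadFully(Stream s, byte[] buffer)
{
    int total = 0;
    while (total < buffer.Length)
    {
        int read = s.Read(buffer, total, buffer.Length - total);
        if (read == 0) break; // end of stream before the buffer was filled
        total += read;
    }
    return total; // may be less than buffer.Length for a short stream
}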

Split up a MemoryStream into byte arrays

I'm trying to split up a MemoryStream into chunks by reading parts into a byte array, but I think I have got something fundamentally wrong. I can read the first chunk, but when I try to read the rest of the MemoryStream I get an index out of bounds, even though there are more bytes to read. It seems the issue is that the receiving byte buffer needs to be as large as the MemoryStream. I need to split it into chunks because the code runs in a web service.
Does anyone know what's wrong with this code?
fb.Buffer is a MemoryStream.
long bytesLeft = fb.Buffer.Length;
fb.Buffer.Position = 0;
int offset = 0;
int BUFF_SIZE = 8196;
while (bytesLeft > 0)
{
    byte[] fs = new byte[BUFF_SIZE];
    fb.Buffer.Read(fs, offset, BUFF_SIZE);
    offset += BUFF_SIZE;
    bytesLeft -= BUFF_SIZE;
}
offset here is the offset into the array; it should be zero. You should also be looking at the return value from Read: it is not guaranteed to fill the buffer, even if more data is available.
However, if this is a MemoryStream, a better option might be ArraySegment<byte>, which requires no duplication of data; see the sketch below.
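A sketch of that idea: slice the stream's internal buffer into ArraySegment<byte> views without copying any data. Note that TryGetBuffer only succeeds when the buffer is exposable (e.g. a stream created with new MemoryStream(), or with publiclyVisible: true):

// fb.Buffer and BUFF_SIZE are the names from the question.
var segments = new List<ArraySegment<byte>>();
if (fb.Buffer.TryGetBuffer(out ArraySegment<byte> whole))
{
    for (int pos = 0; pos < whole.Count; pos += BUFF_SIZE)
    {
        int len = Math.Min(BUFF_SIZE, whole.Count - pos);
        segments.Add(new ArraySegment<byte>(whole.Array, whole.Offset + pos, len));
    }
}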
Please look at the documentation for Stream.Read on MSDN. At a glance: you shouldn't be incrementing offset; it should always be zero, unless, of course, you happen to know the exact length of the stream in advance (in which case you would create the array at exactly that size).
You should also always grab the number of bytes read from Read (the return value).
If you're looking to split it into "chunks", do you mean you want n 8K chunks? Then you might want to do something like this:
List<byte[]> chunks = new List<byte[]>();
byte[] chunk = new byte[BUFF_SIZE];
int bytesRead = fb.Buffer.Read(chunk, 0, BUFF_SIZE);
while (bytesRead > 0)
{
    if (bytesRead != BUFF_SIZE)
    {
        // short read: keep only the bytes actually read
        byte[] tail = new byte[bytesRead];
        Array.Copy(chunk, tail, bytesRead);
        chunks.Add(tail);
    }
    else
    {
        chunks.Add(chunk);
        // allocate a fresh buffer so the next Read doesn't
        // overwrite the chunk just stored in the list
        chunk = new byte[BUFF_SIZE];
    }
    bytesRead = fb.Buffer.Read(chunk, 0, BUFF_SIZE);
}
Note in particular that the last chunk is more than likely not going to be exactly BUFF_SIZE in length.

Better/faster way to fill a big array in C#

I have 3 *.dat files (346 KB, 725 KB, 1762 KB) that are filled with a JSON string of "big" int arrays.
Each time my object is created (several times), I take those three files and use JsonConvert.DeserializeObject to deserialize the arrays into the object.
I thought about using binary files instead of a JSON string, or could I even save these arrays directly? I don't need to use these files; it's just the location the data is currently saved in. I would gladly switch to anything faster.
What are the different ways to speed up the initialization of these objects?
The fastest way is to manually serialize the data.
An easy way to do this is by creating a FileStream, and then wrapping it in a BinaryWriter/BinaryReader.
You have access to functions to write the basic data structures (numbers, string, char, byte[] and char[]).
An easy way to write an int[] (unnecessary if it's a fixed size) is to prepend the length of the array as an int/long (depending on the size; unsigned doesn't really give any advantage, since arrays use signed data types for their length). Then write all the ints.
Two ways to write all the ints would be:
1. Simply loop over the entire array.
2. Convert it into a byte[] and write it using BinaryWriter.Write(byte[]).
Here is how you can implement them both:
// Writing
BinaryWriter writer = new BinaryWriter(new FileStream(...));
int[] intArr = new int[1000];
writer.Write(intArr.Length);
for (int i = 0; i < intArr.Length; i++)
    writer.Write(intArr[i]);

// Reading
BinaryReader reader = new BinaryReader(new FileStream(...));
int[] intArr = new int[reader.ReadInt32()];
for (int i = 0; i < intArr.Length; i++)
    intArr[i] = reader.ReadInt32();

// Writing, method 2
BinaryWriter writer = new BinaryWriter(new FileStream(...));
int[] intArr = new int[1000];
byte[] byteArr = new byte[intArr.Length * sizeof(int)];
Buffer.BlockCopy(intArr, 0, byteArr, 0, intArr.Length * sizeof(int));
writer.Write(intArr.Length);
writer.Write(byteArr);

// Reading, method 2
BinaryReader reader = new BinaryReader(new FileStream(...));
int[] intArr = new int[reader.ReadInt32()];
byte[] byteArr = reader.ReadBytes(intArr.Length * sizeof(int));
Buffer.BlockCopy(byteArr, 0, intArr, 0, byteArr.Length);
I decided to put this all to the test, with an array of 10,000 integers, and ran the test 10,000 times.
Method one consumed on average 888,200 ns on my system (about 0.89 ms).
Method 2 consumed on average only 568,600 ns on my system (about 0.57 ms).
Both times include the work the garbage collector has to do.
Obviously method 2 is faster than method 1, though possibly less readable.
Another reason method 1 can be better than method 2 is that method 2 requires twice as much free RAM as the data you're going to write (the original int[] plus the byte[] converted from it), which matters when dealing with limited RAM or extremely large files (talking about 512 MB+). If this is the case, you can always make a hybrid solution, for example by writing away 128 MB at a time.
Note that method 1 also requires this extra space, but because it is split into one operation per item of the int[], the memory can be released a lot earlier.
Something like this will write 128 MB of an int[] at a time:
const int WRITECOUNT = 32 * 1024 * 1024;   // 32M ints = 128MB per write
int[] intArr = new int[140 * 1024 * 1024]; // 140M ints = 560MB of data
for (int i = 0; i < intArr.Length; i++)
    intArr[i] = i;
byte[] byteArr = new byte[WRITECOUNT * sizeof(int)]; // 128MB staging buffer
int dataDone = 0;
using (Stream fileStream = new FileStream("data.dat", FileMode.Create))
using (BinaryWriter writer = new BinaryWriter(fileStream))
{
    while (dataDone < intArr.Length)
    {
        int dataToWrite = intArr.Length - dataDone;
        if (dataToWrite > WRITECOUNT) dataToWrite = WRITECOUNT;
        // BlockCopy offsets count bytes, so the source offset must be scaled by sizeof(int)
        Buffer.BlockCopy(intArr, dataDone * sizeof(int), byteArr, 0, dataToWrite * sizeof(int));
        // write only the bytes that were staged, not the whole buffer
        writer.Write(byteArr, 0, dataToWrite * sizeof(int));
        dataDone += dataToWrite;
    }
}
Note that this is just for writing; reading works differently too :P.
I hope this gives you some more insight into dealing with very large data files :).
If you've just got a bunch of integers, then using JSON will indeed be pretty inefficient in terms of parsing. You can use BinaryReader and BinaryWriter to write binary files efficiently... but it's not clear to me why you need to read the file every time you create an object anyway. Why can't each new object keep a reference to the original array, which has been read once? Or if they need to mutate the data, you could keep one "canonical source" and just copy that array in memory each time you create an object.
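A sketch of that "canonical source" idea (class and file names are illustrative, not from the question): deserialize the JSON once into a static array, and give each object its own in-memory copy only if it needs to mutate the data.

using System.IO;
using Newtonsoft.Json;

class LookupData // hypothetical name
{
    // Parsed exactly once per process; "data1.dat" is a placeholder path.
    private static readonly int[] Canonical =
        JsonConvert.DeserializeObject<int[]>(File.ReadAllText("data1.dat"));

    // Each instance gets a private copy it is free to mutate;
    // copying in memory is far cheaper than re-parsing the JSON.
    private readonly int[] values = (int[])Canonical.Clone();
}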
The fastest way to create a byte array from an array of integers is to use Buffer.BlockCopy
byte[] result = new byte[a.Length * sizeof(int)];
Buffer.BlockCopy(a, 0, result, 0, result.Length);
// write result to FileStream or wherever
If you store the size of the array in the first four bytes, you can use it to deserialize again. Make sure everything fits into memory, but looking at your file sizes it should:
var buffer = File.ReadAllBytes(@"...");
int size = BitConverter.ToInt32(buffer, 0);
var result = new int[size];
// skip the 4-byte size prefix; BlockCopy counts in bytes
Buffer.BlockCopy(buffer, sizeof(int), result, 0, size * sizeof(int));
Binary is not human-readable, but definitely faster than JSON.
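Note that the read code above expects the array length in the first four bytes; a matching write side (a sketch, not part of the original answer) could look like this:

int[] a = { 1, 2, 3 }; // the array to persist
byte[] buffer = new byte[sizeof(int) + a.Length * sizeof(int)];
Buffer.BlockCopy(BitConverter.GetBytes(a.Length), 0, buffer, 0, sizeof(int));
Buffer.BlockCopy(a, 0, buffer, sizeof(int), a.Length * sizeof(int));
File.WriteAllBytes(@"...", buffer); // same placeholder path as above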
