What's the fastest way to copy bits from an Int to a byte array, in C#?
I have a couple of ints and I need to copy (sometimes all and somtimes some of) the bits serially into a byte[]...
I need the process to be as efficient as possible (e.g. avoid creating new byte array in the process as I understand the BitConverter does etc).
One way to avoid creating new byte[] array on each call is to create a BinaryWriter on top of a MemoryStream, write your integers into it, and then harvest all the results at once by accessing MemoryStream's buffer:
var buf = new byte[400];
using (var ms = new MemoryStream(buf))
using (var bw = new BinaryWriter(ms)) {
for (int i = 0 ; i != 100 ; i++) {
bw.Write(2*i+3);
}
}
// At this point buf contains the bytes of 100 ints
Related
I'm trying to replace only one byte of data from a file, meaning something like 0X05 -> 0X15.
I'm using Replace function to do this.
using (StreamReader reader = new System.IO.StreamReader(Inputfile))
{
content = reader.ReadToEnd();
content = content.Replace("0x05","0x15");
reader.Close();
}
using (FileStream stream = new FileStream(outputfile, FileMode.Create))
{
using (BinaryWriter writer = new BinaryWriter(stream, Encoding.UTF8))
{
writer.Write(content);
}
}
Technically speaking, only that byte of data had to replaced with new byte, but I see there are many bytes of data changed.
Why other bytes are changing ?How can I achieve this?
You're talking about bytes but you've written code that reads strings; strings are an interpretation of bytes so if you truly do mean bytes, mangling them through strings is the wrong way to go
Anyways, there are helper methods to make your life easy, if the file is relatively small (maybe up to 500mb - I'd switch to using an incremental streaming reading/changing/writing method if it's bigger than this)
If you want bytes changed:
var b = File.ReadAllBytes("path");
for(int x = 0; x < b.Length; x++)
if(b[x] == 0x5)
b[x] = (byte)0x15;
File.WriteAllBytes("path", b);
If your file is a text file that literally has "0x05" in it:
File.WriteAllText("path", File.ReadAllText("path").Replace("0x05", "0x15"));
In response to your question in the comments, and assuming you want your file to grow by 2 bytes more for each 0x05 it contains (so a 1000 byte file that cotnains three 0x05 bytes will be 1006 bytes after being written) it is probably simplest to:
var b = File.ReadAllBytes("path");
using(FileStream fs = new FileStream("path", FileMode.Create)) //replace file
{
for(int x = 0; x < b.Length; x++)
if(b[x] == 0x5) {
fs.WriteByte((byte)0x15);
fs.WriteByte((byte)0x5);
fs.WriteByte((byte)0x15);
} else
fs.WriteByte(b);
}
Don't worry about writing a single byte at a time - it is buffered elsewhere in the IO chain. You could go for a solution that writes blocks of bytes from the array if you wanted.. this is just easier to code/understand
I want to know if there's overhead when converting between byte arrays and Streams (specifically MemoryStreams when using MemoryStream.ToArray() and MemoryStream(byte[]). I assume it's temporary doubling memory usage.
For example, I read as a stream, convert to bytes, and then convert to stream again.
But getting rid of that byte conversion will require a bit of a rewrite. I don't want to waste time rewriting it if it doesn't make a difference.
So, yes.. you are correct in assuming that ToArray duplicates the memory in the stream.
If you want do not want to do this (for efficiency reasons), you could modify the bytes directly in the stream. Take a look at this:
// create some bytes: 0,1,2,3,4,5,6,7...
var originalBytes = Enumerable.Range(0, 256).Select(Convert.ToByte).ToArray();
using(var ms = new MemoryStream(originalBytes)) // ms is referencing bytes array, not duplicating it
{
// var duplicatedBytes = ms.ToArray(); // copy of originalBytes array
// If you don't want to duplicate the bytes but want to
// modify the buffer directly, you could do this:
var bufRef = ms.GetBuffer();
for(var i = 0; i < bufRef.Length; ++i)
{
bufRef[i] = Convert.ToByte(bufRef[i] ^ 0x55);
}
// or this:
/*
ms.TryGetBuffer(out var buf);
for (var i = 0; i < buf.Count; ++i)
{
buf[i] = Convert.ToByte(buf[i] ^ 0x55);
}*/
// or this:
/*
for (var i = 0; i < ms.Length; ++i)
{
ms.Position = i;
var b = ms.ReadByte();
ms.Position = i;
ms.WriteByte(Convert.ToByte(b ^ 0x55));
}*/
}
// originalBytes will now be 85,84,87,86...
ETA:
Edited to add in Blindy's examples. Thanks! -- Totally forgot about GetBuffer and had no idea about TryGetBuffer
Does MemoryStream(byte[]) cause a memory copy?
No, it's a non-resizable stream, and as such no copy is necessary.
Does MemoryStream.ToArray() cause a memory copy?
Yes, by design it creates a copy of the active buffer. This is to cover the resizable case, where the buffer used by the stream is not the same buffer that was initially provided due to reallocations to increase/decrease its size.
Alternatives to MemoryStream.ToArray() that don't cause memory copy?
Sure, you have MemoryStream.TryGetBuffer (out ArraySegment<byte> buffer), which returns a segment pointing to the internal buffer, whether or not it's resizable. If it's non-resizable, it's a segment into your original array.
You also have MemoryStream.GetBuffer, which returns the entire internal buffer. Note that in the resizable case, this will be a lot larger than the actual used stream space, and you'll have to adjust for that in code.
And lastly, you don't always actually need a byte array, sometimes you just need to write it to another stream (a file, a socket, a compression stream, an Http response, etc). For this, you have MemoryStream.CopyTo[Async], which also doesn't perform any copies.
How can I read multiple byte arrays from a file? These byte arrays are images and have the potential to be large.
This is how I'm adding them to the file:
using (var stream = new FileStream(tempFile, FileMode.Append))
{
//convertedImage = one byte array
stream.Write(convertedImage, 0, convertedImage.Length);
}
So, now they're in tempFile and I don't know how to retrieve them as individual arrays. Ideally, I'd like to get them as an IEnumerable<byte[]>. Is there a way to split these, maybe?
To retrieve multiple sets of byte arrays, you will need to know the length when reading. The easiest way to do this (if you can change the writing code) is to add a length value:
using (var stream = new FileStream(tempFile, FileMode.Append))
{
//convertedImage = one byte array
// All ints are 4-bytes
stream.Write(BitConverter.GetBytes(convertedImage.Length), 0, 4);
// now, we can write the buffer
stream.Write(convertedImage, 0, convertedImage.Length);
}
Reading the data is then
using (var stream = new FileStream(tempFile, FileMode.Open))
{
// loop until we can't read any more
while (true)
{
byte[] convertedImage;
// All ints are 4-bytes
int size;
byte[] sizeBytes = new byte[4];
// Read size
int numRead = stream.Read(sizeBytes, 0, 4);
if (numRead <= 0) {
break;
}
// Convert to int
size = BitConverter.ToInt32(sizeBytes, 0);
// Allocate the buffer
convertedImage = new byte[size];
stream.Read(convertedImage, 0, size);
// Do what you will with the array
listOfArrays.Add(convertedImage);
} // end while
}
If all saved images are the same size, then you can eliminate the first read and write call from each, and hard-code size to the size of the arrays.
Unless you can work out the number of bytes taken by each individual array from the content of these bytes themselves, you need to store the number of images and their individual lengths into the file.
There are many ways to do it: you could write lengths of the individual arrays preceding each byte array, or you could write a "header" describing the rest of the content before writing the "payload" data to the file.
Header may look as follows:
Byte offset Description
----------- -------------------
0000...0003 - Number of files, N
0004...000C - Length of file 1
000D...0014 - Length of file 2
...
XXXX...XXXX - Length of file N
XXXX...XXXX - Content of file 1
XXXX...XXXX - Content of file 2
...
XXXX...XXXX - Content of file N
You can use BitConverter methods to produce byte arrays to be written to the header, or you could use BinaryWriter.
When you read back how do you get the number of bytes per image/byte array to read?
You will need to store the length too (i.e. first 4 bytes = encoded 32 bit int byte count, followed by the data bytes.)
To read back, read the first four bytes, un-encode it back to an int, and then read that number of bytes, repeat until eof.
For Serialization of Primitive Array, i'am wondering how to convert a Primitive[] to his corresponding byte[]. (ie an int[128] to a byte[512], or a ushort[] to a byte[]...)
The destination can be a Memory Stream, a network message, a file, anything.
The goal is performance (Serialization & Deserialization time), to be able to write with some streams a byte[] in one shot instead of loop'ing' through all values, or allocate using some converter.
Some already solution explored:
Regular Loop to write/read
//array = any int[];
myStreamWriter.WriteInt32(array.Length);
for(int i = 0; i < array.Length; ++i)
myStreamWriter.WriteInt32(array[i]);
This solution works for Serialization and Deserialization And is like 100 times faster than using Standard System.Runtime.Serialization combined with a BinaryFormater to Serialize/Deserialize a single int, or a couple of them.
But this solution becomes slower if array.Length contains more than 200/300 values (for Int32).
Cast?
Seems C# can't directly cast a Int[] to a byte[], or a bool[] to a byte[].
BitConverter.Getbytes()
This solution works, but it allocates a new byte[] at each call of the loop through my int[]. Performances are of course horrible
Marshal.Copy
Yup, this solution works too, but same problem as previous BitConverter one.
C++ hack
Because direct cast is not allowed in C#, i tryed some C++ hack after seeing into memory that array length is stored 4 bytes before array data starts
ARRAYCAST_API void Cast(int* input, unsigned char** output)
{
// get the address of the input (this is a pointer to the data)
int* count = input;
// the size of the buffer is located just before the data (4 bytes before as this is an int)
count--;
// multiply the number of elements by 4 as an int is 4 bytes
*count = *count * 4;
// set the address of the byte array
*output = (unsigned char*)(input);
}
and the C# that call:
byte[] arrayB = null;
int[] arrayI = new int[128];
for (int i = 0; i < 128; ++i)
arrayI[i] = i;
// delegate call
fptr(arrayI, out arrayB);
I successfully retrieve my int[128] into C++, switch the array length, and affecting the right adress to my 'output' var, but C# is only retrieving a byte[1] as return. It seems that i can't hack a managed variable like that so easily.
So i really start to think that all theses casts i want to achieve are just impossible in C# (int[] -> byte[], bool[] -> byte[], double[] -> byte[]...) without Allocating/copying...
What am-i missing?
How about using Buffer.BlockCopy?
// serialize
var intArray = new[] { 1, 2, 3, 4, 5, 6, 7, 8 };
var byteArray = new byte[intArray.Length * 4];
Buffer.BlockCopy(intArray, 0, byteArray, 0, byteArray.Length);
// deserialize and test
var intArray2 = new int[byteArray.Length / 4];
Buffer.BlockCopy(byteArray, 0, intArray2, 0, byteArray.Length);
Console.WriteLine(intArray.SequenceEqual(intArray2)); // true
Note that BlockCopy is still allocating/copying behind the scenes. I'm fairly sure that this is unavoidable in managed code, and BlockCopy is probably about as good as it gets for this.
I have 3 *.dat files (346KB,725KB,1762KB) that are filled with a json-string of "big" int-Arrays.
Each time my object is created (several times) I take those three files and use JsonConvert.DeserializeObject to deserialize the arrays into the object.
I thought about using binary-files instead of a json-string or could I even save these arrays directly? I dont need to use these files, it's just the location the data is currently saved. I would gladly switch to anything faster.
What are the different ways to speed up the initialization of these objects?
The fastest way is to manually serialize the data.
An easy way to do this is by creating a FileStream, and then wrapping it in a BinaryWriter/BinaryReader.
You have access to functions to write the basic data structures (numbers, string, char, byte[] and char[]).
An easy way to write a int[] (unneccesary if it's fixed size) is by prepending the length of the array with either an int/long (depending on the size, unsigned doesn't really give any advantages, since arrays use signed datatypes for their length storage). And then write all the ints.
Two ways to write all the ints would be:
1. Simply loop over the entire array.
2. Convert it into a byte[] and write it using BinaryWriter.Write(byte[])
These is how you can implement them both:
// Writing
BinaryWriter writer = new BinaryWriter(new FileStream(...));
int[] intArr = new int[1000];
writer.Write(intArr.Length);
for (int i = 0; i < intArr.Length; i++)
writer.Write(intArr[i]);
// Reading
BinaryReader reader = new BinaryReader(new FileStream(...));
int[] intArr = new int[reader.ReadInt32()];
for (int i = 0; i < intArr.Length; i++)
intArr[i] = reader.ReadInt32();
// Writing, method 2
BinaryWriter writer = new BinaryWriter(new FileStream(...));
int[] intArr = new int[1000];
byte[] byteArr = new byte[intArr.Length * sizeof(int)];
Buffer.BlockCopy(intArr, 0, byteArr, 0, intArr.Length * sizeof(int));
writer.Write(intArr.Length);
writer.Write(byteArr);
// Reading, method 2
BinaryReader reader = new BinaryReader(new FileStream(...));
int[] intArr = new int[reader.ReadInt32()];
byte[] byteArr = reader.ReadBytes(intArr.Length * sizeof(int));
Buffer.BlockCopy(byteArr, 0, intArr, 0, byteArr.Length);
I decided to put this all to the test, with an array of 10000 integers I ran the test 10000 times.
It resulted in method one consumes averagely 888200ns on my system (about 0.89ms).
While method 2 only consumes averagely 568600ns on my system (0.57ms averagely).
Both times include the work the garbage collector has to do.
Obviously method 2 is faster than method 1, though possibly less readable.
Another reason why method 1 can be better than method 2 is because method 2 requires double the amount of RAM free than data you're going to write (the original int[] and the byte[] that's converted from the int[]), when dealing with limited RAM/extremely large files (talking about 512MB+), though if this is the case, you can always make a hybrid solution, by for example writing away 128MB at a time.
Note that method 1 also requires this extra space, but because it's split down in 1 operation per item of the int[], it can release the memory a lot earlier.
Something like this, will write 128MB of an int[] at a time:
const int WRITECOUNT = 32 * 1024 * 1024; // 32 * sizeof(int)MB
int[] intArr = new int[140 * 1024 * 1024]; // 140 * sizeof(int)MB
for (int i = 0; i < intArr.Length; i++)
intArr[i] = i;
byte[] byteArr = new byte[WRITECOUNT * sizeof(int)]; // 128MB
int dataDone = 0;
using (Stream fileStream = new FileStream("data.dat", FileMode.Create))
using (BinaryWriter writer = new BinaryWriter(fileStream))
{
while (dataDone < intArr.Length)
{
int dataToWrite = intArr.Length - dataDone;
if (dataToWrite > WRITECOUNT) dataToWrite = WRITECOUNT;
Buffer.BlockCopy(intArr, dataDone, byteArr, 0, dataToWrite * sizeof(int));
writer.Write(byteArr);
dataDone += dataToWrite;
}
}
Note that this is just for writing, reading works differently too :P.
I hope this gives you some more insight in dealing with very large data files :).
If you've just got a bunch of integers, then using JSON will indeed be pretty inefficient in terms of parsing. You can use BinaryReader and BinaryWriter to write binary files efficiently... but it's not clear to me why you need to read the file every time you create an object anyway. Why can't each new object keep a reference to the original array, which has been read once? Or if they need to mutate the data, you could keep one "canonical source" and just copy that array in memory each time you create an object.
The fastest way to create a byte array from an array of integers is to use Buffer.BlockCopy
byte[] result = new byte[a.Length * sizeof(int)];
Buffer.BlockCopy(a, 0, result, 0, result.Length);
// write result to FileStream or wherever
If you store the size of the array in the first element, you can use it again to deserialize. Make sure everything fits into memory, but looking at your file sizes it should.
var buffer = File.ReadAllBytes(#"...");
int size = BitConverter.ToInt32(buffer,0);
var result = new int[size];
Buffer.BlockCopy(buffer, 0, result, result.length);
Binary is not human readable, but definetely faster than JSON.