I have a byte[200] that is read from a file, representing a short[100] in little-endian format. This is how I read it:
using (FileStream fs = new FileStream(_path, FileMode.Open, FileAccess.Read))
{
    //fs.Seek(...)
    byte[] record = new byte[200];
    fs.Read(record, 0, record.Length);
    short[] target = new short[100];
    // magic operation that fills target array
}
I don't know what to put in "magic operation". I've read about BitConverter, but it doesn't seem to have a BitConverter.ToShort operation. Anyway, BitConverter seems to convert in a loop, whereas I would appreciate some way to "block copy" the whole array at once, if possible.
I think you're looking for Buffer.BlockCopy.
Buffer.BlockCopy(record, 0, target, 0, record.Length);
I believe that will use the endianness of the architecture you're on - so it may be inappropriate in some environments. You might want to abstract this into a method call which checks (once) whether or not it does what you want (e.g. by converting {0, 1} and seeing whether the result is {1} or {256}) and then either uses Buffer.BlockCopy or does it "manually" in a loop if necessary.
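A minimal sketch of that idea, using BitConverter.IsLittleEndian as the one-time check (the helper name ToInt16Array is made up here):
static short[] ToInt16Array(byte[] source)
{
    var result = new short[source.Length / sizeof(short)];
    if (BitConverter.IsLittleEndian)
    {
        // Host layout matches the file's little-endian layout: raw block copy
        Buffer.BlockCopy(source, 0, result, 0, source.Length);
    }
    else
    {
        // Big-endian host: assemble each little-endian byte pair by hand
        for (int i = 0; i < result.Length; i++)
            result[i] = (short)(source[i * 2] | (source[i * 2 + 1] << 8));
    }
    return result;
}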
I want to store an array of timestamps in a binary flat file. One of my requirements is that I can access individual timestamps later for efficient queries, without having to read and deserialize the entire array first. (I use a binary search algorithm that finds the file positions of a start and end timestamp, which in turn determine which bytes are read and deserialized between those two timestamps, because the binary file can be multiple gigabytes in size.)
Obviously, the simple but slow way is to use BitConverter.GetBytes(timestamp) to convert each timestamp to bytes and then store them in the file. I can then access each item individually in the file and use my custom binary search algorithm to find the timestamp that matches the desired one.
However, I found that BinaryFormatter is incredibly efficient (multiple times faster than protobuf-net and any other serializer I tried) at serializing/deserializing value type arrays. Hence I tried to serialize an array of timestamps into binary form. However, that apparently prevents me from accessing individual timestamps in the file without first deserializing the entire array.
Is there a way to still access individual items in binary form after having serialized an entire array of items via BinaryFormatter?
Here is a code snippet that demonstrates what I mean:
var sampleArray = new int[5] { 1,2,3,4,5};
var serializedSingleValueArray = sampleArray.SelectMany(x => BitConverter.GetBytes(x)).ToArray();
var serializedArrayofSingleValues = Serializers.BinarySerializeToArray(sampleArray);
var deserializesToCorrectValue = BitConverter.ToInt32(serializedSingleValueArray, 0); //value = 1 (ok)
var wrongDeserialization = BitConverter.ToInt32(serializedArrayofSingleValues, 0); //value = 256 (???)
Here is the serialization function:
public static byte[] BinarySerializeToArray(object toSerialize)
{
    using (var stream = new MemoryStream())
    {
        Formatter.Serialize(stream, toSerialize);
        return stream.ToArray();
    }
}
Edit: I do not need to concern myself with efficient memory consumption or file sizes, as those are currently far from being the bottleneck. It is the speed of serialization and deserialization that is the bottleneck for me, with multi-gigabyte binary files and hence very large arrays of primitives.
If your problem is just "how to convert an array of structs to byte[]", you have other options than BitConverter. BitConverter is for single values; the Buffer class is for arrays.
double[] d = new double[100];
d[4] = 1235;
d[8] = 5678;
byte[] b = new byte[800];
Buffer.BlockCopy(d, 0, b, 0, d.Length * sizeof(double));
// just to test it works
double[] d1 = new double[100];
Buffer.BlockCopy(b, 0, d1, 0, d.Length * sizeof(double));
This does a byte-level copy without converting anything and without iterating over items.
You can write this byte array directly to your stream (not a StreamWriter, not a Formatter):
stream.Write(b, 0, 800);
That's definitely the fastest way to write to a file. It does involve a complete copy, but any other conceivable method will also have to read each item and store it somewhere before it goes to the file.
If this is the only thing you write to your file, you don't need to write the array length to the file: you can use the file length for this.
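For example (a one-line sketch; the path variable is hypothetical):
long count = new FileInfo(path).Length / sizeof(double); // number of doubles stored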
To read the double at index 100 in the file:
file.Seek(100 * sizeof(double), SeekOrigin.Begin);
byte[] tmp = new byte[8];
file.Read(tmp, 0, 8);
double value = BitConverter.ToDouble(tmp, 0);
Here, for a single value, you can use BitConverter.
This is the solution for .NET Framework, C# <= 7.0.
For .NET Standard/.NET Core, C# 8.0, you have more options with Span<T>, which gives you access to the underlying memory without copying the data.
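For example, MemoryMarshal.Cast can reinterpret the bytes in place (a minimal sketch; path is hypothetical and System.Runtime.InteropServices is assumed to be imported):
byte[] bytes = File.ReadAllBytes(path);
// Reinterpret the byte[] as doubles in place - no copy is made
ReadOnlySpan<double> values = MemoryMarshal.Cast<byte, double>(bytes);
double first = values[0];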
BitConverter is not a "slow" option; it's just a way to convert values to a byte[] sequence. This is actually not costly; it just interprets the memory differently. Compute the position in the file, load 8 bytes, convert them to a DateTime, and you are done.
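A sketch of exactly that, assuming the file stores each timestamp as its raw tick count (the long from DateTime.Ticks; path and index are hypothetical):
static DateTime ReadTimestampAt(string path, long index)
{
    using (var file = File.OpenRead(path))
    {
        file.Seek(index * sizeof(long), SeekOrigin.Begin);
        byte[] tmp = new byte[sizeof(long)];
        file.Read(tmp, 0, tmp.Length);
        return new DateTime(BitConverter.ToInt64(tmp, 0));
    }
}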
You should do this only with simply structured files, and with simply structured files you don't need a binary formatter anyway. Just load/save your one array to one file; this way you can be sure your file positions can be computed. In other words: save your array yourself, date by date, and then you can also load it date by date. Writing with one processing style and reading with another is always a bad idea.
My current approach is to read the COM stream into a C# MemoryStream and then call .ToArray(). However, I believe ToArray() creates a redundant copy of the data. Is there a better way, with reduced memory usage as the priority?
var memStream = new MemoryStream(10000);
var chunk = new byte[1000];
while (true)
{
    int bytesRead = comStream.read(ref chunk, chunk.Length);
    if (bytesRead == 0)
        break; // eos
    memStream.Write(chunk, 0, bytesRead);
}
//fairly sure this creates a duplicate copy of the data
var array = memStream.ToArray();
//does this also dupe the data?
var array2 = memStream.GetBuffer();
If you know the length of the data before you start consuming it, you can allocate a simple byte[] and fill it in your read loop, simply by incrementing an offset each read by the number of bytes read (and decrementing the "number of bytes you're allowed to touch"). This does depend on having a read overload / API that accepts either an offset or a pointer, though.
If that isn't an option: GetBuffer() is your best bet - it doesn't duplicate the data; rather, it hands you the current, possibly oversized, byte[]. Because it is oversized, you must consider it in combination with the current .Length, perhaps wrapping the length/data pair in an ArraySegment<byte>, or a Span<byte>/Memory<byte>.
In the "the length is known" scenario, if you're happy to work with oversized buffers, you could also consider a leased array, via ArrayPool<byte>.Shared - rent one of at least that size, fill it, then constrain your segment/span to the populated part (and remember to return it to the pool when you're done).
I've been toying around with some .NET features (namely Pipelines, Memory, and Array Pools) for high speed file reading/parsing. I came across something interesting while playing around with Array.Copy, Buffer.BlockCopy and ReadOnlySequence.CopyTo. The IO Pipeline reads data as byte and I'm attempting to efficiently turn it into char.
While playing around with Array.Copy I found that I am able to copy from byte[] to char[] and the compiler (and runtime) are more than happy to do it.
char[] outputBuffer = ArrayPool<char>.Shared.Rent(buffer.Length);
Array.Copy(buffer, 0, outputBuffer, 0, buffer.Length);
This code runs as expected, though I'm sure there are some UTF edge cases not properly handled here.
My curiosity comes with Buffer.BlockCopy:
char[] outputBuffer = ArrayPool<char>.Shared.Rent(buffer.Length);
Buffer.BlockCopy(buffer, 0, outputBuffer, 0, buffer.Length);
The resulting contents of outputBuffer are garbage. For example, with the example contents of buffer as
{ 50, 48, 49, 56, 45 }
The contents of outputBuffer after the copy is
{ 12338, 14385, 12333, 11575, 14385 }
I'm just curious what is happening "under the hood" inside the CLR that is causing these 2 commands to output such different results.
Array.Copy() is smarter about the element type. It will try to use the memmove() CRT function when it can, but will fall back to a loop that copies each element when it can't, converting as necessary; it considers boxing and primitive type conversions. So one element in the source array becomes one element in the destination array.
Buffer.BlockCopy() skips all that and blasts away with memmove(). No conversions are considered, which is why it can be slightly faster - and why it's easier to be misled about the array content. Do note that the UTF-8 encoded character data is still visible in that array: 12338 == 0x3032 = "20", 14385 == 0x3831 = "18", etc. Easier to see with Debug > Windows > Memory > Memory 1.
Noteworthy perhaps is that this type-coercion is a feature. Say you receive an int[] through a socket or pipe but have the data in a byte[] buffer; Buffer.BlockCopy() is by far the fastest way to get it into the int[].
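A small sketch of that scenario - the received bytes reinterpreted as ints in one shot:
byte[] received = { 1, 0, 0, 0, 2, 0, 0, 0 }; // two little-endian ints off the wire
int[] values = new int[received.Length / sizeof(int)];
Buffer.BlockCopy(received, 0, values, 0, received.Length);
// values is now { 1, 2 } on a little-endian machine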
I am having trouble keeping the size of an array fixed. I have to read data from a file. When the number of bytes read is less than 160, the array MyPCMBuf shrinks to the number of bytes read.
byte[] MyPCMBuf;
MyPCMBuf = new byte[160];
BinaryReader reader = new BinaryReader(File.Open(fileNamein, FileMode.Open));
Array.Clear(MyPCMBuf, 0, 160);
MyPCMBuf = reader.ReadBytes(160); // if bytes read = 20, size of MyPCMBuf becomes 20
What is going on and how to avoid it?
That's because ReadBytes returns a new byte array.
If you want to read bytes into an existing array, call Read.
That is:
int bytesRead = reader.Read(myBuffer, 0, 160);
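Note that Read can return fewer bytes than requested, so if the buffer must be filled completely, loop until it is full or the stream ends (a small sketch using the names above):
int offset = 0;
while (offset < myBuffer.Length)
{
    int n = reader.Read(myBuffer, offset, myBuffer.Length - offset);
    if (n == 0)
        break; // end of stream
    offset += n;
}
// 'offset' now holds the total number of bytes actually read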
You have no problem keeping the size fixed.
ReadBytes returns a new array. That simple. The old one never changes size.
If you want to reuse your buffer, use another method, for example:
public virtual int Read(
byte[] buffer,
int index,
int count
)
on that class. Then you keep your array.
Don't use BinaryReader at all. Use the FileStream returned by Open and call its Read method. If you don't assign to MyPCMBuf, its length cannot possibly change.
Unrelated: use Array.Clear(MyPCMBuf, 0, MyPCMBuf.Length); - less redundancy, less potential for errors. Use using. Don't initialize MyPCMBuf if you always overwrite it. Don't clear it redundantly. I see a lot of misunderstandings here; be more rigorous in your approach to programming. It appears you don't really understand all the language features and APIs you are using, and that's dangerous.
You are overwriting MyPCMBuf in line
MyPCMBuf = reader.ReadBytes(160);
Thus, line MyPCMBuf = new byte[160]; is irrelevant in your code.
You are not really doing what you think you are.
reader.ReadBytes(160) will create a new byte array of size at most 160. It does not store values in the array you already had; the old one becomes eligible for garbage collection.
Which one is better: MemoryStream.WriteTo(Stream destinationStream) or Stream.CopyTo(Stream destinationStream)?
I am comparing these two methods without an explicit buffer, as I am doing it like this:
Stream str = File.Open("SomeFile.file", FileMode.Open);
MemoryStream mstr = new MemoryStream(File.ReadAllBytes("SomeFile.file"));
using (var Ms = File.Create("NewFile.file", 8 * 1024))
{
    str.CopyTo(Ms); // or mstr.WriteTo(Ms) - which one will be better?
}
Update
Here is what I want to do:
Open a file (say, an "X" type file)
Parse the contents
From here I get a bunch of new streams (3-4 files)
Parse one stream
Extract thousands of files (the stream is an image file)
Save the other streams to files
Edit all the files
Generate a new "X" type file.
I have written every bit of code, and it is working correctly.
But now I am optimizing the code to make it as efficient as possible.
It is an historical accident that there are two ways to do the same thing. MemoryStream always had the WriteTo() method, Stream didn't acquire the CopyTo() method until .NET 4.
The MemoryStream.WriteTo() version looks like this:
public virtual void WriteTo(Stream stream)
{
    // Exception throwing code elided...
    stream.Write(this._buffer, this._origin, this._length - this._origin);
}
The Stream.CopyTo() implementation like this:
private void InternalCopyTo(Stream destination, int bufferSize)
{
    int num;
    byte[] buffer = new byte[bufferSize];
    while ((num = this.Read(buffer, 0, buffer.Length)) != 0)
    {
        destination.Write(buffer, 0, num);
    }
}
Stream.CopyTo() is more universal; it works for any stream, and it helps programmers that fumble copying data from, say, a NetworkStream - forgetting to pay attention to the return value from Read() was a very common bug. But it of course copies the bytes twice and allocates a temporary buffer; MemoryStream doesn't need that, since it can write directly from its own buffer. So you'd still prefer WriteTo(). Noticing the difference isn't very likely, though.
MemoryStream.WriteTo: Writes the entire contents of this memory stream to another stream.
Stream.CopyTo: Reads the bytes from the current stream and writes them to the destination stream. Copying begins at the current position in the current stream.
You'll need to seek back to 0 to get the whole source stream copied.
So I think MemoryStream.WriteTo is the better option for this situation.
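If you do use CopyTo with the question's streams, rewind first (a tiny sketch reusing mstr and Ms from the question):
mstr.Position = 0; // CopyTo starts at the current position
mstr.CopyTo(Ms);   // WriteTo would always write the whole buffer regardless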
If you use Stream.CopyTo, you don't need to read all the bytes into memory to start with. However:
This code would be simpler if you just used File.Copy
If you are going to load all the data into memory, you can just use:
byte[] data = File.ReadAllBytes("input");
File.WriteAllBytes("output", data);
You should have a using statement for the input as well as the output stream
If you really need processing so can't use File.Copy, using Stream.CopyTo will cope with larger files than loading everything into memory. You may not need that, of course, or you may need to load the whole file into memory for other reasons.
If you have got a MemoryStream, I'd probably use MemoryStream.WriteTo rather than Stream.CopyTo, but it probably won't make much difference which you use, except that you need to make sure you're at the start of the stream when using CopyTo.
I think Hans Passant's claim of a bug in MemoryStream.WriteTo() is wrong; it does not "ignore the return value of Write()". Stream.Write() returns void, which implies to me that the entire count bytes are written, which implies that Stream.Write() will block as necessary to complete the operation to, e.g., a NetworkStream, or throw if it ultimately fails.
That is indeed different from the write() system call in *nix, and its many emulations in libc and so forth, which can return a "short write". I suspect Hans leaped to the conclusion that Stream.Write() followed that convention, which I would have expected too, but apparently it does not.
It is conceivable that Stream.Write() could perform a "short write", without returning any indication of that, requiring the caller to check that the Position property of the Stream has actually been advanced by count. That would be a very error-prone API, and I doubt that it does that, but I have not thoroughly tested it. (Testing it would be a bit tricky: I think you would need to hook up a TCP NetworkStream with a reader on the other end that blocked forever, and write enough to fill up the wire buffers. Or something like that...)
The comments for Stream.Write() are not quite unambiguous:
Summary: When overridden in a derived class, writes a sequence of bytes to the current stream and advances the current position within this stream by the number of bytes written.
Parameters: buffer: An array of bytes. This method copies count bytes from buffer to the current stream.
Compare that to the Linux man page for write(2):
write() writes up to count bytes from the buffer pointed to by buf to the file referred to by the file descriptor fd.
Note the crucial "up to". That sentence is followed by explanation of some of the conditions under which a "short write" might occur, making it very explicit that it can occur.
This is really a critical issue: we need to know how Stream.Write() behaves, beyond all doubt.
The CopyTo method creates a buffer, populates it with data from the original stream, and then calls the Write method, passing the created buffer as a parameter. WriteTo uses the MemoryStream's internal buffer to write. That is the difference. Which one is better is up to you to decide; use whichever method you prefer.
Creating a MemoryStream from a HttpInputStream in Vb.Net:
Dim filename As String = MyFile.PostedFile.FileName
Dim fileData As Byte() = Nothing
Using binaryReader = New BinaryReader(MyFile.PostedFile.InputStream)
    binaryReader.BaseStream.Position = 0
    fileData = binaryReader.ReadBytes(MyFile.PostedFile.ContentLength)
End Using
Dim memoryStream As MemoryStream = New MemoryStream(fileData)