I've been toying around with some .NET features (namely Pipelines, Memory, and Array Pools) for high speed file reading/parsing. I came across something interesting while playing around with Array.Copy, Buffer.BlockCopy and ReadOnlySequence.CopyTo. The IO Pipeline reads data as byte and I'm attempting to efficiently turn it into char.
While playing around with Array.Copy I found that I am able to copy from byte[] to char[] and the compiler (and runtime) are more than happy to do it.
char[] outputBuffer = ArrayPool<char>.Shared.Rent(buffer.Length);
Array.Copy(buffer, 0, outputBuffer, 0, buffer.Length);
This code runs as expected, though I'm sure there are some UTF edge cases not properly handled here.
My curiosity comes with Buffer.BlockCopy
char[] outputBuffer = ArrayPool<char>.Shared.Rent(buffer.Length);
Buffer.BlockCopy(buffer, 0, outputBuffer, 0, buffer.Length);
The resulting contents of outputBuffer are garbage. For example, with the example contents of buffer as
{ 50, 48, 49, 56, 45 }
The contents of outputBuffer after the copy is
{ 12338, 14385, 12333, 11575, 14385 }
I'm just curious what is happening "under the hood" inside the CLR that is causing these 2 commands to output such different results.
Array.Copy() is smarter about the element type. It will try to use the memmove() CRT function when it can, but falls back to a loop that copies each element when it can't, converting elements as necessary; it considers boxing and primitive type conversions. So one element in the source array will become one element in the destination array.
Buffer.BlockCopy() skips all that and blasts with memmove(). No conversions are considered, which is why it can be slightly faster, and why it can mislead you about the array content. Do note that the UTF-8 encoded character data is visible in that array: 12338 == 0x3032 is the bytes '2' (0x32) and '0' (0x30), i.e. "20"; 14385 == 0x3831 is "18"; etc. Easier to see with Debug > Windows > Memory > Memory 1.
Noteworthy perhaps is that this type coercion is a feature: say you receive an int[] through a socket or pipe but have the data in a byte[] buffer; Buffer.BlockCopy is by far the fastest way to reinterpret it.
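A minimal sketch of that scenario (the variable names are illustrative, not from the question):
// Hypothetical: 8 bytes received off a socket that actually represent two Int32 values.
byte[] received = { 1, 0, 0, 0, 2, 0, 0, 0 };      // little-endian 1 and 2
int[] values = new int[received.Length / sizeof(int)];
// Reinterpret the raw bytes as ints in one block copy; no per-element conversion happens.
Buffer.BlockCopy(received, 0, values, 0, received.Length);
// values is now { 1, 2 } on a little-endian machine.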
Related
My current approach is to read the COM stream into a C# MemoryStream and then call .ToArray(). However, I believe ToArray() creates a redundant copy of the data. Is there a better way that has reduced memory usage as the priority?
var memStream = new MemoryStream(10000);
var chunk = new byte[1000];
while (true)
{
int bytesRead = comStream.read(ref chunk, chunk.Length);
if (bytesRead == 0)
break; // eos
memStream.Write(chunk, 0, bytesRead);
}
//fairly sure this creates a duplicate copy of the data
var array = memStream.ToArray();
//does this also dupe the data?
var array2 = memStream.GetBuffer();
If you know the length of the data before you start consuming it, then you can allocate a simple byte[] and fill that in your read loop, simply by incrementing an offset each read by the number of bytes read (and decrementing your "number of bytes you're allowed to touch" count). This does depend on having a read overload / API that accepts either an offset or a pointer, though.
If that isn't an option: GetBuffer() is your best bet - it doesn't duplicate the data; rather, it hands you the current, possibly oversized, byte[]. Because it is oversized, you must consider it in combination with the current .Length, perhaps wrapping the length/data pair in either an ArraySegment<byte>, or a Span<byte>/Memory<byte>.
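A rough sketch of the GetBuffer() route, assuming memStream has already been filled by the read loop above:
byte[] raw = memStream.GetBuffer();                   // no copy; the array may be longer than the data
int length = (int)memStream.Length;                   // number of valid bytes
ArraySegment<byte> segment = new ArraySegment<byte>(raw, 0, length);
// Pass 'segment' (or new ReadOnlySpan<byte>(raw, 0, length)) around instead of calling ToArray().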
In the "the length is known" scenario, if you're happy to work with oversized buffers, you could also consider a leased array, via ArrayPool<byte>.Shared - rent one of at least that size, fill it, then constrain your segment/span to the populated part (and remember to return it to the pool when you're done).
BACKGROUND
I am writing a C# program which collects some information by data acquisition. It's quite complex so I won't detail it all here, but the data acquisition is instigated continuously and then, on an asynchronous thread, my program periodically visits the acquisition buffer and takes 100 samples from it. I then look inside the 100 samples for a trigger condition which I am interested in. If I see the trigger condition I collect a bunch of samples from a pre-trigger buffer, a bunch more from a post-trigger buffer, and assemble it all together into one 200-element array.
In my asynchronous thread I assemble my 200-element array (of type double) using the Buffer.BlockCopy method. The only specific reason I chose to use this method is that I need to be careful about how much data processing I do in my asynchronous thread; if I do too much I can end up over-filling the acquisition buffer because I am not visiting it often enough. Since Buffer.BlockCopy is much more efficient at pushing data from a source array into a destination array than a big 'for loop', that's the sole reason I decided to use it.
THE QUESTION
When I call the Buffer.BlockCopy method I do this:
Buffer.BlockCopy(newData, 0, myPulse, numSamplesfromPreTrigBuf, (trigLocation * sizeof(double)));
Where:
newData is a double[] array containing new data (100 elements) (with typical data like 0.0034, 6.4342, etc ranging from 0 to 7).
myPulse is the destination array. It is instantiated with 200 elements.
numSamplesfromPreTrigBuf is an offset that I want to apply in this particular instance of the copy
trigLocation is the number of elements I want to copy in this particular instance.
The copy occurs without error, but the data written into myPulse is all screwed up; numbers such as -2.05E-289 and 5.72E+250. Either tiny numbers or massive numbers. These numbers do not occur in my source array.
I have resolved the issue simply by using Array.Copy() instead, with no other source-code modification except that I no longer multiply by sizeof(double), since Array.Copy() takes its count in elements. But I did spend two hours trying to debug the Buffer.BlockCopy() method with absolutely no idea why the copy was garbage.
Would anybody have an idea, from my example usage of Buffer.BlockCopy (which I believe is the correct usage), how garbage data might be copied across?
I assume your offset is wrong - it's also a byte-offset, so you need to multiply it by sizeof(double), just like with the length.
Be careful about using BlockCopy and similar methods - you lose some of the safety of .NET. Unlike outright unsafe methods, it does check array bounds, but you can still produce some pretty weird results (and I assume you could e.g. produce invalid references - a big problem EDIT: fortunately, BlockCopy only works on primitive typed arrays).
Also, BlockCopy isn't thread-safe, so you want to synchronize access to the shared buffer, if you're accessing it from more than one thread at a time.
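Putting the offset fix together, the call from the question would presumably need to look something like this (a sketch using the question's own variable names):
// Both the destination offset and the count are byte counts, so both must be scaled by sizeof(double).
Buffer.BlockCopy(newData, 0,
                 myPulse, numSamplesfromPreTrigBuf * sizeof(double),
                 trigLocation * sizeof(double));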
Indeed, Buffer.BlockCopy allows the source Array and destination Array to have different element types, so long as each element type is primitive. Either way, as you can see from the mscorlib.dll source code for ../vm/comutilnative.cpp, the copy is just a direct imaging operation which never interprets the copied bytes in any way (i.e., as 'logical' or 'numeric' values). It basically calls the C-language classic, memmove. So don't expect this:
var rgb = new byte[] { 1, 2, 3 };
var rgl = new long[3];
Buffer.BlockCopy(rgb, 0, rgl, 0, 3); // likely ERROR: integers never widened or narrowed
// INTENTION?: rgl = { 1, 2, 3 }
// RESULT: rgl = { 0x0000000000030201, 0, 0 }
Now given that Buffer.BlockCopy takes just a single count argument, allowing differently-sized element types introduces a fundamental semantic ambiguity: would that single count be expressed in terms of source elements, or destination elements? Solutions to this might include:
1. Add a second count argument, so you'd have one each for src and dst; (no...)
2. Arbitrarily select src vs. dst for expressing the count--and document the choice; (no...)
3. Always express the count in bytes, since element size "1" is the (only) common denominator suitable for arbitrarily different-sized element types. (yes)
Since (1.) is complex (possibly adding even more confusion), and the arbitrary symmetry-breaking of (2.) is poorly self-documenting, the choice taken here was (3.), meaning the count argument must always be specified in bytes.
Because the situation for the srcOffset and dstOffset arguments isn't as critical (on account of there being independent arguments for each 'offset', whereby each c̲o̲u̲l̲d̲ be indexed relative to its respective Array; spoiler alert: ...they aren't), it's less-widely mentioned that these arguments are also always expressed in bytes. From the documentation (emphasis added):
Buffer.BlockCopy (https://learn.microsoft.com/en-us/dotnet/api/system.buffer.blockcopy) Parameters:
src Array The source buffer.
srcOffset Int32 The zero-based byte offset into src.
dst Array The destination buffer.
dstOffset Int32 The zero-based byte offset into dst.
count Int32 The number of bytes to copy.
The fact that the srcOffset and dstOffset are byte-offsets leads to the strange situations under discussion on this page. For one thing, it entails that the copy of the first and/or last element can be partial:
var rgb = new byte[] { 0xFF, 1, 2, 3, 4, 5, 6, 7, 8, 0xFF };
var rgl = new long[10];
Buffer.BlockCopy(rgb, 1, rgl, 1, 8); // likely ERROR: does not target rgl[1], but
// rather *parts of* both rgl[0] and rgl[1]
// INTENTION? (see above) rgl = { 0L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 0L } ✘
// INTENTION? (little-endian) rgl = { 0L, 0x0807060504030201L, 0L, ... } ✘
// INTENTION? (big-endian) rgl = { 0L, 0x0102030405060708L, 0L, ... } ✘
// ACTUAL RESULT: rgl = { 0x0706050403020100L, 8L, 0L, 0L, ... } ?
// ^^-- this byte not copied (see text)
Here, we see that instead of (perhaps) copying something into rgl[1], the element at index 1, and then (maybe) continuing on from there, the copy targeted byte-offset 1 within the first element rgl[0], and led to a partial copy--and surely unintended corruption--of that element. Specifically, byte 0 of rgl[0]--the least-significant-byte of a little-endian long--was not copied.
Continuing with the example, the long value at index 1 is a̲l̲s̲o̲ incompletely written, this time storing value '8' into its least-significant-byte, notably without affecting its other (upper) 7 bytes.
Because I didn't craft my example well enough to explicitly show it, let me be clear about this last point: for these partially-copied long values, the parts that are not copied are not zeroed out as might normally be expected from a proper long store of a byte value. So for the discussion of Buffer.BlockCopy, "partially-copied" means that the un-copied bytes of any multi-byte primitive (e.g. long) value are retained unaltered from before the operation, and thus become "merged" into the new value in some endianness-dependent--and thus likely (and hopefully) unintentional--manner.
To "fix" the example code, the offset supplied for each Array must be pre-multiplied by its respective element size to convert it to a byte offset. This will "correct" the above code to the only sensible operation Buffer.BlockCopy might reasonably perform here, namely a little-endian copy between (one or more) source and (one or more) destination elements, taking care to ensure that no element is partially- or incompletely-copied, respective to its size.
Buffer.BlockCopy(rgb, 1 * sizeof(byte), rgl, 1 * sizeof(long), 8); // CORRECTED (?)
// CORRECT RESULT: rgl = { 0L, 0x0807060504030201L, 0L, ... } ✔
// repaired code shows a proper little-endian store of eight consecutive bytes from a
// byte[] into exactly one complete element of a long[].
In the fixed example, 8 complete byte elements are copied to 1 complete long element. For simplicity, this is a "many-to-1" copy, but you can imagine more elaborate scenarios as well (not shown). In fact, with respect to element count from source-to-destination, a single call to Buffer.BlockCopy can deploy any of five operational patterns: { nop, 1-to-1, 1-to-many, many-to-1, many-to-many }.
The code also illustrates how concerns of endianness are implicated by Buffer.BlockCopy accepting arrays of differently-sized elements. Indeed, the repaired example seems to entail that the code now inherently incorporates a (correctness-)dependency on the endianness of the CPU on which it happens to be running. Combine this with the fact that realistic use cases seem scarce or obscure, and with the very real and error-prone partial-copying hazard discussed above.
Considering these points suggests that mixing source/destination element sizes within a single call to Buffer.BlockCopy, while allowed by the API, should be avoided. In any case, use mixed element sizes with special caution, if at all.
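If the goal is to assemble multi-byte values from a byte[] without taking a dependency on host endianness, one alternative is an explicit little-endian read; a minimal sketch, assuming System.Buffers.Binary.BinaryPrimitives is available on your target framework:
using System.Buffers.Binary;

byte[] rgb = { 0xFF, 1, 2, 3, 4, 5, 6, 7, 8, 0xFF };
// Read eight bytes starting at index 1 as a little-endian Int64, regardless of the CPU's endianness.
long value = BinaryPrimitives.ReadInt64LittleEndian(rgb.AsSpan(1, 8));
// value == 0x0807060504030201 on any platform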
I have a byte[200] that is read from a file, representing a short[100] in little-endian format. This is how I read it:
using (FileStream fs = new FileStream(_path, FileMode.Open, FileAccess.Read))
{
//fs.Seek(...)
byte[] record = new byte[200];
fs.Read(record, 0, record.Length);
short[] target = new short[100];
// magic operation that fills target array
}
I don't know what to put in "magic operation". I've read about BitConverter, but it doesn't seem to have a BitConverter.ToShort operation. Anyway, BitConverter seems to convert in a loop, whereas I would appreciate some way to "block copy" the whole array at once, if possible.
I think you're looking for Buffer.BlockCopy.
Buffer.BlockCopy(record, 0, target, 0, record.Length);
I believe that will preserve the endianness of the architecture you're on - so it may be inappropriate in some environments. You might want to abstract this into a method call which can check (once) whether or not it does what you want (e.g. by converting {0, 1} and seeing whether the result is {1} or {256}) and then either uses Buffer.BlockCopy or does it "manually" in a loop if necessary.
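A sketch of such a wrapper, using BitConverter.IsLittleEndian for the check instead of the runtime probe described above (the method name is made up):
static short[] ToInt16ArrayLittleEndian(byte[] record)
{
    short[] target = new short[record.Length / sizeof(short)];
    if (BitConverter.IsLittleEndian)
    {
        // The raw bytes are already laid out the way the (little-endian) file stores them.
        Buffer.BlockCopy(record, 0, target, 0, record.Length);
    }
    else
    {
        // Big-endian host: convert element by element, swapping byte order.
        for (int i = 0; i < target.Length; i++)
            target[i] = (short)(record[2 * i] | (record[2 * i + 1] << 8));
    }
    return target;
}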
I have a large byte array with mostly 0's but some values that I need to process. If this was C++ or unsafe C# I would use a 32bit pointer and only if the current 32bit were not 0, I would look at the individual bytes. This enables much faster scanning through the all 0 blocks. Unfortunately this must be safe C# :-)
I could use an uint array instead of a byte array and then manipulate the individual bytes but it makes what I'm doing much more messy than I like. I'm looking for something simpler, like the pointer example (I miss pointers sigh)
Thanks!
If the code must be safe, and you don't want to use a larger type and "shift", then you'll have to iterate each byte.
(edit) If the data is sufficiently sparse, you could use a dictionary to store the non-zero values; then finding the non-zeros is trivial (and enormous but sparse arrays become cheap).
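A minimal sketch of the dictionary idea (data and Process are hypothetical placeholders; here the map is built with one scan just for illustration, but ideally you'd populate it as the data arrives):
// Sparse representation: index -> non-zero value.
Dictionary<int, byte> nonZero = new Dictionary<int, byte>();
for (int i = 0; i < data.Length; i++)
{
    if (data[i] != 0)
        nonZero[i] = data[i];
}
// Later processing only touches the entries that actually carry data.
foreach (KeyValuePair<int, byte> pair in nonZero)
    Process(pair.Key, pair.Value);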
I'd follow what this guy said:
Using SSE in c# is it possible?
Basically, write a little bit of C/C++, possibly using SSE, to implement the scanning part efficiently, and call it from C#.
You can access the characters
string.ToCharArray()
Or you can access the raw byte[]
System.Text.Encoding.UTF8.GetBytes(stringvalue)
Ultimately, what I think you'd need here is
MemoryStream stream;
stream.Write(...)
then you will be able to directly handle the memory's buffer
There is also UnmanagedMemoryStream but I'm not sure whether it'd use unsafe calls inside
You can use the BitConverter class:
byte[] byteArray = GetByteArray(); // or whatever
for (int i = 0; i < byteArray.Length; i += 4)   // step by 4 bytes, the size of a UInt32
{
uint x = BitConverter.ToUInt32(byteArray, i);
// do what you want with x
}
Another option is to create a MemoryStream from the byte array, and then use a BinaryReader to read 32-bit values from it.
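A minimal sketch of that option, reusing byteArray from the snippet above (and assuming its length is a multiple of 4):
using (MemoryStream ms = new MemoryStream(byteArray))
using (BinaryReader reader = new BinaryReader(ms))
{
    while (ms.Position < ms.Length)
    {
        uint x = reader.ReadUInt32(); // reads 4 bytes, little-endian
        // do what you want with x
    }
}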
Is there a best (see below) way to append two byte arrays in C#?
Pretending I have complete control, I can make the first byte array sufficiently large to hold the second byte array at the end and use the Array.CopyTo function. Or I can loop over individual bytes and make an assignment.
Are there better ways? I can't imagine doing something like converting the byte arrays to string and joining them and converting them back would be better than either method above.
In terms of best/better (in order):
Fastest
Least RAM consumption
A constraint is that I must work in the .NET 2.0 framework.
The two choices recommended are MemoryStream and BlockCopy. I have run a simple speed test of 10,000,000 loops 3 times and got the following results:
Average of 3 runs of 10,000,000 loops in milliseconds:
BlockCopy Time: 1154, with a range of 13 milliseconds
MemoryStream GetBuffer Time: 1470, with a range of 14 milliseconds
MemoryStream ToArray Time: 1895, with a range of 3 milliseconds
CopyTo Time: 2079, with a range of 19 milliseconds
Byte-by-byte Time: 2203, with a range of 10 milliseconds
Results of List<byte> AddRange over 10 million loops:
List<byte> Time: 16694
Relative RAM Consumption (1 is baseline, higher is worse):
Byte-by-byte: 1
BlockCopy: 1
Copy To: 1
MemoryStream GetBuffer: 2.3
MemoryStream ToArray: 3.3
List<byte>: 4.2
The test shows that, in general, unless you are doing a lot of byte copies [which I am], optimizing the copy itself is not worth the focus [e.g. 10 million runs yield a difference of as much as 1.1 seconds].
You want BlockCopy
According to this blog post it is faster than Array.CopyTo.
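For completeness, a sketch of appending two arrays that way (b1 and b2 stand in for your arrays):
byte[] combined = new byte[b1.Length + b2.Length];
Buffer.BlockCopy(b1, 0, combined, 0, b1.Length);
Buffer.BlockCopy(b2, 0, combined, b1.Length, b2.Length);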
You could also use an approach with a MemoryStream. Suppose b1 and b2 are two byte arrays, you can get a new one, b3, by using the MemoryStream in the following fashion:
var s = new MemoryStream();
s.Write(b1, 0, b1.Length);
s.Write(b2, 0, b2.Length);
var b3 = s.ToArray();
This should work without LINQ and is in fact quite a bit faster.
Create a new MemoryStream passing into the constructor a buffer that's exactly the size of the merged one. Write the individual arrays, and then finally use the buffer:
byte[] deadBeef = new byte[] { 0xDE, 0xAD, 0xBE, 0xEF};
byte[] baadF00d = new byte[] { 0xBA, 0xAD, 0xF0, 0x0D};
int newSize = deadBeef.Length + baadF00d.Length;
var ms = new MemoryStream(new byte[newSize], 0, newSize, true, true);
ms.Write(deadBeef, 0, deadBeef.Length);
ms.Write(baadF00d, 0, baadF00d.Length);
byte[] merged = ms.GetBuffer();
A lot of the low-level I/O functions in .NET take byte arrays and offsets. This was done to prevent needless copies. Be sure you really need the merged array if this is performance sensitive, otherwise just use buffers and offsets.
Another option, although I haven't tested it to see how it fares in terms of speed and memory consumption, would be the LINQ approach:
byte[] combined = bytesOne.Concat(bytesTwo).Concat(bytesThree).ToArray();
...where bytesOne, bytesTwo, and bytesThree are byte arrays. Since Concat uses deferred execution, this shouldn't create any intermediate arrays, and it shouldn't duplicate the original arrays until it constructs the final merged array at the end.
Edit: LINQBridge will allow you to use LINQ-to-Objects (which this is an example of) in the 2.0 framework. I understand if you don't want to depend on this, but it's an option.
If you have arrays where the size will change from time to time, you're probably better off using a List<T> in the first place. Then you can just call the AddRange() method of the list.
Otherwise, Array.Copy() or Array.CopyTo() are as good as anything else you're likely to see.
Have you thought about using List or ArrayList instead of an Array? These types can grow or shrink, and you can append via InsertRange.
Do you need the output to actually be a byte array?
If not, you could create yourself a "smart cursor" (which is similar to what LINQ does): create a custom IEnumerator<byte> that will first iterate the first array, and then continue on to the second one without interruption.
This would work in the 2.0 framework, be fast (in that joining the arrays has virtually no cost), and use no more RAM than the arrays already consume.
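A sketch of that idea as an iterator method (C# 2.0 iterators work on the .NET 2.0 framework; the method name is made up):
// Requires System.Collections.Generic.
static IEnumerable<byte> Concatenate(byte[] first, byte[] second)
{
    // Yield every byte of the first array, then continue straight into the second; nothing is copied.
    foreach (byte b in first)
        yield return b;
    foreach (byte b in second)
        yield return b;
}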
Your first option of making the first array large enough to contain the second array and using Array.CopyTo ends up being roughly the same as manually iterating over each item and making the assignment. Array.CopyTo() just makes it more concise.
Converting to string and back to an array will be horribly slow in contrast to the above, and would likely use more memory.