BitConverter.GetBytes in place - c#

I need to get values in UInt16 and UInt64 as Byte[]. At the moment I am using BitConverter.GetBytes, but this method gives me a new array instance each time.
I would like to use a method that allow me to "copy" those values to already existing arrays, something like:
.ToBytes(UInt64 value, Byte[] array, Int32 offset);
.ToBytes(UInt16 value, Byte[] array, Int32 offset);
I have been taking a look to the .NET source code with ILSpy, but I not very sure how this code works and how can I safely modify it to fit my requirement:
public unsafe static byte[] GetBytes(long value)
{
byte[] array = new byte[8];
fixed (byte* ptr = array)
{
*(long*)ptr = value;
}
return array;
}
Which would be the right way of accomplishing this?
Updated: I cannot use unsafe code. It should not create new array instances.

You can do like this:
static unsafe void ToBytes(ulong value, byte[] array, int offset)
{
fixed (byte* ptr = &array[offset])
*(ulong*)ptr = value;
}
Usage:
byte[] array = new byte[9];
ToBytes(0x1122334455667788, array, 1);
You can set offset only in bytes.
If you want managed way to do it:
static void ToBytes(ulong value, byte[] array, int offset)
{
byte[] valueBytes = BitConverter.GetBytes(value);
Array.Copy(valueBytes, 0, array, offset, valueBytes.Length);
}
Or you can fill values by yourself:
static void ToBytes(ulong value, byte[] array, int offset)
{
for (int i = 0; i < 8; i++)
{
array[offset + i] = (byte)value;
value >>= 8;
}
}

Now that .NET has added Span<T> support for better working with arrays, unmanaged memory, etc without excess allocations, they've also added System.Buffer.Binary.BinaryPrimitives.
This works as you would want, e.g. WriteUInt64BigEndian has this signature:
public static void WriteUInt64BigEndian (Span<byte> destination, ulong value);
which avoids allocating.

You say you want to avoid creating new arrays and you cannot use unsafe. Either use Ulugbek Umirov's answer with cached arrays (be careful with threading issues) or:
static void ToBytes(ulong value, byte[] array, int offset) {
unchecked {
array[offset + 0] = (byte)(value >> (8*7));
array[offset + 1] = (byte)(value >> (8*6));
array[offset + 2] = (byte)(value >> (8*5));
array[offset + 3] = (byte)(value >> (8*4));
//...
}
}

It seems that you wish to avoid, for some reason, creating any temporary new arrays. And you also want to avoid unsafe code.
You could pin the object and then copy to the array.
public static void ToBytes(ulong value, byte[] array, int offset)
{
GCHandle handle = GCHandle.Alloc(value, GCHandleType.Pinned);
try
{
Marshal.Copy(handle.AddrOfPinnedObject(), array, offset, 8);
}
finally
{
handle.Free();
}
}

BinaryWriter can be a good solution.
var writer = new BinaryWriter(new MemoryStream(yourbuffer, youroffset, yourbuffer.Length-youroffset));
writer.Write(someuint64);
It's useful when you need to convert a lot of data into a buffer continuously
var writer = new BinaryWriter(new MemoryStream(yourbuffer));
foreach(var value in yourints){
writer.Write(value);
}
or when you just want to write to a file, that's the best case to use BinaryWriter.
var writer = new BinaryWriter(yourFileStream);
foreach(var value in yourints){
writer.Write(value);
}

For anyone else that stumbles across this.
If Big Endian support is NOT required and Unsafe is allowed then I've written a library specifically to minimize GC allocations during serialization for single threaded serializers such as when serializing across TCP sockets.
Note: supports Little Endian ONLY example X86 / X64 (might add big endian support some day)
https://github.com/tcwicks/ChillX/tree/master/src/ChillX.Serialization
In the process also re-wrote BitConverter so that it works similar to BitPrimitives but with a lot of extras such as supporting arrays as well.
https://github.com/tcwicks/ChillX/blob/master/src/ChillX.Serialization/BitConverterExtended.cs
Random rnd = new Random();
RentedBuffer<byte> buffer = RentedBuffer<byte>.Shared.Rent(BitConverterExtended.SizeOfUInt64
+ (20 * BitConverterExtended.SizeOfUInt16)
+ (20 * BitConverterExtended.SizeOfTimeSpan)
+ (10 * BitConverterExtended.SizeOfSingle);
UInt64 exampleLong = long.MaxValue;
int startIndex = 0;
startIndex += BitConverterExtended.GetBytes(exampleLong, buffer.BufferSpan, startIndex);
UInt16[] shortArray = new UInt16[20];
for (int I = 0; I < shortArray.Length; I++) { shortArray[I] = (ushort)rnd.Next(0, UInt16.MaxValue); }
//When using reflection / expression trees CLR cannot distinguish between UInt16 and Int16 or Uint64 and Int64 etc...
//Therefore Uint methods are renamed.
startIndex += BitConverterExtended.GetBytesUShortArray(shortArray, buffer.BufferSpan, startIndex);
TimeSpan[] timespanArray = new TimeSpan[20];
for (int I = 0; I < timespanArray.Length; I++) { timespanArray[I] = TimeSpan.FromSeconds(rnd.Next(0, int.MaxValue)); }
startIndex += BitConverterExtended.GetBytes(timespanArray, buffer.BufferSpan, startIndex);
float[] floatArray = new float[10];
for (int I = 0; I < floatArray.Length; I++) { floatArray[I] = MathF.PI * rnd.Next(short.MinValue, short.MaxValue); }
startIndex += BitConverterExtended.GetBytes(floatArray, buffer.BufferSpan, startIndex);
//Do stuff with buffer and then
buffer.Return();
//Or
buffer = null;
//and let RentedBufferContract do this automatically
The underlying problem is actually two fold. Even if you use bitprimitives we still have the issue of the byte array buffers that we are writing to / reading from. The issue is further compounded if we have lots of array fields T[] for example int[]
Using standard BitConverter leads to situations like 80% time spent in GC.
bitprimitives is a much better but we still need to manage the byte[] buffers. If we create one byte[] buffer for each int in an array of 1000 ints that is 1000 byte[4] arrays to garbage collect which means rent a buffer big enough for all 1000 ints (int[4000]) from ArrayPool but then we have to manage returning these buffers.
Further complicating it is the fact that we have to track the length of each buffer that is actually used since ArrayPool will return arrays that are larger than that which is requested. System.Buffers.ArrayPool.Shared.Rent(4000); will usually return a byte[4096] or maybe even a byte[8192]
This serialization code:
uses a wrappers around ArrayPool in order make its usage transparent:
https://github.com/tcwicks/ChillX/blob/master/src/ChillX.Core/Structures/ManagedPool.cs
For managing renting and returning of ArrayPool buffers:
https://github.com/tcwicks/ChillX/blob/master/src/ChillX.Core/Structures/RentedBuffer.cs
With operator overloads on + operator to automatically assign from Span
For preventing memory leaks and also managing use and forget style guaranteed return of rented buffers:
https://github.com/tcwicks/ChillX/blob/master/src/ChillX.Core/Structures/RentedBufferContract.cs
Example:
//Assume ArrayProperty_Long is a long[] array.
RentedBuffer<long> ClonedArray = RentedBuffer<long>.Shared.Rent(ArrayProperty_Long.Length);
ClonedArray += ArrayProperty_Long;
ClonedArray will be returned to ArrayPool.Shared automatically when ClonedArray goes out of scope (handled by RentedBufferContract).
Or call ClonedArray.Return();
p.s. Any feedback is much appreciated.
Following benchmarks compare performance using rented buffers and this serializer implementation versus using MessagePack without rented buffers. Messagepack if micro benchmarked is a faster serializer. This performance difference is purely due to reducing GC collection overheads by pooling / renting buffers.
----------------------------------------------------------------------------------
Benchmarking ChillX Serializer: Test Object: Data class with 31 properties / fields of different types inlcuding multiple arrays of different types
Num Reps: 50000 - Array Size: 256
----------------------------------------------------------------------------------
Entity Size bytes: 20,678 - Count: 00,050,000 - Threads: 01 - Entities Per Second: 00,025,523 - Mbps: 503.32 - Time: 00:00:01.9589880
Entity Size bytes: 20,678 - Count: 00,050,000 - Threads: 01 - Entities Per Second: 00,026,129 - Mbps: 515.26 - Time: 00:00:01.9135886
Entity Size bytes: 20,678 - Count: 00,050,000 - Threads: 01 - Entities Per Second: 00,026,291 - Mbps: 518.46 - Time: 00:00:01.9018027
----------------------------------------------------------------------------------
Benchmarking MessagePack Serializer: Test Object: Data class with 31 properties / fields of different types inlcuding multiple arrays of different types
Num Reps: 50000 - Array Size: 256
----------------------------------------------------------------------------------
Entity Size bytes: 19,811 - Count: 00,050,000 - Threads: 01 - Entities Per Second: 00,012,386 - Mbps: 234.01 - Time: 00:00:04.0668261
Entity Size bytes: 19,811 - Count: 00,050,000 - Threads: 01 - Entities Per Second: 00,012,285 - Mbps: 232.10 - Time: 00:00:04.0523329
Entity Size bytes: 19,811 - Count: 00,050,000 - Threads: 01 - Entities Per Second: 00,012,642 - Mbps: 238.84 - Time: 00:00:03.9811276

Related

Better Efficiency for assembling integers from bytes

My application receives data from a serial port, which is send in packets. The packets are defined as following
1 byte - Identifier
2 bytes - lenght of data
n bytes - data
1 bytes - Checksum
For example if the length is specified as 508 there will be 508 bytes, which would be 127 uint32_t values.
Currently I use the following code to assemble the uint32_t values from the data that is sent in bytes:
private UInt32[] number_array = new UInt32[16384];
private void decodePacket(int startpos, byte[] data, int lenght)
{
/* Starting position */
int pos = startpos;
for(int i=0; i<lenght; i++)
{
/* Convert 4 bytes to one uint32_t value */
int value = data[i] | data[i + 1]<<8 | data[i + 2]<<16 | data[i + 3]<<24;
/* Write to array */
number_array[pos] = Convert.ToUInt32(value);
/* Advance i by 4 (bytes */
i += 4;
/* Advance pos */
pos++;
}
}
It does work fine, but I'm thinking it's very inefficient. There are usually 16384 uint32_t values to process, so this function is called a lot of times.
Is there a more efficient / faster way to do this?
Look at this simple code:
static void Main(string[] args)
{
byte[] data = new byte[]
{
1, 10, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 5
};
byte id = data[0];
byte[] len = new byte[4];
Array.Copy(data, 1, len, 0, 2);
int dataLen = BitConverter.ToInt32(len, 0);
byte[] dataRead = new byte[dataLen];
Array.Copy(data, 3, dataRead, 0, dataLen);
byte checksum = data[data.Length - 1];
Console.ReadKey();
}
Now, first you get identifier that is on the first pos.
Next, you get length of the data. You have to create 4 byte array to copy values from data array to this new array. It should be 4 bytes, because you would like to convert those bytes into Int32. You could have this array 2 bytes length and convert it to Int16, but Int32 should have better performance.
So, when you have length data in this len array, you can convert those values into Int32. But be careful of endianes. It may be different.
At the end you create an array that will contain all the REAL data. And then copy the real data to new array.
This solution should be faster than yours. But... Be careful of endianess.
Here is the best possible way using C#
private void decodePacket(int startpos, byte[] data, int lenght)
{
/* Starting position */
int pos = startpos;
for (int i = 0; i < lenght; i += 4)
{
/* Convert 4 bytes to one uint32_t value */
int value = BitConverter.ToInt32(data, i);
/* Write to array */
number_array[pos] = Convert.ToUInt32(value);
/* Advance pos */
pos++;
}
}
This is same code as in question but with two changes.
Index i was being incremented at two places which was resulting in increments of 5 instead of 4. Changed it to just one increment of 4.
Use of BitConveter instead of bitwise logic. Although it might not provide any significant performance boost. It is better to use BitConverter for platform independence.
UPDATE
In case your bytes are stored at 32-bit aligned memory addresses BitConverter provides you with maximum performance conversion. But in C# you cannot guarantee memory location alignment. In that case bit shifting is the only way.
In case of bit shifting BitConverter also uses same logic for little endian systems as shown in question. But, it can help keeping your code platform independent by using another bit shifting pattern for big endian systems.

c# - fastest method get integer from part of bits

I have byte[] byteArray, usually byteArray.Length = 1-3
I need decompose an array into bits, take some bits (for example, 5-17), and convert these bits to Int32.
I tried to do this
private static IEnumerable<bool> GetBitsStartingFromLSB(byte b)
{
for (int i = 0; i < 8; i++)
{
yield return (b % 2 != 0);
b = (byte)(b >> 1);
}
}
public static Int32 Bits2Int(ref byte[] source, int offset, int length)
{
List<bool> bools = source.SelectMany(GetBitsStartingFromLSB).ToList();
bools = bools.GetRange(offset, length);
bools.AddRange(Enumerable.Repeat(false, 32-length).ToList() );
int[] array = new int[1];
(new BitArray(bools.ToArray())).CopyTo(array, 0);
return array[0];
}
But this method is too slow, and I have to call it very often.
How can I do this more efficiently?
Thanx a lot! Now i do this:
public static byte[] GetPartOfByteArray( byte[] source, int offset, int length)
{
byte[] retBytes = new byte[length];
Buffer.BlockCopy(source, offset, retBytes, 0, length);
return retBytes;
}
public static Int32 Bits2Int(byte[] source, int offset, int length)
{
if (source.Length > 4)
{
source = GetPartOfByteArray(source, offset / 8, (source.Length - offset / 8 > 3 ? 4 : source.Length - offset / 8));
offset -= 8 * (offset / 8);
}
byte[] intBytes = new byte[4];
source.CopyTo(intBytes, 0);
Int32 full = BitConverter.ToInt32(intBytes);
Int32 mask = (1 << length) - 1;
return (full >> offset) & mask;
}
And it works very fast!
If you're after "fast", then ultimately you need to do this with bit logic, not LINQ etc. I'm not going to write actual code, but you'd need to:
use your offset with / 8 and % 8 to find the starting byte and the bit-offset inside that byte
compose however many bytes you need - quite possibly up to 5 if you are after a 32-bit number (because of the possibility of an offset)
; for example into a long, in whichever endianness (presumably big-endian?) you expect
use right-shift (>>) on the composed value to drop however-many bits you need to apply the bit-offset (i.e. value >>= offset % 8;)
mask out any bits you don't want; for example value &= ~(-1L << length); (the -1 gives you all-ones; the << length creates length zeros at the right hand edge, and the ~ swaps all zeros for ones and ones for zeros, so you now have length ones at the right hand edge)
if the value is signed, you'll need to think about how you want negatives to be handled, especially if you aren't always reading 32 bits
First of all, you're asking for optimization. But the only things you've said are:
too slow
need to call it often
There's no information on:
how much slow is too slow? have you measured current code? have you estimated how fast you need it to be?
how often is "often"?
how large are the source byt arrays?
etc.
Optimization can be done in a multitude of ways. When asking for optimization, everything is important. For example, if source byte[] is 1 or 2 bytes long (yeah, may be ridiculous, but you didn't tell us), and if it rarely changes, then you could get very nice results by caching results. And so on.
So, no solutions from me, just a list of possible performance problems:
private static IEnumerable<bool> GetBitsStartingFromLSB(byte b) // A
{
for (int i = 0; i < 8; i++)
{
yield return (b % 2 != 0); // A
b = (byte)(b >> 1);
}
}
public static Int32 Bits2Int(ref byte[] source, int offset, int length)
{
List<bool> bools = source.SelectMany(GetBitsStartingFromLSB).ToList(); //A,B
bools = bools.GetRange(offset, length); //B
bools.AddRange(Enumerable.Repeat(false, 32-length).ToList() ); //C
int[] array = new int[1]; //D
(new BitArray(bools.ToArray())).CopyTo(array, 0); //D
return array[0]; //D
}
A: LINQ is fun, but not fast unless done carefully. For each input byte, it takes 1 byte, splits that in 8 bools, passing them around wrapped it in a compiler-generated IEnumerable object *). Note that it all needs to be cleaned up later, too. Probably you'd get a better performance simply returning a new bool[8] or even BitArray(size=8).
*) conceptually. In fact yield-return is lazy, so it's not 8valueobj+1refobj, but just 1 enumerable that generates items. But then, you're doing .ToList() in (B), so me writing this in that way isn't that far from truth.
A2: the 8 is hardcoded. Once you drop that pretty IEnumerable and change it to a constant-sized array-like thing, you can preallocate that array and pass it via parameter to GetBitsStartingFromLSB to further reduce the amount of temporary objects created and later thrown away. And since SelectMany visits items one-by-one without ever going back, that preallocated array can be reused.
B: Converts whole Source array to stream of bytes, converts it to List. Then discards that whole list except for a small offset-length range of that list. Why covert to list at all? It's just another pack of objects wasted, and internal data is copied as well, since bool is a valuetype. You could have taken the range directly from IEnumerable by .Skip(X).Take(Y)
C: padding a list of bools to have 32 items. AddRange/Repeat is fun, but Repeat has to return an IEnumerable. It's again another object that is created and throw away. You're padding the list with false. Drop the list idea, make it an bool[32]. Or BitArray(32). They start with false automatically. That's the default value of a bool. Iterate over the those bits from 'range' A+B and write them into that array by index. Those written will have their value, those unwritten will stay false. Job done, no objects wasted.
C2: connect preallocating 32-item array with A+A2. GetBitsStartingFromLSB doesn't need to return anything, it may get a buffer to be filled via parameter. And that buffer doesn't need to be 8-item buffer. You may pass the whole 32-item final array, and pass an offset so that function knows exactly where to write. Even less objects wasted.
D: finally, all that work to return selected bits as an integer. new temporary array is created&wasted, new BitArray is effectively created&wasted too. Note that earlier you were already doing manual bit-shift conversion int->bits in GetBitsStartingFromLSB, why not just create a similar method that will do some shifts and do bits->int instead? If you knew the order of the bits, now you know them as well. No need for array&BitArray, some code wiggling, and you save on that allocations and data copying again.
I have no idea how much time/space/etc will that save for you, but that's just a few points that stand out at first glance, without modifying your original idea for the code too much, without doing-it-all via math&bitshifts in one go, etc. I've seen MarcGravell already wrote you some hints too. If you have time to spare, I suggest you try first mine, one by one, and see how (and if at all !) each change affects performance. Just to see. Then you'll probably scrap it all and try again new "do-it-all via math&bitshifts in one go" version with Marc's hints.

c# int[] array to byte[] array only LSB

What is fastest way to write int[] array to byte[] array, but only LSB (last 8 bits)? I used for loop and bitmask bArray[i] = (byte)(iArray[i] & 0xFF), but array is very long (+900k) and this take about 50ms. Do you know maybe other faster option?
You can try parallelizing the workload:
Parallel.For(0,iArray.Length, i => bArray[i] = (byte)(iArray[i] & 0xFF));
This will spawn multiple threads to do the conversion. It tends to be faster on my machine, but, will sometimes take longer due to the overhead of spawning multiple threads.
What are you doing that 50ms is too slow?
Part of the slowness comes from the need to read more data than you need: when you do
array[i] & 0xFF
you read the entire four bytes of an int, only to drop three most significant ones.
You could avoid this overhead with unsafe code. Keep in mind that the approach below assumes little endian architecture:
static unsafe void CopyLsbUnsafe(int[] from, byte[] to) {
fixed (int* s = from) {
fixed (byte* d = to) {
byte* sb = (byte*) s;
int* db = (int*)d;
int* end = db + to.Length/4;
while (db != end) {
*db++ = (*(sb + 0) << 0)
| (*(sb + 4) << 8)
| (*(sb + 8) << 16)
| (*(sb + 12) << 24);
sb += 16;
}
}
}
}
The above code re-interprets the int array as an array of bytes, and the array of bytes as an array of integers. Then it reads every 4-th byte into the destination array using a pointer, writing to the destination in groups of four bytes using an integer assignment.
My testing shows a respectable 60% improvement over a simple loop.

Convert 2 successive Bytes to one int value Increase speed in C#

I need to combine two Bytes into one int value.
I receive from my camera a 16bit Image were two successive bytes have the intensity value of one pixel. My goal is to combine these two bytes into one "int" vale.
I manage to do this using the following code:
for (int i = 0; i < VectorLength * 2; i = i + 2)
{
NewImageVector[ImagePointer] = ((int)(buffer.Array[i + 1]) << 8) | ((int)(buffer.Array[i]));
ImagePointer++;
}
My image is 1280*960 so VectorLength==1228800 and the incomming buffer size is 2*1228800=2457600 elements...
Is there any way that I can speed this up?
Maybe there is another way so I don't need to use a for-loop.
Thank you
You could use the equivalent to the union of c. Im not sure if faster, but more elegant:
[StructLayout(LayoutKind.Explicit)]
struct byte_array
{
[FieldOffset(0)]
public byte byte1;
[FieldOffset(1)]
public byte byte2;
[FieldOffset(0)]
public short int0;
}
use it like this:
byte_array ba = new byte_array();
//insert the two bytes
ba.byte1 = (byte)(buffer.Array[i]);
ba.byte2 = (byte)(buffer.Array[i + 1]);
//get the integer
NewImageVector[ImagePointer] = ba.int1;
You can fill your two bytes and use the int. To find the faster way take the StopWatch-Class and compare the two ways like this:
Stopwatch stopWatch = new Stopwatch();
stopWatch.Start();
//The code
stopWatch.Stop();
MessageBox.Show(stopWatch.ElapsedTicks.ToString()); //Or milliseconds ,...
Assuming you can (re-)define NewImageVector as a short[], and every two consecutive bytes in Buffer should be transformed into a short (which basically what you're doing now, only you cast to an int afterwards), you can use Buffer.BlockCopy to do it for you.
As the documentation tells, you Buffer.BlockCopy copies bytes from one array to another, so in order to copy your bytes in buffer you need to do the following:
Buffer.BlockCopy(Buffer, 0, NewImageVector, 0, [NumberOfExpectedShorts] * 2)
This tells BlockCopy that you want to start copying bytes from Buffer, starting at index 0, to NewImageVector starting at index 0, and you want to copy [NumberOfExpectedShorts] * 2 bytes (since every short is two bytes long).
No loops, but it does depend on the ability of using a short[] array instead of an int[] array (and indeed, on using an array to begin with).
Note that this also requires the bytes in Buffer to be in little-endian order (i.e. Buffer[index] contains the low byte, buffer[index + 1] the high byte).
You can achieve a small performance increase by using unsafe pointers to iterate the arrays. The following code assumes that source is the input byte array (buffer.Array in your case). It also assumes that source has an even number of elements. In production code you would obviously have to check these things.
int[] output = new int[source.Length / 2];
fixed (byte* pSource = source)
fixed (int* pDestination = output)
{
byte* sourceIterator = pSource;
int* destIterator = pDestination;
for (int i = 0; i < output.Length; i++)
{
(*destIterator) = ((*sourceIterator) | (*(sourceIterator + 1) << 8));
destIterator++;
sourceIterator += 2;
}
}
return output;

What is the equivalent of memset in C#?

I need to fill a byte[] with a single non-zero value. How can I do this in C# without looping through each byte in the array?
Update: The comments seem to have split this into two questions -
Is there a Framework method to fill a byte[] that might be akin to memset
What is the most efficient way to do it when we are dealing with a very large array?
I totally agree that using a simple loop works just fine, as Eric and others have pointed out. The point of the question was to see if I could learn something new about C# :) I think Juliet's method for a Parallel operation should be even faster than a simple loop.
Benchmarks:
Thanks to Mikael Svenson: http://techmikael.blogspot.com/2009/12/filling-array-with-default-value.html
It turns out the simple for loop is the way to go unless you want to use unsafe code.
Apologies for not being clearer in my original post. Eric and Mark are both correct in their comments; need to have more focused questions for sure. Thanks for everyone's suggestions and responses.
You could use Enumerable.Repeat:
byte[] a = Enumerable.Repeat((byte)10, 100).ToArray();
The first parameter is the element you want repeated, and the second parameter is the number of times to repeat it.
This is OK for small arrays but you should use the looping method if you are dealing with very large arrays and performance is a concern.
Actually, there is little known IL operation called Initblk (English version) which does exactly that. So, let's use it as a method that doesn't require "unsafe". Here's the helper class:
public static class Util
{
static Util()
{
var dynamicMethod = new DynamicMethod("Memset", MethodAttributes.Public | MethodAttributes.Static, CallingConventions.Standard,
null, new [] { typeof(IntPtr), typeof(byte), typeof(int) }, typeof(Util), true);
var generator = dynamicMethod.GetILGenerator();
generator.Emit(OpCodes.Ldarg_0);
generator.Emit(OpCodes.Ldarg_1);
generator.Emit(OpCodes.Ldarg_2);
generator.Emit(OpCodes.Initblk);
generator.Emit(OpCodes.Ret);
MemsetDelegate = (Action<IntPtr, byte, int>)dynamicMethod.CreateDelegate(typeof(Action<IntPtr, byte, int>));
}
public static void Memset(byte[] array, byte what, int length)
{
var gcHandle = GCHandle.Alloc(array, GCHandleType.Pinned);
MemsetDelegate(gcHandle.AddrOfPinnedObject(), what, length);
gcHandle.Free();
}
public static void ForMemset(byte[] array, byte what, int length)
{
for(var i = 0; i < length; i++)
{
array[i] = what;
}
}
private static Action<IntPtr, byte, int> MemsetDelegate;
}
And what is the performance? Here's my result for Windows/.NET and Linux/Mono (different PCs).
Mono/for: 00:00:01.1356610
Mono/initblk: 00:00:00.2385835
.NET/for: 00:00:01.7463579
.NET/initblk: 00:00:00.5953503
So it's worth considering. Note that the resulting IL will not be verifiable.
Building on Lucero's answer, here is a faster version. It will double the number of bytes copied using Buffer.BlockCopy every iteration. Interestingly enough, it outperforms it by a factor of 10 when using relatively small arrays (1000), but the difference is not that large for larger arrays (1000000), it is always faster though. The good thing about it is that it performs well even down to small arrays. It becomes faster than the naive approach at around length = 100. For a one million element byte array, it was 43 times faster.
(tested on Intel i7, .Net 2.0)
public static void MemSet(byte[] array, byte value) {
if (array == null) {
throw new ArgumentNullException("array");
}
int block = 32, index = 0;
int length = Math.Min(block, array.Length);
//Fill the initial array
while (index < length) {
array[index++] = value;
}
length = array.Length;
while (index < length) {
Buffer.BlockCopy(array, 0, array, index, Math.Min(block, length-index));
index += block;
block *= 2;
}
}
A little bit late, but the following approach might be a good compromise without reverting to unsafe code. Basically it initializes the beginning of the array using a conventional loop and then reverts to Buffer.BlockCopy(), which should be as fast as you can get using a managed call.
public static void MemSet(byte[] array, byte value) {
if (array == null) {
throw new ArgumentNullException("array");
}
const int blockSize = 4096; // bigger may be better to a certain extent
int index = 0;
int length = Math.Min(blockSize, array.Length);
while (index < length) {
array[index++] = value;
}
length = array.Length;
while (index < length) {
Buffer.BlockCopy(array, 0, array, index, Math.Min(blockSize, length-index));
index += blockSize;
}
}
Looks like System.Runtime.CompilerServices.Unsafe.InitBlock now does the same thing as the OpCodes.Initblk instruction that Konrad's answer mentions (he also mentioned a source link).
The code to fill in the array is as follows:
byte[] a = new byte[N];
byte valueToFill = 255;
System.Runtime.CompilerServices.Unsafe.InitBlock(ref a[0], valueToFill, (uint) a.Length);
This simple implementation uses successive doubling, and performs quite well (about 3-4 times faster than the naive version according to my benchmarks):
public static void Memset<T>(T[] array, T elem)
{
int length = array.Length;
if (length == 0) return;
array[0] = elem;
int count;
for (count = 1; count <= length/2; count*=2)
Array.Copy(array, 0, array, count, count);
Array.Copy(array, 0, array, count, length - count);
}
Edit: upon reading the other answers, it seems I'm not the only one with this idea. Still, I'm leaving this here, since it's a bit cleaner and it performs on par with the others.
If performance is critical, you could consider using unsafe code and working directly with a pointer to the array.
Another option could be importing memset from msvcrt.dll and use that. However, the overhead from invoking that might easily be larger than the gain in speed.
Or use P/Invoke way:
[DllImport("msvcrt.dll",
EntryPoint = "memset",
CallingConvention = CallingConvention.Cdecl,
SetLastError = false)]
public static extern IntPtr MemSet(IntPtr dest, int c, int count);
static void Main(string[] args)
{
byte[] arr = new byte[3];
GCHandle gch = GCHandle.Alloc(arr, GCHandleType.Pinned);
MemSet(gch.AddrOfPinnedObject(), 0x7, arr.Length);
}
With the advent of Span<T> (which is dotnet core only, but it is the future of dotnet) you have yet another way of solving this problem:
var array = new byte[100];
var span = new Span<byte>(array);
span.Fill(255);
If performance is absolutely critical, then Enumerable.Repeat(n, m).ToArray() will be too slow for your needs. You might be able to crank out faster performance using PLINQ or Task Parallel Library:
using System.Threading.Tasks;
// ...
byte initialValue = 20;
byte[] data = new byte[size]
Parallel.For(0, size, index => data[index] = initialValue);
All answers are writing single bytes only - what if you want to fill a byte array with words? Or floats? I find use for that now and then. So after having written similar code to 'memset' in a non-generic way a few times and arriving at this page to find good code for single bytes, I went about writing the method below.
I think PInvoke and C++/CLI each have their drawbacks. And why not have the runtime 'PInvoke' for you into mscorxxx? Array.Copy and Buffer.BlockCopy are native code certainly. BlockCopy isn't even 'safe' - you can copy a long halfway over another, or over a DateTime as long as they're in arrays.
At least I wouldn't go file new C++ project for things like this - it's a waste of time almost certainly.
So here's basically an extended version of the solutions presented by Lucero and TowerOfBricks that can be used to memset longs, ints, etc as well as single bytes.
public static class MemsetExtensions
{
static void MemsetPrivate(this byte[] buffer, byte[] value, int offset, int length) {
var shift = 0;
for (; shift < 32; shift++)
if (value.Length == 1 << shift)
break;
if (shift == 32 || value.Length != 1 << shift)
throw new ArgumentException(
"The source array must have a length that is a power of two and be shorter than 4GB.", "value");
int remainder;
int count = Math.DivRem(length, value.Length, out remainder);
var si = 0;
var di = offset;
int cx;
if (count < 1)
cx = remainder;
else
cx = value.Length;
Buffer.BlockCopy(value, si, buffer, di, cx);
if (cx == remainder)
return;
var cachetrash = Math.Max(12, shift); // 1 << 12 == 4096
si = di;
di += cx;
var dx = offset + length;
// doubling up to 1 << cachetrash bytes i.e. 2^12 or value.Length whichever is larger
for (var al = shift; al <= cachetrash && di + (cx = 1 << al) < dx; al++) {
Buffer.BlockCopy(buffer, si, buffer, di, cx);
di += cx;
}
// cx bytes as long as it fits
for (; di + cx <= dx; di += cx)
Buffer.BlockCopy(buffer, si, buffer, di, cx);
// tail part if less than cx bytes
if (di < dx)
Buffer.BlockCopy(buffer, si, buffer, di, dx - di);
}
}
Having this you can simply add short methods to take the value type you need to memset with and call the private method, e.g. just find replace ulong in this method:
public static void Memset(this byte[] buffer, ulong value, int offset, int count) {
var sourceArray = BitConverter.GetBytes(value);
MemsetPrivate(buffer, sourceArray, offset, sizeof(ulong) * count);
}
Or go silly and do it with any type of struct (although the MemsetPrivate above only works for structs that marshal to a size that is a power of two):
public static void Memset<T>(this byte[] buffer, T value, int offset, int count) where T : struct {
var size = Marshal.SizeOf<T>();
var ptr = Marshal.AllocHGlobal(size);
var sourceArray = new byte[size];
try {
Marshal.StructureToPtr<T>(value, ptr, false);
Marshal.Copy(ptr, sourceArray, 0, size);
} finally {
Marshal.FreeHGlobal(ptr);
}
MemsetPrivate(buffer, sourceArray, offset, count * size);
}
I changed the initblk mentioned before to take ulongs to compare performance with my code and that silently fails - the code runs but the resulting buffer contains the least significant byte of the ulong only.
Nevertheless I compared the performance writing as big a buffer with for, initblk and my memset method. The times are in ms total over 100 repetitions writing 8 byte ulongs whatever how many times fit the buffer length. The for version is manually loop-unrolled for the 8 bytes of a single ulong.
Buffer Len #repeat For millisec Initblk millisec Memset millisec
0x00000008 100 For 0,0032 Initblk 0,0107 Memset 0,0052
0x00000010 100 For 0,0037 Initblk 0,0102 Memset 0,0039
0x00000020 100 For 0,0032 Initblk 0,0106 Memset 0,0050
0x00000040 100 For 0,0053 Initblk 0,0121 Memset 0,0106
0x00000080 100 For 0,0097 Initblk 0,0121 Memset 0,0091
0x00000100 100 For 0,0179 Initblk 0,0122 Memset 0,0102
0x00000200 100 For 0,0384 Initblk 0,0123 Memset 0,0126
0x00000400 100 For 0,0789 Initblk 0,0130 Memset 0,0189
0x00000800 100 For 0,1357 Initblk 0,0153 Memset 0,0170
0x00001000 100 For 0,2811 Initblk 0,0167 Memset 0,0221
0x00002000 100 For 0,5519 Initblk 0,0278 Memset 0,0274
0x00004000 100 For 1,1100 Initblk 0,0329 Memset 0,0383
0x00008000 100 For 2,2332 Initblk 0,0827 Memset 0,0864
0x00010000 100 For 4,4407 Initblk 0,1551 Memset 0,1602
0x00020000 100 For 9,1331 Initblk 0,2768 Memset 0,3044
0x00040000 100 For 18,2497 Initblk 0,5500 Memset 0,5901
0x00080000 100 For 35,8650 Initblk 1,1236 Memset 1,5762
0x00100000 100 For 71,6806 Initblk 2,2836 Memset 3,2323
0x00200000 100 For 77,8086 Initblk 2,1991 Memset 3,0144
0x00400000 100 For 131,2923 Initblk 4,7837 Memset 6,8505
0x00800000 100 For 263,2917 Initblk 16,1354 Memset 33,3719
I excluded the first call every time, since both initblk and memset take a hit of I believe it was about .22ms for the first call. Slightly surprising my code is faster for filling short buffers than initblk, seeing it got half a page full of setup code.
If anybody feels like optimizing this, go ahead really. It's possible.
Tested several ways, described in different answers.
See sources of test in c# test class
You could do it when you initialize the array but I don't think that's what you are asking for:
byte[] myBytes = new byte[5] { 1, 1, 1, 1, 1};
.NET Core has a built-in Array.Fill() function, but sadly .NET Framework is missing it. .NET Core has two variations: fill the entire array and fill a portion of the array starting at an index.
Building on the ideas above, here is a more generic Fill function that will fill the entire array of several data types. This is the fastest function when benchmarking against other methods discussed in this post.
This function, along with the version that fills a portion an array are available in an open source and free NuGet package (HPCsharp on nuget.org). Also included is a slightly faster version of Fill using SIMD/SSE instructions that performs only memory writes, whereas BlockCopy-based methods perform memory reads and writes.
public static void FillUsingBlockCopy<T>(this T[] array, T value) where T : struct
{
int numBytesInItem = 0;
if (typeof(T) == typeof(byte) || typeof(T) == typeof(sbyte))
numBytesInItem = 1;
else if (typeof(T) == typeof(ushort) || typeof(T) != typeof(short))
numBytesInItem = 2;
else if (typeof(T) == typeof(uint) || typeof(T) != typeof(int))
numBytesInItem = 4;
else if (typeof(T) == typeof(ulong) || typeof(T) != typeof(long))
numBytesInItem = 8;
else
throw new ArgumentException(string.Format("Type '{0}' is unsupported.", typeof(T).ToString()));
int block = 32, index = 0;
int endIndex = Math.Min(block, array.Length);
while (index < endIndex) // Fill the initial block
array[index++] = value;
endIndex = array.Length;
for (; index < endIndex; index += block, block *= 2)
{
int actualBlockSize = Math.Min(block, endIndex - index);
Buffer.BlockCopy(array, 0, array, index * numBytesInItem, actualBlockSize * numBytesInItem);
}
}
Most of answers is for byte memset but if you want to use it for float or any other struct you should multiply index by size of your data. Because Buffer.BlockCopy will copy based on the bytes.
This code will be work for float values
public static void MemSet(float[] array, float value) {
if (array == null) {
throw new ArgumentNullException("array");
}
int block = 32, index = 0;
int length = Math.Min(block, array.Length);
//Fill the initial array
while (index < length) {
array[index++] = value;
}
length = array.Length;
while (index < length) {
Buffer.BlockCopy(array, 0, array, index * sizeof(float), Math.Min(block, length-index)* sizeof(float));
index += block;
block *= 2;
}
}
The Array object has a method called Clear. I'm willing to bet that the Clear method is faster than any code you can write in C#.
you can try the following codes, it will work.
byte[] test = new byte[65536];
Array.Clear(test,0,test.Length);

Categories