I want to fill an array with random values. The code I wrote is this one:
public class PersonalityMap
{
const int size = 16;
byte[,] fullMap = new byte[size, size];
/// <summary>
/// Generates a random map
/// </summary>
public PersonalityMap()
{
Random random = new Random();
byte[] row = new byte[size];
for (int i = 0; i < size; i++)
{
random.NextBytes(row);
for (int j = 0; j < size; j++)
fullMap[i, j] = row[j];
}
}
}
But I feel there's a way to do it faster.
Well, you could create one single-dimensional array, fill that, and then copy it into the two-dimensional array with Buffer.BlockCopy:
Random random = new Random();
byte[] row = new byte[size * size];
random.NextBytes(row);
Buffer.BlockCopy(row, 0, fullMap, 0, size * size); // the count is in bytes, which equals the element count here since each element is one byte
However, before you try to optimize even further - just how quick do you need this to be? Have you benchmarked your application and determined that this constructor is actually a bottleneck?
Jagged arrays (arrays of arrays) are generally considered faster than multidimensional arrays, but the gain here would be a few milliseconds at most. This micro-optimization is probably not worth your time.
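If you do want to try it, a jagged version might look like this (a sketch; each row can be filled with NextBytes directly, which also removes the copying step):
byte[][] fullMap = new byte[size][];
Random random = new Random();
for (int i = 0; i < size; i++)
{
    fullMap[i] = new byte[size];  // each row is an independent array
    random.NextBytes(fullMap[i]); // fill the whole row in one call
}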
I am iterating through an array of bytes and adding the values of another array of bytes to it in a for loop.
var random = new Random();
byte[] bytes = new byte[20_000_000];
byte[] bytes2 = new byte[20_000_000];
for (int i = 0; i < bytes.Length; i++)
{
bytes[i] = (byte)random.Next(255);
}
for (int i = 0; i < bytes.Length; i++)
{
bytes2[i] = (byte)random.Next(255);
}
//how to optimize the part below
for (int i = 0; i < bytes.Length; i++)
{
bytes[i] += bytes2[i];
}
Is there any way to speed up the process, so it can be faster than linear?
You could use Vector:
// requires: using System.Numerics; using System.Runtime.InteropServices;
static void Add(Span<byte> dst, ReadOnlySpan<byte> src)
{
Span<Vector<byte>> dstVec = MemoryMarshal.Cast<byte, Vector<byte>>(dst);
ReadOnlySpan<Vector<byte>> srcVec = MemoryMarshal.Cast<byte, Vector<byte>>(src);
for (int i = 0; i < dstVec.Length; ++i)
{
dstVec[i] += srcVec[i];
}
for (int i = dstVec.Length * Vector<byte>.Count; i < dst.Length; ++i)
{
dst[i] += src[i];
}
}
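Usage with the arrays from the question would be:
Add(bytes, bytes2);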
It will go even faster if you use a pointer here to align one of your arrays.
Pad the array length to the next highest multiple of 8. (It already is in your example.)
Use an unsafe context to obtain two ulong pointers to the start of the existing byte arrays, then use a for loop that iterates bytes.Length / 8 times, adding 8 bytes at a time.
On my system this runs in less than 13 milliseconds, compared to 105 milliseconds for the original code.
You must add the /unsafe option to use this code. Open the project properties and select "allow unsafe code".
var random = new Random();
byte[] bytes = new byte[20_000_000];
byte[] bytes2 = new byte[20_000_000];
int Len = bytes.Length >> 3; // >>3 is the same as / 8
ulong MASK = 0x8080808080808080;
ulong MASKINV = 0x7f7f7f7f7f7f7f7f;
//Sanity check
if((bytes.Length & 7) != 0) throw new Exception("bytes.Length is not a multiple of 8");
if((bytes2.Length & 7) != 0) throw new Exception("bytes2.Length is not a multiple of 8");
unsafe
{
//Add 8 bytes at a time, taking into account overflow between bytes
fixed (byte* pbBytes = &bytes[0])
fixed (byte* pbBytes2 = &bytes2[0])
{
ulong* pBytes = (ulong*)pbBytes;
ulong* pBytes2 = (ulong*)pbBytes2;
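// SWAR trick: add the low 7 bits of every byte in a single ulong addition,
// then restore each byte's top bit with XOR so carries never cross into the next byte.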
for (int i = 0; i < Len; i++)
{
pBytes[i] = ((pBytes2[i] & MASKINV) + (pBytes[i] & MASKINV)) ^ ((pBytes[i] ^ pBytes2[i]) & MASK);
}
}
}
You can utilize all your processors/cores, assuming that your machine has more than one.
// requires: using System.Threading.Tasks; using System.Collections.Concurrent;
Parallel.ForEach(Partitioner.Create(0, bytes.Length), range =>
{
for (int i = range.Item1; i < range.Item2; i++)
{
bytes[i] += bytes2[i];
}
});
Update: The Vector<T> class can also be used in .NET Framework; it requires the package System.Numerics.Vectors. It offers parallelization within a single core by issuing a single instruction over multiple data (SIMD). Most current processors are SIMD-enabled. Hardware acceleration is only enabled for 64-bit processes, so the [Prefer 32-bit] flag must be unchecked. In 32-bit processes the property Vector.IsHardwareAccelerated returns false and the performance is poor.
using System.Numerics;
/// <summary>Adds each pair of elements in two arrays, and replaces the
/// left array element with the result.</summary>
public static void Add_UsingVector(byte[] left, byte[] right, int start, int length)
{
int i = start;
int step = Vector<byte>.Count; // typically 16 or 32, depending on the hardware
int end = start + length - step + 1;
for (; i < end; i += step)
{
// Vectorize 16 bytes from each array
var vector1 = new Vector<byte>(left, i);
var vector2 = new Vector<byte>(right, i);
vector1 += vector2; // Vector arithmetic is always unchecked (it wraps on overflow)
vector1.CopyTo(left, i);
}
for (; i < start + length; i++) // Process the last few elements
{
unchecked { left[i] += right[i]; }
}
}
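Called on the arrays from the question, for example:
Add_UsingVector(bytes, bytes2, 0, bytes.Length);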
This runs 4-5 times faster than a simple loop, without utilizing more than one thread (25% CPU consumption in a 4-core PC).
I am using a segmented integer counter (a byte array) in a parallel CTR implementation (encryption). The counter needs to be incremented in steps corresponding to the blocks of data being processed by each processor, so it must be increased by more than one at a time. In other words, the byte array is acting as a big integer, and I need to increase its value by an arbitrary integer amount.
I am currently using the methods shown below, built around a while loop, but I am wondering if there is a bitwise method (& | ^ etc.), as using a loop seems very wasteful. Any ideas?
private void Increment(byte[] Counter)
{
int j = Counter.Length;
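// Walk backward from the last byte (the LSB): increment it; if it wrapped to 0, carry into the next byte.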
while (--j >= 0 && ++Counter[j] == 0) { }
}
/// <summary>
/// Increase a byte array by a numerical value
/// </summary>
/// <param name="Counter">Original byte array</param>
/// <param name="Count">Number to increase by</param>
/// <returns>Array with increased value [byte[]]</returns>
private byte[] Increase(byte[] Counter, Int32 Count)
{
byte[] buffer = new byte[Counter.Length];
Buffer.BlockCopy(Counter, 0, buffer, 0, Counter.Length);
for (int i = 0; i < Count; i++)
Increment(buffer);
return buffer;
}
The standard O(n) multi-precision add goes like this (assuming [0] is LSB):
static void Add(byte[] dst, byte[] src)
{
    int carry = 0;
    for (int i = 0; i < dst.Length; ++i)
    {
        byte odst = dst[i];
        byte osrc = i < src.Length ? src[i] : (byte)0;
        int sum = odst + osrc + carry; // compute in int so overflow is easy to detect
        dst[i] = (byte)sum;
        carry = sum >> 8;              // a (ndst < odst) check would miss the osrc == 255, carry == 1 case
    }
}
It can help to think of this in terms of grade-school arithmetic, which is really all it is:
  129
+ 123
-----
  252
Remember how you'd perform an add-and-carry for each digit, starting from the least-significant digit? In this case, each digit is a byte in your array (with the least-significant byte at index [0]).
Rather than roll your own, have you considered using the arbitrary-precision BigInteger? It was actually created specifically for .NET's own crypto code.
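For example, a minimal sketch using BigInteger (the method name is mine; this assumes, as in your code, a big-endian counter with the least-significant byte at the end, while BigInteger itself expects little-endian bytes):
// requires: using System; using System.Numerics;
private static byte[] IncreaseViaBigInteger(byte[] counter, int count)
{
    // BigInteger reads bytes little-endian; reverse the big-endian counter and
    // leave one extra zero byte at the little-endian top so the value stays positive.
    byte[] le = new byte[counter.Length + 1];
    for (int i = 0; i < counter.Length; i++)
        le[i] = counter[counter.Length - 1 - i];
    BigInteger value = new BigInteger(le) + count;
    // Convert back, truncating to the original counter width (wraps on overflow).
    byte[] valueBytes = value.ToByteArray(); // little-endian, minimal length
    byte[] result = new byte[counter.Length];
    for (int i = 0; i < result.Length && i < valueBytes.Length; i++)
        result[result.Length - 1 - i] = valueBytes[i];
    return result;
}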
I used a variation of Cory Nelson's answer that creates the array with the correct endian order and returns a new array. This is quite a bit faster than the method first posted. Thanks for the help.
private byte[] Increase(byte[] Counter, int Count)
{
    byte[] buffer = new byte[Counter.Length];
    Buffer.BlockCopy(Counter, 0, buffer, 0, Counter.Length);
    int offset = buffer.Length - 1;
    byte[] cnt = BitConverter.GetBytes(Count); // little-endian on x86: cnt[0] is the LSB
    int carry = 0;
    for (int i = offset; i >= 0; i--) // include index 0 so a carry can reach the top byte
    {
        byte osrc = offset - i < cnt.Length ? cnt[offset - i] : (byte)0;
        int sum = buffer[i] + osrc + carry; // compute in int to detect overflow reliably
        buffer[i] = (byte)sum;
        carry = sum >> 8;
    }
    return buffer;
}
I would like to randomize the lines in a file which has over 32 million lines of 10-digit strings. I am aware of how to do it with File.ReadAllLines(...).OrderBy(s => random.Next()).ToArray(), but this is not memory efficient, since it will load everything into memory (over 1.4 GB), and it works only on x64 architecture.
The alternative would be to split the file, randomize the shorter files, and then merge them, but I was wondering if there is a better way to do this.
Your current approach will allocate at least 2 large string arrays (probably more -- I don't know how OrderBy is implemented, but it probably does its own allocations).
If you randomize the data "in-place" by doing random permutations between the lines (e.g. using the Fisher-Yates shuffle), it will minimize the memory usage. Of course it will still be large if the file is large, but you won't allocate more memory than necessary.
EDIT: if all lines have the same length (*), it means you can do random access to a given line in the file, so you can do the Fisher-Yates shuffle directly in the file.
(*) and assuming you're not using an encoding where characters can have different byte lengths, like UTF-8
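A minimal sketch of that in-file shuffle, assuming fixed-width 12-byte records (10 digits plus "\r\n") and an illustrative file name:
// requires: using System; using System.IO;
const int recordSize = 12; // 10 digits + "\r\n" (assumed fixed-width lines)
var random = new Random();
using (var fs = new FileStream("BigFile.txt", FileMode.Open, FileAccess.ReadWrite))
{
    long count = fs.Length / recordSize;
    var a = new byte[recordSize];
    var b = new byte[recordSize];
    for (long i = count - 1; i > 0; i--)
    {
        long j = (long)(random.NextDouble() * (i + 1)); // random index in [0, i]
        fs.Position = i * recordSize; fs.Read(a, 0, recordSize);  // read record i
        fs.Position = j * recordSize; fs.Read(b, 0, recordSize);  // read record j
        fs.Position = i * recordSize; fs.Write(b, 0, recordSize); // swap the two records
        fs.Position = j * recordSize; fs.Write(a, 0, recordSize);
    }
}
The seeks make this much slower than an in-memory shuffle, but memory usage stays constant regardless of file size.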
This application demonstrates what you want, using a byte array:
It creates a file with the zero-padded numbers 0 to 31,999,999.
It loads the file, then shuffles the records in memory using a block-copying Fisher-Yates method.
Finally, it writes the file back out in shuffled order.
Peak memory usage is about 400 MB. It runs in about 20 seconds on my machine (mostly file I/O).
public class Program
{
private static Random random = new Random();
public static void Main(string[] args)
{
// create massive file
const int lineCount = 32000000;
var file = File.CreateText("BigFile.txt");
for (var i = 0; i < lineCount ; i++)
{
file.WriteLine("{0}",i.ToString("D10"));
}
file.Close();
int sizeOfRecord = 12; // 10 digits + "\r\n" line terminator
var loadedLines = File.ReadAllBytes("BigFile.txt");
ShuffleByteArray(loadedLines, lineCount, sizeOfRecord);
File.WriteAllBytes("BigFile2.txt", loadedLines);
}
private static void ShuffleByteArray(byte[] byteArray, int lineCount, int sizeOfRecord)
{
var temp = new byte[sizeOfRecord];
for (int i = lineCount - 1; i > 0; i--)
{
int j = random.Next(0, i + 1);
// copy i to temp
Buffer.BlockCopy(byteArray, sizeOfRecord * i, temp, 0, sizeOfRecord);
// copy j to i
Buffer.BlockCopy(byteArray, sizeOfRecord * j, byteArray, sizeOfRecord * i, sizeOfRecord);
// copy temp to j
Buffer.BlockCopy(temp, 0, byteArray, sizeOfRecord * j, sizeOfRecord);
}
}
}
How do you write bits (not bytes) to a file with C#/.NET? I'm pretty stuck on it.
Edit: I'm looking for a different way than just writing every 8 bits as a byte.
The smallest amount of data you can write at one time is a byte.
If you need to write individual bit values (for instance, a binary format that requires a 1-bit flag, a 3-bit integer and a 4-bit integer), you need to buffer the values in memory and write to the file once you have a whole byte. (For performance, it makes sense to buffer more and write larger chunks to the file.)
Accumulate the bits in a buffer (a single byte can qualify as a "buffer")
When adding a bit, left-shift the buffer and put the new bit in the lowest position using OR
Once the buffer is full, append it to the file, as in the sketch below
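Something like this, writing the most-significant bit first (a sketch; the class and names are mine, not a framework API):
// requires: using System.IO;
class BitWriter
{
    private readonly Stream output;
    private byte buffer;       // accumulates up to 8 bits
    private int bitsInBuffer;
    public BitWriter(Stream output) { this.output = output; }
    public void WriteBit(bool bit)
    {
        buffer = (byte)((buffer << 1) | (bit ? 1 : 0)); // shift left, OR the new bit into the lowest position
        if (++bitsInBuffer == 8) Flush();               // a whole byte is ready: append it
    }
    public void Flush() // pads a trailing partial byte with zero bits
    {
        if (bitsInBuffer == 0) return;
        output.WriteByte((byte)(buffer << (8 - bitsInBuffer)));
        buffer = 0;
        bitsInBuffer = 0;
    }
}
Remember to call Flush at the end so a trailing partial byte isn't lost.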
I've made something like this to emulate a BitsWriter (inside a class derived from BinaryWriter, hence the base.BaseStream call).
// requires: using System.Collections;
private BitArray bitBuffer = new BitArray(new byte[65536]);
private int bitCount = 0;
// Write one int. In my code, this is a byte
public void write(int b)
{
    // BitArray(int) would create an array of length b, so wrap the byte in an array instead
    BitArray bA = new BitArray(new byte[] { (byte)b });
    writeBitArray(bA);
}
// Write one bit. In my code, this is a binary value, and the amount of times
public void write(int b, int len)
{
    BitArray bA = new BitArray(len);
    for (int i = 0; i < len; i++)
    {
        bA.Set(i, (b == 1));
    }
    writeBitArray(bA);
}
private void writeBitArray(BitArray bA)
{
    for (int i = 0; i < bA.Length; i++)
    {
        bitBuffer.Set(bitCount, bA[i]); // index with bitCount alone; bitCount + i would skip positions
        bitCount++;
    }
    if (bitCount % 8 == 0)
    {
        byte[] res = new byte[bitBuffer.Count / 8];
        bitBuffer.CopyTo(res, 0);
        // write only the bytes that were actually filled
        base.BaseStream.Write(res, 0, bitCount / 8);
        bitCount = 0;
    }
}
You will have to use bitshifts or binary arithmetic, as you can only write one byte at a time, not individual bits.
I would like to get a byte[] from a float[] as quickly as possible, without looping through the whole array (via a cast, probably). Unsafe code is fine. Thanks!
I am looking for a byte array four times as long as the float array (since each float is composed of 4 bytes). I'll pass this to a BinaryWriter.
EDIT:
To those critics screaming "premature optimization":
I have benchmarked this using ANTS profiler before I optimized. There was a significant speed increase because the file has a write-through cache and the float array is sized exactly to match the sector size on the disk. The binary writer wraps a file handle created with the P/Invoked Win32 API. The optimization helps because it reduces the number of function calls.
And, with regard to memory, this application creates massive caches which use plenty of memory. I can allocate the byte buffer once and re-use it many times--the double memory usage in this particular instance amounts to a roundoff error in the overall memory consumption of the app.
So I guess the lesson here is not to make premature assumptions ;)
There is a dirty but fast way (without unsafe code) of doing this:
[StructLayout(LayoutKind.Explicit)]
struct BytetoDoubleConverter
{
[FieldOffset(0)]
public Byte[] Bytes;
[FieldOffset(0)]
public Double[] Doubles;
}
//...
static Double Sum(byte[] data)
{
BytetoDoubleConverter convert = new BytetoDoubleConverter { Bytes = data };
Double result = 0;
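// Note: Doubles.Length still reports the byte count here (the length field was stored
// when the array was allocated as a byte[]), hence the division by sizeof(Double).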
for (int i = 0; i < convert.Doubles.Length / sizeof(Double); i++)
{
result += convert.Doubles[i];
}
return result;
}
This will work, but I'm not sure of the support on Mono or newer versions of the CLR. The only strange thing is that array.Length reports the length in bytes: the length field stored with the array was written when it was allocated as a byte array, so it still holds the byte count. The indexer does account for a Double being eight bytes, so no scaling is needed there.
I've looked for it some more, and it's actually described on MSDN, How to: Create a C/C++ Union by Using Attributes (C# and Visual Basic), so chances are this will be supported in future versions. I am not sure about Mono though.
Premature optimization is the root of all evil! @Vlad's suggestion to iterate over each float is a much more reasonable answer than switching to a byte[]. Take the following table of runtimes for increasing numbers of elements (average of 50 runs):
Elements BinaryWriter(float) BinaryWriter(byte[])
-----------------------------------------------------------
10 8.72ms 8.76ms
100 8.94ms 8.82ms
1000 10.32ms 9.06ms
10000 32.56ms 10.34ms
100000 213.28ms 739.90ms
1000000 1955.92ms 10668.56ms
There is little difference between the two for small numbers of elements. Once you get into the huge number of elements range, the time spent copying from the float[] to the byte[] far outweighs the benefits.
So go with what is simple:
float[] data = new float[...];
foreach(float value in data)
{
writer.Write(value);
}
There is a way which avoids memory copying and iteration.
You can use a really ugly hack to temporarily change your array to another type using (unsafe) memory manipulation.
I tested this hack on both 32- and 64-bit OSes, so it should be portable.
The source + sample usage is maintained at https://gist.github.com/1050703, but for your convenience I'll paste it here as well:
public static unsafe class FastArraySerializer
{
[StructLayout(LayoutKind.Explicit)]
private struct Union
{
[FieldOffset(0)] public byte[] bytes;
[FieldOffset(0)] public float[] floats;
}
[StructLayout(LayoutKind.Sequential, Pack = 1)]
private struct ArrayHeader
{
public UIntPtr type;
public UIntPtr length;
}
private static readonly UIntPtr BYTE_ARRAY_TYPE;
private static readonly UIntPtr FLOAT_ARRAY_TYPE;
static FastArraySerializer()
{
fixed (void* pBytes = new byte[1])
fixed (void* pFloats = new float[1])
{
BYTE_ARRAY_TYPE = getHeader(pBytes)->type;
FLOAT_ARRAY_TYPE = getHeader(pFloats)->type;
}
}
public static void AsByteArray(this float[] floats, Action<byte[]> action)
{
if (floats.handleNullOrEmptyArray(action))
return;
var union = new Union {floats = floats};
union.floats.toByteArray();
try
{
action(union.bytes);
}
finally
{
union.bytes.toFloatArray();
}
}
public static void AsFloatArray(this byte[] bytes, Action<float[]> action)
{
if (bytes.handleNullOrEmptyArray(action))
return;
var union = new Union {bytes = bytes};
union.bytes.toFloatArray();
try
{
action(union.floats);
}
finally
{
union.floats.toByteArray();
}
}
public static bool handleNullOrEmptyArray<TSrc,TDst>(this TSrc[] array, Action<TDst[]> action)
{
if (array == null)
{
action(null);
return true;
}
if (array.Length == 0)
{
action(new TDst[0]);
return true;
}
return false;
}
private static ArrayHeader* getHeader(void* pBytes)
{
return (ArrayHeader*)pBytes - 1;
}
private static void toFloatArray(this byte[] bytes)
{
fixed (void* pArray = bytes)
{
var pHeader = getHeader(pArray);
pHeader->type = FLOAT_ARRAY_TYPE;
pHeader->length = (UIntPtr)(bytes.Length / sizeof(float));
}
}
private static void toByteArray(this float[] floats)
{
fixed(void* pArray = floats)
{
var pHeader = getHeader(pArray);
pHeader->type = BYTE_ARRAY_TYPE;
pHeader->length = (UIntPtr)(floats.Length * sizeof(float));
}
}
}
And the usage is:
var floats = new float[] {0, 1, 0, 1};
floats.AsByteArray(bytes =>
{
foreach (var b in bytes)
{
Console.WriteLine(b);
}
});
If you do not want any conversion to happen, I would suggest Buffer.BlockCopy().
public static void BlockCopy(
Array src,
int srcOffset,
Array dst,
int dstOffset,
int count
)
For example:
float[] floatArray = new float[1000];
byte[] byteArray = new byte[floatArray.Length * 4];
Buffer.BlockCopy(floatArray, 0, byteArray, 0, byteArray.Length);
You're better off letting the BinaryWriter do this for you. There's going to be iteration over your entire set of data regardless of which method you use, so there's no point in playing with bytes.
Although you can obtain a byte* pointer using unsafe and fixed, you cannot convert the byte* to byte[] for the writer to accept it as a parameter without copying the data. And you do not want to do that, as it will double your memory footprint and add an extra iteration over the inevitable iteration needed to output the data to disk.
Instead, you are still better off iterating over the array of floats and writing each one to the writer individually, using the Write(float) method. It will still be fast because of buffering inside the writer. See sixlettervariables's numbers.
Using the new Span<T> in .NET Core 2.1 or later (requires using System.Runtime.InteropServices)...
byte[] byteArray2 = MemoryMarshal.Cast<float, byte>(floatArray).ToArray();
Or, if a Span can be used instead, then a direct reinterpret cast can be done (very fast - zero copying):
Span<byte> byteSpan = MemoryMarshal.Cast<float, byte>(floatArray);
// with a span we can get a byte, set a byte, iterate, and more.
byte someByte = byteSpan[2];
byteSpan[2] = 33;
I did some crude benchmarks. The time taken for each is in the comments. [release/no debugger/x64]
float[] floatArray = new float[100];
for (int i = 0; i < 100; i++) floatArray[i] = i * 7.7777f;
Stopwatch start = Stopwatch.StartNew();
for (int j = 0; j < 100; j++)
{
start.Restart();
for (int k = 0; k < 1000; k++)
{
Span<byte> byteSpan = MemoryMarshal.Cast<float, byte>(floatArray);
}
long timeTaken1 = start.ElapsedTicks; ////// 0 ticks //////
start.Restart();
for (int k = 0; k < 1000; k++)
{
byte[] byteArray2 = MemoryMarshal.Cast<float, byte>(floatArray).ToArray();
}
long timeTaken2 = start.ElapsedTicks; ////// 26 ticks //////
start.Restart();
for (int k = 0; k < 1000; k++)
{
byte[] byteArray = new byte[sizeof(float) * floatArray.Length];
for (int i = 0; i < floatArray.Length; i++)
BitConverter.GetBytes(floatArray[i]).CopyTo(byteArray, i * sizeof(float));
}
long timeTaken3 = start.ElapsedTicks; ////// 1310 ticks //////
start.Restart();
for (int k = 0; k < 1000; k++)
{
byte[] byteArray = new byte[sizeof(float) * floatArray.Length];
Buffer.BlockCopy(floatArray, 0, byteArray, 0, byteArray.Length);
}
long timeTaken4 = start.ElapsedTicks; ////// 33 ticks //////
start.Restart();
for (int k = 0; k < 1000; k++)
{
byte[] byteArray = new byte[sizeof(float) * floatArray.Length];
MemoryStream memStream = new MemoryStream();
BinaryWriter writer = new BinaryWriter(memStream);
foreach (float value in floatArray)
writer.Write(value);
writer.Close();
}
long timeTaken5 = start.ElapsedTicks; ////// 1080 ticks //////
Console.WriteLine($"{timeTaken1/10,6} {timeTaken2 / 10,6} {timeTaken3 / 10,6} {timeTaken4 / 10,6} {timeTaken5 / 10,6} ");
}
We have a class called LudicrousSpeedSerialization and it contains the following unsafe method:
public static byte[] ConvertFloatsToBytes(float[] data)
{
int n = data.Length;
byte[] ret = new byte[n * sizeof(float)];
if (n == 0) return ret;
unsafe
{
fixed (byte* pByteArray = &ret[0])
{
float* pFloatArray = (float*)pByteArray;
for (int i = 0; i < n; i++)
{
pFloatArray[i] = data[i];
}
}
}
return ret;
}
Although it basically does a for loop behind the scenes, it does the job in one line:
// requires: using System.Linq; using System.Collections.Generic;
byte[] byteArray = floatArray
    .Select(f => System.BitConverter.GetBytes(f))
    .Aggregate((bytes, f) =>
    {
        List<byte> temp = bytes.ToList();
        temp.AddRange(f);
        return temp.ToArray();
    });
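For what it's worth, the same result can be had more simply (and without Aggregate's repeated re-allocation) with SelectMany:
byte[] byteArray = floatArray.SelectMany(f => System.BitConverter.GetBytes(f)).ToArray();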