Thread-safe high-performance random generator - C#

I need a high-performance random number generator that is thread-safe. I only need raw random values of a single value type (ulong for now), not values within ranges. I've used the C# built-in Random class, but it was kind of slow and not thread-safe.
Later I moved to xorshift functions, which work very well, but to make them thread-safe I have to put the calculation inside a lock, and that degrades performance drastically.
What I'm using to generate a random ulong is the following:
public class Rand
{
    ulong seed = 0;
    object lockObj = new object();

    public Rand()
    {
        unchecked
        {
            seed = (ulong)DateTime.Now.Ticks;
        }
    }

    public Rand(ulong seed)
    {
        this.seed = seed;
    }

    public ulong GetULong()
    {
        unchecked
        {
            lock (lockObj)
            {
                ulong t = seed;
                t ^= t >> 12;
                t ^= t << 25;
                t ^= t >> 27;
                seed = t;
                return t * 0x2545F4914F6CDD1D;
            }
        }
    }
}
This works fine and fast, but the locking makes it take about 1-2 µs when called from 200 concurrent threads; otherwise the calculation finishes in under 100 ns.
If I remove the locking, there is a chance two threads read the same seed and calculate the same random value, which is not good for my purposes. If I drop the ulong t declaration and work directly on the seed, there is only a very small chance of generating the same value for two concurrent calls, but there is also a chance the value gets shifted out of its range: if t << 25 is executed many times in a row by different threads without the other steps in between (there is no carry or rotation), the seed simply becomes 0.
I think the proper way would be a shared value that any concurrent call may change, with the calculation methods working on that value; since reads and writes of it are atomic (at least within CPU cores), it is not a problem if many calculations use it at the same time, but it is a problem if the value shifts out of its bit range.
Is there any good solution to solve this problem? I'd be thankful for any help.
Edit: I forgot to mention that I have no control over the threads, because async tasks call this function, so the calls arrive on random thread-pool threads. Using the thread ID is not a solution either, since a given thread may never call this method again, and keeping an instance around for that ID is not a good idea.

Simply create one instance of Rand on each thread. Thread-safe, no locking, thus very performant. This can be achieved using the ThreadStaticAttribute.
public class Rand   // the same Rand class as in the question, extended
{
    [ThreadStatic] private static Rand defaultRand;

    public static Rand Default => defaultRand ??= new Rand();

    // Add extra methods for seeding the static instance...
}
// Then in any thread:
var randomNumber = Rand.Default.GetULong();
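One caveat: if several pool threads create their instance within the same tick, DateTime.Now.Ticks gives them identical seeds and therefore identical sequences. A minimal sketch of one way around that, mixing the managed thread id into the seed (this variant of the property is my illustration, not part of the answer above):

[ThreadStatic] private static Rand defaultRand;

// Mix the managed thread id into the default seed so that two pool threads
// created in the same tick don't start identical sequences.
public static Rand Default => defaultRand ??=
    new Rand((ulong)DateTime.Now.Ticks ^ ((ulong)Environment.CurrentManagedThreadId << 32));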

You can do it without locking and still be thread-safe. Assuming the calculation is very fast (it is) and there is slower code executing around it, it's likely faster to simply recalculate if another thread changes the seed between the start and the end of the calculation. You can do that with an Interlocked.CompareExchange spin loop. The only difficulty is that there is no ulong overload of CompareExchange here, so we have to use an unsafe wrapper to get the equivalent.
private static unsafe ulong InterlockedCompareExchange(ref ulong location,
    ulong value, ulong comparand)
{
    fixed (ulong* ptr = &location)
    {
        return (ulong)Interlocked.CompareExchange(ref *(long*)ptr, (long)value, (long)comparand);
    }
}

public ulong GetULong()
{
    unchecked
    {
        ulong prev = seed;
        ulong t = prev;
        t ^= t >> 12;
        t ^= t << 25;
        t ^= t >> 27;
        while (InterlockedCompareExchange(ref seed, t, prev) != prev)
        {
            prev = seed;
            t = prev;
            t ^= t >> 12;
            t ^= t << 25;
            t ^= t >> 27;
        }
        return t * 0x2545F4914F6CDD1D;
    }
}

The less code we execute in the critical section, the faster it will work.
It runs 30-50% faster on my CPU.
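If the duplicated xorshift steps bother you, the same retry can be folded into a do/while; this is only a cosmetic sketch of the answer's code above, reusing its InterlockedCompareExchange helper. (On newer runtimes, .NET 5 and later if I remember correctly, Interlocked.CompareExchange also has a ulong overload, so the unsafe wrapper may not be needed there.)

public ulong GetULong()
{
    unchecked
    {
        ulong prev, t;
        do
        {
            prev = seed;       // snapshot the shared seed
            t = prev;
            t ^= t >> 12;      // xorshift steps, same as in the question
            t ^= t << 25;
            t ^= t >> 27;
        }
        // Publish the new seed only if nobody advanced it in the meantime;
        // otherwise retry with the fresh value.
        while (InterlockedCompareExchange(ref seed, t, prev) != prev);
        return t * 0x2545F4914F6CDD1D;
    }
}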
You can also pre-generate batches of values, so that most calls just hand out a cached number and the lock is only taken when a new batch is needed:
public sealed class Rand
{
    private ulong seed = 0;
    private readonly object lockObj = new object();

    private ulong[] _batch = new ulong[501];
    private int _current = -1;

    public Rand()
    {
        unchecked
        {
            seed = (ulong)DateTime.Now.Ticks;
        }
        _current = 500; // force a batch refill on first use
    }

    public Rand(ulong seed)
    {
        this.seed = seed;
        _current = 500; // force a batch refill on first use
    }

    // Plain locked xorshift, as in the question.
    public ulong GetULong2()
    {
        unchecked
        {
            ulong t;
            lock (lockObj)
            {
                t = seed;
                t ^= t >> 12;
                t ^= t << 25;
                t ^= t >> 27;
                seed = t;
            }
            return t * 0x2545F4914F6CDD1D;
        }
    }

    // Lock-free variant that mixes in the thread id (no synchronization at all).
    public ulong GetULong5()
    {
        unchecked
        {
            var t = seed;
            t *= (uint)Thread.CurrentThread.ManagedThreadId;
            t ^= t >> 12;
            t ^= t << 25;
            t ^= t >> 27;
            seed = t;
            return t * 0x2545F4914F6CDD1D;
        }
    }

    // Batched variant: most calls just return a pre-generated value,
    // the lock is only taken when the batch runs out.
    public ulong GetULong()
    {
        unchecked
        {
            do
            {
                var current = Interlocked.Increment(ref _current);
                if (current < 501)
                    return _batch[current];

                lock (lockObj)
                {
                    if (_current >= 500)
                    {
                        ulong t = seed;
                        for (int i = 0; i < 501; i++)
                        {
                            t ^= t >> 12;
                            t ^= t << 25;
                            t ^= t >> 27;
                            var result = t * 0x2545F4914F6CDD1D;
                            _batch[i] = result;
                        }
                        seed = t;
                        _current = -1;
                    }
                }
            } while (true);
        }
    }
}

OK, this solution by l33t is a very nice and elegant way to fix the cross-thread issue, and this solution by Stanislav is also a suitable way to avoid on-demand generation by pre-generating and caching batches. Thanks everyone for the ideas.
Meanwhile I changed the bit shifts to rotates, which keeps the seed from being shifted out to 0, and that let me omit the locking. It works just fine, with very low latency (about 100-120 ns/call with 200 concurrent threads):
public class Rand
{
    ulong seed = 0;
    object lockObj = new object();

    public Rand()
    {
        unchecked
        {
            seed = (ulong)DateTime.Now.Ticks;
        }
    }

    public Rand(ulong seed)
    {
        this.seed = seed;
    }

    public ulong GetULong()
    {
        unchecked
        {
            seed ^= (seed >> 12) | (seed << (64 - 12));
            seed ^= (seed << 25) | (seed >> (64 - 25));
            seed ^= (seed >> 27) | (seed << (64 - 27));
            seed *= 0x2545F4914F6CDD1D;
            int s = Environment.CurrentManagedThreadId % 64;
            return (seed >> s) | (seed << (64 - s));
        }
    }

    // even faster
    public ulong GetULong2()
    {
        unchecked
        {
            seed ^= (seed >> 12) | (seed << (64 - 12));
            seed ^= (seed << 25) | (seed >> (64 - 25));
            seed ^= (seed >> 27) | (seed << (64 - 27));
            ulong r = seed * 0x2545F4914F6CDD1D;
            seed = r;
            int s = Environment.CurrentManagedThreadId % 64;
            return (seed >> s) | (seed << (64 - s));
        }
    }

    // better entropy
    public ulong GetULong3()
    {
        unchecked
        {
            int s = Environment.CurrentManagedThreadId % 12;
            seed ^= (seed >> (12 - s)) | (seed << (64 - (12 - s)));
            seed ^= (seed << (25 - s)) | (seed >> (64 - (25 - s)));
            seed ^= (seed >> (27 - s)) | (seed << (64 - (27 - s)));
            ulong r = seed * 0x2545F4914F6CDD1D;
            seed = r;
            s = Environment.CurrentManagedThreadId % 64;
            return (r >> s) | (r << (64 - s));
        }
    }
}
While this solution fits my needs best, I will not mark it as the answer because it does not produce the same randoms as the generator in the question; it still produces seemingly unique randoms across all threads, though, so it may still be a solution.
Edit: After some experimenting, GetULong() was the fastest, but out of 100 million generated values on 200 concurrent threads it produced more than 23,000 colliding values. Same with GetULong2(). GetULong3() adds a little extra work to increase entropy, and produced only about 210 colliding values out of 100 million generations on 200 concurrent threads, and only about 50 colliding values out of 100 million generations on 500 concurrent threads.
For me this level of entropy is more than enough. The values must be unique, so in my application each generated value is atomically added to a collection right after generation, and if an entry with the same key already exists, the generator is simply called again. 50-200 retries in 100 million events is acceptable even with general-purpose random generators, so this is more than enough for me, especially because it is fast and can be used thread-safely with only one instance of the random generator.
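For completeness, a minimal sketch of that retry-until-unique pattern; the ConcurrentDictionary and the names below are just illustrative, not the exact collection my application uses:

using System.Collections.Concurrent;

public static class UniqueRandoms
{
    static readonly Rand rand = new Rand();   // the rotate-based generator above
    static readonly ConcurrentDictionary<ulong, byte> seen = new ConcurrentDictionary<ulong, byte>();

    public static ulong Next()
    {
        ulong value;
        do
        {
            value = rand.GetULong();
        }
        // TryAdd is atomic: it returns false if another thread already
        // claimed this value, in which case we simply draw again.
        while (!seen.TryAdd(value, 0));
        return value;
    }
}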
Thank you everyone for the help, I hope this may help others also.

Related

How to merge two bitmaps with specific shift(offset)?

Say we have two bitmaps that are represented by unsigned long (64-bit) arrays, and I want to merge these two bitmaps using a specific shift (offset).
For example, merge bitmap1 (bigger) into bitmap2 (smaller) starting at offset 3. Offset 3 means that bit 3 of bitmap1 corresponds to bit 0 of bitmap2.
By merge I mean a logical OR operation. What is the cleanest way to do this?
Currently I do this with a simple, inefficient for loop:
const ulong BitsPerUlong = 64;

void MergeAt(ulong startIndex, Bitmap bitmap2)
{
    for (ulong i = startIndex; i < bitmap2.Capacity; i++)
    {
        bool newVal = bitmap2.GetAt(i) | bitmap1.GetAt(i);
        bitmap2.SetAt(i, newVal);
    }
}

bool GetAt(ulong index)
{
    var dataOffset = BitOffsetToUlongOffset(index);
    ulong mask = 0x1ul << ((int)(index % BitsPerUlong));
    return (_data[dataOffset] & mask) == mask;
}

void SetAt(ulong index, bool value)
{
    var dataOffset = BitOffsetToUlongOffset(index);
    ulong mask = 0x1ul << ((int)(index % BitsPerUlong));
    if (value)
    {
        _data[dataOffset] |= mask;
    }
    else
    {
        _data[dataOffset] &= ~mask;
    }
}

ulong BitOffsetToUlongOffset(ulong index)
{
    var dataOffset = index / BitsPerUlong;
    return dataOffset;
}
(C/C++/C# accepted).
As you probably figured out yourself, if offset < BitsPerULong the first block can be merged with:
data1[0] |= data2[0] << offset;
Which leaves some bits in data2[0] unmerged, but you can get those with:
data2[0] >> (BitsPerULong - offset)
So the next merge for i > 0 becomes:
data1[i] |= (data2[i] << offset) | (data2[i-1] >> (BitsPerULong - offset));
from which you can construct a for-loop to merge all data. Of course, this still means a couple of bits from data2 will "fall off" but I think that's inherent to your problem description?
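To make that concrete, a small sketch of such a loop, assuming both bitmaps are plain ulong[] arrays, 0 < offset < BitsPerULong, and bits shifted past the end of dst simply fall off (the array and method names are mine, not from the question):

// OR the bits of src, shifted left by offset (0 < offset < 64), into dst.
// Bits shifted past the end of dst are dropped.
static void MergeAt(ulong[] dst, ulong[] src, int offset)
{
    const int BitsPerULong = 64;
    dst[0] |= src[0] << offset;
    for (int i = 1; i < dst.Length; i++)
    {
        ulong carry = i - 1 < src.Length ? src[i - 1] >> (BitsPerULong - offset) : 0UL;
        ulong cur = i < src.Length ? src[i] << offset : 0UL;
        dst[i] |= cur | carry;
    }
}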
If you need a more generic solution where offset can also be greater than BitsPerULong, this needs a bit more work.
I presume you mean that you want to "merge" the smaller INTO the bigger.
Have you tried: bitmapLarger |= ( bitmapSmaller << 3 ) ?

Speed up byte parsing possible?

We are doing some performance optimizations in our project and with the profiler I came upon the following method:
private int CalculateAdcValues(byte lowIndex)
{
    byte middleIndex = (byte)(lowIndex + 1);
    byte highIndex = (byte)(lowIndex + 2);
    // samples is a byte[]
    return (int)((int)(samples[highIndex] << 24)
        + (int)(samples[middleIndex] << 16) + (int)(samples[lowIndex] << 8));
}
This method is already pretty fast at ~1 µs per execution, but it is called ~100,000 times per second and so it takes ~10% of the CPU.
Does anyone have an idea how to further improve this method?
EDIT:
Current solution:
fixed (byte* p = samples)
{
    for (; loopIndex < 61; loopIndex += 3)
    {
        adcValues[k++] = *((int*)(p + loopIndex)) << 8;
    }
}
This takes less than 40% of the time it did before (the "whole method" took ~35 µs per call before and ~13 µs now). The for-loop actually takes more time than the calculation now...
I strongly suspect that after casting to byte, your indexes are being converted back to int anyway for use in the array indexing operation. That will be cheap, but may not be entirely free. So get rid of the casts, unless you were using the conversion to byte to effectively get the index within the range 0..255. At that point you can get rid of the separate local variables, too.
Additionally, your casts to int are no-ops as the shift operations are only defined on int and higher types.
Finally, using | may be faster than +:
private int CalculateAdcValues(byte lowIndex)
{
    return (samples[lowIndex + 2] << 24) |
           (samples[lowIndex + 1] << 16) |
           (samples[lowIndex] << 8);
}
(Why is there nothing in the bottom 8 bits? Is that deliberate? Note that the result will end up being negative if samples[lowIndex + 2] has its top bit set - is that okay?)
Seeing that you have a friendly endianness, go unsafe:
unsafe int CalculateAdcValuesFast1(int lowIndex)
{
    fixed (byte* p = &samples[lowIndex])
    {
        return *(int*)p << 8;
    }
}
On x86 this is about 30% faster. Not as much gain as I hoped. About 40% on x64.
As suggested by @CodeInChaos:
var bounds = samples.Length - 3;
fixed (byte* p = samples)
{
    for (int i = 0; i < 1000000000; i++)
    {
        var r = CalculateAdcValuesFast2(p, i % bounds); // about 2x faster
        // or inlined:
        var r2 = *((int*)(p + i % bounds)) << 8;        // about 3x faster
        // do something
    }
}

unsafe int CalculateAdcValuesFast2(byte* p1, int p2)
{
    return *((int*)(p1 + p2)) << 8;
}
Maybe the following can be a little faster. I have removed the casts to int.
var middleIndex = (byte)(lowIndex + 1);
var highIndex = (byte)(lowIndex + 2);
return (this.samples[highIndex] << 24) + (this.samples[middleIndex] << 16) + (this.samples[lowIndex] << 8);

Custom Random Number Generator

Is it possible to get an extremely fast but reliable pseudo-random number generator? (Same input = same output, so I can't use time.) I want the end result to be something like float NumGen(int x, int y, int seed); so that it creates a random number between 0 and 1 based on those three values. I found several random number generators, but I can't get them to work, and the random number generator that comes with Unity is far too slow to use. I have to make about 9 calls to the generator per 1 meter of terrain, so I don't really care if it's not perfectly statistically random, just that it works really quickly. Does anyone know of an algorithm that fits my needs? Thanks :)
I think you are underestimating the System.Random class. It is quite speedy. I believe your slowdown is related to creating a new instance of the Random class on each call to your NumGen method.
In my quick test I was able to generate 100,000 random numbers using System.Random in about 1 millisecond.
To avoid the slow down consider seed points in your 2D plane. Disperse the seed points so that they cover a distance no greater than 100,000 meters. Then associate (or calculate) the nearest seed point for each meter, and use that point as your seed to System.Random.
Yes, you will be generating a ton of random numbers you will never use, but they are virtually free.
Pseudo-code:
double NumGen(x, y, distance, seed) {
    Random random = new Random(seed);
    double result = 0;
    for (int i = 0; i < distance; i++) {
        result = random.NextDouble();
    }
    return result;
}
You could modify this simple outline to return a sequence of random numbers (possibly representing a grid), and couple that with a caching mechanism. That would let you conserve memory and improve (lessen) CPU consumption.
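A rough sketch of what such a cache could look like; the class and member names here are made up for illustration, and it is not thread-safe as written:

using System;
using System.Collections.Generic;

public class SeedPointCache
{
    private readonly Dictionary<int, double[]> cache = new Dictionary<int, double[]>();
    private readonly int blockSize;

    public SeedPointCache(int blockSize) { this.blockSize = blockSize; }

    // Returns the n-th random value belonging to a given seed point,
    // generating and caching the whole block on first use.
    public double Get(int seedPointId, int n)
    {
        if (!cache.TryGetValue(seedPointId, out var block))
        {
            var random = new Random(seedPointId);
            block = new double[blockSize];
            for (int i = 0; i < blockSize; i++)
                block[i] = random.NextDouble();
            cache[seedPointId] = block;
        }
        return block[n % blockSize];
    }
}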
I guess you had to create a Random instance on every call to NumGen. To get the function to return the same number for the same parameters you could use a hash function.
I tested a few things, and this code was about 3 times faster than recreating instances of Random.
// System.Security.Cryptography
static MD5 hasher = MD5.Create();
static byte[] outbuf;
static byte[] inbuf = new byte[12];

static float floatHash(uint x, uint y, uint z) {
    inbuf[0] = (byte)(x >> 24);
    inbuf[1] = (byte)(x >> 16);
    inbuf[2] = (byte)(x >> 8);
    inbuf[3] = (byte)(x);
    inbuf[4] = (byte)(y >> 24);
    inbuf[5] = (byte)(y >> 16);
    inbuf[6] = (byte)(y >> 8);
    inbuf[7] = (byte)(y);
    inbuf[8] = (byte)(z >> 24);
    inbuf[9] = (byte)(z >> 16);
    inbuf[10] = (byte)(z >> 8);
    inbuf[11] = (byte)(z);
    outbuf = hasher.ComputeHash(inbuf);
    return ((float)BitConverter.ToUInt64(outbuf, 0)) / ulong.MaxValue;
}
Another method using some RSA methods is about 5 times faster than new System.Random(seed):
static uint prime = 4294967291;
static uint ord = 4294967290;
static uint generator = 4294967279;
static uint sy;
static uint xs;
static uint xy;

static float getFloat(uint x, uint y, uint seed) {
    // will return values 1 >= x > 0; replace 'ord' with 'prime' to get 1 > x > 0
    // one call to modPow would be enough if all data fits into an ulong
    sy = modPow(generator, (((ulong)seed) << 32) + (ulong)y, prime);
    xs = modPow(generator, (((ulong)x) << 32) + (ulong)seed, prime);
    xy = modPow(generator, (((ulong)sy) << 32) + (ulong)xs, prime);
    return ((float)xy) / ord;
}

static ulong b;
static ulong ret;

static uint modPow(uint bb, ulong e, uint m) {
    b = bb;
    ret = 1;
    while (e > 0) {
        if (e % 2 == 1) {
            ret = (ret * b) % m;
        }
        e = e >> 1;
        b = (b * b) % m;
    }
    return (uint)ret;
}
I ran a test to generate 100000 floats. I used the index as seed for System.Random and as x parameter of floatHash (y and z were 0).
System.Random: Min: 2.921559E-06 Max: 0.9999979 Repetitions: 0
floatHash MD5: Min: 7.011156E-06 Max: 0.9999931 Repetitions: 210 (values were returned twice)
getFloat RSA: Min: 1.547858E-06 Max: 0.9999989 Repetitions: 190

How to add even parity bit on 7-bit binary number

I am continuing from my previous question. I am making a C# program where the user enters a 7-bit binary number and the computer prints out the number with an even parity bit added to the right of the number. I am struggling: I have code, but it says "BitArray is a namespace but is used like a type". Also, is there a way I could improve the code and make it simpler?
namespace BitArray
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("Please enter a 7-bit binary number:");
            int a = Convert.ToInt32(Console.ReadLine());
            byte[] numberAsByte = new byte[] { (byte)a };
            BitArray bits = new BitArray(numberAsByte);
            int count = 0;
            for (int i = 0; i < 8; i++)
            {
                if (bits[i])
                {
                    count++;
                }
            }
            if (count % 2 == 1)
            {
                bits[7] = true;
            }
            bits.CopyTo(numberAsByte, 0);
            a = numberAsByte[0];
            Console.WriteLine("The binary number with a parity bit is:");
            Console.WriteLine(a);
        }
    }
}
Might be more fun to duplicate the circuit they use to do this..
bool odd = false;
for (int i = 6; i >= 0; i--)
    odd ^= (number & (1 << i)) > 0;
Then, if you want even parity, set bit 7 to odd; for odd parity, set it to not odd.
or
bool even = true;
for (int i = 6; i >= 0; i--)
    even ^= (number & (1 << i)) > 0;
The circuit is dual-function (it returns 0 and 1, or 1 and 0) and handles more than 1 bit at a time as well, but this is a bit light for TPL...
PS you might want to check the input for < 128 otherwise things are going to go well wrong.
ooh didn't notice the homework tag, don't use this unless you can explain it.
Almost the same process, only much faster on a larger number of bits. Using only arithmetic operators (SHR and XOR), without loops:
public static bool is_parity(int data)
{
    //data ^= data >> 32; // if arg >= 64-bit (notice argument length)
    //data ^= data >> 16; // if arg >= 32-bit
    //data ^= data >> 8;  // if arg >= 16-bit
    data ^= data >> 4;
    data ^= data >> 2;
    data ^= data >> 1;
    return (data & 1) != 0;
}

public static byte fix_parity(byte data)
{
    if (is_parity(data)) return data;
    return (byte)(data ^ 128);
}
Using a BitArray does not buy you much here, if anything it makes your code harder to understand. Your problem can be solved with basic bit manipulation with the & and | and << operators.
For example to find out if a certain bit is set in a number you can & the number with the corresponding power of 2. That leads to:
int bitsSet = 0;
for (int i = 0; i < 7; i++)
    if ((number & (1 << i)) > 0)
        bitsSet++;
Now the only thing remain is determining if bitsSet is even or odd and then setting the remaining bit if necessary.
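To finish the thought, one way to wire that up, assuming the parity bit should be appended to the right of the 7-bit value as the question asks (the variable names continue the snippet above):

// Even parity: the appended bit is 1 exactly when the 7 data bits
// contain an odd number of ones, so the total count of ones becomes even.
int parityBit = bitsSet % 2;
int withParity = (number << 1) | parityBit;
Console.WriteLine(Convert.ToString(withParity, 2).PadLeft(8, '0'));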

Fastest way to calculate sum of bits in byte array

I have two byte arrays with the same length. I need to perform XOR operation between each byte and after this calculate sum of bits.
For example:
11110000^01010101 = 10100101 -> so 1+1+1+1 = 4
I need do the same operation for each element in byte array.
Use a lookup table. There are only 256 possible values after XORing, so it's not exactly going to take a long time. Unlike izb's solution, though, I wouldn't suggest putting all the values in manually - compute the lookup table once at startup using one of the looping answers.
For example:
public static class ByteArrayHelpers
{
    private static readonly int[] LookupTable =
        Enumerable.Range(0, 256).Select(CountBits).ToArray();

    private static int CountBits(int value)
    {
        int count = 0;
        for (int i = 0; i < 8; i++)
        {
            count += (value >> i) & 1;
        }
        return count;
    }

    public static int CountBitsAfterXor(byte[] array)
    {
        int xor = 0;
        foreach (byte b in array)
        {
            xor ^= b;
        }
        return LookupTable[xor];
    }
}
(You could make it an extension method if you really wanted...)
Note the use of byte[] in the CountBitsAfterXor method - you could make it an IEnumerable<byte> for more generality, but iterating over an array (which is known to be an array at compile-time) will be faster. Probably only microscopically faster, but hey, you asked for the fastest way :)
I would almost certainly actually express it as
public static int CountBitsAfterXor(IEnumerable<byte> data)
in real life, but see which works better for you.
Also note the type of the xor variable as an int. In fact, there's no XOR operator defined for byte values, and if you made xor a byte it would still compile due to the nature of compound assignment operators, but it would be performing a cast on each iteration - at least in the IL. It's quite possible that the JIT would take care of this, but there's no need to even ask it to :)
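A tiny illustration of that last point (not from the answer itself, just what the compiler does and doesn't accept):

byte acc = 0;
byte b = 0xF0;
// acc = acc ^ b;   // does not compile: ^ on two bytes yields an int
acc ^= b;           // compiles: the compound form narrows the int result back to byte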
Fastest way would probably be a 256-element lookup table...
int[] lut =
{
    /*0x00*/ 0,
    /*0x01*/ 1,
    /*0x02*/ 1,
    /*0x03*/ 2,
    ...
    /*0xFE*/ 7,
    /*0xFF*/ 8
};
e.g.
11110000^01010101 = 10100101 -> lut[165] == 4
This is more commonly referred to as bit counting. There are literally dozens of different algorithms for doing this. Here is one site which lists a few of the more well known methods. There are even CPU specific instructions for doing this.
Theoretically, Microsoft could add a BitArray.CountSetBits function that gets JITed with the best algorithm for that CPU architecture. I, for one, would welcome such an addition.
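As a side note, something close to that wish does exist on newer runtimes: System.Numerics.BitOperations.PopCount (available since .NET Core 3.0, as far as I know) is JIT-compiled to the hardware popcount instruction where one exists. A sketch of the XOR-and-count from the question using it:

using System.Numerics;

static int CountBitsAfterXor(byte[] left, byte[] right)
{
    int total = 0;
    for (int i = 0; i < left.Length; i++)
    {
        // PopCount returns the number of set bits; it maps to POPCNT where supported.
        total += BitOperations.PopCount((uint)(left[i] ^ right[i]));
    }
    return total;
}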
As I understood it you want to sum the bits of each XOR between the left and right bytes.
for (int b = 0; b < left.Length; b++) {
    int num = left[b] ^ right[b];
    int sum = 0;
    for (int i = 0; i < 8; i++) {
        sum += (num >> i) & 1;
    }
    // do something with sum maybe?
}
I'm not sure if you mean sum the bytes or the bits.
To sum the bits within a byte, this should work:
int nSum = 0;
for (int i = 0; i <= 7; i++)
{
    nSum += (byte_val >> i) & 1;
}
You would then need the xoring, and array looping around this, of course.
The following should do it:
int BitXorAndSum(byte[] left, byte[] right) {
    int sum = 0;
    for (var i = 0; i < left.Length; i++) {
        sum += SumBits((byte)(left[i] ^ right[i]));
    }
    return sum;
}

int SumBits(byte b) {
    var sum = 0;
    for (var i = 0; i < 8; i++) {
        sum += (0x1) & (b >> i);
    }
    return sum;
}
It can be rewritten for ulong with unsafe pointers, but byte is easier to understand:
static int BitCount(byte num)
{
    // 0x5 = 0101 (bit)  0x55 = 01010101
    // 0x3 = 0011 (bit)  0x33 = 00110011
    // 0xF = 1111 (bit)  0x0F = 00001111
    uint count = num;
    count = ((count >> 1) & 0x55) + (count & 0x55);
    count = ((count >> 2) & 0x33) + (count & 0x33);
    count = ((count >> 4) & 0x0F) + (count & 0x0F);
    return (int)count;
}
A general function to count bits could look like:
int Count1(byte[] a)
{
    int count = 0;
    for (int i = 0; i < a.Length; i++)
    {
        byte b = a[i];
        while (b != 0)
        {
            count++;
            b = (byte)((int)b & (int)(b - 1));
        }
    }
    return count;
}
The fewer 1-bits, the faster this works. It simply loops over each byte and clears the lowest set bit until the byte becomes 0. The casts are necessary so that the compiler stops complaining about type widening and narrowing.
Your problem could then be solved by using this:
int Count1Xor(byte[] a1, byte[] a2)
{
    int count = 0;
    for (int i = 0; i < Math.Min(a1.Length, a2.Length); i++)
    {
        byte b = (byte)((int)a1[i] ^ (int)a2[i]);
        while (b != 0)
        {
            count++;
            b = (byte)((int)b & (int)(b - 1));
        }
    }
    return count;
}
A lookup table should be the fastest, but if you want to do it without a lookup table, this will work for bytes in just 10 operations.
public static int BitCount(byte value) {
    int v = value - ((value >> 1) & 0x55);
    v = (v & 0x33) + ((v >> 2) & 0x33);
    return ((v + (v >> 4)) & 0x0F);
}
This is a byte version of the general bit counting function described at Sean Eron Anderson's bit fiddling site.
