RNGCryptoServiceProvider - generate number in a range faster and retain distribution? - c#

I'm using the RNG crypto provider to generate numbers in a range the truly naive way:
byte[] bytes = new byte[4];
int result = 0;
while (result < min || result > max)
{
    RNG.GetBytes(bytes);
    result = BitConverter.ToInt32(bytes, 0);
}
This is great when the range is wide enough such that there is a decent chance of getting a result, but earlier today I hit a scenario where the range is sufficiently small (within 10,000 numbers) that it can take an age.
So I've been trying to think of a better way that will achieve a decent distribution but will be faster. But now I'm getting into deeper maths and statistics that I simply didn't do at school, or at least if I did I have forgotten it all!
My idea is:
get the highest set bit positions of the min and max, e.g. for 4 it would be 3 and for 17 it would be 5
select enough bytes from the PRNG to contain at least the high bit, e.g. 1 byte in this case, giving 8 bits
see if any of the upper bits in the allowed range (3-5) are set
if yes, turn that into a number up to and including the high bit
if that number is between min and max, return it.
if any of the previous tests fail, start again.
Like I say, this could be exceedingly naive, but I am sure it will return a match in a narrow range faster than the current implementation. I'm not in front of a computer at the moment so can't test, will be doing that tomorrow morning UK time.
But of course speed isn't my only concern, otherwise I would just use Random.
The biggest concern I have with the above approach is that I am always throwing away up to 7 bits that were generated by the PRNG, which seems bad. I thought of ways to factor them in (e.g. a simple addition) but they seem like terribly unscientific hacks!
I know about the mod trick, where you only have to generate one sequence, but I also know about its weaknesses.
Is this a dead end? Ultimately if the best solution is going to be to stick with the current implementation I will, I just feel that there must be a better way!
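To make the idea concrete, here is a sketch of the mask-and-reject shape of it (untested, and the naming is just for illustration): mask the random bits down to the smallest power of two that covers the range, and reject only the candidates that overshoot, which happens less than half the time.

using System;
using System.Security.Cryptography;

static class MaskedRejection
{
    // Uniform value in [min, max]: mask random bits down to the smallest
    // power of two covering the range, reject the (under 50%) overshoot.
    public static int Next(RandomNumberGenerator rng, int min, int max)
    {
        if (min > max) throw new ArgumentOutOfRangeException(nameof(min));
        uint range = (uint)((long)max - min + 1); // assumes range < 2^32

        // Smear the highest set bit of (range - 1) downwards to build the mask.
        uint mask = range - 1;
        mask |= mask >> 1;
        mask |= mask >> 2;
        mask |= mask >> 4;
        mask |= mask >> 8;
        mask |= mask >> 16;

        var bytes = new byte[4];
        while (true)
        {
            rng.GetBytes(bytes);
            uint candidate = BitConverter.ToUInt32(bytes, 0) & mask;
            if (candidate < range)
                return (int)(min + candidate);
        }
    }
}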

Stephen Toub and Shawn Farkas have co-written an excellent MSDN article called Tales from the CryptoRandom that you should definitely read if you're experimenting with RNGCryptoServiceProvider.
In it they provide an implementation that inherits from System.Random (which contains the nice range-bounded Next method you're looking for), but instead of using pseudo-random numbers their implementation uses the RNGCryptoServiceProvider.
They implement the Next(min, max) method as follows:
public override Int32 Next(Int32 minValue, Int32 maxValue)
{
    if (minValue > maxValue)
        throw new ArgumentOutOfRangeException("minValue");
    if (minValue == maxValue) return minValue;

    Int64 diff = (Int64)maxValue - minValue; // widen first so extreme ranges can't overflow
    while (true)
    {
        _rng.GetBytes(_uint32Buffer);
        UInt32 rand = BitConverter.ToUInt32(_uint32Buffer, 0);

        // Reject values in the "remainder" band at the top of the UInt32
        // range so that rand % diff is exactly uniform (no modulo bias).
        Int64 max = (1 + (Int64)UInt32.MaxValue);
        Int64 remainder = max % diff;
        if (rand < max - remainder)
        {
            return (Int32)(minValue + (rand % diff));
        }
    }
}
The reasoning for the choice of implementation as well as a detailed analysis about loss of randomness and what steps they are taking to produce high-quality random numbers is in their article.
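For context, Next(min, max) above relies on two instance members of the surrounding class. A minimal sketch of that scaffolding, so the snippet compiles (field names follow the snippet; the rest is my reconstruction, not the article's exact code):

using System;
using System.Security.Cryptography;

public class CryptoRandom : Random
{
    private readonly RNGCryptoServiceProvider _rng = new RNGCryptoServiceProvider();
    private readonly byte[] _uint32Buffer = new byte[4];

    public override int Next()
    {
        _rng.GetBytes(_uint32Buffer);
        // Mask off the sign bit to get a non-negative value.
        return BitConverter.ToInt32(_uint32Buffer, 0) & 0x7FFFFFFF;
    }

    // The Next(Int32 minValue, Int32 maxValue) override from above goes here;
    // the article's real class overrides further members as well.
}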
Thread-safe buffered CryptoRandom
I've written an extended implementation of Stephen's class which uses a random buffer in order to minimize the overhead of calling out to GetBytes(). My implementation also uses synchronization to provide thread safety, making it possible to share the instance between all your threads and make full use of the buffer.
I wrote this for a very specific scenario, so you should of course profile whether or not it makes sense for you given the specific contention and concurrency attributes of your application. I threw the code up on GitHub if you want to check it out.
Threadsafe buffered CryptoRandom based on Stephen Toub and Shawn Farkas' implementation
When I wrote it (a couple of years back) I did some profiling as well.
Results produced by calling Next() 1,000,000 times on my machine (dual-core, 3 GHz):
System.Random completed in 20.4993 ms (avg 0 ms) (first: 0.3454 ms)
CryptoRandom with pool completed in 132.2408 ms (avg 0.0001 ms) (first: 0.025 ms)
CryptoRandom without pool completed in 2 sec 587.708 ms (avg 0.0025 ms) (first: 1.4142 ms)
|---------------------|------------------------------------|
| Implementation      | Slowdown compared to System.Random |
|---------------------|------------------------------------|
| System.Random       | 1x (baseline)                      |
| CryptoRand w pool   | 6.6x                               |
| CryptoRand w/o pool | 19.5x                              |
|---------------------|------------------------------------|
Please note that these measurements only profile a very specific non-real-world scenario and should only be used for guidance; measure your own scenario for proper results.

If you use a while loop, this is going to be slow, and it relies on an unknown number of iterations.
You could compute the result on the first try using the modulo operator (%).
But if we squeeze results into the range with modulo, we immediately create an imbalance in the probability distribution: unless the range size divides the generator's range evenly, the smaller remainders come up slightly more often than the larger ones.
This means the modulo approach is only appropriate if you care about speed, not about the probabilistic quality of the generated numbers.
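To see the imbalance concretely, here is a count of how often each remainder appears when a uniform byte (0..255) is reduced mod 10. Since 256 is not a multiple of 10, the first six remainders each capture one extra byte value:

// Count how often each remainder 0..9 occurs across all 256 byte values.
int[] counts = new int[10];
for (int b = 0; b <= 255; b++)
    counts[b % 10]++;

// counts == { 26, 26, 26, 26, 26, 26, 25, 25, 25, 25 }:
// remainders 0..5 are each ~4% more likely than 6..9.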
Here is an RNG utility that could suit your needs:
using System;
using System.Security.Cryptography;

static class RNGUtil
{
    /// <exception cref="ArgumentOutOfRangeException"><paramref name="min" /> is greater than <paramref name="max" />.</exception>
    public static int Next(int min, int max)
    {
        if (min > max) throw new ArgumentOutOfRangeException(nameof(min));
        if (min == max) return min;
        using (var rng = new RNGCryptoServiceProvider())
        {
            var data = new byte[4];
            rng.GetBytes(data);

            // Mask off the sign bit instead of calling Math.Abs: Math.Abs
            // throws OverflowException when the value is int.MinValue.
            int generatedValue = BitConverter.ToInt32(data, startIndex: 0) & int.MaxValue;
            int diff = max - min;
            int mod = generatedValue % diff;
            int normalizedNumber = min + mod;
            return normalizedNumber;
        }
    }
}
In this case RNGUtil.Next(-5, 20) would fetch an arbitrary number within range -5..19
A small test:
var list = new LinkedList<int>();
for (int i = 0; i < 10000; i++)
{
    int next = RNGUtil.Next(-5, 20);
    list.AddLast(next);
}

bool firstNumber = true;
foreach (int x in list.Distinct().OrderBy(x => x))
{
    if (!firstNumber) Console.Out.Write(", ");
    Console.Out.Write(x);
    firstNumber = false;
}
Output: -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19

You can generate many more bytes at once for very little extra overhead. The main overhead with RNGCryptoServiceProvider is the call itself to fill in the bytes.
While you might throw away unused bytes, I'd give it a shot, as I've gotten very good speeds from this and from the modulo method (which you aren't using).
int vSize = 20 * 4;
byte[] vBytes = new byte[vSize];
RNG.GetBytes(vBytes);

int vResult = 0;
int vLocation = 0;
while (vResult < min || vResult > max)
{
    if (vLocation == vSize) // buffer exhausted: refill and start over
    {
        RNG.GetBytes(vBytes);
        vLocation = 0;
    }
    vResult = BitConverter.ToInt32(vBytes, vLocation);
    vLocation += 4;
}
Another thing you can do is the bitwise comparison you were thinking of. However, I'd focus on whether the range fits in a byte, a short, an int, or a long. Then you can take the result modulo the maximum of that type (giving you the lower-order bits).
// We want a short, so we step 2 bytes at a time and reduce the result modulo 32768.
int vSize = 20 * 4;
byte[] vBytes = new byte[vSize];
RNG.GetBytes(vBytes);

int vResult = 0;
int vLocation = 0;
while (vResult < min || vResult > max)
{
    if (vLocation == vSize) // buffer exhausted: refill and start over
    {
        RNG.GetBytes(vBytes);
        vLocation = 0;
    }
    // ToUInt16 reads exactly 2 bytes (ToInt32 would run past the end of the
    // buffer at offset vSize - 2) and stays non-negative before the modulo.
    vResult = BitConverter.ToUInt16(vBytes, vLocation) % 32768;
    vLocation += 2;
}

The following is an adaptation of @Andrey-WD's answer above, with the difference that you simply pass in a random number you have already generated (a ulong in this case; it could be changed to uint). This is very efficient when you need multiple random numbers in a range: you can generate an array of such numbers in one call via RNGCryptoServiceProvider (or whatever suits you, even Random if that fits your needs). This should be far more performant when you need many random numbers within a range; all you need is the stash of random numbers to feed the function. See my note on @Andrey-WD's answer above; I'm curious why others aren't taking this simpler modulus route, which doesn't require multiple iterations. If there really is a reason the multiple-iteration route is required, I would be glad to hear it.
public static int GetRandomNumber(int min, int max, ulong randomNum)
{
    if (min > max) throw new ArgumentOutOfRangeException(nameof(min));
    if (min == max) return min;

    // The RNGCryptoServiceProvider call from the original answer is gone;
    // the caller supplies randomNum from a pre-generated stash instead.
    int diff = max - min;
    int mod = (int)(randomNum % (ulong)diff);
    int normalizedNumber = min + mod;
    return normalizedNumber;
}
Here's how you can efficiently get a clean array of random numbers. I like how this cleanly encapsulates getting the random numbers; the calling code doesn't have to be cluttered with BitConverter byte conversions on every iteration to get an int or long. I also assume this gains performance by converting the bytes to the target array type in a single operation.
public static ulong[] GetRandomLongArray(int length)
{
    if (length < 0) throw new ArgumentOutOfRangeException(nameof(length));
    ulong[] arr = new ulong[length];
    if (length > 0) // if they want 0, why 'throw' a fit, just give it to them ;)
    {
        byte[] rndByteArr = new byte[length * sizeof(ulong)];
        using (var rnd = new RNGCryptoServiceProvider())
            rnd.GetBytes(rndByteArr);
        Buffer.BlockCopy(rndByteArr, 0, arr, 0, rndByteArr.Length);
    }
    return arr;
}
Usage:
ulong[] randomNums = GetRandomLongArray(100);
for (int i = 0; i < 20; i++) {
ulong randNum = randomNums[i];
int val = GetRandomNumber(10, 30, randNum); // get a rand num between 10 - 30
WriteLine(val);
}

public int RandomNumber(int min = 1, int max = int.MaxValue)
{
    using (var rng = new RNGCryptoServiceProvider())
    {
        byte[] buffer = new byte[4];
        rng.GetBytes(buffer);

        // Shift out the sign bit, reduce into the range size, then offset by
        // min (the original omitted the "min +", so it never honored min).
        // Note this still has modulo bias, as discussed above.
        return min + (int)((BitConverter.ToUInt32(buffer, 0) >> 1) % (uint)((max - min) + 1));
    }
}

Related

Why multiply Random.Next() by a Constant?

I recently read an article explaining how to generate a weighted random number, and there's a piece of the code that I don't understand:
int r = ((int)(rand.Next() * (323567)) % prefix[n - 1]) + 1;
Why is rand.Next being multiplied by a constant 323567? Would the code work without this constant?
Below is the full code for reference, and you can find the full article here: https://www.geeksforgeeks.org/random-number-generator-in-arbitrary-probability-distribution-fashion/
Any help is appreciated, thank you!!
// C# program to generate random numbers
// according to given frequency distribution
using System;

class GFG
{
    // Utility function to find ceiling
    // of r in arr[l..h]
    static int findCeil(int[] arr, int r, int l, int h)
    {
        int mid;
        while (l < h)
        {
            // Same as mid = (l+h)/2
            mid = l + ((h - l) >> 1);
            if (r > arr[mid])
                l = mid + 1;
            else
                h = mid;
        }
        return (arr[l] >= r) ? l : -1;
    }

    // The main function that returns a random number
    // from arr[] according to distribution array
    // defined by freq[]. n is size of arrays.
    static int myRand(int[] arr, int[] freq, int n)
    {
        // Create and fill prefix array
        int[] prefix = new int[n];
        int i;
        prefix[0] = freq[0];
        for (i = 1; i < n; ++i)
            prefix[i] = prefix[i - 1] + freq[i];

        // prefix[n-1] is sum of all frequencies.
        // Generate a random number with
        // value from 1 to this sum
        Random rand = new Random();
        int r = ((int)(rand.Next() * (323567)) % prefix[n - 1]) + 1; // <--- RNG * Constant

        // Find index of ceiling of r in prefix array
        int indexc = findCeil(prefix, r, 0, n - 1);
        return arr[indexc];
    }

    // Driver Code
    static void Main()
    {
        int[] arr = { 1, 2, 3, 4 };
        int[] freq = { 10, 5, 20, 100 };
        int i, n = arr.Length;

        // Let us generate 10 random numbers
        // according to given distribution
        for (i = 0; i < 5; i++)
            Console.WriteLine(myRand(arr, freq, n));
    }
}
UPDATE:
I ran this code to check it:
int[] intArray = new int[] { 1, 2, 3, 4, 5 };
int[] weights = new int[] { 5, 20, 20, 40, 15 };
List<int> results = new List<int>();
for (int i = 0; i < 100000; i++)
{
    results.Add(WeightedRNG.GetRand(intArray, weights, intArray.Length));
}

for (int i = 0; i < intArray.Length; i++)
{
    int itemsFound = results.Where(x => x == intArray[i]).Count();
    Console.WriteLine($"{intArray[i]} was returned {itemsFound} times.");
}
And here are the results with the constant:
1 was returned 5096 times.
2 was returned 19902 times.
3 was returned 20086 times.
4 was returned 40012 times.
5 was returned 14904 times.
And without the constant...
1 was returned 100000 times.
2 was returned 0 times.
3 was returned 0 times.
4 was returned 0 times.
5 was returned 0 times.
It completely breaks without it.
The constant does serve a purpose in some environments, but I don't believe this code is correct for C#.
To explain, let's look at the arguments to the function. The first sign something is off is passing n as an argument instead of inferring it from the arrays. The second sign is it's poor practice in C# to deal with paired arrays rather than something like a 2D array or sequence of single objects (such as a Tuple). But those are just indicators something is odd, and not evidence of any bugs.
So let's put that on hold for a moment and explain why a constant might matter by looking a small example.
Say you have three numbers (1, 2, and 3) with weights 3, 2, and 2. This function first builds up a prefix array, where each item includes the chances of finding the number for that index and all previous numbers.
We end up with a result like this: (3, 5, 7). Now we can use the last value and take a random number from 1 to 7. Values 1-3 result in 1, values 4 and 5 result in 2, and values 6 and 7 result in 3.
To find this random number the code now calls rand.Next(), and this is where I think the error comes in. On many platforms, the Next() function returns a double between 0 and 1. That's too small to use to look up your weighted value, so you then multiply by a large prime constant (related to the platform's epsilon value) to ensure you have a reasonably large result that will cover the entire possible range of desired weights (in this case, 1-7) and then some. Now you take the remainder (%) vs your max weight (7), and map it via the prefix array to get the final result.
So the first error is that, in C#, .Next() does not return a double. It is documented to return a non-negative random integer between 0 and int.MaxValue. Multiply that by 323567 and you get integer overflow (silent wraparound, in C#'s default unchecked context). Another sign of this mistake is the cast to int: the result of this function is already an int. And let's not even talk about the meaningless extra parentheses around (323567).
There is also another, more subtle, error.
Let's say the result of the (int)(rand.Next() * 323567) expression is 10. Now we take this value and get the remainder when dividing by our max weight (% 7). The problem here is we have two chances to roll a 1, 2, or 3 (the extra chance is if the original was 8, 9, or 10), and only one chance for the remaining weights (4-7). So we have introduced unintended bias into the system, completely throwing off the expected weights. (This is a simplification. The actual number space is not 1-10; it's 0 * 323567 to 0.999999 * 323567. But the potential for bias still exists as long as that max value is not evenly divisible by the max weight value.)
It's possible the constant was chosen because it has properties to minimize this effect, but again: it was based on a completely different kind of .Next() function.
Therefore the rand.Next() line should probably be changed to look like this:
int r = rand.Next(prefix[n - 1]) + 1;
or this:
int r = ((int)(rand.NextDouble() * (323567 * prefix[n - 1])) % prefix[n - 1]) + 1;
But, given the other errors, I'd be wary of using this code at all.
For fun, here's an example running several different options:
https://dotnetfiddle.net/Y5qhRm
The original random method (using NextDouble() and a bare constant) doesn't fare as badly as I'd expect.
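For reference, here is what the whole of myRand might look like with the fix applied, using the overload of Next that takes an exclusive upper bound directly (my adaptation, not the article's own code; passing the Random in also avoids the re-seeding pitfall of newing one up per call):

static int MyRand(int[] arr, int[] freq, Random rand)
{
    // Build the prefix-sum array of frequencies.
    int n = arr.Length;
    int[] prefix = new int[n];
    prefix[0] = freq[0];
    for (int i = 1; i < n; i++)
        prefix[i] = prefix[i - 1] + freq[i];

    // Uniform r in 1..prefix[n-1]; Next's upper bound is exclusive.
    int r = rand.Next(prefix[n - 1]) + 1;

    // Binary search for the first prefix value >= r.
    int lo = 0, hi = n - 1;
    while (lo < hi)
    {
        int mid = lo + ((hi - lo) >> 1);
        if (r > prefix[mid]) lo = mid + 1;
        else hi = mid;
    }
    return arr[lo];
}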

Find number with binary search - maximum and minimum number of operations

I want to know how I can calculate the maximum and minimum number of operations to find a specific number in a defined range.
Here the example code (C#):
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

namespace Rextester
{
    public class Program
    {
        public static void Main(string[] args)
        {
            Random r = new Random();
            int min = 2047;
            int max = 4096;
            int val = min;
            int counter = 0;
            int i = r.Next(2047, 4096);
            while (true)
            {
                Console.WriteLine("--> " + val);
                if (i == val) break;
                val = (max - min) / 2 + min;
                if (i > val)
                {
                    min = val;
                }
                else if (i < val)
                {
                    max = val;
                }
                counter++;
            }
            Console.WriteLine("\n\nThe calculated i is: " + val + ", but i is: " + i + ", done in " + counter + " counts.\n\n\n\n\n");
        }
    }
}
I can see that to find the random number in range from 2047 to 4096 takes 11 operations maximum (based on 10000 runs).
Where can I find a theoretical explanation of the calculation?
Thanks a lot!
The way binary search works is that you halve the search interval on each operation. So if I have 200 numbers, the first iteration will get it down to 100, the second to 50, then 25, 12, 6, 3, 1 and we're done. If you have some mathematical background, this pattern looks very familiar - it's an exponential function. In particular, since we're dealing with binary search, it's an exponential with base two.
So if you want to know how many values you can search with a given number of operations, it's just that power of two - if you only want to allow 8 operations, you can have up to 2^8 values to search in - 256. If you want to go the other way around, from number of values to (maximum) number of operations, you need to use the inverse function - a logarithm with base two. The base-two logarithm of 4096 is 12.
Why is your result 11? Because you don't increment the counter when i == val is true; but that's only a matter of which definition of "operation" you use. There are 11 halvings involved; the final operation doesn't do any halving.
4096 - 2047 = 2049, which equals 2^11 + 1
Your answer is the power of 2, which gives you the size of the interval within which you search.
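In code, that bound is just the base-two logarithm of the search-space size, rounded up (a quick sketch; the example's counter reports one less because it doesn't count the final i == val comparison):

// r.Next(2047, 4096) yields values 2047..4095, i.e. a space of 2049 values.
int size = 4095 - 2047 + 1;
int maxComparisons = (int)Math.Ceiling(Math.Log(size, 2)); // 12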

How do I generate random number between 0 and 1 in C#?

I want to get a random number between 1 and 0. However, I'm getting 0 every single time. Can someone explain why I am getting 0 all the time?
This is the code I have tried.
Random random = new Random();
int test = random.Next(0, 1);
Console.WriteLine(test);
Console.ReadKey();
According to the documentation, Next returns an integer random number between the (inclusive) minimum and the (exclusive) maximum:
Return Value
A 32-bit signed integer greater than or equal to minValue and less than maxValue; that is, the range of return values includes minValue but not maxValue. If minValue equals maxValue, minValue is returned.
The only integer number which fulfills
0 <= x < 1
is 0, hence you always get the value 0. In other words, 0 is the only integer that is within the half-closed interval [0, 1).
So, if you are actually interested in the integer values 0 or 1, then use 2 as upper bound:
var n = random.Next(0, 2);
If instead you want to get a decimal between 0 and 1, try:
var n = random.NextDouble();
You could, but for a double between 0 and 1 you should do it this way:
double test = random.NextDouble();
If you want a random integer (0 or 1), you should set the upper bound to 2, because it is exclusive:
int test = random.Next(0, 2);
Every single answer on this page regarding doubles is wrong, which is sort of hilarious because everyone is quoting the documentation. If you generate a double using NextDouble(), you will not get a number between 0 and 1 inclusive of 1, you will get a number from 0 to 1 exclusive of 1.
To get a double, you would have to do some trickery like this:
public double NextRandomRange(double minimum, double maximum)
{
    // Note: in real code, reuse a single Random instance; newing one up
    // per call can repeat values when called in quick succession.
    Random rand = new Random();
    return rand.NextDouble() * (maximum - minimum) + minimum;
}
and then call
NextRandomRange(0,1 + Double.Epsilon);
Seems like that would work, doesn't it? 1 + Double.Epsilon should be the next biggest number after 1 when working with doubles, right? This is how you would solve the problem with ints.
Wellllllllllllllll.........
I suspect that this will not work correctly, since the underlying code will be generating a few bytes of randomness, and then doing some math tricks to fit it in the expected range. The short answer is that Logic that applies to ints doesn't quite work the same when working with floats.
Let's look, shall we? (https://referencesource.microsoft.com/#mscorlib/system/random.cs,e137873446fcef75)
/*=====================================Next=====================================
**Returns: A double [0..1)
**Arguments: None
**Exceptions: None
==============================================================================*/
public virtual double NextDouble() {
    return Sample();
}
What the hell is Sample()?
/*====================================Sample====================================
**Action: Return a new random number [0..1) and reSeed the Seed array.
**Returns: A double [0..1)
**Arguments: None
**Exceptions: None
==============================================================================*/
protected virtual double Sample() {
    //Including this division at the end gives us significantly improved
    //random number distribution.
    return (InternalSample()*(1.0/MBIG));
}
Ok, starting to get somewhere. MBIG, by the way, is Int32.MaxValue (2147483647, or 2^31-1), making the division work out to:
InternalSample()*0.0000000004656612873077392578125;
Ok, what the hell is InternalSample()?
private int InternalSample() {
    int retVal;
    int locINext = inext;
    int locINextp = inextp;

    if (++locINext >= 56) locINext = 1;
    if (++locINextp >= 56) locINextp = 1;

    retVal = SeedArray[locINext] - SeedArray[locINextp];

    if (retVal == MBIG) retVal--;
    if (retVal < 0) retVal += MBIG;

    SeedArray[locINext] = retVal;
    inext = locINext;
    inextp = locINextp;

    return retVal;
}
Well...that is something. But what is this SeedArray and inext crap all about?
private int inext;
private int inextp;
private int[] SeedArray = new int[56];
So things start to fall together. Seed array is an array of ints that is used for generating values from. If you look at the init function def, you see that there is a whole lot of bit addition and trickery being done to randomize an array of 55 values with initial quasi-random values.
public Random(int Seed) {
    int ii;
    int mj, mk;

    //Initialize our Seed array.
    //This algorithm comes from Numerical Recipes in C (2nd Ed.)
    int subtraction = (Seed == Int32.MinValue) ? Int32.MaxValue : Math.Abs(Seed);
    mj = MSEED - subtraction;
    SeedArray[55] = mj;
    mk = 1;
    for (int i = 1; i < 55; i++) {  //Apparently the range [1..55] is special (All hail Knuth!) and so we're skipping over the 0th position.
        ii = (21 * i) % 55;
        SeedArray[ii] = mk;
        mk = mj - mk;
        if (mk < 0) mk += MBIG;
        mj = SeedArray[ii];
    }
    for (int k = 1; k < 5; k++) {
        for (int i = 1; i < 56; i++) {
            SeedArray[i] -= SeedArray[1 + (i + 30) % 55];
            if (SeedArray[i] < 0) SeedArray[i] += MBIG;
        }
    }
    inext = 0;
    inextp = 21;
    Seed = 1;
}
Ok, going back to InternalSample(), we can now see that random doubles are generated by taking the difference of two scrambled up 32 bit ints, clamping the result into the range of 0 to 2147483647 - 1 and then multiplying the result by 1/2147483647. More trickery is done to scramble up the list of seed values as it uses values, but that is essentially it.
(It is interesting to note that the chance of getting any number in the range is roughly 1/r EXCEPT for 2^31-2, which is 2 * (1/r)! So if you think some dumbass coder is using Next() to generate numbers on a video poker machine, you should always bet on 2^31-2! This is one reason why we don't use Random for anything important...)
So, if the output of InternalSample() is 0, we multiply that by 0.0000000004656612873077392578125 and get 0, the bottom end of our range. If we get 2147483646, we end up with 0.9999999995343387126922607421875, so the claim that NextDouble produces a result of [0,1) is...sort of right? It would be more accurate to say it is in the range [0, 0.9999999995343387126922607421875].
My suggested above solution would fall on its face, since double.Epsilon = 4.94065645841247E-324, which is WAY smaller than 0.0000000004656612873077392578125 (the amount you would add to our above result to get 1).
Ironically, if it were not for the subtraction of one in the InternalSample() method:
if (retVal == MBIG) retVal--;
we could get to 1 in the return values that come back. So either you copy all the code in the Random class and omit the retVal-- line, or multiply the NextDouble() output by something like 1.0000000004656612875245796924106 to slightly stretch the output to include 1 in the range. Actually testing that value gets us really close, but I don't know if the few hundred million tests I ran just didn't produce 2147483646 (quite likely) or there is a floating-point error creeping into the equation. I suspect the former. Millions of tests are unlikely to yield a result that has 1 in 2 billion odds.
NextRandomRange(0,1.0000000004656612875245796924106); // try to explain where you got that number during the code review...
TLDR? Inclusive ranges with random doubles are tricky...
You are getting zero because Random.Next(a, b) returns a number in the range [a, b), which is greater than or equal to a and less than b.
If you want to get one of the {0, 1}, you should use:
var random = new Random();
var test = random.Next(0, 2);
Because you asked for a number less than 1.
The documentation says:
Return Value
A 32-bit signed integer greater than or equal to minValue and less than maxValue; that is, the range of return values
includes minValue but not maxValue. If minValue equals maxValue,
minValue is returned.
Rewrite the code like this if you are targeting 0.0 to 1.0
Random random = new Random();
double test = random.NextDouble();
Console.WriteLine(test);
Console.ReadKey();

Efficient Rolling Max and Min Window

I want to calculate a rolling maximum and minimum value efficiently. Meaning anything better than recalculating the maximum/minimum from all the values in use every time the window moves.
There was a post on here that asked the same thing and someone posted a solution involving some kind of stack approach that supposedly worked based on its rating. However I can't find it again for the life of me.
Any help would be appreciated in finding a solution or the post. Thank you all!
The algorithm you want to use is called the ascending minima (C++ implementation).
To do this in C#, you will want to get a double ended queue class, and a good one exists on NuGet under the name Nito.Deque.
I have written a quick C# implementation using Nito.Deque, but I have only briefly checked it, and did it from my head so it may be wrong!
public static class AscendingMinima
{
    private struct MinimaValue
    {
        public int RemoveIndex { get; set; }
        public double Value { get; set; }
    }

    public static double[] GetMin(this double[] input, int window)
    {
        var queue = new Deque<MinimaValue>();
        var result = new double[input.Length];

        for (int i = 0; i < input.Length; i++)
        {
            var val = input[i];

            // Note: in Nito.Deque, queue[0] is the front.

            // Drop entries that have slid out of the window.
            while (queue.Count > 0 && i >= queue[0].RemoveIndex)
                queue.RemoveFromFront();

            // Drop entries that can never be a minimum again (they are >= val).
            while (queue.Count > 0 && queue[queue.Count - 1].Value >= val)
                queue.RemoveFromBack();

            queue.AddToBack(new MinimaValue { RemoveIndex = i + window, Value = val });

            result[i] = queue[0].Value;
        }

        return result;
    }
}
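Usage is then a one-liner; for example (the expected values here are worked out by hand from the algorithm above):

double[] data = { 5, 2, 8, 1, 9, 3 };
double[] mins = data.GetMin(3);
// mins: 5, 2, 2, 1, 1, 1 (minimum of the up-to-3-element window ending at each index)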
Here's one way to do it more efficiently. You still have to calculate the value occasionally but, other than certain degenerate data (ever decreasing values), that's minimised in this solution.
We'll limit ourselves to the maximum to simplify things but it's simple to extend to a minimum as well.
All you need is the following:
The window itself, initially empty.
The current maximum (max), initially any value.
The count of the current maximum (maxcount), initially zero.
The idea is to use max and maxcount as a cache for holding the current maximum. Where the cache is valid, you only need to return the value in it, a very fast constant-time operation.
If the cache is invalid when you ask for the maximum, it populates the cache and then returns that value. This is slower than the method in the previous paragraph but subsequent requests for the maximum once the cache is valid again use that faster method.
Here's what you do for maintaining the window and associated data:
Get the next value N.
If the window is full, remove the earliest entry M. If maxcount is greater than 0 and M is equal to max, decrement maxcount. Once maxcount reaches 0, the cache is invalid but we don't need to worry about that until such time the user requests the maximum value (there's no point repopulating the cache until then).
Add N to the rolling window.
If the window size is now 1 (that N is the only current entry), set max to N and maxcount to 1, then go back to step 1.
If maxcount is greater than 0 and N is greater than max, set max to N and maxcount to 1, then go back to step 1.
If maxcount is greater than 0 and N is equal to max, increment maxcount.
Go back to step 1.
Now, at any point while that window management is going on, you may request the maximum value. This is a separate operation, distinct from the window management itself. This can be done using the following rules in sequence.
If the window is empty, there is no maximum: raise an exception or return some sensible sentinel value.
If maxcount is greater than 0, then the cache is valid: simply return max.
Otherwise, the cache needs to be repopulated. Go through the entire list, setting up max and maxcount as per the code snippet below.
set max to window[0], maxcount to 0
for each x in window[]:
if x > max:
set max to x, maxcount to 1
else:
if x == max:
increment maxcount
The fact that you mostly maintain a cache of the maximum value and only recalculate when needed makes this a much more efficient solution than simply recalculating blindly whenever an entry is added.
For some definite statistics, I created the following Python program. It uses a sliding window of size 25 and uses random numbers from 0 to 999 inclusive (you can play with these properties to see how they affect the outcome).
First some initialisation code. Note the stat variables, they'll be used to count cache hits and misses:
import random
window = []
max = 0
maxcount = 0
maxwin = 25
statCache = 0
statNonCache = 0
Then the function to add a number to the window, as per my description above:
def addNum(n):
    global window
    global max
    global maxcount

    if len(window) == maxwin:
        m = window[0]
        window = window[1:]
        if maxcount > 0 and m == max:
            maxcount = maxcount - 1

    window.append(n)

    if len(window) == 1:
        max = n
        maxcount = 1
        return

    if maxcount > 0 and n > max:
        max = n
        maxcount = 1
        return

    if maxcount > 0 and n == max:
        maxcount = maxcount + 1
Next, the code which returns the maximum value from the window:
def getMax():
    global max
    global maxcount
    global statCache
    global statNonCache

    if len(window) == 0:
        return None

    if maxcount > 0:
        statCache = statCache + 1
        return max

    max = window[0]
    maxcount = 0
    for val in window:
        if val > max:
            max = val
            maxcount = 1
        else:
            if val == max:
                maxcount = maxcount + 1

    statNonCache = statNonCache + 1
    return max
And, finally, the test harness:
random.seed()
for i in range(1000000):
    val = int(1000 * random.random())
    addNum(val)
    newmax = getMax()

print("%d cached, %d non-cached" % (statCache, statNonCache))
Note that the test harness attempts to get the maximum for every time you add a number to the window. In practice, this may not be needed. In other words, this is the worst-case scenario for the random data generated.
Running that program a few times for pseudo-statistical purposes, we get (formatted and analysed for reporting purposes):
960579 cached, 39421 non-cached
960373 cached, 39627 non-cached
960395 cached, 39605 non-cached
960348 cached, 39652 non-cached
960441 cached, 39559 non-cached
960602 cached, 39398 non-cached
960561 cached, 39439 non-cached
960463 cached, 39537 non-cached
960409 cached, 39591 non-cached
960798 cached, 39202 non-cached
======= ======
9604969 395031
So you can see that, on average for random data, only about 3.95% of the cases resulted in a calculation hit (cache miss). The vast majority used the cached values. That should be substantially better than having to recalculate the maximum on every insertion into the window.
Some things that will affect that percentage will be:
The window size. Larger sizes mean there's more likelihood of a cache hit, improving the percentage. For example, doubling the window size pretty much halved the cache misses (to 1.95%).
The range of possible values. Less choice here means that there's more likely to be cache hits in the window. For example, reducing the range from 0..999 to 0..9 gave a big improvement in reducing cache misses (0.85%).
I'm assuming that by "window" you mean the range a[start] to a[start + len], with start moving along. Consider the minimal value (the maximal is similar) and the move to the window a[start + 1] to a[start + len + 1]. The minimal value of the window changes only if (a) a[start + len + 1] < min (a smaller value came in), or (b) a[start] == min (one of the smallest values just left; recompute the minimum).
Another, possibly more efficient, way of doing this is to fill a priority queue with the first window and update it with each value entering/leaving, but I don't think that is much better: priority queues aren't suited to removing an arbitrary element from the middle (what you need to do when advancing the window), and the code will be much more complex. Better to stick to the simple solution until it's proven that the performance isn't acceptable and that this code is responsible for (much of) the resource consumption.
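For what it's worth, if you do want to try the ordered-container route in C#, a SortedSet of (value, sequence-number) pairs sidesteps the remove-from-the-middle problem, because you always know exactly which pair is leaving the window. A sketch (my own naming, roughly O(log n) per step):

using System.Collections.Generic;

class RollingMax
{
    // SortedSet cannot hold duplicates, so pair each value with a unique
    // sequence number; the window queue remembers which pair to remove.
    private readonly SortedSet<(double Value, long Seq)> _set = new SortedSet<(double, long)>();
    private readonly Queue<(double Value, long Seq)> _window = new Queue<(double, long)>();
    private readonly int _size;
    private long _seq;

    public RollingMax(int size) { _size = size; }

    public double Add(double value)
    {
        var item = (value, _seq++);
        _set.Add(item);
        _window.Enqueue(item);
        if (_window.Count > _size)
            _set.Remove(_window.Dequeue()); // evict the oldest entry exactly
        return _set.Max.Value;              // current window maximum
    }
}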
After writing my own algo yesterday and asking for improvements, I was referred here. Indeed this algo is more elegant.
I'm not sure it offers constant-speed calculation regardless of the window size, but regardless, I tested the performance against my own caching algo (fairly simple, and probably using the same idea others have proposed).
The caching is 8-15 times faster (tested with rolling windows of 5, 50, 300, 1000; I don't need more).
Below are both alternatives with stopwatches and result validation.
static class Program
{
    static Random r = new Random();
    static int Window = 50; //(small to facilitate visual functional test). eventually could be 100, 1000, but not more than 5000.
    const int FullDataSize = 1000;
    static double[] InputArr = new double[FullDataSize]; //array prefilled with the random input data.

    //====================== Caching algo variables
    static double Low = 0;
    static int LowLocation = 0;
    static int CurrentLocation = 0;
    static double[] Result1 = new double[FullDataSize]; //contains the caching minimum result
    static int i1; //incrementor, just to store the result back to the array. In real life, the result is not even stored back to array.

    //====================== Ascending Minima algo variables
    static double[] Result2 = new double[FullDataSize]; //contains ascending minima result.
    static double[] RollWinArray = new double[Window]; //array for the caching algo
    static Deque<MinimaValue> RollWinDeque = new Deque<MinimaValue>(); //Nito.Deque nuget.
    static int i2; //used by the struct of the Deque (not just for result storage)

    //====================================== my initially proposed caching algo
    static void CalcCachingMin(double currentNum)
    {
        RollWinArray[CurrentLocation] = currentNum;
        if (currentNum <= Low)
        {
            LowLocation = CurrentLocation;
            Low = currentNum;
        }
        else if (CurrentLocation == LowLocation)
            ReFindHighest();

        CurrentLocation++;
        if (CurrentLocation == Window) CurrentLocation = 0; //this is faster
        //CurrentLocation = CurrentLocation % Window; //this is slower, still over 10 fold faster than ascending minima

        Result1[i1++] = Low;
    }

    //full iteration run each time lowest is overwritten.
    static void ReFindHighest()
    {
        Low = RollWinArray[0];
        LowLocation = 0; //bug fix. missing from initial version.
        for (int i = 1; i < Window; i++)
            if (RollWinArray[i] < Low)
            {
                Low = RollWinArray[i];
                LowLocation = i;
            }
    }

    //======================================= Ascending Minima algo based on http://stackoverflow.com/a/14823809/2381899
    private struct MinimaValue
    {
        public int RemoveIndex { get; set; }
        public double Value { get; set; }
    }

    public static void CalcAscendingMinima(double newNum)
    { //same algo as the extension method below, but used on external arrays, and fed with 1 data point at a time like in the projected real time app.
        while (RollWinDeque.Count > 0 && i2 >= RollWinDeque[0].RemoveIndex)
            RollWinDeque.RemoveFromFront();
        while (RollWinDeque.Count > 0 && RollWinDeque[RollWinDeque.Count - 1].Value >= newNum)
            RollWinDeque.RemoveFromBack();
        RollWinDeque.AddToBack(new MinimaValue { RemoveIndex = i2 + Window, Value = newNum });
        Result2[i2++] = RollWinDeque[0].Value;
    }

    public static double[] GetMin(this double[] input, int window)
    { //this is the initial extension method for ascending minima
        //taken from http://stackoverflow.com/a/14823809/2381899
        var queue = new Deque<MinimaValue>();
        var result = new double[input.Length];
        for (int i = 0; i < input.Length; i++)
        {
            var val = input[i];
            // Note: in Nito.Deque, queue[0] is the front
            while (queue.Count > 0 && i >= queue[0].RemoveIndex)
                queue.RemoveFromFront();
            while (queue.Count > 0 && queue[queue.Count - 1].Value >= val)
                queue.RemoveFromBack();
            queue.AddToBack(new MinimaValue { RemoveIndex = i + window, Value = val });
            result[i] = queue[0].Value;
        }
        return result;
    }

    //============================================ Test program.
    static void Main(string[] args)
    { //this is the test program.
        //it runs several attempts of both algos on the same data.
        for (int j = 0; j < 10; j++)
        {
            Low = 12000;
            for (int i = 0; i < Window; i++)
                RollWinArray[i] = 10000000;

            //Fill the data + functional test - generate numbers and check them in as you go:
            InputArr[0] = 12000;
            for (int i = 1; i < FullDataSize; i++) //fill the Input array with random data.
                //InputArr[i] = r.Next(100) + 11000; //simple data.
                InputArr[i] = InputArr[i - 1] + r.NextDouble() - 0.5; //brownian motion data.

            Stopwatch stopwatch = new Stopwatch();
            stopwatch.Start();
            for (int i = 0; i < FullDataSize; i++) //run the Caching algo.
                CalcCachingMin(InputArr[i]);
            stopwatch.Stop();
            Console.WriteLine("Caching  : " + stopwatch.ElapsedTicks + " mS: " + stopwatch.ElapsedMilliseconds);

            stopwatch.Reset();
            stopwatch.Start();
            for (int i = 0; i < FullDataSize; i++) //run the Ascending Minima algo
                CalcAscendingMinima(InputArr[i]);
            stopwatch.Stop();
            Console.WriteLine("AscMinima: " + stopwatch.ElapsedTicks + " mS: " + stopwatch.ElapsedMilliseconds);
            stopwatch.Reset();

            i1 = 0; i2 = 0; RollWinDeque.Clear();
        }

        for (int i = 0; i < FullDataSize; i++) //test the results.
            if (Result2[i] != Result1[i]) //this is a test that algos are valid. Errors (mismatches) are printed.
                Console.WriteLine("Current:" + InputArr[i].ToString("#.00") + "\tLowest of " + Window + " last is " + Result1[i].ToString("#.00") + " " + Result2[i].ToString("#.00") + "\t" + (Result1[i] == Result2[i])); //for validation purposes only.

        Console.ReadLine();
    }
}
I would suggest maintaining a stack which supports getMin() or getMax().
This can be done with two stacks and costs only constant time.
fyi: https://www.geeksforgeeks.org/design-a-stack-that-supports-getmin-in-o1-time-and-o1-extra-space/
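For a sliding window you actually need a queue with Min, which you can build out of two such min-tracking stacks. A sketch in C# (my own naming, following the linked approach): each stack entry carries the minimum of everything beneath it, so Enqueue, Dequeue, and Min are all amortized O(1). Slide the window by enqueuing the new value, dequeuing the oldest, and reading Min().

using System;
using System.Collections.Generic;

class MinQueue
{
    // Each entry carries the minimum of everything beneath it on its stack.
    private readonly Stack<(int Value, int Min)> _in = new Stack<(int, int)>();
    private readonly Stack<(int Value, int Min)> _out = new Stack<(int, int)>();

    public void Enqueue(int value)
    {
        int min = _in.Count == 0 ? value : Math.Min(value, _in.Peek().Min);
        _in.Push((value, min));
    }

    public int Dequeue()
    {
        if (_out.Count == 0)        // refill: reverse the in-stack once
            while (_in.Count > 0)
            {
                int v = _in.Pop().Value;
                int min = _out.Count == 0 ? v : Math.Min(v, _out.Peek().Min);
                _out.Push((v, min));
            }
        return _out.Pop().Value;
    }

    public int Min() // assumes the queue is non-empty
    {
        if (_in.Count == 0) return _out.Peek().Min;
        if (_out.Count == 0) return _in.Peek().Min;
        return Math.Min(_in.Peek().Min, _out.Peek().Min);
    }
}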

Random playlist algorithm

I need to create a list of numbers from a range (for example from x to y) in a random order so that every order has an equal chance.
I need this for a music player I'm writing in C#, to create play lists in a random order.
Any ideas?
Thanks.
EDIT: I'm not interested in changing the original list, just in picking random indexes from a range in a random order so that every order has an equal chance.
Here's what I've written so far:
public static IEnumerable<int> RandomIndexes(int count)
{
    if (count > 0)
    {
        int[] indexes = new int[count];
        int indexesCountMinus1 = count - 1;

        for (int i = 0; i < count; i++)
        {
            indexes[i] = i;
        }

        Random random = new Random();
        while (indexesCountMinus1 > 0)
        {
            int currIndex = random.Next(0, indexesCountMinus1 + 1);
            yield return indexes[currIndex];

            indexes[currIndex] = indexes[indexesCountMinus1];
            indexesCountMinus1--;
        }

        yield return indexes[0];
    }
}
It's working, but the only problem is that I need to allocate an array in memory of size count. I'm looking for something that does not require memory allocation.
Thanks.
This can actually be tricky if you're not careful (i.e., using a naïve shuffling algorithm). Take a look at the Fisher-Yates/Knuth shuffle algorithm for proper distribution of values.
Once you have the shuffling algorithm, the rest should be easy.
Here's more detail from Jeff Atwood.
Lastly, here's Jon Skeet's implementation and description.
EDIT
I don't believe that there's a solution that satisfies your two conflicting requirements (first, to be random with no repeats and second to not allocate any additional memory). I believe you may be prematurely optimizing your solution as the memory implications should be negligible, unless you're embedded. Or, perhaps I'm just not smart enough to come up with an answer.
With that, here's code that will create an array of evenly distributed random indexes using the Knuth-Fisher-Yates algorithm (with a slight modification). You can cache the resulting array, or perform any number of optimizations depending on the rest of your implementation.
private static int[] BuildShuffledIndexArray( int size ) {

    int[] array = new int[size];
    Random rand = new Random();
    for ( int currentIndex = array.Length - 1; currentIndex > 0; currentIndex-- ) {
        int nextIndex = rand.Next( currentIndex + 1 );
        Swap( array, currentIndex, nextIndex );
    }
    return array;
}

private static void Swap( IList<int> array, int firstIndex, int secondIndex ) {

    // Lazy initialization: a slot still holding the default 0 means it has
    // never been touched, so its logical value is its own index. (Slots are
    // only ever read at or below the shrinking currentIndex, so a real 0
    // that was swapped upwards can never be mistaken for an untouched slot.)
    if ( array[firstIndex] == 0 ) {
        array[firstIndex] = firstIndex;
    }
    if ( array[secondIndex] == 0 ) {
        array[secondIndex] = secondIndex;
    }

    int temp = array[secondIndex];
    array[secondIndex] = array[firstIndex];
    array[firstIndex] = temp;
}
NOTE: You can use ushort instead of int to halve the size in memory, as long as you don't have more than 65,535 items in your playlist. You could always programmatically switch to int if the size exceeds ushort.MaxValue. If I, personally, added more than 65K items to a playlist, I wouldn't be shocked by increased memory utilization.
Remember, too, that this is a managed language. The VM will always reserve more memory than you are using to limit the number of times it needs to ask the OS for more RAM and to limit fragmentation.
EDIT
Okay, last try: we can look to tweak the performance/memory trade off: You could create your list of integers, then write it to disk. Then just keep a pointer to the offset in the file. Then every time you need a new number, you just have disk I/O to deal with. Perhaps you can find some balance here, and just read N-sized blocks of data into memory where N is some number you're comfortable with.
Seems like a lot of work for a shuffle algorithm, but if you're dead-set on conserving memory, then at least it's an option.
If you use a maximal linear feedback shift register, you will use O(1) of memory and roughly O(1) time. See here for a handy C implementation (two lines! woo-hoo!) and tables of feedback terms to use.
And here is a solution:
public class MaximalLFSR
{
    private int GetFeedbackSize(uint v)
    {
        uint r = 0;
        while ((v >>= 1) != 0)
        {
            r++;
        }
        if (r < 4)
            r = 4;
        return (int)r;
    }

    static uint[] _feedback = new uint[] {
        0x9, 0x17, 0x30, 0x44, 0x8e,
        0x108, 0x20d, 0x402, 0x829, 0x1013, 0x203d, 0x4001, 0x801f,
        0x1002a, 0x2018b, 0x400e3, 0x801e1, 0x10011e, 0x2002cc, 0x400079, 0x80035e,
        0x1000160, 0x20001e4, 0x4000203, 0x8000100, 0x10000235, 0x2000027d, 0x4000016f, 0x80000478
    };

    private uint GetFeedbackTerm(int bits)
    {
        if (bits < 4 || bits >= 28)
            throw new ArgumentOutOfRangeException("bits");
        return _feedback[bits];
    }

    public IEnumerable<int> RandomIndexes(int count)
    {
        if (count < 0)
            throw new ArgumentOutOfRangeException("count");

        int bitsForFeedback = GetFeedbackSize((uint)count);
        Random r = new Random();
        uint i = (uint)(r.Next(1, count - 1));
        uint feedback = GetFeedbackTerm(bitsForFeedback);
        int valuesReturned = 0;

        while (valuesReturned < count)
        {
            if ((i & 1) != 0)
            {
                i = (i >> 1) ^ feedback;
            }
            else
            {
                i = (i >> 1);
            }
            if (i <= count)
            {
                valuesReturned++;
                yield return (int)(i - 1);
            }
        }
    }
}
Now, I selected the feedback terms (badly) at random from the link above. You could also implement a version that had multiple maximal terms and you select one of those at random, but you know what? This is pretty dang good for what you want.
Here is test code:
static void Main(string[] args)
{
    while (true)
    {
        Console.Write("Enter a count: ");
        string s = Console.ReadLine();
        int count;
        if (Int32.TryParse(s, out count))
        {
            MaximalLFSR lfsr = new MaximalLFSR();
            foreach (int i in lfsr.RandomIndexes(count))
            {
                Console.Write(i + ", ");
            }
        }
        Console.WriteLine("Done.");
    }
}
Be aware that maximal LFSRs never generate 0. I've hacked around this by returning the i term minus 1. This works well enough. Also, since you want to guarantee uniqueness, I ignore anything out of range - the LFSR only generates sequences up to powers of two, so in high ranges, it will generate at worst 2x-1 too many values. These will get skipped - that will still be faster than Fisher-Yates-Knuth.
Personally, for a music player, I wouldn't generate a shuffled list, and then play that, then generate another shuffled list when that runs out, but do something more like:
IEnumerable<Song> GetSongOrder(List<Song> allSongs)
{
    var random = new Random();
    var playOrder = new List<Song>();
    while (true)
    {
        // this step assigns an integer weight to each song, corresponding
        // to how likely it is to be played next: a song heard within the
        // last 10 plays gets weight 1, everything else gets weight 50.
        // in a better implementation, this would look at the total number of
        // songs as well, and provide a smoother ramp up/down.
        var weights = allSongs
            .Select(x => playOrder.LastIndexOf(x) > playOrder.Count - 10 ? 1 : 50)
            .ToList();

        int position = random.Next(weights.Sum());
        for (int i = 0; i < allSongs.Count; i++)
        {
            position -= weights[i];
            if (position < 0)
            {
                var song = allSongs[i];
                playOrder.Add(song);
                yield return song;
                break;
            }
        }

        // trim playOrder to prevent infinite memory growth here as well.
        if (playOrder.Count > allSongs.Count * 10)
            playOrder = playOrder.Skip(allSongs.Count * 8).ToList();
    }
}
This would make songs picked in order, as long as they haven't been recently played. This provides "smoother" transitions from the end of one shuffle to the next, because the first song of the next shuffle could be the same song as the last shuffle with 1/(total songs) probability, whereas this algorithm has a lower (and configurable) chance of hearing one of the last x songs again.
Unless you shuffle the original song list (which you said you don't want to do), you are going to have to allocate some additional memory to accomplish what you're after.
If you generate the random permutation of song indices beforehand (as you are doing), you obviously have to allocate some non-trivial amount of memory to store it, either encoded or as a list.
If the user doesn't need to be able to see the list, you could generate the random song order on the fly: After each song, pick another random song from the pool of unplayed songs. You still have to keep track of which songs have already been played, but you can use a bitfield for that. If you have 10000 songs, you just need 10000 bits (1250 bytes), each one representing whether the song has been played yet.
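A sketch of that bitfield bookkeeping using BitArray (the class and names here are hypothetical; note the retry loop does slow down near the end of a cycle, as other answers point out):

using System;
using System.Collections;

class UnplayedPicker
{
    private readonly BitArray _played;
    private readonly Random _random = new Random();
    private int _remaining;

    public UnplayedPicker(int songCount)
    {
        _played = new BitArray(songCount); // all false: nothing played yet
        _remaining = songCount;
    }

    // Picks a uniformly random song that hasn't been played this cycle.
    public int NextSongIndex()
    {
        if (_remaining == 0)
        {
            _played.SetAll(false); // start a fresh cycle
            _remaining = _played.Count;
        }
        int i;
        do { i = _random.Next(_played.Count); } while (_played[i]);
        _played[i] = true;
        _remaining--;
        return i;
    }
}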
I don't know your exact limitations, but I have to wonder if the memory required to store a playlist is significant compared to the amount required for playing audio.
There are a number of methods of generating permutations without needing to store the state. See this question.
I think you should stick to your current solution (the one in your edit).
To do a re-order with no repetitions and keep your code behaving reliably, you have to track what you have already used: either directly, by keeping unused indexes, or indirectly, by swapping from the original list.
I suggest checking whether it matters in the context of the working application, i.e. whether the allocation is of any significance compared to the memory used by other pieces of the system.
From a logical standpoint, it is possible. Given a list of n songs, there are n! permutations; if you assign each permutation a number from 1 to n! (or 0 to n!-1 :-D) and pick one of those numbers at random, you can then store the number of the permutation that you are currently using, along with the original list and the index of the current song within the permutation.
For example, if you have a list of songs {1, 2, 3}, your permutations are:
0: {1, 2, 3}
1: {1, 3, 2}
2: {2, 1, 3}
3: {2, 3, 1}
4: {3, 1, 2}
5: {3, 2, 1}
So the only data I need to track is the original list ({1, 2, 3}), the current song index (e.g. 1) and the index of the permutation (e.g. 3). Then, if I want to find the next song to play, I know it's third (2, but zero-based) song of permutation 3, e.g. Song 1.
However, this method relies on you having an efficient means of determining the ith song of the jth permutation, which, until I've had a chance to think (or someone with a stronger mathematical background than I can interject), is equivalent to "then a miracle happens". But the principle is there.
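For what it's worth, that "miracle" exists and is called the factorial number system (Lehmer code): the permutation index, taken as mixed-radix digits, selects among the remaining songs at each step. A sketch (my own naming; rank must be in 0..n!-1, and it's only practical up to about 20 songs, after which n! overflows a ulong):

using System;
using System.Collections.Generic;

static class PermutationUnrank
{
    // Returns the song at position `pos` of permutation number `rank`
    // (0-based, lexicographic order) over `songs`.
    public static T SongAt<T>(IList<T> songs, ulong rank, int pos)
    {
        var remaining = new List<T>(songs);
        ulong f = Factorial(remaining.Count - 1);
        for (int i = 0; ; i++)
        {
            int index = (int)(rank / f);  // which remaining item comes next
            if (i == pos) return remaining[index];
            rank %= f;
            remaining.RemoveAt(index);
            f /= (ulong)remaining.Count;  // (n-1-i)! -> (n-2-i)!
        }
    }

    private static ulong Factorial(int n)
    {
        ulong r = 1;
        for (int i = 2; i <= n; i++) r *= (ulong)i;
        return r;
    }
}

Checking it against the listing above: PermutationUnrank.SongAt(new[] { 1, 2, 3 }, 3, 2) walks permutation 3, which is {2, 3, 1}, and returns 1, matching the example.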
If memory were really a concern past a certain number of records, and it's safe to say that once that memory boundary is reached there are enough items in the list that some repeats won't matter (just as long as the same song is not repeated twice in a row), I would use a combination method.
Case 1: If count < max memory constraint, generate the playlist ahead of time and use Knuth shuffle (see Jon Skeet's implementation, mentioned in other answers).
Case 2: If count >= max memory constraint, the song to be played is determined at run time (I'd do it as soon as the song starts playing, so the next song is already chosen by the time the current song ends). Save the last [max memory constraint, or some token value] number of songs played; generate a random number (R) between 1 and the song count; and if R equals one of the last X songs played, generate a new R until it is not in the list. Play that song.
Your max memory constraints will always be upheld, although performance can suffer in case 2 if you've played a lot of songs/get repeat random numbers frequently by chance.
You could use a trick we use in SQL Server to order sets randomly: ORDER BY NEWID(). The LINQ equivalent is to order by a freshly generated Guid per element; the resulting orders are, for practical purposes, evenly distributed.
private IEnumerable<int> RandomIndexes(int startIndexInclusive, int endIndexInclusive)
{
    if (endIndexInclusive < startIndexInclusive)
        throw new Exception("endIndex must be equal or higher than startIndex");

    List<int> originalList = new List<int>(endIndexInclusive - startIndexInclusive);
    for (int i = startIndexInclusive; i <= endIndexInclusive; i++)
        originalList.Add(i);

    return from i in originalList
           orderby Guid.NewGuid()
           select i;
}
You're going to have to allocate some memory, but it doesn't have to be a lot. You can reduce the memory footprint by using a bool array instead of int. (In C#, a bool[] element actually occupies a whole byte, so this uses count bytes; System.Collections.BitArray would get you down to roughly count / 8 bytes if you really need single bits.)
public static IEnumerable<int> RandomIndexes(int count)
{
    Random rand = new Random();
    bool[] used = new bool[count];

    int i;
    for (int counter = 0; counter < count; counter++)
    {
        while (used[i = rand.Next(count)]) ; // i = some random unused value
        used[i] = true;
        yield return i;
    }
}
Hope that helps!
As many others have said, you should implement first and THEN optimize, and only optimize the parts that need it (which you identify with a profiler). I offer a (hopefully) elegant method of getting the list you need, which doesn't really care so much about performance:
using System;
using System.Collections.Generic;
using System.Linq;

namespace Test
{
    class Program
    {
        static void Main(string[] a)
        {
            Random random = new Random();
            List<int> list1 = new List<int> { 0, 1, 2, 3, 4 }; // source list
            List<int> list2 = new List<int>();

            list2 = random.SequenceWhile(
                i =>
                {
                    // skip anything we have already yielded
                    if (list2.Contains(i))
                    {
                        return true;
                    }
                    list2.Add(i);
                    return false;
                },
                () => list2.Count < list1.Count,
                list1.Count).ToList();
        }
    }

    public static class RandomExtensions
    {
        public static IEnumerable<int> SequenceWhile(
            this Random random,
            Func<int, bool> shouldSkip,
            Func<bool> continuationCondition,
            int maxValue)
        {
            int current = random.Next(maxValue);
            while (continuationCondition())
            {
                if (!shouldSkip(current))
                {
                    yield return current;
                }
                current = random.Next(maxValue);
            }
        }
    }
}
It is pretty much impossible to do it without allocating extra memory. If you're worried about the amount of extra memory allocated, you could always pick a random subset and shuffle between those. You'll get repeats before every song is played, but with a sufficiently large subset I'll warrant few people will notice.
const int MaxItemsToShuffle = 20;

public static IEnumerable<int> RandomIndexes(int count)
{
    Random random = new Random();
    int indexCount = Math.Min(count, MaxItemsToShuffle);
    int[] indexes = new int[indexCount];

    if (count > MaxItemsToShuffle)
    {
        // Selection sampling: pick item i with probability
        // (slots still to fill) / (items still to scan). Dividing by
        // count - i (not count - i + 1) guarantees exactly
        // MaxItemsToShuffle items are chosen, uniformly.
        int cur = 0, subsetCount = MaxItemsToShuffle;
        for (int i = 0; i < count; i += 1)
        {
            if (random.NextDouble() <= ((double)subsetCount / (double)(count - i)))
            {
                indexes[cur] = i;
                cur += 1;
                subsetCount -= 1;
            }
        }
    }
    else
    {
        for (int i = 0; i < count; i += 1)
        {
            indexes[i] = i;
        }
    }

    for (int i = indexCount; i > 0; i -= 1)
    {
        int curIndex = random.Next(0, i);
        yield return indexes[curIndex];
        indexes[curIndex] = indexes[i - 1];
    }
}
