How to use Shuffling to improve rng?

How to use Shuffling to improve rng? - c#

Following a pseudo code implementation of Bays & Durham I'm trying to sort a random sequence of numbers using the inbuilt random in c#.
The pseudo code I'm following is:
An array of random numbers (V) is generated
A random number is generated (Y) - the seed should be the last number in the array
Generate a random Index (y) using the following formula: (K*Y) / m
Where:
K - Size of the array
Y - The random number generated
M - The modulus used in the random number generator used to populate the array
The element at position Y in the array i.e. V[y] is returned
The element at position Y to in the array is set to Y itself i.e. V[y] = Y
int[] seq = GetRand(size, min, max);
int nextRand = cRNG.Next(seq[seq.Length-1]);
//int index = (seq.Length * nextRand) / [Add Modulus];
return seq;
The first couple steps I could follow. Now, I need to return a shuffled array so the pseudo code needs to be modified a little.
Few pointers on above code:
cRNG -> The instance name of Random
GetRand(...) -> Has a forloop using cRNG
Now what I'm not understanding are the following:
1) Given I'm using the inbuilt random how can I get the modulus used to populate the array since it's simply a for loop using cRNG.Next(min, max)
2) The last step I don't fully understand it
Any help would be quite appreciated!
[Edit]
Following Samuel Vidal solution would this work? (Assuming I have to do it all in a single method)
public static int[] Homework(int size, int min, int max)
{
var seq = GetRand(size, min, max);
nextRand = cRNG.Next(min, max);
var index = (int) ((seq.Length * (double) (nextRand - min)) / (max - min));
nextRand = seq[index];
seq[index] = cRNG.Next(min, max);
return sequence; // nextRand that should be returned instead.
// Made it return the array because it should return the newly shuffled array hence the for loop
}

I think what you want is the following,
var index = (int) ((sequence.Length * (double)(randNum - min)) / (max - min));
But I suggest that you fill the array with unaltered random numbers from the source, then that you map the output of you random number generator to the interval [min, max[ as a overloading method Next(int min, int max) that calls the output of your Bays-Durham shuffle process.
public class BaysDurham
{
private readonly int[] t;
private int y; // auxiliary variable
// Knuth TAOCP 2 p. 34 Algorithm B
public BaysDurham(int k)
{
t = new int[k];
for (int i = 0; i < k; i++)
{
t[i] = rand.Next();
}
y = rand.Next();
}
public int Next()
{
var i = (int)(((long) t.Length * y) / int.MaxValue);
y = t[i];
t[i] = rand.Next();
return y;
}
public int Next(int min, int max)
{
long m = max - min;
return min + (int) ((Next() * m) / int.MaxValue);
}
private readonly Random rand = new Random();
}
Generally, it is a great software design principle to have separation of concerns.
Test code:
[Test]
public void Test()
{
const int k = 30;
const int n = 1000 * 1000;
const int min = 10;
const int max = 27;
var rng = new BaysDurham(k);
var count = new int[max];
for (int i = 0; i < n; i++)
{
var index = rng.Next(min, max);
count[index] ++;
}
Console.WriteLine($"Expected : {1.0 / (max - min):F4}, Actual :");
for (int i = min; i < max; i++)
{
Console.WriteLine($"{i} : {count[i] / (double) (n):F4}");
}
}
Result:
Expected : 0.0588, Actual :
10 : 0.0584
11 : 0.0588
12 : 0.0588
13 : 0.0587
14 : 0.0587
15 : 0.0589
16 : 0.0590
17 : 0.0592
18 : 0.0586
19 : 0.0590
20 : 0.0586
21 : 0.0588
22 : 0.0586
23 : 0.0587
24 : 0.0590
25 : 0.0594
26 : 0.0590

Related

Algorithm converting lotto ticket number to integer value and back again

I'm looking for the algorithm to convert a lotto ticket number to an integer value an back again.
Let's say the lotto number can be between 1 and 45 and a tickets contains 6 unique numbers. This means there are a maximum of 8145060 unique lotto tickets.
eg:
01-02-03-04-05-06 = 1
01-02-03-04-05-07 = 2
.
.
.
39-41-42-43-44-45 = 8145059
40-41-42-43-44-45 = 8145060
I'd like to have a function (C# preferable but any language will do) which converts between a lotto ticket and an integer and back again. At the moment I use the quick and dirty method of pre-calculating everything, which needs a lot of memory.

For enumerating integer combinations, you need to use the combinatorial number system. Here's a basic implementation in C#:
using System;
using System.Numerics;
using System.Collections.Generic;
public class CombinatorialNumberSystem
{
// Helper functions for calculating values of (n choose k).
// These are not optimally coded!
// ----------------------------------------------------------------------
protected static BigInteger factorial(int n) {
BigInteger f = 1;
while (n > 1) f *= n--;
return f;
}
protected static int binomial(int n, int k) {
if (k > n) return 0;
return (int)(factorial(n) / (factorial(k) * factorial(n-k)));
}
// In the combinatorial number system, a combination {c_1, c_2, ..., c_k}
// corresponds to the integer value obtained by adding (c_1 choose 1) +
// (c_2 choose 2) + ... + (c_k choose k)
// NOTE: combination values are assumed to start from zero, so
// a combination like {1, 2, 3, 4, 5} will give a non-zero result
// ----------------------------------------------------------------------
public static int combination_2_index(int[] combo) {
int ix = 0, i = 1;
Array.Sort(combo);
foreach (int c in combo) {
if (c > 0) ix += binomial(c, i);
i++;
}
return ix;
}
// The reverse of this process is a bit fiddly. See Wikipedia for an
// explanation: https://en.wikipedia.org/wiki/Combinatorial_number_system
// ----------------------------------------------------------------------
public static int[] index_2_combination(int ix, int k) {
List<int> combo_list = new List<int>();
while (k >= 1) {
int n = k - 1;
if (ix == 0) {
combo_list.Add(n);
k--;
continue;
}
int b = 0;
while (true) {
// (Using a linear search here, but a binary search with
// precomputed binomial values would be faster)
int b0 = b;
b = binomial(n, k);
if (b > ix || ix == 0) {
ix -= b0;
combo_list.Add(n-1);
break;
}
n++;
}
k--;
}
int[] combo = combo_list.ToArray();
Array.Sort(combo);
return combo;
}
}
The calculations are simpler if you work with combinations of integers that start from zero, so for example:
00-01-02-03-04-05 = 0
00-01-02-03-04-06 = 1
.
.
.
38-40-41-42-43-44 = 8145058
39-40-41-42-43-44 = 8145059
You can play around with this code at ideone if you like.

there seem to be actually 45^6 distinct numbers, a simple way is to treat the ticket number as a base-45 number and convert it to base 10:
static ulong toDec(string input){
ulong output = 0;
var lst = input.Split('-').ToList();
for (int ix =0; ix< lst.Count; ix++)
{
output = output + ( (ulong.Parse(lst[ix])-1) *(ulong) Math.Pow(45 , 5-ix));
}
return output;
}
examples:
01-01-01-01-01-01 => 0
01-01-01-01-01-02 => 1
01-01-01-01-02-01 => 45
45-45-45-45-45-45 => 8303765624

Get n distinct random numbers between two values whose sum is equal to a given number

I would like to find distinct random numbers within a range that sums up to given number.
Note: I found similar questions in stackoverflow, however they do not address exactly this problem (ie they do not consider a negative lowerLimit for the range).
If I wanted that the sum of my random number was equal to 1 I just generate the required random numbers, compute the sum and divided each of them by the sum; however here I need something a bit different; I will need my random numbers to add up to something different than 1 and still my random numbers must be within a given range.
Example: I need 30 distinct random numbers (non integers) between -50 and 50 where the sum of the 30 generated numbers must be equal to 300; I wrote the code below, however it will not work when n is much larger than the range (upperLimit - lowerLimit), the function could return numbers outside the range [lowerLimit - upperLimit]. Any help to improve the current solution?
static void Main(string[] args)
{
var listWeights = GetRandomNumbersWithConstraints(30, 50, -50, 300);
}
private static List<double> GetRandomNumbersWithConstraints(int n, int upperLimit, int lowerLimit, int sum)
{
if (upperLimit <= lowerLimit || n < 1)
throw new ArgumentOutOfRangeException();
Random rand = new Random(Guid.NewGuid().GetHashCode());
List<double> weight = new List<double>();
for (int k = 0; k < n; k++)
{
//multiply by rand.NextDouble() to avoid duplicates
double temp = (double)rand.Next(lowerLimit, upperLimit) * rand.NextDouble();
if (weight.Contains(temp))
k--;
else
weight.Add(temp);
}
//divide each element by the sum
weight = weight.ConvertAll<double>(x => x / weight.Sum()); //here the sum of my weight will be 1
return weight.ConvertAll<double>(x => x * sum);
}
EDIT - to clarify
Running the current code will generate the following 30 numbers that add up to 300. However those numbers are not within -50 and 50
-4.425315699
67.70219958
82.08592061
46.54014109
71.20352208
-9.554070146
37.65032717
-75.77280868
24.68786878
30.89874589
142.0796933
-1.964407284
9.831226893
-15.21652248
6.479463312
49.61283063
118.1853036
-28.35462683
49.82661159
-65.82706541
-29.6865969
-54.5134262
-56.04708803
-84.63783048
-3.18402453
-13.97935982
-44.54265204
112.774348
-2.911427266
-58.94098071

Ok, here how it could be done
We will use Dirichlet Distribution, which is distribution for random numbers xi in the range [0...1] such that
Sumi xi = 1
So, after linear rescaling condition for sum would be satisfied automatically. Dirichlet distribution is parametrized by αi, but we assume all RN to be from the same marginal distribution, so there is only one parameter α for each and every index.
For reasonable large value of α, mean value of sampled random numbers would be =1/n, and variance ~1/(n * α), so larger α lead to random value more close to the mean.
Ok, now back to rescaling,
vi = A + B*xi
And we have to get A and B. As #HansKeﬆing rightfully noted, with only two free parameters we could satisfy only two constraints, but you have three. So we would strictly satisfy low bound constraint, sum value constraint, but occasionally violate upper bound constraint. In such case we just throw whole sample away and do another one.
Again, we have a knob to turn, α getting larger means we are close to mean values and less likely to hit upper bound. With α = 1 I'm rarely getting any good sample, but with α = 10 I'm getting close to 40% of good samples. With α = 16 I'm getting close to 80% of good samples.
Dirichlet sampling is done via Gamma distribution, using code from MathDotNet.
Code, tested with .NET Core 2.1
using System;
using MathNet.Numerics.Distributions;
using MathNet.Numerics.Random;
class Program
{
static void SampleDirichlet(double alpha, double[] rn)
{
if (rn == null)
throw new ArgumentException("SampleDirichlet:: Results placeholder is null");
if (alpha <= 0.0)
throw new ArgumentException($"SampleDirichlet:: alpha {alpha} is non-positive");
int n = rn.Length;
if (n == 0)
throw new ArgumentException("SampleDirichlet:: Results placeholder is of zero size");
var gamma = new Gamma(alpha, 1.0);
double sum = 0.0;
for(int k = 0; k != n; ++k) {
double v = gamma.Sample();
sum += v;
rn[k] = v;
}
if (sum <= 0.0)
throw new ApplicationException($"SampleDirichlet:: sum {sum} is non-positive");
// normalize
sum = 1.0 / sum;
for(int k = 0; k != n; ++k) {
rn[k] *= sum;
}
}
static bool SampleBoundedDirichlet(double alpha, double sum, double lo, double hi, double[] rn)
{
if (rn == null)
throw new ArgumentException("SampleDirichlet:: Results placeholder is null");
if (alpha <= 0.0)
throw new ArgumentException($"SampleDirichlet:: alpha {alpha} is non-positive");
if (lo >= hi)
throw new ArgumentException($"SampleDirichlet:: low {lo} is larger than high {hi}");
int n = rn.Length;
if (n == 0)
throw new ArgumentException("SampleDirichlet:: Results placeholder is of zero size");
double mean = sum / (double)n;
if (mean < lo || mean > hi)
throw new ArgumentException($"SampleDirichlet:: mean value {mean} is not within [{lo}...{hi}] range");
SampleDirichlet(alpha, rn);
bool rc = true;
for(int k = 0; k != n; ++k) {
double v = lo + (mean - lo)*(double)n * rn[k];
if (v > hi)
rc = false;
rn[k] = v;
}
return rc;
}
static void Main(string[] args)
{
double[] rn = new double [30];
double lo = -50.0;
double hi = 50.0;
double alpha = 10.0;
double sum = 300.0;
for(int k = 0; k != 1_000; ++k) {
var q = SampleBoundedDirichlet(alpha, sum, lo, hi, rn);
Console.WriteLine($"Rng(BD), v = {q}");
double s = 0.0;
foreach(var r in rn) {
Console.WriteLine($"Rng(BD), r = {r}");
s += r;
}
Console.WriteLine($"Rng(BD), summa = {s}");
}
}
}
UPDATE
Usually, when people ask such question, there is an implicit assumption/requirement - all random numbers shall be distribution in the same way. It means that if I draw marginal probability density function (PDF) for item indexed 0 from the sampled array, I shall get the same distribution as I draw marginal probability density function for the last item in the array. People usually sample random arrays to pass it down to other routines to do some interesting stuff. If marginal PDF for item 0 is different from marginal PDF for last indexed item, then just reverting array will produce wildly different result with the code which uses such random values.
Here I plotted distributions of random numbers for item 0 and last item (#29) for original conditions([-50...50] sum=300), using my sampling routine. Look similar, isn't it?
Ok, here is a picture from your sampling routine, same original conditions([-50...50] sum=300), same number of samples
UPDATE II
User supposed to check return value of the sampling routine and accept and use sampled array if (and only if) return value is true. This is acceptance/rejection method. As an illustration, below is code used to histogram samples:
int[] hh = new int[100]; // histogram allocated
var s = 1.0; // step size
int k = 0; // good samples counter
for( ;; ) {
var q = SampleBoundedDirichlet(alpha, sum, lo, hi, rn);
if (q) // good sample, accept it
{
var v = rn[0]; // any index, 0 or 29 or ....
var i = (int)((v - lo) / s);
i = System.Math.Max(i, 0);
i = System.Math.Min(i, hh.Length-1);
hh[i] += 1;
++k;
if (k == 100000) // required number of good samples reached
break;
}
}
for(k = 0; k != hh.Length; ++k)
{
var x = lo + (double)k * s + 0.5*s;
var v = hh[k];
Console.WriteLine($"{x} {v}");
}

Here you go. It'll probably run for centuries before actually returning the list, but it'll comply :)
public List<double> TheThing(int qty, double lowest, double highest, double sumto)
{
if (highest * qty < sumto)
{
throw new Exception("Impossibru!");
// heresy
highest = sumto / 1 + (qty * 2);
lowest = -highest;
}
double rangesize = (highest - lowest);
Random r = new Random();
List<double> ret = new List<double>();
while (ret.Sum() != sumto)
{
if (ret.Count > 0)
ret.RemoveAt(0);
while (ret.Count < qty)
ret.Add((r.NextDouble() * rangesize) + lowest);
}
return ret;
}

I come up with this solution which is fast. I am sure it couldbe improved, but for the moment it does the job.
n = the number of random numbers that I will need to find
Constraints
the n random numbers must add up to finalSum the n random numbers
the n random numbers must be within lowerLimit and upperLimit
The idea is to remove from the initial list (that sums up to finalSum) of random numbers the numbers outside the range [lowerLimit, upperLimit].
Then count the number left of the list (called nValid) and their sum (called sumOfValid).
Now, iteratively search for (n-nValid) random numbers within the range [lowerLimit, upperLimit] whose sum is (finalSum-sumOfValid)
I tested it with several combinations for the inputs variables (including negative sum) and the results looks good.
static void Main(string[] args)
{
int n = 100;
int max = 5000;
int min = -500000;
double finalSum = -1000;
for (int i = 0; i < 5000; i++)
{
var listWeights = GetRandomNumbersWithConstraints(n, max, min, finalSum);
Console.WriteLine("=============");
Console.WriteLine("sum = " + listWeights.Sum());
Console.WriteLine("max = " + listWeights.Max());
Console.WriteLine("min = " + listWeights.Min());
Console.WriteLine("count = " + listWeights.Count());
}
}
private static List<double> GetRandomNumbersWithConstraints(int n, int upperLimit, int lowerLimit, double finalSum, int precision = 6)
{
if (upperLimit <= lowerLimit || n < 1) //todo improve here
throw new ArgumentOutOfRangeException();
Random rand = new Random(Guid.NewGuid().GetHashCode());
List<double> randomNumbers = new List<double>();
int adj = (int)Math.Pow(10, precision);
bool flag = true;
List<double> weights = new List<double>();
while (flag)
{
foreach (var d in randomNumbers.Where(x => x <= upperLimit && x >= lowerLimit).ToList())
{
if (!weights.Contains(d)) //only distinct
weights.Add(d);
}
if (weights.Count() == n && weights.Max() <= upperLimit && weights.Min() >= lowerLimit && Math.Round(weights.Sum(), precision) == finalSum)
return weights;
/* worst case - if the largest sum of the missing elements (ie we still need to find 3 elements,
* then the largest sum is 3*upperlimit) is smaller than (finalSum - sumOfValid)
*/
if (((n - weights.Count()) * upperLimit < (finalSum - weights.Sum())) ||
((n - weights.Count()) * lowerLimit > (finalSum - weights.Sum())))
{
weights = weights.Where(x => x != weights.Max()).ToList();
weights = weights.Where(x => x != weights.Min()).ToList();
}
int nValid = weights.Count();
double sumOfValid = weights.Sum();
int numberToSearch = n - nValid;
double sum = finalSum - sumOfValid;
double j = finalSum - weights.Sum();
if (numberToSearch == 1 && (j <= upperLimit || j >= lowerLimit))
{
weights.Add(finalSum - weights.Sum());
}
else
{
randomNumbers.Clear();
int min = lowerLimit;
int max = upperLimit;
for (int k = 0; k < numberToSearch; k++)
{
randomNumbers.Add((double)rand.Next(min * adj, max * adj) / adj);
}
if (sum != 0 && randomNumbers.Sum() != 0)
randomNumbers = randomNumbers.ConvertAll<double>(x => x * sum / randomNumbers.Sum());
}
}
return randomNumbers;
}

How to generate random 5 digit number depend on user summary

Hi guy i try to generate 50 number with 5 digit depend on user total summary. For example, User give 500000 and then i need to random number with 5 digit by 50 number equal 500000
i try this but it isn't 5 digit number
int balane = 500000;
int nums = 50;
int max = balane / nums;
Random rand = new Random();
int newNum = 0;
int[] ar = new int[nums];
for (int i = 0; i < nums - 1; i++)
{
newNum = rand.Next(0, max);
ar[i] = newNum;
balane -= newNum;
max = balane / (nums - i - 1);
ar[nums - 1] = balane;
}
int check = 0;
foreach (int x in ar)
{
check += x;
}
i tried already but value inside my array have negative value i want to get only
positive value
Please help me, how to solve this and thank for advance.

I once asked a similar question on codereview.stackexchange.com. I have modified my answer to produce a five digit sequence for you.
Furthermore, this code is fast enough to be used to create tens of thousands of codes in a single request. If you look at the initial question and answer (linked to below) you will find that it checks to see whether the code has been used or not prior to inserting it. Thus, the codes are unique.
void Main()
{
Console.WriteLine(GenerateCode(CodeLength));
}
private const int CodeLength = 10;
// since Random does not make any guarantees of thread-safety, use different Random instances per thread
private static readonly ThreadLocal<Random> _random = new ThreadLocal<Random>(() => new Random());
// Define other methods and classes here
private static string GenerateCode(int numberOfCharsToGenerate)
{
char[] chars = "0123456789".ToCharArray();
var sb = new StringBuilder();
for (int i = 0; i < numberOfCharsToGenerate; i++)
{
int num = _random.Value.Next(0, chars.Length);
sb.Append(chars[num]);
}
return sb.ToString();
}
Original question and answer: https://codereview.stackexchange.com/questions/142049/creating-a-random-code-and-saving-if-it-does-not-exist/142056#142056

Perhaps try this:
var rnd = new Random();
var numbers = Enumerable.Range(0, 50).Select(x => rnd.Next(500_000)).OrderBy(x => x).ToArray();
numbers = numbers.Skip(1).Zip(numbers, (x1, x0) => x1 - x0).ToArray();
numbers = numbers.Append(500_000 - numbers.Sum()).ToArray();
Console.WriteLine(numbers.Count());
Console.WriteLine(numbers.Sum());
This outputs:
50
500000
This works by generating 50 random numbers between 0 and 499,999 inclusively. It then sorts them ascendingly and then gets the difference between each successive pair. This by definition produces a set of 49 values that almost adds up to 500,000. It's then just a matter of adding the one missing number by doing 500_000 - numbers.Sum().

Compare current and last value in list

There is a moving average suppose: 2, 4, 6 , 8 , 10...n;
Then add the current value (10) to list
List<int>numHold = new List<int>();
numhold.Add(currentvalue);
Inside the list:
the current value is added
10
and so on
20
30
40 etc
by using
var lastdigit = numHold[numhold.Count -1];
I can get the last digit but the output is
current: 10 last: 10
current: 20 last: 20
the output should be
current: 20 last: 10
Thanks

Typically, C# indexers start from 0, so the first element has index 0. On the other hand, Count/Length will use 1 for one element. So your
numHold[numhold.Count - 1]
actually takes the last element in the list. If you need the one before that, you need to use - 2 - though be careful you do not reach outside of the bounds of the list (something like Math.Max(0, numhold.Count - 2) might be appropriate).

You can also store the values in separate variables:
List<int> nums = new List<int> { 1 };
int current = 1;
int last = current;
for (int i = 0; i < 10; i++)
{
last = current;
current = i * 2;
nums.Add(current);
}
Console.WriteLine("Current: {0}", current);
Console.WriteLine("Last: {0}", last);

Question is so unclear, but if ur using moving average to draw a line graph 📈 you would use a circular buffer which can be implemented by urself utilizing an object that contains an array of specified size, and the next available position. You could also download a nuget package that already has it done.

A relatively simple way to calculate a moving average is to use a circular buffer to hold the last N values (where N is the number of values for which to compute a moving average).
For example:
public sealed class MovingAverage
{
private readonly int _max;
private readonly double[] _numbers;
private double _total;
private int _front;
private int _count;
public MovingAverage(int max)
{
_max = max;
_numbers = new double[max];
}
public double Average
{
get { return _total / _count; }
}
public void Add(double value)
{
_total += value;
if (_count == _max)
_total -= _numbers[_front];
else
++_count;
_numbers[_front] = value;
_front = (_front+1)%_max;
}
};
which you might use like this:
var test = new MovingAverage(11);
for (int i = 0; i < 25; ++i)
{
test.Add(i);
Console.WriteLine(test.Average);
}
Note that this code is optimised for speed. After a large number of iterations, you might start to get rounding errors. You can avoid this by adding to class MovingAverage a slower method to calculate the average instead of using the Average property:
public double AccurateAverage()
{
double total = 0;
for (int i = 0, j = _front; i < _count; ++i)
{
total += _numbers[j];
if (--j < 0)
j = _max - 1;
}
return total/_count;
}

Your last item will always be at position 0.
List<int>numHold = new List<int>();
numHold.add(currentvalue); //Adding 10
numHold[0]; // will contain 10
numHold.add(currentvalue); //Adding 20
numHold[0]; // will contain 10
numHold[numhold.Count - 1]; // will contain 20
the better way to get first and last are
numHold.first(); //Actually last in your case
numHold.last(); //first in your case

Looking for a Histogram Binning algorithm for decimal data

I need to generate bins for the purposes of calculating a histogram. Language is C#. Basically I need to take in an array of decimal numbers and generate a histogram plot out of those.
Haven't been able to find a decent library to do this outright so now I'm just looking for either a library or an algorithm to help me do the binning of the data.
So...
Are there any C# libraries out there that will take in an array of decimal data and output a binned histogram?
Is there generic algorithm for building the bins to be used in generated a histogram?

Here is a simple bucket function I use. Sadly, .NET generics doesn't support a numerical type contraint so you will have to implement a different version of the following function for decimal, int, double, etc.
public static List<int> Bucketize(this IEnumerable<decimal> source, int totalBuckets)
{
var min = source.Min();
var max = source.Max();
var buckets = new List<int>();
var bucketSize = (max - min) / totalBuckets;
foreach (var value in source)
{
int bucketIndex = 0;
if (bucketSize > 0.0)
{
bucketIndex = (int)((value - min) / bucketSize);
if (bucketIndex == totalBuckets)
{
bucketIndex--;
}
}
buckets[bucketIndex]++;
}
return buckets;
}

I got odd results using #JakePearson accepted answer. It has to do with an edge case.
Here is the code I used to test his method. I changed the extension method ever so slightly, returning an int[] and accepting double instead of decimal.
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
Random rand = new Random(1325165);
int maxValue = 100;
int numberOfBuckets = 100;
List<double> values = new List<double>();
for (int i = 0; i < 10000000; i++)
{
double value = rand.NextDouble() * (maxValue+1);
values.Add(value);
}
int[] bins = values.Bucketize(numberOfBuckets);
PointPairList points = new PointPairList();
for (int i = 0; i < numberOfBuckets; i++)
{
points.Add(i, bins[i]);
}
zedGraphControl1.GraphPane.AddBar("Random Points", points,Color.Black);
zedGraphControl1.GraphPane.YAxis.Title.Text = "Count";
zedGraphControl1.GraphPane.XAxis.Title.Text = "Value";
zedGraphControl1.AxisChange();
zedGraphControl1.Refresh();
}
}
public static class Extension
{
public static int[] Bucketize(this IEnumerable<double> source, int totalBuckets)
{
var min = source.Min();
var max = source.Max();
var buckets = new int[totalBuckets];
var bucketSize = (max - min) / totalBuckets;
foreach (var value in source)
{
int bucketIndex = 0;
if (bucketSize > 0.0)
{
bucketIndex = (int)((value - min) / bucketSize);
if (bucketIndex == totalBuckets)
{
bucketIndex--;
}
}
buckets[bucketIndex]++;
}
return buckets;
}
}
Everything works well when using 10,000,000 random double values between 0 and 100 (exclusive). Each bucket has roughly the same number of values, which makes sense given that Random returns a normal distribution.
But when I changed the value generation line from
double value = rand.NextDouble() * (maxValue+1);
to
double value = rand.Next(0, maxValue + 1);
and you get the following result, which double counts the last bucket.
It appears that when a value is same as one of the boundaries of a bucket, the code as it is written puts the value in the incorrect bucket. This artifact doesn't appear to happen with random double values as the chance of a random number being equal to a boundary of a bucket is rare and wouldn't be obvious.
The way I corrected this is to define what side of the bucket boundary is inclusive vs. exclusive.
Think of
0< x <=1 1< x <=2 ... 99< x <=100
vs.
0<= x <1 1<= x <2 ... 99<= x <100
You cannot have both boundaries inclusive, as the method wouldn't know which bucket to put it in if you have a value that is exactly equal to a boundary.
public enum BucketizeDirectionEnum
{
LowerBoundInclusive,
UpperBoundInclusive
}
public static int[] Bucketize(this IList<double> source, int totalBuckets, BucketizeDirectionEnum inclusivity = BucketizeDirectionEnum.UpperBoundInclusive)
{
var min = source.Min();
var max = source.Max();
var buckets = new int[totalBuckets];
var bucketSize = (max - min) / totalBuckets;
if (inclusivity == BucketizeDirectionEnum.LowerBoundInclusive)
{
foreach (var value in source)
{
int bucketIndex = (int)((value - min) / bucketSize);
if (bucketIndex == totalBuckets)
continue;
buckets[bucketIndex]++;
}
}
else
{
foreach (var value in source)
{
int bucketIndex = (int)Math.Ceiling((value - min) / bucketSize) - 1;
if (bucketIndex < 0)
continue;
buckets[bucketIndex]++;
}
}
return buckets;
}
The only issue now is if the input dataset has a lot of min and max values, the binning method will exclude many of those values and the resulting graph will misrepresent the dataset.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

How to use Shuffling to improve rng? - c#

Related

Algorithm converting lotto ticket number to integer value and back again

Get n distinct random numbers between two values whose sum is equal to a given number

How to generate random 5 digit number depend on user summary

Compare current and last value in list

Looking for a Histogram Binning algorithm for decimal data

Categories

Resources