Performance issues with List - c#

I got 1000 millions of records in my Database for which I need to update all the rows with 20 values randomly.
So,for Every random 50 Million records,1 value need to updated.
So,I thought of Generating a List with 1000 million numbers and select random 50 million records from that list and remove that 50 million records from that list and so on.
My code :
List Creation:
List<long> LstMainList = new List<long>();
for (int i = 1; i <= 999999999; i++)
{
LstMainList.Add(i);
}
New Empty List : List<TableData> Table1 = new List<TableData>();
Selecting Random Numbers and adding them to New List and removing the item from the MainList which contains 1000 million items.
Random rand = new Random();
for (int a = 0; a < 50000000; a++)
{
int lstindex = rand.Next(LstMainList.Count);
Int64 lstData = LstMainList[lstindex];
Table1.Add(new TableData { MESSAGE_ID = lstData });
LstMainList.RemoveAt(lstindex);
if (a % 100000 == 0)
{
if (previousThread != null)
{
previousThread.Join();
}
List<TableData> copyList = Table1.ToList();
previousThread = new Thread(() => BulkCopyList(copyList, "PLAN_TABLE_1"));
previousThread.Start();
Table1.Clear();
}
}
Now,My problem is : At the Line of LstMainList.RemoveAt(lstindex);,it is taking long time to remove the Index from the MainList because it contains 1000 million records.
Is there a way to remove the record from List in a simple way? or any other way to make this simple?

First - use array for ids instead of list (especially without initialized capacity)
int idsCount = 100000000;
long[] ids = new long[idsCount];
for(long i = 1; i < idsCount; i++)
ids[i] = i;
Use Fisher–Yates shuffle to shuffle ids in array
Random rnd = new Random();
int n = idsCount;
while(n > 1)
{
int k = rnd.Next(n);
n--;
long temp = ids[n];
ids[n] = ids[k];
ids[k] = temp;
}
With shuffled ids you don't need to modify ids list. Removing item at random position is very expensive operation. If you remove item at position 0 whole list should be copied to new array. Now you can just iterate ids array.
Or you can use morelinq Batch to create batches of TableData and bulk them:
int size = 100000;
foreach(var batch in ids.Batch(size, id => new TableData { MESSAGE_ID = id }))
{
var copyList = batch.ToList();
// ...
}
UPDATE: Thus you need batches of different size, you can use following extension method to get range of items from array:
public static IEnumerable<T> GetRange<T>(
this T[] array, int startIndex, int count)
{
for (int i = startIndex; i < startIndex + count; i++)
yield return array[i];
}
So, getting 5000 TableData starting from index 20000 will look like:
var copyList = ids.GetRange(20000, 5000)
.Select(id => new TableData { MESSAGE_ID = id })
.ToList();
Of course, more efficient way will be just iterate ids array, and add items to list with pre-initialize capacity:
int size = 5000;
int startIndex = 20000;
List<TableData> copyList = new List<TableData>(size);
for (int i = startIndex; i < startIndex + size; i++)
copyList.Add(new TableData { MESSAGE_ID = ids[i] });
Going further I would move TableData objects creation to thread which does bulk copy. And just passed sequence of ids it should use.

Firstly, here's some advice from Microsoft about selecting rows randonly from a large table.
Secondly, if that's of no use, read on...
If you know the number of items you want to randomly select, and the number of items in a sequence from which you want to randomly select, then there is an O(N) solution.
In the example below, the method RandomlySelectedItems<T>() provides a sequence of the randomly selected items.
Here's the code. (To reiterate, you can only use this if you know in advance the number of items from which you will be selecting):
using System;
using System.Collections.Generic;
using System.Linq;
namespace Demo
{
internal static class Program
{
static void Main(string[] args)
{
int numberOfValuesToSelectFrom = 10000000;
int numberOfValuesToSelect = 20;
var valuesToSelectFrom = Enumerable.Range(1, numberOfValuesToSelectFrom);
var selectedValues = RandomlySelectedItems
(
valuesToSelectFrom,
numberOfValuesToSelect,
numberOfValuesToSelectFrom,
new Random()
);
foreach (int value in selectedValues)
Console.WriteLine(value);
}
/// <summary>Randomly selects items from a sequence.</summary>
/// <typeparam name="T">The type of the items in the sequence.</typeparam>
/// <param name="sequence">The sequence from which to randomly select items.</param>
/// <param name="count">The number of items to randomly select from the sequence.</param>
/// <param name="sequenceLength">The number of items in the sequence among which to randomly select.</param>
/// <param name="rng">The random number generator to use.</param>
/// <returns>A sequence of randomly selected items.</returns>
/// <remarks>This is an O(N) algorithm (N is the sequence length).</remarks>
public static IEnumerable<T> RandomlySelectedItems<T>(IEnumerable<T> sequence, int count, int sequenceLength, Random rng)
{
if (sequence == null)
throw new ArgumentNullException("sequence");
if (count < 0 || count > sequenceLength)
throw new ArgumentOutOfRangeException("count", count, "count must be between 0 and sequenceLength");
if (rng == null)
throw new ArgumentNullException("rng");
int available = sequenceLength;
int remaining = count;
var iterator = sequence.GetEnumerator();
for (int current = 0; current < sequenceLength; ++current)
{
iterator.MoveNext();
if (rng.NextDouble() < remaining/(double)available)
{
yield return iterator.Current;
--remaining;
}
--available;
}
}
}
}

One option is to not try to generate truly or even pseudo-random numbers but use a sequence that is only apparently random to a casual observer. This can work in a lot cases however it would not work if the the items need to be chosen randomly to protect from an attacker predicting the next value. The benefit is that you do not need to keep track of all the generated values in memory to shuffle them.
To start, select two random prime numbers (a, b) less than the number of rows (r) you have such that a * b > r and a does not divide r. The mapping f(x) = a * x + b mod r is guaranteed to be one-to-one in the ring Z[r]. We will use that fact to generate a sequence where each value is unique from 0 to r - 1.
Let's pick two random primes, say a = 11268619 and b = 4064861. Then you can generate sequence of "random" numbers in the range 0 to 1e9-1:
private static IEnumerable<int> GenerateSequence()
{
const int max = 1000000000;
const long a = 11268619, b = 4064861;
for(int i = 0; i < max; i++)
{
int c = (int)((a * i + b) % max);
yield return c;
}
}

Related

How to generate unique random integers that do not duplicate

I have created a short program that creates 3 random integers between 1-9 and stores them in an array, however, I would not like any of them to repeat, that is, I would like each to be unique. Is there an easier way to generate 3 unique integers other than having to iterate through the array and comparing each integer to each other? That just seems so tedious if I were to increase my array to beyond 3 integers.
This is my code to generate 3 random numbers. I saw other code in Java, but I thought maybe C# has a easier and more efficient way to do it.
var number = new Numbers[3];
Random r = new Random();
for ( int i = 0; i < number.Length; i++)
{
number[i] = new Numbers(r.Next(1,9));
}
Console.WriteLine("The Three Random Numbers Are:");
foreach(Numbers num in number)
{
Console.WriteLine("{0}", num.Number);
}
I would do something like this:
var range = Enumerable.Range(1, 8);
var rnd = new Random();
var listInts = range.OrderBy(i => rnd.Next()).Take(3).ToList();
You could make an array or a list of the numbers that might be generated, e.g. 0, 1, 2, 3. Then you generate a number from 0 to this list's length, e.g. 2 and pick list[2] so for the next time you only have 0, 1, 3 in your list.
It takes longer to generate it, especially for long lists but it doesn't repeat numbers.
using System;
using System.Collections.Generic;
public class Test
{
static Random random = new Random();
public static List<int> GenerateRandom(int count)
{
// generate count random values.
HashSet<int> candidates = new HashSet<int>();
// top will overflow to Int32.MinValue at the end of the loop
for (Int32 top = Int32.MaxValue - count + 1; top > 0; top++)
{
// May strike a duplicate.
if (!candidates.Add(random.Next(top))) {
candidates.Add(top);
}
}
// load them in to a list.
List<int> result = new List<int>();
result.AddRange(candidates);
// shuffle the results:
int i = result.Count;
while (i > 1)
{
i--;
int k = random.Next(i + 1);
int value = result[k];
result[k] = result[i];
result[i] = value;
}
return result;
}
public static void Main()
{
List<int> vals = GenerateRandom(10);
Console.WriteLine("Result: " + vals.Count);
vals.ForEach(Console.WriteLine);
}
}
Grate explanation and answers from here
Source http://ideone.com/Zjpzdh

How can I pick random objects out of a list with C#?

I have an IQueryable containing more than 300 objects:
public class Detail
{
public int Id { get; set; }
public int CityId { get; set; }
public bool Chosen { get; set; }
}
IQueryable<Detail> details = ...
How can I go against this an at random pick out 50 objects? I assume that I would need to convert this with .ToList() but I am not sure how I could pick out random elements.
300 is not very much, so Yes, make it a List:
IQueryable<Detail> details = ...
IList<Detail> detailList = details.ToList();
And now you can pick a random item :
var randomItem = detailList[rand.Next(detailList.Count)];
and you could repeat that 50 times. That would however lead to duplicates, and the process to eliminate them would become messy.
So use a standard shuffle algorithm and then pick the first 50 :
Shuffle(detailList);
var selection = detailList.Take(50);
If you know in advance the total number of items from which to randomly pick, you can do it without converting to a list first.
The following method will do it for you:
/// <summary>Randomly selects items from a sequence.</summary>
/// <typeparam name="T">The type of the items in the sequence.</typeparam>
/// <param name="sequence">The sequence from which to randomly select items.</param>
/// <param name="count">The number of items to randomly select from the sequence.</param>
/// <param name="sequenceLength">The number of items in the sequence among which to randomly select.</param>
/// <param name="rng">The random number generator to use.</param>
/// <returns>A sequence of randomly selected items.</returns>
/// <remarks>This is an O(N) algorithm (N is the sequence length).</remarks>
public static IEnumerable<T> RandomlySelectedItems<T>(IEnumerable<T> sequence, int count, int sequenceLength, System.Random rng)
{
if (sequence == null)
{
throw new ArgumentNullException("sequence");
}
if (count < 0 || count > sequenceLength)
{
throw new ArgumentOutOfRangeException("count", count, "count must be between 0 and sequenceLength");
}
if (rng == null)
{
throw new ArgumentNullException("rng");
}
int available = sequenceLength;
int remaining = count;
var iterator = sequence.GetEnumerator();
for (int current = 0; current < sequenceLength; ++current)
{
iterator.MoveNext();
if (rng.NextDouble() < remaining/(double)available)
{
yield return iterator.Current;
--remaining;
}
--available;
}
}
(The key thing here is needing to know at the start the number of items to choose from; this does reduce the utility somewhat. But if getting the count is quick and buffering all the items would take too much memory, this is a useful solution.)
Here's another approach which uses Reservoir sampling
This approach DOES NOT need to know the total number of items to choose from, but it does need to buffer the output. Of course, it also needs to enumerate the entire input collection.
Therefore this is really only of use when you don't know in advance the number of items to choose from (or the number of items to choose from is very large).
I would recommend just shuffling a list as per Henk's answer rather than doing it this way, but I'm including it here for the sake of interest:
// n is the number of items to randomly choose.
public static List<T> RandomlyChooseItems<T>(IEnumerable<T> items, int n, Random rng)
{
var result = new List<T>(n);
int index = 0;
foreach (var item in items)
{
if (index < n)
{
result.Add(item);
}
else
{
int r = rng.Next(0, index + 1);
if (r < n)
result[r] = item;
}
++index;
}
return result;
}
As an addendum to Henk's answer, here's a canonical implementation of the Shuffle algorithm he mentions. In this, _rng is an instance of Random:
/// <summary>Shuffles the specified array.</summary>
/// <typeparam name="T">The type of the array elements.</typeparam>
/// <param name="array">The array to shuffle.</param>
public void Shuffle<T>(IList<T> array)
{
for (int n = array.Count; n > 1;)
{
int k = _rng.Next(n);
--n;
T temp = array[n];
array[n] = array[k];
array[k] = temp;
}
}
Random rnd = new Random();
IQueryable<Detail> details = myList.OrderBy(x => rnd.Next()).Take(50);
var l = new List<string>();
l.Add("A");
l.Add("B");
l.Add("C");
l.Add("D");
l.Add("E");
l.Add("F");
l.Add("G");
l.Add("H");
l.Add("I");
var random = new Random();
var nl = l.Select(i=> new {Value=i,Index = random.Next()});
var finalList = nl.OrderBy(i=>i.Index).Take(3);
foreach(var i in finalList)
{
Console.WriteLine(i.Value);
}
IQueryable<Detail> details = myList.OrderBy(x => Guid.NewGuid()).ToList();
After this just walk trough it linearly:
var item1 = details[0];
This will avoid duplicates.
This is what ended up working for me, it ensures no duplicates are returned:
public List<T> GetRandomItems(List<T> items, int count = 3)
{
var length = items.Count();
var list = new List<T>();
var rnd = new Random();
var seed = 0;
while (list.Count() < count)
{
seed = rnd.Next(0, length);
if(!list.Contains(items[seed]))
list.Add(items[seed]);
}
return list;
}

C# - Writing numbers 1 through 10 in random order [duplicate]

This question already has answers here:
Is using Random and OrderBy a good shuffle algorithm? [closed]
(13 answers)
Closed 9 years ago.
Part 1: All I am wanting to achieve is to write the numbers 1, 2, 3 ... 8, 9, 10 to the console window in random order. So all the numbers will need to be written to console window, but the order of them must be random.
Part 2: In my actual project I plan to write all of the elements in an array, to the console window in random order. I am assuming that if I can get the answer to part 1, I should easily be able to implement this with an array.
/// <summary>
/// Returns all numbers, between min and max inclusive, once in a random sequence.
/// </summary>
IEnumerable<int> UniqueRandom(int minInclusive, int maxInclusive)
{
List<int> candidates = new List<int>();
for (int i = minInclusive; i <= maxInclusive; i++)
{
candidates.Add(i);
}
Random rnd = new Random();
while (candidates.Count > 0)
{
int index = rnd.Next(candidates.Count);
yield return candidates[index];
candidates.RemoveAt(index);
}
}
In your program
Console.WriteLine("All numbers between 0 and 10 in random order:");
foreach (int i in UniqueRandom(0, 10)) {
Console.WriteLine(i);
}
Enumerable.Range(1, 10).OrderBy(i => Guid.NewGuid()) works nicely.
using System;
using System.Collections;
namespace ConsoleApplication
{
class Numbers
{
public ArrayList RandomNumbers(int max)
{
// Create an ArrayList object that will hold the numbers
ArrayList lstNumbers = new ArrayList();
// The Random class will be used to generate numbers
Random rndNumber = new Random();
// Generate a random number between 1 and the Max
int number = rndNumber.Next(1, max + 1);
// Add this first random number to the list
lstNumbers.Add(number);
// Set a count of numbers to 0 to start
int count = 0;
do // Repeatedly...
{
// ... generate a random number between 1 and the Max
number = rndNumber.Next(1, max + 1);
// If the newly generated number in not yet in the list...
if (!lstNumbers.Contains(number))
{
// ... add it
lstNumbers.Add(number);
}
// Increase the count
count++;
} while (count <= 10 * max); // Do that again
// Once the list is built, return it
return lstNumbers;
}
}
Main
class Program
{
static int Main()
{
Numbers nbs = new Numbers();
const int Total = 10;
ArrayList lstNumbers = nbs.RandomNumbers(Total);
for (int i = 0; i < lstNumbers.Count; i++)
Console.WriteLine("{0}", lstNumbers[i].ToString());
return 0;
}
}
}
int[] ints = new int[11];
Random rand = new Random();
Random is a class built into .NET, and allows us to create random integers really, really easily. Basically all we have to do is call a method inside our rand object to get that random number, which is nice. So, inside our loop, we just set each element to the results of that method:
for (int i = 0; i < ints.Length; i++)
{
ints[i] = rand.Next(11);
}
We are essentially filling our entire array with random numbers here, all between 0 and 10. At this point all we have to do is display the contents for the user, which can be done with a foreach loop:
foreach (int i in ints)
{
Console.WriteLine(i.ToString());
}

Is my lottery program wrong ? or am I so unlucky?

I made a lottery program : http://yadi.sk/d/bBKefn9g4OC7s
Here is the whole source code : http://yadi.sk/d/rnQMfrry4O8cu
Random rnd = new Random();
int[] loto;
loto = new int[7];
for (int f = 1; f <= 6; f++) {
loto[f] = rnd.Next(1, 50); // Generating random number between 1-49
for (int h = 1; h < f; h++) {
if (loto[f] == loto[h]) { // Check with other numbers for the case of duplicate
loto[f] = rnd.Next(1, 50); // If there is a duplicate create that number again
}
}
}
This section I'm generating random 6 different numbers between 1-49
Also I'm wondering in this example, are nested loops increase the spontaneity ?
I'm getting 3-4 max, this program wrong or am I so unlucky ?
( note that : that's my first program )
For all guys trying to help me : I'm really beginner on programming(c# yesterday | c++ 3 weeks i guess), and if you guys clarify what you mean in codes it'll be great.
And please not give me extreme hard coding examples( I don't wanna quit c# )
Your method looks unsafe, as get value from Random again in the inner loop does not guarantee that it will return unduplicated value. For low value as 1-49, you can use simple random-picking algorithm like this
var numbers = new List<int>();
for (int i = 1; i <= 49; i++) {
numbers.Add(i);
}
Random r = new Random();
var loto = new int[6];
for (int f = 0; f < 6; f++) {
int idx = r.Next(0, numbers.Count);
loto[f] = numbers[idx];
numbers.RemoveAt(idx);
}
Note that this is far from optimal solution in terms of performance, but if you will run it only once in a few seconds or more so it should be fine.
I think it's correct except for the for loop declaration: remember that arrays in C# are zero-based. Thus the loop should look like this:
for (int f = 0; f < 7; f++)
or even better:
for (int f = 0; f < loto.Length; f++)
Update: I cannot comment the other answers (too less reputation), thus I have to post it here:
#Dan: only one loop is not correct as it is not allowed to have the same number twice in Loto. In his inner loop, 1342 checks if the created random number already exists, so it is not correct to leave it out.
#James: As 1342 just started programming, it is not necessary to use a static field in my opinion. I guess that he or she has his whole code in the Main method so there is no benefit using a static variable.
There are a few issues here - you've got one too many loops for a start, and no comments.
See this (over-commented) example below:
// This is static so we don't recreate it every time.
private static Random _rnd;
static void Main(string[] args)
{
_rnd = new Random();
// You can declare and initialise a variable in one statement.
// In this case you want the array size to be 6, not 7!
Int32[] lotoNumbers = new Int32[6];
// Generate 10 sets of 6 random numbers
for (int i = 0; i < 10; i++)
{
// Use a meaningful name for your iteration variable
// In this case I used 'idx' as in 'index'
// Arrays in c# are 0-based, so your lotto array has
// 6 elements - [0] to [5]
for (Int32 idx = 0; idx < 6; idx++)
{
// Keep looping until we have a unique number
int proposedNumber;
do
{
proposedNumber = _rnd.Next(1, 50);
} while (lotoNumbers.Contains(proposedNumber));
// Assign the unique proposed number to your array
lotoNumbers[idx] = proposedNumber;
}
}
}
You should end up with a 6 element long array with 6 random numbers between 1 and 50 in it.
Hope that helps!
Edit:
It's also well worth taking note of James' answer - if you're doing the above in a loop, you'll get the same values every time from Random, due to how the seed is used. Using a static version of Random will give much better results.
You don't want to keep re-creating a new instance of Random each time, that's the likely cause of why you keep getting similar values each time. The better approach is to create a static instance of Random and use that across your entire app - this should give you more realistic results e.g.
using System.Collections.Generic;
using System.Linq;
...
static readonly Random rand = new Random();
...
List<int> lottoNumbers = new List<int>(6);
int drawnNumber = -1;
for (int i = 0; i < lottoNumbers.Count; i++) {
do
{
drawnNumber = rand.Next(1, 50); // generate random number
}
while (lottoNumbers.Contains(drawnNumber)) // keep generating random numbers until we get one which hasn't already been drawn
lottoNumbers[i] = drawnNumber; // set the lotto number
}
// print results
foreach (var n in lottoNumbers)
Console.WriteLine(n);
For easily testing it, I have left the console logs and static void main for you.
You do not need two iterations for this. Also - arrays are 0 based, so either f has to be equal to 0, or less than 7. I went with equal 0 below.
I have created a recursive method which creates a new value and checks if the array contains the value. If it does not contain it, it adds it. But if it does contain it, the method calls itself to find a new value. It will continue to do this until a new value is found.
Recursive methods are methods which call themselves. Don't try and fill an array with an index bigger than 50 with this, as you will get an endless loop.
private static readonly Random Rnd = new Random();
static void Main(string[] args)
{
var loto = new int[7];
for (int f = 0; f <= 6; f++)
{
var randomValue = GetRandomNumberNotInArr(loto);
Console.WriteLine(randomValue);
loto[f] = randomValue;
}
Console.Read();
}
/// <summary>
/// Finds a new random value to insert into arr. If arr already contains this another
///random value will be found.
/// </summary>
/// <param name="arr">arr with already found values</param>
/// <returns></returns>
private static int GetRandomNumberNotInArr(int[] arr)
{
var next = Rnd.Next(1, 50);
return !arr.Contains(next) ? next : GetRandomNumberNotInArr(arr);
}
I can see that you are trying to simulate drawing 6 lottery numbers between 1 and 50.
Your code has some logic errors, but rather than fixing it I'm going to suggest doing it a different way.
There are several ways to approach this; a common one is this:
Create an empty collection of numbers.
while there aren't enough numbers in the collection
let randomNumber = new random number in the appropriate range
if (randomNumber isn't already in the collection)
add randomNumber to the collection
But there's another approach which scales nicely, so I'll demonstrate this one (someone else will probably already have written about the other approach):
Add to a collection all the numbers you want to choose from
Randomly rearrange (shuffle) the numbers in the collection
Draw the required number of items from the collection
This is pretty much what happens in a real-life lottery.
To shuffle a collection we can use the Fisher-Yates Shuffle. Here's an implementation:
/// <summary>Used to shuffle collections.</summary>
public class Shuffler
{
/// <summary>Shuffles the specified array.</summary>
/// <typeparam name="T">The type of the array elements.</typeparam>
/// <param name="array">The array to shuffle.</param>
public void Shuffle<T>(IList<T> array)
{
for (int n = array.Count; n > 1;)
{
int k = _rng.Next(n);
--n;
T temp = array[n];
array[n] = array[k];
array[k] = temp;
}
}
private readonly Random _rng = new Random();
}
Here's a full compilable example. I've avoided using Linq in this example because I don't want to confuse you!
using System;
using System.Collections.Generic;
namespace Demo
{
public static class Program
{
private static void Main()
{
int[] lotoDraw = createDraw();
Shuffler shuffler = new Shuffler();
shuffler.Shuffle(lotoDraw); // Now they are randomly ordered.
// We want 6 numbers, so we just draw the first 6:
int[] loto = draw(lotoDraw, 6);
// Print them out;
foreach (int ball in loto)
Console.WriteLine(ball);
}
private static int[] draw(int[] bag, int n) // Draws the first n items
{ // from the bag
int[] result = new int[n];
for (int i = 0; i < n; ++i)
result[i] = bag[i];
return result;
}
private static int[] createDraw() // Creates a collection of numbers
{ // from 1..50 to draw from.
int[] result = new int[50];
for (int i = 0; i < 50; ++i)
result[i] = i + 1;
return result;
}
}
public class Shuffler
{
public void Shuffle<T>(IList<T> list)
{
for (int n = list.Count; n > 1; )
{
int k = _rng.Next(n);
--n;
T temp = list[n];
list[n] = list[k];
list[k] = temp;
}
}
private readonly Random _rng = new Random();
}
}

Distributed probability random number generator

I want to generate a number based on a distributed probability. For example, just say there are the following occurences of each numbers:
Number| Count
1 | 150
2 | 40
3 | 15
4 | 3
with a total of (150+40+15+3) = 208
then the probability of a 1 is 150/208= 0.72
and the probability of a 2 is 40/208 = 0.192
How do I make a random number generator that returns be numbers based on this probability distribution?
I'm happy for this to be based on a static, hardcoded set for now but I eventually want it to derive the probability distribution from a database query.
I've seen similar examples like this one but they are not very generic. Any suggestions?
The general approach is to feed uniformly distributed random numbers from 0..1 interval into the inverse of the cumulative distribution function of your desired distribution.
Thus in your case, just draw a random number x from 0..1 (for example with Random.NextDouble()) and based on its value return
1 if 0 <= x < 150/208,
2 if 150/208 <= x < 190/208,
3 if 190/208 <= x < 205/208 and
4 otherwise.
Do this only once:
Write a function that calculates a cdf array given a pdf array. In your example pdf array is [150,40,15,3], cdf array will be [150,190,205,208].
Do this every time:
Get a random number in [0,1) , multiply with 208, truncate up (or down: I leave it to you to think about the corner cases) You'll have an integer in 1..208. Name it r.
Perform a binary search on cdf array for r. Return the index of the cell that contains r.
The running time will be proportional to log of the size of the given pdf array. Which is good. However, if your array size will always be so small (4 in your example) then performing a linear search is easier and also will perform better.
There are many ways to generate a random integer with a custom distribution (also known as a discrete distribution). The choice depends on many things, including the number of integers to choose from, the shape of the distribution, and whether the distribution will change over time. For details, see the following question, especially my answer there:
Data structures for loaded dice?
The following C# code implements Michael Vose's version of the alias method, as described in this article; see also this question. I have written this code for your convenience and provide it here.
public class LoadedDie {
// Initializes a new loaded die. Probs
// is an array of numbers indicating the relative
// probability of each choice relative to all the
// others. For example, if probs is [3,4,2], then
// the chances are 3/9, 4/9, and 2/9, since the probabilities
// add up to 9.
public LoadedDie(int probs){
this.prob=new List<long>();
this.alias=new List<int>();
this.total=0;
this.n=probs;
this.even=true;
}
Random random=new Random();
List<long> prob;
List<int> alias;
long total;
int n;
bool even;
public LoadedDie(IEnumerable<int> probs){
// Raise an error if nil
if(probs==null)throw new ArgumentNullException("probs");
this.prob=new List<long>();
this.alias=new List<int>();
this.total=0;
this.even=false;
var small=new List<int>();
var large=new List<int>();
var tmpprobs=new List<long>();
foreach(var p in probs){
tmpprobs.Add(p);
}
this.n=tmpprobs.Count;
// Get the max and min choice and calculate total
long mx=-1, mn=-1;
foreach(var p in tmpprobs){
if(p<0)throw new ArgumentException("probs contains a negative probability.");
mx=(mx<0 || p>mx) ? P : mx;
mn=(mn<0 || p<mn) ? P : mn;
this.total+=p;
}
// We use a shortcut if all probabilities are equal
if(mx==mn){
this.even=true;
return;
}
// Clone the probabilities and scale them by
// the number of probabilities
for(var i=0;i<tmpprobs.Count;i++){
tmpprobs[i]*=this.n;
this.alias.Add(0);
this.prob.Add(0);
}
// Use Michael Vose's alias method
for(var i=0;i<tmpprobs.Count;i++){
if(tmpprobs[i]<this.total)
small.Add(i); // Smaller than probability sum
else
large.Add(i); // Probability sum or greater
}
// Calculate probabilities and aliases
while(small.Count>0 && large.Count>0){
var l=small[small.Count-1];small.RemoveAt(small.Count-1);
var g=large[large.Count-1];large.RemoveAt(large.Count-1);
this.prob[l]=tmpprobs[l];
this.alias[l]=g;
var newprob=(tmpprobs[g]+tmpprobs[l])-this.total;
tmpprobs[g]=newprob;
if(newprob<this.total)
small.Add(g);
else
large.Add(g);
}
foreach(var g in large)
this.prob[g]=this.total;
foreach(var l in small)
this.prob[l]=this.total;
}
// Returns the number of choices.
public int Count {
get {
return this.n;
}
}
// Chooses a choice at random, ranging from 0 to the number of choices
// minus 1.
public int NextValue(){
var i=random.Next(this.n);
return (this.even || random.Next((int)this.total)<this.prob[i]) ? I : this.alias[i];
}
}
Example:
var loadedDie=new LoadedDie(new int[]{150,40,15,3}); // list of probabilities for each number:
// 0 is 150, 1 is 40, and so on
int number=loadedDie.nextValue(); // return a number from 0-3 according to given probabilities;
// the number can be an index to another array, if needed
I place this code in the public domain.
I know this is an old post, but I also searched for such a generator and was not satisfied with the solutions I found. So I wrote my own and want to share it to the world.
Just call "Add(...)" some times before you call "NextItem(...)"
/// <summary> A class that will return one of the given items with a specified possibility. </summary>
/// <typeparam name="T"> The type to return. </typeparam>
/// <example> If the generator has only one item, it will always return that item.
/// If there are two items with possibilities of 0.4 and 0.6 (you could also use 4 and 6 or 2 and 3)
/// it will return the first item 4 times out of ten, the second item 6 times out of ten. </example>
public class RandomNumberGenerator<T>
{
private List<Tuple<double, T>> _items = new List<Tuple<double, T>>();
private Random _random = new Random();
/// <summary>
/// All items possibilities sum.
/// </summary>
private double _totalPossibility = 0;
/// <summary>
/// Adds a new item to return.
/// </summary>
/// <param name="possibility"> The possibility to return this item. Is relative to the other possibilites passed in. </param>
/// <param name="item"> The item to return. </param>
public void Add(double possibility, T item)
{
_items.Add(new Tuple<double, T>(possibility, item));
_totalPossibility += possibility;
}
/// <summary>
/// Returns a random item from the list with the specified relative possibility.
/// </summary>
/// <exception cref="InvalidOperationException"> If there are no items to return from. </exception>
public T NextItem()
{
var rand = _random.NextDouble() * _totalPossibility;
double value = 0;
foreach (var item in _items)
{
value += item.Item1;
if (rand <= value)
return item.Item2;
}
return _items.Last().Item2; // Should never happen
}
}
Thanks for all your solutions guys! Much appreciated!
#Menjaraz I tried implementing your solution as it looks very resource friendly, however had some difficulty with the syntax.
So for now, I just transformed my summary into a flat list of values using LINQ SelectMany() and Enumerable.Repeat().
public class InventoryItemQuantityRandomGenerator
{
private readonly Random _random;
private readonly IQueryable<int> _quantities;
public InventoryItemQuantityRandomGenerator(IRepository database, int max)
{
_quantities = database.AsQueryable<ReceiptItem>()
.Where(x => x.Quantity <= max)
.GroupBy(x => x.Quantity)
.Select(x => new
{
Quantity = x.Key,
Count = x.Count()
})
.SelectMany(x => Enumerable.Repeat(x.Quantity, x.Count));
_random = new Random();
}
public int Next()
{
return _quantities.ElementAt(_random.Next(0, _quantities.Count() - 1));
}
}
Use my method. It is simple and easy-to-understand.
I don't count portion in range 0...1, i just use "Probabilityp Pool" (sounds cool, yeah?)
At circle diagram you can see weight of every element in pool
Here you can see an implementing of accumulative probability for roulette
`
// Some c`lass or struct for represent items you want to roulette
public class Item
{
public string name; // not only string, any type of data
public int chance; // chance of getting this Item
}
public class ProportionalWheelSelection
{
public static Random rnd = new Random();
// Static method for using from anywhere. You can make its overload for accepting not only List, but arrays also:
// public static Item SelectItem (Item[] items)...
public static Item SelectItem(List<Item> items)
{
// Calculate the summa of all portions.
int poolSize = 0;
for (int i = 0; i < items.Count; i++)
{
poolSize += items[i].chance;
}
// Get a random integer from 0 to PoolSize.
int randomNumber = rnd.Next(0, poolSize) + 1;
// Detect the item, which corresponds to current random number.
int accumulatedProbability = 0;
for (int i = 0; i < items.Count; i++)
{
accumulatedProbability += items[i].chance;
if (randomNumber <= accumulatedProbability)
return items[i];
}
return null; // this code will never come while you use this programm right :)
}
}
// Example of using somewhere in your program:
static void Main(string[] args)
{
List<Item> items = new List<Item>();
items.Add(new Item() { name = "Anna", chance = 100});
items.Add(new Item() { name = "Alex", chance = 125});
items.Add(new Item() { name = "Dog", chance = 50});
items.Add(new Item() { name = "Cat", chance = 35});
Item newItem = ProportionalWheelSelection.SelectItem(items);
}
Here's an implementation using the Inverse distribution function:
using System;
using System.Linq;
// ...
private static readonly Random RandomGenerator = new Random();
private int GetDistributedRandomNumber()
{
double totalCount = 208;
var number1Prob = 150 / totalCount;
var number2Prob = (150 + 40) / totalCount;
var number3Prob = (150 + 40 + 15) / totalCount;
var randomNumber = RandomGenerator.NextDouble();
int selectedNumber;
if (randomNumber < number1Prob)
{
selectedNumber = 1;
}
else if (randomNumber >= number1Prob && randomNumber < number2Prob)
{
selectedNumber = 2;
}
else if (randomNumber >= number2Prob && randomNumber < number3Prob)
{
selectedNumber = 3;
}
else
{
selectedNumber = 4;
}
return selectedNumber;
}
An example to verify the random distribution:
int totalNumber1Count = 0;
int totalNumber2Count = 0;
int totalNumber3Count = 0;
int totalNumber4Count = 0;
int testTotalCount = 100;
foreach (var unused in Enumerable.Range(1, testTotalCount))
{
int selectedNumber = GetDistributedRandomNumber();
Console.WriteLine($"selected number is {selectedNumber}");
if (selectedNumber == 1)
{
totalNumber1Count += 1;
}
if (selectedNumber == 2)
{
totalNumber2Count += 1;
}
if (selectedNumber == 3)
{
totalNumber3Count += 1;
}
if (selectedNumber == 4)
{
totalNumber4Count += 1;
}
}
Console.WriteLine("");
Console.WriteLine($"number 1 -> total selected count is {totalNumber1Count} ({100 * (totalNumber1Count / (double) testTotalCount):0.0} %) ");
Console.WriteLine($"number 2 -> total selected count is {totalNumber2Count} ({100 * (totalNumber2Count / (double) testTotalCount):0.0} %) ");
Console.WriteLine($"number 3 -> total selected count is {totalNumber3Count} ({100 * (totalNumber3Count / (double) testTotalCount):0.0} %) ");
Console.WriteLine($"number 4 -> total selected count is {totalNumber4Count} ({100 * (totalNumber4Count / (double) testTotalCount):0.0} %) ");
Example output:
selected number is 1
selected number is 1
selected number is 1
selected number is 1
selected number is 2
selected number is 1
...
selected number is 2
selected number is 3
selected number is 1
selected number is 1
selected number is 1
selected number is 1
selected number is 1
number 1 -> total selected count is 71 (71.0 %)
number 2 -> total selected count is 20 (20.0 %)
number 3 -> total selected count is 8 (8.0 %)
number 4 -> total selected count is 1 (1.0 %)

Categories