I want to generate a number based on a distributed probability. For example, say there are the following occurrences of each number:
Number| Count
1 | 150
2 | 40
3 | 15
4 | 3
with a total of (150+40+15+3) = 208
then the probability of a 1 is 150/208= 0.72
and the probability of a 2 is 40/208 = 0.192
How do I make a random number generator that returns numbers based on this probability distribution?
I'm happy for this to be based on a static, hardcoded set for now but I eventually want it to derive the probability distribution from a database query.
I've seen similar examples like this one but they are not very generic. Any suggestions?
The general approach is to feed uniformly distributed random numbers from 0..1 interval into the inverse of the cumulative distribution function of your desired distribution.
Thus in your case, just draw a random number x from 0..1 (for example with Random.NextDouble()) and based on its value return
1 if 0 <= x < 150/208,
2 if 150/208 <= x < 190/208,
3 if 190/208 <= x < 205/208 and
4 otherwise.
Do this only once:
Write a function that calculates a cdf array given a pdf array. In your example pdf array is [150,40,15,3], cdf array will be [150,190,205,208].
Do this every time:
Get a random number in [0,1), multiply it by 208, and round up (or down: I leave it to you to think about the corner cases). You'll have an integer in 1..208. Name it r.
Perform a binary search on cdf array for r. Return the index of the cell that contains r.
The running time will be proportional to log of the size of the given pdf array. Which is good. However, if your array size will always be so small (4 in your example) then performing a linear search is easier and also will perform better.
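The two steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `CdfSampler` and its members are made up for this example, the weights are assumed to be non-negative integers, and the corner case is resolved by rounding down and adding 1 so that r lies in 1..total.

```csharp
using System;

// Hypothetical helper (not part of the answer above): picks an index
// in proportion to integer weights by searching a cumulative array.
public sealed class CdfSampler
{
    private readonly long[] _cdf;            // running totals, e.g. [150, 190, 205, 208]
    private readonly Random _random = new Random();

    public CdfSampler(int[] weights)
    {
        if (weights == null || weights.Length == 0)
            throw new ArgumentException("At least one weight is required.", nameof(weights));
        _cdf = new long[weights.Length];
        long running = 0;
        for (int i = 0; i < weights.Length; i++)
        {
            if (weights[i] < 0)
                throw new ArgumentException("Weights must be non-negative.", nameof(weights));
            running += weights[i];
            _cdf[i] = running;
        }
        if (running == 0)
            throw new ArgumentException("At least one weight must be positive.", nameof(weights));
    }

    // Binary search: first index whose cumulative total is >= r, for r in 1..total.
    public int IndexFor(long r)
    {
        int lo = 0, hi = _cdf.Length - 1;
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (_cdf[mid] < r) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }

    // Draws a zero-based index; add 1 to get the numbers 1..4 from the question.
    public int Next()
    {
        long total = _cdf[_cdf.Length - 1];
        // NextDouble() is in [0, 1), so r falls in 1..total for totals of this size.
        long r = (long)(_random.NextDouble() * total) + 1;
        return IndexFor(r);
    }
}
```

With the weights from the question, `new CdfSampler(new[] { 150, 40, 15, 3 }).Next() + 1` would return 1, 2, 3 or 4. The cumulative array is built once in the constructor, so each draw only costs the binary search.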
There are many ways to generate a random integer with a custom distribution (also known as a discrete distribution). The choice depends on many things, including the number of integers to choose from, the shape of the distribution, and whether the distribution will change over time. For details, see the following question, especially my answer there:
Data structures for loaded dice?
The following C# code implements Michael Vose's version of the alias method, as described in this article; see also this question. I have written this code for your convenience and provide it here.
public class LoadedDie {
// Initializes a new loaded die. Probs
// is an array of numbers indicating the relative
// probability of each choice relative to all the
// others. For example, if probs is [3,4,2], then
// the chances are 3/9, 4/9, and 2/9, since the probabilities
// add up to 9.
public LoadedDie(int probs){
this.prob=new List<long>();
this.alias=new List<int>();
this.total=0;
this.n=probs;
this.even=true;
}
Random random=new Random();
List<long> prob;
List<int> alias;
long total;
int n;
bool even;
public LoadedDie(IEnumerable<int> probs){
// Raise an error if nil
if(probs==null)throw new ArgumentNullException("probs");
this.prob=new List<long>();
this.alias=new List<int>();
this.total=0;
this.even=false;
var small=new List<int>();
var large=new List<int>();
var tmpprobs=new List<long>();
foreach(var p in probs){
tmpprobs.Add(p);
}
this.n=tmpprobs.Count;
// Get the max and min choice and calculate total
long mx=-1, mn=-1;
foreach(var p in tmpprobs){
if(p<0)throw new ArgumentException("probs contains a negative probability.");
mx=(mx<0 || p>mx) ? p : mx;
mn=(mn<0 || p<mn) ? p : mn;
this.total+=p;
}
// We use a shortcut if all probabilities are equal
if(mx==mn){
this.even=true;
return;
}
// Clone the probabilities and scale them by
// the number of probabilities
for(var i=0;i<tmpprobs.Count;i++){
tmpprobs[i]*=this.n;
this.alias.Add(0);
this.prob.Add(0);
}
// Use Michael Vose's alias method
for(var i=0;i<tmpprobs.Count;i++){
if(tmpprobs[i]<this.total)
small.Add(i); // Smaller than probability sum
else
large.Add(i); // Probability sum or greater
}
// Calculate probabilities and aliases
while(small.Count>0 && large.Count>0){
var l=small[small.Count-1];small.RemoveAt(small.Count-1);
var g=large[large.Count-1];large.RemoveAt(large.Count-1);
this.prob[l]=tmpprobs[l];
this.alias[l]=g;
var newprob=(tmpprobs[g]+tmpprobs[l])-this.total;
tmpprobs[g]=newprob;
if(newprob<this.total)
small.Add(g);
else
large.Add(g);
}
foreach(var g in large)
this.prob[g]=this.total;
foreach(var l in small)
this.prob[l]=this.total;
}
// Returns the number of choices.
public int Count {
get {
return this.n;
}
}
// Chooses a choice at random, ranging from 0 to the number of choices
// minus 1.
public int NextValue(){
var i=random.Next(this.n);
return (this.even || random.Next((int)this.total)<this.prob[i]) ? i : this.alias[i];
}
}
Example:
var loadedDie=new LoadedDie(new int[]{150,40,15,3}); // list of probabilities for each number:
// 0 is 150, 1 is 40, and so on
int number=loadedDie.NextValue(); // return a number from 0-3 according to given probabilities;
// the number can be an index to another array, if needed
I place this code in the public domain.
I know this is an old post, but I also searched for such a generator and was not satisfied with the solutions I found. So I wrote my own and want to share it to the world.
Just call "Add(...)" some times before you call "NextItem(...)"
/// <summary> A class that will return one of the given items with a specified possibility. </summary>
/// <typeparam name="T"> The type to return. </typeparam>
/// <example> If the generator has only one item, it will always return that item.
/// If there are two items with possibilities of 0.4 and 0.6 (you could also use 4 and 6 or 2 and 3)
/// it will return the first item 4 times out of ten, the second item 6 times out of ten. </example>
public class RandomNumberGenerator<T>
{
private List<Tuple<double, T>> _items = new List<Tuple<double, T>>();
private Random _random = new Random();
/// <summary>
/// All items possibilities sum.
/// </summary>
private double _totalPossibility = 0;
/// <summary>
/// Adds a new item to return.
/// </summary>
/// <param name="possibility"> The possibility to return this item. Is relative to the other possibilites passed in. </param>
/// <param name="item"> The item to return. </param>
public void Add(double possibility, T item)
{
_items.Add(new Tuple<double, T>(possibility, item));
_totalPossibility += possibility;
}
/// <summary>
/// Returns a random item from the list with the specified relative possibility.
/// </summary>
/// <exception cref="InvalidOperationException"> If there are no items to return from. </exception>
public T NextItem()
{
var rand = _random.NextDouble() * _totalPossibility;
double value = 0;
foreach (var item in _items)
{
value += item.Item1;
if (rand <= value)
return item.Item2;
}
return _items.Last().Item2; // Should never happen
}
}
Thanks for all your solutions guys! Much appreciated!
@Menjaraz I tried implementing your solution as it looks very resource friendly, however I had some difficulty with the syntax.
So for now, I just transformed my summary into a flat list of values using LINQ SelectMany() and Enumerable.Repeat().
public class InventoryItemQuantityRandomGenerator
{
private readonly Random _random;
private readonly IQueryable<int> _quantities;
public InventoryItemQuantityRandomGenerator(IRepository database, int max)
{
_quantities = database.AsQueryable<ReceiptItem>()
.Where(x => x.Quantity <= max)
.GroupBy(x => x.Quantity)
.Select(x => new
{
Quantity = x.Key,
Count = x.Count()
})
.SelectMany(x => Enumerable.Repeat(x.Quantity, x.Count));
_random = new Random();
}
public int Next()
{
// Random.Next(n) returns a value in 0..n-1, so the last element can be chosen too.
return _quantities.ElementAt(_random.Next(_quantities.Count()));
}
}
Use my method. It is simple and easy-to-understand.
I don't compute portions in the range 0...1; I just use a "Probability Pool" (sounds cool, yeah?).
The circle diagram shows the weight of every element in the pool.
Here is an implementation of accumulative probability for a roulette wheel:
// Some class or struct representing the items you want to put on the roulette wheel
public class Item
{
public string name; // not only string, any type of data
public int chance; // chance of getting this Item
}
public class ProportionalWheelSelection
{
public static Random rnd = new Random();
// Static method for using from anywhere. You can make its overload for accepting not only List, but arrays also:
// public static Item SelectItem (Item[] items)...
public static Item SelectItem(List<Item> items)
{
// Calculate the sum of all portions.
int poolSize = 0;
for (int i = 0; i < items.Count; i++)
{
poolSize += items[i].chance;
}
// Get a random integer from 1 to poolSize (inclusive).
int randomNumber = rnd.Next(0, poolSize) + 1;
// Detect the item, which corresponds to current random number.
int accumulatedProbability = 0;
for (int i = 0; i < items.Count; i++)
{
accumulatedProbability += items[i].chance;
if (randomNumber <= accumulatedProbability)
return items[i];
}
return null; // only reached if the list is empty or every chance is zero
}
}
// Example of using somewhere in your program:
static void Main(string[] args)
{
List<Item> items = new List<Item>();
items.Add(new Item() { name = "Anna", chance = 100});
items.Add(new Item() { name = "Alex", chance = 125});
items.Add(new Item() { name = "Dog", chance = 50});
items.Add(new Item() { name = "Cat", chance = 35});
Item newItem = ProportionalWheelSelection.SelectItem(items);
}
Here's an implementation using the Inverse distribution function:
using System;
using System.Linq;
// ...
private static readonly Random RandomGenerator = new Random();
private int GetDistributedRandomNumber()
{
double totalCount = 208;
var number1Prob = 150 / totalCount;
var number2Prob = (150 + 40) / totalCount;
var number3Prob = (150 + 40 + 15) / totalCount;
var randomNumber = RandomGenerator.NextDouble();
int selectedNumber;
if (randomNumber < number1Prob)
{
selectedNumber = 1;
}
else if (randomNumber >= number1Prob && randomNumber < number2Prob)
{
selectedNumber = 2;
}
else if (randomNumber >= number2Prob && randomNumber < number3Prob)
{
selectedNumber = 3;
}
else
{
selectedNumber = 4;
}
return selectedNumber;
}
An example to verify the random distribution:
int totalNumber1Count = 0;
int totalNumber2Count = 0;
int totalNumber3Count = 0;
int totalNumber4Count = 0;
int testTotalCount = 100;
foreach (var unused in Enumerable.Range(1, testTotalCount))
{
int selectedNumber = GetDistributedRandomNumber();
Console.WriteLine($"selected number is {selectedNumber}");
if (selectedNumber == 1)
{
totalNumber1Count += 1;
}
if (selectedNumber == 2)
{
totalNumber2Count += 1;
}
if (selectedNumber == 3)
{
totalNumber3Count += 1;
}
if (selectedNumber == 4)
{
totalNumber4Count += 1;
}
}
Console.WriteLine("");
Console.WriteLine($"number 1 -> total selected count is {totalNumber1Count} ({100 * (totalNumber1Count / (double) testTotalCount):0.0} %) ");
Console.WriteLine($"number 2 -> total selected count is {totalNumber2Count} ({100 * (totalNumber2Count / (double) testTotalCount):0.0} %) ");
Console.WriteLine($"number 3 -> total selected count is {totalNumber3Count} ({100 * (totalNumber3Count / (double) testTotalCount):0.0} %) ");
Console.WriteLine($"number 4 -> total selected count is {totalNumber4Count} ({100 * (totalNumber4Count / (double) testTotalCount):0.0} %) ");
Example output:
selected number is 1
selected number is 1
selected number is 1
selected number is 1
selected number is 2
selected number is 1
...
selected number is 2
selected number is 3
selected number is 1
selected number is 1
selected number is 1
selected number is 1
selected number is 1
number 1 -> total selected count is 71 (71.0 %)
number 2 -> total selected count is 20 (20.0 %)
number 3 -> total selected count is 8 (8.0 %)
number 4 -> total selected count is 1 (1.0 %)
Related
I have a stream of data (integers) with given (constant) frequency. From time to time I need to compute different averages (predefined). I am looking for solution to do it fast and efficient.
Assumptions:
Sampling rate is constant (predefined) and might be something between 125-500 SPS
The averages I need to compute are predefined, and there might be one average or many (for example only the last 200ms average, or the last 250ms and the last 500ms). There might be many averages but they are predefined!
At any time I need to be able to compute current average (real time)
What I have right now:
I assume that in particular timeframe there will be always the same amount of data. So having frequency 100SPS I assume that one second contain exactly 100 values
Queue with constant length is created (something like buffer)
For EVERY defined average, Sum variable is created
Every time new sample arrive I place it on the queue.
Every time I have new sample in the queue I add its value to the every Sum variables I have and also remove value of element which is out of the window (based on position in Queue)
Once I need to compute average I just take the particular Sum variable and divide it by number of elements this Sum should contain
To give you a better insight, here is the code I have right now:
public class Buffer<T> : LinkedList<T>
{
private readonly int capacity;
public bool IsFull => Count >= capacity;
public Buffer(int capacity)
{
this.capacity = capacity;
}
public void Enqueue(T item)
{
if (Count == capacity)
{
RemoveFirst();
}
AddLast(item);
}
}
public class MovingAverage
{
private readonly Buffer<float> Buffer;
private static readonly object bufferLock = new object();
public Dictionary<string, float> Sums { get; private set; }
public Dictionary<string, int> Counts { get; private set; }
public MovingAverage(List<int> sampleCounts, List<string> names)
{
if (sampleCounts.Count != names.Count)
{
throw new ArgumentException("Wrong Moving Averages parameters");
}
Buffer = new Buffer<float>(sampleCounts.Max());
Sums = new Dictionary<string, float>();
Counts = new Dictionary<string, int>();
for (int i = 0; i < names.Count; i++)
{
Sums[names[i]] = 0;
Counts[names[i]] = sampleCounts[i];
}
}
public void ProcessAveraging(float val)
{
lock (bufferLock)
{
if (float.IsNaN(val))
{
val = 0;
}
foreach (var keyVal in Counts.OrderBy(a => a.Value))
{
Sums[keyVal.Key] += val;
if (Buffer.Count >= keyVal.Value)
{
Sums[keyVal.Key] -= Buffer.ElementAt(Buffer.Count - keyVal.Value);
}
}
Buffer.Enqueue(val);
}
}
public float GetLastAverage(string averageName)
{
lock (bufferLock)
{
if (Buffer.Count >= Counts[averageName])
{
return Sums[averageName] / Counts[averageName];
}
else
{
return Sums[averageName] / Buffer.Count;
}
}
}
}
That works really nicely and is fast enough, but in the real world having 100 SPS doesn't really mean you will always have 100 samples in 1 second. Sometimes it's 100, sometimes 99, sometimes 101. Computing these averages is critical for my system and 1 sample more or less could change a lot. That's why I need a real timer telling me whether a sample is already out of the moving-average window or not.
The idea with adding timestamp to every sample seems to be promising
Plenty of answers here.. Might as well add another one :)
This one might need some minor debugging for "off by one" etc - I didn't have a real dataset to work with so perhaps treat it as pseudocode
It's like yours: there's a buffer that is circular - give it enough capacity to hold N samples where N is enough to inspect your moving averages - 100 SPS and want to inspect 250ms I think you'll need at least 25, but we aren't short on space so you could make it more
struct Cirray
{
long _head;
TimedFloat[] _data;
public Cirray(int capacity)
{
_head = 0;
_data = new TimedFloat[capacity];
}
public void Add(float f)
{
_data[_head++%_data.Length] = new TimedFloat() { F = f };
}
public IEnumerable<float> GetAverages(int[] forDeltas)
{
double sum = 0;
long start = _head - 1;
long now = _data[start % _data.Length].T; // wrap the index, since _head keeps growing
int whichDelta = 0;
for (long idx = start; idx >= 0 && whichDelta < forDeltas.Length; idx--)
{
if (_data[idx % _data.Length].T < now - forDeltas[whichDelta])
{
yield return (float)(sum / (start - idx));
whichDelta++;
}
sum += _data[idx % _data.Length].F;
}
}
}
struct TimedFloat
{
[DllImport("Kernel32.dll", CallingConvention = CallingConvention.Winapi)]
private static extern void GetSystemTimePreciseAsFileTime(out long filetime);
private float _f;
public float F { get => _f;
set {
_f = value;
GetSystemTimePreciseAsFileTime(out long x);
T = DateTime.FromFileTimeUtc(x).Ticks;
}
}
public long T;
}
The normal DateTime.UtcNow isn't very precise - about 16ms - so it's probably no good for timestamping data like this if you're saying that even one sample could throw it off. Instead we can make it so we get the ticks equivalent of the high resolution timer, if your system supports it (if not, you might have to change system, or abuse a Stopwatch class into giving a higher resolution supplement) and we're timestamping every data item.
I thought about going to the complexity of maintaining N number of constantly moving pointers to various tail ends of the data and dec/incrementing N number of sums - it could still be done (and you clearly know how) but your question read like you'd probably call for the averages infrequently enough that an N sums/counts solution would spend more time maintaining the counts than it would to just run through 250 or 500 floats every now and then and just add them up. GetAverages as a result takes an array of ticks (10 thousand per ms) of the ranges you want the data over, e.g. new[] { 50 * 10000, 100 * 10000, 150 * 10000, 200 * 10000, 250 * 10000 } for 50ms to 250ms in steps of 50, and it starts at the current head and sums backwards until the point where it's going to break a time boundary (and this might be the off-by-one bit) whereupon it yields the average for that timespan, then resumes summing and counting (the count given by math of the start minus the current index) for the next time span.. I think I understood right that you want e.g. the "average over the last 50ms" and "average over the last 100ms", not "average for the recent 50ms" and "average for the 50ms before recent"
Edit:
Thought about it some more and did this:
struct Cirray
{
long _head;
TimedFloat[] _data;
RunningAverage[] _ravgs;
public Cirray(int capacity)
{
_head = 0;
_data = new TimedFloat[capacity];
}
public Cirray(int capacity, int[] deltas) : this(capacity)
{
_ravgs = new RunningAverage[deltas.Length];
for (int i = 0; i < deltas.Length; i++)
_ravgs[i] = new RunningAverage() { OverMilliseconds = deltas[i] };
}
public void Add(float f)
{
//in c# every assignment returns the assigned value; capture it for use later
var addedTF = (_data[_head++ % _data.Length] = new TimedFloat() { F = f });
if (_ravgs == null)
return;
foreach (var ra in _ravgs)
{
//add the new tf to each RA
ra.Count++;
ra.Total += addedTF.F;
//move the end pointer in the RA circularly up the array, subtracting/uncounting as we go
var boundary = addedTF.T - ra.OverMilliseconds;
while (_data[ra.EndPointer].T < boundary) //while the sample is timed before the boundary, move the
{
ra.Count--;
ra.Total -= _data[ra.EndPointer].F;
ra.EndPointer = (ra.EndPointer + 1) % _data.Length; //circular indexing
}
}
}
public IEnumerable<float> GetAverages(int[] forDeltas)
{
double sum = 0;
long start = _head - 1;
long now = _data[start % _data.Length].T; // wrap the index, since _head keeps growing
int whichDelta = 0;
for (long idx = start; idx >= 0 && whichDelta < forDeltas.Length; idx--)
{
if (_data[idx % _data.Length].T < now - forDeltas[whichDelta])
{
yield return (float)(sum / (start - idx));
whichDelta++;
}
sum += _data[idx % _data.Length].F;
}
}
public IEnumerable<float> GetAverages() //from the built ins
{
foreach (var ra in _ravgs)
{
if (ra.Count == 0)
yield return 0;
else
yield return (float)(ra.Total / ra.Count);
}
}
}
Absolutely haven't tested it, but it embodies my thinking in the comments
Instead of using a linked list, I would fall back on built-in functions such as Array.Copy. In this answer I have included a possible rewrite of your buffer class, taking over the idea of keeping a sum at every position.
This buffer keeps track of all the sums, but to do that it needs to add the new value to every item. Depending on how often you need the average, it might be better to sum up only when you need it and keep just the individual values.
In any case, I just wanted to point out how you could do it with Array.Copy.
public class BufferSum
{
private readonly int _capacity;
private readonly int _last;
private float[] _items;
public int Count { get; private set; }
public bool IsFull => Count >= _capacity;
public BufferSum(int capacity)
{
_capacity = capacity;
_last = capacity - 1;
_items = new float[_capacity];
}
public void Enqueue(float item)
{
if (Count == _capacity)
{
Array.Copy(_items, 1, _items, 0, _last);
_items[_last] = 0;
}
else
{
Count++;
}
for (var i = 0; i < Count; i ++)
{
_items[i] += item;
}
}
public float Average => _items[0] / Count;
public float AverageAt(int ms, int fps)
{
// Multiply before dividing: with integers, ms / 1000 is 0 for any ms below 1000.
var _pos = ms * fps / 1000;
return _items[Count - _pos] / _pos;
}
}
Additionally, be careful with the lock statement, which can take a lot of time too.
Make an array of size 500, int counter c.
For every sample:
summ -= A[c % 500] //remove old value
summ += sample
A[c % 500] = sample //replace it with new value
c++
if needed, calculate
average = summ / 500
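The pseudocode above could look roughly like this in C#. It is a sketch, not a drop-in solution: the class name `RunningWindowSum` and its members are invented for this example, and unlike the pseudocode it divides by the number of samples actually received, so the average is also meaningful before the window has filled up.

```csharp
using System;

// Hypothetical class name; keeps a running sum over the most recent samples.
public sealed class RunningWindowSum
{
    private readonly int[] _samples;
    private long _sum;      // sum of the samples currently in the window
    private long _counter;  // total number of samples seen so far

    public RunningWindowSum(int windowSize)
    {
        if (windowSize <= 0) throw new ArgumentOutOfRangeException(nameof(windowSize));
        _samples = new int[windowSize];
    }

    public void Add(int sample)
    {
        int slot = (int)(_counter % _samples.Length);
        _sum -= _samples[slot];   // remove the value being overwritten (0 during warm-up)
        _sum += sample;
        _samples[slot] = sample;  // replace it with the new value
        _counter++;
    }

    // Number of samples currently contributing to the sum.
    public int Count => (int)Math.Min(_counter, _samples.Length);

    public double Average => Count == 0 ? 0.0 : (double)_sum / Count;
}
```

For the 500-sample window from the pseudocode you would create `new RunningWindowSum(500)` and call `Add` once per sample. Each call does a constant amount of work regardless of the window size.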
You always want to remove the oldest element on one side of your sequence and add a new element at the other side of the sequence: you need a queue instead of a stack.
I think a round list will be faster: as long as you have not the maximum size, just add the elements, once you've got the maximum size, replace the oldest element.
This seems like a nice reusable class. Later we'll add the moving average part.
class RoundArray<T>
{
    public RoundArray(int maxSize)
    {
        this.maxSize = maxSize;
        this.roundArray = new List<T>(maxSize);
    }
    private readonly int maxSize;
    private readonly List<T> roundArray;
    public int indexOldestItem = 0;
    public void Add(T item)
    {
        // if list not full, just add
        if (this.roundArray.Count < this.maxSize)
            this.roundArray.Add(item);
        else
        {
            // list is full, replace the oldest item:
            this.roundArray[this.indexOldestItem] = item;
            this.indexOldestItem = (this.indexOldestItem + 1) % this.maxSize;
        }
    }
    public int Count => this.roundArray.Count;
    public T Oldest => this.roundArray[this.indexOldestItem];
}
To make this class useful, add methods to enumerate the data, starting at the oldest or the newest, consider to add other useful reusable methods. Maybe you should implement IReadOnlyCollection<T>. Maybe some private fields should have public properties.
Your moving average calculator will use this RoundArray. Whenever an item is added, and your roundArray is not full yet, the item is added to the sum and to the round array.
If the roundArray is full, then the item replaces the oldest item. You subtract the value of the OldestItem from the Sum, and add the new Item to the Sum.
class MovingAverageCalculator
{
    public MovingAverageCalculator(int maxSize)
    {
        this.maxSize = maxSize;
        this.roundArray = new RoundArray<int>(maxSize);
    }
    private readonly int maxSize;
    private readonly RoundArray<int> roundArray;
    private int sum = 0;
    public int Count => this.roundArray.Count;
    // Integer division; only valid once at least one value has been added.
    public int Average => this.sum / this.Count;
    public void Add(int value)
    {
        if (this.Count == this.maxSize)
        {
            // replace: remove the oldest value from the sum and add the new one
            this.sum += value - this.roundArray.Oldest;
        }
        else
        {
            // still building: just add the new value to the sum
            this.sum += value;
        }
        this.roundArray.Add(value);
    }
}
Cumulative sums.
Compute a series of cumulative sums [1] for every block of ~1000 or so elements. (It could be fewer, but 500 versus 1000 makes little difference and this is more comfortable.) You want to hold every block as long as at least one element inside is relevant. Then it can be recycled. [2]
When you need your current sum and you are within one block, your desired sum is: block[max_index] - block[last_relevant_number].
For the case when you are at the borderline of two blocks b1, b2 in this order, your desired sum is:
b1[b1.length - 1] - b1[last_relevant_number] + b2[max_index]
And we are done. The main advantage of this approach is that you don't need to know beforehand how many elements you want to keep and you can compute the result on the go.
You also don't need to handle the removal of the elements as you will naturally overwrite them when you recycle the segment - keeping the indices is all you need.
Example: let us have a constant time series ts = [1,1,1, .... 1]. The cumulative sums of the series will be cumsum = [1,2,3 ... n]. The sum from the i-th to the j-th (inclusive) element of ts will be cumsum[j] - cumsum[i - 1] = j - (i - 1) = j - i + 1. For i = 5, j = 6 it will be 6 - 4 = 2, which is correct.
[1] For array [1,2,3,4,5] these would be [1,3,6,10,15] - just for the sake of completeness.
[2] Since you mentioned ~500 elements, two blocks should be enough.
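As a small illustration of the range-sum identity used above, here is a sketch for a single block. The names `CumulativeSums`, `Build` and `RangeSum` are made up for this example, and the layout is an assumption that differs slightly from the answer: the array carries an extra leading 0, so the sum of a range starting at the first element needs no special case.

```csharp
using System;

// Hypothetical helper illustrating range sums over one block of cumulative sums.
public static class CumulativeSums
{
    // prefix[k] holds the sum of the first k values, so prefix[0] is 0.
    public static long[] Build(int[] values)
    {
        var prefix = new long[values.Length + 1];
        for (int k = 0; k < values.Length; k++)
            prefix[k + 1] = prefix[k] + values[k];
        return prefix;
    }

    // Sum of values[from..to], both inclusive, using zero-based indices.
    public static long RangeSum(long[] prefix, int from, int to)
    {
        return prefix[to + 1] - prefix[from];
    }
}
```

For the array [1,2,3,4,5], `Build` yields [0,1,3,6,10,15], and `RangeSum(prefix, 1, 3)` is 10 - 1 = 9, which matches 2 + 3 + 4. Handling two blocks, as described in the answer, would add the tail of the older block to the head of the newer one.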
There is a moving average suppose: 2, 4, 6 , 8 , 10...n;
Then add the current value (10) to list
List<int> numHold = new List<int>();
numHold.Add(currentvalue);
Inside the list:
the current value is added
10
and so on
20
30
40 etc
by using
var lastdigit = numHold[numHold.Count - 1];
I can get the last digit but the output is
current: 10 last: 10
current: 20 last: 20
the output should be
current: 20 last: 10
Thanks
Typically, C# indexers start from 0, so the first element has index 0. On the other hand, Count/Length will use 1 for one element. So your
numHold[numHold.Count - 1]
actually takes the last element in the list. If you need the one before that, you need to use - 2, though be careful not to reach outside the bounds of the list (something like Math.Max(0, numHold.Count - 2) might be appropriate).
You can also store the values in separate variables:
List<int> nums = new List<int> { 1 };
int current = 1;
int last = current;
for (int i = 0; i < 10; i++)
{
last = current;
current = i * 2;
nums.Add(current);
}
Console.WriteLine("Current: {0}", current);
Console.WriteLine("Last: {0}", last);
The question is unclear, but if you're using a moving average to draw a line graph 📈, you would use a circular buffer. You can implement one yourself with an object that contains an array of a specified size and the next available position. You could also download a NuGet package that already does this.
A relatively simple way to calculate a moving average is to use a circular buffer to hold the last N values (where N is the number of values for which to compute a moving average).
For example:
public sealed class MovingAverage
{
private readonly int _max;
private readonly double[] _numbers;
private double _total;
private int _front;
private int _count;
public MovingAverage(int max)
{
_max = max;
_numbers = new double[max];
}
public double Average
{
get { return _total / _count; }
}
public void Add(double value)
{
_total += value;
if (_count == _max)
_total -= _numbers[_front];
else
++_count;
_numbers[_front] = value;
_front = (_front+1)%_max;
}
};
which you might use like this:
var test = new MovingAverage(11);
for (int i = 0; i < 25; ++i)
{
test.Add(i);
Console.WriteLine(test.Average);
}
Note that this code is optimised for speed. After a large number of iterations, you might start to get rounding errors. You can avoid this by adding to class MovingAverage a slower method to calculate the average instead of using the Average property:
public double AccurateAverage()
{
double total = 0;
for (int i = 0, j = _front; i < _count; ++i)
{
// Step back first: _front is the next write position, so the newest value is at _front - 1.
if (--j < 0)
j = _max - 1;
total += _numbers[j];
}
return total/_count;
}
Your first (oldest) item will always be at position 0; the most recently added item is at position Count - 1.
List<int> numHold = new List<int>();
numHold.Add(currentvalue); // Adding 10
numHold[0]; // will contain 10
numHold.Add(currentvalue); // Adding 20
numHold[0]; // will contain 10
numHold[numHold.Count - 1]; // will contain 20
A more readable way to get the first and last items is with LINQ (using System.Linq):
numHold.First(); // the oldest value, 10
numHold.Last(); // the newest value, 20
I have 1000 million records in my database, and I need to update all the rows with 20 values, assigned randomly.
So, for every random 50 million records, one value needs to be updated.
So I thought of generating a list with 1000 million numbers, selecting 50 million random records from that list, removing those 50 million records from the list, and so on.
My code :
List Creation:
List<long> LstMainList = new List<long>();
for (int i = 1; i <= 999999999; i++)
{
LstMainList.Add(i);
}
New Empty List : List<TableData> Table1 = new List<TableData>();
Selecting Random Numbers and adding them to New List and removing the item from the MainList which contains 1000 million items.
Random rand = new Random();
for (int a = 0; a < 50000000; a++)
{
int lstindex = rand.Next(LstMainList.Count);
Int64 lstData = LstMainList[lstindex];
Table1.Add(new TableData { MESSAGE_ID = lstData });
LstMainList.RemoveAt(lstindex);
if (a % 100000 == 0)
{
if (previousThread != null)
{
previousThread.Join();
}
List<TableData> copyList = Table1.ToList();
previousThread = new Thread(() => BulkCopyList(copyList, "PLAN_TABLE_1"));
previousThread.Start();
Table1.Clear();
}
}
Now, my problem is: the line LstMainList.RemoveAt(lstindex); takes a long time to remove the item from the main list because it contains 1000 million records.
Is there a simple way to remove a record from the list, or any other way to make this simpler?
First - use array for ids instead of list (especially without initialized capacity)
int idsCount = 100000000;
long[] ids = new long[idsCount];
// Fill every slot (including index 0) with the ids 1..idsCount
for (long i = 0; i < idsCount; i++)
ids[i] = i + 1;
Use Fisher–Yates shuffle to shuffle ids in array
Random rnd = new Random();
int n = idsCount;
while(n > 1)
{
int k = rnd.Next(n);
n--;
long temp = ids[n];
ids[n] = ids[k];
ids[k] = temp;
}
With shuffled ids you don't need to modify the ids list. Removing an item at a random position is a very expensive operation: every item after it has to be shifted down by one, so removing the item at position 0 moves the whole list. Now you can just iterate over the ids array.
Or you can use morelinq Batch to create batches of TableData and bulk them:
int size = 100000;
foreach(var batch in ids.Batch(size, id => new TableData { MESSAGE_ID = id }))
{
var copyList = batch.ToList();
// ...
}
UPDATE: Since you need batches of different sizes, you can use the following extension method to get a range of items from the array:
public static IEnumerable<T> GetRange<T>(
this T[] array, int startIndex, int count)
{
for (int i = startIndex; i < startIndex + count; i++)
yield return array[i];
}
So, getting 5000 TableData starting from index 20000 will look like:
var copyList = ids.GetRange(20000, 5000)
.Select(id => new TableData { MESSAGE_ID = id })
.ToList();
Of course, a more efficient way is to just iterate over the ids array and add items to a list with pre-initialized capacity:
int size = 5000;
int startIndex = 20000;
List<TableData> copyList = new List<TableData>(size);
for (int i = startIndex; i < startIndex + size; i++)
copyList.Add(new TableData { MESSAGE_ID = ids[i] });
Going further, I would move the creation of TableData objects to the thread that does the bulk copy, and just pass it the sequence of ids it should use.
Firstly, here's some advice from Microsoft about selecting rows randomly from a large table.
Secondly, if that's of no use, read on...
If you know the number of items you want to randomly select, and the number of items in a sequence from which you want to randomly select, then there is an O(N) solution.
In the example below, the method RandomlySelectedItems<T>() provides a sequence of the randomly selected items.
Here's the code. (To reiterate, you can only use this if you know in advance the number of items from which you will be selecting):
using System;
using System.Collections.Generic;
using System.Linq;
namespace Demo
{
internal static class Program
{
static void Main(string[] args)
{
int numberOfValuesToSelectFrom = 10000000;
int numberOfValuesToSelect = 20;
var valuesToSelectFrom = Enumerable.Range(1, numberOfValuesToSelectFrom);
var selectedValues = RandomlySelectedItems
(
valuesToSelectFrom,
numberOfValuesToSelect,
numberOfValuesToSelectFrom,
new Random()
);
foreach (int value in selectedValues)
Console.WriteLine(value);
}
/// <summary>Randomly selects items from a sequence.</summary>
/// <typeparam name="T">The type of the items in the sequence.</typeparam>
/// <param name="sequence">The sequence from which to randomly select items.</param>
/// <param name="count">The number of items to randomly select from the sequence.</param>
/// <param name="sequenceLength">The number of items in the sequence among which to randomly select.</param>
/// <param name="rng">The random number generator to use.</param>
/// <returns>A sequence of randomly selected items.</returns>
/// <remarks>This is an O(N) algorithm (N is the sequence length).</remarks>
public static IEnumerable<T> RandomlySelectedItems<T>(IEnumerable<T> sequence, int count, int sequenceLength, Random rng)
{
if (sequence == null)
throw new ArgumentNullException("sequence");
if (count < 0 || count > sequenceLength)
throw new ArgumentOutOfRangeException("count", count, "count must be between 0 and sequenceLength");
if (rng == null)
throw new ArgumentNullException("rng");
int available = sequenceLength;
int remaining = count;
var iterator = sequence.GetEnumerator();
for (int current = 0; current < sequenceLength; ++current)
{
iterator.MoveNext();
if (rng.NextDouble() < remaining/(double)available)
{
yield return iterator.Current;
--remaining;
}
--available;
}
}
}
}
One option is not to generate truly random or even pseudo-random numbers at all, but to use a sequence that only appears random to a casual observer. This can work in a lot of cases; however, it will not work if the items need to be chosen unpredictably, for example to stop an attacker from predicting the next value. The benefit is that you do not need to keep track of all the generated values in memory to shuffle them.
To start, pick two random primes a and b, both less than the number of rows r, such that a * b > r and a does not divide r. Because a is prime and does not divide r, a and r are coprime, so the mapping f(x) = (a * x + b) mod r is guaranteed to be one-to-one on the integers modulo r. We will use that fact to generate a sequence in which each value from 0 to r - 1 appears exactly once.
Let's pick two random primes, say a = 11268619 and b = 4064861. Then you can generate a sequence of "random" numbers in the range 0 to 1e9 - 1:
private static IEnumerable<int> GenerateSequence()
{
const int max = 1000000000;
const long a = 11268619, b = 4064861;
for(int i = 0; i < max; i++)
{
int c = (int)((a * i + b) % max);
yield return c;
}
}
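As a quick sanity check of the one-to-one claim, the sketch below applies the same kind of mapping to a small range and verifies that every value from 0 to r - 1 shows up exactly once. The names PermutationSketch, Generate and IsPermutation, and the small parameters used afterwards, are illustrative assumptions rather than part of the answer above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PermutationSketch
{
    // Yields (a * i + b) mod r for i = 0 .. r - 1.
    // Assumption: a, b are non-negative and a shares no common factor with r.
    public static IEnumerable<int> Generate(int r, long a, long b)
    {
        for (int i = 0; i < r; i++)
            yield return (int)((a * i + b) % r);
    }

    // Returns true if the sequence contains r distinct values,
    // which (since every value lies in 0 .. r - 1) means each appears exactly once.
    public static bool IsPermutation(int r, long a, long b)
    {
        var seen = new HashSet<int>(Generate(r, a, b));
        return seen.Count == r;
    }
}
```

For r = 10, a = 7, b = 3 the sequence is 3, 0, 7, 4, 1, 8, 5, 2, 9, 6, so every value appears once. Choosing a = 5, which shares a factor with 10, gives 3, 8, 3, 8, ... and the check fails.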
I have an IQueryable containing more than 300 objects:
public class Detail
{
public int Id { get; set; }
public int CityId { get; set; }
public bool Chosen { get; set; }
}
IQueryable<Detail> details = ...
How can I query this and pick out 50 objects at random? I assume that I would need to convert it with .ToList(), but I am not sure how I could pick out random elements.
300 is not very many, so yes, make it a List:
IQueryable<Detail> details = ...
IList<Detail> detailList = details.ToList();
And now you can pick a random item:
var randomItem = detailList[rand.Next(detailList.Count)];
and you could repeat that 50 times. That would, however, lead to duplicates, and the process of eliminating them would become messy.
So use a standard shuffle algorithm and then pick the first 50:
Shuffle(detailList);
var selection = detailList.Take(50);
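A self-contained sketch of the shuffle-and-take idea follows. It assumes a partial Fisher-Yates shuffle that stops once the first count positions are settled, which is a variation and not necessarily the Shuffle the answer has in mind. The names SelectionSketch and PickRandom are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SelectionSketch
{
    // Returns 'count' distinct elements of 'source' in random order.
    // Works on a copy, so the caller's collection is left untouched.
    public static List<T> PickRandom<T>(IEnumerable<T> source, int count, Random rng)
    {
        var items = source.ToList();
        if (count < 0 || count > items.Count)
            throw new ArgumentOutOfRangeException(nameof(count));

        // Partial Fisher-Yates: settle positions 0 .. count - 1 only.
        for (int i = 0; i < count; i++)
        {
            int j = rng.Next(i, items.Count); // i <= j < items.Count
            T temp = items[i];
            items[i] = items[j];
            items[j] = temp;
        }
        return items.GetRange(0, count);
    }
}
```

For the question's case this would be called as SelectionSketch.PickRandom(details, 50, new Random()). Because elements are swapped rather than drawn independently, no position of the source can be picked twice.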
If you know in advance the total number of items from which to randomly pick, you can do it without converting to a list first.
The following method will do it for you:
/// <summary>Randomly selects items from a sequence.</summary>
/// <typeparam name="T">The type of the items in the sequence.</typeparam>
/// <param name="sequence">The sequence from which to randomly select items.</param>
/// <param name="count">The number of items to randomly select from the sequence.</param>
/// <param name="sequenceLength">The number of items in the sequence among which to randomly select.</param>
/// <param name="rng">The random number generator to use.</param>
/// <returns>A sequence of randomly selected items.</returns>
/// <remarks>This is an O(N) algorithm (N is the sequence length).</remarks>
public static IEnumerable<T> RandomlySelectedItems<T>(IEnumerable<T> sequence, int count, int sequenceLength, System.Random rng)
{
if (sequence == null)
{
throw new ArgumentNullException("sequence");
}
if (count < 0 || count > sequenceLength)
{
throw new ArgumentOutOfRangeException("count", count, "count must be between 0 and sequenceLength");
}
if (rng == null)
{
throw new ArgumentNullException("rng");
}
int available = sequenceLength;
int remaining = count;
var iterator = sequence.GetEnumerator();
for (int current = 0; current < sequenceLength; ++current)
{
iterator.MoveNext();
if (rng.NextDouble() < remaining/(double)available)
{
yield return iterator.Current;
--remaining;
}
--available;
}
}
(The key thing here is needing to know at the start the number of items to choose from; this does reduce the utility somewhat. But if getting the count is quick and buffering all the items would take too much memory, this is a useful solution.)
Here's another approach, which uses reservoir sampling.
This approach DOES NOT need to know the total number of items to choose from, but it does need to buffer the output. Of course, it also needs to enumerate the entire input collection.
Therefore this is really only of use when you don't know in advance the number of items to choose from (or the number of items to choose from is very large).
I would recommend just shuffling a list as per Henk's answer rather than doing it this way, but I'm including it here for the sake of interest:
// n is the number of items to randomly choose.
public static List<T> RandomlyChooseItems<T>(IEnumerable<T> items, int n, Random rng)
{
var result = new List<T>(n);
int index = 0;
foreach (var item in items)
{
if (index < n)
{
result.Add(item);
}
else
{
int r = rng.Next(0, index + 1);
if (r < n)
result[r] = item;
}
++index;
}
return result;
}
As an addendum to Henk's answer, here's a canonical implementation of the Shuffle algorithm he mentions. In this, _rng is an instance of Random:
/// <summary>Shuffles the specified array.</summary>
/// <typeparam name="T">The type of the array elements.</typeparam>
/// <param name="array">The array to shuffle.</param>
public void Shuffle<T>(IList<T> array)
{
for (int n = array.Count; n > 1;)
{
int k = _rng.Next(n);
--n;
T temp = array[n];
array[n] = array[k];
array[k] = temp;
}
}
Random rnd = new Random();
IEnumerable<Detail> details = myList.OrderBy(x => rnd.Next()).Take(50);
var l = new List<string>();
l.Add("A");
l.Add("B");
l.Add("C");
l.Add("D");
l.Add("E");
l.Add("F");
l.Add("G");
l.Add("H");
l.Add("I");
var random = new Random();
var nl = l.Select(i=> new {Value=i,Index = random.Next()});
var finalList = nl.OrderBy(i=>i.Index).Take(3);
foreach(var i in finalList)
{
Console.WriteLine(i.Value);
}
List<Detail> details = myList.OrderBy(x => Guid.NewGuid()).ToList();
After this just walk through it linearly:
var item1 = details[0];
This will avoid duplicates.
This is what ended up working for me, it ensures no duplicates are returned:
public List<T> GetRandomItems<T>(List<T> items, int count = 3)
{
// Cap count at the number of distinct items; otherwise the loop below would never end.
count = Math.Min(count, items.Distinct().Count());
var length = items.Count;
var list = new List<T>();
var rnd = new Random();
while (list.Count < count)
{
var index = rnd.Next(0, length);
if (!list.Contains(items[index]))
list.Add(items[index]);
}
return list;
}
Part 1: All I want to do is write the numbers 1, 2, 3 ... 8, 9, 10 to the console window in random order. So all the numbers need to be written to the console window, but their order must be random.
Part 2: In my actual project I plan to write all of the elements of an array to the console window in random order. I assume that if I can get the answer to Part 1, I should easily be able to do the same with an array.
/// <summary>
/// Returns all numbers, between min and max inclusive, once in a random sequence.
/// </summary>
IEnumerable<int> UniqueRandom(int minInclusive, int maxInclusive)
{
List<int> candidates = new List<int>();
for (int i = minInclusive; i <= maxInclusive; i++)
{
candidates.Add(i);
}
Random rnd = new Random();
while (candidates.Count > 0)
{
int index = rnd.Next(candidates.Count);
yield return candidates[index];
candidates.RemoveAt(index);
}
}
In your program:
Console.WriteLine("All numbers between 0 and 10 in random order:");
foreach (int i in UniqueRandom(0, 10)) {
Console.WriteLine(i);
}
Enumerable.Range(1, 10).OrderBy(i => Guid.NewGuid()) works nicely.
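The same one-liner carries over to Part 2 of the question, where the input is an array rather than a range. A small sketch, assuming hypothetical names GuidOrderSketch and InRandomOrder:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GuidOrderSketch
{
    // Returns the elements of 'items' in a random order
    // by sorting on a freshly generated GUID per element.
    public static List<T> InRandomOrder<T>(IEnumerable<T> items)
    {
        return items.OrderBy(_ => Guid.NewGuid()).ToList();
    }
}
```

Calling GuidOrderSketch.InRandomOrder(myArray) and writing each element with Console.WriteLine prints every element exactly once, since sorting only rearranges the input.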
using System;
using System.Collections;
namespace ConsoleApplication
{
class Numbers
{
public ArrayList RandomNumbers(int max)
{
// Create an ArrayList object that will hold the numbers
ArrayList lstNumbers = new ArrayList();
// The Random class will be used to generate numbers
Random rndNumber = new Random();
// Generate a random number between 1 and the Max
int number = rndNumber.Next(1, max + 1);
// Add this first random number to the list
lstNumbers.Add(number);
// Set a count of numbers to 0 to start
int count = 0;
do // Repeatedly...
{
// ... generate a random number between 1 and the Max
number = rndNumber.Next(1, max + 1);
// If the newly generated number is not yet in the list...
if (!lstNumbers.Contains(number))
{
// ... add it
lstNumbers.Add(number);
}
// Increase the count
count++;
} while (count <= 10 * max); // Do that again
// Once the list is built, return it
return lstNumbers;
}
}
// Main program
class Program
{
static int Main()
{
Numbers nbs = new Numbers();
const int Total = 10;
ArrayList lstNumbers = nbs.RandomNumbers(Total);
for (int i = 0; i < lstNumbers.Count; i++)
Console.WriteLine("{0}", lstNumbers[i].ToString());
return 0;
}
}
}
int[] ints = new int[11];
Random rand = new Random();
Random is a class built into .NET, and allows us to create random integers really, really easily. Basically all we have to do is call a method inside our rand object to get that random number, which is nice. So, inside our loop, we just set each element to the results of that method:
for (int i = 0; i < ints.Length; i++)
{
ints[i] = rand.Next(11);
}
We are essentially filling our entire array with random numbers here, all between 0 and 10. At this point all we have to do is display the contents for the user, which can be done with a foreach loop:
foreach (int i in ints)
{
Console.WriteLine(i.ToString());
}