Create ThreadLocal random generators with known seeds - C#

I'm struggling to find a way to have a single random number generator per thread, while at the same time making sure that when the program is re-run, the same numbers are produced.
What I do now is something like this:
class Program {
static void Main(string[] args) {
var seed = 10;
var data = new List<double>();
var dataGenerator = new Random(seed);
for (int i = 0; i < 10000; i++) {
data.Add(dataGenerator.NextDouble());
}
var results = new ConcurrentBag<double>();
Parallel.ForEach(data, (d) => {
var result = Calculate(d, new Random(d.GetHashCode()));
results.Add(result);
});
}
static double Calculate(double x, Random random) {
return x * random.NextDouble();
}
}
Because the random generator that creates the 'data' list is provided with a seed, and the random generators used in the calculation are seeded with the hash code of the number being processed, the results are repeatable, regardless of the number of threads and the order in which they are instantiated.
I'm wondering if it's possible to instantiate just a single random generator for each thread. The following piece of code seems to accomplish that, but because the random generators are no longer provided with a (reproducible) seed, the results are not repeatable.
class Program {
static void Main(string[] args) {
var seed = 10;
var data = new List<double>();
var dataGenerator = new Random(seed);
for (int i = 0; i < 10000; i++) {
data.Add(dataGenerator.NextDouble());
}
var results = new ConcurrentBag<double>();
var localRandom = new ThreadLocal<Random>(() => new Random());
Parallel.ForEach(data, (d) => {
var result = Calculate(d, localRandom.Value);
results.Add(result);
});
}
static double Calculate(double x, Random random) {
return x * random.NextDouble();
}
}
Can anyone think of a nice solution to this problem?

It's possible; indeed, you very nearly do it in your question, but the problem is that it isn't quite what you want.
If you seeded your thread-local Random with the same number each time, you would make the results deterministic within that thread, but dependent on how many operations that thread had already performed. What you want is a pseudo-random number that is deterministic relative to the input.
Well, you could just stick with Random(). It's not that heavy.
Alternatively, you can have your own pseudo-random algorithm. Here's a simple example based on a re-hashing algorithm (intended to distribute the bits of hashcodes even better):
private static double Calculate(double x)
{
unchecked
{
uint h = (uint)x.GetHashCode();
h += (h << 15) ^ 0xffffcd7d;
h ^= (h >> 10);
h += (h << 3);
h ^= (h >> 6);
h += (h << 2) + (h << 14);
return (h ^ (h >> 16)) / (double)uint.MaxValue * x;
}
}
This isn't a particularly good pseudo-random generator, but it's pretty fast. It also does no allocation and leads to no garbage collection.
Therein lies the trade-off of this entire approach: you can simplify the above and be even faster but less "random", or you can be more "random" for more effort. I'm sure there's code out there that is both faster and more "random" than the above, which is meant to demonstrate the approach more than anything else, but among the rival algorithms you're looking at a trade-off between the quality of the generated numbers and the performance. new Random(d.GetHashCode()).NextDouble() sits at a particular point in that trade-off; other approaches sit at other points.
Edit: The re-hashing algorithm I used is a Wang/Jenkins hash. I couldn't remember the name when I wrote it.
Edit: Having a better idea of your requirements from the comments, I'd now say that...
You want to create a PRNG class, it could use the algorithm above, that of System.Random (taking reflected code as a starting point), the 128bitXorShift algorithm you mention or whatever. The important difference is that it must have a Reseed method. For example, if you copied System.Random's approach, your reseed would look like most of the constructor's body (indeed, you'd probably refactor so that apart from maybe creating the array it uses, the constructor would call into reseed).
Then you'd create an instance per thread, and call .Reseed(d.GetHashCode()) at the point where you'd create a new Random in your existing code.
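For illustration only, here's a minimal sketch of that shape, dropped into your second example and assuming a simple xorshift64 generator. The class name ReseedablePrng, the seed-mixing constant and the reworked Calculate signature are all my own; this is not System.Random's algorithm, just one possible Reseed-capable PRNG.

// Sketch of a reseedable PRNG (xorshift64 with the usual 13/7/17 shift triple).
sealed class ReseedablePrng
{
    private ulong _state;

    public ReseedablePrng() { Reseed(Environment.TickCount); }

    public void Reseed(int seed)
    {
        // Spread the 32-bit seed over the 64-bit state; the result is never zero,
        // which matters because xorshift can never leave an all-zero state.
        _state = unchecked(((ulong)(uint)seed + 1UL) * 0x9E3779B97F4A7C15UL);
    }

    public double NextDouble()
    {
        ulong x = _state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        _state = x;
        // Map the top 53 bits onto [0, 1).
        return (x >> 11) * (1.0 / (1UL << 53));
    }
}

// One instance per thread, reseeded per item so results don't depend on thread scheduling:
var localPrng = new ThreadLocal<ReseedablePrng>(() => new ReseedablePrng());
Parallel.ForEach(data, (d) => {
    var prng = localPrng.Value;
    prng.Reseed(d.GetHashCode());
    results.Add(Calculate(d, prng)); // Calculate changed to take ReseedablePrng instead of Random
});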
Note also that this gives you another advantage, which is that if you depend upon consistent results from your PRNG (which it seems you do), then the fact that you are not promised a consistent algorithm in System.Random between framework versions (perhaps even including patches and security fixes) is a bad point for you, and this approach adds consistency.
However, you are also not promised a consistent algorithm for double.GetHashCode(). I doubt they'd change it (unlike string.GetHashCode(), which is often changed), but just in case, you could make your Reseed() take a double and do something like:
private static unsafe int GetSeedInteger(double d)
{
if(d == 0.0)
return 0;
long num = *((long*)&d);
return ((int)num) ^ (int)(num >> 32);
}
Which pretty much just copies the current double.GetHashCode(), but now you'll be consistent in the face of framework changes.
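If you'd rather avoid the unsafe block, BitConverter.DoubleToInt64Bits gives you the same bit pattern, so an equivalent safe version might look like this:

private static int GetSeedInteger(double d)
{
    if (d == 0.0)
        return 0;
    // Same bits as *((long*)&d), without unsafe code.
    long num = BitConverter.DoubleToInt64Bits(d);
    return ((int)num) ^ (int)(num >> 32);
}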
It might be worth considering breaking the set of tasks into chunks yourself, creating threads for each chunk, and then just creating this object as a local in the per-chunk method.
Pros:
Accessing ThreadLocal<T> is more expensive than accessing a local T.
If the tasks are consistent in relative time to execute, you don't need a lot of Parallel.ForEach's cleverness.
Cons:
Parallel.ForEach is really good at balancing things out. What you're doing has to be very naturally balanced, or saving a lot on a per-chunk basis, before eschewing its use gains you anything.

Related

Best way to GetHashCode() for 44-bit number stored as Int64

I have around 5,000,000 objects stored in a Dictionary<MyKey, MyValue>.
MyKey is a struct that packs each component of my key (5 different numbers) in the right-most 44 bits of an Int64 (ulong).
Since the ulong will always start with 20 zero-bits, my gut feeling is that returning the native Int64.GetHashCode() implementation is likely to collide more often than if the hash code implementation only considered the 44 bits that are actually in use (although mathematically, I wouldn't know where to begin to prove that theory).
This increases the number of calls to .Equals() and makes dictionary lookups slower.
The .NET implementation of Int64.GetHashCode() looks like this:
public override int GetHashCode()
{
return (int)this ^ (int)(this >> 32);
}
How would I best implement GetHashCode()?
I couldn't begin to suggest a "best" way to hash 44-bit numbers. But I can suggest a way to compare it to the 64-bit hash algorithm.
One way to do this is to simply check how many collisions you get for a set of numbers (as suggested by McKenzie et al. in Selecting a Hashing Algorithm). Unless you're going to test all possible values of your set, you'll need to judge whether the number of collisions you get is acceptable. This could be done in code with something like:
var rand = new Random(42);
var dict64 = new Dictionary<int, int>();
var dict44 = new Dictionary<int, int>();
for (int i = 0; i < 100000; ++i)
{
// get value between 0 and 0xfffffffffff (max 44-bit value)
var value44 = (ulong)(rand.NextDouble() * 0x0FFFFFFFFFFF);
var value64 = (ulong)(rand.NextDouble() * ulong.MaxValue);
var hash64 = value64.GetHashCode();
var hash44 = (int)value44 ^ (int)(value44 >> 32);
if (!dict64.ContainsKey(hash64))
{
dict64.Add(hash64, hash64);
}
if (!dict44.ContainsKey(hash44))
{
dict44.Add(hash44, hash44);
}
}
Trace.WriteLine(string.Format("64-bit hash: {0}, 64-bit hash with 44-bit numbers {1}", dict64.Count, dict44.Count));
In other words, consistently generate 100,000 random 64-bit values and 100,000 random 44-bit values, perform a hash on each and keep track of unique values.
In my test this generated 99998 unique values for 44-bit numbers and 99997 unique values for 64-bit numbers. So, that's one fewer collision for 44-bit numbers than for 64-bit numbers. I would expect fewer collisions with 44-bit numbers simply because you have fewer possible inputs.
I'm not going to tell you the 64-bit hash method is "best" for 44-bit; you'll have to decide if these results mean it's good for your circumstances.
Ideally you should be testing with realistic values that your application is likely to generate. Given those will all be 44-bit values, it's hard to compare that to the collisions ulong.GetHashCode() produces (i.e. you'd have identical results). If random values based on a constant seed isn't good enough, modify the code with something better.
While things might not "feel" right, science suggests there's no point in changing something without reproducible tests that prove a change is necessary.
Here's my attempt to answer this question, which I'm posting despite the fact that the answer is the opposite of what I was expecting. (Although I may have made a mistake somewhere - I almost hope so, and am open to criticism regarding my test technique.)
// Number of Dictionary hash buckets found here:
// http://stackoverflow.com/questions/24366444/how-many-hash-buckets-does-a-net-dictionary-use
const int CNumberHashBuckets = 4999559;
static void Main(string[] args)
{
Random randomNumberGenerator = new Random();
int[] dictionaryBuckets1 = new int[CNumberHashBuckets];
int[] dictionaryBuckets2 = new int[CNumberHashBuckets];
for (int i = 0; i < 5000000; i++)
{
ulong randomKey = (ulong)(randomNumberGenerator.NextDouble() * 0x0FFFFFFFFFFF);
int simpleHash = randomKey.GetHashCode();
BumpHashBucket(dictionaryBuckets1, simpleHash);
int superHash = ((int)(randomKey >> 12)).GetHashCode() ^ ((int)randomKey).GetHashCode();
BumpHashBucket(dictionaryBuckets2, superHash);
}
int collisions1 = ComputeCollisions(dictionaryBuckets1);
int collisions2 = ComputeCollisions(dictionaryBuckets2);
}
private static void BumpHashBucket(int[] dictionaryBuckets, int hashedKey)
{
int bucketIndex = (int)((uint)hashedKey % CNumberHashBuckets);
dictionaryBuckets[bucketIndex]++;
}
private static int ComputeCollisions(int[] dictionaryBuckets)
{
int i = 0;
foreach (int dictionaryBucket in dictionaryBuckets)
i += Math.Max(dictionaryBucket - 1, 0);
return i;
}
I try to simulate how the processing done by Dictionary will work. The OP says he has "around 5,000,000" objects in a Dictionary, and according to the referenced source there will be either 4999559 or 5999471 "buckets" in the Dictionary.
Then I generate 5,000,000 random 44-bit keys to simulate the OP's Dictionary entries, and for each key I hash it two different ways: the simple ulong.GetHashCode() and an alternative way that I suggested in a comment. Then I turn each hash code into a bucket index using modulo - I assume that's how it is done by Dictionary. This is used to increment the pseudo buckets as a way of computing the number of collisions.
Unfortunately (for me) the results are not as I was hoping. With 4999559 buckets the simulation typically indicates around 1.8 million collisions, with my "super hash" technique actually having a few (around 0.01%) MORE collisions. With 5999471 buckets there are typically around 1.6 million collisions, and my so-called super hash gives maybe 0.1% fewer collisions.
So my "gut feeling" was wrong, and there seems to be no justification for trying to find a better hash code technique.

How to best implement K-nearest neighbours in C# for large number of dimensions?

I'm implementing the K-nearest neighbours classification algorithm in C# for a training and testing set of about 20,000 samples each, and 25 dimensions.
There are only two classes, represented by '0' and '1' in my implementation. For now, I have the following simple implementation:
// testSamples and trainSamples consist of about 20k vectors each with 25 dimensions
// trainClasses contains 0 or 1 signifying the corresponding class for each sample in trainSamples
static int[] TestKnnCase(IList<double[]> trainSamples, IList<double[]> testSamples, IList<int> trainClasses, int K)
{
Console.WriteLine("Performing KNN with K = "+K);
var testResults = new int[testSamples.Count()];
var testNumber = testSamples.Count();
var trainNumber = trainSamples.Count();
// Declaring these here so that I don't have to 'new' them over and over again in the main loop,
// just to save some overhead
var distances = new double[trainNumber][];
for (var i = 0; i < trainNumber; i++)
{
distances[i] = new double[2]; // Will store both distance and index in here
}
// Performing KNN ...
for (var tst = 0; tst < testNumber; tst++)
{
// For every test sample, calculate distance from every training sample
Parallel.For(0, trainNumber, trn =>
{
var dist = GetDistance(testSamples[tst], trainSamples[trn]);
// Storing distance as well as index
distances[trn][0] = dist;
distances[trn][1] = trn;
});
// Sort distances and take top K (?What happens in case of multiple points at the same distance?)
var votingDistances = distances.AsParallel().OrderBy(t => t[0]).Take(K);
// Do a 'majority vote' to classify test sample
var yea = 0.0;
var nay = 0.0;
foreach (var voter in votingDistances)
{
if (trainClasses[(int)voter[1]] == 1)
yea++;
else
nay++;
}
if (yea > nay)
testResults[tst] = 1;
else
testResults[tst] = 0;
}
return testResults;
}
// Calculates and returns square of Euclidean distance between two vectors
static double GetDistance(IList<double> sample1, IList<double> sample2)
{
var distance = 0.0;
// assume sample1 and sample2 are valid i.e. same length
for (var i = 0; i < sample1.Count; i++)
{
var temp = sample1[i] - sample2[i];
distance += temp * temp;
}
return distance;
}
This takes quite a bit of time to execute. On my system it takes about 80 seconds to complete. How can I optimize this, while ensuring that it would also scale to a larger number of data samples? As you can see, I've tried using PLINQ and parallel for loops, which did help (without these, it was taking about 120 seconds). What else can I do?
I've read about KD-trees being efficient for KNN in general, but every source I read stated that they're not efficient for higher dimensions.
I also found this stackoverflow discussion about this, but it seems like this is 3 years old, and I was hoping that someone would know about better solutions to this problem by now.
I've looked at machine learning libraries in C#, but for various reasons I don't want to call R or C code from my C# program, and some other libraries I saw were no more efficient than the code I've written. Now I'm just trying to figure out how I could write the most optimized code for this myself.
Edited to add - I cannot reduce the number of dimensions using PCA or something. For this particular model, 25 dimensions are required.
Whenever you are attempting to improve the performance of code, the first step is to analyze the current performance to see exactly where it is spending its time. A good profiler is crucial for this. In my previous job I was able to use the dotTrace profiler to good effect; Visual Studio also has a built-in profiler. A good profiler will tell you exactly where your code is spending time, method-by-method or even line-by-line.
That being said, a few things come to mind in reading your implementation:
You are parallelizing some inner loops. Could you parallelize the outer loop instead? There is a small but nonzero cost associated with a delegate call (see here or here) which may be hitting you in the "Parallel.For" callback.
Similarly there is a small performance penalty for indexing through an array using its IList interface. You might consider declaring the array arguments to "GetDistance()" explicitly as double[] rather than IList<double>.
How large is K as compared to the size of the training array? You are completely sorting the "distances" array and taking the top K, but if K is much smaller than the array size it might make sense to use a partial sort / selection algorithm, for instance by using a SortedSet and replacing the largest element when the set size exceeds K.
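To make the last point concrete, here's a rough sketch of a bounded selection. The helper name NearestK and its signature are my own, and it assumes you parallelise the outer tst loop rather than the inner distance loop, per the first point. Ties on distance are broken by index so the set never silently drops an entry.

// Keep only the K nearest neighbours while scanning, instead of sorting all distances.
static IEnumerable<int> NearestK(double[] testSample, IList<double[]> trainSamples, int K)
{
    var best = new SortedSet<(double Dist, int Index)>();
    for (var trn = 0; trn < trainSamples.Count; trn++)
    {
        var dist = GetDistance(testSample, trainSamples[trn]);
        if (best.Count < K)
        {
            best.Add((dist, trn));
        }
        else if (dist < best.Max.Dist)
        {
            best.Remove(best.Max);   // evict the current worst of the K
            best.Add((dist, trn));
        }
    }
    return best.Select(b => b.Index); // indices into trainSamples; vote on trainClasses with these
}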

Performance reading a large dataset from multiple parallel threads

I’m working on a Genetic Machine Learning project developed in .Net (as opposed to Matlab – My Norm). I’m no pro .net coder so excuse any noobish implementations.
The project itself is huge so I won’t bore you with the full details, but basically a population of Artificial Neural Networks (like decision trees) is evaluated on a problem domain that in this case uses a stream of sensory inputs. The top performers in the population are allowed to breed and produce offspring (that inherit tendencies from both parents) and the poor performers are killed off or bred out of the population. Evolution continues until an acceptable solution is found. Once found, the final evolved ‘Network’ is extracted from the lab and placed in a light-weight real-world application. The technique can be used to develop very complex control solutions that would be almost impossible or too time-consuming to program normally, like automated car driving, mechanical stability control, datacentre load balancing, etc.
Anyway, the project has been a huge success so far and is producing amazing results, but the only problem is the very slow performance once I move to larger datasets. I’m hoping it’s just my code, so would really appreciate some expert help.
In this project, convergence to a solution close to an ideal can often take around 7 days of processing! Just making a little tweak to a parameter and waiting for results is just too painful.
Basically, multiple parallel threads need to read sequential sections of a very large dataset (the data does not change once loaded). The dataset consists of around 300 to 1000 doubles per row and anything over 500k rows. As the dataset can exceed the .NET object limit of 2GB, it can’t be stored in a normal 2D array – the simplest way round this was to use a generic List of single arrays.
The parallel scalability seems to be a big limiting factor, as running the code on a beast of a server with 32 Xeon cores that normally eats big datasets for breakfast does not yield much of a performance gain over a Core i3 desktop!
Performance gains quickly dwindle away as the number of cores increases.
From profiling the code (with my limited knowledge) I get the impression that there is a huge amount of contention reading the dataset from multiple threads.
I’ve tried experimenting with different dataset implementations using Jagged arrays and various concurrent collections but to no avail.
I’ve knocked up a quick and dirty bit of benchmarking code that is similar to the core implementation of the original and still exhibits the same read performance and parallel scalability issues.
Any thoughts or suggestions would be much appreciated or confirmation that this is the best I’m going to get.
Many thanks
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;
//Benchmark script to time how long it takes to read dataset per iteration
namespace Benchmark_Simple
{
class Program
{
public static TrainingDataSet _DataSet;
public static int Features = 100; //Real test will require 300+
public static int Rows = 200000; //Real test will require 500K+
public static int _PopulationSize = 500; //Real test will require 1000+
public static int _Iterations = 10;
public static List<NeuralNetwork> _NeuralNetworkPopulation = new List<NeuralNetwork>();
static void Main()
{
Stopwatch _Stopwatch = new Stopwatch();
//Create Dataset
Console.WriteLine("Creating Training DataSet");
_DataSet = new TrainingDataSet(Features, Rows);
Console.WriteLine("Finished Creating Training DataSet");
//Create Neural Network Population
for (int i = 0; i <= _PopulationSize - 1; i++)
{
_NeuralNetworkPopulation.Add(new NeuralNetwork());
}
//Main Loop
for (int i = 0; i <= _Iterations - 1; i++)
{
_Stopwatch.Restart();
Parallel.ForEach(_NeuralNetworkPopulation, _Network => { EvaluateNetwork(_Network); });
//######## Removed for simplicity ##########
//Run Evolutionary Genetic Algorithm on population - I.E. Breed the strong, kill of the weak
//##########################################
//Repeat until acceptable solution is found
Console.WriteLine("Iteration time: {0}", _Stopwatch.ElapsedMilliseconds / 1000);
_Stopwatch.Stop();
}
Console.ReadLine();
}
private static void EvaluateNetwork(NeuralNetwork Network)
{
//Evaluate network on 10% of the Training Data at a random starting point
double Score = 0;
Random Rand = new Random();
int Count = (Rows / 100) * 10;
int RandonStart = Rand.Next(0, Rows - Count);
//The data must be read sequentially
for (int i = RandonStart; i <= RandonStart + Count; i++)
{
double[] NetworkInputArray = _DataSet.GetDataRow(i);
//####### Dummy Evaluation - just give it somthing to do for the sake of it
double[] Temp = new double[NetworkInputArray.Length + 1];
for (int j = 0; j <= NetworkInputArray.Length - 1; j++)
{
Temp[j] = Math.Log(NetworkInputArray[j] * Rand.NextDouble());
}
Score += Rand.NextDouble();
//##################
}
Network.Score = Score;
}
public class TrainingDataSet
{
//Simple demo class of fake data for benchmarking
private List<double[]> DataList = new List<double[]>();
public TrainingDataSet(int Features, int Rows)
{
Random Rand = new Random();
for (int i = 1; i <= Rows; i++)
{
double[] NewRow = new double[Features];
for (int j = 0; j <= Features - 1; j++)
{
NewRow[j] = Rand.NextDouble();
}
DataList.Add(NewRow);
}
}
public double[] GetDataRow(int Index)
{
return DataList[Index];
}
}
public class NeuralNetwork
{
//Simple Class to represent a dummy Neural Network -
private double _Score;
public NeuralNetwork()
{
}
public double Score
{
get { return _Score; }
set { _Score = value; }
}
}
}
}
The first thing is that the only way to answer any performance question is by profiling the application. I'm using the VS 2012 built-in profiler - there are others: https://stackoverflow.com/a/100490/19624
From an initial read through the code, i.e. a static analysis, the only thing that jumped out at me was the continual reallocation of Temp inside the loop; this is not efficient and, if possible, needs moving outside of the loop.
With a profiler you can see what's happening:
I profiled first using the code you posted, (top marks to you for posting a full compilable example of the problem, if you hadn't I wouldn't be answering this now).
This shows me that the bulk of the time is spent inside the loop, so I moved the allocation up into the Parallel.ForEach lambda:
Parallel.ForEach(_NeuralNetworkPopulation, _Network =>
{
double[] Temp = new double[Features + 1];
EvaluateNetwork(_Network, Temp);
});
So what I can see from the above is that there is 4.4% wastage on the reallocation; but the probably unsurprising thing is that it is the inner loop that is taking 87.6%.
This takes me to my first rule of optimisation, which is to first review your algorithm rather than optimising the code. A poor implementation of a good algorithm is usually faster than a highly optimised poor algorithm.
Removing the repeated allocation of Temp changes the picture slightly.
It's also worth tuning a bit by specifying the degree of parallelism; I've found that Parallel.ForEach is good enough for what I use it for, but again you may get better results from manually partitioning the work up into queues.
Parallel.ForEach(_NeuralNetworkPopulation,
new ParallelOptions { MaxDegreeOfParallelism = 32 },
_Network =>
{
double[] Temp = new double[Features + 1];
EvaluateNetwork(_Network, Temp);
});
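If you do want to try manual partitioning, range partitioning via Partitioner.Create is one way to sketch it. This assumes the same reworked EvaluateNetwork signature as above and needs using System.Collections.Concurrent; the chunk boundaries here are left to the TPL's defaults.

// Each worker processes a contiguous chunk of the population and reuses one buffer per chunk.
var ranges = Partitioner.Create(0, _NeuralNetworkPopulation.Count);
Parallel.ForEach(ranges, range =>
{
    double[] Temp = new double[Features + 1];
    for (int i = range.Item1; i < range.Item2; i++)
    {
        EvaluateNetwork(_NeuralNetworkPopulation[i], Temp);
    }
});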
Whilst running I'm getting what I'd expect in terms of CPU usage, although my machine was also running another lengthy process which was taking up the base level of CPU (the peak in the profiler's CPU chart was while profiling this program).
So, to summarize:
1. Review the most frequently executed part and come up with a new algorithm if possible.
2. Profile on the target machine.
3. Only when you're sure about (1) above is it then worth looking at optimising the code, considering the following:
a) Code optimisations
b) Memory tuning / partitioning of data to keep as much as possible in cache
c) Improvements to threading usage

C# Random number hits far more often than expected

We have this random number generator:
Random rnd = new Random();
bool PiratePrincess = rnd.Next(1, 5000) == 1;
This is called on every page view. There should be a 1/5000 chance the variable is True. However, in ~15,000 page views this has been True about 20 times!
Can someone explain why this is the case, and how to prevent this so it is roughly 1/5000 times? It's not important at all for this to be truly random.
Edit: Should this do the trick?
Random rnd = new Random();
bool PiratePrincess = (rnd.Next() + ThisUser.UserID + ThisUser.LastVisit.Ticks + DateTime.Now.Ticks) % 5000 == 1;
How quickly are these pageviews coming in? new Random will initialize based on the current time, so many overlapping requests will get the same random seed. Randomize the seed based on the remote IP address hashed with the current time for more uniqueness.
That said, it is possible to flip a coin 20 times and get heads every single time. It's a legitimate random outcome.
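(For scale: with p = 1/5000, roughly 15,000 views should produce about 15,000 / 5,000 = 3 hits on average, so 20 hits is a long way out in the tail, and the shared-seed effect above is the far more likely explanation.)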
Edit: this will do it:
var r = new Random(
Convert.ToInt32(
(ThisUser.UserID ^ ThisUser.LastVisit.Ticks ^ DateTime.Now.Ticks) & 0x7FFFFFFF) // 31-bit mask so Convert.ToInt32 can't overflow
);
var isPiratePrincess = (r.Next (5000) == 42);
You only need instantiate a single instance of Random, thus:
public class Widget
{
private static Random rngInstance = new Random();
public bool IsPiratePrincess()
{
bool isTrue = rngInstance.Next(1, 5000) == 1 ;
return isTrue ;
}
}
Pseudo-random number generators implement a series. Each time you instantiate a new instance, it seeds itself based on (among other things) the current time-of-day and it starts a new series.
If you instantiate a new one on each invocation, and the instantiations are frequent enough, you'll likely see similarities in the stream of pseudo-random values generated, since the initial seed values are likely to be close to each other.
Edited to note: Since System.Random is not thread-safe, you probably want to wrap it in such a way as to make it thread-safe. This class (from Getting Random Numbers in Thread-Safe Way) will do the trick. It uses a per-thread static field and instantiates an RNG instance for each thread:
public static class RandomGen2
{
private static Random _global = new Random();
[ThreadStatic]
private static Random _local;
public static int Next()
{
Random inst = _local;
if (inst == null)
{
int seed;
lock (_global) seed = _global.Next();
_local = inst = new Random(seed);
}
return inst.Next();
}
}
I'm a little dubious about seeding each RNG instance with the output of another RNG. That strikes me as liable to bias the results. I think it might be better to use the default constructor.
Another approach would be to latch each access to a single RNG instance:
public static class RandomGen1
{
private static Random _inst = new Random();
public static int Next()
{
lock (_inst) return _inst.Next();
}
}
But that has some performance issues (bottleneck, plus overhead of a lock on each call).
You're creating a new random number generator with a new seed every time you're generating a number. The distribution of the values you get will have more to do with the seed than it does with the distribution characteristics of the algorithm being used.
I'm not sure what the best way is to achieve a more even distribution. If your traffic volume is low enough, you could use a hit counter on the page and check for divisibility by 5000, but that kind of approach would quickly run into contention problems if you tried to scale it.
If the problem is really the seed (as many people including myself suspect), you can either persist an instance of Random() (which will cause concurrency issues under high volume, but is fine for ~15,000 hits per day), or you can introduce more entropy to the seed.
If this were for an application where you do not want determined people to break the pseudo-random characteristics, you should look into software or hardware that generates a good seed on your server (poker websites often use a hardware entropy generator).
If you just want a good distribution and don't expect people to try to hack your solution, consider just blending various sources of entropy (the current timestamp, hash of the user's user agent string, the IPv4 address or hash of the IPv6 address, etc.).
UPDATE: You mention you have the user ID too. Hash that for entropy along with one or more of the items mentioned above, especially the ticks from the current timestamp.
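A sketch of that kind of blending might look like the following. ThisUser comes from your question (UserID is assumed to be numeric); Request.UserAgent is a stand-in for however your framework exposes the user-agent string.

// Blend several entropy sources into a single 32-bit seed.
int seed;
unchecked
{
    long mix = ThisUser.UserID
             ^ ThisUser.LastVisit.Ticks
             ^ DateTime.UtcNow.Ticks
             ^ (Request.UserAgent ?? string.Empty).GetHashCode();
    seed = (int)mix ^ (int)(mix >> 32);
}
var rnd = new Random(seed);
bool piratePrincess = rnd.Next(5000) == 0;  // exactly 1-in-5000 per draw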
You can get exactly 1/5000 if you avoid Random altogether.
var vals = Enumerable.Repeat(false, 4999).ToList();
vals.Add(true);
// this or an in-memory shuffle if that is a concern
var piratePrincessPool = vals.OrderBy(v => Guid.NewGuid()).ToList();
// remove one value from the pool per page view & reset (reshuffle) when empty.
The problem with using Random here is that you could be in a state where 20 in 15K were legitimately hit. In the next several thousand iterations, you'll hit a cold streak that regresses you towards the expected mean.
The random number generator uses the current time as its seed. So, if two requests come in at the same time, they could theoretically get the same seed. If you want your numbers to differ from each other, you might want to use the same random number generator for all instances of the page.

More than one sequence of random numbers - C#, LINQ

I was using this code to generate a random sequence of numbers:
var sequence = Enumerable.Range(0, 9).OrderBy(n => n * n * (new Random()).Next());
Everything was OK until I needed more than one sequence. In this code I call the routine 10 times, and the results are my problem: all the sequences are equal.
int i = 0;
while (i<10)
{
Console.Write("{0}:",i);
var sequence = Enumerable.Range(0, 9).OrderBy(n => n * n * (new Random()).Next());
sequence.ToList().ForEach(x=> Console.Write(x));
i++;
Console.WriteLine();
}
Can someone give me a hint of how to actually generate different sequences? Hopefully using LINQ.
The problem is that you're creating a new instance of Random on each iteration. Each instance will be taking its initial seed from the current time, which changes relatively infrequently compared to how often your delegate is getting executed. You could create a single instance of Random and use it repeatedly. See my article on randomness for more details.
However, I would also suggest that you use a Fisher-Yates shuffle instead of OrderBy in order to shuffle the values (there are plenty of examples on Stack Overflow, such as this one)... although it looks like you're trying to bias the randomness somewhat. If you could give more details as to exactly what you're trying to do, we may be able to help more.
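For reference, a minimal Fisher-Yates shuffle might look like this. It's only a sketch; note it reuses a single Random instance, per the advice above, and isn't thread-safe as written.

private static readonly Random Rng = new Random();

static void Shuffle<T>(IList<T> list)
{
    // Walk backwards, swapping each element with a randomly chosen earlier one (or itself).
    for (int i = list.Count - 1; i > 0; i--)
    {
        int j = Rng.Next(i + 1);  // 0 <= j <= i
        T tmp = list[i];
        list[i] = list[j];
        list[j] = tmp;
    }
}

// Usage:
// var sequence = Enumerable.Range(0, 9).ToList();
// Shuffle(sequence);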
You are creating 10 instances of Random in quick succession, and pulling the first pseudo-random number from each of them. I'm not surprised they're all the same.
Try this:
Random r = new Random();
var sequence = Enumerable.Range(0, 9).OrderBy(n => n * n * r.Next());
In my code I use this static method I wrote many years ago, and it still shows good randomization:
using System.Security.Cryptography;
...
public static int GenerateRandomInt(int from, int to)
{
// Fill 4 bytes with cryptographically strong random data.
byte[] salt = new byte[4];
RandomNumberGenerator rng = RandomNumberGenerator.Create();
rng.GetBytes(salt);
// Sum the bytes (a value between 0 and 1020)...
int num = 0;
for (int i = 0; i < 4; i++)
{
num += salt[i];
}
// ...and map it onto the inclusive range [from, to].
return num % (to + 1 - from) + from;
}
Can't explain this example in detail, I need to bring myself back in time to remember, but I don't care ;)
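On newer frameworks (.NET Core 3.0 / .NET 5 and later, if that's an option for you), RandomNumberGenerator.GetInt32 does the range mapping for you, without the byte-summing and its modulo bias:

using System.Security.Cryptography;

// fromInclusive / toExclusive, so this matches the inclusive [from, to] range above.
int value = RandomNumberGenerator.GetInt32(from, to + 1);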
