More than one sequence of random numbers (C#, LINQ)

I was using this code to generate a random sequence of numbers:
var sequence = Enumerable.Range(0, 9).OrderBy(n => n * n * (new Random()).Next());
Everything was OK until I needed more than one sequence. In the code below I call the routine 10 times, and the results are my problem: all the sequences are equal.
int i = 0;
while (i < 10)
{
    Console.Write("{0}:", i);
    var sequence = Enumerable.Range(0, 9).OrderBy(n => n * n * (new Random()).Next());
    sequence.ToList().ForEach(x => Console.Write(x));
    i++;
    Console.WriteLine();
}
Can someone give me a hint on how to actually generate different sequences, hopefully using LINQ?

The problem is that you're creating a new instance of Random on each iteration. Each instance will be taking its initial seed from the current time, which changes relatively infrequently compared to how often your delegate is getting executed. You could create a single instance of Random and use it repeatedly. See my article on randomness for more details.
However, I would also suggest that you use a Fisher-Yates shuffle instead of OrderBy in order to shuffle the values (there are plenty of examples on Stack Overflow, such as this one)... although it looks like you're trying to bias the randomness somewhat. If you could give more details as to exactly what you're trying to do, we may be able to help more.
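For reference, here is a minimal Fisher-Yates sketch along the lines suggested above (a sketch, not the linked Stack Overflow answer verbatim); note the single shared Random instance:
using System;
using System.Collections.Generic;

static class ShuffleExtensions
{
    private static readonly Random Rng = new Random();

    // Fisher-Yates: walk backwards, swapping each slot with a randomly
    // chosen earlier-or-equal slot. O(n) and unbiased.
    public static void Shuffle<T>(this IList<T> list)
    {
        for (int i = list.Count - 1; i > 0; i--)
        {
            int j = Rng.Next(i + 1); // 0 <= j <= i
            T tmp = list[i];
            list[i] = list[j];
            list[j] = tmp;
        }
    }
}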

You are creating 10 instances of Random in quick succession, and pulling the first pseudo-random number from each of them. I'm not surprised they're all the same.
Try this:
Random r = new Random();
var sequence = Enumerable.Range(0, 9).OrderBy(n => n * n * r.Next());
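Applied to the question's loop, that fix looks like this (one Random created outside the loop and reused):
Random r = new Random();
for (int i = 0; i < 10; i++)
{
    Console.Write("{0}:", i);
    var sequence = Enumerable.Range(0, 9).OrderBy(n => n * n * r.Next());
    sequence.ToList().ForEach(x => Console.Write(x));
    Console.WriteLine();
}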

In my code I use this static method I wrote many years ago, and it still shows good randomization:
using System.Security.Cryptography;
...
public static int GenerateRandomInt(int from, int to)
{
    // Fill four bytes with cryptographically strong random data
    byte[] salt = new byte[4];
    RandomNumberGenerator rng = RandomNumberGenerator.Create();
    rng.GetBytes(salt);

    // Sum the bytes (giving 0..1020) and map the sum into [from, to]
    int num = 0;
    for (int i = 0; i < 4; i++)
    {
        num += salt[i];
    }
    return num % (to + 1 - from) + from;
}
Can't explain this example in detail, I need to bring myself back in time to remember, but I don't care ;)
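For what it's worth: summing four random bytes yields a roughly bell-shaped value in 0..1020 rather than a uniform one, so the modulo step above is biased. On .NET Core 3.0 and later, the framework has a uniform, cryptographically strong helper built in:
using System.Security.Cryptography;

// Uniform over [from, to]; GetInt32's upper bound is exclusive.
int value = RandomNumberGenerator.GetInt32(from, to + 1);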

Related

Median Maintenance Algorithm - Same implementation yields different results depending on Int32 or Int64

I found something interesting while doing a homework question.
The homework question asks us to code the Median Maintenance algorithm.
The formal statement is as follows:
The goal of this problem is to implement the "Median Maintenance" algorithm (covered in the Week 5 lecture on heap applications). The text file contains a list of the integers from 1 to 10000 in unsorted order; you should treat this as a stream of numbers, arriving one by one. Letting xi denote the ith number of the file, the kth median mk is defined as the median of the numbers x1,…,xk. (So, if k is odd, then mk is the ((k+1)/2)th smallest number among x1,…,xk; if k is even, then mk is the (k/2)th smallest number among x1,…,xk.)
In order to get O(n) running time, this should obviously be implemented using heaps. Anyway, I coded it using brute force (the deadline was too soon and I needed an answer right away), which is O(n²), with the following steps:
Read data in
Sort array
Find Median
Add it to the running sum
I ran the algorithm through several test cases (with known answers) and got the correct results; however, when I ran the same algorithm on a larger data set I got the wrong answer. I was doing all the operations using Int64 to represent the data.
Then I tried switching to Int32 and magically got the correct answer, which makes no sense to me.
The code is below, and it is also found here (the data is in the repo). The algorithm starts to give erroneous results after index 3810:
private static void Main(string[] args)
{
    MedianMaintenance("Question2.txt");
}

private static void MedianMaintenance(string filename)
{
    var txtData = File.ReadLines(filename).ToArray();
    var inputData32 = new List<Int32>();
    var medians32 = new List<Int32>();
    var sums32 = new List<Int32>();
    var inputData64 = new List<Int64>();
    var medians64 = new List<Int64>();
    var sums64 = new List<Int64>();
    var sum = 0;
    var sum64 = 0f;
    var i = 0;
    foreach (var s in txtData)
    {
        //Add to sorted list
        var intToAdd = Convert.ToInt32(s);
        inputData32.Add(intToAdd);
        inputData64.Add(Convert.ToInt64(s));
        //Compute sum
        var count = inputData32.Count;
        inputData32.Sort();
        inputData64.Sort();
        var index = 0;
        if (count % 2 == 0)
        {
            //Even number of elements
            index = count / 2 - 1;
        }
        else
        {
            //Number is odd
            index = ((count + 1) / 2) - 1;
        }
        var val32 = Convert.ToInt32(inputData32[index]);
        var val64 = Convert.ToInt64(inputData64[index]);
        if (i > 3810)
        {
            var t = sum;
            var t1 = sum + val32;
        }
        medians32.Add(val32);
        medians64.Add(val64);
        //Debug.WriteLine("Median is {0}", val);
        sum += val32;
        sums32.Add(Convert.ToInt32(sum));
        sum64 += val64;
        sums64.Add(Convert.ToInt64(sum64));
        i++;
    }
    Console.WriteLine("Median Maintenance result is {0}", (sum).ToString("N"));
    Console.WriteLine("Median Maintenance result is {0}", (medians32.Sum()).ToString("N"));
    Console.WriteLine("Median Maintenance result is {0} - Int64", (sum64).ToString("N"));
    Console.WriteLine("Median Maintenance result is {0} - Int64", (medians64.Sum()).ToString("N"));
}
What's more interesting is that the running sum (in the sum64 variable) yields a different result than summing all items in the list with LINQ's Sum() function.
The results (the third one is the one that's wrong) and the computer details were posted as screenshots, not reproduced here.
I'd appreciate it if someone could give me some insight into why this is happening.
Thanks.
0f initializes a 32-bit float variable; you meant 0d or 0.0 to get a 64-bit floating-point value.
As for linq, you'll probably get better results if you use strongly typed lists.
new List<int>()
new List<long>()
The first thing I notice is what the commenter pointed out: var sum64 = 0f initializes sum64 as a float. Since the median of a collection of Int64s will itself be an Int64 (the specified rules don't take the mean of the two midpoint values when the collection has even cardinality), you should instead declare this variable explicitly as a long. In fact, I would go ahead and replace all usages of var in this code example; the convenience of var is being lost here, since it is causing type-related bugs.
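To see why the float declaration matters, here is a minimal sketch (separate from the question's code): a float has a 24-bit mantissa, so integer sums above 2^24 = 16,777,216 are no longer represented exactly, and summing ~10,000 medians drawn from 1..10000 pushes the running sum well past that:
using System;

class FloatSumDemo
{
    static void Main()
    {
        float sumF = 0f; // what var sum64 = 0f; actually declares
        long sumL = 0L;  // what was intended
        for (int i = 0; i < 10000; i++)
        {
            sumF += 4999; // roughly the size of a median of the 1..10000 stream
            sumL += 4999;
        }
        Console.WriteLine(sumF); // drifts once the sum passes 2^24
        Console.WriteLine(sumL); // exact: 49990000
    }
}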

Why .NET group by is (much) slower when the number of buckets grows

Given this simple piece of code and a 10-million-element array of random numbers:
static int Main(string[] args)
{
    int size = 10000000;
    int num = 10; // increase num to reduce the number of buckets
    int numOfBuckets = size / num;
    int[] ar = new int[size];
    Random r = new Random(); // initialize with random numbers
    for (int i = 0; i < size; i++)
        ar[i] = r.Next(size);

    var s = new Stopwatch();
    s.Start();
    var group = ar.GroupBy(i => i / num);
    var l = group.Count();
    s.Stop();
    Console.WriteLine(s.ElapsedMilliseconds);
    Console.ReadLine();
    return 0;
}
I did some performance tests on grouping: when the number of buckets is 10k the estimated execution time is 0.7s, for 100k buckets it is 2s, and for 1m buckets it is 7.5s.
I wonder why that is. I imagine that if GroupBy is implemented using a hash table, there might be a problem with collisions. For example, initially the hash table is prepared to work for, say, 1000 groups, and then when the number of groups grows it needs to increase its size and do rehashing. If this were the case, I could write my own grouping where I initialize the hash table with the expected number of buckets. I did that, but it was only slightly faster.
So my question is: why does the number of buckets influence GroupBy's performance that much?
EDIT:
Running in release mode changes the results to 0.55s, 1.6s, and 6.5s respectively.
I also changed group.ToArray to the piece of code below, just to force execution of the grouping:
foreach (var g in group)
    array[g.Key] = 1;
where array is initialized before the timer with the appropriate size; the results stayed almost the same.
EDIT2:
You can see the working code from mellamokb here: pastebin.com/tJUYUhGL
I'm pretty certain this is showing the effects of memory locality (various levels of caching) and also object allocation.
To verify this, I took three steps:
Improve the benchmarking to avoid unnecessary parts and to garbage collect between tests
Remove the LINQ part by populating a Dictionary (which is effectively what GroupBy does behind the scenes)
Remove even Dictionary<,> and show the same trend for plain arrays.
In order to show this for arrays, I needed to increase the input size, but it does show the same kind of growth.
Here's a short but complete program which can be used to test both the dictionary and the array side; just change which method is assigned to test in the middle:
using System;
using System.Collections.Generic;
using System.Diagnostics;

class Test
{
    const int Size = 100000000;
    const int Iterations = 3;

    static void Main()
    {
        int[] input = new int[Size];
        // Use the same seed for repeatability
        var rng = new Random(0);
        for (int i = 0; i < Size; i++)
        {
            input[i] = rng.Next(Size);
        }

        // Switch to PopulateArray to change which method is tested
        Func<int[], int, TimeSpan> test = PopulateDictionary;

        for (int buckets = 10; buckets <= Size; buckets *= 10)
        {
            TimeSpan total = TimeSpan.Zero;
            for (int i = 0; i < Iterations; i++)
            {
                total += test(input, buckets);
                GC.Collect();
                GC.WaitForPendingFinalizers();
            }
            Console.WriteLine("{0,9}: {1,7}ms", buckets, (long) total.TotalMilliseconds);
        }
    }

    static TimeSpan PopulateDictionary(int[] input, int buckets)
    {
        int divisor = input.Length / buckets;
        var dictionary = new Dictionary<int, int>(buckets);
        var stopwatch = Stopwatch.StartNew();
        foreach (var item in input)
        {
            int key = item / divisor;
            int count;
            dictionary.TryGetValue(key, out count);
            count++;
            dictionary[key] = count;
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    static TimeSpan PopulateArray(int[] input, int buckets)
    {
        int[] output = new int[buckets];
        int divisor = input.Length / buckets;
        var stopwatch = Stopwatch.StartNew();
        foreach (var item in input)
        {
            int key = item / divisor;
            output[key]++;
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
Results on my machine:
PopulateDictionary:
10: 10500ms
100: 10556ms
1000: 10557ms
10000: 11303ms
100000: 15262ms
1000000: 54037ms
10000000: 64236ms // Why is this slower? See later.
100000000: 56753ms
PopulateArray:
10: 1298ms
100: 1287ms
1000: 1290ms
10000: 1286ms
100000: 1357ms
1000000: 2717ms
10000000: 5940ms
100000000: 7870ms
An earlier version of PopulateDictionary used an Int32Holder class, and created one for each bucket (when the lookup in the dictionary failed). This was faster when there was a small number of buckets (presumably because we were only going through the dictionary lookup path once per iteration instead of twice) but got significantly slower, and ended up running out of memory. This would contribute to fragmented memory access as well, of course. Note that PopulateDictionary specifies the capacity to start with, to avoid effects of data copying within the test.
The aim of using the PopulateArray method is to remove as much framework code as possible, leaving less to the imagination. I haven't yet tried using an array of a custom struct (with various different struct sizes) but that may be something you'd like to try too.
EDIT: I can reproduce the oddity of the slower result for 10000000 than 100000000 at will, regardless of test ordering. I don't understand why yet. It may well be specific to the exact processor and cache I'm using...
--EDIT--
The reason why 10000000 is slower than the 100000000 results has to do with the way hashing works. A few more tests explain this.
First off, let's look at the operations. There's Dictionary.FindEntry, which is used by the [] indexer and by Dictionary.TryGetValue, and there's Dictionary.Insert, which is used by the [] indexer and by Dictionary.Add. If we just do a FindEntry, the timings go up as we would expect:
static TimeSpan PopulateDictionary1(int[] input, int buckets)
{
    int divisor = input.Length / buckets;
    var dictionary = new Dictionary<int, int>(buckets);
    var stopwatch = Stopwatch.StartNew();
    foreach (var item in input)
    {
        int key = item / divisor;
        int count;
        dictionary.TryGetValue(key, out count);
    }
    stopwatch.Stop();
    return stopwatch.Elapsed;
}
This implementation doesn't have to deal with hash collisions (because there are none), which makes its behavior what we expect. Once we start dealing with collisions, the timings start to suffer. If we have as many buckets as elements, there are obviously fewer collisions... To be exact, we can figure out exactly how many collisions there are by doing:
static TimeSpan PopulateDictionary(int[] input, int buckets)
{
    int divisor = input.Length / buckets;
    int c1, c2;
    c1 = c2 = 0;
    var dictionary = new Dictionary<int, int>(buckets);
    var stopwatch = Stopwatch.StartNew();
    foreach (var item in input)
    {
        int key = item / divisor;
        int count;
        if (!dictionary.TryGetValue(key, out count))
        {
            dictionary.Add(key, 1);
            ++c1;
        }
        else
        {
            count++;
            dictionary[key] = count;
            ++c2;
        }
    }
    stopwatch.Stop();
    Console.WriteLine("{0}:{1}", c1, c2);
    return stopwatch.Elapsed;
}
The result is something like this:
10:99999990
10: 4683ms
100:99999900
100: 4946ms
1000:99999000
1000: 4732ms
10000:99990000
10000: 4964ms
100000:99900000
100000: 7033ms
1000000:99000000
1000000: 22038ms
9999538:90000462 <<-
10000000: 26104ms
63196841:36803159 <<-
100000000: 25045ms
Note the value of '36803159'. This answers the question of why the last result is faster than the one before it: it simply has to do fewer operations; and since caching fails anyway, that factor no longer makes a difference.
"when the number of buckets is 10k the estimated execution time is 0.7s, for 100k buckets it is 2s, for 1m buckets it is 7.5s"
This is an important pattern to recognize when you profile code. It is one of the standard size vs execution time relationships in software algorithms. Just from seeing the behavior, you can tell a lot about the way the algorithm was implemented. And the other way around of course, from the algorithm you can predict the expected execution time. A relationship that's annotated in the Big Oh notation.
The speediest code you can get is amortized O(1): execution time barely increases when you double the size of the problem. The Dictionary<> class behaves that way, as Jon demonstrated. The increase in time as the problem set gets large is the "amortized" part, a side effect of the Dictionary having to perform linear O(n) searches in buckets that keep getting bigger.
A very common pattern is O(n). That tells you that there is a single for() loop in the algorithm that iterates over the collection. O(n^2) tells you there are two nested for() loops. O(n^3) has three, etcetera.
What you got is the one in between, O(log n). It is the standard complexity of a divide-and-conquer algorithm. In other words, each pass splits the problem in two, continuing with the smaller set. Very common: you see it in sorting algorithms, and binary search is the one you find in your textbook. Note how log₂(10) ≈ 3.3, very close to the increment you see in your test. Performance starts to tank a bit for very large sets due to the poor locality of reference, a CPU cache problem that's always associated with O(log n) algorithms.
The one thing that Jon's answer demonstrates is that his guess cannot be correct: GroupBy() certainly does not use a Dictionary<>. And it is not possible by design; a Dictionary<> cannot provide an ordered collection, while GroupBy() must be ordered. It says so in the MSDN Library:
The IGrouping objects are yielded in an order based on the order of the elements in source that produced the first key of each IGrouping. Elements in a grouping are yielded in the order they appear in source.
Not having to maintain order is what makes Dictionary<> fast. Keeping order always costs O(log n): a binary tree, as in your textbook.
Long story short: if you don't actually care about order, and for random numbers you surely wouldn't, then you don't want to use GroupBy(). You want to use a Dictionary<>.
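A sketch of that suggestion: counting bucket sizes with a Dictionary<,> instead of GroupBy(), under the same setup as the question's code (the order of groups is simply not maintained):
using System;
using System.Collections.Generic;

class DictionaryGrouping
{
    static void Main()
    {
        int size = 10000000;
        int num = 10;
        var ar = new int[size];
        var r = new Random();
        for (int i = 0; i < size; i++)
            ar[i] = r.Next(size);

        // Count how many items fall into each bucket; no ordering guarantees.
        var counts = new Dictionary<int, int>(size / num);
        foreach (var item in ar)
        {
            int key = item / num;
            int current;
            counts.TryGetValue(key, out current);
            counts[key] = current + 1;
        }
        Console.WriteLine(counts.Count);
    }
}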
There are (at least) two influencing factors. First, a hash table lookup only takes O(1) if you have a perfect hash function, which does not exist; thus, you get hash collisions.
I guess more important, though, are caching effects. Modern CPUs have large caches, so for a smaller bucket count the hash table itself might fit into the cache. As the hash table is frequently accessed, this can have a strong influence on performance. If there are more buckets, more accesses to RAM may be necessary, and these are slow compared to a cache hit.
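As a rough illustration of that threshold (the ~20 bytes per entry figure is an assumption: a 16-byte Dictionary<int, int> entry struct plus a 4-byte bucket slot; exact layouts vary by runtime):
// Approximate working-set size of a Dictionary<int, int> with the given bucket count.
static long ApproxDictionaryBytes(int buckets)
{
    const int bytesPerEntry = 20; // assumed: 16-byte entry + 4-byte bucket slot
    return (long)buckets * bytesPerEntry;
}
// ApproxDictionaryBytes(10000)   -> ~200 KB: fits comfortably in cache
// ApproxDictionaryBytes(1000000) -> ~20 MB: mostly served from RAM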
There are a few factors at work here.
Hashes and groupings
The way grouping works is by creating a hash table. Each individual group then supports an 'add' operation, which appends an element to that group's list. To put it bluntly, it's like a Dictionary<Key, List<Value>>.
Hash tables are always overallocated. If you add an element to the hash, it checks whether there is enough capacity and, if not, recreates the hash table with a larger capacity (to be exact: new capacity = count * 2, with count the number of groups). However, a larger capacity means that the bucket indexes are no longer correct, so you have to re-build the entries in the hash table. The Resize() method in Lookup<Key, Value> does this.
The 'groups' themselves work like a List<T>. These too are overallocated, but are easier to reallocate. To be precise: the data is simply copied (with Array.Copy in Array.Resize) and a new element is added. Since there's no re-hashing or calculation involved, this is quite a fast operation.
The initial capacity of a grouping is 7. This means that for 10 elements you need to reallocate 1 time, for 100 elements 4 times, for 1000 elements 8 times, and so on. Because more elements have to be re-hashed each time, your code gets a bit slower each time the number of buckets grows.
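A quick back-of-the-envelope check of those reallocation counts (assuming, per the description above, an initial capacity of 7 that doubles on each resize):
// Count how many resizes a grouping needs to grow to hold n elements.
static int Resizes(int n)
{
    int capacity = 7, resizes = 0;
    while (capacity < n)
    {
        capacity *= 2;
        resizes++;
    }
    return resizes;
}
// Resizes(10) == 1, Resizes(100) == 4, Resizes(1000) == 8, matching the counts above.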
I think these overallocations are the largest contributors to the small growth in the timings as the number of buckets grows. The easiest way to test this theory is to do no overallocations at all (test 1) and simply put counters in an array. The result is shown below in the code for FixArrayTest (or, if you like, FixBucketTest, which is closer to how groupings work). As you can see, the timings for # buckets = 10...10000 are the same, which is consistent with this theory.
Cache and random
Caching and random number generators aren't friends.
Our little test also shows that when the number of buckets grows above a certain threshold, memory comes into play. On my computer this happens at an array size of roughly 4 MB (4 * the number of buckets). Because the data is random, random chunks of RAM will be loaded into and evicted from the cache, which is a slow process. This is also where the large jump in speed occurs. To see this in action, change the random numbers to a sequence (called 'test 2'), and, because the data pages can now be cached, the overall speed will remain the same.
Note that hashes overallocate, so you will hit the mark before you have a million entries in your grouping.
Test code
static void Main(string[] args)
{
    int size = 10000000;
    int[] ar = new int[size];
    // random number init with numbers [0, size-1]
    var r = new Random();
    for (var i = 0; i < size; i++)
    {
        ar[i] = r.Next(0, size);
        //ar[i] = i; // Test 2 -> uncomment to see the effects of caching more clearly
    }

    Console.WriteLine("Fixed dictionary:");
    for (var numBuckets = 10; numBuckets <= 1000000; numBuckets *= 10)
    {
        var num = (size / numBuckets);
        var timing = 0L;
        for (var i = 0; i < 5; i++)
        {
            timing += FixBucketTest(ar, num);
            //timing += FixArrayTest(ar, num); // test 1
        }
        var avg = ((float)timing) / 5.0f;
        Console.WriteLine("Avg Time: " + avg + " ms for " + numBuckets);
    }

    Console.WriteLine("Fixed array:");
    for (var numBuckets = 10; numBuckets <= 1000000; numBuckets *= 10)
    {
        var num = (size / numBuckets);
        var timing = 0L;
        for (var i = 0; i < 5; i++)
        {
            timing += FixArrayTest(ar, num); // test 1
        }
        var avg = ((float)timing) / 5.0f;
        Console.WriteLine("Avg Time: " + avg + " ms for " + numBuckets);
    }
}

static long FixBucketTest(int[] ar, int num)
{
    // This test shows that timings will not grow for the smaller numbers of buckets
    // if you don't have to re-allocate
    var s = new Stopwatch();
    s.Start();
    var grouping = new Dictionary<int, List<int>>(ar.Length / num + 1); // exactly the right size
    foreach (var item in ar)
    {
        int idx = item / num;
        List<int> ll;
        if (!grouping.TryGetValue(idx, out ll))
        {
            grouping.Add(idx, ll = new List<int>());
        }
        //ll.Add(item); //-> this would complete a 'grouper'; however, we don't want the overallocator of List to kick in
    }
    s.Stop();
    return s.ElapsedMilliseconds;
}

// Test with arrays
static long FixArrayTest(int[] ar, int num)
{
    var s = new Stopwatch();
    s.Start();
    int[] buf = new int[(ar.Length / num + 1) * 10];
    foreach (var item in ar)
    {
        int code = (item & 0x7FFFFFFF) % buf.Length;
        buf[code]++;
    }
    s.Stop();
    return s.ElapsedMilliseconds;
}
When executing bigger calculations, less physical memory is available on the computer; counting the buckets will be slower with less memory, and as you expand the buckets your memory will decrease.
Try something like the following:
int size = 2500000; // 10000000 divided by 4
int num = 10;
int[] ar = new int[size];
// random number init with numbers [0, size-1]
var rand = new Random();
for (int i = 0; i < size; i++)
    ar[i] = rand.Next(size);

var s = new Stopwatch();
s.Start();
for (int i = 0; i < 4; i++)
{
    // the number of expected buckets is size / num
    var group = ar.GroupBy(x => x / num);
    var l = group.ToArray();
}
s.Stop();
calculating 4 times with smaller numbers.

Why am I getting strange results when generating Random() numbers?

My program needs to:
a. Generate an array of 20 random integers from zero to nine. Search for the first occurrence, if any, of the number 7, and report its position in the array.
b. Repeat the computation of part a 1000 times, and for each position in the array, report the number of times that the first occurrence of a 7 in the array is at that position
However, whenever I run the program I get strange results (different every time), such as:
No sevens found at any position
1000 sevens found at one position and no sevens found anywhere else
Hundreds of sevens found in 2 positions, and none found anywhere else.
Does anyone have an idea what is wrong with my program?
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace Week_6_Project_2
{
    class Program
    {
        static int intArrayLength = 20;
        static int[] resultsArray = new int[intArrayLength];

        public static Array generateRandomArray()
        {
            int[] randomNumberArray = new int[intArrayLength];
            Random random = new Random();
            int popcounter = 0;
            while (popcounter < intArrayLength)
            {
                randomNumberArray[popcounter] = random.Next(0, 10);
                popcounter += 1;
            }
            return randomNumberArray;
        }

        public static void searchForSevens()
        {
            int counter = 0;
            int[] randomArray = (int[])generateRandomArray();
            while (counter < intArrayLength)
            {
                if (randomArray[counter] == 7)
                {
                    resultsArray[counter] += 1;
                    counter = intArrayLength;
                }
                counter += 1;
            }
        }

        static void Main()
        {
            int searchCounter = 0;
            while (searchCounter < 1000)
            {
                searchForSevens();
                searchCounter += 1;
            }

            int displayCounter = 0;
            while (displayCounter < intArrayLength)
            {
                Console.WriteLine("Number of first occurrence of 7 at position {0} = {1}", displayCounter, resultsArray[displayCounter]);
                displayCounter += 1;
            }
            Console.ReadLine();
        }
    }
}
Your main problem is that each searchForSevens() test takes only a small fraction of time, and the Random class auto-seeds from the clock. The clock, however, has a limited resolution. The result is that many (sometimes all) of your random sequences will be the same, and at most you will have 2 or 3 different result sets.
The simple fix for this single-threaded program is to use one static instance of Random.
You're instantiating a new instance of Random every time you call the generateRandomArray method. Since the random number generator uses the current time as a seed, instantiating two Random instances at the same time results in the same numbers being generated, which explains your unexpected results.
To solve your problem, you should instantiate only one Random instance, store it in a private field, and reuse it every time you call the Next method.
The problem, I assume, stems from the fact that Random() uses the current time as a seed, and the computation happens so fast that each time new Random() is called it uses the same time. So you get the same sequence of numbers.
To fix this, you can set the seed yourself; incrementing it every cycle should be enough.
int baseSeed = (int)(DateTime.Now.Ticks & 0x7FFFFFFF);
Random rand = new Random(baseSeed + searchCounter);
...something like that.
I will not answer directly, but will try to give an analogy for the people who think they need more than one Random instance...
Suppose that you need to fill 6 sheets of paper with random numbers from 1-6. Ask yourself this: do you need 6 dices or one to do the job? If you answer that you need more than one dice, ask yourself this: how different or more random is throwing different dice instead of same dice every time?
I mean, if you throw a ONE with a die, the next throw won't have any lower chance of being a ONE again than of being any other number. It goes against intuition, but it is mathematically and statistically so.
In your original code, you're constructing Random in rapid succession [broken example based on OP's original code], thus seeding each instance with the same number and getting duplicate "random" numbers. Creating a static member ensures randomness simply because you're only creating a single instance of it.
Try creating a single static instance of Random, like this [static member example]:
static readonly Random Random = new Random();
Based on this, here's how I would solve your particular problem.
using System;

namespace Week_6_Project_2
{
    class Program
    {
        // ******************************************
        // THIS IS A SINGLE INSTANCE OF Random.
        // read below as to why I'm seeding the instantiation of Random();
        static readonly Random Random = new Random(Guid.NewGuid().GetHashCode());
        // ******************************************

        private const int IntArrayLength = 20;
        static readonly int[] ResultsArray = new int[IntArrayLength];

        public static Array GenerateRandomArray()
        {
            var randomNumberArray = new int[IntArrayLength];
            var popcounter = 0;
            while (popcounter < IntArrayLength)
            {
                randomNumberArray[popcounter] = Random.Next(0, 10);
                popcounter += 1;
            }
            return randomNumberArray;
        }

        public static void SearchForSevens()
        {
            var counter = 0;
            var randomArray = (int[])GenerateRandomArray();
            while (counter < IntArrayLength)
            {
                if (randomArray[counter] == 7)
                {
                    ResultsArray[counter] += 1;
                    counter = IntArrayLength;
                }
                counter += 1;
            }
        }

        static void Main()
        {
            var searchCounter = 0;
            while (searchCounter < 1000)
            {
                SearchForSevens();
                searchCounter += 1;
            }

            var displayCounter = 0;
            while (displayCounter < IntArrayLength)
            {
                Console.WriteLine("Number of first occurrence of 7 at position {0} = {1}", displayCounter, ResultsArray[displayCounter]);
                displayCounter += 1;
            }
            Console.ReadLine();
        }
    }
}
Further reading about Random()
Beyond the answer above, it is sometimes necessary to seed Random(int) (I like to use the hash code of a Guid) to ensure further randomness. This is because the default seed uses the clock, which, per the docs [microsoft], has a finite resolution. If your class is instantiated multiple times in quick succession (< 16 ms), you will get the same seed from the clock... this breaks stuff.
[example of seeded Random(int) run in rapid succession with random results]
"using the parameterless constructor to create different Random objects in close succession creates random number generators that produce identical sequences of random numbers."
Hear me when I say that you should NOT instantiate a new Random on every iteration of a loop; use a static member.
Another valid reason to implement your own seed is when you want to recreate a random sequence [example of two identical lists based on same seed]. Re-using the same seed will recreate the sequence since the sequence is based on the seed.
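For instance, a quick check that two generators with the same seed produce identical sequences (a trivial sketch standing in for the elided example):
var a = new Random(12345);
var b = new Random(12345);
for (int i = 0; i < 5; i++)
{
    // Both columns print the same value on every row.
    Console.WriteLine("{0} {1}", a.Next(), b.Next());
}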
note: others might say that seeding it is not necessary [link], but I personally believe that for the additional few keystrokes and the microscopic hit on the clock, you might as well increase the probability of a unique seed. It doesn't hurt anything, and in some situations it can help.

System.Random not so random, why does this occur? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Random number generator only generating one random number
I've distilled the behavior observed in a larger system into this code sequence:
for (int i = 0; i < 100; i++)
{
    Random globalRand = new Random(0x3039 + i);
    globalRand.Next();
    globalRand.Next();
    int newSeed = globalRand.Next();
    Random rand = new Random(newSeed);
    int value = rand.Next(1, 511);
    Console.WriteLine(value);
}
Running this from Visual Studio 2012 targeting .NET 4.5 will output either 316 or 315. Extend this beyond 100 iterations and you'll see the value slowly decrement (314, 313, ...), but it still isn't what anyone would consider "random".
EDIT
I am aware that there are several questions on Stack Overflow that already ask why random numbers aren't random. However, those questions involve either a) not passing a seed (or passing the same seed) to their Random instances, or b) doing something like Next(0, 1) and not realizing that the second parameter is an exclusive bound. Neither of those issues applies to this question.
It's a pseudo-random generator, which basically creates a long (infinite) list of numbers. The list is deterministic, but in most practical scenarios the order can be treated as random. The order is, however, determined by the seed.
The most random behaviour you can achieve (without fancy-pants tricks that are hard to get right all the time) is to reuse the same object over and over.
If you change your code to the below, you get more random behaviour:
Random globalRand = new Random();
for (int i = 0; i < 100; i++)
{
    globalRand.Next();
    globalRand.Next();
    int newSeed = globalRand.Next();
    Random rand = new Random(newSeed);
    int value = rand.Next(1, 511);
    Console.WriteLine(value);
}
The reason is that the math behind pseudo-random generators basically just creates an infinite list of numbers.
The distribution of these numbers is almost random, but the order in which they come is not. Computers are deterministic, and as such incapable of producing truly random numbers (without aid), so to circumvent this, mathematicians have produced functions capable of generating these lists of numbers: lists that have a lot of randomness about them, but where the seed determines the order.
Given the same seed, the function always produces the same order. Given two seeds close to each other (where 'close' is a property of the particular function used), the lists will be almost the same for the first several numbers, but will eventually be very different.
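You can see this for yourself by printing the first few values produced by nearby seeds (a small sketch; the exact numbers depend on the runtime):
for (int seed = 0; seed < 5; seed++)
{
    var r = new Random(seed);
    // First values from adjacent seeds tend to be similar, then drift apart.
    Console.WriteLine("seed {0}: {1} {2} {3}", seed, r.Next(), r.Next(), r.Next());
}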
Using the first random number to generate the second doesn't make it any more "random". As suggested, there is also no need to create the Random object inside the loop. Try this:
Random globalRand = new Random();
for (int i = 0; i < 100; i++)
{
    int value = globalRand.Next(1, 511);
    Console.WriteLine(value);
}
I believe Random() is time-based, so if your process runs really fast you will get the same answer if you keep creating new instances of Random() in your loop. Try creating the Random() outside of the loop and just use .Next() to get your answer, like:
Random rand = new Random();
for (int i = 0; i < 100; i++)
{
    int value = rand.Next();
    Console.WriteLine(value);
}
Without parameters, the Random constructor takes the current date and time as the seed, and you can generally execute a fair amount of code before the internal timer works out that the current date and time has changed. Therefore you are using the same seed repeatedly, and getting the same results repeatedly.
Source: http://csharpindepth.com/Articles/Chapter12/Random.aspx

Seeding a pseudo-random number generator in C#

I need a seed for an instance of C#'s Random class, and I read that most people use the current time's tick counter for this. But that is a 64-bit value and the seed needs to be a 32-bit value. Now, I thought that the GetHashCode() method, which returns an int, should provide a reasonably distributed value for its object, and this might be used to avoid taking only the lower 32 bits of the tick count. However, I couldn't find anything about the GetHashCode() of the Int64 data type.
So, I know it will not matter much, but will the following work as well as I think (I can't trial-and-error randomness), or does it maybe work the same as using (int)DateTime.Now.Ticks as the seed? Or maybe even worse? Who can shed some light on this?
int seed = unchecked(DateTime.Now.Ticks.GetHashCode());
Random r = new Random(seed);
Edit: why do I need a seed instead of just letting the Random() constructor do the work? I need to send the seed to other clients, which use the same seed to produce the same random sequence.
new Random() already uses the current time. It is equivalent to new Random(Environment.TickCount).
But this is an implementation detail and might change in future versions of .NET.
I'd recommend using new Random() and only provide a fixed seed if you want to get a reproducible sequence of pseudo random values.
Since you need a known seed, just use Environment.TickCount, like MS does, and then transmit it to the other program instances as the seed.
If you create multiple instances of Random in a short interval (could be 16 ms), they may be seeded with the same value and thus create the same pseudo-random sequence. But that's most likely not a problem here. This common pitfall is caused by Windows updating the current time (DateTime.Now/.UtcNow) and the tick count (Environment.TickCount) only every few milliseconds. The exact interval depends on the version of Windows and on what other programs are running; typical intervals where they don't change are 16 ms or 1 ms.
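On .NET Framework (where the parameterless constructor seeds from the clock; newer runtimes changed this), the effect is easy to see with a small sketch:
for (int i = 0; i < 8; i++)
{
    // Instances created within one clock tick share a seed,
    // so the same first value typically repeats several times in a row.
    Console.WriteLine(new Random().Next());
}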
If you need to seed it with something other than the current time (in which case you can use the default constructor), you can use this:
Random random = new Random(Guid.NewGuid().GetHashCode());
I had a similar question: selecting a random set of questions from a larger list of questions. But when I used the time as the seed, it gave the same random number.
So here is my solution.
int TOTALQ = 7;
int NOOFQ = 5;
int[] selectedQuestion = new int[TOTALQ];
int[] askQuestion = new int[NOOFQ];

/* Generate a random question index 0 to TOTALQ-1:
 * - if that slot in selectedQuestion is not 0 (question still available),
 *   record the index in askQuestion and mark the slot as used;
 * - if not, re-draw - while the askQuestion array is not full.
 */
for (int i = 0; i < TOTALQ; i++) // mark every question as available
    selectedQuestion[i] = 1;

int question = 0;
int seed = 1;
while (question < NOOFQ)
{
    DateTime now1 = DateTime.Now;
    Random rand = new Random(seed + now1.Millisecond);
    int RandomQuestion = rand.Next(0, TOTALQ); // upper bound is exclusive
    Response.Write("<br/> seed " + seed + " Random number " + RandomQuestion);
    if (selectedQuestion[RandomQuestion] != 0)
    {
        selectedQuestion[RandomQuestion] = 0; // mark as used so it isn't picked again
        askQuestion[question] = RandomQuestion; // record the chosen question index
        Response.Write(". Question no " + question + " will be question " + RandomQuestion + " from list ");
        question++;
    }
    seed++;
}
