I'm pretty sure this will never be an issue, but I'm still curious: how many numbers can a given seed generate before the generator exhausts its period and wraps back around to producing the same numbers again?
As an example:
Suppose you have an array of eight integers; on each iteration, Random.Next fills every index with a value from 0 to 31. The test tries to see how long it takes to generate a perfect array of all 31s.
Mathematically, the odds are roughly 1 in 1,099,511,627,776 per iteration to yield a perfect array of all 31's. However, this is assuming that the C# Random number generator could even make it to the projected range of 1 trillion iterations without wrapping back around on itself.
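For reference, that figure follows from eight independent draws with 32 equally likely values each (this assumes Random.Next(32) is uniform over 0 to 31 and that successive draws are independent, which is the idealised case):

```latex
P(\text{all eight values are } 31)
  = \left(\frac{1}{32}\right)^{8}
  = \frac{1}{2^{40}}
  = \frac{1}{1{,}099{,}511{,}627{,}776}
```

The same reasoning gives 1 in 2^30 = 1,073,741,824 for six indices and 1 in 2^35 = 34,359,738,368 for seven.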
So, to sum up my actual question: could the Random class pass the test I have presented? Or would it reach some halfway mark and be doomed to fail no matter how many iterations it goes through? Exactly how many iterations does it take before the random number generator reaches the end of its sequence? I also want to mention that it takes only about 20 minutes to generate an array of six perfect 31s; I tested and verified this myself.
I should also mention that I am currently running a testing mechanism that is trying to achieve this. So far, this is the current report, which is displayed every minute:
##### Report #####
Elapsed Simulation Time: 00:49:00.1759559
Total Iterations: 20784834152
Perfect Eight Success: 0
Failure: 20784834152
##### End Report #####
I have estimated that it will take roughly 47 hours and 56 minutes to reach the range where finding even one perfect set of 31s becomes likely. That's with my computer filling the array 383,500,572 times every minute. Looks like this test will take far longer than I originally projected.
2 Hour Update
##### Report #####
Elapsed Simulation Time: 02:00:00.4483950
Total Iterations: 55655726300
Success: 0
Failure: 55655726300
##### End Report #####
I kind of wish I would have threaded this...probably could have cut the time in half...
Enough comments already. Here's the definitive answer.
First: The RNG can, at best, operate on 64-bit values. There are finitely many 64-bit values, so, according to the pigeonhole principle, with enough iterations (n > 2^64) you will definitely get at least one repeated value.
The underlying algorithm uses some finite, arbitrary number of parameters to decide the next random value. If we assume there are N state variables, each with 64 bits, there can be at most (2^64)^N different internal states. As before, with enough iterations your RNG will return to an internal state it has already been in. From that point on it produces the same outputs again, so a loop is certain to occur eventually. As for how many iterations it takes to loop back, suffice it to say there will be more than you'll ever need for day-to-day random number generation. I haven't run into any such loop yet (it's been generating for 20 minutes straight on my i7 CPU, and if your code generates that many numbers, you're probably doing something very wrong).
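To make the "finite state means it must loop" argument concrete, here is a minimal sketch with a toy generator whose whole state is 16 bits. This is a hypothetical linear congruential generator picked for illustration, not the algorithm System.Random uses; the names CycleDemo, NextState and FindPeriod are made up for the example. With these constants every one of the 65,536 states lies on a single cycle, so the period equals the size of the state space.

```csharp
using System;

public static class CycleDemo
{
    // Toy 16-bit linear congruential generator (illustrative only;
    // this is NOT how System.Random is implemented).
    public static ushort NextState(ushort state)
    {
        // The product fits in an int; the cast keeps the low 16 bits.
        return unchecked((ushort)(state * 25173 + 13849));
    }

    // Counts how many steps it takes until the starting state reappears.
    public static int FindPeriod(ushort seed)
    {
        ushort state = seed;
        int steps = 0;
        do
        {
            state = NextState(state);
            steps++;
        } while (state != seed);
        return steps;
    }

    public static void Main()
    {
        // 16 bits of state allow at most 2^16 = 65536 distinct states.
        Console.WriteLine("Period: " + FindPeriod(12345));
    }
}
```

A generator with more state can have a far longer period, but the same counting argument still puts an upper bound on it.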
Second: I don't know about eight 31's in a row, but that's just a special case. What you're asking, basically, is this: Given some arbitrary sequence S_QUERY of numbers, will the RNG generate S_QUERY?
To answer, we must first note that the RNG generates a finite sequence S_RNG of numbers. So the real question is this: is S_QUERY a subsequence of S_RNG? Since S_RNG is finite, it can only have finitely many subsequences. However, there are infinitely many possible S_QUERY's to choose from, so for every RNG you can find some S_QUERY's which cannot be generated by that RNG. As for the special case of eight 31's, I don't know and I can't know. Keep that code running and find out.
I just wanted to post my testing code and explain a few things. First, here is my code:
using System;
using System.Diagnostics;
namespace ConsoleApplication1
{
    public static class Program
    {
        public static long Success;
        public static long Failure;
        public static long TotalIterations;
        public static long TotalCallsToRandom;
        public static readonly int CurrentSeed = Environment.TickCount;
        public static Random Random = new Random(CurrentSeed);
        public static Stopwatch TotalSimulationTime = new Stopwatch();
        public static Stopwatch ReportWatchTime = new Stopwatch();
        public static bool IsRunning = true;
        //
        public const int TotalTestingIndices = 7;
        public const int MaximumTestingValue = 31;
        public const int TimeBetweenReports = 30000; // Report every 30 Seconds.
        //
        public static void Main(string[] args)
        {
            int[] array = new int[TotalTestingIndices];
            TotalSimulationTime.Start();
            ReportWatchTime.Start();
            while (IsRunning)
            {
                if (ReportWatchTime.ElapsedMilliseconds >= TimeBetweenReports)
                {
                    Report();
                    ReportWatchTime.Restart();
                }
                Fill(array);
                if (IsPerfect(array))
                {
                    Success++;
                    Console.WriteLine("A Perfect Array was found!");
                    PrintArray(array);
                    Report();
                    IsRunning = false;
                }
                else
                {
                    Failure++;
                }
                TotalIterations++;
            }
            Console.Read();
        }
        public static void Report()
        {
            Console.WriteLine();
            Console.WriteLine("## Report ##");
            Console.WriteLine("Current Seed: " + CurrentSeed);
            Console.WriteLine("Desired Perfect Number: " + MaximumTestingValue);
            Console.WriteLine("Total Testing Indices: " + TotalTestingIndices);
            Console.WriteLine("Total Simulation Time: " + TotalSimulationTime.Elapsed);
            Console.WriteLine("Total Iterations: " + TotalIterations);
            Console.WriteLine("Total Random.Next() Calls: " + TotalCallsToRandom);
            Console.WriteLine("Success: " + Success);
            Console.WriteLine("Failure: " + Failure);
            Console.WriteLine("## End of Report ##");
            Console.WriteLine();
        }
        public static void PrintArray(int[] array)
        {
            for (int i = 0; i < array.Length; i++)
            {
                Console.Write(array[i]);
                if (i != array.Length - 1)
                {
                    Console.Write(",");
                }
            }
        }
        /// <summary>
        /// Optimized to terminate quickly.
        /// </summary>
        /// <param name="array"></param>
        /// <returns></returns>
        public static bool IsPerfect(int[] array)
        {
            for (int i = 0; i < array.Length; i++)
            {
                if (array[i] != MaximumTestingValue)
                {
                    return false;
                }
            }
            return true;
        }
        public static void Fill(int[] array)
        {
            for (int i = 0; i < array.Length; i++)
            {
                array[i] = Random.Next(MaximumTestingValue + 1);
                TotalCallsToRandom++;
            }
        }
    }
}
After about three hours of testing I have come to a few realizations. I believe it may be possible to get eight perfect indices of 31...but only if you get lucky within the first billion or so calls to Random.Next(). I know this may seem like a subjective thing to say, but it's what I have experienced through these tests. I never once got 8-Perfect 31's, but I did get 7-Perfect 31's. The first time it was after 13 minutes. Here is the print out:
A Perfect Array was found!
31,31,31,31,31,31,31
## Report ##
Total Simulation Time: 00:13:32.4293323
Total Iterations: 7179003125
Success: 1
Failure: 7179003125
## End of Report ##
I didn't have it coded in at the time to print it out, but that printout means there were 50,253,021,875 individual calls to Random.Next(). This means that the resolution held up all the way to 50 billion calls.
And the other 7-Perfect came after only about 30 seconds of the program running. That means there are "Good Seeds" for getting this kind of rarity fairly quickly. I also ran the test for 7-Perfect indices for thirty minutes and didn't get a single one. It's based on luck, but at the same time I strongly feel as though there is an invisible threshold; if you don't hit it soon it won't happen at all. A poster above said that the resolution of the Random class is "281,474,976,710,656". But my tests seem to suggest that the resolution may actually be far smaller than that. Try it yourself: start from 4-6 indices (happens within a matter of seconds) and move up to 7 and 8. It's not just that the odds get longer, it's that there is a threshold...or maybe I am just wrong. Who knows?
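As a rough sanity check on the thirty-minute run, here is the waiting-time arithmetic for seven indices. It assumes an ideal uniform generator and the throughput of the 13-minute run above (7,179,003,125 fills in about 13.5 minutes, roughly 5.3 × 10^8 fills per minute); both are assumptions, not measurements of that particular run.

```latex
p = 32^{-7} = 2^{-35} \approx 2.9 \times 10^{-11},
\qquad
N \approx 30 \times 5.3 \times 10^{8} \approx 1.6 \times 10^{10}

P(\text{no perfect array in } N \text{ fills})
  = (1 - p)^{N} \approx e^{-Np} \approx e^{-0.46} \approx 0.63
```

Under those assumptions, a thirty-minute run with no hit is the more likely outcome even for a perfectly behaved generator, so an empty run on its own does not show a threshold.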
Basically, I was doing a code kata on the Codewars site to kind of 'warm up' before starting to code, and I noticed a problem; I don't know whether it's because of my code or just a regular thing.
public static string WhoIsNext(string[] names, long n)
{
    Queue<string> fifo = new Queue<string>(names);
    for(int i = 0; i < n - 1; i++)
    {
        var name = fifo.Dequeue();
        fifo.Enqueue(name);
        fifo.Enqueue(name);
    }
    return fifo.Peek();
}
And it is called like this:
// Test 1
string[] names = { "Sheldon", "Leonard", "Penny", "Rajesh", "Howard" };
long n = 1;
var nth = CodeKata.WhoIsNext(names, n); // n = 1 should return Sheldon.
// Test 2
string[] names = { "Sheldon", "Leonard", "Penny", "Rajesh", "Howard" };
long n = 52;
var nth = CodeKata.WhoIsNext(names, n); // n = 52 should return Penny.
// Test 3
string[] names = { "Sheldon", "Leonard", "Penny", "Rajesh", "Howard" };
long n = 7230702951;
var nth = CodeKata.WhoIsNext(names, n); // n = 7230702951 should return Leonard.
In this code, when I set the long n to 7230702951 (a really high number...), it throws an out-of-memory exception. Is the number that high, or is the queue just not optimized for such numbers?
I ask because I tried using a List, and its memory usage stayed under 500 MB (the plateau was around 327 MB, by the way) while running for about 2 to 3 minutes, whereas the queue threw the exception in a matter of seconds and went over 2 GB in that time alone.
Can someone explain to me why this is happening? I'm just curious.
edit 1
I forgot to add the List code:
public static string WhoIsNext(string[] names, long n)
{
    List<string> test = new List<string>(names);
    for(int i = 0; i < n - 1; i++)
    {
        var name = test[0];
        test.RemoveAt(0);
        test.Add(name);
        test.Add(name);
    }
    return test[0];
}
edit 2
For those saying that the code doubles the names and is inefficient: I already know that. The code isn't meant to be useful; it's just a kata. (I updated the link now!)
My question is why Queue is so much more inefficient than List with high counts.
Part of the reason is that the queue code is way faster than the List code, because queues are optimised for deletes due to the fact that they are a circular buffer. Lists aren't - the list copies the array contents every time you remove that first element.
Change the input value to 72307000, for example. On my machine, the queue finishes that in less than a second. The list is still chugging away minutes (and at this rate, hours) later. After 4 minutes, i is at 752408 (it has done about 1% of the work).
Thus, I am not sure the queue is less memory efficient. It is just so fast that you run into the memory issue sooner. The list almost certainly has the same issue (the way that List and Queue do array size doubling is very similar) - it will just likely take days to run into it.
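The speed difference is easy to reproduce in isolation. The sketch below drains the same number of items from the front of a Queue<int> and of a List<int>; the class name, method names and the item count of 50,000 are arbitrary choices for illustration, and absolute timings will vary by machine.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class RemoveFrontTiming
{
    // Dequeue only advances the head index of the circular buffer.
    public static long DrainQueue(int count)
    {
        var queue = new Queue<int>();
        for (int i = 0; i < count; i++) queue.Enqueue(i);
        long sum = 0;
        while (queue.Count > 0) sum += queue.Dequeue();
        return sum;
    }

    // RemoveAt(0) shifts every remaining element down by one slot.
    public static long DrainList(int count)
    {
        var list = new List<int>();
        for (int i = 0; i < count; i++) list.Add(i);
        long sum = 0;
        while (list.Count > 0)
        {
            sum += list[0];
            list.RemoveAt(0);
        }
        return sum;
    }

    public static void Main()
    {
        const int count = 50000; // arbitrary size, chosen for illustration
        var watch = Stopwatch.StartNew();
        DrainQueue(count);
        Console.WriteLine("Queue: " + watch.ElapsedMilliseconds + " ms");
        watch.Restart();
        DrainList(count);
        Console.WriteLine("List:  " + watch.ElapsedMilliseconds + " ms");
    }
}
```

Both methods do the same logical work; only the cost of removing the first element differs, and for the list that cost grows with the number of elements still stored.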
To a certain extent, you could predict this even without running your code. A queue with 7230702951 entries in it (running 64-bit) will take a minimum of 8 bytes per entry. So 57845623608 bytes. Which is larger than 50GB. Clearly your machine is going to struggle to fit that in RAM (plus .NET won't let you have an array that large)...
Additionally, your code has a subtle bug. The loop can't ever end (if n is greater than int.MaxValue). Your loop variable is an int but the parameter is a long. Your int will overflow (from int.MaxValue to int.MinValue with i++). So the loop will never exit, for large values of n (meaning the queue will grow forever). You likely should change the type of i to long.
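A small sketch of that wrap-around, assuming the default unchecked arithmetic (the helper name IncrementWrapped is made up for the example):

```csharp
using System;

public static class LoopCounterDemo
{
    // Mirrors what i++ does to an int when overflow checking is off.
    public static int IncrementWrapped(int i)
    {
        unchecked
        {
            return i + 1;
        }
    }

    public static void Main()
    {
        long n = 7230702951;
        int i = int.MaxValue;

        // The loop condition still holds at the largest possible int...
        Console.WriteLine(i < n - 1);                    // True
        // ...and the increment wraps to the smallest int...
        Console.WriteLine(IncrementWrapped(i));          // -2147483648
        // ...so the condition can never become false.
        Console.WriteLine(IncrementWrapped(i) < n - 1);  // True
    }
}
```

Declaring the counter as long removes the wrap-around for this value of n, although the memory problem described above remains.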
The program I am creating prompts the user to enter a number of months, the number of work absences he/she has had in each month, and the number of absences allowed per month. It is supposed to calculate the employee's average number of work absences as well as the number of times that he/she went over the maximum absences allowed.
I was able to calculate the average with a loop over the array but am having issues with counting the times he/she has gone over the maximum. I am using the binary search method but have trouble outputting the number of months in which the employee exceeded the allowed absences.
This is my current code for that section:
for (int i = 0; i < numbOfAbsences.Length; i++)
{
sum += numbOfAbsences[i];
averageAbsences = (sum / numbOfMonths);
Console.WriteLine("Employee was absent " + averageAbsences + " times per month.");
}
a = Array.BinarySearch(numbOfAbsences, maxAbsences);
if (a >= maxAbsences)
{
}
I am unsure what would go inside the last set of braces, as I am not trying to determine whether the maximum was exceeded but rather how many times it was.
Thank you for the help in advance.
First off, the question is not clearly stated. It boils down to this:
"What's a good way to take an int array of sums and divide them all by a count to produce a double array of averages? Also, how can I count the values of an int array greater than a threshold? (Here's what I've got so far: )"
Second, the code snippet is very confusing. More precisely: it's not indented correctly. Not all of the variables used are defined, so we have to guess what they are. The variable names aren't very descriptive, so one has to look at how each variable is used to understand it. Most importantly, however, there is no encapsulation. No objects, no functions... just raw code that you need to read line by line, very carefully, to understand.
Contrast the snippet with this:
using System;
using System.Linq;
namespace ArraySearch_StackOverflow
{
    class Program
    {
        static void Main(string[] args)
        {
            int[] employeeAbsencesEachMonth = new int[] { 1, 2, 3, 4, 5, 6 };
            int maxAbsencesAllowedPerMonth = 3;
            double averageAbsencesPerMonth = GetAverageAbsencesPerMonth(employeeAbsencesEachMonth);
            Console.WriteLine($"Employee's Average Absences Per Month: {averageAbsencesPerMonth}"); /* 3.5 */
            int numTimesMaxAbsencesExceeded = GetNumMaxAbsencesViolations(employeeAbsencesEachMonth, maxAbsencesAllowedPerMonth);
            Console.WriteLine($"Number of Times Employee Exceeded Max Absence Limit: {numTimesMaxAbsencesExceeded}"); /* 3 */
            Console.WriteLine("\nPress any key to continue...");
            Console.ReadKey();
        }
        private static double GetAverageAbsencesPerMonth(int[] employeeAbsencesEachMonth)
        {
            // ???
            throw new NotImplementedException();
        }
        private static int GetNumMaxAbsencesViolations(int[] employeeAbsencesEachMonth, int maxAbsencesAllowedPerMonth)
        {
            // ???
            throw new NotImplementedException();
        }
    }
}
This snippet is extremely clear. Even without a description, it's immediately apparent what's being asked and how to tell whether an answer is correct, because the questions have been translated into function signatures and the context into a driver that calls them, complete with setup and expected results.
Handily, this implies a simple format for the answer, creating a good chance you can copy-paste it directly into your code:
private static double GetAverageAbsencesPerMonth(int[] employeeAbsencesEachMonth)
{
    // Cast to double first; int / int would truncate 3.5 to 3.
    return (double)employeeAbsencesEachMonth.Sum() / employeeAbsencesEachMonth.Length;
}
private static int GetNumMaxAbsencesViolations(int[] employeeAbsencesEachMonth, int maxAbsencesAllowedPerMonth)
{
    return employeeAbsencesEachMonth.Count(x => x > maxAbsencesAllowedPerMonth);
}
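On the Array.BinarySearch attempt in the question: that method looks up the position of one value in an array that is already sorted, and returns a negative number when the value is absent, so it does not count anything. If LINQ is not an option, a plain loop does the counting. The sketch below uses made-up names (AbsenceCount, CountMonthsOverLimit) and sample data chosen for illustration.

```csharp
using System;

public static class AbsenceCount
{
    // Counts the months whose absences are strictly above the limit.
    public static int CountMonthsOverLimit(int[] absences, int maxAllowed)
    {
        int count = 0;
        foreach (int monthly in absences)
        {
            if (monthly > maxAllowed)
            {
                count++;
            }
        }
        return count;
    }

    public static void Main()
    {
        // Unsorted on purpose: counting does not need sorted input.
        int[] absences = { 4, 1, 6, 3, 5, 2 };
        Console.WriteLine(CountMonthsOverLimit(absences, 3)); // 3
    }
}
```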
Folks, I have a sample console application with a method that calculates the factors of a given number and returns them as a list.
If the input number has fewer than 9 digits the program works fine; however, if the number has 12 digits, the execution goes on forever with no output and no exceptions.
I have attached the code...
static void Main(string[] args)
{
    var list = GetFactors(600851475143);
}
static List<long> GetFactors(long bigNum)
{
    var list = new List<long>();
    long counter = 1;
    try
    {
        while (counter <= bigNum / 2)
        {
            if (bigNum % counter == 0)
            {
                list.Add(counter);
            }
            counter++;
        }
    }
    catch (Exception ex)
    {
        throw;
    }
    return list;
}
...which is to be expected.
Many cryptographic algorithms rely on how computationally expensive it is to compute the (prime) factors of a given number.
The cost does not grow linearly with the number of digits: with that unoptimized algorithm the loop runs bigNum / 2 times, so every extra digit multiplies the work by ten. For 600851475143 that is roughly 3 × 10^11 iterations, about a thousand times more than for a 9-digit input, so you won't see the result anytime soon.
There are several documents about this on the net; here is just one among many:
http://computer.howstuffworks.com/computing-power.htm
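One common way to cut the work, sketched below, is to test divisors only up to the square root and record each divisor together with its partner n / d. This is a swapped-in technique, not the original poster's method, and unlike the original loop (which stops at bigNum / 2) it also returns the number itself. The class name FactorDemo is made up for the example.

```csharp
using System;
using System.Collections.Generic;

public static class FactorDemo
{
    // Returns all divisors of n in ascending order, including 1 and n.
    public static List<long> GetFactors(long n)
    {
        var small = new List<long>(); // divisors up to sqrt(n)
        var large = new List<long>(); // their partners n / d
        for (long d = 1; d * d <= n; d++)
        {
            if (n % d == 0)
            {
                small.Add(d);
                if (d != n / d)
                {
                    large.Add(n / d);
                }
            }
        }
        large.Reverse();
        small.AddRange(large);
        return small;
    }

    public static void Main()
    {
        var factors = GetFactors(600851475143);
        Console.WriteLine(string.Join(", ", factors));
    }
}
```

For 600851475143 the loop now runs about 775,000 times instead of about 300 billion.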
The following ruby code runs in ~15s. It barely uses any CPU/Memory (about 25% of one CPU):
def collatz(num)
  num.even? ? num/2 : 3*num + 1
end
start_time = Time.now
max_chain_count = 0
max_starter_num = 0
(1..1000000).each do |i|
  count = 0
  current = i
  current = collatz(current) and count += 1 until (current == 1)
  max_chain_count = count and max_starter_num = i if (count > max_chain_count)
end
puts "Max starter num: #{max_starter_num} -> chain of #{max_chain_count} elements. Found in: #{Time.now - start_time}s"
And the following TPL C# puts all my 4 cores to 100% usage and is orders of magnitude slower than the ruby version:
static void Euler14Test()
{
    Stopwatch sw = new Stopwatch();
    sw.Start();
    int max_chain_count = 0;
    int max_starter_num = 0;
    object locker = new object();
    Parallel.For(1, 1000000, i =>
    {
        int count = 0;
        int current = i;
        while (current != 1)
        {
            current = collatz(current);
            count++;
        }
        if (count > max_chain_count)
        {
            lock (locker)
            {
                max_chain_count = count;
                max_starter_num = i;
            }
        }
        if (i % 1000 == 0)
            Console.WriteLine(i);
    });
    sw.Stop();
    Console.WriteLine("Max starter i: {0} -> chain of {1} elements. Found in: {2}s", max_starter_num, max_chain_count, sw.Elapsed.ToString());
}
static int collatz(int num)
{
    return num % 2 == 0 ? num / 2 : 3 * num + 1;
}
How come ruby runs faster than C#? I've been told that Ruby is slow. Is that not true when it comes to algorithms?
Perf AFTER correction:
Ruby (Non parallel): 14.62s
C# (Non parallel): 2.22s
C# (With TPL): 0.64s
Actually, the bug is quite subtle, and has nothing to do with threading. The reason that your C# version takes so long is that the intermediate values computed by the collatz method eventually start to overflow the int type, resulting in negative numbers which may then take ages to converge.
This first happens when i is 134,379, for which the 129th term (assuming one-based counting) is 2,482,111,348. This exceeds the maximum value of 2,147,483,647 and therefore gets stored as -1,812,855,948.
To get good performance (and correct results) on the C# version, just change:
int current = i;
…to:
long current = i;
…and:
static int collatz(int num)
…to:
static long collatz(long num)
That will bring your running time down to a respectable 1.5 seconds.
Edit: CodesInChaos raises a very valid point about enabling overflow checking when debugging math-oriented applications. Doing so would have allowed the bug to be immediately identified, since the runtime would throw an OverflowException.
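A minimal sketch of what overflow checking looks like at the expression level, using the checked operator on the int version of the step function. The class name CheckedCollatz and the input 1000000001 are chosen for illustration; that input is simply an odd value for which 3 * num + 1 cannot fit in an int.

```csharp
using System;

public static class CheckedCollatz
{
    // Same step as the int version in the question, but the odd branch
    // throws OverflowException instead of silently wrapping.
    public static int Step(int num)
    {
        return num % 2 == 0 ? num / 2 : checked(3 * num + 1);
    }

    public static void Main()
    {
        int start = 1000000001; // odd, and 3 * start + 1 exceeds int.MaxValue
        try
        {
            Console.WriteLine(Step(start));
        }
        catch (OverflowException)
        {
            Console.WriteLine("Overflow detected for " + start);
        }
    }
}
```

The same effect can be switched on for a whole project through the compiler's overflow-checking option, which is handy while debugging and can be turned off again afterwards.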
Should be:
Parallel.For(1L, 1000000L, i =>
{
Otherwise, you have integer overflow and start checking negative values. The collatz method should likewise operate on long values.
I experienced something like that, and I figured out that it's because each of your loop iterations needs to start another thread, which takes some time; in this case that time is comparable to (I think greater than) the time taken by the operations you actually do in the loop body.
There is an alternative: you can get the number of CPU cores you have and then use a parallel loop with as many iterations as you have cores. Each iteration evaluates part of the actual loop you want, by running an inner for loop whose bounds depend on the parallel loop index.
EDIT: EXAMPLE
int N_CORES = Environment.ProcessorCount; // number of logical cores
int start = 1, end = 1000000;
Parallel.For(0, N_CORES, n =>
{
    int s = start + (end - start) * n / N_CORES;
    int e = n == N_CORES - 1 ? end : start + (end - start) * (n + 1) / N_CORES;
    for (int i = s; i < e; i++)
    {
        // Your code
    }
});
You should try this code, I'm pretty sure this will do the job faster.
EDIT: ELUCIDATION
Well, quite a long time since I answered this question, but I faced the problem again and finally understood what's going on.
I've been using the AForge implementation of the parallel for loop, and it seems to fire a thread for each iteration of the loop; that's why, if each iteration takes a relatively small amount of time, you end up with inefficient parallelism.
So, as some of you pointed out, System.Threading.Tasks.Parallel methods are based on Tasks, which are kind of a higher level of abstraction of a Thread:
"Behind the scenes, tasks are queued to the ThreadPool, which has been enhanced with algorithms that determine and adjust to the number of threads and that provide load balancing to maximize throughput. This makes tasks relatively lightweight, and you can create many of them to enable fine-grained parallelism."
So yeah, if you use the default library's implementation, you won't need to use this kind of "bogus".
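If you still want explicit chunks with the standard library, System.Collections.Concurrent.Partitioner.Create can split an index range for you. The sketch below applies it to the Collatz search from the question, with long arithmetic for the chain values; the class and method names are made up for the example, and ties are resolved in favour of the smaller starting number, which is my own choice.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class ChunkedCollatz
{
    // Number of steps needed to reach 1; long avoids the int overflow.
    public static int ChainLength(long n)
    {
        int count = 0;
        while (n != 1)
        {
            n = n % 2 == 0 ? n / 2 : 3 * n + 1;
            count++;
        }
        return count;
    }

    public static int LongestChainStarter(int fromInclusive, int toExclusive)
    {
        object locker = new object();
        int bestStarter = 0, bestLength = -1;
        // Each range is a Tuple<int, int> of (inclusive start, exclusive end).
        Parallel.ForEach(Partitioner.Create(fromInclusive, toExclusive), range =>
        {
            int localStarter = 0, localLength = -1;
            for (int i = range.Item1; i < range.Item2; i++)
            {
                int length = ChainLength(i);
                if (length > localLength)
                {
                    localLength = length;
                    localStarter = i;
                }
            }
            // Merge once per chunk, comparing inside the lock.
            lock (locker)
            {
                if (localLength > bestLength ||
                    (localLength == bestLength && localStarter < bestStarter))
                {
                    bestLength = localLength;
                    bestStarter = localStarter;
                }
            }
        });
        return bestStarter;
    }

    public static void Main()
    {
        Console.WriteLine(LongestChainStarter(1, 1000001));
    }
}
```

Keeping a local best per chunk means the shared lock is taken once per range rather than once per number.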
I ran the following experiment: I generated 10 million random numbers in C and in C#, and then counted how many times each of the 15 bits in the random integer was set. (I chose 15 bits because C supports random integers only up to 0x7fff.)
What I got is this:
I have two questions:
Why are there 3 most probable bits? In the C case, bits 8, 10 and 12 are the most probable, and in C#, bits 6, 8 and 11 are the most probable.
Also, it seems that the most probable bits in C# are mostly shifted by 2 positions compared to the most probable bits in C. Why is this? Is it because C# uses a different RAND_MAX constant, or something else?
My test code for C:
#include <stdio.h>
#include <stdlib.h>
void accumulateResults(int random, int bitSet[15]) {
    int i;
    int isBitSet;
    for (i=0; i < 15; i++) {
        isBitSet = ((random & (1<<i)) != 0);
        bitSet[i] += isBitSet;
    }
}
int main() {
    int i;
    int bitSet[15] = {0};
    int times = 10000000;
    srand(0);
    for (i=0; i < times; i++) {
        accumulateResults(rand(), bitSet);
    }
    for (i=0; i < 15; i++) {
        printf("%d : %d\n", i , bitSet[i]);
    }
    system("pause");
    return 0;
}
And test code for C#:
static void accumulateResults(int random, int[] bitSet)
{
    int i;
    int isBitSet;
    for (i = 0; i < 15; i++)
    {
        isBitSet = ((random & (1 << i)) != 0) ? 1 : 0;
        bitSet[i] += isBitSet;
    }
}
static void Main(string[] args)
{
    int i;
    int[] bitSet = new int[15];
    int times = 10000000;
    Random r = new Random();
    for (i = 0; i < times; i++)
    {
        accumulateResults(r.Next(), bitSet);
    }
    for (i = 0; i < 15; i++)
    {
        Console.WriteLine("{0} : {1}", i, bitSet[i]);
    }
    Console.ReadKey();
}
Many thanks! By the way, the OS is Windows 7 (64-bit) with Visual Studio 2010.
EDIT
Many thanks to @David Heffernan. I made several mistakes here:
The seeds in the C and C# programs were different (C was using zero and C# the current time).
I didn't try the experiment with different values of the times variable to check the reproducibility of the results.
Here's what I got when I analyzed how the probability that the first bit is set depends on the number of times random() was called:
So as many noticed - results are not reproducible and shouldn't be taken seriously.
(Except as some form of confirmation that C/C# PRNG are good enough :-) ).
This is just common or garden sampling variation.
Imagine an experiment where you toss a coin ten times, repeatedly. You would not expect to get five heads every single time. That's down to sampling variation.
In just the same way, your experiment will be subject to sampling variation. Each bit follows the same statistical distribution. But sampling variation means that you would not expect an exact 50/50 split between 0 and 1.
Now, your plot is misleading you into thinking the variation is somehow significant or carries meaning. You'd get a much better understanding of this if you plotted the Y axis of the graph starting at 0. That graph looks like this:
If the RNG behaves as it should, then the count for each bit will follow the binomial distribution with probability 0.5. This distribution has variance np(1 − p). For your experiment this gives a variance of 2.5 million. Take the square root to get a standard deviation of around 1,580. So you can see, simply from inspecting your results, that the variation you see is not obviously out of the ordinary. You have 15 samples and none are more than 1.6 standard deviations from the true mean. That's nothing to worry about.
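Spelling out that arithmetic with n = 10,000,000 draws and p = 0.5 for each bit (the idealised model, assuming independent and unbiased bits):

```latex
\mu = np = 10^{7} \times 0.5 = 5 \times 10^{6}

\sigma^{2} = np(1 - p) = 10^{7} \times 0.5 \times 0.5 = 2.5 \times 10^{6}

\sigma = \sqrt{2.5 \times 10^{6}} \approx 1581,
\qquad
\frac{\sigma}{\mu} \approx 0.03\%
```

So a bit count that differs from 5,000,000 by a couple of thousand is within about one to two standard deviations.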
You have attempted to discern trends in the results. You have said that there are "3 most probable bits". That's only your particular interpretation of this sample. Try running your programs again with different seeds for your RNGs and you will have graphs that look a little different. They will still have the same quality to them. Some bits are set more than others. But there won't be any discernible patterns, and when you plot them on a graph that includes 0, you will see horizontal lines.
For example, here's what your C program outputs for a random seed of 98723498734.
I think this should be enough to persuade you to run some more trials. When you do so you will see that there are no special bits that are given favoured treatment.
You know that the deviation is about 2500/5,000,000, which comes down to 0.05%?
Note that the difference of frequency of each bit varies by only about 0.08% (-0.03% to +0.05%). I don't think I would consider that significant. If every bit were exactly equally probable, I would find the PRNG very questionable instead of just somewhat questionable. You should expect some level of variance in processes that are supposed to be more or less modelling randomness...