I need to generate a random number between 0 and 1 in C#. It doesn't need to be more accurate than to a single decimal place but it's not a problem if it is.
I can either do Random.Next(0, 10) / 10.0 or Random.NextDouble().
I could not find any concrete information on the time complexity of either method. I assume Random.Next() will be more efficient as in Java, however the addition of the division (the complexity of which would depend on the method used by C#) complicates things.
Is it possible to find out which is more efficient purely from a theoretical standpoint? I realise I can time both over a series of tests, but want to understand why one has better complexity than the other.
Looking at the implementation source code, NextDouble() will be more efficient.
NextDouble() simply calls the Sample() method:
public virtual double NextDouble() {
return Sample();
}
Next(maxValue) performs a comparison on maxValue, calls Sample(), multiplies the value by maxValue, converts it to int and returns it:
public virtual int Next(int maxValue) {
if (maxValue<0) {
throw new ArgumentOutOfRangeException("maxValue", Environment.GetResourceString("ArgumentOutOfRange_MustBePositive", "maxValue"));
}
Contract.EndContractBlock();
return (int)(Sample()*maxValue);
}
So, as you can see, Next(maxValue) is doing the same work as NextDouble() and then doing some more, so NextDouble() will be more efficient in returning a number between 0 and 1.
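To make the comparison concrete, here is a minimal sketch of the two options from the question side by side. The class and method names (UnitIntervalSamples, NextTenth, NextUnit) are made up for illustration; only Random.Next and Random.NextDouble come from the framework.

```csharp
using System;

static class UnitIntervalSamples
{
    // Option 1: pick an integer in [0, 10) and scale it down.
    // The result is one of 0.0, 0.1, ..., 0.9 and is never 1.0.
    public static double NextTenth(Random rng) => rng.Next(0, 10) / 10.0;

    // Option 2: a full-precision double in [0.0, 1.0), with no extra arithmetic.
    public static double NextUnit(Random rng) => rng.NextDouble();
}
```

Neither variant ever returns exactly 1.0. If the upper bound must be reachable, Next(0, 11) / 10.0 would be needed instead.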
For Mono users, you can see NextDouble() and Next(maxValue) implementations here. Mono does it a little differently, but it basically involves the same steps as the official implementation.
As Zoran says, you would need to be generating a huge amount of random numbers to notice a difference.
Either way, you'll be able to generate many many millions, if not billions, of random numbers every second. Do you really need that many?
On a more concrete level, both variants have time complexity O(1), meaning that you could measure the time difference between the two methods and that would be it.
Random generator = new Random();
int count = 1_000_000;
Stopwatch sw = new Stopwatch();
sw.Start();
double res;
for (int i = 0; i < count; i++)
res = generator.Next(0, 10) / 10.0;
sw.Stop();
Stopwatch sw1 = new Stopwatch();
sw1.Start();
for (int i = 0; i < count; i++)
res = generator.NextDouble();
sw1.Stop();
Console.WriteLine($"{sw.ElapsedMilliseconds} - {sw1.ElapsedMilliseconds}");
This code prints 44 msec : 29 msec on my computer. And again - I don't think that you should optimize an operation which takes 44 milliseconds on a million executions.
If 15 nanoseconds per execution still makes the difference, then the second method is one tiny bit faster.
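If you want to repeat the measurement yourself, a slightly more careful harness might look like the sketch below. MicroBench and NanosecondsPerCall are hypothetical names, and the delegate call adds its own overhead to both candidates, so treat the numbers as a rough comparison rather than an exact cost.

```csharp
using System;
using System.Diagnostics;

static class MicroBench
{
    // Returns the average cost of one call to op, in nanoseconds.
    public static double NanosecondsPerCall(Func<double> op, int count)
    {
        double sink = 0;

        // Warm-up pass so JIT compilation is not part of the measurement.
        for (int i = 0; i < 1000; i++) sink += op();

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < count; i++) sink += op();
        sw.Stop();

        // Use the accumulated value so the loops are not optimized away.
        GC.KeepAlive(sink);

        // Milliseconds to nanoseconds, then divide by the number of calls.
        return sw.Elapsed.TotalMilliseconds * 1_000_000.0 / count;
    }
}

// Example call (rng is a Random instance created by the caller):
//   double ns = MicroBench.NanosecondsPerCall(() => rng.Next(0, 10) / 10.0, 1_000_000);
```

Summing the results into sink keeps the work observable, which the original loop does not do.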
Context of this is a function, which needs to run pretty much once per frame, and is therefore very critical performance-wise. This function contains a loop, and operations inside it.
private int MyFunction(int number)
{
// Code
for (int i = 0; i <= 10000; i++)
{
var value = i * number;
var valuePow2 = value * value;
// Some code which uses valuePow2 several times
}
return 0; // Not actual line
}
Now, because of mathematical properties, we know that (a * b)² is equal to a² * b²
So, it would be possible to make my function into this:
private int MyFunction(int number)
{
// Code
var numberPow2 = number * number;
for (int i = 0; i <= 10000; i++)
{
var iPow2 = i * i;
var valuePow2 = numberPow2 * iPow2;
// Some code which uses valuePow2 several times
}
return 0; // Not actual line
}
Intuitively, this seems like it should be faster, since number² does not vary, and is now only calculated once outside of the loop. At the very least, this would be much faster for a human to do, because the x² operation is done on a much smaller number during the loop.
What I am wondering, is in C#, when you use types like int, will the multiplication actually be faster with smaller numbers?
For example, will 5 * 5 execute faster than 5000 * 5000?
If so, then the second version is better, even if by a small margin, because half of the calculations are done on smaller numbers.
But if, for a given data type, the time is constant, then the first version of the function is better, because both versions do the same number of multiplications inside the loop, and the second version does one extra multiplication before the loop starts.
I am aware that for all intents and purposes, the performance difference is negligible. The second version was suggested to me in a Code Review because the function is critical, and I can't find any documentation to support either view.
For example, will 5 * 5 execute faster than 5000 * 5000?
For compile-time constants, 5 * x is cheaper than 5000 * x because the former can be done with lea eax, [rdi + rdi*4].
But for runtime variables, the only integer instruction with data-dependent performance is division. This applies on any mainstream CPU: pipelining is so important that even if some cases could run with lower latency, they typically don't because that makes scheduling harder. (You can't have the same execution unit produce 2 results in the same cycle; instead the CPU just wants to know that putting inputs in on one cycle will definitely result in the answer coming out 3 cycles later.)
(For FP, again only division and sqrt have data-dependent performance on normal CPUs.)
Code using integers or FP that has any data-dependent branching can be much slower if the branches go a different way. (e.g. branch prediction is "trained" on one sequence of jumps for a binary search; searching with another key will be slower because it will mispredict at least once.)
And for the record, suggestions to use Math.Pow instead of integer * are insane. Simply converting an integer to double and back is slower than multiplying by itself with integer multiply.
Adam's answer links a benchmark that's looping over a big array, with auto-vectorization possible. SSE / AVX2 only has packed integer multiply for elements up to 32 bits, not 64-bit.
And 64-bit takes more memory bandwidth. That's also why it shows speedups for 16 and 8-bit integers. So it finds c=a*b running at half speed on a Haswell CPU, but that does not apply to your loop case.
In scalar code, imul r64, r64 has identical performance to imul r32, r32 on Intel mainstream CPUs (since at least Nehalem), and on Ryzen (https://agner.org/optimize/). Both 1 uop, 3 cycle latency, 1/clock throughput.
It's only AMD Bulldozer-family, and Intel Atom and Silvermont, where 64-bit scalar multiply is slower. (Assuming 64-bit mode of course! In 32-bit mode, working with 64-bit integers is slower.)
Optimizing your loop
For a fixed value of number, instead of recalculating i*number, compilers can and will optimize this to inum += number. This is called a strength-reduction optimization, because addition is a "weaker" (slightly cheaper) operation than multiplication.
for(...) {
var value = i * number;
var valuePow2 = value * value;
}
can be compiled into asm that does something like
var value = 0;
for(...) {
var valuePow2 = value * value;
...
value += number;
}
You might try writing it by hand that way, in case the compiler isn't doing it for you.
But integer multiplication is very cheap and fully pipelined, especially on modern CPUs. It has slightly higher latency than add, and can run on fewer ports (usually only 1 per clock throughput instead of 4 for add), but you say you're doing significant work with valuePow2. That should let out-of-order execution hide the latency.
If you check the asm and the compiler is using a separate loop counter incrementing by 1, you could also try to hand-hold your compiler into optimizing the loop to use value as the loop counter.
var maxval = number * 10000;
// Assumes number > 0; with number == 0, value would never advance past maxval.
for (var value = 0; value <= maxval; value += number) {
var valuePow2 = value * value;
...
}
Be careful if number*10000 can overflow, if you need it to wrap correctly. In that case this loop would run far fewer iterations. (Unless number is so big that value += number also wraps...)
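As a sanity check on the strength-reduction idea, the sketch below computes the squares both ways so the results can be compared. It is not what the JIT actually emits. StrengthReduction and its method names are hypothetical, and long is used so that neither i * number nor the running sum can wrap for the loop bounds in the question.

```csharp
using System;

static class StrengthReduction
{
    // Direct form: one multiply to get value, one more to square it.
    public static long[] SquaresDirect(int number, int count)
    {
        var result = new long[count];
        for (int i = 0; i < count; i++)
        {
            long value = (long)i * number;
            result[i] = value * value;
        }
        return result;
    }

    // Strength-reduced form: value advances by addition instead of i * number.
    public static long[] SquaresByAddition(int number, int count)
    {
        var result = new long[count];
        long value = 0;
        for (int i = 0; i < count; i++)
        {
            result[i] = value * value;
            value += number;
        }
        return result;
    }
}
```

With 10001 iterations the square itself still fits in a long as long as number stays below roughly 300,000 in magnitude.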
For a typical processor, multiplying two 32-bit integers will take the same number of cycles regardless of the data in those integers. On some processors, and whenever the code runs as a 32-bit process, multiplying 64-bit integers takes noticeably longer than multiplying 32-bit integers.
I did notice a problem in both versions of your code. When you multiply two ints, the result is of type int. The var keyword gives the variable the type of that result, which means valuePow2 will be an int.
Since your loop goes up to 10000, if number is 5 or greater, then you will overflow valuePow2.
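The threshold can be checked with a small sketch that squares the same value once in 32-bit arithmetic and once in 64-bit arithmetic. OverflowDemo and SquareBothWays are hypothetical names used only for this illustration.

```csharp
using System;

static class OverflowDemo
{
    public static (int Wrapped, long Exact) SquareBothWays(int i, int number)
    {
        int value = i * number;                 // fits in int for the inputs discussed here
        int wrapped = unchecked(value * value); // 32-bit multiply, silently wraps on overflow
        long exact = (long)value * value;       // widened first, so the multiply is 64-bit
        return (wrapped, exact);
    }
}
```

For i = 10000 and number = 4 both results are 1,600,000,000. For number = 5 the exact square is 2,500,000,000, which is above int.MaxValue (2,147,483,647), so the 32-bit result wraps around to a negative number.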
If you don't want to overflow your int, you could change your code to
private int MyFunction(int number)
{
// Code
for (int i = 0; i <= 10000; i++)
{
long value = (long)i * number; //64bit multiplication (the cast widens before multiplying)
long valuePow2 = value * value; //64bit multiplication
// Some code which uses valuePow2 several times
}
return 0; // Not actual line
}
The modified code could be faster, because it turns one of the 64-bit multiplications into a 32-bit multiplication:
private int MyFunction(int number)
{
// Code
long numberPow2 = (long)number * number; //64bit multiplication (the cast widens before multiplying)
for (int i = 0; i <= 10000; i++)
{
int iPow2 = i * i; //32bit multiplication
long valuePow2 = numberPow2 * iPow2; //64bit multiplication
// Some code which uses valuePow2 several times
}
return 0; // Not actual line
}
But the circuitry in the CPU and the optimization of the compiler can change how many cycles this ends up running.
At the end of the day, you said it best:
I am aware that for all intents and purposes, the performance difference is negligible.
Given this simple piece of code and an array of 10 million random numbers:
static int Main(string[] args)
{
int size = 10000000;
int num = 10; //increase num to reduce number of buckets
int numOfBuckets = size/num;
int[] ar = new int[size];
Random r = new Random(); //initialize with random numbers
for (int i = 0; i < size; i++)
ar[i] = r.Next(size);
var s = new Stopwatch();
s.Start();
var group = ar.GroupBy(i => i / num);
var l = group.Count();
s.Stop();
Console.WriteLine(s.ElapsedMilliseconds);
Console.ReadLine();
return 0;
}
I did some performance testing on grouping: when the number of buckets is 10k the execution time is about 0.7s, for 100k buckets it is 2s, and for 1m buckets it is 7.5s.
I wonder why that is. I imagine that if GroupBy is implemented using a hash table, there might be a problem with collisions. For example, the hash table is initially prepared to work for, let's say, 1000 groups, and then when the number of groups grows it needs to increase its size and rehash. If this were the case, I could write my own grouping where I initialize the hash table with the expected number of buckets. I did that, but it was only slightly faster.
So my question is: why does the number of buckets influence GroupBy performance that much?
EDIT:
Running in release mode changes the results to 0.55s, 1.6s, 6.5s respectively.
I also changed the group.ToArray to the piece of code below, just to force execution of the grouping:
foreach (var g in group)
array[g.Key] = 1;
where array is initialized before timer with appropriate size, the results stayed almost the same.
EDIT2:
You can see the working code from mellamokb in here pastebin.com/tJUYUhGL
I'm pretty certain this is showing the effects of memory locality (various levels of caching) and also object allocation.
To verify this, I took three steps:
Improve the benchmarking to avoid unnecessary parts and to garbage collect between tests
Remove the LINQ part by populating a Dictionary (which is effectively what GroupBy does behind the scenes)
Remove even Dictionary<,> and show the same trend for plain arrays.
In order to show this for arrays, I needed to increase the input size, but it does show the same kind of growth.
Here's a short but complete program which can be used to test both the dictionary and the array side - just flip which line is commented out in the middle:
using System;
using System.Collections.Generic;
using System.Diagnostics;
class Test
{
const int Size = 100000000;
const int Iterations = 3;
static void Main()
{
int[] input = new int[Size];
// Use the same seed for repeatability
var rng = new Random(0);
for (int i = 0; i < Size; i++)
{
input[i] = rng.Next(Size);
}
// Switch to PopulateArray to change which method is tested
Func<int[], int, TimeSpan> test = PopulateDictionary;
for (int buckets = 10; buckets <= Size; buckets *= 10)
{
TimeSpan total = TimeSpan.Zero;
for (int i = 0; i < Iterations; i++)
{
// Switch which line is commented to change the test
// total += PopulateDictionary(input, buckets);
total += PopulateArray(input, buckets);
GC.Collect();
GC.WaitForPendingFinalizers();
}
Console.WriteLine("{0,9}: {1,7}ms", buckets, (long) total.TotalMilliseconds);
}
}
static TimeSpan PopulateDictionary(int[] input, int buckets)
{
int divisor = input.Length / buckets;
var dictionary = new Dictionary<int, int>(buckets);
var stopwatch = Stopwatch.StartNew();
foreach (var item in input)
{
int key = item / divisor;
int count;
dictionary.TryGetValue(key, out count);
count++;
dictionary[key] = count;
}
stopwatch.Stop();
return stopwatch.Elapsed;
}
static TimeSpan PopulateArray(int[] input, int buckets)
{
int[] output = new int[buckets];
int divisor = input.Length / buckets;
var stopwatch = Stopwatch.StartNew();
foreach (var item in input)
{
int key = item / divisor;
output[key]++;
}
stopwatch.Stop();
return stopwatch.Elapsed;
}
}
Results on my machine:
PopulateDictionary:
10: 10500ms
100: 10556ms
1000: 10557ms
10000: 11303ms
100000: 15262ms
1000000: 54037ms
10000000: 64236ms // Why is this slower? See later.
100000000: 56753ms
PopulateArray:
10: 1298ms
100: 1287ms
1000: 1290ms
10000: 1286ms
100000: 1357ms
1000000: 2717ms
10000000: 5940ms
100000000: 7870ms
An earlier version of PopulateDictionary used an Int32Holder class, and created one for each bucket (when the lookup in the dictionary failed). This was faster when there was a small number of buckets (presumably because we were only going through the dictionary lookup path once per iteration instead of twice) but got significantly slower, and ended up running out of memory. This would contribute to fragmented memory access as well, of course. Note that PopulateDictionary specifies the capacity to start with, to avoid effects of data copying within the test.
The aim of using the PopulateArray method is to remove as much framework code as possible, leaving less to the imagination. I haven't yet tried using an array of a custom struct (with various different struct sizes) but that may be something you'd like to try too.
EDIT: I can reproduce the oddity of the slower result for 10000000 than 100000000 at will, regardless of test ordering. I don't understand why yet. It may well be specific to the exact processor and cache I'm using...
--EDIT--
The reason why 10000000 is slower than the 100000000 results has to do with the way hashing works. A few more tests explain this.
First off, let's look at the operations. There's Dictionary.FindEntry, which is used in the [] indexing and in Dictionary.TryGetValue, and there's Dictionary.Insert, which is used in the [] indexing and in Dictionary.Add. If we would just do a FindEntry, the timings would go up as we expect it:
static TimeSpan PopulateDictionary1(int[] input, int buckets)
{
int divisor = input.Length / buckets;
var dictionary = new Dictionary<int, int>(buckets);
var stopwatch = Stopwatch.StartNew();
foreach (var item in input)
{
int key = item / divisor;
int count;
dictionary.TryGetValue(key, out count);
}
stopwatch.Stop();
return stopwatch.Elapsed;
}
This implementation doesn't have to deal with hash collisions (because there are none), which makes the behavior as we expect it. Once we start dealing with collisions, the timings start to drop. If we have as many buckets as elements, there are obviously fewer collisions... To be exact, we can figure out exactly how many collisions there are by doing:
static TimeSpan PopulateDictionary(int[] input, int buckets)
{
int divisor = input.Length / buckets;
int c1, c2;
c1 = c2 = 0;
var dictionary = new Dictionary<int, int>(buckets);
var stopwatch = Stopwatch.StartNew();
foreach (var item in input)
{
int key = item / divisor;
int count;
if (!dictionary.TryGetValue(key, out count))
{
dictionary.Add(key, 1);
++c1;
}
else
{
count++;
dictionary[key] = count;
++c2;
}
}
stopwatch.Stop();
Console.WriteLine("{0}:{1}", c1, c2);
return stopwatch.Elapsed;
}
The result is something like this:
10:99999990
10: 4683ms
100:99999900
100: 4946ms
1000:99999000
1000: 4732ms
10000:99990000
10000: 4964ms
100000:99900000
100000: 7033ms
1000000:99000000
1000000: 22038ms
9999538:90000462 <<-
10000000: 26104ms
63196841:36803159 <<-
100000000: 25045ms
Note the value of '36803159'. This answers the question of why the last result is faster than the one before it: it simply has to do fewer operations, and since caching fails anyway, that factor doesn't make a difference anymore.
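The split between new keys and repeated keys can also be estimated without running the test. The sketch below assumes the keys are independent and uniformly distributed, which is only an approximation of what Random.Next produces; DistinctKeys and Expected are hypothetical names.

```csharp
using System;

static class DistinctKeys
{
    // Expected number of distinct keys after `draws` uniform picks
    // from `buckets` equally likely keys.
    public static double Expected(double draws, double buckets)
    {
        // Probability that one particular key is never picked.
        double missProbability = Math.Pow(1.0 - 1.0 / buckets, draws);
        return buckets * (1.0 - missProbability);
    }
}
```

For 100,000,000 draws over 100,000,000 possible keys this gives about 63.2 million distinct keys, in line with the 63196841 measured above. For 10,000,000 keys nearly every key is expected to occur, which matches the 9999538 in the previous row.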
10k the estimated execution time is 0.7s, for 100k buckets it is 2s, for 1m buckets it is 7.5s.
This is an important pattern to recognize when you profile code. It is one of the standard size vs execution time relationships in software algorithms. Just from seeing the behavior, you can tell a lot about the way the algorithm was implemented. And the other way around of course, from the algorithm you can predict the expected execution time. A relationship that's annotated in the Big Oh notation.
The speediest code you can get is amortized O(1): execution time barely increases when you double the size of the problem. The Dictionary<> class behaves that way, as John demonstrated. The increase in time as the problem set gets large is the "amortized" part, a side effect of Dictionary having to perform linear O(n) searches in buckets that keep getting bigger.
A very common pattern is O(n). That tells you that there is a single for() loop in the algorithm that iterates over the collection. O(n^2) tells you there are two nested for() loops. O(n^3) has three, etcetera.
What you got is the one in between, O(log n). It is the standard complexity of a divide-and-conquer algorithm. In other words, each pass splits the problem in two, continuing with the smaller set. It is very common and you see it in sorting algorithms. Binary search is the one you find in your textbook. Note how log₂(10) ≈ 3.3, very close to the increment you see in your test. Perf starts to tank a bit for very large sets due to the poor locality of reference, a CPU cache problem that's always associated with O(log n) algorithms.
The one thing that John's answer demonstrates is that his guess cannot be correct, GroupBy() certainly does not use a Dictionary<>. And it is not possible by design, Dictionary<> cannot provide an ordered collection. Where GroupBy() must be ordered, it says so in the MSDN Library:
The IGrouping objects are yielded in an order based on the order of the elements in source that produced the first key of each IGrouping. Elements in a grouping are yielded in the order they appear in source.
Not having to maintain order is what makes Dictionary<> fast. Keeping order always cost O(log n), a binary tree in your text book.
Long story short, if you don't actually care about order, and you surely would not for random numbers, then you don't want to use GroupBy(). You want to use a Dictionary<>.
There are (at least) two influence factors: First, a hash table lookup only takes O(1) if you have a perfect hash function, which does not exist. Thus, you have hash collisions.
I guess more important, though, are caching effects. Modern CPUs have large caches, so for the smaller bucket count, the hash table itself might fit into the cache. As the hash table is frequently accessed, this might have a strong influence on the performance. If there are more buckets, more accesses to RAM might be necessary, which are slow compared to a cache hit.
There are a few factors at work here.
Hashes and groupings
The way grouping works is by creating a hash table. Each individual group then supports an 'add' operation, which adds an element to the group's list. To put it bluntly, it's like a Dictionary<Key, List<Value>>.
Hash tables are always overallocated. If you add an element to the hash, it checks if there is enough capacity, and if not, recreates the hash table with a larger capacity (To be exact: new capacity = count * 2 with count the number of groups). However, a larger capacity means that the bucket index is no longer correct, which means you have to re-build the entries in the hash table. The Resize() method in Lookup<Key, Value> does this.
The 'groups' themselves work like a List<T>. These too are overallocated, but are easier to reallocate. To be precise: the data is simply copied (with Array.Copy in Array.Resize) and a new element is added. Since there's no re-hashing or calculation involved, this is quite a fast operation.
The initial capacity of a grouping is 7. This means, for 10 elements you need to reallocate 1 time, for 100 elements 4 times, for 1000 elements 8 times, and so on. Because you have to re-hash more elements each time, your code gets a bit slower each time the number of buckets grows.
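The reallocation counts quoted above follow from a simple doubling model, sketched below. This is a simplified model of the growth rule described in this answer, not the actual Lookup<Key, Value> source; ResizeModel and ResizesNeeded are hypothetical names.

```csharp
using System;

static class ResizeModel
{
    // Counts how many times a table that starts at initialCapacity
    // and doubles when full must grow before it can hold `groups` entries.
    public static int ResizesNeeded(int groups, int initialCapacity = 7)
    {
        if (initialCapacity <= 0)
            throw new ArgumentOutOfRangeException(nameof(initialCapacity));

        long capacity = initialCapacity;
        int resizes = 0;
        while (capacity < groups)
        {
            capacity *= 2;
            resizes++;
        }
        return resizes;
    }
}
```

Under this model, 10 groups need 1 resize (7 to 14), 100 groups need 4 (up to 112) and 1000 groups need 8 (up to 1792).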
I think these overallocations are the largest contributors to the small growth in the timings as the number of buckets grows. The easiest way to test this theory is to do no overallocations at all (test 1), and simply put counters in an array. The result can be seen below in the code for FixArrayTest (or if you like FixBucketTest, which is closer to how groupings work). As you can see, the timings of # buckets = 10...10000 are the same, which is correct according to this theory.
Cache and random
Caching and random number generators aren't friends.
Our little test also shows that when the number of buckets grows above a certain threshold, memory comes into play. On my computer this is at an array size of roughly 4 MB (4 * number of buckets). Because the data is random, random chunks of RAM will be loaded and unloaded into the cache, which is a slow process. This also explains the large jump in the timings. To see this in action, change the random numbers to a sequence (called 'test 2'), and - because the data pages can now be cached - the speed will remain the same overall.
Note that hashes overallocate, so you will hit the mark before you have a million entries in your grouping.
Test code
static void Main(string[] args)
{
int size = 10000000;
int[] ar = new int[size];
//random number init with numbers [0,size-1]
var r = new Random();
for (var i = 0; i < size; i++)
{
ar[i] = r.Next(0, size);
//ar[i] = i; // Test 2 -> uncomment to see the effects of caching more clearly
}
Console.WriteLine("Fixed dictionary:");
for (var numBuckets = 10; numBuckets <= 1000000; numBuckets *= 10)
{
var num = (size / numBuckets);
var timing = 0L;
for (var i = 0; i < 5; i++)
{
timing += FixBucketTest(ar, num);
//timing += FixArrayTest(ar, num); // test 1
}
var avg = ((float)timing) / 5.0f;
Console.WriteLine("Avg Time: " + avg + " ms for " + numBuckets);
}
Console.WriteLine("Fixed array:");
for (var numBuckets = 10; numBuckets <= 1000000; numBuckets *= 10)
{
var num = (size / numBuckets);
var timing = 0L;
for (var i = 0; i < 5; i++)
{
timing += FixArrayTest(ar, num); // test 1
}
var avg = ((float)timing) / 5.0f;
Console.WriteLine("Avg Time: " + avg + " ms for " + numBuckets);
}
}
static long FixBucketTest(int[] ar, int num)
{
// This test shows that timings will not grow for the smaller numbers of buckets if you don't have to re-allocate
System.Diagnostics.Stopwatch s = new Stopwatch();
s.Start();
var grouping = new Dictionary<int, List<int>>(ar.Length / num + 1); // exactly the right size
foreach (var item in ar)
{
int idx = item / num;
List<int> ll;
if (!grouping.TryGetValue(idx, out ll))
{
grouping.Add(idx, ll = new List<int>());
}
//ll.Add(item); //-> this would complete a 'grouper'; however, we don't want the overallocator of List to kick in
}
s.Stop();
return s.ElapsedMilliseconds;
}
// Test with arrays
static long FixArrayTest(int[] ar, int num)
{
System.Diagnostics.Stopwatch s = new Stopwatch();
s.Start();
int[] buf = new int[(ar.Length / num + 1) * 10];
foreach (var item in ar)
{
int code = (item & 0x7FFFFFFF) % buf.Length;
buf[code]++;
}
s.Stop();
return s.ElapsedMilliseconds;
}
When executing bigger calculations, less physical memory is available on the computer, and counting the buckets will be slower with less memory; as you expand the number of buckets, the available memory decreases.
Try something like the following:
int size = 2500000; //10000000 divided by 4
int[] ar = new int[size];
//random number init with numbers [0,size-1]
System.Diagnostics.Stopwatch s = new Stopwatch();
s.Start();
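int num = 10; // bucket divisor, as in the question
for (int i = 0; i<4; i++)
{
// the lambda parameter must not reuse the loop variable's name
var group = ar.GroupBy(x => x / num);
//the number of expected buckets is size / num.
var l = group.ToArray();
}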
s.Stop();
That is, calculate 4 times with smaller inputs.
There are several ways of creating a random bool in C#:
Using Random.Next(): rand.Next(2) == 0
Using Random.NextDouble(): rand.NextDouble() > 0.5
Is there really a difference? If so, which one actually has the better performance? Or is there another way I did not see, that might be even faster?
The first option - rand.Next(2) executes behind the scenes the following code:
if (maxValue < 0)
{
throw new ArgumentOutOfRangeException("maxValue",
Environment.GetResourceString("ArgumentOutOfRange_MustBePositive", new object[] { "maxValue" }));
}
return (int) (this.Sample() * maxValue);
and for the second option - rand.NextDouble():
return this.Sample();
Since the first option contains maxValue validation, multiplication and casting, the second option is probably faster.
Small enhancement for the second option:
According to MSDN
public virtual double NextDouble()
returns
A double-precision floating point number greater than or equal to 0.0, and less than 1.0.
So if you want an evenly spread random bool you should use >= 0.5
rand.NextDouble() >= 0.5
Range 1: [0.0 ... 0.5[
Range 2: [0.5 ... 1.0[
|Range 1| = |Range 2|
The fastest. Calling the method Random.Next has the least overhead. The extension method below runs 20% faster than Random.NextDouble() > 0.5, and 35% faster than Random.Next(2) == 0.
public static bool NextBoolean(this Random random)
{
return random.Next() > (Int32.MaxValue / 2);
// Next() returns an int in the range [0..Int32.MaxValue), upper bound excluded
}
Faster than the fastest. It is possible to generate random booleans with the Random class even faster, by using tricks. The 31 significant bits of a generated int can be used for 31 subsequent boolean productions. The implementation below is 40% faster than the version above that was declared the fastest.
public class RandomEx : Random
{
private uint _boolBits;
public RandomEx() : base() { }
public RandomEx(int seed) : base(seed) { }
public bool NextBoolean()
{
_boolBits >>= 1;
if (_boolBits <= 1) _boolBits = (uint)~this.Next();
return (_boolBits & 1) == 0;
}
}
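The same bit-reuse idea can also be written with an explicit counter instead of a sentinel bit, which some readers may find easier to follow. This is a sketch of the technique rather than a drop-in replacement for the class above: BufferedBoolSource is a hypothetical name, it wraps a Random instead of inheriting from it, and no timing claim is made for it.

```csharp
using System;

sealed class BufferedBoolSource
{
    private readonly Random _rng;
    private int _bits;      // buffered random bits
    private int _remaining; // number of unused bits left in _bits

    public BufferedBoolSource(Random rng)
    {
        _rng = rng ?? throw new ArgumentNullException(nameof(rng));
    }

    public bool NextBoolean()
    {
        if (_remaining == 0)
        {
            // Next() is non-negative, so it carries 31 usable bits.
            _bits = _rng.Next();
            _remaining = 31;
        }

        bool result = (_bits & 1) == 1; // consume the lowest bit
        _bits >>= 1;
        _remaining--;
        return result;
    }
}
```

One call to Next() now serves 31 calls to NextBoolean(), which is where the saving comes from.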
I ran tests with stopwatch. 100,000 iterations:
System.Random rnd = new System.Random();
if (rnd.Next(2) == 0)
trues++;
CPUs like integers, so the Next(2) method was faster. 3,700 versus 7,500ms, which is quite substantial.
Also: I think random numbers can be a bottleneck, I created around 50 every frame in Unity, even with a tiny scene that noticeably slowed down my system, so I also was hoping to find a method to create a random bool.
So I also tried
if (System.DateTime.Now.Millisecond % 2 == 0)
trues++;
but calling a static function was even slower with 9,600ms. Worth a shot.
Finally I skipped the comparison and only created 100,000 random values, to make sure the int vs. double comparison did not influence the elapsed time, but the result was pretty much the same.
Based on the answers here, below is the code I used to generate a random bool value.
Random rand = new Random();
bool randomBool = rand.NextDouble() >= 0.5;
References:
Generate a random boolean
https://stackoverflow.com/a/28763727/2218697
As I am calling an unmanaged DLL from C#, I've done some testing of the performance of a for-loop in C# and in C.
The result amazed me: as the loop covers a bigger range, the performance of C# decreases compared to C. For a smaller range, C# performed better than C, but as the upper range of the for-loop increases, C# performance degrades compared to C.
Here is my testing code:
[DllImport("Testing.dll", CallingConvention = CallingConvention.Cdecl)]
public static extern int SumLoop(int lowLimit, int highLimit);
public static void Main(string[] args)
{
const int LowerRange = 1;
const int HigherRange = 1000000;
// Test with C# For Loop
var watch1 = new Stopwatch();
watch1.Start();
int sum = 0;
for (int i = LowerRange; i <= HigherRange; i++)
{
sum += i;
}
watch1.Stop();
long elapseTime1 = watch1.ElapsedMilliseconds;
// Test with C-for loop
var watch2 = new Stopwatch();
watch2.Start();
int sumFromC = SumLoop(LowerRange , HigherRange);
watch2.Stop();
long elapseTime2 = watch2.ElapsedMilliseconds;
}
Testing.dll:
__declspec(dllexport) int SumLoop(int lowLimit, int highLimit)
{
int idx;
int totalSum = 0;
for(idx = lowLimit;idx<= highLimit; idx= idx +1)
{
totalSum += idx;
}
return totalSum;
}
Testing Result :
Testing 1 :
HigherRange : 1000000
C# Loop : 4 millisecond
C-loop : 9 millisecond
Testing 2 :
HigherRange : 10000000
C# Loop : 53 millisecond
C-loop : 36 millisecond
Testing 3 :
HigherRange : 100000000
C# Loop : 418 millisecond
C-loop : 343 millisecond
I started this test expecting the C for-loop to outperform the C# loop, but the first result was exactly the opposite of what I expected, which agrees with this question. However, when I increase the upper range of the for-loop, the C version performs better than the C# one.
Now I'm wondering: is the testing approach wrong, or is this the expected performance result?
What's happening here is that you are ignoring the fixed overhead of using P/Invoke to call the C function.
The C function will be faster than the C# version, BUT because of the relatively large overhead of calling it, the C function will appear to be slower for small ranges because the calling overhead is a relatively large proportion of the overall time.
However, as you increase the size of the range, the overhead will become a smaller and smaller proportion of the overall time, until the extra speed of the C version asserts itself and you start to see it running faster.
If you look at the times for the C# function, you can see that it is indeed increasing more or less linearly with N, which you'd expect. Compare T = 4 with T = 418 after you increase N by a factor of 100. Just what you'd expect. But the C times do NOT appear to increase linearly, for the reason outlined above.
Incidentally, if you take at least two timings, you can use simultaneous equations to solve:
T = K + XN
Where K is the fixed overhead, and X is the overhead per element.
I have calculated from your timings that the fixed overhead for calling the unmanaged code is approximately 5.6 milliseconds and the overhead per element is 3.373737 x 10^-6 milliseconds.
That overhead seems somewhat large, but I guess there's some inaccuracies in the measured data.
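For reference, the figures above can be reproduced from the first and third C timings in the question (N = 1,000,000 at 9 ms and N = 100,000,000 at 343 ms). The sketch below solves the two simultaneous equations; OverheadFit and Solve are hypothetical names.

```csharp
using System;

static class OverheadFit
{
    // Solves T = K + X * N for K and X from two (N, T) measurements.
    public static (double K, double X) Solve(double n1, double t1, double n2, double t2)
    {
        if (n1 == n2)
            throw new ArgumentException("The two measurements need different N.");

        double x = (t2 - t1) / (n2 - n1); // per-element cost
        double k = t1 - x * n1;           // fixed overhead
        return (k, x);
    }
}
```

With those two timings X comes out at about 3.3737 x 10^-6 ms per element and K at about 5.6 ms. Since the timings are whole milliseconds, K is only a rough estimate.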
I ran an experiment: I generated 10 million random numbers in C and in C#, and then counted how many times each of the 15 bits in the random integer is set. (I chose 15 bits because C supports random integers only up to 0x7fff.)
What I got is this:
I have two questions:
Why are there 3 most probable bits? In the C case bits 8, 10 and 12 are most probable, and in C# bits 6, 8 and 11 are most probable.
Also, it seems that the most probable bits in C# are mostly shifted by 2 positions compared to the most probable bits in C. Why is this? Is it because C# uses a different RAND_MAX constant, or something else?
My test code for C:
#include <stdio.h>
#include <stdlib.h>

void accumulateResults(int random, int bitSet[15]) {
int i;
int isBitSet;
for (i=0; i < 15; i++) {
isBitSet = ((random & (1<<i)) != 0);
bitSet[i] += isBitSet;
}
}
int main() {
int i;
int bitSet[15] = {0};
int times = 10000000;
srand(0);
for (i=0; i < times; i++) {
accumulateResults(rand(), bitSet);
}
for (i=0; i < 15; i++) {
printf("%d : %d\n", i , bitSet[i]);
}
system("pause");
return 0;
}
And test code for C#:
static void accumulateResults(int random, int[] bitSet)
{
int i;
int isBitSet;
for (i = 0; i < 15; i++)
{
isBitSet = ((random & (1 << i)) != 0) ? 1 : 0;
bitSet[i] += isBitSet;
}
}
static void Main(string[] args)
{
int i;
int[] bitSet = new int[15];
int times = 10000000;
Random r = new Random();
for (i = 0; i < times; i++)
{
accumulateResults(r.Next(), bitSet);
}
for (i = 0; i < 15; i++)
{
Console.WriteLine("{0} : {1}", i, bitSet[i]);
}
Console.ReadKey();
}
Many thanks! By the way, the OS is Windows 7 (64-bit), with Visual Studio 2010.
EDIT
Many thanks to @David Heffernan. I made several mistakes here:
The seed in the C and C# programs was different (C was using zero and C# the current time).
I didn't try the experiment with different values of the times variable to check the reproducibility of the results.
Here's what I got when I analyzed how the probability that the first bit is set depends on the number of times random() was called:
So as many noticed - results are not reproducible and shouldn't be taken seriously.
(Except as some form of confirmation that C/C# PRNG are good enough :-) ).
This is just common or garden sampling variation.
Imagine an experiment where you toss a coin ten times, repeatedly. You would not expect to get five heads every single time. That's down to sampling variation.
In just the same way, your experiment will be subject to sampling variation. Each bit follows the same statistical distribution. But sampling variation means that you would not expect an exact 50/50 split between 0 and 1.
Now, your plot is misleading you into thinking the variation is somehow significant or carries meaning. You'd get a much better understanding of this if you plotted the Y axis of the graph starting at 0. That graph looks like this:
If the RNG behaves as it should, then each bit will follow the binomial distribution with probability 0.5. This distribution has variance np(1 − p). For your experiment this gives a variance of 2.5 million. Take the square root to get the standard deviation of roughly 1,580. So you can see simply from inspecting your results that the variation you see is not obviously out of the ordinary. You have 15 samples and none are more than 1.6 standard deviations from the true mean. That's nothing to worry about.
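The arithmetic is easy to check in code. The sketch below computes the binomial standard deviation and how many standard deviations an observed count lies from the expected value; BitStats and its method names are hypothetical.

```csharp
using System;

static class BitStats
{
    // Standard deviation of a binomial count: sqrt(n * p * (1 - p)).
    public static double StandardDeviation(double n, double p)
        => Math.Sqrt(n * p * (1.0 - p));

    // How many standard deviations an observed count lies from the expected n * p.
    public static double ZScore(double observed, double n, double p)
        => (observed - n * p) / StandardDeviation(n, p);
}
```

For n = 10,000,000 and p = 0.5 the standard deviation is about 1,581, so a bit that is set 2,500 times more often than the expected 5,000,000 is about 1.58 standard deviations out.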
You have attempted to discern trends in the results. You have said that there are "3 most probable bits". That's only your particular interpretation of this sample. Try running your programs again with different seeds for your RNGs and you will have graphs that look a little different. They will still have the same quality to them. Some bits are set more than others. But there won't be any discernible patterns, and when you plot them on a graph that includes 0, you will see horizontal lines.
For example, here's what your C program outputs for a random seed of 98723498734.
I think this should be enough to persuade you to run some more trials. When you do so you will see that there are no special bits that are given favoured treatment.
You do realise that the deviation is about 2,500/5,000,000, which comes down to 0.05%?
Note that the difference of frequency of each bit varies by only about 0.08% (-0.03% to +0.05%). I don't think I would consider that significant. If every bit were exactly equally probable, I would find the PRNG very questionable instead of just somewhat questionable. You should expect some level of variance in processes that are supposed to be more or less modelling randomness...