Looking at another question of mine, I realized that technically nothing prevents this algorithm from running for an infinite period of time (i.e., it never returns), because rand.Next(1, 100000) could theoretically keep generating values that are already in the list.
Out of curiosity: how would I calculate the probability of this happening? I assume it would be very small?
Code from other question:
Random rand = new Random();
List<Int32> result = new List<Int32>();
for (Int32 i = 0; i < 300; i++)
{
Int32 curValue = rand.Next(1, 100000);
while (result.Exists(value => value == curValue))
{
curValue = rand.Next(1, 100000);
}
result.Add(curValue);
}
On ONE given draw of a random number, the probability of repeating a value already found in the result list is
P(Collision) = i * 1/100000, where i is the number of values in the list.
That is because all possible numbers are assumed to have the same probability of being drawn (assumption of a uniform distribution) and the drawing of any number is independent from the drawing of any other number. (Strictly speaking, rand.Next(1, 100000) excludes its upper bound and so produces only 99,999 distinct values; 100,000 is used throughout as a convenient approximation.)
The probability of experiencing such a "collision" with the numbers from the list several times in a row is
P(n Collisions) = P(Collision) ^ n
where n is the number of times a collision happens
That is because the drawings are independent.
Numerically...
when the list is half full, i = 150 and
P(Collision) = 0.15% = 0.0015 and
P(2 Collisions) = 0.00000225
P(3 Collisions) = 0.000000003375
P(4 Collisions) = 0.0000000000050625
when the list is all full but for the last one, i = 299 and
P(Collision) = 0.299% = 0.00299 and
P(2 Collisions) = 0.0000089401 (approx)
P(3 Collisions) = 0.00000002673 (approx)
P(4 Collisions) = 0.000000000079925 (approx)
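For reference, these figures can be reproduced with a one-liner; a quick sketch (the helper name is mine):
double PCollisions(int i, int n)
{
    // probability of n consecutive collisions when i values are already in the list
    return Math.Pow(i / 100000.0, n);
}
// PCollisions(150, 2) == 2.25E-06, PCollisions(299, 4) ≈ 7.99E-11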
You are therefore right to assume that the probability of having to draw multiple times to find the next suitable value to add to the array is very small, and that it should not impact the overall performance of the snippet. Beware that there will be a few retries (statistically speaking), but the total number of retries will be small compared to 300.
If however the total number of items desired in the list were to increase significantly, or if the range of random numbers sought were to be reduced, P(Collision) would not be so small and the number of "retries" needed would grow accordingly. That is why other algorithms exist for drawing multiple values without replacement; most are based on the idea of using the random number as an index into an array of all the remaining values.
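For illustration, here is a sketch of that last idea using a partial Fisher–Yates shuffle over the candidate range, so that no retries are ever needed and each draw is O(1) (the variable names mirror the question's snippet):
// build the pool of all candidate values once; rand.Next(1, 100000) draws from 1..99999
Int32[] pool = Enumerable.Range(1, 99999).ToArray();
Random rand = new Random();
List<Int32> result = new List<Int32>();
for (Int32 i = 0; i < 300; i++)
{
    Int32 j = rand.Next(i, pool.Length); // pick an index among the not-yet-taken values
    Int32 tmp = pool[i]; pool[i] = pool[j]; pool[j] = tmp; // swap it into the "taken" prefix
    result.Add(pool[i]);
}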
Assuming a uniform distribution (not a bad assumption, I believe), the chance of getting the same number n times in a row is (0.00001)^n.
It's quite possible for a PRNG to generate the same number in a limited range in consecutive calls. The probability would be a function of the bit-size of the raw PRNG and the method used to reduce that size to the numeric range you want (in this case 1 - 100000).
To answer your question exactly: no, it isn't merely very small; the probability of it going on for an infinite period of time "is" 0. I say "is" because it actually tends to 0 as the number of iterations tends to infinity.
As bdares said, it will tend to 0 as (1/range)^n, with n being the number of iterations, if we can assume a uniform distribution (this says we kinda can).
This program will not halt if:
A random number is picked that is in the result set
That number generates a cycle (i.e. a loop) in the random number generator's algorithm (they all do)
All numbers in the loop are already in the result set
All random number generators eventually loop back on themselves, due to the limited number of integers possible ==> for 32-bit, only 2^32 possible values.
"Good" generators have very large loops. "Poor" algorithms yield short loops for certain values. Consult Knuth's The Art of Computer Programming for random number generators. It is a fascinating read.
Now, assume there is a cycle of (n) numbers. For your program, which loops 300 times, that means (n) <= 300. Also, the number of attempts you make before you hit a number in this cycle, plus the length of the cycle, must not be greater than 300. Therefore, if you hit the cycle on the first try, the cycle can be 300 long. If you hit it on the second try, it can only be 299 long.
Assuming that most random number generation algorithms have a reasonably flat probability distribution, the probability of hitting a 300-cycle the first time is (300/2^32), multiplied by the probability of having a 300-cycle at all (this depends on the rand algorithm), plus the probability of hitting a 299-cycle the first time (299/2^32) times the probability of having a 299-cycle, and so on and so forth. Then add up the second try, third try, all the way up to the 300-th try (which can only be a 1-cycle).
Now this is assuming that any number can take on the full 2^32 generator space. If you are limiting it to 100000 only, then in essence you increase the chance of having much shorter cycles, because multiple numbers (in the 2^32 space) can map to the same number in "real" 100000 space.
In reality, most random generator algorithms have minimum cycle lengths of > 300. A random generator implementation based on the simplest LCG (linear congruential generator, wikipedia) can have a "full period" (i.e. 2^32) with the correct choice of parameters. So it is safe to say that minimum cycle lengths are definitely > 300. If this is the case, then it depends on the mapping algorithm of the generator to map 2^32 numbers into 100000 numbers. Good mappers will not create 300-cycles, poor mappers may create short cycles.
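To make the LCG point concrete, here is a minimal sketch; the multiplier and increment are the well-known Numerical Recipes constants (my choice for the example, not what System.Random uses), which satisfy the conditions for a full 2^32 period:
uint state = 12345; // any seed works; the orbit visits all 2^32 values before repeating
uint NextLcg()
{
    state = unchecked(state * 1664525u + 1013904223u);
    return state;
}
// reducing the output into 1..99999 (e.g. NextLcg() % 99999 + 1) is the mapping step
// where, as described above, a poor choice could introduce short effective cycles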
Related
Assume I have an array of bytes which are truly random (e.g. captured from an entropy source).
byte[] myTrulyRandomBytes = MyEntropyHardwareEngine.GetBytes(8);
Now, I want to get a random double precision floating point value, but between the values of 0 and positive 1 (like the Random.NextDouble() function performs).
Simply passing an array of 8 random bytes into BitConverter.ToDouble() can yield strange results, but most importantly, the results will almost never be less than 1.
I am fine with bit-manipulation, but the formatting of floating point numbers has always been mysterious to me. I tried many combinations of bits to apply randomness to and always ended up finding the numbers were either just over 1, always VERY close to 0, or very large.
Can someone explain which bits should be made random in a double in order to make it random within the range 0 and 1?
Though working answers have been given, I'll give another one, that looks worse but isn't:
long asLong = BitConverter.ToInt64(myTrulyRandomBytes, 0);
double number = (double)(asLong & long.MaxValue) / long.MaxValue;
The issue with casting from a ulong to double is that it's not directly supported by hardware, so it compiles to this:
vxorps xmm0,xmm0,xmm0
vcvtsi2sd xmm0,xmm0,rcx ; interpret ulong as long and convert it to double
test rcx,rcx ; add fixup if it was "negative"
jge 000000000000001D
vaddsd xmm0,xmm0,mmword ptr [00000060h]
vdivsd xmm0,xmm0,mmword ptr [00000068h]
Whereas with my suggestion it will compile more nicely:
vxorps xmm0,xmm0,xmm0
vcvtsi2sd xmm0,xmm0,rcx
vdivsd xmm0,xmm0,mmword ptr [00000060h]
Both tested with the x64 JIT in .NET 4, but this applies in general; there just isn't a nice way to convert a ulong to a double.
Don't worry about the bit of entropy being lost: there are only 2^62 doubles between 0.0 and 1.0 in the first place, and most of the smaller doubles cannot be chosen, so the number of possible results is even lower.
Note that this, as well as the presented ulong examples, can result in exactly 1.0 and distributes the values with slightly differing gaps between adjacent results, because they don't divide by a power of two. You can change them to exclude 1.0 and get a slightly more uniform spacing (but see the first plot below: there is a bunch of different gap sizes, but this way it is very regular) like this:
long asLong = BitConverter.ToInt64(myTrulyRandomBytes, 0);
double number = (double)(asLong & long.MaxValue) / ((double)long.MaxValue + 1);
As a really nice bonus, you can now change the division into a multiplication (the reciprocal of a power of two is exactly representable):
long asLong = BitConverter.ToInt64(myTrulyRandomBytes, 0);
double number = (double)(asLong & long.MaxValue) * 1.08420217248550443400745280086994171142578125E-19;
Same idea for ulong, if you really want to use that.
Since you also seemed interested specifically in how to do it with double-bits trickery, I can show that too.
Because of the whole significand/exponent deal, it can't really be done in a super direct way (just reinterpreting the bits and that's it), mainly because choosing the exponent uniformly spells trouble (with a uniform exponent, the numbers are necessarily clumped preferentially near 0 since most exponents are there).
But if the exponent is fixed, it's easy to make a double that's uniform in that region. That cannot be 0 to 1 because that spans a lot of exponents, but it can be 1 to 2 and then we can subtract 1.
So first mask away the bits that won't be part of the significand:
x &= (1L << 52) - 1;
Put in the exponent (1.0 - 2.0 range, excluding 2)
x |= 0x3ff0000000000000;
Reinterpret and adjust for the offset of 1:
return BitConverter.Int64BitsToDouble(x) - 1;
Should be pretty fast, too. An unfortunate side effect is that this time it really does cost a bit of entropy, because only 52 random bits are used where there could have been 53. This way always leaves the least significant bit zero (the implicit leading bit steals a bit).
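Putting those three steps together, a minimal sketch (assuming x holds 64 random bits, e.g. from the question's byte array):
double ToUniformDouble(long x)
{
    x &= (1L << 52) - 1;     // keep 52 random bits for the significand
    x |= 0x3ff0000000000000; // exponent bits for the [1.0, 2.0) range
    return BitConverter.Int64BitsToDouble(x) - 1; // shift down to [0.0, 1.0)
}
// e.g.: double d = ToUniformDouble(BitConverter.ToInt64(myTrulyRandomBytes, 0));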
There were some concerns about the distributions, which I will address now.
The approach of choosing a random (u)long and dividing it by the maximum value clearly starts from a uniformly chosen (u)long, and what happens after that is actually interesting. The result can justifiably be called a uniform distribution, but if you look at it as a discrete distribution (which it actually is) it looks (qualitatively) like this (all examples are for minifloats):
Ignore the "thicker" lines and wider gaps, that's just the histogram being funny. These plots used division by a power of two, so there is no spacing problem in reality, it's only plotted strangely.
Top is what happens when you use too many bits, as happens when dividing a complete (u)long by its max value. This gives the lower floats a better resolution, but lots of different (u)longs get mapped onto the same float in the higher regions. That's not necessarily a bad thing, if you "zoom out" the density is the same everywhere.
The bottom is what happens when the resolution is limited to the worst case (the 0.5 to 1.0 region) everywhere, which you can do by limiting the number of bits first and then doing the "scale the integer" deal. My second suggestion with the bit hacks does not achieve this; it's limited to half that resolution.
For what it's worth, NextDouble in System.Random scales a non-negative int into the 0.0 .. 1.0 range. The resolution of that is obviously a lot lower than it could be. It also uses an int that cannot be int.MaxValue and therefore scales by approximately 1/(2^31-1) (which cannot be represented exactly by a double, so it is slightly rounded), so there are actually 33 slightly different gaps between adjacent possible results, though the majority of the gaps are the same size.
Since int.MaxValue is small compared to what can be brute-forced these days, you can easily generate all possible results of NextDouble and examine them, for example I ran this:
const double scale = 4.6566128752458E-10;
double prev = 0;
Dictionary<long, int> hist = new Dictionary<long, int>();
for (int i = 0; i < int.MaxValue; i++)
{
long bits = BitConverter.DoubleToInt64Bits(i * scale - prev);
if (!hist.ContainsKey(bits))
hist[bits] = 1;
else
hist[bits]++;
prev = i * scale;
if ((i & 0xFFFFFF) == 0)
Console.WriteLine("{0:0.00}%", 100.0 * i / int.MaxValue);
}
This is easier than you think; it's all about scaling (also true when going from a 0-1 range to some other range).
Basically, if you know that you have 64 truly random bits (8 bytes) then just do this:
double zeroToOneDouble = (double)(BitConverter.ToUInt64(bytes, 0) / (decimal)ulong.MaxValue);
The trouble with this kind of algorithm comes when your "random" bits aren't actually uniformly random. That's when you need a specialized algorithm, such as a Mersenne Twister.
I don't know whether it's the best solution for this, but it should do the job:
ulong asLong = BitConverter.ToUInt64(myTrulyRandomBytes, 0);
double number = (double)asLong / ulong.MaxValue;
All I'm doing is converting the byte array to a ulong, which is then divided by its max value, so that the result is between 0 and 1.
To make sure the resulting double value lies within the range from 0 to 1, you can apply the following mask:
long longValue = BitConverter.ToInt64(myTrulyRandomBytes, 0);
longValue &= 0x3fefffffffffffff;
The resulting value is guaranteed to lie in the range [0, 1).
Remark: the double whose bit pattern is 0x3fefffffffffffff is very, very close to 1 and will be printed as 1, but it is really a bit less than 1.
If you want to make the generated values larger, you could set a number of the higher exponent bits to 1. For instance:
longValue |= 0x3C00000000000000;
Summarizing: example on dotnetfiddle.
If you care about the quality of the random numbers generated, be very suspicious of the answers that have appeared so far.
Those answers that use Int64BitsToDouble directly will definitely have problems with NaNs and infinities. For example, 0x7ff0000000000001, a perfectly good random bit pattern, converts to NaN (and so do thousands of others).
Those that try to convert to a ulong and then scale, or convert to a double after ensuring that various bit-pattern constraints are met, won't have NaN problems, but they are very likely to have distributional problems. Representable floating point numbers are not distributed uniformly over (0, 1), so any scheme that randomly picks among all representable values will not produce values with the required uniformity.
To be safe, just use ToInt32 and use that int as a seed for Random. (To be extra safe, reject 0.) This won't be as fast as the other schemes, but it will be much safer. A lot of research and effort has gone into making RNGs good in ways that are not immediately obvious.
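A minimal sketch of that suggestion:
int seed = BitConverter.ToInt32(myTrulyRandomBytes, 0);
if (seed == 0) seed = 1; // to be extra safe, reject 0
double value = new Random(seed).NextDouble();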
Simple piece of code to print the bits out for you.
for (double i = 0.05; i < 1.0; i += 0.05)
{
var doubleToInt64Bits = BitConverter.DoubleToInt64Bits(i);
Console.WriteLine("{0}:\t{1}", i, Convert.ToString(doubleToInt64Bits, 2));
}
0.05: 11111110101001100110011001100110011001100110011001100110011010
0.1: 11111110111001100110011001100110011001100110011001100110011010
0.15: 11111111000011001100110011001100110011001100110011001100110100
0.2: 11111111001001100110011001100110011001100110011001100110011010
0.25: 11111111010000000000000000000000000000000000000000000000000000
0.3: 11111111010011001100110011001100110011001100110011001100110011
0.35: 11111111010110011001100110011001100110011001100110011001100110
0.4: 11111111011001100110011001100110011001100110011001100110011001
0.45: 11111111011100110011001100110011001100110011001100110011001100
0.5: 11111111011111111111111111111111111111111111111111111111111111
0.55: 11111111100001100110011001100110011001100110011001100110011001
0.6: 11111111100011001100110011001100110011001100110011001100110011
0.65: 11111111100100110011001100110011001100110011001100110011001101
0.7: 11111111100110011001100110011001100110011001100110011001100111
0.75: 11111111101000000000000000000000000000000000000000000000000001
0.8: 11111111101001100110011001100110011001100110011001100110011011
0.85: 11111111101011001100110011001100110011001100110011001100110101
0.9: 11111111101100110011001100110011001100110011001100110011001111
0.95: 11111111101110011001100110011001100110011001100110011001101001
I've found guides on programmatically getting random numbers that follow a normal distribution; however, I need to spread out a number so that it reflects a normal distribution (in the least calculation time possible).
For example, I have an item that cost $500,000 in year 2050 and I would like to spread it across with a standard deviation of 20 years. This will result in the area under the first standard deviation (year 2030 to 2070) having (68% * $500,000) of the cost.
Is there a formula I can use in my code to achieve this? The only way I can think of right now is using the Box–Muller random generator and looping through each dollar to generate the distribution, which is definitely not efficient.
Note that the total area under the normal distribution is 1. Therefore, you just need to find out what percentage of the curve accumulates in a certain timeframe and multiply by the total cost.
Instead of the normal distribution itself, you want the normal CDF (cumulative distribution function) (here it is in C#).
So, to sum up: to find the cost for a given range of years, you find the normalized values for those years, plug them into the CDF function, and multiply by the total cost:
start = (start_year-0.5-median)/stddev;
end = (end_year+0.5-median)/stddev;
cost = total_cost*(Phi(end)-Phi(start));
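Since the linked C# implementation of Phi isn't reproduced above, here is one possible sketch, using the classic Abramowitz & Stegun 7.1.26 approximation of erf (maximum error around 1.5e-7, plenty for costing purposes):
double Phi(double x)
{
    // standard normal CDF: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    double z = Math.Abs(x) / Math.Sqrt(2.0);
    double t = 1.0 / (1.0 + 0.3275911 * z);
    double erf = 1.0 - (((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
                        - 0.284496736) * t + 0.254829592) * t) * Math.Exp(-z * z);
    return x < 0 ? 0.5 * (1.0 - erf) : 0.5 * (1.0 + erf);
}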
The following LINQPad code generates a random sequence of unique integers from 0 to N and calculates the length of the cycle for every integer starting from 0. In order to calculate the cycle length for a given integer, it reads the value from the boxes array at the index equal to that integer, then takes that value and reads from the index equal to it, and so on. The process stops when the value read from the array is equal to the original integer we started with. The number of iterations spent calculating the length of every cycle gets saved into a Dictionary.
const int count = 100;
var random = new Random();
var boxes = Enumerable.Range(0, count).OrderBy(x => random.Next(0, count - 1)).ToArray();
string.Join(", ", boxes.Select(x => x.ToString())).Dump("Boxes");
var stats = Enumerable.Range(0, count).ToDictionary(x => x, x => {
var iterations = 0;
var ind = x;
while(boxes[ind] != x)
{
ind = boxes[ind];
iterations++;
}
return iterations;
});
stats.GroupBy(x => x.Value).Select(x => new {x.Key, Count = x.Count()}).OrderBy(x => x.Key).Dump("Stats");
stats.Sum(x => x.Value).Dump("Total Iterations");
Typical result looks as follows:
The results I am getting seem weird to me:
The lengths of all cycles can be grouped into only few buckets (usually 3 to 7). I was hoping to see more distinct buckets.
The number of elements in every bucket most of the time grows together with the bucket value they belong to. I was hoping that it would be more random.
I have tried several different randomize functions, like .NET's Random and RandomNumberGenerator classes, as well as random data generated from random.org. All of them seem to produce similar results.
Am I doing something wrong? Are those results expected from a mathematical point of view? Or, perhaps, does the pseudo nature of the randomizing functions that I used have side effects?
What you are doing is generating a random permutation of size count. Then you check the properties of the permutation. If your random number generator is good, then you should observe the statistics of random permutations.
The average number of cycles of length k is 1/k, for k <= count. On average, there is 1 fixed point, 1/2 cycles of length 2, 1/3 cycles of length 3, etc. The average number of cycles of any length is therefore 1 + 1/2 + 1/3 + ... + 1/count ~ ln count + gamma. There are a lot of neat properties of the distribution of the number of cycles. Very occasionally there are many cycles, but the average value of 2^(number of cycles) is count + 1.
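To see the 1/k law empirically, a quick sketch in the same LINQPad spirit (using a Fisher–Yates shuffle to generate the permutation):
const int n = 100, trials = 10000;
var random = new Random();
var avgCycles = new double[n + 1]; // avgCycles[k] = average number of cycles of length k
for (int t = 0; t < trials; t++)
{
    var p = Enumerable.Range(0, n).ToArray();
    for (int i = n - 1; i > 0; i--) // Fisher–Yates shuffle
    {
        int j = random.Next(i + 1);
        int tmp = p[i]; p[i] = p[j]; p[j] = tmp;
    }
    var seen = new bool[n];
    for (int s = 0; s < n; s++)
    {
        if (seen[s]) continue;
        int len = 0;
        for (int cur = s; !seen[cur]; cur = p[cur]) { seen[cur] = true; len++; }
        avgCycles[len] += 1.0 / trials;
    }
}
// avgCycles[1] ≈ 1, avgCycles[2] ≈ 0.5, avgCycles[3] ≈ 0.33, ...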
Your buckets correspond to the number of different cycle lengths, which is at most the number of cycles, but might be lower because of repeated cycle lengths. On average, few cycle lengths are repeated. Even as the count increases to infinity, and the average number of cycles increases to infinity, the average number of repeated cycle lengths stays finite.
In a permutation test in statistics, usually an example of bootstrapping, to analyze some types of data, you view it as an example of a permutation. For example, you might observe two quantities, x_i and y_i. You get a permutation by sorting the xs and ys, and seeing the index of the value of y paired with the kth x value. Then you compare statistics of this permutation with the properties of random permutations. This doesn't assume much about the underlying distributions, but it can still detect when x and y seem to be related. So, it's useful to know what to expect from random permutations.
I need to generate a unique ID and was considering Guid.NewGuid to do this, which generates something of the form:
0fe66778-c4a8-4f93-9bda-366224df6f11
This is a little long for the string-type database column that it will end up residing in, so I was planning on truncating it.
The question is: Is one end of a GUID more preferable than the rest in terms of uniqueness? Should I be lopping off the start, the end, or removing parts from the middle? Or does it just not matter?
You can save space by using a base64 string instead:
var g = Guid.NewGuid();
var s = Convert.ToBase64String(g.ToByteArray());
Console.WriteLine(g);
Console.WriteLine(s);
This will save you 12 characters (8 if you weren't using the hyphens).
Keep all of it.
From the above link:
* Four bits to encode the computer number,
* 56 bits for the timestamp, and
* four bits as a uniquifier.
You can redefine the Guid to right-size it to your needs.
If the GUID were simply a random number, you could keep an arbitrary subset of the bits and suffer a certain percent chance of collision that you can calculate with the "birthday algorithm":
double numBirthdays = 365; // set to e.g. 18446744073709551616d for 64 bits
double numPeople = 23; // set to the maximum number of GUIDs you intend to store
double probability = 1; // that all birthdays are different
for (int x = 1; x < numPeople; x++)
probability *= (double)(numBirthdays - x) / numBirthdays;
Console.WriteLine("Probability that two people have the same birthday:");
Console.WriteLine((1 - probability).ToString());
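For large counts that loop gets slow; the standard closed-form approximation of the birthday probability (a well-known identity, not part of the code above) gives the same figure directly:
// P(at least one collision) ≈ 1 - exp(-k(k-1) / (2N))
double k = 2000000, N = Math.Pow(2, 64);
Console.WriteLine(1 - Math.Exp(-k * (k - 1) / (2 * N))); // ≈ 1.084E-07, matching the 64-bit figure below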
However, often the probability of a collision is higher because, as a matter of fact, GUIDs are in general NOT random. According to Wikipedia's GUID article there are five types of GUIDs. The 13th digit specifies which kind of GUID you have, so it tends not to vary much, and the top two bits of the 17th digit are always fixed at 10 (binary), so that digit is always 8, 9, A, or B.
For each type of GUID you'll get different degrees of randomness. Version 4 (13th digit = 4) is entirely random except for digits 13 and 17; versions 3 and 5 are effectively random, as they are cryptographic hashes; while versions 1 and 2 are mostly NOT random but certain parts are fairly random in practical cases. A "gotcha" for version 1 and 2 GUIDs is that many GUIDs could come from the same machine and in that case will have a large number of identical bits (in particular, the last 48 bits and many of the time bits will be identical). Or, if many GUIDs were created at the same time on different machines, you could have collisions between the time bits. So, good luck safely truncating that.
I had a situation where my software only supported 64 bits for unique IDs so I couldn't use GUIDs directly. Luckily all of the GUIDs were type 4, so I could get 64 bits that were random or nearly random. I had two million records to store, and the birthday algorithm indicated that the probability of a collision was 1.08420141198273 x 10^-07 for 64 bits and 0.007 (0.7%) for 48 bits. This should be assumed to be the best-case scenario, since a decrease in randomness will usually increase the probability of collision.
I suppose that in theory, more GUID types could exist in the future than are defined now, so a future-proof truncation algorithm is not possible.
I agree with Rob - Keep all of it.
But since you said you're going into a database, I thought I'd point out that just using Guid's doesn't necessarily mean that it will index well in a database. For that reason, the NHibernate developers created a Guid.Comb algorithm that's more DB friendly.
See NHibernate POID Generators revealed and documentation on the Guid Algorithms for more information.
NOTE: Guid.Comb is designed to improve performance on MsSQL
Truncating a GUID is a bad idea, please see this article for why.
You should consider generating a shorter GUID, as google reveals some solutions for. These solutions seem to involve taking a GUID and re-encoding it in a larger character set (close to the full 255-character ASCII range), so that the same bits fit into fewer characters.
How can you calculate large factorials using C#? Windows calculator in Win 7 overflows at Factorial (3500). As a programming and mathematical question I am interested in knowing how you can calculate factorial of a larger number (20000, may be) in C#. Any pointers?
[Edit] I just checked with a calc on Win 2k3, since I could recall doing a bigger factorial on Win 2k3. I was surprised by the way things worked out.
Calc on Win2k3 worked with even big numbers. I tried !50000 and I got an answer, 3.3473205095971448369154760940715e+213236
It was very fast while I did all this.
The main question here is not only to find out the appropriate data type, but also a bit mathematical. If I try to write a simple factorial code in C# [recursive or loop], the performance is really bad. It takes multiple seconds to get an answer. How is the calc in Windows 2k3 (or XP) able to perform such a huge factorial in less than 10 seconds? Is there any other way of calculating factorial programmatically in C#?
Have a look at the BigInteger structure:
http://msdn.microsoft.com/en-us/library/system.numerics.biginteger.aspx
Maybe this can help you implement this functionality.
CodeProject has an implementation for older versions of the framework at http://www.codeproject.com/KB/cs/biginteger.aspx.
If I try to write a simple factorial code in C# [recursive or loop], the performance is really bad. It takes multiple seconds to get an answer.
Let's do a quick order-of-magnitude calculation here for a naive implementation of factorial that performs n multiplications. Suppose we are on the last step. 19999! is about 2^18 bits. 20000 is about 2^5 bits; we'll assume that it is a 32 bit integer. The final multiplication therefore involves the addition of up to 2^5 partial results, each roughly 2^18 bits long. The number of bit operations will therefore be on the order of 2^23.
That's for the last stage; there will be 20000 ≈ 2^16 such stages, so that is a total of about 2^39 operations. Some of them will of course be cheaper, but we're going for an order of magnitude here.
A modern processor does about 2^32 operations per second. Therefore it will take about 2^7 seconds to get the result.
Of course, the big integer library writers were not naive; they take advantage of the ability of the chip to do many bit operations in parallel. They're probably doing the math in 32 bit chunks, giving speedups of a factor of 2^5. So our total order-of-magnitude calculation is that it should take about 2^2 seconds to get a result.
2^2 is 4. So your observation that it takes a few seconds to get a result is expected.
How is the calc in Windows 2k3 (or XP) able to perform such a huge factorial in less than 10 seconds?
I don't know. Extreme cleverness in exploiting the math operations on the chip probably. Or, using a non-naive algorithm for calculating factorial. Or, possibly they are using Stirling's Approximation and getting an inexact result.
Is there any other way of calculating factorial programmatically in C#?
Sure. If all you care about is the order of magnitude then you can use Stirling's Approximation. If you care about the exact value then you're going to have to compute it.
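A sketch of the order-of-magnitude route: Stirling's approximation ln n! ≈ n ln n - n + 0.5 ln(2πn) turns the whole computation into a few Math.Log calls (the helper name is mine):
double Log10Factorial(int n)
{
    double ln = n * Math.Log(n) - n + 0.5 * Math.Log(2 * Math.PI * n);
    return ln / Math.Log(10);
}
// Log10Factorial(50000) ≈ 213236.5, so 50000! ≈ 3.3e213236,
// matching the calc result quoted in the question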
There exist sophisticated computational algorithms for efficiently computing the factorials of large, arbitrary precision numbers. The Schönhage–Strassen algorithm, for instance, allows you to perform asymptotically fast multiplication for arbitrarily large integers.
Case in point, Mathematica computes 22000! on my machine in less than 1 second. The Implementation Notes page at reference.wolfram.com states:
(Mathematica's) n! uses an O(log(n) M(n)) algorithm of Schönhage based on dynamic decomposition to prime powers.
Unfortunately, the implementation of such algorithms is both complicated and error prone. Rather than trying to roll your own implementation, it may be wiser for you to license a copy of Mathematica (or a similar product that meets your functional and performance needs) and either use it, or a .NET programming interface to it, to perform your computation.
Have you looked at System.Numerics.BigInteger?
Using System.Numerics.BigInteger:
var bi = new BigInteger(1);
var factorial = 171;
for (var i = 1; i <= factorial; i++)
{
bi *= i;
}
bi will evaluate to
1241018070217667823424840524103103992616605577501693185388951803611996075221691752992751978120487585576464959501670387052809889858690710767331242032218484364310473577889968548278290754541561964852153468318044293239598173696899657235903947616152278558180061176365108428800000000000000000000000000000000000000000
For 50000! it takes a couple seconds to calculate but it seems to work and the result is a 213237 digit number and that's also what Wolfram says.
You will probably have to implement your own arbitrary precision numeric type.
There are various approaches. Probably not the most efficient, but perhaps the simplest, is to have variable-length arrays of byte (unsigned char). Each element represents a digit. Ideally this would be included in a class, and you can then add a method that lets you multiply the number with another arbitrary-precision number. A multiply with a standard C# integer would probably also be a good idea, but a little trickier to implement.
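A minimal sketch of the digit-array idea, multiplying by a plain integer since that is the operation a factorial loop actually needs (digits stored least significant first):
byte[] Multiply(byte[] digits, int factor)
{
    var result = new List<byte>();
    int carry = 0;
    foreach (byte d in digits)
    {
        int product = d * factor + carry;
        result.Add((byte)(product % 10)); // keep one decimal digit
        carry = product / 10;             // carry the rest upward
    }
    while (carry > 0) { result.Add((byte)(carry % 10)); carry /= 10; }
    return result.ToArray();
}
// n! : start from new byte[] { 1 } and multiply by 2, 3, ..., n in turn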
Since they don't give you the result down to the last digit, they may be "cheating" using some approximation.
Check out http://mathworld.wolfram.com/StirlingsApproximation.html
Using Stirling's formula you can calculate (an approximation of) the factorial of n in log(n) time. Of course, they might as well have a dictionary with pre-calculated values of factorial(n) for every n up to one million, making the calculator show the result extremely fast.
This answer covers limits for basic .Net types to compute and represent n!
Basic code to calculate factorial for "SomeType" that supports multiplication:
SomeType factorial = 1;
int n = 35;
for (int i = 1; i <= n; i++)
{
factorial *= i;
}
Limits for built in number types:
short - correct results up to 7!, incorrect results afterwards, code returns 0 starting at 18 (similar to int)
int - correct results up to 12!, incorrect results afterwards, code returns 0 starting at 34 (Why computing factorial of realtively small numbers (34+) returns 0)
float - precise results up to 14!, correct but not precise afterwards, returns infinity starting at 35
long - correct results up to 20!, incorrect results afterwards, code returns 0 starting at 66 (similar to int)
double - precise results up to 22!, correct but not precise afterwards, returns infinity starting at 171
BigInteger - precise and upper limit is set by memory usage only.
Note: integer types overflow pretty quickly and start producing incorrect results. Realistically, if you need factorials for any practical usage, long is the type to go with (up to 20!); if you can't count on bounded inputs, BigInteger is the only type provided in the .Net Framework that gives precise results (albeit slowly for large numbers, as there is no built-in optimized n! method).
You need a special big-number library for this. This link introduces the System.Numerics.BigInteger class, and incidentally has an example program that calculates factorials. But don't use the example! If you recurse like that, your stack will grow horribly. Just write a for-loop to do the multiplication.
I don't know how you could do this in a language without arbitrary precision arithmetic. I guess a start could be to count factors of 5 and 2, removing them from the product, and add on these zeroes at the end.
As you can see there are many.
>>> factorial(20000)
<<non-zero digits removed>>0000000000000000000000000000000000000000L
(the run of zeroes continues; 20000! ends in 4,999 zeroes and is roughly 77,000 digits long in total)
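Counting those zeroes is cheap, since the factors of 5 are the scarce ones (a sketch):
int TrailingZeros(int n)
{
    // number of factors of 5 in n!; factors of 2 always outnumber them
    int count = 0;
    for (long p = 5; p <= n; p *= 5)
        count += (int)(n / p);
    return count;
}
// TrailingZeros(20000) == 4999, the length of the run of zeroes shown above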