List<T> capacity increasing vs Dictionary<K,V> capacity increasing? - c#

Why does List<T> increase its capacity by a factor of 2?
private void EnsureCapacity(int min)
{
if (this._items.Length < min)
{
int num = (this._items.Length == 0) ? 4 : (this._items.Length * 2);
if (num < min)
{
num = min;
}
this.Capacity = num;
}
}
Why does Dictionary<K,V> use prime numbers as capacity?
private void Resize()
{
int prime = HashHelpers.GetPrime(this.count * 2);
int[] numArray = new int[prime];
for (int i = 0; i < numArray.Length; i++)
{
numArray[i] = -1;
}
Entry<TKey, TValue>[] destinationArray = new Entry<TKey, TValue>[prime];
Array.Copy(this.entries, 0, destinationArray, 0, this.count);
for (int j = 0; j < this.count; j++)
{
int index = destinationArray[j].hashCode % prime;
destinationArray[j].next = numArray[index];
numArray[index] = j;
}
this.buckets = numArray;
this.entries = destinationArray;
}
Why doesn't it also just multiply by 2? Both are dealing with finding contiguous memory locations... correct?

It's common to use prime numbers for hash table sizes because it reduces the probability of collisions.
Hash tables typically use the modulo operation to find the bucket where an entry belongs, as you can see in your code:
int index = destinationArray[j].hashCode % prime;
Suppose your hashCode function results in the following hashCodes among others {x , 2x, 3x, 4x, 5x, 6x...}, then all these are going to be clustered in just m number of buckets, where m = table_length/GreatestCommonFactor(table_length, x). (It is trivial to verify/derive this). Now you can do one of the following to avoid clustering:
Make sure that you don't generate too many hashCodes that are multiples of another hashCode like in {x, 2x, 3x, 4x, 5x, 6x...}.But this may be kind of difficult if your hashTable is supposed to have millions of entries.
Or simply make m equal to the table_length by making GreatestCommonFactor(table_length, x) equal to 1, i.e by making table_length coprime with x. And if x can be just about any number then make sure that table_length is a prime number.
(from http://srinvis.blogspot.com/2006/07/hash-table-lengths-and-prime-numbers.html)
HashHelpers.GetPrime(this.count * 2)
should return a prime number. Look at the definition of HashHelpers.GetPrime().
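The clustering argument quoted above can be checked with a small experiment. This is only a sketch, not the Dictionary implementation: the names BucketSpread and DistinctBuckets are made up for illustration, and the method simply counts how many distinct buckets the hash codes x, 2x, 3x, ... land in for a given table length.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of the .NET framework.
public static class BucketSpread
{
    // Counts the distinct bucket indexes hit by the hash codes
    // step, 2*step, ..., count*step when reduced modulo tableLength.
    public static int DistinctBuckets(int step, int count, int tableLength)
    {
        var used = new HashSet<int>();
        for (int i = 1; i <= count; i++)
        {
            long hashCode = (long)step * i; // long avoids int overflow
            used.Add((int)(hashCode % tableLength));
        }
        return used.Count;
    }
}
```

With a step of 4 and 1000 hash codes, a table of length 16 uses only 4 buckets (16 / GreatestCommonFactor(16, 4)), while a table of length 17 uses all 17.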

Dictionary puts all its objects into buckets depending on their GetHashCode value, i.e.
Bucket[object.GetHashCode() % DictionarySize] = object;
It uses a prime number for size to avoid the chance of collisions. Presumably a size with many divisors would be bad for poorly designed hash codes.

From a question on SO:
Dictionary or hash table relies on hashing the key to get a smaller
index to look up into corresponding store (array). So choice of hash
function is very important. Typical choice is to get hash code of a
key (so that we get good random distribution) and then divide the code
by a prime number and use remainder to index into fixed number of
buckets. This allows to convert arbitrarily large hash codes into a
bounded set of small numbers for which we can define an array to look
up into. So it's important to have array size in prime number and then
the best choice for the size become the prime number that is larger
than the required capacity. And that's exactly what the dictionary
implementation does.
List<T> uses an array to store its data, and increasing the capacity of an array requires copying it to a new memory location, which is time consuming. I guess the list doubles its capacity in order to reduce how often that copy happens.

I'm not computer scientist, but ...
Most probably it's related to a hash table's load factor (the last link is just a math definition). To avoid confusing readers without a math background, it's important to define that:
loadFactor = UsedCells/AllCells
which we can write as
loadFactor = StoredEntries/AllBuckets
loadFactor determines the probability of a collision in the hash map.
So by using a prime number, a number that
..is a natural number greater than 1 that
has no positive divisors other than 1 and itself.
we decrease (but do not eliminate) the risk of collision in our hash map.
The closer loadFactor is to 0, the fewer collisions we get, at the price of more unused memory, so it should not be allowed to grow too large. According to an MS blog, they found that the optimal value of loadFactor is around 0.72, so when it gets bigger the capacity is increased to the next suitable prime number.
EDIT
To be clearer on this: having a prime number ensures, as much as possible, a uniform distribution of the hashes in the concrete hash implementation used by the .NET dictionary. It's not about the efficiency of retrieving values, but about the efficiency of memory use and reducing the risk of collisions.
Hope this helps.

Dictionary needs some heuristic so that hash code distribution among buckets is more uniform.
.NET's Dictionary uses prime number of buckets to do that, and then calculates bucket index like this:
int num = this.comparer.GetHashCode(key) & 2147483647; // make hash code positive
// get the remainder from division - that's our bucket index
int num2 = this.buckets[num % ((int)this.buckets.Length)];
When it grows, it doubles the number of buckets and then adds some more to make the number prime again.
It's not the only heuristic possible. Java's HashMap, for example, takes another approach. The number of buckets there is always a power of 2 and on grow it just doubles the number of buckets:
resize(2 * table.length);
But when calculating bucket index it modifies hash:
static int hash(int h) {
// This function ensures that hashCodes that differ only by
// constant multiples at each bit position have a bounded
// number of collisions (approximately 8 at default load factor).
h ^= (h >>> 20) ^ (h >>> 12);
return h ^ (h >>> 7) ^ (h >>> 4);
}
static int indexFor(int h, int length) {
return h & (length-1);
}
// from put() method
int hash = hash(key.hashCode()); // get modified hash
int i = indexFor(hash, table.length); // trim the hash to the bucket count
List on the other hand doesn't need any heuristic, so they didn't bother.
Addition: Grow behavior doesn't influence Add's complexity at all. Dictionary, HashMap and List each have amortized Add complexity of O(1).
A grow operation takes O(N) but only happens on every N-th Add, so to trigger it we need to call Add N times. For N=8, the time it takes to do N Adds is
O(1)+O(1)+O(1)+O(1)+O(1)+O(1)+O(1)+O(N) = O(N)+O(N) = O(2N) = O(N)
So, N Adds take O(N), then one Add takes O(1).
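To make the amortized argument concrete, here is a rough simulation that only counts how many elements get copied during growth. The class GrowthCost and both method names are hypothetical; the doubling rule (start at 4, then multiply by 2) mirrors the EnsureCapacity code from the question, and the fixed-step variant is included purely for comparison.

```csharp
using System;

// Hypothetical simulation for illustration; it only counts element copies.
public static class GrowthCost
{
    // Copies made while adding `adds` items when the capacity doubles
    // (starting at 4, as in the EnsureCapacity code shown in the question).
    public static long CopiesWithDoubling(int adds)
    {
        long copies = 0;
        long capacity = 0;
        for (long count = 0; count < adds; count++)
        {
            if (count == capacity)
            {
                copies += count; // existing items move to the new array
                capacity = capacity == 0 ? 4 : capacity * 2;
            }
        }
        return copies;
    }

    // Same bookkeeping, but growing by a fixed number of slots each time.
    public static long CopiesWithFixedStep(int adds, int step)
    {
        long copies = 0;
        long capacity = 0;
        for (long count = 0; count < adds; count++)
        {
            if (count == capacity)
            {
                copies += count;
                capacity += step;
            }
        }
        return copies;
    }
}
```

For 100,000 adds, doubling copies 131,068 elements in total (fewer than 2 per Add), while growing by 4 slots at a time copies 1,249,950,000.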

Increasing the capacity by a constant factor (instead of, for example, increasing it by an additive constant) when resizing is required to guarantee certain amortized running times. For example, adding to or removing from the end of an array-based list takes O(1) time, except when the capacity has to be increased or decreased; that requires copying the list contents and therefore O(n) time. Changing the capacity by a constant factor guarantees that the amortized runtime is still O(1). The optimal value of the factor depends on the expected usage. Some more information on Wikipedia.
Choosing a prime capacity for a hash table is done to improve the distribution of the items. If hash is not uniformly distributed, bucket[hash % capacity] will yield a more uniform distribution when capacity is prime. (I can not give the math behind that but I am looking for a good reference.) The combination of this with the first point is exactly what the implementation does: it increases the capacity by a factor of (at least) 2 and also ensures that the capacity is prime.

Related

Selecting set of binary sequences to avoid similarity

I want to be able to programmatically generate a set of binary sequences of a given length whilst avoiding similarity between any two sequences.
I'll define 'similar' between two sequences thus:
If sequence A can be converted to sequence B (or B to A) by bit-shifting A (non-circularly) and padding with 0s, A and B are similar (note: bit-shifting is allowed on only one of the sequences otherwise both could always be shifted to a sequence of just 0s)
For example: A = 01010101 B = 10101010 C = 10010010
In this example, A and B are similar because a single left-shift of A results in B (A << 1 = B). A and C are not similar because no bit-shifting of one can result in the other.
A set of sequences is defined as dissimilar if no subset of size 2 is similar.
I believe there could be multiple sets for a given sequence length and presumably the size of the set will be significantly less than the total possibilities (total possibilities = 2 ^ sequence length).
I need a way to generate a set for a given sequence length. Does an algorithm exist that can achieve this? Selecting sequences one at a time and checking against all previously selected sequences is not acceptable for my use case (but may have to be if a better method doesn't exist!).
I've tried generating sets of integers based on prime numbers and also the golden ratio, then converting to binary. This seemed like it might be a viable method, but I have been unable to get it to work as expected.
Update: I have written a function in C# that uses a prime number modulo to generate the set, without success. I've also tried using the Fibonacci sequence, which finds a mostly dissimilar set, but of a size that is very small compared to the number of possibilities:
private List<string> GetSequencesFib(int sequenceLength)
{
var sequences = new List<string>();
long current = 21;
long prev = 13;
long prev2 = 8;
long size = (long)Math.Pow(2, sequenceLength);
while (current < size)
{
current = prev + prev2;
sequences.Add(current.ToBitString(sequenceLength));
prev2 = prev;
prev = current;
}
return sequences;
}
This generates a set of sequences of size 41 that is roughly 60% dissimilar (sequenceLength = 32). It is started at 21 since lower values produce sequences of mostly 0s which are similar to any other sequence.
By relaxing the conditions of similarity to only allowing a small number of successive bit-shifts, the proportion of dissimilar sequences approaches 100%. This may be acceptable in my use case.
Update 2:
I've implemented a function following DCHE's suggestion, by selecting all odd numbers greater than half the maximum value for a given sequence length:
private static List<string> GetSequencesOdd(int length)
{
var sequences = new List<string>();
long max = (long)(Math.Pow(2, length));
long quarterMax = max / 4;
for (long n = quarterMax * 2 + 1; n < max; n += 2)
{
sequences.Add(n.ToBitString(length));
}
return sequences;
}
This produces an entirely dissimilar set as per my requirements. I can see why this works mathematically as well.
I can't prove it, but from my experimenting, I think that your set is the odd integers greater than half of the largest number in binary. E.g. for bit sets of length 3, max integer is 7, so the set is 5 and 7 (101, 111).
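The conjecture can at least be checked by brute force for small lengths. The sketch below is an assumption-laden illustration: ShiftSimilarity and its methods are hypothetical names, sequences are held as uint values rather than strings, and the length is limited to 31 bits to keep the shifts simple. The intuition it tests is that left-shifting always produces an even number and right-shifting always clears the top bit, so neither can produce another odd number with the top bit set.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical brute-force checker for the similarity rule in the question.
public static class ShiftSimilarity
{
    // True if shifting one value (non-circularly, zero padded, by 1..length
    // positions) within `length` bits yields the other value.
    public static bool AreSimilar(uint a, uint b, int length)
    {
        if (length < 1 || length > 31)
            throw new ArgumentOutOfRangeException(nameof(length));
        uint mask = (1u << length) - 1;
        for (int k = 1; k <= length; k++)
        {
            if (((a << k) & mask) == b || (a >> k) == b) return true;
            if (((b << k) & mask) == a || (b >> k) == a) return true;
        }
        return false;
    }

    // All odd values whose top bit (for the given length) is set.
    public static List<uint> OddUpperHalf(int length)
    {
        if (length < 1 || length > 31)
            throw new ArgumentOutOfRangeException(nameof(length));
        uint mask = (1u << length) - 1;
        var result = new List<uint>();
        for (uint n = (1u << (length - 1)) | 1u; n <= mask; n += 2)
            result.Add(n);
        return result;
    }

    // True if no pair in the set is similar.
    public static bool IsDissimilarSet(IReadOnlyList<uint> set, int length)
    {
        for (int i = 0; i < set.Count; i++)
            for (int j = i + 1; j < set.Count; j++)
                if (AreSimilar(set[i], set[j], length)) return false;
        return true;
    }
}
```

For length 8 this yields 64 sequences with no similar pair, and AreSimilar agrees with the question's example: 01010101 and 10101010 are similar, 01010101 and 10010010 are not.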

Create random ints with minimum and maximum from Random.NextBytes()

Title pretty much says it all. I know I could use Random.Next(), of course, but I want to know if there's a way to turn unbounded random data into bounded data without statistical bias. (This means no Random.Next() % (maximum - minimum) + minimum.) Surely there is a method like it that doesn't introduce bias into the data it outputs?
If you assume that the bits are randomly distributed, I would suggest:
Generate enough bytes to get a number within the range (e.g. 1 byte to get a number in the range 0-100, 2 bytes to get a number in the range 0-30000 etc).
Use only enough bits from those bytes to cover the range you need. So for example, if you're generating numbers in the range 0-100, take the bottom 7 bits of the byte you've generated
Interpret the bits you've got as a number in the range [0, 2^n) where n is the number of bits
Check whether the number is in your desired range. It should be at least half the time (on average)
If so, use it. If not, repeat the above steps until a number is in the right range.
The use of just the required number of bits is key to making this efficient - you'll throw away up to half the number of bytes you generate, but no more than that, assuming a good distribution. (And if you are generating numbers in a nicely binary range, you won't need to throw anything away.)
Implementation left as an exercise to the reader :)
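One possible way to fill in that exercise is sketched below. It is not the answerer's code: UnbiasedRandom and NextInt are hypothetical names, maxValue is treated as exclusive (an assumption), and for simplicity it always fetches 4 bytes from Random.NextBytes instead of the minimum number of bytes. The important part is that it keeps only as many bits as the range needs and retries when the candidate falls outside the range, so every value is equally likely if the bytes are uniform.

```csharp
using System;

// Hypothetical sketch of rejection sampling on top of Random.NextBytes.
public static class UnbiasedRandom
{
    // Returns a value in [minValue, maxValue), maxValue exclusive.
    public static int NextInt(Random rnd, int minValue, int maxValue)
    {
        if (rnd == null) throw new ArgumentNullException(nameof(rnd));
        if (minValue >= maxValue)
            throw new ArgumentOutOfRangeException(nameof(maxValue));

        // Size of the range; fits in 32 bits even for the full int span.
        ulong range = (ulong)((long)maxValue - minValue);

        // Smallest bit count such that 2^bits >= range.
        int bits = 0;
        while (bits < 32 && (1UL << bits) < range) bits++;
        ulong mask = (1UL << bits) - 1;

        var buffer = new byte[4];
        while (true)
        {
            rnd.NextBytes(buffer);
            ulong candidate = BitConverter.ToUInt32(buffer, 0) & mask;
            // Accepted at least half the time, because range > 2^(bits-1).
            if (candidate < range)
                return (int)((long)candidate + minValue);
        }
    }
}
```

Because rejected candidates are discarded rather than folded back into the range, no value is favored the way it is with a plain modulo.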
You could try with something like:
public static int MyNextInt(Random rnd, int minValue, int maxValue)
{
var buffer = new byte[4];
rnd.NextBytes(buffer);
uint num = BitConverter.ToUInt32(buffer, 0);
// The +1 is to exclude the maxValue in the case that
// minValue == int.MinValue, maxValue == int.MaxValue
double dbl = num * 1.0 / ((long)uint.MaxValue + 1);
long range = (long)maxValue - minValue;
int result = (int)(dbl * range) + minValue;
return result;
}
Totally untested... I can't guarantee that the results are truly pseudo-random... But the idea of creating a double (dbl) number is the same used by the Random class. Only I use the uint.MaxValue as the base instead of int.MaxValue. In this way I don't have to check for negative values of the buffer.
I propose a generator of random integers, based on NextBytes.
This method discards only 9.62% of bits on average over the word size range for positive Int32 values, thanks to using Int64 as the representation for bit manipulation.
Maximum bit loss occurs at word size of 22 bits, and it's 20 lost bits of 64 used in byte range conversion. In this case bit efficiency is 68.75%
Also, 25% of values are lost because of clipping the unbound range to maximum value.
Be careful to use Take(N) on the IEnumerable returned, because it's an infinite generator otherwise.
I'm using a buffer of 512 long values, so it generates 4096 random bytes at once. If you just need a sequence of few integers, change the buffer size from 512 to a more optimal value, down to 1.
using System;
using System.Collections.Generic;

public static class RandomExtensions
{
    public static IEnumerable<int> GetRandomIntegers(this Random r, int max)
    {
        if (max < 1)
            throw new ArgumentOutOfRangeException("max", max, "Must be a positive value.");
        const int longWordsTotal = 512;
        const int bufferSize = longWordsTotal * 8;
        var buffer = new byte[bufferSize];
        // Number of bits needed to represent max.
        var wordSize = (int)Math.Log(max, 2) + 1;
        while (true)
        {
            r.NextBytes(buffer);
            for (var longWordIndex = 0; longWordIndex < longWordsTotal; longWordIndex++)
            {
                // The second argument is a byte offset, so advance 8 bytes per 64-bit word.
                ulong longWord = BitConverter.ToUInt64(buffer, longWordIndex * 8);
                var lastStartBit = 64 - wordSize;
                for (var startBit = 0; startBit <= lastStartBit; startBit += wordSize)
                {
                    var mask = ((1UL << wordSize) - 1) << startBit;
                    var unboundValue = (int)((mask & longWord) >> startBit);
                    // Rejection step: values above max are dropped, not folded back.
                    if (unboundValue <= max)
                        yield return unboundValue;
                }
            }
        }
    }
}

Dictionary <,> Size , GetHashCode and Prime Numbers?

I've been reading quite a lot about this interesting topic (IMO), but there is one thing I don't fully understand:
When a Dictionary reallocates, it increases its capacity to a prime number (roughly double, rounded up to the nearest prime):
because:
int index = hashCode % [Dictionary Capacity];
So we can see that prime numbers are used here for [Dictionary Capacity] because the GreatestCommonFactor of a prime capacity and almost any hash-code stride is 1, and this helps to avoid collisions.
In addition
I've seen many samples of implementing GetHashCode():
Here is a sample from Jon Skeet :
public override int GetHashCode()
{
unchecked
{
int hash = 17;
// Suitable nullity checks etc, of course :)
hash = hash * 23 + field1.GetHashCode();
hash = hash * 23 + field2.GetHashCode();
hash = hash * 23 + field3.GetHashCode();
return hash;
}
}
I don't understand :
Question
Are prime numbers used both for the Dictionary capacity
and in the generation of GetHashCode?
Because in the code above , there is a good chance that the return value will not be a prime number [please correct me if i'm wrong] because of the
multiplication by 23
addition of the GetHashCode() value for each field.
For Example : (11,17,173 are prime number)
int hash = 17;
hash = hash * 23 + 11; //402
hash = hash * 23 + 17; //9263
hash = hash * 23 + 173; //213222
return hash;
213222 is not a prime.
Also there is not any math rule which state :
(not a prime number) + (prime number) = (prime number)
nor
(not a prime number) * (prime number) = (prime number)
nor
(not a prime number) * (not a prime number) = (prime number)
So what am I missing ?
It does not matter what the result of GetHashCode is (it does not have to be prime at all), as long as the result is the same for two objects that are considered to be equal. However, it is nice (but not required) to have GetHashCode return a different value for two objects that are considered to be different (but still not necessarily prime).
Given two numbers a and b, when you multiply them you get c = a * b. There are usually multiple different pairs of a and b that give the same result c. For example 6 * 2 = 12 and 4 * 3 = 12. However, when a is a prime number, there are far fewer pairs that give the same result. This is convenient for the property that the hash code should be different for different objects.
In the dictionary the same principle applies: the objects are put in buckets depending on their hash. Since most integers are not evenly divisible by a given prime number, you get a nice spread of your objects across the buckets. Ideally you'd want only one item in each bucket for optimal dictionary performance.
Slightly off-topic: cicadas (insects) use prime numbers to determine after how many years they emerge and mate again. Since this mating cycle is a prime number of years, the chances of the mating continuously coinciding with the life cycles of any of their enemies are slim.

Fastest way to sum digits in a number

Given a large number, e.g. 9223372036854775807 (Int64.MaxValue), what is the quickest way to sum the digits?
Currently I am ToStringing and reparsing each char into an int:
num.ToString().Sum(c => int.Parse(new String(new char[] { c })));
Which is surely horrifically inefficent. Any suggestions?
And finally, how would you make this work with BigInteger?
Thanks
Well, another option is:
int sum = 0;
while (value != 0)
{
int remainder;
value = Math.DivRem(value, 10, out remainder);
sum += remainder;
}
BigInteger has a DivRem method as well, so you could use the same approach.
Note that I've seen DivRem not be as fast as doing the same arithmetic "manually", so if you're really interested in speed, you might want to consider that.
Also consider a lookup table with (say) 1000 elements precomputed with the sums:
int sum = 0;
while (value != 0)
{
int remainder;
value = Math.DivRem(value, 1000, out remainder);
sum += lookupTable[remainder];
}
That would mean fewer iterations, but each iteration has an added array access...
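The snippet above assumes a lookupTable that already exists. A minimal sketch of how it might be built and used for Int64 values follows; DigitSumTable and its members are hypothetical names, and plain / and % are used instead of Math.DivRem, which the earlier note suggests may be no slower.

```csharp
using System;

// Hypothetical sketch: digit sum using a 1000-entry lookup table.
public static class DigitSumTable
{
    // LookupTable[i] holds the digit sum of i for 0 <= i < 1000.
    private static readonly int[] LookupTable = BuildTable();

    private static int[] BuildTable()
    {
        var table = new int[1000];
        for (int i = 1; i < 1000; i++)
            table[i] = table[i / 10] + i % 10; // reuse the shorter prefix
        return table;
    }

    public static int Sum(long value)
    {
        // Work on the magnitude as ulong so long.MinValue does not overflow.
        ulong v = value < 0 ? (ulong)(-(value + 1)) + 1 : (ulong)value;
        int sum = 0;
        while (v != 0)
        {
            sum += LookupTable[(int)(v % 1000)]; // three digits per step
            v /= 1000;
        }
        return sum;
    }
}
```

For Int64.MaxValue (9223372036854775807) this returns 88 after 7 loop iterations instead of 19.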
Nobody has discussed the BigInteger version. For that I'd look at 10^1, 10^2, 10^4, 10^8 and so on until you find the last 10^(2^n) that is less than your value. Take your number div and mod 10^(2^n) to come up with 2 smaller values. Wash, rinse, and repeat recursively. (You should keep your iterated squares of 10 in an array, and in the recursive part pass along the information about the next power to use.)
With a BigInteger with k digits, dividing by 10 is O(k). Therefore finding the sum of the digits with the naive algorithm is O(k^2).
I don't know what C# uses internally, but the non-naive algorithms out there for multiplying or dividing a k-bit by a k-bit integer all work in time O(k^1.6) or better (most are much, much better, but have an overhead that makes them worse for "small big integers"). In that case preparing your initial list of powers and splitting once takes time O(k^1.6). This gives you 2 problems of size k/2, costing 2 * O((k/2)^1.6) = 2^(-0.6) * O(k^1.6). At the next level you have 4 problems of size k/4, costing 4 * O((k/4)^1.6), for another 2^(-1.2) * O(k^1.6) work. Add up all of the terms and the powers of 2 turn into a geometric series converging to a constant, so the total work is O(k^1.6).
This is a definite win, and the win will be very, very evident if you're working with numbers in the many thousands of digits.
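A rough sketch of that divide-and-conquer idea is shown below. The names BigDigitSum, Sum and SumAtLevel are hypothetical, and whether it actually beats the naive loop depends on how BigInteger division is implemented in your runtime, which this answer does not claim to know. The list of powers holds 10^1, 10^2, 10^4, ... and each recursion level splits the value with the next smaller power.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;

// Hypothetical sketch of the recursive splitting described above.
public static class BigDigitSum
{
    public static long Sum(BigInteger value)
    {
        value = BigInteger.Abs(value);

        // powers[i] = 10^(2^i); stop once the last power squared exceeds value.
        var powers = new List<BigInteger> { new BigInteger(10) };
        while (true)
        {
            BigInteger last = powers[powers.Count - 1];
            BigInteger squared = last * last;
            if (squared > value) break;
            powers.Add(squared);
        }
        return SumAtLevel(value, powers, powers.Count - 1);
    }

    // Invariant: value < powers[level]^2, so both halves are < powers[level].
    private static long SumAtLevel(BigInteger value, List<BigInteger> powers, int level)
    {
        if (level < 0) return (long)value; // a single decimal digit remains
        BigInteger low;
        BigInteger high = BigInteger.DivRem(value, powers[level], out low);
        return SumAtLevel(high, powers, level - 1) + SumAtLevel(low, powers, level - 1);
    }
}
```

As a sanity check, the digit sum of 10^500 - 1 (five hundred nines) comes out as 4500.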
Yes, it's probably somewhat inefficient. I'd probably just repeatedly divide by 10, adding together the remainders each time.
The first rule of performance optimization: Don't divide when you can multiply instead. The following function will take four digit numbers 0-9999 and do what you ask. The intermediate calculations are larger than 16 bits. We multiply the number by 1/10000 and take the result as a Q16 fixed point number. Digits are then extracted by multiplication by 10 and taking the integer part.
#define TEN_OVER_10000 ((1<<25)/1000 +1) // .001 Q25
int sum_digits(unsigned int n)
{
int c;
int sum = 0;
n = (n * TEN_OVER_10000)>>9; // n*10/10000 Q16
for (c=0;c<4;c++)
{
printf("Digit: %d\n", n>>16);
sum += n>>16;
n = (n & 0xffff) * 10; // next digit
}
return sum;
}
This can be extended to larger sizes but it's tricky. You need to ensure that the rounding in the fixed point calculation always works correctly. I also did 4 digit numbers so the intermediate result of the fixed point multiply would not overflow.
Int64 BigNumber = 9223372036854775807;
String BigNumberStr = BigNumber.ToString();
int Sum = 0;
foreach (Char c in BigNumberStr)
Sum += (byte)c;
// 48 is ascii value of zero
// remove in one step rather than in the loop
Sum -= 48 * BigNumberStr.Length;
Instead of int.parse, why not subtract '0' from each digit to get the actual value.
Remember, '9' - '0' = 9, so you should be able to do this in order k (length of the number). The subtraction is just one operation, so that should not slow things down.

How can I best generate a static array of random number on demand?

An application I'm working on requires a matrix of random numbers. The matrix can grow in any direction at any time, and isn't always full. (I'll probably end up re-implementing it with a quad tree or something else, rather than a matrix with a lot of null objects.)
I need a way to generate the same matrix, given the same seed, no matter in which order I calculate the matrix.
LazyRandomMatrix rndMtx1 = new LazyRandomMatrix(1234); // Seed new object
float X = rndMtx1[0,0]; // Lazily generate random numbers on demand
float Y = rndMtx1[3,16];
float Z = rndMtx1[23,-5];
Debug.Assert(X == rndMtx1[0,0]);
Debug.Assert(Y == rndMtx1[3,16]);
Debug.Assert(Z == rndMtx1[23,-5]);
LazyRandomMatrix rndMtx2 = new LazyRandomMatrix(1234); // Seed second object
Debug.Assert(Y == rndMtx2[3,16]); // Lazily generate the same random numbers
Debug.Assert(Z == rndMtx2[23,-5]); // on demand in a different order
Debug.Assert(X == rndMtx2[0,0]);
Yes, if I knew the dimensions of the array, the best way would be to generate the entire array, and just return values, but they need to be generated independently and on demand.
My first idea was to initialize a new random number generator for each call to a new coordinate, seeding it with some hash of the overall matrix's seed and the coordinates used in calling, but this seems like a terrible hack, as it would require creating a ton of new Random objects.
What you're talking about is typically called "Perlin Noise", here's a link for you: http://freespace.virgin.net/hugo.elias/models/m_perlin.htm
The most important thing in that article is the noise function in 2D:
function Noise1(integer x, integer y)
n = x + y * 57
n = (n<<13) ^ n;
return ( 1.0 - ( (n * (n * n * 15731 + 789221) + 1376312589) & 7fffffff) / 1073741824.0);
end function
It returns a number between -1.0 and +1.0 based on the x and y coordinates alone (and a hard coded seed that you can change randomly at the start of your app or just leave as it is).
The rest of the article is about interpolating these numbers, but depending on how random you want these numbers, you can just leave them as they are. Note that these numbers will be utterly random. If you instead apply a Cosine Interpolator and use the generated noise every 5-6 indexes, interpolating in between, you get heightmap data (which is what I used it for). Skip it for totally random data.
A standard random generator is usually a sequence generator, where each next element is built from the previous one. So to generate rndMtx1[3,16] you would have to generate all the previous elements in the sequence.
What you actually need is something different from a random generator, because you need a single value rather than a sequence. You have to build your own "generator" which takes the seed and the indexes as input to a formula that produces a single random value. You can invent many ways to do so. One of the simplest is to take the random value as hash(seed + index) (I guess the idea of hashes used in passwords and signing is to produce some stable "random" value out of input data).
P.S. You can improve your approach with independent generators (Random(seed + index)) by making lazy blocks of the matrix.
I think your first idea of instantiating a new Random object seeded by some deterministic hash of (x-coordinate, y-coordinate, LazyRandomMatrix seed) is probably reasonable for most situations. In general, creating lots of small objects on the managed heap is something the CLR is very good at handling efficiently. And I don't think Random.ctor() is terribly expensive. You can easily measure the performance if it's a concern.
A very similar solution which may be easier than creating a good deterministic hash is to use two Random objects each time. Something like:
public int this[int x, int y]
{
get
{
Random r1 = new Random(_seed * x);
Random r2 = new Random(y);
return (r1.Next() ^ r2.Next());
}
}
Here is a solution based on a SHA1 hash. Basically this takes the bytes for the X, Y and Seed values and packs them into a byte array. Then a hash is computed for the byte array, and the first 4 bytes of the hash are used to generate an int. This should be pretty random.
public class LazyRandomMatrix
{
private int _seed;
private SHA1 _hashProvider = new SHA1CryptoServiceProvider();
public LazyRandomMatrix(int seed)
{
_seed = seed;
}
public int this[int x, int y]
{
get
{
byte[] data = new byte[12];
Buffer.BlockCopy(BitConverter.GetBytes(x), 0, data, 0, 4);
Buffer.BlockCopy(BitConverter.GetBytes(y), 0, data, 4, 4);
Buffer.BlockCopy(BitConverter.GetBytes(_seed), 0, data, 8, 4);
byte[] hash = _hashProvider.ComputeHash(data);
return BitConverter.ToInt32(hash, 0);
}
}
}
PRNGs can be built out of hash functions.
This is what e.g. MS Research did with parallelizing random number generation with MD5 or others with TEA on a GPU.
(In fact, PRNGs can be thought of as a hash function from (seed, state) => nextnumber.)
Generating massive amounts of random numbers on a GPU brings up similar problems.
(E.g., to make it parallel, there should not be a single shared state.)
Although it is more common in the crypto world, using a crypto hash, I have taken the liberty to use MurmurHash 2.0 for sake of speed and simplicity. It has very good statistical properties as a hash function. A related, but not identical test shows that it gives good results as a PRNG. (unless I have fsc#kd up something in the C# code, that is.:) Feel free to use any other suitable hash function; crypto ones (MD5, TEA, SHA) as well - though crypto hashes tend to be much slower.
public class LazyRandomMatrix
{
private uint seed;
public LazyRandomMatrix(int seed)
{
this.seed = (uint)seed;
}
public int this[int x, int y]
{
get
{
return MurmurHash2((uint)x, (uint)y, seed);
}
}
static int MurmurHash2(uint key1, uint key2, uint seed)
{
// 'm' and 'r' are mixing constants generated offline.
// They're not really 'magic', they just happen to work well.
const uint m = 0x5bd1e995;
const int r = 24;
// Initialize the hash to a 'random' value
uint h = seed ^ 8;
// Mix 4 bytes at a time into the hash
key1 *= m;
key1 ^= key1 >> r;
key1 *= m;
h *= m;
h ^= key1;
key2 *= m;
key2 ^= key2 >> r;
key2 *= m;
h *= m;
h ^= key2;
// Do a few final mixes of the hash to ensure the last few
// bytes are well-incorporated.
h ^= h >> 13;
h *= m;
h ^= h >> 15;
return (int)h;
}
}
A pseudo-random number generator is essentially a function that deterministically calculates a successor for a given value.
You could invent a simple algorithm that calculates a value from its neighbours. If a neighbour doesn't have a value yet, calculate its value from its respective neighbours first.
Something like this:
value(0,0) = seed
value(x+1,0) = successor(value(x,0))
value(x,y+1) = successor(value(x,y))
Example with successor(n) = n+1 to calculate value(2,4):
\ x 0 1 2
y +-------------------
0 | 627 628 629
1 | 630
2 | 631
3 | 632
4 | 633
This example algorithm is obviously not very good, but you get the idea.
You want a random number generator with random access to the elements, instead of sequential access. (Note that you can reduce your two coordinates into a single index, e.g. by computing i = x + (y << 16).)
A cool example of such a generator is Blum Blum Shub, which is a cryptographically secure PRNG with easy random-access. Unfortunately, it is very slow.
A more practical example is the well-known linear congruential generator. You can easily modify one to allow random access. Consider the definition:
X(0) = S
X(n) = B + X(n-1)*A (mod M)
Evaluating this directly would take O(n) time (that's pseudo linear, not linear), but you can convert to a non-recursive form:
//Expand a few times to see the pattern:
X(n) = B + X(n-1)*A (mod M)
X(n) = B + (B + X(n-2)*A)*A (mod M)
X(n) = B + (B + (B + X(n-3)*A)*A)*A (mod M)
//Aha! I see it now, and I can reduce it to a closed form:
X(n) = B + B*A + B*A*A + ... + B*A^(n-1) + S*A^n (mod M)
X(n) = S*A^n + B*SUM[i:0..n-1](A^i) (mod M)
X(n) = S*A^n + B*(A^n - 1)/(A - 1) (mod M)
That last equation can be computed relatively quickly, although the second part of it is a bit tricky to get right (because division doesn't distribute over mod the same way addition and multiplication do).
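One way to sidestep the awkward modular division is to skip the closed form and instead compose the step x -> A*x + B with itself by repeated squaring, which gives the same X(n) in O(log n) multiplications. This is a swapped-in technique, not the formula above, and the sketch makes some assumptions: LcgRandomAccess and ValueAt are hypothetical names, and M is limited to at most 2^32 so that every intermediate product fits in a ulong.

```csharp
using System;

// Hypothetical sketch: random access into X(n) = (A*X(n-1) + B) mod M.
public static class LcgRandomAccess
{
    // Returns X(n) for X(0) = s. Requires 1 <= m <= 2^32.
    public static ulong ValueAt(ulong s, ulong a, ulong b, ulong m, ulong n)
    {
        if (m == 0 || m > (1UL << 32))
            throw new ArgumentOutOfRangeException(nameof(m));

        // acc represents the map x -> accA*x + accB, starting as identity.
        ulong accA = 1 % m, accB = 0;
        // cur represents the map for 2^k steps, starting with a single step.
        ulong curA = a % m, curB = b % m;

        while (n > 0)
        {
            if ((n & 1) == 1)
            {
                // acc = cur after acc: curA*(accA*x + accB) + curB
                accB = (curA * accB + curB) % m;
                accA = curA * accA % m;
            }
            // cur = cur after cur: curA*(curA*x + curB) + curB
            curB = (curA * curB + curB) % m;
            curA = curA * curA % m;
            n >>= 1;
        }
        return (accA * (s % m) + accB) % m;
    }
}
```

To index a matrix with it, the two coordinates would first be folded into a single n, for example with the x + (y << 16) mapping mentioned earlier.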
As far as I see, there are 2 basic algorithms possible here:
Generate a new random number based on func(seed, coord) for each coord
Generate a single random number from seed, and then transform it for the coord (something like rotate(x) + translate(y) or whatever)
In the first case, you have the problem of always generating new random numbers, although this may not be as expensive as you fear.
In the second case, the problem is that you may lose randomness during your transformation operations. However, in either case the result is reproducible.
