smart way to generate unique random number - c#

I want to generate a sequence of unique random numbers in the range 00000001 to 99999999.
So the first one might be 00001010, the second 40002928, etc.
The easy way is to generate a random number and store it in the database, then each subsequent time generate another, check in the database whether it already exists, and if so generate a new one, check again, and so on.
But that doesn't look right: I could be regenerating a number maybe 100 times once the number of generated items gets large.
Is there a smarter way?
EDIT
As always, I forgot to say WHY I wanted this, which will probably make things clearer and maybe suggest an alternative:
We want to generate an order number for a booking, so we could just use 000001, 000002, etc. But we don't want to give competitors a clue about how many orders are created (it's not a high-volume market, and we don't want them to know whether we are on order 30 after 2 months or on order 100). So we want an order number that is random (yet unique).

You can use either a Linear Congruential Generator (LCG) or a Linear Feedback Shift Register (LFSR). Google or Wikipedia for more info.
Both can, with the right parameters, operate on a 'full-cycle' (or 'full period') basis so that they generate each 'pseudo-random number' only once in a single period, and generate all numbers within the range. Both are 'weak' generators, so no good for cryptography, but perhaps 'good enough' for apparent randomness. You may have to constrain the period to work within your 'decimal' maximum, since these generators usually come with 'binary' (power-of-two) periods.
Update: I should add that it is not necessary to pre-calculate or pre-store previous values in any way, you only need to keep the previous seed-value (single int) and calculate 'on-demand' the next number in the sequence. Of course you can save a chain of pre-calculated numbers to your DB if desired, but it isn't necessary.
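As a minimal sketch of the full-cycle idea for the 8-digit range in the question: with modulus m = 100,000,000, an LCG visits every state exactly once per period when the increment is coprime to m and (multiplier - 1) is divisible by 20 (the Hull-Dobell conditions for m = 2^8 * 5^8). The class name and the constants below are illustrative assumptions, not values from this answer, and the output is only "random looking", not secure.

```csharp
using System;

public static class FullCycleLcg
{
    // Illustrative constants (assumptions): any values meeting the conditions above would do.
    public const long Modulus = 100000000;   // 10^8, so states are 0..99999999
    public const long Multiplier = 1664521;  // Multiplier - 1 is divisible by 20
    public const long Increment = 12345677;  // odd and not a multiple of 5

    // One LCG step: the next state depends only on the previous state.
    public static long Next(long state, long a = Multiplier, long c = Increment, long m = Modulus)
    {
        return (a * state + c) % m;
    }

    // Advances the stored state and skips 0, which is outside 00000001..99999999.
    public static string NextOrderNumber(ref long state)
    {
        do
        {
            state = Next(state);
        } while (state == 0);
        return state.ToString("D8"); // zero-padded to 8 digits
    }
}
```

Only the last state (a single number) needs to be persisted between calls; each call to NextOrderNumber updates it.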

How about creating a set of all possible numbers and simply randomising the order? You could then just pick the next number from the tail.
Each number appears only once in the set, and when you want a new one it has already been generated, so the overhead is tiny at the point at which you want one. You could do this in memory or the database of your choice. You'll just need a sensible locking strategy for pulling the next available number.

You could build a table with all the possible numbers in it, give the record a 'used' field.
Select all records that have not been 'used'
Pick a random number (r) between 1 and record count
Take record number r
Get your 'random value' from the record
Set the 'used' flag and update the db.
That should be more efficient than picking random numbers, querying the database and repeat until not found as that's just begging for an eternity for the last few values.

Use Pseudo-random Number Generators.
For example - Linear Congruential Random Number Generator
(if increment and n are coprime, then code will generate all numbers from 0 to n-1):
int seed = 1, increment = 3;
int n = 10;
int x = seed;
for (int i = 0; i < n; i++)
{
    x = (x + increment) % n;
    Console.WriteLine(x);
}
Output:
4
7
0
3
6
9
2
5
8
1
Basic Random Number Generators
Mersenne Twister

Using this algorithm might be suitable, though it's memory consuming:
http://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle
Put the numbers in the array from 1 to 99999999 and do the shuffle.
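A minimal sketch of that shuffle in C# (the class and method names are assumptions made for illustration). For the full range 1 to 99999999 the array alone takes roughly 400 MB, which is the memory cost mentioned above.

```csharp
using System;

public static class Shuffler
{
    // Returns the values start..start+count-1 in a uniformly random order.
    public static int[] ShuffledRange(int start, int count, Random rng)
    {
        var values = new int[count];
        for (int i = 0; i < count; i++)
        {
            values[i] = start + i;
        }
        // Fisher-Yates: walk from the end, swapping each slot with a random slot at or before it.
        for (int i = count - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1); // 0 <= j <= i
            int temp = values[i];
            values[i] = values[j];
            values[j] = temp;
        }
        return values;
    }
}
```

Calling Shuffler.ShuffledRange(1, 99999999, new Random()) would give the whole sequence up front; the numbers can then be handed out one by one.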

For the extremely limited size of your numbers, no, you cannot expect uniqueness from random generation alone.
You are generating a number that fits in a 32-bit integer, whereas to avoid collisions with purely random values you need a much larger number, around 128 bits, which is the size GUIDs use to make duplicates practically impossible.

In case you happen to have access to a library and you want to dig into and understand the issue well, take a look at
The Art of Computer Programming, Volume 2: Seminumerical Algorithms
by Donald E. Knuth. Chapter 3 is all about random numbers.

You could just place your numbers in a set. If the size of the set after generation of your N numbers is too small, generate some more.
Do some trial runs. How many numbers do you have to generate on average? Try to find out an optimal solution to the tradeoff "generate too many numbers" / "check too often for duplicates". This optimal is a number M, so that after generating M numbers, your set will likely hold N unique numbers.
Oh, and M can also be calculated: If you need an extra number (your set contains N-1), then the chance of a random number already being in the set is (N-1)/R, with R being the range. I'm going crosseyed here, so you'll have to figure this out yourself (but this kinda stuff is what makes programming fun, no?).

You could put a unique constraint on the column that contains the random number, then handle any constraint violations by regenerating the number. I think this normally indexes the column as well, so this would be faster.
You've tagged the question with C#, so I'm guessing you're using C# to generate the random number. Maybe think about getting the database to generate the random number in a stored proc, and return it.

You could try generating the ids by using a starting number and an incremental number. You start at a number (say, 12000), then, for each account created, the number goes up by the incremental value.
id = startValue + (totalNumberOfAccounts * incrementalNumber)
If incrementalNumber is a prime value, you should be able to loop around the max account value and not hit another value. This creates the illusion of a random id, but should also produce very few conflicts. We still want to handle conflicts, since if we encounter one account value that is identical, we will bump into another conflict when we increment again. In the case of a conflict, you could add a counter that increases each time there's a conflict, so the above code becomes:
id = startValue + (totalNumberOfAccounts * incrementalNumber) + totalConflicts

With the following lines we can get e.g. 6 non-repeating random numbers in a range of e.g. 1 to 100.
var randomNumbers = Enumerable.Range(1, 100)
.OrderBy(n => Guid.NewGuid())
.Take(6)
.OrderBy(n => n);

I've had to do something like this before (create a "random looking" number for part of a URL). What I did was create a list of randomly generated keys. Each time it needed a new number, it randomly selected an index below keys.Count, XORed that key with the given sequence number, then output the XORed value (in base 62) prefixed with the key's index (in base 62).
I also check the output to ensure it does not contain any naughty words. If it does, simply take the next key and have a second go.
Decrypting the number is equally simple (the first digit is the index of the key to use; a simple XOR and you are done).
I like andora's answer if you are generating new numbers, and might have used it had I known. However, if I were to do this again I would simply have used UUIDs. Most (if not every) platform has a method for generating them, and the length is just not an issue for URLs.

You could try shuffling the set of possible values then using them sequentially.

I like Lazarus's solution, but if you want to avoid effectively pre-allocating the space for every possible number, just store the used numbers in the table, and build an "unused numbers" list in memory by adding all possible numbers to a collection and then deleting every one that's present in the database. Then select one of the remaining numbers and use that, adding it to the list in the database, obviously.
But, like I say, I like Lazarus's solution; I think that's your best bet for most scenarios.

function getShuffledNumbers(count) {
    var shuffledNumbers = new Array();
    var choices = new Array();
    for (var i = 0; i<count; i++) {
        // choose a number between 1 and amount of numbers remaining
        choices[i] = selectedNumber = Math.ceil(Math.random()*(99999999 - i));
        // Now to figure out the number based on this selection, work backwards until
        // you figure out which choice this number WOULD have been on the first step
        for (var j = 0; j < i; j++) {
            if (choices[i - 1 - j] >= selectedNumber) {
                // This basically says "it was choice number (selectedNumber) on the last step,
                // but if it's greater than or equal to this, it must have been choice number
                // (selectedNumber + 1) on THIS step."
                selectedNumber++;
            }
        }
        shuffledNumbers[i] = selectedNumber;
    }
    return shuffledNumbers;
}
This is as fast a way as I could think of, and it only uses memory as it needs it. However, if you run it all the way through it will use twice as much memory, because it has two arrays, choices and shuffledNumbers.

Running a linear congruential generator once to generate each number is apt to produce rather feeble results. Running it through a number of iterations which is relatively prime to your base (100,000,000 in this case) will improve it considerably. If before reporting each output from the generator, you run it through one or more additional permutation functions, the final output will still be a duplicate-free permutation of as many numbers as you want (up to 100,000,000) but if the proper functions are chosen the result can be cryptographically strong.

Create and store in the db two shuffled versions (SHUFFLE_1 and SHUFFLE_2) of the interval [0..N), where N = 10,000;
whenever a new order is created, you assign its id like this:
ORDER_FAKE_INDEX = N*SHUFFLE_1[ORDER_REAL_INDEX / N] + SHUFFLE_2[ORDER_REAL_INDEX % N]
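The formula above can be sketched in C# as follows, assuming the two shuffles are held as int arrays of length N (the class and method names are hypothetical). Because the high part and the low part are each permuted, distinct real indexes below N*N always map to distinct fake indexes.

```csharp
using System;

public static class OrderIdScrambler
{
    // shuffle1 and shuffle2 are two independent permutations of 0..N-1,
    // created once and stored (for example in the database).
    public static int FakeIndex(int realIndex, int[] shuffle1, int[] shuffle2)
    {
        int n = shuffle1.Length;
        if (shuffle2.Length != n)
        {
            throw new ArgumentException("Both shuffles must have length N.");
        }
        if (realIndex < 0 || realIndex >= n * n)
        {
            throw new ArgumentOutOfRangeException(nameof(realIndex));
        }
        // High part and low part are permuted separately, so distinct inputs stay distinct.
        return n * shuffle1[realIndex / n] + shuffle2[realIndex % n];
    }
}
```

One thing to be aware of: the first N consecutive orders share the same value of ORDER_REAL_INDEX / N, so their fake ids all start with the same leading digits.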

I also came across the same kind of problem, but in C#. I finally solved it. Hope it works for you also.
Suppose I need a random number between 0 and some MaxValue, and have a Random object called random.
int n = 0;
while (n < MaxValue)
{
    // Note: Next(n, MaxValue) only raises the lower bound each pass; it can still return a value more than once.
    int i = random.Next(n, MaxValue);
    n++;
    Console.WriteLine(i.ToString());
}

The stupid way: build a table, store all the numbers in it first, and then, every time a number is used, flag it as "used".

System.Random rnd = new System.Random();
IEnumerable<int> numbers = Enumerable.Range(1, 99999999).OrderBy(r => rnd.Next());
This gives a randomly shuffled collection of ints in your range. You can then iterate through the collection in order.
Correction (see comments below): I originally claimed that this does not create the entire collection in memory, but it does generate the entire collection in memory when you iterate to the first element.

You can generate numbers like below if you are OK with the memory consumption.
import java.util.ArrayList;
import java.util.Collections;
public class UniqueRandomNumbers {
    public static void main(String[] args) {
        // Fill the list with 1..10
        ArrayList<Integer> list = new ArrayList<Integer>();
        for (int i = 1; i < 11; i++) {
            list.add(i);
        }
        Collections.shuffle(list);
        // Print every element; the list holds 10 items, so stop at list.size()
        for (int i = 0; i < list.size(); i++) {
            System.out.println(list.get(i));
        }
    }
}

Related

c# format preserving encryption for integers

I have a requirement for generating numeric codes that will be used as redemption codes for vouchers or similar. The requirement is that the codes are numeric and relatively short for speed on data entry for till operators. Around 6 characters long and numeric. We know that's a small number so we have a process in place so that the codes can expire and be re-used.
We started off by just using a sequential integer generator which is working well in terms of generating a unique code. The issue with this is that the codes generated are sequential so predictable which means customers could guess codes that we generate and redeem a voucher not meant for them.
I've been reading up on Format Preserving Encryption which seems like it might work well for us. We don't need to decrypt the code back at any point as the code itself is arbitrary we just need to ensure it's not predictable (by everyday people). It's not crucial for security it's just to keep honest people honest.
There are various ciphers referenced in the wikipedia article but I have very basic cryptographic and mathematical skills and am not capable of writing my own code to achieve this based on the ciphers.
I guess my question is, does anyone know of a c# implementation of this that will encrypt an integer into another integer and maintain the same length?
FPE seems to be used well for encrypting a 16 digit credit card number into another 16 digit number. We need the same sort of thing, not necessarily fixed to one length, but such that the plain value's length matches the encrypted value's length.
So the following four integers would be encrypted
from
123456
123457
123458
123459
to something non-sequential like this
521482
265012
961450
346582
I'm open to any other suggestions to achieve this; FPE just seemed like a good option.
EDIT
Thanks for the suggestions around just generating a unique code and storing them and checking for duplicates. for now we've avoided doing this because we don't want to have to check storage when we generate. This is why we use a sequential integer generator so we don't need to check if the code is unique or not. I'll re-investigate doing this but for now still looking for ways to avoid having to go to storage each time we generate a code.
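
For reference, one common way to get a storage-free, collision-free mapping like the one described is a small Feistel network combined with cycle-walking: the Feistel rounds permute all values of a fixed bit width, and any result that falls outside the wanted range is simply encrypted again until it lands inside. The following is only a toy sketch under stated assumptions: the class name, the round function, and the number of rounds are made up for illustration, and it is not a vetted cipher, so it fits the "keep honest people honest" goal rather than real security.

```csharp
using System;

public static class ToyFpe
{
    // Round function: any deterministic mix of (half, round, key) keeps the network invertible.
    private static uint Round(uint half, int round, uint key, uint mask)
    {
        unchecked
        {
            uint h = half * 2654435761u + key + (uint)round * 40503u;
            h ^= h >> 15;
            h *= 2246822519u;
            h ^= h >> 13;
            return h & mask;
        }
    }

    // Four Feistel rounds over a value that is 2 * halfBits wide; this is a bijection on that width.
    private static uint FeistelOnce(uint value, uint key, int halfBits)
    {
        uint mask = (1u << halfBits) - 1;
        uint left = value >> halfBits;
        uint right = value & mask;
        for (int round = 0; round < 4; round++)
        {
            uint next = left ^ Round(right, round, key, mask);
            left = right;
            right = next;
        }
        return (left << halfBits) | right;
    }

    // Maps 0..domainSize-1 onto itself without collisions (cycle-walking).
    public static uint Encrypt(uint value, uint domainSize, uint key, int halfBits)
    {
        if (halfBits < 1 || halfBits > 16 || (1UL << (2 * halfBits)) < domainSize)
        {
            throw new ArgumentException("2^(2*halfBits) must cover the domain.");
        }
        if (value >= domainSize)
        {
            throw new ArgumentOutOfRangeException(nameof(value));
        }
        uint result = value;
        do
        {
            result = FeistelOnce(result, key, halfBits);
        } while (result >= domainSize); // walk until we are back inside the domain
        return result;
    }
}
```

For 6-digit codes one would call ToyFpe.Encrypt(sequentialNumber, 1000000, key, 10) and left-pad the result with zeros to 6 digits, since small outputs such as 42 are valid results.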

I wonder if this will not be off base also, but let me give it a try. This solution will require no storage but will require processing power (a tiny amount, but it would not be pencil-and-paper easy). It is essentially a homemade PRNG but may have characteristics more suitable to what you want to do than the built-in ones do.
To make your number generator, make a polynomial with prime coefficients and a prime modulus. For example, let X represent the Nth voucher you issued. Then:
Voucher Number = (23x^4+19x^3+5x^2+29x+3)%65537. This is of course just an example; you could use any number of terms, any primes you want for the coefficients, and you can make the modulus as large as you like. In fact, the modulus does not need to be prime at all. It only sets the maximum voucher number. Having the coefficients be prime helps cut down on collisions.
In this case, vouchers #100, 101, and 102 would have numbers 26158, 12076, and 6949, respectively. Consider it a sort of toy encryption where the coefficients are your key. Not super secure, but nothing with an output space as small as you are asking for would be secure against a strong adversary. But this should stop the everyday fraudster.
To confirm a valid voucher would take the computer (but calculation only, not storage). It would iterate through a few thousand or tens of thousands of input X looking for the output Y that matches the voucher presented to you. When it found the match, it could signal a valid voucher.
Alternatively, you could issue the vouchers with the serial number and the calculation concatenated together, like a value and checksum. Then you could run the calculation on the value by hand using your secret coefficients to confirm validity.
As long as you do not reveal the coefficients to anyone, it is very hard to identify a pattern in the outputs. I am not sure if this is even close to as secure as what you were looking for, but posting the idea just in case.
I miscalculated the output for 100 (did it by hand and failed). Corrected it just now. Let me add some code to illustrate how I'd check for a valid voucher:
using System;
using System.Numerics;
namespace Vouchers
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.Write("Enter voucher number: ");
            BigInteger input = BigInteger.Parse(Console.ReadLine());
            for (BigInteger i = 0; i < 10000000; i++)
            {
                BigInteger testValue = (23 * i * i * i * i + 19 * i * i * i + 5 * i * i + 29 * i + 3) % 65537;
                if (testValue == input)
                {
                    Console.WriteLine("That is voucher # " + i.ToString());
                    break;
                }
                if (i == 100) Console.WriteLine(testValue);
            }
            Console.ReadKey();
        }
    }
}

One option is to build an in-place random permutation of the numbers. Consider this code:
private static readonly Random random = new Random((int)DateTime.UtcNow.Ticks);
private static int GetRandomPermutation(int input)
{
    char[] chars = input.ToString().ToCharArray();
    for (int i = 0; i < chars.Length; i++)
    {
        int j = random.Next(chars.Length);
        if (j != i)
        {
            char temp = chars[i];
            chars[i] = chars[j];
            chars[j] = temp;
        }
    }
    return int.Parse(new string(chars));
}
You mentioned running into performance issues with some other techniques. This method does a lot of work, so it may not meet your performance requirements. It's a neat academic exercise, anyway.

Thanks for the help from the comments to my original post on this from Blogbeard and lc. It turns out we needed to hit storage when generating the codes anyway, so implementing a PRNG was a better option for us than messing around with encryption.
This is what we ended up doing
Continue to use our sequential number generator to generate integers
Create an instance of C# Random class (a PRNG) using the sequential number as a seed.
Generate a random number within the range of the minimum and maximum number we want.
Check for duplicates and regenerate until we find a unique one
Turns out using c# random with a seed makes the random numbers actually quite predictable when using the sequential number as a seed for each generation.
For example with a range between 1 and 999999 using a sequential seed I tested generating 500000 values without a single collision.

Probability with Random.Next()

I want to write a lottery draw program which needs to randomly choose 20000 numbers from 1-2000000 range. The code is as below:
Random r = new Random(seed); // seed is 6 digits, e.g. 123456
int i = 0;
while (true) {
    r.Next(2000000);
    i++;
    if (i >= 20000)
        break;
}
My questions are:
Does it give every number from 1 to 2000000 the same probability?
Is the upper bound 2000000 included in the r.Next()?
Any suggestion?

The .NET Random class does a fairly good job of generating random numbers. However be aware that if you seed it with the same number you'll get the same "random" numbers each time. If you don't want this behavior don't provide a seed.
If you're after a more random number generator than the built-in .NET one, then take a look at random.org. It's one of the best sites out there for getting true random numbers, and I believe there's an API. Here's a quote from their site:
RANDOM.ORG offers true random numbers to anyone on the Internet. The
randomness comes from atmospheric noise, which for many purposes is
better than the pseudo-random number algorithms typically used in
computer programs. People use RANDOM.ORG for holding drawings,
lotteries and sweepstakes, to drive games and gambling sites, for
scientific applications and for art and music. The service has existed
since 1998 and was built by Dr Mads Haahr of the School of Computer
Science and Statistics at Trinity College, Dublin in Ireland. Today,
RANDOM.ORG is operated by Randomness and Integrity Services Ltd.
Finally, the upper bound of Random.Next(maxValue) is exclusive, so the upper value you supply will never be returned. You may need to adjust your code accordingly if you want 2000000 to be in there.

It includes the minValue but does not include the maxValue. Therefore if you want to generate numbers from 1 to 2000000 use:
r.Next(1,2000001)

I believe your question is implementation dependent.
The naïve method of generating a random integer in a range is to generate a random 32-bit word and then normalise it across your range.
The larger the range you're normalising the more the probabilities of each individual value fluctuate.
In your situation, you're normalising 4.3 billion inputs over 2 million outputs. This will mean that the probabilities of each number in your range will differ by up to about 1 in 2000 (or 0.05%). If this slight difference in probabilities is okay for you, then go ahead.

Upperbound included?
No, the upperbound is exclusive so you'll have to use 2000001 to include 2000000.
Any suggestion?
Let me take the liberty of suggesting not to use a while(true) / break. Simply put the condition of the if in your while statement:
Random r = new Random(seed); // seed is 6 digits, e.g. 123456
int i = 0;
while (i++ < 20000)
{
    r.Next(1, 2000001);
}
I know this is nitpicking, but it is a suggestion... :)
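Note that the loops in this thread discard the value returned by r.Next and do nothing to rule out repeats. A minimal sketch, assuming the draw needs 20000 distinct numbers (the question does not say so explicitly, and the class and method names are made up for illustration):

```csharp
using System;
using System.Collections.Generic;

public static class LotteryDraw
{
    // Draws 'count' distinct numbers from min..max (both inclusive).
    public static HashSet<int> Draw(int count, int min, int max, Random rng)
    {
        if (count > (long)max - min + 1)
        {
            throw new ArgumentException("Range is too small for that many distinct numbers.");
        }
        var drawn = new HashSet<int>();
        while (drawn.Count < count)
        {
            // Next's upper bound is exclusive, hence max + 1; Add ignores repeats.
            drawn.Add(rng.Next(min, max + 1));
        }
        return drawn;
    }
}
```

For the numbers in the question this would be LotteryDraw.Draw(20000, 1, 2000000, new Random(seed)); with only 1% of the range being drawn, repeats are rare and the loop finishes quickly.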

How to generate 6-8 digit random numbers without colliding with previous generated numbers in time critical fashion?

In this scenario I have some managers (around 150 in number). One of their daily jobs is to generate 50 (constant) authorisation codes (6-8 digit numbers), which are stored in the db with their Id. When an authorisation code is used it is marked as used, and triggers delete codes once they are 15 days old and have been used.
In my table I have set the authorisation code as a unique key. I generate a random number, then query the db; if it exists I generate another, otherwise I save it.
Everything is fine except my logic for checking the existence of the number in the db. This round trip plus checking is causing significant delay, as there are now over 1090083 pending authorisation codes. Since these authorisation codes are in circulation we can't revoke them, and with the current load it is taking some time to find new numbers.
I need to implement this with different logic, for which execution time should be low regardless of how many random numbers have been used.
My table is designed as follows:
slno (auto increment) || auth_code (random code) || auth_by (created by) || used (1=used/0=unused)

The easiest thing to do is to generate random numbers and generate a new random id if you get a duplicate. This works because with your figures the probability of getting a duplicate is pretty small.
If that doesn't convince you, you can think of many schemes that guarantee mathematically that the numbers will be unique and still look random, but it gets complex.

Consider this. If the random codes are unique and stored in the database in some kind of (code_id, code, other_data) table, you can just add another table to your database: (code, code_id), with the code field being indexed, granting you a nice logarithmic search.
But given this, you can also create an additional key right in your first table instead. As long as code is unique, it would work fine.

If your database does not support creating unique ids:
1. Set up a table with all random numbers, sorted by value, whose size is stored and available.
2. Randomly select an element of this table.
3. Get the successor element. If the successor element is a neighbor of the element, take the next successor element. If you reach the last element, start over with the element from step 2 and now take the predecessor.
4. Now simply choose a random value in the range between the element and the next element, and you have your random number.
Ready!
EXAMPLE: You stored all your ids in a sorted table. Lets assume this is e.g.
{890, 1045, 2345, 2346, 4087}
First step: Select one of them randomly. You get that e.g. by C#
Random random = new Random();
int indexOfNumber = random.Next(0, myTableSize);
Second step: You got the index, lets assume it is 2. You are now getting the next number at index 3, it is 2346. Unfortunately it is a direct neighbor, so you continue to index 4.
This is 4087.
Third step: Create your number by
int myRandomNumber = previousElement + random.Next(1,nextElement-previousElement);
in this case:
int myRandomNumber = 2346 + random.Next(1, 4087-2346);
Store the new random number.
With this you will read mostly two elements from the database (probably some more) independent of the size of the database. Creating two random numbers is insignificant.
You must only care for the edge cases if your index is at the end (simply reverse the search direction).
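The steps above can be sketched as follows, assuming for illustration that the sorted codes are available as an in-memory list rather than read from the database (the class and method names are hypothetical):

```csharp
using System;
using System.Collections.Generic;

public static class GapPicker
{
    // sortedUsed: the stored codes in ascending order, at least two entries.
    public static int PickUnused(IList<int> sortedUsed, Random rng)
    {
        int start = rng.Next(sortedUsed.Count);
        // Walk forward from the random start until two neighbours leave a gap.
        for (int i = start; i + 1 < sortedUsed.Count; i++)
        {
            int gap = sortedUsed[i + 1] - sortedUsed[i];
            if (gap > 1)
            {
                return sortedUsed[i] + rng.Next(1, gap); // strictly between the two
            }
        }
        // Reached the end: walk backward from the same start instead.
        for (int i = start; i > 0; i--)
        {
            int gap = sortedUsed[i] - sortedUsed[i - 1];
            if (gap > 1)
            {
                return sortedUsed[i - 1] + rng.Next(1, gap);
            }
        }
        throw new InvalidOperationException("No free value between the stored codes.");
    }
}
```

With the example list {890, 1045, 2345, 2346, 4087} this always returns a value between 890 and 4087 that is not yet in the list; values below the smallest or above the largest stored code are never produced by this sketch.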

Control over random numbers in C#

If I use two random numbers, how can I ensure that the first of these numbers generated is always larger than the second in order to present a subtraction or a divide quiz to the user?

You don't.
Just check which one is larger and present accordingly.

You generate the second random number and add it to the first one.
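A minimal sketch of that idea (the names are illustrative assumptions): draw the smaller number, then build the larger one by adding a random difference of at least 1, so no retry loop is needed.

```csharp
using System;

public static class QuizNumbers
{
    // Returns (larger, smaller) with larger > smaller >= 1.
    public static Tuple<int, int> LargerAndSmaller(Random rnd, int max)
    {
        int smaller = rnd.Next(1, max);     // 1 .. max-1
        int difference = rnd.Next(1, max);  // at least 1, so the two never tie
        return Tuple.Create(smaller + difference, smaller);
    }
}
```

Note that the larger number can then exceed max (it can reach 2*max - 2), which may or may not suit the quiz.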

var max = 1000;
var rnd = new Random();
int a, b;
do
{
    a = rnd.Next(max);
    b = rnd.Next(max);
} while (a <= b);
You can use similar approach for more complex conditions too (for example if your task is to generate 2 numbers that in sum give more than 100, etc).
You will have to make your code smarter if probability of random numbers satisfying your condition is so small that generation takes too much time but for this particular task this approach is good enough.

You can generate numbers like this (pseudocode): (int)(a*rand()+b) where a and b control the range and starting point of your random numbers.
If a1=10, b1=1 for instance you get a range of 1-10. With a2=10 and b2=11 you get numbers in the range 11-20, which might be good for simple subtraction problems.

Dictionary of Primes

I was trying to create this helper function in C# that returns the first n prime numbers. I decided to store the numbers in a dictionary in the <int,bool> format. The key is the number in question and the bool represents whether the int is a prime or not. There are a ton of resources out there calculating/generating the prime numbers(SO included), so I thought of joining the masses by crafting another trivial prime number generator.
My logic goes as follows:
public static Dictionary<int,bool> GetAllPrimes(int number)
{
    Dictionary<int, bool> numberArray = new Dictionary<int, bool>();
    int current = 2;
    while (current <= number)
    {
        //If current has not been marked as prime in previous iterations,mark it as prime
        if (!numberArray.ContainsKey(current))
            numberArray.Add(current, true);
        int i = 2;
        while (current * i <= number)
        {
            if (!numberArray.ContainsKey(current * i))
                numberArray.Add(current * i, false);
            else if (numberArray[current * i])//current*i cannot be a prime
                numberArray[current * i] = false;
            i++;
        }
        current++;
    }
    return numberArray;
}
It will be great if the wise provide me with suggestions,optimizations, with possible refactorings. I was also wondering if the inclusion of the Dictionary helps with the run-time of this snippet.

Storing integers explicitly needs at least 32 bits per prime number, with some overhead for the container structure.
At around 2^31, the maximal value a signed 32-bit integer can take, about every 21.5th number is prime. Smaller primes are more dense: about 1 in ln(n) numbers is prime around n.
This means it is more memory efficient to use an array of bits than to store numbers explicitly. It will also be much faster to look up if a number is prime, and reasonably fast to iterate through the primes.
It seems this is called a BitArray in C# (in Java it is BitSet).
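A minimal sketch of a sieve that stores one bit per number using System.Collections.BitArray (the class and method names are assumptions made for illustration):

```csharp
using System;
using System.Collections;

public static class BitSieve
{
    // Returns a BitArray where isPrime[n] is true exactly when n is prime, for 0 <= n <= limit.
    public static BitArray Sieve(int limit)
    {
        if (limit < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(limit));
        }
        var isPrime = new BitArray(limit + 1, true);
        isPrime[0] = false;
        if (limit >= 1)
        {
            isPrime[1] = false;
        }
        for (int p = 2; (long)p * p <= limit; p++)
        {
            if (!isPrime[p])
            {
                continue;
            }
            // Multiples below p*p were already crossed out by smaller primes.
            for (long multiple = (long)p * p; multiple <= limit; multiple += p)
            {
                isPrime[(int)multiple] = false;
            }
        }
        return isPrime;
    }
}
```

Looking up whether n is prime is then just BitSieve.Sieve(limit)[n], and iterating the primes means scanning for set bits.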

The first thing that bothers is that, why are you storing the number itself ?
Can't you just use the index itself which will represent the number?
PS: I'm not a c# developer so maybe it is not possible with a dictionary, but it can be done with the appropriate structure.

First, you only have to loop until the square root of the number. Make all numbers false by default and have a simple flag that you set true at the beginning of every iteration.
Further, don't store it in a dictionary. Make it a bool array and have the index be the number you're looking for. Only 0 won't make any sense, but that doesn't matter. You don't have to init either; bools are false by default. Just declare a bool[] of number length.
Then, I would init like this:
primes[2] = true;
for(int i = 3; i < sqrtNumber; i += 2) {
}
So you skip all the even numbers automatically.
By the way, never declare a variable (i) in a loop, it makes it slower.
So that's about it. For more info see this page.

I'm pretty sure the Dictionary actually hurts performance, since it doesn't enable you to perform the trial divisions in an optimal order. Traditionally, you would store the known primes so that they could be iterated from smallest to largest, since smaller primes are factors of more composite numbers than larger primes. Additionally, you never need to try division with any prime larger than the square root of the candidate prime.
Many other optimizations are possible (as you yourself point out, this problem has been studied to death) but those are the ones that I can see off the top of my head.

The dictionary really doesn't make sense here -- just store all primes up to a given number in a list. Then follow these steps:
Is given number in the list?
    Yes - it's prime. Done.
    Not in list:
        Is given number larger than the list maximum?
            No - it's not prime. Done.
            Bigger than maximum; need to fill list up to maximum.
                Run a sieve up to given number.
                Repeat.

1) From the perspective of the client to this function, wouldn't it be better if the return type was bool[] (from 0 to number perhaps)? Internally, you have three states (KnownPrime, KnownComposite, Unknown), which could be represented by an enumeration. Storing an array of this enumeration internally, prepopulated with Unknown, will be faster than a dictionary.
2) If you stick with the dictionary, the part of the sieve that marks multiples of the current number as composite could be replaced with a numberArray.TryGetValue() pattern rather than multiple checks for ContainsKey and subsequent retrieval of the value by key.

The trouble with returning an object that holds the primes is that unless you're careful to make it immutable, client code is free to mess up the values, in turn meaning you're not able to cache the primes you've already calculated.
How about having a method such as:
bool IsPrime(int primeTest);
in your helper class that can hide the primes it's already calculated, meaning you don't have to re-calculate them every time.
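A minimal sketch of such a helper, assuming trial division against a privately cached list of primes (the class name and the caching strategy are illustrative assumptions, not something specified in this answer):

```csharp
using System;
using System.Collections.Generic;

public class PrimeHelper
{
    // Primes found so far, in ascending order; private so callers cannot alter the cache.
    private readonly List<int> knownPrimes = new List<int> { 2, 3 };

    public bool IsPrime(int primeTest)
    {
        if (primeTest < 2)
        {
            return false;
        }
        // Extend the cache until it covers every prime up to sqrt(primeTest).
        while ((long)knownPrimes[knownPrimes.Count - 1] * knownPrimes[knownPrimes.Count - 1] < primeTest)
        {
            AddNextPrime();
        }
        foreach (int p in knownPrimes)
        {
            if ((long)p * p > primeTest)
            {
                break; // no divisor up to the square root: prime
            }
            if (primeTest % p == 0)
            {
                return false;
            }
        }
        return true;
    }

    // Finds the next prime after the largest cached one and appends it.
    private void AddNextPrime()
    {
        int candidate = knownPrimes[knownPrimes.Count - 1] + 2;
        while (!IsPrime(candidate))
        {
            candidate += 2;
        }
        knownPrimes.Add(candidate);
    }
}
```

Repeated calls on the same PrimeHelper instance reuse the cached primes, so only the first query for a large number pays for extending the list.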
