18 Digit Unique ID - Code reliability


I want a number that will be unique forever. I came up with the following code, which generates a number and appends a check digit to it. How reliable is this code?
public void GenerateUniqueNumber(out string ValidUniqueNumber) {
    string GeneratedUniqueNumber = "";
    // Default implementation of UNIX time of the current UTC time
    TimeSpan ts = DateTime.UtcNow - new DateTime(1970, 1, 1, 0, 0, 0, 0);
    string FormatedDateTime = Convert.ToInt64(ts.TotalSeconds).ToString();
    string ssUniqueId = DateTime.UtcNow.ToString("fffffff");
    //Add Padding to UniqueId
    string FormatedUniqueId = ssUniqueId.PadLeft(7, '0');
    if (FormatedDateTime.Length == 10 && FormatedUniqueId.Length == 7)
    {
        // Calculate checksum number using Luhn's algorithm.
        int sum = 0;
        bool odd = true;
        string InputData = FormatedDateTime + FormatedUniqueId;
        int CheckSumNumber;
        for (int i = InputData.Length - 1; i >= 0; i--)
        {
            if (odd == true)
            {
                int tSum = Convert.ToInt32(InputData[i].ToString()) * 2;
                if (tSum >= 10)
                {
                    string tData = tSum.ToString();
                    tSum = Convert.ToInt32(tData[0].ToString()) + Convert.ToInt32(tData[1].ToString());
                }
                sum += tSum;
            }
            else
                sum += Convert.ToInt32(InputData[i].ToString());
            odd = !odd;
        }
        //CheckSumNumber = (((sum / 10) + 1) * 10) - sum;
        CheckSumNumber = (((sum + 9) / 10) * 10) - sum;
        // Compute Full length 18 digit UniqueNumber
        GeneratedUniqueNumber = FormatedDateTime + FormatedUniqueId + Convert.ToString(CheckSumNumber);
    }
    else
    {
        // Error
        GeneratedUniqueNumber = Convert.ToString(-1);
    }
    ValidUniqueNumber = GeneratedUniqueNumber;
}
EDIT: clarification
GUIDs cannot be used; the number will need to be entered into an IVR system via a telephone keypad.
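The check-digit step in the question is the Luhn algorithm. Since the number will be typed on a keypad, a validator matters as much as the generator. Here is a hedged sketch of both (the `Luhn` class and its method names are illustrative, not from the question); `(10 - sum % 10) % 10` gives the same digit as the question's `(((sum + 9) / 10) * 10) - sum`.

```csharp
using System;

public static class Luhn
{
    // Luhn check digit for a string of decimal digits.
    public static int CheckDigit(string payload)
    {
        int sum = 0;
        bool doubleIt = true; // the rightmost payload digit is doubled, because the check digit will sit to its right
        for (int i = payload.Length - 1; i >= 0; i--)
        {
            int d = payload[i] - '0';
            if (d < 0 || d > 9) throw new ArgumentException("Digits only.", nameof(payload));
            if (doubleIt) { d *= 2; if (d > 9) d -= 9; } // same as adding the two digits of d
            sum += d;
            doubleIt = !doubleIt;
        }
        return (10 - sum % 10) % 10;
    }

    // True if the last digit is the correct Luhn check digit for the digits before it.
    public static bool IsValid(string number)
    {
        if (number == null || number.Length < 2) return false;
        foreach (char c in number)
            if (c < '0' || c > '9') return false; // e.g. '*' or '#' entered on a keypad
        return CheckDigit(number.Substring(0, number.Length - 1)) == number[number.Length - 1] - '0';
    }
}
```

Luhn catches every single-digit typo and most swaps of adjacent digits, which is what you want for keypad entry; it does nothing to make numbers hard to guess.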

You cannot use GUIDs, but you can create your own unique-number format, similar to a GUID, based on the machine's MAC address (space) and the current date and time (time). This is guaranteed to be unique as long as the machines all have synchronised clocks and no machine generates two numbers within the same clock tick.

Why don't you just use a Guid?

There are a few problems with this method:
You're basically just counting the number of 100-nanosecond ticks since January 1, 1970: the 10 seconds digits followed by the 7 "fffffff" digits. You can get this directly from ts.Ticks. All your string conversion and fractional-second formatting is unnecessary.
10 years is about 3×10¹⁵ ticks. You are keeping 17 significant digits, so for the next 10 years the leading digit will not change and cannot be used to distinguish numbers. It is useless.
Are you generating numbers for times between 1970 and now? If not, the digits covering that span also cannot be used to distinguish numbers and are useless.
This is totally dependent on which machine is returning the date. Anyone who has access to this machine can generate whatever "unique" numbers they want. Is this a problem?
Anyone who sees one of these numbers can tell when it was generated. Is this a problem?
Anyone can predict what number will be generated when. Is this a problem?
The 10-digit seconds field runs out at 10¹⁰ seconds after 1970, about 317 years (late 2286); after that, your length check fails and the method returns -1. That seems like a long time, but you specified "forever", and 317 years is not "forever". Do you really mean "forever"?
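To make the first and last points concrete, here is a small sketch (class and method names are illustrative) showing that the question's 10 seconds digits plus 7 "fffffff" digits are just the count of 100-nanosecond ticks since the Unix epoch. It builds both parts from a single DateTime value and truncates the seconds; the question instead reads DateTime.UtcNow twice and rounds via Convert.ToInt64, so its two parts can disagree.

```csharp
using System;
using System.Globalization;

public static class EpochTicks
{
    private static readonly DateTime UnixEpoch = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);

    // 100-nanosecond ticks elapsed since 1970-01-01 00:00:00 UTC.
    public static long TicksSinceUnixEpoch(DateTime utc) => (utc - UnixEpoch).Ticks;

    // The question's layout (whole seconds followed by the 7-digit "fffffff" fraction),
    // built from one DateTime value, with seconds truncated rather than rounded.
    public static string SecondsPlusFraction(DateTime utc)
    {
        long seconds = TicksSinceUnixEpoch(utc) / TimeSpan.TicksPerSecond;
        return seconds.ToString(CultureInfo.InvariantCulture)
             + utc.ToString("fffffff", CultureInfo.InvariantCulture);
    }
}
```

For present-day timestamps the two representations match digit for digit. At 2286-11-20 17:46:40 UTC the seconds part reaches 10¹⁰ and grows to 11 digits.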

If I understand your implementation correctly, it only uses the current date/time as a basis. That means that if you create two IDs simultaneously, they will not be unique.

Since you mentioned (in comments) that the IDs are stored in a DB, you can generate the IDs either using the method you mentioned or randomly and check for the existence in the DB.
If it already exists, generate a new one, otherwise you're done.
One thing, though: I would make sure that checking for the existence of the ID and actually saving the record to the DB are done in one transaction. Otherwise you run the risk of another request creating that record between your check for the ID and the creation of the row.
Also, just checking: why wouldn't an auto-increment number generated by the database itself work? The DB would guarantee its uniqueness (for that table, anyway).
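A minimal sketch of the generate-randomly-and-check approach, assuming an in-memory HashSet as a stand-in for the database table (hypothetical; in a real system a UNIQUE constraint plus "retry if the INSERT fails" plays this role, which makes check-and-save atomic). It draws 17 random digits, leaving room for a check digit, from .NET's cryptographic RandomNumberGenerator.GetInt32 (available since .NET Core 3.0).

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;

public static class RandomIdIssuer
{
    // Stand-in for the database table (hypothetical).
    private static readonly HashSet<long> Issued = new HashSet<long>();
    private static readonly object Gate = new object();

    public static long Issue()
    {
        while (true)
        {
            // Uniform 17-digit number in [10^16, 10^17): 8 high digits times 10^9 plus 9 low digits.
            long candidate = RandomNumberGenerator.GetInt32(10_000_000, 100_000_000) * 1_000_000_000L
                           + RandomNumberGenerator.GetInt32(0, 1_000_000_000);
            lock (Gate)
            {
                // HashSet.Add returns false if the value already exists, so check and save happen together.
                if (Issued.Add(candidate)) return candidate;
            }
        }
    }
}
```

With 9×10¹⁶ possible values, retries are extremely rare until a very large number of IDs has been issued.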

You don't say what the numbers are to be used for. Do they have some sort of value associated with them? Will it be a problem if users can figure out the scheme and guess valid ticket numbers?
If it is important for these numbers to be hard to guess, this scheme falls down; something that outputs data that looks really random would be better. You might take a monotonically increasing serial number and encrypt it using a block cipher (with a 64-bit block size); that gives you a 64-bit output or about 20 decimal digits worth, which you could take (say) the last 18 of. (If reversibility is important, i.e. given a ticket number you want to be able to recover the serial number, you need to be a bit more careful here.)
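A hedged sketch of the encrypt-a-serial idea, assuming .NET's built-in TripleDES as the 64-bit block cipher. It is a legacy cipher, used here only because it ships with .NET and has the 64-bit block size the answer mentions. The class name is hypothetical, and the key is generated randomly on construction; a real system would have to store that key securely so that codes can be reproduced. Keeping only the last 18 of roughly 20 digits throws information away, so the result is not reversible and not strictly guaranteed collision-free. That is the "be a bit more careful" caveat above.

```csharp
using System;
using System.Security.Cryptography;

public sealed class SerialScrambler : IDisposable
{
    private readonly TripleDES _cipher = TripleDES.Create(); // random key generated on first use

    public SerialScrambler()
    {
        _cipher.Mode = CipherMode.ECB;      // exactly one 8-byte block per serial, so no chaining/IV
        _cipher.Padding = PaddingMode.None; // input is always exactly one block
    }

    public string Scramble(ulong serial)
    {
        byte[] block = BitConverter.GetBytes(serial); // 8 bytes = one 64-bit block
        using ICryptoTransform enc = _cipher.CreateEncryptor();
        byte[] output = enc.TransformFinalBlock(block, 0, block.Length);
        ulong value = BitConverter.ToUInt64(output, 0);            // up to 20 decimal digits
        return (value % 1_000_000_000_000_000_000UL).ToString("D18"); // keep the last 18
    }

    public void Dispose() => _cipher.Dispose();
}
```

Feed it a monotonically increasing serial (for example a database identity column) and the outputs look unrelated to one another.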
Do you need a cast-iron 100% guarantee that no ticket numbers will ever be the same? If so, you need to keep them in a database and mark them off when used. If you do that, it might be reasonable to just use a good random number generator and check for dupes every time.

Using the system time is a good start, but it gives you collisions if you need to generate two UIDs at the same time. It doesn't help that you're using the "fffffff" format: The Windows clock resolution is only 15-16 ms, so only one or two of those "f"s are doing any good.
Also, your approach tells you exactly when the ID was generated. Depending on your needs, this may be a desirable feature, or it may be a security risk.
You'll need your IDs to include other information instead of or in addition to the time. Some possible choices are:
A random number
A cyclic counter
A hash of the program name (if you need these IDs in multiple programs)
The MAC address or other identifier for the machine (If the IDs need to be unique across multiple computers)
If you want to ensure uniqueness, then store your IDs in a database so you can check for duplicates.
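A hedged sketch that combines three of those ingredients: the time in whole seconds since a custom epoch, a per-machine ID that you assign yourself, and a cyclic counter that restarts every second. The 10 + 2 + 5 digit layout, the 2020 epoch and the class name are all assumptions, chosen to keep the result at 17 digits so a check digit still fits. If the clock steps backwards, it keeps using the last second it saw, so numbers from one running instance stay unique. It does not survive a process restart within the same second, which is one more reason to keep the database check.

```csharp
using System;

// Hypothetical layout: 10-digit seconds since CustomEpoch + 2-digit machine ID + 5-digit counter.
public sealed class TimeMachineCounterId
{
    private static readonly DateTime CustomEpoch = new DateTime(2020, 1, 1, 0, 0, 0, DateTimeKind.Utc);
    private readonly int _machineId;            // 0..99, assigned per machine by you
    private readonly object _gate = new object();
    private long _lastSecond = -1;
    private int _counter;

    public TimeMachineCounterId(int machineId)
    {
        if (machineId < 0 || machineId > 99) throw new ArgumentOutOfRangeException(nameof(machineId));
        _machineId = machineId;
    }

    public string Next()
    {
        lock (_gate)
        {
            long second = (DateTime.UtcNow - CustomEpoch).Ticks / TimeSpan.TicksPerSecond;
            if (second > _lastSecond)
            {
                _lastSecond = second; // new second: restart the cyclic counter
                _counter = 0;
            }
            else if (++_counter > 99_999) // same (or earlier) second: bump the counter
            {
                throw new InvalidOperationException("More than 100,000 IDs requested in one second.");
            }
            return _lastSecond.ToString("D10") + _machineId.ToString("D2") + _counter.ToString("D5");
        }
    }
}
```

Uniqueness across computers depends entirely on no two machines sharing a machine ID.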

As Andrew Hare says, you can use a Guid.
As for your code, the answer is "NO"!
If the client computer's date and time are wrong or get changed, the results may be duplicated.

No such thing as random anyway. Here's a suggestion.
Create your own "random" 18 digit number
Before sending it to the user, check it against existing ones in DB
If already in DB, rinse and repeat.

Related

Encode current date into short unique string

I need to encode the current date and time into a unique string to store in a database.
I found the article "how to generate a unique token which expires after 24 hours?", but the token it generates is too long for me (34 symbols).
Is there a similar way to encode it as a shorter string?
The ideal size is 10 symbols or fewer.

I'd ask why? What are you trying to do? If you want to timestamp something like a log entry then just use the datetime value - every database type I know has a built in date/time type.
If you're trying to generate a unique ID for use as something like a primary key, then this would be a bad idea. I've yet to find a good case for using a date-based unique ID.
It would be much better to have an auto-incrementing integer or even a GUID value. If you wanted, you could then add a timestamp column to the database.

You can use ticks (DateTime.Ticks, for instance), but don't store the tick count as a plain string; encode the bits. If you use a long (64-bit) tick count, consider Ascii85 encoding of the bytes so it won't exceed 10 symbols.
var tickBytes = BitConverter.GetBytes(DateTime.UtcNow.Ticks);
string encodedTicks = new Ascii85().Encode(tickBytes); // Ascii85 is not part of .NET; assumes a third-party encoder class
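Because the Ascii85 class above is not a .NET built-in, here is a hedged, self-contained sketch of just enough Ascii85 for 8 bytes (the class and method names are illustrative). Each 4-byte group becomes 5 characters, because 85⁵ is greater than 2³², so a 64-bit tick count always comes out as exactly 10 symbols. For simplicity it omits the standard "z" shortcut for all-zero groups and the `<~ ~>` delimiters.

```csharp
using System;
using System.Text;

public static class TickEncoding
{
    // Minimal Ascii85 encoder for exactly 8 bytes: two 4-byte groups -> 5 chars each.
    public static string Ascii85Encode8(byte[] bytes)
    {
        if (bytes == null || bytes.Length != 8) throw new ArgumentException("Expected 8 bytes.", nameof(bytes));
        var sb = new StringBuilder(10);
        for (int g = 0; g < 8; g += 4)
        {
            // Read the group as a big-endian 32-bit value, as Ascii85 specifies.
            uint value = (uint)(bytes[g] << 24 | bytes[g + 1] << 16 | bytes[g + 2] << 8 | bytes[g + 3]);
            var chunk = new char[5];
            for (int i = 4; i >= 0; i--)
            {
                chunk[i] = (char)('!' + value % 85); // digits 0..84 map to '!'..'u'
                value /= 85;
            }
            sb.Append(chunk);
        }
        return sb.ToString();
    }
}
```

Note that the output can contain characters such as quotes and backslashes, so it may need escaping in some contexts.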
If you choose a 32-bit tick, base64 should be fine.
For a readable tick with precision to the second (less precise than the previous solution):
long origin = new DateTime(2014, 7, 24).Ticks / TimeSpan.TicksPerSecond;
long customTicks = (DateTime.UtcNow.Ticks / TimeSpan.TicksPerSecond) - origin;
string readableTicks = customTicks.ToString(CultureInfo.InvariantCulture);
That will stay on 10 chars or less for ~300 years.

Okay, if you want it from "about now" to some point in the future, and you want seconds granularity, and you want ASCII symbols, let's assume base64.
With 8 characters of base64, we can encode 6 bytes of data. That will give us 2⁴⁸ different values, which allows about 9 million years' worth of seconds. Given that range, we might as well use the DateTime.Ticks property and divide by ticks-per-second, not worrying about the epoch. Full code coming later if you want it, but as a list of steps:
Take DateTime.UtcNow.Ticks
Divide by TimeSpan.TicksPerSecond
Convert the result into a byte[], e.g. with BitConverter.GetBytes(long)
Encode the least-significant 6 bytes as base64 using Convert.ToBase64String (I'm hopeless with endianness: on a little-endian machine, as reported by BitConverter.IsLittleEndian, these are the first 6 bytes of the byte[])
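Those steps can be sketched as follows. This is a minimal version under the assumptions above (whole seconds since 0001-01-01 and the low 6 bytes kept), with a matching decoder; the class name is illustrative. Six bytes encode to exactly 8 base64 characters with no "=" padding.

```csharp
using System;

public static class SecondsBase64
{
    // Whole seconds since 0001-01-01, encoded as 8 base64 characters.
    public static string Encode(DateTime utc)
    {
        long seconds = utc.Ticks / TimeSpan.TicksPerSecond;
        byte[] bytes = BitConverter.GetBytes(seconds);          // machine byte order
        if (!BitConverter.IsLittleEndian) Array.Reverse(bytes); // least-significant byte first
        return Convert.ToBase64String(bytes, 0, 6);             // low 6 bytes -> 8 chars, no padding
    }

    public static long Decode(string text)
    {
        byte[] bytes = new byte[8];                             // the high 2 bytes stay zero
        Array.Copy(Convert.FromBase64String(text), bytes, 6);
        if (!BitConverter.IsLittleEndian) Array.Reverse(bytes);
        return BitConverter.ToInt64(bytes, 0);
    }
}
```

Standard base64 can produce '+' and '/'; if the string ends up in a URL, those characters would need replacing.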

C# bitwise manipulation for generating unique number

I am trying to generate unique values in C# with the help of DateTime ticks and an incrementing number.
Pseudo code:
Take the 43 least significant bits of DateTime.Now ticks (let's call it 'A')
Take the 21 least significant bits of an increasing sequence (let's call it 'B')
Left-shift 'A' by 21 bits (let's call it 'C')
Do a binary OR of 'B' and 'C'
I ran a test that generated 2 million numbers and inserted them into a database column with a unique constraint, and it ran successfully.
Here is the piece of code that does that:
private static long _sequence = 1;
public static long GetUniqueNumber()
{
const int timeShift = 21;
var dateTime = DateTime.Now.Ticks;
const long dateTimeMask = ~(0L) >> timeShift;
const long sequenceMask = ((~(0L) >> (64 - timeShift)));
var seq = Interlocked.Increment(ref _sequence);
var dateTimeNo = (dateTimeMask & dateTime) << timeShift;
var seqNum = (seq & sequenceMask);
var num = dateTimeNo | seqNum;
return num;
}
I have two questions:
1. Is this logic good enough to generate unique numbers ?
2. I find that some generated numbers are negative, which I don't understand.
Any help/suggestions/improvements are welcome.

Is this logic good enough to generate unique numbers
Unique across what scope? Across multiple computers, processes or AppDomains? Certainly not. Within a single AppDomain? Not really. Generating 2 million numbers is irrelevant; that's just testing that your sequence part works. (2²¹ is just over 2 million.)
If you can call GetUniqueNumber 2²¹+1 times within the granularity of DateTime.Now (which is likely to be ~10-15 ms), you'll get a repeat. Have you measured how fast your computer can call this?
Then there's the fact that those 43 bits will repeat after 2⁴³ ticks (about 10 days)... or at least would if you had a sufficiently fine-grained clock. (And sooner or later the granularity will work against you.)
I find that some generated numbers are '-ve' which I didn't understand.
Whenever dateTimeNo has its top bit (out of 43) set, you'll end up with a long with the top bit set - which means it'll be negative.
EDIT: Also note that your shifting is broken. This:
const long dateTimeMask = ~(0L) >> timeShift;
performs a sign-extended shift, so you just end up with ~0L. (The same is true of sequenceMask.)
In short: use Guid.NewGuid. It's what it's there for.
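If you do keep the bit-packing approach, here is a hedged sketch of masks built without sign extension: `(1L << n) - 1` sets exactly the low n bits. It deliberately keeps only 42 time bits instead of the question's 43, so 42 + 21 = 63 bits and the sign bit is never set, which removes the negative numbers. That reduction is an assumption of this sketch, and it does nothing about the uniqueness problems described above.

```csharp
using System;
using System.Threading;

public static class PackedId
{
    private const int SequenceBits = 21;
    private const int TimeBits = 42;                             // 42 + 21 = 63 bits: sign bit always 0
    private const long TimeMask = (1L << TimeBits) - 1;         // low 42 bits set
    private const long SequenceMask = (1L << SequenceBits) - 1; // low 21 bits set
    private static long _sequence;

    // Time bits go above the sequence bits; the two fields never overlap.
    public static long Pack(long ticks, long sequence) =>
        ((ticks & TimeMask) << SequenceBits) | (sequence & SequenceMask);

    public static long Next() => Pack(DateTime.UtcNow.Ticks, Interlocked.Increment(ref _sequence));
}
```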

Negative numbers are due to how long is implemented: it is a signed type, so if the most significant bit (bit 64) becomes 1 after the bitwise manipulation, the number will be negative. Nothing to worry about.

Generating a unique 15-digit PIN code from a 10-digit number

I want to create PIN codes and serial numbers for scratch cards. I have already generated unique 10-digit numbers; now I want to turn each 10-digit number into a 16-digit number (with a check digit at the end). The function that does this should be reversible, so that by looking at a 16-digit number I can check whether it is valid (if it was not generated by me, it should not be valid).
This is how I have generated the unique 10-digit random codes:
Guid PinGuid;
byte[] Arr;
UInt32 PINnum = 0;
while (PINnum.ToString().Length != 10)
{
PinGuid = Guid.NewGuid();
Arr = PinGuid.ToByteArray();
PINnum = BitConverter.ToUInt32(Arr, 0);
}
return PINnum.ToString();
I would be grateful if you can give me a hint on how to do it .

First off, I would avoid GUIDs, since some bits are reserved for special purposes (such as the version and variant fields). This means those parts of the GUID are not allocated uniformly on creation, so you may not get exactly 10 digits of randomness as you plan.
Also, since your loop keeps retrying until the number has the right length, you could do this more efficiently.
10 digits give 10^10 possible values, and log2(10) ≈ 3.322, so log2(10^10) ≈ 33.2.
So you need approximately 33 bits for a 10-digit number. Since you want your number to be exactly 10 digits, you can either pad numbers less than 10^9 with leading zeroes, or you can generate only numbers between 10^9 and 10^10 - 1.
If you take the latter approach, you have 9*10^9 numbers in your space: every number from 1 followed by nine zeroes up to ten 9s.
Then you would like to map this space of numbers into a larger one: expand it by five digits (a factor of 10^5) and add one more digit as a check digit.
Pick a check digit function as anything you like. You could simply sum (mod 10) the original 10 digits, or choose something more complicated.
Presumably you do not want people to be able to generate valid instances. So if you are really serious about your security, you should modify any suggestions you get from the net before deploying them.
I would do something along the lines of :
Generate a uniform 10-digit number with no leading zeroes by
randomTenDigits = 10**9 + rand(9*10**9)
Using an encryption scheme (like AES-256, or even RSA or ElGamal, since their slower speed will not matter much given the short input), encrypt this 10-digit number using a secret key that only you and others you trust know. Perhaps you can concatenate the 10-digit number 10 times, concatenate that result with some other secret that you choose, and then encrypt this expanded secret, of which the 10-digit number is a part.
Take some chosen 5 digits (around 17 bits) of the resulting ciphertext and append them to your 10-digit number.
Generate 1 digit of check digit by whatever method you desire.
As you will note the real security of this scheme is not from a check digit, it is from the secret key you can use to authenticate the 16 digit number. The test you will use to authenticate it is: does the given 10 digit number when concatenated with other secrets I have, encrypt, using a secret key only I know, to the given 5 digit number presented with it.
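A hedged sketch of this scheme. It swaps in HMAC-SHA256, a keyed message authentication code, for the "encrypt and take 5 digits of the ciphertext" step; HMAC is the usual primitive for this kind of tag, but it is a substitution, not the exact method described above. The class name, the 10 + 5 + 1 digit layout and the Luhn check digit are assumptions. As the answer warns below, a 5-digit tag can be brute-forced, so this only illustrates the structure.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class ScratchCardCode
{
    // Uniform 10-digit serial with no leading zero: 1,000,000,000 .. 9,999,999,999.
    public static long NewSerial() =>
        RandomNumberGenerator.GetInt32(1, 10) * 1_000_000_000L + RandomNumberGenerator.GetInt32(0, 1_000_000_000);

    // 10-digit serial + 5-digit keyed tag + 1 Luhn check digit = 16 digits.
    public static string Create(long serial, byte[] secretKey)
    {
        if (serial < 1_000_000_000L || serial > 9_999_999_999L)
            throw new ArgumentOutOfRangeException(nameof(serial));
        string body = serial.ToString() + Tag(serial, secretKey);
        return body + LuhnDigit(body);
    }

    public static bool IsValid(string code, byte[] secretKey)
    {
        if (code == null || code.Length != 16) return false;
        foreach (char c in code) if (c < '0' || c > '9') return false;
        if (code[0] == '0' || LuhnDigit(code.Substring(0, 15)) != code[15] - '0') return false; // cheap typo check
        long serial = long.Parse(code.Substring(0, 10));
        return Tag(serial, secretKey) == code.Substring(10, 5); // only the key holder can compute this
    }

    // First 8 bytes of HMAC-SHA256(serial), reduced to 5 decimal digits.
    private static string Tag(long serial, byte[] secretKey)
    {
        using var hmac = new HMACSHA256(secretKey);
        byte[] mac = hmac.ComputeHash(Encoding.ASCII.GetBytes(serial.ToString()));
        return (BitConverter.ToUInt64(mac, 0) % 100_000UL).ToString("D5");
    }

    private static int LuhnDigit(string digits)
    {
        int sum = 0;
        bool doubleIt = true; // rightmost digit is doubled because the check digit goes to its right
        for (int i = digits.Length - 1; i >= 0; i--)
        {
            int d = digits[i] - '0';
            if (doubleIt) { d *= 2; if (d > 9) d -= 9; }
            sum += d;
            doubleIt = !doubleIt;
        }
        return (10 - sum % 10) % 10;
    }
}
```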
Since the difficulty for an attacker of forging one of your numbers depends on the difficulty of
discovering your secret keys and other info
discovering which method of encryption you use
discovering which part of the resulting cipher text you emit for the 5 digit secret, or
simply brute forcing the 5 digits to discover the correct pairing, and since 5 digits is not a big space to search, I would suggest generating larger numbers instead. 10 or 16 digits is not really a huge space to search either. So instead of digits, I would use upper- and lower-case letters plus digits plus space and full stop, giving you 64 symbols in your alphabet. Then with 16 characters you get around 96 bits of security (16 × 6 bits).
However, if numbers are non-negotiable and the size of 10 digits for your base space is also non-negotiable, doing it this way is probably the most secure. You may be able to set up your system to deter people from brute forcing it, though you should consider what happens if someone acquires a piece of your hardware through a vendor. I believe it is easier to design security in than to design in a mechanism for detecting people trying to brute-force query your system.
However, if serious dough is on the line (like millions), the security you employ should really be first class, equivalent to the kind of security you would employ to protect the PIN of a million-dollar bank account. The more secure you are, the longer you can carry on your biz with credibility and trust.
So along these lines I would suggest increasing the size of your secrets to make it infeasible for someone to simply try all combinations and forge a valid one, and in particular thinking about how to design your system to make it difficult to break for people with lots of skills and motivation (money). You really can't be too careful.

I would keep it simple. Put PINnum.ToString() into a buffer. Place a filler digit at 5 intervals. The first four could be random garbage and the last could be a check digit, or you could make each filler a check digit for its section. Here is an example.
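string buf = PINnum.ToString();
int chkdgit = ComputeCheckDigit(buf); // hypothetical helper: your check-digit function
string fillbuf = new Random().Next(1000, 10000).ToString(); // four random filler digits
var sb = new StringBuilder(); // needs using System.Text;
for (int k = 0; k < buf.Length; k++)
{
    sb.Append(buf[k]);
    if (k % 2 == 1 && k / 2 < fillbuf.Length) sb.Append(fillbuf[k / 2]); // filler after every 2nd digit
}
return sb.Append(chkdgit).ToString(); // 10 digits + 4 fillers + check digit = 15 digits
It's a rather simple approach, but if your security needs aren't at level 1, it might suffice.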

Which part of a GUID is most worth keeping?

I need to generate a unique ID and was considering Guid.NewGuid to do this, which generates something of the form:
0fe66778-c4a8-4f93-9bda-366224df6f11
This is a little long for the string-type database column that it will end up residing in, so I was planning on truncating it.
The question is: Is one end of a GUID more preferable than the rest in terms of uniqueness? Should I be lopping off the start, the end, or removing parts from the middle? Or does it just not matter?

You can save space by using a base64 string instead:
var g = Guid.NewGuid();
var s = Convert.ToBase64String(g.ToByteArray());
Console.WriteLine(g);
Console.WriteLine(s);
This will save you 12 characters (8 if you weren't using the hyphens).

Keep all of it.
From the above link:
* Four bits to encode the computer number,
* 56 bits for the timestamp, and
* four bits as a uniquifier.
You can redefine the Guid to right-size it to your needs.

If the GUID were simply a random number, you could keep an arbitrary subset of the bits and suffer a certain percent chance of collision that you can calculate with the "birthday algorithm":
double numBirthdays = 365; // set to e.g. 18446744073709551616d for 64 bits
double numPeople = 23; // set to the maximum number of GUIDs you intend to store
double probability = 1; // that all birthdays are different
for (int x = 1; x < numPeople; x++)
probability *= (double)(numBirthdays - x) / numBirthdays;
Console.WriteLine("Probability that two people have the same birthday:");
Console.WriteLine((1 - probability).ToString());
However, the probability of a collision is often higher because, as a matter of fact, GUIDs are in general NOT random. According to Wikipedia's GUID article there are five types of GUIDs. The 13th digit specifies which kind of GUID you have, so it tends not to vary much, and for standard (RFC 4122) GUIDs the top two bits of the 17th digit are always fixed at 10 (binary), so that digit is always 8, 9, a or b.
For each type of GUID you'll get different degrees of randomness. Version 4 (13th digit = 4) is entirely random except for digits 13 and 17; versions 3 and 5 are effectively random, as they are cryptographic hashes; while versions 1 and 2 are mostly NOT random but certain parts are fairly random in practical cases. A "gotcha" for version 1 and 2 GUIDs is that many GUIDs could come from the same machine and in that case will have a large number of identical bits (in particular, the last 48 bits and many of the time bits will be identical). Or, if many GUIDs were created at the same time on different machines, you could have collisions between the time bits. So, good luck safely truncating that.
I had a situation where my software only supported 64 bits for unique IDs so I couldn't use GUIDs directly. Luckily all of the GUIDs were type 4, so I could get 64 bits that were random or nearly random. I had two million records to store, and the birthday algorithm indicated that the probability of a collision was 1.08420141198273 x 10^-07 for 64 bits and 0.007 (0.7%) for 48 bits. This should be assumed to be the best-case scenario, since a decrease in randomness will usually increase the probability of collision.
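The figures above can be cross-checked with the standard approximation for the birthday problem, p ≈ 1 - exp(-n(n-1)/(2N)), which is very accurate when n is much smaller than N. A minimal sketch (class and method names are illustrative):

```csharp
using System;

public static class Birthday
{
    // Approximate probability of at least one collision among n values drawn uniformly
    // from a space of size N: p ≈ 1 - exp(-n(n-1) / (2N)).
    public static double CollisionProbability(double n, double spaceSize) =>
        1.0 - Math.Exp(-n * (n - 1) / (2.0 * spaceSize));
}
```

For two million records this gives about 1.084×10⁻⁷ for 64 bits and about 0.0071 for 48 bits, in line with the numbers quoted. For the classic 23 people and 365 days it gives about 0.50, slightly below the exact 0.507.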
I suppose that in theory, more GUID types could exist in the future than are defined now, so a future-proof truncation algorithm is not possible.

I agree with Rob: keep all of it.
But since you said you're going into a database, I thought I'd point out that GUIDs don't necessarily index well in a database. For that reason, the NHibernate developers created a Guid.Comb algorithm that's more DB-friendly.
See NHibernate POID Generators revealed and documentation on the Guid Algorithms for more information.
NOTE: Guid.Comb is designed to improve performance on Microsoft SQL Server.
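A hedged sketch of the COMB idea. This is not NHibernate's actual implementation, whose exact byte layout is not reproduced here. It keeps a random GUID but overwrites its last 6 bytes with a big-endian millisecond timestamp. SQL Server is generally described as ordering uniqueidentifier values by those last 6 bytes first, so newer values sort after older ones and inserts land at the end of the index. The class name and the Unix-epoch millisecond choice are assumptions, and the trade-off is 48 fewer random bits.

```csharp
using System;

public static class CombGuid
{
    private static readonly DateTime UnixEpoch = new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);

    // Random GUID whose last 6 bytes hold a big-endian millisecond timestamp.
    public static Guid Create(DateTime utc)
    {
        byte[] bytes = Guid.NewGuid().ToByteArray();
        long ms = (utc - UnixEpoch).Ticks / TimeSpan.TicksPerMillisecond;
        for (int i = 0; i < 6; i++)
            bytes[15 - i] = (byte)(ms >> (8 * i)); // bytes[15] = lowest byte, bytes[10] = highest
        return new Guid(bytes);
    }
}
```

In the usual string form, those 6 bytes are the last 12 hex digits, so the timestamp is visible at the end of each value.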

Truncating a GUID is a bad idea, please see this article for why.
You should consider generating a shorter GUID; Google turns up some solutions for this. They seem to involve taking a GUID and re-encoding it with a larger set of ASCII characters, so that each character carries more bits.

Get number of digits in an unsigned long integer c#

I'm trying to determine the number of digits in a C# ulong number using some math logic rather than ToString().Length. I have not benchmarked the two approaches, but I have seen other posts about using System.Math.Floor(System.Math.Log10(number)) + 1 to determine the number of digits.
It seems to work fine until I go from 999999999999997 to 999999999999998, at which point I start getting an incorrect count.
Has anyone encountered this issue before?
I have seen similar posts with a Java emphasis ("Why log(1000)/log(10) isn't the same as log10(1000)?") and also a post ("How to get the separate digits of an int number?") that shows how I could achieve the same using the % operator, but with a lot more code.
Here is the code I used to simulate this:
Action<ulong> displayInfo = number =>
Console.WriteLine("{0,-20} {1,-20} {2,-20} {3,-20} {4,-20}",
number,
number.ToString().Length,
System.Math.Log10(number),
System.Math.Floor(System.Math.Log10(number)),
System.Math.Floor(System.Math.Log10(number)) + 1);
Array.ForEach(new ulong[] {
9U,
99U,
999U,
9999U,
99999U,
999999U,
9999999U,
99999999U,
999999999U,
9999999999U,
99999999999U,
999999999999U,
9999999999999U,
99999999999999U,
999999999999999U,
9999999999999999U,
99999999999999999U,
999999999999999999U,
9999999999999999999U}, displayInfo);
Array.ForEach(new ulong[] {
1U,
19U,
199U,
1999U,
19999U,
199999U,
1999999U,
19999999U,
199999999U,
1999999999U,
19999999999U,
199999999999U,
1999999999999U,
19999999999999U,
199999999999999U,
1999999999999999U,
19999999999999999U,
199999999999999999U,
1999999999999999999U
}, displayInfo);

log10 is going to involve floating point conversion - hence the rounding error. The error is pretty small for a double, but is a big deal for an exact integer!
If you exclude the .ToString() method and floating-point methods, then yes, I think you are going to have to use an iterative method, but I would use integer division rather than modulo.
Integer-divide by 10. Is the result > 0? If so, go around again. If not, stop.
The number of digits is the number of iterations required.
Eg. 5 -> 0; 1 iteration = 1 digit.
1234 -> 123 -> 12 -> 1 -> 0; 4 iterations = 4 digits.
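The divide-by-10 loop just described, as a short C# sketch (the class and method names are illustrative). A do-while makes 0 count as one digit, matching the "5 -> 0; 1 iteration" pattern:

```csharp
public static class DigitCount
{
    public static int ByDivision(ulong number)
    {
        int digits = 0;
        do
        {
            number /= 10; // drop the last decimal digit
            digits++;
        } while (number > 0);
        return digits;
    }
}
```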

I would use ToString().Length unless you know this is going to be called millions of times.
"premature optimization is the root of all evil" - Donald Knuth

From the documentation:
By default, a Double value contains 15 decimal digits of precision, although a maximum of 17 digits is maintained internally.
I suspect that you're running into precision limits. 999,999,999,999,998 itself converts to double exactly (it is below 2^53), but its logarithm, about 15 - 8.7×10⁻¹⁶, is closer to 15.0 than to the nearest double below 15 (about 15 - 1.78×10⁻¹⁵). So Math.Log10 returns 15.0, and Floor(15.0) + 1 gives 16 instead of 15.

Other answers have posted why this happens.
Here is an example of a fairly quick way to determine the "length" of an integer (some cases excluded). This by itself is not very interesting -- but I include it here because using this method in conjunction with Log10 can get the accuracy "perfect" for the entire range of an unsigned long without requiring a second log invocation.
// the lookup would only be generated once
// and could be a hard-coded array literal
ulong[] lookup = Enumerable.Range(0, 20)
.Select((n) => (ulong)Math.Pow(10, n)).ToArray();
ulong x = 999;
int i = 0;
for (; i < lookup.Length; i++) {
if (lookup[i] > x) {
break;
}
}
// i is length of x "in a base-10 string"
// does not work with "0" or negative numbers
This lookup-table approach can be easily converted to any base. This method should be faster than the iterative divide-by-base approach but profiling is left as an exercise to the reader. (A direct if-then branch broken into "groups" is likely quicker yet, but that's way too much repetitive typing for my tastes.)
Happy coding.
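A hedged sketch of combining the two ideas as described: take the cheap Log10 estimate, then correct it by at most one using an exact power-of-ten table (names are illustrative). The table is built by integer multiplication rather than Math.Pow so that every entry is exact.

```csharp
using System;

public static class DigitCountLog
{
    // 10^0 .. 10^19, built by exact integer multiplication.
    private static readonly ulong[] Pow10 = BuildPowersOfTen();

    private static ulong[] BuildPowersOfTen()
    {
        var powers = new ulong[20];
        ulong p = 1;
        for (int i = 0; i < powers.Length; i++)
        {
            powers[i] = p;
            if (i < powers.Length - 1) p *= 10; // stop before 10^20, which overflows ulong
        }
        return powers;
    }

    public static int Count(ulong x)
    {
        if (x == 0) return 1;                       // Log10(0) is -Infinity
        int d = (int)Math.Floor(Math.Log10(x)) + 1; // estimate; can be off by one near powers of ten
        if (d < 20 && x >= Pow10[d]) d++;           // estimate was one too small
        else if (x < Pow10[d - 1]) d--;             // estimate was one too large
        return d;
    }
}
```

For example, 999,999,999,999,998 gives a Log10 estimate of 16, and the table check corrects it to 15.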
