best practice to create a generic user id - c#

What's the best way to implement a method that creates and assigns IDs to users in an ASP.NET application?
I was thinking about using DateTime ticks and the thread ID.
I want to make sure that there are no collisions and that user IDs are unique.
The ID can be a string or a long.
Should I use MD5 on some information that I collect from the user? What would that be?
I have seen that the MD5 collision rate is very low.

I would use GUIDs based off the limited information you've given.

The simplest solution is an autoincremented number. This requires a central server.
Date/time plus a one-way hash are for pseudo-random IDs. Do they have to be pseudo random for security? This should not be relied upon for uniqueness because by definition one-way hashes collide. You'd still need a central server to check for duplicates before issuing the ID.
GUIDs are best if the IDs are created in a distributed system (no central server to generate the ID). GUIDs can be generated on separate machines, and they shouldn't collide. Depends on the implementation, but some GUID algorithms are simply pseudo-random, and yes, there is still a possibility of collision.
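As a minimal sketch of the GUID option discussed above: the class and method names here are hypothetical, and the "N" format (32 hex digits, no dashes) is just one convenient way to get a string ID.

```csharp
using System;

public static class UserIdGenerator
{
    // Returns a new ID as a 32-character hex string without dashes.
    // Guid.NewGuid() needs no central server, so any web node can call this.
    public static string NewUserId()
    {
        return Guid.NewGuid().ToString("N");
    }
}
```

Usage would look like `string id = UserIdGenerator.NewUserId();`. If the ID has to be a long instead, a GUID does not fit and an autoincremented number is the simpler route.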

Guid is by far the best choice for generating unique IDs for something like a user ID. For practical purposes they are unique globally (hence the name), although this rests on overwhelming probability rather than a strict guarantee. To work well with a clustered index you should use NEWSEQUENTIALID(). This generates sequential IDs that can be appended to the index, and saves SQL Server from having to reorganise and split index pages every time a value is added. There is a small security concern associated with using this function, in that the next value in the sequence can be predicted.

Should cache keys be hashed?

I am working on an existing system that uses NCache. It is a distributed system with large caching requirements, so there is no question that caching is the correct answer, but...
For some reason, in the existing code, all cache keys are hashed before being stored in the cache.
My argument is that we should NOT hash the key, as the caching library may have some highly optimized way of storing its dictionary, and hashing everything means we may actually be slowing down lookups.
The guy who originally wrote the code has left, and the knowledge of why the keys are hashed has been lost.
Can anyone suggest whether hashing is the correct thing to do, or whether it should be removed?
Okay, so your questions are:
Should we hash the keys before storing?
If you do the hashing yourself, will it slow anything down?
Well, the cache API works on strings as keys. In the background NCache automatically generates hashes against these keys which help it to identify where the object should be stored. And by where I mean in which node.
When you say that your application hashes keys before handing them over to NCache, that is simply an unnecessary step. The NCache API was meant to take this headache away from you.
BUT if those hashes were generated because of some internal logic within your application, then that's another case. Please check carefully.
Needless to say, doing the same work twice costs performance. The hash strings that you provide will be hashed again to generate another hash value (int).
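To illustrate the double-hashing point, here is a small sketch that uses a plain `Dictionary<string, string>` as a stand-in for the cache. It does not use the NCache API, and `PreHash` is a hypothetical helper showing the kind of pre-hashing the existing code might do (the actual algorithm used there is unknown).

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

public static class KeyHashingDemo
{
    // Hypothetical pre-hash step: turns the key into a hex digest string.
    public static string PreHash(string key)
    {
        using (var sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(key));
            return BitConverter.ToString(digest).Replace("-", "");
        }
    }

    // Stores one value under the plain key and one under the pre-hashed key.
    // The dictionary hashes whatever string it is given, so the pre-hashed
    // key is hashed a second time internally.
    public static bool RoundTrip()
    {
        var cache = new Dictionary<string, string>();
        cache["user:42"] = "plain";
        cache[PreHash("user:42")] = "prehashed";
        return cache["user:42"] == "plain"
            && cache[PreHash("user:42")] == "prehashed";
    }
}
```

Both lookups succeed; the pre-hashed one simply pays for an extra digest computation on every store and fetch.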
Whether you should or shouldn't hash keys depends on your system requirements.
NCache identifies an object by its key, and considers objects with equal keys to be equal. Below is a definition of a hash function from Wikipedia:
A hash function is any function that can be used to map data of
arbitrary size to data of fixed size.
If you stop hashing keys, the cache may behave differently. For example, some objects that NCache used to consider equal may now be considered different, and instead of one cache entry you will get two.
NCache doesn't require you to hash keys. NCache key is just a string that is unique for each object. Relevant excerpt from NCache 4.6 Programmer’s Guide:
NCache uses a “key” and “value” structure for objects. Every object
must have a unique string key associated with it. Every key has an
atomic occurrence in the cache whether it is local or clustered.
Cached keys are case sensitive in nature, and if you try to add
another key with same value, an OperationFailedException is thrown by
the cache.

To which extent is C# System.Guid.NewGuid() safe for the following situation?

I'm building an app.
Assume it is a messaging app and becomes as popular as WhatsApp.
A GUID will be given for every message sent in the world.
There will be a problem if any 2 GUIDs in the world are equal.
As of today, 30 billion (official figure) WhatsApp messages are sent in the world per day.
I'm using C# (Xamarin)'s System.Guid.NewGuid method to generate GUIDs.
What is the chance of a 'problem' occurring because the random numbers are not truly random?
(This question is different from others because it describes a situation where millions of people get billions of new GUIDs combined each day.)
I like this passage from Wikipedia:
They may or may not be generated from random (or pseudo-random) numbers. GUIDs generated from random numbers normally contain 6 fixed bits (these indicate that the GUID is random) and 122 random bits; the total number of unique such GUIDs is 2^122 (approximately 5.3×10^36). This number is so large that the probability of the same number being generated randomly twice is negligible;[citation needed] however other GUID versions have different uniqueness properties and probabilities, ranging from guaranteed uniqueness to likely duplicates. Assuming uniform probability for simplicity, the probability of one duplicate would be about 50% if every person on earth as of 2014 owned 600 million GUIDs.
https://en.wikipedia.org/wiki/Globally_unique_identifier
If you are truly concerned, you always have the option of adding collision detection: if you detect that a GUID is already in use, assign a new random GUID and repeat until no duplicate is found. It is reminiscent of collision handling in hash tables. There is a performance penalty for this, but a solution to your problem does exist.
Update
I can understand your concern about randomness. Some GUID algorithms are structured and assign values in a roughly chronological sequence, which leaves little room for concern there. I would be more worried about the performance degradation that comes with using a 128-bit value as a primary key.
Also from Wikipedia:
GUIDs are commonly used as the primary key of database tables, and with that, often the table has a clustered index on that attribute. This presents a performance issue when inserting records because a fully random GUID means the record may need to be inserted anywhere within the table rather than merely appended near the end of it.
If there are 30B messages every day, 30B GUIDs are generated.
This is how many times you can generate a unique GUID. Even if your application stands strong for septillions of years, the chance of generating duplicate UUIDs is about 0.03225806451%. So please do not worry, even if it WILL happen, it will happen once, and you'll resolve it easily.
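For a rough sense of scale, the sketch below applies the standard birthday-bound approximation p ≈ n(n-1) / (2 · 2^122). It assumes the 122 random bits are uniformly distributed, which is exactly the assumption the question is worried about, so it says nothing about the quality of the underlying random number generator. The class and method names are hypothetical.

```csharp
using System;

public static class GuidCollisionEstimate
{
    // Approximate probability of at least one duplicate among n random
    // GUIDs with 122 random bits. Only valid while the result is small.
    public static double Approximate(double n)
    {
        double space = Math.Pow(2, 122);
        return n * (n - 1) / (2 * space);
    }
}
```

With n = 30 billion (one day of messages) this gives a value on the order of 10^-16, and with a full year of messages (about 1.1 × 10^13 GUIDs) a value on the order of 10^-11.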

C# generate Guid that is unique(does not exist in a table in the database)

I have a table where the primary key is of type Guid. In my MVC application, I want to add a record to this table. I know of Guid.NewGuid(), which creates a new Guid. This works well, but I have a concern. How does one ensure that the created Guid is unique (does not yet exist in the database)? Is there a way to generate it by comparing against existing values, to make sure that the new Guid is unique across the database records?
The entire purpose of the guid generation technique is that it doesn't need to. The algorithm will generate a globally unique value even if it doesn't have access to all of the other previously generated GUIDs.
In particular, the algorithm is to just generate one big random number. There are so many bits of data in the GUID that the odds of two of them having the same value are infinitesimally small. Small enough that they truly can be ignored.
For a more detailed analysis see Eric Lippert's blog on the subject. (In particular part three.)
Note that, as a consequence of this, using a GUID as a unique identifier will take up quite a bit more space in the database than just using a numeric identifier. Any decent database will have a special column type specifically designed to be a unique identifier that it will populate; such a column will be able to ensure uniqueness while using quite a lot less space than a GUID.
The possibility of generating a duplicate is very low. However, you could enforce a UNIQUE constraint on the database table.
There is very little chance that a new GUID will match one already present in the database.
But if you still want to be sure, just create a stored procedure or similar that returns true if a GUID already exists. If it returns true, generate again, and repeat this process until a unique GUID is obtained.
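The check-and-retry idea can be sketched as follows. The `exists` delegate is a hypothetical stand-in for whatever database lookup or stored procedure you would actually call, and `maxAttempts` is an arbitrary safety limit.

```csharp
using System;

public static class UniqueGuidFactory
{
    // Generates GUIDs until one is found that 'exists' reports as unused.
    // 'exists' is a placeholder for a real database check.
    public static Guid NewUnique(Func<Guid, bool> exists, int maxAttempts = 5)
    {
        for (int attempt = 0; attempt < maxAttempts; attempt++)
        {
            Guid candidate = Guid.NewGuid();
            if (!exists(candidate))
            {
                return candidate;
            }
        }
        throw new InvalidOperationException("Could not generate an unused GUID.");
    }
}
```

For a quick local test, a `HashSet<Guid>` can play the role of the table: `Guid id = UniqueGuidFactory.NewUnique(used.Contains);`. Note that checking and then inserting is not atomic, so a UNIQUE constraint (or the primary key itself) is still what actually protects the table.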

Is there any better option for GUID creation than System.Guid.NewGuid() in .net

In my application code I am generating GUIDs using System.Guid.NewGuid() and saving them to a SQL Server DB.
I have a few questions regarding GUID generation:
When I ran the program I did not find any problem in terms of performance, but I still want to know whether there is a better way to generate GUIDs.
Is System.Guid.NewGuid() the only way to create a GUID in .NET code?
The GUIDs generated by Guid.NewGuid are not sequential according to SQL Server's sort order. This means you are inserting randomly into your indexes, which is a disaster for performance. It might not matter if the write volume is small enough.
You can use SQL Server's NEWSEQUENTIALID() function to create sequential ones, or just use an int.
One alternative way to generate guids (I presume as your PK) is to set the column in the table up like this:
create table MyTable(
MyTableID uniqueidentifier not null default (newid()),
...
Implementing like this means that you've the choice whether or not to set them in .Net or to let SQL do it.
I wouldn't say either is going to be "better" or "quicker" though.
To answer the question:
Is there any better option for GUID creation than
System.Guid.NewGuid() in .net
I would venture to say that System.Guid.NewGuid() is the preferred choice.
But for the follow up question:
...saving this to SQL server DB.
The answer is less clear. This has been discussed on the web for a long time. Just Google "guid as primary key" and you'll have hours of reading to do.
Usually when you use a Guid in Sql server it is for the reason of using as primary keys in tables. This has many nice advantages:
It's easy to generate new values without accessing the database
You can be reasonably sure that your locally generated Guid will NOT cause a primary key collision
But there are significant drawbacks as well:
If the primary key is also the clustered index, inserting large amounts of new rows will cause a lot of IO (disc operations) and index updates.
The Guid is quite large compared to the other popular alternative for a surrogate key, the int. Since all other indexes on the table contain the clustered index key, they will grow much faster if you have a Guid vs an int.
Which will also cause more IO since those indexes will require more memory
To mitigate the IO issue, SQL Server 2005 introduced a new NEWSEQUENTIALID() function which can be used to generate sequential Guids when inserting new rows. But if you are going to use that, then you will have to be in contact with the database to generate one, so you lose the possibility of generating one when offline. In that situation you could still generate a normal Guid and use that.
There are also many articles on the web about how to roll your own sequential Guids. One sample:
http://www.codeproject.com/Articles/388157/GUIDs-as-fast-primary-keys-under-multiple-database
I have not tested any of them so I can't vouch for how good they are. I chose that specific sample because it contains some information that might be interesting. Specifically:
It gets even more complicated, because one eccentricity of Microsoft
SQL Server is that it orders GUID values according to the least
significant six bytes (i.e. the last six bytes of the Data4 block).
So, if we want to create a sequential GUID for use with SQL Server, we
have to put the sequential portion at the end. Most other database
systems will want it at the beginning.
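Following the quoted passage, a "roll your own" sequential GUID for SQL Server can be sketched by starting from a random GUID and overwriting its last six bytes with a timestamp. This is a minimal illustration, not the linked article's code: the class name is hypothetical, the timestamp is milliseconds since the Unix epoch written most-significant byte first, and values created within the same millisecond are not ordered relative to each other.

```csharp
using System;

public static class CombGuid
{
    // Builds a GUID whose last six bytes (the part the quoted passage says
    // SQL Server sorts on) hold a 48-bit millisecond timestamp; the other
    // ten bytes stay random.
    public static Guid NewComb(DateTimeOffset now)
    {
        byte[] bytes = Guid.NewGuid().ToByteArray();
        long ms = now.ToUnixTimeMilliseconds();
        for (int i = 0; i < 6; i++)
        {
            // Big-endian: most significant timestamp byte goes to index 10.
            bytes[10 + i] = (byte)(ms >> (8 * (5 - i)));
        }
        return new Guid(bytes);
    }
}
```

Usage would be `Guid id = CombGuid.NewComb(DateTimeOffset.UtcNow);`. Replacing 48 random bits with a timestamp reduces the randomness of each value, so it is worth testing against your own database before relying on it.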
EDIT: Since the issue seems to be about inserting large amounts of data using bulk copy, a sequential Guid will probably be needed. If it's not necessary to know the Guid value before inserting then the answer by Jon Egerton would be one good way to solve the issue. If you need to know the Guid value beforehand you will either have to generate sequential Guids to use when inserting or use a workaround.
One possible workaround could be to change the table to use a seeded INT as primary key (and clustered index), and have the Guid value as a separate column with a unique index. When inserting, the Guid will be supplied by you while the seeded int will be the clustered index. The rows will then be inserted sequentially, and your generated Guid can still be used as an alternative key for fetching records later. I have no idea if this is a feasible solution for you, but it's at least one possible workaround.
NewGuid would be the generally recommended way - unless you need sequential values, in which case you can P/Invoke to the rpcrt function UuidCreateSequential:
Private Declare Function UuidCreateSequential Lib "rpcrt4.dll" (ByRef id As Guid) As Integer
(Sorry, nicked from VB, sure you can convert to C# or other .NET languages as required).
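A C# version of that declaration might look like the sketch below. The wrapper class and its fallback behaviour are assumptions added for illustration: rpcrt4.dll exists only on Windows, so on other platforms (or if the call reports a non-zero status) this sketch falls back to a random Guid, which is not sequential.

```csharp
using System;
using System.Runtime.InteropServices;

public static class SequentialGuid
{
    // P/Invoke declaration for the Windows RPC runtime function.
    [DllImport("rpcrt4.dll")]
    private static extern int UuidCreateSequential(out Guid guid);

    public static Guid Create()
    {
        if (!RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
        {
            // Fallback: random, not sequential.
            return Guid.NewGuid();
        }

        int status = UuidCreateSequential(out Guid guid);
        // A non-zero status means the call did not fully succeed;
        // this sketch simply falls back to a random value in that case.
        return status == 0 ? guid : Guid.NewGuid();
    }
}
```

Whether the resulting values also sort sequentially inside SQL Server depends on its GUID byte ordering, which this sketch does not address.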

CQRS and primary key: guid or not?

For my project, which is a potentially big web site, I have chosen to separate the command interface from the query interface. As a result, commands are one-way operations that don't return a result. This means that the client has to provide the key, for example:
service.SubmitCommand(new AddUserCommand() { UserId = key, ... });
Obviously I can't use an int for the primary key, so a Guid is a logical choice - except that I read everywhere about the performance impact it has, which scares me :)
But then I also read about COMB Guids, and how they provide the advantages of Guids while still performing well. I also found an implementation here: Sequential GUID in Linq-to-Sql?.
So before I take this important decision: does someone have experience with this matter, or advice?
Thanks a lot!
Lud
First of all, I use sequential GUIDs as a primary key and I don't have any problems with performance.
Most tests of sequential GUID vs INT as a primary key use batch inserts and select data from an idle database. But in real life, selects and updates happen at the SAME time.
As you are applying CQRS, you will not have batch inserts, and the overhead of opening and closing transactions will take much more time than a single write query. As you have a separate read store, your select operations on a table with a GUID PK will be much faster than they would be on a table with an INT PK in a unified store.
Besides, the asynchrony that messaging gives you allows your applications to scale much better than systems with blocking RPC calls.
In light of the above, agonizing over GUIDs vs INTs seems penny-wise and pound-foolish to me.
You didn't specify which database engine you are using, but since you mentioned LINQ to SQL, I guess it's MS SQL Server.
If yes, then Kimberly Tripp has some advice about that:
Disk space is cheap...
GUIDs as PRIMARY KEYs and/or the clustering key
To summarize the two links in a few words:
sequential GUIDs perform better than random GUIDs, but still worse than numeric autoincrement keys
it's very important to choose the right clustered index for your table, especially when your primary key is a GUID
Instead of supplying a Guid to a command (which is probably meaningless to the domain), you probably already have a natural key like username which serves to uniquely identify the user. This natural key makes a lot more sense for the user commands:
When you create a user, you know the username because you submitted it as part of the command.
When you're logging in, you know the username because the user submitted it as part of the login command.
If you index the username column properly, you may not need the GUID. The best way to verify this is to run a test: insert a million user records and see how CreateUser and Login perform. If you really do see a serious performance hit that you have verified adversely affects the business and can't be solved by caching, then add a Guid.
If you're doing DDD, you'll want to focus hard on keeping the domain clean so the code is easy to understand and reflects the actual business processes. Introducing an artificial key is contrary to that goal, but if you're sure that it provides actual value to the business, then go ahead.
