SQL Primary Key Generation - c#

Used SQL Server = MySQL
Programming language = irrelevant, but I stick to Java and C#
I have a theoretical question regarding the best way to go about primary key generation for SQL databases which are then used by another program that I write, (let's assume it is not web-based.)
I know that a primary key must be unique, and I prefer primary keys where I can also immediately tell where they are coming from when I see them, either in my eclipse or windows console when I use a database, as well as in relationship tables. For that reason, I generally create my own primary key as an alphanumeric string unless a specific unique value is available such as an ISBN or SS num. For a table Actors, a primary key could then look like a1, and in a table Movies m1020 (Assuming titles are not unique such as different versions of the movie 'Return to witch Mountain').
So my question then is, how is a primary key best generated (in my program or in the db itself as a procedure)? For such a scheme, is it best to use two columns, one with a constant string such as 'a' for actors and a single running count? (In that case i need to research how to reference a table whose PK spans multiple columns) What is the most professional way of handling such a task?
Thank you for your time.

A best practice is to let your DB engine generate the primary key as an auto-increment NUMBER. Alphanumeric string are not a good way, even if it seems too "abstract" for you. Then, you don't have to worry about your primary key in your program (Java, C#, anything else) ; at each line inserted in your Database, an unique primary key is automatially inserted
By the way, with your solution, I'm not sure you manage the case where two rows are inserted simultaneously... Are you sure in absolutely no case, your primary key can be duplicated ?

Your first line says:-
SQL Server = MySQL
Thats not true. They are different.
how is a primary key best generated (in my program or in the db itself
as a procedure)?
Primary keys are generated by MYSQL when you specify the column with primary key constraint on it. The primary keys are automatically generated and they are automatically incremented.
If you want your primary key as alphanumeric(which I personally will not recommend) then you may try like this:-
CREATE TABLE A(
id INT NOT NULL AUTO_INCREMENT,
prefix CHAR(30) NOT NULL,
PRIMARY KEY (id, prefix),
I would recommend you to have Primary key as Integer as that would help you to make your selction a bit easier and optimal.For MyIsam tables you can create a multi-column index and put auto_increment field on secondary column

For MySQL there's a best way - set AUTO_INCREMENT property for your primary key table field.
You can get the generated id later with last_insert_id function or it's java or c# analog.

I don't know why you would use "alphanumeric" values - why not just a plain number?
Anyway, use whatever auto-increment functionality is available in whichever DB-system you are using, and stick with that. Do not create primary keys outside of the DB - you can't know when / how two systems might access the DB at the same time, which could cause problems if the two create the same PK value, and attempt to insert it.
Also, in my view, a PK should just be an ID (in a single column) for a specific row, and nothing more - if you need a field indicating that a record concerns data of type "actor" for instance, then that should be a separate field, and have nothing to do with the primary key (why would it?)

Related

Is this okay to have a Alphanumeric field as a PrimaryKey?

I am rewriting a new timesheet application including redesigning database and it will require data migration from Oracle to Oracle.
In the old system field ‘EmployeeCod’ is a Primary Key and it is in Alphanumeric form i.e. ‘UK001’, ‘UK002’,‘FR001’,’FR002’, ‘US001’ . Employee table is also linked to timesheet and other tables where the EmpCode is being referred as a FK.
To make the JOINs perform faster in the new system I was thinking about adding a new INT column in the Employee table and set it to PK. (Don't know if it will make any big difference)
-Employee table has about 600 rows.
-Data type of EmpCode is Varchar2(20) in old DB which I can reduce to Varchar2(6) in the new system and alter it later as company expends.
I am wondering if it is better to keep the EmpCode as a Primary Key which will make things easier in migrating data or should I add a INT column?
Someone has given me following advise in one of my previous thread:
“if you need to create a composite code of AANNN then I'd split this into two: a simple 'Prefix' field of CHAR(2) and an identity field of INT, then turn EmpCode into a computed field that concats the two and stick an index on there that (#Chris)”
I am not sure if this option would work as employee table is linked to other tables as well. (EmpCode is being used as FK in other tables)
n
If you do add this PK, and also keep the former PK, you will have some data management issues to deal with. Or perhaps your customers. Getting rid of the old PK may not be feasable if there are existing users who will be upgrading to the new database.
If EmployeeCode, the former PK is used by the users of the data to identify Employees, then you will have to add a constraint to make sure that this field is unique. Carrying both codes will wipe out any performance gains you were hoping for.
If it were me, I'd leave well enough alone. The performance gains, if any, will be trivial.
The performance difference will be negligible if the index you're creating on the alphanumeric field is the clustered index for the table. Which, based off of your question is going to be the case, but I wanted to note that for completeness. I say this for two reasons:
A clustered index is the physical order of the table and so when seeking against that index, looking for more data presumably off of the data page in a query, a binary search can be performed against it because it's also physically stored in that order.
A binary search is just about as efficient as you can get, lest we forget though a statistical index. I call this out because integer primary keys build statistical indexes which are as fast a seek as you can get because mathmatically speaking we know 2 comes after 1 for example.
So, just keep that in mind when building alphanumeric, or even compound, keys and indexes and trying to compare the difference between them and an integer key. Personally, I prefer to stick with integer primary keys because I have found them to perform better over time during extreme growth.
I hope this helps.
I use alphanumeric primary keys regularly and see absolutely no issues with it. There is no performance issue, you have a wider addressable space, and you can be more expressive/human readable. Integer keys are just a convention.
Add to that the risk you're adding to you project by adding a major architectural change over and above the porting issues, I'd say stick with the existing schema as much as possible.
There will be no performance improvement - in fact, unless you know and can prove/measure that you have a performance problem, changing things "to make them faster" usually leads to pain.
However, there is a concern that your primary key appears to carry meaning - it's a country code, concatenated with a number. What if an employee moves from the US to the UK? What if the UK hires its 1000th employee?
For that reason, I'd refactor the application to use a meaningless primary key; whether it's an INT or a VARCHAR is not hugely relevant.
You do occassionally come across alphanumeric primary keys.. personally I find it just makes life more difficult.. if you are able to change it and you want to change it, I would say go ahead.. it will make things easier for you later. As for it being an FK, you would need to be careful to write a script to properly update all the data. One way you can do this is:
Step 1: Create a new int column for the PK and set Identity Insert to true
Step 2: Add a new int column in your child table and then:
Step 3: write an update script like this:
UPDATE childTable C
INNER JOIN parentTable P ON C.oldEmpID = P.oldEmpID
SET C.myNewEmpIDColumn = P.myNewEmpIDColumn
Step 4: Repeat steps 2 & 3 for all child tables
Step 5: Delete all old FK columns
Something like that and don't forget to backup your current DB first ;)

Check for duplicate column values (not a key) in SQL

Is there a way for SQL to enforce unique column values, that are not a primary key to another table?
For instance, say I have TblDog which has the fields:
DogId - Primary Key
DogTag - Integer
DogNumber - varchar
The DogTag and DogNumber fields must be unique, but are not linked to any sort of table.
The only way I can think of involves pulling any records that match the DogTag and pulling any records that match the DogNumber before creating or editing (excluding the current record being updated.) This is two calls to the database before even creating/editing the record.
My question is: is there a way to set SQL to enforce these values to be unique, without setting them as a key, or in Entity Frameworks (without excessive calls to the DB)?
I understand that I could group the two calls in one, but I need to be able to inform the user exactly which field has been duplicated (or both).
Edit: The database is SQL Server 2008 R2.
As MilkywayJoe suggests, use unique key constraints in the SQL database. These are checked during inserts + Updates.
ALTER TABLE TblDog ADD CONSTRAINT U_DogTag UNIQUE(DogTag)
AND
ALTER TABLE TblDog ADD CONSTRAINT U_DogNumber UNIQUE(DogNumber)
I'd suggest setting unique constraints/indexes to prevent duplicate entries.
ALTER TABLE TblDog ADD CONSTRAINT U_DogTag UNIQUE(DogTag)
CREATE UNIQUE INDEX idxUniqueDog
ON TblDog (DogTag, DogNUmber)
It doesn't appear as though Entity Framework supports it (yet), but was on the cards. Looks like you are going to need to do this directly in the database using Unique Constraints as mentioned in the comments.

Using a hash as a primary key?

I have a requirement to store the list of services for multiple computers. I thought I would create one table to hold a list of all possible tables, a table for all possible computers and then a table to link a service to a computer.
I was thinking to keep the full services list unique, I could possibly use a hash of the executable as the primary key for the service, but i'm not sure if there would be any downsides to this (note that the hashing is only for identification. Not for any types of security purposes). I was thinking rather than using a binary field as the primary/foreign key, that I would store the value as a base 64 encoded sha512, and using an nvarchar(88). Something similar to this:
CREATE TABLE Services
(
ServiceHash nvarchar(88) NOT NULL,
ServiceName nvarchar(256) NOT NULL,
ServiceDescription nvarchar(256),
PRIMARY KEY (ServiceHash)
)
Is there any inherent problems with this solution? (I will be using a SQL 2008 database and generally accessing it via C#.Net).
The problem is that a hash is per definition NOT UNIQUE. It is unlikely you get a collision, but it IS possible. As a result, you can not use the hash only, which means the whole hash id is a dead end.
Use a normal ID field, use a unique constraint with index on the ServiceName.
From a performance point of view, having a non-incremental primary key would cause your clustered index to get fragmented rather quickly.
I recommend either:
Use an INT or BIGINT surrogate PK, with auto-increment.
Use a sequential GUID as a PK. Not as fast for indexing as an INT but incremental, therefore low fragmentation in time.
You can then play with non-clustered indexes on your other columns, including the one storing the hashes. Being VARCHAR you can also full-text index it and then do an exact matching when looking for a specific hash.
But, if possible, use a numerical hash instead and make a non-clustered index on it.
And of course, consider what #TomTom mentioned below.

Approach for primary key generation

What is the best approach when generating a primary key for a table?
That is, when the data received by the database is not injective and can't be used as a primary key.
In the code, what is the best way to manage a primary key for the table rows?
Thanks.
First recommendation stay away from uniqueidentifier for any primary key. Although it has some interesting easy ways to generate it client side, it makes it almost impossible to have any idexes on the primary key that may be useful. If I could go back in time and ban uniqueidentifiers from 99% of the places that they have been used, this would have saved more than 3 man years of dba/development time in the last 2 years.
Here is what I would recommend, using the INT IDENTITY as a primary key.
create table YourTableName(
pkID int not null identity primary key,
... the rest of the columns declared next.
)
where pkID is the name of your primary key column.
This should do what you are looking for.
AUTO_INCREMENT in mysql, IDENTITY in SQL Server..
IDENTITY in SQL Server
and if you need to get know what you new ID was while INSERT-ing data, use OUTPUT clause of INSERT statement - so the copy of new rows is put to table-type param.
If for some reason generating unique ID at SQL is not suitable for you, generate GUID's at your app - GUID has a very hight level of uniquness (but it's not guaranteed in fact). And SQL Server has dedicated GUID type for column - it's called uniqueidentifier.
http://msdn.microsoft.com/en-us/library/ms187942.aspx

Edit composite key value using LINQ

I have a table which uses three columns as a composite key.
One of these column values is used as a sequence tracker for ordered related records. When I insert a new record I have to increment the sequence numbers for the related records that come after the new record.
I can do this directly in SQL Server Management Studio, but when I attempt this in LINQ I get the following error:
Value of member 'Sequence' of an object of type 'TableName' changed.
A member defining the identity of the object cannot be changed.
Consider adding a new object with new identity and deleting the existing one instead.
Can anyone suggest a way around this limitation?
(Adding a new record (as suggested by the error message) isn't really an option as the table with the composite key has a relationship with another table.)
Changing primary keys is a "code smell" in my book.
The fix we implemented was as follows
Deleted the relationship that used the composite key
Added autoincrement ID field, set that as primary key
Added Unique contstraint to the three fields that we were previously using as our
Re-created the relationship using the three fields that were previously our primary key
I worked around this by using a SQL stored proc to update one of the primary keys and calling it from LINQ.
I think the compiler is right. The only way of doing this is creating a new record and deleting the old one.
(Adding a new record (as suggested by
the error message) isn't really an
option as the table with the composite
key has a relationship with another
table.)
I think there's no problem with this. Just copy all the fields of your entity, set the new sequence, and set also any relation by just assigning the old EntitySet reference to the new one. I tried this and it updates correctly.
Besides of this, couldn't you just create a new ID column with auto-increment? I agree with #ocdecio. I think changing primary keys is poor design ...
I don't know LINQ, but would this work if you have cascading update defined on the SQL Server for the FK relationships?
Mind, I think using a composite key is a bad idea and changing one is a worse idea. The primary key should not change. Too many things can get broken if the primary key changes. And what do you do when the primary key changes and it is now not unique? If you do this, you will need a way to handle that as well because it will happen.

Categories