Can I store SOF/2015/01 as my ID, and can I auto-increment the 01 like a usual primary key?
Can I store SOF/2015/01 as my ID?
Answer: yes, you can.
Can I auto-increment the 01 like a usual primary key?
Answer: no, you can't.
Auto increment can only increment numbers.
You have to do that manually.
You can use a function in a trigger to generate your desired auto-incremented number, like this:
create function NextCustomerNumber()
returns char(5)
as
begin
    -- find the highest existing customer number, e.g. 'C0042'
    declare @lastval char(5)
    set @lastval = (select max(customerNumber) from Customers)
    -- seed with 'C0000' so the first number issued is 'C0001'
    if @lastval is null set @lastval = 'C0000'
    -- increment the numeric part and re-pad to four digits
    declare @i int
    set @i = right(@lastval, 4) + 1
    return 'C' + right('000' + convert(varchar(10), @i), 4)
end
This can cause some issues, however:
What if two processes attempt to add a row to the table at the exact
same time? Can you ensure that the same value is not generated for
both processes?
There can be overhead querying the existing data each time you'd like
to insert new data
Unless this is implemented as a trigger, this means that all inserts
to your data must always go through the same stored procedure that
calculates these sequences. This means that bulk imports, or moving
data from production to testing and so on, might not be possible or
might be very inefficient.
If it is implemented as a trigger, will it work for a set-based
multi-row INSERT statement? If so, how efficient will it be? This
function wouldn't work if called for each row in a single set-based
INSERT -- each NextCustomerNumber() returned would be the same value.
Create a two-column unique primary key with the string 'SOF/2015' as part one and an auto-incrementing integer as the second column. You can combine the two columns using a function that returns a string, giving you the combined key. For syntactic sugar, you can create a view on the table that uses your function to combine the keys into one view column.
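A minimal sketch of that idea in T-SQL, assuming hypothetical table and column names (Orders, Prefix, Seq) and a two-digit sequence purely for illustration:

CREATE TABLE Orders (
    Prefix varchar(10) NOT NULL DEFAULT 'SOF/2015',
    Seq    int IDENTITY(1,1) NOT NULL,
    -- ...other columns...
    CONSTRAINT pk_Orders PRIMARY KEY (Prefix, Seq)
);
GO

-- expose the combined key as a single column
CREATE VIEW OrdersWithKey AS
SELECT Prefix + '/' + RIGHT('0' + CONVERT(varchar(10), Seq), 2) AS CombinedId,
       Prefix, Seq
FROM Orders;

Pad the sequence to however many digits you actually expect; RIGHT(..., 2) here would wrap past 99.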
You can certainly use NCHAR or NVARCHAR types as primary keys, on the presumption that any variable-sized columns don't use MAX, and data doesn't exceed the maximum allowable size of your index.
As for using it as an auto-incremented column, that won't work. SQL is very smart, but not quite in that way.
I would suggest pulling that string into two or three separate columns, so that you can store the "01" portion as a separate, IDENTITY'd column. But certainly this is a design question that you'd have to work out on your own.
The other solution would be a trigger, but I'd generally hesitate to use something like this as a primary key. Numeric types are just a lot nicer in many ways, particularly when you have to reference the table elsewhere. You could always apply a UNIQUE index to the string representation.
I have a trigger which needs to fill a table with hundreds of rows, and I need to type every single insert manually (it is a kind of pre-config table).
This table has an Int FK to an Enum Table. The Enum Table uses an int as a PK and a varchar (which is UNIQUE).
While typing the insert statements I need to be very careful that the integer FK is the correct one.
I would rather insert the data by the varchar of the enum.
So I do something like this now:
INSERT INTO MyTable(ColorId)
VALUES(1)
And I would like to do something like this:
INSERT INTO MyTable(ColorStr)
VALUES('Red')
The reason why the Enum has an int PK is because of performance issues (fast queries), but I don't know if it is a good idea now. What do you think?
Is there a safe way to do it? Is it possible to insert data into a Table View?
Sure. Do not insert.
No joke.
First, you do not need to use one insert statement PER ROW - look at the syntax: a single insert statement can insert many rows.
Second, nothing in the world says you cannot do processing (like a select with joins) on the inserted data.
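For example, here is a sketch that resolves each varchar to its int FK at insert time; the enum table's names (ColorEnum, Id, Name) are assumptions, not taken from the question:

INSERT INTO MyTable (ColorId)
SELECT e.Id
FROM (VALUES ('Red'), ('Green'), ('Blue')) AS v(ColorStr)
JOIN ColorEnum e ON e.Name = v.ColorStr;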
I generally maintain all my static lookup data (like country lists) this way, using a merge statement: fully automatic maintenance on every change, with inserts and updates happening on demand.
I have a SQL database where one of the columns is a varchar value. This value is always unique; it's not decided by me but by a 3rd-party application that supplies the data. Its length is undefined and it is a mixture of numbers and letters. I should add that it's not declared as unique in the database, as to my knowledge you can't do that for a varchar type?
Each week I run an import of this data from a CSV file. However, the only way I know to check whether I'm importing a unique value is to loop through each row in the database and compare it to each line in the CSV file.
Obviously this is very inefficient and is only going to get worse over time as the database gets bigger.
I've tried checking Google, but to no avail; it could be that I'm looking for the wrong thing, though.
Any pointers would be much appreciated.
Application is written in C#
Look at running a MERGE command in SQL instead of an INSERT; it allows you to explicitly control the action taken when a duplicate is found.
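For example, a sketch of a deduplicating weekly import, assuming the CSV has already been bulk-loaded into a staging table (all table and column names here are hypothetical):

MERGE dbo.Records AS target
USING dbo.ImportStaging AS source
    ON target.ExternalKey = source.ExternalKey
WHEN MATCHED THEN
    UPDATE SET target.Payload = source.Payload
WHEN NOT MATCHED THEN
    INSERT (ExternalKey, Payload)
    VALUES (source.ExternalKey, source.Payload);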
Note that if the unique field has a unique index, then searching for a value is O(log N) rather than O(N). This means the overall cost of inserting N values is O(N log N) rather than O(N²). As N gets large, that is a substantial performance improvement.
Index the table on the unique field.
Do an 'if exists' check on the unique key value: if it returns true, the row exists, so update it; if it returns false, this is a new row, so insert it.
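A sketch of that pattern in T-SQL (names are hypothetical; wrap it in a transaction with suitable locking hints if imports can run concurrently):

IF EXISTS (SELECT 1 FROM dbo.Records WHERE ExternalKey = @key)
    UPDATE dbo.Records SET Payload = @payload WHERE ExternalKey = @key;
ELSE
    INSERT INTO dbo.Records (ExternalKey, Payload) VALUES (@key, @payload);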
I have been working with Cassandra and I have hit a bit of a stumbling block. For the way I need to search the data, I found that a composite primary key works great, but the insert times for records in this column family go to the dogs with it, and I am not entirely sure why.
Table Definition:
CREATE TABLE exampletable (
clientid int,
filledday int,
filledtime bigint,
id uuid,
...etc...
PRIMARY KEY (clientid, filledday, filledtime, id)
);
clientid = the internal id of the client. filledday = the number of days since 1/1/1900. filledtime = the tick of the day at which the record was received. id = a Guid.
The day and time structure exists because I need to be able to filter by day easily and quickly.
I know Cassandra stores column families with composite primary keys quite differently. From what I understand, it stores everything as new columns off a base row keyed by the first component of the primary key. Is that why the inserts are slow? When I say slow, I mean that if I just have a primary key on id, an insert takes ~200 milliseconds, but with the composite primary key (or any subset of it; I tried just clientid and id to the same effect) it takes upwards of 32 seconds for 1000 records. The select times are faster from the composite-key table, since with the standard-key table I have to apply secondary indexes and use 'ALLOW FILTERING' to get the proper records back (I know I could do this in code, but the concern is that I am dealing with some massive data sets and that will not always be practical or possible).
Am I declaring the column family or the primary key wrong for what I am trying to do? With all the unlisted, non-primary-key columns the table is 37 columns wide; could that be the problem? I am quite stumped at this point. I have not been able to find anything about others having similar problems.
Well, your partition key is the client id, so all writes per client go to one node. If you are writing lots of data per client, you could end up with a hotspot, thus decreasing your overall throughput.
Also, could you give an example of the queries that you run? In Cassandra, the data model always needs to resemble the queries you want to run. If you need ALLOW FILTERING, then something is probably not quite right with your data model. For instance, I don't really see the point of filledtime in your PK. If you want to query by time period, just replace your three clustering keys with a single TimeUUID column "ts". This creates a wide row with one column per entry, each with a unique timestamp, clustered per client id partition.
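A sketch of that revised definition, keeping the question's table name and eliding the non-key columns:

CREATE TABLE exampletable (
    clientid int,
    ts timeuuid,
    ...etc...
    PRIMARY KEY (clientid, ts)
);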
This allows queries like:
select * from exampletable where clientid = 123 and ts > minTimeuuid('2013-06-18 16:23:00') and ts < minTimeuuid('2013-06-18 16:24:00');
Again, this would depend on the queries you actually need to run.
And lastly, for overall guidance on data modelling, take a look at the eBay tech blog on Cassandra. Reading it helped clear up some things for me.
Hope that helps!
I am rewriting a timesheet application, including redesigning the database, and it will require data migration from Oracle to Oracle.
In the old system the field 'EmployeeCode' is the primary key and it is alphanumeric, i.e. 'UK001', 'UK002', 'FR001', 'FR002', 'US001'. The Employee table is also linked to the timesheet and other tables, where EmpCode is referenced as a FK.
To make the JOINs perform faster in the new system I was thinking about adding a new INT column in the Employee table and setting it as the PK. (I don't know if it will make any big difference.)
-Employee table has about 600 rows.
-Data type of EmpCode is Varchar2(20) in the old DB, which I can reduce to Varchar2(6) in the new system and alter later as the company expands.
I am wondering whether it is better to keep EmpCode as the primary key, which will make migrating the data easier, or should I add an INT column?
Someone gave me the following advice in one of my previous threads:
“if you need to create a composite code of AANNN then I'd split this into two: a simple 'Prefix' field of CHAR(2) and an identity field of INT, then turn EmpCode into a computed field that concats the two and stick an index on there” (@Chris)
I am not sure whether this option would work, as the Employee table is linked to other tables as well (EmpCode is used as a FK in other tables).
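For reference, that advice might look roughly like this in Oracle; a sketch only, assuming Oracle 12c+ identity columns and hypothetical column names:

CREATE TABLE Employee (
    Prefix  CHAR(2) NOT NULL,
    EmpNum  NUMBER GENERATED ALWAYS AS IDENTITY,
    -- virtual (computed) column concatenating the two parts;
    -- LPAD to 3 digits matches the AANNN format and caps EmpNum at 999
    EmpCode VARCHAR2(5) GENERATED ALWAYS AS (Prefix || LPAD(TO_CHAR(EmpNum), 3, '0')) VIRTUAL,
    CONSTRAINT pk_Employee PRIMARY KEY (Prefix, EmpNum)
);

-- the computed code can still be indexed for lookups
CREATE UNIQUE INDEX ux_Employee_EmpCode ON Employee (EmpCode);

Note that one identity sequence is shared across all prefixes, so the numbers would not run contiguously per country.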
If you do add this PK and also keep the former PK, you will have some data management issues to deal with - or perhaps your customers will. Getting rid of the old PK may not be feasible if there are existing users who will be upgrading to the new database.
If EmployeeCode, the former PK, is used by the users of the data to identify employees, then you will have to add a constraint to make sure that the field stays unique. Carrying both codes will wipe out any performance gains you were hoping for.
If it were me, I'd leave well enough alone. The performance gains, if any, will be trivial.
The performance difference will be negligible if the index you're creating on the alphanumeric field is the clustered index for the table - which, based on your question, is going to be the case, but I wanted to note that for completeness. I say this for two reasons:
A clustered index is the physical order of the table, so when seeking against that index (presumably looking for more data off the data page in a query), a binary search can be performed against it, because the rows are physically stored in that order.
A binary search is just about as efficient as you can get, statistical indexes notwithstanding. I call this out because integer primary keys build statistical indexes, which are as fast a seek as you can get, because mathematically speaking we know 2 comes after 1, for example.
So just keep that in mind when building alphanumeric, or even compound, keys and indexes, and when comparing them against an integer key. Personally, I prefer to stick with integer primary keys because I have found them to perform better over time during extreme growth.
I hope this helps.
I use alphanumeric primary keys regularly and see absolutely no issues with it. There is no performance issue, you have a wider addressable space, and you can be more expressive/human readable. Integer keys are just a convention.
Add to that the risk you're adding to your project by making a major architectural change on top of the porting issues, and I'd say stick with the existing schema as much as possible.
There will be no performance improvement - in fact, unless you know and can prove/measure that you have a performance problem, changing things "to make them faster" usually leads to pain.
However, there is a concern that your primary key appears to carry meaning - it's a country code, concatenated with a number. What if an employee moves from the US to the UK? What if the UK hires its 1000th employee?
For that reason, I'd refactor the application to use a meaningless primary key; whether it's an INT or a VARCHAR is not hugely relevant.
You do occasionally come across alphanumeric primary keys; personally, I find they just make life more difficult. If you are able to change it and you want to change it, I would say go ahead - it will make things easier for you later. As EmpCode is an FK, you would need to be careful to write a script to properly update all the data. One way you can do this is:
Step 1: Create a new int column for the PK in the parent table and mark it as an identity column
Step 2: Add a new int column in your child table, and then:
Step 3: Write an update script like this (a correlated update, since Oracle does not support UPDATE ... JOIN):
UPDATE childTable C
SET C.myNewEmpIDColumn =
    (SELECT P.myNewEmpIDColumn
     FROM parentTable P
     WHERE P.oldEmpID = C.oldEmpID);
Step 4: Repeat steps 2 & 3 for all child tables
Step 5: Delete all old FK columns
Something like that - and don't forget to back up your current DB first ;)
I have several tables within my database that contain nothing but "metadata".
For example we have different grouptypes, contentItemTypes, languages, etc.
The problem is, if you use automatic numbering then it is possible to create gaps.
The ids are used within our code, so the numbers are very important.
Now I wonder whether it isn't better not to use autonumbering within these tables.
Currently we have to create the row in the database first, before we can write our code, and in my opinion this should not be the case.
What do you guys think?
I would use an identity column, as you suggest, to be your primary key (surrogate key), and then assign your candidate key (the identifier from your system) to a standard column with a unique constraint applied to it. This way you can ensure you do not insert duplicate records.
Make sense?
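A minimal sketch of that layout in T-SQL, with hypothetical names:

CREATE TABLE GroupTypes (
    GroupTypeId   int IDENTITY(1,1) PRIMARY KEY,  -- surrogate key
    GroupTypeCode int NOT NULL UNIQUE,            -- the candidate key your code relies on
    Description   varchar(100) NOT NULL
);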
If these are FK tables used just to expand codes into a description, or to contain other attributes, then I would NOT use an IDENTITY. Identity columns are good for ever-growing user data; metadata tables are usually static. When you deploy an update to your code, you don't want to be surprised by an IDENTITY value different from what you expect.
For example, you add a new value to the "Languages" table and expect its ID to be 6, but for some reason (development is out of sync, another person has not implemented their next language type, etc.) the next identity you get is different, say 7. You then insert or convert a bunch of rows using Language ID = 6, which all fail because it does not exist (it is 7 in the metadata table). Worse yet, they all actually insert or update, because the value 6 you thought was yours was already in the metadata table; you now have two items sharing the same 6 value, and your new 7 value is left unused.
I would pick the proper data type based on how many codes you need and how often you will look at the raw data (CHARs are nice to look at for a few values, which helps with memory).
For example, if you only have a few groups and you'll often look at the raw data, then a char(1) may be good:
GroupTypes table
-----------------
GroupType char(1) --'M'=manufacturing, 'P'=purchasing, 'S'=sales
GroupTypeDescription varchar(100)
However, if there are many different values, then some form of int (tinyint, smallint, int, bigint) may do it:
EmailTypes table
----------------
EmailType smallint --2 bytes, up to 32k different positive values
EmailTypeDescription varchar(100)
If the numbers are hardcoded in your code, don't use identity fields. Hardcode them in the database as well; they'll be less prone to changing because someone scripted the database badly.
I would also use an identity column as the primary key, just for the simplicity of inserting records into the database, but then use a column for the type of metadata, which I call LookUpType (int), as well as columns for LookUpId (the int value used in code, or the value in select lists) and LookUpName (string); if those values require additional settings, use extra columns. I personally use two extras: LookUpKey for hierarchical relations, and LookUpValue for abbreviations or alternate values of LookUpName.
Well, if those numbers are important to you because they'll be in code, I would probably not use an IDENTITY.
Instead, just make sure you use an INT column and make it the primary key - in that case, you will have to provide the IDs yourself, and they'll have to be unique.
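A sketch of that approach, with hypothetical names and values:

CREATE TABLE Languages (
    LanguageId int NOT NULL PRIMARY KEY,  -- no IDENTITY; ids are supplied explicitly
    Name       varchar(50) NOT NULL
);

-- the ids match the constants hardcoded in the application
INSERT INTO Languages (LanguageId, Name)
VALUES (1, 'English'), (2, 'French'), (3, 'German');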