Auto-increment Using Dates - C#

I'm quite a beginner in general but I have a theory, idea, etc...
I want to create a task database, with a unique TaskID column [primary key or not] using the date. I need the entry to be auto-generated. In order to avoid collisions, I want to attach a number to the end, so this should achieve the goal of having all entries unique. So a series of entries would look like this:
201309281 [2013-09-28]
201309282
201309291
My thought is that I could use an auto-increment that resets at midnight EST and starts again on the given date, or something like that.
The advantage, to me, of having it work like this, is that you could see all tasks created on a given day, but then the particular task may not be completed or invoiced until, say, a week later. This way you could search by creation date, completion date, or invoice date.
I realize that there are many ways to achieve the end goal of a task database. I was just curious if this was possible, or if anyone had any thoughts on how to implement it as the primary key column, or any other column for that matter.
I also want to apologize if this question is unclear. I will try to sum up here.
Can you have an auto-increment column based on the date the row is created, so it automatically generates the date as a number [20130929] with an extra digit on the end in the following format, AND have that extra digit number on the end reset to "1" every day at midnight EST or UTC?
Any thoughts on how to accomplish this?
e.g.:
201309291
EDIT: BTW, I would like to use an MVC4 web app (in C#) to give users CRUD functionality. I thought this fact might expand the options.
EDIT: I found this Q&A on Stack Overflow, and it seems similar, but doesn't quite answer my question. My thought is posting the link here might help find an answer: Resetting auto-increment column back to 0 daily

I take it you're new to DB design, Nick, but this sort of design would make any seasoned DBA cringe. You should avoid putting any information in primary keys. The results you're trying to achieve can be attained using something like the code below. Remember, PKs should always be dumb IDs; no intelligent keys!
Disclaimer: I'm a very strong proponent of surrogate key designs and I'm biased in that direction. I've been stung many times by architectures that didn't fully consider the trade-offs or the downstream implications of a natural key design. I humbly respect and understand the opinions of natural key advocates, but in my experience developing relational business apps, surrogate designs are the better choice 99% of the time.
(BTW, you don't really even need the createdt field in the RANK clause; you could use the auto-increment PK instead in the ORDER BY clause of the PARTITION.)
CREATE TABLE tbl(
    id int IDENTITY(1,1) NOT NULL,
    dt date NOT NULL,
    createdt datetime NOT NULL,
    CONSTRAINT PK_tbl PRIMARY KEY CLUSTERED (id ASC)
)
go
-- I usually have this done for me by the database
-- rather than pass it from the middle tier:
-- ALTER TABLE tbl ADD CONSTRAINT DF_tbl_createdt
--     DEFAULT (getdate()) FOR createdt
insert into tbl(dt,createdt) values
('1/1/13','1/1/13 1:00am'),('1/1/13','1/1/13 2:00am'),('1/1/13','1/1/13 3:00am'),
('1/2/13','1/2/13 1:00am'),('1/2/13','1/2/13 2:00am'),('1/2/13','1/2/13 3:00am')
go
SELECT id,dt,rank=RANK() OVER (PARTITION BY dt ORDER BY createdt ASC)
from tbl
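For reference, the sample rows above should rank like this, with the counter restarting at 1 for each new dt value:

id  dt          rank
1   2013-01-01  1
2   2013-01-01  2
3   2013-01-01  3
4   2013-01-02  1
5   2013-01-02  2
6   2013-01-02  3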

I would say that this is a very bad design idea. Primary keys ideally should be surrogate in nature and thus automatically created by SQL Server.
The logic you've drafted might get implemented well, but with so much manual engineering it could lead to a lot of complexity, maintenance overhead, and performance issues.
For creating PKs you should restrict yourself to either IDENTITY property, SEQUENCES (new in SQL Server 2012), or GUID (newID()).
Even if you want to go with your design, you can have a combination of a date-type column and an IDENTITY int/bigint column, and add an extra computed column to concatenate them. Resetting the IDENTITY column every midnight would not be a good idea.
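A minimal sketch of that combination (assuming SQL Server, with hypothetical names): note the trailing number keeps growing across days rather than resetting, which is exactly what keeps the key dumb, and the per-day sequence can still be derived at query time with RANK() as in the answer above.

CREATE TABLE Task(
    TaskId int IDENTITY(1,1) NOT NULL,
    CreatedOn date NOT NULL CONSTRAINT DF_Task_CreatedOn DEFAULT (getdate()),
    -- e.g. TaskId 42 created on 2013-09-29 yields '2013092942'
    TaskCode AS (CONVERT(char(8), CreatedOn, 112) + CAST(TaskId AS varchar(10))) PERSISTED,
    CONSTRAINT PK_Task PRIMARY KEY CLUSTERED (TaskId)
)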

Ok, I found an answer. There may be problems with this method that I don't know about, so comments would be welcome. But this method does work.
CREATE TABLE [dbo].[MainOne](
[DocketDate] NVARCHAR(8),
[DocketNumber] NVARCHAR(10),
[CorpCode] NVARCHAR(5),
CONSTRAINT pk_Docket PRIMARY KEY (DocketDate,DocketNumber)
)
GO
INSERT INTO [dbo].[MainOne] VALUES('20131003','1','CRH')
GO
CREATE TRIGGER AutoIncrement_Trigger ON [dbo].[MainOne]
INSTEAD OF INSERT AS
BEGIN
    DECLARE @number INT
    SELECT @number = COUNT(*) FROM [dbo].[MainOne]
    WHERE [DocketDate] = CONVERT(NVARCHAR(8), GETDATE(), 112)
    INSERT INTO [dbo].[MainOne] (DocketDate, DocketNumber, CorpCode)
    SELECT CONVERT(NVARCHAR(8), GETDATE(), 112), @number + 1, inserted.CorpCode
    FROM inserted
END
Any thoughts? I will wait three days before I mark this as the answer.
The only reason I'm not marking sisdog's answer is that it doesn't appear his approach would run automatically when an insert query is executed.

Related

Shall I use separate tables for each and every category or one table to store all attributes for a Classified Website?

I am developing a classifieds website using ASP.NET and my DB is MySQL. MSSQL users, I need your support too; this is a database schema problem, not something tied to a specific database provider.
I just want a little bit of clarification from you.
Since this is a classifieds website, you can post Job ads, Vehicle ads, Real Estate ads, etc...
So I have a header table to store common details about the ad, like title, description and so on.
CREATE TABLE `ad_header` (
`ad_header_id_pk` int(10) unsigned NOT NULL AUTO_INCREMENT,
`district_id_fk` tinyint(5) unsigned NOT NULL,
`district_name` varchar(50) DEFAULT NULL,
`city_id_fk` tinyint(5) unsigned DEFAULT NULL,
`city_name` varchar(50) DEFAULT NULL,
`category_id_fk` smallint(3) unsigned NOT NULL,
`sub_category_id_fk` smallint(3) unsigned DEFAULT NULL,
`title` varchar(100) NOT NULL,
`description` text NOT NULL,
...............
PRIMARY KEY (`ad_header_id_pk`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
So if it is a Job ad, I have another table to store the attributes relevant only to a Job ad, like salary, employment type, working hours.
Also, if it is a Vehicle ad, I have a separate table to store fuel type, transmission type, etc...
So I have 10 categories. These categories are not going to change in a decade. So now I have these 2 approaches:
1) One header table and 10 category-specific tables to store each category's attributes
2) One header table and one attribute table that will hold all attributes of each and every classified group; the ones that are not relevant will hold NULL values
What is the best way to do this regarding performance and scalability?
For those who build classified websites, please give me a guide. Thanks in advance.
The question is not entirely clear to me, but I can give some advice:
First of all, if you find yourself wanting to store delimited values in a single column/cell, you need to step back and create a new table to hold that info. NEVER store delimited data in a single column.
If I understand your question correctly, Ads have Categories like "Job", "For Sale", "Vehicle", "Real Estate", etc. Categories should then have Attributes, where attributes might be things unique to each category, like "Transmission Type" or "Mileage" for the Vehicle category, or "Square Feet" or "Year Constructed" for the Real Estate category.
There is more than one correct way to handle this situation.
If the master categories are somewhat fixed, it is a legitimate design choice to have a separate table for the attributes from each category, such that each ad listing would have one record from ad_header, and one record from the specific Attribute table for that category. So a vehicle listing would have an ad_header record and a vehicle_attributes record.
If the categories are more fluid, it is also a legitimate design choice in this case to have one CategoryAttributes table that defines the Attributes used with each Category, along with an Ad_Listing_Attributes table that holds the attribute data for each listing and carries a foreign key to both CategoryAttributes and Ad_Header. Note that the schema for this table effectively follows the Entity/Attribute/Value (EAV) pattern, which is widely considered to be more of an anti-pattern; that is, it's something to be avoided in most cases. But if you expect to be frequently adding new categories, it may be the best you can do here.
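A rough sketch of that fluid-category design, with hypothetical table and column names:

CREATE TABLE `category_attribute` (
  `category_attribute_id_pk` int unsigned NOT NULL AUTO_INCREMENT,
  `category_id_fk` smallint unsigned NOT NULL,
  `attribute_name` varchar(50) NOT NULL, -- e.g. 'transmission_type'
  PRIMARY KEY (`category_attribute_id_pk`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

CREATE TABLE `ad_listing_attribute` (
  `ad_header_id_fk` int unsigned NOT NULL,
  `category_attribute_id_fk` int unsigned NOT NULL,
  `attribute_value` varchar(255) DEFAULT NULL, -- EAV: every value is a string
  PRIMARY KEY (`ad_header_id_fk`, `category_attribute_id_fk`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

The single varchar value column is the EAV weakness mentioned above: you lose type safety and any real constraints on the attribute values.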
A final option is to put attributes from all categories in a single large table, and populate only what you need. So a vehicle listing would have only an ad_header record, but there would be a lot of NULL columns in the record. I'd avoid that in this case, because your ideal scenario would want to require some attributes for certain categories (i.e. NOT NULLABLE columns) but leave others optional.
This is another case where PostgreSQL may have been the better DB choice. PostgreSQL has something called table inheritance, which is specifically designed to address this situation and lets you avoid an EAV table schema.
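For illustration, a minimal sketch of PostgreSQL table inheritance (names are hypothetical):

CREATE TABLE ad_listing (
    ad_id       serial PRIMARY KEY,
    title       varchar(100) NOT NULL,
    description text NOT NULL
);

CREATE TABLE vehicle_ad (
    fuel_type    varchar(20),
    transmission varchar(20)
) INHERITS (ad_listing);

-- Rows inserted into vehicle_ad carry both the common and the
-- vehicle-specific columns, and appear in queries against ad_listing.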
Full disclosure: I'm actually a SQL Server guy for most situations, but it does seem like PostgreSQL may be a better fit for you. My experience is that MySQL was good in the late 90's and early 00's, but has really lagged behind since. It continues to be popular today mainly because of that early momentum, along with some advantage in cheap hosting availability, rather than any real technical merit.

Slow Insert Time With Composite Primary Key in Cassandra

I have been working with Cassandra and I have hit a bit of a stumbling block. For how I need to search on data, I found that a composite primary key works great for what I need, but the insert times for records in this Column Family go to the dogs with it and I am not entirely sure why.
Table Definition:
CREATE TABLE exampletable (
clientid int,
filledday int,
filledtime bigint,
id uuid,
...etc...
PRIMARY KEY (clientid, filledday, filledtime, id)
);
clientid = The internal id of the client. filledday = The number of days since 1/1/1900. filledtime = The number of ticks of the day at which the record was received. id = A Guid.
The day and time structure exists because I need to be able to filter by day easily and quickly.
I know Cassandra stores Column Families with composite primary keys quite differently. From what I understand, it will store everything as new columns off of a base row keyed by the main component of the primary key. Is that the reason the inserts would be slow? When I say slow I mean that if I just have a primary key on id, the insert will take ~200 milliseconds, and with the composite primary key (or any subset of it; I tried just clientid and id to the same effect) it will take upwards of 32 seconds for 1000 records. The select times are faster out of the composite key table, since I have to apply secondary indexes and use 'ALLOW FILTERING' in order to get the proper records back with the standard key table (I know I could do this in code, but the concern is that I am dealing with some massive data sets and that will not always be practical or possible).
Am I declaring the Column Family or the Primary Key wrong for what I am trying to do? With all the unlisted, non-primary-key columns the table is 37 columns wide; would that be the problem? I am quite stumped at this point. I have not been able to really find anything about others having similar problems.
Well, your partition key is the client id, so all writes per client go to one node. If you are writing lots of data per client, you could end up with a hotspot, thus decreasing your overall throughput.
Also, could you give an example of the queries that you run? In Cassandra, the data model always needs to resemble the queries you want to run. If you need "allow filtering", then it seems that something is not quite right with your data model. For instance, I don't really see the point of "filledtime" in your PK. If you want to query by time period, just replace your three column keys with a TimeUUID column "ts". This would create a wide row, with one column per entry with a unique timestamp, clustered/partitioned per client id.
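A rough CQL sketch of that model (payload columns elided):

CREATE TABLE exampletable (
    clientid int,
    ts timeuuid,
    ...etc...
    PRIMARY KEY (clientid, ts)
);

Here clientid is the partition key and ts the clustering column, so each client's entries are stored together and ordered by time.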
This allows queries like:
select * from exampletable where clientid = 123 and ts > minTimeuuid('2013-06-18 16:23:00') and ts < minTimeuuid('2013-06-18 16:24:00');
Again, this would depend on the queries you actually need to run.
And lastly, for overall guidance on data modelling, take a look at this eBay tech blog. Reading it helped clear up some things for me.
Hope that helps!

Creating a Timeline and SQL Storage

Language: C#
Compiler: Visual Studio 2012
O/S: Windows 7 Home Premium
Here is a question that's come up many times, and through a few debates.
I know there are currently provisional .NET controls for a functional timeline, as well as hints and tips on how the process would be done, but I have not found (so far) a complete tutorial on a well-maintained SQL-storage timeline system.
I need to document almost every change that my site will have: from additions to user reputation, to the joining/creating and eventual submissions of members, clans, games, etc.
As far as I know, DateTime in a SQL database should be avoided, especially in large quantities.
What would be the implementation, process, and eventual output of a Timeline?
What you're describing is sometimes known as "audit history", and it's often implemented using a single, denormalized table; however, many DB purists will argue against it, as you lose strong typing.
The table looks like this:
AuditTable(
    EventId bigint,
    [DateTime] datetime,
    Subject nvarchar,
    [Table] varchar,
    [Column] varchar,
    TablePK bigint,
    OldValueInt bigint nullable,
    OldValueStr nvarchar nullable
)
-- add more nullable columns for more types, if necessary
Each time a value is changed, such as a user's reputation being increased, you would add a row to this table, such as this:
INSERT INTO AuditTable ([DateTime], Subject, [Table], [Column], TablePK, OldValueInt)
VALUES (GETDATE(), N'User reputation increased', 'Users', 'Reputation', @userId, 100)
You only need to store the old value (the value before the change) because the new (i.e. current) value will be in the actual table row.
Adding to the Audit table can be done entirely automatically with SQL Server table triggers.
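For example, a hypothetical trigger for the reputation column might look like this (the Users table and its UserId/Reputation columns are assumed, and EventId is presumed to be an IDENTITY, so it is omitted from the insert list):

CREATE TRIGGER trg_Users_ReputationAudit ON Users
AFTER UPDATE AS
BEGIN
    INSERT INTO AuditTable ([DateTime], Subject, [Table], [Column], TablePK, OldValueInt)
    SELECT GETDATE(), N'User reputation changed', 'Users', 'Reputation',
           d.UserId, d.Reputation   -- old value comes from the deleted pseudo-table
    FROM deleted d
    INNER JOIN inserted i ON i.UserId = d.UserId
    WHERE i.Reputation <> d.Reputation
END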
To view a user's reputation history, you would do this:
SELECT * FROM AuditTable WHERE [Table] = 'Users' AND [Column] = 'Reputation' AND TablePK = @userId
Now, as I said, this design is more for auditing rather than maintaining an easily user-accessible history; these are the disadvantages:
You cannot semantically index the table, so lookups and lists will always be slow
You're storing database metadata as strings, so there's a lot of overhead
There's no referential integrity (this can be a good thing, in that the data will remain if you re-architect the original tables, e.g. removing the Reputation field from the Users table)
If you want to be more "pure" then you really have to design a table structure that directly supports the history-tracking you want to build. You don't need to create a history table for every field - even Stackoverflow doesn't store a history of everything. For example:
UserReputationHistory ( UserId bigint, ReputationChange int, [When] datetime, Subject nvarchar )
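Hypothetical usage, following the same pattern as the audit table above:

INSERT INTO UserReputationHistory (UserId, ReputationChange, [When], Subject)
VALUES (@userId, 100, GETDATE(), N'Answer accepted')

SELECT * FROM UserReputationHistory WHERE UserId = @userId ORDER BY [When]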
Of course it does complicate your code to have to maintain these disparate FooHistory tables.
Other things you mention in your original question, such as a member's join date, don't need a history table; you can get that from a DateJoined field in the member's own DB row.

Is it okay to have an alphanumeric field as a primary key?

I am rewriting a new timesheet application, including redesigning the database, and it will require data migration from Oracle to Oracle.
In the old system the field ‘EmployeeCode’ is a primary key and it is in alphanumeric form, i.e. ‘UK001’, ‘UK002’, ‘FR001’, ‘FR002’, ‘US001’. The Employee table is also linked to the timesheet and other tables, where EmpCode is referred to as a FK.
To make the JOINs perform faster in the new system, I was thinking about adding a new INT column to the Employee table and making it the PK. (I don't know if it will make any big difference.)
-The Employee table has about 600 rows.
-The data type of EmpCode is Varchar2(20) in the old DB, which I can reduce to Varchar2(6) in the new system and alter later as the company expands.
I am wondering if it is better to keep EmpCode as the primary key, which will make migrating the data easier, or should I add an INT column?
Someone gave me the following advice in one of my previous threads:
“if you need to create a composite code of AANNN then I'd split this into two: a simple 'Prefix' field of CHAR(2) and an identity field of INT, then turn EmpCode into a computed field that concats the two and stick an index on there” (@Chris)
I am not sure if this option would work as employee table is linked to other tables as well. (EmpCode is being used as FK in other tables)
If you do add this PK, and also keep the former PK, you will have some data management issues to deal with, or perhaps your customers will. Getting rid of the old PK may not be feasible if there are existing users who will be upgrading to the new database.
If EmployeeCode, the former PK, is used by the users of the data to identify Employees, then you will have to add a constraint to make sure that this field is unique. Carrying both codes will wipe out any performance gains you were hoping for.
If it were me, I'd leave well enough alone. The performance gains, if any, will be trivial.
The performance difference will be negligible if the index you're creating on the alphanumeric field is the clustered index for the table. Which, based on your question, is going to be the case, but I wanted to note that for completeness. I say this for two reasons:
A clustered index is the physical order of the table, so when seeking against that index (looking for more data, presumably off the data page, in a query), a binary search can be performed against it because it's also physically stored in that order.
A binary search is just about as efficient as you can get, lest we forget a statistical index. I call this out because integer primary keys build statistical indexes, which are as fast a seek as you can get because, mathematically speaking, we know 2 comes after 1, for example.
So, just keep that in mind when building alphanumeric, or even compound, keys and indexes and trying to compare the difference between them and an integer key. Personally, I prefer to stick with integer primary keys because I have found them to perform better over time during extreme growth.
I hope this helps.
I use alphanumeric primary keys regularly and see absolutely no issues with it. There is no performance issue, you have a wider addressable space, and you can be more expressive/human readable. Integer keys are just a convention.
Add to that the risk you're adding to your project by making a major architectural change on top of the porting issues, and I'd say stick with the existing schema as much as possible.
There will be no performance improvement - in fact, unless you know and can prove/measure that you have a performance problem, changing things "to make them faster" usually leads to pain.
However, there is a concern that your primary key appears to carry meaning - it's a country code, concatenated with a number. What if an employee moves from the US to the UK? What if the UK hires its 1000th employee?
For that reason, I'd refactor the application to use a meaningless primary key; whether it's an INT or a VARCHAR is not hugely relevant.
You do occasionally come across alphanumeric primary keys. Personally, I find they just make life more difficult. If you are able to change it and you want to change it, I would say go ahead; it will make things easier for you later. As for it being an FK, you would need to be careful to write a script to properly update all the data. One way you can do this is:
Step 1: Create a new int column for the PK and set Identity Insert to true
Step 2: Add a new int column in your child table and then:
Step 3: write an update script like this:
UPDATE childTable C
SET C.myNewEmpIDColumn = (SELECT P.myNewEmpIDColumn
                          FROM parentTable P
                          WHERE P.oldEmpID = C.oldEmpID)
Step 4: Repeat steps 2 & 3 for all child tables
Step 5: Delete all old FK columns
Something like that and don't forget to backup your current DB first ;)

Is the usage of identity insert good with metadata tables?

I have several tables within my database that contain nothing but "metadata".
For example, we have different grouptypes, contentItemTypes, languages, etc.
The problem is, if you use automatic numbering then it is possible to create gaps.
The IDs are used within our code, so the numbers are very important.
Now I wonder if it isn't better not to use autonumbering within these tables.
Right now we have to create the row in the database first, before we can write our code, and in my opinion this should not be the case.
What do you guys think?
I would use an identity column, as you suggest, to be your primary key (surrogate key), and then assign your candidate key (the identifier from your system) to a standard column with a unique constraint applied to it. This way you can ensure you do not insert duplicate records.
Make sense?
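A minimal sketch of that arrangement (T-SQL, names assumed):

CREATE TABLE GroupType (
    GroupTypeId int IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_GroupType PRIMARY KEY,  -- dumb surrogate key
    GroupTypeCode int NOT NULL
        CONSTRAINT UQ_GroupType_Code UNIQUE,  -- the ID your code relies on
    Description varchar(100) NOT NULL
)

The surrogate GroupTypeId is free to differ between environments, while the unique constraint guarantees the hardcoded GroupTypeCode can never be duplicated.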
If these are FK tables used just to expand codes into a description or to contain other attributes, then I would NOT use an IDENTITY. Identity columns are good for ever-growing user data; metadata tables are usually static. When you deploy an update to your code, you don't want to be surprised by an IDENTITY value different than you expect.
For example, you add a new value to the "Languages" table and expect the ID to be 6, but for some reason (development is out of sync, another person has not implemented their next language type, etc.) the next identity you get is different, say 7. You then insert or convert a bunch of rows using Language ID=6, which all fail because it does not exist (it is 7 in the metadata table). Worse yet, they all actually insert or update because the value 6 you thought was yours was already in the metadata table, and you now have a mix of two items sharing the same 6 value, and your new 7 value is left unused.
I would pick the proper data type based on how many codes you need and how often you will need to look at the raw data (CHARs are nice to look at for a few values; it helps with memory).
For example, if you only have a few groups, and you'll often look at the raw data, then a char(1) may be good:
GroupTypes table
-----------------
GroupType char(1) --'M'=manufacturing, 'P'=purchasing, 'S'=sales
GroupTypeDescription varchar(100)
However, if there are many different values, then some form of int (tinyint, smallint, int, bigint) may do it:
EmailTypes table
----------------
EmailType smallint --2 bytes, up to 32k different positive values
EmailTypeDescription varchar(100)
If the numbers are hardcoded in your code, don't use identity fields. Hardcode them in the database as well, as they'll be less prone to changing because someone scripted a database badly.
I would also use an identity column as the primary key, just for the sake of simplicity when inserting records into the database, but then use a column for the type of metadata (I call mine LookUpType, an int), as well as columns for LookUpId (the int value used in code, or the value in select lists) and LookUpName (a string), and if those values require additional settings, so to speak, use extra columns. I personally use two extras: LookUpKey for hierarchical relations, and LookUpValue for abbreviations or alternate values of LookUpName.
Well, if those numbers are important to you because they'll be in code, I would probably not use an IDENTITY.
Instead, just make sure you use an INT column and make it the primary key; in that case, you will have to provide the IDs yourself, and they'll have to be unique.
