Is the usage of identity insert good with metadata tables - c#

I have several tables within my database that contain nothing but "metadata".
For example, we have different grouptypes, contentItemTypes, languages, etc.
The problem is that if you use automatic numbering, it is possible to create gaps.
The IDs are used within our code, so the numbers are very important.
Now I wonder whether it wouldn't be better not to use autonumbering in these tables.
As it stands, we have to create the row in the database first, before we can write our code, and in my opinion this should not be the case.
What do you guys think?

I would use an identity column, as you suggest, to be your primary key (surrogate key), and then make your candidate key (the identifier from your system) a standard column with a unique constraint applied to it. This way you can ensure you do not insert duplicate records.
Make sense?
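A minimal T-SQL sketch of that pattern (the table and column names here are just illustrative, not from the question):

CREATE TABLE dbo.ContentItemTypes
(
    ContentItemTypeId INT IDENTITY(1,1) NOT NULL PRIMARY KEY, -- surrogate key
    TypeCode          VARCHAR(20)  NOT NULL,                  -- candidate key your code refers to
    Description       VARCHAR(100) NOT NULL,
    CONSTRAINT UQ_ContentItemTypes_TypeCode UNIQUE (TypeCode) -- rejects duplicate identifiers
);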

If these are FK tables used just to expand codes into a description, or to hold other attributes, then I would NOT use an IDENTITY. Identity columns are good for tables where user data is constantly being inserted; metadata tables are usually static. When you deploy an update to your code, you don't want to be surprised by an IDENTITY value different from the one you expect.
For example, you add a new value to the "Languages" table and expect its ID to be 6, but for some reason (development is out of sync, another person has not implemented their next language type, etc.) the next identity you get is different, say 7. You then insert or convert a bunch of rows using Language ID=6, which all fail because that ID does not exist (it is 7 in the metadata table). Worse yet, they all actually insert or update, because the value 6 you thought was yours was already in the metadata table, and you now have two items sharing the same 6 value while your new 7 value is left unused.
I would pick the proper data type based on how many codes you need and how often you will look at the raw data (CHARs are nice to look at for a few values, and easy to remember).
For example, if you only have a few groups and you'll often look at the raw data, then a char(1) may be good:
GroupTypes table
-----------------
GroupType char(1) --'M'=manufacturing, 'P'=purchasing, 'S'=sales
GroupTypeDescription varchar(100)
However, if there are many different values, then some form of int (tinyint, smallint, int, bigint) may do it:
EmailTypes table
----------------
EmailType smallint --2 bytes, up to 32k different positive values
EmailTypeDescription varchar(100)
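Putting that advice together, a minimal sketch (T-SQL; names are illustrative) of a static metadata table keyed by hand-picked values instead of an IDENTITY:

CREATE TABLE dbo.Languages
(
    LanguageId          SMALLINT     NOT NULL PRIMARY KEY, -- no IDENTITY: values are scripted by hand
    LanguageDescription VARCHAR(100) NOT NULL
);

INSERT INTO dbo.Languages (LanguageId, LanguageDescription)
VALUES (1, 'English'),
       (2, 'French'),
       (3, 'German'); -- the next ID is always the one you scripted, never a surprise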

If the numbers are hardcoded in your code, don't use identity fields. Hardcode them in the database as well; that way they'll be less prone to changing because someone scripted the database badly.

I would use an identity column as the primary key too, just for simplicity's sake when inserting records into the database, but then use a column for the type of metadata, which I call LookUpType (int), as well as columns for LookUpId (the int value used in code or in select lists) and LookUpName (string). If those values require additional settings, so to speak, use extra columns. I personally use two extras: LookUpKey for hierarchical relations, and LookUpValue for abbreviations or alternate values of LookUpName.
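A rough sketch of that one-lookup-table layout in T-SQL (the column names follow the answer; the unique constraint is my own assumption):

CREATE TABLE dbo.LookUp
(
    Id          INT IDENTITY(1,1) NOT NULL PRIMARY KEY, -- surrogate key, simple inserts
    LookUpType  INT          NOT NULL,                  -- which kind of metadata this row is
    LookUpId    INT          NOT NULL,                  -- the value used in code / select lists
    LookUpName  VARCHAR(100) NOT NULL,                  -- display text
    LookUpKey   INT          NULL,                      -- optional hierarchical relation
    LookUpValue VARCHAR(100) NULL,                      -- abbreviation / alternate of LookUpName
    CONSTRAINT UQ_LookUp_TypeId UNIQUE (LookUpType, LookUpId) -- one code per type
);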

Well, if those numbers are important to you because they'll be in code, I would probably not use an IDENTITY.
Instead, just make sure you use an INT column and make it the primary key - in that case, you will have to provide the IDs yourself, and they'll have to be unique.

Related

Shall I use separate tables for each and every category or one table to store all attributes for a Classified Website?

I am developing a classified website using ASP.NET, and my DB is MySQL. MSSQL users, please, I need your support too; this is a database schema problem, not one tied to a specific database provider.
I just want a little bit of clarification from you.
Since this is a classified website, you can post job ads, vehicle ads, real estate ads, etc.
So I have a header table to store the details common to every ad, like title, description, and so on.
CREATE TABLE `ad_header` (
`ad_header_id_pk` int(10) unsigned NOT NULL AUTO_INCREMENT,
`district_id_fk` tinyint(5) unsigned NOT NULL,
`district_name` varchar(50) DEFAULT NULL,
`city_id_fk` tinyint(5) unsigned DEFAULT NULL,
`city_name` varchar(50) DEFAULT NULL,
`category_id_fk` smallint(3) unsigned NOT NULL,
`sub_category_id_fk` smallint(3) unsigned DEFAULT NULL,
`title` varchar(100) NOT NULL,
`description` text NOT NULL,
...............
PRIMARY KEY (`ad_header_id_pk`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
If it is a job ad, I have another table to store the attributes relevant only to a job ad, like salary, employment type, and working hours.
If it is a vehicle ad, I have a separate table to store fuel type, transmission type, etc.
I have 10 categories, and these categories are not going to change in a decade. So now I have these 2 approaches:
1) One header table and 10 specific tables to store each category's attributes
2) One header table and one attribute table that holds all attributes of each and every classified group; the attributes that are not relevant will hold NULL values
What is the best way to do this regarding performance and scalability?
For those who build classified websites, please give me a guide. Thanks in advance.
The question is not entirely clear to me, but I can give some advice:
First of all, if you find yourself wanting to store delimited values in a single column/cell, you need to step back and create a new table to hold that info. NEVER store delimited data in a single column.
If I understand your question correctly, Ads have Categories like "Job", "For Sale", "Vehicle", "Real Estate", etc. Categories should then have Attributes, where attributes might be things unique to each category, like "Transmission Type" or "Mileage" for the Vehicle category, or "Square Feet" or "Year Constructed" for the Real Estate category.
There is more than one correct way to handle this situation.
If the master categories are somewhat fixed, it is a legitimate design choice to have a separate table for the attributes from each category, such that each ad listing would have one record from ad_header, and one record from the specific Attribute table for that category. So a vehicle listing would have an ad_header record and a vehicle_attributes record.
If the categories are more fluid, it is also a legitimate design choice to have one CategoryAttributes table that defines the attributes used with each category, along with an Ad_Listing_Attributes table that holds the attribute data for each listing and carries foreign keys to both CategoryAttributes and Ad_header. Note that the schema of this table effectively follows the Entity/Attribute/Value (EAV) pattern, which is widely considered to be more of an anti-pattern; that is, it's something to be avoided in most cases. But if you expect to be frequently adding new categories, it may be the best you can do here.
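A rough MySQL sketch of that second option (the table and column names are my assumptions, not a prescription):

CREATE TABLE `category_attribute` (
  `category_attribute_id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
  `category_id_fk` SMALLINT UNSIGNED NOT NULL,      -- which category defines this attribute
  `attribute_name` VARCHAR(50) NOT NULL,            -- e.g. 'fuel_type', 'salary'
  PRIMARY KEY (`category_attribute_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

CREATE TABLE `ad_listing_attribute` (
  `ad_header_id_fk` INT UNSIGNED NOT NULL,          -- FK to ad_header
  `category_attribute_id_fk` INT UNSIGNED NOT NULL, -- FK to category_attribute
  `attribute_value` VARCHAR(255) DEFAULT NULL,      -- EAV: every value stored as text
  PRIMARY KEY (`ad_header_id_fk`, `category_attribute_id_fk`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;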
A final option is to put attributes from all categories in a single large table and populate only what you need. So a vehicle listing would have only an ad_header record, but there would be a lot of NULL columns in the record. I'd avoid that in this case, because your ideal scenario would want to require some attributes for certain categories (i.e., NOT NULLABLE columns) but leave others optional.
This is another case where Postgresql may have been the better DB choice. Postgresql has something called table inheritance, which is specifically designed to address this situation and lets you avoid an EAV table schema.
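For illustration, Postgresql table inheritance looks roughly like this (a sketch with assumed names; note that Postgresql does not propagate primary key constraints to child tables):

CREATE TABLE ad_header (
  ad_header_id serial PRIMARY KEY,
  title        varchar(100) NOT NULL,
  description  text NOT NULL
);

CREATE TABLE vehicle_ad (
  fuel_type         varchar(20),
  transmission_type varchar(20)
) INHERITS (ad_header);  -- vehicle_ad gets all of ad_header's columns plus its own

-- SELECT * FROM ad_header; also returns the rows inserted into vehicle_ad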
Full disclosure: I'm actually a SQL Server guy for most situations, but it does seem like Postgresql may be a better fit for you. My experience is that MySQL was good in the late 90s and early 00s, but it has really lagged behind since. It continues to be popular today mainly because of that early momentum, along with some advantage in cheap hosting availability, rather than any real technical merit.

Storing a Dictionary<int,string> or KeyValuePair in a database

I wanted to see what others have experienced when working with types like List<> or Dictionary<> and, in turn, storing and retrieving that data.
Here's an example scenario: users will be creating their own "templates", where each template is essentially a Dictionary. For user1, the values might be (1, Account), (2, Bank), (3, Code), (4, Savings), and for user2 the (unrelated) values could be (1, Name), (2, Grade), (3, Class), and so on. These templates/lists can be of varying length, but they will always have an index and a value. Also, each list/template will have one and only one user linked to it.
What types did you choose on the database side?
Any pain points and/or advice I should be aware of?
As far as the types within the collection go, there is a fairly 1-to-1 mapping between .Net types and SQL types: SQL Server Data Type Mappings. You mostly need to worry about string fields:
Will they always be ASCII values (0 - 255)? Then use VARCHAR. If they might contain non-ASCII / UCS-2 characters, then use NVARCHAR.
What is their likely max length?
Of course, sometimes you might want to use a slightly different numeric type in the database. The main reason would be if an int was chosen on the app side because it is "easier" (or so I have been told) to deal with than Int16 and byte: if the values will never be above 32,767 or 255, then you should most likely use SMALLINT or TINYINT respectively. The difference between int and byte in terms of memory in the app layer might be minimal, but it does have an impact in terms of physical storage, especially as row counts increase. And if that is not clear, "impact" means slowing down queries and sometimes costing more money when you need to buy more SAN space. But the reason I said to "most likely use SMALLINT or TINYINT" is that, if you have Enterprise Edition and have Row Compression or Page Compression enabled, the values will be stored in the smallest datatype that they fit in.
As far as retrieving the data from the database, that is just a simple SELECT.
As far as storing that data (at least in terms of doing it efficiently), well, that is more interesting :). A nice way to transport a list of fields to SQL Server is to use Table-Valued Parameters (TVPs). These were introduced in SQL Server 2008. I have posted a code sample (C# and T-SQL) in this answer on a very similar question: Pass Dictionary<string,int> to Stored Procedure T-SQL. There is another TVP example on that question (the accepted answer), but instead of using IEnumerable<SqlDataRecord>, it uses a DataTable, which is an unnecessary copy of the collection.
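For reference, the T-SQL half of a TVP looks roughly like this (a sketch; the type, procedure, and column names are placeholders, not from the linked answer):

CREATE TYPE dbo.TemplateEntryList AS TABLE
(
    TemplateIndex INT          NOT NULL,
    TemplateValue VARCHAR(100) NOT NULL
);
GO

CREATE PROCEDURE dbo.Template_Import
(
    @UserID      INT,
    @ImportTable dbo.TemplateEntryList READONLY -- TVP parameters must be READONLY
)
AS
SET NOCOUNT ON;
-- body: the INSERT ... SELECT FROM @ImportTable shown in the EDIT below
GO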
EDIT:
With regards to the recent update of the question that specifies the actual data being persisted, that should be stored in a table similar to:
UserID INT NOT NULL,
TemplateIndex INT NOT NULL,
TemplateValue VARCHAR(100) NOT NULL
The PRIMARY KEY should be (UserID, TemplateIndex) as that is a unique combination. There is no need (at least not with the given information) for an IDENTITY field.
The TemplateIndex and TemplateValue fields would get passed in the TVP as shown in my answer to the question that I linked above. The UserID would be sent by itself as a second SqlParameter. In the stored procedure, you would do something similar to:
INSERT INTO SchemaName.TableName (UserID, TemplateIndex, TemplateValue)
    SELECT  @UserID,
            tmp.TemplateIndex,
            tmp.TemplateValue
    FROM    @ImportTable tmp;
And just to state it explicitly: unless there is a very specific reason for doing so (which would need to include never, ever needing to use this data in any queries, such that this data is really just a document and no more usable in queries than a PDF or image), you shouldn't serialize it to any format. Though if you were so inclined, XML is a better choice than JSON, at least for SQL Server, as there is built-in support for interacting with XML data in SQL Server, but not so much for JSON.
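As a small illustration of that built-in XML support (my own sketch, not from the original answer):

DECLARE @template XML = N'<t><e i="1" v="Account" /><e i="2" v="Bank" /></t>';

SELECT  entry.node.value('@i', 'INT')          AS TemplateIndex, -- shred attributes back into rows
        entry.node.value('@v', 'VARCHAR(100)') AS TemplateValue
FROM    @template.nodes('/t/e') AS entry(node);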
A list or any collection's representation in a database is supposed to be a table. Always think of it as a collection and relate it to what a database offers.
Though you can always serialize a collection, I do not suggest it: when updating or inserting records you'd always have to rewrite the whole serialized value, whereas with a table you'd only have to query for the KEY, which in a Dictionary you already have.

Is it okay to have an alphanumeric field as a primary key?

I am rewriting a timesheet application, including redesigning the database, and it will require data migration from Oracle to Oracle.
In the old system the field 'EmployeeCode' is the primary key, and it is in alphanumeric form, i.e. 'UK001', 'UK002', 'FR001', 'FR002', 'US001'. The Employee table is also linked to the timesheet and other tables, where EmpCode is referenced as a FK.
To make the JOINs perform faster in the new system, I was thinking about adding a new INT column to the Employee table and setting it as the PK. (I don't know if it will make any big difference.)
- The Employee table has about 600 rows.
- The data type of EmpCode is Varchar2(20) in the old DB, which I can reduce to Varchar2(6) in the new system and alter later as the company expands.
I am wondering whether it is better to keep EmpCode as the primary key, which will make migrating the data easier, or whether I should add an INT column.
Someone gave me the following advice in one of my previous threads:
"if you need to create a composite code of AANNN then I'd split this into two: a simple 'Prefix' field of CHAR(2) and an identity field of INT, then turn EmpCode into a computed field that concats the two and stick an index on there" (@Chris)
I am not sure whether this option would work, as the Employee table is linked to other tables as well (EmpCode is used as a FK in other tables).
If you do add this PK and also keep the former PK, you will have some data management issues to deal with, or perhaps your customers will. Getting rid of the old PK may not be feasible if there are existing users who will be upgrading to the new database.
If EmployeeCode, the former PK, is used by the users of the data to identify employees, then you will have to add a constraint to make sure that this field is unique. Carrying both codes will wipe out any performance gains you were hoping for.
If it were me, I'd leave well enough alone. The performance gains, if any, will be trivial.
The performance difference will be negligible if the index you're creating on the alphanumeric field is the clustered index for the table. Which, based on your question, is going to be the case, but I wanted to note that for completeness. I say this for two reasons:
A clustered index is the physical order of the table, so when seeking against that index, looking for more data presumably off the data page in a query, a binary search can be performed against it, because the data is also physically stored in that order.
A binary search is just about as efficient as you can get, though let's not forget index statistics. I call this out because integer primary keys build index statistics that give as fast a seek as you can get, because mathematically speaking we know that 2 comes after 1, for example.
So just keep that in mind when building alphanumeric, or even compound, keys and indexes, and when comparing them against an integer key. Personally, I prefer to stick with integer primary keys because I have found them to perform better over time during extreme growth.
I hope this helps.
I use alphanumeric primary keys regularly and see absolutely no issues with them. There is no performance issue, you have a wider addressable space, and you can be more expressive/human-readable. Integer keys are just a convention.
Add to that the risk you're adding to your project with a major architectural change on top of the porting issues, and I'd say stick with the existing schema as much as possible.
There will be no performance improvement - in fact, unless you know and can prove/measure that you have a performance problem, changing things "to make them faster" usually leads to pain.
However, there is a concern that your primary key appears to carry meaning - it's a country code, concatenated with a number. What if an employee moves from the US to the UK? What if the UK hires its 1000th employee?
For that reason, I'd refactor the application to use a meaningless primary key; whether it's an INT or a VARCHAR is not hugely relevant.
You do occasionally come across alphanumeric primary keys; personally, I find they just make life more difficult. If you are able to change it and you want to change it, I would say go ahead: it will make things easier for you later. As for it being an FK, you would need to be careful to write a script that properly updates all the data. One way you can do this is:
Step 1: Create a new int identity column in the parent table for the PK
Step 2: Add a new int column in your child table, and then:
Step 3: Write an update script like this:
UPDATE childTable C
INNER JOIN parentTable P ON C.oldEmpID = P.oldEmpID
SET C.myNewEmpIDColumn = P.myNewEmpIDColumn
Step 4: Repeat steps 2 & 3 for all child tables
Step 5: Delete all old FK columns
Something like that and don't forget to backup your current DB first ;)
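One caveat: the UPDATE ... INNER JOIN syntax in step 3 is MySQL/SQL Server flavored. Since this migration is Oracle to Oracle, the equivalent step there would look more like this (a sketch reusing the assumed names from the steps above):

UPDATE childTable C
SET    C.myNewEmpIDColumn = (SELECT P.myNewEmpIDColumn
                             FROM   parentTable P
                             WHERE  P.oldEmpID = C.oldEmpID)
WHERE  EXISTS (SELECT 1
               FROM   parentTable P
               WHERE  P.oldEmpID = C.oldEmpID); -- only touch rows that have a match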

Sync ID Across Multiple Entities

I have an application running that has entities that might be: CustomerType1, CustomerType2, and CustomerType3.
All three CustomerType entities might have completely different information, but they all have a CustomerID field which is an integer.
I am trying to figure out how to set things up so that no matter which type is created, the CustomerID will always be unique across all three types, and remain an integer.
For example, creating the following would result in these CustomerIDs:
CustomerType1 - 1
CustomerType1 - 2
CustomerType1 - 3
CustomerType2 - 4
CustomerType1 - 5
CustomerType3 - 6
CustomerType1 - 7
What is the best way to approach this?
2 possible approaches:
Use a single table for all of your customer types with a "discriminator" field to track each customer type, and include the CustomerID as its identity.
Use an external table to manage creating the CustomerID as an identity field.
The first approach has the advantage of having direct support in the Entity Framework, as outlined in the following tutorial:
http://www.asp.net/mvc/tutorials/getting-started-with-ef-using-mvc/implementing-inheritance-with-the-entity-framework-in-an-asp-net-mvc-application
Note that although your customer types may contain different data, this approach could end up being cheaper in the long run in terms of scalability despite the wasted database space. In your business layer, the customer types could simply ignore the fields that don't pertain to them.
The second approach would probably be best suited to adding onto existing applications that are too difficult to change. In the long run, there is more work involved with keeping track of the IDs this way. For one, your business layer will need to fetch an ID from one table in order to insert into another table, which can be expensive for large datasets. Depending on the requirements of the business layer, there may also be scenarios where you have to discard an unused CustomerID and it would simply not exist in the system (you would skip from CustomerID 58 to CustomerID 60 for example).
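If the database is SQL Server 2012 or later, the second approach gets simpler: a single SEQUENCE object can feed all three tables instead of a separate ID table (a sketch with assumed names):

CREATE SEQUENCE dbo.CustomerIDSequence AS INT START WITH 1 INCREMENT BY 1;

CREATE TABLE dbo.CustomerType1
(
    CustomerID INT NOT NULL
        CONSTRAINT DF_CustomerType1_ID DEFAULT (NEXT VALUE FOR dbo.CustomerIDSequence)
        CONSTRAINT PK_CustomerType1 PRIMARY KEY,
    Name VARCHAR(100) NOT NULL
    -- ...columns specific to CustomerType1
);
-- CustomerType2 and CustomerType3 use the same DEFAULT, so IDs never collide across tables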
Your approach is similar to having the same entity type in networked databases, with unique IDs.
Usually, many developers use a UID or related type: a type that is 64, 128, or more bits wide, generated randomly and automatically. If you use the same database on different machines, it is almost impossible to get the same value twice.
Some databases store that value as a string instead of an integer.
If you only had a single table with integer keys, how would you generate the primary key value? Automatically? Or do you generate the value in code and assign it to the primary key field later?
Solution 1
If your database supports U.I.D.s, O.I.D.s, or unique identifiers that are generated automatically as integers, use them.
Solution 2
If your database supports U.I.D.s or O.I.D.s as varchar/string, or the database engine or program that you use has a function that generates U.I.D.s, you may use that function, cast the result value from string to integer (stripping separators like "-"), and store it in an integer primary key field.
Summary
Many developers prefer to let the database engine generate the primary key automatically when inserting a new record. In cases like this, it's better to generate the primary key in code and assign it directly. Since you are using "Entity Framework", I don't know how that library handles primary keys.
Cheers.

Is it bad practice to implement a separate table consisting of only two rows, Female and Male?

Assume we have a Person table with 3 columns:
PersonId of type int. (Primary key)
Name of type string.
GenderId of type int (Foreign key referencing Gender table).
The Gender table consists of 2 columns:
GenderId of type int.
Name of type string.
My question is:
Is it worth implementing the Gender table? Or does it cause performance degradation? What is the best way to handle this?
Edit 1:
I have to populate a drop down control with a list of fixed genders (female and male) in my UI.
I think the best approach in this case is a compromise:
Create a table called Gender with a single varchar column called 'Name' or 'Gender'; gender is really a natural primary key. Put the values 'Male' and 'Female' in it.
Create a foreign key from your Person table on a column named 'Gender'.
Now you only need to query from one table, but you're still protected from data inconsistencies by the foreign key, and you can pull the values for your dropdown from the Gender table if you want to. Best of both worlds.
Additionally, it makes life easier for someone working in the database, because they don't need to remember which arbitrary IDs you've assigned to Male/Female.
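In T-SQL the compromise above looks something like this (a sketch; the names follow the answer):

CREATE TABLE dbo.Gender
(
    Name VARCHAR(10) NOT NULL PRIMARY KEY -- natural key: the value itself
);

INSERT INTO dbo.Gender (Name) VALUES ('Male'), ('Female');

CREATE TABLE dbo.Person
(
    PersonId INT          NOT NULL PRIMARY KEY,
    Name     VARCHAR(100) NOT NULL,
    Gender   VARCHAR(10)  NOT NULL
        CONSTRAINT FK_Person_Gender REFERENCES dbo.Gender (Name) -- guards against bad values
);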
If you have a field with only two possible values, you don't need another table for it. You can just use something like a BIT (0=male, 1=female) or a CHAR ('M' and 'F').
I am a firm believer in lookup tables for this -- which is essentially what is being proposed, but with one distinction: use friendly, non-auto-generated PKs.
For instance, the PKs might be 'M', 'F', 'N' (and there might be 2-4 or so rows, depending upon the accepted gender classifications). Using a simple PK allows easy queries while still providing a higher form of normalization and referential-consistency constraints, without having to employ check constraints.
As the question proposes, I also employ additional columns, such as a Name/Title/Label as appropriate (these are useful as a reference and add self-documentation to the identities). McCarthy advocates using this data itself as the PK (which is one option), but I consider it a trait of the identity and use a more terse, hand-picked PK.
In this sense, I hold the entire concept of lookup tables to play the same sort of role as "constants" in code.
Normalizing gender into a separate table is overkill in this instance.
Why not just have GenderType as a string in the first table?
That way you save having to generate and store an extra GenderID (try to minimise the use of IDs, as otherwise all you'll have in a table is a whole lot of columns just pointing to other tables... over-normalization).
Adding to what other people are saying, you can also create an INDEX (PersonId, GenderId) to speed things up.
Given that you only have two possible genders, and that this is extremely unlikely to need to change in the future, I would not bother to have a separate table. Just add a column to your Person table. A join can be efficient if needed, but it is always slower than no join.
And if, for whatever reason, you feel the need for more than two possible genders, you can still store them in a single column in the Person table.
