Database relation traversing multiple tables - c#

I would like to expand on this question in terms of performance. The db schema was:
Make
    MakeId
    MakeName
Model
    ModelId
    ModelName
    MakeId (FK)
Vehicle
    VehicleId
    DatePurchased
    ModelId (FK)
If I want to know the Make of a Vehicle, I need to traverse the Model table using Vehicle.Model.Make. Let's assume I have not three but four or five tables connected this way, so that I would have to write, e.g., InvoiceForVehicle.Vehicle.Model.Make. I think this would result in bad query performance.
I could add an additional column MakeId (FK) to the InvoiceForVehicle table which goes directly to the make. This would mean I have duplicate data and that every time I change the relation between the InvoiceForVehicle and a vehicle I would have to update the MakeId (FK) accordingly.
InvoiceForVehicle
    InvoiceId
    DateCreated
    VehicleId (FK)
    MakeId (FK)
Is that a good idea?

For performance reasons: Maybe
For consistency reasons: No
Using what you suggest, it will be possible to have a Vehicle that's connected to a Make that doesn't correspond to (the Model of) that Vehicle!
You could try to use composite (and maybe natural) keys all the way down, with corresponding composite foreign keys. The important foreign key in this case would be the one going from Vehicle (MakeId, ModelId) to Model (MakeId, ModelId).
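A minimal sketch of the composite foreign key idea, using Python's built-in sqlite3 module as a stand-in for whatever database you actually use. The column types and the sample rows ('Ford', 'Focus', and so on) are assumptions made up for illustration. The point is that a Vehicle row can only carry a (MakeId, ModelId) pair that exists in Model, so a mismatched Make is rejected by the database itself:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on
conn.executescript("""
CREATE TABLE Make (MakeId INTEGER PRIMARY KEY, MakeName TEXT NOT NULL);
CREATE TABLE Model (
    MakeId    INTEGER NOT NULL REFERENCES Make (MakeId),
    ModelId   INTEGER NOT NULL,
    ModelName TEXT NOT NULL,
    PRIMARY KEY (MakeId, ModelId)
);
CREATE TABLE Vehicle (
    VehicleId     INTEGER PRIMARY KEY,
    DatePurchased TEXT,
    MakeId        INTEGER NOT NULL,
    ModelId       INTEGER NOT NULL,
    -- composite FK: the pair must exist together in Model
    FOREIGN KEY (MakeId, ModelId) REFERENCES Model (MakeId, ModelId)
);
INSERT INTO Make VALUES (1, 'Ford'), (2, 'Toyota');
INSERT INTO Model VALUES (1, 10, 'Focus'), (2, 20, 'Corolla');
""")

# A consistent pair (Ford, Focus) is accepted.
conn.execute("INSERT INTO Vehicle VALUES (1, '2014-08-14', 1, 10)")

# An inconsistent pair (Toyota, Focus) violates the composite FK.
try:
    conn.execute("INSERT INTO Vehicle VALUES (2, '2014-08-15', 2, 10)")
    mismatch_rejected = False
except sqlite3.IntegrityError:
    mismatch_rejected = True

vehicle_count = conn.execute("SELECT COUNT(*) FROM Vehicle").fetchone()[0]
```

With this layout the Make of a Vehicle is available directly from Vehicle.MakeId, without giving up consistency.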

I doubt you will see much of a hit (if any) since you will be referencing by an ID and not doing any actual searching.
I think (based on my knowledge) that your existing model is structured more correctly than the newly proposed solution. You shouldn't put yourself in a situation where you can lose data integrity, as your new solution would allow.
So, to answer your question: no, I don't think the new idea is a good solution. Your existing setup is more "correct" in terms of database normal forms. Also, since Entity Framework lazy-loads data, you won't actually be running any queries that aren't needed.
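To make the "referencing by an ID" point concrete, here is a rough sketch of the traversal InvoiceForVehicle.Vehicle.Model.Make written as one join, using Python's sqlite3 module as a stand-in database. The sample rows are made up, and the actual query plan in your database may differ; the sketch only shows that every hop is a lookup on a key column:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Make (MakeId INTEGER PRIMARY KEY, MakeName TEXT NOT NULL);
CREATE TABLE Model (ModelId INTEGER PRIMARY KEY, ModelName TEXT NOT NULL,
                    MakeId INTEGER NOT NULL REFERENCES Make (MakeId));
CREATE TABLE Vehicle (VehicleId INTEGER PRIMARY KEY, DatePurchased TEXT,
                      ModelId INTEGER NOT NULL REFERENCES Model (ModelId));
CREATE TABLE InvoiceForVehicle (InvoiceId INTEGER PRIMARY KEY, DateCreated TEXT,
                                VehicleId INTEGER NOT NULL REFERENCES Vehicle (VehicleId));
INSERT INTO Make VALUES (1, 'Ford');
INSERT INTO Model VALUES (10, 'Focus', 1);
INSERT INTO Vehicle VALUES (100, '2014-08-14', 10);
INSERT INTO InvoiceForVehicle VALUES (1000, '2014-09-01', 100);
""")

# Each JOIN follows one foreign key to the next table's primary key.
make_name = conn.execute("""
    SELECT mk.MakeName
    FROM InvoiceForVehicle AS inv
    JOIN Vehicle AS v  ON v.VehicleId = inv.VehicleId
    JOIN Model   AS m  ON m.ModelId   = v.ModelId
    JOIN Make    AS mk ON mk.MakeId   = m.MakeId
    WHERE inv.InvoiceId = ?
""", (1000,)).fetchone()[0]
```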

Related

Implement a "One-to-many" relationship with DataSets

I have two tables, one containing patient information, the other, the notes for each patient.
(One patient, many notes for a patient).
Given this, in the Designer (which you access by right-clicking on the chosen DataSet), how do I create a one-to-many relationship? I have never performed this before.
Secondly, for the patient notes table, how would I add a note to a patient record using SQL syntax? Note, this is not updating an existing one, but adding a completely new one to the patientNotes table using the unique patient ID number as the reference (so only that specific patient has that note added to them, not them and everyone else).
Very technically speaking, you don't need to do anything to create a one-to-many relationship. You just have to have the two tables set up as you have them and use them as you intend on using them. I work in data warehousing and unfortunately a great many of our relationships like this are not formalized with any sort of key or constraint.
The correct way to do it is to implement a foreign key constraint on the patient ID column of the patientNotes table. An FK will only allow you to insert data into patientNotes if the patient ID exists in the patient table. If you try to insert a note with a patient ID that doesn't exist in the patient table, the insert fails and the SQL engine gives you an error. Note that the column in the patient table that the FK references must be a primary key or have a unique constraint.
Inserting data will really go as any other insert would:
INSERT INTO dbo.patientNotes (patientId, NoteText)
VALUES (4265, 'During his 8/14/2014 visit, Mr. Cottinsworth complained of chest pains. Evidently he has been wearing a lady''s corset to hide his large gut. Advised the very portly Mr. Cottinsworth to discontinue corset use.');
You could toss that in a SP, put it in your code and use parameters for the patientId and NoteText, however you wanted to do it.
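As a rough illustration of the parameterized version, here is a sketch using Python's sqlite3 module in place of SQL Server. The add_note helper, the noteId column, and the patientName column are hypothetical names introduced for the example. It shows both the parameterized insert and the foreign key rejecting a note for a patient that does not exist:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on
conn.executescript("""
CREATE TABLE patient (patientId INTEGER PRIMARY KEY, patientName TEXT NOT NULL);
CREATE TABLE patientNotes (
    noteId    INTEGER PRIMARY KEY,
    patientId INTEGER NOT NULL REFERENCES patient (patientId),
    NoteText  TEXT NOT NULL
);
INSERT INTO patient VALUES (4265, 'Mr. Cottinsworth');
""")


def add_note(patient_id, note_text):
    """Insert one note for one patient; values are passed as parameters."""
    conn.execute(
        "INSERT INTO patientNotes (patientId, NoteText) VALUES (?, ?)",
        (patient_id, note_text),
    )


add_note(4265, "Complained of chest pains. Advised to discontinue corset use.")

# Patient 9999 does not exist, so the foreign key rejects the insert.
try:
    add_note(9999, "This note has no matching patient.")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

note_count = conn.execute("SELECT COUNT(*) FROM patientNotes").fetchone()[0]
```

Passing the values as parameters also means quotes inside the note text no longer need to be escaped by hand.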
As far as doing this all graphically in Visual Studio, I can't be of much help. I typically use the T-SQL editor and type out what I want to do to the DB. I'm sure there are plenty of tutorials on how to set up FKs in Visual Studio.
Further reading:
http://msdn.microsoft.com/en-us/library/ms189049.aspx
http://www.scarydba.com/2010/11/22/do-foreign-key-constraints-help-performance/
what are the advantages of defining a foreign key

A Recipe Database Model Advice

I'm trying to design a snackbar automation system (to answer the first question: no, it's not homework, it's for learning purposes) and I have an issue with the recipes and how to represent them in the database. I have two options:
Option 1:
[Ingredients] -> [IngrID, IngrName]
[Recipe] -> [RecipeID, RecipeName]
[IngRecipe] -> [IngrID, RecipeID]
In this case the third table is a typical many-to-many table, the model looks correct, and it's a piece of cake to manipulate the data using Entity Framework. But I want to keep track of the amounts as well. Generally I use Ingredients as a table to insert new purchases. If the ingredient exists, I just update the amount.
Option 2
Now if I add an "amount" column to IngRecipe, the whole idea of a many-to-many table vanishes and I can no longer use the entity model to fill the fields automatically. But I can't seem to find a more appropriate place for this column. Where and how will I say "Well, get me 100 gr of chicken breast and add it to whatever recipe"?
Any help is appreciated. Thanks in advance!
It's a solid start for a model. Consider:
RecipeIngredients -> Recipe (FK), Ingredient (FK), IngredientQuantity
Key over (Recipe, Ingredient)
Note that it is still a M-M relationship (the quantity is not part of the PK nor involved in a FK), just with more relevant data for this relationship pair. The names can be changed, but at some point, this must be represented as a M-M relationship in a normalized relational model.
Don't let the framework ruin a good normalized design - and I hope EF can cope with such trivial scenarios; even LINQ2SQL can.
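A minimal sketch of that junction table, using Python's sqlite3 module as a stand-in database. The column names follow the question's Option 1; storing IngredientQuantity as whole grams and the sample recipe name are assumptions for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on
conn.executescript("""
CREATE TABLE Ingredients (IngrID INTEGER PRIMARY KEY, IngrName TEXT NOT NULL);
CREATE TABLE Recipe (RecipeID INTEGER PRIMARY KEY, RecipeName TEXT NOT NULL);
CREATE TABLE RecipeIngredients (
    RecipeID           INTEGER NOT NULL REFERENCES Recipe (RecipeID),
    IngrID             INTEGER NOT NULL REFERENCES Ingredients (IngrID),
    IngredientQuantity INTEGER NOT NULL,  -- assumed unit: grams
    PRIMARY KEY (RecipeID, IngrID)        -- quantity is not part of the key
);
INSERT INTO Ingredients VALUES (1, 'Chicken breast');
INSERT INTO Recipe VALUES (1, 'Chicken sandwich');
""")

# "Get me 100 gr of chicken breast and add it to the recipe."
conn.execute("INSERT INTO RecipeIngredients VALUES (?, ?, ?)", (1, 1, 100))

rows = conn.execute("""
    SELECT i.IngrName, ri.IngredientQuantity
    FROM RecipeIngredients AS ri
    JOIN Ingredients AS i ON i.IngrID = ri.IngrID
    WHERE ri.RecipeID = ?
""", (1,)).fetchall()
```

The amount lives on the relationship row, so the same ingredient can appear in many recipes with a different quantity in each.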

Problem showing rows in a view after deleting a foreign key row?

I have a student table and an education table, with the PK of the education table as a foreign key of the student table. However, when the education is deleted, the student no longer appears in the view. How do I solve this problem?
From the information you have given, my guess is that you have enforced referential integrity with cascading deletes on your database. This means that when you deleted a row in education, the students that were linked to it were also deleted.
I find that it is good practice never to delete data from tables that other fields depend on. Instead, you should have a boolean column in the table called 'IsDeleted' and just set it to True when you want to 'delete' a row. When you pull data, make sure you filter out anything that has 'IsDeleted' set to True.
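A small sketch of the soft-delete idea, using Python's sqlite3 module as a stand-in database. The table layout and sample titles are assumptions; only the IsDeleted column name comes from the answer:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE education (
    educationId INTEGER PRIMARY KEY,
    title       TEXT NOT NULL,
    IsDeleted   INTEGER NOT NULL DEFAULT 0  -- 0 = active, 1 = soft-deleted
);
INSERT INTO education (educationId, title) VALUES (1, 'Physics'), (2, 'History');
""")

# "Delete" a row by flagging it instead of removing it.
conn.execute("UPDATE education SET IsDeleted = 1 WHERE educationId = ?", (1,))

# Normal reads filter out the flagged rows.
active = conn.execute(
    "SELECT title FROM education WHERE IsDeleted = 0 ORDER BY educationId"
).fetchall()

# The row itself is still there, so anything referencing it stays valid.
total = conn.execute("SELECT COUNT(*) FROM education").fetchone()[0]
```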
Based on what you are asking I think you should first rethink your database structure.
Answer the following questions:
Does it make sense to have a student with a nonexistent education?
This would be the case if you deleted an education in your Educations table but students with an FK to that education row lived on in your database. This seems to be what you are asking for, but it doesn't make much sense because it doesn't maintain data integrity.
Should you be allowed to delete an education if students are enrolled in said education?
If it shouldn't be allowed, then you would only need to disable cascade deleting in your one-to-many relationship and your problem would be solved.
If an education is deleted, should all students assigned to said education remain in the database?
This is what you want but with the structure of your database it is not straightforward to achieve.
Easier solution?
One would be to create 3 tables instead of 2:
1. Educations
2. Students
3. StudentsEducationAssignments
In 1 you store everything that has to do ONLY with your education entities. In 2 you store only what has to do with your student entities (note that the type of education they choose is not something that ONLY describes the student). In 3 you store which students are assigned to which educations.
This way, if you delete an education, the students assigned to it will not be deleted, only the information that ties students to that specific education. You keep database integrity easier this way.
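The three-table layout could look roughly like this, sketched with Python's sqlite3 module as a stand-in database. The column names and the ON DELETE CASCADE on the assignment table are assumptions; the cascade is one way to get the behaviour described, where deleting an education removes only the assignment rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on
conn.executescript("""
CREATE TABLE Educations (EducationId INTEGER PRIMARY KEY, Title TEXT NOT NULL);
CREATE TABLE Students (StudentId INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE StudentsEducationAssignments (
    StudentId   INTEGER NOT NULL REFERENCES Students (StudentId) ON DELETE CASCADE,
    EducationId INTEGER NOT NULL REFERENCES Educations (EducationId) ON DELETE CASCADE,
    PRIMARY KEY (StudentId, EducationId)
);
INSERT INTO Educations VALUES (1, 'Physics');
INSERT INTO Students VALUES (1, 'Ann');
INSERT INTO StudentsEducationAssignments VALUES (1, 1);

-- Deleting the education cascades to the assignment row only.
DELETE FROM Educations WHERE EducationId = 1;
""")

students_left = conn.execute("SELECT COUNT(*) FROM Students").fetchone()[0]
assignments_left = conn.execute(
    "SELECT COUNT(*) FROM StudentsEducationAssignments"
).fetchone()[0]
```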
Hope this helps.
Maybe an OUTER JOIN instead of an INNER JOIN in your view?
If you show us the view definition we might be able to help more, without it we're just guessing.
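Since the real view definition is unknown, here is only a guess at the situation, sketched with Python's sqlite3 module. It assumes the student's FK is set to NULL when the education is deleted (ON DELETE SET NULL) and compares an INNER JOIN view with a LEFT OUTER JOIN view; all table, column, and view names are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on
conn.executescript("""
CREATE TABLE education (educationId INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE student (
    studentId   INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    educationId INTEGER REFERENCES education (educationId) ON DELETE SET NULL
);
INSERT INTO education VALUES (1, 'Physics');
INSERT INTO student VALUES (1, 'Ann', 1);

CREATE VIEW student_inner AS
    SELECT s.name, e.title
    FROM student AS s
    JOIN education AS e ON e.educationId = s.educationId;

CREATE VIEW student_outer AS
    SELECT s.name, e.title
    FROM student AS s
    LEFT JOIN education AS e ON e.educationId = s.educationId;

DELETE FROM education WHERE educationId = 1;
""")

# The inner join drops students without a matching education row.
inner_rows = conn.execute("SELECT * FROM student_inner").fetchall()
# The outer join keeps them, with NULL for the education columns.
outer_rows = conn.execute("SELECT * FROM student_outer").fetchall()
```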

Is it bad practice to implement a separate table consisting of only two rows Female and Male?

Assume we have a Person table with 3 columns:
PersonId of type int. (Primary key)
Name of type string.
GenderId of type int (Foreign key referencing Gender table).
The Gender table consists of 2 columns:
GenderId of type int.
Name of type string.
My question is:
Is it worth implementing the Gender table? Or does it cause performance degradation? What is the best way to handle this?
Edit 1:
I have to populate a drop down control with a list of fixed genders (female and male) in my UI.
I think the best approach in this case is a compromise:
Create a table called Gender with a single varchar column called 'Name' or 'Gender'. Gender is really a natural primary key. Put the values 'Male' and 'Female' in it.
Create foreign key to your Person table on a column named 'Gender'.
Now you only need to query from one table, but you're still protected from data inconsistencies by the foreign key, and you can pull the values for your dropdown from the Gender table if you want to. Best of both worlds.
Additionally, it makes life easier for someone working in the database, because they don't need to remember which arbitrary ids you've assigned to Male/Female.
If you have a field with only two possible values, you don't need another table for it. You can just use something like a BIT (0=male, 1=female) or a CHAR ('M' and 'F').
I am a firm believer in lookup tables for this, which is essentially what is being proposed, but with one distinction: use friendly, non-auto-generated PKs.
For instance the PKs might be: "M", "F", "N" (and there might be 2-4 or so rows depending upon accepted gender classifications). Using a simple PK allows easy queries while still allowing a higher form of normalization and referential consistency constraints without having to employ check-constraints.
As the question proposes, I also employ additional columns, such as a Name/Title/Label as appropriate (these are useful as a reference and add self-documentation to the identities). McCarthy advocates using this data itself as the PK (which is one option), but I consider this a trait of the identity and use a terser, hand-picked PK.
In this sense, I hold the entire concept of lookup-tables to provide the same sort of role as "constants" in code.
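A short sketch of a lookup table with friendly keys, using Python's sqlite3 module as a stand-in database. The GenderCode column name and the sample people are assumptions for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on
conn.executescript("""
CREATE TABLE Gender (GenderCode TEXT PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Person (
    PersonId   INTEGER PRIMARY KEY,
    Name       TEXT NOT NULL,
    GenderCode TEXT NOT NULL REFERENCES Gender (GenderCode)
);
INSERT INTO Gender VALUES ('F', 'Female'), ('M', 'Male');
INSERT INTO Person VALUES (1, 'Ann', 'F'), (2, 'Bob', 'M');
""")

# The dropdown is filled from the lookup table.
dropdown = conn.execute(
    "SELECT GenderCode, Name FROM Gender ORDER BY Name"
).fetchall()

# Filtering needs no join because the key itself is readable.
women = conn.execute(
    "SELECT Name FROM Person WHERE GenderCode = 'F'"
).fetchall()

# A code that is not in the lookup table is rejected by the FK.
try:
    conn.execute("INSERT INTO Person VALUES (3, 'Cat', 'X')")
    invalid_rejected = False
except sqlite3.IntegrityError:
    invalid_rejected = True
```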
Normalizing gender into a separate table is overkill in this instance.
Why not just have GenderType as a string in the first table?
That way you save having to generate and store an extra GenderID (try to minimise the use of IDs as otherwise all you'll have in a table is a whole lot of columns just pointing to other tables... over normalization)
Adding to what other people are saying, you can also create an INDEX (PersonId, GenderId) to speed up queries.
Given that you only have two possible genders, and that this is extremely unlikely to need to change in the future, I would not bother to have a separate table. Just add a column to your Person table. A join can be efficient if needed, but it is always slower than no join.
And if, for whatever reason, you feel the need for more than two possible genders, you can still store them in a single column in the Person table.

long vs Guid for the Id (Entity), what are the pros and cons

I am building a web application on ASP.NET MVC and I'm choosing between the long and Guid data types for my entities, but I don't know which one is better. Some say that long is much faster. Guid might also have some advantages. Does anybody know?
When GUIDs can be Inappropriate
GUIDs are almost always going to be slower because they are larger. That makes your indexes larger. That makes your tables larger. That means that if you have to scan your tables, either wholly or partially, it will take longer and you will see less performance. This is a huge concern in reporting based systems. For example, one would never use a GUID as a foreign key in a fact table because its length would usually be significant, as fact tables are often partially scanned to generate aggregates.
Also consider whether or not it is appropriate to use a "long". That's an enormously large number. You only need it if you think you might have over 2 BILLION entries in your table at some point. It's rare that I use them.
GUIDs can also be tough to use and debug. Saying, "there's a problem with Customer record 10034, Frank, go check it out" is a lot easier than saying "there's a problem with {2f1e4fc0-81fd-11da-9156-00036a0f876a}..." Ints and longs are also easier to type into queries when you need to.
Oh, and it's not the case that you never get the same GUID twice. It has been known to happen on very large, disconnected systems, so that's something to consider, although I wouldn't design for it in most apps.
When GUIDs can be Appropriate
GUIDs are appropriate when you're working with disconnected systems where entities are created and then synchronized. For example, if someone makes a record in your database on a mobile device and syncs it, or you have entities being created at different branch offices and synced to a central store at night. That's the kind of flexibility they give you.
GUIDs also allow you the ability to associate entities without persisting them to the database, in certain ORM scenarios. Linq to SQL (and I believe the EF) don't have this problem, though there are times you might be forced to submit your changes to the database to get a key.
If you create your GUIDs on the client, they are not sequential, so insert performance could suffer because of page splits on the DB.
My Advice
A lot of stuff to consider here. My vote is to not use them unless you have a compelling use case for them. If performance really is your goal, keep your tables small. Keep your fields small. Keep your DB indexes small and selective.
SIZE:
Long is 8 bytes
Guid is 16 bytes
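A quick way to check those sizes, sketched in Python rather than C#: struct's "q" format is a 64-bit signed integer (the equivalent of a C# long) and uuid.UUID is a 128-bit value like a Guid. The row count used for the back-of-envelope figure is an arbitrary assumption:

```python
import struct
import uuid

# Standard-size 64-bit signed integer, comparable to a C# long / SQL bigint.
long_size = struct.calcsize("<q")

# Raw bytes of a random UUID, comparable to a Guid / uniqueidentifier.
guid_size = len(uuid.uuid4().bytes)

# Rough extra key storage for a hypothetical table of 100 million rows,
# counting only the key column itself (not indexes that repeat the key).
rows = 100_000_000
extra_bytes = rows * (guid_size - long_size)
```

Every index and foreign key that repeats the key pays the same per-row difference again.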
A GUID has a very high probability of being unique and is well suited to identifying individual records across one or more databases.
A long (identity in the DB) can represent a unique record within a table, but records in different tables can share the same ID (identity) value, as follows:
TableA: PersonID int, name varchar(50)
TableB: ProductID int, name varchar(50)
SELECT PersonID from TableA where name =''
SELECT ProductID from TableB where name =''
both can return same value, but in case of GUID :
TableA: PersonID uniqueidentifier, name varchar(50)
TableB: ProductID uniqueidentifier, name varchar(50)
SELECT PersonID from TableA where name =''
SELECT ProductID from TableB where name =''
the two tables will almost never return the same ID value.
Have a look here
SQL Server - Guid VS. Long
GUIDs as PRIMARY KEYs and/or the clustering key
Guids make it much easier to create a 'fresh' entity in your API because you simply assign it the value of Guid.NewGuid(). There's no reliance on auto-incremented keys from a database, so this better decouples the Domain Model from the underlying persistence mechanism.
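A rough Python analogue of that pattern, with uuid.uuid4() standing in for Guid.NewGuid(). The Customer and Order classes are hypothetical; the point is that both entities get their IDs, and can reference each other, before anything is saved:

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Customer:
    name: str
    # ID is assigned in application code, not by the database.
    id: uuid.UUID = field(default_factory=uuid.uuid4)


@dataclass
class Order:
    customer_id: uuid.UUID
    id: uuid.UUID = field(default_factory=uuid.uuid4)


# Neither object has been persisted, yet the order can already
# point at the customer by ID.
customer = Customer("Frank")
order = Order(customer_id=customer.id)
```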
On the downside, if you use a Guid as the clustered index in SQL Server, inserts become expensive because new rows are very rarely added to the end of the table, which causes frequent page splits and index fragmentation.
Another issue is that if you perform selects from such a database without specifying an explicit ordering, you get out results in an essentially random order.
