Let's say I have a table Person, and I need users to be able to add different attributes to themselves.
A user should be able to add a date, a string, a number, a boolean, or multiple values.
Let's say a user wants to add:
Date of birth
Name
Height
Children names
How would I store this in the database?
I have 2 ideas:
I can store all the values as strings (varchar) and always parse each value back to its original type when it is used. Multiple values would be stored like text1#text2#text3 or similar.
I can have a table with one column per type (date, string, number), where only the column that is needed is populated and the others stay null.
Any suggestions?
Good database design keeps each column single-valued. If a user can have several values of the same kind, don't pack them into one column on the user's row; make a new table whose rows each refer to the user (many rows to one user). A true many-to-many relationship gets its own linking table in the same way.
Since a user can only have one birth date, though, you should keep that as a column in the Users table.
For example, in this case you want children names as the "multiple", assigned to one user.
A simple table for that could look like this:
ID int primary key
UserID int references User(ID)
Name varchar
That way, you can store multiple children names for one user while still keeping constraints in the database (which helps ensure correctness if you're interfacing with it through an application).
Some people will suggest having a table for each of the values, just to avoid nulls. For example, to store the birth date, you would make a table similar to the children names table above, so that you don't need a column in the Users table that might be null.
Personally I think using nulls is fine, as they let you see whether there is a relevant value without joining (or worse, left joining) an entire table of potentially irrelevant information.
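As a rough sketch of the child-table idea, here is how it could look using Python's built-in sqlite3 module. The table names (Users, ChildrenNames), the sample rows, and the choice of SQLite are assumptions made for illustration; other databases will need different types and syntax.

```python
import sqlite3

# Hypothetical schema for illustration; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on
conn.executescript("""
    CREATE TABLE Users (
        ID          INTEGER PRIMARY KEY,
        Name        TEXT NOT NULL,
        DateOfBirth TEXT              -- single-valued, so it stays on Users (may be NULL)
    );
    CREATE TABLE ChildrenNames (
        ID     INTEGER PRIMARY KEY,
        UserID INTEGER NOT NULL REFERENCES Users(ID),
        Name   TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO Users (ID, Name, DateOfBirth) VALUES (1, 'Alice', '1980-05-17')")
conn.executemany(
    "INSERT INTO ChildrenNames (UserID, Name) VALUES (?, ?)",
    [(1, "Bob"), (1, "Carol")],
)

# All children of user 1, one row per child.
children = [row[0] for row in conn.execute(
    "SELECT Name FROM ChildrenNames WHERE UserID = ? ORDER BY Name", (1,)
)]

# A child row pointing at a user that does not exist is rejected by the constraint.
try:
    conn.execute("INSERT INTO ChildrenNames (UserID, Name) VALUES (99, 'Orphan')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

Here children ends up holding both names, and the insert for the non-existent user 99 fails, which is the kind of protection the database constraint is meant to give you.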
Use your second approach. In your table 'Person', have one row per person, with multiple columns that each hold a single value for one of your desired fields.
So:
tbPerson
ID | Date Of Birth | Name | Height | Childrens names | etc...
To create the table:
CREATE TABLE tbPerson([ID] INT IDENTITY(1,1), [Date Of Birth] DATE, [Name] VARCHAR(50), Height INT, [Childrens names] VARCHAR(250))
This is the best and easiest way, and it makes editing a single field of a person's record simple. With your first approach you will have endless nightmares storing everything as one long string.
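To make the "one long string" nightmare concrete, here is a small Python sketch. The '#' separator is the one from the question; the sample names are made up.

```python
# The '#' separator comes from the question; the names are invented for the example.
children = ["Ann", "Bob #2", "Cy"]

packed = "#".join(children)    # what would be stored in the single string column
unpacked = packed.split("#")   # what the application gets when it parses it back

print(packed)    # Ann#Bob #2#Cy
print(unpacked)  # ['Ann', 'Bob ', '2', 'Cy']
```

As soon as a value contains the separator, three names come back as four, so you would also need an escaping scheme on top of the parsing.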
I need to create a SQLite statement that checks whether a specific column value is not null and, if so, adds a new column and inserts the new value.
What I already have is this:
CREATE TABLE "StuPayment" (
"PayNumber" INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
"StudentName" ntext,
"CourseName" ntext,
"PayDate" datetime,
"CheckNumber" NUMERIC,
"Amount" NUMERIC
)
What I want is to create new payment columns (PayDate1, CheckNumber1, Amount1) when a student pays a course cost in two payments, or sometimes three.
Thanks for your time reading this.
No. Don't do it. Just record the payments in multiple rows. You already have a datetime column, so each payment is recorded separately.
There are multiple advantages to new rows:
You can easily search for things like amount > 1000 and not have to worry about extra columns.
You can use an index to search on the payment columns, such as getting all payments on a particular date.
PayNumber uniquely identifies each payment.
You don't have to reserve space for empty values in all the rows.
Adding new payment methods (say credit cards, debit cards, direct debit, or other mechanisms) is simpler, because you don't have to multiply the columns for each potential payment.
You can more easily support payment plans, such as one payment per week.
Your concern about 10,000 rows/year is not relevant in today's world. Databases and computers are powerful.
If you want to see all the payments that a student has made, you can use:
select studentname, coursename, count(*) as numpayments, sum(amount)
from stupayment
group by studentname, coursename;
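A quick way to try this out is with Python's sqlite3 module. The sketch below assumes the StuPayment table from the question (with the types simplified to what SQLite stores anyway) and uses invented sample payments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE StuPayment (
        PayNumber   INTEGER PRIMARY KEY AUTOINCREMENT,
        StudentName TEXT,
        CourseName  TEXT,
        PayDate     TEXT,
        CheckNumber NUMERIC,
        Amount      NUMERIC
    )
""")
# Sample data is invented: Dana pays for one course in two installments.
conn.executemany(
    "INSERT INTO StuPayment (StudentName, CourseName, PayDate, CheckNumber, Amount) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("Dana", "Algebra", "2024-01-10", 1001, 300),
        ("Dana", "Algebra", "2024-02-10", 1002, 200),
        ("Eli", "Algebra", "2024-01-12", 2001, 500),
    ],
)
# One row per payment goes in; one row per student and course comes out.
summary = conn.execute("""
    SELECT StudentName, CourseName, COUNT(*) AS NumPayments, SUM(Amount) AS TotalPaid
    FROM StuPayment
    GROUP BY StudentName, CourseName
    ORDER BY StudentName
""").fetchall()
```

A third installment is just another INSERT; no schema change is needed.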
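I would recommend creating a view and doing whatever manipulation you want there, since the logical requirement is unclear to me. For example (StuPaymentView and c1 are placeholder names):
CREATE VIEW StuPaymentView AS SELECT * FROM StuPayment;
-- SQLite has no ALTER VIEW; to add a column, drop the view and recreate it.
DROP VIEW StuPaymentView;
CREATE VIEW StuPaymentView AS SELECT *, NULL AS c1 FROM StuPayment;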
On a SQL Server database I have a table with about 200 items:
create table dbo.Reports
(
Id int identity not null,
HealthRating int not null, --- { One of: NoProblems; TemporaryChange; ... }
Hobbies int not null, --- { Many of: None, Running, Tennis, Football, Swimming }
HobbiesOthers nvarchar (400) null
-- More 100 columns
);
So I have about 200 columns with types: INT, NVARCHAR, BIT and DATETIME.
Some of the INT columns, like HealthRating, store a single value.
Others, like Hobbies, hold many items, and usually have an extra column to store other options as text (nvarchar).
How should I structure this table? I see 3 options:
Have one column for each property so:
HealthRatingNoProblems bit not null,
HealthRatingTemporaryChange bit not null,
Create lookup tables for HealthRatings, Hobbies, ...
I will probably end up with 60 or more tables.
Use Enums and Flag enums which are now supported in Entity Framework and store one choice and multiple choice items in Int columns as I posted.
What would you suggest?
By all means -- please! -- normalize that poor table. If you end up with 50 or 60 tables then so be it. That is the design. If a user has a hobby, that information will be in the Hobby table. If he has three hobbies, there will be three entries in the Hobby table. If he doesn't have any hobbies, there will be nothing in the Hobby table. So on with all the other tables.
And for all those times you are only interested in hobbies, you only involve the Hobby table with the Reports table and leave all the other tables alone. You can't do that with one huge, all-encompassing row that attempts to hold everything. There, if you only want to look at hobby information, you still have to read in the entire row, bringing in all that data you don't want. Why read in data you are just going to discard?
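Here is a minimal sketch of what that looks like for hobbies, using Python's sqlite3 module rather than SQL Server. The Hobby table layout and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Reports (Id INTEGER PRIMARY KEY);
    -- Hypothetical layout: one row per (report, hobby) pair.
    CREATE TABLE Hobby (
        ReportId INTEGER NOT NULL REFERENCES Reports(Id),
        Name     TEXT NOT NULL,
        PRIMARY KEY (ReportId, Name)
    );
    INSERT INTO Reports (Id) VALUES (1), (2), (3);
    INSERT INTO Hobby (ReportId, Name) VALUES
        (1, 'Running'), (1, 'Tennis'), (1, 'Swimming'),
        (2, 'Football');
""")
# Only Reports and Hobby are involved; no other property table is read.
hobby_counts = conn.execute("""
    SELECT r.Id, COUNT(h.Name)
    FROM Reports AS r
    LEFT JOIN Hobby AS h ON h.ReportId = r.Id
    GROUP BY r.Id
    ORDER BY r.Id
""").fetchall()
```

Report 1 has three Hobby rows, report 2 has one, and report 3 has none. A free-text "other" hobby could go in as just another row, which is one way to avoid a separate HobbiesOthers column.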
Let's say I need to fetch some records from the database and filter them based on an enumeration-type property.
fetch List<SomeType>
filter on SomeType.Size
enumeration Size { Small, Medium, Large }
When displaying records, there will be a predefined value for the Size filter (e.g. Medium). In most cases, the user will select a value from the data filtered by that predefined value.
There is a possibility that a user could also filter to Large, then filter to Medium, then filter to Large again.
I have different situations with the same scenario:
List contains less than 100 records and 3-5 properties
List contains 100-500 records and 3-5 properties
List contains max 2000 records with 3-5 properties
What is my best approach here? Should I have a tab containing a grid for each enum value, or should I have one common grid and always filter, or something else?
I would do the filtering right in the database. If those fields are indexed, I suspect having the database filter would be much faster than filtering in C# after the fact.
Of course, you can always cache the filtered database results to prevent multiple unnecessary database calls.
EDIT: as for storing the information in the database, suppose you had this table setup:
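A rough sketch of "filter in the database, cache the result", written in Python with sqlite3 and functools.lru_cache instead of C#. The Items table, its index, and the items_by_size helper are all hypothetical names for illustration.

```python
import sqlite3
from functools import lru_cache

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Items (Id INTEGER PRIMARY KEY, Name TEXT, Size TEXT);
    CREATE INDEX IX_Items_Size ON Items (Size);
    INSERT INTO Items (Name, Size) VALUES
        ('a', 'Small'), ('b', 'Medium'), ('c', 'Large'), ('d', 'Medium');
""")

query_count = 0  # counts how often the database is actually hit

@lru_cache(maxsize=None)
def items_by_size(size):
    """Let the database filter on the indexed column; cache the result per size."""
    global query_count
    query_count += 1
    rows = conn.execute("SELECT Name FROM Items WHERE Size = ? ORDER BY Id", (size,))
    return tuple(name for (name,) in rows)

# Large -> Medium -> Large again: only two queries reach the database.
large_first = items_by_size("Large")
medium = items_by_size("Medium")
large_again = items_by_size("Large")
```

The cache has to be cleared (items_by_size.cache_clear()) whenever the underlying rows change, otherwise users will see stale results.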
CREATE TABLE Tshirts
(
id int not null identity(1,1),
name nvarchar(255) not null,
tshirtsizeid int not null,
primary key(id)
)
CREATE TABLE TshirtSizes
(
id int not null primary key, -- not auto-increment; must be a key so it can be referenced
name nvarchar(255)
)
INSERT INTO TshirtSizes(id, name) VALUES(1, 'Small')
INSERT INTO TshirtSizes(id, name) VALUES(2, 'Medium')
INSERT INTO TshirtSizes(id, name) VALUES(3, 'Large')
ALTER TABLE Tshirts ADD FOREIGN KEY(tshirtsizeid) REFERENCES TshirtSizes(id)
Then, in your C#:
public enum TShirtSizes
{
Small = 1,
Medium = 2,
Large = 3
}
In this example, the table TshirtSizes is only there so a reader knows what the magic numbers 1, 2, and 3 mean. If you don't care about database readability, you can omit that table and just have an indexed column.
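One way to keep the enum and the lookup table from drifting apart is to fill the table from the enum. The sketch below does that in Python with sqlite3 and IntEnum as a stand-in for the C# enum; the sample shirts are invented, and the schema is a SQLite approximation of the tables above.

```python
import sqlite3
from enum import IntEnum

class TShirtSize(IntEnum):
    SMALL = 1
    MEDIUM = 2
    LARGE = 3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TshirtSizes (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE Tshirts (
        id           INTEGER PRIMARY KEY,
        name         TEXT NOT NULL,
        tshirtsizeid INTEGER NOT NULL REFERENCES TshirtSizes(id)
    );
    CREATE INDEX IX_Tshirts_Size ON Tshirts (tshirtsizeid);
""")
# Populate the lookup table from the enum so the two always agree.
conn.executemany(
    "INSERT INTO TshirtSizes (id, name) VALUES (?, ?)",
    [(size.value, size.name.capitalize()) for size in TShirtSize],
)
conn.executemany(
    "INSERT INTO Tshirts (name, tshirtsizeid) VALUES (?, ?)",
    [
        ("Plain", TShirtSize.MEDIUM.value),
        ("Logo", TShirtSize.LARGE.value),
        ("Striped", TShirtSize.MEDIUM.value),
    ],
)
# Filter on the indexed integer column using the enum value.
medium_shirts = [name for (name,) in conn.execute(
    "SELECT name FROM Tshirts WHERE tshirtsizeid = ? ORDER BY id",
    (TShirtSize.MEDIUM.value,),
)]
lookup = dict(conn.execute("SELECT id, name FROM TshirtSizes"))
```

The application filters with the enum value, while someone reading the database can still join to TshirtSizes to see what 1, 2, and 3 mean.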
Memory is usually cheap, so with lists of this size you can simply keep all the records in memory and filter them by comparison, which is O(n) per filter. Otherwise you could sort all the values once and keep track of where each size starts and ends, and retrieve faster that way.
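As an illustration of the "sort once, keep track of positions" idea, here is a small Python sketch using the standard bisect module. The sample records and the SIZE_ORDER mapping are invented for the example.

```python
from bisect import bisect_left, bisect_right

SIZE_ORDER = {"Small": 0, "Medium": 1, "Large": 2}

# Invented sample records: (name, size).
records = [("a", "Large"), ("b", "Small"), ("c", "Medium"), ("d", "Large"), ("e", "Medium")]

# One-time sort by size rank; sorted() is stable, so ties keep their original order.
sorted_records = sorted(records, key=lambda r: SIZE_ORDER[r[1]])
ranks = [SIZE_ORDER[size] for _, size in sorted_records]

def filter_by_size(size):
    """Return the contiguous slice for one size using two binary searches."""
    rank = SIZE_ORDER[size]
    return sorted_records[bisect_left(ranks, rank):bisect_right(ranks, rank)]

medium = filter_by_size("Medium")
large = filter_by_size("Large")
```

For a few hundred or a couple of thousand records, a plain linear filter is probably fast enough; this only starts to matter if the filter is applied very often.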
Assume we have a Person table with 3 columns:
PersonId of type int. (Primary key)
Name of type string.
GenderId of type int (Foreign key referencing Gender table).
The Gender table consists of 2 columns:
GenderId of type int.
Name of type string.
My question is:
Is it worth implementing the Gender table, or does it cause performance degradation? What is the best way to handle this?
Edit 1:
I have to populate a drop down control with a list of fixed genders (female and male) in my UI.
I think the best approach in this case is a compromise:
Create a table called Gender with a single varchar column called 'Name' or 'Gender'. Gender is really a natural primary key. Put the values 'Male' and 'Female' in it.
Create a foreign key from a column named 'Gender' in your Person table to that table.
Now you only need to query from one table, but you're still protected from data inconsistencies by the foreign key, and you can pull the values for your dropdown from the Gender table if you want to. Best of both worlds.
Additionally, it makes life easier for someone working in the database, because they don't need to remember which arbitrary ids you've assigned to Male/Female.
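A minimal sketch of this compromise, using Python's sqlite3 module. The sample people are invented, and SQLite needs foreign key enforcement switched on explicitly, which other databases do by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required in SQLite for the constraint to be enforced
conn.executescript("""
    CREATE TABLE Gender (Name TEXT PRIMARY KEY);   -- natural key, no surrogate id
    INSERT INTO Gender (Name) VALUES ('Female'), ('Male');
    CREATE TABLE Person (
        PersonId INTEGER PRIMARY KEY,
        Name     TEXT NOT NULL,
        Gender   TEXT NOT NULL REFERENCES Gender(Name)
    );
    INSERT INTO Person (Name, Gender) VALUES ('Alice', 'Female'), ('Bob', 'Male');
""")

# No join needed: the readable value is already in the Person row.
people = conn.execute("SELECT Name, Gender FROM Person ORDER BY PersonId").fetchall()

# The dropdown can be filled straight from the lookup table.
dropdown = [name for (name,) in conn.execute("SELECT Name FROM Gender ORDER BY Name")]

# A misspelled value is rejected by the foreign key.
try:
    conn.execute("INSERT INTO Person (Name, Gender) VALUES ('Carol', 'Femail')")
    typo_rejected = False
except sqlite3.IntegrityError:
    typo_rejected = True
```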
If you have a field with only two possible values, you don't need another table for it. You can just use something like a BIT (0=male, 1=female) or a CHAR ('M' and 'F').
I am a firm believer in lookup tables for this, which is essentially what is being proposed, but with one distinction: use friendly, non-auto-generated PKs.
For instance, the PKs might be "M", "F", "N" (and there might be 2-4 or so rows, depending upon the accepted gender classifications). Using a simple PK allows easy queries while still allowing a higher form of normalization and referential consistency constraints, without having to employ check constraints.
As the question proposes, I also employ additional columns, such as a Name/Title/Label as appropriate (these are useful as a reference and add self-documentation to the identities). McCarthy advocates using this data itself as the PK (which is one option), but I consider it a trait of the identity and use a terser, hand-picked PK.
In this sense, I hold the entire concept of lookup-tables to provide the same sort of role as "constants" in code.
Normalizing gender into a separate table is overkill in this instance.
Why not just have GenderType as a string in the first table?
That way you save having to generate and store an extra GenderID. Try to minimise the use of IDs, as otherwise all you'll have in a table is a whole lot of columns just pointing to other tables (over-normalization).
Adding to what other people are saying, you can also create an INDEX (PersonId, GenderId) to speed up the queries.
Given that you only have two possible genders, and that this is extremely unlikely to need to change in the future, I would not bother to have a separate table. Just add a column to your Person table. A join can be efficient if needed, but it is always slower than no join.
And if, for whatever reason, you feel the need for more than two possible genders, you can still store them in a single column in the Person table.
I have several tables within my database that contain nothing but "metadata".
For example, we have different grouptypes, contentItemTypes, languages, etc.
The problem is that if you use automatic numbering, it is possible to create gaps.
The IDs are used within our code, so the number is very important.
Now I wonder whether it would be better not to use autonumbering within these tables.
Right now we have to create the row in the database first, before we can write our code. In my opinion this should not be the case.
What do you guys think?
I would use an identity column, as you suggest, as your primary key (surrogate key), and then store your candidate key (the identifier from your system) in a standard column with a unique constraint applied to it. This way you can ensure you do not insert duplicate records.
Make sense?
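Roughly, that could look like the following sketch in Python with sqlite3. SQLite's AUTOINCREMENT stands in for an identity column here, and the Languages table with its Code column is a hypothetical example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Languages (
        Id   INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key, generated by the database
        Code INTEGER NOT NULL UNIQUE,            -- identifier that the application code uses
        Name TEXT NOT NULL
    )
""")
conn.execute("INSERT INTO Languages (Code, Name) VALUES (1, 'English')")
conn.execute("INSERT INTO Languages (Code, Name) VALUES (2, 'Dutch')")

# Reusing an existing Code violates the unique constraint.
try:
    conn.execute("INSERT INTO Languages (Code, Name) VALUES (2, 'German')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

codes = [code for (code,) in conn.execute("SELECT Code FROM Languages ORDER BY Code")]
```

The application only ever refers to Code, so gaps or surprises in the generated Id values no longer matter to it.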
If these are FK tables used just to expand codes into a description or to hold other attributes, then I would NOT use an IDENTITY. Identity columns are good for continually inserted user data; metadata tables are usually static. When you deploy an update to your code, you don't want to be surprised by an IDENTITY value different from the one you expect.
For example, you add a new value to the "Languages" table and expect the ID to be 6, but for some reason (development is out of sync, another person has not implemented their next language type, etc.) the next identity you get is different, say 7. You then insert or convert a bunch of rows using Language ID=6, which all fail because it does not exist (it is 7 in the metadata table). Worse yet, they all actually insert or update because the value 6 you thought was yours was already in the metadata table, and you now have a mix of two items sharing the same 6 value while your new 7 value is left unused.
I would pick the proper data type based on how many codes you need and how often you will need to look at the raw values (CHARs are nice to look at for a few values, and help with memory).
For example, if you only have a few groups, and you'll often look at the raw data, then a char(1) may be good:
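The "expected 6, got 7" surprise is easy to reproduce. The sketch below uses Python's sqlite3 module, with SQLite's AUTOINCREMENT standing in for IDENTITY; the language names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Languages (Id INTEGER PRIMARY KEY AUTOINCREMENT, Name TEXT NOT NULL)"
)
# Five existing languages get Ids 1 to 5.
conn.executemany(
    "INSERT INTO Languages (Name) VALUES (?)",
    [(f"Language {i}",) for i in range(1, 6)],
)

# Someone else adds a row and removes it again; the value 6 is now used up.
conn.execute("INSERT INTO Languages (Name) VALUES ('Temporary')")
conn.execute("DELETE FROM Languages WHERE Name = 'Temporary'")

# The code was written assuming the new language would get Id 6.
new_id = conn.execute("INSERT INTO Languages (Name) VALUES ('Klingon')").lastrowid
ids = [i for (i,) in conn.execute("SELECT Id FROM Languages ORDER BY Id")]
```

Here new_id is 7 and the table has a gap at 6, so any code or script that hardcoded 6 for the new language now points at nothing.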
GroupTypes table
-----------------
GroupType char(1) --'M'=manufacturing, 'P'=purchasing, 'S'=sales
GroupTypeDescription varchar(100)
However, if there are many different values, then some form of an int (tinyint, smallint, int, bigint) may do it:
EmailTypes table
----------------
EmailType smallint --2 bytes, up to 32k different positive values
EmailTypeDescription varchar(100)
If the numbers are hardcoded in your code, don't use identity fields. Hardcode them in the database as well, as they'll be less prone to changing because someone scripted a database badly.
I would also use an identity column as the primary key, just for the simplicity of inserting records into the database, but then use a column for the type of metadata. I call mine LookUpType (int). I also have columns for LookUpId (the int value used in code or as the value in select lists) and LookUpName (string), and if those values require additional settings, I add extra columns. I personally use two extras: LookUpKey for hierarchical relations, and LookUpValue for abbreviations or alternate values of LookUpName.
Well, if those numbers are important to you because they'll be in code, I would probably not use an IDENTITY.
Instead, just make sure you use an INT column and make it the primary key. In that case, you will have to provide the IDs yourself, and they'll have to be unique.
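A small sketch of that setup, in Python with sqlite3. The Language enum and its values are hypothetical. Note that in SQLite the column is declared INT rather than INTEGER on purpose: an INTEGER PRIMARY KEY column would be auto-assigned when omitted, which is exactly what we want to avoid here.

```python
import sqlite3
from enum import IntEnum

class Language(IntEnum):  # hypothetical IDs hardcoded in application code
    ENGLISH = 1
    DUTCH = 2
    FRENCH = 6            # gaps are fine, since we choose the numbers ourselves

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Languages (Id INT NOT NULL PRIMARY KEY, Name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO Languages (Id, Name) VALUES (?, ?)",
    [(lang.value, lang.name.capitalize()) for lang in Language],
)

def try_insert(sql, params=()):
    """Return True if the insert succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

# An ID that is already taken is rejected, and so is a row without an ID.
duplicate_ok = try_insert("INSERT INTO Languages (Id, Name) VALUES (?, ?)", (6, "German"))
missing_id_ok = try_insert("INSERT INTO Languages (Name) VALUES (?)", ("German",))
ids = [i for (i,) in conn.execute("SELECT Id FROM Languages ORDER BY Id")]
```

Because the database never invents an ID, the numbers in the table can only be the ones you deliberately put there.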