Database design: deleting a parent and its children in a nested parent-child situation


In Oracle I have a table called Category consisting of three columns:
ID = a system-generated unique key,
Category_name = a 300-character string,
parent_id = either -1, meaning the category has no parent, or a value from the ID column above identifying the parent category.
The problem is that when I delete a category that is a parent, I need all of its children to be deleted automatically as well. My question is: does SQL provide any means to do this automatically, or should I take care of it in my upper-layer language, which is C#?
For example, if there were a foreign key relationship between two tables, I know SQL provides ON DELETE CASCADE, which deletes the dependent records along with the parent record when the original record is deleted.
However, I don't know of any way in SQL to handle the situation above automatically, i.e. so that when a parent in this table is deleted, all of its children are deleted as well.
Thanks in advance for your help.

If parent_id were set to NULL when there is no parent, you could define a foreign key on category that references the primary key of category itself:
create table category (
  id            number primary key,
  category_name varchar2(300),
  parent_id     number references category( id )
);
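You could then declare that foreign key constraint with ON DELETE CASCADE (parent_id number references category( id ) on delete cascade). Deleting a parent row then automatically deletes its children, their children, and so on down the tree.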
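A minimal sketch of the self-referencing cascade follows. It uses SQLite (through Python's standard `sqlite3` module) as a stand-in for Oracle, so the DDL is only an approximation of the answer's table. It shows that deleting the root removes a child and a grandchild as well. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; the table mirrors the answer's
# category table, with ON DELETE CASCADE on the self-reference.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("""
    CREATE TABLE category (
        id            INTEGER PRIMARY KEY,
        category_name VARCHAR(300),
        parent_id     INTEGER REFERENCES category(id) ON DELETE CASCADE
    )""")

# A three-level tree 1 -> 2 -> 3, plus an unrelated root 4 (NULL = no parent).
rows = [(1, "Root", None), (2, "Child", 1), (3, "Grandchild", 2), (4, "Other root", None)]
with conn:
    conn.executemany("INSERT INTO category VALUES (?, ?, ?)", rows)

# Deleting the root cascades through every level of descendants.
with conn:
    conn.execute("DELETE FROM category WHERE id = 1")

remaining = [r[0] for r in conn.execute("SELECT id FROM category ORDER BY id")]
print(remaining)  # [4]
```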
If you really want to use a magic value of -1 to indicate the absence of a parent rather than using a proper NULL, you could potentially insert a row into the category table with an id of -1 and then create the foreign key constraint. But that is much less elegant than using a NULL.
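If the -1 design is kept and there is no foreign key, the application has to find and delete the subtree itself. The sketch below does this in a single statement with a recursive common table expression, assuming the original columns. It runs on SQLite, which is a stand-in here. In Oracle itself, the same set of rows can be selected with a hierarchical query (START WITH id = :root CONNECT BY PRIOR id = parent_id). The `delete_subtree` helper is hypothetical.

```python
import sqlite3

def delete_subtree(conn, root_id):
    """Hypothetical helper: delete root_id and all of its descendants.

    Assumes the original schema: no foreign key, parent_id = -1 for roots.
    UNION (rather than UNION ALL) stops the recursion if the data ever
    contains a cycle.
    """
    with conn:
        conn.execute("""
            WITH RECURSIVE subtree(id) AS (
                SELECT ?
                UNION
                SELECT c.id FROM category c JOIN subtree s ON c.parent_id = s.id
            )
            DELETE FROM category WHERE id IN (SELECT id FROM subtree)""",
            (root_id,))

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE category (
                    id INTEGER PRIMARY KEY,
                    category_name VARCHAR(300),
                    parent_id INTEGER NOT NULL)""")
with conn:
    conn.executemany("INSERT INTO category VALUES (?, ?, ?)",
                     [(1, "Root", -1), (2, "Child", 1),
                      (3, "Grandchild", 2), (4, "Other root", -1)])

delete_subtree(conn, 1)
print([r[0] for r in conn.execute("SELECT id FROM category ORDER BY id")])  # [4]
```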

Related

Database id column or code column

I am working on a project where I have to implement some new functionality.
In the process I have to design some tables and build some editors for that data.
I have one table for categories and one for types.
On the client side I have to build some lists using those types, but each list must use types from only one category.
I don't like the idea of using PKs in my C# code. I would rather create a column named "Code" in the category table and use that in my C# code when preparing the lists.
EDIT: I do not mean removing the PK (I will have an int-based Id column). I mean adding another column ("Code") to the category table, to be used only in C# as string constants instead of IDs.
Is this an okay idea?

If I've understood your question, I would recommend the following structure for your tables:
Table Category
ID Int -- primary key
Code Varchar(8) -- Code value displayed to users
Description Varchar(100)
Table Item
ID Int -- primary key
CategoryID Int -- foreign key to Category
Code Varchar(8) -- Code value displayed to users
Description Varchar(100)
This way, if you change a Category record's Code, nothing changes behind the scenes, and the key values are never exposed to your users.
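The sketch below shows how client code might build a list for one category by looking it up through its `Code` while the join still uses the integer keys. SQLite stands in for the real database. The `UNIQUE` constraint on `Category.Code`, the sample rows, and the `items_for_category` helper are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Category (
        ID          INTEGER PRIMARY KEY,
        Code        VARCHAR(8) NOT NULL UNIQUE,  -- assumed unique so it can identify a category
        Description VARCHAR(100)
    );
    CREATE TABLE Item (
        ID          INTEGER PRIMARY KEY,
        CategoryID  INTEGER NOT NULL REFERENCES Category(ID),
        Code        VARCHAR(8) NOT NULL,
        Description VARCHAR(100)
    );
    INSERT INTO Category (ID, Code, Description) VALUES (1, 'COLOR', 'Colours'), (2, 'SIZE', 'Sizes');
    INSERT INTO Item (CategoryID, Code, Description)
        VALUES (1, 'RED', 'Red'), (1, 'BLUE', 'Blue'), (2, 'XL', 'Extra large');
""")

def items_for_category(conn, category_code):
    """Hypothetical helper: list item codes for one category, selected by its Code."""
    return [row[0] for row in conn.execute(
        """SELECT i.Code FROM Item i
           JOIN Category c ON c.ID = i.CategoryID   -- keys stay internal
           WHERE c.Code = ? ORDER BY i.ID""", (category_code,))]

print(items_for_category(conn, "COLOR"))  # ['RED', 'BLUE']
```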

What is the right order of insertion/deletion/modification on dataset?

MSDN states that the order is:
Child table: delete records.
Parent table: insert, update, and delete records.
Child table: insert and update records.
I have a problem with that.
Example: ParentTable has two records, parent1 (Id: 1) and parent2 (Id: 2).
ChildTable has one record, child1 (Id: 1, ParentId: 1).
Suppose we update child1 to have the new parent parent2, and then we delete parent1.
There is nothing to delete in the child table.
We delete parent1: this breaks the constraint, because the child is still attached to parent1 unless we update it first.
So what is the right order, and is MSDN wrong on this subject?
My own thought is:
Child table: delete records.
Parent table: insert, update records.
Child table: insert and update records.
Parent table: delete records.
But the problem is that, with potential unique constraints, we must always delete the records in a table before adding new ones... So right now I have no solution for committing my data to the database.
Edit: thanks for the answers, but your corner case is my daily case... I opted for the ugly solution of disabling the constraints, updating the database, and re-enabling the constraints. I'm still looking for a better solution...

Doesn't your SQL product support deferred constraint checking?
If not, you could try:
delete all child records, then delete all parent records, then insert all parent records, then insert all child records,
where any UPDATEs have been split into their constituent DELETEs and INSERTs.
This should work correctly in all cases, but probably at acceptable speed in none...
It is also provable that this is the only scheme that can work correctly in all cases, since:
(a) key constraints on parent dictate that parent DELETES must precede parent INSERTS,
(b) key constraints on child dictate that child DELETES must precede child INSERTS,
(c) FK dictates that child DELETES must precede parent DELETES
(d) FK also dictates that child INSERTS must follow parent INSERTS
The given sequence is the only possible one that satisfies these 4 requirements, and it also shows that UPDATEs to the child make a solution impossible no matter what, since an UPDATE means a "simultaneous" DELETE plus INSERT.
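Both suggestions in this answer are sketched below on the question's own example: move child1 to parent2, then delete parent1. SQLite is a stand-in. `DEFERRABLE INITIALLY DEFERRED` postpones the foreign key check until COMMIT, so the MSDN-style order (parent delete first, child update later) succeeds. Oracle and PostgreSQL also accept this clause; SQL Server does not support deferred constraints. With ordinary immediate checking, that order is rejected, and the four-phase order (child DELETE, parent DELETE, parent INSERT, child INSERT) is used instead. The helper names are hypothetical.

```python
import sqlite3

def make_db(deferred):
    """ParentTable holds parent1 and parent2; child1 points at parent1."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    timing = "DEFERRABLE INITIALLY DEFERRED" if deferred else ""
    conn.executescript(f"""
        CREATE TABLE ParentTable (Id INTEGER PRIMARY KEY);
        CREATE TABLE ChildTable (
            Id       INTEGER PRIMARY KEY,
            ParentId INTEGER NOT NULL REFERENCES ParentTable(Id) {timing}
        );
        INSERT INTO ParentTable VALUES (1), (2);
        INSERT INTO ChildTable VALUES (1, 1);
    """)
    return conn

def msdn_order(conn):
    # Parent delete first, child update afterwards: valid only if the FK
    # check is deferred until COMMIT.
    with conn:
        conn.execute("DELETE FROM ParentTable WHERE Id = 1")
        conn.execute("UPDATE ChildTable SET ParentId = 2 WHERE Id = 1")

def four_phase_order(conn):
    # Child DELETEs, parent DELETEs, parent INSERTs (none here), child INSERTs;
    # the child UPDATE has been split into a DELETE plus an INSERT.
    with conn:
        conn.execute("DELETE FROM ChildTable WHERE Id = 1")
        conn.execute("DELETE FROM ParentTable WHERE Id = 1")
        conn.execute("INSERT INTO ChildTable (Id, ParentId) VALUES (1, 2)")

def snapshot(conn):
    parents = [r[0] for r in conn.execute("SELECT Id FROM ParentTable ORDER BY Id")]
    children = conn.execute("SELECT Id, ParentId FROM ChildTable ORDER BY Id").fetchall()
    return parents, children

deferred_db = make_db(deferred=True)
msdn_order(deferred_db)
print(snapshot(deferred_db))   # ([2], [(1, 2)])

immediate_db = make_db(deferred=False)
try:
    msdn_order(immediate_db)   # the DELETE violates the FK straight away
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)    # the transaction is rolled back
print(snapshot(immediate_db))  # ([1, 2], [(1, 1)])

four_phase_order(immediate_db)
print(snapshot(immediate_db))  # ([2], [(1, 2)])
```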

You have to take their context into account. MS said, "When updating related tables in a dataset, it is important to update in the proper sequence to reduce the chance of violating referential integrity constraints," in the context of writing client data application software.
Why is it important to reduce the chance of violating referential integrity constraints? Because violating those constraints means
more round trips between the dbms and the client, either for the client code to handle the constraint violations, or for the human user to handle the violations,
more time taken,
more load on the server,
more opportunities for human error, and
more chances for concurrent updates to change the underlying data (possibly confusing either the application code, the human user, or both).
And why do they consider their procedure the right way? Because it provides a single process that will avoid referential integrity violations in almost all the common cases, and even in a lot of the uncommon ones. For example . . .
If the update is a DELETE operation on the referenced table, and if foreign keys in the referencing tables are declared as ON DELETE CASCADE, then the optimal thing is to simply delete the referenced row (the parent row), and let the dbms manage the cascade. (This is also the optimal thing for ON DELETE SET DEFAULT, and for ON DELETE SET NULL.)
If the update is a DELETE operation on the referenced table, and if foreign keys in the referencing tables are declared as ON DELETE RESTRICT, then the optimal thing is to delete all the referencing rows (child rows) first, then delete the referenced row.
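The ON DELETE RESTRICT case is sketched below, with SQLite as a stand-in and hypothetical table names. Deleting the parent on its own is refused while child rows still reference it. Deleting the children first and then the parent, inside one transaction, succeeds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Parent (Id INTEGER PRIMARY KEY);
    CREATE TABLE Child (
        Id       INTEGER PRIMARY KEY,
        ParentId INTEGER NOT NULL REFERENCES Parent(Id) ON DELETE RESTRICT
    );
    INSERT INTO Parent VALUES (1);
    INSERT INTO Child VALUES (10, 1), (11, 1);
""")

def delete_parent_children_first(conn, parent_id):
    """Delete the referencing (child) rows, then the referenced (parent) row."""
    with conn:  # both deletes commit together or not at all
        conn.execute("DELETE FROM Child WHERE ParentId = ?", (parent_id,))
        conn.execute("DELETE FROM Parent WHERE Id = ?", (parent_id,))

# Deleting the parent directly is refused while children still reference it.
try:
    with conn:
        conn.execute("DELETE FROM Parent WHERE Id = 1")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)

delete_parent_children_first(conn, 1)
print(conn.execute("SELECT COUNT(*) FROM Parent").fetchone()[0],
      conn.execute("SELECT COUNT(*) FROM Child").fetchone()[0])  # 0 0
```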
But, with proper use of transactions, MS's procedure leaves the database in a consistent state regardless. The value is that it's a single, client-side process to code and to maintain, even though it's not optimal in all cases. (That's often the case in software design--choosing a single way that's not optimal in all cases. ActiveRecord leaps to mind.)
You said
Example : ParentTable have two records parent1(Id : 1) and parent2(Id : 2)
ChildTable have a record child1(Id : 1, ParentId : 1)
If we update the child1 to have a new parent parent2, and then we delete parent1.
We have nothing to delete in child table
We delete parent1 : we broke the constraint, because the child is still attached to parent1, unless we update it first.
That's not a referential integrity issue; it's a procedural issue. This problem clearly requires two transactions.
Update the child to have a new parent, then commit. This data must be corrected regardless of what happens to the first parent. Specifically, this data must be corrected even if there are concurrent updates or other constraints that make it either temporarily or permanently impossible to delete the first parent. (This isn't a referential integrity issue, because there's no ON DELETE SET TO NEXT PARENT ID OR MAKE YOUR BEST GUESS clause in SQL foreign key constraints.)
Delete the first parent, then commit. This might require first updating any number of child rows in any number of tables. In a huge organization, I can imagine some deletes like this taking weeks to finish.

Sounds to me like:
Insert parent2. Child still points to parent1.
Update child to point to parent2. Now nothing references parent1.
Delete parent1.
You'd want to wrap it in a transaction where available.
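The three steps above are sketched below as a single transaction, with SQLite as a stand-in. The `replace_parent` helper is hypothetical. The foreign key is checked immediately, and each step leaves it satisfied: the new parent exists before the child points at it, and the old parent is deleted only after the child has moved away.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE ParentTable (Id INTEGER PRIMARY KEY);
    CREATE TABLE ChildTable (
        Id       INTEGER PRIMARY KEY,
        ParentId INTEGER NOT NULL REFERENCES ParentTable(Id)
    );
    INSERT INTO ParentTable VALUES (1);
    INSERT INTO ChildTable VALUES (1, 1);
""")

def replace_parent(conn, child_id, old_parent_id, new_parent_id):
    """Insert the new parent, repoint the child, then delete the old parent."""
    with conn:  # one transaction: all three steps or none
        conn.execute("INSERT INTO ParentTable (Id) VALUES (?)", (new_parent_id,))
        conn.execute("UPDATE ChildTable SET ParentId = ? WHERE Id = ?",
                     (new_parent_id, child_id))
        # Assuming no other child still references the old parent, this
        # delete satisfies the foreign key.
        conn.execute("DELETE FROM ParentTable WHERE Id = ?", (old_parent_id,))

replace_parent(conn, child_id=1, old_parent_id=1, new_parent_id=2)
print(conn.execute("SELECT Id, ParentId FROM ChildTable").fetchall())  # [(1, 2)]
```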
Depending on your schema, you could also extend this to:
Update parent1 to indicate that it is locked (or lock it in the DB), thus preventing updates.
Insert parent2
Update child to point to parent2
Delete parent1
This order has the advantage that a join between the parent and child will return a consistent result throughout: at the moment the child is updated, the results of the join "flip" to the new state.
EDIT:
Another option is to move the parent/child references into another table, e.g. "links":
CREATE TABLE links (
link_id INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
parent_id INT NOT NULL,
child_id INT NOT NULL
);
You may well want foreign key constraints on the parent and child columns, as well as, of course, some appropriate indexes. This arrangement allows very flexible relationships between the parent and child tables (possibly too flexible, but that depends on your application). Now you can do something like:
UPDATE links
SET parent_id = #new_parent_id
WHERE parent_id = #old_parent_id
AND child_id = #child_id;
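The links-table approach is sketched below. It runs on SQLite, where `INTEGER PRIMARY KEY AUTOINCREMENT` plays the role of SQL Server's `IDENTITY(1,1)`. Ordinary bound parameters replace the `#...` placeholders. The `parent`/`child` tables, the index names, and the `relink` helper are assumptions. Moving the child rewrites only the link row, after which nothing references the old parent and it can be deleted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY);
    CREATE TABLE child  (id INTEGER PRIMARY KEY);
    CREATE TABLE links (
        link_id   INTEGER PRIMARY KEY AUTOINCREMENT,  -- SQLite analogue of IDENTITY(1,1)
        parent_id INTEGER NOT NULL REFERENCES parent(id),
        child_id  INTEGER NOT NULL REFERENCES child(id)
    );
    CREATE INDEX links_parent_idx ON links(parent_id);
    CREATE INDEX links_child_idx  ON links(child_id);
    INSERT INTO parent VALUES (1), (2);
    INSERT INTO child  VALUES (1);
    INSERT INTO links (parent_id, child_id) VALUES (1, 1);
""")

def relink(conn, child_id, old_parent_id, new_parent_id):
    """Hypothetical helper: move one child to a new parent via its link row."""
    with conn:
        conn.execute("""UPDATE links SET parent_id = ?
                        WHERE parent_id = ? AND child_id = ?""",
                     (new_parent_id, old_parent_id, child_id))

relink(conn, child_id=1, old_parent_id=1, new_parent_id=2)
with conn:
    conn.execute("DELETE FROM parent WHERE id = 1")  # no link references parent 1 now
print(conn.execute("SELECT link_id, parent_id, child_id FROM links").fetchall())  # [(1, 2, 1)]
```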

The need to DELETE a parent record without deleting the child records is unusual enough that I am certain the normally prescribed order of dataset operations defined by MS does not apply in this case.
The most efficient method would be to UPDATE the child records to reflect the new parent, then DELETE the original parent. As others have mentioned, this operation should be performed within a transaction.

I think separating actions on tables is not a good design, so my solution is:
insert/update/delete parent table
insert/update/delete child table
The key point is that you should not change the parentId of a child record; instead, delete the child of parent1 and add a new child to parent2. Done this way, you no longer have to worry about breaking the constraint. And of course you must use a transaction.

The MSDN claim is correct on the basis of dependencies (foreign keys). Think of the order as:
Child table: (cascade) delete.
Parent table: insert and/or update and/or delete records, the delete being the final step of the cascade delete.
Child table: insert or update.
Since we are talking about cascade delete, we must guarantee that, when a parent record is deleted, any child records relating to it are deleted before the parent record. If there are no child records, there is nothing to delete at the child level. That's all.
On the other hand, you may approach your case in different ways. I think an (almost) real-life scenario will be more helpful. Let's assume that the parent table is the master part of orders (orderID, clientID, etc.) and the child table is the details part (detailID, orderID, productOrServiceID, etc.). So you get an order and you have the following:
Parent table (orderID is auto increment):
orderID = 1, ...
Child table (detailID is auto increment):
detailID = 1, orderID = 1, productOrServiceID = 342
detailID = 2, orderID = 1, productOrServiceID = 169
detailID = 3, orderID = 1, productOrServiceID = 307
So we have one order for three products/services. Now your client wants you to move the second product or service to a new order and deliver it later. You have two options to do this.
The first one (direct)
Create a new order (new parent record) that gets orderID = 2
Update child table by setting orderID = 2 where orderID = 1 and productOrServiceID = 169
As a result you will have
Parent table:
orderID = 1, ...
orderID = 2, ...
Child table:
detailID = 1, orderID = 1, productOrServiceID = 342
detailID = 2, orderID = 2, productOrServiceID = 169
detailID = 3, orderID = 1, productOrServiceID = 307
The second one (indirect)
Keep a DataRow of the second product/service from child table as a variable
Delete the corresponding row from the child table
Create a new order (new parent record) that gets orderID = 2
Insert the kept DataRow into the child table, changing its orderID field from 1 to 2
As a result you will have
Parent table:
orderID = 1, ...
orderID = 2, ...
Child table:
detailID = 1, orderID = 1, productOrServiceID = 342
detailID = 3, orderID = 1, productOrServiceID = 307
detailID = 4, orderID = 2, productOrServiceID = 169
The reason for the second option, which is, by the way, the preferable one for many applications, is that it gives raw sequences of detail IDs for each parent record. I have seen the second option extended to recreating all the detail records. I think it is quite easy to find open source solutions relating to this case and check their implementation.
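Both options are sketched below against the example's data, with SQLite as a stand-in. `AUTOINCREMENT` models the auto-increment IDs. The table names `Orders`/`OrderDetails` and the helper functions are hypothetical. The kept DataRow is reduced to its `productOrServiceID` value. The printed detail rows match the two result listings above: option 1 keeps detailID 2, while option 2 produces a new detailID 4.

```python
import sqlite3

def make_orders():
    """One order (orderID 1) with three details, as in the example."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE Orders (orderID INTEGER PRIMARY KEY AUTOINCREMENT);
        CREATE TABLE OrderDetails (
            detailID           INTEGER PRIMARY KEY AUTOINCREMENT,
            orderID            INTEGER NOT NULL REFERENCES Orders(orderID),
            productOrServiceID INTEGER NOT NULL
        );
        INSERT INTO Orders DEFAULT VALUES;
        INSERT INTO OrderDetails (orderID, productOrServiceID)
            VALUES (1, 342), (1, 169), (1, 307);
    """)
    return conn

def move_direct(conn, old_order, product_id):
    """Option 1: create the new order and re-point the existing detail row."""
    with conn:
        new_order = conn.execute("INSERT INTO Orders DEFAULT VALUES").lastrowid
        conn.execute("""UPDATE OrderDetails SET orderID = ?
                        WHERE orderID = ? AND productOrServiceID = ?""",
                     (new_order, old_order, product_id))

def move_indirect(conn, old_order, product_id):
    """Option 2: keep the detail's data, delete the row, re-insert it under a new order."""
    with conn:
        (kept_product,) = conn.execute(
            """SELECT productOrServiceID FROM OrderDetails
               WHERE orderID = ? AND productOrServiceID = ?""",
            (old_order, product_id)).fetchone()
        conn.execute("DELETE FROM OrderDetails WHERE orderID = ? AND productOrServiceID = ?",
                     (old_order, product_id))
        new_order = conn.execute("INSERT INTO Orders DEFAULT VALUES").lastrowid
        conn.execute("INSERT INTO OrderDetails (orderID, productOrServiceID) VALUES (?, ?)",
                     (new_order, kept_product))

def details(conn):
    return conn.execute("SELECT detailID, orderID, productOrServiceID "
                        "FROM OrderDetails ORDER BY detailID").fetchall()

direct = make_orders()
move_direct(direct, old_order=1, product_id=169)
print(details(direct))    # [(1, 1, 342), (2, 2, 169), (3, 1, 307)]

indirect = make_orders()
move_indirect(indirect, old_order=1, product_id=169)
print(details(indirect))  # [(1, 1, 342), (3, 1, 307), (4, 2, 169)]
```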
Finally my personal advice is to avoid doing this kind of stuff with datasets unless your application is single user. Databases can easily handle this "problem" in a thread safe way with transactions.

Deleting multiple tables from SQL Server 2008 using Datalist C#

This may seem a common question, but I searched for an answer that fixes my problem and failed to find one.
I have multiple tables connected to each other by ProductID and I wish to delete all data from them when the product from main table has been deleted. i.e.
Products : ProductID - Vender - Description
ProductRatings : ProductID - Rating - VisitorsCount
ProductComments : ProductID - VisitorName - Comment
I read that a SQL trigger is used for this kind of situation, but I know nothing about triggers. Besides, in some cases I set up my DataSource in the ASCX.CS file, and in others I simply use a SqlDataSource in the ASCX file. Is there any query or stored procedure that can be used?

The easiest way to do this is to implement a foreign key relationship to ProductID and set on delete cascade. This is a general idea:
create table ProductRatings
(
ProductID int not null
foreign key references Products(ProductID) on delete cascade,
Rating int not null,
VisitorsCount int not null
)
What that does is this: when you delete a primary key value from the Products table, SQL Server deletes all rows whose foreign key references that primary key value. If you do the same with your ProductComments table, problem solved. There is no need to explicitly call DELETE on any rows in the referencing tables.
And if you aren't using referential integrity...you should.
EDIT: this also holds true for UPDATEs on the primary key. You just need to specify on update cascade, and the foreign key references will update as the primary key did to ensure RI.
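A small demonstration of both cascade actions on the question's three tables follows, with SQLite standing in for SQL Server 2008. The column types and sample rows are assumptions. A single UPDATE of a ProductID and a single DELETE on Products are enough; the referencing rows in ProductRatings and ProductComments follow automatically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for cascades in SQLite
conn.executescript("""
    CREATE TABLE Products (
        ProductID   INTEGER PRIMARY KEY,
        Vender      VARCHAR(100),
        Description VARCHAR(400)
    );
    CREATE TABLE ProductRatings (
        ProductID     INTEGER NOT NULL
            REFERENCES Products(ProductID) ON DELETE CASCADE ON UPDATE CASCADE,
        Rating        INTEGER NOT NULL,
        VisitorsCount INTEGER NOT NULL
    );
    CREATE TABLE ProductComments (
        ProductID   INTEGER NOT NULL
            REFERENCES Products(ProductID) ON DELETE CASCADE ON UPDATE CASCADE,
        VisitorName VARCHAR(100),
        Comment     VARCHAR(1000)
    );
    INSERT INTO Products VALUES (1, 'Acme', 'Widget'), (2, 'Acme', 'Gadget');
    INSERT INTO ProductRatings VALUES (1, 5, 10), (2, 3, 4);
    INSERT INTO ProductComments VALUES (1, 'Ann', 'Great'), (2, 'Bob', 'OK');
""")

with conn:
    conn.execute("UPDATE Products SET ProductID = 20 WHERE ProductID = 2")  # ON UPDATE CASCADE
with conn:
    conn.execute("DELETE FROM Products WHERE ProductID = 1")  # ON DELETE CASCADE

print(conn.execute("SELECT * FROM ProductRatings").fetchall())   # [(20, 3, 4)]
print(conn.execute("SELECT * FROM ProductComments").fetchall())  # [(20, 'Bob', 'OK')]
```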

Database design for compiling records from another table

What's the best way to design a table that reference multiple records from another table?
For example, there is a table called diary that stores subjects, descriptions and keywords, then another table called DiaryCompilation for combining all selected records into a book by just referencing the id from the diary.
What's the best way to create the DiaryCompilation?
I was thinking of giving it two fields, id and references,
where all the selected records are placed in references. Is that good practice, or are there better approaches?

<--- Each record is a new entry of diary --->
Diary: ID, Subject, Description, Keywords
<--- Single record per compilation for summary info --->
DiaryCompilation: ID, Title
<--- Pages of Diary Compilation --->
DiaryCompilationEntries: DiaryCompilationID, DiaryID
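The three-table layout above is sketched below in SQLite, which is a stand-in. The composite primary key on the entries table, the column types, the sample rows, and the `compilation_subjects` helper are assumptions. A compilation's "pages" come from a join rather than from a list packed into one field.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Diary (ID INTEGER PRIMARY KEY, Subject TEXT, Description TEXT, Keywords TEXT);
    CREATE TABLE DiaryCompilation (ID INTEGER PRIMARY KEY, Title TEXT);
    CREATE TABLE DiaryCompilationEntries (
        DiaryCompilationID INTEGER NOT NULL REFERENCES DiaryCompilation(ID),
        DiaryID            INTEGER NOT NULL REFERENCES Diary(ID),
        PRIMARY KEY (DiaryCompilationID, DiaryID)  -- a diary appears once per compilation
    );
    INSERT INTO Diary (ID, Subject) VALUES (1, 'Monday'), (2, 'Tuesday'), (3, 'Wednesday');
    INSERT INTO DiaryCompilation VALUES (1, 'Best of the week');
    INSERT INTO DiaryCompilationEntries VALUES (1, 1), (1, 3);
""")

def compilation_subjects(conn, compilation_id):
    """Hypothetical helper: subjects of the diary entries in one compilation."""
    return [r[0] for r in conn.execute("""
        SELECT d.Subject FROM DiaryCompilationEntries e
        JOIN Diary d ON d.ID = e.DiaryID
        WHERE e.DiaryCompilationID = ?
        ORDER BY d.ID""", (compilation_id,))]

print(compilation_subjects(conn, 1))  # ['Monday', 'Wednesday']
```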

Diary and DiaryCompilation would have a 1-to-many relationship. Just make sure that the Id in DiaryCompilation is set up as the primary key, and put a foreign key constraint on it so that it ties back to an Id in the Diary table. This will prevent you from deleting a diary and orphaning a record in the DiaryCompilation table. As long as you have a normalized data model, you should be fine.

Keep your fields atomic. Placing several values in references field would make it much harder to query and to enforce referential constraints.
A 1:N relationship between parent and child is modeled by migrating parent's primary key into child. In your case, this would look something like this:
COMPILATION (
COMPILATION_ID PK
-- Other fields...
)
DIARY (
DIARY_ID PK
COMPILATION_ID FK(COMPILATION)
SUBJECT
DESCRIPTION
)
-- Not a good idea to have several keywords in a single field, so we need a separate table for keywords.
DIARY_KEYWORD (
DIARY_ID PK, FK(DIARY)
KEYWORD PK
)
If you actually want N:N relationship (i.e. diary can be part of more than one compilation), you'll need a dedicated table to hold these connections, something like this:
COMPILATION (
COMPILATION_ID PK
)
DIARY_IN_COMPILATION (
COMPILATION_ID PK, FK(COMPILATION)
DIARY_ID PK, FK(DIARY)
)
DIARY (
DIARY_ID PK
SUBJECT
DESCRIPTION
)
DIARY_KEYWORD (
DIARY_ID PK, FK(DIARY)
KEYWORD PK
)

What's the best way to create the DiaryCompilation?
Sounds like the best way might be to use a view instead of a table.

When deleting a record that is referenced in another table, how do I know when to stop?

I have the following table structure in my database:
create table Encargado(
ID int primary key,
Nombre varchar(300),
)
create table Area(
ID int primary key,
Nombre varchar(300),
Jefe int foreign key references Encargado(ID)
)
create table Carrera(
ID int primary key,
Nombre varchar(300),
Area int foreign key references Area(ID)
)
create table Formacion(
ID int primary key,
Grado varchar(300),
Lugar varchar(300)
)
create table Docente(
ID int primary key,
Nombre varchar(300),
Carrera int foreign key references Carrera(ID),
Formacion int foreign key references Formacion(ID),
Horario varchar(300)
)
create table Evaluacion(
ID int primary key,
Docente int foreign key references Docente(ID),
Evaluador varchar(300),
Secuencia int,
Pizarra int,
Audiovisual int,
Letra int,
Voz int,
GestosVocabulario int,
Ejemplificacion int,
Respuestas int,
DominioEscenico int,
Participacion int,
Observacion varchar(4000),
Materias varchar(3000),
Valido bit
)
create table Seguimiento(
ID int primary key,
Docente int foreign key references Docente(ID),
Modulo int,
Semestre int,
Ano int,
Fecha datetime,
Hora datetime,
OrdenSecuencia bit,
OrdenSecuenciaObservacion varchar(200),
PortafolioAlumno bit,
PortalofioAlumnoObservacion varchar(200),
AspectosParaEntrevista varchar(3000),
Conclusiones varchar(3000),
Evaluador varchar(300),
DirectorDeArea int foreign key references Encargado(ID),
EncargadoControl int foreign key references Encargado(ID),
)
Say I want to delete an Area, how would I do this? I would need to also delete all Carreras and also all the Docentes.
public void Delete(Area area)
{
db.Carreras.DeleteAllOnSubmit(area.Carreras);
//I'm stuck here. Is this what I should be doing?
}
Can someone suggest how to handle this?
I'm using C# and Linq-to-SQL. I feel I may have dug myself into a hole by using this table structure or perhaps that's one of the downfalls of a relational database? :\

I wouldn't handle this on the Linq-to-SQL side, I'd use cascading deletes on the database side if you truly want to delete all the child records.
For example, with Oracle you can add a "ON DELETE CASCADE" clause to your create table statements, refer to this link.
The cascading delete will handle deleting all the records from the child tables with a single delete operation. The beauty of this approach is that no matter where you perform the operation, whether via Linq-to-SQL, Java, RoR, PHP, etc., the logic is centralized in the DB, so it works the same way no matter who does the delete.

It should depend on how you want to handle your foreign key relationship. i.e. deleting foreign references or leaving them in case they have other entries dependent on them, etc.
See referential integrity http://msdn.microsoft.com/en-us/library/ms186973.aspx
So, in the end you should probably let the DB handle it.

Say I want to delete an Area, how would I do this? I would need to also delete all Carreras and also all the Docentes.
On the face of it, it seems you want to change the declarative referential integrity (DRI) action for ON DELETE from the default NO ACTION (i.e. prevent the referenced row from being deleted) to CASCADE (i.e. also delete the rows in the referencing table).
Note that such logic usually suggests that the referencing column (e.g. Carrera.Area) should be defined as NOT NULL.
For example:
CREATE TABLE Carrera
(
...
Area INTEGER NOT NULL
REFERENCES Area (ID)
ON DELETE CASCADE
);
CREATE TABLE Docente
(
...
Carrera INTEGER NOT NULL
REFERENCES Carrera (ID)
ON DELETE CASCADE,
...
);
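The two CASCADE declarations above are sketched below on a trimmed schema. SQLite is a stand-in for SQL Server. Only the key columns are kept, and the sample rows are made up. One DELETE on Area removes its Carreras and, through them, their Docentes. Evaluacion and Seguimiento are left out here because, as the rest of this answer explains, they need their own decision.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Area (ID INTEGER PRIMARY KEY, Nombre VARCHAR(300));
    CREATE TABLE Carrera (
        ID     INTEGER PRIMARY KEY,
        Nombre VARCHAR(300),
        Area   INTEGER NOT NULL REFERENCES Area(ID) ON DELETE CASCADE
    );
    CREATE TABLE Docente (
        ID      INTEGER PRIMARY KEY,
        Nombre  VARCHAR(300),
        Carrera INTEGER NOT NULL REFERENCES Carrera(ID) ON DELETE CASCADE
    );
    INSERT INTO Area VALUES (1, 'Ingenieria'), (2, 'Medicina');
    INSERT INTO Carrera VALUES (10, 'Sistemas', 1), (20, 'Enfermeria', 2);
    INSERT INTO Docente VALUES (100, 'Ana', 10), (200, 'Luis', 20);
""")

with conn:
    conn.execute("DELETE FROM Area WHERE ID = 1")  # one statement removes the whole branch

ids = {t: [r[0] for r in conn.execute(f"SELECT ID FROM {t} ORDER BY ID")]
       for t in ("Area", "Carrera", "Docente")}
print(ids)  # {'Area': [2], 'Carrera': [20], 'Docente': [200]}
```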
However, looking deeper we see that
Evaluacion REFERENCES Docente
Seguimiento REFERENCES Docente
You need to consider whether these too require the ON DELETE CASCADE DRI action.
Furthermore:
Seguimiento REFERENCES Encargado -- twice
Area REFERENCES Encargado
In other words, you have a potential cycle here. Even if your DBMS would allow ON DELETE CASCADE DRI actions on all these (SQL Server, for example, would not) you should consider managing the logic by 'manually' removing rows.
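Removing the rows manually is sketched below. The schema is trimmed to the key columns of the question's tables, with SQLite as a stand-in and the default NO ACTION foreign keys. The `delete_area` helper is hypothetical. It deletes from the deepest referencing tables upward inside one transaction. It also shows where to stop: Encargado is a parent of Area and Seguimiento, not a child, so deleting an Area never requires touching it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Encargado (ID INTEGER PRIMARY KEY);
    CREATE TABLE Area (ID INTEGER PRIMARY KEY, Jefe INTEGER REFERENCES Encargado(ID));
    CREATE TABLE Carrera (ID INTEGER PRIMARY KEY, Area INTEGER REFERENCES Area(ID));
    CREATE TABLE Docente (ID INTEGER PRIMARY KEY, Carrera INTEGER REFERENCES Carrera(ID));
    CREATE TABLE Evaluacion (ID INTEGER PRIMARY KEY, Docente INTEGER REFERENCES Docente(ID));
    CREATE TABLE Seguimiento (
        ID               INTEGER PRIMARY KEY,
        Docente          INTEGER REFERENCES Docente(ID),
        DirectorDeArea   INTEGER REFERENCES Encargado(ID),
        EncargadoControl INTEGER REFERENCES Encargado(ID)
    );
    INSERT INTO Encargado VALUES (1);
    INSERT INTO Area VALUES (1, 1), (2, 1);
    INSERT INTO Carrera VALUES (10, 1), (20, 2);
    INSERT INTO Docente VALUES (100, 10), (200, 20);
    INSERT INTO Evaluacion VALUES (1000, 100);
    INSERT INTO Seguimiento VALUES (5000, 100, 1, 1);
""")

# Constant subquery (not user input), so formatting it into the SQL is safe.
DOCENTES_OF_AREA = """SELECT d.ID FROM Docente d
                      JOIN Carrera c ON c.ID = d.Carrera
                      WHERE c.Area = ?"""

def delete_area(conn, area_id):
    """Delete an Area and everything that depends on it, deepest tables first."""
    with conn:  # all-or-nothing
        conn.execute(f"DELETE FROM Evaluacion WHERE Docente IN ({DOCENTES_OF_AREA})", (area_id,))
        conn.execute(f"DELETE FROM Seguimiento WHERE Docente IN ({DOCENTES_OF_AREA})", (area_id,))
        conn.execute("DELETE FROM Docente WHERE Carrera IN "
                     "(SELECT ID FROM Carrera WHERE Area = ?)", (area_id,))
        conn.execute("DELETE FROM Carrera WHERE Area = ?", (area_id,))
        conn.execute("DELETE FROM Area WHERE ID = ?", (area_id,))
        # Stop here: Encargado is referenced BY Area, so it stays.

delete_area(conn, 1)
tables = ("Encargado", "Area", "Carrera", "Docente", "Evaluacion", "Seguimiento")
left = {t: [r[0] for r in conn.execute(f"SELECT ID FROM {t} ORDER BY ID")] for t in tables}
print(left)
```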
Something else to consider: seeing all those seemingly NULLable columns (but how can your primary key columns be nullable...?), you could consider the ON DELETE SET NULL DRI action. Personally, I would clarify the design by removing the NULLable columns and creating new relationship tables, but that could involve a lot of work :)

Have you considered a "logical delete" instead of a physical delete?
Logical deletes make sense when you want to keep historical access to data (for reports or queries) even after they have become obsolete.
Example: your school used to teach Latin and had a number of professors teaching it and a number of students enrolled.
Next year, Latin is removed from the available courses. One of the professors retires; the others go on to teach other courses. Students still need to prove they got a grade in Latin, even though it will not be part of future offerings.
Solution: add a boolean flag to the Course table (Active=Y/N) and adapt your program so that it excludes Courses (or professors, or anything else) having "Active=N" from queries that must return what is "live", and keep them in for historical reports.
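The logical delete is sketched below, with SQLite as a stand-in. `Active` is modelled as a 'Y'/'N' column to match the answer. The `ActiveCourse` view and the `retire_course` helper are assumptions added for illustration, not part of the answer. "Live" queries read the view, while historical reports still see every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Course (
        ID     INTEGER PRIMARY KEY,
        Name   VARCHAR(100) NOT NULL,
        Active CHAR(1) NOT NULL DEFAULT 'Y' CHECK (Active IN ('Y', 'N'))
    );
    -- "Live" queries go through this view; historical reports read Course directly.
    CREATE VIEW ActiveCourse AS SELECT ID, Name FROM Course WHERE Active = 'Y';
    INSERT INTO Course (ID, Name) VALUES (1, 'Latin'), (2, 'Mathematics');
""")

def retire_course(conn, course_id):
    """Logical delete: flag the row instead of removing it."""
    with conn:
        conn.execute("UPDATE Course SET Active = 'N' WHERE ID = ?", (course_id,))

retire_course(conn, 1)
live = [r[0] for r in conn.execute("SELECT Name FROM ActiveCourse ORDER BY ID")]
history = [r[0] for r in conn.execute("SELECT Name FROM Course ORDER BY ID")]
print(live)     # ['Mathematics']
print(history)  # ['Latin', 'Mathematics']
```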
