I have a table that looks like the following:
TABLE Foo
{
Guid Id [PK],
int A [FK],
int B [FK],
int C [FK],
}
And a unique constraint over A, B and C.
Now say, for example, you insert a row with a fresh PK with A = 1, B = 1, C = 1.
SubmitChanges(), all happy.
Now you edit the table.
You remove the previous entry, and insert a row with a fresh PK with A = 1, B = 1, C = 1.
SubmitChanges() BOOM! Unique key constraint SQL exception.
From what I can see, it attempts to first insert the new record, and then tries to delete the previous one. I can even understand that it may not be possible to determine the order in which this needs to happen.
But what can I do about it? Would making those 3 fields a composite PK (and removing the old one) be a better solution, or won't it even work?
For now, the 'solution' is to remove the unique constraints from the DB (but I'd rather not do so).
One option would be to create a transaction (either a connection-bound transaction, or a TransactionScope) - remove the record and SubmitChanges, add the record and SubmitChanges, then finally commit the transaction (or roll-back if you blew up).
Note that you can associate a connection-bound transaction through the data-context constructor IIRC. TransactionScope should also work, and is easier to do - but not quite as efficient.
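For illustration, here is a rough sketch of the TransactionScope variant (assuming LINQ to SQL, that 'db' is your DataContext with a Foos table for the Foo entity from the question, and a reference to System.Transactions; this is a sketch, not a definitive implementation):
using (var scope = new TransactionScope())
{
    db.Foos.DeleteOnSubmit(existingFoo);   // remove the old row
    db.SubmitChanges();                    // flush the delete first
    var replacement = new Foo { Id = Guid.NewGuid(), A = 1, B = 1, C = 1 };
    db.Foos.InsertOnSubmit(replacement);   // then insert the new row with the same A/B/C
    db.SubmitChanges();
    scope.Complete();                      // commit both steps, or roll back on exception
}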
Alternatively, write an SP that does this swap job at the database, and access that SP via the data-context.
I had the same problem. I ended up writing a wrapper class with an 'Added' and a 'Deleted' collection of entities that I maintained, as well as a 'Current' collection. The UI was bound to the current collection.
Only when I go to save do I InsertOnSubmit / DeleteOnSubmit, and I parse the 2 collections to decide which entities to do what to.
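A rough sketch of how such a wrapper might look (the class, field and DataContext names here are invented, LINQ to SQL is assumed, and deleted entities are assumed to be tracked by the same DataContext):
public class FooChangeSet
{
    public List<Foo> Current = new List<Foo>();   // bound to the UI
    public List<Foo> Added = new List<Foo>();
    public List<Foo> Deleted = new List<Foo>();
    public void Add(Foo item) { Current.Add(item); Added.Add(item); }
    public void Remove(Foo item)
    {
        Current.Remove(item);
        if (!Added.Remove(item))        // only a real delete if the row was already persisted
            Deleted.Add(item);
    }
    // Flush deletes before inserts so a re-added unique key never collides.
    public void Save(FooDataContext db)
    {
        db.Foos.DeleteAllOnSubmit(Deleted);
        db.SubmitChanges();
        db.Foos.InsertAllOnSubmit(Added);
        db.SubmitChanges();
        Deleted.Clear();
        Added.Clear();
    }
}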
So, I have a DBContext, and I am doing the following operations:
dbContext.SomeTables1.Add(object1);
dbContext.SomeTables2.AddRange(objectArray2);
dbContext.SomeTables3.AddRange(objectArray3);
dbContext.SaveChanges();
The EF doesn't insert the DB records in this order; it inserts them in a random order. To insert them in the same order, I have to do a dbContext.SaveChanges() after each addition. This is not an efficient solution; in my case, it takes 10 seconds to do all my inserts, while the random order with one save takes around 3 seconds.
N.B. I need the right order to solve a deadlock issue.
My questions are:
Is this issue resolved in EF7?
I can profile EF and determine the random order, but is there a guarantee that it will consistently be the same order, or does it change between requests? (I can adapt my other code if the answer to this question is positive.)
Is there a better way of maintaining the order than dbContext.SaveChanges() on every addition?
There is no way to specify a save order in EF6 or EF Core.
The issue is not resolved in EF Core, because it is not considered an issue.
The order will be the same if the predecessors are the same (which will rarely be the case).
When you call SaveChanges, all entities are first ordered internally in the method "ProduceDynamicCommands", then sorted again by the method "TryTopologicalSort", which loops, adding commands that have no predecessor left (if you add A and B, and A depends on B, then B will be inserted before A).
You are left to insert in batches, in the order you need.
Since it takes you 3 seconds to perform your inserts, I will assume you have thousands of entities; performing a bulk insert may improve your performance enough to bring the 10 seconds down, and then maybe even below the initial 3 seconds!
To improve your performance, you can use http://entityframework-extensions.net/ (PAID, but it supports all scenarios).
Disclaimer: I'm the owner of the Entity Framework Extensions project.
I've found a way to do it. I just thought I'd let you know:
using (var dbContextTransaction = dbContext.Database.BeginTransaction())
{
    dbContext.SomeTables1.Add(object1);
    dbContext.SaveChanges();
    dbContext.SomeTables1.Add(object2);
    dbContext.SaveChanges();
    dbContextTransaction.Commit();
}
To explicitly set the values of the Primary Keys (and hence the order of the Clustered Index) in an Identity column in EF and EF Core, you need to manually turn on IDENTITY_INSERT before calling _context.SaveChanges(), and then turn IDENTITY_INSERT off again, like so:
This example assumes EF Core.
// Add your items with Identity Primary Key field manually set
_context.SomeTables1.AddRange(yourItems);
_context.Database.OpenConnection();
try {
    _context.Database.ExecuteSqlRaw("SET IDENTITY_INSERT dbo.SomeTables1 ON");
    _context.SaveChanges();
    _context.Database.ExecuteSqlRaw("SET IDENTITY_INSERT dbo.SomeTables1 OFF");
} finally {
    _context.Database.CloseConnection();
}
I've found a very simple solution.
Just set the property for the ID (primary key) of the entity to a value that matches your desired order.
SaveChanges() first sorts by this ID, then by other properties.
The ID you assign may even already exist in the database; it only serves as a sort key, and the real, unique ID is assigned when the row is written to the database.
for (int i = 0; i < objectArray2.Count(); i++)
{
    objectArray2[i].Id = i;
}
dbContext.SomeTables2.AddRange(objectArray2);
I have a unique constraint on a Navigations table's column called Index. I have two Navigation entities and I want to swap their Index values.
When I call db.SaveChanges it throws an exception indicating that a unique constraint was violated. It seems EF is updating one value and then the other, thus violating the constraint.
Shouldn't it be updating them both in a transaction and then trying to commit once the values are sorted out and not in violation of the constraint?
Is there a way around this without using temporary values?
It is not a problem of EF but of the SQL database, because update commands are executed sequentially. Transactions have nothing to do with this - all constraints are validated per statement, not per transaction. If you want to swap unique values you need more steps, using additional dummy values to avoid this situation.
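For illustration, a rough sketch of that multi-step dummy-value swap in EF (assuming EF6+ for Database.BeginTransaction; the Navigation entity and Index column come from the question, idA/idB are illustrative variables, and -1 is assumed to be an otherwise unused value):
using (var tx = db.Database.BeginTransaction())
{
    var a = db.Navigations.Single(n => n.Id == idA);
    var b = db.Navigations.Single(n => n.Id == idB);
    int indexA = a.Index, indexB = b.Index;
    a.Index = -1;          // park A on a dummy value no row uses
    db.SaveChanges();
    b.Index = indexA;      // B takes A's old value
    db.SaveChanges();
    a.Index = indexB;      // A takes B's old value
    db.SaveChanges();
    tx.Commit();           // all three statements commit together
}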
You could run a custom SQL Query to swap the values, like this:
update Navigation
set valuecolumn =
case
when id=1 then 'value2'
when id=2 then 'value1'
end
where id in (1,2)
However, Entity Framework cannot do that, because it's outside the scope of an ORM. It just executes sequential update statements for each altered entity, like Ladislav described in his answer.
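That said, you can still send a single-statement swap like this through the context's raw-SQL facility; a rough sketch (assuming the DbContext API and the Index column from the question; idA/idB/indexA/indexB are illustrative variables):
db.Database.ExecuteSqlCommand(
    @"UPDATE Navigation
      SET [Index] = CASE Id WHEN @p0 THEN @p1 WHEN @p2 THEN @p3 END
      WHERE Id IN (@p0, @p2)",
    idA, indexB, idB, indexA);
Because the whole swap is one statement, the unique constraint is only checked once, after both rows have changed.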
Another possibility would be to drop the UNIQUE constraint in your database and rely on the application to properly enforce this constraint. In this case, the EF could save the changes just fine, but depending on your scenario, it may not be possible.
There are a few approaches. Some of them are covered in other answers and comments but for completeness, I will list them out here (note that this is just a list that I brainstormed and it might not be all that 'complete').
Perform all of the updates in a single command. See W0lf's answer for an example of this.
Do two sets of updates - one to move all of the values to the negative of their intended value and then a second to flip them from negative to positive (see the sketch after this list). This works on the assumption that negative values are not prevented by other constraints and that they are not values that records outside this transient state will have.
Add an extra column - IsUpdating for example - set it to true in the first set of updates where the values are changed and then set it back to false in a second set of updates. Swap the unique constraint for a filtered, unique index which ignores records where IsUpdating is true.
Remove the constraint and deal with duplicate values.
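For the second approach above, here is a rough EF sketch (assuming EF6+ for Database.BeginTransaction, a signed int Index column, unused negative values, and the Navigation names from the question):
using (var tx = db.Database.BeginTransaction())
{
    var a = db.Navigations.Single(n => n.Id == idA);
    var b = db.Navigations.Single(n => n.Id == idB);
    int targetA = b.Index, targetB = a.Index;   // the values we ultimately want
    // Pass 1: move each row onto the negative of its intended value.
    a.Index = -targetA;
    b.Index = -targetB;
    db.SaveChanges();
    // Pass 2: flip the signs to land on the final, swapped values.
    a.Index = targetA;
    b.Index = targetB;
    db.SaveChanges();
    tx.Commit();
}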
The MSDN claims that the order is:
Child table: delete records.
Parent table: insert, update, and delete records.
Child table: insert and update records.
I have a problem with that.
Example: ParentTable has two records, parent1 (Id: 1) and parent2 (Id: 2).
ChildTable has a record child1 (Id: 1, ParentId: 1).
If we update child1 to have a new parent, parent2, and then we delete parent1:
We have nothing to delete in the child table.
We delete parent1: we break the constraint, because the child is still attached to parent1, unless we update it first.
So what is the right order, and is the MSDN wrong on this subject?
My personal thoughts are:
Child table: delete records.
Parent table: insert, update records.
Child table: insert and update records.
Parent table: delete records.
But the problem is, with potential unique constraints, we must always delete the records in a table before adding the new ones... So I have no solution right now for committing my data to my database.
Edit: thanks for the answers, but your corner case is my daily case... For now I have opted for the ugly solution of disabling the constraints, updating the database, and re-enabling the constraints. I'm still searching for a better solution.
Doesn't your SQL product support deferred constraint checking?
If not, you could try
Delete all child records - delete all parent records - insert all parent records - insert all child records
where any UPDATEs have been split into their constituent DELETEs and INSERTs.
This should work correctly in all cases, but at acceptable speeds probably in none ...
It is also provable that this is the only scheme that can work correctly in all cases, since :
(a) key constraints on parent dictate that parent DELETES must precede parent INSERTS,
(b) key constraints on child dictate that child DELETES must precede child INSERTS,
(c) FK dictates that child DELETES must precede parent DELETES
(d) FK also dictates that child INSERTS must follow parent INSERTS
The given sequence is the only possible one that satisfies these 4 requirements, and it also shows that UPDATEs to the child make a solution impossible no matter what, since an UPDATE means a "simultaneous" DELETE plus INSERT.
You have to take their context into account. MS said
When updating related tables in a dataset, it is important to update in the proper sequence to reduce the chance of violating referential integrity constraints.
in the context of writing client data application software.
Why is it important to reduce the chance of violating referential integrity constraints? Because violating those constraints means
more round trips between the dbms and the client, either for the client code to handle the constraint violations, or for the human user to handle the violations,
more time taken,
more load on the server,
more opportunities for human error, and
more chances for concurrent updates to change the underlying data (possibly confusing either the application code, the human user, or both).
And why do they consider their procedure the right way? Because it provides a single process that will avoid referential integrity violations in almost all the common cases, and even in a lot of the uncommon ones. For example . . .
If the update is a DELETE operation on the referenced table, and if foreign keys in the referencing tables are declared as ON DELETE CASCADE, then the optimal thing is to simply delete the referenced row (the parent row), and let the dbms manage the cascade. (This is also the optimal thing for ON DELETE SET DEFAULT, and for ON DELETE SET NULL.)
If the update is a DELETE operation on the referenced table, and if foreign keys in the referencing tables are declared as ON DELETE RESTRICT, then the optimal thing is to delete all the referencing rows (child rows) first, then delete the referenced row.
But, with proper use of transactions, MS's procedure leaves the database in a consistent state regardless. The value is that it's a single, client-side process to code and to maintain, even though it's not optimal in all cases. (That's often the case in software design--choosing a single way that's not optimal in all cases. ActiveRecord leaps to mind.)
You said
Example: ParentTable has two records, parent1 (Id: 1) and parent2 (Id: 2).
ChildTable has a record child1 (Id: 1, ParentId: 1).
If we update child1 to have a new parent, parent2, and then we delete parent1.
We have nothing to delete in the child table.
We delete parent1: we break the constraint, because the child is still attached to parent1, unless we update it first.
That's not a referential integrity issue; it's a procedural issue. This problem clearly requires two transactions.
Update the child to have a new parent, then commit. This data must be corrected regardless of what happens to the first parent. Specifically, this data must be corrected even if there are concurrent updates or other constraints that make it either temporarily or permanently impossible to delete the first parent. (This isn't a referential integrity issue, because there's no ON DELETE SET TO NEXT PARENT ID OR MAKE YOUR BEST GUESS clause in SQL foreign key constraints.)
Delete the first parent, then commit. This might require first updating any number of child rows in any number of tables. In a huge organization, I can imagine some deletes like this taking weeks to finish.
Sounds to me like:
Insert parent2. Child still points to parent1.
Update child to point to parent2. Now nothing references parent1.
Delete parent1.
You'd want to wrap it in a transaction where available.
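For example, a rough ADO.NET sketch of those three steps under one transaction, using the Ids from the example above (table and column names are illustrative; requires System.Data.SqlClient):
using (var conn = new SqlConnection(connectionString))
{
    conn.Open();
    using (var tx = conn.BeginTransaction())
    {
        var insertParent2 = new SqlCommand(
            "INSERT INTO ParentTable (Id) VALUES (2)", conn, tx);
        insertParent2.ExecuteNonQuery();      // 1. parent2 exists, child still points to parent1
        var repointChild = new SqlCommand(
            "UPDATE ChildTable SET ParentId = 2 WHERE Id = 1", conn, tx);
        repointChild.ExecuteNonQuery();       // 2. nothing references parent1 any more
        var deleteParent1 = new SqlCommand(
            "DELETE FROM ParentTable WHERE Id = 1", conn, tx);
        deleteParent1.ExecuteNonQuery();      // 3. safe to delete parent1
        tx.Commit();
    }
}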
Depending on your schema, you could also extend this to:
Update parent1 to indicate that it is locked (or lock it in the DB), thus preventing updates.
Insert parent2
Update child to point to parent2
Delete parent1
This order has the advantage that a join between the parent and child will return a consistent result throughout. When the child is updated, the results of a join will "flip" to the new state.
EDIT:
Another option is to move the parent/child references into another table, e.g. "links";
CREATE TABLE links (
link_id INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
parent_id INT NOT NULL,
child_id INT NOT NULL
);
You may well want foreign key constraints on the parent and child columns, as well as, of course, some appropriate indices. This arrangement allows for very flexible relationships between the parent and child tables - possibly too flexible, but that depends on your application. Now you can do something like;
UPDATE links
SET parent_id = @new_parent_id
WHERE parent_id = @old_parent_id
AND child_id = @child_id;
The need to DELETE a parent record without deleting the child records is unusual enough that I am certain the normally prescribed order of dataset operations defined by MS does not apply in this case.
The most efficient method would be to UPDATE the child records to reflect the new parent, then DELETE the original parent. As others have mentioned, this operation should be performed within a transaction.
I think separating actions on tables is not a good design, so my solution is:
insert/update/delete parent table
insert/update/delete child table
The key point is that you should not change the parentId of a child record; you should delete the child of parent1 and add a new child to parent2. By doing it like this you will no longer need to worry about breaking the constraint. And of course you must use a transaction.
The MSDN claim is correct on the basis of using dependencies (foreign keys). Think of the order as:
Child table (cascade delete)
Parent table: insert and/or update and/or delete records, the delete being the final step of the cascade delete.
Child table: insert or update.
Since we are talking about a cascade delete, we must guarantee that, before deleting a parent record, any child records relating to that parent are deleted. If there are no child records, there is no delete at the child level. That's all.
On the other hand, you may approach your case in different ways. I think that an (almost) real-life scenario will be more helpful. Let's assume that the parent table is the master part of orders (orderID, clientID, etc) and the child table is the details part (detailID, orderID, productOrServiceID, etc). So you get an order and you have the following
Parent table
orderID = 1 (auto increment)
...
Child table
detailID = 1 (auto increment)
orderID = 1
productOrServiceID = 342
and
detailID = 2
orderID = 1
productOrServiceID = 169
and
detailID = 3
orderID = 1
productOrServiceID = 307
So we have one order for three products/services. Now your client wants you to move the second product or service to a new order and deliver it later. You have two options to do this.
The first one (direct)
Create a new order (new parent record) that gets orderID = 2
Update child table by setting orderID = 2 where orderID = 1 and productOrServiceID = 169
As a result you will have
Parent table
orderID = 1 (auto increment)
...
and
orderID = 2
...
Child table
detailID = 1 (auto increment)
orderID = 1
productOrServiceID = 342
and
detailID = 2
orderID = 2
productOrServiceID = 169
and
detailID = 3
orderID = 1
productOrServiceID = 307
The second one (indirect)
Keep a DataRow of the second product/service from child table as a variable
Delete the relative row from child table
Create a new order (new parent record) that gets orderID = 2
Insert the kept DataRow on child table by changing the field orderID from 1 to 2
As a result you will have
Parent table
orderID = 1 (auto increment)
...
and
orderID = 2
...
Child table
detailID = 1 (auto increment)
orderID = 1
productOrServiceID = 342
and
detailID = 3
orderID = 1
productOrServiceID = 307
and
detailID = 4
orderID = 2
productOrServiceID = 169
The reason for the second option, which is by the way the preferable one for many applications, is that it keeps the detail ids in sequence for each parent record. I have seen cases of expanding the second option by recreating all detail records. I think it is quite easy to find open source solutions relating to this case and check the implementation.
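For illustration, a rough ADO.NET sketch of the indirect option (table, column and key values follow the example above; it assumes ChildTable has its PrimaryKey configured so Rows.Find works, and that orderID/detailID are auto-increment columns):
DataTable parentTable = dataSet.Tables["ParentTable"];
DataTable childTable = dataSet.Tables["ChildTable"];
// 1. Keep the detail we are about to move, then delete it from order 1.
DataRow oldDetail = childTable.Rows.Find(2);              // detailID = 2
object movedProductOrServiceId = oldDetail["productOrServiceID"];
oldDetail.Delete();
// 2. Create the new order (orderID = 2 via auto increment).
DataRow newOrder = parentTable.NewRow();
parentTable.Rows.Add(newOrder);
// 3. Re-insert the detail under the new order.
DataRow newDetail = childTable.NewRow();
newDetail["orderID"] = newOrder["orderID"];
newDetail["productOrServiceID"] = movedProductOrServiceId;
childTable.Rows.Add(newDetail);
// 4. Finally, push the changes to the database in the MSDN order
//    (child deletes, parent changes, child inserts) via your table adapters.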
Finally my personal advice is to avoid doing this kind of stuff with datasets unless your application is single user. Databases can easily handle this "problem" in a thread safe way with transactions.
We have always enjoyed the use of a sequence in Oracle databases in order to create globally-unique primary key IDs across an entire database. So much so, that we mimic the same thing when using SQL Server databases:
CREATE TABLE MainSequence(
Id int IDENTITY(1,1) CONSTRAINT pkMainSequence PRIMARY KEY
)
I'm trying to switch over to Entity Framework, which is very new to me. It seemed like it would be trivial to create an extension method that I could use to quickly get the next available globally-unique Id.
public static int GetNextId( this Entities db ) {
    var ms = new MainSequence();
    db.MainSequences.AddObject( ms );
    db.SaveChanges( SaveOptions.None );
    return ms.Id;
}
Since it's an identity column, all I should have to do is add a new object to the database and save the changes so that the Id property is updated with a real value. This works fine. But I seem to run into trouble when trying to use it for foreign-key-related tables:
var dataId = db.GetNextId();
db.Datas.AddObject( Data.CreateData( dataId, someValueForThisColumn ) );
db.Caches.AddObject(
    Cache.CreateCache( db.GetNextId(),
        DatabaseMethods.GetOrAddLocation( source.GetLocationText() ),
        DateTime.Now,
        dataId ) );
The Cache table has a foreign key to the Data table. When SaveChanges() is called immediately after this, an exception is generated.
System.Data.SqlClient.SqlException: Violation of PRIMARY KEY constraint 'pkData'. Cannot insert duplicate key in object 'dbo.Data'. The duplicate key value is (78162).
The statement has been terminated.
For some reason it appears the new data row is trying to get inserted into the database twice, though I'm not sure why that would be. I've confirmed that a different MainSequence ID is returned every time the code is run. It seems as though calling db.SaveChanges whenever a new ID is obtained is the problem, but there's no other way the Id can get populated with a real int, and I don't see why it would be a problem. What am I doing wrong?
My problem is the line db.SaveChanges( SaveOptions.None );. My reason for using this option was to avoid SaveOptions.AcceptAllChangesAfterSave, which, based on information gathered from other StackOverflow questions (either invalid or misunderstood on my part), I thought would give me a basic form of database transaction. I did read the MSDN documentation several times:
After changes are saved, the AcceptAllChangesAfterSave() method is called, which resets change tracking in the ObjectStateManager.
That to me sounds an awful lot like transaction-change-tracking. I thought that if the changes were not accepted on a save call, the database "transaction" would be rolled back in the case of an error (I later found out this wasn't the case). Instead, all this does is make the ObjectContext still consider the change to be unsaved. So upon the next save call, the "unsaved" changes would be re-applied. Ta-da, primary key exception. This also explains why my main sequence was skipping values.
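For completeness, a minimal sketch of the simplest fix implied by the above (not necessarily the only one): call SaveChanges() with its default behaviour, so the ObjectStateManager accepts the insert and never replays it; if atomicity across the whole operation is still wanted, wrap the calling code in a TransactionScope rather than deferring AcceptAllChanges:
public static int GetNextId( this Entities db ) {
    var ms = new MainSequence();
    db.MainSequences.AddObject( ms );
    db.SaveChanges();   // default options: the change is accepted, so it will not be re-inserted on the next save
    return ms.Id;
}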
I'm using an 'in-database' circular linked list (CLL). I'm inserting the database entries forming these CLLs using LINQ to SQL.
They have the general form:
id uuid | nextId uuid | current bit
If I try to do a SubmitChanges with a few objects forming a complete CLL, I get the error "A cycle was detected in the set of changes".
I can circumvent this by making the linked list 'circular' in a separate SubmitChanges, but this has two downsides: I lose the ability to do this in one transaction, and for a small period the data in my database isn't correct.
Is there a way to fix this behaviour?
The database needs to enforce its constraints, and I imagine you have a foreign key constraint between nextId and Id. If this chain of relations leads back to the start (as you have found) the database will not allow it.
I suspect your choices are:
Remove the foreign key constraint.
Store in the DB as a linked list, and only join the head with the tail in your code.
Even your second option won't work, as the DB won't allow you to add this last reference.
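For the two-step workaround mentioned in the question (submitting the open-ended list first and closing the circle afterwards), a TransactionScope can at least restore the single-transaction guarantee. A rough sketch, assuming LINQ to SQL, a reference to System.Transactions, invented Node/Nodes entity names mapped to the id/nextId/current columns, and a nullable nextId:
using (var scope = new TransactionScope())
{
    // First pass: insert the nodes without links, so no cycle exists yet.
    var head = new Node { Id = Guid.NewGuid(), Current = true };
    var tail = new Node { Id = Guid.NewGuid(), Current = false };
    db.Nodes.InsertAllOnSubmit(new[] { head, tail });
    db.SubmitChanges();
    // Second pass: wire up the links, closing the circle.
    head.NextId = tail.Id;
    tail.NextId = head.Id;
    db.SubmitChanges();
    scope.Complete();   // both passes commit together, or neither does
}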