I have written the following piece of code:
public void BulkUpdateItems(List<Items> items)
{
    var bulk = new BulkOperations();
    using (var trans = new TransactionScope())
    {
        using (SqlConnection conn = new SqlConnection(@"connstring"))
        {
            bulk.Setup()
                .ForCollection(items)
                .WithTable("Items")
                .AddColumn(x => x.QuantitySold)
                .BulkUpdate()
                .MatchTargetOn(x => x.ItemID)
                .Commit(conn);
        }
        trans.Complete();
    }
}
I am using the SqlBulkTools library. The problem is that when I run this procedure from multiple threads at the same time, I run into deadlocks, and the error states that a certain process ID was deadlocked.
Is there an alternative way to perform a bulk update of one table from multiple threads in an efficient way?
Can someone help me out?
I don't know much about that API but a quick read suggests a few things you could try. I would try them in the order listed.
Use a smaller batch size, and/or set the batch timeout higher. This will let each thread take turns.
Use a temporary table. This will allow the threads to work independently.
Set the options to use a table lock. If you lock the whole table, different threads won't be able to lock different rows in conflicting orders, so you shouldn't get any deadlocks. A sketch combining this and the temp-table suggestion follows this list.
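For illustration, here is a minimal sketch of the temp-table-plus-table-lock idea using plain SqlBulkCopy rather than SqlBulkTools (whose options I can't vouch for). The Items table and its two columns come from the question; the #ItemsStage staging table and everything else are assumptions.

using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

public void BulkUpdateItemsViaTempTable(List<Items> items)
{
    using (var conn = new SqlConnection(@"connstring"))
    {
        conn.Open();
        using (var tran = conn.BeginTransaction())
        {
            // Stage the incoming rows in a session-local temp table.
            new SqlCommand("CREATE TABLE #ItemsStage (ItemID int PRIMARY KEY, QuantitySold int)",
                conn, tran).ExecuteNonQuery();

            var table = new DataTable();
            table.Columns.Add("ItemID", typeof(int));
            table.Columns.Add("QuantitySold", typeof(int));
            foreach (var item in items)
                table.Rows.Add(item.ItemID, item.QuantitySold);

            using (var bulk = new SqlBulkCopy(conn, SqlBulkCopyOptions.Default, tran))
            {
                bulk.DestinationTableName = "#ItemsStage";
                bulk.WriteToServer(table);
            }

            // One set-based update; TABLOCKX serializes writers, trading
            // concurrency for freedom from row-level deadlocks.
            new SqlCommand(
                @"UPDATE i SET i.QuantitySold = s.QuantitySold
                  FROM Items AS i WITH (TABLOCKX)
                  JOIN #ItemsStage AS s ON i.ItemID = s.ItemID",
                conn, tran).ExecuteNonQuery();

            tran.Commit();
        }
    }
}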
The deadlock message is coming from SQL Server - it means that one of your connections is waiting on a resource locked by another, and that second connection is waiting on a resource held by the first.
If you are updating the same table from multiple threads, you are likely running into a plain SQL Server locking issue that has nothing to do with C#. You need to think through the implications of doing a bulk update on multiple threads; depending on the percentage of the table you are updating, it is probably better to do the work on a single connection and use a queue-style mechanism to de-conflict the individual calls, as in the sketch below.
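A minimal sketch of that queue idea, assuming a BlockingCollection funnels batches from the worker threads to a single consumer that owns all database writes (the class name and the delegate are hypothetical; Items is the question's type):

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public class ItemUpdateQueue
{
    private readonly BlockingCollection<List<Items>> _queue =
        new BlockingCollection<List<Items>>();

    public ItemUpdateQueue(Action<List<Items>> applyBatch)
    {
        // Single consumer: the table only ever has one writer, so the
        // worker threads can no longer deadlock against each other.
        Task.Run(() =>
        {
            foreach (var batch in _queue.GetConsumingEnumerable())
                applyBatch(batch); // e.g. the BulkUpdateItems from the question
        });
    }

    // Worker threads call this instead of hitting the database directly.
    public void Enqueue(List<Items> batch) => _queue.Add(batch);
}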
Try wrapping the database call in a lock statement (lock needs a shared object to lock on):

private static readonly object _sync = new object();

lock (_sync)
{
....
}
What this will do is: while one thread is executing the code within the curly braces, any other thread that reaches the lock will wait until the first one is finished, so only one thread executes the block at a time. Note that lock only coordinates threads within a single process; for multiple processes you would need a named mutex, as sketched below.
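A minimal sketch of the cross-process variant, assuming a system-wide named Mutex (the name is a placeholder):

using System.Threading;

using (var mutex = new Mutex(false, @"Global\ItemsBulkUpdate"))
{
    mutex.WaitOne(); // blocks until no other thread or process holds the mutex
    try
    {
        BulkUpdateItems(items); // the method from the question
    }
    finally
    {
        mutex.ReleaseMutex();
    }
}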
I need to perform an atomic operation: check the value of a field on an Entity Framework model and increase it if its value is 0.
I thought about transactions, something like:
bool controlPassed = false;
using (TransactionScope scope = new TransactionScope())
{
    var model = ...ModelEntities.FirstOrDefault(...);
    if (model.field == 0)
    {
        ++model.field;
        ...SaveChanges();
        controlPassed = true;
    }
    scope.Complete();
}
if (controlPassed)
{
    ...
    using (TransactionScope scope = new TransactionScope())
    {
        --model.field;
        ...SaveChanges();
        scope.Complete();
    }
}
Of course, everything is wrapped in try/catch and so on.
My question is: how would this work?
It is really hard to test.
I have a multithreaded application.
Is there a possibility that two or more threads would pass the control check (see that field == 0 and increase it)?
What would be blocked in the database (database, table, row, field)?
I can't let two or more threads be in the controlPassed section simultaneously.
Is there a possibility, that two or more threads would pass control
(check that field == 0 and increase it)?
You have a Serializable transaction (that is the default for TransactionScope). It means that two threads can both read field == 0, but immediately after that a deadlock happens, because the transaction for the first thread holds a shared lock on the field and the transaction for the second thread holds another shared lock on the same field. Neither transaction can upgrade its lock to exclusive to save changes, because each is blocked by the other's shared lock. I think the same would happen for the RepeatableRead isolation level.
If you change the isolation level to ReadCommitted (the default for SaveChanges without TransactionScope when using MS SQL Server), the answer is again yes, but this time without a deadlock, because EF uses plain selects without any table hints - no lock on the record is held once the select completes. Only saving changes (modification) locks records until the transaction commits or rolls back.
To lock a record with a select in a ReadCommitted transaction, you must use a native SQL query with the UPDLOCK table hint (which locks the record for update during the select and holds the lock until the transaction ends). EF queries do not support table hints.
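For illustration, a hedged sketch of what that could look like with a DbContext-style context and raw SQL; the table, column, and model names are assumptions, not the asker's actual schema:

using (var scope = new TransactionScope(TransactionScopeOption.Required,
    new TransactionOptions { IsolationLevel = IsolationLevel.ReadCommitted }))
{
    // UPDLOCK keeps the selected row locked until the transaction ends,
    // so a second thread blocks here instead of also seeing Field == 0.
    var model = context.Database
        .SqlQuery<MyModel>("SELECT TOP 1 * FROM MyTable WITH (UPDLOCK) WHERE Id = @p0", id)
        .FirstOrDefault();

    if (model != null && model.Field == 0)
    {
        context.Database.ExecuteSqlCommand(
            "UPDATE MyTable SET Field = Field + 1 WHERE Id = @p0", id);
    }

    scope.Complete();
}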
Edit: I wrote a long article about pessimistic concurrency which describes why your solution doesn't work and what must be changed to make it work correctly.
Database : SQL server 2005
Programming language : C#
I have a method that does some processing with the User object passed to it. I want to control how this method behaves when it is called by multiple threads with the same user object. I have implemented simple locking that makes use of the database. I can't use C#'s lock statement because this method is part of an API that will be deployed on different machines, but the database is centralized.
The following code shows what I have (exception handling omitted for clarity).
E.g.:
void Process(User user)
{
    using (var transaction = BeginTransaction())
    {
        if (LockUser())
        {
            try
            {
                /* Other processing code */
            }
            finally
            {
                UnLockUser();
            }
        }
    }
}
LockUser() inserts a new entry into a database table. This table has a unique constraint on the user id, so when a second thread tries to insert the same data, the constraint is violated and an exception is thrown; LockUser() catches it and returns false. UnLockUser() just deletes the entry from the lock table.
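A minimal sketch of what LockUser/UnLockUser might look like, assuming a UserLocks table and SQL Server's unique-violation error numbers 2627/2601 (all names here are placeholders, not the asker's actual code):

bool LockUser(SqlConnection conn, SqlTransaction tran, int userId)
{
    try
    {
        using (var cmd = new SqlCommand(
            "INSERT INTO UserLocks (UserId) VALUES (@id)", conn, tran))
        {
            cmd.Parameters.AddWithValue("@id", userId);
            cmd.ExecuteNonQuery();
            return true;
        }
    }
    catch (SqlException ex)
    {
        if (ex.Number == 2627 || ex.Number == 2601)
            return false; // unique constraint violated: someone else holds the lock
        throw;
    }
}

void UnLockUser(SqlConnection conn, SqlTransaction tran, int userId)
{
    using (var cmd = new SqlCommand(
        "DELETE FROM UserLocks WHERE UserId = @id", conn, tran))
    {
        cmd.Parameters.AddWithValue("@id", userId);
        cmd.ExecuteNonQuery();
    }
}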
Note: please don't consider the possibility of the lock not being deleted correctly; we have a SQL job that cleans up items that have been locked for a long time.
Question
Consider two threads executing this method at the same time, both of which have started the transaction. Since the transaction is committed only after all the processing logic, will the transaction started on thread 2 see the data inserted by thread 1 into the lock table?
Is this locking logic OK? Do you see any problems with this approach?
If the acquisition of the lock - by virtue of inserting an entry into the database table - is part of the same transaction, then either all or none of the changes of that transaction will become visible to the second thread. This is true for the default isolation level (ReadCommitted).
In other words: Whichever thread has a successful commit of that single transaction has also successfully acquired the lock (= inserted successfully the entry into the database).
Your code example is missing the handling of Commit()/Rollback(); make sure you consider this as part of your implementation.
It depends on the transaction isolation level that you use.
The default isolation (ReadCommitted) level assures that other connections cannot see the uncommitted changes that a connection is making.
When executing your SQL statement, you can explicitly acquire a lock by using locking hints.
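For example (a sketch only; the lock-table name follows the question's description and the rest is assumed), an UPDLOCK hint on an existence check makes a competing transaction block on the row instead of reading past it:

using (var cmd = new SqlCommand(
    "SELECT COUNT(*) FROM UserLocks WITH (UPDLOCK, ROWLOCK) WHERE UserId = @id",
    conn, tran))
{
    cmd.Parameters.AddWithValue("@id", user.Id);
    bool alreadyLocked = (int)cmd.ExecuteScalar() > 0;
}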
I have a GUI C# application that has a single button Start/Stop.
Originally the GUI created a single instance of a class that queries a database, getting a single "task" at a time from the database and performing some actions if there are results.
I was then asked to try to utilize all the computing power on some of the 8-core systems. Using the number of processors, I figure I can create that number of instances of my class, run them all, and come pretty close to using a fair amount of the computing power.
Environment.ProcessorCount;
Using this value, in the GUI form I have been trying to loop ProcessorCount times, each iteration starting a new thread that calls a "doWork" type method on the class, then sleeping for 1 second (to ensure the initial query gets through) before proceeding to the next iteration.
I kept having issues with this, however, because it seemed to wait until the loop completed before starting the queries, leading to a collision of some sort (workers getting the same value from the MySQL database).
Once the main form starts the "workers", it changes the button text to STOP; if the button is hit again, it should execute a "stopWork" method on each "worker".
Does what I am trying to accomplish make sense? Is there a better way to do this (that doesn't involve restructuring the worker class)?
Restructure your design so you have one thread running in the background checking your database for work to do.
When it finds work to do, spawn a new thread for each work item.
Don't forget to use synchronization tools, such as semaphores and mutexes, for the key limited resources. Fine tuning the synchronization is worth your time.
You could also experiment with the maximum number of worker threads - my guess is that the sweet spot is a few over your current number of processors. A rough sketch of this shape follows.
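A rough sketch of the poller/dispatcher shape described above, with a Semaphore capping the number of concurrent workers (the stop flag and all method names are placeholders):

using System;
using System.Threading;

void PollLoop()
{
    int maxWorkers = Environment.ProcessorCount + 2;
    var workerSlots = new Semaphore(maxWorkers, maxWorkers);

    while (!stopRequested) // hypothetical flag set by the STOP button
    {
        var work = TryGetWorkFromDatabase(); // hypothetical; returns null when idle
        if (work == null) { Thread.Sleep(500); continue; } // simple idle backoff

        workerSlots.WaitOne(); // cap the number of concurrent workers
        ThreadPool.QueueUserWorkItem(_ =>
        {
            try { ProcessWorkItem(work); } // hypothetical
            finally { workerSlots.Release(); }
        });
    }
}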
While an exhaustive answer on the best practices of multithreaded development is a little beyond what I can write here, a couple of things:
Don't use Sleep() to wait for something to continue unless ABSOLUTELY necessary. If you have another code process that you need to wait for completion, you can either Join() that thread or use either a ManualResetEvent or AutoResetEvent. There is a lot of information on MSDN about their usage. Take some time to read over it.
You can't really guarantee that your threads will each run on their own core. While it's entirely likely that the OS thread scheduler will do this, just be aware that it isn't guaranteed.
I would assume that the easiest way to increase your use of the processors would be to simply spawn the worker methods on threads from the ThreadPool (by calling ThreadPool.QueueUserWorkItem). If you do this in a loop, the runtime will pick up threads from the thread pool and run the worker threads in parallel.
ThreadPool.QueueUserWorkItem(state => DoWork());
Never use Sleep for thread synchronization.
Your question doesn't supply enough detail, but you might want to use a ManualResetEvent to make the workers wait for the initial query, along these lines.
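A minimal sketch, assuming one designated worker runs the initial query and signals the rest (the query and work methods are placeholders):

using System.Threading;

class Workers
{
    // Signaled once the initial query has completed.
    private static readonly ManualResetEvent InitialQueryDone =
        new ManualResetEvent(false);

    public static void Worker(bool runsInitialQuery)
    {
        if (runsInitialQuery)
        {
            RunInitialQuery();      // hypothetical
            InitialQueryDone.Set(); // wake every worker waiting below
        }
        else
        {
            InitialQueryDone.WaitOne(); // block until the initial query completes
        }
        DoWork(); // hypothetical
    }
}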
Yes, it makes sense what you are trying to do.
It would make sense to make 8 workers, each consuming tasks from a queue. You should take care to synchronize threads properly, if they need to access shared state. From your description of your problem, it sounds like you are having a thread synchronization problem.
You should remember, that you can only update the GUI from the GUI thread. That might also be the source of your problems.
There is really no way to tell, what exactly the problem is, without more information or a code example.
I suspect you have a problem like this: you need to copy the loop variable (task) into currenttask, otherwise the threads all share the same variable.
// main thread
var tasks = db.GetTasks();
foreach (var task in tasks)
{
    var currenttask = task; // copy the loop variable before capturing it
    ThreadPool.QueueUserWorkItem(state => DoTask(currenttask));
    // or: new Thread(() => DoTask(currenttask)).Start();
    // ThreadPool.QueueUserWorkItem(state => DoTask(task)); // this doesn't work!
}
Note that you shouldn't Thread.Sleep() on the main thread to wait for the worker threads to finish. If you are using the thread pool, you can continue to queue work items; if you want to wait for the executing tasks to finish, you should use something like an AutoResetEvent (or a CountdownEvent, sketched below) to wait for the threads to finish.
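One way to do that wait, as a sketch using .NET 4's CountdownEvent rather than one AutoResetEvent per thread (tasks is assumed to be a list):

using (var done = new CountdownEvent(tasks.Count))
{
    foreach (var task in tasks)
    {
        var currenttask = task;
        ThreadPool.QueueUserWorkItem(_ =>
        {
            try { DoTask(currenttask); }
            finally { done.Signal(); } // count down even if DoTask throws
        });
    }
    done.Wait(); // blocks until every queued DoTask has signalled
}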
You seem to be encountering a common issue with multithreaded programming. It's called a Race Condition, and you'd do well to do some research on this and other multithreading issues before proceeding too far. It's very easy to quickly mess up all your data.
The short of it is that you must ensure all your commands to your database (e.g. getting an available task) are performed within the scope of a single transaction.
I don't know MySQL well enough to give a complete answer; however, a very basic example in T-SQL might look like this:
BEGIN TRAN
DECLARE @taskid int
SELECT @taskid = taskid FROM tasks WHERE assigned = 0
UPDATE tasks SET assigned = 1 WHERE taskid = @taskid
SELECT * FROM tasks WHERE taskid = @taskid
COMMIT TRAN
MySQL 5 and above has support for transactions too.
You could also put a lock around the "fetch task from DB" code, so only one thread queries the database at a time - but obviously this decreases the performance gain somewhat.
Some code of what you're doing (and maybe some SQL, this really depends) would be a huge help.
However, assuming you're fetching tasks from the DB and each task requires some processing time in C#, you likely want something like this:
object myLock;

void StartWorking()
{
    myLock = new object(); // only new it once; could be done in your constructor too
    for (int i = 0; i < Environment.ProcessorCount; i++)
    {
        ThreadPool.QueueUserWorkItem(DoWork);
    }
}

void DoWork(object state)
{
    object task;
    lock (myLock)
    {
        task = GetTaskFromDB(); // only one thread talks to the DB at a time
    }
    PerformTask(task);
}
There are some good ideas posted above. One of the things we ran into is that we wanted not just a multi-processor capable application but a multi-server capable one as well. Depending on your application, a similar approach may fit: we use a queue wrapped in a lock on a common web server (causing other callers to block) while we fetch the next item to be processed.
In our case, we are processing lots of data. To keep things simple, we lock an object, get the id of the next unprocessed item, flag it as being processed, unlock the object, hand the record id back to the main thread on the calling server, and then it gets processed. This works well for us since the time it takes to lock, get, update, and release is very small, and while blocking does occur, we never run into a deadlock while waiting for resources (because we use lock(object) { } with a tight try/catch inside to ensure errors are handled gracefully).
As mentioned elsewhere, all of this is handled on the primary thread. Given the information to be processed, we push it to a new thread (which, for us, retrieves about 100 MB of data and processes it per call). This approach has allowed us to scale beyond a single server. In the past we had to throw high-end hardware at the problem; now we can throw several cheaper but still very capable servers at it. We can also spread the work across our virtualization farm during low-utilization periods.
One other thing I failed to mention: we also use locking mutexes inside our stored proc, so if two apps on two servers call it at the same time, it's handled gracefully. So the concept above applies both to our app and to the database. Our client's backend is the MySQL 5.1 series, and it is done with just a few lines.
One of the things I think people forget when developing is that you want to get in and out of the lock relatively quickly. If you need to return large chunks of data, I personally wouldn't do it inside the lock unless I really had to; otherwise, you can't really do much multithreading if everyone is waiting to get data.
Okay, I found my MySQL code for doing just what you need.
DELIMITER //
CREATE PROCEDURE getnextid(
      I_service_entity_id INT(11)
    , OUT O_tag VARCHAR(36)
)
BEGIN
    DECLARE L_tag VARCHAR(36) DEFAULT '00000000-0000-0000-0000-000000000000';
    DECLARE L_locked INT DEFAULT 0;
    DECLARE C_next CURSOR FOR
        SELECT tag FROM workitems
        WHERE status IN (0)
          AND processable_date <= DATE_ADD(NOW(), INTERVAL 5 MINUTE);
    -- If the cursor finds no row, return the all-zero GUID and release the lock.
    DECLARE EXIT HANDLER FOR NOT FOUND
    BEGIN
        SET O_tag := '00000000-0000-0000-0000-000000000000';
        DO RELEASE_LOCK('myuniquelockis');
    END;

    SELECT COALESCE(GET_LOCK('myuniquelockis', 20), 0) INTO L_locked;
    IF L_locked > 0 THEN
        OPEN C_next;
        FETCH C_next INTO O_tag;
        IF O_tag <> '00000000-0000-0000-0000-000000000000' THEN
            UPDATE workitems SET
                  status = 1
                , service_entity_id = I_service_entity_id
                , date_locked = NOW()
            WHERE tag = O_tag;
        END IF;
        CLOSE C_next;
        DO RELEASE_LOCK('myuniquelockis');
    ELSE
        -- Could not acquire the lock; return the all-zero GUID.
        SET O_tag := L_tag;
    END IF;
END
//
DELIMITER ;
In our case, we return a GUID to C# as an OUT parameter. You could replace the SET at the end with SELECT L_tag; and be done with it, losing the OUT parameter, but we call this from another wrapper...
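For completeness, a hedged sketch of calling this procedure from C# with the MySQL Connector/NET client and reading the OUT parameter (the connection string and the service entity id are placeholders):

using System.Data;
using MySql.Data.MySqlClient;

using (var conn = new MySqlConnection("connstring"))
using (var cmd = new MySqlCommand("getnextid", conn))
{
    cmd.CommandType = CommandType.StoredProcedure;
    cmd.Parameters.AddWithValue("I_service_entity_id", 42);
    var outTag = new MySqlParameter("O_tag", MySqlDbType.VarChar, 36)
    {
        Direction = ParameterDirection.Output
    };
    cmd.Parameters.Add(outTag);

    conn.Open();
    cmd.ExecuteNonQuery();
    string tag = (string)outTag.Value; // the all-zero GUID means "no work available"
}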
Hope this helps.
I'm having a major performance issue with LINQ2SQL and transactions. My code does the following using IDE generated LINQ2SQL code:
Run a stored proc checking for an existing record
Create the record if it doesn't exist
Run a stored proc that wraps its own code in a transaction
When I run the code with no transaction scope, I get 20 iterations per second. As soon as I wrap the code in a transaction scope, it drops to 3-4 iterations per second. I don't understand why adding a transaction at the top level reduces performance by so much. Please help?
Pseudo stored proc with transaction:
begin transaction
update some_table_1;
insert into some_table_2;
commit transaction;
select some, return, values
Pseudo LINQ code without transaction:
var db = new SomeDbContext();
var exists = db.RecordExists(some arguments);
if (!exists)
{
    var record = new SomeRecord
    {
        // Assign property values
    };
    db.RecordsTable.InsertOnSubmit(record);
    db.SubmitChanges();
    var result = db.SomeStoredProcWithTransactions();
}
Pseudo LINQ code with transaction:
var db = new SomeDbContext();
var exists = db.RecordExists(some arguments);
if (!exists)
{
    using (var ts = new TransactionScope())
    {
        var record = new SomeRecord
        {
            // Assign property values
        };
        db.RecordsTable.InsertOnSubmit(record);
        db.SubmitChanges();
        var result = db.SomeStoredProcWithTransactions();
        ts.Complete();
    }
}
I know the transaction isn't being escalated to the DTC because I've disabled the DTC. SQL Profiler shows that several of the queries take much longer with the TransactionScope enabled, but I'm not sure why. The queries involved are very short-lived, and I have verified that my indexes are being used. I'm unable to determine why the addition of a parent transaction causes so much degradation in performance.
Any ideas?
EDIT:
I've traced the problem to the following query within the final stored procedure:
if exists
(
    select * from entries where
        ProfileID = @ProfileID and
        Created >= @PeriodStart and
        Created < @PeriodEnd
) set @Exists = 1;
If I add with(nolock) as shown below, the problem disappears.
if exists
(
    select * from entries with(nolock) where
        ProfileID = @ProfileID and
        Created >= @PeriodStart and
        Created < @PeriodEnd
) set @Exists = 1;
However, I'm concerned that doing so may cause problems down the road. Any advice?
One big thing changes as soon as you get a transaction: the isolation level. Is your database under heavy contention? If so: by default a TransactionScope uses the highest, "serializable", isolation level, which involves read locks, key-range locks, etc. If it can't acquire those locks immediately, it will slow down while it is blocked. You could investigate by reducing the isolation level of the transaction (via the constructor). For example (but pick your own isolation level):
using (var tran = new TransactionScope(TransactionScopeOption.Required,
    new TransactionOptions { IsolationLevel = IsolationLevel.Snapshot }))
{
    // code
    tran.Complete();
}
However, picking an isolation level is... tricky; serializable is the safest (hence the default). You can also use granular hints (but not via LINQ-to-SQL) such as NOLOCK and UPDLOCK to help control locking of specific tables.
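For instance, a sketch of running the hinted query through the same DataContext with ExecuteQuery, since LINQ-to-SQL itself cannot emit table hints (the table and parameter names follow the question's edit; the local variables are assumed):

// db is the DataContext from the question; Single() requires using System.Linq.
var exists = db.ExecuteQuery<int>(
    @"SELECT COUNT(*) FROM entries WITH (NOLOCK)
      WHERE ProfileID = {0} AND Created >= {1} AND Created < {2}",
    profileId, periodStart, periodEnd).Single() > 0;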
You could also investigate whether the slowdown is due to trying to talk to DTC. Enable DTC and see if it speeds up. The LTM is good, but I've seen composite operations to a single database escalate to DTC before...
Although you are using a single DataContext, your code sample is likely to use more than one connection, and that will escalate your transaction to a distributed transaction.
Try initializing your DataContext with an explicit db connection, or call db.Connection.Open() right after creating the DataContext. That removes the overhead of distributed transactions...
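A sketch of the second option, applied to the code from the question:

var db = new SomeDbContext();
db.Connection.Open(); // open the single connection up front so every statement
                      // in the scope reuses it and the transaction stays local
using (var ts = new TransactionScope())
{
    // ... same insert + stored-proc code as above ...
    ts.Complete();
}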
Does the Stored Procedure you call participate in the ambient (parent) transaction? - that is the question.
It's likely that the Stored Procedure participates in the ambient transaction, which is causing the degradation. There's an MSDN article here discussing how they interrelate.
From the article:
"When a TransactionScope object joins an existing ambient transaction, disposing of the scope object may not end the transaction, unless the scope aborts the transaction. If the ambient transaction was created by a root scope, only when the root scope is disposed of, does Commit get called on the transaction. If the transaction was created manually, the transaction ends when it is either aborted, or committed by its creator."
There's also a serious-looking document on nested transactions, located on MSDN here, which looks directly applicable.
Note:
"If TransProc is called when a transaction is active, the nested transaction in TransProc is largely ignored, and its INSERT statements are committed or rolled back based on the final action taken for the outer transaction."
I think that explains the difference in performance - it's essentially the cost of maintaining the parent transaction. Kristofer's suggestion may help to reduce the overhead.