Entity Framework timeout error due to database block - C#

I have a project that uses Entity Framework (v1 with .NET 3.5). It's been in use for a few years, but it's now being used by more people, and we've started getting timeout errors that I have tracked down to a few things. For simplicity's sake, let's say my database has three tables: product, part, and product_part. There are ~1,400 parts and a handful of products.
The user has the ability to add any number of parts to a product. My problem is that when many parts are added to a product, the inserts take a long time. I think it's mostly due to network traffic/delay, but inserting all 1,400 takes around a minute. If someone tries to view the details of a part while those records are being inserted, I get a timeout and can see a block in SQL Server's Activity Monitor.
What can I do to avoid this? My apologies if this has been asked before and I missed it.
Thanks,
Nick

I think the root problem is that your write transaction takes so long. EF is not good at executing mass DML: it executes each insert as a separate statement in a separate network round trip.
If you want to insert 1,400 rows and performance matters, do the insert in one single statement using a TVP (INSERT ... SELECT * FROM @tvp). Or switch to bulk copy, but I don't think that will be advantageous at only 1,400 rows.
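For example, here is a minimal sketch of a TVP-based insert from C# (requires SQL Server 2008 or later; the table type dbo.ProductPartList and the product_part column names are assumptions to adjust for your schema):
// Assumes a user-defined table type created once on the server, e.g.:
//   CREATE TYPE dbo.ProductPartList AS TABLE (ProductId INT, PartId INT);
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

static void InsertProductParts(string connectionString, int productId, IEnumerable<int> partIds)
{
    var rows = new DataTable();
    rows.Columns.Add("ProductId", typeof(int));
    rows.Columns.Add("PartId", typeof(int));
    foreach (int partId in partIds)
        rows.Rows.Add(productId, partId);

    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(
        "INSERT INTO product_part (product_id, part_id) " +
        "SELECT ProductId, PartId FROM @parts", conn))
    {
        SqlParameter p = cmd.Parameters.AddWithValue("@parts", rows);
        p.SqlDbType = SqlDbType.Structured;
        p.TypeName = "dbo.ProductPartList";
        conn.Open();
        cmd.ExecuteNonQuery();   // one statement, one network round trip for all rows
    }
}
The write transaction then lasts for one round trip instead of 1,400, so the window in which readers can be blocked shrinks dramatically.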
If your read transactions are getting blocked, and this is a problem, switch on snapshot isolation. That takes care of the readers completely, because readers never block under snapshot isolation.
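A sketch of what that looks like for a reader, assuming the database option has been enabled (MyEntities, Part, and part_id are placeholders for your EF v1 model). If you instead enable READ_COMMITTED_SNAPSHOT on the database, existing read queries stop blocking without any code changes at all.
// One-time setup on the database (run once, outside the app):
//   ALTER DATABASE MyDb SET ALLOW_SNAPSHOT_ISOLATION ON;
using System.Linq;
using System.Transactions;

static Part GetPart(int partId)
{
    var options = new TransactionOptions { IsolationLevel = IsolationLevel.Snapshot };
    using (var scope = new TransactionScope(TransactionScopeOption.Required, options))
    using (var context = new MyEntities())   // placeholder ObjectContext name
    {
        // Reads the last committed version of the row, so it is never blocked
        // by the long-running insert transaction.
        Part part = context.Part.First(p => p.part_id == partId);
        scope.Complete();
        return part;
    }
}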

Related

Entity 6 + PostgreSQL - horrible performance

I'm using Entity Framework 6 with a PostgreSQL database (via the Npgsql connector). Everything works fine except for the poor performance of this setup. When I try to insert a not-so-large number of objects into the database (about 20k records), it takes much more time than it should. As this is my first time using Entity Framework, I was rather confused about why inserting 20k records into a database on my local machine would take more than a minute.
In order to optimize the inserts I followed every tip I found. I tried setting AutoDetectChangesEnabled to false, calling SaveChanges() every 100 or 1000 records, re-creating the database context object, and using DbContextTransaction objects (by calling dbContext.Database.BeginTransaction() and committing the transaction at the end of the operation or every 100/1000 records). Nothing improved insert performance even a little bit.
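To make that concrete, the batched variant was roughly shaped like this (a simplified sketch; MyDbContext and MyEntity stand in for the real types):
using System.Collections.Generic;

static void BulkInsert(IList<MyEntity> items)
{
    var context = new MyDbContext();
    context.Configuration.AutoDetectChangesEnabled = false;
    context.Configuration.ValidateOnSaveEnabled = false;

    for (int i = 0; i < items.Count; i++)
    {
        context.Set<MyEntity>().Add(items[i]);

        if (i > 0 && i % 1000 == 0)
        {
            // Flush, then start over with a fresh context to keep the change tracker small.
            context.SaveChanges();
            context.Dispose();
            context = new MyDbContext();
            context.Configuration.AutoDetectChangesEnabled = false;
            context.Configuration.ValidateOnSaveEnabled = false;
        }
    }

    context.SaveChanges();
    context.Dispose();
}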
By logging the SQL queries generated by Entity Framework, I was finally able to discover that no matter what I do, every object is inserted separately and every insert takes 2-4 ms. Without re-creating DB context objects and without transactions, there is just one commit after over 20k inserts. When I use transactions and commit every few records, there are more commits and new transaction creations (same when I re-create the DB context object, just with the connection being re-established as well). If I use transactions and commit them every few records, I should notice a performance boost, no? But in the end there is no difference in performance, no matter whether I use multiple transactions or not. I know transactions won't improve performance drastically, but they should help at least a little bit. Instead, every insert still takes at least 2 ms to execute against my local DB.
The database on my local machine is one thing, but creating 20k objects in a remote database takes much, much, MUCH longer than one minute: the logs indicate that a single insert can take as much as 30 ms (!), with transactions being committed and created again every 100 or 1000 records. On the other hand, if I execute a single insert manually (taking it straight from the log), it takes less than 1 ms. It seems like Entity Framework takes its sweet time inserting every single object into the database, even though it uses transactions to wrap larger batches of inserts together. I don't really get it...
What can I do to speed it up for real?
In case anyone's interested, I found a solution to my problem. Entity Framework 6 is unable to provide fast bulk inserts without additional third-party libraries (as mentioned in the comments to my question), which are either paid or do not support databases other than SQL Server. Entity Framework Core, on the other hand, is another story. It supports fast bulk insertions and can replace EF 6 in a project with just a handful of code changes: https://learn.microsoft.com/pl-pl/ef/core/index
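Nothing special is needed in the code for this; EF Core's SaveChanges batches the generated INSERTs into far fewer commands on providers that support batching (a sketch, with placeholder type names):
using System.Collections.Generic;
using Microsoft.EntityFrameworkCore;

static void InsertAll(IEnumerable<MyEntity> records)
{
    using (var context = new MyDbContext())
    {
        context.AddRange(records);   // the ~20k objects to insert
        context.SaveChanges();       // sent to the server as batched commands
    }
}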

Deadlock on transaction with multiple tables

My scenario is common:
I have a stored procedure that needs to update multiple tables.
If one of the updates fails, all the updates should be rolled back.
The straightforward answer is to include all the updates in one transaction and just roll that back. However, in a system like ours, this causes concurrency issues.
When we break the updates into multiple short transactions, we get a throughput of ~30 concurrent executions per second before deadlocking issues start to emerge.
If we put everything into one transaction that spans all of them, we get only ~2 concurrent executions per second before deadlocks show up.
In our case, we place a try-catch block around every short transaction and, on failure, manually DELETE/UPDATE back the changes made by the previous ones. So essentially we mimic transaction behavior in a very expensive way...
It works all right, since it's well written and doesn't hit many "rollbacks"...
One thing this approach cannot handle at all is a command timeout from the web server/client.
I have read extensively in many forums and blogs and scanned through MSDN, and I cannot find a good solution. Many have presented the problem, but I have yet to see a good solution.
The question is this: is there ANY solution to this issue that allows a stable rollback of updates to multiple tables, without requiring exclusive locks on all of the rows for the entire duration of the long transaction?
Assume that it is not an optimization issue. The tables are probably close to maximally optimized and can deliver very high throughput as long as deadlocks don't hit them. There are no table locks/page locks etc., only row locks on updates, but when you have so many concurrent sessions, some of them need to update the same row...
It can be via SQL, client-side C#, or server-side C# (extending SQL Server?).
Is there such a solution in any book/blog that I have not found?
We are using SQL Server 2008 R2, with a .NET client/web server connecting to it.
Code example:
CREATE PROCEDURE sptest AS
BEGIN
    BEGIN TRANSACTION;
    UPDATE table1 ...   -- update details elided
    UPDATE table2 ...
    COMMIT TRANSACTION;
END
In this case, if sptest is run twice, the second instance cannot update table 1 until instance 1 has committed.
Compared to this
CREATE PROCEDURE sptest2 AS
BEGIN
    UPDATE table1 ...   -- no explicit transaction: each update commits on its own
    UPDATE table2 ...
END
sptest2 has a much higher throughput, but it has a chance of corrupting the data.
This is what we are trying to solve. Is there even a theoretical solution to this?
Thanks,
JS
I would say that you should dig deeper to find out why the deadlocks occur. Possibly you should change the order of the updates to avoid them. Maybe some index is "guilty".
You cannot roll back changes if other transactions can change the data, so you need to hold update locks on the rows. But you can use the snapshot isolation level to allow consistent reads before the update commits.
For all inner-joined tables that are mostly static, or where using dirty data is highly unlikely to affect the query result, you can apply:
INNER JOIN LookupTable lut WITH (NOLOCK) ON lut.ID = SomeOtherTableID
This tells the query that you do not care about in-flight updates to LookupTable.
This can reduce your issue in most cases. For more difficult deadlocks I have implemented a deadlock graph that is generated and emailed when a deadlock occurs; it contains all the detailed info about the deadlock.

What can affect NHibernate bulk insert performance?

I have various large data modification operations in a project built on C# and Fluent NHibernate.
The DB is SQLite (on disk rather than in memory, as I'm interested in performance).
I wanted to check the performance of these operations, so I created some tests that feed in large amounts of data and let the processes do their thing. The results from two of these processes have me pretty confused.
The first is a fairly simple case of taking data supplied in an XML file, doing some light processing, and importing it. The XML contains around 172,000 rows and the process takes a total of around 60 seconds to run, with the actual inserts taking around 40 seconds.
In the next process, I do some processing on the same set of data. So I have a DB with approx 172,000 rows in one table. The process then works through this data, doing some heavier processing and generating a whole bunch of DB updates (inserts and updates to the same table).
In total, this results in around 50,000 rows inserted and 80,000 updated.
In this case, the processing takes around 30 seconds, which is fine, but saving the changes to the DB takes over 30 minutes, and it crashes before it finishes with an SQLite 'disk or I/O error'.
So the question is: why are the inserts/updates in the second process so much slower? They are working on the same table of the same database with the same connection. In both cases, IStatelessSession is used and ado.batch_size is set to 1000.
In both cases, the code that does the update looks like this:
BulkDataInsert((IStatelessSession session) =>
{
    foreach (Transaction t in transToInsert) { session.Insert(t); }
    foreach (Transaction t in transToUpdate) { session.Update(t); }
});
(The first process has no 'transToUpdate' line, as it does only inserts. Removing the update line from the second process and just doing the inserts still takes almost 10 minutes.)
The transTo* variables are Lists of the objects to be updated/inserted.
BulkDataInsert creates the session and handles the DB transaction.
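For context, a rough approximation of that helper (not the exact code):
using System;
using NHibernate;

static void BulkDataInsert(ISessionFactory sessionFactory, Action<IStatelessSession> work)
{
    // One stateless session and one transaction for the whole batch;
    // ado.batch_size = 1000 is set in the NHibernate configuration.
    using (IStatelessSession session = sessionFactory.OpenStatelessSession())
    using (ITransaction tx = session.BeginTransaction())
    {
        work(session);
        tx.Commit();
    }
}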
I didn't understand your second process. However, here are some things to consider:
Are there any clustered or non-clustered indexes on the table?
How many disk drives do you have?
How many threads are writing to the DB in the second test?
It seems that you are experiencing IO bottlenecks that can be resolved by having more disks, more threads, indexes, etc.
So, assuming a lot of things, here is what I "think" is happening:
In the first test your table probably has no indexes, and since you are just inserting data, it is a sequential insert in a single thread, which can be pretty fast, especially if you are writing to one disk.
Now, in the second test, you are reading data and then updating data. Your SQL instance has to find each record that it needs to update. If you do not have any indexes, this "find" action is basically a table scan, which happens for each one of those 80,000 row updates. This will make your application really, really slow.
The simplest thing you could probably do is add a clustered index on the table over a unique key, and the best option is to use the columns that appear in the WHERE clause used to "update" those rows.
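For illustration, adding such an index from the NHibernate side could look like this (a sketch with made-up names; in SQLite this is an ordinary index, since SQLite has no clustered indexes as such):
using NHibernate;

static void EnsureUpdateIndex(ISessionFactory sessionFactory)
{
    using (ISession session = sessionFactory.OpenSession())
    {
        // ExternalRef stands in for whatever column your updates match on.
        session.CreateSQLQuery(
            "CREATE INDEX IF NOT EXISTS ix_transaction_externalref " +
            "ON \"Transaction\" (ExternalRef)").ExecuteUpdate();
    }
}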
Hope this helps.
DISCLAIMER: I made quite a few assumptions
The problem was due to my test setup.
As is pretty common with NHibernate-based projects, I had been using in-memory SQLite databases for unit testing. These work great, but one downside is that if you close the session, it destroys the database.
Consequently, my unit of work implementation contains a 'PreserveSession' property to keep the session alive and just create new transactions when needed.
My new performance tests are using on-disk databases but they still use the common code for setting up test databases and so have PreserveSession set to true.
It seems that having several sessions all left open (even though they're not doing anything) starts to cause problems after a while, including the performance drop-off and the disk I/O error.
I re-ran the second test with PreserveSession set to false, and the time immediately dropped from over 30 minutes to under 2 minutes, which is more where I'd expect it to be.

How should I process progressive status codes in a database?

I'm working on a project for an academic institution and I need advice on the best way to approach this problem. It's been a long time since I did any traditional application development (close to five years).
The college administration recently revised the college's academic standards policy. Previously, the administration only had three status codes, so this wasn't as big of an issue. However, the new policy has six status codes:
Good Standing
Academic Concern
Academic Intervention (1)
One-Term Dismissal
Academic Intervention (2)
Four-Term Dismissal
From here on, I'll differentiate between GPA for the term by saying termGPA and cumulative GPA by saying cumGPA. If a student's termGPA falls below 2.0, and that causes his/her cumGPA to also fall below 2.0, he/she gets placed on Academic Concern. Once on Academic Concern, one of three things can happen to students in the following terms (sketched in code after the list below). They:
Return to good standing if their termGPA and cumGPA rise above 2.0.
Stay in the current status if their termGPA is above 2.0, but their cumGPA stays below 2.0.
Move to the next status if both their termGPA and cumGPA are below 2.0.
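A minimal sketch of that per-term transition logic (hypothetical names; the dismissal re-entry rules and other edge cases are deliberately ignored):
enum PolicyCode
{
    GoodStanding,
    AcademicConcern,
    AcademicIntervention1,
    OneTermDismissal,
    AcademicIntervention2,
    FourTermDismissal
}

static PolicyCode NextStatus(PolicyCode previous, decimal termGpa, decimal cumGpa)
{
    if (previous == PolicyCode.GoodStanding)
        return (termGpa < 2.0m && cumGpa < 2.0m)
            ? PolicyCode.AcademicConcern
            : PolicyCode.GoodStanding;

    if (termGpa >= 2.0m && cumGpa >= 2.0m)
        return PolicyCode.GoodStanding;       // both recovered: back to good standing

    if (termGpa >= 2.0m && cumGpa < 2.0m)
        return previous;                      // stay in the current status

    return previous == PolicyCode.FourTermDismissal
        ? previous                            // already at the last code
        : previous + 1;                       // both below 2.0: move to the next status
}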
Normally, I would approach this process by writing a console application that processes each student iteratively, building the status codes as I go. However, we're handling at least 8,000 students, and in most cases around 12,500 students per term.
Additionally, this policy has to be applied retroactively over an as-yet-unspecified period of time (since former students could return to the college and would then be subject to the new policy's restrictions), and once I include a student in the data set, I have to go back through that student's entire history with the college. I'm conservatively guessing that I'll go through at least a million student records, calculating each student's termGPA and rolling cumGPA.
Questions:
Is there any way handle this problem in SQL and avoid using a cursor?
(Assuming the answer to 1. is "No") How should I structure a console application? Should I create a large collection and process a few thousand students at a time before writing to the database, or update the database after I process each student?
Am I making way too big of a deal about this?
Thanks in advance for any insight and advice.
Edit: Based on comments to answers here, I should've provided more information about the data structures and the way I'm calculating the GPAs.
I can't use the pre-calculated cumGPA values in our database; I need the student's cumGPA at the end of each progressive term, like so (note: I made up the GPA values below):
ID      TermID   CumGPA  TermGPA  TermNumber  PolicyCode
123545  09-10-2  2.08    2.08     1           GoodStanding
123545  09-10-3  1.94    0.00     2           AcademicConcern
123545  09-10-4  1.75    1.00     3           AcademicIntervention
123545  10-11-2  1.88    2.07     4           AcademicIntervention
123545  10-11-4  2.15    2.40     5           GoodStanding
123545  11-12-1  2.30    2.86     6           GoodStanding
The problem is that each subsequent term's status code can depend on the previous term's status code; Good Standing is actually the only one that doesn't.
As far as I know, that means that I would have to use a cursor in SQL to get each student's most current status code, which is not something I'm interested in, as I work for a cash-strapped college that has precisely three database servers: one for testing, and two servers with the same data on them (we're in the process of moving to SQL Server 2008 R2).
That is interesting. I don't think you'll have to worry too much about the SQL performance; it will run fairly quickly for your application. I just ran a stupid little console app to fix a mess-up and inserted 15,000 records one at a time. It took about 5 seconds.
First of all, 12,000 records are nothing for today's databases, so that's not your concern. You should rather focus on keeping it simple. It seems like your database will often be driven by events, so I would recommend using triggers, i.e. a first trigger that updates cumGPA when a termGPA is inserted, and a second one after the cumGPA update that checks your criteria and updates the status if they are met.
Even the free version of SQL Server now handles databases up to 10 GB, and 12,500 records is small. Going through 1 million records, you should iterate through each student or through groups of students to allow the transaction log to clear. That could be done using a cursor or a console application. If you can perform the calculation in T-SQL, then batching will probably be faster than going one row at a time. The downside is that the bigger the batch, the bigger the transaction log, so there is a sweet spot. If the calculation is too complex for T-SQL and takes almost as long as (or longer than) the insert statement, you could insert on a separate thread (or calculate on a separate thread) so the insert and the calculation run in parallel. I do this in an application where I parse the words out of text; the parse takes about the same amount of time as inserting the words. But I don't let it spin up multiple threads: on the SQL side the server still has to maintain the indexes, and hitting it with inserts from two threads slowed it down. So it's just two threads, and the faster thread waits on the slower. The order in which you do your updates also matters: if you process in the order of the clustered index, then you have a better chance that the record is already in memory.
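A rough sketch of that batching idea from the console-application side (writeStudent is whatever does the per-student calculation and writes; the batch size is something to tune):
using System;
using System.Collections.Generic;
using System.Data.SqlClient;

static void ProcessInBatches(string connectionString, IList<int> studentIds,
                             Action<SqlConnection, SqlTransaction, int> writeStudent)
{
    const int batchSize = 500;
    for (int i = 0; i < studentIds.Count; i += batchSize)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();
            using (SqlTransaction tx = conn.BeginTransaction())
            {
                int end = Math.Min(i + batchSize, studentIds.Count);
                for (int j = i; j < end; j++)
                    writeStudent(conn, tx, studentIds[j]);

                tx.Commit();   // one commit per batch keeps each transaction (and the log) small
            }
        }
    }
}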
I ended up writing a console application in C# to process these status codes. My users changed the initial status update requirements to only include the previous two terms, but the process had enough edge cases that I opted to take my time and write cleaner, object-oriented code that will be easier to pick back up (he says, hopefully) once this policy matures and changes.
Also, I ended up having to deploy this database onto a SQL Server 2005 instance, so table-valued parameters were not available to me. If they had been, I would've opted to commit to the database only after processing each student, rather than after processing each term for each student.

How to directly scan a SQL indexed table

I have a SQL table, with indexes, containing leads to call. About 30 users will be calling these leads. To make sure that no two users call the same lead, the system has to be instant.
So I would like to go this way:
Set the table to the right index
Scan the table for a lead I can call (there are conditions), following the index
When I have a call, indicate that the record is "in use"
Here are my issues:
- I can't find any way to set a table to an index from C# code
- LINQ requires a DataContext (not instant) and ADO requires a DataSet
I have not found any resource to help me on that. If you have any, they are more than welcome.
Sorry if I may sound ignorant, I'm new to SQL databases.
Thank you very much in advance!
Mathieu
I've worked on similar systems before. The tack we took was to have a distribution routine that handled passing out the leads to the call center people. Typically we had a time limit on how long a lead was allowed to be in any one user's queue before it was yanked away and given to someone else.
This allowed us to do some pretty complicated things like giving preference based on details about the lead as well as productivity of the individual call center person.
We had a very high volume of leads that came in and had our distribution routine set to run once a minute. The SLA was set so that a lead was contacted within 2 minutes of us knowing about them.
To support this, your leads table should have an AssignedUserId column and probably a date/time stamp of when it was assigned. Write a proc or some C# code which grabs all the records from that table which aren't assigned. Do the assignment routine, saving the changes back to the table. This routine should probably take into account how many leads each person is currently working and the acceptable number of open leads per person, in order to give preference in a round-robin distribution.
When the user refreshes they will have their leads. You can control the refresh rate in the UI.
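A rough sketch of the claim step in such a routine (table and column names are invented, and the preference/round-robin logic is left out). Because the claim is a single atomic UPDATE, two concurrent runs can never hand out the same lead:
using System.Data.SqlClient;

static void AssignLeads(string connectionString, int userId, int batchSize)
{
    const string sql = @"
        UPDATE TOP (@batchSize) Leads WITH (ROWLOCK, READPAST)
        SET AssignedUserId = @userId,
            AssignedAtUtc  = GETUTCDATE()
        WHERE AssignedUserId IS NULL;";

    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(sql, conn))
    {
        cmd.Parameters.AddWithValue("@batchSize", batchSize);
        cmd.Parameters.AddWithValue("@userId", userId);
        conn.Open();
        cmd.ExecuteNonQuery();
    }
}
READPAST lets a second concurrent run skip rows that are currently locked instead of waiting on them.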
I don't see how your requirement of being "instant" relates to the use of an index. Accessing a table by index is not instantaneous either.
To solve your problem, I would suggest to lock the whole table while a lead is being called. This will limit performance, but it will also ensure that the same lead is never called by two users.
Example code:
Begin Transaction
Lock Table
Search for Lead
Update Lead to indicate that it is in use
Commit Transaction (removes the lock)
Locking a table in SQL Server until the end of the transaction can be done by using SELECT * FROM table WITH (HOLDLOCK, TABLOCKX) WHERE 1=0.
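Putting those steps together in C#/ADO.NET might look roughly like this (table and column names are invented):
using System.Data.SqlClient;

static int? ClaimNextLead(string connectionString, int userId)
{
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();
        using (SqlTransaction tx = conn.BeginTransaction())
        {
            // Lock the whole table until the transaction ends.
            using (var lockCmd = new SqlCommand(
                "SELECT * FROM Leads WITH (HOLDLOCK, TABLOCKX) WHERE 1 = 0", conn, tx))
            {
                lockCmd.ExecuteNonQuery();
            }

            // Search for a callable lead (add your real conditions here).
            int? leadId;
            using (var findCmd = new SqlCommand(
                "SELECT TOP (1) LeadId FROM Leads WHERE InUse = 0", conn, tx))
            {
                object result = findCmd.ExecuteScalar();
                leadId = result == null ? (int?)null : (int)result;
            }

            // Mark it as in use before anyone else can see it.
            if (leadId.HasValue)
            {
                using (var updateCmd = new SqlCommand(
                    "UPDATE Leads SET InUse = 1, AssignedUserId = @userId WHERE LeadId = @leadId",
                    conn, tx))
                {
                    updateCmd.Parameters.AddWithValue("@userId", userId);
                    updateCmd.Parameters.AddWithValue("@leadId", leadId.Value);
                    updateCmd.ExecuteNonQuery();
                }
            }

            tx.Commit();   // releases the table lock
            return leadId;
        }
    }
}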
Disclaimer: Yes, I'm aware that cleaner solutions with less locking are possible. The advantage of the above solution is that it is simple (no worrying about the correct transaction isolation level, etc.) and it is usually performant enough (if you remember to keep the "locked part" short and there is not too much concurrent access).
