There is a stored procedure that does some very heavy processing. It regenerates SQL insert scripts while iterating over about 30 tables, and after that process finishes it starts inserting those scripts into a table named X. The whole thing takes about 20 minutes, which is unacceptable. The last thing you need to know is that the procedure is called by a web method written in .NET.
PS: The tables have indexes.
Here are my questions:
I want to use multi-threading to solve the problem, but I'm not sure it would help. I would slice the stored procedure into 5 pieces and call them from 5 different threads at the same time. Would that decrease the overall time between the start and the end of the processing?
Potentially, this would work, as there will be no blocking or waiting for other sections of the stored procedure to complete. Obviously, you are still constrained by the physical resources of the server. To be honest, the only way you will know for sure is to try it and measure the performance.
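For what it's worth, here is a rough sketch of how the five parallel calls and the timing could look from .NET. The slice procedure names, the connection string and the assumption that the slices are completely independent are all mine, not something from the question:

using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.Diagnostics;
using System.Threading;

class SliceRunner
{
    const string ConnString = "Data Source=.;Initial Catalog=MyDb;Integrated Security=True"; // placeholder

    static void RunSlice(object sliceProcName)
    {
        // Each slice runs on its own connection so the calls can overlap.
        using (var conn = new SqlConnection(ConnString))
        using (var cmd = new SqlCommand((string)sliceProcName, conn))
        {
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.CommandTimeout = 0;   // no timeout; these are long-running
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }

    static void Main()
    {
        var procs = new[] { "usp_ProcessSlice1", "usp_ProcessSlice2", "usp_ProcessSlice3",
                            "usp_ProcessSlice4", "usp_ProcessSlice5" };   // hypothetical names
        var sw = Stopwatch.StartNew();
        var threads = new List<Thread>();
        foreach (var p in procs)
        {
            var t = new Thread(RunSlice);
            t.Start(p);
            threads.Add(t);
        }
        threads.ForEach(t => t.Join());
        sw.Stop();
        Console.WriteLine("All slices finished in {0}", sw.Elapsed);
    }
}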
I would ensure that you analyse the dependencies of each part of the stored procedure accurately, then do it again just to make sure.
Good luck.
Have you taken into account the performance hit you will get from a large number of inserts into tables with indexes? If you haven't already done so, try the insert part of the process on tables with no indexes and measure any gains. If it proves beneficial, script the index creation and run it after the inserts.
Threading may help, but in my experience optimizing the SQL process is likely to give more benefit. I'll be interested to hear your findings.
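To illustrate the suggestion above, here is a rough sketch of disabling a nonclustered index before the insert phase and rebuilding it afterwards, driven from .NET. The index, table and procedure names are placeholders, not from the question:

using System.Data.SqlClient;

class InsertWithoutIndexes
{
    // Only disable NONCLUSTERED indexes; disabling the clustered index makes
    // the table inaccessible.
    static void Run(string connString)
    {
        using (var conn = new SqlConnection(connString))
        {
            conn.Open();
            Exec(conn, "ALTER INDEX IX_X_Example ON dbo.X DISABLE;");
            try
            {
                Exec(conn, "EXEC dbo.usp_GenerateAndInsertScripts;"); // the heavy insert phase
            }
            finally
            {
                // REBUILD re-enables the index and repopulates it in one pass.
                Exec(conn, "ALTER INDEX IX_X_Example ON dbo.X REBUILD;");
            }
        }
    }

    static void Exec(SqlConnection conn, string sql)
    {
        using (var cmd = new SqlCommand(sql, conn) { CommandTimeout = 0 })
            cmd.ExecuteNonQuery();
    }
}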
I am also trying to call the same SP from a console application with multiple threads, and I don't see my SP running in parallel. Even though I call it multiple times using threads, it executes one call after the other. I am using SQL Server 2008 (Express edition), and I have also configured parallel processing in SQL Server 2008, but it is still not running in parallel. Any ideas or suggestions will be greatly appreciated.
Thanks
Venkat
Related
I have two databases (DB1 & DB2; both DBs are the same, DB2 was created from a backup of DB1). When I run a stored procedure SP1 on both DBs, it takes approximately 2 seconds to give me the output (select statements) on each.
The problem is that when I point a service at these DBs and use the DataAdapter.Fill method, I consistently get different times on the two DBs (54-63 seconds on DB1 and 42-44 seconds on DB2). Note that I'm using the same service to point at both DBs, so it can't be the service's behaviour/performance. Now my question is:
What could be the reason for this? Any suggestions on what I should look into are welcome.
Helpful info:
Both DBs are on different servers (identical configuration), but since executing the SP in SQL Server Management Studio takes the same time on both DBs, I have ruled out DB server performance.
Network delay could be a factor, but it is highly unlikely, as both servers are on the same network and in fact in the same physical location. This is my last option to check.
Some other services are using SqlDependency on DB1 and consistently fill DataAdapter(s); could this be the reason for my DataAdapter Fill method slowing down? (I'm guessing this is less likely.)
As requested in the comments, below is the code that fills the DataSet:
PS: The times mentioned above are the execution time of the code line highlighted in the image above.
That sounds very much like a query plan issue.
Erland Sommarskog has written an excellent article about this kind of problem:
Slow in the Application, Fast in SSMS?.
My first guess would be "The Default Settings", but it might be one of the other issues, too.
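If it does turn out to be the default-settings case from that article (SSMS runs with SET ARITHABORT ON, ADO.NET runs with it OFF, so the two can end up with different cached plans), one quick diagnostic from the service side is to force the same setting on the connection before the Fill. This is only a sketch to test the theory, not a permanent fix, and the names are placeholders:

using System.Data;
using System.Data.SqlClient;

class ArithabortTest
{
    static DataSet FillWithArithabortOn(string connString)
    {
        var ds = new DataSet();
        using (var conn = new SqlConnection(connString))
        {
            conn.Open();

            // SSMS defaults to SET ARITHABORT ON, ADO.NET to OFF, which can
            // produce two different cached plans for the same procedure.
            using (var set = new SqlCommand("SET ARITHABORT ON;", conn))
                set.ExecuteNonQuery();

            using (var cmd = new SqlCommand("dbo.SP1", conn) { CommandType = CommandType.StoredProcedure, CommandTimeout = 120 })
            using (var da = new SqlDataAdapter(cmd))
                da.Fill(ds);
        }
        return ds;
    }
}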
Have you tried not using CommandType.StoredProcedure and just running it as a line of SQL:
"exec dbname.dbo.storedprocname params"?
It's a bit more work because you'll have to loop over the parameters and append them to the string, but it's just a SQL string; it doesn't care what you are doing and nothing funny happens behind the scenes. It should give similar times. If it doesn't, try checking things like the indexes on the tables that the stored procedure uses.
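Something like this, as a rough sketch; I have kept the value as a parameter rather than concatenating it into the string, and the database/procedure/parameter names are placeholders:

using System.Data;
using System.Data.SqlClient;

class RunAsText
{
    static DataSet Run(string connString, int someId)
    {
        var ds = new DataSet();
        using (var conn = new SqlConnection(connString))
        using (var cmd = new SqlCommand("EXEC dbname.dbo.storedprocname @someId;", conn))
        {
            cmd.CommandType = CommandType.Text;   // plain SQL text, not CommandType.StoredProcedure
            cmd.Parameters.Add("@someId", SqlDbType.Int).Value = someId;
            using (var da = new SqlDataAdapter(cmd))
                da.Fill(ds);
        }
        return ds;
    }
}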
Step one: rebuild or reorganize your indexes. Index fragmentation is one of the most common performance issues with SQL Server and is easy to fix. Restarting SQL Server sometimes also matters.
I have a maintenance plan that runs on my SQL Server 2008 server every morning before business hours. It was put in place a few years ago to help with some performance issues. The problem that I am seeing is that after that rebuild index finishes, there is a stored procedure in one of the databases that will go from taking nine seconds to run to taking seven minutes to run.
The solution I have found to fix it is to open SQL Management Studio and run:
EXEC sp_recompile N'stored_proc_name';
EXEC stored_proc_name @userId = 579;
After I run that, the SP fixes itself and goes back to running under nine seconds.
I've tried a couple of different paths to automate this, but it will only work if I run it from my computer through management studio. I tried to wrap it up in a little C# executable that ran a few minutes after the rebuild index job completes, but that didn't work. I also tried creating a SQL job to run it on the server after the rebuild index job completes, but that didn't work either. It has to be run from management studio.
So, two questions:
How can I stop the index rebuild from breaking my SPs? Or,
Any ideas on how or why my quick fix will only work in a very specific situation?
Thanks,
Mike
This sounds like standard parameter sniffing / parameter-based query-plan caching. The trick here is usually to use the OPTIMIZE FOR / UNKNOWN hint - either for the specific parameter that is causing the problem, or simply for all parameters. This makes it much less likely that a parameter-value with biased distribution will negatively impact the system for other values. A more extreme option (more useful when using command-text, not so useful when using stored procedures) is to embed the value directly into the TSQL rather than using a parameter. This... has impact, however, and should be used with caution.
In your case, I suspect that adding:
OPTION (OPTIMIZE FOR (@userId UNKNOWN))
to the end of your query will fix it.
I have code that carries out data retrieval: basically it executes anything from 3 to 12 SQL (Oracle) read statements to retrieve data about an object.
Unfortunately it's running slowly. No SQL statement in particular is the problem; it's just the fact that I have so many of them, and they take around 0.2 seconds per statement, which can mean over 2 seconds for the code to complete.
I am looking into ways of improving the performance. One way is to merge some of the tables into a single query (which can reduce the combined time by about 0.5 seconds). However, it doesn't make sense to merge the rest, since there will only be data there under certain circumstances, and trying to determine when there is data there to marshal could get tricky.
I am considering introducing threading into my program, so after the initial query I would spawn a thread for each of the other queries, so they are executed at the same time. However, I have never used threading and am wary of introducing deadlocks or other pitfalls.
Currently the other queries marshal the results into different sections of the SAME object. Would this cause any issues (i.e., since we are accessing/updating the same object from different threads, albeit through different sections/fields within the object)? Would it be better to return the results and marshal them into the object after all the threads have finished?
I know these types of questions are hard to answer since it's more general advice, but I would appreciate hearing whether anyone thinks this is a good idea, or has other suggestions.
If you are only reading (select from), don't worry about deadlocks: Oracle reads are (mostly) non-blocking. The biggest problem with threading queries against Oracle is how you deal with connections. Creating a connection, running a query and closing the connection every time is very, very bad. Connections are expensive. They are also limited, so you don't want to create a million connections to execute your logic.
As a result, you would use some sort of connection pool and put your queries in a queue.
Also, I hope you are using bind variables and not string concatenation to pass queries to Oracle.
In general, I would collect all the data (ideally in one query) and only then update the object. You could also consider breaking your object into sections.
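A minimal sketch of that approach (parallel reads, each on its own pooled connection with bind variables, and the marshalling done on one thread after everything has finished), assuming ODP.NET and .NET 4 or later; the queries, bind name and MyObject type are placeholders:

using System.Data;
using System.Threading.Tasks;
using Oracle.ManagedDataAccess.Client;

class ObjectLoader
{
    // Placeholder for your real object.
    class MyObject
    {
        public DataTable Details, Measurements, History;
    }

    static DataTable RunQuery(string connString, string sql, int objectId)
    {
        // Each query gets its own connection from the pool (cheap to open/close).
        using (var conn = new OracleConnection(connString))
        using (var cmd = new OracleCommand(sql, conn))
        {
            cmd.BindByName = true;
            cmd.Parameters.Add(new OracleParameter("p_object_id", objectId)); // bind variable
            conn.Open();
            var table = new DataTable();
            using (var reader = cmd.ExecuteReader())
                table.Load(reader);
            return table;
        }
    }

    static void LoadObject(string connString, int objectId, MyObject target)
    {
        string[] queries =
        {
            "SELECT * FROM details      WHERE object_id = :p_object_id",
            "SELECT * FROM measurements WHERE object_id = :p_object_id",
            "SELECT * FROM history      WHERE object_id = :p_object_id",
        };

        // Run the reads in parallel...
        var tasks = new Task<DataTable>[queries.Length];
        for (int i = 0; i < queries.Length; i++)
        {
            string sql = queries[i];   // avoid capturing the loop variable
            tasks[i] = Task.Factory.StartNew(() => RunQuery(connString, sql, objectId));
        }
        Task.WaitAll(tasks);

        // ...but marshal the results into the object on one thread, afterwards.
        target.Details      = tasks[0].Result;
        target.Measurements = tasks[1].Result;
        target.History      = tasks[2].Result;
    }
}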
Threading works perfectly. Two years ago I did a project that used a multi-stage / multi-threading approach to push data into an Oracle database (and pull some data out of it for updates).
I basically used a staged approach (a request would go through multiple stages, get consumed there, and new data would be pushed to the next stage), and every stage used a configurable thread pool, which would take a message, process it and post the new messages.
We used, I think, close to 200 threads at that time to process about a million SQL statements per minute (hitting an Oracle Exadata that was really being put to work).
So, multi-threading "just works" - obviously if you know how to do it, and you have to get your architecture and the SQL statements nice and non-blocking. Databases in general are perfectly capable of handling multiple threads.
Now, for details: THAT DEPENDS.
Example:
Currently the other queries marshal the results into different sections of the SAME object. Would this cause any issues (i.e., since we are accessing/updating the same object from different threads, albeit through different sections/fields within the object)?
Absolutely no problem as long as:
You make sure all updates are finished before moving the object to the next phase, and
The updates do not overlap or have an ordering dependency (update 1 must finish for update 2 to have the data it requires).
These are implementation details, and it is really hard to give a generic answer for them (totally impossible). Especially as this is multi-threading 101 and has nothing to do with database access.
In general, you will also have to tune the number of threads. .NET cannot do that itself: it will see that the CPU is not busy and spawn more threads, even if the database server is the bottleneck. This is why we went with multiple stages - so we could tune the number of threads depending on what they do (the last stage used bulk inserts to load the aggregated data into temporary staging tables with a small number of threads, moving a lot of data in every statement; this kind of tuning is needed so you don't totally overload the database side).
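A very small sketch of the staged idea, assuming .NET 4 or later (BlockingCollection). The stage bodies, queue sizes and thread counts are made up; the point is only that each stage's degree of parallelism can be tuned on its own:

using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

class Pipeline
{
    // Bounded queues connect the stages (capacity 1000 here, purely illustrative).
    static readonly BlockingCollection<string> RawRequests  = new BlockingCollection<string>(1000);
    static readonly BlockingCollection<string> PreparedRows = new BlockingCollection<string>(1000);

    static void Main()
    {
        var stage1 = StartWorkers(8, PrepareWorker);   // CPU-ish work: many threads
        var stage2 = StartWorkers(2, InsertWorker);    // database work: few threads, bigger batches

        foreach (var request in Enumerable.Range(0, 10000).Select(i => "request " + i))
            RawRequests.Add(request);
        RawRequests.CompleteAdding();

        foreach (var t in stage1) t.Join();
        PreparedRows.CompleteAdding();                 // stage 1 done, let stage 2 drain and exit
        foreach (var t in stage2) t.Join();
    }

    static List<Thread> StartWorkers(int count, ThreadStart body)
    {
        var threads = Enumerable.Range(0, count).Select(_ => new Thread(body)).ToList();
        threads.ForEach(t => t.Start());
        return threads;
    }

    static void PrepareWorker()
    {
        foreach (var request in RawRequests.GetConsumingEnumerable())
            PreparedRows.Add(request.ToUpperInvariant());   // stand-in for real processing
    }

    static void InsertWorker()
    {
        foreach (var row in PreparedRows.GetConsumingEnumerable())
        {
            // stand-in for a bulk insert into a staging table
        }
    }
}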
I have one BIG table (90k rows, about 60 MB) which holds info about free room capacities for about 50 hotels. This table has very few updates/inserts per hour.
My application sends async requests to this table (and joined tables) at most 30 times per second.
When I start 30 threads at once (with the default AppPool class in .NET 3.5 C#, each with a random valid SQL query string), only a few (about 4) are processed asynchronously and the other threads wait. Why?
Is it because of SQL Server 2008 table locking, or because of the .NET core? Or something else?
If it is a SQL problem, would it help if I split this big table into one table per hotel?
My goal is to have at least 10 threads served at a time.
This table is tiny. It doesn't even qualify as a "medium-sized" table. It's trivial.
You could be full-table-scanning it 30 times per second, or copying the whole thing into RAM, and no server would be the slightest bit bothered.
If your data fits in RAM, databases are fast. If you are not seeing that, you're doing something REALLY WRONG. Therefore I also think the problems are all on the client side.
It is more than likely on the .NET side. If it were table locking, more threads would be processing, but they would be waiting on their queries to return. If I remember correctly, there's a property on thread pools that controls how many actual threads they create at once. If there are more pending work items than that number, they get in line and wait for running threads to finish. Check that.
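If it is the thread pool, here are the .NET-side knobs I would look at first: the pool's min/max thread counts and the ADO.NET connection pool's Max Pool Size. The numbers and the connection string below are only examples:

using System;
using System.Threading;

class PoolCheck
{
    static void Main()
    {
        int maxWorker, maxIo, minWorker, minIo;
        ThreadPool.GetMaxThreads(out maxWorker, out maxIo);
        ThreadPool.GetMinThreads(out minWorker, out minIo);
        Console.WriteLine("worker threads: min={0} max={1}; IO threads: min={2} max={3}",
            minWorker, maxWorker, minIo, maxIo);

        // Let more pool threads start immediately instead of roughly one every 500 ms.
        ThreadPool.SetMinThreads(32, 32);

        // Also check the ADO.NET connection pool: the default Max Pool Size is 100,
        // but an explicit value in the connection string can cap concurrency, e.g.
        // "Data Source=.;Initial Catalog=Hotels;Integrated Security=True;Max Pool Size=50"
    }
}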
Have you tried changing the transaction isolation level?
Even when reading from a table, SQL Server will take locks. Try setting the isolation level to READ UNCOMMITTED and see if that improves the situation, but be advised that you may read 'dirty' data; make sure you understand the ramifications if this turns out to be the solution.
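As a rough sketch of what that looks like from ADO.NET (an equivalent per-query alternative is the WITH (NOLOCK) table hint); the table, column and connection string are placeholders:

using System.Data;
using System.Data.SqlClient;

class DirtyRead
{
    // READ UNCOMMITTED can return rows from transactions that later roll back.
    static DataTable ReadFreeRooms(string connString, int hotelId)
    {
        var table = new DataTable();
        using (var conn = new SqlConnection(connString))
        {
            conn.Open();
            using (var tx = conn.BeginTransaction(IsolationLevel.ReadUncommitted))
            using (var cmd = new SqlCommand(
                "SELECT * FROM dbo.RoomCapacity WHERE HotelId = @hotelId", conn, tx))
            {
                cmd.Parameters.Add("@hotelId", SqlDbType.Int).Value = hotelId;
                using (var reader = cmd.ExecuteReader())
                    table.Load(reader);
                tx.Commit();   // nothing to undo for a read; this just ends the transaction cleanly
            }
        }
        return table;
    }
}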
Rather than ask, measure. Each SQL query that is actually submitted by your application creates a request on the server, and the sys.dm_exec_requests DMV shows the state of each request. When a request is blocked, the wait_type column shows a non-empty value. You can judge from this whether your requests are blocked or not, and if they are, you will also see the reason why.
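For example, something as simple as this, run on its own connection while your 30 threads are active, will show you what each request is doing (the connection string is a placeholder):

using System;
using System.Data.SqlClient;

class RequestMonitor
{
    static void Main()
    {
        const string cs = "Data Source=.;Initial Catalog=Hotels;Integrated Security=True";
        const string sql =
            @"SELECT session_id, status, command, wait_type, wait_time, blocking_session_id
              FROM sys.dm_exec_requests
              WHERE session_id > 50";   // ignore system sessions

        using (var conn = new SqlConnection(cs))
        using (var cmd = new SqlCommand(sql, conn))
        {
            conn.Open();
            using (var reader = cmd.ExecuteReader())
                while (reader.Read())
                    Console.WriteLine("{0}\t{1}\t{2}\t{3}\t{4}\t{5}",
                        reader["session_id"], reader["status"], reader["command"],
                        reader["wait_type"], reader["wait_time"], reader["blocking_session_id"]);
        }
    }
}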
I am dealing with a database containing tens of millions of records. I have an application which connects to the database, gets all the data from a single column in a table, does some operation on it and updates it (for SQL Server, using cursors).
For millions of records it is taking a very, very long time to update. So I want to make it faster by
using multiple threads with an independent connection for each thread,
or
by using a single connection shared by all the threads to fire the update queries.
Which one is faster? Or if you have any other ideas, please explain.
I need a solution which is independent of database type, but if you know specific solutions for particular databases, please reply as well.
The speedup you're trying to achieve won't work. On the contrary, it will slow down the overall processing, as the database now also has to keep multiple connections/sessions/transactions in sync.
Stick with as few connections/transactions as possible for repetitive and comparable operations.
If it still takes too long for your taste, try to analyze whether the queries can be optimized somehow. Also have a look at database-specific extensions (e.g. bulk operations) suitable for your problem.
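For SQL Server specifically, one bulk-operation pattern that usually beats a cursor is to compute the new values client-side, bulk copy them into a staging table, and then apply them with a single set-based UPDATE. A rough sketch, with all table and column names as placeholders:

using System.Data;
using System.Data.SqlClient;

class BulkUpdate
{
    static void ApplyNewValues(string connString, DataTable newValues /* columns: Id, NewValue */)
    {
        using (var conn = new SqlConnection(connString))
        {
            conn.Open();

            // Temp table lives for the lifetime of this connection.
            Exec(conn, "CREATE TABLE #staging (Id INT PRIMARY KEY, NewValue NVARCHAR(400));");

            using (var bulk = new SqlBulkCopy(conn) { DestinationTableName = "#staging", BatchSize = 10000 })
                bulk.WriteToServer(newValues);

            // One set-based update instead of millions of single-row updates.
            Exec(conn, @"UPDATE t SET t.SomeColumn = s.NewValue
                         FROM dbo.BigTable AS t
                         JOIN #staging AS s ON s.Id = t.Id;");
        }
    }

    static void Exec(SqlConnection conn, string sql)
    {
        using (var cmd = new SqlCommand(sql, conn) { CommandTimeout = 0 })
            cmd.ExecuteNonQuery();
    }
}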
All depends on the database, and the hardware it is running on.
It helps if the database can make use of concurrent processing and avoid contention on shared resources (e.g. page-based locks span multiple records, record-based locks do not). Shared resources in this case include hardware: a single-core box will not be able to execute multiple CPU-intensive activities (e.g. parsing SQL) truly in parallel.
Network latency is something you might alleviate with concurrent inserts even if the database itself is not able to exploit concurrency.
As with any question of performance, there is no substitute for testing in your specific scenario.
If possible, try to use a stored procedure to do all the processing and update the records.