Constantly sending SQL queries - C#

I don't know if this is a common question, but if it is, please don't yell at me! :(
I have a C# Windows Forms program that executes an UPDATE query every 2 seconds using a threading timer.
My question is: is this dangerous? Will this make my computer run much slower? Am I driving up CPU usage? I get pretty concerned whenever something runs constantly, every couple of seconds.
EDIT: It's UPDATE, not INSERT, sorry!
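For context, a minimal sketch of the setup being described, using `System.Threading.Timer` (the callback body and names are hypothetical stand-ins; the real app would issue the UPDATE there):

```csharp
using System;
using System.Threading;

class PeriodicUpdater
{
    static int updateCount = 0;

    static void RunUpdate(object state)
    {
        // In the real app this would be the ADO.NET UPDATE, e.g.:
        //   using var cmd = new SqlCommand("UPDATE Stats SET LastSeen = @now WHERE Id = @id", conn);
        // Here we just count invocations to illustrate the cadence.
        Interlocked.Increment(ref updateCount);
    }

    static void Main()
    {
        // Fire every 2 seconds (2000 ms), starting immediately.
        using var timer = new Timer(RunUpdate, null, dueTime: 0, period: 2000);
        Thread.Sleep(5000); // let it tick a few times
        Console.WriteLine($"Update ran {updateCount} times");
    }
}
```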

This depends a lot on the size of the operation performed every 2 seconds. If the operation takes 1.5 seconds to pre-process, execute and post-process, then it will be a problem; if it takes 4 ms, probably not. You also need to think about the server: even if we say it takes 4 ms, that could be parallelised over 8 cores, i.e. 32 ms of total CPU time - and if you have 2,000 users all doing that every 2 seconds, it starts to add up.
But by itself: fine.
And client-side, on a modern multi-core PC, this is probably not even enough to register as the tiniest blip on the graph.
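One way to sanity-check the "operation time vs. interval" point above is simply to time the work with a `Stopwatch` and compare it to the timer period (the 2-second figure comes from the question; the `Thread.Sleep` is a stand-in for the real UPDATE round-trip):

```csharp
using System;
using System.Diagnostics;
using System.Threading;

class Program
{
    static void Main()
    {
        const int periodMs = 2000; // the timer interval from the question

        var sw = Stopwatch.StartNew();
        Thread.Sleep(5); // stand-in for the real UPDATE round-trip
        sw.Stop();

        double loadFraction = (double)sw.ElapsedMilliseconds / periodMs;
        Console.WriteLine($"Work took {sw.ElapsedMilliseconds} ms, " +
                          $"{loadFraction:P1} of the 2 s budget");

        // If the work takes a few ms, this fraction is well under 1%.
        // If it approached 1.0 (i.e. ~2 s of work), the timer callbacks
        // would start to overlap and the load would become a problem.
    }
}
```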

The answer completely depends on the amount of work the update statement is performing. If it is updating millions of rows every two seconds, then it will definitely impact the performance.
However, if you are only updating a handful of rows (up to say, 100,000) in an SQL Server database, then this frequency should be perfectly acceptable.
The manner in which the update is performed also matters: cursors, linked servers, CLR functions, databases other than SQL Server (e.g. Access), and many, many other factors can all significantly impact performance.

Related

Concurrency issues writing to multiple SQLite databases simultaneously on different app threads

I've done a lot of searching on this and haven't had much luck. As a test, I've written a C# WinForms app that spins up a configurable number of threads and has each thread write a configurable amount of data to a number of tables in a SQLite database created by that thread. So each thread creates its own SQLite database, and only that thread interacts with it.
What I'm seeing is that there's definitely some performance degradation happening as a result of the concurrency. For example, if I start all the threads at roughly the same time, write performance to SQLite's tables plummets compared to giving each thread a random start delay to spread out their access.
SQLite is easily fast enough for my tests on its own - I can insert 20,000 rows into a table in a third of a second - but once I start up 250 threads, those same 20,000 rows can take minutes to write to each of the databases.
I've tried a lot of things, including periodic commits, setting Synchronous=Off, using parameterized queries, etc. Those all help by shortening the time each statement takes (and therefore reducing the chance of concurrent activity), but nothing has really solved it, and I'm hoping someone can give some advice.
Thanks!
Andy
Too much write concurrency in any relational database does cause some slowdown. Depending on the scenario you are trying to optimize, you can do various other things; a few I can think of are:
1) Create batches instead of concurrent writes. If you are expecting a large number of users writing simultaneously, collect their data and flush it down in larger groups. Be warned, though: if the application goes down while your queue is collecting, you would lose that data, so do this only for non-critical data such as logs.
2) If your threads need to do other work before inserting the data, you can keep your threads and add a semaphore (or something equivalent) around the part of the code where the insertion takes place. This limits the concurrency and speeds up the entire process.
3) If what you are trying to do is a bulk insert via a tool you are building, mention that in your question; a lot of MySQL DBAs will answer it better than me.
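Point 1 above (collecting writes and flushing them in larger groups) can be sketched with a thread-safe queue. The batch size is arbitrary here, and `Flushed` stands in for the actual database write; note the caveat from above that anything still queued is lost if the process dies:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

class BatchWriter
{
    private readonly ConcurrentQueue<string> _pending = new();
    private const int BatchSize = 100; // tune for your workload

    // Stands in for the database: each inner list is one batched write.
    public List<List<string>> Flushed { get; } = new();

    public void Add(string record)
    {
        _pending.Enqueue(record);
        if (_pending.Count >= BatchSize)
            Flush();
    }

    public void Flush()
    {
        var batch = new List<string>();
        while (batch.Count < BatchSize && _pending.TryDequeue(out var r))
            batch.Add(r);
        if (batch.Count > 0)
            Flushed.Add(batch); // real code: one multi-row INSERT / transaction here
    }
}

class Program
{
    static void Main()
    {
        var writer = new BatchWriter();
        for (int i = 0; i < 250; i++)
            writer.Add($"record {i}");
        writer.Flush(); // drain the remainder on shutdown (250 = 100 + 100 + 50)
        Console.WriteLine($"{writer.Flushed.Count} batches written");
    }
}
```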

Running multiple instances of a program to speed up process?

I have a program that performs a long-running process: it loops through thousands of records, one at a time, and calls a stored proc on each iteration. Would running two instances of a program like this, with one processing half the records and the other processing the other half, speed up the processing?
Here are the scenarios:
1 program running the long-running process.
2 instances of the program on the same server, connecting to the same database, each responsible for processing half (50%) of the records.
2 instances on different servers, connecting to the same database, each responsible for half (50%) of the records.
Would scenario 2 or 3 run twice as fast as 1? Would there be a difference between 2 and 3? The main bottleneck is the stored proc call, which takes around half a second.
Thanks!
This depends on a lot of factors. Also note that threads may be more appropriate than processes. Or maybe not. Again: it depends. But: is this work CPU-bound? Network-bound? Or bound by what the database server can do? Adding concurrency helps with CPU-bound, and when talking to multiple independent resources. Fighting over the same network connection or the same database server is unlikely to improve things - and can make things much worse.
Frankly, from the sound of it your best bet may be to re-work the sproc to work in batches (rather than individual records).
To answer this question properly you need to know the current resource utilization of the database server: can it take extra load? Or, simpler - just try it and see.
It really depends what the stored procedure is doing. If the stored procedure is going to be updating the records, and you have a single database instance then there is going to be contention when writing the data back.
The values at play here, are:
The time it takes to read the data into your application memory (which also depends on whether you are using client-side or SQL-Server-side cursors).
The time it takes to process, or do your application logic.
The time it takes to write an updated item back (assuming the proc updates).
One solution (and this is by no means a perfect solution without knowing the exact requirements), is:
Have X servers read Y records, and process them.
Have those servers write the results back to a dedicated writing server in a serialized fashion to avoid the contention.
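The "X servers read Y records" split above boils down to partitioning the record set; a sketch of dividing N record IDs across W workers (the IDs are simulated here; each worker would then call the stored proc for its own chunk only, and send results to the dedicated writer):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    // Slice the ID list into W contiguous chunks, one per worker.
    static List<List<int>> Partition(List<int> recordIds, int workers)
    {
        var chunks = new List<List<int>>();
        int chunkSize = (recordIds.Count + workers - 1) / workers; // ceiling division
        for (int w = 0; w < workers; w++)
            chunks.Add(recordIds.Skip(w * chunkSize).Take(chunkSize).ToList());
        return chunks;
    }

    static void Main()
    {
        var ids = Enumerable.Range(1, 10_000).ToList();
        var chunks = Partition(ids, 2); // the "each processes half" scenario

        Console.WriteLine($"Worker 0: {chunks[0].Count} records, " +
                          $"Worker 1: {chunks[1].Count} records");
        // Non-overlapping ranges mean the workers never fight over the
        // same rows; contention then only remains on shared server resources.
    }
}
```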

Loading multiple large ADO.NET DataTables/DataReaders - Performance improvements

I need to load the results of multiple SQL statements from SQL Server into DataTables. Most of the statements return some 10,000 to 100,000 records and each takes up to a few seconds to load.
My guess is that this is simply due to the amount of data that needs to be shoved around. The statements themselves don't take much time to process.
So I tried to use Parallel.For() to load the data in parallel, hoping that the overall processing time would decrease. I do get a 10% performance increase, but that is not enough. A reason might be that my machine is only a dual core, thus limiting the benefit here. The server on which the program will be deployed has 16 cores though.
My question is: how could I improve the performance further? Would Asynchronous Data Service Queries (BeginExecute, etc.) be a better solution than PLINQ? Or maybe some other approach?
The SQL Server instance is running on the same machine. This is also the case on the deployment server.
EDIT:
I've run some tests with using a DataReader instead of a DataTable. This already decreased the load times by about 50%. Great! Still I am wondering whether parallel processing with BeginExecute would improve the overall load time if a multiprocessor machine is used. Does anybody have experience with this? Thanks for any help on this!
UPDATE:
I found that about half of the loading time was consumed by processing the SQL statements. In SQL Server Management Studio the statements took only a fraction of the time, but somehow they take much longer through ADO.NET. By using DataReaders instead of loading DataTables, and by adapting the SQL statements, I've come down to about 25% of the initial loading time. Loading the DataReaders in parallel threads with Parallel.For() does not make an improvement here. So for now I am happy with the result and will leave it at that. Maybe when we update to .NET 4.5 I'll give asynchronous DataReader loading a try.
My guess is that this is simply due to the amount of data that needs to be shoved around.
No, it is due to using a SLOW framework. I am pulling nearly a million rows into a dictionary in less than 5 seconds in one of my apps. DataTables are SLOW.
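The overhead claim above can be illustrated without a database at all: populating a DataTable pays for boxing, change tracking and constraint checks on every value, while a plain dictionary does not. A rough in-memory comparison (row count and column names are arbitrary):

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Diagnostics;

class Program
{
    const int Rows = 200_000;

    static void Main()
    {
        // Populate a DataTable: every value passes through the DataRow machinery.
        var table = new DataTable();
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Name", typeof(string));

        var swTable = Stopwatch.StartNew();
        for (int i = 0; i < Rows; i++)
            table.Rows.Add(i, "name" + i);
        swTable.Stop();

        // Populate a plain dictionary keyed on Id.
        var dict = new Dictionary<int, string>(Rows);
        var swDict = Stopwatch.StartNew();
        for (int i = 0; i < Rows; i++)
            dict[i] = "name" + i;
        swDict.Stop();

        Console.WriteLine($"DataTable: {swTable.ElapsedMilliseconds} ms, " +
                          $"Dictionary: {swDict.ElapsedMilliseconds} ms");
    }
}
```

The absolute numbers vary by machine, but the DataTable path is consistently the slower of the two, which is the point being made.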
You have to change the nature of the problem. Let's be honest: who needs to view 10,000 to 100,000 records per request? I think no one.
You need to consider handling paging, and in your case paging should be done on the SQL Server side. To make this clear, let's say you have a stored procedure named "GetRecords". Modify it to accept a page parameter and return only the data relevant for that specific page (say, 100 records) plus the total page count. In the app, just show those 100 records (they will fly) and track the selected page index.
Hope this helps, best regards!
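The paging arithmetic behind the suggestion above looks like this (page size of 100 is the value suggested; `Skip`/`Take` over an in-memory list stands in for the server-side `OFFSET ... FETCH NEXT` the stored procedure would use):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    const int PageSize = 100; // as suggested above

    // Server-side equivalent (SQL Server 2012+):
    //   ORDER BY Id OFFSET @page * @size ROWS FETCH NEXT @size ROWS ONLY
    static (List<int> Page, int TotalPages) GetPage(List<int> allRecords, int pageIndex)
    {
        int totalPages = (allRecords.Count + PageSize - 1) / PageSize; // ceiling
        var page = allRecords.Skip(pageIndex * PageSize).Take(PageSize).ToList();
        return (page, totalPages);
    }

    static void Main()
    {
        var records = Enumerable.Range(1, 10_000).ToList(); // simulated result set
        var (page, totalPages) = GetPage(records, pageIndex: 3);

        Console.WriteLine($"Page 3 has {page.Count} records (first = {page[0]}), " +
                          $"{totalPages} pages total");
    }
}
```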
Do you often have to load these requests? If so, why not use a distributed cache?

What advantages does one big database query have over many small ones

I inherited the app. What it does is fetch data from 4 views (each record containing an XML document) in chunks of 1,000 records, then write them out to XML files, all of this split up by a type parameter that has 9 different possible values. That means in the worst case there will be 36 connections to the database for each 1,000 records of a given type/view combination.
The real data will consist of 90,000 lines, which in this case means 900-936 fetches of up to 1,000 lines from the database.
Now I am wondering what advantages it would give to read all the data into the app and have the app work from that to write the 900+ files.
1,000 lines is about 800 MB; 90,000 lines is approx 81 GB of data being transferred.
The code would have to be rewritten to read it all at once, and although that would make more sense, this is a one-time job: after the 90,000 lines, we will never use this code again. Is it worth spending 2-3 hours rewriting code that works just to reduce the number of connections this way?
If it's a one-time thing, then why spend any effort at all optimizing it? Short answer: don't.
Let me add, though, in answer to your general question of what advantage does a big query have over lots of small ones: probably none. If you run a huge query you are leaving a lot of magic up to the middleware, it may or may not work well.
While having 36 simultaneous connections isn't optimal either, it's probably better than running a query that could return 80 gigabytes of data. The ideal solution (if you had to use this code more than once) would be to rewrite it to fetch data in chunks without leaving lots of connections open simultaneously.
Does the code already work? If it does, I wouldn't spend time rewriting it: you run the risk of introducing bugs. Since you will use this once and never again, it doesn't seem worth the effort.
If we are talking SQL Server, the biggest disadvantage of a large query (a single batch) over many small ones (note the opposite sense to the question you are asking) is that there can only be one query plan per batch.
If it's a one-time job, I'd say no. Many times I have done things that I normally wouldn't (cursors, for example), but ONLY because it was a one-time job.
Ask yourself if it makes sense to spend 2 to 3 hours on something that already works and that you will never use again. There are obviously other factors to take into account, though - like, will this lock up your production database for 2-3 hours?
If there are no disastrous side effects I'd say use what you have.

Batch Insert or one by one

I am building a system where data will be added by a user every 30 seconds or so. In this case, should I go for batch inserts (insert after every 2 minutes), or should I insert every time the user enters data? The system is to be built on C# 3.5 and SQL Server.
Start with normal inserts. You're nowhere near having to optimize this.
If performance becomes a problem, or it would be obvious that it may be a concern, only then do you need to look at optimizing -- and even then, it may not be an issue with inserts! Use a profiler to determine the bottleneck.
Once every 30 seconds is no significant stress. By the KISS principle, I prefer one-by-one in this case.
If it is every 30 seconds, I would go for inserting immediately, provided the insert is as quick as it should be (< 2 seconds for large data).
If you see potential future growth toward more frequent transactions, then consider batching the inserts.
It really depends on your requirements, are you able to give more information?
For example:
Are you running a web app?
Is the data time-sensitive (i.e. to be used by other users)?
Do you have to worry about concurrent usage of the data?
What volume of transactions are you looking at?
For small amounts of data I would simply insert the rows as the user requires; the gain from caching will likely be minimal, and it simplifies your implementation. If usage is expected to be quite high, come back and look at optimising the design. It comes back to the general rule of avoiding premature optimisation :)
