I have a server application that receives data from clients that must be stored in a database.
Client/server communication is done with ServiceStack, and for every client call there can be 1 or more records to be written.
The clients don't need to wait for the data to be written, or to know whether it has been written.
At my customer's site the database may sometimes be unavailable for short periods, so I want to retry the writing until the database is available again.
I can't use a service bus or other software; it must be only my server and the database.
I considered two possibilities:
1) Fire a thread for every call, to write the record (or group of records, with a multi-row insert); in case of failure it retries until it succeeds.
2) Enqueue the data to be written in a global in-memory list, and have a single background thread continuously make a single call to the DB (with a multi-row insert).
What do you consider the most efficient way to do it? Or do you have another proposal?
Option 1 is easier, but I'm worried about having too many threads running at the same time, especially if the DB becomes unavailable.
In case I follow the second route, my idea is:
1) every server thread opened by a client locks the global list, inserts the 1 or more records to be written to the DB, releases the lock and exits
2) the background thread locks the global list (which has, for example, 50 records), makes a deep copy into a temp list, and unlocks the global list
3) the server threads keep adding data to the global list; in the meantime the background thread tries to write the 50 records, retrying until it succeeds
4) when the background thread manages to write, it locks the global list again (which may now have 80 records), removes the first 50 that have been written, and everything starts again
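To make this concrete, here is a rough sketch of what I have in mind (Record and TryWriteToDb are placeholders, not real code from my project):

using System.Collections.Generic;
using System.Threading;

class Record { /* fields omitted */ }

static class PendingWriter
{
    static readonly object Gate = new object();
    static readonly List<Record> Pending = new List<Record>();    // the global list

    // Called by each server thread handling a client call; the lock is held only for the add.
    public static void Enqueue(IEnumerable<Record> records)
    {
        lock (Gate) Pending.AddRange(records);
    }

    // The single background writer thread.
    public static void WriterLoop()
    {
        while (true)
        {
            List<Record> batch;
            lock (Gate) batch = new List<Record>(Pending);        // copy, then release immediately

            if (batch.Count > 0 && TryWriteToDb(batch))           // multi-row insert, retried on the next pass
            {
                lock (Gate) Pending.RemoveRange(0, batch.Count);  // remove only what was written
            }

            Thread.Sleep(200);                                    // small pause between attempts
        }
    }

    static bool TryWriteToDb(List<Record> batch)
    {
        return true; // replace with the real insert; return false while the DB is unavailable
    }
}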
Is there a better way to do this?
--------- EDIT ----------
My issue is that I don't want the client to have to wait in any way, not even for adding the record-to-be-sent to a locked list (which would happen while the writing thread is writing, or trying to write, the list to the DB).
That's why in my solution I lock the list only for the time it takes to copy it to a temporary list that will be written to the DB.
I'm just wondering if this is crazy and there is a much simpler solution that I'm not seeing.
My understanding of the problem is as follows:
1. Client sends data to be inserted into the DB
2. Server receives the data and inserts it into the DB
3. Client doesn't want to know whether or not the data was inserted properly
In this case, I would suggest the following: let the server create a single queue which holds the data to be inserted into the DB, and let the receiving thread just take the data from the client and put it into the in-memory queue. This queue can then be emptied by another thread which takes care of writing to the DB to persist the data.
You may even use a file-based queue, a priority queue, or just an in-memory queue for storing the records temporarily.
If you use the .NET thread pool you don't need to worry about creating too many threads, as thread lifetime is managed for you.
Task.Factory.StartNew(DbWriteMethodHere)
If you want to be smarter you could add the records you want to commit to a BlockingCollection<T>, and then have a thread accumulate a batch by repeatedly calling Take (which blocks until an item is available) until it has a big enough batch, say 50 records, to commit.
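A sketch of that batching consumer (Record and WriteBatchToDb are placeholders; note that Take() returns a single item, so the batch is accumulated in a loop):

using System.Collections.Concurrent;
using System.Collections.Generic;

class Record { /* fields omitted */ }

static class BatchWriter
{
    static readonly BlockingCollection<Record> Records = new BlockingCollection<Record>();
    const int BatchSize = 50;

    // Producers (the request threads) just call Records.Add(record).
    public static void ConsumerLoop()
    {
        while (true)
        {
            var batch = new List<Record> { Records.Take() };     // blocks until one item is available
            Record next;
            while (batch.Count < BatchSize && Records.TryTake(out next))
                batch.Add(next);                                 // drain whatever else is queued, up to 50
            WriteBatchToDb(batch);                               // placeholder for the commit
        }
    }

    static void WriteBatchToDb(List<Record> batch) { /* multi-row insert goes here */ }
}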
I'm dealing with a CSV file that's being imported client-side.
This CSV is supposed to contain information that will be used to update one of the tables in my company's database.
My C# function processes the file, looking for errors, and if no errors are found it sends a bunch of update commands (files usually vary from 50 to 100,000 lines).
Until now, I was performing the update in the same thread (line by line), but it was getting a little slow, depending on the file. So I chose to send all the SQL to an Azure SQL Queue (a service that receives lots of "messages" and runs the SQL code against the database), so that the client wouldn't have to wait as long for the action to be performed.
It got a little faster, but it still takes a long time (due to the requests to the Azure SQL Queue). So I tried putting that action in a separate thread, which worked: the thread sends all the SQL to my Azure SQL Queue.
I got a little worried about it though. Is it really safe to perform long actions in separate threads? Is it reliable?
A second thread is 100% like the main thread that you're used to working with. I wish I had an authoritative answer at hand, but this is such a common practice that people don't write those anymore...
So, YES, offloading the work to a second thread is safe, and it is considered by most to be the recommended way to go about it.
Edit 1
OK, if your thread is running inside IIS, you need to register it or it will die: once the request/response cycle finishes, IIS is free to kill it...
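For example, on ASP.NET 4.5.2 or later you can let the runtime track the work for you (this is one concrete registration mechanism; SendAllSqlToAzureQueue is a placeholder for your action):

using System.Web.Hosting;

// Work queued this way is registered with ASP.NET, which delays app-domain
// shutdown (within limits) until the work item completes or is cancelled.
HostingEnvironment.QueueBackgroundWorkItem(cancellationToken =>
{
    SendAllSqlToAzureQueue();   // the long-running action
});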
I am doing a project that needs to communicate with 20 small computer boards. I will need to keep track of their connections, and they will also return some data to me. So my aim is to build a control/monitoring system for these boards.
I will be using Visual Studio 2010 and C# WPF.
My idea/ plan would be like this:
On the main thread:
There will be only one control window, so a main thread will be created mainly to update the data to be displayed. Data from each board will be displayed and refreshed at an interval of 1 s. The source of the data will be a database, where the main thread will look for the latest data (I have not decided which kind of database to use yet).
There will be control buttons on the control window too. I already have a .dll library, so I will only need to call its functions to direct the boards to action (by starting another thread).
There will be two services:
(Timer service) One will be a scheduled timer to turn the boards on/off at specific times. Users would be able to change the on/off times, which the service reads from the database.
(Connection service) The other will be responsible for asking for and receiving information/status from the boards every 30 s or less. The work includes connecting to a board over the internet, asking for data, receiving the data and then writing it to the database, as well as logging the exceptions thrown if the internet connection fails.
My questions:
1) For the connection service, I am wondering if I should start 20 threads, one per board connection. If the connections were made by only one thread, each board connection would have to wait for the previous one to finish, and each may take 1-2 minutes, so I would need around 20-40 minutes to get all the data back. But if I split the connections across 20 threads, will it make a big difference in performance? The 20 threads never die; each keeps asking for data every 30 s if possible. Besides, does that mean I will have to have 20 databases, since 20 threads writing at the same time would clash in the database?
2) To update the display of the data on the main thread every 1 s, should I also start a service? And since the connection service is accessing the same database, will this clash in the database?
There will be more than 100 boards to control and monitor in the future, so I would like to make the program as light as possible.
Thank you very much! Comments and ideas very much appreciated!
Starting 20 threads would be the best bet. (Or, as Ralf said, use a thread when needed; in your specific case it would probably reach 20 at some point.) Most databases are thread-safe, meaning you can write to them from separate threads. If you use a "real" database, this isn't an issue at all.
No, use a Timer on the main thread to update your UI. The UI can easily read from the DB. As long as the update action itself doesn't take a lot of time, it is OK to do it on the UI thread.
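For the WPF window described in the question, a DispatcherTimer is one way to do this (a sketch; BoardGrid and LoadLatestReadings are placeholders):

using System;
using System.Windows.Threading;

// In the window's constructor or Loaded handler:
var timer = new DispatcherTimer { Interval = TimeSpan.FromSeconds(1) };
timer.Tick += (s, e) =>
{
    // Tick runs on the UI thread, so it can touch controls directly.
    BoardGrid.ItemsSource = LoadLatestReadings();   // your own DB query
};
timer.Start();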
1) Why not use threads as needed? And you can use one DBMS; they are built to process large amounts of information.
2) Not sure what you mean by starting a service for the UI thread. As with 1), database management systems are built to process data.
I am working on a real-time simulation project for a vehicle and looking for advice regarding the best solution in C# to handle the data generated at each timestep.
Basically, I've got the main engine that computes a solution in real time and can live on its own. In parallel, I need to store the generated data somehow, but without any real-time requirements. At each timestep I am generating sqlite command lines, and I am looking for a way to execute them in parallel without slowing down the main engine.
Is there any advice on how to put together the best structure to handle this problem?
I don't know about "best", but a very good solution would be to put the data into a queue and have a separate thread that reads data from the queue and persists it.
The primary advantage is that the thread collecting the data doesn't get bogged down waiting for the database. It can just enqueue data for that timestep and go back to what it's doing (presumably getting data for the next timestep).
The thread that's persisting the data can run independently. If the database is fast enough, it can do a database insert for every timestep. Or it might batch the data and send multiple records at once in a batch insert.
To make all this happen, create a BlockingCollection (shared queue) that the collecting thread writes to and the persisting thread reads from. BlockingCollection handles multiple producers and multiple consumers without any need for you to do explicit locking or anything like that. It's really easy to use, it performs quite well, and it makes this kind of thing very quick to implement.
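A minimal sketch of that setup (TimestepData and the sqlite call are placeholders):

using System.Collections.Concurrent;
using System.Threading.Tasks;

class TimestepData { /* fields omitted */ }

class Recorder
{
    readonly BlockingCollection<TimestepData> _queue = new BlockingCollection<TimestepData>();
    readonly Task _persister;

    public Recorder()
    {
        // Consumer: blocks while the queue is empty, wakes as data arrives.
        _persister = Task.Run(() =>
        {
            foreach (var data in _queue.GetConsumingEnumerable())
                InsertIntoSqlite(data);         // could also accumulate and batch-insert
        });
    }

    // Called by the engine at each timestep; returns immediately.
    public void Record(TimestepData data)
    {
        _queue.Add(data);
    }

    // At shutdown: stop accepting data and let the persister drain the queue.
    public void Finish()
    {
        _queue.CompleteAdding();
        _persister.Wait();
    }

    void InsertIntoSqlite(TimestepData data) { /* execute the sqlite command here */ }
}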
First off, I will be talking about some legacy code, and we are trying to avoid changing it as much as possible. Also, my experience with Windows services and WCF is a bit limited, so some of the questions may be a bit newbie. Just to give a bit of context before the question.
We have an existing service that loops. It checks via a database call to see if it has records to process. If it does not find any records, it sleeps for 30 seconds and then wakes back up to try again.
I would like to add an entry point to this service that would allow me to pass a record to it, in addition to it processing the records from the database. So the basic flow would be:
Loop
* Read record from database
* If no record from DB, process any records that were passed in via the entry point.
* If there are no records at all, sleep for 30 seconds.
My concern is this: is it possible to implement this in one service, such that I have the looping process but also allow calls to come in at any time and add items to a queue that can be processed within the loop? My concern is with concurrency and keeping the loop and the listener from stepping on each other.
I know this question may not be worded quite right, but I am new to working with this. Any help would be appreciated.
My concern is with concurrency and keeping the loop and the listener from stepping on each other.
This shouldn't be an issue, provided you synchronize access correctly.
The simplest option might be to use a thread safe collection, such as a ConcurrentQueue<T>, to hold your items to process. The WCF service can just add items to the collection without worry, and your next processing step would handle it. The synchronization in this case is really minimal, as the queue would already be fully thread safe.
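A sketch of how that fits into the existing loop (Record, ReadRecordFromDatabase and Process stand in for the legacy code):

using System;
using System.Collections.Concurrent;
using System.Threading;

class Record { /* fields omitted */ }

class ProcessingService
{
    readonly ConcurrentQueue<Record> _passedIn = new ConcurrentQueue<Record>();

    // The WCF operation: enqueue and return; no explicit locking needed.
    public void Submit(Record record)
    {
        _passedIn.Enqueue(record);
    }

    // The existing loop, extended with the queue.
    public void Run()
    {
        while (true)
        {
            Record record = ReadRecordFromDatabase();        // the existing DB call
            if (record != null)
            {
                Process(record);
            }
            else if (_passedIn.TryDequeue(out record))       // drain passed-in records next
            {
                Process(record);
            }
            else
            {
                Thread.Sleep(TimeSpan.FromSeconds(30));      // nothing at all: sleep as before
            }
        }
    }

    Record ReadRecordFromDatabase() { return null; /* legacy lookup */ }
    void Process(Record record) { /* legacy processing */ }
}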
In addition to Reed's excellent answer, you might want to persist the records in a MSMQ queue to prevent your service from losing records on shutdown, restart, or crash of your service.
I'm developing a service that needs to be scalable on the Windows platform.
Initially it will receive approximately 50 connections per second (each connection sending approximately 5 KB of data), but it needs to scale to receive more than 500 in the future.
It's impractical (I guess) to save the received data to a common database like Microsoft SQL Server.
Is there another solution to save the data, considering that it will receive more than 6 million "records" per day?
There are 5 steps:
Receive the data via HTTP handler (C#);
Save the received data; <- HERE
Request the saved data to be processed;
Process the requested data;
Save the processed data. <- HERE
My pre-solution is:
Receive the data via HTTP handler (C#);
Save the received data to a message queue (MSMQ);
Request the saved data from MSMQ, to be processed by a Windows service;
Process the requested data;
Save the processed data to Microsoft SQL Server (here's the bottleneck).
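Sketched out, the two MSMQ hops might look roughly like this (the queue path and method names are invented, and the queue must exist beforehand):

using System.Messaging;

// In the HTTP handler: persist the raw payload and return immediately.
using (var queue = new MessageQueue(@".\private$\incoming"))
{
    queue.Send(receivedData);                    // receivedData: the ~5 KB payload
}

// In the Windows service: block until a message arrives, then process it.
using (var queue = new MessageQueue(@".\private$\incoming"))
{
    queue.Formatter = new XmlMessageFormatter(new[] { typeof(string) });
    var message = queue.Receive();
    ProcessData((string)message.Body);           // placeholder for the processing step
}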
6 million records per day doesn't sound particularly huge. In particular, that's not 500 per second for 24 hours a day - do you expect traffic to be "bursty"?
I wouldn't personally use message queue - I've been bitten by instability and general difficulties before now. I'd probably just write straight to disk. In memory, use a producer/consumer queue with a single thread writing to disk. Producers will just dump records to be written into the queue.
Have a separate batch task which will insert a bunch of records into the database at a time.
Benchmark to find the optimal (or at least a "good") number of records to upload per batch. You may well want to have one thread reading from disk and a separate one writing to the database (with the file thread blocking if the database thread has a big backlog) so that you don't wait for both file access and the database at the same time.
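A sketch of the in-memory producer/consumer feeding a single disk writer (paths and formats are simplified; the batch uploader that reads the file back is left out):

using System.Collections.Concurrent;
using System.IO;
using System.Threading;

class DiskJournal
{
    readonly BlockingCollection<string> _queue =
        new BlockingCollection<string>(boundedCapacity: 100000);

    public DiskJournal(string path)
    {
        var writer = new Thread(() =>
        {
            using (var file = new StreamWriter(path, append: true))
            {
                foreach (var line in _queue.GetConsumingEnumerable())
                {
                    file.WriteLine(line);        // single writer: no file contention
                    file.Flush();
                }
            }
        });
        writer.IsBackground = true;
        writer.Start();
    }

    // Called by the HTTP handlers (producers); returns immediately unless
    // the queue is full, which applies back-pressure.
    public void Enqueue(string record)
    {
        _queue.Add(record);
    }

    public void Shutdown()
    {
        _queue.CompleteAdding();
    }
}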
I suggest that you do some tests nice and early, to see what the database can cope with (and letting you test various different configurations). Work out where the bottlenecks are, and how much they're going to hurt you.
I think that you're prematurely optimizing. If you need to send everything into a database, then see if the database can handle it before assuming that the database is the bottleneck.
If the database can't handle it, then maybe turn to a disk-based queue like Jon Skeet is describing.
Why not do this:
1.) Receive data
2.) Process data
3.) Save the original and processed data at once
That would save you the trouble of requesting it again if you already have it. I'd be more worried about your table structure and your database machine than the actual flow, though. I'd make sure that your inserts are as cheap as possible. If that isn't possible, then queuing up the work makes some sense. I wouldn't use a message queue myself. Assuming you have a decent SQL Server machine, 6 million records a day should be fine, as long as you're not writing a ton of data in each record.