Database Insert Performance When Grabbing Items From Queue - c#

We're using RabbitMQ for storing lightweight messages that we eventually want to store in our SQL Server database. There will be times when the queue is empty and times when there is a spike of traffic - 30,000 messages.
We have a C# console app running on the same server.
Do we have the console app run every minute or so and grab a designated number of items off the queue for insertion into the database? (taking manageable bites)
OR
Do we have the console app always "listen" and hammer items into the database as they come in? (more aggressive approach)

Personally I'd go for the first approach. During those "spike" times, you're going to be hammering the database with potentially 30,000 inserts. Whilst this could potentially complete quite quickly (depending on many variables outside the scope of this question), we could do this a little smarter.
Firstly, by periodically polling, you can grab "x" messages from the queue and bulk insert them in a single go (performance-wise, you might want to tweak the 2 variables here... polling time and how many you take from the queue).
One problem with this approach is that you might end up falling behind during busy periods. So you could make your application change its polling time based on how many messages it is receiving, whilst keeping it between some min/max thresholds. E.g. if you suddenly get a spike and grab 500 messages, you might decrease your poll time. If on the next poll you can still get thousands, decrease it again. As the number you are able to get drops off below a particular threshold, you can begin increasing your polling time again.
This would give you the best of both worlds imho and be reactive to the spikes/lull periods.
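A minimal sketch of that adaptive poller (DequeueBatch and BulkInsert are hypothetical placeholders for the RabbitMQ consume and the database write; every threshold here is an assumption to tune):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

class AdaptivePoller
{
    // Placeholders for your RabbitMQ consume and SQL bulk-insert code.
    static List<string> DequeueBatch(int max) { /* basic.get up to max messages */ return new List<string>(); }
    static void BulkInsert(List<string> batch) { /* SqlBulkCopy etc. */ }

    static void Main()
    {
        var pollDelay = TimeSpan.FromSeconds(30);
        var minDelay  = TimeSpan.FromSeconds(1);
        var maxDelay  = TimeSpan.FromSeconds(60);
        const int batchSize = 500;

        while (true)
        {
            var batch = DequeueBatch(batchSize);
            if (batch.Count > 0)
                BulkInsert(batch);   // one bulk insert instead of hundreds of single inserts

            // Full batch => we're probably behind: halve the delay (down to the minimum).
            // Short batch => traffic has dropped off: double the delay (up to the maximum).
            pollDelay = batch.Count >= batchSize
                ? TimeSpan.FromTicks(Math.Max(minDelay.Ticks, pollDelay.Ticks / 2))
                : TimeSpan.FromTicks(Math.Min(maxDelay.Ticks, pollDelay.Ticks * 2));

            Thread.Sleep(pollDelay);
        }
    }
}
```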

It depends a bit on your requirements, but I would create a service that uses SqlBulkCopy to bulk insert every couple of minutes. This is by far the fastest approach. Also, if your spike is 30k records, I would not worry too much about falling behind.
http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlbulkcopy.aspx
We have a C# console app running on the same server.
Why not a Service?
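A minimal SqlBulkCopy sketch, assuming a made-up dbo.Messages table; you'd map your dequeued messages into the DataTable first:

```csharp
using System.Data;
using System.Data.SqlClient;

class BulkWriter
{
    // Insert an entire batch in one operation instead of row-by-row INSERTs.
    // "dbo.Messages" and its columns are assumed names; map them to your schema.
    public static void BulkInsert(DataTable batch, string connectionString)
    {
        using (var connection = new SqlConnection(connectionString))
        using (var bulk = new SqlBulkCopy(connection))
        {
            connection.Open();
            bulk.DestinationTableName = "dbo.Messages";
            bulk.BatchSize = 5000;                         // rows per round trip to the server
            bulk.ColumnMappings.Add("Body", "Body");       // source column -> destination column
            bulk.ColumnMappings.Add("ReceivedAt", "ReceivedAt");
            bulk.WriteToServer(batch);
        }
    }
}
```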

What I would do is have the console app always listen to RabbitMQ, and then inside the console app build your own queue for the database inserts; that way you can throttle the insertion rate. In busy times you control the flow by only allowing so many tasks at once, and in slow times you get a faster reaction than polling every so often. The way I would do this is by raising an event whenever there is something in the queue, then checking the queue length to decide how many transactions you want to process.
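One way to sketch that internal queue is with a BlockingCollection: the RabbitMQ handler adds messages as they arrive, and a small fixed pool of writer tasks drains it, which caps the database concurrency (the writer count and the placeholder insert are assumptions):

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

class ThrottledWriter
{
    // The RabbitMQ consumer adds incoming messages here as they arrive.
    static readonly BlockingCollection<string> pending = new BlockingCollection<string>();

    static void InsertIntoDatabase(string message) { /* placeholder for your insert */ }

    static void Main()
    {
        const int writerCount = 4; // caps concurrent database work; tune to taste

        var writers = new Task[writerCount];
        for (int i = 0; i < writerCount; i++)
        {
            writers[i] = Task.Run(() =>
            {
                // GetConsumingEnumerable blocks when the queue is empty,
                // so quiet periods cost nothing and busy periods drain immediately.
                foreach (var message in pending.GetConsumingEnumerable())
                    InsertIntoDatabase(message);
            });
        }

        // In the real app the RabbitMQ "Received" handler would call pending.Add(message);
        // here we just simulate a burst.
        for (int i = 0; i < 100; i++) pending.Add("message " + i);

        pending.CompleteAdding();   // signal no more messages; writers finish and exit
        Task.WaitAll(writers);
    }
}
```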

Instead of using a Console Application, you could set up a Windows Service, and set up a timer on the service to poll every n minutes. Take a look at the links below:
http://www.codeproject.com/Questions/189250/how-to-use-a-timer-in-windows-service
http://msdn.microsoft.com/en-us/library/zt39148a.aspx
With a Windows Service, if the server is rebooted, the service can be set up to restart automatically.
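The skeleton of such a service might look roughly like this (a sketch only; the class name and five-minute interval are arbitrary choices):

```csharp
using System.ServiceProcess;
using System.Timers;

// Windows Service that polls on a fixed interval. Install with installutil/sc,
// and set the service's recovery options so it restarts after a reboot or crash.
public class QueueDrainService : ServiceBase
{
    private Timer timer;

    protected override void OnStart(string[] args)
    {
        timer = new Timer(5 * 60 * 1000);   // poll every 5 minutes (arbitrary)
        timer.Elapsed += (s, e) => DrainQueue();
        timer.Start();
    }

    protected override void OnStop()
    {
        timer.Stop();
        timer.Dispose();
    }

    private void DrainQueue()
    {
        // Placeholder: pull a batch from the queue and bulk insert it.
    }

    public static void Main() => ServiceBase.Run(new QueueDrainService());
}
```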

Related

Best practice to have c# winform gui update continuously via Redis versus polling database

I am developing a C# winform GUI that revolves around a few datagridviews that need to be updated at some manageable interval depending on the number of updates, somewhere around the 1 - 3 second mark.
Currently, upon startup, it retrieves the current batch of data for the grids from a mysql database, then listens to various redis channels for updates/adds/deletes to those grids. These updates are handled by the redis multiplexer, heavy calculations are offloaded to worker threads, and then the gui is updated. So essentially it is one to one: one update/add/delete is processed and the gui is then updated. This works well enough so far; however, during a crunch of data I'm beginning to notice slowness, which shouldn't be happening given that I'm expecting much heavier loads in the future.
The system currently sees at most around 100 redis messages every couple of seconds, but as throughput grows it should be designed to handle thousands.
Conceptually, when it comes to general gui design in this fashion, would it be better to do one of the following:
1. Decouple from the current 1-to-1 update scenario described above (redis msg -> process -> update gui) and have all redis messages queue up in a list or datatable; then have the gui poll this awaiting-update queue on a timer and update itself. This way the gui is not flooded; it updates on its own schedule. (A sketch of this option follows below.)
2. Since these updates coming from redis are also persisted in the mysql database, ignore redis completely and query the database at some timed interval. However, this would probably mean requeueing everything, since it will be tough to know what has changed since the last pull.
3. Do away with attempting to update the gui in semi-realtime fashion and only provide a summary view, retrieving data accordingly if the user digs in. But this still runs into the same problem, as the data then being viewed should also be updated, albeit a smaller subset. Then again, there exist tons of sophisticated C# enterprise-level applications, especially in the finance industry, that present large amounts of updating data and seem to work just fine.
What is best practice here? I prefer options 1 or 2, because in theory they should work.
Thank you in advance.
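For what it's worth, option 1 could be sketched along these lines: redis handlers enqueue into a thread-safe queue, and a WinForms timer drains it in batches on the UI thread (the names and the 1-second interval are invented):

```csharp
using System.Collections.Concurrent;
using System.Windows.Forms;

public class DashboardForm : Form
{
    // Redis subscription callbacks (on worker threads) enqueue here instead of touching the grid.
    private readonly ConcurrentQueue<RowUpdate> pendingUpdates = new ConcurrentQueue<RowUpdate>();
    private readonly Timer uiTimer = new Timer();   // System.Windows.Forms.Timer ticks on the UI thread

    public DashboardForm()
    {
        uiTimer.Interval = 1000;                    // redraw at most once a second (assumed interval)
        uiTimer.Tick += (s, e) => DrainPendingUpdates();
        uiTimer.Start();
    }

    // Called by the redis handlers; safe from any thread.
    public void Enqueue(RowUpdate update) => pendingUpdates.Enqueue(update);

    private void DrainPendingUpdates()
    {
        // Apply everything queued since the last tick in one pass,
        // so a burst of redis messages becomes a single grid refresh.
        while (pendingUpdates.TryDequeue(out RowUpdate update))
            ApplyToGrid(update);
    }

    private void ApplyToGrid(RowUpdate update)
    {
        // Placeholder: update the DataGridView's bound data source here.
    }
}

public class RowUpdate { /* whatever fields your redis messages carry */ }
```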

How to create Windows Service with unknown periodicity?

The task is to create a Windows Service which should periodically connect to a SQL Server database (it contains GPS data from hundreds or thousands of cars), read data from a table, process it, and write the result to another table.
The problem is that, depending on how much data is in the database, processing time can vary from milliseconds to several hours.
If there is a lot of data, it should wait until the previous processing run ends and then start another iteration.
If there is not much data, it should accumulate at least 500 GPS records, process them, and start a new iteration.
Please provide your examples with C#.
P.S.
Processing the GPS data means generating complex car events, for example detecting overspeed, stop points, entry into specific geographical zones and so on...
From the algorithmic point of view generating some of these events can be resource intensive.
P.P.S
I have already created this, but as a console application with an infinite loop. I'm new to Windows services and I don't know how to implement such functionality correctly as a Windows service.
I would create a watchdog service that checks whether the processing application is running every X seconds/minutes/hours (depending on how often you want to process the data).
If it is running, then wait until the next scheduled event time and check again.
If it's not running, then start the processing application.
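A rough sketch of that watchdog (the process name, path and check interval are assumptions):

```csharp
using System.Diagnostics;
using System.ServiceProcess;
using System.Timers;

// Watchdog service: every interval, start the processing app if it isn't already running.
public class GpsWatchdogService : ServiceBase
{
    private Timer timer;
    private const string ProcessName = "GpsProcessor";                    // assumed exe name (no .exe)
    private const string ProcessPath = @"C:\Services\GpsProcessor.exe";   // assumed install path

    protected override void OnStart(string[] args)
    {
        timer = new Timer(60 * 1000);   // check once a minute (adjust to taste)
        timer.Elapsed += (s, e) =>
        {
            // If a previous run is still chewing through a large backlog, leave it alone.
            if (Process.GetProcessesByName(ProcessName).Length == 0)
                Process.Start(ProcessPath);
        };
        timer.Start();
    }

    protected override void OnStop() => timer.Stop();

    public static void Main() => ServiceBase.Run(new GpsWatchdogService());
}
```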

multi threading from multiple machines

I have researched a lot and I haven't found anything that meets my needs. I'm hoping someone from SO can shed some light on this.
I have an application where the expected load is thousands of jobs per customer, and I can have hundreds of customers. Currently it is 50 customers with close to 1,000 jobs each. These jobs are time sensitive (scheduled by the customer) and each can run for up to 15 minutes.
In order to scale and meet the schedules, I'm planning to run this multi-threaded on a single server. So far so good. But the business wants to scale further (as needed) by adding more servers into the mix. Currently, when work becomes ready in the database, a console application picks up the first 500 jobs, uses the Task Parallel Library to spawn 10 threads, and waits until they are complete. I can't scale this to another server, because that one could pick up the same records. I can't simply mark the db record as being processed, because if the application crashes on one server, the job would be left in limbo.
I could use a message queue and have multiple machines pick from it. The problem is that the queue has to be transactional to handle crashes. MSMQ supports only MSDTC transactions once a database is involved, and I'm not really comfortable with DTC transactions, especially with multiple threads and multiple machines. Too much maintenance and setup, and possibly unknown issues.
Is SQL Server Service Broker a good approach instead? Has anyone done something like this in a production environment? I also want to keep the transactions short (a job could run for 15-20 minutes, mostly streaming data from a service). The only reason I'm using a transaction is to preserve the message integrity of the queue; I need a job to be re-picked if its worker crashes (i.e. to reappear in the queue).
Any words of wisdom?
Why not have an application receive the jobs and insert them into a table that acts as the job queue? Each work process can then pick up a set of jobs, set their status to "processing", complete the work, and set the status to "done". Other info, such as the name of the server that processed each job and start and end timestamps, could also be logged. Moreover, instead of using multiple threads, you could use independent worker processes so as to make your programming easier.
[EDIT]
SQL Server supports record-level locking, and lock escalation can also be prevented. See Is it possible to force row level locking in SQL Server?. Using such a mechanism, you can have your work processes take exclusive locks on the jobs being processed, until they are done or crash (thereby releasing the lock).
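A sketch of that claim step, using a READPAST hint so competing servers skip rows another worker has locked rather than blocking on them (the dbo.Jobs schema is invented):

```csharp
using System.Collections.Generic;
using System.Data.SqlClient;

class JobClaimer
{
    // Atomically claim up to @batch pending jobs for this server.
    // READPAST lets other workers skip rows we have locked instead of blocking.
    // dbo.Jobs / Status / ClaimedBy are assumed schema; adjust to yours.
    const string ClaimSql = @"
        UPDATE TOP (@batch) dbo.Jobs WITH (ROWLOCK, READPAST)
        SET Status = 'Processing', ClaimedBy = @server, ClaimedAt = SYSUTCDATETIME()
        OUTPUT inserted.JobId
        WHERE Status = 'Pending';";

    public static List<int> ClaimJobs(string connectionString, string server, int batch)
    {
        var claimed = new List<int>();
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(ClaimSql, connection))
        {
            command.Parameters.AddWithValue("@batch", batch);
            command.Parameters.AddWithValue("@server", server);
            connection.Open();
            using (var reader = command.ExecuteReader())
                while (reader.Read())
                    claimed.Add(reader.GetInt32(0));   // job ids this server now owns
        }
        return claimed;
    }
}
```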

Scalability and availability

I am quite confused on which approach to take and what is best practice.
Let's say I have a C# application which does the following:
sends emails from a queue. The emails to send and all their content are stored in the DB.
Now, I know how to make my C# application almost scalable, but I need to go somewhat further.
I want some way of distributing the tasks across, say, X servers, so it is not just one server doing all the processing but the work is shared amongst them.
If one server goes down, then the load is shared between the other servers. I know NLB does this, but I'm not looking for an NLB here.
Sure, you could add a column of some kind in the DB table to indicate which server should process each record, and each of the applications on the servers would have an ID of some kind that matches the value in the DB so they would only pull their own records - but I consider this cheap, bad practice and unrealistic.
Having a DB table row lock is also not something I would do, due to potential deadlocks and other possible issues.
I am also NOT suggesting using threading "to the extreme" here, but yes, there will be threading per item to process, or items batched up per thread for x number of threads.
How should I approach this, and what do you recommend for making a C# application that is scalable and highly available? The aim is to have X servers, each running the same application and each able to get records and process them, but with the processing load shared amongst the servers, so that if one server or service fails, the others can take on that load until the failed server is brought back.
Sorry for my lack of understanding or knowledge, but I have been thinking about this quite a lot and have lost sleep trying to come up with a good, robust solution.
I would be thinking of batching up the work, so each app only pulls back x number of records at a time, marking those retrieved records as taken with a bool field in the table. I'd amend the SELECT statement to pull only records not marked as taken/done. Table locks would be OK in this instance for very short periods, to ensure there is no overlap of apps processing the same records.
EDIT: It's not very elegant, but you could have a datestamp and a status for each entry (instead of the bool field above). Then you could run a periodic Agent job which runs a sproc to reset the status of any records which are In Progress but have gone beyond a time threshold without being set to complete. They would then be ready for reprocessing by another app later on.
This may not be enterprise-y enough for your tastes, but I'd bet my hide that there are plenty of apps out there in the enterprise which are just as un-sophisticated and work just fine. The best things work with the least complexity.
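The reset from the edit above might boil down to a single statement like the following; it's shown wrapped in C# for consistency, but the same SQL could live in the sproc your Agent job calls (the schema names and the 30-minute threshold are assumptions):

```csharp
using System.Data.SqlClient;

class StaleJobSweeper
{
    // Put records stuck "In Progress" past a threshold back into the ready state.
    // dbo.WorkItems / Status / StartedAt are assumed names; so is the 30-minute threshold.
    const string ResetSql = @"
        UPDATE dbo.WorkItems
        SET Status = 'Ready', StartedAt = NULL
        WHERE Status = 'In Progress'
          AND StartedAt < DATEADD(minute, -30, SYSUTCDATETIME());";

    public static int ResetStaleJobs(string connectionString)
    {
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(ResetSql, connection))
        {
            connection.Open();
            return command.ExecuteNonQuery();   // number of records handed back for reprocessing
        }
    }
}
```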

how to synchronize near realtime reads from a sql server table

We have a reporting app that needs to update its charts as the data gets written to its corresponding table (the report is based off just one table). Currently we keep the last read sessionid + rowid (a unique combo) in memory, and a polling timer does a select where rowid > the value we have in memory (to get the latest rows added). The timer runs every second or so, and the fast SQL reader does its job well. So far so good.
However, I feel this is not optimal, because sometimes there are pauses in the data writes by design (e.g. a user clicking the pause button on the system that writes the data). Meanwhile our timer keeps hitting the db and does not get any new rows. No errors or anything. How is this situation normally handled? The app that writes the data is separate from the reporting app, and the two apps run on different machines.
Bottom line: how to get data into a C# app as and when it is written into a SQL Server table, without polling unnecessarily. Thank you.
SQL Server has the capability to notify a waiting application for changes, see The Mysterious Notification. This is how SqlDependency works. But this will only work up to a certain threshold of data change rate. If your data changes too frequently then the cost of setting up a query notification just to be immediately invalidated by receiving the notification is too much. For really high end rates of changes the best place is to notify the application directly from the writer, usually achieved via some forms of a pub-sub infrastructure.
You could also attempt a mixed approach: poll for changes in your display application and only set up a query notification when there are no changes. This way you avoid the cost of constantly setting up Query Notifications when the rate of change is high, but you also get the benefits of not polling once the writes settle down.
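A bare-bones SqlDependency sketch of the notification half (the query and table names are assumptions; Query Notifications also require Service Broker to be enabled and a notification-eligible query, e.g. two-part table names and no SELECT *):

```csharp
using System;
using System.Data.SqlClient;

class ChangeListener
{
    // Register for a one-shot notification on the watched query.
    // Each notification invalidates the subscription, so we re-read and re-subscribe.
    // dbo.Readings / SessionId / RowId / Value are assumed names.
    static void Subscribe(string connectionString)
    {
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(
            "SELECT SessionId, RowId, Value FROM dbo.Readings", connection))
        {
            var dependency = new SqlDependency(command);
            dependency.OnChange += (sender, e) =>
            {
                // Fired once when the result set changes; pull the new rows, then re-subscribe.
                Subscribe(connectionString);
            };

            connection.Open();
            using (var reader = command.ExecuteReader())
                while (reader.Read()) { /* refresh chart data */ }
        }
    }

    static void Main()
    {
        string connectionString = "...";          // your connection string
        SqlDependency.Start(connectionString);    // requires Service Broker on the database
        Subscribe(connectionString);
        Console.ReadLine();
        SqlDependency.Stop(connectionString);
    }
}
```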
Unfortunately the only 'proper' way is to poll; however, you can reduce the cost of that polling by having SQL wait in a loop (WAITFOR something like 30ms on each loop pass) until data is available, or until a set time period elapses (e.g. 10s). This is commonly used when writing SQL pseudo-queues.
You could use extended procs, but that is fragile; alternatively, you could drop messages into MSMQ.
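The wait-in-SQL pattern might look roughly like this from C#, using the 30ms/10s numbers above (table and column names are invented):

```csharp
using System.Data.SqlClient;

class LongPollReader
{
    // Server-side wait: the batch loops with a short WAITFOR DELAY until new rows
    // exist or ~10 seconds pass, then returns whatever is new. One round trip per
    // ~10s instead of one per second. dbo.Readings / RowId are assumed names.
    const string WaitSql = @"
        DECLARE @deadline datetime2 = DATEADD(second, 10, SYSUTCDATETIME());
        WHILE NOT EXISTS (SELECT 1 FROM dbo.Readings WHERE RowId > @lastRowId)
              AND SYSUTCDATETIME() < @deadline
            WAITFOR DELAY '00:00:00.030';
        SELECT RowId, Value FROM dbo.Readings WHERE RowId > @lastRowId;";

    public static SqlDataReader WaitForNewRows(SqlConnection openConnection, long lastRowId)
    {
        var command = new SqlCommand(WaitSql, openConnection);
        command.CommandTimeout = 30;   // must exceed the in-batch wait
        command.Parameters.AddWithValue("@lastRowId", lastRowId);
        return command.ExecuteReader();
    }
}
```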
If your reporting application is running on a single server then you can have the application that is writing the data to SQL Server also send a message to the reporting app letting it know that new data is available.
However, having your application poll the server to see if new records have been added is the most common way of doing it. As long as you do the polling on a background thread, it shouldn't affect the performance of your application at all.
You will need to push the event out of the database into the realm of your application.
The application will need to listen for the message. (You will need to decide what listening means: what port, what protocol, what format, etc.)
The database will send the message based on the event, through a trigger. (You need to look up how to use external application logic in triggers.)
