Manage data in a real-time simulation - C#

I am working on a real-time simulation project for a vehicle and looking for advice regarding the best solution in C# to handle the data generated at each timestep.
Basically, I've got a main engine that computes a solution in real time and can run on its own. In parallel, I need to store the generated data somehow - but without any real-time requirements. At each timestep I generate SQLite command lines, and I'm looking for a way to execute them in parallel without slowing down the main engine.
Is there any advice on how to put together the best structure to handle this problem?

I don't know about "best", but a very good solution would be to put the data into a queue and have a separate thread that reads data from the queue and persists it.
The primary advantage is that the thread collecting the data doesn't get bogged down waiting for the database. It can just enqueue data for that timestep and go back to what it's doing (presumably getting data for the next timestep).
The thread that's persisting the data can run independently. If the database is fast enough, it can do a database insert for every timestep. Or it might batch the data and send multiple records at once in a batch insert.
To make all this happen, create a BlockingCollection (shared queue) that the collecting thread writes to and the persisting thread reads from. BlockingCollection handles multiple producers and multiple consumers without any need for explicit locking or anything like that. It's really easy to use, it performs quite well, and it makes this kind of thing very quick to implement.
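A minimal sketch of that setup (the `TimestepData` type and the `Console.WriteLine` are placeholders for whatever your engine produces and for the real SQLite insert):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class TimestepData
{
    public int Step { get; set; }
    public double Value { get; set; }
}

class Program
{
    static void Main()
    {
        // Bounded capacity so the simulation can never outrun memory.
        var queue = new BlockingCollection<TimestepData>(boundedCapacity: 10000);

        // Persisting thread: drains the queue until the producer signals completion.
        var writer = Task.Run(() =>
        {
            foreach (var item in queue.GetConsumingEnumerable())
            {
                // Replace with your actual SQLite INSERT (or batch several items).
                Console.WriteLine($"persist step {item.Step}: {item.Value}");
            }
        });

        // Simulation loop: enqueue and move on immediately.
        for (int step = 0; step < 5; step++)
        {
            queue.Add(new TimestepData { Step = step, Value = step * 0.1 });
        }

        queue.CompleteAdding();   // tell the consumer no more data is coming
        writer.Wait();
    }
}
```

If the writer ever falls badly behind, the bounded capacity makes `Add` block, which is a useful back-pressure signal; size it so that never happens in normal operation.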

Related

Best practice for having a C# WinForms GUI update continuously via Redis versus polling the database

I am developing a C# WinForms GUI that revolves around a few DataGridViews that need to be updated at some manageable interval depending on the number of updates, somewhere around the 1-3 second mark.
Currently, on startup it retrieves the current batch of data for the grids from a MySQL database, then listens on various Redis channels for updates/adds/deletes to those grids. These updates are handled by the Redis multiplexer, heavy calculations are offloaded to worker threads, and then the GUI is updated. So essentially it is one-to-one: one update/add/delete is processed, and the GUI is then updated. This works well enough so far; however, during a crunch of data I'm beginning to notice slowness, which shouldn't happen, since I'm expecting much heavier loads in the future.
The system currently sees at most around 100 Redis messages every couple of seconds, but as throughput grows it should be designed to handle thousands.
Conceptually, when it comes to general GUI design in this fashion, would it be better to do one of the following:
1. Decouple from the current one-to-one scenario described above (Redis msg -> process -> update GUI) and have all Redis messages queue up in a list or DataTable; the GUI then polls this pending-update queue on a timer and applies the updates. This way the GUI is not flooded; it updates on its own schedule.
2. Since the updates coming from Redis are also persisted in the MySQL database, ignore Redis completely and query the database at some timed interval. However, this would probably mean requerying everything, since it will be tough to know what has changed since the last pull.
3. Do away with updating the GUI in semi-realtime fashion and only provide a summary view; if the user drills in, retrieve the data accordingly. But this still runs into the same problem, as the data then being viewed should also be updated, albeit a smaller subset. That said, there are tons of sophisticated enterprise-level C# applications, especially in the finance industry, that present large amounts of updating data and seem to work just fine.
What is best practice here? I prefer option 1 or 2, because in theory they should be able to work.
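Roughly what I have in mind for option 1 (the `OnRedisMessage`/`DrainQueue` names are placeholders, and the actual grid update is elided):

```csharp
using System;
using System.Collections.Concurrent;
using System.Windows.Forms;

// Sketch of option 1: Redis handlers enqueue updates from any thread,
// and a WinForms timer drains them in batches on the UI thread.
public partial class MainForm : Form
{
    private readonly ConcurrentQueue<object> _pending = new ConcurrentQueue<object>();
    private readonly Timer _uiTimer = new Timer();  // System.Windows.Forms.Timer ticks on the UI thread

    public MainForm()
    {
        _uiTimer.Interval = 2000;  // the 1-3 second refresh window
        _uiTimer.Tick += (s, e) => DrainQueue();
        _uiTimer.Start();
    }

    // Called from the Redis multiplexer callback (any thread); never touches the GUI.
    public void OnRedisMessage(object update) => _pending.Enqueue(update);

    private void DrainQueue()
    {
        // Apply everything queued since the last tick, then repaint once.
        while (_pending.TryDequeue(out var update))
        {
            // apply update to the bound DataTable / DataGridView here
        }
    }
}
```

The GUI's refresh cost then depends on the tick rate, not the message rate, which is the decoupling I'm after.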
Thank you in advance.

How can I keep my parallel application across multiple servers from grabbing the same mongodb document for work?

The question is long but pretty self-explanatory. I have an app that runs on multiple servers and uses parallel looping to handle objects coming out of a MongoDB collection. Since MongoDB allows multiple readers, I cannot stop multiple processes and/or servers from grabbing the same document from the collection and duplicating work.
The program is such that the app waits for information to appear, does some work to figure out what to do with it, then deletes it once it's done. What I hope to achieve is this: if I could keep documents from being accessed at the same time, knowing that once one has been read it will eventually be deleted, I could speed up my overall throughput a bit by reducing the number of duplicates and letting the apps grab things that aren't already being worked on.
I don't think pessimistic locking is quite what I'm looking for, but maybe I've misunderstood the concept. Also, if alternative setups are being used to solve the same problem, I would love to hear about them.
Thanks!
What I hope to achieve is that if I could keep documents from being accessed at the same time
The simplest way to achieve this is to introduce a dispatcher architecture: add a dedicated process that just watches for changes and then delegates or dispatches the tasks out to multiple workers.
That process could use MongoDB Change Streams to receive real-time data changes on a single collection, a database, or an entire deployment. Once it receives a change document, it just hands it to a worker for processing.
This also removes the need for multiple workers to contend for the same tasks and implement back-off logic.
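A minimal dispatcher along those lines might look like this (assumes the MongoDB.Driver NuGet package and a replica set, since change streams require one; the hypothetical `work`/`tasks` names and the worker hand-off are placeholders):

```csharp
using System.Collections.Concurrent;
using MongoDB.Bson;
using MongoDB.Driver;

class Dispatcher
{
    static void Main()
    {
        var client = new MongoClient("mongodb://localhost:27017");
        var tasks = client.GetDatabase("work").GetCollection<BsonDocument>("tasks");

        // Workers pull from this queue instead of querying MongoDB directly,
        // so no two workers ever pick up the same document.
        var workQueue = new BlockingCollection<BsonDocument>();

        // The single dispatch process watches for inserts and hands them out.
        using (var stream = tasks.Watch())
        {
            foreach (var change in stream.ToEnumerable())
            {
                if (change.OperationType == ChangeStreamOperationType.Insert)
                    workQueue.Add(change.FullDocument);
            }
        }
    }
}
```

Because only the dispatcher reads the collection, the duplicate-work problem moves out of MongoDB entirely and into a queue that already guarantees each item is dequeued once.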

Is it safe to perform a long action in a separate thread?

I'm dealing with a CSV file that's being imported Client side.
This CSV is supposed to contain information intended to perform an update on one of the tables in my company's database.
My C# function processes the file, looking for errors, and if no errors are found, it sends a bunch of update commands (files usually vary from 50 to 100,000 lines).
Until now, I was performing the update in the same thread, executing the updates line by line, but it was getting slow depending on the file. So I chose to send all the SQL to an Azure SQL queue (a service that receives lots of "messages" and runs the SQL code against the database), so the client wouldn't have to wait as long for the action to be performed.
It got a little faster, but it still takes a long time (due to the requests to the Azure SQL queue). So I found that putting that action in a separate thread worked: the thread sends all the SQL to my Azure SQL queue.
I got a little worried about it though. Is it really safe to perform long actions in separate threads? Is it reliable?
A second thread is 100% like the main thread that you're used to working with. I wish I had an authoritative reference at hand, but this is such a common practice that people don't write those anymore...
So, yes, offloading the work to a second thread is safe, and it's considered by most to be the recommended way to go about it.
Edit 1
OK, if your thread is running under IIS, you need to register it or it will die: once the request/response cycle finishes, IIS is free to kill it (for example, on an app pool recycle).
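On .NET 4.5.2+ the usual way to register such work with ASP.NET is `HostingEnvironment.QueueBackgroundWorkItem`, which delays app-pool shutdown briefly until registered work finishes. A sketch (the `SendToAzureQueueAsync` stub is a placeholder for your own code):

```csharp
using System.Threading;
using System.Threading.Tasks;
using System.Web.Hosting;

public class ImportController
{
    public void StartImport(string[] sqlCommands)
    {
        // Registered with ASP.NET: the app pool waits (briefly) for this work
        // instead of killing it as soon as the request/response cycle ends.
        HostingEnvironment.QueueBackgroundWorkItem(async cancellationToken =>
        {
            foreach (var sql in sqlCommands)
            {
                if (cancellationToken.IsCancellationRequested) break;
                await SendToAzureQueueAsync(sql);  // placeholder for the real call
            }
        });
    }

    private Task SendToAzureQueueAsync(string sql) => Task.CompletedTask;
}
```

The cancellation token is signaled when the app domain is shutting down, so long-running loops should check it, as above.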

Using a ConcurrentQueue to pass objects between two components of my application

I have a program which uses the C# ConcurrentQueue to pass data from one component to the other.
Component 1:
Multiple network connections receive data and then put it into this Queue
Component 2:
Reads data from this queue and then processes it.
OK, good, that all makes sense (I sure hope).
Now what I want to know, is what is the best / most efficient way to go about passing the data between the two components?
Option 1:
Poll the queue for new data in component 2? This will entail blocking code, or at least a while(true) loop.
Option 2:
I don't know if this is possible, but that's why I'm here asking. Doesn't the queue data structure have some sort of functionality where my component 2 can register with the queue to be notified of any inserts/changes? That way, whenever data is added it can just go fetch it, and I can avoid any blocking/polling code.
Component 1 (the producer) requires either manual or automatic blocking, since you anticipate multiple concurrent writers (the multiple connections you mentioned) while producing. This means a blocking queue makes sense in component 1. However, in component 2 (the consumer), if at any time you only have one consumer, then you don't need any blocking code.
To avoid the while loop, you need a mechanism to inform the consumer that someone has added something to the queue. This can be achieved with custom eventing (not talking about EventHandle subtypes). Keep in mind that you may not preserve element order with that style of eventing.
For a simple producer/consumer implementation you can try BlockingCollection. For more complex consumption of data from various sources, Reactive Extensions might help. It's a much steeper learning curve, but it is a very powerful push-based framework, so you don't need to do any polling.
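With BlockingCollection, the "notify me on insert" behaviour you're after falls out of `GetConsumingEnumerable()`: the consumer's foreach blocks efficiently (no spinning) until an item arrives. A sketch, with the connection names standing in for your real network sources:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class Pipeline
{
    static void Main()
    {
        var queue = new BlockingCollection<string>();

        // Component 2: blocks until data arrives; no while(true) polling needed.
        var consumer = Task.Run(() =>
        {
            foreach (var packet in queue.GetConsumingEnumerable())
                Console.WriteLine($"processing {packet}");
        });

        // Component 1: each network connection can call Add concurrently.
        Parallel.ForEach(new[] { "conn1", "conn2", "conn3" },
                         data => queue.Add(data));

        queue.CompleteAdding();  // signal that no more producers will add
        consumer.Wait();
    }
}
```

Unlike the custom-eventing approach mentioned above, this also preserves FIFO order for a single consumer.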

What type of queue to use in parallel data processing - C# - .NET 4

Scenario:
Data is received and written to a database with timestamps. I need to process the raw data in the order it is received, based on the timestamp, and write it back to the database (a different table), again maintaining order based on the timestamp.
I came up with the following design: two queues, one for storing raw data from the database, another for storing processed data before it's written back to the DB. I have two threads, one writing to the initial queue and another reading from the result queue. In between, I spawn multiple threads to process data from the initial queue and write it to the result queue.
I have experimented with SortedList (with manual locking) and BlockingCollection. I have used two approaches to process in parallel: Parallel.For/ForEach and TaskFactory.StartNew.
Each unit of data may take a variable amount of time to process, based on several factors. One thread can still be processing the first data point while other threads have each finished three or four data points, messing up the timestamp order.
I found out about OrderingPartitioner recently and thought it would solve the problem, but following MSDN's example I can see that it doesn't sort the underlying collection either. Maybe I need to implement a custom partitioner to order my collection of complex data types? Or maybe there's a better way to approach the problem?
Any suggestions and/or links to articles discussing a similar problem are highly appreciated.
Personally, I would at least try to start with using a BlockingCollection<T> for the input and a ConcurrentQueue<T> instance for the results.
I would use Parallel Linq to process the results. In order to preserve the order during your processing, you could use AsOrdered() on the PLINQ statement.
Have you considered PLINQ and AsOrdered()? It might be helpful for what you're trying to achieve.
http://msdn.microsoft.com/en-us/library/dd460719.aspx
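A quick sketch of what `AsOrdered()` buys you (the squaring stands in for the real per-item processing):

```csharp
using System;
using System.Linq;

class OrderedProcessing
{
    static void Main()
    {
        var raw = Enumerable.Range(0, 10);  // stand-in for the timestamp-ordered rows

        // AsOrdered() makes PLINQ buffer and re-sequence results so the output
        // order matches the input (timestamp) order, even though individual
        // items finish processing at different times on different threads.
        var processed = raw.AsParallel()
                           .AsOrdered()
                           .Select(x => x * x);  // placeholder for the real work

        foreach (var item in processed)
            Console.WriteLine(item);  // prints 0, 1, 4, 9, ... in input order
    }
}
```

The re-sequencing has a buffering cost, but it is usually far cheaper than manually locking a SortedList around the workers.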
Maybe you've considered these things, but...
Why not just pass the timestamp to the database and then either let the database do the ordering, or fix the ordering in the database after all processing threads have returned? Do the SQL statements have to be executed sequentially?
PLINQ is great, but I would try to avoid thread-synchronization requirements and simply pass more ordering data to the database if you can.