C# Alternatives to Databases for Frequently Changing Data? - c#

I am writing an application that will perform analytics on logs for the purpose of graphical display.
Each line of data will be analyzed and counters for different tracked metrics will be updated.
For instance, the following line:
[01:15:45] WARNING Application1 Error1 Message Text Goes Here
Would translate to the following updated metrics:
+1 Log received during Hour 01
+1 Log received during Minute 15
+1 Log received during Second 45
+1 WARNING severity received
+1 Application1 application received
+1 Error1 error received
Depending on the underlying data architecture, that single line could end up being 6 INSERT/UPDATE statements. As the number of metrics increases, so does the load on the database. What if I wanted to track 30 other things about the above line? That would be 30 statements, and depending on the database size, the UPDATEs could take a while.
The easiest way I can think of to store this data is simply as objects during the application's execution, except I'm now constrained by memory limits. In addition, when the application restarts it would have to parse the entire data set all over again.
Are there other database-like technologies out there for managing data of this type? The only thing I can think of that makes this data "special" is the fact that there will be a LARGE number of small changes. Since this tool will be single-threaded, there is no immediate concern for the data to be transactionally sound.
Is there a term for this type of data or solution that would help me search for one? Surely someone has come across this type of need before.

As you said, use custom objects, and whenever you reach 30 lines, serialize them to disk via XML or binary serialization and free the memory, so you only ever have 30 lines to work with at a time. At the end of each day, or when you have finished processing lines, create a thread or process to deserialize the data and BULK INSERT it into the database, which requires only one DB hit to insert many rows.
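A minimal sketch of that chunk-and-serialize approach, assuming a hypothetical MetricUpdate class, file naming scheme, and a Metrics staging table; XML serialization and SqlBulkCopy stand in for whatever serializer and bulk-load mechanism you prefer:

// Sketch only: MetricUpdate, the file naming, and the "Metrics" table/columns
// are assumptions for illustration, not part of the original question.
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.IO;
using System.Xml.Serialization;

public class MetricUpdate
{
    public string Name { get; set; }   // e.g. "Hour01", "WARNING", "Application1"
    public int Delta { get; set; }     // usually +1
}

public static class MetricSpooler
{
    private static readonly List<MetricUpdate> _buffer = new List<MetricUpdate>();
    private static int _fileIndex;

    // Buffer updates in memory; every 30 entries, serialize to disk and clear.
    public static void Add(MetricUpdate update)
    {
        _buffer.Add(update);
        if (_buffer.Count >= 30)
        {
            var serializer = new XmlSerializer(typeof(List<MetricUpdate>));
            using (var stream = File.Create($"metrics_{_fileIndex++}.xml"))
            {
                serializer.Serialize(stream, _buffer);
            }
            _buffer.Clear();
        }
    }

    // At end of day: deserialize every chunk and bulk insert in one DB hit.
    public static void FlushToDatabase(string connectionString)
    {
        var table = new DataTable();
        table.Columns.Add("Name", typeof(string));
        table.Columns.Add("Delta", typeof(int));

        var serializer = new XmlSerializer(typeof(List<MetricUpdate>));
        foreach (var file in Directory.GetFiles(".", "metrics_*.xml"))
        {
            using (var stream = File.OpenRead(file))
            {
                foreach (var u in (List<MetricUpdate>)serializer.Deserialize(stream))
                    table.Rows.Add(u.Name, u.Delta);
            }
        }

        using (var bulk = new SqlBulkCopy(connectionString) { DestinationTableName = "Metrics" })
            bulk.WriteToServer(table);
    }
}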

Related

AWS SQS with MySQL and C# with hundreds of thousands of messages

I am working with an AWS SQS queue. The queue may contain a massive number of messages; if I do not process them, there will be more than a million messages per hour.
I am processing all the messages and putting them into a MySQL table (InnoDB, 22 columns) with INSERT ... ON DUPLICATE KEY UPDATE. I have a primary key and a unique key.
I am working in C#, where I run 80 threads to pull messages from SQS.
I apply a transaction in C# and run the query as "insert on duplicate key update".
At the same time I am using a lock in C# so that only a single thread can update the table; if I do not use the C# lock, MySQL throws a deadlock exception.
The problem is that I can see a lot of threads waiting at the C# lock, and the wait time is gradually increasing. Can anybody suggest the best way to do this?
Note: I have 8 GB RAM, an Intel Xeon 2.53 GHz, and a 1 GbE connection. Please advise.
If I were doing it, the C# program would primarily create a CSV file to empty your SQS queue, or at least a significant chunk of it. That file would then be bulk-inserted into an empty, completely non-indexed worktable. I would go with a non-temporary table; I see no reason to add temporary tables to the mix when this is recurring, and when it is done the worktable is truncated anyway.
The bulk insert would be achieved with LOAD DATA INFILE fired off from the C# program. Alternatively, the C# program could write a row to some other table with an incrementing counter saying file2 is ready, file3 is ready, and the LOAD could happen in a scheduled event (created with MySQL's CREATE EVENT) that fires, say, every n minutes. Six of one, half a dozen of the other.
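For illustration, here is a hedged sketch of firing that load from C# with MySqlBulkLoader from the MySql.Data package (which issues LOAD DATA [LOCAL] INFILE under the hood); the CSV layout and the worktable name are assumptions:

// Sketch only: assumes the MySql.Data package, a CSV produced by the SQS
// consumer, and a non-indexed worktable named "worktable".
using MySql.Data.MySqlClient;

public static class WorktableLoader
{
    public static int LoadCsv(string connectionString, string csvPath)
    {
        using (var conn = new MySqlConnection(connectionString))
        {
            conn.Open();
            var loader = new MySqlBulkLoader(conn)
            {
                TableName = "worktable",
                FileName = csvPath,
                FieldTerminator = ",",
                LineTerminator = "\n",
                Local = true        // LOAD DATA LOCAL INFILE from the client machine
            };
            return loader.Load();   // returns the number of rows inserted
        }
    }
}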
But a sentinel, a mutex, might be of value, as this whole thing happens in batches, and the next batch(es) to be processed need to be suspended while this occurs. Let's call this concept The Blocker, and say the batch being worked on is row N.
OK, now your data is in the worktable, and it is safe from being stomped on until it is processed. Let's say you have 250k rows, with other batches shortly to follow. If you have special processing to perform, you may wish to create indexes, but at this moment there are none.
You then perform a normal INSERT ... ON DUPLICATE KEY UPDATE (IODKU) into the REAL table using this worktable. The IODKU follows a normal INSERT INTO ... SELECT pattern, where the SELECT part comes from the worktable.
At the end of that statement, the worktable is truncated, any indexes dropped, row N has its status set to complete, and The Blocker is free to work on row N+1 when it appears.
The indexes are dropped to facilitate the next round of bulk inserts, where maintaining indexes matters least; indexes on the worktable may very well be unnecessary overhead during the IODKU anyway.
In this manner, you get the best of both worlds:
LOAD DATA INFILE
IODKU
And the focus is taken off of multi-threading, a good thing to take one's focus off of.
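To make the LOAD-then-IODKU cycle concrete, here is a rough C# sketch of the merge-and-truncate step; the real_table and worktable column names are invented for illustration, and the worktable is assumed to have already been loaded:

// Sketch of the worktable -> real table merge step. Table and column names
// ("real_table", "id", "payload", "hits") are placeholders, not from the post.
using MySql.Data.MySqlClient;

public static class WorktableMerger
{
    public static void MergeAndTruncate(MySqlConnection conn)
    {
        const string iodku = @"
            INSERT INTO real_table (id, payload, hits)
            SELECT id, payload, 1
            FROM worktable
            ON DUPLICATE KEY UPDATE
                payload = VALUES(payload),
                hits    = real_table.hits + 1;";

        using (var cmd = new MySqlCommand(iodku, conn))
            cmd.ExecuteNonQuery();

        // Empty the worktable so the next LOAD DATA batch starts clean.
        using (var cmd = new MySqlCommand("TRUNCATE TABLE worktable;", conn))
            cmd.ExecuteNonQuery();
    }
}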
Here is a nice article on performance and strategies titled Testing the Fastest Way to Import a Table into MySQL. Don't let the MySQL version in the title or inside the article scare you away. Jumping to the bottom and picking up some conclusions:
The fastest way you can import a table into MySQL without using raw files is the LOAD DATA syntax. Use parallelization for InnoDB for better results, and remember to tune basic parameters like your transaction log size and buffer pool. Careful programming and importing can make a >2-hour problem become a 2-minute process. You can temporarily disable some security features for extra performance.
I would separate the C# routine entirely from the actual LOAD DATA and IODKU work and leave those to the event mentioned above (via CREATE EVENT), for several reasons, mainly better design. That way the C# program only deals with SQS and writing out files with incrementing file numbers.

Storing related timed-based data with variable logging frequencies

I work with a data logging system in car racing. I am developing an application that aids in the analysis of this logged data, and I have found some of the query functionality of DataSets, DataTables, and LINQ to be very useful, e.g. minimums, averages, etc.
Currently, I am extracting all data from its native format into a DataTable and post-processing that data. I am also currently working with data where all channels are logged at the same rate, i.e. 50 Hz (50 samples per second). I would like to start writing this logged data to a database so it is somewhat platform independent, and so the extraction process doesn't have to happen every time I want to analyze a dataset.
Which leads me to the main question... Does anyone have a recommendation for the best way to store data that is related by time, but logged at different rates? I have approximately 200 channels that are logged and the rates vary from 1 Hz to 500 Hz.
Some of the methods I have thought of so far are:
creating a datatable for all data at 500 Hz using Double.NaN for values that are between actual logged samples
creating separate tables for each logging frequency, i.e. one table for 1 Hz, another for 10 Hz, and another for 500 Hz.
creating a separate table for each channel with a relationship to a time table. Each time step would then be indexed, and the table of data for each channel would not be dependent on a fixed time frequency
I think I'm leaning towards the indexed time stamp with a separate table for each channel, but I wanted to find out if anyone has advice on a best practice.
For the record, the datasets can range from 10 MB to 200-300 MB depending on how long the car is on track.
I would like to have a single data store that houses an entire season, or at least an entire race event, so that is something I am considering as well.
Thanks very much for any advice!
Can you create a table something like this?
Channel, Timestamp, Measurement
The database structure doesn't need to depend on the frequency; the frequency can be determined by the amount of time between timestamps.
This gives you more flexibility, as you can write one piece of code to handle the calculations for all the channels; just give it a channel name.
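A short sketch of what that single-table shape and a channel-agnostic calculation could look like in C#; the Sample class and the stats helper are illustrative names, not something from the question:

// Sketch of the single-table layout and a channel-agnostic calculation.
using System;
using System.Collections.Generic;
using System.Linq;

public class Sample
{
    public string Channel { get; set; }     // e.g. "EngineRPM", "OilTemp"
    public DateTime Timestamp { get; set; } // the logging rate is implied by the gaps
    public double Measurement { get; set; }
}

public static class ChannelStats
{
    // One routine works for every channel, regardless of its logging frequency.
    public static (double Min, double Max, double Average) GetChannelStats(
        IEnumerable<Sample> samples, string channel)
    {
        var values = samples
            .Where(s => s.Channel == channel)
            .Select(s => s.Measurement)
            .ToList();

        return (values.Min(), values.Max(), values.Average());
    }
}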

Approach for caching data from data logger

Greetings,
I've been working on a C#.NET app that interacts with a data logger. The user can query and obtain logs for a specified time period, and view plots of the data. Typically a new data log is created every minute and stores a measurement for a few parameters. To get meaningful information out of the logger, a reasonable number of logs need to be acquired - data for at least a few days. The hardware interface is a UART to USB module on the device, which restricts transfers to a maximum of about 30 logs/second. This becomes quite slow when reading in the data acquired over a number of days/weeks.
What I would like to do is improve the perceived performance for the user. I realize that with the hardware speed limitation the user will have to wait for the full download cycle at least the first time they acquire a larger set of data. My goal is to cache all data seen by the app, so that it can be obtained faster if ever requested again. The approach I have been considering is to use a light database, like SqlServerCe, that can store the data logs as they are received. I am then hoping to search the cache first, before querying the device for logs. The cache would be updated with any logs obtained by the request that were not already cached.
Finally my question - would you consider this to be a good approach? Are there any better alternatives you can think of? I've tried to search SO and Google for reinforcement of the idea, but I mostly run into discussions of web request/content caching.
Thanks for any feedback!
Seems like a very reasonable approach. Personally I'd go with SQL CE for storage; make sure you index the column holding the datetime of the record, then use TableDirect on the index for getting and inserting data so it's blazing fast. Since your data is already chronological, there's no need to get any slow SQL query processor involved; just seek to the date (or the end) and roll forward with a SqlCeResultSet. You'll end up being limited only by I/O. I profiled doing really, really similar stuff on a project and found TableDirect with SQL CE was just as fast as a flat binary file.
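For reference, a hedged sketch of that TableDirect pattern with SQL Server CE; the LogEntries table, the IX_LogTime index, and the column ordinals are assumptions:

// Sketch only: seek to a start date on the datetime index and roll forward.
// "LogEntries", "IX_LogTime", and the column layout are placeholders.
using System;
using System.Data;
using System.Data.SqlServerCe;

public static class LogCache
{
    public static void ReadFrom(string connectionString, DateTime start, Action<DateTime, double> onRow)
    {
        using (var conn = new SqlCeConnection(connectionString))
        {
            conn.Open();
            var cmd = new SqlCeCommand("LogEntries", conn)
            {
                CommandType = CommandType.TableDirect,
                IndexName = "IX_LogTime"          // index over the datetime column
            };
            using (SqlCeResultSet rs = cmd.ExecuteResultSet(ResultSetOptions.Scrollable))
            {
                // Seek positions near the requested key; Read then walks forward in index order.
                if (rs.Seek(DbSeekOptions.AfterEqual, start))
                {
                    while (rs.Read())
                        onRow(rs.GetDateTime(0), rs.GetDouble(1));
                }
            }
        }
    }
}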
I think you're on the right track wanting to store it locally in some queryable form.
I'd strongly recommend SQLite. There's a .NET class here.
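If you go the SQLite route instead, a minimal sketch of creating the local cache table might look like this (assuming the Microsoft.Data.Sqlite package; the original answer's linked wrapper isn't preserved here, and the table layout is invented):

// Sketch only: a local SQLite file acting as the log cache.
using Microsoft.Data.Sqlite;

public static class SqliteLogCache
{
    public static void EnsureSchema(string dbPath)
    {
        using (var conn = new SqliteConnection($"Data Source={dbPath}"))
        {
            conn.Open();
            var cmd = conn.CreateCommand();
            cmd.CommandText = @"
                CREATE TABLE IF NOT EXISTS LogEntries (
                    LogTime TEXT PRIMARY KEY,   -- ISO-8601 timestamp of the log
                    Value   REAL NOT NULL
                );";
            cmd.ExecuteNonQuery();
        }
    }
}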

Is there a fast and scalable solution to save data?

I'm developing a service that needs to be scalable in Windows platform.
Initially it will receive approximately 50 connections per second (each connection sending approximately 5 KB of data), but it needs to scale to more than 500 in the future.
It's impractical (I guess) to save the received data to a common database like Microsoft SQL Server.
Is there another solution for saving the data, considering that it will receive more than 6 million "records" per day?
There are 5 steps:
Receive the data via http handler (c#);
Save the received data; <- HERE
Request the saved data to be processed;
Process the requested data;
Save the processed data. <- HERE
My pre-solution is:
Receive the data via http handler (c#);
Save the received data to a message queue;
Request the saved data from the queue to be processed, using a Windows service;
Process the requested data;
Save the processed data to Microsoft SQL Server (here's the bottleneck).
6 million records per day doesn't sound particularly huge. In particular, that's not 500 per second for 24 hours a day - do you expect traffic to be "bursty"?
I wouldn't personally use a message queue - I've been bitten by instability and general difficulties before now. I'd probably just write straight to disk. In memory, use a producer/consumer queue with a single thread writing to disk. Producers will just dump records to be written into the queue.
Have a separate batch task which will insert a bunch of records into the database at a time.
Benchmark the optimal (or at least a "good") number of records to batch upload at a time. You may well want to have one thread reading from disk and a separate one writing to the database (with the file thread blocking if the database thread has a big backlog) so that you don't wait for both file access and the database at the same time.
I suggest that you do some tests nice and early, to see what the database can cope with (and letting you test various different configurations). Work out where the bottlenecks are, and how much they're going to hurt you.
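A bare-bones sketch of the in-memory producer/consumer queue with a single disk-writer thread; the record format, file path, and the use of BlockingCollection are my assumptions, not something prescribed by the answer:

// Sketch only: producers (the HTTP handlers) enqueue records; one background
// task is the sole writer to the spool file.
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

public class DiskSpooler : IDisposable
{
    private readonly BlockingCollection<string> _queue = new BlockingCollection<string>();
    private readonly Task _writer;

    public DiskSpooler(string path)
    {
        // Single consumer: only this task ever touches the file.
        _writer = Task.Run(() =>
        {
            using (var file = new StreamWriter(path, append: true))
            {
                foreach (var record in _queue.GetConsumingEnumerable())
                    file.WriteLine(record);
            }
        });
    }

    // Producers just dump records to be written into the queue.
    public void Enqueue(string record) => _queue.Add(record);

    public void Dispose()
    {
        _queue.CompleteAdding();
        _writer.Wait();
    }
}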
I think that you're prematurely optimizing. If you need to send everything into a database, then see if the database can handle it before assuming that the database is the bottleneck.
If the database can't handle it, then maybe turn to a disk-based queue like Jon Skeet is describing.
Why not do this:
1.) Receive data
2.) Process data
3.) Save original and processed data at once
That would save you the trouble of requesting it again if you already have it. I'd be more worried about your table structure and your database machine than the actual flow, though. I'd make sure that your inserts are as cheap as possible. If that isn't possible, then queuing up the work makes some sense. I wouldn't use a message queue myself. Assuming you have a decent SQL Server machine, 6 million records a day should be fine, as long as you're not writing a ton of data in each record.
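As a rough illustration of keeping the insert cheap and storing the original and processed data in one round trip, here is a hedged sketch; the Records table and its columns are invented for the example:

// Sketch only: one parameterized statement per record, original and processed
// payloads saved together.
using System.Data;
using System.Data.SqlClient;

public static class RecordWriter
{
    public static void Save(SqlConnection conn, string original, string processed)
    {
        using (var cmd = new SqlCommand(
            "INSERT INTO Records (Original, Processed) VALUES (@original, @processed)", conn))
        {
            cmd.Parameters.Add("@original", SqlDbType.NVarChar).Value = original;
            cmd.Parameters.Add("@processed", SqlDbType.NVarChar).Value = processed;
            cmd.ExecuteNonQuery();
        }
    }
}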

Querying data at runtime of its insertion: can we make use of caching?

We are building an application which requires a daily insertion of approximately 1.5 million rows of data per table. We have 16 tables.
We keep track of 3-day historical data including the current day's data.
The application is done using C#; on the server side, we run an exe that fills the data tables during market hours (4.5 hours), and we update the 16 tables every 5 seconds.
On the client side, the application gets user queries which require the most recently inserted data ( in the last 5 seconds) and a historical point which could be today or before, and plots them somehow.
We are having some serious performance issues, as one query might take 1 second or more, which is too much. The question is: for today's data that is being inserted at runtime, can we make use of caching instead of going to the database each time we want something from today's data? Will that be more efficient? And if so, how can we do that?
P.S. One day's data is approximately 300 MB, and we have enough RAM.
Keep a copy of the data along with the datetime you used to retrieve the data. The next time, retrieve only the new data, which minimizes the amount of data you send over the wire.
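A small sketch of that incremental-fetch idea, keeping a watermark timestamp and only pulling newer rows; the TodayData table and its columns are placeholders:

// Sketch only: remember the timestamp of the last retrieval and only pull newer rows.
using System;
using System.Collections.Generic;
using System.Data.SqlClient;

public class TodayCache
{
    private DateTime _lastFetch = DateTime.Today;   // start-of-day watermark
    private readonly List<(DateTime Time, double Value)> _rows = new List<(DateTime, double)>();

    public IReadOnlyList<(DateTime Time, double Value)> Refresh(SqlConnection conn)
    {
        using (var cmd = new SqlCommand(
            "SELECT InsertTime, Value FROM TodayData WHERE InsertTime > @since", conn))
        {
            cmd.Parameters.AddWithValue("@since", _lastFetch);
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    var time = reader.GetDateTime(0);
                    _rows.Add((time, reader.GetDouble(1)));
                    if (time > _lastFetch) _lastFetch = time;   // advance the watermark
                }
            }
        }
        return _rows;   // cached copy of today's data, grown incrementally
    }
}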
If all the queries run in the operation amount to only about 1 second, maybe the issue you are seeing is that the UI is freezing. If that is the case, don't do it on the UI thread.
Update (based on comments): the code you run in the event handlers of the controls runs on the UI thread, which is what causes the UI to freeze. There isn't a single way to run it on a separate thread; I suggest BackgroundWorker for this scenario. See the community-provided example at the end.
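This is not the community example the answer refers to, but a minimal sketch of the BackgroundWorker pattern it suggests; QueryToday and PlotData are placeholders for your own query and charting code:

// Sketch only: move the slow query off the UI thread with BackgroundWorker.
using System.ComponentModel;
using System.Windows.Forms;

public partial class ChartForm : Form
{
    private void LoadButton_Click(object sender, System.EventArgs e)
    {
        var worker = new BackgroundWorker();

        // Runs on a background thread: do the slow database/cache query here.
        worker.DoWork += (s, args) => args.Result = QueryToday();

        // Runs back on the UI thread: safe to touch controls here.
        worker.RunWorkerCompleted += (s, args) => PlotData(args.Result);

        worker.RunWorkerAsync();
    }

    private object QueryToday() { /* hit the database or cache */ return null; }
    private void PlotData(object data) { /* update the chart control */ }
}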
