I'll give you a basic overview, summarize it, then ask the question so you're as informed as you can be. If you need more information, please don't hesitate to ask.
Basic setup:
The client is constantly in communication with a server that provides JSON data to be deserialized and processed.
The client also needs to use this data and catalog it into a MySQL server (the main issue lies here).
The client also, to a lesser extent, needs to store some of the data provided by the server in a local database specific to that client.
So, as stated above, I have a client that communicates with a server that outputs JSON to be processed. Now, the question isn't about the JSON data or the communication with the server; it's more to do with the remote and local databases and what approach I should take in the, I guess, DTO layer.
Now, originally I was going to process it in a loop, inserting individual segments of data into the database one after another until it reached the end of the paginated data. This almost immediately proved troublesome, as deadlocks became a problem very, very quickly. So quickly, in fact, that after about 1682 inserts the deadlock rate went from 1 in 500 to 9 in 10, until the rollback logic threw again and stopped execution.
Here, really, is my question:
What would you suggest for handling a large amount of data (> 500k records) initially, and then, over time as the database is populated, smaller segmented sections (~1k)?
I've looked into CSVs, bulk insert, and query building with a StringBuilder. Operationally the StringBuilder option executes the fastest, but I'm not sure how it will scale once the data is constantly running through it and not just test files.
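For reference, the StringBuilder approach I tested looks roughly like this (simplified; the table, column and class names are placeholders for my real data):

using System;
using System.Collections.Generic;
using System.Text;
using MySql.Data.MySqlClient;

class LogRow
{
    public int DeviceId;
    public DateTime RecordedAt;
    public double Value;
}

static class BatchInserter
{
    // Builds one multi-row, parameterized INSERT per batch instead of one INSERT per row.
    public static void InsertBatch(MySqlConnection conn, IReadOnlyList<LogRow> rows)
    {
        var sql = new StringBuilder("INSERT INTO logs (device_id, recorded_at, value) VALUES ");
        using (var cmd = new MySqlCommand { Connection = conn })
        {
            for (int i = 0; i < rows.Count; i++)
            {
                if (i > 0) sql.Append(',');
                sql.AppendFormat("(@d{0}, @t{0}, @v{0})", i);
                cmd.Parameters.AddWithValue("@d" + i, rows[i].DeviceId);
                cmd.Parameters.AddWithValue("@t" + i, rows[i].RecordedAt);
                cmd.Parameters.AddWithValue("@v" + i, rows[i].Value);
            }
            cmd.CommandText = sql.ToString();
            cmd.ExecuteNonQuery();   // one round trip per batch instead of per row
        }
    }
}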
Any general advice or suggestions, how you think it would best be handled, that kind of thing; anything would help. I'm just looking for real-world scenarios from people who have handled a situation like this and can guide me in the right direction.
As for being told what to do: I will research whatever option you give, so feel free to be vague. That's fine :)
Thanks again
Edit: Also, do you think using Tasks or coding my own threads is the better option for such a situation? Thanks.
I personally would choose Bulk Copy. It's easy to implement and the fastest way to store thousands of records in the database.
Useful article to read: http://ignoringthevoices.blogspot.si/2014/09/working-with-entity-framework-code.html
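A minimal sketch of the bulk copy idea with SqlBulkCopy, assuming a SQL Server target; for the MySQL server in your question, MySqlBulkLoader (fed from a temporary CSV) plays a similar role. The table and column names below are placeholders.

using System.Data;
using System.Data.SqlClient;

static class BulkWriter
{
    public static void BulkInsert(string connectionString, DataTable rows)
    {
        using (var bulk = new SqlBulkCopy(connectionString))
        {
            bulk.DestinationTableName = "dbo.Logs";
            bulk.BatchSize = 5000;        // commit in chunks instead of row by row
            bulk.WriteToServer(rows);     // the DataTable columns must match the table
        }
    }
}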
Related
I am creating a website that generates randomised test data; part of this is a random name generator that I want to scale to create circa a million names. (Written in .NET 4.5, C#.)
My initial solution was to create the names on the web server thread; obviously a bad idea and very slow. That has slowly evolved into a solution with an offline batch processor that populates an Azure table with precompiled names, which are then downloaded by the web server (or a worker role that does the final data compilation).
However, this also seems quite slow; even running with parallel processes, it takes minutes to download the data.
So I am looking for the best architecture to speed this up.
I have considered having a worker role that does this processing, and keeps the results in memory and waits for the web server to request them. However I'm not sure this is the best approach, or if it will even solve the problem! (mostly because I don't know how to transfer the data out)
So I'm hoping for a little architectural advice on the best way to bring this data in. I'm not sure if it is simply the case that processing that many records is going to take a couple of minutes.
(Additional information, added later)
The code is running on an Azure web instance, pulling data out of an Azure Storage Table in the same region.
I have profiled the app, and most of the time is spent downloading data from the table.
The data that the random name generator seeds from is a few hundred thousand records in another Azure Table.
I'm now wondering if maybe I'm asking the wrong question! Maybe the simpler question is: given a source of a few hundred thousand first names/surnames, what would be the best way to compile a million of them to pull into a web query?
P.S. I am not a C# guy by any stretch (more of a sysadmin); my C# generally follows the script-kiddie approach of finding something online that is vaguely close and assimilating it. I just can't find anything that is vaguely close in this case (which makes me think I'm missing something obvious).
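To make the reframed question concrete, the generation step itself is simple in memory; roughly this is all it needs to do (the lists stand in for the seed data from the other Azure table), so the cost really does seem to be in moving the data around rather than generating it:

using System;
using System.Collections.Generic;

static class NameCompiler
{
    // Picks random first-name/surname combinations from the seed lists.
    public static List<string> CompileNames(IList<string> firstNames, IList<string> surnames, int count)
    {
        var rng = new Random();
        var result = new List<string>(count);
        for (int i = 0; i < count; i++)
        {
            result.Add(firstNames[rng.Next(firstNames.Count)] + " " + surnames[rng.Next(surnames.Count)]);
        }
        return result;
    }
}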
In my client-server architecture I have a few API functions whose usage needs to be limited.
The server is written in C# (.NET) and runs on IIS.
Until now I didn't need to perform any synchronization. The code was written in such a way that even if a client sent the same request multiple times (e.g. a create request), one call would end with success and all others with an error (because of the server code and DB structure).
What is the best way to enforce such limits? For example, I want no more than one call of the API method foo() per user per minute.
I thought about a SynchronizationTable that would have just one column, unique_text; before computing a foo() call I would write something like foo{userId}{date}{HH:mm} to this table. If the write succeeds, I know that there hasn't been a foo call from that user in the current minute.
I think there is a much better way, probably in the server code, without using the DB for that. Of course, there could be thousands of users calling foo.
To clarify what I need: I think it could be some lightweight DictionaryMutex.
For example:
// DictionaryMutex is the hypothetical per-key lock; Lock/Unlock are capitalized
// here because `lock` is a reserved keyword in C#.
private static readonly DictionaryMutex FooLock = new DictionaryMutex();

FooLock.Lock(User.GUID);
try
{
    // ... handle the foo() call
}
finally
{
    FooLock.Unlock(User.GUID);
}
EDIT:
A solution in which one user cannot call foo twice at the same time would also be sufficient for me. By "at the same time" I mean that the server starts handling the second call before returning the result of the first.
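For reference, a minimal sketch of the kind of DictionaryMutex I have in mind, built on ConcurrentDictionary and SemaphoreSlim (purely illustrative; the key would be the user's GUID as a string):

using System.Collections.Concurrent;
using System.Threading;

public sealed class DictionaryMutex
{
    private readonly ConcurrentDictionary<string, SemaphoreSlim> _locks =
        new ConcurrentDictionary<string, SemaphoreSlim>();

    // Blocks until the caller holds the lock for the given key.
    public void Lock(string key)
    {
        _locks.GetOrAdd(key, _ => new SemaphoreSlim(1, 1)).Wait();
    }

    // Releases the lock for the given key.
    public void Unlock(string key)
    {
        SemaphoreSlim sem;
        if (_locks.TryGetValue(key, out sem))
            sem.Release();
    }
}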
Note that keeping this state in memory in an IIS worker process opens up the possibility of losing all of this data at any instant. Worker processes can restart for any number of reasons.
Also, you probably want to have two web servers for high availability. Keeping the state inside of worker processes makes the application no longer clustering-ready. This is often a no-go.
Web apps really should be stateless, for many reasons. If you can help it, don't manage your own data structures as suggested in the question and comments.
Depending on how big the call volume is, I'd consider these options:
SQL Server. Your queries are extremely simple and easy to optimize for. Expect thousands of such queries per second per CPU core; this can bear a lot of load. You can use SQL Server Express for free.
A specialized store like Redis. Stack Overflow is using Redis as a persistent, clustering-enabled cache. A good idea.
A distributed cache, like Microsoft Velocity. Or others.
This storage problem is rather easy because it fits a key/value store model well. And the data is nearly worthless, so you don't even need backups.
I think you're overestimating how costly this rate limiting will be. Your web service is probably doing far more expensive things than a single UPDATE by primary key against a simple table.
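For illustration, a sketch of the SQL variant, close to what you already proposed; the table and column names are invented, and a duplicate key simply means "this user already called foo() in the current minute":

using System;
using System.Data.SqlClient;

static class RateLimiter
{
    // Assumed table: CREATE TABLE RateLimit (CallKey nvarchar(100) NOT NULL PRIMARY KEY);
    public static bool TryAcquireCallSlot(string connectionString, Guid userId)
    {
        string key = string.Format("foo|{0}|{1:yyyy-MM-dd HH:mm}", userId, DateTime.UtcNow);
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("INSERT INTO RateLimit (CallKey) VALUES (@key)", conn))
        {
            cmd.Parameters.AddWithValue("@key", key);
            conn.Open();
            try
            {
                cmd.ExecuteNonQuery();
                return true;                // first foo() call in this minute
            }
            catch (SqlException ex)
            {
                if (ex.Number == 2627)      // primary key violation
                    return false;           // already called this minute
                throw;
            }
        }
    }
}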
I was looking for some advice on the best approach to a TCP/IP-based server. I have done quite a bit of looking on here and other sites, and I can't help thinking that what I have seen is overkill for the purpose I need it for.
I have previously written one on a thread-per-connection basis, which I now know won't scale well. What I was thinking was that, rather than creating a new thread per connection, I could use the ThreadPool and queue the incoming connections for processing, as time isn't a massive issue (provided they are processed within a minute or two of coming in).
The server itself will be used essentially for obtaining data from devices, and will only occasionally have to send a response to the sending device to update settings (again, not really time-critical, as the devices are set up to stay connected for as long as they can, and if for some reason one becomes disconnected the response can wait until the next time it sends a message).
What I wanted to know is: will this scale better than the thread-per-connection scenario (I assume it will, due to the thread reuse), and roughly how many devices could this kind of setup support?
Also, if this isn't deemed suitable, could someone possibly provide a link to or an explanation of the SocketAsyncEventArgs approach? I have done quite a bit of reading on the topic and seen examples, but I can't quite get my head around the order of events etc. and why certain methods are called when they are.
Thanks for any and all help.
I have read the comments, but could anybody elaborate on these?
Though, to be honest, I would prefer the initial approach of rolling my own.
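For reference, the kind of ThreadPool-and-queue setup I had in mind is roughly this (a simplified sketch; the one-line-per-message handling is just a placeholder for the real device protocol):

using System;
using System.IO;
using System.Net;
using System.Net.Sockets;
using System.Threading;

class DeviceServer
{
    public void Run(int port)
    {
        var listener = new TcpListener(IPAddress.Any, port);
        listener.Start();
        while (true)
        {
            TcpClient client = listener.AcceptTcpClient();            // blocking accept
            ThreadPool.QueueUserWorkItem(_ => HandleClient(client));  // queued for processing on the pool
        }
    }

    private void HandleClient(TcpClient client)
    {
        using (client)
        using (var reader = new StreamReader(client.GetStream()))
        {
            string message = reader.ReadLine();   // placeholder: one line per device message
            Console.WriteLine("Received: " + message);
            // ... store the device data, occasionally write a settings response back
        }
    }
}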
Is it bad practice to use a MySQL database running on some remote server as a means of interfacing two remote computers? For example, having box1 poll a specific row of the remote DB checking for values posted by box2; when box2 posts some value, box1 carries out a, b, c.
Thanks for any advice.
Consider using something like ZeroMQ, which is an easy-to-use abstraction over sockets with bindings for most languages. There is some nice intro documentation as well as many examples of various patterns you can use in your application.
I can understand the temptation of using a database for this, but the idea of continually writing/polling simply to signal between clients wastes IO, ties up connections, etc. and, more importantly, seems like it would be difficult for another person (or yourself in two years) to understand and debug.
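Purely as an illustration, assuming the NetMQ binding for C# (the addresses and message contents are made up), a push-style signal from box2 to box1 could look roughly like this instead of polling a table:

using NetMQ;
using NetMQ.Sockets;

static class Signalling
{
    // box1: waits for signals instead of polling a database row.
    public static void RunReceiver()
    {
        using (var pull = new PullSocket("@tcp://*:5555"))          // "@" = bind
        {
            while (true)
            {
                string signal = pull.ReceiveFrameString();           // blocks until box2 sends
                // ... carry out a, b, c based on the signal
            }
        }
    }

    // box2: pushes a signal when it has a new value.
    public static void SendSignal()
    {
        using (var push = new PushSocket(">tcp://box1-host:5555"))   // ">" = connect
        {
            push.SendFrame("new-value:42");
        }
    }
}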
You can. If you were building something complex, I would caution against it, but it's fine -- you need to deal with having items being done only once, but that's not that difficult.
What you are doing is known as a message queue and there are open-source projects specific to that -- including some built on MySql.
Yes?
You're obfuscating the point of your code by placing a middleman in the situation. It sounds like you're trying to use something you know to do something you don't know. That's pretty normal, because then the problem seems solvable.
If there are only two computers (sender and receiver), then it is bad practice if you need fast response times; otherwise it's fine. A direct socket connection would be better, but don't waste time on it if you don't really need it.
On the other hand, if there are more than two machines and/or you need fault tolerance, then you actually do need a middleman. Depending on the signalling you want between the machines, the middleman can be a simple key-value store (e.g. memcached, Redis) or a message queue (e.g. dedicated message queue software, though I have seen MySQL used as a queue at two different sites with heavy traffic).
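If you do end up using MySQL as a queue, the usual pattern is a messages table plus an atomic claim step rather than both boxes polling a single row. A rough sketch (the table and column names are invented, and processed rows would normally be deleted or flagged afterwards):

// Assumed table:
//   CREATE TABLE queue_messages (
//     id BIGINT AUTO_INCREMENT PRIMARY KEY,
//     payload TEXT NOT NULL,
//     claimed_by VARCHAR(64) NULL
//   );
using MySql.Data.MySqlClient;

static class MySqlQueue
{
    // box1 calls this on each poll; box2 simply INSERTs rows with claimed_by = NULL.
    public static string TryClaimMessage(MySqlConnection conn, string workerId)
    {
        // The UPDATE claims the oldest unclaimed row atomically, so two pollers
        // cannot grab the same message.
        using (var claim = new MySqlCommand(
            "UPDATE queue_messages SET claimed_by = @w " +
            "WHERE claimed_by IS NULL ORDER BY id LIMIT 1", conn))
        {
            claim.Parameters.AddWithValue("@w", workerId);
            if (claim.ExecuteNonQuery() == 0)
                return null;                          // nothing new this poll
        }

        using (var read = new MySqlCommand(
            "SELECT payload FROM queue_messages WHERE claimed_by = @w " +
            "ORDER BY id DESC LIMIT 1", conn))
        {
            read.Parameters.AddWithValue("@w", workerId);
            return (string)read.ExecuteScalar();      // box1 then carries out a, b, c
        }
    }
}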
Greetings,
I've been working on a C#.NET app that interacts with a data logger. The user can query and obtain logs for a specified time period, and view plots of the data. Typically a new data log is created every minute and stores a measurement for a few parameters. To get meaningful information out of the logger, a reasonable number of logs need to be acquired - data for at least a few days. The hardware interface is a UART to USB module on the device, which restricts transfers to a maximum of about 30 logs/second. This becomes quite slow when reading in the data acquired over a number of days/weeks.
What I would like to do is improve the perceived performance for the user. I realize that with the hardware speed limitation the user will have to wait for the full download cycle at least the first time they acquire a larger set of data. My goal is to cache all data seen by the app, so that it can be obtained faster if ever requested again. The approach I have been considering is to use a light database, like SqlServerCe, that can store the data logs as they are received. I am then hoping to first search the cache prior to querying a device for logs. The cache would be updated with any logs obtained by the request that were not already cached.
Finally my question - would you consider this to be a good approach? Are there any better alternatives you can think of? I've tried to search SO and Google for reinforcement of the idea, but I mostly run into discussions of web request/content caching.
Thanks for any feedback!
Seems like a very reasonable approach. Personally I'd go with SQL CE for storage, make sure you index the column holding the datetime of the record, then use TableDirect on the index for getting and inserting data so it's blazing fast. Since your data is already chronological there's no need to get any slow SQL query processor involved, just seek to the date (or the end) and roll forward with a SqlCeResultSet. You'll end up being speed limited only by I/O. I profiled doing really, really similar stuff on a project and found TableDirect with SQLCE was just as fast as a flat binary file.
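Roughly what I mean, as a sketch from memory (the table, index and column names are placeholders):

using System;
using System.Data;
using System.Data.SqlServerCe;

static class LogCacheReader
{
    // Seeks to the first log at or after 'start' via the datetime index, then rolls forward.
    public static void ReadLogsFrom(SqlCeConnection conn, DateTime start)
    {
        using (SqlCeCommand cmd = conn.CreateCommand())
        {
            cmd.CommandType = CommandType.TableDirect;
            cmd.CommandText = "DataLogs";              // table name, not a SQL statement
            cmd.IndexName = "IX_DataLogs_Timestamp";   // index on the datetime column

            using (SqlCeResultSet rs = cmd.ExecuteResultSet(ResultSetOptions.Scrollable))
            {
                if (rs.Seek(DbSeekOptions.AfterEqual, start))
                {
                    while (rs.Read())
                    {
                        DateTime timestamp = rs.GetDateTime(rs.GetOrdinal("Timestamp"));
                        // ... read the measurement columns and hand them to the cache/plot
                    }
                }
            }
        }
    }
}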
I think you're on the right track wanting to store it locally in some queryable form.
I'd strongly recommend SQLite. There's a .NET class here.
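A minimal sketch of the cache side with System.Data.SQLite (the table and column names are placeholders; it assumes a UNIQUE constraint on the timestamp column so re-downloaded logs don't duplicate):

using System;
using System.Data.SQLite;

static class LogCache
{
    public static void CacheLog(SQLiteConnection conn, DateTime timestamp, double value)
    {
        using (var cmd = new SQLiteCommand(
            "INSERT OR IGNORE INTO logs (timestamp, value) VALUES (@t, @v)", conn))
        {
            cmd.Parameters.AddWithValue("@t", timestamp);
            cmd.Parameters.AddWithValue("@v", value);
            cmd.ExecuteNonQuery();
        }
    }

    // Check the cache first; only query the device for ranges that aren't fully stored.
    public static long CountCached(SQLiteConnection conn, DateTime from, DateTime to)
    {
        using (var cmd = new SQLiteCommand(
            "SELECT COUNT(*) FROM logs WHERE timestamp BETWEEN @from AND @to", conn))
        {
            cmd.Parameters.AddWithValue("@from", from);
            cmd.Parameters.AddWithValue("@to", to);
            return (long)cmd.ExecuteScalar();
        }
    }
}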