At the minute I am trying to put together an asynchronous TCP server to receive data, which I then want to process by extracting values and inserting them into SQL Server.
The basic concept I thought would be best is that once the data is received and confirmed as the entire message, the message is passed off to some sort of collection to await processing on a FIFO basis. The processing step will parse the values and insert them into SQL Server. I suppose this is what's known as the producer/consumer pattern.
I have been looking into the best collection or approach for this and have so far seen BlockingCollection, ConcurrentCollection and BufferBlock used with async/await. I think this may be the way to go, but to be honest I'm not sure.
The best example I have found is on Stephen Cleary's blog, in particular this article:
http://blog.stephencleary.com/2012/11/async-producerconsumer-queue-using.html
My main reservation is that I in no way want to slow down or interrupt the receiving of messages, which to me suggests using the multiple producer/consumer example from the above link. What I want to know is:
Am I correct in this assumption, or is there a more suitable way of doing this in my scenario?
And if I'm correct in my assumption, could anyone suggest the best way of implementing this, taking my use case into consideration?
Any and all help is much appreciated.
At the minute I am trying to put together an asynchronous TCP server to receive data, which I then want to process by extracting values and inserting them into SQL Server.
There's a common pitfall with this kind of scenario. It is usually wrong to report success back to the client when the work has yet to be done. Most of the time I've seen this design, it's because of an efficiency "requirement" self-imposed by the developer, not by the client or for technical reasons. So first, take a step back and make absolutely sure that you do want to return a "successful completion" message to the client when the operation has not actually completed yet.
If you are sure that's what you want to do, then there's another question you must ask: is it acceptable to lose requests? That is, after you tell the client that the operation successfully completed, will the system still be stable if the operation does not actually ever complete?
The answer to that question is usually "no." At that point, the most common architectural solution is to have an out-of-process reliable queue (such as an Azure queue or MSMQ), with an independent backend (such as an Azure worker role or Win32 service) that processes the queue messages. This definitely complicates the architecture, but it is a necessary complication if the system must return completion messages early and must not lose messages.
On the other hand, if losing messages is acceptable, then you can keep them in-memory. It is only in this case that you can use one of the in-memory producer/consumer types mentioned on my blog. This is a very rare situation, but it does happen from time to time.
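For the rare case where losing messages is acceptable, the in-memory approach might look roughly like the sketch below. It uses BufferBlock from TPL Dataflow (on .NET Framework this comes from the System.Threading.Tasks.Dataflow NuGet package) and assumes C# top-level statements on .NET 6 or later. ProduceAsync and ConsumeAsync are hypothetical stand-ins for the TCP receive loop and the parse-and-insert step; anything still in the queue is lost if the process dies.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// In-memory FIFO queue shared by the producer and the consumer.
var queue = new BufferBlock<string>();
var stored = new List<string>();

// Consumer: drains the queue in FIFO order until the producer completes it.
// Upper-casing and adding to a list stand in for parsing and the SQL insert.
async Task ConsumeAsync()
{
    while (await queue.OutputAvailableAsync())
    {
        string message = await queue.ReceiveAsync();
        stored.Add(message.ToUpperInvariant());
    }
}

// Producer: hypothetical stand-in for the receive loop of the TCP server.
async Task ProduceAsync(IEnumerable<string> messages)
{
    foreach (string message in messages)
        await queue.SendAsync(message);
    queue.Complete(); // signal that no more messages will arrive
}

Task consumer = ConsumeAsync();
await ProduceAsync(new[] { "a", "b", "c" });
await consumer;

Console.WriteLine(string.Join(",", stored)); // prints "A,B,C"
```

With a single consumer, checking OutputAvailableAsync and then calling ReceiveAsync is safe; with several consumers the receive call would need to tolerate the item having been taken by another consumer.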
In general, I would avoid using BlockingCollection and friends for this sort of work. Doing so encourages you to architect the entire system into a single process, which is the enemy of scalability and reliability.
I second Stephen Cleary's suggestion of using an out-of-process queue to manage the work. I disagree that this necessarily complicates the architecture, though - in fact, I think it can make things quite a bit simpler. Specifically, a major complication of the original requirement ("put together an asynchronous tcp server") disappears. Asynchronous TCP servers are a pain in the butt to write and easy to screw up - why not just skip that part altogether and be free to focus all of your energy on the post-processing code?
When I built a system like this, I used a Redis List as the task queue. Tasks were serialized to JSON, and clients would add their task to the queue with an RPUSH command. Worker processes retrieve the next task from the queue with BLPOP, do their thing, then go back to waiting for the next task.
Advantages:
No locks. All synchronization comes for free from Redis (or whatever task queue you choose).
Everything in the system is single-threaded. Multi-threading is hard.
I'm free to spin up as many worker processes as I want, across as many nodes as I want.
Related
I was looking for some advice on the best approach to a TCP/IP based server. I have done quite a bit of looking on here and other sites and can't help thinking that what I have seen is overkill for the purpose I need it for.
I have previously written one on a thread-per-connection basis, which I now know won't scale well. What I was thinking was that rather than creating a new thread per connection, I could use a ThreadPool and queue the incoming connections for processing, as time isn't a massive issue (provided they are processed within a minute or two of coming in).
The server itself will be used essentially for obtaining data from devices and will only occasionally have to send a response to the sending device to update settings. Again, this is not really time critical: the devices are set up to stay connected for as long as they can, and if one becomes disconnected for some reason, the response can wait until the next time it sends a message.
What I wanted to know is will this scale better than the thread per connection scenario (I assume that it will due to the thread reuse) and roughly what kind of number of devices could this kind of setup support.
Also, if this isn't deemed suitable, could someone possibly provide a link or explanation of the SocketAsyncEventArgs method? I have done quite a bit of reading on the topic and seen examples, but can't quite get my head around the order of events and why certain methods are called at the time they are.
Thanks for any and all help.
I have read the comments, but could anybody elaborate on these?
Though to be honest, I would prefer the initial approach of rolling my own.
This may not be suitable here, please feel free to move, shout or abuse if so.
We currently have a console application that gets started by another application and is passed the ID of a 'job'. This job will have multiple records that need to be processed. A simple explanation of the flow would be:
Starts 50 threads
Gets records to be processed.
if records > 0 see what threads are not still busy and send it some information.
if records = 0 update something else and exit.
Get more records.
Loop.
Now, I am looking to convert this into a 'polling' service that is continually running and when new records are available, process them. To take what I have and convert this is fairly simple, but the threads stuff is old and probably outdated.
I was looking to refactor most if not all of it and use the Task Parallel Library (TPL) to process the items. However, I am struggling to find a suitable framework for polling and then processing the items, and was looking for suggestions on how to achieve this.
Pretty vague I know, but hopefully enough to give some kind of input.
Many thanks
From my experience and this msdn quote:
More efficient and more scalable use of system resources.
Behind the scenes, tasks are queued to the ThreadPool, which has been enhanced with algorithms (like hill-climbing) that determine and adjust to the number of threads that maximizes throughput. This makes tasks relatively lightweight, and you can create many of them to enable fine-grained parallelism. To complement this, widely-known work-stealing algorithms are employed to provide load-balancing.
You simply shouldn't care about how many tasks is a good number, or how to create a system where you load balance the threading involved.
Simply use:
Task.Factory.StartNew(() => DoSomeWork());
Every time you want to run something asynchronously, it does all the smart work behind the scenes.
Now since you're likely to create tasks in a loop, please be extra-careful not to introduce a closure bug many people had (including me), which you can look up here.
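To make the closure bug concrete, here is a minimal sketch, assuming C# top-level statements on .NET 6 or later. It uses Task.Run rather than Task.Factory.StartNew purely for brevity; the capture behaviour is the same. In a for loop every lambda captures the same variable, so copying it into a local inside the loop body gives each task its own value.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Buggy: every lambda captures the same loop variable i, not its current value.
var buggy = new List<Func<int>>();
for (int i = 0; i < 3; i++)
    buggy.Add(() => i);

// The delegates only run after the loop has finished, when i is already 3.
int[] buggyResults = await Task.WhenAll(buggy.Select(f => Task.Run(f)));

// Fixed: copy the loop variable into a fresh local on every iteration.
var tasks = new List<Task<int>>();
for (int i = 0; i < 3; i++)
{
    int copy = i; // each iteration gets its own variable to capture
    tasks.Add(Task.Run(() => copy));
}
int[] fixedResults = await Task.WhenAll(tasks);

Console.WriteLine(string.Join(",", buggyResults)); // prints "3,3,3"
Console.WriteLine(string.Join(",", fixedResults)); // prints "0,1,2"
```

The buggy half defers execution until after the loop so that the result is deterministic. When tasks are started inside the loop without the copy, the values they see depend on timing.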
I have a windows service that runs from 1 to 500 Tasks, and never had trouble.
Hope this helps,
Bab.
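For the polling part of the question, one possible shape is sketched below, assuming C# top-level statements on .NET 6 or later. GetRecords, ProcessRecord and PollAsync are hypothetical names, and the in-memory queue of batches stands in for the real data source. Each poll starts one task per record, waits for the batch to finish, then sleeps until the next poll or until cancellation.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the database: three batches of record IDs, then nothing.
var pending = new Queue<int[]>(new[] { new[] { 1, 2, 3 }, new[] { 4, 5 }, new[] { 6 } });
var processed = new ConcurrentBag<int>();

// Returns the next batch, or an empty array when there is nothing to do.
int[] GetRecords() => pending.Count > 0 ? pending.Dequeue() : Array.Empty<int>();

// Stand-in for the real per-record work; must be thread-safe.
void ProcessRecord(int id) => processed.Add(id);

async Task PollAsync(TimeSpan interval, CancellationToken token)
{
    while (!token.IsCancellationRequested)
    {
        int[] records = GetRecords();

        // One task per record; the ThreadPool decides how many run at once.
        await Task.WhenAll(records.Select(r => Task.Run(() => ProcessRecord(r))));

        try { await Task.Delay(interval, token); }
        catch (OperationCanceledException) { break; } // service is stopping
    }
}

// Poll every 20 ms and stop after half a second, for demonstration only.
using var cts = new CancellationTokenSource(TimeSpan.FromMilliseconds(500));
await PollAsync(TimeSpan.FromMilliseconds(20), cts.Token);

Console.WriteLine(processed.Count); // prints "6"
```

In a real service the cancellation token would be tied to the service's stop event rather than a timeout.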
If you are polling for new records in a DB table, a better approach would be to install an INSERT trigger (and possibly also UPDATE and DELETE triggers) on this table and to send a message to your service when a new record is inserted.
See Posting Message to MSMQ from SQL Server on MSDN.
The "polling service" sounds like a nice case for an observable collection. There's Rx, a nice way to handle them (http://rxwiki.wikidot.com/101samples), which I think uses the TPL.
We have to send automated emails. They need to be reliably dispatched, so we write them into the database. Simultaneously, a System.Threading.Timer that was started at Application_Start invokes a method every 30s to read entries out of the database, send them, and then delete the entries that have been sent. None of this occurs as a long-running task. Care has been taken to ensure that the process of clearing the db-queue uses async methods, so no phase of the sending/queuing ever blocks, with the whole process being performed by short-lived methods in the ThreadPool. The cost of an app recycle is also minimal (possibly resulting in the resending of a single email... not a problem).
Conventional wisdom says that running this in the web app is not a good idea and that I should spin it out into a service instead.
Writing services is a PITA. I'd rather avoid it if possible. So why shouldn't I run an efficient async mail queue in my app pool? Can anyone enlighten me?
If your site is not used your app pool will not be started - no mail is sent.
Writing services is a PITA
I guess that is subjective. However, don't you think it would be beneficial to put it in a service? In case you want to change your implementation, it's a lot easier to maintain smaller, individual components in my experience. It usually becomes more of a PITA when you have everything in one place.
You are already writing the emails to a database. It is very simple to write a simple Windows service that simply scans the database and sends emails. I know this might not be ideal, but there are lots of examples floating around on SO and elsewhere. You don't have to get all fancy and use an ESB (unless you want to).
So in the end, just because you can doesn't mean you should. You have to weigh the costs and benefits.
Is it possible to store a large number of messages in bulk?
I want to send them synchronously and persistently, but gain speed by sending many at one time.
I am using NMS, the .NET version of the Java framework. But if you only know how to do this in Java, that would still help. Maybe I can then find a solution for .NET more easily.
I thought of things like transactions. But I only got transactions to work for consumers, not for producers.
Conventional wisdom used to suggest that if you wanted maximum throughput when sending in bulk, then you should use a SESSION_TRANSACTED acknowledgement mode and batch all of the message sends together with a .commit().
Unfortunately, here's a benchmark showing this not to be the case http://www.jakubkorab.net/2011/09/batching-jms-messages-for-performance-not-so-fast.html and that you are better off just sending them as normal without transactions. If you are already using transactions, then it may make sense to try and batch them.
My advice here also is that unless you are dealing with messages that are extremely time sensitive, the rate at which you produce isn't going to be that big of a deal - you should be more concerned with bandwidth as opposed to speed of message sends. If you don't mind your messages being out of order you can have multiple producers produce these messages to a given destination... or if you need them in order use multiple producers and then a resequencer after they are in the broker.
I am currently developing a C# socket server that needs to send and receive commands to a real-time process. The client is an Android device. Currently the real-time requirements are "soft"; however, stricter timing requirements might arise in the future. Let's say that in the future it might be used to send commands to a crane, which could be potentially dangerous.
The server is working, and seemingly very well with my current synchronous socket server design. I have separate threads for receiving and sending data. I am wondering if there would be any reason to attempt an asynchronous server socket approach? Could it provide more stability and/or faster performance?
I'll gloss over the definition of real time and say that asynchronous sockets won't make the body of the request process any faster, but will increase concurrency (the number of requests you can take at any one time). If all processors are busy processing something, you won't get any gain. This only gives you gain in the situation where a processor would have sat waiting for a socket to receive something.
Just a note on real time, if your real time requirements are anything like the need to guarantee a response in x-time, then C# and .NET will not give you such guarantees. This, however, depends on your current and future definitions of "soft". It may be the case that you happen to be getting good response times, but don't confuse that with true real time systems.
If you're doubting the usefulness of something asynchronous in your applications, then you should definitely read about this. It gives you a clear idea of what asynchronous solutions could add to your applications.
I don't think you are going to get more stability or faster performance. If it really is a "real-time" system, then it should be synchronous. If you can tolerate "near real-time" and there are long running or expensive compute operations, then you could consider an asynchronous approach. I would not add the complexity if not needed though.
If it's real time, then you absolutely want your communications to be backed by a queue so that you can prove temporal logic on that queue. This is what nio/io-completion-ports/async gives you. If you are using synchronous programming then you are wasting your CPU while copying data from RAM to the network card.
Furthermore, a synchronous design ties a thread to each blocking call, so a single-threaded server can handle only one request at a time. With async you may also have a single thread, yet still be able to serve thousands of requests.
Say for example that a client wanted to perform a DoS attack. He would connect and send one byte of data. Your application would now become unable to receive further commands for the timeout of that connection, which could be quite large. With async, you would ACK the SYN packet back, but your code would not be waiting for the full transmission.
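The stalled-client scenario can be sketched as follows, assuming C# top-level statements on .NET 6 or later and a loopback listener on an OS-assigned port. HandleAsync and AcceptLoopAsync are hypothetical names. The slow client sends one byte and then goes quiet, yet the second client is still served, because awaiting a read does not hold a thread while no data is available.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Text;
using System.Threading.Tasks;

// Reads until the client closes its sending side, then replies with the byte count.
async Task HandleAsync(TcpClient client)
{
    using (client)
    {
        NetworkStream stream = client.GetStream();
        var buffer = new byte[1024];
        int total = 0, read;
        // Awaiting the read frees the thread while this connection is idle.
        while ((read = await stream.ReadAsync(buffer, 0, buffer.Length)) > 0)
            total += read;
        byte[] reply = Encoding.ASCII.GetBytes(total.ToString());
        await stream.WriteAsync(reply, 0, reply.Length);
    }
}

async Task AcceptLoopAsync(TcpListener server)
{
    try
    {
        while (true)
        {
            TcpClient client = await server.AcceptTcpClientAsync();
            _ = HandleAsync(client); // not awaited, so accepting continues immediately
        }
    }
    catch (Exception) { /* listener was stopped */ }
}

var listener = new TcpListener(IPAddress.Loopback, 0); // port 0: let the OS choose
listener.Start();
int port = ((IPEndPoint)listener.LocalEndpoint).Port;
_ = AcceptLoopAsync(listener);

// Slow client: sends a single byte and then stalls without closing.
using var slow = new TcpClient();
await slow.ConnectAsync(IPAddress.Loopback, port);
await slow.GetStream().WriteAsync(new byte[] { 1 }, 0, 1);

// Second client: sends a full message and is answered despite the stalled one.
using var fast = new TcpClient();
await fast.ConnectAsync(IPAddress.Loopback, port);
NetworkStream fastStream = fast.GetStream();
await fastStream.WriteAsync(Encoding.ASCII.GetBytes("hello"), 0, 5);
fast.Client.Shutdown(SocketShutdown.Send); // tell the server the message is complete

var response = new byte[16];
int n = await fastStream.ReadAsync(response, 0, response.Length);
string answer = Encoding.ASCII.GetString(response, 0, n);
Console.WriteLine(answer); // prints "5"

listener.Stop();
```

A production server would also need read timeouts and error handling per connection; the sketch leaves those out to keep the concurrency point visible.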