I am looking for advice on the best approach to a TCP/IP-based server. I have done quite a bit of looking on here and on other sites, and I can't help thinking that what I have seen is overkill for the purpose I need it for.
I have previously written one on a thread-per-connection basis, which I now know won't scale well. What I was thinking was that, rather than creating a new thread per connection, I could use a ThreadPool and queue the incoming connections for processing, as time isn't a massive issue (provided they are processed within a minute or two of coming in).
The server itself will be used essentially for obtaining data from devices and will only occasionally have to send a response to the sending device to update settings (again, not really time critical, as the devices are set up to stay connected for as long as they can, and if a device becomes disconnected for some reason, the response can wait until the next time it sends a message).
What I want to know is whether this will scale better than the thread-per-connection scenario (I assume it will, due to the thread reuse), and roughly how many devices this kind of setup could support.
Also, if this isn't deemed suitable, could someone possibly provide a link to or an explanation of the SocketAsyncEventArgs method? I have done quite a bit of reading on the topic and seen examples, but I can't quite get my head around the order of events and why certain methods are called when they are.
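To make the thread pool idea concrete, here is a minimal sketch in Python (chosen only for brevity; it is not the .NET ThreadPool API). The names handle_connection and serve are made up for illustration. Accepted connections are queued on a fixed-size pool instead of each getting a dedicated thread:

```python
import socket
from concurrent.futures import ThreadPoolExecutor

def handle_connection(conn: socket.socket) -> int:
    """Read everything one device sends, then close. Returns bytes received."""
    total = 0
    with conn:
        while True:
            chunk = conn.recv(4096)
            if not chunk:  # peer closed the connection
                break
            total += len(chunk)
    return total

def serve(listener: socket.socket, pool: ThreadPoolExecutor, max_clients: int) -> list:
    """Accept max_clients connections and queue each one on the pool.

    Connections beyond the pool size wait in the pool's internal queue
    until a worker thread becomes free.
    """
    futures = []
    for _ in range(max_clients):
        conn, _addr = listener.accept()
        futures.append(pool.submit(handle_connection, conn))
    return futures
```

One caveat with this shape: a worker that blocks in recv for the whole lifetime of a long-lived connection is tied up exactly like a dedicated thread would be. The pool only helps if each queued work item is short (for example one received message rather than one connection), which is the problem the asynchronous socket APIs are meant to solve.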
Thanks for any and all help.
I have read the comments but could anybody elaborate on these?
Though to be honest I would prefer the initial approach of rolling my own.
Related
I have about 10,000 jobs that I want to be handled by approximately 100 threads. Once a thread finishes, the free 'slot' should get a new job until there are no more jobs available.
Side note: processor load is not an issue, as these jobs are mostly waiting for results or (socket) timeouts. The number 100 is something I am going to play with to find an optimum. Each job will take between 2 seconds and 5 minutes, so I want to assign new jobs to free threads rather than pre-assign all jobs to threads.
My problem is that I am not sure how to do this. I'm primarily using Visual Basic .NET (but C# is also OK).
I tried to make an array of threads but since each job/thread also returns a value (it also takes 2 input vars), I used 'withevents' and found out that you cannot do that on an array... maybe a collection would work? But I also need a way to manage the threads and feed them new jobs... And all results should go back to the main-form (thread)...
I have it all running in one thread, but now I want to speed up.
And then I thought: actually this is a rather common problem. There is a bunch of work to be done that needs to be distributed over a number of worker threads. So that's why I am asking: what's the most common solution here?
I tried to make the question as generic as possible, so that lots of people with the same kind of problem can be helped by your reply. Thanks!
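As a language-neutral illustration of the pattern being asked about (a fixed number of workers pulling from a shared list of jobs), here is a small sketch in Python rather than VB.NET. The function name run_jobs and the two-argument work callable are assumptions made up for this example:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_jobs(jobs, work, max_workers=100):
    """Run work(a, b) for each (a, b) in jobs on at most max_workers threads.

    Jobs are not pre-assigned: each worker takes the next pending job as
    soon as it finishes its current one. Returns a dict mapping each job
    to its result, or to the exception it raised.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        future_to_job = {pool.submit(work, a, b): (a, b) for a, b in jobs}
        # as_completed yields futures in completion order, on the calling thread
        for future in as_completed(future_to_job):
            job = future_to_job[future]
            try:
                results[job] = future.result()
            except Exception as exc:  # keep going if one job fails
                results[job] = exc
    return results
```

Because the results are collected on the thread that called run_jobs, no worker ever touches shared state directly. In a forms application the equivalent step would be handing each result back to the UI thread.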
Edit:
What I want to do in more detail is the following. I currently have about 1200 connected sensors that I want to read from via sockets. The first thing I want to know is whether the device is online (can connect on ip:port) or not. What happens after it connects depends on the device type, which is known after connecting. For some devices I just read back a sensor value. Other devices need a calibration to be performed, taking up to 5 minutes, with mostly wait times and some reading/setting of values, all via the socket. Some even have FTP that I need to download a file from, but I do that via a socket too.
My problem: lots of waiting time, so lots of opportunity to do things in parallel and speed it up hugely.
My starting point is a list of ip:port addresses, and I want to end up with a file that shows the results. The results are also shown in a textbox on the main form (next to a start/pause/stop button).
This was very helpful:
Multi Threading with Return value : vb.net
It explains the concept of a BackgroundWorker which takes away a lot of the hassle. I am now trying to see where it will bring me.
At the minute I am trying to put together an asynchronous tcp server to receive data which I then want to process, extracting values and inserting to sql server.
The basic concept I thought would be best is that once the data is received and confirmed as the entire message, the message is passed off to some sort of collection to await processing on a FIFO basis, where it will be parsed and its values inserted into SQL Server. I suppose this is what's known as the producer/consumer pattern.
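The shape described here can be sketched as follows, in Python rather than .NET. This is an illustration only: the names consumer, run_demo and store are made up, and store stands in for the parse-and-insert step.

```python
import queue
import threading

_STOP = object()  # sentinel telling the consumer to exit

def consumer(inbox: queue.Queue, store) -> None:
    """Take complete messages in FIFO order and hand each one to store()."""
    while True:
        message = inbox.get()  # blocks until a message is available
        if message is _STOP:
            break
        store(message)

def run_demo(messages):
    """Enqueue messages from the 'receive' side and return what was stored."""
    inbox = queue.Queue()
    stored = []
    worker = threading.Thread(target=consumer, args=(inbox, stored.append))
    worker.start()
    for m in messages:  # producer side: the receive loop only enqueues
        inbox.put(m)
    inbox.put(_STOP)
    worker.join()
    return stored
```

The receiving side never waits for the database, only for the enqueue. Note that the queue lives in memory, so anything still queued is lost if the process dies.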
I have been looking into the best collection / way of doing this and have so far seen BlockingCollection, ConcurrentCollection and BufferBlock using async/await. I think this may be the way to go, but to be honest I'm not sure.
The best example I have found is on Stephen Cleary's blog, in particular this article:
http://blog.stephencleary.com/2012/11/async-producerconsumer-queue-using.html
My main reservation is that I in no way want to slow down or interrupt the receiving of messages, which to me suggests using the multiple producer/consumer example at the above link. What I want to know is:
Am I correct in this assumption, or is there a more suitable way of doing this in my scenario?
And if I'm correct in my assumption, could anyone suggest the best way of implementing this, taking my use case into consideration?
Any and all help is much appreciated.
At the minute I am trying to put together an asynchronous tcp server to receive data which I then want to process, extracting values and inserting to sql server.
There's a common pitfall with this kind of scenario. It is usually wrong to report success back to the client when the work has yet to be done. Most of the time I've seen this design, it's because of an efficiency "requirement" self-imposed by the developer, not by the client or for technical reasons. So first, take a step back and make absolutely sure that you do want to return a "successful completion" message to the client when the operation has not actually completed yet.
If you are sure that's what you want to do, then there's another question you must ask: is it acceptable to lose requests? That is, after you tell the client that the operation successfully completed, will the system still be stable if the operation does not actually ever complete?
The answer to that question is usually "no." At that point, the most common architectural solution is to have an out-of-process reliable queue (such as an Azure queue or MSMQ), with an independent backend (such as an Azure worker role or Win32 service) that processes the queue messages. This definitely complicates the architecture, but it is a necessary complication if the system must return completion messages early and must not lose messages.
On the other hand, if losing messages is acceptable, then you can keep them in-memory. It is only in this case that you can use one of the in-memory producer/consumer types mentioned on my blog. This is a very rare situation, but it does happen from time to time.
In general, I would avoid using BlockingCollection and friends for this sort of work. Doing so encourages you to architect the entire system into a single process, which is the enemy of scalability and reliability.
I second Stephen Cleary's suggestion of using an out-of-process queue to manage the work. I disagree that this necessarily complicates the architecture, though - in fact, I think it can make things quite a bit simpler. Specifically, a major complication of the original requirement ("put together an asynchronous tcp server") disappears. Asynchronous TCP servers are a pain in the butt to write and easy to screw up - why not just skip that part altogether and be free to focus all of your energy on the post-processing code?
When I built a system like this, I used a Redis List as the task queue. Tasks were serialized to JSON, and clients would add their task to the queue with an RPUSH command. Worker processes retrieve the next task from the queue with BLPOP, do their thing, then go back to waiting for the next task.
Advantages:
No locks. All synchronization comes for free from Redis (or whatever task queue you choose).
Everything in the system is single-threaded. Multi-threading is hard.
I'm free to spin up as many worker processes as I want, across as many nodes as I want.
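A rough sketch of that flow, in Python. So that it runs without a Redis server, it uses a hypothetical in-memory stand-in (InMemoryList) that mimics only the two list commands mentioned above. The helper names enqueue_task and work_once are also made up, and a real Redis client's call signatures and timeout semantics may differ from this stand-in:

```python
import json
import queue
from collections import defaultdict

class InMemoryList:
    """Hypothetical stand-in for a Redis client; only rpush/blpop are mimicked."""

    def __init__(self):
        self._lists = defaultdict(queue.Queue)

    def rpush(self, key, value):
        self._lists[key].put(value)  # append to the tail of the list

    def blpop(self, key, timeout=None):
        # Block until an item is at the head of the list, or give up after timeout.
        try:
            return key, self._lists[key].get(timeout=timeout)
        except queue.Empty:
            return None

def enqueue_task(client, key, task):
    """Client side: serialize the task to JSON and push it onto the queue."""
    client.rpush(key, json.dumps(task))

def work_once(client, key, handler, timeout=1):
    """Worker side: wait for the next task, run handler on it, report success."""
    item = client.blpop(key, timeout=timeout)
    if item is None:
        return False  # nothing arrived within the timeout
    _key, payload = item
    handler(json.loads(payload))
    return True
```

A worker process is then just a loop around work_once, with no locks of its own, and more workers can be started without changing the clients.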
Is it bad practice to use a MySQL database running on some remote server as a means of interfacing 2 remote computers? For example, having box1 poll a specific row of the remote db, checking for values posted by box2; when box2 posts some value, box1 carries out a, b, c.
Thanks for any advice.
Consider using something like ZeroMQ, which is an easy-to-use abstraction over sockets with bindings for most languages. There is some nice intro documentation as well as many examples of various patterns you can use in your application.
I can understand the temptation of using a database for this, but the idea of continually writing/polling simply to signal between clients wastes IO, ties up connections, etc. and, more importantly, seems like it would be difficult to understand/debug by another person (or yourself in two years).
You can. If you were building something complex, I would caution against it, but it's fine -- you need to deal with having items being done only once, but that's not that difficult.
What you are doing is known as a message queue, and there are open-source projects specific to that, including some built on MySQL.
Yes?
You're obfuscating the point of your code by placing a middleman in the situation. It sounds like you're trying to use something you know to do something you don't know. That's pretty normal, because then the problem seems solvable.
If there are only 2 computers (sender-receiver), then it is bad practice if you need fast response times. Otherwise it's fine... direct socket connection would be better, but don't waste time on it if you don't really need it.
On the other hand, if there are more than two machines and/or you need fault tolerance, then you actually need a middleman. Depending on the signalling you want between the machines, the middleman can be a simple key-value store (e.g. memcached, redis) or a message queue (e.g. dedicated message queue software, but I have seen MySQL used as a queue at two different sites with big traffic).
The requirements of the TCP server:
Receive from each client and send the result back to the same client (the server only does this).
Cater for 100 clients.
Speed is an important factor, i.e. even at 100 client connections it should not be laggy.
For now I have been using the C# async methods, but I find that I always encounter lag at around 20 connections. By lag I mean taking around 15-20 seconds to get the result. At around 5-10 connections, the result is almost immediate.
Actually, when the TCP server gets a message, it interacts with a DLL which does some processing to return a result. I'm not exactly sure what the workflow behind it is, but at small scale you do not see any problem, so I thought the problem might be with my TCP server.
Right now, I am thinking of using the sync methods. Doing so, I would have a while loop blocking on the accept method, and spawn a new thread for each client after accept. But at 100 connections, that is definitely overkill.
I chanced upon IOCP. I'm not exactly sure, but it seems to be like a connection pool, as the way it handles TCP is quite like the normal way.
For these TCP methods I am also not sure whether it is better to open and close the connection each time a message needs to be passed. On average, messages are passed from each client at around 5-10 minute intervals.
Another alternative might be to use a web application (I'm looking at a generic handler) that forms only one connection with the server. Any message that needs to be handled would be passed to this generic handler, which then sends and receives messages to and from the server.
I need advice, especially from those who have done TCP at large scale. I do not have 100 PCs to test with, so this is quite hard for me. Language-wise, C# or C++ will do. I'm more familiar with C#, but will consider porting to C++ for the speed.
You must be doing it wrong. I personally wrote C# based servers that could handle 1000+ connections, sending more than 1 message per second, with <10ms response time, on commodity hardware.
If you have such high response times it must be your server process that is causing blocking. Perhaps contention on locks, perhaps plain bad code, perhaps blocking on external access leading to thread pool exhaustion. Unfortunately, there are plenty of ways to screw this up, and only few ways to get it right. There are good guidelines out there, starting with the fundamentals covered in Rick Vicik's High Performance Windows Programming articles, going over the SocketAsyncEventArgs example which covers the most performant way of writing socket apps in .Net since the advent of Socket Performance Enhancements in Version 3.5 and so on and so forth.
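To illustrate the "blocking on external access" point, here is a small sketch in Python's asyncio rather than .NET. The slow call is pushed onto a worker pool so the socket handling itself never blocks. The function blocking_process is a hypothetical stand-in for the slow DLL call, and the other names are made up as well:

```python
import asyncio
import time

def blocking_process(message: bytes) -> bytes:
    """Hypothetical stand-in for a slow, blocking call into the processing DLL."""
    time.sleep(0.05)
    return message.upper()

async def handle_client(reader, writer):
    """Serve one client: read a line, process it off the event loop, reply."""
    loop = asyncio.get_running_loop()
    try:
        while True:
            line = await reader.readline()
            if not line:  # client closed the connection
                break
            # Run the blocking work on the default thread pool executor so
            # that other connections keep being served in the meantime.
            result = await loop.run_in_executor(None, blocking_process, line)
            writer.write(result)
            await writer.drain()
    finally:
        writer.close()
        await writer.wait_closed()

async def demo(n_clients: int = 20):
    """Start a server on a free local port and run n_clients against it."""
    server = await asyncio.start_server(handle_client, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    async def client(i):
        reader, writer = await asyncio.open_connection("127.0.0.1", port)
        writer.write(f"msg{i}\n".encode())
        await writer.drain()
        reply = await reader.readline()
        writer.close()
        await writer.wait_closed()
        return reply

    async with server:
        return await asyncio.gather(*(client(i) for i in range(n_clients)))
```

If blocking_process were called directly inside handle_client, every connection would queue up behind whichever call was running at the time. Whether that is what is happening in the server described in the question is only a guess until it is profiled.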
If you find yourself lost at the task ahead (as it seems you happen to be) I would urge you to embrace an established communication framework, perhaps WCF with a net binding, and use the declarative service model programming of WCF. This way you'll piggyback on the WCF performance. While this may not be enough for some, it will get you far enough, much further than you are right now for sure, with regard to performance.
I don't see why C# should be any worse than C++ in this situation - chances are that you've not yet hit upon the 'right way' to handle the incoming connections. Spawning off a separate thread for each client would certainly be a step in the right direction, assuming that workload for each thread is more I/O bound than CPU intensive. Whether you spawn off a thread per connection or use a thread pool to manage a number of threads is another matter - and something to determine through experimentation and also whilst considering whether 100 clients is your maximum!
I am trying to work out how to calculate the latency of requests through a web app (JavaScript) to a .NET webservice.
Currently I am essentially trying to sync client and server time, so that when hitting the webservice I can look at the offset (which would accurately show the 'up' latency).
The problem is that when you sync the times, you have to factor in latency for that too. So currently I am timing the sync request (round trip) and dividing by 2, in an attempt to get the 'up' latency, and then modifying the sync accordingly.
This works on the assumption that latency is symmetrical, which it isn't. Does anyone know a procedure that would be able to determine specifically the up/down latency of a JS HTTP request to a .NET service? If it needs to involve multiple handshakes that's fine; whatever is as accurate as possible.
Thanks!!
I think this is a tough one - or impossible, to be honest.
There are probably a lot of things you can do to come more or less close to what you want. I can see two ways to tackle the problem:
Use something like NTP to synchronize the clocks and use absolute timestamps. This would be fairly easy, but is of course only possible if you control both server and client (which you probably do not).
Try to make an educated guess :) This would be along the lines what you are doing now. Maybe ping could be of some assistance in any way?
The following article might provide some additional idea(s): A Stream-based Time Synchronization Technique For Networked Computer Games.
Mainly it suggests making multiple measurements and discarding "outliers". But in the end it is not that far from your current implementation, if I understand correctly.
Otherwise there is some academic material available for a more theoretical approach (by first reading some stuff, I mean). These are some things I found: Time Synchronization in Ad Hoc Networks and A clock-sampling mutual network time-synchronization algorithm for wireless ad hoc networks. Or you could have a look at the NTP protocol.
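A sketch of the "multiple measurements, discard outliers" idea, using the usual four-timestamp exchange: t0 is when the client sends, t1 when the server receives, t2 when the server replies, and t3 when the client gets the reply, with t0 and t3 on the client clock and t1 and t2 on the server clock. It is written in Python for brevity. The function names and the keep_fraction parameter are made up, and the filtering rule (keep the samples with the shortest round trips) is one simple choice, not necessarily the article's exact method:

```python
from statistics import median

def sample_offset(t0, t1, t2, t3):
    """Return (offset, delay) for one exchange.

    offset estimates server clock minus client clock, assuming the up and
    down paths take equal time; delay is the round trip minus server time.
    """
    delay = (t3 - t0) - (t2 - t1)
    offset = ((t1 - t0) + (t2 - t3)) / 2
    return offset, delay

def estimate_offset(samples, keep_fraction=0.5):
    """Estimate the clock offset from several (t0, t1, t2, t3) exchanges.

    Samples with long round trips are treated as outliers: only the
    keep_fraction of samples with the smallest delay are used.
    """
    measured = sorted((sample_offset(*s) for s in samples), key=lambda m: m[1])
    keep = max(1, int(len(measured) * keep_fraction))
    return median(offset for offset, _delay in measured[:keep])
```

This still cannot separate the up and down latencies on its own. The error in each offset is at most half of that sample's delay, so favouring short round trips narrows the uncertainty without removing the symmetry assumption.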
I have not read those though :)