to multi-thread or not on a scalable client-server application - C#

A few years ago I developed a server app (C#, .NET 4.0) that has multiple clients connecting to it. The way I did this was to create a thread for every single connection and maintain a list of these connections. When I tested the app, it handled connections from 50 clients across my country, and it ran OK (from what I saw).
My questions are these:
For a scalable solution, is multi-threading a viable solution for handling multiple connections to various clients, or should I handle all connections on the same thread?
Are there limits to the number of threads and threading in general under .NET?
Are there downsides to using threads in .NET?
I know this is sort of vague, but I have forgotten some of the more intricate details since I developed the project some time ago. I am interested in developing a scalable solution for a server app in .NET and would like to know from the start whether there were areas for improvement in my previous approach.
UPDATE 1
I didn't use a thread pool; I actually instantiated a Thread for a method (let's call it threadLife).
In threadLife I had a while(true) loop in which I waited for messages from the client, so the loop blocked until a message was received.
In my application the connections were quite stable (i.e. the clients would stay connected for long periods of time), so connections were kept alive until the client disconnected (I didn't close the connection after every message; I would receive very frequent messages that let me know the client's state).
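For concreteness, a minimal sketch of that pattern; HandleMessage is a placeholder, not code from the original project:

```csharp
// using System.Net.Sockets; using System.Threading;
// One Thread per client, blocking on Receive in a loop.
void ThreadLife(object state)
{
    var client = (Socket)state;
    var buffer = new byte[4096];
    while (true)
    {
        int read = client.Receive(buffer); // blocks until a message arrives
        if (read == 0) break;              // 0 bytes means the client disconnected
        HandleMessage(buffer, read);       // hypothetical message handler
    }
    client.Close();
}

// For each accepted connection:
// new Thread(ThreadLife).Start(clientSocket);
```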

Thread-per-connection is not a scalable solution.
To scale well, you should use asynchronous socket methods exclusively. The question is whether to multiplex them all on a single thread or to use the thread pool. The thread pool would scale better than multiplexing, but it introduces multithreading complexities.
A lot of devs attempt to learn socket programming and multithreading at the same time, which is just too much.
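As a rough sketch of the asynchronous alternative, in the later async/await style (.NET 4.5+; on .NET 4.0 the equivalent would be the Begin/End socket methods, illustrated further down). HandleMessage is a placeholder:

```csharp
using System.Net;
using System.Net.Sockets;
using System.Threading.Tasks;

async Task RunServerAsync(int port)
{
    var listener = new TcpListener(IPAddress.Any, port);
    listener.Start();
    while (true)
    {
        TcpClient client = await listener.AcceptTcpClientAsync();
        Task ignored = HandleClientAsync(client); // one pending task per client, not one thread
    }
}

async Task HandleClientAsync(TcpClient client)
{
    using (client)
    {
        var stream = client.GetStream();
        var buffer = new byte[4096];
        int read;
        while ((read = await stream.ReadAsync(buffer, 0, buffer.Length)) > 0)
            HandleMessage(buffer, read); // hypothetical message handler
    }
}
```

While a client is idle, no thread is tied up; a pool thread is only borrowed when data actually arrives.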

One can use message queues, load balancing, dispatching, and so on. There is no single answer; some solutions fit some problems well, others do not.
Good places to start can be:
the ØMQ documentation
Warning: Unstable Paradigms!
The Push Framework Technical Architecture

One thread per connection would not be scalable.
I would suggest that a single thread handle a set of connected clients. If the number of connected clients increases, the server application should be able to add as many threads as needed to handle the ramp-up, and then add further server instances if things continue to grow.
Instead of having threads that do everything (as application instances do), have some dedicated threads that each process the same kind of task separately and share in-memory data in a synchronized way, as sketched below.
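A minimal sketch of that idea, assuming a purely illustrative Message type and Process method: a fixed set of dedicated worker threads draining one lock-protected queue.

```csharp
using System.Collections.Generic;
using System.Threading;

static readonly Queue<Message> Work = new Queue<Message>();
static readonly object Sync = new object();

// Each dedicated worker runs this loop. Monitor.Wait releases the lock
// and sleeps until a producer calls Monitor.Pulse.
static void Worker()
{
    while (true)
    {
        Message msg;
        lock (Sync)
        {
            while (Work.Count == 0) Monitor.Wait(Sync);
            msg = Work.Dequeue();
        }
        Process(msg); // hypothetical task-specific processing
    }
}

static void Enqueue(Message msg)
{
    lock (Sync)
    {
        Work.Enqueue(msg);
        Monitor.Pulse(Sync); // wake one waiting worker
    }
}
```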
This is more or less the way IIS works: the number of worker processes and the number of threads and thread pools are manageable through the control panel.
I remember the OpenSim project (a virtual world platform) ran this way: one thread per connected client. It has since been refactored along the lines explained above.
Apparently you are already well under way with multithreading. Maybe this free ebook will help you dig further.
When I started multithreading, I had some trouble at first understanding that it means several executions of the same code at the same time, potentially within the same instance.

Related

How can I implement Network and Serial communication components in my WPF application?

I'm looking for less technical and more conceptual answers on this one.
I am looking to build a WPF application using .NET 4.5 for controlling a rover (a glorified RC car). Here is the intended functionality:
The application and rover will communicate wirelessly by sending and receiving strings - JSON over TCP Socket.
The GUI will display multiple video feeds via RTSP.
A control panel (custom hardware) will connect to the computer via USB; its signals will be converted to JSON before being sent over the TCP connection to provide movement instructions.
The GUI will need to update to reflect the state of the control panel as well as the state of the rover based on data received.
I'm not sure which technologies to use to implement this, but from my research, BackgroundWorkers or Threads, and Asynchronous techniques would be things to look into. Which of these seems like a good route to take? Also, should I use TCP Sockets directly in the application or should/could I use WCF to provide this data?
Any wisdom on this would be great. Thanks in advance.
EDIT:
Here was the final implementation used, and boy did it work out great:
Everything fell into place around using the MVVM pattern.
There were Views for the control panel and the networking component which each had a corresponding ViewModel that handled the background operations.
Updating the UI was done via databinding, not the Dispatcher.
Wireless communication was done asynchronously (async/await) via TcpListener, along with the use of Tasks.
Serial port communication was done asynchronously via SerialPort and Tasks (see the sketch below).
Used ModernUI for the interface.
Used JSON.NET for the JSON parsing.
Here is a link to the project. It was done over the course of a month so it isn't the prettiest code. I have refined my practices a lot this summer so I'm excited to work on a refactored version that should be completed next year.
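As a rough illustration of the async serial reading mentioned above: SerialPort exposes a BaseStream that supports ReadAsync, so it fits the same async/await style as the TCP code. The port settings and handler are illustrative, not from the project:

```csharp
using System.IO.Ports;
using System.Threading.Tasks;

async Task ReadControlPanelAsync()
{
    var port = new SerialPort("COM3", 115200); // illustrative settings
    port.Open();
    var buffer = new byte[256];
    while (true)
    {
        // SerialPort.BaseStream supports genuinely asynchronous reads
        int read = await port.BaseStream.ReadAsync(buffer, 0, buffer.Length);
        if (read > 0)
            OnControlPanelData(buffer, read); // hypothetical handler
    }
}
```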
As you are using .NET 4.5, you don't need to use Threads and BackgroundWorkers for your project, and you don't need to take care of all your threads yourself. WPF's Dispatcher is a very powerful tool for handling the UI from other threads.
For TCP communication I would suggest using TcpClient and TcpListener with async callbacks, and using the Dispatcher to update your UI.
For displaying cameras over RTSP, use VLC.Net, an open-source wrapper for the VLC library that is good for handling many real-time video protocols.
Use Tasks instead of Threads, and set their priority according to your requirements.
You don't need WCF for your application.
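A rough sketch of the TcpClient-plus-Dispatcher suggestion, assuming a WPF code-behind with a hypothetical StatusText control:

```csharp
using System.IO;
using System.Net.Sockets;
using System.Threading.Tasks;

// In a WPF window's code-behind:
private void StartListening(TcpClient client)
{
    var reader = new StreamReader(client.GetStream());
    Task.Run(async () =>
    {
        string line;
        while ((line = await reader.ReadLineAsync()) != null)
        {
            string msg = line;
            // We are on a pool thread here, so touch the UI via the Dispatcher.
            Dispatcher.Invoke(() => StatusText.Text = msg);
        }
    });
}
```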
As far as I can tell (I'm no expert), MS's philosophy these days is to use asynchronous I/O, thread pool tasks for lengthy compute operations, and have a single main thread of execution for the main part of the application. That main thread drives the GUI and commissions the async I/O and thread pool tasks as and when required.
So for your application that would mean receiving messages asynchronously, initiating a task on the thread pool to process each message, and finally displaying the results on the GUI when the task completes. It will end up looking like a single-threaded event-loop application. The async I/O and thread pool tasks do in fact use threads; it's just that they're hidden from you in as convenient a way as possible.
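Concretely, that flow might look like this sketch, where ParseTelemetry and telemetryView are hypothetical. Note that after each await the continuation resumes on the captured UI context, so no explicit marshaling is needed:

```csharp
// Receive asynchronously, push CPU-bound parsing to the thread pool,
// then display on the GUI.
private async Task PumpMessagesAsync(StreamReader reader)
{
    string line;
    while ((line = await reader.ReadLineAsync()) != null)
    {
        var model = await Task.Run(() => ParseTelemetry(line)); // pool task
        telemetryView.Update(model); // back on the UI thread after the await
    }
}
```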
I've tried (once) bucking this philosophy, with my own separate thread handling all my I/O and an internal pipe connection to my main thread to tell it what's happening. I made it work, but it was really, really hard work. For example, I found it impossible to cancel a blocking network or pipe I/O operation in advance of its timeout (any thoughts from anyone out there more familiar with Win32 and .NET?). I was only trying to do that because there's no true equivalent to select() in Windows; the one that is there doesn't work with anything other than sockets... In case anyone is wondering 'why oh why oh why?', I was re-implementing an application originally written for Unix and naively didn't want to change the architecture.
Next time (if there is one) I'll stick to MS's approach.

Good Coding? (Multiple Message Loops)

I'm in need of some advice in proper coding:
I'm working on a program where multiple serial connections are used. Each communication line has a controller working as an abstraction layer. Between the controller and the serial port, a protocol is inserted to wrap the data in packages, ready for transfer. The protocol takes care of failed deliveries, resending etc.
To ensure that the GUI won't hang, each connection line (protocol and serial port) is created on a separate thread. The controller is handled by the main thread, since it has controls in the GUI.
Currently, when I create the threads, I have chosen to run a message loop on them (Application.Run()), so instead of polling buffers and yielding when there is no work, I simply invoke the thread (BeginInvoke) and use the message loop as a buffer. This currently works nicely, with no serious problems so far.
My question is now: is this "good coding", or should I use a while loop on the thread and poll buffers instead? Or some third thing?
I would like to show code, but so far it is several thousand lines of code, so please be specific if you need to see any part of the code. :)
Thank you.
Using message loops in each thread is perfectly fine; Windows is optimized for this scenario. You are right to avoid polling, but you may want to look into other event-based designs that are more efficient still: for example, preparing a package for transfer and calling SetEvent to notify a thread that it's ready, or a semaphore and thread-safe queue as Martin James suggests.
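For illustration, the thread-safe-queue variant could look like this sketch, where Packet and SendOverSerial are placeholders; GetConsumingEnumerable blocks while the queue is empty, so there is no polling:

```csharp
using System.Collections.Concurrent;
using System.Threading;

var packages = new BlockingCollection<Packet>();

var worker = new Thread(() =>
{
    // Blocks when empty; wakes as soon as a producer adds an item.
    foreach (Packet p in packages.GetConsumingEnumerable())
        SendOverSerial(p); // hypothetical protocol/serial work
}) { IsBackground = true };
worker.Start();

// Producer side (e.g. the controller on the GUI thread):
packages.Add(new Packet());
```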
I'm not 100% sure what you are doing here but, with a bit of 'filling in' it doesn't sound bad:)
When your app is idle, (no comms), is CPU use 0%?
Is your app free of sleep(0)/sleep(1), or similar, polling loops?
Does it operate with a reasonably low latency?
If the answers are three 'YES', you should be fine :)
There are a few, (very few!), cases where polling for results etc. is a good idea, (eg. when the frequency of events in the threads is so high that signaling every progress event to the GUI would overwhelm it), but mostly, it's just poor design.

Server Architecture

Hopefully two simple questions relating to creating a server application:
Is there a theoretical/practical limit on the number of simultaneous sockets that can be open? (Ignoring the resources required to process the data once it has arrived!) If it's of relevance, I am targeting the .NET framework.
Should each connection be run in a separate thread that's permanently assigned to it, or should a thread pool be used? The dedicated-thread approach seems simpler, but it seems odd to have 100+ threads running at once. Is this acceptable practice?
Any advice is greatly appreciated
Venatu
You may find the following answer useful. It illustrates how to write a scalable TCP server using the .NET thread pool and asynchronous socket methods (BeginAccept/EndAccept and BeginReceive/EndReceive).
That being said, it is rarely a good idea to write your own server when you could use one of the numerous WCF bindings (or even write custom ones) and benefit from the full power of the WCF infrastructure. It will probably scale better than most custom-written servers.
There are practical limits, yes. However, you will most likely run out of resources to handle the load long before you reach them; CPU or memory are more likely to be exhausted than the number of connections.
For maximum scalability, you don't want a separate thread per connection; rather, you would use an asynchronous model that only uses threads when servicing active (as in receiving or sending data) connections.
If I remember correctly (I did sockets a long time ago), the very best way of implementing them is with the ReceiveAsync (.NET 3.5) / BeginReceive methods, using asynchronous callbacks that utilize the thread pool. Don't open a thread for every connection; it is a waste of resources.
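To make the Begin/End pattern referred to above concrete, here is a minimal sketch; the callbacks run on thread-pool threads, and HandleMessage is a hypothetical handler:

```csharp
using System;
using System.Net;
using System.Net.Sockets;

class ReceiveState
{
    public Socket Socket;
    public byte[] Buffer = new byte[4096];
}

void Start(int port)
{
    var listener = new Socket(AddressFamily.InterNetwork,
                              SocketType.Stream, ProtocolType.Tcp);
    listener.Bind(new IPEndPoint(IPAddress.Any, port));
    listener.Listen(100);
    listener.BeginAccept(OnAccept, listener);
}

void OnAccept(IAsyncResult ar)
{
    var listener = (Socket)ar.AsyncState;
    Socket client = listener.EndAccept(ar);
    listener.BeginAccept(OnAccept, listener); // immediately accept the next client

    var state = new ReceiveState { Socket = client };
    client.BeginReceive(state.Buffer, 0, state.Buffer.Length,
                        SocketFlags.None, OnReceive, state);
}

void OnReceive(IAsyncResult ar)
{
    var state = (ReceiveState)ar.AsyncState;
    int read = state.Socket.EndReceive(ar);
    if (read > 0)
    {
        HandleMessage(state.Buffer, read); // hypothetical
        state.Socket.BeginReceive(state.Buffer, 0, state.Buffer.Length,
                                  SocketFlags.None, OnReceive, state);
    }
    else
    {
        state.Socket.Close(); // 0 bytes means the remote side closed
    }
}
```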

Sync Vs. Async Sockets Performance in .NET

Everything that I read about sockets in .NET says that the asynchronous pattern gives better performance (especially with the new SocketAsyncEventArgs which saves on the allocation).
I think this makes sense if we're talking about a server with many client connections, where it's not possible to allocate one thread per connection. Then I can see the advantage of using ThreadPool threads and getting async callbacks on them.
But in my app, I'm the client and I just need to listen to one server sending market tick data over one tcp connection. Right now, I create a single thread, set the priority to Highest, and call Socket.Receive() with it. My thread blocks on this call and wakes up once new data arrives.
If I were to switch this to an async pattern so that I get a callback when there's new data, I see two issues:
The threadpool threads will have default priority, so it seems they will be strictly worse than my own thread, which has Highest priority.
I'll still have to send everything through a single thread at some point. Say that I get N callbacks at almost the same time on N different threadpool threads notifying me that there's new data. The N byte arrays that they deliver can't be processed on the threadpool threads because there's no guarantee that they represent N unique market data messages because TCP is stream based. I'll have to lock and put the bytes into an array anyway and signal some other thread that can process what's in the array. So I'm not sure what having N threadpool threads is buying me.
Am I thinking about this wrong? Is there a reason to use the async pattern in my specific case of one client connected to one server?
UPDATE:
So I think that I was misunderstanding the async pattern in (2) above. I would get a callback on one worker thread when there was data available. Then I would begin another async receive and get another callback, etc. I wouldn't get N callbacks at the same time.
The question is still the same, though: is there any reason that the callbacks would be better in my specific situation, where I'm the client and only connected to one server?
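As an illustration of that callback chain using SocketAsyncEventArgs (the class discussed in this question), here is a rough sketch; AppendToStreamBuffer is a hypothetical helper. One subtlety: ReceiveAsync returns false when the operation completed synchronously, in which case the Completed event is not raised:

```csharp
using System.Net.Sockets;

void StartReceive(Socket socket, SocketAsyncEventArgs e)
{
    // false means it completed synchronously: Completed will NOT fire.
    if (!socket.ReceiveAsync(e))
        OnReceiveCompleted(socket, e);
}

void OnReceiveCompleted(object sender, SocketAsyncEventArgs e)
{
    if (e.SocketError == SocketError.Success && e.BytesTransferred > 0)
    {
        AppendToStreamBuffer(e.Buffer, e.BytesTransferred); // hypothetical
        StartReceive((Socket)sender, e); // re-arm for the next chunk
    }
}

// One-time setup:
// var e = new SocketAsyncEventArgs();
// e.SetBuffer(new byte[8192], 0, 8192);
// e.Completed += OnReceiveCompleted;
// StartReceive(socket, e);
```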
The slowest part of your application will be the network communication. It's highly likely that you will make almost no difference to performance for a one thread, one connection client by tweaking things like this. The network communication itself will dwarf all other contributions to processing or context switching time.
"Say that I get N callbacks at almost the same time on N different threadpool threads notifying me that there's new data."
Why is that going to happen? If you have one socket, you Begin an operation on it to receive data, and you get exactly one callback when it's done. You then decide whether to do another operation. It sounds like you're overcomplicating it, though maybe I'm oversimplifying it with regard to what you're trying to do.
In summary, I'd say: pick the simplest programming model that gets you what you want; considering choices available in your scenario, they would be unlikely to make any noticeable difference to performance whichever one you go with. With the blocking model, you're "wasting" a thread that could be doing some real work, but hey... maybe you don't have any real work for it to do.
The number one rule of performance is to only try to improve it when you have to.
I see you mention standards but never mention problems; if you are not having any, then you don't need to worry about what the standards say.
"This class was specifically designed for network server applications that require high performance."
As I understand it, you are a client here, with only a single connection.
Data on this connection arrives in order and is consumed by a single thread.
You will probably lose performance if you instead receive small amounts on separate threads, just so that you can assemble them later in a serialized (and thus effectively single-threaded) manner.
Much Ado about Nothing.
You do not really need to speed this up; you probably cannot.
What you can do, however, is dispatch work units to other threads after you receive them; this might speed things up, and you do not need SocketAsyncEventArgs for it.
As always, measure and measure again.
Also, just because you can does not mean you should.
If the performance is enough for the foreseeable future, why complicate matters?

multi threading a web application

I know there are many cases which are good cases to use multi-thread in an application, but when is it the best to multi-thread a .net web application?
A web application is almost certainly already multi-threaded by the hosting environment (IIS etc.). If your page is CPU-bound (and you want to use multiple cores), then arguably multiple threads are a bad idea, because when your system is under load you are already using them all.
The time it might help is when you are IO-bound; for example, you have a web page that needs to talk to 3 external web services, talk to a database, and write a file (all unrelated). You can do those in parallel on different threads (ideally using the built-in async operations, to maximise completion-port usage) to reduce the overall processing time, all without overly impacting local CPU (here the real delay is on the network).
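A sketch of that fan-out, assuming .NET 4.5's HttpClient and Task.WhenAll; the URLs and helper methods are illustrative:

```csharp
using System.Net.Http;
using System.Threading.Tasks;

async Task<string[]> GatherAsync()
{
    using (var http = new HttpClient())
    {
        // All five operations run concurrently; total latency is roughly
        // that of the slowest one, not the sum of all of them.
        Task<string> svc1 = http.GetStringAsync("http://service-one/api");
        Task<string> svc2 = http.GetStringAsync("http://service-two/api");
        Task<string> svc3 = http.GetStringAsync("http://service-three/api");
        Task<string> db   = LoadFromDatabaseAsync(); // hypothetical
        Task file         = WriteAuditFileAsync();   // hypothetical

        await Task.WhenAll(svc1, svc2, svc3, db, file);
        return new[] { svc1.Result, svc2.Result, svc3.Result, db.Result };
    }
}
```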
Of course, in such cases you might also do better by simply queuing the work in the web application, and having a separate service dequeue and process them - but then you can't provide an immediate response to the caller (they'd need to check back later to verify completion etc).
IMHO you should avoid the use of multithreading in a web-based application.
Maybe a multithreaded design could increase performance in a standard app (with the right design), but in a web application you may want to maintain high throughput rather than raw speed.
But if you have only a few concurrent connections, maybe you can use parallel threads without global performance degradation.
Multithreading is a technique to provide a single process with more processing time so that it can run faster. Having more threads, it eats more CPU cycles (from multiple CPUs, if you have any). For a desktop application, this makes a lot of sense. But granting more CPU cycles to one web user would take away those same cycles from the 99 other users who are making requests at the same time! So, technically, it's a bad thing.
However, a web application might use other services and processes that are using multiple threads. Databases, for example, won't create a separate thread for every user that connects to them. They limit the number of threads to just a few, adding connections to a connection pool for faster usage. As long as there are connections available or pooled, the user will have database access. When the database runs out of connections, the user will have to wait.
So, basically, multiple threads could be used in web applications to reduce the number of active users at a specific moment! It allows the system to share resources among multiple users without overloading the resource. Instead, users will just have to stand in line until it's their turn.
This would not be multi-threading in the web application itself, but multi-threading in a service that is consumed by the web application. In this case, it's used as a limitation, by only allowing a small number of threads to be active.
In order to benefit from multithreading, your application has to do a significant amount of work that can be run in parallel. If this is not the case, the overhead of multithreading may very well outweigh the benefits.
In my experience most web applications consist of a number of short running methods, so apart from the parallelism already offered by the hosting environment, I would say that it is rare to benefit from multithreading within the individual parts of a web application. There are probably examples where it will offer a benefit, but my guess is that it isn't very common.
ASP.NET is already capable of spawning several threads for processing several requests in parallel, so for simple request processing there is rarely a case when you would need to manually spawn another thread. However, there are a few uncommon scenarios that I have come across which warranted the creation of another thread:
If there is some operation that might take a while and can run in parallel with the rest of the page processing, you might spawn a secondary thread there. For example, if there was a webservice that you had to poll as a result of the request, you might spawn another thread in Page_Init and check for results in Page_PreRender (waiting if necessary). Though it's still a question whether this would be a performance benefit or not - spawning a thread isn't cheap, and the time between a typical Page_Init and Page_PreRender is measured in milliseconds anyway. Keeping a thread pool for this might be a little more efficient, and ASP.NET also has something called "asynchronous pages" that might be even better suited for this need.
If there is a pool of resources that you wish to clean up periodically. For example, imagine that you are using some weird DBMS that comes with limited .NET bindings, but there is no pooling support (this was my case). In that case you might want to implement the DB connection pool yourself, and this would necessitate a "cleaner thread" which would wake up, say, once a minute and check if there are connections that have not been used for a long while (and thus can be closed off).
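A sketch of such a cleaner thread, with the pool type, its members, and the timeouts all purely illustrative:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

static readonly List<PooledConnection> Pool = new List<PooledConnection>();
static readonly object PoolLock = new object();

static void CleanerLoop()
{
    while (true)
    {
        Thread.Sleep(TimeSpan.FromMinutes(1));
        lock (PoolLock)
        {
            DateTime cutoff = DateTime.UtcNow.AddMinutes(-10);
            // ToList() so we can remove from Pool while iterating a snapshot.
            foreach (var conn in Pool.Where(c => c.LastUsed < cutoff).ToList())
            {
                conn.Close();
                Pool.Remove(conn);
            }
        }
    }
}

// Started once; IsBackground so it never keeps the worker process alive:
// new Thread(CleanerLoop) { IsBackground = true }.Start();
```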
Another thing to keep in mind when implementing your own threads in ASP.NET: ASP.NET likes to kill off its processes if they have been inactive for a while. Thus you should not rely on your thread staying alive forever; it might get terminated at any moment, and you'd better be ready for it.
