Remote MSMQ connection performance - c#

We are seeing poor performance when iterating over a remote private MSMQ queue. We tried both API methods, MessageQueue.GetAllMessages() and MessageQueue.GetMessageEnumerator2(), and see the same results.
It seems that the problem is in the Message Queuing service, because it never uses more than 15% of CPU (single core). For example, if we iterate over a local queue we use 100% of CPU and can load 1 million messages in 2 seconds, but for remote queues it takes 30 seconds to load only 10K! The network connection is 100 Mbps.
Is there a way to increase MSMQ performance for remote queues and force it to use 100% of CPU or Network?

MSMQ is optimised to go as fast as it can - it's not going slow just to irritate you.
Performance will be poor on remote queues. This is not the best way to use MSMQ. High performance is obtained through the "send remote, read local" model.
Remote access uses RPC, which will be slow over a LAN. If you looked at a network trace, you would see all the back-and-forth communication: binding to the remote RPC service and querying to find where MSMQ is listening; binding to the remote MSMQ RPC listener; requesting messages from the listener; and so on.
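As a rough illustration of the "send remote, read local" model described above, here is a minimal sketch using System.Messaging (.NET Framework only, with the MSMQ Windows feature installed). The machine name consumer-host and the queue name orders are hypothetical, and the queue is assumed to be a non-transactional private queue that already exists on the consumer machine.

```csharp
using System;
using System.Messaging; // .NET Framework only; requires the MSMQ Windows feature

public static class SendRemoteReadLocal
{
    // Runs on the producer machine: the send goes through the local MSMQ
    // service, which forwards the message to the remote queue.
    // "consumer-host" and "orders" are hypothetical names.
    public static void SendToRemote(string body)
    {
        using (var queue = new MessageQueue(@"FormatName:DIRECT=OS:consumer-host\private$\orders"))
        {
            queue.Send(body, "order"); // second argument is the message label
        }
    }

    // Runs on the consumer machine: reads from its own local private queue,
    // so no remote RPC read is involved.
    public static string ReceiveLocal(TimeSpan timeout)
    {
        using (var queue = new MessageQueue(@".\private$\orders"))
        {
            queue.Formatter = new XmlMessageFormatter(new[] { typeof(string) });
            // Receive throws MessageQueueException if the timeout expires.
            using (Message message = queue.Receive(timeout))
            {
                return (string)message.Body;
            }
        }
    }
}
```

The point of the split is that the slow network hop happens on the send side, where MSMQ can store and forward, while the reader only ever touches a local queue.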

This may or may not be relevant in your scenario, but it's a way to improve overall performance for MSMQ.
If you're sending messages that wrap a consistent type - a serialized class, for example - buffer them before sending and send one message containing an array or collection of items.
I was working with a serialized class and sending a large volume of messages. I tested and found that if I sent them in batches of 50 instead of individually, the size of the queue was reduced by 75%. I didn't spend much time optimizing from there. It depends on the size of your messages. But this gets rid of much of the overhead incurred in sending individual messages.
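A minimal sketch of the buffering step, assuming the items are already in memory; the Batcher class and its Split method are hypothetical helpers, not part of any MSMQ API. Each resulting batch would then be sent as the body of a single message instead of one message per item.

```csharp
using System;
using System.Collections.Generic;

public static class Batcher
{
    // Splits a sequence into lists of at most batchSize items each.
    // The last batch may be smaller than batchSize.
    public static List<List<T>> Split<T>(IEnumerable<T> items, int batchSize)
    {
        if (items == null) throw new ArgumentNullException(nameof(items));
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var batches = new List<List<T>>();
        var current = new List<T>(batchSize);
        foreach (T item in items)
        {
            current.Add(item);
            if (current.Count == batchSize)
            {
                batches.Add(current);
                current = new List<T>(batchSize);
            }
        }
        if (current.Count > 0)
        {
            batches.Add(current); // flush the partial final batch
        }
        return batches;
    }
}
```

With a batch size of 50, 120 items become three messages (50, 50 and 20 items). The right batch size depends on message size, so it is worth measuring rather than assuming 50 is ideal.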

Try using the TCP connection syntax and use an explicit numerical IP address 123.123.123.123. See if this affects your performance. If it does then think security.
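For reference, the TCP connection syntax mentioned above is an MSMQ direct format name. A small sketch of building one follows; the QueueNames helper and the queue name orders are hypothetical, and the IP address is the placeholder from the answer.

```csharp
public static class QueueNames
{
    // Builds a direct format name that addresses a private queue by IP address,
    // e.g. FormatName:DIRECT=TCP:123.123.123.123\private$\orders
    public static string DirectTcp(string ipAddress, string privateQueueName)
    {
        return @"FormatName:DIRECT=TCP:" + ipAddress + @"\private$\" + privateQueueName;
    }
}
```

The resulting string can be passed to the MessageQueue constructor, for example new MessageQueue(QueueNames.DirectTcp("123.123.123.123", "orders")).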
You use the term GetMessage but also talk about loading, so I am confused about whether you want performance on message receive ("GetOne") or on load-into-the-queue operations.
For core production code I always operate one at a time on the messages so I am never trying to GetAllMessages or EnumerateAllMessages except in specific management functions.

Related

Asynchronous vs. Synchronous socket server for real-time application

I am currently developing a C# socket server that needs to send and receive commands to a real-time process. The client is an Android device. Currently the real-time requirements are "soft", however in the future more strict timing requirements might arise. Let's say in the future it might be to send commands to a crane that could be potentially dangerous.
The server is working, and seemingly very well with my current synchronous socket server design. I have separate threads for receiving and sending data. I am wondering if there would be any reason to attempt an asynchronous server socket approach? Could it provide more stability and/or faster performance?
I'll gloss over the definition of real time and say that asynchronous sockets won't make the body of the request process any faster, but will increase concurrency (the number of requests you can take at any one time). If all processors are busy processing something, you won't get any gain. This only gives you gain in the situation where a processor would have sat waiting for a socket to receive something.
Just a note on real time, if your real time requirements are anything like the need to guarantee a response in x-time, then C# and .NET will not give you such guarantees. This, however, depends on your current and future definitions of "soft". It may be the case that you happen to be getting good response times, but don't confuse that with true real time systems.
If you're doubting the usefulness of something asynchronous in your applications then you should definitely read about this. It gives you a clear idea of what the asynchronous solutions could add to your applications.
I don't think you are going to get more stability or faster performance. If it really is a "real-time" system, then it should be synchronous. If you can tolerate "near real-time" and there are long running or expensive compute operations, then you could consider an asynchronous approach. I would not add the complexity if not needed though.
If it's real time, then you absolutely want your communications to be backed by a queue so that you can prove temporal logic on that queue. This is what nio/io-completion-ports/async gives you. If you are using synchronous programming then you are wasting your CPU while copying data from RAM to the network card.
Furthermore, it means that your server is absolutely single-threaded. You may have a single thread even with async, but still be able to serve thousands of requests.
Say for example that a client wanted to perform a DoS attack. He would connect and send one byte of data. Your application would now become unable to receive further commands for the timeout of that connection, which could be quite large. With async, you would ACK the SYN packet back, but your code would not be waiting for the full transmission.
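To make the concurrency argument concrete, here is a minimal sketch of an asynchronous echo server. It assumes a modern .NET with async/await (newer than what some of the answers above had available), and the AsyncEchoServer class is a hypothetical illustration rather than anyone's production design. A client that connects and then sends nothing only parks an awaiting read; it does not hold up the accept loop or the other clients.

```csharp
using System;
using System.IO;
using System.Net.Sockets;
using System.Threading.Tasks;

public static class AsyncEchoServer
{
    // Accepts clients until the listener is stopped (at which point the
    // pending accept throws and the returned task faults).
    public static async Task RunAsync(TcpListener listener)
    {
        listener.Start();
        while (true)
        {
            TcpClient client = await listener.AcceptTcpClientAsync();
            // Do not await: each client gets its own independent async loop.
            _ = HandleClientAsync(client);
        }
    }

    private static async Task HandleClientAsync(TcpClient client)
    {
        using (client)
        {
            try
            {
                NetworkStream stream = client.GetStream();
                var buffer = new byte[4096];
                int read;
                // ReadAsync returns 0 when the client closes the connection.
                while ((read = await stream.ReadAsync(buffer, 0, buffer.Length)) > 0)
                {
                    await stream.WriteAsync(buffer, 0, read); // echo the bytes back
                }
            }
            catch (Exception ex) when (ex is IOException || ex is ObjectDisposedException)
            {
                // The connection was dropped; disposing the client is enough.
            }
        }
    }
}
```

As the answers above point out, this increases the number of connections one process can hold open; it does not make the handling of any single request faster, and it gives no hard real-time guarantee.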

Pushing OR Polling

I have a SL client and a WCF service. The client polls the WCF every 4 seconds and I have almost 100 clients at a time.
The web server is an entry level server with 512 MB RAM.
I want to know whether polling depends on the server configuration: if I upgrade the server, will polling work better for the clients?
And second, would pushing (duplex) be better than polling? I have got mixed responses from the blogs I have been reading.
Moreover, what are the best practices in optimizing polling for quicker response at the client? My application needs real-time data.
Thanks
My guess would be that you have some kind of race condition that is showing up only with a larger number of clients. What concurrency and instancing modes are you using for your WCF service? (See MSDN: WCF Sessions, Instancing, and Concurrency at http://msdn.microsoft.com/en-us/library/ms731193.aspx)
If you're "losing" responses the first thing I would do is start logging or tracing what's happening at the server. For instance, when a client "doesn't see" a response, is the server ever getting a request? (If so, what happens to it, etc etc.)
I would also keep an eye on memory usage -- you don't say what OS you're using, but 512 MB is awfully skinny these days. If you ever get into a swap-to-disk situation, it's clearly not going to be a good thing.
Lastly, assuming that your service is CPU-bound (i.e. no heavy database & filesystem calls), the best way to raise your throughput is probably to reduce the message payload (wire size), use the most performant bindings (i.e. if client is .NET and you control it, NetTcp binding is much faster than HTTP), and, of course, multithread your service. IMHO, with the info you've provided -- and all other things equal -- polling is probably fine and pushing might just make things more complex. If it's important, you really want to bring a true engineering approach to the problem and identify/measure your bottlenecks.
Hope this helps!
"Push" notifications generally have a lower network overhead, since no traffic is sent when there's nothing to communicate. But "pull" notifications often have a lower application overhead, since you don't have to maintain state when the client is just idling waiting for a notification.
Push notifications also tend to be "faster", since clients are notified immediately when the event happens rather than waiting for the next polling interval. But pull notifications are more flexible -- you can use just about any server or protocol you want, and you can double your client capacity just by doubling your polling wait interval.
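The capacity trade-off in the last sentence is simple arithmetic, sketched below with the numbers from the question (100 clients, 4 second interval). PollingLoad is a hypothetical helper, and the average-delay figure assumes events arrive at random points within the polling interval.

```csharp
public static class PollingLoad
{
    // Average number of requests per second the server receives
    // when every client polls once per interval.
    public static double RequestsPerSecond(int clients, double intervalSeconds)
    {
        return clients / intervalSeconds;
    }

    // Average time before a client notices a new event: half the interval,
    // assuming events are uniformly distributed relative to the poll times.
    public static double AverageDelaySeconds(double intervalSeconds)
    {
        return intervalSeconds / 2.0;
    }
}
```

With 100 clients polling every 4 seconds the server sees about 25 requests per second and clients lag by 2 seconds on average. Doubling the interval to 8 seconds lets 200 clients produce the same 25 requests per second, at the cost of an average lag of 4 seconds.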

need advice for type of TCP server to cater for this type of application

The requirements of the TCP server:
Receive from each client and send the result back to the same client (the server only does this).
Cater for 100 clients.
Speed is an important factor, i.e. even at 100 client connections it should not be laggy.
For now I have been using the C# async methods, but I always run into lag at around 20 connections. By laggy I mean it takes almost 15-20 seconds to get the result. At around 5-10 connections, the result comes back almost immediately.
Actually, when the TCP server gets a message, it interacts with a DLL which does some processing to return a result. I'm not exactly sure what the workflow behind it is, but at small scale you do not see any problem, so I thought the problem might be with my TCP server.
Right now, I am thinking of using a sync method. Doing so, I will have a while loop that blocks on the accept method, and spawn a new thread for each client after accept. But at 100 connections, it is definitely overkill.
I chanced upon IOCP. I'm not exactly sure, but it seems to be like a connection pool, as the way it handles TCP is quite like the normal way.
For these TCP methods I am also not sure whether it is better to open and close the connection each time a message needs to be passed. On average, messages are passed from each client at around 5-10 minute intervals.
Another alternative might be to use a web front end (I'm looking at a generic handler) that forms only one connection with the server. Any message that needs to be handled will be passed to this generic handler, which then sends and receives messages to and from the server.
I need advice, especially from those who have done TCP at large scale. I do not have 100 PCs to test with, so this is quite hard for me. Language-wise C# or C++ will do; I'm more familiar with C#, but will consider porting to C++ for the speed.
You must be doing it wrong. I personally wrote C# based servers that could handle 1000+ connections, sending more than 1 message per second, with <10ms response time, on commodity hardware.
If you have such high response times it must be your server process that is causing blocking. Perhaps contention on locks, perhaps plain bad code, perhaps blocking on external access leading to thread pool exhaustion. Unfortunately, there are plenty of ways to screw this up, and only a few ways to get it right. There are good guidelines out there, starting with the fundamentals covered in Rick Vicik's High Performance Windows Programming articles, going over the SocketAsyncEventArgs example, which covers the most performant way of writing socket apps in .NET since the advent of Socket Performance Enhancements in Version 3.5, and so on and so forth.
If you find yourself lost at the task ahead (as it seems you happen to be) I would urge you to embrace an established communication framework, perhaps WCF with a net binding, and use the declarative service model programming of WCF. This way you'll piggyback on the WCF performance. While this may not be enough for some, it will get you far enough, much further than you are right now for sure, with regard to performance.
I don't see why C# should be any worse than C++ in this situation - chances are that you've not yet hit upon the 'right way' to handle the incoming connections. Spawning off a separate thread for each client would certainly be a step in the right direction, assuming that workload for each thread is more I/O bound than CPU intensive. Whether you spawn off a thread per connection or use a thread pool to manage a number of threads is another matter - and something to determine through experimentation and also whilst considering whether 100 clients is your maximum!

Message Granularity for Message Queues and Service Buses

I'm working on an application that may generate thousands of messages in a fairly tight loop on a client, to be processed on a server. The chain of events is something like:
Client processes item, places in local queue.
Local queue processing picks up messages and calls web service.
Web service creates message in service bus on server.
Service bus processes message to database.
The idea being that all communications are asynchronous, as there will be many clients for the web service. I know that MSMQ can do this directly, but we don't always have that kind of admin capability on the clients to set things up like security etc.
My question is about the granularity of the messages at each stage. The simplest method would mean that each item processed on the client generates one client message/web service call/service bus message. That's fine, but I know it's better for the web service calls to be batched up if possible, except there's a tradeoff between large granularity web service DTOs, versus short-running transactions on the database. This particular scenario does not require a "business transaction", where all or none items are processed, I'm just looking to achieve the best balance of message size vs. number of web service calls vs. database transactions.
Any advice?
Chatty interfaces (i.e. lots and lots of messages) will tend to have a high overhead from dispatching the incoming message (and, on the client, the reply) to the correct code to process the message (this will be a fixed cost per message), while big messages tend to use more resources in processing each message.
Additionally a lot of web service calls in progress will mean a lot of TCP/IP connections to manage, and concurrency issues (including locking in a database) might become an issue.
But without some details of the processing of the message it is hard to be specific, other than the general advice against chatty interfaces because of the fixed overheads.
Measure first, optimize later. Unless you can make a back-of-the-envelope estimate that shows that the simplest solution yields unacceptably high loads, try it, establish good supervisory measurements, see how it performs and scales. Then start thinking about how much to batch and where.
This approach, of course, requires you to be able to change the web service interface after deployment, so you need a versioning approach to deal with clients which may not have been redesigned, supporting several WS versions in parallel. But not thinking about versioning almost always traps you in suboptimal interfaces, anyway.
Abstract the message queue and have a swappable message queue backend. This way you can test many backends and give yourself an easy bail-out should you pick the wrong one or grow to like a new one that appears. The overhead of messaging is usually packing and handling the request. Different systems are designed for different levels of traffic and different symmetries over time.
If you abstract out the basic features you can swap the mechanics in and out as your needs change, or are more accurately assessed.
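A minimal sketch of what such an abstraction could look like. The IMessageQueue interface and the in-memory backend are hypothetical names chosen for illustration; an MSMQ-backed or service-bus-backed class would implement the same interface, and the in-memory one is mainly useful for tests and for measuring the rest of the pipeline in isolation.

```csharp
using System.Collections.Concurrent;

// The only operations the application code is allowed to depend on.
public interface IMessageQueue<T>
{
    void Enqueue(T message);
    bool TryDequeue(out T message);
}

// Simplest possible backend: a thread-safe in-process FIFO queue.
public sealed class InMemoryMessageQueue<T> : IMessageQueue<T>
{
    private readonly ConcurrentQueue<T> _queue = new ConcurrentQueue<T>();

    public void Enqueue(T message)
    {
        _queue.Enqueue(message);
    }

    public bool TryDequeue(out T message)
    {
        return _queue.TryDequeue(out message);
    }
}
```

Keeping the interface this small is a deliberate choice: the fewer backend-specific features leak through it, the easier it is to swap the mechanics later.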
You can also translate messages from differing queue types at various portions of the application or message route as the recipient's stresses change because they are handling, for example 1000:1/s vs 10:1/s on a higher level.
Good Luck

Tips / techniques for high-performance C# server sockets

I have a .NET 2.0 server that seems to be running into scaling problems, probably due to poor design of the socket-handling code, and I am looking for guidance on how I might redesign it to improve performance.
Usage scenario: 50 - 150 clients, high rate (up to 100s / second) of small messages (10s of bytes each) to / from each client. Client connections are long-lived - typically hours. (The server is part of a trading system. The client messages are aggregated into groups to send to an exchange over a smaller number of 'outbound' socket connections, and acknowledgment messages are sent back to the clients as each group is processed by the exchange.) OS is Windows Server 2003, hardware is 2 x 4-core X5355.
Current client socket design: A TcpListener spawns a thread to read each client socket as clients connect. The threads block on Socket.Receive, parsing incoming messages and inserting them into a set of queues for processing by the core server logic. Acknowledgment messages are sent back out over the client sockets using async Socket.BeginSend calls from the threads that talk to the exchange side.
Observed problems: As the client count has grown (now 60-70), we have started to see intermittent delays of up to 100s of milliseconds while sending and receiving data to/from the clients. (We log timestamps for each acknowledgment message, and we can see occasional long gaps in the timestamp sequence for bunches of acks from the same group that normally go out in a few ms total.)
Overall system CPU usage is low (< 10%), there is plenty of free RAM, and the core logic and the outbound (exchange-facing) side are performing fine, so the problem seems to be isolated to the client-facing socket code. There is ample network bandwidth between the server and clients (gigabit LAN), and we have ruled out network or hardware-layer problems.
Any suggestions or pointers to useful resources would be greatly appreciated. If anyone has any diagnostic or debugging tips for figuring out exactly what is going wrong, those would be great as well.
Note: I have the MSDN Magazine article Winsock: Get Closer to the Wire with High-Performance Sockets in .NET, and I have glanced at the Kodart "XF.Server" component - it looks sketchy at best.
Socket I/O performance has improved in the .NET 3.5 environment. You can use ReceiveAsync/SendAsync instead of BeginReceive/BeginSend for better performance. Check this out:
http://msdn.microsoft.com/en-us/library/bb968780.aspx
A lot of this has to do with many threads running on your system and the kernel giving each of them a time slice. The design is simple, but does not scale well.
You probably should look at using Socket.BeginReceive, which will execute on the .NET thread pool (you can specify somehow the number of threads it uses), and then pushing onto a queue from the asynchronous callback (which can be running on any of the .NET thread pool threads). This should give you much higher performance.
A thread per client seems massively overkill, especially given the low overall CPU usage here. Normally you would want a small pool of threads to service all clients, using BeginReceive to wait for work async - then simply despatch the processing to one of the workers (perhaps simply by adding the work to a synchronized queue upon which all the workers are waiting).
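The "synchronized queue upon which all the workers are waiting" can be sketched as follows. This assumes .NET 4.0 or later for BlockingCollection (the .NET 2.0 server in the question would need a hand-rolled queue with Monitor.Wait/Pulse instead), and WorkerPool is a hypothetical class name. The receive callback would call Enqueue with the parsing or processing work for the message it just read.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public sealed class WorkerPool : IDisposable
{
    private readonly BlockingCollection<Action> _work = new BlockingCollection<Action>();
    private readonly Thread[] _workers;

    public WorkerPool(int workerCount)
    {
        if (workerCount <= 0) throw new ArgumentOutOfRangeException(nameof(workerCount));
        _workers = new Thread[workerCount];
        for (int i = 0; i < workerCount; i++)
        {
            _workers[i] = new Thread(Run) { IsBackground = true };
            _workers[i].Start();
        }
    }

    // Called from I/O callbacks; returns immediately.
    public void Enqueue(Action item)
    {
        _work.Add(item);
    }

    private void Run()
    {
        // Blocks while the queue is empty; ends once CompleteAdding has been
        // called and the queue has been drained.
        foreach (Action item in _work.GetConsumingEnumerable())
        {
            try
            {
                item();
            }
            catch (Exception)
            {
                // A real server would log here; one bad item must not kill the worker.
            }
        }
    }

    // Stops accepting work, lets the workers finish what is queued, then cleans up.
    public void Dispose()
    {
        _work.CompleteAdding();
        foreach (Thread worker in _workers)
        {
            worker.Join();
        }
        _work.Dispose();
    }
}
```

A pool sized near the number of cores keeps the thread count fixed no matter how many clients connect, which is the main difference from the thread-per-client design.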
I am not a C# guy by any stretch, but for high-performance socket servers the most scalable solution is to use I/O Completion Ports with a number of active threads appropriate for the CPU(s) the process is running on, rather than using the one-thread-per-connection model.
In your case, with an 8-core machine you would want 16 total threads with 8 running concurrently. (The other 8 are basically held in reserve.)
The Socket.BeginConnect and Socket.BeginAccept are definitely useful. I believe they use the ConnectEx and AcceptEx calls in their implementation. These calls wrap the initial connection negotiation and data transfer into one user/kernel transition. Since the initial send/receive buffer is already ready the kernel can just send it off - either to the remote host or to userspace.
They also have a queue of listeners/connectors ready which probably gives a bit of boost by avoiding the latency involved with userspace accepting/receiving a connection and handing it off (and all the user/kernel switching).
To use BeginConnect with a buffer it appears that you have to write the initial data to the socket before connecting.
As others have suggested, the best way to implement this would be to make the client-facing code all asynchronous. Use BeginAcceptTcpClient() on the TcpListener so that you don't have to manually spawn a thread. Then use BeginRead()/BeginWrite() on the underlying network stream that you get from the accepted TcpClient.
However, there is one thing I don't understand here. You said that these are long-lived connections, and a large number of clients. Assume that the system has reached steady state, where you have your max clients (say 70) connected. You have 70 threads listening for the client packets. Then the system should still be responsive, unless your application has memory/handle leaks and you are running out of resources so that your server is paging. I would put a timer around the call to Accept() where you kick off a client thread and see how much time that takes. Also, I would start Task Manager and PerfMon, and monitor "Non Paged Pool", "Virtual Memory" and "Handle Count" for the app and see whether the app is in a resource crunch.
While it is true that going async is the right way to go, I am not convinced that it will really solve the underlying problem. I would monitor the app as I suggested and make sure there are no intrinsic problems of leaking memory and handles. In this regard, "BigBlackMan" above was right - you need more instrumentation to proceed. Don't know why he was downvoted.
Random intermittent ~250msec delays might be due to the Nagle algorithm used by TCP. Try disabling that and see what happens.
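In .NET, Nagle's algorithm is turned off per socket through the NoDelay property, which exists on both Socket and TcpClient. A minimal sketch follows; the SocketTuning helper is a hypothetical name. Whether this removes the delays in the question is something only a measurement can show.

```csharp
using System.Net.Sockets;

public static class SocketTuning
{
    // Sends small writes immediately instead of coalescing them.
    public static void DisableNagle(Socket socket)
    {
        socket.NoDelay = true;
    }

    // Same setting for code that works with TcpClient rather than raw sockets.
    public static void DisableNagle(TcpClient client)
    {
        client.NoDelay = true;
    }
}
```

The trade-off is more, smaller packets on the wire, which is usually acceptable for a low-latency stream of tiny messages on a LAN.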
One thing I would want to eliminate is that it isn't something as simple as the garbage collector running. If all your messages are on the heap, you are generating 10000 objects a second.
Take a read of Garbage Collection every 100 seconds
The only solution is to keep your messages off the heap.
I had the same issue 7 or 8 years ago, with 100 ms to 1 sec pauses. The problem was garbage collection. I had about 400 MB in use out of 4 GB, but there were a lot of objects.
I ended up storing messages in C++ but you could use ASP.NET cache ( which used to use COM and moved them out of the heap )
I don't have an answer but to get more information I'd suggest sprinkling your code with timers and logging avg and max time taken for suspect operations like adding to the queue or opening a socket.
At least that way you will have an idea of what to look at and where to begin.
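A minimal sketch of that kind of instrumentation, built on System.Diagnostics.Stopwatch. OperationTimer is a hypothetical helper; one instance would be kept per suspect operation (for example "enqueue" or "socket send") and its average and maximum logged periodically.

```csharp
using System;
using System.Diagnostics;

public sealed class OperationTimer
{
    private readonly object _lock = new object();
    private long _count;
    private long _totalTicks;
    private long _maxTicks;

    // Runs the operation and records how long it took, even if it throws.
    public void Measure(Action operation)
    {
        long start = Stopwatch.GetTimestamp();
        try
        {
            operation();
        }
        finally
        {
            Record(Stopwatch.GetTimestamp() - start);
        }
    }

    // Records one sample, expressed in Stopwatch ticks.
    public void Record(long elapsedTicks)
    {
        lock (_lock)
        {
            _count++;
            _totalTicks += elapsedTicks;
            if (elapsedTicks > _maxTicks) _maxTicks = elapsedTicks;
        }
    }

    public long Count
    {
        get { lock (_lock) { return _count; } }
    }

    public double AverageMs
    {
        get { lock (_lock) { return _count == 0 ? 0.0 : TicksToMs(_totalTicks) / _count; } }
    }

    public double MaxMs
    {
        get { lock (_lock) { return TicksToMs(_maxTicks); } }
    }

    // Stopwatch ticks are not TimeSpan ticks; convert using Stopwatch.Frequency.
    private static double TicksToMs(long ticks)
    {
        return ticks * 1000.0 / Stopwatch.Frequency;
    }
}
```

Usage would look like timer.Measure(() => queue.Enqueue(message)). Tracking the maximum alongside the average matters here because the problem in the question is intermittent: an average can look fine while the occasional sample is hundreds of milliseconds.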
