I have code that sends a message into a MessageQueue.
_queue.Send(new Message(myData));
This queue is on the local machine and the threads that Receive() from it are in the same process.
What happens if the messages are inserted faster than they are extracted? Will Send() block?
Is there a way for me to know whether the MessageQueue is full before sending more messages into it?
(At that point I would prefer to just log myData and not send the event.)
Thanks,
Sela.
Short answer: Do the simple thing and don't limit on send.
Long answer:
The message queue will only really get full when the disk it's allocated to save to is out of space - which is the same time that your logging will be out of space. The message queue is very good at holding data you're not ready to process. Don't throttle on send. If you're concerned about system management and disk space, then you might prefer to rely on Windows' excellent system monitoring facilities and disk space usage threshold alerts. You don't need to reinvent this for your application.
That is, unless you're running the queue in memory-only mode, which may not be necessary. If you can't process the messages fast enough then you definitely have enough time to let the queue manager persist the messages to disk. You should only consider running the queue in memory-only mode if you're going to scale to many consumer processes on many servers and the disk IO on the queue manager becomes the bottleneck. One process on the same machine is very far from that scenario. Let the queue manager do what it does best. Don't optimise prematurely.
If you implement a specified quality of service, like X messages per second, and bill your customers more for a higher quality of service, then throttle at the receiving end. I've done this successfully using a semaphore initialised with a resource limit equal to the number of messages to consume per second. Each consumer thread took a snapshot of the message start time, processed one message, and then waited for the end of the second before giving up the semaphore. That way the thread pool could grow to accommodate the quality of service if messages took more than one second to process, but would not exceed the quality of service.
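For illustration, here is a minimal sketch of that receive-side throttle, assuming a hypothetical ProcessMessage() handler and a limit of 10 messages per second (both placeholders, not the original system's values):

using System;
using System.Diagnostics;
using System.Threading;

class ThrottledConsumer
{
    private const int MessagesPerSecond = 10;                        // assumed QoS limit
    private static readonly Semaphore Quota = new Semaphore(MessagesPerSecond, MessagesPerSecond);

    public void Consume(object message)
    {
        Quota.WaitOne();                                             // take one slot of this second's quota
        var clock = Stopwatch.StartNew();
        try
        {
            ProcessMessage(message);                                 // hypothetical per-message work
        }
        finally
        {
            // Hold the slot until the second has elapsed, so no more than
            // MessagesPerSecond slots are handed out in any one second.
            var remaining = TimeSpan.FromSeconds(1) - clock.Elapsed;
            if (remaining > TimeSpan.Zero)
                Thread.Sleep(remaining);
            Quota.Release();
        }
    }

    private void ProcessMessage(object message) { /* ... */ }
}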
Good luck!
Designing a system so that messages are produced no faster than they are consumed is good, and I agree with that. Nevertheless, a producer can still hit a wall because the queue is full, especially when low quotas are set.
To prepare for such a situation you need to monitor whether the Send() method succeeds. If you send a message into a full queue the message is lost, and because Send() returns void there is no immediate indication of success or failure. There is a way to detect it, though: request acknowledgements when working with MSMQ. To receive them you need an administration queue. That way you can be notified about various outcomes, including the queue being full.
Message msg = new Message
{
    Formatter = new BinaryMessageFormatter(),
    Body = data,
    AdministrationQueue = this.adminQueue,            // acknowledgements are delivered here
    AcknowledgeType = AcknowledgeTypes.FullReachQueue
};
this.queue.Send(msg);

// Receive() blocks until an acknowledgement arrives; use the TimeSpan overload if you
// need a timeout. Note that the Acknowledgment property may also need to be enabled on
// the admin queue's MessageReadPropertyFilter before it can be read.
Message admMsg = this.adminQueue.Receive();
if (admMsg != null && admMsg.Acknowledgment == Acknowledgment.QueueExceedMaximumSize)
{
    // queue is full - log myData instead of retrying the send
}
We have code that sends two duplicate Tell messages (differing only by an integer), one directly after the other with no processing in between. These are normally processed within a few milliseconds, but we are hitting times when one is processed instantly and the next some 11 seconds later, which is a lifetime.
The randomness and sporadic nature of the issue makes it difficult to diagnose, and the fact that 99% of these messages are processed blisteringly fast makes it a head-scratching issue.
Background: we have a very controlled, stable environment: a 64-bit Windows 10 machine, a dedicated Windows server running a self-hosted Web API in C# with Akka.NET services v1.3 (.NET Framework). No Akka remoting or clustering. As messages are posted in, actors and child actors break them down into smaller and smaller actors; some are stateful, essentially caching DB details about requests to save round trips to the database, since the prices behind requests fluctuate all the time and we only want to post changes to the DB. None of these parent actors are misbehaving.
Currently, logging on entry to and exit from the ProcessMessage methods provides the only real diagnostics for tracking the behaviour.
It is the behaviour of the message queuing that we think is the issue.
Basically, there are two Tells to the same actor. These are very small messages (less than 1 KB) to a very small actor whose sole job is to send an HTTP message. The actor has no caching, DB requests, or I/O (other than logging). Once the message hits the handler's ProcessMessage it is processed in a millisecond or two.
If I understood your issue correctly, upon receiving a message it is forwarded to a downstream actor whose job is to call an external server? If that is the case, then Akka.NET is not at fault. Why?
Actors process messages sequentially and won't process the next message until the current message has been handled completely. The more time it takes to handle the current message, the longer the next message waits.
Probably the external server is overloaded and not sending responses quickly, or maybe rate limiting is turned on at the external server's side.
Perhaps the HttpClient you are using needs fine-tuning.
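To make that reasoning concrete, here is a rough Akka.NET sketch (not the poster's code; the actor, message type, and URL are invented) of one way to keep a slow HTTP call from stalling the mailbox: start the request and PipeTo the result back to Self instead of blocking inside the handler.

using System.Net.Http;
using Akka.Actor;

public class HttpSenderActor : ReceiveActor
{
    // Reuse one HttpClient; creating one per request exhausts sockets.
    private static readonly HttpClient Client = new HttpClient();

    public HttpSenderActor()
    {
        Receive<string>(payload =>
        {
            // Start the request without awaiting it inside the handler, so the
            // actor is free to pick up the next Tell immediately.
            Client.PostAsync("http://example.invalid/endpoint", new StringContent(payload))
                  .PipeTo(Self,
                          success: resp => new SendCompleted(resp.IsSuccessStatusCode),
                          failure: ex => new SendCompleted(false));
        });

        Receive<SendCompleted>(result =>
        {
            // Log the outcome here; retry or escalate as needed.
        });
    }

    public sealed class SendCompleted
    {
        public SendCompleted(bool succeeded) { Succeeded = succeeded; }
        public bool Succeeded { get; }
    }
}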
If you can post a sample of your code, it will help in understanding your issue better!
We use rabbit mq to send messages to a server for processing.
We require the server to ack a message. That way if the server happens to die whilst processing the message, we will retry the message when it restarts, or with a different server.
The problem is, on a very rare occasion, we will get a message that deterministically crashes the server. This is because we call into some open source native dlls, those dlls have bugs, and sometimes these dlls just cause the process to crash with no exception. Of course it would be ideal to fix those bugs, but we don't expect to fix all such issues in pdfium or opencv any time soon. We have to reckon with the fact that whatever we do, we will eventually get such a message.
The result is that the message is retried, the server restarts, picks up the message, crashes, and so on ad infinitum. Nothing gets processed until we manually stop the server and purge the message. Not ideal.
What can we do to solve this problem?
What we don't want to do is create another service that monitors the rabbitmq service, looks for such messages and purges them, since that just leads to spiralling complexity. Instead we want to deal with this at the rabbitmq client level. We would be perfectly happy to say that if a message is not processed 3 times, we should just fail the message. We could do this by maintaining a database entry of which messages we've processed, but ideally I wouldn't want to involve anything external, and just contain the solution to this problem in our rabbitmq client library. I'm not sure how to do this though.
One method I have used in my event-driven architecture is dead letter exchanges (DLXs), or poison queues: if we see the same message multiple times due to a service failure, it gets pushed into the DLX instead of being re-queued into the original exchange. These messages then trigger a different type of process within our system to alert us that messages are stuck and failing to process, and we can then diagnose and fix the consumer. After a fix has been made we trigger another process to move the poison messages back into the original exchange to be processed as normal.
In your scenario, because your process crashes, there are two possible options for dealing with these messages:
If the message is marked as redelivered, clone the message and add an attempt count to the body or as a header (x-attempt-count). The copy is then added to the back of the queue with the attempt count. When the copy is consumed you can check whether it has hit the threshold and then move the message into a DLX or store it in a database. The major drawback is that this breaks the order in which messages are processed. (A rough sketch of this option follows after the second one.)
Use an external service to keep track of the number of delivery attempts. I would recommend something like redis/memcache, where you can increment a counter based on a unique message id. At the start of your process, if the message has been marked as redelivered, look up the counter. If the message has reached the threshold, trigger a different process again, such as moving it into a DLX.
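Here is the promised sketch of the first option, using the RabbitMQ .NET client (6.x assumed; the queue name, header name, and threshold are placeholders, and the queue is assumed to have a dead letter exchange configured):

using System;
using System.Collections.Generic;
using RabbitMQ.Client;
using RabbitMQ.Client.Events;

public class RetryLimitingConsumer
{
    private const int MaxAttempts = 3;                // give up after three crashes

    public void Start(IModel channel)
    {
        var consumer = new EventingBasicConsumer(channel);
        consumer.Received += (sender, ea) =>
        {
            if (ea.Redelivered)                       // the previous attempt died before acking
            {
                int attempts = GetAttemptCount(ea.BasicProperties);
                if (attempts + 1 >= MaxAttempts)
                {
                    // Threshold reached: reject without requeue so the broker routes
                    // the message to the queue's configured dead letter exchange.
                    channel.BasicNack(ea.DeliveryTag, multiple: false, requeue: false);
                    return;
                }

                // Clone the message to the back of the queue with an incremented
                // counter, then ack the original so it is not redelivered as-is.
                var props = channel.CreateBasicProperties();
                props.Headers = new Dictionary<string, object> { ["x-attempt-count"] = attempts + 1 };
                channel.BasicPublish("", "work-queue", props, ea.Body);
                channel.BasicAck(ea.DeliveryTag, multiple: false);
                return;
            }

            ProcessMessage(ea.Body);                  // the work that may crash the process
            channel.BasicAck(ea.DeliveryTag, multiple: false);
        };

        channel.BasicConsume("work-queue", false, consumer);
    }

    private static int GetAttemptCount(IBasicProperties props) =>
        props?.Headers != null && props.Headers.TryGetValue("x-attempt-count", out var value)
            ? Convert.ToInt32(value)
            : 0;

    private static void ProcessMessage(ReadOnlyMemory<byte> body) { /* call into pdfium / opencv etc. */ }
}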
My function is sending a payload to different sftp servers. Those servers are limited in how many connections they can accept.
I need a solution to throttle our connections to those servers.
The function is triggered by storage queues, and the first draft of the design is:
I then learned that you can only have 1 trigger per function, which led me to sandwich in another aggregating queue:
I can set the batchSize/newBatchThreshold on the originating queues, but I'm not certain this will work because the originating queues will not be aware of when to push messages to the aggregate queue.
I need the function to not scale out to more than N instances for all messages from queue X, since the sftp server X will not accept more than N connections.
Additionally, I need the function to scale out to no more than M instances for all messages from queue Y, since the sftp server Y will not accept more than M connections.
The total instances would be M + N for the above scenario.
How do we adjust our design in order to fit these requirements?
There's no 100% bullet-proof solution to this; the issue is tracked here.
The best way could be setting WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT to 1 in the application settings of the Function App that is triggered by the aggregate queue. Then you should only get one concurrent instance of the Function App, so the batchSize setting will actually be useful for rate limiting.
You don't need to limit queue processors X/Y/Z in this case, let the messages flow to the aggregate.
Now, I didn't understand whether only messages from queue X touch SFTP X, or whether it's many-to-many. If it's one-to-one, it makes sense to get rid of the aggregate queue, have three Functions, and limit the concurrency for each of the queues separately.
Anyway, the limit settings are as I suggested above.
In case this still doesn't satisfy your requirements, you may switch to another messaging service. For instance, send all messages of one type into a separate session of Service Bus or a single partition of Event Hub, which will naturally limit the concurrency on the broker level.
Option 1: Depend on the sftp's error response
Does the sftp server return a 429 (too many requests) response? Or something similar? When you get such a response, you can just exit from the function without deleting the message from the queue. The message will become visible again after 30 seconds, and would trigger a function. 30 seconds is the default value of visibilitytimeout and is customizable on a per-msg basis.
Option 2: Distributed locks
I don't know off the top of my head of a distributed locking solution with counters. An alternative would be to implement a locking solution of your own using a SQL DB and atomic transactions. When processing a message from queue X, the function would look in the DB to see whether the lock counter for X is less than N, increase it by 1 if so, and then process the message. In this case you will have to make sure that the locks get released even if your function crashes - that is, implement locks with a lease expiration time.
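For illustration, here is a rough sketch of that counter-based lock using a SQL table (the table, column, and method names are invented; a production version would also store a lease expiry so a crashed function eventually releases its slot):

using System.Data.SqlClient;

// Assumed table, not from the original answer:
//   CREATE TABLE SftpLocks (ServerName NVARCHAR(50) PRIMARY KEY,
//                           ActiveConnections INT NOT NULL,
//                           MaxConnections INT NOT NULL);
public static class SftpLock
{
    // Atomically claim one connection slot for the given server; the single
    // UPDATE statement is the atomic check-and-increment.
    public static bool TryAcquire(SqlConnection conn, string serverName)
    {
        const string sql = @"
            UPDATE SftpLocks
               SET ActiveConnections = ActiveConnections + 1
             WHERE ServerName = @server
               AND ActiveConnections < MaxConnections;";
        using (var cmd = new SqlCommand(sql, conn))
        {
            cmd.Parameters.AddWithValue("@server", serverName);
            return cmd.ExecuteNonQuery() == 1;        // one row updated = slot acquired
        }
    }

    // Release the slot when the transfer finishes (or fails).
    public static void Release(SqlConnection conn, string serverName)
    {
        const string sql = @"
            UPDATE SftpLocks
               SET ActiveConnections = ActiveConnections - 1
             WHERE ServerName = @server
               AND ActiveConnections > 0;";
        using (var cmd = new SqlCommand(sql, conn))
        {
            cmd.Parameters.AddWithValue("@server", serverName);
            cmd.ExecuteNonQuery();
        }
    }
}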
Say I have a connection to Rabbit and I've pulled 1000 messages, but have not yet ack'd them, as they are being processed by a single thread out of a BlockingCollection.
Now suppose my connection dies and is auto-recovered. At this point all of these messages on the server will be re-queued for delivery, but I still have copies of them locally, with the old delivery tags.
This leads me to believe I should handle connection or channel down events by clearing my local queue out.
Can you confirm this is true?
Yes that is the case. Those messages will be redelivered.
So in addition to clearing out your locally queued messages, you might want to reconsider your prefetch so that you don't have so many messages queued locally.
Is your strategy to pull 1000, process them all, then finally ack them all? I can see that for performance reasons you might do this so you can send a single ack with multiple=true, but it does introduce extra redelivery and duplicate-processing risk.
You are right. If you are processing one message at a time, you can set the prefetch count to 1, and then you may not need to clear any messages locally either.
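As a small illustration of that prefetch setting with the RabbitMQ .NET client (assuming an already-open channel; the method name is just for the example):

using RabbitMQ.Client;

static void ConfigurePrefetch(IModel channel)
{
    // prefetchCount caps how many unacknowledged messages the broker will push to
    // this channel, so a connection recovery re-queues at most that many messages.
    channel.BasicQos(prefetchSize: 0, prefetchCount: 1, global: false);
}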
I have a .NET 2.0 server that seems to be running into scaling problems, probably due to poor design of the socket-handling code, and I am looking for guidance on how I might redesign it to improve performance.
Usage scenario: 50 - 150 clients, high rate (up to 100s / second) of small messages (10s of bytes each) to / from each client. Client connections are long-lived - typically hours. (The server is part of a trading system. The client messages are aggregated into groups to send to an exchange over a smaller number of 'outbound' socket connections, and acknowledgment messages are sent back to the clients as each group is processed by the exchange.) OS is Windows Server 2003, hardware is 2 x 4-core X5355.
Current client socket design: A TcpListener spawns a thread to read each client socket as clients connect. The threads block on Socket.Receive, parsing incoming messages and inserting them into a set of queues for processing by the core server logic. Acknowledgment messages are sent back out over the client sockets using async Socket.BeginSend calls from the threads that talk to the exchange side.
Observed problems: As the client count has grown (now 60-70), we have started to see intermittent delays of up to 100s of milliseconds while sending and receiving data to/from the clients. (We log timestamps for each acknowledgment message, and we can see occasional long gaps in the timestamp sequence for bunches of acks from the same group that normally go out in a few ms total.)
Overall system CPU usage is low (< 10%), there is plenty of free RAM, and the core logic and the outbound (exchange-facing) side are performing fine, so the problem seems to be isolated to the client-facing socket code. There is ample network bandwidth between the server and clients (gigabit LAN), and we have ruled out network or hardware-layer problems.
Any suggestions or pointers to useful resources would be greatly appreciated. If anyone has any diagnostic or debugging tips for figuring out exactly what is going wrong, those would be great as well.
Note: I have the MSDN Magazine article Winsock: Get Closer to the Wire with High-Performance Sockets in .NET, and I have glanced at the Kodart "XF.Server" component - it looks sketchy at best.
Socket I/O performance has improved in the .NET 3.5 environment. You can use ReceiveAsync/SendAsync instead of BeginReceive/BeginSend for better performance. Check this out:
http://msdn.microsoft.com/en-us/library/bb968780.aspx
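For reference, a minimal sketch of the SocketAsyncEventArgs pattern that article covers (the buffer size and message handling are placeholders, and clientSocket stands for one already-accepted client socket):

using System;
using System.Net.Sockets;

class AsyncReceiver
{
    private readonly Socket clientSocket;
    private readonly SocketAsyncEventArgs receiveArgs = new SocketAsyncEventArgs();

    public AsyncReceiver(Socket socket)
    {
        clientSocket = socket;
        receiveArgs.SetBuffer(new byte[4096], 0, 4096);   // one reusable buffer, no per-call allocation
        receiveArgs.Completed += OnReceiveCompleted;
    }

    public void Start()
    {
        // ReceiveAsync returns false when the operation completed synchronously,
        // in which case Completed will not fire and we handle the result inline.
        if (!clientSocket.ReceiveAsync(receiveArgs))
            OnReceiveCompleted(clientSocket, receiveArgs);
    }

    private void OnReceiveCompleted(object sender, SocketAsyncEventArgs e)
    {
        if (e.SocketError != SocketError.Success || e.BytesTransferred == 0)
            return;                                       // error or peer closed: stop the loop

        HandleMessage(e.Buffer, e.BytesTransferred);      // parse and hand off to the core logic
        Start();                                          // post the next receive
    }

    private void HandleMessage(byte[] buffer, int count) { /* ... */ }
}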
A lot of this has to do with many threads running on your system and the kernel giving each of them a time slice. The design is simple, but does not scale well.
You should probably look at using Socket.BeginReceive, which executes on the .NET thread pool (you can configure how many threads it uses), and then push onto a queue from the asynchronous callback (which can run on any of the thread-pool threads). This should give you much higher performance.
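A rough sketch of that shape, with invented names (clientSocket is one accepted client socket, and workQueue is whatever the core server logic drains):

using System;
using System.Collections.Concurrent;
using System.Net.Sockets;

class ClientReader
{
    private readonly Socket clientSocket;
    private readonly byte[] buffer = new byte[4096];
    private readonly BlockingCollection<byte[]> workQueue;

    public ClientReader(Socket socket, BlockingCollection<byte[]> queue)
    {
        clientSocket = socket;
        workQueue = queue;
    }

    public void Start()
    {
        clientSocket.BeginReceive(buffer, 0, buffer.Length, SocketFlags.None, OnReceive, null);
    }

    private void OnReceive(IAsyncResult ar)
    {
        int read = clientSocket.EndReceive(ar);   // runs on a thread-pool thread
        if (read <= 0)
            return;                               // connection closed

        var message = new byte[read];
        Array.Copy(buffer, message, read);
        workQueue.Add(message);                   // hand the bytes to the core logic

        Start();                                  // post the next receive
    }
}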
A thread per client seems massively overkill, especially given the low overall CPU usage here. Normally you would want a small pool of threads to service all clients, using BeginReceive to wait for work async - then simply despatch the processing to one of the workers (perhaps simply by adding the work to a synchronized queue upon which all the workers are waiting).
I am not a C# guy by any stretch, but for high-performance socket servers the most scalable solution is to use I/O Completion Ports with a number of active threads appropriate for the CPU(s) the process is running on, rather than using the one-thread-per-connection model.
In your case, with an 8-core machine you would want 16 total threads with 8 running concurrently. (The other 8 are basically held in reserve.)
Socket.BeginConnect and Socket.BeginAccept are definitely useful. I believe they use the ConnectEx and AcceptEx calls in their implementation. These calls wrap the initial connection negotiation and data transfer into one user/kernel transition. Since the initial send/receive buffer is already ready, the kernel can just send it off - either to the remote host or to userspace.
They also keep a queue of listeners/connectors ready, which probably gives a bit of a boost by avoiding the latency involved with userspace accepting/receiving a connection and handing it off (and all the user/kernel switching).
To use BeginConnect with a buffer it appears that you have to write the initial data to the socket before connecting.
As others have suggested, the best way to implement this would be to make the client-facing code all asynchronous. Use BeginAcceptTcpClient() on the TcpListener so that you don't have to manually spawn a thread. Then use BeginRead()/BeginWrite() on the underlying NetworkStream that you get from the accepted TcpClient.
However, there is one thing I don't understand here. You said that these are long-lived connections and a large number of clients. Assuming the system has reached a steady state where you have your maximum number of clients (say 70) connected, you have 70 threads listening for client packets, and the system should still be responsive - unless your application has memory or handle leaks and you are running out of resources, so that your server is paging. I would put a timer around the call to Accept() where you kick off a client thread and see how much time that takes. I would also start Task Manager and PerfMon and monitor "Non-Paged Pool", "Virtual Memory" and "Handle Count" for the app to see whether it is in a resource crunch.
While it is true that going async is the right way to go, I am not convinced it will really solve the underlying problem. I would monitor the app as I suggested and make sure there are no intrinsic problems of leaking memory and handles. In this regard, "BigBlackMan" above was right - you need more instrumentation to proceed. Don't know why he was downvoted.
Random intermittent ~250msec delays might be due to the Nagle algorithm used by TCP. Try disabling that and see what happens.
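For reference, disabling Nagle is a one-line change on each socket (clientSocket here stands for the per-client Socket accepted from the TcpListener):

using System.Net.Sockets;

static void DisableNagle(Socket clientSocket)
{
    // Send small messages immediately instead of waiting to coalesce them.
    clientSocket.NoDelay = true;
}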
One thing I would want to rule out is something as simple as the garbage collector running. If all your messages are on the heap, you are generating 10,000 objects a second.
Take a read of Garbage Collection every 100 seconds
The only solution is to keep your messages off the heap.
I had the same issue 7 or 8 years ago, with 100 ms to 1 second pauses; the problem was garbage collection. We had about 400 MB in use out of 4 GB, BUT there were a lot of objects.
I ended up storing messages in C++, but you could use the ASP.NET cache (which used to use COM and moved them out of the heap).
I don't have an answer, but to get more information I'd suggest sprinkling your code with timers and logging the average and max time taken for suspect operations like adding to the queue or opening a socket.
At least that way you will have an idea of what to look at and where to begin.