is it possible to store a large amount of messages in bulk?
I want to send them sync, persistent, but to get speed very much at one time.
I am using NMS, the .net version of the java-framework. But if you only know how to do this in java, it would even help. Maybe I can find a solution for .net more easier.
I thought of things like transactions. But I only got transactions to work for consumers, not for producers.
Conventional wisdom used to suggest that if you wanted maximum throughput when sending in bulk, then you should a SESSION_TRANSACTED acknowledgement mode and batch all of the message sends together with a .commit().
Unfortunately, here's a benchmark showing this not to be the case http://www.jakubkorab.net/2011/09/batching-jms-messages-for-performance-not-so-fast.html and that are you better off just sending them as normal without transactions. If you are already using transactions, then it may make sense to try and batch them.
My advice here also is that unless you are dealing with messages that are extremely time sensitive, the rate at which you produce isn't going to be that big of a deal - you should be more concerned with bandwidth as opposed to speed of message sends. If you don't mind your messages being out of order you can have multiple producers produce these messages to a given destination... or if you need them in order use multiple producers and then a resequencer after they are in the broker.
Related
I'm interacting with an AzureStorageAccount with CloudQueueClient similarly to the manner described in this msdn example
CloudStorageAccount storageAccount = CloudStorageAccount.Parse("some connection string");
CloudQueueClient queueClient = storageAccount.CreateCloudQueueClient();
and to add an element to a queue (with some boiler plate removed):
var queue = queueClient.GetQueueReference("queuename");
var message = new CloudQueueMessage("myString");
queue.AddMessageAsync(message);
So that means I can add "myString" to my queue. Great. And if I repeatedly call those lines of code I can add "mystring" lots of time. Also good, but inefficient.
How do I add multiple items to the queue in one message?
I've researched this a bit and found Entity Group Transactions, which may be a suitable fit. However, this looks very different to what I've been doing and don't really give me any code examples. Is there any way to use this and continue to use Microsoft.WindowsAzure.StorageClient libary to construct my messages?
One way you can send multiple messages is to build a wrapper class that contains a list of individual string values (or objects), serialize this into a JSon object for example, and send that as the payload for the message. The issue here is that depending on the size of your objects, you could eventually exceed the size limitation of a message. So that's not a recommended implemtation.
At some point I was dealing with a system that needed to send a lot of messages per second; it was massively distributed. Instead of batching multiple messages we ended up creating a shard of message queues, across multiple storage accounts. The scalability requirements drove us to this implementation.
Keep in mind that Chunky vs Chatty applies to sending more information in order to avoid roundtrips, so as to optimize performance. Message queuing is not as much about performance; it is more about scalability and distribution of information. In other words, eventual consistency and distributed scale out are the patterns to favor in this environment. I am not saying you should ignore chunky vs chatty, but you should apply it where it makes sense.
If you need to send 1 million messages per second for example, then chunkier calls is an option; sharding is another. I typically favor sharding because there are fewer scalability boundaries. But in some cases, if the problem is simple enough, chunking might suffice.
I believe there is no real need to add multiple items to a queue in one message, because the best practice is to keep message and corresponding message handler as small as possible.
It is very hard to talk about inefficiency here. But when the number of messages grow so that it will really impact performance then you could use the BatchFlushInterval property to batch your messages. Otherwise follow Best Practices for Performance Improvements Using Service Bus Brokered Messaging
UPDATE:
By batching messages yourself within, for example a list, you would need to solve, at the very least, the following problems which may result in an unmanageable solution:
Keep track on the size of the message so as to not exceed the maximum message size limit
Find the ways to abandon, complete and move particular messages to a dead letter queue
Implement a batching strategy yourself
Keep track of large message processing times and implement locking if it takes too long
PS If you could outline the purpose of your question, then a better solution might be found.
At the minute I am trying to put together an asynchronous tcp server to receive data which I then want to process, extracting values and inserting to sql server.
The basic concept I thought would be best is once the data is received and confirmed as the entire message, the message should then be passed of to some sort of collection to await processing on a FIFO basis, which will parse the values and insert them to sql server. I suppose this is whats known as the consumer/producer pattern.
I have been doing some looking into the best collection / way of doing this and have so far seen the BlockingCollection,ConcurrentCollection and BufferBlock using async/await and i think this may be the way to go but to be honest im not sure.
The best example i have found is on Stephen Cleary's blog in particular this article,
http://blog.stephencleary.com/2012/11/async-producerconsumer-queue-using.html
My main reservations are that I in no way want to slow down or interrupt the receiving of messages which to me would suggest using the multiple producer/consumer example which can be seen at the above link, but what i want to know is;
Am i correct in this assumption or is there a more suitable way of doing this in my scenario.
And if im correct in my assumption could anyone suggest the best way of implementing this taking into consideration my use case.
Any and all help is much appreciated.
At the minute I am trying to put together an asynchronous tcp server to receive data which I then want to process, extracting values and inserting to sql server.
There's a common pitfall with this kind of scenario. It is usually wrong to report success back to the client when the work has yet to be done. Most of the time I've seen this design, it's because of an efficiency "requirement" self-imposed by the developer, not by the client or for technical reasons. So first, take a step back and make absolutely sure that you do want to return a "successful completion" message to the client when the operation has not actually completed yet.
If you are sure that's what you want to do, then there's another question you must ask: is it acceptable to lose requests? That is, after you tell the client that the operation successfully completed, will the system still be stable if the operation does not actually ever complete?
The answer to that question is usually "no." At that point, the most common architectural solution is to have an out-of-process reliable queue (such as an Azure queue or MSMQ), with an independent backend (such as an Azure worker role or Win32 service) that processes the queue messages. This definitely complicates the architecture, but it is a necessary complication if the system must return completion messages early and must not lose messages.
On the other hand, if losing messages is acceptable, then you can keep them in-memory. It is only in this case that you can use one of the in-memory producer/consumer types mentioned on my blog. This is a very rare situation, but it does happen from time to time.
In general, I would avoid using BlockingCollection and friends for this sort of work. Doing so encourages you to architect the entire system into a single process, which is the enemy of scalability and reliability.
I second Stephen Cleary's suggestion of using an out-of-process queue to manage the work. I disagree that this necessarily complicates the architecture, though - in fact, I think it can make things quite a bit simpler. Specifically, a major complication of the original requirement ("put together an asynchronous tcp server") disappears. Asynchronous TCP servers are a pain in the butt to write and easy to screw up - why not just skip that part altogether and be free to focus all of your energy on the post-processing code?
When I built a system like this, I used a Redis List as the task queue. Tasks were serialized to JSON, and clients would add their task to the queue with a RPUSH command. Worker processes retrieve the next task from the queue BLPOP, do their thing, then go back to waiting for the next task.
Advantages:
No locks. All synchronization comes for free from Redis (or whatever task queue you choose).
Everything in the system is single-threaded. Multi-threading is hard.
I'm free to spin up as many worker processes as I want, across as many nodes as I want.
I am currently developing a C# socket server that needs to send and receive commands to a real-time process. The client is an android device. Currently the real-time requirements are "soft", however in the future more strict timing requirements might arise. Lets say in the future it might be to send commands to a crane that could be potentially dangerous.
The server is working, and seemingly very well with my current synchronous socket server design. I have separate threads for receiving and sending data. I am wondering if there would be any reason to attempt an asynchronous server socket approach? Could it provide more stability and/or faster performance?
I'll gloss over the definition of real time and say that asynchronous sockets won't make the body of the request process any faster, but will increase concurrency (the number of requests you can take at any one time). If all processors are busy processing something, you won't get any gain. This only gives you gain in the situation where a processor would have sat waiting for a socket to receive something.
Just a note on real time, if your real time requirements are anything like the need to guarantee a response in x-time, then C# and .NET will not give you such guarantees. This, however, depends on your current and future definitions of "soft". It may be the case that you happen to be getting good response times, but don't confuse that with true real time systems.
If you're doubting the usefullness of something asynchronous in your aplications then you should definitely read about this. It gives you a clear idea of what the asynchronous solutions could add to your applications
I don't think you are going to get more stability or faster performance. If it really is a "real-time" system, then it should be synchronous. If you can tolerate "near real-time" and there are long running or expensive compute operations, then you could consider an asynchronous approach. I would not add the complexity if not needed though.
If it's real time, then you absolutely want your communications to be backed by a queue so that you can prove temporal logic on that queue. This is what nio/io-completion-ports/async gives you. If you are using synchronous programming then you are wasting your CPU while copying data from RAM to the network card.
Furthermore, it means that your server is absolutely single-threaded. You may have a single thread even with async, but still be able to serve thousands of requests.
Say for example that a client wanted to perform a DOS attack. He would connect and send one byte of data. Your application would now become unable to receive further commands for the timeout of that connection, which could be quite large. With async, you would ACK the SYN package back, but your code would not be waiting for the full transmission.
I'm doing a project with some timing constraints right now. Setup is: A web service accepts (tiny) xml files and I have to process these, fast.
First and most naive idea was to handle this processing in the request dispatcher itself, but that didn't scale and was doomed from the start.
So now I'm looking at a varying load of incoming requests that each produce ~ 50 jobs on my side. Technologies available for use are limited due to the customers' rules. If it's not Sql Server or MS MQ it probably won't fly.
I thought about going down the MS MQ route (Web service just submitting messages, multiple consumer processes lateron) and small proof of concept modules worked like a charm.
There's one problem though: The priority of these jobs might change a lot, in the queue. The system is fairly time critical, so if we - for whatever reasons - cannot process incoming jobs in a timely fashion, we need to prefer the latest ones.
Basically the usecase changes from reliable messaging in general to LIFO under (too) heavy load. Old entries still have to be processed, but just lost all of their priority.
Is there any manageable way to build something like this in MS MQ?
Expanding the business side, as requested:
The processing of the incoming job is bound to some tracks, where physical goods are moved around. If I cannot process the messages in time, the things are "gone".
I still want the results for statistical purpose, but really need to focus on the newer messages now.
Think of me being able to influence mechanical things and reroute things moving on a track - if they didn't move past point X yet..
So, if i understand this, you want to be able to switch between sorting the queue by priority OR by arrival time, depending on the situation. MSMQ can only sort the queue by priority AND by arrival time.
Although I understand what you are trying to do, I don't quite see the business justification for it. Can you expand on this?
I would propose using a service to move messages from the incoming queue to a number of work queues for processing. Under normal load, there would be a several queues, each with a monitoring thread.
Under heavy load, new traffic would all go to just one "panic" queue under the load dropped. The threads on the other work queues could be paused if necessary.
CheersJohn Breakwell
I'm working on an application that may generate thousands of messages in a fairly tight loop on a client, to be processed on a server. The chain of events is something like:
Client processes item, places in local queue.
Local queue processing picks up messages and calls web service.
Web service creates message in service bus on server.
Service bus processes message to database.
The idea being that all communications are asynchronous, as there will be many clients for the web service. I know that MSMQ can do this directly, but we don't always have that kind of admin capability on the clients to set things up like security etc.
My question is about the granularity of the messages at each stage. The simplest method would mean that each item processed on the client generates one client message/web service call/service bus message. That's fine, but I know it's better for the web service calls to be batched up if possible, except there's a tradeoff between large granularity web service DTOs, versus short-running transactions on the database. This particular scenario does not require a "business transaction", where all or none items are processed, I'm just looking to achieve the best balance of message size vs. number of web service calls vs. database transactions.
Any advice?
Chatty interfaces (i.e. lots and lots of messages) will tend to have a high overhead from dispatching the incoming message (and, on the client, the reply) to the correct code to process the message (this will be a fixed cost per message). While big messages tend to use the resources in processing the message.
Additionally a lot of web service calls in progress will mean a lot of TCP/IP connections to manage, and concurrency issues (including locking in a database) might become an issue.
But without some details of the processing of the message it is hard to be specific, other than the general advice against chatty interfaces because of the fixed overheads.
Measure first, optimize later. Unless you can make a back-of-the-envelope estimate that shows that the simplest solution yields unacceptably high loads, try it, establish good supervisory measurements, see how it performs and scales. Then start thinking about how much to batch and where.
This approach, of course, requires you to be able to change the web service interface after deployment, so you need a versioning approach to deal with clients which may not have been redesigned, supporting several WS versions in parallel. But not thinking about versioning almost always traps you in suboptimal interfaces, anyway.
Abstract the message queue
and have a swappable message queue backend. This way you can test many backends and give yourself an easy bail-out should you pick the wrong one or grow to like a new one that appears. The overhead of messaging is usually packing and handling the request. Different systems are designed for different levels traffic and different symmetries over time.
If you abstract out the basic features you can swap the mechanics in and out as your needs change, or are more accurately assessed.
You can also translate messages from differing queue types at various portions of the application or message route as the recipient's stresses change because they are handling, for example 1000:1/s vs 10:1/s on a higher level.
Good Luck