I'm interacting with an Azure Storage account through CloudQueueClient, in a manner similar to this MSDN example:
CloudStorageAccount storageAccount = CloudStorageAccount.Parse("some connection string");
CloudQueueClient queueClient = storageAccount.CreateCloudQueueClient();
and to add an element to a queue (with some boilerplate removed):
var queue = queueClient.GetQueueReference("queuename");
var message = new CloudQueueMessage("myString");
await queue.AddMessageAsync(message);
So that means I can add "myString" to my queue. Great. And if I repeatedly call those lines of code I can add "myString" lots of times. Also good, but inefficient.
How do I add multiple items to the queue in one message?
I've researched this a bit and found Entity Group Transactions, which may be a suitable fit. However, this looks very different from what I've been doing and doesn't really give me any code examples. Is there any way to use this and continue to use the Microsoft.WindowsAzure.StorageClient library to construct my messages?
One way you can send multiple messages is to build a wrapper class that contains a list of individual string values (or objects), serialize it into a JSON object for example, and send that as the payload for the message. The issue here is that, depending on the size of your objects, you could eventually exceed the size limitation of a message. So that's not a recommended implementation.
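If the batches are known to stay small, a minimal sketch of that wrapper approach could look like this (MessageBatch and SendBatchAsync are made-up names; Json.NET is assumed for serialization):

using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage.Queue;
using Newtonsoft.Json;

// Hypothetical wrapper: several logical items carried by one queue message.
public class MessageBatch
{
    public List<string> Items { get; set; } = new List<string>();
}

public static class BatchSender
{
    public static Task SendBatchAsync(CloudQueue queue, MessageBatch batch)
    {
        // Serialize the whole batch as a single payload. Queue messages are
        // capped in size (64 KB), so large batches must be split before sending.
        string payload = JsonConvert.SerializeObject(batch);
        return queue.AddMessageAsync(new CloudQueueMessage(payload));
    }
}

The consumer then deserializes the payload back into a MessageBatch and iterates over Items.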
At some point I was dealing with a massively distributed system that needed to send a lot of messages per second. Instead of batching multiple messages, we ended up sharding the message queues across multiple storage accounts; the scalability requirements drove us to this implementation.
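As an illustration only (not the code from that system), picking a shard can be as simple as hashing a partition key across the available queues:

using System.Collections.Generic;
using Microsoft.WindowsAzure.Storage.Queue;

public static class QueueSharding
{
    // shards: one CloudQueue per shard, potentially spread across storage accounts.
    public static CloudQueue PickShard(IReadOnlyList<CloudQueue> shards, string partitionKey)
    {
        // Mask to non-negative before taking the modulus. GetHashCode is not
        // stable across processes, which is acceptable for load spreading.
        int index = (partitionKey.GetHashCode() & int.MaxValue) % shards.Count;
        return shards[index];
    }
}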
Keep in mind that Chunky vs Chatty applies to sending more information in order to avoid roundtrips, so as to optimize performance. Message queuing is not as much about performance; it is more about scalability and distribution of information. In other words, eventual consistency and distributed scale out are the patterns to favor in this environment. I am not saying you should ignore chunky vs chatty, but you should apply it where it makes sense.
If you need to send 1 million messages per second, for example, then chunkier calls are an option; sharding is another. I typically favor sharding because there are fewer scalability boundaries. But in some cases, if the problem is simple enough, chunking might suffice.
I believe there is no real need to add multiple items to a queue in one message, because the best practice is to keep message and corresponding message handler as small as possible.
It is very hard to talk about inefficiency here. But when the number of messages grows so much that it really impacts performance, you could use the BatchFlushInterval property to batch your messages. Otherwise, follow Best Practices for Performance Improvements Using Service Bus Brokered Messaging.
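For Service Bus, a rough sketch of turning on client-side batching might look like this (the namespace address, key name, and flush interval are placeholders):

using System;
using Microsoft.ServiceBus;
using Microsoft.ServiceBus.Messaging;

var settings = new MessagingFactorySettings
{
    TokenProvider = TokenProvider.CreateSharedAccessSignatureTokenProvider("keyName", "key"),
    // Sends issued within this window are coalesced into fewer wire operations.
    NetMessagingTransportSettings = { BatchFlushInterval = TimeSpan.FromMilliseconds(50) }
};
var factory = MessagingFactory.Create(new Uri("sb://yournamespace.servicebus.windows.net/"), settings);
var sender = factory.CreateMessageSender("queuename");
await sender.SendAsync(new BrokeredMessage("myString"));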
UPDATE:
By batching messages yourself (within a list, for example), you would need to solve at least the following problems, which may result in an unmanageable solution:
Keep track of the size of the message so as not to exceed the maximum message size limit
Find ways to abandon, complete, and move particular messages to a dead-letter queue
Implement a batching strategy yourself
Keep track of long message processing times and implement locking if processing takes too long
PS If you could outline the purpose of your question, then a better solution might be found.
Related
I have an Azure WebJob that loops through the pages of a file and processes them. The job also has an ICollector to an output queue:
[Queue("batch-pages-to-process")] ICollector<QueueMessageBatchPage> outputQueueMessage
I need to wait until all of the pages are processed before I send everything to the output queue, so instead of adding each message to the ICollector in my file processing loop, I add the messages to a list of queue messages:
List<QueueMessageBatchPage>
After all of the pages have been dealt with, I then loop through the list and add the messages to the ICollector:
foreach (var m in outputMessages)
{
outputQueueMessage.Add(m);
}
But this last part seems to take a long time. To add 300 queue messages, it takes almost 50 seconds. I don't have much to gauge by, but that seems slow. Is this normal?
There's no objective standard of slow vs. fast to offer you, but a few thoughts:
a) Part of the queuing time will be serialization of each QueueMessageBatchPage instance... the performance of that will be inversely related to the breadth and depth of the object graphs those instances represent. More data obviously takes more time to write to the queue.
b) I know you mentioned that you can't write to the queue until all file lines have been processed, but if at all possible you might reconsider that choice. To the extent you could parallelize both the processing of lines in the file and subsequent writing to the output queue (using either multiple WebJob instances or perhaps TPL Tasks within a single WebJob instance), you could potentially get this work done a lot faster. Again, I realize you stated upfront that you can't do that, so I'm just suggesting you consider the full implications of that choice (if you haven't already).
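As a sketch of that idea (not the author's code; QueueMessageBatchPage is the type from the question, and this bypasses ICollector to write to the queue directly so the writes overlap rather than run one at a time):

using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage.Queue;
using Newtonsoft.Json;

public static class QueueFlusher
{
    public static Task FlushAsync(CloudQueue queue, IEnumerable<QueueMessageBatchPage> outputMessages)
    {
        // Start every AddMessageAsync call up front and await them together,
        // paying roughly one round-trip of latency instead of three hundred.
        var writes = outputMessages.Select(m =>
            queue.AddMessageAsync(new CloudQueueMessage(JsonConvert.SerializeObject(m))));
        return Task.WhenAll(writes);
    }
}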
c) One other possibility to look at... make sure the region where your storage queue lives is the same as where your WebJob lives, to minimize latency.
Best of luck!
At the minute I am trying to put together an asynchronous TCP server to receive data which I then want to process, extracting values and inserting them into SQL Server.
The basic concept I thought would be best is: once the data is received and confirmed as the entire message, the message should be passed off to some sort of collection to await processing on a FIFO basis, which will parse the values and insert them into SQL Server. I suppose this is what's known as the producer/consumer pattern.
I have been looking into the best collection / way of doing this and have so far seen BlockingCollection, ConcurrentQueue, and BufferBlock using async/await, and I think this may be the way to go, but to be honest I'm not sure.
The best example I have found is on Stephen Cleary's blog, in particular this article:
http://blog.stephencleary.com/2012/11/async-producerconsumer-queue-using.html
My main reservation is that I in no way want to slow down or interrupt the receiving of messages, which to me would suggest using the multiple producer/consumer example that can be seen at the above link. But what I want to know is:
Am I correct in this assumption, or is there a more suitable way of doing this in my scenario?
And if I'm correct in my assumption, could anyone suggest the best way of implementing this, taking my use case into consideration?
Any and all help is much appreciated.
At the minute I am trying to put together an asynchronous TCP server to receive data which I then want to process, extracting values and inserting them into SQL Server.
There's a common pitfall with this kind of scenario. It is usually wrong to report success back to the client when the work has yet to be done. Most of the time I've seen this design, it's because of an efficiency "requirement" self-imposed by the developer, not by the client or for technical reasons. So first, take a step back and make absolutely sure that you do want to return a "successful completion" message to the client when the operation has not actually completed yet.
If you are sure that's what you want to do, then there's another question you must ask: is it acceptable to lose requests? That is, after you tell the client that the operation successfully completed, will the system still be stable if the operation does not actually ever complete?
The answer to that question is usually "no." At that point, the most common architectural solution is to have an out-of-process reliable queue (such as an Azure queue or MSMQ), with an independent backend (such as an Azure worker role or Win32 service) that processes the queue messages. This definitely complicates the architecture, but it is a necessary complication if the system must return completion messages early and must not lose messages.
On the other hand, if losing messages is acceptable, then you can keep them in-memory. It is only in this case that you can use one of the in-memory producer/consumer types mentioned on my blog. This is a very rare situation, but it does happen from time to time.
In general, I would avoid using BlockingCollection and friends for this sort of work. Doing so encourages you to architect the entire system into a single process, which is the enemy of scalability and reliability.
I second Stephen Cleary's suggestion of using an out-of-process queue to manage the work. I disagree that this necessarily complicates the architecture, though - in fact, I think it can make things quite a bit simpler. Specifically, a major complication of the original requirement ("put together an asynchronous tcp server") disappears. Asynchronous TCP servers are a pain in the butt to write and easy to screw up - why not just skip that part altogether and be free to focus all of your energy on the post-processing code?
When I built a system like this, I used a Redis List as the task queue. Tasks were serialized to JSON, and clients would add their task to the queue with an RPUSH command. Worker processes would retrieve the next task from the queue with BLPOP, do their thing, then go back to waiting for the next task.
Advantages:
No locks. All synchronization comes for free from Redis (or whatever task queue you choose).
Everything in the system is single-threaded. Multi-threading is hard.
I'm free to spin up as many worker processes as I want, across as many nodes as I want.
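A minimal sketch of that pattern with StackExchange.Redis (the queue key and task shape are made up; note that StackExchange.Redis deliberately does not expose the blocking BLPOP, so this worker polls with ListLeftPopAsync instead):

using System;
using System.Threading.Tasks;
using StackExchange.Redis;

public static class RedisTaskQueue
{
    const string QueueKey = "tasks"; // hypothetical list name

    // Client side: RPUSH the serialized task onto the tail of the list.
    public static Task EnqueueAsync(IDatabase db, string taskJson) =>
        db.ListRightPushAsync(QueueKey, taskJson);

    // Worker side: pop from the head of the list, sleeping briefly when empty.
    public static async Task RunWorkerAsync(IDatabase db, Func<string, Task> handle)
    {
        while (true)
        {
            RedisValue task = await db.ListLeftPopAsync(QueueKey);
            if (task.IsNull) { await Task.Delay(100); continue; }
            await handle(task);
        }
    }
}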
Is it possible to store a large number of messages in bulk?
I want to send them synchronously and persistently, but still get a lot of speed out of sending many at one time.
I am using NMS, the .NET version of the Java framework. But if you only know how to do this in Java, that would help too; maybe I can more easily find a solution for .NET from there.
I thought of things like transactions. But I only got transactions to work for consumers, not for producers.
Conventional wisdom used to suggest that if you wanted maximum throughput when sending in bulk, you should use a SESSION_TRANSACTED acknowledgement mode and batch all of the message sends together with a .commit().
Unfortunately, here's a benchmark showing this not to be the case: http://www.jakubkorab.net/2011/09/batching-jms-messages-for-performance-not-so-fast.html - you are better off just sending them as normal, without transactions. If you are already using transactions, then it may make sense to try to batch them.
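For reference, the transacted variant in NMS looks roughly like this (broker URI and queue name are placeholders):

using Apache.NMS;
using Apache.NMS.ActiveMQ;

IConnectionFactory factory = new ConnectionFactory("tcp://localhost:61616");
using (IConnection connection = factory.CreateConnection())
using (ISession session = connection.CreateSession(AcknowledgementMode.Transactional))
using (IMessageProducer producer = session.CreateProducer(session.GetQueue("bulk.queue")))
{
    producer.DeliveryMode = MsgDeliveryMode.Persistent; // survive broker restarts
    for (int i = 0; i < 1000; i++)
        producer.Send(session.CreateTextMessage("message " + i));
    session.Commit(); // one commit covers the whole batch
}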
My advice here also is that unless you are dealing with messages that are extremely time sensitive, the rate at which you produce isn't going to be that big of a deal - you should be more concerned with bandwidth as opposed to speed of message sends. If you don't mind your messages being out of order you can have multiple producers produce these messages to a given destination... or if you need them in order use multiple producers and then a resequencer after they are in the broker.
I'm doing a project with some timing constraints right now. The setup: a web service accepts (tiny) XML files, and I have to process these, fast.
First and most naive idea was to handle this processing in the request dispatcher itself, but that didn't scale and was doomed from the start.
So now I'm looking at a varying load of incoming requests that each produce ~50 jobs on my side. The technologies available for use are limited due to the customer's rules: if it's not SQL Server or MSMQ, it probably won't fly.
I thought about going down the MSMQ route (web service just submitting messages, multiple consumer processes later on), and small proof-of-concept modules worked like a charm.
There's one problem though: the priority of these jobs might change a lot while they sit in the queue. The system is fairly time critical, so if we - for whatever reason - cannot process incoming jobs in a timely fashion, we need to prefer the latest ones.
Basically the use case changes from reliable messaging in general to LIFO under (too) heavy load. Old entries still have to be processed, but they have just lost all of their priority.
Is there any manageable way to build something like this in MS MQ?
Expanding the business side, as requested:
The processing of the incoming job is bound to some tracks, where physical goods are moved around. If I cannot process the messages in time, the things are "gone".
I still want the results for statistical purpose, but really need to focus on the newer messages now.
Think of me being able to influence mechanical things and reroute items moving on a track - if they haven't moved past point X yet...
So, if I understand this, you want to be able to switch between sorting the queue by priority OR by arrival time, depending on the situation. MSMQ can only sort the queue by priority AND by arrival time.
Although I understand what you are trying to do, I don't quite see the business justification for it. Can you expand on this?
I would propose using a service to move messages from the incoming queue to a number of work queues for processing. Under normal load, there would be several queues, each with a monitoring thread.
Under heavy load, new traffic would all go to just one "panic" queue until the load dropped. The threads on the other work queues could be paused if necessary.
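A rough sketch of such a routing service with System.Messaging (queue paths and the load check are placeholders):

using System;
using System.Messaging;

var incoming = new MessageQueue(@".\private$\incoming");
var work = new MessageQueue(@".\private$\work");
var panic = new MessageQueue(@".\private$\panic");
bool SystemIsOverloaded() => false; // placeholder for whatever load check you use

while (true)
{
    try
    {
        // Block for up to a second waiting for the next message.
        Message msg = incoming.Receive(TimeSpan.FromSeconds(1));
        (SystemIsOverloaded() ? panic : work).Send(msg);
    }
    catch (MessageQueueException e) when (e.MessageQueueErrorCode == MessageQueueErrorCode.IOTimeout)
    {
        // Nothing arrived within the timeout; loop and wait again.
    }
}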
Cheers, John Breakwell
I'm working on an application that may generate thousands of messages in a fairly tight loop on a client, to be processed on a server. The chain of events is something like:
Client processes item, places in local queue.
Local queue processing picks up messages and calls web service.
Web service creates message in service bus on server.
Service bus processes message to database.
The idea being that all communications are asynchronous, as there will be many clients for the web service. I know that MSMQ can do this directly, but we don't always have that kind of admin capability on the clients to set things up like security etc.
My question is about the granularity of the messages at each stage. The simplest method would mean that each item processed on the client generates one client message/web service call/service bus message. That's fine, but I know it's better for the web service calls to be batched up if possible, except that there's a tradeoff between large-granularity web service DTOs and short-running transactions on the database. This particular scenario does not require a "business transaction" where all or none of the items are processed; I'm just looking to achieve the best balance of message size vs. number of web service calls vs. database transactions.
Any advice?
Chatty interfaces (i.e. lots and lots of messages) will tend to have a high overhead from dispatching the incoming message (and, on the client, the reply) to the correct code to process it - a fixed cost per message - while big messages tend to use more resources in processing the message itself.
Additionally a lot of web service calls in progress will mean a lot of TCP/IP connections to manage, and concurrency issues (including locking in a database) might become an issue.
But without some details of the processing of the message it is hard to be specific, other than the general advice against chatty interfaces because of the fixed overheads.
Measure first, optimize later. Unless you can make a back-of-the-envelope estimate that shows that the simplest solution yields unacceptably high loads, try it, establish good supervisory measurements, see how it performs and scales. Then start thinking about how much to batch and where.
This approach, of course, requires you to be able to change the web service interface after deployment, so you need a versioning approach to deal with clients which may not have been redesigned, supporting several WS versions in parallel. But not thinking about versioning almost always traps you in suboptimal interfaces, anyway.
Abstract the message queue
and have a swappable message queue backend. This way you can test many backends and give yourself an easy bail-out should you pick the wrong one, or grow to like a new one that appears. The overhead of messaging is usually in packing and handling the request. Different systems are designed for different levels of traffic and different symmetries over time.
If you abstract out the basic features you can swap the mechanics in and out as your needs change, or are more accurately assessed.
You can also translate messages between differing queue types at various portions of the application or message route as the recipients' stresses change, because different stages may be handling, for example, 1000:1/s vs 10:1/s at a higher level.
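One possible shape for that abstraction (names are illustrative): a minimal interface that each backend - MSMQ, a database table, a broker - implements, so producers and consumers never see the concrete mechanics:

using System.Threading;
using System.Threading.Tasks;

public interface IMessageQueue
{
    // Producers only ever program against this surface.
    Task EnqueueAsync(byte[] payload, CancellationToken token = default);
    // Consumers wait (asynchronously) until a message is available.
    Task<byte[]> DequeueAsync(CancellationToken token = default);
}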
Good Luck