I'm having a little trouble deciding which way to go while designing the message flow in our system.
Because of the volatile nature of our business processes (e.g. calculating freight costs), we use a workflow framework so we can change the process on the fly.
The general process should look something like this:
The interface is a service which connects to the customer's system via whatever interface the customer provides (web services, TCP endpoints, database polling, files, you name it). Then a command is sent to the executor containing the received data and the id of the workflow to be executed.
The first problem comes at the point where we want to distribute load across multiple worker services.
Say we have different processes like printing parcel labels, calculating prices, and sending notification mails. Printing the labels should never be delayed just because a ton of mailing workflows is being executed. So we want to be able to route commands to different workers based on the work they do.
Because all commands are like "execute workflow XY", we would be required to implement our own content-based routing. NServiceBus does not support this out of the box, mostly because it's considered an anti-pattern.
Is there a better way to do this, when you are not able to use different message types to route your messages?
The second problem comes when we want to add monitoring. Because an endpoint can only subscribe to one queue for each message type, we cannot let all executors just publish an "I completed a workflow" message. The current solution would be to Bus.Send the message to a pre-configured auditing endpoint. This feels a little like cheating to me ;)
Is there a better way to consolidate published messages of multiple workers into one queue again? If it weren't for problem #1, I think all workers could use the same input queue, but that is not possible in this scenario.
You can try to make your routing headers-based instead of content-based, which should be much easier. You are not interested in whether the workflow is going to print labels or not; you are interested in whether this command is high priority or not. So you can probably add this information into the message header...
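For illustration, a minimal sketch of that idea using NServiceBus's SendOptions; the ExecuteWorkflow command, the header key, and the destination endpoint name are all invented for the example:

// Sender-side sketch: stamp the command with a priority header and pick
// the destination queue at send time, so label printing gets its own
// worker pool. endpointInstance is an IEndpointInstance.
var options = new SendOptions();
options.SetHeader("Workflow.Priority", "High");          // hypothetical header key
options.SetDestination("WorkflowExecutor.HighPriority"); // hypothetical endpoint
await endpointInstance.Send(new ExecuteWorkflow { WorkflowId = "XY" }, options);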
We are running multiple instances of a windows service that reads messages from a Topic, runs a report, then converts the results into a PDF and emails them to a user. In case of exceptions we simply log the exception and move on.
The use case we want to handle is this: when the service is shut down, we want to preserve the jobs that are currently running so they can be reprocessed by another instance of the service, or by this one when it is restarted.
Is there a way of requeueing a message? The hacky solution would be to just republish the message from the consuming service, but there must be another way.
When incoming messages are processed, their data is put in an internal queue structure (not a message queue) and processed in batches of parallel threads, so the IBM MQ transaction stuff seems hard to implement. Is that what I should be using, though?
Your requirement seems hard to implement unless you get rid of the "internal queue structure (not a message queue)", given that it is not based on transaction-oriented middleware. MQ queues and topics work well with multi-threaded consumers, so it is not apparent what you gain from this intermediate step of moving the data to just another queue. If you start your transaction by consuming the message from MQ, you can have it rolled back when something goes wrong.
If I understood your use case correctly, you can use Durable subscriptions:
Durable subscriptions continue to exist when a subscribing application's connection to the queue manager is closed.
The details are explained in DEFINE SUB (create a durable subscription). Example:
DEFINE QLOCAL(THE.REPORTING.QUEUE) REPLACE DEFPSIST(YES)
DEFINE TOPIC(THE.REPORTING.TOPIC) REPLACE +
TOPICSTR('/Path/To/My/Interesting/Thing') DEFPSIST(YES) DURSUB(YES)
DEFINE SUB(THE.REPORTING.SUB) REPLACE +
TOPICOBJ(THE.REPORTING.TOPIC) DEST(THE.REPORTING.QUEUE)
Your service instances can now consume from THE.REPORTING.QUEUE.
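To tie this together with the transaction point above, here is a hedged C# sketch (classes from the IBM MQ .NET client, IBM.WMQ namespace) of consuming from that queue under syncpoint, so a job that is cut off mid-flight is rolled back to the queue instead of lost. The queue manager, host, and channel names are assumptions:

using IBM.WMQ;
using System.Collections;

// Connection properties are assumptions; adjust to your environment.
var props = new Hashtable
{
    { MQC.TRANSPORT_PROPERTY, MQC.TRANSPORT_MQSERIES_MANAGED },
    { MQC.HOST_NAME_PROPERTY, "mq.example.com" },
    { MQC.CHANNEL_PROPERTY, "DEV.APP.SVRCONN" }
};
var qmgr = new MQQueueManager("QM1", props);
var queue = qmgr.AccessQueue("THE.REPORTING.QUEUE",
    MQC.MQOO_INPUT_AS_Q_DEF + MQC.MQOO_FAIL_IF_QUIESCING);

// MQGMO_SYNCPOINT makes the get part of a unit of work.
var gmo = new MQGetMessageOptions();
gmo.Options |= MQC.MQGMO_SYNCPOINT | MQC.MQGMO_WAIT;
gmo.WaitInterval = 5000;

var message = new MQMessage();
try
{
    queue.Get(message, gmo);
    // ... run the report, render the PDF, send the email ...
    qmgr.Commit();   // only now is the message actually removed
}
catch
{
    qmgr.Backout();  // the message becomes available again
    throw;
}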
While I readily admit that my knowledge is shaky, from what I understood from IBM's [sketchy, inadequate, obtuse] documentation there really is no good built-in solution. With transactions, the queue manager assumes all is well unless it receives a rollback request, and when it does, it rolls back to a syncpoint. So if you're trying to roll back one message but two other messages have completed in the meantime, it will roll back all three.
We ended up coding our own solution, updating the way we log messages and mark them as completed in the DB. Then, on both startup and shutdown, we find the uncompleted messages and programmatically publish them back to the queue, limiting the DB search by machine name so that if we have multiple instances of the service running they won't duplicate message processing.
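For illustration, a rough sketch of that recovery pass; the MessageLog table, its columns, and the RepublishToTopic helper are all invented for the example:

using System;
using System.Data.SqlClient;

static void RequeueUncompleted(string connectionString)
{
    // Find messages this machine logged but never marked completed,
    // then publish them back. Schema is hypothetical.
    const string findUncompleted =
        "SELECT MessageId, Body FROM MessageLog " +
        "WHERE Completed = 0 AND MachineName = @machine";

    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(findUncompleted, conn))
    {
        cmd.Parameters.AddWithValue("@machine", Environment.MachineName);
        conn.Open();
        using (var reader = cmd.ExecuteReader())
        {
            while (reader.Read())
            {
                // RepublishToTopic is a stand-in for the actual MQ publish call
                RepublishToTopic(reader.GetString(0), reader.GetString(1));
            }
        }
    }
}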
I'm trying to figure out how to implement a fault-tolerant message publication solution using MassTransit. We'll focus on the simple scenario where we only need to commit a database change and publish an event indicating that change. Because there is no (built-in) mechanism that allows an atomic "commit and publish", when our process crashes we will end up in an inconsistent state (some changes would only be committed to the database, and some events might only be published to the message queue).
This documentation page offers a solution: because we assume message handling is idempotent, we can rely on the entire operation being retried in case of failure, and these partial commits will eventually be resolved. This is a great solution, but it has one caveat: it assumes that the operation we are performing was triggered by a message, and that if we don't send an ack, processing will be retried. This is not a reasonable assumption, as messaging is typically used only for internal communication inside the system, not for communication with the outside world. What should I do when I need to save-and-publish while handling an HTTP request from an external client?
One possible solution is to hack our way around the approach presented in the article: we only publish (or send) a message and listen to it ourselves; then, in the message handler, we do the commit and publish the actual event we want others to listen to. The main problem I have with this is that it assumes we never have to return anything in the HTTP response. What if we need to indicate the success or failure of the database transaction back to the HTTP client (for example, if we rely on a UNIQUE constraint to tell us whether or not we should accept the request, and we want to indicate failure to the client)? We could solve it by using request-response over the message queue (with ourselves), but this is ugly and increases latency and complexity considerably, for what is actually a very common scenario.
The approach I see most often on the internet for solving this problem is to use an outbox that is persisted to the same database we need to write to anyway, so the two operations can be wrapped in a regular ACID database transaction. A background task then polls this database for new events and publishes them to the message broker. Unlike some other frameworks, I understand that MassTransit does not support this behavior out of the box. So I guess my question boils down to: before rushing to implement this relatively complex mechanism myself (once per database technology), is there another solution I'm missing? What is the accepted solution to this problem in the MassTransit community?
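For reference, a minimal sketch of the outbox write itself, independent of any framework; the Orders and Outbox tables are invented. A background task would then poll for Dispatched = 0 rows, publish them to the broker, and flip the flag:

using System;
using System.Data.SqlClient;

static void SaveWithEvent(string connectionString, Guid orderId)
{
    // The business change and the event land in one ACID transaction:
    // both commit, or neither does.
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();
        using (var tx = conn.BeginTransaction())
        {
            var insertOrder = new SqlCommand(
                "INSERT INTO Orders (Id) VALUES (@id)", conn, tx);
            insertOrder.Parameters.AddWithValue("@id", orderId);
            insertOrder.ExecuteNonQuery();

            var insertEvent = new SqlCommand(
                "INSERT INTO Outbox (Id, EventType, Payload, Dispatched) " +
                "VALUES (NEWID(), @type, @payload, 0)", conn, tx);
            insertEvent.Parameters.AddWithValue("@type", "OrderCreated");
            insertEvent.Parameters.AddWithValue("@payload", orderId.ToString());
            insertEvent.ExecuteNonQuery();

            tx.Commit();
        }
    }
}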
This has been asked several times, in a variety of forms, here and other places. But the short answer is simple.
In your controller, write to the message broker only. Let the consumer deal with the database, in the context of consuming a reliable message, with all the nice retry and redelivery options available in that context. Then you get all the benefits of the InMemoryOutbox, without adding the extreme complexity of having three parties (HTTP, database, and broker) in a single conversation.
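A small sketch of that shape with standard MassTransit and ASP.NET Core APIs; the message types and consumer are hypothetical:

using System;
using System.Threading.Tasks;
using MassTransit;
using Microsoft.AspNetCore.Mvc;

public record SubmitOrder(Guid OrderId);
public record OrderAccepted(Guid OrderId);

// The controller publishes only; no database work here.
[ApiController]
public class OrdersController : ControllerBase
{
    readonly IPublishEndpoint _publish;
    public OrdersController(IPublishEndpoint publish) => _publish = publish;

    [HttpPost("orders/{orderId}")]
    public async Task<IActionResult> Post(Guid orderId)
    {
        await _publish.Publish(new SubmitOrder(orderId));
        return Accepted(); // 202: the work happens reliably, later
    }
}

// The consumer owns the database write, with retry/redelivery available.
public class SubmitOrderConsumer : IConsumer<SubmitOrder>
{
    public async Task Consume(ConsumeContext<SubmitOrder> context)
    {
        // ... ACID database transaction here ...
        await context.Publish(new OrderAccepted(context.Message.OrderId));
    }
}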
I have been requested to use Amazon SQS in our new system. Our business depends on tasks/requests flowing from our clients to our support agents. Once a client submits a task/request, it should be queued in my SQL Server database, and each queued task should be assigned to a non-busy agent, because the flow says an agent can handle only one task/request at a time. So if 10 tasks/requests come into my system, all should be queued; the system should then forward a task to whichever agent is currently free, and once that agent solves the task he should get the next one, if any. Otherwise, the system should wait for an agent to finish his current task before assigning a new one. And, of course, there must be no duplication in task/request handling... and so on.
What do I need now?
A simple reference that clarifies what Amazon SQS is, since this is my first time using a queuing service?
How can I use it with C# and SQL Server? I have read this topic but I still feel that something is missing, as I am not able to start. I am just aiming at a way to process a task at run time, assign it to an agent, then close it and get a new one, as I explained above.
Asking us to design a system based on a paragraph of prose is a pretty tall order.
SQS is simply a cloud queue system. Based on your description, I'm not sure it would make your system any better.
First off, you are already storing everything in your database, so why do you need to store things in the queue as well? If you want queue semantics while storing stuff in your database, you could consider SQL Server Service Broker (https://technet.microsoft.com/en-us/library/ms345108(v=sql.90).aspx#sqlsvcbr_topic2), which supports queues within SQL. Alternatively, unless your scale is pretty high (100+ tasks/second, maybe), you could just query the table for tasks which need to be picked up.
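If you do use the table as the queue, here is a hedged sketch (hypothetical schema) of claiming work safely under concurrency; the READPAST and UPDLOCK hints let several pollers each take a different row without blocking or double-assignment:

using System.Data.SqlClient;

static void ClaimNextTask(string connectionString, int agentId)
{
    // Claim the next queued task for this agent atomically.
    const string claimNextTask =
        "UPDATE TOP (1) Tasks WITH (ROWLOCK, READPAST, UPDLOCK) " +
        "SET AssignedTo = @agent, Status = 'InProgress' " +
        "OUTPUT inserted.TaskId, inserted.Payload " +
        "WHERE Status = 'Queued'";

    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(claimNextTask, conn))
    {
        cmd.Parameters.AddWithValue("@agent", agentId);
        conn.Open();
        using (var reader = cmd.ExecuteReader())
        {
            if (reader.Read())
            {
                // hand TaskId / Payload to the now-busy agent
            }
        }
    }
}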
Secondly, it sounds like you might have a workflow around tasks that could extend to more than just a single queue for agents to pick them up. For example, do you have any follow-up on the tasks (emailing clients to ask them how their service was, putting a task on hold until a client gets back to you, etc.)? If so, you might want to look at Simple Workflow Service (https://aws.amazon.com/swf/), or since you are already on Microsoft's stack, you can look at Windows Workflow (https://msdn.microsoft.com/en-us/library/ee342461.aspx).
BTW, SQS does not guarantee exactly-once delivery by default, so if duplication is a big problem for you, then you will either have to do your own deduplication or use FIFO queues (http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/FIFO-queues.html), which support deduplication but are limited to 300 transactions/second (i.e. roughly 100 messages/second, accounting for the standard send -> receive -> delete API calls; with batching that number could be much higher, but given your use case it doesn't sound like you could use batching without a lot of work).
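For completeness, a minimal AWS SDK for .NET sketch of sending to a FIFO queue with deduplication; the queue URL and the choice of the task id as deduplication key are assumptions:

using Amazon.SQS;
using Amazon.SQS.Model;

var sqs = new AmazonSQSClient();
await sqs.SendMessageAsync(new SendMessageRequest
{
    QueueUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/tasks.fifo",
    MessageBody = "{\"taskId\": 42}",
    MessageGroupId = "support-tasks",   // ordering scope
    MessageDeduplicationId = "task-42"  // same id within 5 minutes = delivered once
});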
I'm currently trying to figure out the best solution to the following problem using NServiceBus: I have a GUI that users can use to search for different things, but information about those things is spread across multiple services/databases. Let's say, for example, that a user is searching for a list of parks in a city, but each district of this city keeps info only on its own parks, in its own database (which it exposes via web services). I need NServiceBus to send a message to each endpoint (district) describing what info the user needs, wait for the responses, and only when it gets them from all endpoints send the result back to the user (GUI). The user is only interested in the full information, so the bus needs to know whether every endpoint has sent its response or not (it also needs to happen in near real time, so the bus will assume an endpoint is offline and send a failure message if the response takes too long). Endpoints can change at any time, so the code needs to be easy to maintain. The best option would be that adding/removing endpoints can be done without code changes.
Here are my thoughts about possible solution:
The publish/subscribe pattern lets me easily send a message to multiple endpoints and add/remove endpoints at will by subscribing/unsubscribing, without changing the publisher's code. Problem: by definition, the publisher doesn't know how many subscribers there are (or what they are), so waiting for all of the subscribers to respond becomes difficult, if not impossible.
The request/response pattern lets me easily tell endpoints that I want an answer, and I will know whether an endpoint has responded yet. Problem: every time I need to add/remove an endpoint I need to change the sender's code. Scalability may also be a problem.
My question: Is there any way to combine those patterns? Or am I looking at this problem the wrong way? Is there even a way to achieve all I want?
I think you are indeed looking at the problem the wrong way.
It sounds like you want to query multiple services and aggregate the information for presentation in the UI. Generally speaking, a bus is not a good choice for straight querying. A bus is great for sending commands to a specific endpoint, and for publishing state changes as they happen.
If you are performing a query against an endpoint, your best bet would be to model and expose a query (via something like WebAPI).
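For example, a hedged sketch of the query-side alternative: fan the query out over HTTP and require every district to answer within a timeout. The endpoint list and routes are hypothetical; the list could come from configuration, which also satisfies the add/remove-without-code-changes requirement:

using System;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

var districts = new[]
{
    "https://north.example/api/parks",
    "https://south.example/api/parks"
};

using var http = new HttpClient { Timeout = TimeSpan.FromSeconds(5) };
try
{
    // Task.WhenAll completes only when every district has answered.
    var responses = await Task.WhenAll(
        districts.Select(url => http.GetStringAsync(url)));
    // aggregate the responses and return the full result to the GUI
}
catch (Exception)
{
    // any timeout or failure means the full result is unavailable
}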
I am looking to build a distributed task system in which agents will perform tasks according to a certain workflow.
It seems like the concept of Sagas is perfect for this use case, in which there are 2 patterns:
1) Controller saga: a dedicated machine sends a command, waits for a reply, then sends the next command in the sequence, etc...
2) Routing slip saga: the steps are recorded in advance in the message itself.
I would like to get your opinion on these issues:
1) are sagas indeed perfect for this use case?
2) which one of them is preferred for this use case?
3) if only some of the machines are able to perform certain tasks: how do I make sure that the other agents won't pick the message up? (example: a task might be "execute this stored procedure" and I want it to run only on an agent that is dedicated to the database)
EDIT (2015-10-24): (more information about the workflow)
The workflow I'm looking for is something along these lines: a 10-hour-long task divided into 10 chunks (mini-tasks). The dependency graph allows some of these to happen concurrently, while some of them will have to finish before the next one is queued up. I plan to incorporate this workflow logic (the dependencies) into the machine running the controller (= the saga).
It would be optimal if I could change the workflow easily (for example: insert another task between "step 7" and "step 8", both of which are mini-tasks).
Each agent will run a few tasks concurrently (the exact number preferably dictated by CPU/IO utilization), e.g. it might run step 3 of workflow #1 and step 5 of workflow #2.
Thanks
1) are sagas indeed perfect for this use case?
Perfect might be a bit much, but it's a good way to handle many workflows.
2) which one of them is preferred for this use case?
Your updated workflow suggests that a Saga would be a great choice for the workflow. Adding steps would require code changes and deployment, but handling long running workflows with many steps seems perfect. Also, coordinating the completion of multiple async steps before a next step is a common use case I have used sagas for.
3) if only some of the machines are able to perform certain tasks: how do I make sure that none of the other agents won't pick the message up?
By type. Each activity has a specific message type corresponding to the action, e.g. "GetReportData" (executes a stored proc?). You'll have one group of services with consumers for that message type; only they will receive messages published with that type. If it's more complicated than that, e.g. GetReportData but only for Customer A's machine, not Customer B's, then you get into Content Based Routing. This is generally frowned upon, and you might want to find another way to model your work, if possible. Content-based routing is not something that is supported in MassTransit.
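A short MassTransit sketch of routing by type, using the classic bus-factory configuration style (names are hypothetical): only the database-dedicated service hosts a consumer for GetReportData, so only it receives those messages:

using System.Threading.Tasks;
using MassTransit;

public record GetReportData(int ReportId);

public class GetReportDataConsumer : IConsumer<GetReportData>
{
    public Task Consume(ConsumeContext<GetReportData> context)
    {
        // executes the stored procedure; deployed only to DB-capable agents
        return Task.CompletedTask;
    }
}

// Bus configuration on the database agent only:
var bus = Bus.Factory.CreateUsingRabbitMq(cfg =>
{
    cfg.ReceiveEndpoint("report-data", e =>
    {
        e.Consumer<GetReportDataConsumer>();
    });
});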
Orchestration
Sagas work well for orchestrations, especially long-running orchestrations. I've personally worked on a setup where we had to convert all kinds of media (images and video files, but also PowerPoint, PDF, subtitles, etc.), and NServiceBus sagas were used where the previous implementation was built on a polled database table that caused congestion issues.
Controller vs Routing slip
Both the controller and routing-slip variations can be used. You mention that you want to change the workflow easily, but not whether you want to easily change an already-instantiated workflow. Controller types are easier to 'update', and routing slips are very good for workflows that must not change.
Routing slips carry their flow with them, so the workflow can easily be changed radically without affecting existing instances; changing existing instances, however, is hard. Controllers are the opposite: the flow can be modified, but it needs to stay backwards compatible.
There are other variations too, see this post by Jimmy Bogard:
https://lostechies.com/jimmybogard/2013/05/14/saga-patterns-wrap-up/
Changing workflow
Usually the event that creates the saga instance does the setup for the rest of the steps. This becomes part of the saga state. If the workflow is changed, this cannot influence existing saga instances unless you explicitly want it to, or you hardcode steps using if statements.
My experience with the media conversion sagas is that the workflow fetched the tasks to be executed, kept them in saga state, and iterated over these steps.
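In NServiceBus terms, a hedged sketch of that shape, simplified to a sequential step list (all message and property names are invented):

using System.Collections.Generic;
using System.Threading.Tasks;
using NServiceBus;

public class StartConversion : ICommand
{
    public string JobId { get; set; }
    public List<string> Steps { get; set; }
}
public class StepCompleted : IMessage { public string JobId { get; set; } }
public class ExecuteStep : ICommand
{
    public string JobId { get; set; }
    public string Step { get; set; }
}

public class ConversionSagaData : ContainSagaData
{
    public string JobId { get; set; }
    public Queue<string> RemainingSteps { get; set; }
}

public class ConversionSaga : Saga<ConversionSagaData>,
    IAmStartedByMessages<StartConversion>,
    IHandleMessages<StepCompleted>
{
    protected override void ConfigureHowToFindSaga(
        SagaPropertyMapper<ConversionSagaData> mapper)
    {
        mapper.ConfigureMapping<StartConversion>(m => m.JobId).ToSaga(s => s.JobId);
        mapper.ConfigureMapping<StepCompleted>(m => m.JobId).ToSaga(s => s.JobId);
    }

    public Task Handle(StartConversion message, IMessageHandlerContext context)
    {
        Data.JobId = message.JobId;
        // the steps are fetched once at creation and become saga state
        Data.RemainingSteps = new Queue<string>(message.Steps);
        return DispatchNext(context);
    }

    public Task Handle(StepCompleted message, IMessageHandlerContext context)
    {
        if (Data.RemainingSteps.Count == 0)
        {
            MarkAsComplete();
            return Task.CompletedTask;
        }
        return DispatchNext(context);
    }

    Task DispatchNext(IMessageHandlerContext context) =>
        context.Send(new ExecuteStep
        {
            JobId = Data.JobId,
            Step = Data.RemainingSteps.Dequeue()
        });
}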
Message pattern
Each task should be a command, modelled as asynchronous request/response. Based on the response you execute the next step(s). Pub/sub does not really work well here, as multiple 'workers' would receive the same 'event'.
Task
Create a message per task. Create a consumer that knows how to process this message.
For example:
Service X knows how to process A, B and C
Service Y knows how to process D and E
Scaling
If Service X needs additional resources, you can scale out using either a distributor pattern (MSMQ) or competing consumers (RabbitMQ, Azure Storage Queues, etc.).
Content Based Routing (CBR)
Avoid constructions like:
Service X can process A, B and C, but instance 1 supports A and B and instance 2 supports C.
It is probably better to split that into three services.
Services X and Y both know how to process D
How are you deciding to which service to send to command/request?
As mentioned, MassTransit does not support CBR, and it's the same for NServiceBus, as CBR is often misused.
See this post by Udi Dahan:
http://udidahan.com/2011/03/20/careful-with-content-based-routing/
I'm not sure if I understand your question completely, but...
I'd rather go for agents pulling tasks. Each agent dequeues a task from the task list suitable for 'him'. The tasks should be tagged by type, so the right agent can pick them up. Every time an agent is done with a task, it can grab another one. When the agent grabs a task, the task is marked as busy (you could store a timestamp to detect timeouts).
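A rough sketch of such a pull query (hypothetical schema): the agent claims the next task matching its type tags, and stale claims become available again after a timeout:

// Claims one matching task; READPAST keeps concurrent agents from
// blocking each other. The type tag, columns, and the 30-minute
// timeout are all invented for the example.
const string pullTask =
    "UPDATE TOP (1) Tasks WITH (ROWLOCK, READPAST, UPDLOCK) " +
    "SET ClaimedBy = @agent, ClaimedAt = SYSUTCDATETIME() " +
    "OUTPUT inserted.TaskId, inserted.TaskType, inserted.Payload " +
    "WHERE TaskType IN ('ExecuteStoredProc') " +
    "  AND (ClaimedBy IS NULL " +
    "       OR ClaimedAt < DATEADD(minute, -30, SYSUTCDATETIME()))";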