I am looking to build a distributed task system in which agents perform tasks according to a certain workflow.
The Saga concept seems perfect for this use case. There are two patterns:
1) Controller saga: a dedicated machine sends a command, waits for a reply, then sends the next command in the sequence, etc...
2) Routing slip saga: the steps are recorded in advance in the message itself.
I would like to get your opinion on these issues:
1) are sagas indeed perfect for this use case?
2) which one of them is preferred for this use case?
3) if only some of the machines are able to perform certain tasks: how do I make sure that none of the other agents picks the message up? (example: a task might be "execute this stored procedure" and I want it to run only on an agent that is dedicated to the database)
EDIT (2015-10-24): (more information about the workflow)
The workflow I'm looking for is something along these lines: a 10-hour job divided into 10 chunks (mini-tasks). The dependency graph allows some of these to run concurrently, while others have to finish before the next one is queued up. I plan to incorporate this workflow logic (dependencies) into the machine running the controller (=saga).
It would be optimal if I could change the workflow easily (for example: insert another task in the workflow between "step 7" and "step 8", both of which are mini-tasks).
Each agent will run a few tasks concurrently (the exact number preferably dictated by CPU/IO utilization), i.e. it might run step 3 of workflow #1 and step 5 of workflow #2.
Thanks
1) are sagas indeed perfect for this use case?
Perfect might be a bit much, but it's a good way to handle many workflows.
2) which one of them is preferred for this use case?
Your updated workflow suggests that a Saga would be a great choice for the workflow. Adding steps would require code changes and deployment, but handling long running workflows with many steps seems perfect. Also, coordinating the completion of multiple async steps before a next step is a common use case I have used sagas for.
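To illustrate the "coordinate several async steps before the next one" case, here is a minimal, framework-neutral sketch in Python. It is not MassTransit or NServiceBus code; the class `WorkflowSaga`, its methods, and the step names are all made up for the example. The saga state holds the dependency graph and dispatches a step only once every prerequisite has reported completion.

```python
from dataclasses import dataclass, field


@dataclass
class WorkflowSaga:
    """Controller-style saga state: tracks completed and already-dispatched steps."""
    # step name -> set of steps that must complete first (hypothetical graph)
    dependencies: dict[str, set[str]]
    completed: set[str] = field(default_factory=set)
    dispatched: set[str] = field(default_factory=set)

    def dispatch_ready(self) -> list[str]:
        """Return steps whose prerequisites are done and mark them as dispatched.

        A real saga would send one command message per returned step.
        """
        ready = sorted(
            step for step, deps in self.dependencies.items()
            if step not in self.dispatched and deps <= self.completed
        )
        self.dispatched.update(ready)
        return ready

    def on_step_completed(self, step: str) -> list[str]:
        """Handle a reply message; return the steps that just became runnable."""
        self.completed.add(step)
        return self.dispatch_ready()

    @property
    def is_finished(self) -> bool:
        return self.completed == set(self.dependencies)


saga = WorkflowSaga({
    "extract": set(),
    "load_a": {"extract"},
    "load_b": {"extract"},
    "report": {"load_a", "load_b"},   # waits for both loads
})
first = saga.dispatch_ready()                    # only "extract" has no prerequisites
after_extract = saga.on_step_completed("extract")
after_load_a = saga.on_step_completed("load_a")  # "report" still waits for load_b
after_load_b = saga.on_step_completed("load_b")
```

In this sketch, inserting a step means editing the dependency dictionary, which matches the point above that adding steps requires a code change and a deployment.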
3) if only some of the machines are able to perform certain tasks: how do I make sure that none of the other agents picks the message up?
By types. Each activity has a specific message type corresponding to the action, e.g. "GetReportData" (which executes a stored proc?). You'll have one group of services with consumers for that message type, and only they will receive messages published with that type. If it's more complicated than that, e.g. GetReportData but only for Customer A's machine and not Customer B's, then you get into Content Based Routing. This is generally frowned upon, and you might want to find another way to model your work, if possible. Content based routing is not supported in MassTransit.
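Routing by type can be pictured with a toy in-memory bus, sketched here in Python. This is not the MassTransit API; `InMemoryBus`, the endpoint names, and the `RenderThumbnail` message are invented for illustration. Each message type has its own queue, and only endpoints registered for that type can read from it.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class GetReportData:      # hypothetical command, handled only by database agents
    report_id: int


@dataclass
class RenderThumbnail:    # hypothetical command, handled only by media agents
    path: str


class InMemoryBus:
    """Toy bus: one queue per message type, readable only by registered endpoints."""

    def __init__(self):
        self.queues = defaultdict(deque)     # message type -> pending messages
        self.handlers = defaultdict(list)    # message type -> endpoint names

    def register(self, endpoint, message_type):
        self.handlers[message_type].append(endpoint)

    def send(self, message):
        if not self.handlers[type(message)]:
            raise LookupError(f"no endpoint handles {type(message).__name__}")
        self.queues[type(message)].append(message)

    def receive(self, endpoint, message_type):
        """Return the next message, or None if this endpoint may not consume the type."""
        if endpoint not in self.handlers[message_type] or not self.queues[message_type]:
            return None
        return self.queues[message_type].popleft()


bus = InMemoryBus()
bus.register("db-agent-1", GetReportData)
bus.register("db-agent-2", GetReportData)
bus.register("media-agent", RenderThumbnail)
bus.send(GetReportData(report_id=42))

wrong_agent = bus.receive("media-agent", GetReportData)   # not registered for this type
right_agent = bus.receive("db-agent-2", GetReportData)    # one of the database agents
```

The routing decision depends only on the message type, never on the message contents, which is what keeps it clear of content based routing.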
Orchestration
Sagas work well for orchestrations, especially long-running ones. I've personally worked on a setup where we had to convert all kinds of media: images and video files, but also PowerPoint, PDF, subtitles, etc. NServiceBus Sagas replaced an earlier implementation that was built on polling a database table and caused congestion issues.
Controller vs Routing slip
Both the controller and routing slip variations can be used. You mention that you want to change the workflow easily, but not whether you want to change an already instantiated workflow easily. Controller types are easier to 'update', and routing slips are very good for workflows that must not change.
Routing slips carry their flow with them, so the workflow can be changed radically without affecting existing instances, but it's hard to change existing instances. Controllers are the opposite: the flow of existing instances can be modified, but changes need to be backwards compatible.
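As a rough illustration of "the flow travels with the message", here is a minimal routing slip sketch in Python. The `RoutingSlip` class, the `HANDLERS` table, and the step names are hypothetical; a real implementation would forward the slip between queues instead of looping in one process.

```python
from dataclasses import dataclass, field


@dataclass
class RoutingSlip:
    payload: dict
    itinerary: list[str]                         # remaining steps, fixed at creation
    log: list[str] = field(default_factory=list)  # steps already executed


# Hypothetical step handlers keyed by step name; each returns an updated payload.
HANDLERS = {
    "validate": lambda p: {**p, "valid": True},
    "convert": lambda p: {**p, "format": "mp4"},
    "publish": lambda p: {**p, "published": True},
}


def process_next(slip: RoutingSlip) -> RoutingSlip:
    """What each agent does: run the first remaining step, then pass the slip on."""
    step = slip.itinerary[0]
    return RoutingSlip(
        payload=HANDLERS[step](slip.payload),
        itinerary=slip.itinerary[1:],
        log=slip.log + [step],
    )


slip = RoutingSlip({"file": "a.mov"}, ["validate", "convert", "publish"])
while slip.itinerary:          # stands in for the slip hopping from agent to agent
    slip = process_next(slip)
```

No central state is consulted here: a slip created yesterday keeps yesterday's itinerary, while slips created after a workflow change simply carry the new one.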
There are other variations too, see this post by Jimmy Bogard:
https://lostechies.com/jimmybogard/2013/05/14/saga-patterns-wrap-up/
Changing workflow
Usually the event that creates the saga instance does the setup for the rest of the steps, and this becomes part of the saga state. If the workflow is changed later, the change does not influence existing saga instances unless you explicitly want it to, or unless you hardcode steps using if statements.
My experience with the media conversion sagas is that the saga fetched the tasks to be executed, kept them in saga state, and iterated over these steps.
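A small Python sketch of that idea, under stated assumptions: `WORKFLOW_DEFINITIONS`, `ConversionSaga`, and the step names are invented for the example and are not taken from the media conversion system described above. The saga copies the step list when it starts, so later edits to the definition affect only new instances.

```python
# Hypothetical workflow definitions; in practice these might live in a database.
WORKFLOW_DEFINITIONS = {"convert-video": ["probe", "transcode", "thumbnail"]}


class ConversionSaga:
    """Saga state holding its own snapshot of the steps to execute."""

    def __init__(self, workflow_name: str):
        # Copy the list so later changes to the definition do not leak in.
        self.steps = list(WORKFLOW_DEFINITIONS[workflow_name])
        self.position = 0

    def current_step(self):
        """Return the step awaiting a reply, or None when the saga is complete."""
        return self.steps[self.position] if self.position < len(self.steps) else None

    def on_reply(self):
        """Handle the reply for the current step and advance to the next one."""
        self.position += 1
        return self.current_step()


running = ConversionSaga("convert-video")
running.on_reply()                                    # "probe" done, now at "transcode"

# The workflow is changed while `running` is still in flight.
WORKFLOW_DEFINITIONS["convert-video"].insert(2, "watermark")
fresh = ConversionSaga("convert-video")
```

Here `running` finishes with its original three steps, while `fresh` picks up the inserted step.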
Message pattern
The tasks should be commands, modelled as asynchronous request/response. Based on the response you execute the next step(s). Pub/sub does not really work well here, as multiple 'workers' would receive the same 'event'.
Task
Create a message per task. Create a consumer that knows how to process this message.
For example:
Service X knows how to process A, B and C
Service Y knows how to process D and E
Scaling
If Service X needs additional resources, you can scale out using either a distribution pattern (MSMQ) or competing consumers (RabbitMQ, Azure Storage Queues, etc.).
Content Based Routing (CBR)
Avoid constructions like:
Service X can process A, B and C, but instance 1 supports A and B and instance 2 supports C.
It is probably better to then split it into three services.
Services X and Y both know how to process D.
How are you deciding which service to send the command/request to?
As mentioned, MassTransit does not support CBR, and it's the same for NServiceBus, as CBR is often misused.
See this post by Udi Dahan:
http://udidahan.com/2011/03/20/careful-with-content-based-routing/
I'm not sure if I understand your question completely, but...
I'd rather go for agents pulling tasks: each agent dequeues a task suitable for it from the task list. The tasks should be tagged by type, so the right agent can pick them up. Every time an agent is finished with a task, it can grab another one. When an agent grabs a task, the task is marked as busy (you could store a timestamp to detect timeouts).
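A minimal in-memory sketch of this pull model in Python follows. The names `Task`, `TaskList`, and `lease_seconds` are assumptions made for the example; a real system would keep the list in a database or queue and make the claim atomic there, which the lock only imitates.

```python
import threading
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    task_id: int
    tag: str                            # task type, e.g. "database"
    claimed_by: Optional[str] = None
    claimed_at: Optional[float] = None  # timestamp used to detect timeouts
    done: bool = False


class TaskList:
    def __init__(self, lease_seconds: float = 60.0):
        self.tasks: list[Task] = []
        self.lease_seconds = lease_seconds
        self._lock = threading.Lock()   # stands in for an atomic DB update

    def add(self, task_id: int, tag: str) -> None:
        self.tasks.append(Task(task_id, tag))

    def claim(self, agent: str, tags: set, now: Optional[float] = None) -> Optional[Task]:
        """Give the agent the first unfinished task it supports that is free or timed out."""
        now = time.monotonic() if now is None else now
        with self._lock:
            for task in self.tasks:
                timed_out = (task.claimed_at is not None
                             and now - task.claimed_at > self.lease_seconds)
                if task.tag in tags and not task.done and (task.claimed_by is None or timed_out):
                    task.claimed_by, task.claimed_at = agent, now   # mark as busy
                    return task
        return None

    def complete(self, task: Task) -> None:
        with self._lock:
            task.done = True


tasks = TaskList(lease_seconds=30)
tasks.add(1, "database")
tasks.add(2, "media")
claimed = tasks.claim("db-agent-1", {"database"}, now=0.0)
busy = tasks.claim("db-agent-2", {"database"}, now=10.0)      # task 1 is still leased
reclaimed = tasks.claim("db-agent-2", {"database"}, now=31.0)  # lease has expired
```

Passing `now` explicitly is only a convenience for the example; agents would normally omit it and let the clock decide.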
Related
I have been asked to use Amazon SQS in our new system. Our business depends on clients submitting tasks/requests to our support agents. Once a client submits a task/request, it should be queued in my SQL Server database, and queued tasks should be assigned to agents who are not busy, because the flow says that an agent can handle only one task/request at a time. So if 10 tasks/requests come into my system, all of them should be queued, and the system should forward a task to whichever agent is free. Once the agent solves the task, he should get the next one, if any; otherwise, the system should wait until an agent finishes his current task before assigning a new one. And of course there should not be any duplication in task/request handling, and so on.
What do I need now?
A simple reference that clarifies what Amazon SQS is, as this is my first time using a queuing service.
How can I use it with C# and SQL Server? I have read this topic but I still feel that there is something missing, as I am not able to get started. I am just aiming at a way to process a task at run time, assign it to an agent, close it, and get a new one, as I explained above.
Asking us to design a system based on a paragraph of prose is a pretty tall order.
SQS is simply a cloud queue system. Based on your description, I'm not sure it would make your system any better.
First off, you are already storing everything in your database, so why do you need to store things in the queue as well? If you want to have queue semantics while storing stuff in your database you could consider SQL Server Service Broker (https://technet.microsoft.com/en-us/library/ms345108(v=sql.90).aspx#sqlsvcbr_topic2) which supports queues within SQL. Alternatively unless your scale is pretty high (100+ tasks/second maybe) you could just query the table for tasks which need to be picked up.
Secondly, it sounds like you might have a workflow around tasks that could extend to more than just a single queue for agents to pick them up. For example, do you have any follow up on the tasks (emailing clients to ask them how their service was, putting a task on hold until a client gets back to you, etc)? If so, you might want to look at Simple Workflow Service (https://aws.amazon.com/swf/) or since you are already on Microsoft's stack you can look at Windows Workflow (https://msdn.microsoft.com/en-us/library/ee342461.aspx)
BTW, SQS does not guarantee "only one" delivery by default, so if duplication is a big problem for you then you will either have to do your own deduplication or use FIFO queues (http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/FIFO-queues.html) which support deduplication, but are limited to 300 transactions/second (aka: roughly 100 messages/second accounting for the standard send -> receive -> delete APIs. Using batching obviously that number could be much higher, but considering your use case it doesn't sound like you would be able to use batching without a lot of work).
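If you do your own deduplication, one common shape is an idempotent consumer that remembers which message IDs it has already handled. The Python sketch below shows only that idea and uses no SQS API; `IdempotentConsumer` and the message IDs are hypothetical, and in production the set of seen IDs would be a table with a unique key rather than process memory.

```python
class IdempotentConsumer:
    """Wraps a handler so a message delivered twice is processed only once."""

    def __init__(self, handler):
        self._handler = handler
        self._seen = set()   # assumption: replaced by a durable store in a real system

    def handle(self, message_id: str, body: str) -> bool:
        """Process the message unless its ID was seen before; return True if processed."""
        if message_id in self._seen:
            return False
        self._handler(body)
        self._seen.add(message_id)   # recorded only after the handler succeeded
        return True


processed = []
consumer = IdempotentConsumer(processed.append)
deliveries = [("m1", "assign task 1"), ("m2", "assign task 2"), ("m1", "assign task 1")]
outcomes = [consumer.handle(message_id, body) for message_id, body in deliveries]
```

The second delivery of `m1` is recognised and skipped, so the task is assigned once even though the queue delivered it twice.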
I have been looking around for a while now and I want to create a timeout property on a bookmark in WF 4.0.
I can make it work using a Pick activity with two branches (a timer in one of them and my bookmark in the other).
However, this does not work if my workflow is persisted to the database (which it will be, since the timeout will be several days), because the timer will not fire until I next load the workflow, which can also be several days later.
Does anyone know of another way to solve this in WF 4.0? Or have you found a good workaround?
Okay, so what you're going to want to do is build a Workflow Service; you will not be able to do this nearly as easily with a workflow that is not hosted in the Workflow Service Host (WSH). To tell you it can't be done would be incorrect, but I can tell you that you don't want to.
That service will be available via a WCF endpoint and can do exactly what you need. You would build a workflow with a Pick activity that has two branches: the first is a Receive activity that the user can call into if they respond in time; the second is a durable timer that fires after a specified interval and lets you branch down another path. The same service can have more than one Receive activity, thus exposing more than one endpoint, so if your workflow has other branches like this you can handle all of them in one atomic workflow.
Does this make sense?
I am having a little trouble deciding which way to go while designing the message flow in our system.
Because of the volatile nature of our business processes (e.g. calculating freight costs), we use a workflow framework so that we can change the process on the fly.
The general process should look something like this
The interface is a service which connects to the customer's system via whatever interface the customer provides (web services, TCP endpoints, database polling, files, you name it). Then a command containing the received data and the id of the workflow to be executed is sent to the executor.
The first problem comes at the point where we want to distribute load on multiple worker services.
Say we have different processes like printing parcel labels, calculating prices, and sending notification mails. Printing the labels should never be delayed because a ton of mailing workflows are being executed. So we want to be able to route commands to different workers based on the work they do.
Because all commands are like "execute workflow XY", we would be required to implement our own content based routing. NServiceBus does not support this out of the box, mostly because it's an anti-pattern.
Is there a better way to do this, when you are not able to use different message types to route your messages?
The second problem comes when we want to add monitoring. Because an endpoint can only subscribe to one queue for each message type, we cannot let all executors just publish an "I completed a workflow" message. The current solution would be to Bus.Send the message to a preconfigured auditing endpoint. This feels a little like cheating to me ;)
Is there a better way to consolidate published messages of multiple workers into one queue again? If there would not be problem #1 I think all workers could use the same input queue however this is not possible in this scenario.
You can try to make your routing header-based rather than content-based, which should be much easier. You are not interested in whether the workflow prints labels or not; you are interested in whether this command has priority or not. So you can probably add this information to the message header...
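The header idea could look roughly like the Python sketch below. It does not use any NServiceBus API; the queue names, the `priority` header, and both lookup tables are assumptions for illustration. The sender stamps the priority once, and the routing step reads only the header, never the body.

```python
# Hypothetical mapping from a "priority" header value to a worker input queue.
QUEUE_BY_PRIORITY = {"high": "workers.priority", "normal": "workers.bulk"}

# Hypothetical classification, decided once by the sender (the interface service).
PRIORITY_BY_WORKFLOW = {"print-parcel-label": "high", "send-notification-mail": "normal"}


def build_command(workflow_id: str, data: dict) -> dict:
    """Wrap the payload and stamp the routing decision into the headers."""
    priority = PRIORITY_BY_WORKFLOW.get(workflow_id, "normal")
    return {
        "headers": {"priority": priority},
        "body": {"workflow_id": workflow_id, "data": data},
    }


def destination_for(message: dict) -> str:
    """Route on headers only; the body is never inspected."""
    return QUEUE_BY_PRIORITY[message["headers"]["priority"]]


label = build_command("print-parcel-label", {"parcel": 17})
mail = build_command("send-notification-mail", {"to": "a@example.com"})
unknown = build_command("some-new-workflow", {})
```

With this split, a flood of mail commands fills only the bulk queue, so label printing on the priority queue is not delayed.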
I am creating a site for recruiters. One feature of my application requires posting to Facebook periodically. The posting frequency can range from 0 (Never) to 4 (High).
For example, if a recruiter has 4 open jobs and his posting frequency is set to 4, each job should be posted in its turn: 1st job on the 1st day, 2nd job on the 2nd, 3rd job on the 3rd, etc., and on the 5th day the 1st job again (round-robin fashion).
Had he set the posting frequency to 2, two jobs would be posted daily (thus each job would be posted every 2 days).
My only question is what type of threading I should create for this, since this is all dynamic. Also, are there any guidelines on what type of information I should store in the database?
I need just a general strategy to solve this problem. No code..
I think you need to separate it from your website; I mean it's better to run the logic for posting jobs in a service hosted on IIS (I am not sure whether such a thing exists, but I guess it does).
You also need a table for the job queue to remember which jobs need to be posted; your service would then pick them up and post them one by one.
To decide whether it is time to post a job, you can define a timer with a configurable interval that checks whether there is any job to post.
Make sure that you keep verbose log details if posting fails. This is important because Facebook may change its API, your API key may become invalid, or something else may go wrong, and you need to know what happened.
I also strongly suggest having a web page that reports the status of the jobs-to-post queue and, for failed jobs, the cause of the problem.
If your program runs non-stop, you can just use one of the Timer classes available in the .NET Framework, without the need to go for full-blown concurrency (e.g. via the Task Parallel Library).
I suspect, though, that you'll need more than that: some kind of mechanism to detect which jobs were successfully posted and which were "missed" because the program was not running (or because of network problems, etc.), so they can be posted the next time the program is started (or the network becomes available). A small local database (such as SQLite or MS SQL Server Compact) should serve this purpose nicely.
If the requirements are as simple as you described, then I wouldn't use threading at all. It wouldn't even need to be a long-running app. I'd create a simple app that just tries to post a job and then exits immediately, and I would schedule it to run once every given period (via Windows Task Scheduler).
This app would first check whether it has already posted a job for the current posting period. Maybe put a "Last-Successful-Post-Time" setting in your datastore. If it's allowed to post, the app would just query the highest priority job and post it to Facebook. Once it successfully posts to Facebook, that job would be downgraded to the lowest priority.
The job priority could just be a simple integer column in your data store. Lower values mean higher priorities.
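The priority rotation could be sketched like this in Python. The `Job` class and the `post_next` function are hypothetical names, and the `post` callback stands in for the real Facebook call, which is not shown. A job is pushed to the back only after `post` returns without raising.

```python
from dataclasses import dataclass


@dataclass
class Job:
    title: str
    priority: int   # lower value = higher priority = posted sooner


def post_next(jobs: list, post) -> Job:
    """Post the highest-priority job, then downgrade it to the lowest priority.

    If `post` raises, the priority is left untouched so the job is retried next run.
    """
    job = min(jobs, key=lambda j: j.priority)
    post(job)
    job.priority = max(j.priority for j in jobs) + 1
    return job


jobs = [Job("A", 0), Job("B", 1), Job("C", 2), Job("D", 3)]
posted = []
for _ in range(5):   # five scheduled runs
    post_next(jobs, lambda job: posted.append(job.title))
```

After four runs every job has been posted once, and the fifth run starts the cycle again with the first job, which gives the round-robin behaviour described in the question.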
Edit:
I guess what I'm suggesting is if you have clear boundaries in your requirements, I would suggest breaking your project into multiple applications. This way there is a separation of concerns. You wouldn't then need to worry how to spawn your Facebook notification process inside your web site code.
I'm tasked with creating a web application. I'm currently using C# and ASP.NET (MVC, but I doubt it's relevant to the question). I am a rookie developer and somewhat new to .NET.
Part of the logic in the application I'm building makes requests to an external SMS gateway by hitting a particular URL, either as part of a user-initiated action in the web app (a couple of messages sent) or as part of a scheduled task run daily (several thousand messages sent).
For the daily task, I am afraid that looping, say, 10,000 times in one thread (especially if I also have to act on the response to each request, such as writing to a DB) is not the best strategy, and that I could gain some performance/time savings from parallelization.
Ultimately I'm more afraid that thousands of users will perform the action that triggers a request at the same time (very likely). With a naive implementation that spawns some kind of background thread (whatever it's called) for each request, I fear a scenario with hundreds or thousands of requests at once.
So if my assumptions are correct, how do I deal with this? Do I have to manually spawn some appropriate number of new Thread()s and coordinate their work through a producer/consumer-like queue, or is there some easy way?
Cheers
If you have to make 10,000 requests to a service then it means that the service's API is anemic - probably CRUD-based, designed as a thin wrapper over a database instead of an actual service.
A single "request" to a well-designed service should convey all of the information required to perform a single "unit of work" - in other words, those 10,000 requests could very likely be consolidated into one request, or at least a small handful of requests. This is especially important if requests are going to a remote server or may take a long time to complete (and 2-3 seconds is an extremely long time in computing).
If you do not have control over the service, if you do not have the ability to change the specification or the API - then I think you're going to find this very difficult. A single machine simply can't handle 10,000 outgoing connections at once; it will struggle with even a few hundred. You can try to parallelize this, but even if you achieve a tenfold increase in throughput, it's still going to take half an hour to complete, which is the kind of task you probably don't want running on a public-facing web site (but then, maybe you do, I don't know the specifics).
Perhaps you could be more specific about the environment, the architecture, and what it is you're trying to do?
In response to your update (possibly having thousands of users all performing an action at the same time that requires you to send one or two SMS messages for each):
This sounds like exactly the kind of scenario where you should be using Message Queuing. It's actually not too difficult to set up a solution using WCF. Some of the main reasons why one uses a message queue are:
There are a large number of messages to send;
The sending application cannot afford to send them synchronously or wait for any kind of response;
The messages must eventually be delivered.
And your requirements fit this like a glove. Since you're already on the Microsoft stack, I'd definitely recommend an asynchronous WCF service backed by MSMQ.
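To make the queue idea concrete without tying it to WCF or MSMQ, here is an in-process Python sketch of the underlying producer/consumer pattern. The function `run_workers` and its parameters are invented for the example, and `send` stands in for the call to the SMS gateway. A fixed number of workers drain the queue, so the number of simultaneous outgoing requests stays bounded no matter how many messages are waiting.

```python
import queue
import threading


def run_workers(messages, send, worker_count=4):
    """Drain `messages` with a fixed number of worker threads.

    Returns a list of (message, outcome) pairs in completion order.
    """
    pending = queue.Queue()
    for message in messages:
        pending.put(message)

    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            try:
                message = pending.get_nowait()
            except queue.Empty:
                return                      # nothing left to send
            outcome = send(message)         # e.g. the HTTP call to the gateway
            with results_lock:
                results.append((message, outcome))

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


# Stand-in for the gateway call: doubles the number instead of sending an SMS.
results = run_workers(range(100), lambda message: message * 2, worker_count=8)
```

Unlike a durable queue such as MSMQ, this in-memory version loses pending messages if the process stops, so it illustrates the throttling aspect only.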
If you are working with SOAP, or some other type of XML request, you may not have an issue dealing with that level of requests in a loop.
I set up something similar using a SOAP server with 4-5K requests with no problem...
A SOAP request to a web service (assuming .NET 2.0 or later) looks something like this:
WebServiceProxyClient myclient = new WebServiceProxyClient();
myclient.SomeOperation(parameter1, parameter2);
myclient.Close();
I'm assuming that this code will be embedded in your business logic, which will be triggered as part of the user-initiated action or as part of the scheduled task.
You don't need to do anything special in your code to cope with a high volume of users. This will actually be a matter of scaling your platform.
When you say 10.000 requests, what do you mean? 10.000 requests per second/minute/hour? Is this your page hits per day, etc.?
I'd also look into using an AsyncController, so that your site doesn't quickly become completely unusable.