Since it's a long question, cliff notes come first.
Cliff notes:
One client sends input to several services and they keep on working and sending results until the client tells them to stop or they have reached a pre-set maximum number of results.
Do you know how one should go about implementing this, or do you have a C# example of something like this? Are WCF and streaming the right toolset for this? (Consider that the results are custom objects, so it's not exactly the same as streaming a file.)
More Detailed Problem Definition:
Situation:
I have full control over the code of the client and the services (in other words, I don't depend on closed third-party components)
everything is in C#
We have one client who wants to get one task done and has several equal independent services for that.
(equal = identical service software; the hardware each service runs on can vary, so service speeds can vary)
One task consists of "1000 pieces of work" which are all independent from one another.
Within one task all of the 1000 pieces of work are based upon the same piece of input data.
I mention solutions A and B since I think they help explain the problem:
Solution (A) - The slow non-parallel way:
1. Client sends input to one service.
2. Service initializes based upon the input.
3. Service processes all 1000 pieces of work
(results get added up (which is very fast, by the way), so the result of 1000 pieces of work has the same size as the result of one)
4. Service sends result to the client.
5. Client receives result and is happy
Solution (B) - Parallel faster way:
Let's say ten services, so we evenly split it up and each should process 100.
The problem is that some services may be much faster than others, so giving each the same number (100) is slower than necessary.
Furthermore, we can't split the work according to an a priori speed test, since the speed of a service can change and some might even go down during processing. These are the reasons why I think the following would be best for my purpose.
Solution (C) - The way I would like to implement it:
Client sends out the same request to all services. (The same request still means the task gets processed in parallel; parallelization is trivial for my problem, because the 1000 pieces of work are so independent that doing the "first" piece of work 1000 times means we are done.)
A service keeps working and sending results until it is told to stop or has processed 1000 pieces of work.
One result gets sent for 10 pieces of work done.
This means all services work on the task in parallel, and when the client has received a combined total of 1000 results across all service replies, it sends the stop signal.
Normally no single service should reach 1000, but the 1000 cap covers the situation where there is only one service, and it acts as a fail-safe against infinite loops if the stop signal gets lost. (The client neither needs to wait nor to be absolutely sure that the stop signal has reached a service.)
Throwing away additional results beyond our goal of 1000 is fine.
(The alternative of making follow-up requests to services that have responded faster than others would come with the overhead of wasted time due to messages going back and forth, plus additional initializations. The additional initializations could be avoided, but that would be complicated, and the round-trip overhead would remain.)
I basically have solutions for A and B (I know how I would implement them), but I have no clue how I would go about realizing (C).
How do you implement a client/service architecture in C# where the service keeps sending results instead of just returning one object/value? (The results are custom objects, by the way.)
Does someone know of C# example code where something like that is implemented? Would streaming be the right way?
I've found the "writing a custom stream" example, but it seems like it's a pretty long way from there to what I want. (As a WCF noob I can easily be wrong on that, though.)
Streaming in WCF doesn't work the way you describe: you can't open a stream, return it to the client, and have the service keep writing results to it. If you want to work that way, you must go deeper and use sockets directly. In WCF the stream must be fully written before it is returned from the operation (I tried writing to the returned stream from another thread, but it didn't work). Streaming in WCF is only for data transport.
I don't like any of your solutions. I would try one of these instead:
A variant of (B), but tasks are not divided equally upfront. If you have 10 services and 1000 tasks, you first send 10 tasks (one to each service), and only after a service returns its result does it get another task. If tasks can be completed within a reasonable time, you only need multiple async calls to the services and to wait for the responses. If a service cannot complete a task within a defined timeout, you send the task to another service. If tasks complete quickly, you can send small batches instead of single tasks. If task completion takes long, you will need duplex communication.
Use a transactional message queue - MSMQ. Your client generates 1000 messages into a "producer queue" and the services take these messages one by one and process them. They send results as messages to another "consumer queue", where the client picks up the results and processes them (each result must carry a correlation to its task). The transactional queue ensures that each task is processed by only a single service, but if a service fails or the transaction times out, the task becomes available for processing by another service. MSMQ also offers additional features such as a queue for faulty messages. This is a slightly more advanced scenario. Its main problem can be the limit on message size (max. 4 MB per message).
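To make the MSMQ idea concrete, here is a minimal sketch, assuming two local transactional private queues already exist; the queue paths, the int payload and the Compute method are illustrative, not from the question:

using System;
using System.Messaging;

class MsmqSketch
{
    const string ProducerPath = @".\private$\producerQueue";
    const string ConsumerPath = @".\private$\consumerQueue";

    // Client side: enqueue the 1000 work messages.
    static void EnqueueWork()
    {
        using (var producer = new MessageQueue(ProducerPath))
        {
            for (int i = 0; i < 1000; i++)
            {
                // The label carries the correlation id; the body carries the work.
                producer.Send(i, "task-" + i, MessageQueueTransactionType.Single);
            }
        }
    }

    // Service side: take one task inside a transaction and post the result.
    static void ProcessOne()
    {
        using (var producer = new MessageQueue(ProducerPath))
        using (var consumer = new MessageQueue(ConsumerPath))
        using (var tx = new MessageQueueTransaction())
        {
            producer.Formatter = new XmlMessageFormatter(new[] { typeof(int) });
            tx.Begin();
            Message task = producer.Receive(TimeSpan.FromSeconds(30), tx);
            int result = Compute((int)task.Body);   // the actual processing
            consumer.Send(result, task.Label, tx);  // keep the correlation label
            tx.Commit();  // if the service dies before this, the task returns to the queue
        }
    }

    static int Compute(int input) { return input; }  // placeholder
}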
Edit:
OK, given your clarification, it looks like you need to send the same task to multiple services, and the task just triggers a series of the same computation on the same data. You can achieve that this way:
Build a duplex service using the NetTcpBinding
Service will implement a service contract with operations to start and to stop the computation (you can use the IsInitiating and IsTerminating properties of OperationContract)
Service will do the computation in a separate thread started by the start operation
Stop operation will abort the computation thread
Client will implement a callback contract to receive results from the service
Service will call the client's callback whenever the processing thread has a result (or multiple results) to send back
There is an example of using duplex services with WsDualHttpBinding, but don't use that binding in your scenario: it is much more complicated when a single client has to communicate with multiple identical services over duplex HTTP.
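As a rough illustration of the steps above, a minimal sketch of the two contracts and the client-side channel setup follows; all names (IWorkService, IWorkCallback, WorkInput, WorkResult, MyCallbackHandler) are illustrative, not from the question:

using System.Runtime.Serialization;
using System.ServiceModel;

[ServiceContract(SessionMode = SessionMode.Required,
                 CallbackContract = typeof(IWorkCallback))]
public interface IWorkService
{
    // Starts the computation thread on the service.
    [OperationContract(IsOneWay = true, IsInitiating = true)]
    void Start(WorkInput input);

    // Stops the computation and terminates the session.
    [OperationContract(IsTerminating = true)]
    void Stop();
}

public interface IWorkCallback
{
    // Called by the service whenever a batch of results is ready,
    // e.g. one call per 10 pieces of work, as in the question.
    [OperationContract(IsOneWay = true)]
    void OnResults(WorkResult batch);
}

[DataContract] public class WorkInput  { /* the shared input data */ }
[DataContract] public class WorkResult { /* your custom result object */ }

// Client side: one duplex channel per service, all started with the same input.
// var context = new InstanceContext(new MyCallbackHandler()); // implements IWorkCallback
// var factory = new DuplexChannelFactory<IWorkService>(
//     context, new NetTcpBinding(), "net.tcp://somehost:8000/work");
// IWorkService proxy = factory.CreateChannel();
// proxy.Start(input);
// ...when the client has collected 1000 results across all services:
// proxy.Stop();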
What you describe as Solution (C) sounds like a good use for Asynchronous WCF. Some of these might help (a small sketch follows the links):
Synchronous and Asynchronous Operations
Asynchronous Programming Design Patterns
How to: Call WCF Service Operations Asynchronously
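Here is the promised sketch: the same request fanned out to several services asynchronously, APM-style (as generated by svcutil /async). The contract, the operation and the endpoint addresses are illustrative assumptions, not from the question:

using System;
using System.ServiceModel;

[ServiceContract]
public interface IComputeService
{
    [OperationContract(AsyncPattern = true)]
    IAsyncResult BeginCompute(int input, AsyncCallback callback, object state);
    int EndCompute(IAsyncResult result);
}

class AsyncFanOut
{
    static void Main()
    {
        string[] endpoints = { "net.tcp://hostA:8000/work", "net.tcp://hostB:8000/work" };
        foreach (string address in endpoints)
        {
            var factory = new ChannelFactory<IComputeService>(new NetTcpBinding(), address);
            IComputeService proxy = factory.CreateChannel();
            string captured = address; // safe capture for the callback below
            // The callback fires when this particular service replies.
            proxy.BeginCompute(42, ar =>
            {
                int result = proxy.EndCompute(ar);
                Console.WriteLine("{0} returned {1}", captured, result);
            }, null);
        }
        Console.ReadLine(); // keep the process alive while replies arrive
    }
}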
Related
Why would WCF services configured with per-call instancing and multiple concurrency perform differently when run in different processes, and totally differently when called from threads?
I have one application which distributes data through a number of threads and makes calls to a WCF service (I don't think locking occurs in the code, but I will test that again). During testing it was noticed that increasing the number of threads in the distributing app does not increase the overall performance of the WCF processing service; the average is about 800 mpm (messages processed per minute), so throughput does not really change. BUT if you run a second application, the average throughput increases to ~1200 mpm.
What am I doing wrong? What have I missed? I can't understand this behavior.
UPDATE #1 (answers to questions in the comments)
Thanks for such quick responses.
Max connections is set to 1000 in the config (yes, in system.net).
Referring to this article on WCF instances and threading, max calls should be 16 x the number of cores, so I assume that if called from ~30 threads on 2 CPUs, the WCF service should accept most of those calls?
Does it have anything to do with shared memory? That's probably the only difference between multiple threads and multiple processes, I think.
I don't have an opportunity to test it with more CPUs (or a single one) right now. I will when I can.
So I think to understand this behavior, you first need to understand how WCF processes calls with per-call instancing. The hint is in the name - Per Call.
Every call any client makes is serviced by a new instance of the service (the exception to this is reentrancy, but this is not important in your scenario).
So, configuring service concurrency makes no practical difference to the service behavior. Regardless of whether the calls are coming from a single, multithreaded client, or multiple clients, the service will behave the same: it will create a service instance per call.
Therefore, the difference in overall system performance must be due to something on the client side. If I had to take a wild guess, I would say that one client is slower than two clients because of the cost associated with context switching, which is mitigated (via an unidentified mechanism) by running the client in two separate processes.
If I am correct then you should be able to get the highest performance per thread by running multiple single-threaded clients, which is a test you could do.
To implement this behavior, the attribute below should be added to the service class.
[ServiceBehavior(InstanceContextMode=InstanceContextMode.PerCall)]
public class MyService : IMyService
{
}
You can read more here:
http://wcftutorial.net/Per-Call-Service.aspx
I have a Work Tracker WPF application deployed on Windows Server 2008, and this Tracker application communicates with a (Tracker) Windows service via a WCF service.
A user can create/edit/add/delete/cancel any work entry from the Work Tracker GUI application. Internally, it sends a request to the Windows service. The Windows service receives the work request and processes it using multithreading. Each work-request entry actually creates n work files (based on work priority) in an output folder location.
So each work request takes some time to complete the work-addition process.
Now my question: if I cancel the work entry currently being created, I want to stop the Windows service's current work at RUNTIME. The thread currently creating output files for the work should be STOPPED. All the threads should be killed and all thread resources released once the user requests CANCEL.
My workaround:
I use the Windows service OnCustomCommand method to send custom values to the Windows service at runtime. What happens currently is that the service finishes processing the current work (i.e., creating output files for the work item received) and only then gets to the custom command for cancelling the request.
Is there any way for the work-item request to be stopped as soon as we get the custom command?
Any work around is much appreciated.
Summary
You are essentially talking about running a task host for long running tasks, and being able to cancel those tasks. Your specific question seems to want to know the best way to implement this in .NET. Your architecture is good, although you are brave to roll your own rather than using existing frameworks, and you haven't mentioned scaling your architecture later.
My preference is for using the TPL Task object. It supports cancellation, and it is easy to poll for progress, etc. It is available only from .NET 4 onwards.
It is hard to provide code without basically designing a whole job hosting engine for you and knowing your .NET version. I have described the steps in detail below, with references to example code.
Your approach of using the Windows Service OnCustomCommand is fine, you could also use a messaging service (see below) if you have that option for client-service comms. This would be more appropriate for a scenario where you have many clients talking to a central job service, and the job service is not on the same machine as the client.
Running and cancelling tasks on threads
Before we look at your exact context, it would be good to review MSDN - Asynchronous Programming Patterns. There are three main .NET patterns to run and cancel jobs on threads, and I list them in order of preference for use:
TAP: Task-based Asynchronous Pattern
Based on Task, which has been available only since .NET 4
The prefered way to run and control any thread-based activity from .NET 4 onwards
Much simpler to implement than EAP
EAP: Event-based Asynchronous Pattern
Your only option if you don't have .NET 4 or later.
Hard to implement, but once you have understood it you can roll it out and it is very reliable to use
APM: Asynchronous Programming Model
No longer relevant unless you maintain legacy code or use old APIs.
Even with .NET 1.1 you can implement a version of EAP, so I will not cover APM further, as you say you are implementing your own solution.
The architecture
Imagine this like a REST-based service.
The client submits a job, and gets returned an identifier for the job
A job engine then picks up the job when it is ready, and starts running it
If the client doesn't want the job any more, they delete the job using its identifier
This way the client is completely isolated from the workings of the job engine, and the job engine can be improved over time.
The job engine
The approach is as follows:
For a submitted task, generate a universal identifier (UID) so that you can:
Identify a running task
Poll for results
Cancel the task if required
return that UID to the client
queue the job using that identifier
when you have resources
run the job by creating a Task
store the Task in a dictionary against the UID as a key
When the client wants results, they send the request with the UID and you return progress by checking against the Task that you retrieve from the dictionary. If the task is complete they can then send a request for the completed data, or in your case just go and read the completed files.
When they want to cancel they send the request with the UID, and you cancel the Task by finding it in the dictionary and telling it to cancel.
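A hedged sketch of that dictionary-based engine (TAP, .NET 4); JobEngine, JobEntry and the member names are illustrative:

using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

class JobEntry
{
    public Task Job;
    public CancellationTokenSource Cts;
}

class JobEngine
{
    private readonly ConcurrentDictionary<Guid, JobEntry> _jobs =
        new ConcurrentDictionary<Guid, JobEntry>();

    // Submit: create the UID, start the Task, remember both.
    public Guid Submit(Action<CancellationToken> work)
    {
        Guid id = Guid.NewGuid();
        var cts = new CancellationTokenSource();
        Task job = Task.Factory.StartNew(() => work(cts.Token), cts.Token);
        _jobs[id] = new JobEntry { Job = job, Cts = cts };
        return id;   // the client polls / cancels with this UID
    }

    // Poll: report the Task's status for a given UID.
    public TaskStatus? GetStatus(Guid id)
    {
        JobEntry entry;
        return _jobs.TryGetValue(id, out entry) ? entry.Job.Status : (TaskStatus?)null;
    }

    // Cancel: signal the token; the job must check it cooperatively.
    public void Cancel(Guid id)
    {
        JobEntry entry;
        if (_jobs.TryGetValue(id, out entry))
            entry.Cts.Cancel();
    }
}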
Cancelling inside a job
Inside your code you will need to regularly check your cancellation token to see if you should stop running (see How do I abort/cancel TPL Tasks? if you are using the TAP pattern, or Albahari if you are using EAP). At that point you will exit your job processing, and your code, if designed well, should dispose of IDisposables where required, remove big strings from memory, etc.
The basic premise of cancellation is that you check your cancellation token:
After a block of work that takes a long time (e.g. a call to an external API)
Inside a loop (for, foreach, do or while) that you control, you check on each iteration
Within a long block of sequential code, that might take "some time", you insert points to check on a regular basis
You need to define how quickly you need to react to a cancellation - for a windows service it should be within milliseconds, preferably, to make sure that windows doesn't have problems restarting or stopping the service.
Some people do this whole process with threads, and by terminating the thread - this is ugly and not recommended any more.
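A minimal sketch of that cooperative check with the TAP pattern; the loop body is a placeholder:

using System;
using System.Threading;
using System.Threading.Tasks;

class CancellableJob
{
    public static void Run(CancellationToken token)
    {
        for (int i = 0; i < 1000; i++)
        {
            // Check on every iteration of a loop you control.
            token.ThrowIfCancellationRequested();
            DoOnePieceOfWork(i);   // placeholder for the real work
        }
    }

    static void DoOnePieceOfWork(int i) { /* ... */ }

    static void Main()
    {
        var cts = new CancellationTokenSource();
        Task job = Task.Factory.StartNew(() => Run(cts.Token), cts.Token);

        cts.Cancel();   // e.g. triggered from OnCustomCommand
        try { job.Wait(); }
        catch (AggregateException) { /* contains the TaskCanceledException */ }
        Console.WriteLine(job.Status);   // Canceled
    }
}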
Reliability
You need to ask: what happens if your server restarts, the windows service crashes, or any other exception happens causing you to lose incomplete jobs? In this case you may want a queue architecture that is reliable in order to be able to restart jobs, or rebuild the queue of jobs you haven't started yet.
If you don't want to scale, this is simple - use a local database that the Windows service stores job information in.
On submission of a job, record its details in the database
When you start a job, record that against the job record in the database
When the client collects the job, mark it for delayed garbage collection in the database, and then delete it after a set amount of time (1 hour, 1 day ...)
If your service restarts and there are "in progress jobs" then requeue them and then start your job engine again.
If you do want to scale, or your clients are on many computers, and you have a job engine "farm" of 1 or more servers, then look at using a message queue instead of directly communicating using OnCustomCommand.
Message queues have multiple benefits. They will allow you to reliably submit jobs to a central queue that many workers can then pick up and process, and to decouple your clients and servers so you can scale out your job-running services. They are used to ensure jobs are reliably submitted and processed in a highly decoupled fashion, and this can work locally or globally, but always reliably. You can even combine this with running your Windows service on cloud workers, which you can scale dynamically.
Examples of technologies are MSMQ (if you want to maintain your own, or must stay inside your own firewall), or Windows Azure Service Bus (WASB), which is cheap and already done for you. In either case you will want to use Patterns and Best Practices for Enterprise Integration. In the case of WASB there are many developer resources (MSDN, MSDN samples for BrokeredMessaging etc., the new Task-based API) and NuGet packages for you to use.
The problem I'm tasked to resolve is (from my understanding) a typical producer/consumer problem. We have data incoming 24/7/365. The incoming data (call it raw data) is stored in a table and is unusable for the end user. We then select all raw data that has not been processed and start processing it one item at a time. After each unit of data is processed, it is stored in another table and is then ready to be consumed by the client application.
The process from loading the raw data to persisting the processed data takes 2-5 seconds on average, but it is highly dependent on the third-party web services that we use to process the data. If the web services are slow, we are no longer processing data as fast as we're getting it in and accumulate a backlog, causing our customers to lose their live feed.
We want to make this process multithreaded. From my research I can see that the process can be divided into three discrete parts:
LOADING - A loader task (producer) that runs indefinitely and loads unprocessed data from the DB into a BlockingCollection<T> (or some other variation of a concurrent collection). My choice of BlockingCollection is due to the fact that it is designed with the Producer/Consumer pattern in mind and offers the GetConsumingEnumerable() method.
PROCESSING - Multiple consumers that consume data from the above BlockingCollection<T>. In its current implementation, I have a Parallel.ForEach loop over GetConsumingEnumerable() that on each iteration starts a task with two task continuations: the first step of the task is to call a third-party web service, wait for the result, and output the result for the second task to consume. The second task does calculations based on the first task's output and outputs the result for the third task, which basically just stores that result in the second BlockingCollection<T> (this one being an output collection). So my consumers are effectively producers too. Ideally, each unit of data loaded by task 1 would be queued for processing in parallel.
PERSISTING - A single consumer that runs against the second BlockingCollection mentioned above and persists the processed data into the database.
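In code, the three parts look roughly like this (a condensed, illustrative sketch; RawItem, ProcessedItem and the stub methods are placeholders, not our real types):

using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

class Pipeline
{
    static readonly BlockingCollection<RawItem> _input =
        new BlockingCollection<RawItem>(1000);
    static readonly BlockingCollection<ProcessedItem> _output =
        new BlockingCollection<ProcessedItem>(1000);

    // 1. LOADING: a single producer reading unprocessed rows from the DB.
    static void Loader()
    {
        foreach (RawItem item in LoadUnprocessedFromDb())
            _input.Add(item);
    }

    // 2. PROCESSING: several consumers; each is also a producer for stage 3.
    static void Processor()
    {
        foreach (RawItem item in _input.GetConsumingEnumerable())
        {
            object serviceResult = CallWebService(item);   // third-party call
            _output.Add(Calculate(item, serviceResult));   // CPU-bound step
        }
    }

    // 3. PERSISTING: a single consumer writing to the processed table.
    static void Persister()
    {
        foreach (ProcessedItem item in _output.GetConsumingEnumerable())
            SaveToDb(item);
    }

    static void Main()
    {
        Task.Factory.StartNew(Loader);
        for (int i = 0; i < 8; i++)  // degree of parallelism for stage 2
            Task.Factory.StartNew(Processor, TaskCreationOptions.LongRunning);
        Persister();  // run stage 3 on the main thread
    }

    // Placeholders for the real data types and I/O.
    class RawItem { }
    class ProcessedItem { }
    static IEnumerable<RawItem> LoadUnprocessedFromDb() { yield break; }
    static object CallWebService(RawItem item) { return null; }
    static ProcessedItem Calculate(RawItem item, object result) { return new ProcessedItem(); }
    static void SaveToDb(ProcessedItem item) { }
}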
The problem I'm facing is item number 2 from the list above. It does not seem to be fast enough (just by using Parallel.ForEach). I tried, inside Parallel.ForEach, instead of directly starting a task with continuations, starting a wrapping thread that would in turn start the processing task. But this caused an OutOfMemory exception, because the thread count went out of control and reached 1200 very quickly. I also tried scheduling the work using the ThreadPool, to no avail.
Could you please advise if my approach is good enough for what we need done, or is there a better way of doing it?
If the bottleneck is the third-party service, and it does not handle parallel execution but queues your requests, then there is not much you can do.
But first you can try this:
use the ThreadPool or Tasks (those will use the ThreadPool too) - don't fire up threads yourself
try to make your requests async instead of occupying a thread exclusively (see the sketch below)
run your service/app through a performance profiler and check where you are "wasting" your time
make a spike/check for the 3rd-party service and see how it handles parallel requests
think about caching the answers from this service (if possible)
That's all I can think of without further info right now.
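As an illustration of the "make your requests async" bullet, a hedged sketch using asynchronous I/O so no thread is blocked while the third-party service responds; the URI and the hand-off to the calculation stage are illustrative:

using System;
using System.IO;
using System.Net;

class AsyncThirdPartyCall
{
    static void Send(Uri serviceUri)
    {
        var request = (HttpWebRequest)WebRequest.Create(serviceUri);
        request.BeginGetResponse(ar =>
        {
            // Runs on an I/O completion thread when the response arrives;
            // no worker thread was blocked while waiting.
            using (var response = (HttpWebResponse)request.EndGetResponse(ar))
            using (var reader = new StreamReader(response.GetResponseStream()))
            {
                string body = reader.ReadToEnd();
                // hand "body" to the calculation stage here
            }
        }, null);
    }
}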
I recently faced a problem that was very similar to yours. Here's what I did; I hope it helps:
It seems like your 1st and 3rd parts are rather simple and can be managed on their respective threads without any problem.
The 2nd part must first be started on a new thread; then use a System.Threading.Timer to make your web-service calls.
The method that calls the web service passes the response (result) to the processing method by invoking it asynchronously, letting it process the data at its own pace.
This solved my problem; I hope it helps you too. If you have any doubts, ask and I'll explain further.
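A hedged sketch of that timer-driven approach; the interval, the placeholders and the delegate-based asynchronous hand-off are illustrative:

using System;
using System.Threading;

class TimerDrivenCaller
{
    static Timer _timer;

    static void Start()
    {
        // Fire every 500 ms; each tick makes one web-service call.
        _timer = new Timer(Tick, null, 0, 500);
    }

    static void Tick(object state)
    {
        object response = CallWebService();          // synchronous call here
        Action<object> process = ProcessResponse;
        process.BeginInvoke(response, null, null);   // process asynchronously
    }

    static object CallWebService() { return null; }  // placeholder
    static void ProcessResponse(object response) { /* calculations */ }
}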
I’m looking for the best way of using threads considering scalability and performance.
In my site I have two scenarios that need threading:
UI trigger: for example, the user clicks a button, and the server should read data from the DB and send some emails. Those actions take time and I don't want the user's request to be delayed. This scenario happens very frequently.
Background service: when the app starts, it triggers a thread that runs every 10 minutes, reads from the DB, and sends emails.
The solutions I found:
A. Use thread pool - BeginInvoke:
This is what I use today for both scenarios.
It works fine, but it uses the same threads that serve the pages, so I think I may run into scalability issues. Can this become a problem?
B. No use of the pool – ThreadStart:
I know starting a new thread takes more resources than using a thread pool.
Can this approach work better for my scenarios?
What is the best way to reuse the opened threads?
C. Custom thread pool:
Because my scenarios occur frequently, maybe the best way is to start a custom thread pool?
Thanks.
I would personally put this into a different service. Make your UI action write to the database, and have a separate service which either polls the database or reacts to a trigger, and sends the emails at that point.
By separating it into a different service, you don't need to worry about AppDomain recycling etc. - and you can put it on an entirely different server if and when you want to. I think it'll give you a more flexible solution.
I do this kind of thing by calling a web service, which then calls a method asynchronously using a delegate. The original web-service call returns a Guid to allow tracking of the processing.
For the first scenario, use ASP.NET Asynchronous Pages. Async pages are a very good choice when it comes to scalability, because during async execution the HTTP request thread is released and can be re-used.
I agree with Jon Skeet, that for second scenario you should use separate service - windows service is a good choice here.
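A hedged sketch of such an asynchronous page; the page class, the Func delegate and DoSlowWork are illustrative, and the @Page directive must include Async="true":

using System;
using System.Web.UI;

public partial class SendMail : Page
{
    private Func<string> _work;   // stands in for the slow DB/email work

    protected void Page_Load(object sender, EventArgs e)
    {
        _work = DoSlowWork;
        // Register Begin/End handlers; the request thread is released
        // between BeginAsync and EndAsync.
        AddOnPreRenderCompleteAsync(BeginAsync, EndAsync);
    }

    IAsyncResult BeginAsync(object sender, EventArgs e,
                            AsyncCallback cb, object state)
    {
        return _work.BeginInvoke(cb, state);
    }

    void EndAsync(IAsyncResult ar)
    {
        string result = _work.EndInvoke(ar);
        // render the result, if needed
    }

    static string DoSlowWork() { return "done"; }   // placeholder
}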
Out of your three solutions, don't use BeginInvoke. As you said, it will have a negative impact on scalability.
Between the other two, if the tasks are truly background and the user isn't waiting for a response, then a single, permanent thread should do the job. A thread pool makes more sense when you have multiple tasks that should be executing in parallel.
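For the single, permanent thread option, a minimal sketch (the class and member names are illustrative):

using System;
using System.Collections.Concurrent;
using System.Threading;

static class BackgroundWorkerThread
{
    static readonly BlockingCollection<Action> _queue =
        new BlockingCollection<Action>();

    static BackgroundWorkerThread()
    {
        var thread = new Thread(() =>
        {
            // One long-lived thread drains the queue for the process lifetime.
            foreach (Action job in _queue.GetConsumingEnumerable())
                job();
        });
        thread.IsBackground = true;
        thread.Start();
    }

    // Called from the UI action: returns immediately, work runs later.
    public static void Enqueue(Action job)
    {
        _queue.Add(job);
    }
}

A UI action would then just call BackgroundWorkerThread.Enqueue(() => SendEmails()) - with SendEmails standing in for whatever the slow work is - and return immediately.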
However, keep in mind that web servers sometimes crash, AppPools recycle, etc. So if any of the queued work needs to be reliably executed, then moving it out of process is probably a better idea (such as into a Windows Service). One way of doing that, which preserves the order of requests and maintains persistence, is to use Service Broker. You write the request to a Service Broker queue from your web tier (with an async request), and then read those messages from a service running on the same machine or a different one. You can also scale nicely that way by simply adding more instances of the service (or more threads in it).
In case it helps, I walk through using both a background thread and Service Broker in detail in my book, including code examples: Ultra-Fast ASP.NET.
I want to get progress updates about a method called over WCF.
For example I run 1000 queries and want to know the current status.
If a duplex contract is not workable in your environment, you will have to resort to polling. Your initial method could return an identifier (a GUID perhaps) and then you could make subsequent calls to another method to check the progress, and pass in the identifier.
This will obviously require you to store the progress information somewhere (like a session or a database), which is not great.
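A hedged sketch of that polling shape; the contract, the names and the in-memory progress store are illustrative (a real implementation would persist progress somewhere, as noted above):

using System;
using System.Collections.Concurrent;
using System.ServiceModel;
using System.Threading.Tasks;

[ServiceContract]
public interface IQueryService
{
    [OperationContract]
    Guid StartQueries();       // kicks off the 1000 queries, returns the id

    [OperationContract]
    int GetProgress(Guid id);  // queries completed so far, -1 if unknown
}

[ServiceBehavior(InstanceContextMode = InstanceContextMode.Single)]
public class QueryService : IQueryService
{
    private readonly ConcurrentDictionary<Guid, int> _progress =
        new ConcurrentDictionary<Guid, int>();

    public Guid StartQueries()
    {
        Guid id = Guid.NewGuid();
        _progress[id] = 0;
        Task.Factory.StartNew(() =>
        {
            for (int i = 0; i < 1000; i++)
            {
                RunOneQuery(i);         // placeholder for the real query
                _progress[id] = i + 1;  // the client polls this value
            }
        });
        return id;
    }

    public int GetProgress(Guid id)
    {
        int done;
        return _progress.TryGetValue(id, out done) ? done : -1;
    }

    private void RunOneQuery(int queryIndex) { }
}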
Yes - use a duplex contract and report the progress every so often by using callbacks.
That depends quite a bit on the services you are calling and how long you expect the operations to take.
If you are kicking off 1000 queries on a single service, you will likely get hit by the service throttling before all the calls to the service can be received.
There is a similar phenomenon on the client side. WCF will only allow so many concurrent calls at a time. This is configurable to some extent, but I would be surprised if 1000 concurrent calls worked without a hiccup or two.
If the calls end up being more or less synchronous, I would put all the queries in a queue and process each call in turn. You can then monitor the queue from your UI to update progress as calls to the service are completed.
If your architecture supports 1000 concurrent calls, then the duplex binding will be a good fit. You can just poll for completion.
Alternatively, you can create a pub / sub service that the target service updates as queries are completed. Your client would just catch events from the pub / sub service as the results from the queries become available.