I have the following scenario:
A C# application (.NET 4.0/4.5) with 5-6 different threads. Every thread has a different task, which is launched every x seconds (ranging from 5 to 300).
Each task has the following steps:
Fetch items from SQL Server
Convert the items to JSON
Send the data to the web server
Wait for the reply from the server.
Since these tasks can fail at some point (internet problems, timeouts, etc.), what is the best solution in the .NET world?
I thought about the following solutions:
Spawn a new thread every x seconds (if another thread of this type is not already running)
Spawn one thread for each type of task and loop the steps every x seconds (I'd need to understand how to manage exceptions in this approach)
Which would be more secure and robust? The application will run on unattended systems, so it must be able to keep running regardless of any possible exception.
Threads are pretty expensive to create. The first option isn't a great one. If the cycle of "do stuff" is pretty brief (between the pauses), you might consider using the ThreadPool or the TPL. If the threads are mostly busy, or the work takes any appreciable time, then dedicated workers are more appropriate.
As for exceptions: don't let exceptions escape workers. You must catch them. If all that means is that you give up and retry in a few seconds, that is probably fine.
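To make that concrete, here is a minimal sketch of such a dedicated worker, assuming a hypothetical runTaskOnce delegate standing in for the fetch/convert/send/wait steps (it is not from the original post); the essential point is the try/catch that keeps the loop alive:
using System;
using System.Threading;

class Worker
{
    private readonly TimeSpan _interval;
    private readonly Action _runTaskOnce; // hypothetical: fetch, convert, send, wait for reply

    public Worker(TimeSpan interval, Action runTaskOnce)
    {
        _interval = interval;
        _runTaskOnce = runTaskOnce;
    }

    public void Start()
    {
        new Thread(Loop) { IsBackground = true }.Start();
    }

    private void Loop()
    {
        while (true)
        {
            try
            {
                _runTaskOnce();
            }
            catch (Exception ex)
            {
                // Log and swallow: the worker must survive network/timeout failures.
                Console.Error.WriteLine("Task failed, will retry next cycle: " + ex.Message);
            }
            Thread.Sleep(_interval); // pause x seconds between runs
        }
    }
}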
You could model the whole thing using a producer/consumer approach. You have a producer that puts new task descriptions in a queue, and you can have multiple consumers (4 or 5 threads) that process items from the queue. The number of consumer/processing threads could vary depending on the load and the length of the queue.
Each task involves reading from the DB, converting the format, sending to the web server, and then processing the response from the web server. I assume each task would do all these steps.
In case of exceptions for an item in the queue, you could potentially mark the queue item as failed and schedule it for a retry later.
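A minimal sketch of that shape, assuming a hypothetical WorkItem type and process delegate (neither is from the original post); failed items are re-queued for a later retry instead of being lost:
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class WorkItem
{
    public int Id;
    public int Attempts;
}

class ProducerConsumer
{
    private readonly BlockingCollection<WorkItem> _queue = new BlockingCollection<WorkItem>();

    public void Produce(WorkItem item)
    {
        _queue.Add(item); // the producer puts new task descriptions in the queue
    }

    public void StartConsumers(int count, Action<WorkItem> process)
    {
        for (int i = 0; i < count; i++)
        {
            Task.Factory.StartNew(() =>
            {
                foreach (var item in _queue.GetConsumingEnumerable())
                {
                    try
                    {
                        process(item); // read DB, convert, send, handle the response
                    }
                    catch
                    {
                        if (++item.Attempts < 3)
                            _queue.Add(item); // mark as failed and schedule a retry
                    }
                }
            }, TaskCreationOptions.LongRunning);
        }
    }
}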
I have my main Winforms application.
There are 6 threads working in parallel plus the main thread - at least, that is how it is meant to be.
I have created one thread that is a TCP server, listening on a specific port.
listenerThread = new Thread(new ThreadStart(AsynchronousSocketListener.StartListening));
listenerThread.Start();
I also have 5 other threads doing different kinds of work (for example updating the database, computing averages/sums, acting as TCP clients, etc.).
My question is:
Is it possible that my TCP server (which runs on one of the 6 threads) won't read a message when one of those 5 other threads takes up the CPU, so that the TCP server's thread has to wait?
Another question: if that could happen, how could I avoid it?
This is a summary of my comments above
"Is it possible that my TCP Server (which is working on one of 6 threads) wont read a message when one of those 5 different threads thread will take the computing power of CPU, and the TCP Server's Thread thread will have to wait ?"
Received data is buffered to an extent; however, if your code does not respond in an appropriate time, data may be dropped.
The same could be said of a single-core system where another app - say Prime95, busily calculating prime numbers and not playing nice - is running right next to yours.
"Another question: if that could happen, how could I avoid it?"
When handling I/O (I'll focus on TCP here), the key is to perform the minimal amount of processing in your data-received handler, irrespective of whether that handler uses the IAsyncResult pattern or async/await.
A good general flow is:
start an asynchronous read
read result of asynchronous read
place result in a queue
loop back to #1
Meanwhile, you process the queued results from step 3 in a different mechanism - a timer, a GUI app's idle loop, or a different thread - so long as the thread processing the results has nothing to do with the flow above.
The reason is that in a scenario involving reasonably high data transfer rates, if you were to read a block of data and then immediately update that Telerik datagrid UI showing thousands of rows, there is a high chance that the next read operation would drop data because you didn't respond quickly enough.
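A minimal sketch of that flow using async/await (all names here are illustrative, not from the original posts): the read loop does nothing but read and enqueue, and a separate consumer drains the queue:
using System;
using System.Collections.Concurrent;
using System.Net.Sockets;
using System.Threading.Tasks;

class Receiver
{
    private readonly BlockingCollection<byte[]> _received = new BlockingCollection<byte[]>();

    public async Task ReadLoopAsync(NetworkStream stream)
    {
        var buffer = new byte[8192];
        while (true)
        {
            int n = await stream.ReadAsync(buffer, 0, buffer.Length); // steps 1-2: asynchronous read
            if (n == 0) break; // remote side closed the connection
            var chunk = new byte[n];
            Array.Copy(buffer, chunk, n);
            _received.Add(chunk); // step 3: queue it, no processing here
        } // step 4: loop back
        _received.CompleteAdding();
    }

    // Run this on a different thread (or drive it from a timer / idle loop).
    public void ProcessLoop()
    {
        foreach (var chunk in _received.GetConsumingEnumerable())
        {
            // expensive work (parsing, UI updates) happens here, away from the read loop
        }
    }
}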
I am using an Azure Cloud Worker Role to process incoming tasks from queues. Processing each task can take up to several hours, and each worker role can handle up to N tasks simultaneously. Basically, it's working.
Now, the documentation says that from time to time the worker role can be shut down (for a software update, OS upgrade, ...). Basically, that's fine. But this planned shutdown must not forcibly stop the tasks the worker role is already running.
Expected:
When the environment calls the OnStop() method:
The worker role stops taking new tasks for processing.
It waits for the running tasks to complete.
It continues with the planned shutdown.
Actual:
The OnStop() method can block for at most 5 minutes. I cannot guarantee that I'll finish processing a task within 5 minutes - so, this is a problem... My task gets killed in the middle of processing, and this leaves my software in an unstable state.
How can I avoid this 5-minute limit? Any tip will be welcome.
How can I avoid this 5-minute limit?
Unfortunately, you can't. This is a hard limit imposed from Azure side. You will need to work around that.
There are two possible solutions I can think of, and both would require you to rethink your current architecture:
Break your one big task into many smaller tasks and create some kind of workflow.
Make your tasks idempotent, so that even if a task gets terminated partway through (because of a worker role shutdown or an error in the task itself), when it gets picked up by another instance it can start again without corrupting the task's output.
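Whichever you choose, the usual shape of OnStop is to stop taking new work and drain what is already in flight while staying under the hard limit. A minimal sketch (the names are illustrative, not from the original post):
using System;
using System.Threading;
using System.Threading.Tasks;

public class WorkerRole // derives from RoleEntryPoint in a real role
{
    private readonly CancellationTokenSource _cts = new CancellationTokenSource();
    private Task _runLoop;

    public void Run()
    {
        _runLoop = Task.Factory.StartNew(() =>
        {
            while (!_cts.IsCancellationRequested)
            {
                // dequeue the next message and process one small, idempotent step
            }
        }, TaskCreationOptions.LongRunning);
        _runLoop.Wait();
    }

    public void OnStop()
    {
        _cts.Cancel(); // stop taking new tasks
        _runLoop.Wait(TimeSpan.FromMinutes(4.5)); // drain, but stay under the 5-minute limit
    }
}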
No, you cannot bypass this limit. In general, you should not rely on any of your instances running continuously for any long period of time. Instances may be suddenly stopped, or they may suddenly disappear (because of an underlying server failure). Your software should be designed such that when an instance is restarted (possibly redeployed), or some other instance finds capacity to take a previously released work item, that work item is reprocessed without any adverse effects.
The problem I'm tasked to resolve is (from my understanding) a typical producer/consumer problem. We have data incoming 24/7/365. The incoming data (call it raw data) is stored in a table and is unusable for the end user. We then select all raw data that has not been processed and process it one item at a time. After each unit of data is processed, it's stored in another table and is ready to be consumed by the client application.
The process from loading the raw data to persisting the processed data takes 2-5 seconds on average, but it's highly dependent on the third-party web services we use to process the data. If the web services are slow, we no longer process data as fast as it comes in and we accumulate a backlog, causing our customers to lose their live feed.
We want to make this process multithreaded. From my research, the process can be divided into three discrete parts:
LOADING - A loader task (producer) that runs indefinitely and loads unprocessed data from the DB into a BlockingCollection<T> (or some other variation of a concurrent collection). I chose BlockingCollection because it is designed with the producer/consumer pattern in mind and offers the GetConsumingEnumerable() method.
PROCESSING - Multiple consumers that consume data from the above BlockingCollection<T>. In the current implementation, I have a Parallel.ForEach loop over GetConsumingEnumerable() that on each iteration starts a task with two continuations: the first step of the task is to call a third-party web service, wait for the result, and output that result for the second task to consume. The second task does calculations based on the first task's output and outputs its result for the third task, which basically just stores it in a second BlockingCollection<T> (the output collection). So my consumers are effectively producers too. Ideally, each unit of data loaded by task 1 would be queued for processing in parallel.
PERSISTING - A single consumer runs against the second BlockingCollection mentioned above and persists the processed data into the database.
The problem I'm facing is item 2 from the list above. It does not seem to be fast enough (just using Parallel.ForEach). Inside the Parallel.ForEach, I tried starting a wrapping thread (which in turn starts the processing task) instead of directly starting a task with continuations. But this caused an OutOfMemoryException, because the thread count went out of control and reached 1200 very quickly. I also tried scheduling the work using the ThreadPool, to no avail.
Could you please advise if my approach is good enough for what we need done, or is there a better way of doing it?
If the bottleneck is some 3rd-party service that will not handle parallel execution but will queue your requests, then you cannot do a thing.
But first you can try this:
use the ThreadPool or Tasks (those will use ThreadPool too) - don't fire up Threads yourself
try to make your requests async instead of tying up a thread exclusively (see the sketch after this answer)
run your service/app through a performance profiler and check where you are "wasting" your time
write a quick spike against the 3rd-party service to see how it handles parallel requests
think about caching the answers from this service (if possible)
That's all I can think of without further info right now.
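For the async bullet above, here is a minimal .NET 4.0-style sketch using Task.Factory.FromAsync to wrap the Begin/End pair, so no thread is blocked while the third-party service is working (the class name and URL handling are illustrative assumptions):
using System;
using System.IO;
using System.Net;
using System.Threading.Tasks;

static class ServiceClient
{
    public static Task<string> CallServiceAsync(string url)
    {
        var request = WebRequest.Create(url);
        return Task.Factory
            .FromAsync<WebResponse>(request.BeginGetResponse, request.EndGetResponse, null)
            .ContinueWith(t =>
            {
                using (var response = t.Result)
                using (var reader = new StreamReader(response.GetResponseStream()))
                    return reader.ReadToEnd(); // this continuation runs only once the response has arrived
            });
    }
}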
I recently faced a problem very similar to yours.
Here's what I did; I hope it helps:
It seems like your 1st and 3rd parts are rather simple and can be managed on their own respective threads without any problem.
The 2nd part should first be started on a new thread. Then use a System.Threading.Timer to make your web-service calls; the method that calls the web service passes the response (result) on to the processing method by invoking it asynchronously, letting it process the data at its own pace.
This solved my problem; I hope it helps you too. If you have any doubts, ask and I'll explain further.
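A minimal sketch of what this answer describes, assuming hypothetical CallService and ProcessResponse methods (not from the original post): the timer callback stays short and hands the processing off asynchronously:
using System;
using System.Threading;
using System.Threading.Tasks;

class TimedCaller
{
    private Timer _timer;

    public void Start(TimeSpan period)
    {
        _timer = new Timer(_ =>
        {
            string response = CallService(); // the web-service call
            Task.Factory.StartNew(() => ProcessResponse(response)); // process at its own pace
        }, null, TimeSpan.Zero, period);
    }

    private string CallService() { return "..."; } // hypothetical stand-in
    private void ProcessResponse(string response) { /* heavy processing here */ }
}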
I need to optimize a WCF service... it's quite a complex thing. My problem this time has to do with tasks (Task Parallel Library, .NET 4.0). What happens is that I launch several tasks when the service is invoked (using Task.Factory.StartNew) and then wait for them to finish:
Task.WaitAll(task1, task2, task3, task4, task5, task6);
Ok... what I see, and don't like, is that on the first call (sometimes the first 2-3 calls, if made quickly one after another), the final task starts much later than the others (I am looking at a case where it started 0.5 seconds after the others). I tried calling
ThreadPool.SetMinThreads(12*Environment.ProcessorCount, 20);
at the beginning of my service, but it doesn't seem to help.
The tasks are all database-related: I'm reading from multiple databases and it has to take as little time as possible.
Any idea why the last task is taking so long? Is there something I can do about it?
Alternatively, should I use the thread pool directly? As it happens, in one case I'm looking at, one task had already ended before the last one started - I would have saved 0.2 seconds if I had reused that thread instead of waiting for a new one to be created. However, I cannot be sure that that task will always end so quickly, so I can't put both requests in the same task.
[Edit] The OS is Windows Server 2003, so there should be no connection limit. Also, it is hosted in IIS - I don't know if I should create regular threads or use the thread pool. Which is preferred?
[Edit] I've also tried using Task.Factory.StartNew(action, TaskCreationOptions.LongRunning); - it doesn't help, the last task still starts much later (around half a second later) than the rest.
[Edit] The MSDN docs say:
The thread pool has a built-in delay (half a second in the .NET Framework version 2.0) before starting new idle threads. If your application periodically starts many tasks in a short time, a small increase in the number of idle threads can produce a significant increase in throughput. Setting the number of idle threads too high consumes system resources needlessly.
However, as I said, I'm already calling SetMinThreads and it doesn't help.
I have had problems myself with delays in thread startup when using the (.NET 4.0) Task object. So for time-critical stuff I now use dedicated threads (... again, as that is what I was doing before .NET 4.0).
The purpose of a thread pool is to avoid the operating-system cost of starting and stopping threads: the threads are simply reused. This is a common model, found for example in internet servers. The advantage is that they can respond more quickly.
I've written many applications where I implement my own thread pool by having dedicated threads pick up tasks from a task queue. Note, however, that this usually requires locking, which can cause delays/bottlenecks. It depends on your design: if the tasks are small, there will be a lot of locking, and it might be faster to trade some CPU for less locking: http://www.boyet.com/Articles/LockfreeStack.html
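For illustration, here is a minimal sketch of such a hand-rolled pool - dedicated threads pulling Actions from a shared queue. The class is hypothetical; note the lock around the queue, which is exactly the locking cost mentioned above:
using System;
using System.Collections.Generic;
using System.Threading;

class TinyThreadPool
{
    private readonly Queue<Action> _work = new Queue<Action>();
    private readonly object _gate = new object();

    public TinyThreadPool(int threadCount)
    {
        for (int i = 0; i < threadCount; i++)
            new Thread(WorkLoop) { IsBackground = true }.Start();
    }

    public void Enqueue(Action action)
    {
        lock (_gate)
        {
            _work.Enqueue(action);
            Monitor.Pulse(_gate); // wake one idle worker
        }
    }

    private void WorkLoop()
    {
        while (true)
        {
            Action action;
            lock (_gate)
            {
                while (_work.Count == 0)
                    Monitor.Wait(_gate); // idle until work arrives
                action = _work.Dequeue();
            }
            action(); // run outside the lock
        }
    }
}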
SmartThreadPool is a replacement/extension of the .Net thread pool. As you can see in this link it has a nice GUI to do some testing: http://www.codeproject.com/KB/threads/smartthreadpool.aspx
In the end it depends on what you need, but for high performance I recommend implementing your own thread pool. If you see a lot of thread idling, it could be beneficial to increase the number of threads (beyond the commonly recommended CPU count x 2). This is actually how Hyper-Threading works inside the CPU - using "idle" time during one operation to perform other operations.
Note that .NET has a built-in limit of 25 threads per process (i.e., for all the WCF calls you receive simultaneously). This limit is independent of, and overrides, the ThreadPool setting. It can be increased, but that requires some magic: http://www.csharpfriends.com/Articles/getArticle.aspx?articleID=201
Following from my prior question (yep, should have been a Q against original message - apologies):
Why do you feel that creating 12 threads for each processor core in your machine will somehow speed up your server's ability to create worker threads? All you're doing is slowing your server down!
As per the MSDN docs: "You can use the SetMinThreads method to increase the minimum number of threads. However, unnecessarily increasing these values can cause performance problems. If too many tasks start at the same time, all of them might appear to be slow. In most cases, the thread pool will perform better with its own algorithm for allocating threads. Reducing the minimum to less than the number of processors can also hurt performance."
Issues like this are usually caused by bumping into limits or contention on a shared resource.
In your case, I am guessing that your last task(s) block while they wait for a connection to the DB server to become available, or for the DB to respond. Remember: if your invocation kicks off 5-6 other tasks, your machine has to create and open numerous DB connections and is going to hit the DB with, potentially, a lot of work. If your WCF server and/or your DB server are cold, your first few invocations are going to be slower until the machines' caches, etc., are populated.
Have you tried adding a little tracing/logging, using a Stopwatch to time how long your tasks take to connect to the DB server and then execute their operations?
You may find that reducing the number of concurrent tasks you kick off actually speeds things up. Try spawning 3 tasks at a time, waiting for them to complete and then spawn the next 3.
When you call Task.Factory.StartNew, it uses a TaskScheduler to map those tasks into actual work items.
In your case, it sounds like one of your Tasks is delaying occasionally while the OS spins up a new Thread for the work item. You could, potentially, build a custom TaskScheduler which already contained six threads in a wait state, and explicitly used them for these six tasks. This would allow you to have complete control over how those initial tasks were created and started.
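A minimal sketch of that idea - a TaskScheduler whose threads are created up front, so queued Tasks never wait for the OS to spin up a thread (illustrative, not production code):
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class DedicatedThreadScheduler : TaskScheduler
{
    private readonly BlockingCollection<Task> _tasks = new BlockingCollection<Task>();

    public DedicatedThreadScheduler(int threadCount)
    {
        for (int i = 0; i < threadCount; i++)
            new Thread(() =>
            {
                foreach (var task in _tasks.GetConsumingEnumerable())
                    TryExecuteTask(task); // run the Task on our pre-started thread
            }) { IsBackground = true }.Start();
    }

    protected override void QueueTask(Task task)
    {
        _tasks.Add(task);
    }

    protected override bool TryExecuteTaskInline(Task task, bool taskWasPreviouslyQueued)
    {
        return false; // always run on the dedicated threads
    }

    protected override IEnumerable<Task> GetScheduledTasks()
    {
        return _tasks.ToArray();
    }
}
You would then pass an instance of it as the scheduler argument to the Task.Factory.StartNew overload that accepts a TaskScheduler.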
That being said, I suspect there is something else at play here... You mentioned that using TaskCreationOptions.LongRunning demonstrates the same behavior. This suggests that there is some other factor at play causing this half second delay. The reason I suspect this is due to the nature of TaskCreationOptions.LongRunning - when using the default TaskScheduler (LongRunning is a hint used by the TaskScheduler class), starting a task with TaskCreationOptions.LongRunning actually creates an entirely new (non-ThreadPool) thread for that Task. If creating 6 tasks, all with TaskCreationOptions.LongRunning, demonstrates the same behavior, you've pretty much guaranteed that the problem is NOT the default TaskScheduler, since this is going to always spin up 6 threads manually.
I'd recommend running your code through a performance profiler, and potentially the Concurrency Visualizer in VS 2010. This should help you determine exactly what is causing the half second delay.
What is the OS? If you are not running the server versions of windows, there is a connection limit. Your many threads are probably being serialized because of the connection limit.
Also, I have not used the task parallel library yet, but my limited experience is that new threads are cheap to make in the context of networking.
These articles might explain the problem you're having:
http://blogs.msdn.com/b/wenlong/archive/2010/02/11/why-are-wcf-responses-slow-and-setminthreads-does-not-work.aspx
http://blogs.msdn.com/b/wenlong/archive/2010/02/11/why-does-wcf-become-slow-after-being-idle-for-15-seconds.aspx
Seeing as you're using .NET 4, the first article probably doesn't apply, but as the second article points out, the ThreadPool terminates idle threads after 15 seconds, which might explain the problem you're having; it also offers a simple (though a little hacky) way to get around it.
Whether or not you should be using the ThreadPool directly wouldn't make any difference as I suspect the task library is using it for you underneath anyway.
One third-party library we have been using for a while might help you here - Smart Thread Pool. You still get the same benefits of using the task libraries, in that you can have the return values from the threads and get any exception information from them too.
Also, you can instantiate multiple thread pools, which is useful when different parts of your application each need their own pool (so that a low-priority process doesn't eat into the quota of some high-priority process). And you can set the priority of the threads in each pool, which you can't do with the standard ThreadPool, where all threads are background threads.
You can find plenty of info on the codeplex page, I've also got a post which highlights some of the key differences:
http://theburningmonk.com/2010/03/threading-introducing-smartthreadpool/
Just as a side note: for tasks like the one you've mentioned, which might take some time to return, you probably shouldn't be using the ThreadPool anyway. It's recommended to avoid using the ThreadPool for blocking tasks like that, because it ties up threads the framework classes use for all sorts of things - handling timer events, etc. (not to mention handling incoming WCF requests!). I feel like I'm spamming here, but here's some of the info I've gathered on using the ThreadPool, with some useful links at the bottom:
http://theburningmonk.com/2010/03/threading-using-the-threadpool-vs-creating-your-own-threads/
well, hope this helps!
I'm making a multi-threaded application using delegates to handle the processing of requests in a WCF service. I want the clients to be able to send a request, then disconnect and wait for a callback announcing the work is done (which will most likely be a search through a database). I don't know how many requests may come in at once; it could be one every once in a while, or it could spike to dozens.
As far as I know, .Net's threadpool has 25 threads available to use. What happens when I spawn 25 delegates or more? Does it throw an error, does it wait, does it pause an existing operation and start working on the new delegate, or some other behavior?
Beyond that, what happens if I want to spawn 25 or more delegates while other operations (such as incoming/outgoing connections) want to start, or when another operation is already working and I want to spawn another delegate?
I want to make sure this is scalable without being too complex.
Thanks
All operations are queued (I am assuming that you are using the threadpool directly or indirectly). It is the job of the threadpool to munch through the queue and dispatch operations onto threads. Eventually all threads may become busy, which will just mean that the queue will grow until threads are free to start processing queued work items.
You're confusing delegates with threads, and number of concurrent connections.
With WCF 2-way bindings, the connection remains open while waiting for the callback.
IIS 7 or above, on modern hardware, should have no difficulty maintaining a few thousand concurrent connections if they're sitting idle.
Delegates are just method pointers - you can have as many as you wish. That doesn't mean they're being invoked concurrently.
If you are using ThreadPool.QueueUserWorkItem then it just queues the extra items until a thread is available.
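A tiny demo of that queueing behaviour (the SetMaxThreads call is just to make the effect visible; it returns false and is ignored if you ask for fewer threads than the machine has processors):
using System;
using System.Threading;

class QueueDemo
{
    static void Main()
    {
        ThreadPool.SetMaxThreads(4, 4); // artificially small pool for the demo
        for (int i = 0; i < 20; i++)
        {
            int id = i;
            ThreadPool.QueueUserWorkItem(_ =>
            {
                Console.WriteLine("item {0} running", id);
                Thread.Sleep(500); // the remaining items simply wait in the queue
            });
        }
        Console.ReadLine(); // keep the process alive while the queue drains
    }
}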
The ThreadPool's default maximum is 250 threads per processor (as of .NET 2.0 SP1), not 25! You can still set a higher limit with ThreadPool.SetMaxThreads if you need to.
If your ThreadPool runs out of threads, the extra operations are queued until a thread becomes available. As running work items finish, their threads are returned to the pool and pick up the queued work, providing you with new resources.
However, you can also create your own threads without using the ThreadPool.