I am coding an application that runs many threads in the background which have to report back to the main thread so it can update a table in the interface. In the past, the worker threads were ordinary separate classes (named Citizen) which I have ran from the main thread using something like
new Thread(new ThreadStart(citizen.ProcessActions)).Start();
where ProcessActions function was the main function which did all the background work. Before actually starting the thread, I would register event handlers so the Citizen threads could log/report some stuff to the interface. Usually, there are tens of these Citizen threads (around 50) and they're pretty big classes - each has it's own HTTP client and it browses the web.
Is this a good way to do manage threads? Probably not, to be frank; I'm pretty sure the threads aren't gracefully exiting - once the ProcessActions function gets done, I remove the event handlers and that's it - the memory usage keeps rising with each new Citizen started.
What would be the best way to manage many (50+) threads, with which you have to communicate often? I believe I wouldn't have to worry much about thread safety for Citizen variables as I wouldn't be accessing them from other threads but it's own thread.
I think what you're looking for is a thread pool. Here's an MSDN article on them and that should be available in C# 4.0.
The idea would be to create a thread pool, set its count to some high number(say 50), and then start assigning threads to tasks. If the pool needs to expand, it can, but by declaring a high number up front, you get all the expensive creation of threads out of the way.
It might be beneficial to 'queue' tasks that you want to get done, and assign those tasks as threads become available.
Also, memory leaks can be hard to find, but I would start by testing the simple case: Take out all threads(just run one Citizen after another from the main thread) and let it run for a long time. If it's still leaking memory, your thread management isn't the issue.
Related
Scenario
I have a Windows Forms Application. Inside the main form there is a loop that iterates around 3000 times, Creating a new instance of a class on a new thread to perform some calculations. Bearing in mind that this setup uses a Thread Pool, the UI does stay responsive when there are only around 100 iterations of this loop (100 Assets to process). But as soon as this number begins to increase heavily, the UI locks up into eggtimer mode and the thus the log that is writing out to the listbox on the form becomes unreadable.
Question
Am I right in thinking that the best way around this is to use a Background Worker?
And is the UI locking up because even though I'm using lots of different threads (for speed), the UI itself is not on its own separate thread?
Suggested Implementations greatly appreciated.
EDIT!!
So lets say that instead of just firing off and queuing up 3000 assets to process, I decide to do them in batches of 100. How would I go about doing this efficiently? I made an attempt earlier at adding "Thread.Sleep(5000);" after every batch of 100 were fired off, but the whole thing seemed to crap out....
If you are creating 3000 separate threads, you are pushing a documented limitation of the ThreadPool class:
If an application is subject to bursts
of activity in which large numbers of
thread pool tasks are queued, use the
SetMinThreads method to increase the
minimum number of idle threads.
Otherwise, the built-in delay in
creating new idle threads could cause
a bottleneck.
See that MSDN topic for suggestions to configure the thread pool for your situation.
If your work is CPU intensive, having that many separate threads will cause more overhead than it's worth. However, if it's very IO intensive, having a large number of threads may help things somewhat.
.NET 4 introduces outstanding support for parallel programming. If that is an option for you, I suggest you have a look at that.
More threads does not equal top speed. In fact too many threads equals less speed. If your task is simply CPU related you should only be using as many threads as you have cores otherwise you're wasting resources.
With 3,000 iterations and your form thread attempting to create a thread each time what's probably happening is you are maxing out the thread pool and the form is hanging because it needs to wait for a prior thread to complete before it can allocate a new one.
Apparently ThreadPool doesn't work this way. I have never checked it with threads before so I am not sure. Another possibility is that the tasks begin flooding the UI thread with invocations at which point it will give up on the GUI.
It's difficult to tell without seeing code - but, based on what you're describing, there is one suspect.
You mentioned that you have this running on the ThreadPool now. Switching to a BackgroundWorker won't change anything, dramatically, since it also uses the ThreadPool to execute. (BackgroundWorker just simplifies the invoke calls...)
That being said, I suspect the problem is your notifications back to the UI thread for your ListBox. If you're invoking too frequently, your UI may become unresponsive while it tries to "catch up". This can happen if you're feeding too much status info back to the UI thread via Control.Invoke.
Otherwise, make sure that ALL of your work is being done on the ThreadPool, and you're not blocking on the UI thread, and it should work.
If every thread logs something to your ui, every written log line must invoke the main thread. Better to cache the log-output and update the gui only every 100 iterations or something like that.
Since I haven't seen your code so this is just a lot of conjecture with some highly hopefully educated guessing.
All a threadpool does is queue up your requests and then fire new threads off as others complete their work. Now 3000 threads doesn't sounds like a lot but if there's a ton of processing going on you could be destroying your CPU.
I'm not convinced a background worker would help out since you will end up re-creating a manager to handle all the pooling the threadpool gives you. I think more you issue is you've got too much data chunking going on. I think a good place to start would be to throttle the amount of threads you start and maintain. The threadpool manager easily allows you to do this. Find a balance that allows you to process data while still keeping the UI responsive.
I'm trying to fix memory spikes in a very large application. While I'm not sure how much of an effect this would have on memory, I noticed the following:
Application uses a custom thread pool to do all expensive tasks
Application will execute all incoming tasks
Tasks can be composed of thousands of sub tasks
While the thread pool will only execute {T} tasks at a time, and finishes a task completely before starting a new one, it does create a new system thread (Thread class) and start it for every sub task added to it
The sub task system threads are started with a thread start that instantly blocks on a manual reset event (MRE) pending a thread pool slot freeing up
So, this thread pool can create thousands of threads, but all but 30 (or whatever you configure) will be blocked on an MRE while other tasks complete.
My Question:
What impact on the memory/processor will a thousand threads blocked on MREs have? I don't have a lot of time to fix this spike, so if it's minimal, I'd rather leave the issue and work to fix it in a later patch when I have more time.
Also, is this behavior typical in thread pools, or does this sound flawed (I'm leaning towards flawed, but I don't have a good enough background to be sure).
What impact on the memory/processor will a thousand threads blocked on MREs have? I don't have a lot of time to fix this spike, so if it's minimal, I'd rather leave the issue and work to fix it in a later patch when I have more time.
Each thread, when created manually, has it's own stack allocated to it. By default, this will be 1MB per thread, though it is possible to make threads with a smaller stack via a constructor parameter.
You'd be much better off redesigning this, from the description of your problem, to use the standard ThreadPool, and a class like BlockingCollection<T> to handle your throttling. This is designed to directly allow bounding input, with blocking. Making a custom "thread pool" of an infinite number of threads is going to be far less efficient than using the highly tuned ThreadPool included with the framework.
Also, is this behavior typical in thread pools, or does this sound flawed (I'm leaning towards flawed, but I don't have a good enough background to be sure).
This is definitely flawed. The entire point of a ThreadPool is to avoid making a thread per request, and "pool" the threads (reusing them) for multiple requests without having to recreate them.
I have a thread that I fire off every time the user scans a barcode.
Most of the time it is a fairly short running thread. But sometimes it can take a very long time (waiting on a invoke to the GUI thread).
I have read that it may be a good idea to use the ThreadPool for this rather than just creating my own thread for each scan.
But I have also read that if the ThreadPool runs out of threads then it will just wait until some other thread exits (not OK for what I am doing).
So, how likely is it that I am going to run out of threads? And is the benefit of the ThreadPool really worth it? (When I scan it does not seem to take too long for the scan to "run" the thread logic.)
It depends on what you mean by "a very long time" and how common that scenario is.
The MSDN topic "The Managed Thread Pool" offers good guidelines for when not to use thread pool threads:
There are several scenarios in which it is appropriate to create and manage your own threads instead of using thread pool threads:
You require a foreground thread.
You require a thread to have a particular priority.
You have tasks that cause the thread to block for long periods of time. The
thread pool has a maximum number of
threads, so a large number of blocked
thread pool threads might prevent
tasks from starting.
You need to place threads into a single-threaded apartment. All
ThreadPool threads are in the
multithreaded apartment.
You need to have a stable identity associated with the thread, or to
dedicate a thread to a task.
Since the user will never scan more than one barcode at a time, the memory costs of the threadpool might not be worth it - I'd stick with a single thread just waiting in the background.
The point of the thread pool is to amortize the cost of creating threads, which are not inexpensive to spin up and tear down. If you have a short-running task, the cost of creating/destroying the thread can be a significant portion of the overall run-time. The maximum number of threads in the thread pool depends on the version of the .NET Framework, typically dozens to hundreds per processor. The number of threads is scaled depending on available work.
Will you run out of threads and have to wait for a thread to become available? It depends on your workload. You can get the maximum number of threads available via ThreadPool.GetMaxThreads(). Chances are (based on the description of your problem) that this number is sufficiently high.
http://msdn.microsoft.com/en-us/library/system.threading.threadpool.getmaxthreads.aspx
Another option would be to manage your own pool of scan threads and assign them work rather than creating a new thread for every scan. Personally I would try the threadpool first and only manage your own threads if it proved necessary. Even better, I would look into async programming techniques in .NET. The methods will be run on the thread pool, but give you a much nicer programming experience than manual thread management.
If most of the time it is short running threads you could use the thread pool or a BackgroundWorker which draws threads from the pool.
An advantage I can see in your case is that threadpool class puts an upper limit on the amount of threads that may be active. It depends on the context of your application whether you will exhaust system resources. Exhausting a modern desktop system is VERY hard to do really.
If the software is used in a supermarket till it is highly unlikely that you will have more then 5 barcodes being analysed at the same time. If its run in a back-end server for a whole row of supermarket tills. Then perhaps 30-100 concurrent requests might be active.
With this sort of theory crafting it is highly unlikely that you will run out of threads, even on embedded hardware. If you have a dozen or so requests active at a time, and your code works, it's ok to just leave it as it is.
A thread pool is just an abstraction though, and you could have queue in the middle that queues request onto a thread-pool, in this scenario for the row-of-till example above, I'd feel comfortable queueing 100-1000 requests against a threadpool with 10 threads.
In .net (and on windows in general), the question should always be reversed: "Is creating a new thread worth it in this scenario?"
Creating a new thread is expensive, and doing it over and over again is almost certainly not worth it. The thread pool is cheap, and really should be the first thing you turn to when you need a new thread.
If you decide to spin up a new thread, soon you will start worrying about re-using the thread if it's already running. Then you will start worrying that sometimes the thread is running but it seems to be taking too long, and so you should make a new one. Then you're going to decide to have a thread not exit immediately upon finishing work, but to wait a little while in case new work comes in. And then... bam! You've created your own thread pool. At which point you should just back up and use the system-provided one.
The folks who mentioned that the thread pool might "run out of threads" were well-intentioned, but they did you a disservice. The limit on the number of threads in the thread pool is quite large. If you run into it, you have other problems.
(And, of course, since .net 2.0, you can set the maximum number of threads, so you can tweak the number if you absolutely have to.)
Others have directed you to MSDN: "The Managed Thread Pool". I will repeat that direction, as the article is good, but in my mind does not sell the thread pool hard enough. :)
Main execution path (main thread) is going to be forked into two execution paths (two new threads on different jobs) but the main thread is no longer needed. I can assign one of the tasks to main thread and save one thread (one task by main thread and another by a new thread) but I was wondering putting main thread in an infinite sleep Thread.Sleep(Timeout.Infinite) is a good approach or not. My class is going to be instantiated many times and if a thread in infinite sleep takes resource from OS it's bad news for me.
Each thread you create takes up stack space. On Windows, that's 1MB by default. There are also other internal house-keeping data structures that the operating system uses to keep track of threads which will take up a bit of memory as well, but the 1MB stack is definitely going to be the biggest consumer of resources.
Having said that, if we're only talking about 2 vs. 3 threads, then the difference is quite small. If it was 200 vs. 300 then you might have something to worry about. But if you're spawning a lot of threads, you'd be better off using some kind of thread pool (like, say, the one built-in to the .NET framework) rather than spawning individual threads anyway.
All threads tie up resources, regardless of if they're sleeping or not.
When using APIs handling asynchronous events in .Net I find myself unable to predict how the library will scale for large numbers of objects.
For example, using the Microsoft.Office.Interop.UccApi library, when I create an endpoint it gets events when phone events happen. Now let's say I want to create 1000 endpoints. The number of events per endpoint is small, but is what's happening behind the scenes in the API able to keep up with the event flow? I don't know because it never says how it's architected.
Let's say I want to create all 1000 objects in the main thread. Then I want to put the Login method into a large thread pool so all objects login in parallel. Then once all the objects have logged in the next phase will begin.
Are the event callbacks the API raises happening in the original creating thread? A separate threadpool? Or the same threadpool I'm accessing with ThreadPool.QueueUserWorkItem?
Would I be better putting each object in it's own thread? Grouping a few objects in each thread? Or is it fine just creating all 1000 objects in the main thread and through .Net magic it will all be OK?
thanx
The events from interop assemblies are just wrappers around the COM connection points. The thread on which the call from the connection point arrive depends on the threading model of the object that advised on that connection point. COM will ensure the proper thread switching for this.
If your objects are implemented on the main thread, which in .Net is usually an STA, all events should arrive on that same thread. If you want your calls to arrive on a random thread from the COM thread pool (which I think is the same as the CLR thread pool), you need to create your objects on a thread that is configured as an MTA.
I would strongly advise against creating a thread for each object: 1) If you create these threads as STA, each of them will have a message queue, waisting system resource; 2) If you create them as MTA, nothing guarantees you the event call will arrive on your thread; 3) You'll have 1000 idle threads doing nothing and just waiting on an event to shutdown; and 4) Starting up and shutting down all these threads will have terrible perf cost on your application.
It really depends on a lot of things, primarily how powerful your hardware is. The threadpool does have a certain number of threads (which you can increase) that it will make available for your application. So if all of your events are firing at the same time some will most likely be waiting for a few moments while your threadpool waits for threads to become free again. The tradeoff is that you don't have the performance hit of creating new threads all the time either. Probably creating 1000 threads isn't the right answer either.
It may turn out that this is ideal, both because of the performance gains in reusing threads but also because having 1000 threads all running simultaneously might be more memory / CPU usage than it's worth.
I just wanted to note that in .NET 2.0 and greater it's possible to programmatically increase the maximum number of threads in the thread pool using ThreadPool.SetMaxThreads(). Given this you can put a hard cap on the number of threads and so ensure the scheduler won't be brought to it's knees by the overhead.
Even more useful in this sort of case, you can set the minimum number of threads with ThreadPool.SetMinThreads(). With this you can ensure that you only pay the "horrible performance price" Franci is talking about once, at application startup. You could balance this against the expected number peak of users and so ensure you won't be creating tons of new threads.
A single new thread creation won't destroy you. What I would be worried about is the case where a lot of threads need to be created at the same time. If you can say that this will only happen at startup you would be golden.