I have a process that looks at a database table, picks up records, and sends emails. At different times of the day/month this process can get pretty backed up, and currently we have 30 instances of a Windows service running to keep up with demand.
We tried creating a single instance and spinning up 6 long-running TPL tasks per instance, but this is static and didn't scale well.
What I would like to be able to do is look at the table to be processed, count the number of requests, and add threads to a pool up to a specified cap, say NumProcessors * 10. When the demand goes back down, pull these threads back out of the pool, because each thread hits the DB every 2 seconds, and I would much rather have 6 threads doing that per instance than 60.
Adding threads is pretty easy, but I'm having a hard time thinking of a way to gracefully pull threads out of the pool as demand goes down.
One way to do this would be to have a single thread that reads from the database and sends the requests to processing threads (possibly using something like ConcurrentQueue).
This way, you always hit the database only once every 2 seconds (or whatever), but you can also have many threads that actually do the work that takes a long time (sending emails).
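A minimal sketch of that layout, assuming hypothetical FetchPendingEmailIds and SendEmail helpers; BlockingCollection<T> (which wraps a ConcurrentQueue<T>) gives the workers blocking semantics, so only the poller ever touches the database:

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class EmailDispatcher
{
    // BlockingCollection wraps a ConcurrentQueue and lets workers block until work arrives.
    private readonly BlockingCollection<int> _queue = new BlockingCollection<int>();

    // Hypothetical stand-in for the real table read; returns ids of pending rows.
    private IEnumerable<int> FetchPendingEmailIds() => Enumerable.Empty<int>();

    // Hypothetical stand-in for the real SMTP work.
    private void SendEmail(int id) { }

    public void Run(int workerCount)
    {
        // One poller: the only code that hits the database, once every 2 seconds.
        Task.Run(() =>
        {
            while (true)
            {
                foreach (int id in FetchPendingEmailIds())
                    _queue.Add(id);
                Thread.Sleep(TimeSpan.FromSeconds(2));
            }
        });

        // Many workers: they block on the queue instead of polling the database.
        for (int i = 0; i < workerCount; i++)
        {
            Task.Run(() =>
            {
                foreach (int id in _queue.GetConsumingEnumerable())
                    SendEmail(id);
            });
        }
    }
}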
Behind the scenes, Task uses the ThreadPool: http://msdn.microsoft.com/en-us/library/system.threading.threadpool.aspx
You can control the max/min number of threads in the pool using the SetMaxThreads and SetMinThreads functions.
So during peak time you can set max threads to NumProcessors * 10, and then when demand goes back down, restore it to its previous value.
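A sketch of that sequence (the cap is a placeholder; note that SetMaxThreads returns false and changes nothing if the requested value is out of range, e.g. below the current minimum):

// Remember the defaults so they can be restored when demand drops.
ThreadPool.GetMaxThreads(out int defaultWorkers, out int defaultIo);

// Peak time: raise the worker-thread ceiling.
ThreadPool.SetMaxThreads(Environment.ProcessorCount * 10, defaultIo);

// Off-peak: restore the previous ceiling.
ThreadPool.SetMaxThreads(defaultWorkers, defaultIo);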
I am trying to figure out exactly what impact ThreadPool.SetMinThreads makes.
According to the official documentation, it says:
Sets the minimum number of threads the thread pool creates on demand, as new requests are made, before switching to an algorithm for managing thread creation and destruction.
In my understanding, as a developer, I'm supposed to have control over the mechanism for spinning up new threads on demand, so that they are created and waiting in an idle state for situations when, for example, I'm expecting a load of requests coming at a specific time.
And this is exactly what I initially thought the SetMinThreads method was designed for.
But when I started actually playing with it, I got really weird results.
So I have my ASP.NET (.NET 5) application, and in a controller action I have code like this:
ThreadPool.SetMinThreads(32000, 1000);
And of course I'm intuitively expecting the runtime to create 32K worker threads and 1,000 I/O threads for me.
But when I do that, and then call another method - Process.GetCurrentProcess().Threads - to get all the process's threads and print statistics on them, I get something like this:
Standby - 17
Running - 4
I thought that maybe the app needs some time to spin up new threads, so I tried different delays: 1 min, 5 min, and 10 min.
But the result always stays the same: I get 15-20 Standby and 2-4 Running.
So then comes the logical question - what exactly is the SetMinThreads method doing? The description provided by MSDN does not seem very helpful.
And another logical question - what if I wanted to force dotnet to spin up 32K new threads in an idle state? Does dotnet provide any mechanism for that at all?
The ThreadPool.SetMinThreads sets the minimum number of threads that the ThreadPool creates instantly on demand. That's the key phrase, and it is indeed quite unintuitive. The ThreadPool currently¹ (.NET 5) works in two modes:
When a new request for work arrives, and all the threads in the pool are busy, create instantly a new thread in order to satisfy the request.
When a new request for work arrives, and all the threads in the pool are busy, queue the request, and wait for 1 sec before creating a new thread, hoping that in the meantime one of the worker threads will complete its current work, and will become available for serving the queued request.
The ThreadPool.SetMinThreads sets the threshold between these two modes. It does not give you control over the number of threads that are alive right now. Which is not very satisfying, but it is what it is. If you want to force the ThreadPool to create 1,000 threads instantly, you must also send an equal number of requests for work, in addition to calling ThreadPool.SetMinThreads(1000, 1000). Something like this should do the trick:
// Raise the "create threads instantly" threshold first...
ThreadPool.SetMinThreads(1000, 1000);
// ...then flood the pool with enough work items to force the threads into existence.
Task[] tasks = Enumerable.Range(0, 1000)
    .Select(_ => Task.Run(() => Thread.Sleep(100)))
    .ToArray();
Honestly I don't think that anyone does that. Creating a new Thread is quite fast in human time (it requires around 0.25 milliseconds per thread on my PC), so for a system that receives requests from humans, the overhead of creating a thread shouldn't have any measurable impact. On the other hand, 0.25 msec is an eon in computer time, when you want a thread to do a tiny amount of work (in the range of nanoseconds), like adding something to a List<T>. That's why the ThreadPool was invented in the first place: to amortize the overhead of thread creation for tiny but numerous workloads.
Be aware that creating a new Thread also has a memory cost, which is generally more significant than the time cost: each thread requires at least 1 MB of RAM for its stack. So creating 32,000 threads would tie down 32 GB of memory just for stack space, which is not very efficient. That's why in recent years asynchronous programming has become so prominent in server-side web development: it allows you to do more work with fewer threads.
¹ There is nothing preventing the Microsoft engineers from changing/tweaking the implementation of the ThreadPool in the future. AFAIK this has already happened at least once in the past.
I'm new to threading, so please have some patience.
I have tens of thousands of rows in a database. Each row represents a job that needs to be done over the internet. I read a data row, do some network-related work (which can take anywhere from a couple of seconds to a couple of minutes), and grab the next data row (my C# application uses a console, not a GUI). As you might expect, I want to do these jobs concurrently.
I looked into this subject and thought I would use background threads, but if I understand correctly, people suggest there is no point in using them in a console application.
I assume I should not use Tasks, because each of my "tasks" will be represented by a single thread.
So I thought I would use ThreadPool with regular Threads.
To make things simple I just want to keep a constant number of threads (spawning a new one when one finishes) until I run out of things to do (then I wait for data - usually a lot of it - to arrive in the database and spawn more threads). I need to know when a thread ends, because I have to spawn a new one and update the database row containing the data it was working with. To keep the threads and the database in sync, I would probably have to mark the database row with some kind of thread id when it is retrieved, and then mark the row (success/fail) when the thread ends.
Is this solution (try/catch in the thread delegate) enough to be sure that a thread has ended (and whether it succeeded or threw an exception)?
I am not sure how to "wait" for the first thread to end - not for all of them, and not for a particular one.
I also think that I don't want to read too much data in advance (and potentially wait for a thread to free up) because there might be other programs doing the same thing using the same database.
Any ideas appreciated!
Just use Parallel.ForEach to do this:
Parallel.ForEach(rows, row => ProcessRow(row));
If you need to specify a max degree of parallelism because the automatic partitioner happens to be using too many thread pool threads, then you can specify it like so:
Parallel.ForEach(rows,
    new ParallelOptions { MaxDegreeOfParallelism = 5 },
    row => ProcessRow(row));
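If you also need the per-row success/fail bookkeeping from the question, the delegate is the natural place for it; a sketch, where DoNetworkWork, MarkSucceeded, and MarkFailed are hypothetical helpers that do the network job and update the row's status column:

void ProcessRow(DataRow row)
{
    try
    {
        DoNetworkWork(row);    // the long-running network job for this row (hypothetical)
        MarkSucceeded(row);    // hypothetical: write success back to the database
    }
    catch (Exception ex)
    {
        MarkFailed(row, ex);   // hypothetical: record the failure so the row can be retried
    }
}

Catching inside the delegate also keeps one bad row from stopping the rest: an exception that escapes the body surfaces as an AggregateException and halts the Parallel.ForEach.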
I have a console application (C#) where I have to call various third-party APIs and collect data. I have to do this simultaneously for different users. I am using threads for it, but as the number of users increases, this service is eating into CPU performance and affecting other processes. Is there a way to use threads for parallel processing without affecting CPU performance in a huge way?
I assume from your question that you're creating threads manually, and so the quick way to answer this is to suggest that you use an API like the Task Parallel Library, because this will take an arbitrary number of tasks and try to use a sensible number of threads to process them - so given 500 API requests, it would limit itself to just a few threads.
However, to answer in more detail: the typical reason that you would see this problem is that code is creating too many threads. Threads are not free resources - they are expensive.
A made up example based on your question might be this:
you have 5 3rd party APIs that you need to call, and each is going to return ~1MB of data per user
you call each API on a separate background thread, for each user
you have 100 users
you therefore have created 500 threads in total, each of which is waiting on data from the network
The problem here is that there are 500 threads the program is trying to manage, and they are all waiting on the slowest piece of the system - the network.
More simply, we are trying to download 500 pieces of data at once (which in this example would mean everything finishes slowly), rather than downloading them one at a time so that individual items finish earlier. Because each thread is doing nothing but waiting for the network, the CPU will switch between idle threads continually. As you increase your number of users, the number of threads increases, which increases the CPU usage spent just on switching between threads, even though each thread is actually downloading more slowly. This is (approximately) why you'll see slower performance as your user count goes up.
A better example would be to take the same scenario and use just one background thread:
you have 5 3rd party APIs that you need to call, and each is going to return ~1MB of data per user
each API call is put into a queue and the queue is processed by a single thread
you have 100 users
you therefore have 1 thread running in the background which is using the full available bandwidth of the network for each request
In this example, your CPU usage will be pretty consistent - no matter how many users you have, there is only one background thread running, so context switching is minimised. Each individual API call runs at the maximum rate of the network card and so finishes as quickly as possible.
The reality is that one thread is probably not enough: a single request is unlikely to saturate the network, as there will be limiting factors elsewhere. But this is something you can tune later: maybe 2 or 3 threads would be more performant, but 4 threads would be slower again. The general rule when threading is to start small and work up, not to create a thread for each piece of work.
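A sketch of that queue-based layout, using BlockingCollection so the consumer count is one tunable number (ApiCall, users, apis, and ExecuteCall are all hypothetical names):

var queue = new BlockingCollection<ApiCall>();

// Producer: one work item per user per API, then mark the queue complete.
foreach (var user in users)
    foreach (var api in apis)
        queue.Add(new ApiCall(user, api));
queue.CompleteAdding();

// A small, fixed set of consumers instead of one thread per request.
const int consumerCount = 2;   // start small and tune upward
Task[] consumers = Enumerable.Range(0, consumerCount)
    .Select(_ => Task.Run(() =>
    {
        foreach (var call in queue.GetConsumingEnumerable())
            ExecuteCall(call);   // the actual network call (hypothetical)
    }))
    .ToArray();
Task.WaitAll(consumers);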
First, run a profiler and check out some refactoring tools to see if you can perform code optimization to resolve the issue. If your application is still overloading the server, then set up or purchase load balancing. In the meantime, if you are running the latest OSes, you could try setting a hacky CPU rate limit... however, that may not work for the needs you described.
I have a C# Windows Service that starts up various objects (Class libraries). Each of these objects has its own "processing" logic that start up multiple long running processing threads by using the ThreadPool. I have one example, just like this:
System.Threading.ThreadPool.QueueUserWorkItem(new System.Threading.WaitCallback(WorkerThread_Processing));
This works great. My app works with no issues, and my threads work well.
Now, for regression testing, I am starting those same objects up, but from a C# console app rather than a Windows service. It calls the exact same code (because it is invoking the same objects); however, the WorkerThread_Processing method delays for up to 20 seconds before starting.
I have gone in and switched from the ThreadPool to a Thread, and the issue goes away. What could be happening here? I know that I am not over the MaxThreads count (I am starting 20 threads max).
The ThreadPool is specifically not intended for long-running items (more specifically, you aren't even necessarily starting up new threads when you use the ThreadPool, as its purpose is to spread the tasks over a limited number of threads).
If your task is long running, you should either break it up into logical sections that are put on the ThreadPool (or use the new Task framework), or spin up your own Thread object.
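Either of these keeps long-running work off the pool's shared threads (a sketch, reusing the WorkerThread_Processing callback from the question):

// Option 1: a dedicated thread that the pool never manages.
var worker = new Thread(WorkerThread_Processing) { IsBackground = true };
worker.Start();

// Option 2: hint that the task is long-running; the default scheduler
// then typically gives it its own thread rather than a pool thread.
Task.Factory.StartNew(() => WorkerThread_Processing(null),
    TaskCreationOptions.LongRunning);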
As to why you're experiencing the delay, the MSDN Documentation for the ThreadPool class says the following:
As part of its thread management strategy, the thread pool delays before creating threads. Therefore, when a number of tasks are queued in a short period of time, there can be a significant delay before all the tasks are started.
You only know that the ThreadPool hasn't reached its maximum thread count, not how many threads (if any) it actually has sitting idle.
The thread pool's maximum number of threads value is the maximum number that it can create. It is not the maximum number that are already created. The thread pool has logic that prevents it from spinning up a whole bunch of threads instantly.
If you call ThreadPool.QueueUserWorkItem 10 times in quick succession, the thread pool will not create 10 threads immediately. It will start a thread, delay, start another, etc.
I seem to recall that the delay was 500 milliseconds, but I can't find the documentation to verify that.
Here it is: The Managed Thread Pool:
The thread pool has a built-in delay (half a second in the .NET Framework version 2.0) before starting new idle threads. If your application periodically starts many tasks in a short time, a small increase in the number of idle threads can produce a significant increase in throughput. Setting the number of idle threads too high consumes system resources needlessly.
You can control the number of idle threads maintained by the thread pool by using the GetMinThreads and SetMinThreads methods.
Note that this quote is taken from the .NET 3.5 version of the documentation. The .NET 4.0 version does not mention a delay.
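Given that, a sketch of sidestepping the ramp-up delay in the console run by raising the worker minimum before queuing the work (20 matches the thread count mentioned in the question):

// Raise only the worker minimum; keep the I/O minimum unchanged.
ThreadPool.GetMinThreads(out int minWorkers, out int minIo);
ThreadPool.SetMinThreads(20, minIo);

// Threads up to the minimum are created without the built-in delay.
for (int i = 0; i < 20; i++)
    ThreadPool.QueueUserWorkItem(WorkerThread_Processing);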
I've got a program I'm creating (in C#) and I see two approaches:
1) A job manager that waits for any of X threads to finish; when one finishes, it gets the next chunk of work, creates a new thread, and gives it that chunk.
or
2) We create X threads to start, give them each a chunk of work, and when a thread finishes a chunk it asks the job manager for more work. If there isn't any more work, it sleeps and then asks again, with the sleep becoming progressively longer.
This program will be run-once-and-done, though I could see it turning into a service that continually looks for more jobs.
Each chunk will consist of a number of data ids, a call to the database to get some info or perform an operation on each data id, and then writing info about the data id back to the database.
Assuming you are aware of the additional precautions that need to be taken when dealing with multithreaded database operations, it sounds like you're describing two different scenarios. In the first, you have several threads running, and once ALL of them finish it looks for new work. In the second, you have several threads running and their operations are completely parallel. Your environment is going to determine the proper approach: if something ties together all of the work in the several threads, such that additional work cannot continue until all of them are finished, then go with the former. If they don't have much effect on each other, go with the latter.
The second option isn't really right, as making the sleep time progressively longer means that you will unnecessarily keep those threads blocked.
Rather, you should have a pooled set of threads like the second option, but they use WaitHandles to wait for work and use a producer/consumer pattern. Basically, when the producer indicates that there is work, it sends a signal to a consumer (there will be a manager which will determine which thread will get the work, and then signal that thread) which will wake up and start working.
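A sketch of that signaling pattern with a WaitHandle (an AutoResetEvent here; WorkChunk and Process are hypothetical); in practice a SemaphoreSlim or BlockingCollection does the same bookkeeping for you:

var queue = new ConcurrentQueue<WorkChunk>();
var workAvailable = new AutoResetEvent(false);

// Producer: enqueue and signal, instead of letting consumers poll and sleep.
void Produce(WorkChunk chunk)
{
    queue.Enqueue(chunk);
    workAvailable.Set();   // wakes one waiting consumer
}

// Consumer: blocks until signaled, so no progressively longer sleeps.
void ConsumeLoop()
{
    while (true)
    {
        workAvailable.WaitOne();
        while (queue.TryDequeue(out WorkChunk chunk))
            Process(chunk);
    }
}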
You might want to look into the Task Parallel Library. It's in beta now, but if you can use it and are comfortable with it, I would recommend it, as it will manage a great deal of this for you (and much better, taking into account the number of cores on a machine, the optimal number of threads, and so on).
The former solution (spawn a thread for each new piece of work) is easier to code, and not too bad if the units of work are large enough.
The second solution (a thread pool with a queue of work) is more complicated to code, but supports smaller units of work.
Instead of rolling your own solution, you should look at the ThreadPool class in the .NET framework. You could use the QueueUserWorkItem method. It should do exactly what you want to accomplish.
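A minimal sketch of that approach (chunks, WorkChunk, and ProcessChunk are hypothetical):

foreach (var chunk in chunks)
{
    // Each chunk runs on a pool thread; the chunk is passed as the state object.
    ThreadPool.QueueUserWorkItem(state => ProcessChunk((WorkChunk)state), chunk);
}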