Multiple BackgroundWorkers do not start work immediately - c#

I'm trying to use BackgroundWorker to load-test a web service.
The code is:
private void button1_Click(object sender, EventArgs e)
{
CheckWS("11111");
CheckWS("22222");
CheckWS("3344111");
// .. and some more of those, approx 1000
}
public void CheckWS(string sUser)
{
BackgroundWorker bgWork = new BackgroundWorker();
bgWork.DoWork += new DoWorkEventHandler(bgWork_DoWork);
sObject sData = new sObject();
sData.iNum = iCount++;
sData.sId = sUser;
bgWork.RunWorkerAsync(sData);
}
public void bgWork_DoWork(object sender, DoWorkEventArgs e)
{
// Update a textbox that this worker started
...
// Run a web service
...
// Update the textbox as worker ended
}
But judging by the time it takes for the job to get done, and by the workers' output (worker begin and end messages in the textbox), each worker takes far too long to start.
I would expect everything to run simultaneously, but instead only 2-3 background workers execute at a time.
Any ideas on this issue?

BackgroundWorkers run on top of the ThreadPool, and the pool purposely limits the number of threads.
Since this is a test (driver) app, there is no problem in changing the pool's configuration. You can set the minimum number of threads to something sensible:
ThreadPool.SetMinThreads(20, 20);

Giving some background to @Henk Holterman's correct answer, which states that the BackgroundWorker class uses the thread pool:
The thread pool starts out with one thread in its pool, and the pool manager 'injects' new threads to cope with extra asynchronous workload, up to some limiting maximum. After a set period of inactivity the pool manager may 'retire' threads if it 'thinks' that doing so will lead to better throughput. In your case above, the pool manager is limiting the number of concurrent threads.
You can set the upper limit on the number of threads the pool will create by calling ThreadPool.SetMaxThreads. The defaults are:
1023 in Framework 4.0 in a 32-bit environment.
32768 in Framework 4.0 in a 64-bit environment.
250 per core in Framework 3.5.
25 per core in Framework 2.0.
[These figures may vary according to hardware and OS.] The reason for these large numbers (at least in the case of .NET 4.0) is to ensure that progress is made even when some threads are blocked (running some intense work etc.).
You can set the lower limit via ThreadPool.SetMinThreads. The role of this limit is more subtle than that of the maximum: it instructs the pool manager not to delay the creation of threads until this lower limit has been reached. Setting this number, as @Henk quite rightly pointed out, will improve your concurrency when there are blocked threads.
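As a rough sketch of the two settings discussed above, the helper below reads the current limits and raises only the worker-thread minimum. The names PoolConfig and RaiseMinWorkerThreads are made up for illustration, and 20 is simply the value suggested in the earlier answer, not a recommendation.

```csharp
using System;
using System.Threading;

public static class PoolConfig
{
    // Hypothetical helper: raise the worker-thread minimum to `workers`
    // (never above the current maximum), leaving the I/O minimum untouched.
    public static bool RaiseMinWorkerThreads(int workers)
    {
        int minWorker, minIo, maxWorker, maxIo;
        ThreadPool.GetMinThreads(out minWorker, out minIo);
        ThreadPool.GetMaxThreads(out maxWorker, out maxIo);
        Console.WriteLine("min {0}/{1}, max {2}/{3}", minWorker, minIo, maxWorker, maxIo);

        // SetMinThreads returns false if the value exceeds the maximum, so clamp it.
        int target = Math.Min(Math.Max(workers, minWorker), maxWorker);
        return ThreadPool.SetMinThreads(target, minIo);
    }
}
```

Calling PoolConfig.RaiseMinWorkerThreads(20) once, before the first RunWorkerAsync, should let roughly the first 20 workers start without waiting for the pool manager to inject threads.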
I hope this helps.

Related

Executing multiple threads

I am developing Windows Form C# program which reads Excel data from shared drive every 20 minutes (I'm using "Timer") - function "inserting". I want to read multiple Excel files at once because of the performance. For that reason I'm using threads.
Each thread is calling a function (LoadExcelData) which reads data from Excel to ArrayList. I want to know when all threads are finished (when all excel files were loaded to ArrayList) in order to insert this ArrayList to internal database.
I tried with thread[i].Join() but this freezes GUI. I also do not know what would happen if I have 100+ files and for this reason 100+ threads. Would that cause memory exception or some other exception?
// Execute every 20 minutes (Timer). Do not execute if the previous run has not finished
void inserting(List<String> excels){
int numOfThreads=excels.Count;
Thread[] threads = new Thread[numOfThreads];
for (int index = 0; index < numOfThreads; index++)
{
int i = index;
threads[index] = new Thread(() =>
{
LoadExcelData(excels[i].File_name); //function loads excel data to global array "Weather" which is used later on
});
}
for (int i = 0; i < threads.Length; i++)
{
threads[i].Start(); //start thread
}
for (int i = 0; i < threads.Length; i++)
{
// threads[i].Join(); //this freezes GUI!
}
InsertToDB(object of ArrayList<ClassName>); //insert data which was read from Excels
isRunning=false;//Data was successfully inserted to DB
}
I want to run this every 20 minutes. I'm using Timer:
timer = new System.Windows.Forms.Timer();
timer.Tick += new EventHandler(timerEventHanlder);
timer.Interval = 20 * 60000; // in milliseconds
timer.Start();
private void timerEventHanlder(object sender, EventArgs e)
{
List<String> excels = getExcels();
if (!isRunning){ //in case the previous timer event is not finished, wait another 20 minutes...
isRunning=true; //flag to true
inserting(excels);
}
}
Is there any better way to solve the above problem?
The UI thread is freezing because you're using a System.Windows.Forms.Timer which fires the timer ticked event on the UI thread; this is useful in that you don't have to Invoke anything on the tick event. Calling Join blocks the calling thread and in your case this is the UI thread.
To avoid this (and since you're not needing to Invoke any UI elements), you can change your System.Windows.Forms.Timer to a System.Timers.Timer, which runs in a thread separate from the UI thread. If you switch to a System.Timers.Timer, you'll need to change some of the syntax in your code (e.g. the Tick event is the Elapsed event instead, etc.).
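A minimal sketch of that switch, assuming the work itself touches no UI controls. PeriodicJob is a hypothetical wrapper, not part of the original code, and it swaps the isRunning bool for an Interlocked flag, because Elapsed is raised on thread-pool threads (when no SynchronizingObject is set) and two ticks could otherwise overlap.

```csharp
using System;
using System.Threading;

public sealed class PeriodicJob : IDisposable
{
    private readonly System.Timers.Timer timer;
    private readonly Action work;
    private int running; // 0 = idle, 1 = a run is in progress

    public PeriodicJob(double intervalMs, Action work)
    {
        this.work = work;
        timer = new System.Timers.Timer(intervalMs);
        timer.Elapsed += OnElapsed; // Elapsed replaces the Tick event
    }

    public void Start() { timer.Start(); }

    private void OnElapsed(object sender, System.Timers.ElapsedEventArgs e)
    {
        // Skip this tick if the previous run has not finished yet.
        if (Interlocked.CompareExchange(ref running, 1, 0) != 0) return;
        try { work(); }
        finally { Interlocked.Exchange(ref running, 0); }
    }

    public void Dispose() { timer.Dispose(); }
}
```

With this in place, something like new PeriodicJob(20 * 60000, () => inserting(getExcels())).Start() would run the load off the UI thread, so a Join inside inserting no longer freezes the form.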
There are also System.Threading.Timer and System.Web.UI.Timer. Alternatively, you could spawn a second thread from within the timer tick event so that the waiting does not happen on the UI thread, for example:
private void timerEventHanlder(object sender, EventArgs e)
{
(new System.Threading.Thread(() => {
List<String> excels = getExcels();
if (!isRunning){ //in case the previous timer event is not finished, wait another 20 minutes...
isRunning=true; //flag to true
inserting(excels);
}
})).Start();
}
Starting a new thread avoids changing any of your current code and allows you to change it back if you do ever need to invoke anything in the UI.
Answering your other question though:
I also do not know what would happen if I have 100+ files and for this reason 100+ threads. Would that cause memory exception or some other exception?
Spawning 100+ threads won't cause any exceptions unless your code throws a specific exception (for example, a null delegate passed as the ThreadStart) or the OS can't create a thread, in which case you have bigger problems. Memory exhaustion is possible in principle, since a Thread is a managed object and thus takes up memory (along with the ArrayList), but the amount of memory for 100+ threads (even 1000+) is negligible on any system capable of running the .NET Framework (even on most embedded systems), so the number of threads won't necessarily be an issue.
Looking at your code, instead of spawning 100+ threads you might want to consider using the System.Threading.ThreadPool and a System.Threading.CountdownEvent, for example:
CountdownEvent Countdown;
void LoadExcelData(object data)
{
// loads excel data to global array "Weather" which is used later on
Countdown.Signal();
}
// Execute every 20 minutes (Timer). Do not execute if the previous run has not finished
void inserting(List<object> excels)
{
Countdown = new CountdownEvent(excels.Count);
int i = 0;
while (i < excels.Count) {
ThreadPool.QueueUserWorkItem(LoadExcelData, excels[i++].File_name);
}
Countdown.Wait();
InsertToDB(WeatherList); //insert data which was read from Excels
isRunning = false; //Data was successfully inserted to DB
}
This uses the system thread pool to execute your functions and lets .NET handle the scheduling of the threads, which avoids massive resource contention when the number of threads is large. You could use other methods to block, like a Mutex or Semaphore, but the CountdownEvent pretty much encapsulates what you'd need to do with other wait objects and with joining on the threads from the thread pool.
To be honest though, since you're reading data from Excel files in multiple threads, unless each thread reads the entire contents of the file into RAM then executes the operations that way, you might not see a huge increase in performance. Multi-threaded applications that have heavy I/O usually don't see a huge performance increase unless said I/O is on performance minded equipment or the initial input of the entire file is read into RAM. Just a side note as you're multi-threading with files.
It should also be noted that the System.Threading.ThreadPool is ideally suited to work items you expect to run for only a few seconds or so; if you anticipate that a thread could take longer, you should stick with spawning the threads as you do now. You can still use the CountdownEvent, and you don't need an array of threads like you have (you could just use the (new Thread(function)).Start() syntax).
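The dedicated-thread variant with a CountdownEvent could look roughly like this. ParallelLoader.RunAll and its load delegate are hypothetical stand-ins for the LoadExcelData call, and the sketch assumes load handles its own exceptions (an unhandled exception on a thread would take the process down).

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class ParallelLoader
{
    // Starts one dedicated thread per file and blocks until all have finished.
    // Call this from a non-UI thread, because Wait() blocks the caller.
    public static void RunAll(IList<string> files, Action<string> load)
    {
        if (files.Count == 0) return;

        using (var done = new CountdownEvent(files.Count))
        {
            foreach (string file in files)
            {
                string current = file; // private copy for the closure
                var worker = new Thread(() =>
                {
                    try { load(current); }
                    finally { done.Signal(); } // count down even if load fails
                });
                worker.IsBackground = true;
                worker.Start();
            }
            done.Wait();
        }
    }
}
```

No thread array is needed here: the CountdownEvent alone tells the caller when every file has been processed.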
Hope that can help
The parent thread is going to reach the for loop that joins all the worker threads and wait there until all the threads have finished (and can be joined). If the GUI is running in that same parent thread, execution is not going to return to the GUI until all threads have finished, which is going to be a long time as you've set up timers. Try running the GUI in a different thread.
Edit:
Also on a side note, I'd set your timer lengths to something much shorter while you're debugging to see if it's actually waiting as you expect it to. Then once you have it functioning correctly you can set it back to 20 minutes.

Why is the Completed callback from SocketAsyncEventArgs frequently executed in newly created threads instead of using a bounded thread pool?

I have a simple client application that receives byte buffers from the network with a low throughput. Here is the code:
private static readonly HashSet<int> _capturedThreadIds = new HashSet<int>();
private static void RunClient(Socket socket)
{
var e = new SocketAsyncEventArgs();
e.SetBuffer(new byte[10000], 0, 10000);
e.Completed += SocketAsyncEventsArgsCompleted;
Receive(socket, e);
}
private static void Receive(Socket socket, SocketAsyncEventArgs e)
{
var isAsynchronous = socket.ReceiveAsync(e);
if (!isAsynchronous)
SocketAsyncEventsArgsCompleted(socket, e);
}
private static void SocketAsyncEventsArgsCompleted(object sender, SocketAsyncEventArgs e)
{
if (e.LastOperation != SocketAsyncOperation.Receive || e.SocketError != SocketError.Success || e.BytesTransferred <= 0)
{
Console.WriteLine("Operation: {0}, Error: {1}, BytesTransferred: {2}", e.LastOperation, e.SocketError, e.BytesTransferred);
return;
}
var thread = Thread.CurrentThread;
if (_capturedThreadIds.Add(thread.ManagedThreadId))
Console.WriteLine("New thread, ManagedId: " + thread.ManagedThreadId + ", NativeId: " + GetCurrentThreadId());
//Console.WriteLine(e.BytesTransferred);
Receive((Socket)sender, e);
}
The threading behavior of the application is quite curious:
The SocketAsyncEventsArgsCompleted method is frequently run in new threads. I would have expected that after some time no new thread would be created. I would have expected the threads to be reused, because of the thread pool (or IOCP thread pool) and because the throughput is very stable.
The number of threads stays low, but I can see in the process explorer that threads are frequently created and destroyed. Likewise, I would not have expected threads to be created or destroyed.
Can you explain the application behavior?
Edit: The "low" throughput is 20 messages per second (roughly 200 KB/s). If I increase the throughput to more than 1000 messages per second (50 MB/s), the application behavior does not change.
The low application throughput itself cannot explain the thread creation and destruction. The socket receives 20 messages per second, which is more than enough to keep a thread alive (waiting threads are destroyed after spending 10 seconds idle).
This problem is related to the thread pool thread injection, i.e. the threads creation and destruction strategy. Thread pool threads are regularly injected and destroyed in order to measure the impact of new threads on the thread pool throughput.
This is called thread probing. It is clearly explained in the Channel 9 video CLR 4 - Inside the Thread Pool (jump to 26:30).
It seems that thread probing is always done with newly created threads instead of moving a thread in and out of the pool. I suppose this works better for most applications because it avoids keeping an unused thread alive.
From MSDN
Beginning with the .NET Framework 4, the thread pool creates and
destroys worker threads in order to optimize throughput, which is
defined as the number of tasks that complete per unit of time. Too few
threads might not make optimal use of available resources, whereas too
many threads could increase resource contention.
Note
When demand is low, the actual number of thread pool threads can
fall below the minimum values.
Basically it sounds like your low throughput is causing the thread pool to destroy threads, since they are not required and are just sitting there taking up resources. I wouldn't worry about it. As Microsoft explicitly states:
In most cases the thread pool will perform better with its own
algorithm for allocating threads.
If you're really bothered, you could always poll ThreadPool.GetAvailableThreads() to watch the pool, and see how different network throughputs affect it.
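A small sketch of such polling. PoolMonitor is a made-up name. Note that the difference between GetMaxThreads and GetAvailableThreads gives the number of threads currently busy, not the number of threads that exist in the pool, so it shows load rather than thread creation and destruction.

```csharp
using System;
using System.Threading;

public static class PoolMonitor
{
    // Busy threads = maximum minus available, for worker and I/O completion threads.
    public static void GetBusyThreads(out int busyWorkers, out int busyIo)
    {
        int maxWorker, maxIo, freeWorker, freeIo;
        ThreadPool.GetMaxThreads(out maxWorker, out maxIo);
        ThreadPool.GetAvailableThreads(out freeWorker, out freeIo);
        busyWorkers = maxWorker - freeWorker;
        busyIo = maxIo - freeIo;
    }
}
```

Logging these two numbers once a second while varying the message rate should be enough to see how the pool reacts.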

Thread pool and the threads it provides

My code is
static void Main(string[] args)
{
for (int i = 0; i < 100; i++)
{
ThreadPool.QueueUserWorkItem(y =>
{
Console.WriteLine(Thread.CurrentThread.ManagedThreadId);
Thread.Sleep(3000);
});
}
Console.Read();
}
When I start the program and inspect it with sos.dll, I can see that the thread pool gives me only 4-5 threads at a time. This causes a delay, because the pool does not provide more threads.
Why is this happening?
ThreadPool Class:
There is one thread pool per process. Beginning with the .NET Framework 4, the default size of the thread pool for a process depends on several factors, such as the size of the virtual address space. A process can call the GetMaxThreads method to determine the number of threads. The number of threads in the thread pool can be changed by using the SetMaxThreads method. Each thread uses the default stack size and runs at the default priority.
As an additional note, depending on system resources (like CPU cores, RAM, etc.), more threads may not make your application run faster.

Limiting the number of threadpool threads

I am using ThreadPool in my application. I have first set the limit of the thread pool by using the following:
ThreadPool.SetMaxThreads(m_iThreadPoolLimit,m_iThreadPoolLimit);
m_Events = new ManualResetEvent(false);
and then I have queued up the jobs using the following
WaitCallback objWcb = new WaitCallback(abc);
ThreadPool.QueueUserWorkItem(objWcb, m_objThreadData);
Here abc is the name of the function that I am calling.
After this I am doing the following so that all my threads come to 1 point and the main thread takes over and continues further
m_Events.WaitOne();
My thread limit is 3. The problem I am facing is that, in spite of the thread pool limit being set to 3, my application processes more than 3 files at the same time, whereas it was supposed to process only 3 files at a time. Please help me solve this issue.
What kind of computer are you using?
From MSDN
You cannot set the number of worker
threads or the number of I/O
completion threads to a number smaller
than the number of processors in the
computer.
If you have 4 cores, then the smallest you can have is 4.
Also note:
If the common language runtime is
hosted, for example by Internet
Information Services (IIS) or SQL
Server, the host can limit or prevent
changes to the thread pool size.
If this is a web site hosted by IIS then you cannot change the thread pool size either.
A better solution involves the use of a Semaphore, which can throttle concurrent access to a resource [1]. In your case the resource would simply be a block of code that processes work items.
var finished = new CountdownEvent(1); // Used to wait for the completion of all work items.
var throttle = new Semaphore(3, 3); // Used to throttle the processing of work items.
foreach (WorkItem item in workitems)
{
finished.AddCount();
WorkItem capture = item; // Needed to safely capture the loop variable.
ThreadPool.QueueUserWorkItem(
(state) =>
{
throttle.WaitOne();
try
{
ProcessWorkItem(capture);
}
finally
{
throttle.Release();
finished.Signal();
}
}, null);
}
finished.Signal();
finished.Wait();
In the code above WorkItem is a hypothetical class that encapsulates the specific parameters needed to process your tasks.
The Task Parallel Library makes this pattern a lot easier. Just use the Parallel.ForEach method and specify a ParallelOptions.MaxDegreeOfParallelism that throttles the concurrency.
var options = new ParallelOptions();
options.MaxDegreeOfParallelism = 3;
Parallel.ForEach(workitems, options,
(item) =>
{
ProcessWorkItem(item);
});
[1] I should point out that I do not like blocking ThreadPool threads using a Semaphore or any blocking device. It basically wastes the threads. You might want to rethink your design entirely.
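One way to throttle without parking pool threads is to wait on the semaphore asynchronously. The sketch below swaps Semaphore.WaitOne for SemaphoreSlim.WaitAsync and assumes .NET 4.5 or later with async/await, which is newer than the code above. Throttled.ForEachAsync and the body delegate are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class Throttled
{
    // Runs `body` for every item with at most `limit` invocations in flight.
    public static async Task ForEachAsync<T>(IEnumerable<T> items, int limit, Func<T, Task> body)
    {
        using (var gate = new SemaphoreSlim(limit, limit))
        {
            var tasks = new List<Task>();
            foreach (T item in items)
            {
                // Asynchronous wait: no thread is blocked while the gate is full.
                await gate.WaitAsync();
                T captured = item;
                tasks.Add(Task.Run(async () =>
                {
                    try { await body(captured); }
                    finally { gate.Release(); }
                }));
            }
            // All Release calls have happened by the time this completes.
            await Task.WhenAll(tasks);
        }
    }
}
```

This only pays off if the per-item work is itself asynchronous (for example, file or network I/O); for purely CPU-bound work, Parallel.ForEach with MaxDegreeOfParallelism remains the simpler choice.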
You should use a Semaphore object to limit concurrent threads.
You say the files are open: are they actually being actively processed, or just left open?
If you're leaving them open: been there, done that! Relying on connections and resources (it was a DB connection in my case) to close at the end of scope should work, but it can take a while for the dispose / garbage collection to kick in.

Multiple Threads

I post a lot here regarding multithreading, and the great Stack Overflow community has helped me a lot in understanding multithreading.
All the examples I have seen online only deal with one thread.
My application is a scraper for an insurance company (family company ... all free of charge). Anyway, the user is able to select how many threads they want to run. So let's say, for example, the user wants the application to scrape 5 sites at one time, and then later in the day he chooses 20 threads because his computer isn't doing anything else, so it has the resources to spare.
Basically the application builds a list of say 1000 sites to scrape. A thread goes off and does that and updates the UI and builds the list.
When that's finished, another thread is called to start the scraping. Depending on the number of threads the user has chosen, it will create x number of threads.
What's the best way to create these threads? Should I create 1000 threads in a list and loop through them? If the user has set 5 threads to run, it will loop through 5 at a time.
I understand threading, but it's the application logic which is catching me out.
Any ideas or resources on the web that can help me out?
You could consider using a thread pool for that:
using System;
using System.Threading;
public class Example
{
public static void Main()
{
ThreadPool.SetMaxThreads(100, 10);
// Queue the task.
ThreadPool.QueueUserWorkItem(new WaitCallback(ThreadProc));
Console.WriteLine("Main thread does some work, then sleeps.");
Thread.Sleep(1000);
Console.WriteLine("Main thread exits.");
}
// This thread procedure performs the task.
static void ThreadProc(Object stateInfo)
{
Console.WriteLine("Hello from the thread pool.");
}
}
This scraper, does it use a lot of CPU when it's running?
If it does a lot of communication with these 1000 remote sites, downloading their pages, that may be taking more time than the actual analysis of the pages.
And how many CPU cores does your user have? If they have 2 (which is common these days) then beyond two simultaneous threads performing analysis, they aren't going to see any speed up.
So you probably need to "parallelize" the downloading of the pages. I doubt you need to do the same for the analysis of the pages.
Take a look into asynchronous IO, instead of explicit multi-threading. It lets you launch a bunch of downloads in parallel and then get called back when each one completes.
If you really just want the application, use something someone else already spent time developing and perfecting:
http://arachnode.net/
arachnode.net is a complete and comprehensive .NET web crawler for
downloading, indexing and storing
Internet content including e-mail
addresses, files, hyperlinks, images,
and Web pages.
Whether interested or involved in
screen scraping, data mining, text
mining, research or any other
application where a high-performance
crawling application is key to the
success of your endeavors,
arachnode.net provides the solution
you need for success.
If you also want to write one yourself because it's a fun thing to write (I wrote one not long ago, and yes, it is a lot of fun), then you can refer to this pdf provided by arachnode.net, which explains in detail the theory behind a good web crawler:
http://arachnode.net/media/Default.aspx?Sort=Downloads&PageIndex=1
Download the pdf entitled: "Crawling the Web" (second link from top). Scroll to Section 2.6 entitled: "2.6 Multi-threaded Crawlers". That's what I used to build my crawler, and I must say, I think it works quite well.
I think this example is basically what you need.
public class WebScraper
{
private readonly int totalThreads;
private readonly List<System.Threading.Thread> threads;
private readonly List<Exception> exceptions;
private readonly object locker = new object();
private volatile bool stop;
public WebScraper(int totalThreads)
{
this.totalThreads = totalThreads;
threads = new List<System.Threading.Thread>(totalThreads);
exceptions = new List<Exception>();
for (int i = 0; i < totalThreads; i++)
{
var thread = new System.Threading.Thread(Execute);
thread.IsBackground = true;
threads.Add(thread);
}
}
public void Start()
{
foreach (var thread in threads)
{
thread.Start();
}
}
public void Stop()
{
stop = true;
foreach (var thread in threads)
{
if (thread.IsAlive)
{
thread.Join();
}
}
}
private void Execute()
{
try
{
while (!stop)
{
// Scrape away!
}
}
catch (Exception ex)
{
lock (locker)
{
// You could have a thread checking this collection and
// reporting it as you see fit.
exceptions.Add(ex);
}
}
}
}
The basic logic is:
You have a single queue in which you put the URLs to scrape. Then you create your threads, and every thread has access to that queue object. Let each thread run a loop:
1. lock the queue
2. check if there are items in the queue; if not, unlock the queue and end the thread
3. dequeue the first item in the queue
4. unlock the queue
5. process the item
6. invoke an event that updates the UI (remember to lock the UI controller)
7. return to step 1
Just let the Threads do the "get stuff from the queue" part (pulling the jobs) instead of giving them the urls (pushing the jobs), that way you just say
YourThreadManager.StartThreads(numberOfThreadsTheUserWants);
and everything else happens automagically. See the other replies to find out how to create and manage the threads.
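The pull-based loop described in this answer can be sketched as follows. QueueWorkers.Run and the process delegate are hypothetical names, and the UI update step is left out because it depends on the UI framework.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class QueueWorkers
{
    // Starts `threadCount` threads that pull URLs from one shared queue
    // until it is empty, then waits for all of them to finish.
    public static void Run(IEnumerable<string> urls, int threadCount, Action<string> process)
    {
        var queue = new Queue<string>(urls);
        var sync = new object();
        var workers = new List<Thread>();

        for (int i = 0; i < threadCount; i++)
        {
            var worker = new Thread(() =>
            {
                while (true)
                {
                    string url;
                    lock (sync) // lock the queue
                    {
                        if (queue.Count == 0) return; // nothing left: end this thread
                        url = queue.Dequeue();
                    } // the lock is released before the slow part
                    process(url);
                }
            });
            worker.IsBackground = true;
            workers.Add(worker);
            worker.Start();
        }

        foreach (Thread thread in workers) thread.Join();
    }
}
```

The thread count is the only tuning knob, which matches the StartThreads(numberOfThreadsTheUserWants) idea above: 5 or 20 threads share the same 1000 URLs.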
I solved a similar problem by creating a worker class that uses a callback to signal the main app that a worker is done. Then I create a queue of 1000 threads and then call a method that launches threads until the running thread limit is reached, keeping track of the active threads with a dictionary keyed by the thread's ManagedThreadId. As each thread completes, the callback removes its thread from the dictionary and calls the thread launcher.
If a connection is dropped or times out, the callback reinserts the thread back into the queue. Lock around the queue and the dictionary. I create threads instead of using the thread pool because the overhead of creating a thread is insignificant compared to the connection time, and it allows me to have a lot more threads in flight. The callback also provides a convenient place from which to update the user interface, even allowing you to change the thread limit while it's running. I've had over 50 open connections at one time. Remember to increase the maxconnection setting in your app.config (the default is two).
I would use a queue and a condition variable and mutex, and start just the requested number of threads, for example, 5 or 20 (and not start 1,000).
Each thread blocks on the condition variable. When woken up, it dequeues the first item, unlocks the queue, works with the item, locks the queue and checks for more items. If the queue is empty, sleep on the condition variable. If not, unlock, work, repeat.
While the mutex is locked, it can also check if the user has requested the count of threads to be reduced. Just check if count > max_count, and if so, the thread terminates itself.
Any time you have more sites to queue, just lock the mutex and add them to the queue, then broadcast on the condition variable. Any threads that are not already working will wake up and take new work.
Any time the user increases the requested thread count, just start them up and they will lock the queue, check for work, and either sleep on the condition variable or get going.
Each thread will be continually pulling more work from the queue, or sleeping. You don't need more than 5 or 20.
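In .NET, the mutex plus condition variable pair maps onto lock with Monitor.Wait and Monitor.PulseAll. A minimal sketch under that assumption follows. WorkQueue and its Close method are hypothetical additions so that sleeping workers can be told to exit, and the thread-count reduction check is not shown.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public sealed class WorkQueue
{
    private readonly Queue<string> items = new Queue<string>();
    private readonly object gate = new object();
    private bool closed;

    // Add work and wake any sleeping workers (the "broadcast").
    public void Enqueue(string item)
    {
        lock (gate)
        {
            items.Enqueue(item);
            Monitor.PulseAll(gate);
        }
    }

    // Signal that no more work will arrive.
    public void Close()
    {
        lock (gate)
        {
            closed = true;
            Monitor.PulseAll(gate);
        }
    }

    // Blocks while the queue is empty; returns false once it is closed and drained.
    public bool TryTake(out string item)
    {
        lock (gate)
        {
            while (items.Count == 0 && !closed)
                Monitor.Wait(gate); // releases the lock while sleeping
            if (items.Count == 0) { item = null; return false; }
            item = items.Dequeue();
            return true;
        }
    }
}
```

Each worker thread then simply loops on TryTake and processes the returned item outside the lock.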
Consider using the event-based asynchronous pattern (AsyncOperation and AsyncOperationManager Classes)
You might want to take a look at the ProcessQueue article on CodeProject.
Essentially, you'll want to create (and start) the number of threads that are appropriate, in your case that number comes from the user. Each of these threads should process a site, then find the next site needed to process. Even if you don't use the object itself (though it sounds like it would suit your purposes pretty well, though I'm obviously biased!) it should give you some good insight into how this sort of thing would be done.
