I have a loop that I don't want to continue until LoadAmazonDataByBatch() has returned. I know there must be a straightforward way of doing it, and I'm almost certain I'm approaching the problem wrong.
const int batchSize = 500;
for (int i = 0; i < total; i = i + batchSize)
{
    LoadAmazonDataByBatch(i, batchSize, fileList, total, amazonLogHandler, stopWatch);
}
LoadAmazonDataByBatch() does a bunch of things on worker threads including creating a temporary DataSet that would get very large without the batching. I don't want to create a new DataSet until the old one is processed and disposed (by LoadAmazonDataByBatch).
Obviously the way this is written now everything happens almost all at once.
How can I approach this better?
You need some form of thread synchronization.
It's not clear where LoadAmazonDataByBatch() comes from, but I'd suggest checking the documentation for that function to see whether there is a synchronous version of the operation.
If no documentation is available, you will need to roll up your sleeves. It may require viewing or modifying the source of LoadAmazonDataByBatch(). Look for a ManualResetEvent that is set by the workers when they are finished. Or maybe there is a regular .NET event that is raised by that method when it completes. If those things don't exist, you'll need to add something like that.
It's very likely that LoadAmazonDataByBatch creates a bunch of threads. You have to call Join on all created threads to wait till they complete.
Surely the only way this wouldn't wait for the function to return is if it's written asynchronously?
The relevant code isn't the loop you posted, it's the definition of LoadAmazonDataByBatch() that we need to see.
If that function has a callback (stopWatch?), perhaps you could call the function (LoadAmazonDataByBatch) within the callback.
If LoadAmazonDataByBatch() spawns child threads and should not return until each of them has finished, it can use the Thread.Join() method to wait for them. For multiple children, call Join() on each thread in turn; the calls have all returned only once every thread has finished.
Reference: Threading in C#
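To illustrate the Join approach, here is a minimal sketch. LoadBatch is a hypothetical stand-in for LoadAmazonDataByBatch() (its real source isn't shown in the question): it starts one thread per item and joins them all before returning, so the calling loop cannot begin the next batch early. It is written as C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

int processed = 0;

// Hypothetical stand-in for LoadAmazonDataByBatch: it does not return
// until every worker thread it started has finished.
void LoadBatch(int start, int count)
{
    var workers = new List<Thread>();
    for (int k = start; k < start + count; k++)
    {
        // Each thread "processes" one item by bumping a shared counter.
        var thread = new Thread(() => Interlocked.Increment(ref processed));
        workers.Add(thread);
        thread.Start();
    }
    foreach (var worker in workers)
        worker.Join(); // block until this worker is done
}

const int total = 20;
const int batchSize = 5;
var completedAfterEachBatch = new List<int>();

for (int i = 0; i < total; i += batchSize)
{
    LoadBatch(i, Math.Min(batchSize, total - i));
    completedAfterEachBatch.Add(processed); // the whole batch is done here
}

Console.WriteLine(string.Join(",", completedAfterEachBatch)); // prints 5,10,15,20
```

Because the joins happen inside LoadBatch, the loop in the question would not need to change at all.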
So I have been playing around with threads for the last couple of months, and while my output is as expected, I have a feeling I'm not doing this the best way. I can't seem to get a straight answer from anyone I work with on what is best practice, so I thought I would ask you guys.
Question: I'm going to try to make this simple, so bear with me. Say I have a form that has a start and a stop button. The start button fires an event that starts a thread. Inside this thread's DoWork it is going to call 3 methods. Method1() prints "A\n" to the console 10 times with a pause of 10 seconds in between. Method2() and Method3() are exactly the same, just with a different letter and different pause times between the Console.WriteLine calls. Now, when you press the stop button you want the response to be immediate. I don't want to have to wait for the methods to complete. How do I go about this?
The way I have been doing this is passing my BackgroundWorker to each method and checking worker.CancellationPending, like so:
public void Method1(BackgroundWorker worker)
{
    for (int i = 0; i < 10 && !worker.CancellationPending; ++i)
    {
        Console.WriteLine("A");
        // 100 x 100 ms = the 10 second pause, checked for cancellation every 100 ms.
        for (int j = 0; j < 100 && !worker.CancellationPending; ++j)
        {
            Thread.Sleep(100);
        }
    }
}
Like I said, this gives me the desired result. However, imagine that Method1 becomes a lot more complex; let's say it is using a DLL to write that has a key down and a key up. If I just abort the thread I could possibly leave myself in an undesired state as well. I find myself littering my code with !worker.CancellationPending. In practically every code block I am checking CancellationPending. I look at a lot of examples online and I rarely see people passing a thread around like I am. What is best practice on this?
Consider using iterators (yield return) to break up the steps.
public void Method1(BackgroundWorker worker)
{
    // Each yield in Method1Steps is a point where cancellation can be checked.
    foreach (var discard in Method1Steps())
    {
        if (worker.CancellationPending)
            return;
    }
}

private IEnumerable<object> Method1Steps()
{
    for (int i = 0; i < 10; ++i)
    {
        yield return null;
        Console.WriteLine("A");
        for (int j = 0; j < 100; ++j)
        {
            Thread.Sleep(100);
            yield return null;
        }
    }
}
This solution may be harder to implement if you have a bunch of try/catch/finally or a bunch of method calls that also need to know about cancelation.
Yes, you are doing it correctly. It may seem awkward at first, but it really is the best option. It is definitely far better than aborting a thread. Loop iterations, as you have discovered, are ideal candidates for checking CancellationPending. This is because a loop iteration often isolates a logical unit of work and thus easily delineates a safe point. Safe points are markers in the execution of a thread where termination can be easily accomplished without corrupting any data.
The trick is to poll CancellationPending at safe points frequently enough to provide timely feedback to the caller that cancellation completed successfully, but not so frequently as to negatively affect performance or "litter the code".
In your specific case the inner loop is the best place to poll CancellationPending. I would omit the check on the outer loop. The reason is that the inner loop is where most of the time is spent. The check on the outer loop would be pointless because the outer loop does very little actual work except to get the inner loop going.
Now, on the GUI side you might want to grey out the stop button to let the user know that the cancellation request was accepted. You could display a message like "cancellation pending" or the like to make it clear. Once you get the feedback that cancellation is complete, you could remove the message.
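As a rough sketch of the advice above, the following polls CancellationPending only in the inner loop and uses RunWorkerCompleted as the "cancellation is complete" feedback. The 250 ms delay and the ManualResetEvent named finished exist only so this console demo can run without a form; in a real GUI the stop button's handler would call CancelAsync(). It is written as C# top-level statements (C# 9 or later).

```csharp
using System;
using System.ComponentModel;
using System.Threading;

var worker = new BackgroundWorker { WorkerSupportsCancellation = true };
var finished = new ManualResetEvent(false); // demo-only: lets the console wait
bool wasCancelled = false;

worker.DoWork += (sender, e) =>
{
    for (int i = 0; i < 10; i++)
    {
        Console.WriteLine("A");
        for (int j = 0; j < 100; j++)
        {
            // Safe point: nothing is half-done here, so stopping is harmless.
            if (worker.CancellationPending)
            {
                e.Cancel = true; // surfaces as Cancelled in RunWorkerCompleted
                return;          // leaves both loops at once
            }
            Thread.Sleep(100);
        }
    }
};

worker.RunWorkerCompleted += (sender, e) =>
{
    // This is where a form would re-enable the button or clear the message.
    wasCancelled = e.Cancelled;
    finished.Set();
};

worker.RunWorkerAsync();
Thread.Sleep(250);    // demo-only: pretend the user waits a moment
worker.CancelAsync(); // what the stop button handler would do
finished.WaitOne();
Console.WriteLine(wasCancelled ? "cancelled" : "ran to completion");
```

With a 100 ms poll interval the worst-case delay between pressing stop and the worker returning is about 100 ms.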
Well, if you are in the situation where you have to abort a CPU-intensive thread, then you are somewhat stuck with testing an 'Abort' boolean (or cancellation token) in one loop or another (maybe not the innermost one; it depends on how long this takes). AFAIK, you can just 'return' from the inner loop, so exiting the method; no need to check at every level! To minimize the overhead of this, try to make it a local-ish boolean, i.e. try not to dereference it through half a dozen classes every time.
Maybe inherit classes from 'Stoppable', which has an 'Abort' method and a 'Stop' boolean? Your example thread above spends most of its time sleeping, so you get 50 ms average latency before you get to check anything. In such a case, you could wait on some event with a timeout instead of sleeping. Override 'Abort' to set the event as well as calling the inherited Abort, and so terminate the wait early. You could also set the event in the cancellationToken delegate/callback, should you implement this new functionality as described by Dan.
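A minimal sketch of the "wait on an event with a timeout instead of sleeping" idea, assuming a plain ManualResetEvent (here called stopRequested, a name made up for this example) as the stop signal. WaitOne returns true as soon as the event is set, so the 10 second pause ends immediately when stop is requested. It is written as C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Diagnostics;
using System.Threading;

var stopRequested = new ManualResetEvent(false);
int linesPrinted = 0;

void Method1()
{
    for (int i = 0; i < 10; i++)
    {
        Console.WriteLine("A");
        linesPrinted++;
        // Pause for up to 10 seconds, but wake at once if stop is signalled.
        if (stopRequested.WaitOne(10000))
            return;
    }
}

var worker = new Thread(Method1);
var clock = Stopwatch.StartNew();
worker.Start();
Thread.Sleep(200);   // demo-only: let the worker get going
stopRequested.Set(); // what the stop button (or an Abort method) would do
worker.Join();
clock.Stop();

Console.WriteLine($"printed {linesPrinted} line(s), stopped after {clock.ElapsedMilliseconds} ms");
```

Compared with the 100 x Sleep(100) loop, there is no polling interval at all, and the worker does not wake up a hundred times per pause for nothing.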
There are actually very few Windows APIs etc. that are not easily 'unstickable' or don't have asynchronous 'Ex' versions, so it's err.. 'nearly' always possible to cancel, one way or another, e.g. closing sockets to force a socket read to throw an exception, or writing a temporary file to force Folder Change Notifications to return.
Rgds,
Martin
For multiple threads wait, can anyone compare the pros and cons of using WaitHandle.WaitAll and Thread.Join?
WaitHandle.WaitAll has a 64 handle limit so that is obviously a huge limitation. On the other hand, it is a convenient way to wait for many signals in only a single call. Thread.Join does not require creating any additional WaitHandle instances. And since it could be called individually on each thread the 64 handle limit does not apply.
Personally, I have never used WaitHandle.WaitAll. I prefer a more scalable pattern when I want to wait on multiple signals. You can create a counting mechanism that counts up or down, and once a specific value is reached you signal a single shared event. The CountdownEvent class conveniently packages all of this into a single class.
var finished = new CountdownEvent(1); // the initial count represents the main thread
for (int i = 0; i < NUM_WORK_ITEMS; i++)
{
    finished.AddCount();
    SpawnAsynchronousOperation(
        () =>
        {
            try
            {
                // Place logic to run in parallel here.
            }
            finally
            {
                finished.Signal();
            }
        });
}
finished.Signal(); // the main thread signals that it is done queuing
finished.Wait();
Update:
The reason why you want to signal the event from the main thread is subtle. Basically, you want to treat the main thread as if it were just another work item. After all, it is running concurrently along with the real work items.
Consider for a moment what might happen if we did not treat the main thread as a work item. It will go through one iteration of the for loop and add a count to our event (via AddCount), indicating that we have one pending work item, right? Let's say SpawnAsynchronousOperation completes and gets the work item queued on another thread. Now, imagine the main thread gets preempted before swinging around to the next iteration of the loop. The thread executing the work item gets its fair share of the CPU, starts humming along, and actually completes the work item. The Signal call in the work item runs and decrements our pending work item count to zero, which changes the state of the CountdownEvent to signalled. In the meantime the main thread wakes up, goes through all remaining iterations of the loop, and hits the Wait call, but since the event got prematurely signalled it passes on by even though there are still pending work items.
Again, avoiding this subtle race condition is easy when you treat the main thread as a work item. That is why the CountdownEvent is initialized with one count and the Signal method is called before the Wait.
I like @Brian's answer as a comparison of the two mechanisms.
If you are on .NET 4, it would be worthwhile exploring the Task Parallel Library to achieve task parallelism via System.Threading.Tasks, which allows you to manage tasks across multiple threads at a higher level of abstraction. The signalling you asked about in this question to manage thread interactions is hidden or much simplified, and you can concentrate on properly defining what each Task consists of and how to coordinate them.
This may seem off-topic, but as Microsoft themselves say in the MSDN docs:
"in the .NET Framework 4, tasks are the preferred API for writing multi-threaded, asynchronous, and parallel code."
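For comparison with WaitHandle.WaitAll and Thread.Join, a minimal Task Parallel Library sketch might look like the following. The squaring work is only a placeholder; Task.Factory.StartNew is used because it is available in .NET 4. It is written as C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Start four tasks; each returns the square of its input (placeholder work).
Task<int>[] tasks = Enumerable.Range(1, 4)
    .Select(n => Task.Factory.StartNew(() => n * n))
    .ToArray();

// Blocks until every task has completed; no explicit events or handles needed.
Task.WaitAll(tasks);

int sum = tasks.Sum(t => t.Result);
Console.WriteLine(sum); // prints 30 (1 + 4 + 9 + 16)
```

Task.WaitAll has no 64-handle limit, and any exception thrown inside a task is rethrown to the waiter wrapped in an AggregateException.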
The WaitAll mechanism involves kernel-mode objects. I don't think the same is true for the Join mechanism. I would prefer Join, given the opportunity.
Technically though, the two are not equivalent. IIRC, Join can only operate on one thread at a time, whereas WaitAll can wait for the signalling of multiple kernel objects.
I am running a background worker thread that takes a long time to run.
The actual function stores the folder structure of a location, but we can take an example of the following pseudo code running in a different thread -
private int currentResult = 0;

private void worker()
{
    for (int n = 0; n < 1000; ++n)
    {
        // Do some time-consuming computation and update the result.
        int result = n; // placeholder for the computed value
        currentResult = result;
    }
}
This is running in a BackgroundWorker thread. Can I read the currentResult from another thread safely?
Edit:
The volatile keyword seems like a magic solution (thanks Jon)! I am planning to pass a string with a message in this way from the worker class to the UI.
You might be wondering why I don't use ReportProgress. The reason is that BackgroundWorker.DoWork creates an object of a different class and calls a method there which does the bulk of the work. This method is time-consuming. The class exists to get the directory structure, and it has many related methods which the main computing method depends on. So this class does not even know of the existence of the background worker, and hence cannot report progress to it. Moving the functionality of the class into the BackgroundWorker seems messy. If this strikes you as a bad design, I am open to suggestions!
If you make it volatile you can... or if you always use the Interlocked class to both read and write it. (Or always read/write it within a lock on the same monitor.) Without these precautions, you could end up reading a stale value.
However, that's not necessarily the best way to do things. Generally, a background worker should use ReportProgress to indicate its progress... is there any reason you don't want to do that in this case?
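A small sketch of the "always read and write it the same guarded way" rule. Since a local variable cannot be declared volatile, this uses Volatile.Read and Volatile.Write (available from .NET 4.5), which give comparable guarantees to a volatile field; the loop body is a stand-in for the real computation. It is written as C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Threading;

int currentResult = 0;

var worker = new Thread(() =>
{
    for (int n = 1; n <= 1000; n++)
    {
        // Stand-in for the time-consuming computation.
        Volatile.Write(ref currentResult, n);
    }
});

worker.Start();
// Safe to read at any time from another thread; the value is some
// progress point between 0 and 1000, depending on timing.
int snapshot = Volatile.Read(ref currentResult);
worker.Join();
int finalValue = Volatile.Read(ref currentResult);

Console.WriteLine($"snapshot = {snapshot}");
Console.WriteLine($"final = {finalValue}"); // final = 1000
```

This only makes a single int (or a single reference, such as a string) safe to publish; several related fields that must change together still need a lock.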
Greetings
I have a program that creates multiple instances of a class, runs the same long-running Update method on all instances, and waits for completion. I'm following Kev's approach from this question of adding the Update to ThreadPool.QueueUserWorkItem.
In the main program, I'm sleeping for a few minutes and checking a Boolean in the last child to see if it is done:
while (!child[child.Length - 1].isFinished)
{
    Thread.Sleep(...);
}
This solution is working the way I want, but is there a better way to do this? Both for the independent instances and checking if all work is done.
Thanks
UPDATE:
There doesn't need to be locking. The different instances each have a different web service url they request from, and do similar work on the response. They're all doing their own thing.
If you know the number of operations that will be performed, use a countdown and an event:
Activity[] activities = GetActivities();
int remaining = activities.Length;
using (ManualResetEvent finishedEvent = new ManualResetEvent(false))
{
    foreach (Activity activity in activities)
    {
        // Copy the loop variable: before C# 5 every lambda would share it.
        Activity current = activity;
        ThreadPool.QueueUserWorkItem(s =>
        {
            current.Run();
            // The last activity to finish signals the waiting thread.
            if (Interlocked.Decrement(ref remaining) == 0)
                finishedEvent.Set();
        });
    }
    finishedEvent.WaitOne();
}
Don't poll for completion. The .NET Framework (and the Windows OS in general) has a number of threading primitives specifically designed to prevent the need for spinlocks, and a polling loop with Sleep is really just a slow spinlock.
You can try Semaphore.
A blocking way of waiting is a bit more elegant than polling. See the Monitor.Wait/Monitor.Pulse (Semaphore works ok too) for a simple way to block and signal. C# has some syntactic sugar around the Monitor class in the form of the lock keyword.
This doesn't look good. There is almost never a valid reason to assume that when the last thread is completed that the other ones are done as well. Unless you somehow interlock the worker threads, which you should never do. It also makes little sense to Sleep(), waiting for a thread to complete. You might as well do the work that thread is doing.
If you've got multiple threads going, give them each a ManualResetEvent. You can wait on completion with WaitHandle.WaitAll(). Counting down a thread counter with the Interlocked class can work too. Or use a CountdownLatch.
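The "one ManualResetEvent per work item plus WaitHandle.WaitAll()" option could be sketched like this. The multiplication is placeholder work, and the names done and results are made up for the example. Note that WaitAll accepts at most 64 handles. It is written as C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Threading;

const int workerCount = 4;
var done = new ManualResetEvent[workerCount];
var results = new int[workerCount];

for (int i = 0; i < workerCount; i++)
{
    int index = i; // copy the loop variable for the closure
    done[index] = new ManualResetEvent(false);
    ThreadPool.QueueUserWorkItem(_ =>
    {
        try
        {
            results[index] = index * 10; // placeholder work
        }
        finally
        {
            done[index].Set(); // always signal, even if the work throws
        }
    });
}

// Blocks until every event has been set.
WaitHandle.WaitAll(done);
Console.WriteLine(string.Join(",", results)); // prints 0,10,20,30
```

Unlike checking a flag on the last child only, this waits for every work item, regardless of the order in which they finish.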
I have a GUI C# application that has a single button Start/Stop.
Originally this GUI was creating a single instance of a class that queries a database and performs some actions if there are results and gets a single "task" at a time from the database.
I was then asked to try to utilize all the computing power on some of the 8-core systems. Using the number of processors, I figure I can create that number of instances of my class and run them all, and come pretty close to using a fair amount of the computing power.
Environment.ProcessorCount;
Using this value, in the GUI form, I have been trying to go through a loop ProcessorCount number of times and start a new thread that calls a "doWork" type method in the class, then Sleep for 1 second (to ensure the initial query gets through), and then proceed to the next iteration of the loop.
I kept on having issues with this however because it seemed to wait until the loop was completed to start the queries leading to a collision of some sort (getting the same value from the MySQL database).
In the main form, once it starts the "workers" it then changes the button text to STOP and if the button is hit again, it should execute on each "worker" a "stopWork" method.
Does what I am trying to accomplish make sense? Is there a better way to do this (that doesn't involve restructuring the worker class)?
Restructure your design so you have one thread running in the background checking your database for work to do.
When it finds work to do, spawn a new thread for each work item.
Don't forget to use synchronization tools, such as semaphores and mutexes, for the key limited resources. Fine tuning the synchronization is worth your time.
You could also experiment with the maximum number of worker threads - my guess is that it would be a few over your current number of processors.
While an exhaustive answer on the best practices of multithreaded development is a little beyond what I can write here, a couple of things:
Don't use Sleep() to wait for something to continue unless ABSOLUTELY necessary. If you have another code process that you need to wait for completion, you can either Join() that thread or use either a ManualResetEvent or AutoResetEvent. There is a lot of information on MSDN about their usage. Take some time to read over it.
You can't really guarantee that your threads will each run on their own core. While it's entirely likely that the OS thread scheduler will do this, just be aware that it isn't guaranteed.
I would assume that the easiest way to increase your use of the processors would be to simply spawn the worker methods on threads from the ThreadPool (by calling ThreadPool.QueueUserWorkItem). If you do this in a loop, the runtime will pick up threads from the thread pool and run the worker threads in parallel.
ThreadPool.QueueUserWorkItem(state => DoWork());
Never use Sleep for thread synchronization.
Your question doesn't supply enough detail, but you might want to use a ManualResetEvent to make the workers wait for the initial query.
Yes, it makes sense what you are trying to do.
It would make sense to make 8 workers, each consuming tasks from a queue. You should take care to synchronize threads properly, if they need to access shared state. From your description of your problem, it sounds like you are having a thread synchronization problem.
You should remember, that you can only update the GUI from the GUI thread. That might also be the source of your problems.
There is really no way to tell, what exactly the problem is, without more information or a code example.
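One way to get "N workers, each consuming tasks from a queue" without hand-written locking is BlockingCollection<T> (available from .NET 4). This is only a sketch: the integers stand in for tasks fetched from the database, and counting them stands in for the real work. It is written as C# top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

var queue = new BlockingCollection<int>();
int handled = 0;
int workerCount = Environment.ProcessorCount;
var workers = new Thread[workerCount];

for (int w = 0; w < workerCount; w++)
{
    workers[w] = new Thread(() =>
    {
        // Each item is handed to exactly one consumer. The loop ends once
        // CompleteAdding has been called and the queue has been drained.
        foreach (int item in queue.GetConsumingEnumerable())
            Interlocked.Increment(ref handled); // placeholder for real work
    });
    workers[w].Start();
}

// Producer side: in the real application this would be the database poll.
for (int taskId = 0; taskId < 100; taskId++)
    queue.Add(taskId);
queue.CompleteAdding();

foreach (var worker in workers)
    worker.Join();

Console.WriteLine(handled); // prints 100
```

Because only the producer talks to the task source, two workers can never pick up the same task, which is the collision described in the question.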
I'm suspecting you have a problem like this: You need to make a copy of the loop variable (task) into currenttask, otherwise the threads all actually share the same variable.
<main thread>
var tasks = db.GetTasks();
foreach (var task in tasks)
{
    var currenttask = task;
    ThreadPool.QueueUserWorkItem(state => DoTask(currenttask));
    // or: new Thread(() => DoTask(currenttask)).Start();
    // ThreadPool.QueueUserWorkItem(state => DoTask(task)); // this doesn't work!
}
Note that you shouldn't Thread.Sleep() on the main thread to wait for the worker threads to finish. If you are using the thread pool, you can continue to queue work items; if you want to wait for the executing tasks to finish, you should use something like an AutoResetEvent.
You seem to be encountering a common issue with multithreaded programming. It's called a Race Condition, and you'd do well to do some research on this and other multithreading issues before proceeding too far. It's very easy to quickly mess up all your data.
The short of it is that you must ensure all your commands to your database (eg: Get an available task) are performed within the scope of a single transaction.
I don't know MySQL well enough to give a complete answer, however a very basic example for T-SQL might look like this:
BEGIN TRAN
DECLARE @taskid int
SELECT @taskid = taskid FROM tasks WHERE assigned = 0
UPDATE tasks SET assigned = 1 WHERE taskid = @taskid
SELECT * FROM tasks WHERE taskid = @taskid
COMMIT TRAN
MySQL 5 and above has support for transactions too.
You could also put a lock around the "fetch task from DB" code, so that only one thread queries the database at a time, but obviously this decreases the performance gain somewhat.
Some code of what you're doing (and maybe some SQL, this really depends) would be a huge help.
However assuming you're fetching a task from DB, and these tasks require some time in C#, you likely want something like this:
object myLock;

void StartWorking()
{
    myLock = new object(); // only new it once, could be done in your constructor too.
    for (int i = 0; i < Environment.ProcessorCount; i++)
    {
        // DoWork matches the WaitCallback signature, so it can be queued directly.
        ThreadPool.QueueUserWorkItem(DoWork);
    }
}

void DoWork(object state)
{
    object task;
    lock (myLock)
    {
        // Only one thread at a time may fetch a task.
        task = GetTaskFromDB();
    }
    PerformTask(task); // the long-running part happens outside the lock
}
There are some good ideas posted above. One of the things that we ran into is that we not only wanted a multi-processor capable application but a multi-server capable application as well. In our application we use a queue that is wrapped in a lock behind a common web server (causing others to be blocked) while we get the next thing to be processed.
In our case we are processing lots of data, so to keep things simple we lock an object, get the id of the next unprocessed item, flag it as being processed, unlock the object, and hand the record id back to the main thread on the calling server, where it then gets processed. This seems to work well for us since the time it takes to lock, get, update, and release is very small, and while blocking does occur, we never run into a deadlock while waiting for resources (because we are using lock(object) { } and a nice tight try/catch inside to ensure we handle errors gracefully).
As mentioned elsewhere, all of this is handled in the primary thread. Given the information to be processed, we push it to a new thread (which for us goes and retrieves 100 MB of data and processes it per call). This approach has allowed us to scale beyond a single server. In the past we had to throw high-end hardware at the problem; now we can throw several cheaper, but still very capable, servers at it. We can also throw this across our virtualization farm in low-utilization periods.
One other thing I failed to mention: we also use locking mutexes inside our stored proc, so if two apps on two servers call it at the same time, it's handled gracefully. So the concept above applies to our app and to the database. Our client's backend is the MySQL 5.1 series and it is done with just a few lines.
One of the things that I think people forget when they are developing is that you want to get in and out of the lock relatively quickly. If you want to return large chunks of data, I personally wouldn't do it in the lock itself unless you really had to. Otherwise, you can't really do much multithreading if everyone is waiting to get data.
Okay, found my MySql code for doing just what you will need.
DELIMITER //
CREATE PROCEDURE getnextid(
I_service_entity_id INT(11)
, OUT O_tag VARCHAR(36)
)
BEGIN
DECLARE L_tag VARCHAR(36) DEFAULT '00000000-0000-0000-0000-000000000000';
DECLARE L_locked INT DEFAULT 0;
DECLARE C_next CURSOR FOR
SELECT tag FROM workitems
WHERE status in (0)
AND processable_date <= DATE_ADD(NOW(), INTERVAL 5 MINUTE)
;
DECLARE EXIT HANDLER FOR NOT FOUND
BEGIN
-- No work item available: report the empty GUID and release the lock.
SET O_tag := '00000000-0000-0000-0000-000000000000';
DO RELEASE_LOCK('myuniquelockis');
END;
SELECT COALESCE(GET_LOCK('myuniquelockis',20), 0) INTO L_locked;
IF L_locked > 0 THEN
OPEN C_next;
FETCH C_next INTO L_tag;
IF L_tag <> '00000000-0000-0000-0000-000000000000' THEN
UPDATE workitems SET
status = 1
, service_entity_id = I_service_entity_id
, date_locked = NOW()
WHERE tag = L_tag;
END IF;
CLOSE C_next;
DO RELEASE_LOCK('myuniquelockis');
END IF;
-- L_tag still holds the empty GUID if the lock could not be acquired.
SET O_tag := L_tag;
END
//
DELIMITER ;
In our case, we return a GUID to C# as an out parameter. You could replace the SET at the end with SELECT L_tag; and be done with it and lose the OUT parameter, but we call this from another wrapper...
Hope this helps.