I am using Python 3.6 to synchronize multiple threads. I have a "master thread" that hands out work to all the other threads. When a worker thread finishes its work, it signals the master thread to give it more work.
To achieve that, the master thread waits for one (or more) threads to finish before collecting new data to process.
while True:
    while freeWorkers > 0:
        # Give the worker more work...
    time.sleep(5)  # wait for 5 seconds before checking if we got free workers.
Basically, it works. I want to improve it like this: after a worker finishes its job, it reports somehow to the "master" thread. Because the master thread is really quick, in most cases it will be sleeping when that happens. I want to make it stop sleeping, which will trigger giving more work to the free workers.
In C#, I did this trick as follows.
An object to synchronize on:
public object SyncingClock { get; private set; } = new object();
The master goes to sleep like this:
lock (SyncingClock)
    Monitor.Wait(SyncingClock, 5000);
A worker thread reports completion like this:
lock (SyncingClock)
    Monitor.Pulse(SyncingClock);
So I am looking for a way to perform this C# trick in Python (or any other alternative).
Thanks.
I think you should look at event-driven programming (https://emptypage.jp/notes/pyevent.en.html)
rather than having a while loop that polls for finished workers.
For example, something like this:
def create_thread(self, work_finished_method):
    t = some_method_to_create_and_prepare_a_thread()
    # Subscribe the handler to the thread's "finished" event.
    t.event_finished += work_finished_method
    return t

class MyThread:
    name = "SomeNameForTheThread"
    event_finished = event.Event(name + " has finished.")

    def finished(self):
        self.event_finished()

    def do_work(self):
        do_something()
        self.finished()

When the work_finished method is called in the main thread, you can assign new work to the thread.
This is done with a Condition object.
self.condition = threading.Condition()
To wait for either the timeout or a pulse, do:
with service.condition:
    service.condition.wait(5)
To notify:
with self.condition:
    self.condition.notify_all()
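To show how the pieces fit together, here is a minimal sketch of the master/worker handshake built on threading.Condition. The names worker, master and finished_jobs are made up for this illustration, and the short sleep only stands in for real work. The master checks the shared list before waiting, so a notification sent before it starts waiting is not lost.

```python
import threading
import time

condition = threading.Condition()
finished_jobs = []  # shared state, only touched while holding `condition`


def worker(job_id):
    time.sleep(0.05)  # stand-in for real work
    with condition:
        finished_jobs.append(job_id)
        condition.notify_all()  # wake the master early, like Monitor.Pulse


def master(expected_jobs, timeout=5.0):
    """Collect completion reports until every job has reported."""
    collected = []
    with condition:
        while len(collected) < expected_jobs:
            if not finished_jobs:
                # Sleeps for up to `timeout` seconds, but returns as soon
                # as a worker notifies, like Monitor.Wait with a timeout.
                condition.wait(timeout)
            collected.extend(finished_jobs)
            finished_jobs.clear()
    return sorted(collected)


threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
result = master(expected_jobs=3)
for t in threads:
    t.join()
print(result)
```

In a real master loop, the place where this sketch extends collected is where new work would be handed to the free workers.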
Related
I have a little C# app with multiple threads running, but my main thread has to wait for all of the threads to finish before it can do the rest.
The problem is that I'm calling .Join() on each thread, which seems to wait for each thread to finish before moving on to the next one. That makes the app not really multithreaded, and it takes a long time to finish.
So I wonder if there is any way around this problem, or just a way to check whether any threads are still active.
Thanks
If you're hanging on to the Thread object, you can use Thread.IsAlive.
Alternately, you might want to consider firing an event from your thread when it is done.
Thread.Join() doesn't mean your application isn't multithreaded - it tells the current thread to wait for the other thread to finish, which is exactly what you want.
Doing the following:
List<Thread> threads = new List<Thread>();
/** create each thread, Start() it, and add it to the list **/
foreach (Thread thread in threads)
{
    thread.Join();
}
will continue to run the other threads, except the current/main thread (it will wait until the other threads are done).
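The same behaviour is easy to check in Python, whose Thread.join works the same way for this purpose. The sketch below is only an illustration with invented names (task, run_and_join): three threads each sleep for 0.2 seconds, and joining them one after another takes roughly as long as the slowest thread, not the sum of all three.

```python
import threading
import time


def task(seconds):
    time.sleep(seconds)  # stand-in for real work


def run_and_join(durations):
    """Start one thread per duration, join them all, return elapsed time."""
    threads = [threading.Thread(target=task, args=(d,)) for d in durations]
    start = time.monotonic()
    for t in threads:
        t.start()
    # Joining in sequence only blocks the caller; the other threads
    # keep running while we wait on the first one.
    for t in threads:
        t.join()
    return time.monotonic() - start


elapsed = run_and_join([0.2, 0.2, 0.2])
print(f"elapsed: {elapsed:.2f}s")
```

If the joins really serialized the work, the elapsed time would be close to 0.6 seconds.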
Just use Thread.Join()
Yes, as Cuong Le said, using the Task Parallel Library would be much more efficient.
However, you can create a list of threads and then check whether they are still alive.
var threadsList = new List<Thread>();
threadsList.Add(myThread); // to add
bool areDone = true;
foreach (Thread t in threadsList) {
    if (t.IsAlive)
    {
        areDone = false;
        break;
    }
}
if (areDone)
{
    // Everything is finished :O
}
If you want to run several operations at the same time but wait for all of them to finish, here's a way of doing that with Parallel.ForEach:
var arrStr = new string[] {"1st", "2nd", "3rd"};
Parallel.ForEach<string>(arrStr, str =>
{
    DoSomething(str); // your custom method you wanted to use
    Debug.Print("Finished task for: " + str);
});
Debug.Print("All tasks finished");
I think that is the simplest and most efficient way to go in C# 4.0 if you want all tasks to run through the same method.
Try using BackgroundWorker
It raises an event in the main thread (RunWorkerCompleted) after its work is done
Here is one sample from previously answered question
https://stackoverflow.com/a/5551376/148697
Sorry if this is a duplicate, but I'm not quite sure what terms I need to use to find existing answers to this question.
I'm trying to improve start-up performance of an application, the pseudo-code looks a bit like this.
LoadBigFileFromDisk(); //slow
SetupNetwork(); //even slower
UseBigFileFromDisk();
I figured that as the first step is disk-bound, and the other network-bound (and slower), I could run the first in a background thread (currently playing with ThreadPool.QueueUserWorkItem, but not sure if that's the best way) and improve the performance a bit.
It works, but what worries me is that I'm relying on the second step being slow enough for the first to complete.
I know I could set a _done boolean somewhere and while ! on that, but is there a more elegant/idiomatic solution?
(Not .Net 4.0 yet, so though I'm interested in Task-based, I need the fall-back solutions).
In the "main class" do this:
ManualResetEvent mre = new ManualResetEvent(false);
in your "main" method do this:
// Launch the tasks
mre.WaitOne();
in the task when it finishes (just before the return :-) )
mre.Set();
If you have to wait for multiple events, in your "main" create multiple ManualResetEvent and put them in an array, each event "connected" to one of the tasks, and then each task Sets its event when it finishes. Then in your "main" you do:
WaitHandle.WaitAll(arrayOfManualResetEvents);
Note that in this way you can wait up to 64 events. If you need more, there is another method (and note that you have to use this last method even if you are on a STA thread, like the main thread of a WinForm app).
ManualResetEvent mre = new ManualResetEvent(false);
int remaining = xxx; // Number of "tasks" that must be executed.
// Launch tasks
mre.WaitOne();
At the end of each task:
if (Interlocked.Decrement(ref remaining) == 0)
{
    mre.Set();
}
Only the last task will decrement the remaining field to 0 and call mre.Set().
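For comparison, the countdown idea can be sketched in Python. Python has no Interlocked.Decrement, so this sketch assumes a Lock around the counter instead; the Countdown class and the task function are invented for the illustration and are not part of any library.

```python
import threading


class Countdown:
    """Sets `done` once signal() has been called `count` times."""

    def __init__(self, count):
        self._remaining = count
        self._lock = threading.Lock()
        self.done = threading.Event()  # plays the role of the ManualResetEvent
        if count == 0:
            self.done.set()

    def signal(self):
        # The lock makes decrement-and-test atomic, which is what
        # Interlocked.Decrement provides in the C# version.
        with self._lock:
            self._remaining -= 1
            if self._remaining == 0:
                self.done.set()


results = []
results_lock = threading.Lock()


def task(n, countdown):
    with results_lock:
        results.append(n * n)
    countdown.signal()  # last thing each task does


countdown = Countdown(4)
for n in range(4):
    threading.Thread(target=task, args=(n, countdown)).start()

all_done = countdown.done.wait(timeout=5)  # like mre.WaitOne()
print(all_done, sorted(results))
```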
You could try:
var loadTask=Task.Factory.StartNew(()=>LoadBigFileFromDisk());
var setupTask=Task.Factory.StartNew(()=>SetupNetwork());
Task.WaitAll(loadTask,setupTask);
UseBigFileFromDisk();
This uses the Task Parallel Library.
or:
var loadThread=new Thread(()=>LoadBigFileFromDisk());
var setupThread=new Thread(()=>SetupNetwork());
loadThread.Start();
setupThread.Start();
loadThread.Join();
setupThread.Join();
UseBigFileFromDisk();
Use the second version when you're not using .NET 4. If these tasks take a long time to run, it's best to avoid the thread pool, as it's primarily meant for short-lived tasks.
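The start-two-things-then-wait-for-both shape looks much the same in other languages. As a hedged illustration in Python, using concurrent.futures from the standard library, the sketch below runs two stand-in functions in parallel; load_big_file and setup_network are placeholders that just sleep, not the asker's real code.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def load_big_file():
    time.sleep(0.1)  # stand-in for the slow disk read
    return "file contents"


def setup_network():
    time.sleep(0.2)  # stand-in for the slower network setup
    return "connected"


def start_up():
    with ThreadPoolExecutor(max_workers=2) as pool:
        load_future = pool.submit(load_big_file)
        network_future = pool.submit(setup_network)
        # result() blocks until that task has finished, so nothing
        # relies on one step happening to be slower than the other.
        return load_future.result(), network_future.result()


data, network = start_up()
print(data, network)
```

Calling result() also re-raises any exception thrown inside the task, which a hand-rolled _done flag would silently miss.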
Try Thread.Join . Something like networkThread.Join()
http://msdn.microsoft.com/en-us/library/95hbf2ta.aspx
I have the following code, could anyone please clarify my doubt below.
public static void Main() {
    Thread thread = new Thread(Display);
    thread.Start();
    Thread.Sleep(5000);
    // Throws exception, thread is terminated, cannot be restarted.
    thread.Start();
}
public static void Display() {
}
It seems that in order to restart the thread I have to instantiate it again. Does this mean I am creating a new thread? If I re-instantiate it 100 times, will that create 100 threads and cause a performance issue?
Yes, you either have to create a new thread or give the task to the thread pool each time to avoid a genuinely new thread being created. You can't restart a thread.
However, I'd suggest that if your task has failed to execute 100 times in a row, you have bigger problems than the performance overhead of starting new tasks.
You do not need to start the thread again after the sleep; the thread wakes up automatically. It's the same thread.
First of all, you can't start a thread that has already been started. In your example, the thread has finished its work, which is why it is in the terminated state.
You can check its status using:
Thread.ThreadState
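The rule is not specific to .NET. As a small illustration, Python's threading.Thread behaves the same way: calling start() a second time raises RuntimeError, and the fix is to create a fresh Thread object. The display function here is an empty placeholder mirroring the question's Display method.

```python
import threading


def display():
    pass  # placeholder for the thread's real work


thread = threading.Thread(target=display)
thread.start()
thread.join()  # the thread has now finished

try:
    thread.start()  # a finished thread cannot be started again
    restarted = True
except RuntimeError:
    restarted = False

# To run the work again, create a new Thread object.
replacement = threading.Thread(target=display)
replacement.start()
replacement.join()

print(restarted, thread.is_alive())
```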
Are you trying to wake the thread up before the 5 seconds are up? In that case you could try using Monitor (Wait, Pulse, etc.).
What does it mean when one says no polling is allowed when implementing your thread solution since it's wasteful, it has latency and it's non-deterministic? Threads should not use polling to signal each other.
EDIT
Based on your answers so far, I believe my threading implementation (taken from: http://www.albahari.com/threading/part2.aspx#_AutoResetEvent) below is not using polling. Please correct me if I am wrong.
using System;
using System.Threading;
using System.Collections.Generic;

class ProducerConsumerQueue : IDisposable {
    EventWaitHandle _wh = new AutoResetEvent (false);
    Thread _worker;
    readonly object _locker = new object();
    Queue<string> _tasks = new Queue<string>();

    public ProducerConsumerQueue() {
        _worker = new Thread (Work);
        _worker.Start();
    }

    public void EnqueueTask (string task) {
        lock (_locker) _tasks.Enqueue (task);
        _wh.Set();
    }

    public void Dispose() {
        EnqueueTask (null);     // Signal the consumer to exit.
        _worker.Join();         // Wait for the consumer's thread to finish.
        _wh.Close();            // Release any OS resources.
    }

    void Work() {
        while (true)
        {
            string task = null;
            lock (_locker)
                if (_tasks.Count > 0)
                {
                    task = _tasks.Dequeue();
                    if (task == null) return;
                }
            if (task != null)
            {
                Console.WriteLine ("Performing task: " + task);
                Thread.Sleep (1000);  // simulate work...
            }
            else
                _wh.WaitOne();         // No more tasks - wait for a signal
        }
    }
}
Your question is very unclear, but typically "polling" refers to periodically checking for a condition, or sampling a value. For example:
while (true)
{
    Task task = GetNextTask();
    if (task != null)
    {
        task.Execute();
    }
    else
    {
        Thread.Sleep(5000); // Avoid tight-looping
    }
}
Just sleeping is a relatively inefficient way of doing this - it's better if there's some coordination so that the thread can wake up immediately when something interesting happens, e.g. via Monitor.Wait/Pulse or Manual/AutoResetEvent... but depending on the context, that's not always possible.
In some contexts you may not want the thread to actually sleep - you may want it to become available for other work. For example, you might use a Timer of one sort or other to periodically poll a mailbox to see whether there's any incoming mail - but you don't need the thread to actually be sleeping when it's not checking; it can be reused by another thread-pool task.
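To make the contrast with the sleep-based loop concrete, here is a rough Python sketch of a consumer that blocks instead of polling. It assumes queue.Queue from the standard library, whose get() waits until an item arrives; the STOP sentinel and the performed list are invented for the illustration.

```python
import queue
import threading

tasks = queue.Queue()
performed = []
STOP = None  # sentinel telling the consumer to exit (an assumption of this sketch)


def consumer():
    while True:
        # get() blocks until an item is available: no sleep, no periodic
        # re-check, and the thread wakes as soon as something is put.
        task = tasks.get()
        if task is STOP:
            break
        performed.append(task)


worker = threading.Thread(target=consumer)
worker.start()

for name in ("a", "b", "c"):
    tasks.put(name)
tasks.put(STOP)
worker.join()

print(performed)
```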
Here you go: check out this website:
http://msdn.microsoft.com/en-us/library/dsw9f9ts%28VS.71%29.aspx
Synchronization Techniques
There are two approaches to synchronization, polling and using synchronization objects. Polling repeatedly checks the status of an asynchronous call from within a loop. Polling is the least efficient way to manage threads because it wastes resources by repeatedly checking the status of the various thread properties.
For example, the IsAlive property can be used when polling to see if a thread has exited. Use this property with caution because a thread that is alive is not necessarily running. You can use the thread's ThreadState property to get more detailed information about a thread's status. Because threads can be in more than one state at any given time, the value stored in ThreadState can be a combination of the values in the System.Threading.ThreadState enumeration. Consequently, you should carefully check all relevant thread states when polling. For example, if a thread's state indicates that it is not Running, it may be done. On the other hand, it may be suspended or sleeping.
Waiting for a Thread to Finish
The Thread.Join method is useful for determining if a thread has completed before starting another task. The Join method waits a specified amount of time for a thread to end. If the thread ends before the timeout, Join returns True; otherwise it returns False. For information on Join, see Thread.Join Method.
Polling sacrifices many of the advantages of multithreading in return for control over the order that threads run. Because it is so inefficient, polling is generally not recommended. A more efficient approach would use the Join method to control threads. Join causes a calling procedure to wait either until a thread is done or until the call times out if a timeout is specified. The name, join, is based on the idea that creating a new thread is a fork in the execution path. You use Join to merge separate execution paths into a single thread again.
One point should be clear: Join is a synchronous or blocking call. Once you call Join or a wait method of a wait handle, the calling procedure stops and waits for the thread to signal that it is done.
Sub JoinThreads()
    Dim Thread1 As New System.Threading.Thread(AddressOf SomeTask)
    Thread1.Start()
    Thread1.Join()      ' Wait for the thread to finish.
    MsgBox("Thread is done")
End Sub
These simple ways of controlling threads, which are useful when you are managing a small number of threads, are difficult to use with large projects. The next section discusses some advanced techniques you can use to synchronize threads.
Hope this helps.
PK
Polling can also refer to one of the four asynchronous patterns .NET uses for delegate execution.
The 4 types (I've taken these descriptions from this well explained answer) are:
1. Polling: waiting in a loop for IAsyncResult.IsCompleted to be true
2. I'll call you
3. You call me
4. I don't care what happens (fire and forget)
So for an example of 1:
Action<IAsyncResult> myAction = (IAsyncResult ar) =>
{
    // Send Nigerian Prince emails
    Console.WriteLine("Starting task");
    Thread.Sleep(2000);
    // Finished
    Console.WriteLine("Finished task");
};
IAsyncResult result = myAction.BeginInvoke(null,null,null);
while (!result.IsCompleted)
{
    // Do something while you wait
    Console.WriteLine("I'm waiting...");
}
There are alternative ways of polling, but in general it means "Are we there yet?", "Are we there yet?", "Are we there yet?"
"What does it mean when one says no polling is allowed when implementing your thread solution since it's wasteful, it has latency and it's non-deterministic. Threads should not use polling to signal each other."
I would have to see the context in which this statement was made to express an opinion on it either way. However, taken as-is it is patently false. Polling is a very common and very accepted strategy for signaling threads.
Pretty much all lock-free thread signaling strategies use polling in some form or another. This is clearly evident in how these strategies typically spin around in a loop until a certain condition is met.
The most frequently used scenario is the case of signaling a worker thread that it is time to terminate. The worker thread will periodically poll a bool flag at safe points to see if a shutdown was requested.
private volatile bool shutdownRequested;

void WorkerThread()
{
    while (true)
    {
        // Do some work here.
        // This is a safe point so see if a shutdown was requested.
        if (shutdownRequested) break;
        // Do some more work here.
    }
}
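A rough Python counterpart of this shutdown-flag pattern is sketched below. It swaps the volatile bool for a threading.Event, which is an assumption of this sketch and not something the answer prescribes; the started event and the iterations counter exist only so the example can be checked.

```python
import threading

shutdown_requested = threading.Event()  # plays the role of the volatile bool
started = threading.Event()             # lets the main thread know work began
iterations = 0


def worker_thread():
    global iterations
    while True:
        iterations += 1  # stand-in for "do some work here"
        started.set()
        # Safe point: poll the flag to see if a shutdown was requested.
        if shutdown_requested.is_set():
            break
        # Stand-in for "do some more work"; waiting on the event also
        # lets the worker react immediately when the flag is set.
        shutdown_requested.wait(0.01)


worker = threading.Thread(target=worker_thread)
worker.start()
started.wait()            # make sure at least one iteration has run
shutdown_requested.set()  # ask the worker to stop
worker.join(timeout=5)

print(iterations >= 1, worker.is_alive())
```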
I post a lot here about multithreading, and the great Stack Overflow community has helped me a lot in understanding it.
All the examples I have seen online only deal with one thread.
My application is a scraper for an insurance company (family company ... all free of charge). Anyway, the user is able to select how many threads they want to run. So let's say, for example, the user wants the application to scrape 5 sites at one time, and then later in the day he chooses 20 threads because his computer isn't doing anything else, so it has the resources to spare.
Basically the application builds a list of, say, 1000 sites to scrape. A thread goes off and does that, updating the UI and building the list.
When that's finished, another thread is called to start the scraping. Depending on the number of threads the user has set, it will create x number of threads.
What's the best way to create these threads? Should I create 1000 threads in a list and loop through them? If the user has set 5 threads to run, it would loop through 5 at a time.
I understand threading, but it's the application logic that is catching me out.
Any ideas or resources on the web that can help me out?
You could consider using a thread pool for that:
using System;
using System.Threading;

public class Example
{
    public static void Main()
    {
        ThreadPool.SetMaxThreads(100, 10);
        // Queue the task.
        ThreadPool.QueueUserWorkItem(new WaitCallback(ThreadProc));
        Console.WriteLine("Main thread does some work, then sleeps.");
        Thread.Sleep(1000);
        Console.WriteLine("Main thread exits.");
    }

    // This thread procedure performs the task.
    static void ThreadProc(Object stateInfo)
    {
        Console.WriteLine("Hello from the thread pool.");
    }
}
This scraper, does it use a lot of CPU when it's running?
If it does a lot of communication with these 1000 remote sites, downloading their pages, that may be taking more time than the actual analysis of the pages.
And how many CPU cores does your user have? If they have 2 (which is common these days) then beyond two simultaneous threads performing analysis, they aren't going to see any speed up.
So you probably need to "parallelize" the downloading of the pages. I doubt you need to do the same for the analysis of the pages.
Take a look into asynchronous IO, instead of explicit multi-threading. It lets you launch a bunch of downloads in parallel and then get called back when each one completes.
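As a loose illustration of the "launch many downloads, get called back as each completes" idea, here is a Python asyncio sketch. Nothing in it touches the network: download just sleeps to simulate I/O, and the URLs, delays and the on_done callback are all invented for the example. A real scraper would need an HTTP client in place of the sleep.

```python
import asyncio


async def download(url, delay):
    await asyncio.sleep(delay)  # stand-in for waiting on the network
    return f"<html for {url}>"


async def scrape(urls):
    completed = []

    def on_done(task):
        # Called back on the event loop as each download finishes.
        completed.append(task.result())

    tasks = []
    for i, url in enumerate(urls):
        task = asyncio.create_task(download(url, 0.01 * (len(urls) - i)))
        task.add_done_callback(on_done)
        tasks.append(task)

    # All downloads are in flight at once on a single thread.
    await asyncio.gather(*tasks)
    return completed


pages = asyncio.run(scrape(["a.example", "b.example", "c.example"]))
print(len(pages))
```

The point of the design is that waiting on many slow connections needs no extra threads; threads only matter for the CPU-bound analysis step.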
If you really just want the application, use something someone else already spent time developing and perfecting:
http://arachnode.net/
arachnode.net is a complete and comprehensive .NET web crawler for downloading, indexing and storing Internet content including e-mail addresses, files, hyperlinks, images, and Web pages.
Whether interested or involved in screen scraping, data mining, text mining, research or any other application where a high-performance crawling application is key to the success of your endeavors, arachnode.net provides the solution you need for success.
If you also want to write one yourself because it's a fun thing to write (I wrote one not long ago, and yes, it is a lot of fun), then you can refer to this PDF provided by arachnode.net, which explains in detail the theory behind a good web crawler:
http://arachnode.net/media/Default.aspx?Sort=Downloads&PageIndex=1
Download the pdf entitled: "Crawling the Web" (second link from top). Scroll to Section 2.6 entitled: "2.6 Multi-threaded Crawlers". That's what I used to build my crawler, and I must say, I think it works quite well.
I think this example is basically what you need.
public class WebScraper
{
    private readonly int totalThreads;
    private readonly List<System.Threading.Thread> threads;
    private readonly List<Exception> exceptions;
    private readonly object locker = new object();
    private volatile bool stop;

    public WebScraper(int totalThreads)
    {
        this.totalThreads = totalThreads;
        threads = new List<System.Threading.Thread>(totalThreads);
        exceptions = new List<Exception>();
        for (int i = 0; i < totalThreads; i++)
        {
            var thread = new System.Threading.Thread(Execute);
            thread.IsBackground = true;
            threads.Add(thread);
        }
    }

    public void Start()
    {
        foreach (var thread in threads)
        {
            thread.Start();
        }
    }

    public void Stop()
    {
        stop = true;
        foreach (var thread in threads)
        {
            if (thread.IsAlive)
            {
                thread.Join();
            }
        }
    }

    private void Execute()
    {
        try
        {
            while (!stop)
            {
                // Scrape away!
            }
        }
        catch (Exception ex)
        {
            lock (locker)
            {
                // You could have a thread checking this collection and
                // reporting it as you see fit.
                exceptions.Add(ex);
            }
        }
    }
}
The basic logic is:
You have a single queue in which you put the URLs to scrape. Then you create your threads, all of which have access to that queue object. Let the threads run a loop:
1. lock the queue
2. check if there are items in the queue; if not, unlock the queue and end the thread
3. dequeue the first item in the queue
4. unlock the queue
5. process the item
6. invoke an event that updates the UI (remember to lock the UI controller)
7. return to step 1
Just let the Threads do the "get stuff from the queue" part (pulling the jobs) instead of giving them the urls (pushing the jobs), that way you just say
YourThreadManager.StartThreads(numberOfThreadsTheUserWants);
and everything else happens automagically. See the other replies to find out how to create and manage the threads.
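The pull-based loop described in this answer can be sketched in Python as follows. The start_threads function is a hypothetical stand-in for YourThreadManager.StartThreads, queue.Queue does the lock/unlock steps internally, and upper-casing the string is a placeholder for the real scraping; the UI update step is left out.

```python
import queue
import threading


def start_threads(number_of_threads, urls):
    """Run `number_of_threads` workers that pull URLs until none are left."""
    jobs = queue.Queue()
    for url in urls:
        jobs.put(url)

    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            try:
                # Lock, check, dequeue and unlock in one thread-safe call.
                url = jobs.get_nowait()
            except queue.Empty:
                return  # queue is empty: end this thread
            processed = url.upper()  # placeholder for scraping the site
            with results_lock:
                results.append(processed)

    threads = [threading.Thread(target=worker) for _ in range(number_of_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


scraped = start_threads(3, [f"site{i}" for i in range(10)])
print(len(scraped))
```

Because the workers pull their own jobs, changing the thread count only changes how many workers are started; the queue handling stays the same.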
I solved a similar problem by creating a worker class that uses a callback to signal the main app that a worker is done. I create a queue of 1000 threads and then call a method that launches threads until the running-thread limit is reached, keeping track of the active threads with a dictionary keyed by each thread's ManagedThreadId. As each thread completes, the callback removes its thread from the dictionary and calls the thread launcher.
If a connection is dropped or times out, the callback puts the thread back into the queue. Lock around the queue and the dictionary. I create threads rather than using the thread pool because the overhead of creating a thread is insignificant compared to the connection time, and it allows me to have a lot more threads in flight. The callback also provides a convenient place to update the user interface, and even allows you to change the thread limit while it's running. I've had over 50 open connections at one time. Remember to increase the maxconnection setting in your app.config (the default is two).
I would use a queue and a condition variable and mutex, and start just the requested number of threads, for example, 5 or 20 (and not start 1,000).
Each thread blocks on the condition variable. When woken up, it dequeues the first item, unlocks the queue, works with the item, locks the queue and checks for more items. If the queue is empty, sleep on the condition variable. If not, unlock, work, repeat.
While the mutex is locked, it can also check if the user has requested the count of threads to be reduced. Just check if count > max_count, and if so, the thread terminates itself.
Any time you have more sites to queue, just lock the mutex and add them to the queue, then broadcast on the condition variable. Any threads that are not already working will wake up and take new work.
Any time the user increases the requested thread count, just start them up and they will lock the queue, check for work, and either sleep on the condition variable or get going.
Each thread will be continually pulling more work from the queue, or sleeping. You don't need more than 5 or 20.
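Here is one way the scheme in this answer might look in Python, where threading.Condition bundles the mutex and the condition variable together. The WorkerPool class, its method names, the closed flag and the doubling "work" are all assumptions made for this sketch, not code from the answer.

```python
import collections
import threading


class WorkerPool:
    def __init__(self, max_count):
        self.cond = threading.Condition()  # mutex + condition variable
        self.queue = collections.deque()
        self.max_count = max_count         # thread count the user asked for
        self.count = 0                     # threads currently alive
        self.closed = False
        self.done = []
        self.threads = []
        with self.cond:
            self._spawn_locked()

    def _spawn_locked(self):
        # Caller holds the lock; start threads up to the requested count.
        while self.count < self.max_count:
            t = threading.Thread(target=self._run)
            self.count += 1
            self.threads.append(t)
            t.start()

    def _run(self):
        while True:
            with self.cond:
                while (not self.queue and not self.closed
                       and self.count <= self.max_count):
                    self.cond.wait()       # sleep on the condition variable
                # Too many threads, or nothing left to do: terminate.
                if self.count > self.max_count or not self.queue:
                    self.count -= 1
                    return
                item = self.queue.popleft()
            result = item * 2              # work happens outside the lock
            with self.cond:
                self.done.append(result)

    def add(self, items):
        with self.cond:
            self.queue.extend(items)
            self.cond.notify_all()         # broadcast: wake idle threads

    def resize(self, max_count):
        with self.cond:
            self.max_count = max_count
            self._spawn_locked()           # grow if needed
            self.cond.notify_all()         # let surplus threads exit

    def close(self):
        with self.cond:
            self.closed = True
            self.cond.notify_all()
        for t in self.threads:
            t.join()


pool = WorkerPool(5)
pool.add(range(20))
pool.resize(2)   # user lowers the thread count while work is queued
pool.close()     # drain the queue, then stop
print(sorted(pool.done))
```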
Consider using the event-based asynchronous pattern (AsyncOperation and AsyncOperationManager Classes)
You might want to take a look at the ProcessQueue article on CodeProject.
Essentially, you'll want to create (and start) the number of threads that are appropriate, in your case that number comes from the user. Each of these threads should process a site, then find the next site needed to process. Even if you don't use the object itself (though it sounds like it would suit your purposes pretty well, though I'm obviously biased!) it should give you some good insight into how this sort of thing would be done.