I want to run multiple threads simultaneously (max 5 threads, for example), and whenever one finishes, a new one should start with different data (one finishes, one new one starts; two finish, two new ones start...).
The main for loop is in the main form, but it runs on a different thread so it doesn't block the UI.
When I run it, the program adds 5 web browser controls (as visual progress), and when a page is done loading, it removes the loaded ones.
The problem is that no more controls are added to the form after that.
Maybe the semaphore is not released properly to allow new ones to start, or am I missing something else?
And if I close the program, it doesn't exit. I think it gets blocked on WaitHandle.WaitOne because there are still more jobs to be done.
I removed some unneeded data for code clarity.
Semaphore pool = new Semaphore(5, 5);
Scraper[] scraper = new Scraper[5];
Gecko.GeckoWebBrowser wb = null;
int j = 0;
for (int i = 0; i < arrScrapeboxItems.Count; i++)
{
pool.WaitOne();
bool pustiMe = true;
while (pustiMe)
{
if (scraper[j] == null) scraper[j] = new Scraper();
if (scraper[j].tred == null)
{
ScrapeBoxItems sbi = (ScrapeBoxItems)arrScrapeboxItems[i];
doneEvents.Add(new ManualResetEvent(false)); // this is for WaitHandle.WaitAll after the for loop is done all the items
wb = new Gecko.GeckoWebBrowser();
PoolObjects po = new PoolObjects();
po.link = sbi.link;
// etc...
scraper[j].ThreadsCompleted += new Scraper.ThreadsHandler(frmMain_NextThreadItemsCompleted);
scraper[j].tred = new Thread(new ParameterizedThreadStart(scraper[j].Scrape));
scraper[j].tred.Start(po);
pustiMe = false;
if (j == maxThreads - 1)
j = 0;
else
j++;
break;
}
else if (scraper[j].tred.IsAlive) // if the thread is finished, make room for new thread
{
scraper[j] = null;
}
if (pustiMe) Thread.Sleep(1000);
}
}
// event from Scraper class
void frmMain_ThreadsCompleted()
{
pool.Release();
}
And the Scraper class looks like:
public void Scrape(object o)
{
po = (PoolObjects)o;
// do stuff with po
po.form.Invoke((MethodInvoker)delegate
{
po.form.Controls.Add(po.wb);
po.wb.DocumentCompleted += new EventHandler<Gecko.Events.GeckoDocumentCompletedEventArgs>(wb_DocumentCompleted);
po.wb.Navigate(po.link);
});
}
void wb_DocumentCompleted(object sender, Gecko.Events.GeckoDocumentCompletedEventArgs e)
{
var br = sender as Gecko.GeckoWebBrowser;
if (br.Url == e.Uri)
{
form.Controls.Remove(po.wb);
ThreadsCompleted();
manualReset.Set();
}
}
Either you have a typo or a huge bug. You have
else if (scraper[j].tred.IsAlive)
{
scraper[j] = null;
}
I think you want if (!scraper[j].tred.IsAlive). Otherwise, you'll end up overwriting an active Scraper reference in the array.
More to the point, trying to maintain that array of Scraper objects is causing you a lot of complication that you really don't need. You already have the semaphore controlling how many concurrent threads you can have, so the array of Scraper objects is unnecessary noise.
Also, you don't want a whole bunch of ManualResetEvent objects to wait on. WaitAll can't wait on more than 64 handles, so if you have more than that in your items list, WaitAll isn't going to do it for you. I show below a better way to make sure all the jobs are completed.
for (int i = 0; i < arrScrapeboxItems.Count; i++)
{
pool.WaitOne();
ScrapeBoxItems sbi = (ScrapeBoxItems)arrScrapeboxItems[i];
wb = new Gecko.GeckoWebBrowser();
PoolObjects po = new PoolObjects();
po.link = sbi.link;
// more initialization of po ...
// and then start the thread
Thread t = new Thread(ScrapeThreadProc);
t.Start(po);
}
// Here's how you wait for all of the threads to complete.
// You have your main thread (which is running here) call `WaitOne` on the semaphore 5 times:
for (int i = 0; i < 5; ++i)
{
pool.WaitOne();
}
private void ScrapeThreadProc(object o)
{
var po = (PoolObjects)o;
Scraper scraper = new Scraper();
// initialize your Scraper object
scraper.ThreadsCompleted += new Scraper.ThreadsHandler(frmMain_NextThreadItemsCompleted);
scraper.Scrape(po);
// scraping is done. Dispose of the scraper and the po.
// and then release the semaphore
pool.Release();
}
That should greatly simplify your code.
The idea behind having the main thread wait on the semaphore 5 times is pretty simple. If the main thread can acquire the semaphore 5 times without calling Release, then you know that there aren't any other jobs running.
There are other ways to do this, as well, but they would require some more involved restructuring of your code. You should look into the Task Parallel Library, specifically Parallel.ForEach, which will handle the threading for you. You can set the maximum number of concurrent threads to 5, so that you won't get too many threads going at once.
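A minimal sketch of the Parallel.ForEach idea, under stated assumptions: `ProcessAll`, `scrapeOne`, and the example.com links are hypothetical placeholders, and `Thread.Sleep` stands in for the real scraping work. The concurrency counter is only there to show that the limit of 5 is respected. Any work that touches the GeckoWebBrowser control would still have to go through `Invoke` on the UI thread.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelScrapeSketch
{
    // Processes every link with at most maxParallel running at once and
    // returns the highest concurrency that was actually observed.
    public static int ProcessAll(IEnumerable<string> links, int maxParallel, Action<string> scrapeOne)
    {
        int running = 0;
        int peak = 0;
        object gate = new object();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        // Parallel.ForEach blocks until every item has been processed,
        // so no separate "wait for all jobs" step is needed.
        Parallel.ForEach(links, options, link =>
        {
            int now = Interlocked.Increment(ref running);
            lock (gate) { if (now > peak) peak = now; }
            try { scrapeOne(link); }
            finally { Interlocked.Decrement(ref running); }
        });
        return peak;
    }

    public static void Main()
    {
        var links = new List<string>();
        for (int i = 0; i < 20; i++) links.Add("http://example.com/page" + i);

        // Thread.Sleep is a stand-in for the real per-link work.
        int peak = ProcessAll(links, 5, link => Thread.Sleep(50));
        Console.WriteLine("Peak concurrency: " + peak);
    }
}
```

Because the call blocks until all items are done, it would be run from the background thread that currently hosts the for loop, not from the UI thread.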
You could also do this using a producer/consumer setup with BlockingCollection or some other shared queue.
In both of those scenarios, you end up with up to 5 long-lived workers that cooperatively process items from the list. That is typically more efficient than creating one thread for each item.
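The producer/consumer variant could look roughly like the sketch below. It assumes a fixed number of worker threads pulling links from a `BlockingCollection<string>`; `ProcessAll` and `scrapeOne` are hypothetical names, and the real scraping work is replaced by `Thread.Sleep`.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

public static class WorkerPoolSketch
{
    // Starts workerCount threads that drain a shared queue, then waits for them.
    // Returns the number of items that were processed.
    public static int ProcessAll(IEnumerable<string> links, int workerCount, Action<string> scrapeOne)
    {
        int processed = 0;
        using (var queue = new BlockingCollection<string>())
        {
            var workers = new List<Thread>();
            for (int w = 0; w < workerCount; w++)
            {
                var worker = new Thread(() =>
                {
                    // GetConsumingEnumerable blocks while the queue is empty and
                    // ends once CompleteAdding was called and the queue is drained.
                    foreach (string link in queue.GetConsumingEnumerable())
                    {
                        scrapeOne(link);
                        Interlocked.Increment(ref processed);
                    }
                });
                worker.IsBackground = true;
                worker.Start();
                workers.Add(worker);
            }

            // Producer side: enqueue everything, then signal that no more items follow.
            foreach (string link in links) queue.Add(link);
            queue.CompleteAdding();

            // Joining the workers replaces the ManualResetEvent/WaitAll bookkeeping.
            foreach (Thread worker in workers) worker.Join();
        }
        return processed;
    }

    public static void Main()
    {
        var links = new List<string>();
        for (int i = 0; i < 20; i++) links.Add("http://example.com/page" + i);
        int done = ProcessAll(links, 5, link => Thread.Sleep(20));
        Console.WriteLine("Processed " + done + " links");
    }
}
```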
First about my goal:
I am importing a table with about 1000-5000 rows into a DataTable, which is bound to a DataGridView. For every row, a process has to run that takes about 5-10 seconds. After a single process has finished, I want to write the result back to the DataTable (result column).
Because these processes are independent, I want to use multithreading to speed things up.
This is an example structure of my current code:
// Will be created for each row
public class FooObject
{
public int RowIndex;
public string Name;
//...
}
// Limiting running tasks to 50
private Semaphore semaphore = new Semaphore(50, 50);
// The DataTable is set up at start-up of the App (columns etc)
private DataTable DtData { get; set; } = new DataTable();
// The button that starts the process
private void btnStartLongRun(object sender, EventArgs e)
{
// some init-stuff
StartRun();
}
private async void StartRun()
{
for (int rowIndex = 0; rowIndex < DtData.Rows.Count; rowIndex++)
{
// Creating a task to not block the UI
// Using semaphore here to not create objects
// for all lines before they get in use.
// Having this inside the real task it consumed
// a lot of ram (> 1GB)
await Task.Factory.StartNew(() =>
{
semaphore.WaitOne();
});
// The row to process
var currentRow = DtData.Rows[rowIndex];
// Creating an object from the row-data
FooObject foo = new FooObject()
{
RowIndex = rowIndex,
Name = currentRow["Name"].ToString()
};
// Not awaiting because I want multiple threads
// to run at the same time. The semaphore is
// handling this
TaskScheduler scheduler = TaskScheduler.Current;
Task.Factory.StartNew(() =>
{
// Per-row process
return ProcessFoo(foo);
}).ContinueWith((result) =>
{
FinishProcessFoo(result.Result);
}, CancellationToken.None, TaskContinuationOptions.OnlyOnRanToCompletion, scheduler);
}
}
private FooObject ProcessFoo(FooObject foo)
{
// the actual big process per line
}
private void FinishProcessFoo(FooObject foo)
{
// Locking here because I got broken index errors without
lock(DtGrid.Rows.SyncRoot)
{
// Getting the row that got processed
var procRow = DtData.Rows[foo.RowIndex];
// Writing the result to that row
procRow["Result"] = foo.Result;
// Raising the progressbar
pbData.Value++;
}
// Letting the next task start.
semaphore.Release();
}
The big problem:
In the beginning everything works fine. All threads run smoothly and do their job. But the longer the app runs, the more unresponsive it gets. It looks like the app is slowly starting to block more and more.
I started a test run with 5000 rows. It got stuck at around row 2000. Sometimes an error is even raised saying that the app isn't responding.
I don't have a lot of experience with multithreading, so maybe this code is totally bad. I appreciate any help here. I would also be happy to be pointed in another direction to get this running better.
Thank you very much.
Edit
If there is anything I can debug to help in here just tell me.
Edit 2
I already enabled all Common Language Runtime Exceptions to check if there is anything that's not raising an error. Nothing.
If you want to process up to 50 rows in parallel, you could consider using a Parallel.For with a MaxDegreeOfParallelism of 50:
Parallel.For(0, DtData.Rows.Count, new ParallelOptions() { MaxDegreeOfParallelism = 50 }, rowIndex =>
{
//...
});
Starting a new task just to call WaitOne on a Semaphore is a waste of time.
You are using the UI thread to coordinate thousands of async tasks. This is bad. Wrap your call to StartRun in a new task to avoid this.
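One way to avoid the extra task that only calls WaitOne is to await `SemaphoreSlim.WaitAsync` instead. The sketch below assumes that approach; `ProcessAllAsync` and `processRow` are hypothetical names, and plain integers stand in for the DataTable rows. In a WinForms app, the results would be written back to the DataTable on the UI thread after the await.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottleSketch
{
    // Runs processRow for every row index with at most maxParallel running at once.
    public static async Task<int[]> ProcessAllAsync(int rowCount, int maxParallel, Func<int, int> processRow)
    {
        var results = new int[rowCount];
        var pending = new List<Task>();
        using (var throttle = new SemaphoreSlim(maxParallel, maxParallel))
        {
            for (int rowIndex = 0; rowIndex < rowCount; rowIndex++)
            {
                // WaitAsync yields instead of blocking a thread,
                // so no task is needed just to wait for a free slot.
                await throttle.WaitAsync();

                int index = rowIndex; // copy for the closure
                pending.Add(Task.Run(() =>
                {
                    try { results[index] = processRow(index); }
                    finally { throttle.Release(); } // always free the slot, even on failure
                }));
            }
            await Task.WhenAll(pending);
        }
        return results;
    }

    public static void Main()
    {
        // Thread.Sleep stands in for the 5-10 second per-row process.
        int[] results = ProcessAllAsync(200, 50, i => { Thread.Sleep(10); return i * 2; })
            .GetAwaiter().GetResult();
        Console.WriteLine("Processed " + results.Length + " rows");
    }
}
```

Releasing the semaphore in a `finally` block means a failing row cannot permanently consume one of the slots, which is one possible cause of a run that slowly stalls.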
The better way of doing this is to divide the number of rows by the number of processors, then start one task per processor for just those rows. No need for Semaphore then.
I have a main thread which is controlling a windows form, upon pressing a button two threads are executed. One is used for recording information, the other is used for reading it. The idea behind putting these in threads is to enable the user to interact with the interface while they are executing.
Here is the creation of the two threads:
Thread recordThread = new Thread(() => RecordData(data));
recordThread.Name = "record";
recordThread.Start();
Thread readThread = new Thread(() => ReadData(data));
readThread.Name = "read";
readThread.Start();
The data is simply a Data-object that stores the data that is recorded during the recording.
The problem that I am facing is that the first thread is executed fine, the second refuses to run until the first one completes. Putting a breakpoint in the second threads function, ReadData lets me know that it is only called after the first thread is done with all of its recording.
I have been trying to solve this for a few hours now and I can't get my head around why it would do this. Adding
while(readThread.IsAlive) { }
right after the start will halt the execution of anything after that, and its state is Running. But it will not enter the given method.
Any ideas?
Edit:
The two functions that are called upon by the threads are;
private void RecordData(Data d)
{
int i = 0;
while (i < time * freq)
{
double[] data = daq.Read();
d.AddData(data);
i++;
}
}
private void ReadData(Data d)
{
UpdateLabelDelegate updateData =
new UpdateLabelDelegate(UpdateLabel);
int i = 0;
while (i < time * freq)
{
double[] data = d.ReadLastData();
this.Invoke(updateData, new object[] { data });
i++;
}
}
The data object has locking in both of the functions that are called: AddData and ReadLastData.
Here are the methods in the Data object.
public void AddData(double[] data)
{
lock (this)
{
int i = 0;
foreach (double d in data)
{
movementData[i].Add(d);
i++;
}
}
}
public double[] ReadLastData()
{
double[] data = new double[channels];
lock (this)
{
int i = 0;
foreach (List<double> list in movementData)
{
data[i] = list[list.Count - 1];
i++; // advance to the next channel; without this every value lands in data[0]
}
}
return data;
}
Looks like you have a race condition between your reading/writing. In your first thread you lock down the object whilst you add data to it and in the second thread you attempt to get an exclusive lock on it to start reading. However, the problem is the first thread is executing so fast that the second thread never really gets a chance to acquire the lock.
The solution to this problem really depends on what sort of behaviour you are after here. If you expect after every write you get a consecutive read then what you need to do is control the execution between the reading/writing operations e.g.
static AutoResetEvent canWrite = new AutoResetEvent(true); // default to true so the first write happens
static AutoResetEvent canRead = new AutoResetEvent(false);
...
private void RecordData(Data d)
{
int i = 0;
while (i < time * freq)
{
double[] data = daq.Read();
canWrite.WaitOne(); // wait for the second thread to finish reading
d.AddData(data);
canRead.Set(); // let the second thread know we have finished writing
i++;
}
}
private void ReadData(Data d)
{
UpdateLabelDelegate updateData =
new UpdateLabelDelegate(UpdateLabel);
int i = 0;
while (i < time * freq)
{
canRead.WaitOne(); // wait for the first thread to finish writing
double[] data = d.ReadLastData();
canWrite.Set(); // let the first thread know we have finished reading
this.Invoke(updateData, new object[] { data });
i++;
}
}
Could you try adding a Sleep inside RecordData?
Maybe it's just your (mono cpu??) windows operating system that doesn't let the second thread get its hand on cpu resources.
Don't do this:
lock (this)
Do something like this instead:
private object oLock = new object();
[...]
lock (this.oLock)
EDIT:
Could you try calls like this:
Thread recordThread = new Thread((o) => RecordData((Data)o));
recordThread.Name = "record";
recordThread.Start(data);
I have a logic problem I am not sure how to solve. Basically, I have a program that starts threads based on a numericUpDown value: if the user selects 5 in the numericUpDown box, 5 threads will start.
The problem is that the user also has a listbox they can fill in with info, which will be used in the threads.
So instead of looping 5 times based on the numericUpDown value, suppose the user enters 10 items in the listBox and chooses to use 5 threads. I then want all the listBox items to be queued, but only 5 should run at a time.
How would I accomplish this?
Oh, if it matters, this is how I start my threads:
Thread thread = new Thread(() => doTask(numeret));
thread.IsBackground = true;
thread.Start();
I believe you wish to use a ThreadPool, as explained here:
http://msdn.microsoft.com/en-us/library/system.threading.threadpool.aspx
You need to specify the number of threads to use, and then use ThreadPool.QueueUserWorkItem to queue your tasks.
Alternatively, you can use the parallel extensions to LINQ (PLINQ) to process the items in parallel and specify the .WithDegreeOfParallelism() value (which only sets the upper maximum):
itemsToProcess.AsParallel().WithDegreeOfParallelism(numThreads).ForAll(item =>
{
doTask(item);
});
Usually, something like this is done using worker threads. You create a list of work items (= your listbox entries):
List<WorkItem> myWorkItems = ...; // contains 10 items
And you create your threads. You do not, however, assign a work item to the thread yet (as you do in your example):
for (int i = 0; i < numberOfThreads; i++) { // creates 5 threads
var t = new Thread(doWork);
t.IsBackground = true;
t.Start();
}
Each thread runs a loop, checking for new work items (thread-safe!) and doing work, until no more work is to be done:
void doWork() {
while (true) {
WorkItem item;
lock(someSharedLockObject) {
if (myWorkItems.Count == 0)
break; // no more work to be done
item = myWorkItems[0];
myWorkItems.RemoveAt(0); // remove the item just taken, without searching the list
}
doTask(item);
}
}
This is similar to what the ThreadPool of the .NET Framework does. The ThreadPool, however, is designed to work best when the number of threads can be chosen by the Framework. The example above gives you full control over the number of threads (which seems to be what you want).
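The same worker-thread pattern can be sketched without an explicit lock by swapping in a `ConcurrentQueue<T>`. This is an assumption-laden variation, not the code from the answer: `RunAll` and the `doTask` delegate are hypothetical, and strings stand in for the listbox entries.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

public static class QueueWorkersSketch
{
    // Runs doTask on every item using exactly numberOfThreads worker threads.
    public static List<string> RunAll(IEnumerable<string> items, int numberOfThreads, Func<string, string> doTask)
    {
        var queue = new ConcurrentQueue<string>(items);
        var results = new ConcurrentBag<string>();
        var threads = new List<Thread>();

        for (int i = 0; i < numberOfThreads; i++)
        {
            var t = new Thread(() =>
            {
                string item;
                // TryDequeue is thread-safe and returns false once the queue is empty,
                // which ends this worker.
                while (queue.TryDequeue(out item))
                {
                    results.Add(doTask(item));
                }
            });
            t.IsBackground = true;
            t.Start();
            threads.Add(t);
        }

        foreach (Thread t in threads) t.Join();
        return new List<string>(results);
    }

    public static void Main()
    {
        var items = new List<string>();
        for (int i = 0; i < 10; i++) items.Add("item" + i); // e.g. 10 listbox entries
        List<string> done = RunAll(items, 5, s => s.ToUpperInvariant());
        Console.WriteLine("Finished " + done.Count + " items on 5 threads");
    }
}
```

This only works as written when all items are known before the workers start; if items can arrive later, a blocking queue is the better fit.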
Store the info from the listbox in a stack (for example).
Then, in the doTask() method : pop an element from the stack, do the stuff and do it again until the stack is empty.
Basically :
//Stack containing the info to process
Stack<string> infos = new Stack<string>();
//Dedicated lock object; the generic Stack<T> has no public SyncRoot property
readonly object infosLock = new object();
//Method for the thread
void doTask()
{
while(true)
{
string info;
//Threadsafe access to the stack
lock (infosLock)
{
//Exit the thread if there is no more info to process
if (infos.Count == 0) return;
info = infos.Pop();
}
//Do the actual stuff on the info
__DoStuff(info);
}
}
I have an object in my application which performs processing on the items in a collection in a background thread. When the object is created background processing of all existing items in the collection is triggered using the thread pool:
class CollectionProcessor
{
public CollectionProcessor()
{
// Not actually called during the constructor just put it here to simplify the code sample
Action process = new Action(this.Process);
process.BeginInvoke(ar => process.EndInvoke(ar), null);
}
void Process()
{
for (int i = 0; i < this.items.Count; i++)
{
this.ProcessItem(this.items[i]);
}
}
}
There is some extra code dotted around for notification callbacks but that is largely the gist of it.
New items can be added to this collection at any time and I need to make sure that those new items are processed. Notification of new items is provided by an event which is fired after the items have already been added to the collection. In the event handler for this event I need to asynchronously resume the processing of the new items in the collection while also:
Ensuring that I don't process the same item twice
Ensuring that the items are processed in the correct order
Avoiding queuing up lots of blocked background tasks
I also want to achieve this using a thread pool instead of using a dedicated thread - How should I do this? Obviously assume that access to this.items is thread-safe.
I believe I have figured out a reasonably neat way of doing this. The key is to note that if I had a dedicated background thread performing this processing then the solution is fairly easy and might look a little like this:
AutoResetEvent ev = new AutoResetEvent(false);
// Called on a background thread
void ThreadProc()
{
int lastProcessed = 0;
while (true)
{
// Perform our processing as before
for (int i = lastProcessed; i < this.items.Count; i++)
{
this.ProcessItem(this.items[i]);
lastProcessed = i + 1; // remember progress so no item is processed twice
}
// We have processed all items currently in the list, wait for some more
ev.WaitOne();
}
}
void OnNewItems()
{
ev.Set();
}
The missing link is the ThreadPool.RegisterWaitForSingleObject Method which allows us to convert this to using a thread pool instead of a dedicated thread:
int lastProcessed = 0;
void StartProcessing()
{
ThreadPool.RegisterWaitForSingleObject(
this.ev,
new WaitOrTimerCallback(WaitProc),
null, // All state stored in the class instance itself
-1, // Always wait indefinitely for new items
true // Only execute once - each callback registers a new wait handle ensuring
// that a maximum of 1 task is running Process at any one time
);
}
void WaitProc(object state, bool timedOut)
{
// Perform our processing as before
for (int i = lastProcessed; i < this.items.Count; i++)
{
this.ProcessItem(this.items[i]);
lastProcessed = i + 1; // remember progress so no item is processed twice
}
// We have processed all items currently in the list, wait for some more
this.StartProcessing();
}
This sets up a loop just as before except we aren't blocking a thread waiting for the reset event.
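For reference, here is a self-contained sketch of that pattern that can be run in a console app. It makes several assumptions: `IncrementalProcessor`, `Add`, and `ProcessedSnapshot` are hypothetical names, the items are plain integers, multiplying by 10 stands in for ProcessItem, and a simple lock guards the list instead of assuming it is thread-safe. For brevity it does not keep or unregister the returned `RegisteredWaitHandle`, which production code would normally do.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public class IncrementalProcessor
{
    private readonly List<int> items = new List<int>();
    private readonly List<int> processed = new List<int>();
    private readonly AutoResetEvent ev = new AutoResetEvent(false);
    private readonly object gate = new object();
    private int lastProcessed = 0;

    // Registers a one-shot wait; a pool thread runs WaitProc once ev is signaled.
    public void StartProcessing()
    {
        ThreadPool.RegisterWaitForSingleObject(ev, WaitProc, null, Timeout.Infinite, true);
    }

    // Adds an item and signals that there is new work.
    public void Add(int item)
    {
        lock (gate) { items.Add(item); }
        ev.Set();
    }

    public int[] ProcessedSnapshot()
    {
        lock (gate) { return processed.ToArray(); }
    }

    private void WaitProc(object state, bool timedOut)
    {
        while (true)
        {
            int item;
            lock (gate)
            {
                if (lastProcessed >= items.Count) break; // caught up with the list
                item = items[lastProcessed];
                lastProcessed++;
            }
            int result = item * 10; // stand-in for the real ProcessItem work
            lock (gate) { processed.Add(result); }
        }
        // Re-register only after finishing, so at most one callback runs at a time.
        StartProcessing();
    }

    public static void Main()
    {
        var p = new IncrementalProcessor();
        p.StartProcessing();
        p.Add(1);
        p.Add(2);
        p.Add(3);
        // Wait up to 5 seconds for the pool callback to catch up.
        SpinWait.SpinUntil(() => p.ProcessedSnapshot().Length == 3, 5000);
        Console.WriteLine(string.Join(",", p.ProcessedSnapshot()));
    }
}
```

Signals that arrive while a callback is running are not lost: the AutoResetEvent stays set, so the freshly registered wait fires again and picks up whatever was added in the meantime.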
I am trying to write a simple multithreaded program in C#. It has a button; pressing it creates a new label on the form, and then a for loop runs, displaying the loop value in the label. So if you press the button 3 times, it will create 3 threads with 3 labels on the form, each running its own loop.
When I press the button once, it works fine. But when I press it more than once to create more labels, it runs into the following problems:
As soon as button is pressed more than once, it stops the loop in previous thread and runs loop of new thread. If it is multithreaded then it should not stop first loop.
When the loop of the second label finishes, it gives the following error:
Object reference not set to an instance of an object
Here is my complete code. The line which throws error is at the end "mylabel[tcount].Text = i.ToString();".
Screenshot of program: http://i.imgur.com/IFMIs.png
Screenshot of code http://i.imgur.com/sIXtc.png
namespace WindowsFormsApplication2{
public partial class Form1 : Form{
public Form1(){
InitializeComponent();
}
private int tcount = 0;
private int y_point = 0;
Thread[] threads = new Thread[5];
Label[] mylabel = new Label[5];
private void button1_Click(object sender, EventArgs e){
threads[tcount] = new Thread(new ThreadStart(work));
threads[tcount].Start();
}
private void work(){
if (this.InvokeRequired){
this.Invoke(new MethodInvoker(delegate{
mylabel[tcount] = new Label();
mylabel[tcount].Text = "label" + tcount;
mylabel[tcount].Location = new System.Drawing.Point(0, y_point + 15);
y_point += 25;
this.Controls.Add(mylabel[tcount]);
for (int i = 0; i < 10000; i++){
mylabel[tcount].Text = i.ToString();
Application.DoEvents();
}
}));
}
tcount++;
}
}
}
If it is multithreaded then it should not stop first loop.
But it is not multithreaded.
this.Invoke(new MethodInvoker(delegate{
This switches via invoker the context back to the UI Thread, so while you open a lot of threads in the background, you basically then put all the processing back into one main thread.
This:
Application.DoEvents();
Then gives other queued work a chance. Still only on the UI thread.
And finally, you never parametrize the threads, so they all work on the same variables. There is only one non-thread-safe (no lock, no volatile) variable named tcount - bang.
Basically you demonstrate:
Your problem is not solvable multi threaded - any UI element manipulation HAS to happen on the UI thread (which is why you invoke) and as this is all you do you basically can not multithread.
You lack a basic understanding on how UI programs work with threads and the message pump.
You lack a basic understanding of variable scoping and access patterns between threads.
Back to reading documentation I would say.
The problem is the scope of tcount: all threads access the same instance of it, so as soon as the second thread starts, the first thread also writes into the second label.
Also, you invoke your whole worker method, which makes it run on the UI thread again, so it is not actually multithreaded.
Your worker method should look something like this:
private void work()
{
// Claim a unique index atomically so concurrent threads never share a label
int tIndex = Interlocked.Increment(ref tcount) - 1;
// Create and add the label on the UI thread, which must own all controls
Invoke((MethodInvoker)delegate()
{
mylabel[tIndex] = new Label();
mylabel[tIndex].Text = "label" + tIndex;
mylabel[tIndex].Location = new System.Drawing.Point(0, y_point + 15);
y_point += 25;
this.Controls.Add(mylabel[tIndex]);
});
for (int i = 0; i < 10000; i++)
{
//doWork
Invoke((MethodInvoker)delegate() { mylabel[tIndex].Text = i.ToString(); });
}
}
Yep, you need to copy tcount to a local variable. As soon as you hit the button a second time while the first thread has not yet terminated, the first thread starts manipulating the second label.