First about my goal:
I am importing a table with about 1000-5000 rows into a DataTable, which is bound to a DataGridView. For every row, a process has to run that takes about 5-10 seconds. After each process finishes, I want to write the result back to the DataTable (Result column).
Because the per-row processes are independent of each other, I want to use multithreading to speed things up.
This is an example structure of my current code:
// Will be created for each row
public class FooObject
{
    public int RowIndex;
    public string Name;
    //...
}

// Limiting running tasks to 50
private Semaphore semaphore = new Semaphore(50, 50);

// The DataTable is set up at start-up of the app (columns etc.)
private DataTable DtData { get; set; } = new DataTable();

// The button that starts the process
private void btnStartLongRun(object sender, EventArgs e)
{
    // some init-stuff
    StartRun();
}

private async void StartRun()
{
    for (int rowIndex = 0; rowIndex < DtData.Rows.Count; rowIndex++)
    {
        // Creating a task to not block the UI
        // Using semaphore here to not create objects
        // for all lines before they get in use.
        // Having this inside the real task it consumed
        // a lot of ram (> 1GB)
        await Task.Factory.StartNew(() =>
        {
            semaphore.WaitOne();
        });

        // The row to process
        var currentRow = DtData.Rows[rowIndex];

        // Creating an object from the row-data
        FooObject foo = new FooObject()
        {
            RowIndex = rowIndex,
            Name = currentRow["Name"].ToString()
        };

        // Not awaiting because I want multiple threads
        // to run at the same time. The semaphore is
        // handling this
        TaskScheduler scheduler = TaskScheduler.Current;
        Task.Factory.StartNew(() =>
        {
            // Per-row process
            return ProcessFoo(foo);
        }).ContinueWith((result) =>
        {
            FinishProcessFoo(result.Result);
        }, CancellationToken.None, TaskContinuationOptions.OnlyOnRanToCompletion, scheduler);
    }
}

private FooObject ProcessFoo(FooObject foo)
{
    // the actual big process per line
}

private void FinishProcessFoo(FooObject foo)
{
    // Locking here because I got broken index errors without
    lock (DtData.Rows.SyncRoot)
    {
        // Getting the row that got processed
        var procRow = DtData.Rows[foo.RowIndex];
        // Writing the result to that row
        procRow["Result"] = foo.Result;
        // Raising the progressbar
        pbData.Value++;
    }
    // Letting the next task start.
    semaphore.Release();
}
The big problem:
In the beginning everything works fine. All threads run smoothly and do their job. But the longer the app runs, the more unresponsive it gets. It looks like the app is slowly starting to block more and more.
I started a test run with 5000 rows. It got stuck at around row 2000. Sometimes an error even pops up saying that the app isn't responding.
I don't have much experience with multithreading, so maybe this code is totally bad. I appreciate any help. I would also be happy to be pointed in another direction to get this running better.
Thank you very much.
Edit
If there is anything I can debug to help in here just tell me.
Edit 2
I already enabled all Common Language Runtime Exceptions to check if there is anything that's not raising an error. Nothing.
If you want to process up to 50 rows in parallel, you could consider using a Parallel.For with a MaxDegreeOfParallelism of 50:
Parallel.For(0, DtData.Rows.Count, new ParallelOptions() { MaxDegreeOfParallelism = 50 }, rowIndex =>
{
//...
});
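To make the Parallel.For idea above a little more concrete, here is a minimal sketch. It assumes the per-row work can be expressed as a pure function of the row's data; ProcessName is a hypothetical stand-in for the real 5-10 second process, and ParallelRows is just an illustrative name. Each iteration writes only to its own slot of a result array, so the workers never touch the DataTable.

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelRows
{
    // Hypothetical stand-in for the real per-row process.
    public static string ProcessName(string name) => name.ToUpperInvariant();

    // Runs the per-row work in parallel and returns one result per input row.
    // Each iteration writes only to its own array slot, so no locking is needed.
    public static string[] ProcessAll(string[] names, int maxParallel)
    {
        var results = new string[names.Length];
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };
        Parallel.For(0, names.Length, options, i =>
        {
            results[i] = ProcessName(names[i]);
        });
        return results;
    }
}
```

Parallel.For blocks the calling thread until all iterations finish, so in a UI app the call would probably be wrapped, e.g. `var results = await Task.Run(() => ParallelRows.ProcessAll(names, 50));`, and the results copied into the Result column afterwards on the UI thread.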
Starting a new task just to call WaitOne on a Semaphore is a waste of time.
You are using the UI thread to coordinate thousands of async tasks. This is bad. Wrap your call to StartRun in a new task to avoid this.
The better way of doing this is to divide the number of rows by the number of processors, then start one task per processor for just those rows. No need for Semaphore then.
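A rough sketch of the "one task per processor" suggestion might look like the following. The names ChunkedRunner, Partition and RunAsync are made up for illustration, and processIndex stands in for the real per-row work. Using Environment.ProcessorCount as the worker count assumes the work is CPU-bound; if it mostly waits on I/O, a larger number may be more appropriate.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ChunkedRunner
{
    // Splits [0, count) into at most 'workers' contiguous ranges of near-equal size.
    public static List<(int Start, int End)> Partition(int count, int workers)
    {
        var ranges = new List<(int Start, int End)>();
        if (count <= 0 || workers <= 0) return ranges;
        workers = Math.Min(workers, count);
        int baseSize = count / workers;
        int remainder = count % workers;
        int start = 0;
        for (int w = 0; w < workers; w++)
        {
            // The first 'remainder' ranges get one extra element.
            int size = baseSize + (w < remainder ? 1 : 0);
            ranges.Add((start, start + size));
            start += size;
        }
        return ranges;
    }

    // Starts one task per range; each task handles only its own indices.
    public static Task RunAsync(int count, int workers, Action<int> processIndex)
    {
        var tasks = new List<Task>();
        foreach (var (start, end) in Partition(count, workers))
        {
            tasks.Add(Task.Run(() =>
            {
                for (int i = start; i < end; i++) processIndex(i);
            }));
        }
        return Task.WhenAll(tasks);
    }
}
```

From an async button handler this could be awaited as `await ChunkedRunner.RunAsync(rowCount, Environment.ProcessorCount, ProcessRow);`, where ProcessRow is whatever method does the per-row work without touching UI controls.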
Assume that a client app gets data from a server in near real time. What is the most efficient way to continuously update the UI based on the retrieved data? Think of multiple XAML controls, such as text fields that show numbers. They are updated for as long as the application is running and never stop unless the user decides so (say, by pressing a stop button or exiting the app).
Below is a simple example using the async and await keywords. Is that a good approach for my scenario, or would something like BackgroundWorker be better?
private async void Button_Click_Begin_RT_Update(object sender, RoutedEventArgs e)
{
    while (true)
        textField1.Text = await DoWork();
}

Task<string> DoWork()
{
    return Task.Run(() =>
    {
        return GetRandomNumberAsString();
    });
}
*for the sake of simplicity I use code-behind and not mvvm in my example
Your code is more or less OK if your GetRandomNumberAsString() takes at least 15ms to complete.
If it takes less than that and you want to minimize update latency (i.e. you don't want to just wait), you might want to (1) replace your per-operation Task.Run with an endless loop that runs entirely on a background thread, and (2) implement a throttling mechanism in that loop so that you only update your GUI (using e.g. Dispatcher.BeginInvoke()) at around 30-60 Hz.
P.S. The exact mechanism by which you update your GUI (data binding + INotifyPropertyChanged, or directly as in your code) is not relevant for performance.
Update: here's an example (untested)
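// Minimum interval between two GUI updates (20 ms = at most 50 updates per second)
static readonly TimeSpan updateFrequency = TimeSpan.FromMilliseconds( 20 );

void ThreadProc()
{
    Stopwatch sw = Stopwatch.StartNew();
    while( true )
    {
        string val = GetRandomNumberAsString();
        if( sw.Elapsed < updateFrequency )
            continue; // Too early to update
        sw.Restart();
        // BeginInvoke takes a Delegate, so the lambda must be wrapped in a delegate type
        Application.Current.Dispatcher.BeginInvoke( new Action( () => { textField1.Text = val; } ) );
    }
}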
This is fairly frustrating, since it seems to be well documented and I have accomplished it before, but I can't reproduce the same success. Sorry, I'll try to lay it all out clearly.
Visual Studio, C# Windows Forms. One main form has text fields, among other widgets.
At some point the application is "running" and therefore gathering data.
For the moment, I have started a one-second timer so that I can write simulated data into some fields. Eventually that timer will take the more rapid incoming data and update the screen only once per second. That's the request for the application: right now we update at the rate we receive data, which is a little over 70 Hz, and they don't want it that way. In addition, some other statistics will be computed, and those should drive the field updates. So, to keep it simple, I'm just trying to generate random data and update those fields at a 1 Hz rate, and then expand from there.
Definition and management of the timer: (this is all within the same class MainScreen)
System.Timers.Timer oneSecondTimer;
public UInt32 run_time = 0;
public int motion = 5;

private void InitializeTimers()
{
    this.oneSecondTimer = new System.Timers.Timer(1000);
    this.oneSecondTimer.Elapsed += new System.Timers.ElapsedEventHandler(oneSecondTimer_elapsed);
}

public void start_one_second_timer()
{
    run_time = 0;
    oneSecondTimer.Enabled = true;
}

public void stop_one_second_timer()
{
    oneSecondTimer.Enabled = false;
    run_time = 0;
}

Random mot = new Random();

private void oneSecondTimer_elapsed(object source, System.Timers.ElapsedEventArgs e)
{
    run_time++;
    motion = mot.Next(1, 10);
    this.oneSecondThread = new Thread(new ThreadStart(this.UpdateTextFields));
    this.oneSecondThread.Start();
}

private void UpdateTextFields()
{
    this.motionDisplay.Text = this.motion.ToString();
}
motionDisplay is just a TextBox on my main form. I get an InvalidOperationException that points me to the help topic on making thread-safe calls. I also tried BackgroundWorker and ended up with the same result. The exception details say that motionDisplay is accessed from a thread other than the thread it was created on.
So I'm looking for suggestions as to where my mistakes are.
Best regards. I'll continue to iterate on this and will update if I find a solution.
Use a System.Windows.Forms.Timer rather than a System.Timers.Timer. It fires its Tick event on the UI thread.
Don't create a new thread to update the UI; just do the update in the elapsed event handler.
Try this
private void UpdateTextFields()
{
    this.BeginInvoke(new EventHandler((s, e) => {
        this.motionDisplay.Text = this.motion.ToString();
    }));
}
This will properly marshal the call back to the main thread.
The thing with WinForms development is that controls are not thread-safe. Even reading a property such as .Text from another thread can cause this type of error. To make it even more frustrating, sometimes it will work at runtime without an exception, and other times it won't.
This is how I do it:
private delegate void UpdateMotionDisplayCallback(string text);

private void UpdateMotionDisplay(string text) {
    // InvokeRequired compares the thread ID of the
    // calling thread to the thread ID of the creating thread.
    // If these threads are different, it returns true.
    if (this.motionDisplay.InvokeRequired) {
        UpdateMotionDisplayCallback d = new UpdateMotionDisplayCallback(UpdateMotionDisplay);
        this.Invoke(d, new object[] { text });
    } else {
        this.motionDisplay.Text = text;
    }
}
When you want to update the text in motionDisplay just call:
UpdateMotionDisplay(this.motion.ToString());
I want to run multiple threads simultaneously (max 5 threads, for example), and when one finishes, a new one should start with different data (one finishes, one new one starts; two finish, two new ones start...).
The main for loop is in the main form, but it runs on a different thread so as not to block the UI.
When I run it, the program adds 5 web browser controls (as visual progress), and when a page is done loading it removes the loaded ones.
The problem is that no more controls are added to the form after that.
Maybe the semaphore is not released properly to allow new ones to start, or am I missing something else?
And if I close the program, it doesn't exit. I think it gets blocked on WaitHandle.WaitOne because there are still more jobs to be done.
I removed some unneeded code for clarity.
Semaphore pool = new Semaphore(5, 5);
Scraper[] scraper = new Scraper[5];
Gecko.GeckoWebBrowser wb = null;
int j = 0;
for (int i = 0; i < arrScrapeboxItems.Count; i++)
{
    pool.WaitOne();
    bool pustiMe = true;
    while (pustiMe)
    {
        if (scraper[j] == null) scraper[j] = new Scraper();
        if (scraper[j].tred == null)
        {
            ScrapeBoxItems sbi = (ScrapeBoxItems)arrScrapeboxItems[i];
            doneEvents.Add(new ManualResetEvent(false)); // this is for WaitHandle.WaitAll after the for loop is done all the items
            wb = new Gecko.GeckoWebBrowser();
            PoolObjects po = new PoolObjects();
            po.link = sbi.link;
            // etc...
            scraper[j].ThreadsCompleted += new Scraper.ThreadsHandler(frmMain_NextThreadItemsCompleted);
            scraper[j].tred = new Thread(new ParameterizedThreadStart(scraper[j].Scrape));
            scraper[j].tred.Start(po);
            pustiMe = false;
            if (j == maxThreads - 1)
                j = 0;
            else
                j++;
            break;
        }
        else if (scraper[j].tred.IsAlive) // if the thread is finished, make room for new thread
        {
            scraper[j] = null;
        }
        if (pustiMe) Thread.Sleep(1000);
    }
}

// event from Scraper class
void frmMain_ThreadsCompleted()
{
    pool.Release();
}
And the Scraper class looks like this:
public void Scrape(object o)
{
    po = (PoolObjects)o;
    // do stuff with po
    po.form.Invoke((MethodInvoker)delegate
    {
        po.form.Controls.Add(po.wb);
        po.wb.DocumentCompleted += new EventHandler<Gecko.Events.GeckoDocumentCompletedEventArgs>(wb_DocumentCompleted);
        po.wb.Navigate(po.link);
    });
}

void wb_DocumentCompleted(object sender, Gecko.Events.GeckoDocumentCompletedEventArgs e)
{
    var br = sender as Gecko.GeckoWebBrowser;
    if (br.Url == e.Uri)
    {
        form.Controls.Remove(po.wb);
        ThreadsCompleted();
        manualReset.Set();
    }
}
Either you have a typo or a huge bug. You have
    else if (scraper[j].tred.IsAlive)
    {
        scraper[j] = null;
    }
I think you want if (!scraper[j].tred.IsAlive). Otherwise, you'll end up overwriting an active Scraper reference in the array.
More to the point, trying to maintain that array of Scraper objects is causing you a lot of complication that you really don't need. You already have the semaphore controlling how many concurrent threads you can have, so the array of Scraper objects is unnecessary noise.
Also, you don't want a whole bunch of ManualResetEvent objects to wait on. WaitAll can't wait on more than 64 handles, so if you have more than that in your items list, WaitAll isn't going to do it for you. I show below a better way to make sure all the jobs are completed.
for (int i = 0; i < arrScrapeboxItems.Count; i++)
{
    pool.WaitOne();
    ScrapeBoxItems sbi = (ScrapeBoxItems)arrScrapeboxItems[i];
    wb = new Gecko.GeckoWebBrowser();
    PoolObjects po = new PoolObjects();
    po.link = sbi.link;
    // more initialization of po ...
    // and then start the thread
    Thread t = new Thread(ScrapeThreadProc);
    t.Start(po);
}

// Here's how you wait for all of the threads to complete.
// You have your main thread (which is running here) call `WaitOne` on the semaphore 5 times:
for (int i = 0; i < 5; ++i)
{
    pool.WaitOne();
}

private void ScrapeThreadProc(object o)
{
    var po = (PoolObjects)o;
    Scraper scraper = new Scraper();
    // initialize your Scraper object
    scraper.ThreadsCompleted += new Scraper.ThreadsHandler(frmMain_NextThreadItemsCompleted);
    scraper.Scrape(po);
    // scraping is done. Dispose of the scraper and the po.
    // and then release the semaphore
    pool.Release();
}
That should greatly simplify your code.
The idea behind having the main thread wait on the semaphore 5 times is pretty simple. If the main thread can acquire the semaphore 5 times without calling Release, then you know that there aren't any other jobs running.
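The drain idea can be tried in isolation with a small console-style sketch. SemaphoreDrain and RunJobs are hypothetical names, and the Thread.Sleep call is only a placeholder for real work; the point is that once the launching thread has re-acquired every slot, no job can still be holding one.

```csharp
using System;
using System.Threading;

public static class SemaphoreDrain
{
    // Runs 'jobCount' jobs with at most 'maxConcurrent' running at once and
    // returns how many jobs had completed once every slot was re-acquired.
    public static int RunJobs(int jobCount, int maxConcurrent)
    {
        int completed = 0;
        using (var pool = new Semaphore(maxConcurrent, maxConcurrent))
        {
            for (int i = 0; i < jobCount; i++)
            {
                pool.WaitOne();                    // blocks while all slots are busy
                var t = new Thread(() =>
                {
                    try
                    {
                        Thread.Sleep(10);          // placeholder for the real work
                        Interlocked.Increment(ref completed);
                    }
                    finally
                    {
                        pool.Release();            // always give the slot back
                    }
                });
                t.IsBackground = true;
                t.Start();
            }

            // Drain: holding every slot means no job still owns one.
            for (int i = 0; i < maxConcurrent; i++)
            {
                pool.WaitOne();
            }
        }
        return completed;
    }
}
```

Releasing in a finally block matters here: if a job throws without releasing, the drain loop would wait forever, which is one possible way to end up with a program that never exits.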
There are other ways to do this, as well, but they would require some more involved restructuring of your code. You should look into the Task Parallel Library, specifically Parallel.ForEach, which will handle the threading for you. You can set the maximum number of concurrent threads to 5, so that you won't get too many threads going at once.
You could also do this using a producer/consumer setup with BlockingCollection or some other shared queue.
In both of those scenarios, you end up creating 5 persistent threads that cooperatively process items from the list. That is typically more efficient than creating one thread for each item.
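As an illustration of the producer/consumer variant, here is a minimal sketch built on BlockingCollection. ScrapeQueue is a made-up name, and the scrape delegate is a hypothetical stand-in for the real per-link work (it must not touch UI controls directly). A fixed number of long-lived threads pull links until the producer signals that no more are coming.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

public static class ScrapeQueue
{
    // Processes every link with 'workerCount' long-lived threads and
    // returns the collected results (in no particular order).
    public static List<string> ProcessAll(IEnumerable<string> links, int workerCount, Func<string, string> scrape)
    {
        var results = new ConcurrentBag<string>();
        using (var queue = new BlockingCollection<string>())
        {
            var workers = new List<Thread>();
            for (int i = 0; i < workerCount; i++)
            {
                var t = new Thread(() =>
                {
                    // Blocks until an item is available; the loop ends after
                    // CompleteAdding() has been called and the queue is empty.
                    foreach (var link in queue.GetConsumingEnumerable())
                    {
                        results.Add(scrape(link));
                    }
                });
                t.IsBackground = true;
                t.Start();
                workers.Add(t);
            }

            foreach (var link in links) queue.Add(link);   // producer side
            queue.CompleteAdding();                        // no more items will arrive

            foreach (var t in workers) t.Join();           // wait for all consumers
        }
        return new List<string>(results);
    }
}
```

With this shape there is no semaphore and no array of Scraper objects to keep in sync; the number of threads alone bounds the concurrency.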
I have a logic problem that I'm not sure how to solve. Basically, I have a program that starts threads based on a numericUpDown value: if the user selects 5 in the numericUpDown box, 5 threads will start.
The problem is that the user also has a listbox they can fill with info, which will be used in the threads.
So instead of looping 5 times based on the numericUpDown value, what I want is this: say the user enters 10 items in the listBox and selects 5 threads. I then want all the listBox items to be queued, but only 5 to run at a time.
How would I accomplish this?
Oh, if it matters, this is how I start my threads:
Thread thread = new Thread(() => doTask(numeret));
thread.IsBackground = true;
thread.Start();
I believe you wish to use a ThreadPool, as explained here:
http://msdn.microsoft.com/en-us/library/system.threading.threadpool.aspx
You need to specify the number of threads to use, and then use ThreadPool.QueueUserWorkItem to queue your tasks.
Alternatively, you can use Parallel LINQ (PLINQ) to process the items in parallel and call .WithDegreeOfParallelism() to cap the concurrency (it only sets an upper bound):
itemsToProcess.AsParallel().WithDegreeOfParallelism(numThreads).ForAll(item =>
{
    doTask(item);
});
Usually, something like this is done using worker threads. You create a list of work items (= your listbox entries):
List<WorkItem> myWorkItems = ...; // contains 10 items
And you create your threads. You do not, however, assign a work item to the thread yet (as you do in your example):
for (int i = 0; i < numberOfThreads; i++) { // creates 5 threads
    var t = new Thread(doWork);
    t.IsBackground = true;
    t.Start();
}
Each thread runs a loop, checking for new work items (thread-safe!) and doing work, until no more work is to be done:
void doWork() {
    while (true) {
        WorkItem item;
        lock (someSharedLockObject) {
            if (myWorkItems.Count == 0)
                break; // no more work to be done
            item = myWorkItems[0];
            myWorkItems.Remove(item);
        }
        doTask(item);
    }
}
This is similar to what the ThreadPool of the .NET Framework does. The ThreadPool, however, is designed to work best when the number of threads can be chosen by the framework. The example above gives you full control over the number of threads (which seems to be what you want).
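The same worker-thread pattern can also be written without an explicit lock by letting ConcurrentQueue do the synchronization. This is only a sketch under the assumption that all work items are known up front; WorkerPool and Run are illustrative names, and doTask is whatever the per-item method is.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

public static class WorkerPool
{
    // Runs 'doTask' once per item using 'threadCount' threads.
    // With fewer items than threads, the surplus threads exit immediately.
    public static void Run<T>(IEnumerable<T> items, int threadCount, Action<T> doTask)
    {
        var queue = new ConcurrentQueue<T>(items);
        var threads = new List<Thread>();
        for (int i = 0; i < threadCount; i++)
        {
            var t = new Thread(() =>
            {
                // TryDequeue is atomic, so each item goes to exactly one thread.
                while (queue.TryDequeue(out T item))
                {
                    doTask(item);
                }
            });
            t.IsBackground = true;
            t.Start();
            threads.Add(t);
        }
        foreach (var t in threads) t.Join();   // block until the queue is drained
    }
}
```

Because Run blocks until everything is done, in a forms application it would probably be called from a background thread or Task rather than directly from a button handler.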
Store the info from the listbox in a stack (for example).
Then, in the doTask() method : pop an element from the stack, do the stuff and do it again until the stack is empty.
Basically :
//Stack containing the info to process
Stack<string> infos = new Stack<string>();

//Method for the thread
void doTask()
{
    while (true)
    {
        string info;
        //Thread-safe access to the stack
        //(Stack<T> has no public SyncRoot, so lock on the stack instance itself)
        lock (infos)
        {
            //Exit the thread if there is no more info to process
            if (infos.Count == 0) return;
            info = infos.Pop();
        }
        //Do the actual stuff on the info
        __DoStuff(info);
    }
}
I have an object in my application which performs processing on the items in a collection in a background thread. When the object is created background processing of all existing items in the collection is triggered using the thread pool:
class CollectionProcessor
{
    public CollectionProcessor()
    {
        // Not actually called during the constructor just put it here to simplify the code sample
        Action process = new Action(this.Process);
        process.BeginInvoke(ar => process.EndInvoke(ar), null);
    }

    void Process()
    {
        for (int i = 0; i < this.items.Count; i++)
        {
            this.ProcessItem(this.items[i]);
        }
    }
}
There is some extra code dotted around for notification callbacks but that is largely the gist of it.
New items can be added to this collection at any time, and I need to make sure that those new items are processed. Notification of new items is provided by an event which is fired after the items have already been added to the collection. In the event handler for this event I need to asynchronously resume the processing of the new items in the collection while also:
Ensuring that I don't process the same item twice
Ensuring that the items are processed in the correct order
Avoiding queuing up lots of blocked background tasks
I also want to achieve this using a thread pool instead of using a dedicated thread - How should I do this? Obviously assume that access to this.items is thread-safe.
I believe I have figured out a reasonably neat way of doing this. The key is to note that if I had a dedicated background thread performing this processing, then the solution would be fairly easy and might look a little like this:
AutoResetEvent ev = new AutoResetEvent(false);

// Called on a background thread
void ThreadProc()
{
    int lastProcessed = 0;
    while (true)
    {
        // Perform our processing as before
        for (int i = lastProcessed; i < this.items.Count; i++)
        {
            this.ProcessItem(this.items[i]);
            lastProcessed = i + 1; // remember progress so no item is processed twice
        }
        // We have processed all items currently in the list, wait for some more
        ev.WaitOne();
    }
}

void OnNewItems()
{
    ev.Set();
}
The missing link is the ThreadPool.RegisterWaitForSingleObject Method which allows us to convert this to using a thread pool instead of a dedicated thread:
int lastProcessed = 0;

void StartProcessing()
{
    ThreadPool.RegisterWaitForSingleObject(
        this.ev,
        new WaitOrTimerCallback(WaitProc),
        null, // All state stored in the class instance itself
        -1,   // Always wait indefinitely for new items
        true  // Only execute once - each callback registers a new wait handle ensuring
              // that a maximum of 1 task is running Process at any one time
    );
}

void WaitProc(object state, bool timedOut)
{
    // Perform our processing as before
    for (int i = lastProcessed; i < this.items.Count; i++)
    {
        this.ProcessItem(this.items[i]);
        lastProcessed = i + 1; // remember progress so no item is processed twice
    }
    // We have processed all items currently in the list, wait for some more
    this.StartProcessing();
}
This sets up a loop just as before except we aren't blocking a thread waiting for the reset event.
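For experimenting with this pattern outside the real application, a self-contained sketch could look like the following. IncrementalProcessor, Add and Snapshot are hypothetical names, the items are plain ints, and "processing" just copies each item into a second list so that order and exactly-once behaviour can be inspected. Access to the shared lists is guarded by a lock here, since the sketch cannot assume a thread-safe collection.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public sealed class IncrementalProcessor
{
    private readonly object gate = new object();
    private readonly List<int> items = new List<int>();       // guarded by 'gate'
    private readonly List<int> processed = new List<int>();   // guarded by 'gate'
    private readonly AutoResetEvent newItems = new AutoResetEvent(false);
    private int lastProcessed;                                // guarded by 'gate'

    public IncrementalProcessor() { RegisterWait(); }

    // Adds an item and signals that there is new work.
    public void Add(int item)
    {
        lock (gate) { items.Add(item); }
        newItems.Set();
    }

    // Returns a copy of everything processed so far, in processing order.
    public List<int> Snapshot()
    {
        lock (gate) { return new List<int>(processed); }
    }

    private void RegisterWait()
    {
        // executeOnlyOnce = true, so at most one callback is in flight at a time.
        ThreadPool.RegisterWaitForSingleObject(newItems, OnSignal, null, Timeout.Infinite, true);
    }

    private void OnSignal(object state, bool timedOut)
    {
        while (true)
        {
            int item;
            lock (gate)
            {
                if (lastProcessed >= items.Count) break;   // caught up
                item = items[lastProcessed];
            }
            // Real per-item work would go here, outside the lock.
            lock (gate)
            {
                processed.Add(item);
                lastProcessed++;
            }
        }
        RegisterWait();   // re-arm; fires at once if Add() signalled in the meantime
    }
}
```

If an item arrives between the final check and the re-registration, the AutoResetEvent stays signalled, so the newly registered wait fires immediately and the item is not lost. The sketch never unregisters the returned RegisteredWaitHandle, which a long-lived implementation would likely want to do.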