I have tried so many variants of this code. I'm receiving the same issue no matter what. The UI updating starts fine and then stalls until the entire process is complete. Can someone point me in the right direction?
The scenario
In a WPF application we will be calling the same API thousands of times with different parameters passed. We need to collect all the responses and do something.
Sample code
List<Task> tasks = new List<Task>();
for (int i = 1; i <= iterations; i++)
{
Task t = SampleTask(new SampleTaskParameterCollection { TaskId = i, Locker = locker, MinSleep = minSleep, MaxSleep = maxSleep });
tasks.Add(t);
}
Task.WhenAll(tasks);
private void SampleTask(SampleTaskParameterCollection parameters)
{
int sleepTime = rnd.Next(parameters.MinSleep, parameters.MaxSleep);
Thread.Sleep(sleepTime);
Application.Current.Dispatcher.BeginInvoke(new Action(() =>
{
lock (parameters.Locker)
{
ProgressBar1.Value = ProgressBar1.Value + 1;
LogTextbox.Text = LogTextbox.Text + Environment.NewLine + "Task " + parameters.TaskId + " slept for " + sleepTime + "ms and has now completed.";
}
LogTextbox.ScrollToEnd();
if (ProgressBar1.Maximum == ProgressBar1.Value)
{
RunSlowButton.IsEnabled = true;
RunFastButton.IsEnabled = true;
ProgressBar1.Value = 0;
}
}), System.Windows.Threading.DispatcherPriority.Send);
}
The current repo is located on GitHub. Look at the SimpleWindow.
Do not create thousands of tasks - this will cause immense performance problems.
Instead, use something like Parallel.For() to limit the number of tasks that run simultaneously; for example:
Parallel.For(1,
iterations + 1,
(index) =>
{
SampleTask(new SampleTaskParameterCollection { TaskId = index, Locker = locker, MinSleep = minSleep, MaxSleep = maxSleep });
});
Also if the UI updates take longer than the interval between the calls to BeginInvoke() then the invokes will begin to be queued up and things will get nasty.
To solve that, you could use a counter in your SampleTask() to only actually update the UI once every N calls (with a suitable value for N).
However, note that to avoid threading issues you'd have to use Interlocked.Increment() (or some other lock) when incrementing and checking the value of the counter. You'd also have to ensure that you updated the UI one last time when all the work is done.
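The counter idea above can be sketched roughly as follows. This is a minimal console sketch, not the asker's code: the class name `ThrottledProgressDemo`, the `reportEvery` parameter and the `report` callback are assumptions made for illustration, and in the WPF application the callback would wrap `Dispatcher.BeginInvoke(...)`.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledProgressDemo
{
    // Runs `iterations` work items and calls `report` once every
    // `reportEvery` completions, plus once for the very last item.
    // Returns how many times `report` was called.
    public static int Run(int iterations, int reportEvery, Action<int> report)
    {
        int completed = 0; // shared counter, only touched via Interlocked
        int reports = 0;

        Parallel.For(0, iterations, _ =>
        {
            Thread.Sleep(1); // stand-in for the real API call

            // Increment and read the counter atomically.
            int done = Interlocked.Increment(ref completed);
            if (done % reportEvery == 0 || done == iterations)
            {
                Interlocked.Increment(ref reports);
                report(done); // in WPF: Dispatcher.BeginInvoke(...) goes here
            }
        });

        return reports;
    }

    public static void Main()
    {
        int reports = Run(1000, 50, done => Console.WriteLine(done + " of 1000 done"));
        Console.WriteLine("UI updates requested: " + reports);
    }
}
```

Because each value returned by `Interlocked.Increment()` is seen by exactly one thread, the final update (when `done == iterations`) fires exactly once, which covers the "one last time" requirement.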
THERE'S AN UPDATE BELOW THIS INITIAL QUESTION
I have a query that pulls in about 90,000 header records. I then want to iterate through that result set to get detail data for each header retrieved. If I process it linearly it takes close to an hour and a half. If I parallelize it, I can get it done in 11 minutes. However, there are no screen updates. I have done multi-threaded applications lots of times, and have always been successful with doing things like:
this.lblStatus.Invoke(new MethodInvoker(() => this.lblStatus.Text = "Updating Standards Docs"));
However, this appears to really screw up a Parallel loop. When I used that method for some screen updates, the Parallel loop never actually finished. So I need another method.
I've been trying:
Task.Factory.StartNew(() =>
{
OrderablePartitioner<PayrollHeader> partitioner = Partitioner.Create(query, EnumerablePartitionerOptions.NoBuffering);
Parallel.ForEach(partitioner, thisCheck =>
{
Interlocked.Increment(ref iChckNo);
lock (_status)
{
_status.ProcessMsg = "Voucher " + thisCheck.VoucherNumber;
_status.ProcessName = thisCheck.EmployeeName;
_status.CurrentRec = iChckNo;
dtSpan = DateTime.Now.Subtract(dtSpanStart);
_status.TimeMsg = string.Format("Elapsed {0}:{1}:{2}", dtSpan.Hours, dtSpan.Minutes, dtSpan.Seconds);
}
BeginInvoke((Action) (() =>
{
lblVoucher.Text = _status.ProcessMsg;
lblName.Text = _status.ProcessName;
lblCount.Text = string.Format("Record {0} of {1}", _status.CurrentRec, _status.TotalRecs);
lblTime.Text = _status.TimeMsg;
Application.DoEvents();
}));
thisCheck.GetDetails();
});
}).Wait();
The Wait on the Task is there because I need to do something else with the query afterwards, which I'll eventually put into a ContinueWith statement. For now I just really need to get the screen update to work.
I know all about cross thread corruption, which is why I'm trying to use the Invoker method... I firmly believe long running processes still need to keep the user informed, which is why I'm attempting this.
BTW, it's a WinForms app, not a WPF app. Any help at all would be greatly appreciated...
UPDATE: Someone wanted to see the updated code, with IProgress in it.
Status _status = new Status {TotalRecs = query.Count};
var progress = new Progress<Status>(msg => WriteStatusUpdate(msg));
Task.Run(() =>
{
OrderablePartitioner<PayrollHeader> partitioner = Partitioner.Create(query, EnumerablePartitionerOptions.NoBuffering);
Parallel.ForEach(partitioner, thisCheck =>
{
lock (_status)
{
_status.ProcessMsg = "Voucher " + thisCheck.VoucherNumber;
_status.ProcessName = thisCheck.EmployeeName;
_status.CurrentRec = ++iChckNo;
dtSpan = DateTime.Now.Subtract(dtSpanStart);
_status.TimeMsg = string.Format("Elapsed {0}:{1}:{2}", dtSpan.Hours, dtSpan.Minutes, dtSpan.Seconds);
}
((IProgress<Status>) progress).Report(_status);
thisCheck.GetDetails();
});
}).Wait();
private void WriteStatusUpdate(Status _status)
{
lblVoucher.Text = _status.ProcessMsg;
lblVoucher.Refresh();
lblName.Text = _status.ProcessName;
lblName.Refresh();
lblCount.Text = string.Format("Records {0} of {1}", _status.CurrentRec, _status.TotalRecs);
lblCount.Refresh();
lblTime.Text = _status.TimeMsg;
lblTime.Refresh();
}
The code to update the screen never gets called...
Don't try to update the UI from inside the parallel loop. It's not just that you can't update the UI from a background thread; it also results in ugly and unmaintainable code. The parallel loop should do the processing. Reporting should be performed by someone else.
The .NET Framework provides the IProgress<T> interface to report progress, and the default implementation, Progress<T>, raises an event or calls a delegate on the thread that created it, e.g. the UI thread. This results in much simpler code, e.g.:
var stopwatch = Stopwatch.StartNew();
var progressImpl=new Progress<Tuple<int,string,string>>(
msg=>ReportProgress(msg,stopwatch));
IProgress<Tuple<int,string,string>> progress=progressImpl;
var partitioner = Partitioner.Create(query, EnumerablePartitionerOptions.NoBuffering);
Task.Run(()=> Parallel.ForEach(partitioner, thisCheck =>
{
....
var msg=Tuple.Create(iChckNo,thisCheck.VoucherNumber,thisCheck.EmployeeName);
progress.Report(msg);
...
})
);
...
private void ReportProgress(Tuple<int,string,string> msg,Stopwatch stopwatch)
{
_status.ProcessMsg = "Voucher " + msg.Item2;
_status.ProcessName = msg.Item3;
_status.CurrentRec = msg.Item1;
_status.TimeMsg = string.Format("Elapsed {0:c}", stopwatch.Elapsed);
}
I'm being very lazy here by using a Tuple<int,string,string> instead of a more specific class.
Messages sent from inside the parallel loop will be marshaled on the UI thread by Progress<T> and the ReportProgress function will be called on the UI thread itself.
The cast to IProgress<T> is necessary because the Report method is explicitly implemented. This is a safety measure to prevent programmers from coding against the implementation itself.
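A likely reason the question's update code "never gets called" is the `.Wait()` on the task: Progress<T> posts its callbacks to the thread that created it, and a UI thread that is blocked in `Wait()` cannot run them. The following is a minimal console sketch of the non-blocking shape, under stated assumptions: `ProgressDemo` and `ProcessAsync` are hypothetical names, and `Thread.Sleep` stands in for `thisCheck.GetDetails()`.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ProgressDemo
{
    // Processes `count` items in parallel on the thread pool and reports
    // the running total through the IProgress<int> interface.
    public static Task ProcessAsync(int count, IProgress<int> progress)
    {
        int done = 0;
        return Task.Run(() => Parallel.For(0, count, _ =>
        {
            Thread.Sleep(5); // stand-in for thisCheck.GetDetails()
            progress.Report(Interlocked.Increment(ref done));
        }));
    }

    public static async Task Main()
    {
        // In WinForms/WPF, create Progress<T> on the UI thread so the callback
        // runs there. In a console app there is no UI thread, so the callback
        // runs on thread-pool threads and the output order is not guaranteed.
        var progress = new Progress<int>(n => Console.WriteLine("Record " + n + " of 100"));

        // await (instead of .Wait()) leaves the calling thread free
        // to execute the progress callbacks while the loop runs.
        await ProcessAsync(100, progress);
    }
}
```

Passing the object as an `IProgress<int>` parameter also removes the need for an explicit cast at the call site.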
I couldn't get the IProgress thing to actually work, so what I ended up doing (which is probably not the best approach) was to put the Parallel.ForEach in a Task and update a shared object inside the loop. Once the Task has started, I sit in a while loop until it's done, and in that while loop I update the UI...
bool blDone = false;
int iChckNo = 0;
_status.TotalRecs = query.Count;
Task.Run(() =>
{
OrderablePartitioner<PayrollHeader> partitioner = Partitioner.Create(query, EnumerablePartitionerOptions.NoBuffering);
Parallel.ForEach(partitioner, thisCheck =>
{
lock (_status)
{
iChckNo++;
_status.ProcessMsg = "Voucher " + thisCheck.VoucherNumber;
_status.ProcessName = thisCheck.EmployeeName;
_status.CurrentRec = iChckNo;
dtSpan = DateTime.Now.Subtract(dtSpanStart);
_status.TimeMsg = string.Format("Elapsed {0}:{1}:{2}", dtSpan.Hours, dtSpan.Minutes, dtSpan.Seconds);
}
thisCheck.GetDetails();
});
blDone = true;
});
while (!blDone)
{
WriteStatusUpdate();
}
Further down in the code is:
private void WriteStatusUpdate()
{
lock (_status)
{
lblVoucher.Text = _status.ProcessMsg;
lblName.Text = _status.ProcessName;
lblCount.Text = string.Format("Records {0} of {1}", _status.CurrentRec, _status.TotalRecs);
lblTime.Text = _status.TimeMsg;
Application.DoEvents();
}
}
Again, most likely not the best approach, but whatever gets it done...
I am trying to create an application that downloads images from a website on multiple threads, as an introduction to threading. (I have never used threading properly before.)
But currently it seems to create 1000+ threads and I am not sure where they are coming from.
I first queue a work item on the thread pool; for starters I only have 1 job in the jobs array:
foreach (Job j in Jobs)
{
ThreadPool.QueueUserWorkItem(Download, j);
}
Which starts the void Download(object obj) on a new thread where it loops through a certain amount of pages (images needed / 42 images per page)
for (var i = 0; i < pages; i++)
{
var downloadLink = new System.Uri("http://www." + j.Provider.ToString() + "/index.php?page=post&s=list&tags=" + j.Tags + "&pid=" + i * 42);
using (var wc = new WebClient())
{
try
{
wc.DownloadStringAsync(downloadLink);
wc.DownloadStringCompleted += (sender, e) =>
{
response = e.Result;
ProcessPage(response, false, j);
};
}
catch (System.Exception e)
{
// Unity editor equivalent of console.writeline
Debug.Log(e);
}
}
}
Correct me if I am wrong, but the next method gets called on the same thread:
void ProcessPage(string response, bool secondPass, Job j)
{
var wc = new WebClient();
LinkItem[] linkResponse = LinkFinder.Find(response).ToArray();
foreach (LinkItem i in linkResponse)
{
if (secondPass)
{
if (string.IsNullOrEmpty(i.Href))
continue;
else if (i.Href.Contains("http://loreipsum."))
{
if (DownloadImage(i.Href, ID(i.Href)))
j.Downloaded++;
}
}
else
{
if (i.Href.Contains(";id="))
{
var alterResponse = wc.DownloadString("http://www." + j.Provider.ToString() + "/index.php?page=post&s=view&id=" + ID(i.Href));
ProcessPage(alterResponse, true, j);
}
}
}
}
And finally passes on to the last function and downloads the actual image
bool DownloadImage(string target, int id)
{
var url = new System.Uri(target);
var fi = new System.IO.FileInfo(url.AbsolutePath);
var ext = fi.Extension;
if (!string.IsNullOrEmpty(ext))
{
using (var wc = new WebClient())
{
try
{
wc.DownloadFileAsync(url, id + ext);
return true;
}
catch(System.Exception e)
{
if (DEBUG) Debug.Log(e);
}
}
}
else
{
Debug.Log("Returned Without a extension: " + url + " || " + fi.FullName);
return false;
}
return true;
}
I am not sure how I am starting this many threads, but would love to know.
Edit
The goal of this program is to download the different jobs in Jobs at the same time (max of 5), each downloading a maximum of 42 images at a time.
So at most 210 images can/should be downloading at any one time.
First of all, how did you measure the thread count? Why do you think that you have thousands of them in your application? You are using the ThreadPool, so you don't create them yourself, and the ThreadPool wouldn't create such a great number of them for its needs.
Second, you are mixing synchronous and asynchronous operations in your code. As you can't use TPL and async/await, let's go through your code and count the units of work you are creating, so you can minimize them. After you do this, the number of queued items in the ThreadPool will decrease and your application will gain the performance you need.
You don't call the SetMaxThreads method in your application, so, according to MSDN:
Maximum Number of Thread Pool Threads
The number of operations that can be queued to the thread pool is limited only by available memory;
however, the thread pool limits the number of threads that can be
active in the process simultaneously. By default, the limit is 25
worker threads per CPU and 1,000 I/O completion threads.
So you must set the maximum to 5.
I can't find a place in your code where you check the 42 images per Job; you are only incrementing the value in the ProcessPage method.
Check the ManagedThreadId in the handler of WebClient.DownloadStringCompleted to see whether it executes on a different thread or not.
You are already adding a new item to the ThreadPool queue, so why are you using an asynchronous operation for downloading? Use a synchronous overload, like this:
ProcessPage(wc.DownloadString(downloadLink), false, j);
This will not create another item in the ThreadPool queue, and you won't have a synchronization context switch here.
In ProcessPage your wc variable is never disposed, so you aren't freeing all your resources here. Add a using statement here:
void ProcessPage(string response, bool secondPass, Job j)
{
using (var wc = new WebClient())
{
LinkItem[] linkResponse = LinkFinder.Find(response).ToArray();
foreach (LinkItem i in linkResponse)
{
if (secondPass)
{
if (string.IsNullOrEmpty(i.Href))
continue;
else if (i.Href.Contains("http://loreipsum."))
{
if (DownloadImage(i.Href, ID(i.Href)))
j.Downloaded++;
}
}
else
{
if (i.Href.Contains(";id="))
{
var alterResponse = wc.DownloadString("http://www." + j.Provider.ToString() + "/index.php?page=post&s=view&id=" + ID(i.Href));
ProcessPage(alterResponse, true, j);
}
}
}
}
}
In the DownloadImage method you also use the asynchronous download. This also adds an item to the ThreadPool queue, and I think that you can avoid this and use the synchronous overload too:
wc.DownloadFile(url, id + ext);
return true;
So, in general, avoid the context-switching operations and dispose of your resources properly.
Your wc WebClient will leave the using block (and be disposed), and can then be garbage collected, before the async callback runs. Also, with every async call you have to allow for both the immediate return and the later return of the delegated function, so ProcessPage will have to be handled in two places. Also, the j in the original loop may be going out of scope, depending on where Download in the original loop is declared.
Can someone help me with the following code, please? In the line:
Parallel.Invoke(parallelOptions, () => dosomething(message));
I want to invoke up to 5 parallel tasks (if there are 5 busy, wait for the next available, then start it... if only 4 busy, start a 5th, etc)
private AutoResetEvent autoResetEvent1 = new AutoResetEvent(false);
private ParallelOptions parallelOptions = new ParallelOptions { MaxDegreeOfParallelism = 5 };
private void threadProc()
{
queue.ReceiveCompleted += MyReceiveCompleted1;
Debug.WriteLine("about to start loop");
while (!shutdown)
{
queue.BeginReceive();
autoResetEvent1.WaitOne();
}
queue.ReceiveCompleted -= MyReceiveCompleted1;
queue.Dispose();
Debug.WriteLine("we are done");
}
private void MyReceiveCompleted1(object sender, ReceiveCompletedEventArgs e)
{
var message = queue.EndReceive(e.AsyncResult);
Debug.WriteLine("number of max tasks: " + parallelOptions.MaxDegreeOfParallelism);
Parallel.Invoke(parallelOptions, () => dosomething(message));
autoResetEvent1.Set();
}
private void dosomething(Message message)
{
//dummy body
var i = 0;
while (true)
{
Thread.Sleep(TimeSpan.FromSeconds(1));
i++;
if (i == 5 || i == 10 || i == 15) Debug.WriteLine("loop number: " + i + " on thread: " + Thread.CurrentThread.ManagedThreadId);
if (i == 15)
break;
}
Debug.WriteLine("finished task");
}
THE RESULTS I GET NOW:
1) With dosomething() as you see it above, I get only one at a time (it waits).
2) With dosomething() changed to the version below, I get a non-stop stream of tasks (not limited by, or obeying, MaxDegreeOfParallelism):
private async Task dosomething(Message message)
{
//dummy body
var i = 0;
while (true)
{
await Task.Delay(TimeSpan.FromSeconds(1));
i++;
if (i == 5 || i == 10 || i == 15) Debug.WriteLine("loop number: " + i + " on thread: " + Thread.CurrentThread.ManagedThreadId);
if (i == 15)
break;
}
Debug.WriteLine("finished task");
}
What am i doing wrong to get what i want accomplished?
THE OUTCOME I WANT:
in "MyReceiveCompleted", i want to make sure only 5 simultaneous tasks are processing messages, if there are 5 busy ones, wait for one to become available.
This line of code:
Parallel.Invoke(parallelOptions, () => dosomething(message));
is telling the TPL to start a new parallel operation with only one thing to do. So, the "max parallelism" option of 5 is kind of meaningless, since there will only be one thing to do. autoResetEvent1 ensures that there will only be one parallel operation at a time, and each parallel operation only has one thing to do, so the observed behavior of only one thing running at a time is entirely expected.
When you change the delegate to be asynchronous, what is actually happening is that the MaxDegreeOfParallelism is only applied to the synchronous portion of the method. So once it hits its first await, it "leaves" the parallel operation and is no longer constrained by it.
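The effect described above can be observed with a small console sketch. It is an illustration under stated assumptions, not the asker's code: `AsyncInParallelDemo` and `DoSomethingAsync` are hypothetical names, and the one-second delay stands in for real work.

```csharp
using System;
using System.Threading.Tasks;

public static class AsyncInParallelDemo
{
    // Hypothetical stand-in for the async version of dosomething(message).
    private static async Task DoSomethingAsync()
    {
        await Task.Delay(TimeSpan.FromSeconds(1));
    }

    // Starts the async method inside Parallel.Invoke and reports whether it
    // had already finished by the time Parallel.Invoke returned.
    public static Task Run(out bool finishedInsideInvoke)
    {
        Task pending = null;

        // The delegate returns as soon as DoSomethingAsync hits its first
        // await, so Parallel.Invoke only "sees" the synchronous part.
        Parallel.Invoke(() => { pending = DoSomethingAsync(); });

        finishedInsideInvoke = pending.IsCompleted;
        return pending;
    }

    public static void Main()
    {
        Task pending = Run(out bool finished);
        Console.WriteLine("Finished inside Parallel.Invoke: " + finished);
        pending.Wait(); // acceptable in a console app; do not block a UI thread like this
    }
}
```

Since the parallel operation is over long before the awaited work completes, nothing is left to enforce MaxDegreeOfParallelism on the remainder of the method.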
The core problem is that Parallel works best when the number of operations is known ahead of time - which your code doesn't know; it's just reading them from a queue and processing them as they arrive. As someone commented, you could solve this using dynamic task parallelism with a TaskScheduler that limits concurrency.
However, the simplest solution is probably TPL Dataflow. You can create an ActionBlock<T> with the appropriate throttling option and send messages to it as they arrive:
// The block's type parameter must match what queue.EndReceive returns (a Message).
private ActionBlock<Message> block = new ActionBlock<Message>(
message => dosomething(message),
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 5 });
private void MyReceiveCompleted1(object sender, ReceiveCompletedEventArgs e)
{
var message = queue.EndReceive(e.AsyncResult);
block.Post(message);
autoResetEvent1.Set();
}
One nice aspect of TPL Dataflow is that it also understands asynchronous methods, and will interpret MaxDegreeOfParallelism as a maximum degree of concurrency.
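To see the throttling with an asynchronous delegate, here is a minimal console sketch. It assumes the System.Threading.Tasks.Dataflow library is available (a NuGet package on .NET Framework); `ThrottledBlockDemo`, the string messages and the 50 ms delay are made up for illustration and stand in for the real queue messages and work.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class ThrottledBlockDemo
{
    // Posts `messageCount` messages to an ActionBlock limited to `maxParallel`
    // concurrent handlers and returns the highest concurrency observed.
    public static async Task<int> RunAsync(int messageCount, int maxParallel)
    {
        int running = 0, peak = 0;
        var gate = new object();

        var block = new ActionBlock<string>(async message =>
        {
            lock (gate) { running++; peak = Math.Max(peak, running); }
            await Task.Delay(50); // stand-in for the real async work
            lock (gate) { running--; }
        },
        new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = maxParallel });

        for (int i = 0; i < messageCount; i++)
        {
            block.Post("message " + i); // queued until a handler slot is free
        }

        block.Complete();       // no more messages will be posted
        await block.Completion; // wait for all queued messages to be handled
        return peak;
    }

    public static async Task Main()
    {
        int peak = await RunAsync(20, 5);
        Console.WriteLine("Highest number of concurrent handlers: " + peak);
    }
}
```

The limit still holds across the `await` because the block counts a handler as busy until the task returned by the delegate completes.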
I have such particular code:
for (int i = 0; i < SingleR_mustBeWorkedUp._number_of_Requestes; i++)
{
Random myRnd = new Random(SingleR_mustBeWorkedUp._num_path);
while (true)
{
int k = myRnd.Next(start, end);
if (CanRequestBePutted(timeLineR, k, SingleR_mustBeWorkedUp._time_service, start + end) == true)
{
SingleR_mustBeWorkedUp.placement[i] = k;
break;
}
}
}
I use an infinite loop here which will end only when CanRequestBePutted returns true. So how can I tell whether the app has stopped responding?
One solution is to measure how long each loop has been running, but that doesn't seem really good, and I can't predict what is going to happen in every case.
Any solutions?
If you're concerned that this operation could potentially take long enough for the application's user to notice, you should be running it in a non-UI thread. Then you can be sure that it will not make your application unresponsive. You should only be running it in the UI thread if you're sure it will always complete very quickly. When in doubt, go to a non-UI thread.
Don't try to figure out dynamically whether the operation will take a long time or not. If it taking a while is a possibility, do the work in another thread.
Why not use a task or threadpool so you're not blocking and put a timer on it?
The task could look something like this:
// Put this at class level:
static object _padlock = new object();
// Inside the method:
var tasks = new List<Task>();
for (int i = 0; i < SingleR_mustBeWorkedUp._number_of_Requestes; i++)
{
int index = i; // copy the loop variable so each task captures its own value
var task = new Task(() =>
{
Random myRnd = new Random(SingleR_mustBeWorkedUp._num_path);
while (true)
{
int k = myRnd.Next(start, end);
if (CanRequestBePutted(timeLineR, k, SingleR_mustBeWorkedUp._time_service, start + end) == true)
{
lock (_padlock)
{
SingleR_mustBeWorkedUp.placement[index] = k;
}
break;
}
}
});
task.Start();
tasks.Add(task);
}
Task.WaitAll(tasks.ToArray());
However, I would also try to figure out a way to take out your while(true), which is a bit dangerous. Also, Task requires .NET 4.0 or above, and I'm not sure what framework you're targeting.
If you need something older you can use ThreadPool.
Also you might want to put locks around shared resources like SingleR_mustBeWorkedUp.placement or anywhere else might be changing a variable. I put one around SingleR_mustBeWorkedUp.placement as an example.
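One way to make the while(true) loop less dangerous is to give it a CancellationToken with a timeout, so the search gives up instead of spinning forever. This is only a sketch under stated assumptions: `BoundedSearchDemo`, `FindSlot` and the `accept` delegate are hypothetical stand-ins (the delegate plays the role of CanRequestBePutted), and the two-second timeout is arbitrary.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedSearchDemo
{
    // Tries random candidates in [start, end) until `accept` returns true
    // or the token is cancelled. Returns null when it gave up.
    public static int? FindSlot(Func<int, bool> accept, int start, int end,
                                int seed, CancellationToken token)
    {
        var rnd = new Random(seed);
        while (!token.IsCancellationRequested)
        {
            int k = rnd.Next(start, end);
            if (accept(k))
            {
                return k;
            }
        }
        return null; // cancelled before a valid placement was found
    }

    public static async Task Main()
    {
        // The token is cancelled automatically after two seconds.
        using (var cts = new CancellationTokenSource(TimeSpan.FromSeconds(2)))
        {
            // Task.Run keeps the search off the calling (UI) thread.
            int? slot = await Task.Run(() => FindSlot(k => k % 7 == 0, 0, 100, 42, cts.Token));
            Console.WriteLine(slot.HasValue ? "Placed at " + slot.Value : "Gave up after the timeout");
        }
    }
}
```

The caller can then decide what a null result means, for example reporting to the user that no placement was found in time.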
I have a simple foreach loop that limits itself based on while loops and a static int. If I don't limit it, my CPU stays under 10%; if I limit it, my CPU goes up to 99/100%. How do I safely limit the number of calls to a class within a Parallel.ForEach?
static int ActiveThreads { get; set; }
static int TotalThreads { get; set; }
var options = new ParallelOptions();
options.MaxDegreeOfParallelism = 1;
Parallel.ForEach(urlTable.AsEnumerable(),options,drow =>
{
using (var WCC = new MasterCrawlerClass())
{
while (TotalThreads <= urlTable.Rows.Count)
{
if (ActiveThreads <= 9)
{
Console.WriteLine("Active Thread #: " + ActiveThreads);
ActiveThreads++;
WCC.MasterCrawlBegin(drow);
TotalThreads++;
Console.WriteLine("Done Crawling a datarow");
ActiveThreads--;
}
}
}
});
I need to limit it, and yes, I understand Max Parallelism has its own limit; however, my switch gets bogged down before the CPU in the server hits that limit.
Two things :
1) You don't seem to be using your ParallelOptions() that you created in this example.
2) You can use a Semaphore if for some reason you don't want to use the ParallelOptions.
// 9 slots, all free initially (an initial count of 0 would make every WaitOne block)
Semaphore sm = new Semaphore(9, 9);
// take a slot, or block while all 9 are in use
// this will block gracefully without constantly checking for `ActiveThreads <= 9`
sm.WaitOne();
// give the slot back
sm.Release();
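Put together inside a parallel loop, the semaphore approach might look like the sketch below. It is a console illustration under stated assumptions: `SemaphoreLimitDemo` is a made-up name, `Thread.Sleep` stands in for `MasterCrawlBegin(drow)`, and the `running`/`peak` bookkeeping exists only to show that the limit holds.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class SemaphoreLimitDemo
{
    // Runs `items` simulated crawls and returns the highest number that
    // were ever inside the guarded section at the same time.
    public static int Run(int items, int limit)
    {
        int running = 0, peak = 0;
        var gate = new object();

        using (var slots = new Semaphore(limit, limit)) // all slots free at the start
        {
            Parallel.For(0, items, _ =>
            {
                slots.WaitOne(); // take a slot, or wait until one is released
                try
                {
                    lock (gate) { running++; peak = Math.Max(peak, running); }
                    Thread.Sleep(20); // stand-in for MasterCrawlBegin(drow)
                    lock (gate) { running--; }
                }
                finally
                {
                    slots.Release(); // always give the slot back
                }
            });
        }

        return peak;
    }

    public static void Main()
    {
        Console.WriteLine("Peak concurrent crawls: " + Run(50, 9));
    }
}
```

The `try`/`finally` matters: if the crawl throws and the slot is never released, the remaining iterations eventually block forever.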
I have a simple foreach loop that limits itself based on while loops
and a static int. If I don't limit it, my CPU stays under 10%; if I
limit it, my CPU goes up to 99/100%.
That is pretty odd. It may be a result of the way you have limited the concurrency with the loop which, by the way, appears to cause each drow to be crawled many times. I doubt that is what you want. You are getting low CPU utilization because the crawl operation is IO bound.
If you really want to limit the number of concurrent calls to MasterCrawlBegin to 9, then set MaxDegreeOfParallelism = 9. The while loop and the maintenance of TotalThreads and ActiveThreads are not going to work. As a side note, you are incrementing and decrementing the counters in a manner that is not thread-safe.
Change your code to look like this.
int ActiveThreads = 0;
var options = new ParallelOptions();
options.MaxDegreeOfParallelism = 9;
Parallel.ForEach(urlTable.AsEnumerable(),options,drow =>
{
int x = Interlocked.Increment(ref ActiveThreads);
Console.WriteLine("Active Thread #: " + x);
try
{
using (var WCC = new MasterCrawlerClass())
{
WCC.MasterCrawlBegin(drow);
}
}
finally
{
Interlocked.Decrement(ref ActiveThreads);
Console.WriteLine("Done Crawling a datarow");
}
});