What I am trying to do here is loop a listview full of URLs, check if the source code contains a string, if it does then update the UI listview with YES or NO.
I had forgotten about the Parallel.ForEach method, so decided to try it out (I'm not even sure if it's the best solution for this)
Parallel.ForEach(listViewMain.Items.Cast<ListViewItem>(), row =>
{
try
{
string html = Helpers.GetRequest(row.Text);
if (html.Contains(txtBoxFind.Text))
{
row.SubItems[3].Text = "YES";
}
else
{
row.SubItems[3].Text = "NO";
}
} catch(Exception) {
}
});
The process is fairly simple doing it without the Parallel.ForEach but the UI is still locking up, have i implemented it right? Helpers.GetRequest simply returns the raw HTML to be checked, i thought using the Parallel.ForEach would stop the UI locking while processing or have i got it wrong, any help is appreciated.
Before we begin, do note that Parallel.ForEach in itself is a blocking call, so that is why you experience the UI being not responsive.
A good read is Avoid Executing Parallel Loops on the UI Thread from the MS docs:
... the parallel loop blocks the UI thread on which it’s executing until all iterations are complete.
That said, you should use a Task based approach if not dealing with CPU bound work but with I/O bound work. Since you are dealing with network calls you should stick to tasks. Try this:
public async Task DoSomething()
{
// Process the items parallel
await Task.WhenAll(listViewMain.Items.Cast<ListViewItem>().Select(async row =>
{
// wrap the long running call in a async Task
string html = await Task.Run(() => Helpers.GetRequest(row.Text));
// no need for context capturing and invokes, this is running on the UI thread
var containsText = html.Contains(txtBoxFind.Text);
if (containsText)
{
row.SubItems[3].Text = "YES";
}
else
{
row.SubItems[3].Text = "NO";
}
}));
}
It would be even better when you can make Helpers.GetRequest(row.Text) a Task based method, then you could do:
public async Task DoSomething()
{
// Process the items parallel
await Task.WhenAll(listViewMain.Items.Cast<ListViewItem>().Select(async row =>
{
// wrap the long running call in a async Task
string html = await Helpers.GetRequestAsync(row.Text);
// no need for context capturing and invokes, this is running on the UI thread
var containsText = html.Contains(txtBoxFind.Text);
if (containsText)
{
row.SubItems[3].Text = "YES";
}
else
{
row.SubItems[3].Text = "NO";
}
}));
}
but we need to see the code of Helpers.GetRequest(row.Text) to assist you with that.
EDIT
You've shown the code of GetRequest. WebClient is not task based, try HttpClient:
public async Task<string> GetRequestAsync(string url)
{
var html = "";
using (HttpClient wc = new HttpClient())
{
html = await wc.GetStringAsync(url);
}
return html;
}
the Parallel.ForEach is executing on the UI thread (current thread) and it will not provide more performance for you in case of non-blocking UI. If you want to avoid the UI block, you could try using the async methods, for sample:
Task.Run(() => CheckItems());
Given you can implement an async version of the GetRequest method, you could implement an async method to do this, for sample:
public async Task CheckItems()
{
foreach (var row in listViewMain.Items.Cast<ListViewItem>())
{
try
{
string html = await Helpers.GetRequestAsync(row.Text);
if (html.Contains(txtBoxFind.Text))
{
row.SubItems[3].Text = "YES";
}
else
{
row.SubItems[3].Text = "NO";
}
} catch(Exception ex) {
}
}
}
Don't use the Parallel.ForEach() for this. Parallel.ForEach() is used for CPU bound work. Not IO.
I would do something like: (I didn't tested it, might contains some typo's) (used notepad)
So you might use this for the idea:
public async void Button_Click(object sender, EventArg e)
{
await CheckItems(listViewMain.Items.Cast<ListViewItem>());
}
public async Task CheckItems(IEnumerable<ListViewItem> items)
{
// Capture the UI thread synchronization.
var context = SynchronizationContext.Current;
var tasks = new List<Task>();
// create tasks.
foreach (var row in items)
{
tasks.Add(Task.Run(() =>
{
// the lookup on a (probably) threadpool thread
string html = Helpers.GetRequest(row.Text);
// the processing here..
var containsText = html.Contains(txtBoxFind.Text);
// post the result (and touching gui items in the UI thread)
// this.Invoke() is also and might be the best solution.
context.Post(() =>
{
if (containsText)
{
row.SubItems[3].Text = "YES";
}
else
{
row.SubItems[3].Text = "NO";
}
});
}));
}
// wait for them
await Task.WhenAll(tasks);
}
While I was adding some comment, you could also use the this.Invoke() for this (instead of the SynchronizationContext)
gtg.
Related
I have few methods that report some data to Data base. We want to invoke all calls to Data service asynchronously. These calls to data service are all over and so we want to make sure that these DS calls are executed one after another in order at any given time. Initially, i was using async await on each of these methods and each of the calls were executed asynchronously but we found out if they are out of sequence then there are room for errors.
So, i thought we should queue all these asynchronous tasks and send them in a separate thread but i want to know what options we have? I came across 'SemaphoreSlim' . Will this be appropriate in my use case?
Or what other options will suit my use case? Please, guide me.
So, what i have in my code currently
public static SemaphoreSlim mutex = new SemaphoreSlim(1);
//first DS call
public async Task SendModuleDataToDSAsync(Module parameters)
{
var tasks1 = new List<Task>();
var tasks2 = new List<Task>();
//await mutex.WaitAsync(); **//is this correct way to use SemaphoreSlim ?**
foreach (var setting in Module.param)
{
Task job1 = SaveModule(setting);
tasks1.Add(job1);
Task job2= SaveModule(GetAdvancedData(setting));
tasks2.Add(job2);
}
await Task.WhenAll(tasks1);
await Task.WhenAll(tasks2);
//mutex.Release(); // **is this correct?**
}
private async Task SaveModule(Module setting)
{
await Task.Run(() =>
{
// Invokes Calls to DS
...
});
}
//somewhere down the main thread, invoking second call to DS
//Second DS Call
private async Task SendInstrumentSettingsToDS(<param1>, <param2>)
{
//await mutex.WaitAsync();// **is this correct?**
await Task.Run(() =>
{
//TrackInstrumentInfoToDS
//mutex.Release();// **is this correct?**
});
if(param2)
{
await Task.Run(() =>
{
//TrackParam2InstrumentInfoToDS
});
}
}
Initially, i was using async await on each of these methods and each of the calls were executed asynchronously but we found out if they are out of sequence then there are room for errors.
So, i thought we should queue all these asynchronous tasks and send them in a separate thread but i want to know what options we have? I came across 'SemaphoreSlim' .
SemaphoreSlim does restrict asynchronous code to running one at a time, and is a valid form of mutual exclusion. However, since "out of sequence" calls can cause errors, then SemaphoreSlim is not an appropriate solution since it does not guarantee FIFO.
In a more general sense, no synchronization primitive guarantees FIFO because that can cause problems due to side effects like lock convoys. On the other hand, it is natural for data structures to be strictly FIFO.
So, you'll need to use your own FIFO queue, rather than having an implicit execution queue. Channels is a nice, performant, async-compatible queue, but since you're on an older version of C#/.NET, BlockingCollection<T> would work:
public sealed class ExecutionQueue
{
private readonly BlockingCollection<Func<Task>> _queue = new BlockingCollection<Func<Task>>();
public ExecutionQueue() => Completion = Task.Run(() => ProcessQueueAsync());
public Task Completion { get; }
public void Complete() => _queue.CompleteAdding();
private async Task ProcessQueueAsync()
{
foreach (var value in _queue.GetConsumingEnumerable())
await value();
}
}
The only tricky part with this setup is how to queue work. From the perspective of the code queueing the work, they want to know when the lambda is executed, not when the lambda is queued. From the perspective of the queue method (which I'm calling Run), the method needs to complete its returned task only after the lambda is executed. So, you can write the queue method something like this:
public Task Run(Func<Task> lambda)
{
var tcs = new TaskCompletionSource<object>();
_queue.Add(async () =>
{
// Execute the lambda and propagate the results to the Task returned from Run
try
{
await lambda();
tcs.TrySetResult(null);
}
catch (OperationCanceledException ex)
{
tcs.TrySetCanceled(ex.CancellationToken);
}
catch (Exception ex)
{
tcs.TrySetException(ex);
}
});
return tcs.Task;
}
This queueing method isn't as perfect as it could be. If a task completes with more than one exception (this is normal for parallel code), only the first one is retained (this is normal for async code). There's also an edge case around OperationCanceledException handling. But this code is good enough for most cases.
Now you can use it like this:
public static ExecutionQueue _queue = new ExecutionQueue();
public async Task SendModuleDataToDSAsync(Module parameters)
{
var tasks1 = new List<Task>();
var tasks2 = new List<Task>();
foreach (var setting in Module.param)
{
Task job1 = _queue.Run(() => SaveModule(setting));
tasks1.Add(job1);
Task job2 = _queue.Run(() => SaveModule(GetAdvancedData(setting)));
tasks2.Add(job2);
}
await Task.WhenAll(tasks1);
await Task.WhenAll(tasks2);
}
Here's a compact solution that has the least amount of moving parts but still guarantees FIFO ordering (unlike some of the suggested SemaphoreSlim solutions). There are two overloads for Enqueue so you can enqueue tasks with and without return values.
using System;
using System.Threading;
using System.Threading.Tasks;
public class TaskQueue
{
private Task _previousTask = Task.CompletedTask;
public Task Enqueue(Func<Task> asyncAction)
{
return Enqueue(async () => {
await asyncAction().ConfigureAwait(false);
return true;
});
}
public async Task<T> Enqueue<T>(Func<Task<T>> asyncFunction)
{
var tcs = new TaskCompletionSource(TaskCreationOptions.RunContinuationsAsynchronously);
// get predecessor and wait until it's done. Also atomically swap in our own completion task.
await Interlocked.Exchange(ref _previousTask, tcs.Task).ConfigureAwait(false);
try
{
return await asyncFunction().ConfigureAwait(false);
}
finally
{
tcs.SetResult();
}
}
}
Please keep in mind that your first solution queueing all tasks to lists doesn't ensure that the tasks are executed one after another. They're all running in parallel because they're not awaited until the next tasks is startet.
So yes you've to use a SemapohoreSlim to use async locking and await. A simple implementation might be:
private readonly SemaphoreSlim _syncRoot = new SemaphoreSlim(1);
public async Task SendModuleDataToDSAsync(Module parameters)
{
await this._syncRoot.WaitAsync();
try
{
foreach (var setting in Module.param)
{
await SaveModule(setting);
await SaveModule(GetAdvancedData(setting));
}
}
finally
{
this._syncRoot.Release();
}
}
If you can use Nito.AsyncEx the code can be simplified to:
public async Task SendModuleDataToDSAsync(Module parameters)
{
using var lockHandle = await this._syncRoot.LockAsync();
foreach (var setting in Module.param)
{
await SaveModule(setting);
await SaveModule(GetAdvancedData(setting));
}
}
One option is to queue operations that will create tasks instead of queuing already running tasks as the code in the question does.
PseudoCode without locking:
Queue<Func<Task>> tasksQueue = new Queue<Func<Task>>();
async Task RunAllTasks()
{
while (tasksQueue.Count > 0)
{
var taskCreator = tasksQueue.Dequeu(); // get creator
var task = taskCreator(); // staring one task at a time here
await task; // wait till task completes
}
}
// note that declaring createSaveModuleTask does not
// start SaveModule task - it will only happen after this func is invoked
// inside RunAllTasks
Func<Task> createSaveModuleTask = () => SaveModule(setting);
tasksQueue.Add(createSaveModuleTask);
tasksQueue.Add(() => SaveModule(GetAdvancedData(setting)));
// no DB operations started at this point
// this will start tasks from the queue one by one.
await RunAllTasks();
Using ConcurrentQueue would be likely be right thing in actual code. You also would need to know total number of expected operations to stop when all are started and awaited one after another.
Building on your comment under Alexeis answer, your approch with the SemaphoreSlim is correct.
Assumeing that the methods SendInstrumentSettingsToDS and SendModuleDataToDSAsync are members of the same class. You simplay need a instance variable for a SemaphoreSlim and then at the start of each methode that needs synchornization call await lock.WaitAsync() and call lock.Release() in the finally block.
public async Task SendModuleDataToDSAsync(Module parameters)
{
await lock.WaitAsync();
try
{
...
}
finally
{
lock.Release();
}
}
private async Task SendInstrumentSettingsToDS(<param1>, <param2>)
{
await lock.WaitAsync();
try
{
...
}
finally
{
lock.Release();
}
}
and it is importend that the call to lock.Release() is in the finally-block, so that if an exception is thrown somewhere in the code of the try-block the semaphore is released.
I've written a class that asynchronously pings a subnet. It works, however, the number of hosts returned will sometimes change between runs. Some questions:
Am I doing something wrong in the code below?
What can I do to make it work better?
The ScanIPAddressesAsync() method is called like this:
NetworkDiscovery nd = new NetworkDiscovery("192.168.50.");
nd.RaiseIPScanCompleteEvent += HandleScanComplete;
nd.ScanIPAddressesAsync();
namespace BPSTestTool
{
public class IPScanCompleteEvent : EventArgs
{
public List<String> IPList { get; set; }
public IPScanCompleteEvent(List<String> _list)
{
IPList = _list;
}
}
public class NetworkDiscovery
{
private static object m_lockObj = new object();
private List<String> m_ipsFound = new List<string>();
private String m_ipBase = null;
public List<String> IPList
{
get { return m_ipsFound; }
}
public EventHandler<IPScanCompleteEvent> RaiseIPScanCompleteEvent;
public NetworkDiscovery(string ipBase)
{
this.m_ipBase = ipBase;
}
public async void ScanIPAddressesAsync()
{
var tasks = new List<Task>();
m_ipsFound.Clear();
await Task.Run(() => AsyncScan());
return;
}
private async void AsyncScan()
{
List<Task> tasks = new List<Task>();
for (int i = 2; i < 255; i++)
{
String ip = m_ipBase + i.ToString();
if (m_ipsFound.Contains(ip) == false)
{
for (int x = 0; x < 2; x++)
{
Ping p = new Ping();
var task = HandlePingReplyAsync(p, ip);
tasks.Add(task);
}
}
}
await Task.WhenAll(tasks).ContinueWith(t =>
{
OnRaiseIPScanCompleteEvent(new IPScanCompleteEvent(m_ipsFound));
});
}
protected virtual void OnRaiseIPScanCompleteEvent(IPScanCompleteEvent args)
{
RaiseIPScanCompleteEvent?.Invoke(this, args);
}
private async Task HandlePingReplyAsync(Ping ping, String ip)
{
PingReply reply = await ping.SendPingAsync(ip, 1500);
if ( reply != null && reply.Status == System.Net.NetworkInformation.IPStatus.Success)
{
lock (m_lockObj)
{
if (m_ipsFound.Contains(ip) == false)
{
m_ipsFound.Add(ip);
}
}
}
}
}
}
One problem I see is async void. The only reason async void is even allowed is only for event handlers. If it's not an event handler, it's a red flag.
Asynchronous methods always start running synchronously until the first await that acts on an incomplete Task. In your code, that is at await Task.WhenAll(tasks). At that point, AsyncScan returns - before all the tasks have completed. Usually, it would return a Task that will let you know when it's done, but since the method signature is void, it cannot.
So now look at this:
await Task.Run(() => AsyncScan());
When AsyncScan() returns, then the Task returned from Task.Run completes and your code moves on, before all of the pings have finished.
So when you report your results, the number of results will be random, depending on how many happened to finish before you displayed the results.
If you want make sure that all of the pings are done before continuing, then change AsyncScan() to return a Task:
private async Task AsyncScan()
And change the Task.Run to await it:
await Task.Run(async () => await AsyncScan());
However, you could also just get rid of the Task.Run and just have this:
await AsyncScan();
Task.Run runs the code in a separate thread. The only reason to do that is in a UI app where you want to move CPU-heavy computations off of the UI thread. When you're just doing network requests like this, that's not necessary.
On top of that, you're also using async void here:
public async void ScanIPAddressesAsync()
Which means that wherever you call ScanIPAddressesAsync() is unable to wait until everything is done. Change that to async Task and await it too.
This code needs a lot of refactoring and bugs like this in concurrency are hard to pinpoint. My bet is on await Task.Run(() => AsyncScan()); which is wrong because AsyncScan() is async and Task.Run(...) will return before it is complete.
My second guess is m_ipsFound which is called a shared state. This means there might be many threads simultaneously reading and writing on this. List<T> is not a data type for this.
Also as a side point having a return in the last line of a method is not adding to the readability and async void is a prohibited practice. Always use async Task even if you return nothing. You can read more on this very good answer.
I started to have HUGE doubts regarding my code and I need some advice from more experienced programmers.
In my application on the button click, the application runs a command, that is calling a ScrapJockeys method:
if (UpdateJockeysPl) await ScrapJockeys(JPlFrom, JPlTo + 1, "jockeysPl"); //1 - 1049
ScrapJockeys is triggering a for loop, repeating code block between 20K - 150K times (depends on the case). Inside the loop, I need to call a service method, where the execution of the method takes a lot of time. Also, I wanted to have the ability of cancellation of the loop and everything that is going on inside of the loop/method.
Right now I am with a method with a list of tasks, and inside of the loop is triggered a Task.Run. Inside of each task, I am calling an awaited service method, which reduces execution time of everything to 1/4 comparing to synchronous code. Also, each task has assigned a cancellation token, like in the example GitHub link:
public async Task ScrapJockeys(int startIndex, int stopIndex, string dataType)
{
//init values and controls in here
List<Task> tasks = new List<Task>();
for (int i = startIndex; i < stopIndex; i++)
{
int j = i;
Task task = Task.Run(async () =>
{
LoadedJockey jockey = new LoadedJockey();
CancellationToken.ThrowIfCancellationRequested();
if (dataType == "jockeysPl") jockey = await _scrapServices.ScrapSingleJockeyPlAsync(j);
if (dataType == "jockeysCz") jockey = await _scrapServices.ScrapSingleJockeyCzAsync(j);
//doing some stuff with results in here
}, TokenSource.Token);
tasks.Add(task);
}
try
{
await Task.WhenAll(tasks);
}
catch (OperationCanceledException)
{
//
}
finally
{
await _dataServices.SaveAllJockeysAsync(Jockeys.ToList()); //saves everything to JSON file
//soing some stuff with UI props in here
}
}
So about my question, is there everything fine with my code? According to this article:
Many async newbies start off by trying to treat asynchronous tasks the
same as parallel (TPL) tasks and this is a major misstep.
What should I use then?
And according to this article:
On a busy server, this kind of implementation can kill scalability.
So how am I supposed to do it?
Please be noted, that the service interface method signature is Task<LoadedJockey> ScrapSingleJockeyPlAsync(int index);
And also I am not 100% sure that I am using Task.Run correctly within my service class. The methods inside are wrapping the code inside await Task.Run(() =>, like in the example GitHub link:
public async Task<LoadedJockey> ScrapSingleJockeyPlAsync(int index)
{
LoadedJockey jockey = new LoadedJockey();
await Task.Run(() =>
{
//do some time consuming things
});
return jockey;
}
As far as I understand from the articles, this is a kind of anti-pattern. But I am confused a bit. Based on this SO reply, it should be fine...? If not, how to replace it?
On the UI side, you should be using Task.Run when you have CPU-bound code that is long enough that you need to move it off the UI thread. This is completely different than the server side, where using Task.Run at all is an anti-pattern.
In your case, all your code seems to be I/O-based, so I don't see a need for Task.Run at all.
There is a statement in your question that conflicts with the provided code:
I am calling an awaited service method
public async Task<LoadedJockey> ScrapSingleJockeyPlAsync(int index)
{
await Task.Run(() =>
{
//do some time consuming things
});
}
The lambda passed to Task.Run is not async, so the service method cannot possibly be awaited. And indeed it is not.
A better solution would be to load the HTML asynchronously (e.g., using HttpClient.GetStringAsync), and then call HtmlDocument.LoadHtml, something like this:
public async Task<LoadedJockey> ScrapSingleJockeyPlAsync(int index)
{
LoadedJockey jockey = new LoadedJockey();
...
string link = sb.ToString();
var html = await httpClient.GetStringAsync(link).ConfigureAwait(false);
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);
if (jockey.Name == null)
...
return jockey;
}
And also remove the Task.Run from your for loop:
private async Task ScrapJockey(string dataType)
{
LoadedJockey jockey = new LoadedJockey();
CancellationToken.ThrowIfCancellationRequested();
if (dataType == "jockeysPl") jockey = await _scrapServices.ScrapSingleJockeyPlAsync(j).ConfigureAwait(false);
if (dataType == "jockeysCz") jockey = await _scrapServices.ScrapSingleJockeyCzAsync(j).ConfigureAwait(false);
//doing some stuff with results in here
}
public async Task ScrapJockeys(int startIndex, int stopIndex, string dataType)
{
//init values and controls in here
List<Task> tasks = new List<Task>();
for (int i = startIndex; i < stopIndex; i++)
{
tasks.Add(ScrapJockey(dataType));
}
try
{
await Task.WhenAll(tasks);
}
catch (OperationCanceledException)
{
//
}
finally
{
await _dataServices.SaveAllJockeysAsync(Jockeys.ToList()); //saves everything to JSON file
//soing some stuff with UI props in here
}
}
As far as I understand from the articles, this is a kind of anti-pattern.
It is an anti-pattern. But if can't modify the service implementation, you should at least be able to execute the tasks in parallel. Something like this:
public async Task ScrapJockeys(int startIndex, int stopIndex, string dataType)
{
ConcurrentBag<Task> tasks = new ConcurrentBag<Task>();
ParallelOptions parallelLoopOptions = new ParallelOptions() { CancellationToken = CancellationToken };
Parallel.For(startIndex, stopIndex, parallelLoopOptions, i =>
{
int j = i;
switch (dataType)
{
case "jockeysPl":
tasks.Add(_scrapServices.ScrapSingleJockeyPlAsync(j));
break;
case "jockeysCz":
tasks.Add(_scrapServices.ScrapSingleJockeyCzAsync(j));
break;
}
});
try
{
await Task.WhenAll(tasks);
}
catch (OperationCanceledException)
{
//
}
finally
{
await _dataServices.SaveAllJockeysAsync(Jockeys.ToList()); //saves everything to JSON file
//soing some stuff with UI props in here
}
}
I have a big time-consuming task and I try to implement asynchronous methods in order to prevent the application from blocking. My code looks like this:
CancellationTokenSource _cts;
async void asyncMethod()
{
// ..................
_cts = new CancellationTokenSource();
var progress = new Progress<double>(value => pbCalculationProgress.Value = value);
try
{
_cts.CancelAfter(25000);
int count = await awaitMethod(_cts.Token, progress);
}
catch (OperationCanceledException ex)
{
// .......
}
finally
{
_cts.Dispose();
}
// ..................
}
async Task<int> awaitMethod(CancellationToken ct, IProgress<double> progress)
{
var task = Task.Run(() =>
{
ct.ThrowIfCancellationRequested();
sqlParser();
progress.Report(1);
return 0;
});
return await task;
}
void sqlParser()
{
string info = form1TxtBox.Text;
// ................
}
Also, the program throws an exception, because sqlParser() updates UI thread, when it retrieves the text from the form. The solution is to introduce Dispatcher method, which allows UI update. I keep the body of awaitMethod the same and simply put sqlParser() inside of the Dispatcher:
DispatcherOperation op = Dispatcher.BeginInvoke((Action)(() =>
{
sqlParser();
}));
Here happens something interesting: asyncMethod() even doesn't dare to call awaitMethod! However, if I put a breakpoint inside of sqlParser() and run debugger, then everything goes very smoothly.
Please, can somebody explain what I miss in my code? What kind of patch should i use to make Dispatcher work correctly? Or: how can I run my program without Dispatcher and without throwing UI-update exception?
The solution is to introduce Dispatcher method, which allows UI update.
That's never a good solution. Having background threads reaching directly into your UI is encouraging spaghetti code.
how can I run my program without Dispatcher and without throwing UI-update exception?
Think of your background thread code as its own separate component, completely separate from the UI. If your background code needs data from the UI, then have your UI code read it before the background code starts, and pass that data into the background code.
async void asyncMethod()
{
...
try
{
var data = myUiComponent.Text;
_cts.CancelAfter(25000);
int count = await awaitMethod(data, _cts.Token, progress);
}
...
}
async Task<int> awaitMethod(string data, CancellationToken ct, IProgress<double> progress)
{
var task = Task.Run(() =>
{
ct.ThrowIfCancellationRequested();
sqlParser(data);
progress.Report(1);
return 0;
});
return await task;
}
I have a question regarding Task.WaitAll. At first I tried to use async/await to get something like this:
private async Task ReadImagesAsync(string HTMLtag)
{
await Task.Run(() =>
{
ReadImages(HTMLtag);
});
}
Content of this function doesn't matter, it works synchronously and is completely independent from outside world.
I use it like this:
private void Execute()
{
string tags = ConfigurationManager.AppSettings["HTMLTags"];
var cursor = Mouse.OverrideCursor;
Mouse.OverrideCursor = System.Windows.Input.Cursors.Wait;
List<Task> tasks = new List<Task>();
foreach (string tag in tags.Split(';'))
{
tasks.Add(ReadImagesAsync(tag));
//tasks.Add(Task.Run(() => ReadImages(tag)));
}
Task.WaitAll(tasks.ToArray());
Mouse.OverrideCursor = cursor;
}
Unfortunately I get deadlock on Task.WaitAll if I use it that way (with async/await). My functions do their jobs (so they are executed properly), but Task.WaitAll just stays here forever because apparently ReadImagesAsync doesn't return to the caller.
The commented line is approach that actually works correctly. If I comment the tasks.Add(ReadImagesAsync(tag)); line and use tasks.Add(Task.Run(() => ReadImages(tag))); - everything works well.
What am I missing here?
ReadImages method looks like that:
private void ReadImages (string HTMLtag)
{
string section = HTMLtag.Split(':')[0];
string tag = HTMLtag.Split(':')[1];
List<string> UsedAdresses = new List<string>();
var webClient = new WebClient();
string page = webClient.DownloadString(Link);
var siteParsed = Link.Split('/');
string site = $"{siteParsed[0]} + // + {siteParsed[1]} + {siteParsed[2]}";
int.TryParse(MinHeight, out int minHeight);
int.TryParse(MinWidth, out int minWidth);
int index = 0;
while (index < page.Length)
{
int startSection = page.IndexOf("<" + section, index);
if (startSection < 0)
break;
int endSection = page.IndexOf(">", startSection) + 1;
index = endSection;
string imgSection = page.Substring(startSection, endSection - startSection);
int imgLinkStart = imgSection.IndexOf(tag + "=\"") + tag.Length + 2;
if (imgLinkStart < 0 || imgLinkStart > imgSection.Length)
continue;
int imgLinkEnd = imgSection.IndexOf("\"", imgLinkStart);
if (imgLinkEnd < 0)
continue;
string imgAdress = imgSection.Substring(imgLinkStart, imgLinkEnd - imgLinkStart);
string format = null;
foreach (var imgFormat in ConfigurationManager.AppSettings["ImgFormats"].Split(';'))
{
if (imgAdress.IndexOf(imgFormat) > 0)
{
format = imgFormat;
break;
}
}
// not an image
if (format == null)
continue;
// some internal resource, but we can try to get it anyways
if (!imgAdress.StartsWith("http"))
imgAdress = site + imgAdress;
string imgName = imgAdress.Split('/').Last();
if (!UsedAdresses.Contains(imgAdress))
{
try
{
Bitmap pic = new Bitmap(webClient.OpenRead(imgAdress));
if (pic.Width > minHeight && pic.Height > minWidth)
webClient.DownloadFile(imgAdress, SaveAdress + "\\" + imgName);
}
catch { }
finally
{
UsedAdresses.Add(imgAdress);
}
}
}
}
You are synchronously waiting for tasks to finish. This is not gonna work for WPF without a little bit of ConfigureAwait(false) magic. Here is a better solution:
private async Task Execute()
{
string tags = ConfigurationManager.AppSettings["HTMLTags"];
var cursor = Mouse.OverrideCursor;
Mouse.OverrideCursor = System.Windows.Input.Cursors.Wait;
List<Task> tasks = new List<Task>();
foreach (string tag in tags.Split(';'))
{
tasks.Add(ReadImagesAsync(tag));
//tasks.Add(Task.Run(() => ReadImages(tag)));
}
await Task.WhenAll(tasks.ToArray());
Mouse.OverrideCursor = cursor;
}
If this is WPF, then I'm sure you would call it when some kind of event happens. The way you should call this method is from event handler, e.g.:
private async void OnWindowOpened(object sender, EventArgs args)
{
await Execute();
}
Looking at the edited version of your question I can see that in fact you can make it all very nice and pretty by using async version of DownloadStringAsync:
private async Task ReadImages (string HTMLtag)
{
string section = HTMLtag.Split(':')[0];
string tag = HTMLtag.Split(':')[1];
List<string> UsedAdresses = new List<string>();
var webClient = new WebClient();
string page = await webClient.DownloadStringAsync(Link);
//...
}
Now, what's the deal with tasks.Add(Task.Run(() => ReadImages(tag)));?
This requires knowledge of SynchronizationContext. When you create a task, you copy the state of thread that scheduled the task, so you can come back to it when you are finished with await. When you call method without Task.Run, you say "I want to come back to UI thread". This is not possible, because UI thread is already waiting for the task and so they are both waiting for themselves. When you add another task to the mix, you are saying: "UI thread must schedule an 'outer' task that will schedule another, 'inner' task, that I will come back to."
Use WhenAll instead of WaitAll, Turn your Execute into async Task and await the task returned by Task.WhenAll.
This way it never blocks on an asynchronous code.
I found some more detailed articles explaining why actually deadlock happened here:
https://medium.com/bynder-tech/c-why-you-should-use-configureawait-false-in-your-library-code-d7837dce3d7f
https://blog.stephencleary.com/2012/07/dont-block-on-async-code.html
Short answer would be making a small change in my async method so it looks like that:
private async Task ReadImagesAsync(string HTMLtag)
{
await Task.Run(() =>
{
ReadImages(HTMLtag);
}).ConfigureAwait(false);
}
Yup. That's it. Suddenly it doesn't deadlock. But these two articles + #FCin response explain WHY it actually happened.
It's like you are saying i do not care when ReadImagesAsync() finishes but you have to wait for it .... Here is a definition
The Task.WaitAll blocks the current thread until all other tasks have completed execution.
The Task.WhenAll method is used to create a task that will complete if and only if all the other tasks are complete.
So, if you are using Task.WhenAll you would get a task object that isn't complete. However, it will not block and would allow the program to execute. On the contrary, the Task.WaitAll method call actually blocks and waits for all other tasks to complete.
Essentially, Task.WhenAll would provide you a task that isn't complete but you can use ContinueWith as soon as the specified tasks have completed their execution. Note that neither the Task.WhenAll method nor the Task.WaitAll would run the tasks, i.e., no tasks are started by any of these methods.
Task.WhenAll(taskList).ContinueWith(t => {
// write your code here
});