I'm wondering if there is a way to report WebClient progress without using EAP (Event-based Asynchronous Pattern).
The old way (using EAP) would be:
var client = new WebClient();
client.DownloadProgressChanged += (s, e) => { /* progress reporting */ };
client.DownloadFileCompleted += (s, e) => { Console.Write("download finished"); };
client.DownloadFileAsync(file);
With async/await this can be written as:
var client = new WebClient();
client.DownloadProgressChanged += (s, e) => { /* progress reporting */ };
await client.DownloadFileTaskAsync(file);
Console.Write("download finished");
But in the second example I'm using both EAP and TAP (Task-based Asynchronous Pattern).
Isn't mixing two asynchronous patterns considered bad practice?
Is there a way to achieve the same without using EAP?
I have read about the IProgress interface, but I think there is no way to use it to report WebClient progress.
The bad news is that the answer is NO!
The good news is that any EAP API can be converted into a TAP API.
Try this:
public static class WebClientExtensions
{
public static async Task DownloadFileTaskAsync(
this WebClient webClient,
Uri address,
string fileName,
IProgress<Tuple<long, int, long>> progress)
{
// Create the task to be returned
var tcs = new TaskCompletionSource<object>(address);
// Set up the callback event handlers
AsyncCompletedEventHandler completedHandler = (cs, ce) =>
{
if (ce.UserState == tcs)
{
if (ce.Error != null) tcs.TrySetException(ce.Error);
else if (ce.Cancelled) tcs.TrySetCanceled();
else tcs.TrySetResult(null);
}
};
DownloadProgressChangedEventHandler progressChangedHandler = (ps, pe) =>
{
if (pe.UserState == tcs)
{
progress.Report(
Tuple.Create(
pe.BytesReceived,
pe.ProgressPercentage,
pe.TotalBytesToReceive));
}
};
try
{
webClient.DownloadFileCompleted += completedHandler;
webClient.DownloadProgressChanged += progressChangedHandler;
webClient.DownloadFileAsync(address, fileName, tcs);
await tcs.Task;
}
finally
{
webClient.DownloadFileCompleted -= completedHandler;
webClient.DownloadProgressChanged -= progressChangedHandler;
}
}
}
And just use it like this:
void Main()
{
var webClient = new WebClient();
webClient.DownloadFileTaskAsync(
new Uri("http://feeds.paulomorgado.net/paulomorgado/blogs/en"),
@"c:\temp\feed.xml",
new Progress<Tuple<long, int, long>>(t =>
{
Console.WriteLine($@"
Bytes received: {t.Item1,25:#,###}
Progress percentage: {t.Item2,25:#,###}
Total bytes to receive: {t.Item3,25:#,###}");
})).Wait();
}
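One caveat with the console example above, offered as a hedged aside: Progress<T> invokes its callback through the SynchronizationContext captured when it is constructed, and in a console app (no context) that means the thread pool. So the last few reports may print after Wait() returns, or arrive slightly out of order. If that matters, a small synchronous IProgress<T> implementation is one option. SynchronousProgress<T> below is a hypothetical helper, not part of the framework:

```csharp
using System;

// Hypothetical helper (not a framework type): invokes the callback
// synchronously on whichever thread calls Report.
public sealed class SynchronousProgress<T> : IProgress<T>
{
    private readonly Action<T> _handler;

    public SynchronousProgress(Action<T> handler)
    {
        if (handler == null) throw new ArgumentNullException(nameof(handler));
        _handler = handler;
    }

    // Unlike Progress<T>, nothing is posted to a SynchronizationContext
    // or the thread pool; the handler runs before Report returns.
    public void Report(T value)
    {
        _handler(value);
    }
}
```

Passing new SynchronousProgress<Tuple<long, int, long>>(t => ...) instead of new Progress<...>(...) would run the callback directly on the thread that raises DownloadProgressChanged, so the handler should stay short.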
I'm trying to send a mail, but the task is cancelled. Any idea why?
public static Task SendAsync(this SmtpClient client, MailMessage message)
{
TaskCompletionSource<object> tcs = new TaskCompletionSource<object>();
Guid sendGuid = Guid.NewGuid();
SendCompletedEventHandler handler = null;
handler = (o, ea) =>
{
if (ea.UserState is Guid && ((Guid)ea.UserState) == sendGuid)
{
client.SendCompleted -= handler;
if (ea.Cancelled)
{
tcs.SetCanceled(); // TASK CANCELLED: Why?
}
else if (ea.Error != null)
{
tcs.SetException(ea.Error);
}
else
{
tcs.SetResult(null);
}
}
};
client.SendCompleted += handler;
client.SendAsync(message, sendGuid);
return tcs.Task;
}
Called by:
using( SmtpClient smtpClient = new SmtpClient() )
{
return smtpClient.SendAsync(msg);
}
Thanks in advance for any help!
Gerard
A using statement calls the object's Dispose method when execution leaves the block. Calling smtpClient.SendAsync without awaiting the returned task lets the block end immediately, so Dispose is called on the SmtpClient while SendAsync is still in progress. That also explains why only some mails go through: some sends complete before the client is disposed and others don't.
Await the task inside the using block (the calling method must be marked async):
using (SmtpClient smtpClient = new SmtpClient())
{
await smtpClient.SendAsync(msg);
}
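To see the effect in isolation, here is a minimal sketch. FakeClient is a hypothetical stand-in for SmtpClient (assumption: disposing it cancels a pending send, which matches the behaviour reported in the question); nothing here touches the real System.Net.Mail API.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for SmtpClient: Dispose() cancels a pending send.
public sealed class FakeClient : IDisposable
{
    private readonly TaskCompletionSource<object> _pending =
        new TaskCompletionSource<object>();

    // Starts a "send" that completes after 200 ms unless disposed first.
    public Task SendAsync()
    {
        Task.Delay(200).ContinueWith(_ => _pending.TrySetResult(null));
        return _pending.Task;
    }

    public void Dispose()
    {
        // No effect if the send has already completed.
        _pending.TrySetCanceled();
    }
}

public static class UsingDemo
{
    // Returns the task without awaiting it: Dispose runs as soon as
    // the using block exits, while the send is still pending.
    public static Task SendWithoutAwait()
    {
        using (var client = new FakeClient())
        {
            return client.SendAsync();
        }
    }

    // Awaits inside the using block: Dispose runs only after the send
    // has finished.
    public static async Task SendWithAwait()
    {
        using (var client = new FakeClient())
        {
            await client.SendAsync();
        }
    }
}
```

The task from SendWithoutAwait ends up cancelled, while the one from SendWithAwait runs to completion. The only difference is where Dispose happens relative to the end of the send.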
I need to change my current code so that it does not block the current thread when EventWaitHandle.WaitOne is called. The problem is that I am waiting for a system-wide event, and I have not found a proper replacement yet.
Code:
EventWaitHandle handle = new EventWaitHandle(false, EventResetMode.AutoReset, "Local event", out screenLoadedSignalMutexWasCreated);
StartOtherApp();
if (screenLoadedSignalMutexWasCreated)
{
isOtherAppFullyLoaded = handle.WaitOne(45000, true);
if (isOtherAppFullyLoaded )
{
// do stuff
}
else
{
// do stuff
}
handle.Dispose();
signalingCompleted = true;
}
else
{
isOtherAppFullyLoaded = false;
throw new Exception(" ");
}
I need the app to continue rather than stop on the line where I call WaitOne; ideally I could use await there. How can I implement this?
You can use AsyncFactory.FromWaitHandle, in my AsyncEx library:
isOtherAppFullyLoaded = await AsyncFactory.FromWaitHandle(handle,
TimeSpan.FromMilliseconds(45000));
The implementation uses ThreadPool.RegisterWaitForSingleObject:
public static Task<bool> FromWaitHandle(WaitHandle handle, TimeSpan timeout)
{
// Handle synchronous cases.
var alreadySignalled = handle.WaitOne(0);
if (alreadySignalled)
return Task.FromResult(true);
if (timeout == TimeSpan.Zero)
return Task.FromResult(false);
// Register all asynchronous cases.
var tcs = new TaskCompletionSource<bool>();
var threadPoolRegistration = ThreadPool.RegisterWaitForSingleObject(handle,
(state, timedOut) => ((TaskCompletionSource<bool>)state).TrySetResult(!timedOut),
tcs, timeout, executeOnlyOnce: true);
// Release the thread pool registration once the task has completed.
tcs.Task.ContinueWith(_ =>
{
threadPoolRegistration.Unregister(null);
}, TaskScheduler.Default);
return tcs.Task;
}
I am using this method to instantiate a web browser programmatically, navigate to a url and return a result when the document has completed.
How would I be able to stop the Task and have GetFinalUrl() return null if the document takes more than 5 seconds to load?
I have seen many examples using a TaskFactory but I haven't been able to apply it to this code.
private Uri GetFinalUrl(PortalMerchant portalMerchant)
{
SetBrowserFeatureControl();
Uri finalUri = null;
if (string.IsNullOrEmpty(portalMerchant.Url))
{
return null;
}
Uri trackingUrl = new Uri(portalMerchant.Url);
var task = MessageLoopWorker.Run(DoWorkAsync, trackingUrl);
task.Wait();
if (!String.IsNullOrEmpty(task.Result.ToString()))
{
return new Uri(task.Result.ToString());
}
else
{
throw new Exception("Parsing Failed");
}
}
// by Noseratio - http://stackoverflow.com/users/1768303/noseratio
static async Task<object> DoWorkAsync(object[] args)
{
_threadCount++;
Console.WriteLine("Thread count:" + _threadCount);
Uri retVal = null;
var wb = new WebBrowser();
wb.ScriptErrorsSuppressed = true;
TaskCompletionSource<bool> tcs = null;
WebBrowserDocumentCompletedEventHandler documentCompletedHandler = (s, e) => tcs.TrySetResult(true);
foreach (var url in args)
{
tcs = new TaskCompletionSource<bool>();
wb.DocumentCompleted += documentCompletedHandler;
try
{
wb.Navigate(url.ToString());
await tcs.Task;
}
finally
{
wb.DocumentCompleted -= documentCompletedHandler;
}
retVal = wb.Url;
wb.Dispose();
return retVal;
}
return null;
}
public static class MessageLoopWorker
{
#region Public static methods
public static async Task<object> Run(Func<object[], Task<object>> worker, params object[] args)
{
var tcs = new TaskCompletionSource<object>();
var thread = new Thread(() =>
{
EventHandler idleHandler = null;
idleHandler = async (s, e) =>
{
// handle Application.Idle just once
Application.Idle -= idleHandler;
// return to the message loop
await Task.Yield();
// and continue asynchronously
// propagate the result or exception
try
{
var result = await worker(args);
tcs.SetResult(result);
}
catch (Exception ex)
{
tcs.SetException(ex);
}
// signal to exit the message loop
// Application.Run will exit at this point
Application.ExitThread();
};
// handle Application.Idle just once
// to make sure we're inside the message loop
// and SynchronizationContext has been correctly installed
Application.Idle += idleHandler;
Application.Run();
});
// set STA model for the new thread
thread.SetApartmentState(ApartmentState.STA);
// start the thread and await for the task
thread.Start();
try
{
return await tcs.Task;
}
finally
{
thread.Join();
}
}
#endregion
}
Updated: the latest version of the WebBrowser-based console web scraper can be found on Github.
Updated: Adding a pool of WebBrowser objects for multiple parallel downloads.
Do you have an example of how to do this in a console app by any chance? Also, I don't think webBrowser can be a class variable, because I am running the whole thing in a parallel for-each, iterating over thousands of URLs.
Below is an implementation of a more or less generic **WebBrowser-based web scraper**, which works as a console application. It's a consolidation of some of my previous WebBrowser-related efforts, including the code referenced in the question:
Capturing an image of the web page with opacity
Loading a page with dynamic AJAX content
Creating an STA message loop thread for WebBrowser
Loading a set of URLs, one after another
Printing a set of URLs with WebBrowser
Web page UI automation
A few points:
Reusable MessageLoopApartment class is used to start and run a WinForms STA thread with its own message pump. It can be used from a console application, as below. This class exposes a TPL Task Scheduler (FromCurrentSynchronizationContext) and a set of Task.Factory.StartNew wrappers to use this task scheduler.
This makes async/await a great tool for running WebBrowser navigation tasks on that separate STA thread. This way, a WebBrowser object gets created, navigated and destroyed on that thread. That said, MessageLoopApartment is not tied to WebBrowser specifically.
It's important to enable HTML5 rendering using Browser Feature Control, as otherwise the WebBrowser object runs in IE7 emulation mode by default. That's what SetFeatureBrowserEmulation does below.
It may not always be possible to determine with 100% certainty when a web page has finished rendering. Some pages are quite complex and use continuous AJAX updates. Yet we can get quite close by handling the DocumentCompleted event first, then polling the page's current HTML snapshot for changes and checking the WebBrowser.IsBusy property. That's what NavigateAsync does below.
A time-out logic is present on top of the above, in case the page rendering is never-ending (note CancellationTokenSource and CreateLinkedTokenSource).
using Microsoft.Win32;
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Windows.Forms;
namespace Console_22239357
{
class Program
{
// by Noseratio - https://stackoverflow.com/a/22262976/1768303
// main logic
static async Task ScrapeSitesAsync(string[] urls, CancellationToken token)
{
using (var apartment = new MessageLoopApartment())
{
// create WebBrowser inside MessageLoopApartment
var webBrowser = apartment.Invoke(() => new WebBrowser());
try
{
foreach (var url in urls)
{
Console.WriteLine("URL:\n" + url);
// cancel in 30s or when the main token is signalled
var navigationCts = CancellationTokenSource.CreateLinkedTokenSource(token);
navigationCts.CancelAfter((int)TimeSpan.FromSeconds(30).TotalMilliseconds);
var navigationToken = navigationCts.Token;
// run the navigation task inside MessageLoopApartment
string html = await apartment.Run(() =>
webBrowser.NavigateAsync(url, navigationToken), navigationToken);
Console.WriteLine("HTML:\n" + html);
}
}
finally
{
// dispose of WebBrowser inside MessageLoopApartment
apartment.Invoke(() => webBrowser.Dispose());
}
}
}
// entry point
static void Main(string[] args)
{
try
{
WebBrowserExt.SetFeatureBrowserEmulation(); // enable HTML5
var cts = new CancellationTokenSource((int)TimeSpan.FromMinutes(3).TotalMilliseconds);
var task = ScrapeSitesAsync(
new[] { "http://example.com", "http://example.org", "http://example.net" },
cts.Token);
task.Wait();
Console.WriteLine("Press Enter to exit...");
Console.ReadLine();
}
catch (Exception ex)
{
while (ex is AggregateException && ex.InnerException != null)
ex = ex.InnerException;
Console.WriteLine(ex.Message);
Environment.Exit(-1);
}
}
}
/// <summary>
/// WebBrowserExt - WebBrowser extensions
/// by Noseratio - https://stackoverflow.com/a/22262976/1768303
/// </summary>
public static class WebBrowserExt
{
const int POLL_DELAY = 500;
// navigate and download
public static async Task<string> NavigateAsync(this WebBrowser webBrowser, string url, CancellationToken token)
{
// navigate and await DocumentCompleted
var tcs = new TaskCompletionSource<bool>();
WebBrowserDocumentCompletedEventHandler handler = (s, arg) =>
tcs.TrySetResult(true);
using (token.Register(() => tcs.TrySetCanceled(), useSynchronizationContext: true))
{
webBrowser.DocumentCompleted += handler;
try
{
webBrowser.Navigate(url);
await tcs.Task; // wait for DocumentCompleted
}
finally
{
webBrowser.DocumentCompleted -= handler;
}
}
// get the root element
var documentElement = webBrowser.Document.GetElementsByTagName("html")[0];
// poll the current HTML for changes asynchronously
var html = documentElement.OuterHtml;
while (true)
{
// wait asynchronously, this will throw if cancellation requested
await Task.Delay(POLL_DELAY, token);
// continue polling if the WebBrowser is still busy
if (webBrowser.IsBusy)
continue;
var htmlNow = documentElement.OuterHtml;
if (html == htmlNow)
break; // no changes detected, end the poll loop
html = htmlNow;
}
// consider the page fully rendered
token.ThrowIfCancellationRequested();
return html;
}
// enable HTML5 (assuming we're running IE10+)
// more info: https://stackoverflow.com/a/18333982/1768303
public static void SetFeatureBrowserEmulation()
{
if (System.ComponentModel.LicenseManager.UsageMode != System.ComponentModel.LicenseUsageMode.Runtime)
return;
var appName = System.IO.Path.GetFileName(System.Diagnostics.Process.GetCurrentProcess().MainModule.FileName);
Registry.SetValue(@"HKEY_CURRENT_USER\Software\Microsoft\Internet Explorer\Main\FeatureControl\FEATURE_BROWSER_EMULATION",
appName, 10000, RegistryValueKind.DWord);
}
}
/// <summary>
/// MessageLoopApartment
/// STA thread with message pump for serial execution of tasks
/// by Noseratio - https://stackoverflow.com/a/22262976/1768303
/// </summary>
public class MessageLoopApartment : IDisposable
{
Thread _thread; // the STA thread
TaskScheduler _taskScheduler; // the STA thread's task scheduler
public TaskScheduler TaskScheduler { get { return _taskScheduler; } }
/// <summary>MessageLoopApartment constructor</summary>
public MessageLoopApartment()
{
var tcs = new TaskCompletionSource<TaskScheduler>();
// start an STA thread and gets a task scheduler
_thread = new Thread(startArg =>
{
EventHandler idleHandler = null;
idleHandler = (s, e) =>
{
// handle Application.Idle just once
Application.Idle -= idleHandler;
// return the task scheduler
tcs.SetResult(TaskScheduler.FromCurrentSynchronizationContext());
};
// handle Application.Idle just once
// to make sure we're inside the message loop
// and SynchronizationContext has been correctly installed
Application.Idle += idleHandler;
Application.Run();
});
_thread.SetApartmentState(ApartmentState.STA);
_thread.IsBackground = true;
_thread.Start();
_taskScheduler = tcs.Task.Result;
}
/// <summary>shutdown the STA thread</summary>
public void Dispose()
{
if (_taskScheduler != null)
{
var taskScheduler = _taskScheduler;
_taskScheduler = null;
// execute Application.ExitThread() on the STA thread
Task.Factory.StartNew(
() => Application.ExitThread(),
CancellationToken.None,
TaskCreationOptions.None,
taskScheduler).Wait();
_thread.Join();
_thread = null;
}
}
/// <summary>Task.Factory.StartNew wrappers</summary>
public void Invoke(Action action)
{
Task.Factory.StartNew(action,
CancellationToken.None, TaskCreationOptions.None, _taskScheduler).Wait();
}
public TResult Invoke<TResult>(Func<TResult> action)
{
return Task.Factory.StartNew(action,
CancellationToken.None, TaskCreationOptions.None, _taskScheduler).Result;
}
public Task Run(Action action, CancellationToken token)
{
return Task.Factory.StartNew(action, token, TaskCreationOptions.None, _taskScheduler);
}
public Task<TResult> Run<TResult>(Func<TResult> action, CancellationToken token)
{
return Task.Factory.StartNew(action, token, TaskCreationOptions.None, _taskScheduler);
}
public Task Run(Func<Task> action, CancellationToken token)
{
return Task.Factory.StartNew(action, token, TaskCreationOptions.None, _taskScheduler).Unwrap();
}
public Task<TResult> Run<TResult>(Func<Task<TResult>> action, CancellationToken token)
{
return Task.Factory.StartNew(action, token, TaskCreationOptions.None, _taskScheduler).Unwrap();
}
}
}
I suspect running a processing loop on another thread will not work out well, since WebBrowser is a UI component that hosts an ActiveX control.
When you're writing TAP over EAP wrappers, I recommend using extension methods to keep the code clean:
public static Task<string> NavigateAsync(this WebBrowser @this, string url)
{
    var tcs = new TaskCompletionSource<string>();
    WebBrowserDocumentCompletedEventHandler subscription = null;
    subscription = (_, args) =>
    {
        // unsubscribe on the first DocumentCompleted, then complete the task
        @this.DocumentCompleted -= subscription;
        tcs.TrySetResult(args.Url.ToString());
    };
    @this.DocumentCompleted += subscription;
    @this.Navigate(url);
    return tcs.Task;
}
Now your code can easily apply a timeout:
async Task<string> GetUrlAsync(string url)
{
using (var wb = new WebBrowser())
{
var navigate = wb.NavigateAsync(url);
var timeout = Task.Delay(TimeSpan.FromSeconds(5));
var completed = await Task.WhenAny(navigate, timeout);
if (completed == navigate)
return await navigate;
return null;
}
}
which can be consumed as such:
private async Task<Uri> GetFinalUrlAsync(PortalMerchant portalMerchant)
{
SetBrowserFeatureControl();
if (string.IsNullOrEmpty(portalMerchant.Url))
return null;
var result = await GetUrlAsync(portalMerchant.Url);
if (!String.IsNullOrEmpty(result))
return new Uri(result);
throw new Exception("Parsing Failed");
}
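The Task.WhenAny timeout used in GetUrlAsync is not specific to WebBrowser, so it can be factored out. The following is only a sketch: OrDefaultAfter is a hypothetical extension method name, not part of the framework or of any library mentioned here. Note that, just like the code above, it does not cancel the underlying operation on timeout; it only stops waiting for it.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TimeoutSketch
{
    // Hypothetical helper: returns the task's result, or the fallback
    // value if the timeout elapses first.
    public static async Task<T> OrDefaultAfter<T>(
        this Task<T> task, TimeSpan timeout, T fallback)
    {
        using (var cts = new CancellationTokenSource())
        {
            var delay = Task.Delay(timeout, cts.Token);
            var completed = await Task.WhenAny(task, delay);
            if (completed != task)
                return fallback;   // timed out; the original task keeps running

            cts.Cancel();          // stop the timer, it is no longer needed
            return await task;     // propagate the result or the exception
        }
    }
}
```

With such a helper, the body of GetUrlAsync could shrink to something like return await wb.NavigateAsync(url).OrDefaultAfter(TimeSpan.FromSeconds(5), null); inside the using block.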
I'm trying to benefit from Noseratio's solution while also following the advice from Stephen Cleary.
Here is Stephen's code, updated to include Noseratio's AJAX polling tip.
First part: the NavigateAsync task advised by Stephen
public static Task<string> NavigateAsync(this WebBrowser @this, string url)
{
    var tcs = new TaskCompletionSource<string>();
    WebBrowserDocumentCompletedEventHandler subscription = null;
    subscription = (_, args) =>
    {
        @this.DocumentCompleted -= subscription;
        tcs.TrySetResult(args.Url.ToString());
    };
    @this.DocumentCompleted += subscription;
    @this.Navigate(url);
    return tcs.Task;
}
Second part: a new Task NavAjaxAsync to run the tip for AJAX (based on Noseratio's code)
public static async Task<string> NavAjaxAsync(this WebBrowser @this)
{
    // get the root element
    var documentElement = @this.Document.GetElementsByTagName("html")[0];
    // poll the current HTML for changes asynchronously
    var html = documentElement.OuterHtml;
    while (true)
    {
        // wait asynchronously
        await Task.Delay(POLL_DELAY);
        // continue polling if the WebBrowser is still busy
        if (@this.IsBusy)
            continue;
        var htmlNow = documentElement.OuterHtml;
        if (html == htmlNow)
            break; // no changes detected, end the poll loop
        html = htmlNow;
    }
    return @this.Document.Url.ToString();
}
Third part: a new Task NavAndAjaxAsync to get the navigation and the AJAX
public static async Task<string> NavAndAjaxAsync(this WebBrowser @this, string url)
{
    await @this.NavigateAsync(url);
    // return the final URL so that callers can await a string result
    return await @this.NavAjaxAsync();
}
Fourth and last part: the updated Task GetUrlAsync from Stephen with Noseratio's code for AJAX
async Task<string> GetUrlAsync(string url)
{
using (var wb = new WebBrowser())
{
var navigate = wb.NavAndAjaxAsync(url);
var timeout = Task.Delay(TimeSpan.FromSeconds(5));
var completed = await Task.WhenAny(navigate, timeout);
if (completed == navigate)
return await navigate;
return null;
}
}
I'd like to know if this is the right approach.
In one action of my MVC 4 app, I have a call:
public ActionResult Test()
{
DownloadAsync("uri","file path");
return Content("OK");
}
DownloadAsync returns a Task, and I expected it to run in the background. But MVC only responds once the Task returned by DownloadAsync has completed (that is, the response waits for the download to finish). If I wrap the call in Task.Run() or Task.Factory.StartNew(), it works as I expect. Here's the DownloadAsync method:
private Task DownloadAsync(string url, string originalFile)
{
var tsc = new TaskCompletionSource<bool>();
var client = new WebClient();
AsyncCompletedEventHandler completedHandler = null;
completedHandler = (s, e) =>
{
var wc = (WebClient)s;
wc.DownloadFileCompleted -= completedHandler;
if (e.Cancelled)
{
tsc.TrySetCanceled();
}
else if (e.Error != null)
{
tsc.TrySetException(e.Error);
}
else
{
tsc.SetResult(true);
}
wc.Dispose();
};
client.DownloadFileCompleted += completedHandler;
client.DownloadFileAsync(new Uri(url), originalFile);
return tsc.Task;
}
So my questions are:
Why does the MVC request need to wait for the Task to complete in this case? Is there anything special about a Task created by TaskCompletionSource<T>?
How can I make the Task from DownloadAsync run in the background without delaying the response of the MVC request?
Thanks,