I'm creating an extension method for the WebBrowser control whose purpose is to wait for the page to have finished completely loading before returning.
By completely loading, I mean that one second has to have elapsed after the last DocumentCompleted event. This is to account for pages where multiple DocumentCompleted events are triggered during the loading of the page (which is something that's affecting my app currently).
I've written the method below, which appears to work, but I feel it could be improved: it returns a task when there's probably no need to, and an exception is thrown if the application is closed while it is waiting for the page to load.
public static Task<bool> WaitLoad(this WebBrowser webBrowser, int wait)
{
var timerInternalWait = new Timer {Interval = 1000, Tag = "Internal"};
var timerMaxWait = new Timer {Interval = wait};
var tcs = new TaskCompletionSource<bool>();
WebBrowserNavigatingEventHandler navigatingHandler = (sender, args) => timerInternalWait.Stop();
webBrowser.Navigating += navigatingHandler;
WebBrowserDocumentCompletedEventHandler documentCompletedHandler = (sender, args) => { timerInternalWait.Stop(); timerInternalWait.Start(); };
webBrowser.DocumentCompleted += documentCompletedHandler;
EventHandler timerHandler = null;
timerHandler = (sender, args) =>
{
webBrowser.Navigating -= navigatingHandler;
webBrowser.DocumentCompleted -= documentCompletedHandler;
timerInternalWait.Tick -= timerHandler;
timerMaxWait.Tick -= timerHandler;
timerMaxWait.Stop();
timerInternalWait.Stop();
tcs.SetResult(((Timer) sender).Tag.ToString() == "Internal");
};
timerInternalWait.Tick += timerHandler;
timerMaxWait.Tick += timerHandler;
return tcs.Task;
}
Is there something I can do to improve it?
The design guidelines state that asynchronous methods should be suffixed with Async.
That timer is making your code harder to read. Try this instead:
public static async Task WaitLoadAsync(this WebBrowser webBrowser)
{
var tcs = new TaskCompletionSource<object>();
// TrySetResult, because DocumentCompleted can fire more than once per page
WebBrowserDocumentCompletedEventHandler documentCompletedHandler = (sender, args) => tcs.TrySetResult(null);
try
{
webBrowser.DocumentCompleted += documentCompletedHandler;
await tcs.Task;
}
finally
{
webBrowser.DocumentCompleted -= documentCompletedHandler;
}
// give the page one more second to settle
await Task.Delay(1000);
}
public static async Task<bool> WaitLoadAsync(this WebBrowser webBrowser, int timeout)
{
var webBrowserTask = webBrowser.WaitLoadAsync();
// true if the page finished loading before the timeout elapsed
return await Task.WhenAny(webBrowserTask, Task.Delay(timeout)) == webBrowserTask;
}
But you should look into cancellation tokens instead of double timers.
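As a rough sketch of the cancellation-token idea (the `WithCancellation` helper name is made up for illustration and is not part of the framework), a pending task can be raced against a `CancellationToken` instead of a second timer:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TaskCancellationExtensions
{
    // Hypothetical helper: completes when `task` completes, or throws
    // OperationCanceledException if `token` is cancelled first.
    public static async Task WithCancellation(this Task task, CancellationToken token)
    {
        var cancelTcs = new TaskCompletionSource<bool>();
        // The registration is disposed as soon as the race is decided.
        using (token.Register(() => cancelTcs.TrySetResult(true)))
        {
            if (await Task.WhenAny(task, cancelTcs.Task) != task)
                throw new OperationCanceledException(token);
        }
        // Await the original task so that its own exceptions propagate.
        await task;
    }
}
```

A caller could then create a `CancellationTokenSource` with a timeout, or cancel it when the form closes, and await the page-load task through this helper.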
I'm relatively new to C# and have run into an issue where my onReceived event is triggered twice, but the first time it doesn't update the subscriber; it only updates the second time. Below is some of the code used in my program. I'd appreciate it if anyone could take a look and tell me what the cause is.
I have something like this.
public TCPMain()
{
SocketService.TCPClient.onConnected += new SocketService.TCPClient.onConnectedHandler(Client_onConnected);
SocketService.TCPClient.onDataReceived += new SocketService.TCPClient.onDataReceivedHandler(Client_onDataReceived);
SocketService.TCPClient.onDisconnected += new SocketService.TCPClient.onDisconnectedHandler(Client_onDisconnected);
SocketService.TCPClient.onStatus += new SocketService.TCPClient.onStatusHandler(Client_onStatus);
SocketService.TCPClient.onStatus2 += new SocketService.TCPClient.onStatus2Handler(Client_onStatus2);
}
public TCPMain(string RemoteServerIP, int RemoteServerPort, int RemoteServerPort2, string LocalIP = "", int LocalPort = 0)
{
this.RemoteServerIP = RemoteServerIP;
this.RemoteServerPort = RemoteServerPort;
this.RemoteServerPort2 = RemoteServerPort2;
this.LocalIP = LocalIP;
this.LocalPort = LocalPort;
SocketService.TCPClient.onConnected += new SocketService.TCPClient.onConnectedHandler(Client_onConnected);
SocketService.TCPClient.onDataReceived += new SocketService.TCPClient.onDataReceivedHandler(Client_onDataReceived);
SocketService.TCPClient.onDisconnected += new SocketService.TCPClient.onDisconnectedHandler(Client_onDisconnected);
SocketService.TCPClient.onStatus += new SocketService.TCPClient.onStatusHandler(Client_onStatus);
SocketService.TCPClient.onStatus2 += new SocketService.TCPClient.onStatus2Handler(Client_onStatus2);
}
Then I have the following events, one of which is raised through a virtual method:
public delegate void onReceivedHandler(string Key, string value);
public event onReceivedHandler OnReceived;
public delegate void onDataReceivedHandler(string Message);
public event onDataReceivedHandler onDataReceived;
public virtual void RaiseOnDataReceived(string Message)
{
if (onDataReceived != null)
onDataReceived(Message);
}
And when the received event is triggered, the following function is called
void Client_onDataReceived(string Message)
{
try
{
RaiseOnDataReceived(Message);
Message = Message.TrimEnd();
string[] _strParts = null;
}
}
When a new message arrives, Client_onDataReceived is triggered, and RaiseOnDataReceived is then called from TCPMain via the virtual method. The problem is that when execution reaches the virtual method, onDataReceived is null, so it returns to the client and executes the rest of the code. The second time the event is triggered, onDataReceived is non-null, but only one message was sent at the time.
So I thought WebClient.DownloadFileAsync would have a default timeout, but looking through the documentation I cannot find anything about it, so I'm guessing it doesn't.
I am trying to download a file from the internet like so:
using (WebClient wc = new WebClient())
{
wc.DownloadProgressChanged += ((sender, args) =>
{
IndividualProgress = args.ProgressPercentage;
});
wc.DownloadFileCompleted += ((sender, args) =>
{
if (args.Error == null)
{
if (!args.Cancelled)
{
File.Move(filePath, Path.ChangeExtension(filePath, ".jpg"));
}
mr.Set();
}
else
{
ex = args.Error;
mr.Set();
}
});
wc.DownloadFileAsync(new Uri("MyInternetFile"), filePath);
mr.WaitOne();
if (ex != null)
{
throw ex;
}
}
But if I turn off my WiFi (simulating a drop of internet connection) my application just pauses and the download stops but it will never report that through to the DownloadFileCompleted method.
For this reason I would like to implement a timeout on my WebClient.DownloadFileAsync method. Is this possible?
As an aside, I am using .NET 4 and don't want to add references to third-party libraries, so I cannot use the async/await keywords.
You can use WebClient.DownloadFileAsync(). Now inside a timer you can call CancelAsync() like so:
System.Timers.Timer aTimer = new System.Timers.Timer();
System.Timers.ElapsedEventHandler handler = null;
handler = ((sender, args)
=>
{
aTimer.Elapsed -= handler;
wc.CancelAsync();
});
aTimer.Elapsed += handler;
aTimer.Interval = 100000;
aTimer.Enabled = true;
Alternatively, create your own WebClient subclass:
public class NewWebClient : WebClient
{
protected override WebRequest GetWebRequest(Uri address)
{
var req = base.GetWebRequest(address);
req.Timeout = 18000;
return req;
}
}
Create a WebClientAsync class that implements the timer in the constructor. This way you aren't copying and pasting the timer code into every implementation.
// requires: using System.Net; using System.Timers;
public class WebClientAsync : WebClient
{
private readonly int _timeoutMilliseconds;
private readonly Timer _timer; // a field, so the progress handler can restart it
public WebClientAsync(int timeoutSeconds)
{
_timeoutMilliseconds = timeoutSeconds * 1000;
_timer = new Timer(_timeoutMilliseconds);
ElapsedEventHandler handler = null;
handler = ((sender, args) =>
{
_timer.Elapsed -= handler;
this.CancelAsync();
});
_timer.Elapsed += handler;
_timer.Enabled = true;
}
protected override WebRequest GetWebRequest(Uri address)
{
WebRequest request = base.GetWebRequest(address);
request.Timeout = _timeoutMilliseconds;
((HttpWebRequest)request).ReadWriteTimeout = _timeoutMilliseconds;
return request;
}
protected override void OnDownloadProgressChanged(DownloadProgressChangedEventArgs e)
{
base.OnDownloadProgressChanged(e);
// System.Timers.Timer has no Reset(); stopping and starting restarts the interval
_timer.Stop();
_timer.Start();
}
}
This will allow you to timeout if you lose Internet connection while downloading a file.
Here is another implementation; I tried to avoid any shared class/object variables to prevent trouble with multiple calls:
public Task<string> DownloadFile(Uri url)
{
var tcs = new TaskCompletionSource<string>();
Task.Run(async () =>
{
bool hasProgresChanged = false;
var timer = new Timer(new TimeSpan(0, 0, 20).TotalMilliseconds);
var client = new WebClient();
void downloadHandler(object s, DownloadProgressChangedEventArgs e) => hasProgresChanged = true;
void timerHandler(object s, ElapsedEventArgs e)
{
timer.Stop();
if (hasProgresChanged)
{
timer.Start();
hasProgresChanged = false;
}
else
{
CleanResources();
tcs.TrySetException(new TimeoutException("Download timedout"));
}
}
void CleanResources()
{
client.DownloadProgressChanged -= downloadHandler;
client.Dispose();
timer.Elapsed -= timerHandler;
timer.Dispose();
}
string filePath = Path.Combine(Path.GetTempPath(), Path.GetFileName(url.ToString()));
try
{
client.DownloadProgressChanged += downloadHandler;
timer.Elapsed += timerHandler;
timer.Start();
await client.DownloadFileTaskAsync(url, filePath);
}
catch (Exception e)
{
tcs.TrySetException(e);
}
finally
{
CleanResources();
}
return tcs.TrySetResult(filePath);
});
return tcs.Task;
}
This has been answered many times here and on other sites, and it works, but I would like ideas for other ways to
get ReadyState = Complete after a navigate or post, without using DoEvents because of all its cons.
I would also note that using the DocumentCompleted event would not help here, as I won't be navigating to only one page but to one after another, like this:
wb.Navigate("www.microsoft.com");
// don't use a DoEvents loop here
wb.Document.Body.SetAttribute(textbox1, "login");
// don't use a DoEvents loop here
if (wb.DocumentText.Contains("text"))
// do something
The way it is today, it works by using DoEvents. I would like to know if anyone has a proper way to wait for the browser's asynchronous calls to finish and only then proceed with the rest of the logic. Just for the sake of it.
Thanks in advance.
Below is the code for a basic WinForms app, illustrating how to wait for the DocumentCompleted event asynchronously, using async/await. It navigates to multiple pages, one after another. Everything takes place on the main UI thread.
Instead of calling this.webBrowser.Navigate(url), it could simulate a form button click to trigger a POST-style navigation.
The webBrowser.IsBusy async loop logic is optional; its purpose is to account (non-deterministically) for the page's dynamic AJAX code, which may run after the window.onload event.
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;
using System.Windows.Forms;
namespace WebBrowserApp
{
public partial class MainForm : Form
{
WebBrowser webBrowser;
public MainForm()
{
InitializeComponent();
// create a WebBrowser
this.webBrowser = new WebBrowser();
this.webBrowser.Dock = DockStyle.Fill;
this.Controls.Add(this.webBrowser);
this.Load += MainForm_Load;
}
// Form Load event handler
async void MainForm_Load(object sender, EventArgs e)
{
// cancel the whole operation in 30 sec
var cts = new CancellationTokenSource(30000);
var urls = new String[] {
"http://www.example.com",
"http://www.gnu.org",
"http://www.debian.org" };
await NavigateInLoopAsync(urls, cts.Token);
}
// navigate to each URL in a loop
async Task NavigateInLoopAsync(string[] urls, CancellationToken ct)
{
foreach (var url in urls)
{
ct.ThrowIfCancellationRequested();
var html = await NavigateAsync(ct, () =>
this.webBrowser.Navigate(url));
Debug.Print("url: {0}, html: \n{1}", url, html);
}
}
// asynchronous navigation
async Task<string> NavigateAsync(CancellationToken ct, Action startNavigation)
{
var onloadTcs = new TaskCompletionSource<bool>();
EventHandler onloadEventHandler = null;
WebBrowserDocumentCompletedEventHandler documentCompletedHandler = delegate
{
// DocumentCompleted may be called several times for the same page,
// if the page has frames
if (onloadEventHandler != null)
return;
// so, observe DOM onload event to make sure the document is fully loaded
onloadEventHandler = (s, e) =>
onloadTcs.TrySetResult(true);
this.webBrowser.Document.Window.AttachEventHandler("onload", onloadEventHandler);
};
this.webBrowser.DocumentCompleted += documentCompletedHandler;
try
{
using (ct.Register(() => onloadTcs.TrySetCanceled(), useSynchronizationContext: true))
{
startNavigation();
// wait for DOM onload event, throw if cancelled
await onloadTcs.Task;
}
}
finally
{
this.webBrowser.DocumentCompleted -= documentCompletedHandler;
if (onloadEventHandler != null)
this.webBrowser.Document.Window.DetachEventHandler("onload", onloadEventHandler);
}
// the page has fully loaded by now
// optional: let the page run its dynamic AJAX code,
// we might add another timeout for this loop
do { await Task.Delay(500, ct); }
while (this.webBrowser.IsBusy);
// return the page's HTML content
return this.webBrowser.Document.GetElementsByTagName("html")[0].OuterHtml;
}
}
}
If you're looking to do something similar from a console app, here is an example of that.
The solution is simple:
// MAKE SURE ReadyState = Complete
while (WebBrowser1.ReadyState.ToString() != "Complete") {
Application.DoEvents();
}
// Move on to your sub-sequence code...
Quick and dirty. I am a VBA guy, and this logic has worked for me forever; I spent days looking for a C# equivalent and found none, so I worked this out myself.
Following is my complete function, the objective is to obtain a segment of info from a webpage:
private int maxReloadAttempt = 3;
private int currentAttempt = 1;
private string GetCarrier(string webAddress)
{
WebBrowser WebBrowser_4MobileCarrier = new WebBrowser();
string innerHtml;
string strStartSearchFor = "subtitle block pull-left\">";
string strEndSearchFor = "<";
try
{
WebBrowser_4MobileCarrier.ScriptErrorsSuppressed = true;
WebBrowser_4MobileCarrier.Navigate(webAddress);
// MAKE SURE ReadyState = Complete
while (WebBrowser_4MobileCarrier.ReadyState.ToString() != "Complete") {
Application.DoEvents();
}
// LOAD HTML
innerHtml = WebBrowser_4MobileCarrier.Document.Body.InnerHtml;
// ATTEMPT (x3) TO EXTRACT CARRIER STRING
while (currentAttempt <= maxReloadAttempt) {
if (innerHtml.IndexOf(strStartSearchFor) >= 0)
{
currentAttempt = 1; // Reset attempt counter
return Sub_String(innerHtml, strStartSearchFor, strEndSearchFor, "0"); // Method: "Sub_String" is my custom function
}
else
{
currentAttempt += 1; // Increment attempt counter
return GetCarrier(webAddress); // Recursive method call; return its result instead of discarding it
} // End if
} // End while
} // End Try
catch //(Exception ex)
{
}
return "Unavailable";
}
Here is a "quick & dirty" solution. It's not 100% foolproof but it doesn't block UI thread and it should be satisfactory to prototype WebBrowser control Automation procedures:
private async void testButton_Click(object sender, EventArgs e)
{
await Task.Factory.StartNew(
() =>
{
stepTheWeb(() => wb.Navigate("www.yahoo.com"));
stepTheWeb(() => wb.Navigate("www.microsoft.com"));
stepTheWeb(() => wb.Navigate("asp.net"));
stepTheWeb(() => wb.Document.InvokeScript("eval", new[] { "$('p').css('background-color','yellow')" }));
bool testFlag = false;
stepTheWeb(() => testFlag = wb.DocumentText.Contains("Get Started"));
if (testFlag) { /* TODO */ }
// ...
}
);
}
private void stepTheWeb(Action task)
{
this.Invoke(new Action(task));
WebBrowserReadyState rs = WebBrowserReadyState.Interactive;
while (rs != WebBrowserReadyState.Complete)
{
this.Invoke(new Action(() => rs = wb.ReadyState));
System.Threading.Thread.Sleep(300);
}
}
Here is a slightly more generic version of the testButton_Click method:
private async void testButton_Click(object sender, EventArgs e)
{
var actions = new List<Action>()
{
() => wb.Navigate("www.yahoo.com"),
() => wb.Navigate("www.microsoft.com"),
() => wb.Navigate("asp.net"),
() => wb.Document.InvokeScript("eval", new[] { "$('p').css('background-color','yellow')" }),
() => {
bool testFlag = false;
testFlag = wb.DocumentText.Contains("Get Started");
if (testFlag) { /* TODO */ }
}
//...
};
await Task.Factory.StartNew(() => actions.ForEach((x)=> stepTheWeb (x)));
}
[Update]
I have adapted my "quick & dirty" sample by borrowing and slightly refactoring #Noseratio's NavigateAsync method from this topic.
The new version of the code executes asynchronously, in the UI thread context, not only navigation operations but also JavaScript/AJAX calls: any lambda or method that implements a single automation step.
Any code reviews and comments are very welcome, especially from #Noseratio. Together, we will make this world better ;)
public enum ActionTypeEnumeration
{
Navigation = 1,
Javascript = 2,
UIThreadDependent = 3,
UNDEFINED = 99
}
public class ActionDescriptor
{
public Action Action { get; set; }
public ActionTypeEnumeration ActionType { get; set; }
}
/// <summary>
/// Executes a set of WebBrowser control's Automation actions
/// </summary>
/// <remarks>
/// Test form should have the following controls:
/// webBrowser1 - WebBrowser,
/// testbutton - Button,
/// testCheckBox - CheckBox,
/// totalHtmlLengthTextBox - TextBox
/// </remarks>
private async void testButton_Click(object sender, EventArgs e)
{
try
{
var cts = new CancellationTokenSource(60000);
var actions = new List<ActionDescriptor>()
{
new ActionDescriptor() { Action = ()=> wb.Navigate("www.yahoo.com"), ActionType = ActionTypeEnumeration.Navigation} ,
new ActionDescriptor() { Action = () => wb.Navigate("www.microsoft.com"), ActionType = ActionTypeEnumeration.Navigation} ,
new ActionDescriptor() { Action = () => wb.Navigate("asp.net"), ActionType = ActionTypeEnumeration.Navigation} ,
new ActionDescriptor() { Action = () => wb.Document.InvokeScript("eval", new[] { "$('p').css('background-color','yellow')" }), ActionType = ActionTypeEnumeration.Javascript},
new ActionDescriptor() { Action =
() => {
testCheckBox.Checked = wb.DocumentText.Contains("Get Started");
},
ActionType = ActionTypeEnumeration.UIThreadDependent}
//...
};
foreach (var action in actions)
{
string html = await ExecuteWebBrowserAutomationAction(cts.Token, action.Action, action.ActionType);
// count HTML web page stats - just for fun
int totalLength = 0;
Int32.TryParse(totalHtmlLengthTextBox.Text, out totalLength);
totalLength += !string.IsNullOrWhiteSpace(html) ? html.Length : 0;
totalHtmlLengthTextBox.Text = totalLength.ToString();
}
}
catch (Exception ex)
{
MessageBox.Show(ex.Message, "Error");
}
}
// asynchronous WebBrowser control Automation
async Task<string> ExecuteWebBrowserAutomationAction(
CancellationToken ct,
Action runWebBrowserAutomationAction,
ActionTypeEnumeration actionType = ActionTypeEnumeration.UNDEFINED)
{
var onloadTcs = new TaskCompletionSource<bool>();
EventHandler onloadEventHandler = null;
WebBrowserDocumentCompletedEventHandler documentCompletedHandler = delegate
{
// DocumentCompleted may be called several times for the same page,
// if the page has frames
if (onloadEventHandler != null)
return;
// so, observe DOM onload event to make sure the document is fully loaded
onloadEventHandler = (s, e) =>
onloadTcs.TrySetResult(true);
this.wb.Document.Window.AttachEventHandler("onload", onloadEventHandler);
};
this.wb.DocumentCompleted += documentCompletedHandler;
try
{
using (ct.Register(() => onloadTcs.TrySetCanceled(), useSynchronizationContext: true))
{
runWebBrowserAutomationAction();
if (actionType == ActionTypeEnumeration.Navigation)
{
// wait for DOM onload event, throw if cancelled
await onloadTcs.Task;
}
}
}
finally
{
this.wb.DocumentCompleted -= documentCompletedHandler;
if (onloadEventHandler != null)
this.wb.Document.Window.DetachEventHandler("onload", onloadEventHandler);
}
// the page has fully loaded by now
// optional: let the page run its dynamic AJAX code,
// we might add another timeout for this loop
do { await Task.Delay(500, ct); }
while (this.wb.IsBusy);
// return the page's HTML content
return this.wb.Document.GetElementsByTagName("html")[0].OuterHtml;
}
I have a method that fetches HTML from a URL, extracts entities by parsing it, and returns a list of entities. Here is the sample code:
public List<Entity> FetchEntities()
{
List<Entity> myList = new List<Entity>();
string url = "<myUrl>";
string response = String.Empty;
client = new WebClient();
client.DownloadStringCompleted += (sender, e) =>
{
response = e.Result;
// parse response
// extract content and generate entities
// <---- I am currently filling list here
};
client.DownloadStringAsync(new Uri(url));
return myList;
}
The problem is that while the async call is in progress, control returns with an empty myList. How can I prevent this? My ultimate goal is to return a filled list.
Also, this method is in a separate class library project and is called from a Windows Phone application, and I have to keep it that way. Is there any way to do this, or am I missing something? Any help will be greatly appreciated.
You can either pass callbacks to the method like this, making it asynchronous without Tasks (you will have to update the method's callers slightly):
public void FetchEntities(
Action<List<Entity>> resultCallback,
Action<string> errorCallback)
{
List<Entity> myList = new List<Entity>();
string url = "<myUrl>";
string response = String.Empty;
client = new WebClient();
client.DownloadStringCompleted += (sender, e) =>
{
// e.Result throws if the download failed, so check e.Error first
if (e.Error != null)
{
if (errorCallback != null)
errorCallback("Ooops, something bad happened");
return;
}
response = e.Result;
// parse response
// extract content and generate entities
// <---- I am currently filling list here
if (resultCallback != null)
resultCallback(myList);
};
client.DownloadStringAsync(new Uri(url));
}
The other option is to force it to be synchronous, like this:
public List<Entity> FetchEntities()
{
List<Entity> myList = new List<Entity>();
string url = "<myUrl>";
string response = String.Empty;
client = new WebClient();
AutoResetEvent waitHandle = new AutoResetEvent(false);
client.DownloadStringCompleted += (sender, e) =>
{
response = e.Result;
// parse response
// extract content and generate entities
// <---- I am currently filling list here
waitHandle.Set();
};
client.DownloadStringAsync(new Uri(url));
waitHandle.WaitOne();
return myList;
}
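If the blocking version is used, it may be worth bounding the wait so the caller cannot hang forever when the completion event never fires. A minimal sketch, assuming a hypothetical `TryWait` helper whose `startAsyncWork` parameter stands in for the download call:

```csharp
using System;
using System.Threading;

public static class BoundedWait
{
    // Hypothetical helper: startAsyncWork receives a "done" callback that it
    // should invoke on completion. Returns true if "done" was invoked within
    // the timeout, false otherwise.
    public static bool TryWait(Action<Action> startAsyncWork, TimeSpan timeout)
    {
        // Deliberately not disposed: a late callback may still call Set().
        var waitHandle = new AutoResetEvent(false);
        startAsyncWork(() => waitHandle.Set());
        return waitHandle.WaitOne(timeout);
    }
}
```

In FetchEntities, the same idea would amount to replacing `waitHandle.WaitOne()` with `waitHandle.WaitOne(timeout)` and treating a `false` return value as a failure.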
Being non-blocking is the whole point of asynchronous programming. You can pass a callback as a parameter and handle the result somewhere else instead of trying to return it.
If you need to return the result, you can use this TPL library; I've been using it without problems for a while now.
public Task<string> GetWebResultAsync(string url)
{
var tcs = new TaskCompletionSource<string>();
var client = new WebClient();
DownloadStringCompletedEventHandler h = null;
h = (sender, args) =>
{
if (args.Cancelled)
{
tcs.SetCanceled();
}
else if (args.Error != null)
{
tcs.SetException(args.Error);
}
else
{
tcs.SetResult(args.Result);
}
client.DownloadStringCompleted -= h;
};
client.DownloadStringCompleted += h;
client.DownloadStringAsync(new Uri(url));
return tcs.Task;
}
And calling it is exactly how you use the TPL in .NET 4.0:
GetWebResultAsync(url).ContinueWith((t) =>
{
var result = t.Result; // this is the downloaded string
});
or:
var downloadTask = GetWebResultAsync(url);
downloadTask.Wait();
var result = downloadTask.Result; //this is the downloaded string
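Note that reading `Result` on a faulted task rethrows the error wrapped in an `AggregateException`. A small sketch of inspecting the task state inside the continuation first (the `Describe` helper and the simulated failure are illustrative only, not part of the answer's code):

```csharp
using System;
using System.Threading.Tasks;

public static class ContinuationExample
{
    // Hypothetical helper: turns a finished Task<string> into a status line
    // without letting Result throw.
    public static string Describe(Task<string> t)
    {
        if (t.IsCanceled) return "cancelled";
        if (t.IsFaulted) return "failed: " + t.Exception.InnerException.Message;
        return "ok: " + t.Result;
    }

    // Simulates a failed download with a TaskCompletionSource, so no network is needed.
    public static string Demo()
    {
        var tcs = new TaskCompletionSource<string>();
        Task<string> continuation = tcs.Task.ContinueWith(t => Describe(t));
        tcs.SetException(new InvalidOperationException("network down"));
        return continuation.Result;
    }
}
```

The same checks apply to the task returned by GetWebResultAsync, since it sets the cancelled, faulted, or completed state from the event arguments.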
Hope this helps :)