I've been researching this and everyone seems to agree that the solution is to check the ReadyState of the WebBrowser until it is set to Complete.
But the DocumentCompleted event is sometimes fired several times with the ReadyState set to Complete.
I don't think there is a solution with that crappy .NET WebBrowser, but there might be one if I use the underlying DOM component.
Only problem is, I have no idea how to access the DOM component behind the WebBrowser that fires the DocumentCompleted event.
DocumentCompleted will fire for each frame in the web page. The hard way is to count off the frames; this also shows you how to access the DOM:
private int mFrameCount;
private void startNavigate(string url) {
mFrameCount = 0;
webBrowser1.Navigate(url);
}
private void DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e) {
mFrameCount += 1;
bool done = true;
if (webBrowser1.Document != null) {
HtmlWindow win = webBrowser1.Document.Window;
if (win.Frames.Count > mFrameCount && win.Frames.Count > 0) done = false;
}
if (done) {
Console.WriteLine("Now it is really done");
}
}
The easy way is to check the URL that completed loading:
private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
if (e.Url.Equals(webBrowser1.Url)) {
Console.WriteLine("Now it is really done");
}
}
This would probably happen if the page uses Javascript or <meta refresh> to redirect to another page.
If so, there's no good workaround.
I can't find anything that will give 100% certainty.
The suggested check (e.Url.Equals(webBrowser1.Url)) may work for a simple WebBrowser.Navigate(url). In my case, however, I click nodes in code to open new frames inside existing frames. Mostly the number of times "Navigating" and "DocumentCompleted" fire is the same, but NOT always. "IsBusy = false" and "ReadyState = Complete" have always held when loading is finished (at least so far), but occasionally they also hold while the page is still loading. Counting frames also seems useless for me: in one case DocumentCompleted fired 23 times, although there are only 14 frames in total, counting sub-frames at every nesting level.
The only thing that seems to work is to wait a short period (1 or 2 seconds?) to see if anything happens (any events fired, any state changes).
Hmm, I found another solution for me. Often we're not interested in the whole page being loaded; we just want certain elements to exist. So after each DocumentCompleted, when "IsBusy = false" and "ReadyState = Complete", we can search the DOM to see if this element exists.
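The element check described above could look roughly like this. It is only a sketch: the class name `BrowserChecks` and the method `ElementIsReady` are made up for illustration, and `GetElementById` only searches the top-level document, not the documents inside frames.

```csharp
using System.Windows.Forms;

// Hypothetical helper, not part of the WebBrowser API.
public static class BrowserChecks
{
    // Returns true when the browser reports that it is idle and the element
    // with the given id exists in the current top-level document.
    public static bool ElementIsReady(WebBrowser browser, string elementId)
    {
        if (browser.IsBusy || browser.ReadyState != WebBrowserReadyState.Complete)
            return false;

        HtmlDocument doc = browser.Document;
        return doc != null && doc.GetElementById(elementId) != null;
    }
}
```

Calling `BrowserChecks.ElementIsReady(webBrowser1, "some_id")` from the DocumentCompleted handler (with `"some_id"` standing in for whatever element you are waiting for) lets you start processing as soon as that element shows up.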
In my experience it's impossible to tell when a web page has finished loading until DocumentCompleted hasn't fired for a while. So I refresh a timer for around 1000ms every time the DocumentCompleted event triggers. Then when the timer times out I process the web page.
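The timer approach just described might be sketched as follows. This is a minimal sketch under assumptions: the class name `QuietPeriodWatcher`, the `PageSettled` event, and the 1000 ms default are illustrative and do not come from any library.

```csharp
using System;
using System.Windows.Forms;

// Hypothetical helper: raises PageSettled once DocumentCompleted has been
// quiet for the given interval.
public sealed class QuietPeriodWatcher : IDisposable
{
    private readonly Timer _timer = new Timer(); // System.Windows.Forms.Timer
    private readonly WebBrowser _browser;

    public event EventHandler PageSettled;

    public QuietPeriodWatcher(WebBrowser browser, int quietMilliseconds = 1000)
    {
        _browser = browser;
        _timer.Interval = quietMilliseconds;
        _timer.Tick += OnTick;
        _browser.DocumentCompleted += OnDocumentCompleted;
    }

    private void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // Restart the countdown every time any frame completes.
        _timer.Stop();
        _timer.Start();
    }

    private void OnTick(object sender, EventArgs e)
    {
        // No DocumentCompleted for a full interval: treat the page as loaded.
        _timer.Stop();
        PageSettled?.Invoke(this, EventArgs.Empty);
    }

    public void Dispose()
    {
        _browser.DocumentCompleted -= OnDocumentCompleted;
        _timer.Dispose();
    }
}
```

You would create one watcher per browser and subscribe to `PageSettled` before navigating. The trade-off is that every page costs at least one extra interval of waiting, and a slow page can still fool it.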
I have a problem with my little c# project.
I need to somehow navigate through a site, performing a few simple actions on each page. My solution to it was along the lines of this:
private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
var button = webBrowser1.Document.GetElementById("next_page_button");
button.InvokeMember("click");
webBrowser1.Refresh();
//here's my ugly solution which works
do {} while (webBrowser1.ReadyState!=WebBrowserReadyState.Complete);
webBrowser1.Navigate("http://www.webtest.com/page3");
webBrowser1.Refresh();
//same method of waiting for loading, causes endless loop this time
do {} while (webBrowser1.ReadyState!=WebBrowserReadyState.Complete);
var images = webBrowser1.Document.GetElementsByTagName("img");
//and then I do stuff with all them images..
So basically my program detects that the webBrowser loaded a page just fine the first time with that ugly while loop, but then, after the navigate() command it enters the second loop and never comes out of it. How come?
I've checked and double checked everything in debug mode, going through every step.
I could use your advice on organizing this program better for sure. xD
I don't know whether this still helps after two years, but for others: your while loop keeps the main thread busy, so the WebBrowser object cannot do anything. You need to handle the event instead:
webBrowser.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler((object sender, WebBrowserDocumentCompletedEventArgs arg) =>
{
// do anything you want with the WebBrowser document.
});
Or call Application.DoEvents() inside the loop; this lets the main thread process pending messages once, so the WebBrowser object can load its resources, such as JavaScript files.
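Instead of blocking in a loop, the steps from the question could be driven by the DocumentCompleted event itself, with a counter that remembers which step comes next. The sketch below makes several assumptions: the form name `PagerForm`, the `startUrl` parameter, and the `_step` field are invented for illustration, and it assumes that clicking `next_page_button` triggers a real navigation that raises DocumentCompleted.

```csharp
using System;
using System.Windows.Forms;

// Hypothetical form that walks through the question's steps one event at a time.
public class PagerForm : Form
{
    private readonly WebBrowser webBrowser1 = new WebBrowser { Dock = DockStyle.Fill };
    private int _step; // index of the step to run on the next completed page

    public PagerForm(string startUrl)
    {
        Controls.Add(webBrowser1);
        webBrowser1.DocumentCompleted += OnDocumentCompleted;
        webBrowser1.Navigate(startUrl);
    }

    private void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // Ignore completions that belong to sub-frames.
        if (!e.Url.Equals(webBrowser1.Url)) return;

        switch (_step)
        {
            case 0: // first page loaded: click the button
                _step = 1;
                HtmlElement button = webBrowser1.Document.GetElementById("next_page_button");
                if (button != null) button.InvokeMember("click");
                break;
            case 1: // page reached by the click has loaded: go to page3
                _step = 2;
                webBrowser1.Navigate("http://www.webtest.com/page3");
                break;
            case 2: // page3 loaded: work with the images
                _step = 3;
                HtmlElementCollection images = webBrowser1.Document.GetElementsByTagName("img");
                Console.WriteLine("Found {0} images", images.Count);
                break;
        }
    }
}
```

The handler returns immediately after starting each step, so the UI thread stays free and the browser can keep loading.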
I'm using WebBrowser to parse web pages that use JavaScript.
I've been trying to find what I need on the page by using the DocumentCompleted event and checking the WebBrowser's Document property for an HtmlElement's inner text
(I know that it appears some time after the page starts loading).
Something like this:
private void WebDocCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
parseWebbrowserDocumentPropertyFunc(); // which sets content to true when the tag is found
if (!content)
return;
}
When I step through the code in the debugger, I see one thing:
the tag (HtmlElement) appears in the WebBrowser after the WebBrowser has stopped firing the DocumentCompleted event.
I mean that no further DocumentCompleted event happens, but the DocumentText property still changes.
OK, I got it working using a timer + Application.DoEvents().
Everything is fine, but the parsing process started to take a lot of time, and I don't know why.
Now I think that Application.DoEvents() is not a good solution, and I still want to use the DocumentCompleted event,
but I can't find any information about:
Why does the DocumentText property change without DocumentCompleted firing?
or
What can I use to wait for the tag I need on the page instead of a timer?
The timer is set to 200 ms.
The DocumentCompleted event does fire every time a frame is loaded. So obviously the document is different every time it fires.
You can add an additional check in your event. The ReadyState only gets set when the whole document is completed.
private void WebDocCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
if (yourBrowser.ReadyState != WebBrowserReadyState.Complete)
return;
parseWebbrowserDocumentPropertyFunc();
}
Why does the DocumentText property change without DocumentCompleted firing?
This is because your timer runs too fast (200 ms), so the DocumentCompleted event does not get a chance to fire in time. Try keeping your timer stopped until DocumentCompleted has fired.
private void DocumentComplete(object sender, WebBrowserDocumentCompletedEventArgs e)
{
_Timer.Start();
if (yourBrowser.ReadyState != WebBrowserReadyState.Complete)
return;
parseWebbrowserDocumentPropertyFunc();
}
private void parseWebbrowserDocumentPropertyFunc()
{
_Timer.Stop();
//something parse.....
}
I am trying to make my WebBrowser wait until the page fully loads, then proceed to the next step. I've researched how to do this, yet my code keeps running before the page loads.
private void AdobeConnect_Load(object sender, EventArgs e)
{
for (int x = 1; x <= 3; x++)
{
while (acBrwsr.ReadyState != WebBrowserReadyState.Complete)
{
Application.DoEvents();
}
adobeStepper(x);
}
}
The problem is that you are relying on ReadyState alone, and you shouldn't be. In WebBrowser.DocumentCompleted you need to check e.Url == WebBrowser.Url and then check the ready state. DocumentCompleted fires multiple times when the page has frames, and that messes with ReadyState.
What I do with my bots that use WebBrowser is I activate a timer when I have a document complete state for the actual page then grab my next item in the queue to process for that page like 1 second after the completion. (Of course always turn off the timer in the OnTick event.)
I wrote a queue that groups a set of tasks together where I can prioritize and remove any items like a list so I don't repeat tasks but only perform when DocumentCompleted e.url == webBrowser.url and my ReadyState is Complete.
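The two checks named above (the completed URL matches the browser's URL, and ReadyState is Complete) can also be wrapped in a task so that calling code can `await` the page load instead of spinning on Application.DoEvents(). This swaps in a different technique from the timer and queue described above. It is a sketch only: `WebBrowserExtensions` and `NavigateAsync` are invented names, not part of Windows Forms, and pages that redirect through JavaScript will complete the task on the first page rather than the final one.

```csharp
using System.Threading.Tasks;
using System.Windows.Forms;

// Hypothetical extension method.
public static class WebBrowserExtensions
{
    // Starts a navigation and returns a task that completes when the
    // top-level document reports completion.
    public static Task NavigateAsync(this WebBrowser browser, string url)
    {
        var tcs = new TaskCompletionSource<bool>();
        WebBrowserDocumentCompletedEventHandler handler = null;
        handler = (sender, e) =>
        {
            // Skip sub-frame completions and intermediate ready states.
            if (!e.Url.Equals(browser.Url)) return;
            if (browser.ReadyState != WebBrowserReadyState.Complete) return;

            browser.DocumentCompleted -= handler; // one-shot subscription
            tcs.TrySetResult(true);
        };
        browser.DocumentCompleted += handler;
        browser.Navigate(url);
        return tcs.Task;
    }
}
```

Inside an `async` event handler on the UI thread, `await webBrowser1.NavigateAsync(url);` followed by the next step keeps the steps in order without blocking the message loop.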
I made a simple webbrowser in c#, which keeps reloading a page and then does something when the page is loaded.
However, after the first time, the following function doesn't fire anymore:
public void gotourl()
{
webBrowser.Navigate("Stackoverflow.com");
}
public void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
MessageBox.Show("Page loaded successfully."); // only fires once
// waiting for a few seconds by using a timer
gotourl();
}
However, I did reload the page. The documentcompleted state simply doesn't fire again.
Is there a way to let the function fire every time I navigate to an url?
( I also tried webBrowser.Refresh() )
EDIT: I added the unbelievable solution..
The solution is unbelievable. After hours of searching and trying things, I found the answer.
In the properties of webBrowser1, I had set the property "AllowNavigation" to false.
When it is false, the webBrowser1.DocumentCompleted event fires only ONCE, because the control may not navigate to another page after the initial one has loaded. When AllowNavigation is true (which it is by default), DocumentCompleted fires on every navigation.
I have no clue why it works this way and I hope people with the same problem find this answer, as it is the only answer on the net..
How can I detect when a System.Windows.Forms.WebBrowser control has completed loading?
I tried to use the Navigate and DocumentCompleted events but both of them were raised a few times during document loading!
I think the DocumentCompleted event will get fired for all child documents that are loaded as well (like JS and CSS, for example). You could look at the WebBrowserDocumentCompletedEventArgs in DocumentCompleted and check the Url property and compare that to the Url of the main page.
I did the following:
void BrowserDocumentCompleted(object sender,
WebBrowserDocumentCompletedEventArgs e)
{
if (e.Url.AbsolutePath != (sender as WebBrowser).Url.AbsolutePath)
return;
//The page is finished loading
}
The last page loaded tends to be the one navigated to, so this should work.
From here.
The following should work.
private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
//Check if page is fully loaded or not
if (this.webBrowser1.ReadyState != WebBrowserReadyState.Complete)
return;
//Action to be taken on page loading completion
}
Note that the URL in DocumentCompleted can differ from the URL you navigated to because of server transfer or URL normalization (e.g. you navigate to www.microsoft.com and get http://www.microsoft.com in DocumentCompleted).
In pages with no frames, this event fires one time after loading is complete. In pages with multiple frames, this event fires for each navigating frame (note navigation is supported inside a frame, for instance clicking a link in a frame could navigate the frame to another page). The highest level navigating frame, which may or may not be the top level browser, fires the final DocumentComplete event.
In native code you would compare the sender of the DocumentComplete event to determine if the event is the final event in the navigation or not. However in Windows Forms the sender parameter is not wrapped by WebBrowserDocumentCompletedEventArgs. You can either sink the native event to get the parameter's value, or check the readystate property of the browser or frame documents in the DocumentCompleted event handler to see if all frames are in the ready state.
There is a problem with the ReadyState method: if a download manager is present and the navigation is to a downloadable file, the navigation could be cancelled by the download manager and the ReadyState won't become Complete.
I had the same issue of DocumentCompleted firing multiple times and tried out all the suggestions above. In my case neither the IsBusy property nor the Url property works reliably, but ReadyState seems to be what I needed: it has the status 'Interactive' while the multiple frames are loading and only gets the status 'Complete' after the last one has loaded. Thus, I know when the page is fully loaded with all its components.
I hope this may help others too :)
It doesn't seem to trigger DocumentCompleted/Navigated events for external Javascript or CSS files, but it will for iframes. As PK says, compare the WebBrowserDocumentCompletedEventArgs.Url property (I don't have the karma to make a comment yet).
If you're using WPF there is a LoadCompleted event.
If it's Windows.Forms, the DocumentCompleted event should be the correct one. If the page you're loading has frames, your WebBrowser control will fire the DocumentCompleted event for each frame (see here for more details). I would suggest checking the IsBusy property each time the event is fired and if it is false then your page is fully done loading.
Using the DocumentCompleted event with a page with multiple nested frames didn't work for me.
I used the Interop.SHDocVW library to cast the WebBrowser control like this:
public class webControlWrapper
{
private bool _complete;
private WebBrowser _webBrowserControl;
public webControlWrapper(WebBrowser webBrowserControl)
{
_webBrowserControl = webBrowserControl;
}
public void NavigateAndWaitForComplete(string url)
{
_complete = false;
_webBrowserControl.Navigate(url);
var webBrowser = (SHDocVw.WebBrowser) _webBrowserControl.ActiveXInstance;
if (webBrowser != null)
webBrowser.DocumentComplete += WebControl_DocumentComplete;
//Wait until page is complete
while (!_complete)
{
Application.DoEvents();
}
}
private void WebControl_DocumentComplete(object pDisp, ref object URL)
{
// Test if it's the main frame who called the event.
if (pDisp == _webBrowserControl.ActiveXInstance)
_complete = true;
}
}
This code works for me when navigating to a defined URL using the webBrowserControl.Navigate(url) method, but I don't know how to control page complete when a html button is clicked using the htmlElement.InvokeMember("click").
You can use the ProgressChanged event; the last time it is raised indicates that the document is fully rendered:
this.webBrowser.ProgressChanged += new
WebBrowserProgressChangedEventHandler(webBrowser_ProgressChanged);