I'm using the WebBrowser control to parse web pages that use JavaScript.
I have been trying to find what I need on the page by handling the
DocumentCompleted event and checking the inner text of an HtmlElement in the WebBrowser's Document property
(I know the element only appears some time after the page starts loading).
Something like this:
private void WebDocCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
parseWebbrowserDocumentPropertyFunc(); // which sets content to true once the element is found
if (!content)
return;
}
When I debug the code, I see one thing:
the tag (HtmlElement) appears in the WebBrowser after the WebBrowser has stopped firing the DocumentCompleted event.
That is, no further DocumentCompleted event occurs, but the DocumentText property still changes.
I got it working with a timer plus Application.DoEvents().
Everything works, but parsing now takes a long time and I don't know why.
I now think Application.DoEvents() is not a good solution, and I would still like to use the DocumentCompleted event,
but I can't find any information about:
Why does the DocumentText property change without DocumentCompleted firing?
or
What can I use instead of a timer to wait for the tag I need to appear on the page?
The timer is set to 200 ms.
The DocumentCompleted event fires every time a frame is loaded, so the document can be different each time it fires.
You can add an additional check in your event handler. ReadyState only becomes Complete when the whole document has finished loading.
private void WebDocCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
if (yourBrowser.ReadyState != WebBrowserReadyState.Complete)
return;
parseWebbrowserDocumentPropertyFunc();
}
Why does the DocumentText property change without
DocumentCompleted firing?
This is because your timer runs too fast (200 ms), so the DocumentCompleted event cannot fire in time. Try keeping the timer stopped until DocumentCompleted has fired.
private void DocumentComplete(object sender, WebBrowserDocumentCompletedEventArgs e)
{
_Timer.Start();
if (yourBrowser.ReadyState != WebBrowserReadyState.Complete)
return;
parseWebbrowserDocumentPropertyFunc();
}
private void parseWebbrowserDocumentPropertyFunc()
{
_Timer.Stop();
//something parse.....
}
What I have going on is an InvokeMember("click") call, and I want to grab the resulting innerHTML. I'm unsure how, or whether, it's possible to wait until the action triggered by InvokeMember("click") has finished. In the page's JavaScript, this click takes you to the next 20 items listed, but I don't know how to tell when that JavaScript has fully loaded them. Below is what I'm using.
private void button1_Click(object sender, EventArgs e)
{
HtmlElement button = webBrowser1.Document.GetElementById("ctl08_ctl00_InventoryListDisplayFieldRepeater2_ctl00_BlockViewPaging_Next");
button.InvokeMember("click");
HtmlElement document = webBrowser1.Document.GetElementsByTagName("html")[0];
}
One possible solution is to modify your "click" event handler in javascript so it changes the value of some hidden input field right before exiting the method (after all work is done). You can attach to the event of changing field from C# code and act when it's fired.
// Your init method - you can call it after `InitializeComponent`
// in the constructor of your form
void Init() {
    webBrowser1.DocumentCompleted += webBrowser1_DocumentCompleted;
}
void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e) {
    // "statusField" is assumed to be the id of the hidden input that the page script updates
    HtmlElement statusField = webBrowser1.Document.GetElementById("statusField");
    if (statusField != null)
        statusField.AttachEventHandler("onchange", WorkDone);
}
void WorkDone(object sender, EventArgs e) {
    // The click's work is done, so the updated page can be read now
    HtmlElement document = webBrowser1.Document.GetElementsByTagName("html")[0];
}
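That's the raw solution; I haven't yet checked whether "onchange" is the right DOM event. As far as I know it is normally raised for user edits rather than when script assigns the value, so another event (for example IE's "onpropertychange") may be needed.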
Also, you can't attach to DOM events before the document is completely loaded, that's why I put the attaching logic in the handler of DocumentCompleted event.
I'm talking about the WebBrowser control, which uses Internet Explorer.
How can I detect when it has finished loading?
Always look for events in such cases. In this case, DocumentCompleted event.
Subscribe to the WebBrowser control's DocumentCompleted event. i.e.
this.webBrowser.DocumentCompleted += new System.Windows.Forms.WebBrowserDocumentCompletedEventHandler(this.WebBrowser_DocumentCompleted);
private void WebBrowser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
// Do stuff here
}
I have this simple code, where when the user leaves the TextBox control, TreeView gets focused:
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
private void Form1_Load(object sender, EventArgs e)
{
this.treeView1.Nodes.Add("A");
this.treeView1.Nodes[0].Nodes.Add("A.A");
this.treeView1.Nodes.Add("B");
this.treeView1.Nodes[1].Nodes.Add("B.A");
}
private void textBox1_Leave(object sender, EventArgs e)
{
System.Diagnostics.Debug.WriteLine("Leave..");
this.treeView1.Focus();
}
}
If we execute this code the Leave event is fired twice:
Leave..
Leave..
But if we set focus to other control, only one Leave event is fired.
Is that a problem of the TreeView? Do you know any workaround? Should we report this to Microsoft?
Thanks,
RG
this.treeView1.Focus();
Do not use the Focus() method in an event handler that's called because of a focusing event, like Leave. If you need to prevent a focus change then use the Validating event instead. Setting e.Cancel = true stops it.
But do note that doing so isn't very logical for a TreeView: there isn't anything the user can do to alter the state of the control, so you'll trap the user. Maybe that was the intention, but do make sure the user can still close the window. If not, then you might need the FormClosing event to force e.Cancel back to false.
Given that there is no code there to wire up the event I'm guessing you did it from the designer which means a line of code such as
textBox1.Leave += new EventHandler(textBox1_Leave);
will have been added to Form1.Designer.cs. Check this file to ensure the line doesn't exist more than once: each time this line runs it adds another subscription, so if it runs 3 times the Leave handler will be called 3 times when you leave the textbox!
HTH
OneShot
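The point made in the last answer, that every extra `+=` produces one more handler call, can be checked without any UI. The sketch below is only an illustration: `FakeTextBox` and `DuplicateSubscriptionDemo` are made-up names standing in for the real TextBox and its Leave event, and nothing here touches Windows Forms.

```csharp
using System;

// Hypothetical stand-in for a control that exposes a Leave event.
public class FakeTextBox
{
    public event EventHandler Leave;

    // Raises Leave once, the way the control would when focus moves away.
    public void RaiseLeave()
    {
        Leave?.Invoke(this, EventArgs.Empty);
    }
}

public static class DuplicateSubscriptionDemo
{
    // Subscribes the same handler `subscriptions` times, raises the event once,
    // and returns how many times the handler actually ran.
    public static int CountCalls(int subscriptions)
    {
        int calls = 0;
        var box = new FakeTextBox();
        EventHandler handler = (sender, e) => calls++;

        for (int i = 0; i < subscriptions; i++)
            box.Leave += handler;

        box.RaiseLeave();
        return calls;
    }
}
```

With one subscription a single raise runs the handler once; with three subscriptions the same single raise runs it three times, which is the symptom to look for in the designer file.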
I've been researching this, and everyone seems to agree that the solution is to check the ReadyState of the WebBrowser until it is set to Complete.
But the event is actually sometimes fired several times with the ReadyState set to Complete.
I don't think there is a solution with that crappy .NET WebBrowser, but there might be one if I use the underlying DOM component.
The only problem is, I have no idea how to access the DOM component behind the WebBrowser that fires the DocumentCompleted event.
DocumentCompleted will fire for each frame in the web page. The hard way is to count off the frames, which also shows you how to access the DOM:
private int mFrameCount;
private void startNavigate(string url) {
mFrameCount = 0;
webBrowser1.Navigate(url);
}
private void DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e) {
mFrameCount += 1;
bool done = true;
if (webBrowser1.Document != null) {
HtmlWindow win = webBrowser1.Document.Window;
if (win.Frames.Count > mFrameCount && win.Frames.Count > 0) done = false;
}
if (done) {
Console.WriteLine("Now it is really done");
}
}
The easy way is to check the URL that completed loading:
private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
if (e.Url.Equals(webBrowser1.Url)) {
Console.WriteLine("Now it is really done");
}
}
This would probably happen if the page uses Javascript or <meta refresh> to redirect to another page.
If so, there's no good workaround.
I can't find anything that will give 100% certainty.
The example mentioned above (e.Url.Equals(webBrowser1.Url)) may work for a simple WebBrowser.Navigate(url); however, in my case I click nodes in code to open new frames inside existing frames. Mostly "Navigating" and "DocumentCompleted" fire the same number of times, but again NOT always. "IsBusy = false" and "ReadyState = Complete" will always hold when loading has finished (at least so far), but occasionally they also hold while the page is still loading. Counting frames also seems useless for me: in one case DocumentCompleted fired 23 times, yet all frames and sub-frames (sub-sub-frames and so on) total only 14.
The only thing that seems to work is to wait a short period (1 or 2 seconds?) to see whether anything happens (any events fired, any state changes).
Hmm, I found another solution for me. Often we're not interested in the whole page being loaded; we just want certain elements to exist. So after each DocumentCompleted, and when "IsBusy = false" and "ReadyState = Complete", we can search the DOM to see whether the element exists.
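One way to express "keep checking until the element exists" without a hand-rolled timer or Application.DoEvents() is an awaitable polling helper. This is only a sketch under assumptions: `PageWait` and `UntilAsync` are hypothetical names, it needs async/await support (.NET 4.5 or later), and the condition you pass in is whatever DOM check fits your page.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper, not part of the WebBrowser API.
public static class PageWait
{
    // Polls `condition` until it returns true or `timeout` elapses.
    // Returns true if the condition was met, false if the wait timed out.
    public static async Task<bool> UntilAsync(
        Func<bool> condition, TimeSpan timeout, TimeSpan pollInterval)
    {
        DateTime deadline = DateTime.UtcNow + timeout;
        while (true)
        {
            if (condition())
                return true;
            if (DateTime.UtcNow >= deadline)
                return false;

            // Task.Delay returns control to the caller while waiting. When
            // awaited on a WinForms UI thread, the message loop keeps running
            // and the continuation resumes on that same thread.
            await Task.Delay(pollInterval);
        }
    }
}
```

In an async event handler on the form this could be called as `bool found = await PageWait.UntilAsync(() => webBrowser1.Document != null && webBrowser1.Document.GetElementById("results") != null, TimeSpan.FromSeconds(10), TimeSpan.FromMilliseconds(200));`, where "results" is a placeholder id for the element you are waiting for. The timeout is there so a page that never produces the element does not leave the wait running forever.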
In my experience it's impossible to tell when a web page has finished loading until DocumentCompleted hasn't fired for a while. So I refresh a timer for around 1000ms every time the DocumentCompleted event triggers. Then when the timer times out I process the web page.
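The "no DocumentCompleted for a while" rule can be written down as a small piece of state that is independent of the control. The sketch below is an illustration only: `QuietPeriodTracker` is a made-up name, and the caller supplies the timestamps so the logic can be exercised without a browser.

```csharp
using System;

// Hypothetical helper: remembers when the last event arrived and reports
// whether a full quiet period has passed since then.
public sealed class QuietPeriodTracker
{
    private readonly TimeSpan _quietPeriod;
    private DateTime? _lastEvent;

    public QuietPeriodTracker(TimeSpan quietPeriod)
    {
        _quietPeriod = quietPeriod;
    }

    // Call this from every DocumentCompleted; it restarts the quiet period.
    public void NoteEvent(DateTime now)
    {
        _lastEvent = now;
    }

    // True once at least one event has been seen and none has arrived
    // for the whole quiet period.
    public bool IsSettled(DateTime now)
    {
        return _lastEvent.HasValue && now - _lastEvent.Value >= _quietPeriod;
    }
}
```

On a form, the same effect is usually obtained with a System.Windows.Forms.Timer whose Interval is about 1000: call Stop() and then Start() on it in every DocumentCompleted handler, and process the page in the Tick handler after stopping the timer there. The trade-off is that every page load costs at least the quiet period in extra waiting.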
How can I detect when a System.Windows.Forms.WebBrowser control has completed loading?
I tried to use the Navigate and DocumentCompleted events but both of them were raised a few times during document loading!
I think the DocumentCompleted event will get fired for all child documents that are loaded as well (like JS and CSS, for example). You could look at the WebBrowserDocumentCompletedEventArgs in DocumentCompleted and check the Url property and compare that to the Url of the main page.
I did the following:
void BrowserDocumentCompleted(object sender,
WebBrowserDocumentCompletedEventArgs e)
{
if (e.Url.AbsolutePath != (sender as WebBrowser).Url.AbsolutePath)
return;
//The page is finished loading
}
The last page loaded tends to be the one navigated to, so this should work.
The following should work.
private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    // Check whether the page is fully loaded
    if (this.webBrowser1.ReadyState != WebBrowserReadyState.Complete)
        return;

    // Action to be taken on page loading completion
}
Note that the URL reported in DocumentCompleted can differ from the URL you navigated to, because of a server transfer or URL normalization (e.g. you navigate to www.microsoft.com and get http://www.microsoft.com in DocumentCompleted).
In pages with no frames, this event fires one time after loading is complete. In pages with multiple frames, this event fires for each navigating frame (note navigation is supported inside a frame, for instance clicking a link in a frame could navigate the frame to another page). The highest level navigating frame, which may or may not be the top level browser, fires the final DocumentComplete event.
In native code you would compare the sender of the DocumentComplete event to determine if the event is the final event in the navigation or not. However in Windows Forms the sender parameter is not wrapped by WebBrowserDocumentCompletedEventArgs. You can either sink the native event to get the parameter's value, or check the readystate property of the browser or frame documents in the DocumentCompleted event handler to see if all frames are in the ready state.
There is a problem with the ReadyState method: if a download manager is present and the navigation is to a downloadable file, the navigation could be cancelled by the download manager and the ReadyState won't become Complete.
I had the same issue of DocumentCompleted firing multiple times and tried all the suggestions above. In my case neither the IsBusy property nor the Url property worked reliably, but ReadyState turned out to be what I needed: it has the status 'Interactive' while the multiple frames are loading and becomes 'Complete' only after the last one has loaded. Thus, I know when the page is fully loaded with all its components.
I hope this may help others too :)
It doesn't seem to trigger DocumentCompleted/Navigated events for external Javascript or CSS files, but it will for iframes. As PK says, compare the WebBrowserDocumentCompletedEventArgs.Url property (I don't have the karma to make a comment yet).
If you're using WPF there is a LoadCompleted event.
If it's Windows.Forms, the DocumentCompleted event should be the correct one. If the page you're loading has frames, your WebBrowser control will fire the DocumentCompleted event for each frame (see here for more details). I would suggest checking the IsBusy property each time the event is fired and if it is false then your page is fully done loading.
Using the DocumentCompleted event with a page with multiple nested frames didn't work for me.
I used the Interop.SHDocVW library to cast the WebBrowser control like this:
using System;
using System.Windows.Forms;

public class webControlWrapper
{
    private bool _complete;
    private readonly WebBrowser _webBrowserControl;

    public webControlWrapper(WebBrowser webBrowserControl)
    {
        _webBrowserControl = webBrowserControl;
    }

    public void NavigateAndWaitForComplete(string url)
    {
        var webBrowser = (SHDocVw.WebBrowser) _webBrowserControl.ActiveXInstance;
        if (webBrowser == null)
            throw new InvalidOperationException("The underlying ActiveX browser is not available.");

        _complete = false;
        // Subscribe before navigating so the event cannot be missed
        webBrowser.DocumentComplete += WebControl_DocumentComplete;
        try
        {
            _webBrowserControl.Navigate(url);
            // Wait until page is complete
            while (!_complete)
            {
                Application.DoEvents();
            }
        }
        finally
        {
            // Unsubscribe so handlers do not accumulate across calls
            webBrowser.DocumentComplete -= WebControl_DocumentComplete;
        }
    }

    private void WebControl_DocumentComplete(object pDisp, ref object URL)
    {
        // Test if it's the main frame who called the event.
        if (pDisp == _webBrowserControl.ActiveXInstance)
            _complete = true;
    }
}
This code works for me when navigating to a defined URL using the webBrowserControl.Navigate(url) method, but I don't know how to detect page completion when an HTML button is clicked using htmlElement.InvokeMember("click").
You can use the ProgressChanged event; the last time it is raised indicates that the document is fully rendered:
this.webBrowser.ProgressChanged += new
WebBrowserProgressChangedEventHandler(webBrowser_ProgressChanged);