Why does WebBrowser.Navigate return a null HtmlDocument? - c#

I have tried:
var browser1 = new WebBrowser();
browser1.Navigate("https://zikiti.co.il/");
HtmlDocument document = browser1.Document;
But browser1.Document is null.
Why?
What am I doing wrong?
I then tried subscribing to the Navigated event:
public static void FillForm()
{
    browser1 = new WebBrowser();
    browser1.Navigate(new Uri("https://zikiti.co.il/"));
    browser1.Navigated += webBrowser1_Navigated;
    Thread.CurrentThread.Join();
}
private static void webBrowser1_Navigated(object sender,
    WebBrowserNavigatedEventArgs e)
{
    HtmlDocument document = browser1.Document;
    System.Console.WriteLine();
}
The application is stuck.
Btw, is there any easier way to fill and submit this form? (I cannot see the request header in Fiddler as the page is always blocked by JS).

Because it takes time to download the HTML, more time than anybody wants to wait for, least of all a user interface thread; an hourglass cursor won't do these days.
The control tells you explicitly when the document is available: the DocumentCompleted event.
You have to pump a message loop to get that event.
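A minimal sketch of that advice for a console program, assuming .NET Framework WinForms. The class name and the choice to print the page title are illustrative only and not from the original post. The WebBrowser is created on a dedicated STA thread, Application.Run() pumps the message loop, and the Document is read only inside DocumentCompleted. Note that Thread.CurrentThread.Join() in the question blocks the calling thread without pumping messages, which would explain why the application appears stuck.

```csharp
using System;
using System.Threading;
using System.Windows.Forms;

// Hypothetical class name, used only for this sketch.
static class BrowserDemo
{
    static void Main()
    {
        var thread = new Thread(() =>
        {
            var browser = new WebBrowser { ScriptErrorsSuppressed = true };

            browser.DocumentCompleted += (sender, e) =>
            {
                // DocumentCompleted can fire once per frame; wait for the whole page.
                if (browser.ReadyState != WebBrowserReadyState.Complete)
                    return;

                HtmlDocument document = browser.Document;
                if (document != null)
                    Console.WriteLine(document.Title);

                browser.Dispose();
                // Stop the message loop started by Application.Run() below.
                Application.ExitThread();
            };

            browser.Navigate("https://zikiti.co.il/");

            // Pump messages on this thread so the control can raise its events.
            Application.Run();
        });

        // The WebBrowser control is an ActiveX wrapper and needs an STA thread.
        thread.SetApartmentState(ApartmentState.STA);
        thread.Start();
        thread.Join();
    }
}
```

In a regular WinForms application the form's own message loop already does the pumping, so only the DocumentCompleted handler is needed there.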

Because Navigate is asynchronous, and the navigation has not even started by the time you read the Document property's value.
If you look at the example on that page, you will see that to read the "current" URL it needs to subscribe to the Navigated event; same applies to reading Document. The documentation for this event states:
Handle the Navigated event to receive notification when the WebBrowser
control has navigated to a new document. When the Navigated event
occurs, the new document has begun loading, which means you can access
the loaded content through the Document, DocumentText, and
DocumentStream properties.

Related

Click button after webBrowser1_DocumentCompleted event

I have a C# 4.0 WinForms application, which has a WebBrowser control and two buttons.
Clicking the first button sends a URL to the browser to navigate to a specified webSite.
Clicking the second button parses the OuterHtml of the webBrowser1.Document, looking for an "https://..." link for File Download.
The code then uses a webClient.DownloadFileAsync to pull down a file for further use in the application.
The above code successfully works, if I manually click those buttons.
In an effort to automate this for the end-user, I call the first button's click, i.e. btnDisplayWeb.PerformClick();, in the form's Form1_Load handler. This also works, allowing webBrowser1 to populate its Document with the desired website.
However, I am unable to programmatically click the 2nd button to acquire the web link for the file download.
I have tried to place the 2nd button's click within the browser's DocumentCompleted event, as shown below.
private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    btnMyUrl.PerformClick();
}
However, from what I've read on Stack Overflow and other sites, this particular event can be raised more than once, and hence the parsing fails.
I've also attempted to loop for a number of seconds, or even use a Thread.Sleep(xxxx), but the browser window fails to populate until the sleep or timer stops.
I attempted to use the suggestions found in the following Stack Overflow question.
How to use WebBrowser control DocumentCompleted event in C#?
private void webBrowser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    string url = e.Url.ToString();
    if (!(url.StartsWith("http://") || url.StartsWith("https://")))
    {
        // in AJAX
    }
    if (e.Url.AbsolutePath != this.webBrowser.Url.AbsolutePath)
    {
        // IFRAME
    }
    else
    {
        // REAL DOCUMENT COMPLETE
    }
}
However, in parsing the OuterHtml, nothing is returned in the first two sections, and in the third section, other elements are returned instead of the desired "https://..." link for File Download.
Interestingly, if I check the webBrowser1.ReadyState property, as shown below, and place a MessageBox inside DocumentCompleted, this seems to allow the browser document to complete, because after clicking the OK button, the parsing is successful.
if (webBrowser1.ReadyState == WebBrowserReadyState.Complete)
{
    MessageBox.Show("waiting", "CHECKING");
    btnMyUrl.PerformClick();
}
However, I then have the difficulty of finding a way to click the OK button of the MessageBox.
Is there another event that occurs after the DocumentCompleted event?
Or can someone suggest how to programmatically close the MessageBox?
If this can be done in code, then I can perform the click of the 2nd button in that section of code.
After finding that adding a MessageBox allows webBrowser1.Document to finish loading, and checking the webBrowser1.ReadyState property inside the webBrowser_DocumentCompleted handler, all I needed was a way to close the MessageBox programmatically.
Further searching on Stack Overflow turned up the following solution:
Close a MessageBox after several seconds
Implementing the AutoClosingMessageBox with a time interval closed the MessageBox and allowed my button click, i.e. btnMyUrl.PerformClick();, to parse the OuterHtml successfully, and now the code works properly.
Hopefully, if someone else discovers that placing a MessageBox within the webBrowser_DocumentCompleted event allows the document to complete, the aforementioned AutoClosingMessageBox will help them as well.

Open new page in standard browser when link is clicked

I can open the link in my standard browser with this code:
public void webBrowser1_Navigating(object sender, WebBrowserNavigatingEventArgs e)
{
    //cancel the current event
    e.Cancel = true;
    //this opens the URL in the user's default browser
    Process.Start(e.Url.ToString());
}
But the problem is that IE should only be opened when a link in the WebBrowser control is clicked. With this code, IE also opens when I change the DocumentText.
My suggestion would be to take a different approach. At the point in time immediately after the initial page has loaded in the WebBrowser control (Navigated event), you can use the webBrowser1.Document property to retrieve an HtmlDocument instance.
From this you should be able to find your link by using, for example,
http://msdn.microsoft.com/en-us/library/system.windows.forms.htmldocument.getelementbyid(v=vs.110).aspx
Then you can add an event handler to detect when this link is clicked, and in this handler, run your code to start the IE process.
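A rough sketch of this approach, under a few assumptions that are not in the original answer: it hooks every anchor via Document.Links rather than a single GetElementById lookup, it attaches the handlers in DocumentCompleted (when the DOM is available) rather than in Navigated, and the form name and sample HTML are made up for illustration. Because the handlers live on the links themselves, setting DocumentText no longer launches the external browser.

```csharp
using System;
using System.Diagnostics;
using System.Windows.Forms;

// Hypothetical form, used only for this sketch.
public class LinkForm : Form
{
    private readonly WebBrowser webBrowser1 = new WebBrowser { Dock = DockStyle.Fill };

    public LinkForm()
    {
        Controls.Add(webBrowser1);
        webBrowser1.DocumentCompleted += OnDocumentCompleted;
        // Illustrative content; changing DocumentText does not trigger OnLinkClick.
        webBrowser1.DocumentText =
            "<html><body><a href=\"https://example.com/\">Example</a></body></html>";
    }

    private void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        if (webBrowser1.ReadyState != WebBrowserReadyState.Complete || webBrowser1.Document == null)
            return;

        // Attach a click handler to every anchor in the freshly loaded document.
        foreach (HtmlElement link in webBrowser1.Document.Links)
        {
            link.Click += OnLinkClick;
        }
    }

    private void OnLinkClick(object sender, HtmlElementEventArgs e)
    {
        var link = sender as HtmlElement;
        if (link == null)
            return;

        string href = link.GetAttribute("href");
        if (string.IsNullOrEmpty(href))
            return;

        // Cancel the in-control navigation and open the default browser instead.
        e.ReturnValue = false;
        Process.Start(href);
    }

    [STAThread]
    private static void Main()
    {
        Application.Run(new LinkForm());
    }
}
```

If DocumentCompleted fires more than once for the same document (for example on pages with frames), the handlers would be attached repeatedly, so a real implementation would probably need a guard against that.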

WebBrowser OnPropertyChange event is not firing

I have a webpage with an input button on it. Clicking this button loads some data into a specific div. My problem is that I can't catch this data.
The following code is my attempt to solve this problem, but without success.
private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    HtmlElement div_result = webBrowser1.Document.GetElementById("div_result");
    div_result.AttachEventHandler("onpropertychange", new EventHandler(resultEventHandler));
}
private void resultEventHandler(object sender, EventArgs e)
{
    MessageBox.Show("Loaded");
}
If I click on the input button, the div's content is modified, but resultEventHandler does not fire.
So, I have two questions:
1. Where is my fault in this code?
2. Is there a "normal way" (I mean without using timers or Application.DoEvents()) to work with AJAX using the WebBrowser control in C#?
Changing the innerText or innerHTML of child elements will not cause the onpropertychange event to fire for the parent element.
I cannot tell why HtmlElement events do not work for you. But I had the same problem and resolved it by using COM wrappers:
mshtml.HTMLDocumentClass doc = (mshtml.HTMLDocumentClass)webBrowser1.Document.DomDocument;
mshtml.IHTMLElement2 div_result = (mshtml.IHTMLElement2)doc.getElementById("div_result");
mshtml.HTMLElementEvents2_Event events = (mshtml.HTMLElementEvents2_Event)div_result;
events.onpropertychange += resultEventHandler;
This may be too late, but here it is anyway in case you are still wondering:
1 - HtmlElement div_result = webBrowser1.Document.GetElementById("div_result");
Line #1 executes when the web page has loaded and the browser has raised the DocumentCompleted event. Inside the event handler you first retrieve a pointer to the DOM element "div_result" and assign it to an HtmlElement variable named div_result.
2 - div_result.AttachEventHandler("onpropertychange", new EventHandler(resultEventHandler));
Line #2 registers the "onpropertychange" event and assigns the method resultEventHandler as the listener.
I assume the button is inside a form, so every click submits the form and reloads the web page (you did not say whether the form uses GET or POST). When the download completes and the DOM tree is constructed, your DocumentCompleted event is raised, and your handler performs the two steps described above.
So every time you click the button, the page is reloaded and you reassign the listener for the onpropertychange event. You are only ever assigning the listener; the event never gets a chance to fire. It is a classic chicken-and-egg problem: the chicken is the button click, which causes DocumentCompleted to run and resets the state of all variables in the method; the egg is wanting a pointer to the DIV element's onpropertychange event before the button is clicked. How do you assign an event listener to an HtmlElement before you can get a pointer from a DOM that has not been constructed yet?
Put a Boolean flag in the class that contains the DocumentCompleted method and set its initial state. Move the div_result variable out of the DocumentCompleted method to widen its scope and keep its state across button clicks. This way you get a pointer to your div element and set its onpropertychange listener the first time the page is downloaded. Add a test if you want to set the listener just once (the sample code below has one), and the next time you click the button your event will fire.
NOTE: Do not add or delete any element on the web page after you store a pointer to any of its elements or their events. Otherwise you will have to re-parse the DOM to get a pointer to the element's new position in the DOM tree.
//See below:
class SomeClass
{
    bool DocumentHasLoaded = false;
    HtmlElement div_result = null;
    //Constructor and other methods go here....
    //Then change your DocumentCompleted method to look like this:
    private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        if (DocumentHasLoaded == false) // I prefer using a ! before the variable instead
        {
            // You will have to create your own appropriately timed mechanism to reset
            // this variable's state should you want to execute the whole process again.
            DocumentHasLoaded = true;
            div_result = webBrowser1.Document.GetElementById("div_result");
            div_result.AttachEventHandler("onpropertychange", new EventHandler(resultEventHandler));
        }
    }
}
In order to answer your second question I require more information about the data that is loaded in the DIV; e.g. where it comes from and what type it is plus any other pertinent information you can think of.

WebBrowser DocumentCompleted event fired more than once

I've been researching this and everyone seems to agree that the solution is to check the ReadyState of the WebBrowser until it is set to Complete.
But the event is sometimes fired several times with the ReadyState already set to Complete.
I don't think there is a solution with that crappy .NET WebBrowser, but there might be one if I use the underlying DOM component.
The only problem is, I have no idea how to access the DOM component behind the WebBrowser that fires the DocumentCompleted event.
DocumentCompleted will fire for each frame in the web page. The hard way is to count off the frames, which also shows you how to access the DOM:
private int mFrameCount;
private void startNavigate(string url) {
    mFrameCount = 0;
    webBrowser1.Navigate(url);
}
private void DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e) {
    mFrameCount += 1;
    bool done = true;
    if (webBrowser1.Document != null) {
        HtmlWindow win = webBrowser1.Document.Window;
        if (win.Frames.Count > mFrameCount && win.Frames.Count > 0) done = false;
    }
    if (done) {
        Console.WriteLine("Now it is really done");
    }
}
The easy way is to check the URL that completed loading:
private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    if (e.Url.Equals(webBrowser1.Url)) {
        Console.WriteLine("Now it is really done");
    }
}
This would probably happen if the page uses Javascript or <meta refresh> to redirect to another page.
If so, there's no good workaround.
I can't find anything that will give 100% certainty.
The mentioned example (e.Url.Equals(webBrowser1.Url)) may work for a simple WebBrowser.Navigate(url); however, in my case I click nodes in code to open new frames inside existing frames. Mostly "Navigating" and "DocumentCompleted" fire the same number of times, but NOT always. "IsBusy = false" and "ReadyState = Complete" have always held when loading is finished (at least so far), but the control occasionally reports this state while it is still loading. Counting frames also seems useless for me: in one case DocumentCompleted fired 23 times, although all frames and sub-frames (sub-sub and so on) total only 14.
The only thing that seems to work is wait a short period (1 or 2 seconds?) to see if anything happens (any events fired, any state changes).
Hmm, I found another solution for me. Often we're not interested in the whole page being loaded; we just want certain elements to exist. So after each DocumentCompleted, and when "IsBusy = false" and "ReadyState = Complete", we can search the DOM to see whether the element exists.
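A small sketch of this element-based check. The element id "download-link", the form name, and the URL are hypothetical placeholders, not values from the original post; the handler just returns until the element shows up and then runs its work once.

```csharp
using System;
using System.Windows.Forms;

// Hypothetical form, used only for this sketch.
public class ElementWaitForm : Form
{
    // Hypothetical id of the element we are waiting for.
    private const string TargetElementId = "download-link";

    private readonly WebBrowser webBrowser1 = new WebBrowser { Dock = DockStyle.Fill };
    private readonly string startUrl;
    private bool handled;

    public ElementWaitForm(string url)
    {
        startUrl = url;
        Controls.Add(webBrowser1);
        webBrowser1.DocumentCompleted += OnDocumentCompleted;
    }

    protected override void OnLoad(EventArgs e)
    {
        base.OnLoad(e);
        webBrowser1.Navigate(startUrl);
    }

    private void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // Skip events raised while the control still reports that it is loading.
        if (handled || webBrowser1.IsBusy
            || webBrowser1.ReadyState != WebBrowserReadyState.Complete
            || webBrowser1.Document == null)
            return;

        HtmlElement target = webBrowser1.Document.GetElementById(TargetElementId);
        if (target == null)
            return; // Not in the DOM yet; wait for the next DocumentCompleted.

        handled = true;
        Console.WriteLine(target.OuterHtml);
    }

    [STAThread]
    private static void Main()
    {
        // Placeholder URL for illustration.
        Application.Run(new ElementWaitForm("https://example.com/"));
    }
}
```

If the element is inserted by script after the last DocumentCompleted, this check alone would never see it, so it may need to be combined with a timer.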
In my experience it's impossible to tell when a web page has finished loading until DocumentCompleted hasn't fired for a while. So I refresh a timer for around 1000ms every time the DocumentCompleted event triggers. Then when the timer times out I process the web page.
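The timer idea can be sketched roughly as follows. The 1000 ms quiet period comes from the answer above; the form name, the URL, and the ProcessPage method are hypothetical. Each DocumentCompleted restarts a System.Windows.Forms.Timer, and the page is processed only when the timer finally ticks.

```csharp
using System;
using System.Windows.Forms;

// Hypothetical form, used only for this sketch.
public class DebouncedBrowserForm : Form
{
    private readonly WebBrowser webBrowser1 = new WebBrowser { Dock = DockStyle.Fill };

    // Fully qualified to avoid clashing with System.Threading.Timer.
    private readonly System.Windows.Forms.Timer quietTimer =
        new System.Windows.Forms.Timer { Interval = 1000 };

    public DebouncedBrowserForm()
    {
        Controls.Add(webBrowser1);

        webBrowser1.DocumentCompleted += (sender, e) =>
        {
            // Restart the countdown on every DocumentCompleted.
            quietTimer.Stop();
            quietTimer.Start();
        };

        quietTimer.Tick += (sender, e) =>
        {
            // No DocumentCompleted for a full interval: treat the page as loaded.
            quietTimer.Stop();
            ProcessPage(webBrowser1.Document);
        };
    }

    protected override void OnLoad(EventArgs e)
    {
        base.OnLoad(e);
        webBrowser1.Navigate("https://example.com/"); // Placeholder URL.
    }

    // Hypothetical processing step; replace with the real parsing logic.
    private void ProcessPage(HtmlDocument document)
    {
        if (document != null)
            Console.WriteLine(document.Title);
    }

    [STAThread]
    private static void Main()
    {
        Application.Run(new DebouncedBrowserForm());
    }
}
```

The WinForms timer ticks on the UI thread, so the Document can be read directly in the Tick handler without marshalling.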

Detect WebBrowser complete page loading

How can I detect when a System.Windows.Forms.WebBrowser control has completed loading?
I tried to use the Navigate and DocumentCompleted events but both of them were raised a few times during document loading!
I think the DocumentCompleted event will get fired for all child documents that are loaded as well (like JS and CSS, for example). You could look at the WebBrowserDocumentCompletedEventArgs in DocumentCompleted and check the Url property and compare that to the Url of the main page.
I did the following:
void BrowserDocumentCompleted(object sender,
    WebBrowserDocumentCompletedEventArgs e)
{
    if (e.Url.AbsolutePath != (sender as WebBrowser).Url.AbsolutePath)
        return;
    //The page is finished loading
}
The last page loaded tends to be the one navigated to, so this should work.
From here.
The following should work.
private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    //Check if page is fully loaded or not
    if (this.webBrowser1.ReadyState != WebBrowserReadyState.Complete)
        return;
    //Action to be taken on page loading completion
}
Note the url in DocumentCompleted can be different than navigating url due to server transfer or url normalization (e.g. you navigate to www.microsoft.com and got http://www.microsoft.com in documentcomplete)
In pages with no frames, this event fires one time after loading is complete. In pages with multiple frames, this event fires for each navigating frame (note navigation is supported inside a frame, for instance clicking a link in a frame could navigate the frame to another page). The highest level navigating frame, which may or may not be the top level browser, fires the final DocumentComplete event.
In native code you would compare the sender of the DocumentComplete event to determine if the event is the final event in the navigation or not. However in Windows Forms the sender parameter is not wrapped by WebBrowserDocumentCompletedEventArgs. You can either sink the native event to get the parameter's value, or check the readystate property of the browser or frame documents in the DocumentCompleted event handler to see if all frames are in the ready state.
There is a problem with the readystate method: if a download manager is present and the navigation is to a downloadable file, the navigation could be cancelled by the download manager and the readystate won't become complete.
I had the same issue of multiple DocumentCompleted fired events and tried out all the suggestions above. Finally, seems that in my case neither IsBusy property works right nor Url property, but the ReadyState seems to be what I needed, because it has the status 'Interactive' while loading the multiple frames and it gets the status 'Complete' only after loading the last one. Thus, I know when the page is fully loaded with all its components.
I hope this may help others too :)
It doesn't seem to trigger DocumentCompleted/Navigated events for external Javascript or CSS files, but it will for iframes. As PK says, compare the WebBrowserDocumentCompletedEventArgs.Url property (I don't have the karma to make a comment yet).
If you're using WPF there is a LoadCompleted event.
If it's Windows.Forms, the DocumentCompleted event should be the correct one. If the page you're loading has frames, your WebBrowser control will fire the DocumentCompleted event for each frame (see here for more details). I would suggest checking the IsBusy property each time the event is fired and if it is false then your page is fully done loading.
Using the DocumentCompleted event with a page with multiple nested frames didn't work for me.
I used the Interop.SHDocVW library to cast the WebBrowser control like this:
public class webControlWrapper
{
    private bool _complete;
    private WebBrowser _webBrowserControl;
    public webControlWrapper(WebBrowser webBrowserControl)
    {
        _webBrowserControl = webBrowserControl;
    }
    public void NavigateAndWaitForComplete(string url)
    {
        _complete = false;
        _webBrowserControl.Navigate(url);
        // Cast the control to the underlying ActiveX browser to reach the native event
        var webBrowser = (SHDocVw.WebBrowser) _webBrowserControl.ActiveXInstance;
        if (webBrowser != null)
            webBrowser.DocumentComplete += WebControl_DocumentComplete;
        //Wait until page is complete
        while (!_complete)
        {
            Application.DoEvents();
        }
    }
    private void WebControl_DocumentComplete(object pDisp, ref object URL)
    {
        // Test if it's the main frame who called the event.
        if (pDisp == _webBrowserControl.ActiveXInstance)
            _complete = true;
    }
}
This code works for me when navigating to a defined URL using the webBrowserControl.Navigate(url) method, but I don't know how to control page complete when a html button is clicked using the htmlElement.InvokeMember("click").
You can use the ProgressChanged event; the last time it is raised indicates that the document is fully rendered:
this.webBrowser.ProgressChanged += new WebBrowserProgressChangedEventHandler(webBrowser_ProgressChanged);
