How to get HTML from WebBrowser control [duplicate] - c#

This question already has an answer here:
How can I get an HtmlElementCollection from a WPF WebBrowser
There are loads of posts similar to this.
How to get rendered html (processed by Javascript) in WebBrowser control? suggests to use something like
webBrowser1.Document.GetElementsByTagName("HTML")[0].OuterHtml;
In my case, Document is typed as object, so GetElementsByTagName is not available.
Copy all text from webbrowser control suggests using DocumentText.
I have Document, but no DocumentText.
That post also suggests webBrowser.Document.Body.InnerText.
I can use webBrowser.Document, but that is it. For some reason, webBrowser.Document is typed as object, so I can't access these members.
Getting the HTML source through the WebBrowser control in C# also suggests using DocumentStream. Again, I don't have that.
I'm doing this in a WPF application, using the WebBrowser from System.Windows.Controls.
All I'm trying to do is read the rendered HTML from the web page.
My code:
public void Begin(WebBrowser wb)
{
    this._wb = wb;
    _wb.Navigated += _wb_Navigated;
    _wb.Navigate("myUrl");
}

private void _wb_Navigated(object sender, System.Windows.Navigation.NavigationEventArgs e)
{
    var html = _wb.Document; // this is where I need help
}

Your samples refer to the WinForms WebBrowser control.
Add a reference to Microsoft.mshtml to your project (in the Add Reference dialog, search for it).
Cast the Document property to HTMLDocument in order to access its methods and properties (as stated on MSDN).
See also my GitHub sample:
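using mshtml;                      // HTMLDocument (requires the Microsoft.mshtml reference)
using System.Windows.Navigation;   // NavigationEventArgs

private void WebBrowser_Navigated(object sender, NavigationEventArgs e) {
    var document = (HTMLDocument)_Browser.Document;
    _Html.Text = document.body.outerHTML;
}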
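A fuller WPF sketch along the same lines, assuming the project references Microsoft.mshtml as described above and that the whole page is wanted rather than only `<body>`. It hooks `LoadCompleted` instead of `Navigated`, because WPF raises `Navigated` when the document has only started downloading, and it reads `documentElement.outerHTML` through the `IHTMLDocument3` interface. The class `PageReader`, the property `LastHtml`, and the helper `GetOuterHtml` are illustrative names, not part of WPF or mshtml.

```csharp
using System.Windows.Controls;
using System.Windows.Navigation;
using mshtml; // assumption: the project references Microsoft.mshtml

// Hypothetical helper class illustrating the cast; not part of WPF.
public class PageReader
{
    private WebBrowser _wb;

    public string LastHtml { get; private set; }

    public void Begin(WebBrowser wb, string url)
    {
        _wb = wb;
        // LoadCompleted is raised after the document has finished downloading;
        // Navigated is raised earlier, when the download has only begun.
        _wb.LoadCompleted += OnLoadCompleted;
        _wb.Navigate(url);
    }

    // Returns the markup of the whole <html> element, or null when the object
    // is not an MSHTML document (for example, before anything has loaded).
    public static string GetOuterHtml(object document)
    {
        var doc3 = document as IHTMLDocument3;
        return doc3?.documentElement?.outerHTML;
    }

    private void OnLoadCompleted(object sender, NavigationEventArgs e)
    {
        // WebBrowser.Document is typed as object; the helper does the cast.
        LastHtml = GetOuterHtml(_wb.Document);
    }
}
```

`GetOuterHtml` reads the live DOM at the moment it is called, so content that scripts add after `LoadCompleted` shows up only if it is called again later.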

Related

C# Webbrowser control, mismatch between displayed content and Document.innerHtml

So I have a website that I load into my form's WebBrowser control. After the document loads, I retrieve webBrowser.DocumentText and try to parse a specific table. The table is not in that text, yet I can see it displayed in the browser on the form.
This specific table is loaded/appended to the document by JavaScript code that has already been loaded.
When I right-click and select "View Source", it shows the document with the correct HTML.
My question is: how can I get the same document that View Source shows, or is there any other way to get the document as it is rendered on the form?
Instead of using the WebBrowser control, use HtmlAgilityPack to parse the data you need.
using System.Linq;
using System.Net;
using HtmlAgilityPack;

var html = new HtmlDocument();
html.LoadHtml(new WebClient().DownloadString("http://www.asp.net"));
var root = html.DocumentNode;
var commonPosts = root.Descendants()
    .Where(n => n.GetAttributeValue("class", "").Equals("common-post"));
Similar Existing Question
The question above was very similar to mine, and after going through its answer I learned that I need to wait and poll the WebBrowser to get the dynamic content.
I did not actually implement the code from that answer; instead, I made my DocumentCompleted handler async and added an awaited Task.Delay of 5 seconds:
private async void Browser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    await Task.Delay(5000);
    var html = wb.Document.GetElementsByTagName("HTML")[0].OuterHtml;
}
Now I get the dynamic result. Thanks!
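A fixed five-second delay only works when the script happens to finish within five seconds. A sketch of the polling alternative mentioned above: re-check a condition at a short interval until it holds or a timeout expires. `Poll.UntilAsync` is a hypothetical helper, not a framework API, and the interval and timeout values are arbitrary.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical helper, not a framework API.
public static class Poll
{
    // Evaluates `condition` immediately and then every `interval` until it
    // returns true (result: true) or `timeout` has elapsed (result: false).
    public static async Task<bool> UntilAsync(Func<bool> condition, TimeSpan timeout, TimeSpan interval)
    {
        Stopwatch watch = Stopwatch.StartNew();
        while (true)
        {
            if (condition()) return true;
            if (watch.Elapsed >= timeout) return false;
            // No ConfigureAwait(false): when called from a UI event handler,
            // the next condition check runs on the UI thread again.
            await Task.Delay(interval);
        }
    }
}
```

Inside the `async void` DocumentCompleted handler, the awaits resume on the UI thread, so the condition may touch the control directly, for example `await Poll.UntilAsync(() => wb.Document?.GetElementById("myTable") != null, TimeSpan.FromSeconds(10), TimeSpan.FromMilliseconds(250))`, where `myTable` stands for the (hypothetical) id of the table the page's script appends. Only when that returns true is the HTML read.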

C# Winform Webbrowser not updating after document text update

Good Morning,
I have a WebBrowser control embedded in a C# WinForms form. When the form loads, the browser loads a local file and displays the page with no issues.
I then have a button with an OnClick method which does the following:
private void button1_Click(object sender, EventArgs e) {
    this.webBrowser1.Navigate("about:blank");
    HtmlDocument doc = this.webBrowser1.Document;
    doc.Write(String.Empty);
    this.webBrowser1.DocumentText = File.ReadAllText(pathToDocument); // pathToDocument: placeholder for the new file's path
}
This was taken from this SO question, and it causes the web browser to freeze up: on hover, the cursor shows the spinning loading icon.
I am simply wanting to change the document text from one local file to another (both work if I load them in manually OnLoad).
Any help appreciated.
string html = File.ReadAllText(pathToFile); // pathToFile: placeholder for the local file's path
this.webBrowser1.Navigate("about:blank");
this.webBrowser1.Document.OpenNew(false);
this.webBrowser1.Document.Write(html);
this.webBrowser1.Refresh();
This does the trick. Thanks to everyone who looked at this.
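When the goal is only to show a different local file, navigating to it directly avoids the about:blank/Write sequence altogether. A minimal sketch, assuming the files are ordinary HTML files on disk; `LocalPage`, `ToFileUri`, and `Show` are hypothetical helper names. Unlike text written into about:blank, a page loaded this way resolves relative links, images, and stylesheets against its own folder.

```csharp
using System;
using System.IO;
using System.Windows.Forms;

// Hypothetical helper for showing a local HTML file in a WinForms WebBrowser.
public static class LocalPage
{
    // Turns a (possibly relative) path into an absolute file:// Uri.
    public static Uri ToFileUri(string path) => new Uri(Path.GetFullPath(path));

    // Loads the file through normal navigation, so DocumentCompleted fires as
    // usual and relative URLs in the page resolve against the file's folder.
    public static void Show(WebBrowser browser, string path)
    {
        browser.Navigate(ToFileUri(path));
    }
}
```

In the button handler this becomes a single call such as `LocalPage.Show(webBrowser1, pathToFile)`, with `pathToFile` standing for the second file's path.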

C# WebBrowser different html document after navigate

I have a really strange problem in C#:
First I use the WebBrowser control and the navigate method to browse.
wb_email.Navigate("https://registrierung.web.de");
Now I can change the innerText of HTML elements without any problems.
wb_email.Document.GetElementById("id4").InnerText = "Alexander";
But when I reload the page by calling Navigate again with the same URL, I get a NullReferenceException. It seems it can't find the element.
So I used an inspector in Firefox to check whether the HTML element really changes after reloading, but only the URL changes; the HTML elements are all the same.
What am I doing wrong?
You're just changing the DOM in the displayed page. When you reload the page, the WebBrowser instance will just refresh the DOM from the server again and lose your changes.
The WebBrowser class isn't designed for editing rendered pages inside itself, as it's basically just a wrapper to an embedded Internet Explorer instance.
Make sure the website has finished loading before accessing any element, for example:
webBrowser.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(webBrowser_DocumentCompleted);

void webBrowser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    // Access elements here
}
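Putting both answers together: every Navigate call, even to the same URL, replaces the DOM, so edits must be made again after each load, and only once the element exists. A sketch under these assumptions: the id `id4` and the value come from the question, `RegistrationFiller` and `TrySetInnerText` are illustrative names, and the `ReadyState` check is a common way to skip DocumentCompleted events raised for frames.

```csharp
using System.Windows.Forms;

// Illustrative class; the names are not part of WinForms.
public class RegistrationFiller
{
    private readonly WebBrowser _wb;

    public RegistrationFiller(WebBrowser wb)
    {
        _wb = wb;
        _wb.DocumentCompleted += OnDocumentCompleted;
    }

    public void Open(string url) => _wb.Navigate(url);

    // Sets InnerText on the element with the given id. Returns false instead of
    // throwing when the document or the element does not exist (yet).
    public static bool TrySetInnerText(HtmlDocument doc, string id, string text)
    {
        HtmlElement element = doc?.GetElementById(id);
        if (element == null) return false;
        element.InnerText = text;
        return true;
    }

    private void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // DocumentCompleted can also be raised for frames; wait until the
        // top-level document reports that it is complete.
        if (_wb.ReadyState != WebBrowserReadyState.Complete) return;

        // Each Navigate() call, even to the same URL, builds a fresh DOM,
        // so the edit is applied again after every load.
        TrySetInnerText(_wb.Document, "id4", "Alexander");
    }
}
```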

Prevent WebBrowser from formatting HTML file in C#

I have some trouble using the WebBrowser control in C#, especially when I print. I have a barcode in my HTML file that I need to print (I use a font to create the code). When I open the HTML file in Firefox or any other web browser, the barcode is fine and I can scan it. But when I open the file in my WebBrowser control in C#, or when I print it, the WebBrowser adds 2 characters after the barcode. Also, when I print the file, my document is not centered; it's as if the WebBrowser adds a margin-left property.
So my question is: is there any way to print an HTML file using a WebBrowser exactly as it looks when I open it in Firefox or Chrome, for example? Here is the code I use:
curDir = Directory.GetCurrentDirectory();
webBrowserA4.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(PrintDocument);
// Path.Combine + Uri builds a valid file:// URI even when curDir contains backslashes
webBrowserA4.Url = new Uri(Path.Combine(curDir, "print.html"));

private void PrintDocument(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    // Print the document now that it is fully loaded.
    ((WebBrowser)sender).Print();
    // Dispose the WebBrowser now that the task is complete.
    ((WebBrowser)sender).Dispose();
}
EDIT: Now I have another problem. Here is a screenshot of what my file looks like when I print it: http://imgur.com/q7ovEA1 As you can see, there is a left margin that I can't remove. I also want to remove the "page 1 of 1" text and the title.
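The control prints with Internet Explorer's per-user page-setup settings, which is where the title, the "page 1 of 1" text, and the default margins come from. A commonly used workaround is to overwrite those settings in the registry before calling `Print()`. The sketch below assumes the key `HKEY_CURRENT_USER\Software\Microsoft\Internet Explorer\PageSetup` with string values `header`, `footer`, and `margin_left`/`margin_right`/`margin_top`/`margin_bottom` in inches; verify the names, units, and decimal format on the target machine, and note that this also changes printing in Internet Explorer itself. `IePageSetup` is a hypothetical helper.

```csharp
using System.Globalization;
using Microsoft.Win32;

// Hypothetical helper: rewrites IE's per-user page setup, which the WinForms
// WebBrowser is assumed to use when printing (Windows only).
public static class IePageSetup
{
    public const string KeyPath = @"Software\Microsoft\Internet Explorer\PageSetup";

    // `root` is normally Registry.CurrentUser; it is a parameter so the values
    // can be written under a scratch key when testing.
    public static void Apply(RegistryKey root, double marginInches)
    {
        using (RegistryKey key = root.CreateSubKey(KeyPath))
        {
            // Empty strings remove the header (title, page numbers) and the footer.
            key.SetValue("header", string.Empty);
            key.SetValue("footer", string.Empty);

            // Margins are stored as strings; invariant culture keeps "." as the
            // decimal separator (assumed format).
            string margin = marginInches.ToString("0.00", CultureInfo.InvariantCulture);
            foreach (string name in new[] { "margin_left", "margin_right", "margin_top", "margin_bottom" })
            {
                key.SetValue(name, margin);
            }
        }
    }
}
```

Calling `IePageSetup.Apply(Registry.CurrentUser, 0.25)` before `((WebBrowser)sender).Print()` would then print without header and footer and with quarter-inch margins, if the assumptions above hold.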

How to grab the contents updated by JavaScript in a WebBrowser

private void button1_Click(object sender, EventArgs e)
{
    webBrowser1.Navigate(textBox1.Text);
}

private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    WebBrowser web = (WebBrowser)sender;
    richTextBox1.Text = web.DocumentText;
}
Above is sample code.
It gives all the text of the currently open page. If content is updated by JavaScript, the change is visible in the browser, but DocumentText is not updated.
Please help, guys.
I had the same problem. Use the following sample code:
using mshtml; // requires a project reference to mshtml

IHTMLDocument2 doc = webBrowser1.Document.DomDocument as IHTMLDocument2;
string content = doc.body.innerText;
Also, add mshtml to your project's references (if you don't know how to add the reference, just google it).
Whenever you run this code, the doc variable reflects the current, updated contents of the WebBrowser.
Good luck!
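A variation of the answer above that returns markup instead of text: `innerText` gives only the visible text, while `outerHTML` of the document element gives the current markup, including nodes that scripts have added. This assumes the same mshtml reference; `LiveDom` and `GetCurrentHtml` are illustrative names. Call it when the content is expected to be present (for example from a button click), since it reflects the DOM only at the moment of the call.

```csharp
using System.Windows.Forms;
using mshtml; // assumption: the project references mshtml

// Hypothetical helper; reads the live DOM rather than DocumentText.
public static class LiveDom
{
    // Returns the current markup of the <html> element, or null when no
    // MSHTML document is loaded.
    public static string GetCurrentHtml(HtmlDocument document)
    {
        var doc3 = document?.DomDocument as IHTMLDocument3;
        return doc3?.documentElement?.outerHTML;
    }
}
```

For the question's form, `richTextBox1.Text = LiveDom.GetCurrentHtml(webBrowser1.Document);` in a button handler would show the page as it currently is, rather than as it was downloaded.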
I would guess that the JavaScript on the page that modifies the content runs after the DocumentCompleted event; perhaps you can try a different event, such as 'Invalidated'.
WebBrowser.DocumentText also may not reflect changes to the DOM, so you may need to navigate the DOM through the WebBrowser.Document property.
http://msdn.microsoft.com/en-us/library/system.windows.forms.webbrowser.document.aspx
