How to convert HtmlDocument.DomDocument to string?
This example is a bit convoluted, but, assuming you have a form called Form1, with a WebBrowser control called webBrowser1, the variable content will contain the markup that forms the document:
private void Form1_Load(object sender, EventArgs e)
{
webBrowser1.Url = new Uri(#"http://www.robertwray.co.uk/");
}
private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
var document = webBrowser1.Document;
var documentAsIHtmlDocument3 = (mshtml.IHTMLDocument3)document.DomDocument;
var content = documentAsIHtmlDocument3.documentElement.innerHTML;
}
The essential "guts" of extracting it from the HtmlDocument.DomDocument is in the webBrowser1_DocumentCompleted event handler.
Note: mshtml is obtained by adding a COM reference to 'Microsoft HTML Object Library` (aka: mshtml.dll)
It would be easier to use the HtmlDocument itself, rather than its DomDocument property:
string html = htmlDoc.Body.InnerHtml;
Or even simpler, if you have access to the WebBrowser containing the document:
string html = webBrowser.DocumentText;
Related
I'm writing a software that gets the content from URL. When working on that, I run into to problem that I can not get exactly the HTML content after the java script finished.
There are some websites that renders HTML by java-script, some do not support browsers which does not run js.
I tried using System.Windows.Controls.WebBrowser with WebBrowser.Document in LoadCompleted but no luck.
After that, I tried the OpenWebkitSharp library. On the UI, it showes the content of website correctly, but with code Document in DocumentCompleted, it still returns the content which does not rendered by java-script.
Here is my code:
...
using WebKit;
using WebKit.Interop;
public MainWindow()
{
windowFormHost = new System.Windows.Forms.Integration.WindowsFormsHost();
webBrowser = new WebKit.WebKitBrowser();
webBrowser.AllowDownloads = false;
windowFormHost.Child = webBrowser;
grdBrowserHost.Children.Add(windowFormHost);
webBrowser.Load += WebBrowser_Load;
}
private void WebBrowser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
var contentHtml = ((WebKitBrowser)sender).DocumentAsHTMLDocument;
}
The contentHtml has value which is not rendered after java-script finished.
Do solve this problem, I have added some trick into my code to get the full Html content after java-script finished.
using WebKit;
using WebKit.Interop;
using WebKit.JSCore; //We need add refrence JSCore which following with Webkit package.
public MainWindow()
{
InitializeComponent();
InitBrowser();
}
private void InitBrowser()
{
windowFormHost = new System.Windows.Forms.Integration.WindowsFormsHost();
webBrowser = new WebKit.WebKitBrowser();
webBrowser.AllowDownloads = false;
windowFormHost.Child = webBrowser;
grdBrowserHost.Children.Add(windowFormHost);
webBrowser.Load += WebBrowser_Load;
}
private void WebBrowser_Load(object sender, EventArgs e)
{
//The ResourceIntercepter will throws exception if webBrowser have not finished loading its components
//We can not use DocumentCompleted to load the Htmlcontent. Because that event will be fired before Java-script is finised
webBrowser.ResourceIntercepter.ResourceFinishedLoadingEvent += new ResourceFinishedLoadingHandler(ResourceIntercepter_ResourceFinishedLoadingEvent);
}
private void ResourceIntercepter_ResourceFinishedLoadingEvent(object sender, WebKitResourcesEventArgs e)
{
//The WebBrowser.Document still show the html without java-script.
//The trict is call Javascript (I used Jquery) to get the content of HTML
JSValue documentContent = null;
var readyState = webBrowser.GetScriptManager.EvaluateScript("document.readyState");
if (readyState != null && readyState.ToString().Equals("complete"))
{
documentContent = webBrowser.GetScriptManager.EvaluateScript("$('html').html();");
var contentHtml = documentContent.ToString();
}
}
Hope this one can help you.
I am trying to automate fill the textbox of a website in c# and i used:
private void button1_Click(object sender, EventArgs e)
{
System.Windows.Forms.WebBrowser webBrowser = new WebBrowser();
HtmlDocument document = null;
document=webBrowser.Document;
System.Diagnostics.Process.Start("http://www.google.co.in");
document.GetElementById("lst-ib").SetAttribute("value", "ss");
}
The webpage is opening but the text box is not filled with the specified value. I have also tried innertext instead of setAttribute. I am using windows forms.
You are expecting that your webBrowser will load the page at specified address, but actually your code will start default browser (pointing at "http://www.google.co.in"), while webBrowser.Document will remain null.
try to replace the Process.Start with
webBrowser.Navigate(yourUrl);
Eliminate the Process.Start() statement (as suggested by Gian Paolo) because it starts a WebBrowser as an external process.
The problem with your code is that you want to manipulate the value of your element too fast. Wait for the website to be loaded completely:
private void button1_Click(object sender, EventArgs e)
{
System.Windows.Forms.WebBrowser webBrowser = new WebBrowser();
webBrowser.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(webBrowser_DocumentCompleted);
webBrowser.Navigate("http://www.google.co.in");
}
private void webBrowser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
webBrowser.document.GetElementById("lst-ib").SetAttribute("value", "ss");
}
Please note that using a instance of a WebBrowser is not often the best solution for a problem. It uses a lot of RAM and has some overhead you could avoid.
I would like to use the drop functionality of the WebBrowser control in C#. Unfortunately it doesn't work although I set the AllowWebBrowserDrop property to true.
For testing I wrote this little programm with just a textbox and a webbrowser control:
public Form1()
{
InitializeComponent();
webBrowser1.AllowWebBrowserDrop = true;
textBox1.Text = "http://www.google.com";
}
private void textBox1_MouseMove(object sender, MouseEventArgs e)
{
if (e.Button == System.Windows.Forms.MouseButtons.Left)
DoDragDrop(textBox1.Text, DragDropEffects.Link);
}
private void webBrowser1_Navigating(object sender, WebBrowserNavigatingEventArgs e)
{
MessageBox.Show(e.Url.AbsoluteUri);
}
The DoDragDrop method gets executed correctly, but I never see the MessageBox appearing when dropping the string from the TextBox over the WebControl. Since the WebControl doesn't offer the usual drag & drop events I'm lost.
What do I have to do to make the url drop to the WebBrowser control work?
Use the following approach to initiate FileDrop dragging:
DataObject dObj = new DataObject();
var paths = new System.Collections.Specialized.StringCollection();
paths.Add(textBox1.Text);
dObj.SetFileDropList(paths);
textBox1.DoDragDrop(dObj, DragDropEffects.Link);
I'm trying to create a control that converts the page it's on into a PDF.
protected void ConvertPageToPDF_click(object sender, EventArgs e)
{
string pageHtml;
byte[] pdfBytes;
string url = HttpContext.Current.Request.Url.AbsoluteUri;
// get the HTML for the entire page into pageHtml
HtmlTextWriter hw = new HtmlTextWriter(new StringWriter());
this.Page.RenderControl(hw);
pageHtml = hw.InnerWriter.ToString();
// send pageHtml to a library for conversion
// send the PDF to the user
}
This partially works. On my pages I have several repeaters; their content does not show up in pageHtml. Any thoughts on why that is? How should I fix this?
It turns out that I had to render the control after the data had been bound to it. My click method just adds an event handler:
protected void ConvertRepeaterToPDF_click(object sender, EventArgs e)
{
// handle the PreRender complete method
Page.PreRenderComplete += new EventHandler(Page_OnPreRenderComplete);
}
Then in Page_OnPreRenderComplete I can use RenderControl as above.
I'm trying to make my own webbrowser with C#,
my wpf application seems to be correct. but it's still missing something.
the webpage doesn't appear. :s
Does someone have an idea?
Here's my code in C# :
public partial class Window1 : Window
{
public Window1()
{
InitializeComponent();
}
private void textBox1_TextChanged(object sender, TextChangedEventArgs e)
{
}
private void button1_Click(object sender, RoutedEventArgs e)
{
WebBrowser web = new WebBrowser();
web.NavigateToString (textBox1.Text);
}
Thanks for your help.
As I understand, you are instantiating a new WebBrowser control in code and you aren't adding it as a control to the actual form. You'd better add the control in design view and just do the method call in the code.
When you create the WebBrowser, try adding a third line:
WebBrowser web = new WebBrowser();
Content = web; // extra line
web.NavigateToString (textBox1.Text);
If the textbox is your address bar, it won't work. NavigateToString will interpret what's in your textbox as literal HTML.
web.NavigateToString (textBox1.Text);
should be
web.Source = new Uri(textBox1.Text, UriKind.Absolute);