I have the following code in my C# Windows app, which places the data from my WebBrowser control on the clipboard. However, when I paste this into MS Word, it pastes the HTML markup rather than the contents of the page.
Clipboard.SetDataObject(WebBrowser.DocumentText, true);
Any idea how I can get around this?
OK this feels like a dirty hack, but it solves my problem:
WebBrowser1.Document.ExecCommand("SelectAll", false, null);
WebBrowser1.Document.ExecCommand("Copy", false, null);
Another option would be to capture an image of the page, rather than the HTML, and paste that into the document. I don't think the WebBrowser control can handle this, but WatiN can. WatiN's (http://watin.sourceforge.net/) CaptureWebPageToFile() function works well for this. I have had to use this instead of capturing HTML because Outlook cannot format HTML well at all.
string allText = WebBrowser1.DocumentText;
will return all of the currently loaded document's markup. Is that what you are looking for?
I guess that happens because what the webbrowser actually contains is the markup, not all the images etc.
Your best bet might be to use the WebBrowser to save the full page to disk, and then use Word to open that. That way it'll all be available locally for IE to use. It just means you have to clean up afterwards.
The link below has some stuff about saving using the webbrowser in c#
http://www.c-sharpcorner.com/UploadFile/mahesh/WebBrowserInCS12072005232330PM/WebBrowserInCS.aspx
Related
I am attempting to get the resulting web page content so I can extract the display text. I have attempted the code below but it gets me the source html and not the resulting html.
string urlPath = "http://www.cbsnews.com/news/jamar-clark-protests-follow-decision-not-to-file-charges-in-minneapolis-police-shooting/";
WebClient client = new WebClient();
string str = client.DownloadString(urlPath);
Compare the text in the str variable with the html in the Developer Tools in the Chrome browser and you will get different results.
Any recommendations will be appreciated.
I'm assuming you mean that you want the article text. If so you will need to follow a different course of action. The page you refer to is loaded with client script that injects loads of content into the base HTML document. This is done by executing the client-side script. You will need to parse the DOM after the script is executed to get the content you're interested in.
As others have pointed out, an actual web browser will parse the downloaded HTML and execute javascript against it, potentially altering its content. While you could try to do that parsing yourself, the easiest route is to ask a real web browser to do it for you and then grab the results.
The easiest solution specifically in C# would be to use the WebBrowser Control from Windows Forms, which essentially exposes IE to your program, allowing you to control it. Use the Navigate method to load the URL in question, then use the Document property to navigate the DOM. You can, at that point, get the outerHTML to get the final content of the DOM as HTML.
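The approach described above might be sketched roughly as follows. This is a minimal example, not a definitive implementation: the URL is a placeholder, the class name is made up for illustration, and it assumes the page's scripts have finished by the time DocumentCompleted fires, which is not guaranteed for pages that load content asynchronously afterwards.

```csharp
using System;
using System.Windows.Forms;

static class RenderedHtmlDemo
{
    [STAThread] // The WebBrowser control requires a single-threaded apartment.
    static void Main()
    {
        // Placeholder URL, used only for illustration.
        const string url = "http://example.com/";

        using (var form = new Form())
        using (var browser = new WebBrowser { Dock = DockStyle.Fill, ScriptErrorsSuppressed = true })
        {
            form.Controls.Add(browser);

            browser.DocumentCompleted += (sender, e) =>
            {
                // DocumentCompleted also fires for frames; only react to the top-level page.
                if (e.Url != browser.Url) return;

                // Read the live DOM (after script execution), not the downloaded source.
                HtmlElementCollection roots = browser.Document.GetElementsByTagName("html");
                if (roots.Count > 0)
                {
                    Console.WriteLine(roots[0].OuterHtml);
                }
            };

            browser.Navigate(url);
            Application.Run(form); // The control needs a message loop to load the page.
        }
    }
}
```

If the content you want is injected some time after the page loads, you may need to wait (for example with a timer) before reading OuterHtml.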
If you're not writing a Windows program and are more interested in headless operation, have a look at PhantomJS. It's a headless WebKit browser that is scriptable from JavaScript and would give you similar capability, although not in C#.
I have a C# project that uses the Awesomium web browser. I only want to get the source code of a URL, because I don't need to show the page content in my program. If I could do this, I would save time. Is there any way to do that?
If you just want the source of the page loaded into Awesomium's webControl, you can use the webControl.HTML property, but you need to wait until the document is loaded (DocumentReadyState.Loaded).
Or you can see what is between html tags: webControl.ExecuteJavascriptWithResult("document.getElementsByTagName('html')[0].innerHTML");
I have HTML generated on a form. If you click a button, the HTML is generated and is currently placed in an ASP TextBox as plain text:
<!doctype html><html>....</html>
and I want to automatically print this HTML (including the styling that is only for the generated body).
How can I do this?
To print your code you need to do several things:
Tell the browser what to print (here are a few ideas), or create a new window with only the stuff to print.
Find a way to show the plain code, like using <pre> tags or the mentioned CodeMirror.
Launch the print dialog with a bit of JavaScript, basically something like window.print().
There will be no automatic printing (unless you target IE with some black magic) as that would be insane to allow on the web.
I hope I understand your question correctly. CodeMirror is the one you might want to try. It displays and formats HTML text with colors.
http://codemirror.net/mode/htmlmixed/index.html
You will need to trigger the print request using JavaScript, but you can set this off from C# (look at the ClientScriptManager for usage).
The window.print() function is what you need.
You will presumably want to restrict what you print, take a look at this SO post for ideas.
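As a rough sketch of setting off window.print() from C#, assuming an ASP.NET Web Forms page: the class name PrintPage and the handler name PrintButton_Click are hypothetical, and the handler would need to be wired to your own button. The browser still shows its print dialog; this does not print silently.

```csharp
using System;
using System.Web.UI;

// Hypothetical code-behind class for a Web Forms page.
public partial class PrintPage : Page
{
    // Hypothetical click handler for the button that generates the HTML.
    protected void PrintButton_Click(object sender, EventArgs e)
    {
        // Page.ClientScript is the page's ClientScriptManager. The final argument
        // asks ASP.NET to wrap the script in <script> tags, so it runs once the
        // response has loaded in the browser.
        ClientScript.RegisterStartupScript(GetType(), "printPage", "window.print();", true);
    }
}
```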
I have webBrowser component and I would like to save modified HTML code to file.
I don't know if you understood me but browser navigates to one page, receives HTML + JS and then JS modifies HTML code, now I need to save that modified HTML code.
I have tried to use DocumentText, but the result I get is the original HTML code, not the HTML code modified by JS.
Does anyone know how to solve this problem?
A lot of developer plug-ins (Firebug for Firefox, or the developer tools for IE or Chrome) will allow you to see the updated HTML.
You can use the outerHTML of the element you are interested in (e.g. BODY).
Look at the methods of HtmlDocument, like http://msdn.microsoft.com/en-us/library/system.windows.forms.htmldocument.getelementsbytagname.aspx, and of HtmlElement: http://msdn.microsoft.com/en-us/library/system.windows.forms.htmlelement.outerhtml.aspx
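Putting those two members together, saving the modified HTML could look something like the sketch below. The helper name SaveRenderedHtml and its class are made up for illustration, and it assumes you call it after the page's scripts have finished changing the DOM (for example from a DocumentCompleted handler, or later).

```csharp
using System.IO;
using System.Text;
using System.Windows.Forms;

static class RenderedHtmlSaver
{
    // Hypothetical helper: writes the current (script-modified) DOM to a file.
    // Returns false if no document or no <html> element is available yet.
    public static bool SaveRenderedHtml(WebBrowser browser, string path)
    {
        HtmlDocument document = browser.Document;
        if (document == null) return false;

        // OuterHtml reflects the live DOM, unlike DocumentText,
        // which holds the markup as it was originally received.
        HtmlElementCollection roots = document.GetElementsByTagName("html");
        if (roots.Count == 0) return false;

        File.WriteAllText(path, roots[0].OuterHtml, Encoding.UTF8);
        return true;
    }
}
```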
It might sound stupid, but I need a way to take an XML file / string, and show it to a user in a form.
I'm currently trying to use the WebBrowser control, but its Document property is read-only. I tried setting DocumentText instead, but it seems to accept only HTML. What control should I use? It can be anything in WinForms or Infragistics.
Also, if there's a .NET XML parser, I'd love to know.
Thanks.
Give XmlVisualizer a try. Also you might want to take a look at the following posts:
http://www.dotnettutorials.com/tutorials/xml/winform-filter-xml-cs.aspx
Show XML file in WinForms app with IE-like coloring and collapsing nodes
Try setting the DocumentText instead. See this post for further details.
Load the document with XmlDocument.Load(). Then use XmlDocument.OuterXml...
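A minimal sketch of that suggestion, using the XmlDocument parser built into .NET. The sample XML string and the ToIndentedXml helper are made up for illustration; LoadXml is used here because the input is a string, whereas Load would take a file path or stream. The resulting text could then be assigned to a multiline TextBox instead of being written to the console.

```csharp
using System;
using System.Text;
using System.Xml;

static class XmlDisplayDemo
{
    // Hypothetical helper: returns the document as indented text for display.
    static string ToIndentedXml(XmlDocument document)
    {
        var builder = new StringBuilder();
        var settings = new XmlWriterSettings { Indent = true, OmitXmlDeclaration = true };
        using (XmlWriter writer = XmlWriter.Create(builder, settings))
        {
            document.Save(writer);
        }
        return builder.ToString();
    }

    static void Main()
    {
        var document = new XmlDocument();

        // Sample XML, used only for illustration.
        document.LoadXml("<root><item id=\"1\">text</item></root>");

        // OuterXml gives the whole document as a single unformatted string.
        Console.WriteLine(document.OuterXml);

        // The indented form is usually easier for a user to read.
        Console.WriteLine(ToIndentedXml(document));
    }
}
```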