Awesomium - How to get only the source code? - C#

I have a C# project that uses the Awesomium web browser. I want to get only the source code of a given URL, because I don't need to show the page content in my program. If I could do this, it would save time. Is there any way to do that?

If you just want the source of the page loaded into Awesomium's WebControl, you can read the webControl.HTML property, but you need to wait until the document is loaded (DocumentReadyState.Loaded).
Or you can grab what is between the html tags: webControl.ExecuteJavascriptWithResult("document.getElementsByTagName('html')[0].innerHTML");
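Putting that together, a minimal sketch of waiting for the load and then reading the source, assuming Awesomium 1.7.5's WebControl (event signatures and the JSValue-to-string conversion vary slightly between Awesomium versions, so treat this as illustrative):

webControl.DocumentReady += (sender, e) =>
{
    // In 1.7.5 the event can fire more than once; wait for the Loaded state.
    if (e.ReadyState == DocumentReadyState.Loaded)
    {
        string source = webControl.HTML; // the full page source

        // Or just the markup between the <html> tags:
        string inner = webControl.ExecuteJavascriptWithResult(
            "document.getElementsByTagName('html')[0].innerHTML").ToString();
    }
};
webControl.Source = new Uri("http://example.com/");

Since you don't want to show the page at all, note that Awesomium also offers windowless views created via WebCore.CreateWebView, which can follow the same load-then-read pattern without any visible UI.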

Related

WebView2: show an HTML variable instead of a URL

Hey guys, I have a program that still uses the old WebBrowser XAML control. Now I want to switch to WebView2, and I've run into a problem. Currently, the body of an email is pulled from a database as HTML code and written to a variable, which is bound to the WebBrowser control. WebView2, however, expects a URL to be passed to its Source property. How can I pass a variable containing HTML code instead of a link to a web page? Or are there other alternatives for this?
I would suggest you try NavigateToString:
await webView21.EnsureCoreWebView2Async();
webView21.NavigateToString(anEmailHTMLVariable);
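In context, that might look like the sketch below (assuming a form with a WebView2 control named webView21 and a hypothetical LoadEmailBodyFromDatabase helper standing in for your database read):

private async void Form1_Load(object sender, EventArgs e)
{
    // Hypothetical helper; replace with your actual database access.
    string anEmailHTMLVariable = LoadEmailBodyFromDatabase();

    // CoreWebView2 must be initialized before NavigateToString is called.
    await webView21.EnsureCoreWebView2Async();
    webView21.NavigateToString(anEmailHTMLVariable); // renders raw HTML, no URL needed
}

One caveat: NavigateToString is documented as limited to roughly 2 MB of content, so for very large documents you'd need another approach, such as a virtual host mapping to a folder.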

I cannot see the whole HTML source code by pressing Ctrl+U

I am working on scraping a website, so I made a desktop application for it.
When I inspect the site with Firebug (inspect element), I can see all of the website's data, but when I check the page source (Ctrl+U), there is nothing there.
That means I can't find any of the website's data in the page source, but I can see it in Firebug.
Because of this, when I try to get the data from C# code, I only get the page source, which doesn't contain any of the website's data, only the schema (structure) and JS links.
See the Firebug image below.
And this is the page source image.
You've met a JS-powered site. The content is dynamically loaded through JS, so it's not visible in the page source. Turn to scraping libraries that support JS code evaluation. See here for an example.
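For instance, a minimal sketch with Selenium WebDriver, one such library that evaluates JS (the Selenium.WebDriver package and ChromeDriver are my assumptions here, not something from the question):

using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

var options = new ChromeOptions();
options.AddArgument("--headless"); // no visible browser window

using (IWebDriver driver = new ChromeDriver(options))
{
    driver.Navigate().GoToUrl("http://example.com/");

    // Unlike Ctrl+U, PageSource reflects the DOM after the page's JS has run.
    string renderedHtml = driver.PageSource;
}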

Find URL Responses? Alternative To Default WebBrowser Control?

Hello guys, I have an issue that's been bugging me for the past few weeks.
What I'm trying to accomplish: I need a web browser control with the ability to change the user agent (once, at start) and the referrer, but most importantly, the ability to see the URL responses. What I mean by that: when you navigate to a website, you get back images, JavaScript files, and dynamic URLs in response. I need access to those URLs, some of which contain dynamic variables (the regular WebBrowser control will not show you those, and you can't access them in any way other than using FiddlerCore).
I was able to do that with WebBrowser + FiddlerCore; I could see and do whatever I wanted with those URLs. The problem was that if you run a few instances of the program (or sometimes just one, if the program has some automation working with the URL responses), it gets stuck or stops working. I tried fixing it, but my fix is a hacky solution that doesn't work right. I need a simple way to access those URLs, just as if you had used HttpWebRequest, but as a web browser. Why do I need a web browser? The way I work, I need all the tracking pixels, scripts, images, etc. to execute, i.e. normal web browser behavior. With HttpWebRequest you can't just navigate and have all the scripts execute as in a web browser, or can you?
Using the System.Windows.Forms.WebBrowser control in a WinForms app, set the webBrowser.Url property to the URL of the page you're interested in.
The WebBrowser's DocumentCompleted event fires after the page has loaded; any dynamically loaded JavaScript should be done by then. Hook the DocumentCompleted event and use webBrowser.Document.Images to get a list of all image elements on the page. From those images you can read their SRC attributes, which contain their URLs, including any query parameters hanging off the end. You can use webBrowser.Document.Links to get a list of all hyperlinks on the page. For other HTML elements of interest, you can use GetElementsByTagName("foo") to fetch all elements with that tag name from the page, then dig into their attributes to pull out URL properties.
With webbrowser.Document you can get to any HTML element, whether it is statically or dynamically created.
What you can't get to through webbrowser.Document is data that is loaded asynchronously using XMLHttpRequest(), because this data is not part of the browser's Document Object Model. Web pages with scripted fake buttons will be difficult to intercept.
However, if you know where the data is stored by the JavaScript executing on the page, you may be able to access it using webbrowser.Document.InvokeScript(). If the JavaScript on the page stores URLs in a mydata property of the window object, for example, you could try webbrowser.Document.InvokeScript("window.mydata") or some variation to retrieve the value of mydata into the C# app.
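Roughly, a sketch of the URL-harvesting idea (assuming a WinForms form with a WebBrowser control named webBrowser1):

using System.Collections.Generic;
using System.Windows.Forms;

webBrowser1.DocumentCompleted += (s, e) =>
{
    var urls = new List<string>();

    foreach (HtmlElement img in webBrowser1.Document.Images)
        urls.Add(img.GetAttribute("src"));    // image URLs, query strings included

    foreach (HtmlElement a in webBrowser1.Document.Links)
        urls.Add(a.GetAttribute("href"));     // hyperlink URLs

    foreach (HtmlElement script in webBrowser1.Document.GetElementsByTagName("script"))
        urls.Add(script.GetAttribute("src")); // external script URLs
};
webBrowser1.Navigate("http://example.com/");

Because the page actually renders, tracking pixels and scripts execute as in a normal browser, which HttpWebRequest alone won't give you.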

WebBrowser modified HTML code C#

I have a WebBrowser component and I would like to save its modified HTML code to a file.
I don't know if that's clear, so: the browser navigates to a page, receives HTML + JS, and then the JS modifies the HTML code; now I need to save that modified HTML code.
I have tried to use DocumentText, but the result it outputs is the original HTML code, not the HTML code modified by the JS.
Does anyone know how to solve this problem?
A lot of developer plug-ins (Firebug for Firefox, or the developer tools in IE or Chrome) will allow you to see the updated HTML.
You can use the outerHTML of an element you are interested in (i.e. BODY).
Look at the methods of HtmlDocument, like GetElementsByTagName (http://msdn.microsoft.com/en-us/library/system.windows.forms.htmldocument.getelementsbytagname.aspx), and at HtmlElement.OuterHtml (http://msdn.microsoft.com/en-us/library/system.windows.forms.htmlelement.outerhtml.aspx).
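Put together, a minimal sketch (assuming a WinForms WebBrowser named webBrowser1, run after the page has loaded, e.g. inside its DocumentCompleted handler):

using System.IO;
using System.Windows.Forms;

// OuterHtml reflects the live DOM, including changes made by the page's JS,
// unlike DocumentText, which returns the originally downloaded markup.
HtmlElement htmlRoot = webBrowser1.Document.GetElementsByTagName("HTML")[0];
File.WriteAllText("modified.html", htmlRoot.OuterHtml);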

Copying from WebBrowser Control

I have the following code in my C# Windows app, which places the data from my WebBrowser control onto the clipboard. However, when I come to paste this into MS Word, it pastes the HTML markup rather than the contents of the page.
Clipboard.SetDataObject(WebBrowser.DocumentText, true);
Any idea how I can get around this?
OK, this feels like a dirty hack, but it solves my problem:
WebBrowser1.Document.ExecCommand("SelectAll", false, null);
WebBrowser1.Document.ExecCommand("Copy", false, null);
Another option would be to capture an image of the page rather than the HTML, and paste that into the document. I don't think the WebBrowser control can handle this, but WatiN can. WatiN's (http://watin.sourceforge.net/) CaptureWebPageToFile() method works well for this. I have had to use it instead of capturing HTML because Outlook cannot format HTML well at all.
string allText = WebBrowser1.DocumentText;
will return all of the currently loaded document markup. Is that what you are looking for?
I guess that happens because what the WebBrowser actually contains is the markup, not all the images etc.
You might be best off using the WebBrowser to save the full page to disk and then using Word to open that. That way it'll all be available locally for IE to use. It just means you have to clean up afterwards.
The link below has some stuff about saving pages using the WebBrowser in C#:
http://www.c-sharpcorner.com/UploadFile/mahesh/WebBrowserInCS12072005232330PM/WebBrowserInCS.aspx
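For what it's worth, the stock control can only prompt the user rather than save silently; a one-line sketch:

// Opens IE's "Save Webpage" dialog for the current page; there is no
// silent save-to-disk method on the WinForms WebBrowser control itself.
webBrowser1.ShowSaveAsDialog();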
