CefSharp Load HTML page issue - C#

I recently started working with the CefSharp browser in WinForms, using the Load method. Sometimes it works fine, but sometimes I am not able to render my HTML file. Can someone please help me?
BrowserSettings settings = new BrowserSettings();
Cef.Initialize(new CefSettings());
CefSharp.WinForms.ChromiumWebBrowser webBrowser = new CefSharp.WinForms.ChromiumWebBrowser(string.Empty);
webBrowser.Load(@"C:\kiranprac\CEFExample\CEFExample\HTMLResources\html\RTMTables_GetOrder.html");
OrderDetailsPnl.Controls.Add(webBrowser);

This is one of many timing issues in Chromium. You sometimes have to wait until the browser finishes the previous step before issuing another command.
In this case, you are constructing the browser with "about:blank" and then changing the URL straight afterwards.
The easiest solution here is to supply your URL in the ChromiumWebBrowser constructor instead of calling Load separately, as sketched below.
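A minimal sketch of that fix, reusing the file path and panel name from the question (both are the asker's, not verified here):
Cef.Initialize(new CefSettings());
// Passing the URL to the constructor avoids racing the initial about:blank navigation.
var webBrowser = new CefSharp.WinForms.ChromiumWebBrowser(@"C:\kiranprac\CEFExample\CEFExample\HTMLResources\html\RTMTables_GetOrder.html");
OrderDetailsPnl.Controls.Add(webBrowser);
// If the local file still does not render, try the explicit scheme: "file:///C:/...".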

When you create the browser object, give it a valid URL, then load your HTML text right after. This works as of CEF v49.
This works:
var browser = new ChromiumWebBrowser("http://google.com"); // workaround: a valid initial URL
var htmlText = "<html>hello world - this is my html</html>";
browser.LoadHtml(htmlText, "http://example/");
This doesn't work:
var browser = new ChromiumWebBrowser("randomstring"); // fails silently
var htmlText = "<html>hello world - this is my html</html>";
browser.LoadHtml(htmlText, "http://example/");


HtmlAgilityPack Not Finding Specific Node That Should Be There

I'm loading a URL and am looking for a specific node that should exist in the HTML doc but it is returning null every time. In fact, every node that I try to find is returning null. I have used this same code on other web pages but for some reason in this instance it isn't working. Could the HtmlDoc be loading something different than the source I see in my browser?
I'm obviously new to web scraping but have run into this kind of problem multiple times where I have to make an elaborate workaround because I'm unable to select a node that I can see in my browser. Is there something fundamentally wrong with how I'm going about this?
string[] arr = { "abercrombie", "adt" };
for (int i = 0; i < 1; i++)
{
    string url = @"https://www.google.com/search?rlz=1C1CHBF_enCA834CA834&ei=lsfeXKqsCKOzggf9ub3ICg&q=" + arr[i] + "+ticker" + "&oq=abercrombie+ticker&gs_l=psy-ab.3..35i39j0j0i22i30l2.102876.105833..106007...0.0..0.134.1388.9j5......0....1..gws-wiz.......0i71j0i67j0i131j0i131i67j0i20i263j0i10j0i22i10i30.3zqfY4KZsOg";
    HtmlWeb web = new HtmlWeb();
    var htmlDoc = web.Load(url);
    var node = htmlDoc.DocumentNode.SelectSingleNode("//span[@class = 'HfMth']");
    Console.WriteLine(node.InnerHtml);
}
UPDATE
Thanks to RobertBaron for pointing me in the right direction. Here is a great copy-paste solution.
The page that you are trying to scrape has JavaScript code that runs to load the entire contents of the page. Because your browser runs that JavaScript, you see the entire contents of the page. HtmlWeb.Load() does not run any JavaScript code, so you only see a partial page.
You can use the WebBrowser control to scrape that page. Just like your browser, it will run any JavaScript code, and the entire page will be loaded. There are several Stack Overflow articles that show how to do this; here are some of them, with a sketch after the links.
WebBrowser Control in a new thread
Perform screen-scape of Webbrowser control in thread
How to cancel Task await after a timeout period
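A minimal sketch of that approach, assuming a WinForms context (the control needs an STA thread and a message loop); the URL is a placeholder:
var browser = new System.Windows.Forms.WebBrowser();
browser.ScriptErrorsSuppressed = true;
browser.DocumentCompleted += (s, e) =>
{
    // Body.Parent is the html element, so OuterHtml is the live DOM after
    // load-time scripts have run (not the original page source).
    string renderedHtml = browser.Document.Body.Parent.OuterHtml;
    Console.WriteLine(renderedHtml);
};
browser.Navigate("https://www.example.com"); // placeholder URL
Note that content fetched by AJAX after the load event may still need an extra wait beyond DocumentCompleted.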
That content is dynamically added and is not present in what is returned via your current method and URL, which is why your XPath is unsuccessful. You can check what is returned with, for example:
var node = htmlDoc.DocumentNode.SelectSingleNode("//*");
Selecting something which is present for your first URL, to show that you can select a node:
var node = htmlDoc.DocumentNode.SelectSingleNode("//span[@class = 'st']");
You can use developer tools > Network tab to see if the specific dynamic content you are after is available via a separate XHR request URL, as sketched below.
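For example, a hedged sketch, assuming you found such an XHR URL in the network tab (the endpoint below is purely hypothetical):
using (var client = new System.Net.WebClient())
{
    // Request the XHR endpoint directly instead of the full page.
    string payload = client.DownloadString("https://www.example.com/api/data"); // hypothetical XHR URL
    // payload is often JSON or an HTML fragment; parse it instead of the page.
}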

Download the HTML source code to a string from an open website in IE with C#?

I've been looking all over for this answer but can't find it anywhere.
This is what I want to be able to do:
I have a Forms application with a button that says "collect html code". When I press this button, I want C# to download the HTML source code of the website I'm currently on (using IE). I've been using this code:
WebClient web = new WebClient();
string html = web.DownloadString("http://www.example.com");
But now I don't want to specify the URL in my code! And I don't want to use a webbrowser in my application.
Anyone got a solution?
Thanks!
With this code you can get the URLs of the open tabs in IE7 and later:
SHDocVw.ShellWindows allBrowsers = new SHDocVw.ShellWindows();
foreach (SHDocVw.InternetExplorer ieInst in allBrowsers)
{
    string url = ieInst.LocationURL;
    // do your stuff
}
So you can access the URLs and do your stuff with the WebClient class.
You need to add a reference to a COM component called Microsoft Internet Controls.
You are talking about getting URLs from the IE window? If so, here you are:
var urls = (new SHDocVw.ShellWindows()).Cast<SHDocVw.InternetExplorer>()
    .Select(x => x.LocationURL).ToArray();
Don't forget to add COM reference "Microsoft Internet Controls" in your project.
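A sketch combining the two steps, enumerating the open tabs and then downloading each page's source (note that ShellWindows also returns Windows Explorer windows, hence the filter):
var urls = (new SHDocVw.ShellWindows()).Cast<SHDocVw.InternetExplorer>()
    .Select(ie => ie.LocationURL)
    .Where(u => u.StartsWith("http")) // skip Windows Explorer windows
    .ToArray();
using (var web = new System.Net.WebClient())
{
    foreach (var url in urls)
    {
        string html = web.DownloadString(url);
        // do your stuff with html
    }
}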

Download PDF file from WebPage with WatiN

I have a website (a bank website). I'm using WatiN to log in and get to a page with links to PDF files. Each link opens a page with an opened PDF file; on that page I have only the opened PDF file and a button to download it (no need to click it, because the page automatically pops up a Save/Save As message).
I tried:
1- string page = browser.Body.OuterHtml;
Not working; I can't see the iframe, and I can't find it either.
2- int response = URLDownloadToFile(0, Link, FullFilePath, 0, 0);
Not working; I get the login page, because I need the cookies.
3- WebClient myWebClient = new WebClient();
myWebClient.DownloadFile(myStringWebResource, fileName);
Gives me the same result.
I can't get the cookies from the WatiN browser and set them in WebClient:
CookieCollection cookies = _browser.GetCookiesForUrl(new Uri(url));
string cookies = ie.Eval("document.cookie");
The Eval call returns only one parameter, so please don't tell me that I just need to get the cookies from WatiN and set them in myWebClient.
So, any ideas how I can save this PDF file?
One option would be to use the iTextSharp library, which provides helpful methods to download the PDF. Sample code is below...
Uri uri = new Uri("browser url");
PdfReader reader = new PdfReader(uri);
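A hedged sketch of writing the fetched document to disk with iTextSharp 5.x; the URL and output path are placeholders, and note that PdfReader's own download will not carry the WatiN session cookies any more than WebClient did:
var reader = new PdfReader(new Uri("https://bank.example/statement.pdf")); // placeholder URL
var output = new FileStream(@"C:\temp\statement.pdf", FileMode.Create);    // placeholder path
var stamper = new PdfStamper(reader, output); // PdfStamper copies the document to the stream
stamper.Close(); // flushes and closes the output stream
reader.Close();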

How to get complete HTML content using WatiN properties

I am using the WatiN tool in my ASP.NET web application. I want the complete HTML content of the IE instance which I launched. The IE class in WatiN provides a property called "Html", which returns only the body content of the HTML source. What is the way to get the head tag content along with it?
Here is my source code:
IE myIE = new IE();
myIE.GoTo("http://wsspg.dequecloud.com/worldspace/wsservice/eval/checkCompliance.jsp");
myIE.TextField(Find.ByName("txtUrl")).TypeText("http://rockycode.com/blog/watin-and-xpath-support/");
myIE.Button(Find.ByValue("Generate Report")).Click();
myIE.WaitForComplete();
var myHtml = myIE.Html;
Don't know why, but WatiN doesn't give you direct access to the head or html elements. But you can still get to them!
using (IE myIE = new IE())
{
    myIE.GoTo("http://wsspg.dequecloud.com/worldspace/wsservice/eval/checkCompliance.jsp");
    myIE.TextField(Find.ByName("txtUrl")).TypeText("http://rockycode.com/blog/watin-and-xpath-support/");
    myIE.Button(Find.ByValue("Generate Report")).Click();
    myIE.WaitForComplete();
    string myHtml = myIE.Body.Parent.OuterHtml;
}
I've not seen anything which does exactly what you want, but here's how you'd retrieve the head element:
ElementContainer<Element> head = (ElementContainer<Element>)myIE.Element(Find.By("TagName", "HEAD"));

Download the HTML code rendered by ASP.NET web sites

I have to download and parse a website which is rendered by ASP.NET. If I use the code below I only get half of the page, without the rendered "content" that I need. I would like to get the full content that I can see with Firebug or the IE Developer Tools.
How can I do this? I didn't find a solution.
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(URL);
HttpWebResponse response = (HttpWebResponse)req.GetResponse();
StreamReader streamReader = new StreamReader(response.GetResponseStream());
string code = streamReader.ReadToEnd();
Thank you!
UPDATE
I tried the WebBrowser control solution, but it didn't work. I'm in a WPF project and use the following code, and I don't even get the content of the website. I don't see my mistake right now :(.
System.Windows.Forms.WebBrowser webBrowser = new System.Windows.Forms.WebBrowser();
Uri uri = new Uri(myAdress);
webBrowser.AllowNavigation = true;
webBrowser.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(wb_DocumentCompleted);
webBrowser.Navigate(uri);

private void wb_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    System.Windows.Forms.WebBrowser wb = sender as System.Windows.Forms.WebBrowser;
    string tmp = wb.DocumentText;
}
UPDATE 2
That's the code I came up with in the meantime.
However, I don't get any output; my elementCollection doesn't return any values.
If I can get the HTML source as a string I'd be happy and will parse it with the HtmlAgilityPack.
(I don't want to incorporate the browser into my XAML code.)
Sorry for getting on your nerves!
Thank you!
WebBrowser wb = new WebBrowser();
wb.Source = new Uri(MyURL);
HTMLDocument doc = (HTMLDocument)wb.Document;
IHTMLElementCollection elementCollection = doc.getElementsByName("body");
foreach (IHTMLElementCollection element in elementCollection)
{
    tb.Text = element.toString();
}
If the page you're referring to has iframes or other dynamic loading mechanisms, the use of HttpWebRequest wouldn't be enough. A better solution would be (if possible) to use a WebBrowser control.
The answer might be that the content of the web site is rendered with JavaScript, probably with some AJAX calls that fetch additional data from the server to build the content. Firebug and the IE Developer Tools will show you the rendered HTML code, but if you choose 'view source', you should see the same HTML as the one that you fetch with the code.
I would use a tool like the Fiddler Web Debugger to monitor what the page downloads when it is rendered. You might be able to get the needed content by simulating the AJAX requests that the page makes; a sketch follows below.
Note that it can be a b*tch to simulate browsing an ASP.NET web site if the navigation is done with postbacks, because you will need to include the values of all the form elements (including the hidden view state) when simulating clicks on links.
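A hedged sketch of replaying such a captured request; the URL and form fields are placeholders, so copy the real ones (including __VIEWSTATE and __EVENTVALIDATION) from the request you see in Fiddler:
using (var client = new System.Net.WebClient())
{
    var form = new System.Collections.Specialized.NameValueCollection
    {
        { "__VIEWSTATE", "...copied from the page..." }, // placeholder value
        { "__EVENTTARGET", "lnkNextPage" }               // hypothetical control name
    };
    byte[] response = client.UploadValues("https://site.example/page.aspx", form);
    string html = System.Text.Encoding.UTF8.GetString(response);
}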
Probably not an answer, but you might use the WebClient class to simplify your code:
WebClient client = new WebClient();
string html = client.DownloadString(URL);
Your code should be downloading the entire page. However, the page may, through JavaScript, add content after it has been loaded. Unless you actually run that JavaScript in a web browser, you won't see the entire DOM that you see in Firebug.
You can try this:
public override void Render(HtmlTextWriter writer)
{
    StringBuilder renderedOutput = new StringBuilder();
    StringWriter strWriter = new StringWriter(renderedOutput);
    HtmlTextWriter tWriter = new HtmlTextWriter(strWriter);
    base.Render(tWriter);
    string html = tWriter.InnerWriter.ToString();
    string filename = Server.MapPath(".") + "\\data.txt";
    FileStream outputStream = new FileStream(filename, FileMode.Create);
    StreamWriter sWriter = new StreamWriter(outputStream);
    sWriter.Write(renderedOutput.ToString());
    sWriter.Flush();
    // render for output
    writer.Write(renderedOutput.ToString());
}
I recommend using the following rendering engine instead of the WebBrowser control:
https://github.com/cefsharp/CefSharp
