Webbrowser control is not showing Html but shows webpage - c#

I am automating a task using webbrowser control , the site display pages using frames.
My issue is i get to a point , where i can see the webpage loaded properly on the webbrowser control ,but when it gets into the code and i see the html i see nothing.
I have seen other examples here too , but all of those do no return all the browser html.
What i get by using this:
HtmlWindow frame = webBrowser1.Document.Window.Frames[1];
string str = frame.Document.Body.OuterHtml;
Is just :
The main frame tag with attributes like SRC tag etc, is there any way how to handle this?Because as i can see the webpage completely loaded why do i not see the html?AS when i do that on the internet explorer i do see the pages source once loaded why not here?
ADDITIONAL INFO
There are two frames on the page :
i use this to as above:
HtmlWindow frame = webBrowser1.Document.Window.Frames[0];
string str = frame.Document.Body.OuterHtml;
And i get the correct HTMl for the first frame but for the second one i only see:
<FRAMESET frameSpacing=1 border=1 borderColor=#ffffff frameBorder=0 rows=29,*><FRAME title="Edit Search" marginHeight=0 src="http://web2.westlaw.com/result/dctopnavigation.aspx?rs=WLW12.01&ss=CXT&cnt=DOC&fcl=True&cfid=1&method=TNC&service=Search&fn=_top&sskey=CLID_SSSA49266105122&db=AK-CS&fmqv=s&srch=TRUE&origin=Search&vr=2.0&cxt=RL&rlt=CLID_QRYRLT803076105122&query=%22LAND+USE%22&mt=Westlaw&rlti=1&n=1&rp=%2fsearch%2fdefault.wl&rltdb=CLID_DB72585895122&eq=search&scxt=WL&sv=Split" frameBorder=0 name=TopNav marginWidth=0 scrolling=no><FRAME title="Main Document" marginHeight=0 src="http://web2.westlaw.com/result/dccontent.aspx?rs=WLW12.01&ss=CXT&cnt=DOC&fcl=True&cfid=1&method=TNC&service=Search&fn=_top&sskey=CLID_SSSA49266105122&db=AK-CS&fmqv=s&srch=TRUE&origin=Search&vr=2.0&cxt=RL&rlt=CLID_QRYRLT803076105122&query=%22LAND+USE%22&mt=Westlaw&rlti=1&n=1&rp=%2fsearch%2fdefault.wl&rltdb=CLID_DB72585895122&eq=search&scxt=WL&sv=Split" frameBorder=0 borderColor=#ffffff name=content marginWidth=0><NOFRAMES></NOFRAMES></FRAMESET>
UPDATE
The two url of the frames are as follows :
Frame1 whose html i see
http://web2.westlaw.com/nav/NavBar.aspx?RS=WLW12.01&VR=2.0&SV=Split&FN=_top&MT=Westlaw&MST=
Frame2 whose html i do not see:
http://web2.westlaw.com/result/result.aspx?RP=/Search/default.wl&action=Search&CFID=1&DB=AK%2DCS&EQ=search&fmqv=s&Method=TNC&origin=Search&Query=%22LAND+USE%22&RLT=CLID%5FQRYRLT302424536122&RLTDB=CLID%5FDB6558157526122&Service=Search&SRCH=TRUE&SSKey=CLID%5FSSSA648523536122&RS=WLW12.01&VR=2.0&SV=Split&FN=_top&MT=Westlaw&MST=
And the properties of the second frame whose html i do not get are in the picture below:
Thank you

I paid for the solution of the question above and it works 100 %.
What i did was use this function below and it returned me the count to the tag i was seeking which i could not find :S.. Use this to call the function listed below:
FillFrame(webBrowser1.Document.Window.Frames);
private void FillFrame(HtmlWindowCollection hwc)
{
if (hwc == null) return;
foreach (HtmlWindow hw in hwc)
{
HtmlElement getSpanid = hw.Document.GetElementById("mDisplayCiteList_ctl00_mResultCountLabel");
if (getSpanid != null)
{
doccount = getSpanid.InnerText.Replace("Documents", "").Replace("Document", "").Trim();
break;
}
if (hw.Frames.Count > 0) FillFrame(hw.Frames);
}
}
Hope it helps people .
Thank you

For taking html you have to do it that way:
WebClient client = new WebClient();
string html = client.DownloadString(#"http://stackoverflow.com");
That's an example of course, you can change the address.
By the way, you need using System.Net;

This works just fine...gets BODY element with all inner elements:
Somewhere in your Form code:
wb.Url = new Uri("http://stackoverflow.com");
wb.DocumentCompleted += new WebBrowserDocumentCompletedEventHandler(wbDocumentCompleted);
And here is wbDocumentCompleted:
void wb1DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
var yourBodyHtml = wb.Document.Body.OuterHtml;
}
wb is System.Windows.Forms.WebBrowser
UPDATE:
The same as for the document, I think that your second frame is not loaded at the time you check for it's content...You can try solutions from this link. You will have to wait for your frames to be loaded in order to see its content.

The most likely reason is that frame index 0 has the same domain name as the main/parent page, while the frame index 1 has a different domain name. Am I correct?
This creates a cross-frame security issue, and the WB control just leaves you high and dry and doesn't tell you what on earth went wrong, and just leaves your objects, properties and data empty (will say "No Variables" in the watch window when you try to expand the object).
The only thing you can access in this situation is pretty much the URL and iFrame properties, but nothing inside the iFrame.
Of course, there are ways to overcome teh cross-frame security issues - but they are not built into the WebBrowser control, and they are external solutions, depending on which WB control you are using (as in, .NET version or pre .NET version).
Let me know if I have correctly identified your problem, and if so, if you would like me to tell you about the solution tailored to your setup & instance of the WB control.
UPDATE: I have noticed that you're doing a .getElementByTagName("HTML")(0).outerHTML to get the HTML, all you need to do is call this on the document object, or the .body object and that should do it. MyDoc.Body.innerHTML should get the the content you want. Also, notice that there are additional iFrames inside these documents, in case that is of relevance. Can you give us the main document URL that has these two URL's in it so we / I can replicate what you're doing here? Also, not sure why you are using DomElement but you should just cast it to the native object it wants to be cast to, either a IHTMLDocument2 or the object you see in the watch window, which I think is IHTMLFrameElement (if i recall correctly, but you will know what i mean once you see it). If you are trying to use an XML object, this could be the reason why you aren't able to get the HTML content, change the object declaration and casting if there is one, and give it a go & let us know :). Now I'm curious too :).

Related

Why is the Url value in C# WebBrowser always null?

I am trying to load a local HTML file into an instance of C# WebBrowser (WinForms).
This is what I am doing:
string url = #"file:///C:MyHtml/hello.html";
myWebbrowser.Url = new Uri(url, UriKind.Absolute);
object test = myWebbrowser.Url; // breakpoint here
The path above is correct; if I copy it and paste into an external browser, the file is immediately opened. But the instance of WebBrowser does not want to react. I set a breakpoint in the last line of the snippet, and what I get there is that myWebbrowser.Url is null (the test variable). The control remains correspondingly empty.
myWebbrowser.AllowNavigation is explicitly set to true. I have also tried all possible versions of slashes and backslashes; the result is always the same. The version of the webbrowser seems to be 11 (myWebbrowser.Version = "{11.0.18362.1139}"). I am working in Windows 10, VS 2019.
What can be wrong in this setup?
The path above is correct; if I copy it and paste into an external browser, the file is immediately opened. But the instance of WebBrowser does not want to react. I set a breakpoint in the last line of the snippet, and what I get there is that myWebbrowser.Url is null.
I was able to replicate this exact issue, it's because the property of the Url doesn't get actually set until the document has actually finished loading.
To resolve this issue, you must handle the DocumentCompleted event. You can do so for example:
string url = #"file:///C:MyHtml/hello.html";
myWebbrowser.DocumentCompleted += MyWebbrowser_DocumentCompleted;
myWebbrowser.Url = new Uri(url, UriKind.Absolute);
Create a new routine to handle the DocumentCompleted event:
private void MyWebbrowser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
string test = myWebbrowser.Url.ToString();
}
You can also get the Url from the WebBrowserDocumentCompletedEventArgs:
string testUrl = e.Url.ToString();
I am not sure exactly why when setting the Url and then checking it, it is null, I haven't found anything to explain why. My only guess is that it may be an invalid Url and or path, if navigation succeeds then that property is set.
Edit: upon looking at the source for WebBrowserDocumentCompleted, it does seem the Url property is only set in the DocumentCompleted, you can see more there.
Please note: you must register the DocumentCompleted event first before setting the Url property as when you do, it will navigate first and you will not receive the DocumentCompleted event.
I was able to find out why it did not want to function, at least I hope so. The "hello.html" file contained calls to jquery and THREE.js, while WebBroser seems not to support the latter. Therefore I did not see any content in the control. After I threw out THREE.js and inserted most simple HTML code, it worked just OK! Now I am busy trying to bring WebBrowser to support THREE.js (there exists a skeptical opinion about this, though).

Cannot access same source HTML as a browser

I am coming back to work on a BOT that scraped data from a site once a day for my personal use.
However they have changed the code during COVID and now it seems they are loading in a lot of the content with Ajax/JavaScript.
I thought that if I did a WebRequest and obtained the response HTML from a URL, it should match the same content that I see in a browser (FF/Chrome) when I right click and "view source". I thought the actual DOM and generated source code would come later when those files were loaded as onload events fired, scripts lazily loaded and so on.
However the source HTML I obtain with my BOT is NOT the same as the HTML I see when viewing the source code. So my regular expressions that find certain links are not available to me.
Why am I seeing a difference between "view source" and a download of the HTML?
I can only think that when the page loads, SCRIPTS run that load other content into the page and that when I view source I am actually seeing a partial generated source rather than the original source code. Therefore is there a way I can call the page with my BOT, wait X seconds before obtaining the response to get this "onload" generated HTML?
Or even better a way for MY BOT (not using someone elses), to view generated source.
This BOT runs as a web service. I can find another site to scrape but it's just painful when I have all the regular expressions working on the source I see, except it's NOT the source my BOT obtains.
A bit confused at why my browser is showing me more content with a view source (not generated source), than my BOT gets when making a valid request.
Any help would be much appreciated this is almost an 8 year project that I have been doing on/off and this change has ruined one of the core parts of the system.
In response to OP's comment, here is the Java code for how to click at different parts on the screen to do this:
You could use Java's Robot class. I just learned about it a few days ago:
// Import
import java.awt.Robot;
// Code
void click(int x, int y, int btn) {
Robot robot = new Robot();
robot.mouseMove(x, y);
robot.mousePress(btn);
robot.mouseRelease(btn);
}
You would then run the click function with the x and y position to click, as well as the button (MouseEvent.BUTTON1, MouseEvent.BUTTON2, etc.)
After stringing together the right positions (this will vary depending on the screen) you could do just about anything.
To use shortcuts, just use the keyPress and keyRelease functions. Here is a good way to do this:
void key(int keyCode, boolean ctrl, boolean alt, boolean shift) {
if (ctrl)
robot.keyPress(KeyEvent.VK_CONTROL);
if (alt)
robot.keyPress(KeyEvent.VK_ALT);
if (shift)
robot.keyPress(KeyEvent.VK_SHIFT);
robot.keyPress(keyCode);
robot.keyRelease(keyCode);
if (ctrl)
robot.keyRelease(KeyEvent.VK_CONTROL);
if (alt)
robot.keyRelease(KeyEvent.VK_ALT);
if (shift)
robot.keyRelease(KeyEvent.VK_SHIFT);
}
Thus, something like Ctrl+Shift+I to open the inspect menu would look like this:
key(KeyEvent.VK_I, true, false, true);
Here are the steps to copy a website's code (from the inspector) with Google Chrome:
Ctrl + Shift + I
Right click the HTML tag
Select "Edit as HTML"
Ctrl + A
Ctrl + C
Then, you can use the technique from this StackOverflow to get the content from the clipboard:
Clipboard c = Toolkit.getDefaultToolkit().getSystemClipboard();
String text = (String) c.getData(DataFlavor.stringFlavor);
Using something like FileOutputStream to put the info into a file:
FileOutputStream output = new FileOutputStream(new File( PATH HERE ));
output.write(text.getBytes());
output.close();
I hope this helps!
I have seemed to have fixed it by just turning on the ability to store cookies in my custom HTTP (Bot/Scraper) class, that was being called from the class trying to obtain the data. Probably the site has a defense system to prevent visitors requesting pages and not the JS/CSS with a different session ID on each request.
However I would like to see some other examples because if it is just cookies then they could use JavaScript to test for JavaScript e.g an AJAX call to log if JS is actually on or some DOM manipulation to determine if you are really Human or not which would break it again.
Every site uses different methods to prevent scrapers, email harvesters, job rapists, link harvesters etc inc working out the standard time between requests for 100% verifiable humans and BOTS and then using those values to help determine spoofed user-agents etc. I wrote a whole system to stop BOTS at my last place of work and its a layered approach, just glad the cookies being enabled solved it on this site but it could easily be beefed up with other tricks to test for BOTS vs HUMANS.
I do know some Java, enough to work out what is going on anyway. My BOT is in C#.

c# fail to get element in content page with getelementbyid, getelementsbytagname

I have posted the same question but I post it again since I haven't got any answers to that post yet.
I am trying to get some information (such as tagName, id using GetElementsByTagName method or GetElementById method) from a content page in a website using winforms.
as you see the pictures attached, no matter which selection you make (select1, select2, select3 etc) web address stays same. however, contents under those selections are different in content page.
I am trying to access to a tagName(or id) from one of them(not selections but contents under a specific selection).
I have debugged and figured out(or seems like) I can not access to tagName(or id) from any of those contents under a specific selection.
It seems like I can only access tagName(or id) from main page. picture 3 will help better explanation of some terms such as main page, content page.
I tried to explain in detail, if my question seems still not clear, let me know plz.
My code looks like this.
var countGetFile = webBrowser1.Document.GetElementsByTagName("IFRAME");
foreach (HtmlElement l in countGetFile)
{
if (l.GetAttribute("width").Equals("100%"))
{
MessageBox.Show(l.GetAttribute("height").ToString());
MessageBox.Show(l.GetAttribute("outerText").ToString());
}
}
I was not able to grab information under 2 down level of #document from html.
html looks something like
...
<src="..." id="A" ... >
#document
...
<src="..." id="B" ... >
#document
...
<span="C" ...>
...
I could grab span information (third curly brackets) with codes looking like
HtmlWindow frame1 = webBrowser1.Document.GetElementById("A").Document.Window.Frames["A"];
HtmlWindow frame2 = frame1.Document.GetElementById("B").Document.Window.Frames["B"];
foreach (HtmlElement elm in frame2.Document.All)
{
if (elm.GetAttribute("tagName").Equals("C"))
{
// your command
}
}
to use Document.Window.Frames you need a header using "System.Collections";
btw, there is a problem. When I try to access to the information in third curly bracket, I need to do some kinds of work between frame1 and frame2 such as delaying for frame2 to have enough time to be able to access to next level after frame1.
I figured a kind of hack to get it through. Place a messagebox to pop up for short time delay, or place a delay function( not freeze ) with async code looking like,
async Task PutTaskDelay()
{
await Task.Delay(5000);//5 secs
}
I just found a temporary solution for accessing to second level. I will appreciate anyone who knows some ways to solve this problem.

Can not take a full page screenshot in Applitools C#

I'm trying to take a full screen screenshot of my page by using this code:
public void OpenEyesForVisualTesting(string testName) {
this.seleniumDriver.Driver = this.eyes.Open(this.seleniumDriver.Driver, "Zinc", testName);
}
public void CheckScreenForVisualTesting() {
this.eyes.Check("Zinc", Applitools.Selenium.Target.Window().Fully());
}
public void CloseEyes() {
this.eyes.close();
}
but instead I just get a half a page of the screenshot, I tried to contact Applitools but they just told me to replace eyes.checkwindow() to eyes.Check("tag", Target.Window().Fully()); which still didn't work.
If anyone can help me that would be great.
I work for Applitools and sorry for your troubles. Maybe you did not see our replies or they went to your spam folder. You need to set ForceFullPageScreenshot = true and StitchMode = StitchModes.CSS in order to capture a full page screenshot.
The below code example is everything you'd need to do in order to capture a full page image. Also, please make sure your .Net Eyes.Sdk version is >= 2.6.0 and Eyes.Selenium >= 2.5.0.
If you have any further questions or still encountering issues with this please feel free to email me directly. Thanks.
var eyes = new Eyes();
eyes.ApiKey = "your-api-key";
eyes.ForceFullPageScreenshot = true;
eyes.StitchMode = StitchModes.CSS;
eyes.Open(driver, "Zinc", testName, new Size(800, 600)); //last parameter is your viewport size IF testing on a browser. Do not set if testing on a mobile devices.
eyes.CheckWindow("Zinc");
//or new Fluet API method
eyes.Check("Zinc", Applitools.Selenium.Target.Window().Fully();
eyes.Close();
Use extent Reporting library, in that you can take screen shot as well as an interactive report on pass and fail cases.
here is the link how it works:
http://www.softwaretestingmaterial.com/generate-extent-reports/
If you want to take complete page Screenshot/ Particular element with Applitools. You can use lines of code:-
Full Screenshot
eyes.setHideScrollbars(false);
eyes.setStitchMode(StitchMode.CSS);
eyes.setForceFullPageScreenshot(true);
eyes.checkWindow();
Take screenshot for particular element
WebElement Button = driver.findElement(By.xpath("//button[#id='login-btn']"));
eyes.setHideScrollbars(false);
eyes.setStitchMode(StitchMode.CSS);
eyes.open(driver, projectName, testName);
eyes.setMatchLevel(MatchLevel.EXACT);
eyes.checkElement(Button );
(I am from Applitools.)
When setting forcefullpagescreenshoot to true, applitools will try to scroll the HTML element in the page, if the HTML is not the scrollable element then you will need to set it yourself:
eyes.Check(Target.Window().ScrollRootElement(selector))
In Firefox, you can see a scroll tag near elements that are scrollable.

I must be a heretic for wanting a C# browser with both NewWindow2 and GetElementsByTagName

You can't have your cake and eat it too, apparently.
I'm currently using the System.Windows.Forms.WebBrowser in my application. The program currently depends on using the GetElementsByTagName function. I use it to gather up all the elements of a certain type (either "input"s or "textarea"s), so I can sort through them and return the value of a specific one. This is the code for that function (my WebBrowser is named web1):
// returns the value from a element.
public String FetchValue(String strTagType, String strName)
{
HtmlElementCollection elems;
HtmlDocument page = web1.Document.Window.Frames[1].Document;
elems = page.GetElementsByTagName(strTagType);
foreach (HtmlElement elem in elems)
{
if (elem.GetAttribute("name") == strName ||
elem.GetAttribute("ref") == strName)
{
if (elem.GetAttribute("value") != null)
{
return elem.GetAttribute("value");
}
}
}
return null;
}
(points to note: the webpage I need to pull from is in a frame, and depending on circumstances, the element's identifying name will be either in the name or the ref attribute)
All of that works like a dream with the System.Windows.Forms.WebBrowser.
But what it is unable to do, is redirect the opening of a new window to remain in the application. Anything that opens in a new window shoots to the user's default browser, thus losing the session. This functionality can be easily fixed with the NewWindow2 event, which System.Windows.Forms.WebBrowser doesn't have.
Now forgive me for being stunned at its absence. I have but recently ditched VB6 and moved on to C# (yes VB6, apparently I am employed under a rock), and in VB6, the WebBrowser possessed both the GetElementsByTagName function and the NewWindow2 event.
The AxSHDocVw.WebBrowser has a NewWindow2 event. It would be more than happy to help me route my new windows to where I need them. The code to do this in THAT WebBrowser is (frmNewWindow being a simple form containing only another WebBrowser called web2 (Dock set to Fill)):
private void web1_NewWindow2(
object sender,
AxSHDocVw.DWebBrowserEvents2_NewWindow2Event e)
{
frmNewWindow frmNW = new frmNewWindow();
e.ppDisp = frmNW.web2.Application;
frmNW.web2.RegisterAsBrowser = true;
frmNW.Visible = true;
}
I am unable to produce on my own a way to replicate that function with the underwhelming regular NewWindow event.
I am also unable to figure out how to replicate the FetchValue function I detailed above using the AxSHDocVw.WebBrowser. It appears to go about things in a totally different way and all my knowledge of how to do things is useless.
I know I'm a sick, twisted man for this bizarre fantasy of using these two things in a single application. But can you find it in your heart to help this foolish idealist?
I could no longer rely on the workaround, and had to abandon System.Windows.Forms.WebBrowser. I needed NewWindow2.
I eventually figured out how to accomplish what I needed with the AxWebBrowser. My original post was asking for either a solution for NewWindow2 on the System.Windows.Forms.WebBrowser, or an AxWebBrowser replacement for .GetElementsByTagName. The replacement requires about 4x as much code, but gets the job done. I thought it would be prudent to post my solution, for later Googlers with the same quandary. (also in case there's a better way to have done this)
IHTMLDocument2 webpage = (IHTMLDocument2)webbrowser.Document;
IHTMLFramesCollection2 allframes = webpage.frames;
IHTMLWindow2 targetframe = (IHTMLWindow2)allframes.item("name of target frame");
webpage = (IHTMLDocument2)targetframe.document;
IHTMLElementCollection elements = webpage.all.tags("target tagtype");
foreach (IHTMLElement element in elements)
{
if (elem.getAttribute("name") == strTargetElementName)
{
return element.getAttribute("value");
}
}
The webbrowser.Document is cast into an IHTMLDocument2, then the IHTMLDocument2's frames are put into a IHTMLFramesCollection2, then I cast the specific desired frame into an IHTMLWindow2 (you can choose frame by index # or name), then I cast the frame's .Document member into an IHTMLDocument2 (the originally used one, for convenience sake). From there, the IHTMLDocument2's .all.tags() method is functionally identical to the old WebBrowser.Document.GetElementsByTagName() method, except it requires an IHTMLElementCollection versus an HTMLElementCollection. Then, you can foreach the collection, the individual elements needing to be IHTMLElement, and use .getAttribute to retrieve the attributes. Note that the g is lowercase.
The WebBrowser control can handle the NewWindow event so that new popup windows will be opened in the WebBrowser.
private void webBrowser1_NewWindow(object sender, CancelEventArgs e)
{
// navigate current window to the url
webBrowser1.Navigate(webBrowser1.StatusText);
// cancel the new window opening
e.Cancel = true;
}
http://social.msdn.microsoft.com/Forums/en-US/csharpgeneral/thread/361b6655-3145-4371-b92c-051c223518f2/
The only solution to this I have seen was a good few years ago now, called csExWb2, now on Google code here.
It gives you an ExWebBrowser control, but with full-on access to all the interfaces and events offered by IE. I used it to get deep and dirty control of elements in a winforms-hosted html editor.
It may be a bit of a leap jumping straight into that, mind.

Categories