Xpath returning null with html agility pack - c#

I am using Html Agility Pack to get the info about each product on the page:
http://www.hobbyking.com/hobbyking/store/_57_191__Planes_ARF_RTF_KIT-All_Models.html
This is my code, but the node comes back null. I am using the XPath that Google Chrome generated.
private void getDataBtn_Click(object sender, EventArgs e)
{
    if (URL != null)
    {
        HttpWebRequest request;
        HttpWebResponse response;
        StreamReader sr;
        List<string> Items = new List<string>(50);
        HtmlAgilityPack.HtmlDocument Doc = new HtmlAgilityPack.HtmlDocument();
        request = (HttpWebRequest)WebRequest.Create(URL);
        response = (HttpWebResponse)request.GetResponse();
        sr = new StreamReader(response.GetResponseStream());
        Doc.Load(sr);
        var Name = Doc.DocumentNode.SelectSingleNode("/html/body/table[2]/tbody/tr/td[2]/table/tbody/tr[2]/td/table/tbody/tr[2]/td[2]/table/tbody/tr[1]/td[3]/a");
    }
}
What am I doing wrong? Is there another tool that can generate XPath expressions compatible with Html Agility Pack?

Because there is no such node in the page. When you download it with Html Agility Pack (rather than viewing it in the browser), the page contains this text:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>HobbyKing Page not found.</title>
</head>
<body>
<img src="http://www.hobbyking.com/hk_logo.gif"><br>
<span style="font-family:Verdana, Arial, Helvetica, sans-serif">
<strong>Page no longer available</strong><br>
It seems you have landed on a page that doesnt exist anymore.<br>
Please update your links to point towards the correct hobbyking.com location;<br>
http://www.hobbyking.com/hobbyking/store/uh_index.asp<br>
<br>
If you continue to see this message, please email support@hobbyking.zendesk.com</span>
</body>
</html>
You can see in the page the following sentences:
"Please update your links to point towards the correct hobbyking.com location;
http://www.hobbyking.com/hobbyking/store/uh_index.asp"
P.S. You can see this by inspecting the downloaded document in the Visual Studio debugger.
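The same check can be done in code by printing the title of the downloaded page before running the real query. The following is a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced; the `//a[@href]` query is only an illustrative placeholder and is not the selector for the product list.

```csharp
using System;
using System.Net;
using HtmlAgilityPack;

class PageCheck
{
    static void Main()
    {
        // URL from the question; the server may answer with an error page instead.
        string url = "http://www.hobbyking.com/hobbyking/store/_57_191__Planes_ARF_RTF_KIT-All_Models.html";

        string html;
        using (var client = new WebClient())
        {
            // DownloadString throws a WebException on HTTP error status codes.
            html = client.DownloadString(url);
        }

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Print the title to see which page was actually downloaded.
        HtmlNode title = doc.DocumentNode.SelectSingleNode("//title");
        Console.WriteLine(title != null ? title.InnerText.Trim() : "(no <title> found)");

        // SelectSingleNode returns null when nothing matches, so check before use.
        HtmlNode link = doc.DocumentNode.SelectSingleNode("//a[@href]"); // placeholder query
        if (link == null)
            Console.WriteLine("No matching node in the downloaded HTML.");
        else
            Console.WriteLine(link.GetAttributeValue("href", ""));
    }
}
```

If the printed title is the "Page not found" one shown above, the XPath is not the problem; the request is simply reaching a different page than the browser does.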

Related

ASP.NET Core Razor Page: Write html content to new browser tab?

Using HttpClient, I make a GET request which returns an HTML response.
Request to url in method handler:
var response = string.Empty;
var result = await _httpClient.CreateClient().GetAsync(newUrl);
if (result.IsSuccessStatusCode)
{
    response = await result.Content.ReadAsStringAsync();
    var bytesResponse = Encoding.ASCII.GetBytes(response);
    // How to open bytesResponse in a new browser tab?
}
Html response content from request:
<!DOCTYPE html>
<html lang="en">
<head>
<style>
</style>
</head>
<body>
<!-- body content -->
<script>
</script>
</body>
</html>
Attempting to add await Response.Body.WriteAsync(bytesResponse, 0, bytesResponse.Length); throws a System.ObjectDisposedException: IFeatureCollection has been disposed.
I would appreciate assistance on how to display the response in a new browser tab.
First, a new browser tab cannot be opened from a Razor page handler; it has to be done in the view. Add target="_blank" to your link like this:
<a asp-page-handler="xxxxx" target="_blank">link</a>
Here is a working demo:
test.cshtml:
<a asp-page-handler="Html" target="_blank">link</a>
test.cshtml.cs:
public IActionResult OnGet()
{
    return Page();
}
public ActionResult OnGetHtml()
{
    return base.Content("<div>Hello</div>", "text/html");
}
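To return the HTML fetched with HttpClient instead of a fixed string, the handler can pass the downloaded content to Content. This is a minimal sketch, assuming IHttpClientFactory is registered (for example with services.AddHttpClient()) and injected into the page model; TestModel and the newUrl value are placeholders and are not taken from the original code.

```csharp
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;
using Microsoft.AspNetCore.Mvc.RazorPages;

public class TestModel : PageModel
{
    private readonly IHttpClientFactory _httpClientFactory;

    public TestModel(IHttpClientFactory httpClientFactory)
    {
        _httpClientFactory = httpClientFactory;
    }

    public IActionResult OnGet()
    {
        return Page();
    }

    // Matches <a asp-page-handler="Html" target="_blank">link</a> in the view.
    public async Task<IActionResult> OnGetHtmlAsync()
    {
        var newUrl = "https://example.com/"; // placeholder URL (assumption)
        var result = await _httpClientFactory.CreateClient().GetAsync(newUrl);
        if (!result.IsSuccessStatusCode)
        {
            // Pass the upstream status code through instead of rendering HTML.
            return StatusCode((int)result.StatusCode);
        }

        var html = await result.Content.ReadAsStringAsync();

        // Returning a result lets the framework write the body, so there is
        // no need to write to Response.Body by hand.
        return Content(html, "text/html");
    }
}
```

Because the link carries target="_blank", the browser opens the handler's response in a new tab.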

Webdriver doesn't find any elements by id or class NoSuchElementException

I'm trying to get data from inside a div of a public website.
The Selenium WebDriver doesn't seem to find any elements. I tried finding elements by id, by class, and even with an XPath, but it still didn't find anything.
I can see the html page code when looking at PageSource, this confirms the driver works. What am I doing wrong here? Selenium V2.53.1 // IEDriverServer Win32 v2.53.1
My code:
IWebDriver driver = new InternetExplorerDriver("C:\\Program Files\\SeleniumWebPagetester");
driver.Navigate().GoToUrl("D:\\test.html");
await Task.Delay(30000);
var src = driver.PageSource; //shows the html page -> works
var ds = driver.FindElement(By.XPath("//html//body")); //NoSuchElementException
var test = driver.FindElement(By.Id("aspnetForm")); //An unhandled exception of type 'OpenQA.Selenium.NoSuchElementException' occurred in WebDriver.dll
var testy = driver.FindElement(By.Id("aspnetForm"), 30); //'OpenQA.Selenium.NoSuchElementException'
var tst = driver.FindElement(By.XPath("//*[@id=\"lx-home\"]"), 30); //'OpenQA.Selenium.NoSuchElementException'
driver.Quit();
Simple HTML page:
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
</head>
<body>
<form action="#" id="aspnetForm" onsubmit="return false;">
<section id="lx-home" style="margin-bottom:50px;">
<div class="bigbanner">
<div class="splash mc">
<div class="bighead crb">LEAD DELIVERY MADE EASY</div>
</div>
</div>
</section>
</form>
</body>
</html>
Side note: my XPath works perfectly with HtmlWeb:
string Url = "D:\\test.html";
HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load(Url);
var element = doc.DocumentNode.SelectNodes("//*[@id=\"lx-home\"]"); //WORKS
It seems IE parses the local file differently, so you cannot access the DOM. Here are your options:
Use Chrome instead of IE
Keep using IE, move the file to C:\inetpub\wwwroot, then change your code to open the URL instead of the local file: driver.Navigate().GoToUrl("http://localhost/test.html");
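The first option might look like the sketch below. It assumes the Selenium.WebDriver and Selenium.Support packages are referenced and that a chromedriver executable can be found on the PATH; the file:/// form of the path is also an assumption about how the local file is addressed.

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Support.UI;

class LocalFileTest
{
    static void Main()
    {
        // Assumes chromedriver is available on the PATH.
        IWebDriver driver = new ChromeDriver();
        try
        {
            driver.Navigate().GoToUrl("file:///D:/test.html");

            // Wait up to 30 seconds for the form instead of using a fixed Task.Delay.
            var wait = new WebDriverWait(driver, TimeSpan.FromSeconds(30));
            IWebElement form = wait.Until(d => d.FindElement(By.Id("aspnetForm")));

            IWebElement section = driver.FindElement(By.XPath("//*[@id=\"lx-home\"]"));

            Console.WriteLine(form.TagName);
            Console.WriteLine(section.GetAttribute("id"));
        }
        finally
        {
            // Always close the browser, even if a lookup throws.
            driver.Quit();
        }
    }
}
```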

Getting HTML values in Store apps

I am parsing an HTML file from my storage folder to extract some values.
StorageFile store = await appfolder.GetFileAsync("01MB154.html");
string content = await FileIO.ReadTextAsync(store);
XmlDocument doc = new XmlDocument();
doc.LoadXml(content);
XmlNodeList names = doc.GetElementsByTagName("img");
I am getting an exception on the LoadXml(content) line:
"An exception of type 'System.Exception' occurred in IMG.exe but was not handled in user code,
Additional information: Exception from HRESULT: 0xC00CE584"
I tried this answer, but it has not worked for me.
This is part of my HTML file.
<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8" />
<meta name="generator" content="Web Books Publishing" />
<link rel="stylesheet" type="text/css" href="style.css" />
<title>Main Text</title>
</head>
<body>
<div>
<div class="figcenter">
<img src="images/img2.jpg" alt="Cinderella" title="" />
</div>
I checked some of the files I want to work with, and they still fail.
Is there any other way to get values out of the HTML?
Thanks,
Your HTML is not well formed according to W3Schools.
Try this:
StorageFile store = await appfolder.GetFileAsync("01MB154.html");
string content = await FileIO.ReadTextAsync(store);
XmlDocument doc = new XmlDocument();
XmlLoadSettings loadSettings = new XmlLoadSettings();
loadSettings.ProhibitDtd = false;
doc.LoadXml(content, loadSettings);
XmlNodeList names = doc.GetElementsByTagName("img");
UPDATE 1
Here's my working code
StorageFile store = await Windows.ApplicationModel.Package.Current.InstalledLocation.GetFileAsync("01MB154.html");
string content = await FileIO.ReadTextAsync(store);
XmlDocument doc = new XmlDocument();
XmlLoadSettings loadSettings = new XmlLoadSettings();
loadSettings.ProhibitDtd = false;
doc.LoadXml(content, loadSettings);
XmlNodeList names = doc.GetElementsByTagName("img");
UPDATE 2
replace to &nbsp;, it worked for me.
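Once the document loads, the img values can be read from the node list. The following is a minimal sketch, assuming the Windows.Data.Xml.Dom types used in the answer; the ImageSources class and FromXhtml method are hypothetical names introduced only for illustration.

```csharp
using System.Collections.Generic;
using Windows.Data.Xml.Dom;

static class ImageSources
{
    // Returns the src attribute of every <img> element in the given XHTML text.
    public static List<string> FromXhtml(string content)
    {
        // Allow the DOCTYPE declaration, as in the answer above.
        var loadSettings = new XmlLoadSettings { ProhibitDtd = false };

        var doc = new XmlDocument();
        doc.LoadXml(content, loadSettings);

        var sources = new List<string>();
        foreach (IXmlNode node in doc.GetElementsByTagName("img"))
        {
            // Only element nodes carry attributes.
            var img = node as XmlElement;
            if (img != null)
            {
                sources.Add(img.GetAttribute("src"));
            }
        }
        return sources;
    }
}
```

It could be called with the content string read by FileIO.ReadTextAsync in the code above.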

"The 'html' element is not declared." in XmlValidatingReader

I have this html doc:
<?xml version="1.0" encoding="utf-8"?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title></title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
</head>
<body>
<form class="Form" onsubmit="return checkForm(this);" id="Form" method="post">
//form body
</form>
</body>
</html>
This is stack trace:
at System.Xml.XmlValidatingReaderImpl.ValidationEventHandling.System.Xml.IValidationEventHandling.SendEvent(Exception exception, XmlSeverityType severity)
at System.Xml.Schema.BaseValidator.SendValidationEvent(String code, String arg)
at System.Xml.Schema.DtdValidator.ProcessElement()
at System.Xml.Schema.DtdValidator.Validate()
at System.Xml.XmlValidatingReaderImpl.Read()
at System.Xml.XmlReader.MoveToContent()
at System.ServiceModel.Channels.Message.CreateMessage(MessageVersion version, String action, XmlDictionaryReader body)
at Renault.LMT.ServiceModel.Dispatcher.ServerMessageFormatter.SerializeReply(MessageVersion messageVersion, Object[] parameters, Object result)
Code, error occurs in the last line:
MemoryStream MemoryStreamm = new MemoryStream(Encoding.UTF8.GetBytes((MessageBody)));
MemoryStreamm.Position = 0;
XmlReaderSettings settingsReader = new XmlReaderSettings();
settingsReader.DtdProcessing = DtdProcessing.Parse;
settingsReader.ValidationType = ValidationType.DTD;
settingsReader.XmlResolver = null;
XmlReader reader = XmlReader.Create(MemoryStreamm, settingsReader);
MessageResponse = Message.CreateMessage(messageVersion, string.Format("ServiceModel/ILMTService/{0}", Operation), reader);
According to http://msdn.microsoft.com/en-us/library/system.xml.xmlreadersettings.xmlresolver.aspx , it doesn't look like a good idea to set the XmlResolver to null. It's likely that the DTD can't be loaded so it can't match any element, the first of which is html.
I strongly recommend that you store a copy of the DTD locally and implement an XmlResolver that returns that local copy whenever the DTD is requested. You should always do this for DTDs and XML Schemas, because many servers providing these files severely throttle the number of requests from any one location.
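A resolver of that kind could be sketched as follows. LocalDtdResolver and the path passed to it are hypothetical. The sketch also assumes the local file is a self-contained (flattened) copy of the XHTML 1.1 DTD, because the modular xhtml11.dtd pulls in further module files that would otherwise still be requested from the network.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical resolver that serves the XHTML 1.1 DTD from a local file.
public sealed class LocalDtdResolver : XmlUrlResolver
{
    private readonly string _localDtdPath;

    public LocalDtdResolver(string localDtdPath)
    {
        _localDtdPath = localDtdPath;
    }

    public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
    {
        // Serve the local copy when the XHTML 1.1 DTD is requested.
        if (absoluteUri != null &&
            absoluteUri.AbsoluteUri.EndsWith("/xhtml11.dtd", StringComparison.OrdinalIgnoreCase))
        {
            return File.OpenRead(_localDtdPath);
        }

        // Anything else falls back to the default behaviour (may hit the network).
        return base.GetEntity(absoluteUri, role, ofObjectToReturn);
    }
}
```

In the code from the question it would replace the null assignment, for example settingsReader.XmlResolver = new LocalDtdResolver(@"C:\dtds\xhtml11-flat.dtd"); where the path is only a placeholder.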

Problem with Crystal Report in ASP.NET - ExportToHttpResponse

I am using the following code to export a PDF file from a popup.
On button click
function popupReport()
{
var url = 'Report.aspx';
window.open(url, 'winPopupReport', 'width=300,height=300,resizable=no,scrollbars=no,toolbar=no,directories=no,status=no,menubar=no,copyhistory=no');
return false;
}
and in Report.aspx.cs
ReportDocument repDoc = ( ReportDocument ) System.Web.HttpContext.Current.Session["StudyReportCrystalDocument"];
// Stop buffering the response
Response.Buffer = false;
// Clear the response content and headers
Response.ClearContent();
Response.ClearHeaders();
try
{
repDoc.ExportToHttpResponse( CrystalDecisions.Shared.ExportFormatType.PortableDocFormat, Response, true, "StudyReport" );
}
catch( Exception ex )
{
}
The code works fine in IE7, but in IE6 the popup window does not close. Why does this happen?
Some browsers refuse to close web pages automatically under certain conditions.
Try this workaround to close a page.
Write a script, in the page that you want to close, that opens another page; in this sample the script is injected via code after a button click, but you can write it directly in HTML if you need it.
ClientScript.RegisterStartupScript(typeof(Page), "closePage", "window.open('Success.htm', '_self', null);", true);
Create the Success.htm page this way:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<title></title>
<script language="javascript" type="text/javascript">
var redirectTimerId = 0;
function closeWindow() {
window.opener = top;
redirectTimerId = window.setTimeout('redirect()', 2000);
window.close();
}
function stopRedirect() {
window.clearTimeout(redirectTimerId);
}
function redirect() {
window.location = 'default.aspx';
}
</script>
</head>
<body onload="closeWindow()" onunload="stopRedirect()" style="">
<center><h1>Please Wait...</h1></center>
</body></html>
