I have this html doc:
<?xml version="1.0" encoding="utf-8"?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title></title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
</head>
<body>
<form class="Form" onsubmit="return checkForm(this);" id="Form" method="post">
//form body
</form>
</body>
</html>
This is the stack trace:
at System.Xml.XmlValidatingReaderImpl.ValidationEventHandling.System.Xml.IValidationEventHandling.SendEvent(Exception exception, XmlSeverityType severity)
at System.Xml.Schema.BaseValidator.SendValidationEvent(String code, String arg)
at System.Xml.Schema.DtdValidator.ProcessElement()
at System.Xml.Schema.DtdValidator.Validate()
at System.Xml.XmlValidatingReaderImpl.Read()
at System.Xml.XmlReader.MoveToContent()
at System.ServiceModel.Channels.Message.CreateMessage(MessageVersion version, String action, XmlDictionaryReader body)
at Renault.LMT.ServiceModel.Dispatcher.ServerMessageFormatter.SerializeReply(MessageVersion messageVersion, Object[] parameters, Object result)
Here is the code; the error occurs on the last line:
MemoryStream MemoryStreamm = new MemoryStream(Encoding.UTF8.GetBytes((MessageBody)));
MemoryStreamm.Position = 0;
XmlReaderSettings settingsReader = new XmlReaderSettings();
settingsReader.DtdProcessing = DtdProcessing.Parse;
settingsReader.ValidationType = ValidationType.DTD;
settingsReader.XmlResolver = null;
XmlReader reader = XmlReader.Create(MemoryStreamm, settingsReader);
MessageResponse = Message.CreateMessage(messageVersion, string.Format("ServiceModel/ILMTService/{0}", Operation), reader);
According to http://msdn.microsoft.com/en-us/library/system.xml.xmlreadersettings.xmlresolver.aspx , setting the XmlResolver to null does not look like a good idea here. Most likely the external DTD cannot be loaded, so the validator has no declarations to match any element against and fails on the first one it meets, which is html.
I strongly recommend that you store a copy of the DTD locally and implement an XmlResolver that returns that local copy whenever the DTD is requested. You should always do this for DTDs and XML Schemas, because many servers providing these files severely throttle the number of requests from any one location.
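A minimal sketch of such a resolver, under some assumptions: the class name LocalDtdResolver, the helper CreateValidatingReader and the folder argument are made up for illustration, and the folder is assumed to already contain xhtml11.dtd plus every module file it pulls in (XHTML 1.1 is a modular DTD). Any http or https request is answered from that folder by file name, so nothing is fetched from w3.org.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical resolver: answers DTD requests from a local folder.
public sealed class LocalDtdResolver : XmlUrlResolver
{
    private readonly string _dtdFolder;

    public LocalDtdResolver(string dtdFolder)
    {
        _dtdFolder = dtdFolder;
    }

    public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
    {
        if (absoluteUri.Scheme == Uri.UriSchemeHttp || absoluteUri.Scheme == Uri.UriSchemeHttps)
        {
            // Map e.g. http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd to <folder>\xhtml11.dtd.
            string fileName = Path.GetFileName(absoluteUri.LocalPath);
            string localPath = Path.Combine(_dtdFolder, fileName);
            if (File.Exists(localPath))
            {
                return File.OpenRead(localPath);
            }
            // Fail loudly rather than silently going out to the network.
            throw new FileNotFoundException("No local copy of " + absoluteUri, localPath);
        }
        // Non-HTTP URIs (for example file://) keep the default behaviour.
        return base.GetEntity(absoluteUri, role, ofObjectToReturn);
    }

    // Hypothetical helper mirroring the settings used in the question.
    public static XmlReader CreateValidatingReader(Stream input, string dtdFolder)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.DtdProcessing = DtdProcessing.Parse;
        settings.ValidationType = ValidationType.DTD;
        settings.XmlResolver = new LocalDtdResolver(dtdFolder);
        return XmlReader.Create(input, settings);
    }
}
```

In the question's code this would replace the settingsReader.XmlResolver = null; line, with the folder pointing at wherever the DTD files were saved.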
Related
I'm trying to get data from inside a div of a public website.
The Selenium WebDriver doesn't seem to find any elements. I tried to find elements by id, by class, and even with an XPath, but it still didn't find anything.
I can see the HTML of the page when looking at PageSource, which confirms the driver works. What am I doing wrong here? Selenium V2.53.1 // IEDriverServer Win32 v2.53.1
My code:
IWebDriver driver = new InternetExplorerDriver("C:\\Program Files\\SeleniumWebPagetester");
driver.Navigate().GoToUrl("D:\\test.html");
await Task.Delay(30000);
var src = driver.PageSource; //shows the html page -> works
var ds = driver.FindElement(By.XPath("//html//body")); //NoSuchElementException
var test = driver.FindElement(By.Id("aspnetForm")); //An unhandled exception of type 'OpenQA.Selenium.NoSuchElementException' occurred in WebDriver.dll
var testy = driver.FindElement(By.Id("aspnetForm"), 30); //'OpenQA.Selenium.NoSuchElementException'
var tst = driver.FindElement(By.XPath("//*[@id=\"lx-home\"]"), 30); //'OpenQA.Selenium.NoSuchElementException'
driver.Quit();
Simple HTML page:
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
</head>
<body>
<form action="#" id="aspnetForm" onsubmit="return false;">
<section id="lx-home" style="margin-bottom:50px;">
<div class="bigbanner">
<div class="splash mc">
<div class="bighead crb">LEAD DELIVERY MADE EASY</div>
</div>
</div>
</section>
</form>
</body>
</html>
Side note: my XPath works perfectly with HtmlWeb:
string Url = "D:\\test.html";
HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load(Url);
var element = doc.DocumentNode.SelectNodes("//*[@id=\"lx-home\"]"); //WORKS
It seems IE handles a local file differently, so the driver cannot access the DOM. Here are your options:
Use Chrome instead of IE.
Keep using IE, move the file to C:\inetpub\wwwroot, then change your code to open a URL instead of a local file: driver.Navigate().GoToUrl("http://localhost/test.html");
I am parsing an HTML file from my storage folder to extract some values.
StorageFile store = await appfolder.GetFileAsync("01MB154.html");
string content = await FileIO.ReadTextAsync(store);
XmlDocument doc = new XmlDocument();
doc.LoadXml(content);
XmlNodeList names = doc.GetElementsByTagName("img");
I am getting an exception on the LoadXml(content) line:
"An exception of type 'System.Exception' occurred in IMG.exe but was not handled in user code,
Additional information: Exception from HRESULT: 0xC00CE584"
I tried this answer, but it has not worked for me.
This is part of my HTML file:
<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8" />
<meta name="generator" content="Web Books Publishing" />
<link rel="stylesheet" type="text/css" href="style.css" />
<title>Main Text</title>
</head>
<body>
<div>
<div class="figcenter">
<img src="images/img2.jpg" alt="Cinderella" title="" />
</div>
I checked several of the files I want to work with, and none of them load correctly.
Is there any other way to get values out of the HTML?
Thanks,
Your HTML is not well-formed according to W3Schools.
Try this:
StorageFile store = await appfolder.GetFileAsync("01MB154.html");
string content = await FileIO.ReadTextAsync(store);
XmlDocument doc = new XmlDocument();
XmlLoadSettings loadSettings = new XmlLoadSettings();
loadSettings.ProhibitDtd = false;
doc.LoadXml(content, loadSettings);
XmlNodeList names = doc.GetElementsByTagName("img");
UPDATE 1
Here's my working code
StorageFile store = await Windows.ApplicationModel.Package.Current.InstalledLocation.GetFileAsync("01MB154.html");
string content = await FileIO.ReadTextAsync(store);
XmlDocument doc = new XmlDocument();
XmlLoadSettings loadSettings = new XmlLoadSettings();
loadSettings.ProhibitDtd = false;
doc.LoadXml(content, loadSettings);
XmlNodeList names = doc.GetElementsByTagName("img");
UPDATE 2
replace to , it worked for me.
I want to insert and update data in Fusion Tables.
Selecting from the table works fine. For adding rows, however, I need to use OAuth 2.0, and I have been unable to find a suitable way to get the access token and use it during the insert.
A code sample would help a lot.
var fusiondata;
function initialize() {
// Initialize JSONP request
var script = document.createElement('script');
var url = ['https://www.googleapis.com/fusiontables/v1/query?'];
url.push('sql=');
var query = "insert into 1bPbx7PVJU9NaxgAGKqN2da4g5EbXDybE_UVvlAE (name,luckynumber) values('abc',89)";
var encodedQuery = encodeURIComponent(query);
url.push(encodedQuery);
url.push('&callback=viewData');
url.push('&key=AIzaSyA0FVy-lEr_MPGk1p_lHSrxGZDcxy6wH4o');
script.src = url.join('');
var body = document.getElementsByTagName('body')[0];
body.appendChild(script);
}
function viewData(data) {
// code not required
}
I know many people struggle with Google auth when inserting into and updating a Fusion Table. Here is the complete code showing how to use the gauth library to do an insert in a simple manner:
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Authorization Request</title>
<script src="https://apis.google.com/js/client.js"></script>
<script type="text/javascript">
function auth() {
var config = {
'client_id': '365219651081-7onk7h52kas6cs5m17t1api72ur5tcrh.apps.googleusercontent.com',
'scope': 'https://www.googleapis.com/auth/fusiontables'
};
gapi.auth.authorize(config, function() {
console.log('login complete');
console.log(gapi.auth.getToken());
});
}
function insert_row(){
alert("insert called");
gapi.client.setApiKey('AIzaSyA0FVy-lEr_MPGk1p_lHSrxGZDcxy6wH4o');
var query = "INSERT INTO 1T_qE-o-EtX24VZASFDn6p3mMoPcWQ_GyErJpPIc(Name, Age) VALUES ('Trial', 100)";
gapi.client.load('fusiontables', 'v1', function(){
gapi.client.fusiontables.query.sql({sql:query}).execute(function(response){console.log(response);});
});
}
</script>
</head>
<body>
<button onclick="auth();">Authorize</button>
<p> </p>
<button onclick="insert_row();">Insert Data</button>
</body>
</html>
I am using Html Agility Pack to get the info about each product on the page:
http://www.hobbyking.com/hobbyking/store/_57_191__Planes_ARF_RTF_KIT-All_Models.html
This is my code, but the node comes back null. I am using the XPath that Google Chrome gave me.
private void getDataBtn_Click(object sender, EventArgs e)
{
if (URL != null)
{
HttpWebRequest request;
HttpWebResponse response;
StreamReader sr;
List<string> Items = new List<string>(50);
HtmlAgilityPack.HtmlDocument Doc = new HtmlAgilityPack.HtmlDocument();
request = (HttpWebRequest)WebRequest.Create(URL);
response = (HttpWebResponse)request.GetResponse();
sr = new StreamReader(response.GetResponseStream());
Doc.Load(sr);
var Name = Doc.DocumentNode.SelectSingleNode("/html/body/table[2]/tbody/tr/td[2]/table/tbody/tr[2]/td/table/tbody/tr[2]/td[2]/table/tbody/tr[1]/td[3]/a");
}
}
What am I doing wrong? Is there any other tool which can create agility pack compatible xpath expressions?
Because there is no such node in this page. When you download it with Agility Pack (rather than with the browser), the page has this text:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>HobbyKing Page not found.</title>
</head>
<body>
<img src="http://www.hobbyking.com/hk_logo.gif"><br>
<span style="font-family:Verdana, Arial, Helvetica, sans-serif">
<strong>Page no longer available</strong><br>
It seems you have landed on a page that doesnt exist anymore.<br>
Please update your links to point towards the correct hobbyking.com location;<br>
http://www.hobbyking.com/hobbyking/store/uh_index.asp<br>
<br>
If you continue to see this message, please email support@hobbyking.zendesk.com</span>
</body>
</html>
The page itself contains the following sentences:
"Please update your links to point towards the correct hobbyking.com location;
http://www.hobbyking.com/hobbyking/store/uh_index.asp"
P.S. You can see this by inspecting the downloaded document in the Visual Studio debugger.
So I have an aspx page that serves XML + XSL to a client, which does a client-side transform, and that works fine.
I am trying to detect the client and, if it doesn't support client-side transformation, do the transform server-side. I intercept the rendering of the aspx page that would return XML, take its output, combine it with the output from the XSL page, and serve the result. This output, however, is not well-formed. I get
XML Parsing Error: mismatched tag. Expected: </link>.
Location: http://oohrl.com/dashboard.aspx
Line Number 36, Column 20: </script></head>
-------------------^
In the client side generated output, which works fine, I get for instance
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<link rel="stylesheet" type="text/css" href="./css/dboard.css"/>
<link rel="stylesheet" type="text/css" href="./css/dboardmenu.css"/>
<script type="text/javascript" src="./js/simpletabs.js"/>
<link href="../css/simpletabs.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1.6/jquery.min.js"/>
<script type="text/javascript">
$(document).ready(function () {
$("#BlogSelectList li a").live("click", function () {
var str = ($(this).attr("href")).slice(1, 37)
$.ajax({
contentType: "application/json; charset=utf-8",
url: '../ws/WebServices.asmx/SetActiveBlog',
data: '{ActiveBlogID: "' + str + '"}',
dataType: 'json',
type: "post",
success: function (j) {
window.location.href = 'dashboard.aspx'
}
});
});
})
function showlayer(layer) {
var myLayer = document.getElementById(layer);
if (myLayer.style.display == "none" || myLayer.style.display == "") {
myLayer.style.display = "block";
}
else {
myLayer.style.display = "none";
}
}
</script></head>
If I generate it server side I get
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<META http-equiv="Content-Type" content="text/html; charset=utf-16">
<link rel="stylesheet" type="text/css" href="./css/dboard.css">
<link rel="stylesheet" type="text/css" href="./css/dboardmenu.css">
<script type="text/javascript" src="./js/simpletabs.js"></script>
<link href="../css/simpletabs.css" rel="stylesheet" type="text/css">
<script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1.6/jquery.min.js">
</script>
<script type="text/javascript">
$(document).ready(function () {
$("#BlogSelectList li a").live("click", function () {
var str = ($(this).attr("href")).slice(1, 37)
$.ajax({
contentType: "application/json; charset=utf-8",
url: '../ws/WebServices.asmx/SetActiveBlog',
data: '{ActiveBlogID: "' + str + '"}',
dataType: 'json',
type: "post",
success: function (j) {
window.location.href = 'dashboard.aspx'
}
});
});
})
function showlayer(layer) {
var myLayer = document.getElementById(layer);
if (myLayer.style.display == "none" || myLayer.style.display == "") {
myLayer.style.display = "block";
}
else {
myLayer.style.display = "none";
}
}
</script></head>
Which gives me the error. Of course I notice the difference between the <link/> and <link> tags, but I have no idea why the server-side processing engine gives me different results or how to fix it.
Here is the code I use to generate the XHTML on the server
protected override void Render(HtmlTextWriter writer)
{
StringBuilder sb = new StringBuilder();
StringWriter sw = new StringWriter(sb);
HtmlTextWriter hWriter = new HtmlTextWriter(sw);
base.Render(hWriter);
// *** store to a string
string XMLOutput = sb.ToString();
// *** Write it back to the server
if (!Request.Browser.IsBrowser("IE"))
{
writer.Write(XMLOutput);
}
else
{
StringWriter XSLsw = new StringWriter();
HttpContext.Current.Server.Execute("DashboardXSL.aspx", XSLsw);
string output = String.Empty;
using (StringReader srt = new StringReader(XSLsw.ToString())) // xslInput is a string that contains xsl
using (StringReader sri = new StringReader(XMLOutput)) // xmlInput is a string that contains xml
{
using (XmlReader xrt = XmlReader.Create(srt))
using (XmlReader xri = XmlReader.Create(sri))
{
XslCompiledTransform xslt = new XslCompiledTransform();
xslt.Load(xrt);
using (StringWriter _sw = new StringWriter())
using (XmlWriter xwo = XmlWriter.Create(_sw, xslt.OutputSettings)) // use OutputSettings of xsl, so it can be output as HTML
{
xslt.Transform(xri, xwo);
output = _sw.ToString();
}
}
}
writer.Write(output);
Response.Flush();
Response.End();
}
}
Because the root element of your output document is <html>, the processor chooses HTML as the default format. To create a well-formed XHTML document instead, make sure your XSLT contains the following as a child of the root <xsl:stylesheet> or <xsl:transform> element:
<xsl:output method="xml" omit-xml-declaration="yes" />
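To illustrate the effect, here is a small sketch that runs a stylesheet through XslCompiledTransform in the same way as the Render method in the question. The class and method names are made up for illustration. When the stylesheet declares method="xml", empty elements such as link should come out self-closed, so the result can be parsed as XML again.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

// Hypothetical helper: applies an XSLT string to an XML string and
// serializes with the settings taken from the stylesheet's <xsl:output>.
public static class XhtmlTransformSketch
{
    public static string Transform(string xml, string xsl)
    {
        XslCompiledTransform xslt = new XslCompiledTransform();
        using (XmlReader xslReader = XmlReader.Create(new StringReader(xsl)))
        {
            xslt.Load(xslReader);
        }

        using (StringWriter output = new StringWriter())
        {
            using (XmlReader xmlReader = XmlReader.Create(new StringReader(xml)))
            using (XmlWriter writer = XmlWriter.Create(output, xslt.OutputSettings))
            {
                // OutputSettings carries method="xml" through to the writer,
                // so <link .../> is not rewritten as an unclosed HTML <link ...>.
                xslt.Transform(xmlReader, writer);
            }
            return output.ToString();
        }
    }
}
```

Passing the result to XmlDocument.LoadXml is a quick way to confirm that the server-side output is well-formed before writing it to the response.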
I had to set the content type on the XSL sheet to text/html, which fixed all problems.
Note that this change is ONLY used when transforming server-side. When the sheet is sent to the client for a client-side transformation, it is not changed to text/html.