I am parsing an HTML file from my storage folder to extract some values.
StorageFile store = await appfolder.GetFileAsync("01MB154.html");
string content = await FileIO.ReadTextAsync(store);
XmlDocument doc = new XmlDocument();
doc.LoadXml(content);
XmlNodeList names = doc.GetElementsByTagName("img");
I am getting an exception on the LoadXml(content) line:
"An exception of type 'System.Exception' occurred in IMG.exe but was not handled in user code,
Additional information: Exception from HRESULT: 0xC00CE584"
I tried this answer (link), but it has not worked for me.
This is part of my HTML file:
<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8" />
<meta name="generator" content="Web Books Publishing" />
<link rel="stylesheet" type="text/css" href="style.css" />
<title>Main Text</title>
</head>
<body>
<div>
<div class="figcenter">
<img src="images/img2.jpg" alt="Cinderella" title="" />
</div>
I checked some of the other files I want to work with, and they fail as well.
Is there any other way to get values out of the HTML?
Thanks,
Your HTML is not well formed according to W3Schools.
Try this:
StorageFile store = await appfolder.GetFileAsync("01MB154.html");
string content = await FileIO.ReadTextAsync(store);
XmlDocument doc = new XmlDocument();
XmlLoadSettings loadSettings = new XmlLoadSettings();
loadSettings.ProhibitDtd = false;
doc.LoadXml(content, loadSettings);
XmlNodeList names = doc.GetElementsByTagName("img");
UPDATE 1
Here's my working code
StorageFile store = await Windows.ApplicationModel.Package.Current.InstalledLocation.GetFileAsync("01MB154.html");
string content = await FileIO.ReadTextAsync(store);
XmlDocument doc = new XmlDocument();
XmlLoadSettings loadSettings = new XmlLoadSettings();
loadSettings.ProhibitDtd = false;
doc.LoadXml(content, loadSettings);
XmlNodeList names = doc.GetElementsByTagName("img");
UPDATE 2
replace to , it worked for me.
Related
I'm trying to get data from inside a div of a public website.
The Selenium WebDriver doesn't seem to find any elements. I tried to find elements by id, by class, and even with an XPath, but still didn't find anything.
I can see the HTML of the page when looking at PageSource, which confirms the driver works. What am I doing wrong here? Selenium V2.53.1 // IEDriverServer Win32 v2.53.1
My code:
IWebDriver driver = new InternetExplorerDriver("C:\\Program Files\\SeleniumWebPagetester");
driver.Navigate().GoToUrl("D:\\test.html");
await Task.Delay(30000);
var src = driver.PageSource; //shows the html page -> works
var ds = driver.FindElement(By.XPath("//html//body")); //NoSuchElementException
var test = driver.FindElement(By.Id("aspnetForm")); //An unhandled exception of type 'OpenQA.Selenium.NoSuchElementException' occurred in WebDriver.dll
var testy = driver.FindElement(By.Id("aspnetForm"), 30); //'OpenQA.Selenium.NoSuchElementException'
var tst = driver.FindElement(By.XPath("//*[@id=\"lx-home\"]"), 30); //'OpenQA.Selenium.NoSuchElementException'
driver.Quit();
Simple HTML page:
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
</head>
<body>
<form action="#" id="aspnetForm" onsubmit="return false;">
<section id="lx-home" style="margin-bottom:50px;">
<div class="bigbanner">
<div class="splash mc">
<div class="bighead crb">LEAD DELIVERY MADE EASY</div>
</div>
</div>
</section>
</form>
</body>
</html>
Side note: my XPath works perfectly with HtmlWeb:
string Url = "D:\\test.html";
HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load(Url);
var element = doc.DocumentNode.SelectNodes("//*[@id=\"lx-home\"]"); //WORKS
It seems IE handles the local file differently, so the driver cannot access the DOM. Here are your options:
1. Use Chrome instead of IE.
2. Keep using IE, move the file to C:\inetpub\wwwroot, then change your code to open a URL instead of the local file: driver.Navigate().GoToUrl("http://localhost/test.html");
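The first option could look roughly like the sketch below. It assumes the Selenium ChromeDriver bindings are referenced and that chromedriver is on the PATH; the file:/// form of the D:\test.html path is also an assumption, not something from the question.

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

class ChromeLocalFileCheck
{
    static void Main()
    {
        // Assumes chromedriver can be found on the PATH.
        using (IWebDriver driver = new ChromeDriver())
        {
            // Assumed URL form of the local file used in the question.
            driver.Navigate().GoToUrl("file:///D:/test.html");

            // Same lookups as in the question; XPath attribute tests use @id.
            IWebElement form = driver.FindElement(By.Id("aspnetForm"));
            IWebElement home = driver.FindElement(By.XPath("//*[@id='lx-home']"));

            Console.WriteLine(form.TagName);
            Console.WriteLine(home.Text);
        }
    }
}
```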
I am using Html Agility Pack to get the info about each product on the page:
http://www.hobbyking.com/hobbyking/store/_57_191__Planes_ARF_RTF_KIT-All_Models.html
This is my code, but the node is returning null. I am using the XPath I got from Google Chrome.
private void getDataBtn_Click(object sender, EventArgs e)
{
if (URL != null)
{
HttpWebRequest request;
HttpWebResponse response;
StreamReader sr;
List<string> Items = new List<string>(50);
HtmlAgilityPack.HtmlDocument Doc = new HtmlAgilityPack.HtmlDocument();
request = (HttpWebRequest)WebRequest.Create(URL);
response = (HttpWebResponse)request.GetResponse();
sr = new StreamReader(response.GetResponseStream());
Doc.Load(sr);
var Name = Doc.DocumentNode.SelectSingleNode("/html/body/table[2]/tbody/tr/td[2]/table/tbody/tr[2]/td/table/tbody/tr[2]/td[2]/table/tbody/tr[1]/td[3]/a");
}
}
What am I doing wrong? Is there any other tool which can create agility pack compatible xpath expressions?
Because there is no such node in this page. When you download it with Html Agility Pack (rather than the browser), the page has this text:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>HobbyKing Page not found.</title>
</head>
<body>
<img src="http://www.hobbyking.com/hk_logo.gif"><br>
<span style="font-family:Verdana, Arial, Helvetica, sans-serif">
<strong>Page no longer available</strong><br>
It seems you have landed on a page that doesnt exist anymore.<br>
Please update your links to point towards the correct hobbyking.com location;<br>
http://www.hobbyking.com/hobbyking/store/uh_index.asp<br>
<br>
If you continue to see this message, please email support@hobbyking.zendesk.com</span>
</body>
</html>
You can see in the page the following sentences:
"Please update your links to point towards the correct hobbyking.com location;
http://www.hobbyking.com/hobbyking/store/uh_index.asp"
P.S. You can see this by inspecting the loaded document in the Visual Studio debugger.
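Outside the debugger, the same check can be done by printing what was actually downloaded before running any XPath against it. This is a minimal sketch, assuming the Html Agility Pack package is referenced; the class name and the command-line URL argument are illustrative only.

```csharp
using System;
using System.Net;
using HtmlAgilityPack;

class DownloadCheck
{
    static void Main(string[] args)
    {
        string url = args[0]; // URL to inspect, passed on the command line

        var doc = new HtmlDocument();
        using (var client = new WebClient())
        {
            // DownloadString throws a WebException if the server answers with an error status.
            doc.LoadHtml(client.DownloadString(url));
        }

        // The title is a quick hint that an error page came back instead of the product list.
        HtmlNode title = doc.DocumentNode.SelectSingleNode("//title");
        Console.WriteLine(title == null ? "(no title)" : title.InnerText);

        // Dump the markup that the XPath will actually be evaluated against.
        Console.WriteLine(doc.DocumentNode.OuterHtml);
    }
}
```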
I have a Like button on my page. When the button is clicked, I am trying to send the following tag information to Facebook...
<meta property="og:title" content="Title" />
<meta property="og:description" content="Description" />
<meta property="og:url" content="url info" />
<meta property="og:image" content="image url" />
Following is my Like Button Frame
<iframe frameborder="0" scrolling="no" allowtransparency="true"
style="border: none; overflow: hidden; width: 260px; height: 35px;"
src="http://www.facebook.com/plugins/like.php?
href=http://localhost:49334/WebForm1.aspx&send=false&
layout=button_count&width=100&show_faces=false&
action=like&colorscheme=light&font=arial&height=35">
</iframe>
Following is the first Approach to dynamically handle the Meta Tags.
var fbTitleTag = new MetaTag
{
AgentPageURL = "/",
MetaTagName = "og:title",
UserSiteName = CurrentAgent.UserSiteName,
MetaTagContent = Request.Cookies.Get("MasterTitle").Value
};
var fbDesc = new MetaTag
{
AgentPageURL = "/",
MetaTagName = "og:description",
UserSiteName = CurrentAgent.UserSiteName,
MetaTagContent = Request.Cookies.Get("MasterDescription").Value
};
var fbUrl = new MetaTag
{
AgentPageURL = "/",
MetaTagName = "og:url",
UserSiteName = CurrentAgent.UserSiteName,
MetaTagContent = Request.Cookies.Get("MasterURL").Value
};
var fbImage = new MetaTag
{
AgentPageURL = "/",
MetaTagName = "og:image",
UserSiteName = CurrentAgent.UserSiteName,
MetaTagContent = Request.Cookies.Get("MasterImage").Value
};
var tags = new MetaTagCollection { fbTitleTag, fbDesc, fbUrl, fbImage };
Literal ltMetaTags = null;
ltMetaTags = (Literal)this.Master.FindControl("ltMetaTags");
MetaTags(tags, "wsws", "/", ltMetaTags, true);
public static void MetaTags(MetaTagCollection MetaTags, string name, string strRawURL, Literal ltlMetaHolders, bool isProp)
{
// ltlMetaHolders.Text = "";
foreach (AgentMetaTag oAgentMetaTag in agentMetaTags)
{
if (string.Compare(strRawURL, oAgentMetaTag.AgentPageURL, true) == 0)
{
if (oAgentMetaTag.MetaTagName.ToLower().Trim() != "footer" && oAgentMetaTag.MetaTagName.ToLower().Trim() != "title")
{
if (oAgentMetaTag.MetaTagName.ToLower().Trim() == "fbtitle")
oAgentMetaTag.MetaTagName = "title";
RenderMetaTagByContentName(ltlMetaHolders, oAgentMetaTag.MetaTagName, oAgentMetaTag.MetaTagContent, isProp);
}
}
}
}
public static void RenderMetaTagByContentName(Literal ltlMetaHolder, string contentName, string content, bool isProp)
{
var metaTagFromat = isProp ? "<meta property=\"{0}\" content=\"{1}\" />" : "<meta name=\"{0}\" content=\"{1}\" /> ";
ltlMetaHolder.Text += string.Format(metaTagFromat, contentName, content);
}
Following is the second Approach to dynamically handle the Meta Tags.
HtmlMeta tag = new HtmlMeta();
tag.Attributes.Add("property", "og:title");
tag.Content = "Title";
Page.Header.Controls.Add(tag);
HtmlMeta tag1 = new HtmlMeta();
tag1.Attributes.Add("property", "og:description");
tag1.Content = "Desc";
Page.Header.Controls.Add(tag1);
HtmlMeta tagurl = new HtmlMeta();
tagurl.Attributes.Add("property", "og:url");
tagurl.Content = "URL info";
Page.Header.Controls.Add(tagurl);
HtmlMeta tagimg = new HtmlMeta();
tagimg.Attributes.Add("property", "og:img");
tagimg.Content = "Image URL";
Page.Header.Controls.Add(tagimg);
Finally it is rendering the meta tags as below..
<meta property="og:title" content="Title" />
<meta property="og:description" content="Description" />
<meta property="og:url" content="url info" />
<meta property="og:image" content="image url" />
Now, when I click the Like button, it sends only the URL and not the Description/Image/Title.
I am using the tool at "http://developers.facebook.com/tools/debug". It says that the Description/Image/Title is missing.
Any ideas?
You don't send the metadata to Facebook; Facebook retrieves the metadata from the page's HTML when it loads the page. Try viewing your URL with the following tool:
http://developers.facebook.com/tools/debug/og/echo?q=<your URL here>
It will show you what Facebook sees (it's the 'Scraped URL' link at the bottom of the debug tool that you're using now).
If it does not include the metadata tags then Facebook does not see them and it won't add the metadata to its Open Graph object. If that's the case then you might not be adding the metadata properly to the HTML.
The second approach looks correct. The question is, where did you place that code? It should be called on Page_Load.
Clicking the Like button does not send the og:xxxx information. Your page should already have those og:xxxx meta tags from the very beginning.
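As a rough sketch of that advice, the second approach from the question can be moved into Page_Load so the tags are part of the HTML that Facebook scrapes. The class name WebForm1 is taken from the URL in the question, the helper AddOpenGraphTag is hypothetical, and the page is assumed to have a head element marked runat="server" so that Page.Header is available. The sketch uses og:image, the property name shown in the rendered output in the question.

```csharp
using System;
using System.Web.UI;
using System.Web.UI.HtmlControls;

public partial class WebForm1 : Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        // Placeholder values, as in the question.
        AddOpenGraphTag("og:title", "Title");
        AddOpenGraphTag("og:description", "Desc");
        AddOpenGraphTag("og:url", "URL info");
        AddOpenGraphTag("og:image", "Image URL");
    }

    // Hypothetical helper: appends one <meta property="..." content="..." /> to the page head.
    private void AddOpenGraphTag(string property, string content)
    {
        var tag = new HtmlMeta();
        tag.Attributes.Add("property", property);
        tag.Content = content;
        Page.Header.Controls.Add(tag);
    }
}
```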
Are you using ASP.NET MVC?
You can try setting the meta tags in Layout.cshtml like this:
<meta property="og:type" content="website">
<meta property="og:site_name" content="Site name">
<meta property="og:title" content="@ViewBag.OGTitle" />
<meta property="og:description" content="@ViewBag.OGDesc" />
<meta property="og:url" content="@ViewBag.OGUrl">
<meta property="og:image" content="@ViewBag.OGImage" />
and then set the tag values in the individual page, MyPage.cshtml:
@model Books.Web.Models.ItemSource
@{
ViewBag.OGTitle = Model.Item.Title;
ViewBag.OGDesc = Model.Item.Description;
ViewBag.OGUrl = Request.Url.AbsoluteUri;
ViewBag.OGImage = Request.Url.Scheme + "://" + Request.Url.Host + Url.Action("ItemCover", "Image", new { id = Model.Item.Id, height = 350 });
Layout = "~/Views/Shared/Layout.cshtml";
}
<div>Page content here</div>
I have an aspx page that serves XML + XSL to a client, which performs a client-side transform. That works fine.
I am trying to detect the client and, if it does not support client-side transformation, do the transform server-side. I interrupt the render process of the aspx page that would return the XML, capture its output, transform it with the output of the XSL page, and serve the result. This output, however, is not well formed. I get
XML Parsing Error: mismatched tag. Expected: </link>.
Location: http://oohrl.com/dashboard.aspx
Line Number 36, Column 20: </script></head>
-------------------^
In the client side generated output, which works fine, I get for instance
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<link rel="stylesheet" type="text/css" href="./css/dboard.css"/>
<link rel="stylesheet" type="text/css" href="./css/dboardmenu.css"/>
<script type="text/javascript" src="./js/simpletabs.js"/>
<link href="../css/simpletabs.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1.6/jquery.min.js"/>
<script type="text/javascript">
$(document).ready(function () {
$("#BlogSelectList li a").live("click", function () {
var str = ($(this).attr("href")).slice(1, 37)
$.ajax({
contentType: "application/json; charset=utf-8",
url: '../ws/WebServices.asmx/SetActiveBlog',
data: '{ActiveBlogID: "' + str + '"}',
dataType: 'json',
type: "post",
success: function (j) {
window.location.href = 'dashboard.aspx'
}
});
});
})
function showlayer(layer) {
var myLayer = document.getElementById(layer);
if (myLayer.style.display == "none" || myLayer.style.display == "") {
myLayer.style.display = "block";
}
else {
myLayer.style.display = "none";
}
}
</script></head>
If I generate it server side I get
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head>
<META http-equiv="Content-Type" content="text/html; charset=utf-16">
<link rel="stylesheet" type="text/css" href="./css/dboard.css">
<link rel="stylesheet" type="text/css" href="./css/dboardmenu.css">
<script type="text/javascript" src="./js/simpletabs.js"></script>
<link href="../css/simpletabs.css" rel="stylesheet" type="text/css">
<script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1.6/jquery.min.js">
</script>
<script type="text/javascript">
$(document).ready(function () {
$("#BlogSelectList li a").live("click", function () {
var str = ($(this).attr("href")).slice(1, 37)
$.ajax({
contentType: "application/json; charset=utf-8",
url: '../ws/WebServices.asmx/SetActiveBlog',
data: '{ActiveBlogID: "' + str + '"}',
dataType: 'json',
type: "post",
success: function (j) {
window.location.href = 'dashboard.aspx'
}
});
});
})
function showlayer(layer) {
var myLayer = document.getElementById(layer);
if (myLayer.style.display == "none" || myLayer.style.display == "") {
myLayer.style.display = "block";
}
else {
myLayer.style.display = "none";
}
}
</script></head>
That output gives me the error. I notice the difference between the <link/> and <link> tags, but I have no idea why the server-side processing engine gives different results or how to fix it.
Here is the code I use to generate the XHTML on the server:
protected override void Render(HtmlTextWriter writer)
{
StringBuilder sb = new StringBuilder();
StringWriter sw = new StringWriter(sb);
HtmlTextWriter hWriter = new HtmlTextWriter(sw);
base.Render(hWriter);
// *** store to a string
string XMLOutput = sb.ToString();
// *** Write it back to the server
if (!Request.Browser.IsBrowser("IE"))
{
writer.Write(XMLOutput);
}
else
{
StringWriter XSLsw = new StringWriter();
HttpContext.Current.Server.Execute("DashboardXSL.aspx", XSLsw);
string output = String.Empty;
using (StringReader srt = new StringReader(XSLsw.ToString())) // xslInput is a string that contains xsl
using (StringReader sri = new StringReader(XMLOutput)) // xmlInput is a string that contains xml
{
using (XmlReader xrt = XmlReader.Create(srt))
using (XmlReader xri = XmlReader.Create(sri))
{
XslCompiledTransform xslt = new XslCompiledTransform();
xslt.Load(xrt);
using (StringWriter _sw = new StringWriter())
using (XmlWriter xwo = XmlWriter.Create(_sw, xslt.OutputSettings)) // use OutputSettings of xsl, so it can be output as HTML
{
xslt.Transform(xri, xwo);
output = _sw.ToString();
}
}
}
writer.Write(output);
Response.Flush();
Response.End();
}
}
Because the root element of your output document is <html>, the processor chooses HTML as the default format. To create a well-formed XHTML document instead, make sure your XSLT contains the following as a child of the root <xsl:stylesheet> or <xsl:transform> element:
<xsl:output method="xml" omit-xml-declaration="yes" />
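To show where that declaration sits, here is a minimal stylesheet sketch. The template body is only a placeholder and does not reproduce the templates in DashboardXSL.aspx.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Top-level child of the stylesheet root: ask for XML serialization instead of HTML. -->
  <xsl:output method="xml" omit-xml-declaration="yes" />

  <!-- Placeholder template; the real templates go here. -->
  <xsl:template match="/">
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <link rel="stylesheet" type="text/css" href="./css/dboard.css" />
      </head>
      <body />
    </html>
  </xsl:template>
</xsl:stylesheet>
```

Because the code in the question creates its XmlWriter from xslt.OutputSettings, the method declared here should be what determines how the result is serialized.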
I had to set the content type on the xsl sheet to text/html which fixed all problems.
Note this change is ONLY used when transforming server side. When sending it to the client for a client transformation it is not changed to text/html
I have this html doc:
<?xml version="1.0" encoding="utf-8"?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title></title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
</head>
<body>
<form class="Form" onsubmit="return checkForm(this);" id="Form" method="post">
//form body
</form>
</body>
</html>
This is stack trace:
at System.Xml.XmlValidatingReaderImpl.ValidationEventHandling.System.Xml.IValidationEventHandling.SendEvent(Exception exception, XmlSeverityType severity)
at System.Xml.Schema.BaseValidator.SendValidationEvent(String code, String arg)
at System.Xml.Schema.DtdValidator.ProcessElement()
at System.Xml.Schema.DtdValidator.Validate()
at System.Xml.XmlValidatingReaderImpl.Read()
at System.Xml.XmlReader.MoveToContent()
at System.ServiceModel.Channels.Message.CreateMessage(MessageVersion version, String action, XmlDictionaryReader body)
at Renault.LMT.ServiceModel.Dispatcher.ServerMessageFormatter.SerializeReply(MessageVersion messageVersion, Object[] parameters, Object result)
Code (the error occurs on the last line):
MemoryStream MemoryStreamm = new MemoryStream(Encoding.UTF8.GetBytes((MessageBody)));
MemoryStreamm.Position = 0;
XmlReaderSettings settingsReader = new XmlReaderSettings();
settingsReader.DtdProcessing = DtdProcessing.Parse;
settingsReader.ValidationType = ValidationType.DTD;
settingsReader.XmlResolver = null;
XmlReader reader = XmlReader.Create(MemoryStreamm, settingsReader);
MessageResponse = Message.CreateMessage(messageVersion, string.Format("ServiceModel/ILMTService/{0}", Operation), reader);
According to http://msdn.microsoft.com/en-us/library/system.xml.xmlreadersettings.xmlresolver.aspx, setting the XmlResolver to null does not look like a good idea, because external resources such as the DTD are then not resolved. It's likely that the DTD can't be loaded, so the validator can't match any element, the first of which is html.
I strongly recommend that you store a copy of the DTD locally and implement an XmlResolver that returns that local copy when the DTD is requested. You should always do this for DTDs and XML Schemas, because many servers providing these files severely throttle the number of requests from any one location.
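One way to sketch such a resolver is to subclass XmlUrlResolver and serve files from a local folder. The class name LocalDtdResolver and the folder layout are assumptions: the folder is expected to hold xhtml11.dtd plus any module files it references, saved under their original file names.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical resolver: answers requests for w3.org DTD files from a local directory.
public sealed class LocalDtdResolver : XmlUrlResolver
{
    private readonly string _dtdDirectory; // assumed folder containing the saved DTD files

    public LocalDtdResolver(string dtdDirectory)
    {
        _dtdDirectory = dtdDirectory;
    }

    public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
    {
        if (absoluteUri != null && absoluteUri.IsAbsoluteUri && absoluteUri.Host == "www.w3.org")
        {
            // Map e.g. http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd to <folder>\xhtml11.dtd.
            string localPath = Path.Combine(_dtdDirectory, Path.GetFileName(absoluteUri.AbsolutePath));
            if (File.Exists(localPath))
            {
                return File.OpenRead(localPath);
            }
        }

        // Anything not stored locally falls back to the default behaviour.
        return base.GetEntity(absoluteUri, role, ofObjectToReturn);
    }
}
```

With the code from the question, this would replace the null assignment, for example settingsReader.XmlResolver = new LocalDtdResolver(@"C:\dtds"); where the path is only a placeholder.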