Hi, I am using the XML file given below, and I want to parse the HTML inside it.
<Description>
<Fullcontent>
<div id="container" class="cf">
<link rel="stylesheet" href="http://dev2.mercuryminds.com/imageslider/css/demo.css" type="text/css" media="screen" />
<ul class="slides">
<li>Sonam Kapoor<img src="http://deys.jpeg"/></li>
<li>Amithab<img src="http://deysAmithab.jpeg"/></li>
<li>sridevi<img src="http://deyssridevi.jpeg"/></li>
<li>anil-kapoor<img src="http://deysanil-kapoor.jpeg"/></li>
</ul>
</div>
</Fullcontent>
</Description>
I want to bind each image to its name.
You can install HtmlAgilityPack from NuGet (just search for "agility"). Parsing is simple. Here is a way to select the image tags and read their src attributes:
HtmlDocument html = new HtmlDocument();
html.Load(path_to_file);
var urls = html.DocumentNode.SelectNodes("//ul[@class='slides']/li/img")
.Select(node => node.Attributes["src"].Value);
By the way, it looks like selecting attributes directly with XPath is not supported yet.
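Since the question asks for each image paired with its name, the same approach can be extended to read the text of every li together with the src of its img. This is a minimal sketch, assuming HtmlAgilityPack is installed and a trimmed copy of the question's markup is held in a string; the names `html` and `slides` are only illustrative.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Assumption: a shortened copy of the markup from the question.
        string html = @"<ul class=""slides"">
<li>Sonam Kapoor<img src=""http://deys.jpeg""/></li>
<li>Amithab<img src=""http://deysAmithab.jpeg""/></li>
</ul>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null when nothing matches, so guard against that.
        var items = doc.DocumentNode.SelectNodes("//ul[@class='slides']/li");
        if (items == null) return;

        // Pair the text of each <li> with the src of the <img> inside it.
        var slides = items.Select(li => new
        {
            Name = li.InnerText.Trim(),
            Url = li.SelectSingleNode("img")?.GetAttributeValue("src", "")
        });

        foreach (var slide in slides)
            Console.WriteLine(slide.Name + " -> " + slide.Url);
    }
}
```

The resulting sequence of name/URL pairs can then be bound to whatever list control is in use.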
I have a WebBrowser control on a WinForm form, which should show a specific HTML page. Within the HTML page, I have some <template>s that should be used to populate the page. When I use the following code:
string template = webBrowser.Document.GetElementById("template-id").InnerHtml
I get null, instead of the actual inner HTML.
I guess it has to do with the fact that the template tag is not supported in old IE versions, hence in the WebBrowser control. How can I fix it? Is it even possible?
Of course, I could put my template into a string and be done with it, but this seems very hacky.
Thanks!
The WebBrowser control uses Internet Explorer, and Internet Explorer doesn't support the <template> tag. (See the Browser compatibility section.)
If you can modify content of pages
If you can modify the content of the pages, there are some workarounds that let you use the <template> tag. You can use either of the following options:
Option 1 - Make the tag hidden and load IE in edge mode
webBrowser1.DocumentText = @"
<html>
<head>
<title>Test</title>
<meta http-equiv=""X-UA-Compatible"" content=""IE=Edge"" />
</head>
<body>
<h1>Test</h1>
<div>This is a test page.</div>
<template id=""template1"" style=""display:none;"">
<h2>Flower</h2>
<img src=""img_white_flower.jpg"" />
</template>
</body>
</html>";
webBrowser1.DocumentCompleted += (obj, args) =>
{
var element = webBrowser1.Document.GetElementById("template1");
MessageBox.Show(element?.InnerHtml);
};
Option 2 - Include The HTML5 Shiv in the page
webBrowser1.DocumentText = @"
<html>
<head>
<title>Test</title>
<!--[if lt IE 9]>
<script src=""https://cdnjs.cloudflare.com/ajax/libs/html5shiv/3.7.3/html5shiv.min.js"">
</script>
<![endif]-->
</head>
<body>
<h1>Test</h1>
<div>This is a test page.</div>
<template id=""template1"">
<h2>Flower</h2>
<img src=""img_white_flower.jpg"" />
</template>
</body>
</html>";
webBrowser1.DocumentCompleted += (obj, args) =>
{
var element = webBrowser1.Document.GetElementById("template1");
MessageBox.Show(element?.InnerHtml);
};
If you cannot modify content of pages
If you cannot modify content of the page, you need to use a different browser control, for example:
You can use the new WebViewCompatible control for Windows Forms. Simple steps for using it are described in: Replace WebBrowser control by new WebView Compatible control for Windows Forms. It uses the Edge rendering engine on Windows 10, but on other Windows versions it falls back to IE.
You can rely on other browser controls like CefSharp.
I have some HTML code stored into a string variable, resulting from a HttpWebRequest:
<html>
<head>
<div>Lots of scripts and libraries</div>
</head>
<body>
<div>Some very useful data</div>
</body>
<footer>
<div>Not interesting struff</div>
</footer>
<html>
How can I remove all unnecessary nodes and end up with this:
<body>
<div>Some very useful data</div>
</body>
The easiest way is to use HtmlAgilityPack to grab just the body tag.
var document = new HtmlAgilityPack.HtmlDocument();
document.LoadHtml(html);
HtmlNode body = document.DocumentNode.SelectSingleNode("//body");
From there, you can use HtmlAgilityPack to further parse the body node for more detail.
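To get the markup shown in the question, the OuterHtml of the selected node can be read. This is a small sketch, assuming HtmlAgilityPack is installed and the downloaded page is held in a string named `html` (a shortened stand-in is used here). SelectSingleNode returns null when no body element is found, so that case is checked.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Assumption: a shortened stand-in for the page returned by HttpWebRequest.
        string html = "<html><head><div>Lots of scripts and libraries</div></head>" +
                      "<body><div>Some very useful data</div></body></html>";

        var document = new HtmlDocument();
        document.LoadHtml(html);

        HtmlNode body = document.DocumentNode.SelectSingleNode("//body");

        // OuterHtml includes the <body> tag itself; InnerHtml would give only its content.
        string result = body != null ? body.OuterHtml : string.Empty;
        Console.WriteLine(result);
    }
}
```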
HTML
<html>
<head>
<title>Sample Page</title>
</head>
<body>
<form action="demo_form.asp" id="form1" method="get">
First name: <input type="text" name="fname"><br>
Last name: <input type="text" name="lname"><br>
<input type="submit" value="Submit">
</form>
</body>
</html>
Code
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(File.ReadAllText(@"C:\sample.html"));
HtmlNode nd = doc.DocumentNode.SelectSingleNode("//form[@id='form1']");
//nd.InnerHtml is "".
//nd.InnerText is "".
Problem
nd.ChildNodes //Collection (to get all nodes in form) is always null.
nd.SelectNodes("/input") //returns null.
nd.SelectNodes("./input") //returns null.
"//form[@id='form1']/input" //returns null.
What I want is to access the child nodes of the form tag with id=form1 one by one, in order of occurrence. I tried the same XPath in the Chrome developer console and it works exactly the way I wanted. Does HtmlAgilityPack have a problem reading HTML from a file or from the web?
Your html is invalid and may be preventing the html agility pack from working properly.
Try adding a doctype (and an xml namespace) to the start of your document and change your input element's closing tags from > to />
Try adding the following statement before loading the document:
HtmlNode.ElementsFlags.Remove("form");
HtmlAgilityPack's default behaviour adds all the form's inner elements as siblings instead of children. The statement above alters that behaviour so that they (meaning the input tags) appear as child nodes.
Your code would look like this:
HtmlNode.ElementsFlags.Remove("form");
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(File.ReadAllText(@"C:\sample.html"));
HtmlNode nd = doc.DocumentNode.SelectSingleNode("//form[@id='form1']");
etc...
references:
bug issue & fix: http://htmlagilitypack.codeplex.com/workitem/23074
codeplex forum post: http://htmlagilitypack.codeplex.com/discussions/247206
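Once the form flag is removed, the children of the form can be walked in document order. This is a minimal sketch, assuming HtmlAgilityPack is installed; the form markup is inlined here instead of being read from C:\sample.html, and text nodes are filtered out so that only elements are listed.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Must run before the document is loaded, because it changes how <form> is parsed.
        HtmlNode.ElementsFlags.Remove("form");

        var doc = new HtmlDocument();
        doc.LoadHtml(@"<form action=""demo_form.asp"" id=""form1"" method=""get"">
First name: <input type=""text"" name=""fname""><br>
Last name: <input type=""text"" name=""lname""><br>
<input type=""submit"" value=""Submit"">
</form>");

        HtmlNode form = doc.DocumentNode.SelectSingleNode("//form[@id='form1']");
        if (form == null) return;

        // ChildNodes also contains text nodes, so keep only element nodes.
        foreach (HtmlNode child in form.ChildNodes.Where(n => n.NodeType == HtmlNodeType.Element))
        {
            string name = child.GetAttributeValue("name", "");
            Console.WriteLine(child.Name + " name=" + name);
        }
    }
}
```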
I create a web page dynamically and show it with the NavigateToString method of the WebBrowser control. I'd like to reference a style sheet like this:
<link rel=stylesheet type="text/css" href="style.css">
I placed style.css in the application's working directory, but it doesn't work. When I save the generated page to an HTML file, place style.css next to it, and open it in a browser, it displays correctly.
How can I reference static files like that?
You can do this:
Before navigating to the HTML string, change its CSS address:
string html = System.IO.File.ReadAllText(@"Your html file").Replace("BaseAdress", @"location of css file");
Example:
Html File:
<html>
<head>
<link rel=stylesheet type="text/css" href="BaseAdress\style.css">
</head>
<body>
<p>hi</p>
</body>
</html>
CS Code:
string html = System.IO.File.ReadAllText(@"E:\1.html").Replace("BaseAdress", @"E:\");
webBrowser1.NavigateToString(html);
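A variation on the same idea is to build the absolute address of the style sheet in code instead of hard-coding a drive path. The sketch below assumes, as in the question, that style.css sits in the application's working directory. It only builds the HTML string, and the call to the control is left as a comment. It has not been verified against the WebBrowser control itself.

```csharp
using System;
using System.IO;

class Program
{
    static void Main()
    {
        // Assumption: style.css is in the current working directory.
        string cssPath = Path.Combine(Environment.CurrentDirectory, "style.css");

        // Uri converts the local path into an absolute file URI.
        string cssUrl = new Uri(cssPath).AbsoluteUri;

        string html =
            "<html><head>" +
            "<link rel=\"stylesheet\" type=\"text/css\" href=\"" + cssUrl + "\">" +
            "</head><body><p>hi</p></body></html>";

        Console.WriteLine(html);

        // In the application, the string would then be handed to the control:
        // webBrowser1.NavigateToString(html);
    }
}
```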
I am working on an MVC2 site and am having issues getting the objects on my views to pick up their CSS classes.
Here is my helper object and the CSS, which is saved in Site.css and linked in the master page.
This also works fine if I put the CSS in a <style> tag on the master page.
<%= Html.ListBox("ExpenseItems", (System.Web.Mvc.SelectList)ViewData["ExpenseDefinitions"], new { @class = "optionlist" })%>
.optionlist
{
width:100px;
height:100px;
}
Browser HTML:
..
<link href="../Content/Site.css" rel="stylesheet" type="text/css" />
..
<select class="optionlist" id="ExpenseItems" multiple="multiple" name="ExpenseItems">
<option value="1">Test</option>
</select>
Figured it out... The style can't be applied to the list directly.
For some reason, you need to apply the class to a div and then target the control inside it in CSS.
Example:
CSS:
.optionlist select
{
width:100px;
height:100px;
}
<div class="optionlist">
... ListBox
</div>
When you link your CSS file that way and you are browsing a page with a URL like http://yoursite.com/MyPage/Content/Article, the CSS file will of course not be found, because the relative path resolves like this:
The CSS file is linked as `../Content/Site.css`
The page is `/MyPage/Content/Article`
The CSS file is actually placed in `/Content`
When the browser looks for the CSS, it looks in `/MyPage/Content/Site.css`,
which is not where it is.
My suggestion is to add a base URL to your CSS link:
<%
string baseUrl = "http://" + Request.Url.Host + (Request.Url.Port != 80 ? ":" + Request.Url.Port.ToString() : "");
%>
<link href=<%=baseUrl%>/Content/Site.css rel="stylesheet" type="text/css" />
Don't put quotation marks (") around the href value of the link tag.