I have the following XML:
<stories>
<story id="1234">
<title>This is a title</title>
<date>1/1/1980</date>
<article>
<![CDATA[<p>This is an article.</p>]]>
</article>
</story>
</stories>
And the following Linq to XML code in C#:
@{
XDocument xmlDoc = XDocument.Load("foo.xml");
var stories = from story in xmlDoc.Descendants("stories")
.Descendants("story")
.OrderByDescending(s => (string)s.Attribute("id"))
select new
{
title = story.Element("title").Value,
date = story.Element("date").Value,
article = story.Element("article").Value,
};
foreach (var story in stories)
{
<text><div class="news_item">
<span class="title">@story.title</span>
<span class="date">@story.date</span>
<div class="story">@story.article</div>
</div></text>
}
}
The rendered HTML is output to the browser as:
<div class="news_item">
<span class="title">This is a title</span>
<span class="date">1/1/1980</span>
<div class="story">&lt;p&gt;This is an article.&lt;/p&gt;</div>
</div>
I want the <p> tag rendered as HTML to the browser, not encoded. How do I accomplish this?
Razor HTML-encodes values by default. Use the Html.Raw helper to output the markup unencoded (see "Html.Raw() in ASP.NET MVC Razor view"):
<div class="story">@Html.Raw(story.article)</div>
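To see where the encoded output comes from, it can help to look at the string that LINQ to XML hands to the view. This is a minimal sketch, assuming a trimmed-down copy of the XML from the question and using WebUtility.HtmlEncode as a stand-in for Razor's default encoding (Razor uses its own encoder, so this only approximates what the view does):

```csharp
using System;
using System.Net;
using System.Xml.Linq;

// Trimmed-down copy of the question's document (assumption: only the article matters here).
var xml = @"<stories>
  <story id=""1234"">
    <article>
      <![CDATA[<p>This is an article.</p>]]>
    </article>
  </story>
</stories>";

// .Value returns the CDATA content as plain text, markup characters included.
string article = XDocument.Parse(xml)
    .Root.Element("story").Element("article").Value.Trim();

// Stand-in for the encoding step applied to @story.article.
string encoded = WebUtility.HtmlEncode(article);

Console.WriteLine(article);  // the raw markup, which is what Html.Raw would emit
Console.WriteLine(encoded);  // the escaped form the browser shows as literal text
```

The CDATA section is already gone by the time the value reaches the view; only the encoding step stands between the string and the browser.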
Related
I have this HTML code where I want to extract the date from:
<div id="footer">
<div style="font-size:smaller">
Added in:
<strong>
07/06/2021 2:15:36 PM
</strong>
</div>
</div>
This is my C# code using HtmlAgilityPack:
doc.DocumentNode.SelectSingleNode("//div[@id='footer']").InnerText
doc.DocumentNode.SelectSingleNode("//div[@id='footer']/div/strong").InnerText
Update:
Full code:
var html ="<div id=\"footer\"><div style=\"font-size:smaller\"> Added in:<strong> 07/06/2021 2:15:36 PM </strong></div></div>";
var doc = new HtmlDocument();
doc.LoadHtml(html);
var time = doc.DocumentNode.SelectSingleNode("//div[@id='footer']/div/strong").InnerText;
With that I was able to extract the date.
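InnerText returns a string with the surrounding whitespace still attached, so a further step is needed if a DateTime is wanted. A minimal sketch, assuming the value is month/day/year with a 12-hour clock (the source does not say which order the day and month are in, so the format string is a guess to adjust):

```csharp
using System;
using System.Globalization;

// Stand-in for the InnerText of the <strong> node, whitespace included.
string innerText = "\n 07/06/2021 2:15:36 PM \n";

// Assumption: MM/dd/yyyy. Swap to dd/MM/yyyy if the site writes the day first.
DateTime added = DateTime.ParseExact(
    innerText.Trim(),
    "MM/dd/yyyy h:mm:ss tt",
    CultureInfo.InvariantCulture);

Console.WriteLine(added.ToString("yyyy-MM-dd HH:mm:ss", CultureInfo.InvariantCulture));
```

ParseExact throws a FormatException when the text does not match, which makes a wrong guess about the format visible immediately.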
I have the following XPath, fetched using a Firefox XPath plugin:
id('some_id')/x:ul/x:li[4]/x:span
Using Html Agility Pack I'm able to fetch id('some_id')/x:ul/x:li[4]:
htmlDoc.DocumentNode.SelectNodes(@"//div[@id='some_id']/ul/li[4]").FirstOrDefault();
but I don't know how to get the span's value.
update
<div id="some_id">
<ul>
<li></li>
<li></li>
<li></li>
<li>
Some text
<span>text I want to grab</span>
</li>
</ul>
</div>
You don't need to parse the HTML with LINQ2XML. HtmlAgilityPack is made for this, and it makes it easier to get the node, like so:
var html = @" <div id=""some_id"">
<ul>
<li></li>
<li></li>
<li></li>
<li>
Some text
<span>text I want to grab</span>
</li>
</ul>
</div>";
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);
var value = doc.DocumentNode.SelectSingleNode("div[@id='some_id']/ul/li/span").InnerText;
Console.WriteLine(value);
An alternative approach (without html-agility-pack) would be to use LINQ2XML. You can use the XDocument.Descendants method to find the span element and take its value:
var xml = @" <div id=""some_id"">
<ul>
<li></li>
<li></li>
<li></li>
<li>
Some text
<span>text I want to grab</span>
</li>
</ul>
</div>";
var doc = XDocument.Parse(xml);
Console.WriteLine(doc.Root.Descendants("span").FirstOrDefault().Value);
The code can be extended to check if the div element has the matching id, using the XElement.Attribute property:
var doc = XDocument.Parse(xml);
Console.WriteLine(doc.Elements("div").Where (e => e.Attribute("id").Value == "some_id").Descendants("span").FirstOrDefault().Value);
One drawback of this solution is that the XML structure (HTML, XHTML) needs to be properly closed or else the parsing will fail.
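That drawback is easy to reproduce. The sketch below uses a hypothetical fragment whose list items are opened but never closed, which browsers and HtmlAgilityPack tolerate but an XML parser rejects:

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

// Hypothetical fragment: the first two <li> tags are never closed.
var malformed = @"<div id=""some_id""><ul><li><li><li>Some text <span>text I want to grab</span></li></ul></div>";

bool parsed;
try
{
    XDocument.Parse(malformed);
    parsed = true;
}
catch (XmlException ex)
{
    // The closing </ul> does not match the still-open <li>, so parsing stops here.
    parsed = false;
    Console.WriteLine("Not well-formed XML: " + ex.Message);
}

Console.WriteLine(parsed);
```

If the input comes from real-world pages rather than from markup you control, that failure mode is the main reason to prefer an HTML parser.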
I just ran into a head scratcher, I'm not quite sure why this does not work. I want to find all the elements with the attribute "video".
My XML document looks like this:
<MainMenu>
<div id="BroughtInMenu">
<div class="menuItem0">
Menu Item
<div class="subMenu0">
<div class="menuItem1">
Dictation
<div class="subMenu1">
<div class="menuItem2" video="1">Fee Earner</div>
<div class="menuItem2" video="1">Secretary</div>
<div class="menuItem2" video="1">View File History</div>
</div>
</div>
<div class="menuItem1">
PM Advanced Agenda
<div class="subMenu1">
<div class="menuItem2">
Help
<div class="subMenu2">
<div class="menuItem3" video="1">Release Notes</div>
</div>
</div>
<div class="menuItem2">
System Maintenance
<div class="subMenu2">
<div class="menuItem3" video="1">Additional Field Setup</div>
<div class="menuItem3" video="1">Role Permission Maintenance</div>
<div class="menuItem3" video="1">Shared Diary Permissions</div>
</div>
</div>
<div class="menuItem2">
Utilities
<div class="subMenu2">
<div class="menuItem3" video="1">Change Entity Subtype</div>
<div class="menuItem3" video="1">Field Maintenance</div>
<div class="menuItem3" video="1">Move Client and Files to Fee Earner</div>
<div class="menuItem3" video="1">Reallocate Files</div>
</div>
</div>
</div>
</div> . . . . . . . . . . . . . . . . ..
This is very similar to HTML. It is for a website, so in the end I want to get all the elements with the attribute "video".
If I can do this, I will grab only the div elements with the attribute "video" and use them for something else, such as a search where I query the XML document and return the matching div. I hope you catch my drift.
Because the video attribute is going to point to a location, it will be very useful for HTML purposes to jump straight to the video when the div is clicked.
So far I have tried this, but I am not getting any elements at all:
XElement xDoc = XElement.Load(Server.MapPath("automation/xml/mainMenu.xml"));
IEnumerable<XElement> list = from el in xDoc.Elements("div") where el.Attribute("video") != null select el;
foreach (XElement element in list)
{
//Nothing found?
}
I also thought about regex. Maybe a regex could pull out the divs I want, already in text form, so that I can push them straight into an HTML element on the website?
Any help will be greatly appreciated!
Use Descendants instead of Elements. Elements returns only the immediate children.
var xDoc = XElement.Load(Server.MapPath("automation/xml/mainMenu.xml"));
var list = from el in xDoc.Descendants("div")
where el.Attribute("video") != null
select el;
foreach (XElement element in list)
{
//Nothing found?
}
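The difference between the two methods shows up clearly on a cut-down version of the menu. This is a small sketch using an abbreviated, in-memory copy of the XML (an assumption made so the example runs without the file on disk):

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Abbreviated copy of the menu structure from the question.
var menu = XElement.Parse(@"<MainMenu>
  <div id=""BroughtInMenu"">
    <div class=""menuItem0"">
      Menu Item
      <div class=""subMenu0"">
        <div class=""menuItem2"" video=""1"">Fee Earner</div>
        <div class=""menuItem2"" video=""1"">Secretary</div>
      </div>
    </div>
  </div>
</MainMenu>");

// Elements looks only at direct children of MainMenu: the single outer div.
int direct = menu.Elements("div").Count(el => el.Attribute("video") != null);

// Descendants walks the whole subtree, so the nested divs are found.
int nested = menu.Descendants("div").Count(el => el.Attribute("video") != null);

Console.WriteLine($"Elements: {direct}, Descendants: {nested}");
```

The only div directly under MainMenu is the wrapper with id "BroughtInMenu", which has no video attribute, so the original query had nothing to return.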
You can select elements where a particular attribute is present with XPath. To use the XPath extension methods, you need to include the namespace.
using System.Xml.XPath;
An XPath such as "//div[@video]" will include all "div" tags at any level, but filter the selected elements to only those with a "video" attribute, so you're not looping unnecessarily through lots of elements checking for the presence of an attribute.
var xDoc = XElement.Load(Server.MapPath("automation/xml/mainMenu.xml"));
foreach (var divWithVideo in xDoc.XPathSelectElements("//div[@video]")) {
Console.WriteLine (divWithVideo);
}
Here you are only iterating on the elements with a "video" attribute.
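Since the question mentions that the video attribute will eventually point to a location, the same XPath can feed a list of label and location pairs. A sketch under that assumption, with made-up attribute values (the file names are hypothetical, the source only shows video="1"):

```csharp
using System;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

// Abbreviated menu with hypothetical video locations.
var menu = XElement.Parse(@"<MainMenu>
  <div class=""subMenu1"">
    <div class=""menuItem2"" video=""fee-earner.mp4"">Fee Earner</div>
    <div class=""menuItem2"">Help
      <div class=""menuItem3"" video=""release-notes.mp4"">Release Notes</div>
    </div>
  </div>
</MainMenu>");

// Only divs carrying a video attribute are selected, at any depth.
var videos = menu.XPathSelectElements("//div[@video]")
    .Select(div => (Title: div.Value.Trim(), Video: (string)div.Attribute("video")))
    .ToList();

foreach (var (title, video) in videos)
{
    Console.WriteLine($"{title} -> {video}");
}
```

The "Help" div is skipped even though it contains a matching child, because the predicate tests the attribute on each div itself.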
I am trying to extract the first link (/search?id=3) from the HTML below:
<div class="brs_col">
<p>
<a href="/search?id=3">
<b>
vastu shastra
</b>
</a>
</p>
<p>
<a href="/search?id=1">
<b>
bygga
</b>
bastu
</a>
</p>
</div>
I've tried to select it with the following XPath expressions, but can't seem to get any of them to work:
//div[@class='brs_col']//p//a[@href]
//div[@class='brs_col']//p[0]//a[@href]
//div[@class='brs_col']//p//a[0][@href]
Any ideas?
Try this:
var doc = new HtmlDocument();
doc.LoadHtml(@"<div class=""brs_col"">
<p><a href=""/search?id=3""><b>vastu shastra</b></a></p>
<p><a href=""/search?id=1""><b>bygga</b>bastu</a></p>
</div>");
var hrefValue = doc.DocumentNode
.SelectSingleNode("//div[@class='brs_col']/p/a")
.Attributes["href"]
.Value;
You can try this:
doc.DocumentNode.SelectNodes("//a[@href]").FirstOrDefault();
Use this if you are sure it is the first URL in the whole HTML document:
doc.DocumentNode.SelectSingleNode("//a").Attributes["href"].Value;
Or this if you are sure it is the first URL inside the element with class brs_col:
doc.DocumentNode.SelectSingleNode("//div[@class='brs_col']//a").Attributes["href"].Value;
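One detail behind the failed attempts in the question: XPath positions start at 1, so a predicate such as p[0] can never match. The sketch below shows this with System.Xml.XPath instead of HtmlAgilityPack, purely so it runs without an extra package; that swap works here only because this particular fragment is well-formed XML:

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath;

var doc = XDocument.Parse(@"<div class=""brs_col"">
  <p><a href=""/search?id=3""><b>vastu shastra</b></a></p>
  <p><a href=""/search?id=1""><b>bygga</b> bastu</a></p>
</div>");

// Position 0 never matches, so nothing is selected.
var zero = doc.XPathSelectElement("//div[@class='brs_col']/p[0]/a");

// Position 1 is the first <p>.
var first = doc.XPathSelectElement("//div[@class='brs_col']/p[1]/a");

Console.WriteLine(zero == null);
Console.WriteLine((string)first.Attribute("href"));
```

The same 1-based rule applies to the XPath strings passed to SelectSingleNode and SelectNodes.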
I'm downloading the full HTML of a page using WebClient, but I need to extract a specific div from it using a regular expression.
For example:
<body>
<div id="main">
<div id="left" style="float:left">this is a <b>left</b> side:<div style='color:red'> 1 </div>
</div>
<div id="right" style="float:left"> main side</div>
</div>
</body>
If I need the div named 'main', the function should return:
<div id="left" style="float:left">this is a <b>left</b> side:<div style='color:red'> 1 </div>
</div>
<div id="right" style="float:left"> main side</div>
If I need the div named 'left', the function should return:
this is a <b>left</b> side:<div style='color:red'> 1 </div>
If I need the div named 'right', the function should return:
main side
How can I do this?
Why do people insist on trying to use regex to parse html? You can probably do it if you exclude a whole host of edge-cases... but just use HTML Agility Pack and you're done:
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(...); // or Load
string main = doc.DocumentNode.SelectSingleNode("//div[@id='main']").InnerHtml;
(note I'm assuming it is not xhtml; if it is xhtml, use XmlDocument or XDocument, and very similar code to the above)
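string divname = "somename";
// Match the opening tag whose id equals divname, then lazily capture up to the first </div>.
Match m = Regex.Match(htmlContent, "<div[^>]*id=[\"']" + Regex.Escape(divname) + "[\"'][^>]*>(.*?)</div>", RegexOptions.Singleline);
string content = m.Groups[1].Value;
Note that this won't work if there are nested divs inside the desired div.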
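The nested-div limitation can be seen on the 'left' div from the question. In this sketch the regex is a simplified variant of the one above, and XElement stands in for a real HTML parser only because this fragment happens to be well-formed XML (an assumption that does not hold for HTML in general):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

var html = "<div id=\"left\">this is a <b>left</b> side:<div style='color:red'> 1 </div></div>";

// The lazy group stops at the first closing tag, which belongs to the nested div.
var match = Regex.Match(html, "<div[^>]*id=\"left\"[^>]*>(.*?)</div>", RegexOptions.Singleline);
string viaRegex = match.Groups[1].Value;

// A parser tracks nesting, so the inner div comes back complete.
var left = XElement.Parse(html);
string viaParser = string.Concat(
    left.Nodes().Select(n => n.ToString(SaveOptions.DisableFormatting)));

Console.WriteLine(viaRegex);   // cut off before the nested div is closed
Console.WriteLine(viaParser);  // includes the nested div and its closing tag
```

Making the group greedy does not help either, since it would then run to the last closing div in the document.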