Linq to XML - Render CDATA as HTML - c#

I have the following XML:
<stories>
<story id="1234">
<title>This is a title</title>
<date>1/1/1980</date>
<article>
<![CDATA[<p>This is an article.</p>]]>
</article>
</story>
</stories>
And the following Linq to XML code in C#:
@{
XDocument xmlDoc = XDocument.Load("foo.xml");
var stories = from story in xmlDoc.Descendants("stories")
.Descendants("story")
.OrderByDescending(s => (string)s.Attribute("id"))
select new
{
title = story.Element("title").Value,
date = story.Element("date").Value,
article = story.Element("article").Value,
};
foreach (var story in stories)
{
<text><div class="news_item">
<span class="title">@story.title</span>
<span class="date">@story.date</span>
<div class="story">@story.article</div>
</div></text>
}
}
The rendered HTML is output to the browser as:
<div class="news_item">
<span class="title">This is a title</span>
<span class="date">1/1/1980</span>
<div class="story">&lt;p&gt;This is an article.&lt;/p&gt;</div>
</div>
I want the <p> tag rendered as HTML to the browser, not encoded. How do I accomplish this?

Razor HTML-encodes values by default. Use the Html.Raw helper to output the string without encoding (see "Html.Raw() in ASP.NET MVC Razor view"):
<div class="story">@Html.Raw(story.article)</div>
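To see why the markup shows up encoded, here is a minimal console sketch (not part of the original answer). It assumes the same XML shape as the question, inlined as a string, and uses WebUtility.HtmlEncode as a rough stand-in for what Razor's default output encoding does to the value.

```csharp
using System;
using System.Net;
using System.Xml.Linq;

class Program
{
    static void Main()
    {
        // Same shape as the question's XML, inlined so the sketch is self-contained.
        var xml = "<story><article><![CDATA[<p>This is an article.</p>]]></article></story>";

        // XElement.Value returns the CDATA content as an ordinary string.
        var article = XDocument.Parse(xml).Root.Element("article").Value;
        Console.WriteLine(article);

        // Default Razor output encodes that string, roughly like this call does,
        // which is why the browser shows the tag as text.
        Console.WriteLine(WebUtility.HtmlEncode(article));
    }
}
```

Html.Raw skips that encoding step, so it is best reserved for markup that comes from a trusted source.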

Related

HTMLAgilityPack C#, How to extract text from nested Tags in DIV

I have this HTML code where I want to extract the date from:
<div id="footer">
<div style="font-size:smaller">
Added in:
<strong>
07/06/2021 2:15:36 PM
</strong>
</div>
</div>
This is my C# HtmlAgilityPack code:
doc.DocumentNode.SelectSingleNode("//div[@id='footer']").InnerText
doc.DocumentNode.SelectSingleNode("//div[@id='footer']/div/strong").InnerText
Update:
Full code:
var html ="<div id=\"footer\"><div style=\"font-size:smaller\"> Added in:<strong> 07/06/2021 2:15:36 PM </strong></div></div>";
var doc = new HtmlDocument();
doc.LoadHtml(html);
var time = doc.DocumentNode.SelectSingleNode("//div[@id='footer']/div/strong").InnerText;
With this I was able to extract the date.
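As a follow-up sketch, the extracted text can be trimmed and converted to a DateTime. This assumes the HtmlAgilityPack NuGet package is referenced and that the timestamp is in month/day/year order, which the sample value alone does not settle; the format string is an assumption to adjust if the site uses day/month/year.

```csharp
using System;
using System.Globalization;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        var html = "<div id=\"footer\"><div style=\"font-size:smaller\"> Added in:<strong> 07/06/2021 2:15:36 PM </strong></div></div>";
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns null when nothing matches, so check before using it.
        var node = doc.DocumentNode.SelectSingleNode("//div[@id='footer']/div/strong");
        if (node == null)
        {
            Console.WriteLine("Date element not found.");
            return;
        }

        // InnerText keeps the surrounding whitespace from the markup.
        var text = node.InnerText.Trim();

        // Assumption: month/day/year order with a 12-hour clock.
        if (DateTime.TryParseExact(text, "MM/dd/yyyy h:mm:ss tt",
                CultureInfo.InvariantCulture, DateTimeStyles.None, out var added))
        {
            Console.WriteLine(added.ToString("s"));
        }
        else
        {
            Console.WriteLine("Could not parse: " + text);
        }
    }
}
```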

fetching span value from html document

I have the following XPath, obtained with a Firefox XPath plugin:
id('some_id')/x:ul/x:li[4]/x:span
Using HTML Agility Pack I'm able to fetch id('some_id')/x:ul/x:li[4]:
htmlDoc.DocumentNode.SelectNodes(@"//div[@id='some_id']/ul/li[4]").FirstOrDefault();
but I don't know how to get the value of this span.
update
<div id="some_id">
<ul>
<li></li>
<li></li>
<li></li>
<li>
Some text
<span>text I want to grab</span>
</li>
</ul>
</div>
You don't need to parse the HTML with LINQ2XML. HtmlAgilityPack is made for this, and it is easier to obtain the node in the following way:
var html = @" <div id=""some_id"">
<ul>
<li></li>
<li></li>
<li></li>
<li>
Some text
<span>text I want to grab</span>
</li>
</ul>
</div>";
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);
var value = doc.DocumentNode.SelectSingleNode("div[@id='some_id']/ul/li/span").InnerText;
Console.WriteLine(value);
An alternative approach (without html-agility-pack) would be to use LINQ2XML. You can use the XDocument.Descendants method to find the span element and take its value:
var xml = @" <div id=""some_id"">
<ul>
<li></li>
<li></li>
<li></li>
<li>
Some text
<span>text I want to grab</span>
</li>
</ul>
</div>";
var doc = XDocument.Parse(xml);
Console.WriteLine(doc.Root.Descendants("span").FirstOrDefault().Value);
The code can be extended to check that the div element has the matching id, using the XElement.Attribute method:
var doc = XDocument.Parse(xml);
// Casting the attribute to string avoids a NullReferenceException when a div has no id.
Console.WriteLine(doc.Elements("div").Where(e => (string)e.Attribute("id") == "some_id").Descendants("span").FirstOrDefault()?.Value);
One drawback of this solution is that the XML structure (HTML, XHTML) needs to be properly closed or else the parsing will fail.

Cannot find specific XML elements in XML Document

I just ran into a head scratcher, I'm not quite sure why this does not work. I want to find all the elements with the attribute "video".
My XML document looks like this:
<MainMenu>
<div id="BroughtInMenu">
<div class="menuItem0">
Menu Item
<div class="subMenu0">
<div class="menuItem1">
Dictation
<div class="subMenu1">
<div class="menuItem2" video="1">Fee Earner</div>
<div class="menuItem2" video="1">Secretary</div>
<div class="menuItem2" video="1">View File History</div>
</div>
</div>
<div class="menuItem1">
PM Advanced Agenda
<div class="subMenu1">
<div class="menuItem2">
Help
<div class="subMenu2">
<div class="menuItem3" video="1">Release Notes</div>
</div>
</div>
<div class="menuItem2">
System Maintenance
<div class="subMenu2">
<div class="menuItem3" video="1">Additional Field Setup</div>
<div class="menuItem3" video="1">Role Permission Maintenance</div>
<div class="menuItem3" video="1">Shared Diary Permissions</div>
</div>
</div>
<div class="menuItem2">
Utilities
<div class="subMenu2">
<div class="menuItem3" video="1">Change Entity Subtype</div>
<div class="menuItem3" video="1">Field Maintenance</div>
<div class="menuItem3" video="1">Move Client and Files to Fee Earner</div>
<div class="menuItem3" video="1">Reallocate Files</div>
</div>
</div>
</div>
</div> . . . . . . . . . . . . . . . . ..
This is very similar to HTML. It is for a website, so in the end I want to get all the elements with the attribute "video".
If I can do this, I will grab only the div elements with the attribute "video" and use them for something else, such as a search where I query the XML document and return the matching div.
Because the video attribute is going to point to a location, it will be very useful for HTML purposes to jump to the video when the div is clicked.
So far I have tried this, but I am not getting any elements at all:
XElement xDoc = XElement.Load(Server.MapPath("automation/xml/mainMenu.xml"));
IEnumerable<XElement> list = from el in xDoc.Elements("div") where el.Attribute("video") != null select el;
foreach (XElement element in list)
{
//Nothing found?
}
I also thought about regex. Maybe a regex could pull out the divs I want, already in text format, so that I can push them straight into an HTML element on the website?
Any help will be greatly appreciated!
Use Descendants instead of Elements. Elements returns only the immediate children.
var xDoc = XElement.Load(Server.MapPath("automation/xml/mainMenu.xml"));
var list = from el in xDoc.Descendants("div")
where el.Attribute("video") != null
select el;
foreach (XElement element in list)
{
// Each element here is a div with a "video" attribute.
}
You can select elements where a particular attribute is present with XPath. To use the XPath extension methods, you need to include the namespace.
using System.Xml.XPath;
An XPath such as "//div[@video]" will include all "div" tags at any level, but filter the selected elements to only those with a "video" attribute, so you're not looping unnecessarily through lots of elements checking for the presence of an attribute.
var xDoc = XElement.Load(Server.MapPath("automation/xml/mainMenu.xml"));
foreach (var divWithVideo in xDoc.XPathSelectElements ("//div[@video]")) {
Console.WriteLine (divWithVideo);
}
Here you are only iterating on the elements with a "video" attribute.
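The difference between Elements and Descendants can be checked with a small self-contained sketch. The XML here is a shortened stand-in for the menu document in the question, not the real file.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

class Program
{
    static void Main()
    {
        // Shortened stand-in for the question's menu document.
        var root = XElement.Parse(
            "<MainMenu><div id=\"BroughtInMenu\"><div class=\"menuItem0\">" +
            "<div class=\"menuItem1\" video=\"1\">Fee Earner</div>" +
            "<div class=\"menuItem1\" video=\"1\">Secretary</div>" +
            "</div></div></MainMenu>");

        // Elements looks only at direct children of MainMenu: the single outer div,
        // which has no video attribute.
        int direct = root.Elements("div").Count(el => el.Attribute("video") != null);

        // Descendants walks the whole subtree, so the nested divs are found.
        int nested = root.Descendants("div").Count(el => el.Attribute("video") != null);

        Console.WriteLine(direct); // prints 0
        Console.WriteLine(nested); // prints 2
    }
}
```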

c# htmlagilitypack parse first link from div with class?

I am trying to parse the first link (/search?id=3) from the HTML code below:
<div class="brs_col">
<p>
<a href="/search?id=3">
<b>
vastu shastra
</b>
</a>
</p>
<p>
<a href="/search?id=1">
<b>
bygga
</b>
bastu
</a>
</p>
</div>
I've tried to select it with the following XPath expressions, but can't seem to get any of them to work:
//div[@class='brs_col']//p//a[@href]
//div[@class='brs_col']//p[0]//a[@href]
//div[@class='brs_col']//p//a[0][@href]
Any ideas?
Try this:
var doc = new HtmlDocument();
doc.LoadHtml(@"<div class=""brs_col"">
<p><a href=""/search?id=3""><b>vastu shastra</b></a></p>
<p><a href=""/search?id=1""><b>bygga</b>bastu</a></p>
</div>");
var hrefValue = doc.DocumentNode
.SelectSingleNode("//div[@class='brs_col']/p/a")
.Attributes["href"]
.Value;
You can try this:
doc.DocumentNode.SelectNodes("//a[@href]").FirstOrDefault();
Use this if you are sure it is the first URL in the whole HTML document:
doc.DocumentNode.SelectSingleNode("//a").Attributes["href"].Value;
Or this if you are sure it is the first URL inside the element with class brs_col:
doc.DocumentNode.SelectSingleNode("//div[@class='brs_col']//a").Attributes["href"].Value;
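One likely reason the [0] attempts in the question match nothing is that XPath positions start at 1, not 0. The sketch below assumes the HtmlAgilityPack NuGet package is referenced; it selects the link in the first paragraph explicitly and reads the attribute with GetAttributeValue, which returns the supplied default instead of throwing when the attribute is absent.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(@"<div class=""brs_col"">
<p><a href=""/search?id=3""><b>vastu shastra</b></a></p>
<p><a href=""/search?id=1""><b>bygga</b> bastu</a></p>
</div>");

        // p[1] is the first <p> child; p[0] never matches because XPath counts from 1.
        var first = doc.DocumentNode
            .SelectSingleNode("//div[@class='brs_col']/p[1]/a[@href]");

        // SelectSingleNode returns null when nothing matches, hence the null check.
        if (first != null)
        {
            Console.WriteLine(first.GetAttributeValue("href", ""));
        }
    }
}
```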

how to get html div element innertext by id using regular expression in C#

I'm getting the full HTML code using WebClient, but I need to extract a specified div from it using a regular expression.
For example:
<body>
<div id="main">
<div id="left" style="float:left">this is a <b>left</b> side:<div style='color:red'> 1 </div>
</div>
<div id="right" style="float:left"> main side</div>
</div>
</body>
If I need the div named 'main', the function returns:
<div id="left" style="float:left">this is a <b>left</b> side:<div style='color:red'> 1 </div>
</div>
<div id="right" style="float:left"> main side</div>
If I need the div named 'left', the function returns:
this is a <b>left</b> side:<div style='color:red'> 1 </div>
If I need the div named 'right', the function returns:
main side
How can I do this?
Why do people insist on trying to use regex to parse html? You can probably do it if you exclude a whole host of edge-cases... but just use HTML Agility Pack and you're done:
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(...); // or Load
string main = doc.DocumentNode.SelectSingleNode("//div[@id='main']").InnerHtml;
(note I'm assuming it is not xhtml; if it is xhtml, use XmlDocument or XDocument, and very similar code to the above)
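string divname = "somename";
// Requires: using System.Text.RegularExpressions;
// Matches the quoted id and lets "." span line breaks; stops at the first </div>.
Match m = Regex.Match(htmlContent, "<div[^>]*\\bid=[\"']" + Regex.Escape(divname) + "[\"'][^>]*>(.*?)</div>", RegexOptions.Singleline);
string content = m.Groups[1].Value;
This won't work if you have nested divs inside the desired div.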
