How to extract the value of a CDATA section from an XElement? - c#

I have the following XML
<?xml version="1.0"?>
<DisplayViewHtml>
<embeddedHTML><![CDATA[<html><body><div>Hello World</div></body></html>]]></embeddedHTML>
<executive>Madan Mishra</executive>
<imgSRC>/executive/2.jpg</imgSRC>
</DisplayViewHtml>
In my C# code I am trying to extract the value of embeddedHTML without the CDATA wrapper.
My C# code is given below:
XElement displayViewHtml=null;
XmlReader reader = XmlReader.Create(new StringReader(e.Result));
displayViewHtml = XElement.Load(reader);
IEnumerable<XElement> settings = from item in displayViewHtml.Elements() select item;
foreach (XElement setting in settings)
{
switch (setting.Name.ToString())
{
case "embeddedHTML":
counterViewHtml = setting.Value;
break;
case "executive":
executive = setting.Value;
break;
case "imgSRC":
imgSRC = setting.Value;
break;
default:
//log
break;
}
}
With the above code I am able to extract the values of embeddedHTML, executive and imgSRC, but embeddedHTML gives
<![CDATA[<html><body><div>Hello World</div></body></html>]]>
but I want
<html><body><div>Hello World</div></body></html>
Kindly don't suggest using the .Replace method.

As @CamBruce suggested, the problem is that your XML file has encoded characters where it shouldn't. The ideal solution is to fix the program that generates the XML file. If for some reason you still need a workaround, this will do:
.....
case "embeddedHTML":
var element = XElement.Parse("<embeddedHtml>" +
setting.Value +
"</embeddedHtml>");
counterViewHtml = element.Value;
break;
.....
The code above creates a new XElement (the variable element) by parsing the string that has already been unescaped. The parser now sees a real CDATA section, so the Value of the newly created XElement contains the string you want:
<html><body><div>Hello World</div></body></html>

It looks like the CData declaration in the XML is encoded with the rest of the HTML. Ensure that the producer of this XML has the non-encoded CData declaration, like this <![CDATA[ encoded HTML content ]]>
Otherwise, the code you have looks correct. There is nothing special you need to do to read CData with Linq to XML.
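To illustrate the difference the answers describe, here is a minimal sketch (not the asker's actual data flow) that parses the same element twice: once with a real CDATA section and once with the CDATA markers entity-encoded. The inline XML strings are assumptions made up for the demonstration.

```csharp
using System;
using System.Xml.Linq;

// Case 1: a real (unencoded) CDATA section. Value returns only its content.
var good = XElement.Parse(
    "<embeddedHTML><![CDATA[<html><body><div>Hello World</div></body></html>]]></embeddedHTML>");
string fromCData = good.Value;

// Case 2: the producer entity-encoded the CDATA markers, so the parser sees
// ordinary text that merely looks like a CDATA section once it is unescaped.
var encoded = XElement.Parse(
    "<embeddedHTML>&lt;![CDATA[&lt;html&gt;&lt;body&gt;&lt;div&gt;Hello World" +
    "&lt;/div&gt;&lt;/body&gt;&lt;/html&gt;]]&gt;</embeddedHTML>");
string fromEncoded = encoded.Value;

Console.WriteLine(fromCData);   // the bare HTML
Console.WriteLine(fromEncoded); // the HTML still wrapped in <![CDATA[ ... ]]>
```

In the second case the wrapper survives because it was never markup in the first place, which is why re-parsing the value (the workaround above) or fixing the producer is needed.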

Related

How can I check whether the tag I want to read is in the XML file

I read some tags from an online, dynamically generated XML file, but an error occurs if a tag I want to read is not in the file. So I want to check the XML file first: if the tag is present, read it; if it is not, skip it. I am not good at coding in C#.
I use this method for reading the XML file:
var xmldoc = new XmlDocument();
xmldoc.Load("http://yourwebsite.com/weather.xml");
temperature.Text = xmldoc.SelectSingleNode("temp").InnerXml.ToString();
windspeed.Text = xmldoc.SelectSingleNode("wind_spd").InnerXml.ToString();
storm.Text = xmldoc.SelectSingleNode("storm").InnerXml.ToString();
The storm tag is only sometimes present in the XML file. When it is present, I can read it.
But when the storm tag is missing, I get an error and the code doesn't work.
In short, I want to do this:
if(the storm tag is in xml) //check xml file.
{
storm.Text = xmldoc.SelectSingleNode("storm").InnerXml.ToString();
}
else
{
storm.Text = "";
}
Check for the node like this:
var node = xmldoc.SelectSingleNode("storm");
if (node != null)
{
    // Reuse the node already found instead of querying again
    storm.Text = node.InnerXml;
}
else
{
    // The node doesn't exist
    storm.Text = "";
}
You can use null propagation (the ?. operator, available since C# 6).
Something like:
temperature.Text = xmldoc.SelectSingleNode("temp")?.InnerXml;
If xmldoc.SelectSingleNode("temp") returns null, temperature.Text will be set to null as well, without an exception.
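As a minimal sketch of the two answers combined, the null-conditional operator can be paired with ?? to fall back to an empty string. The inline XML, the weather root element and the ReadOrEmpty helper are assumptions for illustration; the real feed may be shaped differently.

```csharp
using System;
using System.Xml;

var xmldoc = new XmlDocument();
// Hypothetical sample feed; note that it has no <storm> element.
xmldoc.LoadXml("<weather><temp>21</temp><wind_spd>12</wind_spd></weather>");

// Hypothetical helper: returns the node's inner XML, or "" when the node is missing.
string ReadOrEmpty(string xpath) => xmldoc.SelectSingleNode(xpath)?.InnerXml ?? "";

string temperature = ReadOrEmpty("/weather/temp");
string storm = ReadOrEmpty("/weather/storm");

Console.WriteLine($"temp='{temperature}', storm='{storm}'");
```

The helper keeps the "missing tag" decision in one place, so each text box assignment stays a single line.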

get xml node value using c#

I have a problem where I need to get the value of a specific node in C#.
I have this sample XML and here is my C# code:
string xml = @"
<ChapterHeader>
<Text> I need to get the text here</Text>
</ChapterHeader>
";
XmlReader rdr = XmlReader.Create(new System.IO.StringReader(xml));
while (rdr.Read())
{
if (rdr.NodeType == XmlNodeType.Element)
{
Console.WriteLine(rdr.LocalName);
if (rdr.LocalName == "ChapterHeader")
{
Console.WriteLine(rdr.Value);
}
}
}
The desired output is
<Text> I need to get the text here</Text>
including the Text node. How can I do that? Thank you.
I also need to loop through a huge XML file,
get the value of a specific node,
and skip some specific nodes.
For example, I have a node; the program must not read that node or its child nodes.
How can I do that?
<ChapterHeader>
<Text> I need to get the text here</Text>
</ChapterHeader>
<Blank>
<Not>
</Not>
</Blank>
The desired output is
<Text> I need to get the text here</Text>
Look at ReadInnerXml, which reads all the content, including markup, as a string.
Console.WriteLine(rdr.ReadInnerXml());
In the follow-up part of your question, you want to deal with larger XML. I prefer LINQ to XML when dealing with larger sets.
The program must not read that Node and its children Node
Yes, it is possible. You could do something like this.
XDocument doc = XDocument.Load("filepath");
var nestedElementValues =
doc.Descendants("ChapterHeader") // flattens hierarchy and look for specific name.
.Elements() // Get elements for found element
.Select(x=>(string)x.Value); // Read the value.
System.Xml.Linq is a newer library designed to replace the awkward reader style.
var document = XDocument.Parse(xml);
var texts = document.Descendants("Text");
foreach (var text in texts)
{
Console.WriteLine(text);
}
You can use the same parsing style you're using (rdr.LocalName == "Text") and then call rdr.ReadOuterXml().
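For the streaming case, a rough sketch of combining ReadInnerXml with XmlReader.Skip might look like the following. Because the sample in the question has two top-level elements, the sketch wraps them in a hypothetical Root element so the text is a well-formed document; that wrapper is an assumption, not part of the original data.

```csharp
using System;
using System.IO;
using System.Xml;

string xml = @"<Root>
  <ChapterHeader><Text> I need to get the text here</Text></ChapterHeader>
  <Blank><Not></Not></Blank>
</Root>";

string captured = "";
using (XmlReader rdr = XmlReader.Create(new StringReader(xml)))
{
    rdr.MoveToContent(); // position on <Root>
    rdr.Read();          // step inside <Root>
    while (!rdr.EOF)
    {
        if (rdr.NodeType == XmlNodeType.Element && rdr.LocalName == "ChapterHeader")
        {
            // Reads the children as markup and moves past </ChapterHeader>
            captured = rdr.ReadInnerXml();
        }
        else if (rdr.NodeType == XmlNodeType.Element && rdr.LocalName == "Blank")
        {
            // Jumps over the element and all of its children without reading them
            rdr.Skip();
        }
        else
        {
            rdr.Read();
        }
    }
}

Console.WriteLine(captured);
```

Both ReadInnerXml and Skip already advance the reader, which is why the loop only calls Read in the final branch; calling Read unconditionally would step over the node that follows.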

Unable to convert special characters on reading XML

I am using the following code to import XML into a dataset:
DataSet dataSet = new DataSet();
dataSet.ReadXml(file.FullName);
if (dataSet.Tables.Count > 0) //not empty XML file
{
da.ClearFieldsForInsert();
DataRow order = dataSet.Tables["Orders"].Rows[0];
da.AddStringForInsert("ProductDescription", order["ProductDescription"].ToString());
}
Special characters such as &apos; are not getting translated to ' as I would have thought they should.
I can convert them myself in code, but would have thought the ReadXml method should do it automatically.
Is there anything I've missed here?
EDIT:
Relevant line of XML file:
<ProductDescription>Grey &apos;Aberdeen&apos; double wardrobe</ProductDescription>
EDIT:
I then tried using XElement:
XDocument doc = XDocument.Load(file.FullName);
XElement order = doc.Root.Elements("Orders").FirstOrDefault();
...
if (order != null)
{
da.ClearFieldsForInsert();
IEnumerable<XElement> items = doc.Root.Elements("Orders");
foreach (XElement item in items)
{
da.ClearFieldsForInsert();
da.AddStringForInsert("ProductDescription", item.Element("ProductDescription").Value);
}
Still not getting converted!
As stated here, &apos; is a valid XML escape code.
However, it is not necessary to escape ' in element values.
<ProductDescription>Grey 'Aberdeen' double wardrobe</ProductDescription>
is valid XML.
Workaround aside, a standards compliant XML parser should honour the predefined entities, wherever they occur (except in CDATA.)
This frailty of DataSet.ReadXml, and its deviation from standard XML parsing, is noted in the documentation. I quote:
The DataSet itself only escapes illegal
XML characters in XML element names and hence can only consume the
same. When legal characters in XML element name are escaped, the
element is ignored while processing.
Due to its limitations, I wouldn't use DataSet.ReadXml for XML parsing. Instead you could use XDocument, something like this:
using System.Xml.Linq;
...
var doc = XDocument.Load(file.FullName);
var order = doc.Root.Elements("Orders").FirstOrDefault();
if (order != null)
{
da.ClearFieldsForInsert();
var productDescription = order.Element("ProductDescription");
da.AddStringForInsert(
"ProductDescription",
productDescription.Value);
}
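As a small self-contained check of the XDocument approach, the sketch below parses an inline document instead of a file. The Root wrapper element and the inline XML are assumptions for illustration; only the Orders and ProductDescription names come from the question.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical document shaped like the one in the question.
var doc = XDocument.Parse(
    "<Root><Orders><ProductDescription>Grey &apos;Aberdeen&apos; double wardrobe" +
    "</ProductDescription></Orders></Root>");

var order = doc.Root.Elements("Orders").FirstOrDefault();

// The explicit string conversion returns null for a missing element,
// so ?? can supply a default instead of throwing.
string description = (string)order?.Element("ProductDescription") ?? "";

Console.WriteLine(description);
```

The parser resolves the predefined entity while loading, so the value already contains plain apostrophes and no manual replacement is needed.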

Reading XML only contents from a Log file

I have a log file which stores data in XML format. I want to read this data, but the problem is that the log file is not a well-formed XML file. It contains some additional data, like:
03/22/2013 : 13:23:32 <?xml version="1.0" encoding="UTF-8"?>
<element1>
...
...
...
</element1>
As you will notice, the leading 03/22/2013 : 13:23:32 prevents me from reading the data; an exception is thrown saying "Data at the root level is invalid".
I am using the following method to read the XML:
XmlDocument doc = new XmlDocument();
doc.Load("file.log");
string xmlcontents = doc.InnerXml;
label1.Text = xmlcontents;
Please guide me to a solution, as this is a rare case for me. I tried googling for a solution but couldn't find one.
Thanks
A quick hack would be to parse the log file and extract only the text found between the root XML tags; in your case, what is found between <element1> and </element1>.
You can search for the <?xml tag; what comes after the closing ?> is your root tag, and you can go from there. Depending on how predictable or clearly structured the log file is, you can formulate better ways of doing this, but if nothing else works, you can try this way.
var doc = new XmlDocument();
doc.LoadXml(string.Concat(File.ReadAllLines("file.log").Skip(1)));
Reorganize your XML so that the date becomes an element or attribute, if the file is too large.
Maybe you should read the whole file into an array of lines (System.IO.File.ReadAllLines(string path)) and then join the elements of the array, skipping the first line and any other lines that are not fragments of the XML structure (assuming your sample is only part of the input file).
You can skip the first line:
var onlyXml = (File.ReadAllLines("file.log")).Skip(1).SelectMany(l => l).ToArray();
var xmlContent = new String(onlyXml);
XmlDocument doc = new XmlDocument();
doc.LoadXml(xmlContent);
string xmlcontents = doc.InnerXml;
label1.Text = xmlcontents;
EDIT
You can take only the XML between the first '<' and the last '>':
var text = File.ReadAllText("file.log");
var beginIndex = text.IndexOf('<');
var endIndex = text.LastIndexOf('>');
var onlyXml = text.Substring(beginIndex, endIndex - beginIndex + 1);
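A minimal sketch of the substring approach end to end, using an in-memory string in place of File.ReadAllText("file.log"). The log text and the message element are assumptions made up for the example, and the sketch assumes the file holds a single XML document.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical log content: a timestamp prefix followed by one XML document.
string text =
    "03/22/2013 : 13:23:32 <?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" +
    "<element1>\n  <message>started</message>\n</element1>\n";

int beginIndex = text.IndexOf('<');
int endIndex = text.LastIndexOf('>');
if (beginIndex < 0 || endIndex < beginIndex)
{
    throw new FormatException("No XML found in the log text.");
}

// Everything before the first '<' (the timestamp) is dropped.
string onlyXml = text.Substring(beginIndex, endIndex - beginIndex + 1);

XDocument doc = XDocument.Parse(onlyXml);
Console.WriteLine(doc.Root.Name.LocalName);
```

If the log contains several timestamped documents one after another, this would hand the parser more than one root element, so the text would need to be split per entry first.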

XML problem - HTML within a node is being removed (ASP.NET C# LINQ to XML)

When I load this XML node, the HTML within the node is being completely stripped out.
This is the code I use to get the value within the node, which is text combined with HTML:
var stuff = innerXml.Descendants("root").Elements("details").FirstOrDefault().Value;
Inside the "details" node is text that looks like this:
"This is <strong>test copy</strong>. This is A Link"
When I look in "stuff" var I see this:
"This is test copy. This is A Link". There is no HTML in the output... it is pulled out.
Maybe Value should be innerXml or innerHtml? Does FirstOrDefault() have anything to do with this?
I don't think the xml needs a "cdata" block...
Here is a more complete code snippet:
announcements =
from link in xdoc.Descendants(textContainer).Elements(textElement)
where link.Parent.Attribute("id").Value == Announcement.NodeId
select new AnnouncmentXml
{
NodeId = link.Attribute("id").Value,
InnerXml = link.Value
};
XDocument innerXml;
innerXml = XDocument.Parse(item.InnerXml);
var @abstract = innerXml.Descendants("root").Elements("abstract").FirstOrDefault().Value; // abstract is a C# keyword, hence the @ prefix
Finally, here is a snippet of the XML node. Notice how there is "InnerXml" within the standard XML structure. It starts with <root>. I call this the "InnerXml", and this is what I am passing into the XDocument called innerXml:
<text id="T_403080"><root> <title>How do I do stuff?</title> <details> Look Here Some Form. Please note that lorem ipsum dlor sit amet.</details> </root></text>
[UPDATE]
I tried to use this helper lambda, and it returns the HTML, but escaped, so when it displays on the page I see the actual HTML in the view (instead of rendering a link, the tag is printed to the screen):
Title = innerXml.Descendants("root").Elements("title").FirstOrDefault().Nodes().Aggregate(new System.Text.StringBuilder(), (sb, node) => sb.Append(node.ToString()), sb => sb.ToString());
So I tried both HtmlEncode and HtmlDecode, but neither helped. One showed the escaped characters on the screen and the other did nothing:
Title =
System.Web.HttpContext.Current.Server.HtmlDecode(
innerXml.Descendants("root").Elements("details").Nodes().Aggregate(new System.Text.StringBuilder(), (sb, node) => sb.Append(node.ToString()), sb => sb.ToString())
);
I ended up using an XmlDocument instead of an XDocument. It doesn't seem like LINQ to XML is mature enough to support what I am trying to do. There is no InnerXml property on an XDocument, only Value.
Maybe someday I will be able to revert to LINQ. For now, I just had to get this off my plate. Here is my solution:
// XmlDoc to hold custom Xml within each node
XmlDocument innerXml = new XmlDocument();
try
{
// Parse inner xml of each item and create objects
foreach (var faq in faqs)
{
innerXml.LoadXml(faq.InnerXml);
FAQ oFaq = new FAQ();
#region Fields
// Get Title value if node exists and is not null
if (innerXml.SelectSingleNode("root/title") != null)
{
oFaq.Title = innerXml.SelectSingleNode("root/title").InnerXml;
}
// Get Details value if node exists and is not null
if (innerXml.SelectSingleNode("root/details") != null)
{
oFaq.Description = innerXml.SelectSingleNode("root/details").InnerXml;
}
#endregion
result.Add(oFaq);
}
}
catch (Exception ex)
{
// Handle Exception
}
I do think wrapping your details node in a CDATA block is the right decision. CDATA basically indicates that the information contained within it should be treated as text and not parsed for XML special characters. The HTML characters in the details node, especially < and >, are treated as markup by the XML parser, so content that is meant as text should really be marked as text.
You might be able to hack around this by grabbing the InnerXml, but if you have control over the document content, CDATA is the correct decision.
In case you need an example of how that should look, here's a modified version of the details node:
<details>
<![CDATA[
This is <strong>test copy</strong>. This is A Link
]]>
</details>
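For comparison, here is a small sketch of how the three options discussed in this thread behave in LINQ to XML: Value on nested markup, concatenating the child nodes (the same idea as the Aggregate helper in the question), and Value on a CDATA-wrapped node. The inline XML strings are assumptions for illustration.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Nested markup: the <strong> tag is a child element, not text.
var details = XElement.Parse("<details>This is <strong>test copy</strong>. Done</details>");

// Value concatenates only the text nodes, so the tags disappear.
string valueOnly = details.Value;

// Concatenating the child nodes keeps the markup, similar to XmlNode.InnerXml.
string innerXml = string.Concat(details.Nodes());

// With CDATA the whole content is a single text node, so Value keeps the tags.
var wrapped = XElement.Parse(
    "<details><![CDATA[This is <strong>test copy</strong>. Done]]></details>");
string fromCData = wrapped.Value;

Console.WriteLine(valueOnly);
Console.WriteLine(innerXml);
Console.WriteLine(fromCData);
```

Which option fits depends on who controls the document: CDATA needs a change on the producer side, while concatenating the nodes works on the consumer side only.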
