How to parse XML with nested XML text - C#

I'm trying to read an XML file that contains a nested XML object with its own XML declaration. As expected, I get an exception:
Unexpected XML declaration. The XML declaration must be the first node in the document, and no white space characters are allowed to appear before it.
How can I read that specific element as text and parse it as a separate XML document for later deserialization?
<?xml version="1.0" encoding="UTF-8"?>
<Data>
<Items>
<Item>
<Target type="System.String">Some target</Target>
<Content type="System.String"><?xml version="1.0" encoding="utf-8"?><Data><Items><Item><surname type="System.String">Some Surname</surname><name type="System.String">Some Name</name></Item></Items></Data></Content>
</Item>
</Items>
</Data>
Every approach I try fails because of the declaration exception.
var xml = System.IO.File.ReadAllText("Info.xml");
var xDoc = XDocument.Parse(xml); // Exception
var xmlDoc = new XmlDocument();
xmlDoc.LoadXml(xml); // Exception
var xmlReader = XmlReader.Create(new StringReader(xml));
xmlReader.ReadToFollowing("Content"); // Exception
I have no control over XML creation.

The only way I know of is to get rid of the illegal second <?xml> declaration. I wrote a sample that simply looks for the second <?xml> and discards it. After that, the string is valid XML and can be parsed. You may need to tweak it a bit for your exact scenario.
Code:
using System;
using System.Xml;
public class Program
{
public static void Main()
{
var badXML = @"<?xml version=""1.0"" encoding=""UTF-8""?>
<Data>
<Items>
<Item>
<Target type=""System.String"">Some target</Target>
<Content type=""System.String""><?xml version=""1.0"" encoding=""utf-8""?><Data><Items><Item><surname type=""System.String"">Some Surname</surname><name type=""System.String"">Some Name</name></Item></Items></Data></Content>
</Item>
</Items>
</Data>";
var goodXML = badXML.Replace(@"<Content type=""System.String""><?xml version=""1.0"" encoding=""utf-8""?>"
, @"<Content type=""System.String"">");
var xmlDoc = new XmlDocument();
xmlDoc.LoadXml(goodXML);
XmlNodeList itemRefList = xmlDoc.GetElementsByTagName("Content");
foreach (XmlNode xn in itemRefList)
{
Console.WriteLine(xn.InnerXml);
}
}
}
Output:
<Data><Items><Item><surname type="System.String">Some Surname</surname><name type="System.String">Some Name</name></Item></Items></Data>
Working DotNetFiddle: https://dotnetfiddle.net/ShmZCy
Perhaps needless to say: none of this would have been needed if whatever created this invalid XML had followed the common practice of wrapping the nested XML in a <![CDATA[ .... ]]> block.
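To get from the cleaned-up string to a separate document, as the question asks, something like the following might work. It is only a sketch: the names NestedXml, StripNestedDeclarations and ExtractInner are made up for illustration, and it assumes that every nested declaration has the form <?xml ...?> and that the Content element holds exactly one nested root element.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class NestedXml
{
    // Drops every <?xml ...?> declaration except one at the very start of the string.
    public static string StripNestedDeclarations(string xml)
    {
        return Regex.Replace(
            xml,
            @"<\?xml\s[^?]*\?>",
            m => m.Index == 0 ? m.Value : string.Empty);
    }

    // Parses the outer document and returns the XML nested in the first
    // <Content> element as a document of its own.
    public static XDocument ExtractInner(string xml)
    {
        XDocument outer = XDocument.Parse(StripNestedDeclarations(xml));
        XElement content = outer.Descendants("Content").First();
        // Passing an element that already has a parent makes the new document copy it.
        return new XDocument(content.Elements().First());
    }

    public static void Main()
    {
        // "Info.xml" is the file name used in the question.
        string xml = System.IO.File.ReadAllText("Info.xml");
        XDocument inner = ExtractInner(xml);
        Console.WriteLine(inner.ToString(SaveOptions.DisableFormatting));
    }
}
```

Compared with replacing one exact string, the regex tolerates differences in attribute order or casing of the encoding value, but it is still text preprocessing and would also strip a declaration-like string that appears inside real character data.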

The <?xml ...?> XML declaration is only valid at the very start of an XML document, so the XML that you've been given isn't well-formed. This makes it quite difficult to parse as is without either changing the source document (and you've indicated that's not possible) or preprocessing the source.
You could try:
Stripping out the <?xml ?> instruction with regex or string manipulation, but the cure there may be worse than the disease.
The HTMLAgilityPack, which implements a more forgiving parser, may work with an XML document
Other than that, the producer of the document should look to produce well-formed XML:
CDATA sections can help this, but be aware that CDATA can't contain the ]]> end tag.
XML escaping the XML text can work fine; that is, use the standard routines to turn < into &lt; and so forth.
XML namespaces can also help here, but they can be daunting in the beginning.
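For the producer side, the escaping and CDATA options might look roughly like this with LINQ to XML. This is a sketch under assumptions: the class and method names are invented, the element name Content is borrowed from the question, and the CDATA variant assumes the nested text does not contain ]]>.

```csharp
using System;
using System.Xml.Linq;

public static class NestedXmlProducer
{
    // A string child becomes a text node; '<' and '&' are escaped when serialized.
    public static string WrapEscaped(string innerXml)
    {
        return new XElement("Content", innerXml).ToString();
    }

    // Same content, but carried in a CDATA section instead of escaped text.
    // Assumes innerXml does not contain "]]>".
    public static string WrapCData(string innerXml)
    {
        return new XElement("Content", new XCData(innerXml)).ToString();
    }

    // Value returns the original, unescaped text for both variants,
    // which can then be parsed as a document of its own.
    public static XDocument ReadBack(string contentElementXml)
    {
        return XDocument.Parse(XElement.Parse(contentElementXml).Value);
    }

    public static void Main()
    {
        string inner = "<?xml version=\"1.0\" encoding=\"utf-8\"?><Data><name>Some Name</name></Data>";
        Console.WriteLine(WrapEscaped(inner));
        Console.WriteLine(WrapCData(inner));
        Console.WriteLine(ReadBack(WrapEscaped(inner)).Root.Name);
    }
}
```

Either way the consumer sees plain text inside Content, so the nested declaration never reaches the outer parser as markup.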

Related

C# Skip anything to next tag

I have a log file in xml format like
<log> // skip this node
<?xml version="1.0" encoding="UTF-8"?>
<qbean logger="main-logger">
</qbean>
</log>
<log> // go to this node
</log>
Now ReadToNextSibling("log") throws an exception, and I need to skip the content of the first "log" tag and move to the next "log" tag without an exception being thrown.
Is there a way?
Hint:
Your XML is invalid since the <?xml version="1.0" encoding="UTF-8"?> has to be before the root element. You can search for it and remove it if that fixes your problem, for example with yourXml.Replace("<?xml version=\"1.0\" encoding=\"UTF-8\"?>", "").
You have to create a root element for your XML to be valid for parsing.
Then, you can use the XmlDocument class to parse the XML data that you have and skip anything you want. You would need something like this:
var document = new XmlDocument();
document.LoadXml(yourXml);
var secondLog = document.DocumentElement.ChildNodes[1]; // the second <log> node
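Put together, the three hints could be sketched as follows. The wrapper element name logs and the class name LogSkipper are invented for this example, and the Replace call assumes the stray declaration is always spelled exactly as in the question.

```csharp
using System;
using System.Xml;

public static class LogSkipper
{
    // Returns all <log> elements of a log that has no root element
    // and contains misplaced XML declarations.
    public static XmlNodeList LoadLogs(string rawLog)
    {
        // Hint 1: remove the misplaced declaration(s).
        string cleaned = rawLog.Replace("<?xml version=\"1.0\" encoding=\"UTF-8\"?>", "");

        // Hint 2: add a root element ("logs" is a made-up name).
        var document = new XmlDocument();
        document.LoadXml("<logs>" + cleaned + "</logs>");

        // Hint 3: pick whichever nodes are needed.
        return document.DocumentElement.SelectNodes("log");
    }

    public static void Main()
    {
        string raw =
            "<log>\n<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<qbean logger=\"main-logger\">\n</qbean>\n</log>\n" +
            "<log>\n<entry>second</entry>\n</log>";
        XmlNodeList logs = LoadLogs(raw);
        Console.WriteLine(logs[1].OuterXml);
    }
}
```

Selecting by element name rather than by ChildNodes index avoids depending on whether whitespace nodes are preserved.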

Malformed xml not parsing in XDocument.Parse

Please help. I have an XML document that I want to parse with XDocument, but the XML string contains several empty xmlns attributes (xmlns=""). The moment I remove them it parses, but I receive this from a web service. I tried replacing the attribute, but whichever way I try, only one " is replaced and I am left with an invalid XML string such as <test xmlns=">. I tried regex, I tried the Replace function, I tried every way I know, and now I am stuck.
Any Suggestions?
string xmlString = @"
<UserFile xmlns=""http://temuri.org"">
<user>
<UserName>Daniel</UserName>
<UserSurname>Vrey</UserSurname>
<Toys xmlns="">
<TToy>Toyota</TToy>
<TToy>Ford</TToy>
</Toys>
</user>
</UserFile>";
XDocument d = XDocument.Parse(xmlString);

Process namespaces using XmlReader

I have a complex XML file with structure as follows:
<?xml version="1.0" encoding="UTF-8"?>
<Document xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="xxx:xxx:xxx:xxx:xxxxx:xxx:xsd:xxxx.xxx.xxx.xx">
<Element1>
<Element2>
<Element2A>xxxxxx</Element2A>
<Element2B>2012-08-29T00:00:00</Element2B>
</Element2>
</Element1>
</Document>
Now I am using XmlReader to read this XML document and process information as follows
XmlReader xr = XmlReader.Create(filename);
while (xr.Read())
{
xr.MoveToElement();
XElement node = (XElement)XElement.ReadFrom(xr);
Console.WriteLine(node.Name);
}
xr.Close();
The problem I am facing is in the output the namespace is prefixed to the ElementName. E.g output
{xxx:xxx:xxx:xxx:xxxxx:xxx:xsd:xxxx.xxx.xxx.xx}Element1
Is there any way I can remove/ handle this as I need to do further filtering using Element names and Child names.
XElement.Name is not (as you might expect) a String, but rather an XName which has a LocalName property, thus:
Console.WriteLine(node.Name.LocalName);
You may want to remove the namespace. One way is to write C# code; another is to use an XSLT transformation, as suggested in Remove Namespace
-Milind
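For the filtering the question mentions, two approaches might be sketched like this: compare on LocalName and ignore the namespace, or build the qualified name with XNamespace. The class and method names are illustrative only, and urn:example:doc in Main stands in for the redacted namespace.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class NamespaceDemo
{
    // Filters on the local part of the name, ignoring the namespace entirely.
    public static string FindByLocalName(XDocument doc, string localName)
    {
        return doc.Descendants()
                  .Where(e => e.Name.LocalName == localName)
                  .Select(e => e.Value)
                  .FirstOrDefault();
    }

    // Namespace-aware lookup: combines the root's default namespace with the local name.
    public static string FindByQualifiedName(XDocument doc, string localName)
    {
        XNamespace ns = doc.Root.GetDefaultNamespace();
        return doc.Descendants(ns + localName)
                  .Select(e => e.Value)
                  .FirstOrDefault();
    }

    public static void Main()
    {
        // "urn:example:doc" is a placeholder namespace for this example.
        XDocument doc = XDocument.Parse(
            "<Document xmlns=\"urn:example:doc\"><Element1><Element2>" +
            "<Element2A>xxxxxx</Element2A></Element2></Element1></Document>");
        Console.WriteLine(FindByLocalName(doc, "Element2A"));
        Console.WriteLine(FindByQualifiedName(doc, "Element2A"));
    }
}
```

The qualified lookup is stricter, since it will not match an element with the same local name from a different namespace.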

How to deserialize this type of XML file?

<row>
<id>1</id>
<code></code>
<name></name>
<address></address>
<state></state>
<zone>?</zone>
</row>
<row>
<id>2</id>
<code>AA</code>
<name>Ataria</name>
<address>Sitapur National Highway 24, Uttar Pradesh</address>
<state>Uttar Pradesh</state>
<zone>NER</zone>
</row>
I have no root element in this XML file; the file starts and ends with row elements.
How do I deserialize this type of data in C#?
If you are sure that the missing root is the only issue with your XML, just add it manually:
string fileContent = File.ReadAllText(path);
string rawXml = "<root>" + fileContent + "</root>";
// now you can use LINQ-to-XML or whatever
XDocument xdoc = XDocument.Parse(rawXml); // Parse takes XML text; Load expects a path or URI
You can also load an XML Fragment directly, via
XmlReaderSettings settings = new XmlReaderSettings();
settings.ConformanceLevel = ConformanceLevel.Fragment;
using (XmlReader reader = XmlReader.Create("tracelog.xml", settings))
{
while (reader.Read())
{
// Process each node of the fragment,
// possibly using reader.ReadSubtree()
}
}
You would create XElements by passing the result of reader.ReadSubtree() to XElement.Load(...).
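A fuller version of the fragment approach might look like the sketch below. It swaps in XNode.ReadFrom instead of ReadSubtree plus XElement.Load, because ReadFrom also advances the reader past the element. The names FragmentReader and ReadRows are invented here, and the element name row is taken from the question.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class FragmentReader
{
    // Collects every top-level <row> element from an XML fragment with no root element.
    public static List<XElement> ReadRows(TextReader input)
    {
        var rows = new List<XElement>();
        var settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };
        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            reader.Read();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "row")
                {
                    // ReadFrom consumes the whole element and leaves the reader
                    // on the node that follows it, so no extra Read() is needed.
                    rows.Add((XElement)XNode.ReadFrom(reader));
                }
                else
                {
                    reader.Read(); // skip whitespace and anything else between rows
                }
            }
        }
        return rows;
    }

    public static void Main()
    {
        // "tracelog.xml" is the file name used in the snippet above.
        using (StreamReader file = File.OpenText("tracelog.xml"))
        {
            foreach (XElement row in ReadRows(file))
            {
                Console.WriteLine((string)row.Element("id") + ": " + (string)row.Element("name"));
            }
        }
    }
}
```

Each XElement can then be handed to whatever deserialization step follows, for example an XmlSerializer reading from row.CreateReader().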
Well to start with, it's not an XML file - or at least, it doesn't represent an XML document.
One option would be to copy the file into a new file which does have document start/end tags... then you can load it as a normal document. Just create a file, write a document start tag, copy the contents of this file, then write a document end tag, and close the file handle. You could even do this in memory.
Alternatively, it may be possible to read it as it is, in fragments - possibly via XmlReader. I can't say it's something I've done, and I'd generally encourage you to create a full XML file instead, as then you'll be on more familiar territory.
It's not an XML file if it doesn't have a root; the parser will throw an error if you try to parse it. You can do it this way:
<?xml version="1.0"?>
<Root>
--- add your file content here
</Root>
then give this file path to the parser.

How to flatten XML to one line in C# code?

How can I flatten XML to one line in C# code?
Before:
<CATALOG>
<CD>
<TITLE>Empire Burlesque</TITLE>
<ARTIST>Bob Dylan</ARTIST>
<COUNTRY>USA</COUNTRY>
<COMPANY>Columbia</COMPANY>
<PRICE>10.90</PRICE>
<YEAR>1985</YEAR>
</CD>
</CATALOG>
After:
<CATALOG><CD><TITLE>Empire Burlesque</TITLE><ARTIST>Bob Dylan</ARTIST><COUNTRY>USA</COUNTRY>....
Assuming you're able to use LINQ to XML, and the XML is currently in a file:
XDocument document = XDocument.Load("test.xml");
document.Save("test2.xml", SaveOptions.DisableFormatting);
If you can't use LINQ to XML, you can:
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.LoadXml("Xml as string"); // or xmlDoc.Load(filepath)
string oneLine = xmlDoc.InnerXml; // this should return a one-liner
If you have the XML in a string:
xml.Replace("\n", "").Replace("\r", "")
I know that this is an old question, but it helped me to find XDocument.ToString()
XDocument doc = XDocument.Load("file.xml");
// Flat one line XML
string s = doc.ToString(SaveOptions.DisableFormatting);
Check the SaveOptions documentation
Depends what you are working with and what output you need...
Jon Skeet's answer works if reading and writing to files.
Aleksej Vasinov's answer works if you want xml without the xml declaration.
If you simply want the XML in a string, but want the entire structure of the XML, including the declaration, i.e.
<?xml version="1.0" encoding="utf-16"?> <-- this is the declaration
<TheRestOfTheXml />
.. use a StringWriter...
XDocument doc = GetTheXml(); // op's xml
var wr = new StringWriter();
doc.Save(wr, SaveOptions.DisableFormatting);
var s = wr.GetStringBuilder().ToString();
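The two string-based variants above can be compared side by side in a small sketch. The class and method names are made up for this example. It relies on XDocument.Parse discarding insignificant whitespace by default, which is what lets DisableFormatting produce a single line.

```csharp
using System;
using System.IO;
using System.Xml.Linq;

public static class XmlFlattener
{
    // One line, without an XML declaration.
    public static string Flatten(string xml)
    {
        // Parse drops insignificant whitespace unless LoadOptions.PreserveWhitespace is passed.
        return XDocument.Parse(xml).ToString(SaveOptions.DisableFormatting);
    }

    // One line, with an XML declaration written by Save.
    public static string FlattenWithDeclaration(string xml)
    {
        var writer = new StringWriter();
        XDocument.Parse(xml).Save(writer, SaveOptions.DisableFormatting);
        return writer.ToString();
    }

    public static void Main()
    {
        string xml = "<CATALOG>\n  <CD>\n    <TITLE>Empire Burlesque</TITLE>\n  </CD>\n</CATALOG>";
        Console.WriteLine(Flatten(xml));
        Console.WriteLine(FlattenWithDeclaration(xml));
    }
}
```

Unlike xml.Replace("\n", "").Replace("\r", ""), this also removes the indentation between elements, and it leaves line breaks inside text content untouched.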
