How to remove the header from an XML document - c#

I'm not exactly sure if it's called a header, but I'm using a serializer in C# to generate XML tags.
How do I remove the xmlns attributes that are stuck on my first tag? For example, currently I have:
<FirstTag xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"> testData
</FirstTag>
And this is the desired output:
<FirstTag> testData </FirstTag>
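One common way to get that output, assuming the serializer in question is System.Xml.Serialization.XmlSerializer, is to pass an XmlSerializerNamespaces instance that contains a single empty prefix/namespace pair. The sketch below is only an illustration: the FirstTag class is a hypothetical stand-in for the real type, and OmitXmlDeclaration is an optional extra that also drops the <?xml ...?> line.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical type standing in for whatever class is actually being serialized.
[XmlRoot("FirstTag")]
public class FirstTag
{
    [XmlText]
    public string Value { get; set; }
}

public static class SerializeWithoutNamespaces
{
    public static string Serialize(FirstTag item)
    {
        var serializer = new XmlSerializer(typeof(FirstTag));

        // An empty prefix mapped to an empty namespace replaces the default xsi/xsd declarations.
        var namespaces = new XmlSerializerNamespaces();
        namespaces.Add("", "");

        // Optional: also leave out the leading <?xml ...?> declaration.
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };

        using (var text = new StringWriter())
        using (var writer = XmlWriter.Create(text, settings))
        {
            serializer.Serialize(writer, item, namespaces);
            writer.Flush(); // make sure everything has reached the StringWriter
            return text.ToString();
        }
    }

    public static void Main()
    {
        Console.WriteLine(Serialize(new FirstTag { Value = " testData " }));
    }
}
```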

Related

How to parse a XML with nested XML text

I'm trying to read an XML file that contains a nested XML document with its own XML declaration. As expected, I get an exception:
Unexpected XML declaration. The XML declaration must be the first node in the document, and no white space characters are allowed to appear before it.
How can I read that specific element as text and parse it as a separate XML document for later deserialization?
<?xml version="1.0" encoding="UTF-8"?>
<Data>
<Items>
<Item>
<Target type="System.String">Some target</Target>
<Content type="System.String"><?xml version="1.0" encoding="utf-8"?><Data><Items><Item><surname type="System.String">Some Surname</surname><name type="System.String">Some Name</name></Item></Items></Data></Content>
</Item>
</Items>
</Data>
Every approach I try fails with the declaration exception.
var xml = System.IO.File.ReadAllText("Info.xml");
var xDoc = XDocument.Parse(xml); // Exception
var xmlDoc = new XmlDocument();
xmlDoc.LoadXml(xml); // Exception
var xmlReader = XmlReader.Create(new StringReader(xml));
xmlReader.ReadToFollowing("Content"); // Exception
I have no control over XML creation.
The only way I would know is by getting rid of the illegal second <?xml> declaration. I wrote a sample that will simply look for and discard the second <?xml>. After that the string has become valid XML and can be parsed. You may need to tweak it a bit to make it work for your exact scenario.
Code:
using System;
using System.Xml;
public class Program
{
public static void Main()
{
var badXML = @"<?xml version=""1.0"" encoding=""UTF-8""?>
<Data>
<Items>
<Item>
<Target type=""System.String"">Some target</Target>
<Content type=""System.String""><?xml version=""1.0"" encoding=""utf-8""?><Data><Items><Item><surname type=""System.String"">Some Surname</surname><name type=""System.String"">Some Name</name></Item></Items></Data></Content>
</Item>
</Items>
</Data>";
var goodXML = badXML.Replace(@"<Content type=""System.String""><?xml version=""1.0"" encoding=""utf-8""?>"
, @"<Content type=""System.String"">");
var xmlDoc = new XmlDocument();
xmlDoc.LoadXml(goodXML);
XmlNodeList itemRefList = xmlDoc.GetElementsByTagName("Content");
foreach (XmlNode xn in itemRefList)
{
Console.WriteLine(xn.InnerXml);
}
}
}
Output:
<Data><Items><Item><surname type="System.String">Some Surname</surname><name type="System.String">Some Name</name></Item></Items></Data>
Working DotNetFiddle: https://dotnetfiddle.net/ShmZCy
Perhaps needless to say: none of this would have been needed if whatever created this invalid XML had followed the common practice of wrapping the nested XML in a <![CDATA[ .... ]]> block.
The <?xml ...?> declaration is only valid at the very start of an XML document, so the XML you've been given isn't well-formed. That makes it difficult to parse as is without either changing the source document (which you've indicated isn't possible) or preprocessing the source.
You could try:
Stripping out the <?xml ?> instruction with regex or string manipulation, but the cure there may be worse than the disease.
The HTMLAgilityPack, which implements a more forgiving parser, may work with an XML document
Other than that, the producer of the document should look to produce well-formed XML:
CDATA sections can help this, but be aware that CDATA can't contain the ]]> end tag.
XML-escaping the nested XML text can work fine; that is, use the standard routines to turn < into &lt;, > into &gt;, and so forth.
XML namespaces can also help here, but they can be daunting in the beginning.
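As a rough sketch of the first option (stripping the nested declaration during preprocessing), the snippet below removes every <?xml ...?> declaration that does not sit at the very start of the text and then parses the result with LINQ to XML. The class and method names are made up for illustration, and the sample string is a trimmed-down version of the document in the question. Note that the regex would also strip declaration-like text inside a CDATA section or comment, so treat it as a starting point only.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class NestedDeclarationCleaner
{
    // Matches an XML declaration such as <?xml version="1.0" encoding="utf-8"?>.
    private static readonly Regex Declaration =
        new Regex(@"<\?xml\s.*?\?>", RegexOptions.Singleline);

    // Keeps a declaration found at position 0 and removes any later one.
    public static string StripNestedDeclarations(string xml)
    {
        return Declaration.Replace(xml, m => m.Index == 0 ? m.Value : string.Empty);
    }

    public static void Main()
    {
        var badXml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" +
            "<Data><Content><?xml version=\"1.0\" encoding=\"utf-8\"?>" +
            "<Data><name>Some Name</name></Data></Content></Data>";

        var outer = XDocument.Parse(StripNestedDeclarations(badXml));

        // The nested document is now an ordinary child element of <Content>;
        // copying it into a new XDocument gives a separate document to work with.
        var nestedRoot = outer.Descendants("Content").First().Elements().First();
        var inner = new XDocument(nestedRoot);

        Console.WriteLine(inner.Root.Element("name").Value);
    }
}
```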

C# Skip anything to next tag

I have a log file in xml format like
<log> // skip this node
<?xml version="1.0" encoding="UTF-8"?>
<qbean logger="main-logger">
</qbean>
</log>
<log> // go to this node
</log>
Now ReadToNextSibling("log") throws an exception, and I need to skip the content of the first "log" tag and move to the next "log" tag without an exception being thrown.
Is there a way?
Hint:
Your XML is invalid because the <?xml version="1.0" encoding="UTF-8"?> declaration has to come before the root element. You can search for it and remove it if that fixes your problem, for example with yourXml.Replace("<?xml version=\"1.0\" encoding=\"UTF-8\"?>", "")
You have to create a root element for your XML to be valid for parsing.
Then, you can use the XmlDocument class to parse the XML data that you have and skip anything you want. You would need something like this:
var document = new XmlDocument();
document.LoadXml(yourXml);
XmlNode secondLog = document.DocumentElement.ChildNodes[1]; // the second <log> element
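Putting the hints together, a minimal sketch could look like the following. The wrapper element name "root", the class and method names, and the <entry> element in the sample are all assumptions made for illustration; the sample input is a shortened version of the log in the question.

```csharp
using System;
using System.Xml;

public static class LogFragmentReader
{
    // Returns the second <log> element, or null if there are fewer than two.
    public static XmlNode SecondLog(string rawLog)
    {
        // Remove the misplaced XML declaration(s) so the text can be parsed.
        string cleaned = rawLog.Replace("<?xml version=\"1.0\" encoding=\"UTF-8\"?>", "");

        // Wrap the fragments in an arbitrary root element so there is a single root.
        var document = new XmlDocument();
        document.LoadXml("<root>" + cleaned + "</root>");

        // SelectNodes returns only <log> elements, so stray text between them is ignored.
        XmlNodeList logs = document.DocumentElement.SelectNodes("log");
        return logs.Count > 1 ? logs[1] : null;
    }

    public static void Main()
    {
        string rawLog =
            "<log><?xml version=\"1.0\" encoding=\"UTF-8\"?>" +
            "<qbean logger=\"main-logger\"></qbean></log>" +
            "<log><entry>second</entry></log>";

        XmlNode second = SecondLog(rawLog);
        Console.WriteLine(second.OuterXml);
    }
}
```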

Change an XML node value

I have an xml document that looks like this
<?xml version="1.0"?>
<XML>
<VIDEO>
<WIDTH>800</WIDTH>
<HEIGHT>600</HEIGHT>
<COLORBITS>32</COLORBITS>
<GAMMA>255</GAMMA>
<FULLSCREEN>TRUE</FULLSCREEN>
<REFLECTION>true</REFLECTION>
<LIGHTMAP>true</LIGHTMAP>
<DYNAMICLIGHT>true</DYNAMICLIGHT>
<SHADER>true</SHADER>
<CHARACTORTEXTURELEVEL>0</CHARACTORTEXTURELEVEL>
<MAPTEXTURELEVEL>0</MAPTEXTURELEVEL>
<EFFECTLEVEL>0</EFFECTLEVEL>
<TEXTUREFORMAT>1</TEXTUREFORMAT>
<NHARDWARETNL>false</NHARDWARETNL>
</VIDEO>
</XML>
I want to change the value of the "MAPTEXTURELEVEL" node from 0 to 6 based on the checked state of a checkbox in a C# application, but I really have no idea how to do it.
I don't have VS to test it, but it should be something like this using LINQ to XML:
var doc = XDocument.Load("video.xml");
doc
.Element("XML")
.Element("VIDEO")
.SetElementValue("MAPTEXTURELEVEL", 6);
doc.Save("video_modified.xml");
Hope it helps!

Parse three specific elements from an XML snippet in C# 2.0

How can I parse the values of a few tags from my XML using C# 2.0?
I want to parse these tags and their values:
1) <v9:Severity>SUCCESS</v9:Severity>
2) <v9:TrackingNumber>634649515000016</v9:TrackingNumber>
3) <v9:Image>iVBORw0KGgoAAAANSUhEUgAAAyAAAASwAQAAAAAryhMIAAAagEl</v9:Image>
How do I get the values of the elements above programmatically with C# 2.0?
<?xml version="1.0" encoding="UTF-8"?>
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
<env:Header xmlns:env="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"/>
<soapenv:Body>
<v9:ProcessShipmentReply xmlns:v9="http://fedex.com/ws/ship/v9">
<v9:HighestSeverity xmlns:env="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">SUCCESS</v9:HighestSeverity>
<v9:Notifications xmlns:env="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<v9:Severity>SUCCESS</v9:Severity>
<v9:Source>ship</v9:Source>
<v9:Code>0000</v9:Code>
<v9:Message>Success</v9:Message>
<v9:LocalizedMessage>Success</v9:LocalizedMessage>
</v9:Notifications>
<v9:CompletedShipmentDetail>
<v9:CompletedPackageDetails xmlns:env="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<v9:SequenceNumber>1</v9:SequenceNumber>
<v9:TrackingIds>
<v9:TrackingIdType>GROUND</v9:TrackingIdType>
<v9:TrackingNumber>634649515000016</v9:TrackingNumber>
</v9:TrackingIds>
<v9:Barcodes>
<v9:BinaryBarcodes>
<v9:Type>COMMON_2D</v9:Type>
<v9:Value>Wyk+HjAxHTAyMDI3ODAdODQwHTEzNx02MzQ2NDk1</v9:Value>
</v9:BinaryBarcodes>
<v9:StringBarcodes>
<v9:Type>GROUND</v9:Type>
<v9:Value>9612137634649515000016</v9:Value>
</v9:StringBarcodes>
</v9:Barcodes>
<v9:Label>
<v9:Type>OUTBOUND_LABEL</v9:Type>
<v9:ShippingDocumentDisposition>RETURNED</v9:ShippingDocumentDisposition>
<v9:Resolution>200</v9:Resolution>
<v9:CopiesToPrint>1</v9:CopiesToPrint>
<v9:Parts>
<v9:DocumentPartSequenceNumber>1</v9:DocumentPartSequenceNumber>
<v9:Image>iVBORw0KGgoAAAANSUhEUgAAAyAAAASwAQAAAAAryhMIAAAagEl</v9:Image>
</v9:Parts>
</v9:Label>
</v9:CompletedPackageDetails>
</v9:CompletedShipmentDetail>
</v9:ProcessShipmentReply>
</soapenv:Body>
</soapenv:Envelope>
Since you said that you use C# 2.0 (and thus cannot use LINQ to XML), the easiest way to pull single values out of your XML would be to use XPath:
You can use an XPathNavigator (MSDN: Select XML Data using XPathNavigator)
or you can use XmlNode.SelectNodes directly.
Since your XML contains namespaces <v9:...>, the issue gets a bit more complicated: You need to initialize an XmlNamespaceManager and pass it to the XPathNavigator. Here is a blog post that explains this issue in detail; an example can also be found at the XmlNode.SelectNodes MSDN page (see link above).
Query XML with Namespaces using XPathNavigator
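As a minimal sketch of the XmlNode.SelectNodes/SelectSingleNode route, written without var or LINQ so it should fit C# 2.0: the class and method names are hypothetical, and the inline XML is a trimmed-down copy of the reply from the question. The important part is registering the v9 prefix with an XmlNamespaceManager and passing that manager to the XPath call.

```csharp
using System;
using System.Xml;

public class ShipmentReplyReader
{
    // Returns the text of the first node matching the XPath, or null when nothing matches.
    public static string ReadValue(XmlDocument doc, XmlNamespaceManager ns, string xpath)
    {
        XmlNode node = doc.SelectSingleNode(xpath, ns);
        return node == null ? null : node.InnerText;
    }

    public static void Main()
    {
        // Trimmed-down copy of the SOAP reply shown in the question.
        string xml =
            "<soapenv:Envelope xmlns:soapenv=\"http://schemas.xmlsoap.org/soap/envelope/\">" +
            "<soapenv:Body><v9:ProcessShipmentReply xmlns:v9=\"http://fedex.com/ws/ship/v9\">" +
            "<v9:Notifications><v9:Severity>SUCCESS</v9:Severity></v9:Notifications>" +
            "<v9:TrackingIds><v9:TrackingNumber>634649515000016</v9:TrackingNumber></v9:TrackingIds>" +
            "<v9:Parts><v9:Image>iVBORw0KGgo</v9:Image></v9:Parts>" +
            "</v9:ProcessShipmentReply></soapenv:Body></soapenv:Envelope>";

        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        // The prefix used in the XPath must be mapped to the same namespace URI as in the document.
        XmlNamespaceManager ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("v9", "http://fedex.com/ws/ship/v9");

        Console.WriteLine(ReadValue(doc, ns, "//v9:Notifications/v9:Severity"));
        Console.WriteLine(ReadValue(doc, ns, "//v9:TrackingNumber"));
        Console.WriteLine(ReadValue(doc, ns, "//v9:Image"));
    }
}
```

The prefix in the XPath expression does not have to match the prefix used in the document; only the namespace URI has to be the same.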

XML DocumentElement is trashing the innerXml

I have a simple XML file, shown below, which I read in via a basic XmlDocument.Load(filename.xml). If I load the file and inspect its InnerXml, it all looks normal. However, when I inspect the InnerXml of DocumentElement, it's a mess!!! I kept the example small, so you can easily see there is no malformation:
<?xml version="1.0" encoding="UTF-8"?>
<fax:FaxService xmlns:fax="http://www.hp.com/schemas/imaging/con/service/fax/2009/02/11/" xmlns:dd="http://www.hp.com/schemas/imaging/con/dictionaries/1.0/">
<fax:ServiceDefaults>
<fax:ServiceSendDefaults>
<fax:InternetFaxSettings>
<dd:FaxFileFormat>MTIFFG4</dd:FaxFileFormat>
<dd:UseEmailAsFaxAcctAddr>false</dd:UseEmailAsFaxAcctAddr>
<dd:AutoCompleteToNANP>false</dd:AutoCompleteToNANP>
<dd:RetryInterval>0</dd:RetryInterval>
<dd:MaxRetryAttempts>0</dd:MaxRetryAttempts>
</fax:InternetFaxSettings>
</fax:ServiceSendDefaults>
</fax:ServiceDefaults>
</fax:FaxService>
Now, try this in C# with this simple code:
...
XmlDocument xDoc = new XmlDocument();
xDoc.Load("*XMLSAMPLE.XML*");
textBox1.Text = xDoc.InnerXml;
textBox2.Text = xDoc.DocumentElement.InnerXml;
...
It's completely mangled, with the 2nd namespace repeated with every dd tag, and not even included in the top-most tag.
What am I doing wrong? This is driving me nuts!
The content returned by xDoc.DocumentElement.InnerXml is semantically identical to your original ServiceDefaults tag - if the first fragment conforms to your XML schema, the InnerXml fragment will also conform to the definition of the inner element. Just because the framework has re-arranged the namespace declarations does not change the semantics of the document.
Compare the output of the two properties:
xDoc.InnerXml:
<?xml version="1.0" encoding="UTF-8"?>
<fax:FaxService xmlns:fax="http://www.hp.com/schemas/imaging/con/service/fax/2009/02/11/" xmlns:dd="http://www.hp.com/schemas/imaging/con/dictionaries/1.0/">
<fax:ServiceDefaults>
<fax:ServiceSendDefaults>
<fax:InternetFaxSettings>
<dd:FaxFileFormat>MTIFFG4</dd:FaxFileFormat>
<dd:UseEmailAsFaxAcctAddr>false</dd:UseEmailAsFaxAcctAddr>
<dd:AutoCompleteToNANP>false</dd:AutoCompleteToNANP>
<dd:RetryInterval>0</dd:RetryInterval>
<dd:MaxRetryAttempts>0</dd:MaxRetryAttempts>
</fax:InternetFaxSettings>
</fax:ServiceSendDefaults>
</fax:ServiceDefaults>
</fax:FaxService>
xDoc.DocumentElement.InnerXml:
<fax:ServiceDefaults xmlns:fax="http://www.hp.com/schemas/imaging/con/service/fax/2009/02/11/">
<fax:ServiceSendDefaults>
<fax:InternetFaxSettings>
<dd:FaxFileFormat xmlns:dd="http://www.hp.com/schemas/imaging/con/dictionaries/1.0/">MTIFFG4</dd:FaxFileFormat>
<dd:UseEmailAsFaxAcctAddr xmlns:dd="http://www.hp.com/schemas/imaging/con/dictionaries/1.0/">false</dd:UseEmailAsFaxAcctAddr>
<dd:AutoCompleteToNANP xmlns:dd="http://www.hp.com/schemas/imaging/con/dictionaries/1.0/">false</dd:AutoCompleteToNANP>
<dd:RetryInterval xmlns:dd="http://www.hp.com/schemas/imaging/con/dictionaries/1.0/">0</dd:RetryInterval>
<dd:MaxRetryAttempts xmlns:dd="http://www.hp.com/schemas/imaging/con/dictionaries/1.0/">0</dd:MaxRetryAttempts>
</fax:InternetFaxSettings>
</fax:ServiceSendDefaults>
</fax:ServiceDefaults>
A look at the following link in MSDN will help shed light on your situation:
http://msdn.microsoft.com/en-us/library/system.xml.xmldocument.innerxml.aspx
Basically, xDoc.DocumentElement.InnerXml starts at the <fax:ServiceDefaults> node, whereas xDoc.InnerXml starts one level higher (the FaxService node). This is crucial to understanding your problem, because all of your xmlns declarations are on the FaxService node.
Make the following change to your XML document, and notice what happens (basically, copy the xmlns info over to the ServiceDefaults node):
<?xml version="1.0" encoding="UTF-8"?>
<fax:FaxService xmlns:fax="http://www.hp.com/schemas/imaging/con/service/fax/2009/02/11/" xmlns:dd="http://www.hp.com/schemas/imaging/con/dictionaries/1.0/">
<fax:ServiceDefaults xmlns:fax="http://www.hp.com/schemas/imaging/con/service/fax/2009/02/11/" xmlns:dd="http://www.hp.com/schemas/imaging/con/dictionaries/1.0/">
<fax:ServiceSendDefaults>
<fax:InternetFaxSettings>
<dd:FaxFileFormat>MTIFFG4</dd:FaxFileFormat>
<dd:UseEmailAsFaxAcctAddr>false</dd:UseEmailAsFaxAcctAddr>
<dd:AutoCompleteToNANP>false</dd:AutoCompleteToNANP>
<dd:RetryInterval>0</dd:RetryInterval>
<dd:MaxRetryAttempts>0</dd:MaxRetryAttempts>
</fax:InternetFaxSettings>
</fax:ServiceSendDefaults>
</fax:ServiceDefaults>
</fax:FaxService>
Suddenly your code will behave according to your expectations. So hopefully this helps you towards understanding the issue. What the permanent fix should be, that's up to you.
HTH!
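To convince yourself that the relocated declarations don't change the meaning, a small check like the one below can help. It is only a sketch: the class and method names are made up, and the XML is a shortened version of the document in the question. It counts the elements that live in the dd namespace, once in the full document and once in the fragment returned by DocumentElement.InnerXml; if the two are semantically equivalent, the counts should match.

```csharp
using System;
using System.Xml;

public static class InnerXmlNamespaceCheck
{
    private const string Dd = "http://www.hp.com/schemas/imaging/con/dictionaries/1.0/";

    // Counts elements with the given local name that belong to the dd namespace.
    public static int CountInDdNamespace(string xml, string localName)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.GetElementsByTagName(localName, Dd).Count;
    }

    public static void Main()
    {
        // Shortened version of the fax document from the question.
        string xml =
            "<fax:FaxService " +
            "xmlns:fax=\"http://www.hp.com/schemas/imaging/con/service/fax/2009/02/11/\" " +
            "xmlns:dd=\"" + Dd + "\">" +
            "<fax:ServiceDefaults><dd:RetryInterval>0</dd:RetryInterval></fax:ServiceDefaults>" +
            "</fax:FaxService>";

        var doc = new XmlDocument();
        doc.LoadXml(xml);

        // The fragment carries its own xmlns declarations, so it can be parsed on its own.
        string inner = doc.DocumentElement.InnerXml;

        Console.WriteLine(inner);
        Console.WriteLine(CountInDdNamespace(xml, "RetryInterval"));
        Console.WriteLine(CountInDdNamespace(inner, "RetryInterval"));
    }
}
```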
