How to ignore XML syntax in a node - c#

I want to use an XML document to store print strings that will be sent to an IPL printer, along with some other data, using C#.
Is there an attribute I can assign to a node so that any XML syntax inside it is ignored and the content is treated as a plain string? For example, the node below, called string, contains the print job that I want to send to the printer. The problem is that the string contains XML-style tags, which causes formatting issues when the file is viewed in Visual Studio. I am using the C# serializer to read the node and copy its contents to a string. What would be the best way to accomplish this?
<string>
<STX>R<ETX>
<STX><ESC>C<SI>W565<SI>h<ETX>
<STX><ESC>P<ETX>
<STX>E3;F3<ETX>
<STX>U1,LOGO;f3;o130,30;c2;h0,w0<ETX>
<STX>H2,A1;f3;o130,220;c26;b0;h12;w12;d3,PART NO:<ETX>
<STX>H3,B1;f3;o35,220;c26;b0;h8;w7;d3,Date Code<ETX>
<STX>H4,C1;f3;o35,390;c26;b0;h8;w7;d3,Supplier Code<ETX>
<STX>H5,D1;f3;o15,220;c26;b0;h8;w7;d3,lss 1<ETX>
<STX>H6,E1;f3;o15,435;c26;b0;h8;w7;d3,ASSEMBLED IN USA<ETX>
<STX>B7,CODEA;f3;o95,220;c0,3;w2;h55;r0;d0,10<ETX>
<STX>H8,DATAA;f3;o130,350;c26;b0;h12;w12;d0;10<ETX>
<STX>H9,DATAB;f3;o35,300;c26;b0;h8;w7;d0,8<ETX>
<STX>H10,DATAC;f3;o35,510;c26;b0;h8;w7;d0,7<ETX>
<STX>R<ETX>
<STX><ESC>E3<CAN><ETX>
<STX><ESC>F7<DEL>var0<ETX>
<STX><ESC>F8<DEL>var1<ETX>
<STX><ESC>F9<DEL>var2<ETX>
<STX><ESC>F10<DEL>var3<ETX>
<STX><RS>1<US>1<ETX>
<STX><ETB><FF><ETX>
</string>

You can turn xml parsing off for your string by using CDATA.
<string>
<![CDATA[
<STX>R<ETX>
[...]
<STX><ETB><FF><ETX>
]]>
</string>
All content between the opening <![CDATA[ and the closing ]]> is treated strictly as character data, not as markup. For more, see: What does CDATA in XML mean?
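As a rough illustration of the CDATA approach, the sketch below reads such a node back in C#. It uses XElement from System.Xml.Linq rather than the serializer mentioned in the question, and the names PrintJobReader and ReadPrintJob are made up for the example.

```csharp
using System.Xml.Linq;

public static class PrintJobReader
{
    // Parses the XML and returns the text content of the root element.
    // Anything inside a CDATA section comes back verbatim, including
    // the <STX>-style tokens that would otherwise be parsed as tags.
    public static string ReadPrintJob(string xml)
    {
        XElement root = XElement.Parse(xml);
        return root.Value;
    }
}
```

Calling ReadPrintJob on a document such as `<string><![CDATA[<STX>R<ETX>]]></string>` should return the printer commands as an ordinary string. Note that the data itself must not contain the sequence `]]>`, since that ends the CDATA section.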

Related

Processing large CDATA section in C#

I am trying to retrieve a cdata section from an xml document, the format of the xml is like this:
<Configuration>
<ConfigItem>
<Key>Hello World</Key>
<Value><![CDATA[For the value we have a large chunk of XAML stored in a CDATA section]]></Value>
</ConfigItem>
</Configuration>
What I am trying to do is retrieve the XAML from the CDATA section, my code so far is as follows:
XmlDocument document = new XmlDocument();
document.Load("Configuration.xml");
XmlCDataSection cDataNode = (XmlCDataSection) document.SelectSingleNode("//*[local-name()='Value']").ChildNodes[0];
String cdata = cDataNode.Data;
However, the cdata string has been truncated and is incomplete, I guess because the actual CDATA is too large to fit in the string object.
What's the correct way to do this?
EDIT:
So my original assumption that the string was too long was incorrect. The problem now is that my CDATA contains a nested CDATA within it. Reading online it appears that the proper way to escape the nested cdata is to use ]]]]><![CDATA[> which this xml is using, but it seems like when I select the node it is escaping at the wrong place.
When there are nested CDATA sections, you need to piece the data back together. At present, you are selecting only ChildNodes[0] and ignoring all of the other children. What you will probably find is that the remaining children (ChildNodes[1], ChildNodes[2], and so on) are further CDATA sections, possibly mixed with plain text nodes, that hold the rest of the data.
You need to go through all of these, extract the data from each one, and concatenate the pieces to get the effective "text" contents of the Value element.
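A minimal sketch of that concatenation, assuming the document layout from the question. The class and method names (CDataReader, ReadValue) are hypothetical, and the XML is passed in as a string to keep the example self-contained.

```csharp
using System.Text;
using System.Xml;

public static class CDataReader
{
    // Joins every CDATA and text child of the first Value element.
    public static string ReadValue(string xml)
    {
        var document = new XmlDocument();
        document.LoadXml(xml);

        XmlNode valueNode = document.SelectSingleNode("//*[local-name()='Value']");
        var builder = new StringBuilder();

        foreach (XmlNode child in valueNode.ChildNodes)
        {
            // Skip comments and other node types; keep only character data.
            if (child.NodeType == XmlNodeType.CDATA || child.NodeType == XmlNodeType.Text)
            {
                builder.Append(child.Value);
            }
        }

        return builder.ToString();
    }
}
```

With the `]]]]><![CDATA[>` escape, the parser sees two adjacent CDATA sections, so joining them restores the original `]]>` sequence. For an element that contains only character data, the InnerText property of the Value node should give the same result in one call.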

Reading Xml files with umlaut chars

I asked this question yesterday and got a reply:
Writing encoded values for umlauts
In the code the parse method works if it's a string like so:
XDocument xDoc = XDocument.Parse("<description>Top Shelf-ÖÄÜookcase</description>");
To pass the input xml file as string, I have to read it first. The read method will fail if there are umlauts in the input xml.
How do I get past that?
Tried both Load and Parse methods of XDocument.
Load:
Invalid character in the given encoding. Line 3, position 35.
Parse:
Data at the root level is invalid. Line 1, position 1.
Here is a sample xml after using CDATA:
<?xml version="1.0" encoding="utf-8"?>
<kal>
<description><![CDATA[Top Shelf-ÖÄÜookcase]]> </description>
</kal>
Change encoding to "iso-8859-1"
Have you tried wrapping the description data with a CDATA?
<description><![CDATA[Top Shelf-ÖÄÜookcase]]> </description>
Special characters don't particularly parse well in XML unless you wrap them with CDATA.
As Besi stated, you have to use the correct encoding of the xml-file in order to achieve correct handling of the umlauts.
Even though you said that the creation of the incoming XML file is not in your hands, you can still control the encoding used for parsing by using a dedicated StreamReader:
// create your XDocument
XDocument Doc;
// set up a StreamReader for your file, specifying the encoding you need
using (StreamReader Reader = new StreamReader(@"C:\your-file.xml", System.Text.Encoding.GetEncoding("ISO-8859-1")))
{
    // parse the string that is returned from StreamReader.ReadToEnd()
    Doc = XDocument.Parse(Reader.ReadToEnd());
}

Save string to XML File

I want to save the following string in an XML File:
<text><![CDATA[<p>what is my pet name</p>]]></text>
When I am saving it, it looks like:
<text>&lt;![CDATA[&lt;p&gt;what is my pet name&lt;/p&gt;]]&gt;</text>
I have tried the File.WriteAllText() and XmlDocument.Save() methods but didn't get the expected result.
Basically, everywhere other than the opening and closing tags in the XML, < is replaced by &lt; and > is replaced by &gt;.
What is happening is that the XML parser is encoding your string. When you try to access the string later, it can be decoded again at that time.
What I suggest is that you either load the text into a new XmlDocument with XmlDocument.LoadXml(string s) and then import that into your current document, or leave it encoded.
You should not try to both use an XML parser, and manually add text at the same time.
I guess you add the CDATA manually and the XML writing mechanism correctly escapes your CDATA because it treats it as text content. Instead explicitly add a CDATA section with just the contents.
If you are using the old XML API (System.Xml), then use this method to create the CDATA section: http://msdn.microsoft.com/en-us/library/system.xml.xmldocument.createcdatasection
Then append the node to the element just like in the example in the link.
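A small sketch of that approach, using XmlDocument.CreateCDataSection. The wrapper class CDataWriter and the method BuildTextElement are hypothetical names chosen for the example.

```csharp
using System.Xml;

public static class CDataWriter
{
    // Builds <text><![CDATA[...]]></text> with the markup kept unescaped.
    public static string BuildTextElement(string html)
    {
        var document = new XmlDocument();
        XmlElement text = document.CreateElement("text");
        document.AppendChild(text);

        // Pass only the contents; the CDATA delimiters are added by the API.
        text.AppendChild(document.CreateCDataSection(html));

        return document.OuterXml;
    }
}
```

The key point is that the string handed to CreateCDataSection does not include `<![CDATA[` or `]]>` itself. If those delimiters are typed into ordinary text content, the writer escapes them like any other text.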
XML is being written correctly.
XML has special characters that are reserved for commands, just like C# reserves words like "if" and "string".
XML is encoding your string for storage. What you need to do is when you retrieve your string, run it through a similar decode process.
Use this: HttpServerUtility.HtmlDecode(encodedString)
Reference:
Decode XML returned by a webservice (< and > are replaced with &lt; and &gt;)?
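To make the encode and decode steps concrete, here is a hedged sketch. It assumes the text is written and read through XmlDocument, in which case the decoding happens automatically on read. For an escaped fragment obtained outside an XML parser it uses System.Net.WebUtility.HtmlDecode, which is a substitute for the method named above. The class and method names are invented for illustration.

```csharp
using System.Net;
using System.Xml;

public static class EscapingRoundTrip
{
    // Writes the content as element text; the XML writer escapes the markup.
    public static string Write(string content)
    {
        var document = new XmlDocument();
        XmlElement text = document.CreateElement("text");
        text.InnerText = content;
        document.AppendChild(text);
        return document.OuterXml;
    }

    // Reading through the XML API undoes the escaping without extra work.
    public static string Read(string xml)
    {
        var document = new XmlDocument();
        document.LoadXml(xml);
        return document.DocumentElement.InnerText;
    }

    // Decodes an escaped fragment that did not pass through an XML parser.
    public static string DecodeFragment(string encoded)
    {
        return WebUtility.HtmlDecode(encoded);
    }
}
```

In this sketch, Read(Write(s)) should give back s unchanged, which is why the escaped form in the saved file is usually harmless.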

XDocument saves &amp; instead of just &

I am using XDocument to switch a value in an xml document.
In the new value I need to use the character '&' (ampersand),
but after XDocument.Save() the XML has &amp; instead!
I tried using encoding and stuff… nothing worked
XDocument is doing exactly what it's supposed to do.
A bare & is invalid XML (it is an unfinished character or entity reference).
& means "start of an entity" in XML, so if you want to include an & as data you must express it as an entity, &amp; (or put it in a CDATA block).
What you describe is normal behaviour and the XML would break otherwise.
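A short sketch of that behaviour, assuming a made-up element name (setting) and helper names (AmpersandRoundTrip, Serialize, ReadBack). The saved text contains the entity, but the value read back through XDocument is the plain ampersand again.

```csharp
using System.Xml.Linq;

public static class AmpersandRoundTrip
{
    // Serializes a single element whose text is the given value.
    public static string Serialize(string value)
    {
        var document = new XDocument(new XElement("setting", value));
        return document.ToString(SaveOptions.DisableFormatting);
    }

    // Parses the XML and returns the decoded text of the root element.
    public static string ReadBack(string xml)
    {
        return XDocument.Parse(xml).Root.Value;
    }
}
```

For example, Serialize("Tom & Jerry") produces `<setting>Tom &amp; Jerry</setting>`, and ReadBack on that string returns `Tom & Jerry`.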
There are two options. Option one is to ensure proper XML encoding and decoding of all the content in the XML document. Remember that HTML and XML encoding/decoding are slightly different.
Option two is to Base64-encode any content in the XML that might contain invalid characters.
Is your output file app.config supposed to be an XML file?
If it is, then the & must be escaped as &amp;.
If it isn't, then you should be using the text output method instead of the xml output method: use <xsl:output method='text'/>.
PS: this question appears to be a duplicate of How can I add an ampersand for a value in a ASP.net/C# app config file value

Why is XLinq reformatting my XML?

I am using XLinq (LINQ to XML) to parse an XML document, and one part of the document represents rich text and uses the xml:space="preserve" attribute to preserve whitespace within the rich-text element.
The issue I'm experiencing is that when I have an element inside the rich-text which contains only a sub-element but no text, XLinq reformats the XML and puts the sub-element on its own line. This, of course, creates additional whitespace, which changes the original content.
Example:
<rich-text xml:space="preserve">
<text-run><br/></text-run>
</rich-text>
results in:
<rich-text xml:space="preserve">
<text-run>
<br/>
</text-run>
</rich-text>
If I add a space or any other text before the <br/> in the original xml like so
<rich-text xml:space="preserve">
<text-run> <br/></text-run>
</rich-text>
the parser doesn't reformat the xml
<rich-text xml:space="preserve">
<text-run> <br/></text-run>
</rich-text>
How can I prevent the xml parser from reformatting my element?
Is this reformatting normal for XML parsing or is this just an unwanted side effect of the XLinq parser?
EDIT:
I am parsing the document like this:
using (var reader = System.Xml.XmlReader.Create(stream))
return XElement.Load(reader);
I am not using any custom XmlReaderSettings or LoadOptions
The problem occurs when I use the .Value property on the text-run XElement to get the text value of the element. Instead of receiving \n which would be the correct output from the original xml, I will receive
\n \n
Note the additional whitespace and line break due to the reformatting! The reformatting can also be observed when inspecting the element in the debugger or calling .ToString().
Have you tried this:
yourXElement.ToString(SaveOptions.DisableFormatting)
This should solve your problem.
By the way, you should also do a similar thing on load:
XElement.Parse(sr, LoadOptions.PreserveWhitespace);
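Putting the two suggestions together, a minimal sketch might look like the following. The class and method names are hypothetical, and the XML is passed in as a string instead of a stream.

```csharp
using System.Xml.Linq;

public static class WhitespacePreservingParser
{
    // Loads the XML keeping all whitespace, then serializes it again
    // without letting the writer add indentation or line breaks.
    public static string RoundTrip(string xml)
    {
        XElement element = XElement.Parse(xml, LoadOptions.PreserveWhitespace);
        return element.ToString(SaveOptions.DisableFormatting);
    }
}
```

With both options set, the br element should stay directly inside text-run with no line break inserted between them. The writer may still normalize an empty element to its own self-closing form, so the output is not guaranteed to match the input character for character.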
