Create xml file in binary format in C# - c#

I want to create an XML file which has an XML declaration, a root node and child nodes.
Example:
<?xml version="1.0" encoding="UTF-8"?>
<Tag1>
<SubTag>
<Id>
</Id>
<Name>IdentityManagement</Name>
<Time>4/11/2017 6:26:15 PM</Time>
<Message>Message1</Message>
</SubTag>
<SubTag>
<Id>
</Id>
<Name>MainWindow</Name>
<Time>4/11/2017 6:26:20 PM</Time>
<Message>Message2</Message>
</SubTag>
</Tag1>
But I need to write this XML in binary format, so that no one can read it.
Calling one function should add another SubTag.
So there can be any number of SubTag elements.

If you want to convert it into a form that is not trivially readable by a human, encode it to base64:
Convert.ToBase64String(textAsBytes);
If it should not be readable by anyone under any circumstances, encrypt it.
I am not sure what you mean when you say 'binary' though, all text is already binary when stored in a file, it is just encoded using an encoding scheme like ASCII or UTF8.
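As a rough sketch of the Base64 suggestion above: the XML text is turned into UTF-8 bytes and those bytes into Base64 text, and the reverse steps recover the original. The class and method names (XmlObfuscation, ToBase64, FromBase64) are made up for illustration. Note that this only hides the content from a casual reader; it is not encryption.

```csharp
using System;
using System.Text;

public static class XmlObfuscation
{
    // Encode the XML text as UTF-8 bytes, then render those bytes as Base64 text.
    public static string ToBase64(string xml) =>
        Convert.ToBase64String(Encoding.UTF8.GetBytes(xml));

    // Reverse the two steps to recover the original XML text.
    public static string FromBase64(string base64) =>
        Encoding.UTF8.GetString(Convert.FromBase64String(base64));
}
```

To add another SubTag later, one would decode the stored text, load it into an XmlDocument, append the element, and encode the result again.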

Related

How to read an XML file with carriage return in its contents?

I need to read an XML file that has carriage return (&#13;) chars in some node contents, and I need to keep those chars as is and avoid converting them into new lines. Those nodes have xmldsig signatures, and converting the &#13; chars into new lines invalidates the signatures.
I have tried loading the XML with XmlDocument.Load, XmlReader and StreamReader, and the special chars end up converted into new lines.
UPDATE with an XML sample
<?xml version="1.0"?>
<catalog>
<book>
<description>description
with
several
lines
</description>
</book>
<Signature xmlns="http://www.w3.org/2000/09/xmldsig#">
...
</Signature>
</catalog>
If the CR characters are literal 0x0D bytes, any conformant XML parser is obliged to drop these or convert them to newlines, under the rules for normalizing line endings in the XML recommendation: see https://www.w3.org/TR/REC-xml/#sec-line-ends.
Generally, any processing of an XML file is going to make changes at the binary level, for example whitespace between attributes will be lost. Your expectation that you can parse and serialize an XML file while preserving its binary representation is fundamentally wrong.
However, the algorithm for XML digital signatures is careful to ignore such variations. It works at a logical level, and should ignore things such as the whitespace within start tags, or the exact representation of line endings. You state that converting CR to NL is invalidating the signature: that sounds wrong to me. The signature should be unaffected.
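To make the line-ending rule concrete, here is a small sketch that reads the text of a root element through XmlReader.Create, which, as far as I know, applies the normalization described in the XML recommendation. LineEndDemo and ReadRootText are hypothetical names, and the sample strings in the note below are invented.

```csharp
using System.IO;
using System.Xml;

public static class LineEndDemo
{
    // Returns the text content of the root element as the parser reports it.
    public static string ReadRootText(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();                      // skip to the root element
            return reader.ReadElementContentAsString();  // text after normalization
        }
    }
}
```

With this reader, a literal CR LF pair inside "<d>a\r\nb</d>" should come back as a single LF, while a CR written as the character reference &#13; should survive as a CR, since line-end normalization applies to literal characters only.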
There are a few ways to read an XML file with carriage returns (&#13;) in its contents:
Use an XML parser that supports &#13; as a line ending character.
Use a text editor that supports &#13; as a line ending character.
Use a tool that can convert &#13; to a different line ending character.

Filter certain unicode characters out of XML

... specifically xA3 (&pound, &#xa3, &#163)
I'm loading several long XML documents and periodically, I'll run into one that won't load, throwing the exception:
Invalid character in the given encoding. Line x, position y.
Here's the code in question:
var doc = new XmlDocument();
doc.Load(file.FullName);
When I look at the document in question at the line indicated, I'll see the xA3 formatted inversely (black bg, white fg) within one of the XML tags.
The header of each XML file is nothing remarkable:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
This may sound like a really dumb question, but is there a way to either remove the offending character or tell the XMLDocument that reads the file to accept the character coding?
This answer is based on the assumption that your XML file does not contain a character entity for the pound sign (£) but the raw byte value 0xa3.
The UTF-8 encoding of the pound sign is the two-byte sequence 0xc2 0xa3. If there is no byte 0xc2 before 0xa3, the encoding of your XML file is not UTF-8, and the header information is wrong.
If this is the case you can either change the encoding in the XML header to ISO 8859-1 (where the pound sign can be found at code point 0xa3), or try to figure out why your XML files are not UTF-8 encoded and fix them. As I don't know if your files contain any characters that do not exist in ISO 8859-1 I would prefer the second option.
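If the files really are ISO 8859-1 and the header cannot be fixed at the source, one workaround is to decode the bytes yourself and hand the parser a TextReader, so that the wrong encoding declaration is not used for decoding. This is only a sketch under that assumption; Latin1XmlLoader is a made-up name.

```csharp
using System.IO;
using System.Text;
using System.Xml;

public static class Latin1XmlLoader
{
    // Decodes the stream as ISO 8859-1 before the XML parser sees it.
    // Assumption: the file is Latin-1 even though its header claims UTF-8.
    public static XmlDocument Load(Stream input)
    {
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");
        using (var reader = new StreamReader(input, latin1))
        {
            var doc = new XmlDocument();
            doc.Load(reader); // characters are already decoded by the StreamReader
            return doc;
        }
    }
}
```

For a file on disk this would be called as Latin1XmlLoader.Load(File.OpenRead(file.FullName)). If some of the files are genuine UTF-8, this would garble their non-ASCII characters, so it only fits when the real encoding is known.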

Needs space in the element name in xml formation

I have a set of data in the database that needs to be converted to XML format. The problem is that one of the element names contains a space, and I want to use this name in the XML:
<Data>
<Out put xml>
<ROW>
</ROW>
</Out put xml>
</Data>
I am aware that it is not possible to use a space in an element name. Please suggest an alternative way to achieve this.
Try to encode the name while converting to xml
XmlConvert.EncodeName(Name);
Then to convert back just decode it
XmlConvert.DecodeName(Name);
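A short sketch of how the encoded name could be used when building the document. SpacedElementDemo and BuildXml are hypothetical names; the element names come from the question.

```csharp
using System.Xml;

public static class SpacedElementDemo
{
    // Builds <Data> with one child whose name is the encoded form of rawName,
    // containing an empty <ROW> element, and returns the markup.
    public static string BuildXml(string rawName)
    {
        var doc = new XmlDocument();
        XmlElement data = doc.CreateElement("Data");
        doc.AppendChild(data);

        // EncodeName escapes characters that are not allowed in XML names.
        XmlElement output = doc.CreateElement(XmlConvert.EncodeName(rawName));
        data.AppendChild(output);
        output.AppendChild(doc.CreateElement("ROW"));

        return doc.OuterXml;
    }
}
```

For "Out put xml" the element is named Out_x0020_put_x0020_xml, and XmlConvert.DecodeName turns that back into the original name when reading.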

XML UTF-8 encoding checking

I have an XML structure like this. Some Student items contain invalid UTF-8 byte sequences, which may cause XML parsing to fail for the whole document.
What I want to do is filter out the Student items which contain invalid UTF-8 byte sequences and keep the ones with valid byte sequences. Any advice or samples about how to do this in .NET (C# preferred)?
BTW: invalid byte sequences I mean => http://en.wikipedia.org/wiki/UTF-8#Invalid_byte_sequences
<?xml version="1.0" encoding="utf-8"?>
<AllStudents>
<Student>
Mike
</Student>
<Student>
(Invalid name here)
</Student>
</AllStudents>
thanks in advance,
George
That's pretty hard to do. You won't get an XML parser to parse a document with invalid characters in it, so I think you're reduced to a couple of options:
Figure out why the encoding is wrong - a common problem is labeling the document as UTF-8 (or having no encoding declaration) when the document is actually written in Latin-1.
Take out the bad sections by hand.
Try and find a tag soup parser for .NET that will continue parsing after the error.
Reject the invalid XML document.
I don't know C#, so I'm afraid I can't give you code to do this, but the basic idea is to read the whole file as a UTF-8 text file, using a DecoderFallback to replace invalid sequences with either question mark characters or the Unicode character 0xFFFD. Then write the file back out as a UTF-8 text file, and parse that.
Basically, you separate out the operation of "wiping out bad utf-8 sequences" from the operation of "parsing the xml file".
You should probably even be able to skip writing the file back out again before running the XML parser to read in the fixed data; there should be some way to write the file to an in-memory byte stream and parse that byte stream as XML. (Again, sorry for not knowing C#)
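The idea above might look roughly like this in C#. It is a sketch, not a tested production approach: LenientXmlLoader, LoadReplacingInvalidUtf8 and RemoveDamaged are made-up names, and treating every U+FFFD as damage is an assumption that would also drop items that legitimately contain that character.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

public static class LenientXmlLoader
{
    // Step 1: decode as UTF-8, replacing invalid byte sequences with U+FFFD,
    // then parse the decoded characters directly (no temporary file needed).
    public static XmlDocument LoadReplacingInvalidUtf8(Stream input)
    {
        Encoding lenientUtf8 = Encoding.GetEncoding(
            "utf-8",
            EncoderFallback.ReplacementFallback,
            new DecoderReplacementFallback("\uFFFD"));

        using (var reader = new StreamReader(input, lenientUtf8))
        {
            var doc = new XmlDocument();
            doc.Load(reader);
            return doc;
        }
    }

    // Step 2: remove every element with the given name whose text contains U+FFFD.
    public static void RemoveDamaged(XmlDocument doc, string elementName)
    {
        // Copy first: GetElementsByTagName returns a live collection.
        var nodes = new List<XmlNode>();
        foreach (XmlNode node in doc.GetElementsByTagName(elementName))
            nodes.Add(node);

        foreach (XmlNode node in nodes)
        {
            if (node.InnerText.IndexOf('\uFFFD') >= 0)
                node.ParentNode.RemoveChild(node);
        }
    }
}
```

Calling LoadReplacingInvalidUtf8 on the file stream and then RemoveDamaged(doc, "Student") would leave only the Student items that decoded cleanly.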
Very close from XML encoding issue.

XmlDocument dropping encoded characters

My C# application loads XML documents using the following code:
XmlDocument doc = new XmlDocument();
doc.Load(path);
Some of these documents contain encoded characters, for example:
<xsl:text>&#10;</xsl:text>
I notice that when these documents are loaded, &#10; gets dropped.
My question: How can I preserve <xsl:text>&#10;</xsl:text>?
FYI - The XML declaration used for these documents:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
Are you sure the character is dropped? character 10 is just a line feed- it wouldn't exactly show up in your debugger window. It could also be treated as whitespace. Have you tried playing with the whitespace settings on your xmldocument?
If you need to preserve the encoding you only have two choices: a CDATA section or reading as plain text rather than Xml. I suspect you have absolutely 0 control over the documents that come into the system, therefore eliminating the CDATA option.
Plain-text rather than Xml is probably distasteful as well, but it's all you have left. If you need to do validation or other processing you could first load and verify the xml, and then concatenate your files using simple file streams as a separate step. Again: not ideal, but it's all that's left.
&#10; is a linefeed - i.e. whitespace. The XML parser will load it in as a linefeed, and thereafter ignore the fact that it was originally encoded. The encoding is just part of the serialization of the data to text format - it's not part of the data itself.
Now, XML sometimes ignores whitespace and sometimes doesn't, depending on context, API etc. As Joel says you may find that it's not missing at all - or you may find that using it with an API which allows you to preserve whitespace fixes the problem. I wouldn't be at all surprised to see it turned into an unencoded linefeed character when you output the data though.
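As a sketch of the whitespace-preserving route mentioned in these answers, XmlDocument has a PreserveWhitespace property that has to be set before loading. WhitespaceDemo and LoadTextContent are hypothetical names used only for this example.

```csharp
using System.Xml;

public static class WhitespaceDemo
{
    // Loads the markup with whitespace preservation switched on and
    // returns the text content of the root element.
    public static string LoadTextContent(string xml)
    {
        var doc = new XmlDocument();
        doc.PreserveWhitespace = true; // must be set before Load/LoadXml
        doc.LoadXml(xml);
        return doc.DocumentElement.InnerText;
    }
}
```

Loaded this way, the linefeed stays in the document as a character. It is still likely to be written back out as a raw linefeed rather than as the original character reference, because the reference is only a serialization detail.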
Maybe it would be better to keep the data in a <![CDATA[ ]]> section?
http://www.w3schools.com/XML/xml_cdata.asp
