I read XML files that sometimes contain elements like
<stringValue>text&#xA;text</stringValue>
XmlReader returns
text\ntext
for such strings.
So when I later rewrite the source XML using XmlWriter, I don't get the same strings (there is no &#xA; in them).
Should I worry about all this, or is it fine to allow the strings to be changed this way?
I would worry about it, yes, because you're manipulating the data. If you do a round trip on the XML document, the text formatting won't be the same.
When saving back out to XML, you would need to make sure the same formatting is persisted.
&#xA; is the XML encoding for a newline character (\n). If your XML data has a newline in the text, then this notation is correct and the output from XmlWriter is correct. If the newline was not in the original XML data, note that I have seen IE10/IE11 insert \r\n into the XML data when using the XMLHttpRequest object.
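A small C# sketch of the point above, assuming .NET with System.Xml.Linq available; NewlineRoundTrip, SameText and WriteWithEntities are hypothetical names. The first method shows that &#xA; and a literal line break parse to the same string. The second shows one way, using XmlWriter.WriteCharEntity, to write each \n back as a character reference if the exact spelling has to survive the round trip.

```csharp
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class NewlineRoundTrip
{
    // The parser turns the character reference &#xA; into a plain '\n',
    // so both documents carry exactly the same text.
    public static bool SameText()
    {
        string escaped = "<stringValue>text&#xA;text</stringValue>";
        string literal = "<stringValue>text\ntext</stringValue>";
        return XElement.Parse(escaped).Value == XElement.Parse(literal).Value;
    }

    // If the exact spelling &#xA; must survive a rewrite, emit each '\n'
    // as a character entity instead of passing it through WriteString.
    public static string WriteWithEntities(string elementName, string value)
    {
        var sb = new StringBuilder();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            writer.WriteStartElement(elementName);
            string[] parts = value.Split('\n');
            for (int i = 0; i < parts.Length; i++)
            {
                if (i > 0)
                {
                    writer.WriteCharEntity('\n'); // written as &#xA;
                }
                writer.WriteString(parts[i]);
            }
            writer.WriteEndElement();
        }
        return sb.ToString();
    }
}
```

Usually SameText() returning true is the reassuring part: any XML consumer sees the same data either way, so entitizing by hand only matters if something compares the raw file text.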
Related
I'm converting a large number of files to XML, and each file is either XML, JSON, CSV or PSV. To do the conversion I need to know the file's data type without looking at the file extension (some are coming from APIs). Someone suggested that I try to parse each file as each of the types until one succeeds, but that is pretty inefficient, and CSV can't easily be parsed since it is essentially just a text file (the same goes for PSV).
Does anyone have any ideas on what I can do? Thanks.
You can have some kind of "pre-parsing":
Whether it starts with an XML declaration or directly with the root node, the first character of an XML file should be <.
The first character of a JSON file can only be { if the JSON is built on an object, or [ if the JSON is built on an array.
For CSV and PSV (I guess PSV stands for Point-Separated Values?), each line of the file represents a specific record.
So by checking the first character, you can tell whether attempting XML and/or JSON parsing is pointless.
Parsing the first line of the file should be enough to decide if the file format is CSV or PSV.
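A rough C# sketch of that pre-parsing, under a few assumptions: PSV is taken to mean pipe-separated values (its usual meaning, rather than the guess above), the content is already loaded into a string, and a delimited file has at least one delimiter on its first line. SniffedFormat, FormatSniffer and Detect are hypothetical names. It is a heuristic, not a validator: a file it labels XML or JSON can still fail to parse.

```csharp
using System.Linq;

public enum SniffedFormat { Unknown, Xml, Json, Csv, Psv }

public static class FormatSniffer
{
    private static readonly char[] LeadingJunk = { '\uFEFF', ' ', '\t', '\r', '\n' };

    public static SniffedFormat Detect(string content)
    {
        // Ignore a byte-order mark and leading whitespace before the first real character.
        string trimmed = content.TrimStart(LeadingJunk);
        if (trimmed.Length == 0)
        {
            return SniffedFormat.Unknown;
        }

        // XML starts with '<' (declaration or root element), JSON with '{' or '['.
        char first = trimmed[0];
        if (first == '<') return SniffedFormat.Xml;
        if (first == '{' || first == '[') return SniffedFormat.Json;

        // For delimited text the first line is enough: count the candidate separators.
        int end = trimmed.IndexOfAny(new[] { '\r', '\n' });
        string firstLine = end < 0 ? trimmed : trimmed.Substring(0, end);
        int commas = firstLine.Count(c => c == ',');
        int pipes = firstLine.Count(c => c == '|');

        if (pipes > commas) return SniffedFormat.Psv;
        if (commas > 0) return SniffedFormat.Csv;
        return SniffedFormat.Unknown; // e.g. a single-column file with no separator
    }
}
```

Only the detected parser then needs to run, so each file is fully parsed once.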
I am invoking a service that returns responses in XML format. The response doesn't follow the XML guidelines and contains some new lines and "\".
Due to the formatting issues, the deserialization is failing.
XML Format:
\r\n\r\n<?xml version=\"1.0\" encoding=\"utf-8\"?>\r\n<N><details><date>25042014</date><orderNumber>OrderNumber </orderNumber><Response>1</Response></details>
I worked around the problem by removing the new lines and "\" before deserialization but was searching for a cleaner solution if exists.
The XML has to be well-formed, and for deserialization it must also match the expected structure (e.g. its XSD). The escape sequences and new lines make the XML invalid, so it no longer matches that structure, which in turn causes the deserialization to fail. As far as I know, there is no way around it except to read the file beforehand, remove the unwanted characters and sequences, and save it again, so that it can be deserialized successfully when read by an XmlDocument.
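A minimal sketch of that pre-cleaning step, done in memory instead of by saving a file. It assumes the \r\n and \" in the question are how the debugger displays real line breaks and quotes; in that case the only illegal part is the whitespace before <?xml, because an XML declaration must be the very first thing in a document. ResponseCleaner and ParseResponse are hypothetical names.

```csharp
using System.Xml.Linq;

public static class ResponseCleaner
{
    private static readonly char[] LeadingJunk = { '\uFEFF', ' ', '\t', '\r', '\n' };

    public static XDocument ParseResponse(string raw)
    {
        // The XML declaration must come first, so drop whitespace (and a stray
        // byte-order mark) that precedes it. Whitespace between elements is
        // legal and is left alone.
        string cleaned = raw.TrimStart(LeadingJunk);
        return XDocument.Parse(cleaned);
    }
}
```

If the backslashes really are part of the payload, they would still need to be removed separately, as in the question's workaround.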
I want to save the following string in an XML File:
<text><![CDATA[<p>what is my pet name</p>]]></text>
When I am saving it, it looks like:
<text>&lt;![CDATA[&lt;p&gt;what is my pet name&lt;/p&gt;]]&gt;</text>
I have tried the File.WriteAllText() and XmlDocument.Save() methods but didn't get the proper result.
Basically, everywhere other than the opening and closing tags in the XML, < is replaced by &lt; and > is replaced by &gt;.
What is happening is that the XML parser is encoding your string. When you try to access the string later, it can be decoded again at that time.
What I suggest is that you either load the text into a new XmlDocument with XmlDocument.LoadXml(string s) and then import that into your current document, or leave it encoded.
You should not try to both use an XML parser, and manually add text at the same time.
I guess you add the CDATA manually, and the XML writing mechanism correctly escapes your CDATA because it treats it as text content. Instead, explicitly add a CDATA section with just the contents.
If you are using the old XML API (System.Xml), then use this method to create the CDATA section: http://msdn.microsoft.com/en-us/library/system.xml.xmldocument.createcdatasection
Then append the node to the element just like in the example in the link.
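A minimal sketch of that approach with XmlDocument; CDataExample and BuildWithCData are hypothetical names. The markup is passed to CreateCDataSection as plain content, and the writer adds the <![CDATA[ and ]]> delimiters itself, so nothing inside gets escaped.

```csharp
using System.Xml;

public static class CDataExample
{
    public static string BuildWithCData(string html)
    {
        var doc = new XmlDocument();
        XmlElement text = doc.CreateElement("text");

        // Pass only the content; the CDATA node supplies <![CDATA[ and ]]> itself,
        // so the writer does not escape the markup inside it.
        text.AppendChild(doc.CreateCDataSection(html));
        doc.AppendChild(text);

        return doc.OuterXml; // doc.Save(path) writes the same markup to a file
    }
}
```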
XML is being written correctly.
XML has special characters that are reserved for commands, just like C# reserves words like "if" and "string".
XML is encoding your string for storage. What you need to do is when you retrieve your string, run it through a similar decode process.
Use HttpServerUtility.HtmlDecode for this. It is an instance method, so inside an ASP.NET page call it as Server.HtmlDecode(encodedString).
Reference:
Decode XML returned by a webservice (< and > are replaced with &lt; and &gt;)?
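Two ways to get the original text back, sketched in C# with hypothetical names (DecodeExample, DecodeRaw, ReadText). WebUtility.HtmlDecode in System.Net is a static counterpart to HttpServerUtility.HtmlDecode that does not need an ASP.NET context. And if the value is read through an XML parser, the parser has already decoded the entities, so no extra step is needed.

```csharp
using System.Net;
using System.Xml.Linq;

public static class DecodeExample
{
    // For a raw string you already have, WebUtility.HtmlDecode reverses &lt;, &gt; and &amp;.
    public static string DecodeRaw(string encoded) => WebUtility.HtmlDecode(encoded);

    // When the value is read through an XML parser, it comes back already decoded.
    public static string ReadText(string xml) => XElement.Parse(xml).Value;
}
```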
I need to create an XML file which is to be converted to an Excel file (.xls), and this means that the XML has a lot of meta info in it. It's easy to write all the contents into the XML file as a text file:
var sw = new FileInfo(tempReportFilePath).CreateText();
sw.WriteLine("meta info and other tags");
However, this method does not escape characters, so when the data contains '<', '>', '&' etc. the XML is rendered invalid and the .xls file does not open. I could easily do a replace ('<' with '&lt;' and so on), but for performance reasons this method is not suitable.
The other alternative is to use an XML text writer, but with a ton of meta info, that means writing a lot of tags in code. With sw.WriteLine("stuff"), I can simply put parts of the meta info in one tag (as a string) and write them to the file. With XSLT, the problem I faced was that tag names would need spaces: for tabular data, for example, the top-row fields could contain spaces.
How do I go about creating a well-formed XML file with a lot of meta info, where characters such as '<' and '>' are escaped?
Uri.EscapeDataString(string stringToEscape);
XDocument tutorials.
Why not create the .xls file in the first place? There is a nice library for that:
http://npoi.codeplex.com/
I used the WriteRaw method for writing the meta info tags. For the other data, which needed to be escaped, I used the WriteString method.
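A short sketch of that WriteRaw/WriteString split. The element and attribute names (Report, Meta, Cell, Author) and the ReportWriter class are hypothetical placeholders, not the real spreadsheet schema, and the output goes to a StringWriter only so it is easy to inspect; a StreamWriter for the report file works the same way. WriteRaw copies its argument verbatim, so it should only receive markup you built yourself and know to be well-formed.

```csharp
using System.IO;
using System.Xml;

public static class ReportWriter
{
    public static string Build(string cellText)
    {
        var sw = new StringWriter();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (XmlWriter writer = XmlWriter.Create(sw, settings))
        {
            writer.WriteStartElement("Report");

            // Trusted, pre-built markup (the meta info) is copied verbatim.
            writer.WriteRaw("<Meta Author=\"report-generator\"/>");

            // Data goes through WriteString, which escapes &, < and >.
            writer.WriteStartElement("Cell");
            writer.WriteString(cellText);
            writer.WriteEndElement();

            writer.WriteEndElement();
        }
        return sw.ToString();
    }
}
```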
I'm reading XML data from a varchar column in a SQL db, into a linq to sql XElement belonging to an XDocument.
When I execute the XDocument.Save method, the XML is written to the file but includes the escape characters. For example, ">" is changed to "&gt;".
Is there an easy way to prevent this?
First, there seems to be no reason to prevent it. As kenny mentioned, unless special characters are XML-encoded, no parser would be able to parse the produced XML (because the '<' and '>' characters mean a lot to that parser). Second, when your parser decodes the XML (e.g. when you call XElement.Value), all special characters are converted back to what they originally were. Finally, if you want to keep the original string (e.g. for purposes other than XML parsing), you can use CDATA, which in LINQ to XML is represented by the XCData class.
EDIT: As Rob pointed out, I might have gotten it wrong. If the point is to add existing XML to a document without the special characters being escaped, use the following code:
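A minimal LINQ to XML sketch of the two behaviours described above; LinqEscaping, AsText and AsCData are hypothetical names. A plain string value is escaped when serialized but comes back unchanged through Value, while wrapping it in XCData keeps the characters as they are inside a CDATA section.

```csharp
using System.Xml.Linq;

public static class LinqEscaping
{
    // Stored as a text node: serialized with '>' escaped as &gt;,
    // but XElement.Value returns the original string.
    public static XElement AsText(string s) => new XElement("data", s);

    // Stored as a CDATA node: serialized verbatim inside <![CDATA[ ... ]]>.
    public static XElement AsCData(string s) => new XElement("data", new XCData(s));
}
```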
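using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

XDocument document = new XDocument();
var xmlFromDb = "<xml>content</xml>";
// Read the string as markup, so it is added as real nodes instead of escaped text.
using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(xmlFromDb)))
using (var reader = XmlReader.Create(stream))
{
    reader.MoveToContent();
    document.Add(XNode.ReadFrom(reader));
}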
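For a string that is already in memory, XElement.Parse is a shorter way to get the same result as the reader-based code above. The sketch below (DbXml and its methods are hypothetical names) contrasts adding the database string as text, which is what produces the escaped output, with parsing it into nodes first.

```csharp
using System.Xml.Linq;

public static class DbXml
{
    // The string becomes a text node, so its markup is escaped on save.
    public static XDocument AsEscapedText(string xmlFromDb) =>
        new XDocument(new XElement("root", xmlFromDb));

    // Parsing turns the string into real element nodes, so it is saved as markup.
    public static XDocument AsParsedMarkup(string xmlFromDb) =>
        new XDocument(new XElement("root", XElement.Parse(xmlFromDb)));
}
```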