We are generating an XML file in C# using XmlSerializer and UTF-8 encoding. We checked the output: the XML is well formed and passes XSD validation.
We send this XML to a customer who loads it in a UNIX environment. They keep telling us that the XML is not valid and has invalid characters. We don't have a UNIX environment to test in.
The question is: is there any difference when loading XML files on UNIX?
What can we ask the customer to provide to better understand this situation?
You might have a UTF-8 BOM as the first three bytes of your file:
<?xml version="1.0" encoding="utf-8"?>
It is not part of the XML document, so a file reader should not pass it on to be interpreted by the XML parser. If you have it, you could try removing it and see if your users have the same complaint. Most editors will not show it to you, so you might have to use a hex editor (hex: EF BB BF).
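If the file comes out of XmlSerializer, one way to leave the BOM out is to hand the serializer an XmlWriter whose encoding is a UTF8Encoding built with the BOM switched off. This is only a sketch: the `Order` type and the `BomFreeXml` helper are made-up names standing in for whatever you actually serialize.

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical payload type, only here to make the sketch compile.
public class Order
{
    public string Id { get; set; }
}

public static class BomFreeXml
{
    // Serializes the value as UTF-8 XML without the EF BB BF prefix.
    public static byte[] Serialize<T>(T value)
    {
        var settings = new XmlWriterSettings
        {
            // false = do not emit the UTF-8 byte order mark
            Encoding = new UTF8Encoding(false),
            Indent = true
        };
        using (var buffer = new MemoryStream())
        {
            using (XmlWriter writer = XmlWriter.Create(buffer, settings))
            {
                new XmlSerializer(typeof(T)).Serialize(writer, value);
            }
            return buffer.ToArray();
        }
    }

    // True when the data starts with the UTF-8 BOM bytes EF BB BF.
    public static bool StartsWithUtf8Bom(byte[] bytes)
    {
        return bytes.Length >= 3
            && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF;
    }
}
```

Running `StartsWithUtf8Bom` over a file you already sent is a quick way to confirm whether the BOM was there, without needing a hex editor.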
If the problem remains, you'd need to know at what byte offset the purported invalid characters are and which section of the XML specification they violate. Which program and version they use, and what feedback it gives, might be helpful too.
You might also consider that the file is getting damaged in delivery. A round trip transmission might help detect that.
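A cheaper check than a full round trip is to compare a checksum of the file on both sides. That is my substitution, not something the customer's tooling requires. The sketch below computes a lowercase hex SHA-256 in C#; `FileFingerprint` is a made-up helper name.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class FileFingerprint
{
    // Lowercase hex SHA-256 of everything readable from the stream.
    public static string Sha256Hex(Stream input)
    {
        using (SHA256 sha = SHA256.Create())
        {
            byte[] hash = sha.ComputeHash(input);
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }

    // Convenience overload for a file on disk.
    public static string Sha256Hex(string path)
    {
        using (FileStream file = File.OpenRead(path))
        {
            return Sha256Hex(file);
        }
    }
}
```

You could ask the customer to run a tool such as `sha256sum` (where it is available) on the file they received and send you the digest. If the two values differ, the file changed in transit, for example through a transfer mode that rewrites line endings.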
I am trying to decode a custom XML config file in C#, but I am having trouble creating this file from the string I get after my decode step.
After trying to build the xml, I got this error:
System.Xml.XmlException : 'Name cannot begin with the '0' character, hexadecimal value 0x30. Line 1, position 2.'
I know my XML is not valid because of its bad formatting, but I would like to know if there is a way to build it anyway.
Format of the "XML file":
<01_config.xml>
<name dataType="String">some_name</name>
<description dataType="String">some_description</description>
</01_config.xml>
If I replace 01_config.xml with config while debugging, everything works fine, since it becomes a valid XML file. But it will not be a good format for my config.
I guess I can still build the file without using the C# XML tools, but I would like to know whether it's possible to do it with them in the first place.
XML component (element and attribute) names may not begin with a number.
Strictly speaking, this is a matter of the rules for being an XML document – well-formedness, not validity.
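To make the rule concrete, here is a rough C# sketch of checking a candidate name, plus two ways to stay inside the standard. The `config` element, the `file` attribute and the `ElementNames` class are names I made up for illustration; they are not part of the asker's format.

```csharp
using System.Xml;
using System.Xml.Linq;

public static class ElementNames
{
    // True when the text can be used as an unprefixed element or attribute name.
    public static bool IsValidName(string candidate)
    {
        if (string.IsNullOrEmpty(candidate)) return false;
        try
        {
            XmlConvert.VerifyNCName(candidate);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    // Option A: keep a fixed, legal element name and carry the file name as data.
    // "config" and "file" are hypothetical names chosen for this sketch.
    public static XElement WrapWithAttribute(string fileName)
    {
        return new XElement("config",
            new XAttribute("file", fileName),
            new XElement("name", new XAttribute("dataType", "String"), "some_name"));
    }

    // Option B: escape the characters that are illegal in a name.
    // XmlConvert.DecodeName turns the escaped form back into the original text.
    public static string EscapeName(string fileName)
    {
        return XmlConvert.EncodeLocalName(fileName);
    }
}
```

Option A is usually easier on whoever reads the file later, because the file name stays readable. Option B keeps the name in the tag but escapes the leading digit, so other tools have to decode it.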
Reasons to correct this mistake
You want your document to be XML.
You want users of your document to be able to use the XML ecosystem of editors, parsers, validators, databases, transformation/selection languages, and libraries available in many languages.
You want the interoperability benefits of using a standard.
You want to participate in a community of users – tapping into, and contributing to, a collective body of knowledge to the community's mutual benefit.
Reasons to proceed with bad "XML"
You like the aesthetics of your "XML" variant because it's a "good format for my config".
Recommendation
Fix the mistake and work with standard XML.
See also
How to have an XML tag start with a number?
How to parse invalid (bad / not well-formed) XML?
I know my XML is not valid because of its bad formatting, but I would like to know if there is a way to build it anyway.
Sure, you can build files that aren't well-formed XML, but then you won't be able to read them using XML tools.
I'm new to PowerShell and I use Get-Content and then .replace to edit part of web.config on a server. Is this method of editing web.config safe or not?
Thanks.
In a word, no.
web.config is an XML file, and should be edited as an XML document. The file contents are text (instead of binary), but the catch is that the XML element structure must be maintained. Editing the file as text is prone to mistakes, such as mismatched opening and closing tags, missing elements, and reserved characters left unescaped.
If one makes a single mistake while editing XML as if it were an ordinary text file, any application that expects a well-formed XML document will throw an error.
To be honest, it is possible to update an XML config as if it were a text file. Why take the risk, though? Since .NET, and thus PowerShell, can parse and process XML, load the config file as an XML document, update the contents and save it. When the file is processed as an XML document, the .NET libraries take care of escaping and structure, so the result is always well-formed XML.
There are existing questions on SO; try searching for "powershell web.config $setting-you-want-to-change" for similar cases.
Also, the config can be updated via web administration cmdlets, for example, adding SSL settings.
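For what loading the config as an XML document can look like, here is a small sketch in C# using XDocument; PowerShell can call the same .NET classes. I'm assuming the setting lives under `appSettings` as an `add` element with `key` and `value` attributes. The `ConfigEditor` class and the keys you pass in are placeholders.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class ConfigEditor
{
    // Sets <appSettings><add key="..." value="..."/></appSettings> on a parsed config.
    public static void SetAppSetting(XDocument config, string key, string value)
    {
        XElement root = config.Root;
        if (root == null) throw new InvalidOperationException("Config has no root element.");

        XElement appSettings = root.Element("appSettings");
        if (appSettings == null)
        {
            appSettings = new XElement("appSettings");
            root.Add(appSettings);
        }

        XElement entry = appSettings.Elements("add")
            .FirstOrDefault(e => (string)e.Attribute("key") == key);
        if (entry == null)
        {
            appSettings.Add(new XElement("add",
                new XAttribute("key", key),
                new XAttribute("value", value)));
        }
        else
        {
            // The library escapes &, < and quotes when the document is written out.
            entry.SetAttributeValue("value", value);
        }
    }

    // Loads the file, applies the change and writes it back.
    public static void UpdateFile(string path, string key, string value)
    {
        XDocument config = XDocument.Load(path, LoadOptions.PreserveWhitespace);
        SetAppSetting(config, key, value);
        config.Save(path, SaveOptions.DisableFormatting);
    }
}
```

The point of the design is that a value such as `b & c` can be passed in as plain text and still ends up correctly escaped in the file, which is the sort of detail a text replace tends to miss.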
In the XML I need to read in C#, I find characters such as
Ã©, Ã‰.
As far as I know, I should not find those characters in a windows-1252 encoded XML. Can I fix that problem in C#, or must the XML itself be updated?
Thanks in advance.
It does look like the XML needs to be updated.
You could certainly write something that reads it in as the UTF-8 it really is and writes it back out as the Windows-1252 it claimed to be, but why bother? XML in Windows-1252 is like someone using their smart-phone while dressed as ye olde knight at a Renaissance Faire anyway. Just drop the incorrect declaration from the first line and away you go.
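If you cannot get the file fixed at the source, a sketch of the "read it as the UTF-8 it really is" route could look like this in C#. It assumes the bytes really are UTF-8; the strict decoder throws instead of guessing if they are not. `MislabelledXml` is a made-up helper name.

```csharp
using System.IO;
using System.Text;
using System.Xml.Linq;

public static class MislabelledXml
{
    // Decodes the stream as UTF-8 and parses it, ignoring the encoding
    // named in the XML declaration.
    public static XDocument LoadAsUtf8(Stream input)
    {
        // throwOnInvalidBytes = true: fail loudly if the data is not UTF-8 after all.
        var strictUtf8 = new UTF8Encoding(false, true);
        using (var reader = new StreamReader(input, strictUtf8))
        {
            // The parser receives already decoded text, so the declared
            // encoding is not used for decoding.
            XDocument doc = XDocument.Load(reader);
            if (doc.Declaration != null)
            {
                // Correct the label so that a later Save writes UTF-8.
                doc.Declaration.Encoding = "utf-8";
            }
            return doc;
        }
    }
}
```

Correcting the declaration after loading matters because saving the document otherwise tries to honour the old, wrong label.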
The simple answer is: you're probably using the wrong encoding. From this I'd say you should be using UTF-8. You can force it by downloading the document before parsing it.
I should note that downloading URLs is tricky: web servers often report the wrong encoding. That is also the reason why the HTML5 standard includes a section on encoding detection. I'm afraid there's no easy generic solution for this -- we ended up implementing our own encoding detection algorithms for our web crawlers.
I'm making a world editor for a game in C# XNA.
The file contains a large amount of data, so I feel XmlWriter is necessary.
The application runs perfectly fine. Files are saved in a directory where they're immediately accessible; however, for the file to load directly into the pipeline it's necessary to include the line
<Asset Type = ObjectID.objectID[]>
Unfortunately this includes hexadecimal characters not supported by XmlWriter, XDocument and XmlDocument, so I'm wondering if there's a way around it, or perhaps there's an XML type I've not tried that allows odd hexadecimal characters.
If there isn't, that's quite alright as I've a back-up plan, but I'm just wondering.
Thank you kindly for the read and I hope my question is well written. :)
I found that I was able to use WriteRaw to write the line as a raw string, though this breaks the file format :(
writer.WriteRaw("<Asset Type = \"objectID.objectID[]>\"");
Sorry to be the one to answer my own question but thanks for the support all the same.
<?xml version="1.0" encoding="utf-8" standalone="no"?>
<XnaContent><Asset Type = "objectID.objectID[]>"<Item><ID>2</ID><xPos>640</xPos><yPos>280</yPos> <xPath>0</xPath><yPath>0</yPath></Item></Asset></XnaContent>
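For comparison, square brackets are legal inside a quoted attribute value, so the writer can produce this line without WriteRaw. The sketch below assumes the line the pipeline wants is `<Asset Type="ObjectID.objectID[]">` with the type name in quotes. `AssetWriter` is a made-up helper, and only the `ID` field from the output above is written.

```csharp
using System.Text;
using System.Xml;

public static class AssetWriter
{
    // Writes <XnaContent><Asset Type="..."><Item><ID>..</ID></Item></Asset></XnaContent>.
    public static string Write(string assetType, int id)
    {
        var output = new StringBuilder();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            writer.WriteStartElement("XnaContent");
            writer.WriteStartElement("Asset");
            // "[]" is ordinary text in an attribute value; the writer adds the quotes.
            writer.WriteAttributeString("Type", assetType);
            writer.WriteStartElement("Item");
            writer.WriteElementString("ID", XmlConvert.ToString(id));
            writer.WriteEndElement(); // Item
            writer.WriteEndElement(); // Asset
            writer.WriteEndElement(); // XnaContent
        }
        return output.ToString();
    }
}
```

Because the writer closes every element itself, the result stays well-formed, which the WriteRaw version above does not.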
I am writing a parser for incoming text files. I have it to the point where it parses everything accurately.
I have an option for it to output to text - this was done to check the accuracy of the parsing. I am currently implementing an option to write to a spreadsheet but it doesn't output everything yet.
I have a request to output as static HTML. Is it worth outputting to XML and then generating HTML from that?
I see C# has the XMLTransform class, which looks like it would do what I need. Is using the XML designer in VS and writing the XSLT file easier than hand-coding all of the HTML output? I know Excel will import XML files, but it is a little messy and I don't get the formatting options I can get if I generate the .xls file directly.
I would give you a qualified No.
It is generally not worth building XML then running it through an XSLT transformation to build HTML.
That said, I might consider such an option if I wanted to easily swap out transformations, such as if this is an app used by multiple clients and the generated HTML would be client dependent. Even then I'd investigate using a simple tokenized HTML template in which I just plugged in the data I wanted. However, if the transformation was sufficiently complex then, yes, I'd go the XSLT route.
The reason for the No is that the conversion adds a level of complexity that is usually not worth the time involved.
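By a tokenized HTML template I mean something along these lines. It is a bare-bones sketch: the `{{token}}` syntax and the `HtmlTemplate` class are my own invention for the example, not an existing library.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

public static class HtmlTemplate
{
    // Matches tokens of the form {{name}}.
    private static readonly Regex Token = new Regex(@"\{\{(\w+)\}\}");

    // Replaces each known {{token}} with its HTML-encoded value in a single pass.
    // Unknown tokens are left in place so that they are easy to spot in the output.
    public static string Fill(string template, IDictionary<string, string> values)
    {
        return Token.Replace(template, match =>
        {
            string value;
            return values.TryGetValue(match.Groups[1].Value, out value)
                ? WebUtility.HtmlEncode(value)
                : match.Value;
        });
    }
}
```

The parser output goes in as a dictionary and the template is an ordinary HTML file, so swapping the layout for another client means swapping the file rather than touching code.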