I have a need to normalize XML streams to UTF-16. I use the following method:
All streams passed are byte streams: MemoryStream or FileStream. My problem is when I pass in a filestream containing the following (correctly encoded) XML as jobTicket:
<?xml version="1.0" encoding="utf-8"?>
<workflow>
<file>
<request name="create-temp-file" tag="очень">
</request>
<request name="create-temp-folder" tag="非常に">
</request>
</file>
</workflow>
After the call, ticketStreamU16 contains UTF-16 encoded text, but its XML declaration still says encoding="utf-8". This is not well-formed XML.
public void EncodeJobTicket(Stream jobTicket, Stream ticketStreamU16)
{
XmlWriterSettings settings = new XmlWriterSettings();
settings.Encoding = Encoding.Unicode;
if (jobTicket.CanSeek)
{
jobTicket.Position = 0;
}
using (XmlReader xmlRdr = XmlReader.Create(jobTicket))
using (XmlWriter xmlWtr = XmlWriter.Create(ticketStreamU16, settings))
{
xmlWtr.WriteNode(xmlRdr, false);
}
}
What am I missing? Shouldn't xmlWtr write the correct XML declaration? Do I have to look for a declaration and replace it?
You should skip the XML declaration in the input stream, because WriteNode copies it unchanged. The XmlWriter will then write the correct one for you:
using (XmlWriter xmlWtr = XmlWriter.Create(ticketStreamU16, settings))
{
xmlRdr.MoveToContent();
xmlWtr.WriteNode(xmlRdr, false);
}
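For completeness, here is a minimal sketch of the whole method with that change applied. The class name JobTicketEncoder is only an illustrative wrapper and is not part of the original code. Note that MoveToContent also skips any comments or processing instructions that precede the root element, so those are not copied either.

```csharp
using System.IO;
using System.Text;
using System.Xml;

public static class JobTicketEncoder
{
    // Copies the XML in jobTicket to ticketStreamU16, re-encoded as UTF-16.
    public static void EncodeJobTicket(Stream jobTicket, Stream ticketStreamU16)
    {
        var settings = new XmlWriterSettings { Encoding = Encoding.Unicode };

        if (jobTicket.CanSeek)
        {
            jobTicket.Position = 0;
        }

        using (XmlReader xmlRdr = XmlReader.Create(jobTicket))
        using (XmlWriter xmlWtr = XmlWriter.Create(ticketStreamU16, settings))
        {
            // Skip the source declaration so it is not copied to the output;
            // the writer emits its own declaration matching settings.Encoding.
            xmlRdr.MoveToContent();
            xmlWtr.WriteNode(xmlRdr, false);
        }
    }
}
```

Both streams stay open after the call, because CloseInput and CloseOutput default to false.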
Related
I sent an xml file which I created while serializing an object and received a response that it is incorrect and not well-formed:
<?xml version="1.0" encoding="utf-8"?>
Moreover, I am supposed to use ISO-8859-1.
I assume that I not only have to change <?xml version="1.0" encoding="ISO-8859-1"?>, but additionally I have to create the file during serialization from the code already with encoding ISO-8859-1. Correct?
I am doing it this way:
XmlSerializer ser = new XmlSerializer(obj.GetType());
var encoding = Encoding.GetEncoding("ISO-8859-1");
XmlWriterSettings xmlWriterSettings = new XmlWriterSettings
{
Indent = true,
OmitXmlDeclaration = false,
Encoding = encoding
};
XmlDocument xd = null;
using (MemoryStream memStm = new MemoryStream())
{
using (var xmlWriter = XmlWriter.Create(memStm, xmlWriterSettings))
{
ser.Serialize(xmlWriter, input);
}
memStm.Position = 0;
XmlReaderSettings settings = new XmlReaderSettings();
using (var xtr = XmlReader.Create(memStm, settings))
{
xd = new XmlDocument();
xd.Load(xtr);
}
}
byte[] file = encoding.GetBytes(xd.OuterXml);
I used a framework to find out what encoding my created files have and when I create them with ISO-8859-1 as above my encoding checker gives me ASCII, is that correct?
I sent an xml file which I created while serializing an object and received a response that it is incorrect and not well-formed:
The ï»¿ characters represent a BOM (byte order mark) for UTF-8 files. A BOM may appear at the start of a UTF-8 encoded file, so your XML is valid if it is read properly.
More information about BOM: http://www.unicode.org/faq/utf_bom.html#bom1
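If the receiver cannot cope with the BOM, one option is to write UTF-8 without it. The sketch below is only an illustration (the BomDemo class and its method names are made up for this example): it writes a tiny document with a given encoding and checks whether the output starts with the UTF-8 BOM bytes EF BB BF.

```csharp
using System.IO;
using System.Text;
using System.Xml;

public static class BomDemo
{
    // Writes a tiny document with the given encoding and returns the raw bytes.
    public static byte[] WriteXml(Encoding encoding)
    {
        var settings = new XmlWriterSettings { Encoding = encoding };
        using (var stream = new MemoryStream())
        {
            using (XmlWriter writer = XmlWriter.Create(stream, settings))
            {
                writer.WriteElementString("root", "text");
            }
            return stream.ToArray();
        }
    }

    // True when the buffer starts with the UTF-8 BOM bytes EF BB BF.
    public static bool StartsWithUtf8Bom(byte[] bytes)
    {
        return bytes.Length >= 3
            && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF;
    }
}
```

Passing Encoding.UTF8 produces output with the BOM, while new UTF8Encoding(false) produces the same document without it.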
I assume that I not only have to change <?xml version="1.0" encoding="ISO-8859-1"?>, but additionally I have to create the file during serialization from the code already with encoding ISO-8859-1. Correct?
Correct.
I used a framework to find out what encoding my created files have and when I create them with ISO-8859-1 as above my encoding checker gives me ASCII, is that correct?
The encoding of a text file cannot be determined exactly; it can only be "guessed" by analyzing its bytes. Many encodings, including ISO-8859-1 and UTF-8, represent the ASCII characters with the same byte values, so if your file contains only ASCII characters, ASCII is a reasonable result.
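A small sketch can make this concrete. The helper below (the name EncodingGuessDemo is hypothetical) checks whether a string produces identical bytes in ASCII, ISO-8859-1 and UTF-8. For pure ASCII text the three byte sequences are the same, which is why a detector cannot tell them apart.

```csharp
using System.Linq;
using System.Text;

public static class EncodingGuessDemo
{
    // True when the text produces identical bytes in ASCII, ISO-8859-1 and UTF-8.
    public static bool LooksTheSameInAllThree(string text)
    {
        byte[] ascii = Encoding.ASCII.GetBytes(text);
        byte[] latin1 = Encoding.GetEncoding("ISO-8859-1").GetBytes(text);
        byte[] utf8 = Encoding.UTF8.GetBytes(text);
        return ascii.SequenceEqual(latin1) && latin1.SequenceEqual(utf8);
    }
}
```

As soon as the content includes a character outside ASCII, such as é, the byte sequences differ and the checker has something to distinguish.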
I have created the following wrapper method to disable DTD
public class Program
{
public static void Main(string[] args)
{
string s = @"<?xml version =""1.0"" encoding=""utf-16""?>
<ArrayOfSerializingTemplateItem xmlns:xsd=""http://www.w3.org/2001/XMLSchema"" xmlns:xsi=""http://www.w3.org/2001/XMLSchema-instance"">
<SerializingTemplateItem>
</SerializingTemplateItem>
</ArrayOfSerializingTemplateItem >";
try
{
XmlReader reader = XmlWrapper.CreateXmlReaderObject(s);
XmlSerializer sr = new XmlSerializer(typeof(List<SerializingTemplateItem>));
Object ob = sr.Deserialize(reader);
}
catch (Exception ex)
{
Console.WriteLine(ex);
throw;
}
Console.ReadLine();
}
}
public class XmlWrapper
{
public static XmlReader CreateXmlReaderObject(string sr)
{
byte[] byteArray = Encoding.UTF8.GetBytes(sr);
MemoryStream stream = new MemoryStream(byteArray);
stream.Position = 0;
XmlReaderSettings settings = new XmlReaderSettings();
settings.ValidationType = ValidationType.None;
settings.DtdProcessing = DtdProcessing.Ignore;
return XmlReader.Create(stream, settings);
}
}
public class SerializingTemplateItem
{
}
The above throws exception "There is no Unicode byte order mark. Cannot switch to Unicode." (Demo fiddle here: https://dotnetfiddle.net/pGxOE9).
But if I use the following code to create the XmlReader instead of calling the XmlWrapper method, it works fine.
StringReader stringReader = new StringReader( xml );
XmlReader reader = new XmlTextReader( stringReader );
But I need to use the wrapper method as a security requirement to disable DTD. I don't know why I am unable to deserialize after calling my wrapper method. Any help will be highly appreciated.
Your problem is that you have encoded the XML into a MemoryStream using Encoding.UTF8, but the XML string itself claims to be encoded in UTF-16 in the encoding declaration in its XML text declaration:
<?xml version ="1.0" encoding="utf-16"?>
<ArrayOfSerializingTemplateItem>
<!-- Content omitted -->
</ArrayOfSerializingTemplateItem >
Apparently, when the XmlReader encounters this declaration, it tries to honor it and switch from UTF-8 to UTF-16, but fails, possibly because the stream really is encoded in UTF-8. Conversely, when the deprecated XmlTextReader encounters the declaration, it apparently just ignores it as not implemented, which happens to make things work in this situation.
The simplest way to resolve this is to read directly from the string with a StringReader, via XmlReader.Create(TextReader, XmlReaderSettings):
public class XmlWrapper
{
public static XmlReader CreateXmlReaderObject(string sr)
{
var settings = new XmlReaderSettings
{
ValidationType = ValidationType.None,
DtdProcessing = DtdProcessing.Ignore,
};
return XmlReader.Create(new StringReader(sr), settings);
}
}
Since a C# string is always encoded internally in UTF-16, the encoding declaration in the XML will be ignored as irrelevant. This will also be more performant, as the conversion to an intermediate byte array is skipped entirely.
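To illustrate that the declared encoding no longer matters once the reader works from text rather than bytes, here is a minimal sketch. StringReaderDemo and ReadRootName are hypothetical names used only for this example.

```csharp
using System.IO;
using System.Xml;

public static class StringReaderDemo
{
    // Returns the root element name of an XML string, reading it through a
    // StringReader so that no byte-level decoding takes place.
    public static string ReadRootName(string xml)
    {
        var settings = new XmlReaderSettings { DtdProcessing = DtdProcessing.Ignore };
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            reader.MoveToContent();
            return reader.Name;
        }
    }
}
```

The same call succeeds whether the string declares encoding="utf-16" or encoding="utf-8", since the characters have already been decoded.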
Incidentally, you should dispose of your XmlReader via a using statement:
Object ob;
using (var reader = XmlWrapper.CreateXmlReaderObject(s))
{
XmlSerializer sr = new XmlSerializer(typeof(List<SerializingTemplateItem>));
ob = sr.Deserialize(reader);
}
Working sample fiddle here.
Related questions:
Meaning of - <?xml version="1.0" encoding="utf-8"?>
Ignoring specified encoding when deserializing XML
I have to generate an XML file using "ISO-8859-1" encoding from my ASP.NET Web API application, but MemoryStream lowercases the encoding attribute in the generated XML declaration to "iso-8859-1".
This method generates an XML file based on an object that was created from an XSD.
public static MemoryStream GenerateXml<T>(T entity) where T : class
{
XmlSerializerNamespaces ns = new XmlSerializerNamespaces();
//Add an empty namespace and empty value
ns.Add("", "");
var memoryStream = new MemoryStream();
var streamWriter = new StreamWriter(memoryStream, Encoding.GetEncoding("ISO-8859-1"));
var serializer = new XmlSerializer(typeof(T));
serializer.Serialize(streamWriter, entity, ns);
return memoryStream;
}
Then I need to use XDocument to strip the namespace prefix declarations from the XML elements (it's a prerequisite that all elements be named with only their own tags). So I had to do this:
public MemoryStream GenerateXmlOpening<T>(T entity) where T : class
{
var xmlMemStream = XmlHelper.GenerateXml(entity);
xmlMemStream.Position = 0;
XDocument doc = XDocument.Load(xmlMemStream, LoadOptions.PreserveWhitespace);
//Removes the namespace declaration as prefix on elements
doc.Descendants().Attributes().Where(a => a.IsNamespaceDeclaration).Remove();
//the memory stream retrieved from 'xmlMemStream' already has "iso-8859-1" in lowercase, so I'm trying to override it
doc.Declaration.Encoding = "ISO-8859-1";
MemoryStream stream = new MemoryStream();
// when I save the XDocument to the new MemoryStream, the encoding goes from "ISO-8859-1" to "iso-8859-1" again.
doc.Save(stream);
stream.Position = 0;
return stream;
}
This is the beginning of the returned generated XML file:
<?xml version="1.0" encoding="iso-8859-1"?>
... content
How it's supposed to be:
<?xml version="1.0" encoding="ISO-8859-1"?>
... content
P.S. I'm writing the XML to a MemoryStream because I have to build a .zip file and return all generated XML files within it in a response. The .zip generator receives a list of MemoryStreams.
I'm trying to read an XML file and I get an XmlException : "data at the root level is invalid. Line 1, position 1".
Here is the content of the XML file :
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<root>
<Materials override="TRUE">
<Material name="" diffuse="" />
</Materials>
</root>
And here is my code :
using (FileStream fstr = File.OpenRead(sFullPath))
{
XmlReaderSettings settings = new XmlReaderSettings();
settings.ConformanceLevel = ConformanceLevel.Document;
fstr.Position = 0;
using (XmlReader xmlReader = XmlReader.Create(fstr, settings))
{
while (xmlReader.Read())
{
}
}
}
The exception is raised by the call to Read().
I've been searching for an answer on different sites, had a look at the MSDN too, but can't solve my problem.
My code is taken from http://www.codeproject.com/Articles/318876/Using-the-XmlReader-class-with-Csharp but I tried different snippets too.
I also checked the encoding of my file on Notepad++, tried both UTF-8 and UTF-8 without BOM, didn't make a change.
I'm stuck on this for a couple of days and I'm running out of ideas.
Thanks for your help!
Edit : removed the "..." in the snippet to avoid confusing people. I also did a try with :
using (XmlTextReader xmlReader = new XmlTextReader(fstr))
and it appears that xmlReader.Encoding is returning null, whereas my file is encoded to UTF-8.
I have a problem when I deserialize the xml into List of Objects. I searched it on the net this morning, but my problem isn't resolved.
Deserialization method
public static List<FileAction> DeSerialization()
{
XmlRootAttribute xRoot=new XmlRootAttribute();
xRoot.ElementName="ArrayOfSerializeClass";
xRoot.IsNullable=true;
XmlSerializer serializer = new XmlSerializer(typeof(List<FileAction>),xRoot);//, new XmlRootAttribute("ArrayOfSerializeClass")
using (Stream streamReader = File.OpenRead(@"C:\serialization\SerializationWithFileWatcher\Output\XmlSerialize.xml"))//FileStream fs =new FileStream(xmlPath,FileMode.Open)
{
using (XmlReader reader = XmlReader.Create(streamReader))
{
// Deserialize only once: a second call on the same reader would start past the end of the document.
return (List<FileAction>)serializer.Deserialize(reader);
}
}
}
Calling Method
String resultPath = @"C:\serialization\SerializationWithFileWatcher\Output\XmlSerialize.xml";
if (!File.Exists(resultPath))
{
XmlSerializer xs = new XmlSerializer(typeof(List<SerializeClass>));
using (FileStream fileStream = new FileStream(@"C:\serialization\SerializationWithFileWatcher\Output\XmlSerialize.xml", FileMode.Create))
{
xs.Serialize(fileStream, serializeList);//seri
fileStream.Close();
}
Console.WriteLine("Successfully serialized to XML");
}
else
{
//string path= @"C:\serialization\SerializationWithFileWatcher\Output\XmlSerialize.xml";
DeSerialization();
XmlSerializer xs = new XmlSerializer(typeof(List<SerializeClass>));
FileStream fs = new FileStream(@"C:\serialization\SerializationWithFileWatcher\Output\XmlSerialize.xml", FileMode.Append, FileAccess.Write);
using (XmlWriter xwr = XmlWriter.Create(fs))//TextWriter xwr = new StreamWriter
{
xs.Serialize(xwr, serializeList);//seri
//fs.Close();
}
Console.WriteLine("Successfully serialized to XML");
}
return serializeList;
The reason why I am calling it here is that I want to add this object again to the xml file.
The error is: "There is an error in XML document (15, 27)."
My Xml structure
<?xml version="1.0"?>
<ArrayOfSerializeClass xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<SerializeClass>
<creationTime>2013-11-25T09:53:25.3325289+05:30</creationTime>
<fileAction>Renamed</fileAction>
<Properties>
<FileAttributes fileName="validate json.txt">
<fileSize>307</fileSize>
<extension>.txt</extension>
<lastAccessTime>2013-11-25T09:53:25.3325289+05:30</lastAccessTime>
<fullPath>C:\serialization\SerializationWithFileWatcher\SerializationWithFileWatcherProj\validate json.txt</fullPath>
</FileAttributes>
</Properties>
</SerializeClass>
</ArrayOfSerializeClass>
What I understand from the code above is that you are trying to extend the current XML, by first reading it as a FileStream and then using an XmlWriter to add some more content to it.
If my understanding is correct, then you are trying to write to the end of an existing XML file, which is not allowed since any XML document can have only one root node. In your case, that root node is ArrayOfSerializeClass.
So, in order to successfully achieve your task, you must append your XML within the root node.
Update:
Possible solution here: how to append a xml file in c#?
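As a rough sketch of what "appending within the root node" can look like, the example below uses LINQ to XML instead of opening the file in append mode. This is one possible approach, not necessarily the one in the linked answer, and the XmlAppender class and AppendToRoot method are hypothetical names.

```csharp
using System.Xml.Linq;

public static class XmlAppender
{
    // Returns the document text with newItem added as the last child of the
    // root element, so the result still has exactly one root node.
    public static string AppendToRoot(string existingXml, XElement newItem)
    {
        XDocument doc = XDocument.Parse(existingXml);
        doc.Root.Add(newItem);
        return doc.ToString();
    }
}
```

For a file on disk, the same idea applies with XDocument.Load(path) to read the document and doc.Save(path) to write the whole document back, rather than appending bytes to the end of the file.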