Deserializing string list with "\n" results in empty string - c#

I have been banging my head on this one for a bit. It seems like it must be a simple solution but I have searched the internet and tried quite a few things.
I have a complex object which includes a string list that needs to be serialized into xml and then deserialized.
The serialization code has long since been part of the application and works in countless other scenarios but the issue here appears to be that one of the elements in the string list is a mere new line character (i.e. "\n").
As far as I can tell from my research, it serializes as expected (see below), but after deserialization the element contains an empty string (i.e. "") instead of "\n".
Here is the code...
using System.IO;
using System.Text;
using System.Xml.Serialization;

public void DoStuff(ItemTypeObj item)
{
    // Round trip: object -> XML string -> object.
    string myItem = XmlSerialize<ItemTypeObj>(item);
    ItemTypeObj myNewItemTypeObj = XmlDeserialize<ItemTypeObj>(myItem);
}

public static string XmlSerialize<T>(T objectToSerialize)
{
    string ret = string.Empty;
    XmlSerializer s = new XmlSerializer(typeof(T));
    using (MemoryStream ms = new MemoryStream())
    {
        s.Serialize(ms, objectToSerialize);
        ms.Position = 0; // rewind so the XML can be read back as text
        using (StreamReader sr = new StreamReader(ms))
        {
            ret = sr.ReadToEnd();
        }
    }
    return ret;
}

public static T XmlDeserialize<T>(string serializedObject)
{
    T retVal = default(T);
    byte[] ba = Encoding.UTF8.GetBytes(serializedObject);
    using (MemoryStream ms = new MemoryStream(ba))
    {
        XmlSerializer s = new XmlSerializer(typeof(T));
        // Deserializing straight from the stream is where "\n" is lost.
        retVal = (T)s.Deserialize(ms);
    }
    return retVal;
}
To give you an idea of the data sent in, ItemTypeObj is the object which includes a string List. The string list can be variable length but sample data could look like this...
[0] = "Zero element text \n"
[1] = "[element1]"
[2] = "\n"
[3] = "[element3]"
[4] = "\n"
[5] = "[element5]"
When serialized it will look like this (which seems correct to me):
<Text>
<string>Zero element text
</string>
<string>[element1]</string>
<string>
</string>
<string>[element3]</string>
<string>
</string>
<string>[element5]</string>
</Text>
From what I've read the newlines are represented as expected in the xml above. The issue is after it is deserialized the string list is this:
[0] = "Zero element text \n"
[1] = "[element1]"
[2] = ""
[3] = "[element3]"
[4] = ""
[5] = "[element5]"
Only the newline characters in the elements that also have text (e.g. [0]) will still exist. The other two are replaced with empty string. If I add text to those elements the new line will be retained.
Looking at the byte array in the deserialization, the array element at the location in the serialized string where the "\n" was turns into a 10 (aka LF, new line). Then that does not successfully get turned into "\n" in the Deserialize. Perhaps that is too much to ask.
Any insight would be most appreciated. Thanks.

You'll need to use the XmlReader and XmlWriter classes or the DataContractSerializer.
See: How to keep XmlSerializer from killing NewLines in Strings?
public static string XmlSerialize<T>(T objectToSerialize)
{
    XmlSerializer s = new XmlSerializer(typeof(T));
    var settings = new XmlWriterSettings
    {
        NewLineHandling = NewLineHandling.Entitize
    };
    using (var stream = new StringWriter())
    using (var writer = XmlWriter.Create(stream, settings))
    {
        s.Serialize(writer, objectToSerialize);
        return stream.ToString();
    }
}
public static T XmlDeserialize<T>(string serializedObject)
{
    XmlSerializer s = new XmlSerializer(typeof(T));
    using (var stream = new StringReader(serializedObject))
    using (var reader = XmlReader.Create(stream))
    {
        return (T)s.Deserialize(reader);
    }
}
Usage:
public class Foo
{
public string Bar { get; set; }
}
var foo = new Foo { Bar = "\n" };
var result = XmlSerialize(foo);
Console.WriteLine(result);
var newFoo = XmlDeserialize<Foo>(result);
Console.WriteLine(newFoo.Bar);
Debug.Assert(newFoo.Bar == "\n");
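The question is about a string list rather than a single property, so here is a small round-trip sketch for that case. It reuses the two helpers above; the `Item` class and its `Text` property are hypothetical stand-ins for `ItemTypeObj`, and `NewlineRoundTrip` is just a container name chosen for this example. It assumes, as the answer does, that reading through `XmlReader.Create` with default settings is what keeps the whitespace-only entries.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical stand-in for the asker's ItemTypeObj.
public class Item
{
    public List<string> Text { get; set; } = new List<string>();
}

public static class NewlineRoundTrip
{
    public static string XmlSerialize<T>(T value)
    {
        var serializer = new XmlSerializer(typeof(T));
        var settings = new XmlWriterSettings
        {
            NewLineHandling = NewLineHandling.Entitize
        };
        using (var text = new StringWriter())
        using (var writer = XmlWriter.Create(text, settings))
        {
            serializer.Serialize(writer, value);
            writer.Flush(); // make sure everything reached the StringWriter
            return text.ToString();
        }
    }

    public static T XmlDeserialize<T>(string xml)
    {
        var serializer = new XmlSerializer(typeof(T));
        using (var text = new StringReader(xml))
        using (var reader = XmlReader.Create(text))
        {
            return (T)serializer.Deserialize(reader);
        }
    }
}
```

Serializing an `Item` whose list contains "\n" entries and passing the result to `XmlDeserialize<Item>` should give back a list with the same entries, which is easy to confirm by comparing the two lists element by element.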

Related

xml serialization encoding "&" character

I am converting an object into xml string and then into an escaped string.
public class Program
{
public static void Main(string[] args)
{
BankDetails details = new BankDetails();
var xmlstring = ToXmlString(details);
var escaped = SecurityElement.Escape(xmlstring);
}
private static string ToXmlString<T>(T input)
{
XmlSerializer xsSubmit = new XmlSerializer(typeof(T));
XmlSerializerNamespaces ns = new XmlSerializerNamespaces();
var xml = "";
ns.Add("", "");
using (var sww = new StringWriter())
{
using (XmlWriter writer = XmlWriter.Create(sww, new XmlWriterSettings()
{
OmitXmlDeclaration = true
}))
{
xsSubmit.Serialize(writer, input, ns);
xml = sww.ToString();
}
}
return xml;
}
}
public class BankDetails
{
public string MemberName = "B & A Auto";
}
How can I avoid getting &amp; in xmlstring variable.
<BankDetails><MemberName>B &amp; A Auto</MemberName></BankDetails>
I am looking for output something like this:
xmlstring = //<BankDetails><MemberName>B & A Auto</MemberName></BankDetails>
//and then
escaped = //&lt;BankDetails&gt;&lt;MemberName&gt;B &amp; A Auto&lt;/MemberName&gt;&lt;/BankDetails&gt;
Working Fiddle
You can use Unicode equivalent character ie decimal or hex, &#038; or &#x26; instead.
"B & A Auto" => "B &#038; A Auto";
You can parse your string, convert amps to their unicode equivalence and then escape those.
No, you can not. The & is a special character in XML and used for escaping other characters.
Escaped character in XML
' = &apos;
< = &lt;
> = &gt;
& = &amp;
" = &quot;

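To illustrate the point made in the ampersand thread above, here is a minimal sketch showing that the escaping only exists in the serialized text: the writer turns & into &amp;, and the parser turns it back. The class and method names (`AmpersandDemo`, `ToXml`, `FromXml`) are made up for this example, and it uses `XElement` rather than the `XmlSerializer` from the question to keep it short.

```csharp
using System.Xml.Linq;

public static class AmpersandDemo
{
    // Builds an element from raw text; '&' is written out as "&amp;".
    public static string ToXml(string memberName)
    {
        return new XElement("MemberName", memberName).ToString();
    }

    // Parses the element back; "&amp;" becomes '&' again in the value.
    public static string FromXml(string xml)
    {
        return XElement.Parse(xml).Value;
    }
}
```

So `ToXml("B & A Auto")` contains `B &amp; A Auto`, while `FromXml` on that string returns the original `B & A Auto`. The value the application sees never contains the entity.
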
How can I extract xml file encoding using streamreader?

I need to get the encoding from the declaration at the top of an XML file:
<?xml version="1.0" encoding="utf-8"?>
I only need the value utf-8, without the quotation marks. How can I achieve this using StreamReader?
Do you need utf-8 or encoding="utf-8"? This block returns utf-8. If you need encoding="utf-8", you will have to adapt it.
using (var sr = new StreamReader(@"yourXmlFilePath"))
{
    var settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };
    using (var xmlReader = XmlReader.Create(sr, settings))
    {
        // The first node read is the XML declaration, if there is one.
        if (!xmlReader.Read()) throw new Exception("No line");
        var result = xmlReader.GetAttribute("encoding"); //returns utf-8
    }
}
Since it's xml, I would recommend XmlTextReader that provides fast, non-cached, forward-only access to XML data and read just top of the xml file since declaration is there. See following method:
string FindXmlEncoding(string path)
{
XmlTextReader reader = new XmlTextReader(path);
reader.Read();
if (reader.NodeType == XmlNodeType.XmlDeclaration)
{
while (reader.MoveToNextAttribute())
{
if (reader.Name == "encoding")
return reader.Value;
}
}
return null;
}
how can I achieve this using StreamReader?
Something like this:
using (StreamReader sr = new StreamReader("XmlFile.xml"))
{
string line = sr.ReadLine();
int closeQuoteIndex = line.LastIndexOf("\"") - 1;
int openingQuoteIndex = line.LastIndexOf("\"", closeQuoteIndex);
string encoding = line.Substring(openingQuoteIndex + 1, closeQuoteIndex - openingQuoteIndex);
}
const string ENCODING_TAG = "encoding"; //You are searching for this. Lets make it constant.
string line = streamReader.ReadLine(); //Use your reader here
int start = line.IndexOf(ENCODING_TAG);
start = line.IndexOf('"', start)+1; //Start of the value
int end = line.IndexOf('"', start); //End of the value
string encoding = line.Substring(start, end-start);
NOTE: This approach expects the encoding to be in the first line of an existing declaration. Which it does not need to be.
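As the note says, the declaration is not guaranteed to sit on one line, and the value may be in single quotes. A slightly more tolerant variant is sketched below: it applies a regular expression to whatever text has been read from the start of the file. `XmlDeclarationHelper` and `FindEncoding` are hypothetical names, and the pattern is an approximation for illustration, not a full XML declaration parser.

```csharp
using System.Text.RegularExpressions;

public static class XmlDeclarationHelper
{
    // Matches encoding="..." or encoding='...' inside an <?xml ... ?> declaration.
    // [^>] also matches line breaks, so a multi-line declaration is fine.
    private static readonly Regex EncodingPattern = new Regex(
        @"<\?xml\b[^>]*?\bencoding\s*=\s*(?:""(?<enc>[^""]*)""|'(?<enc>[^']*)')");

    // Returns the encoding name, or null if the text has no such attribute.
    public static string FindEncoding(string head)
    {
        Match m = EncodingPattern.Match(head);
        return m.Success ? m.Groups["enc"].Value : null;
    }
}
```

With a StreamReader, the `head` argument could be the first few hundred characters of the file, read with `Read(char[], int, int)`, instead of a single `ReadLine()` result.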

XML Deserializing a List<>

I have managed to serialize a list of objects of type Word using XML Serialization:
public static void WriteXML(string fileName)
{
System.Xml.Serialization.XmlSerializer writer =
new System.Xml.Serialization.XmlSerializer(typeof(Word));
System.IO.StreamWriter file = new System.IO.StreamWriter(
fileName);
foreach (var word in Words)
{
writer.Serialize(file, word);
}
file.Close();
}
I have a problem deserializing this list. I'm using this code snippet: http://msdn.microsoft.com/en-us/library/vstudio/ms172872.aspx
I changed my code to something like this:
public static void ReadXML(string fileName)
{
System.Xml.Serialization.XmlSerializer reader =
new System.Xml.Serialization.XmlSerializer(typeof(Word));
System.IO.StreamReader file = new System.IO.StreamReader(
fileName);
foreach (????)
{
Word word=new Word();
word = (Word) reader.Deserialize(file);
Words.Add(word); //Words is a List<Word>
}
}
Of course the foreach() loop is not used properly here. I just have no clue how to do this.
First of all, you should not serialize each word one by one. This would result in a single file containing many xmls, which would of course be invalid.
You want to serialize Words (which is List<Word>) . Therefore your serializer creation should be new XmlSerializer(typeof(List<Word>)) and serialization as writer.Serialize(file, Words);
So your code can be like this:
List<Word> Words = ........
WriteXML("a.xml", Words);
var newWords = ReadXML<List<Word>>("a.xml");
public static void WriteXML(string fileName,object obj)
{
using (var f = File.Create(fileName))
{
XmlSerializer ser = new XmlSerializer(obj.GetType());
ser.Serialize(f, obj);
}
}
public static T ReadXML<T>(string fileName)
{
using (var f = File.Open(fileName,FileMode.Open,FileAccess.Read))
{
XmlSerializer ser = new XmlSerializer(typeof(T));
return (T)ser.Deserialize(f);
}
}
PS: Serializable attribute is required only for BinaryFormatter. XmlSerializer doesn't need it.
You can find the details of the attributes XmlSerializer uses here
How can you serialize individual Word object to same file? This is a kind of overriding the file on each iteration. Simply just serialize the Database object instead of separate Word objects this way:
System.Xml.Serialization.XmlSerializer writer =
new System.Xml.Serialization.XmlSerializer(typeof(Database));
System.IO.StreamWriter file = new System.IO.StreamWriter(fileName);
writer.Serialize(file, yourDatabaseObject);
Note: In addition, make sure that Database is marked with Serializable attribute.

How keep carriage return from parsing XML

I have been searching the Internet for how to keep carriage returns from XML data, but I could not find the answer, so I'm here :)
The objective is to write the content of XML data to a file. If the value of a node contains "\r\n", the software needs to write it to the file in order to create a new line, but it doesn't, even with xml:space="preserve".
Here is my test class:
XElement xRootNode = new XElement("DOCS");
XElement xData = null;
//XNamespace ns = XNamespace.Xml;
//XAttribute spacePreserve = new XAttribute(ns+"space", "preserve");
//xRootNode.Add(spacePreserve);
xData = new XElement("DOC");
xData.Add(new XAttribute("FILENAME", "c:\\titi\\prout.txt"));
xData.Add(new XAttribute("MODE", "APPEND"));
xData.Add("Hi my name is Baptiste\r\nI'm a lazy boy");
xRootNode.Add(xData);
bool result = Tools.writeToFile(xRootNode.ToString());
And here is my process class:
try
{
XElement xRootNode = XElement.Parse(xmlInputFiles);
String filename = xRootNode.Element(xNodeDoc).Attribute(xAttributeFilename).Value.ToString();
Boolean mode = false;
try
{
mode = xRootNode.Element(xNodeDoc).Attribute(xWriteMode).Value.ToString().ToUpper().Equals(xWriteModeAppend);
}
catch (Exception e)
{
mode = false;
}
String value = xRootNode.Element(xNodeDoc).Value;
StreamWriter destFile = new StreamWriter(filename, mode, System.Text.Encoding.Unicode);
destFile.Write(value);
destFile.Close();
return true;
}
catch (Exception e)
{
return false;
}
Does anybody have an idea?
If you want to preserve CR LF in element or attribute content when saving an XDocument or XElement, you can do that with certain XmlWriterSettings, namely by setting NewLineHandling to Entitize:
string fileName = "XmlOuputTest1.xml";
string attValue = "Line1.\r\nLine2.";
string elementValue = "Line1.\r\nLine2.\r\nLine3.";
XmlWriterSettings xws = new XmlWriterSettings();
xws.NewLineHandling = NewLineHandling.Entitize;
XDocument doc = new XDocument(new XElement("root",
new XAttribute("test", attValue),
elementValue));
using (XmlWriter xw = XmlWriter.Create(fileName, xws))
{
doc.Save(xw);
}
doc = XDocument.Load(fileName);
Console.WriteLine("att value: {0}; element value: {1}.",
attValue == doc.Root.Attribute("test").Value,
elementValue == doc.Root.Value);
In that example the values are preserved in the round trip of saving and loading, as the output of the sample is "att value: True; element value: True."
Here's a useful link I found for parsing an XML string with carriage returns and line feeds in it:
howto-correctly-parse-using-xelementparse-for-strings-that-contain-newline-character-in
It may help those who are parsing an XML string.
For those who can't be bothered to click, it says to use an XmlTextReader instead:
XmlTextReader xtr = new XmlTextReader(new StringReader(xml));
XElement items = XElement.Load(xtr);
foreach (string desc in items.Elements("Item").Select(i => (string)i.Attribute("Description")))
{
Console.WriteLine("|{0}|", desc);
}

Why is XmlReader / XmlSerializer messing up line jumps in text when deserializing?

My object template, which is deserialized from a hand made XML file contains mixed types and the text can contain line jumps. When I look at the text I can see line jumps are \r\n, but in my deserialized template object, line jumps are \n. How can I keep line jumps as \r\n?
XmlReaderSettings settings = new XmlReaderSettings();
settings.CloseInput = true;
//settings.ValidationEventHandler += ValidationEventHandler;
settings.ValidationType = ValidationType.Schema;
settings.Schemas.Add(schema);
StringReader r = new StringReader(syntaxEdit.Text);
Schema.template rawTemplate = null;
using (XmlReader validatingReader = XmlReader.Create(r, settings))
{
try
{
XmlSerializer serializer = new XmlSerializer(typeof(Schema.template));
rawTemplate = serializer.Deserialize(validatingReader) as Schema.template;
}
catch (Exception ex)
{
rawTemplate = null;
string floro = ex.Message + (null != ex.InnerException ? ":\n" + ex.InnerException.Message : "");
MessageBox.Show(floro);
}
}
It seems that this is required behavior by the XML specification and is a "feature" in Microsoft's implementation of the XmlReader (see this answer).
Probably the easiest thing for you to do would be to replace \n with \r\n in your result.
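The replacement suggested above can be done after deserialization with a single regular expression. This is only a sketch: `LineEndings.ToCrLf` is a hypothetical helper name, and it assumes you want every bare \n turned into \r\n while leaving any \r\n that is already present untouched.

```csharp
using System.Text.RegularExpressions;

public static class LineEndings
{
    // Replaces each '\n' that is not already preceded by '\r' with "\r\n".
    public static string ToCrLf(string text)
    {
        return Regex.Replace(text, "(?<!\r)\n", "\r\n");
    }
}
```

The negative lookbehind is what makes the call safe to apply twice: running it on text that already uses \r\n changes nothing.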
That's the behavior mandated by the XML specification: every \r\n, \r or \n MUST be interpreted as a single \n character. If you want to maintain the \r in your output, you have to change it to a character reference (&#xD;) as shown below.
public class StackOverflow_7374609
{
[XmlRoot(ElementName = "MyType", Namespace = "")]
public class MyType
{
[XmlText]
public string Value;
}
static void PrintChars(string str)
{
string toEscape = "\r\n\t\b";
string escapeChar = "rntb";
foreach (char c in str)
{
if (' ' <= c && c <= '~')
{
Console.Write(c);
}
else
{
int escapeIndex = toEscape.IndexOf(c);
if (escapeIndex >= 0)
{
Console.Write("\\{0}", escapeChar[escapeIndex]);
}
else
{
Console.Write("\\u{0:X4}", (int)c);
}
}
}
Console.WriteLine();
}
public static void Test()
{
string serialized = "<MyType>Hello\r\nworld</MyType>";
MemoryStream ms = new MemoryStream(Encoding.UTF8.GetBytes(serialized));
XmlSerializer xs = new XmlSerializer(typeof(MyType));
MyType obj = (MyType)xs.Deserialize(ms);
Console.WriteLine("Without the replacement");
PrintChars(obj.Value);
serialized = serialized.Replace("\r", "&#xD;");
ms = new MemoryStream(Encoding.UTF8.GetBytes(serialized));
obj = (MyType)xs.Deserialize(ms);
Console.WriteLine("With the replacement");
PrintChars(obj.Value);
}
}
