Why is XmlReader / XmlSerializer messing up line jumps in text when deserializing? - c#

My template object, which is deserialized from a hand-written XML file, contains mixed types, and its text can contain line breaks. When I look at the source text I can see the line breaks are \r\n, but in the deserialized template object they are \n. How can I keep the line breaks as \r\n?
XmlReaderSettings settings = new XmlReaderSettings();
settings.CloseInput = true;
//settings.ValidationEventHandler += ValidationEventHandler;
settings.ValidationType = ValidationType.Schema;
settings.Schemas.Add(schema);
StringReader r = new StringReader(syntaxEdit.Text);
Schema.template rawTemplate = null;
using (XmlReader validatingReader = XmlReader.Create(r, settings))
{
try
{
XmlSerializer serializer = new XmlSerializer(typeof(Schema.template));
rawTemplate = serializer.Deserialize(validatingReader) as Schema.template;
}
catch (Exception ex)
{
rawTemplate = null;
string floro = ex.Message + (null != ex.InnerException ? ":\n" + ex.InnerException.Message : "");
MessageBox.Show(floro);
}
}

This appears to be behavior required by the XML specification, and a "feature" of Microsoft's XmlReader implementation (see this answer).
Probably the easiest thing for you to do would be to replace \n with \r\n in your result.
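A minimal sketch of that replacement, assuming the deserialized text is available as a plain string. The helper name ToCrLf is hypothetical (it is not part of the original code); it uses a regex alternation so that text which already contains \r\n is left unchanged rather than doubled up.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineEndings
{
    // Hypothetical helper: rewrites every line break (\r\n, lone \r, or lone \n) as \r\n.
    // "\r\n" is listed first in the alternation so an existing CRLF is matched as one unit.
    public static string ToCrLf(string text)
    {
        return Regex.Replace(text, @"\r\n|\r|\n", "\r\n");
    }
}

public static class LineEndingsDemo
{
    public static void Main()
    {
        // What the deserializer hands back after XML line-break normalization.
        string deserialized = "Hello\nworld";
        string restored = LineEndings.ToCrLf(deserialized);
        Console.WriteLine(restored == "Hello\r\nworld"); // True
    }
}
```

This would be applied to each string property after Deserialize returns; it cannot tell a \n that was originally \r\n from one that was always a bare \n.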

That's the behavior mandated by the XML specification: every \r\n, \r or \n MUST be interpreted as a single \n character. If you want to maintain the \r in your output, you have to change it to a character reference (&#xD;) as shown below.
public class StackOverflow_7374609
{
[XmlRoot(ElementName = "MyType", Namespace = "")]
public class MyType
{
[XmlText]
public string Value;
}
static void PrintChars(string str)
{
string toEscape = "\r\n\t\b";
string escapeChar = "rntb";
foreach (char c in str)
{
if (' ' <= c && c <= '~')
{
Console.WriteLine(c);
}
else
{
int escapeIndex = toEscape.IndexOf(c);
if (escapeIndex >= 0)
{
Console.WriteLine("\\{0}", escapeChar[escapeIndex]);
}
else
{
Console.WriteLine("\\u{0:X4}", (int)c);
}
}
}
Console.WriteLine();
}
public static void Test()
{
string serialized = "<MyType>Hello\r\nworld</MyType>";
MemoryStream ms = new MemoryStream(Encoding.UTF8.GetBytes(serialized));
XmlSerializer xs = new XmlSerializer(typeof(MyType));
MyType obj = (MyType)xs.Deserialize(ms);
Console.WriteLine("Without the replacement");
PrintChars(obj.Value);
serialized = serialized.Replace("\r", "&#xD;");
ms = new MemoryStream(Encoding.UTF8.GetBytes(serialized));
obj = (MyType)xs.Deserialize(ms);
Console.WriteLine("With the replacement");
PrintChars(obj.Value);
}
}

Related

xml serialization encoding "&" character

I am converting an object into xml string and then into an escaped string.
public class Program
{
public static void Main(string[] args)
{
BankDetails details = new BankDetails();
var xmlstring = ToXmlString(details);
var escaped = SecurityElement.Escape(xmlstring);
}
private static string ToXmlString<T>(T input)
{
XmlSerializer xsSubmit = new XmlSerializer(typeof(T));
XmlSerializerNamespaces ns = new XmlSerializerNamespaces();
var xml = "";
ns.Add("", "");
using (var sww = new StringWriter())
{
using (XmlWriter writer = XmlWriter.Create(sww, new XmlWriterSettings()
{
OmitXmlDeclaration = true
}))
{
xsSubmit.Serialize(writer, input, ns);
xml = sww.ToString();
}
}
return xml;
}
}
public class BankDetails
{
public string MemberName = "B & A Auto";
}
How can I avoid getting &amp; in the xmlstring variable?
<BankDetails><MemberName>B &amp; A Auto</MemberName></BankDetails>
I am looking for output something like this:
xmlstring = //<BankDetails><MemberName>B & A Auto</MemberName></BankDetails>
//and then
escaped = //&lt;BankDetails&gt;&lt;MemberName&gt;B &amp; A Auto&lt;/MemberName&gt;&lt;/BankDetails&gt;
Working Fiddle
You can use the Unicode character reference instead, in decimal or hex: &#038; or &#x26;.
"B & A Auto" => "B &#038; A Auto";
You can parse your string, convert the ampersands to their Unicode equivalents, and then escape those.
No, you cannot. The & is a special character in XML, used to escape other characters, so it must always appear in escaped form inside XML text.
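A small sketch of what this means for the code in the question, assuming nothing beyond the SecurityElement.Escape call already used there. The serializer has already written the ampersand as &amp;, so escaping the serialized string escapes that ampersand a second time.

```csharp
using System;
using System.Security;

public static class EscapeDemo
{
    public static void Main()
    {
        string raw = "B & A Auto";

        // First pass: what an XML writer must emit for the text content.
        string escapedOnce = SecurityElement.Escape(raw);

        // Second pass: escaping already-escaped XML escapes the '&' of "&amp;" again.
        string escapedTwice = SecurityElement.Escape(escapedOnce);

        Console.WriteLine(escapedOnce);  // B &amp; A Auto
        Console.WriteLine(escapedTwice); // B &amp;amp; A Auto
    }
}
```

Whoever consumes the escaped string has to unescape it once to get back well-formed XML, and the XML parser then turns &amp; back into &.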
Escaped characters in XML
' = &apos;
< = &lt;
> = &gt;
& = &amp;
" = &quot;

How to loop XmlTextReader properly (C#)?

Below is a sample of the type of XML file I am trying to handle. If I have only one part along with an accompanying number/character I can process the data extraction without the necessity of the 'if (!reader.EOF)' control structure. However when I try to include this structure so that I can loop back to checking for another part, number, and character group, it deadlocks.
Any advice as to how to do this properly? This was the most efficient idea that popped into my head. I am new to reading data from XMLs.
Sample Xml:
<?xml version="1.0" encoding="UTF-8"?>
<note>
<part>100B</part>
<number>45</number>
<character>a</character>
<part>100C</part>
<number>55</number>
<character>b</character>
</note>
Code:
String part = "part";
String number = "number";
String character = "character";
String appendString = "";
StringBuilder sb = new StringBuilder();
try
{
XmlTextReader reader = new XmlTextReader("myPath");
while (reader.Read())
{
switch (reader.NodeType)
{
case XmlNodeType.Element: // The node is an element.
myLabel:
if (reader.Name == part)
{
part = reader.ReadInnerXml();
}
if (reader.Name == number)
{
number = reader.ReadInnerXml();
number = double.Parse(number).ToString("F2"); //format num
}
if (reader.Name == character)
{
character = reader.ReadInnerXml();
}
//new string
appendString = ("Part: " + part + "\nNumber: " + number +
"\nCharacter: " + character + "\n");
//concatenate
sb.AppendLine(appendString);
if (reader.EOF != true)
{
Debug.Log("!eof");
part = "part";
number = "number";
character = "character";
goto myLabel;
}
//print fully concatenated result
sb.ToString();
//reset string builder
sb.Length = 0;
break;
}
}
}
catch (XmlException e)
{
// Write error.
Debug.Log(e.Message);
}
catch (FileNotFoundException e)
{
// Write error.
Debug.Log(e);
}
catch(ArgumentException e)
{
// Write error.
Debug.Log(e);
}
The XmlReader class has many useful methods; use them.
For example:
var sb = new StringBuilder();
using (var reader = XmlReader.Create("test.xml"))
{
while (reader.ReadToFollowing("part"))
{
var part = reader.ReadElementContentAsString();
sb.Append("Part: ").AppendLine(part);
reader.ReadToFollowing("number");
var number = reader.ReadElementContentAsDouble();
sb.Append("Number: ").Append(number).AppendLine();
reader.ReadToFollowing("character");
var character = reader.ReadElementContentAsString();
sb.Append("Character: ").AppendLine(character);
}
}
Console.WriteLine(sb);
Alexander's answer is fine; I just want to add a sample using XDocument, following Jon Skeet's comments:
var sb = new StringBuilder();
var note = XDocument.Load("test.xml").Root.Descendants();
foreach (var el in note)
{
sb.Append(el.Name).Append(": ").AppendLine(el.Value);
}
Console.WriteLine(sb);

Deserializing string list with "\n" results in empty string

I have been banging my head on this one for a bit. It seems like it must be a simple solution but I have searched the internet and tried quite a few things.
I have a complex object which includes a string list that needs to be serialized into xml and then deserialized.
The serialization code has long since been part of the application and works in countless other scenarios but the issue here appears to be that one of the elements in the string list is a mere new line character (i.e. "\n").
My understanding, based on my research, is that it serializes as expected (see below), but after deserialization the element contains an empty string (i.e. "") instead of "\n".
Here is the code...
public void DoStuff(ItemTypeObj item)
{
string myItem = XmlSerialize<ItemTypeObj>(item);
ItemTypeObj myNewItemTypeObj = XmlDeserialize<ItemTypeObj>(myItem);
}
public static string XmlSerialize<T>(T objectToSerialize)
{
string ret = string.Empty;
XmlSerializer s = new XmlSerializer(typeof(T));
using (MemoryStream ms = new MemoryStream())
{
s.Serialize(ms, objectToSerialize);
ms.Position = 0;
using (StreamReader sr = new StreamReader(ms))
{
ret = sr.ReadToEnd();
}
}
return ret;
}
public static T XmlDeserialize<T>(string serializedObject)
{
T retVal = default(T);
byte[] ba = Encoding.UTF8.GetBytes(serializedObject);
using (MemoryStream ms = new MemoryStream(ba))
{
XmlSerializer s = new XmlSerializer(typeof(T));
retVal = (T)s.Deserialize(ms);
}
return retVal;
}
To give you an idea of the data sent in, ItemTypeObj is the object which includes a string List. The string list can be variable length but sample data could look like this...
[0] = "Zero element text \n"
[1] = "[element1]"
[2] = "\n"
[3] = "[element3]"
[4] = "\n"
[5] = "[element5]"
When serialized it will look like this (which seems correct to me):
<Text>
<string>Zero element text
</string>
<string>[element1]</string>
<string>
</string>
<string>[element3]</string>
<string>
</string>
<string>[element5]</string>
</Text>
From what I've read the newlines are represented as expected in the xml above. The issue is after it is deserialized the string list is this:
[0] = "Zero element text \n"
[1] = "[element1]"
[2] = ""
[3] = "[element3]"
[4] = ""
[5] = "[element5]"
Only the newline characters in the elements that also have text (e.g. [0]) will still exist. The other two are replaced with empty string. If I add text to those elements the new line will be retained.
Looking at the byte array in the deserialization, the array element at the location in the serialized string where the "\n" was turns into a 10 (aka LF, new line). Then that does not successfully get turned into "\n" in the Deserialize. Perhaps that is too much to ask.
Any insight would be most appreciated. Thanks.
You'll need to use the XmlReader and XmlWriter classes or the DataContractSerializer.
See: How to keep XmlSerializer from killing NewLines in Strings?
public static string XmlSerialize<T>(T objectToSerialize)
{
XmlSerializer s = new XmlSerializer(typeof(T));
var settings = new XmlWriterSettings
{
NewLineHandling = NewLineHandling.Entitize
};
using(var stream = new StringWriter())
using(var writer = XmlWriter.Create(stream, settings))
{
s.Serialize(writer, objectToSerialize);
return stream.ToString();
}
}
public static T XmlDeserialize<T>(string serializedObject)
{
XmlSerializer s = new XmlSerializer(typeof(T));
using(var stream = new StringReader(serializedObject))
using(var reader = XmlReader.Create(stream))
{
return (T)s.Deserialize(reader);
}
}
Usage:
public class Foo
{
public string Bar { get; set; }
}
var foo = new Foo { Bar = "\n" };
var result = XmlSerialize(foo);
Console.WriteLine(result);
var newFoo = XmlDeserialize<Foo>(result);
Console.WriteLine(newFoo.Bar);
Debug.Assert(newFoo.Bar == "\n");

Why does XDocument.Parse throw NotSupportedException?

I am trying to parse XML data using XDocument.Parse, which throws a NotSupportedException, just like in the topic "Is XDocument.Parse different in Windows Phone 7?". I updated my code according to the advice posted there, but it still doesn't help. Some time ago I parsed RSS using a similar (but simpler) method and that worked just fine.
public void sList()
{
WebClient client = new WebClient();
client.Encoding = Encoding.UTF8;
string url = "http://eztv.it";
Uri u = new Uri(url);
client.DownloadStringAsync(u);
client.DownloadStringCompleted += new DownloadStringCompletedEventHandler(client_DownloadStringCompleted);
}
private void client_DownloadStringCompleted(object sender, DownloadStringCompletedEventArgs e)
{
try
{
string s = e.Result;
s = cut(s);
XmlReaderSettings settings = new XmlReaderSettings();
settings.DtdProcessing = DtdProcessing.Ignore;
XDocument document = null;// XDocument.Parse(s);//Load(s);
using (XmlReader reader = XmlReader.Create(new StringReader(e.Result), settings))
{
document = XDocument.Load(reader); // error thrown here
}
// ... rest of code
}
catch (Exception ex)
{
MessageBox.Show( ex.Message);
}
}
string cut(string s)
{
int iod = s.IndexOf("<select name=\"SearchString\">");
int ido = s.LastIndexOf("</select>");
s = s.Substring(iod, ido - iod + 9);
return s;
}
When I substitute the following for string s:
//string s = "<select name=\"SearchString\"><option value=\"308\">10 Things I Hate About You</option><option value=\"539\">2 Broke Girls</option></select>";
everything works and no exception is thrown, so what am I doing wrong?
There are special symbols like '&' in e.Result.
I just tried replacing these symbols (everything except '<', '>' and '"') using HttpUtility.HtmlEncode(), and XDocument parsed it.
Update:
I didn't want to show my code, but you left me no choice :)
string y = "";
for (int i = 0; i < s.Length; i++)
{
if (s[i] == '<' || s[i] == '>' || s[i] == '"')
{
y += s[i];
}
else
{
y += HttpUtility.HtmlEncode(s[i].ToString());
}
}
XDocument document = XDocument.Parse(y);
var options = (from option in document.Descendants("option")
select option.Value).ToList();
It works for me on WP7. Please do not use this code for HTML conversion; I wrote it quickly just for testing purposes.

C# Find if a word is in a document

I am looking for a way to check if the "foo" word is present in a text file using C#.
I could use a regular expression, but I'm not sure that will work if the word is split across two lines. I have the same issue with a StreamReader that enumerates over the lines.
Any comments?
What's wrong with a simple search?
If the file is not large, and memory is not a problem, simply read the entire file into a string (ReadToEnd() method), and use string Contains()
Here ya go. We look at each line as we read the file, keep track of the last word of the previous line combined with the first word of the current line, and check whether that combination matches your pattern.
string pattern = "foo";
string input = null;
string lastword = string.Empty;
string firstword = string.Empty;
bool result = false;
FileStream FS = new FileStream("File name and path", FileMode.Open, FileAccess.Read, FileShare.Read);
StreamReader SR = new StreamReader(FS);
while ((input = SR.ReadLine()) != null)
{
firstword = input.Contains(" ") ? input.Substring(0, input.IndexOf(" ")) : input; // use the whole line when it has no space
if(lastword.Trim() != string.Empty) { firstword = lastword.Trim() + firstword.Trim(); }
Regex RegPattern = new Regex(pattern);
Match Match1 = RegPattern.Match(input);
string value1 = Match1.ToString();
if (pattern.Trim() == firstword.Trim() || value1 != string.Empty) { result = true; }
lastword = input.Trim().Contains(" ") ? input.Trim().Substring(input.Trim().LastIndexOf(" ")) : input.Trim(); // use the whole line when it has no space
}
Here is a quick example using LINQ:
class Program
{
static void Main(string[] args)
{
{ //LINQ version
bool hasFoo = "file.txt".AsLines()
.Any(l => l.Contains("foo"));
}
{ // No LINQ or Extension Methods needed
bool hasFoo = false;
foreach (var line in Tools.AsLines("file.txt"))
if (line.Contains("foo"))
{
hasFoo = true;
break;
}
}
}
}
public static class Tools
{
public static IEnumerable<string> AsLines(this string filename)
{
using (var reader = new StreamReader(filename))
while (!reader.EndOfStream)
{
var line = reader.ReadLine();
while (line.EndsWith("-") && !reader.EndOfStream)
line = line.Substring(0, line.Length - 1)
+ reader.ReadLine();
yield return line;
}
}
}
What about if the line contains football? Or fool? If you are going to go down the regular expression route you need to look for word boundaries.
Regex r = new Regex(@"\bfoo\b"); // verbatim string: in a regular string literal "\b" is a backspace character
Also ensure you are taking into consideration case insensitivity if you need to.
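A short sketch combining both points, word boundaries and case insensitivity. The helper name ContainsWord is hypothetical, and Regex.Escape is added on the assumption that the search word may contain regex metacharacters.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordSearch
{
    // Hypothetical helper: true when `word` occurs in `text` as a whole word, ignoring case.
    public static bool ContainsWord(string text, string word)
    {
        // Verbatim strings keep \b as the regex word-boundary token.
        string pattern = @"\b" + Regex.Escape(word) + @"\b";
        return Regex.IsMatch(text, pattern, RegexOptions.IgnoreCase);
    }
}

public static class WordSearchDemo
{
    public static void Main()
    {
        Console.WriteLine(WordSearch.ContainsWord("I like football", "foo")); // False
        Console.WriteLine(WordSearch.ContainsWord("Foo is here", "foo"));     // True
    }
}
```

This only handles a word that sits on a single line; a word split across a line break still needs one of the other approaches in this thread.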
You don't need regular expressions in a case this simple. Simply loop over the lines and check if it contains foo.
using (StreamReader sr = new StreamReader(File.Open("filename", FileMode.Open, FileAccess.Read)))
{
string line = null;
while (!sr.EndOfStream) {
line = sr.ReadLine();
if (line.Contains("foo"))
{
// foo was found in the file
}
}
}
You could construct a regex which allows for newlines to be placed between every character.
private static bool IsSubstring(string input, string substring)
{
string[] letters = new string[substring.Length];
for (int i = 0; i < substring.Length; i += 1)
{
letters[i] = substring[i].ToString();
}
string regex = @"\b" + string.Join(@"(\r?\n?)", letters) + @"\b";
return Regex.IsMatch(input, regex, RegexOptions.ExplicitCapture);
}
