Deserialize XML attribute and handle newline and other special characters - C#

I've tried finding the answer to this for the last 2 days and I just can't find anything that will work with our code.
We have an incoming XML response formatted as below and need to be able to handle newlines and other special characters inside attributes.
The one we're having issues with is "agent-notes": we can't find an XPath function that converts the special characters into \r, \n, etc. For example,
"anything
everything
something" should be "anything \r \n everything \r \n something"

Unfortunately, you can't. The agent-notes attribute value is valid as it is, and you can't assume it will be converted for you in an XPath search. You will have to convert your search path by replacing every \r\n with the XML character references "&#13;&#10;". If it's the value that you expect to be converted, then you can use the HttpUtility.HtmlDecode method.
I've had this problem before and suffered the same frustration as you. Coding is not always a perfect science, as much as you would like it to be.
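Some background that may explain what the question runs into: the XML 1.0 specification requires parsers to normalize attribute values, so a literal line break inside an attribute is reported as a space, while a character reference such as &#10; (or &#13;&#10;) survives as the real character. Below is a minimal C# sketch using LINQ to XML. The element name agent and the helper AgentNotesDemo.ReadNotes are assumptions made for illustration, because the actual response format is not shown.

```csharp
using System.Xml.Linq;

// Hypothetical helper: returns the agent-notes attribute of the root element.
// "agent" / "agent-notes" are assumed names; adapt them to the real response.
public static class AgentNotesDemo
{
    public static string ReadNotes(string xml)
    {
        // XDocument.Parse reads through a conforming XmlReader, so attribute
        // values are normalized: a literal LF inside the quotes comes back as
        // a space, whereas &#10; is expanded to an actual LF character.
        XDocument doc = XDocument.Parse(xml);
        XAttribute notes = doc.Root.Attribute("agent-notes");
        return notes == null ? null : notes.Value;
    }
}
```

In other words, if the sender writes the notes with literal line breaks, they are already gone by the time any XPath runs; the producer would have to emit them as character references for the value to round-trip.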

Related

Text files - how to programmatically mimic opening in Wordpad and overwriting as plain text [duplicate]

How can I replace lone instances of \n with \r\n (LF alone with CRLF) using a regular expression in C#?
I know how to do it using plain String.Replace, like:
myStr = myStr.Replace("\n", "\r\n");
myStr = myStr.Replace("\r\r\n", "\r\n");
However, this is inelegant, and would destroy any "\r+\r\n" already in the text (although they are not likely to exist).
It might be faster if you use this.
(?<!\r)\n
It basically looks for any \n that is not preceded by a \r. This would most likely be faster, because in the other case almost every character matches [^\r], so the engine would capture that and then look for the \n after it. With the pattern I gave, it only stops when it finds a \n, and then looks before that to see whether there is a \r.
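A minimal sketch of the lookbehind approach as a reusable helper. The class and method names (LineEndings.ToCrLf) are made up for illustration, and caching the Regex in a static field is a design choice rather than something the answer requires.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: converts every LF that is not already preceded by a CR
// into CRLF, leaving existing CRLF pairs untouched.
public static class LineEndings
{
    // (?<!\r) is a zero-width negative lookbehind: it checks the previous
    // character without consuming it, so "\n\n" gets both LFs converted.
    private static readonly Regex LoneLf = new Regex(@"(?<!\r)\n");

    public static string ToCrLf(string text)
    {
        return LoneLf.Replace(text, "\r\n");
    }
}
```

For example, LineEndings.ToCrLf("a\n\nb") returns "a\r\n\r\nb", and text that already uses CRLF passes through unchanged.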
Will this do?
[^\r]\n
Basically it matches a '\n' that is preceded by a character that is not '\r'.
If you want it to detect lines that start with just a single '\n' as well, then try
([^\r]|$)\n
This says it should match a '\n', but only one that is the first character of a line or is not preceded by '\r'.
There might be special cases to check: since you're messing with the definition of lines itself, the '$' might not work too well. But I think you get the idea.
EDIT: credit @Kibbee. Using a negative lookbehind is clearly better, since it won't capture the preceding character and should handle the edge cases as well. So here's the better regex, and the code becomes:
myStr = Regex.Replace(myStr, "(?<!\r)\n", "\r\n");
I was trying to apply the code below to a string and it was not working:
myStr.Replace("(?<!\r)\n", "\r\n")
String.Replace treats its first argument as plain text rather than as a pattern. I used Regex.Replace instead and it worked:
Regex.Replace( oldValue, "(?<!\r)\n", "\r\n")
I guess that "myStr" is an object of type String; in that case, this is not a regex.
\r and \n are the escape sequences for CR and LF.
My best guess is that if you know you have a \n for EACH line, no matter what, then you should first strip out every \r and then replace every \n with \r\n.
The answer chakrit gives would also work, but then you need to use a regex, and you don't say what "myStr" is...
Edit: looking at the other examples tells me one thing: why do things the hard way when you can do them the easy way? Just because regex exists doesn't mean you must use it. :D
Edit2: a tool is very valuable when fiddling with regex, XPath and whatnot that give you strange results. May I point you to: http://www.regexbuddy.com/
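The strip-then-replace idea needs no regex at all. Below is a sketch under the assumption that the input only uses LF or CRLF line endings (the class name StripThenExpand is hypothetical). Note that any other stray CR in the text is deleted as well.

```csharp
// Hypothetical helper for the "strip every CR, then expand every LF" approach.
// Assumes LF or CRLF line endings; a CR that is not part of one is dropped.
public static class StripThenExpand
{
    public static string ToCrLf(string text)
    {
        // Strings are immutable, so each Replace returns a new string that
        // must be kept; calling Replace without using the result does nothing.
        return text.Replace("\r", "").Replace("\n", "\r\n");
    }
}
```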
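myStr = Regex.Replace(myStr, "([^\r])\n", "$1\r\n");
In .NET the backreference in the replacement string is written $1; some other regex engines use \1 instead.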
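To see why the capture-group pattern was superseded by the lookbehind, here is a small sketch that runs it through Regex.Replace (the class name CaptureGroupVariant is hypothetical). Because [^\r] consumes the character before each LF, the second of two consecutive LFs, or an LF at the very start of the string, is not converted.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: the capture-group variant. ([^\r]) captures the
// character before the LF and $1 writes it back in front of the new CRLF.
public static class CaptureGroupVariant
{
    public static string ToCrLf(string text)
    {
        return Regex.Replace(text, "([^\r])\n", "$1\r\n");
    }
}
```

With "a\n\nb" the first LF is converted but the second is not, because the character it would need as its predecessor was already consumed by the previous match; the result is "a\r\n\nb".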
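Try this: Replace(Char.ConvertFromUtf32(10), Char.ConvertFromUtf32(13) + Char.ConvertFromUtf32(10)), which swaps each LF (10) for CR (13) followed by LF. Note that, like a plain String.Replace, it also turns an existing \r\n into \r\r\n.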
If I know the line endings must be either CRLF or LF, something that works for me is
myStr = Regex.Replace(myStr, "\r?\n", "\r\n");
This essentially does the same as neslekkiM's answer, except that it performs only one replace operation on the string rather than two. It is also compatible with regex engines that don't support negative lookbehinds or backreferences.
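A sketch of the single-pass version. It uses Regex.Replace because String.Replace would look for a literal CR, question mark, LF sequence. The class name is hypothetical, and, as the answer says, it assumes the text only contains CRLF or LF endings; a lone CR is left untouched.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: \r? makes the CR optional, so both "\r\n" and a bare
// "\n" are matched and written back as "\r\n" in a single pass.
public static class OptionalCrVariant
{
    public static string ToCrLf(string text)
    {
        return Regex.Replace(text, "\r?\n", "\r\n");
    }
}
```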

Escape string from file

I have to parse some files containing strings with characters in them that I need to escape. As a short example, imagine something like this:
var stringFromFile = "This is \\n a test \\u0085";
Console.WriteLine(stringFromFile);
The above results in the output:
This is \n a test \u0085
But I want the text unescaped, with the escape sequences turned into the actual characters. How do I do this in C#? The text contains Unicode characters too.
To be clear: the above code is just an example. The \n and Unicode \u00xx sequences in the text come from the file.
Example of the file contents:
Fisika (vanaf Grieks, \u03C6\u03C5\u03C3\u03B9\u03BA\u03CC\u03C2,
\"Natuurlik\", en \u03C6\u03CD\u03C3\u03B9\u03C2, \"Natuur\") is die
wetenskap van die Natuur
Try using Regex.Unescape(string).
That should be the right way.
Regards.
Don't use the @ symbol -- this interprets the string as 100% literal. Just take it off and all shall be well.
EDIT
I may have been a bit hasty with my reply. I think what you're asking is: how can I have C# turn the literal string '\n' into a newline, when read from a file (similar question for other escaped literals).
The answer is: you write it yourself. You need to search for "\\n" and convert it to "\n". Keep in mind that in C#, it's the compiler not the language that changes your strings into actual literals, so there's not some library call to do this (actually there could be -- someone look this up, quick).
EDIT
Aha! Eureka! Behold:
http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.unescape.aspx
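A sketch of the Regex.Unescape approach applied to the sample text from the question. The wrapper name UnescapeOrKeep and the choice to fall back to the raw text when Regex.Unescape throws an ArgumentException (it does so for escape sequences it does not recognise) are assumptions, not part of the answers. Also keep in mind that Regex.Unescape follows regular-expression escape rules rather than C# string-literal rules, so the two differ for some sequences.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical wrapper around Regex.Unescape for text read from a file.
public static class FileTextUnescaper
{
    public static string UnescapeOrKeep(string raw)
    {
        try
        {
            // Turns backslash-n into LF, \uXXXX into that UTF-16 code unit,
            // \" into a double quote, and so on.
            return Regex.Unescape(raw);
        }
        catch (ArgumentException)
        {
            // Unrecognised escape sequence: return the text unchanged.
            return raw;
        }
    }
}
```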
Since you are reading the string from a file, \n is not read as a unicode character but rather as two characters \ and n.
I would say you probably need a search-and-replace function to convert the string "\n" into the actual newline character '\n', and so on.
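If you would rather write it yourself, as this answer suggests, a single pass over the text with a StringBuilder covers the common cases. This is a hypothetical helper (ManualUnescaper.Unescape) that handles only \n, \r, \t, \", \\ and \uXXXX; any other backslash sequence is copied through unchanged.

```csharp
using System.Globalization;
using System.Text;

// Hypothetical hand-written unescaper covering a subset of C# escapes.
public static class ManualUnescaper
{
    public static string Unescape(string raw)
    {
        var sb = new StringBuilder(raw.Length);
        for (int i = 0; i < raw.Length; i++)
        {
            char c = raw[i];
            // Ordinary character, or a trailing backslash with nothing after it.
            if (c != '\\' || i + 1 >= raw.Length)
            {
                sb.Append(c);
                continue;
            }
            char next = raw[i + 1];
            switch (next)
            {
                case 'n': sb.Append('\n'); i++; break;
                case 'r': sb.Append('\r'); i++; break;
                case 't': sb.Append('\t'); i++; break;
                case '"': sb.Append('"'); i++; break;
                case '\\': sb.Append('\\'); i++; break;
                case 'u':
                    // Expect exactly four hex digits after \u.
                    int code;
                    if (i + 5 < raw.Length &&
                        int.TryParse(raw.Substring(i + 2, 4), NumberStyles.AllowHexSpecifier,
                                     CultureInfo.InvariantCulture, out code))
                    {
                        sb.Append((char)code);
                        i += 5;
                    }
                    else
                    {
                        sb.Append(c); // not a valid \uXXXX: keep the backslash
                    }
                    break;
                default:
                    sb.Append(c); // unknown escape: keep the backslash as-is
                    break;
            }
        }
        return sb.ToString();
    }
}
```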
I don't think there's any easy way to do this, because it's the job of the lexical analyzer to parse literals.
I would try generating and compiling a class via CodeDOM with the string inserted there as constant. It's not very fast but it will do all escaping.

I'm getting funny % signs back; how do I get rid of them?

I'm trying to parse a response from an authentication server that is URL encoded. However, when I decode it, I get \r\n characters. I need to not have these in my text, as I need to be able to run a regular expression that looks for white space but doesn't work with these escape characters.
So basically the string returned is:
ClientIP=192.168.20.31%0d%0aUrl%3d%2fflash%2f56553550_hi%3funiqueReference%3d27809666.mp4
After URL decoding it, it is:
192.168.20.31\r\nUrl=/flash/56553550_hi?uniqueReference=27809666.mp4
So you see, I don't want it to have \r, \n, etc. I want to have:
"192.168.20.31 Url=/flash/56553550_hi?uniqueReference=27809666.mp4"
As a verbatim string in c#.
Is this possible? Or do I have to do a string.replace on \r and replace with " "?
Either replace %0d%0a with %20 before URL decoding the string, or replace \r\n with " " after.
Since the data exists in the original string, you can't just make it go away without replacing it.
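Both suggestions, sketched with Uri.UnescapeDataString as the URL decoder (an assumption; the question doesn't say which decoder is in use). The helper names are hypothetical, and the case-insensitive regex in the second method is only there so that both %0d%0a and %0D%0A are matched.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helpers for the url-encoded ClientIP response.
public static class AuthResponse
{
    // Option 1: decode first, then turn the CRLF into a single space.
    public static string DecodeThenReplace(string encoded)
    {
        string decoded = Uri.UnescapeDataString(encoded);
        return decoded.Replace("\r\n", " ");
    }

    // Option 2: swap the encoded CRLF for an encoded space, then decode.
    public static string ReplaceThenDecode(string encoded)
    {
        string spaced = Regex.Replace(encoded, "%0d%0a", "%20", RegexOptions.IgnoreCase);
        return Uri.UnescapeDataString(spaced);
    }
}
```

Applied to the full response string from the question, both return "ClientIP=192.168.20.31 Url=/flash/56553550_hi?uniqueReference=27809666.mp4".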

Why do I get literal \r and \n when getting text from a database using C#?

In a database field, I'm storing and returning the "body" of my email (in case it changes). In it I have \n\r characters to get new lines, but it doesn't seem to be working. Any thoughts?
So the data in the field is:
'TBD.\r\n\nExpect'
And my output looks like (literal \r and \n):
TBD.\r\n\nExpect
Thoughts?
Escape sequences have no meaning within actual string objects - only when the C# parser/compiler interprets them. You want to store an actual new line in your database field rather than the 4 characters "\r\n".
It is likely that the \r\n is escaped, so the string actually returned would be equivalent to a string
myString = "\\r\\n";
So you would need to remove these extra slashes either when adding or removing from the database.
Though likely unrelated to your problem, the extra \n you have may cause viewing problems depending on the system, editor, etc.
You could replace all occurrences of \\r\\n, etc. using:
replacedString = myString.Replace("\\r\\n", "\r\n");
This should work to fix your problem.
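A slightly more general sketch of the same fix for the email body: expand each literal backslash-r and backslash-n pair separately, so the stand-alone \n in 'TBD.\r\n\nExpect' is handled too. The class name is hypothetical; be aware that any genuine backslash followed by r or n in the text (a file path, say) would be converted as well.

```csharp
// Hypothetical helper: the column holds a backslash plus 'r' or 'n' (two
// characters each), so replace each pair with the real control character.
public static class DbText
{
    public static string ExpandNewlineEscapes(string stored)
    {
        return stored.Replace("\\r", "\r").Replace("\\n", "\n");
    }
}
```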
Because \r, \n, etc. work only within string literals in your C# code. If you read a string from a file, a database, or elsewhere, you just get the verbatim characters...
Replace your literal \r\n with System.Environment.NewLine; something like the line below may work for you:
text = text.Replace("\\r\\n", System.Environment.NewLine);

string replace on escape characters

Today I found out that putting strings in a resource file causes them to be treated as literals. Putting "Text for first line \n Text for second line" causes the escape character itself to be escaped, so what's stored is "Text for first line \\n Text for second line", and these come out in the display instead of my carriage returns and tabs.
So what I'd like to do is use string.Replace to turn \\ into \, but this doesn't seem to work:
s.Replace("\\\\", "\\");
doesn't change the string at all, because the string thinks there's only one backslash.
s.Replace("\\", "");
removes all the backslashes and leaves me with just n instead of \n.
Also, using @ and half as many \ characters, or the Regex.Replace method, gives the same result.
Does anyone know a good way to do this without looping through it character by character?
Since \n is actually a single character, you cannot achieve this by simply replacing the backslashes in the string. You will need to replace each pair of \ and the following character with the escaped character, like:
s = s.Replace("\\n", "\n");
s = s.Replace("\\t", "\t");
etc.
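Since the question asks for a way to avoid looping character by character, one option is a single Regex.Replace with a MatchEvaluator. This sketch (the class name ResourceEscapes is hypothetical) handles only \n, \r, \t and \\. Matching \\ as its own pair means an escaped backslash followed by n is not turned into a newline, which chained String.Replace calls can get wrong.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: expands backslash escapes from a resource string in
// one Regex.Replace call; the scanning loop lives inside the regex engine.
public static class ResourceEscapes
{
    // A literal backslash followed by one of n, r, t or another backslash.
    private static readonly Regex Escape = new Regex(@"\\([nrt\\])");

    public static string Expand(string s)
    {
        return Escape.Replace(s, m =>
        {
            switch (m.Groups[1].Value)
            {
                case "n": return "\n";
                case "r": return "\r";
                case "t": return "\t";
                default: return "\\"; // the pair \\ becomes a single backslash
            }
        });
    }
}
```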
You'd be better served adjusting the resx files themselves. Line breaks can be entered via two mechanisms: You can edit the resx file as XML (right-click in Solution Explorer, choose "Open As," and choose XML), or you can do it in the designer.
If you do it in the XML, simply hit Enter, backspace to the beginning of the newline you've created, and you're done. You could do this with Search and Replace, as well, though it will be tricky.
If you use the GUI resx editor, holding down SHIFT while pressing ENTER will give you a line break.
You could do the run-time replacement thing, but as you are discovering, it's tricky to get going -- and in my mind constitutes a code smell. (You can also make a performance argument, but that would depend on how often string resources are called and the scale of your app in general.)
I'm actually going with John's solution and editing the XML directly as that's the better solution for the project, but codelogic answered the question that was driving me insane.
