Today I found out that strings stored in a resource file are treated as literals, i.e. entering "Text for first line \n Text for second line" causes the escape character itself to be escaped, so what's stored is "Text for first line \\n Text for second line". The raw sequences then show up in the display instead of my carriage returns and tabs.
So what I'd like to do is use string.Replace to turn \\ into \, but this doesn't seem to work.
s.Replace("\\\\", "\\");
doesn't change the string at all, because the string contains only one backslash
s.Replace("\\", "");
removes all the backslashes and leaves me with just n instead of \n
Also, using @ and half as many \ chars, or the Regex.Replace method, gives the same result.
Does anyone know of a good way to do this without looping through character by character?
Since \n is actually a single character, you cannot achieve this by simply replacing the backslashes in the string. You will need to replace each pair of \ and the following character with the corresponding escaped character, and assign the result back, since Replace returns a new string:
s = s.Replace("\\n", "\n");
s = s.Replace("\\t", "\t");
etc.
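The pairwise replacement described above could be wrapped up roughly like this. This is only a sketch: the class and method names are made up for illustration, and it assumes \n, \t and \r are the only sequences that appear in the resource strings. An escaped backslash followed by n would be mishandled.

```csharp
using System;

public static class ResourceText
{
    // Hypothetical helper: turns the two-character sequences
    // backslash+n, backslash+t and backslash+r into real control characters.
    public static string ExpandEscapes(string s)
    {
        // String.Replace returns a new string, so the result has to be kept.
        return s.Replace("\\n", "\n")
                .Replace("\\t", "\t")
                .Replace("\\r", "\r");
    }
}
```

Calling ResourceText.ExpandEscapes on the value read from the resource file, just before it is displayed, would be one place to use it.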
You'd be better served adjusting the resx files themselves. Line breaks can be entered via two mechanisms: You can edit the resx file as XML (right-click in Solution Explorer, choose "Open As," and choose XML), or you can do it in the designer.
If you do it in the XML, simply hit Enter, backspace to the beginning of the newline you've created, and you're done. You could do this with Search and Replace, as well, though it will be tricky.
If you use the GUI resx editor, holding down SHIFT while pressing ENTER will give you a line break.
You could do the run-time replacement thing, but as you are discovering, it's tricky to get going -- and in my mind constitutes a code smell. (You can also make a performance argument, but that would depend on how often string resources are called and the scale of your app in general.)
I'm actually going with John's solution and editing the XML directly as that's the better solution for the project, but codelogic answered the question that was driving me insane.
How can I replace lone instances of \n with \r\n (LF alone with CRLF) using a regular expression in C#?
I know how to do it using plain String.Replace, like:
myStr = myStr.Replace("\n", "\r\n");
myStr = myStr.Replace("\r\r\n", "\r\n");
However, this is inelegant, and would destroy any "\r+\r\n" already in the text (although they are not likely to exist).
It might be faster if you use this.
(?<!\r)\n
It basically looks for any \n that is not preceded by a \r. This would most likely be faster, because in the other case almost every letter matches [^\r], so the engine would capture that and then look for the \n after it. In the example I gave, it would only stop when it found a \n, and then look before that to see if it found a \r.
Will this do?
[^\r]\n
Basically it matches a '\n' that is preceded by a character that is not '\r'.
If you want it to detect a '\n' at the very start of the text as well, then try
(^|[^\r])\n
which says that it should match a '\n', but only one that is the first character of a line or is not preceded by '\r'.
There might be special cases to check: since you're messing with the definition of lines itself, the '^' anchor might not work too well. But I think you should get the idea.
EDIT: credit @Kibbee. Using a negative lookbehind is clearly better, since it won't capture the preceding character and should help with any edge cases as well. So here's a better regex, and the code becomes:
myStr = Regex.Replace(myStr, "(?<!\r)\n", "\r\n");
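For what it's worth, here is a small sketch of how the lookbehind version behaves on mixed input. The class and method names are only illustrative. Lone LFs are converted, while CRLF pairs that are already there are left alone.

```csharp
using System.Text.RegularExpressions;

public static class LineEndings
{
    // Hypothetical helper: converts a lone LF into CRLF.
    // (?<!\r) is a negative lookbehind, so an LF that already
    // follows a CR is not matched and stays as it is.
    public static string LoneLfToCrLf(string text)
    {
        return Regex.Replace(text, "(?<!\r)\n", "\r\n");
    }
}
```

Because the lookbehind does not consume the preceding character, nothing needs to be captured and put back in the replacement.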
I was trying to do the code below to a string and it was not working.
myStr.Replace("(?<!\r)\n", "\r\n")
I used Regex.Replace and it worked
Regex.Replace( oldValue, "(?<!\r)\n", "\r\n")
I guess that "myStr" is an object of type String; in that case, Replace does not treat its argument as a regex.
\r and \n are the equivalents for CR and LF.
My best guess is that if you know that you have an \n for EACH line, no matter what, then you first should strip out every \r. Then replace all \n with \r\n.
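A minimal sketch of that strip-then-replace idea, using nothing but String.Replace (the names are made up for illustration). It relies on the assumption stated above that every line carries a \n. A text that uses a bare \r as its only line break would have its lines merged.

```csharp
using System;

public static class SimpleNormalizer
{
    // Hypothetical helper: drop every CR, then turn every LF into CRLF.
    public static string ToCrLf(string text)
    {
        return text.Replace("\r", "")      // step 1: only LFs remain
                   .Replace("\n", "\r\n"); // step 2: every LF becomes CRLF
    }
}
```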
The answer chakrit gives would also work, but then you need to use regex, and you don't say what "myStr" is...
Edit: looking at the other examples tells me one thing: why do it the difficult way when you can do it the easy way? Just because regex exists doesn't mean you "must use" it :D
Edit2: A tool is very valuable when fiddling with regex, xpath, and whatnot that gives you strange results; may I point you to: http://www.regexbuddy.com/
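myStr = Regex.Replace(myStr, "([^\r])\n", "$1\r\n");
In .NET the captured group is referenced as $1 in the replacement string; some other regex engines use \1 instead.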
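Try this: myStr = myStr.Replace(Char.ConvertFromUtf32(10), Char.ConvertFromUtf32(13) + Char.ConvertFromUtf32(10)) (13 is CR and 10 is LF; this assumes the text has no existing CRLF pairs, which would otherwise end up with a second CR.)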
If I know the line endings must be one of CRLF or LF, something that works for me is
myStr = Regex.Replace(myStr, "\r?\n", "\r\n");
This essentially does the same as neslekkiM's answer, except it performs only one replace operation on the string rather than two. It is also compatible with regex engines that don't support negative lookbehinds or backreferences.
I have a tab-delimited file, and in one particular field the content sometimes contains a sentence with a line feed character in the middle of it (I can see it when I open the file in Notepad++). Consequently, when the program tries to split this line on the delimiter, it stops at that point and starts again on the next line, which is bad.
So I've been doing the usual with Replace, and then Trim, to get rid of it, but neither has picked it up.
i.e.
line = line.Replace("\r", " ").Replace("\n", " ");
and
line = line.Trim('\r', '\n');
What am I missing? Is there another representation of \n out there?
Edit. I have also tried (char)10 and didn't find it either.
Edit 2. Big picture: I've solved what I needed to do, but not with this particular method of replacing. Because I was using .ReadLine() on my file, Replace could never work: ReadLine had already ended the line at the stray line feed, even though the record wasn't finished. So I now read the next line as well and combine the two strings, and my mystery line feed is gone.
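The read-ahead-and-combine approach from Edit 2 might look something like the sketch below. Everything in it is an assumption for illustration: the helper name, the expectedFields parameter (the number of tab-separated columns a complete record should have), and the choice to join the pieces with a space.

```csharp
using System.Collections.Generic;
using System.IO;

public static class TabFile
{
    // Hypothetical helper: yields logical records, gluing physical lines
    // together until a record has the expected number of tab-separated fields.
    public static IEnumerable<string> ReadRecords(TextReader reader, int expectedFields)
    {
        string pending = null;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // A space stands in for the line feed that ReadLine swallowed.
            pending = pending == null ? line : pending + " " + line;
            if (pending.Split('\t').Length >= expectedFields)
            {
                yield return pending;
                pending = null;
            }
        }
        // Hand back whatever is left, even if it is incomplete.
        if (pending != null) yield return pending;
    }
}
```

It can be fed a StreamReader opened on the file, or a StringReader for a quick test.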
From the comments to the question:
You mentioned you tried (char)10, but it only worked if it was in quotes. That just treats it as "(char)10" (a string) instead of the character that a base 10 (decimal) value of "10" actually is: a linefeed character.
It didn't work without quotes because the char overload of Replace requires both arguments to be chars: you cannot replace a char with a string. (Confusing, I know.) In VB a char literal is denoted with a c after the closing quote; in C# it is written with single quotes.
Example:
line = line.Replace("\r", " ").Replace("\n", " ").Replace((char)10, ' ');
The other possible way of doing this without a char literal is to use the char of the space's ASCII value (32):
line = line.Replace("\r", " ").Replace("\n", " ").Replace((char)10, (char)32);
That's probably why your initial attempt didn't work when trying to replace the linefeed character.
I think you also may have been fine if you replaced the \n by itself (especially before replacing the \r), or Environment.NewLine, which on Windows represents CRLF: carriage return (\r) followed by line feed ((char)10), used to denote a new line. (Environment.NewLine adapts to the system the code is running on, though, whereas \n is always just LF.)
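To make the char-versus-string point concrete, here is a rough sketch with illustrative names. It uses the string overload for the two-character CRLF pair and the char overload for single characters. '\n' and (char)10 are the same character, so either spelling works.

```csharp
using System;

public static class LineFeedDemo
{
    // Hypothetical helper: replaces every kind of line break with one space.
    public static string Flatten(string line)
    {
        return line.Replace("\r\n", " ")     // string overload: CRLF pair first
                   .Replace((char)10, ' ')   // char overload: lone LF
                   .Replace('\r', ' ');      // char overload: lone CR
    }
}
```

Handling the CRLF pair first means a Windows line break becomes one space rather than two.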
I'm converting from VB to C#, and in C# I seem not to be able to simply write a path string to the application settings..
D:\Something becomes D:\\Something
I tried also @"D:\Something", but that also doesn't work.
So what is the correct way? Say I want to have two settings; path and filename. How shall I format them, for the purpose of Path.Combine to make this a valid file-path/name for a database, or in other words, to have single backslashes?
Your code is working correctly: when you read a string with doubled backslashes back, they become single backslashes again. This is called escaping. It is designed to let you enter special characters as sequences starting with \. A single backslash is special in this scheme, so you need to escape it with another backslash as well.
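A small sketch to show that the doubled backslash exists only in the source code, not in the string itself. The class, the helper, and any file name passed to it are placeholders, not something from the question.

```csharp
using System.IO;

public static class SettingsPathDemo
{
    // Both literals produce the same 12-character string
    // containing a single backslash.
    public const string Verbatim = @"D:\Something";
    public const string Escaped = "D:\\Something";

    // Hypothetical helper: joins a folder setting and a file-name setting.
    public static string BuildDatabasePath(string folder, string fileName)
    {
        return Path.Combine(folder, fileName);
    }
}
```

The debugger and the settings designer may display the escaped form, but writing the value to the console or passing it to Path.Combine uses the string with single backslashes.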
I have to parse some files that contain some string that has characters in them that I need to escape. To make a short example you can imagine something like this:
var stringFromFile = "This is \\n a test \\u0085";
Console.WriteLine(stringFromFile);
The above results in the output:
This is \n a test \u0085
but I want the escape sequences in the text interpreted (unescaped). How do I do this in C#? The text contains unicode characters too.
To be clear: the above code is just an example. The text contains the \n and unicode \u00xx sequences as read from the file.
Example of the file contents:
Fisika (vanaf Grieks, \u03C6\u03C5\u03C3\u03B9\u03BA\u03CC\u03C2,
\"Natuurlik\", en \u03C6\u03CD\u03C3\u03B9\u03C2, \"Natuur\") is die
wetenskap van die Natuur
Try it using: Regex.Unescape(string)
Should be the right way.
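A quick sketch of what that looks like, with a made-up wrapper name. Regex.Unescape is meant for regex patterns, so treat this as an approximation of C# escape handling. As far as I know it can throw an ArgumentException on sequences it doesn't recognize.

```csharp
using System.Text.RegularExpressions;

public static class UnescapeDemo
{
    // Hypothetical wrapper: turns sequences such as backslash+n or
    // backslash+u03C6 read from a file into the characters they stand for.
    public static string Decode(string raw)
    {
        return Regex.Unescape(raw);
    }
}
```

With the sample from the question, UnescapeDemo.Decode(stringFromFile) would yield a real line feed and the real U+0085 character instead of the backslash sequences.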
Don't use the @ symbol -- this interprets the string as 100% literal. Just take it off and all shall be well.
EDIT
I may have been a bit hasty with my reply. I think what you're asking is: how can I have C# turn the literal string '\n' into a newline, when read from a file (similar question for other escaped literals).
The answer is: you write it yourself. You need to search for "\\n" and convert it to "\n". Keep in mind that in C#, it's the compiler not the language that changes your strings into actual literals, so there's not some library call to do this (actually there could be -- someone look this up, quick).
EDIT
Aha! Eureka! Behold:
http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.unescape.aspx
Since you are reading the string from a file, \n is not read as a unicode character but rather as two characters \ and n.
I would say you probably need a search and replace function to convert the string "\n" to its unicode character '\n' and so on.
I don't think there's any easy way to do this. Because it's the job of lexical analyzer to parse literals.
I would try generating and compiling a class via CodeDOM with the string inserted there as constant. It's not very fast but it will do all escaping.
I'm in the process of updating a program that fixes subtitles.
Till now I got away without using regular expressions, but the last problem that has come up might benefit by their use. (I've already solved it without regular expressions, but it's a very unoptimized method that slows my program significantly).
TL;DR;
I'm trying to make the following work:
I want all instances of:
"! ." , "!." and "! . " to become: "!"
unless the dot is followed by another dot, in which case I want all instances of:
"!.." , "! .." , "! . . " and "!. ." to become: "!..."
I've tried this code:
the_str = Regex.Replace(the_str, "\\! \\. [^.]", "\\! [^.]");
that comes close to the first part of what I want to do, but I can't make the [^.] character of the replacement string to be the same character as the one in the original string... Please help!
I'm interested in both C# and PHP implementations...
$str = preg_replace('/!(?:\s*\.){2,3}/', '!...', $str);
$str = preg_replace('/!\s*\.(?!\s*\.)/', '!', $str);
This does the work in two PCREs. You probably could do some magic to merge them into one, but it wouldn't be readable anymore. The first PCRE is for !..., the second one for !. They are quite straightforward.
C#
s = Regex.Replace(s, @"!\s?\.\s?(\.?)\s?", "!$1$1$1");
PHP
$s = preg_replace('/!\s?\.\s?(\.?)\s?/', '!$1$1$1', $s);
The first dot is consumed but not captured; you're effectively throwing that one away. Group #1 captures the second dot if there is one, or an empty string if not. In either case, plugging it into the replacement string three times yields the desired result.
I used \s instead of literal spaces to make it more obvious what I was doing, and added the ? quantifier to make the spaces optional. If you really need to restrict it to actual space characters (not tabs, newlines, etc.) you can change them back to spaces. If you want to allow more than one space at a time, you can change ? to * where appropriate--e.g.:
@"!\s*\.\s*(\.?)\s*"
Also, notice the use of C#'s verbatim string literals--the antidote for backslashitis. ;)
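Here is the C# one-liner wrapped in a helper so it can be tried on the cases from the question. The class and method names are only for illustration. One thing to watch: a string that already reads "!..." matches on its first two dots and ends up with a fourth, so this sketch assumes the input only contains the broken variants.

```csharp
using System.Text.RegularExpressions;

public static class SubtitleFix
{
    // Hypothetical helper: "! ." becomes "!", while "!.." and "! . ." become "!...".
    // Group 1 holds the second dot (or nothing), and is emitted three times.
    public static string FixBangDots(string s)
    {
        return Regex.Replace(s, @"!\s?\.\s?(\.?)\s?", "!$1$1$1");
    }
}
```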