Escape string from file - c#

I have to parse some files containing strings with characters that I need to escape. As a short example, imagine something like this:
var stringFromFile = "This is \\n a test \\u0085";
Console.WriteLine(stringFromFile);
The above results in the output:
This is \n a test \u0085
but I want the escape sequences interpreted, so that \n becomes a real newline and \u0085 a real character. How do I do this in C#? The text contains Unicode characters too.
To be clear: the above code is just an example. The text in the file contains the literal \n and Unicode \u00xx sequences.
Example of the file contents:
Fisika (vanaf Grieks, \u03C6\u03C5\u03C3\u03B9\u03BA\u03CC\u03C2,
\"Natuurlik\", en \u03C6\u03CD\u03C3\u03B9\u03C2, \"Natuur\") is die
wetenskap van die Natuur

Try Regex.Unescape(string).
It should be the right way.
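A minimal sketch of that suggestion, assuming the file text uses only escapes that Regex.Unescape recognizes (\n, \t, \uXXXX and similar). The class and method names are made up for illustration, and the verbatim literal stands in for text read from a file. Note that Regex.Unescape is meant for regex patterns, so it throws an ArgumentException on sequences it does not recognize, such as \d.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexUnescapeDemo
{
    // Hypothetical helper name; it only forwards to Regex.Unescape.
    public static string Interpret(string rawFromFile)
    {
        return Regex.Unescape(rawFromFile);
    }

    public static void Main()
    {
        // A verbatim literal stands in for text read from a file:
        // it holds a backslash followed by 'n', not a newline.
        string raw = @"Fisika \u03C6\u03CD\u03C3\u03B9\u03C2\nNatuur";
        string text = Interpret(raw);

        Console.WriteLine(raw.Length > text.Length); // True: each escape became one char
        Console.WriteLine(text.Contains("\n"));      // True: a real newline is now present
    }
}
```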

Don't use the @ symbol -- this interprets the string as 100% literal. Just take it off and all shall be well.
EDIT
I may have been a bit hasty with my reply. I think what you're asking is: how can I have C# turn the literal string '\n' into a newline when it is read from a file (and similarly for other escape sequences)?
The answer is: you write it yourself. You need to search for "\\n" and convert it to "\n". Keep in mind that in C# it's the compiler, at compile time, that turns escape sequences in literals into actual characters, so there's not some library call to do this (actually there could be -- someone look this up, quick).
EDIT
Aha! Eureka! Behold:
http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.unescape.aspx

Since you are reading the string from a file, \n is not read as a single character but rather as two characters, \ and n.
I would say you probably need a search and replace function to convert the string "\n" to the character '\n', and so on.

I don't think there's any easy way to do this, because parsing literals is the job of the lexical analyzer.
I would try generating and compiling a class via CodeDOM with the string inserted there as a constant. It's not very fast, but it will handle all the escape sequences.


Properly escape filepaths

How do I escape with the @-sign when using variables?
File.Delete(@"c:\test"); // WORKS!
File.Delete(@path); // doesn't work :(
File.Delete(@"c:\test"+path); // WORKS
Anyone have any idea? It's the 2nd example I want to use!
Strings prefixed with the @ character are called verbatim string literals (whose contents do not need to be escaped).
Therefore, you can only use @ with string literals, not string variables.
So, just File.Delete(path); will do, after you assign the path in advance of course (from a verbatim string or some other string).
Verbatim strings are just a syntactic nicety to be able to type strings containing backslashes (paths, regexes) easier. The declarations
string path = "C:\\test";
string path = @"C:\test";
are completely identical in their result. Both result in a string containing C:\test. Note that either option is just needed because the C# language treats \ in strings as special.
The @ is not some magic pixie dust needed to make paths work properly, it has a defined meaning when prefixed to strings, in that the strings are interpreted without the usual \ escape sequences.
The reason your second example doesn't work like you expect is that @ prefixed to a variable name does something different: It allows you to use reserved keywords as identifiers, so that you could use @class as an identifier, for example. For identifiers that don't clash with keywords the result is the same as without.
If you have a string variable containing a path, then you can usually assume that there is no escaping needed at all. After all it already is in a string. The things I mentioned above are needed to get text from source code correctly through the compiler into a string at runtime, because the compiler has different ideas. The string itself is just data that's always represented the same.
This still means that you have to initialise the string in a way that backslashes survive. If you read it from somewhere no special treatment should be necessary, if you have it as a constant string somewhere else in the code, then again, one of the options at the top has to be used.
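Here is a small sketch of both points, with class and method names invented for illustration: the two literal forms produce the same string, and @ in front of a variable name is only an identifier prefix.

```csharp
using System;

public static class VerbatimDemo
{
    public static bool SameString()
    {
        string regular = "C:\\test";   // backslash escaped for the compiler
        string verbatim = @"C:\test";  // same characters, no escaping needed
        return regular == verbatim;
    }

    public static bool SameVariable()
    {
        string path = @"C:\test";
        // @path is just the identifier "path"; the @ only matters for keywords,
        // which is why a variable can be called @class.
        string @class = @path;
        return object.ReferenceEquals(@class, path);
    }

    public static void Main()
    {
        Console.WriteLine(SameString());      // True
        Console.WriteLine(SameVariable());    // True
        Console.WriteLine(@"C:\test".Length); // 7
    }
}
```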
string path = @"c:\test";
File.Delete(path);
This works because @ is applied to a string literal. Written as a regular literal, the same string is "c:\\test".
There's a major problem with your understanding of the # indicator.
@"whatever string" is a verbatim string literal. It tells the C# compiler not to look for escape sequences. Normally, "\" starts an escape sequence in a string, and you can do things like "\n" to indicate a new line or "\t" to indicate a tab. However, if you have @"\n", it tells the compiler "no, I really want to treat the backslash as a backslash character, not an escape sequence."
If you don't like verbatim literals, the way to do it is to use "\\" anywhere you want a single backslash, because the compiler knows to treat an escaped backslash as the single character.
In either case, @"\n" and "\\n" will produce a 2-character string in memory, with the characters '\' and 'n'. It doesn't matter which way you get there; both are ways of telling the compiler you want those two characters.
In light of this, @path makes no sense, because you don't have any literal characters - just a variable. By the time you have the variable, you already have the characters you want in memory. It does compile ok, as explained by Joey, but it's not logically what you're looking for.
If you're looking for a way to get rid of occurrences of \\ within a variable, you simply want String.Replace:
string ugly = @"C:\\foo";
ugly = ugly.Replace(@"\\", @"\");
The first and third are actual paths, hence they would work.
The second compiles, because @path is simply the identifier path, and it works once path holds a valid path:
string path = @"c:\test";
File.Delete(path);

Remove double backslashes c# (for use ESC/POS programming)

I've searched a long time, and it seems that my problem is known world-wide. But none of the answers given work for me. Most of the time, people say 'there is no problem'.
The problem: I'm programming a POS solution, and I'm using an Epson POS printer. To print the bottom of the receipt, I'm storing a string in the database, so that users can adjust the text at the bottom of the receipt. But when I pull the string out of the database, C# adds slashes to the string, so my escape characters don't work. I know that usually this is not a problem, but in my case it is, because my ESC/POS commands won't work.
I've already tried some scripts which replace the double \ with a single \, but they don't work (e.g. String.Replace(@'\\',@'\')).
Problem:
I have a string: "foo \n bar"
Needs to print as:
foo
bar
C# adds slashes: "foo \\n bar"
Now it's printed as:
foo \n bar
Anyone an idea?
The problem is a misunderstanding of how C# handles strings. Take the following sample code:
string foo = "a\nb";
int fooLength = foo.Length; // 3 characters.
int bar = (int)(foo[1]); // 10 = linefeed character.
versus:
string foo = @"a\nb"; // NB: @ prefix!
int fooLength = foo.Length; // 4 characters.
int bar = (int)(foo[1]); // 92 = backslash character.
The first example uses a regular string literal ("a\nb"), which is interpreted by the C# compiler to yield three characters. The second example uses a verbatim string literal, due to the @ prefix, which suppresses the interpretation of escape sequences.
Note that the debugger is designed to add to the confusion by displaying strings with escape codes added, e.g. string foo = "a\nb" + (Char)9; results in a string that the debugger shows as "a\nb\t". If you use the "text visualizer" in the debugger (by clicking on the magnifying glass when examining the variable's value) you can see the difference between literal and interpreted characters.
Databases are, as a rule, designed to accept and return string values without interpretation. That way you needn't worry about names like "Delete D'table". Neither the presence of a SQL keyword, nor punctuation used in SQL statements, should present a problem in a data column.
Now the OP's issue should be becoming clearer. The string retrieved from the database does not contain a linefeed, but instead contains the characters '\' and 'n'. .NET has no reason to change those values when the string is read from the database and written to a printer. Unfortunately, the debugger confounds the difference. (Use the text visualizer as described above.)
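If the debugger display is not trusted, one way to check what a string really contains is to print its character codes. This is only a sketch; StringInspector and Dump are hypothetical names, not part of .NET.

```csharp
using System;
using System.Linq;

public static class StringInspector
{
    // Lists each UTF-16 code unit as four hex digits so escapes cannot hide anything.
    public static string Dump(string s)
    {
        return string.Join(" ", s.Select(c => ((int)c).ToString("X4")));
    }

    public static void Main()
    {
        Console.WriteLine(Dump("a\nb"));   // 0061 000A 0062
        Console.WriteLine(Dump(@"a\nb"));  // 0061 005C 006E 0062
    }
}
```

A string read from the database that dumps as 005C 006E contains a backslash and an 'n', whatever the debugger shows.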
The solution involves adding code to reproduce the C# compiler's processing of escape sequences. (This should include escaping escape characters!) Alternatively, tokens can be added that are suitable for the application at hand, e.g. occurrences of «ESC» could be replaced with an ASCII escape character. This can be employed for longer sequences: for example, if a printer uses several characters to introduce a font change, then write the code to replace «SetFont» with the correct sequence. More generally, you can replace a snippet with a dynamic value, e.g. «Now» could be replaced with the current date/time when the receipt is being printed. (Register number, cashier name, store hours, ... .) This makes the values in the database more human readable than embedded Unicode oddities and more flexible than fixed strings.
Left as an exercise for the reader: extend snippets to support formatting and null value substitution. «Now|DD.MM.YY hh:mm» to specify a format, «Discount|*|n/a» to specify a value ("n/a") to be displayed if the field is null.
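The basic token idea could be sketched like this. It is an illustration under assumptions, not a finished implementation: the class name, the «LF» token and the dictionary contents are invented here, and the format and null-value extensions are still left out.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ReceiptTokens
{
    // Matches «Name» and captures the name between the guillemets.
    private static readonly Regex Token = new Regex("«([^«»]+)»");

    public static string Expand(string template, IDictionary<string, string> values)
    {
        return Token.Replace(template, m =>
        {
            string replacement;
            // Unknown tokens are left in place so typos stay visible.
            return values.TryGetValue(m.Groups[1].Value, out replacement)
                ? replacement
                : m.Value;
        });
    }

    public static void Main()
    {
        var values = new Dictionary<string, string>
        {
            { "ESC", "\u001B" },  // ASCII escape character
            { "LF", "\n" },       // hypothetical token for a line feed
            { "Now", DateTime.Now.ToString("dd.MM.yy HH:mm") }
        };
        Console.WriteLine(Expand("Thank you!«LF»Printed «Now»", values));
    }
}
```

Leaving unknown tokens untouched is a design choice for this sketch; throwing or logging would also be reasonable.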

Why is \n from an ini file not working?

I am just curious about the title.
String a = "abc\ndef";
Console.WriteLine(a);
The output is
abc
def
Then I stored that value into an ini file and retrieved it from there.
ini.iniwritevalue("a", "a", a);
string b = ini.inireadvalue("a", "a");
Then I showed it on the console. The result is the following:
abc\ndef
Why is \n not working after I retrieved it from the ini file?
P.S. I have an ini.dll file. Our company uses that DLL to read and write ini files.
The interpretation of the \n escape code in your source code is done by the compiler when parsing the source file.
If you just read in a file as "data" at runtime, no such interpretation is necessarily going to occur.
You may need to find or write a function which takes a string containing escape sequences and converts them to the characters they stand for (\n usually becomes 0x0a).
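Such a function might look roughly like the sketch below. It assumes only a handful of escapes matter (\n, \r, \t, \\, \" and \uXXXX) and leaves anything else as written; SimpleUnescaper is a made-up name, and this is not how the compiler or any ini library does it.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class SimpleUnescaper
{
    public static string Unescape(string input)
    {
        var sb = new StringBuilder(input.Length);
        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (c != '\\' || i + 1 >= input.Length)
            {
                sb.Append(c);            // ordinary character or trailing backslash
                continue;
            }

            char next = input[++i];      // character after the backslash
            switch (next)
            {
                case 'n': sb.Append('\n'); break;
                case 'r': sb.Append('\r'); break;
                case 't': sb.Append('\t'); break;
                case '\\': sb.Append('\\'); break;
                case '"': sb.Append('"'); break;
                case 'u':
                    int code;
                    // Expect exactly four hex digits after \u.
                    if (i + 4 < input.Length &&
                        int.TryParse(input.Substring(i + 1, 4), NumberStyles.AllowHexSpecifier,
                                     CultureInfo.InvariantCulture, out code))
                    {
                        sb.Append((char)code);
                        i += 4;
                    }
                    else
                    {
                        sb.Append("\\u"); // malformed: keep as written
                    }
                    break;
                default:
                    sb.Append('\\').Append(next); // unknown escape: keep as written
                    break;
            }
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // The verbatim literal stands in for the value read back from the ini file.
        Console.WriteLine(Unescape(@"abc\ndef"));
    }
}
```

Because the string is scanned once from left to right, an escaped backslash followed by n stays a backslash and an n instead of turning into a newline.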
This is because \n in C# is not just a \ and an n, but an escape sequence with a special meaning. \n is considered a single character and is a line ending. You will not get it when you simply read a \ and an n from a file.
Possibly, you read \\n from there. \\ is also an escape sequence which means the \ character. All you have to do is replace \\n with \n, and it'll be okay.
string s = ... //get the value
s = s.Replace("\\n", "\n");
You need to escape the slash when you write the value, like this:
abc\\ndef

Removing String Escape Codes

My program outputs strings like "Wzyryrff}av{v5~fvzu: Bb``igbuz~+\177Ql\027}C5]{H5LqL{" and the problem is the escape codes (\\\ instead of \, \177 instead of the character, etc.)
I need a way to unescape the string of all escape codes (mainly just the \\\ and octal \027 types). Is there something that already does this?
Thanks
Reference: http://www.tailrecursive.org/postscript/escapes.html
The strings are an encrypted value and I need to decrypt them, but I'm getting the wrong values since the strings are escaped
It sounds more like it's encoded rather than simply escaped (if \177 is really a character). So, try decoding it.
There is nothing built in to do exactly this kind of escaping.
You will need to parse and replace these sequences yourself.
The \xxx octal escapes can be found with a regex (\\\d{3}); iterating over the matches lets you parse out the octal part and get the replacement character for it (then a simple replace will do).
The others appear to be simple to replace with string.Replace.
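As a rough sketch of that approach, the following handles three-digit octal escapes and escaped backslashes in one pass. It uses [0-7] rather than \d for the digits, and it assumes those two forms are the only escapes present; OctalUnescaper is a hypothetical name.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OctalUnescaper
{
    // A backslash followed by either three octal digits or another backslash.
    private static readonly Regex Escape = new Regex(@"\\([0-7]{3}|\\)");

    public static string Unescape(string input)
    {
        return Escape.Replace(input, m =>
        {
            string body = m.Groups[1].Value;
            if (body == "\\")
            {
                return "\\";                      // escaped backslash
            }
            int code = Convert.ToInt32(body, 8);  // parse the digits as base 8
            return ((char)code).ToString();
        });
    }

    public static void Main()
    {
        string plain = Unescape(@"+\177Ql\027}C");
        Console.WriteLine(plain.Length);   // 7
        Console.WriteLine((int)plain[1]);  // 127
    }
}
```

Doing both replacements in a single Regex.Replace avoids the ordering problem of chained string.Replace calls, where the output of one replacement could be picked up by the next.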
If the string is encrypted then you probably need to treat it as binary and not text. You need to know how it is encoded and decode it accordingly. The fact that you can view it as text is incidental.
If you want to replace specific contents you can just use the .Replace() method.
i.e. myInput = myInput.Replace(@"\\", @"\");
I am not sure why the "\\" is a problem for you. If it is actually an escape code then it should be fine, since \\ represents a single \ in a string.
What is the reason you need to "remove" the escape codes?

string replace on escape characters

Today I found out that putting strings in a resource file will cause them to be treated as literals, i.e. putting "Text for first line \n Text for second line" will cause the escape character itself to become escaped, and so what's stored is "Text for first line \\n Text for second line" - and then these come out in the display instead of my carriage returns and tabs.
So what I'd like to do is use string.Replace to turn \\ into \ - this doesn't seem to work.
s.Replace("\\\\", "\\");
doesn't change the string at all because the string thinks there's only 1 backslash
s.Replace("\\", "");
replaces all the backslashes and leaves me with just n instead of \n.
Also, using @ and half as many \ chars, or the Regex.Replace method, gives the same result.
anyone know of a good way to do this without looping through character by character?
Since \n is actually a single character, you cannot achieve this by simply replacing the backslashes in the string. You will need to replace each pair of \ and the following character with the character it stands for. Replace returns a new string, so assign the result:
s = s.Replace("\\n", "\n");
s = s.Replace("\\t", "\t");
etc.
You'd be better served adjusting the resx files themselves. Line breaks can be entered via two mechanisms: You can edit the resx file as XML (right-click in Solution Explorer, choose "Open As," and choose XML), or you can do it in the designer.
If you do it in the XML, simply hit Enter, backspace to the beginning of the newline you've created, and you're done. You could do this with Search and Replace, as well, though it will be tricky.
If you use the GUI resx editor, holding down SHIFT while pressing ENTER will give you a line break.
You could do the run-time replacement thing, but as you are discovering, it's tricky to get going -- and in my mind constitutes a code smell. (You can also make a performance argument, but that would depend on how often string resources are called and the scale of your app in general.)
I'm actually going with John's solution and editing the XML directly as that's the better solution for the project, but codelogic answered the question that was driving me insane.
