Convert backslash-escaped characters to literals, within a string - c#

Are there any .NET provided functions to convert a string with backslash-escaped characters into the literal string?
For example, the string @"this\x20is a\ntest" should become "this is a\ntest", where \n is a literal newline character and \x20 is a literal space. Preferably, these would be the standard Microsoft (C#/.NET) escape sequences.

Try using Regex.Unescape
using System.Text.RegularExpressions;
...
string result = Regex.Unescape(@"this\x20is a\ntest");
This results in:
this is a
test
https://dotnetfiddle.net/y2f5GE
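It might not always behave as expected: Regex.Unescape applies .NET regular-expression escape rules, not the C# compiler's, so the two sets of escape sequences do not match exactly. Please read the docs for details.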
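A minimal sketch of the typical use case, assuming the escaped text arrives at runtime (from a file or user input) rather than as a C# literal. The verbatim literal below only simulates that input by keeping the backslashes as ordinary characters.

```csharp
using System;
using System.Text.RegularExpressions;

// Simulated runtime input: the verbatim literal keeps \x20 and \n as
// backslash + characters, exactly as if they had been read from a file.
string escaped = @"this\x20is a\ntest";

// Regex.Unescape converts \x20 to a space and \n to a newline character.
string unescaped = Regex.Unescape(escaped);

Console.WriteLine(unescaped);
// this is a
// test
```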

Related

how to validate regular expression using System.Text.RegularExpressions.Regex.IsMatch in C# [duplicate]

I have a trial version of ReSharper and it always suggests that I switch regular strings to verbatim strings. What is the difference?
A verbatim string is one that does not need to be escaped, like a filename:
string myFileName = "C:\\myfolder\\myfile.txt";
would be
string myFileName = @"C:\myfolder\myfile.txt";
The @ symbol tells the compiler to read the string literally and not to interpret escape sequences.
This is covered in section 2.4.4.5 of the C# specification:
2.4.4.5 String literals
C# supports two forms of string literals: regular string literals and verbatim string literals.
A regular string literal consists of zero or more characters enclosed in double quotes, as in "hello", and may include both simple escape sequences (such as \t for the tab character) and hexadecimal and Unicode escape sequences.
A verbatim string literal consists of an @ character followed by a double-quote character, zero or more characters, and a closing double-quote character. A simple example is @"hello". In a verbatim string literal, the characters between the delimiters are interpreted verbatim, the only exception being a quote-escape-sequence. In particular, simple escape sequences and hexadecimal and Unicode escape sequences are not processed in verbatim string literals. A verbatim string literal may span multiple lines.
In other words, the only special character in a @"verbatim string literal" is the double-quote character. If you wish to write a verbatim string containing a double-quote, you must write two double-quotes. All other characters are interpreted literally.
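A small illustration of the quote rule: the same text written both ways. The sample sentence is made up for the example.

```csharp
using System;

// Verbatim: "" stands for one double-quote; the backslash needs no escaping.
string verbatim = @"She said ""hi"" in C:\temp";

// Regular: the quote needs \" and the backslash needs \\.
string regular = "She said \"hi\" in C:\\temp";

Console.WriteLine(verbatim);             // She said "hi" in C:\temp
Console.WriteLine(verbatim == regular);  // True
```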
You can even have literal newlines in a verbatim string literal. In a regular string literal you cannot have literal newlines; instead you must use an escape sequence such as "\n".
Verbatim string literals are often useful for embedding filenames and regular expressions in source code, because backslashes are common in these strings and would need to be escaped if a regular string literal were used.
There is no difference at runtime between strings created from regular string literals and strings created from verbatim string literals: they are both of type System.String.
There is no runtime difference between a string and verbatim string. They're only different at compile time. The compiler accepts fewer escape sequences in a verbatim string so what-you-see-is-what-you-get other than a quote escape.
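To make the "no runtime difference" point concrete, this sketch compares the two file-name literals from the answer. The ReferenceEquals line relies on the CLR interning string literals (identical literals in an assembly resolve to the same string object), so it shows the compiler produced one and the same value.

```csharp
using System;

string regular = "C:\\myfolder\\myfile.txt";
string verbatim = @"C:\myfolder\myfile.txt";

// Same characters, same type.
Console.WriteLine(regular == verbatim);       // True
Console.WriteLine(verbatim.GetType());        // System.String

// Identical literals are interned, so both variables hold the same object.
Console.WriteLine(ReferenceEquals(regular, verbatim)); // True
```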
You can also use the verbatim character, @, to tell the compiler to treat a keyword as a name:
var @if = "if";
//okay, treated as a name
Console.WriteLine(@if);
//compiler err, if without @ is a keyword
Console.WriteLine(if);
var @a = "a";
//okay
Console.WriteLine(@a);
//also okay, @ isn't part of the name
Console.WriteLine(a);
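A sketch of where this tends to matter in practice: a parameter whose natural name is a keyword. The `Tag` helper is hypothetical, written only for illustration; `nameof` shows that the @ is not part of the identifier.

```csharp
using System;

// Hypothetical helper: "class" is the natural name for a CSS class
// parameter, but it is a keyword, so it is written @class.
static string Tag(string text, string @class) =>
    $"<span class=\"{@class}\">{text}</span>";

var @event = "click";

Console.WriteLine(Tag("hi", "note"));  // <span class="note">hi</span>
Console.WriteLine(@event);             // click
Console.WriteLine(nameof(@event));     // event (the @ is not part of the name)
```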
You can also write multiline strings using verbatim strings:
Console.WriteLine(@"This
is
a
Test
for stackoverflow");
Without the @ you get a compile error.
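One detail worth knowing: the line break inside a multiline verbatim literal is whatever the source file contains, "\r\n" for Windows line endings and "\n" otherwise, and any indentation on the continuation lines becomes part of the string. A short sketch:

```csharp
using System;

// The second line starts at column 0 on purpose: leading spaces
// would be included in the string.
string multi = @"first line
second line";

// Contains a newline either way; whether it is CRLF or LF
// depends on how this source file was saved.
Console.WriteLine(multi.Contains("\n"));               // True
Console.WriteLine(multi.Contains("\r\n") ? "CRLF" : "LF");
```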
In VB14 there is a new feature called Multiline Strings; it's like verbatim strings in C#.
Pro tip: VB string literals are now exactly like C# verbatim strings.
Regular strings use special escape sequences to translate to special characters.
/*
This string contains a newline
and a tab and an escaped backslash\
*/
Console.WriteLine("This string contains a newline\nand a tab\tand an escaped backslash\\");
Verbatim strings are interpreted as is, without translating any escape sequences:
/*
This string displays as is. No newlines\n, tabs\t or backslash-escapes\\.
*/
Console.WriteLine(@"This string displays as is. No newlines\n, tabs\t or backslash-escapes\\.");
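The difference is easy to see by comparing lengths: a regular "\n" is one character, while the verbatim @"\n" is two (a backslash and an 'n').

```csharp
using System;

string regularNewline = "\n";    // one newline character
string verbatimNewline = @"\n";  // backslash followed by 'n'

Console.WriteLine(regularNewline.Length);    // 1
Console.WriteLine(verbatimNewline.Length);   // 2
Console.WriteLine(verbatimNewline == "\\n"); // True
```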
If you want to suppress the ReSharper warnings, you can use:
[Localizable(false)] (System.ComponentModel.LocalizableAttribute)
For things like parameters, file locations, etc., this could be a good solution.

On C# escape curly braces and a backslash after

I am trying to format some text so I can provide a template for some RTF text.
My string is declared with string.Format as:
var fullTitleString = string.Format(
CultureInfo.CurrentCulture,
"{{\\Test",
title,
filterName);
But I keep getting the string "{\Test". Using a single backslash results in errors, because the compiler does not recognize \T as an escape sequence.
Using @"{{\Test" also yields "{\Test". I have looked over the MSDN documentation and other questions where they say to use another backslash as the escape character, but it doesn't seem to work.
There are two levels of escaping here:
1. Escaping string literals
In C# string literals, a backslash (\) is a special character and needs to be escaped by another \. So if your resulting string should look like \\uncpath\folder, your string literal should be var s = "\\\\uncpath\\folder".
2. Escape format strings
string.Format uses curly braces for placeholders, so literal braces must be escaped by doubling them.
So let's say you have
string title = "myTitle";
string filterName = "filter";
then
string.Format("{{\\Test {0}, {1}}}", title, filterName);
results in
{\Test myTitle, filter}
If you want two curly braces at the beginning, you need to put four in your format string:
string.Format("{{{{\\Test {0}, {1}}}", title, filterName);
results in
{{\Test myTitle, filter}
If you provide a clear example of what you are trying to achieve, I may tell you the correct format string.
Side note: In C# 6 the last example could also be $"{{{{\\Test {title}, {filterName}}}" (using string interpolation without explicitly calling string.Format)
NOTE: By default, the Visual Studio debugger shows strings in their escaped, C#-literal form. So if you declare a string like string s = "\\" you will see both backslashes in the debugger windows, but Console.WriteLine(s) writes only one backslash to the console.
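Putting both levels of escaping side by side, using the title and filterName values from the answer: the braces are always doubled (that is string.Format's rule), while the backslash is doubled only in a regular literal (that is the C# compiler's rule). The interpolated form uses $@, which combines interpolation with a verbatim literal.

```csharp
using System;

string title = "myTitle";
string filterName = "filter";

// Regular literal: {{ }} for the format string, \\ for the compiler.
string viaFormat = string.Format("{{\\Test {0}, {1}}}", title, filterName);

// Verbatim literal: only the format-level {{ }} escaping remains.
string viaVerbatim = string.Format(@"{{\Test {0}, {1}}}", title, filterName);

// C# 6 interpolation combined with a verbatim literal.
string viaInterpolation = $@"{{\Test {title}, {filterName}}}";

Console.WriteLine(viaFormat);        // {\Test myTitle, filter}
Console.WriteLine(viaVerbatim);      // {\Test myTitle, filter}
Console.WriteLine(viaInterpolation); // {\Test myTitle, filter}

// The UNC example: both literals contain \\uncpath\folder.
string uncRegular = "\\\\uncpath\\folder";
string uncVerbatim = @"\\uncpath\folder";
Console.WriteLine(uncRegular == uncVerbatim); // True
```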

Using a Regex to clean string versus Base64 Encoded string

I have an extension method that uses Regex.Replace to clean up invalid characters in a user-entered string before it is added to an XML document.
The intent of the regex is to strip out some random hi-ASCII characters that are occasionally in the input when the user pastes text from Microsoft Word and replace them with a space:
public static string CleanInput(this string inputString) {
if (string.IsNullOrEmpty(inputString))
return string.Empty;
// Replace invalid characters with a space.
return Regex.Replace(inputString, @"[^\w\.@-]", " ");
}
Now as fate would have it, someone is now using this extension method on a string that contains base64-encoded data.
What I believe is that the regex will leave MOST of the base64 data unmodified; however, I think it might be changing some of it.
So, knowing that \w in the regex matches [A-Za-z0-9_] and that Base64 uses effectively the same range, should this regex be changing the string or not?
If it is changing the string, why, and how would you change it so that hi-ASCII garbage is still cleaned up in regular non-encoded text without mucking up the encoded string?
Base64 also uses +, /, and =.
You can add these to your character class:
[^\w\.@+/=-]
Note that - is placed last so that it is a literal hyphen-minus instead of specifying a range.
It may also be worth considering that, according to Microsoft, \w in .NET isn't necessarily the same as [A-Za-z0-9_]: unless RegexOptions.ECMAScript is specified, it matches any Unicode word character, including non-ASCII letters such as é.
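A sketch that checks the answer's claim. `CleanOriginal` and `CleanBase64Safe` are hypothetical local functions standing in for the asker's extension method (the null/empty check is omitted). The bytes 0xFB, 0xFF are chosen because they encode to "+/8=", which uses all three extra Base64 characters.

```csharp
using System;
using System.Text.RegularExpressions;

// The question's pattern and the answer's extended pattern.
static string CleanOriginal(string s) => Regex.Replace(s, @"[^\w\.@-]", " ");
static string CleanBase64Safe(string s) => Regex.Replace(s, @"[^\w\.@+/=-]", " ");

string encoded = Convert.ToBase64String(new byte[] { 0xFB, 0xFF });

Console.WriteLine(encoded);                  // +/8=
Console.WriteLine(CleanOriginal(encoded));   // "  8 " : +, / and = become spaces
Console.WriteLine(CleanBase64Safe(encoded)); // +/8=  : data left intact

// A curly apostrophe pasted from Word (U+2019) is still replaced...
string word = CleanBase64Safe("don\u2019t");   // "don t"
// ...but \w matches Unicode letters, so an accented letter survives.
string accented = CleanBase64Safe("caf\u00e9"); // unchanged
```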

Regex, MVS does not like my Regex strings, how do I make it comply

So in Microsoft Visual Studio I have a string that is compiled into a regex. My string is "#(\d+(.\d+)?)=(\d+(.\d+)?)". I cannot compile my program because I get an error saying that \d is an unrecognized escape sequence. How do I tell it to shut up and let me regex like a pro?
Begin your string with @; that causes the compiler to leave (almost) all characters alone, unescaped (the exception is ", which is escaped as ""):
@"#(\d+(.\d+)?)=(\d+(.\d+)?)"
The problem is that C# does not like the \d inside the string. Use a verbatim string instead:
string pattern = @"#(\d+(.\d+)?)=(\d+(.\d+)?)";
The @ denotes it. C# will not look for escape sequences in the string. If you need a literal " inside it, write two: "".
Of course, you can use normal strings, but then you will have to escape the backslashes:
string pattern = "#(\\d+(.\\d+)?)=(\\d+(.\\d+)?)";
If you're using a normal string, you need to escape your backslashes, like so:
"#(\\d+(.\\d+)?)=(\\d+(.\\d+)?)"
Basically, you're putting a string literal into C#; the C# compiler sees the string first and tries to interpret \d as an escape sequence (which doesn't exist, hence the error). Therefore, you write \\d so that the C# compiler produces \d, which then gets passed to the regex engine (which does recognize \d as something meaningful). (Yes, if you want to match a literal backslash in your regex pattern using a regular string literal, you need to write \\\\.)
But in C#, you have the alternative of prepending the string with @ to get the compiler to leave the string alone (though " still needs escaping), like this:
@"#(\d+(.\d+)?)=(\d+(.\d+)?)"
You could also use a verbatim string literal (I prefer to use these because of readability).
Use @"(#\d+(.\d+)?)=(\d+(.\d+)?)"
The @" prefix indicates that the string shouldn't interpret escape sequences (characters prefixed by a \) until the closing " is reached.
Note: You can match a single " in your search pattern by doubling it: "". For instance, you can match "Hello" by using the pattern @"""\w+"""
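A runnable sketch of the answers above. It uses an adapted pattern, not the asker's exact one: the leading character is dropped and the dot is escaped as \. so that it matches only a literal dot (an unescaped . matches any character). The input "1.5=2.25" is made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// The same regex written both ways: the compiler turns \\ into \.
string regularPattern = "(\\d+(\\.\\d+)?)=(\\d+(\\.\\d+)?)";
string verbatimPattern = @"(\d+(\.\d+)?)=(\d+(\.\d+)?)";
Console.WriteLine(regularPattern == verbatimPattern); // True

Match m = Regex.Match("1.5=2.25", verbatimPattern);
Console.WriteLine(m.Groups[1].Value); // 1.5
Console.WriteLine(m.Groups[3].Value); // 2.25

// One literal backslash: the regex needs \\, written "\\\\" or @"\\".
Console.WriteLine(Regex.IsMatch(@"C:\temp", "\\\\")); // True
Console.WriteLine(Regex.IsMatch(@"C:\temp", @"\\"));  // True

// "" inside a verbatim pattern stands for one double-quote.
Console.WriteLine(Regex.Match("say \"Hello\" now", @"""\w+""").Value); // "Hello"
```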

What does the @ prefix do on string literals in C#

I read a C# article about combining paths using Path.Combine(part1, part2).
It uses the following:
string part1 = @"c:\temp";
string part2 = @"assembly.txt";
May I know what the @ in part1 and part2 is for?
The @ is not related to any method.
It means that you don't need to escape special characters in the string following the symbol:
@"c:\temp"
is equal to
"c:\\temp"
Such a string is called 'verbatim' or @-quoted. See MSDN.
As others have said, it's one way to avoid escaping special characters, and it's very useful for specifying file paths.
string s1 = @"C:\MyFolder\Blue.jpg";
Another use is when you have large strings and want them displayed across multiple lines rather than as one long line.
string s2 = @"This could be very large string something like a Select query
which you would want to be shown spanning across multiple lines
rather than scrolling to the right and see what it all reads up";
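A sketch of the question's own example. The @ on part1 only saves escaping the backslash; on part2 it changes nothing because there is nothing to escape. Path.Combine inserts the platform's directory separator ('\' on Windows, '/' on Linux and macOS) because part1 does not already end with one.

```csharp
using System;
using System.IO;

string part1 = @"c:\temp";       // identical to "c:\\temp"
string part2 = @"assembly.txt";  // identical to "assembly.txt"

string combined = Path.Combine(part1, part2);

Console.WriteLine(part1 == "c:\\temp"); // True
Console.WriteLine(combined);            // c:\temp\assembly.txt on Windows
```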
As stated in the C# Language Specification 4.0:
2.4.4.5 String literals
C# supports two forms of string literals: regular string literals and verbatim string literals. A regular string literal consists of zero or more characters enclosed in double quotes, as in "hello", and may include both simple escape sequences (such as \t for the tab character), and hexadecimal and Unicode escape sequences. A verbatim string literal consists of an @ character followed by a double-quote character, zero or more characters, and a closing double-quote character. A simple example is @"hello". In a verbatim string literal, the characters between the delimiters are interpreted verbatim, the only exception being a quote-escape-sequence. In particular, simple escape sequences, and hexadecimal and Unicode escape sequences are not processed in verbatim string literals.
It denotes a verbatim string literal, and allows you to use certain characters that normally have special meaning, for example \, which is normally an escape character, and new lines. For this reason it's very useful when dealing with Windows paths.
Without using #, the first line of your example would have to be:
string part1 = "c:\\temp";
With @ you don't have to escape special characters.
So without @ you would have to write "c:\\temp".
More precisely, these are called 'verbatim' strings. You can read about them here:
http://msdn.microsoft.com/en-us/library/aa691090(v=vs.71).aspx
The @ just indicates a different way of specifying a string, such that you do not have to escape characters with \. The only caveat is that double quotes need to be written as "" to represent a single ".
