Why does attempting to verbatimize this string fail? - c#

I have this substring I want to strip out of a string:
<ArrayOfSiteQuery xmlns:i="http://www.w3.org/2001/XMLSchema-instance"
xmlns="http://schemas.datacontract.org/2004/07/CStore.DomainModels.HHS">
Realizing it was full of funkiness, I thought verbatimizing it would solve all ills:
String messedUpJunk = @"<ArrayOfSiteQuery xmlns:i="http://www.w3.org/2001/XMLSchema-instance"
xmlns="http://schemas.datacontract.org/2004/07/CStore.DomainModels.HHS">";
...but, to paraphrase the robot on Lost in Space, that does not compute (compile); I get "; expected" on the first "http".
I can get it compilable by escaping the quotes:
String messedUpJunk = "<ArrayOfSiteQuery xmlns:i=\"http://www.w3.org/2001/XMLSchema-instance\" xmlns=\"http://schemas.datacontract.org/2004/07/CStore.DomainModels.HHS\">";
...but what use is verbatim if it's not verbatimatic?

A double quote is the only character you need to escape in verbatim strings, and it is escaped differently: you double it ("") instead of using a backslash:
String messedUpJunk = @"<ArrayOfSiteQuery xmlns:i=""http://www.w3.org/2001/XMLSchema-instance""
xmlns=""http://schemas.datacontract.org/2004/07/CStore.DomainModels.HHS"">";
MSDN link:
Use double quotation marks to embed a quotation mark inside a verbatim string
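Here is a minimal sketch of the fix, run as a top-level program (C# 9 or later). The literal is kept on one line so that it matches the single-line regular literal from the question. A verbatim literal split across two lines, as above, would also contain the line break. The surrounding `xml` document is made up for illustration.

```csharp
using System;

// Verbatim literal: each embedded quote is written as "".
string messedUpJunk = @"<ArrayOfSiteQuery xmlns:i=""http://www.w3.org/2001/XMLSchema-instance"" xmlns=""http://schemas.datacontract.org/2004/07/CStore.DomainModels.HHS"">";

// Regular literal with the same contents: each embedded quote is written as \".
string escaped = "<ArrayOfSiteQuery xmlns:i=\"http://www.w3.org/2001/XMLSchema-instance\" xmlns=\"http://schemas.datacontract.org/2004/07/CStore.DomainModels.HHS\">";

Console.WriteLine(messedUpJunk == escaped); // True

// Hypothetical input; stripping the prefix is then an ordinary Replace.
string xml = messedUpJunk + "<SiteQuery/></ArrayOfSiteQuery>";
string stripped = xml.Replace(messedUpJunk, string.Empty);
Console.WriteLine(stripped); // <SiteQuery/></ArrayOfSiteQuery>
```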

It is the double quote that requires escaping (by doubling it). No other character needs escaping, not even the backslash \.
See: 2.4.4.5 String literals - C#
A verbatim string literal consists of an @ character followed by a double-quote character, zero or more characters, and a closing double-quote character. A simple example is @"hello". In a verbatim string literal, the characters between the delimiters are interpreted verbatim, the only exception being a quote-escape-sequence. In particular, simple escape sequences and hexadecimal and Unicode escape sequences are not processed in verbatim string literals. A verbatim string literal may span multiple lines.
You need to escape the double quote because it marks the start and end of the string, whether verbatim or regular.
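To see the quoted rule in action, here is a small sketch run as a top-level program. Simple escapes and Unicode escapes are left alone in a verbatim literal, and only the quote-escape-sequence "" is processed.

```csharp
using System;

string regularTab = "a\tb";   // \t is processed: a, TAB, b
string verbatimTab = @"a\tb"; // \t is kept: a, '\', 't', b

Console.WriteLine(regularTab.Length);  // 3
Console.WriteLine(verbatimTab.Length); // 4

// Unicode escapes are not processed in verbatim literals either.
string regularA = "\u0041";   // the single character 'A'
string verbatimA = @"\u0041"; // six characters: \ u 0 0 4 1
Console.WriteLine(regularA);         // A
Console.WriteLine(verbatimA.Length); // 6

// The quote-escape-sequence "" is the one thing that is processed.
Console.WriteLine(@"""".Length); // 1
```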

Related

how to validate regular expression using System.Text.RegularExpressions.Regex.IsMatch in C# [duplicate]

I have a trial version of ReSharper and it always suggests that I switch regular strings to verbatim strings. What is the difference?
A verbatim string is one whose contents do not need to be escaped, which is handy for something like a file name:
string myFileName = "C:\\myfolder\\myfile.txt";
would be
string myFileName = @"C:\myfolder\myfile.txt";
The @ symbol means to read that string literally and not to interpret escape sequences.
This is covered in section 2.4.4.5 of the C# specification:
2.4.4.5 String literals
C# supports two forms of string literals: regular string literals and verbatim string literals.
A regular string literal consists of zero or more characters enclosed in double quotes, as in "hello", and may include both simple escape sequences (such as \t for the tab character) and hexadecimal and Unicode escape sequences.
A verbatim string literal consists of an @ character followed by a double-quote character, zero or more characters, and a closing double-quote character. A simple example is @"hello". In a verbatim string literal, the characters between the delimiters are interpreted verbatim, the only exception being a quote-escape-sequence. In particular, simple escape sequences and hexadecimal and Unicode escape sequences are not processed in verbatim string literals. A verbatim string literal may span multiple lines.
In other words, the only special character in a @"verbatim string literal" is the double-quote character. If you wish to write a verbatim string containing a double quote, you must write two double quotes. All other characters are interpreted literally.
You can even have literal new lines in a verbatim string literal. In a regular string literal you cannot have literal new lines; you must use an escape sequence such as "\n" instead.
Verbatim string literals are often useful for embedding file names and regular expressions in source code, because backslashes are common in these kinds of strings and would need to be escaped if a regular string literal were used.
There is no difference at runtime between strings created from regular string literals and strings created from verbatim string literals: both are of type System.String.
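To confirm there is no runtime difference, this quick check (run as a top-level program) builds the file name from above both ways and compares the results.

```csharp
using System;

string regular = "C:\\myfolder\\myfile.txt";
string verbatim = @"C:\myfolder\myfile.txt";

// Both are ordinary System.String instances holding the same characters.
Console.WriteLine(regular.GetType() == typeof(string));  // True
Console.WriteLine(verbatim.GetType() == typeof(string)); // True
Console.WriteLine(regular == verbatim);                  // True
Console.WriteLine(verbatim.Length);                      // 22
```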
There is no runtime difference between a regular string and a verbatim string; they differ only at compile time. The compiler processes fewer escape sequences in a verbatim string, so what you see is what you get, apart from the quote escape.
You can also use the verbatim character, @, to tell the compiler to treat a keyword as a name:
var @if = "if";
//okay, treated as a name
Console.WriteLine(@if);
//compiler error: if without @ is a keyword
Console.WriteLine(if);
var @a = "a";
//okay
Console.WriteLine(@a);
//also okay, @ isn't part of the name
Console.WriteLine(a);
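Below is a compilable version of the snippet, run as a top-level program, with the deliberately failing line removed. It uses nameof to show that the @ is not part of the identifier's name.

```csharp
using System;

var @if = "if";          // 'if' is a keyword, so the @ is required
Console.WriteLine(@if);  // if

var @a = "a";            // 'a' is not a keyword; the @ is allowed but redundant
Console.WriteLine(a);    // a (same variable: @ is not part of the name)

// nameof reports the identifier without the @.
Console.WriteLine(nameof(@if)); // if
Console.WriteLine(nameof(@a));  // a
```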
You can also create multiline strings using verbatim strings:
Console.WriteLine(@"This
is
a
Test
for stackoverflow");
Without the @, you get a compile error.
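The line breaks inside such a literal become characters in the string. In this sketch (a top-level program), whether each break is "\n" or "\r\n" depends on the line endings of the source file. The code splits on '\n' and trims any '\r' so that it works either way.

```csharp
using System;

string text = @"This
is
a
Test
for stackoverflow";

// Each source line break is now part of the string.
string[] lines = text.Split('\n');
Console.WriteLine(lines.Length);           // 5
Console.WriteLine(lines[0].TrimEnd('\r')); // This
```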
In VB14 there is a new feature called multiline strings; it's like verbatim strings in C#.
Pro tip: VB string literals are now exactly like C# verbatim strings.
Regular strings use special escape sequences to translate to special characters.
/*
This string contains a newline
and a tab and an escaped backslash\
*/
Console.WriteLine("This string contains a newline\nand a tab\tand an escaped backslash\\");
Verbatim strings are interpreted as is, without translating any escape sequences:
/*
This string displays as is. No newlines\n, tabs\t or backslash-escapes\\.
*/
Console.WriteLine(@"This string displays as is. No newlines\n, tabs\t or backslash-escapes\\.");
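To check the two behaviours without reading console output, this sketch (a top-level program) inspects the characters directly. Ordinal comparison keeps the backslash checks culture-independent.

```csharp
using System;

string regular = "newline\nand tab\tand backslash\\";
string verbatim = @"newline\nand tab\tand backslash\\";

// The regular literal contains a real line feed; the verbatim one has '\' followed by 'n'.
Console.WriteLine(regular.IndexOf('\n') >= 0);                            // True
Console.WriteLine(verbatim.IndexOf('\n') >= 0);                           // False
Console.WriteLine(verbatim.IndexOf(@"\n", StringComparison.Ordinal) >= 0); // True

// "\\" becomes one backslash in the regular literal but stays two in the verbatim one.
Console.WriteLine(regular.EndsWith(@"\\", StringComparison.Ordinal));  // False
Console.WriteLine(verbatim.EndsWith(@"\\", StringComparison.Ordinal)); // True
```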
If you want to suppress the ReSharper warnings, you can use the attribute:
[Localizable(false)]
For things like parameters, file locations, etc., this could be a good solution.

difference of reading files

string value1 = File.ReadAllText("C:\\file.txt");
string value2 = File.ReadAllText(@"C:\file.txt");
In the statements above, what is the difference between using @"C:\file.txt" and "C:\file.txt"?
The compiler reads @"C:\file.txt" as is. Without the verbatim prefix (@), it treats \f as a single escape sequence (form feed). In other words:
@"C:\file.txt" == "C:\\file.txt"
@"C:\file.txt" != "C:\file.txt" // treated as C: + FormFeed + ile.txt
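The form-feed trap can be checked directly. This sketch, run as a top-level program, compares the two literals character by character instead of passing them to File.ReadAllText.

```csharp
using System;

string wrong = "C:\file.txt";  // \f is a valid escape: form feed (U+000C)
string right = @"C:\file.txt"; // backslash and 'f' kept as written

Console.WriteLine(wrong.Length);            // 10
Console.WriteLine((int)wrong[2]);           // 12 (form feed)
Console.WriteLine(right.Length);            // 11
Console.WriteLine(right == "C:\\file.txt"); // True
```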
Verbatim string literals start with @ and are also enclosed in double quotation marks. For example:
@"good morning" // a string literal
The advantage of verbatim strings is that escape sequences are not processed, which makes it easy to write, for example, a fully qualified file name:
@"c:\Docs\Source\a.txt" // rather than "c:\\Docs\\Source\\a.txt"
String literals:
A regular string literal consists of zero or more characters enclosed in double quotes, as in "hello", and may include both simple escape sequences (such as \t for the tab character) and hexadecimal and Unicode escape sequences.
A verbatim string literal consists of an @ character followed by a double-quote character, zero or more characters, and a closing double-quote character. A simple example is @"hello". In a verbatim string literal, the characters between the delimiters are interpreted verbatim, the only exception being a quote-escape-sequence. In particular, simple escape sequences and hexadecimal and Unicode escape sequences are not processed in verbatim string literals. A verbatim string literal may span multiple lines.
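When using a \ in a regular string, you normally have to write \\ because \ is the escape character. The first string you show (File.ReadAllText("C:\file.txt");) actually compiles, because \f happens to be a valid escape sequence, but it does not contain the path you meant. An unrecognized escape sequence, such as \m, would be a compile error.
The @ lets you build your string without writing \\ every time you need a \.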
string value1 = "C:\file.txt";
string value2 = @"C:\file.txt";
The string for value1 will contain a form feed character where the \f is, while the second one will keep the backslash and the f. (This becomes very clear if you output them in a console application with Console.Write...)
The correct way to write the value1 version would be "C:\\file.txt".
(The value2 version uses what's called, as Dmitry said, a verbatim string.)

What does the @ prefix do on string literals in C#?

I read a C# article that combines a path using Path.Combine(part1, part2).
It uses the following:
string part1 = @"c:\temp";
string part2 = @"assembly.txt";
What is the use of the @ in part1 and part2?
@ is not related to any method.
It means that you don't need to escape special characters in the string that follows the symbol:
@"c:\temp"
is equal to
"c:\\temp"
Such a string is called 'verbatim' or @-quoted. See MSDN.
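Here is a small sketch of the snippet from the question, run as a top-level program. The exact output of Path.Combine depends on the platform. On Windows the separator it inserts is \. On Linux or macOS it is /, and the \ inside part1 is just an ordinary character there.

```csharp
using System;
using System.IO;

string part1 = @"c:\temp";      // same value as "c:\\temp"
string part2 = @"assembly.txt"; // no backslashes, so the @ changes nothing here

string combined = Path.Combine(part1, part2);

// On Windows this prints c:\temp\assembly.txt. Elsewhere the separator is '/',
// so the result is c:\temp/assembly.txt.
Console.WriteLine(combined);
```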
As others have said, it's a way to avoid escaping special characters, and it's very useful for specifying file paths.
string s1 = @"C:\MyFolder\Blue.jpg";
Another use is when you have a large string and want it displayed across multiple lines rather than on one long line.
string s2 = @"This could be very large string something like a Select query
which you would want to be shown spanning across multiple lines
rather than scrolling to the right and see what it all reads up";
As stated in C# Language Specification 4.0:
2.4.4.5 String literals
C# supports two forms of string literals: regular string literals and verbatim string literals.
A regular string literal consists of zero or more characters enclosed in double quotes, as in "hello", and may include both simple escape sequences (such as \t for the tab character), and hexadecimal and Unicode escape sequences.
A verbatim string literal consists of an @ character followed by a double-quote character, zero or more characters, and a closing double-quote character. A simple example is @"hello". In a verbatim string literal, the characters between the delimiters are interpreted verbatim, the only exception being a quote-escape-sequence. In particular, simple escape sequences, and hexadecimal and Unicode escape sequences are not processed in verbatim string literals.
It denotes a verbatim string literal, which lets you write characters that normally have special meaning, such as \ (normally an escape character) and new lines, without escaping them. For this reason it's very useful when dealing with Windows paths.
Without the @, the first line of your example would have to be:
string part1 = "c:\\temp";
With @ you don't have to escape special characters.
Without @, you would have to write "c:\\temp".
More precisely, these are called 'verbatim' strings. You can read about them here:
http://msdn.microsoft.com/en-us/library/aa691090(v=vs.71).aspx
The @ just indicates a different way of specifying a string so that you do not have to escape characters with \. The only caveat is that double quotes need to be written as "" to represent a single ".

escape characters

There is a type of string in which you disable the processing of a literal's escape characters and print the string as is. What is this string called, what symbol is used to prefix it, and what is a possible use for it?
Is it \?
It is the @ character: @"c:\path"
It is called a verbatim string literal.
It's called a verbatim string literal, and uses the @ prefix.
Without the prefix, it's still a string literal - it's a regular string literal.
(Some people mistakenly think that the term "string literal" only applies to verbatim string literals, but it's more general than that.)
Verbatim string literals are useful for:
Multiline strings
Strings which naturally contain backslashes (such as Windows paths and regular expressions)
Note that this only makes a difference at compile time. In other words, these two statements are precisely equivalent:
string x = "foo\\bar"; // Regular string literal
string x = @"foo\bar"; // Verbatim string literal
Verbatim string literals are still interned in the same way as regular string literals, still refer to instances of System.String etc.
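This sketch (a top-level program, with the second variable renamed so both lines compile together) shows the equivalence. Both literals compile to the same constant, and in practice the runtime hands back the same interned object for both.

```csharp
using System;

string x = "foo\\bar"; // regular string literal
string y = @"foo\bar"; // verbatim string literal

Console.WriteLine(x == y); // True: identical characters

// Both literals compile to the same constant, and literals are interned,
// so in practice both variables refer to the very same string object.
Console.WriteLine(object.ReferenceEquals(x, y)); // True
```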
From section 2.4.4.5 of the C# 4.0 specification:
A regular string literal consists of zero or more characters enclosed in double quotes, as in "hello", and may include both simple escape sequences (such as \t for the tab character), and hexadecimal and Unicode escape sequences.
A verbatim string literal consists of an @ character followed by a double-quote character, zero or more characters, and a closing double-quote character. A simple example is @"hello". In a verbatim string literal, the characters between the delimiters are interpreted verbatim, the only exception being a quote-escape-sequence. In particular, simple escape sequences, and hexadecimal and Unicode escape sequences are not processed in verbatim string literals. A verbatim string literal may span multiple lines.
Note that @ can also be used as a prefix to allow you to use keywords as identifiers:
int class = 10; // Invalid
int @class = 10; // Valid
This is rarely useful, but can sometimes be required if you have to use a particular identifier. (The class keyword can be useful for an anonymous type property in ASP.NET MVC, for example.)
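As an illustration of the anonymous-type case, here is a sketch run as a top-level program. The attributes object is hypothetical, modeled on the kind of HTML-attributes object passed to MVC helpers. Reflection shows that the compiled property is simply named "class".

```csharp
using System;

// Hypothetical HTML-attributes object. 'class' is a keyword, so the
// property name needs the @ prefix in source.
var htmlAttributes = new { @class = "btn", id = "save" };

// The @ is not part of the name: reflection finds a property called "class".
var prop = htmlAttributes.GetType().GetProperty("class");
Console.WriteLine(prop != null);          // True
Console.WriteLine(htmlAttributes.@class); // btn
```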
@
string sLiteral = @"This will be formatted. Even including
return characters,
and spaces at the start of lines";
If I had a string such as c:\monkey.txt,
I would have to escape the backslash like this:
string s = "c:\\monkey.txt";
Notice the double backslash.
Alternatively, you can use the '@' symbol to indicate that the string is to be taken literally:
string s = @"c:\monkey.txt";
In C# you can use the @ sign to make a verbatim string literal.
All escape sequences are ignored except the doubled quote (""), which stands for a single ".
var literal = @"C:\Test\Test.txt
Newlines are also parsed";

What are all the usages of '@' in C#?

I have noticed this piece of code:
FileInfo[] files = new DirectoryInfo(@"C:\").GetFiles();
What is the purpose of @? Are there other uses?
String literals
C# supports two forms of string literals: regular string literals and verbatim string literals.
A regular string literal consists of zero or more characters enclosed in double quotes, as in "hello", and may include both simple escape sequences (such as \t for the tab character) and hexadecimal and Unicode escape sequences.
A verbatim string literal consists of an @ character followed by a double-quote character, zero or more characters, and a closing double-quote character. A simple example is @"hello". In a verbatim string literal, the characters between the delimiters are interpreted verbatim, the only exception being a quote-escape-sequence. In particular, simple escape sequences and hexadecimal and Unicode escape sequences are not processed in verbatim string literals. A verbatim string literal may span multiple lines.
Example 1:
@"C:\Path\File.zip" == "C:\\Path\\File.zip"
// where
"C:\\Path\\File.zip" // regular string literal
@"C:\Path\File.zip" // verbatim string literal
Note: in verbatim string literals you must escape a double quote by doubling it.
Example 2:
@"He said: ""Hello""" == "He said: \"Hello\""
More info here:
string (C# Reference) at MSDN
String literals at MSDN
String Basics (C# Programming Guide) at MSDN
Working with Strings in C#
Strings in .NET and C#
Identifiers
The prefix "@" enables the use of keywords as identifiers, which is useful when interfacing with other programming languages. The character @ is not actually part of the identifier, so the identifier might be seen in other languages as a normal identifier, without the prefix. An identifier with an @ prefix is called a verbatim identifier. Use of the @ prefix for identifiers that are not keywords is permitted, but strongly discouraged as a matter of style.
Example:
class @class
{
    public static void @static(bool @bool) {
        if (@bool)
            System.Console.WriteLine("true");
        else
            System.Console.WriteLine("false");
    }
}
class Class1
{
    static void M() {
        cl\u0061ss.st\u0061tic(true);
    }
}
The @ symbol has a couple of meanings in C#:
When used at the beginning of a string, it means "take this string literally; do not interpret characters that would otherwise be used as escape characters." This is called a verbatim string literal. For example, @"\n\t" is equal to "\\n\\t".
When used at the beginning of an identifier, it means "interpret this as an identifier, not as a keyword". For example, int @this would allow you to name an integer variable "this", which would normally be illegal since this is a keyword.
The @ symbol basically says "take this string exactly as it is". It lets the developer avoid escaping.
So, @"C:\" == "C:\\" (with the \ escaped).
I find it most useful with regular expressions. Regex patterns can get really nasty and confusing quickly when you start escaping different things, so it's nice to just use the literal value.
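To see why this helps with regular expressions, here is a sketch run as a top-level program. The pattern is hypothetical: a drive letter, a colon, a literal backslash, then word characters, as in C:\Users. The regex engine needs \\ for the backslash and \w for word characters. A regular C# literal must double every one of those backslashes again.

```csharp
using System;
using System.Text.RegularExpressions;

// Regular literal: six backslashes in a row to produce the regex text \\\w.
string regularPattern = "^[A-Za-z]:\\\\\\w+$";
// Verbatim literal: the pattern reads exactly as the regex engine sees it.
string verbatimPattern = @"^[A-Za-z]:\\\w+$";

Console.WriteLine(regularPattern == verbatimPattern);           // True
Console.WriteLine(Regex.IsMatch(@"C:\Users", verbatimPattern)); // True
Console.WriteLine(Regex.IsMatch("C:Users", verbatimPattern));   // False
```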
That's a verbatim string literal so you can use backslash without being interpreted as escape character.
A verbatim string literal consists of an @ character followed by a double-quote character, zero or more characters, and a closing double-quote character. A simple example is @"hello". In a verbatim string literal, the characters between the delimiters are interpreted verbatim, the only exception being a quote-escape-sequence. In particular, simple escape sequences and hexadecimal and Unicode escape sequences are not processed in verbatim string literals. A verbatim string literal may span multiple lines.
quote-escape-sequence: ""
So the only escape sequence a verbatim string literal processes is "", which stands for a single " between the two delimiting "s.
The @ sign before a string allows for literal interpretation of the string; otherwise you'd have to escape each backslash with another one ("C:\\").
@ indicates that the string is a verbatim string literal, and therefore you don't have to escape characters inside it. As the documentation for string puts it:
The advantage of verbatim strings is that escape sequences are not processed, which makes it easy to write, for example, a fully qualified file name: @"c:\Docs\Source\a.txt" rather than "c:\\Docs\\Source\\a.txt"
Putting an @ before a string turns it from a regular string literal into a verbatim string literal, which does not use the backslash (\) to escape but uses two quotes in a row to escape a quote.
Jon Skeet writes more about strings in C#/.NET.
