Verbatim string literals v escape sequences - c#

Is there any difference in how the C# compiler or .NET runtime handles verbatim string literals versus escape sequences (e.g. performance), or is it just a matter of design-time style? For example:
var pathA = "c:\\somewhere";
var pathB = @"c:\somewhere";
I would imagine they are compiled the same and it doesn't matter, but was just curious.

Any difference here is limited strictly to the compiler; the IL and runtime have no concept of verbatim vs escaped - it just has the string.
As for which to choose: whichever is more convenient ;p I almost always use verbatim string literals if there are unusual characters, as that allows for multi-line strings very easily and visually.
As an interesting case:
bool areSame = ReferenceEquals("c:\\somewhere", @"c:\somewhere"); // true
which tells us they are exactly the same string instance (thanks to "interning"). They aren't just equivalent; they are the same string instance to the runtime. It is therefore impossible that they can be (to the runtime) different in any way.

They are exactly the same. Try to decompile the two versions with a decompiler.
It's only a matter of convenience for developers when writing it in the code.

The @ sign in front of a string tells the compiler to ignore any embedded
escape sequences.
string "\"" would yield a single double quote.
string "\\" would yield a single backslash
string @"\\" would yield two backslashes
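The three cases above can be checked by printing the length of each string. This is a minimal sketch written as C# top-level statements; the variable names are illustrative and not from the original answer.

```csharp
using System;

// Regular literal: \" is an escape sequence for one double quote.
string quote = "\"";

// Regular literal: \\ is an escape sequence for one backslash.
string oneBackslash = "\\";

// Verbatim literal: backslashes are taken as-is, so this holds two of them.
string twoBackslashes = @"\\";

Console.WriteLine(quote.Length);          // 1
Console.WriteLine(oneBackslash.Length);   // 1
Console.WriteLine(twoBackslashes.Length); // 2
```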

Related

Is it possible to create our own escape sequence in c#?

As inserting Environment.NewLine in string literals reduces readability, I want to create my own escape sequence, for example, \z that is equal to Environment.NewLine. Is it possible to do this in c#? If yes, how?
Edit
I have submitted my request for a new escape sequence to the C# language GitHub; please kindly upvote the request here.
You can take advantage of string interpolation ($):
var nl = Environment.NewLine;
Console.WriteLine($"Hello!{nl}This is Another line of text.{nl}One more.");
To be clear, in this code the string {nl} will be replaced with the value of the variable nl, which is set to Environment.NewLine.
See it online in sharplab.
This is not dependent on Console.WriteLine. And it is better than String.Format... Why? The compiler can optimize string interpolation to String.Concat and even bake constants at compile time.

Properly escape filepaths

How do I escape with the @-sign when using variables?
File.Delete(@"c:\test"); // WORKS!
File.Delete(@path); // doesn't work :(
File.Delete(@"c:\test"+path); // WORKS
Anyone have any idea? It's the 2nd example I want to use!
Strings prefixed with the @ character are called verbatim string literals (whose contents do not need to be escaped).
Therefore, you can only use @ with string literals, not string variables.
So, just File.Delete(path); will do, after you assign the path in advance of course (from a verbatim string or some other string).
Verbatim strings are just a syntactic nicety to be able to type strings containing backslashes (paths, regexes) easier. The declarations
string path = "C:\\test";
string path = @"C:\test";
are completely identical in their result. Both result in a string containing C:\test. Note that either option is just needed because the C# language treats \ in strings as special.
The @ is not some magic pixie dust needed to make paths work properly, it has a defined meaning when prefixed to strings, in that the strings are interpreted without the usual \ escape sequences.
The reason your second example doesn't work like you expect is that @ prefixed to a variable name does something different: It allows you to use reserved keywords as identifiers, so that you could use @class as an identifier, for example. For identifiers that don't clash with keywords the result is the same as without.
If you have a string variable containing a path, then you can usually assume that there is no escaping needed at all. After all it already is in a string. The things I mentioned above are needed to get text from source code correctly through the compiler into a string at runtime, because the compiler has different ideas. The string itself is just data that's always represented the same.
This still means that you have to initialise the string in a way that backslashes survive. If you read it from somewhere no special treatment should be necessary, if you have it as a constant string somewhere else in the code, then again, one of the options at the top has to be used.
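The difference between @ on a literal and @ on an identifier can be sketched as follows. This is only an illustration using C# top-level statements; the variable named @class is hypothetical and chosen just to show a keyword used as an identifier.

```csharp
using System;

// Verbatim literal: the compiler keeps the backslash as written.
string path = @"c:\test";

// On an identifier, @ does no escaping at all; @path and path
// name the same variable, so both refer to the same string object.
Console.WriteLine(object.ReferenceEquals(path, @path)); // True

// The real purpose of @ on identifiers: using a keyword as a name.
string @class = "c:\\test";

// Both ways of writing the literal produce the same characters.
Console.WriteLine(path == @class); // True
Console.WriteLine(path.Length);    // 7
```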
string path = @"c:\test";
File.Delete(path);
The @ prefix works only on a string literal. The equivalent escaped literal is "c:\\test".
Read more here.
There's a major problem with your understanding of the @ indicator.
@"whatever string" is a verbatim string literal. What it does is tell the C# compiler not to look for escape sequences. Normally, "\" starts an escape sequence in a string, and you can do things like "\n" to indicate a new line or "\t" to indicate a tab. However, if you have @"\n", it tells the compiler "no, I really want to treat the backslash as a backslash character, not an escape sequence."
If you don't like verbatim mode, the way to do it is to use "\\" anywhere you want a single backslash, because the compiler knows to treat an escaped backslash as the single character.
In either case, @"\n" and "\\n" will produce a 2-character string in memory, with the characters '\' and 'n'. It doesn't matter which way you get there; both are ways of telling the compiler you want those two characters.
In light of this, @path makes no sense, because you don't have any literal characters - just a variable. By the time you have the variable, you already have the characters you want in memory. It does compile ok, as explained by Joey, but it's not logically what you're looking for.
If you're looking for a way to get rid of occurrences of \\ within a variable, you simply want String.Replace:
string ugly = @"C:\\foo";
ugly = ugly.Replace(@"\\", @"\");
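The first and third use actual string literals, hence they work.
The second does compile, but the @ has no effect there because path is a variable, not a literal. It works if the path is assigned from a literal first:
string path = @"c:\test";
File.Delete(path);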

C# Regular Expression always returns FALSE

regexPattern="\w{6}(AAAAA|BBBBB|CCCCC)"
I need the strings below to return TRUE. So ANY 6 letters followed by AAAAA or BBBBB or CCCCC:
TXCDTLAAAAA000
TXCDTLBBBBB111
TXCDTLCCCCC222
but giving the pattern above I always get a FALSE in return. How do I fix this pattern to work right?
So Basically this code is working:
if (Regex.IsMatch("123456BBBBB", @"\w{6}(AAAAA|BBBBB|CCCCC)"))
{
//true
}
so I am fixing the code now
Thank you!
You didn't mention which host language you are using, but the backslash is usually an escape character in a double-quoted string, so in most common languages you may need a double backslash:
regexPattern="\\w{6}(AAAAA|BBBBB|CCCCC)"
Or use another way to express the pattern that doesn't require escape characters. For example, in Python you can use a raw string prefix:
regexPattern = r"\w{6}(AAAAA|BBBBB|CCCCC)"
Python won't treat \w as an escape sequence anyway, but the raw prefix helps with other sequences.
With C#, use @ (a verbatim string) to accomplish it:
var regexPattern = @"\w{6}(AAAAA|BBBBB|CCCCC)";
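As a quick check of the C# suggestion, the sketch below runs the verbatim pattern against the three sample strings from the question. It assumes C# top-level statements; the inputs array and the sameText variable are illustrative names, not part of the original code.

```csharp
using System;
using System.Text.RegularExpressions;

// Verbatim literal: the regex engine receives \w exactly as written.
string regexPattern = @"\w{6}(AAAAA|BBBBB|CCCCC)";

// The three sample strings from the question.
string[] inputs = { "TXCDTLAAAAA000", "TXCDTLBBBBB111", "TXCDTLCCCCC222" };

foreach (string input in inputs)
{
    // Each line prints the input followed by True.
    Console.WriteLine($"{input}: {Regex.IsMatch(input, regexPattern)}");
}

// The escaped form of the same pattern is the identical string.
bool sameText = regexPattern == "\\w{6}(AAAAA|BBBBB|CCCCC)";
Console.WriteLine(sameText); // True
```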

the meaning of Contains(@"""")) in c#

A string strChkQoutes is
IF(H15:H119=\"y\",IF(G15:G119=\"y\",1,0)
The following value is true(c#).
strChkQoutes.Contains(@"""")
I don't understand its meaning. If I want to convert it to Java, the string strChkQoutes is
IF(H15:H119="y",IF(G15:G119="y",1,0)
the following value is false(java).
strChkQoutes.contains("\"\"")
what is the difference of the contains function in .net and in java?
The difference here doesn't lie in the methods, but the strings you're passing to the methods.
In C# verbatim string literals, @"""" really means one double quote character. The first inner " escapes the second inner ", since you can't use backslashes for escaping. Reference.
If you didn't use a verbatim string literal, the C# call would look like this:
strChkQuotes.Contains("\"")
Which is different from your Java string, which contains two escaped double quotes in a row and so causes contains() to return false.
@ introduces a C# verbatim string literal, which Java does not have. In Java you'd have to escape your string: .contains("\""). See here for how @-literals are resolved.
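The point about doubled quotes can be verified with a short C# sketch. The formula variable below is an illustrative stand-in for strChkQoutes, holding the text from the question; the other names are likewise assumptions made for the example.

```csharp
using System;

// Verbatim literal: "" inside the quotes stands for one " character.
string oneQuote = @"""";

// Regular literal: \" also stands for one " character.
string sameQuote = "\"";

Console.WriteLine(oneQuote.Length);       // 1
Console.WriteLine(oneQuote == sameQuote); // True

// Stand-in for strChkQoutes: the formula text contains double quotes,
// but never two of them in a row.
string formula = "IF(H15:H119=\"y\",IF(G15:G119=\"y\",1,0)";

Console.WriteLine(formula.Contains(@""""));  // True  (looks for one quote)
Console.WriteLine(formula.Contains("\"\"")); // False (looks for two adjacent quotes)
```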

When Is it Better To Use @ Before a String?

In code that declares or uses a string, I usually see the developers declare it like this:
string randomString = @"C:\Random\RandomFolder\ThisFile.xml";
Instead of:
string randomString = "C:\\Random\\RandomFolder\\ThisFile.xml";
That's the only advantage I can see in using the @ prefix, since you don't need to write \\, but is there any other case where it's better than going without it?
The @ sign indicates to the compiler that the string is a verbatim string literal, and thus does not require you to escape any of the characters. Not just the backslash, of course. No escape sequences of any kind are processed by the compiler.
Whether it's "better" or not is an extremely difficult question to answer. This is a purely stylistic choice. Some might argue that the string contents are more readable when you use a verbatim string literal, rather than having to escape all of the characters. Others might prefer consistency, where all strings that contain characters that would ordinarily require escaping would have to be escaped. This makes it easier to notice errors in code at a glance. (For what it's worth, I fall into the latter camp. All my paths have \\.)
That being said, it's extremely convenient for regular expressions, for which you'd otherwise be escaping all over the place. And since they don't look much like regular strings, there's minimal risk of confusion.
Windows path names aren't the only things with a lot of backslashes. For instance, @-strings are very useful for regular expressions because they avoid having to double-escape everything.
They can also span multiple lines, so if you ever end up having to have multi-line strings in your code, they make it a bit more convenient.
Makes Regex simpler
@"^(([\w-]+\.)+[\w-]+|([a-zA-Z]{1}|[\w-]{2,}))((([0-1]?[0-9]{1,2}|25[0-5]|2[0-4][0-9])\.([0-1]?[0-9]{1,2}|25[0-5]|2[0-4][0-9])\.([0-1]?[0-9]{1,2}|25[0-5]|2[0-4][0-9])\.([0-1]?[0-9]{1,2}|25[0-5]|2[0-4][0-9])){1}|([a-zA-Z]+[\w-]+\.)+[a-zA-Z]{2,4})$";
Multiple lines string
string s = @"testing
some
string with multiple
lines";
It is especially useful for regular expressions that involve matching a backslash character explicitly. Since this is a special character in both C# strings syntax and regex syntax, it requires "double escaping". Example:
string regex = "\\\\.*\\.jpg";
Same expression using the @-notation would be more tidy:
string regex = @"\\.*\.jpg";
"" and @"" are both string literals; the first is a regular literal, while the latter is a verbatim string literal.
'@' prefix before any string in C# .NET (Regular string literal and Verbatim string literal in C#.NET)
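The double-escaping example above can be checked directly: both spellings should produce the same pattern text. This is a rough sketch using C# top-level statements; the sample path is hypothetical and only there to exercise the pattern.

```csharp
using System;
using System.Text.RegularExpressions;

// Regular literal: every backslash the regex needs must be doubled.
string escaped = "\\\\.*\\.jpg";

// Verbatim literal: the pattern is written the way the regex engine sees it.
string verbatim = @"\\.*\.jpg";

Console.WriteLine(escaped == verbatim); // True

// The pattern means: a literal backslash, any characters, then ".jpg".
string sample = @"c:\photos\cat.jpg"; // hypothetical path for illustration
Console.WriteLine(Regex.IsMatch(sample, verbatim));    // True
Console.WriteLine(Regex.IsMatch("cat.jpg", verbatim)); // False (no backslash)
```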
