Is it possible to create our own escape sequence in c#? - c#

As inserting Environment.NewLine in string literals reduces readability, I want to create my own escape sequence, for example, \z that is equal to Environment.NewLine. Is it possible to do this in c#? If yes, how?
Edit
I have submitted my request for a new escape sequence to the C# language GitHub repository; please kindly upvote the request there.

You can take advantage of string interpolation ($):
var nl = Environment.NewLine;
Console.WriteLine($"Hello!{nl}This is Another line of text.{nl}One more.");
To be clear, in this code the string {nl} will be replaced with the value of the variable nl, which is set to Environment.NewLine.
See it online in sharplab.
This is not dependent on Console.WriteLine. And it is better than String.Format... Why? The compiler can optimize string interpolation to String.Concat and even bake constants at compile time.
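As a minimal sketch of that comparison, the following builds the same text once with interpolation and once with String.Format. The class name `Lines` and its two methods are hypothetical helpers introduced only for illustration; they are not part of the original answer.

```csharp
using System;

static class Lines
{
    // Hypothetical helper: builds the text with string interpolation.
    public static string Interpolated()
    {
        var nl = Environment.NewLine;
        return $"Hello!{nl}This is another line of text.{nl}One more.";
    }

    // The same text built with String.Format, for comparison.
    public static string Formatted()
    {
        return string.Format(
            "Hello!{0}This is another line of text.{0}One more.",
            Environment.NewLine);
    }

    static void Main()
    {
        Console.WriteLine(Interpolated());
        // Both approaches produce the same string at runtime.
        Console.WriteLine(Interpolated() == Formatted());
    }
}
```

The interpolated version keeps the placeholder name next to the text it affects, which is the readability gain the answer is after.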

Related

How to insert a variable into a long string with multiple quotation marks and escape characters?

I have this really miserable line of code I need to work with that I haven't found a better way to do. My issue is I'm trying to get a variable inserted in the middle of this string, so normally I would just concatenate with +, but in this case the huge amount of quotation marks and escape characters have made this awful to try to do logistically. If anyone knows a simpler way of doing this I'd appreciate it- I'm sure there's an easy solution but I just can't get it. Here's the line:
Process.Start(@"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe ", @"""C:\Thunder\Scripts\script.ps1"" ""VARIABLE""");
So what I'm trying to do is put a string variable where the VARIABLE text is here. When I try to break it apart to concat, the combination of the @, the "s, and the \'s, I can't get the string apart in such a way that I can concat the variable into it. I assume there's an easier way here that I'm missing. Thanks.
You could use string interpolation:
$@"""C:\Thunder\Scripts\script.ps1"" ""{variable}""";
Using the $ prefix allows variables to be inserted using curly brackets, which will reduce the number of quotations in your case.
If variable is not a string, an implicit ToString() is called on the object instead.
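A minimal sketch of how that interpolated verbatim string could be wrapped for reuse. The class `PowerShellArgs` and its `Build` method are hypothetical names chosen for illustration, and the sketch assumes the inserted value contains no double quotes of its own.

```csharp
using System;

static class PowerShellArgs
{
    // Hypothetical helper: quotes the script path and one argument.
    // Inside a verbatim string, "" stands for one literal double quote,
    // and {variable} is the interpolation hole enabled by the $ prefix.
    public static string Build(string variable)
    {
        return $@"""C:\Thunder\Scripts\script.ps1"" ""{variable}""";
    }

    static void Main()
    {
        Console.WriteLine(Build("hello"));
        // prints: "C:\Thunder\Scripts\script.ps1" "hello"
    }
}
```

The result of `Build` can then be passed as the second argument to Process.Start in place of the hand-written literal.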

Properly escape filepaths

How do I escape with the @-sign when using variables?
File.Delete(@"c:\test"); // WORKS!
File.Delete(@path); // doesn't work :(
File.Delete(@"c:\test"+path); // WORKS
Anyone have any idea? It's the 2nd example I want to use!
Strings prefixed with the @ character are called verbatim string literals (whose contents do not need to be escaped).
Therefore, you can only use @ with string literals, not string variables.
So, just File.Delete(path); will do, after you assign the path in advance of course (from a verbatim string or some other string).
Verbatim strings are just a syntactic nicety to be able to type strings containing backslashes (paths, regexes) easier. The declarations
string path = "C:\\test";
string path = @"C:\test";
are completely identical in their result. Both result in a string containing C:\test. Note that either option is just needed because the C# language treats \ in strings as special.
The @ is not some magic pixie dust needed to make paths work properly; it has a defined meaning when prefixed to strings, in that the strings are interpreted without the usual \ escape sequences.
The reason your second example doesn't work like you expect is that @ prefixed to a variable name does something different: It allows you to use reserved keywords as identifiers, so that you could use @class as an identifier, for example. For identifiers that don't clash with keywords the result is the same as without.
If you have a string variable containing a path, then you can usually assume that there is no escaping needed at all. After all it already is in a string. The things I mentioned above are needed to get text from source code correctly through the compiler into a string at runtime, because the compiler has different ideas. The string itself is just data that's always represented the same.
This still means that you have to initialise the string in a way that backslashes survive. If you read it from somewhere no special treatment should be necessary, if you have it as a constant string somewhere else in the code, then again, one of the options at the top has to be used.
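The points above can be checked with a short sketch. The class name `PathLiterals` and the variable names are made up for illustration; nothing here touches the file system.

```csharp
using System;

static class PathLiterals
{
    static void Main()
    {
        string escaped = "C:\\test";   // regular literal, backslash escaped
        string verbatim = @"C:\test";  // verbatim literal, no escaping needed

        // Both literals describe the same seven characters at runtime.
        Console.WriteLine(escaped == verbatim); // True
        Console.WriteLine(verbatim.Length);     // 7

        string path = verbatim;
        // @path is just the identifier path; the @ only matters for keywords.
        Console.WriteLine(object.ReferenceEquals(@path, path)); // True

        // With the @ prefix, a reserved keyword can serve as an identifier.
        string @class = "keyword used as identifier";
        Console.WriteLine(@class);
    }
}
```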
string path = @"c:\test";
File.Delete(path);
This works because @ applies to the string literal, not to the variable. The "real" string held in path is c:\test, which a regular literal would spell "c:\\test".
Read more here.
There's a major problem with your understanding of the @ indicator.
@"whatever string" is a verbatim string literal. It tells the C# compiler not to look for escape sequences. Normally, a backslash in a string starts an escape sequence, and you can do things like "\n" to indicate a new line or "\t" to indicate a tab. However, if you have @"\n", it tells the compiler "no, I really want to treat the backslash as a backslash character, not an escape sequence."
If you don't like verbatim mode, the way to do it is to use "\\" anywhere you want a single backslash, because the compiler knows to treat an escaped backslash as the single character.
In either case, @"\n" and "\\n" will produce a 2-character string in memory, with the characters '\' and 'n'. It doesn't matter which way you get there; both are ways of telling the compiler you want those two characters.
In light of this, @path makes no sense, because you don't have any literal characters, just a variable. By the time you have the variable, you already have the characters you want in memory. It does compile ok, as explained by Joey, but it's not logically what you're looking for.
If you're looking for a way to get rid of occurrences of \\ within a variable, you simply want String.Replace:
string ugly = @"C:\\foo";
ugly = ugly.Replace(@"\\", @"\");
First and third are actual paths hence would work.
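The second compiles too, because @ in front of an ordinary identifier is simply ignored, but it adds no escaping. It works as long as the variable was assigned from a properly written literal:
string path = @"c:\test";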
File.Delete(path);

Verbatim string literals v escape sequences

Is there any difference in how the C# compiler or .NET run-time handles verbatim string literals versus using escape sequences (i.e. performance) or is it just a matter of design time style? E.G.:
var pathA = "c:\\somewhere";
var pathB = @"c:\somewhere";
I would imagine they are compiled the same and it doesn't matter, but was just curious.
Any difference here is limited strictly to the compiler; the IL and runtime have no concept of verbatim vs escaped - it just has the string.
As for which to choose: whichever is more convenient ;p I almost always use verbatim string literals if there are unusual characters, as that allows for multi-line strings very easily and visually.
As an interesting case:
bool areSame = ReferenceEquals("c:\\somewhere", @"c:\somewhere"); // true
which tells us that they are exactly the same string instance (thanks to "interning"). They aren't just equivalent; they are the same string instance to the runtime. It is therefore impossible for them to differ (to the runtime) in any way.
They are exactly the same. Try to decompile the two versions with a decompiler.
It's only a matter of convenience for developers when writing it in the code.
The @ sign in front of a string tells the compiler to ignore any embedded escape sequences.
string "\"" would yield a single double quote.
string "\\" would yield a single backslash.
string @"\\" would yield two backslashes.
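A small sketch that checks these cases by printing the length of each literal. The class name `EscapeLengths` is hypothetical and used only for this illustration.

```csharp
using System;

static class EscapeLengths
{
    static void Main()
    {
        Console.WriteLine("\"".Length);   // 1: a single double quote
        Console.WriteLine("\\".Length);   // 1: a single backslash
        Console.WriteLine(@"\\".Length);  // 2: two backslashes, no escape processing
        Console.WriteLine(@"""".Length);  // 1: "" is how a quote is written in a verbatim literal
    }
}
```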

regular expression for c# verbatim like strings (processing ""-like escapes)

I'm trying to extract information out of rc-files. In these files, "-chars in strings are escaped by doubling them (""), analogous to C# verbatim strings. Is there a way to extract the string?
For example, if I have the following string "this is a ""test""" I would like to obtain this is a ""test"". It also must be non-greedy (very important).
I've tried to use the following regular expression;
"(?<text>[^""]*(""(.|""|[^"])*)*)"
However the performance was awful.
I've based it on the explanation here: http://ad.hominem.org/log/2005/05/quoted_strings.php
Has anybody any idea to cope with this using a regular expression?
You've got some nested repetition quantifiers there. That can be catastrophic for the performance.
Try something like this:
(?<=")(?:[^"]|"")*(?=")
At each step, that can only consume either two quotes at once or a single non-quote character. The lookbehind and lookahead assert that the actual match is preceded and followed by a quote.
This also gets you around having to capture anything. Your desired result will simply be the full string you want (without the outer quotes).
I do not assert that the outer quotes are not doubled. Because if they were, there would be no way to distinguish them from an empty string anyway.
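A minimal sketch of the lookaround pattern applied to the example from the question. The class `QuotedLookaround` and its `Extract` method are hypothetical names; the sketch assumes the input holds a single quoted string, since with several strings on one line a closing quote could also satisfy the lookbehind of a later match.

```csharp
using System;
using System.Text.RegularExpressions;

static class QuotedLookaround
{
    // The pattern from the answer, written as a verbatim literal,
    // so every " of the regex is doubled in the source.
    static readonly Regex Pattern =
        new Regex(@"(?<="")(?:[^""]|"""")*(?="")");

    // Hypothetical helper: returns the text between the outer quotes,
    // leaving doubled quotes as they are.
    public static string Extract(string input)
    {
        return Pattern.Match(input).Value;
    }

    static void Main()
    {
        // The input is: "this is a ""test"""
        string input = "\"this is a \"\"test\"\"\"";
        Console.WriteLine(Extract(input));
        // prints: this is a ""test""
    }
}
```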
This turns out to be a lot simpler than you'd expect. A string literal with escaped quotes looks exactly like a bunch of simple string literals run together:
"Some ""escaped"" quotes"
"Some " + "escaped" + " quotes"
So this is all you need to match it:
(?:"[^"]*")+
You'll have to strip off the leading and trailing quotes in a separate step, but that's not a big deal. You would need a separate step anyway, to unescape the escaped quotes (\" or "").
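The two steps described above (match, then strip and unescape) might look like the following sketch. The class `QuotedRuns`, the method `Extract`, and the sample input line are assumptions made for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class QuotedRuns
{
    // One or more simple quoted runs back to back: (?:"[^"]*")+
    static readonly Regex Pattern = new Regex(@"(?:""[^""]*"")+");

    // Hypothetical helper: finds the first quoted string, strips the
    // outer quotes, and turns each doubled quote into a single one.
    public static string Extract(string input)
    {
        string raw = Pattern.Match(input).Value;
        if (raw.Length < 2) return string.Empty; // no quoted string found

        string inner = raw.Substring(1, raw.Length - 2);
        return inner.Replace("\"\"", "\"");
    }

    static void Main()
    {
        // The input is: CAPTION "this is a ""test""" rest
        string input = "CAPTION \"this is a \"\"test\"\"\" rest";
        Console.WriteLine(Extract(input));
        // prints: this is a "test"
    }
}
```

Because each repetition must start and end with a quote, the pattern has no nested quantifiers competing for the same characters, which avoids the backtracking cost mentioned earlier.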
Don't know if this is better or worse than m.buettner's (guessing not, he seems to know his stuff), but I thought I'd throw it out there for critique.
"(([^"]+(""[^"]+"")*)*)"
Try this: (?<=^")(.*?"{2}.*?"{2})(?="$)
It may be faster than the two previous ones,
and without any bugs.
Match a " beginning the string
Multiple times match a non-" or two "
Match a " ending the string
"([^"]|(""))*?"

When Is it Better To Use @ Before a String?

In code that declares or uses a string, I usually see the developers declare it like this:
string randomString = @"C:\Random\RandomFolder\ThisFile.xml";
Instead of:
string randomString = "C:\\Random\\RandomFolder\\ThisFile.xml";
That's the only advantage of the @ prefix that I see, since you don't need to write \\. Is there any other case where using it is better than leaving it off?
The @ sign indicates to the compiler that the string is a verbatim string literal, and thus does not require you to escape any of the characters. Not just the backslash, of course. No escape sequences of any kind are processed by the compiler; the one exception is the double quote, which is written as "" inside a verbatim literal.
Whether it's "better" or not is an extremely difficult question to answer. This is a purely stylistic choice. Some might argue that the string contents are more readable when you use a verbatim string literal, rather than having to escape all of the characters. Others might prefer consistency, where all strings that contain characters that would ordinarily require escaping would have to be escaped. This makes it easier to notice errors in code at a glance. (For what it's worth, I fall into the latter camp. All my paths have \\.)
That being said, it's extremely convenient for regular expressions, for which you'd otherwise be escaping all over the place. And since they don't look much like regular strings, there's minimal risk of confusion.
Windows path names aren't the only things with a lot of backslashes. For instance, @-strings are very useful for regular expressions because they avoid having to double-escape everything.
They can also span multiple lines, so if you ever end up having to have multi-line strings in your code, they make it a bit more convenient.
Makes Regex simpler
@"^(([\w-]+\.)+[\w-]+|([a-zA-Z]{1}|[\w-]{2,}))((([0-1]?[0-9]{1,2}|25[0-5]|2[0-4][0-9])\.([0-1]?[0-9]{1,2}|25[0-5]|2[0-4][0-9])\.([0-1]?[0-9]{1,2}|25[0-5]|2[0-4][0-9])\.([0-1]?[0-9]{1,2}|25[0-5]|2[0-4][0-9])){1}|([a-zA-Z]+[\w-]+\.)+[a-zA-Z]{2,4})$";
Multiple lines string
string s = @"testing
some
string with multiple
lines";
It is especially useful for regular expressions that involve matching a backslash character explicitly. Since this is a special character in both C# strings syntax and regex syntax, it requires "double escaping". Example:
string regex = "\\\\.*\\.jpg";
The same expression using the @-notation would be tidier:
string regex = @"\\.*\.jpg";
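A short sketch confirming that both spellings produce the same pattern and that it requires a literal backslash in the input. The class name `RegexEscaping` and the sample paths are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexEscaping
{
    static void Main()
    {
        string escaped = "\\\\.*\\.jpg";  // regular literal, double escaping
        string verbatim = @"\\.*\.jpg";   // verbatim literal, regex escapes only

        // Both literals yield the same pattern text: \\.*\.jpg
        Console.WriteLine(escaped == verbatim); // True

        // The pattern needs a literal backslash somewhere before ".jpg".
        Console.WriteLine(Regex.IsMatch(@"C:\photos\cat.jpg", verbatim)); // True
        Console.WriteLine(Regex.IsMatch("cat.jpg", verbatim));            // False
    }
}
```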
"" and @"" are both string literals; the first is a regular literal, while the latter is a verbatim string literal.
'@' prefix before any string in C# .NET (Regular string literal and Verbatim string literal in C#.NET)
