Regex.IsMatch() not recognizing escape sequence? - c#

I'm doing a match comparison on some escaped strings:
Regex.IsMatch("\\Application.evtx", "DebugLogs\\ConfigurationServices.log");
I don't see why I'm getting:
"parsing "DebugLogs\ConfigurationServices.log" - Unrecognized escape sequence \C."
Isn't the \C escaped?

The edit really fooled a lot of people, including me!
\ is a special character in regular expressions: it acts as an escape character or introduces an escape sequence.
So the regex engine sees DebugLogs\ConfigurationServices.log, in which \C is, indeed, an unrecognized escape sequence. (\A, by contrast, is an existing escape sequence: the start-of-string anchor.)
So you need to escape the escape character. The simplest way to do this is to double the number of backslashes used:
Regex.IsMatch("\\\\Application.evtx", "DebugLogs\\\\ConfigurationServices.log");
The regex engine will then see a comparison between \\Application.evtx and DebugLogs\\ConfigurationServices.log; now the backslash has been escaped and has no special meaning.
Regex.IsMatch(@"\\Application.evtx", @"DebugLogs\\ConfigurationServices.log");
works fine too and is more readable.
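A minimal sketch of both spellings as a C# top-level program (.NET 5 or later assumed). Keep in mind that the first argument of Regex.IsMatch is the input and the second is the pattern, so only the second one is parsed as a regular expression. The sketch also escapes the dot (\.) so it matches only a literal period; the patterns above leave it unescaped, which also works but matches any character in that position. The sample input path is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Both literals hold the same pattern text: DebugLogs\\ConfigurationServices\.log
// The regex engine reads \\ as ONE literal backslash and \. as a literal dot.
string doubled  = "DebugLogs\\\\ConfigurationServices\\.log";
string verbatim = @"DebugLogs\\ConfigurationServices\.log";
Console.WriteLine(doubled == verbatim);   // True

// Hypothetical input: a full path that contains the file name.
string input = @"C:\Logs\DebugLogs\ConfigurationServices.log";
Console.WriteLine(Regex.IsMatch(input, verbatim));   // True (input first, pattern second)
```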

The \ character is the escape character in strings. For example, if you'd like a carriage return, you'd use \r. To get around this, either use verbatim string literals:
@"\Application.evtx"
or escape the escape character:
"\\Application.evtx"
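A quick check of the two spellings, sketched as a C# top-level program: both literals produce the same string, while the same two source characters \r mean a single carriage return in a regular literal but a backslash followed by r in a verbatim one.

```csharp
using System;

string verbatim = @"\Application.evtx";
string escaped  = "\\Application.evtx";
Console.WriteLine(verbatim == escaped);   // True: both start with ONE backslash
Console.WriteLine(verbatim.Length);       // 17 = 1 (\) + 11 ("Application") + 5 (".evtx")

Console.WriteLine("\r".Length);    // 1: a single carriage-return character
Console.WriteLine(@"\r".Length);   // 2: a backslash followed by 'r'
```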

You probably want
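Regex.IsMatch(@"\Application.evtx", @"DebugLogs\ConfigurationServices.log");
Without the @, C# treats \C as an escape sequence, similar to the way it converts \n into a newline character; however, \C is not a recognised/valid escape sequence. Note that the @ only affects how the compiler reads the literal: the regex engine still sees \C in the pattern, so the pattern's backslash must also be doubled (@"DebugLogs\\ConfigurationServices.log").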
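A small sketch (C# top-level program) of the compiler/regex distinction: the verbatim pattern compiles, but Regex still rejects \C at run time. The exception is caught as ArgumentException, which covers the parse-error type in both .NET Framework and .NET (Core); the exact message text may vary by version.

```csharp
using System;
using System.Text.RegularExpressions;

// The @ prefix only stops the C# compiler from reading \C as a string escape.
// The regex engine still receives \C in the pattern and rejects it.
string pattern = @"DebugLogs\ConfigurationServices.log";
bool threw = false;
try
{
    Regex.IsMatch(@"\Application.evtx", pattern);
}
catch (ArgumentException ex)   // regex parse errors are ArgumentException (or a subclass)
{
    threw = true;
    Console.WriteLine(ex.Message);
}
Console.WriteLine(threw);   // True
```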

Related

Regex.Escape() unrecognized exception

I have the following code in C#:
String str = @"\hello ali how are you? are you fine? hello and hi";
Console.WriteLine(Regex.Matches(Regex.Escape(str), @"\hello").Count);
I get the following error:
{"parsing \"\hello\" - Unrecognized escape sequence \h."}
I need to know when a \ followed by some name exists in my code. I have used @ to avoid escape sequence characters, but I still get the error above. I cannot understand why this happens!
\ is used to denote shorthand classes in a regular expression. e.g. \d means "any digit", or \w means "any word character (alphanumeric characters plus underscores)". The regex engine thinks you're trying to use \h as one of these shorthand sequences, but it's not a valid one.
To match a literal \ in the regex you need to escape it with another one, e.g. \\. So in your example your regex would be \\hello.
See http://www.regular-expressions.info/characters.html for more detailed information.
Escape sequences are used independently in both strings and regular expressions, with different meanings. When you prefix the string literal with @, you are telling the compiler to interpret the string literally, without escape sequences. However, the regular expression engine then sees "\h" and tries to interpret it as an escape sequence.
Basically, you need to apply both the verbatim string literal @ and the Regex.Escape function so that \h is interpreted as a literal by both parsers:
Console.WriteLine(Regex.Matches(str, Regex.Escape(@"\hello")).Count);
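A small sketch of the fix as a C# top-level program: Regex.Escape is applied to the pattern (not to the input), turning \hello into \\hello so the regex engine matches a literal backslash. The hand-written verbatim pattern @"\\hello" is shown for comparison.

```csharp
using System;
using System.Text.RegularExpressions;

string str = @"\hello ali how are you? are you fine? hello and hi";

// Regex.Escape belongs on the PATTERN, not on the input text.
string pattern = Regex.Escape(@"\hello");   // yields the pattern \\hello
int count = Regex.Matches(str, pattern).Count;

Console.WriteLine(pattern);   // \\hello
Console.WriteLine(count);     // 1: the later "hello" has no leading backslash

// The same pattern written by hand as a verbatim literal.
int handWritten = Regex.Matches(str, @"\\hello").Count;
Console.WriteLine(handWritten);   // 1
```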

Escaping hash and quote to regular expression

I am trying to define a regular expression for a regular expression validator that limits the content of a textbox to only alphanumeric characters, slash (/), hash (#), left and right parentheses (()), period (.), apostrophe ('), quote ("), hyphen (-) and spaces.
I am having trouble with the hash and quote: the other restrictions work, but when I insert one of these characters the evaluation fails and I get the error message. I have tried escaping these characters both without and with a verbatim string, which was my last attempt:
@"[ a-zA-ZÀ-ÿ/().\'-""#]"
Any thoughts on these? Thank you
The regex language is smart enough to understand that periods and parentheses within a character class actually refer to the characters and not to the patterns they usually do when they appear outside of character classes.
Within your character class, you need to escape the backslash (\) and the hyphen (-), but that's it:
@"[ a-zA-ZÀ-ÿ/().\\'\-""#]"
If you move your hyphen to the end of the character class, you won't even need to escape that:
@"[ a-zA-ZÀ-ÿ/().\\'""#-]"
And of course this still only matches a single character. If you want to ensure that the entire string consists only of these characters, you'll need to use start (^) and end ($) anchors and a quantifier (* or +) after your character class.
I believe your final pattern should look like this:
@"^[ a-zA-ZÀ-ÿ/().\\'""#-]*$"
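A minimal sketch of the anchored pattern in use, written as a C# top-level program; the sample inputs are made up for illustration. Two things the patterns in this thread leave open: the class contains no 0-9, so digits are rejected (add 0-9 if "alphanumeric" should include them), and \\ admits a literal backslash, which the question's list does not mention. strictPattern is a hypothetical variant without it, since the apostrophe needs no escaping.

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern from the answer: ^...$ anchor the whole input and * repeats the class
// (use + instead to reject the empty string). \\ admits a literal backslash.
string pattern = @"^[ a-zA-ZÀ-ÿ/().\\'""#-]*$";

// Hypothetical variant limited to the characters listed in the question (no backslash).
string strictPattern = @"^[ a-zA-ZÀ-ÿ/().'""#-]*$";

string ok = "O'Brien (Jr.) \"A/B\" #x-y";   // includes the hash and the quote
string semicolon = "semi;colon";             // ';' is not in the class
string backslash = @"back\slash";

Console.WriteLine(Regex.IsMatch(ok, pattern));               // True
Console.WriteLine(Regex.IsMatch(semicolon, pattern));        // False
Console.WriteLine(Regex.IsMatch(backslash, pattern));        // True
Console.WriteLine(Regex.IsMatch(backslash, strictPattern));  // False
```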

Strings and use of \

I have a StringBuilder and a string, storing a path:
StringBuilder name = new StringBuilder();
name.Append(@"NETWORK\MyComputer");
String serverName = name.ToString(); //this converts the \ to a \\
I have tried a number of things, but it always results in the string having \\.
Using serverName.Replace("\\", @"\"); doesn't work; it leaves it as \\.
serverName.Replace("\\", "\""); puts a " into the string, which is still not correct.
Please assist.
If you are concerned about a single backslash being shown as a double backslash, don't be: that is simply the way the debugger shows it to you.
The backslash is a special character; that 'specialness' is turned off by doubling it up. Alternatively, the @ symbol can be prefixed to the string in source code, which avoids having to double it.
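A minimal sketch (C# top-level program) showing that the string built in the question really contains a single backslash, whatever the debugger displays: both its length and a split on the backslash character count one.

```csharp
using System;
using System.Text;

var name = new StringBuilder();
name.Append(@"NETWORK\MyComputer");
string serverName = name.ToString();

Console.WriteLine(serverName);          // NETWORK\MyComputer (printed with ONE backslash)
Console.WriteLine(serverName.Length);   // 18 = 7 ("NETWORK") + 1 (\) + 10 ("MyComputer")

// '\\' is the char literal for a single backslash.
int backslashes = serverName.Split('\\').Length - 1;
Console.WriteLine(backslashes);                              // 1
Console.WriteLine(serverName == "NETWORK\\MyComputer");      // True
```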
Use
name.Append(Path.Combine("NETWORK", "MyComputer"));
In strings, \ starts an escape sequence, so a single \ is shown as \\ in the debugger.
According to MSDN:
Character combinations consisting of a backslash (\) followed by a letter or by a combination of digits are called "escape sequences." To represent a newline character, single quotation mark, or certain other characters in a character constant, you must use escape sequences. An escape sequence is regarded as a single character and is therefore valid as a character constant.
Escape sequences are typically used to specify actions such as carriage returns and tab movements on terminals and printers. They are also used to provide literal representations of nonprinting characters and characters that usually have special meanings, such as the double quotation mark ("). The following table lists the ANSI escape sequences and what they represent.
Read Escape Sequences
I don't think your code compiles: because \ is an escape character, the string "\" is invalid (the \" is read as an escaped quote, so the literal is never closed). @"\" is correct because the @ (verbatim) prefix disables that escape and treats the backslash as a normal character.

C# won't escape "\"? [duplicate]

Possible Duplicate:
How to use “\” in a string without making it an escape sequence - C#?
Why is it giving me an error in C# when I use a string like "\themes\default\layout.png"? At the "d" and "l" locations it says "unrecognized escape sequence". And how do I stop it from giving me an error when I use "\"?
Thanks
You need to escape it with an additional \:
string value = "\\themes\\default\\layout.png";
or use the @ symbol:
string value = @"\themes\default\layout.png";
which saves you from doubling every \.
Or if you are dealing with paths (which is what it seems you are) you could use the Path.Combine method:
string value = Path.Combine(@"\", "themes", "default", "layout.png");
You're using a backslash to escape 't' and 'd'. If you want to escape the actual backslash you need to do so:
"\\themes\\default\\layout.png"
"Regular" string literals treat the \ character as a special character, used in escape sequences to quickly insert special characters into strings: \n, for example, inserts the newline character, \" inserts the " character without terminating the string, and so on.
Because of this, to insert a backslash into a "normal" string you have to insert the corresponding escape sequence, which, unsurprisingly, is \\; you would then write in your case:
"\\themes\\default\\layout.png"
Failing to escape the backslashes will produce weird results or errors like the ones you got, since the compiler will try to interpret each backslash-letter pair as an escape sequence. If the sequence is defined, you'll get unwanted characters (e.g. the first \t becomes a tab character); if it's not (like \l), you'll get an error about an unrecognized escape sequence.
Another option, if you don't need to escape any character, is to use so-called "verbatim" string literals: if you prefix the string with an @ character, escape sequences are disabled and the string you write is taken verbatim by the compiler. The only exception to this rule is the double quote, which can be inserted inside a verbatim string via the "quote escape sequence", i.e. "". In your case you would write:
@"\themes\default\layout.png"
For more info about regular vs verbatim string literals have a look at their documentation.
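A short sketch (C# top-level program) of the two behaviours described above: why "\themes" compiles yet is wrong, and how "" works inside a verbatim literal. The sample strings are illustrative only.

```csharp
using System;

// \t is a valid escape, so "\themes" compiles but starts with a TAB, not a backslash.
string oops = "\themes";
Console.WriteLine(oops[0] == '\t');   // True
Console.WriteLine(oops.Length);       // 6: tab + "hemes"

// In a verbatim literal, "" is the only escape and yields one double quote;
// the backslash before "themes" stays a plain backslash.
string quoted = @"say ""hi"" to \themes";
Console.WriteLine(quoted == "say \"hi\" to \\themes");   // True
```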
The backslash is treated as an escape character. Either escape the backslash itself in the string, like so:
"\\themes\\default\\layout.png"
or disable escaping altogether using a verbatim string literal:
@"\themes\default\layout.png"

Why does .NET add an additional slash to the already existent slashes in a path?

I've noticed that C# adds additional backslashes (\) to paths. Consider the path C:\Test. When I inspect the string with this path in the text visualiser, it shows up as C:\\Test.
Why is this? It confuses me, as sometimes I may want to split the path up (using string.Split()), but have to wonder which string to use (one or two slashes).
The \\ is used because \ is an escape character, so it is needed to represent a single \.
In other words, the first \ is treated as an escape character and the second \ is taken as the actual value. Otherwise, the character after the first \ would be parsed as part of an escape sequence.
Here is a list of available escape characters:
\' - single quote, needed for character literals
\" - double quote, needed for string literals
\\ - backslash
\0 - Null
\a - Alert
\b - Backspace
\f - Form feed
\n - New line
\r - Carriage return
\t - Horizontal tab
\v - Vertical tab
\u - Unicode escape sequence for character
\U - Unicode escape sequence for surrogate pairs.
\x - Unicode escape sequence similar to "\u" except with variable length.
EDIT: To answer your question regarding Split, it should be no issue. Use Split as you would normally. The \\ will be treated as only the one character of \.
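A sketch (C# top-level program) of the Split point, plus a few rows of the table checked against their character codes. The \x row hides a pitfall: because it takes one to four hex digits, "\x41BC" is the single character U+41BC, not "A" followed by "BC".

```csharp
using System;

string path = "C:\\Test";             // same value as @"C:\Test"
Console.WriteLine(path.Length);       // 7: the \\ escape is a single character
string[] parts = path.Split('\\');    // split on ONE backslash character
Console.WriteLine(string.Join("|", parts));   // C:|Test

// A few rows of the escape table, checked against their character codes.
Console.WriteLine('\a' == (char)7);       // True (alert / BEL)
Console.WriteLine('\v' == (char)11);      // True (vertical tab)
Console.WriteLine("\u0041" == "A");       // True (exactly 4 hex digits)
Console.WriteLine("\x41" == "A");         // True (1 to 4 hex digits)
Console.WriteLine("\U0001F600".Length);   // 2: one code point stored as a surrogate pair
```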
.NET is not adding anything to your string here. What you're seeing is an effect of how the debugger chooses to display strings. C# strings can be represented in two forms:
Verbatim strings: prefixed with an @ sign, which removes the need to escape \ characters
Normal strings: standard C-style strings, where \ characters must be escaped as \\
The debugger displays a string as a normal string literal rather than a verbatim one. It's purely a display issue, though; it doesn't affect the string's underlying value.
Debugger visualizers display strings in the form in which they would appear in C# code. Since \ is used to escape characters in non-verbatim C# strings, \\ is the correct escaped form.
Okay, so the answers above are not wholly correct. As such I am adding my findings for the next person who reads this post.
In my testing, I could not split a string read from an external source using any of the characters in the table above.
i.e,
string[] splitStrings = File.ReadAllText([path]).Split((char)7);
did not split on those characters. However, strings created in code work fine.
i.e.,
string[] splitStrings = "hello\agoodbye".Split((char)7);
This may not hold true for other methods of reading text from a file. I am unsure as I have not tested with other methods. With that in mind, it is probably best not to use those chars for delimiting strings!
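One possible explanation for this observation, assumed here because the original file isn't shown: escape sequences such as \a are translated by the C# compiler only inside string literals in source code. If the external file contains the two characters \ and a, there is no BEL character to split on; if it really contains BEL (char 7), splitting after File.ReadAllText works. The sketch below (C# top-level program) checks both cases using a temporary file; the sample text is hypothetical.

```csharp
using System;
using System.IO;

// Compiler-interpreted escape: \a becomes the single BEL character (char)7.
int internalParts = "hello\agoodbye".Split((char)7).Length;
Console.WriteLine(internalParts);   // 2

// Text that merely CONTAINS a backslash followed by 'a' (e.g. typed into a file)
// is two ordinary characters; nothing converts it to BEL at run time.
int literalParts = @"hello\agoodbye".Split((char)7).Length;
Console.WriteLine(literalParts);    // 1

// A file that really contains the BEL character splits fine after reading.
string path = Path.GetTempFileName();
File.WriteAllText(path, "hello\agoodbye");   // writes an actual BEL character
int fileParts = File.ReadAllText(path).Split((char)7).Length;
File.Delete(path);
Console.WriteLine(fileParts);       // 2
```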
