I am having trouble creating a regular expression to disallow the following four characters and limit the size:
/
#
?
\
What I currently have is:
Regex regex = new Regex("^[^/\\#?]{0,1024}$", RegexOptions.Compiled);
if (!regex.IsMatch("\\"))
{
Console.WriteLine("Bad");
}
All of the characters except \ are disallowed. I cannot get \ to work.
Any suggestions on how to support this?
Your regex is fine, ^[^/\\#?]{0,1024}$.
However, in C# backslash is an escape character, so a C# "\\" is a single backslash.
Hence for each backslash in your regex, you have to backslash again for C#:
Regex regex = new Regex("^[^/\\\\#?]{0,1024}$", RegexOptions.Compiled);
Alternatively, you can use a verbatim string, in which backslashes in the C# source remain backslashes (note the @ symbol):
Regex regex = new Regex(@"^[^/\\#?]{0,1024}$", RegexOptions.Compiled);
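As a rough check of the two forms above, the following sketch builds the regex from both a regular and a verbatim literal and runs a few sample inputs through it. The class name PathSegmentCheck, the helper methods, and the sample strings are made up for illustration and are not part of the question.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class; names are illustrative only.
public static class PathSegmentCheck
{
    // Regular literal: four backslashes in source become two in the string,
    // which the regex engine reads as one literal backslash.
    private static readonly Regex Escaped =
        new Regex("^[^/\\\\#?]{0,1024}$", RegexOptions.Compiled);

    // Verbatim literal: backslashes reach the regex engine exactly as typed.
    private static readonly Regex Verbatim =
        new Regex(@"^[^/\\#?]{0,1024}$", RegexOptions.Compiled);

    // True when the input contains none of / \ # ? and is at most 1024 chars.
    public static bool IsAllowed(string input) => Escaped.IsMatch(input);

    // Regex.ToString() returns the pattern text that was passed in.
    public static bool PatternsAreIdentical() =>
        Escaped.ToString() == Verbatim.ToString();

    public static void Main()
    {
        Console.WriteLine(PatternsAreIdentical());
        foreach (var s in new[] { "abc", "a/b", "a#b", "a?b", "a\\b" })
        {
            Console.WriteLine($"{s}: {(IsAllowed(s) ? "OK" : "Bad")}");
        }
    }
}
```

Only "abc" should be reported as OK here; the other four samples each contain one of the disallowed characters.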
You were close, you need to escape the backslash:
^[^/\\#?]{0,1024}$
Even though you do not need to escape special characters inside a character class you do need to escape the escape character itself.
Try two backslashes.
^[^/\\#?]{0,1024}$
In C# (as in C++), the backslash introduces escape sequences such as \n. To get a literal backslash, write \\.
Can anyone provide a regex for validating a string that should not allow any special characters except the backslash? I tried:
var regexItem = new Regex("^[a-zA-Z0-9\\\ ]*$");
But it doesn't seem to work.
Backslashes need to be escaped in regular expressions - and they also need to be escaped in C#, unless you use verbatim string literals. So either of these should work:
var regexItem = new Regex(@"^[a-zA-Z0-9\\ ]*$");
var regexItem = new Regex("^[a-zA-Z0-9\\\\ ]*$");
Both of these ensure that the following string content is passed to the Regex constructor:
^[a-zA-Z0-9\\ ]*$
The Regex code will then see the double backslash and treat it as "I really want to match the backslash character."
Basically, you always need to distinguish between "the string contents you want to pass to the regex engine" and "the string literal representation in the source code". (This is true not just for regular expressions, of course. The debugger doesn't help by escaping in Watch windows etc.)
EDIT: Now that the question has been edited to show that you originally had three backslashes, that's just not valid C#. I suspect you were aiming for "a string with three backslashes in" which would be either of these:
var regexItem = new Regex(@"^[a-zA-Z0-9\\\ ]*$");
var regexItem = new Regex("^[a-zA-Z0-9\\\\\\ ]*$");
... but you don't need to escape the space as far as the regular expression is concerned.
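To make the distinction between string contents and source literal concrete, here is a small sketch that compares the two literals and prints what the regex engine actually receives. The class name LiteralVersusContents, its methods, and the sample inputs are invented for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; names are illustrative only.
public static class LiteralVersusContents
{
    // Regular literal: each pair of backslashes collapses to one in the string.
    public static string FromRegularLiteral() => "^[a-zA-Z0-9\\\\ ]*$";

    // Verbatim literal: the text is taken as typed.
    public static string FromVerbatimLiteral() => @"^[a-zA-Z0-9\\ ]*$";

    // Letters, digits, spaces and backslashes only.
    public static bool IsValid(string input) =>
        Regex.IsMatch(input, FromVerbatimLiteral());

    public static void Main()
    {
        // Console output shows the real string contents, unlike a debugger
        // watch window, which may re-escape the backslashes.
        Console.WriteLine(FromRegularLiteral());
        Console.WriteLine(FromRegularLiteral() == FromVerbatimLiteral());
        Console.WriteLine(IsValid(@"dir\file 01"));
        Console.WriteLine(IsValid("a_b"));
    }
}
```

Printing the pattern to the console is a simple way to see the contents without the extra escaping a debugger may add.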
You either need to double escape it (once for C# and once for the Regex engine):
var regexItem = new Regex("^[a-zA-Z0-9\\\\ ]*$");
Or you can use the verbatim string feature of C# (note the @):
var regexItem = new Regex(@"^[a-zA-Z0-9\\ ]*$");
In a verbatim string, the backslash is not interpreted as starting an escape sequence, so you just need to escape it once for the Regex engine.
I assume that your current code doesn't compile. It should say something along the lines of "Unrecognized escape sequence".
The reason for this is that you have three backslashes followed by a space. The first two backslashes are interpreted as an escape sequence representing a backslash, but the third backslash is interpreted as starting an escape sequence with a space as the second character. Such an escape sequence doesn't exist, leading to the error.
You have to escape the \ for both the regex engine and the C# string, which means writing four \ characters in a regular string literal to match a single \.
new Regex(@"\n|\r|\\|<|>|\*|!|\$|%|;");
I have the regex example above, but I can't really understand what it is trying to find. Can anyone give me a hand, please?
The regex matches one of the characters separated by the alternation operator |. There are a few special characters (like \n or \r for newline and carriage return, or \$ for a literal dollar sign and \* for a literal asterisk because $ and * are regex metacharacters), but other than that, it's quite straightforward.
That said, for matching a single character out of a list of valid characters, a character class is usually the better choice, not only because there is less need to escape the metacharacters:
new Regex(@"[\n\r\\<>*!$%;]");
It will try to match any of the special characters listed: \n, \r, \, <, >, *, !, $, % and ;. The | is the regex OR operator.
Some characters need to be escaped with an extra \ because they have a special meaning in the regex language (\, $, ...).
| in regex is an alternation operator. A|B means match either A or B. It can also be written using a character class - [AB] which also means the same thing.
The benefit of a character class is that most regex metacharacters need no escaping inside it, whereas outside it they do, as with \* in your pattern. So, your regex can be shortened to:
new Regex(@"[\n\r\\<>*!$%;]");
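The equivalence of the alternation and the character class can be checked with a short sketch like the one below. The class name AlternationVersusClass, the helper methods, and the sample inputs are assumptions made for the example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; names are illustrative only.
public static class AlternationVersusClass
{
    // Original form: one alternative per character, metacharacters escaped.
    private static readonly Regex Alternation =
        new Regex(@"\n|\r|\\|<|>|\*|!|\$|%|;");

    // Character-class form: only the backslash itself still needs escaping.
    private static readonly Regex CharClass =
        new Regex(@"[\n\r\\<>*!$%;]");

    // True when the input contains at least one of the listed characters.
    public static bool ContainsForbidden(string input) => CharClass.IsMatch(input);

    // True when both patterns give the same verdict for the input.
    public static bool Agree(string input) =>
        Alternation.IsMatch(input) == CharClass.IsMatch(input);

    public static void Main()
    {
        foreach (var s in new[] { "plain text", "a<b", "50%", "line\nbreak", @"back\slash" })
        {
            Console.WriteLine($"forbidden={ContainsForbidden(s)} agree={Agree(s)}");
        }
    }
}
```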
I want to split a string using the backslash ('\'). However, it's not allowed - the compiler says "newline in constant". Is there a way to split using backslash?
//For example...
String[] breakApart = sentence.Split('\'); //this gives an error.
Try using the escaped character '\\' instead of '\':
String[] breakApart = sentence.Split('\\');
The backslash \ in C# is used as an escape character for special characters like quotes and apostrophes. So when you are trying to wrap the backslash with apostrophes, the backslash together with the final apostrophe is being interpreted as an escaped apostrophe.
Here is a list of character escapes available in C#.
Here is Microsoft's documentation for character literals in C#.
It's backslash, a character literal.
To do the split:
String[] breakApart = sentence.Split('\\');
You can also use a verbatim string (@), on frameworks where Split accepts a string separator:
String[] breakApart = sentence.Split(@"\");
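For reference, a minimal runnable sketch of the character-literal form follows. The class name BackslashSplit, the method SplitOnBackslash, and the sample path are made up for the example.

```csharp
using System;

// Hypothetical demo class; names are illustrative only.
public static class BackslashSplit
{
    // '\\' is the character literal for a single backslash.
    public static string[] SplitOnBackslash(string sentence) =>
        sentence.Split('\\');

    public static void Main()
    {
        // The verbatim string keeps the backslashes in the sample path as typed.
        string[] parts = SplitOnBackslash(@"C:\Users\demo\file.txt");
        Console.WriteLine(string.Join(", ", parts)); // prints "C:, Users, demo, file.txt"
    }
}
```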
I'm doing a match comparison on some escaped strings:
Regex.IsMatch("\\Application.evtx", "DebugLogs\\ConfigurationServices.log");
I don't see why I'm getting:
"parsing "DebugLogs\ConfigurationServices.log" - Unrecognized escape sequence \C."
The \C is escaped?
The edit really fooled a lot of people, including me!
'\' is a special character in regular expressions - it effectively is an escape character or denotes an escape sequence.
So the regex engine sees DebugLogs\ConfigurationServices.log, in which \C is indeed an unrecognized escape sequence. (\A, by contrast, is an existing escape sequence.)
So you need to escape the escape character. The simplest way to do this is to double the number of backslashes used:
Regex.IsMatch("\\\\Application.evtx", "DebugLogs\\\\ConfigurationServices.log");
The regex engine will then see a comparison between "\\Application.evtx" and "DebugLogs\\ConfigurationServices.log"; now the backslash has been escaped and has no special meaning.
Regex.IsMatch(@"\\Application.evtx", @"DebugLogs\\ConfigurationServices.log");
works fine too and is more readable.
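When the pattern is meant to be matched as literal text, as a file path usually is, one option that the answers here do not mention is to let Regex.Escape add the regex-level escaping. The sketch below assumes that is the intent; the class name EscapedPathPattern, its methods, and the sample inputs are invented for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; names are illustrative only.
public static class EscapedPathPattern
{
    // Regex.Escape backslash-escapes regex metacharacters such as \ and .
    public static string BuildPattern(string literalText) =>
        Regex.Escape(literalText);

    // True when the input contains the literal text anywhere.
    public static bool ContainsLiteral(string input, string literalText) =>
        Regex.IsMatch(input, BuildPattern(literalText));

    public static void Main()
    {
        string path = @"DebugLogs\ConfigurationServices.log";
        Console.WriteLine(BuildPattern(path));
        Console.WriteLine(ContainsLiteral(@"C:\DebugLogs\ConfigurationServices.log", path));
        Console.WriteLine(ContainsLiteral(@"\Application.evtx", path));
    }
}
```

This keeps the C# source at one escaping level (the verbatim string) and avoids hand-doubling backslashes in the pattern.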
The \ character is the escape character in C# strings. For example, to produce a carriage return you'd write \r. To get a literal backslash, either use a verbatim string
@"\Application.evtx"
or escape the escape character
"\\Application.evtx"
You probably want
Regex.IsMatch(@"\Application.evtx", @"DebugLogs\\ConfigurationServices.log");
Without the "@", C# treats the backslash as starting a string escape sequence, much as it converts \n into a newline character. The verbatim string removes that layer, but the regex engine still needs the backslash in the pattern doubled, because \C is not a recognised escape sequence there either.