Regular expression to allow backslash in C#

Can anyone provide a regex that validates a string, disallowing all special characters except the backslash? I tried:
var regexItem = new Regex("^[a-zA-Z0-9\\\ ]*$");
But it doesn't seem to work.

Backslashes need to be escaped in regular expressions - and they also need to be escaped in C#, unless you use verbatim string literals. So either of these should work:
var regexItem = new Regex(@"^[a-zA-Z0-9\\ ]*$");
var regexItem = new Regex("^[a-zA-Z0-9\\\\ ]*$");
Both of these ensure that the following string content is passed to the Regex constructor:
^[a-zA-Z0-9\\ ]*$
The Regex code will then see the double backslash and treat it as "I really want to match the backslash character."
Basically, you always need to distinguish between "the string contents you want to pass to the regex engine" and "the string literal representation in the source code". (This is true not just for regular expressions, of course. The debugger doesn't help by escaping in Watch windows etc.)
EDIT: Now that the question has been edited to show that you originally had three backslashes, that's just not valid C#. I suspect you were aiming for "a string containing three backslashes", which would be either of these:
var regexItem = new Regex(@"^[a-zA-Z0-9\\\ ]*$");
var regexItem = new Regex("^[a-zA-Z0-9\\\\\\ ]*$");
... but you don't need to escape the space as far as the regular expression is concerned.
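As a rough illustration of the "string contents versus string literal" distinction, the sketch below checks that the verbatim and regular literals produce the same pattern string and then uses it for validation. The class and member names (BackslashValidation, IsValid) are made up for this example and do not come from the question.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, named only for this illustration.
public static class BackslashValidation
{
    // Verbatim literal: the regex engine receives ^[a-zA-Z0-9\\ ]*$
    public const string VerbatimPattern = @"^[a-zA-Z0-9\\ ]*$";

    // Regular literal: every backslash is doubled once more for the C# compiler.
    public const string RegularPattern = "^[a-zA-Z0-9\\\\ ]*$";

    // True when the input consists only of letters, digits, spaces and backslashes.
    public static bool IsValid(string input)
    {
        return Regex.IsMatch(input, VerbatimPattern);
    }
}
```

Both constants hold the same characters, so BackslashValidation.IsValid(@"C\Users 42") should return true while BackslashValidation.IsValid("a/b") should return false.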

You either need to double escape it (once for C# and once for the Regex engine):
var regexItem = new Regex("^[a-zA-Z0-9\\\\ ]*$");
Or you can use the verbatim string feature of C# (note the @):
var regexItem = new Regex(@"^[a-zA-Z0-9\\ ]*$");
In a verbatim string, the backslash is not interpreted as starting an escape sequence, so you just need to escape it once for the Regex engine.
I assume that your current code doesn't compile. It should say something along the lines of "Unrecognized escape sequence".
The reason for this is that you have three backslashes followed by a space. The first two backslashes are interpreted as an escape sequence representing a backslash, but the third backslash is interpreted as starting an escape sequence with a space as the second character. Such an escape sequence doesn't exist, leading to the error.

You have to escape the \ once for the regex engine and once more for the C# string literal, so matching a single \ takes four backslashes (\\\\) in a regular string literal.

Split string using backslash

I want to split a string using the backslash ('\'). However, it's not allowed - the compiler says "newline in constant". Is there a way to split using backslash?
//For example...
String[] breakApart = sentence.Split('\'); //this gives an error.
Try using the escaped character '\\' instead of '\':
String[] breakApart = sentence.Split('\\');
The backslash \ in C# is used as an escape character for special characters like quotes and apostrophes. So when you are trying to wrap the backslash with apostrophes, the backslash together with the final apostrophe is being interpreted as an escaped apostrophe.
Here is a list of character escapes available in C#.
Here is Microsoft's documentation for character literals in C#.
'\\' is the backslash written as a character literal.
To do the split:
String[] breakApart = sentence.Split('\\');
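You can also use @ (a verbatim string literal). Note that this passes a string rather than a char, so it relies on a Split overload that accepts a string separator:
String[] breakApart = sentence.Split(@"\");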
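A minimal sketch of the character-literal approach, wrapped in a hypothetical helper (BackslashSplit and SplitOnBackslash are names invented for this example):

```csharp
// Hypothetical helper, named only for this illustration.
public static class BackslashSplit
{
    // '\\' is the character literal for one backslash; '\' alone would
    // swallow the closing apostrophe as an escaped apostrophe.
    public static string[] SplitOnBackslash(string sentence)
    {
        return sentence.Split('\\');
    }
}
```

Calling BackslashSplit.SplitOnBackslash(@"C:\temp\file.txt") should yield the three parts C:, temp and file.txt.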

regex syntax stop search

How do I make Regex stop the search after "Target This"?
HeaderText="Target This" AnotherAttribute="Getting Picked Up"
This is what I've tried:
var match = Regex.Match(string1, @"(?<=HeaderText=\").*(?=\")");
The quantifier * is greedy, which means it will consume as many characters as it can while still getting a match. You want the lazy quantifier, *?.
As an aside, rather than using look-around expressions as you have done here, you may find it in general easier to use capturing groups:
var match = Regex.Match(string1, "HeaderText=\"(.*?)\"");
The parentheses around .*? make a capturing group.
Now the match matches the whole thing, but match.Groups[1] is just the value in the quotes.
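To make the greedy/lazy difference concrete, here is a small sketch that runs both quantifiers against the sample line from the question. The class and method names (HeaderTextExtractor, Greedy, Lazy) are assumptions made for this example.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, named only for this illustration.
public static class HeaderTextExtractor
{
    // The sample line from the question, written as a regular string literal.
    public const string Sample =
        "HeaderText=\"Target This\" AnotherAttribute=\"Getting Picked Up\"";

    // Greedy: .* runs as far as it can, so the group ends at the LAST quote.
    public static string Greedy(string input)
    {
        return Regex.Match(input, "HeaderText=\"(.*)\"").Groups[1].Value;
    }

    // Lazy: .*? grows one character at a time and stops at the FIRST closing quote.
    public static string Lazy(string input)
    {
        return Regex.Match(input, "HeaderText=\"(.*?)\"").Groups[1].Value;
    }
}
```

With the sample input, Lazy should return just Target This, while Greedy should also swallow the second attribute up to the final quote.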
Plain regex pattern
(?<=HeaderText=").*?(?=")
or as string
string pattern = "(?<=HeaderText=\").*?(?=\")";
or using a verbatim string
string pattern = @"(?<=HeaderText="").*?(?="")";
The trick is the question mark after .*. It means "as few as possible", making it stop after the first end-quotes it encounters.
Note that verbatim strings (introduced with @) do not recognize the backslash \ as an escape character. Escape the double quotes by doubling them.
Note for others interested in regex: the search pattern used finds a position between a prefix and a suffix:
(?<=prefix)find(?=suffix)
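The prefix/suffix form can be tried out with a short sketch like the one below. LookaroundDemo and its members are hypothetical names; the two constants are the regular and verbatim spellings of the same pattern given in this answer.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, named only for this illustration.
public static class LookaroundDemo
{
    // Regular literal: quotes are escaped with a backslash.
    public const string RegularPattern = "(?<=HeaderText=\").*?(?=\")";

    // Verbatim literal: quotes are escaped by doubling them.
    public const string VerbatimPattern = @"(?<=HeaderText="").*?(?="")";

    // Returns the text between the prefix and the suffix,
    // or an empty string when nothing matches.
    public static string Extract(string input)
    {
        return Regex.Match(input, VerbatimPattern).Value;
    }
}
```

Because the lookbehind and lookahead consume no characters, the match value itself is the text between the quotes, with no capturing group needed.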
Try this:
var match = Regex.Match(string1, "HeaderText=\"([^\"]+)");
var val = match.Groups[1].Value; //Target This
UPDATE
If the target value may itself contain double quotes, change the regex to:
HeaderText=\"(.+?)\"\\s+\\w
Note: regex is not the right tool for this. If the input is XML, look at System.Xml; otherwise, for HTML, look at HtmlAgilityPack.

How do I create a regular expression to disallow backslash

I am having trouble creating a regular expression to disallow the following four characters and limit the size:
/
#
?
\
What I currently have is:
Regex regex = new Regex("^[^/\\#?]{0,1024}$", RegexOptions.Compiled);
if (!regex.IsMatch("\\"))
{
    Console.WriteLine("Bad");
}
All of the characters except \ are disallowed. I cannot get \ to work.
Any suggestions on how to support this?
Your regex is fine, ^[^/\\#?]{0,1024}$.
However, in C# backslash is an escape character, so a C# "\\" is a single backslash.
Hence for each backslash in your regex, you have to backslash again for C#:
Regex regex = new Regex("^[^/\\\\#?]{0,1024}$", RegexOptions.Compiled);
Alternatively, you can use a verbatim string, meaning backslashes in C# strings remain backslashes (note the @ symbol):
Regex regex = new Regex(@"^[^/\\#?]{0,1024}$", RegexOptions.Compiled);
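The following sketch contrasts the pattern from the question with the corrected one. KeyValidator and its members are names invented for this example. The explanation in the comments (that the regex engine reads \# as an escaped #, leaving the backslash itself allowed) is my reading of why the original check never printed "Bad".

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, named only for this illustration.
public static class KeyValidator
{
    // Corrected pattern: the regex engine receives ^[^/\\#?]{0,1024}$
    private static readonly Regex Fixed =
        new Regex(@"^[^/\\#?]{0,1024}$", RegexOptions.Compiled);

    // Pattern as written in the question: the engine receives ^[^/\#?]{0,1024}$,
    // where \# is just an escaped '#', so the backslash is not excluded.
    private static readonly Regex Original =
        new Regex("^[^/\\#?]{0,1024}$", RegexOptions.Compiled);

    public static bool IsAllowed(string candidate)
    {
        return Fixed.IsMatch(candidate);
    }

    public static bool OriginalAccepts(string candidate)
    {
        return Original.IsMatch(candidate);
    }
}
```

With these definitions, KeyValidator.OriginalAccepts("\\") should be true (the bug), while KeyValidator.IsAllowed("\\") should be false.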
You were close, you need to escape the backslash:
^[^/\\#?]{0,1024}$
Even though you do not need to escape special characters inside a character class you do need to escape the escape character itself.
Try two backslashes.
^[^/\\#?]{0,1024}$
In C#, as in C++, the backslash is reserved for escape sequences, like \n. To make a literal backslash, use \\.

Regex battle between maximum and minimum munge

Greetings, I have file with the following strings:
string.Format("{0},{1}", "Having \"Two\" On The Same Line".Localize(), "Is Tricky For regex".Localize());
my goal is to get a match set with the two strings:
Having \"Two\" On The Same Line
Is Tricky For regex
My current regex looks like this:
private Regex CSharpShortRegex = new Regex("\"(?<constant>[^\"]+?)\".Localize\\(\\)");
My problem is with the escaped quotes in the first string: the match stops at the quote and I get:
On The Same Line
Is Tricky For This Style Too
However, attempting to ignore the escaped quotes is not working out, because it makes the regex greedy and I get:
Having \"Two\" On The Same Line".Localize(), "Is Tricky For regex"
We seem to be caught between maximum and minimum munge. Is there any hope? I have some backup plans. Can you Regex backwards? that would make it easier because I can start with the "()ezilacoL."
EDIT:
To clarify. This is my lone edge case. Most of the time the string sits alone like:
var myString = "Hot Patootie".Localize()
This one works for me:
\"((?:[^\\"]|(?:\\\"))*)\"\.Localize\(\)
Tested on http://www.regexplanet.com/simple/index.html against a number of strings with various escaped quotes.
Looks like most of us who answered this one had the same rough idea, so let me explain the approach (comments after #s):
\"                # We're looking for a string delimited by quotation marks
(                 # Capture the contents of the quotation marks
  (?:             # Start a non-capturing group
    [^\\"]        # Either read a character that isn't a quote or a backslash
    |(?:\\\")     # Or read in a backslash followed by a quote
  )*              # Keep reading
)                 # End the capturing group
\"                # The string literal ends in a quotation mark
\.Localize\(\)    # and ends with the literal '.Localize()', escaping ., ( and )
For C# you'll need to escape the backslashes twice (messy):
\"((?:[^\\\\\"]|(?:\\\\\"))*)\"\\.Localize\\(\\)
Mark correctly points out that this one doesn't match escaped characters other than quotation marks. So here's a better version:
\"((?:[^\\"]|(?:\\")|(?:\\.))*)\"\.Localize\(\)
And its slashed-up equivalent:
\"((?:[^\\\\\"]|(?:\\\\\")|(?:\\\\.))*)\"\\.Localize\\(\\)
Works the same way, except it has a special case: if it encounters a backslash but can't match \", it just consumes the backslash and the following character and moves on.
Thinking about it, it's better to just consume two characters at every backslash, which is effectively Mark's answer, so I won't repeat it.
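The "consume two characters at every backslash" idea might look roughly like the sketch below. It is a variant written for illustration, not the pattern from any of the answers: LocalizeScanner and Extract are hypothetical names, and the character class here excludes the backslash as well as the quote so that every backslash is forced through the two-character branch.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, named only for this illustration.
public static class LocalizeScanner
{
    // Regex contents: "(?<constant>(?:\\.|[^"\\])*)"\.Localize\(\)
    //   \\.     a backslash plus whatever character follows it
    //   [^"\\]  any character that is neither a quote nor a backslash
    private static readonly Regex Pattern =
        new Regex(@"""(?<constant>(?:\\.|[^""\\])*)""\.Localize\(\)");

    // Returns the contents of every string literal followed by .Localize().
    public static List<string> Extract(string line)
    {
        var results = new List<string>();
        foreach (Match match in Pattern.Matches(line))
        {
            results.Add(match.Groups["constant"].Value);
        }
        return results;
    }
}
```

Run against the string.Format line from the question, Extract should return the two literals, with the escaped quotes in the first one left intact.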
Here's the regular expression you need:
@"""(?<constant>(\\.|[^""])*)""\.Localize\(\)"
A test program:
using System;
using System.Text.RegularExpressions;
using System.IO;

class Program
{
    static void Main()
    {
        // Verbatim string: "" is one quote, and backslashes reach the regex engine as-is.
        Regex CSharpShortRegex =
            new Regex(@"""(?<constant>(\\.|[^""])*)""\.Localize\(\)");
        // Print the contents of every localized string literal in the file.
        foreach (string line in File.ReadAllLines("input.txt"))
            foreach (Match match in CSharpShortRegex.Matches(line))
                Console.WriteLine(match.Groups["constant"].Value);
    }
}
Output:
Having \"Two\" On The Same Line
Is Tricky For regex
Hot Patootie
Notice that I have used @"..." to avoid having to escape backslashes inside the regular expression. I think this makes it easier to read.
Update:
My original answer (below the horizontal rule) has a bug: regular-expression matchers attempt alternatives in left-to-right order. Having [^"] as the first alternative allows it to consume the backslash, but then the next character to be matched is a quote, which prevents the match from proceeding.
Incompatibility note: Given the pattern below, perl backtracks to the other alternative (the escaped quote) and successfully finds a match for the Having \"Two\" On The Same Line case.
The fix is to try an escaped quote first and then a non-quote:
var CSharpShortRegex =
    new Regex("\"(?<constant>(\\\\\"|[^\"])*)\"\\.Localize\\(\\)");
or if you prefer the at-string form:
var CSharpShortRegex =
    new Regex(@"""(?<constant>(\\""|[^""])*)""\.Localize\(\)");
Allow for escapes:
private Regex CSharpShortRegex =
    new Regex("\"(?<constant>([^\"]|\\\\\")*)\"\\.Localize\\(\\)");
Applying one level of escaping to make the pattern easier to read, we get
"(?<constant>([^"]|\\")*)"\.Localize\(\)
That is, a string starts and ends with " characters, and everything between is either a non-quote or an escaped quote.
Looks like you're trying to parse code so one approach might be to evaluate the code on the fly:
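// Requires: using Microsoft.CSharp; using System.CodeDom.Compiler;
var cr = new CSharpCodeProvider().CompileAssemblyFromSource(
    new CompilerParameters { GenerateInMemory = true },
    // The generated return statement needs its terminating semicolon.
    "class x { public static string e() { return " + input + "; } }");
var result = cr.CompiledAssembly.GetType("x")
    .GetMethod("e").Invoke(null, null) as string;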
This way you could handle all kinds of other special cases (e.g. concatenated or verbatim strings) that would be extremely difficult to handle with regex.
new Regex(@"((([^@]|^|\n)""(?<constant>((\\.)|[^""])*)"")|(@""(?<constant>(""""|[^""])*)""))\s*\.\s*Localize\s*\(\s*\)", RegexOptions.Compiled);
takes care of both simple and @"" strings. It also takes into account escape sequences.
