Regex.Replace not working - c#

I am having a strange issue with Regex.Replace
string test = "if the other party is in material breach of such contract and, after <span style=\"background-color:#ffff00;\">forty-five (45)</span> calendar days notice of such breach";
string final = Regex.Replace(test, "forty-five (45)", "forty-five (46)", RegexOptions.IgnoreCase);
The "final" string still shows "forty-five (45)". Any idea why? I assume it has something to do with the tag. How do I fix this?
Thanks

Escape the parentheses. Depending on the language, this might require two backslashes; in C#, use a verbatim string (@"...") or double each backslash.
string final = Regex.Replace(test, @"forty-five \(45\)", "forty-five (46)", RegexOptions.IgnoreCase);
Parentheses have a special meaning in a regex (they define a group). By escaping them, you tell the regex engine to match the literal parenthesis characters instead.
Better yet, why use a Regex at all? Try a normal string replacement (note that string.Replace is case-sensitive):
string final = test.Replace("forty-five (45)", "forty-six (46)");
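If the text to find comes from a variable, another option is Regex.Escape, which escapes every regex metacharacter for you. This is a minimal sketch assuming you still want case-insensitive matching; the helper name ReplaceLiteral is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

string test = "if the other party is in material breach of such contract and, after " +
    "<span style=\"background-color:#ffff00;\">forty-five (45)</span> calendar days notice of such breach";

// Hypothetical helper: Regex.Escape turns "(" and ")" (and spaces) into escaped
// literals, so the pattern matches the text exactly as written.
// The replacement string is still parsed for substitutions such as $1.
string ReplaceLiteral(string input, string find, string replacement) =>
    Regex.Replace(input, Regex.Escape(find), replacement, RegexOptions.IgnoreCase);

string viaRegex = ReplaceLiteral(test, "forty-five (45)", "forty-five (46)");

// string.Replace needs no escaping at all, but it is case-sensitive.
string viaString = test.Replace("forty-five (45)", "forty-five (46)");

Console.WriteLine(viaRegex);
Console.WriteLine(viaRegex == viaString); // True
```

Regex.Escape is the safer choice when the search text is not known in advance, since you cannot forget to escape a character by hand.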

Parentheses are special in regular expressions. They delimit a group, to allow for things such as alternation. For example, the regular expression foo(bar|bat)baz matches:
foo, followed by
either bar OR bat, followed by
baz
So, a regular expression like foo(bar) will never match the literal string foo(bar). What it will match is the literal string foobar. Consequently, you need to escape the metacharacters. In C#, this should work:
string final = Regex.Replace(test, @"forty-five \(45\)", "forty-five (46)", RegexOptions.IgnoreCase);
The @-quoted (verbatim) string helps avoid headaches from excessive backslashes. Without it, you'd have to write "forty-five \\(45\\)".
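A small check of the grouping behaviour described above, using the answer's foo/bar/bat/baz examples; the ^ and $ anchors are added here so each pattern must match the whole input.

```csharp
using System;
using System.Text.RegularExpressions;

// Unescaped parentheses form a group: "foo(bar|bat)baz" matches
// "foobarbaz" or "foobatbaz", never text containing "(" or ")".
bool groupAlt = Regex.IsMatch("foobatbaz", "^foo(bar|bat)baz$");   // True
bool groupLiteral = Regex.IsMatch("foo(bar)", "^foo(bar)$");      // False
bool groupPlain = Regex.IsMatch("foobar", "^foo(bar)$");          // True

// Escaped parentheses match the literal characters.
bool escaped = Regex.IsMatch("foo(bar)", @"^foo\(bar\)$");       // True

Console.WriteLine($"{groupAlt} {groupLiteral} {groupPlain} {escaped}");
```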

If you are unable to escape the parentheses, put them in a character class:
forty-five [(]45[)]
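For example, a sketch using a shortened version of the question's input: inside [...] the parentheses lose their grouping meaning, so the pattern can be an ordinary C# string with no backslashes.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "after forty-five (45) calendar days";

// [(] and [)] are one-character classes that match the literal parentheses.
string result = Regex.Replace(input, "forty-five [(]45[)]", "forty-five (46)", RegexOptions.IgnoreCase);

Console.WriteLine(result); // after forty-five (46) calendar days
```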

Related

How to use regex to match anything from A to B, where B is not preceded by C

I'm having a hard time with this one. First off, here is the difficult part of the string I'm matching against:
"a \"b\" c"
What I want to extract from this is the following:
a \"b\" c
Of course, this is just a substring from a larger string, but everything else works as expected. The problem is making the regex ignore the quotes that are escaped with a backslash.
I've looked into various ways of doing it, but nothing has gotten me the correct results. My most recent attempt looks like this:
"((\"|[^"])+?)"
In various online testers this works the way it should, but when I build my ASP.NET page, it cuts off at the first ", leaving me with just the letter a, a space and a backslash.
The logic behind the pattern above is to capture all instances of \" or something that is not ". I was hoping this would search for \", making sure to find those first - but I got the feeling that this is overridden by the second part of the expression, which is only 1 single character. A single backslash does not match 2 characters (\"), but it will match as a non-". And from there, the next character will be a single ", and the matching is completed. (This is just my hypothesis on why my pattern is failing.)
Any pointers on this one? I have tried various combinations with "look"-methods in regex, but I didn't really get anywhere. I also get the feeling that is what I need.
ORIGINAL ANSWER
To match a string like a \"b\" c, you need to use the following regex:
(?:\\"|[^"])+
var rx = new Regex(@"(?:\\""|[^""])+");
See RegexStorm demo
Here is an IDEONE demo:
var str = "a \\\"b\\\" c";
Console.WriteLine(str);
var rx = new Regex(@"(?:\\""|[^""])+");
Console.WriteLine(rx.Match(str).Value);
Please note the @ in front of the string literal: it makes it a verbatim string literal, where a literal quote is written by doubling it ("") and a single backslash is enough where a regular string would need two. This makes regexes easier to read and maintain.
If you want to allow any escaped character (not only \") in your input string, you can use:
var rx = new Regex(@"[^""\\]*(?:\\.[^""\\]*)*");
See demo on RegexStorm
UPDATE
To match the quoted strings, just add quotes around the pattern:
var rx = new Regex(@"""(?<res>[^""\\]*(?:\\.[^""\\]*)*)""");
This pattern yields much better performance than Tim Long's suggested regex, according to RegexHero test results.
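A minimal sketch of the quoted-string pattern above applied to the question's text, assuming it is embedded in a longer string (the "Title = ... Next = ..." wrapper is made up for illustration). The named group res holds the content without the surrounding quotes.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input containing: Title = "a \"b\" c"; Next = "d"
string input = "Title = \"a \\\"b\\\" c\"; Next = \"d\"";

// Opening quote, then runs of ordinary characters ([^"\\]*) separated by
// backslash escapes (\\.), then the closing quote.
var rx = new Regex(@"""(?<res>[^""\\]*(?:\\.[^""\\]*)*)""");

string res = rx.Match(input).Groups["res"].Value;
Console.WriteLine(res); // a \"b\" c
```

Because [^"\\] can never consume a backslash, an escaped quote is always handled by \\. and can never be mistaken for the closing quote.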
The following expression worked for me:
"(?<Result>(\\"|.)*)"
The expression matches as follows:
An opening quote (literal ")
A named capture (?<name>pattern) consisting of:
Zero or more occurrences * of literal \" or (|) any single character (.)
A final closing quote (literal ")
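Note that the * (zero or more) quantifier is actually greedy, not lazy: (\\"|.)* first consumes the rest of the line and then backtracks until the literal " can match, so the match ends at the last quote on the line. That is fine when the line holds a single quoted string; with more than one, use the lazy *? quantifier or a negated character class such as [^"\\].
I used ReSharper 9's built-in Regular Expression validator to develop the expression and verify the results.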
I have used the "Explicit Capture" option to reduce cruft in the output (RegexOptions.ExplicitCapture).
One thing to note is that I am matching the whole string, but I am only capturing the substring, using a named capture. Using named captures is a really useful way to get at the results you want. In code, it might look something like this:
using System.Text.RegularExpressions;

static string MatchQuotedString(string input)
{
    const string pattern = @"""(?<Result>(\\""|.)*)""";
    // ExplicitCapture: only the named group is captured
    const RegexOptions options = RegexOptions.ExplicitCapture;
    Regex regex = new Regex(pattern, options);
    Match match = regex.Match(input);
    // The whole match includes the quotes; the named group does not
    return match.Groups["Result"].Value;
}
Optimization: if you plan to use the regex a lot, you could factor it out into a static field and use the RegexOptions.Compiled option. This compiles the expression to IL, giving faster matching at the expense of longer initialization.
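A sketch combining the optimization with the greedy/lazy point: the Regex objects are built once and reused. In a class they would typically be static readonly fields; here they are locals to keep the example self-contained. The lazy variant (*?) is an assumption on our part, not part of the answer's pattern.

```csharp
using System;
using System.Text.RegularExpressions;

// Built once, reused for every call; Compiled trades startup time for speed.
var greedy = new Regex(@"""(?<Result>(\\""|.)*)""",
    RegexOptions.ExplicitCapture | RegexOptions.Compiled);
// Assumed variant: the same pattern with a lazy *? quantifier.
var lazy = new Regex(@"""(?<Result>(\\""|.)*?)""",
    RegexOptions.ExplicitCapture | RegexOptions.Compiled);

// Two quoted strings on one line: x = "a \"b\" c", y = "d"
string input = "x = \"a \\\"b\\\" c\", y = \"d\"";

string g = greedy.Match(input).Groups["Result"].Value;
string l = lazy.Match(input).Groups["Result"].Value;

Console.WriteLine(g); // a \"b\" c", y = "d
Console.WriteLine(l); // a \"b\" c
```

The greedy form runs on to the last quote on the line, while the lazy form stops at the first unescaped closing quote.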

Regex pattern with dot and quotation mark

How can I declare a regex pattern in C#? The pattern starts with a quotation mark followed by a URL address, something like this:
\"www\.mypage\.pl //<- this is my pattern
string pattern = ? //todo: what should I put there
Using a verbatim string:
string pattern = @"""www\.mypage\.pl";
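To see that the verbatim literal and the ordinary escaped literal produce the same pattern text, here is a short sketch; the HTML snippet is a made-up input.

```csharp
using System;
using System.Text.RegularExpressions;

// Verbatim string: "" is one quote character and \ goes straight to the regex.
string verbatim = @"""www\.mypage\.pl";
// Regular string: \" is one quote character and \\ is one backslash.
string regular = "\"www\\.mypage\\.pl";

Console.WriteLine(verbatim);            // "www\.mypage\.pl
Console.WriteLine(verbatim == regular); // True

string html = "<a href=\"www.mypage.pl/index\">"; // hypothetical input
Console.WriteLine(Regex.IsMatch(html, verbatim)); // True
```

Because \. matches only a literal dot, a string such as "wwwXmypage.pl does not match.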
You can use a backslash (\) to remove a character's special regex meaning and match it literally. If you want to match
"www.mypage.pl
you can use
\"[0-9a-zA-Z]*\.[0-9a-zA-Z]*\.[0-9a-zA-Z]*
(The first backslash, before the quote, is not there for the regex; it escapes the quote inside the enclosing string literal.)
In languages that let you delimit strings with single quotes, you don't need to escape the double quote at all. C# has no such string syntax (a single-quoted literal is a char), so use \" or a verbatim string with "".
Note that [0-9a-zA-Z]* would need additional characters such as %, - and _ to cover all cases allowed by the URI syntax RFC (RFC 3986).
If you want to match
\"www\.mypage\.pl
you have to escape the escapes as well:
\\\"[0-9a-zA-Z]*\\\.[0-9a-zA-Z]*\\\.[0-9a-zA-Z]*
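A sketch of the "escape the escapes" case, where the text itself contains backslashes. The pattern above is written here as a C# verbatim string, so the quote is doubled ("") rather than escaped with \.

```csharp
using System;
using System.Text.RegularExpressions;

// The text literally contains backslashes: \"www\.mypage\.pl
string text = @"\""www\.mypage\.pl";

// \\ matches one backslash, "" is the quote, \\\. matches a backslash then a dot.
string pattern = @"\\""[0-9a-zA-Z]*\\\.[0-9a-zA-Z]*\\\.[0-9a-zA-Z]*";

Console.WriteLine(Regex.IsMatch(text, pattern));              // True
Console.WriteLine(Regex.IsMatch("\"www.mypage.pl", pattern)); // False: no backslashes
```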
Other answers have already handled (very nicely) how to put the phrase itself into a string.
To use it, you need to refer to the System.Text.RegularExpressions namespace, documentation for which is over at MSDN. As a (very) quick example:
System.Text.RegularExpressions.Regex.Replace(content,
regexPattern, newMatchContent);
will replace matches for the regular expression regexPattern in content with newMatchContent, and return the result.
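A slightly fuller sketch of that call; the values of content, regexPattern and newMatchContent are hypothetical and only illustrate the argument order.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical inputs for illustration.
string content = "<a href=\"www.mypage.pl\">Home</a>";
string regexPattern = @"""www\.mypage\.pl";
string newMatchContent = "\"www.example.com";

// Returns a new string; content itself is unchanged because strings are immutable.
string updated = Regex.Replace(content, regexPattern, newMatchContent);

Console.WriteLine(updated); // <a href="www.example.com">Home</a>
```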

regex syntax stop search

How do I make Regex stop the search after "Target This"?
HeaderText="Target This" AnotherAttribute="Getting Picked Up"
This is what I've tried:
var match = Regex.Match(string1, @"(?<=HeaderText="").*(?="")");
The quantifier * is greedy, which means it will consume as many characters as it can while still getting a match. You want the lazy quantifier, *?.
As an aside, rather than using look-around expressions as you have done here, you may find it in general easier to use capturing groups:
var match = Regex.Match(string1, "HeaderText=\"(.*?)\"");
(The parentheses around .*? make a capturing group.)
Now match.Value is the whole HeaderText="Target This", but match.Groups[1].Value is just the value inside the quotes.
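A sketch comparing the three approaches discussed here on the question's input: the original greedy lookaround, the lazy lookaround, and the capturing group.

```csharp
using System;
using System.Text.RegularExpressions;

string string1 = "HeaderText=\"Target This\" AnotherAttribute=\"Getting Picked Up\"";

// Greedy .* runs to the end of the input, then backs off only to the last quote.
string greedy = Regex.Match(string1, @"(?<=HeaderText="").*(?="")").Value;
// Lazy .*? stops at the first closing quote.
string lazy = Regex.Match(string1, @"(?<=HeaderText="").*?(?="")").Value;
// Capturing group: the match includes HeaderText="...", Groups[1] is just the value.
Match m = Regex.Match(string1, "HeaderText=\"(.*?)\"");

Console.WriteLine(greedy);            // Target This" AnotherAttribute="Getting Picked Up
Console.WriteLine(lazy);              // Target This
Console.WriteLine(m.Value);           // HeaderText="Target This"
Console.WriteLine(m.Groups[1].Value); // Target This
```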
Plain regex pattern
(?<=HeaderText=").*?(?=")
or as a regular C# string
string pattern = "(?<=HeaderText=\").*?(?=\")";
or using a verbatim string
string pattern = @"(?<=HeaderText="").*?(?="")";
The trick is the question mark after .*. It means "as few as possible", so the match stops at the first closing quote it encounters.
Note that verbatim strings (introduced with @) do not treat the backslash \ as an escape character. Escape double quotes by doubling them instead.
Note for others interested in regex: the pattern matches the text between a prefix and a suffix, while the lookarounds keep the prefix and suffix themselves out of the match:
(?<=prefix)find(?=suffix)
Try this:
var match = Regex.Match(string1, "HeaderText=\"([^\"]+)");
var val = match.Groups[1].Value; //Target This
UPDATE
If the target value may itself contain double quotes, change the regex to:
HeaderText=\"(.+?)\"\\s+\\w
Note: regex is not the right way to do this. If the input is XML, check out System.Xml; otherwise, use HtmlAgilityPack (see "How to use HTML Agility pack").
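As an illustration of the XML route, here is a minimal sketch with LINQ to XML (System.Xml.Linq). It assumes the attributes belong to a well-formed element; the element name field is made up.

```csharp
using System;
using System.Xml.Linq;

// Assumption: the attributes sit on a well-formed element; "field" is a hypothetical name.
XElement el = XElement.Parse(
    "<field HeaderText=\"Target This\" AnotherAttribute=\"Getting Picked Up\" />");
string header = (string)el.Attribute("HeaderText");
Console.WriteLine(header); // Target This

// In XML a quote inside an attribute value is written as &quot;;
// the parser decodes it, so embedded quotes need no special handling.
XElement quoted = XElement.Parse("<field HeaderText=\"Say &quot;hi&quot;\" />");
string quotedValue = (string)quoted.Attribute("HeaderText");
Console.WriteLine(quotedValue); // Say "hi"
```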

C# regex not matching string

I have a string formatted like this: $20,$40,$AA,$FF. Basically, these are hex numbers, and they can span many bytes. I want to check if a string is in the above format, so I tried something like this:
string a = "$20,$30,$40";
Regex reg = new Regex(@"$[0-9a-fA-F],");
if (a.StartsWith(string.Format("{0}{1}", reg, reg)))
MessageBox.Show("A");
It doesn't seem to work though, is there anything I'm missing?
$ is a special character in regular expressions and means end of string. That regex won't match anything at all, since you're specifying characters after the end of the string. Escape the $ character, like this:
@"\$[0-9a-fA-F]{2},"
However, as far as I know this will not work with your string, since it doesn't end with a ",". You might try:
@"^(\$[0-9a-fA-F]{2},?)+$"
You can even simplify the regex by using case-insensitive regex matching:
Regex reg = new Regex(#"^(\$[0-9A-F]{2},?)+$", RegexOptions.IgnoreCase);
EDIT: corrected to match exactly 2 hexadecimal digits.
EDIT: maybe you should write your regex check like this:
if (Regex.IsMatch(a, @"^(\$[0-9A-F]{2},?)+$", RegexOptions.IgnoreCase))
{
// Do whatever
}
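A quick sketch exercising that check; the helper name IsHexList is hypothetical. Note that because the comma is optional, the pattern also accepts items written without a separator, such as "$20$30".

```csharp
using System;
using System.Text.RegularExpressions;

const string pattern = @"^(\$[0-9A-F]{2},?)+$";

// Hypothetical helper wrapping the answer's check.
bool IsHexList(string s) => Regex.IsMatch(s, pattern, RegexOptions.IgnoreCase);

Console.WriteLine(IsHexList("$20,$30,$40")); // True
Console.WriteLine(IsHexList("$aa,$FF"));     // True (IgnoreCase)
Console.WriteLine(IsHexList("$20,$3G"));     // False: G is not hex
Console.WriteLine(IsHexList("$20$30"));      // True: the comma is optional
```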
I think you are missing a quantifier:
@"\$[0-9a-fA-F]+,"
For the problem with the comma at the end, I would simply append one at the end to keep the regex as simple as possible. But this is just the way I would do it.
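A sketch of the append-a-comma idea. The ^(...)+$ anchors and repetition are our addition, so the whole string must consist of $hex, items; the helper name is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Append a comma so every item, including the last, has the form $<hex>,
bool IsHexList(string s) => Regex.IsMatch(s + ",", @"^(\$[0-9a-fA-F]+,)+$");

Console.WriteLine(IsHexList("$20,$30,$40")); // True
Console.WriteLine(IsHexList("$1234,$FF"));   // True: + allows multi-byte values
Console.WriteLine(IsHexList("$20,$30,"));    // False: becomes "$20,$30,,"
Console.WriteLine(IsHexList("$20$30"));      // False: missing separator
```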
There are 3 things that need to be changed:
Need to escape your $ symbol as it represents end of line.
\$
Need to tweak your regex pattern to match the entire string instead of parts.
^(\$[0-9a-fA-F]{2},+)+\$[0-9a-fA-F]{2}$
Need to change your code to use Regex.IsMatch.
string a = "$20,$30,$40";
if (Regex.IsMatch(a, @"^(\$[0-9a-fA-F]{2},+)+\$[0-9a-fA-F]{2}$", RegexOptions.IgnoreCase))
MessageBox.Show("A");
PS:
If the input string has white space like a tab or a space in between, then this regex will need to be modified. In such cases, you have to use "\s" at the right positions. For example, if you have white space around the commas like
string a = "$20 ,$30, $40";
then you need to tweak your RegEx this way:
^(\$[0-9a-fA-F]{2}\s*,+\s*)+\$[0-9a-fA-F]{2}\s*$
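A short sketch checking both patterns from this answer against a few inputs, including the whitespace example.

```csharp
using System;
using System.Text.RegularExpressions;

const string strict = @"^(\$[0-9a-fA-F]{2},+)+\$[0-9a-fA-F]{2}$";
const string spaced = @"^(\$[0-9a-fA-F]{2}\s*,+\s*)+\$[0-9a-fA-F]{2}\s*$";

Console.WriteLine(Regex.IsMatch("$20,$30,$40", strict));   // True
Console.WriteLine(Regex.IsMatch("$20,$30,$40,", strict));  // False: trailing comma
Console.WriteLine(Regex.IsMatch("$20 ,$30, $40", strict)); // False: spaces
Console.WriteLine(Regex.IsMatch("$20 ,$30, $40", spaced)); // True
```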
Old answer below (Ignore):
Try this:
@"\$[0-9a-fA-F]{2}?[,]{0,1}"
You might also want to add a repeat modifier to your set so that it becomes:
@"\$[0-9a-fA-F]+,"

Simple regex pattern

I'm using C# and I'm trying to allow only alphabetical letters and spaces. My expression at the moment is:
string regex = "^[A-Za-z\s]{1,40}$";
My IDE says that \s is an "Unrecognized escape sequence".
What am I missing?
\ is a C# escape character as well as a regex escape character. Try:
string regex = @"^[A-Za-z\s]{1,40}$";
You need to put an @ in front of your string to turn it into a verbatim string literal:
string regex = @"^[A-Za-z\s]{1,40}$";
Right now, the \ in your regex is being interpreted as trying to escape the following s, which the compiler doesn't understand.
Alternatively, you can just escape the backslash with another one:
string regex = "^[A-Za-z\\s]{1,40}$";
but in general, prefer the first approach to the second.
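A quick sketch showing that both spellings produce identical pattern text, and what that pattern accepts. Note that it allows runs of several spaces.

```csharp
using System;
using System.Text.RegularExpressions;

string verbatim = @"^[A-Za-z\s]{1,40}$";
string escaped = "^[A-Za-z\\s]{1,40}$";

Console.WriteLine(verbatim == escaped);                      // True: same pattern text
Console.WriteLine(Regex.IsMatch("Hello World", verbatim));   // True
Console.WriteLine(Regex.IsMatch("Hello1", verbatim));        // False: digit
Console.WriteLine(Regex.IsMatch("Hello   World", verbatim)); // True: repeated spaces allowed
```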
An additional note: your regex doesn't do what you describe. You say a maximum of 1 space between words. To enforce that, you need to move the \s out of the character class. The pattern you're currently using allows "any letter or whitespace, 1 to 40 times", which allows multiple successive spaces. You'll need something more like the following:
string regex = @"^(?:[A-Za-z]+\s?)+$";
This means "any letter 1 or more times, followed by an optional space, this whole thing one or more times". I don't know how to limit the whole string to 40 characters when you don't know the size of the first expression in advance. Maybe this can be achieved with a "look behind" expression, but I'm not sure. You might have to do it in two steps.
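One way to get the length limit in a single pattern is a lookahead (rather than a lookbehind) at the start. This sketch also rewrites the core as [A-Za-z]+(?:\s[A-Za-z]+)*\s?, which we believe accepts the same strings as ^(?:[A-Za-z]+\s?)+$ but avoids the nested (...+)+ form, which can backtrack heavily on long inputs that fail to match. The helper name Valid is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// (?=.{1,40}$) checks the total length up front without consuming anything;
// the rest allows letters in words separated by single whitespace characters,
// plus an optional trailing one.
const string pattern = @"^(?=.{1,40}$)[A-Za-z]+(?:\s[A-Za-z]+)*\s?$";

bool Valid(string s) => Regex.IsMatch(s, pattern);

Console.WriteLine(Valid("Hello World"));       // True
Console.WriteLine(Valid("Hello  World"));      // False: two spaces in a row
Console.WriteLine(Valid(new string('a', 40))); // True
Console.WriteLine(Valid(new string('a', 41))); // False: too long
```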
