Not detecting the text between two characters? - c#

I need to recognize the number between the tags [DN]4[-DN] so I wrote this regex:
Regex regexCount = new Regex(@"\[DN]([^)]*)\[-DN]");
Match matchCount = regexCount.Match("[DN]4[-DN]");
However, when I try to convert the matched string to an Int32, I get this error:
Input string was not in a correct format.
This is how I tried converting:
int count = Convert.ToInt32(matchCount.Value);
When I debugged, I saw that the matched value is {[DN]2[-DN]} instead of 2. However, the same regex gave the correct result on regex101.
What am I doing wrong folks?

You're currently returning the entire match. You need to return the matched content from your capturing group. The Groups property gets the captured groups within the regular expression: Groups[0] is the whole match and Groups[1] is the first capturing group.
int Count = Convert.ToInt32(matchCount.Groups[1].Value);
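Also, the negated character class [^)]* looks wrong: it matches any character except ), so it also accepts non-digits and, being greedy, can run across several tags. I would use the regex token \d instead: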
@"\[DN](\d+)\[-DN]"
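A minimal, self-contained sketch of the fix: it reads Groups[1] instead of the whole match and uses \d+ so that only digits are captured. The helper name ReadDnCount is hypothetical, and returning null when no tag is found is an assumption about how a missing tag should be handled. Escaping the closing ] is optional in .NET outside a character class; it is done here only for readability.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the number between [DN] and [-DN], or null if there is no tag.
static int? ReadDnCount(string input)
{
    Match m = Regex.Match(input, @"\[DN\](\d+)\[-DN\]");
    // Groups[0] is the whole match "[DN]4[-DN]"; Groups[1] is only the digits "4".
    return m.Success ? int.Parse(m.Groups[1].Value) : (int?)null;
}

Console.WriteLine(ReadDnCount("[DN]4[-DN]"));            // 4
Console.WriteLine(ReadDnCount("no tag here") is null);   // True
```

Because \d+ only accepts digits, int.Parse no longer sees the brackets that caused "Input string was not in a correct format."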

Related

C# Regular Expression not matching

I have a regular expression
string dateformattwo = @"^(?:(?:31(\/|-|\.)(?:0?[13578]|1[02]|(?:Jan|Mar|May|Jul|Aug|Oct|Dec)))\1|(?:(?:29|30)(\/|-|\.)(?:0?[1,3-9]|1[0-2]|(?:Jan|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec))\2))(?:(?:1[6-9]|[2-9]\d)?\d{2})$|^(?:29(\/|-|\.)(?:0?2|(?:Feb))\3(?:(?:(?:1[6-9]|[2-9]\d)?(?:0[48]|[2468][048]|[13579][26])|(?:(?:16|[2468][048]|[3579][26])00))))$|^(?:0?[1-9]|1\d|2[0-8])(\/|-|\.)(?:(?:0?[1-9]|(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep))|(?:1[0-2]|(?:Oct|Nov|Dec)))\4(?:(?:1[6-9]|[2-9]\d)?\d{2})";
and two strings
string value = "30.Jul.2019 This is the line that I want to match";
string value2 = "30.jul.2019";
The regex appears to be correct; however, it does not match value, but it does match value2. Why is that happening?
I couldn't get your regex to match your strings, so it's hard to say exactly what's expected here, but I can take a guess as to why it's not working: nowhere in your regex do you look for a lowercase jul; it looks to me like you're only matching Jul. Matching is case-sensitive unless you pass RegexOptions.IgnoreCase.
Edit: the branches of your regex that handle days 29 to 31 end with $, which asserts that the position is at the end of the string (or just before a final newline). Your first string fails because there are characters after the date.
An updated regex, despite being written for the PHP (PCRE) flavor as pointed out in the comments, still matches your desired text.
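The long date pattern is hard to test by eye, so this sketch uses a deliberately simplified, hypothetical stand-in (day.Mon.year) to show the two effects discussed above: the trailing $ rejects text after the date, and month names are case-sensitive by default. It is not a replacement for the original pattern, and swapping $ for \b is only one possible way to allow trailing text.

```csharp
using System;
using System.Text.RegularExpressions;

// Simplified stand-in for the long date regex: day.Mon.year, anchored with ^ and $.
const string Anchored = @"^\d{1,2}\.(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\.\d{4}$";
// Same pattern, but ending in a word boundary so text may follow the date.
const string PrefixOnly = @"^\d{1,2}\.(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\.\d{4}\b";

string value = "30.Jul.2019 This is the line that I want to match";
string value2 = "30.jul.2019";

// $ requires the date to be the whole string, so the trailing text makes this fail.
Console.WriteLine(Regex.IsMatch(value, Anchored));      // False
Console.WriteLine(Regex.IsMatch(value, PrefixOnly));    // True
// "jul" does not match "Jul" unless matching is made case-insensitive.
Console.WriteLine(Regex.IsMatch(value2, Anchored));     // False
Console.WriteLine(Regex.IsMatch(value2, Anchored, RegexOptions.IgnoreCase)); // True
```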

Get Regex.Matches to start the match at Position 0

I am trying to use Regex to count the number of times a certain string appears in another comma-separated string.
I am using Regex.Matches(comma-separated string, certain string).Count to get the number. The only issue is that I want it to count as a match only if it lines up at the start of the string (that is, at the start of one of the comma-separated items).
For instance, if I have the comma separated string
string comma_separated = "dog,cat,bird,blackdog,dog(1)";
and want to see how many times the search string matches with the contents of the comma-separated string
string search = "dog";
I use:
int count = Regex.Matches(comma_separated, search).Count;
I would expect it to be 2, since it matches the dog at the start of
"dog,cat,bird,blackdog,dog(1)" and the dog in dog(1);
however, it returns 3, because it also matches the dog part of blackdog.
Is there any way I can get it to only count as a match when it recognizes a match starting at the start of the string? Or am I just using Regex incorrectly?
As noted in the comments, a regex may not be the most logical way for you to achieve your desired result. However, if you would like to use a regex to find your matches, something like this would give your desired result:
(?<=,|^)dog
This performs a "positive lookbehind" to ensure that the word "dog" is either preceded by a comma or at the start of the string you are searching.
More info available on lookarounds in Regex here: https://www.regular-expressions.info/lookaround.html
string comma_separated = "dog,cat,bird,blackdog,dog(1)";
int count = Regex.Matches(comma_separated, string.Format(@"\b{0}\b", Regex.Escape("dog")), RegexOptions.IgnoreCase).Count;
By adding \b (a word boundary) to both sides of the search text, you only find it as a whole word, i.e. the "EXACT" match within the text.
Try using this pattern: search = @"\bdog";. \b matches a word boundary.
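A sketch comparing the answers above on the question's sample data, plus a non-regex Split alternative in the spirit of the comments (that last variant is my own addition, not from the thread). Regex.Escape is applied so the search text is treated literally; the variable names are illustrative.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string commaSeparated = "dog,cat,bird,blackdog,dog(1)";
string search = "dog";
string escaped = Regex.Escape(search); // treat the search text literally

int plain = Regex.Matches(commaSeparated, escaped).Count;                        // also hits "blackdog"
int lookbehind = Regex.Matches(commaSeparated, @"(?<=,|^)" + escaped).Count;     // item must start with "dog"
int wordBoundary = Regex.Matches(commaSeparated, @"\b" + escaped + @"\b").Count; // "dog" as a whole word

// Non-regex alternative: split into items and test each item's prefix.
int split = commaSeparated.Split(',').Count(item => item.StartsWith(search, StringComparison.Ordinal));

Console.WriteLine($"{plain} {lookbehind} {wordBoundary} {split}"); // 3 2 2 2
```

The variants differ on inputs like doghouse: the lookbehind and Split versions count an item that merely starts with dog, while the \b version requires dog to be a whole word (and only behaves as expected when the search text starts and ends with word characters).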

Regex.Replace replaces more than bargained for

I'm writing some test cases for IIS Rewrite rules, but my tests are not matching the same way as IIS is, leading to some false negatives.
Can anyone tell me why the following two lines lead to the same result?
Regex.Replace("v1/bids/aedd3675-a0f2-4494-a2c0-32418cf2476a", ".*v[1-9]/bids/.*", "http://localhost:9900/$0")
Regex.Replace("v1/bids/aedd3675-a0f2-4494-a2c0-32418cf2476a", "v[1-9]/bids/", "http://localhost:9900/$0")
Both return:
http://localhost:9900/v1/bids/aedd3675-a0f2-4494-a2c0-32418cf2476a
But I would expect the last regex to return:
http://localhost:9900/v1/bids/
since the GUID is not part of the match.
On IIS, the pattern tester yields a different result. Is {R:0} not equivalent to $0?
What I am asking is:
Given the pattern v[1-9]/bids/, how can I reproduce IIS's way of doing regex replaces so that I get the result http://localhost:9900/v1/bids/, which appears to be what IIS rewrites to?
The point here is that the pattern you have matches the test strings at the start.
The first regex, .*v[1-9]/bids/.*, matches 0+ characters other than a newline (as many as possible) up to the last v followed by a digit (other than 0) and by /bids/, and then 0+ characters other than a newline. Since the string is matched from the beginning, the whole string is matched and placed into Group 0. In the replacement, you just prepend http://localhost:9900/ to that value.
The second regex replacement returns the same result because the regex matches v1/bids/, stores it in Group 0, and replaces it with http://localhost:9900/ + v1/bids/. The rest of the string is not part of the match, so it is copied to the result unchanged.
You need to match that "tail" in order to remove it.
To only get http://localhost:9900/v1/bids/, use a capturing group around v[1-9]/bids/ and use the $1 backreference in the replacement part:
(v[1-9]/bids/).*
Replace with http://localhost:9900/$1. Result: http://localhost:9900/v1/bids/
Update
IIS keeps the base URL and then adds the parts you match with the regex. So, in your case, you have http://localhost:9900/ as the base URL and then you match v1/bids/ with the regex. To simulate this behavior, just use Regex.Match:
var rx = Regex.Match("v1/bids/aedd3675-a0f2-4494-a2c0-32418cf2476a", "v[1-9]/bids/");
var res = rx.Success ? string.Format("http://localhost:9900/{0}", rx.Value) : string.Empty;
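A sketch that puts the three behaviors from this thread side by side on the question's input: the original Replace, the capturing-group Replace, and the Match-based approach from the update. The Input and BaseUrl constants are names introduced here for readability only.

```csharp
using System;
using System.Text.RegularExpressions;

const string Input = "v1/bids/aedd3675-a0f2-4494-a2c0-32418cf2476a";
const string BaseUrl = "http://localhost:9900/";

// Replace only rewrites the matched span; the unmatched GUID is copied through.
string partial = Regex.Replace(Input, "v[1-9]/bids/", BaseUrl + "$0");

// Matching the tail with .* and keeping only group 1 drops the GUID.
string viaGroup = Regex.Replace(Input, "(v[1-9]/bids/).*", BaseUrl + "$1");

// IIS-style: take only the matched text and append it to the base URL.
Match m = Regex.Match(Input, "v[1-9]/bids/");
string viaMatch = m.Success ? BaseUrl + m.Value : string.Empty;

Console.WriteLine(partial);  // http://localhost:9900/v1/bids/aedd3675-a0f2-4494-a2c0-32418cf2476a
Console.WriteLine(viaGroup); // http://localhost:9900/v1/bids/
Console.WriteLine(viaMatch); // http://localhost:9900/v1/bids/
```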

How do I exclude a regex value in a replace

I have a regex which searches for strings using a Prefix and a Suffix. In its simplest form, \$\$\w+\$\$ will find $$My_Name$$ (in this case the Prefix and Suffix are both equal to $$). Another example would be \[\#\w+\#\] to match [#My_Name#].
The Prefix and Suffix will always be a specific string of 0 to n characters which I can always safely escape for a direct character match.
I extract the matches in my C# code so I can work with them, but my match contains $$My_Name$$, whereas what I want is simply the inner string between the Prefix and Suffix: My_Name.
How do I exclude the Prefix and Suffix from the result?
Change your regex to \$\$(\w+)\$\$ and use capture group 1 to get the matching (inner) string: $1 in a replacement pattern, Groups[1] in code.
For example
string pattern = @"\$\$(\w+)\$\$";
string input = "$$My_Name$$";
Regex rgx = new Regex(pattern);
Match result = rgx.Match(input);
Console.WriteLine(result.Groups[1]);
Outputs: "My_Name"
P.S - There's no need to use explicitly typed local variables, but I just wanted the types to be clear.
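Since the question says the Prefix and Suffix can always be safely escaped, one way to generalize the answer above is to build the pattern from them with Regex.Escape. This is a sketch; ExtractInner is a hypothetical helper name, and it keeps the question's \w+ for the inner text, so it only captures word characters.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the text between every prefix/suffix pair in input.
static string[] ExtractInner(string input, string prefix, string suffix)
{
    // Regex.Escape turns "$$" into \$\$ and "[#" into \[\#, so the delimiters match literally.
    string pattern = Regex.Escape(prefix) + @"(\w+)" + Regex.Escape(suffix);
    return Regex.Matches(input, pattern)
                .Cast<Match>()
                .Select(m => m.Groups[1].Value) // group 1 holds only the inner text
                .ToArray();
}

Console.WriteLine(string.Join(", ", ExtractInner("Hi $$My_Name$$ and $$Other$$", "$$", "$$"))); // My_Name, Other
Console.WriteLine(string.Join(", ", ExtractInner("[#My_Name#]", "[#", "#]")));                    // My_Name
```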
You can put your \w+ into a group like this: (\w+). Then, when you retrieve the matched string, you can ask for that group.
I may be wrong (you didn't provide any code), but I think this is how it is done: call .Groups[1].Value on the result of Regex.Match.
How about the regex below? It captures the first character into a named group (first_char), then captures that character plus its repeats into a named group called first_group, which it then uses to match the suffix. It works with a delimiter made of any single character repeated two or more times, as long as the same run is repeated at the end of the word; it will not match mixed delimiters such as [#...#].
'(?<first_group>(?<first_char>.)\k<first_char>+)(?<word>\w+)\k<first_group>+'
You just need to then extract the capture group named word like so:
String sample = "$$My_Name$$";
Regex regex = new Regex(@"(?<first_group>(?<first_char>.)\k<first_char>+)(?<word>\w+)\k<first_group>+");
Match match = regex.Match(sample);
if (match.Success)
{
    Console.WriteLine(match.Groups["word"].Value);
}
You can use named groups like this:
(\$\$)(?<group1>.+?)\1 -- pattern 1 (first case)
\[(#)(?<group2>.+?)\1\] -- pattern 2 (second case)
or the combined representation would be:
(\$\$)(?<group1>.+?)\1|\[(#)(?<group2>.+?)\2\]
I would suggest using .+?: because it is lazy, it matches as few characters as possible and stops at the first occurrence of the suffix instead of running on to a later one.
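A C# check of the combined pattern. One detail worth knowing: .NET numbers unnamed groups first and named groups after them, so (\$\$) is group 1, (#) is group 2, group1 is 3 and group2 is 4. The second backreference therefore has to be \2 in .NET. The Inner helper is a hypothetical name used only for this sketch.

```csharp
using System;
using System.Text.RegularExpressions;

// Combined pattern; in .NET, (#) is group 2 because named groups are numbered last.
var rx = new Regex(@"(\$\$)(?<group1>.+?)\1|\[(#)(?<group2>.+?)\2\]");

// Exactly one of the named groups takes part in a successful match.
static string Inner(Match m) =>
    m.Groups["group1"].Success ? m.Groups["group1"].Value : m.Groups["group2"].Value;

foreach (Match m in rx.Matches("$$My_Name$$ and [#Other_Name#]"))
    Console.WriteLine(Inner(m));
// My_Name
// Other_Name
```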

Regular Expression with Groups and Values in C#

I am trying to write a simple regex to convert some two digit years to four digit years in a pipe delimited file. I am using:
Regex dateFormat = new Regex(@"\|(\d\d)/(\d\d)/([\d\d)\|");
string convertedString = dateFormat.Replace(contents, @"|$1$220$3|");
What I want is |10/31/09| to be replaced with |10312009|.
What I am getting is |10$22009|
I think the problem is .NET is evaluating $1 and $3 but is thinking there is a group in the middle with no value ($220 maybe?). How can I let .NET know that the 20 is a constant value instead of part of the group value?
Thanks in advance
Your intuition about the problem is correct: the second backreference is being interpreted as $220, not $2. To fix this, use curly braces:
dateFormat.Replace(contents, @"|$1${2}20$3|");
More info about .NET regular expressions is available here.
Your regex text doesn't parse. Was the "[" supposed to be there? Wrap the group numbers in {} to fix the replace issue:
Regex dateFormat = new Regex(@"\|(\d\d)/(\d\d)/(\d\d)\|");
string convertedString = dateFormat.Replace(contents, @"|${1}${2}20${3}|");
You can modify your regex to use named groups instead. The syntax for a named group is (?<name>...). Then, in your Replace call, you can use the group names instead of the group numbers.
Regex dateFormat = new Regex(@"\|(?<month>\d\d)/(?<day>\d\d)/(?<year>\d\d)\|");
string convertedString = dateFormat.Replace(contents, @"|${month}${day}20${year}|");
I don't know how to do that, but here is my workaround: use named groups.
Regex dateFormat = new Regex(@"\|(?<month>\d\d)/(?<date>\d\d)/(?<year>\d\d)\|");
string convertedString = dateFormat.Replace(contents, @"|${month}${date}20${year}|");
See more info at the bottom of this page.
Hope this helps.
Try this:
string contents = "|10/31/09|";
Regex dateFormat = new Regex(@"\|(?<mm>\d\d)/(?<dd>\d\d)/(?<yy>\d\d)\|");
Console.WriteLine(dateFormat.Replace(contents, "|${mm}${dd}20${yy}|"));
More information:
Call RegexObj.Replace("subject", "replacement") to perform a search-and-replace using the regex on the subject string, replacing all matches with the replacement string. In the replacement string, you can use $& to insert the entire regex match into the replacement text. You can use $1, $2, $3, etc... to insert the text matched between capturing parentheses into the replacement text. Use $$ to insert a single dollar sign into the replacement text. To replace with the first backreference immediately followed by the digit 9, use ${1}9. If you type $19, and there are less than 19 backreferences, the $19 will be interpreted as literal text, and appear in the result string as such. To insert the text from a named capturing group, use ${name}. Improper use of the $ sign may produce an undesirable result string, but will never cause an exception to be raised.
From http://www.regular-expressions.info/dotnet.html
I see problems with your regular expression, namely the unmatched [ character. The following works fine:
\|(?<month>\d{2})/(?<day>\d{2})/(?<year>\d{2})\|
That will group the month, day, and year results. You can then replace with the following string:
|$1/$2/20$3|
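A sketch reproducing the problem and the fixes discussed in this thread, using the corrected regex (without the stray [). It shows that $220 is taken as a reference to a nonexistent group 220 and emitted literally, as the quoted documentation describes, while ${2}20 and named groups give the intended |10312009|.

```csharp
using System;
using System.Text.RegularExpressions;

string contents = "|10/31/09|";
var dateFormat = new Regex(@"\|(\d\d)/(\d\d)/(\d\d)\|");

// "$220" refers to group 220; there is no such group, so it is copied as literal text.
string broken = dateFormat.Replace(contents, "|$1$220$3|");

// Braces end the group number explicitly: ${2} followed by the literal "20".
string fixedNumbered = dateFormat.Replace(contents, "|${1}${2}20${3}|");

// Named groups avoid the ambiguity entirely.
var named = new Regex(@"\|(?<mm>\d\d)/(?<dd>\d\d)/(?<yy>\d\d)\|");
string fixedNamed = named.Replace(contents, "|${mm}${dd}20${yy}|");

Console.WriteLine(broken);        // |10$22009|
Console.WriteLine(fixedNumbered); // |10312009|
Console.WriteLine(fixedNamed);    // |10312009|
```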
