Regular Expression with Groups and Values in C#

I am trying to write a simple regex to convert some two digit years to four digit years in a pipe delimited file. I am using:
Regex dateFormat = new Regex(@"\|(\d\d)/(\d\d)/([\d\d)\|");
string convertedString = dateFormat.Replace(contents, @"|$1$220$3|");
What I want is |10/31/09| to be replaced with |10312009|.
What I am getting is |10$22009|
I think the problem is .NET is evaluating $1 and $3 but is thinking there is a group in the middle with no value ($220 maybe?). How can I let .NET know that the 20 is a constant value instead of part of the group value?
Thanks in advance

Your intuition about the problem is correct: the second backreference is being interpreted as $220, not $2. To fix this, use curly braces:
dateFormat.Replace(contents, @"|$1${2}20$3|");
More info about .NET regular expressions is available here.
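As a rough sketch of the difference, assuming the pattern with the stray "[" removed (the class and method names below are made up for illustration):

```csharp
using System;
using System.Text.RegularExpressions;

public static class YearFixer
{
    // Pattern with the unmatched '[' removed; groups 1-3 are month, day, two-digit year.
    private static readonly Regex DateFormat = new Regex(@"\|(\d\d)/(\d\d)/(\d\d)\|");

    // "$220" is read as a reference to group 220, which does not exist,
    // so .NET leaves it in the output as literal text.
    public static string Ambiguous(string contents)
    {
        return DateFormat.Replace(contents, "|$1$220$3|");
    }

    // "${2}20" ends the group number at the closing brace, so "20" is a constant.
    public static string Braced(string contents)
    {
        return DateFormat.Replace(contents, "|$1${2}20$3|");
    }

    public static void Main()
    {
        Console.WriteLine(Ambiguous("|10/31/09|")); // |10$22009|
        Console.WriteLine(Braced("|10/31/09|"));    // |10312009|
    }
}
```

Only the second group needs braces here, because it is the only reference followed directly by a digit.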

Your regex text doesn't parse. Was the "[" supposed to be there? Wrap the number in {} to fix the replace issue:
Regex dateFormat = new Regex(@"\|(\d\d)/(\d\d)/(\d\d)\|");
string convertedString = dateFormat.Replace(contents, @"|${1}${2}20${3}|");

You can modify your Regex to use named groups instead. The syntax for a named group is (?<name>...). Then, in your Replace call, you can use the group names instead of the group numbers.
Regex dateFormat = new Regex(@"\|(?<month>\d\d)/(?<day>\d\d)/(?<year>\d\d)\|");
string convertedString = dateFormat.Replace(contents, @"|${month}${day}20${year}|");

I don't know how to do that, but here is my workaround: use named groups.
Regex dateFormat = new Regex(@"\|(?<month>\d\d)/(?<date>\d\d)/(?<year>\d\d)\|");
string convertedString = dateFormat.Replace(contents, @"|${month}${date}20${year}|");
See more info at the bottom of this page.
Hope this helps.

Try this:
string contents = "|10/31/09|";
Regex dateFormat = new Regex(@"\|(?<mm>\d\d)/(?<dd>\d\d)/(?<yy>\d\d)\|");
Console.WriteLine(dateFormat.Replace(contents, "|${mm}${dd}20${yy}|"));
More information:
Call RegexObj.Replace("subject", "replacement") to perform a search-and-replace using the regex on the subject string, replacing all matches with the replacement string. In the replacement string, you can use $& to insert the entire regex match into the replacement text. You can use $1, $2, $3, etc... to insert the text matched between capturing parentheses into the replacement text. Use $$ to insert a single dollar sign into the replacement text. To replace with the first backreference immediately followed by the digit 9, use ${1}9. If you type $19, and there are less than 19 backreferences, the $19 will be interpreted as literal text, and appear in the result string as such. To insert the text from a named capturing group, use ${name}. Improper use of the $ sign may produce an undesirable result string, but will never cause an exception to be raised.
From http://www.regular-expressions.info/dotnet.html

I see problems with your regular expression, namely the unmatched [ character. The following works fine:
\|(?<month>\d{2})/(?<day>\d{2})/(?<year>\d{2})\|
That will group the month, day, and year results. You can then replace with the following string:
|$1/$2/20$3|

Related

I want only matching string using regex

I have a string "myname 18-may 1234" and I want only "myname" from whole string using a regex.
I tried using the \b(^[a-zA-Z]*)\b regex and that gave me "myname" as a result.
But when the string changes to "1234 myname 18-may" the regex does not return "myname". Please suggest the correct way to select only "myname" whole word.
Is it also possible - given the string in "1234 myname 18-may" format - to get myname only, not may?
UPDATE
Judging by your feedback to your other question you might need
(?<!\p{L})\p{L}+(?!\p{L})
ORIGINAL ANSWER
I have come up with a lighter regex that relies on the specific nature of your data (just a couple of words in the string, only one is whole word):
\b(?<!-)\p{L}+\b
See demo
Or even a more restrictive regex that finds a match only between (white)spaces and string start/end:
(?<=^|\s)\p{L}+(?=\s|$)
The following regex is context-dependent:
\p{L}+(?=\s+\d{1,2}-\p{L}{3}\b)
See demo
This will match only the word myname.
The regex means:
\p{L}+ - Match 1 or more Unicode letters...
(?=\s+\d{1,2}-\p{L}{3}\b) - until it finds 1 or more whitespaces (\s+) followed with 1 or 2 digits, followed with a hyphen and 3 Unicode letters (\p{L}{3}) which is a whole word (\b). This construction is a positive look-ahead that only checks if something can be found after the current position in the string, but it does not "consume" text.
Since the date may come before the string, you can add an alternation:
\p{L}+(?=[ ]+\d{1,2}-\p{L}{3}\b)|(?<=\d{1,2}-\p{L}{3}[ ]+)\p{L}+
See another demo
The (?<=\d{1,2}-\p{L}{3}\s+) is a look-behind that checks for the same thing (almost) as the look-ahead, but before the myname.
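The alternation above can be tried out with a small sketch like the following. The class and method names are hypothetical, and the third input (date first) is an assumed variation that is not in the question:

```csharp
using System;
using System.Text.RegularExpressions;

public static class NameFinder
{
    // First alternative: letters followed by spaces and a d-mmm date (look-ahead).
    // Second alternative: letters preceded by a d-mmm date and spaces (look-behind).
    private static readonly Regex NameNextToDate = new Regex(
        @"\p{L}+(?=[ ]+\d{1,2}-\p{L}{3}\b)|(?<=\d{1,2}-\p{L}{3}[ ]+)\p{L}+");

    // Returns the first word that sits next to the date, or null if there is none.
    public static string FindName(string input)
    {
        Match m = NameNextToDate.Match(input);
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(FindName("myname 18-may 1234")); // myname
        Console.WriteLine(FindName("1234 myname 18-may")); // myname
        Console.WriteLine(FindName("18-may myname 1234")); // myname (via the look-behind)
    }
}
```

Because neither lookaround consumes text, m.Value contains only the name and never the date.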
Here is a solution without regex:
string input = "myname 18-may 1234";
string result = input.Split(' ').Where(x => x.All(y => char.IsLetter(y))).FirstOrDefault();
Do a replace using this regex:
(\s*\d+\-.{3}\s*|\s*.{3}\-\d+\s*)|(\s*\d+\s*)
you will end up with just your name.
Demo

How do I exclude a regex value in a replace

I have a regex which searches for strings using a Prefix and a Suffix. In its simplest form, \$\$\w+\$\$ will find $$My_Name$$ (in this case the Prefix and Suffix are both equal to $$). Another example would be \[\#\w+\#\] to match [#My_Name#].
The Prefix and Suffix will always be a specific string of 0 to n characters which I can always safely escape for a direct character match.
I extract the Matches in my C# code so I can work with them but obviously my match contains $$My_Name$$ but what I want is to simply get the inner string between the Suffix and Prefix: My_Name.
How do I exclude the Prefix and Suffix from the result?
Change your regex to \$\$(\w+)\$\$ and use $1 (group 1) to get the matching inner string.
For example:
string pattern = @"\$\$(\w+)\$\$";
string input = "$$My_Name$$";
Regex rgx = new Regex(pattern);
Match result = rgx.Match(input);
Console.WriteLine(result.Groups[1]);
Outputs: My_Name
P.S - There's no need to use explicitly typed local variables, but I just wanted the types to be clear.
You can wrap your \w+ in a capturing group, like this: (\w+). Then, when you retrieve the match, you can ask for that subgroup.
I may be wrong (you didn't provide any code), but I think this is how it is done: .Groups[1].Value on the result of Regex.Match.
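A minimal sketch of that idea for an arbitrary prefix and suffix, assuming both are plain strings that should be matched literally. The TokenExtractor class and Inner method are hypothetical names; Regex.Escape is the standard .NET call for escaping literal text:

```csharp
using System;
using System.Text.RegularExpressions;

public static class TokenExtractor
{
    // Builds prefix(\w+)suffix with both delimiters escaped, then returns
    // only the text captured by group 1 (or null when nothing matches).
    public static string Inner(string input, string prefix, string suffix)
    {
        string pattern = Regex.Escape(prefix) + @"(\w+)" + Regex.Escape(suffix);
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(Inner("Hello $$My_Name$$!", "$$", "$$")); // My_Name
        Console.WriteLine(Inner("[#My_Name#]", "[#", "#]"));        // My_Name
    }
}
```

Groups[0] is always the whole match (delimiters included), so the inner text is in Groups[1].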
How about the regex below? It works by capturing the first character into a named group, then capturing it together with any repeats into a named group called first_group, which it then uses to match the end of the string. It will work with any number of repeated characters, so long as they are repeated at the end of the word.
'(?<first_group>(?<first_char>.)\k<first_char>+)(?<word>\w+)\k<first_group>+'
You just need to then extract the capture group named word like so:
String sample = "$$My_Name$$";
Regex regex = new Regex(@"(?<first_group>(?<first_char>.)\k<first_char>+)(?<word>\w+)\k<first_group>+");
Match match = regex.Match(sample);
if (match.Success)
{
    Console.WriteLine(match.Groups["word"].Value);
}
You can use a named group like this:
(\$\$)(?<group1>.+?)\1 -- pattern 1 (first case)
\[(#)(?<group2>.+?)\1\] -- pattern 2 (second case)
The combined representation would be:
(\$\$)(?<group1>.+?)\1|\[(#)(?<group2>.+?)\2\]
Note that .NET numbers unnamed groups before named ones, so (#) is group 2 here; an engine that numbers all groups from left to right would need \3 instead.
I would suggest using the lazy .+? so that the match stops at the first occurrence of the suffix.
Live Demo

Regex Substring or Left Equivalent

Greetings beloved comrades.
I cannot figure out how to accomplish the following via a regex.
I need to take this format number 201101234 and transform it to 11-0123401, where digits 3 and 4 become the digits to the left of the dash, and the remaining five digits are inserted to the right of the dash, followed by a hardcoded 01.
I've tried http://gskinner.com/RegExr, but the syntax just defeats me.
This answer, Equivalent of Substring as a RegularExpression, sounds promising, but I can't get it to parse correctly.
I can create a SQL function to accomplish this, but I'd rather not hammer my server in order to reformat some strings.
Thanks in advance.
You can try this:
var input = "201101234";
var output = Regex.Replace(input, @"^\d{2}(\d{2})(\d{5})$", "${1}-${2}01");
Console.WriteLine(output); // 11-0123401
This will match:
two digits, followed by
two digits captured as group 1, followed by
five digits captured as group 2
And return a string which replaces that matched text with
group 1, followed by
a literal hyphen, followed by
group 2, followed by
a literal 01.
The start and end anchors ( ^ / $ ) ensure that if the input string does not exactly match this pattern, it will simply return the original string.
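To illustrate the effect of the anchors, here is a small sketch that wraps the same Regex.Replace call in a method (the CaseNumber class and Reformat method are hypothetical names) and feeds it inputs that do not fit the nine-digit format:

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaseNumber
{
    // Skips the first two digits, keeps the next two as group 1 and the
    // remaining five as group 2, then appends the hardcoded "01".
    public static string Reformat(string input)
    {
        return Regex.Replace(input, @"^\d{2}(\d{2})(\d{5})$", "${1}-${2}01");
    }

    public static void Main()
    {
        Console.WriteLine(Reformat("201101234"));  // 11-0123401
        Console.WriteLine(Reformat("20110123"));   // 20110123 (too short, returned unchanged)
        Console.WriteLine(Reformat("2011-01234")); // 2011-01234 (not all digits, returned unchanged)
    }
}
```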
If you can use custom C# scripts, you may want to use Substring instead:
string newStr = string.Format("{0}-{1}01", old.Substring(2,2), old.Substring(4));
I don't think you really need a regex here. Substring would be better. But still if you want regex only, you can use this:
string newString = Regex.Replace(input, @"^\d{2}(\d{2})(\d+)$", "$1-${2}01");
Explanation:
^\d{2} // Match first 2 digits. Will be ignored
(\d{2}) // Match next 2 digits. Capture it in group 1
(\d+)$ // Match rest of the digits. Capture it in group 2
Now, the required digits, are in group 1 and 2, which you use in the replacement string.
Do you even SQL? Pull some levers and stuff.

Regular expression for numbers in string

The input string "134.45sdfsf" passed to the following statement
System.Text.RegularExpressions.Regex.Match(input, pattern).Success;
returns true for following patterns.
pattern = "[0-9]+"
pattern = "\\d+"
Q1) I am like, what the hell! I am specifying only digits, not special characters or letters. So what is wrong with my pattern, if I want the statement above to return false for this input?
Q2) Once I get the right pattern to match just the digits, how do I extract all the numbers in a string?
Let's say for now I just want to get the integers in a string in the format "int.int^int" (for example, "11111.222^3333"). In this case, I want to extract the strings "11111", "222" and "3333".
Any idea?
Thanks
You are specifying that it contains at least one digit anywhere, not that all characters are digits. You are looking for the expression ^\d+$. The ^ and $ denote the start and end of the string, respectively. You can read up more on that here.
Use Regex.Split to split by any non-digit strings. For example:
string input = "123&$456";
var isAllDigit = Regex.IsMatch(input, @"^\d+$");
var numbers = Regex.Split(input, @"[^\d]+");
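As a hedged sketch covering both questions: the unanchored pattern only checks that some digits occur, the anchored one checks the whole string, and Regex.Matches (used here as an alternative to the Regex.Split call above, not as the answer's method) collects every run of digits. The DigitHelper class and its method names are made up for this example:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class DigitHelper
{
    // True if the input contains at least one digit anywhere.
    public static bool ContainsDigits(string input)
    {
        return Regex.IsMatch(input, @"\d+");
    }

    // True only if the whole input consists of digits.
    public static bool IsAllDigits(string input)
    {
        return Regex.IsMatch(input, @"^\d+$");
    }

    // Collects every run of consecutive digits, in order of appearance.
    public static List<string> ExtractNumbers(string input)
    {
        var numbers = new List<string>();
        foreach (Match m in Regex.Matches(input, @"\d+"))
        {
            numbers.Add(m.Value);
        }
        return numbers;
    }

    public static void Main()
    {
        Console.WriteLine(ContainsDigits("134.45sdfsf")); // True
        Console.WriteLine(IsAllDigits("134.45sdfsf"));    // False
        Console.WriteLine(string.Join(",", ExtractNumbers("11111.222^3333").ToArray())); // 11111,222,3333
    }
}
```

Unlike Regex.Split, Regex.Matches does not produce empty entries when the input starts or ends with non-digit characters.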
The match succeeds because the pattern only has to be found somewhere in the string.
If you want the whole string to be checked, use:
^[0-9]+$
Q1) Both patterns are correct.
Q2) Assuming you are looking for a number pattern "5 digits-dot-3 digits-^-4 digits", here is what you're looking for:
var regex = new Regex(@"(?<first>[0-9]{5})\.(?<second>[0-9]{3})\^(?<third>[0-9]{4})");
var match = regex.Match("11111.222^3333");
Debug.Print(match.Groups["first"].ToString());
Debug.Print(match.Groups["second"].ToString());
Debug.Print(match.Groups["third"].ToString());
I prefer named capture groups: they give a clearer way to access the captured values.

C# regex not matching string

I have a string which is formatted like this: $20,$40,$AA,$FF. Basically, hex numbers and they can be of many bytes. I want to check if a string is in the above format, so I tried something like this:
string a = "$20,$30,$40";
Regex reg = new Regex(@"$[0-9a-fA-F],");
if (a.StartsWith(string.Format("{0}{1}", reg, reg)))
    MessageBox.Show("A");
It doesn't seem to work though, is there anything I'm missing?
$ is a special character in regular expressions and means end of string. That regex won't match anything at all, since you're specifying characters after the end of the string. Escape the $ character like this:
"\$[0-9a-fA-F]{2},"
Anyway, AFAIK this will not work with your string, since it doesn't end with a ",". You might try:
"^(\$[0-9a-fA-F]{2},?)+$"
You can even simplify the regex by using case-insensitive regex matching:
Regex reg = new Regex(@"^(\$[0-9A-F]{2},?)+$", RegexOptions.IgnoreCase);
EDIT: corrected to match exactly 2 hexadecimal digits.
EDIT: maybe you should write your regex checking like:
if (Regex.IsMatch(a, @"^(\$[0-9A-F]{2},?)+$", RegexOptions.IgnoreCase))
{
    // Do whatever
}
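A small sketch of that check as a console program, so it can be run without a form. The HexListValidator class and IsHexList method are hypothetical names, and the sample inputs other than "$20,$30,$40" are assumptions chosen to show the edge cases:

```csharp
using System;
using System.Text.RegularExpressions;

public static class HexListValidator
{
    // One or more "$" + two hex digits, each optionally followed by a comma,
    // anchored so that the whole string has to fit the pattern.
    private static readonly Regex HexList =
        new Regex(@"^(\$[0-9A-F]{2},?)+$", RegexOptions.IgnoreCase);

    public static bool IsHexList(string input)
    {
        return HexList.IsMatch(input);
    }

    public static void Main()
    {
        Console.WriteLine(IsHexList("$20,$30,$40")); // True
        Console.WriteLine(IsHexList("$aa,$FF"));     // True (case-insensitive)
        Console.WriteLine(IsHexList("$20,$4"));      // False (second value has one digit)
        Console.WriteLine(IsHexList("20,30"));       // False (missing $)
        Console.WriteLine(IsHexList("$20$30"));      // True (the comma is optional)
    }
}
```

Because the comma is optional in this pattern, it also accepts a trailing comma and adjacent values such as $20$30; a stricter pattern is needed if those should be rejected.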
I think you are missing a quantifier:
"\$[0-9a-fA-F]+,"
For the problem with the comma at the end, I would simply append one at the end to keep the regex as simple as possible. But this is just the way I would do it.
There are 3 things that need to be changed:
Need to escape your $ symbol as it represents end of line.
\$
Need to tweak your regex pattern to match the entire string instead of parts.
^(\$[0-9a-fA-F]{2},+)+\$[0-9a-fA-F]{2}$
Need to change your code to use Regex.IsMatch.
string a = "$20,$30,$40";
if (Regex.IsMatch(a, @"^(\$[0-9a-fA-F]{2},+)+\$[0-9a-fA-F]{2}$", RegexOptions.IgnoreCase))
    MessageBox.Show("A");
PS:
If the input string has white space like a tab or a space in between, then this regex will need to be modified. In such cases, you have to use "\s" at the right positions. For example, if you have white space around the commas like
string a = "$20 ,$30, $40";
then you need to tweak your RegEx this way:
^(\$[0-9a-fA-F]{2}\s*,+\s*)+\$[0-9a-fA-F]{2}\s*$
Old answer below (Ignore):
Try this:
"\$[0-9a-fA-F]{2}?[,]{0,1}"
You might also want to add a repeat modifier to your set, so that it becomes:
"\$[0-9a-fA-F]+,"
