I want to check if an input string follows a pattern and if it does extract information from it.
My pattern is like this Episode 000 (Season 00). The 00s are numbers that can range from 0-9. Now I want to check if this input Episode 094 (Season 02) matches this pattern and because it does it should then extract those two numbers, so I end up with two integer variables 94 & 2:
string latestFile = "Episode 094 (Season 02)";
if (!Regex.IsMatch(latestFile, #"^(Episode)\s[0-9][0-9][0-9]\s\((Season)\s[0-9][0-9]\)$"))
return
int Episode = Int32.Parse(Regex.Match(latestFile, #"\d+").Value);
int Season = Int32.Parse(Regex.Match(latestFile, #"\d+").Value);
The first part where I check if the overall string matches the pattern works, but I think it can be improved. For the second part, where I actually extract the numbers I'm stuck and what I posted above obviously doesn't works, because it grabs all digits from the string. So if anyone of you could help me figure out how to only extract the three number characters after Episode and the two characters after Season that would be great.
^Episode (\d{1,3}) \(Season (\d{1,2})\)$
Captures the 2 numbers (even with length 1 to 3/2) and gives them back as a group.
You can go even further and name your groups:
^Episode (?<episode>\d{1,3}) \(Season (?<season>\d{1,2})\)$
and then call them.
Example for using groups:
string pattern = #"abc(?<firstGroup>\d{1,3})abc";
string input = "abc234abc";
Regex rgx = new Regex(pattern);
Match match = rgx.Match(input);
string result = match.Groups["firstGroup"].Value; //=> 234
You can see what the expressions mean and test them here
In your regex ^(Episode)\s[0-9][0-9][0-9]\s\((Season)\s[0-9][0-9]\)$ you are capturing Episode and Season in a capturing group, but what you actually want to capture is the digits. You could switch your capturing groups like this:
^Episode\s([0-9][0-9][0-9])\s\(Season\s([0-9][0-9])\)$
Matching 3 digits in this way [0-9][0-9][0-9] can be written as \d{3} and [0-9][0-9] as \d{2}.
That would look like ^Episode\s(\d{3})\s\(Season\s(\d{2})\)$
To match one or more digits you could use \d+.
The \s is a matches a whitespace character. You could use \s or a whitespace.
Your regex could look like:
^Episode (\d{3}) \(Season (\d{2})\)$
string latestFile = "Episode 094 (Season 02)";
GroupCollection groups = Regex.Match(latestFile, #"^Episode (\d{3}) \(Season (\d{2})\)$").Groups;
int Episode = Int32.Parse(groups[1].Value);
int Season = Int32.Parse(groups[2].Value);
Console.WriteLine(Episode);
Console.WriteLine(Season);
That would result in:
94
2
Demo C#
Related
I want to extract the double value from the string that contains a specific keyword. For example:
Amount : USD 3,747,190.67
I need to extract the value "3,747,190.67" from the string above using the keyword Amount, for that I tested this pattern in different online Regex testers and it works:
(?<=\bAmount.*)(\d+\,*\.*)*
However it doesn't work on my C# code:
if (type == typeof(double))
{
double doubleVal = 0;
pattern = #"(?<=\bAmount.*)(\d+\,*\.*)*";
matchPattern = Regex.Match(textToParse, pattern);
if (matchPattern.Success)
{
double.TryParse(matchPattern.Value.ToString(), out doubleVal);
}
return doubleVal;
}
This one works:
(?<=\bAmount.*)\d+(,\d+)*(\.\d+)?
(?<=\bAmount.*) the look behind
\d+ leading digits (at least one digit)
(,\d+)* thousands groups (zero or more times)
(\.\d+)? decimals (? = optional)
Note that the regex tester says "9 matches found" for your pattern. For my pattern it says "1 match found".
The problem with your pattern is that its second part (\d+\,*\.*)* can be empty because of the * at the end. The * quantifier means zero, one or more repetitions. Therefore, the look-behind finds 8 empty entries between Amount and the number. The last of the 9 matches is the number. You can correct it by replacing the * with a +. See: regextester with *, regextester with +. You can also test it with "your" tester and switch to the table to see the detailed results.
My solution does not allow consecutive commas or points but allows numbers without thousands groups or a decimal part.
The lookbehind (?<=\bAmount.*) is always true in the example data after Amount.
The first 7 matches are empty, as (\d+\,*\.*)* can not consume a character where there is no digit, but it matches at the position as the quantifier * matches 0 or more times.
See this screenshot of the matches:
You might use
(?<=\bAmount\D*)\d{1,3}(?:,\d{3})*(?:\.\d{1,2})?\b
(?<=\bAmount\D*) Positive lookbehind, assert Amount to the left followed by optional non digits
\d{1,3} Match 1-3 digits
(?:,\d{3})* Optionally repeat , and 3 digits
(?:\.\d{1,2})? Optionally match . and 1 or 2 digits
\b A word boundary
See a .NET regex demo
For example
double doubleVal = 0;
var pattern = #"(?<=\bAmount\D*)\d{1,3}(?:,\d{3})*(?:\.\d{1,2})?\b";
var textToParse = "Amount : USD 3,747,190.67";
var matchPattern = Regex.Match(textToParse, pattern);
if (matchPattern.Success)
{
double.TryParse(matchPattern.Value.ToString(), out doubleVal);
}
Console.WriteLine(doubleVal);
Output
3747190.67
You can omit the word boundaries if needed if a partial match is also valid
(?<=Amount\D*)\d{1,3}(?:,\d{3})*(?:\.\d{1,2})?
I need to check if a string begins with 2 specific letters and then is followed by any 4 numbers.
the 2 letters are "BR" so BR1234 would be valid so would BR7412 for example.
what bit of code do I need to check that the string is a match with the Regex in C#?
the regex I have written is below, there is probably a more efficient way of writing this (I'm new to RegEx)
[B][R][0-9][0-9][0-9][0-9]
You can use this:
Regex regex = new Regex(#"^BR\d{4}");
^ defines the start of the string (so there should be no other characters before BR)
BR matches - well - BR
\d is a digit (0-9)
{4} says there must be exactly 4 of the previously mentioned group (\d)
You did not specify what is allowed to follow the four digits. If this should be the end of the string, add a $.
Usage in C#:
string matching = "BR1234";
string notMatching = "someOther";
Regex regex = new Regex(#"^BR\d{4}");
bool doesMatch = regex.IsMatch(matching); // true
doesMatch = regex.IsMatch(notMatching); // false;
BR\d{4}
Some text to make answer at least 30 characters long :)
Why doesn't this regex pattern parse the string "Season 02 Episode 01" properly?
For example, this is not a match:
var fileName = "Its Always Sunny in Philadelphia Season 02 Episode 01 - Charlie Gets Crippled.avi"
// Regex explanation:
// Starts with "S" and can contain more letters, can continue with space, then contains two numbers.
// Then starts with "E" again and can contain more letters, can continue with space, then contains two numbers.
var pattern = #"S\w?\s?(\d\d)\s?E\w?\s?(\d\d)";
var regex = new Regex(pattern, RegexOptions.IgnoreCase);
var match = regex.Match(fileName);
Use * instead of ?
? is for 0 or 1 time. * is for 0 or more times.
Starts with "S" and can contain more letters [...]
You mean +, not ?.
var pattern = #"S\w+\s+(\d+)\s+E\w+\s+(\d+)";
Note that this regex is pretty unspecific. Watch out for false positives. I'd recommend to make the expression more specific.
I would like to replace from a number of 16 digits, it's 5th to 10th digit.
How can that be achieved with a regular expression (C#)?
The way to do it is to capture in the inner and outer portions separately, like this:
// Split into 2 groups of 5 digits and 1 of 6
string regex = "(\\d{5})(\\d{5})(\\d{6})";
// Insert ABCDEF in the middle of
// match 1 and match 3
string replaceRegex = "${1}ABCDE${3}";
string testString = "1234567890999999";
string result = Regex.Replace(testString, regex, replaceRegex);
// result = '12345ABCDE999999'
Why use a regular expression? If by "number of 16 digits", you mean a 16 character long string representation of a number, then you'd probably be better off just using substring.
string input = "0000567890000000";
var output = input.Substring(0, 4) + "222222" + input.Substring(10, 6);
Or did you mean you want to swap the 5th and 10th digits? Your question isn't very clear.
Use the regular expression (?<=^\d{4})\d{6}(?=\d{6}$) to achieve it without capture groups.
It looks for 6 consecutive digits (5th to 10th inclusively) that are preceded by the first 4 digits and the last 6 digits of the string.
Regex.Replace("1234567890123456", #"(?<=^\d{4})\d{6}(?=\d{6}$)", "replacement");
Got it...
by creating 3 capturing groups:
([\d]{5})([\d]{5})([\d]{6})
keep capturing group1 and 3 and replace group2 with stars (or whatever)
$1*****$3
C# code below
string resultString = null;
try {
resultString = Regex.Replace(subjectString, #"([\d]{5})([\d]{5})([\d]{6})", "$1*****$2", RegexOptions.Singleline);
} catch (ArgumentException ex) {
// Syntax error in the regular expression
}
I have the following string:
01-21-27-0000-00-048 and it is easy to split it apart because each section is separated by a -, but sometimes this string is represented as 01-21-27-0000-00048, so splitting it is not as easy because the last 2 parts are combined. How can I handle this? Also, what about the case where it might be something like 01-21-27-0000-00.048
In case anyone is curious, this is a parcel number and it varies from county to county and a county can have 1 format or they can have 100 formats.
This is a very good case for using regular expressions. You string matches the following regexp:
(\d{2})-(\d{2})-(\d{2})-(\d{4})-(\d{2})[.-]?(\d{3})
Match the input against this expression, and harvest the six groups of digits from the match:
var str = new[] {
"01-21-27-0000-00048", "01-21-27-0000-00.048", "01-21-27-0000-00-048"
};
foreach (var s in str) {
var m = Regex.Match(s, #"(\d{2})-(\d{2})-(\d{2})-(\d{4})-(\d{2})[.-]?(\d{3})");
for (var i = 1 /* one, not zero */ ; i != m.Groups.Count ; i++) {
Console.Write("{0} ", m.Groups[i]);
}
Console.WriteLine();
}
If you would like to allow for other characters, say, letters in the segments that are separated by dashes, you could use \w instead of \d to denote a letter, a digit, or an underscore. If you would like to allow an unspecified number of such characters within a known range, say, two to four, you can use {2,4} in the regexp instead of the more specific {2}, which means "exactly two". For example,
(\w{2,3})-(\w{2})-(\w{2})-(\d{4})-(\d{2})[.-]?(\d{3})
lets the first segment contain two to three digits or letters, and also allow for letters in segments two and three.
Normalize the string first.
I.e. if you know that the last part is always three characters, then insert a - as the fourth-to-last character, then split the resultant string. Along the same line, convert the dot '.' to a dash '-' and split that string.
Replace all the char which are not digit with emptyString('').
then any of your string become in the format like
012127000000048
now you can use the divide it in (2, 2, 2, 4, 2, 3) parts.