.NET Regex - Specific String containing a changing number - c#

I am new to working with regexes in C# .NET. Say I have a string as follows:
"Working on log #4"
Within this string, we can expect the number (4) to vary. How can I use a regex to extract only that number from the string?
I want to make sure that the string matches the first part:
"Working on log #"
and then extract the integer from it.
Also, I know that I could do this using string.Split(), .Substring, etc. I just want to know how I might use a regex to do this.
Thanks!

"Working on log #(\d+)"
The () create a match group, so you will be able to extract that section.
The \d matches any digit.
The + says "look at the previous token, match it one or more times" so it will make it match one or more digits.
So overall you're capturing a group containing one or more digits, where that group comes after "Working on log #"
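A minimal sketch of how the capture group could be read back in C#. The class and method names (LogNumberDemo, ExtractLogNumber) are made up for illustration and are not part of the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LogNumberDemo
{
    // Returns the number after "Working on log #", or null when the text
    // does not match or the digits do not fit into an int.
    public static int? ExtractLogNumber(string text)
    {
        Match m = Regex.Match(text, @"Working on log #(\d+)");
        if (!m.Success) return null;

        // Groups[0] is the whole match; Groups[1] is the first parenthesized group.
        return int.TryParse(m.Groups[1].Value, out int n) ? n : (int?)null;
    }

    public static void Main()
    {
        Console.WriteLine(ExtractLogNumber("Working on log #4"));
        Console.WriteLine(ExtractLogNumber("Working on log #123"));
    }
}
```

int.TryParse is used instead of int.Parse so that a very long run of digits returns null rather than throwing.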

Regex rgx = new Regex("Working on log #[0-9]"); is the pattern you want to use. The first part is a string literal; [0-9] says that the character can be any value from 0 through 9. To allow multiple digits, change it to [0-9]{x}, where x is the number of repetitions, or to [0-9]+, since a + after any element means that one or more of that element is allowed.
You could also just do string.StartsWith("Working on log #") then split on # and use int.TryParse() with the second value to confirm it is in fact a valid integer.
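The non-regex route could look roughly like this. It is only a sketch: the helper name TryGetLogNumber is hypothetical, and it assumes the text contains exactly one # character.

```csharp
using System;

public static class LogNumberWithoutRegex
{
    // Checks the fixed prefix, splits on '#', and parses the second part.
    public static bool TryGetLogNumber(string text, out int number)
    {
        number = 0;
        const string prefix = "Working on log #";
        if (!text.StartsWith(prefix, StringComparison.Ordinal)) return false;

        string[] parts = text.Split('#');
        // Exactly two parts means there was a single '#' in the text.
        return parts.Length == 2 && int.TryParse(parts[1], out number);
    }

    public static void Main()
    {
        if (TryGetLogNumber("Working on log #4", out int n))
            Console.WriteLine(n);
    }
}
```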

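Try this: (?<=^Working on log #)\d+$. This matches only the number, so there is no need for a capture group. Remove ^ and $ if this is within a larger string.
^ - start of string (it has to sit inside the lookbehind; a ^ placed in front of the lookbehind could never match, because nothing can precede the start of the string)
(?<=) - positive lookbehind - ensures what is between = and ) is found before
\d+ - at least one digit
$ - end of string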
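As a rough illustration of the lookbehind approach, here is the unanchored form of the pattern used on a larger string. The class and method names are assumptions made for this sketch.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookbehindDemo
{
    // Returns the digits that directly follow "Working on log #",
    // or an empty string when there is no such match.
    public static string NumberAfterPrefix(string text)
    {
        // The lookbehind checks the prefix without making it part of the match,
        // so Match.Value holds only the digits.
        Match m = Regex.Match(text, @"(?<=Working on log #)\d+");
        return m.Success ? m.Value : string.Empty;
    }

    public static void Main()
    {
        Console.WriteLine(NumberAfterPrefix("Status: Working on log #42 now"));
    }
}
```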

A capturing group is the solution:
"Working on log #(?<Number>[0-9]+)"
Then you can access the matched groups using the Match.Groups property.
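For example, the named group might be read like this (a sketch; NamedGroupDemo and ReadNumber are illustrative names only):

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // Returns the text captured by the group named "Number", or null on no match.
    public static string ReadNumber(string text)
    {
        Match m = Regex.Match(text, @"Working on log #(?<Number>[0-9]+)");
        // Match.Groups can be indexed by group name as well as by position.
        return m.Success ? m.Groups["Number"].Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(ReadNumber("Working on log #4"));
    }
}
```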

Related

RegEx for string starts with number and followed by + character

I want a regular expression for such inputs:
1+2
3
1+22+3
But the following inputs should not be allowed:
+1+2
1+
a+1+b+c
12+
The string must start with a number, which may be followed only by a + character, and every + character must be followed by a number.
I tried [^0-9][^+]? to remove the + sign at the start, but there is a problem: while deleting the + character, it also removes the number next to it, and this keeps repeating.
How can I do this?
Please try :
\d+(\+\d)*
Demo: https://regex101.com/r/hfqmYr/2
Where:
\d -> matches any digit
+ -> matches the preceding symbol one or more times
* -> matches the preceding symbol zero or more times
As mentioned in the comments, it looks like you can use:
^[0-9]+(?:\+[0-9]+)*$
This is to allow the mentioned sample data and discard those you don't want to allow. See an online demo. The pattern matches:
^ - Start line anchor.
[0-9]+ - 1+ Digits (ASCII).
(?:\+[0-9]+)* - 0+ Times a non-capture group to allow for a literal plus followed by 1+ digits (ASCII).
$ - End line anchor.
As far as I know, .NET requires you to spell out the ASCII digits explicitly to avoid matching digits from other scripts (unless the ECMAScript option is specified).
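The anchored pattern can be checked against the sample inputs from the question with a short sketch like the following. The class and method names are hypothetical, and the Arabic-Indic digit (U+0663) is an extra input added here only to illustrate the remark about ASCII digits.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PlusSeparatedNumbers
{
    // True when the whole input is digits separated by single '+' signs.
    public static bool IsValid(string input)
    {
        return Regex.IsMatch(input, @"^[0-9]+(?:\+[0-9]+)*$");
    }

    public static void Main()
    {
        string[] samples = { "1+2", "3", "1+22+3", "+1+2", "1+", "a+1+b+c", "12+" };
        foreach (string s in samples)
            Console.WriteLine($"{s}: {IsValid(s)}");

        // \d also accepts non-ASCII decimal digits in .NET; [0-9] does not.
        string arabicIndicThree = "\u0663";
        Console.WriteLine(Regex.IsMatch(arabicIndicThree, @"^\d+$"));
        Console.WriteLine(IsValid(arabicIndicThree));
    }
}
```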

C# Regular expression to match on a character not following pairs of the same character

Objective: Regex Matching
For this example I'm interested in matching a "|" pipe character.
I need to match it if it's alone: "aaa|aaa"
I need to match it (the last pipe) only if it's preceded by pairs of pipe: (2,4,6,8...any even number)
Another way: I want to ignore ALL pipe pairs "||" (right to left)
or I want to select bachelor bars only (the odd man out)
string twoMatches = "aaaaaaaaa|||||aaaaaa|||aaaaaa"; // match the last pipe of "|||||" and the last pipe of "|||"
string oneMatch = "aaaaaaaaa|||aaaaaaa||aaaaaaaa"; // match the last pipe of "|||" only
string noMatch1 = "||";
string noMatch2 = "||||";
I'm trying to select the last "|" only when preceded by an even sequence of "|" pairs or in a string when a single bar exists by itself.
Regardless of the number of "|"
You may use the following regex to select just the odd pipe out:
(?<=(?<!\|)(?:\|{2})*)\|(?!\|)
See regex demo.
The regex breakdown:
(?<=(?<!\|)(?:\|{2})*) - if a pipe is preceded with an even number of pipes ((?:\|{2})* - 0 or more sequences of exactly 2 pipes) from a position that has no preceding pipe ((?<!\|))
\| - match an odd pipe on the right
(?!\|) - if it is not followed by another pipe.
Please note that this regex uses a variable-width look-behind and is very resource-consuming. I'd rather use a capturing group mechanism here, but it all depends on the actual purpose of matching that odd pipe.
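To see which pipes the lookbehind pattern selects, a small sketch along these lines could be used. The class and method names are invented for the example; the sample strings are the ones from the question with the odd pipe written plainly.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class OddPipeDemo
{
    private static readonly Regex OddPipe =
        new Regex(@"(?<=(?<!\|)(?:\|{2})*)\|(?!\|)");

    // Returns the index of every pipe that closes an odd-length run of pipes.
    public static List<int> OddPipeIndexes(string text)
    {
        var indexes = new List<int>();
        foreach (Match m in OddPipe.Matches(text))
            indexes.Add(m.Index);
        return indexes;
    }

    public static void Main()
    {
        string[] samples =
        {
            "aaa|aaa",
            "aaaaaaaaa|||||aaaaaa|||aaaaaa",
            "||",
            "||||"
        };
        foreach (string s in samples)
            Console.WriteLine($"{s} -> [{string.Join(", ", OddPipeIndexes(s))}]");
    }
}
```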
Here is a modified version of the regex for removing the odd one out:
var s = "1|2||3|||4||||5|||||6||||||7|||||||";
var data = Regex.Replace(s, @"(?<!\|)(?<even_pipes>(?:\|{2})*)\|(?!\|)", "${even_pipes}");
Console.WriteLine(data);
See IDEONE demo. Here, the quantified part is moved from lookbehind to an even_pipes named capturing group, so that it could be restored with the backreference in the replaced string. Regexhero.net shows 129,046 iterations per second for the version with a capturing group and 69,206 with the original version with variable-width lookbehind.
Only use variable-width look-behind if it is absolutely necessary!
Oh, it's reopened! If you need better performance, also try this improved version, which uses a negative lookbehind.
\|(?!\|)(?<!(?:[^|]|^)(?:\|\|)*)
The idea here is to first match the last literal | at right side of a sequence or single | and execute a negated version of the lookbehind just after the match. This should perform considerably better.
\|(?!\|) matches literal | IF NOT followed by another pipe character (right most if sequence).
(?<!(?:[^|]|^)(?:\|\|)*) IF the position right after the matched | IS NOT preceded by (?:\|\|)*, any number of literal ||, back to a non-| character or the ^ start. In other words: if this position is not preceded by an even number of pipe characters.
Btw, there is no performance gain in using \|{2} over \|\|, though it might be more readable.
See demo at regexstorm
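A quick way to convince yourself that the negated-lookbehind version picks the same pipes as the original pattern is to compare match positions. This is only a sketch; the class and method names are made up, and it says nothing about relative speed.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class OddPipeCompare
{
    public const string LookbehindFirst = @"(?<=(?<!\|)(?:\|{2})*)\|(?!\|)";
    public const string MatchFirst = @"\|(?!\|)(?<!(?:[^|]|^)(?:\|\|)*)";

    // Collects the start index of every match of the given pattern.
    public static List<int> Indexes(string pattern, string text)
    {
        var result = new List<int>();
        foreach (Match m in Regex.Matches(text, pattern))
            result.Add(m.Index);
        return result;
    }

    public static void Main()
    {
        string s = "1|2||3|||4||||5|||||6||||||7|||||||";
        Console.WriteLine(string.Join(", ", Indexes(LookbehindFirst, s)));
        Console.WriteLine(string.Join(", ", Indexes(MatchFirst, s)));
    }
}
```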

How Do I format a telephone number using regex

I need to format my telephone numbers in a specific way. Unfortunately business rules prohibit me from doing this up front. (separate input boxes etc..)
The format needs to be +1-xxx-xxx-xxxx where the "+1" is constant. (We don't do business internationally)
Here is my regex pattern to test the input:
"\\D*([2-9]\\d{2})(\\D*)([2-9]\\d{2})(\\D*)(\\d{4})\\D*"
(which I stole from somewhere else)
Then I perform a regex.Replace() like so:
regex.Replace(telephoneNumber, "+1-$1-$3-$5"); // THIS IS WHERE IT BLOWS UP
If my telephone number already has the "+1" in the string, it prepends another so that I get +1-+1-xxx-xxx-xxxx
Can someone please help?
You can add (?:\+1\D*)? to catch an optional prefix before the number. As it's caught it will be replaced if it's there.
You don't need to use \D* before and after the number. As they are optional, they don't change anything.
You don't need to capture the parts that you won't use, that makes it easier to see what ends up in the replacement.
str = Regex.Replace(str, @"(?:\+1\D*)?([2-9]\d{2})\D*([2-9]\d{2})\D*(\d{4})", "+1-$1-$2-$3");
You might consider using something more specific than \D* for the separators though, for example [\- /]?. With a too non-specific pattern you risk catching something that's not a phone number, for example changing "I have 234 cats, 528 dogs and 4509 horses." into "I have +1-234-528-4509 horses.".
str = Regex.Replace(str, @"(?:\+1[\- /]?)?([2-9]\d{2})[\- /]?([2-9]\d{2})[\- /]?(\d{4})", "+1-$1-$2-$3");
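The difference between the loose and the stricter separator pattern can be tried out with a sketch like this one. The class name, method names, and the sample phone numbers are all made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PhoneFormatDemo
{
    // Loose version: any run of non-digits counts as a separator.
    public static string FormatLoose(string s) =>
        Regex.Replace(s, @"(?:\+1\D*)?([2-9]\d{2})\D*([2-9]\d{2})\D*(\d{4})", "+1-$1-$2-$3");

    // Stricter version: at most one of '-', ' ' or '/' between the parts.
    public static string FormatStrict(string s) =>
        Regex.Replace(s, @"(?:\+1[\- /]?)?([2-9]\d{2})[\- /]?([2-9]\d{2})[\- /]?(\d{4})", "+1-$1-$2-$3");

    public static void Main()
    {
        string[] samples =
        {
            "234-567-8901",
            "+1-234-567-8901",
            "+1 234 567 8901",
            "I have 234 cats, 528 dogs and 4509 horses."
        };
        foreach (string s in samples)
        {
            Console.WriteLine("loose:  " + FormatLoose(s));
            Console.WriteLine("strict: " + FormatStrict(s));
        }
    }
}
```

With the stricter pattern, the sentence about cats, dogs and horses is left unchanged, and an existing "+1" prefix is absorbed into the match rather than being duplicated.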
Try something like this to make things more readable:
Regex rxPhoneNumber = new Regex( @"
    ^                             # anchor the start-of-match to start-of-text
    \D*                           # - allow and ignore any leading non-digits
    1?                            # - we'll allow (and ignore) a leading 1 (as in 1-800-123-4567)
    \D*                           # - allow and ignore any non-digits following that
    (?<areaCode>[2-9]\d\d)        # - required 3-digit area code
    \D*                           # - allow and ignore any non-digits following the area code
    (?<exchangeCode>[2-9]\d\d)    # - required 3-digit exchange code (central office)
    \D*                           # - allow and ignore any non-digits following the C.O.
    (?<subscriberNumber>\d\d\d\d) # - required 4-digit subscriber number
    \D*                           # - allow and ignore any non-digits following the subscriber number
    $                             # - followed by the end-of-text.
    " ,
    RegexOptions.IgnorePatternWhitespace|RegexOptions.ExplicitCapture
    );
string input = "voice: 1 (234) 567/1234 (leave a message)" ;
bool isValid = rxPhoneNumber.IsMatch(input) ;
string tidied = rxPhoneNumber.Replace( input , "+1-${areaCode}-${exchangeCode}-${subscriberNumber}" ) ;
which will give tidied the desired value
+1-234-567-1234
You can use the following regex
\D*(\+1-)?([2-9]\d{2})\D*([2-9]\d{2})\D*(\d{4})\D*
And the replacement string:
$1$2-$3-$4
Here is a demo
This is a kind of an adjustment of the regex you had. If you need to match the whole numbers, I'd use
(\+1-)?\b([2-9]\d{2})\D*([2-9]\d{2})\D*(\d{4})\b
See demo 2
Also, if the hyphen in \+1- is optional, add a ?: \+1-?.
To make the regex safer, I'd replace \D* (0 or more non-digit symbols) with some character class containing known separators, e.g [ /-]* (matching /, spaces and -s).

What is this regex supposed to match - h*

I have a piece of code that is supposed to look through a list of strings and match them against a regular expression whose pattern is input by the user. Inputs such as
h*
q*
y*
seem to match anything and everything. My questions -
Is any of the above a valid regex pattern at all?
If yes, what exactly are they supposed to match?
I've gone through http://regexhero.net/reference/ but couldn't find anything that specifies such expression.
I've used http://regexhero.net/tester/ to check what my regex matches with q* as the Regular Expression and Whatever as the Target String. It gives me 9 matches!
h* means zero or more h characters
The same for the others
These patterns match any number of the specified character, including zero. Without any anchors, there are 9 places where there are zero q in whatever (between the characters and at the ends).
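The nine matches can be reproduced with a few lines of C#. This is a sketch with invented names; it simply counts the matches and prints where each one sits.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ZeroOrMoreDemo
{
    // Counts how many times the pattern matches, empty matches included.
    public static int CountMatches(string text, string pattern)
    {
        return Regex.Matches(text, pattern).Count;
    }

    public static void Main()
    {
        // "Whatever" has 8 characters, so there are 9 positions
        // (before each character and at the end) where zero q's match.
        foreach (Match m in Regex.Matches("Whatever", "q*"))
            Console.WriteLine($"index {m.Index}, length {m.Length}");

        Console.WriteLine(CountMatches("Whatever", "q*"));
    }
}
```

Every match here has length 0, which is why a pattern such as q* appears to match anything and everything.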
Out of your reference:
Ordinary characters — Characters other than . $ ^ { [ ( | ) * + ? \ match themselves.
* — Repeat 0 or more times matching as many times as possible.

Does regex + symbol apply to previous element only?

In order to match all strings beginning with 04 and only containing digits, will the following work?
Regex.IsMatch(str, "^04[0-9]+$")
Or is another set of brackets necessary?
Regex.IsMatch(str, "^04([0-9])+$")
In Regex:
[character_group]
Matches any single character in character_group.
\d
Matches any decimal digit.
+
Matches the previous element one or more times.
(subexpression)
Captures the matched subexpression and assigns it an ordinal number.
^
The match must start at the beginning of the string or line.
$
The match must occur at the end of the string or before \n at the end of the line or string.
so this code should also work:
Regex.IsMatch(str, @"^04\d+$")
and both of your patterns match correctly.
Your first regex is correct, but the second one isn't. It matches the same things as the first regex, but it does a lot of unnecessary work in the process. Check it out:
Regex.IsMatch("04123", @"^04([0-9])+$")
In this example, the 1 is captured in group #1, only to be overwritten by 2 and again by 3. It's almost never a good idea to add a quantifier to a capturing group. For a detailed explanation, read this.
But maybe it's precedence rules you're asking about. Quantifiers have higher precedence than concatenation, so there's no need to isolate the character class with parentheses (if that's what you're doing).
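The overwriting of the quantified group can be observed directly. The following is a sketch with hypothetical class and method names; it relies on .NET keeping every capture of a group in Group.Captures.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuantifiedGroupDemo
{
    public static bool Plain(string s) => Regex.IsMatch(s, @"^04[0-9]+$");
    public static bool Grouped(string s) => Regex.IsMatch(s, @"^04([0-9])+$");

    // Returns what group 1 holds once the grouped pattern has matched.
    public static string LastCapture(string s)
    {
        return Regex.Match(s, @"^04([0-9])+$").Groups[1].Value;
    }

    public static void Main()
    {
        Match m = Regex.Match("04123", @"^04([0-9])+$");
        Console.WriteLine(m.Groups[1].Value);          // 3 (the last digit captured)
        Console.WriteLine(m.Groups[1].Captures.Count); // 3 (one capture per repetition)

        foreach (string s in new[] { "04123", "04", "0412a", "14123" })
            Console.WriteLine($"{s}: {Plain(s)} / {Grouped(s)}");
    }
}
```

Both patterns accept and reject the same inputs; the grouped one just records a capture for every digit along the way.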
