I am trying to match the numbers that are not in the context of Value(X) and discard the rest of the text.
Example text:
lorem ipsum Value (3) dfasdf 654345435ds sdfsdf asdf
asd
F
asdf
sad Value (2)
Example Regex:
Value\((\d)\)
Thanks for help.
The .NET regex engine supports a quantifier in the lookbehind assertion.
What you might do is assert that, from the current position, there is not Value( to the left that has 1+ digits and ) to the right. If that is the case, match 1 or more digits.
The pattern matches:
(?<!\bValue[\p{Zs}\t]*\((?=[0-9]+\)))[0-9]+
(?<! Negative lookbehind, assert what is to the left is not
\bValue Match Value preceded by a word boundary to prevent a partial match
[\p{Zs}\t]*\( Match optional horizontal spaces followed by (
(?=[0-9]+\)) Positive lookahead, assert 1+ digits followed by ) to the right
) Close lookbehind
[0-9]+ Match 1+ digits 0-9
.NET regex demo
Note that in .NET \d matches more than 0-9: it also matches digits from other scripts. If you want to match those as well, use \d; otherwise use [0-9].
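As a rough sketch of how the pattern above could be applied to the sample text from the question, the class below collects every match into a list. The class name ValueNumberDemo and the helper FindNumbers are hypothetical names chosen for illustration; only the pattern itself comes from the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ValueNumberDemo
{
    // Pattern from the answer: digits that are not directly preceded by
    // "Value(" (optional horizontal whitespace allowed) while being followed by ")".
    private static readonly Regex Rx = new Regex(
        @"(?<!\bValue[\p{Zs}\t]*\((?=[0-9]+\)))[0-9]+");

    // Hypothetical helper: returns every number found outside a Value(X) context.
    public static List<string> FindNumbers(string text)
    {
        var result = new List<string>();
        foreach (Match m in Rx.Matches(text))
        {
            result.Add(m.Value);
        }
        return result;
    }
}
```

Called on the question's sample text, FindNumbers should return only 654345435. One caveat worth checking against your own data: the sample only has single-digit values. With a multi-digit value such as Value (34), the scan can restart inside the number and match the trailing 4; prefixing the pattern with (?<![0-9]) should prevent that.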
You are looking for:
(?<!Value *\()\d+
Note that I am assuming that every Value( has a closing bracket.
Explanation:
(?<!Value *\() asserts that what follows it is not preceded by "Value(", "Value (", "Value  (" and so on.
\d+ matches a digit between one and infinite times
Something like this ought to do you:
private static readonly Regex rx = new Regex(@"
(?<! # A zero-width negative look-behind assertion, consisting of:
\b # - a word boundary, followed by
Value # - the literal 'Value', followed by
\s* # - zero or more whitespace characters, followed by
[(] # - a left parenthesis '(', followed by
\s* # - zero or more whitespace characters,
) # The whole of which is followed by
( # A number, consisting of
-? # - an optional minus sign, followed by
\d+ # - 1 or more decimal digits,
) # The whole of which is followed by
(?! # A zero-width negative look-ahead assertion, consisting of
\s* # - zero or more whitespace characters, followed by
[)] # - a single right parenthesis ')'
) #
",
rxOpts
);
private const RegexOptions rxOpts = RegexOptions.IgnoreCase
| RegexOptions.ExplicitCapture
| RegexOptions.IgnorePatternWhitespace
;
Then . . .
foreach ( Match m in rx.Matches( someText ) )
{
string nbr = m.Value;
Console.WriteLine("Found '{0}'", nbr);
}
This will execute in the C# Regex Engine, in the .Net Framework 4.7.2.
I need a Regular Expression to search strings for "words" that match the following properties:
A numeric value, such as 1234, or 10.00
An alphanumeric value, such as ABC123 or ABC10.00
NOT an alpha-only value, such as cat or CAT
Matches separated by any non alpha-numeric character.
Matches: "123", "ABC123", "abc123", "10.00", "ABC.123", "Foo10.00"
Non-matches: "sugar", "rush", "XYZ"
In the following example string, the matches I want are in bold-italic:
789|--|789 ABC 123 10.00 ABC123 123ABC ABC123ABC abc.123.abc
I am currently using the following regex, but it is just an aggregation of all the special cases, and doesn't cover fully-complex cases. There must be a more efficient way to write this:
(?<=^|[\W])(?:[\d]+[A-Za-z]{1,}|[A-Za-z]+[\d]{1,}|[\d]+[.]+[\d]{1,}|[\d]{1,})(?=$|[\W])
This regex will match most of the examples above, but it will not match any value where we toggle from numbers to letters and back, or vice-versa, like this: A1B2C3D4.
To test: https://regex101.com/r/oeSg10/1
You may use
(?xi) # Enable free-spacing and case insensitive mode
\b # Word boundary
(?=[A-Z.]*[0-9]) # After any 0+ letters/dots there must be a digit
[A-Z0-9]+ # 1+ letters or digits
(?:\.[A-Z0-9]+)* # 0+ repetitions of a . and then 1+ letters/digits
\b # Word boundary
See the regex demo at regex101.com and a .NET regex demo showing it really works in a .NET environment.
In C# code, you may use
var Pattern = new Regex(@"
\b # Word boundary
(?=[A-Z.]*[0-9]) # After any 0+ letters/dots there must be a digit
[A-Z0-9]+ # 1+ letters or digits
(?:\.[A-Z0-9]+)* # 0+ repetitions of a . and then 1+ letters/digits
\b # Word boundary",
RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);
where (?x) = RegexOptions.IgnorePatternWhitespace and (?i) = RegexOptions.IgnoreCase.
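To make the behaviour concrete, here is a minimal sketch that runs the same pattern, written on one line without free-spacing mode, over the example string from the question. The class name TokenDemo and the method FindTokens are hypothetical names used for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TokenDemo
{
    // Same pattern as above, compacted to a single line; IgnoreCase replaces (?i).
    private static readonly Regex Rx = new Regex(
        @"\b(?=[A-Z.]*[0-9])[A-Z0-9]+(?:\.[A-Z0-9]+)*\b",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: returns every "word" that contains at least one digit.
    public static List<string> FindTokens(string text)
    {
        var tokens = new List<string>();
        foreach (Match m in Rx.Matches(text))
        {
            tokens.Add(m.Value);
        }
        return tokens;
    }
}
```

On the question's example string this should return every token except the alpha-only ABC, and it should also accept alternating values such as A1B2C3D4.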
Sample String
"[] [ds*[000112]] [1448472995] sample string [1448472995] ***";
The regex should match
[1448472995] [1448472995]
and should not match [000112] since there is outer square bracket.
Currently I have this regex that is matching [000112] as well
const string unixTimeStampPattern = @"\[([0-9]+)]";
This is a good way to do it using balanced text.
( \[ \d+ \] ) # (1)
| # or,
\[ # Opening bracket
(?> # Then either match (possessively):
[^\[\]]+ # non - brackets
| # or
\[ # [ increase the bracket counter
(?<Depth> )
| # or
\] # ] decrease the bracket counter
(?<-Depth> )
)* # Repeat as needed.
(?(Depth) # Assert that the bracket counter is at zero
(?!)
)
\] # Closing bracket
C# sample
string sTestSample = "[] [ds*[000112]] [1448472995] sample string [1448472995] ***";
Regex RxBracket = new Regex(@"(\[\d+\])|\[(?>[^\[\]]+|\[(?<Depth>)|\](?<-Depth>))*(?(Depth)(?!))\]");
Match bracketMatch = RxBracket.Match(sTestSample);
while (bracketMatch.Success)
{
if (bracketMatch.Groups[1].Success)
Console.WriteLine("{0}", bracketMatch);
bracketMatch = bracketMatch.NextMatch();
}
Output
[1448472995]
[1448472995]
You need to use balancing groups to handle this - it looks a bit daunting but isn't all that complicated:
Regex regexObj = new Regex(
@"\[ # Match opening bracket.
\d+ # Match a number.
\] # Match closing bracket.
(?= # Assert that the following can be matched ahead:
(?> # The following group (made atomic to avoid backtracking):
[^\[\]]+ # One or more characters except brackets
| # or
\[ (?<Depth>) # an opening bracket (increase bracket counter)
| # or
\] (?<-Depth>) # a closing bracket (decrease bracket counter, can't go below 0).
)* # Repeat ad libitum.
(?(Depth)(?!)) # Assert that the bracket counter is now zero.
[^\[\]]* # Match any remaining non-bracket characters
\z # until the end of the string.
) # End of lookahead.",
RegexOptions.IgnorePatternWhitespace);
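A short usage sketch, assuming the sample string from the question: the class below uses the same lookahead-based pattern in compact single-line form and collects the matches. BracketDemo and FindTopLevel are hypothetical names introduced here for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BracketDemo
{
    // Compact form of the pattern above: a bracketed number is accepted only if
    // the rest of the string, up to the very end, has balanced brackets.
    private static readonly Regex Rx = new Regex(
        @"\[\d+\](?=(?>[^\[\]]+|\[(?<Depth>)|\](?<-Depth>))*(?(Depth)(?!))[^\[\]]*\z)");

    // Hypothetical helper: returns bracketed numbers that are not nested
    // inside another pair of brackets.
    public static List<string> FindTopLevel(string text)
    {
        var found = new List<string>();
        foreach (Match m in Rx.Matches(text))
        {
            found.Add(m.Value);
        }
        return found;
    }
}
```

For the sample string this should yield [1448472995] twice and skip [000112], because the text after [000112] starts with an unmatched closing bracket.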
Are you just trying to capture the unix time stamp? Then you can try a simpler one where you specify the minimum number of characters matched in a group.
\[([0-9]{10})\]
Here I limit it to 10 characters since I doubt the time stamp will hit 11 characters anytime soon... To protect against that:
\[([0-9]{10,11})\]
Of course this could lead to false positives if you have a 10-length number in an enclosing bracket.
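A minimal sketch of the fixed-length approach, assuming the timestamps are always exactly 10 digits: the capture group holds the digits, which are parsed as long. The names TimestampDemo and ExtractTimestamps are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TimestampDemo
{
    // Exactly 10 digits between square brackets; group 1 holds the digits.
    private static readonly Regex Rx = new Regex(@"\[([0-9]{10})\]");

    // Hypothetical helper: returns the captured values as numbers.
    public static List<long> ExtractTimestamps(string text)
    {
        var values = new List<long>();
        foreach (Match m in Rx.Matches(text))
        {
            values.Add(long.Parse(m.Groups[1].Value));
        }
        return values;
    }
}
```

This skips [000112] in the sample only because it is 6 digits long, not because it is nested, so the false-positive caveat above still applies.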
This will match your expression as expected: http://regexr.com/3csg3. It uses a lookahead.
Examples:
i General Biology i
i General Biology
General Biology i
I need to catch any phrase that begins with a single letter or number, ends with a letter or number, or both begins and ends with a single letter or number so that I can pre-parse the data to this:
General Biology
I've tried tons of examples on Rubular but can't seem to figure this one out. I've used literal match groups to get those characters, but I don't want the match groups per se; I just want the regex to capture only those two letters.
You can use the following to achieve this:
String result = Regex.Replace(input, @"(?i)^[a-z0-9]\s+|\s+[a-z0-9]$", "");
Explanation:
This removes a single letter/number at the beginning/end of the string followed or preceded by whitespace.
(?i) # set flags for this block (case-insensitive)
^ # the beginning of the string
[a-z0-9] # any character of: 'a' to 'z', '0' to '9'
\s+ # whitespace (\n, \r, \t, \f, and " ") (1 or more times)
| # OR
\s+ # whitespace (\n, \r, \t, \f, and " ") (1 or more times)
[a-z0-9] # any character of: 'a' to 'z', '0' to '9'
$ # before an optional \n, and the end of the string
Working Demo
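For reference, a small self-contained sketch that applies the replacement to the three example phrases from the question. PhraseTrimDemo and Strip are hypothetical names; the pattern is the one shown above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PhraseTrimDemo
{
    // A single letter or digit at the start (followed by whitespace) or at the
    // end (preceded by whitespace) of the string, case-insensitive.
    private static readonly Regex EdgeToken = new Regex(@"(?i)^[a-z0-9]\s+|\s+[a-z0-9]$");

    // Hypothetical helper: removes the leading and trailing single-character tokens.
    public static string Strip(string input)
    {
        return EdgeToken.Replace(input, "");
    }
}
```

All three example inputs should come back as General Biology, and a phrase without a stray leading or trailing character should be returned unchanged.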
So far I have this perfectly working regex:
(?:(?<=\s)|^)#(\w*[A-Za-z_]+\w*)
It finds any word that starts with a hash tag (ex. #lolz but not hsshs#jdjd)
The problem is I also want it to match parenthesis. So if I have this it will match:
(#lolz wow)
or
(wow #cool)
or
(#cool)
Any idea on how can I make or use my regex to work like that?
The following seemed to work for me ...
\(?#(\w*[A-Za-z_]+\w*)\)?
The way you are using the following in context is overkill..
\w*[A-Za-z_]\w*
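\w alone matches word characters (a-z, A-Z, 0-9, _), and there is no need to wrap the lookbehind assertion in a non-capturing group (?: here. Note, though, that \w+ also accepts all-digit tags such as #123, which your original pattern rejects because it requires at least one letter or underscore.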
I do believe that the following would suffice by itself.
(?<=^|\s)\(?#(\w+)\)?
Regular expression:
(?<= look behind to see if there is:
^ the beginning of the string
| OR
\s whitespace (\n, \r, \t, \f, and " ")
) end of look-behind
\(? '(' (optional (matching the most amount possible))
# '#'
( group and capture to \1:
\w+ word characters (a-z, A-Z, 0-9, _) (1 or more times)
) end of \1
\)? ')' (optional (matching the most amount possible))
See live demo
You can also use a negative lookbehind here if you wanted to.
(?<![^\s])\(?#(\w+)\)?
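A brief sketch of the negative-lookbehind variant in C#, returning the captured tag names without the hash or the parentheses. HashtagDemo and FindTags are hypothetical names used for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HashtagDemo
{
    // The tag must sit at the start of the string or after whitespace;
    // an optional "(" before and ")" after are consumed but not captured.
    private static readonly Regex Rx = new Regex(@"(?<![^\s])\(?#(\w+)\)?");

    // Hypothetical helper: returns the tag names from group 1.
    public static List<string> FindTags(string text)
    {
        var tags = new List<string>();
        foreach (Match m in Rx.Matches(text))
        {
            tags.Add(m.Groups[1].Value);
        }
        return tags;
    }
}
```

With this, (#lolz wow), (wow #cool) and (#cool) each produce one tag, while hsshs#jdjd produces none.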
I would like to implement a TextBox in which the user can only enter text in a pattern like this:
dddddddddd,
dddddddddd,
dddddddddd,
...
where d is a digit. If the user leaves the control with fewer than 10 digits in a row, validation should fail. The user should not be able to write more than 10 digits on one line; after the tenth digit, only a comma "," should be accepted.
Thanks for help
Match m = Regex.Match(textBox.Text, @"^\d{10},$", RegexOptions.Multiline);
Haven't tried it, but it should work. Please take a look here and here for more information.
I suggest the regex
\A(?:\s*\d{10},)*\s*\d{10}\s*\Z
Explanation:
\A # start of the string
(?: # match the following zero or more times:
\s* # optional whitespace, including newlines
\d{10}, # 10 digits, followed by a comma
)* # end of repeated group
\s* # match optional whitespace
\d{10} # match 10 digits (this time no comma)
\s* # optional whitespace
\Z # end of string
In C#, this would look like
validInput = Regex.IsMatch(subjectString, @"\A(?:\s*\d{10},)*\s*\d{10}\s*\Z");
Note that you need to use a verbatim string (@"...") or double all the backslashes in the regex.
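As a hedged sketch of how this check might be wrapped for a validation handler, the class below exposes a single method around the pattern. TenDigitValidator and IsValid are hypothetical names; wiring this to a TextBox event is left out.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TenDigitValidator
{
    // Comma-separated groups of exactly 10 digits; whitespace (including
    // line breaks) is allowed between groups, and the last group has no comma.
    private static readonly Regex Rx = new Regex(@"\A(?:\s*\d{10},)*\s*\d{10}\s*\Z");

    // Hypothetical helper: true when the whole text follows the pattern.
    public static bool IsValid(string text)
    {
        return Rx.IsMatch(text);
    }
}
```

Note that this pattern rejects a trailing comma after the last group, so an input where every line ends with a comma would need the final \d{10} to be followed by an optional comma.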