Regex to capture an exact word in a sentence

I'm having some trouble capturing a specific string inside a sentence.
The regex I'm using is \b[0-9]{9,12}\b, to capture numbers that have between 9 and 12 digits. I used the word boundaries to match only the exact number, but the problem is that when a number that matches this regex is followed by a dot, for example, the regex still matches, and that is causing me a lot of trouble.
From what I found, the problem is that \b treats some special characters as separators too, right? Is there a way to treat, for example, 123456789. as a whole string, so that the regex does not match it?
Thanks!

The word boundary \b requires a non-word character (or the start/end of the string) before and after the digits, since digits are word characters. Dots and commas are non-word characters, so they satisfy \b. To make sure a digit sequence next to a dot is not matched, you need to use lookarounds.
You can use
\b(?<!\.)[0-9]{9,12}(?!\.)\b
The additional subpatterns are the lookbehind (?<!\.) and the lookahead (?!\.), which make sure there is no . immediately before or after the digit sequence.
If you have . and , as decimal separators, you may want to adjust the pattern to
\b(?<![.,])[0-9]{9,12}(?![.,])\b
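A small C# sketch (top-level statements, System.Text.RegularExpressions) that runs both patterns from this answer over a few sample strings. The sample inputs and the helper name Find are made up for illustration:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// 9-12 digits that do not touch a dot on either side.
var dotAware = new Regex(@"\b(?<!\.)[0-9]{9,12}(?!\.)\b");
// Same, but a neighbouring comma also disqualifies the number.
var dotCommaAware = new Regex(@"\b(?<![.,])[0-9]{9,12}(?![.,])\b");

// Hypothetical helper: return every matched value.
string[] Find(Regex rx, string input) =>
    rx.Matches(input).Cast<Match>().Select(m => m.Value).ToArray();

Console.WriteLine(string.Join("|", Find(dotAware, "Call 123456789 now")));            // 123456789
Console.WriteLine(Find(dotAware, "Ends with 123456789.").Length);                     // 0
Console.WriteLine(string.Join("|", Find(dotAware, "Ids: 123456789012, 987654321")));  // 123456789012|987654321
Console.WriteLine(string.Join("|", Find(dotCommaAware, "Ids: 123456789012, 987654321"))); // 987654321
```

Note that the lookarounds cannot simply "shrink" the match to avoid the dot: dropping a digit from the end would leave two digits side by side, where the trailing \b fails.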

Related

Update regular expression to only allow single spaces [duplicate]

Right now I have a regex that prevents the user from typing any special characters. The only allowed characters are A through Z, 0 through 9 or spaces.
I want to improve this regex to prevent the following:
No leading/trailing spaces - If the user types one or more spaces before or after the entry, do not allow it.
No double spaces - If the user presses the space key more than once in a row, do not allow it.
The regex I have right now to prevent special characters is the following, and it appears to work just fine:
^[a-zA-Z0-9 ]+$
Following some other ideas, I tried all these options but they did not work:
^\A\s+[a-zA-Z0-9 ]+$\A\s+
/s*^[a-zA-Z0-9 ]+$/s*
Could I get a helping hand with this code? Again, I just want letters A-Z, numbers 0-9, and no leading or trailing spaces.
Thanks.

You can use the following regex:
^[a-zA-Z0-9]+(?: [a-zA-Z0-9]+)*$
The regex matches one or more alphanumerics at the start and then zero or more chunks of a single space followed by one or more alphanumerics.
As an alternative, here is a regex based on lookaheads (it is less efficient, though):
^(?!.* {2})(?=\S)(?=.*\S$)[a-zA-Z0-9 ]+$
The (?!.* {2}) disallows consecutive spaces, (?=\S) requires a non-whitespace character at the start of the string, and (?=.*\S$) requires one at the end.
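To check that the two answers agree, here is a minimal C# sketch that runs both patterns over a handful of made-up inputs (valid ones first, then leading, trailing and double spaces, an underscore, and the empty string):

```csharp
using System;
using System.Text.RegularExpressions;

// Words of ASCII letters/digits separated by exactly one space.
var chunked = new Regex(@"^[a-zA-Z0-9]+(?: [a-zA-Z0-9]+)*$");
// The same rules expressed with lookaheads.
var lookahead = new Regex(@"^(?!.* {2})(?=\S)(?=.*\S$)[a-zA-Z0-9 ]+$");

string[] samples = { "John Smith", "A1", " John", "John ", "John  Smith", "John_Smith", "" };
foreach (var s in samples)
{
    // Expected: True / True for the first two samples, False / False for the rest.
    Console.WriteLine($"\"{s}\": {chunked.IsMatch(s)} / {lookahead.IsMatch(s)}");
}
```

One .NET-specific detail: without RegexOptions.Multiline, $ also matches just before a final \n, so an input like "John\n" passes both patterns. If that matters for your validation, use \z instead of $.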

Regex to extract string between digit pattern and colon or newline

I have to extract the string between a digit pattern and either a colon or a newline (whichever occurs first).
My string looks like this:
05-30-1306-29-13 BUILDERS RISK:
LIMITS/DEDUCTIBLES:
I would like to extract BUILDERS RISK. There may or may not be a colon; if there isn't one, the newline is treated as the terminator.
Here's what I have come up with so far
\d{2}-\d{2}-\d{4}-\d{2}-\d{2}\s*\W+[^:|\n]+:\s*
The numeric pattern is always 2-2-4-2-2 digits, followed by any string, followed by either \n or :.
The regex so far gets what I need, but I don't know how to break it into separate capture groups so I can take the second one:
1st group - digit pattern
2nd group - what I need
3rd group - colon or newline
Any pointers will be helpful.
UPDATE: A couple of other variants of the text to be searched:
11-06-1212-29-12 DWELLING FIRE (DP-3): ANNUAL RENTAL
11-05-1212-26-12 HOMEOWNERS (HO-3): SECONDARY HOME
I only want the text before the colon or, if there is no colon, the text up to the newline. As a side note, the text of interest may not be on the same line as the digits and may appear on the next line instead, but it will always be followed by either a colon or a newline on that line.
PS: Extracted text should not contain colon

It appears you may use
\b(\d{2}-\d{2}-\d{4}-\d{2}-\d{2})\W+(.*?)(:?\r?\n\s*)
Details
\b - a word boundary (change to (?<!\d) if the digits can be glued to a letter or underscore)
(\d{2}-\d{2}-\d{4}-\d{2}-\d{2}) - Group 1: two digits, -, two digits, -, four digits, -, two digits, -, two digits
\W+ - 1+ non-word chars (to stay on the line, replace with [^\w\r\n]+)
(.*?) - Group 2: any zero or more chars other than newline, as few as possible
(:?\r?\n\s*) - Group 3: an optional :, an optional CR, an LF symbol and then any 0+ whitespace chars.
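A C# sketch that applies the answer's pattern to the question's sample lines (assuming \n line endings). Because Group 2 is lazy and Group 3 only accepts a colon directly before a line break, a mid-line colon such as the one in DWELLING FIRE (DP-3): ANNUAL RENTAL ends up inside Group 2. The second pattern is not from the answer; it is a variant that uses [^:\r\n]* so Group 2 stops at the first colon or line break, and it adds $ so a last line without a trailing newline still matches:

```csharp
using System;
using System.Text.RegularExpressions;

string text =
    "05-30-1306-29-13 BUILDERS RISK:\n" +
    "LIMITS/DEDUCTIBLES:\n" +
    "11-06-1212-29-12 DWELLING FIRE (DP-3): ANNUAL RENTAL\n" +
    "11-05-1212-26-12 HOMEOWNERS (HO-3): SECONDARY HOME\n";

// Pattern from the answer: Group 2 runs lazily up to an optional colon that directly precedes a line break.
var answer = new Regex(@"\b(\d{2}-\d{2}-\d{4}-\d{2}-\d{2})\W+(.*?)(:?\r?\n\s*)");
// Variant (an assumption, not from the answer): Group 2 stops at the first colon or line break.
var firstColon = new Regex(@"\b(\d{2}-\d{2}-\d{4}-\d{2}-\d{2})\W+([^:\r\n]*)(:|\r?\n|$)");

foreach (Match m in answer.Matches(text))
    Console.WriteLine($"answer:     [{m.Groups[2].Value}]");
// [BUILDERS RISK], [DWELLING FIRE (DP-3): ANNUAL RENTAL], [HOMEOWNERS (HO-3): SECONDARY HOME]

foreach (Match m in firstColon.Matches(text))
    Console.WriteLine($"firstColon: [{m.Groups[2].Value}]");
// [BUILDERS RISK], [DWELLING FIRE (DP-3)], [HOMEOWNERS (HO-3)]
```

In both patterns \W+ can consume a line break, which is what lets the text of interest sit on the line after the digits.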

Regex for alphanumeric string in C# accepting underscore and white spaces

I have already gone through many posts on SO, but I didn't find what I need for my specific scenario.
I need a regex for an alphanumeric string that meets the following conditions:
Valid string:
ameya123 (alphabets and numbers)
ameya (only alphabets)
AMeya12 (capital and small alphabets and numbers)
Ameya_123 (alphabets and underscore and numbers)
Ameya_ 123 (alphabets, underscore and white spaces)
Invalid string:
123 (only numbers)
_ (only underscore)
"   " (only white spaces)
any special character other than underscore
What I have tried so far:
(?=.*[a-zA-Z])(?=.*[0-9]*[\s]*[_]*)
The above regex works in an online regex editor, but not in a data annotation in C#.
Please suggest.

Based on your requirements and not your attempt, what you are in need of is this:
^(?!(?:\d+|_+| +)$)[\w ]+$
The negative lookahead looks for undesired inputs and fails the whole match if it finds one: strings consisting only of digits, only of underscores or only of spaces. If the string is none of those, ^[\w ]+$ must match, which is nearly the same as ^[a-zA-Z0-9_ ]+$.
Explanation:
^ Start of line / string
(?! Start of negative lookahead
(?: Start of non-capturing group
\d+ Match digits
| Or
_+ Match underscores
| Or
[ ]+ Match spaces
)$ End of non-capturing group immediately followed by end of line / string (none of previous matches should be found)
) End of negative lookahead
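[\w ]+$ Match one or more characters from the character set, up to the end of the string
Note: in JavaScript (e.g. client-side validation), \w is shorthand for [a-zA-Z0-9_]; in .NET, \w also matches Unicode letters and digits unless RegexOptions.ECMAScript is set.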
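A quick C# check of this pattern against the valid and invalid examples listed in the question. "Ameya!" stands in for "any special character"; it is an assumed example, not from the question:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Reject all-digit, all-underscore and all-space strings, otherwise allow [\w ]+.
var revo = new Regex(@"^(?!(?:\d+|_+| +)$)[\w ]+$");

string[] valid = { "ameya123", "ameya", "AMeya12", "Ameya_123", "Ameya_ 123" };
string[] invalid = { "123", "_", "   ", "Ameya!" };

Console.WriteLine(valid.All(s => revo.IsMatch(s)));   // True
Console.WriteLine(invalid.Any(s => revo.IsMatch(s))); // False
```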

One problem with your regex is that in annotations, the regex must match and consume the entire string input, while your pattern only contains lookarounds that do not consume any text.
You may use
^(?!\d+$)(?![_\s]+$)[A-Za-z0-9\s_]+$
Note that \w, when used for server-side validation (and thus parsed by the .NET regex engine), also allows any Unicode letters, digits and a few other characters, so I'd rather stick to [A-Za-z0-9_] to keep server- and client-side validation consistent.
Details
^ - start of string (not necessary here, but good to have when debugging)
(?!\d+$) - a negative lookahead that fails the match if the whole string consists of digits
(?![_\s]+$) - a negative lookahead that fails the match if the whole string consists of underscores and/or whitespace. NOTE: if you only want to disallow inputs like ____ or "   ", split this lookahead into (?!_+$) and (?!\s+$)
[A-Za-z0-9\s_]+ - 1+ ASCII letters, digits, _ and whitespace chars
$ - end of string (not necessary here, but still good to have).
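To illustrate the "must consume the entire string" point, here is a C# sketch with a hypothetical helper, WholeMatch, that approximates the check RegularExpressionAttribute performs (the first match must start at index 0 and span the whole input). It also shows the \w difference: in .NET, \w accepts a non-ASCII letter such as é, while the ASCII class does not. "Amélie" is an assumed test input:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper approximating the annotation's whole-string check.
bool WholeMatch(string pattern, string input)
{
    Match m = Regex.Match(input, pattern);
    return m.Success && m.Index == 0 && m.Length == input.Length;
}

const string asker = @"(?=.*[a-zA-Z])(?=.*[0-9]*[\s]*[_]*)";
const string wiktor = @"^(?!\d+$)(?![_\s]+$)[A-Za-z0-9\s_]+$";
const string revo = @"^(?!(?:\d+|_+| +)$)[\w ]+$";

Console.WriteLine(Regex.IsMatch("ameya123", asker)); // True: a zero-length match at index 0
Console.WriteLine(WholeMatch(asker, "ameya123"));    // False: the lookarounds consume nothing
Console.WriteLine(WholeMatch(wiktor, "Ameya_ 123")); // True

// "Am\u00e9lie" is "Amélie": .NET's \w covers 'é', [A-Za-z] does not.
Console.WriteLine(WholeMatch(revo, "Am\u00e9lie"));   // True
Console.WriteLine(WholeMatch(wiktor, "Am\u00e9lie")); // False
```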

If I understand your requirements correctly, you need at least one letter (uppercase or lowercase), along with zero or more digits, whitespace characters or underscores anywhere in the string. This implies the following pattern:
^[A-Za-z0-9\s_]*[A-Za-z][A-Za-z0-9\s_]*$
In the demo, I replaced \s with \t, a space and \r, because \s (which also matches \n) was matching across lines.
Unlike the answers given by @revo and @Wiktor, I don't have a fancy-looking explanation of the regex. I am beautiful even without my makeup on. Honestly, if you don't understand the pattern I gave, you might want to review a good regex tutorial.
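The question's lists don't say what should happen to inputs such as "_1" or "1 2", which contain no letters but are not made of a single kind of character either. This C# sketch, using those two assumed inputs, shows where this "at least one letter" pattern and the lookahead-based pattern from @Wiktor's answer disagree:

```csharp
using System;
using System.Text.RegularExpressions;

// At least one ASCII letter somewhere, surrounded by letters/digits/whitespace/underscores.
var tim = new Regex(@"^[A-Za-z0-9\s_]*[A-Za-z][A-Za-z0-9\s_]*$");
// Lookahead-based pattern, for comparison.
var wiktor = new Regex(@"^(?!\d+$)(?![_\s]+$)[A-Za-z0-9\s_]+$");

foreach (var s in new[] { "Ameya_ 123", "123", "_1", "1 2" })
    Console.WriteLine($"\"{s}\": tim={tim.IsMatch(s)}, wiktor={wiktor.IsMatch(s)}");
// "Ameya_ 123": both True; "123": both False; "_1" and "1 2": tim=False, wiktor=True
```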

This simple RegEx should do it:
[a-zA-Z]+[0-9_ ]*
One or more letters, followed by zero or more digits, underscores and spaces.

This one should be good:
[\w\s_]*[a-zA-Z]+[\w\s_]*
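Both short patterns are posted without anchors, so Regex.IsMatch accepts any input that merely contains a matching substring. A C# sketch of that behaviour, plus an anchored version of the first pattern; "Ameya!" and "ameya1b" are assumed inputs, and whether a letter after a digit should be allowed is not stated in the question:

```csharp
using System;
using System.Text.RegularExpressions;

// The two short answers, unanchored as posted.
var shortA = new Regex(@"[a-zA-Z]+[0-9_ ]*");
var shortB = new Regex(@"[\w\s_]*[a-zA-Z]+[\w\s_]*");

// A substring ("Ameya") matches, so the special character slips through.
Console.WriteLine(shortA.IsMatch("Ameya!")); // True
Console.WriteLine(shortB.IsMatch("Ameya!")); // True

// Anchoring with ^...$ applies the check to the whole string.
var anchoredA = new Regex(@"^[a-zA-Z]+[0-9_ ]*$");
Console.WriteLine(anchoredA.IsMatch("Ameya!"));     // False
Console.WriteLine(anchoredA.IsMatch("Ameya_ 123")); // True
Console.WriteLine(anchoredA.IsMatch("ameya1b"));    // False: all letters must come first
```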

C# Regex boundary with special characters

I want to have a Regex that finds "Attributable".
I tried @"\bAttributable\b", but the \b boundary doesn't work the way I want with special characters.
For example, it doesn't differentiate between Attributable and Non-Attributable. Is there any way to match Attributable but not its negative?

Do a negative look-behind?
(?<!-)\bAttributable\b
This only checks for hyphens. If you want to check for other characters, put them in a character class in the negative lookbehind:
(?<![-^])\bAttributable\b
Alternatively, if you just want to not match Non-Attributable but do match SomethingElse-Attributable, then put Non- in the look-behind:
(?<!Non-)\bAttributable\b
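A short C# comparison of plain \b with the two lookbehind variants from this answer, on an assumed sample string containing all three kinds of occurrence:

```csharp
using System;
using System.Text.RegularExpressions;

string text = "Attributable, Non-Attributable, Partly-Attributable";

// Plain word boundaries: a hyphen counts as a boundary.
var plain = new Regex(@"\bAttributable\b");
// Rejects any occurrence directly preceded by a hyphen.
var noHyphen = new Regex(@"(?<!-)\bAttributable\b");
// Rejects only the "Non-" prefix.
var noNonPrefix = new Regex(@"(?<!Non-)\bAttributable\b");

Console.WriteLine(plain.Matches(text).Count);       // 3
Console.WriteLine(noHyphen.Matches(text).Count);    // 1
Console.WriteLine(noNonPrefix.Matches(text).Count); // 2
```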

There are several ways to fix an issue like yours, but it all depends on the real requirements. Sometimes you need to specify precisely which "word boundary" you need in each concrete case, since the \b word boundary is 1) context-dependent and 2) matches at specific places in the string that you should be aware of:
Before the first character in the string, if the first character is a word character.
After the last character in the string, if the last character is a word character.
Between two characters in the string, where one is a word character and the other is not a word character.
Now, here are several approaches that you may follow:
When you only care about compound words usually joined with hyphens (similar to @Sweeper's answer): (?<!-)\bAttributable\b(?!-)
Only match between whitespace or at the start/end of the string: (?<!\S)Attributable(?!\S). NOTE: if this is what you want, you can do without a regex by using s.Split().Contains("Attributable") (Contains comes from System.Linq)
Only match if not preceded by punctuation and there is no letter/digit/underscore right after: (?<!\p{P})Attributable\b
Only match if not preceded by punctuation, except for specific punctuation symbols (say, you want to match the word after a comma or a semicolon): (?<![^\P{P},;])Attributable\b
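The options above differ mostly in how they treat the character in front of the word. This C# sketch runs each pattern over a few assumed inputs and prints which ones match; it ends with the regex-free Split approach:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string[] inputs = { "Attributable", "Non-Attributable", "Attributable-ish", "(Attributable)", "x,Attributable", "x.Attributable" };

var options = new (string Name, Regex Rx)[]
{
    ("hyphen-aware", new Regex(@"(?<!-)\bAttributable\b(?!-)")),
    ("whitespace-only", new Regex(@"(?<!\S)Attributable(?!\S)")),
    ("no punctuation before", new Regex(@"(?<!\p{P})Attributable\b")),
    ("allow , and ; before", new Regex(@"(?<![^\P{P},;])Attributable\b")),
};

foreach (var (name, rx) in options)
{
    var hits = inputs.Where(s => rx.IsMatch(s));
    Console.WriteLine($"{name}: {string.Join(" | ", hits)}");
}
// hyphen-aware: Attributable | (Attributable) | x,Attributable | x.Attributable
// whitespace-only: Attributable
// no punctuation before: Attributable | Attributable-ish
// allow , and ; before: Attributable | Attributable-ish | x,Attributable

// Regex-free alternative for the whitespace-delimited case.
Console.WriteLine("Non-Attributable Attributable".Split().Contains("Attributable")); // True
```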

regex not matching number correctly

I have the following regex: (\d{14}), which matches a 14-digit decimal number. The problem is that it also matches inside numbers that are 16 digits long. I need to add a condition so that it only matches if there are no more digits at the beginning or end.
So, for example, 112222222222222233 shouldn't match, but xx22222222222222xx should.

Use a word boundary, \b:
\b\d{14}\b

M42's answer can work in cases where the number is delimited by spaces or other word delimiters. But if you want to match a number in a word containing non-digits (like your example xx22222222222222xx) something like this should work:
(^|[^\d])\d{14}([^\d]|$)
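A C# sketch comparing the two answers on inputs modeled on the question's examples. The third pattern is not from either answer: it swaps the consuming groups for lookarounds, (?<!\d) and (?!\d), so the match contains only the 14 digits:

```csharp
using System;
using System.Text.RegularExpressions;

// Built with new string(...) so the digit counts are exact.
string glued = "xx" + new string('2', 14) + "xx";   // 14 digits between letters
string tooLong = "11" + new string('2', 14) + "33"; // 18 digits in a row

var wordBoundary = new Regex(@"\b\d{14}\b");
var nonDigitOrEdge = new Regex(@"(^|[^\d])\d{14}([^\d]|$)");
// Not from the answers: lookarounds check the neighbours without consuming them.
var lookaround = new Regex(@"(?<!\d)\d{14}(?!\d)");

Console.WriteLine(wordBoundary.IsMatch(glued));       // False: no \b between 'x' and '2'
Console.WriteLine(nonDigitOrEdge.Match(glued).Value); // the 14 digits plus the surrounding 'x' characters
Console.WriteLine(lookaround.Match(glued).Value);     // just the 14 digits
Console.WriteLine(nonDigitOrEdge.IsMatch(tooLong));   // False
Console.WriteLine(lookaround.IsMatch(tooLong));       // False
```

With the group-based pattern, the number itself would need its own capture group, for example (^|[^\d])(\d{14})([^\d]|$), to be extracted without the neighbouring characters.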
