Regex: How to not match the last character of a word? - c#

I am trying to create a regex that does not match a word (a-z only) if the word has a : at the end, but otherwise matches it. However, this word is in the middle of a larger regex, so I don't think I can use a negative lookbehind and the $ metacharacter.
I tried this negative lookahead instead:
([a-z]+)(?!:)
but this test case
example:
just matches
exampl
instead of failing.

If you are using a negative lookahead, you could put it at the beginning:
(?![a-z]*:)[a-z]+
i.e. "match at least one a-z char, unless the following chars are zero or more 'a-z' chars followed by a ':'"
That would support a larger regex:
X(?![a-z]*:)[a-z]+Y
would match in the following string:
Xeee Xrrr:Y XzzzY XfffZ
only 'XzzzY'
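For illustration, a minimal C# sketch (assuming .NET's System.Text.RegularExpressions and top-level statements) that runs this pattern against the sample string and lists every match:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// The negative lookahead sits at the start of the word: it rejects the
// position if the a-z run that follows ends in ':'.
const string pattern = @"X(?![a-z]*:)[a-z]+Y";
const string input = "Xeee Xrrr:Y XzzzY XfffZ";

// Collect every non-overlapping match, left to right.
string[] matches = Regex.Matches(input, pattern)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

Console.WriteLine(string.Join(", ", matches)); // XzzzY
```

In Xrrr:Y the lookahead sees rrr: and rejects the position; Xeee and XfffZ fail simply because no Y follows the letters.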

Try this:
[a-z]\s

([a-z]+\b)(?!:)
asserts a word boundary at the end of the match, so the engine cannot backtrack to "exampl" (there is no word boundary between l and e) and the match fails

[a-z]+(?![:a-z])
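A rough side-by-side check of the original attempt and the suggestions above, as a C# sketch using .NET's Regex class. FirstMatch is a hypothetical helper defined only for this comparison:

```csharp
using System;
using System.Text.RegularExpressions;

string[] patterns =
{
    @"([a-z]+)(?!:)",     // original attempt
    @"(?![a-z]*:)[a-z]+", // lookahead placed before the word
    @"([a-z]+\b)(?!:)",   // \b: the match cannot end in the middle of the word
    @"[a-z]+(?![:a-z])",  // next char may be neither ':' nor a letter
};

// Returns the first match of the pattern, or "(no match)".
static string FirstMatch(string input, string pattern)
{
    Match m = Regex.Match(input, pattern);
    return m.Success ? m.Value : "(no match)";
}

foreach (string p in patterns)
    Console.WriteLine($"{p,-20} example: -> {FirstMatch("example:", p)} | example -> {FirstMatch("example", p)}");
```

On example: only the original attempt returns a value (exampl); the other three return (no match), while all four return example when there is no trailing colon.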

Related

Match up to the comma - Regex

I have created a Regex Pattern (?<=[TCC|TCC_BHPB]\s\d{3,4})[-_\s]\d{1,2}[,]
This pattern matches only:
TCC 6005_5,
What should I change at the end to match both of these strings:
TCC 6005-5 ,
TCC 6005_5,
You can add a non-greedy wildcard to your expression (.*?):
(?<=(?:TCC|TCC_BHPB)\s\d{3,4})[-_\s]\d{1,2}.*?[,]
This will now also match any characters between the last digit and the comma.
As has been pointed out in the comments, [TCC|TCC_BHPB] is a character class rather than a literal match, so I've changed this to (?:TCC|TCC_BHPB) which is presumably what your intention was.
Try it online
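A minimal C# sketch, assuming .NET's System.Text.RegularExpressions, that applies the corrected lookbehind pattern to both strings from the question and shows what the original [TCC|TCC_BHPB] class actually accepts:

```csharp
using System;
using System.Text.RegularExpressions;

// Lookbehind with a real alternation, then ".*?" to allow anything
// (here: a space) between the last digit and the comma.
const string pattern = @"(?<=(?:TCC|TCC_BHPB)\s\d{3,4})[-_\s]\d{1,2}.*?[,]";

string a = Regex.Match("TCC 6005-5 ,", pattern).Value;
string b = Regex.Match("TCC 6005_5,", pattern).Value;
Console.WriteLine($"'{a}' '{b}'"); // '-5 ,' '_5,'

// By contrast, [TCC|TCC_BHPB] is a character class: it matches exactly one
// character from the set {T, C, |, _, B, H, P}, never the word "TCC".
bool pipe = Regex.IsMatch("|", @"^[TCC|TCC_BHPB]$");
bool word = Regex.IsMatch("TCC", @"^[TCC|TCC_BHPB]$");
Console.WriteLine($"{pipe} {word}"); // True False
```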
This part of the pattern [TCC|TCC_BHPB] is a character class that matches one of the listed characters. It might also be written for example as [|_TCBHP]
To "match" both strings, you can match all parts instead of using a positive lookbehind.
\bTCC(?:_BHPB)?\s\d{3,4}[-_\s]\d{1,2}\s?,
See a regex demo
\bTCC A word boundary to prevent a partial match, then match TCC
(?:_BHPB)?\s\d{3,4} Optionally match _BHPB, then match a whitespace char and 3-4 digits (in .NET, \d matches any Unicode decimal digit; use [0-9] to match only the ASCII digits 0-9)
[-_\s]\d{1,2} Match one of -, _ or a whitespace char, then 1-2 digits
\s?, Match an optional space and ,
Note that \s can also match a newline.
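A small C# sketch of the full-match pattern. The TCC_BHPB 6005_5, input is a hypothetical example of the optional suffix, not taken from the question; the last lines illustrate the \d versus [0-9] remark using Arabic-Indic digits:

```csharp
using System;
using System.Text.RegularExpressions;

const string pattern = @"\bTCC(?:_BHPB)?\s\d{3,4}[-_\s]\d{1,2}\s?,";

// The third input is a made-up case for the optional _BHPB part.
string[] inputs = { "TCC 6005-5 ,", "TCC 6005_5,", "TCC_BHPB 6005_5," };
foreach (string s in inputs)
{
    Match m = Regex.Match(s, pattern);
    Console.WriteLine($"{s} -> {(m.Success ? m.Value : "(no match)")}");
}

// \d matches any Unicode decimal digit in .NET; [0-9] matches ASCII digits only.
const string arabicIndic = "\u0666\u0660\u0660\u0665"; // "6005" in Arabic-Indic digits
bool viaD = Regex.IsMatch(arabicIndic, @"^\d{3,4}$");
bool viaRange = Regex.IsMatch(arabicIndic, @"^[0-9]{3,4}$");
Console.WriteLine($"{viaD} {viaRange}"); // True False
```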
Using the lookbehind:
(?<=TCC(?:_BHPB)?\s\d{3,4})[-_\s]\d{1,2}\s?,
Regex demo
Or, if you want to allow spaces and tabs but not a newline:
\bTCC(?:_BHPB)?[\p{Zs}\t][0-9]{3,4}[-_\p{Zs}\t][0-9]{1,2}[\p{Zs}\t]*,
Regex demo
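A C# sketch comparing the lookbehind variant with the \p{Zs}/tab variant. The input with a line break after TCC is a made-up case to show the difference. The lookbehind contains an optional group, which relies on .NET allowing variable-length lookbehind:

```csharp
using System;
using System.Text.RegularExpressions;

// Variable-length lookbehind: the optional "_BHPB" is allowed inside it in .NET.
const string lookbehind = @"(?<=TCC(?:_BHPB)?\s\d{3,4})[-_\s]\d{1,2}\s?,";
// Horizontal whitespace only: space separators (\p{Zs}) or a tab, never a newline.
const string noNewline = @"\bTCC(?:_BHPB)?[\p{Zs}\t][0-9]{3,4}[-_\p{Zs}\t][0-9]{1,2}[\p{Zs}\t]*,";

Console.WriteLine(Regex.Match("TCC 6005-5 ,", lookbehind).Value); // -5 ,
Console.WriteLine(Regex.Match("TCC 6005-5 ,", noNewline).Value);  // TCC 6005-5 ,

// \s also matches a line break, so the lookbehind version accepts this input
// while the \p{Zs}/\t version rejects it.
const string split = "TCC\n6005-5 ,";
bool viaS = Regex.IsMatch(split, lookbehind);
bool viaZs = Regex.IsMatch(split, noNewline);
Console.WriteLine($"{viaS} {viaZs}"); // True False
```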

Regex for alpha number string in c# accepting underscore and white spaces

I have already gone through many posts on SO, but I didn't find what I need for my specific scenario.
I need a regex for an alphanumeric string
where the following conditions should be met.
Valid strings:
ameya123 (alphabets and numbers)
ameya (only alphabets)
AMeya12 (uppercase and lowercase letters and numbers)
Ameya_123 (alphabets, underscore and numbers)
Ameya_ 123 (alphabets, underscore and white spaces)
Invalid strings:
123 (only numbers)
_ (only underscore)
(only space) (only white spaces)
any special character other than underscore
What I have tried so far:
(?=.*[a-zA-Z])(?=.*[0-9]*[\s]*[_]*)
The above regex works in an online regex editor, but not in a data annotation in C#.
Please suggest.
Based on your requirements and not your attempt, what you are in need of is this:
^(?!(?:\d+|_+| +)$)[\w ]+$
The negative lookahead looks for undesired matches to fail the whole process. Those are strings containing digits only, underscores only or spaces only. If they never happen we want to have a match for ^[\w ]+$ which is nearly the same as ^[a-zA-Z0-9_ ]+$.
See live demo here
Explanation:
^ Start of line / string
(?! Start of negative lookahead
(?: Start of non-capturing group
\d+ Match digits
| Or
_+ Match underscores
| Or
[ ]+ Match spaces
)$ End of non-capturing group immediately followed by end of line / string (none of previous matches should be found)
) End of negative lookahead
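[\w ]+$ Match one or more characters from the set (word characters or a space) up to the end of the string
Note: in many engines \w is shorthand for [a-zA-Z0-9_], but in .NET it also matches Unicode letters and digits unless RegexOptions.ECMAScript is set.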
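A quick C# check of this pattern against the valid and invalid examples from the question (ameya! is a hypothetical stand-in for "any special character"):

```csharp
using System;
using System.Text.RegularExpressions;

// Reject strings made only of digits, only of underscores, or only of spaces;
// otherwise require word characters and spaces from start to end.
const string pattern = @"^(?!(?:\d+|_+| +)$)[\w ]+$";

string[] valid = { "ameya123", "ameya", "AMeya12", "Ameya_123", "Ameya_ 123" };
string[] invalid = { "123", "_", "   ", "ameya!" };

foreach (string s in valid)
    Console.WriteLine($"'{s}' valid? {Regex.IsMatch(s, pattern)}");   // True for each
foreach (string s in invalid)
    Console.WriteLine($"'{s}' valid? {Regex.IsMatch(s, pattern)}");   // False for each
```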
One problem with your regex is that in annotations, the regex must match and consume the entire string input, while your pattern only contains lookarounds that do not consume any text.
You may use
^(?!\d+$)(?![_\s]+$)[A-Za-z0-9\s_]+$
See the regex demo. Note that \w (when used for a server-side validation, and thus parsed with the .NET regex engine) will also allow any Unicode letters, digits and some more stuff when validating on the server side, so I'd rather stick to [A-Za-z0-9_] to be consistent with both server- and client-side validation.
Details
^ - start of string (not necessary here, but good to have when debugging)
(?!\d+$) - a negative lookahead that fails the match if the whole string consists of digits
(?![_\s]+$) - a negative lookahead that fails the match if the whole string consists of underscores and/or whitespaces. NOTE: if you plan to only disallow ____ or " " like inputs, you need to split this lookahead into (?!_+$) and (?!\s+$)
[A-Za-z0-9\s_]+ - 1+ ASCII letters, digits, _ and whitespace chars
$ - end of string (not necessary here, but still good to have).
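To illustrate the point about annotations, here is a C# sketch with a hypothetical helper, IsWholeMatch, that approximates how RegularExpressionAttribute validates a value: as far as I know, the attribute accepts a value only if the first match starts at index 0 and covers the whole string (it also treats null or empty values as valid, which the helper ignores).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper approximating the data-annotation check:
// the first match must start at index 0 and span the entire input.
static bool IsWholeMatch(string input, string pattern)
{
    Match m = Regex.Match(input, pattern);
    return m.Success && m.Index == 0 && m.Length == input.Length;
}

const string attempt = @"(?=.*[a-zA-Z])(?=.*[0-9]*[\s]*[_]*)";
const string pattern = @"^(?!\d+$)(?![_\s]+$)[A-Za-z0-9\s_]+$";

// The attempt only contains lookarounds, so it "matches" an empty string at index 0.
Console.WriteLine(Regex.IsMatch("ameya123", attempt)); // True
Console.WriteLine(IsWholeMatch("ameya123", attempt));  // False
Console.WriteLine(IsWholeMatch("ameya123", pattern));  // True

// In .NET, \w also accepts non-ASCII letters such as \u00E9, while [A-Za-z0-9_] does not.
Console.WriteLine(Regex.IsMatch("\u00E9", @"^\w+$"));           // True
Console.WriteLine(Regex.IsMatch("\u00E9", @"^[A-Za-z0-9_]+$")); // False
```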
If I understand your requirements correctly, you need to match one or more letters (uppercase or lowercase), and possibly zero or more of digits, whitespace, or underscore. This implies the following pattern:
^[A-Za-z0-9\s_]*[A-Za-z][A-Za-z0-9\s_]*$
Demo
In the demo, I have replaced \s with \t \r, because \s was matching across all lines.
Unlike the answers given by @revo and @Wiktor, I don't have a fancy-looking explanation of the regex. I am beautiful even without my makeup on. Honestly, if you don't understand the pattern I gave, you might want to review a good regex tutorial.
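A C# sketch running this pattern on some of the question's examples (ameya! again stands in for a string with a special character):

```csharp
using System;
using System.Text.RegularExpressions;

// Any mix of letters, digits, whitespace and '_', as long as at least one ASCII letter appears.
const string pattern = @"^[A-Za-z0-9\s_]*[A-Za-z][A-Za-z0-9\s_]*$";

foreach (string s in new[] { "ameya123", "Ameya_ 123", "123", "_", " ", "ameya!" })
    Console.WriteLine($"'{s}' -> {Regex.IsMatch(s, pattern)}");
// True, True, then False for the last four
```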
This simple RegEx should do it:
[a-zA-Z]+[0-9_ ]*
One or more letters, followed by zero or more digits, underscores or spaces.
This one should be good:
[\w\s_]*[a-zA-Z]+[\w\s_]*
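Neither of the last two patterns has ^/$ anchors. The C# sketch below (with a hypothetical IsWholeMatch helper that requires the first match to cover the entire input, roughly what a data annotation does) shows the consequences; a1b and ameya! are made-up inputs:

```csharp
using System;
using System.Text.RegularExpressions;

const string lettersFirst = @"[a-zA-Z]+[0-9_ ]*";
const string anyOrder = @"[\w\s_]*[a-zA-Z]+[\w\s_]*";

// Hypothetical whole-string check: the first match must cover the entire input.
static bool IsWholeMatch(string input, string pattern)
{
    Match m = Regex.Match(input, pattern);
    return m.Success && m.Index == 0 && m.Length == input.Length;
}

// Without anchors, IsMatch succeeds on any matching substring, so "ameya!" passes.
Console.WriteLine(Regex.IsMatch("ameya!", lettersFirst)); // True
Console.WriteLine(Regex.IsMatch("ameya!", anyOrder));     // True

// Under a whole-string check, [a-zA-Z]+[0-9_ ]* cannot have letters after digits.
Console.WriteLine(IsWholeMatch("a1b", lettersFirst));     // False
Console.WriteLine(IsWholeMatch("a1b", anyOrder));         // True
Console.WriteLine(IsWholeMatch("ameya!", anyOrder));      // False
```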

Regex - Replace specific zeros in a string before a period

Sorry for such a direct question, but I've spent a little too long trying to find a suitable regex that can alter the following strings:
01.10
10.01
setting them as:
1.10
10.1
So basically: always remove the leading '0' of the sequence before each period, or of the last sequence.
Is this possible with RegEx as currently it doesn't seem so?
Try this:
find: (^|\.)0+
replace: $1
See here a demo
Note: if the number is not at the beginning of the string, you should not use ^, but the word boundary \b, like this:
(\b|\.)0+
If you write the pattern in a regular (non-verbatim) C# string literal, double-escape it:
(\\b|\\.)0+
See other demo
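A C# sketch of the find/replace with Regex.Replace; x 01.10 is a hypothetical input for the \b variant. It also shows a side effect: a single zero before the period is removed as well, so 0.5 becomes .5.

```csharp
using System;
using System.Text.RegularExpressions;

// Remove zeros at the start of the string or right after a period.
// "$1" puts back whatever group 1 captured (the empty start, or the '.').
const string pattern = @"(^|\.)0+";

foreach (string s in new[] { "01.10", "10.01", "0.5" })
    Console.WriteLine($"{s} -> {Regex.Replace(s, pattern, "$1")}");
// 01.10 -> 1.10
// 10.01 -> 10.1
// 0.5 -> .5

// \b variant for numbers that are not at the start of the string.
Console.WriteLine(Regex.Replace("x 01.10", @"(\b|\.)0+", "$1")); // x 1.10

// A verbatim string (@"...") needs no doubled backslashes; a regular literal does.
Console.WriteLine(@"(\b|\.)0+" == "(\\b|\\.)0+"); // True
```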
Perhaps you could try it using this regex. This will not match the zero in 0.0 or 0.1 but only when there are digits after the leading zero(s).
\b0+(?=\d\.\d+\b)|(?<=\b\d+\.)0+(?=\d+\b)
\b word boundary
0+(?=\d\.\d+\b) Match one or more zeros and use a positive lookahead to assert that they are followed by a digit, a dot, one or more digits and a word boundary
| Or
(?<=\b\d+\.)0+(?=\d+\b) Positive lookbehind that asserts that what is on the left is a word boundary, one or more digits and a dot. Then match one or more zeros and assert that what follows is one or more digits and a word boundary.
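A C# sketch of this approach with Regex.Replace and an empty replacement; the inputs besides 01.10 and 10.01 are made-up edge cases:

```csharp
using System;
using System.Text.RegularExpressions;

// Alternative 1: leading zeros of the integer part, only if another digit follows before the '.'.
// Alternative 2: leading zeros of the fractional part, only if another digit follows them.
const string pattern = @"\b0+(?=\d\.\d+\b)|(?<=\b\d+\.)0+(?=\d+\b)";

foreach (string s in new[] { "01.10", "10.01", "001.002", "0.0", "0.1" })
    Console.WriteLine($"{s} -> {Regex.Replace(s, pattern, "")}");
// 01.10 -> 1.10
// 10.01 -> 10.1
// 001.002 -> 1.2
// 0.0 -> 0.0
// 0.1 -> 0.1
```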

Regex to capture an exact word in a sentence

I'm having some trouble capturing a specific string inside a sentence.
The regex I'm using is \b[0-9]{9,12}\b, to capture numbers that have between 9 and 12 digits. I was using the word boundaries to match the exact number, but the problem is that when a number matching this regex is followed by a dot, for example, the regex still matches, which is giving me a lot of trouble.
As far as I can tell, the problem is that \b treats some special characters as separators too, right? Is there a way to treat, for example, 123456789. as a whole string so that the regex does not match it?
Thanks!
The word boundary \b only requires that the digit sequence is not directly preceded or followed by another word character (a digit is itself a word character), so each side must be a non-word character or the start/end of the string. As dots and commas are non-word characters, they are allowed. To make sure a digit sequence next to a dot is not matched, you need to use lookarounds.
You can use
\b(?<!\.)[0-9]{9,12}(?!\.)\b
See the regex demo
The additional subpatterns are the lookbehind (?<!\.) and a lookahead (?!\.) that make sure there are no . before and after the digit sequence.
If you have . and , as decimal separators, you may want to adjust the pattern to
\b(?<![.,])[0-9]{9,12}(?![.,])\b
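A minimal C# check of the original pattern and both suggestions; the sample sentences are hypothetical:

```csharp
using System;
using System.Text.RegularExpressions;

const string original = @"\b[0-9]{9,12}\b";
const string pattern = @"\b(?<!\.)[0-9]{9,12}(?!\.)\b";
const string withComma = @"\b(?<![.,])[0-9]{9,12}(?![.,])\b";

// \b alone is satisfied by a following dot.
Console.WriteLine(Regex.IsMatch("id 123456789. ok", original));  // True

Console.WriteLine(Regex.IsMatch("id 123456789 ok", pattern));    // True
Console.WriteLine(Regex.IsMatch("id 123456789. ok", pattern));   // False
Console.WriteLine(Regex.IsMatch("id 1.123456789 ok", pattern));  // False
Console.WriteLine(Regex.IsMatch("id 123456789, ok", pattern));   // True
Console.WriteLine(Regex.IsMatch("id 123456789, ok", withComma)); // False
```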

regex to get substring before substring

I have a string like the following:
hi,hello,-LSB-,ASPECT,-RSB-,you
I want to extract the substring that comes before -LSB-,ASPECT, back to the preceding comma: hello in this case.
I have written a regular expression like
\b\w+[/-/,LSB/-/,ASPECT]
however, it extracts the entire substring from the start up to and including -LSB-,ASPECT, like:
hi,hello,-LSB-,ASPECT
Any clue?
The regex for this (using a positive lookahead assertion) would be
[^,]*(?=,-LSB-,ASPECT,)
Explanation:
[^,]* # Match any number of characters except commas
(?= # until the following regex can be matched:
,-LSB-,ASPECT, # the literal text ",-LSB-,ASPECT,".
) # (End of lookahead assertion)
Careful, square brackets create a character class which you don't want in this case.
Live demo
Try this:
(\w+),-LSB-,ASPECT
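Both suggestions can be checked with a short C# sketch on the string from the question: the lookahead version returns hello as the match itself, while the capture-group version needs Groups[1] because its match also contains the marker.

```csharp
using System;
using System.Text.RegularExpressions;

const string input = "hi,hello,-LSB-,ASPECT,-RSB-,you";

// Lookahead version: the match is only the comma-free run before the marker.
string viaLookahead = Regex.Match(input, @"[^,]*(?=,-LSB-,ASPECT,)").Value;

// Capture-group version: the match includes the marker, so read group 1.
string viaGroup = Regex.Match(input, @"(\w+),-LSB-,ASPECT").Groups[1].Value;

Console.WriteLine(viaLookahead); // hello
Console.WriteLine(viaGroup);     // hello
```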
