I am trying to build a regex to match strings like: Test get:all words:test
The string starts with a word, followed by any number of space-separated word:word pairs.
#"^[a-zA-Z]+/s(^[a-zA-Z]+:^[a-zA-Z]+/s)*"
You added extra start of string anchors, ^, inside the pattern, and you need to remove them for sure.
Besides, the whitespace patterns must be written as \s and the first \s must be moved inside the repeated group that should be converted into a non-capturing one ((?:...)) for better performance.
You can use
^[a-zA-Z]+(?:\s+[a-zA-Z]+:[a-zA-Z]+)*$
See the regex demo. Details:
^ - start of string
[a-zA-Z]+ - one or more ASCII letters
(?:\s+[a-zA-Z]+:[a-zA-Z]+)* - zero or more repetitions of
\s+ - one or more whitespaces
[a-zA-Z]+:[a-zA-Z]+ - one or more ASCII letters, :, one or more ASCII letters
$ - end of string (or use \z to match the very end of string).
If you meant to allow any word chars (letters, digits, connector punctuation) then replace each [a-zA-Z] with \w.
If you need to support just any Unicode letters, replace each [a-zA-Z] with \p{L}.
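As a quick sanity check, here is a minimal sketch of the pattern above in Python (the language is an assumption made only for illustration; the question does not name one, and the helper name is_valid is hypothetical). re.fullmatch is used so that a trailing newline is never silently accepted by $.

```python
import re

# Pattern from the answer above. With fullmatch the ^/$ anchors are
# redundant but harmless.
WORD_PAIRS = re.compile(r"^[a-zA-Z]+(?:\s+[a-zA-Z]+:[a-zA-Z]+)*$")

def is_valid(text):
    """Return True if text is a word followed by zero or more word:word pairs."""
    return WORD_PAIRS.fullmatch(text) is not None

if __name__ == "__main__":
    for sample in ["Test get:all words:test", "Test", "Test get:all words", "get:all"]:
        print(repr(sample), is_valid(sample))
```

Only the first two samples are accepted: the third ends with a bare word instead of a word:word pair, and the fourth has no leading word.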
I have created a regex pattern: (?<=[TCC|TCC_BHPB]\s\d{3,4})[-_\s]\d{1,2}[,]
This pattern matches only:
TCC 6005_5,
What should I change at the end to match both of these strings:
TCC 6005-5 ,
TCC 6005_5,
You can add a non-greedy wildcard to your expression (.*?):
(?<=(?:TCC|TCC_BHPB)\s\d{3,4})[-_\s]\d{1,2}.*?[,]
This will now also match any characters between the last digit and the comma.
As has been pointed out in the comments, [TCC|TCC_BHPB] is a character class rather than a literal match, so I've changed this to (?:TCC|TCC_BHPB) which is presumably what your intention was.
Try it online
This part of the pattern [TCC|TCC_BHPB] is a character class that matches one of the listed characters. It might also be written for example as [|_TCBHP]
To "match" both strings, you can match all parts instead of using a positive lookbehind.
\bTCC(?:_BHPB)?\s\d{3,4}[-_\s]\d{1,2}\s?,
See a regex demo
\bTCC A word boundary to prevent a partial match, then match TCC
(?:_BHPB)?\s\d{3,4} Optionally match _BHPB, match a whitespace char and 3-4 digits (Use [0-9] to match a digit 0-9)
[-_\s]\d{1,2} Match one of - _ or a whitespace char, then 1-2 digits
\s?, Match an optional space and ,
Note that \s can also match a newline.
Using the lookbehind:
(?<=TCC(?:_BHPB)?\s\d{3,4})[-_\s]\d{1,2}\s?,
Regex demo
Or, if you want to match only horizontal whitespace (spaces and tabs, never a newline):
\bTCC(?:_BHPB)?[\p{Zs}\t][0-9]{3,4}[-_\p{Zs}\t][0-9]{1,2}[\p{Zs}\t]*,
Regex demo
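A minimal sketch of the full-match variant, written in Python purely as an illustration (the patterns in this thread target .NET; Python's re module does not support the variable-length lookbehind or \p{Zs} used in the other variants, so only the plain pattern is shown). The names TCC, CHAR_CLASS and find_tcc are hypothetical.

```python
import re

# Full-match variant from the answer above (no lookbehind needed).
TCC = re.compile(r"\bTCC(?:_BHPB)?\s\d{3,4}[-_\s]\d{1,2}\s?,")

# The original [TCC|TCC_BHPB] is a character class: it matches exactly ONE
# character out of T, C, |, _, B, H, P, not the words TCC or TCC_BHPB.
CHAR_CLASS = re.compile(r"[TCC|TCC_BHPB]")

def find_tcc(text):
    """Return every full match of the TCC pattern in text."""
    return TCC.findall(text)

if __name__ == "__main__":
    print(find_tcc("TCC 6005-5 ,"))
    print(find_tcc("TCC 6005_5,"))
    print(CHAR_CLASS.fullmatch("|") is not None)
    print(CHAR_CLASS.fullmatch("TCC") is not None)
```

Both sample strings from the question are matched in full, while the character class accepts a lone | and rejects the three-character string TCC.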
I have to extract the string between a digit pattern and either a colon or a newline (first occurrence).
my string would look like:
05-30-1306-29-13 BUILDERS RISK:
LIMITS/DEDUCTIBLES:
I would like to extract BUILDERS RISK. There may or may not be a colon, in such case we will treat newline as the terminating pattern
Here's what I have come up with so far
\d{2}-\d{2}-\d{4}-\d{2}-\d{2}\s*\W+[^:|\n]+:\s*
The numerical pattern will always be 2-2-4-2-2 digits, followed by any string, followed by either \n or :
The regex so far gets what I need, but I don't know how to break it into different matches so I can take the second match
1st match - digit pattern
2nd match - what I need
3rd match - colon or newline
Any pointers will be helpful.
UPDATE: Couple of alternatives of the text term to be searched could be this
11-06-1212-29-12 DWELLING FIRE (DP-3): ANNUAL RENTAL
11-05-1212-26-12 HOMEOWNERS (HO-3): SECONDARY HOME
I would only want anything before colon or if that is not there, take string till newline is found. As a side note, the text of significance may not be present in same line and appear in next line but will always be followed by either a colon or newline in the same line.
PS: Extracted text should not contain colon
It appears you may use
\b(\d{2}-\d{2}-\d{4}-\d{2}-\d{2})\W+(.*?)(:?\r?\n\s*)
See the regex demo.
Details
\b - a word boundary (change to (?<!\d) if the digits can be glued to a letter or underscore)
(\d{2}-\d{2}-\d{4}-\d{2}-\d{2}) - Group 1: two digits, -, two digits, -, four digits, -, two digits, -, two digits
\W+ - 1+ non-word chars (to stay on the line, replace with [^\w\r\n]+)
(.*?) - Group 2: any zero or more chars other than newline, as few as possible
(:?\r?\n\s*) - Group 3: an optional :, an optional CR, an LF symbol and then any 0+ whitespace chars.
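To see what the three groups capture, here is a rough sketch in Python (chosen only for illustration; the helper extract and the second pattern DATED_FIRST_COLON are hypothetical and not part of the answer). Note that Group 3 requires the optional colon to sit directly before the line break, so on a line where text follows the colon, Group 2 runs to the end of the line. The variant that stops at the first colon is one possible adjustment, under the assumption that the wanted text never contains a colon itself.

```python
import re

# Pattern from the answer above, unchanged.
DATED = re.compile(r"\b(\d{2}-\d{2}-\d{4}-\d{2}-\d{2})\W+(.*?)(:?\r?\n\s*)")

# Hypothetical variant (not from the answer): end Group 2 at the first
# colon OR at the line break, whichever comes first.
DATED_FIRST_COLON = re.compile(
    r"\b(\d{2}-\d{2}-\d{4}-\d{2}-\d{2})\W+(.*?)(?::|\r?\n)"
)

def extract(pattern, text):
    """Return Group 2 (the text after the digit pattern), or None if no match."""
    m = pattern.search(text)
    return m.group(2) if m else None

if __name__ == "__main__":
    print(extract(DATED, "05-30-1306-29-13 BUILDERS RISK:\nLIMITS/DEDUCTIBLES:\n"))
    line = "11-06-1212-29-12 DWELLING FIRE (DP-3): ANNUAL RENTAL\n"
    print(extract(DATED, line))
    print(extract(DATED_FIRST_COLON, line))
```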
What is the regular expression for all characters except whitespace, with a minimum of 6 characters?
This is what I have now:
^[\w'?#&#.]{6,}$
But this does not accept all the special characters. I am using it in a .NET app, if that makes any difference.
^[^\s]{6,}$ should do it. Note that if you only want to exclude literal spaces, you should use [^ ] instead: [^\s] excludes any whitespace character (space, tab or newline).
A .NET regex to match any string that does not contain any whitespace chars (at least 6 occurrences) is
\A\S{6,}\z
See the regex demo online
Do not use $ because it may match before a final \n (LF symbol) inside a string; \z is the most appropriate anchor here as it matches the very end of the string. To make the pattern compatible with JS (if you use it in ASP.NET for both server and client side validation) you need to use ^\S{6,}$(?!\n).
The \S shorthand character class matches any character other than a Unicode whitespace char (if ECMAScript option is not used).
The {6,} limiting quantifier matches six or more occurrences of the quantified subpattern.
Details
\A - (an unambiguous anchor, its behavior cannot be redefined with any regex options) start of a string
\S{6,} - any 6 or more chars other than a Unicode whitespace char
\z - the very end of the string.
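The difference between $ and the very-end anchor is easy to miss, so here is a small sketch in Python (used only for illustration; the answer targets .NET). It assumes that Python's \Z plays the role of .NET's \z, and that Python's $ shares the "may match before a final \n" behaviour described above. The names STRICT, LOOSE and is_valid are hypothetical.

```python
import re

# Python spelling of \A\S{6,}\z: \Z is Python's "very end of string" anchor.
STRICT = re.compile(r"\A\S{6,}\Z")

# Same check with ^ and $, which tolerates one trailing newline.
LOOSE = re.compile(r"^\S{6,}$")

def is_valid(text):
    """Return True for 6 or more non-whitespace characters and nothing else."""
    return STRICT.search(text) is not None

if __name__ == "__main__":
    for sample in ["abcdef", "abcde", "abc def", "abcdef\n"]:
        print(repr(sample), is_valid(sample), LOOSE.search(sample) is not None)
```

The last sample is where the two patterns disagree: the $ version accepts the string with the trailing newline, the strict version rejects it.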
I am trying to match the following pattern.
A minimum of 3 'groups' of alphanumeric characters separated by a hyphen.
Eg: ABC1-AB-B5-ABC1
Each group can be any number of characters long.
I have tried the following:
^(\w*(-)){3,}?$
This gives me what I want to an extent.
ABC1-AB-B5-0001 fails, and ABC1-AB-B5-0001- passes.
I don't want the trailing hyphen to be a requirement.
I can't figure out how to modify the expression.
Your ^(\w*(-)){3,}?$ pattern even allows a string like ----- because the only required pattern here is a hyphen: \w* may match 0 word chars. The - may be both leading and trailing because of that.
You may use
\A\w+(?:-\w+){2,}\z
Details:
\A - start of string
\w+ - 1+ word chars (that is, letters, digits or _ symbols)
(?:-\w+){2,} - 2 or more sequences of:
- - a single hyphen
\w+ - 1 or more word chars
\z - the very end of string.
See the regex demo.
Or, if you do not want to allow _:
\A[^\W_]+(?:-[^\W_]+){2,}\z
or to only allow ASCII letters and digits:
\A[A-Za-z0-9]+(?:-[A-Za-z0-9]+){2,}\z
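A short sketch comparing the original pattern with the suggested one, in Python for illustration only (the answer uses .NET's \z, for which Python's \Z is assumed to be the equivalent; the names GROUPS, ORIGINAL and is_valid are hypothetical).

```python
import re

# Suggested pattern: a word, then at least two "-word" sequences.
GROUPS = re.compile(r"\A\w+(?:-\w+){2,}\Z")

# Pattern from the question: the only mandatory part of each repetition
# is the hyphen, so it demands a trailing hyphen and accepts "-----".
ORIGINAL = re.compile(r"^(\w*(-)){3,}?$")

def is_valid(text):
    """Return True if text has at least 3 hyphen-separated word groups."""
    return GROUPS.search(text) is not None

if __name__ == "__main__":
    for sample in ["ABC1-AB-B5-ABC1", "ABC1-AB-B5-0001-", "ABC1-AB", "-----"]:
        print(repr(sample), is_valid(sample), ORIGINAL.search(sample) is not None)
```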
It can be like this:
^\w+-\w+-\w+(-\w+)*$
^(\w+-){2,}(\w+)-?$
Matches 2+ groups separated by a hyphen, then a single group possibly terminated by a hyphen.
((?:-?\w+){3,})
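Matches a minimum of 3 groups, each optionally preceded by a hyphen, thus ignoring the trailing hyphen. Because the hyphen is optional, a single run of word characters can be split into several groups, so this also matches an unhyphenated string such as ABC.
Note that the \w word character also matches the underscore char _ as well as 0-9 and a-z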
link to demo
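The three alternatives above behave differently on edge cases. The following Python sketch (illustrative only; the names FIXED_THREE, TRAILING_OK and OPTIONAL_HYPHEN are hypothetical labels for the three patterns) makes the differences visible.

```python
import re

# The three alternative patterns, copied unchanged.
FIXED_THREE = re.compile(r"^\w+-\w+-\w+(-\w+)*$")
TRAILING_OK = re.compile(r"^(\w+-){2,}(\w+)-?$")
OPTIONAL_HYPHEN = re.compile(r"((?:-?\w+){3,})")

if __name__ == "__main__":
    samples = ["ABC1-AB-B5-0001", "ABC1-AB-B5-0001-", "ABC1-AB", "ABC"]
    for s in samples:
        print(
            repr(s),
            FIXED_THREE.search(s) is not None,
            TRAILING_OK.search(s) is not None,
            # fullmatch: the third pattern has no anchors of its own
            OPTIONAL_HYPHEN.fullmatch(s) is not None,
        )
```

The first pattern rejects a trailing hyphen, the second tolerates one, and the third accepts ABC because each of its three letters can count as a group.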
I want to match any string that does not contain the string "DontMatchThis".
What's the regex?
try this:
^(?!.*DontMatchThis).*$
The regex to match a string that does not contain a certain pattern is
(?s)^(?!.*DontMatchThis).*$
If you use the pattern without the (?s) (which is an inline version of the RegexOptions.Singleline flag that makes . match a newline LF symbol as well as all other characters), the DontMatchThis will only be searched for on the first line, and only a string without LF symbols will be matched with .*.
Pattern details:
(?s) - a DOTALL/Singleline modifier making . match any character
^ - start of string anchor
(?!.*DontMatchThis) - a negative lookahead that fails the match if any zero or more characters (matched with the greedy .* subpattern) are followed by DontMatchThis. NOTE: a lazy .*? version (matching as few characters as possible before the next subpattern) might get the job done quicker if DontMatchThis is expected closer to the string start
.* - any zero or more characters, as many as possible, up to
$ - the end of string (see Anchor Characters: Dollar ($)).
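A minimal sketch of the lookahead in Python (for illustration only; the answer is about .NET, where (?s) corresponds to RegexOptions.Singleline, and the names NO_FORBIDDEN, FIRST_LINE_ONLY and is_clean are hypothetical). The second pattern isolates the lookahead without (?s) to show that it then only inspects the first line.

```python
import re

# Pattern from the answer: (?s) lets . cross line breaks, so the lookahead
# scans the whole string for the forbidden text.
NO_FORBIDDEN = re.compile(r"(?s)^(?!.*DontMatchThis).*$")

# The lookahead on its own, without (?s): .* stops at the first newline,
# so a forbidden string on a later line goes unnoticed.
FIRST_LINE_ONLY = re.compile(r"^(?!.*DontMatchThis)")

def is_clean(text):
    """Return True if text does not contain DontMatchThis anywhere."""
    return NO_FORBIDDEN.search(text) is not None

if __name__ == "__main__":
    for sample in ["hello world", "xx DontMatchThis yy", "line1\nDontMatchThis"]:
        print(repr(sample), is_clean(sample))
    print(FIRST_LINE_ONLY.search("line1\nDontMatchThis") is not None)
```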