Regex to find a hyphen not surrounded by characters - C#

I need help to build a regular expression to find all hyphens whose previous and next characters are not a-z and A-Z. The following are the examples in which hyphens should be found.
This is - test
this is -test
this is- test
this is 2- test
this is 2 -test
this is 2-2 test
The following is an example in which the hyphen should be ignored:
this is-test
So far I am able to write this:
(?<=[^a-z])-(?=[^a-z])
But this only finds the hyphens in the following lines:
This is - test
this is 2- test
this is 2-2 test
Many thanks.

First, instead of using a negated class in a positive Lookahead/Lookbehind, you could use negative Lookaheads/Lookbehinds instead (unless you want to make sure that the hyphen is preceded and followed by something). Now, your pattern means:
Match a hyphen that is not preceded by [a-z] and not followed
by [a-z].
Whereas you seem to be looking for:
Match a hyphen that is not both preceded by [a-z] and followed by
[a-z] at the same time.
In which case, you may use the following:
(?<![a-z])-|-(?![a-z])
Or if you want to keep the positive Lookarounds with negated classes:
(?<=[^a-z])-|-(?=[^a-z])
Note: You mentioned that you want to check for both a-z and A-Z but in your example, you only used a-z. To check for both, you may replace [a-z] with [a-zA-Z] in the pattern above.
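A minimal sketch of how the first pattern behaves on the question's sample lines, with [a-z] widened to [a-zA-Z] as the note suggests. The variable names are only illustrative, and the code uses C# top-level statements, which assumes a recent compiler.

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern from the answer, widened to [a-zA-Z].
var hyphen = new Regex(@"(?<![a-zA-Z])-|-(?![a-zA-Z])");

string[] samples =
{
    "This is - test",
    "this is -test",
    "this is- test",
    "this is 2- test",
    "this is 2 -test",
    "this is 2-2 test",
    "this is-test",   // the only line where the hyphen should be ignored
};

foreach (var line in samples)
{
    // 1 means the hyphen was found, 0 means it was skipped.
    Console.WriteLine($"{line,-18} -> {hyphen.Matches(line).Count}");
}
```

Every line reports 1 except the last one, which reports 0 because its hyphen has a letter on both sides.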


Regular expression to match capture group not preceded by certain characters

I want to write a regex that will match if and only if the pattern is not preceded by the characters "Etc/".
Strings that should match:
GMT+01:00
UTC+01:00
UTC+01
+01:00
...
Strings that should not match:
Etc/GMT+01:00
Etc/UTC+01:00
Etc/UTC+01
...
This is what I have so far:
(?<!Etc\/)((UTC|GMT)?(\+|\-){1}(\d{1,2})(:|\.)?(\d{1,2})?)
The right part of the above regular expression already matches the UTC and GMT offset and covers all the cases I need. But I don't manage to implement the exceptions mentioned above.
I expected the above regex to not match the string Etc/GMT+01:00. But in fact it matches the part +01:00 and only ignores Etc/GMT.
How can I make the following regular expression not match when it is preceded by "Etc/"?
(UTC|GMT)?(\+|\-){1}(\d{1,2})(:|\.)?(\d{1,2})?
Here I have an example with most of the use cases I need.
You may add \S* after Etc/ to make sure Etc/ is checked even if there are any zero or more non-whitespace chars between Etc/ and the expected match:
(?<!\bEtc/\S*)((UTC|GMT)?([+-])(\d{1,2})[:.]?(\d{1,2})?)
See the .NET regex demo
Details:
(?<!\bEtc/\S*) - a negative lookbehind that matches a location that is not immediately preceded with a whole word Etc/ and then zero or more non-whitespace chars
(UTC|GMT)? - an optional substring, UTC or GMT
([+-]) - + or -
(\d{1,2}) - one or two digits
[:.]? - an optional : or .
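(\d{1,2})? - an optional sequence of one or two digits (matches the same text as (\d{0,2}), though with (\d{0,2}) the group always participates and may capture an empty string).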
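As a rough check of the lookbehind pattern above, the following sketch runs it against the strings from the question. It relies on .NET's support for variable-length lookbehind, and the variable names are only illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// Variable-length lookbehind: supported by .NET, but not by every regex engine.
var offset = new Regex(@"(?<!\bEtc/\S*)((UTC|GMT)?([+-])(\d{1,2})[:.]?(\d{1,2})?)");

string[] inputs =
{
    "GMT+01:00", "UTC+01:00", "UTC+01", "+01:00",   // should match
    "Etc/GMT+01:00", "Etc/UTC+01:00", "Etc/UTC+01", // should not match
};

foreach (var s in inputs)
{
    Console.WriteLine($"{s,-14} -> {offset.IsMatch(s)}");
}
```

The first four inputs print True and the three Etc/ inputs print False, because every candidate start position inside them is preceded by Etc/ plus non-whitespace characters.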
As you are already capturing all data in groups, another way could be getting all the matches of Etc/ out of the way, and use your pattern to capture what you want in the groups.
Note that you can change groupings of single chars like (:|\.) to a character class ([:.])
\bEtc/\S*|(UTC|GMT)?([+-])(\d{1,2})([:.])?(\d{1,2})?
\bEtc/\S* Match Etc/ and optional non whitespace chars
| Or
(UTC|GMT)?([+-])(\d{1,2})([:.])?(\d{1,2})? Your pattern with all the separate groups.
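One way to use this alternation from C# is sketched below: iterate over all matches and keep only those where the sign group participated, since the Etc/ alternative sets no groups. The sample text and variable names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

var pattern = new Regex(@"\bEtc/\S*|(UTC|GMT)?([+-])(\d{1,2})([:.])?(\d{1,2})?");

// Hypothetical input mixing wanted offsets with Etc/ zone names.
string text = "GMT+01:00 Etc/GMT+02:00 UTC+03 Etc/UTC+04 +05:30";

var offsets = new List<string>();
foreach (Match m in pattern.Matches(text))
{
    // Group 2 is the sign; it only participates when the right-hand alternative matched.
    if (m.Groups[2].Success)
    {
        offsets.Add(m.Value);
    }
}

Console.WriteLine(string.Join(", ", offsets)); // GMT+01:00, UTC+03, +05:30
```

The Etc/ entries are still matched, but they are consumed by the left-hand alternative and then discarded by the group check.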
Or with just a single group:
\bEtc/\S*|((?:GMT|UTC)?\+\d{2}(?:[:.]\d{2})?)

Regex for alphanumeric string in C# accepting underscores and white space

I have already gone through many posts on SO, but I didn't find what I needed for my specific scenario.
I need a regex for an alphanumeric string
where the following conditions should be matched.
Valid string:
ameya123 (alphabets and numbers)
ameya (only alphabets)
AMeya12 (capital and lowercase letters and numbers)
Ameya_123 (alphabets and underscore and numbers)
Ameya_ 123 (alphabets, underscore and white spaces)
Invalid string:
123 (only numbers)
_ (only underscore)
(only space) (only white spaces)
any special character other than underscore
What I have tried so far:
(?=.*[a-zA-Z])(?=.*[0-9]*[\s]*[_]*)
The above regex works in an online regex editor, but it does not work in a data annotation in C#.
Please suggest.
Based on your requirements and not your attempt, what you are in need of is this:
^(?!(?:\d+|_+| +)$)[\w ]+$
The negative lookahead looks for undesired matches to fail the whole process. Those are strings containing digits only, underscores only or spaces only. If they never happen we want to have a match for ^[\w ]+$ which is nearly the same as ^[a-zA-Z0-9_ ]+$.
See live demo here
Explanation:
^ Start of line / string
(?! Start of negative lookahead
(?: Start of non-capturing group
\d+ Match digits
| Or
_+ Match underscores
| Or
[ ]+ Match spaces
)$ End of non-capturing group immediately followed by end of line / string (none of previous matches should be found)
) End of negative lookahead
[\w ]+$ Match a character inside the character set up to end of input string
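Note: \w is shorthand for [a-zA-Z0-9_] in engines such as PCRE unless the u modifier is set. In .NET, \w also matches Unicode letters and digits unless RegexOptions.ECMAScript is used.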
One problem with your regex is that in annotations, the regex must match and consume the entire string input, while your pattern only contains lookarounds that do not consume any text.
You may use
^(?!\d+$)(?![_\s]+$)[A-Za-z0-9\s_]+$
See the regex demo. Note that \w (when used for a server-side validation, and thus parsed with the .NET regex engine) will also allow any Unicode letters, digits and some more stuff when validating on the server side, so I'd rather stick to [A-Za-z0-9_] to be consistent with both server- and client-side validation.
Details
^ - start of string (not necessary here, but good to have when debugging)
(?!\d+$) - a negative lookahead that fails the match if the whole string consists of digits
(?![_\s]+$) - a negative lookahead that fails the match if the whole string consists of underscores and/or whitespaces. NOTE: if you plan to only disallow ____ or " " like inputs, you need to split this lookahead into (?!_+$) and (?!\s+$)
[A-Za-z0-9\s_]+ - 1+ ASCII letters, digits, _ and whitespace chars
$ - end of string (not necessary here, but still good to have).
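A quick sketch that checks this pattern against the valid and invalid strings listed in the question. Regex.IsMatch stands in for the data annotation here; because the pattern is anchored with ^ and $, it has to cover the whole input, which is what the annotation requires anyway. The extra input "ameya!" is an assumed example of a string containing a special character.

```csharp
using System;
using System.Text.RegularExpressions;

var valid = new Regex(@"^(?!\d+$)(?![_\s]+$)[A-Za-z0-9\s_]+$");

string[] inputs =
{
    "ameya123", "ameya", "AMeya12", "Ameya_123", "Ameya_ 123", // expected: True
    "123", "_", " ", "ameya!",                                  // expected: False
};

foreach (var s in inputs)
{
    Console.WriteLine($"\"{s}\" -> {valid.IsMatch(s)}");
}
```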
If I understand your requirements correctly, you need to match one or more letters (uppercase or lowercase), and possibly zero or more of digits, whitespace, or underscore. This implies the following pattern:
^[A-Za-z0-9\s_]*[A-Za-z][A-Za-z0-9\s_]*$
Demo
In the demo, I have replaced \s with \t \r, because \s was matching across all lines.
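The two approaches are not equivalent, which a small sketch can show. The lookahead pattern from the earlier answer only rejects all-digit and all-underscore/whitespace strings, while the pattern here insists on at least one letter, so inputs that mix digits with underscores or spaces are treated differently. The test inputs "12_" and "1 2" are assumptions chosen to expose that difference.

```csharp
using System;
using System.Text.RegularExpressions;

var lookahead = new Regex(@"^(?!\d+$)(?![_\s]+$)[A-Za-z0-9\s_]+$");
var needsLetter = new Regex(@"^[A-Za-z0-9\s_]*[A-Za-z][A-Za-z0-9\s_]*$");

foreach (var s in new[] { "Ameya_ 123", "123", "12_", "1 2" })
{
    // Both agree on the first two inputs; they disagree on the last two.
    Console.WriteLine($"\"{s}\": lookahead={lookahead.IsMatch(s)}, needsLetter={needsLetter.IsMatch(s)}");
}
```

Which behaviour is right depends on whether a string such as "12_" should count as valid, which the question does not say.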
Unlike the answers given by @revo and @wiktor, I don't have a fancy looking explanation to the regex. I am beautiful even without my makeup on. Honestly, if you don't understand the pattern I gave, you might want to review a good regex tutorial.
This simple RegEx should do it:
[a-zA-Z]+[0-9_ ]*
One or more letters, followed by zero or more digits, underscores, or spaces.
This one should be good:
[\w\s_]*[a-zA-Z]+[\w\s_]*

C# Regex match on special characters

I know this stuff has been talked about a lot, but I'm having a problem trying to match the following...
Example input: "test test 310-315"
I need a regular expression that recognizes a number followed by a dash and returns the number, here 310. How do I include the dash in the expression, though? So the final match result would be: "310".
Thanks a lot - kcross
EDIT: Also, how would I do the same thing but with the dash preceding, while taking into account that the number following the dash could be negative? I didn't think of this one when I wrote the question. For example: "test test 310--315" returns -315 and "test 310-315" returns 315.
Regex regex = new Regex(@"\d+(?=\-)");
\d+ - Looks for one or more digits
(?=\-) - Makes sure it is followed by a dash
The @ just eliminates the need to escape the backslashes to keep the compiler happy.
Also, you may want this instead:
\d+(?=\-\d+)
This will check for one or more digits, followed by a dash, followed by one or more digits, but only match the first set.
In response to your comment, here's a regex that will check for a number following a -, while accounting for potential negative (-) numbers:
Regex regex = new Regex(@"(?<=\-)\-?\d+");
(?<=\-) - Positive lookbehind which will check and make sure there is a preceding -
\-? - Checks for either zero or one dashes
\d+ - One or more digits
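Putting both lookaround patterns together, a minimal sketch against the inputs from the question might look like this. The dash is left unescaped because it has no special meaning outside a character class, and the variable names are only illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// Number immediately followed by a dash (lookahead, so the dash is not part of the match).
string before = Regex.Match("test test 310-315", @"\d+(?=-)").Value;

// Number immediately preceded by a dash, keeping an optional minus sign.
string afterNegative = Regex.Match("test test 310--315", @"(?<=-)-?\d+").Value;
string afterPositive = Regex.Match("test 310-315", @"(?<=-)-?\d+").Value;

Console.WriteLine(before);        // 310
Console.WriteLine(afterNegative); // -315
Console.WriteLine(afterPositive); // 315
```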
(?'number'\d+)- will work ( no need to escape ). In this example the group containing the single number is the named group 'number'.
if you want to match both groups with optional sign try:
@"(?'first'-?\d+)-(?'second'-?\d+)"
See it working here.
To explain: there is nothing complicated here. -? matches an optional -, \d+ matches one or more digits, and a literal - matches itself.
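A short sketch of reading the two named groups from C#, using the sample input from the question. The group names first and second come from the pattern above; everything else is illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

var range = new Regex(@"(?'first'-?\d+)-(?'second'-?\d+)");

Match m = range.Match("test test 310--315");
if (m.Success)
{
    // Named groups are read through the Groups indexer.
    Console.WriteLine(m.Groups["first"].Value);  // 310
    Console.WriteLine(m.Groups["second"].Value); // -315
}
```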
here's some documentation that I use:
http://www.mikesdotnetting.com/Article/46/CSharp-Regular-Expressions-Cheat-Sheet
In the comments section of that page, it suggests escaping the dash with '\-'.
Make sure you escape your escape character \
You would escape the special meaning of - in regex language (it denotes a range inside a character class) using a backslash (\). Since the backslash has a special meaning in regular C# string literals (escaping quotes or forming escape sequences), you need to escape it with another backslash (\) unless you use a verbatim string. So essentially it would be \\d+\\-.
\b\d*(?=\-) you will want to look ahead for the dash
\b = start at a word boundary
\d = match any decimal digit
* = match the previous as many times as needed
(?=\-) = look ahead for the dash
Edited for Formatting issue with the slash not showing after posting

Simple Regex Question

I am new to regex (15 minutes of experience) so I can't figure this one out. I just want something that will match an alphanumeric string with no spaces in it. For example:
"ThisIsMyName" should match, but
"This Is My Name" should not match.
^[a-zA-Z0-9]+$ will match any letters and any numbers with no spaces (or any punctuation) in the string. It will also require at least one alphanumeric character. This uses a character class for the matching. Breakdown:
^ #Match the beginning of the string
[ #Start of a character class
a-z #The range of lowercase letters
A-Z #The range of uppercase letters
0-9 #The digits 0-9
] #End of the character class
+ #Repeat the previous one or more times
$ #End of string
Further, if you want to "capture" the match so that it can be referenced later, you can surround the regex in parens (a capture group), like so:
^([a-zA-Z0-9]+)$
Even further: since you tagged this with C#, MSDN has a little howto for using regular expressions in .NET. It can be found here. Note also that if you run the regex with the RegexOptions.IgnoreCase flag, you can simplify it to:
^([a-z0-9]+)$
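For illustration, a minimal sketch of the case-insensitive variant applied to the two strings from the question. The capture group is dropped here because only a yes/no answer is needed.

```csharp
using System;
using System.Text.RegularExpressions;

// IgnoreCase lets [a-z] cover uppercase letters as well.
var alnum = new Regex("^[a-z0-9]+$", RegexOptions.IgnoreCase);

Console.WriteLine(alnum.IsMatch("ThisIsMyName"));    // True
Console.WriteLine(alnum.IsMatch("This Is My Name")); // False
Console.WriteLine(alnum.IsMatch(""));                // False: + needs at least one character
```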
this will match any sequence of non-space characters:
\S+
Take a look at this link for a good basic Regex information source: http://regexlib.com/CheatSheet.aspx
They also have a handy testing tool that I use quite a bit: http://regexlib.com/RETester.aspx
That said, @eldarerathis' or @Nicolas Bottarini's answers should work for you.
I have just written a blog entry about regex, maybe it's something you may find useful:)
http://blogs.appframe.com/erikv/2010-09-23-Regular-Expression
Try using this regex to see if it works: (\w+)

Regex: How to not match the last character of a word?

I am trying to create a regex that matches a word (a-z only) unless the word has a : on the end. However, this word is in the middle of a larger regex, so I don't think I can use a negative lookbehind and the $ metacharacter.
I tried this negative lookahead instead:
([a-z]+)(?!:)
but this test case
example:
just matches to
exampl
instead of failing.
If you are using a negative lookahead, you could put it at the beginning:
(?![a-z]*:)[a-z]+
i.e: "match at least one a-z char, except if the following chars are 0 to n 'a-z' followed by a ':'"
That would support a larger regex:
X(?![a-z]*:)[a-z]+Y
would match in the following string:
Xeee Xrrr:Y XzzzY XfffZ
only 'XzzzY'
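The difference between the question's attempt and the leading lookahead can be seen in a small sketch. It first reproduces the backtracking that yields "exampl", then applies the answer's pattern on its own and inside the X...Y illustration. The variable names are only illustrative.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// The question's attempt: the engine gives back one letter so that (?!:) succeeds.
string attempt = Regex.Match("example:", @"([a-z]+)(?!:)").Value;

// Lookahead placed before the word: no position in "example:" can start a match.
bool rejected = !Regex.IsMatch("example:", @"(?![a-z]*:)[a-z]+");
string accepted = Regex.Match("example", @"(?![a-z]*:)[a-z]+").Value;

// Embedded in a larger pattern, as in the X...Y illustration.
string[] hits = Regex.Matches("Xeee Xrrr:Y XzzzY XfffZ", @"X(?![a-z]*:)[a-z]+Y")
                     .Cast<Match>()
                     .Select(m => m.Value)
                     .ToArray();

Console.WriteLine(attempt);                // exampl
Console.WriteLine(rejected);               // True
Console.WriteLine(accepted);               // example
Console.WriteLine(string.Join(",", hits)); // XzzzY
```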
Try this:
[a-z]\s
([a-z]+\b)(?!:)
asserts a word boundary at the end of the match and thus will fail "exampl"
[a-z]+(?![:a-z])
