Regular Expression to validate email ending in .edu - c#

I am trying to create a regex validation attribute in asp.net mvc to validate that an entered email has the .edu TLD.
I have tried the following but the expression never validates to true...
[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*#(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+edu
and
\w.\w#{1,1}\w[.\w]?.edu
Can anyone provide some insight?

This should work for you:
^[a-zA-Z0-9._%+-]+#[a-zA-Z0-9.+-]+\.edu$
Breakdown since you said you were weak at RegEx:
^ Beginning of string
[a-zA-Z0-9._%+-]+ one or more letters, numbers, dots, underscores, percent-signs, plus-signs or dashes
# #
[a-zA-Z0-9.+-]+ one or more letters, numbers, dots, plus-signs or dashes
\.edu .edu
$ End of string

if you're using asp.net mvc validation attributes, your regular expression actually has to be coded with javascript regex syntax, and not c# regex syntax. Some symbols are the same, but you have to be weary about that.
You want your attribute to look like the following:
[RegularExpression(#"([0-9]|[a-z]|[A-Z])+#([0-9]|[a-z]|[A-Z])+\.edu$", ErrorMessage = "text to display to user")]
the reason you include the # before the string is to make a literal string, because I believe c# will apply its own escape sequences before it passes it to the regex
(a|b|c) matches either an 'a' or 'b' or 'c'. [a-z] matches all characters between a and z, and the similar for capital letters and numerals so, ([0-9]|[a-z]|[A-Z]) matches any alphanumeric character
([0-9]|[a-z]|[A-Z])+ matches 1 or more alphanumeric characters. + in a regular expression means 1 or more of the previous
# is for the '#' symbol in an email address. If it doesn't work, you might have to escape it, but i don't know of any special meaning for # in a javascript regex
Let's simplify it more
[RegularExpression(#"\w+#\w+\.edu$", ErrorMessage = "text to display to user")]
\w stands for any alphanumeric character including underscore
read some regex documentation at https://developer.mozilla.org/en/JavaScript/Guide/Regular_Expressions for more information

You may have different combinations and may be this very simple one :
\S+#\S+\.\S+\.edu

try this:
Regex regex = new Regex(#"^[A-Z0-9._%+-]+#[A-Z0-9.-]+\.(edu)$", RegexOptions.IgnoreCase);
ANSWER UPDATED...

Related

Input string that takes in only alphanumeric, # sign and - sign and spaces

Tried using the following regex code but the - key cant be accepted into my input textbox. Please assist!
My code is as followed:
if (Regex.IsMatch(textBox_address.Text, #"^[a-zA-Z0-9#- ]+$"))
Escape the - by replacing it by \-:
^[a-zA-Z0-9#\- ]+$
As you may see in this expression the [.-.] if used to define a set of characters. To explain the regex parser, that your character has not this meaning use \ to escape it.
It would be the same thing if you want to a regex that matches only numbers and [.
To do it : ^[0-9\[]+$ otherwise the regex can't be parsed.

C# Regex match on special characters

I know this stuff has been talked about a lot, but I'm having a problem trying to match the following...
Example input: "test test 310-315"
I need a regex expression that recognizes a number followed by a dash, and returns 310. How do I include the dash in the regex expression though. So the final match result would be: "310".
Thanks a lot - kcross
EDIT: Also, how would I do the same thing but with the dash preceding, but also take into account that the number following the dash could be a negative number... didnt think of this one when I wrote the question immediately. for example: "test test 310--315" returns -315 and "test 310-315" returns 315.
Regex regex = new Regex(#"\d+(?=\-)");
\d+ - Looks for one or more digits
(?=\-) - Makes sure it is followed by a dash
The # just eliminates the need to escape the backslashes to keep the compiler happy.
Also, you may want this instead:
\d+(?=\-\d+)
This will check for a one or more numbers, followed by a dash, followed by one or more numbers, but only match the first set.
In response to your comment, here's a regex that will check for a number following a -, while accounting for potential negative (-) numbers:
Regex regex = new Regex(#"(?<=\-)\-?\d+");
(?<=\-) - Negative lookbehind which will check and make sure there is a preceding -
\-? - Checks for either zero or one dashes
\d+ - One or more digits
(?'number'\d+)- will work ( no need to escape ). In this example the group containing the single number is the named group 'number'.
if you want to match both groups with optional sign try:
#"(?'first'-?\d+)-(?'second'-?\d+)"
See it working here.
Just to describe, nothing complicated, just using -? to match an optional - and \d+ to match one or more digit. a literal - match itself.
here's some documentation that I use:
http://www.mikesdotnetting.com/Article/46/CSharp-Regular-Expressions-Cheat-Sheet
in the comments section of that page, it suggests escaping the dash with '\-'
make sure you escape your escape character \
You would escape the special meaning of - in regex language (means range) using a backslash (\). Since backslash has a special meaning in C# literals to escape quotes or be part of some characters, you need to escape that with another backslash(\). So essentially it would be \d+\\-.
\b\d*(?=\-) you will want to look ahead for the dash
\b = is start at a word boundry
\d = match any decimal digit
* = match the previous as many times as needed
(?=\-) = look ahead for the dash
Edited for Formatting issue with the slash not showing after posting

Simple Regex Question

I am new to regex (15 minutes of experience) so I can't figure this one out. I just want something that will match an alphanumeric string with no spaces in it. For example:
"ThisIsMyName" should match, but
"This Is My Name" should not match.
^[a-zA-Z0-9]+$ will match any letters and any numbers with no spaces (or any punctuation) in the string. It will also require at least one alphanumeric character. This uses a character class for the matching. Breakdown:
^ #Match the beginning of the string
[ #Start of a character class
a-z #The range of lowercase letters
A-Z #The range of uppercase letters
0-9 #The digits 0-9
] #End of the character class
+ #Repeat the previous one or more times
$ #End of string
Further, if you want to "capture" the match so that it can be referenced later, you can surround the regex in parens (a capture group), like so:
^([a-zA-Z0-9]+)$
Even further: since you tagged this with C#, MSDN has a little howto for using regular expressions in .NET. It can be found here. You can also note the fact that if you run the regex with the RegexOptions.IgnoreCase flag then you can simplify it to:
^([a-z0-9])+$
this will match any sequence of non-space characters:
\S+
Take a look at this link for a good basic Regex information source: http://regexlib.com/CheatSheet.aspx
They also have a handy testing tool that I use quite a bit: http://regexlib.com/RETester.aspx
That said, #eldarerathis' or #Nicolas Bottarini's answers should work for you.
I have just written a blog entry about regex, maybe it's something you may find useful:)
http://blogs.appframe.com/erikv/2010-09-23-Regular-Expression
Try using this regex to see if it works: (\w+)

How to check for special characters using regex

NET. I have created a regex validator to check for special characters means I donot want any special characters in username. The following is the code
Regex objAlphaPattern = new Regex(#"[a-zA-Z0-9_#.-]");
bool sts = objAlphaPattern.IsMatch(username);
If I provide username as $%^&asghf then the validator gives as invalid data format which is the result I want but If I provide a data s23_#.-^&()%^$# then my validator should block the data but my validator allows the data which is wrong
So how to not allow any special characters except a-z A-A 0-9 _ # .-
Thanks
Sunil Kumar Sahoo
There's a few things wrong with your expression. First you don't have the start string character ^ and end string character $ at the beginning and end of your expression meaning that it only has to find a match somewhere within your string.
Second, you're only looking for one character currently. To force a match of all the characters you'll need to use * Here's what it should be:
Regex objAlphaPattern = new Regex(#"^[a-zA-Z0-9_#.-]*$");
bool sts = objAlphaPattern.IsMatch(username);
Your pattern checks only if the given string contains any "non-special" character; it does not exclude the unwanted characters. You want to change two things; make it check that the whole string contains only allowed characters, and also make it check for more than one character:
^[a-zA-Z0-9_#.-]+$
Added ^ before the pattern to make it start matching at the beginning of the string. Also added +$ after, + to ensure that there is at least one character in the string, and $ to make sure that the string is matched to the end.
Change your regex to ^[a-zA-Z0-9_#.-]+$. Here ^ denotes the beginning of a string, $ is the end of the string.

why do these regex tests let certain characters pass?

I am checking a string with the following regexes:
[a-zA-Z0-9]+
[A-Za-z]+
For some reason, the characters:
.
-
_
are allowed to pass, why is that?
If you want to check that the complete string consists of only the wanted characters you need to anchor your regex like follows:
^[a-zA-Z0-9]+$
Otherwise every string will pass that contains a string of the allowed characters somewhere. The anchors essentially tell the regular expression engine to start looking for those characters at the start of the string and stop looking at the end of the string.
To clarify: If you just use [a-zA-Z0-9]+ as your regex, then the regex engine would rightfully reject the string -__-- as the regex doesn't match against that. There is no single character from the character class you defined.
However, with the string a-b it's different. The regular expression engine will match the first a here since that matches the expression you entered (at least one of the given characters) and won't care about the - or the b. It has done its job and successfully matched a substring according to your regular expression.
Similarly with _-abcdef- – the regex will match the substring abcdef just fine, because you didn't tell it to match only at the start or end of the string; and ignore the other characters.
So when using ^[a-zA-Z0-9]+$ as your regex you are telling the regex engine definitely that you are looking for one or more letters or digits, starting at the very beginning of the string right until the end of the string. There is no room for other characters to squeeze in or hide so this will do what you apparently want. But without the anchors, the match can be anywhere in your search string. For validation purposes you always want to use those anchors.
In regular expressions the + tells the engine to match one or more characters.
So this expression [A-Za-z]+ passes if the string contains a sequence of 1 or more alphabetic characters. The only strings that wouldn't pass are strings that contain no alphabetic characters at all.
The ^ symbol anchors the character class to the beginning of the string and the $ symbol anchors to the end of the string.
So ^[A-Za-z0-9]+ means 'match a string that begins with a sequence of one or more alphanumeric characters'. But would allow strings that include non-alphanumerics so long as those characters were not at the beginning of the string.
While ^[A-Za-z0-9]+$ means 'match a string that begins and ends with a sequence of one or more alphanumeric characters'. This is the only way to completely exclude non-alphanumerics from a string.

Categories