Regular Expression for Email #2

Regular Expression for Email #2 - c#

I've got a regular expression that I am using to check a string to see whether it is an email address:
@"^((([\w]+\.[\w]+)+)|([\w]+))@(([\w]+\.)+)([A-Za-z]{1,3})$"
This works fine for all the email addresses I've tested, provided the part before the '@' is at least four characters long.
Works:
web1@domain.co.uk
Doesn't work:
web@domain.co.uk
How can I change the regex to allow prefixes of fewer than four characters?

The 'standard' regex used in the ASP.NET MVC account models for email validation is as follows:
@"^[\w-]+(\.[\w-]+)*@([a-z0-9-]+(\.[a-z0-9-]+)*?\.[a-z]{2,6}|(\d{1,3}\.){3}\d{1,3})(:\d{4})?$"
It allows one or more characters before the @.
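A minimal sketch of calling that pattern from C#. The class and method names (`MvcEmailCheck`, `IsValid`, `IsValidCaseSensitive`) are hypothetical, and the 250 ms match timeout is an assumed safety margin. The domain part of the pattern only lists lowercase `[a-z]`, so this sketch passes `RegexOptions.IgnoreCase`; without it, an address such as `WEB@DOMAIN.COM` is rejected even though the local part accepts uppercase through `\w`.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MvcEmailCheck
{
    // The ASP.NET MVC pattern quoted above, unchanged.
    private const string Pattern =
        @"^[\w-]+(\.[\w-]+)*@([a-z0-9-]+(\.[a-z0-9-]+)*?\.[a-z]{2,6}|(\d{1,3}\.){3}\d{1,3})(:\d{4})?$";

    // Case-insensitive match with a timeout, so uppercase domains are accepted.
    public static bool IsValid(string email) =>
        email != null &&
        Regex.IsMatch(email, Pattern, RegexOptions.IgnoreCase, TimeSpan.FromMilliseconds(250));

    // Same pattern without IgnoreCase, for comparison.
    public static bool IsValidCaseSensitive(string email) =>
        email != null && Regex.IsMatch(email, Pattern);
}
```

With this sketch, `MvcEmailCheck.IsValid("web@domain.co.uk")` and `MvcEmailCheck.IsValid("a@domain.com")` both return true, since `[\w-]+` needs only one character before the @.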

I believe the best way to check that an email address is valid is to make the user type it twice, then send them an email containing a validation link and confirm that they received it.
Check your regex against a list of weird but valid email addresses and you will see that regexes are not well suited to email validation.

I recommend not using a regex to validate email (for the reasons outlined here): http://davidcel.is/blog/2012/09/06/stop-validating-email-addresses-with-regex/
If you can't send a confirmation email, a good alternative in C# is to try creating a MailAddress and check whether that fails.
If you're using ASP.NET, you can use a CustomValidator to call this validation method.
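using System;
using System.Net.Mail;

bool isValidEmail(string email)
{
    try
    {
        // The constructor throws if the string cannot be parsed as an address.
        new MailAddress(email);
        return true;
    }
    catch (FormatException)
    {
        return false; // not in a recognizable address format
    }
    catch (ArgumentException)
    {
        return false; // null or empty string
    }
}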
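A slightly stricter variant of the same idea, sketched under the assumption that only a bare address (no display name) should be accepted. `MailAddress` also parses input such as `Joe <joe@example.com>`, so comparing its `Address` property with the original string rejects anything the parser had to reinterpret. The class and method names are hypothetical.

```csharp
using System;
using System.Net.Mail;

public static class MailAddressCheck
{
    public static bool IsPlainEmail(string email)
    {
        // Null or empty input makes the constructor throw ArgumentException;
        // whitespace-only input cannot be an address either.
        if (string.IsNullOrWhiteSpace(email))
            return false;

        try
        {
            var parsed = new MailAddress(email);
            // Reject input that only parsed because it carried a display name
            // or other text around the address itself.
            return parsed.Address == email;
        }
        catch (FormatException)
        {
            return false; // not in a form MailAddress recognizes
        }
    }
}
```

Under this sketch, `IsPlainEmail("web@domain.co.uk")` is true, while `IsPlainEmail("Joe <joe@example.com>")` is false even though the constructor accepts it.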

You can use this regex as an alternative:
^([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})$
Its description can be found here.
About your regex: the starting part (([\w]+\.[\w]+)+) forces the email address to have four characters at the beginning. Amending this part would do the work for you.
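One possible way to amend the local part, sketched as an assumption rather than the answerer's exact fix: describe it as one or more word characters optionally followed by dot-separated groups of word characters, `\w+(\.\w+)*`, so that `web`, `web1` and `a.b` all qualify. The domain part is kept from the question, including its 1 to 3 letter TLD limit. The class name is hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class LocalPartFix
{
    // Local part: word characters, optionally in dot-separated groups
    // (no leading, trailing or doubled dots).
    // Domain part: unchanged from the question, so TLDs are limited to 1-3 letters.
    private const string Pattern = @"^\w+(\.\w+)*@(\w+\.)+[A-Za-z]{1,3}$";

    public static bool IsValid(string email) =>
        email != null && Regex.IsMatch(email, Pattern);
}
```

Note that the inherited `{1,3}` limit still rejects addresses such as `someone@domain.museum`; widening it is a separate decision.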

The little trick used in the accepted answer, i.e. catching exceptions from
new MailAddress(email);
doesn't seem very satisfying, as it considers "a@a" a valid address. In fact, it doesn't raise an exception for almost any string matching the regex ".*@.*", which is clearly too permissive. For example,
new MailAddress("¦@°§¬|¢@¢¬|")
doesn't raise an exception.
Thus I would clearly go for regex matching.
This example is quite satisfying:
https://msdn.microsoft.com/en-us/library/01escwtf%28v=vs.110%29.aspx

You can also try this one
^[a-zA-Z0-9._-]*@[a-z0-9._-]{2,}\.[a-z]{2,4}$

Related

Email validation C# asp.net [duplicate]

This question already has answers here:
How can I validate an email address using a regular expression?
(79 answers)
Closed 3 years ago.
I used the following pattern to validate my email field:
return Regex.IsMatch(email,
    @"^(?("")("".+?(?<!\\)""@)|(([0-9a-z]((\.(?!\.))|[-!#\$%&'\*\+/=\?\^`\{\}\|~\w])*)(?<=[0-9a-z])@))" +
    @"(?(\[)(\[(\d{1,3}\.){3}\d{1,3}\])|(([0-9a-z][-0-9a-z]*[0-9a-z]*\.)+[a-z0-9][\-a-z0-9]{0,22}[a-z0-9]))$",
    RegexOptions.IgnoreCase, TimeSpan.FromMilliseconds(250));
It uses the following reference:
https://learn.microsoft.com/en-us/dotnet/standard/base-types/how-to-verify-that-strings-are-in-valid-email-format
My requirement is a maximum of 64 characters for the user part and a maximum of 254 characters for the whole email string. The pattern in the reference only allows a maximum of 134 characters. Can someone give a clear explanation of what the pattern means? What is the right pattern to achieve my goal?

The code you cited is over-engineered, all you need to verify an email is to check for an at symbol and for a dot. If you need anything more precise, you are probably at a point where you actually need to email the recipient and ask for their confirmation that they hold the email, something that is simpler than a complex regex, and which provides much more precision.
Such a regex would simply be:
.+@.+\..+
Annotated:
.+ At least one of any character
@ The at symbol
.+ At least one character
\. The . symbol
.+ At least one character
Of course, this means that some emails might be accepted as false positives, like tomas@company.c when the user intended tomas@company.com. But even if you design the most robust of regexes, one that checks against a list of accepted TLDs, you will never catch tomas@company.co, and you might introduce false negatives, like rejecting tomas@company.blockchain when a new TLD is released and your code isn't updated.
So just keep it simple.
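The annotated regex can be used as-is; here it is sketched with `^` and `$` added to make the whole-string intent explicit (for single-line input the result is the same, because `.` never matches a newline). The class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class SimpleEmailCheck
{
    // Something, an @, something, a dot, something.
    private static readonly Regex Simple = new Regex(@"^.+@.+\..+$");

    public static bool LooksLikeEmail(string email) =>
        email != null && Simple.IsMatch(email);
}
```

As described above, `LooksLikeEmail("tomas@company.c")` returns true (an accepted false positive), while `LooksLikeEmail("tomas@company")` returns false.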

If you wanted to avoid using regex (which is, in my opinion, difficult to decipher), you could use the .Split() method on the email string using the "@" symbol as your delimiter. Then, you can check the string lengths of the two components from there.
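A sketch of the Split approach, applying the limits from the question (64 characters for the user part, 254 overall). The class and method names are hypothetical, and the only other rule assumed here is that the domain part is non-empty.

```csharp
public static class SplitEmailCheck
{
    private const int MaxLocalLength = 64;   // user part limit from the question
    private const int MaxTotalLength = 254;  // whole-address limit from the question

    public static bool HasValidShape(string email)
    {
        if (string.IsNullOrEmpty(email) || email.Length > MaxTotalLength)
            return false;

        string[] parts = email.Split('@');
        if (parts.Length != 2)
            return false; // no @ at all, or more than one

        string local = parts[0];
        string domain = parts[1];
        return local.Length >= 1 && local.Length <= MaxLocalLength
            && domain.Length >= 1;
    }
}
```

One trade-off: a quoted local part may legally contain an @, so splitting on every @ rejects those rare but valid addresses.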

Several years back, I wrote an email validation attribute in C# that should recognize most of the subset of syntactically valid email addresses that have the form local-part@domain. I say "most" because I didn't bother trying to deal with things like punycode, IPv4 address literals (dotted quads), or IPv6 address literals.
I'm sure there are lots of other edge cases I missed as well, but it worked well enough for our purposes at the time.
Use it in good health: C# Email Address validation
Before you go down the road of writing your own, you might want to read through the relevant RFCs and try to understand the vagaries of what constitutes a "valid" email address (it's not what you think), and consider whether to stop trying to validate RFC 822 email addresses altogether. About the only way to "validate" an email address is to send mail to it and see whether it bounces. Even that doesn't mean anybody is home at that address, or that the mailbox won't disappear next week.
https://haacked.com/archive/2007/08/21/i-knew-how-to-validate-an-email-address-until-i.aspx/
https://jackfoxy.github.io/FsRegEx/emailregex.html
Jeffrey Friedl's book Mastering Regular Expressions has a [more-or-less?] complete regular expression to match syntactically valid email addresses. It's 6,598 characters long.
Did you know that postmaster@. is a legal email address? It theoretically gets you to the postmaster of the root DNS server.
Or that [theoretically] "bang path" email addresses like MyDepartmentServer!MainServer!BigRouter!TheirDepartmentServer!SpecificServer!jsmith are valid. Here you define the actual path through the network that the email should take. Helps if you know the network topology involved.

Why is this Email regex so slow on Mvc?

I am currently building a system using ASP.NET, C# and MVC 2, which uses the following regex:
^([0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*@([0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,9})$
This is an e-mail regex that validates a 'valid' e-mail address format. My code is as follows:
if (!Regex.IsMatch(model.Email, @"^([0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*@([0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,9})$"))
    ModelState.AddModelError("Email", "The field Email is invalid.");
The regex works fine for validating e-mails; however, if a particularly long invalid string is passed to it, the system keeps on 'working' without ever resolving the page. For instance, this is the data that I tried to pass:
iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii
The above string causes the system to essentially lock up. I would like to know why, and whether I can use a regex that accomplishes the same thing in a simpler manner. My goal is that an incorrectly formed e-mail address, like the following, isn't accepted:
host.@.host..com

You have nested repetition operators sharing the same characters, which is liable to cause catastrophic backtracking.
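For example: ([-.\w]*[0-9a-zA-Z])*
This says: match zero or more characters from [-.\w] (a hyphen, a dot or any word character) followed by a single character from [0-9a-zA-Z], and repeat that whole group zero or more times.
The letter i falls into both of these classes.
Thus, when run on iiiiiiii..., the regex tries every possible way of splitting the run of i's into groups (several i's followed by one i, repeated several times) before it can report a failure, and the number of such splits grows exponentially with the length of the string.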
In general, validating email addresses with a regular expression is hard.
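One way to remove the nested repetition, sketched as an assumption rather than taken from the answer: replace `([-.\w]*[0-9a-zA-Z])*` with `([-.\w]*[0-9a-zA-Z])?`. Since `[0-9a-zA-Z]` is a subset of `[-.\w]`, one occurrence of the group already covers any run that ends in a letter or digit, so the pattern should accept the same strings while leaving the engine only one way to match each run. The sketch also adds a match timeout as a safety net; the class name and the 250 ms value are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BacktrackingFix
{
    // The question's pattern with the repeated group (...)* turned into an optional (...)?.
    private const string Pattern =
        @"^([0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])?@([0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,9})$";

    // Regex.IsMatch throws RegexMatchTimeoutException if matching exceeds this.
    private static readonly TimeSpan Timeout = TimeSpan.FromMilliseconds(250);

    public static bool IsValid(string email)
    {
        if (email == null) return false;
        try
        {
            return Regex.IsMatch(email, Pattern, RegexOptions.None, Timeout);
        }
        catch (RegexMatchTimeoutException)
        {
            return false; // treat pathological input as invalid
        }
    }
}
```

With this sketch, the long run of i's from the question fails quickly, and `host.@.host..com` is still rejected.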

Email Regex that DOES include unicode domains

I was wondering if anybody has found a solution that validates an email address containing Unicode characters, as in one from a Unicode domain. I have searched at length and have yet to find a solution that works.

Fully validating an email address through a regex is hard. Really hard. This is one that is fully compliant with RFC 822. Even if you create a perfect regex that correctly validates all email addresses, that doesn't stop me from entering hi@hi.com (if you're trying to make sure that I enter a valid email address) or from accidentally misspelling my username (if you're trying to make sure that I enter my email address correctly).
Just send a link in an email saying, "click here to validate your email address."

I had the same issue and came up with a solution based on \p{L}, which matches any Unicode letter.
Please check it out:
using System.Text.RegularExpressions;

private static bool IsEmailValid(string email) {
    // \p{L} matches any Unicode letter, so non-ASCII local parts and domains are accepted.
    Regex re = new Regex(@"^[\p{L}0-9!$'*+\-_]+(\.[\p{L}0-9!$'*+\-_]+)*@[\p{L}0-9]+(\.[\p{L}0-9]+)*(\.[\p{L}]{2,})$", RegexOptions.CultureInvariant | RegexOptions.IgnoreCase);
    return re.IsMatch(email);
}

OK, so the only email validation I ever found that was truly awesome (instead of just OK) is part of the Zend Framework. Of course, that means PHP; hopefully, though, you can look at how they do it and emulate some of their better ideas: http://pastebin.com/SvZPBp31 Or just look up the Zend_Validate_EmailAddress source code.
Sorry that this isn't in C#.

As has been pointed out, validating e-mail addresses through a regular expression is a hard problem. You can get close with a fairly simple one, but there are many, many cases that it will fail to catch. I'm all for sending an email to a supposed email address as @Nick ODell suggests (after doing some basic sanity checking: does it contain an @ sign, does the domain name portion exist and have one or more MX/A/AAAA RRs, and the like) and including a verification link.
That said, if by Unicode domain you mean a Punycode-encoded host name label, those should be covered by any half-way competent validation regexp, as in encoded form those are just xn-- followed by the regular set [a-z0-9-] (case insensitive comparison).
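Building on the Punycode point, one option is to convert the domain to its ASCII form with `System.Globalization.IdnMapping` and then apply an ASCII-only check. Microsoft's "How to verify that strings are in valid email format" article uses `IdnMapping` in a similar way. In this sketch, the class name `IdnEmail`, both method names and the simple domain pattern are illustrative assumptions. Results may depend on the platform's IDN support.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class IdnEmail
{
    // Converts the domain part (after the last @) to its ASCII/Punycode form,
    // e.g. "bücher.example" becomes "xn--bcher-kva.example".
    public static string NormalizeDomain(string email)
    {
        int at = email.LastIndexOf('@');
        if (at < 0) throw new FormatException("No @ in address.");

        string local = email.Substring(0, at);
        string domain = email.Substring(at + 1);
        // GetAscii throws ArgumentException for labels it cannot encode.
        string asciiDomain = new IdnMapping().GetAscii(domain);
        return local + "@" + asciiDomain;
    }

    // After normalization, an ASCII-only domain pattern is enough.
    private static readonly Regex AsciiDomain =
        new Regex(@"^[a-z0-9-]+(\.[a-z0-9-]+)*\.[a-z0-9-]{2,}$", RegexOptions.IgnoreCase);

    public static bool HasValidDomain(string email)
    {
        try
        {
            string normalized = NormalizeDomain(email);
            string domain = normalized.Substring(normalized.LastIndexOf('@') + 1);
            return AsciiDomain.IsMatch(domain);
        }
        catch (Exception e) when (e is FormatException || e is ArgumentException)
        {
            return false;
        }
    }
}
```

The TLD part deliberately allows digits and hyphens so that encoded TLDs starting with `xn--` pass.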

Is RegEx used by System.Net.Mail.MailAddress

I have been trying to find a good RegEx for email validation.
I have already gone through Comparing E-mail Address Validating Regular Expressions and that didn't suffice all my validation needs.
I have Googled/Binged and scanned the top 50-odd results, including the regular-expressions.info article and other material.
So in the end I used the System.Net.Mail.MailAddress class to validate my email address, since if that fails, my email won't get sent to the user anyway.
I want to customize the validation as used by the constructor of the class.
So how do I go ahead and get the validation/RegEx that the MailAddress class is using?

No it does not use a RegEx, but rather a complicated process that would take way too long to explain here. How do I know? I looked at the implementation using the .NET Reflector. And so can you :D
http://www.red-gate.com/products/reflector/ (it's free)

Thanks Reflector... forgot you were still free!
Reflected System.Net.Mail.MailAddress...
Found that it uses the void ParseValue(string address)
and void GetParts(string address) methods to primarily check the mail address format.
Surprisingly, no regex was involved!

According to the Reflector, the class doesn't use regular expressions at all.
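Since `MailAddress` doesn't expose its parsing rules for customization, one workable pattern is to let it parse the string and then apply your own rules to the `User` and `Host` properties it exposes. In this sketch the class name and both extra rules are hypothetical examples: a 64-character local-part limit (as in RFC 5321) and a requirement that the host contain a dot, which rejects single-label hosts such as `a@a`.

```csharp
using System;
using System.Net.Mail;

public static class CustomMailRules
{
    private const int MaxUserLength = 64; // example limit on the local part

    public static bool IsAcceptable(string email)
    {
        // Null or empty input makes the constructor throw ArgumentException.
        if (string.IsNullOrWhiteSpace(email))
            return false;

        MailAddress parsed;
        try
        {
            parsed = new MailAddress(email); // MailAddress does the syntax parsing
        }
        catch (FormatException)
        {
            return false;
        }

        // Extra rules of our own, applied to the parts MailAddress exposes.
        return parsed.User.Length <= MaxUserLength
            && parsed.Host.Contains(".");
    }
}
```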

Email Validation: converting a regular expression written in PHP (preg) to .NET (Regex)

Based on this answer...
Using a regular expression to validate an email address
Which led me to this site...
http://fightingforalostcause.net/misc/2006/compare-email-regex.php
I'd like to use this regex for email validation for my ASP.NET MVC app:
/^[-_a-z0-9\'+*$^&%=~!?{}]++(?:\.[-_a-z0-9\'+*$^&%=~!?{}]+)*+@(?:(?![-.])[-a-z0-9.]+(?<![-.])\.[a-z]{2,6}|\d{1,3}(?:\.\d{1,3}){3})(?::\d++)?$/iD
Unfortunately, I get this error
System.ArgumentException was unhandled by user code
Message="parsing \"/^[-_a-z0-9\'+$^&%=~!?{}]++(?:\.[-_a-z0-9\'+$^&%=~!?{}]+)*+#(?:(?![-.])[-a-z0-9.]+(?
Has anyone ever converted this to be usable by .NET's Regex class, or is there another .NET regular expression class that is a better fit with PHP's preg_match function?

The problem with your regular expression in .NET is that the possessive quantifiers aren't supported. If you remove those, it works. Here's the regular expression as a C# string:
@"^[-_a-z0-9\'+*$^&%=~!?{}]+(?:\.[-_a-z0-9\'+*$^&%=~!?{}]+)*@(?:(?![-.])[-a-z0-9.]+(?<![-.])\.[a-z]{2,6}|\d{1,3}(?:\.\d{1,3}){3})(?::\d+)?$"
Here's a test bed for it based on the page you linked to, including all the strings that should match and the first three of those that shouldn't:
using System;
using System.Text.RegularExpressions;

public class Program
{
    static void Main(string[] args)
    {
        foreach (string email in new string[] {
            "l3tt3rsAndNumb3rs@domain.com",
            "has-dash@domain.com",
            "hasApostrophe.o'leary@domain.org",
            "uncommonTLD@domain.museum",
            "uncommonTLD@domain.travel",
            "uncommonTLD@domain.mobi",
            "countryCodeTLD@domain.uk",
            "countryCodeTLD@domain.rw",
            "lettersInDomain@911.com",
            "underscore_inLocal@domain.net",
            "IPInsteadOfDomain@127.0.0.1",
            "IPAndPort@127.0.0.1:25",
            "subdomain@sub.domain.com",
            "local@dash-inDomain.com",
            "dot.inLocal@foo.com",
            "a@singleLetterLocal.org",
            "singleLetterDomain@x.org",
            "&*=?^+{}'~@validCharsInLocal.net",
            "missingDomain@.com",
            "@missingLocal.org",
            "missingatSign.net"
        })
        {
            string s = @"^[-_a-z0-9\'+*$^&%=~!?{}]+(?:\.[-_a-z0-9\'+*$^&%=~!?{}]+)*@(?:(?![-.])[-a-z0-9.]+(?<![-.])\.[a-z]{2,6}|\d{1,3}(?:\.\d{1,3}){3})(?::\d+)?$";
            bool isMatch = Regex.IsMatch(email, s, RegexOptions.IgnoreCase);
            Console.WriteLine(isMatch);
        }
    }
}
Output:
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True
False
False
False
A problem, though, is that it fails to match some valid email addresses, such as foo\@bar@example.com. It's better to match too much than too little.
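Removing the possessive quantifiers works, but .NET can also express them directly through atomic groups `(?>...)`, which likewise never give back what they matched: `X++` becomes `(?>X+)` and `(?:Y)*+` becomes `(?>(?:Y)*)`. PHP's `/i` maps to `RegexOptions.IgnoreCase`, and the `D` modifier (dollar matches only at the very end) corresponds to `\z` rather than `$`, since .NET's `$` also matches before a trailing newline. This is a sketch; the class name is hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class AtomicGroupVersion
{
    // The PHP pattern with each possessive quantifier rewritten as an atomic group,
    // and $ replaced by \z to mirror PHP's /D modifier.
    private const string Pattern =
        @"^(?>[-_a-z0-9\'+*$^&%=~!?{}]+)(?>(?:\.[-_a-z0-9\'+*$^&%=~!?{}]+)*)@" +
        @"(?:(?![-.])[-a-z0-9.]+(?<![-.])\.[a-z]{2,6}|\d{1,3}(?:\.\d{1,3}){3})(?::(?>\d+))?\z";

    public static bool IsValid(string email) =>
        email != null && Regex.IsMatch(email, Pattern, RegexOptions.IgnoreCase);
}
```

On the test strings above it should give the same results as the version without possessive quantifiers, but it rejects `user@example.com` followed by a newline.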

You really shouldn't be using a RegEx to parse email addresses in .NET. Your better option is to use the functionality built into the framework.
Try to use your email string in the constructor of the MailAddress class. If it throws a FormatException then the address is no good.
try
{
    // Requires: using System; using System.Net.Mail;
    MailAddress addr = new MailAddress("theEmail@stackoverflow.com");
    // <- Valid email if this line is reached
}
catch (FormatException)
{
    // <- Invalid email if this line is reached
}
You can see an answer a Microsoft developer gave to another email validation question, where he explains how .NET's email parsing has also improved dramatically in .NET 4.0. Since .NET 4.0 is still in beta at the time of writing, you probably aren't running it; however, even previous versions of the framework have adequate email address parsing code. Remember, in the end you're most likely going to be using the MailAddress class to send your email anyway, so why not use it to validate your email addresses? Ultimately, being valid to the MailAddress class is all that matters.

.NET regular expression syntax is not the same as PHP's, and Regex is the only built-in regular expression class (though there may be third-party implementations). Anyway, it's pretty easy to validate an email address with Regex, straight from the source:
^([0-9a-zA-Z]([-\.\w]*[0-9a-zA-Z])*@([0-9a-zA-Z][-\w]*[0-9a-zA-Z]\.)+[a-zA-Z]{2,9})$

I've used this function before in a bunch of e-commerce applications and never had a problem.
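using System.Text.RegularExpressions;

public static bool IsEmailValid(string emailAddress)
{
    // The classes only list A-Z, so IgnoreCase is needed for lowercase addresses.
    // \b is a word boundary, not an anchor: an address embedded in longer text also matches.
    Regex emailRegEx = new Regex(@"\b[A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b", RegexOptions.IgnoreCase);
    return emailRegEx.IsMatch(emailAddress);
}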
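The `\b` word boundaries in that function do not anchor the match to the whole input, so a string that merely contains an address passes. A sketch contrasting the two behaviours, with hypothetical class and method names; the anchored version swaps `\b` for `^` and `\z`.

```csharp
using System.Text.RegularExpressions;

public static class WordBoundaryDemo
{
    // \b only requires a word boundary, so an embedded address matches.
    private static readonly Regex Unanchored =
        new Regex(@"\b[A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b", RegexOptions.IgnoreCase);

    // ^ and \z require the entire input to be the address.
    private static readonly Regex Anchored =
        new Regex(@"^[A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\z", RegexOptions.IgnoreCase);

    public static bool ContainsEmail(string text) => Unanchored.IsMatch(text);
    public static bool IsEmail(string text) => Anchored.IsMatch(text);
}
```

For example, `ContainsEmail("contact me at user@example.com please")` is true, while `IsEmail` on the same string is false. Both versions inherit the 2 to 4 letter TLD limit, so `user@example.museum` is rejected.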
