Validating multiple occurrences of e-mails in a string in the most efficient way - C#

I have a string coming in the format shown below:
"mark345345@test.com;rtereter@something.com;terst@gmail.com;fault@mail"
What would be the most efficient way to validate each of these addresses and fail if any of them is not a valid e-mail?

You can use the EmailAddressAttribute class from the System.ComponentModel.DataAnnotations namespace to validate an email address. First split the string into individual addresses, then check each one. The following code collects the valid and invalid addresses separately.
// Requires: using System.Collections.Generic; using System.ComponentModel.DataAnnotations; using System.Linq;
List<string> inputMails = "mark345345@test.com;rtereter@something.com;terst@gmail.com;fault@mail".Split(';').ToList();
List<string> validMails = new List<string>();
List<string> inValidMails = new List<string>();
var validator = new EmailAddressAttribute();
foreach (var mail in inputMails)
{
    if (validator.IsValid(mail))
    {
        validMails.Add(mail);
    }
    else
    {
        inValidMails.Add(mail);
    }
}

You can use a Regex, or you can split the string by ';' and try to create a System.Net.Mail.MailAddress instance for each address. The constructor throws a FormatException if an address is not in a recognized format.
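The MailAddress approach described above could be sketched roughly as follows. The class and method names (MailAddressCheck, IsParsable, FindInvalid) are made up for illustration, and the extra comparison against the parsed Address is an assumption about what "valid" should mean here, since MailAddress also accepts display-name forms such as "Name <a@b.com>".

```csharp
using System;
using System.Collections.Generic;
using System.Net.Mail;

public static class MailAddressCheck
{
    // Hypothetical helper: true when MailAddress accepts the text as a bare address.
    public static bool IsParsable(string candidate)
    {
        // The constructor rejects null and empty input with other exception types,
        // so filter those out before parsing.
        if (string.IsNullOrWhiteSpace(candidate))
            return false;

        try
        {
            var parsed = new MailAddress(candidate);
            // Reject inputs that only parse because of a display name part.
            return string.Equals(parsed.Address, candidate, StringComparison.OrdinalIgnoreCase);
        }
        catch (FormatException)
        {
            return false;
        }
    }

    // Hypothetical helper: returns the entries of a ';'-separated list that fail the check.
    public static List<string> FindInvalid(string semicolonSeparated)
    {
        var invalid = new List<string>();
        foreach (var part in semicolonSeparated.Split(';'))
        {
            if (!IsParsable(part.Trim()))
                invalid.Add(part);
        }
        return invalid;
    }
}
```

Using exceptions for control flow has a cost, so this is probably best suited to short lists; for large inputs one of the non-throwing checks in the other answers may be preferable.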

If you're sure that all e-mails are semicolon-separated, you can split the string and make a list of them. For me, the best way to validate each e-mail is a regex pattern. I've used this one:
var emailPattern = @"(?=^.{1,64}@)^[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?=.{1,255}$|.{1,255};)(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?\.)+[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])(;(?=.{1,64}@)[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?=.{1,255}$|.{1,255};)(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9])?\.)+[a-zA-Z0-9](?:[a-zA-Z0-9-]*[a-zA-Z0-9]))*$";
var incomingString = "mark345345@test.com;rtereter@something.com;terst@gmail.com;fault@mail";
var emails = incomingString.Split(';').ToList();
var emailRegex = new Regex(emailPattern); // build the regex once, not once per address
foreach (var email in emails)
{
    if (emailRegex.IsMatch(email))
    {
        // your logic here
    }
}

Since .NET has out-of-the-box ways to validate an email address, I would not use a regex and would rely on .NET instead, e.g. the EmailAddressAttribute from System.ComponentModel.DataAnnotations.
A clean way to use it would be something like:
var emailAddressAttribute = new EmailAddressAttribute();
var groups = yourEmailsString.Split(new [] { ';' }, StringSplitOptions.RemoveEmptyEntries)
    .GroupBy(emailAddressAttribute.IsValid);
This gives you two groups. The one with Key == true contains the valid email addresses:
var validEmailIds = groups.Where(group => group.Key)
    .SelectMany(group => group);
The one with Key == false contains the invalid email addresses:
var invalidEmailIds = groups.Where(group => !group.Key)
    .SelectMany(group => group);
You could also loop over the groups after grouping, depending on your needs.
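Since the question asks to fail as soon as one address is invalid, a fail-fast variant of the same idea might look like the sketch below. EmailListGuard, FirstInvalid and EnsureAllValid are hypothetical names, and throwing FormatException is just one possible choice. Note also that what EmailAddressAttribute treats as valid differs between .NET versions (newer runtimes appear to apply a much looser check than .NET Framework), so an entry such as fault@mail may or may not be rejected.

```csharp
using System;
using System.ComponentModel.DataAnnotations;
using System.Linq;

public static class EmailListGuard
{
    private static readonly EmailAddressAttribute Validator = new EmailAddressAttribute();

    // Hypothetical helper: returns the first entry the attribute rejects,
    // or null when every entry passes. Stops at the first failure.
    public static string FirstInvalid(string semicolonSeparated)
    {
        return semicolonSeparated
            .Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(entry => entry.Trim())
            .FirstOrDefault(entry => !Validator.IsValid(entry));
    }

    // Hypothetical helper: throws when any entry is rejected.
    public static void EnsureAllValid(string semicolonSeparated)
    {
        var bad = FirstInvalid(semicolonSeparated);
        if (bad != null)
            throw new FormatException("Invalid e-mail address: " + bad);
    }
}
```

Because FirstOrDefault stops enumerating at the first rejected entry, no grouping or second pass is needed when the only goal is to fail on bad input.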

Related

Iterating whole list and updating string using ToLower - Not working

Technologies using.
C#
.NET 4.0
Visual Studio 2010
Problem.
I have a List<User> which contains an Email property. I want to lowercase all the email addresses within the list, but my implementation is not working. I'm using the following statement:
emails.ToList().ForEach(e => e.ToLower());
This didn't work at all for email addresses like Catherine.Burke@email.co.uk. I built the following to test this:
string email = "Catherine.Burke@email.co.uk";
email = email.ToLower();
Console.WriteLine("Email: " + email);
string email2 = "Catherine.Burke@email.co.uk";
string email3 = "Gareth.bradley@email.co.uk";
List<string> emails = new List<string>();
emails.Add(email2);
emails.Add(email3);
emails.ToList().ForEach(e => e.ToLower());
emails.ToList().ForEach(delegate(string e)
{
    Console.WriteLine("ForEach delegate : " + e);
});
List<EmailAddress> emailAddresses = new List<EmailAddress>();
emailAddresses.Add(new EmailAddress { FullAddress = "Catherine.Burke@email.co.uk" });
emailAddresses.Add(new EmailAddress { FullAddress = "Gareth.bradley@email.co.uk" });
emailAddresses.ToList().ForEach(e => e.FullAddress.ToLower());
emailAddresses.ToList().ForEach(delegate(EmailAddress e)
{
    Console.WriteLine("EmailAddress delegate: " + e.FullAddress);
});
foreach (EmailAddress em in emailAddresses)
{
    Console.WriteLine("Foreach Print: " + em.FullAddress);
}
At first I thought it might be the culture: since these are names, it kept them uppercase. But when I used ToLower() on a single string it worked. When I ran the above, the first line of output showed an email address in lowercase, whereas none of the List implementations I tried with ForEach() worked. I presume my implementation of ForEach() is incorrect?
Making my comment an answer as requested:
Use a simple for loop. List.ForEach passes each string to your delegate as an argument; you can't replace the list's reference from there, and since strings are immutable you can't change the string itself either. You have to assign the string returned by String.ToLower back to the list element:
for (int i = 0; i < emails.Count; i++)
    emails[i] = emails[i].ToLower();
Side note: if you are making all emails lowercase to get a case-insensitive comparison, it's better to use the String.Equals overload with the right StringComparison:
string email1 = "Catherine.Burke@email.co.uk";
string email2 = "catherine.burke@email.co.uk";
if (String.Equals(email1, email2, StringComparison.InvariantCultureIgnoreCase))
{
    // ...
}
emails.ToList().ForEach(e => e.ToLower()); just calls ToLower() but does not assign the result.
What you want is:
var lowerEmails = emails.Select(e => e.ToLower()).ToList();
Try this:
emailAddresses.ToList().ForEach(e => e.FullAddress = e.FullAddress.ToLower());
As weertzui already mentioned, the ForEach method simply calls the delegate; the result of that call is not used anywhere in your code.
However, I'd strongly recommend using a simple foreach:
foreach (var mail in emailAddresses) mail.FullAddress = mail.FullAddress.ToLower();
which seems more readable to me.

Validate email string after split

I have the following string:
var proc = new SAPayslips();
proc.RuleCustomValue = "document.xml|name@domain.com;name@domain.com;name@domain,co.za";
The first value is the name of an XML document, and the rest are emails I would like to use.
I can successfully split them and use them, but I have a problem with the validation. I would like to throw an exception if an email address doesn't contain an @ character.
// retrieves document name
customValues = _ruleCustomValue.Split('|');
// retrieves emails
emails = customValues[1].Split(';');
if (!customValues[1].Contains("@"))
    throw new System.InvalidOperationException("Invalid email address.");
It doesn't throw the exception when there is no @.
You need to check the emails array rather than customValues[1], which is a single string. Calling Contains on customValues[1] returns true as soon as the string contains a single @ anywhere.
You need to iterate through the array to find out whether any element does not contain an @.
foreach (var email in emails)
{
    if (!email.Contains("@"))
    {
        throw new System.InvalidOperationException("Invalid email address.");
    }
}
You can also use LINQ, using Enumerable.Any:
if (emails.Any(email => email.IndexOf("@") == -1))
    throw new System.InvalidOperationException("Invalid email address.");
Checking whether the string contains "@" is not enough to determine that it is an email address. I think you are going to need a regex pattern for this, for example (in JavaScript):
function isEmail(email) {
    var pattern = /^[+a-zA-Z0-9._-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}$/i;
    return pattern.test(email);
}
Check it and throw an exception:
if (!isEmail("e@example.com")) { /* throw the exception here */ }
More information about this: link
I hope this helps.
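Since the question is about C#, the same pattern could be used roughly like this. SimpleEmailPattern and IsEmail are illustrative names; the pattern is the one from the JavaScript snippet and is intentionally loose (for example, the {2,4} part rejects longer top-level domains), so treat it as a sketch rather than a complete validator.

```csharp
using System.Text.RegularExpressions;

public static class SimpleEmailPattern
{
    // Same pattern as the JavaScript example above; not RFC-complete.
    private static readonly Regex Pattern = new Regex(
        @"^[+a-zA-Z0-9._-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}$",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    // Hypothetical helper: true when the trimmed text matches the pattern.
    public static bool IsEmail(string candidate)
    {
        return candidate != null && Pattern.IsMatch(candidate.Trim());
    }
}
```

With the values from the question, an entry like name@domain,co.za fails this check because of the comma in the domain part, even though it does contain the separator character.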
Firstly, it will only throw for one location, because you have only checked one location: customValues[1].
Secondly, the item you have specified is actually the second item in the array, as all collections start at index 0.
What you may want to do instead is loop through and check each email string:
foreach (string s in customValues)
{
    if (!s.Contains("@"))
    {
        // throw exception
    }
    else
    {
        // do stuff...
    }
}
You are validating that any email contains an @, not that each email contains an @.
You should validate each e-mail individually, not the items in customValues.
You can try this code:
// retrieves document name
string[] customValues = _ruleCustomValue.Split('|');
// retrieves emails
string[] emails = customValues[1].Split(';');
foreach (string email in emails)
{
    if (!email.Contains("@"))
    {
        throw new System.InvalidOperationException("Invalid email address.");
    }
}
Try this:
// retrieves document name
customValues = _ruleCustomValue.Split('|');
// retrieves emails
emails = customValues[1].Split(';');
foreach (var email in emails)
{
    if (!EmailValidated(email))
    {
        throw new System.InvalidOperationException("Invalid email address.");
    }
}
private static bool EmailValidated(string emailAddress)
{
    const string pattern = @"^(([\w-]+\.)+[\w-]+|([a-zA-Z]{1}|[\w-]{2,}))@"
        + @"((([0-1]?[0-9]{1,2}|25[0-5]|2[0-4][0-9])\.([0-1]?[0-9]{1,2}|25[0-5]|2[0-4][0-9])\."
        + @"([0-1]?[0-9]{1,2}|25[0-5]|2[0-4][0-9])\.([0-1]?[0-9]{1,2}|25[0-5]|2[0-4][0-9])){1}|"
        + @"([a-zA-Z]+[\w-]+\.)+[a-zA-Z]{2,4})$";
    var match = Regex.Match(emailAddress.Trim(), pattern, RegexOptions.IgnoreCase);
    return match.Success;
}

Check whether a string is in a list at any order in C#

If we have a list of strings like in the following code:
List<string> XAll = new List<string>();
XAll.Add("#10#20");
XAll.Add("#20#30#40");
string S = "#30#20"; // same as "#20#30"; also contained in "#20#30#40", so S exists in the list
// Check the unordered string S = "#30#20":
// if its parts are contained in any order, like #30#20 or #20#30, return true (it exists)
if (XAll.Contains(S))
{
Console.WriteLine("Your String is exist");
}
I would prefer to use LINQ to check whether S exists in this sense: regardless of order, some entry in XAll must contain at least both (#30) and (#20) together.
I am using:
var c = item2.Intersect(item1);
if (c.Count() == item1.Length)
{
return true;
}
You should represent your data in a more meaningful way. Don't rely on strings.
For example I would suggest creating a type to represent a set of these numbers and write some code to populate it.
But there are already set types such as HashSet, which is possibly a good match, with built-in methods for testing for subsets.
This should get you started:
var input = "#20#30#40";
var hashSetOfNumbers = new HashSet<int>(input
.Split(new []{'#'}, StringSplitOptions.RemoveEmptyEntries)
.Select(s=>int.Parse(s)));
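Building on that starting point, the subset test itself might be sketched as follows. NumberSets, Parse and ContainsInAnyOrder are hypothetical names, and the sketch assumes every '#'-separated part is a valid integer (int.Parse throws otherwise).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NumberSets
{
    // Hypothetical helper: "#20#30#40" becomes the set { 20, 30, 40 }.
    public static HashSet<int> Parse(string input)
    {
        return new HashSet<int>(input
            .Split(new[] { '#' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(s => int.Parse(s)));
    }

    // True when some entry of 'all' contains every number of 'wanted', in any order.
    public static bool ContainsInAnyOrder(IEnumerable<string> all, string wanted)
    {
        var wantedSet = Parse(wanted);
        return all.Any(entry => wantedSet.IsSubsetOf(Parse(entry)));
    }
}
```

With the list from the question, "#30#20" is found because both numbers occur in "#20#30#40". If the list is queried often, parsing each entry once up front and keeping the sets would avoid repeated work.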
This works for me:
Func<string, string[]> split =
x => x.Split(new [] { '#' }, StringSplitOptions.RemoveEmptyEntries);
if (XAll.Any(x => split(x).Intersect(split(S)).Count() == split(S).Count()))
{
Console.WriteLine("Your String is exist");
}
Now, depending on how you want to handle duplicates, this might be an even better solution:
Func<string, HashSet<string>> split =
x => new HashSet<string>(x.Split(
new [] { '#' },
StringSplitOptions.RemoveEmptyEntries));
if (XAll.Any(x => split(S).IsSubsetOf(split(x))))
{
Console.WriteLine("Your String is exist");
}
This second approach uses pure set theory so it strips duplicates.

Sort email list by domain

I actually want to sort a list of email addresses by their domain.
For example:
var list = new List<string>();
list.Add("a@hotmail.com");
list.Add("b@aon.at");
list.Add("c@gmail.com");
The result should be:
b@aon.at
c@gmail.com
a@hotmail.com
Is that possible without splitting the email addresses?
Try this:
var sorted = list.OrderBy(x=>new MailAddress(x).Host).ToList();
It will sort your email addresses by mail host.
You could use LINQ for this. However, it is absolutely necessary that you split the email address:
list.OrderBy(email => email.Split('@')[1]).ToList();
You can use Regex to get the domain of each email:
var listSorted = list.OrderBy(email => Regex.Match(email, "@.*").Value)
    .ToList();
because:
var temp = Regex.Match("a@hotmail.com", "@.*").Value;
says: take everything from the @ sign onward (including the @ sign), so temp will be @hotmail.com in this case.
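A variant that avoids both Split and Regex could take the substring after the last '@'. The sketch below uses made-up names (DomainSort, DomainOf, ByDomain); the case-insensitive comparison and the tie-break on the full address are assumptions about the desired ordering, not something the question specifies.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DomainSort
{
    // Everything after the last '@'; the whole string if there is no '@'.
    private static string DomainOf(string email)
    {
        return email.Substring(email.LastIndexOf('@') + 1);
    }

    // Hypothetical helper: orders by domain, then by full address for stable results.
    public static List<string> ByDomain(IEnumerable<string> emails)
    {
        return emails
            .OrderBy(email => DomainOf(email), StringComparer.OrdinalIgnoreCase)
            .ThenBy(email => email, StringComparer.OrdinalIgnoreCase)
            .ToList();
    }
}
```

For the three addresses in the question this yields b@aon.at, c@gmail.com, a@hotmail.com, and an entry without any '@' is simply sorted by its full text instead of throwing.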

How can I check if a string contains an array of values?

I have an array of valid e-mail address domains. Given an e-mail address, I want to see if its domain is valid:
string[] validDomains = { "@test1.com", "@test2.com", "@test3.com" };
string email = "test@test1.com";
Is there a way to check if email contains any of the values of validDomains without using a loop?
I would recommend the following code:
HashSet<string> validDomains = new HashSet<string>
{
    "test1.com", "test2.com", "test3.com"
};
const string email = "test@test1.com";
MailAddress mailAddress = new MailAddress(email);
if (validDomains.Contains(mailAddress.Host))
{
    // Contains!
}
HashSet<T>.Contains is an O(1) operation, while searching an array is O(n), so HashSet<T>.Contains is extremely fast. Also, HashSet<T> does not store duplicate values, and there is no point in storing them in your case.
The MailAddress class represents the address of an electronic mail sender or recipient. It contains mail address parsing logic, so there is no need to reinvent the wheel.
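One detail the snippet above leaves open is letter case: domain names are case-insensitive, so a set built with an ignore-case comparer may be the safer choice. The following sketch wraps the same idea in a helper; DomainAllowList and HasValidDomain are illustrative names, and returning false for unparsable input (instead of letting the exception propagate) is an assumption about the desired behavior.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Mail;

public static class DomainAllowList
{
    // Ignore-case comparer so "TEST1.com" and "test1.com" are treated as the same domain.
    private static readonly HashSet<string> ValidDomains =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "test1.com", "test2.com", "test3.com"
        };

    // Hypothetical helper: true when the address parses and its host is in the set.
    public static bool HasValidDomain(string email)
    {
        if (string.IsNullOrEmpty(email))
            return false; // the MailAddress constructor throws on null or empty input

        try
        {
            return ValidDomains.Contains(new MailAddress(email).Host);
        }
        catch (FormatException)
        {
            return false; // not parsable as an address at all
        }
    }
}
```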
If you want to be efficient, not only should you avoid using a loop, but you should also construct a HashSet for your allowed domains, which allows O(1) lookup:
string[] validDomains = { "@test1.com", "@test2.com", "@test3.com" };
HashSet<string> validDomainsHashSet = new HashSet<string>(validDomains);
string email = "test@test1.com";
string domain = email.Substring(email.IndexOf('@'));
bool isValidDomain = validDomainsHashSet.Contains(domain);
It would also make sense to exclude the @ character from your domains, since it is present in all of them and therefore redundant:
string[] validDomains = { "test1.com", "test2.com", "test3.com" };
HashSet<string> validDomainsHashSet = new HashSet<string>(validDomains);
string email = "test@test1.com";
string domain = email.Substring(email.IndexOf('@') + 1);
bool isValidDomain = validDomainsHashSet.Contains(domain);
The simplest way with LINQ (this also ignores case):
bool validEmail = validDomains
    .Any(d => email.EndsWith(d, StringComparison.OrdinalIgnoreCase));
int index = email.IndexOf("@");
var domain = email.Substring(index);
return validDomains.Any(x => x == domain);
Check this out:
string[] validDomains = { "@test1.com", "@test2.com", "@test3.com" };
string email = "test@test1.com";
if (validDomains.Contains(email.Substring(email.IndexOf("@"))))
{
}
With a foreach loop, in this way:
string[] validDomains = { "@test1.com", "@test2.com", "@test3.com" };
string email = "test@test1.com";
foreach (string x in validDomains)
{
if (email.Contains(x))
{
// Do Something
}
}
Without a loop, in this way (with LINQ):
if(validDomains.Any(s => email.Contains(s))) {
//Do Something
}
validDomains.Any(validDomain => email.EndsWith(validDomain))
Refer to the documentation of IEnumerable.Any for more details.
