parse string with regex - c#

I am trying to search a string for email addresses, but my regex does not work when the string contains characters other than the email address. If I try it on a small string like "me@email.com", the regex finds a match. If I insert a blank space into the string, as in " me@mail.com", the regex does not find a match.
Here is my code (the regex pattern is from the web):
string emailpattern = @"^(([^<>()[\]\\.,;:\s@\""]+"
+ @"(\.[^<>()[\]\\.,;:\s@\""]+)*)|(\"".+\""))@"
+ @"((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}"
+ @"\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+"
+ @"[a-zA-Z]{2,}))$";
Regex rEmail = new Regex(emailpattern);
string str = @" me@mail.com";
MatchCollection mcolResults = rEmail.Matches(str);
MessageBox.Show(mcolResults.Count.ToString());
Please let me know what I am doing wrong.
Thank you.
Best regards,

^ and $ mean (respectively) the start and end of the input text (or line in multi-line mode) - generally used to check that the entire text (or line) matches the pattern. So if you don't want that, take them away.

Remove the ^ and the $ from the beginning and the end. They mean "Start of string" and "End of string" respectively.
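As a minimal sketch of what the two answers above describe: without the anchors, Regex.Matches can find addresses anywhere in the input. The pattern here is a deliberately simplified stand-in for the one in the question (an assumption made for readability, not a full validator), and EmailFinder, FindAll and IsWholeStringAddress are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class EmailFinder
{
    // Simplified, illustrative pattern (assumption): no ^ or $ anchors,
    // so a match may start and end anywhere inside the input.
    private static readonly Regex Unanchored =
        new Regex(@"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}");

    // Same pattern wrapped in ^...$: the whole input must be an address.
    private static readonly Regex Anchored =
        new Regex(@"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$");

    // Returns every address-like substring found in the text.
    public static List<string> FindAll(string text)
    {
        var found = new List<string>();
        foreach (Match m in Unanchored.Matches(text))
        {
            found.Add(m.Value);
        }
        return found;
    }

    // True only when the entire input is a single address.
    public static bool IsWholeStringAddress(string text)
    {
        return Anchored.IsMatch(text);
    }
}
```

With the leading space from the question, FindAll(" me@mail.com") still returns the address, while IsWholeStringAddress(" me@mail.com") is false because ^ cannot match in front of the space.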

Are you learning how to use regex, or do you actually need to parse email addresses?
There is a class designed specifically for this: MailAddress.
Here is the MSDN documentation: http://msdn.microsoft.com/en-us/library/591bk9e8.aspx
When you initialize it with a string that holds a mail address that is not in the correct format, a FormatException will be thrown.
Good luck!
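A hedged sketch of the MailAddress approach: the wrapper below treats a FormatException from the constructor as "not an address". MailAddressCheck and IsParsable are hypothetical names. Note that MailAddress parses rather than strictly validates, so it also accepts forms that include a display name.

```csharp
using System;
using System.Net.Mail;

public static class MailAddressCheck
{
    // Returns true when the MailAddress constructor accepts the input.
    public static bool IsParsable(string candidate)
    {
        // Null or empty input raises argument exceptions rather than
        // FormatException, so it is filtered out up front.
        if (string.IsNullOrWhiteSpace(candidate))
        {
            return false;
        }

        try
        {
            new MailAddress(candidate);
            return true;
        }
        catch (FormatException)
        {
            // Thrown when the string is not in a recognized address format.
            return false;
        }
    }
}
```

This only answers "is this one string an address?". To pull addresses out of longer text you would still need to split or search the text first.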

First obvious problem: your expression only matches email addresses at the start of a string.
You need to drop the ^ at the start.
^ matches the start of a string.

The regex is correct: e-mail addresses don't contain whitespace.
You can use escapes like \s in your regex in order to match whitespace, or you can call str.Trim() to clean up your string before trying to match against it.

Related

Regex pattern in c# start with # and end with &#x9;

I need a regex pattern that matches text starting with "#" and ending with "&#x9;".
I tried the pattern below:
string pattern = "^[#].*?[&#x9;]$";
but it is not working.
Since &#x9; is the hex code of the tab character, why not just use the StartsWith and EndsWith methods instead?
if(yourString.StartsWith("#") && yourString.EndsWith("\t"))
{
// Pass
}
This pattern works fine; I have tested it.
string pattern = "#(.*?)9";
See the link below to test it online.
https://regex101.com/r/iR6nP6/1
C#
const string str = "dadasd#beetween9ddasdasd";
var match = Regex.Match(str, "#(.*?)9");
Console.WriteLine(match.Groups[1].Value);
In regex syntax, [] denotes a character class: the engine will attempt to match exactly one of the characters listed inside it. Thus, [&#x9] means: match one of &, #, x or 9, in no particular order.
If you are after order, which seems you are, you will need to remove the []. Something like so should work: string pattern = "^#.*?&#x9$";
you mean something like:
string pattern = "^#.*?[ ]$"
There are also many fine regex helpers on the web, for example https://regex101.com/. It gives a nice explanation of how your text will be handled.
You should use \t to match tab character
You can use special character sequences to put non-printable characters in your regular expression. Use \t to match a tab character (ASCII 0x09)
Try the following regex:
^\#.*\t\;$
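To make the \t suggestion concrete, here is a small sketch that assumes the goal is "starts with # and ends with a tab character". It uses \z instead of $ so that a trailing newline is not silently accepted. HashToTab and its method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HashToTab
{
    // ^#   : the input starts with '#'
    // .*   : any characters except newline
    // \t\z : a tab character is the very last character of the input
    private static readonly Regex Pattern = new Regex(@"^#.*\t\z");

    public static bool MatchesRegex(string input)
    {
        return Pattern.IsMatch(input);
    }

    // The same check without a regex; the two agree for single-line input.
    public static bool MatchesPlain(string input)
    {
        return input.StartsWith("#", StringComparison.Ordinal)
            && input.EndsWith("\t", StringComparison.Ordinal);
    }
}
```

Inside the verbatim string, \t reaches the regex engine as backslash-t and is interpreted there as a tab. In the plain version, "\t" is a regular C# string literal containing the tab itself.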

Regular Expression to remove leading and trailing Angle Brackets

Using C#, I need to check strings (email addresses) to see if they have leading and trailing angle brackets, and if so, remove them, leaving the email address string intact.
e.g.
<john@johnsmith.com> becomes john@johnsmith.com
I should probably also cater for the scenario where perhaps there could be white space in front of the leading angle bracket, or behind the trailing angle bracket.
What would be a decent regex to handle this replacement?
Why do you need to use Regex for this?
You can simply do this:
string email = "<john@johnsmith.org>";
email = email.TrimStart('<').TrimEnd('>');
Of course, if you need to allow for whitespace around the brackets:
string email = "<john@johnsmith.org>";
email = email.Trim().TrimStart('<').TrimEnd('>');
You should use Russ Clarke's solution (it is the best in my opinion).
But if you really need a regex...
var email = "<john@johnsmith.com>";
email = Regex.Replace(email, "^<|>$", "");
Clarification:
^< - match start < sign
| - or
>$ - match end > sign
Extended version that allows whitespace (\s* matches any run of whitespace):
email = Regex.Replace(email, @"^\s*<\s*|\s*>\s*$", "");
Although TrimStart/TrimEnd/Trim give you a nice option to complete the task without regex, if you would like to allow spaces around < on both sides you would have to perform four calls to do it.
Regex lets you do it in a single call. Here is one possible expression:
@"^\s*<?\s*([^\s>]+)\s*>?\s*$"
It has ^\s*<?\s* to match an optional < surrounded by optional spaces in the beginning, and \s*>?\s*$ for a similar match at the end.
The middle portion is a capturing group ([^\s>]+) to match the e-mail address itself, without performing any validation on it.
All you need now is to "paste" the captured middle into the replacement, like this:
var res = Regex.Replace(s, @"^\s*<?\s*([^\s>]+)\s*>?\s*$", "$1");
Demo.
You can use the following regex globally:
\<(.*?)\>
Explanation:
\< : matches a literal <. (In .NET, < is not a metacharacter outside constructs such as named groups, so the backslash is optional but harmless.)
(.*?) : match everything in a non-greedy way and capture it.
\> : matches a literal >; the same note about escaping applies.
If you only need this much, you can just do a String.Replace() like this
var email = " <someone@example.com>";
email = email.Trim().Replace("<", "").Replace(">", "");

How do I match an entire string with a regex?

I need a regex that will only find matches where the entire string matches my query.
For instance if I do a search for movies with the name "Red October" I only want to match on that exact title (case insensitive) but not match titles like "The Hunt For Red October". Not quite sure I know how to do this. Anyone know?
Thanks!
Try the following regular expression:
^Red October$
By default, regular expressions are case sensitive. The ^ marks the start of the matching text and $ the end.
Generally, and with default settings, ^ and $ anchors are a good way of ensuring that a regex matches an entire string.
A few caveats, though:
If you have alternation in your regex, be sure to enclose your regex in a non-capturing group before surrounding it with ^ and $:
^foo|bar$
is of course different from
^(?:foo|bar)$
Also, ^ and $ can take on a different meaning (start/end of line instead of start/end of string) if certain options are set. In text editors that support regular expressions, this is usually the default behaviour. In some languages, especially Ruby, this behaviour cannot even be switched off.
Therefore there is another set of anchors that are guaranteed to only match at the start/end of the entire string:
\A matches at the start of the string.
\Z matches at the end of the string or before a final line break.
\z matches at the very end of the string.
But not all languages support these anchors, most notably JavaScript.
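For .NET specifically, the difference between $ and \z can be sketched as follows. AnchorDemo and its method names are hypothetical; RegexOptions.IgnoreCase is added because the question asks for a case-insensitive match.

```csharp
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // With default options, $ also matches just before a single trailing "\n".
    public static bool MatchesWithDollar(string input)
    {
        return Regex.IsMatch(input, @"^Red October$", RegexOptions.IgnoreCase);
    }

    // \A and \z only match at the absolute start and end of the input.
    public static bool MatchesWithAbsoluteAnchors(string input)
    {
        return Regex.IsMatch(input, @"\ARed October\z", RegexOptions.IgnoreCase);
    }
}
```

Both methods reject "The Hunt For Red October", but only the \A...\z version rejects "Red October\n".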
I know that this may be a little late to answer this, but maybe it will come handy for someone else.
Simplest way:
var someString = "...";
var someRegex = "...";
var match = Regex.Match(someString , someRegex );
if(match.Success && match.Value.Length == someString.Length){
//pass
} else {
//fail
}
Use the ^ and $ anchors to denote where the regex pattern sits relative to the start and end of the string:
Regex.Match("Red October", "^Red October$"); // pass
Regex.Match("The Hunt for Red October", "^Red October$"); // fail
You need to enclose your regex in ^ (start of string) and $ (end of string):
^Red October$
If the string may contain regex metasymbols (. { } ( ) $ etc), I propose to use
^\QYourString\E$
\Q starts quoting all the characters until \E.
Otherwise the regex can be inappropriate or even invalid.
If the language passes the regex as a string literal (as I see in the example), the backslashes should be doubled:
^\\QYourString\\E$
Hope this tip helps somebody.
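A note for C# readers: to my knowledge the .NET regex engine does not support \Q...\E, and Regex.Escape is the usual way to get the same effect. A minimal sketch, with ExactTitleMatcher and IsExactMatch as hypothetical names:

```csharp
using System.Text.RegularExpressions;

public static class ExactTitleMatcher
{
    // Escapes any regex metacharacters in the title, then anchors the
    // result at both ends so that only the whole candidate can match.
    public static bool IsExactMatch(string candidate, string title)
    {
        string pattern = @"\A" + Regex.Escape(title) + @"\z";
        return Regex.IsMatch(candidate, pattern, RegexOptions.IgnoreCase);
    }
}
```

Because the title is escaped, characters such as +, ( and . in it are matched literally instead of being interpreted as regex syntax.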
Sorry, but that's a little unclear.
From what I read, you want to do a simple string comparison. You don't need regex for that.
string myTest = "Red October";
bool isMatch = (myTest.ToLower() == "Red October".ToLower());
Console.WriteLine(isMatch);
isMatch = (myTest.ToLower() == "The Hunt for Red October".ToLower());
You can do it like this. For example, if I want a given lowercase letter (e in this pattern) to occur exactly once, between two other characters, I can use the following pattern and check it with myRegex.IsMatch():
^[^e][e]{1}[^e]$

How to check if a string starts and ends with specific strings?

I have a string like:
string str = "https://abce/MyTest";
I want to check if the particular string starts with https:// and ends with /MyTest.
How can I achieve that?
This regular expression:
^https://.*/MyTest$
will do what you ask.
^ matches the beginning of the string.
https:// will match exactly that.
.* will match any number of characters (the * part) of any kind (the . part). If you want to make sure there is at least one character in the middle, use .+ instead.
/MyTest matches exactly that.
$ matches the end of the string.
To verify the match, use:
Regex.IsMatch(str, @"^https://.*/MyTest$");
More info at the MSDN Regex page.
Try the following:
var str = "https://abce/MyTest";
var match = Regex.IsMatch(str, "^https://.+/MyTest$");
The ^ anchor matches the start of the string, while the $ anchor matches the end of the string. The .+ bit simply means any non-empty sequence of characters.
You need to import the System.Text.RegularExpressions namespace for this, of course.
I want to check if the particular string starts with "https://" and ends with "/MyTest".
Well, you could use regex for that. But it's clearer (and probably quicker) to just say what you mean:
str.StartsWith("https://") && str.EndsWith("/MyTest")
You then don't have to worry about whether any of the characters in your match strings need escaping in regex. (For this example, they don't.)
In .NET:
^https://.*/MyTest$
Try Expresso, good for building .NET regexes and teaching you the syntax at the same time.
Handy tool for generating regular expressions:
http://txt2re.com/

Matching an (easy??) regular expression using C#'s regex

Ok sorry this might seem like a dumb question but I cannot figure this thing out :
I am trying to parse a string and simply want to check whether it only contains the following characters : '0123456789dD+ '
I have tried many things but just can't get to figure out the right regex to use!
Regex oReg = new Regex(@"[\d dD+]+");
oReg.IsMatch("e4");
will return true even though e is not allowed...
I've tried many strings, including Regex("[1234567890 dD+]+")...
It always works on Regex Pal but not in C#...
Please advise and again i apologize this seems like a very silly question
Try this:
@"^[0-9dD+ ]+$"
The ^ and $ at the beginning and end signify the beginning and end of the input string respectively. Thus between the beginning and the end only the stated characters are allowed. In your example, the regex matches if the string contains one of the characters even if it contains other characters as well.
@comments: Thanks, I fixed the missing + and space.
Oops, you forgot the boundaries, try:
Regex oReg = new Regex(@"^[0-9dD +]+$");
oReg.IsMatch("e4");
^ matches the beginning of the text stream, $ matches the end.
It is matching the 4; you need ^ and $ to terminate the regex if you want a full match for the entire string - i.e.
Regex re = new Regex(@"^[\d dD+]+$");
Console.WriteLine(re.IsMatch("e4"));
Console.WriteLine(re.IsMatch("4"));
This is because regular expressions can also match parts of the input; in this case it just matches the "4" of "e4". If you want to match a whole line, you have to surround the regex with "^" (matches line start) and "$" (matches line end).
So to make your example work, you have to write it as follows:
Regex oReg = new Regex(@"^[\d dD+]+$");
oReg.IsMatch("e4");
I believe it's returning True because it's finding the 4. Nothing in the regex excludes the letter e from the results.
Another option is to invert everything, so it matches on characters you don't want to allow:
Regex oReg = new Regex(@"[^0-9dD+ ]");
bool isValid = !oReg.IsMatch("e4");
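The two approaches from this thread can be put side by side in a short sketch. AllowedCharsCheck and its members are hypothetical names. The character class uses [0-9] rather than \d, on the assumption that only ASCII digits should pass (\d in .NET also matches other Unicode digits by default).

```csharp
using System.Text.RegularExpressions;

public static class AllowedCharsCheck
{
    // Whole-string form: one or more allowed characters and nothing else.
    private static readonly Regex OnlyAllowed = new Regex(@"\A[0-9dD+ ]+\z");

    // Inverted form: looks for any character that is NOT allowed.
    private static readonly Regex AnyForbidden = new Regex(@"[^0-9dD+ ]");

    public static bool IsValidAnchored(string input)
    {
        return OnlyAllowed.IsMatch(input);
    }

    public static bool IsValidInverted(string input)
    {
        // Unlike the anchored form, this accepts the empty string,
        // because an empty input contains no forbidden character.
        return !AnyForbidden.IsMatch(input);
    }
}
```

Both forms reject "e4" and accept "4d6 + 2". They differ only on the empty string, so pick whichever matches how empty input should be treated.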
