Regular Expression to remove leading and trailing Angle Brackets - c#

Using C#, I need to check strings (email addresses) to see if they have leading and trailing angle brackets, and if so, remove them, leaving the email address string intact.
e.g.
<john@johnsmith.com> becomes john@johnsmith.com
I should probably also cater for the scenario where perhaps there could be white space in front of the leading angle bracket, or behind the trailing angle bracket.
What would be a decent regex to handle this replacement?

Why do you need to use Regex for this?
You can simply do this:
string email = "<john@johnsmith.org>";
email = email.TrimStart('<').TrimEnd('>');
If there might also be whitespace (one or more spaces) outside the brackets, trim it first:
string email = "  <john@johnsmith.org>  ";
email = email.Trim().TrimStart('<').TrimEnd('>');

You should use Russ Clarke's solution (it is the best, in my opinion).
But if you really need a regex....
var email = "<john@johnsmith.com>";
email = Regex.Replace(email, "^<|>$", "");
Clarification:
^< - match start < sign
| - or
>$ - match end > sign
Extended version that also allows whitespace (\s* matches any run of whitespace):
email = Regex.Replace(email, @"^\s*<\s*|\s*>\s*$", "");

Although TrimStart/TrimEnd/Trim give you a nice option to complete the task without regex, if you would like to allow spaces around < on both sides you would have to perform four calls to do it.
Regex lets you do it in a single call. Here is one possible expression:
@"^\s*<?\s*([^\s>]+)\s*>?\s*$"
It has ^\s*<?\s* to match an optional < surrounded by optional spaces in the beginning, and \s*>?\s*$ for a similar match at the end.
The middle portion is a capturing group ([^\s>]+) to match the e-mail address itself, without performing any validation on it.
All you need now is to "paste" the captured middle into the replacement, like this:
var res = Regex.Replace(s, @"^\s*<?\s*([^\s>]+)\s*>?\s*$", "$1");
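For comparison, here is a minimal sketch of the two approaches discussed so far: the single-call regex and the four-call Trim chain. The class and method names (AngleBracketStripper, StripWithRegex, StripWithTrim) are made up for illustration, and neither method validates the address.

```csharp
using System.Text.RegularExpressions;

public static class AngleBracketStripper
{
    // Hypothetical helper: removes an optional "<" / ">" pair plus any
    // whitespace around them in one Regex.Replace call.
    // Input that does not match the pattern is returned unchanged.
    public static string StripWithRegex(string input)
    {
        return Regex.Replace(input, @"^\s*<?\s*([^\s>]+)\s*>?\s*$", "$1");
    }

    // Hypothetical helper: the same idea without regex, using four calls.
    // Trim() drops the outer whitespace, TrimStart/TrimEnd drop the brackets,
    // and the last Trim() drops whitespace that sat inside the brackets.
    public static string StripWithTrim(string input)
    {
        return input.Trim().TrimStart('<').TrimEnd('>').Trim();
    }
}
```

Both methods should turn "  < john@johnsmith.com > " into "john@johnsmith.com". One difference to be aware of: TrimStart('<') and TrimEnd('>') remove every consecutive bracket at that end, while the regex accepts at most one on each side.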

You can use the following regex globally:
\<(.*?)\>
Explanation:
\< : matches a literal <. In .NET, < on its own is not a metacharacter, so the backslash is not required here.
(.*?) : matches everything in a non-greedy way and captures it.
\> : matches a literal >. As with <, the backslash is not required.

If you only need this much, you can just do a String.Replace() like this
var email = " <someone@example.com>";
email = email.Trim().Replace("<", "").Replace(">", "");

Related

Using an escape character with a beginning wildcard in regex in c#

Below is a sample of an email I am using from a database:
2.2|[johnnyappleseed@example.com]
Every line is different, and it may or may not be an email, but it will always be enclosed in square brackets. I am trying to use regular expressions to get the information inside the brackets. Below is what I have been trying to use:
^\[\]$
Unfortunately, every time I try to use it, the expression isn't matching. I think the problem is using the escape characters, but I am not sure. If this is not how I use the escape characters with this, or if I am wrong completely, please let me know what the actual regex should be.
Close to yours is ^.*\[(.*)\]$:
^ start of the line
.* anything
\[ an opening square bracket, indicating the start of the email
(.*) anything (the email), as a capturing group
\] a closing square bracket, indicating the end of the email
$ end of the line
Note that your Regex is missing the .* parts to match the things between the key characters [ and ].
Your regex - ^\[\]$ - matches only a string/line that consists of exactly [], whereas you need to extract the substring between square brackets that sit somewhere inside a larger string.
You can use
var rx = new Regex(@"(?<=\[)[^]]+");
Console.WriteLine(rx.Match(s).Value);
With (?<=\[) we find the position after [ and then we match every character that is not ] with [^]]+.
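To make the difference between the two regex answers concrete, here is a small sketch that extracts the bracket contents once with the lookbehind and once with a capturing group. The names BracketExtractor, WithLookbehind and WithGroup are hypothetical, and returning an empty string when nothing matches is an assumption of this sketch.

```csharp
using System.Text.RegularExpressions;

public static class BracketExtractor
{
    // Lookbehind variant: the match itself is the text after "[",
    // running up to (but not including) the next "]" or the end of input.
    public static string WithLookbehind(string line)
    {
        Match m = Regex.Match(line, @"(?<=\[)[^]]+");
        return m.Success ? m.Value : string.Empty;
    }

    // Capturing-group variant: requires both "[" and "]" to be present
    // and returns only the text captured between them.
    public static string WithGroup(string line)
    {
        Match m = Regex.Match(line, @"\[([^\]]+)\]");
        return m.Success ? m.Groups[1].Value : string.Empty;
    }
}
```

For the sample line 2.2|[johnnyappleseed@example.com] both return johnnyappleseed@example.com. They differ when the closing bracket is missing: the lookbehind variant still returns whatever follows the [, while the group variant returns an empty string.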
Another, non-regex way:
var s = "2.2|[johnnyappleseed@example.com]";
var ss = s.Split('|');
if (ss.Length > 1)
{
    var last = ss[ss.Length - 1];
    if (last.Contains("[") && last.Contains("@")) // We assume there is an email
        Console.WriteLine(last.Trim(new[] {'[', ']'}));
}

Regex to match all alphanumeric and certain special characters?

I am trying to get a regex to work that will allow all alphanumeric characters (both caps and non caps as well as numbers) but also allow spaces, forward slash (/), dash (-) and plus (+)?
I have been playing with a refiddle: http://refiddle.com/gqr but so far no success, anyone any ideas?
I'm not sure if it makes any difference but I am trying to do this in c#?
If you want to allow only those, you will also need the use of the anchors ^ and $.
^[a-zA-Z0-9_\s\+\-\/]+$
^                    ^^
This is your regex and I added characters as indicated from the second line. Don't forget the + or * near the end to allow for more than 1 character (0 or more in the case of *), otherwise the regex will try to match only one character, even with .Matches.
You can also replace the whole class [A-Za-z0-9_] by one \w, like so:
^[\w\s\+\-\/]+$
EDIT:
You can drop most of the escaping: + and / need no backslash inside a character class, and the - needs none either as long as it is placed at the very beginning or the very end of the class:
^[\w\s+/-]+$
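A minimal sketch of the anchored whole-string check follows. Note that \w also admits the underscore and, in .NET, non-ASCII letters and digits, and \s admits tabs and newlines, so this sketch spells out the ASCII ranges and a plain space instead. That narrowing is an assumption about what the question wants. AllowedCharacters and IsValid are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class AllowedCharacters
{
    // ASCII letters, digits, space, '/', '+' and '-'.
    // The dash is placed last in the class, so it needs no escaping.
    // ^ and $ force the whole input to consist of allowed characters;
    // + requires at least one character.
    private static readonly Regex Allowed = new Regex(@"^[A-Za-z0-9 /+-]+$");

    public static bool IsValid(string input)
    {
        return Allowed.IsMatch(input);
    }
}
```

With this, IsValid("Test if / this+-works") is true, while "a&b", "under_score" and the empty string are rejected. Change + to * if an empty string should be accepted.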
Your regex would look something like:
/[\w\d\/\-\+ ]+/g
That's all letters, digits, the underscore (part of \w), / - + and spaces (but not any other whitespace characters). The \d is redundant, since \w already includes digits.
The + at the end means that at least 1 character is required. Change it to a * if you want to allow an empty string.
This code does that:
var input = "Test if / this+-works&sec0nd 2 part*3rd   part";
var matches = Regex.Matches(input, @"([0-9a-zA-Z /+-]+)");
foreach (Match m in matches) if (m.Success) Console.WriteLine(m.Value);
And output will have 3 result lines:
Test if / this+-works
sec0nd 2 part
3rd---part (I showed spaces with - here)

c# Regular expression for words in brackets with separator

I need to parse a text and check that every pair of square brackets contains a -, with at least one character before and after the -.
I tried the following code, but it doesn't work. The match count is too large.
Regex regex = new Regex(@"[\.*-.*]");
MatchCollection matches = regex.Matches(textBox.Text);
SampleText:
Node
(Entity [1-5])
Figured I might as well provide an answer... To reiterate my points (with modifications):
* matches 0 or more occurrences. You probably want +.
Square brackets are special characters and need to be escaped. They are used to define sets of characters.
You will probably want to exclude [ and ] from your "any character" matching.
Put this all together and the following should do you better:
Regex regex = new Regex(@"\[[^-[\]]+-[^[\]]+\]");
Although it's a little messy, the key thing is that [^[\]] means any character except a square bracket. [^-[\]] means the same but also disallows -. This is an optimisation and not required, but it reduces the work the regular expression engine has to do when working out the match. Thanks to ridgerunner for pointing out this optimisation.
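As a quick check of the pattern above, here is a sketch that collects every bracketed "something-something" token from a text. It writes the pattern with every special character inside the classes escaped, which should be an equivalent and arguably less ambiguous spelling; BracketRanges and Find are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BracketRanges
{
    // \[            literal opening bracket
    // [^\-\[\]]+    one or more characters that are not '-', '[' or ']'
    // -             the separator
    // [^\[\]]+      one or more characters that are not '[' or ']'
    // \]            literal closing bracket
    private static readonly Regex Range = new Regex(@"\[[^\-\[\]]+-[^\[\]]+\]");

    // Returns every bracketed token that has text on both sides of a dash.
    public static List<string> Find(string text)
    {
        var results = new List<string>();
        foreach (Match m in Range.Matches(text))
        {
            results.Add(m.Value);
        }
        return results;
    }
}
```

For the sample text "Node\n(Entity [1-5])" this yields the single token [1-5]. Tokens such as [-5], [1-] or [15] are not matched, because one side of the dash (or the dash itself) is missing.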
Square brackets mean something special in Regexes, you'll need to escape them. Additionally, if you want at least one character then you need to use + rather than *.
Regex regex = new Regex(@"\[.+-.+\]");
MatchCollection matches = regex.Matches(textBox.Text);
string txt = "(Entity [1-5])";
Regex reg = new Regex(@"\[.+\-.+\]");
If both sides are always numbers:
string txt = "(Entity [1-5])";
Regex reg = new Regex(@"\[\d+\-\d+\]");

regex syntax stop search

How do I make Regex stop the search after "Target This"?
HeaderText="Target This" AnotherAttribute="Getting Picked Up"
This is what I've tried:
var match = Regex.Match(string1, @"(?<=HeaderText=\").*(?=\")");
The quantifier * is greedy (eager), which means it will consume as many characters as it can while still getting a match. You want the lazy quantifier, *?.
As an aside, rather than using look-around expressions as you have done here, you may find it in general easier to use capturing groups:
var match = Regex.Match(string1, "HeaderText=\"(.*?)\"");
                                               ^   ^ these make a capturing group
Now the match matches the whole thing, but match.Groups[1] is just the value in the quotes.
Plain regex pattern
(?<=HeaderText=").*?(?=")
or as string
string pattern = "(?<=HeaderText=\").*?(?=\")";
or using a verbatim string
string pattern = @"(?<=HeaderText="").*?(?="")";
The trick is the question mark after .*. It means "as few as possible", making it stop after the first end-quotes it encounters.
Note that verbatim strings (introduced with @) do not recognize the backslash \ as an escape character. Escape the double quotes by doubling them.
Note for others interested in regex: The search pattern used finds a position between a prefix and a suffix:
(?<=prefix)find(?=suffix)
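The greedy/lazy difference is easy to see side by side. The sketch below runs the question's greedy pattern, the lazy fix, and the capturing-group alternative against the same input; HeaderTextDemo and its method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class HeaderTextDemo
{
    // Greedy: .* runs to the end of the input and backtracks only as far
    // as the LAST double quote, so the result spans both attributes.
    public static string Greedy(string input)
    {
        return Regex.Match(input, "(?<=HeaderText=\").*(?=\")").Value;
    }

    // Lazy: .*? stops at the FIRST double quote after the prefix.
    // Written as a verbatim string, so quotes are doubled instead of escaped.
    public static string Lazy(string input)
    {
        return Regex.Match(input, @"(?<=HeaderText="").*?(?="")").Value;
    }

    // Capturing group: the whole attribute is matched,
    // and Groups[1] holds just the value between the quotes.
    public static string WithGroup(string input)
    {
        return Regex.Match(input, "HeaderText=\"(.*?)\"").Groups[1].Value;
    }
}
```

For the input HeaderText="Target This" AnotherAttribute="Getting Picked Up", Lazy and WithGroup both return Target This, while Greedy returns everything up to the last quote: Target This" AnotherAttribute="Getting Picked Up.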
Try this:
var match = Regex.Match(string1, "HeaderText=\"([^\"]+)");
var val = match.Groups[1].Value; //Target This
UPDATE
If the target value itself may contain double quotes, change the regex to:
HeaderText=\"(.+?)\"\\s+\\w
Note: regex is not the right way to do this. If the input is XML, look at System.Xml; if it is HTML, consider HtmlAgilityPack.

What C# regex expression can be used to strip out dots (.) in a string?

I need a string with non alpha-numeric characters etc stripped out of it; I used the following:
wordsstr = Regex.Replace(wordsstr, "[^A-Za-z0-9,-_]", "");
The problem being dots (.)s are left in the string yet they are not specified to be kept. How could I make sure dots are gotten rid of too?
Many thanks.
You are specifying that they need to be kept - you're using ,-_ which is everything from U+002C to U+005F, including U+002E (period).
If you meant the ,-_ to just mean comma, dash and underscore you'll need to escape the dash, such as:
wordsstr = Regex.Replace(input, @"[^A-Za-z0-9,\-_]", "");
Alternatively, (as in Oded's comment) put the dash as the first or last character in the set, to prevent it being interpreted as a range specifier:
wordsstr = Regex.Replace(input, "[^A-Za-z0-9,_-]", "");
If that's not the aim, please be more specific: "non alpha-numeric characters etc" isn't really enough information to go on.
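To see the accidental range in action, here is a short sketch that applies the question's pattern and the corrected one to the same input. CharacterClassRanges and its method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class CharacterClassRanges
{
    // ",-_" is read as a range from ',' (U+002C) to '_' (U+005F).
    // That range contains '.', '-', ':', '@' and more, so they all survive.
    public static string StripWithAccidentalRange(string input)
    {
        return Regex.Replace(input, "[^A-Za-z0-9,-_]", "");
    }

    // With the dash moved to the end of the class it is a literal dash,
    // so only letters, digits, ',', '_' and '-' survive.
    public static string StripWithLiteralDash(string input)
    {
        return Regex.Replace(input, "[^A-Za-z0-9,_-]", "");
    }
}
```

Given "a.b-c_d,e!f", the first method returns a.b-c_d,ef (the dot is kept) and the second returns ab-c_d,ef.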
Try the code below:
wordsstr = Regex.Replace(wordsstr, "[^-A-Za-z0-9,_]", "");
Your problem would be easier to understand if you write your expectation and actual result.
Try
wordstr = Regex.Replace(wordstr, "[^A-Za-z0-9,\\-_]", "");
or, better, if you just want alphanumeric characters (note that A-z would not work here, because that range also covers [ \ ] ^ _ and `):
wordstr = Regex.Replace(wordstr, "[^A-Za-z0-9]", "");
The problem in your first regex is that the - char defines a range, so you have to escape it to make it behave the way you want it to.
