Regular Expressions Help in C#


I have a regular expression I'm using to remove HTML tags, and now I'm wondering whether there is any way to modify it so that it also removes links beginning with http and ending with .stm or .gif.
This is the piece of code I'm using:
string BBCSplit = Regex.Replace(BBC, @"<(.|\n)*?>", string.Empty);

The best way to figure out regular expressions is through examples and trial and error.
Put your HTML text and your regexp into this site; if the text turns yellow, it's matched.
If you need a tutorial on how regexps work, I found this site very useful.
The regexp you'll want will be something like http:.*\.stm, which means "the characters http:, followed by zero or more characters (.*), followed by the characters .stm" (the backslash makes the dot literal).
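A minimal sketch of the two-step approach in C#, assuming the links are still present in the text after the tags have been stripped. It keeps the question's tag pattern and uses an assumed link pattern, http\S*?\.(?:stm|gif), instead of http:.*\.stm: the greedy .* runs from the first http to the last .stm on a line and deletes everything in between, whereas the lazy \S*? stops at the first .stm or .gif and cannot cross whitespace. The class and method names (BbcCleaner, Clean) and the sample text are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: strips HTML tags, then removes http... links ending in .stm or .gif.
static class BbcCleaner
{
    // The question's tag pattern: "<", as few characters (newlines included) as possible, then ">".
    static readonly Regex Tags = new Regex(@"<(.|\n)*?>");

    // Assumed link pattern: "http", the shortest run of non-whitespace, then ".stm" or ".gif".
    static readonly Regex Links = new Regex(@"http\S*?\.(?:stm|gif)");

    public static string Clean(string html)
    {
        string withoutTags = Tags.Replace(html, string.Empty);
        return Links.Replace(withoutTags, string.Empty);
    }

    static void Main()
    {
        string bbc = "<p>Read http://news.example/1/world/123.stm and http://img.example/a.gif now</p>";
        Console.WriteLine(Clean(bbc));

        // For comparison: the greedy pattern removes everything from the first "http" to the last ".stm".
        Console.WriteLine(Regex.Replace("a http://x/1.stm b http://y/2.stm c", @"http:.*\.stm", string.Empty));
    }
}
```

The first line printed is "Read  and  now"; the second is "a  c", showing how the greedy version also swallowed the " b " between the two links.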

Related

Regular expression not capturing matches in the middle of a string

The regular expression I'm starting with is:
^(((http|ftp|https|www)://)?([\w+?\.\w+])+([a-zA-Z0-9\~\!\@\#\$\%\^\&\*\(\)_\-\=\+\\\/\?\.\:\;\'\,]*)?)$
I'm using this to find URLs in the middle of user-supplied text and replace them with hyperlinks. This works fine and matches the following:
http://www.google.com
www.google.com
google.com
www.google.com?id=5
etc...
However, it doesn't find a match if there is any text on either side of it (kind of defeats the purpose of what I'm doing). :)
No match:
Go to www.google.com
www.google.com is the best.
I go to www.google.com all the time.
etc...
How can I change this so that it will match no matter where in the string it appears? I'm terrible with regular expressions...

You have a bug in your original regex. The square brackets turn \w+?\.\w+ into a character class:
(((http|ftp|https|www)://)?([\w+?\.\w+])+([a-zA-Z0-9\~\!\@\#\$\%\^\&\*\(\)_\-\=\+\\\/\?\.\:\;\'\,]*)?)
Remove the [ and ] around \w+?\.\w+ and the anchors ^ and $: without the brackets the regex no longer matches obvious non-URLs, and without the anchors it can match a URL in the middle of the text.
I suggest using http://regexpal.com/ for testing regexes, as it has syntax highlighting within the regex.
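A sketch of the linking step with that fix applied: the question's pattern with the brackets around \w+?\.\w+ and the ^/$ anchors removed, passed to Regex.Replace, where $0 in the replacement stands for the whole match. The UrlLinker class and Linkify method are hypothetical, and the href is simply the matched text (no http:// is added to bare www. addresses).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper that turns URLs found anywhere in a string into links.
static class UrlLinker
{
    // The question's pattern with the [ ] around \w+?\.\w+ removed and without ^ and $,
    // so it can match a URL in the middle of surrounding text.
    const string UrlPattern =
        @"(((http|ftp|https|www)://)?(\w+?\.\w+)+([a-zA-Z0-9\~\!\@\#\$\%\^\&\*\(\)_\-\=\+\\\/\?\.\:\;\'\,]*)?)";

    static readonly Regex Url = new Regex(UrlPattern);

    // "$0" in the replacement string is the entire match.
    public static string Linkify(string text) => Url.Replace(text, "<a href=\"$0\">$0</a>");

    static void Main()
    {
        string[] samples =
        {
            "Go to www.google.com",
            "www.google.com is the best.",
            "I go to www.google.com all the time.",
        };
        foreach (string s in samples)
            Console.WriteLine(Linkify(s));
    }
}
```

One design caveat: the trailing character class includes . and , so a URL immediately followed by a full stop or comma will take that punctuation into the link.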

I think you should use a positive lookahead that searches for a given URL, first checking two possibilities: whether it is at the beginning or in the middle of the whole string.
You could use something like ^((?=url)?|.?(?=url).*?$))
That is just a starting point; I am not giving you an answer, just an idea.
I would do it, but at the moment I am lazy, and your regex looks like a 20-minute analysis.
(Stack Overflow erased some things from my example.)

Why does .* fail to match the entire (rest of the) string in this regex?

I ran into a problem with my regular expressions. I'm using regular expressions to obtain data from the string below:
"# DO NOT EDIT THIS MAIL BY HAND #\r\n\r\n[Feedback]:hallo\r\n\r\n# DO NOT EDIT THIS MAIL BY HAND #\r\n\r\n"
So far I got it working with:
String sFeedback = Regex.Match(Message, @"\[Feedback\]\:(?<string>.*?)\r\n\r\n# DO NOT EDIT THIS MAIL BY HAND #").Groups[1].Value;
This works, except when the header is changed, so I want the regex to read from [Feedback]: to the end of the string (symbols, ASCII, everything).
I tried: \[Feedback]:(?<string>.*?)$
The above regular expression works in some online regex builders, but in my C# code it doesn't work and returns an empty string. What's wrong?

The problem is that . doesn't match newlines unless you pass RegexOptions.Singleline when constructing the regex, or turn it on inline with (?s):
(?s)\[Feedback\]:(.*)$
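A minimal sketch of the Singleline fix in C#, using the message from the question; FeedbackParser and Extract are hypothetical names. RegexOptions.Singleline and the inline (?s) flag are interchangeable here.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for the "[Feedback]:" mail format from the question.
static class FeedbackParser
{
    // Singleline lets "." match "\n", so the greedy (.*) runs to the end of the input.
    // The inline equivalent is @"(?s)\[Feedback\]:(.*)$".
    static readonly Regex Feedback = new Regex(@"\[Feedback\]:(.*)$", RegexOptions.Singleline);

    public static string Extract(string message)
    {
        Match m = Feedback.Match(message);
        return m.Success ? m.Groups[1].Value : string.Empty;
    }

    static void Main()
    {
        string message =
            "# DO NOT EDIT THIS MAIL BY HAND #\r\n\r\n[Feedback]:hallo\r\n\r\n# DO NOT EDIT THIS MAIL BY HAND #\r\n\r\n";
        Console.WriteLine(Extract(message));
    }
}
```

Because the capture runs to the end of the input, it also includes the trailing marker line; trim or post-process it if you only need the feedback text.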

You are missing the escape character before the ] in \[Feedback].
Also, since you are not referring to the group by name in your C# code, you could further simplify your regex to this:
\[Feedback\]:(.*)$

$ in regex means:
The match must occur at the end of the string or before \n at the end of the line or string.
and . means:
Matches any single character except \n.
Try this simpler regex:
\[Feedback\]:(?<string>.*)
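To see the two rules quoted above in action, here is a small sketch using Regex.IsMatch; DotAndDollar and SimpleCapture are hypothetical names. It also shows that the simpler \[Feedback\]:(?<string>.*) stops at the first \n, so on the question's message it captures "hallo\r" (the carriage return included, since . does match \r), not the rest of the string.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo of the "." and "$" rules quoted above.
static class DotAndDollar
{
    // The simpler regex from this answer: "." stops at the first "\n" but does match "\r".
    public static string SimpleCapture(string message) =>
        Regex.Match(message, @"\[Feedback\]:(?<string>.*)").Groups["string"].Value;

    static void Main()
    {
        // "." skips "\n" unless RegexOptions.Singleline is set.
        Console.WriteLine(Regex.IsMatch("a\nb", "a.b"));                              // False
        Console.WriteLine(Regex.IsMatch("a\nb", "a.b", RegexOptions.Singleline));     // True

        // Without Multiline, "$" matches only at the end or just before a final "\n".
        Console.WriteLine(Regex.IsMatch("abc\n", "abc$"));                            // True
        Console.WriteLine(Regex.IsMatch("abc\nxyz", "abc$"));                         // False
        Console.WriteLine(Regex.IsMatch("abc\nxyz", "abc$", RegexOptions.Multiline)); // True

        string message = "[Feedback]:hallo\r\n\r\n# DO NOT EDIT THIS MAIL BY HAND #\r\n\r\n";
        Console.WriteLine(SimpleCapture(message) == "hallo\r");                       // True
    }
}
```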

Designing Regular Expressions in C#

I've run into a bit of an issue designing a regex in C#. I have to parse a text document that has multiple URLs embedded in it, and I have to extract them:
...url=http://www.cnn.com?id=abc,def&system=2&mode=2&quality=ade,url=http://www.bbc.com...
(I've added the ellipses to show that it's part of the content; the ... won't actually be in the text.)
The beginning part is easy, as I can start the regex with 'url='; however, I can't come up with a way of ending the match:
RegEx = (?<IgnoreFirst>[,]url=)(?<Url>[^,]+)
This regex stops at the first comma, just after 'abc', and doesn't return the entire URL.
RegEx = (?<IgnoreFirst>[,]url=)(?<Url>[^,]+)(?<IgnoreSecond>url)
This doesn't work either, because the match stops at the first comma and then looks for 'url', which it can't find. From some of the reading I've done, it seems like an issue of backtracking etc., so if anyone can help me out with the correct regex, that'd be great!
PS: While we're at it, if I wanted to extract the URL just before &quality, how would I do that?

How about using something like this:
RegEx = url=(?<Url>.+?)(?=,url|$)
The lookahead at the end forces the match to stop either at the next ",url" or at the end of the string (or at the end of each line, if you also pass RegexOptions.Multiline).
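A sketch of that answer in C#, iterating over Regex.Matches and reading the named group Url; UrlExtractor and Extract are hypothetical names, and the input is the sample line from the question without the ellipses.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper that collects every "url=..." value from a line of text.
static class UrlExtractor
{
    // The lazy ".+?" grows one character at a time until the lookahead sees ",url" or the end.
    static readonly Regex UrlRegex = new Regex(@"url=(?<Url>.+?)(?=,url|$)");

    public static List<string> Extract(string text)
    {
        var urls = new List<string>();
        foreach (Match m in UrlRegex.Matches(text))
            urls.Add(m.Groups["Url"].Value);
        return urls;
    }

    static void Main()
    {
        string text = "url=http://www.cnn.com?id=abc,def&system=2&mode=2&quality=ade,url=http://www.bbc.com";
        foreach (string url in Extract(text))
            Console.WriteLine(url);
    }
}
```

Note that the last URL only ends correctly here because it is at the very end of the input; in a longer document, whatever text follows the last URL on the same string would be captured as well.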

Match domain names as regular expressions

Can someone tell me how I can write a regular expression matching abc.com.ae, abc.net.af or similar, where the trailing .ae (the last part of the string) is optional?
This works with / but not with ., and I don't know why:
^[a-z]{1,25}.[a-z]{3}$
Answer:
^[a-z]{1,30}\.[a-z]{3}((\.)[a-z]{2})?$

The answer is: write \. instead of ., because . means “any character”.
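A quick check of both patterns from the question with Regex.IsMatch; DomainCheck and its methods are hypothetical names. It shows the unescaped . happily matching a letter, while the escaped \. demands a real dot and the optional ((\.)[a-z]{2})? group accepts the extra suffix.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo comparing the escaped and unescaped dot.
static class DomainCheck
{
    // The asker's final pattern: an escaped dot and an optional ".xx" suffix.
    static readonly Regex Domain = new Regex(@"^[a-z]{1,30}\.[a-z]{3}((\.)[a-z]{2})?$");

    // The original attempt: the unescaped "." matches any character.
    static readonly Regex Unescaped = new Regex(@"^[a-z]{1,25}.[a-z]{3}$");

    public static bool IsDomain(string s) => Domain.IsMatch(s);
    public static bool MatchesUnescaped(string s) => Unescaped.IsMatch(s);

    static void Main()
    {
        Console.WriteLine(IsDomain("abc.com.ae"));         // True
        Console.WriteLine(IsDomain("abc.net"));            // True
        Console.WriteLine(IsDomain("abcxnet"));            // False: "\." needs a real dot
        Console.WriteLine(MatchesUnescaped("abcxnet"));    // True: "." matched the "x"
        Console.WriteLine(MatchesUnescaped("abc.com.ae")); // False: no room for ".ae"
    }
}
```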

You can match a specific character using the \u escape followed immediately by the character's Unicode code point in hex, written as exactly four digits. A full stop is \u002E.
In your example, '/' works but '.' doesn't because '/' is treated as an ordinary character and does not need to be escaped, whereas the full stop has a special meaning in C# regular expressions: it matches any character.
If you're still not sure, here is a useful guide to regex in C#: http://www.radsoftware.com.au/articles/regexlearnsyntax.aspx
And this one has examples of regex for URLs: http://www.radsoftware.com.au/articles/regexsyntaxadvanced.aspx
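A sketch of the \u002E suggestion in C#. One caveat: the escape has to reach the regex engine, so the pattern needs a verbatim string (@"..."). In a regular string literal the C# compiler turns \u002E into a plain . first, which then matches any character again. UnicodeEscape and its methods are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo of a \u escape in a regex pattern.
static class UnicodeEscape
{
    // Verbatim string: the regex engine receives \u002E and treats it as a literal full stop.
    public static bool Verbatim(string s) => Regex.IsMatch(s, @"^abc\u002Ecom$");

    // Regular string: the C# compiler converts \u002E to "." before the regex engine sees it,
    // so the pattern becomes ^abc.com$ and "." matches any character.
    public static bool Regular(string s) => Regex.IsMatch(s, "^abc\u002Ecom$");

    static void Main()
    {
        Console.WriteLine(Verbatim("abc.com")); // True
        Console.WriteLine(Verbatim("abcxcom")); // False
        Console.WriteLine(Regular("abcxcom"));  // True
    }
}
```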

It sounds like you need an environment in which to learn regular expressions.
The best way to learn regular expressions is with an interactive regular expression simulator: type in an expression, and it gives you the results instantly.
There are a few free (and not so free) regular expression simulators available:
Free web version from http://lumadis.be/regex/test_regex.php.
RegexBuddy from http://www.regexbuddy.com/. I am not affiliated with this program, but I have used it with good success in the past.

Regular Expression to reject special characters other than commas

I am working in ASP.NET and using a RegularExpressionValidator.
Could you please help me create a regular expression that disallows special characters other than the comma? The comma has to be allowed.
I checked regexlib, but I could not find a match. I tried ^(a-z|A-Z|0-9)*[^#$%^&*()']*$, but when I add other characters as invalid, it does not work.
Also, could you please suggest a good resource for regular expressions? regexlib seems big; is there another place that lists a limited set of the most-used examples?
Also, can I create expressions using C# code? Any articles on that?

[\w\s,]+
works fine.
RegExr is a great place to test your regular expressions with real-time results; it also comes with a very complete list of common expressions.
[] character class
\w matches any word character (alphanumeric and underscore)
\s matches any whitespace character (spaces, tabs, line breaks)
, includes the comma
+ is a greedy quantifier that matches the previous token one or more times
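When you call Regex.IsMatch yourself, [\w\s,]+ on its own is satisfied by any matching substring, so it needs the ^ and $ anchors to reject input that also contains other characters. (As far as I know, ASP.NET's RegularExpressionValidator already requires the match to cover the whole input, so this matters mainly for checks in your own C# code.) A sketch with hypothetical names:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo: the same character class with and without anchors.
static class CommaValidator
{
    // Unanchored: succeeds if ANY run of word characters, whitespace or commas exists somewhere.
    public static bool Unanchored(string s) => Regex.IsMatch(s, @"[\w\s,]+");

    // Anchored: the whole input must consist of word characters, whitespace and commas.
    public static bool Anchored(string s) => Regex.IsMatch(s, @"^[\w\s,]+$");

    static void Main()
    {
        Console.WriteLine(Unanchored("abc$")); // True: "abc" alone satisfies it
        Console.WriteLine(Anchored("abc$"));   // False: "$" is not allowed
        Console.WriteLine(Anchored("a, b 1")); // True
    }
}
```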

[\d\w\s,]*
Just a guess

As for articles: I got started here and find it an excellent resource:
http://www.regular-expressions.info/
For your current problem, try something like this:
[\w\s,]*
Here's a breakdown:
Match a single character present in the list below «[\w\s,]*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
A word character (letters, digits, etc.) «\w»
A whitespace character (spaces, tabs, line breaks, etc.) «\s»
The character “,” «,»

For a single character that is not a comma, [^,] should work perfectly fine.

You can try the [\w\s,] regular expression. This character class matches only word characters (letters, digits and underscore), whitespace and the comma; repeated and anchored as ^[\w\s,]*$, it fails to match if any other character appears in the text.
For your second question, regarding a regular expression resource, you can go to
http://www.regular-expressions.info/
This website has a lot of tutorials on regex, plus a lot of useful information.
"Also, can I create expressions using C# code? Any articles for that?"
By this, do you mean you want to know which classes and methods execute regular expressions? Or do you want a tool that will create regular expressions for you?

You can create expressions in C#; something like this usually does the trick:
using System;
using System.Text.RegularExpressions;

// Anchored, case-insensitive whitelist: letters, digits and the comma only.
Regex regex = new Regex(@"^[a-z0-9,]*$", RegexOptions.IgnoreCase);
Console.Write("Enter text: ");
// ReadLine returns null at end of input, so fall back to an empty string.
string s = Console.ReadLine() ?? string.Empty;
Match match = regex.Match(s);
if (match.Success)
{
    Console.WriteLine("True");
}
else
{
    Console.WriteLine("False");
}
Console.ReadLine();
You need to import the System.Text.RegularExpressions namespace.
The regular expression above accepts only numbers, letters (both upper and lower case) and the comma.
For a short introduction to regular expressions, I think the book for MCTS exam 70-536 can be a big help; I am pretty sure you can either download it from somewhere or obtain a copy.
I am assuming that you have never worked with regular expressions in C#, hence the code above.
Hope this helps.

Thank you, all.
[\w\s,]* works.
Let me go through regular-expressions.info and come back if I need further support.
Let me try the C# code approach and come back if I need further support.
[This forum is awesome. Quality replies, so quick.]
Thanks again

(…) denotes a grouping, not a character set; a character set is denoted with […]. So try this:
^[a-zA-Z0-9,]*$
This will only allow alphanumeric characters and the comma.
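A sketch contrasting the two patterns with Regex.IsMatch; ClassVersusGroup and its methods are hypothetical names. It shows that (a-z) matches the literal text a-z rather than a range, and that the question's blacklist approach lets through any character it does not list, such as !.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo: character class versus group of literal alternatives.
static class ClassVersusGroup
{
    // The suggested whitelist: only ASCII letters, digits and commas, from start to end.
    public static bool Allowed(string s) => Regex.IsMatch(s, @"^[a-zA-Z0-9,]*$");

    // The asker's attempt: (a-z|A-Z|0-9) is a group of three literal alternatives,
    // and [^#$%^&*()']* then accepts any character that is not in that blacklist.
    public static bool AskersAttempt(string s) => Regex.IsMatch(s, @"^(a-z|A-Z|0-9)*[^#$%^&*()']*$");

    static void Main()
    {
        Console.WriteLine(Regex.IsMatch("a-z", "^(a-z)$")); // True: matches the literal text "a-z"
        Console.WriteLine(Regex.IsMatch("b", "^(a-z)$"));   // False
        Console.WriteLine(Allowed("abc,123"));              // True
        Console.WriteLine(Allowed("abc!"));                 // False
        Console.WriteLine(AskersAttempt("abc!"));           // True: "!" is not blacklisted
    }
}
```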
