First, I have spent three hours trying to solve this. Please don't suggest not using regex; I appreciate other comments and can easily use other methods, but I am practicing regex as much as possible.
I am using VB.Net
Example string:
"Hello world this is a string C:\Example\Test E:\AnotherExample"
Pattern:
"[A-Z]{1}:.+?[^ ]*"
Works fine. However, what if the directory name contains whitespace? I want to match every substring that starts with one uppercase letter followed by a colon and then anything else, up to the point where whitespace, one uppercase letter, and a colon appear. The same sequence should then be matched again from there.
I hope I have made sense.
How about "[A-Z]{1}:((?![A-Z]{1}:).)*", which should stop before the next drive letter and colon?
That "?!" is a negative lookahead (a zero-width negative lookahead assertion), which, according to the question "Regular expression to match a line that doesn't contain a word?", is the way to get around the lack of inverse matching in regexes.
Not to be too picky, but most filesystems disallow a small number of characters (like <>/\:?"), so a correct pattern for a file path would be more like [A-Z]:\\((?![A-Z]{1}:)[^<>/:?"])*.
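A minimal sketch of the lookahead pattern from this answer, written in Python only for illustration (the question uses VB.Net, and the construct itself is the same there). The helper name find_paths, the second sample string, and the trimming of the trailing space are assumptions added for the example.

```python
import re

# The pattern from the answer above, without the redundant {1}.
# "(?![A-Z]:)." consumes one character only if a drive letter plus
# colon does NOT start at the current position.
DRIVE_PATH = re.compile(r"[A-Z]:(?:(?![A-Z]:).)*")


def find_paths(text):
    """Return every drive-rooted path found in text (hypothetical helper)."""
    # The match runs right up to the next drive letter, so it can end
    # with the separating space; rstrip() removes it.
    return [m.group(0).rstrip() for m in DRIVE_PATH.finditer(text)]


sample = r"Hello world this is a string C:\Example\Test E:\AnotherExample"
spaced = r"C:\My Folder\Test E:\Another Example"

print(find_paths(sample))
print(find_paths(spaced))
```

Note that any ordinary text between two paths would be swallowed into the first path, which is the ambiguity the next paragraph discusses.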
The other important point that has been raised is how you expect to parse input like "hello path is c:\folder\file.extension this is not part of the path:P"? This is a problem you commonly run into when you start trying to parse without specifying the allowed range of inputs, or the grammar that a parser accepts. This particular problem seems pretty ad hoc and so I don't really expect you to come up with a grammar or to define how particular messages are encoded. But the next time you approach a parsing problem, see if you can first define what messages are allowed and what they mean (syntax and semantics). I think you'll find that once you've defined the structure of allowed messages, parsing can be almost trivial.
I've turned on case insensitivity...
I want to match abc anywhere except in watch?v=xxabcxx or tumblr_asdfabcasdf.
But if I use (watch\?v=[0-9a-zA-Z]){0}abc against watch?v=xxabcxx, it matches, presumably because the engine fails until it checks abcxx, which is fine.
In regular expressions that is called a negative lookbehind (or lookahead, depending on the direction you need to look in). Check the tutorial on "Positive and Negative Lookahead".
You might also want to check the question and answers for "Regular expression negative lookahead".
As an example, take a look at (watch\?v=.*)(?<!xx)abc. The part (?<!xx)abc can be read as "abc matches only if the preceding characters are not xx". In general, (?<!a)b matches b only when it is not immediately preceded by a: the < means look behind, and the exclamation mark ! negates the condition. I used a generic regular expression, but you can get the idea.
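Here is a small sketch of the negative lookbehind described above, in Python for illustration. Python's re module only accepts fixed-width lookbehinds, so the second pattern is a swapped-in technique rather than part of this answer: it consumes the unwanted watch?v= and tumblr_ tokens first and captures abc only elsewhere. Both helper names are hypothetical.

```python
import re

# "abc" only when the two characters right before it are not "xx".
NOT_AFTER_XX = re.compile(r"(?<!xx)abc")

# Alternative sketch (an assumption, not from the answer): match the
# tokens to be ignored first, and capture "abc" in group 1 only when
# it occurs outside of them.
OUTSIDE_TOKENS = re.compile(r"(?:watch\?v=|tumblr_)\w*|(abc)")


def lookbehind_hit(text):
    """True if 'abc' occurs somewhere without 'xx' directly before it."""
    return NOT_AFTER_XX.search(text) is not None


def count_abc_outside(text):
    """Count 'abc' occurrences that are not inside an ignored token."""
    return sum(1 for m in OUTSIDE_TOKENS.finditer(text) if m.group(1))


print(lookbehind_hit("watch?v=xxabcxx"))
print(lookbehind_hit("tumblr_asdfabcasdf"))
print(count_abc_outside("watch?v=xxabcxx abc tumblr_asdfabcasdf"))
```

The lookbehind alone rejects the watch?v=xxabcxx case but still matches inside tumblr_asdfabcasdf, because there abc is preceded by "df" rather than "xx".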
So basically I have this giant regular expression pattern, and somewhere in the middle of it is the expression (?:\s(\d\d\d)|(\d\d\d\d)). At this point in the parse I want to capture either 3 digits that follow a space or 4 digits, but I don't want the capture that comes from putting parentheses around the whole thing (doesn't ?: make a group non-capturing?). I have to use parentheses so that the "or" logic works (I think).
So potential example inputs would be something like...
input1= giantexpression 123more characters after
input2= giantexpression1234blahblahblah
I tried (?:\s(\d\d\d)|(\d\d\d\d)) and it gave an extra capture at least in the case where I have 4 digits. So am I doing this right or am I messed up somewhere?
Edit:
To go into more detail... here's the current regular expression I'm working with.
pattern = @".?(\d{1,2})\s*(\w{2}).?.?.?(?:\s(\d\d\d)|(\d\d\d\d)).*"
There's a bit of parsing I have to do at the beginning. I think Sean Johnson's answer would still work because I wouldn't need to use "or". But is there a way to do it in which you DO use "or"? I think eventually I'll need that capability.
This should work:
(?:\s(\d{3,4}))
If you aren't doing any logic on that subpattern, you don't even need the parentheses surrounding it if all you want to do is capture the digits. The following pattern:
\s(\d{3,4})
will capture three or four digits directly following a space character.
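On the follow-up about keeping the "or": a short sketch, in Python for illustration, of how the original alternation behaves. The outer (?:...) adds no group; the two inner pairs of parentheses are the captures, one per branch, and only the branch that matched has a value. The helper name captured_digits is hypothetical.

```python
import re

# The alternation from the question, unchanged.
DIGITS = re.compile(r"(?:\s(\d\d\d)|(\d\d\d\d))")


def captured_digits(text):
    """Return the digits from whichever branch matched, or None."""
    m = DIGITS.search(text)
    if m is None:
        return None
    # Group 1 belongs to the "space + 3 digits" branch, group 2 to the
    # "4 digits" branch. The branch that did not take part is None.
    return m.group(1) or m.group(2)


print(captured_digits("giantexpression 123more characters after"))
print(captured_digits("giantexpression1234blahblahblah"))
print(DIGITS.search("giantexpression1234blahblahblah").groups())
```

So the "extra" capture seen with four digits is most likely just the empty slot of the three-digit branch, not a capture produced by the (?:...) wrapper.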
I need to create a regular expression to validate a string containing wildcards. The string must be in the form of a mobile number (xxx-xxx-xxxx), where x is a digit or a question mark. For that case the regexp was straightforward enough, ^([\d?{3}]-[\d?{3}]-[\d?{4}])$, but when the user also requested the * wildcard, I got really confused.
First of all, it can be xxx-xxx-*, right? But xxx-xxx-** is invalid, as is xxx-*-*. I read something about lookahead patterns (I'm writing in C#), but that only confused me more. I tried to write something like ^(?![\\*\\*])$, meaning "not two asterisks next to one another", but it didn't work.
So, any more ideas?
I'm not sure I've understood your requirement exactly, but it sounds to me like you want a pattern that will match:
one to three digits or ?
optionally followed by - and one to three digits or ?
optionally followed by - and one to four digits or ?
this should match
123-456
12?-4??-78??
1-3?-2?0
but not match
1--123
-?-23
1233-23?-234
in which case you have no need for a lookahead
this pattern should work
^([\?\d]{1,3})(\-[\?\d]{1,3}(\-[\?\d]{1,4})?)?$
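A quick check of this pattern against the examples listed above, sketched in Python for illustration (the pattern text is taken verbatim from the answer; the helper name is_valid is hypothetical).

```python
import re

# Pattern from the answer: each later group is optional and nested,
# so the third group can only appear when the second one is present.
PARTIAL_NUMBER = re.compile(r"^([\?\d]{1,3})(\-[\?\d]{1,3}(\-[\?\d]{1,4})?)?$")

should_match = ["123-456", "12?-4??-78??", "1-3?-2?0"]
should_fail = ["1--123", "-?-23", "1233-23?-234"]


def is_valid(text):
    """True if text matches the partial-number pattern."""
    return PARTIAL_NUMBER.match(text) is not None


for s in should_match + should_fail:
    print(s, is_valid(s))
```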
This would be your expression with some corrections:
^[\d?]{3}-[\d?]{3}-[\d?]{4}$
I moved the closing square brackets: the quantifier has to be outside the character class. I also removed the outermost parentheses, as they serve no purpose here.
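To make the bracket fix concrete, here is a small Python sketch comparing the original expression with the corrected one. With the quantifier inside the brackets, each class matches exactly one character, and {, 3 (or 4) and } become ordinary members of the class. The test strings are made up for the example.

```python
import re

# Original: "{3}" sits inside the class, so it is not a quantifier.
BROKEN = re.compile(r"^([\d?{3}]-[\d?{3}]-[\d?{4}])$")

# Corrected: the quantifier follows the class.
FIXED = re.compile(r"^[\d?]{3}-[\d?]{3}-[\d?]{4}$")

for text in ["123-456-7890", "12?-4??-78??", "1-?-}"]:
    print(text, bool(BROKEN.match(text)), bool(FIXED.match(text)))
```

The broken version accepts the five-character string 1-?-} and rejects real numbers, which is the opposite of what was intended.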
Now to the lookaheads.
If you want to forbid "**"
^(?!.*\*\*)[\d?]{3}-[\d?]{3}-[\d?]{4}$
I am not sure about your requirements about the usage of the "*". Is only one allowed in the string?
Similarly, to disallow "--":
^(?!.*\*\*)(?!.*--)[\d?]{3}-[\d?]{3}-[\d?]{4}$
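A hedged sketch of the lookahead in action, in Python for illustration. As written above, the character classes do not contain *, so an asterisk can never match in the first place. The variant below is an assumption, not the asker's confirmed requirement: it adds * to each class and loosens the lengths so that a single * can stand in for a whole group. It leaves open how many asterisks are allowed overall.

```python
import re

# Assumption: "*" may appear in any group and a group may be shorter
# than its full length. The lookahead then rejects any string that
# contains two asterisks next to each other.
WITH_STAR = re.compile(r"^(?!.*\*\*)[\d?*]{1,3}-[\d?*]{1,3}-[\d?*]{1,4}$")


def accepts(text):
    """True if text passes the hypothetical wildcard pattern."""
    return WITH_STAR.match(text) is not None


for text in ["123-456-7890", "123-456-*", "123-456-**"]:
    print(text, accepts(text))
```

The lookahead (?!.*\*\*) is checked once at the start of the string and consumes nothing, so the rest of the pattern still validates the whole input.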
I am working in asp.net. I am using Regular Expression Validator
Could you please help me create a regular expression that disallows special characters other than the comma? The comma has to be allowed.
I checked regexlib, but I could not find a match. I tried ^(a-z|A-Z|0-9)*[^#$%^&*()']*$. When I add other characters as invalid, it does not work.
Also, could you please suggest a good resource for regular expressions? regexlib seems to be big; is there another place that lists a limited set of the most-used examples?
Also, can I create expressions using C# code? Any articles for that?
[\w\s,]+
works fine, as you can see below.
RegExr is a great place to test your regular expressions with real time results, it also comes with a very complete list of common expressions.
[] character class
\w matches any word character (alphanumeric and underscore)
\s matches any whitespace character (spaces, tabs, line breaks)
, matches a literal comma
+ is a greedy quantifier that matches the preceding item one or more times
[\d\w\s,]*
Just a guess
As for articles, I got started here and find it to be an excellent resource:
http://www.regular-expressions.info/
For your current problem, try something like this:
[\w\s,]*
Here's a breakdown:
Match a single character present in the list below «[\w\s,]*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
A word character (letters, digits, etc.) «\w»
A whitespace character (spaces, tabs, line breaks, etc.) «\s»
The character “,” «,»
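A short sketch of using this pattern for validation, in Python for illustration. It assumes the whole input must consist of allowed characters, so it uses fullmatch; an unanchored search would succeed on almost any input, because * also matches the empty string. The helper name is_clean is hypothetical.

```python
import re

# Word characters, whitespace and commas, zero or more times.
ALLOWED = re.compile(r"[\w\s,]*")


def is_clean(text):
    """True only if every character of text is allowed."""
    return ALLOWED.fullmatch(text) is not None


print(is_clean("apples, pears, 42"))
print(is_clean("apples; pears"))
# Unanchored search still "finds" a match (the leading "a"):
print(ALLOWED.search("a#b") is not None)
```

Keep in mind that \w also lets the underscore through; if that is unwanted, an explicit class such as [a-zA-Z0-9\s,] is an option.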
For a single character that is not a comma, [^,] should work perfectly fine.
You can try the regular expression [\w\s,]. It matches only word characters (letters, digits, and underscore), whitespace, and the comma. If any other character appears within the text, that character won't match.
For your second question regarding regular expression resources, you can go to
http://www.regular-expressions.info/
This website has a lot of tutorials on regex, plus a lot of useful information.
Also, can I create expressions using C# code? Any articles for that?
By this, do you mean you want to know which classes and methods execute regular expressions? Or do you want a tool that will create regular expressions for you?
You can create expressions with C#, something like this usually does the trick:
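using System.Text.RegularExpressions;

// Letters, digits and commas only; IgnoreCase covers upper-case letters.
// Spaces, | and / inside [...] would be literal characters, so they are left out.
Regex regex = new Regex(@"^[a-z0-9,]*$", RegexOptions.IgnoreCase);
System.Console.Write("Enter Text: ");
string s = System.Console.ReadLine();
Match match = regex.Match(s);
if (match.Success)
{
    System.Console.WriteLine("True");
}
else
{
    System.Console.WriteLine("False");
}
System.Console.ReadLine(); // keep the console window open
You need to import the System.Text.RegularExpressions namespace, as in the first line.
The regular expression above accepts only numbers, letters (both upper and lower case) and the comma.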
For a short introduction to regular expressions, I think the book for MCTS 70-536 can be a big help; I am pretty sure that you can either download it from somewhere or obtain a copy.
I am assuming that you never messed around with regular expressions in C#, hence I provided the code above.
Hope this helps.
Thank you, all..
[\w\s,]* works
Let me go through regular-expressions.info and come back if I need further support.
Let me try the C# code approach and come back if I need further support.
[This forum is awesome. Quality replies so quick..]
Thanks again
(…) denotes a group, not a character set; a character set is denoted with […]. So try this:
^[a-zA-Z0-9,]*$
This will only allow alphanumeric characters and the comma.
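To show the difference between the group and the character set, here is a brief Python sketch (for illustration only; the test strings are made up). Inside parentheses, a-z is the literal three-character sequence "a-z", not a range.

```python
import re

# Group with alternation: matches the literal texts "a-z", "A-Z", "0-9".
GROUPED = re.compile(r"^(a-z|A-Z|0-9)*$")

# Character set: matches any single letter, digit or comma.
CHAR_SET = re.compile(r"^[a-zA-Z0-9,]*$")

for text in ["abc,123", "a-z0-9", "abc#"]:
    print(text, bool(GROUPED.match(text)), bool(CHAR_SET.match(text)))
```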