Regular expression to match specific filename pattern containing underscores

I'm trying to create a regular expression that would match files of this pattern:
Id_Name_processID_timestamp_logName.txt
Example of filename: abcd_Service_11234_15112013_Log.txt
I don't need perfect matching; something that would match anything_anything_anything_anything_anything.txt would work for me.
I haven't tried anything yet. I just lost time staring at this Regex Tutorial for quite a long time, and I don't know where to start :(.

Go to this site: http://regexpal.com/
Put abcd_Service_11234_15112013_Log.txt in the lower box.
Start writing your regex in the top box until it matches (it's a simple one, really: chars, underscore, rinse and repeat) ... You'll be ok ...

My regex, a short simple one.
^\w+_\w+\.txt$
Edit:
I do agree with the 1st answer: you really need to try something on your own, but that website must be the least user-friendly page on regex. You get my answer out of sympathy ;)
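The short regex above works because \w also matches the underscore. If you want the five fields spelled out, something like the following could be used. This is only a sketch: the class and method names (LogFileName, IsMatch) are made up for illustration, and it assumes that no field contains an underscore.

```csharp
using System.Text.RegularExpressions;

public static class LogFileName
{
    // Five underscore-free fields separated by underscores, followed by ".txt".
    // Assumption: Id, Name, processID, timestamp and logName never contain "_".
    private static readonly Regex Pattern = new Regex(
        @"^[^_]+_[^_]+_[^_]+_[^_]+_[^_]+\.txt$");

    public static bool IsMatch(string fileName)
    {
        return Pattern.IsMatch(fileName);
    }
}
```

Calling LogFileName.IsMatch("abcd_Service_11234_15112013_Log.txt") should return true, while a name with fewer or more than five fields is rejected.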

Related

Reusable Non-Capture Groups [duplicate]

I can't seem to find an answer to this problem, and I'm wondering if one exists. Simplified example:
Consider a string "nnnn", where I want to find all matches of "nn" - but also those that overlap with each other. So the regex would provide the following 3 matches:
nnnn (match at positions 1-2)
nnnn (match at positions 2-3)
nnnn (match at positions 3-4)
I realize this is not exactly what regexes are meant for, but walking the string and parsing this manually seems like an awful lot of code, considering that in reality the matches would have to be done using a pattern, not a literal string.

Update 2016:
To get nn, nn, nn, SDJMcHattie proposes in the comments (?=(nn)) (see regex101).
(?=(nn))
Original answer (2008)
A possible solution could be to use a positive look behind:
(?<=n)n
It would give you the end position of:
nnnn (match ends at position 2)
nnnn (match ends at position 3)
nnnn (match ends at position 4)
As mentioned by Timothy Khouri, a positive lookahead is more intuitive (see example)
Instead of his proposal (?=nn)n, I would prefer the simpler form:
(n)(?=(n))
That would reference the first position of the strings you want and would capture the second n in group(2).
That is so because:
Any valid regular expression can be used inside the lookahead.
If it contains capturing parentheses, the backreferences will be saved.
So group(1) and group(2) will capture whatever 'n' represents (even if it is a complicated regex).
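As a rough illustration of the lookahead approach in C#, the sketch below wraps an arbitrary inner pattern in (?=(...)) and collects the start index of every overlapping occurrence. The names OverlappingMatches and FindStarts are hypothetical, and the code assumes the inner pattern is a valid regex that does not need to match the empty string.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class OverlappingMatches
{
    // Returns the start index of every (possibly overlapping) occurrence
    // of innerPattern in input.
    public static List<int> FindStarts(string input, string innerPattern)
    {
        // The lookahead itself is zero-width, so the engine advances one
        // character at a time; group 1 holds what the inner pattern matched.
        var regex = new Regex("(?=(" + innerPattern + "))");
        var starts = new List<int>();
        foreach (Match m in regex.Matches(input))
        {
            starts.Add(m.Groups[1].Index);
        }
        return starts;
    }
}
```

For the input "nnnn" and the inner pattern "nn", this should report the three start indexes 0, 1 and 2.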

Using a lookahead with a capturing group works, at the expense of making your regex slower and more complicated. An alternative solution is to tell the Regex.Match() method where the next match attempt should begin. Try this:
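Regex regexObj = new Regex("nn");
Match matchObj = regexObj.Match(subjectString);
while (matchObj.Success) {
    // Use matchObj.Index and matchObj.Value here.
    // Resume one character after the start of this match so that overlaps are found.
    matchObj = regexObj.Match(subjectString, matchObj.Index + 1);
}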

AFAIK, there is no pure regex way to do that at once (i.e. returning the three captures you request without a loop).
However, you can find the pattern once and then loop, restarting the search at an offset (found position + 1). You have to combine regex use with simple code.
[EDIT] Great, I am downvoted when I basically said what Jan showed...
[EDIT 2] To be clear: Jan's answer is better. Not more precise, but certainly more detailed, it deserves to be chosen. I just don't understand why mine is downvoted, since I still see nothing incorrect in it. Not a big deal, just annoying.

How to use regex in order to catch a power statement

How can I use regex to catch a power statement, here are some examples:
2^4
(2*5)^x
y^(y+1)
or more complex ones such as x^4+(x*2)^(x+1), in which case it has 2 matches ("x^4" and "(x*2)^(x+1)")
I managed to get it working without the parenthesis using the expression:
Regex rPower = new Regex(@"\w\^\w");
But to deal with the possible existence of parentheses I was thinking of something along these lines, but it still isn't working...
Regex rPower = new Regex(@"(?(?=\()(.*?(?=\)))|(\w))\^(?(?=\()(.*?(?=\)))|(\w))");
Any help/explanation that includes the thought process behind it would be deeply appreciated, since I don't know much about regex and I'm just now starting to learn it.
Thanks in advance
EDIT: For clarity what I intend to do is:
Find a substring that may start with a "(", in which case it should read everything from that "(" until it finds a ")", and otherwise is assumed to be a "\w". That is followed by a "^", which in turn is followed by another operand of the same form.
Basically it will match the expression "(random_Expression)^(random_Expression)", but an operand may not actually be a complex expression; if it does not contain any parentheses I will assume it's a simple "\w".
I hope I made myself clear :S

You may use this:
(\([^)]*\)|\w)\^(\([^)]*\)|\w)
Sample matches:
2^2 matches 2^2
a+b^c matches b^c
(a+b)^(c+d) matches (a+b)^(c+d)
2^(a+b) matches 2^(a+b)
(a+b)^2 matches (a+b)^2
(a+b)^2+5^2-(3+2)^(2+3) matches (a+b)^2, 5^2, (3+2)^(2+3)
Obviously, the expression will misbehave if constructs such as nested parentheses are used. If you are going to work with complex expressions, I guess you will have to parse them carefully with a more elaborate method.
Could you please edit or reply with an explanation, even if brief, of how the expression is working?
It is similar to your original expression \w\^\w, but it replaces each \w with (\([^)]*\)|\w). If you look closely, that matches either "something inside parentheses" (given by \([^)]*\), which doesn't work for nested parentheses) or "a single word character" (\w).
Hope that helps a bit :)
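To try the expression from C#, a minimal sketch could look like the following. The class and method names (PowerTerms, Find) are invented for this example; the pattern is the one from the answer above and has the same limitation with nested parentheses.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PowerTerms
{
    // Each operand is either a parenthesised group without nesting
    // or a single word character; the two operands are joined by "^".
    private static readonly Regex Power = new Regex(
        @"(\([^)]*\)|\w)\^(\([^)]*\)|\w)");

    public static List<string> Find(string expression)
    {
        var terms = new List<string>();
        foreach (Match m in Power.Matches(expression))
        {
            terms.Add(m.Value);
        }
        return terms;
    }
}
```

For "(a+b)^2+5^2-(3+2)^(2+3)" this should return the three terms listed in the sample matches. In each match, Groups[1] holds the base and Groups[2] the exponent.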

Regular expression not capturing matches in the middle of a string

The regular expression I'm starting with is:
^(((http|ftp|https|www)://)?([\w+?.\w+])+([a-zA-Z0-9\~!\@#\$\%\^\&*()_-\=+\/\?.\:\;\'\,]*)?)$
I'm using this to find URLs in the middle of user-supplied text and replace it with a hyperlink. This works fine and matches the following:
http://www.google.com
www.google.com
google.com
www.google.com?id=5
etc...
However, it doesn't find a match if there is any text on either side of it (kind of defeats the purpose of what I'm doing). :)
No match:
Go to www.google.com
www.google.com is the best.
I go to www.google.com all the time.
etc...
How can I change this so that it will match no matter where in the string it appears? I'm terrible with regular expressions...

You have a bug in your original regex. The square brackets make \w+?\.\w+ a character class:
(((http|ftp|https|www)://)?([\w+?\.\w+])+([a-zA-Z0-9\~\!\@\#\$\%\^\&\*\(\)_\-\=\+\\\/\?\.\:\;\'\,]*)?)
                            ^         ^
After removing them (and the anchors ^ and $), your regex will not match obvious non-URLs.
I suggest using http://regexpal.com/ for testing regexes, as it has syntax highlighting within the regex.
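To illustrate the effect of dropping the anchors and the stray square brackets, here is a small C# sketch. Note that the pattern is a deliberately simplified stand-in, not the asker's original regex, and the names UrlFinder and Find are hypothetical. It assumes a URL is an optional scheme, at least two dot-separated word parts, and an optional path or query that runs up to the next whitespace.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class UrlFinder
{
    // No ^ or $ anchors, so the pattern can match anywhere in the text.
    // \w+(?:\.\w+)+ is a real sequence ("word.word..."), not a character class.
    private static readonly Regex Url = new Regex(
        @"\b(?:(?:https?|ftp)://)?\w+(?:\.\w+)+(?:[/?]\S*)?");

    public static List<string> Find(string text)
    {
        var urls = new List<string>();
        foreach (Match m in Url.Matches(text))
        {
            urls.Add(m.Value);
        }
        return urls;
    }
}
```

With this sketch, UrlFinder.Find("I go to www.google.com all the time.") should yield just "www.google.com". Replacing the matches with hyperlinks could then be done with Regex.Replace and the same pattern.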

I think you should use a positive lookahead that searches for the URL and checks two possibilities: it is either at the beginning or in the middle of the whole string.
You would use something like ^((?=url)?|.?(?=url).*?$))
That is just a starting point; I am not giving you a full answer, just an idea.
I would work it out, but at the moment I am lazy and your regex looks like it needs a 20-minute analysis.
Stack Overflow erased some parts of my example.

Regex for ANY string except "www"? (subdomain)

I was wondering if someone out there could help me with a regex in C#. I think it's fairly simple but I've been wracking my brain over it and not quite sure why I'm having such a hard time. :)
I've found a few examples around but I can't seem to manipulate them to do what I need.
I just need to match ANY alphanumeric+dashes subdomain string that is not "www", and just up to the "."
Also, ideally, if someone were to type "www.subdomain.domain.com" I would like the www to be ignored if possible. If not, it's not a huge issue.
In other words, I would like to match:
(test).domain.com
(test2).domain.com
(wwwasdf).domain.com
(asdfwww).domain.com
(w).domain.com
(wwwwww).domain.com
(asfd-12345-www-bananas).domain.com
www.(subdomain).domain.com
And I don't want to match:
(www).domain.com
It seems to me like it should be easy, but I'm having troubles with the "not match" part.
For what it's worth, this is for use in the IIS 7 URL Rewrite Module, to rewrite for all non-www subdomains.
Thanks!

Is the remainder of the domain name constant, like .domain.com, as in your examples? Try this:
\b(?!www\.)(\w+(?:-\w+)*)(?=\.domain\.com\b)
Explanation:
\w+(?:-\w+)* matches a generic domain-name component as you described (but a little more rigorously).
(?=\.domain\.com\b) makes sure it's the first subdomain (i.e., the last one before the actual domain name).
\b(?!www\.) makes sure it isn't www. (without the \b, it could skip over the first w and match just the ww.).
In my tests, this regex matches precisely the parts you highlighted in your examples, and does not match the www. in either of the last two examples.
EDIT: Here's another version which matches the whole name, capturing the pieces in different groups:
^((?:\w+(?:-\w+)*\.)*)((?!www\.)\w+(?:-\w+)*)(\.domain\.com)$
In most cases, group $1 will contain an empty string because there's nothing before the subdomain name, but here's how it breaks down www.subdomain.domain.com:
$1: "www."
$2: "subdomain"
$3: ".domain.com"
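For anyone who wants to check the first expression from C# code, here is a short sketch. The helper names (Subdomain, TryGet) are made up for this example, and ".domain.com" is hard-coded exactly as in the answer above.

```csharp
using System.Text.RegularExpressions;

public static class Subdomain
{
    // First answer's pattern: a name component that is not "www"
    // and sits directly before ".domain.com".
    private static readonly Regex Pattern = new Regex(
        @"\b(?!www\.)(\w+(?:-\w+)*)(?=\.domain\.com\b)");

    public static bool TryGet(string host, out string subdomain)
    {
        Match m = Pattern.Match(host);
        subdomain = m.Success ? m.Groups[1].Value : "";
        return m.Success;
    }
}
```

Subdomain.TryGet("www.subdomain.domain.com", out var name) should succeed with name set to "subdomain", while "www.domain.com" should produce no match.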

^www\.
And invert the logic for this bit, so if it matches, then your string does not meet your requirements.

This works:
^(?!www\.domain\.com)(?:[a-z\-\.]+\.domain\.com)$
Or, with the necessary backslashes for Java (or C#?) strings:
"^(?!www\\.domain\\.com)(?:[a-z\\-\\.]+\\.domain\\.com)$"
There may be a more concise way (i.e. only typing domain.com once), but this works ..

Just substitute the original with everything after the www, if present (pseudocode):
str = re.sub(r"(www\.)?(.+)", r"\2", str)
Or if you just want to match those which are "wrong" use this:
(www\.([^.]+)\.([^.]+))
And if you must match all those which are good use this:
(([^w]|w[^w]|ww[^w]|www[^.]|www\.([^.]+)\.([^.]+)\.).+)

Just thinking aloud here:
^(?:www\.)?([^\.]+)\.([^\.]+)\.
where...
(?:www\.)? looks for a possible "www" at the start, non-capturing
([^\.]+)\. looks for the sub-domain (anything except a dot at least once until a dot)
([^\.]+)\. looks for the domain, ending with a dot (anything except a dot at least once until a dot)
Note: This expression will not work with double sub-domains:
www.subsub.sub.domain.com

This:
^(?:www\.)?([^.]*)
It matches exactly what you put in parentheses in your question. You will find your answers sitting in group(1). You have to anchor it to the beginning of the line. Use this:
^(?:www\.)?(.*)
If you want everything in the URL except the "www.". One example you did not include in your test cases was "alpha.subdomain.domain.com". In the event you need to match everything, except "www.", that is not in the "domain.com" part of the string, use this:
^(?:www\.)?(.+)((?:\.(?:[^./\?]+)){2})
It will solve all of your cases, but in addition, will also return "alpha.subdomain" from my additional test case. And, for an encore, places ".domain.com" in group 2 and will not match beyond that if there are directories or parameters in the url.
I verified all of these responses here.
Finally, for the sake of overkill, if you want to reject addresses that begin with "www.", you can use negative lookbehind:
^....(?<!www\.).*

Thought I'd share this.
(\\.[A-Za-z]{2,3}){1,2}$
It removes any '.com.au' or '.co.uk' from the end. Then you can do an additional lookup to detect whether a URL contains a subdomain.
E.g.
subdomain1.sitea.com.au
subdomain2.siteb.co.uk
subdomain3.sitec.net.au
all become:
subdomain1.sitea
subdomain2.siteb
subdomain3.sitec
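In C# the stripping step might be sketched as follows. TldStripper and Strip are hypothetical names. The pattern is written in a verbatim string, so the dot needs only a single backslash.

```csharp
using System.Text.RegularExpressions;

public static class TldStripper
{
    // One or two trailing ".xx" / ".xxx" labels, e.g. ".com.au" or ".co.uk".
    private static readonly Regex Suffix = new Regex(
        @"(\.[A-Za-z]{2,3}){1,2}$");

    public static string Strip(string host)
    {
        return Suffix.Replace(host, "");
    }
}
```

One caveat of this approach: any trailing label of two or three letters is treated as part of the suffix, so a host such as "www.abc.com" would be reduced to "www".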

Regular Expression to reject special characters other than commas

I am working in ASP.NET and using the RegularExpressionValidator control.
Could you please help me create a regular expression that rejects special characters other than the comma? The comma has to be allowed.
I checked regexlib, but I could not find a match. I tried ^(a-z|A-Z|0-9)*[^#$%^&*()']*$ . When I add other characters as invalid, it does not work.
Also, could you please suggest a place where I can find a good resource on regular expressions? regexlib seems to be big; is there any other place that lists a limited set of the most used examples?
Also, can I create expressions using C# code? Any articles for that?

[\w\s,]+
works fine, as you can see below.
RegExr is a great place to test your regular expressions with real-time results; it also comes with a very complete list of common expressions.
[] is a character class.
\w matches any word character (alphanumeric and underscore).
\s matches any whitespace character (spaces, tabs, line breaks).
, includes the comma.
+ is a greedy quantifier; it matches the preceding element 1 or more times.

[\d\w\s,]*
Just a guess

To answer on any articles, I got started here, find it to be an excellent resource:
http://www.regular-expressions.info/
For your current problem, try something like this:
[\w\s,]*
Here's a breakdown:
Match a single character present in the list below «[\w\s,]*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
A word character (letters, digits, etc.) «\w»
A whitespace character (spaces, tabs, line breaks, etc.) «\s»
The character “,” «,»
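One thing to keep in mind when moving this from the validator control to plain C# code: RegularExpressionValidator only accepts input that the pattern matches as a whole, whereas Regex.IsMatch succeeds on any partial match (with *, even an empty one). A sketch with explicit anchors, using the made-up names CommaText and IsValid, might look like this:

```csharp
using System.Text.RegularExpressions;

public static class CommaText
{
    // ^ and $ force the whole input to consist of word characters,
    // whitespace and commas; without them any string would "match".
    private static readonly Regex Allowed = new Regex(@"^[\w\s,]*$");

    public static bool IsValid(string input)
    {
        return Allowed.IsMatch(input);
    }
}
```

With the anchors in place, CommaText.IsValid("a, b, c") should be true and CommaText.IsValid("a#b") false, while the unanchored Regex.IsMatch("a#b", @"[\w\s,]*") would still report a match.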

For a single character that is not a comma, [^,] should work perfectly fine.

You can try the regular expression [\w\s,]. It matches only word characters, whitespace and the comma. If any other character appears within the text, it won't match.
For your second question, regarding regular expression resources, you can go to
http://www.regular-expressions.info/
This website has a lot of tutorials on regex, plus a lot of useful information.
Also, can I create expressions using C# code? Any articles for that?
By this, do you mean that you want to know which classes and methods execute regular expressions? Or do you want a tool that will create regular expressions for you?

You can create expressions with C#, something like this usually does the trick:
Regex regex = new Regex(@"^[a-z0-9,]*$", RegexOptions.IgnoreCase);
System.Console.Write("Enter Text: ");
string s = System.Console.ReadLine();
Match match = regex.Match(s);
if (match.Success)
{
    System.Console.WriteLine("True");
}
else
{
    System.Console.WriteLine("False");
}
System.Console.ReadLine();
You need to import the System.Text.RegularExpressions namespace (using System.Text.RegularExpressions;).
The regular expression above accepts only numbers, letters (both upper and lower case) and the comma.
For a small introduction to regular expressions, I think that the book for MCTS 70-536 can be a big help; I am pretty sure that you can either download it from somewhere or obtain a copy.
I am assuming that you never messed around with regular expressions in C#, hence I provided the code above.
Hope this helps.

Thank you, all..
[\w\s,]* works
Let me go through regular-expressions.info and come back if I need further support.
Let me try the C# code approach and come back if I need further support.
[This forum is awesome. Quality replies so quick..]
Thanks again

(…) denotes a group, not a character set; a character set is written with […]. So try this:
^[a-zA-Z0-9,]*$
This will only allow alphanumeric characters and the comma.
