What is this regex supposed to match - h* - c#

I have a piece of code that is supposed to look through a list of strings to match a regular expression whose pattern is an input from the user. Inputs such as
h*
q*
y*
seem to match anything and everything. My questions -
Is any of the above a valid regex pattern at all?
If yes, what exactly are they supposed to match?
I've gone through http://regexhero.net/reference/ but couldn't find anything that specifies such an expression.
I've used http://regexhero.net/tester/ to check what my regex matches with q* as the Regular Expression and Whatever as the Target String. It gives me 9 matches!

h* means zero or more h characters
The same for the others

These patterns match any number of the specified character, including zero. Without any anchors, there are 9 places where there are zero q in whatever (between the characters and at the ends).
From your reference:
Ordinary characters — Characters other than . $ ^ { [ ( | ) * + ? \ match themselves.
* — Repeat 0 or more times matching as many times as possible.
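The nine empty matches can be reproduced with a short script. This is a minimal sketch in Python, whose re module treats q* the same way as .NET does for this input; the variable names are illustrative only.

```python
import re

target = "Whatever"

# q* may match the empty string, so it succeeds at every position:
# before each of the 8 characters and once more at the very end.
matches = [(m.start(), m.group()) for m in re.finditer(r"q*", target)]

print(len(matches))  # 9
print(matches)       # every entry pairs a position with an empty string
```

Anchoring the pattern, or using q+ instead, removes the empty matches.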

Related

Splitting a string with some characters with some ignored characters as well

There is a string: "QARR_1 * QARR_1 * NPSH[*] + NPSH0". I want to split it into a string array (exactly of 4 items) to get output as: QARR_1, QARR_1, NPSH[*], NPSH0.
I understand, I should use Regex lookaround concepts here but, I am not able to achieve the desired result. Kindly help.
I think you could do it like this without lookarounds:
(\w+(?:\[\*\])?)
Test
http://rextester.com/YHNRC51736
a captured group ( ... )
one or more word characters \w+
followed by an optional non-capturing group (?:\[\*\])?
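As a rough illustration of this pattern in use, the sketch below collects every match instead of splitting. It is written in Python and drops the outer capturing group so that findall returns the whole match; the variable names are made up for the example.

```python
import re

expression = "QARR_1 * QARR_1 * NPSH[*] + NPSH0"

# \w+ matches a run of word characters; (?:\[\*\])? optionally
# consumes a literal "[*]" that directly follows it.
tokens = re.findall(r"\w+(?:\[\*\])?", expression)

print(tokens)  # ['QARR_1', 'QARR_1', 'NPSH[*]', 'NPSH0']
```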
import re
a = "QARR_1 * QARR_1 * NPSH[*] + NPSH0"
# Split on " * " or " + "; both operators are escaped so they are matched literally.
x = re.split(r' \* | \+ ', a)
print(x)
['QARR_1', 'QARR_1', 'NPSH[*]', 'NPSH0']
Hmmm, well... this works in the regex tool I used:
\w+\[?\*?\]?
Not the most elegant, but pretty simple, so long as the input isn't broken like: "Abc12*]", "abc12[]", etc.
How it works:
\w+ greedily captures a sequence of word characters (it keeps capturing until it runs out of characters that match). For ASCII input this is equivalent to [a-zA-Z0-9_]+; in .NET, \w also matches Unicode letters and digits by default.
\[?, \*?, \]? well, to start, the backslash here is used as an escape character to get Regex to literally look for the characters [, * and ]. They need to be escaped because they have a special meaning in Regex syntax otherwise. The ? at the end of each part tells the Regex pattern to match for the character between 0 and 1 times. It is necessary to be able to capture it 0 times, to allow matches that don't have the characters ([, * ,] ) at the end to be made.
A few examples of the kind of things it will match:
apples123_121231_2133414[*]
Ap1]
Orange_11[*
1ba222nnana*]
A few examples of the kinds of things it won't match:
(note, cases where part of the word is highlighted, only the highlighted part will be matched.)
Pares]*[
++++!!~+
111Grapes[]
111Grapes[]*
So, given the input you supplied, it should be fine... these are just a few things to be aware of.
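To see which part of each sample the pattern actually consumes, a small script can print the matched portion. This is only a sketch in Python; matched_part is a hypothetical helper, and the samples are the ones listed above.

```python
import re

pattern = re.compile(r"\w+\[?\*?\]?")

samples = [
    "apples123_121231_2133414[*]",
    "Ap1]",
    "Orange_11[*",
    "1ba222nnana*]",
    "Pares]*[",
    "++++!!~+",
    "111Grapes[]",
    "111Grapes[]*",
]

def matched_part(text):
    """Return the first substring the pattern matches, or None."""
    m = pattern.search(text)
    return m.group() if m else None

for s in samples:
    print(repr(s), "->", repr(matched_part(s)))
```

Running it shows that "Pares]*[" only yields "Pares]", "++++!!~+" yields nothing, and "111Grapes[]*" stops at "111Grapes[]". Note that "111Grapes[]" is consumed in full, because each of the three trailing characters is optional on its own.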

Regular expression to match 0 or more occurrences not working

This question may sound stupid. But I have tried several options and none of them worked.
I have the following in a string variable.
string myText="*someText*someAnotherText*";
What I mean by the above is that there can be 0 or more characters before "someText". There can be 0 or more characters after "someText" and before "someAnotherText". Finally, there can be 0 or more occurrences of any character after "someAnotherText".
I tried the following.
string res= Regex.Replace(searchFor.ToLower(), "*", @"\S*");
It didn't work.
Then I tried the following.
string res= Regex.Replace(searchFor.ToLower(), "*", @"\*");
Even that didn't work.
Can someone help, please?
Even though I have mentioned "*" to indicate 0 or more occurrences, it says that I haven't mentioned the number of occurrences.
Unlike the DOS wildcard character, the * character in a regular expression means repeat the previous item (character, group, whatever) 0 or more times. In your regular expression the first * has no preceding character, the second one follows the t character, so will repeat that any number of times.
To get '0 or more of any character' you need to use the composition .* where . is 'any character' and * is '0 or more times'.
In other words to search for someText followed any number of characters later by someAnotherText you would use the following Regex:
var re = new Regex(@"someText.*someAnotherText");
Note that unless you specify otherwise by putting start/end specifiers in (^ for start of string, $ for end) the Regex will match any substring of the test string.
Tests for the above, all returning true:
re.IsMatch("This is someText, followed by someAnotherText with text after.");
re.IsMatch("someTextsomeAnotherText");
re.IsMatch("start:someTextsomeAnotherText:end");
And so on.
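The same checks can be run as a quick script. This is a sketch in Python, using re.search as the counterpart of Regex.IsMatch; the list and variable names are illustrative.

```python
import re

pattern = re.compile(r"someText.*someAnotherText")

samples = [
    "This is someText, followed by someAnotherText with text after.",
    "someTextsomeAnotherText",
    "start:someTextsomeAnotherText:end",
]

# search() looks for the pattern anywhere in the string, so no anchors are needed.
results = [pattern.search(s) is not None for s in samples]
print(results)  # [True, True, True]

# The order matters: someText must come first.
print(pattern.search("someAnotherText then someText") is not None)  # False
```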
In Regex terms * is a quantifier. Other quantifiers are:
? Match 0 or 1
+ Match 1 or more
{n} Match 'n' times
{n,} Match at least 'n' times
{n,m} Match 'n' to 'm' times
All apply to the preceding term in the Regex.
Placing a ? after another quantifier (including ?) converts it to its lazy form, which matches as few repetitions as it can while still letting the rest of the pattern match. The match then ends at the earliest point where the following terms can match, rather than the latest.
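A small comparison makes the greedy/lazy difference concrete. This is a minimal sketch in Python with a made-up input string; the quantifier behavior shown is the same in .NET.

```python
import re

text = "<a><b>"

# Greedy: .+ grabs as much as possible, so the match runs to the last ">".
greedy = re.search(r"<.+>", text).group()

# Lazy: .+? grabs as little as possible, so the match stops at the first ">".
lazy = re.search(r"<.+?>", text).group()

print(greedy)  # <a><b>
print(lazy)    # <a>
```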
The regular expression to match 0 or more occurrences of any character is
.*
where . matches any single character and * matches zero or more occurrences of it.
(This answer is a quick reference simplification of the current answer.)
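If the goal is to turn a DOS-style wildcard string into a regular expression, one possible approach is to escape the whole string first and then swap each escaped * for .* afterwards. The sketch below does this in Python; wildcard_to_regex is a hypothetical helper name, and in C# the same idea would presumably rely on Regex.Escape.

```python
import re

def wildcard_to_regex(wildcard):
    """Translate a wildcard string where * means 'any characters' into a regex."""
    # Escape everything first so characters such as "." or "+" stay literal,
    # then turn each escaped "*" back into ".*".
    return re.escape(wildcard).replace(r"\*", ".*")

pattern = wildcard_to_regex("*someText*someAnotherText*")
print(pattern)  # .*someText.*someAnotherText.*

print(re.search(pattern, "xx someText yy someAnotherText zz") is not None)  # True
```

Escaping before substituting matters: replacing * directly in the raw input would leave any other metacharacters the user typed active in the resulting pattern.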

Regex pattern in C# with empty space

I am having an issue with a regular expression and can't find the answer to my question.
I am trying to build a reg ex pattern that will pull in any matches that have # around them. for example #match# or #mt# would both come back.
This works fine for that. #.*?#
However I don't want matches on ## to show up. Basically if there is nothing between the pound signs don't match.
Hope this makes sense.
Thanks.
Please use + to match 1 or more symbols:
#+.+#+
UPDATE:
If you want to only match substrings that are enclosed with single hash symbols, use:
(?<!#)#(?!#)[^#]+#(?!#)
See regex demo
Explanation:
(?<!#)#(?!#) - a # symbol that is not preceded with a # (due to the negative lookbehind (?<!#)) and not followed by a # (due to the negative lookahead (?!#))
[^#]+ - one or more symbols other than # (due to the negated character class [^#])
#(?!#) - a # symbol not followed with another # symbol.
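The difference between the simple pattern and the lookaround version shows up on input that contains a double hash. The following is a sketch in Python with a made-up sample string; both patterns use syntax that Python and .NET share.

```python
import re

text = "## and #mt# plus #match#"

# Simple version: the leading "##" is allowed to start a match,
# so the results straddle the real tokens.
simple = re.findall(r"#.+?#", text)

# Lookaround version: each delimiter must be a lone "#",
# and the content may not contain "#".
strict = re.findall(r"(?<!#)#(?!#)[^#]+#(?!#)", text)

print(simple)  # ['## and #', '# plus #']
print(strict)  # ['#mt#', '#match#']
```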
Instead of using * to match between zero and unlimited characters, replace it with +, which will only match if there is at least one character between the #'s. The edited regex should look like this: #.+?#. Hope this helps!
Edit
Sorry for the incorrect regex, I had not expected multiple hash signs. This should work for your sentence: #+.+?#+
Edit 2
I am pretty sure I got it. Try this: (?<!#)#[^#].*?#. It might not work as expected with triple hashes though.
Try:
[^#]?#.+#[^#]?
The [^character_group] construction matches any single character not included in the character group. Using the ? after it lets the pattern match at the beginning or end of a string, since ? matches the preceding element zero or one time. Check out the documentation for details.

regular expression to match a pattern

I need a regular expression for c# which can match following pattern
abc1abcd
1abcdefg
abcdefg1
basically my expression should have at least one number and min size is 8 char including number. If possible explain the regex also.
I'd probably check with two statements. Just check the length eg
string.Length > 7
and then make sure this regex can find a match...
[0-9]
You can use a look-ahead assertion to verify the length, and then search forward for a digit, thus:
(?=.{8}).*[0-9]
We look-ahead for 8 characters, and if that is successful, then we actually attempt to match "anything, followed by a digit".
But really, don't do this. Just check the length explicitly. It's much clearer.
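For comparison, here are both approaches side by side. This is a sketch in Python; the two function names are invented for the example, and the samples include the three strings from the question.

```python
import re

LOOKAHEAD = re.compile(r"(?=.{8}).*[0-9]")

def is_valid_regex(text):
    # The look-ahead requires 8 characters from the match start,
    # then .*[0-9] requires a digit somewhere after that point.
    return LOOKAHEAD.search(text) is not None

def is_valid_explicit(text):
    # Same rule spelled out: long enough, and contains a digit.
    return len(text) >= 8 and re.search(r"[0-9]", text) is not None

samples = ["abc1abcd", "1abcdefg", "abcdefg1", "abc1", "abcdefgh"]
for s in samples:
    print(s, is_valid_regex(s), is_valid_explicit(s))
```

The first three samples pass both checks; "abc1" is too short and "abcdefgh" has no digit, so both are rejected either way.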
Your regular expression pattern should just be: \d+ (match 1 or more digits). For your example, it's probably best not to determine the minimum length using regex, since all you care about is that the string has at least 1 digit and is at least 8 characters long.
Regex regEx = new Regex(@"\d+");
bool isValid = regEx.Match(myString).Success && myString.Length >= 8;
The pattern \d is just the same as [0-9] and the + symbol means at least one of. The @ symbol in front of the string makes it a verbatim string literal, so the compiler won't try to treat \d as an escape sequence.
As mentioned by El Ronnoco in the comments, just \d would match your requirement. Knowing about \d+ is useful for more complicated patterns where you want a few numbers in between some strings,etc.
Also: I've just read something that I didn't know. \d matches any character in the Unicode "Number, Decimal Digit" category, which is a lot more than just [0-9]. Something to be aware of if you only want the ASCII digits. In that case El Ronnoco's answer of [0-9] for your pattern is sufficient.
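The Unicode behavior is easy to check. The sketch below uses Python, where \d on text strings is also Unicode-aware by default, so it mirrors the .NET behavior described above; the sample character is an arbitrary non-ASCII decimal digit chosen for illustration.

```python
import re

arabic_three = "\u0663"  # ARABIC-INDIC DIGIT THREE, a Unicode decimal digit

matches_backslash_d = re.fullmatch(r"\d", arabic_three) is not None
matches_ascii_class = re.fullmatch(r"[0-9]", arabic_three) is not None
# Python-specific: the re.ASCII flag narrows \d back to [0-9].
matches_with_ascii_flag = re.fullmatch(r"\d", arabic_three, re.ASCII) is not None

print(matches_backslash_d)      # True
print(matches_ascii_class)      # False
print(matches_with_ascii_flag)  # False
```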

Regex for 1 or 2 digits, optional non-alphanumeric, 2 known alphas

I've been bashing my head against the wall trying to do what should be a simple regex - I need to match, eg 12po where the 12 part could be one or two digits, then an optional non-alphanumeric like a :.-,_ etc, then the string po.
The eventual use is going to be in C# but I'd like it to work in regular grep on the command line as well. I haven't got access to C#, which doesn't help.
^[0-9]{1,2}[:.,-]?po$
Add any other allowable non-alphanumeric characters to the middle brackets to allow them to be parsed as well.
^\d{1,2}[\W_]?po$
\d matches a digit and {1,2} means one or two of the preceding expression. \W matches a non-word character, and the _ is added to the class because an underscore counts as a word character.
^[0-9][0-9]?[^A-Za-z0-9]?po$
You can test it here: http://www.regextester.com/
To use this in C#,
Regex r = new Regex(@"^[0-9][0-9]?[^A-Za-z0-9]?po$");
if (r.Match(someText).Success) {
    // Do something
}
Remember, @ is a useful symbol that means the parser takes the string literally (e.g., you don't need to write \\ for one backslash).
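The three suggested patterns differ only in which separator characters they allow. The sketch below compares them in Python on a few made-up inputs; the dictionary labels and the accepts helper are hypothetical names used for the example.

```python
import re

patterns = {
    "explicit list": r"^[0-9]{1,2}[:.,-]?po$",
    "non-word or underscore": r"^\d{1,2}[\W_]?po$",
    "negated alphanumerics": r"^[0-9][0-9]?[^A-Za-z0-9]?po$",
}

samples = ["12po", "1:po", "12_po", "123po", "12xpo"]

def accepts(name, text):
    """Return True if the named pattern matches the whole input."""
    return re.search(patterns[name], text) is not None

for name in patterns:
    print(name, [s for s in samples if accepts(name, s)])
```

All three accept "12po" and "1:po" and reject "123po" and "12xpo". Only the first rejects "12_po", because the underscore is not in its explicit separator list.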
