What is this Regex doing: new Regex(#"(?<!\\),"); - c#

Regex rx = new Regex(#"(?<!\\\\),");
String test = "OU=James\\, Brown,OU=Test,DC=Internal,DC=Net";
This works perfectly, but I want to understand it. I've been gooling without success. Can somebody give me a word or phrase that I can use to look this up and understand it.
I would have thought that it should be written like this:
new Regex(#"(\\\\)?,");
I've seen the (?zzzzzz) syntax before. It's the <! part that I'm stumped by.

(?<!…) is a negative look-behind assertion. In your regex
(?<!\\\\),
the , matches a comma obviously. The \\\\ matches 2 backslashes. Then (?<!\\\\), matches any commas not preceeded by 2 backslashes.
Therefore it will match the , before the OU and DC, but not between James and Brown:
OU=James\\, Brown,OU=Test,DC=Internal,DC=Net
^ ^ ^

The <! part indicates a negative lookbehind. The rest of the expression (just a comma) matches only if it's not preceded by a backslash (or two backslashes, depending on whether the title or the body of your question is the accurate one).

Related

Regex pattern in C# with empty space

I am having issue with a reg ex expression and can't find the answer to my question.
I am trying to build a reg ex pattern that will pull in any matches that have # around them. for example #match# or #mt# would both come back.
This works fine for that. #.*?#
However I don't want matches on ## to show up. Basically if there is nothing between the pound signs don't match.
Hope this makes sense.
Thanks.
Please use + to match 1 or more symbols:
#+.+#+
UPDATE:
If you want to only match substrings that are enclosed with single hash symbols, use:
(?<!#)#(?!#)[^#]+#(?!#)
See regex demo
Explanation:
(?<!#)#(?!#) - a # symbol that is not preceded with a # (due to the negative lookbehind (?<!#)) and not followed by a # (due to the negative lookahead (?!#))
[^#]+ - one or more symbols other than # (due to the negated character class [^#])
#(?!#) - a # symbol not followed with another # symbol.
Instead of using * to match between zero and unlimited characters, replace it with +, which will only match if there is at least one character between the #'s. The edited regex should look like this: #.+?#. Hope this helps!
Edit
Sorry for the incorrect regex, I had not expected multiple hash signs. This should work for your sentence: #+.+?#+
Edit 2
I am pretty sure I got it. Try this: (?<!#)#[^#].*?#. It might not work as expected with triple hashes though.
Try:
[^#]?#.+#[^#]?
The [^ character_group] construction matches any single character not included in the character group. Using the ? after it will let you match at the beginning/end of a string (since it matches the preceeding character zero or more times. Check out the documentation here

Why is this regex not allowing this text?

I have a username validator IsValidUsername, and I am testing "baconman" but it is failing, could someone please help me out with this regex?
if(!Regex.IsMatch(str, #"^[a-zA-Z]\\w+|[0-9][0-9_]*[a-zA-Z]+\\w*$")) {
isValid = false;
}
I want the restrictions to be: (It's very close)
Be between 5 & 17 characters long
contain at least one letter
no spaces
no special characters
You're escaping unnecessarily: if you write your regex as starting with # outside the string, you don't need both \ - just one is fine.
Either:
#"\w"
or
"\\w"
Edit: I didn't make this clear: right now due to the double escaping, you're looking for a \ in your regex and a w. So your match would need [some character]\w to match (example: "a\w" or "a\wwwwww" would match.
Your requirements are best taken care of in normal C#. They don't map well to a regular expression. Just code them up using LINQ which works on strings like it would on an IEnumerable<char>.
Also, understanding a query of a string is much easier than understanding a Regex with the requirements that you have.
It is possible to do everything as part of a Regex, however it is not pretty :-)
^(\w(?=\w*[a-zA-Z])|[a-zA-Z]|\w(?<=[a-zA-Z]\w*)){5,17}$
It does 3 checks that always results in 1 character being matched (so we can perform the length check in the end)
Either the character is any word character \w which is before [a-zA-Z]
Or it is [a-zA-Z]
Or it is any word character \w which is after [a-zA-Z]

C# Regex match on special characters

I know this stuff has been talked about a lot, but I'm having a problem trying to match the following...
Example input: "test test 310-315"
I need a regex expression that recognizes a number followed by a dash, and returns 310. How do I include the dash in the regex expression though. So the final match result would be: "310".
Thanks a lot - kcross
EDIT: Also, how would I do the same thing but with the dash preceding, but also take into account that the number following the dash could be a negative number... didnt think of this one when I wrote the question immediately. for example: "test test 310--315" returns -315 and "test 310-315" returns 315.
Regex regex = new Regex(#"\d+(?=\-)");
\d+ - Looks for one or more digits
(?=\-) - Makes sure it is followed by a dash
The # just eliminates the need to escape the backslashes to keep the compiler happy.
Also, you may want this instead:
\d+(?=\-\d+)
This will check for a one or more numbers, followed by a dash, followed by one or more numbers, but only match the first set.
In response to your comment, here's a regex that will check for a number following a -, while accounting for potential negative (-) numbers:
Regex regex = new Regex(#"(?<=\-)\-?\d+");
(?<=\-) - Negative lookbehind which will check and make sure there is a preceding -
\-? - Checks for either zero or one dashes
\d+ - One or more digits
(?'number'\d+)- will work ( no need to escape ). In this example the group containing the single number is the named group 'number'.
if you want to match both groups with optional sign try:
#"(?'first'-?\d+)-(?'second'-?\d+)"
See it working here.
Just to describe, nothing complicated, just using -? to match an optional - and \d+ to match one or more digit. a literal - match itself.
here's some documentation that I use:
http://www.mikesdotnetting.com/Article/46/CSharp-Regular-Expressions-Cheat-Sheet
in the comments section of that page, it suggests escaping the dash with '\-'
make sure you escape your escape character \
You would escape the special meaning of - in regex language (means range) using a backslash (\). Since backslash has a special meaning in C# literals to escape quotes or be part of some characters, you need to escape that with another backslash(\). So essentially it would be \d+\\-.
\b\d*(?=\-) you will want to look ahead for the dash
\b = is start at a word boundry
\d = match any decimal digit
* = match the previous as many times as needed
(?=\-) = look ahead for the dash
Edited for Formatting issue with the slash not showing after posting

Regex to exclude part of string on split

I asked a similar question a few weeks ago on how to split a string based on a specific substring. However, I now want to do something a little different. I have a line that looks like this (sorry about the formatting):
What I want to do is split this line at all the newline \r\n sequences. However, I do not want to do this if there is a PA42 after one of the PA41 lines. I want the PA41 and the PA42 line that follows it to be on the same line. I have tried using several regex expressions to no avail. The output that I am looking for will ideally look like this:
This is the regex that I am currently using, but it does not quite accomplish what I am looking for.
string[] p = Regex.Split(parameterList[selectedIndex], #"[\r\n]+(?=PA41)");
If you need any clarifications, please feel free to ask.
You're trying a positive look-ahead, you want a negative one. (Positive insures that the pattern does follow, whereas negative insures it does not.)
(\\r\\n)(?!PA42)
Works for me.
string[] splitArray = Regex.Split(subjectString, #"\\r\\n(?!PA42)");
This should work. It uses a negative lookahead assertion to ensure that a \r\n sequence is not followed by PA42.
Explanation :
#"
\\ # Match the character “\” literally
r # Match the character “r” literally
\\ # Match the character “\” literally
n # Match the character “n” literally
(?! # Assert that it is impossible to match the regex below starting at this position (negative lookahead)
PA42 # Match the characters “PA42” literally
)
"

What is wrong with my regex (simple)?

I am trying to make a regex that matches all occurrences of words that are at the start of a line and begin with #.
For example in:
#region #like
#hey
It would match #region and #hey.
This is what I have right now:
^#\w*
I apologize for posting this question. I'm sure it has a very simple answer, but I have been unable to find it. I admit that I am a regex noob.
What you've got should work, depending on what flags you pass for RegexOptions. You need to make sure you pass RegexOptions.Multiline:
var matches = Regex.Matches(input, #"^#\w*", RegexOptions.Multiline);
See the documentation I linked to above:
Multiline Multiline mode. Changes the meaning of ^ and $ so they match at the beginning and end, respectively, of any line, and not just the beginning and end of the entire string.
The regex looks fine, make sure you're using a verbatim string literal (# prefix) to define your regex, i.e. #"^#\w*" otherwise the backslash will be treated as an escape sequence.
Use this regex
^#.+?\b
.+ will ensure at least one character after # and \b indicates word boundry. ? adds non-greediness to the + operator so as to avoid matching whole string #region #like

Categories