Regex to exclude part of string on split - c#

I asked a similar question a few weeks ago on how to split a string based on a specific substring. However, I now want to do something a little different. I have a line that looks like this (sorry about the formatting):
What I want to do is split this line at all the newline \r\n sequences. However, I do not want to do this if there is a PA42 after one of the PA41 lines. I want the PA41 and the PA42 line that follows it to be on the same line. I have tried using several regex expressions to no avail. The output that I am looking for will ideally look like this:
This is the regex that I am currently using, but it does not quite accomplish what I am looking for.
string[] p = Regex.Split(parameterList[selectedIndex], @"[\r\n]+(?=PA41)");
If you need any clarifications, please feel free to ask.

You're using a positive look-ahead, but you want a negative one. (A positive look-ahead ensures that the pattern does follow, whereas a negative one ensures that it does not.)
(\\r\\n)(?!PA42)
Works for me.

string[] splitArray = Regex.Split(subjectString, @"\\r\\n(?!PA42)");
This should work. It uses a negative lookahead assertion to ensure that a \r\n sequence is not followed by PA42.
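A quick way to check the lookahead outside of C# is the minimal Python sketch below. It uses the same pattern as the Regex.Split call; the sample line and the helper name split_keeping_pa42 are made up for illustration (the question's original data is not shown), and the separators are assumed to be the literal characters \r\n rather than real line breaks.

```python
import re

# Hypothetical sample line; the separators are the literal four characters \r\n.
line = r"PA41 first\r\nPA42 second\r\nPA41 third\r\nPA43 fourth"

def split_keeping_pa42(text):
    """Split on a literal \\r\\n unless it is immediately followed by PA42."""
    return re.split(r"\\r\\n(?!PA42)", text)

for part in split_keeping_pa42(line):
    print(part)
```

The PA42 entry stays attached to the PA41 entry before it, while every other separator produces a new item.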
Explanation :
@"
\\ # Match the character “\” literally
r # Match the character “r” literally
\\ # Match the character “\” literally
n # Match the character “n” literally
(?! # Assert that it is impossible to match the regex below starting at this position (negative lookahead)
PA42 # Match the characters “PA42” literally
)
"

Related

Regex for alpha number string in c# accepting underscore and white spaces

I have already gone through many posts on SO, but I didn't find what I need for my specific scenario.
I need a regex for an alphanumeric string
where the following conditions are matched.
Valid string:
ameya123 (alphabets and numbers)
ameya (only alphabets)
AMeya12 (uppercase and lowercase letters and numbers)
Ameya_123 (alphabets and underscore and numbers)
Ameya_ 123 (alphabets, underscore and white spaces)
Invalid string:
123 (only numbers)
_ (only underscore)
(only space) (only white spaces)
any special character other than underscore
What I have tried so far:
(?=.*[a-zA-Z])(?=.*[0-9]*[\s]*[_]*)
The above regex works in an online regex editor, but it does not work in a data annotation in C#.
Please suggest.
Based on your requirements and not your attempt, what you need is this:
^(?!(?:\d+|_+| +)$)[\w ]+$
The negative lookahead looks for undesired matches to fail the whole process. Those are strings containing digits only, underscores only or spaces only. If they never happen we want to have a match for ^[\w ]+$ which is nearly the same as ^[a-zA-Z0-9_ ]+$.
See live demo here
Explanation:
^ Start of line / string
(?! Start of negative lookahead
(?: Start of non-capturing group
\d+ Match digits
| Or
_+ Match underscores
| Or
[ ]+ Match spaces
)$ End of non-capturing group immediately followed by end of line / string (none of previous matches should be found)
) End of negative lookahead
[\w ]+$ Match a character inside the character set up to end of input string
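Note: \w is a shorthand for [a-zA-Z0-9_] in engines such as PCRE unless the u modifier is set; in .NET, \w also matches Unicode letters and digits unless RegexOptions.ECMAScript is specified.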
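To see the pattern against the strings listed in the question, here is a minimal Python sketch. The pattern is copied from the answer; the helper name is_valid and the extra input "ameya!" are assumptions added for illustration.

```python
import re

# Pattern from the answer above, unchanged.
PATTERN = re.compile(r"^(?!(?:\d+|_+| +)$)[\w ]+$")

def is_valid(value):
    """Return True when the whole string satisfies the pattern."""
    return PATTERN.search(value) is not None

valid = ["ameya123", "ameya", "AMeya12", "Ameya_123", "Ameya_ 123"]
invalid = ["123", "_", " ", "ameya!"]  # "ameya!" is an added example of a special character

for s in valid + invalid:
    print(repr(s), is_valid(s))
```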
One problem with your regex is that in annotations, the regex must match and consume the entire string input, while your pattern only contains lookarounds that do not consume any text.
You may use
^(?!\d+$)(?![_\s]+$)[A-Za-z0-9\s_]+$
See the regex demo. Note that \w, when used for server-side validation and thus parsed with the .NET regex engine, also allows any Unicode letters, digits and some more characters, so I'd rather stick to [A-Za-z0-9_] to be consistent across both server- and client-side validation.
Details
^ - start of string (not necessary here, but good to have when debugging)
(?!\d+$) - a negative lookahead that fails the match if the whole string consists of digits
(?![_\s]+$) - a negative lookahead that fails the match if the whole string consists of underscores and/or whitespaces. (NOTE: if you plan to only disallow ____ or " " like inputs, you need to split this lookahead into (?!_+$) and (?!\s+$).)
[A-Za-z0-9\s_]+ - 1+ ASCII letters, digits, _ and whitespace chars
$ - end of string (not necessary here, but still good to have).
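The difference between the combined lookahead and the split version mentioned in the NOTE shows up on input that mixes underscores and spaces. A minimal Python sketch, assuming the helper name accepts and the sample inputs (neither is from the original answer):

```python
import re

# Combined lookahead, as in the answer: rejects any mix of underscores and whitespace.
COMBINED = re.compile(r"^(?!\d+$)(?![_\s]+$)[A-Za-z0-9\s_]+$")
# Split lookaheads, as described in the NOTE: rejects only all-underscore or all-whitespace input.
SPLIT = re.compile(r"^(?!\d+$)(?!_+$)(?!\s+$)[A-Za-z0-9\s_]+$")

def accepts(pattern, value):
    """Return True when the pattern matches the whole string."""
    return pattern.search(value) is not None

for s in ["Ameya_ 123", "123", "___", "   ", "_ _"]:
    print(repr(s), accepts(COMBINED, s), accepts(SPLIT, s))
```

Only "_ _" is treated differently: the combined form rejects it, the split form lets it through.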
If I understand your requirements correctly, you need to match one or more letters (uppercase or lowercase), and possibly zero or more of digits, whitespace, or underscore. This implies the following pattern:
^[A-Za-z0-9\s_]*[A-Za-z][A-Za-z0-9\s_]*$
Demo
In the demo, I have replaced \s with \t \r, because \s was matching across all lines.
Unlike the answers given by @revo and @wiktor, I don't have a fancy looking explanation to the regex. I am beautiful even without my makeup on. Honestly, if you don't understand the pattern I gave, you might want to review a good regex tutorial.
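For anyone who wants to try this pattern quickly, here is a minimal Python sketch. The pattern is the one given above; the helper name and the inputs "12_" and "ameya!" are assumptions added for illustration.

```python
import re

# At least one ASCII letter, surrounded by any number of allowed characters.
LETTER_REQUIRED = re.compile(r"^[A-Za-z0-9\s_]*[A-Za-z][A-Za-z0-9\s_]*$")

def has_letter_and_only_allowed_chars(value):
    """Return True when the string has a letter and nothing outside the allowed set."""
    return LETTER_REQUIRED.search(value) is not None

for s in ["ameya123", "Ameya_ 123", "123", "_", " ", "12_", "ameya!"]:
    print(repr(s), has_letter_and_only_allowed_chars(s))
```

Because a letter is mandatory, inputs such as "12_" (digits and underscore, no letter) are rejected as well.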
This simple RegEx should do it:
[a-zA-Z]+[0-9_ ]*
One or more letters, followed by zero or more digits, underscores, or spaces.
This one should be good:
[\w\s_]*[a-zA-Z]+[\w\s_]*

Splitting a string with some characters with some ignored characters as well

There is a string: "QARR_1 * QARR_1 * NPSH[*] + NPSH0". I want to split it into a string array (exactly of 4 items) to get output as: QARR_1, QARR_1, NPSH[*], NPSH0.
I understand that I should use regex lookarounds here, but I am not able to achieve the desired result. Kindly help.
I think you could do it like this without lookarounds:
(\w+(?:\[\*\])?)
Test
http://rextester.com/YHNRC51736
a captured group (
one or more word characters \w+
with an optional non-capturing group (?:\[\*\])?
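This approach collects the tokens as matches instead of splitting on the operators. A minimal Python sketch with the pattern above (in .NET the corresponding call would presumably be Regex.Matches; the variable names here are illustrative):

```python
import re

expression = "QARR_1 * QARR_1 * NPSH[*] + NPSH0"

# With one capturing group, findall returns the text captured by that group for each match.
tokens = re.findall(r"(\w+(?:\[\*\])?)", expression)
print(tokens)
```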
import re
a = "QARR_1 * QARR_1 * NPSH[*] + NPSH0"
# Split on " * " or " + "; the * inside "[*]" has no surrounding spaces, so it is left alone
x = re.split(r' \* | \+ ', a)
print(x)
['QARR_1', 'QARR_1', 'NPSH[*]', 'NPSH0']
Hmmm, well... this works in the regex tool I used:
\w+\[?\*?\]?
Not the most elegant, but pretty simple, so long as the input isn't broken like: "Abc12*]", "abc12[]", etc.
How it works:
\w+ this will greedily capture any sequence of word characters (keeps capturing until it runs out of characters that match), basically translates to: [a-zA-Z0-9_]+
\[?, \*?, \]? well, to start, the backslash here is used as an escape character to get Regex to literally look for the characters [, * and ]. They need to be escaped because they have a special meaning in Regex syntax otherwise. The ? at the end of each part tells the Regex pattern to match for the character between 0 and 1 times. It is necessary to be able to capture it 0 times, to allow matches that don't have the characters ([, * ,] ) at the end to be made.
A few examples of the kind of things it will match:
apples123_121231_2133414[*]
Ap1]
Orange_11[*
1ba222nnana*]
A few examples of the kinds of things it won't match:
(note, cases where part of the word is highlighted, only the highlighted part will be matched.)
Pares]*[
++++!!~+
111Grapes[]
111Grapes[]*
So, given the input you supplied, it should be fine... these are just a few things to be aware of.
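The caveats above are easier to see when the pattern is run on a few of the listed inputs. A minimal Python sketch; the helper name matched_parts is an assumption for illustration:

```python
import re

# Word characters followed by an optional "[", an optional "*" and an optional "]".
LOOSE = re.compile(r"\w+\[?\*?\]?")

def matched_parts(text):
    """Return every substring the loose pattern matches, left to right."""
    return LOOSE.findall(text)

for sample in ["QARR_1 * QARR_1 * NPSH[*] + NPSH0", "Ap1]", "Pares]*[", "++++!!~+", "111Grapes[]*"]:
    print(repr(sample), matched_parts(sample))
```

Since each bracket character is optional on its own, malformed suffixes such as "]" or "[]" are picked up along with the word, which is the weakness the answer warns about.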

Regex pattern in C# with empty space

I am having an issue with a regular expression and can't find the answer to my question.
I am trying to build a regex pattern that will pull in any matches that have # around them. For example, #match# or #mt# would both come back.
This works fine for that. #.*?#
However I don't want matches on ## to show up. Basically if there is nothing between the pound signs don't match.
Hope this makes sense.
Thanks.
Please use + to match 1 or more symbols:
#+.+#+
UPDATE:
If you want to only match substrings that are enclosed with single hash symbols, use:
(?<!#)#(?!#)[^#]+#(?!#)
See regex demo
Explanation:
(?<!#)#(?!#) - a # symbol that is not preceded with a # (due to the negative lookbehind (?<!#)) and not followed by a # (due to the negative lookahead (?!#))
[^#]+ - one or more symbols other than # (due to the negated character class [^#])
#(?!#) - a # symbol not followed with another # symbol.
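A minimal Python sketch of the lookaround version, compared with the simpler #.+?# on the same text. The sample text is made up for illustration.

```python
import re

text = "a ## b #match# c #mt# d"  # hypothetical sample input

# Single hashes on both sides, at least one non-hash character in between.
single_hash = re.findall(r"(?<!#)#(?!#)[^#]+#(?!#)", text)

# Simpler pattern: the dot may consume a "#", so a match can start inside "##".
naive = re.findall(r"#.+?#", text)

print(single_hash)
print(naive)
```

The lookarounds keep the empty ## from taking part in any match, whereas the simpler pattern starts matching at the first # of ## and pairs it with a later hash.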
Instead of using * to match between zero and unlimited characters, replace it with +, which will only match if there is at least one character between the #'s. The edited regex should look like this: #.+?#. Hope this helps!
Edit
Sorry for the incorrect regex, I had not expected multiple hash signs. This should work for your sentence: #+.+?#+
Edit 2
I am pretty sure I got it. Try this: (?<!#)#[^#].*?#. It might not work as expected with triple hashes though.
Try:
[^#]?#.+#[^#]?
The [^ character_group] construction matches any single character not included in the character group. Using the ? after it lets you match at the beginning or end of a string, since it matches the preceding element zero or one time. Check out the documentation here

C# Regex match on special characters

I know this stuff has been talked about a lot, but I'm having a problem trying to match the following...
Example input: "test test 310-315"
I need a regex expression that recognizes a number followed by a dash, and returns 310. How do I include the dash in the regex expression though. So the final match result would be: "310".
Thanks a lot - kcross
EDIT: Also, how would I do the same thing but with the dash preceding, while also taking into account that the number following the dash could be negative? I didn't think of this one when I wrote the question. For example: "test test 310--315" returns -315 and "test 310-315" returns 315.
Regex regex = new Regex(@"\d+(?=\-)");
\d+ - Looks for one or more digits
(?=\-) - Makes sure it is followed by a dash
The @ just eliminates the need to escape the backslashes to keep the compiler happy.
Also, you may want this instead:
\d+(?=\-\d+)
This will check for one or more digits, followed by a dash, followed by one or more digits, but only match the first set.
In response to your comment, here's a regex that will check for a number following a -, while accounting for potential negative (-) numbers:
Regex regex = new Regex(@"(?<=\-)\-?\d+");
(?<=\-) - Positive lookbehind which checks that there is a preceding -
\-? - Checks for either zero or one dashes
\d+ - One or more digits
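Both lookarounds can be tried on the inputs from the question with a minimal Python sketch. The function names are assumptions for illustration, and the dash is written unescaped since it has no special meaning outside a character class.

```python
import re

def number_before_dash(text):
    """Digits that are immediately followed by a dash (lookahead)."""
    m = re.search(r"\d+(?=-)", text)
    return m.group(0) if m else None

def number_after_dash(text):
    """Optionally signed digits that are immediately preceded by a dash (lookbehind)."""
    m = re.search(r"(?<=-)-?\d+", text)
    return m.group(0) if m else None

print(number_before_dash("test test 310-315"))
print(number_after_dash("test test 310--315"))
print(number_after_dash("test 310-315"))
```

In neither case is the separating dash part of the returned match, because lookarounds check the surrounding text without consuming it.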
(?'number'\d+)- will work ( no need to escape ). In this example the group containing the single number is the named group 'number'.
if you want to match both groups with optional sign try:
@"(?'first'-?\d+)-(?'second'-?\d+)"
See it working here.
Just to describe, nothing complicated, just using -? to match an optional - and \d+ to match one or more digit. a literal - match itself.
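The same idea as a minimal Python sketch. Note that Python spells named groups (?P<name>...), so the .NET form (?'name'...) shown above has to be adapted; the helper name split_pair is an assumption for illustration.

```python
import re

# Python's named-group syntax; the .NET form (?'first'...) is not accepted by the re module.
PAIR = re.compile(r"(?P<first>-?\d+)-(?P<second>-?\d+)")

def split_pair(text):
    """Return (first, second) as strings, or None when no pair is found."""
    m = PAIR.search(text)
    return (m.group("first"), m.group("second")) if m else None

print(split_pair("test test 310--315"))
print(split_pair("test 310-315"))
```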
here's some documentation that I use:
http://www.mikesdotnetting.com/Article/46/CSharp-Regular-Expressions-Cheat-Sheet
in the comments section of that page, it suggests escaping the dash with '\-'
make sure you escape your escape character \
You would escape the special meaning of - in regex language (means range) using a backslash (\). Since backslash has a special meaning in C# literals to escape quotes or be part of some characters, you need to escape that with another backslash(\). So essentially it would be \d+\\-.
\b\d*(?=\-) you will want to look ahead for the dash
\b = start at a word boundary
\d = match any decimal digit
* = match the previous as many times as needed
(?=\-) = look ahead for the dash
Edited for Formatting issue with the slash not showing after posting

What is this Regex doing: new Regex(@"(?<!\\),");

Regex rx = new Regex(@"(?<!\\\\),");
String test = "OU=James\\, Brown,OU=Test,DC=Internal,DC=Net";
This works perfectly, but I want to understand it. I've been googling without success. Can somebody give me a word or phrase that I can use to look this up and understand it?
I would have thought that it should be written like this:
new Regex(@"(\\\\)?,");
I've seen the (?zzzzzz) syntax before. It's the <! part that I'm stumped by.
(?<!…) is a negative look-behind assertion. In your regex
(?<!\\\\),
the , matches a comma, obviously. The \\\\ matches 2 backslashes. Then (?<!\\\\), matches any comma not preceded by 2 backslashes.
Therefore it will match the , before the OU and DC, but not between James and Brown:
OU=James\\, Brown,OU=Test,DC=Internal,DC=Net
                 ^       ^           ^
The <! part indicates a negative lookbehind. The rest of the expression (just a comma) matches only if it's not preceded by a backslash (or two backslashes, depending on whether the title or the body of your question is the accurate one).
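A minimal Python sketch of the negative lookbehind, assuming the single-backslash reading from the title (the value contains one backslash before the escaped comma). The variable names are illustrative.

```python
import re

# Raw string: exactly one backslash sits before the first comma (assumption, see above).
dn = r"OU=James\, Brown,OU=Test,DC=Internal,DC=Net"

# Split on every comma that is not immediately preceded by a backslash.
parts = re.split(r"(?<!\\),", dn)

for part in parts:
    print(part)
```

The escaped comma inside "James\, Brown" is left alone, while the three unescaped commas act as separators.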
