Forward and Backward RegEx Lookups - c#

How can I get the value of nserver without the dot at the end?
I tried this regex, but I cannot get rid of the last dot.
nserver:(\s*)(?<Value>(\S*))
This is the data in which I search
% By submitting a query to RIPN's Whois Service
% you agree to abide by the following terms of use:
% http://www.ripn.net/about/servpol.html#3.2 (in Russian)
% http://www.ripn.net/about/en/servpol.html#3.2 (in English).
domain: WEBMONEY.RU
nserver: ns.molot.ru.
nserver: ns.relsoftcom.ru.
nserver: ns.relsoft.ru.
state: REGISTERED, DELEGATED, VERIFIED
org: "Computing Forces" CJSC
registrar: RU-CENTER-REG-RIPN
admin-contact: https://www.nic.ru/whois
created: 1998.04.24
paid-till: 2014.05.01
free-date: 2014.06.01
source: TCI
Last updated on 2014.02.02 00:06:43 MSK
I want this as result
ns.molot.ru
I'm using these options
RegexOptions options = RegexOptions.IgnoreCase | RegexOptions.Singleline;

Well, isn't your regex supposed to be:
(?<=nserver:)(\s*)(?<Value>.*(?=.\r\n))
? (Carriage return before newline/linefeed)
You could perhaps escape the dot, and turn the greedy .* into lazy .*?, but a simpler way without the lookahead could be:
(?<=nserver:)\s*(?<Value>\S+)\.
regex101 demo
Explanation added due to comments:
This is where \s* finishes matching and \S+ starts matching (just before the first n):
ns.molot.ru.
Since \S+ matches any character except whitespace (space, \r, \n, \t, \f), it will match everything below, including the final period, and stop before the newline:
ns.molot.ru.
After that, the regex has \. so it tries to match a period, but there is no period left to match. \S+ will then backtrack (give up one character from its match), so that it now holds:
ns.molot.ru
Now, after giving up the last character it matched, the regex tries to match \. again and succeeds.
But if you have \. before the \S+, the regex will try to match the period first, while it is still at the starting position (step 1):
ns.molot.ru.
But since there's an n there, it won't match and the attempt fails at that position.
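For illustration, here is a minimal C# sketch of the backtracking pattern above. It assumes the whois response is already held in a string; the shortened sample and the variable names are mine, not from the question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Shortened, hypothetical sample of the whois response from the question.
string whois =
    "domain: WEBMONEY.RU\r\n" +
    "nserver: ns.molot.ru.\r\n" +
    "nserver: ns.relsoftcom.ru.\r\n" +
    "nserver: ns.relsoft.ru.\r\n" +
    "state: REGISTERED, DELEGATED, VERIFIED\r\n";

// \S+ first grabs "ns.molot.ru." and then gives back one character
// so that the trailing \. can match the final period.
var pattern = new Regex(@"(?<=nserver:)\s*(?<Value>\S+)\.", RegexOptions.IgnoreCase);

string[] servers = pattern.Matches(whois)
    .Cast<Match>()
    .Select(m => m.Groups["Value"].Value)
    .ToArray();

foreach (string server in servers)
{
    Console.WriteLine(server);
}
```

Each printed value should be the host name without its trailing dot. RegexOptions.Singleline is not needed here because the pattern contains no unescaped dot.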

A quick way: change your pattern to:
(?<=nserver:\s*)(?<Value>.*)(?=\.\r?\n)
You must also remove the RegexOptions.Singleline option; with it, . also matches newlines, so the greedy .* would span several lines.

Related

Regex matching exactly 1 of a specific character(s)

I'm trying to match #relRef but not ##absRef from:
Stuff #relRef more stuff ##absRef
From what I understand, [^#]#{1}[^\s]* should work, but it's still incorrectly selecting both. Does {1} not mean what I think it does? (I think it means "match the previous thing exactly 1 time")
[^#]#[^#][^\s]* does work, but it's less convenient for my use case and, more importantly, I don't understand why my original solution doesn't work.
Finally, does whatever this answer ends up being change if it's multiple characters. (i.e. if the sentence is Stuff AT_relRef more stuff ATAT_absRef so now I'm not checking a single # character but "AT" instead.)
tl;dr:
1) Why does [^#]#{1}[^\s]* match ##absRef and how do I fix it to only match #relRef?
2) Does the answer to #1 change if I'm using more than a single character to mark the reference? (i.e. AT_relRef and ATAT_absRef)
Your regex matches both because [^\s], which matches any char but whitespace, also matches the # char. The [^#] matches a space in both cases, so it does not help either. Also, #{1} is the same as #; the {1} quantifier is always redundant in any regex.
You may use
(?<!#)#[^\s#]\S*
See the regex demo.
Details
(?<!#) - no # right before the current location
# - a # char
[^\s#] - a char other than # and whitespace
\S* - 0 or more non-whitespace chars.
As for the second case, a negative lookbehind will work, too:
(?<!AT)AT_\S*
See the regex demo. It matches
(?<!AT) - any location not preceded with AT
AT_ - an AT_ substring
\S* - 0+ chars other than whitespace.
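A short C# sketch comparing the question's original pattern with the two lookbehind patterns above. The input strings are the ones from the question; the variable names are only illustrative.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string hashInput = "Stuff #relRef more stuff ##absRef";
string atInput = "Stuff AT_relRef more stuff ATAT_absRef";

// Original attempt: [^\s]* happily consumes the second '#', so both references match.
string[] originalRefs = Regex.Matches(hashInput, @"[^#]#{1}[^\s]*")
    .Cast<Match>().Select(m => m.Value).ToArray();

// Lookbehind version: the '#' must not be preceded by '#' nor followed by '#'.
string[] hashRefs = Regex.Matches(hashInput, @"(?<!#)#[^\s#]\S*")
    .Cast<Match>().Select(m => m.Value).ToArray();

// Multi-character marker: "AT_" must not be preceded by "AT".
string[] atRefs = Regex.Matches(atInput, @"(?<!AT)AT_\S*")
    .Cast<Match>().Select(m => m.Value).ToArray();

Console.WriteLine("original: " + string.Join(" | ", originalRefs));
Console.WriteLine("hash:     " + string.Join(" | ", hashRefs));
Console.WriteLine("AT:       " + string.Join(" | ", atRefs));
```

Note that the original pattern's matches also include the leading space consumed by [^#], which the lookbehind avoids because it does not consume any text.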

Regex for alpha number string in c# accepting underscore and white spaces

I have already gone through many posts on SO, but I didn't find what I needed for my specific scenario.
I need a regex for an alphanumeric string
where the following conditions should be matched:
Valid string:
ameya123 (alphabets and numbers)
ameya (only alphabets)
AMeya12 (capital and normal alphabets and numbers)
Ameya_123 (alphabets and underscore and numbers)
Ameya_ 123 (alphabets, underscore and white spaces)
Invalid string:
123 (only numbers)
_ (only underscore)
(only space) (only white spaces)
any special character other than underscore
What I have tried so far:
(?=.*[a-zA-Z])(?=.*[0-9]*[\s]*[_]*)
The above regex works in an online regex editor, but it does not work in a data annotation in C#.
Please suggest.
Based on your requirements and not your attempt, what you are in need of is this:
^(?!(?:\d+|_+| +)$)[\w ]+$
The negative lookahead looks for undesired matches to fail the whole process. Those are strings containing digits only, underscores only or spaces only. If they never happen we want to have a match for ^[\w ]+$ which is nearly the same as ^[a-zA-Z0-9_ ]+$.
See live demo here
Explanation:
^ Start of line / string
(?! Start of negative lookahead
(?: Start of non-capturing group
\d+ Match digits
| Or
_+ Match underscores
| Or
[ ]+ Match spaces
)$ End of non-capturing group immediately followed by end of line / string (none of previous matches should be found)
) End of negative lookahead
[\w ]+$ Match a character inside the character set up to end of input string
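Note: \w is a shorthand for [a-zA-Z0-9_] in PCRE unless the u modifier is set. In .NET, \w also matches Unicode letters and digits unless RegexOptions.ECMAScript is specified.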
One problem with your regex is that in annotations, the regex must match and consume the entire string input, while your pattern only contains lookarounds that do not consume any text.
You may use
^(?!\d+$)(?![_\s]+$)[A-Za-z0-9\s_]+$
See the regex demo. Note that \w, when the pattern is used for server-side validation and thus parsed by the .NET regex engine, also allows any Unicode letters and digits and some other characters, so I'd rather stick to [A-Za-z0-9_] to be consistent between server- and client-side validation.
Details
^ - start of string (not necessary here, but good to have when debugging)
(?!\d+$) - a negative lookahead that fails the match if the whole string consists of digits
(?![_\s]+$) - a negative lookahead that fails the match if the whole string consists of underscores and/or whitespaces. NOTE: if you plan to only disallow ____ or " " like inputs, you need to split this lookahead into (?!_+$) and (?!\s+$))
[A-Za-z0-9\s_]+ - 1+ ASCII letters, digits, _ and whitespace chars
$ - end of string (not necessary here, but still good to have).
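To make the two points above concrete, here is a small C# sketch. It first shows that the question's lookahead-only pattern succeeds with an empty match, which is why a whole-string check rejects it, and then runs the anchored pattern over the sample strings from the question. It uses Regex directly instead of a data annotation; the last sample ("ameya#123") is a made-up input for the "special character" case.

```csharp
using System;
using System.Text.RegularExpressions;

// The question's pattern consists only of lookaheads, so it consumes no text.
Match lookaroundOnly = Regex.Match("ameya123", @"(?=.*[a-zA-Z])(?=.*[0-9]*[\s]*[_]*)");
Console.WriteLine($"success={lookaroundOnly.Success}, length={lookaroundOnly.Length}");

// Anchored pattern from the answer: it must consume the entire input.
var validator = new Regex(@"^(?!\d+$)(?![_\s]+$)[A-Za-z0-9\s_]+$");

string[] samples =
{
    "ameya123", "ameya", "AMeya12", "Ameya_123", "Ameya_ 123", // expected valid
    "123", "_", "   ", "ameya#123"                              // expected invalid
};

foreach (string sample in samples)
{
    Console.WriteLine($"\"{sample}\" -> {validator.IsMatch(sample)}");
}
```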
If I understand your requirements correctly, you need to match one or more letters (uppercase or lowercase), and possibly zero or more of digits, whitespace, or underscore. This implies the following pattern:
^[A-Za-z0-9\s_]*[A-Za-z][A-Za-z0-9\s_]*$
Demo
In the demo, I have replaced \s with \t \r, because \s was matching across all lines.
Unlike the answers given by @revo and @wiktor, I don't have a fancy looking explanation to the regex. I am beautiful even without my makeup on. Honestly, if you don't understand the pattern I gave, you might want to review a good regex tutorial.
This simple RegEx should do it:
[a-zA-Z]+[0-9_ ]*
One or more letters, followed by zero or more digits, underscores and spaces.
This one should be good:
[\w\s_]*[a-zA-Z]+[\w\s_]*

Regex: Few Matches with * Quantifier

My regex ends with the quantifier *.
But I have several matches in one string. How can I make it still find all the matches? My regex:
((CMD1|CMD2)+(?::|;)+.*)
And the test string is "cmd1: test. test. test cmd2: test2. test2. test2"
So I need to get matches:
cmd1: test. test. test
cmd2: test2. test2. test2
Commands could be arbitrary words like "Look", "Take", "Go". There could be any number of occurrences of any command in one string.
Example:
Go: some sentences. and more. Take: other more sentences, and even more text here. Look: more and more. and more.
You could use a positive lookahead:
\w+:.*?(?= \w+:|$)
Match a word character one or more times \w+
Match a colon :
Match any character zero or more times .*
Make it non greedy ?
A positive lookahead which asserts a space and one or more word characters \w+ followed by a colon :, or | the end of the string: (?= \w+:|$)
Demo
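As a rough C# sketch of the lookahead approach above, applied to the example sentence from the question (the variable names are illustrative):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string text = "Go: some sentences. and more. Take: other more sentences, " +
              "and even more text here. Look: more and more. and more.";

// Lazy .*? stops as soon as the lookahead sees " word:" or the end of the string.
string[] commands = Regex.Matches(text, @"\w+:.*?(?= \w+:|$)")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

foreach (string command in commands)
{
    Console.WriteLine(command);
}
```

Because the space belongs to the lookahead and is not consumed, the matches carry no trailing whitespace.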
A general rule when writing regex is that when you want to find all occurrences of a pattern and put each pattern into its own match, you write a regex for that pattern, not that pattern quantified * times. Otherwise, you will end up putting the whole string into one single match.
I edited the regex for you:
CMD(?:1|2)(?::|;).*?(?=$|CMD)
The beginning is pretty much self-explanatory. Towards the end, I matched . with a lazy quantifier *?. This will stop matching as soon as the string after it matches the lookahead. The lookahead just matches another CMD or the end of the string.
Remember to turn on case insensitive option!
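The general rule can be checked with a short C# sketch that runs the question's original pattern and the edited one against the question's test string. Trim is my addition: the lazy match stops right before the next CMD, so it keeps the space in front of it.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "cmd1: test. test. test cmd2: test2. test2. test2";

// Original pattern: the greedy .* swallows everything up to the end of the string.
MatchCollection greedyMatches =
    Regex.Matches(input, @"((CMD1|CMD2)+(?::|;)+.*)", RegexOptions.IgnoreCase);

// Edited pattern: lazy .*? stops before the next CMD or at the end of the string.
string[] parts = Regex.Matches(input, @"CMD(?:1|2)(?::|;).*?(?=$|CMD)", RegexOptions.IgnoreCase)
    .Cast<Match>()
    .Select(m => m.Value.Trim()) // drop the space left before the next command
    .ToArray();

Console.WriteLine("original pattern: " + greedyMatches.Count + " match(es)");
foreach (string part in parts)
{
    Console.WriteLine(part);
}
```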
string s = "Go: some sentences. and more. Take: other more sentences, and even more text here. Look: more and more. and more.";
var matches = Regex.Matches(s, @"(?i)(go|take|look):.+?(?=\s+\w+:|$)");
You can remove \s+, but in that case you should call Trim on the resulting strings.

Regex: greedy quantifier behaving lazy

.NET implementation of Regex defines the '?' character as a greedy quantifier that informs its expression to match 0 or 1 times and to prefer 1 if possible.
Consider the following source text:
some text (some parenthetical text)
And the following regex:
\A(.+)(?:\s\(.+\))?$
The result should be one matching group with the value:
some text
Instead, it is the whole line. Now when I remove from the regex the greedy 0 or 1 quantifier '?', I do get the expected result. However, since my requirements expect the parenthetical text may not exist, I can't leave that 0 or 1 quantifier off. How do I force it to be greedy?
The reason why this doesn't match the way you think it will is because (.+) is greedy.
Let me explain:
(.+) is greedy so it will immediately match the entire string.
(?:\s\(.+\))? is also greedy; however, just because something is greedy doesn't mean that it has to match if it doesn't have to.
Take this example:
string: abc123
regex: (.+)(\d{3})?
.+ will start out matching abc123. The regex engine then reaches the end of the string and sees (\d{3})?. It would prefer to match \d{3} if possible, but .+ has already consumed the entire string. Since (\d{3})? is optional, the engine lets it match nothing instead of backtracking.
Your best bet is to make the first section lazy and keep the last section greedy.
\A(.+)(?:\s\(.+\))?$ will become \A(.+?)(?:\s\(.+\))?$
(.+?) will try to match as few characters as possible so it leaves room for the second half but if that second half is not needed it'll consume the rest of the string.
Here's regex101 with an example (I changed \A to ^ so multiline would work)
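A minimal C# sketch of the difference, using the source text from the question plus a second input without the parenthetical part (the variable names are illustrative):

```csharp
using System;
using System.Text.RegularExpressions;

string withParens = "some text (some parenthetical text)";
string withoutParens = "some text";

// Greedy first group: (.+) takes the whole line and the optional group matches nothing.
string greedy = Regex.Match(withParens, @"\A(.+)(?:\s\(.+\))?$").Groups[1].Value;

// Lazy first group: (.+?) grows only until the optional group and $ can match.
string lazy = Regex.Match(withParens, @"\A(.+?)(?:\s\(.+\))?$").Groups[1].Value;

// Without parenthetical text, $ forces the lazy group to take the rest of the string.
string lazyPlain = Regex.Match(withoutParens, @"\A(.+?)(?:\s\(.+\))?$").Groups[1].Value;

Console.WriteLine("greedy:     " + greedy);
Console.WriteLine("lazy:       " + lazy);
Console.WriteLine("lazy plain: " + lazyPlain);
```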

C# Regex to match a string that doesn't contain a certain string?

I want to match any string that does not contain the string "DontMatchThis".
What's the regex?
try this:
^(?!.*DontMatchThis).*$
The regex to match a string that does not contain a certain pattern is
(?s)^(?!.*DontMatchThis).*$
If you use the pattern without the (?s) (which is an inline version of the RegexOptions.Singleline flag that makes . match a newline LF symbol as well as all other characters), the DontMatchThis will only be searched for on the first line, and only a string without LF symbols will be matched with .*.
Pattern details:
(?s) - a DOTALL/Singleline modifier making . match any character
^ - start of string anchor
(?!.*DontMatchThis) - a negative lookahead checking if there are any 0 or more characters (matched with greedy .* subpattern - NOTE a lazy .*? version (matching as few characters as possible before the next subpattern match) might get the job done quicker if DontMatchThis is expected closer to the string start) followed with DontMatchThis
.* - any zero or more characters, as many as possible, up to
$ - the end of string (see Anchor Characters: Dollar ($)).
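To see the effect of (?s), here is a small C# sketch that runs both patterns over a few made-up inputs, two of which contain a line feed. Without (?s), a multi-line string is rejected even when it does not contain DontMatchThis, because .* cannot cross the LF and $ then fails.

```csharp
using System;
using System.Text.RegularExpressions;

var plain = new Regex(@"^(?!.*DontMatchThis).*$");
var dotAll = new Regex(@"(?s)^(?!.*DontMatchThis).*$");

// Hypothetical inputs; the last two contain a line feed.
string[] inputs =
{
    "hello world",
    "hello DontMatchThis world",
    "line one\nline two",
    "line one\nDontMatchThis"
};

foreach (string input in inputs)
{
    string shown = input.Replace("\n", "\\n"); // keep each result on one console line
    Console.WriteLine($"{shown}: plain={plain.IsMatch(input)}, dotAll={dotAll.IsMatch(input)}");
}
```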
