My regex ends with the quantifier *, but my string contains several matches.
How can I make it find all of them? My regex:
((CMD1|CMD2)+(?::|;)+.*)
And the test string is "cmd1: test. test. test cmd2: test2. test2. test2"
So I need to get matches:
cmd1: test. test. test
cmd2: test2. test2. test2
Commands could be arbitrary words like "Look", "Take", "Go". Any number of commands can occur in one string.
Example:
Go: some sentences. and more. Take: other more sentences, and even more text here. Look: more and more. and more.
You could use a positive lookahead:
\w+:.*?(?= \w+:|$)
Match a word character one or more times \w+
Match a colon :
Match any character zero or more times .*
Make it non-greedy ?
A positive lookahead which asserts a space and one or more word characters \w+ followed by a colon :, or | the end of the string (?= \w+:|$)
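To try the pattern above in C#, here is a minimal sketch. The class and method names (CommandSplitter, SplitCommands) are made up for illustration; only the pattern and the test string come from this thread.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CommandSplitter
{
    // One match per command: a word, a colon, then as little text as possible
    // up to the next " word:" or the end of the string.
    private static readonly Regex CommandPattern = new Regex(@"\w+:.*?(?= \w+:|$)");

    // Hypothetical helper: returns each command with its text as a separate string.
    public static List<string> SplitCommands(string input)
    {
        var result = new List<string>();
        foreach (Match m in CommandPattern.Matches(input))
        {
            result.Add(m.Value);
        }
        return result;
    }

    public static void Main()
    {
        string input = "cmd1: test. test. test cmd2: test2. test2. test2";
        foreach (string command in SplitCommands(input))
        {
            Console.WriteLine(command);
        }
        // Prints:
        // cmd1: test. test. test
        // cmd2: test2. test2. test2
    }
}
```

Because the lookahead contains the space, the space before the next command is not part of the match, so no trimming is needed here.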
A general rule when writing regex is that when you want to find all occurrences of a pattern and put each pattern into its own match, you write a regex for that pattern, not that pattern quantified * times. Otherwise, you will end up putting the whole string into one single match.
I edited the regex for you:
CMD(?:1|2)(?::|;).*?(?=$|CMD)
The beginning is pretty much self-explanatory. Towards the end, I matched . with a lazy quantifier *?. This will stop matching as soon as the string after it matches the lookahead. The lookahead just matches another CMD or the end of the string.
Remember to turn on the case-insensitive option!
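A rough sketch of how that could look in C#, assuming the option is passed as RegexOptions.IgnoreCase. The names CmdMatcher and FindCommands are invented for this example. Since the lookahead here does not include the separating space, the lazy .*? swallows it, so the sketch trims each match.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CmdMatcher
{
    // Hypothetical helper: finds every CMD1/CMD2 block, ignoring case.
    public static string[] FindCommands(string input)
    {
        MatchCollection matches = Regex.Matches(
            input,
            @"CMD(?:1|2)(?::|;).*?(?=$|CMD)",
            RegexOptions.IgnoreCase);

        // The match runs right up to the next CMD, so it can end with a space.
        return matches.Cast<Match>().Select(m => m.Value.Trim()).ToArray();
    }

    public static void Main()
    {
        string input = "cmd1: test. test. test cmd2: test2. test2. test2";
        foreach (string command in FindCommands(input))
        {
            Console.WriteLine(command);
        }
    }
}
```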
string s = "Go: some sentences. and more. Take: other more sentences, and even more text here. Look: more and more. and more.";
var matches = Regex.Matches(s, @"(?i)(go|take|look):.+?(?=\s+\w+:|$)");
You can remove \s+, but in that case you should call Trim on each resulting string.
I have a bunch of URLs that I need to filter based on whether they contain the keyword 'staff':
1. /services
2. /services/EarNoseThroat
3. /services/EarNoseThroat/Audiology
4. /services/EarNoseThroat/Audiology/CochlearImplant
5. /services/BehavioralHealth/Clinic
6. /services/BehavioralHealth/Clinic/staff
7. /services/BehavioralHealth/Clinic/staff/Jamie-Hudgins
I want to create one regex pattern that matches all the URLs that have /services after the host URL, but not 'staff' anywhere in the URL. Basically, it should match URLs 1 to 5.
I also need a pattern that only matches URLs 6 and 7.
It seems like the negative lookahead will do the trick, except I don't know how to put it together. Can someone help me out?
Something like:
^\/services\/(?:[^\/]+\/?)*$
OR
^/services\/...any Depth here...\/(?!staff)
Regex to match the following:
/services
/services/EarNoseThroat
/services/EarNoseThroat/Audiology
/services/EarNoseThroat/Audiology/CochlearImplant
/services/BehavioralHealth/Clinic
Regex:
^\/services\/(?!.*\bstaff\b).*$
Explanation:
^ - asserts the start of the string
\/services\/ - matches /services/ (the trailing slash is required, so a bare /services with nothing after it does not match)
(?!.*\bstaff\b) - negative lookahead to make sure that the word staff does not appear anywhere in the string
.* - matches 0+ occurrences of any character except a newline character
$ - asserts the end of string
Regex to match the following:
/services/BehavioralHealth/Clinic/staff
/services/BehavioralHealth/Clinic/staff/Jamie-Hudgins
Regex:
^\/services\/(?=.*\bstaff\b).*$
Explanation:
The only difference is the positive lookahead:
(?=.*\bstaff\b) - positive lookahead to make sure that the word staff appears somewhere in the string before the end of the string
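Both patterns can be checked against the seven URLs from the question with a small C# sketch like the one below. The class and method names are made up. The third pattern, NoStaffOrBare, is my own variant and not part of the answer: it makes the slash and everything after it optional, so that a bare /services is accepted as well.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ServiceUrlFilter
{
    // Patterns exactly as given above.
    private static readonly Regex NoStaff = new Regex(@"^\/services\/(?!.*\bstaff\b).*$");
    private static readonly Regex HasStaff = new Regex(@"^\/services\/(?=.*\bstaff\b).*$");

    // Hypothetical variant (not from the answer): also accepts "/services" on its own.
    private static readonly Regex NoStaffOrBare = new Regex(@"^\/services(?!.*\bstaff\b)(?:\/.*)?$");

    public static bool IsNonStaff(string path) { return NoStaff.IsMatch(path); }
    public static bool IsStaff(string path) { return HasStaff.IsMatch(path); }
    public static bool IsNonStaffOrBare(string path) { return NoStaffOrBare.IsMatch(path); }

    public static void Main()
    {
        string[] paths =
        {
            "/services",
            "/services/EarNoseThroat",
            "/services/EarNoseThroat/Audiology",
            "/services/EarNoseThroat/Audiology/CochlearImplant",
            "/services/BehavioralHealth/Clinic",
            "/services/BehavioralHealth/Clinic/staff",
            "/services/BehavioralHealth/Clinic/staff/Jamie-Hudgins"
        };

        foreach (string path in paths)
        {
            Console.WriteLine("{0}: nonStaff={1}, nonStaffOrBare={2}, staff={3}",
                path, IsNonStaff(path), IsNonStaffOrBare(path), IsStaff(path));
        }
    }
}
```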
I have the following input:
Person 1kg
To get the expected output:
Person 1kEq
I am using the following pattern:
string Pattern = string.Format(@"(?<!\S){0}(?!\S)", Regex.Escape("kg"));
Regex.Replace(inputSentence, Pattern, "kEq");
The Regex.Replace does not replace kg with kEq.
If I edit the input sentence to Person 1 kg, the replacement happens.
Could someone help me with the pattern for this?
The (?<!\S) requires either a start of the string or a whitespace before the kg search term. The (?!\S) lookahead requires the end of string or a whitespace after the search term. That is why the replacement happens if you separate the number and the measurement unit with a space as in Person 1 kg.
It seems in this case, you want to replace a match if it is not enclosed with other letters. Use (?<!\p{L}) lookbehind at the start and (?!\p{L}) lookahead at the end:
string.Format(@"(?<!\p{{L}}){0}(?!\p{{L}})", Regex.Escape("kg"));
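Put together, the replacement could look roughly like this. UnitReplacer and ReplaceUnit are illustrative names, not something from the question; the pattern and the sample inputs are the ones discussed above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnitReplacer
{
    // Hypothetical helper: replaces `unit` only when no letter touches it on either side.
    public static string ReplaceUnit(string input, string unit, string replacement)
    {
        // {{ and }} are literal braces for string.Format; {0} receives the escaped unit.
        string pattern = string.Format(@"(?<!\p{{L}}){0}(?!\p{{L}})", Regex.Escape(unit));
        return Regex.Replace(input, pattern, replacement);
    }

    public static void Main()
    {
        Console.WriteLine(ReplaceUnit("Person 1kg", "kg", "kEq"));   // Person 1kEq
        Console.WriteLine(ReplaceUnit("Person 1 kg", "kg", "kEq"));  // Person 1 kEq
        Console.WriteLine(ReplaceUnit("Person 1kgs", "kg", "kEq"));  // Person 1kgs (a letter follows)
    }
}
```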
What is the difference between these two regexes?
new Regex(@"(([[[{""]))", RegexOptions.Compiled)
and
new Regex(@"(^[[[{""])", RegexOptions.Compiled)
I've used both regexes but can't find the difference. They seem to match almost the same things.
The regex patterns are not well written because:
There are duplicate characters in the character classes (which is redundant)
The first regex contains a duplicate capture group around the whole pattern.
The first regex - (([[[{""])) - matches 1 character, either a [, a {, or a ", and captures it into Group 1 and Group 2. Apart from the capture groups, it is equivalent to
[[{"]
The second regex - (^[[[{""]) - matches the same characters as the pattern above, but only at the beginning of a string (if RegexOptions.Multiline is not set) or at the beginning of a line (if that option is set). Apart from the capture group, it is equivalent to
^[[{"]
You can access the matched character through the Value property of the Match object returned by calling Match(s) on the Regex instance.
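A small sketch that makes the difference visible, using the two patterns from the question unchanged. The class name and the sample inputs "[abc" and "abc{" are invented for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BracketRegexDemo
{
    // Both patterns copied from the question; "" is an escaped quote in a verbatim string.
    public static readonly Regex Anywhere = new Regex(@"(([[[{""]))", RegexOptions.Compiled);
    public static readonly Regex AtStart = new Regex(@"(^[[[{""])", RegexOptions.Compiled);

    public static void Main()
    {
        foreach (string s in new[] { "[abc", "abc{" })
        {
            Match anywhere = Anywhere.Match(s);
            Match atStart = AtStart.Match(s);

            // The first pattern finds the character wherever it is and fills two groups
            // with the same text; the second only succeeds at the very start of the input.
            Console.WriteLine("input={0} anywhere={1} group1={2} group2={3} atStart={4}",
                s, anywhere.Value, anywhere.Groups[1].Value, anywhere.Groups[2].Value,
                atStart.Success);
        }
    }
}
```

For "abc{" the first regex finds the { at the end while the second one fails, because no listed character sits at position 0.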
How can I get the value of nserver without the dot at the end?
I tried this regex, but I cannot get rid of the last dot.
nserver:(\s*)(?<Value>(\S*))
This is the data in which I search
% By submitting a query to RIPN's Whois Service
% you agree to abide by the following terms of use:
% http://www.ripn.net/about/servpol.html#3.2 (in Russian)
% http://www.ripn.net/about/en/servpol.html#3.2 (in English).
domain: WEBMONEY.RU
nserver: ns.molot.ru.
nserver: ns.relsoftcom.ru.
nserver: ns.relsoft.ru.
state: REGISTERED, DELEGATED, VERIFIED
org: "Computing Forces" CJSC
registrar: RU-CENTER-REG-RIPN
admin-contact: https://www.nic.ru/whois
created: 1998.04.24
paid-till: 2014.05.01
free-date: 2014.06.01
source: TCI
Last updated on 2014.02.02 00:06:43 MSK
I want this as result
ns.molot.ru
I'm using these options
RegexOptions options = RegexOptions.IgnoreCase | RegexOptions.Singleline;
Well, shouldn't your regex be the following, with a carriage return before the newline/linefeed?
(?<=nserver:)(\s*)(?<Value>.*(?=.\r\n))
You could perhaps escape the dot, and turn the greedy .* into lazy .*?, but a simpler way without the lookahead could be:
(?<=nserver:)\s*(?<Value>\S+)\.
Explanation added due to comments:
This is where \s* finishes matching and \S+ starts matching (the position is marked with |):
|ns.molot.ru.
Since \S+ matches any character except whitespace (spaces, \r, \n, \t, \f), it matches the whole token, including the trailing dot, and stops before the newline:
ns.molot.ru.
After that, the regex has \. so it tries to match a period, but there is no period left to match. \S+ will then backtrack (give up one character from its match), so that it now covers only:
ns.molot.ru
Now, after giving up the last character it matched, the regex tries to match \. again and succeeds.
But if you have \. before the \S+, the regex will try to match the period first, while it is still at the starting position:
|ns.molot.ru.
But since there's an n there, it won't match and the match attempt fails at that position.
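The backtracking described above can be observed with a short C# sketch. NserverParser and GetNameServers are hypothetical names, and the sample text is a shortened copy of the whois output from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NserverParser
{
    // Hypothetical helper: returns every nserver value without its trailing dot.
    public static List<string> GetNameServers(string whois)
    {
        // \S+ first grabs "ns.molot.ru." and then gives back the last dot so that \. can match.
        var regex = new Regex(@"(?<=nserver:)\s*(?<Value>\S+)\.", RegexOptions.IgnoreCase);

        var result = new List<string>();
        foreach (Match m in regex.Matches(whois))
        {
            result.Add(m.Groups["Value"].Value);
        }
        return result;
    }

    public static void Main()
    {
        string whois =
            "domain: WEBMONEY.RU\r\n" +
            "nserver: ns.molot.ru.\r\n" +
            "nserver: ns.relsoftcom.ru.\r\n" +
            "nserver: ns.relsoft.ru.\r\n" +
            "state: REGISTERED, DELEGATED, VERIFIED\r\n";

        foreach (string server in GetNameServers(whois))
        {
            Console.WriteLine(server);
        }
        // Prints:
        // ns.molot.ru
        // ns.relsoftcom.ru
        // ns.relsoft.ru
    }
}
```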
A quick fix is to change your pattern to:
(?<=nserver:\s*)(?<Value>.*)(?=\.\r?\n)
You must also remove the RegexOptions.Singleline option.
I asked a similar question a few weeks ago on how to split a string based on a specific substring. However, I now want to do something a little different. I have a line that looks like this (sorry about the formatting):
What I want to do is split this line at all the newline \r\n sequences. However, I do not want to do this if there is a PA42 after one of the PA41 lines. I want the PA41 and the PA42 line that follows it to be on the same line. I have tried using several regex expressions to no avail. The output that I am looking for will ideally look like this:
This is the regex that I am currently using, but it does not quite accomplish what I am looking for.
string[] p = Regex.Split(parameterList[selectedIndex], @"[\r\n]+(?=PA41)");
If you need any clarifications, please feel free to ask.
You're trying a positive lookahead, but you want a negative one. (A positive lookahead ensures that the pattern does follow, whereas a negative one ensures that it does not.)
(\\r\\n)(?!PA42)
Works for me.
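As a sketch of the negative lookahead in use: the record contents below are invented, since the original sample data is not shown, and RecordSplitter / SplitRecords are illustrative names. The sketch assumes the input contains real carriage return and line feed characters. I left out the capturing group around \r\n, because Regex.Split includes captured text in the returned array.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RecordSplitter
{
    // Hypothetical helper: splits on CR LF unless the next line starts with PA42.
    public static string[] SplitRecords(string input)
    {
        // In a verbatim string, \r\n is passed to the regex engine, which reads it as CR LF.
        return Regex.Split(input, @"\r\n(?!PA42)");
    }

    public static void Main()
    {
        // Made-up records for illustration only.
        string input = "PA41 first\r\nPA42 continuation\r\nPA41 second\r\nPA43 third";

        string[] parts = SplitRecords(input);
        Console.WriteLine(parts.Length); // 3
        foreach (string part in parts)
        {
            // Show the kept line break as text so each element prints on one line.
            Console.WriteLine(part.Replace("\r\n", "\\r\\n"));
        }
    }
}
```

The PA41 line and the PA42 line that follows it end up in the same array element, still joined by their original line break.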
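string[] splitArray = Regex.Split(subjectString, @"\\r\\n(?!PA42)");
This should work. It uses a negative lookahead assertion to ensure that a \r\n sequence is not followed by PA42. Note that inside a verbatim string, \\r\\n matches the literal characters backslash, r, backslash, n; if the input contains actual carriage return and line feed characters, use \r\n(?!PA42) instead.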
Explanation :
@"
\\ # Match the character “\” literally
r # Match the character “r” literally
\\ # Match the character “\” literally
n # Match the character “n” literally
(?! # Assert that it is impossible to match the regex below starting at this position (negative lookahead)
PA42 # Match the characters “PA42” literally
)
"