C# Regular Expression Look Ahead S - c#

I want to validate a string to see if it is at least 6 digits long and has at least 1 int.
string text = "I want to run 10 miles daily";
string pattern = @"(?=\.*\d).{6,}";
Match match = Regex.Match(text, pattern);
Console.WriteLine(match.Value);
Please explain why I am getting the output below:
"10 miles daily"

The reason you get "10 miles daily" is because you specify a positive lookahead (?=\.*\d) which matches a literal dot zero or more times and then a digit.
That assertion succeeds at the position before the 1 where it matches zero times a dot and then a digit:
I want to run 10 miles daily
..............|
From that position, .{6,} matches any character six or more times, which gives:
I want to run 10 miles daily
..............^^^^^^^^^^^^^^
You could update your regex to remove the backslash before the dot and use anchors to assert the start ^ and the end $ of the line:
^(?=.*\d).{6,}$
That would match
^ Assert the start of the line
(?=.*\d) Positive lookahead to assert that what follows contains a digit
.{6,} Match any character 6 or more times
$ Assert the end of the line
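As a rough illustration of the explanation above, this sketch runs the original and the corrected pattern against the sample sentence. The class and method names (LookaheadDemo, FirstMatch, IsValid) are made up for the example and do not come from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    // Returns the text of the first match, or an empty string if there is none.
    public static string FirstMatch(string text, string pattern)
    {
        return Regex.Match(text, pattern).Value;
    }

    // True when the whole string has at least 6 characters and at least one digit.
    public static bool IsValid(string text)
    {
        return Regex.IsMatch(text, @"^(?=.*\d).{6,}$");
    }

    public static void Main()
    {
        string text = "I want to run 10 miles daily";

        // Escaped dot: the lookahead only succeeds right before a digit.
        Console.WriteLine(FirstMatch(text, @"(?=\.*\d).{6,}"));   // 10 miles daily

        // Unescaped dot plus anchors: the whole line is validated and matched.
        Console.WriteLine(FirstMatch(text, @"^(?=.*\d).{6,}$"));  // I want to run 10 miles daily

        Console.WriteLine(IsValid("abcdef"));  // False (no digit)
        Console.WriteLine(IsValid("abcde1"));  // True
        Console.WriteLine(IsValid("a1"));      // False (too short)
    }
}
```

For pure validation, Regex.IsMatch is enough; Regex.Match is only needed when the matched text itself is of interest.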

I know this doesn't answer your question, but I assume you're looking for a working RegEx. Your statement was I want to validate a string to see if it is at least 6 digits long and has at least 1 int.. I assume you mean at least 6 characters long (including white space) and has at least 1 int.
This should do it (C#):
@"^(?=.*\d+)(?=.*[\w]).{6,}$";
RegEx Analyzer: UltraPico Expresso RegEx Tool
Test code and output (C#)
static void Main(string[] args)
{
string text = "abcdef";
Match match;
string pattern = @"^(?=.*\d+)(?=.*[\w]).{6,}$";
match = Regex.Match(text, pattern, RegexOptions.IgnoreCase);
Console.WriteLine("Text:'"+text + "'. Matched:" + match.Success + ". Value:" + match.Value);
text = "abcdefg";
match = Regex.Match(text, pattern, RegexOptions.IgnoreCase);
Console.WriteLine("Text:'" + text + "'. Matched:" + match.Success + ". Value:" + match.Value);
text = "abcde1";
match = Regex.Match(text, pattern, RegexOptions.IgnoreCase);
Console.WriteLine("Text:'" + text + "'. Matched:" + match.Success + ". Value:" + match.Value);
text = "abcd21";
match = Regex.Match(text, pattern, RegexOptions.IgnoreCase);
Console.WriteLine("Text:'" + text + "'. Matched:" + match.Success + ". Value:" + match.Value);
text = "abcd dog cat 21";
match = Regex.Match(text, pattern, RegexOptions.IgnoreCase);
Console.WriteLine("Text:'" + text + "'. Matched:" + match.Success + ". Value:" + match.Value);
Console.ReadKey();
}

\.*\d means zero or more literal dots followed by a digit.
(?=\.*\d) asserts, without consuming anything, that such a sequence starts at the current position. In your string this first succeeds right before the digit 1.
(?=\.*\d). therefore matches the first character at that position, which is the digit itself.
(?=\.*\d).{6,} matches the digit and everything after it, provided that is at least 6 characters.
You need this regex: (?=.*\d).{6,}$. It matches a string that is at least 6 characters long and contains at least one digit.

\.*\d => means a digit preceded by zero or more literal dots (`.`);
* means zero or more repetitions of the preceding character, here the dot (.).
As you can see, there is no dot (.) in your input string, so the regex engine matches by taking the repetition count of * as zero.
As a result, your regex behaves as if it were written as follows.
(?=\d).{6,}
This regex matches six or more characters of any kind, starting at a position where the next character is a digit.
{6,} is greedy: it matches the longest possible string. In lazy mode ({6,}? in this case) it matches the shortest possible string.
You can try this lazy regex and compare its result with that of the greedy one above.
(?=\d).{6,}?
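The greedy/lazy comparison suggested above can be tried with a small sketch like the one below. The class name GreedyVsLazy and its helper methods are hypothetical; the input is the sentence from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyVsLazy
{
    // Greedy quantifier: takes as many characters as possible.
    public static string Greedy(string text)
    {
        return Regex.Match(text, @"(?=\d).{6,}").Value;
    }

    // Lazy quantifier: stops as soon as the minimum of 6 characters is reached.
    public static string Lazy(string text)
    {
        return Regex.Match(text, @"(?=\d).{6,}?").Value;
    }

    public static void Main()
    {
        string text = "I want to run 10 miles daily";
        Console.WriteLine(Greedy(text)); // 10 miles daily
        Console.WriteLine(Lazy(text));   // 10 mil
    }
}
```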

Related

Regex to match positive and negative numbers and text between "" after a character

I need a regex for an input that contains positive and negative numbers and sometimes a string between " and ". I'm not sure if this can be done in only one pattern. Here's some test cases for the pattern:
*PATH "C:\Users\User\Desktop\Media\SoundBanks\Ambient\WAV_Data\AD_SMP_SFX_WIND0.wav"
*NODECOLOR 0 255 140
*FILEREF -7
*FREQUENCY 22050
The idea would be to use a pattern that returns:
C:\Users\User\Desktop\Media\SoundBanks\Ambient\WAV_Data\AD_SMP_SFX_WIND0.wav
0 255 140
-7
22050
The content always goes after the character *. I've split this into two patterns because I don't know how to do it all in one, but it doesn't work:
MatchCollection NumberMtaches = Regex.Matches(FileLine, @"(?<=[*])-?[0-9]+");
MatchCollection FilePathMatches = Regex.Matches(FileLine, @"/,([^,]*)(?=,)/g");
You may read the file into a string and run the following regex:
var matches = Regex.Matches(filecontents, @"(?m)^\*\w+[\s-[\r\n]]*""?(.*?)""?\r?$")
.Cast<Match>()
.Select(x => x.Groups[1].Value)
.ToList();
See the .NET regex demo.
Details:
(?m) - RegexOptions.Multiline option on
^ - start of a line
\* - a * char
\w+ - one or more word chars
[\s-[\r\n]]* - zero or more whitespaces other than CR and LF
"? - an optional " char
(.*?) - Group 1: any zero or more chars other than an LF char, as few as possible
"? - an optional " char
\r? - an optional CR
$ - end of a line/string.
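To see the pattern described above at work, here is a minimal, self-contained sketch that feeds it the four test lines from the question. The class name DirectiveValues and the method Extract are invented for the example, and the input is built in memory rather than read from a file.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class DirectiveValues
{
    // Returns the value after each "*KEYWORD", with surrounding quotes removed.
    public static List<string> Extract(string contents)
    {
        return Regex.Matches(contents, @"(?m)^\*\w+[\s-[\r\n]]*""?(.*?)""?\r?$")
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .ToList();
    }

    public static void Main()
    {
        // The test cases from the question, joined with LF line endings.
        string contents = string.Join("\n", new[]
        {
            @"*PATH ""C:\Users\User\Desktop\Media\SoundBanks\Ambient\WAV_Data\AD_SMP_SFX_WIND0.wav""",
            "*NODECOLOR 0 255 140",
            "*FILEREF -7",
            "*FREQUENCY 22050"
        });

        foreach (string value in Extract(contents))
        {
            Console.WriteLine(value);
        }
    }
}
```

This prints the path without its quotes, followed by 0 255 140, -7 and 22050, one per line.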

Detecting a word followed by a dot or whitespace using regex

I am using regex and C# to find occurrences of a particular word using
Regex regex = new Regex(@"\b" + word + @"\b");
How can I modify my Regex to only detect the word if it is either preceded with a whitespace, followed with a whitespace or followed with a dot?
Examples:
this.Button.Value - should match
this.value - should match
document.thisButton.Value - should not match
You may use lookarounds and alternation to check for the 2 possibilities when a keyword is enclosed with spaces or is just followed with a dot:
var line = "this.Button.Value\nthis.value\ndocument.thisButton.Value";
var word = "this";
var rx = new Regex(string.Format(@"(?<=\s)\b{0}\b(?=\s)|\b{0}\b(?=\.)", word));
var result = rx.Replace(line, "NEW_WORD");
Console.WriteLine(result);
See IDEONE demo and a regex demo.
The pattern matches:
(?<=\s)\bthis\b(?=\s) - whole word "this" that is preceded with whitespace (?<=\s) and that is followed with whitespace (?=\s)
| - or
\bthis\b(?=\.) - whole word "this" that is followed with a literal . ((?=\.))
Since lookarounds are not consuming characters (the regex index remains where it was) the characters matched with them are not placed in the match value, and are thus untouched during the replacement.
If I am understanding you correctly:
Regex regex = new Regex(@"\b" + (word " " || ".") + @"\b");
Regex regex = new Regex(@"((?<=( \.))" + word + @"\b)" + "|" + @"(\b" + word + @"[ .])");
However, note that this could cause trouble if word contains characters that have special meanings in Regular Expressions. I'm assuming that word contains alpha-numeric characters only.
The (?<=...) match group checks for preceding and (?=...) checks for following, both without including them in the match.
Regex regex = new Regex(@"(?<=\s)\b" + word + @"\b|\b" + word + @"\b(?=[\s\.])");
EDIT: Pattern updated.
EDIT 2: Online test: http://ideone.com/RXRQM5
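The caveat above about words that contain regex metacharacters can be handled with Regex.Escape. The sketch below is one possible way to do it, not the answerers' own code: it reuses the lookaround pattern from the first answer, and the names WordMatcher and Build are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordMatcher
{
    // Builds the lookaround pattern with the word escaped,
    // so that characters such as '.' or '+' are treated literally.
    public static Regex Build(string word)
    {
        string w = Regex.Escape(word);
        return new Regex(@"(?<=\s)\b" + w + @"\b(?=\s)|\b" + w + @"\b(?=\.)");
    }

    public static void Main()
    {
        string line = "this.Button.Value\nthis.value\ndocument.thisButton.Value";
        // Only the first two lines contain a standalone "this" followed by a dot.
        Console.WriteLine(Build("this").Replace(line, "NEW_WORD"));

        // Without escaping, the '.' in "a.b" would also match the 'x' in "axb".
        Console.WriteLine(Build("a.b").IsMatch("axb.c")); // False
        Console.WriteLine(Build("a.b").IsMatch("a.b.c")); // True
    }
}
```

Note that \b only behaves as expected when the word starts and ends with a word character, so a word such as "c++" would need a different boundary check.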

Regex tokenize issue

I have strings input by the user and want to tokenize them. For that, I want to use regex and now have a problem with a special case.
An example string is
Test + "Hello" + "Good\"more" + "Escape\"This\"Test"
or the C# equivalent
@"Test + ""Hello"" + ""Good\""more"" + ""Escape\""This\""Test"""
I am able to match the Test and + tokens, but not the ones contained by the ". I use the " to let the user specify that this is literally a string and not a special token. Now if the user wants to use the " character in the string, I thought of allowing him to escape it with a \.
So the rule would be: Give me everything between two " ", but the character in front of the last " can not be a \.
The results I expect are: "Hello" "Good\"more" "Escape\"This\"Test"
I need the " " characters to be in the final match so I know that this is a string.
I currently have the regex @"""([\w]*)(?<!\\"")""" which gives me the following results: "Hello" "more" "Test"
So the look behind isn't working as I want it to be. Does anyone know the correct way to get the string like I want?
Here's an adaption of a regex I use to parse command lines:
(?!\+)((?:"(?:\\"|[^"])*"?|\S)+)
Example here at regex101
(adaption is the negative look-ahead to ignore + and checking for \" instead of "")
Hope this helps you.
Regards.
Edit:
If you aren't interested in surrounding quotes:
(?!\+)(?:"((?:\\"|[^"])*)"?|(\S+))
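For reference, here is a small sketch of how the first pattern above (the one that keeps the surrounding quotes) could be used from C#. The class name QuoteTokenizer and the method Tokenize are made up for the example; the input is the string from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class QuoteTokenizer
{
    // Splits the input into tokens, skipping '+' and keeping quoted strings
    // (including escaped \" sequences) together with their quotes.
    public static List<string> Tokenize(string input)
    {
        return Regex.Matches(input, @"(?!\+)((?:""(?:\\""|[^""])*""?|\S)+)")
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .ToList();
    }

    public static void Main()
    {
        string input = @"Test + ""Hello"" + ""Good\""more"" + ""Escape\""This\""Test""";
        foreach (string token in Tokenize(input))
        {
            Console.WriteLine(token);
        }
        // Test
        // "Hello"
        // "Good\"more"
        // "Escape\"This\"Test"
    }
}
```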
To make it safer, I'd suggest getting all the substrings within unescaped pairs of "..." with the following regex:
^(?:[^"\\]*(?:\\.[^"\\]*)*("[^"\\]*(?:\\.[^"\\]*)*"))+
It matches
^ - start of string (so that we could check each " and escape sequence)
(?: - Non-capturing group 1 serving as a container for the subsequent subpatterns
[^"\\]*(?:\\.[^"\\]*)* - matches 0+ characters other than " and \ followed with 0+ sequences of \\. (any escape sequence) followed with 0+ characters other than " and \ (thus, we avoid matching the first " that is escaped, and it can be preceded with any number of escape sequences)
("[^"\\]*(?:\\.[^"\\]*)*") - Capture group 1 matching "..." substrings that may contain any escape sequences inside
)+ - end of the first non-capturing group that is repeated 1 or more times
See the regex demo and here is a C# demo:
var rx = "^(?:[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*(\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"))+";
var s = @"Test + ""Hello"" + ""Good\""more"" + \""Escape\""This\""Test\"" + ""f""";
var matches = Regex.Matches(s, rx)
.Cast<Match>()
.SelectMany(m => m.Groups[1].Captures.Cast<Capture>().Select(p => p.Value).ToArray())
.ToList();
Console.WriteLine(string.Join("\n", matches));
UPDATE
If you need to remove the tokens, just match and capture all outside of them with this code:
var keep = "[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*";
var rx = string.Format("^(?:(?<keep>{0})\"{0}\")+(?<keep>{0})$", keep);
var s = @"Test + ""Hello"" + ""Good\""more"" + \""Escape\""This\""Test\"" + ""f""";
var matches = Regex.Matches(s, rx)
.Cast<Match>()
.SelectMany(m => m.Groups["keep"].Captures.Cast<Capture>().Select(p => p.Value).ToArray())
.ToList();
Console.WriteLine(string.Join("", matches));
See another demo
Output: Test + + + \"Escape\"This\"Test\" + for @"Test + ""Hello"" + ""Good\""more"" + \""Escape\""This\""Test\"" + ""f""".

Replace special characters or special characters followed by space

I have this particular string:
Administrationsomkostninger I -2.889 - r0.l l0
I would like to replace these characters: t, r, l and i with 1.
I use this expression:
([(t|r|l|i|)])
That gives me this string:
Adm1n1s11a11onsomkos1n1nge1 1 -2.889 - 10.1 10
Now I want to remove the whitespace in any number that contains a digit followed by whitespace,
so in this case only - 10.1 10 gets converted to -10.110
Try this
string input = "Administrationsomkostninger I -2.889 - r0.l l0";
string pattern = @"(?'spaces'\s){2,}";
string output = Regex.Replace(input, pattern, " ");

Regex to find special pattern

I have a string to parse. First I have to check whether the string contains a special pattern.
I want to know if there are substrings which start with "$(",
and end with ")",
and between that start and end there should not be
any whitespace,
and no "$" character should appear inside.
I have a little regex for it in C#
string input = "$(abc)";
string pattern = @"\$\(([^$][^\s]*)\)";
Regex rgx = new Regex(pattern, RegexOptions.IgnoreCase);
MatchCollection matches = rgx.Matches(input);
foreach (var match in matches)
{
Console.WriteLine("value = " + match);
}
It works for many cases but fails for input = $(a$(), where the inner expression is empty. I also want it NOT to match when the input is $() (there is nothing between the start and end identifiers).
What is wrong with my regex?
Note: [^$] matches any single character other than $
Use the below regex if you want to match $()
\$\(([^\s$]*)\)
Use the below regex if you don't want to match $(),
\$\(([^\s$]+)\)
* repeats the preceding token zero or more times.
+ Repeats the preceding token one or more times.
Your regex \$\(([^$][^\s]*)\) is wrong. It won't allow $ as the first character inside (), but it allows it as the second, third, and so on. See the demo here. You need to combine the negated classes in your regex in order to match any character that is neither whitespace nor $.
Your current regex does not match $() because the [^$] matches at least 1 character. The only way I can think of where you would have this match would be when you have an input containing more than one parens, like:
$()(something)
In those cases, you will also need to exclude at least the closing paren:
string pattern = @"\$\(([^$\s)]+)\)";
The above matches for example:
abc in $(abc) and
abc and def in $(def)$()$(abc)(something).
Simply replace the * with a + and merge the options.
string pattern = @"\$\(([^$\s]+)\)";
+ means 1 or more
* means 0 or more
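The difference between the suggested patterns can be checked with a short sketch such as the following. The class name DollarParenDemo and the helper Captures are hypothetical; the inputs are taken from the question and the answers above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class DollarParenDemo
{
    // Returns the contents of group 1 for every match of the pattern.
    public static List<string> Captures(string input, string pattern)
    {
        return Regex.Matches(input, pattern)
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .ToList();
    }

    public static void Main()
    {
        string merged = @"\$\(([^\s$]+)\)";    // '+' and merged negated class
        string noParen = @"\$\(([^$\s)]+)\)";  // additionally excludes ')'

        Console.WriteLine(Captures("$(abc)", merged).Count);  // 1
        Console.WriteLine(Captures("$()", merged).Count);     // 0
        Console.WriteLine(Captures("$(a$()", merged).Count);  // 0

        // With extra parentheses after a match, only the second pattern stops at the first ')'.
        string input = "$(def)$()$(abc)(something)";
        Console.WriteLine(string.Join(", ", Captures(input, merged)));   // def, abc)(something
        Console.WriteLine(string.Join(", ", Captures(input, noParen)));  // def, abc
    }
}
```

Which variant is appropriate depends on whether a closing parenthesis can legitimately appear inside the $( ... ) expression.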
