What I'm trying to do sounds simple, but it isn't.
I have a regex in C# that finds all text inside double quotes.
If a specific word appears right before the quotes, I want to ignore the whole match and continue to the next row,
while also ignoring matches that contain certain symbols inside the quotes.
Example -
My RegEx = @"(?<!Foo\()\""[^{}\r\n]*\""";
Text -
dontfindme1 = "Hello{}"
dontfindme2 = Foo("ABC")
findme1 = "Just a simple text to find"
findme2 = SuperFoo("WORKS")
Output example -
"ABC"
"Just a simple text to find"
"WORKS"
My problem is that I don't want a match when "Foo(" comes right before the quotes,
and I don't want matches that contain "{", "}", "(", ")" or new lines.
I only need "ABC" not to be found, so that the search skips to the next row.
You could use two negative lookaheads, (?!...), to assert that the line contains neither Foo( nor a double-quoted string with { or } inside it:
^(?!.*\bFoo\()(?!.*"[^"\r\n]*[{}][^"\r\n]*").*$
In C#: string pattern = @"^(?!.*\bFoo\()(?!.*""[^""\r\n]*[{}][^""\r\n]*"").*$";
Explanation
^ Assert the start of the string
(?! Negative lookahead, assert that what follows does not
.*\bFoo\( Match any character 0+ times followed by a word boundary and Foo(
) Close negative lookahead
(?! Negative lookahead, assert that what follows does not
.* Match any character 0+ times
"[^"\r\n]* Match a double quote, match 0+ times not ", \r, \n
[{}] Match { or }
[^"\r\n]*" Match 0+ times not ", \r, \n followed by matching a double quote
) Close negative lookahead
.* Match any character 0+ times
$ Assert the end of the string
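The pattern above accepts or rejects a whole line, so one way to use it is as a per-line filter. The following is a minimal sketch under that assumption; the class name LineFilter and the method Keep are made up for illustration, and the sample lines are the ones from the question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LineFilter
{
    // Pattern from the answer: a line passes only if it contains neither
    // "Foo(" as a whole word nor a quoted string holding '{' or '}'.
    private static readonly Regex Allowed = new Regex(
        @"^(?!.*\bFoo\()(?!.*""[^""\r\n]*[{}][^""\r\n]*"").*$");

    // Hypothetical helper: returns only the lines the pattern accepts.
    public static string[] Keep(string[] lines) =>
        lines.Where(line => Allowed.IsMatch(line)).ToArray();

    public static void Main()
    {
        string[] lines =
        {
            "dontfindme1 = \"Hello{}\"",
            "dontfindme2 = Foo(\"ABC\")",
            "findme1 = \"Just a simple text to find\"",
            "findme2 = SuperFoo(\"WORKS\")",
        };

        // Prints the findme1 and findme2 lines.
        foreach (string line in Keep(lines))
            Console.WriteLine(line);
    }
}
```

Because no RegexOptions.Multiline is set, ^ and $ anchor to the whole input, which is why the sketch tests each line separately. SuperFoo( is kept because \b requires a word boundary directly before Foo.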
I need to extract any number of version numbers from this string:
magic-string: [\"1.0.2.2 \", \"1.2\", \"1.1\"];
What I have:
[\s""\\]+([\d\.]+)+[\s""\\]+
Matches:
1.0.2.2
1.2
1.1
Fine so far, but I also want to require that "magic-string" is present, so that this will not match:
any-random-string: [\"1.0.2.2 \", \"1.2\", \"1.1\"];
EDIT:
Working solution in C#:
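using System;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main()
    {
        // Variable-length lookbehind: each version must be preceded by "magic-string: [" and any earlier quoted versions.
        string pattern = @"(?<=^\s*magic-string:\s*\[(?:\s*""(?:\d+(?:\.\d+)*\s*"",)?)+)\d+(?:\.\d+)*";
        var matches = Regex.Matches(" magic-string: [ \"1.0\", \"1.2\", \"1.1\" ];", pattern, RegexOptions.IgnoreCase);
        Console.WriteLine(matches.Count);
        foreach (Match match in matches)
        {
            Console.WriteLine(match.Groups.Count);
            Console.WriteLine(match.Value);
            // The pattern has no capturing groups, so Groups[1] is unsuccessful and this prints an empty line.
            Console.WriteLine(match.Groups[1].Value);
        }
    }
}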
https://dotnetfiddle.net/Kc2J2A
In languages that support variable-length lookbehinds (like .NET and JavaScript ECMAScript 2018+):
(?<=^magic-string:\s*\[(?:\s*\\"(?:\d+(?:\.\d+)*\s*\\",)?)+)\d+(?:\.\d+)*
How it works:
(?<=^magic-string:\s*\[(?:\s*\\"(?:\d+(?:\.\d+)*\s*\\",)?)+) positive lookbehind ensuring what precedes matches the following
  ^magic-string:\s*\[ match the following
    ^ assert position at the start of the line
    magic-string: match this literally
    \s*\[ match any number of whitespace characters, followed by [ literally
  (?:\s*\\"(?:\d+(?:\.\d+)*\s*\\",)?)+ match the following one or more times
    \s*\\" match any number of whitespace characters, followed by \" literally
    (?:\d+(?:\.\d+)*\s*\\",)? optionally match the following
      \d+ match any digit one or more times
      (?:\.\d+)* match . then one or more digits, any number of times (matches .1, .1.1, etc. where 1 is any number)
      \s*\\", match any number of whitespace characters, followed by \", literally
\d+ match any digit one or more times
(?:\.\d+)* match . then one or more digits, any number of times (matches .1, .1.1, etc. where 1 is any number)
In simple terms, this matches all occurrences of 0, 0.0, 0.0.0, etc. that are preceded by magic-string: [\" with the substring 0.0\", \" appearing zero or more times in between. (0.0 is a placeholder for any format that \d+(?:\.\d+)* matches.)
You can use the following regex in languages that support \G and \K tokens (like PCRE):
(?:^magic-string:\s*\[|\G(?!\A)\s*\\",)\s*\\"\K\d+(?:\.\d+)*
How it works:
(?:^magic-string:\s*\[|\G(?!\A)\s*\\",) match either of the following options
  ^magic-string:\s*\[ match the following
    ^ assert position at the start of the line
    magic-string: match this literally
    \s*\[ match any number of whitespace characters, followed by [ literally
  \G(?!\A)\s*\\", match the following
    \G(?!\A) assert position at the end of the previous match, but not at the start of the string
    \s*\\", match any number of whitespace characters, followed by \", literally
\s*\\"\K\d+(?:\.\d+)* match the following
  \s*\\" match any number of whitespace characters, followed by \" literally
  \K reset the starting point of the match, any previously consumed characters are no longer in the final match
  \d+ match any digit one or more times
  (?:\.\d+)* match . then one or more digits, any number of times (matches .1, .1.1, etc. where 1 is any number)
In simple terms, this matches all locations that are preceded by magic-string: [\" or by the position of a previous match followed by \", \".
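.NET supports \G but not \K, so the PCRE pattern cannot be used there unchanged. A possible adaptation, sketched below, replaces \K with a capturing group and reads the version from group 1. The class name MagicVersions and the method Extract are made up for illustration; this is a variation on the answer, not the original pattern.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class MagicVersions
{
    // \G(?!\A) continues from the end of the previous match. The version is
    // read from capture group 1 because .NET has no \K.
    private static readonly Regex Version = new Regex(
        @"(?:^magic-string:\s*\[|\G(?!\A)\s*\\"",)\s*\\""(\d+(?:\.\d+)*)");

    // Hypothetical helper: returns the captured version strings in order.
    public static string[] Extract(string input) =>
        Version.Matches(input)
               .Cast<Match>()
               .Select(m => m.Groups[1].Value)
               .ToArray();

    public static void Main()
    {
        // Verbatim strings, so each \ below is a literal backslash in the input.
        string good = @"magic-string: [\""1.0.2.2 \"", \""1.2\"", \""1.1\""];";
        string bad = @"any-random-string: [\""1.0.2.2 \"", \""1.2\"", \""1.1\""];";

        Console.WriteLine(string.Join(" | ", Extract(good))); // 1.0.2.2 | 1.2 | 1.1
        Console.WriteLine(Extract(bad).Length);               // 0
    }
}
```

The chain breaks as soon as the text after a match is not \", followed by another quoted version, so nothing outside the magic-string list is picked up.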
I have a string like Acc:123-456-789 and another string like -1234567. I need an expression that matches the digits only when there is no separator between them.
-*(?!\d*(?:\d*-)$)\d*$
Input strings:
Acc:123-456-789 -12323232 7894596
Desired result:
group 1 12323232
group 2 7894596
I think this ought to work:
(?<=^|\s|\s-)(\d+)(?=\s|$)
Breaking it down:
(?<=^|\s|\s-) - A positive lookbehind that matches the start of the string, whitespace, or whitespace followed by a -.
(\d+) - Matches and captures number sequences.
(?=\s|$) - A positive lookahead that matches whitespace or the end of the string.
Note: If you need to capture negative number sequences, replace (\d+) with (\-?\d+).
Remember for use in C# that you need to escape backslashes or use the @ prefix to a string literal (@" ").
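As a rough illustration of the pattern in C#, the sketch below runs it against the input from the question and reads each number from capture group 1. The class name UnseparatedDigits and the method Find are made up for the example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class UnseparatedDigits
{
    // Hypothetical helper: returns every digit run that starts after the
    // string start, whitespace, or whitespace plus '-', and ends at
    // whitespace or the end of the string.
    public static string[] Find(string input) =>
        Regex.Matches(input, @"(?<=^|\s|\s-)(\d+)(?=\s|$)")
             .Cast<Match>()
             .Select(m => m.Groups[1].Value)
             .ToArray();

    public static void Main()
    {
        // Prints 12323232 and 7894596; the digits in Acc:123-456-789 are skipped.
        foreach (string run in Find("Acc:123-456-789 -12323232 7894596"))
            Console.WriteLine(run);
    }
}
```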
string line = "Rok rok irrelevant text irrelevant;text.irrelevant,text";
string NewLine = Regex.Replace(line, @"\b[rR]\w*", "");
Right now it removes every word starting with r/R, but I want to remove everything EXCEPT words starting with r/R.
Edit
It seems all you want is to extract words starting with r or R and join them with a space. In this case, use a mere \b[rR]\w* regex and the following code:
var result = string.Join(" ", Regex.Matches(line, @"\b[rR]\w*").Cast<Match>().Select(x => x.Value));
Original answer
You may use a negative lookahead after a word boundary:
\b(?![rR])\w+
  ^^^^^^^^
Note that the + quantifier is better here since you want to remove at least 1 char found.
Or, in case you also want to remove all non-word chars after the found word, use
\b(?![rR])\w+\W*
If you want to remove any non-word chars before and after a qualifying word, use
var result = Regex.Replace(line, @"\W*\b(?![rR])\w+\W*", " ").Trim();
It will remove each word that does not start with r or R, together with all non-word chars before and after it.
Details
\b - a word boundary
(?![rR]) - a negative lookahead that will fail the match if, immediately to the right of the current location, there is r or R
\w+ - 1+ word chars
\W* - 0+ non-word chars.
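The two approaches from this answer can be compared side by side. The sketch below is only an illustration using the sample line from the question; the class name KeepRWords and both method names are made up.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeepRWords
{
    // Approach 1: extract the words starting with r/R and join them.
    public static string ByExtracting(string line) =>
        string.Join(" ", Regex.Matches(line, @"\b[rR]\w*")
                              .Cast<Match>()
                              .Select(m => m.Value));

    // Approach 2: replace every other word, plus surrounding non-word
    // chars, with a single space, then trim the ends.
    public static string ByRemoving(string line) =>
        Regex.Replace(line, @"\W*\b(?![rR])\w+\W*", " ").Trim();

    public static void Main()
    {
        string line = "Rok rok irrelevant text irrelevant;text.irrelevant,text";
        Console.WriteLine(ByExtracting(line)); // Rok rok
        Console.WriteLine(ByRemoving(line));   // Rok rok
    }
}
```

The results can differ on other inputs: the removal approach leaves a run of spaces wherever several unwanted words sit between two kept words, while the extraction approach always joins with exactly one space.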
I have to write a regex to parse each CSV line. The regex must handle a double-quoted string that contains an even number of double quotation marks ("), not single quotation marks.
For example, the CSV delimiter is tab, \t. I have a line like this:
"first column ""end"\tsecond column\t"third \nNewLine\rcolumn\tend"
The regex expression will allow me to extract three columns like below:
first column ""end
second column
third \nNewLine\rcolumn\tend
Please note that the first column contains two double quotes; any even number of double quotes should be allowed.
Please note that the third column contains \t as well as \n and \r.
The first and third columns can stay quoted if that makes the regex easier to write.
Any idea?
How about splitting on tabs if and only if an even number of quotes follows?
splitArray = Regex.Split(subject,
    @"\t                  # Match a tab
    (?=                   # if the following regex matches after it:
     (?:                  # Match...
      [^""]*""            # Any number of non-quotes, followed by a quote
      [^""]*""            # ditto, to ensure an even number of quotes
     )*                   # Repeat as many times as needed
     [^""]*               # Then match any remaining non-quote characters
     $                    # until the end of the string.
    )                     # End of lookahead assertion",
    RegexOptions.IgnorePatternWhitespace);
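To see the split on the line from the question, here is a small sketch with the same lookahead written on one line. The class name TabSplitter and the method Split are made up for the example, and the sample string is the question's line with its escapes interpreted as real tab, \n and \r characters.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TabSplitter
{
    // A tab is a delimiter only if an even number of quotes follows it,
    // which means the tab is not inside a quoted field.
    private static readonly Regex TabOutsideQuotes =
        new Regex(@"\t(?=(?:[^""]*""[^""]*"")*[^""]*$)");

    // Hypothetical helper: splits one CSV line on unquoted tabs.
    public static string[] Split(string line) => TabOutsideQuotes.Split(line);

    public static void Main()
    {
        string line =
            "\"first column \"\"end\"\tsecond column\t\"third \nNewLine\rcolumn\tend\"";

        string[] columns = Split(line);
        Console.WriteLine(columns.Length); // 3

        // The outer quotes are kept, as the question allows.
        foreach (string column in columns)
            Console.WriteLine("[" + column + "]");
    }
}
```

The tab inside the third column is followed by a single quote, so the lookahead fails there and the field stays in one piece.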
I would like to implement a TextBox in which the user can only enter text in a pattern like this:
dddddddddd,
dddddddddd,
dddddddddd,
...
where d is a digit. If the user leaves the control with fewer than 10 digits in a row, validation should fail. The user should not be able to write more than 10 digits on one line; after that, only a comma "," should be accepted.
Thanks for your help.
Match m = Regex.Match(textBox.Text, @"^\d{10},$", RegexOptions.Multiline);
I haven't tried it, but it should work.
I suggest the regex
\A(?:\s*\d{10},)*\s*\d{10}\s*\Z
Explanation:
\A # start of the string
(?: # match the following zero or more times:
\s* # optional whitespace, including newlines
\d{10}, # 10 digits, followed by a comma
)* # end of repeated group
\s* # match optional whitespace
\d{10} # match 10 digits (this time no comma)
\s* # optional whitespace
\Z # end of string
In C#, this would look like
validInput = Regex.IsMatch(subjectString, @"\A(?:\s*\d{10},)*\s*\d{10}\s*\Z");
Note that you need to use a verbatim string (@"...") or double all the backslashes in the regex.
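A short sketch of how this check behaves on a few inputs may help. The class name TenDigitRows and the method IsValid are made up for the example, and wiring it to a TextBox validation event is left out.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TenDigitRows
{
    // Pattern from the answer: comma-terminated rows of exactly 10 digits,
    // followed by a final row of 10 digits without a comma.
    private static readonly Regex Rows =
        new Regex(@"\A(?:\s*\d{10},)*\s*\d{10}\s*\Z");

    // Hypothetical helper wrapping the IsMatch call.
    public static bool IsValid(string text) => Rows.IsMatch(text);

    public static void Main()
    {
        Console.WriteLine(IsValid("1234567890,\r\n0987654321")); // True
        Console.WriteLine(IsValid("1234567890,\r\n12345"));      // False: last row too short
        Console.WriteLine(IsValid("12345678901"));               // False: 11 digits in a row
        Console.WriteLine(IsValid("1234567890,"));               // False: last row ends with a comma
    }
}
```

As the last case shows, this pattern expects the final row to have no trailing comma. If every row, including the last, should end with a comma, the pattern would need to be adjusted.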