How to not match a string instance with a line break? [duplicate]

This question already has answers here:
How do I match an entire string with a regex?
(8 answers)
Closed 2 years ago.
I am new to regex and am stuck on a problem: I can't figure out how to reject a pin that has a line break in it, and accepting such a pin is causing my code to fail a test.
using System;
using System.Text.RegularExpressions;
public class Pin
{
public static bool ValidatePin(string pin){
string pattern = "^([0-9]{4}|[0-9]{6})$";
Regex reg = new Regex(pattern);
Match match = reg.Match(pin);
if (match.Success) {
return true;
} else {
return false;
}
}
}
With the regular expression above, how can I make ValidatePin return false when the pin contains a line break? The failing test pin was:
"1234\n".

Replace $ with \z. Without RegexOptions.Multiline, $ matches at the very end of the string but also just before a line feed at the end of the string, which is why "1234\n" passes. \z matches only at the very end of the string. Use
string pattern = @"\A([0-9]{4}|[0-9]{6})\z";
\A is also useful instead of ^ so as to always match the start of the string (if that's your intention).
Explanation
\A          the beginning of the string
(           group and capture to \1:
[0-9]{4}    any character of: '0' to '9' (4 times)
|           OR
[0-9]{6}    any character of: '0' to '9' (6 times)
)           end of \1
\z          the end of the string
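
To make the difference between $ and \z concrete, here is a minimal sketch that runs both patterns side by side. The class and method names (PinCheck, ValidateWithDollar, ValidateWithZ) are made up for this illustration and do not come from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PinCheck
{
    // Pattern from the question: $ also matches just before a final "\n".
    public static bool ValidateWithDollar(string pin) =>
        Regex.IsMatch(pin, "^([0-9]{4}|[0-9]{6})$");

    // Pattern from the answer: \z matches only at the very end of the string.
    public static bool ValidateWithZ(string pin) =>
        Regex.IsMatch(pin, @"\A([0-9]{4}|[0-9]{6})\z");
}
```

With the test input "1234\n", ValidateWithDollar returns true and ValidateWithZ returns false, while both accept "1234" and "123456".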

Related

Building a regular expression in C#

How can I check the following text in C# with a regex?
key_in-get { 43243225543543543 };
or
key_in-set { password123 : 34980430943834 };
I tried to build a regular expression, but I failed after a few hours.
Here is my code:
string text1 = "key_in-get { 322389238237 };";
string text2 = "key_in-set { password123 : 322389238237 };";
string pattern = "key_in-(get|set) { .* };";
var result1 = Regex.IsMatch(text1, pattern);
Console.Write("Is valid: {0} ", result1);
var result2 = Regex.IsMatch(text2, pattern);
Console.Write("Is valid: {0} ", result2);
I have to check if there is "set" or "get".
If the pattern finds "set", then it should accept only the form "text123 : 123456789", and if it finds "get", then it should accept only "123456789".

You can use
key_in-(?:get|(set)) {(?(1) \w+ :) \w+ };
key_in-(?:get|(set))\s*{(?(1)\s*\w+\s*:)\s*\w+\s*};
key_in-(?:get|(set))\s*{(?(1)\s*\w+\s*:)\s*\d+\s*};
See the regex demo. The first one expects exactly one space between the elements, the second one allows any amount of any whitespace between them, and the third one additionally allows only digits as the final value (after : for set, or as the whole value for get).
If the whole string must match, add ^ at the start and $ at the end of the pattern.
Details:
key_in- - a substring
(?:get|(set)) - get or set (the latter is captured into Group 1)
\s* - zero or more whitespaces
{ - a { char
(?(1)\s*\w+\s*:) - a conditional construct: if Group 1 matched, match one or more word chars enclosed with zero or more whitespaces and then a colon
\s*\w+\s* - one or more word chars enclosed with zero or more whitespaces
}; - a literal substring.
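
As a rough sketch of how the third pattern could be used from C#, anchored with ^ and $ so that the whole string must match, consider the following. The class name KeyInCheck and the method IsValid are hypothetical names chosen for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class KeyInCheck
{
    // Third pattern from the answer, with ^ and $ added.
    // (set) is Group 1; (?(1)...) requires "name :" only when "set" was matched.
    private static readonly Regex Pattern = new Regex(
        @"^key_in-(?:get|(set))\s*{(?(1)\s*\w+\s*:)\s*\d+\s*};$");

    public static bool IsValid(string text) => Pattern.IsMatch(text);
}
```

Here IsValid accepts "key_in-get { 322389238237 };" and "key_in-set { password123 : 322389238237 };", but rejects a get with a "name :" part and a set without one.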

In the pattern that you tried, key_in-(get|set) { .* };, you are matching either get or set followed by { and then, because .* is greedy, everything up to the last occurrence of };. That means it could also match key_in-get { }; };
As an alternative solution, you could use an alternation | specifying each of the accepted parts for the get and the set.
key_in-(?:get\s*{\s*\w+|set\s*{\s*\w+\s*:\s*\w+)\s*};
The pattern matches
key_in- Match literally
(?: Non capture group
get\s*{\s*\w+ Match get, { between optional whitespace chars and 1+ word chars
| Or
set\s*{\s*\w+\s*:\s*\w+ Match set, { between optional whitespace chars and word chars on either side with : in between.
) Close non capture group
\s*}; Match optional whitespace chars and };
Regex demo
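
The greedy-match pitfall and the alternation fix can be compared with a small sketch like the one below. The names KeyInAlternation, MatchesOriginal and MatchesAlternation are invented for illustration, and the ^ and $ anchors on the second pattern are an addition so that the whole string must match.

```csharp
using System;
using System.Text.RegularExpressions;

public static class KeyInAlternation
{
    // Pattern from the question: ".*" runs up to the last " };" in the input.
    public static bool MatchesOriginal(string text) =>
        Regex.IsMatch(text, "key_in-(get|set) { .* };");

    // Alternation from the answer, anchored (assumption) with ^ and $.
    public static bool MatchesAlternation(string text) =>
        Regex.IsMatch(text,
            @"^key_in-(?:get\s*{\s*\w+|set\s*{\s*\w+\s*:\s*\w+)\s*};$");
}
```

For the input "key_in-get { }; };", MatchesOriginal returns true while MatchesAlternation returns false.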

Get value between parentheses [duplicate]

This question already has answers here:
How do I extract text that lies between parentheses (round brackets)?
(19 answers)
Closed 4 years ago.
I need to get all the strings that sit between opening and closing parentheses. An example string is as follows
[CDATA[[(MyTag),xi(Tag2) ]OT(OurTag3).
The output needs to be an array with MyTag, Tag2, OurTag3, i.e. the strings need to have the parentheses removed.
The code below works but retains the parentheses. How do I adjust the regex pattern to remove the parentheses from the output?
string pattern = @"\(([^)]*)\)";
string MyString = "[CDATA[[(MyTag),xi(Tag2) ]OT(OurTag3)";
Regex re = new Regex(pattern);
foreach (Match match in re.Matches(MyString))
{
Console.WriteLine(match.Groups[1]); // print the captured group 1
}

You should be able to use the following:
(?<=\().+?(?=\))
(?<=\() - positive lookbehind for (
.+? - non-greedy match for the content (one or more characters)
(?=\)) - positive lookahead for )
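
A minimal sketch comparing the two approaches on the sample string: the capture-group pattern from the question (reading Groups[1]) and the lookaround pattern from this answer (reading the whole match). The class and method names are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class ParenExtract
{
    // Capture group: the parentheses are matched but only Group 1 is returned.
    public static string[] WithCaptureGroup(string input) =>
        Regex.Matches(input, @"\(([^)]*)\)")
             .Cast<Match>()
             .Select(m => m.Groups[1].Value)
             .ToArray();

    // Lookarounds: the parentheses are asserted but never part of the match.
    public static string[] WithLookarounds(string input) =>
        Regex.Matches(input, @"(?<=\().+?(?=\))")
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();
}
```

For "[CDATA[[(MyTag),xi(Tag2) ]OT(OurTag3)" both methods return MyTag, Tag2 and OurTag3.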

Match only the nth occurrence using a regular expression

I have a string with 3 dates in it like this:
XXXXX_20160207_20180208_XXXXXXX_20190408T160742_xxxxx
I want to select the 2nd date in the string, the 20180208 one.
Is there a way to do this purely in the regex, without having to resort to pulling out the second match in code? I'm using C# if that matters.
Thanks for any help.

You could use
^(?:[^_]+_){2}(\d+)
And take the first group, see a demo on regex101.com.
Broken down, this says
^ # start of the string
(?:[^_]+_){2} # not _ + _, twice
(\d+) # capture digits
C# demo:
var pattern = @"^(?:[^_]+_){2}(\d+)";
var text = "XXXXX_20160207_20180208_XXXXXXX_20190408T160742_xxxxx";
var result = Regex.Match(text, pattern)?.Groups[1].Value;
Console.WriteLine(result); // => 20180208

Try this one
MatchCollection matches = Regex.Matches(sInputLine, @"\d{8}");
string sSecond = matches[1].ToString();

You could use the regular expression
^(?:.*?\d{8}_){1}.*?(\d{8})
to save the 2nd date to capture group 1.
Demo
Naturally, for n > 2, replace {1} with {n-1} to obtain the nth date. To obtain the 1st date use
^(?:.*?\d{8}_){0}.*?(\d{8})
Demo
The .NET regex engine performs the following operations.
^ # match the beginning of the string
(?: # begin a non-capture group
.*? # match 0+ chars lazily
\d{8} # match 8 digits
_ # match '_'
) # end non-capture group
{n} # execute non-capture group n (n >= 0) times
.*? # match 0+ chars lazily
(\d{8}) # match 8 digits in capture group 1
The important thing to note is that .*?, because it is lazy, consumes as few characters as possible: it expands one character at a time until the part of the pattern that follows it can match at the current position. For example, in the string
_1234abcd_efghi_123456789_12345678_ABC
capture group 1 in (.*?)_\d{8}_ will contain "_1234abcd_efghi_123456789", because 123456789 has nine digits between the underscores and so cannot be matched by _\d{8}_.
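
The {n-1} substitution described above could be wrapped in a small helper along these lines. This is only a sketch: NthDate and TryFind are hypothetical names, and like the pattern itself it assumes every date before the requested one is followed by an underscore.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NthDate
{
    // Tries to find the n-th (1-based) run of 8 digits using
    // ^(?:.*?\d{8}_){n-1}.*?(\d{8}); returns false if there is none.
    public static bool TryFind(string text, int n, out string date)
    {
        if (n < 1) throw new ArgumentOutOfRangeException(nameof(n));
        string pattern = @"^(?:.*?\d{8}_){" + (n - 1) + @"}.*?(\d{8})";
        Match m = Regex.Match(text, pattern);
        date = m.Success ? m.Groups[1].Value : string.Empty;
        return m.Success;
    }
}
```

On the sample string from the question, n = 1, 2 and 3 yield 20160207, 20180208 and 20190408, and n = 4 reports no match.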

You can use System.Text.RegularExpressions.Regex
See the following example
Regex regex = new Regex(@"^(?:[^_]+_){2}(\d+)"); // Expression from Jan's answer; this just shows how to use it from C#
GroupCollection groups = regex.Match("XXXXX_20160207_20180208_XXXXXXX_20190408T160742_xxxxx").Groups;
if (groups.Count > 1)
{
Console.WriteLine(groups[1].Value);
}

Split string on whitespace ignoring parenthesis

I have a string such as this
(ed) (Karlsruhe Univ. (TH) (Germany, F.R.))
I need to split it into two such as this
ed
Karlsruhe Univ. (TH) (Germany, F.R.)
Basically, ignoring whitespace and parenthesis within a parenthesis
Is it possible to use a regex to achieve this?

If you can have more parentheses, it's better to use balancing groups:
string text = "(ed) (Karlsruhe Univ. (TH) (Germany, F.R.))";
var charSetOccurences = new Regex(@"\(((?:[^()]|(?<o>\()|(?<-o>\)))+(?(o)(?!)))\)");
var charSetMatches = charSetOccurences.Matches(text);
foreach (Match match in charSetMatches)
{
Console.WriteLine(match.Groups[1].Value);
}
ideone demo
Breakdown:
\(( # First '(' and begin capture
(?:
[^()] # Match all non-parens
|
(?<o> \( ) # Match '(', and capture into 'o'
|
(?<-o> \) ) # Match ')', and delete the 'o' capture
)+
(?(o)(?!)) # Fails if 'o' stack isn't empty
)\) # Close capture and match the final ')'

\((.*?)\)\s*\((.*)\)
This gives you the two values in capture groups 1 and 2.
Demo here: http://regex101.com/r/rP5kG2
A search and replace with the replacement \1\n\2 produces the two-line output, which seems to be exactly what you need.

string str = "(ed) (Karlsruhe Univ. (TH) (Germany, F.R.))";
Regex re = new Regex(@"\((.*?)\)\s*\((.*)\)");
Match match = re.Match(str);

In general, no.
Regular expressions in the formal sense cannot describe recursive (arbitrarily nested) patterns, since such patterns cannot be recognized by a finite automaton.
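
If a regex is not wanted at all, one alternative (not proposed in the answers here, shown only as a sketch) is to scan the string once and track the nesting depth. ParenSplitter and TopLevelGroups are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class ParenSplitter
{
    // Returns the contents of each top-level "(...)" group,
    // without the outer parentheses. Unbalanced input is not reported.
    public static List<string> TopLevelGroups(string text)
    {
        var groups = new List<string>();
        int depth = 0;   // current nesting depth
        int start = -1;  // index just after the opening '(' of the current group
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c == '(')
            {
                if (depth == 0) start = i + 1;
                depth++;
            }
            else if (c == ')' && depth > 0)
            {
                depth--;
                if (depth == 0) groups.Add(text.Substring(start, i - start));
            }
        }
        return groups;
    }
}
```

For "(ed) (Karlsruhe Univ. (TH) (Germany, F.R.))" this returns the two strings "ed" and "Karlsruhe Univ. (TH) (Germany, F.R.)".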

Validate a Boolean expression with brackets in C#

I want to validate a string in C# that contains a Boolean expression with brackets.
The string should only contain numbers 1-9, round brackets, "OR" , "AND".
Examples of good strings:
"1 AND 2"
"2 OR 4"
"4 AND (3 OR 5)"
"2"
And so on...
I am not sure if regular expressions are flexible enough for this task.
Is there a nice short way of achieving this in C#?

It's probably simpler to do this with a simple parser. But you can do this with .NET regex by using balancing groups and realizing that if the brackets are removed from the string you always have a string matched by a simple expression like ^\d+(?:\s+(?:AND|OR)\s+\d+)*\z.
So all you have to do is use balancing groups to make sure that the brackets are balanced (and are in the right place in the right form).
Rewriting the expression above a bit:
(?x)^
OPENING
\d+
CLOSING
(?:
\s+(?:AND|OR)\s+
OPENING
\d+
CLOSING
)*
BALANCED
\z
((?x) makes the regex engine ignore all whitespace and comments in the pattern, so it can be made more readable.)
Where OPENING matches any number (0 included) of opening brackets:
\s* (?: (?<open> \( ) \s* )*
CLOSING matches any number of closing brackets also making sure that the balancing group is balanced:
\s* (?: (?<-open> \) ) \s* )*
and BALANCED performs a balancing check, failing if there are more opening brackets than closing ones:
(?(open)(?!))
Giving the expression:
(?x)^
\s* (?: (?<open> \( ) \s* )*
\d+
\s* (?: (?<-open> \) ) \s* )*
(?:
\s+(?:AND|OR)\s+
\s* (?: (?<open> \( ) \s* )*
\d+
\s* (?: (?<-open> \) ) \s* )*
)*
(?(open)(?!))
\z
If you do not want to allow random spaces remove every \s*.
Example
See demo at IdeOne. Output:
matched: '2'
matched: '1 AND 2'
matched: '12 OR 234'
matched: '(1) AND (2)'
matched: '(((1)) AND (2))'
matched: '1 AND 2 AND 3'
matched: '1 AND (2 OR (3 AND 4))'
matched: '1 AND (2 OR 3) AND 4'
matched: ' ( 1 AND ( 2 OR ( 3 AND 4 ) ) )'
matched: '((1 AND 7) OR 6) AND ((2 AND 5) OR (3 AND 4))'
matched: '(1)'
matched: '(((1)))'
failed: '1 2'
failed: '1(2)'
failed: '(1)(2)'
failed: 'AND'
failed: '1 AND'
failed: '(1 AND 2'
failed: '1 AND 2)'
failed: '1 (AND) 2'
failed: '(1 AND 2))'
failed: '(1) AND 2)'
failed: '(1)() AND (2)'
failed: '((1 AND 7) OR 6) AND (2 AND 5) OR (3 AND 4))'
failed: '((1 AND 7) OR 6) AND ((2 AND 5 OR (3 AND 4))'
failed: ''
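
For reference, the full expression could be wrapped in C# roughly as follows. This is a sketch rather than the code behind the IdeOne demo; BoolExprRegex and IsValid are hypothetical names, and the indentation inside the pattern is harmless because (?x) makes the engine ignore it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BoolExprRegex
{
    // The expression from the answer, kept multi-line thanks to (?x).
    private static readonly Regex Pattern = new Regex(@"(?x)^
        \s* (?: (?<open> \( ) \s* )*
        \d+
        \s* (?: (?<-open> \) ) \s* )*
        (?:
            \s+(?:AND|OR)\s+
            \s* (?: (?<open> \( ) \s* )*
            \d+
            \s* (?: (?<-open> \) ) \s* )*
        )*
        (?(open)(?!))
        \z");

    public static bool IsValid(string expr) => Pattern.IsMatch(expr);
}
```

IsValid accepts inputs such as "1 AND (2 OR (3 AND 4))" and rejects unbalanced ones such as "(1 AND 2" or "1 AND 2)".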

If you consider a boolean expression as generated by a formal grammar, writing a parser is easier.
I made an open source library to interpret simple boolean expressions. You can take a look at it on GitHub, in particular look at the AstParser class and Lexer.

If you just want to validate the input string, you can write a simple parser.
Each method consumes a certain kind of input (digit, brackets, operator) and returns the remaining string after matching. An exception is thrown if no match can be made.
using System;
public class ParseException : Exception { }
public static class ExprValidator
{
public static bool Validate(string str)
{
try
{
string term = Term(str);
string stripTrailing = Whitespace(term);
return stripTrailing.Length == 0;
}
catch(ParseException) { return false; }
}
static string Term(string str)
{
//an empty string is not a term
if(str == string.Empty) throw new ParseException();
char current = str[0];
string rest;
if(current == '(')
{
string term = LBracket(str);
string rBracket = Term(term);
string temp = Whitespace(rBracket);
rest = RBracket(temp);
}
else if(Char.IsDigit(current))
{
rest = Digit(str);
}
else if(Char.IsWhiteSpace(current))
{
string temp = Whitespace(str);
return Term(temp);
}
else throw new ParseException();
try
{
//possibly match op term, after a digit or a bracketed term
string op = Op(rest);
return Term(op);
}
catch(ParseException) { return rest; }
}
static string Op(string str)
{
string t1 = Whitespace_(str);
string op = MatchOp(t1);
return Whitespace_(op);
}
static string MatchOp(string str)
{
if(str.StartsWith("AND")) return str.Substring(3);
else if(str.StartsWith("OR")) return str.Substring(2);
else throw new ParseException();
}
static string LBracket(string str)
{
return MatchChar('(')(str);
}
static string RBracket(string str)
{
return MatchChar(')')(str);
}
static string Digit(string str)
{
return MatchChar(Char.IsDigit)(str);
}
static string Whitespace(string str)
{
if(str == string.Empty) return str;
int i = 0;
while(i < str.Length && Char.IsWhiteSpace(str[i])) { i++; }
return str.Substring(i);
}
//match at least one whitespace character
static string Whitespace_(string str)
{
string stripFirst = MatchChar(Char.IsWhiteSpace)(str);
return Whitespace(stripFirst);
}
static Func<string, string> MatchChar(char c)
{
return MatchChar(chr => chr == c);
}
static Func<string, string> MatchChar(Func<char, bool> pred)
{
return input => {
if(input == string.Empty) throw new ParseException();
else if(pred(input[0])) return input.Substring(1);
else throw new ParseException();
};
}
}

Pretty simple:
In the first stage, determine the lexemes (digit, bracket, or operator) with simple string comparison.
In the second stage, keep a counter of currently open brackets (bracketPairs), updated for each lexeme as follows:
if the current lexeme is '(', then bracketPairs++;
if the current lexeme is ')', then bracketPairs--;
otherwise, do not modify bracketPairs.
At the end, if all lexemes are known, bracketPairs never dropped below zero, and bracketPairs == 0, then the input consists of known lexemes with balanced brackets. The order of operands and operators still has to be checked separately.
The task is a bit more complex if it's necessary to build an AST.
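
The two stages described above can be sketched in a single pass as shown below. BracketCounter and LexemesAndBracketsOk are hypothetical names, and the check is deliberately limited to what the answer describes: known lexemes and balanced brackets, with no grammar check.

```csharp
using System;

public static class BracketCounter
{
    // True if the input contains only digits, AND, OR, brackets and
    // whitespace, and the brackets are balanced. Operand/operator order
    // is NOT checked, so "1 2" still passes.
    public static bool LexemesAndBracketsOk(string input)
    {
        int bracketPairs = 0;
        int i = 0;
        while (i < input.Length)
        {
            char c = input[i];
            string rest = input.Substring(i);
            if (char.IsWhiteSpace(c) || char.IsDigit(c)) { i++; }
            else if (c == '(') { bracketPairs++; i++; }
            else if (c == ')')
            {
                bracketPairs--;
                if (bracketPairs < 0) return false; // ')' before its '('
                i++;
            }
            else if (rest.StartsWith("AND", StringComparison.Ordinal)) { i += 3; }
            else if (rest.StartsWith("OR", StringComparison.Ordinal)) { i += 2; }
            else return false; // unknown lexeme
        }
        return bracketPairs == 0;
    }
}
```

This accepts "4 AND (3 OR 5)" and rejects "(1 AND 2", "1 AND 2)" and "1 XOR 2", but it also accepts "1 2", which is why a grammar check or one of the parsers above is still needed for full validation.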

What you want are "balancing groups". With them you can capture all the bracket pairs, and then you just need some simple string parsing.
http://blog.stevenlevithan.com/archives/balancing-groups
http://msdn.microsoft.com/en-us/library/bs2twtah.aspx#balancing_group_definition

ANTLR Parser Generator?
a short way of achieving this in C#
Although it may be overkill if it's just numbers and OR + AND
