How to check for nested square brackets? - c#

I have the input string below:
[text1][text2][text3]...[textN]
and I want to apply the following validation rule using regular expression:
The ] and [ cannot be included in other [].
For example, the next input strings are not correct:
[test1][test2[][test3]
[test1][test2]][test3]
[test1][test2[lol][test3]
[test1][test2]lol][test3]
I need to validate the input string because I am going to split it on [] groups (again using regular expression).

If you really want a regexp, here is a quick one:
^(\[[^\[\]]+\])*$
It works on your examples.
The principle is that each bracket pair, \[...\], may only contain text that does NOT include a bracket: [^\[\]]+
If you need an input with an empty group, such as [test1][test2][][test3], to be accepted, replace the + with a * so that the empty string also matches.
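As a minimal sketch of how the pattern above could be used from C# (the class and method names are illustrative and not part of the original answer), the check can be wrapped in a small helper:

```csharp
using System.Text.RegularExpressions;

public static class BracketValidator
{
    // Pattern from the answer above: zero or more [...] groups, each holding
    // one or more characters that are not square brackets.
    private static readonly Regex Flat = new Regex(@"^(\[[^\[\]]+\])*$");

    // Returns true when the input consists only of non-nested, non-empty [] groups.
    public static bool IsFlat(string input)
    {
        return Flat.IsMatch(input);
    }
}
```

With this helper, BracketValidator.IsFlat("[text1][text2][text3]") returns true, while each of the four incorrect inputs from the question returns false. The empty string also validates, because the group is allowed to repeat zero times.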

This should do the trick:
^(\[\w*\])*$
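It means:
^ start of string
\[ a literal [
\w* zero or more word characters (\w matches letters, digits and the underscore; in .NET this includes Unicode letters and digits unless RegexOptions.ECMAScript is used)
\] a literal ]
* the whole group repeated zero or more times
$ end of string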
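Since the goal is to split the validated string into its [] groups, .NET can do both steps with a single match: a repeated capture group keeps every value it matched in Group.Captures. The sketch below assumes a variation of the pattern above, with an inner capture group around \w*; the names BracketGroups and TrySplit are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BracketGroups
{
    // Variation of the pattern above: the outer group is non-capturing and
    // the inner group (1) captures the text inside each [] pair.
    private static readonly Regex Pattern = new Regex(@"^(?:\[(\w*)\])*$");

    // Validates the input and, on success, returns the content of every [] group.
    public static bool TrySplit(string input, out List<string> parts)
    {
        parts = new List<string>();
        Match match = Pattern.Match(input);
        if (!match.Success)
            return false;

        // A repeated group stores one Capture per iteration.
        foreach (Capture capture in match.Groups[1].Captures)
            parts.Add(capture.Value);
        return true;
    }
}
```

For "[text1][text2][text3]" this yields the three parts text1, text2 and text3. Because \w does not match spaces or punctuation, a group such as [te st] makes the whole input invalid; the [^\[\]] class from the previous answer is the less strict choice.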

Related

Remove some specific string with special character

Input String
string b = "14-03-002980 AND 14-03- [ ] (5)Description of 002981";
I want the output string to be:
14-03-002980 AND 14-03-002981
I tried the regex below, but it does not work:
Regex.Replace(b, "[#&'(\\s)<>(5)Description of ]","");
Please help me if anyone knows how to do this.
You can use this regex,
\s+\[.*(?=\b\d+)
and replace the match with an empty string.
The pattern starts with one or more whitespace characters (\s+), then matches a literal [ with \[. After that, .* greedily consumes the rest of the line and backtracks until the positive lookahead (?=\b\d+) finds a number starting at the current position.
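Applied to the input from the question, the replacement could look like the following sketch (the BracketCleaner class and its method name are made up for illustration):

```csharp
using System.Text.RegularExpressions;

public static class BracketCleaner
{
    // Removes everything from the whitespace before the first "[" up to,
    // but not including, the last number that starts at a word boundary.
    public static string RemoveBracketedNoise(string input)
    {
        return Regex.Replace(input, @"\s+\[.*(?=\b\d+)", "");
    }
}
```

Calling BracketCleaner.RemoveBracketedNoise("14-03-002980 AND 14-03- [ ] (5)Description of 002981") returns "14-03-002980 AND 14-03-002981". Because .* is greedy, the match extends to the last number on the line, so a line with several numbers after the bracket loses everything before the final one.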

Regular expression in RegularExpressionAttribute behavior

I am using this regular expression: @"[ \]\[;\/\\\?:*""<>|+=]|^[.]|[.]$"
First part [ \]\[;\/\\\?:*""<>|+=] should match any of the characters inside the brackets.
Next part ^[.] should match if the string starts with a 'dot'
Last part [.]$ should match if the string ends with a 'dot'
This works perfectly fine if I use Regex.IsMatch() function. However if I use RegularExpressionAttribute in ASP.NET MVC, I always get invalid model. Does anyone have any clue why this behavior occurs?
Examples:
"abcdefg" should not match
".abcdefg" should match
"abc.defg" should not match
"abcdefg." should match
"abc[defg" should match
Thanks in advance!
EDIT:
The RegularExpressionAttribute "specifies that a data field value in ASP.NET Dynamic Data must match the specified regular expression."
This means I need "abcdef" to match and ".abcdefg" to not match. Basically, I need to negate the whole expression I have above.
You need to make sure the pattern matches the entire string.
In a general case, you may append/prepend the pattern with .*.
Here, you may use
.*[ \][;/\\?:*"<>|+=].*|^[.].*|.*[.]$
Or, to make it a bit more efficient (that is, to reduce backtracking in the first branch) a negated character class will perform better:
[^ \][;/\\?:*"<>|+=]*[ \][;\/\\?:*"<>|+=].*|^[.].*|.*[.]$
But it is best to put the branches matching text at the start/end of the string as first branches:
^[.].*|.*[.]$|[^ \][;/\\?:*"<>|+=]*[ \][;/\\?:*"<>|+=].*
NOTE: You do not have to escape / and ? chars inside the .NET regex since you can't use regex delimiters there.
C# declaration of the last pattern will look like
@"^[.].*|.*[.]$|[^ \][;/\\?:*""<>|+=]*[ \][;/\\?:*""<>|+=].*"
See this .NET regex demo.
RegularExpressionAttribute:
[RegularExpression(
    @"^[.].*|.*[.]$|[^ \][;/\\?:*""<>|+=]*[ \][;/\\?:*""<>|+=].*",
    ErrorMessage = "Username cannot contain following characters: ] [ ; / \\ ? : * \" < > | + =")
]
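The difference between Regex.IsMatch() and the attribute can be reproduced outside ASP.NET. The sketch below assumes that the attribute accepts a value only when the first match starts at index 0 and spans the entire string, which is consistent with the behavior described above; FullMatchDemo and MatchesWholeString are hypothetical names, not framework APIs.

```csharp
using System.Text.RegularExpressions;

public static class FullMatchDemo
{
    // The pattern from the question: matches a single offending character.
    public const string Original = @"[ \]\[;\/\\\?:*""<>|+=]|^[.]|[.]$";

    // The expanded pattern from the answer: matches the whole offending string.
    public const string Expanded =
        @"^[.].*|.*[.]$|[^ \][;/\\?:*""<>|+=]*[ \][;/\\?:*""<>|+=].*";

    // Assumed equivalent of the attribute's check: the first match must
    // cover the complete value, not just a part of it.
    public static bool MatchesWholeString(string pattern, string value)
    {
        Match m = Regex.Match(value, pattern);
        return m.Success && m.Index == 0 && m.Length == value.Length;
    }
}
```

For ".abcdefg", Regex.IsMatch(".abcdefg", FullMatchDemo.Original) is true, but MatchesWholeString(FullMatchDemo.Original, ".abcdefg") is false because only the leading dot is matched. With FullMatchDemo.Expanded the whole-string check returns true for ".abcdefg", "abcdefg." and "abc[defg", and false for "abcdefg" and "abc.defg".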
Your regex is an alternation of three branches, each matching a single character: the first is a character class with several characters, the second is a dot at the start of the string, and the third is a dot at the end of the string.
Regex.IsMatch() succeeds because one of the branches matches somewhere in the input, but that match does not cover the whole string you want to match.
You could use three alternatives instead: the first matches a dot followed by the character class repeated until the end of the string; the second is the other way around, with the dot at the end of the string.
The third uses a positive lookahead to assert that the string contains at least one of the characters [\][;\/\\?:*"<>|+=]
^\.[a-z \][;\/\\?:*"<>|+=]+$|^[a-z \][;\/\\?:*"<>|+=]+\.$|^(?=.*[\][;\/\\?:*"<>|+=])[a-z \][;\/\\?:*"<>|+=]+$

Capture data from string not containing duplicate group of characters and strings

I am trying to verify and extract data coming from an API. I need to extract the text between [] brackets, which can be anywhere in the data, e.g.
This is [extract] message
This is message [extract]
[extract] this message
The regular expression I was using for this, shown below, was working fine:
^[^\]\[]*?\[(?<description>[^\]\[]+)\][^\]\[]*?$
Now the data from the API can be URL-encoded (percent-encoded) and contain %5B instead of [ and %5D instead of ].
I updated the regular expression as follows:
^[^\]\[%5B%5D]*?(\[|%5B)(?<description>[^\]\[%5B%5D]+)(\]|%5D)[^\]\[%5B%5D]*?$/i
But it does not treat %5B and %5D as single atoms, and therefore it is not able to extract text from the following valid data:
This is [extract] message %
This is message 5 [extract]
[extract d] this message
It is also able to extract text from the following invalid data:
[extract %5D this message
%5B extract ] this message
How can I treat %5B and %5D as atoms and correct above regex?
First of all, your first regex should be written as
^[^][]*\[(?<description>[^][]+)][^][]*$
Note that there is no point in escaping [ inside a character class, and there is no need to escape ] when it is the first character in a character class, nor the ] outside a character class. Also, there is no need for the lazy quantifier *?; * works equally well here.
Now, you should decode the string to plain text and then run the above regex. If you do not want to do that, you will have to use a more complex regex based on a tempered greedy token, like
^(?:(?!%5[DB])[^][])*(?:%5B|\[)(?<description>(?:(?!%5[DB])[^][])+)(?:]|%5D)(?:(?!%5[DB])[^][])*$
See the regex demo (additional patterns are added since it is a multiline demo).
Regex explanation:
^ - string start
(?:(?!%5[DB])[^][])* - a tempered greedy token matching any 0+ symbols other than ] and [ (see [^][]) that is not the starting char for a %5B or %5D char sequence
(?:%5B|\[) - the leading delimiter, a %5B or [
(?<description>(?:(?!%5[DB])[^][])+) - The "description" group matching 1+ symbols other than ] and [ that is not the starting char for a %5B or %5D char sequence (NOTE: you might want to replace it with the (?<description>(?s:.+?)) subpattern to check if that works better for you).
(?:]|%5D) - trailing delimiter, ] or %5D
(?:(?!%5[DB])[^][])* - see above (2nd line)
$ - end of string.
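Both options described above can be sketched in C#. The first method decodes the input with Uri.UnescapeDataString and then applies the simple pattern; the second applies the tempered greedy token pattern to the raw input. The class and method names are illustrative, and the choice of Uri.UnescapeDataString as the decoder is an assumption, since the answer does not name one.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BracketExtractor
{
    // Simple pattern for already decoded text.
    private static readonly Regex Plain =
        new Regex(@"^[^][]*\[(?<description>[^][]+)][^][]*$");

    // Tempered greedy token pattern that treats %5B and %5D as atoms.
    private static readonly Regex Tempered = new Regex(
        @"^(?:(?!%5[DB])[^][])*(?:%5B|\[)(?<description>(?:(?!%5[DB])[^][])+)(?:]|%5D)(?:(?!%5[DB])[^][])*$");

    // Option 1: decode first, then match. Returns null when nothing matches.
    public static string ExtractAfterDecoding(string input)
    {
        Match m = Plain.Match(Uri.UnescapeDataString(input));
        return m.Success ? m.Groups["description"].Value : null;
    }

    // Option 2: match the raw, possibly encoded, input directly.
    public static string ExtractTempered(string input)
    {
        Match m = Tempered.Match(input);
        return m.Success ? m.Groups["description"].Value : null;
    }
}
```

Both methods return "extract" for "This is %5Bextract%5D message", and ExtractTempered also returns "extract" for "This is [extract] message %" and "extract d" for "[extract d] this message". As written, the tempered pattern only recognizes the uppercase forms %5B and %5D.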

Using an escape character with a beginning wildcard in regex in c#

Below is a sample of an email I am using from a database:
2.2|[johnnyappleseed@example.com]
Every line is different, and it may or may not be an email, but it will always be enclosed in brackets. I am trying to use regular expressions to get the information inside the brackets. Below is what I have been trying to use:
^\[\]$
Unfortunately, every time I try to use it, the expression isn't matching. I think the problem is using the escape characters, but I am not sure. If this is not how I use the escape characters with this, or if I am wrong completely, please let me know what the actual regex should be.
Close to yours is ^.*\[(.*)\]$:
^ start of the line
.* anything
\[ a bracket, indicating the start of the email
(.*) anything (the email), as a capturing group
\] a square bracket, indicating the end of the email
$ end of the line
Note that your Regex is missing the .* parts to match the things between the key characters [ and ].
Your regex - ^\[\]$ - matches a single string/line that only contains [], and you need to obtain a substring in between the square brackets somewhere further inside a larger string.
You can use
var rx = new Regex(@"(?<=\[)[^]]+");
Console.WriteLine(rx.Match(s).Value);
See regex demo
With (?<=\[) we find the position after [ and then we match every character that is not ] with [^]]+.
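The two regex approaches from these answers, the lookbehind and the whole-line pattern with a capturing group, can be compared side by side. This is only a sketch; the class and method names are invented for illustration.

```csharp
using System.Text.RegularExpressions;

public static class BracketContent
{
    // Lookbehind: start right after "[" and take everything up to "]".
    private static readonly Regex Lookbehind = new Regex(@"(?<=\[)[^]]+");

    // Whole-line pattern: the text between the brackets is capture group 1.
    private static readonly Regex WholeLine = new Regex(@"^.*\[(.*)\]$");

    // Returns the bracketed text, or null when the line has none.
    public static string ViaLookbehind(string line)
    {
        Match m = Lookbehind.Match(line);
        return m.Success ? m.Value : null;
    }

    // Same result via the capturing group; requires "]" to end the line.
    public static string ViaCaptureGroup(string line)
    {
        Match m = WholeLine.Match(line);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

For the sample line "2.2|[johnnyappleseed@example.com]", both methods return "johnnyappleseed@example.com". The lookbehind version also works when text follows the closing bracket, whereas the whole-line version returns null in that case.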
Another, non-regex way:
var s = "2.2|[johnnyappleseed@example.com]";
var ss = s.Split('|');
if (ss.GetLength(0) > 1)
{
    var last = ss[ss.GetLength(0)-1];
    if (last.Contains("[") && last.Contains("@")) // We assume there is an email
        Console.WriteLine(last.Trim(new[] {'[', ']'}));
}
See IDEONE demo of both approaches

Fetch values between two [] from a string using Regular expressions

I have a string like the following:
"channel_changes":[[1313571300,26.879846,true],[1313571360,26.901025,true]]
I want to extract the contents of each pair of inner square brackets, such as 1313571300, 26.879846, true,
using a regular expression.
I have tried using
string regexPattern = @"\[(.*?)\]";
but that gives the first string as [[1313571420,26.901025,true]
i.e. with one extra square bracket.
How can I achieve this?
This seemed to work in Expresso for me:
\[([\w,\.]*?)\]
Literal [
[1]: A numbered capture group. [[\w,.]*?]
- Any character in this class: [\w,.], any number of repetitions, as few as possible
Literal ]
The problem seemed to be the "." in your regex - since it was picking up the first literal "[" and considering the following "[" in your input to be valid as the next character.
I constrained it to just alphanumeric characters, commas and literal full-stops (period mark), since that's all that was present in your example. You could go further and really specify the format of the data inside those inner square brackets assuming it's consistent, and end up with something more like this:
\[[0-9.]+,[0-9.]+,(true|false)\]
Example C# code:
var matches = Regex.Matches("\"channel_changes\":[[1313571300,26.879846,true],[1313571360,26.901025,true]]", @"\[([\w,\.]*?)\]");
foreach (Match match in matches)
{
    // Group 1 holds the text between the brackets
    Console.WriteLine(match.Groups[1].Value);
}
Try this:
@"\[+([^\]]+)\]+"
[^\]]+ means one or more characters other than a right square bracket.
Try this
\[([^\[\]]*)\]
See it here online on Regexr
[^\[\]]* is a negated character class, means match any character but [ and ]. With this construct you don't need the ? to make your * ungreedy.
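As a rough illustration of the negated character class applied to the string from the question, the sketch below collects the content of each inner bracket pair and splits it on commas. The InnerBrackets class and the comma split are additions for illustration, not part of the answer.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class InnerBrackets
{
    // A "[", then any run of characters that are not brackets, then "]".
    // The outer "[" cannot match because it is followed by another "[".
    private static readonly Regex Inner = new Regex(@"\[([^\[\]]*)\]");

    // Returns one string array per inner bracket pair, split on commas.
    public static List<string[]> Extract(string input)
    {
        var rows = new List<string[]>();
        foreach (Match match in Inner.Matches(input))
            rows.Add(match.Groups[1].Value.Split(','));
        return rows;
    }
}
```

For the input "channel_changes":[[1313571300,26.879846,true],[1313571360,26.901025,true]] this produces two rows, the first being 1313571300, 26.879846 and true. For real JSON data, a JSON parser is usually a more robust choice than a regex.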
