Fetch values between two [] from a string using Regular expressions - c#

I have a string like as folows :
"channel_changes":[[1313571300,26.879846,true],[1313571360,26.901025,true]]
I want to extract each string in angular brace like 1313571300, 26.879846, true
through regular expression.
I have tried using
string regexPattern = #"\[(.*?)\]";
but that gives the first string as [[1313571420,26.901025,true]
i.e with one extra angular brace.
Please help me how can I achieve this.

This seemed to work in Expresso for me:
\[([\w,\.]*?)\]
Literal [
[1]: A numbered capture group. [[\w,.]*?]
- Any character in this class: [\w,.], any number of repetitions, as few as possible
Literal ]
The problem seemed to be the "." in your regex - since it was picking up the first literal "[" and considering the following "[" in your input to be valid as the next character.
I constrained it to just alphanumeric characters, commas and literal full-stops (period mark), since that's all that was present in your example. You could go further and really specify the format of the data inside those inner square brackets assuming it's consistent, and end up with something more like this:
\[[0-9.]+,[0-9.]+,(true|false)\]
Example C# code:
var matches = Regex.Matches("\"channel_changes\":[[1313571300,26.879846,true],[1313571360,26.901025,true]]", #"\[([\w,\.]*?)\]");
foreach (var match in matches)
{
Console.WriteLine(match);
}

Try this:
#"\[+([^\]]+)\]+"
"[^]]+" - it means any character except right square bracket

Try this
\[([^\[\]]*)\]
See it here online on Regexr
[^\[\]]* is a negated character class, means match any character but [ and ]. With this construct you don't need the ? to make your * ungreedy.

Related

Regular expression in RegularExpressionAttribute behavior

I am using this regular expression: #"[ \]\[;\/\\\?:*""<>|+=]|^[.]|[.]$"
First part [ \]\[;\/\\\?:*""<>|+=] should match any of the characters inside the brackets.
Next part ^[.] should match if the string starts with a 'dot'
Last part [.]$ should match if the string ends with a 'dot'
This works perfectly fine if I use Regex.IsMatch() function. However if I use RegularExpressionAttribute in ASP.NET MVC, I always get invalid model. Does anyone have any clue why this behavior occurs?
Examples:
"abcdefg" should not match
".abcdefg" should match
"abc.defg" should not match
"abcdefg." should match
"abc[defg" should match
Thanks in advance!
EDIT:
The RegularExpressionAttribute Specifies that a data field value in ASP.NET Dynamic Data must match the specified regular expression..
Which means. I need the "abcdef" to match, and ".abcdefg" to not match. Basically negate the whole expression I have above.
You need to make sure the pattern matches the entire string.
In a general case, you may append/prepend the pattern with .*.
Here, you may use
.*[ \][;/\\?:*"<>|+=].*|^[.].*|.*[.]$
Or, to make it a bit more efficient (that is, to reduce backtracking in the first branch) a negated character class will perform better:
[^ \][;/\\?:*"<>|+=]*[ \][;\/\\?:*"<>|+=].*|^[.].*|.*[.]$
But it is best to put the branches matching text at the start/end of the string as first branches:
^[.].*|.*[.]$|[^ \][;/\\?:*"<>|+=]*[ \][;/\\?:*"<>|+=].*
NOTE: You do not have to escape / and ? chars inside the .NET regex since you can't use regex delimiters there.
C# declaration of the last pattern will look like
#"^[.].*|.*[.]$|[^ \][;/\\?:*""<>|+=]*[ \][;/\\?:*""<>|+=].*"
See this .NET regex demo.
RegularExpressionAttrubute:
[RegularExpression(
#"^[.].*|.*[.]$|[^ \][;/\\?:*""<>|+=]*[ \][;/\\?:*""<>|+=].*",
ErrorMessage = "Username cannot contain following characters: ] [ ; / \\ ? : * \" < > | + =")
]
Your regex is an alternation which matches 1 character out of 3 character classes, the first consisting of more than 1 characters, the second a dot at the start of the string and the third a dot at the end of the string.
It works fine because it does match one of the alternations, only not the whole string you want to match.
You could use 3 alternations where the first matches a dot followed by repeating the character class until the end of the string, the second the other way around but this time the dot is at the end of the string.
Or the third using a positive lookahead asserting that the string contains at least one of the characters [\][;\/\\?:*"<>|+=]
^\.[a-z \][;\/\\?:*"<>|+=]+$|^[a-z \][;\/\\?:*"<>|+=]+\.$|^(?=.*[\][;\/\\?:*"<>|+=])[a-z \][;\/\\?:*"<>|+=]+$
Regex demo

Using an escape character with a beginning wildcard in regex in c#

Below is a sample of an email I am using from a database:
2.2|[johnnyappleseed#example.com]
Every line is different, and it may or may not be an email, but it will always. I am trying to use regular expressions to get the information inside the brackets. Below is what I have been trying to use:
^\[\]$
Unfortunately, every time I try to use it, the expression isn't matching. I think the problem is using the escape characters, but I am not sure. If this is not how I use the escape characters with this, or if I am wrong completely, please let me know what the actual regex should be.
Close to yours is ^.*\[(.*)\]$:
^ start of the line
.* anything
\[ a bracket, indicating the start of the email
(.*) anything (the email), as a capturing group
\] a square bracked, indicating the end of the email
$ end of the line
Note that your Regex is missing the .* parts to match the things between the key characters [ and ].
Your regex - ^\[\]$ - matches a single string/line that only contains [], and you need to obtain a substring inbetween the square brackets somewhere further inside a larger string.
You can use
var rx = new Regex(#"(?<=\[)[^]]+");
Console.WriteLine(rx.Match(s).Value);
See regex demo
With (?<=\[) we find the position after [ and then we match every character that is not ] with [^]]+.
Another, non-regex way:
var s = "2.2|[johnnyappleseed#example.com]";
var ss = s.Split('|');
if (ss.GetLength(0) > 1)
{
var last = ss[ss.GetLength(0)-1];
if (last.Contains("[") && last.Contains("#")) // We assume there is an email
Console.WriteLine(last.Trim(new[] {'[', ']'}));
}
See IDEONE demo of both approaches

c# Regular expression for words in brackets with separator

I need to parse a text and check if between all squared brackets is a - and before and after the - must be at least one character.
I tried the following code, but it doesn't work. The matchcount is to large.
Regex regex = new Regex(#"[\.*-.*]");
MatchCollection matches = regex.Matches(textBox.Text);
SampleText:
Node
(Entity [1-5])
Figured I might as well provide an answer... To reiterate my points (with modifications):
* matches 0 or more occurences. You want + probably.
square brackets are special characters and will need to be escaped. They are used to define sets of characters.
You will probably want to exclude [ and ] from your "any character" matching
Put this all together and the following should do you better:
Regex regex = new Regex(#"\[[^-[\]]+-[^[\]]+\]");
Although its a little messy the key thing is that [^[\]] means any character except a square bracket. [^-[\]] means that but also disallows -. This is an optimisation and not required but it just reduces the work the regular expression engine has to do when working out the match. Thanks to ridgerunner for pointing out this optimisation.
Square brackets mean something special in Regexes, you'll need to escape them. Additionally, if you want at least one character then you need to use + rather than *.
Regex regex = new Regex(#"\[.+-.+\]");
MatchCollection matches = regex.Matches(textBox.Text);
string txt = "(Entity [1-5])";
Regex reg = new Regex(#"\[.+\-.+\]");
if it is for #:
string txt = "(Entity [1-5])";
Regex reg = new Regex(#"\[\d+\-\d+\]");

Problem with regex, how do I get all with \S up until a special character?

Ive got the text:
192.168.20.31 Url=/flash/56553550_hi.mp4?token=(uniquePlayerReference=81781956||videoId=1)
And im trying to get the uniquePlayerReference and the videoId
Ive tried this regular expression:
(?<=uniquePlayerReference=)\S*
but it matches:
81781956||videoId=1)
And then I try and get the video id with this:
(?<=videoId=)\S*
But it matches the ) after the videoId.
My question is two fold:
1) How do I use the \S character and get it to stop at a character? (essentially what is the regex to do what i want) I cant get it to stop at a defined character, I think I need to use a positive lookahead to match but not include the double pipe).
2) When should I use brackets?
The problem is the mul;tiplicity operator you have here - the * - which means "as many as possible". If you have an explicit number in mind you can use the operator {a,b} where a is a minimum and b a maximum number fo matches, but if you have an unknown number, you can't use \S (which is too generic).
As for brackets, if you mean () you use them to capture a part of a match for backreferencing. Bit complicated, think you need to use a reference for that.
I think you want something like this:
/uniquePlayerReference=(\d+)||videoId=(\d+)/i
and then backreference to \1 and \2 respectively.
Given that both id's are numeric you are probably better off using \d instead of \S. \d only matches numeric digits whereas \S matches any non-whitespace character.
What you might also do is a non gready match up till the character you do not want to match like so:
uniquePlayerReference=(.*?)\|\|videoId=(.*?)\)
Note that I have escaped both the | and ) characters because otherwise they would have a special meaning inside a regex.
In C# you would use this like so: (which also answers your question what the brackets are for, they are meant to capture parts of the matched result).
Regex regex = new Regex(#"uniquePlayerReference=(.*?)\|\|videoId=(.*?)\)");
Match match = regex.Match(
"192.168.20.31 Url=/flash/56553550_hi.mp4?token=(uniquePlayerReference=81781956||videoId=1)");
if (match.Success)
{
string playerReference = match.Groups[1].Value;
string videoId = match.Groups[2].Value;
// Etc.
}
If the ID isn't just digits then you could use [^|] instead of \S, i.e.
(?<=uniquePlayerReference=)[^|]*
Then you can use
(?<=videoId=)[^)]*
For the video ID
The \S means it matches any non-whitespace character, including the closing parenthesis. So if you had to use \S, you would have to explicitly say stop at the closing parenthesis, like this:
videoId=(\S+)\)
Therefore, you are better off using the \d, since what you are looking for are numeric:
uniquePlayerReference=(\d+)
videoId=(\d+)

Regex battle between maximum and minimum munge

Greetings, I have file with the following strings:
string.Format("{0},{1}", "Having \"Two\" On The Same Line".Localize(), "Is Tricky For regex".Localize());
my goal is to get a match set with the two strings:
Having \"Two\" On The Same Line
Is Tricky For regex
My current regex looks like this:
private Regex CSharpShortRegex = new Regex("\"(?<constant>[^\"]+?)\".Localize\\(\\)");
My problem is with the escaped quotes in the first line I end up stopping at the quote and I get:
On The Same Line
Is Tricky For This Style Too
however attempting to ignore the escaped quotes is not working out because it makes the Regex greedy and I get
Having \"Two\" On The Same Line".Localize(), "Is Tricky For regex"
We seem to be caught between maximum and minimum munge. Is there any hope? I have some backup plans. Can you Regex backwards? that would make it easier because I can start with the "()ezilacoL."
EDIT:
To clarify. This is my lone edge case. Most of the time the string sits alone like:
var myString = "Hot Patootie".Localize()
This one works for me:
\"((?:[^\\"]|(?:\\\"))*)\"\.Localize\(\)
Tested on http://www.regexplanet.com/simple/index.html against a number of strings with various escaped quotes.
Looks like most of us who answered this one had the same rough idea, so let me explain the approach (comments after #s):
\" # We're looking for a string delimited by quotation marks
( # Capture the contents of the quotation marks
(?: # Start a non-capturing group
[^\\"] # Either read a character that isn't a quote or a slash
|(?:\\\") # Or read in a slash followed by a quote.
)* # Keep reading
) # End the capturing group
\" # The string literal ends in a quotation mark
\.Localize\(\) # and ends with the literal '.Localize()', escaping ., ( and )
For C# you'll need to escape the slashes twice (messy):
\"((?:[^\\\\\"]|(?:\\\\\"))*)\"\\.Localize\\(\\)
Mark correctly points out that this one doesn't match escaped characters other than quotation marks. So here's a better version:
\"((?:[^\\"]|(?:\\")|(?:\\.))*)\"\.Localize\(\)
And its slashed-up equivalent:
\"((?:[^\\\\\"]|(?:\\\\\")|(?:\\\\.))*)\"\\.Localize\\(\\)
Works the same way, except it has a special case that if encounters a slash but it can't match \", it just consumes the slash and the following character and moves on.
Thinking about it, it's better to just consume two characters at every slash, which is effectively Mark's answer so I won't repeat it.
Here's the regular expression you need:
#"""(?<constant>(\\.|[^""])*)""\.Localize\(\)"
A test program:
using System;
using System.Text.RegularExpressions;
using System.IO;
class Program
{
static void Main()
{
Regex CSharpShortRegex =
new Regex(#"""(?<constant>(\\.|[^""])*)""\.Localize\(\)");
foreach (string line in File.ReadAllLines("input.txt"))
foreach (Match match in CSharpShortRegex.Matches(line))
Console.WriteLine(match.Groups["constant"].Value);
}
}
Output:
Having \"Two\" On The Same Line
Is Tricky For regex
Hot Patootie
Notice that I have used #"..." to avoid having to escape backslashes inside the regular expression. I think this makes it easier to read.
Update:
My original answer (below the horizontal rule) has a bug: regular-expression matchers attempt alternatives in left-to-right order. Having [^"] as the first alternative allows it to consume the backslash, but then the next character to be matched is a quote, which prevents the match from proceeding.
Incompatibility note: Given the pattern below, perl backtracks to the other alternative (the escaped quote) and successfully finds a match for the Having \"Two\" On The Same Line case.
The fix is to try an escaped quote first and then a non-quote:
var CSharpShortRegex =
new Regex("\"(?<constant>(\\\\\"|[^\"])*)\"\\.Localize\\(\\)");
or if you prefer the at-string form:
var CSharpShortRegex =
new Regex(#"""(?<constant>(\\""|[^""])*)""\.Localize\(\)");
Allow for escapes:
private Regex CSharpShortRegex =
new Regex("\"(?<constant>([^\"]|\\\\\")*)\"\\.Localize\\(\\)");
Applying one level of escaping to make the pattern easier to read, we get
"(?<constant>([^"]|\\")*)"\.Localize\(\)
That is, a string starts and ends with " characters, and everything between is either a non-quote or an escaped quote.
Looks like you're trying to parse code so one approach might be to evaluate the code on the fly:
var cr = new CSharpCodeProvider().CompileAssemblyFromSource(
new CompilerParameters { GenerateInMemory = true },
"class x { public static string e() { return " + input + "}}");
var result = cr.CompiledAssembly.GetType("x")
.GetMethod("e").Invoke(null, null) as string;
This way you could handle all kinds of other special cases (e.g. concatenated or verbatim strings) that would be extremely difficult to handle with regex.
new Regex(#"((([^#]|^|\n)""(?<constant>((\\.)|[^""])*)"")|(#""(?<constant>(""""|[^""])*)""))\s*\.\s*Localize\s*\(\s*\)", RegexOptions.Compiled);
takes care of both simple and #"" strings. It also takes into account escape sequences.

Categories