Regex to find anything after ']' and before '[' - c#

I have a regex working to find anything between the square brackets in a text file, which is this:
Regex squareBrackets = new Regex(#"\[(.*?)\]");
And I want to create a regex that is basically the opposite way round to select whatever is after what's in the square brackets. So I thought just swap them round?
Regex textAfterTitles = new Regex(#"\](.*?)\[");
But this does not work and Regex's confuse me - can anyone help?
Cheers

You can use a lookbehind:
var textAfterTiles = new Regex(#"(?<=\[(.*?)\]).*");
You can combine it with a lookahead if you have multiple such bracketed groups, such as:
var textAfterTiles = "before [one] inside [two] after"
And you wanted to match " inside " and " after", you could do this:
new Regex(#"(?<=\[(.*?)\])[^\[]*");

The same \[(.*?)] regex (I'd just remove the redundant escaping backslash before ]), or even better regex is \[([^]]*)], can be used to split the text and get the text outside [...] (if used with RegexOptions.ExplicitCapture modifier):
var data = "A bracket is a tall punctuation mark[1] typically used in matched pairs within text,[2] to set apart or interject other text.";
Console.WriteLine(String.Join("\n", Regex.Split(data,#"\[([^]]*)]",RegexOptions.ExplicitCapture)));
Output of the C# demo:
A bracket is a tall punctuation mark
typically used in matched pairs within text,
to set apart or interject other text.
The RegexOptions.ExplicitCapture flag makes the capturing group inside the pattern non-capturing, and thus, the captured text is not output into the resulting split array.
If you do not have to keep the same regex, just remove the capture group, use \[[^]]*] for splitting.

You can try this one
\]([^\]]*)\[

Related

How separate words by excluding square brackets in C# using regex

I have a sentence like "Hello there [Gesture : 2.5] How are you" and I have separate the words by avoiding the whole square brackets. For example "Hello there How are you".
I tried to separate the words before the colon but that's not what I want. This is the code I've tried.
MatchCollection matches2 = Regex.Matches(avatarVM.AvatarText, #"([\w\s\[\]]+)");
The above code only separate the words before ":" which also include the opening square bracket and the word after. I want to avoid the whole square brackets
Perhaps invert the problem and concentrate on what you want to remove rather than what you want to keep. For example, this will match the brackets and a space either side, and replace with a single space:
// Hello there How are you
var output = Regex.Replace("Hello there [Gesture : 2.5] How are you", #" \[.+\] ", " ");
If required, you could use a slightly more complicated version that can handle the square brackets not necessarily being surrounded by spaces, for example at the start or end of the input string:
var output = Regex.Replace(
"Hello there How are you [Gesture : 2.5]", // input string
#"[^\w]{0,1}\[.+\]([^\w]){0,1}", // our pattern to match the brackets and what's in between them
"$1"); // replace with the first capture group - in this case the character that comes after
And if you wanted to you could use the overload of Replace taking a MatchEvaluator delegate to have more control over how it is replaced in the string and with what depending on what your needs are.
Assuming you want the fragments before and after the brackets as separate entries in a collection:
Regex.Split(avatarVM.AvatarText, #"\[[^\]]+\]");
This will also work if there are multiple bracketed fragments in the string. You may want to .Trim() each entry.
var output = Regex.Replace("Hello there [Gesture : 2.5] How are you", #"\[.*?\] ", string.Empty);

My regular expression isn't returning what I need

I have a block of text as such.
google.sbox.p50 && google.sbox.p50(["how to",[["how to tie a tie",0],["how to train your dragon 2 trailer",0],["how to do the cup song",0],["how to get a six pack in 3 minutes",0],["how to make a paper gun that shoots",0],["how to basic",0],["how to love lil wayne",0],["how to sing like your favorite artist",0],["how to be a heartbreaker marina and the diamonds",0],["how to tame a horse in minecraft",0]],{"q":"XJW--0IKH6sqOp0ME-x5B7b_5wY","j":"5","k":1}])
Using \\[([^]]+)\\] I am able to get everything I need, but with a little extra that I don't. I do not need the ["how to",[[. I only need the blocks that are formatted like,
["how to tie a tie",0]
Can someone please help me modify my expression to only get what I need? I've been at it for hours and I can't grasp the idea of RegEx.
Put both the opening and closing square brackets in the negated character class?
\\[([^][]+)\\]
\\[ matches a literal [
\\] matches a literal ]
[^][] is a negated class, which for instance matches any character except ][. It might be a little difficult to see it, but it's equivalent to [^\\]\\[]. Here the double escapes are not required because you are using a character class (just like \\. is equivalent to [.])
([^][]+) captures everything within square brackets, making sure there's no ] or [ inside.
In C#, you can use the # symbol to avoid having to double escape everytime and using this makes the regex like that:
var regex = new Regex(#"\[([^][]+)\]");
Note: This regex will capture everything within square brackets. If you wish to specificly get the format ["how to tie a tie",0], you can be more precise. After all, the regex will only match stuff you make it match:
var regex = new Regex(#"\["[^"]+",0\]");
Here, we have another negated character class: [^"]. This will match any character which is not a quote character.
This one assumes that the digit is always 0, as depicted in your sample text block. If you have multiple possibilities of numbers, you can use the character class [0-9]+:
var regex = new Regex(#"\["[^"]+",[0-9]+\]");
You can use \d+ as well, but this character class also matches other characters which may or may not render the regex worse. If you want to be more even cautious by allowing possible spaces, tabs, newlines, form feeds in between the characters, you can use this regex:
var regex = new Regex(#"\[\s*"[^"]+"\s*,\s*[0-9]+\s*\]");
Conclusion, there might be many regexes which suit what you need, just make sure you know how your data is coming through so you can pick one which has the right amount of freeway.
I think this is what you are looking for to match the format of ["how to tie a tie",0]:
(\["[^"]+",\d\])
( ) - around the whole thing so it all gets captured in this group
\[" - find ["
[^"]+ - find one or more of anything except "
", - find ",
\d - find a number, if you want more than just a single digit, do \d+
\] - match the ending ]
The only variable things in this regex are whatever is within the quotes ([^"]+) and the number (\d+).
Demo
If you don't want the square brackets in the capture group, you can do it like this:
\[("[^"]+",\d+)\]
I assume you don't want to match if there are quotes within your quotes as it would probably break whatever purpose you are using it for, but if you do, this should work:
\[("[^[\]]+",\d+)\]
You must use this pattern
#"\[[^][]+\]"
More informations about square brackets here.
I think you need this one: (\[[^\[^]+?])
What you did mis is the ? (smallest match) and exclude any [ or ]
Seemingly the text in the outer brackets is a JSON representation of an object. Instead of a regular expression I'd just:
strip off the stuff before the bracket + first bracket (google.sbox.p50 && google.sbox.p50() plus strip off the trailing bracket ). There are more ways to do this, and it can be more efficient than regex.
JSON parse the remaining inner part.
From that point you have the object representation, you can leave out the first element of the array what you don't need, plus you have everything else in a traversable form.
There's the session information at the end along with parameters anyway (in {} brackets), so in the end you may end up parsing stuff anyway. Better not to reinvent the wheel (JSON parsing).

c# Regular expression for words in brackets with separator

I need to parse a text and check if between all squared brackets is a - and before and after the - must be at least one character.
I tried the following code, but it doesn't work. The matchcount is to large.
Regex regex = new Regex(#"[\.*-.*]");
MatchCollection matches = regex.Matches(textBox.Text);
SampleText:
Node
(Entity [1-5])
Figured I might as well provide an answer... To reiterate my points (with modifications):
* matches 0 or more occurences. You want + probably.
square brackets are special characters and will need to be escaped. They are used to define sets of characters.
You will probably want to exclude [ and ] from your "any character" matching
Put this all together and the following should do you better:
Regex regex = new Regex(#"\[[^-[\]]+-[^[\]]+\]");
Although its a little messy the key thing is that [^[\]] means any character except a square bracket. [^-[\]] means that but also disallows -. This is an optimisation and not required but it just reduces the work the regular expression engine has to do when working out the match. Thanks to ridgerunner for pointing out this optimisation.
Square brackets mean something special in Regexes, you'll need to escape them. Additionally, if you want at least one character then you need to use + rather than *.
Regex regex = new Regex(#"\[.+-.+\]");
MatchCollection matches = regex.Matches(textBox.Text);
string txt = "(Entity [1-5])";
Regex reg = new Regex(#"\[.+\-.+\]");
if it is for #:
string txt = "(Entity [1-5])";
Regex reg = new Regex(#"\[\d+\-\d+\]");

Matching a substring of any length and characters using RegEx

I would like to be able to match and then extract all substrings in the following string using regex in c#:
"2012-05-15 00:49:02 192.168.100.10 POST /Microsoft-Server-ActiveSync/default.eas User=nikced&DeviceId=ApplDNWGRKZQDTC0&DeviceType=iPhone&Cmd=Ping&Log=V121_Sst8_LdapC0_LdapL0_RpcC31_RpcL50_Hb3540_Erq1_Pk1728465481_S2_ 443 redcloud\nikced 94.234.170.42 Apple-iPhone4C1/902.179 200 0 64 3140491"
Since it's a logfile it the regex should be able to handle any line that is of a similar type.
In this case, the preferred output to a collection should be:
2012-05-15
00:49:02
192.168.100.10
/Microsoft-Server-ActiveSync/default.eas
User=nikced&DeviceId=ApplDNWGRKZQDTC0&DeviceType=iPhone&Cmd=Ping&Log=V121_Sst8_LdapC0_LdapL0_RpcC31_RpcL50_Hb3540_Erq1_Pk1728465481_S2_
443
redcloud\nikced
94.234.170.42
Apple-iPhone4C1/902.179
200
0
64
3140491
Appreciate any answer using C#, .net and Regex to extract the above substrings into a collection (MatchCollection preferred). All log lines follows the same format and pattern.
Incredibly complex regex incoming:
logFile.Split(' ');
This will give you an array that you can iterate through to retrieve all of the "lines" which are separated by a space
string[] lines = log.Split(' ');
You don't need to use a Regex. You can simply use String.Split Method, and specify space as separator:
string [] substrings = line.Split(new Char [] {' '});
If you need to identify the kind of each part, then you should specify what you need to find, and a regex can be created for it.
Anyway, if you really want to use a Regex, do this:
Regex re = new Regex (#"(?:(?<s>[^ ]+)(?: |$))*");
This will give you all the captures in the "s" group, when you call the Match method.
As the OP pointed out in a comment that the separator can be anything appart from a single space, then the possible separators should be included in the (?: |$) and the [^ ] parts of the expression. I.e. if space as well as tab are possible separators, replace that part with (?: |\t|$) and [^ \t]. If you need to accept more than one of those characters as separators, add a + after the () group:
(?:(?<s>[^ \t]+)(?: |\t|$)+)*
The fastest and most obvious way is to use String.Split:
string[] substrings = result = line->Split( nullptr, StringSplitOptions::RemoveEmptyEntries );
But if you insist on a MatchCollection then this will do what you want
MatchCollection ^ substrings = Regex.Matches(line, "\\S+")
Really, you just need to break this down into the parts.
First, the date. Will it always be in YYYY-MM-DD format? Could it be possible that it will be different based on region/culture settings?
(?<LogDate>dddd-dd-dd)
Next, you have the time. Same thing:
(?<LogTime>dd:dd:dd)
Next, I'm assuming this is the web method that was actually called? Not entirely sure, since you haven't really explained how the data is laid out. However, I'm assuming it's either going to be either POST or GET, so that's what we're going to do next...
(?<LogMethod>POST|GET)
Just do this for every part of the log line you're interested in, and you'll be set. IE:
(?<LogDate>dddd-dd-dd) (?<LogTime>dd:dd:dd) (?<LogMethod>POST|GET)...
If you want to anchor to the start/end of the line, be sure to use ^ and $ respectively. When you get the Matches, you can get the values from each group by indexing the Groups property with the named group (such as match.Groups["LogMethod"].Value). Good luck!

Regex - Get all words that are not wrapped with a "/"

Im really trying to learn regex so here it goes.
I would really like to get all words in a string which do not have a "/" on either side.
For example, I need to do this to:
"Hello Great /World/"
I need to have the results:
"Hello"
"Great"
is this possible in regex, if so, how do I do it? I think i would like the results to be stored in a string array :)
Thank you
Just use this regular expression \b(?<!/)\w+(?!/)\b:
var str = "Hello Great /World/ /I/ am great too";
var words = Regex.Matches(str, #"\b(?<!/)\w+(?!/)\b")
.Cast<Match>()
.Select(m=>m.Value)
.ToArray();
This will get you:
Hello
Great
am
great
too
var newstr = Regex.Replace("Hello Great /World/", #"/(\w+?)/", "");
If you realy want an array of strings
var words = Regex.Matches(newstr, #"\w+")
.Cast<Match>()
.Select(m => m.Value)
.ToArray();
I would first split the string into the array, then filter out matching words. This solution might also be cleaner than a big regexp, because you can spot the requirements for "word" and the filter better.
The big regexp solution would be something like word boundary - not a slash - many no-whitespaces - not a slash - word boundary.
I would use a regex replace to replace all /[a-zA-Z]/ with '' (nothing) then get all words
Try this one : (Click here for a demo)
(\s(?<!/)([A-Za-z]+)(?!/))|((?<!/)([A-Za-z]+)(?!/)\s)
Using this example excerpt:
The /character/ "_" (underscore/under-strike) can be /used/ in /variable/ names /in/ many /programming/ /languages/, while the /character/ "/" (slash/stroke/solidus) is typically not allowed.
...this expression matches any string of letters, numbers, underscores, or apostrophes (fairly typical idea of a "word" in English) that does not have a / character both before and after it - wrapped with a "/"
\b([\w']+)\b(?<=(?<!/)\1|\1(?!/))
...and is the purest form, using only one character class to define "word" characters. It matches the example as follows:
Matched Not Matched
------------- -------------
The character
_ used
underscore variable
under in
strike programming
can languages
be character
in stroke
names
many
while
the
slash
solidus
is
typically
not
allowed
If excluding /stroke/, is not desired, then adding a bit to the end limitation will allow it, depending upon how you want to define the beginning of a "next" word:
\b([\w']+)\b(?<=(?<!/)\1|\1(?!/([^\w]))).
changes (?!/) to (?!/([^\w])), which allows /something/ if it does have a letter, number, or underscore immediately after it. This would move stroke from the "Not Matched" to the "Matched" list, above.
note: \w matches uppercase or lowercase letters, numbers and the underscore character
If you want to alter your concept for "word" from the above, simply exchange the characters and shorthand character classes contained in the [\w'] part of the expression to something like [a-zA-Z'] to exclude digits or [\w'-] to include hyphens, which would capture under-strike as a single match, rather than two separate matches:
\b([\w'-]+)\b(?<=(?<!/)\1|\1(?!/([^\w])))
IMPORTANT ALTERNATIVE!!! (I think)
I just thought of an alternative to Matching any words that are not wrapped with / symbols: simply consume all of these symbols and words that are surrounded in them (splitting). This has a few benefits: no lookaround means this could be used in more contexts (JavaScript does not support lookbehind and some flavors of regex don't support lookaround at all) while increasing efficiency; also, using a split expression means a direct result of a String array:
string input = "The /character/ "_" (underscore/under-strike) can be..."; //etc...
string[] resultsArray = Regex.Split(input, #"([^\w'-]+?(/[\w]+/)?)+");
voila!

Categories