How to do the complicated replacement using regex in C# - c#

I have a string with deeply nested curly braces, like this:
{{{text1},{text2}},{{text3},{text4}}}
I want to keep only the innermost curly braces and replace all the other curly braces with square brackets, so that the result looks like this:
[[{text1},{text2}],[{text3},{text4}]]
How can I do this replacement with the Regex.Replace() function in C#?
thanks

This takes two replacements: first, replace every { that is followed by another { with [; second, replace every } that is preceded by a non-word boundary (\B) with ]. Try this C# code:
string input = "{{{text1},{text2}},{{text3},{text4}}}";
Regex regex = new Regex("{(?={)");
string result = regex.Replace(input, "[");
regex = new Regex("\\B}");
result = regex.Replace(result, "]");
Console.WriteLine("Result: " + result);
Prints,
Result: [[{text1},{text2}],[{text3},{text4}]]
Online C# demo
You could also use a positive lookbehind, (?<=})}, instead of \\B} for the second replacement. I deliberately avoided it to keep the solution simple and to make it work in languages that don't support lookbehinds, but (?<=})} is strictly better than \\B}, because \\B} also matches a } that follows any non-word character. Choose as you like.
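A minimal sketch of the lookbehind variant mentioned above, for comparison. The class and method names are made up for illustration, and the braces are escaped only for readability.

```csharp
using System.Text.RegularExpressions;

static class BraceRewriter
{
    // Hypothetical helper: converts every brace except the innermost pairs.
    public static string OuterBracesToBrackets(string input)
    {
        // An opening brace directly followed by another opening brace is an outer one.
        string step1 = Regex.Replace(input, @"\{(?=\{)", "[");

        // A closing brace directly preceded by another closing brace is an outer one.
        // The lookbehind inspects the input of this call, in which no '}' has been replaced yet.
        return Regex.Replace(step1, @"(?<=\})\}", "]");
    }
}
```

For "{{{a!},{b}}}" this returns "[[{a!},{b}]]". The \\B} version would also convert the brace after a!, because there is no word boundary between ! and }.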

Related

C# Regex.Split, How do I split string into surrounded by parenthesis and not surrounded by parenthesis?

Given the string "I'm not surrounded by parenthesis (but I am) so what.", how would I use regex to split it into a string[] that looks like this:
[0] = "I'm not surrounded by parenthesis "
[1] = "(but I am)"
[2] = " so what."
Alternatively, Is there a way to replace (but I am) with the same string but surrounded by a tag? Say for example [b](but I am)[/b].
I've tried using the Regex.Replace method with \(([^\)]+)\) as the regular expression. But I need a replacement string in the method's parameters. Given that the replacement string should be whatever matches the RegEx, I don't quite know how to execute this method.
For your first question, this should give you what you wish:
(\([^)]+?\)|[^(]+)
Just make sure you use Regex.Matches, not Regex.Match, since it returns one match for each parenthesized and each non-parenthesized group.
For your second question, a simple call like this should do it:
string s = ...;
// $0 is the whole match; the pattern has no capture group, so $1 would be inserted literally
Regex.Replace(s, @"\([^)]+?\)", "[b]$0[/b]");
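Both parts of this answer can be put together in a short sketch. The class and method names are hypothetical, and the lazy +? is written as a plain + here, which matches the same text because the character class already excludes the closing parenthesis.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

static class ParenSplitter
{
    // Returns the parenthesized and non-parenthesized pieces in order.
    public static string[] Split(string s)
    {
        return Regex.Matches(s, @"\([^)]+\)|[^(]+")
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    // Wraps each parenthesized piece in [b]...[/b]; $0 stands for the whole match.
    public static string Tag(string s)
    {
        return Regex.Replace(s, @"\([^)]+\)", "[b]$0[/b]");
    }
}
```

With the string from the question, Split returns the three pieces listed there, and Tag returns "I'm not surrounded by parenthesis [b](but I am)[/b] so what.".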

How separate words by excluding square brackets in C# using regex

I have a sentence like "Hello there [Gesture : 2.5] How are you" and I have to separate the words while skipping everything inside the square brackets, brackets included. The result should be, for example, "Hello there How are you".
I tried to separate the words before the colon, but that's not what I want. This is the code I've tried:
MatchCollection matches2 = Regex.Matches(avatarVM.AvatarText, @"([\w\s\[\]]+)");
The above code only separates the words before the ":", and the result also includes the opening square bracket and the word after it. I want to skip the whole bracketed part.
Perhaps invert the problem and concentrate on what you want to remove rather than what you want to keep. For example, this will match the brackets and a space either side, and replace with a single space:
// Hello there How are you
var output = Regex.Replace("Hello there [Gesture : 2.5] How are you", @" \[.+\] ", " ");
If required, you could use a slightly more complicated version that can handle the square brackets not necessarily being surrounded by spaces, for example at the start or end of the input string:
var output = Regex.Replace(
"Hello there How are you [Gesture : 2.5]", // input string
@"[^\w]{0,1}\[.+\]([^\w]){0,1}", // our pattern to match the brackets and what's in between them
"$1"); // replace with the first capture group - in this case the character that comes after
And if you wanted to you could use the overload of Replace taking a MatchEvaluator delegate to have more control over how it is replaced in the string and with what depending on what your needs are.
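As a rough illustration of the MatchEvaluator overload, here is one way it could be used. The class name, the method name, and the rule for choosing the replacement are my own assumptions, not part of the question.

```csharp
using System.Text.RegularExpressions;

static class BracketStripper
{
    // Removes each [...] block together with the whitespace around it.
    // A block in the middle of the text leaves one space behind;
    // a block at the very start or end leaves nothing.
    public static string Strip(string input)
    {
        return Regex.Replace(input, @"\s*\[[^\]]*\]\s*", m =>
        {
            bool atStart = m.Index == 0;
            bool atEnd = m.Index + m.Length == input.Length;
            return (atStart || atEnd) ? string.Empty : " ";
        });
    }
}
```

BracketStripper.Strip gives "Hello there How are you" both for the original sentence and for the variant with the bracketed part at the end.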
Assuming you want the fragments before and after the brackets as separate entries in a collection:
Regex.Split(avatarVM.AvatarText, @"\[[^\]]+\]");
This will also work if there are multiple bracketed fragments in the string. You may want to .Trim() each entry.
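A small sketch of the split-and-trim idea. The helper name is hypothetical, and dropping empty entries (which appear when a bracketed fragment sits at the start or end of the string) is an extra choice of mine.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

static class BracketSplitter
{
    // Splits on every [...] fragment, trims each piece and drops empty ones.
    public static string[] Split(string text)
    {
        return Regex.Split(text, @"\[[^\]]+\]")
                    .Select(piece => piece.Trim())
                    .Where(piece => piece.Length > 0)
                    .ToArray();
    }
}
```

For "Hello there [Gesture : 2.5] How are you" this yields the two entries "Hello there" and "How are you".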
var output = Regex.Replace("Hello there [Gesture : 2.5] How are you", @"\[.*?\] ", string.Empty);

How to use regex to match anything from A to B, where B is not preceded by C

I'm having a hard time with this one. First off, here is the difficult part of the string I'm matching against:
"a \"b\" c"
What I want to extract from this is the following:
a \"b\" c
Of course, this is just a substring from a larger string, but everything else works as expected. The problem is making the regex ignore the quotes that are escaped with a backslash.
I've looked into various ways of doing it, but nothing has gotten me the correct results. My most recent attempt looks like this:
"((\"|[^"])+?)"
In various online testers this works the way it should, but when I build my ASP.NET page, it cuts off at the first ", leaving me with just the letter a, a white space and a backslash.
The logic behind the pattern above is to capture all instances of \" or something that is not ". I was hoping this would search for \", making sure to find those first - but I got the feeling that this is overridden by the second part of the expression, which is only 1 single character. A single backslash does not match 2 characters (\"), but it will match as a non-". And from there, the next character will be a single ", and the matching is completed. (This is just my hypothesis on why my pattern is failing.)
Any pointers on this one? I have tried various combinations with "look"-methods in regex, but I didn't really get anywhere. I also get the feeling that is what I need.
ORIGINAL ANSWER
To match a string like a \"b\" c, you need to use following regex declaration:
(?:\\"|[^"])+
var rx = new Regex(@"(?:\\""|[^""])+");
See RegexStorm demo
Here is an IDEONE demo:
var str = "a \\\"b\\\" c";
Console.WriteLine(str);
var rx = new Regex(@"(?:\\""|[^""])+");
Console.WriteLine(rx.Match(str).Value);
Please note the @ in front of the string literal: it makes it a verbatim string literal, in which a literal quote is written as two double quotes and a backslash needs no extra escaping. This makes regexes easier to read and maintain.
If you want to match any escaped entities in your input string, you can use:
var rx = new Regex(@"[^""\\]*(?:\\.[^""\\]*)*");
See demo on RegexStorm
UPDATE
To match the quoted strings, just add quotes around the pattern:
var rx = new Regex(@"""(?<res>[^""\\]*(?:\\.[^""\\]*)*)""");
This pattern yields much better performance than Tim Long's suggested regex, see RegexHero test results:
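To show how the res group from the quoted pattern could be read back, here is a minimal sketch. The wrapper class and method are hypothetical; only the pattern comes from the answer.

```csharp
using System.Text.RegularExpressions;

static class EscapedQuoteExtractor
{
    // Returns the content of the first double-quoted string in the input,
    // treating a backslash plus the following character as one escaped unit.
    // Returns null when no quoted string is found.
    public static string FirstQuoted(string input)
    {
        Match m = Regex.Match(input, @"""(?<res>[^""\\]*(?:\\.[^""\\]*)*)""");
        return m.Success ? m.Groups["res"].Value : null;
    }
}
```

Given the text "a \"b\" c" (outer quotes included), FirstQuoted returns a \"b\" c. Given "x" and "y" it returns only x, because the match stops at the first unescaped quote.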
The following expression worked for me:
"(?<Result>(\\"|.)*)"
The expression matches as follows:
An opening quote (literal ")
A named capture (?<name>pattern) consisting of:
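Zero or more occurrences * of literal \" or (|) any single character (.)
A final closing quote (literal ")
Note that the * (zero or more) quantifier is greedy: it first consumes as much as it can, including the final quote, and then backtracks one character so that the closing literal " can match. The final quote is therefore matched by the literal " and not by the "any single character" . part.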
I used ReSharper 9's built-in Regular Expression validator to develop the expression and verify the results:
I have used the "Explicit Capture" option to reduce cruft in the output (RegexOptions.ExplicitCapture).
One thing to note is that I am matching the whole string, but I am only capturing the substring, using a named capture. Using named captures is a really useful way to get at the results you want. In code, it might look something like this:
static string MatchQuotedString(string input)
{
const string pattern = @"""(?<Result>(\\""|.)*)""";
const RegexOptions options = RegexOptions.ExplicitCapture;
Regex regex = new Regex(pattern, options);
var matches = regex.Match(input);
var substring = matches.Groups["Result"].Value;
return substring;
}
Optimization: If you are planning on using the regex a lot, you could factor it out into a static field and use the RegexOptions.Compiled option. This pre-compiles the expression and gives you faster throughput at the expense of longer initialization.
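The optimization could look roughly like this. It is a sketch under the assumption that a static helper class is acceptable; the class and method names are made up.

```csharp
using System.Text.RegularExpressions;

static class QuotedStringMatcher
{
    // Built once and reused; Compiled trades a slower start for faster matching.
    private static readonly Regex Pattern = new Regex(
        @"""(?<Result>(\\""|.)*)""",
        RegexOptions.ExplicitCapture | RegexOptions.Compiled);

    // Returns the text between the first and the last quote, or null if there is no match.
    public static string Extract(string input)
    {
        Match match = Pattern.Match(input);
        return match.Success ? match.Groups["Result"].Value : null;
    }
}
```

Because the quantifier in this pattern is greedy, an input containing several quoted strings is captured from the first quote to the last one.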

Fetch values between two [] from a string using Regular expressions

I have a string like the following:
"channel_changes":[[1313571300,26.879846,true],[1313571360,26.901025,true]]
I want to extract the content of each inner pair of square brackets, like 1313571300, 26.879846, true, using a regular expression.
I have tried using
string regexPattern = @"\[(.*?)\]";
but that gives the first string as [[1313571420,26.901025,true]
i.e. with one extra square bracket.
Please help me achieve this.
This seemed to work in Expresso for me:
\[([\w,\.]*?)\]
Literal [
[1]: A numbered capture group. [[\w,.]*?]
- Any character in this class: [\w,.], any number of repetitions, as few as possible
Literal ]
The problem seemed to be the "." in your regex - since it was picking up the first literal "[" and considering the following "[" in your input to be valid as the next character.
I constrained it to just alphanumeric characters, commas and literal full-stops (period mark), since that's all that was present in your example. You could go further and really specify the format of the data inside those inner square brackets assuming it's consistent, and end up with something more like this:
\[[0-9.]+,[0-9.]+,(true|false)\]
Example C# code:
var matches = Regex.Matches("\"channel_changes\":[[1313571300,26.879846,true],[1313571360,26.901025,true]]", @"\[([\w,\.]*?)\]");
foreach (Match match in matches)
{
// Group 1 holds the text between the brackets
Console.WriteLine(match.Groups[1].Value);
}
Try this:
@"\[+([^\]]+)\]+"
"[^]]+" - it means any character except right square bracket
Try this
\[([^\[\]]*)\]
See it here online on Regexr
[^\[\]]* is a negated character class, means match any character but [ and ]. With this construct you don't need the ? to make your * ungreedy.
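A short sketch of how the negated character class could be used from C#. The class name, the method name, and the splitting on commas are assumptions for illustration.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class RowExtractor
{
    // Finds every innermost [...] group and splits its content on commas.
    public static List<string[]> ExtractRows(string input)
    {
        var rows = new List<string[]>();
        foreach (Match m in Regex.Matches(input, @"\[([^\[\]]*)\]"))
        {
            // Group 1 is the text between the brackets, without the brackets.
            rows.Add(m.Groups[1].Value.Split(','));
        }
        return rows;
    }
}
```

For the channel_changes string in the question this returns two rows of three values each. The outer bracket pair is skipped because its content contains further brackets.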

Regex battle between maximum and minimum munge

Greetings, I have a file with the following strings:
string.Format("{0},{1}", "Having \"Two\" On The Same Line".Localize(), "Is Tricky For regex".Localize());
my goal is to get a match set with the two strings:
Having \"Two\" On The Same Line
Is Tricky For regex
My current regex looks like this:
private Regex CSharpShortRegex = new Regex("\"(?<constant>[^\"]+?)\".Localize\\(\\)");
My problem is with the escaped quotes in the first line: I end up stopping at the quote and I get:
On The Same Line
Is Tricky For This Style Too
However, attempting to ignore the escaped quotes is not working out, because it makes the regex greedy and I get:
Having \"Two\" On The Same Line".Localize(), "Is Tricky For regex"
We seem to be caught between maximum and minimum munge. Is there any hope? I have some backup plans. Can you regex backwards? That would make it easier, because I could start with the "()ezilacoL."
EDIT:
To clarify. This is my lone edge case. Most of the time the string sits alone like:
var myString = "Hot Patootie".Localize()
This one works for me:
\"((?:[^\\"]|(?:\\\"))*)\"\.Localize\(\)
Tested on http://www.regexplanet.com/simple/index.html against a number of strings with various escaped quotes.
Looks like most of us who answered this one had the same rough idea, so let me explain the approach (comments after #s):
\" # We're looking for a string delimited by quotation marks
( # Capture the contents of the quotation marks
(?: # Start a non-capturing group
[^\\"] # Either read a character that isn't a quote or a slash
|(?:\\\") # Or read in a slash followed by a quote.
)* # Keep reading
) # End the capturing group
\" # The string literal ends in a quotation mark
\.Localize\(\) # and ends with the literal '.Localize()', escaping ., ( and )
For C# you'll need to escape the slashes twice (messy):
\"((?:[^\\\\\"]|(?:\\\\\"))*)\"\\.Localize\\(\\)
Mark correctly points out that this one doesn't match escaped characters other than quotation marks. So here's a better version:
\"((?:[^\\"]|(?:\\")|(?:\\.))*)\"\.Localize\(\)
And its slashed-up equivalent:
\"((?:[^\\\\\"]|(?:\\\\\")|(?:\\\\.))*)\"\\.Localize\\(\\)
It works the same way, except for one special case: if it encounters a slash but can't match \", it just consumes the slash and the following character and moves on.
Thinking about it, it's better to just consume two characters at every slash, which is effectively Mark's answer so I won't repeat it.
Here's the regular expression you need:
@"""(?<constant>(\\.|[^""])*)""\.Localize\(\)"
A test program:
using System;
using System.Text.RegularExpressions;
using System.IO;
class Program
{
static void Main()
{
Regex CSharpShortRegex =
new Regex(@"""(?<constant>(\\.|[^""])*)""\.Localize\(\)");
foreach (string line in File.ReadAllLines("input.txt"))
foreach (Match match in CSharpShortRegex.Matches(line))
Console.WriteLine(match.Groups["constant"].Value);
}
}
Output:
Having \"Two\" On The Same Line
Is Tricky For regex
Hot Patootie
Notice that I have used @"..." to avoid having to escape backslashes inside the regular expression. I think this makes it easier to read.
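To make the difference between the two literal forms concrete, here is a sketch that writes the same pattern both ways and runs it on an in-memory line instead of a file. The class and member names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class LocalizeFinder
{
    // The same pattern written as a verbatim literal and as a regular literal.
    public const string Verbatim = @"""(?<constant>(\\.|[^""])*)""\.Localize\(\)";
    public const string Escaped = "\"(?<constant>(\\\\.|[^\"])*)\"\\.Localize\\(\\)";

    // Returns the content of every string literal followed by .Localize() in the line.
    public static List<string> FindConstants(string line)
    {
        var found = new List<string>();
        foreach (Match m in Regex.Matches(line, Verbatim))
        {
            found.Add(m.Groups["constant"].Value);
        }
        return found;
    }
}
```

LocalizeFinder.Verbatim == LocalizeFinder.Escaped is true, so the choice between them is purely about readability.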
Update:
My original answer (below the horizontal rule) has a bug: regular-expression matchers attempt alternatives in left-to-right order. Having [^"] as the first alternative allows it to consume the backslash, but then the next character to be matched is a quote, which prevents the match from proceeding.
Incompatibility note: Given the pattern below, perl backtracks to the other alternative (the escaped quote) and successfully finds a match for the Having \"Two\" On The Same Line case.
The fix is to try an escaped quote first and then a non-quote:
var CSharpShortRegex =
new Regex("\"(?<constant>(\\\\\"|[^\"])*)\"\\.Localize\\(\\)");
or if you prefer the at-string form:
var CSharpShortRegex =
new Regex(@"""(?<constant>(\\""|[^""])*)""\.Localize\(\)");
Allow for escapes:
private Regex CSharpShortRegex =
new Regex("\"(?<constant>([^\"]|\\\\\")*)\"\\.Localize\\(\\)");
Applying one level of escaping to make the pattern easier to read, we get
"(?<constant>([^"]|\\")*)"\.Localize\(\)
That is, a string starts and ends with " characters, and everything between is either a non-quote or an escaped quote.
Looks like you're trying to parse code so one approach might be to evaluate the code on the fly:
var cr = new CSharpCodeProvider().CompileAssemblyFromSource(
new CompilerParameters { GenerateInMemory = true },
"class x { public static string e() { return " + input + "; } }"); // the return statement needs its semicolon
var result = cr.CompiledAssembly.GetType("x")
.GetMethod("e").Invoke(null, null) as string;
This way you could handle all kinds of other special cases (e.g. concatenated or verbatim strings) that would be extremely difficult to handle with regex.
new Regex(@"((([^@]|^|\n)""(?<constant>((\\.)|[^""])*)"")|(@""(?<constant>(""""|[^""])*)""))\s*\.\s*Localize\s*\(\s*\)", RegexOptions.Compiled);
takes care of both simple and @"" strings. It also takes into account escape sequences.
