Need pattern for a complex Regex Split - c#

I'd like to split the following string
// Comments
KeyA : SomeType { SubKey : SubValue } KeyB:'This\'s a string'
KeyC : [ 1 2 3 ] // array value
into
KeyA
:
SomeType
{ SubKey : SubValue }
KeyB
:
This's a string
KeyC
:
[ 1 2 3 ]
(: and blank spaces are the delimiters although : is kept in the result; comments are ignored; no splitting between {}, [], or '')
Can I achieve that with Regex Split or Match? If so, what would be the right pattern? Comments to the pattern string would be appreciated.
Moreover, it would also be desirable to throw an exception or return an error message if the input string is not valid (see the comment below).
Thanks.

You can use this pattern...
string pattern = @"(\w+)\s*:\s*((?>[^\w\s""'{[:]+|\w+\b(?!\s*:)|\s(?!\w+\s*:|$)|\[[^]]*]|{[^}]*}|""(?>[^""\\]|\\.)*""|'(?>[^'\\]|\\.)*')+)\s*";
... in two ways:
with the Match method, which gives you what you are looking for, with keys in group 1 and values in group 2
with the Split method, but you must remove all the empty results.
How is the second part of the pattern (after the :) built?
The idea is to avoid, first of all, the problematic characters: [^\w\s"'{[:]+
Then you allow each of these characters but in a specific situation:
\w+\b(?!\s*:) a word that is not the key
\s(?!\w+\s*:|$) spaces that are not at the end of the value (to trim them)
\[[^]]*] content surrounded by square brackets
{[^}]*} the same with curly brackets
"(?>[^"\\]|\\.)*" content between double quotes (with escaped double quotes allowed)
'(?>[^'\\]|\\.)*' the same with single quotes
Note that the problem with colon inside brackets or quotes is avoided.
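As a rough illustration of the two usages described above, here is a minimal sketch. The class and method names (KeyValueDemo, Parse, SplitParts) are made up for the example, and the input is the question's sample with the // comments already removed, since the pattern itself does not strip comments.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeyValueDemo
{
    // Same pattern as in the answer; inside a verbatim string a double quote is written "".
    private const string Pattern =
        @"(\w+)\s*:\s*((?>[^\w\s""'{[:]+|\w+\b(?!\s*:)|\s(?!\w+\s*:|$)|\[[^]]*]|{[^}]*}|""(?>[^""\\]|\\.)*""|'(?>[^'\\]|\\.)*')+)\s*";

    // Match approach: group 1 holds the key, group 2 holds the value.
    public static List<KeyValuePair<string, string>> Parse(string input)
    {
        var result = new List<KeyValuePair<string, string>>();
        foreach (Match m in Regex.Matches(input, Pattern))
        {
            result.Add(new KeyValuePair<string, string>(m.Groups[1].Value, m.Groups[2].Value));
        }
        return result;
    }

    // Split approach: Regex.Split includes the captured groups in its output,
    // so only the empty strings between matches have to be dropped.
    public static string[] SplitParts(string input)
    {
        return Regex.Split(input, Pattern).Where(s => s.Length > 0).ToArray();
    }

    public static void Main()
    {
        // Assumed input: the sample from the question without its comments.
        string input = "KeyA : SomeType { SubKey : SubValue } KeyB:'This\\'s a string'\nKeyC : [ 1 2 3 ]";
        foreach (var pair in Parse(input))
        {
            Console.WriteLine(pair.Key + " => " + pair.Value);
        }
    }
}
```

Note that the value of KeyB keeps its surrounding quotes and the backslash, and that SomeType and the braces block come back as one value; unquoting or splitting further would be a separate step.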

I'm not quite sure what you're looking for when you get to KeyC. How do you know when the string value for KeyB ends and the string for KeyC begins? Is there a colon after 'this\'s is a string' or a line break? Here's a sample to get you started:
[TestMethod]
public void SplitString()
{
    string splitMe = "KeyA : SubComponent { SubKey : SubValue } KeyB:This's is a string";
    string pattern = "^(.*):(.*)({.*})(.*):(.*)";
    Match match = Regex.Match(splitMe, pattern);
    Assert.IsTrue(match.Success);
    Assert.AreEqual(6, match.Groups.Count); // 1st group is the entire match
    Assert.AreEqual("KeyA", match.Groups[1].Value.Trim());
    Assert.AreEqual("SubComponent", match.Groups[2].Value.Trim());
    Assert.AreEqual("{ SubKey : SubValue }", match.Groups[3].Value.Trim());
    Assert.AreEqual("KeyB", match.Groups[4].Value.Trim());
    Assert.AreEqual("This's is a string", match.Groups[5].Value.Trim());
}

This regex pattern should work for you:
\s*:\s*(?![^\[]*\])(?![^{]*})(?=(([^"]*"[^"]*){2})*$|[^"]+$)
when replaced with
\n$0\n

Related

Greedy regex finding latest brace

I'm trying to parse some variable definition to extract documentation automatically, but I'm having trouble skipping some } which potentially appear in the default value.
Here's a sample...
variable "a" {
type = string
description = "A desc"
default = ""
}
variable "b" {
type = map()
description = "B desc"
default = {}
}
variable "c" {
type = list(string)
description = "C desc"
default = []
}
And the regex I'm using
variable.\"(?<name>\w+)\"(.*?)description.=."(?<desc>[^"\\]*(?:\\.[^"\\]*)*)"(.*?)}
with a replace of
'* `${name}`: ${desc}
This gives the output
* `a`: A desc
* `b`: B desc
}
* `c`: C Desc
I need the regex to be in single-line mode and non-greedy so it stays within each variable definition, but then I can't seem to stop it matching on the first trailing } it finds. It would be good if it could match ^}, but since we are in single-line mode and not multiline mode, ^ doesn't match at the start of each line.
See if this will work for your dataset:
variable.\"(?<name>\w+)\"(.*?)description.=."(?<desc>[^"\\]*(?:\\.[^"\\]*)*)"(.*?)(?=[^}]+?variable|$)
Here I replaced } at the end with (?=[^}]+?variable|$). This should ensure that the last capturing group will keep consuming characters until there are no more closing braces before the next variable (or the end of the input).
You can match the variable and description values, and match all the lines in between that do not start with }, using a negative lookahead.
variable\s*"(?<name>\w+)"\s*{(?:(?!\r?\n}|\bdescription\b).)*description\s*=\s*"(?<desc>[^"\\]*(?:\\.[^"\\]*)*)"(?:(?!\r?\n}).)*\r?\n}
Explanation
variable\s*" Match variable, optional whitespace chars and then "
(?<name>\w+) Named group name, match 1+ word chars
"\s*{ Match ", optional whitespace chars and {
(?:(?!\r?\n}|\bdescription\b).)* Match any char using the dot (Single line mode) as long as what is directly to the right is neither a newline followed by } nor the word description
description\s*=\s*" Match description= with optional whitespace chars around the = and then "
(?<desc>[^"\\]*(?:\\.[^"\\]*)*) Named group desc to capture the description, allowing escaped characters such as \"
" Match the closing "
(?:(?!\r?\n}).)* Match any char (using the Single line mode) as long as what is directly to the right is not a newline followed by }
\r?\n} Match a newline and }
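To show how the pattern explained above could be applied, here is a small sketch that builds the documentation lines with Regex.Matches instead of a replace. The class and method names (VariableDocs, Describe) are invented for the example, the description group uses the escape-aware form [^"\\]*(?:\\.[^"\\]*)* from the question, and RegexOptions.Singleline is assumed so that the dot can cross line breaks.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class VariableDocs
{
    // Singleline lets '.' match newlines, which the tempered dots rely on.
    private static readonly Regex VariableBlock = new Regex(
        @"variable\s*""(?<name>\w+)""\s*{(?:(?!\r?\n}|\bdescription\b).)*description\s*=\s*""(?<desc>[^""\\]*(?:\\.[^""\\]*)*)""(?:(?!\r?\n}).)*\r?\n}",
        RegexOptions.Singleline);

    // Returns one "* `name`: description" line per variable block found.
    public static List<string> Describe(string source)
    {
        var lines = new List<string>();
        foreach (Match m in VariableBlock.Matches(source))
        {
            lines.Add("* `" + m.Groups["name"].Value + "`: " + m.Groups["desc"].Value);
        }
        return lines;
    }

    public static void Main()
    {
        // Shortened version of the sample from the question.
        string source =
            "variable \"a\" {\n  type = string\n  description = \"A desc\"\n  default = \"\"\n}\n" +
            "variable \"b\" {\n  type = map()\n  description = \"B desc\"\n  default = {}\n}\n";
        foreach (string line in Describe(source))
        {
            Console.WriteLine(line);
        }
    }
}
```

Because the block only ends at a } that directly follows a line break, the {} in a default value no longer cuts the match short.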
It's quite verbose, but a slightly more optimized pattern might be
variable\s*"(?<name>\w+)"\s*{[^}d]*(?>}(?<!\r?\n.)[^}]*|(?!\bdescription\s*=\s*"[^"]*")d[^d]*)*\bdescription\s*=\s*"(?<desc>[^"\\]*(?:\\.[^"\\]*)*)"[^}]*(?:}(?<!\r?\n.)[^}]*)*\r?\n}

Regex pattern generator

I'm trying to write a regex pattern that will match this:
Name[0]/Something
or
Name/Something
The words Name and Something will always be known.
I got it working for Name[0]/Something, but I want a single regex that covers both forms.
I tried to mark [0] as optional, but it didn't work:
var regexPattern = "Name" + @"\([\d*\]?)/" + "Something";
Do you know of a generator where I can enter some words and have it build the pattern for me?
Use this:
Name(\[\d+\])?\/Something
\d+ allows one or more digits
\[\d+\] allows one or more digits inside [ and ]. So it will allow [0], [12] etc but reject []
(\[\d+\])? allows digit with brackets to be present either zero times or once
\/ indicates a slash (only one)
Name and Something are string literals
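Since both words are known in advance, a tiny helper can play the role of the "generator" asked about. This is only a sketch: the class and method names (PathPattern, Build) are hypothetical, Regex.Escape is used so that words containing regex metacharacters stay literal, and the ^ and $ anchors are an added assumption that the whole input should match.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PathPattern
{
    // Builds ^Name(\[\d+\])?/Something$ from the two known words.
    public static Regex Build(string name, string something)
    {
        return new Regex(
            "^" + Regex.Escape(name) + @"(\[\d+\])?/" + Regex.Escape(something) + "$");
    }

    public static void Main()
    {
        Regex regex = Build("Name", "Something");
        Console.WriteLine(regex.IsMatch("Name[0]/Something"));
        Console.WriteLine(regex.IsMatch("Name/Something"));
        Console.WriteLine(regex.IsMatch("Name[]/Something"));
    }
}
```

The first two inputs match and the third does not, because \d+ requires at least one digit inside the brackets.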
You were close, the regex Name(\[\d+\])?\/Something will do.
The problem is with first '\' in your pattern before '('.
Here is what you need:
var str = "Name[0]/Something or Name/Something";
Regex rg = new Regex(@"Name(\[\d+\])?/Something");
var matches = rg.Matches(str);
foreach (Match a in matches)
{
    Console.WriteLine(a.Value);
}
var string = 'Name[0]/Something';
var regex = /^(Name)(\[\d*\])?\/Something$/;
console.log(regex.test(string));
string = 'Name/Something';
console.log(regex.test(string));
Your pattern \([\d*\]?)/ goes wrong in two places:
There is no need for \ before ( (in this case); the backslash turns ( into a literal character.
? after ] means: the character ] zero or one time.
So, if you want the [...] part to appear zero or one time, you can try: (\[\d*\])?
Hope this helps!
I think this is what you are looking for:
Name(\[\d+\])?\/Something
Name literal
(\[\d+\])? a number (1 or more digits) between brackets, optional (0 or 1 times)
\/Something literal /Something
https://regex101.com/r/G8tIHC/1

Regex matching group

I have the following pattern format of text:
[1/#DaysInMonth #FirstTitle] #SecondTitle
#DaysInMonth is the number of days in the selected month; #FirstTitle and #SecondTitle are alphanumeric.
I tried with the following:
[\1(?<DaysInMonth>\d\s+) (?<FirstTitle>[\w\s \]+)\] (?<SecondTitle>[\w\s \]+)$]
But it doesn't seem to work. The match is 53 characters long. [Link]
How can I solve this?
Edit after @baddger964's answer:
I want to use in my application like this:
private Regex _regex = null;
string value = "[1/30 Development In Progress] Development In Progress";
_regex = new Regex(@"\[\d+\/(?<DaysInMonth>\d+)\s(?<FirstTitle>[\w\s]+)\]\s(?<SecondTitle>[\w\s]+)");
Match match = _regex.Match(value); // Match() returns a Match, not a Regex
string value1 = match.Groups["DaysInMonth"].Value;
string value2 = match.Groups["FirstTitle"].Value;
string value3 = match.Groups["SecondTitle"].Value;
Your answer is much appreciated.
Thank you.
Maybe you can use this:
\[\d+\/(?<DaysInMonth>\d+)\s(?<FirstTitle>[\w\s]+)\]\s(?<SecondTitle>[\w\s]+)
Notes:
\1 => don't escape the 1, because \1 is a backreference: it matches the same text as capture group 1.
[ => you have to escape this as \[, because a bare [ starts a set of characters.
So your regex:
[\1(?<DaysInMonth>\d\s+) (?<FirstTitle>[\w\s \]+)\] (?<SecondTitle>[\w\s \]+)$]
says: I want to match one character from this set of characters:
\1(?<DaysInMonth>\d\s+) (?<FirstTitle>[\w\s \]+)\] (?<SecondTitle>[\w\s \]+)$
Like this?
Here is an example of a working regex, if you are interested:
https://regex101.com/r/wLBj7Z/1
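For use in an application, one possible shape is a helper that runs the suggested pattern and reports whether the input matched at all. The names below (TitleParser, TryParse) are invented for this sketch; only the pattern comes from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TitleParser
{
    private static readonly Regex TitlePattern = new Regex(
        @"\[\d+\/(?<DaysInMonth>\d+)\s(?<FirstTitle>[\w\s]+)\]\s(?<SecondTitle>[\w\s]+)");

    // Returns false (and default values) when the input does not have the expected shape.
    public static bool TryParse(string value, out int daysInMonth, out string firstTitle, out string secondTitle)
    {
        daysInMonth = 0;
        firstTitle = string.Empty;
        secondTitle = string.Empty;

        Match match = TitlePattern.Match(value);
        if (!match.Success || !int.TryParse(match.Groups["DaysInMonth"].Value, out daysInMonth))
        {
            return false;
        }

        firstTitle = match.Groups["FirstTitle"].Value;
        secondTitle = match.Groups["SecondTitle"].Value;
        return true;
    }

    public static void Main()
    {
        string value = "[1/30 Development In Progress] Development In Progress";
        if (TryParse(value, out int days, out string first, out string second))
        {
            Console.WriteLine(days + " | " + first + " | " + second);
        }
    }
}
```

Checking match.Success before reading the groups avoids silently working with empty strings when the text is malformed.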

Regex to replace between starting curly bracket { and colon :

I wanted to replace a part of string between curly bracket and colon. Suppose I have a string like :
{Name: {\"before\":'Aj', \"after\":'Ajay'} },
So now I want to replace the part of string {Name: with {"Name":.
I tried doing Regex.Replace(rectifyAfter, @"/{([^\s].+?)(\s|$):", "{\"$1\":"). But it doesn't do the replacement.
Can anyone please help me with it?
The following regex should do the trick:
(?:\{)(?<Property>[a-z0-9]+)(?:\:)
What it does:
(?:\{) - matches but doesn't capture the opening curly bracket
(?<Property>[a-z0-9]+) - captures the name of the property in a capturing group named Property
(?:\:) - again, matches but doesn't capture the : after the property
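So, basically, what you want to do is match the pattern {Name: and have it replaced with {" + value of the Property group + ":. The non-capturing groups are still part of the match, which is why the replacement has to put the { and the : back.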
And below is the code to do the replacement:
string pattern = @"(?:\{)(?<Property>[a-z0-9]+)(?:\:)";
Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
string targetString = @"{Name: {\""before\"":'Aj', \""after\"":'Ajay'} },";
string replacement = @"{""${Property}"":";
return regex.Replace(targetString, replacement);
${Property} is the name of the capturing group and it will hold the name of your property.
I don't see why you'd need regex for this. Just use a simple .Replace:
string json = ""; // Whatever your JSON string is.
json = json.Replace("{Name:", "{\"Name\":");

Cutting text to specific length preserving the words

I have the following text:
Test some text. Now here is some new realylonglonglong text
And I need to cut it to 50 characters but without cutting the words. So, the desired result is:
Test some text. Now here is some new ...
I am looking only for solution using regular expression replace. The following regular expression:
^.{0,50}(?= |$)
matches:
Test some text. Now here is some new
but I failed to transform it for use in a replace function.
In my real case I have a SQL CLR function called [dbo].[RegexReplace] and I am calling it like this:
SELECT [dbo].[RegexReplace](@TEST, '^.{0,50}(?= |$)', '...')
Its C# definition is:
public static string Replace(SqlString sqlInput, SqlString sqlPattern, SqlString sqlReplacement)
{
string input = (sqlInput.IsNull) ? string.Empty : sqlInput.Value;
string pattern = (sqlPattern.IsNull) ? string.Empty : sqlPattern.Value;
string replacement = (sqlReplacement.IsNull) ? string.Empty : sqlReplacement.Value;
return Regex.Replace(input, pattern, replacement);
}
That's why I want to do this with a regular expression replace function.
This is the regex you want:
string result = Regex.Replace("Test some text. Now here is some new realylonglonglong text", "^(?=.{50,})(.{0,50}) .*", "$1...");
So look for ^(?=.{50,})(.{0,50}) .* and replace it with $1...
Explanation: you are looking for texts that are at least 50 characters long, because shorter texts don't need shortening, hence (?=.{50,}) (note that this lookahead doesn't capture or consume anything). Then you look for the first 0 to 50 characters, (.{0,50}), followed by a space, followed by anything else, .*. You replace all of this with the first 0 to 50 characters ($1) followed by ...
I need the (?=.{50,}) because otherwise the regex would turn Test test into Test..., cutting at the space.
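The same idea can be wrapped in a helper with the limit as a parameter. This is a sketch only: TextTruncator and Truncate are made-up names, and the lookahead here uses maxLength + 1 (a deliberate variation on the pattern above) so that a text of exactly maxLength characters is left alone.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TextTruncator
{
    // Cuts input at the last space within the first maxLength characters and
    // appends "...". Inputs no longer than maxLength are returned unchanged.
    public static string Truncate(string input, int maxLength)
    {
        string pattern = "^(?=.{" + (maxLength + 1) + ",})(.{0," + maxLength + "}) .*";
        return Regex.Replace(input, pattern, "$1...");
    }

    public static void Main()
    {
        string text = "Test some text. Now here is some new realylonglonglong text";
        Console.WriteLine(Truncate(text, 50));
        Console.WriteLine(Truncate("Test test", 50));
    }
}
```

One limitation worth knowing: a long input with no space within the first maxLength + 1 characters does not match the pattern at all, so it comes back uncut.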
