Can you improve this C# regular expression code? - c#

In a program I'm reading in some data files, part of which are formatted as a series of records each in square brackets. Each record contains a section title and a series of key/value pairs.
I originally wrote code to loop through and extract the values, but decided it could be done more elegantly using regular expressions. Below is my resulting code (I just hacked it out for now in a console app, so I know the variable names aren't that great, etc.).
Can you suggest improvements? I feel it shouldn't be necessary to do two matches and a substring, but can't figure out how to do it all in one big step:
string input = "[section1 key1=value1 key2=value2][section2 key1=value1 key2=value2 key3=value3][section3 key1=value1]";
MatchCollection matches = Regex.Matches(input, @"\[[^\]]*\]");
foreach (Match match in matches)
{
string subinput = match.Value;
int firstSpace = subinput.IndexOf(' ');
string section = subinput.Substring(1, firstSpace-1);
Console.WriteLine(section);
MatchCollection newMatches = Regex.Matches(subinput.Substring(firstSpace + 1), @"\s*(\w+)\s*=\s*(\w+)\s*");
foreach (Match newMatch in newMatches)
{
Console.WriteLine("{0}={1}", newMatch.Groups[1].Value, newMatch.Groups[2].Value);
}
}

I prefer named captures, nice formatting, and clarity:
string input = "[section1 key1=value1 key2=value2][section2 key1=value1 key2=value2 key3=value3][section3 key1=value1]";
MatchCollection matches = Regex.Matches(input, @"\[
(?<sectionName>\S+)
(\s+
(?<key>[^=]+)
=
(?<value>[^ \] ]+)
)+
]", RegexOptions.IgnorePatternWhitespace);
foreach(Match currentMatch in matches)
{
Console.WriteLine("Section: {0}", currentMatch.Groups["sectionName"].Value);
CaptureCollection keys = currentMatch.Groups["key"].Captures;
CaptureCollection values = currentMatch.Groups["value"].Captures;
for(int i = 0; i < keys.Count; i++)
{
Console.WriteLine("{0}={1}", keys[i].Value, values[i].Value);
}
}

You should take advantage of the collections to get each key. So something like this then:
string input = "[section1 key1=value1 key2=value2][section2 key1=value1 key2=value2 key3=value3][section3 key1=value1]";
Regex r = new Regex(@"(\[(\S+) (\s*\w+\s*=\s*\w+\s*)*\])", RegexOptions.Compiled);
foreach (Match m in r.Matches(input))
{
Console.WriteLine(m.Groups[2].Value);
foreach (Capture c in m.Groups[3].Captures)
{
Console.WriteLine(c.Value);
}
}
Resulting output:
section1
key1=value1
key2=value2
section2
key1=value1
key2=value2
key3=value3
section3
key1=value1

You should be able to do something with nested groups like this:
pattern = @"\[(\S+)(\s+([^\s=]+)=([^\s\]]+))*\]";
I haven't tested it in C# or looped through the matches, but the results look right on rubular.com
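Since that pattern was not tried in C#, here is a minimal sketch of how it could be looped over. The class name `SectionParser` and the method `Parse` are hypothetical names introduced for illustration. It assumes keys and values contain no whitespace and that section names are unique.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SectionParser
{
    // Pattern from the answer above: group 1 is the section name,
    // groups 3 and 4 repeat and hold every key and every value.
    private static readonly Regex RecordPattern =
        new Regex(@"\[(\S+)(\s+([^\s=]+)=([^\s\]]+))*\]");

    // Hypothetical helper: returns section name -> (key -> value).
    public static Dictionary<string, Dictionary<string, string>> Parse(string input)
    {
        var result = new Dictionary<string, Dictionary<string, string>>();
        foreach (Match m in RecordPattern.Matches(input))
        {
            var pairs = new Dictionary<string, string>();
            // A repeated group keeps all of its matches in Captures;
            // Group.Value alone would only give the last one.
            CaptureCollection keys = m.Groups[3].Captures;
            CaptureCollection values = m.Groups[4].Captures;
            for (int i = 0; i < keys.Count; i++)
            {
                pairs[keys[i].Value] = values[i].Value;
            }
            result[m.Groups[1].Value] = pairs;
        }
        return result;
    }
}
```

Calling `SectionParser.Parse(input)` on the sample string from the question should give three sections, with `result["section2"]["key3"]` equal to `"value3"`.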

This will match all the key/value pairs ...
var input = "[section1 key1=value1 key2=value2][section2 key1=value1 key2=value2 key3=value3][section3 key1=value1]";
var ms = Regex.Matches(input, @"section(\d+)(?:\s+(\w+=\w+))*");
foreach (Match m in ms)
{
Console.WriteLine("Section " + m.Groups[1].Value);
// Group 2 repeats, so read every capture rather than only the last one
foreach (Capture c in m.Groups[2].Captures)
{
var kvp = c.Value.Split('=');
Console.WriteLine("{0}={1}", kvp[0], kvp[1]);
}
}

Related

Getting numbers from a string with chars glued

I need to recover each number in a glued string
For example, from these strings:
string test = "number1+3";
string test1 = "number 1+4";
I want to recover (1 and 3) and (1 and 4)
How can I do this?
CODE
string test= "number1+3";
List<int> res = new List<int>();
string[] digits = Regex.Split(test, @"\D+");
foreach (string value in digits)
{
int number;
if (int.TryParse(value, out number))
{
res.Add(number);
}
}
This regex should work:
string pattern = @"\d+";
string test = "number1+3";
foreach (Match match in Regex.Matches(test, pattern))
Console.WriteLine("Found '{0}' at position {1}",
match.Value, match.Index);
Note that if you intend to use it multiple times, it's better, for performance reasons, to create a Regex instance than to use the static method.
var res = new List<int>();
var regex = new Regex(@"\d+");
void addMatches(string text) {
foreach (Match match in regex.Matches(text))
{
int number = int.Parse(match.Value);
res.Add(number);
}
}
string test = "number1+3";
addMatches(test);
string test1 = "number 1+4";
addMatches(test1);
This calls for a regular expression:
(\d+)\+(\d+)
Match m = Regex.Match(input, @"(\d+)\+(\d+)");
string first = m.Groups[1].Captures[0].Value;
string second = m.Groups[2].Captures[0].Value;
An alternative to regular expressions:
string test = "number 1+4";
int[] numbers = test.Replace("number", string.Empty, StringComparison.InvariantCultureIgnoreCase)
.Trim()
.Split("+", StringSplitOptions.RemoveEmptyEntries)
.Select(x => Convert.ToInt32(x))
.ToArray();

Want to get more than 1 value by using regex c#

string st = "New Specification Result : Measures 0.0039mm ( 4 Microns )New Specification Result : Measures 0.0047mm ( 5 Microns )";
The strings I want to get are 0.0039mm and 0.0047mm, but the code I use keeps giving me only 0.0047mm.
var src = st;
var pattern = @"([0-9].[0-9]{4}mm)";
var expr = new Regex(pattern, RegexOptions.IgnoreCase);
foreach (Match match in expr.Matches(src))
{
string key = match.Groups[1].Value;
string key2 = match.Groups[2].Value;
label1.Text = key + key2;
}
Your code is fine, and the millimeter number you are trying to match is being captured correctly, but in the first capture group, and not in the second. There is a slight problem with your pattern, and it should be this:
([0-9]\.[0-9]{4}mm)
You intend for the dot to be a literal decimal point, so it should be escaped with a backslash. Here is the full code:
var pattern = @"([0-9]\.[0-9]{4}mm)";
var expr = new Regex(pattern, RegexOptions.IgnoreCase);
foreach (Match match in expr.Matches(src))
{
string key = match.Groups[1].Value;
string key2 = match.Groups[2].Value; // this doesn't match to anything here
Console.WriteLine(key);
}
You want the following. Your loop is overwriting your copy of the first result (and you don't have a second capture group; you have a second match).
var st = "New Specification Result : Measures 0.0039mm(4 Microns)New Specification Result: Measures 0.0047mm(5 Microns)";
var pattern = @"([0-9]\.[0-9]{4}mm)";
var expr = new Regex(pattern, RegexOptions.IgnoreCase);
string key = "";
foreach (Match match in expr.Matches(st))
{
key += match.Groups[1].Value;
}
You want to iterate through each match and then join them together for display:
var mm = new Regex(@"([0-9]\.[0-9]{4}mm)").Matches(src).Cast<Match>().Select(m => m.Groups[1].Value).ToList();
var list = string.Join(" ", mm);
label1.Text = list;
Currently you are only getting the last match, because you keep overwriting the text in your label.

Find all string occurrences after a string

Just need a little push here. I have a file with data like
xyz buildinfo app_id="12345" asf
sfsdf buildinfo app_id="12346" wefwef
...
I need to get a string array with the numbers following app_id=. The code below gives me all matches, and I am able to get the count (Regex.Matches(text, searchPattern).Count). But I need the actual items in an array.
string searchPattern = @"app_id=(\d+)";
var z = Regex.Matches(text, searchPattern);
I think you're saying you want the items (numbers) without the app_id part. You want to use a Positive Lookbehind
string text = @"xyz buildinfo app_id=""12345"" asf sfsdf buildinfo app_id=""12346"" wefwef";
string searchPattern = @"(?<=app_id="")(\d+)";
var z = Regex.Matches(text, searchPattern)
.Cast<Match>()
.Select(m => m.Value)
.ToArray();
(?<=app_id="") will match the pattern, but not include it in the capture
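To make the difference concrete, here is a small sketch that extracts the same numbers two ways: with the lookbehind, where the match value is already just the digits, and with a plain capture group, where the digits have to be read from group 1. The class and method names (`AppIdExtractor`, `WithLookbehind`, `WithGroup`) are hypothetical, and the sketch assumes the ids are always wrapped in double quotes as in the sample data.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class AppIdExtractor
{
    // Lookbehind: app_id=" must precede the digits but is not part of
    // the match, so Match.Value is only the number.
    public static string[] WithLookbehind(string text) =>
        Regex.Matches(text, @"(?<=app_id="")\d+")
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();

    // Capture group: the match includes app_id=", so the number is
    // read from Groups[1] instead.
    public static string[] WithGroup(string text) =>
        Regex.Matches(text, @"app_id=""(\d+)")
             .Cast<Match>()
             .Select(m => m.Groups[1].Value)
             .ToArray();
}
```

Both methods should return the same array for the sample text; which one to use is mostly a matter of taste.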
You can take a look at the documentation.
Following it, you can use code like this:
string pattern = @"app_id=""(\d+)""";
string input = @"xyz buildinfo app_id=""12345"" asf sfsdf buildinfo app_id=""12346"" efwef";
Match match = Regex.Match(input, pattern);
if (match.Success) {
Console.WriteLine("Matched text: {0}", match.Value);
for (int ctr = 1; ctr <= match.Groups.Count - 1; ctr++) {
Console.WriteLine(" Group {0}: {1}", ctr, match.Groups[ctr].Value);
int captureCtr = 0;
foreach (Capture capture in match.Groups[ctr].Captures) {
Console.WriteLine(" Capture {0}: {1}",
captureCtr, capture.Value);
captureCtr += 1;
}
}
}

Get numbers of text with Regex.Split in C#

How can I get numbers between brackets of this text with regex in C#?
sample text :
"[1]Ali ahmadi,[2]Mohammad Razavi"
result is : 1,2
My C# code is :
string result = null;
string[] digits = Regex.Split(Text, @"[\d]");
foreach (string value in digits)
{
result += value + ",";
}
return result.Substring(0,result.Length - 1);
string s = "[1]Ali ahmadi,[2]Mohammad Razavi";
Regex regex = new Regex(@"\[(\d+)\]", RegexOptions.Compiled);
foreach (Match match in regex.Matches(s))
{
Console.WriteLine(match.Groups[1].Value);
}
This will capture the numbers between brackets (\d+), and store them in the first matched group (Groups[1]).
Using a Linq-based approach on João's answer:
string s = "[1]Ali ahmadi,[2]Mohammad Razavi";
var digits = Regex.Matches(s, @"\[(\d+)\]")
.Cast<Match>()
.Select(m => m.Groups[1].Value)
.ToList();
foreach (var match in digits)
{
Console.WriteLine(match);
}
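To get the comma-separated result the question asks for (1,2), the captured values can be joined directly, which also avoids trimming a trailing comma by hand. This is a minimal sketch; `BracketNumbers` and `Join` are hypothetical names, not part of any answer above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class BracketNumbers
{
    // Collects every number found between square brackets and joins
    // them with commas; returns an empty string when nothing matches.
    public static string Join(string text) =>
        string.Join(",",
            Regex.Matches(text, @"\[(\d+)\]")
                 .Cast<Match>()
                 .Select(m => m.Groups[1].Value));
}
```

For the sample text, `BracketNumbers.Join("[1]Ali ahmadi,[2]Mohammad Razavi")` returns `"1,2"`.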

How do I "cut" out part of a string with a regex?

I need to cut out and save/use part of a string in C#. I figure the best way to do this is by using Regex. My string looks like this:
"changed from 1 to 10".
I need a way to cut out the two numbers and use them elsewhere. What's a good way to do this?
Error checking left as an exercise...
Regex regex = new Regex( @"\d+" );
MatchCollection matches = regex.Matches( "changed from 1 to 10" );
int num1 = int.Parse( matches[0].Value );
int num2 = int.Parse( matches[1].Value );
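One way the omitted error checking might look is sketched below: check that at least two numbers were found, and use int.TryParse so that an overly long digit run does not throw. `NumberPair` and `TryGetTwoNumbers` are hypothetical names, and `[0-9]+` is used instead of `\d+` only to restrict the match to ASCII digits.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NumberPair
{
    // Returns true and sets both outputs when the text contains at
    // least two parseable integers; otherwise returns false.
    public static bool TryGetTwoNumbers(string text, out int first, out int second)
    {
        first = 0;
        second = 0;
        MatchCollection matches = Regex.Matches(text, @"[0-9]+");
        if (matches.Count < 2)
        {
            return false;
        }
        return int.TryParse(matches[0].Value, out first)
            && int.TryParse(matches[1].Value, out second);
    }
}
```

With the sample string, `NumberPair.TryGetTwoNumbers("changed from 1 to 10", out int a, out int b)` returns true with `a` equal to 1 and `b` equal to 10.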
Matching only exactly the string "changed from x to y":
string pattern = @"^changed from ([0-9]+) to ([0-9]+)$";
Regex r = new Regex(pattern);
Match m = r.Match(text);
if (m.Success) {
// Groups[0] is the whole match; the two numbers are in groups 1 and 2
int from = Convert.ToInt32(m.Groups[1].Value);
int to = Convert.ToInt32(m.Groups[2].Value);
// Do stuff
} else {
// Error, regex did not match
}
In your regex, put the fields you want to record in parentheses, and then use the Match.Groups property to extract the matched fields.
There's a C# example here.
Use named capture groups.
Regex r = new Regex(@"\D*(?<FirstNumber>[0-9]{1,2})\D+(?<SecondNumber>[0-9]{1,2})");
string input = "changed from 1 to 10";
string firstNumber = "";
string secondNumber = "";
MatchCollection joinMatches = r.Matches(input);
foreach (Match m in joinMatches)
{
firstNumber= m.Groups["FirstNumber"].Value;
secondNumber= m.Groups["SecondNumber"].Value;
}
Get Expresso to help you out; it has an export to C# option.
DISCLAIMER: Regex is probably not right (my copy of expresso expired :D)
Here is a code snippet that does almost what I wanted:
using System.Text.RegularExpressions;
string text = "changed from 1 to 10";
string pattern = @"\b(?<digit>\d+)\b";
Regex r = new Regex(pattern);
MatchCollection mc = r.Matches(text);
foreach (Match m in mc) {
CaptureCollection cc = m.Groups["digit"].Captures;
foreach (Capture c in cc){
Console.WriteLine((Convert.ToInt32(c.Value)));
}
}
