Better string replacing in C# - c#

Does anyone have an idea which would be better for potential string replacement?
I have a collection (of varying length) of strings of varying lengths, some of which might need encoded hex values replaced (e.g. =0A, %20, etc.).
The replacements (there could be several per string) would be handled by a regular expression that detects the escaped hex values.
Which would be more efficient?
To simply run the replacement on every string in the collection, ensuring by brute force that all needed replacements are done, or
to test whether a replacement is needed and only run it on the strings that need it?
I'm working in C#.
Update
A little additional info from the answers and comments.
This is primarily for VCARD processing that is loaded from a QR Code
I currently have a regex that uses capture groups to get the KEY, PARAMETERS and VALUE from each KEY;PARAMETERS:VALUE in the VCARD.
Since I'm supporting VCARD versions 2.1 and 3.0, whose encoding and line folding are VERY different, I need to know the version before I decode.
It doesn't make sense to me to run the entire regular expression JUST to get the version, apply the appropriate replacement to the whole VCARD block of text, and THEN rerun the regular expression.
To me it makes more sense to load up my capture groups, then grab the version and do the appropriate decoding replacement on each match.

If you simply call Replace, it will be slightly slower when there is no match, because of the additional argument checks that Replace performs, e.g.:
if (replacement == null)
{
throw new ArgumentNullException("replacement");
}
Regex.Replace returns the input string itself if no matches are found, so there's no memory issue here:
Match match = regex.Match(input, startat);
if (!match.Success)
{
return input;
}
When there is a match, regex.Match runs twice: once when you check, and again inside Replace. That means check-then-replace will be slower in that case.
So your results will depend on two things:
Do you expect a lot of matches or a lot of misses?
When there are matches, does the cost of running Regex.Match twice outweigh the extra parameter checks? My guess is that it probably will.
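To make the two strategies concrete, here is a minimal sketch. The HexDecoder class, its method names and the =XX pattern are assumptions for illustration, not the asker's actual code; timing both methods on representative data is the only way to know which one wins.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HexDecoder
{
    // Assumed pattern: '=' followed by exactly two hex digits (e.g. =0A).
    private static readonly Regex Encoded =
        new Regex("=([0-9A-Fa-f]{2})", RegexOptions.Compiled);

    // Strategy 1: always call Replace and let it deal with "no match".
    public static string DecodeAlways(string input)
    {
        return Encoded.Replace(
            input,
            m => ((char)Convert.ToInt32(m.Groups[1].Value, 16)).ToString());
    }

    // Strategy 2: test first, replace only when needed.
    // On a hit, the pattern is evaluated twice (IsMatch, then Replace).
    public static string DecodeIfNeeded(string input)
    {
        return Encoded.IsMatch(input) ? DecodeAlways(input) : input;
    }
}
```

Both methods return the same strings; only the amount of work per call differs.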

You could use something along the lines of a very specialized lexer with look-ahead checking, e.g.,
outputBuffer := new StringBuilder
index := 0
max := input.Length
while index < max
    if index + 2 < max
    && input[ index ] == '%'
    && IsHexDigit( input[ index + 1 ] )
    && IsHexDigit( input[ index + 2 ] )
        outputBuffer.Append( ( char )Convert.ToInt32( input.Substring( index + 1, 2 ), 16 ) )
        index += 3
    else
        outputBuffer.Append( input[ index ] )
        index += 1
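A runnable C# version of that lexer could look roughly like the following. PercentDecoder and IsHexDigit are names made up for this sketch, and it only handles the %XX form; anything that is not a complete escape is copied through unchanged.

```csharp
using System;
using System.Text;

public static class PercentDecoder
{
    private static bool IsHexDigit(char c)
    {
        return (c >= '0' && c <= '9')
            || (c >= 'a' && c <= 'f')
            || (c >= 'A' && c <= 'F');
    }

    public static string Decode(string input)
    {
        var output = new StringBuilder(input.Length);
        int index = 0;
        while (index < input.Length)
        {
            // Look ahead two characters, but only if they exist.
            if (input[index] == '%'
                && index + 2 < input.Length
                && IsHexDigit(input[index + 1])
                && IsHexDigit(input[index + 2]))
            {
                // Parse the two hex digits as base 16 and emit the character.
                output.Append((char)Convert.ToInt32(input.Substring(index + 1, 2), 16));
                index += 3;
            }
            else
            {
                output.Append(input[index]);
                index++;
            }
        }
        return output.ToString();
    }
}
```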

If you go with plain string replacement, it may be better to use StringBuilder.Replace than string.Replace, since it will not create a new temporary string for every replacement.

(Posted on behalf of the question author).
Taking some inspiration from some of the fine folks who chimed in, I managed to isolate and test the code in question.
In both cases I have a Parser Regex that handles breaking up each "line" of the vcard and a Decode Regex that handles capturing any encoded Hex numbers.
It occurred to me that regardless of my use of string.Replace or not I still had to depend on the Decode Regex to pick up the potential replacement hex codes.
I ran through several different scenarios to see if the numbers would change, including casting the Regex MatchCollection to a Dictionary to remove the complexity of the Match object, and projecting the Decode Regex matches into a collection of distinct, simple anonymous objects with Old and New string values for simple string.Replace calls.
In the end, no matter how I massaged the test, the String.Replace version came close but was always slower than letting the Decode Regex do its Replace thing.
The closest was about a 12% difference in speed.
For those curious, this is what ended up as the winning block of code:
var ParsedCollection = Parser.Matches(UnfoldedEncodeString).Cast<Match>().Select(m => new
{
Field = m.Groups["FIELD"].Value,
Params = m.Groups["PARAM"].Captures.Cast<Capture>().Select(c => c.Value),
Encoding = m.Groups["ENCODING"].Value,
Content = m.Groups["ENCODING"].Value.FirstOrDefault() == 'Q' ? QuotePrintableDecodingParser.Replace(m.Groups["CONTENT"].Value, me => Convert.ToChar(Convert.ToInt32(me.Groups["HEX"].Value, 16)).ToString()) : m.Groups["CONTENT"].Value,
Base64Content = ((m.Groups["ENCODING"].Value.FirstOrDefault() == 'B') ? Convert.FromBase64String(m.Groups["CONTENT"].Value.Trim()) : null)
});
This gives me everything I need in one shot: all the fields, their values, any parameters, and the two most common encodings decoded, all projected into a nicely packaged anonymous object.
On the plus side, it takes only a little over 1000 nanoseconds from string to parsed and decoded anonymous object (thank goodness for LINQ and extension methods), based on 100,000 tests with a VCARD of around 4,000 characters.

Related

Finding the beginning and end of a substring using regex

I have an awful time with regular expressions, so I usually resort to lousy kludges and workarounds when parsing strings. I need to get better at using regex. This one seems simple to me, but I don't even know where to start.
Here's the string output from my device:
testString = IP:192.168.5.210\rPlaylist:1\rEnable:On\rMode:HDMI\rLineIn:unbal\r
Example:
I want to find if the device is off or on. I need to search for the string "Enable:" then locate the carriage return and determine if the word between Enable: and \r is off or on. It seems like that's what regex is for or do I totally misunderstand it.
Can someone point me in the right direction?
Additional information - Maybe I need to expand on the question.
Based on the answers, finding whether or not the device is enabled appears to be fairly simple. Since the return string is similar to a list of key/value pairs, what's more vexing is determining the substring between the : and the carriage return. A number of these pairs have responses whose lengths vary significantly, such as DeviceLocation, DeviceName and IPAddress. In fact, the device responds to every command sent to it by returning the entire status list, 48 key/value pairs, which I must then parse even if I only need to know one property.
Also, based on your answers, regular expressions are not the way to go.
Thanks for any help.
Norm
For a simple line like the one shown, I would suggest checking for one value or the other, but verifying both. This is based partially on Ken White's suggestions.
if (input.Contains(":On"))
{
    // DoWork()
}
else if (input.Contains(":Off"))
{
    // DoOtherWork()
}
This presumes that ":On" and ":Off" will not appear anywhere else in the string, for example as the value of a different key.
Consider the following code:
// This regular expression matches the text 'Enable: ' followed by one or more non-'\r' characters, followed by '\r'.
// RegexOptions.Multiline only affects ^ and $, which this pattern does not use, so it is optional here.
// Also, '\r' is not a line break. '\n' is.
Regex regex = new Regex("Enable: ([^\r]+)\r", RegexOptions.Multiline);
string input = "IP:192.168.5.210\rPlaylist: 1\rEnable: On\rMode: HDMI\rLineIn: unbal\r";
Match match = regex.Match(input);
Debug.Assert(match.Success);
// The match will contain 2 groups:
// Groups[0] is the whole match, 'Enable: On\r'.
// Groups[1] is 'On', since we enclosed [^\r]+ in ().
Console.WriteLine(match.Groups[1].Value);
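Since the device always returns the whole status list, another option is to parse every pair once and look values up by key. The following is only a sketch under the assumption that each pair has the form key:value and is terminated by '\r', as in the sample string; StatusParser is a made-up name.

```csharp
using System;
using System.Collections.Generic;

public static class StatusParser
{
    public static Dictionary<string, string> Parse(string status)
    {
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

        // Each pair ends with '\r'; drop the empty entry after the last one.
        foreach (string pair in status.Split(new[] { '\r' }, StringSplitOptions.RemoveEmptyEntries))
        {
            // Split on the first ':' only, so values may contain colons.
            int colon = pair.IndexOf(':');
            if (colon <= 0)
            {
                continue; // not a key:value pair, skip it
            }
            result[pair.Substring(0, colon).Trim()] = pair.Substring(colon + 1).Trim();
        }
        return result;
    }
}
```

With the sample string from the question, Parse(testString)["Enable"] yields "On", and the same dictionary answers any of the other keys without another pass over the string.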

Regex matching only when condition has not been met

I have kind of a weird problem that I am attempting to resolve with some elegant regular expressions.
The system I am working on was initially designed to accept an incoming string and through a pattern matching method, alter the string which it then returns. A very simplistic example is:
Incoming string:
The dog & I went to the park and had a great time...
Outgoing string:
The dog {&} I went to the park and had a great time {...}
The punctuation mapper wraps key characters or phrases in curly braces. The original implementation was a one-way street and was never meant to be used the way it is currently being applied. As a result, if it is called incorrectly, it is very easy for the system to "double" wrap a string, as it is just doing a simple string replace.
I spun up Regex Hero this morning and started working on some pattern matches and having not written a regular expression in nearly a year, quickly hit a wall.
My first idea was to match a character (e.g. &) but only if it wasn't wrapped in braces, and I came up with [^\{]&[^\}]. That is great, but of course it catches any instance of the ampersand so long as it is not preceded by a curly brace, including the surrounding white space, and it would not work in a situation where there were two ampersands back to back (i.e. && would need to be {&}{&} in the outgoing string). To make matters more complicated, it is not always a single character, as the ellipsis (...) is also one of the mapped values.
Every solution I noodle over hits a barrier: either there is an unknown number of occurrences of a particular value in the string, or the capture groups are too greedy, or the pattern cannot cope with multiple values back to back (e.g. a single period . vs. an ellipsis ...). The original dev handled that last case by processing the ellipsis first, which covered the period in the string replace implementation.
Are there any regex gurus out there that have any ideas on how I can detect the undecorated (unwrapped) values in a string and then perform their replacements in an ungreedy fashion that can also handle multiple repeated characters?
My datasource that I am working against is a simple key value pair that contains the value to be searched for and the value to replace it with.
Updated with example strings:
Undecorated:
Show Details...
Default Server:
"Smart" 2-Way
Show Lender's Information
Black & White
Decorated:
Show Details{...}
Default Server{:}
{"}Smart{"} 2-Way
Show Lender{'}s Information
Black {&} White
Updated With More Concrete Examples and Datasource
Datasource (SQL table, can grow at any time):
TaggedValue UntaggedValue
{:} :
{&} &
{<} <
{$} $
{'} '
{} \
{>} >
{"} "
{%} %
{...} ...
{...} …
{:} :
{"} “
{"} ”
{'} `
{'} ’
Broken String: This is a string that already has stuff {&} other stuff{!} and {...} with {_} and {#} as well{.} and here are the same characters without it & follow by ! and ... _ & . &&&
String that needs decoration: Show Details... Default Server: "Smart" 2-Way Show Lender's Information Black & White
String that would pass through the method untouched (because it was already decorated): The dog {&} I went to the park and had a great time {...}
The other "gotcha" in moving to regex is the need to handle escaping, especially of backslashes elegantly due to their function in regular expressions.
Updated with output from @Ethan Brown
@Ethan Brown,
I am starting to think that regex, while elegant, might not be the way to go here. The updated code you provided, while closer, still does not yield correct results, and the number of variables involved may exceed what regex logic can handle.
Using my example above:
'This is a string that already has stuff {&} other stuff{!} and {...} with {_} and {#} as well{.} and here are the same characters without it & follow by ! and ... _ & . &&&'
yields
This is a string that already has stuff {&} other stuff{!} and {...} with {_} and {#} as well{.} and here are the same characters without it {&} follow by {!} and {...} {_} {&} . {&&}&
Where the last group of ampersands which should come out as {&}{&}{&} actually comes out as {&&}&.
There is so much variability here (i.e. need to handle ellipsis and wide ellipsis from far east languages) and the need to utilize a database as the datasource is paramount.
I think I am just going to write a custom evaluator which I can easily enough write to perform this type of validation and shelve the regex route for now. I will grant you credit for your answer and work as soon as I get in front of a desktop browser.
This kind of problem can be really tough, but let me give you some ideas that might help out. One thing that's really going to give you headaches is handling the case where the punctuation appears at the beginning or end of the string. Certainly that's possible to handle in a regex with a construct like (^|[^{])&($|[^}]), but in addition to that being painfully hard to read, it also has efficiency issues. However, there's a simple way to "cheat" and get around this problem: just pad your input string with a space on either end:
var input = " " + originalInput + " ";
When you're done you can just trim. Of course if you care about preserving input at the beginning or end, you'll have to be more clever, but I'm going to assume for argument's sake that you don't.
So now on to the meat of the problem. Certainly, we can come up with some elaborate regular expressions to do what we're looking for, but often the answer is much much simpler if you use more than one regular expression.
Since you've updated your question with more characters and more problem inputs, I've updated this answer to be more flexible; hopefully it will meet your needs better as more characters get added.
Looking over your input space, and the expressions you need quoted, there are really three cases:
Single-character replacements (! becomes {!}, for example).
Multi-character replacements (... becomes {...}).
Backslash replacement (\ becomes {})
Since the period is included in the single-character replacements, order matters: if you replace all the periods first, then you will miss ellipses.
Because I find the C# regex library a little clunky, I use the following extension method to make this more "fluent":
public static class StringExtensions {
public static string RegexReplace( this string s, string regex, string replacement ) {
return Regex.Replace( s, regex, replacement );
}
}
Now I can cover all of the cases:
// putting this into a const will make it easier to add new
// characters in the future
const string normalQuotedChars = @"\!_\\:&<\$'>""%:`";
var output = s
.RegexReplace( "(?<=[^{])\\.\\.\\.(?=[^}])", "{$&}" )
.RegexReplace( "(?<=[^{])[" + normalQuotedChars + "](?=[^}])", "{$&}" )
.RegexReplace( "\\\\", "{}" );
So let's break this solution down:
First we handle the ellipses (which will keep us from getting in trouble with periods later). Note that we use zero-width assertions at the beginning and end of the expression to exclude expressions that are already quoted. The zero-width assertions are necessary because, without them, we'd get into trouble with quoted characters right next to each other. For example, if you have the regex ([^{])!([^}]) and your input string is foo !! bar, the match would include the space before the first exclamation point and the second exclamation point. A naive replacement of $1{!}$2 would therefore yield foo {!}! bar, because the second exclamation point would have been consumed as part of the match. You'd end up having to do an exhaustive match, and it's much easier to just use zero-width assertions, which do not consume anything.
Then we handle all of the normal quoted characters. Note that we use zero-width assertions here for the same reasons as above.
Finally, we find lone backslashes (note that we have to escape the backslash twice: once for C# strings and again for regex metacharacters) and replace them with empty curly brackets.
I ran all of your test cases (and a few of my own invention) through this series of matches, and it all worked as expected.
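To see the difference between consuming and zero-width matching on the foo !! bar example, here is a small sketch. ExclamationWrapper is a made-up name, and the zero-width version swaps in negative lookarounds, (?<!\{) and (?!\}), in place of the (?<=[^{]) and (?=[^}]) used above; negative lookarounds also succeed at the very start and end of the string, so no padding is needed for this variant.

```csharp
using System.Text.RegularExpressions;

public static class ExclamationWrapper
{
    // Consuming version: the characters on either side of '!' become
    // part of the match, so an adjacent '!' is swallowed and left unwrapped.
    public static string WrapConsuming(string input)
    {
        return Regex.Replace(input, @"([^{])!([^}])", "$1{!}$2");
    }

    // Zero-width version: the lookarounds only inspect the neighbours,
    // so every unwrapped '!' is matched on its own.
    public static string WrapZeroWidth(string input)
    {
        return Regex.Replace(input, @"(?<!\{)!(?!\})", m => "{" + m.Value + "}");
    }
}
```

WrapConsuming("foo !! bar") gives foo {!}! bar, while WrapZeroWidth("foo !! bar") gives foo {!}{!} bar and leaves an already wrapped {!} alone.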
I'm no regex god, so one simple way:
Get / construct the final replacement string(s) - ex. "{...}", "{&}"
Replace all occurrences of these in the input with a reserved char (unicode to the rescue)
Run your matching regex(es) and put "{" or whatever desired marker(s).
Replace reserved char(s) with the original string.
Ignoring the case where your original input string has a { or } character, a common way to avoid re-applying a regex to an already-escaped string is to look for the escape sequence and remove it from the string before applying your regex to the remainders. Here's an example regex to find things that are already escaped:
Regex escapedPattern = new Regex(@"\{[^{}]*\}"); // consider adding RegexOptions.Compiled
The basic idea of this negated character class pattern comes from regular-expressions.info, a very helpful site for all things regex. The pattern works because for any innermost pair of braces, there must be a { followed by non-brace characters followed by a }.
Run escapedPattern on the input string and, for each Match, get the start and end indices in the original string and cut that substring out. Then, with the final cleaned string, run your original pattern match again or use something like the following:
Regex punctPattern = new Regex(@"[^\w\d\s]+"); // this assumes all non-word,
// digit or space chars are punctuation, which may not be a correct
// assumption
Then replace each match with "{" + Match.Value + "}". (Groups is a 0-based collection where 0 is the whole match, 1 is the first set of parentheses, 2 is the next, etc.; punctPattern has no parentheses, so the whole match, Groups[0], is the value to wrap.)
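A variation on the same idea, sketched below, skips the cut-and-reinsert step: put the "already wrapped" pattern first in an alternation so that wrapped tokens are consumed whole and passed through untouched. The Decorator class and the shape of the map (untagged value to tagged value, as in the question's datasource) are assumptions for illustration, and like the answer above it ignores input that contains stray { or } characters.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class Decorator
{
    // 'map' goes from untagged value to tagged value, e.g. "&" -> "{&}".
    // It is assumed to be non-empty.
    public static string Decorate(string input, IDictionary<string, string> map)
    {
        // Longest keys first, so "..." is tried before any single character.
        string alternatives = string.Join("|",
            map.Keys.OrderByDescending(k => k.Length).Select(k => Regex.Escape(k)));

        // The first alternative consumes an already wrapped token as a whole.
        var pattern = new Regex(@"\{[^{}]*\}|" + alternatives);

        return pattern.Replace(input, m =>
        {
            string tagged;
            // Wrapped tokens are not keys in the map, so they are returned as-is.
            return map.TryGetValue(m.Value, out tagged) ? tagged : m.Value;
        });
    }
}
```

Because the regex engine moves left to right and never revisits consumed text, &&& comes out as {&}{&}{&} and an existing {...} is left alone. Regex.Escape takes care of keys such as the backslash.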

Find all kinds of numbers within a string

The string can contain ints, floats and hexadecimal numbers, for example:
"This a string than can have -345 and 57 and could also have 35.4656 or a subtle 0xF46434 and more"
What could I use to find these numbers in C#?
Use something along these lines: (I wrote it myself, so I'm not going to say it's all-inclusive for whatever sort of numbers you're looking to find, but it works for your example)
var str = "123 This a string than can have -345 and 57 and could also have 35.4656 or a subtle 0XF46434 and more like -0xf46434";
var a = Regex.Matches(str, @"(?<=(^|[^a-zA-Z0-9_^]))(-?\d+(\.\d+)?|-?0[xX][0-9A-Fa-f]+)(?=([^a-zA-Z0-9_]|$))");
foreach (Match match in a)
{
//do something
}
Regex seems to be a write-only language, (i.e. incredibly hard to read) so I'll break it down so you can understand: (?<=(^|[^a-zA-Z0-9_^])) is a lookbehind to break it by a word boundary. I can't use \b because it considers - a boundary character, so it would only match 345 instead of -345. -?\d+(\.\d+)? matches decimal numbers, optionally negative, optionally with fractional digits. -?0[xX][0-9A-Fa-f]+ matches hexadecimal numbers, case insensitive, optionally negative. Finally, (?=([^a-zA-Z0-9_]|$)) is a lookahead, again as a word boundary. Note that in the first boundary, I allowed for the start of the string, and here I allow for the end of the string.
Just try to parse each word to double and return the array of doubles.
Here is a way to get array of doubles from a string:
double[] GetNumbers(string str)
{
double num;
List<double> l = new List<double>();
foreach (string s in str.Split(' '))
{
bool isNum = double.TryParse(s, out num);
if (isNum)
{
l.Add(num);
}
}
return l.ToArray();
}
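See the documentation for double.TryParse() for more info. Note that double.TryParse will not recognize hexadecimal tokens such as 0xF46434, so those are skipped by this approach.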
Given your input above this expression matches every number present there
string line = "This a string than can have " +
"-345 and 57 and could also have 35.4656 " +
"or a subtle 0xF46434 and more";
Regex r = new Regex(@"(-?0[Xx][A-Fa-f0-9]+|-?\d+\.\d+|-?\d+)");
var m = r.Matches(line);
foreach(Match h in m)
Console.WriteLine(h.ToString());
EDIT: to replace the matches, use the Replace overload that takes a MatchEvaluator:
string result = r.Replace(line, new MatchEvaluator(replacementMethod));
public string replacementMethod(Match match)
{
return "?????";
}
Explaining the regex pattern
First, the sequence "(pattern1|pattern2|pattern3)" means that we have three possible pattern to find in our string. One of them is enough to have a match
First pattern -?0[Xx][A-Fa-f0-9]+ means an optional minus followed by a zero followed by an X or x char followed by a series of one or more chars in the range A-F a-f or 0-9
Second pattern -?\d+\.\d+ means an optional minus followed by a series of 1 or more digits followed by the decimal point followed by a series of 1 or more digits
Third pattern -?\d+ means an optional minus followed by a series of 1 or more digits.
The sequence of patterns is of utmost importance. If you reverse the pattern and put the integer match before the decimal pattern the results will be wrong.
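If the goal is numeric values rather than matched text, the matches can be converted afterwards. The sketch below reuses the three-alternative pattern from this answer; the NumberFinder class is a made-up name, hex values are assumed to fit in a 64-bit integer, and decimals are parsed with the invariant culture so that "." is always the decimal separator.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public static class NumberFinder
{
    // Same alternation order as above: hex, then decimal, then integer.
    private static readonly Regex Numbers =
        new Regex(@"-?0[Xx][A-Fa-f0-9]+|-?\d+\.\d+|-?\d+");

    public static List<double> FindValues(string text)
    {
        var values = new List<double>();
        foreach (Match m in Numbers.Matches(text))
        {
            string token = m.Value;
            bool negative = token[0] == '-';
            string body = negative ? token.Substring(1) : token;

            double value;
            if (body.StartsWith("0x", StringComparison.OrdinalIgnoreCase))
            {
                // Hex digits after the 0x prefix; throws if they exceed 64 bits.
                value = Convert.ToInt64(body.Substring(2), 16);
            }
            else
            {
                value = double.Parse(body, CultureInfo.InvariantCulture);
            }
            values.Add(negative ? -value : value);
        }
        return values;
    }
}
```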
Besides regex, which tends to have its own problems, you can build a state machine to do the processing. You can decide on which inputs the machine would accept as 'numbers'. Unlike regex, a state machine will have predictably decent performance, and will also give you predictable results (whereas regex can sometimes match rather surprising things).
It's not really that difficult, when you think about it. There are rather few states, and you can define special cases explicitly.
EDIT: The following is an edit as a response to the comment.
In .NET, Regex is implemented as an NFA (Nondeterministic Finite Automaton). On one hand, it's a very powerful parser, but on the other, it can sometimes backtrack much more than it should. This is especially true when you're accepting unsafe input (input from the user, which can be just about anything). While I'm not sure what sort of regular expression you'll be using to parse the result, you can induce a performance hit in pretty much anything. Although in most cases performance is a non-issue, Regex running time can grow exponentially with the input length. That means that, in some cases, it really can be a bottleneck, and a rather unexpected one.
Another potential problem stemming from the greedy nature of Regex is that sometimes it can match unexpected things. You might use the same Regex expression for days, and it might work fine, waiting for the right combination of overlooked characters to be parsed, and you'll end up writing garbage into your database.
By state machine, I mean parsing the input using a deterministic finite automaton, or something like that. I'll show you what I mean. Here's a small DFA for parsing a positive decimal integer or float within a string. I'm pretty sure you can build a DFA using frameworks like ANTLR, though I'm sure there are also less powerful ones around.
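As a rough sketch of what such a state machine might look like when written by hand in C#: the NumberScanner name and its four states are made up for illustration, it only recognizes positive decimal integers and floats, and it makes no attempt at word boundaries or hex (so abc123 would yield 123).

```csharp
using System.Collections.Generic;

public static class NumberScanner
{
    private enum State { Start, Integer, Dot, Fraction }

    public static List<string> Scan(string input)
    {
        var results = new List<string>();
        var state = State.Start;
        int tokenStart = 0;

        // Run one step past the end with a sentinel so the last token is flushed.
        for (int i = 0; i <= input.Length; i++)
        {
            char c = i < input.Length ? input[i] : '\0';
            bool isDigit = c >= '0' && c <= '9';

            switch (state)
            {
                case State.Start:
                    if (isDigit) { tokenStart = i; state = State.Integer; }
                    break;

                case State.Integer:
                    if (isDigit) break;
                    if (c == '.') { state = State.Dot; break; }
                    results.Add(input.Substring(tokenStart, i - tokenStart));
                    state = State.Start;
                    break;

                case State.Dot:
                    if (isDigit) { state = State.Fraction; break; }
                    // The dot was not followed by a digit: emit the integer part only.
                    results.Add(input.Substring(tokenStart, i - 1 - tokenStart));
                    state = State.Start;
                    break;

                case State.Fraction:
                    if (isDigit) break;
                    results.Add(input.Substring(tokenStart, i - tokenStart));
                    state = State.Start;
                    break;
            }
        }
        return results;
    }
}
```

Each character is examined exactly once, so the running time is linear in the input length no matter what the input looks like.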

Remove all "invisible" chars from a string?

I'm writing a little class to read a list of key value pairs from a file and write to a Dictionary<string, string>. This file will have this format:
key1:value1
key2:value2
key3:value3
...
This should be pretty easy to do, but since a user is going to edit this file manually, how should I deal with whitespaces, tabs, extra line jumps and stuff like that? I can probably use Replace to remove whitespaces and tabs, but, is there any other "invisible" characters I'm missing?
Or maybe I can remove all characters that are not alphanumeric, ":" and line jumps (since line jumps are what separate one pair from another), and then remove all extra line jumps. If this, I don't know how to remove "all-except-some" characters.
Of course I can also check for errors like "key1:value1:somethingelse". But stuff like that doesn't really matter much because it's obviously the user's fault and I would just show a "Invalid format" message. I just want to deal with the basic stuff and then put all that in a try/catch block just in case anything else goes wrong.
Note: I do NOT need any whitespaces at all, even inside a key or a value.
I did this one recently when I finally got pissed off at too much undocumented garbage forming bad XML coming through in a feed. It effectively strips out anything that doesn't fall between a space and the ~ in the ASCII table (as written, the range also lets DEL, \x7F, through):
static public string StripControlChars(this string s)
{
return Regex.Replace(s, @"[^\x20-\x7F]", "");
}
}
Combined with the other RegEx examples already posted it should get you where you want to go.
If you use Regex (Regular Expressions) you can filter out all of that with one function.
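string newVariable = Regex.Replace(variable, @"\s", "");
That will remove all whitespace, including spaces, tabs, \n, and \r. Note that \s only covers whitespace; other invisible characters, such as control characters, are not removed by this pattern.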
One of the "white" spaces that regularly bites us is the non-breakable space. Also our system must be compatible with MS-Dynamics which is much more restrictive. First, I created a function that maps the 8th bit characters to their approximate 7th bit counterpart, then I removed anything that was not in the x20 to x7f range further limited by the Dynamics interface.
Regex.Replace(s, @"[^\x20-\x7F]", "")
should do that job.
The requirements are too fuzzy. Consider:
"When is a space a value? key?"
"When is a delimiter a value? key?"
"When is a tab a value? key?"
"Where does a value end when a delimiter is used in the context of a value? key"?
These problems will result in code filled with one-offs and a poor user experience. This is why we have language rules/grammar.
Define a simple grammar and take out most of the guesswork.
"{key}":"{value}",
Here you have a key/value pair contained within quotes and separated via a delimiter (,). All extraneous characters can be ignored. You could use XML, but this may scare off less techy users.
Note, the quotes are arbitrary. Feel free to replace with any set container that will not need much escaping (just beware the complexity).
Personally, I would wrap this up in a simple UI and serialize the data out as XML. There are times not to do this, but you have given me no reason not to.
var split = textLine.Split(':').Select(s => s.Trim()).ToArray();
The Trim() function will remove all the irrelevant whitespace. Note that this retains whitespace inside of a key or value, which you may want to consider separately.
You can use string.Trim() to remove white-space characters:
var results = lines
.Select(line => {
var pair = line.Split(new[] {':'}, 2);
return new {
Key = pair[0].Trim(),
Value = pair[1].Trim(),
};
}).ToList();
However, if you want to remove all white-spaces, you can use regular expressions:
var whiteSpaceRegex = new Regex(@"\s+", RegexOptions.Compiled);
var results = lines
.Select(line => {
var pair = line.Split(new[] {':'}, 2);
return new {
Key = whiteSpaceRegex.Replace(pair[0], string.Empty),
Value = whiteSpaceRegex.Replace(pair[1], string.Empty),
};
}).ToList();
If it doesn't have to be fast, you could use LINQ:
string clean = new String(tainted.Where(c => 0 <= "ABCDabcd1234:\r\n".IndexOf(c)).ToArray());
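Putting the pieces together, a sketch of the whole read step might look like this. KeyValueFile is a made-up name, and the definition of "invisible" used here (whitespace, control characters, and Unicode format characters such as the byte order mark) is an assumption; adjust it to whatever your users actually manage to type.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class KeyValueFile
{
    // Drops whitespace, control characters and Unicode "format" characters.
    private static string StripInvisible(string s)
    {
        return new string(s.Where(c =>
            !char.IsWhiteSpace(c)
            && !char.IsControl(c)
            && char.GetUnicodeCategory(c) != UnicodeCategory.Format).ToArray());
    }

    public static Dictionary<string, string> Parse(IEnumerable<string> lines)
    {
        var result = new Dictionary<string, string>();
        foreach (string raw in lines)
        {
            string line = StripInvisible(raw);
            if (line.Length == 0)
            {
                continue; // blank or whitespace-only line
            }

            // Exactly one ':' with a non-empty key, otherwise the line is invalid.
            string[] parts = line.Split(':');
            if (parts.Length != 2 || parts[0].Length == 0)
            {
                throw new FormatException("Invalid format: " + raw);
            }
            result[parts[0]] = parts[1];
        }
        return result;
    }
}
```

Feeding it File.ReadLines(path) would cover the file case; a line such as key1:value1:somethingelse raises the FormatException that the "Invalid format" message can be built on.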

In C# regular expression why does the initial match show up in the groups?

So if I write a regex and run it, I can get the match or I can access its groups, and the whole match shows up in the groups as well. This seems counterintuitive, since the groups are defined in the expression with parentheses "(" and ")". It seems like it is not only wrong but redundant. Anyone know why?
Regex quickCheck = new Regex(@"(\D+)\d+");
string source = "abc123";
Match m = quickCheck.Match(source);
m.Value //Equals source
m.Groups.Count //Equals 2
m.Groups[0].Value //Equals source
m.Groups[1].Value //Equals "abc"
I agree - it is a little strange, however I think there are good reasons for it.
A Regex Match is itself a Group, which in turn is a Capture.
But the Match.Value (or Capture.Value as it actually is) is only valid when one match is present in the string - if you're matching multiple instances of a pattern, then by definition it can't return everything. In effect, the Value property on the Match is a convenience for when there is only one match.
But to clarify where this behaviour of passing the whole match into Groups[0] makes sense - consider this (contrived) example of a naive code unminifier:
[TestMethod]
public void UnMinifyExample()
{
string toUnMinify = "{int somevalue = 0; /*init the value*/} /* end */";
string result = Regex.Replace(toUnMinify, @"(;|})\s*(/\*[^*]*?\*/)?\s*", "$0\n");
Assert.AreEqual("{int somevalue = 0; /*init the value*/\n} /* end */\n", result);
}
The regex match will preserve /* */ comments at the end of a statement, placing a newline afterwards - but works for either ; or } line-endings.
Okay - you might wonder why you'd bother doing this with a regex - but humour me :)
If Groups[0] generated by the matches for this regex was not the whole capture - then a single-call replace would not be possible - and your question would probably be asking why doesn't the whole match get put into Groups[0] instead of the other way round!
The documentation for Match says that the first group is always the entire match so it's not an implementation detail.
It's historical is all. In Perl 5, the contents of capture groups are stored in the special variables $1, $2, etc., but C#, Java, and others instead store them in an array (or array-like structure). To preserve compatibility with Perl's naming convention (which has been copied by several other languages), the first group is stored in element number one, the second in element two, etc. That leaves element zero free, so why not store the full match there?
FYI, Perl 6 has adopted a new convention, in which the first capturing group is numbered zero instead of one. I'm sure it wasn't done just to piss us off. ;)
Most likely so that you can use "$0" to represent the match in a substitution expression, and "$1" for the first group match, etc.
I don't think there's really an answer other than that the person who wrote this chose it as an implementation detail. As long as you remember that the first group will always equal the entire matched text, you should be OK :-)
Not sure why either. If you use named groups, you can set the option RegexOptions.ExplicitCapture so that unnamed parentheses no longer capture; note, however, that Groups[0] still holds the entire match even then.
It might be redundant, however it has some nice properties.
For example, it means the capture groups work the same way as other regex engines - the first capture group corresponds to "1", and so on.
Backreferences are one-based, e.g., \1 or $1 is the first parenthesized subexpression, and so on. As laid out, one maps to the other without any thought.
Also of note: m.Groups["0"] gives you the entire matched substring, so be sure to skip "0" if you're iterating over regex.GetGroupNames().
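A small sketch of that last point: iterate over GetGroupNames() and skip "0" to get only the groups you defined yourself. GroupDemo and the name=value output format are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class GroupDemo
{
    // Returns "name=value" for every group except group 0 (the whole match).
    public static List<string> CapturedGroups(string pattern, string input)
    {
        var regex = new Regex(pattern);
        Match m = regex.Match(input);
        var values = new List<string>();
        if (!m.Success)
        {
            return values;
        }

        foreach (string name in regex.GetGroupNames())
        {
            if (name == "0")
            {
                continue; // group 0 is always the entire match
            }
            values.Add(name + "=" + m.Groups[name].Value);
        }
        return values;
    }
}
```

For the pattern (\D+)\d+ and the input 12abc123!, this returns a single entry, 1=abc, while Groups[0] would have been abc123: the matched text, not the whole source string.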
