How to get the matched sub-groups in C# using Regex? - c#

I have a string:
{Lower Left ( 460700.000, 2121200.000)}
and here is my code:
var pat = @"Lower Left\s*\(\s*[\d\.]+\,(\s)*[\d\.]+\)";
var r = new Regex(pat, RegexOptions.IgnoreCase);
var m = r.Match(s);
The m.Groups[0] now equals:
{Lower Left ( 460700.000, 2121200.000)}
But I want to get the coordinate strings in two variables, e.g. X and Y. How can I do that?

You could do it like this:
string s = "{Lower Left ( 460700.000, 2121200.000)}";
var pat = @"Lower Left\s*\(\s*(\d+\.\d+)\,\s*(\d+\.\d+)\)";
var r = new Regex(pat, RegexOptions.IgnoreCase);
var m = r.Match(s);
Console.WriteLine(m.Groups[1]); // first number
Console.WriteLine(m.Groups[2]); // second number
If the numbers may or may not contain a decimal point, you can use:
string s = "{Lower Left ( 460700.000, 2121200.000)}";
var pat = @"Lower Left\s*\(\s*(\d+(?:\.\d+)?)\,\s*(\d+(?:\.\d+)?)\)";
var r = new Regex(pat, RegexOptions.IgnoreCase);
var m = r.Match(s);
Console.WriteLine(m.Groups[1]);
Console.WriteLine(m.Groups[2]);
This accepts 123456 (no dot) and 123.456 (one dot inside), but not 123.456.7 (two dots) or 1234. (trailing dot).

The first group (index 0) always holds the entire match, while the groups at higher indexes hold the values captured by the parenthesized groups. So you need m.Groups[1] and m.Groups[2] respectively.
You can also name your groups:
@"Lower Left\s*\(\s*(?<X>\d+\.\d+),(\s)*(?<Y>\d+\.\d+)\)";
Here (?<identifier>anyPattern) defines a capturing group named identifier that matches the pattern given by anyPattern.
You can then access the groups by name:
m.Groups["X"]
m.Groups["Y"]
The square brackets ([]) are not needed either: [\d\.]+ means "one or more characters, each of which is a digit or a dot", not "a number of digits followed by a dot followed by a number of digits".
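Putting the named groups to work, a minimal sketch might look like the following. The names `CoordinateParser` and `TryParseLowerLeft` are made up for illustration, and the sketch assumes the coordinates always use a dot as the decimal separator, which is why it parses with the invariant culture.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class CoordinateParser
{
    // Named groups X and Y capture the two numbers; the fractional part is optional.
    private static readonly Regex LowerLeft = new Regex(
        @"Lower Left\s*\(\s*(?<X>\d+(?:\.\d+)?)\s*,\s*(?<Y>\d+(?:\.\d+)?)\s*\)",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: returns false when the input does not match.
    public static bool TryParseLowerLeft(string input, out double x, out double y)
    {
        x = 0;
        y = 0;
        Match m = LowerLeft.Match(input);
        if (!m.Success)
        {
            return false;
        }
        // InvariantCulture so that "." is always read as the decimal separator.
        x = double.Parse(m.Groups["X"].Value, CultureInfo.InvariantCulture);
        y = double.Parse(m.Groups["Y"].Value, CultureInfo.InvariantCulture);
        return true;
    }

    public static void Main()
    {
        string s = "{Lower Left ( 460700.000, 2121200.000)}";
        if (TryParseLowerLeft(s, out double x, out double y))
        {
            Console.WriteLine(x.ToString(CultureInfo.InvariantCulture));
            Console.WriteLine(y.ToString(CultureInfo.InvariantCulture));
        }
    }
}
```

Checking m.Success before reading the groups avoids silently working with empty strings when the input has a different shape.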

Related

RegEx string between N and (N+1)th Occurrence

I am trying to find the nth occurrence of a substring between two special characters. For example:
one|two|three|four|five
Say I am looking for the string between the nth and (n+1)th occurrence of the '|' character, here the 2nd and 3rd, which turns out to be 'three'. I want to do it using a regex. Could someone guide me?
My current attempt is as follows:
string subtext = "zero|one|two|three|four";
Regex r = new Regex(@"(?:([^|]*)|){3}");
var m = r.Match(subtext).Value;
If you have full access to the C# code, you should consider a simple splitting approach:
var idx = 2; // Might be user-defined
var subtext = "zero|one|two|three|four";
var result = subtext.Split('|').ElementAtOrDefault(idx);
Console.WriteLine(result);
// => two
A regex can be used if you have no access to code (if you use some tool that is powered with .NET regex):
^(?:[^|]*\|){2}([^|]*)
It matches:
^ - start of string
(?:[^|]*\|){2} - exactly 2 (adjust the number as you need) sequences of:
[^|]* - zero or more chars other than |
\| - a | symbol
([^|]*) - Group 1 (access via .Groups[1]): zero or more chars other than |
C# code to test:
var pat = $@"^(?:[^|]*\|){{{idx}}}([^|]*)";
var m = Regex.Match(subtext, pat);
if (m.Success) {
Console.WriteLine(m.Groups[1].Value);
}
// => two
If a tool does not let you access captured groups, turn the initial part into a non-consuming lookbehind pattern:
(?<=^(?:[^|]*\|){2})[^|]*
^^^^^^^^^^^^^^^^^^^^
The (?<=...) positive lookbehind only checks that the pattern is present immediately to the left of the current location; if it is not, the match fails at that position.
Use this:
(?:.*?\|){n}(.[^|]*)
where n is the number of times you need to skip your special character. The first capturing group will contain the result.
Use this regex and then select the n-th match (in this case 2) from the Matches collection:
string subtext = "zero|one|two|three|four";
Regex r = new Regex(@"(?<=\|)[^\|]*");
var m = r.Matches(subtext)[2]; // "three": the collection is zero-indexed and starts after the first '|'
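To compare the splitting approach with the lookbehind approach side by side, here is a small sketch. The class and method names (`PipeFields`, `BySplit`, `ByLookbehind`) are hypothetical, and `idx` is assumed to be the zero-based field index, as in the split example above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class PipeFields
{
    // Split-based lookup: returns null when idx is out of range.
    public static string BySplit(string text, int idx)
    {
        return text.Split('|').ElementAtOrDefault(idx);
    }

    // Lookbehind-based lookup: the whole match is the field itself,
    // so no capturing group is needed. Returns null when nothing matches.
    public static string ByLookbehind(string text, int idx)
    {
        // {{ and }} are literal braces in an interpolated string,
        // so the pattern contains the quantifier {idx}.
        string pattern = $@"(?<=^(?:[^|]*\|){{{idx}}})[^|]*";
        Match m = Regex.Match(text, pattern);
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        string subtext = "zero|one|two|three|four";
        Console.WriteLine(BySplit(subtext, 2));
        Console.WriteLine(ByLookbehind(subtext, 2));
    }
}
```

Both calls return the same field for an in-range index; the split version is usually easier to read, while the regex version is useful where only a pattern can be supplied.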

How to get two numerical values from a string in C#

I have a string like this:
X LIMITED COMPANY (52100000/58447000)
I want to extract X LIMITED COMPANY, 52100000 and 58447000 separately.
I'm extracting X LIMITED COMPANY like this:
companyName = Regex.Match(mystring4, @"[a-zA-Z\s]+").Value.Trim();
But I'm stuck on extracting the numbers; they can be 1 or 2 digits long, or large numbers as in the example. Can you show me how to extract them? Thanks.
Try a regular expression with alternation | (or):
Either word characters (but not digits): ([\w-[\d]][\w\s-[\d]]+)
Or digits only: ([0-9]+)
E.g.
string mystring4 = @"AKASYA CAM SANAYİ VE TİCARET LİMİTED ŞİRKETİ(52100000 / 58447000)";
string[] values = Regex
.Matches(mystring4, @"([\w-[\d]][\w\s-[\d]]+)|([0-9]+)")
.OfType<Match>()
.Select(match => match.Value.Trim())
.ToArray();
Test:
// AKASYA CAM SANAYİ VE TİCARET LİMİTED ŞİRKETİ
// 52100000
// 58447000
Console.Write(string.Join(Environment.NewLine, values));
I suggested changing the initial pattern [a-zA-Z\s]+ into [a-zA-Z][a-zA-Z\s]+ in order to skip matches which contain separators only (e.g. " ")
Try using named groups:
var s = "X LIMITED COMPANY (52100000 / 58447000)";
var regex = new Regex(@"(?<CompanyName>[^\(]+)\((?<Num1>\d+)\s*/\s*(?<Num2>\d+)\)");
var match = regex.Match(s);
var companyName = match.Groups["CompanyName"];
If the format is fixed, you could try this:
var regex = new Regex(@"^(?<name>[^\(]+)\((?<n1>\d+)/(?<n2>\d+)\)");
var match = regex.Match(input);
var companyName = match.Groups["name"].Value;
var number1 = Convert.ToInt64(match.Groups["n1"].Value);
var number2 = Convert.ToInt64(match.Groups["n2"].Value);
This matches everything up to the opening parenthesis and puts it into a named group "name". Then it matches two numbers within the parentheses, separated by "/", and puts them into groups named "n1" and "n2" respectively.
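A slightly more defensive variant of the named-group idea is sketched below. The names `CompanyLine` and `TryParse` are invented for this example, and the pattern additionally tolerates optional spaces around the numbers and the slash, which is an assumption about the input rather than something the question guarantees.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CompanyLine
{
    // name: everything before "("; n1 and n2: the two numbers around "/".
    private static readonly Regex Pattern = new Regex(
        @"^(?<name>[^(]+)\(\s*(?<n1>\d+)\s*/\s*(?<n2>\d+)\s*\)");

    // Hypothetical helper: returns false if the line does not have the expected
    // shape or a number does not fit into a long.
    public static bool TryParse(string input, out string name, out long n1, out long n2)
    {
        name = string.Empty;
        n1 = 0;
        n2 = 0;
        Match m = Pattern.Match(input);
        if (!m.Success)
        {
            return false;
        }
        name = m.Groups["name"].Value.Trim();
        return long.TryParse(m.Groups["n1"].Value, out n1)
            && long.TryParse(m.Groups["n2"].Value, out n2);
    }

    public static void Main()
    {
        if (TryParse("X LIMITED COMPANY (52100000/58447000)", out string name, out long a, out long b))
        {
            Console.WriteLine(name);
            Console.WriteLine(a);
            Console.WriteLine(b);
        }
    }
}
```

Using long.TryParse instead of Convert.ToInt64 means an oversized number yields false rather than an exception.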

Replace matches AFTER having processed the match

In my program I am implementing tokens to be replaced with variable values.
One such token is #INT[1-5], meaning it will get replaced with a random int between 1 and 5.
I have already written the regex to match the token: #INT\[\d+-\d+\]
However, I don't know how to replace the token after having processed the match and calculated the random number.
So far I have the following:
Random random = new Random();
Regex regex = new Regex(@"#INT\[\d+-\d+\]");
MatchCollection matches = regex.Matches("This is one of #INT[1-5] tests");
foreach (Match m in matches)
{
if (m.Success)
{
var ints = m.Value.Split('-').Select(x => Convert.ToInt32(x)).ToArray();
int intToInsert = random.Next(ints[0], ints[1]);
//now how do I insert the int in place of the match?
}
}
I think you need to make use of the match evaluator with Regex.Replace and use capturing groups around the numbers in your regex:
var regex = new Regex(@"#INT\[(\d+)-(\d+)\]");
//                            ^   ^ ^   ^
var res = regex.Replace("This is one of #INT[1-5] tests", m =>
random.Next(Convert.ToInt32(m.Groups[1].Value), Convert.ToInt32(m.Groups[2].Value)).ToString());
Results: This is one of 2 tests, This is one of 3 tests, ...
The captured texts can be accessed with m.Groups[n].Value.
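One detail worth keeping in mind: Random.Next(min, max) never returns max itself, so with the call above #INT[1-5] yields values from 1 to 4. If the upper bound should be included, a sketch along these lines could be used. `TokenReplacer` and `Expand` are hypothetical names, and the code assumes that the first number is not larger than the second and that the second is below int.MaxValue.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TokenReplacer
{
    // Two capturing groups: the lower and upper bound of the token.
    private static readonly Regex IntToken = new Regex(@"#INT\[(\d+)-(\d+)\]");

    // Replaces every #INT[a-b] token with a random integer in [a, b].
    public static string Expand(string text, Random random)
    {
        return IntToken.Replace(text, m =>
        {
            int min = int.Parse(m.Groups[1].Value);
            int max = int.Parse(m.Groups[2].Value);
            // The upper bound of Random.Next is exclusive, so add 1 to include max.
            return random.Next(min, max + 1).ToString();
        });
    }

    public static void Main()
    {
        var random = new Random();
        Console.WriteLine(Expand("This is one of #INT[1-5] tests", random));
    }
}
```

The lambda is converted to a MatchEvaluator, so it runs once per token and each token gets its own random value.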

Regexp find position of different characters in string

I have a string conforming to the following pattern:
(cc)-(nr).(nr)M(nr)(cc)whitespace(nr)
where cc is an arbitrary number of letter characters, nr is an arbitrary number of numerical characters, and M is the actual letter M.
For example:
ASF-1.15M437979CA 100000
EU-12.15M121515PO 1145
I need to find the positions of -, . and M within the string. The problem is that the leading and trailing characters can contain the letter M as well, but I need only the one in the middle.
As an alternative, the subtraction of the first characters (until -) and the first two numbers (as in (nr).(nr)M...) would be enough.
If you need a regex-based solution, you just need to use 3 capturing groups around the required patterns, and then access the Groups[n].Index property:
var rxt = new Regex(@"\p{L}*(-)\d+(\.)\d+(M)\d+\p{L}*\s*\d+");
// Collect matches
var matches = rxt.Matches(@"ASF-1.15M437979CA 100000 or EU-12.15M121515PO 1145");
// Now, we can get the indices
var posOfHyphen = matches.Cast<Match>().Select(p => p.Groups[1].Index);
var posOfDot = matches.Cast<Match>().Select(p => p.Groups[2].Index);
var posOfM = matches.Cast<Match>().Select(p => p.Groups[3].Index);
Output:
posOfHyphen => [3, 30]
posOfDot => [5, 33]
posOfM => [8, 36]
Regex:
string pattern = @"[A-Z]+(-)\d+(\.)\d+(M)\d+[A-Z]+";
string value = "ASF-1.15M437979CA 100000 or EU-12.15M121515PO 1145";
var match = Regex.Match(value, pattern);
if (match.Success)
{
int sep1 = match.Groups[1].Index;
int sep2 = match.Groups[2].Index;
int sep3 = match.Groups[3].Index;
}
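If each input line is processed on its own, the three positions can be collected per match as in the sketch below. `SeparatorPositions` and `Find` are illustrative names, not part of any library. Note that Group.Index is an offset into the whole input string, not into the match.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SeparatorPositions
{
    // Groups 1 to 3 capture the hyphen, the dot and the middle M.
    private static readonly Regex Pattern =
        new Regex(@"\p{L}*(-)\d+(\.)\d+(M)\d+\p{L}*\s*\d+");

    // Returns one (Hyphen, Dot, M) index triple per match found in text.
    public static List<(int Hyphen, int Dot, int M)> Find(string text)
    {
        var result = new List<(int Hyphen, int Dot, int M)>();
        foreach (Match m in Pattern.Matches(text))
        {
            result.Add((m.Groups[1].Index, m.Groups[2].Index, m.Groups[3].Index));
        }
        return result;
    }

    public static void Main()
    {
        foreach (var p in Find("EU-12.15M121515PO 1145"))
        {
            Console.WriteLine($"hyphen={p.Hyphen}, dot={p.Dot}, M={p.M}");
        }
    }
}
```

Because the M is anchored between two digit runs in the pattern, an M in the leading or trailing letters is never reported.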

regex to strip number from var in string

I have a long string and I have a var inside it
var abc = '123456'
Now I wish to get the 123456 from it.
I have tried a regex, but it's not working properly:
Regex regex = new Regex("(?<abc>+)=(?<var>+)");
Match m = regex.Match(body);
if (m.Success)
{
string key = m.Groups["var"].Value;
}
How can I get the number from the var abc?
Thanks for your help and time
var body = @" fsd fsda f var abc = '123456' fsda fasd f";
Regex regex = new Regex(@"var (?<name>\w*) = '(?<number>\d*)'");
Match m = regex.Match(body);
Console.WriteLine("name: " + m.Groups["name"]);
Console.WriteLine("number: " + m.Groups["number"]);
prints:
name: abc
number: 123456
Your regex is not correct:
(?<abc>+)=(?<var>+)
The + signs are quantifiers, meaning that the preceding element is repeated at least once. Here there is nothing to repeat, since (?<name> ... ) is a named capture group and the + is the first thing inside it.
You perhaps meant:
(?<abc>.+)=(?<var>.+)
And a better regex might be:
(?<abc>[^=]+)=\s*'(?<var>[^']+)'
[^=]+ will match any character except an equal sign.
\s* means any number of space characters (will also match tabs, newlines and form feeds though)
[^']+ will match any character except a single quote.
To specifically match the variable abc, you then put it like this:
(?<abc>abc)\s*=\s*'(?<var>[^']+)'
(I added some more allowances for spaces)
From the example you provided, the number can be extracted like this:
Console.WriteLine (
Regex.Match("var abc = '123456'", @"(?<var>\d+)").Groups["var"].Value); // 123456
\d+ means 1 or more numbers (digits).
But I surmise your data doesn't look like your example.
Try this:
var body = @"my word 1, my word 2, my word var abc = '123456' 3, my word x";
Regex regex = new Regex(@"(?<=var \w+ = ')\d+");
Match m = regex.Match(body);
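To look up the value of one specific variable, the answers above can be combined into a small helper. This is only a sketch: `VarExtractor` and `FindNumber` are made-up names, and it assumes the value is always a run of digits in single quotes, with optional whitespace around the equals sign.

```csharp
using System;
using System.Text.RegularExpressions;

public static class VarExtractor
{
    // Returns the digits assigned to the given variable, or null if it is not found.
    public static string FindNumber(string body, string variableName)
    {
        // Regex.Escape guards against names that contain regex metacharacters.
        string pattern = @"\bvar\s+" + Regex.Escape(variableName)
                       + @"\s*=\s*'(?<number>\d+)'";
        Match m = Regex.Match(body, pattern);
        return m.Success ? m.Groups["number"].Value : null;
    }

    public static void Main()
    {
        string body = "my word 1, my word 2, my word var abc = '123456' 3, my word x";
        Console.WriteLine(FindNumber(body, "abc"));
    }
}
```

Unlike the bare \d+ pattern, this does not pick up the unrelated digits that appear earlier in the text.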
