Regex to match and return group names - c#

I need to match the following strings and returns the values as groups:
abctic
abctac
xyztic
xyztac
ghhtic
ghhtac
Pattern is wrote with grouping is as follows:
(?<arch>[abc,xyz,ghh])(?<flavor>[tic,tac]$)
The above returns only parts of group names. (meaning match is not correct).
If I use * in each sub pattern instead of $ at the end, groups are correct, but that would mean that abcticff will also match.
Please let me know what my correct regex should be.

Your pattern is incorrect because a pipe symbol | is used to specify alternate matches, not a comma in brackets as you were using, i.e., [x,y].
Your pattern should be: ^(?<arch>abc|xyz|ghh)(?<flavor>tic|tac)$
The ^ and $ metacharacters ensures the string matches from start to end. If you need to match text in a larger string you could replace them with \b to match on a word boundary.
Try this approach:
string[] inputs = { "abctic", "abctac", "xyztic", "xyztac", "ghhtic", "ghhtac" };
string pattern = #"^(?<arch>abc|xyz|ghh)(?<flavor>tic|tac)$";
foreach (var input in inputs)
{
var match = Regex.Match(input, pattern);
if (match.Success)
{
Console.WriteLine("Arch: {0} - Flavor: {1}",
match.Groups["arch"].Value,
match.Groups["flavor"].Value);
}
else
Console.WriteLine("No match for: " + input);
}

Related

Use RegEx to extract specific part from string

I have string like
"Augustin Ralf (050288)"
"45 Max Müller (4563)"
"Hans (Adam) Meider (056754)"
I am searching for a regex to extract the last part in the brackets, for example this results for the strings above:
"050288"
"4563"
"056754"
I have tried with
var match = Regex.Match(string, #".*(\(\d*\))");
But I get also the brackets with the result. Is there a way to extract the strings and get it without the brackets?
Taking your requirements precisely, you are looking for
\(([^()]+)\)$
This will capture anything between the parentheses (not nested!), may it be digits or anything else and anchors them to the end of the string. If you happen to have whitespace at the end, use
\(([^()]+)\)\s*$
In C# this could be
using System;
using System.Text.RegularExpressions;
public class Example
{
public static void Main()
{
string pattern = #"\(([^()]+)\)$";
string input = #"Augustin Ralf (050288)
45 Max Müller (4563)
Hans (Adam) Meider (056754)
";
RegexOptions options = RegexOptions.Multiline;
foreach (Match m in Regex.Matches(input, pattern, options))
{
Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
}
}
}
See a demo on regex101.com.
please use regex - \(([^)]*)\)[^(]*$. This is working as expected. I have tested here
You can extract the number between the parantheses without worring about extracting the capturing groups with following regex.
(?<=\()\d+(?=\)$)
demo
Explanation:
(?<=\() : positive look behind for ( meaning that match will start after a ( without capturing it to the result.
\d+ : captures all digits in a row until non digit character found
(?=\)$) : positive look ahead for ) with line end meaning that match will end before a ) with line ending without capturing ) and line ending to the result.
Edit: If the number can be within parantheses that is not at the end of the line, remove $ from the regex to fix the match.
var match = Regex.Match(string, #".*\((\d*)\)");
https://regex101.com/r/Wk9asY/1
Here are three options for you.
The first one uses the simplest pattern and in addition the Trim method.
The second one uses capturing the desired value to the group and then getting it from the group.
The third one uses Lookbehind and Lookahead.
var inputs = new string[] {
"Augustin Ralf (050288)", "45 Max Müller (4563)", "Hans (Adam) Meider (056754)"
};
foreach (var input in inputs)
{
var match = Regex.Match(input, #"\(\d+\)");
Console.WriteLine(match.Value.Trim('(', ')'));
}
Console.WriteLine();
foreach (var input in inputs)
{
var match = Regex.Match(input, #"\((\d+)\)");
Console.WriteLine(match.Groups[1]);
}
Console.WriteLine();
foreach (var input in inputs)
{
var match = Regex.Match(input, #"(?<=\()\d+(?=\))");
Console.WriteLine(match.Value);
}
Console.WriteLine();

Building a regular expression in C#

How to check the following text in C# with Regex:
key_in-get { 43243225543543543 };
or
key_in_set { password123 : 34980430943834 };
I tried to build a regular expression, but I failed after few hours.
Here is my code:
string text1 = "key_in-get { 322389238237 };";
string text2 = "key_in-set { password123 : 322389238237 };";
string pattern = "key_in-(get|set) { .* };";
var result1 = Regex.IsMatch(text, pattern);
Console.Write("Is valid: {0} ", result1);
var result2 = Regex.IsMatch(text, pattern);
Console.Write("Is valid: {0} ", result2);
I have to check if there is "set" or "get".
If the pattern finds "set" then it can only accept following pattern "text123 : 123456789", and if it finds "get" then should accept only "123456789".
You can use
key_in-(?:get|(set)) {(?(1) \w+ :) \w+ };
key_in-(?:get|(set))\s*{(?(1)\s*\w+\s*:)\s*\w+\s*};
key_in-(?:get|(set))\s*{(?(1)\s*\w+\s*:)\s*\d+\s*};
See the regex demo. The second one allows any amount of any whitespace between the elements and the third one allows only digits after : or as part of the get expression.
If the whole string must match, add ^ at the start and $ at the end of the pattern.
Details:
key_in- - a substring
(?:get|(set)) - get or set (the latter is captured into Group 1)
\s* - zero or more whitespaces
{ - a { char
(?(1)\s*\w+\s*:) - a conditional construct: if Group 1 matched, match one or more word chars enclosed with zero or more whitespaces and then a colon
\s*\w+\s* - one or more word chars enclosed with zero or more whitespaces
}; - a literal substring.
In the pattern that you tried key_in-(get|set) { .* }; you are matching either get or set followed by { until the last occurrence of } which could possibly also match key_in-get { }; };
As an alternative solution, you could use an alternation | specifying each of the accepted parts for the get and the set.
key_in-(?:get\s*{\s*\w+|set\s*{\s*\w+\s*:\s*\w+)\s*};
The pattern matches
key_in- Match literally
(?: Non capture group
get\s*{\s*\w+ Match get, { between optional whitespace chars and 1+ word chars
| Or
set\s*{\s*\w+\s*:\s*\w+ Match set, { between optional whitespace chars and word chars on either side with : in between.
) Close non capture group
\s*}; Match optional whitespace chars and };
Regex demo

search string for everything before a set of characters in C#

I'm looking for a way to search a string for everything before a set of characters in C#. For Example, if this is my string value:
This is is a test.... 12345
I want build a new string with all of the characters before "12345".
So my new string would equal "This is is a test.... "
Is there a way to do this?
I've found Regex examples where you can focus on one character but not a sequence of characters.
You don't need to use a Regex:
public string GetBitBefore(string text, string end)
{
var index = text.IndexOf(end);
if (index == -1) return text;
return text.Substring(0, index);
}
You can use a lazy quantifier to match anything, followed by a lookahead:
var match = Regex.Match("This is is a test.... 12345", #".*?(?=\d{5})");
where:
.*? lazily matches everything (up to the lookahead)
(?=…) is a positive lookahead: the pattern must be matched, but is not included in the result
\d{5} matches exactly five digits. I'm assuming this is your lookahead; you can replace it
You can do so with help of regex lookahead.
.*(?=12345)
Example:
var data = "This is is a test.... 12345";
var rxStr = ".*(?=12345)";
var rx = new System.Text.RegularExpressions.Regex (rxStr,
System.Text.RegularExpressions.RegexOptions.IgnoreCase);
var match = rx.Match(data);
if (match.Success) {
Console.WriteLine (match.Value);
}
Above code snippet will print every thing upto 12345:
This is is a test....
For more detail about see regex positive lookahead
This should get you started:
var reg = new Regex("^(.+)12345$");
var match = reg.Match("This is is a test.... 12345");
var group = match.Groups[1]; // This is is a test....
Of course you'd want to do some additional validation, but this is the basic idea.
^ means start of string
$ means end of string
The asterisk tells the engine to attempt to match the preceding token zero or more times. The plus tells the engine to attempt to match the preceding token once or more
{min,max} indicate the minimum/maximum number of matches.
\d matches a single character that is a digit, \w matches a "word character" (alphanumeric characters plus underscore), and \s matches a whitespace character (includes tabs and line breaks).
[^a] means not so exclude a
The dot matches a single character, except line break characters
In your case there many way to accomplish the task.
Eg excluding digit: ^[^\d]*
If you know the set of characters and they are not only digit, don't use regex but IndexOf(). If you know the separator between first and second part as "..." you can use Split()
Take a look at this snippet:
class Program
{
static void Main(string[] args)
{
string input = "This is is a test.... 12345";
// Here we call Regex.Match.
MatchCollection matches = Regex.Matches(input, #"(?<MySentence>(\w+\s*)*)(?<MyNumberPart>\d*)");
foreach (Match item in matches)
{
Console.WriteLine(item.Groups["MySentence"]);
Console.WriteLine("******");
Console.WriteLine(item.Groups["MyNumberPart"]);
}
Console.ReadKey();
}
}
You could just split, not as optimal as the indexOf solution
string value = "oiasjdoiasj12345";
string end = "12345";
string result = value.Split(new string[] { end }, StringSplitOptions.None)[0] //Take first part of the result, not the quickest but fairly simple

Regex to find special pattern

I have a string to parse. First I have to check if string contains special pattern:
I wanted to know if there is substrings which starts with "$(",
and end with ")",
and between those start and end special strings,there should not be
any white-empty space,
it should not include "$" character inside it.
I have a little regex for it in C#
string input = "$(abc)";
string pattern = #"\$\(([^$][^\s]*)\)";
Regex rgx = new Regex(pattern, RegexOptions.IgnoreCase);
MatchCollection matches = rgx.Matches(input);
foreach (var match in matches)
{
Console.WriteLine("value = " + match);
}
It works for many cases but failed at input= $(a$() , which inside the expression is empty. I wanted NOT to match when input is $().[ there is nothing between start and end identifiers].
What is wrong with my regex?
Note: [^$] matches a single character but not of $
Use the below regex if you want to match $()
\$\(([^\s$]*)\)
Use the below regex if you don't want to match $(),
\$\(([^\s$]+)\)
* repeats the preceding token zero or more times.
+ Repeats the preceding token one or more times.
Your regex \(([^$][^\s]*)\) is wrong. It won't allow $ as a first character inside () but it allows it as second or third ,, etc. See the demo here. You need to combine the negated classes in your regex inorder to match any character not of a space or $.
Your current regex does not match $() because the [^$] matches at least 1 character. The only way I can think of where you would have this match would be when you have an input containing more than one parens, like:
$()(something)
In those cases, you will also need to exclude at least the closing paren:
string pattern = #"\$\(([^$\s)]+)\)";
The above matches for example:
abc in $(abc) and
abc and def in $(def)$()$(abc)(something).
Simply replace the * with a + and merge the options.
string pattern = #"\$\(([^$\s]+)\)";
+ means 1 or more
* means 0 or more

Dot word pattern matching

I want to create a regular expression to match a word that begins with a period. The word(s) can exist N times in a string. I want to ensure that the word comes up whether it's at the beginning of a line, the end of a line or somewhere in the middle. The latter part is what I'm having difficulty with.
Here is where I am at so far.
const string pattern = #"(^|(.* ))(?<slickText>\.[a-zA-Z0-9]*)( .*|$)";
public static MatchCollection Find(string input)
{
Regex regex = new Regex(pattern,RegexOptions.IgnoreCase | RegexOptions.Multiline);
MatchCollection collection = regex.Matches(input);
return collection;
}
My test pattern finds .lee and .good. My test pattern fails to find .bruce:
static void Main()
{
MatchCollection results = ClassName.Find("a short stump .bruce\r\nand .lee a small tree\r\n.good roots");
foreach (Match item in results)
{
GroupCollection groups = item.Groups;
Console.WriteLine("{0} ", groups["slickText"].Value);
}
System.Diagnostics.Debug.Assert(results.Count > 0);
}
Maybe you're just looking for \.\w+?
Test:
var s = "a short stump .bruce\r\nand .lee a small tree\r\n.good roots";
Regex.Matches(s, #"\.\w+").Dump();
Result:
Note:
If you don't want to find foo in some.foo (because there's no whitespace between some and .foo), you can use (?<=\W|^)\.\w+ instead.
Bizarrely enough, it seems that with RegexOptions.Multiline, ^ and $ will only additionally match \n, not \r\n.
Thus you get .good because it is preceded by \n which is matched by ^, but you don't get .bruce because it is succeeded by \r which is not matched by $.
You could do a .Replace("\r", "") on the input, or rewrite your expression to take individual lines of input.
Edit: Or replace $ with \r?$ in your pattern to explicitly include the \r; thanks to SvenS for the suggestion.
In your RegEx, a word has to be terminated by a space, but bruce is terminated by \r instead.
I would give this regex a go:
(?:.*?(\.[A-Za-z]+(?:\b|.\s)).*?)+
And change the RegexOptions from Multiline to Singleline - in this mode dot matches all characters including newline.

Categories