Regex - Extract string patterns - c#

I have many strings like these
/test/v1/3908643GASF/item
/test/v1/343569/item/AAAS45663/document
/test/v2/field/1230FRE/item
...
For each one I need to extract the defined pattern like these
/test/v1/{Value}/item
/test/v1/{Value}/item/{Value}/document
/test/v2/field/{Value}/item
The value can be a guid or something else, Can I match the given string patterns with input paths with regex?
I wrote just this code but I don't konw how to match input paths with patterns. The result should be the pattern. Thank you
string pattern1 = "/test/v1/{Value}/item";
string pattern2 = "/test/v1/{Value}/item/{Value}/document";
string pattern3 = "/test/v2/field/{Value}/item";
List<string> paths = new List<string>();
List<string> matched = new List<string>();
paths.Add("/test/v1/3908643GASF/item");
paths.Add("/test/v1/343569/item/AAAS45663/document");
paths.Add("/test/v1/343569/item/AAAS45664/document");
paths.Add("/test/v1/123444/item/AAAS45688/document");
paths.Add("/test/v2/field/1230FRE/item");
foreach (var path in paths)
{
}

This can also be achieved using regex alone. You can probably try:
(\w+)\/\w+(?<=\/item)(\/(\w+)\/)?
Explanation of the above regex:
(\w+) - Represents a capturing group matching a word character one or more time. This group captures our required result.
\/\w+(?<=\/item) - Represents a positive look-behind matching the characters before \items.
$1 - Captured group 1 contains the required information you're expecting.
(\/(\w+)\/)? - Represents the second and third capturing group capturing if after item some other values is present or not.
You can find the demo of the above regex in here.
Sample implementation in C#:
using System;
using System.Text.RegularExpressions;
public class Example
{
public static void Main()
{
string pattern = #"(\w+)\/\w+(?<=\/item)(\/(\w+)\/)?";
string input = #"/test/v1/3908643GASF/item
/test/v1/343569/item/AAAS45663/document
/test/v2/field/1230FRE/item";
foreach (Match m in Regex.Matches(input, pattern))
{
Console.Write(m.Groups[1].Value + " ");
if(m.Groups[3].Value != null)
Console.WriteLine(m.Groups[3].Value);
}
}
}
You can find the sample run of the above implementation in here.

Related

Use RegEx to extract specific part from string

I have string like
"Augustin Ralf (050288)"
"45 Max Müller (4563)"
"Hans (Adam) Meider (056754)"
I am searching for a regex to extract the last part in the brackets, for example this results for the strings above:
"050288"
"4563"
"056754"
I have tried with
var match = Regex.Match(string, #".*(\(\d*\))");
But I get also the brackets with the result. Is there a way to extract the strings and get it without the brackets?
Taking your requirements precisely, you are looking for
\(([^()]+)\)$
This will capture anything between the parentheses (not nested!), may it be digits or anything else and anchors them to the end of the string. If you happen to have whitespace at the end, use
\(([^()]+)\)\s*$
In C# this could be
using System;
using System.Text.RegularExpressions;
public class Example
{
public static void Main()
{
string pattern = #"\(([^()]+)\)$";
string input = #"Augustin Ralf (050288)
45 Max Müller (4563)
Hans (Adam) Meider (056754)
";
RegexOptions options = RegexOptions.Multiline;
foreach (Match m in Regex.Matches(input, pattern, options))
{
Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
}
}
}
See a demo on regex101.com.
please use regex - \(([^)]*)\)[^(]*$. This is working as expected. I have tested here
You can extract the number between the parantheses without worring about extracting the capturing groups with following regex.
(?<=\()\d+(?=\)$)
demo
Explanation:
(?<=\() : positive look behind for ( meaning that match will start after a ( without capturing it to the result.
\d+ : captures all digits in a row until non digit character found
(?=\)$) : positive look ahead for ) with line end meaning that match will end before a ) with line ending without capturing ) and line ending to the result.
Edit: If the number can be within parantheses that is not at the end of the line, remove $ from the regex to fix the match.
var match = Regex.Match(string, #".*\((\d*)\)");
https://regex101.com/r/Wk9asY/1
Here are three options for you.
The first one uses the simplest pattern and in addition the Trim method.
The second one uses capturing the desired value to the group and then getting it from the group.
The third one uses Lookbehind and Lookahead.
var inputs = new string[] {
"Augustin Ralf (050288)", "45 Max Müller (4563)", "Hans (Adam) Meider (056754)"
};
foreach (var input in inputs)
{
var match = Regex.Match(input, #"\(\d+\)");
Console.WriteLine(match.Value.Trim('(', ')'));
}
Console.WriteLine();
foreach (var input in inputs)
{
var match = Regex.Match(input, #"\((\d+)\)");
Console.WriteLine(match.Groups[1]);
}
Console.WriteLine();
foreach (var input in inputs)
{
var match = Regex.Match(input, #"(?<=\()\d+(?=\))");
Console.WriteLine(match.Value);
}
Console.WriteLine();

Regex to parse URL from an excel formula

I have a formula in excel which upon reading from C# code looks like this
"=HYPERLINK(CONCATENATE(\"https://abc.efghi.rtyui.com/#/wqeqwq/\",#REF!,\"/asdasd\"), \"View asdas\")"
I want to use regex to fetch the URL from this string, i.e.
https://abc.efghi.rtyui.com/#/wqeqwq/#REF!/asdasd
The url can be different but the format of the formula will remain the same.
"=HYPERLINK(CONCATENATE(\"{SOME_STRING}\",#REF!,\"{SOME_STRING}\"), \"View asdas\")"
Try it like this:
(?<=HYPERLINK\(CONCATENATE\(")[^"]+
Demo
The positive lookbehind allows us to skip part in-front of the URL from the full match.
If you have an arbitrary number of whitespace in-between add some \s*, e.g. see this example that also shows the escaped = at the beginning of the string.
Sample Code:
using System;
using System.Text.RegularExpressions;
public class Example
{
public static void Main()
{
string pattern = #"(?<=HYPERLINK\(CONCATENATE\("")[^""]+";
string input = #"=HYPERLINK(CONCATENATE(""https://abc.efghi.rtyui.com/#/wqeqwq/"",#REF!,""/asdasd""), ""View asdas"")";
RegexOptions options = RegexOptions.Multiline;
foreach (Match m in Regex.Matches(input, pattern, options))
{
Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
}
}
}
Addendum: Here is another technique that uses capturing groups and regex Replace to extract the resulting URL string (after CONCATENATE would have happened):
^\=HYPERLINK\(CONCATENATE\("([^"]+)",([^,]+),"([^"]+)".*$
Demo2
string pattern = #"^\=HYPERLINK\(CONCATENATE\(""([^""]+)"",([^,]+),""([^""]+)"".*$";
string substitution = #"$1$2$3";
string input = #"=HYPERLINK(CONCATENATE(""https://abc.efghi.rtyui.com/#/wqeqwq/"",#REF!,""/asdasd""), ""View asdas"")";
Regex regex = new Regex(pattern);
string result = regex.Replace(input, substitution, 1);
You can extract the URL from the formula using capturing groups in regular expression as given below:
string inputString = "=HYPERLINK(CONCATENATE(\"https://abc.efghi.rtyui.com/#/wqeqwq/\",#REF!,\"/asdasd\"), \"View asdas\")";
string regex = "CONCATENATE\\(\"([\\S]+)\",#REF!,\"([\\S]+)\"\\)";
Regex substringRegex = new Regex(regex, RegexOptions.IgnoreCase);
Match substringMatch = substringRegex.Match(inputString);
if (substringMatch.Success)
{
string url = substringMatch.Groups[1].Value + "#REF!" + substringMatch.Groups[2].Value;
}
I have defined two capturing groups in my regular expression. One for extracting part of the URL before #REF! and the other for extracting part of the URL after #REF!. Then I am concatenating all the extracted parts with #REF! to get the final URL.

Attempting to capture multiple groups but only the last group is captured

I am trying to use regex to help to convert the following string into a Dictionary:
{TheKey|TheValue}{AnotherKey|AnotherValue}
Like such:
["TheKey"] = "TheValue"
["AnotherKey"] = "AnotherValue"
To parse the string for the dictionary, I am using the regex expression:
^(\{(.+?)\|(.+?)\})*?$
But it will only capture the last group of {AnotherKey|AnotherValue}.
How do I get it to capture all of the groups?
I am using C#.
Alternatively, is there a more straightforward way to approach this rather than using Regex?
Code (Properties["PromptedValues"] contains the string to be parsed):
var regex = Regex.Matches(Properties["PromptedValues"], #"^(\{(.+?)\|(.+?)\})*?$");
foreach(Match match in regex) {
if(match.Groups.Count == 4) {
var key = match.Groups[2].Value.ToLower();
var value = match.Groups[3].Value;
values.Add(key, new StringPromptedFieldHandler(key, value));
}
}
This is coded to work for the single value, I would be looking to update it once I can get it to capture multiple values.
The $ says that: The match must occur at the end of the string or before \n at the end of the line or string.
The ^ says that: The match must start at the beginning of the string or line.
Read this for more regex syntax: msdn RegEx
Once you remove the ^ and $ your regex will match all of the sets You should read: Match.Groups and get something like the following:
public class Example
{
public static void Main()
{
string pattern = #"\{(.+?)\|(.+?)\}";
string input = "{TheKey|TheValue}{AnotherKey|AnotherValue}";
MatchCollection matches = Regex.Matches(input, pattern);
foreach (Match match in matches)
{
Console.WriteLine("The Key: {0}", match.Groups[1].Value);
Console.WriteLine("The Value: {0}", match.Groups[2].Value);
Console.WriteLine();
}
Console.WriteLine();
}
}
Your regex tries to match against the entire line. You can get individual pairs if you don't use anchors:
var input = Regex.Matches("{TheKey|TheValue}{AnotherKey|AnotherValue}");
var matches=Regex.Matches(input,#"(\{(.+?)\|(.+?)\})");
Debug.Assert(matches.Count == 2);
It's better to name the fields though:
var matches=Regex.Matches(input,#"\{(?<key>.+?)\|(?<value>.+?)\}");
This allows you to access the fields by name, and even use LINQ:
var pairs= from match in matches.Cast<Match>()
select new {
key=match.Groups["key"].Value,
value=match.Groups["value"].Value
};
Alternatively, you can use the Captures property of your groups to get all of the times they matched.
if (regex.Success)
{
for (var i = 0; i < regex.Groups[1].Captures.Count; i++)
{
var key = regex.Groups[2].Captures[i].Value.ToLower();
var value = regex.Groups[3].Captures[i].Value;
}
}
This has the advantage of still checking that your entire string was made up of matches. Solutions suggesting you remove the anchors will find things that look like matches in a longer string, but will not fail for you if anything was malformed.

How to match names with slash in C# regex?

I have a long text which contains strings like these:
...
1.1SMITH/JOHN 2.1SMITH/SARA
...
1.1Parker/Sara/Amanda.CH07/Elizabeth.IN03
...
Is there any regular expression in C# which can match these names. The clue is to search for [A-Z] which has separated by '/'.
You can try this:
[a-zA-Z\/]+
Explanation
c# sample:
using System;
using System.Text.RegularExpressions;
public class Example
{
public static void Main()
{
string pattern = #"[a-zA-Z\/]+";
string input = #"...
1.1SMITH/JOHN 2.1SMITH/SARA
...
1.1Parker/Sara/Amanda.CH07/Elizabeth.IN03";
foreach (Match m in Regex.Matches(input, pattern))
{
Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
}
}
}
You can test the working c# sample here
You can use
[a-z\/]+
which matches any combination of characters and slashes (see Regex101).
Make sure you are matching case-insensitive.
var expression = new Regex(#"[a-z\/]+", RegexOptions.IgnoreCase);
var names = expression.Matches(theText, expression);
Do you want to capture any [A-Za-z] which has a previous char or next char equals '/'?
Try this:
(?<=\/)[A-Za-z]+|[A-Za-z]+(?=\/)

Looking for patterns in a string how to?

I'm trying to find all instances of the substring EnemyType('XXXX') where XXXX is an arbitrary string and the instasnce of EnemyType('XXXX') can appear multiple times.
Right now I'm using a consortium of index of/substring functions in C# but would like to know if there's a cleaner way of doing it?
Use regex. Example:
using System.Text.RegularExpressions;
var inputString = " EnemyType('1234')abcdeEnemyType('5678')xyz";
var regex = new Regex(#"EnemyType\('\d{4}'\)");
var matches = regex.Matches(inputString);
foreach (Match i in matches)
{
Console.WriteLine(i.Value);
}
It will print:
EnemyType('1234')
EnemyType('5678')
The pattern to match is #"EnemyType\('\d{4}'\)", where \d{4} means 4 numeric characters (0-9). The parentheses are escaped with backslash.
Edit: Since you only want the string inside quotes, not the whole string, you can use named groups instead.
var inputString = " EnemyType('1234')abcdeEnemyType('5678')xyz";
var regex = new Regex(#"EnemyType\('(?<id>[^']+)'\)");
var matches = regex.Matches(inputString);
foreach (Match i in matches)
{
Console.WriteLine(i.Groups["id"].Value);
}
Now it prints:
1234
5678
Regex is a really nice tool for parsing strings. If you often parse strings, regex can make life so much easier.

Categories