Use RegEx to extract specific part from string - c#

I have string like
"Augustin Ralf (050288)"
"45 Max Müller (4563)"
"Hans (Adam) Meider (056754)"
I am searching for a regex to extract the last part in the brackets, for example this results for the strings above:
"050288"
"4563"
"056754"
I have tried with
var match = Regex.Match(string, #".*(\(\d*\))");
But I get also the brackets with the result. Is there a way to extract the strings and get it without the brackets?

Taking your requirements precisely, you are looking for
\(([^()]+)\)$
This will capture anything between the parentheses (not nested!), may it be digits or anything else and anchors them to the end of the string. If you happen to have whitespace at the end, use
\(([^()]+)\)\s*$
In C# this could be
using System;
using System.Text.RegularExpressions;
public class Example
{
public static void Main()
{
string pattern = #"\(([^()]+)\)$";
string input = #"Augustin Ralf (050288)
45 Max Müller (4563)
Hans (Adam) Meider (056754)
";
RegexOptions options = RegexOptions.Multiline;
foreach (Match m in Regex.Matches(input, pattern, options))
{
Console.WriteLine("'{0}' found at index {1}.", m.Value, m.Index);
}
}
}
See a demo on regex101.com.

please use regex - \(([^)]*)\)[^(]*$. This is working as expected. I have tested here

You can extract the number between the parantheses without worring about extracting the capturing groups with following regex.
(?<=\()\d+(?=\)$)
demo
Explanation:
(?<=\() : positive look behind for ( meaning that match will start after a ( without capturing it to the result.
\d+ : captures all digits in a row until non digit character found
(?=\)$) : positive look ahead for ) with line end meaning that match will end before a ) with line ending without capturing ) and line ending to the result.
Edit: If the number can be within parantheses that is not at the end of the line, remove $ from the regex to fix the match.

var match = Regex.Match(string, #".*\((\d*)\)");
https://regex101.com/r/Wk9asY/1

Here are three options for you.
The first one uses the simplest pattern and in addition the Trim method.
The second one uses capturing the desired value to the group and then getting it from the group.
The third one uses Lookbehind and Lookahead.
var inputs = new string[] {
"Augustin Ralf (050288)", "45 Max Müller (4563)", "Hans (Adam) Meider (056754)"
};
foreach (var input in inputs)
{
var match = Regex.Match(input, #"\(\d+\)");
Console.WriteLine(match.Value.Trim('(', ')'));
}
Console.WriteLine();
foreach (var input in inputs)
{
var match = Regex.Match(input, #"\((\d+)\)");
Console.WriteLine(match.Groups[1]);
}
Console.WriteLine();
foreach (var input in inputs)
{
var match = Regex.Match(input, #"(?<=\()\d+(?=\))");
Console.WriteLine(match.Value);
}
Console.WriteLine();

Related

Regex - Extract string patterns

I have many strings like these
/test/v1/3908643GASF/item
/test/v1/343569/item/AAAS45663/document
/test/v2/field/1230FRE/item
...
For each one I need to extract the defined pattern like these
/test/v1/{Value}/item
/test/v1/{Value}/item/{Value}/document
/test/v2/field/{Value}/item
The value can be a guid or something else, Can I match the given string patterns with input paths with regex?
I wrote just this code but I don't konw how to match input paths with patterns. The result should be the pattern. Thank you
string pattern1 = "/test/v1/{Value}/item";
string pattern2 = "/test/v1/{Value}/item/{Value}/document";
string pattern3 = "/test/v2/field/{Value}/item";
List<string> paths = new List<string>();
List<string> matched = new List<string>();
paths.Add("/test/v1/3908643GASF/item");
paths.Add("/test/v1/343569/item/AAAS45663/document");
paths.Add("/test/v1/343569/item/AAAS45664/document");
paths.Add("/test/v1/123444/item/AAAS45688/document");
paths.Add("/test/v2/field/1230FRE/item");
foreach (var path in paths)
{
}
This can also be achieved using regex alone. You can probably try:
(\w+)\/\w+(?<=\/item)(\/(\w+)\/)?
Explanation of the above regex:
(\w+) - Represents a capturing group matching a word character one or more time. This group captures our required result.
\/\w+(?<=\/item) - Represents a positive look-behind matching the characters before \items.
$1 - Captured group 1 contains the required information you're expecting.
(\/(\w+)\/)? - Represents the second and third capturing group capturing if after item some other values is present or not.
You can find the demo of the above regex in here.
Sample implementation in C#:
using System;
using System.Text.RegularExpressions;
public class Example
{
public static void Main()
{
string pattern = #"(\w+)\/\w+(?<=\/item)(\/(\w+)\/)?";
string input = #"/test/v1/3908643GASF/item
/test/v1/343569/item/AAAS45663/document
/test/v2/field/1230FRE/item";
foreach (Match m in Regex.Matches(input, pattern))
{
Console.Write(m.Groups[1].Value + " ");
if(m.Groups[3].Value != null)
Console.WriteLine(m.Groups[3].Value);
}
}
}
You can find the sample run of the above implementation in here.

Match the last bracket

I have a string which contains some text followed by some brackets with different content (possibly empty). I need to extract the last bracket with its content:
atext[d][][ef] // should return "[ef]"
other[aa][][a] // should return "[a]"
xxxxx[][xx][x][][xx] // should return "[xx]"
yyyyy[] // should return "[]"
I have looked into RegexOptions.RightToLeft and read up on lazy vs greedy matching, but I can't for the life of me get this one right.
This regex will work
.*(\[.*\])
Regex Demo
More efficient and non-greedy version
.*(\[[^\]]*\])
C# Code
string input = "atext[d][][ef]\nother[aa][][a]\nxxxxx[][xx][x][][xx]\nyyyyy[]";
string pattern = "(?m).*(\\[.*\\])";
Regex rgx = new Regex(pattern);
Match match = rgx.Match(input);
while (match.Success)
{
Console.WriteLine(match.Groups[1].Value);
match = match.NextMatch();
}
Ideone Demo
It may give unexpected results for nested [] or unbalanced []
Alternatively, you could reverse the string using a function similar to this:
public static string Reverse( string s )
{
char[] charArray = s.ToCharArray();
Array.Reverse( charArray );
return new string( charArray );
}
And then you could perform a simple Regex search to just look for the first [someText] group or just use a for loop to iterate through and then stop when the first ] is reached.
With negative lookahead:
\[[^\]]*\](?!\[)
This is relatively efficient and flexible, without the evil .*. This will be also work with longer text which contains multiple instances.
Regex101 demo here
The correct way for .net is indeed to use the regex option RightToLeft with the appropriate method Regex.Match(String, String, RegexOptions).
In this way you keep the pattern very simple and efficient since it doesn't produce the less backtracking step and, since the pattern ends with a literal character (the closing bracket), allows a quick search for possible positions in the string where the pattern may succeeds before the "normal" walk of the regex engine.
public static void Main()
{
string input = #"other[aa][][a]";
string pattern = #"\[[^][]*]";
Match m = Regex.Match(input, pattern, RegexOptions.RightToLeft);
if (m.Success)
Console.WriteLine("Found '{0}' at position {1}.", m.Value, m.Index);
}

Search string using Pattern within long string in C#

I need to search for a pattern within a string.
For eg:
string big = "Hello there, I need information for ticket XYZ12345. I also submitted ticket ZYX54321. Please update.";
Now I need to extract/find/seek words based on the pattern XXX00000 i.e. 3 ALPHA and than 5 numeric.
Is there any way to do this ?
Even extraction will be okay for me.
Please help.
foreach (Match m in Regex.Matches(big, "([A-Za-z]{3}[0-9]{5})"))
{
if (m.Success)
{
m.Groups[1].Value // -- here is your match
}
}
How about this one?
([XYZ]{3}[0-9]{5})
You can use Regex Tester to test your expressions.
You can use simple regular expression to match your following string
([A-Za-z]{3}[0-9]{5})
the full code will be:
string strRegex = #"([A-Za-z]{3}[0-9]{5})";
Regex myRegex = new Regex(strRegex, RegexOptions.IgnoreCase);
string strTargetString = #"Hello there, I need information for ticket XYZ12345. I also submitted ticket ZYX54321. Please update.";
foreach (Match myMatch in myRegex.Matches(strTargetString))
{
if (myMatch.Success)
{
// Add your code here
}
}
You could always use a chatbot extension for the requests.
as for extracting the required information out of a sentence without any context
you can use regex for that.
you can use http://rubular.com/ to test it,
an example would be
...[0-9]
that would find XXX00000
hope that helped.
Use a regex:
string ticketNumber = string.Empty;
var match = Regex.Match(myString,#"[A-Za-z]{3}\d{5}");
if(match.Success)
{
ticketNumber = match.Value;
}
Here's a regex:
var str = "ABCD12345 ABC123456 ABC12345 XYZ98765";
foreach (Match m in Regex.Matches(str, #"(?<![A-Z])[A-Z]{3}[0-9]{5}(?![0-9])"))
Console.WriteLine(m.Value);
The extra bits are the zero-width negative look-behind ((?<![A-Z])) and look-ahead ((?![0-9])) expressions to make sure you don't capture extra numbers or letters. The above example only catches the third and fourth parts, but not the first and second. A simple [A-Z]{3}[0-9]{5} catches at least the specified number of characters, or more.

c# regex to extract link after =

Couldn't find better title but i need a Regex to extract link from sample below.
snip... flashvars.image_url = 'http://domain.com/test.jpg' ..snip
assuming regex is the best way.
thanks
Consider the following sample code. It shows how one might extract from your supplied string. But I have expanded upon the string some. Generally, the use of .* is too all inclusive (as the example below demonstrates).
The main point, is there are several ways to do what you are asking, the first answer given uses "look-around" while the second suggests the "Groups" approach. The choice mainly depend upon your actual data.
string[] tests = {
#"snip... flashvars.image_url = 'http://domain.com/test.jpg' ..snip",
#"snip... flashvars.image_url = 'http://domain.com/test.jpg' flashvars2.image_url = 'http://someother.domain.com/test.jpg'",
};
string[] patterns = {
#"(?<==\s')[^']*(?=')",
#"=\s*'(.*)'",
#"=\s*'([^']*)'",
};
foreach (string pattern in patterns)
{
Console.WriteLine();
foreach (string test in tests)
foreach (Match m in Regex.Matches(test, pattern))
{
if (m.Groups.Count > 1)
Console.WriteLine("{0}", m.Groups[1].Value);
else
Console.WriteLine("{0}", m.Value);
}
}
A simple regex for this would be #"=\s*'(.*)'".
Edit: New regex matching your edited question:
You need to match what's between quotes, after a =, right?
#"(?<==\s*')[^']*(?=')"
should do.
(?<==\s*') asserts that there is a =, optionally followed by whitespace, followed by a ', just before our current position (positive lookbehind).
[^']* matches any number of non-' characters.
(?=') asserts that the match stops before the next '.
This regex doesn't check if there is indeed a URL inside those quotes. If you want to do that, use
#"(?<==\s*')(?=(?:https?|ftp|mailto)\b)[^']*(?=')"

Regex to match and return group names

I need to match the following strings and returns the values as groups:
abctic
abctac
xyztic
xyztac
ghhtic
ghhtac
Pattern is wrote with grouping is as follows:
(?<arch>[abc,xyz,ghh])(?<flavor>[tic,tac]$)
The above returns only parts of group names. (meaning match is not correct).
If I use * in each sub pattern instead of $ at the end, groups are correct, but that would mean that abcticff will also match.
Please let me know what my correct regex should be.
Your pattern is incorrect because a pipe symbol | is used to specify alternate matches, not a comma in brackets as you were using, i.e., [x,y].
Your pattern should be: ^(?<arch>abc|xyz|ghh)(?<flavor>tic|tac)$
The ^ and $ metacharacters ensures the string matches from start to end. If you need to match text in a larger string you could replace them with \b to match on a word boundary.
Try this approach:
string[] inputs = { "abctic", "abctac", "xyztic", "xyztac", "ghhtic", "ghhtac" };
string pattern = #"^(?<arch>abc|xyz|ghh)(?<flavor>tic|tac)$";
foreach (var input in inputs)
{
var match = Regex.Match(input, pattern);
if (match.Success)
{
Console.WriteLine("Arch: {0} - Flavor: {1}",
match.Groups["arch"].Value,
match.Groups["flavor"].Value);
}
else
Console.WriteLine("No match for: " + input);
}

Categories