How to extract the number from a matched string in C#? - c#

I want to extract emoji id from the input.
For example, inputs:
`<eid=1> valid get 1`
`<eid = > invalid `
`<exd = 1> invalid`
`< eid = 1000> valid get 1000`
I know how to match those strings, but I have no idea how to extract the ids from the matched strings.

Use regex
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
string[] inputs = {
"<eid=1>",
"<eid = >",
"<exd = 1>",
"< eid = 1000>"
};
string pattern = @"\<\s*eid\s*=\s*(?'number'\d+)\s*\>";
foreach (string input in inputs)
{
Match match = Regex.Match(input, pattern);
if (match.Success)
{
Console.WriteLine("input : '{0}' Does Match, number = '{1}'", input, match.Groups["number"]);
}
else
{
Console.WriteLine("input : '{0}' Does not Match", input);
}
}
Console.ReadLine();
}
}
}

You can do something like this. If you don't want to store each item in an array (e.g. you have HTML code), you can store all the values in one string and use the following:
var input = @"`<eid=1> valid get 1`
`<eid = > invalid `
`<exd = 1> invalid`
`< eid = 1000> valid get 1000`";
var regex = new Regex(@"(?<open>\=).*?(?<final-open>\>)");
var matches = regex.Matches(input).Cast<Match>().Select(m => m.Groups["final"].Value).Distinct().ToList();
foreach (var match in matches)
{
// "final" holds whatever sits between '=' and '>', so it can be blank (as in `<eid = >`).
// int.TryParse skips those values instead of throwing the way int.Parse would.
int id;
if (int.TryParse(match.Trim(), out id))
{
// here you have each id
}
}
This method sets the opening and closing delimiters of the matches you want, where '\=' is the opening delimiter and '>' is the closing one. Note that it does not check the tag name, so `<exd = 1>` also yields a value:
var regex = new Regex(@"(?<open>\=).*?(?<final-open>\>)");

You need to understand what a match is, what a capture is, and how to capture specific data within a match.
In regular expressions there is a difference between a match, a capture, and basic grouping.
You want to match the whole value <eid=8>, but you want to get the value 8 into a capture. That is done by adding a grouping ( ) to the pattern, which establishes one or more capture groups. A match can hold one or more groups, indexed from 1 to N. Group zero is special: it is created automatically and is explained below.
So for the data <eid=8>, use the regex <\w+=(\d+)\> to capture the value (instead of the pattern <\w+=\d+\>, which matches but captures nothing). The grouping is what puts the number into capture group 1, with a value of 8.
So what are groups exactly?
Groups[0] is always the whole match, here <eid=8>.
Groups[1] through Groups[N] are the individual captures, present when a ( ) construct is specified. So in our example Groups[1].Value is the number 8, which answers your question.
You can also name a capture group with (?<name>...). Applying that, the pattern becomes <\w+=(?<TheNumbers>\d+)\>, and you can extract the value with Groups["TheNumbers"].Value, or still with Groups[1].Value.
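As a minimal sketch of the group mechanics described above, applied to the inputs from the question: the class name `EmojiId`, the method `TryExtract`, and the group name `id` are made up for illustration, and the pattern assumes the tag name is always `eid` with optional whitespace around it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmojiId
{
    // The named group "id" holds the digits; \s* tolerates spaces as in "< eid = 1000>".
    private static readonly Regex Pattern =
        new Regex(@"<\s*eid\s*=\s*(?<id>\d+)\s*>");

    // Returns true and sets id when the input contains a valid tag.
    public static bool TryExtract(string input, out int id)
    {
        id = 0;
        Match match = Pattern.Match(input);
        // Groups[0] is the whole tag; Groups["id"] (also Groups[1] here) is just the number.
        return match.Success && int.TryParse(match.Groups["id"].Value, out id);
    }
}
```

Calling `EmojiId.TryExtract("< eid = 1000> valid get 1000", out id)` should succeed, while `<eid = >` and `<exd = 1>` are rejected because the pattern requires both the `eid` name and at least one digit.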

Related

Use RegEx to extract specific part from string

I have string like
"Augustin Ralf (050288)"
"45 Max Müller (4563)"
"Hans (Adam) Meider (056754)"
I am searching for a regex to extract the last part in the brackets, for example this results for the strings above:
"050288"
"4563"
"056754"
I have tried with
var match = Regex.Match(input, @".*(\(\d*\))");
But I also get the brackets in the result. Is there a way to extract the strings without the brackets?
Taking your requirements precisely, you are looking for
\(([^()]+)\)$
This captures anything between the final pair of parentheses (not nested!), whether digits or anything else, and anchors the match to the end of the string. If you may have trailing whitespace, use
\(([^()]+)\)\s*$
In C# this could be
using System;
using System.Text.RegularExpressions;
public class Example
{
public static void Main()
{
string pattern = @"\(([^()]+)\)$";
string input = @"Augustin Ralf (050288)
45 Max Müller (4563)
Hans (Adam) Meider (056754)
";
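// Multiline makes $ match at the end of every line. If the text uses \r\n line endings,
// use the \s*$ variant of the pattern so the \r before each \n is tolerated.
RegexOptions options = RegexOptions.Multiline;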
foreach (Match m in Regex.Matches(input, pattern, options))
{
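// Groups[1] is the text inside the parentheses; m.Value would still include them.
Console.WriteLine("'{0}' found at index {1}.", m.Groups[1].Value, m.Groups[1].Index);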
}
}
}
See a demo on regex101.com.
Alternatively, use the regex \(([^)]*)\)[^(]*$, which captures the contents of the last parenthesized group in the line.
You can extract the number between the parentheses without dealing with capturing groups by using the following regex.
(?<=\()\d+(?=\)$)
Explanation:
(?<=\() : positive lookbehind for (, meaning the match starts right after a ( without including it in the result.
\d+ : matches one or more consecutive digits.
(?=\)$) : positive lookahead for ) followed by the end of the line, meaning the match ends right before a closing ) at the end of the line, without including either in the result.
Edit: If the number can be within parentheses that are not at the end of the line, remove $ from the regex.
var match = Regex.Match(input, @".*\((\d*)\)");
https://regex101.com/r/Wk9asY/1
Here are three options for you.
The first one uses the simplest pattern and in addition the Trim method.
The second one uses capturing the desired value to the group and then getting it from the group.
The third one uses Lookbehind and Lookahead.
var inputs = new string[] {
"Augustin Ralf (050288)", "45 Max Müller (4563)", "Hans (Adam) Meider (056754)"
};
foreach (var input in inputs)
{
var match = Regex.Match(input, @"\(\d+\)");
Console.WriteLine(match.Value.Trim('(', ')'));
}
Console.WriteLine();
foreach (var input in inputs)
{
var match = Regex.Match(input, @"\((\d+)\)");
Console.WriteLine(match.Groups[1]);
}
Console.WriteLine();
foreach (var input in inputs)
{
var match = Regex.Match(input, @"(?<=\()\d+(?=\))");
Console.WriteLine(match.Value);
}
Console.WriteLine();

Regex - Extract string patterns

I have many strings like these
/test/v1/3908643GASF/item
/test/v1/343569/item/AAAS45663/document
/test/v2/field/1230FRE/item
...
For each one I need to extract the defined pattern like these
/test/v1/{Value}/item
/test/v1/{Value}/item/{Value}/document
/test/v2/field/{Value}/item
The value can be a GUID or something else. Can I match the input paths against the given string patterns with regex?
I have written just this code, but I don't know how to match the input paths against the patterns. The result should be the pattern. Thank you.
string pattern1 = "/test/v1/{Value}/item";
string pattern2 = "/test/v1/{Value}/item/{Value}/document";
string pattern3 = "/test/v2/field/{Value}/item";
List<string> paths = new List<string>();
List<string> matched = new List<string>();
paths.Add("/test/v1/3908643GASF/item");
paths.Add("/test/v1/343569/item/AAAS45663/document");
paths.Add("/test/v1/343569/item/AAAS45664/document");
paths.Add("/test/v1/123444/item/AAAS45688/document");
paths.Add("/test/v2/field/1230FRE/item");
foreach (var path in paths)
{
}
This can also be achieved using regex alone. You can probably try:
(\w+)\/\w+(?<=\/item)(\/(\w+)\/)?
Explanation of the above regex:
(\w+) - A capturing group matching one or more word characters. This group captures the required value.
\/\w+(?<=\/item) - Matches the following path segment; the positive lookbehind asserts that this segment is /item.
$1 - Capture group 1 contains the value you're expecting.
(\/(\w+)\/)? - The optional second and third capturing groups, which capture a further value if one follows item.
You can find the demo of the above regex in here.
Sample implementation in C#:
using System;
using System.Text.RegularExpressions;
public class Example
{
public static void Main()
{
string pattern = @"(\w+)\/\w+(?<=\/item)(\/(\w+)\/)?";
string input = @"/test/v1/3908643GASF/item
/test/v1/343569/item/AAAS45663/document
/test/v2/field/1230FRE/item";
foreach (Match m in Regex.Matches(input, pattern))
{
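Console.Write(m.Groups[1].Value);
// Group.Value is never null (it is "" when the group did not take part), so test Success.
if (m.Groups[3].Success)
{
Console.Write(" " + m.Groups[3].Value);
}
Console.WriteLine();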
}
}
}
You can find the sample run of the above implementation in here.
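Since the question asks for the matching pattern rather than the extracted values, one possible approach is to turn each template into an anchored regex and test each path against it. The sketch below assumes that `{Value}` stands for exactly one path segment with no `/` in it; the names `PathTemplates`, `ToRegex`, and `FindTemplate` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PathTemplates
{
    // Turns "/test/v1/{Value}/item" into the anchored regex ^/test/v1/[^/]+/item$.
    private static Regex ToRegex(string template)
    {
        // Escape the literal pieces between placeholders, then join them with "one segment".
        IEnumerable<string> literalParts = template
            .Split(new[] { "{Value}" }, StringSplitOptions.None)
            .Select(Regex.Escape);
        return new Regex("^" + string.Join("[^/]+", literalParts) + "$");
    }

    // Returns the first template that the path fits, or null if none does.
    public static string FindTemplate(string path, IEnumerable<string> templates)
    {
        return templates.FirstOrDefault(t => ToRegex(t).IsMatch(path));
    }
}
```

Inside the question's empty `foreach`, something like `matched.Add(PathTemplates.FindTemplate(path, new[] { pattern1, pattern2, pattern3 }))` would then collect the template for each path. If there are many paths, building the regexes once up front would avoid rebuilding them on every call.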

Parsing a list of functions and their parameters from a string

I have a string which contains some functions (I know their names) and their parameters like this:
translate(700 210) rotate(-30)
I would like to parse each one of them in a string array starting with the function name followed by the parameters.
I don't know much about regex and so far I got this:
MatchCollection matches = Regex.Matches(attribute.InnerText, #"((translate|rotate|scale|matrix)\s*\(\s*(-?\d+\s*\,*\s*)+\))*");
for (int i = 0; i < matches.Count; i++)
{
Console.WriteLine(matches[i].Value);
}
What this returns is:
translate(700 210)
[blank space]
rotate(-30)
[blank space]
This works for me because I can run another regular expression on each row of the resulting collection and get the contents. What I don't understand is why there are blank rows returned between the methods.
Also, is running a regex twice - once to separate the methods and once to actually parse them a good approach?
Thanks!
Regex.Matches will match your entire regular expression multiple times. It finds one match for the whole thing, then finds the next match for the whole thing.
The outermost parens with * indicate that you're willing to accept zero or more of the preceding group's contents as a match. So when it finds none of them, it happily returns that. That is not your intent. You want exactly one.
The blanks are harmless, but "zero or more" also includes two. Consider this string, with no space between the two functions:
var text = "translate(700 210)rotate(-30)";
That's one match, according to the regex you provided. You'll get "rotate" and "-30". If the missing space is an error, detect it and warn the user. If you're not going to do that, parse it correctly.
So let's get rid of the outermost parens and that *. We'll also name the capturing groups, for readability.
var matches = Regex.Matches(text, #"(?<funcName>translate|rotate|scale|matrix)\s*\(\s*(?<param>-?\s*\d+\s*\,*\s*)+\)");
foreach (Match match in matches)
{
if (match.Groups["funcName"].Success)
{
var funcName = match.Groups["funcName"].Value;
var param = Int32.Parse(match.Groups["param"].Value);
Console.WriteLine($"{funcName}( {param} )");
}
}
I also stuck in \s* after the optional -, just in case.
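Note that when a group repeats, `Groups["param"].Value` holds only its last repetition, so `translate(700 210)` would report just 210. A sketch of reading every argument through the group's `Captures` collection follows. The class name `TransformParser` and the slightly tightened pattern (each number must be followed by a separator or the closing parenthesis) are assumptions of this example, not part of the answer above, and like the answer it handles integers only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TransformParser
{
    // Each number must be followed by whitespace/commas or sit right before ')'.
    private static readonly Regex Pattern = new Regex(
        @"(?<funcName>translate|rotate|scale|matrix)\s*\(\s*(?:(?<param>-?\d+)(?:[\s,]+|(?=\))))+\)");

    // Returns one entry per function call: the name plus every integer argument.
    public static List<KeyValuePair<string, int[]>> Parse(string text)
    {
        var result = new List<KeyValuePair<string, int[]>>();
        foreach (Match match in Pattern.Matches(text))
        {
            // Captures keeps every repetition of the group; Value would keep only the last.
            int[] args = match.Groups["param"].Captures
                .Cast<Capture>()
                .Select(c => int.Parse(c.Value))
                .ToArray();
            result.Add(new KeyValuePair<string, int[]>(match.Groups["funcName"].Value, args));
        }
        return result;
    }
}
```

This also addresses the second part of the question: with `Captures`, a single pass can both separate the functions and read their parameters, so a second regex per row is not required.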
I like using Regex with a dictionary
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
namespace ConsoleApplication56
{
class Program
{
static void Main(string[] args)
{
Dictionary<string, string> dict = new Dictionary<string, string>();
string input = "translate(700 210) rotate(-30)";
string pattern = @"(?'command'[^\(]+)\((?'value'[^\)]+)\)";
MatchCollection matches = Regex.Matches(input, pattern);
foreach(Match match in matches.Cast<Match>())
{
// [^\(]+ also picks up the space before "rotate", so trim the key.
dict.Add(match.Groups["command"].Value.Trim(), match.Groups["value"].Value);
}
}
}
}

Attempting to capture multiple groups but only the last group is captured

I am trying to use regex to help to convert the following string into a Dictionary:
{TheKey|TheValue}{AnotherKey|AnotherValue}
Like such:
["TheKey"] = "TheValue"
["AnotherKey"] = "AnotherValue"
To parse the string for the dictionary, I am using the regex expression:
^(\{(.+?)\|(.+?)\})*?$
But it will only capture the last group of {AnotherKey|AnotherValue}.
How do I get it to capture all of the groups?
I am using C#.
Alternatively, is there a more straightforward way to approach this rather than using Regex?
Code (Properties["PromptedValues"] contains the string to be parsed):
var regex = Regex.Matches(Properties["PromptedValues"], @"^(\{(.+?)\|(.+?)\})*?$");
foreach(Match match in regex) {
if(match.Groups.Count == 4) {
var key = match.Groups[2].Value.ToLower();
var value = match.Groups[3].Value;
values.Add(key, new StringPromptedFieldHandler(key, value));
}
}
This is coded to work for the single value, I would be looking to update it once I can get it to capture multiple values.
The $ means: the match must occur at the end of the string, or before \n at the end of the line or string.
The ^ means: the match must start at the beginning of the string or line.
See the MSDN regular expression reference for more syntax.
Once you remove the ^ and $ (and the outer group with its quantifier), your regex will match each of the sets. You should read up on Match.Groups and end up with something like the following:
public class Example
{
public static void Main()
{
string pattern = @"\{(.+?)\|(.+?)\}";
string input = "{TheKey|TheValue}{AnotherKey|AnotherValue}";
MatchCollection matches = Regex.Matches(input, pattern);
foreach (Match match in matches)
{
Console.WriteLine("The Key: {0}", match.Groups[1].Value);
Console.WriteLine("The Value: {0}", match.Groups[2].Value);
Console.WriteLine();
}
Console.WriteLine();
}
}
Your regex tries to match against the entire line. You can get individual pairs if you don't use anchors:
var input = "{TheKey|TheValue}{AnotherKey|AnotherValue}";
var matches=Regex.Matches(input,@"(\{(.+?)\|(.+?)\})");
Debug.Assert(matches.Count == 2);
It's better to name the fields though:
var matches=Regex.Matches(input,@"\{(?<key>.+?)\|(?<value>.+?)\}");
This allows you to access the fields by name, and even use LINQ:
var pairs= from match in matches.Cast<Match>()
select new {
key=match.Groups["key"].Value,
value=match.Groups["value"].Value
};
Alternatively, you can use the Captures property of your groups to get all of the times they matched.
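// Assumes regex is a single Match here, e.g. the result of Regex.Match(...) with the anchored pattern.
if (regex.Success)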
{
for (var i = 0; i < regex.Groups[1].Captures.Count; i++)
{
var key = regex.Groups[2].Captures[i].Value.ToLower();
var value = regex.Groups[3].Captures[i].Value;
}
}
This has the advantage of still checking that your entire string was made up of matches. Solutions suggesting you remove the anchors will find things that look like matches in a longer string, but will not fail for you if anything was malformed.
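A self-contained sketch of the `Captures` approach, keeping the anchors so that malformed input is rejected as a whole. It swaps the question's `.+?` for `[^{}|]+`, which assumes keys and values never contain braces or the `|` separator; the class name `PairParser` and the group names are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PairParser
{
    // [^{}|]+ keeps a key or value from running across braces or the separator.
    private static readonly Regex Pattern =
        new Regex(@"^(?:\{(?<key>[^{}|]+)\|(?<value>[^{}|]+)\})+$");

    // Returns null when the input is not made up entirely of {key|value} pairs.
    public static Dictionary<string, string> Parse(string input)
    {
        Match match = Pattern.Match(input);
        if (!match.Success)
        {
            return null;
        }

        var result = new Dictionary<string, string>();
        CaptureCollection keys = match.Groups["key"].Captures;
        CaptureCollection values = match.Groups["value"].Captures;
        for (int i = 0; i < keys.Count; i++)
        {
            // The i-th key and the i-th value come from the same repetition of the group.
            result[keys[i].Value] = values[i].Value;
        }
        return result;
    }
}
```

Using the indexer rather than `Add` means a repeated key overwrites the earlier value instead of throwing; whether that is the right behavior depends on the data.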

Get specific numbers from string

In my current project I have to work a lot with substrings, and I'm wondering if there is an easier way to get numbers out of a string.
Example:
I have a string like this:
12 text text 7 text
I want to be able to get the first number set or the second number set.
So if I ask for number set 1 I will get 12 in return and if I ask for number set 2 I will get 7 in return.
Thanks!
This will create an array of integers from the string:
using System.Linq;
using System.Text.RegularExpressions;
class Program {
static void Main() {
string text = "12 text text 7 text";
int[] numbers = (from Match m in Regex.Matches(text, @"\d+") select int.Parse(m.Value)).ToArray();
}
}
Try using regular expressions: [0-9]+ will match any run of digits within your string. The C# code to use this regex is roughly as follows:
Match match = Regex.Match(input, "[0-9]+");
// Here we check the Match instance.
if (match.Success)
{
// Here you get the first match. The pattern has no capturing group,
// so read match.Value (Groups[1] would be empty).
string value = match.Value;
}
You will of course still have to parse the returned strings.
Looks like a good match for Regex.
The basic regular expression would be \d+, which matches one or more digits.
You would iterate through the Matches collection returned from Regex.Matches and parse each returned match in turn.
var matches = Regex.Matches(input, @"\d+");
foreach (Match match in matches)
{
myIntList.Add(int.Parse(match.Value));
}
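You could use a regex. Note that the anchored pattern below only matches when the whole string consists of digits, so it suits checking a single token rather than pulling numbers out of a longer string:
Regex regex = new Regex(@"^[0-9]+$");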
You can split the string into parts using string.Split, and then traverse the list with a foreach applying int.TryParse, something like this:
string test = "12 text text 7 text";
var numbers = new List<int>();
int i;
foreach (string s in test.Split(' '))
{
if (int.TryParse(s, out i)) numbers.Add(i);
}
Now numbers holds the list of valid values.
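To get "number set 1" or "number set 2" directly, as the question asks, the regex approach can be wrapped in a small helper. This is only a sketch: the names `NumberSets` and `GetNumber` are invented here, and returning null for a missing or oversized number is one possible choice.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NumberSets
{
    // Returns the n-th run of digits (1-based) as an int, or null if there are fewer than n runs.
    public static int? GetNumber(string text, int n)
    {
        MatchCollection matches = Regex.Matches(text, @"\d+");
        if (n < 1 || n > matches.Count)
        {
            return null;
        }

        int value;
        // TryParse guards against digit runs too long to fit in an int.
        return int.TryParse(matches[n - 1].Value, out value) ? value : (int?)null;
    }
}
```

With the example string, `NumberSets.GetNumber("12 text text 7 text", 1)` gives 12 and asking for set 2 gives 7.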
