C# extract words using regex

I've found a lot of examples of how to check something using regex, or how to split text using regular expressions.
But how can I extract words out of a string?
Example:
aaaa 12312 <asdad> 12334 </asdad>
Let's say I have something like this, and I want to extract all the numbers [0-9]* and put them in a list.
Or if I have two different kinds of elements:
aaaa 1234 ...... 1234 ::::: asgsgd
And I want to choose the digits that come after the dots and the words that come after the colons.
Can I extract these strings with a single regex?

Here's a solution for your first problem:
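using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        string data = "aaaa 12312 <asdad> 12334 </asdad>";
        // [0-9]+ matches each run of one or more ASCII digits
        Regex reg = new Regex("[0-9]+");
        foreach (Match match in reg.Matches(data))
        {
            // prints 12312, then 12334
            Console.WriteLine(match.Value);
        }
        Console.ReadLine();
    }
}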

In the general case, you can do this using capturing parentheses:
string input = "aaaa 1234 ...... 1234 ::::: asgsgd";
string regex = @"\.\.\.\. (\d+) ::::: (\w+)";
Match m = Regex.Match(input, regex);
if (m.Success) {
int numberAfterDots = int.Parse(m.Groups[1].Value);
string wordAfterColons = m.Groups[2].Value;
// ... Do something with these values
}
The first part of your question (extracting all the numbers) is a bit easier:
string input = "aaaa 1234 ...... 1234 ::::: asgsgd";
var numbers = Regex.Matches(input, @"\d+")
    .Cast<Match>() // Cast and Select need using System.Linq;
    .Select(m => int.Parse(m.Value))
    .ToList();
Now numbers will be a List<int> holding 1234 and 1234.
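One detail worth knowing when extracting numbers this way: in .NET, `\d` matches any Unicode decimal digit (category Nd), not only `0` to `9`, unless `RegexOptions.ECMAScript` is set. The sketch below, assuming you only want ASCII digits, compares `\d+` with `[0-9]+` on a made-up input that contains Arabic-Indic digits.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical input: \u0661\u0662 are the Arabic-Indic digits one and two.
string input = "aaaa 1234 \u0661\u0662 5678";

// \d matches every Unicode decimal digit, so it finds three runs here.
string[] unicodeRuns = Regex.Matches(input, @"\d+")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

// [0-9] is limited to ASCII digits, so every match is plain "0"-"9" text for int.Parse.
int[] asciiNumbers = Regex.Matches(input, @"[0-9]+")
    .Cast<Match>()
    .Select(m => int.Parse(m.Value))
    .ToArray();

Console.WriteLine(unicodeRuns.Length);             // 3
Console.WriteLine(string.Join(", ", asciiNumbers)); // 1234, 5678
```

For ordinary ASCII text the two patterns behave identically; the choice only matters when the input might contain other scripts.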

For your specific examples:
string firstString = "aaaa 12312 <asdad> 12334 </asdad>";
Regex firstRegex = new Regex(@"(?<Digits>[\d]+)", RegexOptions.ExplicitCapture);
if (firstRegex.IsMatch(firstString))
{
MatchCollection firstMatches = firstRegex.Matches(firstString);
foreach (Match match in firstMatches)
{
Console.WriteLine("Digits: " + match.Groups["Digits"].Value);
}
}
string secondString = "aaaa 1234 ...... 1234 ::::: asgsgd";
Regex secondRegex = new Regex(@"([\.]+\s(?<Digits>[\d]+))|([\:]+\s(?<Words>[a-zA-Z]+))", RegexOptions.ExplicitCapture);
if (secondRegex.IsMatch(secondString))
{
MatchCollection secondMatches = secondRegex.Matches(secondString);
foreach (Match match in secondMatches)
{
if (match.Groups["Digits"].Success)
{
Console.WriteLine("Digits: " + match.Groups["Digits"].Value);
}
if (match.Groups["Words"].Success)
{
Console.WriteLine("Words: " + match.Groups["Words"].Value);
}
}
}
Hope that helps. The output is:
Digits: 12312
Digits: 12334
Digits: 1234
Words: asgsgd

Something like this will do nicely!
var text = "aaaa 12312 <asdad> 12334 </asdad>";
var matches = Regex.Matches(text, @"\w+");
var arrayOfMatched = matches.Cast<Match>().Select(m => m.Value).ToArray();
Console.WriteLine(string.Join(", ", arrayOfMatched)); // aaaa, 12312, asdad, 12334, asdad
\w+ matches runs of consecutive word characters (letters, digits and underscores). We then select the value of each match and turn the results into an array.
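Since `\w+` lumps letters and digits together, a variation that also addresses the "single regex" part of the question is to give each kind of token its own named group inside an alternation and check which group succeeded. The group names `number` and `word` are arbitrary choices for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string text = "aaaa 12312 <asdad> 12334 </asdad>";

// One pattern, two named alternatives: a run of ASCII digits or a run of ASCII letters.
var tokenRegex = new Regex(@"(?<number>[0-9]+)|(?<word>[A-Za-z]+)");

var numbers = new List<int>();
var words = new List<string>();

foreach (Match m in tokenRegex.Matches(text))
{
    // Only the alternative that actually matched reports Success == true.
    if (m.Groups["number"].Success)
        numbers.Add(int.Parse(m.Groups["number"].Value));
    else
        words.Add(m.Groups["word"].Value);
}

Console.WriteLine("Numbers: " + string.Join(", ", numbers)); // Numbers: 12312, 12334
Console.WriteLine("Words: " + string.Join(", ", words));     // Words: aaaa, asdad, asdad
```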

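// \d+ rather than \d*: \d* also matches the empty string at every position, and Convert.ToInt32("") throws
Regex itemsRegex = new Regex(@"\d+");
MatchCollection matches = itemsRegex.Matches(text);
int[] values = matches.Cast<Match>().Select(m => Convert.ToInt32(m.Value)).ToArray();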

// verbatim string (@) so that \- reaches the regex instead of being an invalid C# escape
Regex phoneregex = new Regex(@"[0-9][0-9][0-9]\-[0-9][0-9][0-9][0-9]");
String unicornCanneryDirectory = "unicorn cannery 483-8627 cha...";
String numbersToCall = "";
// the second argument is the position in the input string at which to start searching;
// we probably want 0, the first character
Match matchIterator = phoneregex.Match(unicornCanneryDirectory, 0);
// Success tells us whether matchIterator holds a match or not
while (matchIterator.Success)
{
    String aResult = matchIterator.Value;
    // we could manipulate our match now, but I'm going to concatenate them all for later
    numbersToCall += aResult + " ";
    matchIterator = matchIterator.NextMatch();
}
// use my concatenated matches now
String message = "Unicorn rights activists demand more sparkles in the unicorn canneries under the new law...";
// phoneDialer is not a .NET class; it stands for whatever component does the calling
phoneDialer.MassCallWithAutomatedMessage(numbersToCall, message);
http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.match.nextmatch.aspx
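The same loop can be written more compactly. The sketch below uses a hypothetical directory string with made-up numbers and collects the matches two ways, with a `for` loop over `Match`/`NextMatch` and with `Matches`; both produce the same list. `{3}` and `{4}` are shorter spellings of the repeated `[0-9]` classes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical directory text; the numbers are invented for the example.
string directory = "unicorn cannery 483-8627, second cannery 555-0199";
var phoneRegex = new Regex(@"[0-9]{3}-[0-9]{4}");

// Style 1: walk the matches by hand with Match/NextMatch.
var viaNextMatch = new List<string>();
for (Match m = phoneRegex.Match(directory); m.Success; m = m.NextMatch())
{
    viaNextMatch.Add(m.Value);
}

// Style 2: let Matches() collect them and project the values.
var viaMatches = phoneRegex.Matches(directory)
    .Cast<Match>()
    .Select(match => match.Value)
    .ToList();

// Join with spaces, like numbersToCall in the loop above.
Console.WriteLine(string.Join(" ", viaMatches)); // 483-8627 555-0199
```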

Related

How do I check if a string contains "(1)" and if it does, increase the number by 1?

If a given string ends with "(" followed by a number and ")", I want to increase that value by one. If not, I'll just add "(1)".
I've tried something like string.Contains(), but since the value within the parentheses can differ, I don't know how to search for it and get the number.
To find a number enclosed in parentheses at the end of a string and increase it by 1, try this:
yourString = Regex.Replace(yourString, @"(?<=\()\d+(?=\)$)", match => (int.Parse(match.Value) + 1).ToString());
Explanation:
(?<=\() is a positive lookbehind: it requires an opening parenthesis immediately before the digits, but does not include it in the match result.
\d+ matches one or more digits.
(?=\)$) is a positive lookahead: it requires a closing parenthesis at the very end of the string, again without including it in the match.
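To see the zero-width behavior of the lookarounds concretely, the sketch below matches a made-up string and prints the match's value and position. Only the digits are part of the match, which is why the replacement leaves the parentheses in place.

```csharp
using System;
using System.Text.RegularExpressions;

string s = "report (41)"; // hypothetical input
string pattern = @"(?<=\()\d+(?=\)$)";

Match m = Regex.Match(s, pattern);

// The lookarounds must match, but they consume nothing,
// so the match covers only the digits, starting just after the '('.
Console.WriteLine($"Value = '{m.Value}', Index = {m.Index}, Length = {m.Length}");
// Value = '41', Index = 8, Length = 2

// Because only the digits are matched, the evaluator's result replaces just them.
string next = Regex.Replace(s, pattern, x => (int.Parse(x.Value) + 1).ToString());
Console.WriteLine(next); // report (42)
```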
To add a number if none is present, test the match first:
string yourString = "A string with no number at the end";
string pattern = @"(?<=\()\d+(?=\)$)";
if (Regex.IsMatch(yourString, pattern))
{
yourString = Regex.Replace(yourString, pattern, match => (int.Parse(match.Value) + 1).ToString());
}
else
{
yourString += " (1)";
}
You can try regular expressions: Match and Replace the desired fragment, e.g.
using System.Text.RegularExpressions;
...
string[] tests = new string[] {
"abc",
"def (123)",
"pqr (123) def",
"abs (789) (123)",
};
Func<string, string> solution = (line) =>
Regex.Replace(line,
@"\((?<value>[0-9]+)\)$",
m => $"({int.Parse(m.Groups["value"].Value) + 1})");
string demo = string.Join(Environment.NewLine, tests
.Select(test => $"{test,-20} => {solution(test)}"));
Console.Write(demo);
Outcome:
abc => abc # no numbers
def (123) => def (124) # 123 turned into 124
pqr (123) def => pqr (123) def # 123 is not at the end of string
abs (789) (123) => abs (789) (124) # 123 turned into 124, 789 spared
Edit: if we want to append " (1)" when there is no match, we can Match first and then replace the matched text:
Func<string, string> solution = (line) => {
    Match m = Regex.Match(line, @"\((?<value>[0-9]+)\)$");
    return m.Success
        ? line.Substring(0, m.Index) + $"({int.Parse(m.Groups["value"].Value) + 1})"
        : line + " (1)";
};
With this version of solution the outcome is:
abc => abc (1)
def (123) => def (124)
pqr (123) def => pqr (123) def (1)
abs (789) (123) => abs (789) (124)
string s = "sampleText";
// [0-9]+ rather than [0-9]*?, so an empty "()" is not matched and handed to int.Parse
string pattern = "[(]([0-9]+)[)]$";
for (int i = 0; i < 5; i++)
{
var m = Regex.Match(s, pattern);
if (m.Success)
{
int value = int.Parse(m.Groups[1].Value);
s = Regex.Replace(s, pattern, $"({++value})");
}
else
{
s += "(1)";
}
Console.WriteLine(s);
}
If I understand correctly, you have strings such as:
string s1 = "foo(12)";
string s2 = "bar(21)";
string s3 = "foobar";
And you want to obtain the following:
IncrementStringId(s1) == "foo(13)"
IncrementStringId(s2) == "bar(22)"
IncrementStringId(s3) == "foobar(1)"
You could accomplish this with the following method:
public string IncrementStringId(string input)
{
    // The regex pattern looks at the very end of the string for a number enclosed in parentheses
    string pattern = @"\(\d*\)$";
    Regex regex = new Regex(pattern);
    Match match = regex.Match(input);
    if (match.Success && int.TryParse(match.Value.Replace("(", "").Replace(")", ""), out int index))
    {
        // If the pattern is found, parse the detected number and increment it by 1
        return Regex.Replace(input, pattern, "(" + ++index + ")");
    }
    // If the pattern is not detected (or holds no number), add (1) to the end of the string
    return input + "(1)";
}
Make sure you import the System.Text.RegularExpressions namespace, which contains the Regex class.

Getting a numbers from a string with chars glued

I need to recover each number in a glued string
For example, from these strings:
string test = "number1+3";
string test1 = "number 1+4";
I want to recover (1 and 3) and (1 and 4).
How can I do this?
My code:
string test = "number1+3";
List<int> res = new List<int>();
// Split on runs of non-digits: "number1+3" gives "", "1" and "3"
string[] digits = Regex.Split(test, @"\D+");
foreach (string value in digits)
{
    int number;
    // TryParse skips the empty leading piece
    if (int.TryParse(value, out number))
    {
        res.Add(number);
    }
}
This regex should work:
string pattern = @"\d+";
string test = "number1+3";
foreach (Match match in Regex.Matches(test, pattern))
Console.WriteLine("Found '{0}' at position {1}",
match.Value, match.Index);
Note that if you intend to use it multiple times, it's better, for performance reasons, to create a Regex instance than to use the static method:
var res = new List<int>();
var regex = new Regex(@"\d+");
void addMatches(string text) {
foreach (Match match in regex.Matches(text))
{
int number = int.Parse(match.Value);
res.Add(number);
}
}
string test = "number1+3";
addMatches(test);
string test1 = "number 1+4";
addMatches(test1);
This calls for a regular expression:
(\d+)\+(\d+)
Match m = Regex.Match(input, @"(\d+)\+(\d+)");
string first = m.Groups[1].Captures[0].Value;
string second = m.Groups[2].Captures[0].Value;
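The snippet above assumes the match succeeds. A slightly more defensive sketch checks `Success` first and parses both groups; `ParseSum` is a hypothetical helper name, and returning a nullable tuple is just one way to signal "no match".

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the two operands around the '+' sign,
// or null when the text has no "digits+digits" part.
static (int First, int Second)? ParseSum(string input)
{
    Match m = Regex.Match(input, @"(\d+)\+(\d+)");
    if (!m.Success)
        return null;

    // Groups[1] and Groups[2] are the two parenthesized captures; Groups[0] is the whole "1+3".
    return (int.Parse(m.Groups[1].Value), int.Parse(m.Groups[2].Value));
}

foreach (string test in new[] { "number1+3", "number 1+4", "no sum here" })
{
    var result = ParseSum(test);
    Console.WriteLine(result is null
        ? $"{test}: no match"
        : $"{test}: {result.Value.First} and {result.Value.Second}");
}
```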
An alternative to regular expressions:
string test = "number 1+4";
int[] numbers = test.Replace("number", string.Empty, StringComparison.InvariantCultureIgnoreCase)
.Trim()
.Split("+", StringSplitOptions.RemoveEmptyEntries)
.Select(x => Convert.ToInt32(x))
.ToArray();

Want to get more than 1 value by using regex c#

> String St = "New Specification Result : Measures 0.0039mm ( 4 Microns )New Specification Result : Measures 0.0047mm ( 5 Microns )";
The strings I want to get are 0.0039mm and 0.0047mm, but the code I use keeps giving me only 0.0047mm.
var src = st;
var pattern = @"([0-9].[0-9]{4}mm)";
var expr = new Regex(pattern, RegexOptions.IgnoreCase);
foreach (Match match in expr.Matches(src))
{
string key = match.Groups[1].Value;
string key2 = match.Groups[2].Value;
label1.Text = key + key2;
}
Your code is fine, and the millimeter number you are trying to match is being captured correctly, but in the first capture group, and not in the second. There is a slight problem with your pattern, and it should be this:
([0-9]\.[0-9]{4}mm)
You intend for the dot to be a literal decimal point, so it should be escaped with a backslash. Here is the full code:
var pattern = @"([0-9]\.[0-9]{4}mm)";
var expr = new Regex(pattern, RegexOptions.IgnoreCase);
foreach (Match match in expr.Matches(src))
{
    string key = match.Groups[1].Value;
    string key2 = match.Groups[2].Value; // always empty here: the pattern has only one capture group
    Console.WriteLine(key);
}
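To see why the escape matters, the sketch below runs both versions of the pattern over a made-up input in which the second value has an `x` where the decimal point should be.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input: the second value is malformed on purpose.
string src = "Measures 0.0039mm and 0x0047mm";

var unescaped = new Regex(@"([0-9].[0-9]{4}mm)"); // '.' matches any character
var escaped = new Regex(@"([0-9]\.[0-9]{4}mm)");  // '\.' matches only a literal dot

Console.WriteLine(unescaped.Matches(src).Count); // 2: "0.0039mm" and "0x0047mm"
Console.WriteLine(escaped.Matches(src).Count);   // 1: only "0.0039mm"
```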
Your loop is overwriting your copy of the first result (and you don't have a second capture group; you have a second match). You want the following:
var st = "New Specification Result : Measures 0.0039mm(4 Microns)New Specification Result: Measures 0.0047mm(5 Microns";
var pattern = @"([0-9]\.[0-9]{4}mm)";
var expr = new Regex(pattern, RegexOptions.IgnoreCase);
string key = "";
foreach (Match match in expr.Matches(st))
{
key += match.Groups[1].Value;
}
You want to iterate through each match and then join them together for display:
var mm = new Regex(@"([0-9]\.[0-9]{4}mm)").Matches(src).Cast<Match>().Select(m => m.Groups[1].Value).ToList();
var list = string.Join(" ", mm);
label1.Text = list;
Currently you are only getting the last match, because you keep overwriting the text in your label.
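If you do want a second capture group, as the `match.Groups[2]` line in the question suggests, the pattern needs a second pair of parentheses. The sketch below, assuming the "( 4 Microns )" format from the question, captures the micron count as group 2 and accumulates every match instead of overwriting a single variable; `labelText` stands in for `label1.Text`.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string st = "New Specification Result : Measures 0.0039mm ( 4 Microns )New Specification Result : Measures 0.0047mm ( 5 Microns )";

// Two capture groups: Groups[1] is the millimeter value, Groups[2] the micron count.
var expr = new Regex(@"([0-9]\.[0-9]{4}mm)\s*\(\s*([0-9]+)\s*Microns\s*\)", RegexOptions.IgnoreCase);

var parts = new List<string>();
foreach (Match match in expr.Matches(st))
{
    // Append instead of overwrite, so every match survives the loop.
    parts.Add($"{match.Groups[1].Value} ({match.Groups[2].Value} Microns)");
}

string labelText = string.Join(", ", parts);
Console.WriteLine(labelText); // 0.0039mm (4 Microns), 0.0047mm (5 Microns)
```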

Get numbers of text with Regex.Split in C#

How can I get numbers between brackets of this text with regex in C#?
Sample text:
"[1]Ali ahmadi,[2]Mohammad Razavi"
Result: 1,2
My C# code is:
string result = null;
string[] digits = Regex.Split(Text, @"[\d]");
foreach (string value in digits)
{
result += value + ",";
}
return result.Substring(0,result.Length - 1);
string s = "[1]Ali ahmadi,[2]Mohammad Razavi";
Regex regex = new Regex(@"\[(\d+)\]", RegexOptions.Compiled);
foreach (Match match in regex.Matches(s))
{
Console.WriteLine(match.Groups[1].Value);
}
This will capture the numbers between brackets (\d+), and store them in the first matched group (Groups[1]).
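The code in the question returns the names rather than the numbers because `Regex.Split` treats the pattern as a separator and throws it away. This sketch contrasts the two calls on the sample text and builds the `1,2` string the question asks for.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string s = "[1]Ali ahmadi,[2]Mohammad Razavi";

// Regex.Split removes whatever the pattern matches, so splitting on digits
// keeps everything except the digits.
string[] pieces = Regex.Split(s, @"[\d]");
Console.WriteLine(string.Join(" | ", pieces)); // [ | ]Ali ahmadi,[ | ]Mohammad Razavi

// Matches keeps what the pattern matches; Groups[1] holds just the digits.
string result = string.Join(",", Regex.Matches(s, @"\[(\d+)\]")
    .Cast<Match>()
    .Select(m => m.Groups[1].Value));
Console.WriteLine(result); // 1,2
```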
Using a Linq-based approach on João's answer:
string s = "[1]Ali ahmadi,[2]Mohammad Razavi";
var digits = Regex.Matches(s, @"\[(\d+)\]")
.Cast<Match>()
.Select(m => m.Groups[1].Value)
.ToList();
foreach (var match in digits)
{
Console.WriteLine(match);
}

How do I "cut" out part of a string with a regex?

I need to cut out and save/use part of a string in C#. I figure the best way to do this is by using Regex. My string looks like this:
"changed from 1 to 10".
I need a way to cut out the two numbers and use them elsewhere. What's a good way to do this?
Error checking left as an exercise...
Regex regex = new Regex( @"\d+" );
MatchCollection matches = regex.Matches( "changed from 1 to 10" );
int num1 = int.Parse( matches[0].Value );
int num2 = int.Parse( matches[1].Value );
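Filling in the error checking mostly means confirming there are at least two matches and that each one fits in an `int`. One way to do that is a `TryXxx`-style helper; `TryGetTwoNumbers` is a hypothetical name used only for this sketch.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns false instead of throwing when the text
// does not contain at least two numbers that fit in an int.
static bool TryGetTwoNumbers(string text, out int first, out int second)
{
    // Pre-assign the out parameters: the && chain below may stop early.
    first = 0;
    second = 0;

    MatchCollection matches = Regex.Matches(text, @"[0-9]+");
    return matches.Count >= 2
        && int.TryParse(matches[0].Value, out first)
        && int.TryParse(matches[1].Value, out second);
}

if (TryGetTwoNumbers("changed from 1 to 10", out int oldValue, out int newValue))
{
    Console.WriteLine($"{oldValue} -> {newValue}"); // 1 -> 10
}
else
{
    Console.WriteLine("Could not find two numbers.");
}
```

Using `int.TryParse` instead of `int.Parse` also covers digit runs too long for an `int`, which would otherwise throw an `OverflowException`.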
Matching only exactly the string "changed from x to y":
string pattern = @"^changed from ([0-9]+) to ([0-9]+)$";
Regex r = new Regex(pattern);
Match m = r.Match(text);
if (m.Success) {
    // Groups[0] is the whole match; the two numbers are in groups 1 and 2
    int from = Convert.ToInt32(m.Groups[1].Value);
    int to = Convert.ToInt32(m.Groups[2].Value);
    // Do stuff
} else {
    // Error, regex did not match
}
In your regex, put the fields you want to record in parentheses, and then use the Match.Groups property to extract the matched fields (Groups[1], Groups[2], and so on).
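The group numbering is easy to get wrong, so here is a small sketch that prints each group of the "changed from x to y" pattern for the example string.

```csharp
using System;
using System.Text.RegularExpressions;

Match m = Regex.Match("changed from 1 to 10", @"^changed from ([0-9]+) to ([0-9]+)$");

// Groups[0] is always the entire match; numbered groups start at 1,
// in the order their opening parentheses appear in the pattern.
Console.WriteLine(m.Groups[0].Value); // changed from 1 to 10
Console.WriteLine(m.Groups[1].Value); // 1
Console.WriteLine(m.Groups[2].Value); // 10

// Each group here matched once, so its Captures collection has one entry.
Console.WriteLine(m.Groups[1].Captures.Count); // 1
```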
Use named capture groups.
// \D+ skips the non-digit text (" to ") between the two numbers
Regex r = new Regex(@"(?<FirstNumber>[0-9]{1,2})\D+(?<SecondNumber>[0-9]{1,2})");
string input = "changed from 1 to 10";
string firstNumber = "";
string secondNumber = "";
MatchCollection joinMatches = r.Matches(input);
foreach (Match m in joinMatches)
{
firstNumber= m.Groups["FirstNumber"].Value;
secondNumber= m.Groups["SecondNumber"].Value;
}
Get Expresso to help you out; it has an export-to-C# option.
DISCLAIMER: Regex is probably not right (my copy of expresso expired :D)
Here is a code snippet that does almost what I wanted:
using System.Text.RegularExpressions;
string text = "changed from 1 to 10";
string pattern = @"\b(?<digit>\d+)\b";
Regex r = new Regex(pattern);
MatchCollection mc = r.Matches(text);
foreach (Match m in mc) {
CaptureCollection cc = m.Groups["digit"].Captures;
foreach (Capture c in cc){
Console.WriteLine((Convert.ToInt32(c.Value)));
}
}
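In the snippet above each match contains the `digit` group only once, so its `Captures` collection always has a single entry and iterating it gives the same result as reading `Value`. `Captures` becomes useful when one group repeats inside a single match, as in this sketch; the pattern is an illustrative assumption, not the answerer's.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// One named group inside a repeated construct: it matches several times within a single Match.
Match m = Regex.Match("changed from 1 to 10", @"^\D*(?:(?<digit>[0-9]+)\D*)+$");

Group digits = m.Groups["digit"];

// Group.Value only reports the last time the group matched...
Console.WriteLine(digits.Value); // 10

// ...while Captures keeps every one of them, in order.
int[] all = digits.Captures.Cast<Capture>().Select(c => int.Parse(c.Value)).ToArray();
Console.WriteLine(string.Join(", ", all)); // 1, 10
```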
