Get numbers from text with Regex.Split in C#

How can I get numbers between brackets of this text with regex in C#?
Sample text:
"[1]Ali ahmadi,[2]Mohammad Razavi"
Expected result: 1,2
My C# code is:
string result = null;
string[] digits = Regex.Split(Text, @"[\d]");
foreach (string value in digits)
{
    result += value + ",";
}
return result.Substring(0, result.Length - 1);

string s = "[1]Ali ahmadi,[2]Mohammad Razavi";
Regex regex = new Regex(@"\[(\d+)\]", RegexOptions.Compiled);
foreach (Match match in regex.Matches(s))
{
    Console.WriteLine(match.Groups[1].Value);
}
This will capture the numbers between brackets (\d+), and store them in the first matched group (Groups[1]).
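To get the comma-separated string the question asks for, the captured groups can be joined instead of printed. The following is a minimal sketch; the class name BracketNumbers and the method name Join are made up for illustration and do not come from the original answer.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: BracketNumbers and Join are illustrative names.
public static class BracketNumbers
{
    // Collects every number found between square brackets and joins them with commas.
    public static string Join(string text)
    {
        var numbers = Regex.Matches(text, @"\[(\d+)\]")
                           .Cast<Match>()
                           .Select(m => m.Groups[1].Value);
        return string.Join(",", numbers);
    }
}
```

Calling BracketNumbers.Join("[1]Ali ahmadi,[2]Mohammad Razavi") returns "1,2", with no trailing comma to trim.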

Using a Linq-based approach on João's answer:
string s = "[1]Ali ahmadi,[2]Mohammad Razavi";
var digits = Regex.Matches(s, @"\[(\d+)\]")
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToList();
foreach (var match in digits)
{
    Console.WriteLine(match);
}

Related

Getting numbers from a string with glued characters

I need to recover each number in a glued string.
For example, from these strings:
string test = "number1+3";
string test1 = "number 1+4";
I want to recover (1 and 3) and (1 and 4).
How can I do this?
My code:
string test = "number1+3";
List<int> res = new List<int>();
string[] digits = Regex.Split(test, @"\D+");
foreach (string value in digits)
{
    int number;
    if (int.TryParse(value, out number))
    {
        res.Add(number);
    }
}
This regex should work:
string pattern = @"\d+";
string test = "number1+3";
foreach (Match match in Regex.Matches(test, pattern))
    Console.WriteLine("Found '{0}' at position {1}",
        match.Value, match.Index);
Note that if you intend to use the pattern multiple times, it is better, for performance reasons, to create a Regex instance once and reuse it than to call the static method each time.
var res = new List<int>();
var regex = new Regex(@"\d+");
void addMatches(string text)
{
    foreach (Match match in regex.Matches(text))
    {
        int number = int.Parse(match.Value);
        res.Add(number);
    }
}
string test = "number1+3";
addMatches(test);
string test1 = "number 1+4";
addMatches(test1);
This calls for a regular expression:
(\d+)\+(\d+)
Match m = Regex.Match(input, @"(\d+)\+(\d+)");
string first = m.Groups[1].Captures[0].Value;
string second = m.Groups[2].Captures[0].Value;
An alternative to regular expressions:
string test = "number 1+4";
int[] numbers = test.Replace("number", string.Empty, StringComparison.InvariantCultureIgnoreCase)
    .Trim()
    .Split("+", StringSplitOptions.RemoveEmptyEntries)
    .Select(x => Convert.ToInt32(x))
    .ToArray();

Finding the longest substring regex?

Does anyone know how to find the longest substring composed of letters using MatchCollection?
public static Regex pattern2 = new Regex("[a-zA-Z]");
public static string zad3 = "ala123alama234ijeszczepsa";
You can loop over all matches and get the longest:
string max = "";
foreach (Match match in Regex.Matches(zad3, "[a-zA-Z]+"))
    if (max.Length < match.Value.Length)
        max = match.Value;
Try this:
// Assumes pattern2 uses the + quantifier, i.e. new Regex("[a-zA-Z]+")
MatchCollection matches = pattern2.Matches(zad3);
List<string> strLst = new List<string>();
foreach (Match match in matches)
    strLst.Add(match.Value);
var maxStr1 = strLst.OrderByDescending(s => s.Length).First();
Or, more concisely:
var maxStr2 = matches.Cast<Match>().Select(m => m.Value).OrderByDescending(s => s.Length).First();
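Alternatively, split on the digits and take the longest piece (this treats every non-digit character as part of a substring):
string zad3 = "ala123alama234ijeszczepsa54dsfd";
// Order by length; Max(x => x) would return the alphabetically greatest piece instead
string max = Regex.Split(zad3, @"\d+").OrderByDescending(x => x.Length).First();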
You must change your Regex pattern to include the repetition operator + so that it matches more than once.
[a-zA-Z] should be [a-zA-Z]+
You can get the longest value using LINQ. Order by the match length descending and then take the first entry. If there are no matches the result is null.
string pattern2 = "[a-zA-Z]+";
string zad3 = "ala123alama234ijeszczepsa";
var matches = Regex.Matches(zad3, pattern2);
string result = matches
    .Cast<Match>()
    .OrderByDescending(x => x.Value.Length)
    .FirstOrDefault()?
    .Value;
The string named result in this example is:
ijeszczepsa
A shorter LINQ version:
string longest = Regex.Matches(zad3, pattern2).Cast<Match>()
    .OrderByDescending(x => x.Value.Length).FirstOrDefault()?.Value;
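If you do not want to use regex, you can find the length of the longest run in O(n) like this (it treats every non-digit character as part of a run and prints the length, not the substring itself):
string zad3 = "ala123alama234ijeszczepsa";
int max = 0;
int count = 0;
for (int i = 0; i < zad3.Length; i++)
{
    if (zad3[i] >= '0' && zad3[i] <= '9')
    {
        // A digit ends the current run
        if (count > max)
            max = count;
        count = 0;
        continue;
    }
    count++;
}
// Account for a run that reaches the end of the string
if (count > max)
    max = count;
Console.WriteLine(max);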

Get an array of symbols from a string

I have a string with text inside curly braces like this:
{field1}-{field}+{field3}Anthing{field4}
from which I need to get an array like this:
['field1', 'field2', 'field3', 'field4']
Is there a way to do it using regexes in C#?
You can use Split and Linq:
string[] words = s.Split('+')
    .Select(word => word.Substring(1, word.Length - 2))
    .ToArray();
Or, you can match for {...} tokens using a simple regular expression:
MatchCollection matches = Regex.Matches(s, @"\{(\w*)\}");
string[] words = matches.Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToArray();
\w* only matches word characters (letters, digits, and underscore). To allow any other characters inside the braces, replace it with [^}]* or .*?.
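As a rough illustration of that suggestion, here is a sketch that uses [^}]* so that field names containing spaces or punctuation are still captured. The class name BraceFields and the method name Extract are hypothetical and only serve the example.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: BraceFields and Extract are illustrative names.
public static class BraceFields
{
    // Returns the text inside every {...} token, allowing any character except '}'.
    public static string[] Extract(string s)
    {
        return Regex.Matches(s, @"\{([^}]*)\}")
                    .Cast<Match>()
                    .Select(m => m.Groups[1].Value)
                    .ToArray();
    }
}
```

For example, BraceFields.Extract("{first name}-{field2}") yields "first name" and "field2", whereas the \w* version would skip the first token because of the space.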
Like this?
static void Main(string[] args)
{
    string from = "{field1}+{field2}+{field3}";
    // Split on every brace and plus sign, dropping the empty pieces
    string[] to = from.Split("{}+".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
    foreach (var x in to)
        Console.WriteLine(x);
    Console.ReadKey();
}
EDIT
To solve the problem with "{field1}-{field}+{field3}Anthing{field4} ":
static void Main(string[] args)
{
    string f = "{field1}-{field}+{field3}Anthing{field4} ";
    List<string> lstPattern = new List<string>();
    foreach (Match m in Regex.Matches(f, "{.*?}"))
    {
        // Strip the surrounding braces from each match
        lstPattern.Add(m.Value.Replace("{", "").Replace("}", ""));
    }
    foreach (var p in lstPattern)
        Console.WriteLine(p);
}

C# extract words using regex

I've found a lot of examples of how to check something using regex, or how to split text using regular expressions.
But how can I extract words out of a string ?
Example:
aaaa 12312 <asdad> 12334 </asdad>
Let's say I have something like this, and I want to extract all the numbers [0-9]* and put them in a list.
Or if I have two different kinds of elements:
aaaa 1234 ...... 1234 ::::: asgsgd
And I want to pick the digits that come after ..... and the words that come after ::::::
Can I extract these strings with a single regex?
Here's a solution for your first problem:
class Program
{
    static void Main(string[] args)
    {
        string data = "aaaa 12312 <asdad> 12334 </asdad>";
        Regex reg = new Regex("[0-9]+");
        foreach (var match in reg.Matches(data))
        {
            Console.WriteLine(match);
        }
        Console.ReadLine();
    }
}
In the general case, you can do this using capturing parentheses:
string input = "aaaa 1234 ...... 1234 ::::: asgsgd";
string regex = @"\.\.\.\. (\d+) ::::: (\w+)";
Match m = Regex.Match(input, regex);
if (m.Success)
{
    int numberAfterDots = int.Parse(m.Groups[1].Value);
    string wordAfterColons = m.Groups[2].Value;
    // ... Do something with these values
}
But the first part you asked (extract all the numbers) is a bit easier:
string input = "aaaa 1234 ...... 1234 ::::: asgsgd";
var numbers = Regex.Matches(input, @"\d+")
    .Cast<Match>()
    .Select(m => int.Parse(m.Value))
    .ToList();
Now numbers will be a list of integers.
For your specific examples:
string firstString = "aaaa 12312 <asdad> 12334 </asdad>";
Regex firstRegex = new Regex(@"(?<Digits>[\d]+)", RegexOptions.ExplicitCapture);
if (firstRegex.IsMatch(firstString))
{
    MatchCollection firstMatches = firstRegex.Matches(firstString);
    foreach (Match match in firstMatches)
    {
        Console.WriteLine("Digits: " + match.Groups["Digits"].Value);
    }
}
string secondString = "aaaa 1234 ...... 1234 ::::: asgsgd";
Regex secondRegex = new Regex(@"([\.]+\s(?<Digits>[\d]+))|([\:]+\s(?<Words>[a-zA-Z]+))", RegexOptions.ExplicitCapture);
if (secondRegex.IsMatch(secondString))
{
    MatchCollection secondMatches = secondRegex.Matches(secondString);
    foreach (Match match in secondMatches)
    {
        if (match.Groups["Digits"].Success)
        {
            Console.WriteLine("Digits: " + match.Groups["Digits"].Value);
        }
        if (match.Groups["Words"].Success)
        {
            Console.WriteLine("Words: " + match.Groups["Words"].Value);
        }
    }
}
Hope that helps. The output is:
Digits: 12312
Digits: 12334
Digits: 1234
Words: asgsgd
Something like this will do nicely!
var text = "aaaa 12312 <asdad> 12334 </asdad>";
var matches = Regex.Matches(text, @"\w+");
var arrayOfMatched = matches.Cast<Match>().Select(m => m.Value).ToArray();
Console.WriteLine(string.Join(", ", arrayOfMatched));
\w+ matches runs of consecutive word characters. We then select the values out of the list of matches and turn them into an array.
// \d+ rather than \d*: \d* also matches empty strings, which Convert.ToInt32 cannot parse
Regex itemsRegex = new Regex(@"(\d+)");
MatchCollection matches = itemsRegex.Matches(text);
int[] values = matches.Cast<Match>().Select(m => Convert.ToInt32(m.Value)).ToArray();
Regex phoneregex = new Regex(@"[0-9][0-9][0-9]\-[0-9][0-9][0-9][0-9]");
String unicornCanneryDirectory = "unicorn cannery 483-8627 cha...";
String numbersToCall = "";
// the second argument is the position in the input string at which to start searching;
// we probably want 0, the first character
Match matchIterator = phoneregex.Match(unicornCanneryDirectory, 0);
// Success tells us whether matchIterator holds a match
while (matchIterator.Success)
{
    String aResult = matchIterator.Value;
    // we could manipulate our match now but I'm going to concatenate them all for later
    numbersToCall += aResult + " ";
    matchIterator = matchIterator.NextMatch();
}
// use my concatenated matches now (phoneDialer is a placeholder object, not defined here)
String message = "Unicorn rights activists demand more sparkles in the unicorn canneries under the new law...";
phoneDialer.MassCallWithAutomatedMessage(numbersToCall, message);
http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.match.nextmatch.aspx

Can you improve this C# regular expression code?

In a program I'm reading in some data files, part of which are formatted as a series of records each in square brackets. Each record contains a section title and a series of key/value pairs.
I originally wrote code to loop through and extract the values, but decided it could be done more elegantly using regular expressions. Below is my resulting code (I just hacked it out for now in a console app, so I know the variable names aren't that great, etc.).
Can you suggest improvements? I feel it shouldn't be necessary to do two matches and a substring, but can't figure out how to do it all in one big step:
string input = "[section1 key1=value1 key2=value2][section2 key1=value1 key2=value2 key3=value3][section3 key1=value1]";
MatchCollection matches = Regex.Matches(input, @"\[[^\]]*\]");
foreach (Match match in matches)
{
    string subinput = match.Value;
    int firstSpace = subinput.IndexOf(' ');
    string section = subinput.Substring(1, firstSpace - 1);
    Console.WriteLine(section);
    MatchCollection newMatches = Regex.Matches(subinput.Substring(firstSpace + 1), @"\s*(\w+)\s*=\s*(\w+)\s*");
    foreach (Match newMatch in newMatches)
    {
        Console.WriteLine("{0}={1}", newMatch.Groups[1].Value, newMatch.Groups[2].Value);
    }
}
I prefer named captures, nice formatting, and clarity:
string input = "[section1 key1=value1 key2=value2][section2 key1=value1 key2=value2 key3=value3][section3 key1=value1]";
MatchCollection matches = Regex.Matches(input, @"\[
    (?<sectionName>\S+)
    (\s+
        (?<key>[^=]+)
        =
        (?<value>[^ \] ]+)
    )+
    ]", RegexOptions.IgnorePatternWhitespace);
foreach (Match currentMatch in matches)
{
    Console.WriteLine("Section: {0}", currentMatch.Groups["sectionName"].Value);
    CaptureCollection keys = currentMatch.Groups["key"].Captures;
    CaptureCollection values = currentMatch.Groups["value"].Captures;
    for (int i = 0; i < keys.Count; i++)
    {
        Console.WriteLine("{0}={1}", keys[i].Value, values[i].Value);
    }
}
You should take advantage of the capture collections to get each key. Something like this:
string input = "[section1 key1=value1 key2=value2][section2 key1=value1 key2=value2 key3=value3][section3 key1=value1]";
Regex r = new Regex(@"(\[(\S+) (\s*\w+\s*=\s*\w+\s*)*\])", RegexOptions.Compiled);
foreach (Match m in r.Matches(input))
{
    Console.WriteLine(m.Groups[2].Value);
    foreach (Capture c in m.Groups[3].Captures)
    {
        Console.WriteLine(c.Value);
    }
}
Resulting output:
section1
key1=value1
key2=value2
section2
key1=value1
key2=value2
key3=value3
section3
key1=value1
You should be able to do something with nested groups like this:
pattern = @"\[(\S+)(\s+([^\s=]+)=([^\s\]]+))*\]"
I haven't tested it in C# or looped through the matches, but the results look right on rubular.com
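For what it is worth, a sketch of how that nested-group pattern could be looped over in C# follows. It relies on the Captures collection of the repeating groups; the class name SectionParser and the method name Parse are hypothetical, chosen only for this example.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: SectionParser and Parse are illustrative names.
public static class SectionParser
{
    // Pattern from the answer above: group 1 = section name,
    // group 3 = key, group 4 = value (groups 3 and 4 repeat inside group 2).
    private static readonly Regex Pattern =
        new Regex(@"\[(\S+)(\s+([^\s=]+)=([^\s\]]+))*\]");

    // Returns one line per section name followed by one "key=value" line per pair.
    public static List<string> Parse(string input)
    {
        var lines = new List<string>();
        foreach (Match m in Pattern.Matches(input))
        {
            lines.Add(m.Groups[1].Value);
            // A repeated group keeps every iteration in Captures, not just the last one.
            CaptureCollection keys = m.Groups[3].Captures;
            CaptureCollection values = m.Groups[4].Captures;
            for (int i = 0; i < keys.Count; i++)
            {
                lines.Add(keys[i].Value + "=" + values[i].Value);
            }
        }
        return lines;
    }
}
```

Groups 3 and 4 always capture together inside group 2, so their Captures collections have the same length and can be indexed in parallel.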
This will match all the key/value pairs:
var input = "[section1 key1=value1 key2=value2][section2 key1=value1 key2=value2 key3=value3][section3 key1=value1]";
var ms = Regex.Matches(input, @"section(\d+)(?:\s+(\w+=\w+))*");
foreach (Match m in ms)
{
    Console.WriteLine("Section " + m.Groups[1].Value);
    // Group 2 repeats, so read every capture rather than only the last one
    foreach (Capture c in m.Groups[2].Captures)
    {
        var kvp = c.Value.Split('=');
        Console.WriteLine("{0}={1}", kvp[0], kvp[1]);
    }
}
