Find all string occurrences after a string - c#

Just need a little push here. I have a file with data like
xyz buildinfo app_id="12345" asf
sfsdf buildinfo app_id="12346" wefwef
...
I need to get a string array with the number following app_id=. Below code gives me all matches and i am able to get the count( Regex.Matches(text, searchPattern).Count). But I need the actual items into an array.
string searchPattern = #"app_id=(\d+)";
var z = Regex.Matches(text, searchPattern);

I think you're saying you want the items (numbers) without the app_id part. You want to use a Positive Lookbehind
string text = #"xyz buildinfo app_id=""12345"" asf sfsdf buildinfo app_id=""12346"" wefwef";
string searchPattern = #"(?<=app_id="")(\d+)";
var z = Regex.Matches(text, searchPattern)
.Cast<Match>()
.Select(m => m.Value)
.ToArray();
(?<=app_id="") will match the pattern, but not include it in the capture

You can take a look at the documentation.
Quoting it you can use this code:
string pattern = #"app_id=(\d+)";
string input = "xyz buildinfo app_id="12345" asf sfsdf buildinfo app_id="12346" efwef";
Match match = Regex.Match(input, pattern);
if (match.Success) {
Console.WriteLine("Matched text: {0}", match.Value);
for (int ctr = 1; ctr <= match.Groups.Count - 1; ctr++) {
Console.WriteLine(" Group {0}: {1}", ctr, match.Groups[ctr].Value);
int captureCtr = 0;
foreach (Capture capture in match.Groups[ctr].Captures) {
Console.WriteLine(" Capture {0}: {1}",
captureCtr, capture.Value);
captureCtr += 1;
}
}
}

Related

Getting a numbers from a string with chars glued

I need to recover each number in a glued string
For example, from these strings:
string test = "number1+3"
string test1 = "number 1+4"
I want to recover (1 and 3) and (1 and 4)
How can I do this?
CODE
string test= "number1+3";
List<int> res;
string[] digits= Regex.Split(test, #"\D+");
foreach (string value in digits)
{
int number;
if (int.TryParse(value, out number))
{
res.Add(number)
}
}
This regex should work
string pattern = #"\d+";
string test = "number1+3";
foreach (Match match in Regex.Matches(test, pattern))
Console.WriteLine("Found '{0}' at position {1}",
match.Value, match.Index);
Note that if you intend to use it multiple times, it's better, for performance reasons, to create a Regex instance than using this static method.
var res = new List<int>();
var regex = new Regex(#"\d+");
void addMatches(string text) {
foreach (Match match in regex.Matches(text))
{
int number = int.Parse(match.Value);
res.Add(number);
}
}
string test = "number1+3";
addMatches(test);
string test1 = "number 1+4";
addMatches(test1);
MSDN link.
Fiddle 1
Fiddle 2
This calls for a regular expression:
(\d+)\+(\d+)
Test it
Match m = Regex.Match(input, #"(\d+)\+(\d+)");
string first = m.Groups[1].Captures[0].Value;
string second = m.Groups[2].Captures[0].Value;
An alternative to regular expressions:
string test = "number 1+4";
int[] numbers = test.Replace("number", string.Empty, StringComparison.InvariantCultureIgnoreCase)
.Trim()
.Split("+", StringSplitOptions.RemoveEmptyEntries)
.Select(x => Convert.ToInt32(x))
.ToArray();

Finding the longest substring regex?

Someone knows how to find the longest substring composed of letters using using MatchCollection.
public static Regex pattern2 = new Regex("[a-zA-Z]");
public static string zad3 = "ala123alama234ijeszczepsa";
You can loop over all matches and get the longest:
string max = "";
foreach (Match match in Regex.Matches(zad3, "[a-zA-Z]+"))
if (max.Length < match.Value.Length)
max = match.Value;
Try this:
MatchCollection matches = pattern2.Matches(txt);
List<string> strLst = new List<string>();
foreach (Match match in matches)
strLst.Add(match.Value);
var maxStr1 = strLst.OrderByDescending(s => s.Length).First();
or better way :
var maxStr2 = matches.Cast<Match>().Select(m => m.Value).ToArray().OrderByDescending(s => s.Length).First();
best solution for your task is:
string zad3 = "ala123alama234ijeszczepsa54dsfd";
string max = Regex.Split(zad3,#"\d+").Max(x => x);
You must change your Regex pattern to include the repetition operator + so that it matches more than once.
[a-zA-Z] should be [a-zA-Z]+
You can get the longest value using LINQ. Order by the match length descending and then take the first entry. If there are no matches the result is null.
string pattern2 = "[a-zA-Z]+";
string zad3 = "ala123alama234ijeszczepsa";
var matches = Regex.Matches(zad3, pattern2);
string result = matches
.Cast<Match>()
.OrderByDescending(x => x.Value.Length)
.FirstOrDefault()?
.Value;
The string named result in this example is:
ijeszczepsa
Using linq and the short one:
string longest= Regex.Matches(zad3, pattern2).Cast<Match>()
.OrderByDescending(x => x.Value.Length).FirstOrDefault()?.Value;
you can find it in O(n) like this (if you do not want to use regex):
string zad3 = "ala123alama234ijeszczepsa";
int max=0;
int count=0;
for (int i=0 ; i<zad3.Length ; i++)
{
if (zad3[i]>='0' && zad3[i]<='9')
{
if (count > max)
max=count;
count=0;
continue;
}
count++;
}
if (count > max)
max=count;
Console.WriteLine(max);

Replace a text by incrementing each occurence

I have a text say:
Hello
abc
Hello
def
Hello
I want to convert it to
Hello1
abc
Hello2
abc
Hello3
i.e I need to append a number after each occurrence of "Hello" text.
Currently I have written this code:
var xx = File.ReadAllText("D:\\test.txt");
var regex = new Regex("Hello", RegexOptions.IgnoreCase);
var matches = regex.Matches(xx);
int i = 1;
foreach (var match in matches.Cast<Match>())
{
string yy = match.Value;
xx = Replace(xx, match.Index, match.Length, match.Value + (i++));
}
and the Replace method above used is:
public static string Replace(string s, int index, int length, string replacement)
{
var builder = new StringBuilder();
builder.Append(s.Substring(0, index));
builder.Append(replacement);
builder.Append(s.Substring(index + length));
return builder.ToString();
}
Currently the above code is not working and is replacing the text in between.
Can you help me fixing that?
Assuming Hello is just a placeholder for a more complex pattern, here is a simple fix: use a match evaluator inside Regex.Replace where you may use variables:
var s = "Hello\nabc\nHello\ndef\nHello";
var i = 0;
var result = Regex.Replace(
s, "Hello", m => string.Format("{0}{1}",m.Value,++i), RegexOptions.IgnoreCase);
Console.WriteLine(result);
See the C# demo

C# extract words using regex

I've found a lot of examples of how to check something using regex, or how to split text using regular expressions.
But how can I extract words out of a string ?
Example:
aaaa 12312 <asdad> 12334 </asdad>
Lets say I have something like this, and I want to extract all the numbers [0-9]* and put them in a list.
Or if I have 2 different kind of elements:
aaaa 1234 ...... 1234 ::::: asgsgd
And I want to choose digits that come after ..... and words that come after ::::::
Can I extract these strings in a single regex ?
Here's a solution for your first problem:
class Program
{
static void Main(string[] args)
{
string data = "aaaa 12312 <asdad> 12334 </asdad>";
Regex reg = new Regex("[0-9]+");
foreach (var match in reg.Matches(data))
{
Console.WriteLine(match);
}
Console.ReadLine();
}
}
In the general case, you can do this using capturing parentheses:
string input = "aaaa 1234 ...... 1234 ::::: asgsgd";
string regex = #"\.\.\.\. (\d+) ::::: (\w+)";
Match m = Regex.Match(input, regex);
if (m.Success) {
int numberAfterDots = int.Parse(m.Groups[1].Value);
string wordAfterColons = m.Groups[2].Value;
// ... Do something with these values
}
But the first part you asked (extract all the numbers) is a bit easier:
string input = "aaaa 1234 ...... 1234 ::::: asgsgd";
var numbers = Regex.Matches(input, #"\d+")
.Cast<Match>()
.Select(m => int.Parse(m.Value))
.ToList();
Now numbers will be a list of integers.
For your specific examples:
string firstString = "aaaa 12312 <asdad> 12334 </asdad>";
Regex firstRegex = new Regex(#"(?<Digits>[\d]+)", RegexOptions.ExplicitCapture);
if (firstRegex.IsMatch(firstString))
{
MatchCollection firstMatches = firstRegex.Matches(firstString);
foreach (Match match in firstMatches)
{
Console.WriteLine("Digits: " + match.Groups["Digits"].Value);
}
}
string secondString = "aaaa 1234 ...... 1234 ::::: asgsgd";
Regex secondRegex = new Regex(#"([\.]+\s(?<Digits>[\d]+))|([\:]+\s(?<Words>[a-zA-Z]+))", RegexOptions.ExplicitCapture);
if (secondRegex.IsMatch(secondString))
{
MatchCollection secondMatches = secondRegex.Matches(secondString);
foreach (Match match in secondMatches)
{
if (match.Groups["Digits"].Success)
{
Console.WriteLine("Digits: " + match.Groups["Digits"].Value);
}
if (match.Groups["Words"].Success)
{
Console.WriteLine("Words: " + match.Groups["Words"].Value);
}
}
}
Hope that helps. The output is:
Digits: 12312
Digits: 12334
Digits: 1234
Words: asgsgd
Something like this will do nicely!
var text = "aaaa 12312 <asdad> 12334 </asdad>";
var matches = Regex.Matches(text, #"\w+");
var arrayOfMatched = matches.Cast<Match>().Select(m => m.Value).ToArray();
Console.WriteLine(string.Join(", ", arrayOfMatched));
\w+ Matches consecutive word characters. Then we just selected the values out of the list of matches and turn them into an array.
Regex itemsRegex = new Regex(#"(\d*)");
MatchCollection matches = itemsRegex.Matches(text);
int[] values = matches.Cast<Match>().Select(m => Convert.ToInt32(m.Value)).ToArray();
Regex phoneregex = new Regex("[0-9][0-9][0-9]\-[0-9][0-9][0-9][0-9]");
String unicornCanneryDirectory = "unicorn cannery 483-8627 cha..."
String numbersToCall = "";
//the second argument is where to begin within the match,
//we probably want 0, the first character
Match matchIterator = phoneregex.Match(unicornCanneryDirectory , 0);
//Success tells us if matchIterator has another match or not
while( matchIterator.Sucess){
String aResult = matchIterator.Result();
//we could manipulate our match now but I'm going to concatenate them all for later
numbersToCall += aResult + " ";
matchIterator = matchIterator.NextMatch();
}
// use my concatenated matches now
String message = "Unicorn rights activists demand more sparkles in the unicorn canneries under the new law...";
phoneDialer.MassCallWithAutomatedMessage(aResult, message );
http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.match.nextmatch.aspx

Can you improve this C# regular expression code?

In a program I'm reading in some data files, part of which are formatted as a series of records each in square brackets. Each record contains a section title and a series of key/value pairs.
I originally wrote code to loop through and extract the values, but decided it could be done more elegantly using regular expressions. Below is my resulting code (I just hacked it out for now in a console app - so know the variable names aren't that great, etc.
Can you suggest improvements? I feel it shouldn't be necessary to do two matches and a substring, but can't figure out how to do it all in one big step:
string input = "[section1 key1=value1 key2=value2][section2 key1=value1 key2=value2 key3=value3][section3 key1=value1]";
MatchCollection matches=Regex.Matches(input, #"\[[^\]]*\]");
foreach (Match match in matches)
{
string subinput = match.Value;
int firstSpace = subinput.IndexOf(' ');
string section = subinput.Substring(1, firstSpace-1);
Console.WriteLine(section);
MatchCollection newMatches = Regex.Matches(subinput.Substring(firstSpace + 1), #"\s*(\w+)\s*=\s*(\w+)\s*");
foreach (Match newMatch in newMatches)
{
Console.WriteLine("{0}={1}", newMatch.Groups[1].Value, newMatch.Groups[2].Value);
}
}
I prefer named captures, nice formatting, and clarity:
string input = "[section1 key1=value1 key2=value2][section2 key1=value1 key2=value2 key3=value3][section3 key1=value1]";
MatchCollection matches = Regex.Matches(input, #"\[
(?<sectionName>\S+)
(\s+
(?<key>[^=]+)
=
(?<value>[^ \] ]+)
)+
]", RegexOptions.IgnorePatternWhitespace);
foreach(Match currentMatch in matches)
{
Console.WriteLine("Section: {0}", currentMatch.Groups["sectionName"].Value);
CaptureCollection keys = currentMatch.Groups["key"].Captures;
CaptureCollection values = currentMatch.Groups["value"].Captures;
for(int i = 0; i < keys.Count; i++)
{
Console.WriteLine("{0}={1}", keys[i].Value, values[i].Value);
}
}
You should take advantage of the collections to get each key. So something like this then:
string input = "[section1 key1=value1 key2=value2][section2 key1=value1 key2=value2 key3=value3][section3 key1=value1]";
Regex r = new Regex(#"(\[(\S+) (\s*\w+\s*=\s*\w+\s*)*\])", RegexOptions.Compiled);
foreach (Match m in r.Matches(input))
{
Console.WriteLine(m.Groups[2].Value);
foreach (Capture c in m.Groups[3].Captures)
{
Console.WriteLine(c.Value);
}
}
Resulting output:
section1
key1=value1
key2=value2
section2
key1=value1
key2=value2
key3=value3
section3
key1=value1
You should be able to do something with nested groups like this:
pattern = #"\[(\S+)(\s+([^\s=]+)=([^\s\]]+))*\]"
I haven't tested it in C# or looped through the matches, but the results look right on rubular.com
This will match all the key/value pairs ...
var input = "[section1 key1=value1 key2=value2][section2 key1=value1 key2=value2 key3=value3][section3 key1=value1]";
var ms = Regex.Matches(input, #"section(\d+)\s*(\w+=\w+)\s*(\w+=\w+)*");
foreach (Match m in ms)
{
Console.WriteLine("Section " + m.Groups[1].Value);
for (var i = 2; i < m.Groups.Count; i++)
{
if( !m.Groups[i].Success ) continue;
var kvp = m.Groups[i].Value.Split( '=' );
Console.WriteLine( "{0}={1}", kvp[0], kvp[1] );
}
}

Categories