Regex: Multiple captures in multiple captures - c#

I have a regular expression that works perfectly.
^SENT KV(?<singlelinedata> L(?<line>[1-9]\d*) (?<measureline>\d+)(?: (?<samplingpoint>\d+))+)+$
My Input string looks like this:
SENT KV L1 123 1 2 3 L2 456 4 5 6
The only question is: how do I get the context of all captures of the group "samplingpoint"?
This group contains 6 captures, but I need the context information too. Three of them belong to the first capture of the group "singlelinedata" and three to the second. How can I get this information?
A capture of a group has no property that lists the captures of the groups nested inside it.
I know that I can write one regex to match the whole string and then run a second regex over each "singlelinedata" capture.
I'm looking for a way that works with the specified regex.
Hope someone can help me.
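To make the problem concrete, here is a minimal sketch that runs the pattern from the question and prints what the API exposes. The class and method names (CaptureDemo, MatchInput) are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaptureDemo
{
    // Same pattern as in the question.
    private static readonly Regex Pattern = new Regex(
        @"^SENT KV(?<singlelinedata> L(?<line>[1-9]\d*) (?<measureline>\d+)(?: (?<samplingpoint>\d+))+)+$");

    // Hypothetical helper: matches the input against the pattern above.
    public static Match MatchInput(string input) => Pattern.Match(input);

    public static void Main()
    {
        Match m = MatchInput("SENT KV L1 123 1 2 3 L2 456 4 5 6");

        // Each group exposes one flat CaptureCollection for the whole match.
        Console.WriteLine(m.Groups["singlelinedata"].Captures.Count); // 2
        Console.WriteLine(m.Groups["samplingpoint"].Captures.Count);  // 6

        // A Capture carries Value, Index and Length, but no link to an enclosing capture.
        foreach (Capture c in m.Groups["samplingpoint"].Captures)
        {
            Console.WriteLine($"{c.Value} at index {c.Index}, length {c.Length}");
        }
    }
}
```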

void Main()
{
string data = @"SENT KV L1 123 1 2 3 L2 456 4 5 6";
Parse(data).Dump();
}
public class Result
{
public int Line;
public int MeasureLine;
public List<int> SamplingPoints;
}
private Regex pattern = new Regex(@"^SENT KV(?<singlelinedata> L(?<line>[1-9]\d*) (?<measureline>\d+)(?: (?<samplingpoint>\d+))+)+$", RegexOptions.Multiline);
public IEnumerable<Result> Parse(string data)
{
foreach (Match m in pattern.Matches(data))
{
foreach (Capture c1 in m.Groups["singlelinedata"].Captures)
{
int lineStart = c1.Index;
int lineEnd = c1.Index + c1.Length;
var result = new Result();
result.Line = int.Parse(m.Groups["line"].CapturesWithin(c1).First().Value);
result.MeasureLine = int.Parse(m.Groups["measureline"].CapturesWithin(c1).First().Value);
result.SamplingPoints = new List<int>();
foreach (Capture c2 in m.Groups["samplingpoint"].CapturesWithin(c1))
{
result.SamplingPoints.Add(int.Parse(c2.Value));
}
yield return result;
}
}
}
public static class RegexExtensions
{
public static IEnumerable<Capture> CapturesWithin(this Group group, Capture capture)
{
foreach (Capture c in group.Captures)
{
if (c.Index < capture.Index) continue;
if (c.Index >= capture.Index + capture.Length) break;
yield return c;
}
}
}
Edit: Rewritten as an extension method on Group.

There's no concept of "subgroups" in the regex API. A group can have multiple captures, but you can't know which samplingpoint belongs to which line.
Your only option is to use the character index to calculate it yourself.

One way without doing lots of index matching and keeping a single regex is to change the capture groups to all have the same name. The nested captures actually get pushed onto the stack first so you end up with an array like this:
["1", "123", "1", "2", "3", "L1 123 1 2 3", "2", "456", "4", "5", "6", "L2 456 4 5 6"]
Then it's just a matter of some LINQ craziness to split the result into groups when a capture containing an L is found and then pulling out the data from each group.
var regex = new Regex(@"^SENT KV(?<singlelinedata> L(?<singlelinedata>[1-9]\d*) (?<singlelinedata>\d+)(?: (?<singlelinedata>\d+))+)+$");
var matches = regex.Matches("SENT KV L1 123 1 2 3 L2 456 4 5 6 12 13 L3 789 7 8 9 10");
var singlelinedata = matches[0].Groups["singlelinedata"];
string groupKey = null;
var result = singlelinedata.Captures.OfType<Capture>()
.Reverse()
.GroupBy(key => groupKey = key.Value.Contains("L") ? key.Value : groupKey, value => value.Value)
.Reverse()
.Select(group => new { key = group.Key, data = group.Skip(1).Reverse().ToList() })
.Select(item => new { line = item.data.First(), measureline = item.data.Skip(1).First(), samplingpoints = item.data.Skip(2).ToList() })
.ToList();

Based on the answer of Markus Jarderot I wrote an extension method for groups that takes a capture and returns all captures of that group within the specified capture.
The extension method looks like this:
public static IEnumerable<Capture> CapturesWithin(this Group source, Capture captureContainingGroup)
{
var lowerIndex = captureContainingGroup.Index;
var upperIndex = lowerIndex + captureContainingGroup.Length - 1;
foreach (var capture in source.Captures.Cast<Capture>())
{
if (capture.Index < lowerIndex)
{
continue;
}
if (capture.Index > upperIndex)
{
break;
}
yield return capture;
}
}
Usage of this method:
foreach (var capture in match.Groups["singlelinedata"].Captures.Cast<Capture>())
{
var samplingpoints = match.Groups["samplingpoint"].CapturesWithin(capture).ToList();
...
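Since the usage snippet is truncated, here is a self-contained sketch of how the pieces might fit together. It is not the author's exact code: GroupExtensions, UsageDemo and Describe are hypothetical names, and the extension method is a compact LINQ variant of the same index-range idea.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class GroupExtensions
{
    // Keeps the captures of a group whose start index lies inside the outer capture.
    public static IEnumerable<Capture> CapturesWithin(this Group source, Capture outer)
    {
        int lower = outer.Index;
        int upper = outer.Index + outer.Length; // exclusive upper bound
        return source.Captures.Cast<Capture>()
            .Where(c => c.Index >= lower && c.Index < upper);
    }
}

public static class UsageDemo
{
    // Hypothetical helper: returns one summary string per "singlelinedata" capture.
    public static List<string> Describe(string input)
    {
        var match = Regex.Match(input,
            @"^SENT KV(?<singlelinedata> L(?<line>[1-9]\d*) (?<measureline>\d+)(?: (?<samplingpoint>\d+))+)+$");
        var lines = new List<string>();
        foreach (var capture in match.Groups["singlelinedata"].Captures.Cast<Capture>())
        {
            var line = match.Groups["line"].CapturesWithin(capture).First().Value;
            var points = match.Groups["samplingpoint"].CapturesWithin(capture).Select(c => c.Value);
            lines.Add($"L{line}: {string.Join(",", points)}");
        }
        return lines;
    }

    public static void Main()
    {
        foreach (var s in Describe("SENT KV L1 123 1 2 3 L2 456 4 5 6"))
        {
            Console.WriteLine(s);
        }
        // Prints:
        // L1: 1,2,3
        // L2: 4,5,6
    }
}
```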

Related

How do I display a list and a list of lists side by side

I found code that displays two lists side by side, but I've had no luck doing the same with a list and a list of lists.
This is the code for two lists side by side:
for (var i = 0; i < bncount; i++)
{
//Console.WriteLine(String.Format("{0,-10} | {1,-10}", hed.ElementAt(i),bin.ElementAt(i)));
Console.WriteLine(String.Format("{0,-10} | {1,-10}", i< hdcount ? hed[i] : string.Empty, i< bncount ? bin[i] : string.Empty));
}
But string.Empty only works for plain lists, not for a list of lists, and ElementAt() wouldn't work either.
I tried using LINQ with foreach, but without success.
hed is a list of strings and bin is a list of lists, each holding a sequence of numbers.
My current output is as follows:
foreach(var r in bin) //0010 1110 1111
foreach(var m in hed) //red blue white
I want to have the following output
red 0010
blue 1110
white 1111
or
red blue white
0 1 1
0 1 1
1 1 1
0 0 1
Any idea how to do this in C# in general or with LINQ? The methods I tried either printed one value from hed with all the values from bin, or the opposite.
I'm not sure I understood the question correctly; it would help to extend the code examples with the variable definitions. Anyway, if I understood correctly, this would be my approach:
var listOfString = new List<string>( )
{
"red",
"blue",
"white"
};
var listOfArrays = new List<int[]>( )
{
new int[] { 0,0,1,0 },
new int[] { 0,1,1,1 },
new int[] { 1,1,1,1 }
};
// Here you could add a condition in case you are not 100% sure your arrays are of same length.
for( var i = 0; i < listOfString.Count; i++ )
{
var stringItem = listOfString[i];
var arrayItem = listOfArrays[i];
Console.WriteLine( $"{stringItem} {string.Join( null, arrayItem )}" );
}
I would suggest using a different structure for storing your data (considering OOP principles) and the following code for printing the data out:
public class Item
{
public string Name { get; set; }
public List<int> Values { get; set; }
}
public void Print(List<Item> items)
{
foreach (var item in items)
{
Console.WriteLine($"{item.Name} {string.Join("", item.Values)}");
}
}
The first version is not so hard:
string result = string.Join("\n", hed.Zip(bin).Select(x => $"{x.Item1} {x.Item2}"));
With Zip, we create an enumerable of tuples, where the n-th tuple holds the n-th element of hed and the n-th element of bin. Then you just concatenate the two items.
The second version is a bit more complex:
result = string.Join("\t",hed) + "\n" +
string.Join("\n",Enumerable.Range(0, bin.First().Length)
.Select(x => string.Join("\t\t", bin.Select(str => str[x]))));
We create the heading line by joining the hed strings. Then we create an enumerable of numbers that represent the character indexes within each string; here it is 0, 1, 2, 3. For each index we take the char at that position from every string in the bin list: first all chars at index 0, then all chars at index 1, and so on.
Online demo: https://dotnetfiddle.net/eBQ54N
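As a rough, self-contained sketch of both layouts, assuming hed and bin are lists of strings as in the dictionary answer (the names SideBySide, Rows and Columns are invented for this example):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SideBySide
{
    // Pairs each header with the bit string at the same position, e.g. "red 0010".
    public static List<string> Rows(IList<string> hed, IList<string> bin) =>
        hed.Zip(bin, (h, b) => $"{h} {b}").ToList();

    // One output row per character position, one column per bit string.
    // Assumes all bit strings have the same length as the first one.
    public static List<string> Columns(IList<string> bin) =>
        Enumerable.Range(0, bin[0].Length)
            .Select(i => string.Join(" ", bin.Select(s => s[i])))
            .ToList();

    public static void Main()
    {
        var hed = new List<string> { "red", "blue", "white" };
        var bin = new List<string> { "0010", "1110", "1111" };

        Rows(hed, bin).ForEach(Console.WriteLine);   // red 0010, blue 1110, white 1111

        Console.WriteLine(string.Join(" ", hed));    // red blue white
        Columns(bin).ForEach(Console.WriteLine);     // 0 1 1, 0 1 1, 1 1 1, 0 0 1
    }
}
```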
You can use a dictionary:
var hed = new List<string>(){"red", "blue", "white"};
var bin = new List<string>(){ "0010", "1110", "1111" };
Dictionary<string, string> Dic = new Dictionary<string, string>();
for (int i = 0; i < hed.Count; i++)
{
Dic.Add(hed[i],bin[i]);
}
foreach (var item in Dic)
{
Console.WriteLine(item.Key+" "+item.Value);
}
Console.ReadKey();

I want to use LINQ to extract the strings that match the class property strings and the strings in the prepared list

Prerequisites
A class called AAA has a string type property called BBB.
BBB contains a character string such as "1,5,6,12".
There is another int type list called CCC (eg [1,2,4,7,9,12,15]).
What I want to achieve
I want to use LINQ to get, from a DbSet in C#, the list of AAA objects whose BBB contains at least one of the values in CCC.
Example
AAA1 → AAA1.BBB = "1,2,3"
AAA2 → AAA2.BBB = "4,5,6"
AAA3 → AAA3.BBB = "7,8,9"
CCC = [1,3,4]
In this case I want a list containing AAA1 and AAA2.
Any help is highly appreciated.
Disclaimer: this solution works provided the values are always integers separated by ",".
public class StringNumber
{
public string Numbers { get; set; }
}
public static void Main()
{
var a = new StringNumber { Numbers = "1,2,3" };
var b = new StringNumber { Numbers = "4,5,6" };
var c = new StringNumber { Numbers = "7,8,9" };
List<StringNumber> stringNumbers = new List<StringNumber>() { a, b, c };
List<int> numbers = new List<int>() { 10 };
//ToList is needed unless you want it to actually remove from the stringNumbers!
var result = GetAllListsWithNumber(numbers, stringNumbers.ToList());
}
public static List<StringNumber> GetAllListsWithNumber(List<int> numbers, List<StringNumber> stringNumbers)
{
stringNumbers.RemoveAll(x => !x.Numbers
.Split(",", StringSplitOptions.RemoveEmptyEntries)
.Select(number => Convert.ToInt32(number))
.Any(number => numbers.Contains(number)));
return stringNumbers;
}
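A non-mutating variant using Where and Any might look like the sketch below. It works on in-memory collections only. Whether an ORM can translate Split into SQL for a DbSet is not covered here, so the assumption is that the rows are materialized first. The names Filter and MatchingAny are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class AAA
{
    public string BBB { get; set; }
}

public static class Filter
{
    // Returns the items whose comma-separated BBB shares at least one number with ccc.
    // Assumes every entry in BBB parses as an int.
    public static List<AAA> MatchingAny(IEnumerable<AAA> items, IEnumerable<int> ccc)
    {
        var wanted = new HashSet<int>(ccc);
        return items
            .Where(a => a.BBB
                .Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
                .Select(s => int.Parse(s))
                .Any(n => wanted.Contains(n)))
            .ToList();
    }

    public static void Main()
    {
        var items = new List<AAA>
        {
            new AAA { BBB = "1,2,3" },
            new AAA { BBB = "4,5,6" },
            new AAA { BBB = "7,8,9" }
        };
        foreach (var a in MatchingAny(items, new[] { 1, 3, 4 }))
        {
            Console.WriteLine(a.BBB); // 1,2,3 then 4,5,6
        }
    }
}
```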

How to get every possible combination based on the ranges in brackets?

Looking for the best way to take something like 1[a-C]3[1-6]07[R,E-G] and have it output a log that looks like the following: basically every possible combination based on the ranges in brackets.
1a3107R
1a3107E
1a3107F
1a3107G
1b3107R
1b3107E
1b3107F
1b3107G
1c3107R
1c3107E
1c3107F
1c3107G
all the way to 1C3607G.
Sorry for not being more technical about what I'm looking for; I'm just not sure of the correct terms to explain it.
Normally what we'd do to get all combinations is to put all our ranges into arrays, then use nested loops to loop through each array, and create a new item in the inner loop that gets added to our results.
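For a fixed number of ranges, that nested-loop idea can be sketched as follows. The arrays are hard-coded and shortened for illustration, and NestedLoops and Combine are made-up names.

```csharp
using System;
using System.Collections.Generic;

public static class NestedLoops
{
    // Builds every string of the shape 1{a}3{b}07{c} from three character ranges.
    public static List<string> Combine(char[] first, char[] second, char[] third)
    {
        var results = new List<string>();
        foreach (var a in first)
        {
            foreach (var b in second)
            {
                foreach (var c in third)
                {
                    // The innermost loop creates one result per combination.
                    results.Add($"1{a}3{b}07{c}");
                }
            }
        }
        return results;
    }

    public static void Main()
    {
        var results = Combine(new[] { 'a', 'b' }, new[] { '1', '2' }, new[] { 'R', 'E' });
        Console.WriteLine(results.Count);              // 8 (2 * 2 * 2)
        Console.WriteLine(string.Join(", ", results)); // starts with 1a3107R, 1a3107E, 1a3207R
    }
}
```

This only works when the number of ranges is known up front, which is why the input string needs the more general treatment described next.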
But in order to do that here, we'd first need to write a method that can parse your range string and return a list of char values defined by the range. I've written a rudimentary one here, which works with your sample input but should have some validation added to ensure the input string is in the proper format:
public static List<char> GetRange(string input)
{
input = input.Replace("[", "").Replace("]", "");
var parts = input.Split(',');
var range = new List<char>();
foreach (var part in parts)
{
var ends = part.Split('-');
if (ends.Length == 1)
{
range.Add(ends[0][0]);
}
else if (char.IsDigit(ends[0][0]))
{
var start = Convert.ToInt32(ends[0][0]);
var end = Convert.ToInt32(ends[1][0]);
var count = end - start + 1;
range.AddRange(Enumerable.Range(start, count).Select(c => (char) c));
}
else
{
var start = (int) ends[0][0];
var last = (int) ends[1][0];
var end = last < start ? 'z' : last;
range.AddRange(Enumerable.Range(start, end - start + 1)
.Select(c => (char) c));
if (last < start)
{
range.AddRange(Enumerable.Range('A', last - 'A' + 1)
.Select(c => (char) c));
}
}
}
return range;
}
Now that we can get a range of values from a string like "[a-C]", we need a way to create nested loops for each range, and to build our list of values based on the input string.
One way to do this is to replace our input string with one that contains placeholders for each range, and then we can create a loop for each range, and on each iteration we can replace the placeholder for that range with a character from the range.
So we'll take an input like this: "1[a-C]3[1-6]07[R,E-G]", and turn it into this: "1{0}3{1}07{2}". Now we can create loops where we take the characters from the first range and create a new string for each one of them, replacing the {0} with the character. Then, for each one of those strings, we iterate over the second range and create a new string that replaces the {1} placeholder with a character from the second range, and so on and so on until we've created new strings for every possible combination.
public static List<string> GetCombinatins(string input)
{
// Sample input = "1[a-C]3[1-6]07[R,E-G]"
var inputWithPlaceholders = string.Empty; // This will become "1{0}3{1}07{2}"
var placeholder = 0;
var ranges = new List<List<char>>();
for (int i = 0; i < input.Length; i++)
{
// We've found a range start, so replace this with our
// placeholder '{n}' and add the range to our list of ranges
if (input[i] == '[')
{
inputWithPlaceholders += $"{{{placeholder++}}}";
var rangeEndIndex = input.IndexOf("]", i);
ranges.Add(GetRange(input.Substring(i, rangeEndIndex - i)));
i = rangeEndIndex;
}
else
{
inputWithPlaceholders += input[i];
}
}
if (ranges.Count == 0) return new List<string> {input};
// Add strings for the first range
var values = ranges.First().Select(chr =>
inputWithPlaceholders.Replace("{0}", chr.ToString())).ToList();
// Then continually add all combinations of other ranges
for (int i = 1; i < ranges.Count; i++)
{
values = values.SelectMany(value =>
ranges[i].Select(chr =>
value.Replace($"{{{i}}}", chr.ToString()))).ToList();
}
return values;
}
Now with these methods out of the way, we can create output of all our ranges quite easily:
static void Main()
{
Console.WriteLine(string.Join(", ", GetCombinatins("1[a-C]3[1-6]07[R,E-G]")));
Console.WriteLine("\nPress any key to exit...");
Console.ReadKey();
}
I would approach this problem in three stages. The first stage is to transform the source string to an IEnumerable of IEnumerable<string>.
static IEnumerable<IEnumerable<string>> ParseSourceToEnumerables(string source);
For example the source "1[A-C]3[1-6]07[R,E-G]" should be transformed to the 6 enumerables below:
"1"
"A", "B", "C"
"3"
"1", "2", "3", "4", "5", "6"
"07"
"R", "E", "F", "G"
Each literal inside the source has been transformed to an IEnumerable<string> containing a single string.
The second stage would be to create the Cartesian product of these enumerables.
static IEnumerable<IEnumerable<T>> CartesianProduct<T>(
IEnumerable<IEnumerable<T>> sequences)
The final (and easiest) stage would be to concatenate each one of the inner IEnumerable<string> of the Cartesian product to a single string. For example
the sequence "1", "A", "3", "1", "07", "R" to the string "1A3107R"
The hardest stage is the first one, because it involves parsing. Below is a partial implementation:
static IEnumerable<IEnumerable<string>> ParseSourceToEnumerables(string source)
{
var matches = Regex.Matches(source, @"\[(.*?)\]", RegexOptions.Singleline);
int previousIndex = 0;
foreach (Match match in matches)
{
var previousLiteral = source.Substring(
previousIndex, match.Index - previousIndex);
if (previousLiteral.Length > 0)
yield return Enumerable.Repeat(previousLiteral, 1);
yield return SinglePatternToEnumerable(match.Groups[1].Value);
previousIndex = match.Index + match.Length;
}
var lastLiteral = source.Substring(previousIndex, source.Length - previousIndex);
if (lastLiteral.Length > 0) yield return Enumerable.Repeat(lastLiteral, 1);
}
static IEnumerable<string> SinglePatternToEnumerable(string pattern)
{
// TODO
// Should transform the pattern "X,A-C,YZ"
// to the sequence ["X", "A", "B", "C", "YZ"]
}
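The TODO is left open in the answer. One possible way to fill it is sketched below under stated assumptions: range ends are single characters in ascending order, and anything else is passed through as a literal. ExpandPattern is a hypothetical name, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;

public static class PatternSketch
{
    // Expands "X,A-C,YZ" into the sequence X, A, B, C, YZ.
    public static IEnumerable<string> ExpandPattern(string pattern)
    {
        foreach (var part in pattern.Split(','))
        {
            var ends = part.Split('-');
            if (ends.Length == 2 && ends[0].Length == 1 && ends[1].Length == 1)
            {
                // Single-character range: walk the character codes from start to end.
                for (int c = ends[0][0]; c <= ends[1][0]; c++)
                {
                    yield return ((char)c).ToString();
                }
            }
            else
            {
                // Not a range (or not one this sketch understands): keep as a literal.
                yield return part;
            }
        }
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" ", ExpandPattern("X,A-C,YZ"))); // X A B C YZ
        Console.WriteLine(string.Join(" ", ExpandPattern("R,E-G")));    // R E F G
    }
}
```

A descending range such as "a-C" yields nothing in this sketch, so wrap-around behavior would need an explicit rule.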
The second stage is hard too, but solved. I just grabbed the implementation from Eric Lippert's blog.
static IEnumerable<IEnumerable<T>> CartesianProduct<T>(
IEnumerable<IEnumerable<T>> sequences)
{
IEnumerable<IEnumerable<T>> emptyProduct = new[] { Enumerable.Empty<T>() };
return sequences.Aggregate(
emptyProduct,
(accumulator, sequence) =>
accumulator.SelectMany(_ => sequence,
(accseq, item) => accseq.Append(item)) // .NET Framework 4.7.1
);
}
The final stage is just a call to String.Join.
var source = "1[A-C]3[1-6]07[R,E-G]";
var enumerables = ParseSourceToEnumerables(source);
var combinations = CartesianProduct(enumerables);
foreach (var combination in combinations)
{
Console.WriteLine($"Combination: {String.Join("", combination)}");
}

Create Regex Pattern for String using C#

I have a string pattern like this:
#c1 12,34,222x8. 45,989,100x10. 767x55. #c1
I want to change these patterns into this:
c1,12,8
c1,34,8
c1,222,8
c1,45,10
c1,989,10
c1,100,10
c1,767,55
My code in C#:
private void btnProses_Click(object sender, EventArgs e)
{
String ps = txtpesan.Text;
Regex rx = new Regex(@"((?:\d+,)*(?:\d+))x(\d+)");
Match mc = rx.Match(ps);
while (mc.Success)
{
txtpesan.Text = rx.ToString();
}
}
I've been using split and replace, but to no avail. While trying to solve this problem I saw many people using regex, so I tried it too, but I don't understand the logic of building a regex pattern.
What should I use to solve this problem?
Sometimes regex is not the best approach and the old-school way wins. Assuming valid input:
var tokens = txtpesan.Text.Split(' '); // or split on a whitespace regex
var prefix = tokens[0].Trim('#');
var result = new StringBuilder();
//skip first and last token
foreach (var token in tokens.Skip(1).Reverse().Skip(1).Reverse())
{
var xIndex = token.IndexOf("x");
var numbers = token.Substring(0, xIndex).Split(',');
var lastNumber = token.Substring(xIndex + 1).Trim('.');
foreach (var num in numbers)
{
result.AppendLine(string.Format("{0},{1},{2}", prefix, num, lastNumber));
}
}
var viola = result.ToString();
Console.WriteLine(viola);
And here comes a somewhat ugly regex based solution:
var q = "#c1 12,34,222x8. 45,989,100x10. 767x55. #c1";
var results = Regex.Matches(q, @"(?:(?:,?\b(\d+))(?:x(\d+))?)+");
var caps = results.Cast<Match>()
.Select(m => m.Groups[1].Captures.Cast<Capture>().Select(cap => cap.Value));
var trailings = results.Cast<Match>().Select(m => m.Groups[2].Value).ToList();
var c1 = q.Split(' ')[0].Substring(1);
var cnt = 0;
foreach (var grp in caps)
{
foreach (var item in grp)
{
Console.WriteLine("{0},{1},{2}", c1, item, trailings[cnt]);
}
cnt++;
}
The pattern matches blocks of comma-separated digits, capturing each number into Group 1 and the digits after x into Group 2. I could not get rid of the cnt counter, sorry.
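One way the counter might be avoided is to read Group 2 from the same Match while iterating its Group 1 captures. The sketch below reuses the pattern from this answer; PerMatch and Expand are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PerMatch
{
    // Produces one "prefix,number,multiplier" line per number in the input.
    public static List<string> Expand(string q)
    {
        var prefix = q.Split(' ')[0].TrimStart('#');
        var lines = new List<string>();
        foreach (Match m in Regex.Matches(q, @"(?:(?:,?\b(\d+))(?:x(\d+))?)+"))
        {
            // Group 2 holds the digits after 'x' for this block.
            var trailing = m.Groups[2].Value;
            // Group 1 was captured once per comma-separated number in the block.
            foreach (Capture cap in m.Groups[1].Captures)
            {
                lines.Add($"{prefix},{cap.Value},{trailing}");
            }
        }
        return lines;
    }

    public static void Main()
    {
        foreach (var line in Expand("#c1 12,34,222x8. 45,989,100x10. 767x55. #c1"))
        {
            Console.WriteLine(line); // c1,12,8 ... c1,767,55
        }
    }
}
```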

how to split the line in the text file

Text File:
$3.00,0.00,0.00,1.00,L55894M8,$3.00,0.00,0.00,2.00,L55894M9
How do I split the line and get the serial number like L55894M8 and L55894M9?
To get the data that appears after the 4th comma and 9th comma, you would want to do:
var pieces = line.Split(',');
var serial1 = pieces[4];
var serial2 = pieces[9];
Edit: Upon further reflection, it appears your file has records that begin with $ and end with the next record. If you want these records, along with the serial number (which appears to be the last field) you can do:
var records = line.TrimStart('$').Split('$');
var recordObjects = records.Select(r => new { Line = r, Serial = r.TrimEnd(',').Split(',').Last() });
In your sample, it looks like you want the fields at indexes 4, 9, 14, and so on, treating every five fields as one group.
You can try it this way:
static void Main(string[] args)
{
string strSample = "$3.00,0.00,0.00,1.00,L55894M8,$3.00,0.00,0.00,2.00,L55894M9";
var result = from p in strSample.Split(',').Select((v, i) => new { Index = i, Value = v })
where p.Index % 5 == 4
select p.Value;
foreach (var r in result)
{
Console.WriteLine(r);
}
Console.ReadKey();
}
If the file contents are in a string, you can use the string's Split(',') method and then check each element of the resulting array, or grab every 5th element if that pattern holds throughout the data.
var str = "$3.00,0.00,0.00,1.00,L55894M8,$3.00,0.00,0.00,2.00,L55894M9";
var fieldArray = str.Split(new[] { ',' });
var serial1 = fieldArray[4]; // "L55894M8"
var serial2 = fieldArray[9]; // "L55894M9"
Try a regular expression.
string str = "$3.00,0.00,0.00,1.00,L55894M8,$3.00,0.00,0.00,2.00,L55894M9";
string pat = @"L[\w]+";
MatchCollection ar= Regex.Matches(str, pat);
foreach (var t in ar)
Console.WriteLine(t);
