Regex - 1st group 1 time, 2nd group Multiple times

I have data like this:
06deepaksharma
I need a regex to split the data as:
06, then multiple groups of 6 characters each.
So the format is: the first 2 digits, then multiple groups, each with a length equal to the value of those first 2 digits.
01DE > 01 D E (01, then 2 groups of 1 char each)
02DE > 02 DE (02, then 1 group of 2 chars)
02DESH > 02 DE SH (02, then 2 groups of 2 chars each)
03DEESHA > 03 DEE SHA (03, then 2 groups of 3 chars each)
01DEESHA > 01 D E E S H A (01, then 6 groups of 1 char each)
I hope it is now clear what I want.
I can't figure out how to fix the length of the second group based on the value of the first group, or how to specify that the second group may occur N times.
UPDATE:
So if we cannot derive the length of the second group from the first group, can we get the desired result if I fix the length of the second group?
That is, if the length of each character group is always 2:
01DE > 01 DE
01DEEPAK > 01 DE EP AK
XXDEEP > XX DE EP
So if the length is always 2, can we now get the desired result as stated in the update?

You can achieve what you described at the beginning of your question with both regex and LINQ:
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
var input = "03DEESHA";
var result = new List<string>();
var mtch = Regex.Match(input, @"^(\d+)(.*)"); // Get the Match object with the captured texts
result.Add(mtch.Groups[1].Value); // Add the number to the resulting list
var chunks = Regex.Matches(mtch.Groups[2].Value, // Get all chunks
string.Format(".{{{0}}}", int.Parse(mtch.Groups[1].Value)))
.Cast<Match>()
.Select(p => p.Value)
.ToList();
result.AddRange(chunks);
The regex ^(\d+)(.*) matches the digits at the beginning (Group 1) and then captures the rest of a single-line string (without newlines; to support them, pass RegexOptions.Singleline to Regex.Match) into Group 2.
Result of the above code execution: "03", "DEE", "SHA".
If the number of letters in your strings is not always divisible by the initial number, use ".{{1,{0}}}" instead of ".{{{0}}}" so that the shorter last chunk is captured too.
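A self-contained sketch of the same regex-plus-LINQ idea, with one assumption that differs from the answer: the prefix is taken as exactly two digits (`\d{2}`), as the question describes, rather than `\d+`. It already uses the `.{1,n}` form, so a shorter final chunk is kept. `PrefixChunker` and `Split` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PrefixChunker
{
    // Splits input such as "03DEESHA" into the two-digit prefix followed by
    // chunks whose length equals the prefix value: "03", "DEE", "SHA".
    // Assumption: the prefix is always exactly two digits and is non-zero.
    public static List<string> Split(string input)
    {
        var m = Regex.Match(input, @"^(\d{2})(.*)$", RegexOptions.Singleline);
        if (!m.Success)
            throw new ArgumentException("Input must start with two digits.", nameof(input));

        int size = int.Parse(m.Groups[1].Value);
        if (size == 0)
            throw new ArgumentException("Chunk size must be greater than zero.", nameof(input));

        var result = new List<string> { m.Groups[1].Value };
        // ".{1,n}" lets the last chunk be shorter when the length is not a multiple of n.
        string pattern = string.Format(".{{1,{0}}}", size);
        result.AddRange(Regex.Matches(m.Groups[2].Value, pattern, RegexOptions.Singleline)
            .Cast<Match>()
            .Select(x => x.Value));
        return result;
    }
}
```

For example, `PrefixChunker.Split("06deepaksharma")` gives "06", "deepak", "sharma".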

I don't think you can use a regex here, as you would need a quantifier whose length comes from a captured value.
However, you may consider a simple loop over the characters:
// first get the number of characters to read
int num = Convert.ToInt32(myString.Substring(0, 2));
var result = new List<string>();
// now a simple loop over the characters; Math.Min keeps the last chunk inside the string
for (int i = 2; i < myString.Length; i += num) result.Add(myString.Substring(i, Math.Min(num, myString.Length - i)));
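Or, if you really want a regex, parse the number first and THEN apply your regex:
var r = "([a-zA-Z]{" + num + "})";
// Split keeps the captured chunks but also returns empty strings around them, so filter those out (needs System.Linq)
var res = new Regex(r).Split(myString.Substring(2)).Where(s => s.Length > 0).ToArray();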
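For the fixed-length case in the question's update, .NET can do it with a single regex, because every repetition of a quantified group is kept in `Group.Captures`, not only the last one. A minimal sketch, assuming the prefix and every chunk are exactly 2 characters; `FixedWidthCaptures` and `Split` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class FixedWidthCaptures
{
    // A two-character prefix followed by one or more two-character groups.
    // Every repetition of group 2 is kept in Groups[2].Captures.
    private static readonly Regex Pattern = new Regex(@"^(..)(..)+$");

    public static List<string> Split(string input)
    {
        Match m = Pattern.Match(input);
        if (!m.Success)
            return new List<string>(); // odd length or too short: no match

        var result = new List<string> { m.Groups[1].Value };
        result.AddRange(m.Groups[2].Captures.Cast<Capture>().Select(c => c.Value));
        return result;
    }
}
```

The same trick covers the variable-length case if the pattern is built after reading the prefix, for example `^(\d{2})(.{3})+$` for a prefix of 03.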

Related

C# split 20 digit numbers and assign it to 5 string variables

I need guidance on how I can split a 20-digit number (e.g. 77772222666611118888) and uniquely assign the parts to five declared int variables, e.g. int n1, n2, n3, n4, n5.
Expected result
int mynumber = 77772222666611118888;
And after splitting and assigning, one gets the following:
n1=7777;
n2=2222;
n3=6666;
n4=1111;
n5=8888;
Thanks

You can use a simple regex for it:
string mynumber = "77772222666611118888";
var ns = Regex.Matches(mynumber, @"\d{4}").Cast<Match>()
.Select(x => x.Value)
.ToList();

If you need to use separate variables, you could do this:
string mynumber = "77772222666611118888";
string n1 = mynumber.Substring(0, 4);
string n2 = mynumber.Substring(4, 4);
string n3 = mynumber.Substring(8, 4);
string n4 = mynumber.Substring(12, 4);
string n5 = mynumber.Substring(16, 4);
If you're willing to use an array or another collection, you could do this:
int stringSize = 4;
string[] n = new string[mynumber.Length / stringSize];
for (int i = 0; i < n.Length; i ++)
{
n[i] = mynumber.Substring(i * stringSize, stringSize);
}

You'll need a System.Numerics.BigInteger to store the initial number: at 20 digits it is too long for an int, and it even exceeds long.MaxValue and ulong.MaxValue. (A decimal can hold it too, but then you have to truncate after each division.)
Once that is done, use modulus to get the digits (in reverse), and integer division to get rid of the used digits:
using System.Collections.Generic;
using System.Numerics;
BigInteger number = BigInteger.Parse("77772222666611118888");
BigInteger tempNumber = number;
List<long> splitNumbers = new List<long>();
while (tempNumber > 0)
{
long currentDigits = (long)(tempNumber % 10000);
tempNumber = tempNumber / 10000; // BigInteger division truncates, so this is integer division
// Store currentDigits in your variables:
// 8888 is the first number returned in this loop,
// then 1111 and so on
splitNumbers.Add(currentDigits);
}
// We got the numbers backwards, so reverse the list in place
splitNumbers.Reverse();
You could also turn it into a string, and use .Take(4) and int.Parse to get your numbers.

You should convert mynumber to a string first, then extract each part of the number using the Substring method, and parse those strings back into the desired integers.
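A minimal sketch of that approach (and of the `.Take(4)`/`int.Parse` idea above), assuming the length of the digit string is a multiple of the part width; `NumberSplitter` and `SplitToInts` are hypothetical names.

```csharp
using System.Linq;

public static class NumberSplitter
{
    // Cuts a digit string into consecutive parts of `width` characters and parses each one.
    // Assumption: digits.Length is a multiple of width and each part fits in an int.
    public static int[] SplitToInts(string digits, int width)
    {
        return Enumerable.Range(0, digits.Length / width)
            .Select(i => int.Parse(digits.Substring(i * width, width)))
            .ToArray();
    }
}
```

With `var parts = NumberSplitter.SplitToInts("77772222666611118888", 4);` you can then assign `int n1 = parts[0];` and so on. Because each part is parsed separately, a part such as "0012" becomes 12; keep the strings if leading zeros matter.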

You haven't defined what the rules are for performing the split you asked about. Are you splitting based on position and length? Are you splitting based on runs of identical digits?
Assuming you're splitting based on runs of identical digits, you could use a backreference like so,
Regex rxSameDigits = new Regex(@"(\d)(\1)*") ;
The \1 says, use the value of the specified group, in this case, the group starting with the first left parenthesis in the regex. So it says to
Match any digit, followed by
zero or more of the exact same digit
So it will match sequences like 1, 22, 333, etc. So you can simply say:
string s = "1223334444555556666667777777888888889999999990000000000" ;
string[] split = rxSameDigits.Matches(s).Cast<Match>().Select( x => x.Value ).ToArray() ;
And get the expected
1
22
333
4444
55555
666666
7777777
88888888
999999999
0000000000
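Applied to the number from the question, the backreference pattern gives the expected five parts. A small sketch (`DigitRuns` and `Runs` are hypothetical names); note the assumption that neighbouring parts never use the same digit, since a string like 77777777 comes back as one run rather than 7777 and 7777.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class DigitRuns
{
    // Returns each run of identical digits, e.g. "7777", "2222", ...
    // A digit followed by zero or more copies of itself (backreference \1).
    public static string[] Runs(string s)
    {
        return Regex.Matches(s, @"(\d)\1*")
            .Cast<Match>()
            .Select(m => m.Value)
            .ToArray();
    }
}
```

If adjacent parts can repeat a digit, the position-based Substring approaches above are the safer choice.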

LINQ non-linear order by string length

I'm trying to get a list of string ordered such that the longest are on either end of the list and the shortest are in the middle. For example:
A
BB
CCC
DDDD
EEEEE
FFFFFF
would get sorted as:
FFFFFF
DDDD
BB
A
CCC
EEEEE
EDIT: To clarify, I was specifically looking for a LINQ implementation to achieve the desired results because I wasn't sure how/if it was possible to do using LINQ.

You could create two ordered groups, then order the first group descending (already done) and the second group ascending:
var strings = new List<string> {
"A",
"BB",
"CCC",
"DDDD",
"EEEEE",
"FFFFFF"};
var two = strings.OrderByDescending(str => str.Length)
.Select((str, index) => new { str, index })
.GroupBy(x => x.index % 2)
.ToList(); // two groups, ToList to prevent double execution in following query
List<string> ordered = two.First()
.Concat(two.Last().OrderBy(x => x.str.Length))
.Select(x => x.str)
.ToList();
Result:
[0] "FFFFFF" string
[1] "DDDD" string
[2] "BB" string
[3] "A" string
[4] "CCC" string
[5] "EEEEE" string

Don't ask how and why... ^^
list.Sort((a, b) => a.Length.CompareTo(b.Length)); // Sort by length, in case the list is not already sorted.
var length = list.Count;
var result = Enumerable.Range(0, length)
.Select(i => length - 1 - 2 * i)
.Select(i => list[Math.Abs(i - (i >> 31))])
.ToList();
Okay, before I forget how it works, here you go.
A list with 6 items for example has to be reordered to this; the longest string is at index 5, the shortest one at index 0 of the presorted list.
5 3 1 0 2 4
We start with Enumerable.Range(0, length) yielding
0 1 2 3 4 5
then we apply i => length - 1 - 2 * i yielding
5 3 1 -1 -3 -5
and we have the non-negative part correct. Now note that i >> 31 is an arithmetic right shift, which copies the sign bit into all bits. Therefore non-negative numbers yield 0 while negative numbers yield -1. That in turn means subtracting i >> 31 will not change non-negative numbers but will add 1 to negative numbers, yielding
5 3 1 0 -2 -4
and now we finally apply Math.Abs() and get
5 3 1 0 2 4
which is the desired result. It works similarly for lists of odd length.
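To check the claim about odd lengths, the index sequence can be computed on its own. This sketch (`OutsideInIndices` and `Indices` are hypothetical names) returns the positions into the length-sorted list in the order they are taken.

```csharp
using System;
using System.Linq;

public static class OutsideInIndices
{
    // Same two Select steps as the answer above, for a list of length n.
    public static int[] Indices(int n)
    {
        return Enumerable.Range(0, n)
            .Select(i => n - 1 - 2 * i)             // n-1, n-3, ... down to negatives
            .Select(i => Math.Abs(i - (i >> 31)))   // shift negatives up by 1, then take Abs
            .ToArray();
    }
}
```

For 5 items it yields 4 2 0 1 3, so the longest string comes first and the second longest last.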

Just another option, which I find more readable and easy to follow:
You have an ordered list:
var strings = new List<string> {
"A",
"BB",
"CCC",
"DDDD",
"EEEEE",
"FFFFFF"};
Create a new list and simply alternate where you add items:
var new_list = new List<string>(); // This will hold your results
bool start = true; // Insert at head or tail
foreach (var s in strings)
{
if (start)
new_list.Insert(0,s);
else
new_list.Add(s);
start = !start; // Flip the insert location
}
Sweet and simple :)
As for Daniel Bruckner's comment, if you care about which string comes first, you could also change the start condition to:
// This makes sure the longest string is first
bool start = strings.Count % 2 == 1;
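`List<T>.Insert(0, ...)` shifts every existing element, so the loop above is quadratic for long lists. A sketch of the same alternating idea on `LinkedList<T>`, whose `AddFirst` and `AddLast` run in constant time, with an explicit sort by length instead of assuming the input is already ordered; `OutsideInOrdering` and `Order` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class OutsideInOrdering
{
    // Orders strings so the longest sit at both ends and the shortest in the middle.
    public static List<string> Order(IEnumerable<string> items)
    {
        var sorted = items.OrderBy(s => s.Length).ToList();
        var result = new LinkedList<string>();
        // Same start rule as above: guarantees the longest string ends up first.
        bool addFirst = sorted.Count % 2 == 1;
        foreach (var s in sorted)
        {
            if (addFirst)
                result.AddFirst(s);
            else
                result.AddLast(s);
            addFirst = !addFirst; // alternate between the two ends
        }
        return result.ToList();
    }
}
```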

Match a row with fixed columns as long as possible

I'm going to parse a position-based (fixed-width) file from a legacy system. Each column in the file has a fixed width, and each row can be at most 80 chars long. The problem is that you don't know how long a row is. Sometimes only the first five columns are filled in, and sometimes all columns are used.
If I KNEW that all 80 chars were used, I could simply do this:
^\s*
(?<a>\w{3})
(?<b>[ \d]{2})
(?<c>[ 0-9a-fA-F]{2})
(?<d>.{20})
...
But the problem with this is that if the last columns are missing, the row will not match. The last column can even have fewer chars than the maximum for that column.
See example
Text to match => a b c d
"AQM45A3A text " => AQM 45 A3 "A text " //group d has 9 chars instead of 20
"AQM45F5" => AQM 45 F5 //group d is missing
"AQM4" => AQM 4 //group b has 1 char instead of 2
"AQM4 ASome Text" => AQM 4 A "Some Text" //groups b and c only use one char each, but the gap is filled with a space
"AQM4FSome Text" => No match, group b should have two digits, but there is only one
"COM*A comment" => Comments do not match (all comments are prefixed with COM*)
" " => Empty lines do not match
How should I design the Regular Expression to match this?
Edit 1
In this example, EACH row that I want to parse starts with AQM
Column a always starts at position 0
Column b always starts at position 3
Column c always starts at position 5
Column d always starts at position 7
If a column does not use all its space, it is filled with spaces
Only the last used column can be trimmed
Edit 2
To make it clearer, I enclose here some examples of how the data might look, and the definition of the columns (note that the examples I mentioned earlier in the question were heavily simplified).

I'm not sure a regexp is the right thing to use here. If I understand your structure, you want something like
if (length >= 8)
d = everything from the 8th column on
remove field d
else
d = empty
if (length >= 6)
c = everything from the 6th column on
remove field c
else
c = empty
etc. Maybe a regexp can do it, but it will probably be rather contrived.
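The pseudocode above can be sketched in C# by cutting at the known column start positions from the question's Edit 1 (0, 3, 5 and 7). It walks forwards instead of peeling fields off the end, which gives the same result. Assumptions: the last column simply takes the rest of the row, and comment or blank lines are filtered out by the caller; `FixedColumns` and `Split` are hypothetical names.

```csharp
using System;

public static class FixedColumns
{
    // Column start positions from the question: a=0, b=3, c=5, d=7.
    private static readonly int[] Starts = { 0, 3, 5, 7 };

    // Cuts a row into its columns; columns beyond the end of the row come back as "".
    public static string[] Split(string row)
    {
        var columns = new string[Starts.Length];
        for (int i = 0; i < Starts.Length; i++)
        {
            int start = Starts[i];
            // The last column runs to the end of the row.
            int end = i + 1 < Starts.Length ? Starts[i + 1] : row.Length;
            if (start >= row.Length)
            {
                columns[i] = ""; // the row stops before this column
                continue;
            }
            // Math.Min handles a trimmed last column that ends early.
            columns[i] = row.Substring(start, Math.Min(end, row.Length) - start);
        }
        return columns;
    }
}
```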

Try using a ? after the groups that might be missing. That way, you still get a match if some group is missing.
Edit n, after Sguazz's answer
I would use
(?<a>AQM)(?<b>[ \d]{2})?(?<c>[ 0-9a-fA-F]{2})?(?<d>.{0,20})?
or even + instead of {0,20} for the last group, if there could be more than 20 chars.
Edit n+1,
Better like this?
(?<a>\w{3})(?<b>\d[ \d])(?<c>[0-9a-fA-F][ 0-9a-fA-F])(?<d>.+)

So, just to rephrase: in your example you have a sequence of characters, and you know that the first 3 belong to group A, the following 2 to group B, then 2 to group C and 20 to group D, but there might not be that many characters.
Try with:
(?<a>\w{0,3})(?<b>[ \d]{0,2})(?<c>[ 0-9a-fA-F]{0,2})(?<d>.{0,20})
Basically these numbers are now an upper limit of the group as opposed to a fixed size.
EDIT, to reflect your last comment: if you know that all your relevant rows start with 'AQM', you can replace group A with (?<a>AQM)
ANOTHER EDIT: Let's try with this instead.
(?<a>AQM)(?<b>[ \d]{2}|[ \d]$)(?<c>[ 0-9a-fA-F]{0,2})(?<d>.{0,20})
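A quick way to check the last pattern against the question's examples. The only change, an added assumption, is a leading `^` so the match must start at the beginning of the row; `AqmRowRegex` and `Parse` are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class AqmRowRegex
{
    // The last pattern from the answer above, anchored at the row start.
    private static readonly Regex Row = new Regex(
        @"^(?<a>AQM)(?<b>[ \d]{2}|[ \d]$)(?<c>[ 0-9a-fA-F]{0,2})(?<d>.{0,20})");

    // Returns the four named groups, or null when the row does not match.
    public static string[] Parse(string row)
    {
        Match m = Row.Match(row);
        if (!m.Success)
            return null;
        return new[] { m.Groups["a"].Value, m.Groups["b"].Value, m.Groups["c"].Value, m.Groups["d"].Value };
    }
}
```

With these inputs it also reproduces the "AQM4 ASome Text" case: b is "4 ", c is "A" and d is "Some Text".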

Perhaps you could use a function like this one to break the string into its column values. It skips comment strings and can handle strings shorter than 80 characters. It doesn't validate the contents of the columns, though; maybe you can do that when you use the values.
/// <summary>
/// Break a data row into a collection of strings based on the expected column widths.
/// </summary>
/// <param name="input">The width delimited input data to break into sub strings.</param>
/// <returns>
/// An empty collection if the input string is empty or a comment.
/// A collection of the width delimited values contained in the input string otherwise.
/// </returns>
private static IEnumerable<string> ParseRow(string input) {
const string COMMENT_PREFIX = "COM*";
var columnWidths = new int[] { 3, 2, 2, 3, 6, 14, 2, 2, 3, 2, 2, 10, 7, 7, 2, 1, 1, 2, 7, 1, 1 };
int inputCursor = 0;
int columnIndex = 0;
var parsedValues = new List<string>();
if (String.IsNullOrEmpty(input) || input.StartsWith(COMMENT_PREFIX) || input.Trim().Length == 0) {
return parsedValues;
}
while (inputCursor < input.Length && columnIndex < columnWidths.Length) {
//Make sure the column width never exceeds the bounds of the input string. This can happen if the input string doesn't end on the edge of a column.
int columnWidth = Math.Min(columnWidths[columnIndex++], input.Length - inputCursor);
string columnValue = input.Substring(inputCursor, columnWidth);
parsedValues.Add(columnValue);
inputCursor += columnWidth;
}
return parsedValues;
}

Getting a substring 2 characters at a time [duplicate]

I have a string that looks something like:
0122031203
I want to be able to parse it and add the following into a list:
01
22
03
12
03
So, I need to get each 2 characters and extract them.
I tried this:
List<string> mList = new List<string>();
for (int i = 0; i < _CAUSE.Length; i=i+2) {
mList.Add(_CAUSE.Substring(i, _CAUSE.Length));
}
return mList;
but something is not right here, I keep getting the following:
Index and length must refer to a location within the string. Parameter name: length
Did I get this wrong?

How about using Linq?
string s = "0122031203";
int i = 0;
var mList = s.GroupBy(_ => i++ / 2).Select(g => String.Join("", g)).ToList();
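The counter `i` in the query above is mutated inside a lambda, so the result depends on the query being enumerated exactly once. A variant that uses the indexed overload of `Select` avoids that side effect; `PairSplitter` and `Pairs` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class PairSplitter
{
    // Pairs each character with its index, groups by index / 2,
    // and turns every group back into a string.
    public static List<string> Pairs(string s)
    {
        return s.Select((c, i) => new { c, i })
            .GroupBy(x => x.i / 2, x => x.c)
            .Select(g => new string(g.ToArray()))
            .ToList();
    }
}
```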

I believe you may have specified the length incorrectly in the Substring call.
Try the following:
List<string> mList = new List<string>();
for (int i = 0; i < _CAUSE.Length; i = i + 2)
{
mList.Add(_CAUSE.Substring(i, 2));
}
return mList;
The length should be 2 if you wish to split this into chunks of 2 characters each.

When you call Substring, try _CAUSE.Substring(i, 2).

Two points:
1) As previously mentioned, it should be Substring(i, 2).
2) You should consider the case where the length of your string is odd. For example, for 01234: do you want 01 23 and discard the 4, or do you want 01 23 4?
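Both options for odd lengths can be handled by clamping the length passed to `Substring` with `Math.Min`. A minimal sketch, with `Chunker`, `Chunks` and the `keepRemainder` flag as hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class Chunker
{
    // Splits s into chunks of `size`; when keepRemainder is true a shorter final
    // chunk is kept ("01234" -> 01, 23, 4), otherwise it is dropped (01, 23).
    public static List<string> Chunks(string s, int size, bool keepRemainder)
    {
        var list = new List<string>();
        for (int i = 0; i < s.Length; i += size)
        {
            // Never ask Substring for more characters than remain.
            int len = Math.Min(size, s.Length - i);
            if (len < size && !keepRemainder)
                break;
            list.Add(s.Substring(i, len));
        }
        return list;
    }
}
```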

C# /Linq Yet another Brain Teaser

Friends, there is yet another scenario to solve. I am working it out without LINQ, but I hope this is a good opportunity for me to learn LINQ if you share your LINQ code.
It is known as FLAMES:
F - Friend
L - Lover
A - Admirer
M - Marry(Husband)
E - Enemy
S - Sister
Problem description:
Two names will be given (male, female). We have to strike out the common letters from both names. Then we count the number of letters remaining in both names after striking out the common characters. Finally, we iterate over the string FLAMES, striking out letters until a single character is left. The remaining character shows the relationship. I explain the process in more detail in the following example (ignore case and spaces).
Example :
Step 1
Male : Albert
Female : Hebarna
Letters "a", "e" and "b" are common to both names.
(Strike those letters from both strings. Even though the name "Hebarna" contains two "a"s, you may strike only a single "a" from each string, because the name "Albert" has only one "a".)
The resultant string is
Male : $ l $ $ r t
Female: H $ $ $ r n a
Step 2:
Count the remaining letters from both strings.
Count : 7
Step 3:
Using the count, iterate over the string "FLAMES" and strike out the letter where the count ends. Start each new round from the letter immediately after the one just struck out, wrapping from "S" back to "F", and never count letters that have already been struck out.
F L A M E S
Round 1: F=1, L=2, A=3, M=4, E=5, S=6, F=7. The count ends at F, so strike F:
$ L A M E S
Round 2: L=1, A=2, M=3, E=4, S=5, (F ignored) L=6, A=7. Strike A:
$ L $ M E S
Round 3: M=1, E=2, S=3, (F ignored) L=4, (A ignored) M=5, E=6, S=7. Strike S:
$ L $ M E $
Round 4: (F ignored) L=1, (A ignored) M=2, E=3, (S, F ignored) L=4, (A ignored) M=5, E=6, (S, F ignored) L=7. Strike L:
$ $ $ M E $
Round 5: ignoring all struck letters, M=1, E=2, M=3, E=4, M=5, E=6, M=7. Strike M.
The only remaining letter is "E", so Albert is an enemy to Hebarna.
Update:
The letter "r" is also common to both names; I forgot to strike it out. Anyhow, the process is the same as explained. Thanks for pointing it out.

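Step 1 and Step 2
// Lowercase each letter and skip spaces, as the question requires
var firstLookup = firstName.Where(c => c != ' ').ToLookup(c => char.ToLower(c));
var secondLookup = secondName.Where(c => c != ' ').ToLookup(c => char.ToLower(c));
// ILookup has no Keys property, so project the group keys instead
var allChars = firstLookup.Select(g => g.Key).Union(secondLookup.Select(g => g.Key));
int count =
(
from c in allChars
let firstCount = firstLookup[c].Count()
let secondCount = secondLookup[c].Count()
select
firstCount < secondCount ? secondCount - firstCount :
firstCount - secondCount
).Sum();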
Step3 (untested)
List<char> word = "FLAMES".ToList();
while (word.Count > 1)
{
int wordCount = word.Count;
int remove = (count-1) % wordCount;
word =
word.Select( (c, i) => new {c, i =
i == remove ? 0 :
i < remove ? i + wordCount + 1 :
i})
.OrderBy(x => x.i)
.Select(x => x.c)
.Skip(1)
.ToList();
}
char result = word.Single();
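An end-to-end sketch that combines both steps: the remaining-letter count is computed one-for-one per letter (ignoring case and non-letters), and the strike-out loop rotates the string so counting resumes right after the struck letter. `Flames`, `RemainingLetters` and `Result` are hypothetical names, and it assumes the count is at least 1 (a count of 0 is rejected).

```csharp
using System;
using System.Linq;

public static class Flames
{
    // Letters left after striking common letters one-for-one, ignoring case and spaces.
    public static int RemainingLetters(string first, string second)
    {
        var a = first.ToLowerInvariant().Where(char.IsLetter).ToList();
        var b = second.ToLowerInvariant().Where(char.IsLetter).ToList();
        // For each distinct letter, the unmatched occurrences are the difference in counts.
        return a.Concat(b)
            .Distinct()
            .Sum(ch => Math.Abs(a.Count(x => x == ch) - b.Count(x => x == ch)));
    }

    // Count `count` letters, strike the last one, and resume counting from the next letter.
    public static char Result(int count)
    {
        if (count < 1)
            throw new ArgumentOutOfRangeException(nameof(count));
        string flames = "FLAMES";
        while (flames.Length > 1)
        {
            int idx = (count - 1) % flames.Length;
            // Rotate so the letter after the struck one comes first, dropping the struck letter.
            flames = flames.Substring(idx + 1) + flames.Substring(0, idx);
        }
        return flames[0];
    }
}
```

With "r" struck as well (per the question's update), `RemainingLetters("Albert", "Hebarna")` returns 5 and `Result(5)` returns 'F'; the question's walk-through count of 7 gives 'E'.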

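// Assumes male and female are already lowercased with spaces removed.
// Intersect is set-based: for a letter repeated in both names, only one occurrence is struck from each.
var count = male.Length + female.Length - 2 * male.Intersect(female).Count();
var flames = "FLAMES";
while (flames.Length > 1)
{
int idx = (count - 1) % flames.Length; // position of the letter to strike
// Strike it and resume counting from the next letter, wrapping around
flames = flames.Substring(idx + 1) + flames.Substring(0, idx);
}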

That last calculation part that iterates through the flames-letters and removing one after another can be precalculated.
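public static char GetChar(int diff) {
// Entry k of the table is the result for diff k + 2 (entry 59 covers diff 1), hence the shift by 58
var idx = (diff + 58) % 60;
return "efefmeaelmaafmfaflefefeemsasamfmfallslslesmsasmmaelmlaslslfs"[idx];
}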
Some things can be done without linq... unless I totally messed something up.
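Because each round only depends on the count modulo the current length (6, 5, 4, 3 or 2), the result repeats every 60 counts, which is why a 60-character table is enough. Rather than typing the table by hand, it can be generated with the same strike-out loop; in this sketch (`FlamesTable` and `Build` are hypothetical names) entry k holds the result for count k + 1, so the lookup would be `table[(count - 1) % 60]`.

```csharp
using System.Text;

public static class FlamesTable
{
    // Strike-out loop: count letters, strike the last one, resume after it.
    private static char Result(int count)
    {
        string flames = "flames";
        while (flames.Length > 1)
        {
            int idx = (count - 1) % flames.Length;
            flames = flames.Substring(idx + 1) + flames.Substring(0, idx);
        }
        return flames[0];
    }

    // 60 is the least common multiple of 2..6, so counts 1..60 cover every case.
    public static string Build()
    {
        var sb = new StringBuilder(60);
        for (int count = 1; count <= 60; count++)
            sb.Append(Result(count));
        return sb.ToString();
    }
}
```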
