Restart Regex indexing based on line content - c#

I am trying to load a text file into a C# ContentBox and index the lines.
Currently the text file contains
Data
Company
Phone
Email
Company
Phone
Email
I have currently set up a Regex index to number all the lines in the text file on load.
string content = File.ReadAllText(file);
content = Regex.Replace(content, @"^\s*$\n|\r", string.Empty, RegexOptions.Multiline).TrimEnd();
var index = 0;
content = Regex.Replace(content, "^", (Match m) => (index++).ToString().PadLeft(4, '0') + " ", RegexOptions.Multiline);
ContentBox.Text = content;
This outputs
0000 Data
0001 Company
0002 Phone
0003 Email
0004 Company
0005 Phone
0006 Email
What I need to do is be able to output the following into the ContentBox.
0000 Data
0001 Company
0002 Phone
0003 Email
0001 Company
0002 Phone
0003 Email
Can anyone assist me with this?

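You don't need Regex for this. This code should point you in the right direction:
// ReadAllLines returns one string per line; ReadAllText would return a single
// string, and Select would then iterate over its characters.
string[] content =
File
.ReadAllLines(file)
.Select((x, n) => $"{(n == 0 ? 0 : (n - 1) % 3 + 1):0000} {x}")
.ToArray();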
Here's an example of it working:
string[] source = @"Data
ABC Bakery
0123
abc@bakery.com
DEF Pets
0124
def@pets.com".Split(Environment.NewLine.ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
string[] content =
source
.Select((x, n) => $"{(n == 0 ? 0 : (n - 1) % 3 + 1):0000} {x}")
.ToArray();
That gives:
0000 Data
0001 ABC Bakery
0002 0123
0003 abc@bakery.com
0001 DEF Pets
0002 0124
0003 def@pets.com
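The modulo approach assumes every record has exactly three lines. If the restart should depend on the line content instead, as the question title suggests, a loop with a counter may be easier to follow. This is only a sketch: the LineNumbering class, the Number method and the restartMarker parameter are names made up for illustration, and it assumes a record starts on a line whose trimmed text equals the marker (for example "Company").

```csharp
using System;
using System.Collections.Generic;

public static class LineNumbering
{
    // Prefixes each non-blank line with a 4-digit index.
    // The counter restarts at 1 whenever a line equals restartMarker
    // (an assumed marker; adapt the comparison to your real data).
    public static List<string> Number(IEnumerable<string> lines, string restartMarker)
    {
        var result = new List<string>();
        var index = 0;
        foreach (var line in lines)
        {
            if (string.IsNullOrWhiteSpace(line))
                continue; // skip blank lines, like the first Regex.Replace in the question

            if (line.Trim() == restartMarker)
                index = 1; // a new record begins here

            result.Add($"{index:0000} {line}");
            index++;
        }
        return result;
    }
}
```

You could then call something like ContentBox.Text = string.Join(Environment.NewLine, LineNumbering.Number(File.ReadLines(file), "Company")); which needs using System.IO as well.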

Related

How to find strings between two strings in c#

I'm getting a string with some patterns, like:
A 11 A 222222 B 333 A 44444 B 55 A 66666 B
How can I get all the strings between A and B, always taking the shortest possible span?
For example, "A 11 A 222222 B" results in " 222222 "
And the first example should result in:
222222
333
44444
55
66666
We can try searching for all regex matches in your input string which are situated between A and B, or vice-versa. Here is a regex pattern which uses lookarounds to do this:
(?<=\bA )\d+(?= B\b)|(?<=\bB )\d+(?= A\b)
Sample script:
string input = "A 11 A 222222 B 333 A 44444 B 55 A 66666 B";
var vals = Regex.Matches(input, @"(?<=\bA )\d+(?= B\b)|(?<=\bB )\d+(?= A\b)")
.Cast<Match>()
.Select(m => m.Value)
.ToArray();
foreach (string val in vals)
{
Console.WriteLine(val);
}
This prints:
222222
333
44444
55
66666

Append arrays and lists

For example, if the entered input is:
1 2 3 |4 5 6 | 7 8
we should manipulate it to
1 2 3|4 5 6|7 8
Another example:
7 | 4 5|1 0| 2 5 |3
we should manipulate it to
7|4 5|1 0|2 5|3
I want to normalize the input this way because I then need to exchange some of the subarrays (7; 4 5; 1 0; 2 5; 3).
I'm not sure this code works, or whether it is a good base for what I want to do, but here is what I have so far:
static void Main(string[] args)
{
List<string> arrays = Console.ReadLine()
.Split(' ', StringSplitOptions.RemoveEmptyEntries)
.ToList();
foreach (var element in arrays)
{
Console.WriteLine("element: " + element);
}
}
You need to split your input by "|" first and then by space. After this, you can reassemble your input with string.Join. Try this code:
var input = "1 2 3 |4 5 6 | 7 8";
var result = string.Join("|", input.Split('|')
.Select(part => string.Join(" ",
part.Trim().Split(new []{' '}, StringSplitOptions.RemoveEmptyEntries))));
// now result is "1 2 3|4 5 6|7 8"
You could do this with a simple regular expression:
var result = Regex.Replace(input, @"\s?\|\s?", "|");
This will match any (optional) white space character, followed by a | character, followed by an (optional) white space character and replace it with a single | character.
Alternatively, if you need to potentially strip out multiple spaces around the |, replace the zero-or-one quantifiers (?) with zero-or-more quantifiers (*):
var result = Regex.Replace(input, @"\s*\|\s*", "|");
To also deal with multiple spaces between numbers (not just around | characters), I'd recommend something like this:
var result = Regex.Replace(input, @"\s*([\s|])\s*", "$1");
This will match any occurrence of zero or more white space characters, followed by either a white space character or a | character (captured in group 1), followed by zero or more white space characters and replace it with whatever was captured in group 1.
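Since the stated goal is to exchange subarrays afterwards, it may help to keep the groups as lists rather than as one cleaned-up string. The following is a rough sketch of that idea; the SubArrays class and its Parse and Format methods are hypothetical names, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SubArrays
{
    // Splits on '|' first, then on spaces, dropping empty entries,
    // so "1 2 3 |4 5 6 | 7 8" becomes three lists of string tokens.
    public static List<List<string>> Parse(string input)
    {
        return input
            .Split('|')
            .Select(part => part
                .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                .ToList())
            .ToList();
    }

    // Joins the tokens with single spaces and the groups with '|'.
    public static string Format(List<List<string>> groups)
    {
        return string.Join("|", groups.Select(g => string.Join(" ", g)));
    }
}
```

With this shape, swapping two subarrays is a plain element swap on the outer list (for example exchanging groups[0] and groups[4]) before calling Format again.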

Regex if condition c#

Text from txt file:
10 25
32 44
56 88
102 127
135 145
...
For the first line, prepend 0; for every other line, prepend the last number of the previous line. Is it possible to do this with regex alone, or do I need to loop through the lines after the regex parse? The expected output is:
0 10 25
25 32 44
44 56 88
88 102 127
127 135 145
(?<Middle>\d+)\s(?<End>\d+) //(?<Start>...)
I would advise against using regex for readability reasons but this will work:
var input = ReadFromFile();
var regex = @"(?<num>\d*)[\n\r]+";
var replace = "${num}\n${num} ";
var output = Regex.Replace(input, regex, replace);
That will do everything apart from the first 0.
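To cover the missing first 0 as well, one option is simply to prepend it after the replacement. This is a small sketch around the same pattern; the CarryLastNumber class and Apply method are invented names, and trimming trailing line breaks first is my assumption (otherwise the final line break would also be rewritten and leave a dangling number at the end).

```csharp
using System;
using System.Text.RegularExpressions;

public static class CarryLastNumber
{
    // Copies the last number of each line to the start of the next line,
    // then prepends "0 " for the very first line.
    public static string Apply(string input)
    {
        // TrimEnd avoids rewriting a trailing line break at the end of the input.
        var body = Regex.Replace(input.TrimEnd(), @"(?<num>\d*)[\n\r]+", "${num}\n${num} ");
        return "0 " + body;
    }
}
```

Note that the output uses \n line breaks regardless of what the input used.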
Note that regex is not a great fit for a task like this. It can work for small input strings; for larger ones, it is better to write some more logic and parse the text line by line.
So, more from academic interest, here is a regex solution showing how to replace with different replacement patterns based on whether the line matched is first or not:
var pat = @"(?m)(?:(\A)|^(?!\A))(.*\b\s+(\d+)\r?\n)";
var s = "10 25\n32 44\n56 88\n102 127\n135 14510 25\n32 44\n56 88\n102 127\n135 145";
var res = Regex.Replace(s, pat, m => m.Groups[1].Success ?
$"0 {m.Groups[2].Value}{m.Groups[3].Value} " : $"{m.Groups[2].Value}{m.Groups[3].Value} ");
Result of the C# demo:
0 10 25
25 32 44
44 56 88
88 102 127
127 135 14510 25
25 32 44
44 56 88
88 102 127
127 135 145
Note the \n line breaks are hardcoded, but it is still just an illustration of regex capabilities.
Pattern details
(?m) - an inline RegexOptions.Multiline modifier
(?:(\A)|^(?!\A)) - a non-capturing group matching either
(\A) - start of string capturing it to Group 1
| - or
^(?!\A) - start of a line (but not string due to the (?!\A) negative lookahead)
(.*\b\s+(\d+)\r?\n) - Group 2:
.*\b - 0+ chars other than newline up to the last word boundary on a line followed with...
\s+ - 1+ whitespaces (may be replaced with [\p{Zs}\t]+ to only match horizontal whitespaces)
(\d+) - Group 3: one or more digits
\r?\n - a CRLF or LF line break.
The replacement logic is inside the match evaluator: if Group 1 matched (m.Groups[1].Success ?) replace with 0 and Group 2 + Group 3 values + space. Else, replace with Group 2 + Group 3 + space.
With plain C#, looping over the lines (needs System.IO, System.Linq and System.Text):
var lines = File.ReadLines(fileName);
var st = new StringBuilder(); // or a StreamWriter writing directly to disk, etc.
var last = "0";
foreach (var line in lines)
{
st.AppendLine(last + " " + line );
last = line.Split().LastOrDefault();
}
var lines2 = st.ToString();

How to search for immediate Substring present at multiple places in c# [duplicate]

This question already has answers here:
How to Read Substrings based on Length of the String in C#
(5 answers)
Closed 9 years ago.
I have a string that I need to break into multiple substrings. To do that, I search the string for two texts:
1 . 2 To Other Mobiles (this may change depending on which substring is needed)
Total (this marks the end of the substring)
Here is my sample string Content
1 . 1 To Airtel Mobile
1 03/MAR/2013 16:06:31 9845070641 05:44 1.80 **
2 04/MAR/2013 10:00:29 9845096416 00:14 0.30 **
Total 25:28 9.30
1 . 2 To Other Mobiles
1 03/MAR/2013 06:41:06 9448485859 00:15 0.40 **
2 04/MAR/2013 18:57:47 9448367847 08:33 3.60 **
3 05/MAR/2013 18:57:05 9448485859 00:42 0.40 **
4 05/MAR/2013 20:13:19 9448367847 00:42 0.40 **
Total 34:33 18.00
1 . 3 To Fixed Landline
21 21/MAR/2013 11:59:35 08066000000 09:34 5.00
22 22/MAR/2013 11:31:33 08066000000 15:20 8.00
Total 01:35:23 54.00
I break the string into a substring based on the indexes of these two texts. I need to read from the first text up to the next Total that immediately follows it.
But my current code reads up to the last Total in the string.
Here is my Code:
string search1 = "1 . 2 To Other Mobiles";
string search2 = "Total";
int startPosition = currentText.IndexOf(search1);
if (startPosition >= 0)
{
startPosition += search1.Length;
int endPosition = currentText.LastIndexOf(search2);
if (endPosition > startPosition)
{
string result = currentText.Substring(startPosition, endPosition - startPosition);
}
}
In brief, I need to find the first search text and read from there up to the next occurrence of the second text.
A solution without Regex done in LinqPad:
string source = @"1 . 1 To Airtel Mobile
1 03/MAR/2013 16:06:31 9845070641 05:44 1.80 **
2 04/MAR/2013 10:00:29 9845096416 00:14 0.30 **
Total 25:28 9.30
1 . 2 To Other Mobiles
1 03/MAR/2013 06:41:06 9448485859 00:15 0.40 **
2 04/MAR/2013 18:57:47 9448367847 08:33 3.60 **
3 05/MAR/2013 18:57:05 9448485859 00:42 0.40 **
4 05/MAR/2013 20:13:19 9448367847 00:42 0.40 **
Total 34:33 18.00
1 . 3 To Fixed Landline
21 21/MAR/2013 11:59:35 08066000000 09:34 5.00
22 22/MAR/2013 11:31:33 08066000000 15:20 8.00
Total 01:35:23 54.00";
string search1 = "1 . 2 To Other Mobiles";
string search2 = "Total";
var result = source.Split(new string[] { search1 }, StringSplitOptions.None)[1].
Split(new string[] { search2 }, StringSplitOptions.None)[0].
Dump();
Output:
1 03/MAR/2013 06:41:06 9448485859 00:15 0.40 **
2 04/MAR/2013 18:57:47 9448367847 08:33 3.60 **
3 05/MAR/2013 18:57:05 9448485859 00:42 0.40 **
4 05/MAR/2013 20:13:19 9448367847 00:42 0.40 **
From the comments on your question I presumed you did not want the search terms to appear in the output, yes?
If you need to repeatedly search for text blocks with the same start and end parameters, you could wrap this in a method and let it recursively continue from the split index [1].
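As a rough illustration of wrapping this in a method, here is an iterative variant rather than a recursive one. It uses IndexOf with a start index so that the search for the end text begins right after the start text, which gives the nearest Total instead of the last one. The BlockExtractor class and Between method are hypothetical names, and ordinal (case-sensitive) comparison is an assumption.

```csharp
using System;
using System.Collections.Generic;

public static class BlockExtractor
{
    // Returns the text between each occurrence of start and the first
    // occurrence of end that follows it. Neither marker is included.
    public static List<string> Between(string text, string start, string end)
    {
        var blocks = new List<string>();
        var position = 0;
        while (true)
        {
            var startIndex = text.IndexOf(start, position, StringComparison.Ordinal);
            if (startIndex < 0) break;
            startIndex += start.Length;

            // Search for end only from startIndex onwards.
            var endIndex = text.IndexOf(end, startIndex, StringComparison.Ordinal);
            if (endIndex < 0) break;

            blocks.Add(text.Substring(startIndex, endIndex - startIndex));
            position = endIndex + end.Length;
        }
        return blocks;
    }
}
```

For the question's data, BlockExtractor.Between(currentText, "1 . 2 To Other Mobiles", "Total") should return a single block containing the four call rows.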
You can use Regular expressions to capture the string up until first search2
var input = "1 . 2 To Other Mobiles asdf asd fas df asd fas dfas df asd fas df sdaf Total asdfasdfasdfasdfasdfasdf another Total ertyertyertyerty eryertwer";
var sub1 = "1 . 2 To Other Mobiles";
var sub2 = "Total";
var match = Regex.Match(input, sub1 + "(.*?)" + sub2, RegexOptions.IgnoreCase);
if (match.Success)
{
var m = match.Groups[1].Value;
Console.WriteLine(sub1 + m);
}
This will output
1 . 2 To Other Mobiles asdf asd fas df asd fas dfas df asd fas df sdaf
If you want up until the last search2 then remove the ? in the capture group to make it greedy.
var input = "1 . 2 To Other Mobiles asdf asd fas df asd fas dfas df asd fas df sdaf Total asdfasdfasdfasdfasdfasdf another Total ertyertyertyerty eryertwer";
var sub1 = "1 . 2 To Other Mobiles";
var sub2 = "Total";
// note the capture group is missing the ?
var match = Regex.Match(input, sub1 + "(.*)" + sub2, RegexOptions.IgnoreCase);
if (match.Success)
{
var m = match.Groups[1].Value;
Console.WriteLine(sub1 + m);
}
This will output
1 . 2 To Other Mobiles asdf asd fas df asd fas dfas df asd fas df sdaf Total asdfasdfasdfasdfasdfasdf another
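Two caveats apply when using this pattern on the real multi-line bill text. The search strings are concatenated into the pattern unescaped, so the . in "1 . 2" matches any character, and . does not match line breaks unless RegexOptions.Singleline is set. A hedged sketch that handles both follows; the RegexBetween class and TryFirstBetween method are names invented for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexBetween
{
    // Captures the text between start and the first following end.
    // Regex.Escape makes both markers match literally;
    // RegexOptions.Singleline lets . match line breaks too.
    public static bool TryFirstBetween(string input, string start, string end, out string value)
    {
        var pattern = Regex.Escape(start) + "(.*?)" + Regex.Escape(end);
        var match = Regex.Match(input, pattern, RegexOptions.Singleline);
        value = match.Success ? match.Groups[1].Value : string.Empty;
        return match.Success;
    }
}
```

Add RegexOptions.IgnoreCase back in if the markers may differ in case, as in the answer above.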

C# regex search and replace all integer numbers surrounded by space

I apologize if this is a duplicated question. I haven't found a solution for my situation.
I would like to search for all integer numbers surrounded by space, and replace them with a space.
StringBuilder sb = new StringBuilder(" 123 123 456 789 fdsa jkl xyz x5x 456 456 123 123");
StringBuilder sbDigits = new StringBuilder(Regex.Replace(sb.ToString(), @"\s[0-9]+\s", " ", RegexOptions.Compiled));
The value of sbDigits is "123 789 fdsa jkl xyz x5x 456 123"
I would like the return value to be "fdsa jkl xyz x5x"
So, what is going on? How do I ensure that I am getting the duplicate number?
How about this:
Test string:
123 123 456 789 fdsa jkl xyz x5x 456 456 123 123 5x
Regex:
(?<=\s|^)[\d]+(?=\s|$)
Working example:
http://regex101.com/r/tJ5rA6
C#:
StringBuilder sb = new StringBuilder(" 123 123 456 789 fdsa jkl xyz x5x 456 456 123 123 5x");
StringBuilder sbDigits = new StringBuilder(Regex.Replace(sb.ToString(), @"(?<=\s|^)[\d]+(?=\s|$)", " ", RegexOptions.Compiled));
Return value:
fdsa jkl xyz x5x 5x
// "fixed" is a reserved keyword in C#, so the variable is named result here.
string result = Regex.Replace(originalString, "\\s*\\d+\\s+", "");
Try the following regex:
StringBuilder sb = new StringBuilder(" 123 123 456 789 fdsa jkl xyz x5x 456 456 123 123");
StringBuilder sbDigits = new StringBuilder(Regex.Replace(sb.ToString(), @"\s*[0-9]+\d\s*", " ", RegexOptions.Compiled));
Regex demo
You can use this:
search: @"( [0-9]+)(?=\1\b)"
replace: ""
If you add in word boundaries (\b) you can capture only 'digit words' (which is what it sounds like you want). You can also capture zero or more white space characters around the digits while not matching the numbers inside letters:
\s*\b\d+\b\s*
I don't know too much about Regex. But it can be done with a little LINQ:
var str = "123 789 fdsa jkl xyz x5x 456 123";
var parts = str.Split().Where(x => !x.All(char.IsDigit));
var result = string.Join(" ", parts); // fdsa jkl xyz x5x
What happened
Look at what happens when you apply your regex, which matches a whitespace character, one or more digits, and another whitespace character:
"( 123 )123 456 789 fdsa jkl xyz x5x 456 456 123 123"
// regex engine matches " 123 ", first fitting pattern
"( 123 )123( 456 )789 fdsa jkl xyz x5x 456 456 123 123"
// regex engine matches " 456 ", because the first match "ate" the whitespace
"( 123 )123( 456 )789 fdsa jkl xyz x5x( 456 )456 123 123"
// matches the first " 456 "
"( 123 )123( 456 )789 fdsa jkl xyz x5x( 456 )456( 123 )123"
// matches " 123 "
"( 123 )123( 456 )789 fdsa jkl xyz x5x( 456 )456( 123 )123"
So the regex only found " 123 ", " 456 ", " 456 " and " 123 ". You replaced these matches with a whitespace and this is what caused your output.
What you want to do
You want to match word boundaries with something that won't "eat" the word boundary (here, the whitespace). As suggested by many others,
\b\d+\b
will do the trick.
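Put together, a minimal sketch of that approach could look like the following. The DigitWords class and Remove method are made-up names, and the second replacement, which collapses the leftover runs of whitespace, is my addition to reach the output asked for in the question. Keep in mind that \b also treats punctuation as a boundary, so a token such as 3.14 would lose its digits and keep the dot.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DigitWords
{
    // Removes tokens made up only of digits, then tidies the spacing.
    public static string Remove(string input)
    {
        // \b does not consume the surrounding whitespace,
        // so adjacent numbers are all matched.
        var withoutNumbers = Regex.Replace(input, @"\b\d+\b", string.Empty);

        // Collapse the whitespace left behind and trim the ends.
        return Regex.Replace(withoutNumbers, @"\s+", " ").Trim();
    }
}
```

For the string from the question, DigitWords.Remove(sb.ToString()) gives "fdsa jkl xyz x5x".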
