I'm getting a string with some patterns, like:
A 11 A 222222 B 333 A 44444 B 55 A 66666 B
How can I get all the strings between A and B, taking the smallest span each time?
For example, "A 11 A 222222 B" should result in " 222222 ".
And the first example should result in:
222222
333
44444
55
66666
We can try searching for all regex matches in your input string that sit between A and B, or vice versa. Here is a regex pattern that uses lookarounds to do this:
(?<=\bA )\d+(?= B\b)|(?<=\bB )\d+(?= A\b)
Sample script:
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "A 11 A 222222 B 333 A 44444 B 55 A 66666 B";
var vals = Regex.Matches(input, @"(?<=\bA )\d+(?= B\b)|(?<=\bB )\d+(?= A\b)")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();
foreach (string val in vals)
{
    Console.WriteLine(val);
}
This prints:
222222
333
44444
55
66666
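For comparison, here is a sketch that avoids lookarounds entirely. It assumes the input is separated by spaces and that every value between the markers is a single token, as in the sample; the class and method names (MarkerPairs, ValuesBetween) are hypothetical, made up for this illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class for this sketch (not from the answer above).
static class MarkerPairs
{
    // Returns every token that sits directly between two *different* markers
    // ("A x B" or "B x A"), i.e. the smallest possible span.
    public static List<string> ValuesBetween(string input, string first = "A", string second = "B")
    {
        string[] tokens = input.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        var result = new List<string>();

        for (int i = 1; i < tokens.Length - 1; i++)
        {
            string before = tokens[i - 1], value = tokens[i], after = tokens[i + 1];
            bool valueIsMarker = value == first || value == second;
            bool aThenB = before == first && after == second;
            bool bThenA = before == second && after == first;

            if (!valueIsMarker && (aThenB || bThenA))
                result.Add(value);
        }
        return result;
    }
}
```

Called as MarkerPairs.ValuesBetween("A 11 A 222222 B 333 A 44444 B 55 A 66666 B"), it should yield the same five values as the regex. Values that contain spaces would need a different approach.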
Text from txt file:
10 25
32 44
56 88
102 127
135 145
...
For the first line, prepend 0; for every other line, prepend the last number of the previous line. Is it possible to do this with a regex, or do I need to loop through the lines after parsing them with a regex? Desired output:
0 10 25
25 32 44
44 56 88
88 102 127
127 135 145
(?<Middle>\d+)\s(?<End>\d+) //(?<Start>...)
I would advise against using regex here for readability reasons, but this will work:
var input = ReadFromFile(); // placeholder for however you load the text, e.g. File.ReadAllText(path)
var regex = @"(?<num>\d*)[\n\r]+";
var replace = "${num}\n${num} ";
var output = Regex.Replace(input, regex, replace);
That will do everything apart from the first 0.
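The missing leading 0 can simply be prepended to the result. A minimal sketch, assuming the whole file is already in one string; the ChainLines/PrefixWithPrevious names are hypothetical, and trimming trailing line breaks first is an addition so that a final newline does not produce a dangling extra line:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class for this sketch.
static class ChainLines
{
    // Repeats the number that ends each line at the start of the next line,
    // then adds the "0 " that the first line needs.
    public static string PrefixWithPrevious(string input)
    {
        string trimmed = input.TrimEnd('\r', '\n');
        string chained = Regex.Replace(trimmed, @"(?<num>\d*)[\n\r]+", "${num}\n${num} ");
        return "0 " + chained;
    }
}
```

Note that [\n\r]+ swallows a whole CRLF pair and the replacement writes a bare \n, so Windows line endings come out as LF.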
Note that a regex is not a great fit for a task like this. It can be used for small input strings, but for larger ones it is better to write a bit more logic and parse the text line by line.
So, mostly out of academic interest, here is a regex solution showing how to use different replacement patterns depending on whether the matched line is the first one or not:
var pat = @"(?m)(?:(\A)|^(?!\A))(.*\b\s+(\d+)\r?\n)";
var s = "10 25\n32 44\n56 88\n102 127\n135 14510 25\n32 44\n56 88\n102 127\n135 145";
var res = Regex.Replace(s, pat, m => m.Groups[1].Success ?
$"0 {m.Groups[2].Value}{m.Groups[3].Value} " : $"{m.Groups[2].Value}{m.Groups[3].Value} ");
Result of the C# demo:
0 10 25
25 32 44
44 56 88
88 102 127
127 135 14510 25
25 32 44
44 56 88
88 102 127
127 135 145
Note the \n line breaks are hardcoded, but it is still just an illustration of regex capabilities.
Pattern details
(?m) - an inline RegexOptions.Multiline modifier
(?:(\A)|^(?!\A)) - a non-capturing group matching either
(\A) - start of string capturing it to Group 1
| - or
^(?!\A) - start of a line (but not string due to the (?!\A) negative lookahead)
(.*\b\s+(\d+)\r?\n) - Group 2:
.*\b - 0+ chars other than newline up to the last word boundary on a line followed with...
\s+ - 1+ whitespaces (may be replaced with [\p{Zs}\t]+ to only match horizontal whitespaces)
(\d+) - Group 3: one or more digits
\r?\n - a CRLF or LF line break.
The replacement logic is inside the match evaluator: if Group 1 matched (m.Groups[1].Success), the match is replaced with 0, a space, the Group 2 and Group 3 values and a trailing space. Otherwise, it is replaced with Group 2 + Group 3 + a space.
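Because the match evaluator is an ordinary lambda, it can also carry state between matches. The sketch below is a different technique from the pattern above (not the answer's own): it keeps the last number of the previous line in a captured variable that starts as "0", so no separate first-line branch is needed. The class name is hypothetical, and the sketch relies on Regex.Replace calling the evaluator for each match from left to right, which is the default unless RegexOptions.RightToLeft is set.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class for this sketch.
static class StatefulReplace
{
    public static string PrefixWithPrevious(string input)
    {
        string previous = "0"; // value to prepend to the first line

        // (?m) makes ^ and $ work per line; [^\r\n]*? lazily skips to the last number,
        // and (?=\r?$) lets the pattern work with both LF and CRLF line breaks.
        return Regex.Replace(input, @"(?m)^[^\r\n]*?(\d+)(?=\r?$)", m =>
        {
            string replaced = previous + " " + m.Value;
            previous = m.Groups[1].Value; // last number on this line, used for the next one
            return replaced;
        });
    }
}
```

Since the line breaks are not part of the match, they are left exactly as they were in the input.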
A plain C# solution without regex:
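using System.IO;
using System.Linq;
using System.Text;

var lines = File.ReadLines(fileName);
var st = new StringBuilder(); // or write directly to disk with a StreamWriter, etc.
var last = "0"; // the first line gets a leading 0
foreach (var line in lines)
{
    st.AppendLine(last + " " + line);
    last = line.Split().LastOrDefault(); // last number on this line
}
var lines2 = st.ToString();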
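The comment in the snippet mentions writing straight to disk instead of building a string. A sketch of that, written against TextReader/TextWriter so it works with a StreamReader/StreamWriter pair over files as well as with StringReader/StringWriter; the class and method names are hypothetical, and keeping the previous value when a line is blank is a design choice of this sketch:

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper class for this sketch.
static class LineChainer
{
    // Streams line by line, writing "<last number of previous line> <line>",
    // so memory use stays flat even for large files.
    public static void PrefixWithPrevious(TextReader reader, TextWriter writer)
    {
        string last = "0";
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            writer.WriteLine(last + " " + line);
            // A null separator splits on any whitespace; empty entries are dropped so
            // trailing spaces or blank lines don't turn "last" into an empty string.
            last = line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries).LastOrDefault() ?? last;
        }
    }
}
```

For files, something like `using var reader = new StreamReader(inputPath); using var writer = new StreamWriter(outputPath);` followed by a call to LineChainer.PrefixWithPrevious(reader, writer) should work (inputPath and outputPath are placeholders).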
I have a lot of strings with the following pattern (format):
aaaaaaa aa aa
bbbbbbbbbbbbbbb bb bbbbb bbb bb
ccccc c cc ccc
XXXX XX
zzzzzz zzz
OR:
aaaaaaa aa aa
bbbbbbbbbbbbbbb bb bbbbb bbb bb
ccccc c cc ccc
dddd dddd
XXXX XX
zzzzzz zzz
OR:
aaaaaaa aa aa
bbbbbbbbbbbbbbb bb bbbbb bbb bb
ccccc c cc ccc
dddddddd
eeeee
XXXX XX
zzzzzz zzz
I want to replace XXXX XX with YYYY. I think I need to detect the last Enter (line break) in the string and do the operation there. How can I do this?
If the string in question is always on the second-to-last line, I'd split the input into an array of strings, one per line, then find out how many lines (array elements) there are. The line of interest is at index count - 2; replace that element with YYYY.
EDIT:
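using System;
using System.Text.RegularExpressions;

// Split on any line-break style, one element per line
var result = Regex.Split(input, "\r\n|\r|\n");
int len = result.Length;
result[len - 2] = "YYYY"; // second-to-last line
var output = string.Join(Environment.NewLine, result);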
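One caveat with the split approach: if the input ends with a line break, the last array element is an empty string, so index len - 2 points at the zzzzzz zzz line instead. A sketch that ignores trailing line breaks before counting, assuming the target is always the last-but-one non-empty line (the class and method names are hypothetical):

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class for this sketch.
static class SecondToLast
{
    public static string ReplaceLine(string input, string replacement)
    {
        // Ignore trailing line breaks so they don't create an empty "last line".
        string[] lines = Regex.Split(input.TrimEnd('\r', '\n'), "\r\n|\r|\n");
        if (lines.Length < 2)
            return input; // no second-to-last line to replace

        lines[lines.Length - 2] = replacement;
        return string.Join(Environment.NewLine, lines);
    }
}
```

As with the original snippet, the result is rejoined with Environment.NewLine, so the original line-break style (and any trailing line break) is not preserved.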
If it's just the pattern, here's an example with Regex:
\b\S{4}\s\S{2}\b
You could use this regex like this:
var regex = new Regex(@"\b\S{4}\s\S{2}\b");
var result = regex.Replace(inputString, "YYYY");
It looks for a word boundary (for example, right after a line break), then four non-whitespace characters, one whitespace character, two non-whitespace characters, and a word boundary again. It should do what you want.
However, depending on your input it might be a better idea to use this regex:
\b\S{4} \S{2}\b
Here I replaced the \s whitespace class with an actual space character, because \s also matches tabs and line breaks. It could still happen that some of your characters (punctuation, for example) create unexpected word boundaries, but I'd have to see an example of your input to tell.
EDIT
Since I realized that the relevant line in your pattern is followed by a line break, you could use this pattern as well:
\b\S{4} \S{2}\b\n
This would probably work even better, but then you'd have to replace it with "YYYY\n".
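Another option, not taken from the answer above, is to anchor the pattern to a whole line with RegexOptions.Multiline, so it cannot match inside a longer line and it works with both \n and \r\n line endings. A sketch, assuming the target line consists of exactly four non-space characters, one space and two non-space characters, with no trailing spaces (the LineReplace class name is hypothetical):

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class for this sketch.
static class LineReplace
{
    // With RegexOptions.Multiline, ^ and $ match at each line start/end;
    // (?=\r?$) keeps the optional \r of a CRLF line break out of the match.
    static readonly Regex Target = new Regex(@"^\S{4} \S{2}(?=\r?$)", RegexOptions.Multiline);

    public static string ReplaceTarget(string input, string replacement) =>
        Target.Replace(input, replacement);
}
```

Because the line break is never part of the match, the replacement is just "YYYY" and the surrounding lines keep their original endings.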
I have a string which I need to break into multiple substrings. According to my condition, while breaking it into substrings I have to search for two texts in the string. Here are my two texts:
1 . 2 To Other Mobiles (this may change depending on which substring is needed)
Total (this marks the end of the substring)
Here is my sample string content:
1 . 1 To Airtel Mobile
1 03/MAR/2013 16:06:31 9845070641 05:44 1.80 **
2 04/MAR/2013 10:00:29 9845096416 00:14 0.30 **
Total 25:28 9.30
1 . 2 To Other Mobiles
1 03/MAR/2013 06:41:06 9448485859 00:15 0.40 **
2 04/MAR/2013 18:57:47 9448367847 08:33 3.60 **
3 05/MAR/2013 18:57:05 9448485859 00:42 0.40 **
4 05/MAR/2013 20:13:19 9448367847 00:42 0.40 **
Total 34:33 18.00
1 . 3 To Fixed Landline
21 21/MAR/2013 11:59:35 08066000000 09:34 5.00
22 22/MAR/2013 11:31:33 08066000000 15:20 8.00
Total 01:35:23 54.00
Based on the index of these two texts, I am breaking the string into a substring. According to my condition, I have to read each substring only up to the next occurrence of the text Total.
But my present code reads the substring up to the last Total in the string.
Here is my Code:
string search1 = "1 . 2 To Other Mobiles";
string search2 = "Total";
int startPosition = currentText.IndexOf(search1);
if (startPosition >= 0)
{
startPosition += search1.Length;
int endPosition = currentText.LastIndexOf(search2);
if (endPosition > startPosition)
{
string result = currentText.Substring(startPosition, endPosition - startPosition);
}
}
In brief, I have to search for the first text and read until the next occurrence of the second text.
A solution without regex, written in LINQPad:
string source = @"1 . 1 To Airtel Mobile
1 03/MAR/2013 16:06:31 9845070641 05:44 1.80 **
2 04/MAR/2013 10:00:29 9845096416 00:14 0.30 **
Total 25:28 9.30
1 . 2 To Other Mobiles
1 03/MAR/2013 06:41:06 9448485859 00:15 0.40 **
2 04/MAR/2013 18:57:47 9448367847 08:33 3.60 **
3 05/MAR/2013 18:57:05 9448485859 00:42 0.40 **
4 05/MAR/2013 20:13:19 9448367847 00:42 0.40 **
Total 34:33 18.00
1 . 3 To Fixed Landline
21 21/MAR/2013 11:59:35 08066000000 09:34 5.00
22 22/MAR/2013 11:31:33 08066000000 15:20 8.00
Total 01:35:23 54.00";
string search1 = "1 . 2 To Other Mobiles";
string search2 = "Total";
var result = source.Split(new string[] { search1 }, StringSplitOptions.None)[1].
Split(new string[] { search2 }, StringSplitOptions.None)[0].
Dump();
Output:
1 03/MAR/2013 06:41:06 9448485859 00:15 0.40 **
2 04/MAR/2013 18:57:47 9448367847 08:33 3.60 **
3 05/MAR/2013 18:57:05 9448485859 00:42 0.40 **
4 05/MAR/2013 20:13:19 9448367847 00:42 0.40 **
From the comments on your question, I presumed you did not want the search terms to appear in the output, right?
If you need to repeatedly search for text blocks with the same start and end markers, you could wrap this in a method and let it call itself recursively on the remainder of the string (split index [1]).
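The core fix for the question's code is to look for search2 starting from startPosition, i.e. IndexOf(search2, startPosition), instead of calling LastIndexOf. A sketch that uses this in a loop rather than recursion, collecting every block that starts with a start marker and ends at the *next* end marker; the class/method names and the ordinal comparison are choices of this sketch, not from the answer:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class for this sketch.
static class BlockExtractor
{
    // Returns the text between each occurrence of startMarker and the
    // next occurrence of endMarker (the markers themselves are excluded).
    public static List<string> Between(string source, string startMarker, string endMarker)
    {
        var blocks = new List<string>();
        int searchFrom = 0;

        while (true)
        {
            int start = source.IndexOf(startMarker, searchFrom, StringComparison.Ordinal);
            if (start < 0) break;
            start += startMarker.Length;

            // Search for the end marker only after the start marker (not LastIndexOf).
            int end = source.IndexOf(endMarker, start, StringComparison.Ordinal);
            if (end < 0) break;

            blocks.Add(source.Substring(start, end - start));
            searchFrom = end + endMarker.Length;
        }
        return blocks;
    }
}
```

With the report from the question, BlockExtractor.Between(source, "1 . 2 To Other Mobiles", "Total") should return a single block containing the four "To Other Mobiles" rows.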
You can use a regular expression to capture the string up to the first search2:
var input = "1 . 2 To Other Mobiles asdf asd fas df asd fas dfas df asd fas df sdaf Total asdfasdfasdfasdfasdfasdf another Total ertyertyertyerty eryertwer";
var sub1 = "1 . 2 To Other Mobiles";
var sub2 = "Total";
var match = Regex.Match(input, sub1 + "(.*?)" + sub2, RegexOptions.IgnoreCase);
if (match.Success)
{
var m = match.Groups[1].Value;
Console.WriteLine(sub1 + m);
}
This will output
1 . 2 To Other Mobiles asdf asd fas df asd fas dfas df asd fas df sdaf
If you want everything up to the last search2, remove the ? from the capture group to make it greedy.
var input = "1 . 2 To Other Mobiles asdf asd fas df asd fas dfas df asd fas df sdaf Total asdfasdfasdfasdfasdfasdf another Total ertyertyertyerty eryertwer";
var sub1 = "1 . 2 To Other Mobiles";
var sub2 = "Total";
// note the capture group is missing the ?
var match = Regex.Match(input, sub1 + "(.*)" + sub2, RegexOptions.IgnoreCase);
if (match.Success)
{
var m = match.Groups[1].Value;
Console.WriteLine(sub1 + m);
}
This will output
1 . 2 To Other Mobiles asdf asd fas df asd fas dfas df asd fas df sdaf Total asdfasdfasdfasdfasdfasdf another
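Two details matter when the input is the multi-line report from the question rather than this one-line sample: . does not match line breaks unless RegexOptions.Singleline is set, and "1 . 2 To Other Mobiles" contains a ., which is a regex metacharacter, so escaping the markers with Regex.Escape is safer. A sketch combining both (the MarkerRegex/FirstBlock names are hypothetical):

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class for this sketch.
static class MarkerRegex
{
    // Lazy (.*?) stops at the first endMarker; Singleline lets . cross line breaks.
    // Returns null when there is no match.
    public static string FirstBlock(string input, string startMarker, string endMarker)
    {
        string pattern = Regex.Escape(startMarker) + "(.*?)" + Regex.Escape(endMarker);
        Match match = Regex.Match(input, pattern, RegexOptions.Singleline);
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

Switching (.*?) to (.*) gives the greedy "up to the last end marker" behaviour described above.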
I apologize if this is a duplicate question. I haven't found a solution for my situation.
I would like to find all integers surrounded by spaces and replace them with a space.
StringBuilder sb = new StringBuilder(" 123 123 456 789 fdsa jkl xyz x5x 456 456 123 123");
StringBuilder sbDigits = new StringBuilder(Regex.Replace(sb.ToString(), @"\s[0-9]+\s", " ", RegexOptions.Compiled));
The value of sbDigits is "123 789 fdsa jkl xyz x5x 456 123".
I would like it to be "fdsa jkl xyz x5x".
So, what is going on? How do I make sure the repeated numbers are matched too?
How about this:
Test string:
123 123 456 789 fdsa jkl xyz x5x 456 456 123 123 5x
Regex:
(?<=\s|^)[\d]+(?=\s|$)
Working example:
http://regex101.com/r/tJ5rA6
C#:
StringBuilder sb = new StringBuilder(" 123 123 456 789 fdsa jkl xyz x5x 456 456 123 123 5x");
StringBuilder sbDigits = new StringBuilder(Regex.Replace(sb.ToString(), @"(?<=\s|^)[\d]+(?=\s|$)", " ", RegexOptions.Compiled));
Return value (apart from the runs of spaces left where the numbers were):
fdsa jkl xyz x5x 5x
// "fixed" is a reserved keyword in C#, so the variable needs a different name
String cleaned = Regex.Replace(originalString, "\\s*\\d+\\s+", "");
Try the following regex:
StringBuilder sb = new StringBuilder(" 123 123 456 789 fdsa jkl xyz x5x 456 456 123 123");
StringBuilder sbDigits = new StringBuilder(Regex.Replace(sb.ToString(), @"\s*[0-9]+\d\s*", " ", RegexOptions.Compiled));
You can use this:
search: @"( [0-9]+)(?=\1\b)"
replace: ""
If you add word boundaries (\b), you match only 'digit words' (which sounds like what you want), and you can still match zero or more whitespace characters around the digits without matching numbers inside words:
\s*\b\d+\b\s*
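Replacing the matches of that pattern with a single space and then squeezing the leftover whitespace gives the output the question asks for. A minimal sketch; the DigitWords/Remove names are hypothetical, and the final whitespace collapse plus Trim are additions rather than part of the answer:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class for this sketch.
static class DigitWords
{
    public static string Remove(string input)
    {
        // Drop whole "digit words" together with the whitespace around them...
        string withoutNumbers = Regex.Replace(input, @"\s*\b\d+\b\s*", " ");
        // ...then squeeze repeated spaces into one and trim the ends.
        return Regex.Replace(withoutNumbers, @"\s+", " ").Trim();
    }
}
```

Replacing with " " rather than "" keeps two words apart when a number sat between them (for example "abc 12 def" becomes "abc def").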
I don't know much about regex, but it can be done with a little LINQ:
var str = "123 789 fdsa jkl xyz x5x 456 123";
var parts = str.Split().Where(x => !x.All(char.IsDigit));
var result = string.Join(" ", parts); // fdsa jkl xyz x5x
What happened
Look what happens when you apply your regex, which matches a whitespace, any number of digits, and another whitespace:
"( 123 )123 456 789 fdsa jkl xyz x5x 456 456 123 123"
// regex engine matches " 123 ", first fitting pattern
"( 123 )123( 456 )789 fdsa jkl xyz x5x 456 456 123 123"
// regex engine matches " 456 ", because the first match "ate" the whitespace
"( 123 )123( 456 )789 fdsa jkl xyz x5x( 456 )456 123 123"
// matches the first " 456 "
"( 123 )123( 456 )789 fdsa jkl xyz x5x( 456 )456( 123 )123"
// matches " 123 "
"( 123 )123( 456 )789 fdsa jkl xyz x5x( 456 )456( 123 )123"
So the regex only found " 123 ", " 456 ", " 456 " and " 123 ". You replaced these matches with a whitespace and this is what caused your output.
What you want to do
You want to match word boundaries with something that won't "eat" the word boundary (here, the whitespace). As suggested by many others,
\b\d+\b
will do the trick.
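The difference is easy to see by counting matches on the question's input: the original pattern consumes the separating spaces, so it finds only four numbers, while \b\d+\b consumes no whitespace and finds all eight. A small sketch (the MatchCounts class name is hypothetical):

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class for this sketch.
static class MatchCounts
{
    public const string Input = " 123 123 456 789 fdsa jkl xyz x5x 456 456 123 123";

    // \s[0-9]+\s eats the space on both sides, so the next number loses its leading space.
    public static int WithSpaces() => Regex.Matches(Input, @"\s[0-9]+\s").Count;

    // \b is zero-width, so adjacent numbers can all match; "x5x" has no boundary around 5.
    public static int WithBoundaries() => Regex.Matches(Input, @"\b\d+\b").Count;
}
```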
I have a text file that contains numbers in this format (separated by varying amounts of whitespace):
84 152 100
86 149 101
83 149 99
86 142 101
How can I remove the extra spaces so that it looks like this:
84 152 100
86 149 101
83 149 99
86 142 101
This is what I have tried so far :
string path = Directory.GetCurrentDirectory();
string[] lines = System.IO.File.ReadAllLines(@"data_1_2.txt");
string[] line = lines[0].Trim().Split(new string[] { " " }, StringSplitOptions.RemoveEmptyEntries);
But the result is just the numbers of the first line, split apart:
84
152
100
Use a bit of LINQ magic:
lines = lines.Select(l => String.Join(" ", l.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))).ToArray();
It will split each line using space as a separator, remove empty entries and join them back using space as a separator again.
You can use a simple regular expression:
lines = lines.Select(line => Regex.Replace(line, @"\s+", " ")).ToArray();
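If you would rather process the whole file as one string instead of line by line, one variant (not from the answers above) is to collapse only spaces and tabs, so the line breaks survive, and then strip the single space that may remain at either end of each line. A sketch, assuming the numbers are separated by spaces or tabs; the SpaceNormalizer/Normalize names are hypothetical:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class for this sketch.
static class SpaceNormalizer
{
    public static string Normalize(string text)
    {
        // Collapse runs of spaces/tabs (but not line breaks) into a single space.
        string collapsed = Regex.Replace(text, @"[ \t]+", " ");
        // Remove a leftover space at the start or end of any line (LF or CRLF).
        return Regex.Replace(collapsed, @"(?m)^ | (?=\r?$)", "");
    }
}
```

Usage could look like File.WriteAllText(path, SpaceNormalizer.Normalize(File.ReadAllText(path))), where path is a placeholder.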