How to read from line x to line y using StreamReader - c#

I have a text file containing a lot of short stories spread over multiple lines. I want to get one story at a time and convert it to a string.
Example:
//Story A
Aaaa AAA aaaaa AAA
Aaa A A aaaa AA aaa
A Aaaa AAA aaaaa A A
//Story B
BBB b BBb BB bbb BB
BBB bb bb bb BB BB
BB bbb BBBB bbbb
How can I get all lines of Story A and convert them to a string without having to load Story B? (There will be many other stories, so loading them all isn't an option because it would use a lot of RAM.)

The static File Class in the System.IO Namespace has a lot of handy methods to process files. The File.ReadLines Method will save memory:
public static System.Collections.Generic.IEnumerable<string> ReadLines (string path);
The point is that it reads the file line by line, keeping only the current line in memory (plus the current file buffer, but not the whole file).
Using some LINQ, you can get the right story with
string storyTitle = "//Story A";
var story = String.Join(Environment.NewLine,
    File.ReadLines(theFile)
        .SkipWhile(s => s != storyTitle)
        .Skip(1) // Skip the current title
        .TakeWhile(s => !s.StartsWith("//")) // Next story title
);
We must skip the current story title with .Skip(1), otherwise .TakeWhile immediately stops reading lines.
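Since the question asks about StreamReader specifically, the same idea can be sketched with an explicit reader. This is only an illustration: the ReadStory helper name is hypothetical, and it assumes, as in the example file, that every story title is a line starting with //. It accepts a TextReader, so it works with a StreamReader over a file or with a StringReader for a quick check.

```csharp
using System;
using System.IO;
using System.Text;

// Quick check with an in-memory reader; for a real file, pass
// new StreamReader(path) instead (the path is up to the caller).
string sample = "//Story A\nAaaa AAA\nAaa A A\n//Story B\nBBB b BBb\n";
using (var reader = new StringReader(sample))
{
    Console.WriteLine(ReadStory(reader, "//Story A"));
}

// Hypothetical helper: returns the lines between the given title line and
// the next line starting with "//", or null if the title is never found.
static string ReadStory(TextReader reader, string storyTitle)
{
    string line;

    // Skip lines until the requested title is found.
    while ((line = reader.ReadLine()) != null && line != storyTitle) { }
    if (line == null) return null;

    // Collect lines until the next title or the end of the input.
    var story = new StringBuilder();
    bool first = true;
    while ((line = reader.ReadLine()) != null
           && !line.StartsWith("//", StringComparison.Ordinal))
    {
        if (!first) story.Append(Environment.NewLine);
        story.Append(line);
        first = false;
    }
    return story.ToString();
}
```

Only one line is held at a time besides the story being built, and reading stops as soon as the next title is reached, so later stories are never loaded.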

Related

How to remove decimals from file (round them)?

I have to open a file, find all decimal numbers, round them to whole numbers, and replace them in the text. The resulting text should be printed to the console.
I tried, but all I managed was to remove the decimal part. Please tell me how to round the numbers and replace them in the resulting text. Here is my code:
Console.WriteLine("Enter path to first file:");
String path1 = Console.ReadLine();
string text = File.ReadAllText(path1);
string pattern = @"(\d+)\.\d+";
if (File.Exists(path1))
{
    foreach (string phrase in Regex.Split(text, pattern))
    {
        Console.Write(phrase);
    }
    Console.Write("Press any key to continue . . . ");
    Console.ReadKey(true);
}
You can use the @"\d+([\.\,]\d+)" pattern to capture each number with any number of decimal places. Then use Regex.Replace with a MatchEvaluator that parses the captured value as a double and drops the decimals with ToString("F0") (see the Fixed-point format specifier).
The example below handles both , and . as the fraction separator: the comma is replaced with a dot, and the double.TryParse overload is called with NumberStyles.Any and CultureInfo.InvariantCulture (from the System.Globalization namespace). It also works with negative numbers (e.g. -0.98765 in the example):
var input = "I have 11.23$ and can spend 20,01 of it. "+
"Melons cost 01.25$ per -0.98765 kg, "+
"but my mom ordered me to buy 1234.56789 kg. "+
"Please do something with that decimals.";
var result = Regex.Replace(input, @"\d+([\.\,]\d+)", (match) =>
    double.TryParse(match.Value.Replace(",", "."), NumberStyles.Any, CultureInfo.InvariantCulture, out double value)
        ? value.ToString("F0")
        : match.Value);
// Result:
// I have 11$ and can spend 20 of it.
// Melons cost 1$ per -1 kg,
// but my mom ordered me to buy 1235 kg.
// Please do something with that decimals.
It also works on "Aaaa 50.05 bbbb 82.52 cccc 6.8888", giving "Aaaa 50 bbbb 83 cccc 7".
You can apply Math.Round to every match by passing a match evaluator to Regex.Replace as the replacement:
var text = "Aaaa 50.05 bbbb 82.52 cccc 6.8888";
var pattern = @"\d+\.\d+";
var result = Regex.Replace(text, pattern, x => $"{Math.Round(Double.Parse(x.Value))}");
Console.WriteLine(result); // => Aaaa 50 bbbb 83 cccc 7
See the C# demo.
The \d+\.\d+ regex is simple, it matches one or more digits, . and one or more digits. Double.Parse(x.Value) converts the found value to a Double, and then Math.Round rounds the number.
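Two details of the Math.Round approach are worth checking. Math.Round(double) rounds midpoints to the nearest even number by default, so 2.5 becomes 2, and Double.Parse without a culture argument uses the current culture, which may not treat . as the decimal separator. A minimal sketch that makes both choices explicit follows; the RoundDecimals helper name is hypothetical, and whether away-from-zero rounding is what you want is an assumption.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

Console.WriteLine(RoundDecimals("Aaaa 50.05 bbbb 82.52 cccc 6.8888 dddd 2.5"));
// Prints: Aaaa 50 bbbb 83 cccc 7 dddd 3

// Hypothetical helper: replaces every "digits.digits" number with its rounded value.
static string RoundDecimals(string text)
{
    return Regex.Replace(text, @"\d+\.\d+", m =>
    {
        // Parse with the invariant culture so "." is always the decimal separator.
        double value = double.Parse(m.Value, CultureInfo.InvariantCulture);
        // Round midpoints away from zero (2.5 -> 3) instead of the default to-even (2.5 -> 2).
        double rounded = Math.Round(value, MidpointRounding.AwayFromZero);
        return rounded.ToString("F0", CultureInfo.InvariantCulture);
    });
}
```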

Catastrophic backtracking; regular expression for extracting values in nested brackets

I would like to extract aaa, bb b={{b}}bb bbb and {ccc} ccc from the following string using regular expression:
zyx={aaa}, yzx={bb b={{b}}bb bbb}, xyz={{ccc} ccc}
Note: aaa represents an arbitrary sequence of any number of characters, hence no determined length or pattern. For instance, {ccc} ccc could be {cccccccccc}cc {cc} cccc cccc, or any other combination.
I have written the following regular expression:
(?<a>[^{}]*)\s*=\s*{((?<v>[^{}]+)*)},*
This expression extracts aaa, but fails to parse the rest of the input with catastrophic backtracking failure, because of the nested curly-brackets.
Any thoughts on how I can update the regex to process the nested brackets correctly?
(Just in case: I am using C# with .NET Core 3.0, if you need engine-specific options. Also, I would rather not do any magic in code, but work with the regex pattern only.)
Similar question
The question "regular expression to match balanced parentheses" is similar to this one, with one difference: here the brackets are not necessarily balanced; rather, they follow an x={y} pattern.
Update 1
Inputs such as the following are also possible:
yzx={bb b={{b}},bb bbb,},
Note the , after {{b}} and after bbb.
Update 2
I wrote the following pattern; it can match anything but aaa from the first example:
(?<A>[^{}]*)\s*=\s*{(?<V>(?<S>([^{}]?)\{(?:[^}{]+|(?&S))+\}))}(,|$)
Regex.Matches, pretty good
"={(.*?)}(, |$)" could work.
string input = "zyx={aaa}, yzx={bb b={{b}}bb bbb}, yzx={bb b={{b}},bb bbb,}, xyz={{ccc} ccc}";
string pattern = "={(.*?)}(, |$)";
var matches = Regex.Matches(input, pattern)
.Select(m => m.Groups[1].Value)
.ToList();
foreach (var m in matches) Console.WriteLine(m);
Output
aaa
bb b={{b}}bb bbb
bb b={{b}},bb bbb,
{ccc} ccc
Regex.Split, really good
I think for this job Regex.Split may be a better tool.
string input = "zyx={aaa}, yzx={bb b={{b}}bb bbb}, yzx={bb b={{b}},bb bbb,}, ttt={nasty{t, }, }, xyz={{ccc} ccc}, zzz={{{{{{{huh?}";
var matches2 = Regex.Split(input, "(^|, )[a-zA-Z]+=", RegexOptions.ExplicitCapture); // Or "(?:^|, )[a-zA-Z]+=" without the flag
Console.WriteLine("-------------------------"); // Adding this to show the empty element (see note below)
foreach (var m in matches2) Console.WriteLine(m);
Console.WriteLine("-------------------------");
-------------------------
{aaa}
{bb b={{b}}bb bbb}
{bb b={{b}},bb bbb,}
{nasty{t, }, }
{{ccc} ccc}
{{{{{{{huh?}
-------------------------
Note: The empty element is there because:
If a match is found at the beginning or the end of the input string, an empty string is included at the beginning or the end of the returned array.
Case 3
string input = "AAA={aaa}, BBB={bbb, bb{{b}}, bbb{b}}, CCC={ccc}, DDD={ddd}, EEE={00-99} ";
var matches2 = Regex.Split(input, "(?:^|, )[a-zA-Z]+="); // Or drop '?:' and use RegexOptions.ExplicitCapture
foreach (var m in matches2) Console.WriteLine(m);
{aaa}
{bbb, bb{{b}}, bbb{b}}
{ccc}
{ddd}
{00-99}
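If the braces inside each value are balanced, as in the first example of the question, .NET balancing groups are another option. The (?&S) call in Update 2 is PCRE-style recursion, which the .NET engine does not support, and the nested quantifier in ([^{}]+)* is what invites catastrophic backtracking. The sketch below is an illustration under those assumptions, not a drop-in answer: the ExtractValues helper and the group names key, value, and depth are made up here, and it will not match entries with unbalanced braces such as zzz={{{{{{{huh?}.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string input = "zyx={aaa}, yzx={bb b={{b}}bb bbb}, xyz={{ccc} ccc}";
foreach (string value in ExtractValues(input)) Console.WriteLine(value);
// Prints:
// aaa
// bb b={{b}}bb bbb
// {ccc} ccc

// Hypothetical helper: returns the text inside the outer braces of each key={...} entry.
static List<string> ExtractValues(string text)
{
    // [^{}]+          a run of non-brace characters
    // \{(?<depth>)    an opening brace pushes onto the "depth" stack
    // \}(?<-depth>)   a closing brace pops it (cannot match when the stack is empty)
    // (?(depth)(?!))  fails the match if an opening brace is still unclosed
    // (?>...)         atomic group, so a run of characters is not re-split on backtracking
    string pattern =
        @"(?<key>\w+)\s*=\s*\{(?<value>(?>[^{}]+|\{(?<depth>)|\}(?<-depth>))*)(?(depth)(?!))\}";

    var values = new List<string>();
    foreach (Match m in Regex.Matches(text, pattern))
    {
        values.Add(m.Groups["value"].Value);
    }
    return values;
}
```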

Regex if condition c#

Text from txt file:
10 25
32 44
56 88
102 127
135 145
...
On the first line, a 0 should be placed at the start; every other line should start with the last number of the previous line. Is it possible to do this with a regex, or do I need to loop through the lines after parsing them?
0 10 25
25 32 44
44 56 88
88 102 127
127 135 145
(?<Middle>\d+)\s(?<End>\d+) //(?<Start>...)
I would advise against using regex for readability reasons but this will work:
var input = ReadFromFile();
var regex = @"(?<num>\d*)[\n\r]+";
var replace = "${num}\n${num} ";
var output = Regex.Replace(input, regex, replace);
That will do everything apart from the first 0.
Note that a regex approach does not sound quite good for a task like this. It can be used for small input strings, for larger ones, it is recommended that you write some more logic and parse text line by line.
So, more from academic interest, here is a regex solution showing how to replace with different replacement patterns based on whether the line matched is first or not:
var pat = @"(?m)(?:(\A)|^(?!\A))(.*\b\s+(\d+)\r?\n)";
var s = "10 25\n32 44\n56 88\n102 127\n135 14510 25\n32 44\n56 88\n102 127\n135 145";
var res = Regex.Replace(s, pat, m => m.Groups[1].Success ?
$"0 {m.Groups[2].Value}{m.Groups[3].Value} " : $"{m.Groups[2].Value}{m.Groups[3].Value} ");
Result of the C# demo:
0 10 25
25 32 44
44 56 88
88 102 127
127 135 14510 25
25 32 44
44 56 88
88 102 127
127 135 145
Note the \n line breaks are hardcoded, but it is still just an illustration of regex capabilities.
Pattern details
(?m) - an inline RegexOptions.Multiline modifier
(?:(\A)|^(?!\A)) - a non-capturing group matching either
(\A) - start of string capturing it to Group 1
| - or
^(?!\A) - start of a line (but not string due to the (?!\A) negative lookahead)
(.*\b\s+(\d+)\r?\n) - Group 2:
.*\b - 0+ chars other than newline up to the last word boundary on a line followed with...
\s+ - 1+ whitespaces (may be replaced with [\p{Zs}\t]+ to only match horizontal whitespaces)
(\d+) - Group 3: one or more digits
\r?\n - a CRLF or LF line break.
The replacement logic is inside the match evaluator: if Group 1 matched (m.Groups[1].Success ?) replace with 0 and Group 2 + Group 3 values + space. Else, replace with Group 2 + Group 3 + space.
With C#.
var lines = File.ReadLines(fileName);
var st = new StringBuilder(); // or a StreamWriter writing directly to disk, etc.
var last = "0";
foreach (var line in lines)
{
    st.AppendLine(last + " " + line);
    last = line.Split().LastOrDefault();
}
var lines2 = st.ToString();

C# detect last `Enter` in string

I have a lot of strings with the following pattern (format):
aaaaaaa aa aa
bbbbbbbbbbbbbbb bb bbbbb bbb bb
ccccc c cc ccc
XXXX XX
zzzzzz zzz
OR:
aaaaaaa aa aa
bbbbbbbbbbbbbbb bb bbbbb bbb bb
ccccc c cc ccc
dddd dddd
XXXX XX
zzzzzz zzz
OR :
aaaaaaa aa aa
bbbbbbbbbbbbbbb bb bbbbb bbb bb
ccccc c cc ccc
dddddddd
eeeee
XXXX XX
zzzzzz zzz
I want to replace XXXX XX with YYYY. I think I need to detect the last Enter in the string and do the operation there. How can I do this?
I'd do something like this. If the string in question is always on the second to last line, I'd split the string into an array of strings, a single string per line. Then find out how many lines (strings in array) there are. The object of interest is this number -2. Then replace this string with YYYY.
EDIT:
var result = Regex.Split(input, "\r\n|\r|\n");
int len = result.Length;
result[len - 2] = "YYYY";
var output = string.Join(Environment.NewLine, result);
If it's just the pattern, here's an example with Regex:
\b\S{4}\s\S{2}\b
You could use this regex like this:
var regex = new Regex(@"\b\S{4}\s\S{2}\b");
var result = regex.Replace(inputString, "YYYY");
It looks for a word boundary (e.g. a return), then four non-whitespace characters, then one whitespace character, two non-whitespace characters and a word boundary again. It should do what you want.
However, depending on your input it might be a better idea to use this regex:
\b\S{4} \S{2}\b
So I replaced the whitespace character with an actual space character. Of course it could still happen that one of your characters is counted as a word boundary, then again I'd have to see an example of your input.
EDIT
As I realized that the line relevant to your pattern is followed by a line break, you could use this pattern as well:
\b\S{4} \S{2}\b\n
Which would probably work even better. However you'd have to replace it with "YYYY\n" then.
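To answer the literal question of detecting the last line break, string.LastIndexOf can locate it without a regex or a split. A minimal sketch, assuming the line to replace is always the second-to-last one and that the string does not end with a trailing line break (the ReplaceSecondToLastLine helper name is hypothetical):

```csharp
using System;

string input = "aaaaaaa aa aa\nccccc c cc ccc\nXXXX XX\nzzzzzz zzz";
Console.WriteLine(ReplaceSecondToLastLine(input, "YYYY"));

// Hypothetical helper: swaps the second-to-last line for the replacement text.
static string ReplaceSecondToLastLine(string text, string replacement)
{
    // Position of the last line break; everything after it is the last line.
    int lastBreak = text.LastIndexOf('\n');
    if (lastBreak < 0) return text; // single line: nothing to replace

    // Position of the line break before that one (-1 if there is none).
    int previousBreak = lastBreak > 0 ? text.LastIndexOf('\n', lastBreak - 1) : -1;

    // Keep a "\r" that belongs to a Windows-style "\r\n" line ending.
    int lineEnd = lastBreak > 0 && text[lastBreak - 1] == '\r' ? lastBreak - 1 : lastBreak;

    return text.Substring(0, previousBreak + 1) + replacement + text.Substring(lineEnd);
}
```

Unlike the split-and-join version, this keeps the original line endings untouched and does not depend on what the second-to-last line contains.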

Keep only lines in String which contain numbers

I have tried a couple of regular expressions but have not been able to come up with one that works correctly. I have a string with several lines, and I want to keep only the lines that contain numbers.
Current String
-----------------
Dog
Cat
Cat 1
Dog 22
Once processed the expected result is:
Filtered String
-----------------
Cat 1
Dog 22
myString.Split('\n').Where(s => s.Any(c => Char.IsDigit(c)));
This splits the string by newline ('\n') characters, and for each "line", it finds the ones that have at least one character that is a digit.
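The expression above returns an IEnumerable<string> rather than a string, and splitting on '\n' alone leaves a trailing '\r' on each line of Windows-style text. A small sketch that handles both and joins the result back together (the KeepLinesWithDigits helper name is hypothetical):

```csharp
using System;
using System.Linq;

string myString = "Dog\r\nCat\r\nCat 1\r\nDog 22";
Console.WriteLine(KeepLinesWithDigits(myString));

// Hypothetical helper: keeps only the lines that contain at least one digit.
static string KeepLinesWithDigits(string text)
{
    var lines = text
        // Split on both Windows ("\r\n") and Unix ("\n") line endings.
        .Split(new[] { "\r\n", "\n" }, StringSplitOptions.None)
        // A string is a sequence of chars, so Any can test each character.
        .Where(line => line.Any(char.IsDigit));

    return string.Join(Environment.NewLine, lines);
}
```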
