C# detect last `Enter` in string - c#

I have a lot of strings with the following pattern (format):
aaaaaaa aa aa
bbbbbbbbbbbbbbb bb bbbbb bbb bb
ccccc c cc ccc
XXXX XX
zzzzzz zzz
OR:
aaaaaaa aa aa
bbbbbbbbbbbbbbb bb bbbbb bbb bb
ccccc c cc ccc
dddd dddd
XXXX XX
zzzzzz zzz
OR :
aaaaaaa aa aa
bbbbbbbbbbbbbbb bb bbbbb bbb bb
ccccc c cc ccc
dddddddd
eeeee
XXXX XX
zzzzzz zzz
I want to replace XXXX XX with YYYY. I think I need to detect the last `Enter` in the string and do the operation there. How can I do this?

I'd do something like this. If the string in question is always on the second-to-last line, I'd split the string into an array of strings, one per line, and then count the lines (strings in the array). The line of interest is at index count - 2. Replace that element with YYYY and join the lines again.
EDIT:
var result = Regex.Split(input, "\r\n|\r|\n");
int len = result.Length;
result[len - 2] = "YYYY";
var output = string.Join(Environment.NewLine, result);
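Since the question asks about detecting the last `Enter`, here is a minimal sketch that finds the last two line breaks with LastIndexOf instead of splitting. The helper name ReplaceBeforeLastBreak is hypothetical, and the sketch assumes the input does not end with a trailing line break:

```csharp
using System;

// Hypothetical helper: replaces the line that sits just before the last line break.
static string ReplaceBeforeLastBreak(string input, string replacement)
{
    int lastBreak = input.LastIndexOf('\n');
    if (lastBreak < 0) return input; // only one line, nothing to replace

    // Find the break that precedes the target line (-1 if the target is the first line).
    int prevBreak = lastBreak == 0 ? -1 : input.LastIndexOf('\n', lastBreak - 1);
    int start = prevBreak + 1;
    int end = lastBreak;

    // Keep a Windows-style "\r\n" break intact.
    if (end > start && input[end - 1] == '\r') end--;

    return input.Substring(0, start) + replacement + input.Substring(end);
}

Console.WriteLine(ReplaceBeforeLastBreak("aaa aa\nXXXX XX\nzzzzzz zzz", "YYYY"));
```

Unlike the split-and-join version, this keeps the original line-break style of the input instead of normalizing it to Environment.NewLine.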

If it's just the pattern, here's an example with Regex:
\b\S{4}\s\S{2}\b
You could use this regex like this:
var regex = new Regex(@"\b\S{4}\s\S{2}\b");
var result = regex.Replace(inputString, "YYYY");
It looks for a word boundary (e.g. a return), then four non-whitespace characters, then one whitespace character, two non-whitespace characters and a word boundary again. It should do what you want.
However, depending on your input it might be a better idea to use this regex:
\b\S{4} \S{2}\b
Here I replaced the whitespace class with a literal space character. One of your characters could still be treated as a word boundary, but to tell I'd have to see an example of your input.
EDIT
Since the line that matches your pattern is followed by a line break, you could use this pattern as well:
\b\S{4} \S{2}\b\n
Which would probably work even better. However you'd have to replace it with "YYYY\n" then.
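If the position matters more than the 4+2 character shape, a regex can also target the second-to-last line directly. This is only a sketch: the helper name ReplaceSecondToLastLineRegex is hypothetical, it assumes the input has no trailing line break, and the replacement text must not contain `$` (which has a special meaning in Regex.Replace replacement strings):

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: replaces the non-empty line that is followed by
// exactly one more line before the end of the string.
static string ReplaceSecondToLastLineRegex(string input, string replacement)
{
    // [^\r\n]+               the content of one line
    // (?=\r?\n[^\r\n]*\z)    lookahead: a line break, then the final line, then end of input
    return Regex.Replace(input, @"[^\r\n]+(?=\r?\n[^\r\n]*\z)", replacement);
}

Console.WriteLine(ReplaceSecondToLastLineRegex("ccccc c cc ccc\ndddd dddd\nXXXX XX\nzzzzzz zzz", "YYYY"));
```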

Related

How to remove decimals from file (round them)?

I have to open a file, find all decimal numbers, round them to whole numbers, and replace them in the text. The resulting text should be printed to the Console.
I tried to do it, but all I managed was to remove the decimal part. Please tell me how to round the numbers and replace them in the resulting text. Here is my code:
Console.WriteLine("Enter path to first file:");
String path1 = Console.ReadLine();
string text = File.ReadAllText(path1);
string pattern = @"(\d+)\.\d+";
if(File.Exists(path1) ){
foreach(string phrase in Regex.Split(text, pattern)){
Console.Write(phrase);
}
Console.Write("Press any key to continue . . . ");
Console.ReadKey(true);
}
You can use the @"\d+([\.\,]\d+)" pattern to capture each number with any number of decimal digits. Then use Regex.Replace with a MatchEvaluator that parses the captured value as a double and drops the decimals with ToString("F0") (see the Fixed-point format specifier).
The example below handles both comma , and dot . as the fraction separator: the comma is replaced with a dot, and the value is parsed with the double.TryParse overload that takes NumberStyles.Any and CultureInfo.InvariantCulture (from the System.Globalization namespace). It also works with negative numbers (e.g. -0.98765 in the example), because the sign is not part of the match and stays in place:
var input = "I have 11.23$ and can spend 20,01 of it. "+
"Melons cost 01.25$ per -0.98765 kg, "+
"but my mom ordered me to buy 1234.56789 kg. "+
"Please do something with that decimals.";
var result = Regex.Replace(input, @"\d+([\.\,]\d+)", (match) =>
double.TryParse(match.Value.Replace(",", "."), NumberStyles.Any, CultureInfo.InvariantCulture, out double value)
? value.ToString("F0")
: match.Value);
// Result:
// I have 11$ and can spend 20 of it.
// Melons cost 1$ per -1 kg,
// but my mom ordered me to buy 1235 kg.
// Please do something with that decimals.
It also works on "Aaaa 50.05 bbbb 82.52 cccc 6.8888", giving "Aaaa 50 bbbb 83 cccc 7".
You can use Math.Round on all matches that you can transform using Regex.Replace and a match evaluator as the replacement:
var text = "Aaaa 50.05 bbbb 82.52 cccc 6.8888";
var pattern = @"\d+\.\d+";
var result = Regex.Replace(text, pattern, x => $"{Math.Round(Double.Parse(x.Value))}");
Console.WriteLine(result); // => Aaaa 50 bbbb 83 cccc 7
The \d+\.\d+ regex is simple, it matches one or more digits, . and one or more digits. Double.Parse(x.Value) converts the found value to a Double, and then Math.Round rounds the number.
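Two details of the Math.Round approach are worth making explicit. Double.Parse without a culture uses the current culture, and Math.Round rounds midpoints to the nearest even number by default. The sketch below pins both down; the helper name RoundDecimals is hypothetical:

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper: rounds every "digits.digits" number in the text
// using the given midpoint rule, independent of the current culture.
static string RoundDecimals(string text, MidpointRounding mode)
{
    return Regex.Replace(text, @"\d+\.\d+", m =>
    {
        double value = double.Parse(m.Value, CultureInfo.InvariantCulture);
        return Math.Round(value, mode).ToString(CultureInfo.InvariantCulture);
    });
}

// Default behaviour of Math.Round: midpoints go to the nearest even number.
Console.WriteLine(RoundDecimals("a 2.5 b 3.5 c 0.5", MidpointRounding.ToEven));       // a 2 b 4 c 0
// "School" rounding: midpoints go away from zero.
Console.WriteLine(RoundDecimals("a 2.5 b 3.5 c 0.5", MidpointRounding.AwayFromZero)); // a 3 b 4 c 1
```

Which rule is right depends on the requirements; the point is only that the choice should be deliberate.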

Catastrophic backtracking; regular expression for extracting values in nested brackets

I would like to extract aaa, bb b={{b}}bb bbb and {ccc} ccc from the following string using regular expression:
zyx={aaa}, yzx={bb b={{b}}bb bbb}, xyz={{ccc} ccc}
Note: aaa represents an arbitrary sequence of any number of characters, hence no determined length or pattern. For instance, {ccc} ccc could be {cccccccccc}cc {cc} cccc cccc, or any other combination.
I have written the following regular expression:
(?<a>[^{}]*)\s*=\s*{((?<v>[^{}]+)*)},*
This expression extracts aaa, but fails to parse the rest of the input with catastrophic backtracking failure, because of the nested curly-brackets.
Any thoughts on how I can update the regex to process the nested brackets correctly?
(Just in case, I am using C# .NET Core 3.0, if you need engine-specific options. Also, I would rather not do any magic in code, but work with the regex pattern only.)
Similar question
The question "regular expression to match balanced parentheses" is similar to this one, with one difference: here the brackets are not necessarily balanced; rather, they follow the x={y} pattern.
Update 1
Inputs such as the following are also possible:
yzx={bb b={{b}},bb bbb,},
Note the , after {{b}} and after bbb.
Update 2
I wrote the following pattern; it can match anything but aaa from the first example:
(?<A>[^{}]*)\s*=\s*{(?<V>(?<S>([^{}]?)\{(?:[^}{]+|(?&S))+\}))}(,|$)
Regex.Matches, pretty good
"={(.*?)}(, |$)" could work.
string input = "zyx={aaa}, yzx={bb b={{b}}bb bbb}, yzx={bb b={{b}},bb bbb,}, xyz={{ccc} ccc}";
string pattern = "={(.*?)}(, |$)";
var matches = Regex.Matches(input, pattern)
.Select(m => m.Groups[1].Value)
.ToList();
foreach (var m in matches) Console.WriteLine(m);
Output
aaa
bb b={{b}}bb bbb
bb b={{b}},bb bbb,
{ccc} ccc
Regex.Split, really good
I think for this job Regex.Split may be a better tool.
string input = "zyx={aaa}, yzx={bb b={{b}}bb bbb}, yzx={bb b={{b}},bb bbb,}, ttt={nasty{t, }, }, xyz={{ccc} ccc}, zzz={{{{{{{huh?}";
var matches2 = Regex.Split(input, "(^|, )[a-zA-Z]+=", RegexOptions.ExplicitCapture); // Or "(?:^|, )[a-zA-Z]+=" without the flag
Console.WriteLine("-------------------------"); // Adding this to show the empty element (see note below)
foreach (var m in matches2) Console.WriteLine(m);
Console.WriteLine("-------------------------");
Output (the first element is an empty string)
-------------------------
{aaa}
{bb b={{b}}bb bbb}
{bb b={{b}},bb bbb,}
{nasty{t, }, }
{{ccc} ccc}
{{{{{{{huh?}
-------------------------
Note: The empty element is there because:
If a match is found at the beginning or the end of the input string, an empty string is included at the beginning or the end of the returned array.
Case 3
string input = "AAA={aaa}, BBB={bbb, bb{{b}}, bbb{b}}, CCC={ccc}, DDD={ddd}, EEE={00-99} ";
var matches2 = Regex.Split(input, "(?:^|, )[a-zA-Z]+="); // Or drop '?:' and use RegexOptions.ExplicitCapture
foreach (var m in matches2) Console.WriteLine(m);
Output
{aaa}
{bbb, bb{{b}}, bbb{b}}
{ccc}
{ddd}
{00-99}
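The (?&S) recursion in the question's second attempt is PCRE syntax, which the .NET engine does not support. For inputs where the inner brackets are balanced, .NET's balancing groups are an alternative. The sketch below is one possible pattern, not the approach from the answers above; the group names name, value and open are hypothetical. Every alternative inside the loop consumes exactly one character and the alternatives are mutually exclusive, which avoids the nested-quantifier shape ([^{}]+)* that causes the catastrophic backtracking:

```csharp
using System;
using System.Text.RegularExpressions;

string input = "zyx={aaa}, yzx={bb b={{b}}bb bbb}, xyz={{ccc} ccc}";

string pattern =
    @"(?<name>\w+)\s*=\s*\{" +   // key, '=', opening bracket of the value
    @"(?<value>(?:[^{}]" +       // any non-bracket character
    @"|(?<open>\{)" +            // '{' pushes onto the 'open' stack
    @"|(?<-open>\})" +           // '}' pops from the 'open' stack (fails if the stack is empty)
    @")*(?(open)(?!)))" +        // the value is valid only if the stack is empty again
    @"\}";                       // closing bracket of the value

foreach (Match m in Regex.Matches(input, pattern))
    Console.WriteLine($"{m.Groups["name"].Value} -> {m.Groups["value"].Value}");

// zyx -> aaa
// yzx -> bb b={{b}}bb bbb
// xyz -> {ccc} ccc
```

This sketch will not match values with unbalanced inner brackets such as zzz={{{{{{{huh?}; for those, the Regex.Split approach above is more forgiving.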

How to read from line x to line y using StreamReader

I have a text file containing a lot of short stories spread over multiple lines. I want to get one story at a time and convert it to a string.
Example:
//Story A
Aaaa AAA aaaaa AAA
Aaa A A aaaa AA aaa
A Aaaa AAA aaaaa A A
//Story B
BBB b BBb BB bbb BB
BBB bb bb bb BB BB
BB bbb BBBB bbbb
How can I get all lines of Story A and convert it to a string without having to load Story B? (There are going to be many other stories so loading them all isn't an option as it will occupy a lot of RAM)
The static File Class in the System.IO Namespace has a lot of handy methods to process files. The File.ReadLines Method will save memory:
public static System.Collections.Generic.IEnumerable<string> ReadLines (string path);
The point is that it reads the file line by line, keeping only the current line in memory (plus the current file buffer, but not the whole file).
Using some LINQ, you can get the right story with
string storyTitle = "//Story A";
var story = String.Join(Environment.NewLine,
File.ReadLines(theFile)
.SkipWhile(s => s != storyTitle)
.Skip(1) // Skip the current title
.TakeWhile(s => !s.StartsWith("//")) // Next story title
);
We must skip the current story title with .Skip(1), otherwise .TakeWhile immediately stops reading lines.
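Since the title asks about StreamReader, the same idea can be sketched as an explicit loop. The helper name ReadStory is hypothetical. It accepts any TextReader, so it would work with a StreamReader opened on the file (for example new StreamReader(theFile)); a StringReader stands in here to keep the example self-contained:

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: returns the lines after `title`, up to the next line starting with "//".
static string ReadStory(TextReader reader, string title)
{
    var story = new StringBuilder();
    bool inside = false;

    for (var line = reader.ReadLine(); line != null; line = reader.ReadLine())
    {
        if (!inside)
        {
            // Still looking for the title line; the title itself is not part of the story.
            inside = line == title;
            continue;
        }
        if (line.StartsWith("//", StringComparison.Ordinal)) break; // next story title: stop reading
        if (story.Length > 0) story.Append(Environment.NewLine);
        story.Append(line);
    }
    return story.ToString();
}

using var reader = new StringReader("//Story A\nAaaa AAA\nAaa A A\n//Story B\nBBB b BBb");
Console.WriteLine(ReadStory(reader, "//Story A"));
```

Reading stops at the next title, so the lines of later stories are never read.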

Find the last part of a string

I am really not good at using RegEx; I struggle every time I try to use it.
I have a string:
"aaa bbb ccc - ddd eee fff - xxx yyy zzz";
What I am trying to get is the substring after the last ' - '.
If I use the pattern "^.* - (.*)$" like below, it doesn't work.
string pattern = "aaa bbb ccc - ddd eee fff - xxx yyy zzz";
Match match = Regex.Match(pattern, @"^.* - (.*)$", RegexOptions.IgnoreCase);
Which pattern makes match.Captures.Count equal 1 and match.Captures[0].Value equal "xxx yyy zzz"?
I have to use Regex, because I have a generic function and the pattern is a parameter.
What should the pattern be?
Background:
I have a function already deployed in production. Its main job is:
..............................
string name = xxx;
Regex regex = new Regex(pattern);
Match match = regex.Match(name);
if (match != null)
{
for (int index = match.Captures.Count - 1; index > 0; index--)
name = name.Remove(match.Captures[index].Index, match.Captures[index].Length);
}
xxx = name;
...........................
Regex is, yet again, overkill for this sort of thing. Just use LastIndexOf:
var result = pattern.Substring(pattern.LastIndexOf("-") + 1).TrimStart(); // TrimStart drops the space after the dash
Output: xxx yyy zzz
EDIT:
Regex version: (.)(?<=- )([^-])+$. Don't bother matching from the start of the string (using ^); you only care about the end.
Not sure why you need this though. I would be interested to see your "non-simplified" version of your function.
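The original pattern does find the text, but it puts it in match.Groups[1], while match.Captures[0] of the match itself is always the whole matched text. So for the production code that reads match.Captures, the pattern has to match only the wanted part. A minimal sketch, assuming the separator is exactly " - " (the variable names are hypothetical):

```csharp
using System;
using System.Text.RegularExpressions;

string input = "aaa bbb ccc - ddd eee fff - xxx yyy zzz";

// The question's pattern: the wanted text is in Groups[1], not in Captures[0].
Match m1 = Regex.Match(input, @"^.* - (.*)$");
Console.WriteLine(m1.Groups[1].Value);    // xxx yyy zzz
Console.WriteLine(m1.Captures[0].Value);  // the whole input string

// Lookbehind variant: the match itself is only the text after the last " - ".
// (?<= - )        must be preceded by " - "
// (?:(?! - ).)*$  then any characters up to the end, with no further " - " in between
Match m2 = Regex.Match(input, @"(?<= - )(?:(?! - ).)*$");
Console.WriteLine(m2.Captures.Count);     // 1
Console.WriteLine(m2.Captures[0].Value);  // xxx yyy zzz
```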

.NET Regex: Does the {0} quantifier work?

In LINQPad (.NET), all of these expressions return "True":
new Regex(@"\w{0}").IsMatch("aa aa ZZ Z").Dump();
new Regex(@"(\w){0}").IsMatch("aa aa ZZ Z").Dump();
new Regex(@"[\w]{0}").IsMatch("aa aa ZZ Z").Dump();
new Regex(@"([\w]){0}").IsMatch("aa aa ZZ Z").Dump();
new Regex(@"\w{0,0}").IsMatch("aa aa ZZ Z").Dump();
new Regex(@"(\w){0,0}").IsMatch("aa aa ZZ Z").Dump();
new Regex(@"[\w]{0,0}").IsMatch("aa aa ZZ Z").Dump();
new Regex(@"([\w]){0,0}").IsMatch("aa aa ZZ Z").Dump();
new Regex(@"([a]){0,0}").IsMatch("aaaaZZZ").Dump();
Why?
I'm assuming that your plan is to make sure that a certain character isn't present in the source string by using the {0} quantifier on it. That's not going to work. The {0} quantifier itself is useless here: it means "match the previous token zero times", which is true for all strings, even the empty string. Zero is only useful as a lower bound, for example in a{0,5} to match zero to five occurrences of a.
Regexes are designed to match text, so you need to go through some contortions to make them not match text. For example:
Regex(@"^\W*$") // syntactic sugar for Regex(@"^[^\w]*$")
matches only if the entire string consists of non-alphanumeric characters.
Regex(@"^[^a]*$")
matches only if the entire string consists of characters other than a.
Regex is better at positive assertions than negative ones. new Regex(@"\w{0}") is the same as new Regex(@""). {0} means to match zero instances of \w. Since there is nothing else in the regex, it will match all input strings.
Each of your expressions tries to match a zero-width string, which is present in every possible string. Thus it returns true.
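A short sketch of the points above: {0} always succeeds with an empty match, while "does not contain" has to be expressed either as a whole-string pattern or by negating a positive match. The variable names are hypothetical:

```csharp
using System;
using System.Text.RegularExpressions;

// {0} matches the empty string, which exists at every position of every input.
bool emptyInput = Regex.IsMatch("", @"\w{0}");
int matchLength = Regex.Match("aa aa ZZ Z", @"\w{0}").Length;
Console.WriteLine(emptyInput);   // True
Console.WriteLine(matchLength);  // 0

// "Contains no 'a'" as a whole-string pattern...
bool noA1 = Regex.IsMatch("bcd", @"^[^a]*$");
bool noA2 = Regex.IsMatch("bad", @"^[^a]*$");
Console.WriteLine(noA1);         // True
Console.WriteLine(noA2);         // False

// ...or by negating a positive match in code.
bool noA3 = !Regex.IsMatch("bcd", "a");
Console.WriteLine(noA3);         // True
```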
