Find the last part of a string - c#

I am really not good at using RegEx; I struggle with it every time I try to use it.
I have a string:
"aaa bbb ccc - ddd eee fff - xxx yyy zzz";
What I am trying to get is the substring after the last ' - '.
If I use the pattern "^.* - (.*)$" as shown below, it doesn't work.
string pattern = "aaa bbb ccc - ddd eee fff - xxx yyy zzz";
Match match = Regex.Match(pattern, @"^.* - (.*)$", RegexOptions.IgnoreCase);
What pattern makes match.Captures.Count equal 1 and match.Captures[0].Value equal "xxx yyy zzz"?
I have to use Regex because I have a generic function, and the pattern is a parameter.
What should the pattern be?
Background:
I have a function already deployed in production; the main job of that function is:
..............................
string name = xxx;
Regex regex = new Regex(pattern);
Match match = regex.Match(name);
if (match != null)
{
for (int index = match.Captures.Count - 1; index > 0; index--)
name = name.Remove(match.Captures[index].Index, match.Captures[index].Length);
}
xxx = name;
...........................

Regex is, yet again, overkill for this sort of thing. Just use LastIndexOf:
var result = pattern.Substring(pattern.LastIndexOf("-") + 1).Trim(); // Trim() drops the space that follows the hyphen
Output: xxx yyy zzz
EDIT:
Regex version: (.)(?<=- )([^-])+$. Don't bother matching from the start of the string (using ^); you only care about the end.
Not sure why you need this though. I would be interested to see your "non-simplified" version of your function.
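The two approaches can be compared side by side. The following is only a minimal sketch: the class and method names (LastPartDemo, AfterLastSeparator, MatchLastPart) are made up for illustration, and the pattern (?<=- )[^-]+$ is a variation on the lookbehind idea above that assumes the last part contains no hyphen. Because a Match exposes itself as its single capture, the whole match is what ends up in Captures[0].

```csharp
using System;
using System.Text.RegularExpressions;

public static class LastPartDemo
{
    // Plain string approach: everything after the last occurrence of the separator.
    public static string AfterLastSeparator(string input, string separator = " - ")
    {
        int index = input.LastIndexOf(separator, StringComparison.Ordinal);
        // No separator found: return the input unchanged.
        return index < 0 ? input : input.Substring(index + separator.Length);
    }

    // Regex approach: the lookbehind requires "- " right before the match,
    // so the whole match (and therefore Captures[0]) is the last part.
    public static Match MatchLastPart(string input)
    {
        return Regex.Match(input, @"(?<=- )[^-]+$");
    }

    public static void Main()
    {
        string input = "aaa bbb ccc - ddd eee fff - xxx yyy zzz";

        Console.WriteLine(AfterLastSeparator(input));   // xxx yyy zzz

        Match match = MatchLastPart(input);
        Console.WriteLine(match.Captures.Count);        // 1
        Console.WriteLine(match.Captures[0].Value);     // xxx yyy zzz

        // The pattern from the question also finds the text; it is just in Groups[1].
        Match original = Regex.Match(input, @"^.* - (.*)$");
        Console.WriteLine(original.Groups[1].Value);    // xxx yyy zzz
    }
}
```

If the production function can only look at match.Captures, the lookbehind form is the one to pass in as the pattern parameter; if it can be changed to read Groups[1], the original pattern already works.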


Catastrophic backtracking; regular expression for extracting values in nested brackets

I would like to extract aaa, bb b={{b}}bb bbb and {ccc} ccc from the following string using regular expression:
zyx={aaa}, yzx={bb b={{b}}bb bbb}, xyz={{ccc} ccc}
Note: aaa represents an arbitrary sequence of any number of characters, hence no determined length or pattern. For instance, {ccc} ccc could be {cccccccccc}cc {cc} cccc cccc, or any other combination.
I have written the following regular expression:
(?<a>[^{}]*)\s*=\s*{((?<v>[^{}]+)*)},*
This expression extracts aaa, but fails to parse the rest of the input with catastrophic backtracking failure, because of the nested curly-brackets.
Any thoughts on how I can update the regex to process the nested brackets correctly?
(Just in case: I am using C# on .NET Core 3.0, if you need engine-specific options. Also, I would rather not do any magic in the code, but work with the regex pattern only.)
Similar question
The question "regular expression to match balanced parentheses" is similar to this one, with the difference that here the brackets are not necessarily balanced; rather, they follow an x={y} pattern.
Update 1
Inputs such as the following are also possible:
yzx={bb b={{b}},bb bbb,},
Note the , after {{b}} and after bbb.
Update 2
I wrote the following pattern; it can match anything but aaa from the first example:
(?<A>[^{}]*)\s*=\s*{(?<V>(?<S>([^{}]?)\{(?:[^}{]+|(?&S))+\}))}(,|$)
Regex.Matches, pretty good
"={(.*?)}(, |$)" could work.
string input = "zyx={aaa}, yzx={bb b={{b}}bb bbb}, yzx={bb b={{b}},bb bbb,}, xyz={{ccc} ccc}";
string pattern = "={(.*?)}(, |$)";
var matches = Regex.Matches(input, pattern)
.Select(m => m.Groups[1].Value)
.ToList();
foreach (var m in matches) Console.WriteLine(m);
Output
aaa
bb b={{b}}bb bbb
bb b={{b}},bb bbb,
{ccc} ccc
Regex.Split, really good
I think for this job Regex.Split may be a better tool.
string input = "zyx={aaa}, yzx={bb b={{b}}bb bbb}, yzx={bb b={{b}},bb bbb,}, ttt={nasty{t, }, }, xyz={{ccc} ccc}, zzz={{{{{{{huh?}";
var matches2 = Regex.Split(input, "(^|, )[a-zA-Z]+=", RegexOptions.ExplicitCapture); // Or "(?:^|, )[a-zA-Z]+=" without the flag
Console.WriteLine("-------------------------"); // Adding this to show the empty element (see note below)
foreach (var m in matches2) Console.WriteLine(m);
Console.WriteLine("-------------------------");
-------------------------
{aaa}
{bb b={{b}}bb bbb}
{bb b={{b}},bb bbb,}
{nasty{t, }, }
{{ccc} ccc}
{{{{{{{huh?}
-------------------------
Note: The first element of the returned array is an empty string (it prints as a blank line right after the first separator) because:
If a match is found at the beginning or the end of the input string, an empty string is included at the beginning or the end of the returned array.
Case 3
string input = "AAA={aaa}, BBB={bbb, bb{{b}}, bbb{b}}, CCC={ccc}, DDD={ddd}, EEE={00-99} ";
var matches2 = Regex.Split(input, "(?:^|, )[a-zA-Z]+="); // Or drop '?:' and use RegexOptions.ExplicitCapture
foreach (var m in matches2) Console.WriteLine(m);
{aaa}
{bbb, bb{{b}}, bbb{b}}
{ccc}
{ddd}
{00-99}
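The Split results above still carry their outer braces and, at the start, an empty element. A possible follow-up step, shown here only as a sketch, drops the empty entries and strips one pair of outer braces from each value. The names SplitValuesDemo, ExtractValues and StripOuterBraces are hypothetical, and the code assumes keys consist of ASCII letters only, as in the examples.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SplitValuesDemo
{
    // Splits on "key=" separators, then cleans up each remaining value.
    public static string[] ExtractValues(string input)
    {
        return Regex.Split(input, "(?:^|, )[a-zA-Z]+=")
            .Select(part => part.Trim())              // drop stray whitespace, e.g. a trailing space
            .Where(part => part.Length > 0)           // drop the empty element produced by the leading match
            .Select(part => StripOuterBraces(part))
            .ToArray();
    }

    // Removes exactly one leading '{' and one trailing '}' when both are present.
    private static string StripOuterBraces(string value)
    {
        bool wrapped = value.Length >= 2 && value[0] == '{' && value[value.Length - 1] == '}';
        return wrapped ? value.Substring(1, value.Length - 2) : value;
    }

    public static void Main()
    {
        string input = "zyx={aaa}, yzx={bb b={{b}}bb bbb}, xyz={{ccc} ccc}";
        foreach (string value in ExtractValues(input))
        {
            Console.WriteLine(value);
        }
        // aaa
        // bb b={{b}}bb bbb
        // {ccc} ccc
    }
}
```

Because the split happens on the keys rather than on the braces, the nested or unbalanced braces inside a value never have to be matched, which is what avoids the backtracking problem.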

C# detect last `Enter` in string

I have a lot of strings with the following pattern (format):
aaaaaaa aa aa
bbbbbbbbbbbbbbb bb bbbbb bbb bb
ccccc c cc ccc
XXXX XX
zzzzzz zzz
OR:
aaaaaaa aa aa
bbbbbbbbbbbbbbb bb bbbbb bbb bb
ccccc c cc ccc
dddd dddd
XXXX XX
zzzzzz zzz
OR :
aaaaaaa aa aa
bbbbbbbbbbbbbbb bb bbbbb bbb bb
ccccc c cc ccc
dddddddd
eeeee
XXXX XX
zzzzzz zzz
I want to replace XXXX XX with YYYY. I think I need to detect the last Enter (line break) in the string and do the operation there. How can I do this?
I'd do something like this. If the string in question is always on the second-to-last line, I'd split the string into an array of strings, one string per line. Then find out how many lines (strings in the array) there are. The line of interest is at index (line count - 2). Then replace that string with YYYY.
EDIT:
var result = Regex.Split(input, "\r\n|\r|\n");
int len = result.Length;
result[len - 2] = "YYYY";
var output = string.Join(Environment.NewLine, result);
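The snippet above throws an IndexOutOfRangeException when the input has fewer than two lines. A guarded version might look like the following sketch; LineReplaceDemo and ReplaceSecondToLastLine are names invented for this example, not part of any library.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineReplaceDemo
{
    // Replaces the second-to-last line of the input; returns the input unchanged
    // when there are fewer than two lines.
    public static string ReplaceSecondToLastLine(string input, string replacement)
    {
        // Accept Windows (\r\n), old Mac (\r) and Unix (\n) line breaks.
        string[] lines = Regex.Split(input, "\r\n|\r|\n");
        if (lines.Length < 2)
        {
            return input;
        }

        lines[lines.Length - 2] = replacement;
        // Note: the result always uses the platform's line break.
        return string.Join(Environment.NewLine, lines);
    }

    public static void Main()
    {
        string input = "aaaaaaa aa aa\nccccc c cc ccc\nXXXX XX\nzzzzzz zzz";
        Console.WriteLine(ReplaceSecondToLastLine(input, "YYYY"));
    }
}
```

One caveat: if the input ends with a line break, the last array element is an empty string, so "second to last" then refers to the final visible line. Trimming trailing line breaks first avoids that.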
If it's just the pattern, here's an example with Regex:
\b\S{4}\s\S{2}\b
You could use this regex like this:
var regex = new Regex(@"\b\S{4}\s\S{2}\b");
var result = regex.Replace(inputString, "YYYY");
It looks for a word boundary (for example, the position right after a line break), then four non-whitespace characters, then one whitespace character, then two non-whitespace characters and a word boundary again. It should do what you want.
However, depending on your input it might be a better idea to use this regex:
\b\S{4} \S{2}\b
So I replaced the whitespace character with an actual space character. Of course it could still happen that one of your characters is counted as a word boundary, then again I'd have to see an example of your input.
EDIT
Since the line you want to replace ends with a line break, you could use this pattern as well:
\b\S{4} \S{2}\b\n
Which would probably work even better. However you'd have to replace it with "YYYY\n" then.

regex to strip number from var in string

I have a long string and I have a var inside it
var abc = '123456'
Now I wish to get the 123456 from it.
I have tried a regex but it's not working properly:
Regex regex = new Regex("(?<abc>+)=(?<var>+)");
Match m = regex.Match(body);
if (m.Success)
{
string key = m.Groups["var"].Value;
}
How can I get the number from the var abc?
Thanks for your help and time
var body = @" fsd fsda f var abc = '123456' fsda fasd f";
Regex regex = new Regex(@"var (?<name>\w*) = '(?<number>\d*)'");
Match m = regex.Match(body);
Console.WriteLine("name: " + m.Groups["name"]);
Console.WriteLine("number: " + m.Groups["number"]);
prints:
name: abc
number: 123456
Your regex is not correct:
(?<abc>+)=(?<var>+)
The + signs are quantifiers meaning that the preceding element is repeated at least once, and here there is nothing to repeat, since (?< ... > ... ) is a named capture group and the + is the first thing inside it.
You perhaps meant:
(?<abc>.+)=(?<var>.+)
And a better regex might be:
(?<abc>[^=]+)=\s*'(?<var>[^']+)'
[^=]+ will match any character except an equal sign.
\s* means any number of space characters (will also match tabs, newlines and form feeds though)
[^']+ will match any character except a single quote.
To specifically match the variable abc, you then put it like this:
(?<abc>abc)\s*=\s*'(?<var>[^']+)'
(I added some more allowances for spaces)
From the example you provided, the number can be extracted like this:
Console.WriteLine (
Regex.Match("var abc = '123456'", @"(?<var>\d+)").Groups["var"].Value); // 123456
\d+ means 1 or more numbers (digits).
But I surmise your data doesn't look like your example.
Try this:
var body = @"my word 1, my word 2, my word var abc = '123456' 3, my word x";
Regex regex = new Regex(@"(?<=var \w+ = ')\d+"); // lookbehind: only digits that follow "var <name> = '"
Match m = regex.Match(body);
Console.WriteLine(m.Value); // 123456

Problem with backreferences in C#'s regex

The goal is to extract time and date strings from this:
<strong>Date</strong> - Thursday, June 2 2011 9:00PM<br>
Here's the code:
Match m = Regex.Match(line, "<strong>Date</strong> - (.*) (.*)<br>");
date = m.Captures[0].Value;
time = m.Captures[1].Value;
Thanks to the regex being greedy, it should match the first group all the way up to the last space. But it doesn't. Captures[0] is the whole line and Captures[1] is out of range. Why?
Use Groups, not Captures. Your results will be in Groups[1] and Groups[2].
And personally, I'd recommend naming the groups:
Match m = Regex.Match(line, "<strong>Date</strong> - (?<date>.*) (?<time>.*)<br>");
if( m.Success )
{
date = m.Groups["date"].Value;
time = m.Groups["time"].Value;
}
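To see why Captures behaves the way the question describes, it can help to print both collections. The sketch below uses the line and pattern from the question; GroupsVsCapturesDemo and ParseDateLine are illustrative names only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupsVsCapturesDemo
{
    // Returns the date and time parts, or two empty strings when the line does not match.
    public static (string Date, string Time) ParseDateLine(string line)
    {
        Match m = Regex.Match(line, "<strong>Date</strong> - (.*) (.*)<br>");
        return m.Success
            ? (m.Groups[1].Value, m.Groups[2].Value)
            : (string.Empty, string.Empty);
    }

    public static void Main()
    {
        string line = "<strong>Date</strong> - Thursday, June 2 2011 9:00PM<br>";
        Match m = Regex.Match(line, "<strong>Date</strong> - (.*) (.*)<br>");

        // On a Match, Captures holds a single entry: the whole match.
        Console.WriteLine(m.Captures.Count);                 // 1
        Console.WriteLine(m.Captures[0].Value == m.Value);   // True

        // Groups holds the whole match at index 0 plus one entry per capturing group.
        Console.WriteLine(m.Groups.Count);                   // 3
        Console.WriteLine(m.Groups[1].Value);                // Thursday, June 2 2011
        Console.WriteLine(m.Groups[2].Value);                // 9:00PM
    }
}
```

Captures only becomes interesting on an individual group that sits inside a quantifier, where it records every repetition of that group.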

C# regular expression match

18.jun. 7 noči od 515,00 EUR
From this string I would like to get 515,00 with a regular expression.
Regex regularExpr = new Regex(rule.RegularExpression,
RegexOptions.Compiled | RegexOptions.Multiline |
RegexOptions.IgnoreCase | RegexOptions.Singleline |
RegexOptions.IgnorePatternWhitespace);
tagValue.Value = "18.jun. 7 noči od 515,00 EUR";
Match match = regularExpr.Match(tagValue.Value);
object value = match.Groups[2].Value;
The regex is: \d+((.\d+)+(,\d+)?)?
but I always get an empty string (""). If I try this regex in Expresso I get an array of 3 values and the third is 515,00.
What is wrong with my C# code that I get an empty string?
Your regex matches the 18 (since the decimal parts are optional), and match.Groups[2] refers to the second capturing parenthesis (.\d+) which should correctly read (\.\d+) and hasn't participated in the match, therefore the empty string is returned.
You need to correct your regex and iterate over the results:
StringCollection resultList = new StringCollection();
Regex regexObj = new Regex(@"\d+(?:[.,]\d+)?");
Match matchResult = regexObj.Match(subjectString);
while (matchResult.Success) {
resultList.Add(matchResult.Value);
matchResult = matchResult.NextMatch();
}
resultList[2] will then contain your match.
Make sure you escaped everything properly when you created the regular expression.
Regex re = new Regex("\d+((.\d+)+(,\d+)?)?");
is very different from
Regex re = new Regex(@"\d+((.\d+)+(,\d+)?)?");
You probably want the second.
I suspect the result you're getting in Expresso is equivalent to this:
string s = "18.jun. 7 noči od 515,00 EUR";
Regex r = new Regex(@"\d+((.\d+)+(,\d+)?)?");
foreach (Match m in r.Matches(s))
{
Console.WriteLine(m.Value);
}
In other words, it's not the contents of the second capturing group you're seeing, it's the third match. This code shows it more clearly:
Console.WriteLine("{0,10} {1,10} {2,10} {3,10}",
"Group 0", "Group 1", "Group 2", "Group 3");
Regex r = new Regex(@"\d+((.\d+)+(,\d+)?)?");
foreach (Match m in r.Matches(s))
{
Console.WriteLine("{0,10} {1,10} {2,10} {3,10}",
m.Groups[0].Value, m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value);
}
output:
   Group 0    Group 1    Group 2    Group 3
        18
         7
    515,00        ,00        ,00
On to the regex itself. If you want to match only the price and not those other numbers, you need to be more specific. For example, if you know the ,00 part will always be present, you can use this regex:
@"(?n)\b\d+(\.\d+)*(,\d+)\b"
(?n) is the inline form of the ExplicitCapture option, which turns those two capturing groups into non-capturing groups. Of the RegexOptions you did specify, the only one that has any effect is Compiled, which speeds up matching of the regex slightly, at the expense of slowing down its construction and hogging memory. \b is a word boundary.
It looks like you're applying all those modifiers blindly to every regex when you construct them, which is not a good idea. If a particular regex needs a certain modifier, you should try to specify it in the regex itself with an inline modifier, like I did with (?n).
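As a rough illustration of the inline-modifier advice, the sketch below applies the (?n) pattern to the sample string. PriceDemo and ExtractPrice are hypothetical names, and the code assumes, as stated above, that the decimal part (,00) is always present.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PriceDemo
{
    // (?n) turns on ExplicitCapture inside the pattern itself, so the plain
    // parentheses only group and no RegexOptions argument is needed.
    private static readonly Regex PriceRegex =
        new Regex(@"(?n)\b\d+(\.\d+)*(,\d+)\b");

    // Returns the first price-like number, or an empty string when none is found.
    public static string ExtractPrice(string text)
    {
        Match m = PriceRegex.Match(text);
        return m.Success ? m.Value : string.Empty;
    }

    public static void Main()
    {
        string s = "18.jun. 7 noči od 515,00 EUR";
        Console.WriteLine(ExtractPrice(s));                 // 515,00

        // Only group 0 (the whole match) exists, because nothing is captured.
        Console.WriteLine(PriceRegex.Match(s).Groups.Count); // 1
    }
}
```

Keeping the option inside the pattern means the regex behaves the same wherever it is stored, for example when it comes from a rule table as in the question.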
