How to split these strings with a regular expression - C#

I have the following strings:
"#,##0.00"
"\"$\"#,##0.0000"
I need to split them using regular expressions.
My expected output is
"#,##0.00" => 2 (2 decimals)
"\"$\"#,##0.0000" => $4 (4 decimals with $)
How can I convert them? Please suggest a way.
Thanks

You may use the following regex:
^(?:"([^"]+)")?.*?(0+)$
The pattern matches:
^ - start of string
(?:"([^"]+)")? - 1 or 0 sequences of:
" - a double quote
([^"]+) - Group 1 capturing 1 or more chars other than "
"
.*? - any 0 or more chars other than newline, as few as possible
(0+) - Group 2 capturing 1 or more zeros
$ - end of string
And here is a C# demo:
var pat = @"^(?:""([^""]+)"")?.*?(0+)$";
var match = Regex.Match("#,##0.00", pat);
if (match.Success) {
Console.WriteLine(match.Groups[1].Value + match.Groups[2].Length.ToString());
} // => 2
// With "\"$\"#,##0.0000" input: $4
See the IDEONE demo
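As a minimal sketch, the same pattern can be wrapped in a small helper so that both format strings from the question go through one call. The name Summarize is hypothetical and not part of the original answer:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the optional quoted prefix followed by
// the number of trailing zeros in the format string, or "" on no match.
static string Summarize(string format)
{
    var match = Regex.Match(format, @"^(?:""([^""]+)"")?.*?(0+)$");
    return match.Success
        ? match.Groups[1].Value + match.Groups[2].Length
        : string.Empty;
}

Console.WriteLine(Summarize("#,##0.00"));        // 2
Console.WriteLine(Summarize("\"$\"#,##0.0000")); // $4
```

Group 1 is empty when the format has no quoted prefix, so the result is just the zero count in that case.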

Related

Problem with brackets in regular expression in C#

Can anybody help me with a regular expression in C#?
I want to create a pattern for this input:
{a? ab 12 ?? cd}
This is my pattern:
([A-Fa-f0-9?]{2})+
The problem is the curly brackets. This doesn't work:
{(([A-Fa-f0-9?]{2})+)}
It just works for
{ab}
I would use {([A-Fa-f0-9?]+|[^}]+)}
It captures one group that matches either:
[A-Fa-f0-9?]+ - one or more characters from the list (hex digits or ?)
[^}]+ - one or more characters other than }
If you allow leading/trailing whitespace within {...} string, the expression will look like
{(?:\s*([A-Fa-f0-9?]{2}))+\s*}
See this regex demo
If you only allow a single regular space only between the values inside {...} and no space after { and before }, you can use
{(?:([A-Fa-f0-9?]{2})(?: (?!}))?)+}
See this regex demo. Note this one is much stricter. Details of the first pattern:
{ - a { char
(?:\s*([A-Fa-f0-9?]{2}))+ - one or more occurrences of
\s* - zero or more whitespaces
([A-Fa-f0-9?]{2}) - Capturing group 1: two hex or ? chars
\s* - zero or more whitespaces
} - a } char.
See a C# demo:
var text = "{a? ab 12 ?? cd}";
var pattern = @"{(?:([A-Fa-f0-9?]{2})(?: (?!}))?)+}";
var result = Regex.Matches(text, pattern)
.Cast<Match>()
.Select(x => x.Groups[1].Captures.Cast<Capture>().Select(m => m.Value))
.ToList();
foreach (var list in result)
Console.WriteLine(string.Join("; ", list));
// => a?; ab; 12; ??; cd
If you want to capture the pairs of chars between the curly braces, you can use a single capture group:
{([A-Fa-f0-9?]{2}(?: [A-Fa-f0-9?]{2})*)}
Explanation
{ Match {
( Capture group 1
[A-Fa-f0-9?]{2} Match 2 times any of the listed characters
(?: [A-Fa-f0-9?]{2})* Optionally repeat a space and again 2 of the listed characters
) Close group 1
} Match }
Example code
string pattern = @"{([A-Fa-f0-9?]{2}(?: [A-Fa-f0-9?]{2})*)}";
string input = @"{a? ab 12 ?? cd}
{ab}";
foreach (Match m in Regex.Matches(input, pattern))
{
Console.WriteLine(m.Groups[1].Value);
}
Output
a? ab 12 ?? cd
ab

Regex to match positive and negative numbers and text between "" after a character

I need a regex for an input that contains positive and negative numbers and sometimes a string between " and ". I'm not sure if this can be done in only one pattern. Here are some test cases for the pattern:
*PATH "C:\Users\User\Desktop\Media\SoundBanks\Ambient\WAV_Data\AD_SMP_SFX_WIND0.wav"
*NODECOLOR 0 255 140
*FILEREF -7
*FREQUENCY 22050
The idea would be to use a pattern that returns:
C:\Users\User\Desktop\Media\SoundBanks\Ambient\WAV_Data\AD_SMP_SFX_WIND0.wav
0 255 140
-7
22050
The content always follows a keyword that starts with the character *. I've split this into two patterns because I don't know how to do it all in one, but it doesn't work:
MatchCollection NumberMatches = Regex.Matches(FileLine, @"(?<=[*])-?[0-9]+");
MatchCollection FilePathMatches = Regex.Matches(FileLine, @"/,([^,]*)(?=,)/g");
You may read the file into a string and run the following regex:
var matches = Regex.Matches(filecontents, @"(?m)^\*\w+[\s-[\r\n]]*""?(.*?)""?\r?$")
.Cast<Match>()
.Select(x => x.Groups[1].Value)
.ToList();
See the .NET regex demo.
Details:
(?m) - RegexOptions.Multiline option on
^ - start of a line
\* - a * char
\w+ - one or more word chars
[\s-[\r\n]]* - zero or more whitespaces other than CR and LF
"? - an optional " char
(.*?) - Group 1: any zero or more chars other than an LF char, as few as possible
"? - an optional " char
\r? - an optional CR
$ - end of a line/string.
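A minimal sketch of that pattern run against the four sample lines from the question. The variable names are illustrative, and the input is assumed to use CRLF line endings:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sample input from the question, joined with CRLF (an assumption about the file).
string fileContents = string.Join("\r\n", new[]
{
    @"*PATH ""C:\Users\User\Desktop\Media\SoundBanks\Ambient\WAV_Data\AD_SMP_SFX_WIND0.wav""",
    "*NODECOLOR 0 255 140",
    "*FILEREF -7",
    "*FREQUENCY 22050"
});

// Group 1 holds everything after the *KEYWORD, without surrounding quotes or the CR.
var values = Regex.Matches(fileContents, @"(?m)^\*\w+[\s-[\r\n]]*""?(.*?)""?\r?$")
    .Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ToList();

foreach (var value in values)
    Console.WriteLine(value);
```

Each printed line is the value part of one input line: the unquoted path, then 0 255 140, -7 and 22050.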

Match only the nth occurrence using a regular expression

I have a string with 3 dates in it like this:
XXXXX_20160207_20180208_XXXXXXX_20190408T160742_xxxxx
I want to select the 2nd date in the string, the 20180208 one.
Is there a way to do this purely in the regex, without having to resort to pulling out the 2nd match in code? I'm using C# if that matters.
Thanks for any help.
You could use
^(?:[^_]+_){2}(\d+)
And take the first group, see a demo on regex101.com.
Broken down, this says
^ # start of the string
(?:[^_]+_){2} # not _ + _, twice
(\d+) # capture digits
C# demo:
var pattern = @"^(?:[^_]+_){2}(\d+)";
var text = "XXXXX_20160207_20180208_XXXXXXX_20190408T160742_xxxxx";
var result = Regex.Match(text, pattern)?.Groups[1].Value;
Console.WriteLine(result); // => 20180208
Try this one
MatchCollection matches = Regex.Matches(sInputLine, @"\d{8}");
string sSecond = matches[1].ToString();
You could use the regular expression
^(?:.*?\d{8}_){1}.*?(\d{8})
to save the 2nd date to capture group 1.
Naturally, for n > 2, replace {1} with {n-1} to obtain the nth date. To obtain the 1st date use
^(?:.*?\d{8}_){0}.*?(\d{8})
The .NET regex engine performs the following operations.
^ # match the beginning of the string
(?: # begin a non-capture group
.*? # match 0+ chars lazily
\d{8} # match 8 digits
_ # match '_'
) # end non-capture group
{n} # execute non-capture group n (n >= 0) times
.*? # match 0+ chars lazily
(\d{8}) # match 8 digits in capture group 1
The important thing to note is that the first instance of .*?, because it is lazy, consumes only as many characters as needed until the text that follows can match the rest of the pattern, that is, 8 digits followed by an underscore. For example, in the string
_1234abcd_efghi_123456789_12345678_ABC
capture group 1 in (.*?)_\d{8}_ will contain "_1234abcd_efghi_123456789".
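The {n-1} substitution described above can be sketched as a small helper that builds the pattern for a given n. The name NthDate is hypothetical, and the sketch assumes every date except possibly the last is followed by an underscore, as in the question's input:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the nth (1-based) 8-digit date, or "" if there is none.
static string NthDate(string input, int n)
{
    // Skip n-1 dates that are followed by '_', then capture the next 8 digits.
    var pattern = @"^(?:.*?\d{8}_){" + (n - 1) + @"}.*?(\d{8})";
    var match = Regex.Match(input, pattern);
    return match.Success ? match.Groups[1].Value : string.Empty;
}

var text = "XXXXX_20160207_20180208_XXXXXXX_20190408T160742_xxxxx";
Console.WriteLine(NthDate(text, 1)); // 20160207
Console.WriteLine(NthDate(text, 2)); // 20180208
Console.WriteLine(NthDate(text, 3)); // 20190408
```

Asking for a 4th date returns an empty string here, because only two of the dates are followed by an underscore and the group therefore cannot repeat three times.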
You can use System.Text.RegularExpressions.Regex
See the following example
Regex regex = new Regex(@"^(?:[^_]+_){2}(\d+)"); // Expression from Jan's answer; this just shows how to use it from C#
GroupCollection groups = regex.Match("XXXXX_20160207_20180208_XXXXXXX_20190408T160742_xxxxx").Groups;
if (groups.Count > 1)
{
Console.WriteLine(groups[1].Value);
}

Matching a pattern in a string

I have a string
string str = "I am fine. How are you? You need exactly 4 pieces of sandwiches. Your ADAST Count is 5. Okay thank you ";
What I want is, get the ADAST count value. For the above example, it is 5.
The problem is the word after ADAST Count: it can be "is" or "=". The two words ADAST Count will always be there.
What I have tried is
var resultString = Regex.Match(str, @"ADAST\s+count\s+is\s+\d+", RegexOptions.IgnoreCase).Value;
var number = Regex.Match(resultString, @"\d+").Value;
How can I write the pattern which will search is or = ?
You may use
ADAST\s+count\s+(?:is|=)\s+(\d+)
See the regex demo
Note that (?:is|=) is a non-capturing group (i.e. it is used to only group alternations without pushing these submatches on to the capture stack for further retrieval) and | is an alternation operator.
Details:
ADAST - a literal string
\s+ - 1 or more whitespaces
count - a literal string
\s+ - 1 or more whitespaces
(?:is|=) - either is or =
\s+ - 1 or more whitespaces
(\d+) - Group 1 capturing one or more digits
C#:
var m = Regex.Match(str, @"ADAST\s+count\s+(?:is|=)\s+(\d+)", RegexOptions.IgnoreCase);
if (m.Success) {
Console.Write(m.Groups[1].Value);
}
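To illustrate the alternation with both separators, here is a minimal sketch that also converts the captured digits to a number. The helper name ReadAdastCount and the second and third sample inputs are assumptions made for the example:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the count after "ADAST Count is" or "ADAST Count =",
// or null when the text does not contain either form.
static int? ReadAdastCount(string text)
{
    var m = Regex.Match(text, @"ADAST\s+count\s+(?:is|=)\s+(\d+)", RegexOptions.IgnoreCase);
    return m.Success && int.TryParse(m.Groups[1].Value, out var n) ? n : (int?)null;
}

Console.WriteLine(ReadAdastCount("Your ADAST Count is 5. Okay thank you ")); // 5
Console.WriteLine(ReadAdastCount("Your ADAST count = 12"));                  // 12
Console.WriteLine(ReadAdastCount("Your ADAST Count was 3") == null);         // True
```

Because the pattern requires \s+ on both sides of the separator, an input such as "ADAST Count=5" without spaces would not match; replacing \s+ with \s* around (?:is|=) would relax that.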

Regex to search for single 0's and add commas

I have the following all number data:
4245 4 0 0242 4424.09 0 422404 5955 0
2234234.234 224 0
2423 234 0
I need to process each line individually. I need to remove all the single 0's and output as follows with commas:
4245,4, 0242,4424.09, 422404,5955
2234234.234,224
2423,234
I got the part of removing the single digits working:
var result = Regex.Replace(inData, @"\b\s0\b", string.Empty);
But cannot figure out how to add the commas in between each number. Any help would be appreciated. Thanks.
You can achieve what you want with one Regex.Replace operation, but with a custom match evaluator:
var input = "4245 4 0 0242 4424.09 0 422404 5955 0";
var results = Regex.Replace(input, @"(?:\s+|^)0(\s+)|(\s+0)$|\s+", m =>
m.Groups[1].Success ? ", " :
m.Groups[2].Success ? "" : ",");
The point is to match those parts we need and capture into groups, so that they can be further analyzed and an appropriate action could be taken.
Pattern details:
(?:\s+|^)0(\s+) - match 0 that is either at the start or with whitespaces before it and that is followed with 1 or more whitespaces (the whitespaces after 0 are stored in Group 1)
| - or
(\s+0)$ - Group 2 capturing one or more whitespaces, then a 0 at the end ($) of the string
| - or
\s+ - (3rd option) 1 or more whitespaces in all other contexts.
If you prefer a more readable version, here is an alternative: the final 0 is removed with string methods, the standalone 0s are replaced with a plain String.Replace, and then a single regex replaces each remaining space between digits with a comma.
var inp = "4245 4 0 0242 4424.09 0 422404 5955 0";
inp = inp.EndsWith(" 0") ? inp.Substring(0, inp.Length - 2) : inp;
var output = Regex.Replace(inp.Replace(" 0 ", ", "), @"(\d) (\d)", "$1,$2");
I understand that you want to
Replace spaces with commas ("xy z" => "xy,z")
Replace single zeros with spaces ("xy 0 z" => "xy, z")
Then I would recommend two string replacements:
inData = inData.Replace(" ", ",");
inData = inData.Replace(",0", " ");
Using this will replace any whitespace character with a comma.
var result = Regex.Replace(inData, @"\s+", ",");
\s+ matches one or more whitespace characters.
Then run your other regex to remove the single digit 0's
You could just do a string.replace(" ", ","), right? (if I am understanding your question correctly)
Or you could even do a string.split(" ") into an array, then string.join(','). Although this is probably less efficient.
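A minimal sketch of the split-and-join idea, extended with a filter that drops the standalone 0 tokens. The name ToCsv is hypothetical, and this variant joins with a bare comma, so it does not reproduce the ", " spacing shown in the question's expected output:

```csharp
using System;
using System.Linq;

// Hypothetical helper: splits on spaces, drops tokens that are exactly "0",
// and joins the remaining tokens with commas.
static string ToCsv(string line)
{
    var tokens = line
        .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
        .Where(token => token != "0");
    return string.Join(",", tokens);
}

Console.WriteLine(ToCsv("4245 4 0 0242 4424.09 0 422404 5955 0")); // 4245,4,0242,4424.09,422404,5955
Console.WriteLine(ToCsv("2234234.234 224 0"));                     // 2234234.234,224
Console.WriteLine(ToCsv("2423 234 0"));                            // 2423,234
```

Comparing whole tokens against "0" keeps values such as 0242 intact, which a plain text replacement of ",0" would not.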
