Matching a pattern in a string

Matching a pattern in a string - c#

I have a string
string str = "I am fine. How are you? You need exactly 4 pieces of sandwiches. Your ADAST Count is 5. Okay thank you ";
What I want is, get the ADAST count value. For the above example, it is 5.
The problem here is, the is after the ADAST Count. It can be is or =. But there will the two words ADAST Count.
What I have tried is
var resultString = Regex.Match(str, #"ADAST\s+count\s+is\s+\d+", RegexOptions.IgnoreCase).Value;
var number = Regex.Match(resultString, #"\d+").Value;
How can I write the pattern which will search is or = ?

You may use
ADAST\s+count\s+(?:is|=)\s+(\d+)
See the regex demo
Note that (?:is|=) is a non-capturing group (i.e. it is used to only group alternations without pushing these submatches on to the capture stack for further retrieval) and | is an alternation operator.
Details:
ADAST - a literal string
\s+ - 1 or more whitespaces
count - a literal string
\s+ - 1 or more whitespaces
(?:is|=) - either is or =
\s+ - 1 or more whitespaces
(\d+) - Group 1 capturing one or more digits
C#:
var m = Regex.Match(s, #"ADAST\s+count\s+(?:is|=)\s+(\d+)", RegexOptions.IgnoreCase);
if (m.Success) {
Console.Write(m.Groups[1].Value);
}

Related

Regex only letters except set of numbers

I'm using Replace(#"[^a-zA-Z]+", "");
leave only letters, but I have a set of numbers or characters that I want to keep as well, ex: 122456 and 112466. But I'm having trouble leaving it only if it's this sequence:
ex input:
abc 1239 asm122456000
I want to:
abscasm122456
tried this: ([^a-zA-Z])+|(?!122456)

My answer doesn't applying Replace(), but achieves a similar result:
(?:[a-zA-Z]+|\d{6})
which captures the group (non-capturing group) with the alphabetic character(s) or a set of digits with 6 occurrences.
Regex 101 & Test Result
Join all the matching values into a single string.
using System.Linq;
Regex regex = new Regex("(?:[a-zA-Z]+|\\d{6})");
string input = "abc 1239 asm12245600";
string output = "";
var matches = regex.Matches(input);
if (matches.Count > 0)
output = String.Join("", matches.Select(x => x.Value));
Sample .NET Fiddle

Alternate way,
using .Split() and .All(),
string input = "abc 1239 asm122456000";
string output = string.Join("", input.Split().Where(x => !x.All(char.IsDigit)));
.NET Fiddle

It is very simple: you need to match and capture what you need to keep, and just match what you need to remove, and then utilize a backreference to the captured group value in the replacement pattern to put it back into the resulting string.
Here is the regex:
(122456|112466)|[^a-zA-Z]
See the regex demo. Details:
(122456|112466) - Capturing group with ID 1: either of the two alternatives
| - or
[^a-zA-Z] - a char other than an ASCII letter (use \P{L} if you need to match any char other than any Unicode letter).
Note the removed + quantifier as [^A-Za-z] also matches digits.
You need to use $1 in the replacement:
var result = Regex.Replace(text, #"(122456|112466)|[^a-zA-Z]", "$1");

Regex - Get digits after a colon

I have a regex:
var topPayMatch = Regex.Match(result, #"(?<=Top Pay)(\D*)(\d+(?:\.\d+)?)", RegexOptions.IgnoreCase);
And I have to convert this to int which I did
topPayMatch = Convert.ToInt32(topPayMatchString.Groups[2].Value);
So now...
Top Pay: 1,000,000 then it currently grabs the first digit, which is 1. I want all 1000000.
If Top Pay: 888,888 then I want all 888888.
What should I add to my regex?

You can use something as simple like #"(?<=Top Pay: )([0-9,]+)". Note that, decimals will be ignored with this regex.
This will match all numbers with their commas after Top Pay:, which after you can parse it to an integer.
Example:
Regex rgx = new Regex(#"(?<=Top Pay: )([0-9,]+)");
string str = "Top Pay: 1,000,000";
Match match = rgx.Match(str);
if (match.Success)
{
string val = match.Value;
int num = int.Parse(val, System.Globalization.NumberStyles.AllowThousands);
Console.WriteLine(num);
}
Console.WriteLine("Ended");
Source:
Convert int from string with commas

If you use the lookbehind, you don't need the capture groups and you can move the \D* into the lookbehind.
To get the values, you can match 1+ digits followed by optional repetitions of , and 1+ digits.
Note that your example data contains comma's and no dots, and using ? as a quantifier means 0 or 1 time.
(?<=Top Pay\D*)\d+(?:,\d+)*
The pattern matches:
(?<=Top Pay\D*) Positive lookbehind, assert what is to the left is Top Pay and optional non digits
\d+ Match 1+ digits
(?:,\d+)* Optionally repeat a , and 1+ digits
See a .NET regex demo and a C# demo
string pattern = #"(?<=Top Pay\D*)\d+(?:,\d+)*";
string input = #"Top Pay: 1,000,000
Top Pay: 888,888";
RegexOptions options = RegexOptions.IgnoreCase;
foreach (Match m in Regex.Matches(input, pattern, options))
{
var topPayMatch = int.Parse(m.Value, System.Globalization.NumberStyles.AllowThousands);
Console.WriteLine(topPayMatch);
}
Output
1000000
888888

Match only the nth occurrence using a regular expression

I have a string with 3 dates in it like this:
XXXXX_20160207_20180208_XXXXXXX_20190408T160742_xxxxx
I want to select the 2nd date in the string, the 20180208 one.
Is there away to do this purely in the regex, with have to resort to pulling out the 2 match in code. I'm using C# if that matters.
Thanks for any help.

You could use
^(?:[^_]+_){2}(\d+)
And take the first group, see a demo on regex101.com.
Broken down, this says
^ # start of the string
(?:[^_]+_){2} # not _ + _, twice
(\d+) # capture digits
C# demo:
var pattern = #"^(?:[^_]+_){2}(\d+)";
var text = "XXXXX_20160207_20180208_XXXXXXX_20190408T160742_xxxxx";
var result = Regex.Match(text, pattern)?.Groups[1].Value;
Console.WriteLine(result); // => 20180208

Try this one
MatchCollection matches = Regex.Matches(sInputLine, #"\d{8}");
string sSecond = matches[1].ToString();

You could use the regular expression
^(?:.*?\d{8}_){1}.*?(\d{8})
to save the 2nd date to capture group 1.
Demo
Naturally, for n > 2, replace {1} with {n-1} to obtain the nth date. To obtain the 1st date use
^(?:.*?\d{8}_){0}.*?(\d{8})
Demo
The C#'s regex engine performs the following operations.
^ # match the beginning of a line
(?: # begin a non-capture group
.*? # match 0+ chars lazily
\d{8} # match 8 digits
_ # match '_'
) # end non-capture group
{n} # execute non-capture group n (n >= 0) times
.*? # match 0+ chars lazily
(\d{8}) # match 8 digits in capture group 1
The important thing to note is that the first instance of .*?, followed by \d{8}, because it is lazy, will gobble up as many characters as it can until the next 8 characters are digits (and are not preceded or followed by a digit. For example, in the string
_1234abcd_efghi_123456789_12345678_ABC
capture group 1 in (.*?)_\d{8}_ will contain "_1234abcd_efghi_123456789".

You can use System.Text.RegularExpressions.Regex
See the following example
Regex regex = new Regex(#"^(?:[^_]+_){2}(\d+)"); //Expression from Jan's answer just showing how to use C# to achieve your goal
GroupCollection groups = regex.Match("XXXXX_20160207_20180208_XXXXXXX_20190408T160742_xxxxx").Groups;
if (groups.Count > 1)
{
Console.WriteLine(groups[1].Value);
}

how to get password from string "password is Rtt3Ved36" by regexp?

on condition that string may be is "password Rtt3Ved36" (without "is").
(?<=password\ is|password\ ).*
That regexp doesn't work, because always return "is Rtt3Ved36" (but i need "Rtt3Ved36"). How to keep order in OR condition?

You may use
password(?: is)?\s*(.*)
and grab Group 1 value.
See the regex demo.
Details
password - a literal substring
(?: is)? - an optional substring space + is
\s* - 0+ whitespace
(.*) - Group 1: any 0+ chars other than a newline.
In C#:
var m = Regex.Match(s, #"password(?: is)?\s*(.*)");
var result = string.Empty;
if (m.Success)
{
result = m.Groups[1].Value;
}

According to this site: https://regex101.com/ your regex does work and returns "Rtt3Ved36".

RegEx string between N and (N+1)th Occurance

I am attempting to find nth occurrence of sub string between two special characters. For example.
one|two|three|four|five
Say, I am looking to find string between (n and n+1 th) 2nd and 3rd Occurrence of '|' character, which turns out to be 'three'.I want to do it using RegEx. Could someone guide me ?
My Current Attempt is as follows.
string subtext = "zero|one|two|three|four";
Regex r = new Regex(#"(?:([^|]*)|){3}");
var m = r.Match(subtext).Value;

If you have full access to C# code, you should consider a mere splitting approach:
var idx = 2; // Might be user-defined
var subtext = "zero|one|two|three|four";
var result = subtext.Split('|').ElementAtOrDefault(idx);
Console.WriteLine(result);
// => two
A regex can be used if you have no access to code (if you use some tool that is powered with .NET regex):
^(?:[^|]*\|){2}([^|]*)
See the regex demo. It matches
^ - start of string
(?:[^|]*\|){2} - 2 (or adjust it as you need) or more sequences of:
[^|]* - zero or more chars other than |
\| - a | symbol
([^|]*) - Group 1 (access via .Groups[1]): zero or more chars other than |
C# code to test:
var pat = $#"^(?:[^|]*\|){{{idx}}}([^|]*)";
var m = Regex.Match(subtext, pat);
if (m.Success) {
Console.WriteLine(m.Groups[1].Value);
}
// => two
See the C# demo
If a tool does not let you access captured groups, turn the initial part into a non-consuming lookbehind pattern:
(?<=^(?:[^|]*\|){2})[^|]*
^^^^^^^^^^^^^^^^^^^^
See this regex demo. The (?<=...) positive lookbehind only checks for a pattern presence immediately to the left of the current location, and if the pattern is not matched, the match will fail.

Use this:
(?:.*?\|){n}(.[^|]*)
where n is the number of times you need to skip your special character. The first capturing group will contain the result.
Demo for n = 2

Use this regex and then select the n-th match (in this case 2) from the Matches collection:
string subtext = "zero|one|two|three|four";
Regex r = new Regex("(?<=\|)[^\|]*");
var m = r.Matches(subtext)[2];

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Matching a pattern in a string - c#

Related

Regex only letters except set of numbers

Regex - Get digits after a colon

Match only the nth occurrence using a regular expression

how to get password from string "password is Rtt3Ved36" by regexp?

RegEx string between N and (N+1)th Occurance

Categories

Resources