Regex - Match multiple times in a string - c#

I am trying to do a regex search on "NNTSY" so that I can get two matches:
NNTS
NTSY
When I attempted to match using the pattern (?<NGrlyosylation>N[^P][ST][^P]), I only get one match, which is NNTS.
How can I use Regex to match NNTSY so that two matches can be found?
NOTE: Background info: this comes from a Rosalind problem.
Here is my code.
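using System.Collections.Generic;
using System.Text.RegularExpressions;

static IEnumerable<int> FindMotifPositions(string input)
{
    Regex regex = new Regex("(?<NGrlyosylation>N[^P][ST][^P])", RegexOptions.Compiled | RegexOptions.IgnoreCase);
    MatchCollection matches = regex.Matches(input);
    foreach (Match match in matches)
    {
        // Need to add 1 because match index is 0-based
        const int offset = 1;
        yield return match.Index + offset;
    }
}
// FindMotifPositions("NNTSY") yields only 1 (the match "NNTS")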

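Most regex engines (with a few exceptions) do not return overlapping matches: once a match is found, the search resumes after the end of that match, so NTSY, which starts inside NNTS, is never tried.
So I don't think there is a pure regex way to solve this, but you can combine Substring in C# with a lookahead. The pattern
(?=N[^P][ST][^P]).
consumes only one character per match, so the engine advances by one position at a time.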
C# Code
string input = "NNTSY";
Regex regex = new Regex("(?=N[^P][ST][^P]).", RegexOptions.Compiled | RegexOptions.IgnoreCase);
Match match = regex.Match(input);
while (match.Success)
{
Console.WriteLine(input.Substring(match.Index, 4));
match = match.NextMatch();
}
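A minimal sketch that applies the same lookahead pattern but returns 1-based start positions, the shape the question's `yield return match.Index + offset` code produces. It assumes, as the question's comment does, that 1-based positions are wanted; the helper name `MotifPositions` is hypothetical, and the block uses C# 9 top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// The lookahead is zero-width, so each match consumes only the single "." character.
// The engine then resumes one position later, which lets overlapping motifs be found.
var motif = new Regex("(?=N[^P][ST][^P]).", RegexOptions.IgnoreCase);

// Hypothetical helper: 1-based start positions of every (possibly overlapping) motif.
List<int> MotifPositions(string protein)
{
    var positions = new List<int>();
    foreach (Match m in motif.Matches(protein))
    {
        // Match.Index is 0-based, so add 1 as the question's code does.
        positions.Add(m.Index + 1);
    }
    return positions;
}

Console.WriteLine(string.Join(" ", MotifPositions("NNTSY"))); // 1 2
```

Because each match is a single character, `Substring(match.Index, 4)` from the answer can still be used to recover the motif text at each position.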

Related

Regex match between two strings that might contain another string

I'm writing a regex that should match the following string:
.\SQL2012
in these two strings (they are contained within a larger string, but that is irrelevant here):
/SERVER "\".\SQL2012\""
/SERVER .\SQL2012
So the "\" before and the \"" after the match may both be omitted in some cases. The regex I've come up with (from a previous question here on StackOverflow) is the following:
(?<=\/SERVER\s*(?:[""\\""]+)?)\w+(?=(?:[\\""""]+|$)| )
This works if I'm matching TEST_SERVER, but not .\SQL2012, because \w does not match special characters such as . and \. Is there a way to match anything until \"" or a whitespace occurs?
I'm doing this in C#, here's my code:
string input = "/SERVER \"\\\".\\SQL2012\\\"\"";
string pattern = @"(?<=\/SERVER\s*(?:[""\\""]+)?)\w+(?=(?:[\\""""]+|$)| )";
Regex regEx = new Regex(pattern);
MatchCollection matches = regEx.Matches(input);
foreach (Match match in matches)
{
Console.WriteLine(match.ToString());
}
Console.ReadKey();
Add a word boundary \b just before the lookahead:
string input = "/SERVER .\\SQL2012";
Regex rgx = new Regex(@"(?<=\/SERVER\s+""\\"").*?\b(?=\\""""|$| )|(?<=\/SERVER\s+).*?\b(?= |$)");
foreach (Match m in rgx.Matches(input))
Console.WriteLine(m.Groups[0].Value);
Console.WriteLine(input);
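The question also asks whether one can "match anything until \" or a whitespace occurs". A minimal sketch of that idea, using a negated character class instead of \w+; this is an alternative pattern, not the one from the answer above, and the helper name `ExtractServer` is hypothetical. It relies on .NET supporting variable-length lookbehind and uses C# 9 top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// [^\s"]+? may contain '.' and '\', and stops before \" or whitespace or the end.
// In a verbatim string "" is one quote, so the actual regex is:
//   (?<=/SERVER\s+(?:"\\")?)[^\s"]+?(?=\\"|\s|$)
var serverRegex = new Regex(@"(?<=/SERVER\s+(?:""\\"")?)[^\s""]+?(?=\\""|\s|$)");

// Hypothetical helper: returns the server name, or null when there is no match.
string? ExtractServer(string commandLine)
{
    Match m = serverRegex.Match(commandLine);
    return m.Success ? m.Value : null;
}

Console.WriteLine(ExtractServer("/SERVER \"\\\".\\SQL2012\\\"\"")); // .\SQL2012
Console.WriteLine(ExtractServer("/SERVER .\\SQL2012"));             // .\SQL2012
```

The optional `(?:"\\")?` inside the lookbehind is what lets the same pattern handle both the quoted and the unquoted form.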

How can I find and get the value of a match with regex?

I have the pattern >4.66< that I would like to find in a string. The following code is meant to find the pattern and give me the double value:
string data = File.ReadAllText("test.txt");
string pattern = "^>\\d.\\d<";
if (Regex.IsMatch(data, pattern))
{
MatchCollection mc = Regex.Matches(data, pattern);
foreach (Match m in mc)
{
double value = double.Parse(m.Value.Substring(1, m.Value.Length - 1));
string foo = "" + 2;
}
}
I think my pattern is wrong, since I can't seem to find >4.66<, even though I can see in the source that it is right there :D
Use the following regex:
(?<=>)\d+\.\d+(?=<)
Slightly simplified code:
string data = File.ReadAllText("test.txt");
MatchCollection mc = Regex.Matches(data, @"(?<=>)\d+\.\d+(?=<)");
foreach (Match m in mc)
{
double value = double.Parse(m.Value, CultureInfo.InvariantCulture);
}
You don't need to call the IsMatch method, because Matches simply returns an empty collection if nothing matches.
The main problem is that you are missing a quantifier. \d matches exactly one digit; if you want to match more, you need to say so.
+ is a quantifier that repeats the previous item one or more times.
To match a dot literally, it needs to be escaped, because it is a special character in regex.
Use a verbatim string to avoid double escaping.
Match only what you need (as Ulugbek described), with lookaround assertions or a capturing group.
I removed ^ from your pattern, because it matches the start of the string, and you wrote that you want to find the value within a string.
So we end up with:
string pattern = @">(\d+\.\d+)<";
MatchCollection mc = Regex.Matches(data, pattern);
foreach (Match m in mc)
{
    // Groups[1] holds the number captured between > and <
    double value = double.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture);
}
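A minimal sketch comparing the two approaches from the answers side by side: lookarounds, where the match is just the number, and a capturing group, where the number is in `Groups[1]`. The helper names and the sample string are hypothetical (the question reads its text from test.txt), and the block uses C# 9 top-level statements.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

// Approach 1 (lookarounds): only the number itself is part of the match.
double[] ViaLookaround(string text) =>
    Regex.Matches(text, @"(?<=>)\d+\.\d+(?=<)")
         .Cast<Match>()
         .Select(m => double.Parse(m.Value, CultureInfo.InvariantCulture))
         .ToArray();

// Approach 2 (capturing group): the match includes > and <; the number is in Groups[1].
double[] ViaGroup(string text) =>
    Regex.Matches(text, @">(\d+\.\d+)<")
         .Cast<Match>()
         .Select(m => double.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture))
         .ToArray();

// Hypothetical input standing in for the contents of test.txt.
const string sample = "a >4.66< b >12.5< c";
Console.WriteLine(string.Join(", ",
    ViaLookaround(sample).Select(v => v.ToString(CultureInfo.InvariantCulture)))); // 4.66, 12.5
```

Parsing with `CultureInfo.InvariantCulture` keeps "." as the decimal separator regardless of the machine's regional settings.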

Why doesn't $ always match to an end of line

Below is a simple code snippet that demonstrates the seemingly buggy behavior of end of line matching ("$") in .Net regular expressions. Am I missing something obvious?
string input = "Hello\nWorld\n";
string regex = @"^Hello\n^World\n"; //Match
//regex = @"^Hello\nWorld\n"; //Match
//regex = @"^Hello$"; //Match
//regex = @"^Hello$World$"; //No match!!!
//regex = @"^Hello$^World$"; //No match!!!
Match m = Regex.Match(input, regex, RegexOptions.Multiline | RegexOptions.CultureInvariant);
Console.WriteLine(m.Success);
$ does not consume the newline character(s); it only asserts a position. So @"^Hello$\s+^World$" should match.
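The $ doesn't match a newline character. Without multiline mode, it matches at the end of the string (or just before a final \n), and there isn't much sense in having two ends in one string. With RegexOptions.Multiline it also matches just before each \n, but it still doesn't consume it.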
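A small sketch that makes the zero-width behavior of $ visible on the question's input: it lists every index where $ succeeds, with and without `RegexOptions.Multiline`. The helper name `DollarPositions` is hypothetical, and the block uses C# 9 top-level statements.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

const string input = "Hello\nWorld\n";
const RegexOptions ml = RegexOptions.Multiline;

// $ checks a position but consumes nothing, so after "^Hello$" the '\n' is still
// the next character; something in the pattern has to consume it.
Console.WriteLine(Regex.IsMatch(input, @"^Hello$^World$", ml));    // False
Console.WriteLine(Regex.IsMatch(input, @"^Hello$\n^World$", ml));  // True
Console.WriteLine(Regex.IsMatch(input, @"^Hello$\s+^World$", ml)); // True

// Hypothetical helper: every index at which $ succeeds (each match is empty).
int[] DollarPositions(string text, RegexOptions options) =>
    Regex.Matches(text, "$", options).Cast<Match>().Select(m => m.Index).ToArray();

Console.WriteLine(string.Join(" ", DollarPositions(input, ml)));                // 5 11 12
Console.WriteLine(string.Join(" ", DollarPositions(input, RegexOptions.None))); // 11 12
```

Index 5 and 11 are the positions just before each '\n'; without Multiline, $ only succeeds before the final '\n' and at the very end.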

Regex to match and return group names

I need to match the following strings and return the values as groups:
abctic
abctac
xyztic
xyztac
ghhtic
ghhtac
The pattern I wrote, with grouping, is as follows:
(?<arch>[abc,xyz,ghh])(?<flavor>[tic,tac]$)
This returns only parts of the group values (meaning the match is not correct).
If I use * in each subpattern instead of $ at the end, the groups are correct, but then abcticff would also match.
Please let me know what the correct regex should be.
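Your pattern is incorrect because the pipe symbol | is used to specify alternatives, not a comma inside brackets as you were using, i.e., [x,y]. Brackets define a character class, so [abc,xyz,ghh] matches exactly one character from that set (including the comma).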
Your pattern should be: ^(?<arch>abc|xyz|ghh)(?<flavor>tic|tac)$
The ^ and $ metacharacters ensure that the whole string is matched from start to end. If you need to match text within a larger string, you could replace them with \b to match on a word boundary.
Try this approach:
string[] inputs = { "abctic", "abctac", "xyztic", "xyztac", "ghhtic", "ghhtac" };
string pattern = @"^(?<arch>abc|xyz|ghh)(?<flavor>tic|tac)$";
foreach (var input in inputs)
{
var match = Regex.Match(input, pattern);
if (match.Success)
{
Console.WriteLine("Arch: {0} - Flavor: {1}",
match.Groups["arch"].Value,
match.Groups["flavor"].Value);
}
else
Console.WriteLine("No match for: " + input);
}
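A minimal sketch illustrating both points from the answer: what the original character-class pattern actually captures, and the \b variant for finding the tokens inside a larger string. The sample sentence is made up for illustration, and the block uses C# 9 top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// The question's pattern: each [...] is a character class matching ONE character,
// so on "abctac" it can only match the last two characters, "ac".
var broken = Regex.Match("abctac", @"(?<arch>[abc,xyz,ghh])(?<flavor>[tic,tac]$)");
Console.WriteLine("{0} / {1}", broken.Groups["arch"].Value, broken.Groups["flavor"].Value); // a / c

// Alternation with \b instead of ^...$ to find tokens inside a larger string.
// "abcticff" is skipped because there is no word boundary between "c" and "f".
var wordPattern = new Regex(@"\b(?<arch>abc|xyz|ghh)(?<flavor>tic|tac)\b");
foreach (Match m in wordPattern.Matches("build abctic and xyztac, but not abcticff"))
{
    Console.WriteLine("Arch: {0} - Flavor: {1}", m.Groups["arch"].Value, m.Groups["flavor"].Value);
}
```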

Regex help with sample pattern. C#

I decided to use Regex, now I have two problems :)
Given the input string "hello world [2] [200] [%8] [%1c] [%d]",
what would be an appropriate pattern to match the instances of "[%8]", "[%1c]" and "[%d]"? (That is, a percent sign followed by an alphanumeric string of any length, all enclosed in square brackets.)
For "[2]" and "[200]", I already use
Regex.Matches(input, "(\\[)[0-9]*?\\]");
which works fine.
Any help would be appreciated.
MatchCollection matches = null;
try {
// Note: [%\w]+ also matches [2] and [200], because \w includes digits
Regex regexObj = new Regex(@"\[[%\w]+\]");
matches = regexObj.Matches(input);
if (matches.Count > 0) {
// Access individual matches using matches[i]
} else {
// Match attempt failed
}
} catch (ArgumentException ex) {
// Syntax error in the regular expression
}
The regex needed to match the pattern "[%anyLengthAlphaNumeric]" in a string is "\[(%\w+)\]".
The leading "[" is escaped with "\", and (...) creates a group of characters, defined as %\w+. \w is shorthand for word characters (letters, digits, and underscore; no spaces), and + matches one or more instances of the preceding symbol, character, or group. Finally, the trailing "]" is escaped with "\" and matches the closing bracket.
Here is a basic code example:
string input = @"hello world [2] [200] [%8] [%1c] [%d]";
Regex example = new Regex(@"\[(%\w+)\]");
MatchCollection matches = example.Matches(input);
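The example above stops at the `MatchCollection`. A minimal sketch of reading the group it defines, so that each result is the text inside the brackets; the helper name `PercentTokens` is hypothetical, and the block uses C# 9 top-level statements.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the text captured by (%\w+), i.e. the part inside the brackets.
string[] PercentTokens(string input) =>
    Regex.Matches(input, @"\[(%\w+)\]")
         .Cast<Match>()
         .Select(m => m.Groups[1].Value)
         .ToArray();

Console.WriteLine(string.Join(", ", PercentTokens("hello world [2] [200] [%8] [%1c] [%d]"))); // %8, %1c, %d
```

Use `m.Value` instead of `m.Groups[1].Value` if you want the brackets included.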
Try this:
Regex.Matches(input, "\\[%[0-9a-f]+\\]");
Or as a combined regular expression:
Regex.Matches(input, "\\[(\\d+|%[0-9a-f]+)\\]");
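A minimal sketch of how the combined pattern might be consumed: one pass over the input, with the captured group telling the numeric and the percent tokens apart. The helper name `Classify` is hypothetical, and the block uses C# 9 top-level statements. Note that [0-9a-f] only covers lowercase hex digits, so a token like [%1C] or [%g] would not match.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: classifies each bracketed token found by the combined pattern.
List<(string Token, bool IsPercent)> Classify(string input)
{
    var result = new List<(string Token, bool IsPercent)>();
    foreach (Match m in Regex.Matches(input, @"\[(\d+|%[0-9a-f]+)\]"))
    {
        string token = m.Groups[1].Value; // e.g. "200" or "%1c"; never empty
        result.Add((token, token[0] == '%'));
    }
    return result;
}

foreach (var (text, isPercent) in Classify("hello world [2] [200] [%8] [%1c] [%d]"))
{
    Console.WriteLine("{0}: {1}", text, isPercent ? "percent" : "number");
}
```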
How about @"\[%[0-9a-f]*?\]"?
string input = "hello world [2] [200] [%8] [%1c] [%d]";
MatchCollection matches = Regex.Matches(input, @"\[%[0-9a-f]*?\]");
Console.WriteLine(matches.Count); // 3
