C# Regex match multiple words in a string - c#

How can I find all the matches in a string using a regular expression run in C#?
I want to find all matches in the below example string.
Example:
inputString: Hello (mail) byebye (time) how are you (mail) how are you (time)
I want to match (mail) and (time) from the example. Including parentheses( and ).
In attempting to solve this, I've writtent the following code.
string testString = #"(mail)|(time)";
Regex regx = new Regex(Regex.Escape(testString), RegexOptions.IgnoreCase);
List<string> mactches = regx.Matches(inputString).OfType<Match>().Select(m => m.Value).Distinct().ToList();
foreach (string match in mactches)
{
//Do something
}
Is the pipe(|) used for the logical OR condition?

Using Regex.Escape(testString) is going to escape your pipe character, turning
#"(mail)|(time)"
effectively into
#"\(mail\)\|\(time\)".
Thus, your regex is looking for the literal "(mail)|(time)".
If all of your matches are as simple as words surrounded by parens, I would build the regex like this:
List<string> words = new List<string> { "(mail)", "(time)", ... };
string pattern = string.Join("|", words.Select(w => Regex.Escape(w)));
Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);

Escape the parentheses in your test string:
string testString = #"\(mail\)|\(time\)";
Remove Regex.Escape:
Regex regx = new Regex(testString, RegexOptions.IgnoreCase);
Output (includes parentheses):
(mail)
(time)
The reason Regex.Escape isn't working in your case is that it escapes the | character as well:
Escapes a minimal set of metacharacters (\, *, +, ?, |, {, [, (, ), ^, $, ., #, and whitespace) by replacing them with their \ codes.

Related

Remove Adjacent Space near a Special Character using regex

Using regex want to remove adjacent Space near replacement Character
replacementCharcter = '-'
this._adjacentSpace = new Regex($#"\s*([\{replacementCharacter}])\s*");
MatchCollection replaceCharacterMatch = this._adjacentSpace.Matches(extractedText);
foreach (Match replaceCharacter in replaceCharacterMatch)
{
if (replaceCharacter.Success)
{
cleanedText = Extactedtext.Replace(replaceCharacter.Value, replaceCharacter.Value.Trim());
}
}
Extractedtext = - whi, - ch
cleanedtext = -whi, -ch
expected result : cleanedtext = -whi,-ch
You can use
var Extactedtext = "- whi, - ch";
var replacementCharacter = "-";
var _adjacentSpace = new Regex($#"\s*({Regex.Escape(replacementCharacter)})\s*");
var cleanedText = _adjacentSpace.Replace(Extactedtext, "$1");
Console.WriteLine(cleanedText); // => -whi,-ch
See the C# demo.
NOTE:
replacementCharacter is of type string in the code above
$#"\s*({Regex.Escape(replacementCharacter)})\s*" will create a regex like \s*-\s*, Regex.Escape() will escape any regex-special char (like +, (, etc.) correctly to be used in a regex pattern, and the whole regex simply matches (and captured into Group 1 with the capturing parentheses) the replacementCharacter enclosed with zero or more whitespaces
No need using Regex.Matches, just replace all matches if there are any, that is how Regex.Replace works.
_adjacentSpace is the compiled Regex object, to replace, just call the .Replace() method of the regex object instance
The replacement is a backreference to the Group 1 value, the - char here.

Regex removing empty spaces when using replace

My situation is not about removing empty spaces, but keeping them. I have this string >[database values] which I would like to find. I created this RegEx to find it then go in and remove the >, [, ]. The code below takes a string that is from a document. The first pattern looks for anything that is surrounded by >[some stuff] it then goes in and "removes" >, [, ]
string decoded = "document in string format";
string pattern = #">\[[A-z, /, \s]*\]";
string pattern2 = #"[>, \[, \]]";
Regex rgx = new Regex(pattern);
Regex rgx2 = new Regex(pattern2);
foreach (Match match in rgx.Matches(decoded))
{
string replacedValue= rgx2.Replace(match.Value, "");
Console.WriteLine(match.Value);
Console.WriteLine(replacedValue);
What I am getting in first my Console.WriteLine is correct. So I would be getting things like >[123 sesame St]. But my second output shows that my replace removes not just the characters but the spaces so I would get something like this 123sesameSt. I don't see any space being replaced in my Regex. Am I forgetting something, perhaps it is implicitly in a replace?
The [A-z, /, \s] and [>, \[, \]] in your patterns are also looking for commas and spaces. Just list the characters without delimiting them, like this: [A-Za-z/\s]
string pattern = #">\[[A-Za-z/\s]*\]";
string pattern2 = #"[>,\[\]]";
Edit to include Casimir's tip.
After rereading your question (if I understand well) I realize that your two steps approach is useless. You only need one replacement using a capture group:
string pattern = #">\[([^]]*)]";
Regex rgx = new Regex(pattern);
string result = rgx.Replace(yourtext, "$1");
pattern details:
>\[ # literals: >[
( # open the capture group 1
[^]]* # all that is not a ]
) # close the capture group 1
] # literal ]
the replacement string refers to the capture group 1 with $1
By defining [>, \[, \]] in pattern2 you define a character group consisting of single characters like >, ,, , [ and every other character you listed in the square brackets. But I guess you don't want to match space and ,. So if you don't want to match them leave them out like
string pattern2 = #"[>\[\]]";
Alternatively, you could use
string pattern2 = #"(>\[|\])";
Thereby, you either match >[ or ] which better expresses your intention.

Regex match between two strings that might contain another string

I'm doing a regex that is trying to match the following string:
.\SQL2012
From the two strings (they are contained within another larger string but that is irrelevant in this case):
/SERVER "\".\SQL2012\""
/SERVER .\SQL2012
So the "\" before and the \"" after the match may both be omitted in some cases. The regex I've come up with (from a previous question here on StackOverflow) is the following:
(?<=\/SERVER\s*(?:[""\\""]+)?)\w+(?=(?:[\\""""]+|$)| )
Which works fine if I'm trying to match TEST_SERVER instead of .\SQL2012 (because \w does not match special characters). Is there a way to match anything until \"" or a whitespace occurs?
I'm doing this in C#, here's my code:
string input = "/SERVER \"\\\".\\SQL2012\\\"\"";
string pattern = #"(?<=\/SERVER\s*(?:[""\\""]+)?)\w+(?=(?:[\\""""]+|$)| )";
Regex regEx = new Regex(pattern);
MatchCollection matches = regEx.Matches(input);
foreach (Match match in matches)
{
Console.WriteLine(match.ToString());
}
Console.ReadKey();
Add a word boundary \b just before to the lookahead,
string input = "/SERVER .\\SQL2012";
Regex rgx = new Regex(#"(?<=\/SERVER\s+""\\"").*?\b(?=\\""""|$| )|(?<=\/SERVER\s+).*?\b(?= |$)");
foreach (Match m in rgx.Matches(input))
Console.WriteLine(m.Groups[0].Value);
Console.WriteLine(input);
IDEONE

C# - Regex Match whole words

I need to match all the whole words containing a given a string.
string s = "ABC.MYTESTING
XYZ.YOUTESTED
ANY.TESTING";
Regex r = new Regex("(?<TM>[!\..]*TEST.*)", ...);
MatchCollection mc = r.Matches(s);
I need the result to be:
MYTESTING
YOUTESTED
TESTING
But I get:
TESTING
TESTED
.TESTING
How do I achieve this with Regular expressions.
Edit: Extended sample string.
If you were looking for all words including 'TEST', you should use
#"(?<TM>\w*TEST\w*)"
\w includes word characters and is short for [A-Za-z0-9_]
Keep it simple: why not just try \w*TEST\w* as the match pattern.
I get the results you are expecting with the following:
string s = #"ABC.MYTESTING
XYZ.YOUTESTED
ANY.TESTING";
var m = Regex.Matches(s, #"(\w*TEST\w*)", RegexOptions.IgnoreCase);
Try using \b. It's the regex flag for a non-word delimiter. If you wanted to match both words you could use:
/\b[a-z]+\b/i
BTW, .net doesn't need the surrounding /, and the i is just a case-insensitive match flag.
.NET Alternative:
var re = new Regex(#"\b[a-z]+\b", RegexOptions.IgnoreCase);
Using Groups I think you can achieve it.
string s = #"ABC.TESTING
XYZ.TESTED";
Regex r = new Regex(#"(?<TM>[!\..]*(?<test>TEST.*))", RegexOptions.Multiline);
var mc= r.Matches(s);
foreach (Match match in mc)
{
Console.WriteLine(match.Groups["test"]);
}
Works exactly like you want.
BTW, your regular expression pattern should be a verbatim string ( #"")
Regex r = new Regex(#"(?<TM>[^.]*TEST.*)", RegexOptions.IgnoreCase);
First, as #manojlds said, you should use verbatim strings for regexes whenever possible. Otherwise you'll have to use two backslashes in most of your regex escape sequences, not just one (e.g. [!\\..]*).
Second, if you want to match anything but a dot, that part of the regex should be [^.]*. ^ is the metacharacter that inverts the character class, not !, and . has no special meaning in that context, so it doesn't need to be escaped. But you should probably use \w* instead, or even [A-Z]*, depending on what exactly you mean by "word". [!\..] matches ! or ..
Regex r = new Regex(#"(?<TM>[A-Z]*TEST[A-Z]*)", RegexOptions.IgnoreCase);
That way you don't need to bother with word boundaries, though they don't hurt:
Regex r = new Regex(#"(?<TM>\b[A-Z]*TEST[A-Z]*\b)", RegexOptions.IgnoreCase);
Finally, if you're always taking the whole match anyway, you don't need to use a capturing group:
Regex r = new Regex(#"\b[A-Z]*TEST[A-Z]*\b", RegexOptions.IgnoreCase);
The matched text will be available via Match's Value property.

Regex help with sample pattern. C#

I decided to use Regex, now I have two problems :)
Given the input string "hello world [2] [200] [%8] [%1c] [%d]",
What would be an approprite pattern to match the instances of "[%8]" "[%1c]" + "[%d]" ? (So a percentage sign, followed by any length alphanumeric, all enclosed in square brackets).
for the "[2]" and [200], I already use
Regex.Matches(input, "(\\[)[0-9]*?\\]");
Which works fine.
Any help would be appreicated.
MatchCollection matches = null;
try {
Regex regexObj = new Regex(#"\[[%\w]+\]");
matches = regexObj.Matches(input);
if (matches.Count > 0) {
// Access individual matches using matches.Item[]
} else {
// Match attempt failed
}
} catch (ArgumentException ex) {
// Syntax error in the regular expression
}
The Regex needed to match this pattern of "[%anyLengthAlphaNumeric]" in a string is this "[(%\w+)]"
The leading "[" is escaped with the "\" then you are creating a grouping of characters with the (...). This grouping is defined as %\w+. The \w is a shortcut for all word characters including letters and digits no spaces. The + matches one or more instances of the previous symbol, character or group. Then the trailing "]" is escaped with a "\" and catches the closing bracket.
Here is a basic code example:
string input = #"hello world [2] [200] [%8] [%1c] [%d]";
Regex example = new Regex(#"\[(%\w+)\]");
MatchCollection matches = example.Matches(input);
Try this:
Regex.Matches(input, "\\[%[0-9a-f]+\\]");
Or as a combined regular expression:
Regex.Matches(input, "\\[(\\d+|%[0-9a-f]+)\\]");
How about #"\[%[0-9a-f]*?\]"?
string input = "hello world [2] [200] [%8] [%1c] [%d]";
MatchCollection matches = Regex.Matches(input, #"\[%[0-9a-f]*?\]");
matches.Count // = 3

Categories