Using Regex with a Dynamic Pattern - C#

Today I can hard-code the pattern, but later I would like to change it so that the search word can be inserted into @"\bfood\b". I want to make it dynamic instead of hard-coded.
In the future I would like to use the word "chicken" instead of "food".
I tried to replace @"\bfood\b" with @"\b" + pattern +"\b", but it doesn't work.
string inputText = "food ddd dd";
string dddd = "\bfood\b";
string pattern = "food";
Regex rx = new Regex(@"\bfood\b", RegexOptions.None);
MatchCollection mc = rx.Matches(inputText);
if (rx.Match(pattern).Success)
{
int dd = 3;
}

You should use
@"\b" + Regex.Escape(pattern) + @"\b"
Or a more generic version:
@"(?<!\w)" + Regex.Escape(pattern) + @"(?!\w)"
Or with string.Format:
Regex rx = new Regex(string.Format(@"\b{0}\b", Regex.Escape(pattern)), RegexOptions.None);
Or with string interpolation:
Regex rx = new Regex($@"(?<!\w){Regex.Escape(pattern)}(?!\w)", RegexOptions.None);
Now, why do I suggest the (?<!\w) and (?!\w) lookarounds? Because they act as word boundaries that do not depend on context. What if you decide to pass a |border| pattern? \b\|border\|\b will most probably fail to match most of the cases you intended, because \b requires a word character to appear right before the first | and right after the last |. The lookarounds will match the |border| string only if it is not enclosed in word characters.
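A minimal sketch of the difference, assuming a console app; the class and method names (WholeWordDemo, WithWordBoundaries, WithLookarounds) are hypothetical. It builds both pattern styles around the escaped term |border| and tests them on one input where the term is surrounded by spaces and one where it is glued to letters:

```csharp
using System;
using System.Text.RegularExpressions;

public static class WholeWordDemo
{
    // \b-based pattern around an escaped search term (hypothetical helper).
    public static Regex WithWordBoundaries(string term) =>
        new Regex(@"\b" + Regex.Escape(term) + @"\b");

    // Lookaround-based pattern: no word character directly before or after the term.
    public static Regex WithLookarounds(string term) =>
        new Regex(@"(?<!\w)" + Regex.Escape(term) + @"(?!\w)");

    public static void Main()
    {
        string term = "|border|";
        Console.WriteLine(WithWordBoundaries(term).IsMatch("a |border| b")); // False
        Console.WriteLine(WithLookarounds(term).IsMatch("a |border| b"));    // True
        Console.WriteLine(WithWordBoundaries(term).IsMatch("x|border|y"));   // True
        Console.WriteLine(WithLookarounds(term).IsMatch("x|border|y"));      // False

        // For a plain word both styles behave the same.
        Console.WriteLine(WithLookarounds("chicken").IsMatch("chicken soup")); // True
    }
}
```

With \b the result is the opposite of what was intended: no match when |border| stands alone, but a match when it is glued to letters. The lookaround version behaves the other way around.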

The reason your @"\b" + pattern +"\b" didn't work is that the verbatim string prefix @ was applied only to the first piece of your regex. In the regular literal "\b", \b is the backspace character (U+0008), not the regex word boundary.
Fix this with either
@"\b" + pattern + @"\b"
Or, even better, use String.Format()
String.Format(@"\b{0}\b", pattern);
Or use C# 6 string interpolation
$@"\b{pattern}\b";
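To see why the mixed literal fails, here is a small sketch (VerbatimDemo, BuildBroken and BuildFixed are hypothetical names). It checks that the non-verbatim "\b" really ends in the backspace character U+0008, and that only the fully verbatim pattern matches:

```csharp
using System;
using System.Text.RegularExpressions;

public static class VerbatimDemo
{
    // The question's attempt: the second "\b" is not verbatim, so C# turns it into U+0008.
    public static string BuildBroken(string word) => @"\b" + word + "\b";

    // Both pieces verbatim: each \b reaches the regex engine as backslash + 'b'.
    public static string BuildFixed(string word) => @"\b" + word + @"\b";

    public static void Main()
    {
        string word = "chicken";
        string input = "chicken soup";

        string broken = BuildBroken(word);
        Console.WriteLine(broken[broken.Length - 1] == '\u0008');   // True
        Console.WriteLine(Regex.IsMatch(input, broken));             // False
        Console.WriteLine(Regex.IsMatch(input, BuildFixed(word)));   // True

        // string.Format and $@"" interpolation produce the same pattern text.
        Console.WriteLine(string.Format(@"\b{0}\b", word) == BuildFixed(word)); // True
        Console.WriteLine($@"\b{word}\b" == BuildFixed(word));                  // True
    }
}
```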

Related

Regex MatchEvaluator doesn't work with words contains "_" underscore

I am trying to match words and format the regex output. I have an array of words, e.g.:
var resultArray = new List<string> {"new", "new_"}; // notice the word with underscore
But when I try to search a sentence like this:
New Law_Book_with_New_Cover
it matches the first word, "New", but not the "New_" in the middle. Here is my code:
if (resultArray.Count > 0)
{
string regex = "\\b(?:" + String.Join("|", resultArray.ToArray()) + ")\\b";
MatchEvaluator myEvaluator = new MatchEvaluator(GetHighlightMarkup);
return Regex.Replace(result, regex, myEvaluator, RegexOptions.Compiled | RegexOptions.Multiline | RegexOptions.IgnoreCase);
}
private static string GetHighlightMarkup(Match m)
{
return string.Format("<span class=\"focus\">{0}</span>", m.Value);
}
And yes, I did try escaping the word ("\New_"), but still no luck.
What am I missing?
It seems you need to match your items only if they are not enclosed in letters.
You may replace the word boundaries in your regex with lookarounds:
string regex = @"(?<!\p{L})(?:" + String.Join("|", resultArray.ToArray()) + @")(?!\p{L})";
where \p{L} matches any letter, (?<!\p{L}) requires the absence of a letter before the match, and (?!\p{L}) disallows a letter after the match.
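A minimal sketch that combines the suggested pattern with the question's MatchEvaluator logic, written here as a lambda. The names HighlightDemo and Highlight are hypothetical, and applying Regex.Escape to each word is an addition not in the answer. The question's RegexOptions.Compiled and Multiline flags are left out because they do not change this result:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class HighlightDemo
{
    // Wraps every listed word that is not directly preceded or followed by a letter.
    public static string Highlight(string text, IEnumerable<string> words)
    {
        // Escape each word so metacharacters are matched literally, then OR them together.
        string alternation = string.Join("|", words.Select(w => Regex.Escape(w)));
        string pattern = @"(?<!\p{L})(?:" + alternation + @")(?!\p{L})";
        return Regex.Replace(text, pattern,
            m => $"<span class=\"focus\">{m.Value}</span>",
            RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        var words = new List<string> { "new", "new_" };
        Console.WriteLine(Highlight("New Law_Book_with_New_Cover", words));
    }
}
```

On the question's sentence this prints <span class="focus">New</span> Law_Book_with_<span class="focus">New</span>_Cover. Because "_" is not a letter, the inner occurrence is now found, but the highlighted text is New with the underscore outside the span: "new_" cannot match there, since it would be followed by the letter C.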

Detecting a word followed by a dot or whitespace using regex

I am using regex and C# to find occurrences of a particular word using
Regex regex = new Regex(@"\b" + word + @"\b");
How can I modify my Regex to only detect the word if it is preceded by whitespace, followed by whitespace, or followed by a dot?
Examples:
this.Button.Value - should match
this.value - should match
document.thisButton.Value - should not match
You may use lookarounds and alternation to check for two possibilities: the keyword is enclosed in whitespace, or it is just followed by a dot:
var line = "this.Button.Value\nthis.value\ndocument.thisButton.Value";
var word = "this";
var rx = new Regex(string.Format(@"(?<=\s)\b{0}\b(?=\s)|\b{0}\b(?=\.)", word));
var result = rx.Replace(line, "NEW_WORD");
Console.WriteLine(result);
See IDEONE demo and a regex demo.
The pattern matches:
(?<=\s)\bthis\b(?=\s) - whole word "this" that is preceded by whitespace ((?<=\s)) and followed by whitespace ((?=\s))
| - or
\bthis\b(?=\.) - whole word "this" that is followed by a literal . ((?=\.))
Since lookarounds do not consume characters (the regex index stays where it was), the characters they check are not part of the match value and are therefore left untouched by the replacement.
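A short sketch of that last point, wrapping the same pattern in a hypothetical Build helper (Regex.Escape is added; for a plain word like this it changes nothing). For contrast it also runs a consuming pattern, \sthis\s, which swallows the surrounding spaces:

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaroundReplaceDemo
{
    // The answer's pattern for an arbitrary word (hypothetical helper name).
    public static Regex Build(string word) =>
        new Regex(string.Format(@"(?<=\s)\b{0}\b(?=\s)|\b{0}\b(?=\.)", Regex.Escape(word)));

    public static void Main()
    {
        var rx = Build("this");

        // Spaces and dots are only checked by lookarounds, so they survive the replacement.
        Console.WriteLine(rx.Replace("call this now", "NEW_WORD")); // call NEW_WORD now
        Console.WriteLine(rx.Replace("this.value", "NEW_WORD"));    // NEW_WORD.value
        Console.WriteLine(rx.Replace("thisButton", "NEW_WORD"));    // thisButton

        // Consuming version for contrast: the spaces become part of the match and disappear.
        Console.WriteLine(Regex.Replace("call this now", @"\sthis\s", "NEW_WORD")); // callNEW_WORDnow
    }
}
```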
If I am understanding you correctly:
Regex regex = new Regex(@"\b" + (word " " || ".") + @"\b"); // pseudocode, not valid C#
Regex regex = new Regex(@"((?<=( \.))" + word + @"\b)" + "|" + @"(\b" + word + @"[ .])");
However, note that this could cause trouble if word contains characters that have special meanings in regular expressions. I'm assuming that word contains alphanumeric characters only.
The (?<=...) lookbehind checks what precedes the word and the (?=...) lookahead checks what follows it; neither includes those characters in the match.
Regex regex = new Regex(@"(?<=\s)\b" + word + @"\b|\b" + word + @"\b(?=[\s\.])");
EDIT: Pattern updated.
EDIT 2: Online test: http://ideone.com/RXRQM5
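The two patterns in this thread do not accept exactly the same inputs. This sketch (class and method names are hypothetical; Regex.Escape is added to both) compares the string.Format answer's pattern with the updated pattern above. They agree on the question's examples but differ when "this" starts the text and is followed by a space, or ends the text and is preceded by a space. The second pattern is closer to the condition as worded in the question (preceded by whitespace, or followed by whitespace or a dot):

```csharp
using System;
using System.Text.RegularExpressions;

public static class CompareDemo
{
    // Enclosed in whitespace on both sides, or followed by a dot.
    public static Regex Enclosed(string word)
    {
        string w = Regex.Escape(word);
        return new Regex(@"(?<=\s)\b" + w + @"\b(?=\s)|\b" + w + @"\b(?=\.)");
    }

    // Preceded by whitespace, or followed by whitespace or a dot.
    public static Regex EitherSide(string word)
    {
        string w = Regex.Escape(word);
        return new Regex(@"(?<=\s)\b" + w + @"\b|\b" + w + @"\b(?=[\s\.])");
    }

    public static void Main()
    {
        foreach (var input in new[] { "this.value", "document.thisButton.Value", "this is fine", "call this" })
        {
            Console.WriteLine($"{input}: {Enclosed("this").IsMatch(input)} / {EitherSide("this").IsMatch(input)}");
        }
        // this.value: True / True
        // document.thisButton.Value: False / False
        // this is fine: False / True
        // call this: False / True
    }
}
```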

C# RegEx to find a specific string or all words in a string

Looking it up, I thought I understood how to look up a string of multiple words in a sentence, but it does not find a match. Can someone tell me what I am doing wrong? I need to be able to find a single or multiple word match. I passed in "to find" to the method and it did not find the match. Also, if the user does not enclose their search phrase in quotes, I also need it to search on each word entered.
var pattern = @"\b\" + searchString + @"\b"; //searchString is passed in.
Regex rgx = new Regex(pattern);
var sentence = "I need to find a string in this sentence!";
Match match = rgx.Match(sentence);
if (match.Success)
{
// Do something with the match.
}
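Just remove the second \ in the first @"\b\". In a verbatim literal that backslash is kept, so with "to find" your pattern becomes \b\to find\b, where \t means a tab character:
var pattern = @"\b" + searchString + @"\b";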
See IDEONE demo
Note that if your searchString may contain special regex metacharacters (like (, ), [, +, *, etc.), you can use Regex.Escape() to escape them:
var pattern = @"\b" + Regex.Escape(searchString) + @"\b";
And if those characters may appear at the start or end of the search string, use lookarounds rather than word boundaries:
var pattern = @"(?<!\w)" + Regex.Escape(searchString) + @"(?!\w)";
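The question also asks to search on each word when the input is not quoted, which the answer does not cover. One possible approach, sketched here under the assumption that a phrase is quoted with plain double quotes, is to escape the phrase as a single unit, or else escape each word and join them with | inside the same lookaround guards. BuildSearchRegex is a hypothetical name, and RegexOptions.IgnoreCase is an assumption:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SearchDemo
{
    // Quoted input is treated as one phrase; unquoted input is split into words that are OR-ed.
    public static Regex BuildSearchRegex(string searchString)
    {
        string trimmed = searchString.Trim();
        string body;
        if (trimmed.Length >= 2 && trimmed[0] == '"' && trimmed[trimmed.Length - 1] == '"')
        {
            // Phrase search: escape the text between the quotes as a single unit.
            body = Regex.Escape(trimmed.Substring(1, trimmed.Length - 2));
        }
        else
        {
            // Word search: a null separator array splits on whitespace.
            var words = trimmed.Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
                               .Select(w => Regex.Escape(w));
            body = string.Join("|", words);
        }
        // Lookarounds instead of \b so terms that start or end with symbols still work.
        return new Regex(@"(?<!\w)(?:" + body + @")(?!\w)", RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        string sentence = "I need to find a string in this sentence!";
        Console.WriteLine(BuildSearchRegex("\"to find\"").Match(sentence).Value); // to find
        Console.WriteLine(BuildSearchRegex("\"find to\"").IsMatch(sentence));     // False
        foreach (Match m in BuildSearchRegex("find sentence").Matches(sentence))
        {
            Console.WriteLine(m.Value); // find, then sentence
        }
    }
}
```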

replacing whole word with .Replace() using \b not working

I have a string in which I want to replace a whole word. This is what I have:
var TheWord = "SomeWord";
TheWord = "\b" + TheWord + "\b";
TheText = TheText.replace(TheWord, "SomeOtherWord");
I'm using "\b" because I only want to replace "SomeWord", not "SomeWordDifferent". The text looks like this: var TheHTML = '<div class="SomeWord">'; However, the replacement doesn't take place. What do I need to change?
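You need to escape the backslashes. Note also that string.Replace treats its argument as plain text, so the pattern has to be passed to Regex.Replace. Try either of these:
TheWord = @"\b" + TheWord + @"\b";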
or
TheWord = "\\b" + TheWord + "\\b";
I assume you are trying to use Regex. The instance method for this is
string Regex.Replace(string input, string replacement)
So I think this is what you want:
string text = ...; // text comes from somewhere
string pattern = @"\bSomeWord\b"; // escape \b (word boundary regex anchor) as "\\b", or use a verbatim string literal, like here
var regex = new Regex(pattern);
text = regex.Replace(text, "SomeOtherWord");
Or simply use the static version of the Replace method, as Tim wrote:
Regex.Replace(text, pattern, "SomeOtherWord");
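A self-contained sketch using the question's HTML, with a hypothetical ReplaceWholeWord helper that also applies Regex.Escape to the word. It shows that string.Replace leaves the text unchanged, because it searches for the literal characters \bSomeWord\b, while Regex.Replace rewrites only the whole word:

```csharp
using System;
using System.Text.RegularExpressions;

public static class WholeWordReplaceDemo
{
    // Replaces whole-word occurrences of `word`; Regex.Escape guards against metacharacters.
    public static string ReplaceWholeWord(string text, string word, string replacement) =>
        Regex.Replace(text, @"\b" + Regex.Escape(word) + @"\b", replacement);

    public static void Main()
    {
        string html = "<div class=\"SomeWord\"><div class=\"SomeWordDifferent\">";

        // string.Replace searches for the literal text \bSomeWord\b, which never occurs.
        Console.WriteLine(html.Replace(@"\bSomeWord\b", "SomeOtherWord"));
        // <div class="SomeWord"><div class="SomeWordDifferent">

        // Regex.Replace treats \b as a word boundary, so SomeWordDifferent is left alone.
        Console.WriteLine(ReplaceWholeWord(html, "SomeWord", "SomeOtherWord"));
        // <div class="SomeOtherWord"><div class="SomeWordDifferent">
    }
}
```

One caveat: in the replacement string, $ introduces substitutions such as $1, so a literal dollar sign has to be written as $$.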

Replace text in string with delimiters using Regex

I have a string like this:
string str = "(50%silicon +20%!(20%Gold + 80%Silver)| + 30%Alumnium)";
I need a Regular Expression which would Replace the contents in between ! and | with an empty string. The result should be (50%silicon +20% + 30%Alumnium).
If the string contains something like (with nested delimiters):
string str = "(50%silicon +20%!(80%Gold + 80%Silver + 20%!(20%Iron + 80%Silver)|)| + 30%Alumnium)";
The result should be (50%silicon +20% + 30%Alumnium) - ignoring the nested delimiters.
I've tried the following Regex, but it doesn't ignore the nesting:
Regex.Replace(str, @"!.+?\|", "", RegexOptions.IgnoreCase);
You are using the lazy quantifier +?, which looks for the smallest possible substring that matches your regex. To get the result you are looking for, use the greedy quantifier +, which matches the largest possible substring.
The following regex (not tested in C# because I don't have it available, but it should work in any standard regex implementation) will do what you want:
'!.+\|'
using System.Text.RegularExpressions;
str = Regex.Replace(str, @"!.+?\|", "", RegexOptions.IgnoreCase);
Regex.Replace(str, @"!.+?\||\)\|", "", RegexOptions.IgnoreCase);
Works for both provided strings. I extended the regex with a second alternative, \)\|, that removes the leftover ")|" characters.
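A sketch that runs the greedy suggestion and the extended lazy suggestion on the two sample strings (the nested one joined into a single line). Class and method names are hypothetical, and IgnoreCase is dropped since neither pattern contains letters. Both give the expected result on these inputs, but they differ once a string contains two separate !...| groups: the greedy pattern also deletes the text between them.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DelimiterDemo
{
    // Greedy: from the first ! up to the LAST | on the line.
    public static string Greedy(string s) => Regex.Replace(s, @"!.+\|", "");

    // Lazy plus cleanup: each !...| up to the FIRST |, plus any leftover ")|".
    public static string LazyWithCleanup(string s) => Regex.Replace(s, @"!.+?\||\)\|", "");

    public static void Main()
    {
        string flat = "(50%silicon +20%!(20%Gold + 80%Silver)| + 30%Alumnium)";
        string nested = "(50%silicon +20%!(80%Gold + 80%Silver + 20%!(20%Iron + 80%Silver)|)| + 30%Alumnium)";

        foreach (var s in new[] { flat, nested })
        {
            Console.WriteLine(Greedy(s));          // (50%silicon +20% + 30%Alumnium)
            Console.WriteLine(LazyWithCleanup(s)); // (50%silicon +20% + 30%Alumnium)
        }

        // Two separate groups: greedy also removes the "b" between them.
        Console.WriteLine(Greedy("a!x|b!y|c"));          // ac
        Console.WriteLine(LazyWithCleanup("a!x|b!y|c")); // abc
    }
}
```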
