Can I use variables in pattern in Regex (C#) - c#

I have some HTML text in which I need to replace certain words with links to them. For example, the text contains the word "PHP", and I want to replace it with a link such as <a class="glossary" href="glossary.html#php">PHP</a>. There are many words that I need to replace.
My code:
public struct GlossaryReplace
{
public string word; // here the words, e.g. PHP
public string link; // here the links to replace, e.g. glossary.html#php
}
public static GlossaryReplace[] Replaces = null;
IHTMLDocument2 html_doc = webBrowser1.Document.DomDocument as IHTMLDocument2;
string html_content = html_doc.body.outerHTML;
for (int i = 0; i < Replaces.Length; i++)
{
String substitution = "<a class=\"glossary\" href=\"" + Replaces[i].link + "\">" + Replaces[i].word + "</a>";
html_content = Regex.Replace(html_content, @"\b" + Replaces[i].word + "\b", substitution);
}
html_doc.body.innerHTML = html_content;
The trouble is that this does not work :( But
html_content = Regex.Replace(html_content, @"\bPHP\b", "some replacement");
this code works well! I can't see my error!

The @ prefix for strings only applies to the string literal that immediately follows it, so when you concatenate strings you may have to use it on each literal.
Change this:
html_content = Regex.Replace(html_content, @"\b" + Replaces[i].word + "\b", substitution);
to:
html_content = Regex.Replace(html_content, @"\b" + Replaces[i].word + @"\b", substitution);
In a regular expression \b means a word boundary, but in a regular string literal it means a backspace character (ASCII 8). You get a compiler error if you use an escape code that doesn't exist in a string (e.g. \s), but not in this case, as the code exists in both strings and regular expressions.
On a side note, a method that is useful when creating regular expression patterns dynamically is Regex.Escape. It escapes the characters in a string so that it can be used literally in a pattern, so @"\b" + Regex.Escape(Replaces[i].word) + @"\b" would make the pattern work even if the word contains characters that have a special meaning in a regular expression.
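The advice above can be put together in a small sketch. The class and method names (GlossaryLinker, LinkWord) are hypothetical and not part of the question's code; the sketch also assumes that a literal "$" in the word or link should survive, so it doubles it in the replacement string.

```csharp
using System.Text.RegularExpressions;

public static class GlossaryLinker
{
    // Hypothetical helper: wraps every whole-word occurrence of `word` in a glossary link.
    public static string LinkWord(string html, string word, string link)
    {
        // Both literals are verbatim (@), so \b reaches the regex engine as a word boundary.
        // Regex.Escape neutralises metacharacters in the word, e.g. the dot in "ASP.NET".
        string pattern = @"\b" + Regex.Escape(word) + @"\b";

        // "$" is special in a replacement string, so a literal "$" is written as "$$".
        string substitution = "<a class=\"glossary\" href=\"" + link.Replace("$", "$$") + "\">"
                              + word.Replace("$", "$$") + "</a>";

        return Regex.Replace(html, pattern, substitution);
    }
}
```

For example, GlossaryLinker.LinkWord("I like PHP a lot", "PHP", "glossary.html#php") wraps the word in the link, while "PHPUnit" is left alone because there is no word boundary after "PHP".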

You forgot a @ here:
@"\b" + Replaces[i].word + "\b"
Should be:
@"\b" + Replaces[i].word + @"\b"
I'd also recommend that you use an HTML parser if you are modifying HTML. HTML Agility Pack is a useful library for this purpose.

Related

Replacement evaluator contains number after replacement group

I want to dynamically adjust my replacement pattern and evaluator:
string pattern = "np";
string replacement = "ab";
string retval = Regex.Replace("Input", @"(.*)" + pattern + @"(.*)", @"$1" + replacement + @"$2");
// retval = "Iabut" => correct
replacement = "12";
retval = Regex.Replace("Input", @"(.*)" + pattern + @"(.*)", @"$1" + replacement + @"$2");
// retval = "$112ut" => wrong
The problem is that in the second case my replacement string is "$112$2", so the first group reference is read as $112.
Is it possible to avoid such problems directly or do I need to put a delimiting character between my group definition and my string?
As a replacement argument, use
"${1}" + replacement.Replace("$", "$$") + "$2"
The braces in ${1} will make sure the first group is referred to and .Replace("$", "$$") will make it work well if the replacement has $ inside.
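A minimal sketch of that answer follows, with a hypothetical helper name (GroupSafeReplace.ReplaceBetween). It additionally assumes the search text should be matched literally, so it passes it through Regex.Escape.

```csharp
using System.Text.RegularExpressions;

public static class GroupSafeReplace
{
    // Hypothetical helper: swaps `literal` for `replacement`, keeping the text around it.
    public static string ReplaceBetween(string input, string literal, string replacement)
    {
        string pattern = "(.*)" + Regex.Escape(literal) + "(.*)";

        // ${1} ends the group number explicitly, so following digits are not absorbed.
        // Doubling "$" keeps a dollar sign in the replacement text literal.
        string safe = "${1}" + replacement.Replace("$", "$$") + "${2}";

        return Regex.Replace(input, pattern, safe);
    }
}
```

With this, ReplaceBetween("Input", "np", "12") gives "I12ut" instead of "$112ut", and ReplaceBetween("Input", "np", "$5") gives "I$5ut".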

Regex to insert space C#

I have some string. I need a regex that will replace each occurrence of symbol that is not space + '<' with the same symbol + space + '<'.
In other words if there is '<' without ' ' before it it must add the space.
I've tried something like :
string pattern = "[^ ]<";
string replacement = "$0" + "<";
string result = Regex.Replace(html, pattern, replacement);
Obviously not working as I want.
You can try something like this: capture the non-space character in a group and write it back, followed by the space and the '<'.
string pattern = "([^ ])<";
string replacement = "$1" + " <";
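As a rough sketch of how that pattern behaves when run, here it is wrapped in a hypothetical helper (SpaceBeforeAngle.Insert is an illustrative name, not from the question). Note that a '<' at the very start of the string has no preceding character and is left unchanged.

```csharp
using System.Text.RegularExpressions;

public static class SpaceBeforeAngle
{
    // Hypothetical helper: inserts a space before every '<' that follows a non-space character.
    public static string Insert(string html)
    {
        // ([^ ]) captures the single non-space character in front of '<';
        // "$1 <" writes that character back, then a space, then the '<'.
        return Regex.Replace(html, "([^ ])<", "$1 <");
    }
}
```

For instance, Insert("a<b> <c>x<d>") returns "a <b> <c>x <d>": the '<' that already has a space before it is untouched.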

Regex.Replace not replacing the whole string instead replacing chars in string

My code is as follow:
ArticleContent = Regex.Replace(_article.Article, "[QUOTE]", "<p class='quote'><span style='font-size:1.8em !important;'>" + _article.NewFields.Quotes + "</span></p>", RegexOptions.IgnoreCase);
The problem I'm facing is that the regex does not replace the whole occurrence of the string '[QUOTE]'. Instead it searches for the individual letters q, u, o, t, e and replaces each of them with the replacement string. I know the issue is caused by the square brackets, but I want those to be replaced as well. Please help.
You must escape the square brackets, because an unescaped [QUOTE] is a character class that matches any single one of those letters. Also remember that regular expressions are case-sensitive by default; the RegexOptions.IgnoreCase you already pass takes care of that. Here's my correction to your code:
ArticleContent = Regex.Replace(_article.Article, @"\[quote\]", "<p class='quote'><span style='font-size:1.8em !important;'>" + _article.NewFields.Quotes + "</span></p>", RegexOptions.IgnoreCase);
By the way, I don't see any occurrence of 'quote' enclosed in brackets, so I'm not sure I understood what you're trying to do...
Use a non-capturing group to replace all occurrences of the string QUOTE with your desired string (this leaves the surrounding brackets in place):
(?:QUOTE)
So your code should be,
ArticleContent = Regex.Replace(_article.Article, "(?:QUOTE)", "<p class='quote'><span style='font-size:1.8em !important;'>" + _article.NewFields.Quotes + "</span></p>", RegexOptions.IgnoreCase);
OR
Try escaping the square brackets if you want to replace [QUOTE] with some other string, because square brackets have a special meaning in a regex.
\[QUOTE\]
And your code should be,
ArticleContent = Regex.Replace(_article.Article, @"\[QUOTE\]", "<p class='quote'><span style='font-size:1.8em !important;'>" + _article.NewFields.Quotes + "</span></p>", RegexOptions.IgnoreCase);
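An alternative to escaping by hand, sketched here under the assumption that the placeholder is always the literal text [QUOTE], is to let Regex.Escape build the pattern. QuoteTag.Expand is a hypothetical name, and the markup is shortened for the example.

```csharp
using System.Text.RegularExpressions;

public static class QuoteTag
{
    // Hypothetical helper: replaces the literal placeholder [QUOTE] with quote markup.
    public static string Expand(string article, string quote)
    {
        // Regex.Escape turns the placeholder into a pattern that matches it literally,
        // so the opening bracket no longer starts a character class.
        string pattern = Regex.Escape("[QUOTE]");

        // Double any "$" so the quote text is inserted literally.
        string html = "<p class='quote'>" + quote.Replace("$", "$$") + "</p>";

        return Regex.Replace(article, pattern, html, RegexOptions.IgnoreCase);
    }
}
```

Expand("Intro [quote] outro", "Hi") returns "Intro <p class='quote'>Hi</p> outro", while a bare word "quote" without brackets is left as it is.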

replacing whole word with .Replace() using \b not working

I have a string in which I want to replace a whole word. This is what I have:
var TheWord = "SomeWord";
TheWord = "\b" + TheWord + "\b";
TheText = TheText.replace(TheWord, "SomeOtherWord");
I'm using "\b" because I only want to replace "SomeWord", not "SomeWordDifferent". The text looks like this: var TheHTML = '<div class="SomeWord">'; However, the replacement doesn't take place. What do I need to change?
You need to escape the backslashes. Try either of these...
TheWord = @"\b" + TheWord + @"\b";
or
TheWord = "\\b" + TheWord + "\\b";
I assume you are trying to use Regex. The method for this is
string Regex.Replace(string input, string replacement)
So I think this is what you want:
string text = ...; // text comes from somewhere
string pattern = @"\bSomeWord\b"; // escape \b (word boundary regex anchor), or use verbatim string literal, like here
var regex = new Regex(pattern);
text = regex.Replace(text, "SomeOtherWord");
Or simply use the static version of the Replace method, as Tim wrote:
text = Regex.Replace(text, pattern, "SomeOtherWord");
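To make the difference between the literal forms concrete, here is a small sketch. The names (WholeWord, ReplaceWord, LiteralsBehaveAsDescribed) are hypothetical, and Regex.Escape is added on the assumption that the word should be matched literally.

```csharp
using System.Text.RegularExpressions;

public static class WholeWord
{
    // Hypothetical helper: replaces `word` only where it stands as a whole word.
    public static string ReplaceWord(string text, string word, string newWord)
    {
        string pattern = @"\b" + Regex.Escape(word) + @"\b";
        return Regex.Replace(text, pattern, newWord.Replace("$", "$$"));
    }

    // In a regular literal "\b" is one character (backspace, U+0008);
    // "\\b" and @"\b" are both the two characters backslash + b.
    public static bool LiteralsBehaveAsDescribed()
    {
        return "\b" == "\u0008" && "\\b" == @"\b" && @"\b".Length == 2;
    }
}
```

With the question's input, ReplaceWord("<div class=\"SomeWord\"> SomeWordDifferent", "SomeWord", "SomeOtherWord") changes only the class name and leaves SomeWordDifferent untouched.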

C# Regex replace in string only outside tags

I have a string, which represents part of xml.
string text = "word foo<tag foo='a' />another word ";
and I need to replace particular words in this string. So I used this code:
Regex regex = new Regex("\\b" + co + "\\b", RegexOptions.IgnoreCase);
return regex.Replace(text, new MatchEvaluator(subZvyrazniStr));
static string subZvyrazniStr(Match m)
{
return "<FtxFraze>" + m.ToString() + "</FtxFraze>";
}
But the problem with my code is that it also replaces strings inside tags, which I don't want. What should I add to replace words only outside tags?
Example: when I set the variable co to "foo", I want to get "word <FtxFraze>foo</FtxFraze><tag foo='a' />another word"
Thanks
A simple trick like this may suffice in some cases if you are not that picky:
\bfoo\b(?![^<>]*>)
This is what you want
(?<!\<[\w\s]*?)\bfoo\b(?![\w\s]*?>)
works here
I had answered a related question here
Try this regex:
Regex r = new Regex(@"\b" + rep + @".*?(?=\<)\b", RegexOptions.IgnoreCase);
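The "simple trick" lookahead from the first answer, \bfoo\b(?![^<>]*>), can be combined with the question's evaluator roughly as follows. OutsideTags.Highlight is a hypothetical name, and the sketch shares that answer's limitation: a stray '>' in ordinary text would also suppress a match, so an HTML/XML parser is the more robust route.

```csharp
using System.Text.RegularExpressions;

public static class OutsideTags
{
    // Hypothetical helper: wraps whole-word matches of `word` that are not inside a tag.
    public static string Highlight(string text, string word)
    {
        // The negative lookahead rejects a match when the next angle bracket
        // after it is '>', i.e. when the word sits between '<' and '>'.
        string pattern = @"\b" + Regex.Escape(word) + @"\b(?![^<>]*>)";

        return Regex.Replace(
            text,
            pattern,
            m => "<FtxFraze>" + m.Value + "</FtxFraze>",
            RegexOptions.IgnoreCase);
    }
}
```

For the question's input, Highlight("word foo<tag foo='a' />another word ", "foo") returns "word <FtxFraze>foo</FtxFraze><tag foo='a' />another word ", leaving the attribute name inside the tag alone.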
