Replacement evaluator contains number after replacement group - c#

I want to dynamically adjust my replacement pattern and evaluator:
string pattern = "np";
string replacement = "ab";
string retval = Regex.Replace("Input", @"(.*)" + pattern + @"(.*)", @"$1" + replacement + @"$2");
// retval = "Iabut" => correct
replacement = "12";
retval = Regex.Replace("Input", @"(.*)" + pattern + @"(.*)", @"$1" + replacement + @"$2");
// retval = "$112ut" => wrong
The problem is that in the second case my evaluator is "$112$2" so my first replacement group would be $112.
Is it possible to avoid such problems directly or do I need to put a delimiting character between my group definition and my string?

As a replacement argument, use
"${1}" + replacement.Replace("$", "$$") + "$2"
The braces in ${1} will make sure the first group is referred to and .Replace("$", "$$") will make it work well if the replacement has $ inside.
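A minimal sketch of that advice wrapped in a helper. The class and method names (SafeSubstitution, ReplaceLiteral) are made up for illustration, and escaping the search text with Regex.Escape is an added assumption that the search text is meant literally:

```csharp
using System;
using System.Text.RegularExpressions;

public static class SafeSubstitution
{
    // Hypothetical helper (not from the original post): replaces the literal
    // text `find` inside `input` with `replacement`, keeping the text around it.
    public static string ReplaceLiteral(string input, string find, string replacement)
    {
        // Regex.Escape keeps metacharacters in `find` from being read as regex syntax.
        string pattern = "(.*)" + Regex.Escape(find) + "(.*)";

        // ${1} and ${2} delimit the group numbers explicitly, so digits in
        // `replacement` cannot merge with them; "$$" emits a literal "$".
        string substitution = "${1}" + replacement.Replace("$", "$$") + "${2}";

        return Regex.Replace(input, pattern, substitution);
    }
}
```

With this, SafeSubstitution.ReplaceLiteral("Input", "np", "12") should give "I12ut" rather than "$112ut".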

Related

Replace specific instance of item in string with regex

I am trying to build a parser that will replace a specific instance of a string. For example, I have the string sn+sn+sn*9 and I want to replace only the third instance of sn. How can I do this with regex?
I have tried
var expression = "sn+sn+sn*9";
var regex = new Regex("sn");
expression = regex.Replace("sn",4.ToString());
//expression = regex.Replace("sn",4.ToString(),1,2);
Thanks in advance
int x = 0;
string repl = "ANYTHING";
string s = Regex.Replace("sn+sn+sn*9", "sn", m => (++x) == 3 ? repl : m.Value);
Explanation
The x variable tracks how many occurrences of the sought text have been seen so far. When Regex finds the third occurrence, the MatchEvaluator delegate replaces it with whatever is in the repl variable. Otherwise, it returns the matched string unchanged.
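The same counter idea can be packaged so the occurrence number is a parameter. This is only a sketch; the names NthOccurrence and ReplaceNth are hypothetical:

```csharp
using System;
using System.Text.RegularExpressions;

public static class NthOccurrence
{
    // Hypothetical helper: replaces only the n-th (1-based) match of `pattern`.
    public static string ReplaceNth(string input, string pattern, int n, string replacement)
    {
        int seen = 0; // number of matches visited so far

        // The MatchEvaluator runs once per match, in order from left to right.
        // Every match except the n-th is returned unchanged.
        return Regex.Replace(input, pattern, m => ++seen == n ? replacement : m.Value);
    }
}
```

For example, NthOccurrence.ReplaceNth("sn+sn+sn*9", "sn", 3, "4") should return "sn+sn+4*9". If there are fewer than n matches, the input comes back unchanged.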
Here is one option, which uses the following regex pattern:
(?<=.*?\bsn\b.*?\bsn\b.*?)\bsn\b
This pattern matches sn, as a whole term, only when at least two sn terms appear earlier in the string. Note that it therefore matches every occurrence from the third onward, which for this input is just the third one. I use the replacement blah in the sample code below, though you may use any value you wish.
var term = "sn";
var replacement = "blah";
var expression = term + "+" + term + "+" + term + "*9";
var pattern = @"(?<=.*?\b" + term + @"\b.*?\b" + term + @"\b.*?)\b" + term + @"\b";
var exp_trim = Regex.Replace(expression, pattern, replacement);
Console.WriteLine(exp_trim);
sn+sn+blah*9
Here is another method using the index and length of the match
string expression = "sn+sn+sn*9";
Regex regex = new Regex("sn");
MatchCollection matches = regex.Matches(expression);
// Requires at least three matches; matches[2] is the third occurrence.
expression = expression.Substring(0, matches[2].Index) + "4" + expression.Substring(matches[2].Index + matches[2].Length);

Regex to insert space C#

I have a string. I need a regex that replaces each occurrence of a non-space character followed by '<' with the same character + space + '<'.
In other words, if there is a '<' without a ' ' before it, a space must be added.
I've tried something like:
string pattern = "[^ ]<";
string replacement = "$0" + "<";
string result = Regex.Replace(html, pattern, replacement);
Obviously not working as I want.
string pattern = "([^ ])<";
string replacement = "$1" + " <";
You can try something like this.
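A small sketch of that suggestion in use. The class and method names (SpaceBeforeAngle, InsertSpaces) are hypothetical:

```csharp
using System;
using System.Text.RegularExpressions;

public static class SpaceBeforeAngle
{
    // Hypothetical helper: inserts a space before each '<' that directly
    // follows a non-space character.
    public static string InsertSpaces(string html)
    {
        // Group 1 captures the single non-space character before '<';
        // "$1 <" writes that character back, followed by a space and '<'.
        return Regex.Replace(html, "([^ ])<", "$1 <");
    }
}
```

Because the pattern needs a character before the '<', a '<' at the very start of the string is left alone, and one that already has a space before it is not changed.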

C# Regex wildcard multiple replace

Doing a search for different strings using wildcards, such as doing a search for test0? (there is a space after the ?). The strings the search produces are:
test01
test02
test03
(and so on)
The replacement text should be for example:
test0? -
The wildcard above in test0? - represents the 1, 2, or 3...
So, the replacement strings should be:
test01 -
test02 -
test03 -
string pattern = WildcardToRegex(originalText);
fileName = Regex.Replace(originalText, pattern, replacementText);
public string WildcardToRegex(string pattern)
{
return "^" + System.Text.RegularExpressions.Regex.Escape(pattern).
Replace("\\*", ".*").Replace("\\?", ".") + "$";
}
The problem is saving the new string with the original character(s) plus the added characters. I could search the string and save the original with some string manipulation, but that seems like too much overhead. There has to be an easier way.
Thanks for any input.
EDIT:
Search for strings using the wildcard ?
Possible string are:
test01 someText
test02 someotherText
test03 moreText
Using Regex, the search string pattern will be:
test0? -
So, each string should then read:
test01 - someText
test02 - someotherText
test03 - moreText
How to keep the character that was replaced by the regex wildcard '?'
As my code stands, it will come out as test? - someText
That is wrong.
Thanks.
EDIT Num 2
First, thanks everyone for their answers and direction.
They helped and led me onto the right track, and now I can ask the exact question more clearly:
It has to do with substitution.
Inserting text after the Regex.
The sample string I gave, they may not always be in that format. I have been looking into substitution but just can't seem to get the syntax right. And I am using VS 2008.
Any more suggestions?
Thanks
If you want to replace "test0? " with "test0? -", you would write:
string bar = Regex.Replace(foo, "^test0. ", "$0- ");
The key here is the $0 substitution, which will include the matched text.
So if I understand your question correctly, you just want your replacementText to be "$0- ".
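As a rough sketch, the $0 substitution can be combined with the question's wildcard-to-regex conversion. The names below (WildcardInsert, InsertAfterWildcard) are hypothetical, and unlike the question's WildcardToRegex this version anchors only at the start, on the assumption that more text follows the matched prefix:

```csharp
using System;
using System.Text.RegularExpressions;

public static class WildcardInsert
{
    // Hypothetical helper: finds `wildcard` (with ? and * as file-style
    // wildcards) at the start of `input` and appends `suffix` right after
    // the matched text, keeping whatever the wildcards matched.
    public static string InsertAfterWildcard(string input, string wildcard, string suffix)
    {
        // Escape everything, then turn the escaped wildcards back into regex.
        string pattern = "^" + Regex.Escape(wildcard)
            .Replace("\\*", ".*")
            .Replace("\\?", ".");

        // $0 stands for the whole match; "$$" keeps any '$' in suffix literal.
        return Regex.Replace(input, pattern, "$0" + suffix.Replace("$", "$$"));
    }
}
```

For instance, WildcardInsert.InsertAfterWildcard("test01 someText", "test0? ", "- ") should give "test01 - someText".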
If I understand the question correctly, couldn't you just use a match?
// Convert the pattern to a regex (I'm assuming this can be done with your "originalText")
Regex regex = new Regex(pattern);
// Replace each match with its original value + " -"
fileName = regex.Replace(fileName, m => m.Value + " -");
So I'm not 100% clear on what you're doing, but I'll give it a try.
I'm going with the assumption that you want to use "file wildcards" (?/*) to search for a set of values that match (while retaining the values matched by the placeholders), then replace each with the new value (re-inserting those placeholder values). Given that, and with probably a lot of overkill (since your requirement is kind of weird), here's what I came up with:
// Helper function to turn the file search pattern into a
// regex pattern.
private Regex BuildRegexFromPattern(String input)
{
String pattern = String.Concat(input.ToCharArray().Select(i => {
String c = i.ToString();
return c == "?" ? "(.)"
: c == "*" ? "(.*)"
: c == " " ? "\\s"
: Regex.Escape(c);
}));
return new Regex(pattern);
}
// perform the actual replacement
private IEnumerable<String> ReplaceUsingPattern(IEnumerable<String> items, String searchPattern, String replacementPattern)
{
Regex searchRe = BuildRegexFromPattern(searchPattern);
return items.Where(s => searchRe.IsMatch(s)).Select (s => {
Match match = searchRe.Match(s);
Int32 m = 1;
return String.Concat(replacementPattern.ToCharArray().Select(i => {
String c = i.ToString();
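if (c != "?" && c != "*")
{
return c; // literal character: copy it through unchanged
}
// Groups[0] is the whole match, so valid placeholder indexes are 1..Groups.Count - 1.
if (m >= match.Groups.Count)
{
throw new InvalidOperationException("Replacement placeholders exceed locator placeholders.");
}
return match.Groups[m++].Value;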
}));
});
}
Then, in practice:
String[] samples = new String[]{
"foo01", "foo02 ", "foo 03",
"bar0?", "bar0? ", "bar03 -",
"test01 ", "test02 ", "test03 "
};
String searchTemplate = "test0? ";
String replaceTemplate = "test0? -";
var results = ReplaceUsingPattern(samples, searchTemplate, replaceTemplate);
Which, from the samples list above, gives me:
matched: & modified to:
test01 test01 -
test02 test02 -
test03 test03 -
However, if you really want to save headaches, you should be using replacement references. There's no need to re-invent the wheel. The above, with replacements, could have been changed to:
Regex searchRe = new Regex(@"test0(.*)\s");
samples.Select(x => searchRe.Replace(x, "test0$1 -"));
You can capture any piece of your matched string and place it anywhere in the replacement, using the symbol $ followed by the index of the captured element (numbering starts at 1).
You capture an element with parentheses "()".
Example:
If I have several strings with testXYZ, being XYZ a 3-digit number, and I need to replace it, say, with testZYX, inverting the 3 digits, I would do:
string result = Regex.Replace(source, "test([0-9])([0-9])([0-9])", "test$3$2$1");
So, in your case, it can be done:
string result = Regex.Replace(source, "test0([0-9]) ", "test0$1 - ");

Can I use variables in pattern in Regex (C#)

I have some HTML text in which I need to replace words with links to them. For example, I have text containing the word "PHP" and want to replace it with a link whose text is PHP. There are many words that I need to replace.
My code:
public struct GlossaryReplace
{
public string word; // here the words, e.g. PHP
public string link; // here the links to replace, e.g. glossary.html#php
}
public static GlossaryReplace[] Replaces = null;
IHTMLDocument2 html_doc = webBrowser1.Document.DomDocument as IHTMLDocument2;
string html_content = html_doc.body.outerHTML;
for (int i = 0; i < Replaces.Length; i++)
{
String substitution = "<a class=\"glossary\" href=\"" + Replaces[i].link + "\">" + Replaces[i].word + "</a>";
html_content = Regex.Replace(html_content, @"\b" + Replaces[i].word + "\b", substitution);
}
html_doc.body.innerHTML = html_content;
The trouble is that this does not work :( But
html_content = Regex.Replace(html_content, @"\bPHP\b", "some replacement");
works well! I can't understand my error!
The @ prefix for strings only applies to the immediately following string literal, so when you concatenate strings you may have to use it on each one.
Change this:
html_content = Regex.Replace(html_content, @"\b" + Replaces[i].word + "\b", substitution);
to:
html_content = Regex.Replace(html_content, @"\b" + Replaces[i].word + @"\b", substitution);
In a regular expression \b means a word boundary, but in a string it means a backspace character (ASCII 8). You get a compiler error if you use an escape code that doesn't exist in a string (e.g. \s), but not in this case, as the code exists both in strings and regular expressions.
On a side note: a method that is useful when creating regular expression patterns dynamically is the Regex.Escape method. It escapes characters in a string to be used in a pattern, so @"\b" + Regex.Escape(Replaces[i].word) + @"\b" would make the pattern work even if the word contains characters that have a special meaning in a regular expression.
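A short sketch that puts both points together for a single glossary entry. The names GlossaryLinker and LinkWord are hypothetical, and the anchor markup simply mirrors the question's substitution string:

```csharp
using System;
using System.Text.RegularExpressions;

public static class GlossaryLinker
{
    // Hypothetical helper: wraps each whole-word occurrence of `word`
    // in a glossary link pointing at `link`.
    public static string LinkWord(string html, string word, string link)
    {
        // Both boundary literals are verbatim strings, so \b reaches the
        // regex engine as a word boundary instead of a backspace character.
        string pattern = @"\b" + Regex.Escape(word) + @"\b";

        string anchor = "<a class=\"glossary\" href=\"" + link + "\">" + word + "</a>";

        // "$$" keeps any '$' in the generated markup from being read
        // as a substitution reference.
        return Regex.Replace(html, pattern, anchor.Replace("$", "$$"));
    }
}
```

Called as GlossaryLinker.LinkWord("PHP and PHPUnit", "PHP", "glossary.html#php"), this should link only the standalone PHP and leave PHPUnit alone. Note that \b only matches next to a word character, so a term that begins or ends with punctuation would need different boundary handling.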
You forgot a @ here:
@"\b" + Replaces[i].word + "\b"
Should be:
@"\b" + Replaces[i].word + @"\b"
I'd also recommend that you use an HTML parser if you are modifying HTML. HTML Agility Pack is a useful library for this purpose.

Replace char in a string

How can I change
XXX#YYY.ZZZ into XXX_YYY_ZZZ?
One way I know is to use the string.Replace(char, char) method,
but I want to replace both "#" and ".", and that method replaces just one char.
One more case: what if I have XX.X#YYY.ZZZ?
I still want the output to look like XX.X_YYY_ZZZ.
Is this possible? Any suggestions? Thanks.
So, if I'm understanding correctly, you want to replace # with _, and . with _, but only if . comes after #? If there is a guaranteed # (assuming you're dealing with e-mail addresses?):
string e = "XX.X#YYY.ZZZ";
e = e.Substring(0, e.IndexOf('#')) + "_" + e.Substring(e.IndexOf('#')+1).Replace('.', '_');
Here's a complete regex solution that covers both your cases. The key to your second case is to match dots after the # symbol by using a positive look-behind.
string[] inputs = { "XXX#YYY.ZZZ", "XX.X#YYY.ZZZ" };
string pattern = @"#|(?<=#.*?)\.";
foreach (var input in inputs)
{
string result = Regex.Replace(input, pattern, "_");
Console.WriteLine("Original: " + input);
Console.WriteLine("Modified: " + result);
Console.WriteLine();
}
That said, this is simple enough to accomplish with a couple of string Replace calls. Efficiency is something you will need to test, depending on the text size and the number of replacements the code will make.
You can use the Regex.Replace method:
http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.replace(v=VS.90).aspx
You can use the following extension method to do your replacement without creating too many temporary strings (as occurs with Substring and Replace) or incurring regex overhead. It skips to the # symbol, and then iterates through the remaining characters to perform the replacement.
public static string CustomReplace(this string s)
{
var sb = new StringBuilder(s);
for (int i = Math.Max(0, s.IndexOf('#')); i < sb.Length; i++)
if (sb[i] == '#' || sb[i] == '.')
sb[i] = '_';
return sb.ToString();
}
You can chain Replace calls (note that this also replaces dots before the #, giving XX_X_YYY_ZZZ):
var newstring = "XX.X#YYY.ZZZ".Replace("#","_").Replace(".","_");
Create an array with characters you want to have replaced, loop through array and do the replace based off the index.
Assuming the data format is like XX.X#YYY.ZZZ, here is another alternative with String.Split(char separator):
string[] tmp = "XX.X#YYY.ZZZ".Split('#');
string newstr = tmp[0] + "_" + tmp[1].Replace(".", "_");
