Regex.Replace to replace sub-string in a string - c#

I am trying to replace ValueBinding="metasys-value:111,813? in the following string
Canvas.Top="494" Width="75" Height="75" jcge:RubberBand.ID="ce4f76db-9efc-4b5d-b48b-b62f727d53ef" ValueBinding="meta-value:111,813?analogCommand=37&enumCommand=37" AlarmBinding="meta-item:Alarm%20-%20Present%20Value" TrendBinding="meta-item:Trend%20-%20Present%20Value" SecondaryValueBinding="meta-value:222,813?analogCommand=10&enumCommand=44" SecondaryTrendBinding="meta-item:Trend%20-%20Present%20Value" SensorType="Bulb"
with a new string by using
patch = Regex.Replace(patch, "ValueBinding=" + "\".*,813", "ValueBinding=" + "\"" + primaryObjectReference + ",813");
but it replaces everything up to the second ,813 occurrence. How can I replace only ValueBinding="metasys-value:111,813? with the new value?

Use [^\"]* or .*? instead of .* and add \b to the start of the regex.
\b matches a word boundary, e.g. the position between the space and ValueBinding.
[^\"]* will match all characters except ", while .*? will match everything non-greedily.
In your case:
patch = Regex.Replace(patch, "\\bValueBinding=" + "\".*?,813", "ValueBinding=" + "\"" + primaryObjectReference + ",813");
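To see why both changes matter, here is a minimal sketch. The class and method names (BindingPatch, ReplacePrimary) and the shortened sample string are made up for illustration; only Regex.Replace and the pattern ideas come from the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BindingPatch
{
    // Hypothetical helper: replaces the part of the primary ValueBinding
    // that precedes ",813" and leaves SecondaryValueBinding alone.
    public static string ReplacePrimary(string patch, string primaryObjectReference)
    {
        // \b stops the match from starting inside "SecondaryValueBinding";
        // [^"]*? cannot run past the closing quote of the attribute.
        return Regex.Replace(
            patch,
            "\\bValueBinding=\"[^\"]*?,813",
            "ValueBinding=\"" + primaryObjectReference + ",813");
    }

    public static void Main()
    {
        // Shortened sample input (assumption, modelled on the question's string).
        string patch = "ValueBinding=\"meta-value:111,813?analogCommand=37\" " +
                       "SecondaryValueBinding=\"meta-value:222,813?analogCommand=10\"";
        Console.WriteLine(ReplacePrimary(patch, "meta-value:999"));
    }
}
```

Without \b, the pattern would also match the tail of SecondaryValueBinding, because Regex.Replace replaces every match. Note that a $ in primaryObjectReference would be read as a substitution token in the replacement string; if that can happen, a MatchEvaluator overload avoids it.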

This makes sense, because your expression contains ".*,813", which is greedy and therefore matches up to the last ,813, covering both 111,813 and 222,813. If I have understood correctly, you should replace ".*,813" with ".*111,813\?".

Related

Detecting a word followed by a dot or whitespace using regex

I am using regex and C# to find occurrences of a particular word using
Regex regex = new Regex(@"\b" + word + @"\b");
How can I modify my Regex to only detect the word if it is either preceded with a whitespace, followed with a whitespace or followed with a dot?
Examples:
this.Button.Value - should match
this.value - should match
document.thisButton.Value - should not match
You may use lookarounds and alternation to check for the 2 possibilities when a keyword is enclosed with spaces or is just followed with a dot:
var line = "this.Button.Value\nthis.value\ndocument.thisButton.Value";
var word = "this";
var rx = new Regex(string.Format(@"(?<=\s)\b{0}\b(?=\s)|\b{0}\b(?=\.)", word));
var result = rx.Replace(line, "NEW_WORD");
Console.WriteLine(result);
See IDEONE demo and a regex demo.
The pattern matches:
(?<=\s)\bthis\b(?=\s) - whole word "this" that is preceded with whitespace (?<=\s) and that is followed with whitespace (?=\s)
| - or
\bthis\b(?=\.) - whole word "this" that is followed with a literal . ((?=\.))
Since lookarounds are not consuming characters (the regex index remains where it was) the characters matched with them are not placed in the match value, and are thus untouched during the replacement.
If I am understanding you correctly:
Regex regex = new Regex(@"\b" + (word " " || ".") + @"\b");
Regex regex = new Regex(@"((?<=( \.))" + word + @"\b)" + "|" + @"(\b" + word + @"[ .])");
However, note that this could cause trouble if word contains characters that have special meanings in Regular Expressions. I'm assuming that word contains alpha-numeric characters only.
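One way to remove that assumption is to pass word through Regex.Escape before building the pattern. The sketch below is illustrative only: the WordMatcher class and Build method are hypothetical names, and the pattern is the lookaround one from the earlier answer rather than the one in this answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordMatcher
{
    // Hypothetical helper: matches word when it is surrounded by whitespace
    // or followed by a dot, treating every character of word literally.
    public static Regex Build(string word)
    {
        string escaped = Regex.Escape(word); // "a.b" becomes "a\.b"
        return new Regex(
            @"(?<=\s)\b" + escaped + @"\b(?=\s)|\b" + escaped + @"\b(?=\.)");
    }

    public static void Main()
    {
        // Unescaped, the dot in "a.b" would also match the "x" in "axb".
        Console.WriteLine(Build("a.b").Replace("axb.Value a.b.Value", "NEW"));
        Console.WriteLine(Build("this").Replace("this.Button.Value", "NEW"));
    }
}
```

The \b anchors still assume that word starts and ends with a word character; for words such as "C++" a different boundary check is needed.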
The (?<=...) lookbehind checks what precedes the word and the (?=...) lookahead checks what follows it, both without including those characters in the match.
Regex regex = new Regex(@"(?<=\s)\b" + word + @"\b|\b" + word + @"\b(?=[\s\.])");
EDIT: Pattern updated.
EDIT 2: Online test: http://ideone.com/RXRQM5

Regex tokenize issue

I have strings input by the user and want to tokenize them. For that, I want to use regex and now have a problem with a special case.
An example string is
Test + "Hello" + "Good\"more" + "Escape\"This\"Test"
or the C# equivalent
@"Test + ""Hello"" + ""Good\""more"" + ""Escape\""This\""Test"""
I am able to match the Test and + tokens, but not the ones contained by the ". I use the " to let the user specify that this is literally a string and not a special token. Now if the user wants to use the " character in the string, I thought of allowing him to escape it with a \.
So the rule would be: Give me everything between two " ", but the character in front of the last " can not be a \.
The results I expect are: "Hello" "Good\"more" "Escape\"This\"Test"
I need the " " characters to be in the final match so I know that this is a string.
I currently have the regex @"""([\w]*)(?<!\\"")""" which gives me the following results: "Hello" "more" "Test"
So the lookbehind isn't working the way I want. Does anyone know the correct way to get the strings I expect?
Here's an adaptation of a regex I use to parse command lines:
(?!\+)((?:"(?:\\"|[^"])*"?|\S)+)
Example here at regex101
(the adaptation is the negative lookahead to ignore + and checking for \" instead of "")
Hope this helps you.
Regards.
Edit:
If you aren't interested in surrounding quotes:
(?!\+)(?:"((?:\\"|[^"])*)"?|(\S+))
To make it safer, I'd suggest getting all the substrings within unescaped pairs of "..." with the following regex:
^(?:[^"\\]*(?:\\.[^"\\]*)*("[^"\\]*(?:\\.[^"\\]*)*"))+
It matches
^ - start of string (so that we could check each " and escape sequence)
(?: - Non-capturing group 1 serving as a container for the subsequent subpatterns
[^"\\]*(?:\\.[^"\\]*)* - matches 0+ characters other than " and \ followed with 0+ sequences of \\. (any escape sequence) followed with 0+ characters other than " and \ (thus, we avoid matching the first " that is escaped, and it can be preceded with any number of escape sequences)
("[^"\\]*(?:\\.[^"\\]*)*") - Capture group 1 matching "..." substrings that may contain any escape sequences inside
)+ - end of the first non-capturing group that is repeated 1 or more times
See the regex demo and here is a C# demo:
var rx = "^(?:[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*(\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"))+";
var s = @"Test + ""Hello"" + ""Good\""more"" + \""Escape\""This\""Test\"" + ""f""";
var matches = Regex.Matches(s, rx)
    .Cast<Match>()
    .SelectMany(m => m.Groups[1].Captures.Cast<Capture>().Select(p => p.Value).ToArray())
    .ToList();
Console.WriteLine(string.Join("\n", matches));
UPDATE
If you need to remove the tokens, just match and capture all outside of them with this code:
var keep = "[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*";
var rx = string.Format("^(?:(?<keep>{0})\"{0}\")+(?<keep>{0})$", keep);
var s = @"Test + ""Hello"" + ""Good\""more"" + \""Escape\""This\""Test\"" + ""f""";
var matches = Regex.Matches(s, rx)
    .Cast<Match>()
    .SelectMany(m => m.Groups["keep"].Captures.Cast<Capture>().Select(p => p.Value).ToArray())
    .ToList();
Console.WriteLine(string.Join("", matches));
See another demo
Output: Test + + + \"Escape\"This\"Test\" + for @"Test + ""Hello"" + ""Good\""more"" + \""Escape\""This\""Test\"" + ""f""";.

Regex.Replace not replacing the whole string instead replacing chars in string

My code is as follow:
ArticleContent = Regex.Replace(_article.Article, "[QUOTE]", "<p class='quote'><span style='font-size:1.8em !important;'>" + _article.NewFields.Quotes + "</span></p>", RegexOptions.IgnoreCase);
The problem I'm facing is that the Regex is not replacing the whole occurrence of the string '[QUOTE]'. Instead it is searching for the letters q, u, o, t, e and replacing each of them with the replacement string. I know the issue is caused by the square brackets, but I want those to be replaced as well. Please help.
You must escape the square brackets! And don't forget that regular expressions are case-sensitive by default. Here's my correction to your code (note the verbatim string, so that the backslashes reach the regex engine):
ArticleContent = Regex.Replace(_article.Article, @"\[quote\]", "<p class='quote'><span style='font-size:1.8em !important;'>" + _article.NewFields.Quotes + "</span></p>", RegexOptions.IgnoreCase);
By the way, I don't see any occurrence of 'quote' enclosed in brackets, so I'm not sure I understood what you're trying to do...
Use a non-capturing group to replace all occurrences of the string QUOTE with your desired string:
(?:QUOTE)
So your code should be,
ArticleContent = Regex.Replace(_article.Article, "(?:QUOTE)", "<p class='quote'><span style='font-size:1.8em !important;'>" + _article.NewFields.Quotes + "</span></p>", RegexOptions.IgnoreCase);
OR
Try escaping the square brackets if you want to replace [QUOTE] with some other string, because square brackets have a special meaning in regex.
\[QUOTE\]
And your code should be,
ArticleContent = Regex.Replace(_article.Article, @"\[QUOTE\]", "<p class='quote'><span style='font-size:1.8em !important;'>" + _article.NewFields.Quotes + "</span></p>", RegexOptions.IgnoreCase);
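If you would rather not escape the pattern by hand, Regex.Escape can do it for you. The following is a small sketch under that assumption; QuoteTag and Expand are hypothetical names, and the article text and markup are placeholders.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuoteTag
{
    // Hypothetical helper: replaces every literal "[QUOTE]" marker
    // (case-insensitively) with the supplied HTML.
    public static string Expand(string article, string quoteHtml)
    {
        // Regex.Escape neutralises the metacharacters, so "[QUOTE]" is no
        // longer read as a character class.
        string pattern = Regex.Escape("[QUOTE]");
        // The MatchEvaluator inserts quoteHtml verbatim, so a "$" in the
        // HTML is not interpreted as a substitution token.
        return Regex.Replace(article, pattern, m => quoteHtml, RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        Console.WriteLine(Expand("Intro [quote] outro", "<p class='quote'>Q</p>"));
        // For comparison: the unescaped pattern is a character class that
        // matches each of the letters q, u, o, t, e separately.
        Console.WriteLine(Regex.Replace("quote", "[QUOTE]", "x", RegexOptions.IgnoreCase));
    }
}
```

The second call prints xxxxx, which is exactly the behaviour described in the question.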

Using Regex Replace instead of String Replace

I am not clued up on Regex as much as I should be, so this may seem like a silly question.
I am splitting a string into a string[] with .Split(' ').
The purpose is to check the words, or replace any.
The problem I'm having now is that for a word to be replaced, it has to be an exact match, but with the way I'm splitting, there might be a ( or [ attached to the split word.
So far, to counter that, I'm using something like this:
formattedText.Replace(">", "> ").Replace("<", " <").Split(' ').
This works fine for now, but I want to incorporate more special chars, such as [;\\/:*?\"<>|&'].
Is there a quicker way than the method of my replacing, such as Regex? I have a feeling my route is far from the best answer.
EDIT
This is an (example) string
would be replaced to
This is an ( example ) string
If you want to replace whole words, you can do that with a regular expression like this.
string text = "This is an example (example) noexample";
string newText = Regex.Replace(text, @"\bexample\b", "!foo!");
newText will contain "This is an !foo! (!foo!) noexample"
The key here is that the \b is the word break metacharacter. So it will match at the beginning or end of a line, and the transitions between word characters (\w) and non-word characters (\W). The biggest difference between it and using \w or \W is that those won't match at the beginning or end of lines.
I think this is what you want.
If you want to surround these symbols -> ;\/:*?"<>|&' with spaces (note the four backslashes, so that the character class contains a literal backslash as well as the slash):
string input = "(exam;\\/:*?\"<>|&'ple)";
Regex reg = new Regex("[;\\\\/:*?\"<>|&']");
string result = reg.Replace(input, delegate(Match m)
{
    return " " + m.Value + " ";
});
If you want to do the same for all characters except a-zA-Z0-9_:
string input = "(example)";
Regex reg = new Regex(@"\W");
string result = reg.Replace(input, delegate(Match m)
{
    return " " + m.Value + " ";
});
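Putting the padding and the split together, a rough sketch could look like this. Tokenizer and SplitWords are hypothetical names, and the character class [^\w\s] is my own variation on the \W used above, chosen so that existing spaces are not padded again.

```csharp
using System;
using System.Text.RegularExpressions;

public static class Tokenizer
{
    // Hypothetical helper: pads every punctuation character with spaces,
    // then splits on spaces and drops the empty entries.
    public static string[] SplitWords(string text)
    {
        // [^\w\s] matches anything that is neither a word character nor
        // whitespace; $0 re-inserts the matched character itself.
        string padded = Regex.Replace(text, @"[^\w\s]", " $0 ");
        return padded.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        string[] words = SplitWords("This is an (example) string");
        Console.WriteLine(string.Join("|", words));
    }
}
```

Each bracket ends up as its own array element, so an exact comparison against "example" works without chaining Replace calls.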

Custom word boundaries in regular expression

I am trying to match words using a regular expression, but sadly the word boundary character (\b) does not include enough characters for my taste, so I want to add more. (in that precise case, the "+" character)
Here is what I used to have (it is C# but not very relevant) :
string expression = Regex.Escape(word);
Regex regExp = new Regex(@"\b" + expression + @"\b", RegexOptions.IgnoreCase);
This particular regex did not match "C++" and I thought it was a real bummer. So I tried using the \w character in a character class that way, along with the + character :
string expression = Regex.Escape(word);
Regex regExp = new Regex(@"(?![\w\+])" + expression + @"(?![\w\+])", RegexOptions.IgnoreCase);
But now, nothing gets matched... is there something I am missing?
(no need to escape the + in a character class)
The problem is that you use a negative lookahead at the start of the pattern, whereas you should use a negative lookbehind there. Try:
@"(?<![\w+])" + expression + @"(?![\w+])"
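As a quick check of the lookbehind/lookahead pair, here is a small sketch. CustomBoundary and Build are hypothetical names; the pattern is the one suggested above combined with the Regex.Escape call from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CustomBoundary
{
    // Hypothetical helper: matches word only when it is neither preceded
    // nor followed by a word character or a '+'.
    public static Regex Build(string word)
    {
        string expression = Regex.Escape(word);
        return new Regex(
            @"(?<![\w+])" + expression + @"(?![\w+])",
            RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        Console.WriteLine(Build("C++").IsMatch("I like C++ a lot")); // True
        Console.WriteLine(Build("C").IsMatch("I like C++ a lot"));   // False: C is followed by '+'
        Console.WriteLine(Build("C++").IsMatch("c++11 is old"));     // False: followed by a digit
    }
}
```

The plain \b version fails on "C++" because there is no word boundary between the final + and the following space; both are non-word characters.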
