Replace text in string with delimiters using Regex - C#

I have a string like this:
string str = "(50%silicon +20%!(20%Gold + 80%Silver)| + 30%Alumnium)";
I need a regular expression that replaces everything between ! and | (delimiters included) with an empty string. The result should be (50%silicon +20% + 30%Alumnium).
If the string contains nested delimiters, for example:
string str = "(50%silicon +20%!(80%Gold + 80%Silver + 20%!(20%Iron + 80%Silver)|)| + 30%Alumnium)";
the result should still be (50%silicon +20% + 30%Alumnium): the whole outermost !...| span is removed, nested delimiters included.
I've tried the following regex, but it doesn't handle the nesting:
Regex.Replace(str, @"!.+?\|", "", RegexOptions.IgnoreCase);

You are using the lazy quantifier +? which will look for the smallest possible substring that matches your regex. To get the result you are looking for, you want to use the greedy quantifier + which will match the largest substring possible.
The following regex (not tested in C# because I don't have it available, but this should work for any standard regex implementation) will do what you want:
'!.+\|'

using System.Text.RegularExpressions;
str = Regex.Replace(str, @"!.+?\|", "", RegexOptions.IgnoreCase);

Regex.Replace(str, @"!.+?\||\)\|", "", RegexOptions.IgnoreCase);
This works for both provided strings. I extended the regex with a second alternative that matches ")|" to remove the leftover characters.
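For arbitrary nesting depth, .NET's balancing groups are another option. The following is a minimal sketch, not taken from the answers above: the variable names and the third test string are made up for illustration, and it assumes every ! has a matching |.

```csharp
using System;
using System.Text.RegularExpressions;

// Balancing-group pattern (a .NET-specific regex feature):
//   !                 opening delimiter of the outermost pair
//   [^!|]+            a run of non-delimiter characters
//   (?<open>!)        a nested '!' is pushed onto the "open" stack
//   (?<-open>\|)      a nested '|' pops the stack; fails when the stack is empty
//   (?(open)(?!))     fail if a nested '!' is still unclosed
//   \|                closing delimiter of the outermost pair
string nestedPattern = @"!(?>[^!|]+|(?<open>!)|(?<-open>\|))*(?(open)(?!))\|";

string simple = "(50%silicon +20%!(20%Gold + 80%Silver)| + 30%Alumnium)";
string nested = "(50%silicon +20%!(80%Gold + 80%Silver + 20%!(20%Iron + 80%Silver)|)| + 30%Alumnium)";
string twoPairs = "a!x|b!y|c"; // hypothetical input with two separate pairs

string simpleResult = Regex.Replace(simple, nestedPattern, "");
string nestedResult = Regex.Replace(nested, nestedPattern, "");
string twoPairsResult = Regex.Replace(twoPairs, nestedPattern, "");
string greedyResult = Regex.Replace(twoPairs, @"!.+\|", "");

Console.WriteLine(simpleResult);   // (50%silicon +20% + 30%Alumnium)
Console.WriteLine(nestedResult);   // (50%silicon +20% + 30%Alumnium)
Console.WriteLine(twoPairsResult); // abc
Console.WriteLine(greedyResult);   // ac (the greedy pattern also swallows the "b" between the pairs)
```

Unlike the greedy !.+\| pattern, this keeps text that sits between two separate delimited spans.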

Related

Find and replace the string in paragraph

I want to remove every token that is enclosed in hyphens (the hyphens and the text between them), replacing it with an empty string. For example:
string templateContent = "Template content -macro- -UnitDetails- -testEmail- sending Successfully";
Output
templateContent = "Template content sending Successfully";
templateContent = Regex.Replace(templateContent, @"-\w*-\s?", string.Empty).TrimEnd(' ');
@"-\w*-\s?" - is the regex pattern for '-Word- '
- - pattern for a literal -
\w - word character
* - zero or more occurrences of \w
\s - pattern for a whitespace character
? - marks \s as optional
TrimEnd(' ') - removes the trailing space left behind when a token was at the end of the string
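As a quick check of the pattern described above, here is a small runnable sketch. The second input string is a made-up example showing why TrimEnd is needed.

```csharp
using System;
using System.Text.RegularExpressions;

string templateContent = "Template content -macro- -UnitDetails- -testEmail- sending Successfully";

// Remove each -Word- token plus one following whitespace character, if any.
string cleaned = Regex.Replace(templateContent, @"-\w*-\s?", string.Empty).TrimEnd(' ');
Console.WriteLine(cleaned); // Template content sending Successfully

// Hypothetical input with the token at the very end: the space before it
// survives the replacement, so TrimEnd removes it.
string trailing = Regex.Replace("Send to -user-", @"-\w*-\s?", string.Empty).TrimEnd(' ');
Console.WriteLine(trailing); // Send to
```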
There are many ways to do this, however given your example the following should work
var split = templateContent
.Split(' ')
.Where(x => !x.StartsWith("-") && !x.EndsWith("-"));
var result = string.Join(" ",split);
Console.WriteLine(result);
Output
Template content sending Successfully
Note : I personally think regex is better suited to this
You can use regex for this
string regExp = "(-[a-zA-Z]*-)";
string tmp = Regex.Replace(templateContent , regExp, "");
string finalStr = Regex.Replace(tmp, " {2,}", " ");
var resultWithSpaces = Regex.Replace(templateContent, @"-\S+-", string.Empty);
This regular expression looks for two hyphens surrounding one or more characters that are not white space.
It will leave the spaces that were around the removed word. To get rid of those you can do another Regex to replace multiple spaces with a single space.
var result = Regex.Replace(resultWithSpaces, @"\s+", " ");

Use Regex in Dynamical Approach

Today the word is hard-coded in the pattern @"\bfood\b", but I would like to build the pattern dynamically from a string variable instead.
In the future I would like to search for the word "chicken" instead of "food".
I tried replacing @"\bfood\b" with @"\b" + pattern + "\b", but it doesn't work.
string inputText = "food ddd dd";
string dddd = "\bfood\b";
string pattern = "food";
Regex rx = new Regex(@"\bfood\b", RegexOptions.None);
MatchCollection mc = rx.Matches(inputText);
if (rx.Match(pattern).Success)
{
int dd = 3;
}
You should use
@"\b" + Regex.Escape(pattern) + @"\b"
Or a more generic:
@"(?<!\w)" + Regex.Escape(pattern) + @"(?!\w)"
Or with string.Format:
Regex rx = new Regex(string.Format(@"\b{0}\b", Regex.Escape(pattern)), RegexOptions.None);
Or with string interpolation:
Regex rx = new Regex($@"(?<!\w){Regex.Escape(pattern)}(?!\w)", RegexOptions.None);
Now, why do I suggest (?<!\w) and (?!\w) lookarounds? Because these are word boundaries that are not context dependent. What if you decide to pass a |border| pattern? The \b\|border\|\b will most probably fail to match most of the cases you intended to match because \b will require a word character to appear before the first | and after the last |. The lookarounds will match the |border| string only if not enclosed with word characters.
Your @"\b" + pattern + "\b" didn't work because the verbatim prefix @ was applied to only one of the two literals. In a regular (non-verbatim) string, "\b" is the backspace character, not the regex word-boundary anchor.
Fix this with either
@"\b" + pattern + @"\b"
Or, even better, use String.Format()
String.Format(@"\b{0}\b", pattern);
Or use C# 6 string interpolation
$@"\b{pattern}\b";
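The points made in the answers above can be checked with a short sketch. BuildWholeWordRegex is a hypothetical helper name, and the input strings are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Without the @ prefix, "\b" is the single backspace character (U+0008);
// with it, the string holds the two characters '\' and 'b'.
Console.WriteLine("\b".Length);  // 1
Console.WriteLine(@"\b".Length); // 2

// Hypothetical helper: match 'word' only when it is not adjacent to word characters.
Regex BuildWholeWordRegex(string word) =>
    new Regex(@"(?<!\w)" + Regex.Escape(word) + @"(?!\w)");

bool chickenFound = BuildWholeWordRegex("chicken").IsMatch("chicken ddd dd"); // true
bool partialFound = BuildWholeWordRegex("chick").IsMatch("chicken ddd dd");   // false
bool borderFound = BuildWholeWordRegex("|border|").IsMatch("a |border| b");   // true

// The \b version cannot match here: neither '|' has a word character next to it
// on the outside, so there is no word boundary at those positions.
bool borderWithB = Regex.IsMatch("a |border| b", @"\b" + Regex.Escape("|border|") + @"\b"); // false

Console.WriteLine($"{chickenFound} {partialFound} {borderFound} {borderWithB}");
```

Regex.Escape matters as soon as the word comes from user input, since characters such as | or + would otherwise be interpreted as regex syntax.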

C# RegEx to find a specific string or all words in a string

Looking it up, I thought I understood how to look up a string of multiple words in a sentence, but it does not find a match. Can someone tell me what I am doing wrong? I need to be able to find a single or multiple word match. I passed in "to find" to the method and it did not find the match. Also, if the user does not enclose their search phrase in quotes, I also need it to search on each word entered.
var pattern = @"\b\" + searchString + @"\b"; //searchString is passed in.
Regex rgx = new Regex(pattern);
var sentence = "I need to find a string in this sentence!";
Match match = rgx.Match(sentence);
if (match.Success)
{
// Do something with the match.
}
Just remove the second \ in the first @"\b\". As written, that trailing backslash escapes the first character of searchString:
var pattern = @"\b" + searchString + @"\b";
Note that if your searchString may contain regex metacharacters (like (, ), [, +, *, etc.), you can use Regex.Escape() to escape them:
var pattern = @"\b" + Regex.Escape(searchString) + @"\b";
And if those characters may appear in edge positions, use lookarounds rather than word boundaries:
var pattern = @"(?<!\w)" + Regex.Escape(searchString) + @"(?!\w)";
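The question also asks for a per-word search when the phrase is not quoted. One possible sketch follows; BuildSearchRegex is a hypothetical helper, and the quoting convention (a leading and trailing double quote) and the case-insensitive matching are assumptions, not something the question specifies. It also assumes the input contains at least one search term.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: a quoted input is searched as one phrase;
// an unquoted input is split on spaces and any single word may match.
Regex BuildSearchRegex(string searchString)
{
    string trimmed = searchString.Trim();
    bool quoted = trimmed.Length >= 2
        && trimmed[0] == '"'
        && trimmed[trimmed.Length - 1] == '"';

    string[] terms = quoted
        ? new[] { trimmed.Substring(1, trimmed.Length - 2) }
        : trimmed.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

    // Escape each term so regex metacharacters are matched literally.
    string alternatives = string.Join("|", terms.Select(t => Regex.Escape(t)));
    return new Regex(@"(?<!\w)(?:" + alternatives + @")(?!\w)", RegexOptions.IgnoreCase);
}

string sentence = "I need to find a string in this sentence!";

bool phraseFound = BuildSearchRegex("\"to find\"").IsMatch(sentence);   // true
bool phraseMissing = BuildSearchRegex("\"find to\"").IsMatch(sentence); // false
int wordHits = BuildSearchRegex("lost string").Matches(sentence).Count; // 1 ("string")

Console.WriteLine($"{phraseFound} {phraseMissing} {wordHits}");
```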

Regex to remove specific string if exist

I wanna remove the -L from the end of my string if exists
So
ABCD => ABCD
ABCD-L => ABCD
At the moment I'm using something like the line below, which uses the if/else (conditional) construct in my regex. However, I have a feeling that it should be much easier than this.
var match = Regex.Match("...", @"(?(\S+-L$)\S+(?=-L)|\S+)");
How about just doing:
Regex rgx = new Regex("-L$");
string result = rgx.Replace("ABCD-L", "");
So basically: if the string ends with -L, replace that part with an empty string.
If you want to not only invoke the replacement at the end of the string, but also at the end of a word, you can add an additional switch to detect word boundaries (\b) in addition to the end of the string:
Regex rgx = new Regex(@"-L(\b|$)");
string result = rgx.Replace("ABCD-L ABCD ABCD-L", "");
Note that detecting word boundaries can be a little ambiguous: the result depends on which characters .NET treats as word characters (\w).
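A short sketch of the difference between the two patterns. Note the verbatim @ prefix: without it, "\b" in a regular C# string is a backspace character. The variable names and the "ABCD-Large" input are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "ABCD-L ABCD ABCD-L";

// End of the whole string only: the first "-L" is left alone.
string endOnly = Regex.Replace(input, @"-L$", "");
Console.WriteLine(endOnly);   // ABCD-L ABCD ABCD

// End of every word (or end of the string).
string everyWord = Regex.Replace(input, @"-L(\b|$)", "");
Console.WriteLine(everyWord); // ABCD ABCD ABCD

// No word boundary after the L inside a longer word, so nothing is removed.
string longer = Regex.Replace("ABCD-Large", @"-L(\b|$)", "");
Console.WriteLine(longer);    // ABCD-Large
```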
You also can use String.Replace() method to find a specific string inside a string and replace it with another string in this case with an empty string.
http://msdn.microsoft.com/en-us/library/fk49wtc1(v=vs.110).aspx
Use the Regex.Replace function:
Regex.Replace(input, @"(\S+?)-L(?=\s|$)", "$1")
Explanation:
(          group and capture to \1:
  \S+?     non-whitespace (all but \n, \r, \t, \f, and " ") (1 or more times)
)          end of \1
-L         '-L'
(?=        look ahead to see if there is:
  \s       whitespace (\n, \r, \t, \f, and " ")
  |        OR
  $        before an optional \n, and the end of the string
)          end of look-ahead
You certainly can use Regex for this, but why when using normal string functions is clearer?
Compare this:
text = text.EndsWith("-L")
? text.Substring(0, text.Length - "-L".Length)
: text;
to this:
text = Regex.Replace(text, @"(\S+?)-L(?=\s|$)", "$1");
Or better yet, define an extension method like this:
public static string RemoveIfEndsWith(this string text, string suffix)
{
return text.EndsWith(suffix)
? text.Substring(0, text.Length - suffix.Length)
: text;
}
Then your code can look like this:
text = text.RemoveIfEndsWith("-L");
Of course you can always define the extension method using the Regex. At least then your calling code looks a lot cleaner and is far more readable and maintainable.

Using Regex Replace instead of String Replace

I am not clued up on Regex as much as I should be, so this may seem like a silly question.
I am splitting a string into a string[] with .Split(' ').
The purpose is to check the words, or replace any.
The problem I'm having now is that for a word to be replaced, it has to be an exact match, but with the way I'm splitting it, there might be a ( or [ attached to the split word.
So far, to counter that, I'm using something like this:
formattedText.Replace(">", "> ").Replace("<", " <").Split(' ').
This works fine for now, but I want to incorporate more special chars, such as [;\\/:*?\"<>|&'].
Is there a quicker way than the method of my replacing, such as Regex? I have a feeling my route is far from the best answer.
EDIT
This is an (example) string
would be replaced to
This is an ( example ) string
If you want to replace whole words, you can do that with a regular expression like this.
string text = "This is an example (example) noexample";
string newText = Regex.Replace(text, @"\bexample\b", "!foo!");
newText will contain "This is an !foo! (!foo!) noexample"
The key here is that the \b is the word break metacharacter. So it will match at the beginning or end of a line, and the transitions between word characters (\w) and non-word characters (\W). The biggest difference between it and using \w or \W is that those won't match at the beginning or end of lines.
I think this is what you want.
If you want to pad these symbols: ;\/:*?"<>|&'
string input = "(exam;\\/:*?\"<>|&'ple)";
// "\\\\" in a regular string literal is \\ in the pattern, i.e. one literal backslash
Regex reg = new Regex("[;\\\\/:*?\"<>|&']");
string result = reg.Replace(input, delegate(Match m)
{
return " " + m.Value + " ";
});
If you want to pad every character that is not a word character (\w, i.e. letters, digits and _):
string input = "(example)";
Regex reg = new Regex(@"\W");
string result = reg.Replace(input, delegate(Match m)
{
return " " + m.Value + " ";
});
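Putting the padding and the split from the question together, a minimal sketch could look like this. It assumes that every character which is neither a word character nor whitespace should become its own token; the variable names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

string formattedText = "This is an (example) string";

// Surround every character that is neither a word character nor whitespace
// with spaces; $0 in the replacement stands for the whole match.
string padded = Regex.Replace(formattedText, @"[^\w\s]", " $0 ");

// Split on spaces and drop the empty entries created by the extra padding.
string[] words = padded.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

Console.WriteLine(string.Join("|", words)); // This|is|an|(|example|)|string
```

Using [^\w\s] instead of \W avoids padding the spaces themselves, which keeps the intermediate string shorter.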
