Replace whole word with a symbol using C# Regex - c#

So I am trying to replace a word like #theplace or #theplaces using a Regex pattern like:
String Pattern = string.Format(@"\b{0}\b", PlaceName);
But when I do the replacement, it does not find the pattern. I am guessing the # symbol is the problem.
Can someone show me what I need to do to the Regex pattern to get it to work?

The following code will replace any instances of #thepalace or #thepalaces with <replacement>.
var result = Regex.Replace(
    "some text with #thepalace or #thepalaces in it."
    + "\r\nHowever, #thepalacefoo and bar#thepalace won't be replaced.", // input
    @"\B#thepalaces?\b", // pattern
    "<replacement>"); // replacement text
The ? makes the preceding character, s, optional. I'm using the static Regex.Replace method.
The \b matches boundaries between word and non-word characters. \B matches every boundary that \b does not. See regex boundaries.
Result
some text with <replacement> or <replacement> in it.
However, #thepalacefoo and bar#thepalace won't be replaced.

Your problem* is the \b (word boundary) before the #. There is no word boundary between a space and an #.
You could just remove it, or replace it with a non-boundary, which is a capital B.
string Pattern = string.Format(@"\B{0}\b", PlaceName);
* assuming that PlaceName begins with #.
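The same idea can be wrapped in a small helper. This is only a sketch: the class and method names are hypothetical, and Regex.Escape is added on the assumption that the tag may contain characters that are special in a pattern.

```csharp
using System.Text.RegularExpressions;

public static class TagReplacer
{
    // Hypothetical helper: replaces whole occurrences of a tag such as "#theplace".
    // Assumes the tag starts with a non-word character (#) and ends with a word character.
    public static string ReplaceTag(string input, string tag, string replacement)
    {
        // \B requires a non-boundary before '#', so "bar#theplace" is skipped;
        // the trailing \b rejects longer words such as "#theplacefoo".
        string pattern = @"\B" + Regex.Escape(tag) + @"\b";
        return Regex.Replace(input, pattern, replacement);
    }
}
```

Note that a $ in the replacement text is treated as a substitution marker by Regex.Replace, so a literal dollar sign would have to be written as $$.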

Try this:
string PlaceName="theplace", Replacement ="...";
string Pattern = String.Format(@"#\b{0}\b", PlaceName);
string Result = Regex.Replace(input, Pattern, Replacement);

Related

Regex - Removing specific characters before the final occurrence of #

So, I'm trying to remove certain characters [.&#] before the final occurrence of a #, but after that final #, those characters should be allowed.
This is what I have so far.
string pattern = @"\.|\&|\#(?![^#]+$)|[^a-zA-Z#]";
string input = "username#middle&something.else#company.com";
// input, pattern, replacement
string result = Regex.Replace(input, pattern, string.Empty);
Console.WriteLine(result);
Output: usernamemiddlesomethingelse#companycom
This currently removes all occurrences of the specified characters, apart from the final #. I'm not sure how to get this to work. Help, please?
You may use
[.&#]+(?=.*#)
Or, equivalent [.&#]+(?![^#]*$). See the regex demo.
Details
[.&#]+ - 1 or more ., & or # chars
(?=.*#) - followed with any 0+ chars (other than LF) as many as possible and then a #.
See the C# demo:
string pattern = @"[.&#]+(?=.*#)";
string input = "username#middle&something.else#company.com";
string result = Regex.Replace(input, pattern, string.Empty);
Console.WriteLine(result);
// => usernamemiddlesomethingelse#company.com
Just a simple solution (and alternative to complex regex) using Substring and LastIndexOf:
string pattern = @"[.#&]";
string input = "username#middle&something.else#company.com";
string inputBeforeLastAt = input.Substring(0, input.LastIndexOf('#'));
// input, pattern, replacement
string result = Regex.Replace(inputBeforeLastAt, pattern, string.Empty) + input.Substring(input.LastIndexOf('#'));
Console.WriteLine(result);
Try it with this fiddle.
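One caveat with the Substring approach: LastIndexOf returns -1 when the input contains no # at all, and Substring(0, -1) then throws. A guarded variant might look like the sketch below. The class and method names are hypothetical, and returning the input unchanged when there is no # is an assumption about the desired behaviour.

```csharp
using System.Text.RegularExpressions;

public static class MarkerCleaner
{
    // Hypothetical helper: removes '.', '&' and '#' before the last '#',
    // and leaves everything from the last '#' onwards untouched.
    public static string StripBeforeLastMarker(string input)
    {
        int lastMarker = input.LastIndexOf('#');
        if (lastMarker < 0)
        {
            // Assumption: with no '#' there is no "before the last #" part, so nothing is removed.
            return input;
        }

        string head = input.Substring(0, lastMarker);
        string tail = input.Substring(lastMarker);
        return Regex.Replace(head, "[.&#]", string.Empty) + tail;
    }
}
```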

How to replace whitespace (Unicode to UTF-8) with a regex in C#

I'm trying to write a regex replacement in C#. The method should replace certain Unicode whitespace characters with a normal space.
Let me explain with code. I'm not good at writing regular expressions or dealing with culture information.
// This method replaces Unicode whitespace characters with a regular space
public static string cleanUnicodeSpaces(string value)
{
    // This first pattern works, but it also removes other special characters,
    // for example accent marks
    //string pattern = @"[^\u0000-\u007F]+";
    string cleaned = "";
    string pattern = @"[^\u0020\u0009\u000D]+"; // Unicode characters
    string replacement = ""; // Intended: replace with a regular space
    Regex regex = new Regex(pattern);
    cleaned = regex.Replace(value, replacement).Trim(); // Trim to remove surrounding spaces
    return cleaned;
}
Unicode spaces
HT:U+0009 = Character tabulation
LF:U+000A = Line Feed
CR:U+000D = Carriage Return
What am I doing wrong?
Sources
Unicode characters: https://unicode-table.com/en
Whitespace: https://en.wikipedia.org/wiki/Whitespace_character
Regex: https://msdn.microsoft.com/es-es/library/system.text.regularexpressions.regex(v=vs.110).aspx
SOLUTION
Thanks to @wiktor-stribiżew and @mathias-r-jessen, the solution is:
string pattern = @"[\u0020\u0009\u000D\u00A0]+";
// I include \u00A0 to also replace the non-breaking space (&nbsp;)
Your regex - [^\u0020\u0009\u000D]+ - is a negated character class that matches any 1+ chars other than a regular space (\u0020), tab (\u0009) and carriage return (\u000D). You actually are looking for a positive character class that would match one of the three chars you indicated (\x0A for a newline, \x0D for a carriage return and \x09 for a tab) in the question with a regular space (\x20).
You may just use
var res = Regex.Replace(s, @"[\x0A\x0D\x09]", " ");
See the regex demo
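If other Unicode space characters (such as the non-breaking space U+00A0) should be covered as well, one option is the Unicode category \p{Zs} (space separators). The following is a sketch under that assumption. The class and method names are hypothetical, and each matched character is replaced one for one, so "\r\n" becomes two spaces.

```csharp
using System.Text.RegularExpressions;

public static class WhitespaceNormalizer
{
    // Hypothetical helper: maps tab, LF, CR and any Unicode space separator
    // (category Zs, which includes U+0020 and U+00A0) to a regular space.
    public static string ToPlainSpaces(string value)
    {
        return Regex.Replace(value, @"[\t\n\r\p{Zs}]", " ");
    }
}
```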

Regex to remove specific string if exist

I want to remove the -L from the end of my string, if it exists.
So
ABCD => ABCD
ABCD-L => ABCD
At the moment I'm using something like the line below, which uses an if/else (conditional) construct in my Regex. However, I have a feeling that it should be much easier than this.
var match = Regex.Match("...", @"(?(\S+-L$)\S+(?=-L)|\S+)");
How about just doing:
Regex rgx = new Regex("-L$");
string result = rgx.Replace("ABCD-L", "");
So basically: if the string ends with -L, replace that part with an empty string.
If you want to not only invoke the replacement at the end of the string, but also at the end of a word, you can add an additional switch to detect word boundaries (\b) in addition to the end of the string:
Regex rgx = new Regex(@"-L(\b|$)");
string result = rgx.Replace("ABCD-L ABCD ABCD-L", "");
Note that detecting word boundaries can be a little ambiguous. See here for a list of characters that are considered to be word characters in C#.
You can also use the String.Replace() method to find a specific string inside a string and replace it with another string, in this case an empty string.
http://msdn.microsoft.com/en-us/library/fk49wtc1(v=vs.110).aspx
Use Regex.Replace function,
Regex.Replace(input, @"(\S+?)-L(?=\s|$)", "$1")
DEMO
Explanation:
(           group and capture to \1:
  \S+?        non-whitespace (all but \n, \r, \t, \f, and " ") (1 or more times)
)           end of \1
-L          '-L'
(?=         look ahead to see if there is:
  \s          whitespace (\n, \r, \t, \f, and " ")
  |           OR
  $           before an optional \n, and the end of the string
)           end of look-ahead
You certainly can use Regex for this, but why when using normal string functions is clearer?
Compare this:
text = text.EndsWith("-L")
? text.Substring(0, text.Length - "-L".Length)
: text;
to this:
text = Regex.Replace(text, @"(\S+?)-L(?=\s|$)", "$1");
Or better yet, define an extension method like this:
public static string RemoveIfEndsWith(this string text, string suffix)
{
return text.EndsWith(suffix)
? text.Substring(0, text.Length - suffix.Length)
: text;
}
Then your code can look like this:
text = text.RemoveIfEndsWith("-L");
Of course you can always define the extension method using the Regex. At least then your calling code looks a lot cleaner and is far more readable and maintainable.
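A Regex-backed version of such an extension method could look like the sketch below. The class and method names are hypothetical. Regex.Escape is used on the assumption that the suffix should be taken literally, and \z anchors the match at the very end of the string (unlike $, it does not match before a trailing newline).

```csharp
using System.Text.RegularExpressions;

public static class SuffixExtensions
{
    // Hypothetical extension method: removes the suffix only when the string ends with it.
    public static string RemoveSuffixWithRegex(this string text, string suffix)
    {
        // Escape the suffix so characters like '.' are matched literally.
        return Regex.Replace(text, Regex.Escape(suffix) + @"\z", string.Empty);
    }
}
```

The calling code stays the same as before, for example text = text.RemoveSuffixWithRegex("-L");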

Regex removing empty spaces when using replace

My situation is not about removing empty spaces, but keeping them. I have this string >[database values] which I would like to find. I created this RegEx to find it then go in and remove the >, [, ]. The code below takes a string that is from a document. The first pattern looks for anything that is surrounded by >[some stuff] it then goes in and "removes" >, [, ]
string decoded = "document in string format";
string pattern = @">\[[A-z, /, \s]*\]";
string pattern2 = @"[>, \[, \]]";
Regex rgx = new Regex(pattern);
Regex rgx2 = new Regex(pattern2);
foreach (Match match in rgx.Matches(decoded))
{
    string replacedValue = rgx2.Replace(match.Value, "");
    Console.WriteLine(match.Value);
    Console.WriteLine(replacedValue);
}
What I am getting in my first Console.WriteLine is correct, so I would be getting things like >[123 sesame St]. But my second output shows that my replace removes not just the characters but also the spaces, so I would get something like 123sesameSt. I don't see any space being replaced in my Regex. Am I forgetting something? Is it perhaps implicit in a replace?
The [A-z, /, \s] and [>, \[, \]] in your patterns are also looking for commas and spaces. Just list the characters without delimiting them, like this: [A-Za-z/\s]
string pattern = @">\[[A-Za-z/\s]*\]";
string pattern2 = @"[>\[\]]";
Edit to include Casimir's tip.
After rereading your question (if I understand it correctly), I realize that your two-step approach is unnecessary. You only need one replacement using a capture group:
string pattern = @">\[([^]]*)]";
Regex rgx = new Regex(pattern);
string result = rgx.Replace(yourtext, "$1");
pattern details:
>\[ # literals: >[
( # open the capture group 1
[^]]* # all that is not a ]
) # close the capture group 1
] # literal ]
the replacement string refers to the capture group 1 with $1
By defining [>, \[, \]] in pattern2 you define a character group consisting of single characters like >, ,, , [ and every other character you listed in the square brackets. But I guess you don't want to match space and ,. So if you don't want to match them leave them out like
string pattern2 = @"[>\[\]]";
Alternatively, you could use
string pattern2 = @"(>\[|\])";
Thereby, you either match >[ or ] which better expresses your intention.
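To make the single-replacement idea concrete, here is a minimal sketch. The class and method names are hypothetical, and the closing bracket inside the character class is escaped as a defensive choice so the pattern does not depend on how a leading ] is parsed.

```csharp
using System.Text.RegularExpressions;

public static class BracketStripper
{
    // Hypothetical helper: turns ">[some value]" into "some value",
    // keeping spaces and commas inside the brackets intact.
    public static string Unwrap(string text)
    {
        // Group 1 captures everything between ">[" and the next "]".
        return Regex.Replace(text, @">\[([^\]]*)\]", "$1");
    }
}
```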

Dot word pattern matching

I want to create a regular expression to match a word that begins with a period. The word(s) can exist N times in a string. I want to ensure that the word comes up whether it's at the beginning of a line, the end of a line or somewhere in the middle. The latter part is what I'm having difficulty with.
Here is where I am at so far.
const string pattern = @"(^|(.* ))(?<slickText>\.[a-zA-Z0-9]*)( .*|$)";
public static MatchCollection Find(string input)
{
Regex regex = new Regex(pattern,RegexOptions.IgnoreCase | RegexOptions.Multiline);
MatchCollection collection = regex.Matches(input);
return collection;
}
My test pattern finds .lee and .good. My test pattern fails to find .bruce:
static void Main()
{
MatchCollection results = ClassName.Find("a short stump .bruce\r\nand .lee a small tree\r\n.good roots");
foreach (Match item in results)
{
GroupCollection groups = item.Groups;
Console.WriteLine("{0} ", groups["slickText"].Value);
}
System.Diagnostics.Debug.Assert(results.Count > 0);
}
Maybe you're just looking for \.\w+?
Test:
var s = "a short stump .bruce\r\nand .lee a small tree\r\n.good roots";
Regex.Matches(s, @"\.\w+").Dump(); // Dump() is LINQPad's output helper
Result: the three matches .bruce, .lee and .good.
Note:
If you don't want to find foo in some.foo (because there's no whitespace between some and .foo), you can use (?<=\W|^)\.\w+ instead.
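As a sketch of the stricter variant, the helper below collects the matches into a list. The class and method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class DotWords
{
    // Hypothetical helper: returns every ".word" that is not glued to a preceding word character.
    public static List<string> Find(string input)
    {
        var words = new List<string>();
        // The lookbehind accepts either a non-word character or the start of the input.
        foreach (Match m in Regex.Matches(input, @"(?<=\W|^)\.\w+"))
        {
            words.Add(m.Value);
        }
        return words;
    }
}
```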
Bizarrely enough, it seems that with RegexOptions.Multiline, ^ and $ will only additionally match \n, not \r\n.
Thus you get .good because it is preceded by \n, after which ^ matches, but you don't get .bruce because it is followed by \r, before which $ does not match.
You could do a .Replace("\r", "") on the input, or rewrite your expression to take individual lines of input.
Edit: Or replace $ with \r?$ in your pattern to explicitly include the \r; thanks to SvenS for the suggestion.
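The effect of \r on $ can be checked with a small experiment. This is an illustrative sketch: the class and method names are hypothetical, and the patterns are simplified to "a dot word at the end of a line" rather than the full pattern from the question.

```csharp
using System.Text.RegularExpressions;

public static class LineEndDemo
{
    // Hypothetical helper: counts ".word" tokens that sit at the end of a line.
    // With allowCarriageReturn = false, a trailing \r hides the line end from $.
    public static int CountAtLineEnd(string input, bool allowCarriageReturn)
    {
        string pattern = allowCarriageReturn
            ? @"\.\w+(?=\r?$)"   // tolerate an optional \r before the line end
            : @"\.\w+(?=$)";     // plain $: only matches right before \n or at the end of the input
        return Regex.Matches(input, pattern, RegexOptions.Multiline).Count;
    }
}
```

With the sample input from the question, only .bruce sits at the end of a line, and it is found only by the \r?$ variant.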
In your RegEx, a word has to be terminated by a space, but bruce is terminated by \r instead.
I would give this regex a go:
(?:.*?(\.[A-Za-z]+(?:\b|.\s)).*?)+
And change the RegexOptions from Multiline to Singleline - in this mode dot matches all characters including newline.
