Replace exact matching words containing special characters - c#

I came across How to search and replace exact matching strings only. However, it doesn't work when there are words that start with #. My fiddle here https://dotnetfiddle.net/9kgW4h
string textToFind = string.Format(@"\b{0}\b", "#bob");
Console.WriteLine(Regex.Replace("#bob!", textToFind, "me"));// "#bob!" instead of "me!"
In addition, if a word starts with \# (for example, \#myname) and I search for #myname, no replacement should be made.

I suggest replacing the leading and trailing word boundaries with unambiguous lookaround-based boundaries that will require whitespace chars or start/end of string on both ends of the search word, (?<!\S) and (?!\S). Besides, you need to use $$ in the replacement pattern to replace with a literal $.
For example:
using System;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main()
    {
        string text = @"It is #google.com or #google w#google \#google \\#google";
        string result = SafeReplace(text, "#google", "some domain", true);
        Console.WriteLine(result);
    }

    public static string SafeReplace(string input, string find, string replace, bool matchWholeWord)
    {
        string textToFind = matchWholeWord ? string.Format(@"(?<!\S){0}(?!\S)", Regex.Escape(find)) : find;
        return Regex.Replace(input, textToFind, replace.Replace("$", "$$"));
    }
}
The Regex.Escape(find) is only necessary if you expect special regex metacharacters in the find variable value.
The regex demo is available at regexstorm.net.
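To make the difference between the two kinds of boundary concrete, here is a minimal sketch. The class name BoundaryDemo and both method names are made up for illustration; only Regex.Escape and Regex.Replace come from the framework.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BoundaryDemo
{
    // \b only matches between a word character and a non-word character,
    // so it cannot match directly before '#' at the start of the input.
    public static string ReplaceWithWordBoundary(string input, string find, string replacement)
    {
        string pattern = @"\b" + Regex.Escape(find) + @"\b";
        return Regex.Replace(input, pattern, replacement.Replace("$", "$$"));
    }

    // The lookarounds require whitespace or start/end of string on both sides.
    public static string ReplaceWithWhitespaceBoundary(string input, string find, string replacement)
    {
        string pattern = @"(?<!\S)" + Regex.Escape(find) + @"(?!\S)";
        return Regex.Replace(input, pattern, replacement.Replace("$", "$$"));
    }

    public static void Main()
    {
        Console.WriteLine(ReplaceWithWordBoundary("#bob!", "#bob", "me"));              // #bob!
        Console.WriteLine(ReplaceWithWordBoundary("x#bob!", "#bob", "me"));             // xme!
        Console.WriteLine(ReplaceWithWhitespaceBoundary("say #bob now", "#bob", "me")); // say me now
        Console.WriteLine(ReplaceWithWhitespaceBoundary(@"say \#bob now", "#bob", "me")); // say \#bob now
    }
}
```

Note that the whitespace-based version also leaves "#bob!" untouched, because the trailing ! is not whitespace. If punctuation after the word should be allowed, the trailing lookahead would need to be relaxed.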

Related

RegEx get words with special character

I want to find words in a string starting with a _ (underscore).
That was easy enough
I wrote this small test program:
using System;
using System.Text.RegularExpressions;

class Program
{
    private static Regex WordExpression = new Regex(@"_\w+");

    // Strips the underscore from a matched word.
    private static string TranslateWord(Match word) => word?.Value?.Replace("_", "");

    private static string Translate(string word)
    {
        return WordExpression.Replace(word, TranslateWord);
    }

    static void Main(string[] args)
    {
        Console.WriteLine(Translate("Do you want to _Exit the _Program"));
        Console.ReadKey();
    }
}
And that worked out very well. The problem starts when there are no spaces between my words:
Console.WriteLine(Translate("_Exit_Program"));
My expression finds only one match, _Exit_Program, but I would very much like two matches. Can this be done with a regular expression, or would I need to split the string in my TranslateWord method?
You can use the following regex:
@"_[^\W_]+"
The negated character class [^\W_] matches any character that is neither a non-word character nor an underscore; in other words, any \w character except _.
A more .NET-ish regex will be an expression with character class subtraction:
_[\w-[_]]+
Here, with [\w-[_]], we match all \ws with the exception of _.
Use the first suggestion if you need a more portable solution, and the second one if you only plan to use the regex in a .NET environment.
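As a quick check, the following sketch runs both patterns against the problematic input. The class name UnderscoreWords and its method names are hypothetical; the patterns are the two given above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class UnderscoreWords
{
    // Portable form: a word character that is not an underscore.
    public static string[] FindPortable(string input) =>
        Regex.Matches(input, @"_[^\W_]+").Cast<Match>().Select(m => m.Value).ToArray();

    // .NET-only form: character class subtraction.
    public static string[] FindWithSubtraction(string input) =>
        Regex.Matches(input, @"_[\w-[_]]+").Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", FindPortable("_Exit_Program")));        // _Exit, _Program
        Console.WriteLine(string.Join(", ", FindWithSubtraction("_Exit_Program"))); // _Exit, _Program
    }
}
```

Both patterns stop at the second underscore, so each word becomes its own match and can be passed to a MatchEvaluator separately.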

Regex to remove specific string if exist

I want to remove the -L from the end of my string, if it exists.
So
ABCD => ABCD
ABCD-L => ABCD
At the moment I'm using something like the line below, which uses an if/else conditional construct in my Regex. However, I have a feeling that it should be much easier than this.
var match = Regex.Match("...", @"(?(\S+-L$)\S+(?=-L)|\S+)");
How about just doing:
Regex rgx = new Regex("-L$");
string result = rgx.Replace("ABCD-L", "");
So basically: if the string ends with -L, replace that part with an empty string.
If you want to not only invoke the replacement at the end of the string, but also at the end of a word, you can add an additional switch to detect word boundaries (\b) in addition to the end of the string:
Regex rgx = new Regex(@"-L(\b|$)"); // verbatim string, so \b is a word boundary and not a backspace character
string result = rgx.Replace("ABCD-L ABCD ABCD-L", "");
Note that detecting word boundaries can be a little ambiguous. See here for a list of characters that are considered to be word characters in C#.
You can also use the String.Replace() method to find a specific string inside a string and replace it with another string, in this case an empty string.
http://msdn.microsoft.com/en-us/library/fk49wtc1(v=vs.110).aspx
Use the Regex.Replace function:
Regex.Replace(string, @"(\S+?)-L(?=\s|$)", "$1")
Explanation:
(            group and capture to \1:
  \S+?         non-whitespace (all but \n, \r, \t, \f, and " ") (1 or more times)
)            end of \1
-L           '-L'
(?=          look ahead to see if there is:
  \s           whitespace (\n, \r, \t, \f, and " ")
 |            OR
  $            before an optional \n, and the end of the string
)            end of look-ahead
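The sketch below contrasts the end-of-string pattern with the lookahead pattern explained above. The class name SuffixDemo and its method names are invented for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SuffixDemo
{
    // Removes -L only at the very end of the input.
    public static string RemoveAtEndOfString(string input) =>
        Regex.Replace(input, "-L$", "");

    // Removes -L at the end of every whitespace-delimited token.
    public static string RemoveAtEndOfToken(string input) =>
        Regex.Replace(input, @"(\S+?)-L(?=\s|$)", "$1");

    public static void Main()
    {
        string text = "ABCD-L ABCD ABCD-L";
        Console.WriteLine(RemoveAtEndOfString(text)); // ABCD-L ABCD ABCD
        Console.WriteLine(RemoveAtEndOfToken(text));  // ABCD ABCD ABCD
        Console.WriteLine(RemoveAtEndOfToken("ABCD")); // ABCD
    }
}
```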
You certainly can use Regex for this, but why, when normal string functions are clearer?
Compare this:
text = text.EndsWith("-L")
? text.Substring(0, text.Length - "-L".Length)
: text;
to this:
text = Regex.Replace(text, @"(\S+?)-L(?=\s|$)", "$1");
Or better yet, define an extension method like this:
public static string RemoveIfEndsWith(this string text, string suffix)
{
    return text.EndsWith(suffix)
        ? text.Substring(0, text.Length - suffix.Length)
        : text;
}
Then your code can look like this:
text = text.RemoveIfEndsWith("-L");
Of course you can always define the extension method using the Regex. At least then your calling code looks a lot cleaner and is far more readable and maintainable.

C# Regex for retrieving capital string in quotation mark

Given a string, I want to retrieve a string that is in between the quotation marks, and that is fully capitalized.
For example, if a string of
oqr"awr"q q"ASRQ" asd "qIKQWIR"
has been entered, the regex should match only "ASRQ".
What is the best way to approach this?
Edit: I forgot to mention that the string can contain digits as well, i.e. "IO8917AS" is a valid input.
EDIT: If you actually want "one or more characters, and none of the characters is a lower-case letter" then you probably want:
Regex regex = new Regex("\"\\P{Ll}+\"");
That will then allow digits as well... and punctuation. If you want to allow digits and upper case letters but nothing else, you can use:
Regex regex = new Regex("\"[\\p{Lu}\\d]+\"");
Or in verbatim string literal form (makes the quotes more confusing, but the backslashes less so):
Regex regex = new Regex(@"""[\p{Lu}\d]+""");
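To illustrate how the two patterns differ, here is a small sketch. The class name QuotedUpperDemo, its members, and the sample input "AB-12" are assumptions made for the example; the patterns are the ones shown above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class QuotedUpperDemo
{
    // Anything except lowercase letters between the quotes (allows punctuation too).
    private static readonly Regex NoLowercase = new Regex(@"""\P{Ll}+""");

    // Only uppercase letters and digits between the quotes.
    private static readonly Regex UpperOrDigit = new Regex(@"""[\p{Lu}\d]+""");

    public static bool MatchesNoLowercase(string input) => NoLowercase.IsMatch(input);

    public static bool MatchesUpperOrDigit(string input) => UpperOrDigit.IsMatch(input);

    public static string[] FindUpperOrDigit(string input) =>
        UpperOrDigit.Matches(input).Cast<Match>().Select(m => m.Value).ToArray();

    public static void Main()
    {
        string text = "oqr\"awr\"q q\"ASRQ\" asd \"qIKQWIR\" \"IO8917AS\"";
        Console.WriteLine(string.Join(", ", FindUpperOrDigit(text))); // "ASRQ", "IO8917AS"
        Console.WriteLine(MatchesNoLowercase("x\"AB-12\"y"));  // True: the hyphen is not a lowercase letter
        Console.WriteLine(MatchesUpperOrDigit("x\"AB-12\"y")); // False: the hyphen is not allowed
    }
}
```

Because \P{Ll} also accepts spaces and quotation marks, the first pattern can span more text than expected when several quoted strings sit next to each other; the stricter character class avoids that.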
Original answer (before digits were required)
Sounds like you just want (within the pattern)
"[A-Z]*"
So something like:
Regex regex = new Regex("\"[A-Z]*\"");
Or for full Unicode support, use the Lu Unicode character category:
Regex regex = new Regex("\"\\p{Lu}*\"");
EDIT: As noted, if you don't want to match an empty string in quotes (which is still "a string where everything is upper case") then use + instead of *, e.g.
Regex regex = new Regex("\"\\p{Lu}+\"");
Short but complete example of finding and displaying the first match:
using System;
using System.Text.RegularExpressions;

class Program
{
    public static void Main()
    {
        Regex regex = new Regex("\"\\p{Lu}+\"");
        string text = "oqr\"awr\"q q\"ASRQ\" asd \"qIKQWIR\"";
        Match match = regex.Match(text);
        Console.WriteLine(match.Success); // True
        Console.WriteLine(match.Value); // "ASRQ"
    }
}
Like this:
"\"[A-Z]+\""
The outermost quotes are not part of the regex, they delimit a C# string.
This requires at least one uppercase character between quotes and works for the English language.
Please try the following:
[\w]*"([A-Z0-9]+)"

Regular expressions match not working accurately with semicolon

I have a string of codes like:
0926;0941;0917;0930;094D;
I want to search for: 0930;094D; in the above string. I am using this code to find a string fragment:
static bool ExactMatch(string input, string match)
{
    return Regex.IsMatch(input, string.Format(@"\b{0}\b", Regex.Escape(match)));
}
The problem is that the code sometimes works and sometimes does not. If I match a single code, for example 0930;, it works, but when I add 094D; the match fails.
How to refine the code to work accurately with semicolons?
Try this; I have tested it:
string val = "0926;0941;0917;0930;094D;";
string match = "0930;094D;"; // or match = "0930;" both found
if (Regex.IsMatch(val,match))
Console.Write("Found");
else Console.Write("Not Found");
"\b" denotes a word boundary, which is in between a word and a non-word character. Unfortunately, a semi-colon is not a word character. There is no "\b" at the end of "0926;0941;0917;0930;094D;" thus the Regex shows no match.
Why not just remove the last "\b" in your Regex?
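A minimal sketch of that suggestion, using the data from the question. The class name SemicolonDemo and its method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SemicolonDemo
{
    // Original approach: \b on both sides of the escaped search text.
    public static bool WithWordBoundaries(string input, string match) =>
        Regex.IsMatch(input, @"\b" + Regex.Escape(match) + @"\b");

    // Suggested change: keep only the leading \b.
    public static bool WithLeadingBoundaryOnly(string input, string match) =>
        Regex.IsMatch(input, @"\b" + Regex.Escape(match));

    public static void Main()
    {
        string codes = "0926;0941;0917;0930;094D;";
        Console.WriteLine(WithWordBoundaries(codes, "0930;"));           // True: ';' is followed by '0'
        Console.WriteLine(WithWordBoundaries(codes, "0930;094D;"));      // False: ';' is followed by end of string
        Console.WriteLine(WithLeadingBoundaryOnly(codes, "0930;094D;")); // True
    }
}
```

The trailing \b only succeeds when the final semicolon is followed by a word character, which explains why the match depends on where the fragment sits in the string.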
Perhaps I'm not understanding your situation correctly; but if you're looking for an exact match within the string, couldn't you simply avoid regex and use string.Contains:
static bool ExactMatch(string input, string match)
{
return input.Contains(match);
}

How to delete certain characters of a word in c#

I have a
string word = "degree/NN";
What I want is to remove the "/NN" part of the word and take only the word "degree".
I have following conditions:
The length of the word can be different in different occasions. (can be any word therefore the length is not fixed)
But the word will contain the "/NN" part at the end always.
How can I do this in C# .NET?
Implemented as an extension method:
static class StringExtension
{
    public static string RemoveTrailingText(this string text, string textToRemove)
    {
        if (!text.EndsWith(textToRemove))
            return text;
        return text.Substring(0, text.Length - textToRemove.Length);
    }
}
Usage:
string whatever = "degree/NN".RemoveTrailingText("/NN");
This takes into account that the unwanted part "/NN" is only removed from the end of the word, as you specified. A simple Replace would remove every occurrence of "/NN". However, that might not be a problem in your special case.
You can shorten the input string by three characters using String.Remove like this:
string word = "degree/NN";
string result = word.Remove(word.Length - 3);
If the part after the slash has variable length, you can use String.LastIndexOf to find the slash:
string word = "degree/NN";
string result = word.Remove(word.LastIndexOf('/'));
Simply use
word = word.Replace(@"/NN", "");
edit
Forgot to add word =. Fixed that in my example.
Try string.Replace(). If you need to replace patterns, use Regex.Replace:
Regex rgx = new Regex("/NN");
string result = rgx.Replace("degree/NN", string.Empty);
