RegEx get words with special character - c#

I want to find words in a string starting with a _ (underscore).
That was easy enough.
I wrote this small test program:
using System;
using System.Text.RegularExpressions;
class Program
{
    // Matches an underscore followed by one or more word characters.
    private static Regex WordExpression = new Regex(@"_\w+");
    // Strips the underscores from a matched word.
    private static string TranslateWord(Match word) => word?.Value?.Replace("_", "");
    private static string Translate(string word)
    {
        return WordExpression.Replace(word, TranslateWord);
    }
    static void Main(string[] args)
    {
        Console.WriteLine(Translate("Do you want to _Exit the _Program"));
        Console.ReadKey();
    }
}
And that worked out very well. The problem starts when there are no spaces between my words:
Console.WriteLine(Translate("_Exit_Program"));
My expression finds only one match, _Exit_Program, but I would very much like two matches. Can this be done with a regular expression, or would I need to split the string in my TranslateWord method?

You can use the following regex:
@"_[^\W_]+"
The [^\W_] negated character class matches any character that is neither a non-word character nor _, that is, everything \w matches except the underscore.
See the regex demo
A more .NET-ish regex will be an expression with character class subtraction:
_[\w-[_]]+
See another demo
Here, with [\w-[_]], we match all \ws with the exception of _.
Use the first suggestion if you need a more portable solution, and the second one if you only plan to use the regex in a .NET environment.
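As a quick check of the two patterns above, here is a minimal sketch that lists the matches each one finds in _Exit_Program. The class and method names (UnderscoreWordsDemo, FindWords) are made up for illustration and do not come from the original post.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class UnderscoreWordsDemo
{
    // Returns every match of the pattern in the input, in order of appearance.
    public static string[] FindWords(string input, string pattern)
    {
        return Regex.Matches(input, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        // Portable form: word characters except the underscore.
        Console.WriteLine(string.Join(", ", FindWords("_Exit_Program", @"_[^\W_]+")));
        // .NET-only form: character class subtraction.
        Console.WriteLine(string.Join(", ", FindWords("_Exit_Program", @"_[\w-[_]]+")));
    }
}
```

Both lines should print _Exit, _Program, since the underscore can no longer be consumed as part of a word.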

Related

C# Regex Replace Greek Letters

I'm getting a string like "thetaetaA" (theta eta A).
I need to convert the received string to {\theta}{\eta}A
// C# code with regex to match Greek letters
string gl = "alpha|beta|delta|theta|eta";
string received = "thetaetaA";
var greekLetters = Regex.Matches(received, gl);
Could someone please tell me how I can create the required text
{\theta}{\eta}A
If I use a loop and do a replace, it generates the following output
{\th{\eta}}{\eta}A
because theta includes eta.
Regex.Matches() doesn't replace anything. Use Regex.Replace(). Capture the words and reference the capture in the replacement, adding the special characters around it. (And possibly put the superstrings before the substrings in the alternation, though it works either way for me: the engine scans the input left to right, so theta matches at its own starting position before the eta inside it is ever tried.)
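using System;
using System.Text.RegularExpressions;
class Program
{
    static void Main(string[] args)
    {
        string gl = "alpha|beta|delta|theta|eta";
        string received = "thetaetaA";
        // $1 refers to the captured letter name; the braces and backslash are literal.
        string texified = Regex.Replace(received, $"({gl})", @"{\$1}");
        Console.WriteLine(texified);
        Console.ReadKey();
    }
}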
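If the list of names comes from somewhere else and its order is not under your control, one possible way to follow the "superstrings first" advice is to sort the names by length before building the alternation. This is only a sketch: the Texify helper and the GreekTexDemo class are hypothetical names, and the name list is deliberately shuffled to show that the order no longer matters.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class GreekTexDemo
{
    // Wraps each known name found in the input as {\name}.
    public static string Texify(string input, string[] names)
    {
        // Longest names first, so a longer name is always tried before a
        // shorter one that could match at the same position.
        string alternation = string.Join("|",
            names.OrderByDescending(n => n.Length)
                 .Select(n => Regex.Escape(n)));
        return Regex.Replace(input, "(" + alternation + ")", @"{\$1}");
    }

    public static void Main()
    {
        string[] names = { "eta", "theta", "beta", "alpha", "delta" };
        Console.WriteLine(Texify("thetaetaA", names)); // {\theta}{\eta}A
    }
}
```

Regex.Escape is not needed for plain letter names; it is there only in case a name ever contains a regex metacharacter.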

Replace exact matching words containing special characters

I came across How to search and replace exact matching strings only. However, it doesn't work when there are words that start with #. My fiddle here https://dotnetfiddle.net/9kgW4h
string textToFind = string.Format(@"\b{0}\b", "#bob");
Console.WriteLine(Regex.Replace("#bob!", textToFind, "me"));// "#bob!" instead of "me!"
In addition, if a word starts with \#, say for example \#myname, and I try to find and replace #myname, it shouldn't do the replace.
I suggest replacing the leading and trailing word boundaries with unambiguous lookaround-based boundaries that will require whitespace chars or start/end of string on both ends of the search word, (?<!\S) and (?!\S). Besides, you need to use $$ in the replacement pattern to replace with a literal $.
I suggest:
using System;
using System.Text.RegularExpressions;
public class Program
{
    public static void Main()
    {
        string text = @"It is #google.com or #google w#google \#google \\#google";
        string result = SafeReplace(text, "#google", "some domain", true);
        Console.WriteLine(result);
    }
    public static string SafeReplace(string input, string find, string replace, bool matchWholeWord)
    {
        string textToFind = matchWholeWord ? string.Format(@"(?<!\S){0}(?!\S)", Regex.Escape(find)) : find;
        return Regex.Replace(input, textToFind, replace.Replace("$", "$$"));
    }
}
See the C# demo.
The Regex.Escape(find) is only necessary if you expect special regex metacharacters in the find variable value.
The regex demo is available at regexstorm.net.

C# Regular Expression Match Failing

Here's the regular expression pattern:
string testerpattern = @"\s+\d+:\s+\w\w\w\w\w\w\s+..:..:..:..:..:..:..:..\s+\d+.\d+.\d+.\d+\s+\d+.\d+.\d+.\d+\s+""\w +""";
Here are some lines of text I want to match. There will be one or more spaces at the beginning of each line. Once I get it working, I will modify it to use named groups. Basically, I want most of the line without doing multiple matches on a line, one for each pattern.
2: fffc02 10:00:00:05:1e:36:5f:82 172.31.3.93 0.0.0.0 "SAN002A"
3: fffc03 10:00:00:05:1e:e2:a7:00 172.31.3.168 0.0.0.0 "SAN003A"
4: fffc04 50:00:51:e8:cc:2f:ae:01 0.0.0.0 0.0.0.0 "fcr_fd_4"
Here's the static class I wrote to do the matches. It works elsewhere in my program, so I'm assuming that the pattern is the problem. The pattern matches successfully on Regexr.com.
public static class RegexExtensions
{
    public static bool TryMatch(out Match match, string input, string pattern)
    {
        match = Regex.Match(input, pattern);
        return (match.Success);
    }
    public static bool TryMatch(out MatchCollection match, string input, string pattern)
    {
        match = Regex.Matches(input, pattern);
        return (match.Count > 0);
    }
}
First of all, remove the space between \w and + if you intend to match one or more word characters.
Next, if you need to match a literal dot, you must either escape it, \., or put it into a character class, [.].
Also, you can make use of limiting quantifiers to shorten the pattern if you do not need captures. See how your pattern can be written:
string pat = @"\s+\d+:\s+\w{6}\s+(?:..:){7}..(?:\s+\d+(?:\.\d+){3}){2}\s+""\w+""";
See the regex demo (where \w{6} matches 6 "word" chars, (?:..:){7} matches 7 sequences of 2 any chars other than a newline followed with :, etc.)
If you need to capture, still, you can use the ideas I outlined above:
\s+(\d+):\s+(\w{6})\s+(..(?::..){3}):((?:..:){3}..)\s+(\d+(?:\.\d+){3})\s+(\d+(?:\.\d+){3})\s+"(\w+)"
See the regex demo
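Since the question mentions moving to named matches later, here is a rough sketch of how the same ideas could look with named groups. The group names (index, id, address, firstIp, secondIp, name) and the SwitchLineDemo class are assumptions made for illustration; the original post does not say what the fields mean. The sample line is given two leading spaces, as the question says each line starts with one or more.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SwitchLineDemo
{
    // Group names are illustrative only, not taken from the original post.
    private static readonly Regex LinePattern = new Regex(
        @"^\s+(?<index>\d+):\s+(?<id>\w{6})\s+(?<address>(?:..:){7}..)" +
        @"\s+(?<firstIp>\d+(?:\.\d+){3})\s+(?<secondIp>\d+(?:\.\d+){3})" +
        @"\s+""(?<name>\w+)""");

    // Matches one line and returns the Match so callers can read the groups.
    public static Match Parse(string line)
    {
        return LinePattern.Match(line);
    }

    public static void Main()
    {
        Match m = Parse("  2: fffc02 10:00:00:05:1e:36:5f:82 172.31.3.93 0.0.0.0 \"SAN002A\"");
        if (m.Success)
        {
            Console.WriteLine(m.Groups["index"].Value);   // 2
            Console.WriteLine(m.Groups["address"].Value); // 10:00:00:05:1e:36:5f:82
            Console.WriteLine(m.Groups["firstIp"].Value); // 172.31.3.93
            Console.WriteLine(m.Groups["name"].Value);    // SAN002A
        }
    }
}
```

A single Match then gives access to every field by name, which avoids running a separate pattern per field.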

C# Regex for retrieving capital string in quotation mark

Given a string, I want to retrieve a string that is in between the quotation marks, and that is fully capitalized.
For example, if a string of
oqr"awr"q q"ASRQ" asd "qIKQWIR"
has been entered, the regex would only evaluate "ASRQ" as matching string.
What is the best way to approach this?
Edit: Forgot to mention the string takes a numeric input as well I.E: "IO8917AS" is a valid input
EDIT: If you actually want "one or more characters, and none of the characters is a lower-case letter" then you probably want:
Regex regex = new Regex("\"\\P{Ll}+\"");
That will then allow digits as well... and punctuation. If you want to allow digits and upper case letters but nothing else, you can use:
Regex regex = new Regex("\"[\\p{Lu}\\d]+\"");
Or in verbatim string literal form (makes the quotes more confusing, but the backslashes less so):
Regex regex = new Regex(@"""[\p{Lu}\d]+""");
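To see the digits-allowed pattern at work, here is a small sketch that collects all matches rather than just the first. The input is the question's sample with the "IO8917AS" value from the edit appended; the QuotedUpperDemo class and FindQuoted method are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class QuotedUpperDemo
{
    // Finds quoted runs made up only of upper-case letters and digits.
    public static string[] FindQuoted(string text)
    {
        return Regex.Matches(text, @"""[\p{Lu}\d]+""")
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        string text = "oqr\"awr\"q q\"ASRQ\" asd \"qIKQWIR\" \"IO8917AS\"";
        foreach (string value in FindQuoted(text))
        {
            Console.WriteLine(value); // "ASRQ" then "IO8917AS", quotes included
        }
    }
}
```

The quotes are part of each match; wrap the character class in a capturing group if only the inner text is wanted.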
Original answer (before digits were required)
Sounds like you just want (within the pattern)
"[A-Z]*"
So something like:
Regex regex = new Regex("\"[A-Z]*\"");
Or for full Unicode support, use the Lu Unicode character category:
Regex regex = new Regex("\"\\p{Lu}*\"");
EDIT: As noted, if you don't want to match an empty string in quotes (which is still "a string where everything is upper case") then use + instead of *, e.g.
Regex regex = new Regex("\"\\p{Lu}+\"");
Short but complete example of finding and displaying the first match:
using System;
using System.Text.RegularExpressions;
class Program
{
public static void Main()
{
Regex regex = new Regex("\"\\p{Lu}+\"");
string text = "oqr\"awr\"q q\"ASRQ\" asd \"qIKQWIR\"";
Match match = regex.Match(text);
Console.WriteLine(match.Success); // True
Console.WriteLine(match.Value); // "ASRQ"
}
}
Like this:
"\"[A-Z]+\""
The outermost quotes are not part of the regex, they delimit a C# string.
This requires at least one uppercase character between quotes and works for the English language.
Please try the following:
[\w]*"([A-Z0-9]+)"

Regular expressions match not working accurately with semicolon

I have a string of codes like:
0926;0941;0917;0930;094D;
I want to search for: 0930;094D; in the above string. I am using this code to find a string fragment:
static bool ExactMatch(string input, string match)
{
return Regex.IsMatch(input, string.Format(@"\b{0}\b", Regex.Escape(match)));
}
The problem is that the code sometimes works and sometimes doesn't. If I match a single code, for example 0930;, it works, but when I add 094D; it finds no match.
How can I refine the code to work reliably with semicolons?
Try this; I have tested it.
string val = "0926;0941;0917;0930;094D;";
string match = "0930;094D;"; // or match = "0930;" both found
if (Regex.IsMatch(val,match))
Console.Write("Found");
else Console.Write("Not Found");
"\b" denotes a word boundary, which is in between a word and a non-word character. Unfortunately, a semi-colon is not a word character. There is no "\b" at the end of "0926;0941;0917;0930;094D;" thus the Regex shows no match.
Why not just remove the last "\b" in your Regex?
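The behaviour described above can be reproduced with a short sketch. It compares the question's check (a \b on both sides) with the suggested variant that keeps only the leading \b; the BoundaryDemo class and its method names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BoundaryDemo
{
    public const string Codes = "0926;0941;0917;0930;094D;";

    // The question's original check: \b on both sides of the fragment.
    public static bool BothBoundaries(string input, string match)
    {
        return Regex.IsMatch(input, string.Format(@"\b{0}\b", Regex.Escape(match)));
    }

    // The suggestion above: keep only the leading \b.
    public static bool LeadingBoundaryOnly(string input, string match)
    {
        return Regex.IsMatch(input, @"\b" + Regex.Escape(match));
    }

    public static void Main()
    {
        // True: the ";" is followed by "0", so a boundary exists after it.
        Console.WriteLine(BothBoundaries(Codes, "0930;"));
        // False: the final ";" is followed by the end of the string.
        Console.WriteLine(BothBoundaries(Codes, "0930;094D;"));
        // True: no trailing boundary is required any more.
        Console.WriteLine(LeadingBoundaryOnly(Codes, "0930;094D;"));
    }
}
```

This also explains why the original code "sometimes works": a fragment ending in ; only passes the trailing \b when a word character happens to follow it.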
Perhaps I'm not understanding your situation correctly; but if you're looking for an exact match within the string, couldn't you simply avoid regex and use string.Contains:
static bool ExactMatch(string input, string match)
{
return input.Contains(match);
}
