Regular expressions match not working accurately with semicolon - c#

I have a string of codes like:
0926;0941;0917;0930;094D;
I want to search for: 0930;094D; in the above string. I am using this code to find a string fragment:
static bool ExactMatch(string input, string match)
{
    return Regex.IsMatch(input, string.Format(@"\b{0}\b", Regex.Escape(match)));
}
The problem is that the code sometimes works and sometimes does not. If I match a single code, for example 0930;, it works, but when I add 094D; it no longer finds a match.
How can I refine the code to work accurately with semicolons?

Try this; I have tested it.
string val = "0926;0941;0917;0930;094D;";
string match = "0930;094D;"; // or match = "0930;" both found
if (Regex.IsMatch(val, match))
    Console.Write("Found");
else
    Console.Write("Not Found");

"\b" denotes a word boundary, which is in between a word and a non-word character. Unfortunately, a semi-colon is not a word character. There is no "\b" at the end of "0926;0941;0917;0930;094D;" thus the Regex shows no match.
Why not just remove the last "\b" in your Regex?
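To make the boundary behaviour concrete, here is a minimal sketch. The class and method names are hypothetical, and the `(?:^|;)` prefix is only one possible replacement for the leading `\b`, assuming codes are always separated by semicolons.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SemicolonMatchDemo
{
    // The approach from the question: \b on both sides of the escaped fragment.
    public static bool WithBoundaries(string input, string match) =>
        Regex.IsMatch(input, string.Format(@"\b{0}\b", Regex.Escape(match)));

    // Assumed alternative: the fragment must start at the beginning of the
    // input or right after a semicolon; no trailing \b is needed because the
    // fragment itself ends with ';'.
    public static bool WithLeadingAnchor(string input, string match) =>
        Regex.IsMatch(input, string.Format(@"(?:^|;){0}", Regex.Escape(match)));

    public static void Main()
    {
        string codes = "0926;0941;0917;0930;094D;";

        // ';' is followed by '0' (a word character), so the trailing \b matches.
        Console.WriteLine(WithBoundaries(codes, "0930;"));         // True

        // The final ';' is followed by end of input, so the trailing \b fails.
        Console.WriteLine(WithBoundaries(codes, "0930;094D;"));    // False

        Console.WriteLine(WithLeadingAnchor(codes, "0930;094D;")); // True

        // A fragment starting in the middle of a code is rejected.
        Console.WriteLine(WithLeadingAnchor(codes, "30;094D;"));   // False
    }
}
```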

Perhaps I'm not understanding your situation correctly; but if you're looking for an exact match within the string, couldn't you simply avoid regex and use string.Contains:
static bool ExactMatch(string input, string match)
{
    return input.Contains(match);
}

Related

Replace exact matching words containing special characters

I came across How to search and replace exact matching strings only. However, it doesn't work when there are words that start with #. My fiddle here https://dotnetfiddle.net/9kgW4h
string textToFind = string.Format(@"\b{0}\b", "#bob");
Console.WriteLine(Regex.Replace("#bob!", textToFind, "me"));// "#bob!" instead of "me!"
In addition, if a word starts with \#, for example \#myname, and I try to find and replace #myname, the replacement should not happen.
I suggest replacing the leading and trailing word boundaries with unambiguous lookaround-based boundaries that will require whitespace chars or start/end of string on both ends of the search word, (?<!\S) and (?!\S). Besides, you need to use $$ in the replacement pattern to replace with a literal $.
I suggest:
using System;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main()
    {
        string text = @"It is #google.com or #google w#google \#google \\#google";
        string result = SafeReplace(text, "#google", "some domain", true);
        Console.WriteLine(result);
    }

    public static string SafeReplace(string input, string find, string replace, bool matchWholeWord)
    {
        string textToFind = matchWholeWord ? string.Format(@"(?<!\S){0}(?!\S)", Regex.Escape(find)) : find;
        return Regex.Replace(input, textToFind, replace.Replace("$", "$$"));
    }
}
See the C# demo.
The Regex.Escape(find) is only necessary if you expect special regex metacharacters in the find variable value.
The regex demo is available at regexstorm.net.
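The `$$` point above can be seen in isolation with a small sketch. The class and method names are hypothetical. It relies on `$0` in a .NET replacement pattern standing for the whole match.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReplacementEscapeDemo
{
    // Passes the replacement text through unchanged, so "$0" is treated
    // as a substitution (the whole match).
    public static string Raw(string input, string find, string replace) =>
        Regex.Replace(input, Regex.Escape(find), replace);

    // Doubles every '$' first, so the replacement text is inserted literally.
    public static string Escaped(string input, string find, string replace) =>
        Regex.Replace(input, Regex.Escape(find), replace.Replace("$", "$$"));

    public static void Main()
    {
        Console.WriteLine(Raw("fee: X", "X", "US$0"));     // fee: USX
        Console.WriteLine(Escaped("fee: X", "X", "US$0")); // fee: US$0
    }
}
```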

Regular Expression oddity, why does this happen?

This simple regular expression matches the text of Movie. Am I wrong in reading this as "Q repeated zero or more times"? Why does it match, shouldn't it return false?
public class Program
{
    private static void Main(string[] args)
    {
        Regex regex = new Regex("Q*");
        string input = "Movie";
        if (regex.IsMatch(input))
        {
            Console.WriteLine("Yup.");
        }
        else
        {
            Console.WriteLine("Nope.");
        }
    }
}
As you correctly say, it means “Q repeated zero or more times”. In this case, it’s zero times, so you are essentially trying to match "" in your input string. As IsMatch doesn’t care where it matches, it can match the empty string anywhere within your input string, so it returns true.
If you want to make sure that the whole input string has to match, you can add ^ and $: "^Q*$".
Regex regex = new Regex("^Q*$");
Console.WriteLine(regex.IsMatch("Movie")); // false
Console.WriteLine(regex.IsMatch("QQQ")); // true
Console.WriteLine(regex.IsMatch("")); // true
You are right in reading this regex as Q repeated 0 or more times. The catch is the 0. When you run a regex, it will try to find any successful match.
The only way for the regex to match this string is to match an empty string (0 repetitions), which it can do at any position between characters. If you didn't know that before: yes, a regex can match empty strings between characters. You can try:
(Q*)
To get a capture group and use .Matches and Groups[1].Value to see what has been captured. You'll see that it's an empty string.
Usually, if you want to check the existence of a character, you don't use regex, but use .Contains. Otherwise, if you do want to use regex, you'd drop the quantifier, or use one which matches at least one particular character.
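The empty matches can be listed directly. This is a minimal sketch with a hypothetical class name, using `Regex.Matches` as suggested above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmptyMatchDemo
{
    // Number of matches a pattern produces in the input.
    public static int CountMatches(string input, string pattern) =>
        Regex.Matches(input, pattern).Count;

    public static void Main()
    {
        // "(Q*)" succeeds with an empty match at every position in "Movie",
        // including the position after the last character.
        foreach (Match m in Regex.Matches("Movie", "(Q*)"))
        {
            Console.WriteLine($"index {m.Index}, captured \"{m.Groups[1].Value}\"");
        }

        Console.WriteLine(CountMatches("Movie", "(Q*)")); // 6

        // A quantifier that requires at least one Q no longer matches.
        Console.WriteLine(Regex.IsMatch("Movie", "Q+"));  // False
    }
}
```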

Why isn't this C# regular expression working?

I tried to write an expression to validate the following pattern:
digit[0-9] at 1 time exactly
"dot"
digit[0-9] 1-2 times
"dot"
digit[0-9] 1-3 times
"dot"
digit[0-9] 1-3 times or “hyphen”
For example these are legal numbers:
1.10.23.5
1.10.23.-
these aren't:
10.10.23.5
1.254.25.3
I used RegexBuddy to write the next pattern:
[0-9]\.[0-9]{1,2}\.[0-9]{1,3}\.[0-9]{1,3}|[0-9]\.[0-9]{1,2}\.[0-9]{1,3}\.-
In RegexBuddy all seems perfect but in my code I am getting true about illegal numbers (like 10.1.1.1)
I wrote the next method for validating this pattern:
public static bool IsVaildEc(string ec)
{
    try
    {
        if (String.IsNullOrEmpty(ec))
            return false;
        string pattern = @"[0-9]\.[0-9]{1,2}\.[0-9]{1,3}\.[0-9]{1,3}|[0-9]\.[0-9]{1,2}\.[0-9]{1,3}\.-";
        Regex check = new Regex(pattern);
        return check.IsMatch(ec);
    }
    catch (Exception ex)
    {
        //logger
        return false; // treat a failure as invalid input so every code path returns a value
    }
}
What am I doing wrong?
Your regex isn't anchored to the start and end of the string, so it also matches a substring (e.g. 0.1.1.1 in the string 10.1.1.1).
RegexBuddy matches a substring in the first "illegal" number. It correctly fails to match the second number because the three digits in the second octet can't be matched at all. Anchoring the pattern fixes the problem:
string pattern = @"^(?:[0-9]\.[0-9]{1,2}\.[0-9]{1,3}\.[0-9]{1,3}|[0-9]\.[0-9]{1,2}\.[0-9]{1,3}\.-)$";
Also, your regex is needlessly complicated. The following does the same but is simpler:
string pattern = @"^[0-9]\.[0-9]{1,2}\.[0-9]{1,3}\.(?:[0-9]{1,3}|-)$";
Try:
@"^[0-9]\.[0-9]{1,2}\.[0-9]{1,3}\.[0-9]{1,3}|[0-9]\.[0-9]{1,2}\.[0-9]{1,3}\.-"
You are not starting from the beginning of the text.
If you match against "10.1.1.1", the "0.1.1.1" part of your string would be a correct number and therefore return true.
Matching against
@"^[0-9]\.[0-9]{1,2}\.[0-9]{1,3}\.[0-9]{1,3}|[0-9]\.[0-9]{1,2}\.[0-9]{1,3}\.-"
with the ^ sign at the beginning means that you want to match from the beginning.
You are missing the ^ character at the start of the regex.
Try this regex:
^[0-9]\.[0-9]{1,2}\.[0-9]{1,3}\.[0-9]{1,3}|[0-9]\.[0-9]{1,2}\.[0-9]{1,3}\.-
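One caveat with a leading ^ on its own: alternation has the lowest precedence, so the ^ applies only to the first alternative, and without $ extra trailing characters are still accepted. A minimal sketch comparing the two forms follows. The class and method names are hypothetical, and the extra test inputs are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // Leading ^ only: it binds to the first alternative, and nothing
    // anchors the end of the input.
    private const string StartOnly =
        @"^[0-9]\.[0-9]{1,2}\.[0-9]{1,3}\.[0-9]{1,3}|[0-9]\.[0-9]{1,2}\.[0-9]{1,3}\.-";

    // ^ and $ around the whole pattern, with the alternation grouped.
    private const string Full =
        @"^[0-9]\.[0-9]{1,2}\.[0-9]{1,3}\.(?:[0-9]{1,3}|-)$";

    public static bool StartAnchored(string ec) => Regex.IsMatch(ec, StartOnly);
    public static bool FullyAnchored(string ec) => Regex.IsMatch(ec, Full);

    public static void Main()
    {
        string[] samples = { "1.10.23.5", "1.10.23.-", "10.1.1.-", "1.10.23.5555", "1.254.25.3" };
        foreach (string s in samples)
        {
            Console.WriteLine($"{s}: start-only={StartAnchored(s)}, full={FullyAnchored(s)}");
        }
        // "10.1.1.-" and "1.10.23.5555" pass the start-only pattern
        // but are rejected by the fully anchored one.
    }
}
```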
This C# Regex Cheat Sheet can be handy

C# Regex for retrieving capital string in quotation mark

Given a string, I want to retrieve a string that is in between the quotation marks, and that is fully capitalized.
For example, if a string of
oqr"awr"q q"ASRQ" asd "qIKQWIR"
has been entered, the regex would only evaluate "ASRQ" as matching string.
What is the best way to approach this?
Edit: Forgot to mention the string takes a numeric input as well I.E: "IO8917AS" is a valid input
EDIT: If you actually want "one or more characters, and none of the characters is a lower-case letter" then you probably want:
Regex regex = new Regex("\"\\P{Ll}+\"");
That will then allow digits as well... and punctuation. If you want to allow digits and upper case letters but nothing else, you can use:
Regex regex = new Regex("\"[\\p{Lu}\\d]+\"");
Or in verbatim string literal form (makes the quotes more confusing, but the backslashes less so):
Regex regex = new Regex(@"""[\p{Lu}\d]+""");
Original answer (before digits were required)
Sounds like you just want (within the pattern)
"[A-Z]*"
So something like:
Regex regex = new Regex("\"[A-Z]*\"");
Or for full Unicode support, use the Lu Unicode character category:
Regex regex = new Regex("\"\\p{Lu}*\"");
EDIT: As noted, if you don't want to match an empty string in quotes (which is still "a string where everything is upper case") then use + instead of *, e.g.
Regex regex = new Regex("\"\\p{Lu}+\"");
Short but complete example of finding and displaying the first match:
using System;
using System.Text.RegularExpressions;

class Program
{
    public static void Main()
    {
        Regex regex = new Regex("\"\\p{Lu}+\"");
        string text = "oqr\"awr\"q q\"ASRQ\" asd \"qIKQWIR\"";
        Match match = regex.Match(text);
        Console.WriteLine(match.Success); // True
        Console.WriteLine(match.Value); // "ASRQ"
    }
}
Like this:
"\"[A-Z]+\""
The outermost quotes are not part of the regex, they delimit a C# string.
This requires at least one uppercase character between quotes and works for the English language.
Please try the following:
[\w]*"([A-Z0-9]+)"
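For the edited requirement (upper-case letters and digits), a short sketch that collects every match rather than only the first. The class and method names are hypothetical, and the capture group around the character class is an addition so that the quotes can be left out of the result.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QuotedUpperDemo
{
    // Returns the contents of every quoted run that consists only of
    // upper-case letters and digits.
    public static List<string> Extract(string text)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(text, "\"([\\p{Lu}\\d]+)\""))
        {
            results.Add(m.Groups[1].Value);
        }
        return results;
    }

    public static void Main()
    {
        string text = "oqr\"awr\"q q\"ASRQ\" asd \"qIKQWIR\" \"IO8917AS\"";
        Console.WriteLine(string.Join(", ", Extract(text))); // ASRQ, IO8917AS
    }
}
```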

Using Regex to match quoted string with embedded, non-escaped quotes

I am trying to match a string in the following pattern with a regex.
string text = "'Emma','The Last Leaf','Gulliver's travels'";
string pattern = @"'(.*?)',?";
foreach (Match match in Regex.Matches(text, pattern, RegexOptions.IgnoreCase))
{
    Console.WriteLine(match + " " + match.Index);
    Console.WriteLine(match.Groups[1].Captures[0]);
}
This matches "Emma" and "The Last leaf" correctly, however the third match is "Gulliver". But the desired match is "Gulliver's travels". How can I build a regex for a patterns like this?
Since , is your delimiter, you can try changing your pattern like this. It should work.
string pattern = @"'(.*?)'(?:,|$)";
The way this works is, it looks for a single quote followed by a comma or end of the line.
I think this can work '(.*?)',|'(.*)' as regular expression.
You may consider using lookbehind/lookahead:
"(?<=^'|',').*?(?='$|',')"
test with grep:
kent$ echo "'Emma','The Last Leaf','Gulliver's travels'"|grep -Po "(?<=^'|',').*?(?='$|',')"
Emma
The Last Leaf
Gulliver's travels
You can't. If you have single-quote delimited strings and Gulliver's contains a single, unescaped quote, there's no way to distinguish it from the end of a string. You could always just split on commas and trim the quotes from either side, but I'm not sure that's what you want:
string text = "'Emma','The Last Leaf','Gulliver's travels'";
foreach (string s in text.Split(','))
{
    Console.WriteLine(s.Trim('\''));
}
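The two regex suggestions above can be tried side by side in C#. This is a sketch with hypothetical class and method names. It assumes, as those answers do, that a closing quote is always followed by a comma or the end of the input, so a title containing `','` would still break it.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QuotedListDemo
{
    // Closing quote must be followed by a comma or the end of the input.
    public static List<string> WithDelimiterCheck(string text)
    {
        var items = new List<string>();
        foreach (Match m in Regex.Matches(text, @"'(.*?)'(?:,|$)"))
        {
            items.Add(m.Groups[1].Value);
        }
        return items;
    }

    // Lookbehind/lookahead version: the quotes are asserted, not consumed.
    public static List<string> WithLookarounds(string text)
    {
        var items = new List<string>();
        foreach (Match m in Regex.Matches(text, @"(?<=^'|',').*?(?='$|',')"))
        {
            items.Add(m.Value);
        }
        return items;
    }

    public static void Main()
    {
        string text = "'Emma','The Last Leaf','Gulliver's travels'";
        foreach (string item in WithDelimiterCheck(text))
        {
            Console.WriteLine(item);
        }
        foreach (string item in WithLookarounds(text))
        {
            Console.WriteLine(item);
        }
    }
}
```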
