Why isn't this C# regular expression working?

I tried to write an expression to validate the following pattern:
a digit [0-9], exactly once
"dot"
a digit [0-9], 1-2 times
"dot"
a digit [0-9], 1-3 times
"dot"
a digit [0-9] 1-3 times, or a "hyphen"
For example, these are legal numbers:
1.10.23.5
1.10.23.-
These aren't:
10.10.23.5
1.254.25.3
I used RegexBuddy to write the following pattern:
[0-9]\.[0-9]{1,2}\.[0-9]{1,3}\.[0-9]{1,3}|[0-9]\.[0-9]{1,2}\.[0-9]{1,3}\.-
In RegexBuddy everything seems perfect, but in my code I get true for illegal numbers (like 10.1.1.1).
I wrote the following method to validate this pattern:
public static bool IsVaildEc(string ec)
{
    try
    {
        if (String.IsNullOrEmpty(ec))
            return false;
        string pattern = @"[0-9]\.[0-9]{1,2}\.[0-9]{1,3}\.[0-9]{1,3}|[0-9]\.[0-9]{1,2}\.[0-9]{1,3}\.-";
        Regex check = new Regex(pattern);
        return check.IsMatch(ec);
    }
    catch (Exception ex)
    {
        //logger
        return false; // every code path must return a value
    }
}
What am I doing wrong?

Your regex isn't anchored to the start and end of the string, so it also matches a substring (e.g. 0.1.1.1 in the string 10.1.1.1).
RegexBuddy also matches a substring in the first "illegal" number. It correctly fails to match the second number only because the three digits in the second octet can't be matched at all.
Anchoring the pattern, with the alternation grouped so that both anchors apply to both alternatives, will fix that problem:
string pattern = @"^(?:[0-9]\.[0-9]{1,2}\.[0-9]{1,3}\.[0-9]{1,3}|[0-9]\.[0-9]{1,2}\.[0-9]{1,3}\.-)$";
Also, your regex is needlessly complicated. The following does the same thing, more simply:
string pattern = @"^[0-9]\.[0-9]{1,2}\.[0-9]{1,3}\.(?:[0-9]{1,3}|-)$";
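As a minimal sketch, here is the asker's method rewritten around the simplified, anchored pattern above. The class name EcValidator and the corrected spelling IsValidEc are hypothetical. The try/catch is dropped on the assumption that constructing a constant, valid pattern cannot throw, and that IsMatch on a non-null string does not throw when no match timeout is configured.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EcValidator
{
    // ^ and $ anchor the pattern, so the whole string must match, not just a substring.
    // The alternation only covers the last component: 1-3 digits or a hyphen.
    private static readonly Regex EcPattern =
        new Regex(@"^[0-9]\.[0-9]{1,2}\.[0-9]{1,3}\.(?:[0-9]{1,3}|-)$");

    public static bool IsValidEc(string ec) =>
        !string.IsNullOrEmpty(ec) && EcPattern.IsMatch(ec);

    public static void Main()
    {
        string[] samples = { "1.10.23.5", "1.10.23.-", "10.10.23.5", "1.254.25.3", "10.1.1.1" };
        foreach (string s in samples)
            Console.WriteLine($"{s}: {IsValidEc(s)}");
    }
}
```

One design note: in .NET, $ also matches just before a final "\n", so "1.10.23.5\n" still passes. Using \z instead of $ rejects that case as well.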

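try:
@"^(?:[0-9]\.[0-9]{1,2}\.[0-9]{1,3}\.[0-9]{1,3}|[0-9]\.[0-9]{1,2}\.[0-9]{1,3}\.-)$"
You are not anchoring the match to the beginning (and end) of the text. The non-capturing group makes ^ and $ apply to both alternatives; without it, ^ would only apply to the first one.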

If you match against "10.1.1.1", the "0.1.1.1" part of the string is a valid number, so IsMatch returns true.
Matching against
@"^(?:[0-9]\.[0-9]{1,2}\.[0-9]{1,3}\.[0-9]{1,3}|[0-9]\.[0-9]{1,2}\.[0-9]{1,3}\.-)$"
with the ^ sign at the beginning means that the match must start at the beginning of the string, and the $ at the end means it must finish at the end of the string.

You are missing the ^ character at the start of the regex (and the $ at the end).
Try this regex:
^(?:[0-9]\.[0-9]{1,2}\.[0-9]{1,3}\.[0-9]{1,3}|[0-9]\.[0-9]{1,2}\.[0-9]{1,3}\.-)$
This C# Regex Cheat Sheet can be handy.

Related

Regular Expression oddity, why does this happen?

This simple regular expression matches the text "Movie". Am I wrong in reading it as "Q repeated zero or more times"? Why does it match? Shouldn't it return false?
using System;
using System.Text.RegularExpressions;
public class Program
{
    private static void Main(string[] args)
    {
        Regex regex = new Regex("Q*");
        string input = "Movie";
        if (regex.IsMatch(input))
        {
            Console.WriteLine("Yup.");
        }
        else
        {
            Console.WriteLine("Nope.");
        }
    }
}

As you correctly say, it means “Q repeated zero or more times”. In this case it’s zero times, so you are essentially trying to match "" in your input string. Since IsMatch doesn’t care where it matches, it can match the empty string anywhere within your input string, so it returns true.
If you want to make sure that the whole input string has to match, you can add ^ and $: "^Q*$".
Regex regex = new Regex("^Q*$");
Console.WriteLine(regex.IsMatch("Movie")); // false
Console.WriteLine(regex.IsMatch("QQQ")); // true
Console.WriteLine(regex.IsMatch("")); // true

You are right in reading this regex as Q repeated 0 or more times. The catch is the 0: when you apply a regex, the engine looks for any successful match.
The only way for this regex to match the string is to match an empty string (Q zero times), which exists at every position between characters. In case you didn't know that before: yes, a regex can match empty strings between characters. You can try:
(Q*)
to get a capturing group, then use .Matches and Groups[1].Value to see what has been captured. You'll see that it's an empty string.
Usually, if you want to check whether a character is present, you don't use a regex; you use .Contains. If you do want to use a regex, drop the quantifier, or use one that requires at least one occurrence, such as Q+.
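The experiment suggested above can be sketched as follows. This sketch runs (Q*) through Regex.Matches and records each match position together with what group 1 captured. It also shows two checks that require at least one Q. The class and method names are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class EmptyMatchDemo
{
    // Runs (Q*) over the input and returns each match position with what group 1 captured.
    public static (int Index, string Captured)[] CaptureAll(string input) =>
        Regex.Matches(input, "(Q*)")
             .Cast<Match>()
             .Select(m => (Index: m.Index, Captured: m.Groups[1].Value))
             .ToArray();

    public static void Main()
    {
        // For "Movie" every match is empty: one at each position 0..5, the end included.
        foreach (var (index, captured) in CaptureAll("Movie"))
            Console.WriteLine($"index {index}: \"{captured}\"");

        // Checks that require at least one Q:
        Console.WriteLine(Regex.IsMatch("Movie", "Q+")); // False
        Console.WriteLine("Movie".Contains("Q"));        // False
    }
}
```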

How to find repeatable characters

I can't understand how to solve the following problem:
I have the input string "aaaabaa" and I'm searching for the string "aa" (I'm looking for the positions of the characters, including overlapping occurrences).
Expected result: 0, 1, 2, 5
index 0: [aa]aabaa
index 1: a[aa]abaa
index 2: aa[aa]baa
index 5: aaaab[aa]
I have already solved this problem with another (non-regex) approach, but I need a regex. I'm new to regex, so searching Google hasn't really helped.
Any help is appreciated! Thanks!
P.S.
I've tried (aa)* and "\b(\w+(aa))*\w+", but those expressions are wrong.

You can solve this by using a lookahead:
a(?=a)
will find every "a" that is followed by another "a".
If you want to do this more generally:
(\p{L})(?=\1)
This finds every character that is followed by the same character. Each found letter is stored in a capturing group (because of the parentheses around it). The positive lookahead assertion (the (?=...)) then reuses this capturing group via \1, which holds the matched character.
\p{L} matches a Unicode code point in the category "letter".
Code
String text = "aaaabaa";
Regex reg = new Regex(@"(\p{L})(?=\1)");
MatchCollection result = reg.Matches(text);
foreach (Match item in result)
{
    Console.WriteLine(item.Index);
}
Output
0
1
2
5

The following code should work with any regular expression without having to change the actual expression:
Regex rx = new Regex(@"(a)\1"); // or any other pattern you're looking for
int position = 0;
string text = "aaaaabbbbccccaaa";
int textLength = text.Length;
Match m = rx.Match(text, position);
while (m != null && m.Success)
{
    Console.WriteLine(m.Index);
    // Restart one character after the start of this match so overlapping matches are found.
    // Match(string, int) throws if the start index exceeds the string length, hence the < check.
    if (m.Index < textLength)
    {
        m = rx.Match(text, m.Index + 1);
    }
    else
    {
        m = null;
    }
}
Console.ReadKey();
It uses the overload of Match that takes a start index, so each consecutive search begins one character after the start of the previous match. The underlying problem is that, by default, the regex engine always continues searching after the end of the previous match. It therefore never finds a match that overlaps another one unless you instruct it to, either with a lookahead construct or by setting the start index manually.
Another relatively easy solution is to put the whole expression inside a lookahead:
string expression = @"(a)\1";
Regex rx2 = new Regex("(?=" + expression + ")");
MatchCollection ms = rx2.Matches(text);
var indexes = ms.Cast<Match>().Select(match => match.Index); // requires using System.Linq;
Because a lookahead consumes no characters, every match is empty, so the engine automatically advances the index by one after each match.
From the docs:
When a match attempt is repeated by calling the NextMatch method, the regular expression engine gives empty matches special treatment. Usually, NextMatch begins the search for the next match exactly where the previous match left off. However, after an empty match, the NextMatch method advances by one character before trying the next match. This behavior guarantees that the regular expression engine will progress through the string. Otherwise, because an empty match does not result in any forward movement, the next match would start in exactly the same place as the previous match, and it would match the same empty string repeatedly.
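The lookahead trick can be packaged as a small helper for literal search strings such as "aa" from the question. FindOverlapping is a hypothetical name. Regex.Escape is used on the assumption that the search text is a literal, so characters like . or * in it are not treated as regex syntax.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class OverlapFinder
{
    // Returns the start index of every occurrence of a literal substring, overlapping ones
    // included. The escaped literal sits inside a zero-width lookahead, so each match is
    // empty and the engine moves on by one character after every match.
    public static int[] FindOverlapping(string text, string literal)
    {
        if (string.IsNullOrEmpty(literal))
            throw new ArgumentException("literal must be non-empty", nameof(literal));

        var rx = new Regex("(?=" + Regex.Escape(literal) + ")");
        return rx.Matches(text).Cast<Match>().Select(m => m.Index).ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" ", FindOverlapping("aaaabaa", "aa"))); // 0 1 2 5
    }
}
```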

Try this:
How can I find repeated characters with a regex in Java?
It's in Java, but both the regex and the non-regex approaches are there. C# regex syntax is very similar to Java's.

Regular expressions match not working accurately with semicolon

I have a string of codes like:
0926;0941;0917;0930;094D;
I want to search for 0930;094D; in the string above. I am using this code to find a string fragment:
static bool ExactMatch(string input, string match)
{
    return Regex.IsMatch(input, string.Format(@"\b{0}\b", Regex.Escape(match)));
}
The problem is that the code sometimes works and sometimes doesn't. If I match a single code, for example 0930;, it works, but when I add 094D; it finds no match.
How can I refine the code so that it works reliably with semicolons?

Try this; I have tested it:
string val = "0926;0941;0917;0930;094D;";
string match = "0930;094D;"; // or match = "0930;"; both are found
if (Regex.IsMatch(val, match))
    Console.Write("Found");
else
    Console.Write("Not Found");

"\b" denotes a word boundary: a position with a word character on one side and a non-word character (or the start or end of the string) on the other. A semicolon is not a word character, and the final semicolon in "0926;0941;0917;0930;094D;" is followed by the end of the string. So there is no word boundary there, and the regex finds no match.
Why not just remove the last "\b" in your Regex?
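The sketch below contrasts the question's \b version with one possible alternative that anchors on the ';' separators instead of word boundaries. The class name and the lookbehind (?<![^;]) are suggestions for illustration, not taken from the answers. The lookbehind also avoids a false positive that a plain Contains check reports for a fragment such as 30;094D;, which appears inside 0930;094D; but does not start at a code boundary.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CodeMatcher
{
    // The question's approach: \b on both sides of the escaped fragment. It fails when the
    // fragment ends with ';' and the end of the string follows, because a word boundary
    // needs a word character on exactly one side.
    public static bool WithWordBoundaries(string input, string fragment) =>
        Regex.IsMatch(input, string.Format(@"\b{0}\b", Regex.Escape(fragment)));

    // Suggested alternative (an assumption): the fragment must start at the beginning of the
    // string or right after a ';'. (?<![^;]) reads "not preceded by a character other than ';'".
    public static bool AtCodeBoundary(string input, string fragment) =>
        Regex.IsMatch(input, @"(?<![^;])" + Regex.Escape(fragment));

    public static void Main()
    {
        string codes = "0926;0941;0917;0930;094D;";
        Console.WriteLine(WithWordBoundaries(codes, "0930;"));      // True
        Console.WriteLine(WithWordBoundaries(codes, "0930;094D;")); // False
        Console.WriteLine(AtCodeBoundary(codes, "0930;094D;"));     // True
        Console.WriteLine(AtCodeBoundary(codes, "30;094D;"));       // False
        Console.WriteLine(codes.Contains("30;094D;"));              // True: Contains ignores code boundaries
    }
}
```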

Perhaps I'm not understanding your situation correctly, but if you're looking for an exact match within the string, couldn't you simply avoid regex and use string.Contains:
static bool ExactMatch(string input, string match)
{
    return input.Contains(match);
}

.NET Regex - "Not" Match

I have a regular expression:
12345678|[0]{8}|[1]{8}|[2]{8}|[3]{8}|[4]{8}|[5]{8}|[6]{8}|[7]{8}|[8]{8}|[9]{8}
which matches if the string contains 12345678, 00000000, 11111111, ..., or 99999999.
How can I change this so that it only matches if the string does NOT match the above? (Unfortunately, I can't just use !IsMatch in C#.) EDIT: that's because the C# code is a black box to me; I'm trying to set the regex in an existing config file.

This will match everything...
foundMatch = Regex.IsMatch(SubjectString, @"^(?:(?!123456789|(\d)\1{7}).)*$");
...unless one of the "forbidden" sequences is found in the string.
As you can see, the regex itself doesn't rely on !IsMatch.
Edit:
Adding your second constraint can be done with a lookahead assertion:
foundMatch = Regex.IsMatch(SubjectString, @"^(?=\d{9,12})(?:(?!123456789|(\d)\1{7}).)*$");

Works perfectly:
string s = "55555555";
Regex regx = new Regex(@"^(?:12345678|(\d)\1{7})$");
if (!regx.IsMatch(s))
{
    Console.WriteLine("It does not match!!!");
}
else
{
    Console.WriteLine("it matched");
}
Console.ReadLine();
By the way, I simplified your expression a bit and added anchors:
^(?:12345678|(\d)\1{7})$
The (\d)\1{7} part takes a digit \d, and \1{7} checks that the same digit is repeated 7 more times.
Update
This regex does what you want: it matches only strings that are not one of the forbidden values.
Regex regx = new Regex(@"^(?!(?:12345678|(\d)\1{7})$).*$");
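A quick way to check the Update pattern against a few inputs is sketched below. The class name NotForbidden and the sample strings are assumptions for illustration. Because the lookahead ends in $, the pattern rejects a string only if the entire string is forbidden, so a 9-digit string such as 555555555 is allowed.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NotForbidden
{
    // Matches only when the WHOLE string is neither 12345678 nor eight copies of one digit.
    // ^ pins the negative lookahead (?!...$) to position 0; .*$ then consumes the rest.
    private static readonly Regex Allowed =
        new Regex(@"^(?!(?:12345678|(\d)\1{7})$).*$");

    public static bool IsAllowed(string s) => Allowed.IsMatch(s);

    public static void Main()
    {
        string[] samples = { "55555555", "12345678", "00000000", "12345679", "555555555", "abc" };
        foreach (string s in samples)
            Console.WriteLine($"{s}: {IsAllowed(s)}");
    }
}
```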

First of all, you don't need any of those [] brackets; you can just do 0{8}|1{8}| etc.
Now for your problem: try using a negative lookahead:
@"^(?:(?!123456789|(\d)\1{7}).)*$"
That should take care of your issue without using !IsMatch.

I am not able to just !IsMatch in the C# unfortunately.
Why not? What's wrong with the following solution?
bool notMatch = !Regex.IsMatch(yourString, "^(12345678|[0]{8}|[1]{8}|[2]{8}|[3]{8}|[4]{8}|[5]{8}|[6]{8}|[7]{8}|[8]{8}|[9]{8})$");
notMatch will be true for any string other than exactly 12345678, 00000000, 11111111, ..., 99999999.

Simple regex question C#

I need to match strings like the ones shown in the window below:
8% of setup_av_free.exe from software-files-l.cnet.com Completed
98% of test.zip from 65.55.72.119 Completed
that is: [numeric]% of [filename] from [hostname | IP address] Completed
I have written half of the regex pattern:
if (Regex.IsMatch(text, @"[\d]+%[\s]of[\s](.+?)(\.[^.]*)[\s]from[\s]"))
    MessageBox.Show(text);
and now I need to integrate the following regexes into the code above:
ValidIpAddressRegex = @"^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])$";
ValidHostnameRegex = @"^(([a-zA-Z]|[a-zA-Z][a-zA-Z0-9\-]*[a-zA-Z0-9])\.)*([A-Za-z]|[A-Za-z][A-Za-z0-9\-]*[A-Za-z0-9])$";
The two regexes were taken from this link. They work well when I use Regex.IsMatch to match "123.123.123.123" and "software-files-l.cnet.com". However, I can't get them to work when I integrate both of them into my existing regex. I tried several variants without success. Can someone show me how to integrate the two regexes into my existing code? Thanks in advance.

You can certainly combine all these regular expressions into one, but I'd recommend against it. Consider this method: first it checks whether your input text has the correct overall form, then it checks whether the "from" part is an IP address or a hostname.
bool CheckString(string text)
{
    const string ValidIpAddressRegex = @"^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])$";
    const string ValidHostnameRegex = @"^(([a-zA-Z]|[a-zA-Z][a-zA-Z0-9\-]*[a-zA-Z0-9])\.)*([A-Za-z]|[A-Za-z][A-Za-z0-9\-]*[A-Za-z0-9])$";
    var match = Regex.Match(text, @"[\d]+%[\s]of[\s](.+?)(\.[^.]*)[\s]from[\s](\S+)");
    if (!match.Success)
        return false;
    // Group 3 is the token after "from".
    string address = match.Groups[3].Value;
    return Regex.IsMatch(address, ValidIpAddressRegex) ||
           Regex.IsMatch(address, ValidHostnameRegex);
}
It does what you want and is much more readable than a single monster-sized regular expression. Unless you call this method millions of times in a loop, there is no reason to worry about it being slower than a single regex.
Also, in case you aren't aware, the brackets around \d and \s aren't necessary.
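If CheckString does end up in a hot loop, one option is to build the three Regex objects once and reuse them. The sketch below assumes that scenario. RegexOptions.Compiled trades slower construction for faster matching, and whether that pays off here should be measured rather than assumed. The class name DownloadLineChecker is hypothetical, and the brackets around \d and \s are dropped since they are unnecessary.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DownloadLineChecker
{
    // Same three patterns as CheckString above, constructed once and reused.
    private static readonly Regex Line =
        new Regex(@"\d+%\sof\s(.+?)(\.[^.]*)\sfrom\s(\S+)", RegexOptions.Compiled);
    private static readonly Regex IpAddress =
        new Regex(@"^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])$", RegexOptions.Compiled);
    private static readonly Regex Hostname =
        new Regex(@"^(([a-zA-Z]|[a-zA-Z][a-zA-Z0-9\-]*[a-zA-Z0-9])\.)*([A-Za-z]|[A-Za-z][A-Za-z0-9\-]*[A-Za-z0-9])$", RegexOptions.Compiled);

    public static bool CheckString(string text)
    {
        Match match = Line.Match(text);
        if (!match.Success)
            return false;

        // Group 3 is the token after "from": accept it if it is an IP address or a hostname.
        string address = match.Groups[3].Value;
        return IpAddress.IsMatch(address) || Hostname.IsMatch(address);
    }

    public static void Main()
    {
        Console.WriteLine(CheckString("8% of setup_av_free.exe from software-files-l.cnet.com Completed")); // True
        Console.WriteLine(CheckString("98% of test.zip from 65.55.72.119 Completed"));                      // True
    }
}
```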

The reason those two regexes don't match your string is that they start with ^ and end with $:
^ matches the start of the string (or of a line, if the m modifier, RegexOptions.Multiline, is enabled)
$ matches the end of the string (or of a line, if the m modifier is enabled)
When you test them on their own, they match, because the address is the whole string. In your real text, however, the address is in the middle of the string, so it doesn't match.
Just remove the ^ at the very beginning and the $ at the very end.

Here you go.
^[\d]+%\s+of\s+(.+?)(\.[^.]*)\s+from\s+((([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])|((([a-zA-Z]|[a-zA-Z][a-zA-Z0-9\-]*[a-zA-Z0-9])\.)*([A-Za-z]|[A-Za-z][A-Za-z0-9\-]*[A-Za-z0-9])))\s+Completed
Remove the ^ and $ characters from the ValidIpAddressRegex and ValidHostnameRegex patterns above, then combine them, separated by the alternation character (|) and enclosed in parentheses.
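The recipe just described (strip the anchors, join with |, wrap in parentheses) can also be done in code instead of by hand. The sketch below keeps the two patterns exactly as given and composes them. The class name CombinedPattern and the helper StripAnchors are hypothetical. The address ends up in group 3, because groups are numbered by the order of their opening parentheses.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CombinedPattern
{
    const string ValidIpAddressRegex = @"^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])$";
    const string ValidHostnameRegex = @"^(([a-zA-Z]|[a-zA-Z][a-zA-Z0-9\-]*[a-zA-Z0-9])\.)*([A-Za-z]|[A-Za-z][A-Za-z0-9\-]*[A-Za-z0-9])$";

    // Removes the leading ^ and trailing $ so a pattern can be embedded in a larger one.
    static string StripAnchors(string pattern) => pattern.TrimStart('^').TrimEnd('$');

    // Group 3 wraps "IP | hostname"; its parentheses stop the | from splitting the whole pattern.
    public static readonly Regex Line = new Regex(
        @"^\d+%\s+of\s+(.+?)(\.[^.]*)\s+from\s+(" +
        StripAnchors(ValidIpAddressRegex) + "|" + StripAnchors(ValidHostnameRegex) +
        @")\s+Completed$");

    public static void Main()
    {
        Match m = Line.Match("98% of test.zip from 65.55.72.119 Completed");
        Console.WriteLine(m.Success ? m.Groups[3].Value : "no match"); // 65.55.72.119
    }
}
```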

You could use this; it should work for all cases. I might have accidentally deleted a character while formatting, so let me know if it doesn't work.
string captureString = "8% of setup_av_free.exe from software-files-l.cnet.com Completed";
Regex reg = new Regex(@"(?<perc>\d+)% of (?<file>\w+\.\w+) from (?<host>" +
    @"(\d+\.\d+\.\d+\.\d+)|(((https?|ftp|gopher|telnet|file|notes|ms-help):" +
    @"((//)|(\\\\))+)?[\w\d:#@%/;$()~_?\+-=\\\.&]*)) Completed");
Match m = reg.Match(captureString);
string perc = m.Groups["perc"].Value;
string file = m.Groups["file"].Value;
string host = m.Groups["host"].Value;

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.
