C# Regular Expression Match Failing


Here's the regular expression pattern:
string testerpattern = @"\s+\d+:\s+\w\w\w\w\w\w\s+..:..:..:..:..:..:..:..\s+\d+.\d+.\d+.\d+\s+\d+.\d+.\d+.\d+\s+""\w +""";
Here are some lines of text I want to match. There will be one or more spaces at the beginning of each line. Once it works, I will modify it to use named groups. Basically, I want to capture most of the line in one pass rather than running a separate match on the line for each pattern.
2: fffc02 10:00:00:05:1e:36:5f:82 172.31.3.93 0.0.0.0 "SAN002A"
3: fffc03 10:00:00:05:1e:e2:a7:00 172.31.3.168 0.0.0.0 "SAN003A"
4: fffc04 50:00:51:e8:cc:2f:ae:01 0.0.0.0 0.0.0.0 "fcr_fd_4"
Here's the static class I wrote to do the matches. It works elsewhere in my program, so I'm assuming the pattern is the problem. The pattern matches successfully on Regexr.com.
public static class RegexExtensions
{
    public static bool TryMatch(out Match match, string input, string pattern)
    {
        match = Regex.Match(input, pattern);
        return match.Success;
    }

    public static bool TryMatch(out MatchCollection match, string input, string pattern)
    {
        match = Regex.Matches(input, pattern);
        return match.Count > 0;
    }
}

First of all, remove the space between \w and + if you intend to match one or more word characters. As written, \w + matches a single word character followed by one or more spaces.
Next, if you need to match a literal dot, you must either escape it, \., or put it into a character class, [.].
Also, you can use limiting quantifiers to shorten the pattern if you do not need captures. Here is how your pattern can be written:
string pat = @"\s+\d+:\s+\w{6}\s+(?:..:){7}..(?:\s+\d+(?:\.\d+){3}){2}\s+""\w+""";
Here, \w{6} matches 6 "word" chars, (?:..:){7} matches 7 sequences of any 2 chars other than a newline followed by :, and so on.
If you still need captures, you can apply the same ideas:
\s+(\d+):\s+(\w{6})\s+(..(?::..){3}):((?:..:){3}..)\s+(\d+(?:\.\d+){3})\s+(\d+(?:\.\d+){3})\s+"(\w+)"
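Since the question mentions moving to named matches later, here is a minimal sketch of how the capturing pattern might look with named groups. The group names (index, id, address, ip1, ip2, name) and the class name are made up for illustration, and the colon-separated address is kept in one group rather than two. The sample lines have leading spaces added, because the question says each line starts with one or more spaces.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SwitchLineDemo
{
    // Group names are hypothetical; the sub-patterns come from the answer above,
    // with the colon-separated address kept in a single group.
    private static readonly Regex Line = new Regex(
        @"\s+(?<index>\d+):\s+(?<id>\w{6})\s+(?<address>(?:..:){7}..)" +
        @"\s+(?<ip1>\d+(?:\.\d+){3})\s+(?<ip2>\d+(?:\.\d+){3})\s+""(?<name>\w+)""");

    public static bool TryParse(string input, out Match match)
    {
        match = Line.Match(input);
        return match.Success;
    }

    public static void Main()
    {
        // Leading spaces added, as the question says each line starts with some.
        string[] lines =
        {
            "  2: fffc02 10:00:00:05:1e:36:5f:82 172.31.3.93 0.0.0.0 \"SAN002A\"",
            "  4: fffc04 50:00:51:e8:cc:2f:ae:01 0.0.0.0 0.0.0.0 \"fcr_fd_4\"",
        };

        foreach (string line in lines)
        {
            if (TryParse(line, out Match m))
            {
                // Prints "2 SAN002A 172.31.3.93", then "4 fcr_fd_4 0.0.0.0".
                Console.WriteLine("{0} {1} {2}",
                    m.Groups["index"].Value, m.Groups["name"].Value, m.Groups["ip1"].Value);
            }
        }
    }
}
```

Each field is then available by name from a single match per line, which is what the question was aiming for.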

Related

Simplify Regex grouping

var pattern = @"(?:[P|p]rint\("")(.+)(?:""\);?)";
var input = "Print(\"Hello World\");";
This results in two groups. The second one captures exactly what I want to capture, and the first one is completely useless. How do I remove the first one?
I tried (?:ABC), but it didn't work.

Your pattern uses one capturing group, (), and two non-capturing groups, (?:).
You can omit both non-capturing groups, as well as the | in the character class: inside [...] a | is a literal pipe, not an alternation. The "first group" you see is most likely group 0, which in .NET always holds the entire match and cannot be removed; your capture is group 1. I think you would also like to make the .+ non-greedy, as .+?, to prevent overmatching.
Then your pattern could look like this (matching an optional semicolon at the end):
[Pp]rint\("(.+?)"\);?
You might also use a version with a negated character class that matches anything except a double quote:
[Pp]rint\("([^"]+)"\);

Try the following:
string input = "var input = Print(\"Hello World\");";
string pattern = "[Pp]rint\\(\"(?'message'[^\"]+)";
Match match = Regex.Match(input, pattern);
string message = match.Groups["message"].Value;

How to match camel case identifiers with a Regular Expression?

I have the need to match camel case variables. I am ignoring variables with numbers in the name.
private const String characters = @"\-:;*+=\[\{\(\/?\s^""'\<\]\}\.\)$\>";
private const String start = @"(?<=[" + characters + "])[_a-z]+";
private const String capsWord = "[_A-Z]{1}[_a-z]+";
private const String end = @"(?=[" + characters + "])";
var regex = new Regex($"{start}{capsWord}{end}",
    RegexOptions.Compiled | RegexOptions.CultureInvariant);
This works great for single-hump variables, but not for variables with multiple humps, nor for one that ends at the end of the line. I thought the $ or ^ in my characters would allow those to match.
abcDef // match
notToday<end of line> // no match
<start of line>intheBeginning // no match
whatIf // match
"howFar" // match
(whatsNext) // match
ohMyGod // two humps don't match
I have also tried wrapping my capsWord like this:
"(capsWord)+", but that doesn't work either.
WARNING! Online regex testers do match with "(capsWord)+", so please don't verify your answer by testing there.
It seems that my deployment wasn't picking up my changes, so there may not have been an issue after all.
The following almost works, save for the start-of-line problem. Note that I didn't need the suffix part, because the match ends with [a-z] content.
private const String characters = @"\-:;*+=\[\{\(\/?\s^""'\<\]\}\.\)$\>";
private const String pattern = "(?<=[" + characters + "])[_a-z]+([A-Z][a-z]+)+";
abcDef // match
notToday<end of line> // match
<start of line>intheBeginning // no match
whatIf // match
"howFar" // match
(whatsNext) // match
ohMyGod // match
So, if anyone can solve it, let me know.
I have also simplified the character list to a more concise expression, but it still fails to match at the beginning of the line.
private const String pattern = "(?<=[^a-zA-Z])[_a-z]+([A-Z][a-z]+)+";

You can match an empty position between a prefix and a suffix to split the camelCase identifiers
(?<=[_a-z])(?=[_A-Z])
The prefix contains the lower case letters, the suffix the upper case letters.
If you want to match camelCase identifiers, you can use
(?<=^|[^_a-zA-Z])_*[a-z]+[_a-zA-Z]*
How it works:
(?<=          Lookbehind: the match must be preceded by one of the following
  ^           Beginning of the input (or of each line with RegexOptions.Multiline)
  |           OR
  [^_a-zA-Z]  A character that is not an identifier character
)
_*            Any number of underscores
[a-z]+        At least one lower case letter
[_a-zA-Z]*    Any number of underscores and lower or upper case letters
So, it basically says: match a sequence optionally starting with underscores, followed by at least one lower case letter, optionally followed by underscores and letters (upper and lower), and the whole thing must be preceded by either the beginning of a line or a non-identifier character. The lookbehind is necessary so that we do not match just the tail of an identifier that starts with an upper case letter (or with underscores followed by an upper case letter).
var camelCaseExpr = new Regex("(?<=^|[^_a-zA-Z])_*[a-z]+[_a-zA-Z]*");
MatchCollection matches = camelCaseExpr.Matches("whatIf _Abc _abc howFar");
foreach (Match m in matches) {
Console.WriteLine(m.Value);
}
prints
whatIf
_abc
howFar

Had the same problem today, what worked for me:
\b([a-z][a-z0-9]+[A-Z])+[a-z0-9]+\b
Note: this is for PCRE regexes
Explanation:
`(` group begin
`[a-z]` start with a lower-case letter
`[a-z0-9]+` match a string of all lowercase/numbers
`[A-Z]` an upper-case letter
`)+` group end; match one or more of such groups.
Ends with some more lower-case/numbers.
\b for word boundary.
In my case, the camelCase identifiers had only one upper-case letter between words.
So this worked for me, but if you can have (or want to match) more than one
upper-case letter in between, you could use something like [A-Z]{1,2}.
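The answer targets PCRE, so the following is only a sketch of how the same pattern behaves under .NET's Regex class, assuming the syntax carries over unchanged (it uses nothing PCRE-specific). The class and method names are made up. One thing it shows is that [a-z][a-z0-9]+ needs at least two characters before every upper-case letter, so an identifier with a one-letter segment such as ohMyGod is rejected.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CamelCaseCheck
{
    // Pattern copied from the answer above; it was written for PCRE,
    // and this sketch assumes .NET treats it the same way.
    private static readonly Regex CamelCase =
        new Regex(@"\b([a-z][a-z0-9]+[A-Z])+[a-z0-9]+\b");

    public static bool IsCamelCase(string identifier)
    {
        return CamelCase.IsMatch(identifier);
    }

    public static void Main()
    {
        // Prints "whatIf: True", "myGodYes: True", "ohMyGod: False".
        foreach (string name in new[] { "whatIf", "myGodYes", "ohMyGod" })
        {
            Console.WriteLine("{0}: {1}", name, IsCamelCase(name));
        }
    }
}
```

If one-letter segments should match as well, relaxing [a-z0-9]+ inside the group to [a-z0-9]* would be one option to try.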

Regular expression matching a given structure

I need to generate a regex to match any string with this structure:
{"anyWord"}{"aSpace"}{"-"}{"anyLetter"}
How can I do it?
Thanks
EDIT
I have tried:
string txt = "print -c";
string re1 = "((?:[a-z][a-z]+))"; // Word 1
Regex r = new Regex(re1, RegexOptions.IgnoreCase | RegexOptions.Singleline);
Match m = r.Match(txt);
if (m.Success)
{
    String word1 = m.Groups[1].ToString();
    Console.Write("(" + word1.ToString() + ")" + "\n");
}
Console.ReadLine();
But this only matches the word "print".

This would be pretty straightforward:
[a-zA-Z]+\s\-[a-zA-Z]
It can be explained as follows:
[a-zA-Z]+   # Matches 1 or more letters
\s          # Matches a single whitespace character (such as a space)
\-          # Matches a single hyphen / dash
[a-zA-Z]    # Matches a single letter
If you need to implement this in C#, you can use the Regex class, specifically the Regex.Matches() method:
var matches = Regex.Matches(yourString, @"[a-zA-Z]+\s\-[a-zA-Z]");
Some example matching might look like this:
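As a rough illustration of the Regex.Matches() call above, the sketch below collects every word-plus-flag pair from a sample string. The class name, method name and sample input are invented for the example; only the pattern comes from the answer.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordFlagDemo
{
    // Returns every "word -letter" occurrence found in the input.
    public static string[] FindWordFlagPairs(string input)
    {
        return Regex.Matches(input, @"[a-zA-Z]+\s\-[a-zA-Z]")
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        // Prints "print -c", then "copy -x".
        foreach (string hit in FindWordFlagPairs("print -c, then copy -x"))
        {
            Console.WriteLine(hit);
        }
    }
}
```

Note that "then copy" is skipped because the space after "then" is not followed by a hyphen.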

Regex to find special pattern

I have a string to parse. First I have to check whether the string contains a special pattern.
I want to know if there are substrings which start with "$(",
end with ")",
and, between those start and end markers,
contain no whitespace
and no "$" character.
I have a little regex for it in C#:
string input = "$(abc)";
string pattern = @"\$\(([^$][^\s]*)\)";
Regex rgx = new Regex(pattern, RegexOptions.IgnoreCase);
MatchCollection matches = rgx.Matches(input);
foreach (var match in matches)
{
    Console.WriteLine("value = " + match);
}
It works for many cases but fails for input = $(a$(), where the inner expression is empty. I want it NOT to match when the input is $() (there is nothing between the start and end markers).
What is wrong with my regex?

Note: [^$] matches any single character except $.
Use the regex below if you want to match $():
\$\(([^\s$]*)\)
Use the regex below if you don't want to match $():
\$\(([^\s$]+)\)
* repeats the preceding token zero or more times.
+ repeats the preceding token one or more times.
Your regex \$\(([^$][^\s]*)\) is wrong. It won't allow $ as the first character inside the parentheses, but it allows it as the second, third, and so on. You need to combine the negated classes in your regex in order to match any character that is neither whitespace nor $.

Your current regex does not match $() on its own, because [^$] must match at least one character. The only way I can think of for this to match is an input containing more than one closing paren, like:
$()(something)
In those cases, you will also need to exclude at least the closing paren:
string pattern = @"\$\(([^$\s)]+)\)";
The above matches, for example:
abc in $(abc), and
abc and def in $(def)$()$(abc)(something).
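To make the difference concrete, here is a small sketch comparing the + pattern from the previous answer with the paren-excluding pattern from this one on the input $()(something). Both patterns are taken from the answers; the class, constant and method names are made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DollarParenDemo
{
    // Both patterns are taken from the answers; only the names are made up.
    public const string WithoutParenExclusion = @"\$\(([^\s$]+)\)";
    public const string WithParenExclusion = @"\$\(([^$\s)]+)\)";

    // Returns the first captured group, or null when nothing matches.
    public static string FirstCapture(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        string input = "$()(something)";

        // The class still accepts ')', so the capture runs through the empty $().
        Console.WriteLine(FirstCapture(input, WithoutParenExclusion) ?? "(no match)"); // )(something
        // With ')' excluded, the empty $() can no longer start a match.
        Console.WriteLine(FirstCapture(input, WithParenExclusion) ?? "(no match)");    // (no match)
    }
}
```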

Simply replace the * with a + and merge the two negated classes into one.
string pattern = @"\$\(([^$\s]+)\)";
+ means 1 or more
* means 0 or more

Dot word pattern matching

I want to create a regular expression to match a word that begins with a period. Such words can occur any number of times in a string. I want to ensure that each word is found whether it's at the beginning of a line, at the end of a line, or somewhere in the middle. The latter part is what I'm having difficulty with.
Here is where I am so far.
const string pattern = @"(^|(.* ))(?<slickText>\.[a-zA-Z0-9]*)( .*|$)";
public static MatchCollection Find(string input)
{
    Regex regex = new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.Multiline);
    MatchCollection collection = regex.Matches(input);
    return collection;
}
My test finds .lee and .good, but fails to find .bruce:
static void Main()
{
    MatchCollection results = ClassName.Find("a short stump .bruce\r\nand .lee a small tree\r\n.good roots");
    foreach (Match item in results)
    {
        GroupCollection groups = item.Groups;
        Console.WriteLine("{0} ", groups["slickText"].Value);
    }
    System.Diagnostics.Debug.Assert(results.Count > 0);
}

Maybe you're just looking for the pattern \.\w+
Test:
var s = "a short stump .bruce\r\nand .lee a small tree\r\n.good roots";
Regex.Matches(s, @"\.\w+").Dump();
Result: three matches, .bruce, .lee and .good. (Dump() is LINQPad's output helper.)
Note:
If you don't want to find .foo in some.foo (because there's no whitespace between some and .foo), you can use (?<=\W|^)\.\w+ instead.

Bizarrely enough, it seems that with RegexOptions.Multiline, ^ and $ additionally match only at \n, not at \r\n.
Thus you get .good because it is preceded by \n, after which ^ matches, but you don't get .bruce because it is followed by \r, before which $ does not match.
You could do a .Replace("\r", "") on the input, or rewrite your expression to take individual lines of input.
Edit: Or replace $ with \r?$ in your pattern to explicitly include the \r; thanks to SvenS for the suggestion.
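The behaviour described above can be checked with a short sketch. It uses a simplified pattern (a dotted word at the end of a line) rather than the question's full pattern, and the class and method names are made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineEndDemo
{
    // Counts matches with RegexOptions.Multiline, as in the question.
    public static int CountMatches(string input, string pattern)
    {
        return Regex.Matches(input, pattern, RegexOptions.Multiline).Count;
    }

    public static void Main()
    {
        string input = "a short stump .bruce\r\nand .lee a small tree\r\n.good roots";

        // $ matches only before \n (or at the very end), so the \r blocks it.
        Console.WriteLine(CountMatches(input, @"\.\w+$"));     // 0
        // Allowing an optional \r before $ finds .bruce at the end of line 1.
        Console.WriteLine(CountMatches(input, @"\.\w+\r?$"));  // 1
    }
}
```

With \r?$ the matched text includes the trailing \r, so a capture group or a lookahead such as (?=\r?$) may be preferable when the value itself is needed.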

In your RegEx, a word has to be terminated by a space, but bruce is terminated by \r instead.

I would give this regex a go:
(?:.*?(\.[A-Za-z]+(?:\b|.\s)).*?)+
And change the RegexOptions from Multiline to Singleline - in this mode dot matches all characters including newline.
