C# Regex Pattern Conundrum

I have a regex that I've verified in three separate testers as successfully matching the desired text:
http://regexlib.com/RETester.aspx
http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx
http://sourceforge.net/projects/regextester/
But when I use the regex in my code, it does not produce a match. I have used other regexes with this code and they produced the desired matches. I'm at a loss...
string SampleText = "starttexthere\r\nothertexthereendtexthere";
string RegexPattern = "(?<=starttexthere)(.*?)(?=endtexthere)";
Regex FindRegex = new Regex(@RegexPattern);
Match m = FindRegex.Match(SampleText);
I don't know if the problem is my regex, or my code.

The problem is that your text contains a \r\n, which means it is split across two lines. If you want to match the whole string, you have to set the options to match across multiple lines and to change the behavior of the . so that it includes \n (the new-line character) in what it matches:
Regex FindRegex = new Regex(@RegexPattern, RegexOptions.Multiline | RegexOptions.Singleline);

You don't need RegexOptions.Multiline.
The problem in your case is that, by default, the dot matches any character except the newline character (\n).
So, you'll need to define your regex pattern like so: (?<=starttexthere)[\w\r\n]+(?=endtexthere) in order to specifically match text across line breaks.
Here's an online running sample: http://ideone.com/ZXgKar
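To see what the option changes, here is a minimal sketch that runs the pattern and sample text from the question with and without RegexOptions.Singleline. The class and method names (SinglelineDemo, Between) are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SinglelineDemo
{
    // Returns the text between the two markers, or null when nothing matches.
    public static string Between(string text, RegexOptions options)
    {
        Match m = Regex.Match(text, "(?<=starttexthere)(.*?)(?=endtexthere)", options);
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        string sample = "starttexthere\r\nothertexthereendtexthere";

        // Default options: the dot stops at \n, so the lookahead is never reached.
        Console.WriteLine(Between(sample, RegexOptions.None) == null);

        // Singleline: the dot also matches \n, so the pattern spans the line break.
        Console.WriteLine(Between(sample, RegexOptions.Singleline) == "\r\nothertexthere");
    }
}
```

Note that the matched value includes the \r\n itself; call Trim() on it if only the visible text is wanted.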

Related

.NET Regex for parsing chess moves

Background
I'd like to parse quite a few strings representing chess moves:
1.e4e62.d3d53.Nd2c54.g3Nf6
Each move begins with an increasing number 1., 2., 3. etc. There are no spaces in-between the moves.
The perfect match would be an array like this:
["1.e4e6", "2.d3d5", "3.Nd2c5", "4.g3Nf6"]
Regex Question
My regex so far is:
([0-9]\.)(.*?)(?=[0-9]\.)
This works in an online .NET Regex Tester (Regex Storm), apart from not including the last (4th) move.
How to include the last one too?
C# Question
My code is:
var regex = new Regex(@"([0-9]\.)(.*?)(?=[0-9]\.)");
var match = regex.Match(game);
The match here includes only one entry "1.e4e6" and not three (or four).
How to fix?
Thanks,
pom

It cannot match the last item because the lookahead assertion fails there: no digit and dot follow the last move.
You can allow the end of the string in the lookahead as well by using an alternation.
To get all the results, use Matches instead of Match.
([0-9]\.)(.*?)(?=[0-9]\.|$)
For example
string pattern = @"([0-9]\.)(.*?)(?=[0-9]\.|$)";
string input = @"1.e4e62.d3d53.Nd2c54.g3Nf6";
foreach (Match m in Regex.Matches(input, pattern))
{
    Console.WriteLine(m.Value);
}
Note that if you want to get a match only and don't want to match spaces, you can use \S instead of a . and omit the capturing group:
[0-9]\.\S*?(?=[0-9]\.|$)
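If the goal is the array shown in the question, the matches can be collected with LINQ. This is only a sketch: the names ChessMoves and SplitMoves are invented for the example, and it inherits the limits of the pattern above (for instance, it assumes single-digit move numbers).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class ChessMoves
{
    // Splits a run of moves such as "1.e4e62.d3d5" into one string per move.
    public static string[] SplitMoves(string game)
    {
        return Regex.Matches(game, @"[0-9]\.\S*?(?=[0-9]\.|$)")
                    .Cast<Match>()          // MatchCollection is non-generic on older frameworks
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        string[] moves = SplitMoves("1.e4e62.d3d53.Nd2c54.g3Nf6");
        Console.WriteLine(string.Join(", ", moves));
    }
}
```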

Regex pattern in c# start with # and end with &#x9;

I need a regex pattern for text that starts with "#" and ends with "&#x9;".
I tried the pattern below:
string pattern = "^[#].*?[&#x9;]$";
but it is not working.

Since &#x9; is the hex code of the tab character, why not just use the StartsWith and EndsWith methods instead?
if (yourString.StartsWith("#") && yourString.EndsWith("\t"))
{
    // Pass
}

This pattern works fine. I have tested it.
string pattern = "#(.*?)9";
See the link below to test it online.
https://regex101.com/r/iR6nP6/1
C#
const string str = "dadasd#beetween9ddasdasd";
var match = Regex.Match(str, "#(.*?)9");
Console.WriteLine(match.Groups[1].Value);

In regex syntax, [] denotes a character class: the engine will attempt to match exactly one of the characters listed inside it. Thus, [&#x9] means "match one of &, #, x or 9", in no particular order.
If you are after order, which it seems you are, you will need to remove the []. Something like this should work: string pattern = "^#.*?&#x9$";

You mean something like:
string pattern = "^#.*?[ ]$"
There are also many fine regex helpers on the web, for example https://regex101.com/. It gives a nice explanation of how your text will be handled.

You should use \t to match the tab character.
You can use special character sequences to put non-printable characters in your regular expression. Use \t to match a tab character (ASCII 0x09).
Try the following regex:
^\#.*\t\;$
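The answers above read the question in two ways: the text ends with an actual tab character, or it ends with the literal six characters &#x9;. A small sketch covering both readings, with made-up names (HashTabDemo, EndsWithTab, EndsWithEntity):

```csharp
using System;
using System.Text.RegularExpressions;

public static class HashTabDemo
{
    // Reading 1: starts with '#', ends with a real tab character.
    public static bool EndsWithTab(string s)
    {
        return Regex.IsMatch(s, @"^#.*\t$");
    }

    // Reading 2: starts with '#', ends with the literal text "&#x9;".
    public static bool EndsWithEntity(string s)
    {
        return Regex.IsMatch(s, @"^#.*&#x9;$");
    }

    public static void Main()
    {
        Console.WriteLine(EndsWithTab("#value\t"));
        Console.WriteLine(EndsWithEntity("#value&#x9;"));
    }
}
```

Outside a character class, none of &, #, x, 9 or ; needs escaping here, as long as RegexOptions.IgnorePatternWhitespace is not set (that option turns # into a comment marker).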

Regex to match full lines of text excluding crlf

What regex pattern would match each line of a given text?
I'm trying ^(.+)$ but it includes crlf...

Just use RegexOptions.Multiline.
Multiline mode. Changes the meaning of ^ and $ so they match at the beginning and end, respectively, of any line, and not just the beginning and end of the entire string.
Example:
var lineMatches = Regex.Matches("Multi\r\nlines", "^(.+)$", RegexOptions.Multiline);

I'm not sure what you mean by "match each line of a given text", but you can use a character class to exclude the CR and LF characters:
[^\r\n]+
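The difference between the two suggestions shows up on Windows line endings: in .NET the dot matches \r, and with RegexOptions.Multiline the $ anchor matches just before \n, so ^(.+)$ keeps the carriage return. A minimal sketch comparing them (LineDemo and its method names are invented for the example):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class LineDemo
{
    // ^(.+)$ with Multiline: each match still ends with \r on CRLF input.
    public static string[] LinesWithDot(string text)
    {
        return Regex.Matches(text, "^(.+)$", RegexOptions.Multiline)
                    .Cast<Match>().Select(m => m.Value).ToArray();
    }

    // [^\r\n]+ : runs of characters that are neither CR nor LF.
    public static string[] LinesWithClass(string text)
    {
        return Regex.Matches(text, @"[^\r\n]+")
                    .Cast<Match>().Select(m => m.Value).ToArray();
    }

    public static void Main()
    {
        string text = "Multi\r\nlines";
        Console.WriteLine(LinesWithDot(text)[0].EndsWith("\r"));   // carriage return kept
        Console.WriteLine(LinesWithClass(text)[0] == "Multi");     // carriage return excluded
    }
}
```

Both variants skip empty lines, since + requires at least one character.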

The wording of your question seems a little unclear, but it sounds like you want RegexOptions.Multiline (in the System.Text.RegularExpressions namespace). It's an option you have to set on your Regex object. That should make ^ and $ match the beginning and end of each line rather than of the entire string.
For example:
Regex re = new Regex("^(.+)$", RegexOptions.Compiled | RegexOptions.Multiline);

Have you tried:
^(.+)\r?\n$
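That way the match group includes everything except the CRLF, and requires that a new line be present (Unix default), but accepts the carriage return in front (Windows default). Note that in .NET the dot also matches \r, so make the quantifier lazy, (.+?), if the carriage return must stay out of the group.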

I assume you're using the Multiline option? In that case you'll want to match the newline explicitly with "\n". (substitute "\r\n" as appropriate.)

Multiline regular expression in C# [duplicate]

How do I match and replace text using regular expressions in multiline mode?
I know the RegexOptions.Multiline option, but what is the best way to match any character, including the new line characters, in C#?
Input:
<tag name="abc">this
is
a
text</tag>
Output:
[tag name="abc"]this
is
a
test
[/tag]
Aahh, I found the actual problem. '&' and ';' in Regex are matching text in a single line, while the same need to be escaped in the Regex to work in cases where there are new lines also.

If you mean there has to be a newline character for the expression to match, then \n will do that for you.
Otherwise, I think you might have misunderstood the Multiline/Singleline flags. If you want your expression to match across several lines, you actually want to use RegexOptions.Singleline. It treats the entire input string as a single line, so the . also matches newline characters. Is this what you're after?
Example
Regex rx = new Regex("<tag name=\"(.*?)\">(.*?)</tag>", RegexOptions.Singleline);
String output = rx.Replace("Text <tag name=\"abc\">test\nwith\nnewline</tag> more text...", "[tag name=\"$1\"]$2[/tag]");

Here's a regex to match. It requires the RegexOptions.Singleline option, which makes the . match newlines.
<(\w+) name="([^"]*)">(.*?)</\1>
After this regex, the first group contains the tag, the second the tag name, and the third the content between the tags. So replacement string could look like this:
[$1 name="$2"]$3[/$1]
In C#, this looks like:
newString = Regex.Replace(oldString,
    @"<(\w+) name=""([^""]*)"">(.*?)</\1>",
    "[$1 name=\"$2\"]$3[/$1]",
    RegexOptions.Singleline);
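As an alternative to passing the option, .NET also accepts the inline form (?s) at the start of the pattern, which turns on the same single-line behavior. A rough sketch applying it to the input from the question (the class and method names are placeholders):

```csharp
using System;
using System.Text.RegularExpressions;

public static class InlineSinglelineDemo
{
    // Same pattern as above, with Singleline enabled inline via (?s).
    public static string ToBrackets(string input)
    {
        return Regex.Replace(input,
            @"(?s)<(\w+) name=""([^""]*)"">(.*?)</\1>",
            "[$1 name=\"$2\"]$3[/$1]");
    }

    public static void Main()
    {
        string input = "<tag name=\"abc\">this\nis\na\ntext</tag>";
        Console.WriteLine(ToBrackets(input));
    }
}
```

The inline form is handy when the pattern is stored in configuration and the calling code cannot pass RegexOptions.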

C# reliable way to pattern match?

At the moment I am trying to match patterns such as
text text date1 date2
So I have regular expressions that do just that. However, if users enter the data with more than one whitespace character, or put some of the text on a new line, the pattern does not get picked up because the input no longer exactly matches the pattern.
Is there a more reliable way for pattern matching? The goal is to make it very simple for the user to write but make it easily matchable on my end. I was considering stripping out all the whitespace/newlines etc and then trying to match the pattern with no spaces i.e. texttextdate1date2.
Anyone got any better solutions?
Update
Here is a small example of the pattern I would need to match:
FIND me@test.com 01/01/2010 to 10/01/2010
Here is my current regex:
FIND [A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4} [0-9]{1,2}/[0-9]{1,2}/[0-9]{2,4} to [0-9]{1,2}/[0-9]{1,2}/[0-9]{2,4}
This works fine 90% of the time. However, if users submit this information via email, it can have all different kinds of formatting and HTML I am not interested in. I am using a combination of the HtmlAgilityPack and an HTML tag removing regex to strip all the HTML from the email, but even then I can't seem to get a match on some occasions.
I believe this could be a more parsing related question than pattern matching, but I think maybe there is a better way of doing this...

To match one or more whitespace characters (space, tab, newline), use:
\s+
Substitute the above wherever you have the physical space in your pattern and you should be fine.
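Applied to the FIND pattern from the question, the substitution might look like the sketch below. The capture groups, the RegexOptions.IgnoreCase flag (the question's character classes only list A-Z) and the names FindCommandDemo and Parse are assumptions made for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FindCommandDemo
{
    // The question's pattern with every literal space replaced by \s+.
    static readonly Regex FindCommand = new Regex(
        @"FIND\s+([A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4})\s+" +
        @"([0-9]{1,2}/[0-9]{1,2}/[0-9]{2,4})\s+to\s+" +
        @"([0-9]{1,2}/[0-9]{1,2}/[0-9]{2,4})",
        RegexOptions.IgnoreCase);

    // Returns { email, fromDate, toDate }, or null when the input does not match.
    public static string[] Parse(string input)
    {
        Match m = FindCommand.Match(input);
        if (!m.Success) return null;
        return new[] { m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value };
    }

    public static void Main()
    {
        string[] parts = Parse("FIND   me@test.com\r\n01/01/2010 to\t10/01/2010");
        Console.WriteLine(parts == null ? "no match" : string.Join(" | ", parts));
    }
}
```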

Here is an example of matching multiple groups in a text with multiple whitespace characters and/or newlines:
var txt = "text text date1\ndate2";
var matches = Regex.Match(txt, @"([a-z]+)\s+([a-z]+)\s+([a-z0-9]+)\s+([a-z0-9]+)", RegexOptions.Singleline);
matches.Groups[n].Value, with n from 1 to 4, will contain your matches.

I would split the string into a string array and match each resulting string to the necessary Regular Expression.
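One way to read this suggestion, as a sketch only: split on runs of whitespace, then check each token against its own small pattern. The token layout (FIND, email, date, "to", date) comes from the example in the question; the per-token patterns and the names TokenMatcher and IsFindCommand are assumptions.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TokenMatcher
{
    // Hypothetical per-token patterns, anchored so the whole token must match.
    static readonly Regex Email = new Regex(
        @"^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$", RegexOptions.IgnoreCase);
    static readonly Regex Date = new Regex(@"^[0-9]{1,2}/[0-9]{1,2}/[0-9]{2,4}$");

    public static bool IsFindCommand(string input)
    {
        // Any mix of spaces, tabs and line breaks separates the tokens.
        string[] t = Regex.Split(input.Trim(), @"\s+");
        return t.Length == 5
            && t[0] == "FIND"
            && Email.IsMatch(t[1])
            && Date.IsMatch(t[2])
            && t[3] == "to"
            && Date.IsMatch(t[4]);
    }

    public static void Main()
    {
        Console.WriteLine(IsFindCommand("FIND me@test.com\n 01/01/2010   to 10/01/2010"));
    }
}
```

Unlike a single large regex, this makes it easy to report which token failed, at the cost of rejecting input with extra surrounding text.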

\b(text)[\s]+(text)[\s]+(date1)[\s]+(date2)\b

It's a nasty expression, but here is something that will work for the input you provided:
^(\w+)\s+([\w@.]+)\s+(\d{2}\/\d{2}\/\d{4})[^\d]+(\d{2}\/\d{2}\/\d{4})$
This will work with variable amounts of whitespace between the capture groups as well.

Through ORegex you can tokenize your string and just pattern match on token sequences:
var tokens = input.Split(new[]{' ','\t','\n','\r'}, StringSplitOptions.RemoveEmptyEntries);
var oregex = new ORegex<string>("{0}{0}{1}{1}", IsText, IsDate);
var matches = oregex.Matches(tokens); //here is your subsequence tokens.
...
public bool IsText(string str)
{
    ...
}
public bool IsDate(string str)
{
    ...
}
