C# regex not matching my string - c#

I have a regex string:
string regex =
"\"\\d*\",\"(?<url>\\w|\\d|[().,-–_'])\".*";
And a string I want to match it against:
string line =
"\"4\",\"1800_in_sports\",\"24987709\",\"\",\"1906\",\"20171028152258\"";
When I try to get the url category, or even check for a match, there is no match:
var result = Regex.Match(line, regex);
string output = result.Groups["url"].Value;
If I try Regex.IsMatch(..) it also returns false.
I used http://regexstorm.net/tester to test this and it works there, but not when I run the code.
In RegexStorm I used the pattern:
"\d{1,3}","(?<url>\w|\d|\n|[().,-–_'])+?"

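Replace \\d with just \d and \\w with just \w. This only compiles in a verbatim string (prefixed with @); in a regular string literal \d is an unrecognized escape sequence, and \\d is already the correct way to write it.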

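As Dour High Arch mentioned, a verbatim string should be used. Inside a verbatim string, a double quote is written by doubling it.
Changing string regex to:
string regex =
@"""\d{1,3}"",""(?<url>\w|\d|\n|[().,-–_''])+?""";
This now returns a match. Note that this pattern also adds the +? quantifier after the url group, which the pattern in my code lacked; without it the group can match only a single character before the closing quote.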
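A minimal sketch of the fix, written as C# top-level statements. One thing worth checking: because the quantifier sits outside the named group, Groups["url"].Value holds only the last repetition of the group. The second pattern below is my own variation (not from the original post): it moves the quantifier inside the group and escapes the hyphen in the character class so that it is not read as a range.

```csharp
using System;
using System.Text.RegularExpressions;

string line = "\"4\",\"1800_in_sports\",\"24987709\",\"\",\"1906\",\"20171028152258\"";

// Pattern from the answer: the group is repeated, so only the last repetition is kept in Value.
string repeatedGroup = @"""\d{1,3}"",""(?<url>\w|\d|\n|[().,-–_'])+?""";
Match first = Regex.Match(line, repeatedGroup);
Console.WriteLine(first.Success);              // True
Console.WriteLine(first.Groups["url"].Value);  // s

// Variation (assumption): quantifier inside the group, hyphen escaped.
string wholeValue = @"""\d{1,3}"",""(?<url>[\w().,\-–_']+)""";
Match second = Regex.Match(line, wholeValue);
Console.WriteLine(second.Groups["url"].Value); // 1800_in_sports
```

If the individual repetitions are needed, the Captures collection of the group holds all of them.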

Related

C# Regex, match but not include the first character before matched string

How can I make this C# regex not include the first character before the URL in the matching results:
((?!\").)https?:\/\/twitter\.com\/(?:#!\/)?(\w+)\/status(?:es)?\/(\d+)
This will match:
Xhttps://twitter.com/oppomobileindia/status/798397636780953600
Notice the leading X.
I want it to match only the URLs that are not preceded by a double quote. It should also not include the character before the https for those URLs.
An actual example that I use in my code:
var str = @"<div id=""content"">
<p>https://twitter.com/oppomobileindia/status/798397636780953600</p>
<p>""https://twitter.com/oppomobileindia/status/11111111111111111111</p></div>";
var pattern = @"(?<!""')https?://twitter\.com/(?:#!/)?(\w+)/status(?:es)?/(\d+)";
var rgx = new Regex(pattern);
var results = rgx.Replace(str, "XXX");
In the above example, only the first URL should be replaced, because the second one has a double quotation mark before the URL. The replacement should also cover exactly the matched URL, without the character that precedes it.
Use a (?<!") negative lookbehind:
var re = @"(?<!"")https?://twitter\.com/(?:#!/)?(\w+)/status(?:es)?/(\d+)";
The (?<!") means that there cannot be a " immediately before the current location.
In C#, you do not need to escape / inside the pattern since regex delimiters are not used when defining the regex.
Note on the C# syntax: if you want to define a " inside a verbatim string literal, double it. In a regular string literal, escape the " and \:
var re = "(?<!\")https?://twitter\\.com/(?:#!/)?(\\w+)/status(?:es)?/(\\d+)";
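A small sketch of the lookbehind in a Replace call, written as C# top-level statements. The input is a shortened version of the HTML from the question; the variable names are only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Two URLs: the first is preceded by '>', the second by a double quote.
string html = "<p>https://twitter.com/oppomobileindia/status/798397636780953600</p>" +
              "<p>\"https://twitter.com/oppomobileindia/status/11111111111111111111</p>";

// (?<!"") is a zero-width check: it consumes no character, so nothing before the URL is replaced.
var rgx = new Regex(@"(?<!"")https?://twitter\.com/(?:#!/)?(\w+)/status(?:es)?/(\d+)");
string result = rgx.Replace(html, "XXX");

Console.WriteLine(result);
// <p>XXX</p><p>"https://twitter.com/oppomobileindia/status/11111111111111111111</p>
```

Because the lookbehind is zero-width, the character before the first URL (the '>') stays in place, which is what the question asked for.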

what's wrong with this regular expression

I'm doing some experiments with regular expressions and I don't know why the regex doesn't match.
string line is one line from a file. A line which should match is this:
["boxusers:settings/user[boxuser11]/name"] = "username",
The number of the boxuser and the value could be different, so I tried to find a regular expression.
My code is this:
string user;
string patternUser = "[\"boxusers:settings/user[boxuser\\d{2,}]/name\"] = \"";
if (Regex.Match(line,patternUser).Success)
user = Regex.Replace(Regex.Replace(line, patternUser, String.Empty), ",*", String.Empty);
So I think \d{2,} should match a number with at least two digits, and the rest is just the same. But the regex just doesn't match.
What's going wrong?
Square brackets have a special significance in regular expressions. You need to escape them with a backslash.
var line = @"[""boxusers:settings/user[boxuser11]/name""] = ""username"", ";
string patternUser = @"\[""boxusers:settings/user\[boxuser\d{2,}\]/name""\] = """;
Console.WriteLine(Regex.Match(line, patternUser).Success);
If you don't want to use verbatim strings, you'll need to use two backslashes to escape each regex metacharacter (the first to escape the second).
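To make the two spellings concrete, here is a sketch (C# top-level statements) that writes the same pattern as a verbatim and as a regular string literal and checks that they are identical. The named group value is my own addition, not part of the original pattern; it is one possible way to pull out the user name directly instead of using the nested Replace calls.

```csharp
using System;
using System.Text.RegularExpressions;

string line = "[\"boxusers:settings/user[boxuser11]/name\"] = \"username\",";

// Verbatim literal: single backslashes, doubled quotes.
string verbatim = @"\[""boxusers:settings/user\[boxuser\d{2,}\]/name""\] = ""(?<value>[^""]*)""";
// Regular literal: doubled backslashes, backslash-escaped quotes.
string regular = "\\[\"boxusers:settings/user\\[boxuser\\d{2,}\\]/name\"\\] = \"(?<value>[^\"]*)\"";

Console.WriteLine(verbatim == regular); // True

Match m = Regex.Match(line, verbatim);
string user = m.Success ? m.Groups["value"].Value : string.Empty;
Console.WriteLine(user); // username
```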

Extracting digits from string

I'm trying to extract some digits from a string: foo=bar&hash=00000690821388874159\";\n
I tried making a group for the digits, but it always returns an empty string.
string matchString = Regex.Match(textBox1.Text, @"hash=(\d+)\\").Groups[1].Value;
I never use regex, so please tell me what I'm missing here.
There is no \\ in your string; the \ is in fact used to escape a quote, which is why the regex doesn't match. This works:
string matchString = Regex.Match(textBox1.Text, @"hash=(\d+)""").Groups[1].Value;
http://dotnetfiddle.net/2U0lkI
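A self-contained sketch of the same idea (C# top-level statements), assuming the text box content is the string from the question written as a regular C# literal. The variable text stands in for textBox1.Text.

```csharp
using System;
using System.Text.RegularExpressions;

// In the literal, \" is an escaped quote and \n a newline; the string itself contains no backslash.
string text = "foo=bar&hash=00000690821388874159\";\n";
Console.WriteLine(text.Contains("\\")); // False

// Pattern that expects a literal backslash after the digits: no match, so the group is empty.
string withBackslash = Regex.Match(text, @"hash=(\d+)\\").Groups[1].Value;
// Pattern that expects the quote that is really there.
string withQuote = Regex.Match(text, @"hash=(\d+)""").Groups[1].Value;

Console.WriteLine(withBackslash.Length); // 0
Console.WriteLine(withQuote);            // 00000690821388874159
```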

Regular expression: take string literally with special re-characters

Maybe a simple question:
String text = "fake 43 60 fake";
String patt = "[43.60]";
Match m = Regex.Match(text, patt);
In this situation, m.Success = true because the dot replaces any character (also the space). But I must match the string in patt literally.
Of course, I can put a '\' before the dot in patt:
String patt = @"[43\.60]";
so that m.Success = false, but there are more special characters in the regular-expression world.
My question is: how can I make a regular expression take a string literally, exactly as it is given? So '43.60' must match exactly '43.60', '43?60' must match '43?60', and so on.
Thanks.
To get a regex-safe literal:
string escaped = Regex.Escape(input);
For example, to match the literal [43.60]:
string escaped = Regex.Escape(#"[43.60]");
gives the string with content: \[43\.60].
You can then use this escaped content to create a regex; for example:
string find = "43?60";
string escaped = Regex.Escape(find);
bool match = Regex.IsMatch("abc 43?60", escaped);
Note that in many cases you will want to combine the escaped string with some other regex fragment to make a complete pattern.
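A short sketch of that combination (C# top-level statements). The \b word boundaries are just one example of a surrounding fragment and are my own choice. As a side note, an unescaped [43.60] is a character class, so it matches any single one of the characters 4, 3, ., 6 or 0; that is why the example below uses the search text without brackets.

```csharp
using System;
using System.Text.RegularExpressions;

string find = "43.60";
string escaped = Regex.Escape(find); // 43\.60

// Unescaped, the dot matches any character, including the space.
bool rawHitsSpace = Regex.IsMatch("fake 43 60 fake", find);
// Escaped, only the literal text matches.
bool escapedHitsSpace = Regex.IsMatch("fake 43 60 fake", escaped);
bool escapedHitsDot = Regex.IsMatch("fake 43.60 fake", escaped);

// Combining the escaped text with another fragment: require word boundaries around it.
string whole = @"\b" + escaped + @"\b";
bool insideNumber = Regex.IsMatch("143.601", whole);

Console.WriteLine(rawHitsSpace);     // True
Console.WriteLine(escapedHitsSpace); // False
Console.WriteLine(escapedHitsDot);   // True
Console.WriteLine(insideNumber);     // False
```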

problem in regular expression

I have a regular expression:
Regex r = new Regex(@"(\s*)([A|B|C|E|G|H|J|K|L|M|N|P|R|S|T|V|Y|X]\d(?!.*[DFIOQU])(?:[A-Z](\s?)\d[A-Z]\d))(\s*)",RegexOptions.IgnoreCase);
and a string:
string test="LJHLJHL HJGJKDGKJ JGJK C1C 1C1 LKJLKJ";
I have to fetch C1C 1C1. This runs fine.
But if I modify the test string to
string test="LJHLJHL HJGJKDGKJ JGJK C1C 1C1 ON";
then it is unable to find the pattern, i.e. C1C 1C1.
Any idea why this expression is failing?
You have a negative look ahead:
(?!.*[DFIOQU])
That matches the "O" in "ON" and since it is a negative look ahead, the whole pattern fails. And, as an aside, I think you want to replace this:
[A|B|C|E|G|H|J|K|L|M|N|P|R|S|T|V|Y|X]
With this:
[A-CEGHJ-NPR-TVYX]
A pipe (|) is a literal character inside a character class, not an alternation, and you can use ranges to help highlight the characters that you're leaving out.
A single regex might not be the best way to parse that string. Or perhaps you just need a looser regex.
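A quick way to see the point about the pipe, as a sketch in C# top-level statements. The anchors ^ and $ are only there so that each test looks at a single character.

```csharp
using System;
using System.Text.RegularExpressions;

// Inside [...] the pipe is an ordinary character, so this class accepts "|" itself.
bool pipeInClass = Regex.IsMatch("|", "^[A|B|C]$");
// The range form accepts the same letters but not the pipe.
bool pipeInRange = Regex.IsMatch("|", "^[A-C]$");

// The compact class keeps K and rejects D, like the original list of letters.
string compact = "^[A-CEGHJ-NPR-TVYX]$";
bool keepsK = Regex.IsMatch("K", compact);
bool rejectsD = !Regex.IsMatch("D", compact);

Console.WriteLine(pipeInClass); // True
Console.WriteLine(pipeInRange); // False
Console.WriteLine(keepsK);      // True
Console.WriteLine(rejectsD);    // True
```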
Your negative lookahead (?!.*[DFIOQU]) asserts that none of the letters D, F, I, O, Q, U appears anywhere in the rest of the line.
In your second string there is an O at the end, in ON, so the match fails.
If you remove the .* from the negative lookahead, it will only check the character that directly follows, not the rest of the string to the end (is this what you want?):
\s*([ABCEGHJKLMNPRSTVYX]\d(?![DFIOQU])(?:[A-Z]\s?\d[A-Z]\d))\s*
Then it works, see it here on Regexr. It now checks that the character directly after the digit is not one of the characters in the class; I don't know if this is intended.
By the way, I removed the | from your first character class, since it's not needed, and also some brackets around your whitespace, also not needed.
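For comparison, a sketch (C# top-level statements) that runs the pattern from the question and the relaxed pattern above against the second test string. Whether the relaxed lookahead is the rule you actually want is still an open question; this only shows the difference in behaviour.

```csharp
using System;
using System.Text.RegularExpressions;

string test = "LJHLJHL HJGJKDGKJ JGJK C1C 1C1 ON";

// Pattern from the question: the lookahead scans to the end of the line and finds the O in "ON".
var original = new Regex(
    @"(\s*)([A|B|C|E|G|H|J|K|L|M|N|P|R|S|T|V|Y|X]\d(?!.*[DFIOQU])(?:[A-Z](\s?)\d[A-Z]\d))(\s*)",
    RegexOptions.IgnoreCase);

// Relaxed pattern: the lookahead only inspects the character right after the digit.
var relaxed = new Regex(
    @"\s*([ABCEGHJKLMNPRSTVYX]\d(?![DFIOQU])(?:[A-Z]\s?\d[A-Z]\d))\s*",
    RegexOptions.IgnoreCase);

bool originalMatches = original.IsMatch(test);
string code = relaxed.Match(test).Groups[1].Value;

Console.WriteLine(originalMatches); // False
Console.WriteLine(code);            // C1C 1C1
```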
As I understand it, you need to find the C1C 1C1 text in your string.
I've used this regex to do this:
string strRegex = @"^.*(?<c1c>C1C)\s*(?<c1c2>1C1).*$";
After that you can extract the text from the named groups:
string strRegex = @"^.*(?<c1c>C1C)\s*(?<c1c2>1C1).*$";
RegexOptions myRegexOptions = RegexOptions.Multiline;
Regex myRegex = new Regex(strRegex, myRegexOptions);
string strTargetString = @"LJHLJHL HJGJKDGKJ JGJK C1C 1C1 LKJLKJ";
string secondStr = "LJHLJHL HJGJKDGKJ JGJK C1C 1C1 ON";
Match match = myRegex.Match(strTargetString);
string c1c = match.Groups["c1c"].Value;
string c1c2 = match.Groups["c1c2"].Value;
Console.WriteLine(c1c + " " +c1c2);
