Regex pattern with dot and quotation mark - c#

How can I declare a regex pattern in C#? The pattern starts with a quotation mark followed by a URL, something like this:
\"www\.mypage\.pl //<- this is my pattern
string pattern = ? //todo: what should I put there

Use a verbatim string (prefixed with @), where a quotation mark is written as two quotation marks:
string pattern = @"""www\.mypage\.pl";
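A minimal sketch of that declaration in use, written with the @ verbatim prefix. The input string is a hypothetical example, not taken from the question:

```csharp
using System;
using System.Text.RegularExpressions;

// Verbatim string: "" stands for one quotation mark and \. is passed to the regex engine unchanged
string pattern = @"""www\.mypage\.pl";

// Hypothetical input containing the quoted URL
string input = "<a href=\"www.mypage.pl/index.html\">home</a>";

bool found = Regex.IsMatch(input, pattern);
Console.WriteLine(found); // True
```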

You can use a backslash (\) to escape a character's regex meaning so that it matches literally. If you want to match
"www.mypage.pl
you can use
\"[0-9a-zA-Z]*\.[0-9a-zA-Z]*\.[0-9a-zA-Z]*
The leading \" is there for the string literal, not for the regex: a quotation mark has no special meaning in a regex, but it has to be escaped inside a regular C# string.
In languages where strings can be single-quoted you don't need to escape the quotes at all; in C#, however, single quotes delimit a char literal, so use a verbatim string instead.
Note that [0-9a-zA-Z]* will need more characters, such as %, - and _, to capture every case allowed by the URI syntax RFC (RFC 3986).
If you want to put a pattern like
\"www\.mypage\.pl
into a regular (non-verbatim) C# string, you have to escape the escapes as well:
\\\"[0-9a-zA-Z]*\\.[0-9a-zA-Z]*\\.[0-9a-zA-Z]*
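A short sketch showing that the verbatim and regular spellings produce the same regex text, and that the general pattern works once each regex backslash is doubled in a regular string. The sample input is hypothetical:

```csharp
using System;
using System.Text.RegularExpressions;

// The regex the engine should receive is: \"www\.mypage\.pl
string verbatim = @"\""www\.mypage\.pl";   // verbatim: backslashes as-is, "" for a quote
string regular = "\\\"www\\.mypage\\.pl";  // regular: \\ for a backslash, \" for a quote
Console.WriteLine(verbatim == regular);    // True

// General pattern as a regular string; each \\. reaches the regex engine as \.
string general = "\\\"[0-9a-zA-Z]*\\.[0-9a-zA-Z]*\\.[0-9a-zA-Z]*";
string matched = Regex.Match("href=\"www.mypage.pl", general).Value;
Console.WriteLine(matched); // "www.mypage.pl
```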

Other answers have already handled (very nicely) how to put the phrase itself into a string.
To use it, you need to refer to the System.Text.RegularExpressions namespace, documentation for which is over at MSDN. As a (very) quick example:
System.Text.RegularExpressions.Regex.Replace(content, regexPattern, newMatchContent);
will replace matches for the regular expression regexPattern in content with newMatchContent, and return the result.
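A small self-contained sketch of that call, using a using directive instead of the fully qualified name. The values of content, regexPattern and newMatchContent are hypothetical, chosen only for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

string content = "Visit \"www.mypage.pl for details";   // hypothetical input
string regexPattern = @"""www\.mypage\.pl";              // quote + escaped dots
string newMatchContent = "\"www.example.com";            // hypothetical replacement (no $ substitutions)

// Regex.Replace returns a new string; content itself is not modified
string result = Regex.Replace(content, regexPattern, newMatchContent);
Console.WriteLine(result); // Visit "www.example.com for details
```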

Related

regular expression not matching in c# but matching in regexr

I found a citation parsing regular expression here: http://linklens.blogspot.com.au/2009/04/citation-parsing-regular-expression.html and it works fine when I test it at http://www.regexr.com; however, it does not work when I use Regex.Match in C#.
This is the expression (with escaped \"") - evaluated from c# and re-tested in regexr.
/([^e][^d][^s][^\.]\s|\d+\.?\s|^)([A-Z][a-z]{1,},?((\s[A-Z](\.|,|\.,))(\s?[A-Z](\.|,|\.,))*))(\s?(,|and|&|,\s?and)?\s?([A-Z][a-z]{1,},?((\s[A-Z](\.|,|\.,))(\s?[A-Z](\.|,|\.,))*)))*\s*(\(?\d\d\d\d\)?\.?)?\s*(\""|“)?((([A-Za-z:,\r\n]{2,}\s?){3,}))\.?(\""|”)?/g
Would anybody familiar with regular expressions notice anything that may not be compatible with c# in this fairly complex expression?
Edit:
Link to regexr example with some text citations: http://regexr.com/3a232
var myMatches = @"/([^e][^d][^s][^\.]\s|\d+\.?\s|^)([A-Z][a-z]{1,},?((\s[A-Z](\.|,|\.,))(\s?[A-Z](\.|,|\.,))*))(\s?(,|and|&|,\s?and)?\s?([A-Z][a-z]{1,},?((\s[A-Z](\.|,|\.,))(\s?[A-Z](\.|,|\.,))*)))*\s*(\(?\d\d\d\d\)?\.?)?\s*(""|“)?((([A-Za-z:,\r\n]{2,}\s?){3,}))\.?(""|”)?/g";
var matches = Regex.Matches(TestApp.Properties.Resources.Citation, myMatches);
Console.WriteLine(matches.Count);
Returns 0 matches.
You are escaping the quotation marks wrong. It's never escaped with \"".
In a regular string a quotation mark is escaped with \".
In an @-delimited (verbatim) string a quotation mark is escaped with "".
You should remove the / from the beginning of the string and the /g from the end of the string. They are not part of the pattern, that is the syntax for a regex literal (which doesn't exist in C# syntax by the way).
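To see why the delimiters matter, here is a minimal sketch with a short, hypothetical pattern rather than the full citation expression: copied into a C# string, the / characters and the trailing g are treated as literal text that the input must contain.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "Smith, J. (2009) and Jones, K. (2010)";  // hypothetical text

// Regex-literal syntax pasted into a C# string: "/" and "g" must appear literally, so nothing matches
int withDelimiters = Regex.Matches(input, @"/\d{4}/g").Count;

// Same pattern without delimiters; Regex.Matches already returns every match
int withoutDelimiters = Regex.Matches(input, @"\d{4}").Count;

Console.WriteLine(withDelimiters);    // 0
Console.WriteLine(withoutDelimiters); // 2
```

If a regex literal carries flags such as i, the .NET counterpart is a RegexOptions value (for example RegexOptions.IgnoreCase); g needs no equivalent.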

Regex.Replace not working

I am having a strange issue with Regex.Replace
string test = "if the other party is in material breach of such contract and, after <span style=\"background-color:#ffff00;\">forty-five (45)</span> calendar days notice of such breach";
string final = Regex.Replace(test, "forty-five (45)", "forty-five (46)", RegexOptions.IgnoreCase);
The "final" string still shows "forty-five (45)". Any idea why? I assume it has something to do with the tag. How do I fix this?
Thanks
Escape the parentheses. Depending on the language, this may require two backslashes; in a regular C# string literal it does:
string final = Regex.Replace(test, "forty-five \\(45\\)", "forty-five (46)", RegexOptions.IgnoreCase);
Basically, parentheses have a special meaning in a regex (they form a group), and by escaping them you tell the regex engine to match the parenthesis characters themselves instead.
Better yet, why are you using a Regex to do this at all? Try just doing a normal string replacement.
string final = test.Replace("forty-five (45)", "forty-six (46)");
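A sketch comparing both approaches on a shortened, hypothetical copy of the question's input. Regex.Escape is a standard .NET method not mentioned in these answers; it escapes regex metacharacters (and white space) so a literal phrase can be used as a pattern while keeping RegexOptions.IgnoreCase:

```csharp
using System;
using System.Text.RegularExpressions;

string test = "after <span style=\"background-color:#ffff00;\">forty-five (45)</span> calendar days";

// Plain substring replacement: no metacharacters to worry about (but it is case-sensitive)
string viaString = test.Replace("forty-five (45)", "forty-five (46)");

// Regex.Escape turns "(" and ")" into \( and \), so they match literally
string escaped = Regex.Escape("forty-five (45)");
string viaRegex = Regex.Replace(test, escaped, "forty-five (46)", RegexOptions.IgnoreCase);

Console.WriteLine(viaString == viaRegex); // True
```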
Parentheses are special in regular expressions. They delimit a group, to allow for things such as alternation. For example, the regular expression foo(bar|bat)baz matches:
foo, followed by
either bar OR bat, followed by
baz
So, a regular expression like foo(bar) will never match the literal string foo(bar). What it will match is the literal string foobar. Consequently, you need to escape the metacharacters. In C#, this should do you:
string final = Regex.Replace(test, @"forty-five \(45\)", "forty-five (46)", RegexOptions.IgnoreCase);
The @-quoted (verbatim) string helps avoid headaches from excessive backslashes. Without it, you'd have to write "forty-five \\(45\\)".
If you are unable to escape the parentheses, put each one in a character class:
forty-five [(]45[)]
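A quick sketch of the points above: an unescaped group matches foobar rather than foo(bar), and the character-class form matches the literal parentheses. The inputs are illustrative:

```csharp
using System;
using System.Text.RegularExpressions;

// Unescaped parentheses form a group: ^foo(bar)$ matches "foobar", not "foo(bar)"
bool groupOnLiteral = Regex.IsMatch("foo(bar)", "^foo(bar)$");
bool groupOnPlain = Regex.IsMatch("foobar", "^foo(bar)$");

// A character class holding a single parenthesis matches that character literally
string replaced = Regex.Replace("forty-five (45) days", "forty-five [(]45[)]", "forty-five (46)");

Console.WriteLine(groupOnLiteral); // False
Console.WriteLine(groupOnPlain);   // True
Console.WriteLine(replaced);       // forty-five (46) days
```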

Find and replace a specific number with regex

I have the following string
string absoluteUri = "http://localhost/asdf1234?$asdf=1234&$skip=1234&skip=4321&$orderby=asdf";
In this string I would like to replace '$skip=1234' with '$skip=1244'
I have tried the following regular expression:
Regex.Replace(absoluteUri, @"$skip=\d+", "$skip=1244");
Unfortunately this is not working. What am I doing wrong?
The output should be:
"http://localhost/asdf1234?$asdf=1234&$skip=1244&skip=4321&$orderby=asdf"
$ is a special character in regular expressions (it's an anchor). You need to escape it in both the expression and in the replacement string, but they are escaped differently.
In the regular expression, you escape it with a \ but in the substitution you escape it by adding another $:
Regex.Replace(absoluteUri, @"\$skip=\d+", "$$skip=1244");
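I can't add a comment yet, so here is a small fix: Regex.Replace returns a new string rather than changing its input, so you need to assign the result:
absoluteUri = Regex.Replace(absoluteUri, @"\$skip=\d+", "$skip=1244");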
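Putting both answers together, a minimal sketch: \$ in the pattern, $$ in the replacement, and the result assigned back because Regex.Replace does not modify its argument:

```csharp
using System;
using System.Text.RegularExpressions;

string absoluteUri = "http://localhost/asdf1234?$asdf=1234&$skip=1234&skip=4321&$orderby=asdf";

// Pattern: \$ is a literal dollar sign (unescaped $ is an end anchor).
// Replacement: $$ is a literal dollar sign (a single $ introduces substitutions such as $1).
absoluteUri = Regex.Replace(absoluteUri, @"\$skip=\d+", "$$skip=1244");

Console.WriteLine(absoluteUri);
// http://localhost/asdf1234?$asdf=1234&$skip=1244&skip=4321&$orderby=asdf
```

Only $skip is touched; the plain skip=4321 parameter is left alone because the pattern requires the dollar sign.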

Regex, MVS does not like my Regex strings, how do I make it comply

So in Microsoft Visual Studio I have a string that is compiled into a regex. My string is "#(\d+(.\d+)?)=(\d+(.\d+)?)". I cannot compile my program because I get an error saying that \d is an unrecognized escape sequence. How do I tell it to shut up and let me regex like a pro?
Begin your string with @; that causes the compiler to leave (almost) all characters alone, unescaped (the exception is ", which is escaped as ""):
@"#(\d+(.\d+)?)=(\d+(.\d+)?)"
The problem is that C# does not recognize \d as an escape sequence inside a regular string. Use a verbatim string instead:
string pattern = @"#(\d+(.\d+)?)=(\d+(.\d+)?)";
The @ denotes it: C# will not look for escape sequences in the string. If you need a " inside it, write two: "".
Of course, you can use normal strings, but then you have to escape the backslashes:
string pattern = "#(\\d+(.\\d+)?)=(\\d+(.\\d+)?)";
If you're using a normal string, you need to escape your backslashes, like so:
"#(\\d+(.\\d+)?)=(\\d+(.\\d+)?)"
Basically, you're putting a literal string into C#; the C# compiler sees the string first, and tries to interpret \d as an escape sequence (which doesn't exist, hence error). Therefore, you use \\d to get the C# compiler to see the string as \d, which then gets passed to the regex engine (which does recognize \d as something meaningful). (yes, if you want to match a literal backslash in your regex pattern, you need to use \\\\)
But in C#, you have the alternative of just prefixing the string with @ to get the compiler to leave it alone (though " still needs escaping, as ""), so that would look like this:
@"#(\d+(.\d+)?)=(\d+(.\d+)?)"
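A quick check, assuming a hypothetical input such as #1.5=20, that the regular and verbatim spellings produce the same pattern. Note that the unescaped . in this pattern matches any character; \. would restrict it to a literal decimal point:

```csharp
using System;
using System.Text.RegularExpressions;

string verbatim = @"#(\d+(.\d+)?)=(\d+(.\d+)?)";       // backslashes kept as-is
string regular = "#(\\d+(.\\d+)?)=(\\d+(.\\d+)?)";     // each backslash doubled
Console.WriteLine(verbatim == regular); // True

// Groups are numbered by opening parenthesis: 1 = left number, 3 = right number
Match m = Regex.Match("#1.5=20", verbatim);
Console.WriteLine(m.Groups[1].Value); // 1.5
Console.WriteLine(m.Groups[3].Value); // 20
```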
You could also use a verbatim string literal (I prefer to use these because of readability).
Use @"(#\d+(.\d+)?)=(\d+(.\d+)?)"
The @ prefix tells the compiler not to interpret escape sequences (characters prefixed by \) anywhere up to the closing ".
Note: inside a verbatim string, a single " is written as "". For instance, you can match "Hello" (quotes included) with the pattern @"""\w+""".
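A minimal sketch of that last note, with a hypothetical sentence as input; the verbatim literal below hands the regex engine a quote, \w+, and another quote:

```csharp
using System;
using System.Text.RegularExpressions;

string input = "She said \"Hello\" twice.";   // hypothetical input

// Each "" inside the verbatim literal is one quotation mark in the pattern
Match m = Regex.Match(input, @"""\w+""");

Console.WriteLine(m.Value); // "Hello"
```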

regex syntax stop search

How do I make Regex stop the search after "Target This"?
HeaderText="Target This" AnotherAttribute="Getting Picked Up"
This is what I've tried:
var match = Regex.Match(string1, @"(?<=HeaderText=\").*(?=\")");
The quantifier * is greedy, which means it will consume as many characters as it can while still getting a match. You want the lazy quantifier, *?.
As an aside, rather than using look-around expressions as you have done here, you may find it in general easier to use capturing groups:
var match = Regex.Match(string1, "HeaderText=\"(.*?)\"");
(the parentheses around .*? form a capturing group)
Now the match matches the whole thing, but match.Groups[1] is just the value in the quotes.
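A sketch contrasting the greedy and lazy quantifiers on the question's sample text, using the capturing-group form from this answer:

```csharp
using System;
using System.Text.RegularExpressions;

string string1 = "HeaderText=\"Target This\" AnotherAttribute=\"Getting Picked Up\"";

// Greedy .* runs on and backs off only to the LAST quote on the line
string greedy = Regex.Match(string1, "(?<=HeaderText=\").*(?=\")").Value;

// Lazy .*? inside a capturing group stops at the FIRST closing quote
Match m = Regex.Match(string1, "HeaderText=\"(.*?)\"");
string lazy = m.Groups[1].Value;

Console.WriteLine(greedy); // Target This" AnotherAttribute="Getting Picked Up
Console.WriteLine(lazy);   // Target This
```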
Plain regex pattern
(?<=HeaderText=").*?(?=")
or as string
string pattern = "(?<=HeaderText=\").*?(?=\")";
or using a verbatim string
string pattern = @"(?<=HeaderText="").*?(?="")";
The trick is the question mark after .*. It means "as few as possible", making it stop at the first closing quote it encounters.
Note that verbatim strings (introduced with @) do not treat the backslash \ as an escape character. Escape the double quotes by doubling them.
Note for others interested in regex: the search pattern used finds text that sits between a prefix and a suffix, without including either of them in the match:
(?<=prefix)find(?=suffix)
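A minimal sketch of this answer's approach on the question's sample text: both spellings of the lookaround pattern are the same string, and only the text between the prefix and the suffix ends up in the match:

```csharp
using System;
using System.Text.RegularExpressions;

string string1 = "HeaderText=\"Target This\" AnotherAttribute=\"Getting Picked Up\"";

// The same lookaround pattern spelled as a regular and as a verbatim string
string regular = "(?<=HeaderText=\").*?(?=\")";
string verbatim = @"(?<=HeaderText="").*?(?="")";
Console.WriteLine(regular == verbatim); // True

// (?<=prefix)find(?=suffix): the prefix and suffix are checked but not consumed
string value = Regex.Match(string1, verbatim).Value;
Console.WriteLine(value); // Target This
```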
Try this:
var match = Regex.Match(string1, "HeaderText=\"([^\"]+)");
var val = match.Groups[1].Value; //Target This
UPDATE
If the target may itself contain double quotes, change the regex to:
HeaderText=\"(.+?)\"\\s+\\w
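A sketch contrasting the two patterns from this answer on a hypothetical attribute value that contains quotes of its own; both are written as regular C# strings, as in the answer:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical value with embedded quotes: HeaderText="Say "Hi"" AnotherAttribute="..."
string string1 = "HeaderText=\"Say \"Hi\"\" AnotherAttribute=\"Getting Picked Up\"";

// [^\"]+ stops at the first quote of any kind
string simple = Regex.Match(string1, "HeaderText=\"([^\"]+)").Groups[1].Value;

// Lazy .+? stops at the first quote that is followed by whitespace and a word character
string updated = Regex.Match(string1, "HeaderText=\"(.+?)\"\\s+\\w").Groups[1].Value;

Console.WriteLine($"[{simple}]");  // [Say ]
Console.WriteLine($"[{updated}]"); // [Say "Hi"]
```

The updated pattern still relies on the closing quote being followed by whitespace and another attribute name, so an embedded quote followed by a space and a word would cut the value short.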
Note: regex is not the right way to do this. If the input is XML, check out System.Xml; otherwise, use HtmlAgilityPack (see How to use HTML Agility pack).
