Regular expression with "|" - c#

I need to be able to check for a pattern with | in it. For example, an expression like d*|*t should return true for a string like "dtest|test".
I'm no regular expression hero so I just tried a couple of things, like:
Regex Pattern = new Regex("s*\|*d"); //unable to build because of single backslash
Regex Pattern = new Regex("s*|*d"); //argument exception error
Regex Pattern = new Regex(@"s*\|*d"); //returns true when I use "dtest" as input, so incorrect
Regex Pattern = new Regex(@"s*|*d"); //argument exception error
Regex Pattern = new Regex("s*\\|*d"); //returns true when I use "dtest" as input, so incorrect
Regex Pattern = new Regex("s*" + "\\|" + "*d"); //returns true when I use "dtest" as input, so incorrect
Regex Pattern = new Regex(@"s*\\|*d"); //argument exception error
I'm a bit out of options, what should I then use?
I mean this is a pretty basic regular expression I know, but I'm not getting it for some reason.

In regular expressions, * means "zero or more of the pattern before it", e.g. a* means zero or more a, and (xy)* matches strings of the form xyxyxyxy....
To match any characters, you should use .*, i.e.
Regex Pattern = new Regex(@"s.*\|.*d");
(Also, | means "or")
Here . will match any characters[1], including |. To avoid this you need to use a character class:
new Regex(@"s[^|]*\|[^d]*d");
Here [^x] means "any character except x".
You may read http://www.regular-expressions.info/tutorial.html to learn more about RegEx.
[1]: Except a new line \n. But . will match \n if you pass the Singleline option. Well this is more advanced stuff...
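A minimal sketch of the difference, applied to the d...|...t example from the question. The class name PipeDemo and the helper HasPipeBetween are made up for illustration; the pattern is one possible reading of "d, then anything, then a pipe, then anything, then t".

```csharp
using System;
using System.Text.RegularExpressions;

public static class PipeDemo
{
    // Hypothetical helper: true when the input contains 'd', later a literal '|', later 't'.
    public static bool HasPipeBetween(string input)
    {
        // [^|]* = any characters except a pipe; \| = a literal pipe; .* = any characters.
        return Regex.IsMatch(input, @"d[^|]*\|.*t");
    }

    public static void Main()
    {
        // s*\|*d means: zero or more 's', zero or more '|', then 'd'. A lone 'd' satisfies it.
        Console.WriteLine(Regex.IsMatch("dtest", @"s*\|*d"));  // True
        Console.WriteLine(HasPipeBetween("dtest|test"));       // True
        Console.WriteLine(HasPipeBetween("dtest"));            // False
    }
}
```

Note that Regex.IsMatch looks for the pattern anywhere in the input; wrap the pattern in ^ and $ if the whole string has to match.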

A | inside a char class will be treated literally, so you can try the regex:
[|]

How about s.*\|.*d?
The problem with your attempts is that you wrote something like s*, which means: match any number of s (including zero). You need to describe the characters following the s by using . as in my example. You can use \w instead if you only want word characters (letters, digits and the underscore).

Try this.
string test1 = "dtest|test";
string test2 = "apple|orange";
string pattern = @"d.*?\|.*?t";
Console.WriteLine(Regex.IsMatch(test1, pattern));
Console.WriteLine(Regex.IsMatch(test2, pattern));

Regex Pattern = new Regex(@"s*\|*d"); would work, except that \|* means "zero or more pipes". So you probably want Regex Pattern = new Regex(@"s.*\|.*d");

In JavaScript, if you construct
var regex = /somestuff\otherstuff/;
then backslashes behave as you'd expect. But if you construct the very same thing with the constructor syntax
var regex = new RegExp("somestuff\\otherstuff");
then you have to double all backslashes, because the pattern first goes through string-literal escaping. I suspect your first attempt was correct, but you introduced a new problem while solving the old one, in that you ran afoul of this other issue about single backslashes.

Related

Simplify Regex grouping

var pattern = (?:[P|p]rint\("")(.+)(?:""\);?)
var input = Print("Hello World");
Results in two groups, the second one captures exactly what I want to capture and the first one is completely useless, how do I remove the first one?
I tried (?:ABC) it didn't work
Your pattern uses 1 capturing group () and 2 non capturing groups using (?:)
You can omit those 2 non-capturing groups, as well as the | from the character class. I think you would also like to make the .+ non-greedy, like .+?, to prevent overmatching.
Then your pattern could look like this (matching an optional semicolon at the end):
[Pp]rint\("(.+?)"\);?
Regex demo
You might also use a version with a negated character class to match not a double quote:
[Pp]rint\("([^"]+)"\);
Regex demo
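As a rough C# sketch of the single-group version with a negated character class: the class name PrintDemo and the method Message are invented for this example, and the quotes are doubled because the pattern sits in a verbatim string.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PrintDemo
{
    // Hypothetical helper: returns the text between the quotes of a Print("...") call,
    // or an empty string when the input does not contain one.
    public static string Message(string input)
    {
        // Only one capturing group; [^""]+ stops at the closing quote.
        Match m = Regex.Match(input, @"[Pp]rint\(""([^""]+)""\);?");
        return m.Success ? m.Groups[1].Value : string.Empty;
    }

    public static void Main()
    {
        Console.WriteLine(Message("var input = Print(\"Hello World\");"));  // Hello World
    }
}
```

Groups[0] is always the whole match, so Groups[1] is the only group this pattern adds.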
Try the following:
string input = "var input = Print(\"Hello World\");";
string pattern = "[Pp]rint\\(\"(?'message'[^\"]+)";
Match match = Regex.Match(input, pattern);
string message = match.Groups["message"].Value;

Regular Expression to match the pattern

I am looking for Regular Expression search pattern to find data within $< and >$.
string pattern = "\b\$<[^>]*>\$";
is not working.
Thanks,
You can make use of a tempered greedy token:
\$<(?:(?!\$<|>\$)[\s\S])*>\$
See demo
This way, you will match only the closest boundaries.
Your regex does not match because you do not allow > in-between your markers, and you are using \b where you most probably do not have a word boundary.
If you do not want to get the delimiters in the output, use capturing group:
\$<((?:(?!\$<|>\$)[\s\S])*)>\$
   ^                      ^
And the result will be in Group 1.
In C#, you should consider declaring all regex patterns (whenever possible) as verbatim string literals (with @"") because you won't have to worry about doubling backslashes:
var rx = new Regex(@"\$<(?:(?!\$<|>\$)[\s\S])*>\$");
Or, since there is a singleline flag (and this is preferable):
var rx = new Regex(@"\$<((?:(?!\$<|>\$).)*)>\$", RegexOptions.Singleline | RegexOptions.CultureInvariant);
var res = rx.Matches(text).Cast<Match>().Select(p => p.Groups[1].Value).ToList();
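Put together, the tempered greedy token might be used like this. This is only a sketch: the class name MarkerDemo, the method Extract and the sample input are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class MarkerDemo
{
    // Group 1 holds the text between the closest "$<" and ">$" pair.
    private static readonly Regex Rx = new Regex(
        @"\$<((?:(?!\$<|>\$).)*)>\$",
        RegexOptions.Singleline | RegexOptions.CultureInvariant);

    // Hypothetical helper: returns the contents of every $<...>$ marker in the text.
    public static List<string> Extract(string text)
    {
        // Cast<Match>() keeps this working on frameworks where MatchCollection is non-generic.
        return Rx.Matches(text).Cast<Match>().Select(m => m.Groups[1].Value).ToList();
    }

    public static void Main()
    {
        // The first marker contains a '>' that [^>]* would have rejected.
        foreach (string s in Extract("a $<x > y>$ b $<NAME>$ c"))
            Console.WriteLine(s);
    }
}
```

For the sample input this yields "x > y" and "NAME".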
This pattern will do the work:
(?<=\$<).*(?=>\$)
Demo: https://regex101.com/r/oY6mO2/1
To find this pattern in PHP, you can use the following regex (the $ signs must be escaped, otherwise they act as end-of-string anchors):
/\$<(.*?)>\$/s
For Example:
$arrayWhichStoreKeyValueArrayOfYourPattern= array();
preg_match_all('/\$<(.*?)>\$/s',
$yourcontentinwhichyoufind,
$arrayWhichStoreKeyValueArrayOfYourPattern);
for($i=0;$i<count($arrayWhichStoreKeyValueArrayOfYourPattern[0]);$i++)
{
$content=
str_replace(
$arrayWhichStoreKeyValueArrayOfYourPattern[0][$i],
constant($arrayWhichStoreKeyValueArrayOfYourPattern[1][$i]),
$yourcontentinwhichyoufind);
}
Using this example, each marker in $yourcontentinwhichyoufind is replaced with the value of the constant that has the same name.
For example, suppose you have a string like this, together with constants of the same names.
**global.php**
//in this file my constant declared.
define("MYNAME","Hiren Raiyani");
define("CONSTANT_VAL","contant value");
**demo.php**
$content="Hello this is $<MYNAME>$ and this is simple demo to replace $<CONSTANT_VAL>$";
$myarray = array();
preg_match_all('/\$<(.*?)>\$/s', $content, $myarray);
for($i=0;$i<count($myarray[0]);$i++)
{
$content=str_replace(
$myarray[0][$i],
constant($myarray[1][$i]),
$content);
}
As far as I know, that's all.

problem in regular expression

I have a regular expression
Regex r = new Regex(@"(\s*)([A|B|C|E|G|H|J|K|L|M|N|P|R|S|T|V|Y|X]\d(?!.*[DFIOQU])(?:[A-Z](\s?)\d[A-Z]\d))(\s*)",RegexOptions.IgnoreCase);
and a string
string test="LJHLJHL HJGJKDGKJ JGJK C1C 1C1 LKJLKJ";
I have to fetch C1C 1C1. This runs fine.
But if I modify the test string to
string test="LJHLJHL HJGJKDGKJ JGJK C1C 1C1 ON";
then it is unable to find the pattern, i.e. C1C 1C1.
Any idea why this expression is failing?
You have a negative look ahead:
(?!.*[DFIOQU])
That matches the "O" in "ON" and since it is a negative look ahead, the whole pattern fails. And, as an aside, I think you want to replace this:
[A|B|C|E|G|H|J|K|L|M|N|P|R|S|T|V|Y|X]
With this:
[A-CEGHJ-NPR-TVYX]
A pipe (|) is a literal character inside a character class, not an alternation, and you can use ranges to help highlight the characters that you're leaving out.
A single regex might not be the best way to parse that string. Or perhaps you just need a looser regex.
With your negative lookahead (?!.*[DFIOQU]) you require that none of D, F, I, O, Q, U appears anywhere after that position.
In your second string there is an O at the end, in ON, so the match fails.
If you remove the .* from the negative lookahead, it will only check the character directly following, not the complete string to the end (is this what you want?).
\s*([ABCEGHJKLMNPRSTVYX]\d(?![DFIOQU])(?:[A-Z]\s?\d[A-Z]\d))\s*
Then it works, see it here on Regexr. It now checks that the character directly after the digit is not one of those in the class; I don't know if this is intended.
Btw. I removed the | from your first character class, since it's not needed, and also some brackets around your whitespace, which are not needed either.
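To see the effect of the .* inside the lookahead, here is a small sketch that runs both variants against the failing string. The class name PostalDemo and the method Find are invented for this example, and the pattern is a trimmed version of the one above with the lookahead passed in as a parameter.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PostalDemo
{
    // Hypothetical helper: builds the pattern around the given lookahead and returns
    // the captured code, or an empty string when nothing matches.
    public static string Find(string input, string lookahead)
    {
        string pattern = @"([A-CEGHJ-NPR-TVXY]\d" + lookahead + @"[A-Z]\s?\d[A-Z]\d)";
        Match m = Regex.Match(input, pattern, RegexOptions.IgnoreCase);
        return m.Success ? m.Groups[1].Value : string.Empty;
    }

    public static void Main()
    {
        string test = "LJHLJHL HJGJKDGKJ JGJK C1C 1C1 ON";

        // .* lets the lookahead scan to the end of the string, where it finds the O of ON.
        Console.WriteLine("[" + Find(test, @"(?!.*[DFIOQU])") + "]");  // []

        // Without .* only the next character is inspected.
        Console.WriteLine("[" + Find(test, @"(?![DFIOQU])") + "]");    // [C1C 1C1]
    }
}
```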
As I understand it, you need to find the C1C 1C1 text in your string.
I've used this regex to do this:
string strRegex = @"^.*(?<c1c>C1C)\s*(?<c1c2>1C1).*$";
After that you can extract the text from the named groups:
string strRegex = @"^.*(?<c1c>C1C)\s*(?<c1c2>1C1).*$";
RegexOptions myRegexOptions = RegexOptions.Multiline;
Regex myRegex = new Regex(strRegex, myRegexOptions);
string strTargetString = @"LJHLJHL HJGJKDGKJ JGJK C1C 1C1 LKJLKJ";
string secondStr = "LJHLJHL HJGJKDGKJ JGJK C1C 1C1 ON";
Match match = myRegex.Match(strTargetString);
string c1c = match.Groups["c1c"].Value;
string c1c2 = match.Groups["c1c2"].Value;
Console.WriteLine(c1c + " " +c1c2);

Simple regex question C#

I need to match the string that is shown in the window displayed below :
8% of setup_av_free.exe from software-files-l.cnet.com Completed
98% of test.zip from 65.55.72.119 Completed
[numeric]%of[filename]from[hostname | IP address]Completed
I have written the regex pattern halfway
if (Regex.IsMatch(text, @"[\d]+%[\s]of[\s](.+?)(\.[^.]*)[\s]from[\s]"))
MessageBox.Show(text);
and I now need to integrate the following regex into my code above
ValidIpAddressRegex = "^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])$";
ValidHostnameRegex = "^(([a-zA-Z]|[a-zA-Z][a-zA-Z0-9\-]*[a-zA-Z0-9])\.)*([A-Za-z]|[A-Za-z][A-Za-z0-9\-]*[A-Za-z0-9])$";
The two regexes were taken from this link. They work well when I use Regex.IsMatch to match "123.123.123.123" and "software-files-l.cnet.com". However, I cannot get them to work when I integrate both of them into my existing regex code. I tried several variants without success. Can someone guide me on integrating the two regexes into my existing code? Thanks in advance.
You can certainly combine all these regular expressions into one, but I'd recommend against it. Consider this method: first it checks whether your input text has the correct overall form, then it checks whether the "from" part is an IP address or a hostname.
bool CheckString(string text) {
const string ValidIpAddressRegex = @"^(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])$";
const string ValidHostnameRegex = @"^(([a-zA-Z]|[a-zA-Z][a-zA-Z0-9\-]*[a-zA-Z0-9])\.)*([A-Za-z]|[A-Za-z][A-Za-z0-9\-]*[A-Za-z0-9])$";
var match = Regex.Match(text, @"[\d]+%[\s]of[\s](.+?)(\.[^.]*)[\s]from[\s](\S+)");
if(!match.Success)
return false;
string address = match.Groups[3].Value;
return Regex.IsMatch(address, ValidIpAddressRegex) ||
Regex.IsMatch(address, ValidHostnameRegex);
}
It does what you want and is much more readable than a single monster-sized regular expression. Unless you are going to call this method millions of times in a loop, there is no reason to be concerned about it being less performant than a single regex.
Also, in case you aren't aware, the brackets around \d and \s aren't necessary.
The reason those two regexes do not match your string is that they start with ^ and end with $.
^ means match the start of the string (or of the line if the m modifier is activated)
$ means match the end of the string (or of the line if the m modifier is activated)
When you test them on an IP address or hostname alone, this holds, but in your real text they are in the middle of the string, so nothing matches.
Try removing the ^ at the very beginning and the $ at the very end.
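A small sketch of why the anchors matter. The class name AnchorDemo and its members are hypothetical, and the octet pattern is a compact rewrite of my own that accepts the same values (0 to 255) as the one in the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // One IPv4 octet, 0-255 (assumption: equivalent to the alternation in the question).
    private const string Octet = @"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])";

    // Four octets separated by dots, with no ^ or $ around them.
    public static readonly string Ip = Octet + @"(?:\." + Octet + "){3}";

    // Anchored: the whole input must be an IP address.
    public static bool IsIp(string text)
    {
        return Regex.IsMatch(text, "^" + Ip + "$");
    }

    // Unanchored: the IP address may sit in the middle of a longer line.
    public static bool LineHasIp(string text)
    {
        return Regex.IsMatch(text, @"\sfrom\s" + Ip + @"\sCompleted");
    }

    public static void Main()
    {
        string line = "98% of test.zip from 65.55.72.119 Completed";
        Console.WriteLine(IsIp("65.55.72.119"));  // True
        Console.WriteLine(IsIp(line));            // False: ^ and $ demand the whole string
        Console.WriteLine(LineHasIp(line));       // True
    }
}
```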
Here you go.
^[\d]+%[\s+]of[\s+](.+?)(\.[^.]*)[\s+]from[\s+]((([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\.){3}([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])|((([a-zA-Z]|[a-zA-Z][a-zA-Z0-9\-]*[a-zA-Z0-9])\.)*([A-Za-z]|[A-Za-z][A-Za-z0-9\-]*[A-Za-z0-9])))[\s+]Completed
Remove the ^ and $ characters from the ValidIpAddressRegex and ValidHostnameRegex samples above, and add them separated by the or character (|) enclosed by parentheses.
You could use this; it should work for all cases. I might've accidentally deleted a character while formatting, so let me know if it doesn't work.
string captureString = "8% of setup_av_free.exe from software-files-l.cnet.com Completed";
Regex reg = new Regex(@"(?<perc>\d+)% of (?<file>\w+\.\w+) from (?<host>" +
@"(\d+\.\d+.\d+.\d+)|(((https?|ftp|gopher|telnet|file|notes|ms-help):" +
@"((//)|(\\\\))+)?[\w\d:#@%/;$()~_?\+-=\\\.&]*)) Completed");
Match m = reg.Match(captureString);
string perc = m.Groups["perc"].Value;
string file = m.Groups["file"].Value;
string host = m.Groups["host"].Value;

C# - Regex Match whole words

I need to match all the whole words containing a given a string.
string s = "ABC.MYTESTING
XYZ.YOUTESTED
ANY.TESTING";
Regex r = new Regex("(?<TM>[!\..]*TEST.*)", ...);
MatchCollection mc = r.Matches(s);
I need the result to be:
MYTESTING
YOUTESTED
TESTING
But I get:
TESTING
TESTED
.TESTING
How do I achieve this with Regular expressions.
Edit: Extended sample string.
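If you are looking for all words containing 'TEST', you should use
@"(?<TM>\w*TEST\w*)"
\w matches word characters; for ASCII input it is equivalent to [A-Za-z0-9_] (in .NET it also matches Unicode letters and digits unless RegexOptions.ECMAScript is set).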
Keep it simple: why not just try \w*TEST\w* as the match pattern.
I get the results you are expecting with the following:
string s = @"ABC.MYTESTING
XYZ.YOUTESTED
ANY.TESTING";
var m = Regex.Matches(s, @"(\w*TEST\w*)", RegexOptions.IgnoreCase);
Try using \b. It's the word-boundary anchor: it matches at a position between a word character and a non-word character. If you wanted to match each whole word you could use:
/\b[a-z]+\b/i
BTW, .net doesn't need the surrounding /, and the i is just a case-insensitive match flag.
.NET Alternative:
var re = new Regex(@"\b[a-z]+\b", RegexOptions.IgnoreCase);
Using Groups I think you can achieve it.
string s = @"ABC.TESTING
XYZ.TESTED";
Regex r = new Regex(@"(?<TM>[!\..]*(?<test>TEST.*))", RegexOptions.Multiline);
var mc= r.Matches(s);
foreach (Match match in mc)
{
Console.WriteLine(match.Groups["test"]);
}
Works exactly like you want.
BTW, your regular expression pattern should be a verbatim string (@""):
Regex r = new Regex(@"(?<TM>[^.]*TEST.*)", RegexOptions.IgnoreCase);
First, as @manojlds said, you should use verbatim strings for regexes whenever possible. Otherwise you'll have to use two backslashes in most of your regex escape sequences, not just one (e.g. [!\\..]*).
Second, if you want to match anything but a dot, that part of the regex should be [^.]*. ^ is the metacharacter that inverts the character class, not !, and . has no special meaning in that context, so it doesn't need to be escaped. But you should probably use \w* instead, or even [A-Z]*, depending on what exactly you mean by "word". [!\..] matches ! or ..
Regex r = new Regex(@"(?<TM>[A-Z]*TEST[A-Z]*)", RegexOptions.IgnoreCase);
That way you don't need to bother with word boundaries, though they don't hurt:
Regex r = new Regex(@"(?<TM>\b[A-Z]*TEST[A-Z]*\b)", RegexOptions.IgnoreCase);
Finally, if you're always taking the whole match anyway, you don't need to use a capturing group:
Regex r = new Regex(@"\b[A-Z]*TEST[A-Z]*\b", RegexOptions.IgnoreCase);
The matched text will be available via Match's Value property.
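For completeness, a minimal sketch that runs the group-free pattern over the sample text from the question. The class name WordDemo and the method WordsContaining are made up for illustration, and "word" is assumed to mean a run of letters.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordDemo
{
    // Hypothetical helper: returns every run of letters that contains the given fragment.
    public static List<string> WordsContaining(string text, string fragment)
    {
        // Regex.Escape guards against metacharacters in the fragment.
        string pattern = @"\b[A-Z]*" + Regex.Escape(fragment) + @"[A-Z]*\b";
        return Regex.Matches(text, pattern, RegexOptions.IgnoreCase)
                    .Cast<Match>()
                    .Select(m => m.Value)   // the whole match, no capturing group needed
                    .ToList();
    }

    public static void Main()
    {
        string s = "ABC.MYTESTING\nXYZ.YOUTESTED\nANY.TESTING";
        foreach (string word in WordsContaining(s, "TEST"))
            Console.WriteLine(word);
    }
}
```

On the sample string this prints MYTESTING, YOUTESTED and TESTING, one per line.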
