Simplify Regex grouping

Simplify Regex grouping - c#

var pattern = (?:[P|p]rint\("")(.+)(?:""\);?)
var input = Print("Hello World");
Results in two groups, the second one captures exactly what I want to capture and the first one is completely useless, how do I remove the first one?
I tried (?:ABC) it didn't work

Your pattern uses 1 capturing group () and 2 non capturing groups using (?:)
Those 2 non capturing groups you can omit as well as the | from the character class. I think you also would like to make the .* non greedy like .*? to prevent overmatching.
Then your pattern could look like(Matching an optional semicolon at the end):
[Pp]rint\("(.+?)"\);?
Regex demo
You might also use a version with a negated character class to match not a double quote:
[Pp]rint\(("[^"]+)"\);
Regex demo

Try following :
string input = "var input = Print(\"Hello World\");";
string pattern = "[Pp]rint\\(\"(?'message'[^\"]+)";
Match match = Regex.Match(input, pattern);
string message = match.Groups["message"].Value;

Related

Regex match with multiple delimiters

I have a regex that takes out all parts of a string in between citation marks.
\(([^)]*)\)
So
*- (Hello) + (World) -
returns two matches
(Hello)
(World)
Im trying but failing to modify it so that i also get the parts in between as their own matches. Like:
*-
(Hello)
+
(World)
-
Is it even possible?

In this case, with the current regex, you may use Regex.Split with the pattern wrapped in a capturing group:
var tokens = Regex.Split(s, #"(\([^)]*\))");
Or even, when matches occur in the leading/trailing positions:
var tokens = Regex.Split(s, #"(\([^)]*\))").Where(m => !string.IsNullOrEmpty(m));
See the regex demo:
Note you may need to replace all capturing groups in your regex into non-capturing to use this feature. When you use "technical" capturing groups to later refer to using backreferences, you would have to build the non-matching substring array using multiple matching and calling .Substring() on the input using the information on the match position.

You could use an alternation to match either the parenthesis with the characters \([^)]*\) or | match one or more times the characters listed in a character class [*+-]+
\([^)]*\)|[*+-]+
string pattern = #"\([^)]*\)|[*+-]+";
string input = #"*- (Hello) + (World) - ";
foreach (Match m in Regex.Matches(input, pattern))
{
Console.WriteLine(m.Value);
}
That would give you:
*-
(Hello)
+
(World)
-
Demo C#

Regular Expression to match the pattern

I am looking for Regular Expression search pattern to find data within $< and >$.
string pattern = "\b\$<[^>]*>\$";
is not working.
Thanks,

You can make use of a tempered greedy token:
\$<(?:(?!\$<|>\$)[\s\S])*>\$
See demo
This way, you will match only the closest boundaries.
Your regex does not match because you do not allow > in-between your markers, and you are using \b where you most probably do not have a word boundary.
If you do not want to get the delimiters in the output, use capturing group:
\$<((?:(?!\$<|>\$)[\s\S])*)>\$
^ ^
And the result will be in Group 1.
In C#, you should consider declaring all regex patterns (whenever possible) with the help of a verbatim string literal notation (with #"") because you won't have to worry about doubling backslashes:
var rx = new Regex(#"\$<(?:(?!\$<|>\$)[\s\S])*>\$");
Or, since there is a singleline flag (and this is preferable):
var rx = new Regex(#"\$<((?:(?!\$<|>\$).)*)>\$", RegexOptions.Singleline | RegexOptions.CultureInvariant);
var res = rx.Match(text).Select(p => p.Groups[1].Value).ToList();

This pattern will do the work:
(?<=\$<).*(?=>\$)
Demo: https://regex101.com/r/oY6mO2/1

To find this pattern in php you have this REGEX code for find any patten,
/$<(.*?)>$/s
For Example:
$arrayWhichStoreKeyValueArrayOfYourPattern= array();
preg_match_all('/$<(.*?)>$/s',
$yourcontentinwhichyoufind,
$arrayWhichStoreKeyValueArrayOfYourPattern);
for($i=0;$i<count($arrayWhichStoreKeyValueArrayOfYourPattern[0]);$i++)
{
$content=
str_replace(
$arrayWhichStoreKeyValueArrayOfYourPattern[0][$i],
constant($arrayWhichStoreKeyValueArrayOfYourPattern[1][$i]),
$yourcontentinwhichyoufind);
}
using this example you will replace value using same name constant content in this var $yourcontentinwhichyoufind
For example you have string like this which has also same named constant.
**global.php**
//in this file my constant declared.
define("MYNAME","Hiren Raiyani");
define("CONSTANT_VAL","contant value");
**demo.php**
$content="Hello this is $<MYNAME>$ and this is simple demo to replace $<CONSTANT_VAL>$";
$myarr= array();
preg_match_all('/$<(.*?)>$/s', $content, $myarray);
for($i=0;$i<count($myarray[0]);$i++)
{
$content=str_replace(
$myarray[0][$i],
constant($myarray[1][$i]),
$content);
}
I think as i know that's all.

Regex to find special pattern

I have a string to parse. First I have to check if string contains special pattern:
I wanted to know if there is substrings which starts with "$(",
and end with ")",
and between those start and end special strings,there should not be
any white-empty space,
it should not include "$" character inside it.
I have a little regex for it in C#
string input = "$(abc)";
string pattern = #"\$\(([^$][^\s]*)\)";
Regex rgx = new Regex(pattern, RegexOptions.IgnoreCase);
MatchCollection matches = rgx.Matches(input);
foreach (var match in matches)
{
Console.WriteLine("value = " + match);
}
It works for many cases but failed at input= $(a$() , which inside the expression is empty. I wanted NOT to match when input is $().[ there is nothing between start and end identifiers].
What is wrong with my regex?

Note: [^$] matches a single character but not of $
Use the below regex if you want to match $()
\$\(([^\s$]*)\)
Use the below regex if you don't want to match $(),
\$\(([^\s$]+)\)
* repeats the preceding token zero or more times.
+ Repeats the preceding token one or more times.
Your regex \(([^$][^\s]*)\) is wrong. It won't allow $ as a first character inside () but it allows it as second or third ,, etc. See the demo here. You need to combine the negated classes in your regex inorder to match any character not of a space or $.

Your current regex does not match $() because the [^$] matches at least 1 character. The only way I can think of where you would have this match would be when you have an input containing more than one parens, like:
$()(something)
In those cases, you will also need to exclude at least the closing paren:
string pattern = #"\$\(([^$\s)]+)\)";
The above matches for example:
abc in $(abc) and
abc and def in $(def)$()$(abc)(something).

Simply replace the * with a + and merge the options.
string pattern = #"\$\(([^$\s]+)\)";
+ means 1 or more
* means 0 or more

C# RegEx - get only first match in string

I've got an input string that looks like this:
level=<device[195].level>&name=<device[195].name>
I want to create a RegEx that will parse out each of the <device> tags, for example, I'd expect two items to be matched from my input string: <device[195].level> and <device[195].name>.
So far I've had some luck with this pattern and code, but it always finds both of the device tags as a single match:
var pattern = "<device\\[[0-9]*\\]\\.\\S*>";
Regex rgx = new Regex(pattern);
var matches = rgx.Matches(httpData);
The result is that matches will contain a single result with the value <device[195].level>&name=<device[195].name>
I'm guessing there must be a way to 'terminate' the pattern, but I'm not sure what it is.

Use non-greedy quantifiers:
<device\[\d+\]\.\S+?>
Also, use verbatim strings for escaping regexes, it makes them much more readable:
var pattern = #"<device\[\d+\]\.\S+?>";
As a side note, I guess in your case using \w instead of \S would be more in line with what you intended, but I left the \S because I can't know that.

depends how much of the structure of the angle blocks you need to match, but you can do
"\\<device.+?\\>"

I want to create a RegEx that will parse out each of the <device> tags
I'd expect two items to be matched from my input string:
1. <device[195].level>
2. <device[195].name>
This should work. Get the matched group from index 1
(<device[^>]*>)
Live demo
String literals for use in programs:
#"(<device[^>]*>)"

Change your repetition operator and use \w instead of \S
var pattern = #"<device\[[0-9]+\]\.\w+>";
String s = #"level=<device[195].level>&name=<device[195].name>";
foreach (Match m in Regex.Matches(s, #"<device\[[0-9]+\]\.\w+>"))
Console.WriteLine(m.Value);
Output
<device[195].level>
<device[195].name>

Use named match groups and create a linq entity projection. There will be two matches, thus separating the individual items:
string data = "level=<device[195].level>&name=<device[195].name>";
string pattern = #"
(?<variable>[^=]+) # get the variable name
(?:=<device\[) # static '=<device'
(?<index>[^\]]+) # device number index
(?:]\.) # static ].
(?<sub>[^>]+) # Get the sub command
(?:>&?) # Match but don't capture the > and possible &
";
// Ignore pattern whitespace is to document the pattern, does not affect processing.
var items = Regex.Matches(data, pattern, RegexOptions.IgnorePatternWhitespace)
.OfType<Match>()
.Select (mt => new
{
Variable = mt.Groups["variable"].Value,
Index = mt.Groups["index"].Value,
Sub = mt.Groups["sub"].Value
})
.ToList();
items.ForEach(itm => Console.WriteLine ("{0}:{1}:{2}", itm.Variable, itm.Index, itm.Sub));
/* Output
level:195:level
name:195:name
*/

C# - Regex Match whole words

I need to match all the whole words containing a given a string.
string s = "ABC.MYTESTING
XYZ.YOUTESTED
ANY.TESTING";
Regex r = new Regex("(?<TM>[!\..]*TEST.*)", ...);
MatchCollection mc = r.Matches(s);
I need the result to be:
MYTESTING
YOUTESTED
TESTING
But I get:
TESTING
TESTED
.TESTING
How do I achieve this with Regular expressions.
Edit: Extended sample string.

If you were looking for all words including 'TEST', you should use
#"(?<TM>\w*TEST\w*)"
\w includes word characters and is short for [A-Za-z0-9_]

Keep it simple: why not just try \w*TEST\w* as the match pattern.

I get the results you are expecting with the following:
string s = #"ABC.MYTESTING
XYZ.YOUTESTED
ANY.TESTING";
var m = Regex.Matches(s, #"(\w*TEST\w*)", RegexOptions.IgnoreCase);

Try using \b. It's the regex flag for a non-word delimiter. If you wanted to match both words you could use:
/\b[a-z]+\b/i
BTW, .net doesn't need the surrounding /, and the i is just a case-insensitive match flag.
.NET Alternative:
var re = new Regex(#"\b[a-z]+\b", RegexOptions.IgnoreCase);

Using Groups I think you can achieve it.
string s = #"ABC.TESTING
XYZ.TESTED";
Regex r = new Regex(#"(?<TM>[!\..]*(?<test>TEST.*))", RegexOptions.Multiline);
var mc= r.Matches(s);
foreach (Match match in mc)
{
Console.WriteLine(match.Groups["test"]);
}
Works exactly like you want.
BTW, your regular expression pattern should be a verbatim string ( #"")

Regex r = new Regex(#"(?<TM>[^.]*TEST.*)", RegexOptions.IgnoreCase);
First, as #manojlds said, you should use verbatim strings for regexes whenever possible. Otherwise you'll have to use two backslashes in most of your regex escape sequences, not just one (e.g. [!\\..]*).
Second, if you want to match anything but a dot, that part of the regex should be [^.]*. ^ is the metacharacter that inverts the character class, not !, and . has no special meaning in that context, so it doesn't need to be escaped. But you should probably use \w* instead, or even [A-Z]*, depending on what exactly you mean by "word". [!\..] matches ! or ..
Regex r = new Regex(#"(?<TM>[A-Z]*TEST[A-Z]*)", RegexOptions.IgnoreCase);
That way you don't need to bother with word boundaries, though they don't hurt:
Regex r = new Regex(#"(?<TM>\b[A-Z]*TEST[A-Z]*\b)", RegexOptions.IgnoreCase);
Finally, if you're always taking the whole match anyway, you don't need to use a capturing group:
Regex r = new Regex(#"\b[A-Z]*TEST[A-Z]*\b", RegexOptions.IgnoreCase);
The matched text will be available via Match's Value property.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Simplify Regex grouping - c#

var pattern = (?:[P|p]rint\("")(.+)(?:""\);?) var input = Print("Hello World"); Results in two groups, the second one captures exactly what I want to capture and the first one is completely useless, how do I remove the first one? I tried (?:ABC) it didn't work

Try following : string input = "var input = Print(\"Hello World\");"; string pattern = "[Pp]rint\\(\"(?'message'[^\"]+)"; Match match = Regex.Match(input, pattern); string message = match.Groups["message"].Value;

Related

Regex match with multiple delimiters

Regular Expression to match the pattern

Regex to find special pattern

C# RegEx - get only first match in string

C# - Regex Match whole words

Categories

Resources