Regular expression matching a given structure - c#

I need to generate a regex to match any string with this structure:
{"anyWord"}{"aSpace"}{"-"}{"anyLetter"}
How can I do it?
Thanks
EDIT
I have tried:
string txt="print -c";
string re1="((?:[a-z][a-z]+))"; // Word 1
Regex r = new Regex(re1,RegexOptions.IgnoreCase|RegexOptions.Singleline);
Match m = r.Match(txt);
if (m.Success)
{
String word1=m.Groups[1].ToString();
Console.Write("("+word1.ToString()+")"+"\n");
}
Console.ReadLine();
but this only matches the word "print"

This would be pretty straightforward:
[a-zA-Z]+\s\-[a-zA-Z]
explained as follows :
[a-zA-Z]+ # Matches 1 or more letters
\s # Matches a single space
\- # Matches a single hyphen / dash
[a-zA-Z] # Matches a single letter
If you needed to implement this in C#, you could just use the Regex class and specifically the Regex.Matches() method:
var matches = Regex.Matches(yourString, @"[a-zA-Z]+\s\-[a-zA-Z]");
Some example matching might look like this :
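As a rough illustration of the pattern in use, here is a minimal sketch. The class name CommandFlagDemo, the helper FindAll, and the sample input are made up for this example and are not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CommandFlagDemo
{
    // Pattern from the answer: letters, one whitespace character, a hyphen, one letter.
    private static readonly Regex CommandFlag = new Regex(@"[a-zA-Z]+\s\-[a-zA-Z]");

    // Hypothetical helper: returns every substring of the input with the "word -x" shape.
    public static string[] FindAll(string input)
    {
        MatchCollection matches = CommandFlag.Matches(input);
        var results = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            results[i] = matches[i].Value;
        }
        return results;
    }

    public static void Main()
    {
        foreach (string hit in FindAll("print -c, then list -a"))
        {
            Console.WriteLine(hit); // prints "print -c", then "list -a"
        }
    }
}
```

Note that \s accepts any whitespace character (a tab, for instance), not only a space, and that the pattern is not anchored, so it also finds "print -c" inside "print -cc". If the whole string has to have this shape, wrapping the pattern in ^ and $ is one option.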

Related

Building a regular expression in C#

How to check the following text in C# with Regex:
key_in-get { 43243225543543543 };
or
key_in-set { password123 : 34980430943834 };
I tried to build a regular expression, but I failed after a few hours.
Here is my code:
string text1 = "key_in-get { 322389238237 };";
string text2 = "key_in-set { password123 : 322389238237 };";
string pattern = "key_in-(get|set) { .* };";
var result1 = Regex.IsMatch(text1, pattern);
Console.Write("Is valid: {0} ", result1);
var result2 = Regex.IsMatch(text2, pattern);
Console.Write("Is valid: {0} ", result2);
I have to check if there is "set" or "get".
If the pattern finds "set" then it can only accept following pattern "text123 : 123456789", and if it finds "get" then should accept only "123456789".
You can use
key_in-(?:get|(set)) {(?(1) \w+ :) \w+ };
key_in-(?:get|(set))\s*{(?(1)\s*\w+\s*:)\s*\w+\s*};
key_in-(?:get|(set))\s*{(?(1)\s*\w+\s*:)\s*\d+\s*};
See the regex demo. The second one allows any amount of any whitespace between the elements and the third one allows only digits after : or as part of the get expression.
If the whole string must match, add ^ at the start and $ at the end of the pattern.
Details:
key_in- - a substring
(?:get|(set)) - get or set (the latter is captured into Group 1)
\s* - zero or more whitespaces
{ - a { char
(?(1)\s*\w+\s*:) - a conditional construct: if Group 1 matched, match one or more word chars enclosed with zero or more whitespaces and then a colon
\s*\w+\s* - one or more word chars enclosed with zero or more whitespaces
}; - a literal substring.
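The conditional construct can be tried out with a short program. This is only a sketch: it uses the digits-only variant of the pattern, anchored with ^ and $, and escapes the braces for readability. The class name KeyInValidator and the method IsValid are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class KeyInValidator
{
    // (?(1)...) is only required when Group 1, i.e. "set", took part in the match.
    private static readonly Regex KeyIn = new Regex(
        @"^key_in-(?:get|(set))\s*\{(?(1)\s*\w+\s*:)\s*\d+\s*\};$");

    // Hypothetical helper: true when the whole line has the get or set shape.
    public static bool IsValid(string line)
    {
        return KeyIn.IsMatch(line);
    }

    public static void Main()
    {
        string[] samples =
        {
            "key_in-get { 43243225543543543 };",            // get with a number
            "key_in-set { password123 : 34980430943834 };", // set with name : number
            "key_in-get { password123 : 34980430943834 };", // get must not carry a name
            "key_in-set { 34980430943834 };",               // set must carry a name
        };
        foreach (string sample in samples)
        {
            Console.WriteLine(sample + " -> " + IsValid(sample));
        }
    }
}
```

With these inputs, the first two lines should report True and the last two False.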
In the pattern that you tried key_in-(get|set) { .* }; you are matching either get or set followed by { until the last occurrence of } which could possibly also match key_in-get { }; };
As an alternative solution, you could use an alternation | specifying each of the accepted parts for the get and the set.
key_in-(?:get\s*{\s*\w+|set\s*{\s*\w+\s*:\s*\w+)\s*};
The pattern matches
key_in- Match literally
(?: Non capture group
get\s*{\s*\w+ Match get, { between optional whitespace chars and 1+ word chars
| Or
set\s*{\s*\w+\s*:\s*\w+ Match set, { between optional whitespace chars and word chars on either side with : in between.
) Close non capture group
\s*}; Match optional whitespace chars and };
Regex demo
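To compare the original attempt with the alternation, a small sketch along these lines could help. The class and method names are invented for this example, the braces are escaped for readability, and the alternation is anchored with ^ and $ on the assumption that the whole string must match.

```csharp
using System;
using System.Text.RegularExpressions;

public static class KeyInAlternation
{
    // The asker's pattern: .* is greedy and runs up to the last " };" in the input.
    private static readonly Regex Loose = new Regex(@"key_in-(get|set) \{ .* \};");

    // The alternation: each branch spells out what may appear between the braces.
    private static readonly Regex Strict = new Regex(
        @"^key_in-(?:get\s*\{\s*\w+|set\s*\{\s*\w+\s*:\s*\w+)\s*\};$");

    public static bool MatchesLoose(string s) { return Loose.IsMatch(s); }
    public static bool MatchesStrict(string s) { return Strict.IsMatch(s); }

    public static void Main()
    {
        string[] samples =
        {
            "key_in-get { 322389238237 };",
            "key_in-set { password123 : 322389238237 };",
            "key_in-get { }; };",
        };
        foreach (string s in samples)
        {
            Console.WriteLine(s + " loose=" + MatchesLoose(s) + " strict=" + MatchesStrict(s));
        }
    }
}
```

Both patterns accept the first two samples. Only the loose pattern accepts the third, because the alternation requires at least one word character after the opening brace.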

Regex - split by "_" and exclude file extension

I need to split the following string AAA_BBB_CCC.extension by "_" and exclude from the results any file extension.
Where A, B and C can be any character or space. I wish to get AAA, BBB and CCC.
I know that \.(?:.(?!\.))+$ will match .extension but I could not combine it with matching "_" for splitting.
Use the Path.GetFileNameWithoutExtension function to strip the extension from the file name.
Then use String.Split to get an array with three items:
var fileName = Path.GetFileNameWithoutExtension(fullName);
var parts = fileName.Split('_');
var partAAA = parts[0];
var partBBB = parts[1];
var partCCC = parts[2];
If the parts are always the same fixed number of characters long, you can as well extract them using the Substring function. No need to resort to regex here.
Another option is to use the .NET Group.Captures property: capture each run of characters other than _ in the same named group, then read all of that group's captures from the match.
^(?'val'[^_]+)(?:_(?'val'[^_]+))+\.\w+$
Explanation
^ Start of string
(?'val'[^_]+) Named group val, match 1+ chars other than _ using a negated character class
(?: Non capture group
_(?'val'[^_]+) Match an _ and capture again 1+ chars other than _ in same named group val
)+ Close the non capture group and repeat 1+ times for at least 1 occurrence with _
\.\w+ Match a . and 1+ word chars
$ End of string
Regex demo
string pattern = @"^(?'val'[^_]+)(?:_(?'val'[^_]+))+\.\w+$";
string input = @"AAA_BBB_CCC.extension";
Match m = Regex.Match(input, pattern);
foreach (Capture capture in m.Groups["val"].Captures) {
Console.WriteLine(capture.Value);
}
Output
AAA
BBB
CCC
If you wanted to use a regex based approach here, you could try doing a find all on the following regex pattern:
[^_]+(?=.*\.\w+$)
This pattern will match every term between underscores. The extension itself is excluded, because the lookahead requires a dot and the extension to still follow the match.
Regex rx = new Regex(@"[^_]+(?=.*\.\w+$)");
string text = "AAA_BBB_CCC.extension";
MatchCollection matches = rx.Matches(text);
foreach (Match match in matches)
{
Console.WriteLine(match.Groups[0].Value);
}
This prints:
AAA
BBB
CCC

Build a regex that does not contain the first and last character you are looking for in the match

I have the following problem.
This is what the regex looks like:
var regexTest = new Regex(@"'\d.*\d#");
This is what the string looks like:
var text = "dsadsadsadsa('1.222222#dsadsa'";
That is the result of what I would like to have:
1.222222
That's the result I'm getting right now ...:
'1.222222#
You want to extract the float number in between ' and #, use
var text = "dsadsadsadsa('1.222222#dsadsa'";
var regexTest = new Regex(@"'(\d+\.\d+)#");
var m = regexTest.Match(text);
if (m.Success)
{
Console.WriteLine(m.Groups[1].Value);
}
Here, (\d+\.\d+) captures 1+ digits, a ., and then 1+ digits into Group 1, which you can access using m.Groups[1].Value. However, only use that value if there was a match (see the m.Success check in my demo snippet): on a failed match the group is unsuccessful and its Value is an empty string.
See the regex demo:
Just enclose the part you want to get in parentheses, so that you can get it as a group:
var regexTest = new Regex(@"'(\d.*\d)#");
-----------------------------^------^----
In '\d.*\d# you are matching ' followed by a digit, then any character 0+ times, then a digit and #. That would match '1.222222# but also, for example, '1.A2# because of the .*
To avoid matching the ' and the #, you could use a positive lookbehind and a positive lookahead to assert that they are there. If you only want to match digits, then the .* could be left out.
(?<=')\d+\.\d+(?=#)
Regex demo
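A minimal sketch of the lookaround version in C# might look as follows. The class name QuotedNumber and the method TryExtract are assumptions made for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuotedNumber
{
    // The lookbehind (?<=') and lookahead (?=#) assert the delimiters without consuming them,
    // so the match value is the number alone.
    private static readonly Regex Number = new Regex(@"(?<=')\d+\.\d+(?=#)");

    // Hypothetical helper: returns false and an empty string when nothing matches.
    public static bool TryExtract(string text, out string number)
    {
        Match m = Number.Match(text);
        number = m.Success ? m.Value : string.Empty;
        return m.Success;
    }

    public static void Main()
    {
        string number;
        if (TryExtract("dsadsadsadsa('1.222222#dsadsa'", out number))
        {
            Console.WriteLine(number); // prints 1.222222
        }
    }
}
```

Because the delimiters are not part of the match, m.Value can be used directly and no capture group is needed.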

Regular expression to find 3 repeated words

I'm trying to create a regular expression which matches the same word 3 times, they are separated by a comma. For example, some inputs would be:
HEY,HEY,HEY - match
NO,NO,NO - match
HEY,HI,HEY - no match
HEY,H,Y - no match
HEY,NO,HEY - no match
How can I go about doing this? I've had a look at some example but they are only good for characters, not words.
This should do the trick:
^(\w+),\1,\1$
Explanation:
^: beginning of the line. Needed to avoid matching "HHEY,HEY,HEY".
(\w+): matches one or more word characters. This is the first captured group.
,: the character comma.
\1: a backreference to the first captured group. In other words, matches whatever was matched in (\w+) before.
,: the character comma.
\1: a backreference to the first captured group.
$: end of the line. Needed to avoid matching "HEY,HEY,HEYY".
Source: https://msdn.microsoft.com/en-us/library/az24scfc(v=vs.110).aspx#Anchor_5
Example usage
static void Main()
{
var threeWords = new Regex(@"^(\w+),\1,\1$");
var lines = new[]
{
"HEY,HEY,HEY",
"NO,NO,NO",
"HEY,HI,HEY",
"HEY,H,Y",
"HEY,NO,HEY",
"HHEY,HEY,HEY",
"HEY,HEY,HEYY",
};
foreach (var line in lines)
{
var isMatch = threeWords.IsMatch(line) ? "" : "no ";
Console.WriteLine($"{line} - {isMatch}match");
}
}
Output:
HEY,HEY,HEY - match
NO,NO,NO - match
HEY,HI,HEY - no match
HEY,H,Y - no match
HEY,NO,HEY - no match
HHEY,HEY,HEY - no match
HEY,HEY,HEYY - no match

Regex to find special pattern

I have a string to parse. First I have to check if string contains special pattern:
I want to know if there are substrings which start with "$(",
and end with ")",
and between those start and end special strings,there should not be
any white-empty space,
it should not include "$" character inside it.
I have a little regex for it in C#
string input = "$(abc)";
string pattern = @"\$\(([^$][^\s]*)\)";
Regex rgx = new Regex(pattern, RegexOptions.IgnoreCase);
MatchCollection matches = rgx.Matches(input);
foreach (var match in matches)
{
Console.WriteLine("value = " + match);
}
It works for many cases but fails for input = $(a$(), where the second expression is empty. I do NOT want a match when the input is $() [there is nothing between the start and end identifiers].
What is wrong with my regex?
Note: [^$] matches any single character other than $
Use the below regex if you want to match $()
\$\(([^\s$]*)\)
Use the below regex if you don't want to match $(),
\$\(([^\s$]+)\)
* repeats the preceding token zero or more times.
+ Repeats the preceding token one or more times.
Your regex \$\(([^$][^\s]*)\) is wrong. It won't allow $ as the first character inside the parentheses, but it allows it as the second, third, and so on. You need to combine the negated classes in your regex in order to match any character that is neither whitespace nor $.
Your current regex does not match $() because the [^$] matches at least 1 character. The only way I can think of where you would have this match would be when you have an input containing more than one parens, like:
$()(something)
In those cases, you will also need to exclude at least the closing paren:
string pattern = @"\$\(([^$\s)]+)\)";
The above matches for example:
abc in $(abc) and
abc and def in $(def)$()$(abc)(something).
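The behaviour described above can be checked with a short sketch. The class name PlaceholderFinder and the method FindNames are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PlaceholderFinder
{
    // One or more characters that are not $, whitespace, or ) between "$(" and ")".
    private static readonly Regex Placeholder = new Regex(@"\$\(([^$\s)]+)\)");

    // Hypothetical helper: collects the text captured between "$(" and ")" for every match.
    public static List<string> FindNames(string input)
    {
        var names = new List<string>();
        foreach (Match match in Placeholder.Matches(input))
        {
            names.Add(match.Groups[1].Value);
        }
        return names;
    }

    public static void Main()
    {
        foreach (string name in FindNames("$(def)$()$(abc)(something)"))
        {
            Console.WriteLine("value = " + name); // prints "value = def", then "value = abc"
        }
    }
}
```

The empty $() is skipped because + needs at least one character, and excluding ) from the class keeps a match from running past the first closing parenthesis.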
Simply replace the * with a + and merge the options.
string pattern = @"\$\(([^$\s]+)\)";
+ means 1 or more
* means 0 or more
