Get particular parts from a string - c#

I'm trying to get particular parts from a string. I have to get the part which starts after '#' and contains only letters from the Latin alphabet.
I suppose that I have to create a regex pattern, but I don't know how.
string test = "PQ#Alderaa1:30000!A!->20000";
var planet = "Alderaa"; //what I want to get
string test2 = "#Cantonica:3000!D!->4000NM";
var planet2 = "Cantonica";
There are some other parts which I have to get, but I will try to get them myself. (starts after ':' and is an Integer; may be "A" (attack) or "D" (destruction) and must be surrounded by "!" (exclamation mark); starts after "->" and should be an Integer)

You could get the separate parts using capturing groups:
#([a-zA-Z]+)[^:]*:(\d+)!([AD])!->(\d+)
That will match:
#([a-zA-Z]+) Match # and capture 1+ chars a-zA-Z in group 1
[^:]*: Match 0+ chars other than : using a negated character class, then match a : (if only optional digits can occur there, you could match 0+ digits with [0-9]* instead)
(\d+) Capture 1+ digits in group 2
!([AD])! Match !, capture A or D in group 3, then match !
->(\d+) Match -> and capture 1+ digits in group 4
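As a minimal sketch of how the four groups could be read in C#, assuming a hypothetical helper class named PlanetParser (the class and method names are illustrative, not part of the original answer):

```csharp
using System;
using System.Text.RegularExpressions;

public static class PlanetParser
{
    // Pattern from the answer above. Groups: 1 = name, 2 = first number,
    // 3 = A or D, 4 = second number.
    private static readonly Regex Pattern =
        new Regex(@"#([a-zA-Z]+)[^:]*:(\d+)!([AD])!->(\d+)");

    // Hypothetical helper: returns the four captured parts,
    // or null when the input does not match.
    public static string[] Parse(string input)
    {
        Match m = Pattern.Match(input);
        if (!m.Success) return null;
        return new[]
        {
            m.Groups[1].Value, m.Groups[2].Value,
            m.Groups[3].Value, m.Groups[4].Value
        };
    }

    public static void Main()
    {
        var inputs = new[] { "PQ#Alderaa1:30000!A!->20000", "#Cantonica:3000!D!->4000NM" };
        foreach (string s in inputs)
        {
            string[] parts = Parse(s);
            Console.WriteLine(parts == null ? "no match" : string.Join(", ", parts));
        }
    }
}
```

For the two sample strings from the question this should print "Alderaa, 30000, A, 20000" and "Cantonica, 3000, D, 4000". The numbers are returned as strings; convert them with int.Parse if integers are needed.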

You can use this regex. It uses a positive lookbehind to ensure the matched text is preceded by #, matches one or more letters with [a-zA-Z]+, and uses a positive lookahead to ensure the match is followed by some optional text, a colon, one or more digits, a !, either A or D, and another !.
(?<=#)[a-zA-Z]+(?=[^:]*:\d+![AD]!)
C# code demo
string test = "PQ#Alderaa1:30000!A!->20000";
Match m1 = Regex.Match(test, @"(?<=#)[a-zA-Z]+(?=[^:]*:\d+![AD]!)");
Console.WriteLine(m1.Groups[0].Value);
test = "#Cantonica:3000!D!";
m1 = Regex.Match(test, @"(?<=#)[a-zA-Z]+(?=[^:]*:\d+![AD]!)");
Console.WriteLine(m1.Groups[0].Value);
Prints:
Alderaa
Cantonica

You already have good answers, but I would like to add one more to show named capturing groups.
You can create a class for your planets like
class Planet
{
public string Name;
public int Value1; // name is not clear from context
public string Category; // as above: rename it
public string Value2; // same problem
}
Now you can use regex with named groups
#(?<name>[a-z]+)[^:]*:(?<value1>\d+)!(?<category>[^!]+)!->(?<value2>[\da-z]+)
Usage:
var input = new[]
{
"PQ#Alderaa1:30000!A!->20000",
"#Cantonica:3000!D!->4000NM",
};
var regex = new Regex("#(?<name>[a-z]+)[^:]*:(?<value1>\\d+)!(?<category>[^!]+)!->(?<value2>[\\da-z]+)",
RegexOptions.IgnoreCase | RegexOptions.Compiled);
var planets = input
.Select(p => regex.Match(p))
.Select(m => new Planet
{
Name = m.Groups["name"].Value, // from here on, each part of the input string is accessed by group name
Value1 = int.Parse(m.Groups["value1"].Value),
Category = m.Groups["category"].Value,
Value2 = m.Groups["value2"].Value
})
.ToList();

Related

Regex to find all placeholder occurrences in text

I'm struggling to create a Regex that finds all placeholder occurrences in a given text. Placeholders will have the following format:
[{PRE.Word1.Word2}]
Rules:
Delimited by "[{PRE." and "}]" ("PRE" upper case)
2 words (at least 1 char long each) separated by a dot. All chars valid on each word apart from newline.
word1: min 1 char, max 15 chars
word2: min 1 char, max 64 chars
word1 cannot have dots, if there are more than 2 dots inside placeholder extra ones will be part of word2. If less than 2 dots, placeholder is invalid.
Looking to get all valid placeholders regardless of what the 2 words are.
I'm not being lazy; I just spent a horrible amount of time building the rule on regexr.com, but was unable to satisfy all these rules.
Looking forward to checking your suggestions.
The closest I got was the pattern below, and any attempt to expand on it breaks all valid matches.
\[\{OEP\.*\.*\}\]
Much appreciated!
Sample text where Regex should find matches:
Random text here
[{Test}] -- NO MATCH
[{PRE.TestTest3}] --NO MATCH
[{PRE.TooLong.12345678901234567890}] --NO MATCH
[{PRE.Address.Country}] --MATCH
[{PRE.Version.1.0}] --MATCH
Random text here
You can use
\[{PRE\.([^][{}.]{1,15})\.(.{1,64}?)}]
Details
\[{ - a [{ string
PRE\. - PRE. text
([^][{}.]{1,15}) - Group 1: any one to fifteen chars other than [, ], {, } and .
\. - a dot
(.{1,64}?) - Group 2: any one to 64 chars other than line break chars, as few as possible
}] - a }] text.
If you need to get all matches in C#, you can use
var pattern = @"\[{PRE\.([^][{}.]{1,15})\.(.{1,64}?)}]";
var matches = Regex.Matches(text, pattern);
See this C# demo:
using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;
public class Test
{
public static void Main()
{
var text = "[{PRE.Word1.Word2}] and [{PRE.Word 3.Word..... 2 %%%}]";
var pattern = @"\[{PRE\.([^][{}.]{1,15})\.(.{1,64}?)}]";
var matches = Regex.Matches(text, pattern);
var props = new List<Property>();
foreach (Match m in matches)
props.Add(new Property(m.Groups[1].Value,m.Groups[2].Value));
foreach (var item in props)
Console.WriteLine("Word1 = " + item.Word1 + ", Word2 = " + item.Word2);
}
public class Property
{
public string Word1 { get; set; }
public string Word2 { get; set; }
public Property()
{}
public Property(string w1, string w2)
{
this.Word1 = w1;
this.Word2 = w2;
}
}
}
Output:
Word1 = Word1, Word2 = Word2
Word1 = Word 3, Word2 = Word..... 2 %%%
string input = "[{PRE.Word1.Word2}]";
// language=regex
string pattern = @"\[{ PRE \. (?'group1' .{1,15}? ) \. (?'group2' .{1,64}? ) }]";
var match = Regex.Match(input, pattern, RegexOptions.IgnorePatternWhitespace);
Console.WriteLine(match.Groups["group1"].Value);
Console.WriteLine(match.Groups["group2"].Value);

Building a regular expression in C#

How to check the following text in C# with Regex:
key_in-get { 43243225543543543 };
or
key_in-set { password123 : 34980430943834 };
I tried to build a regular expression, but I failed after a few hours.
Here is my code:
string text1 = "key_in-get { 322389238237 };";
string text2 = "key_in-set { password123 : 322389238237 };";
string pattern = "key_in-(get|set) { .* };";
var result1 = Regex.IsMatch(text1, pattern);
Console.Write("Is valid: {0} ", result1);
var result2 = Regex.IsMatch(text2, pattern);
Console.Write("Is valid: {0} ", result2);
I have to check if there is "set" or "get".
If the pattern finds "set" then it can only accept following pattern "text123 : 123456789", and if it finds "get" then should accept only "123456789".
You can use
key_in-(?:get|(set)) {(?(1) \w+ :) \w+ };
key_in-(?:get|(set))\s*{(?(1)\s*\w+\s*:)\s*\w+\s*};
key_in-(?:get|(set))\s*{(?(1)\s*\w+\s*:)\s*\d+\s*};
The first pattern expects exactly one space between the elements. The second one allows any amount of whitespace between the elements, and the third one additionally allows only digits after : or as the get value.
If the whole string must match, add ^ at the start and $ at the end of the pattern.
Details:
key_in- - a substring
(?:get|(set)) - get or set (the latter is captured into Group 1)
\s* - zero or more whitespaces
{ - a { char
(?(1)\s*\w+\s*:) - a conditional construct: if Group 1 matched, match one or more word chars enclosed with zero or more whitespaces and then a colon
\s*\w+\s* - one or more word chars enclosed with zero or more whitespaces
}; - a literal substring.
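To illustrate the conditional construct, here is a small sketch that anchors the third pattern with ^ and $ and checks a few inputs. The class name KeyCommandValidator and the extra invalid samples are assumptions made for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class KeyCommandValidator
{
    // Third pattern from above, anchored so the whole string must match.
    // (?(1)...) only requires the "word :" part when group 1 (set) took part in the match.
    private static readonly Regex Pattern =
        new Regex(@"^key_in-(?:get|(set))\s*{(?(1)\s*\w+\s*:)\s*\d+\s*};$");

    public static bool IsValid(string input)
    {
        return Pattern.IsMatch(input);
    }

    public static void Main()
    {
        string[] samples =
        {
            "key_in-get { 322389238237 };",                // valid get
            "key_in-set { password123 : 322389238237 };",  // valid set
            "key_in-get { password123 : 322389238237 };",  // get must not carry a name
            "key_in-set { 322389238237 };"                 // set needs "name :"
        };
        foreach (string s in samples)
            Console.WriteLine(s + " => " + IsValid(s));
    }
}
```

The first two samples should be reported as valid and the last two as invalid, because the colon part is required only when set was captured.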
In the pattern that you tried, key_in-(get|set) { .* };, you match either get or set followed by { and then everything up to the last occurrence of };, so it could also match key_in-get { }; };
As an alternative solution, you could use an alternation | specifying each of the accepted parts for the get and the set.
key_in-(?:get\s*{\s*\w+|set\s*{\s*\w+\s*:\s*\w+)\s*};
The pattern matches
key_in- Match literally
(?: Non capture group
get\s*{\s*\w+ Match get, { between optional whitespace chars and 1+ word chars
| Or
set\s*{\s*\w+\s*:\s*\w+ Match set, { between optional whitespace chars and word chars on either side with : in between.
) Close non capture group
\s*}; Match optional whitespace chars and };
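A rough sketch of the alternation variant in C#, again anchored with ^ and $ so the whole string must match. The class name KeyCommandAlternation is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class KeyCommandAlternation
{
    // One alternative per accepted form: "get { value }" or "set { name : value }".
    private static readonly Regex Pattern =
        new Regex(@"^key_in-(?:get\s*{\s*\w+|set\s*{\s*\w+\s*:\s*\w+)\s*};$");

    public static bool IsValid(string input)
    {
        return Pattern.IsMatch(input);
    }

    public static void Main()
    {
        Console.WriteLine(IsValid("key_in-get { 322389238237 };"));               // True
        Console.WriteLine(IsValid("key_in-set { password123 : 322389238237 };")); // True
        Console.WriteLine(IsValid("key_in-set { 322389238237 };"));               // False
    }
}
```

Note that \w also accepts letters and underscores, so key_in-get { password123 }; passes with this pattern. Replace the value parts with \d+ if only digits should be allowed there.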

Regex - split by "_" and exclude file extension

I need to split the following string AAA_BBB_CCC.extension by "_" and exclude from the results any file extension.
Where A, B and C can be any character or space. I wish to get AAA, BBB and CCC.
I know that \.(?:.(?!\.))+$ will match .extension but I could not combine it with matching "_" for splitting.
Use the Path.GetFileNameWithoutExtension function to strip the extension from the file name.
Then use String.Split to get an array with three items:
var fileName = Path.GetFileNameWithoutExtension(fullName);
var parts = fileName.Split('_');
var partAAA = parts[0];
var partBBB = parts[1];
var partCCC = parts[2];
If the parts are always the same fixed number of characters long, you can as well extract them using the Substring function. No need to resort to regex here.
Another option is to make use of the .NET Group.Captures property: capture 1+ chars other than _ in a named group that is repeated in the pattern, then read every captured value from that group's Captures collection.
^(?'val'[^_]+)(?:_(?'val'[^_]+))+\.\w+$
Explanation
^ Start of string
(?'val'[^_]+) Named group val, match 1+ chars other than _ using a negated character class
(?: Non capture group
_(?'val'[^_]+) Match an _ and capture again 1+ chars other than _ in same named group val
)+ Close the non capture group and repeat 1+ times for at least 1 occurrence with _
\.\w+ Match a . and 1+ word chars
$ End of string
string pattern = @"^(?'val'[^_]+)(?:_(?'val'[^_]+))+\.\w+$";
string input = @"AAA_BBB_CCC.extension";
Match m = Regex.Match(input, pattern);
foreach (Capture capture in m.Groups["val"].Captures) {
Console.WriteLine(capture.Value);
}
Output
AAA
BBB
CCC
If you wanted to use a regex based approach here, you could try doing a find all on the following regex pattern:
[^_]+(?=.*\.\w+$)
This pattern matches every term between underscores. The lookahead requires that an extension still follows each match, so the extension itself is excluded.
Regex rx = new Regex(@"[^_]+(?=.*\.\w+$)");
string text = "AAA_BBB_CCC.extension";
MatchCollection matches = rx.Matches(text);
foreach (Match match in matches)
{
Console.WriteLine(match.Groups[0].Value);
}
This prints:
AAA
BBB
CCC

Regular expression matching a given structure

I need to generate a regex to match any string with this structure:
{"anyWord"}{"aSpace"}{"-"}{"anyLetter"}
How can I do it?
Thanks
EDIT
I have tried:
string txt="print -c";
string re1="((?:[a-z][a-z]+))"; // Word 1
Regex r = new Regex(re1,RegexOptions.IgnoreCase|RegexOptions.Singleline);
Match m = r.Match(txt);
if (m.Success)
{
String word1=m.Groups[1].ToString();
Console.Write("("+word1.ToString()+")"+"\n");
}
Console.ReadLine();
but this only matches the word "print"
This would be pretty straight-forward :
[a-zA-Z]+\s\-[a-zA-Z]
explained as follows :
[a-zA-Z]+ # Matches 1 or more letters
\s # Matches a single space
\- # Matches a single hyphen / dash
[a-zA-Z] # Matches a single letter
If you needed to implement this in C#, you could just use the Regex class and specifically the Regex.Matches() method:
var matches = Regex.Matches(yourString,@"[a-zA-Z]+\s\-[a-zA-Z]");
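A short sketch of what that call could return, assuming a hypothetical helper named CommandMatcher.FindAll and an invented sample string:

```csharp
using System;
using System.Text.RegularExpressions;

public static class CommandMatcher
{
    // Hypothetical helper: returns every "word -letter" occurrence found in the input.
    public static string[] FindAll(string input)
    {
        MatchCollection matches = Regex.Matches(input, @"[a-zA-Z]+\s\-[a-zA-Z]");
        var result = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
            result[i] = matches[i].Value;
        return result;
    }

    public static void Main()
    {
        // Sample input invented for this sketch.
        foreach (string value in FindAll("print -c, then list -a"))
            Console.WriteLine(value);
    }
}
```

This should print "print -c" and "list -a". The pattern is not anchored, so it also finds the structure inside longer text; wrap it in ^ and $ if the whole string must have exactly this shape.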

C# Regex capturing group not working

In the following code I want to capture anything that begins with test and followed by the text enclosed by double quotes. E.g.
test"abc"
test"rst"
The code works fine.
private void testRegex()
{
string st = "this test\"abc\"= or test\"rst\"\"uvw\" or test(def)(abc) is a test.";
Regex oRegex = new Regex("test\".*?\"");
foreach (Match mt in oRegex.Matches(st))
{
Console.WriteLine(mt.Value);
}
}
Then, from the above captures, I want to capture the subexpressions that follow the word test (in the above examples those subexpressions would be "abc" and "rst", including the "). I tried the following and it correctly gives me:
"abc"
"rst"
private void testRegex()
{
string st = "this test\"abc\"= or test\"rst\"\"uvw\" or test(def)(abc) is a test.";
Regex oRegex = new Regex("test(\".*?\")");
foreach (Match mt in oRegex.Matches(st))
{
Console.WriteLine(mt.Groups[1].Value);
}
}
Question: Now I want to capture the two subexpressions 1. "abc" and "rst" 2. Any character except " that follows the matches test"abc" and test"rst". So, I tried the following but as shown below the groups 1 and 2 for the match "rst""uvw" are wrong. I need group 1 of "rst""uvw" to be "rst" and group 2 to be empty since the character that follows "rst" is ":
Group 1: "abc"
Group 2: =
Group 1: "rst""
Group 2: u
private void testRegex()
{
string st = "this test\"abc\"= or test\"rst\"\"uvw\" or test(def)(abc) is a test.";
Regex oRegex = new Regex("test(\".*?\")([^\"])");
foreach (Match mt in oRegex.Matches(st))
{
Console.WriteLine(mt.Groups[1].Value);
Console.WriteLine(mt.Groups[2].Value);
}
}
You must be looking for
test("[^"]*")([^"])?
I made 2 changes:
Used negated character class [^"]* (matching 0 or more characters other than a double quote) instead of lazy matching any characters with .*?
Made the [^"] optional with ? quantifier.
Two alternative versions:
(?<=test)("[^"]+")([^"])?
In case you would like to keep result in one place:
(?<=test)("[^"]+"[^"]?)
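As a sketch, the first suggested pattern can be run against the string from the question like this. The class name QuotedCaptureDemo and the "group1|group2" output format are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QuotedCaptureDemo
{
    // Group 1: the quoted text including its quotes.
    // Group 2: an optional following character that is not a double quote.
    private static readonly Regex Pattern = new Regex("test(\"[^\"]*\")([^\"])?");

    // Hypothetical helper: returns one "group1|group2" string per match.
    public static List<string> Extract(string input)
    {
        var results = new List<string>();
        foreach (Match m in Pattern.Matches(input))
        {
            // An optional group that did not take part in the match has an empty Value.
            results.Add(m.Groups[1].Value + "|" + m.Groups[2].Value);
        }
        return results;
    }

    public static void Main()
    {
        string st = "this test\"abc\"= or test\"rst\"\"uvw\" or test(def)(abc) is a test.";
        foreach (string line in Extract(st))
            Console.WriteLine(line);
    }
}
```

For the question's input this should print "abc"|= followed by "rst"| with nothing after the separator, since the character after "rst" is a double quote and group 2 stays empty.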
