In the following code I want to capture anything that begins with test and is followed by text enclosed in double quotes. E.g.
test"abc"
test"rst"
The code works fine.
private void testRegex()
{
string st = "this test\"abc\"= or test\"rst\"\"uvw\" or test(def)(abc) is a test.";
Regex oRegex = new Regex("test\".*?\"");
foreach (Match mt in oRegex.Matches(st))
{
Console.WriteLine(mt.Value);
}
}
Then, from the above captures, I want to capture the subexpressions that follow the word test (in the above examples those subexpressions would be "abc" and "rst", including the " characters). I tried the following and it correctly gives me:
"abc"
"rst"
private void testRegex()
{
string st = "this test\"abc\"= or test\"rst\"\"uvw\" or test(def)(abc) is a test.";
Regex oRegex = new Regex("test(\".*?\")");
foreach (Match mt in oRegex.Matches(st))
{
Console.WriteLine(mt.Groups[1].Value);
}
}
Question: Now I want to capture two subexpressions: (1) "abc" and "rst", and (2) any single character other than " that follows the matches test"abc" and test"rst". I tried the following, but as shown below, groups 1 and 2 for the match on "rst""uvw" are wrong. I need group 1 for "rst""uvw" to be "rst" and group 2 to be empty, since the character that follows "rst" is ":
Group 1: "abc"
Group 2: =
Group 1: "rst""
Group 2: u
private void testRegex()
{
string st = "this test\"abc\"= or test\"rst\"\"uvw\" or test(def)(abc) is a test.";
Regex oRegex = new Regex("test(\".*?\")([^\"])");
foreach (Match mt in oRegex.Matches(st))
{
Console.WriteLine(mt.Groups[1].Value);
Console.WriteLine(mt.Groups[2].Value);
}
}
You must be looking for
test("[^"]*")([^"])?
I made 2 changes:
Used negated character class [^"]* (matching 0 or more characters other than a double quote) instead of lazy matching any characters with .*?
Made the [^"] optional with ? quantifier.
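To make the effect of those two changes concrete, here is a minimal sketch that runs the suggested pattern against the sample string from the question. The class and method names (QuotedAfterTest, Extract) are made up for illustration and are not part of the original answer. When the optional group does not take part in a match, its Value is the empty string.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, named here only for illustration.
static class QuotedAfterTest
{
    // Returns (group 1, group 2) for every match. Group 2 is "" when the
    // optional trailing character did not participate in the match.
    public static List<(string Quoted, string Next)> Extract(string input)
    {
        var regex = new Regex("test(\"[^\"]*\")([^\"])?");
        var results = new List<(string Quoted, string Next)>();
        foreach (Match m in regex.Matches(input))
        {
            results.Add((m.Groups[1].Value, m.Groups[2].Value));
        }
        return results;
    }

    static void Main()
    {
        string st = "this test\"abc\"= or test\"rst\"\"uvw\" or test(def)(abc) is a test.";
        foreach (var (quoted, next) in Extract(st))
        {
            Console.WriteLine("Group 1: " + quoted);
            Console.WriteLine("Group 2: " + next);
        }
    }
}
```

For the sample string this should report "abc" with = as group 2, and "rst" with an empty group 2, because the character after "rst" is a double quote.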
Two alternative versions:
(?<=test)("[^"]+")([^"])?
In case you would like to keep result in one place:
(?<=test)("[^"]+"[^"]?)
Can anybody help me with a regular expression in C#?
I want to create a pattern for this input:
{a? ab 12 ?? cd}
This is my pattern:
([A-Fa-f0-9?]{2})+
The problem is the curly brackets. This doesn't work:
{(([A-Fa-f0-9?]{2})+)}
It just works for
{ab}
I would use {([A-Fa-f0-9?]+|[^}]+)}
It captures one group, which matches either:
one or more characters from the list: [A-Fa-f0-9?]+
one or more characters other than }: [^}]+
If you allow leading/trailing whitespace within {...} string, the expression will look like
{(?:\s*([A-Fa-f0-9?]{2}))+\s*}
If you only allow a single regular space only between the values inside {...} and no space after { and before }, you can use
{(?:([A-Fa-f0-9?]{2})(?: (?!}))?)+}
Note this one is much stricter. Details for the first (whitespace-tolerant) pattern:
{ - a { char
(?:\s*([A-Fa-f0-9?]{2}))+ - one or more occurrences of
\s* - zero or more whitespaces
([A-Fa-f0-9?]{2}) - Capturing group 1: two hex or ? chars
\s* - zero or more whitespaces
} - a } char.
See a C# demo:
var text = "{a? ab 12 ?? cd}";
var pattern = @"{(?:([A-Fa-f0-9?]{2})(?: (?!}))?)+}";
var result = Regex.Matches(text, pattern)
.Cast<Match>()
.Select(x => x.Groups[1].Captures.Cast<Capture>().Select(m => m.Value))
.ToList();
foreach (var list in result)
Console.WriteLine(string.Join("; ", list));
// => a?; ab; 12; ??; cd
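As a rough comparison of the two patterns above, the following sketch feeds both of them an input with extra whitespace inside the braces. The helper names (HexPairs, Pairs) and the padded sample input are assumptions made for this example only.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, named here only for illustration.
static class HexPairs
{
    // Whitespace-tolerant pattern from the answer.
    public const string Lenient = @"{(?:\s*([A-Fa-f0-9?]{2}))+\s*}";
    // Strict pattern: at most one space between pairs, none next to the braces.
    public const string Strict = @"{(?:([A-Fa-f0-9?]{2})(?: (?!}))?)+}";

    // Returns every capture of group 1 from the first match, or an empty array.
    public static string[] Pairs(string text, string pattern)
    {
        Match m = Regex.Match(text, pattern);
        if (!m.Success) return new string[0];
        return m.Groups[1].Captures.Cast<Capture>().Select(c => c.Value).ToArray();
    }

    static void Main()
    {
        string padded = "{ a?  ab 12 }"; // assumed sample with extra whitespace
        Console.WriteLine(string.Join("; ", Pairs(padded, Lenient))); // a?; ab; 12
        Console.WriteLine(Pairs(padded, Strict).Length);              // 0 (no match)
    }
}
```

The strict pattern rejects the padded input because a space directly after { is not allowed, while the lenient one still collects all three pairs.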
If you want to capture pairs of chars between the curly braces, you can use a single capture group:
{([A-Fa-f0-9?]{2}(?: [A-Fa-f0-9?]{2})*)}
Explanation
{ Match {
( Capture group 1
[A-Fa-f0-9?]{2} Match 2 times any of the listed characters
(?: [A-Fa-f0-9?]{2})* Optionally repeat a space and again 2 of the listed characters
) Close group 1
} Match }
Example code
string pattern = @"{([A-Fa-f0-9?]{2}(?: [A-Fa-f0-9?]{2})*)}";
string input = @"{a? ab 12 ?? cd}
{ab}";
foreach (Match m in Regex.Matches(input, pattern))
{
Console.WriteLine(m.Groups[1].Value);
}
Output
a? ab 12 ?? cd
ab
I'm trying to get particular parts from a string. I have to get the part which starts after '#' and contains only letters from the Latin alphabet.
I suppose that I have to create a regex pattern, but I don't know how.
string test = "PQ#Alderaa1:30000!A!->20000";
var planet = "Alderaa"; //what I want to get
string test2 = "#Cantonica:3000!D!->4000NM";
var planet2 = "Cantonica";
There are some other parts which I have to get, but I will try to get them myself. (starts after ':' and is an Integer; may be "A" (attack) or "D" (destruction) and must be surrounded by "!" (exclamation mark); starts after "->" and should be an Integer)
You could get the separate parts using capturing groups:
#([a-zA-Z]+)[^:]*:(\d+)!([AD])!->(\d+)
That will match:
#([a-zA-Z]+) Match # and capture in group 1 1+ times a-zA-Z
[^:]*: Match 0+ times not a : using a negated character class, then match a : (If what follows could be only optional digits, you might also match 0+ times a digit [0-9]*)
(\d+) Capture in group 2 1+ digits
!([AD])! Match !, capture A or D in group 3, then match !
->(\d+) Match -> and capture in group 4 1+ digits
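A minimal sketch of reading the four groups described above, using the two sample strings from the question. The names PlanetParser and Parse are invented for this example. Note that the # inside the pattern is a literal character to match, not part of the C# string syntax.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, named here only for illustration.
static class PlanetParser
{
    static readonly Regex Pattern =
        new Regex(@"#([a-zA-Z]+)[^:]*:(\d+)!([AD])!->(\d+)");

    // Returns groups 1..4 of the first match, or an empty array if none.
    public static string[] Parse(string message)
    {
        Match m = Pattern.Match(message);
        if (!m.Success) return new string[0];
        return new[]
        {
            m.Groups[1].Value, // letters after '#'
            m.Groups[2].Value, // digits after ':'
            m.Groups[3].Value, // A or D between '!' marks
            m.Groups[4].Value  // digits after "->"
        };
    }

    static void Main()
    {
        Console.WriteLine(string.Join(", ", Parse("PQ#Alderaa1:30000!A!->20000")));
        // Alderaa, 30000, A, 20000
        Console.WriteLine(string.Join(", ", Parse("#Cantonica:3000!D!->4000NM")));
        // Cantonica, 3000, D, 4000
    }
}
```

Because the pattern is not anchored with $, the trailing NM in the second string is simply left unmatched.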
You can use this regex. It uses a positive lookbehind to ensure the matched text is preceded by #, captures one or more letters with [a-zA-Z]+, and uses a positive lookahead to ensure the letters are followed by some optional text, a colon, one or more digits, a !, either A or D, and another !.
(?<=#)[a-zA-Z]+(?=[^:]*:\d+![AD]!)
C# code demo
string test = "PQ#Alderaa1:30000!A!->20000";
Match m1 = Regex.Match(test, @"(?<=#)[a-zA-Z]+(?=[^:]*:\d+![AD]!)");
Console.WriteLine(m1.Groups[0].Value);
test = "#Cantonica:3000!D!";
m1 = Regex.Match(test, @"(?<=#)[a-zA-Z]+(?=[^:]*:\d+![AD]!)");
Console.WriteLine(m1.Groups[0].Value);
Prints,
Alderaa
Cantonica
You already have good answers, but I would like to add a new one to show named capturing groups.
You can create a class for your planets like
class Planet
{
public string Name;
public int Value1; // name is not clear from context
public string Category; // as above: rename it
public string Value2; // same problem
}
Now you can use regex with named groups
#(?<name>[a-z]+)[^:]*:(?<value1>\d+)!(?<category>[^!]+)!->(?<value2>[\da-z]+)
Usage:
var input = new[]
{
"PQ#Alderaa1:30000!A!->20000",
"#Cantonica:3000!D!->4000NM",
};
var regex = new Regex("#(?<name>[a-z]+)[^:]*:(?<value1>\\d+)!(?<category>[^!]+)!->(?<value2>[\\da-z]+)",
RegexOptions.IgnoreCase | RegexOptions.Compiled);
var planets = input
.Select(p => regex.Match(p))
.Select(m => new Planet
{
Name = m.Groups["name"].Value, // from here on, each part of the input string is accessed by group name
Value1 = int.Parse(m.Groups["value1"].Value),
Category = m.Groups["category"].Value,
Value2 = m.Groups["value2"].Value
})
.ToList();
I need to be able to grab specific elements out of a string that start and end with curly brackets. If I had a string:
"asjfaieprnv{1}oiuwehern{0}oaiwefn"
How could I grab just the 1 followed by the 0?
Regex is very useful for this.
What you want to match is:
\{ # a curly bracket
# - we need to escape this with \ as it is a special character in regex
[^}] # then anything that is not a closing curly bracket
# - this is a 'negated character class'
+ # (at least one time)
\} # then a closing curly bracket
# - this also needs to be escaped as it is special
We can collapse this to one line:
\{[^}]+\}
Next, you can capture and extract the inner contents by surrounding the part you want to extract with parentheses to form a group:
\{([^}]+)\}
In C# you'd do:
var matches = Regex.Matches(input, @"\{([^}]+)\}");
foreach (Match match in matches)
{
var groupContents = match.Groups[1].Value;
}
Group 0 is the whole match (in this case including the { and }), group 1 the first parenthesized part, and so on.
A full example:
var input = "asjfaieprnv{1}oiuwehern{0}oaiwef";
var matches = Regex.Matches(input, @"\{([^}]+)\}");
foreach (Match match in matches)
{
var groupContents = match.Groups[1].Value;
Console.WriteLine(groupContents);
}
Outputs:
1
0
Use the IndexOf method:
int openBracePos = yourstring.IndexOf("{");
int closeBracePos = yourstring.IndexOf("}", openBracePos + 1);
string stringIWant = yourstring.Substring(openBracePos + 1, closeBracePos - openBracePos - 1);
That will get the text inside the first pair of braces. You need to slice your string so that the first occurrence is no longer there, then repeat the above procedure to find your 2nd occurrence:
yourstring = yourstring.Substring(closeBracePos + 1);
Note: the curly braces do not need escaping in an ordinary C# string literal such as "{"; they are special only in format strings and interpolated strings.
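The repeat-until-done procedure can be sketched as a loop. Instead of slicing the string after each hit, this version passes a start index to IndexOf, which is a small variation on the approach described above. The names BraceScanner and Extract are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, named here only for illustration.
static class BraceScanner
{
    // Collects the text between each '{' and the next '}' after it.
    public static List<string> Extract(string s)
    {
        var results = new List<string>();
        int pos = 0;
        while (true)
        {
            int open = s.IndexOf('{', pos);
            if (open < 0) break;                 // no more opening braces
            int close = s.IndexOf('}', open + 1);
            if (close < 0) break;                // unmatched opening brace
            results.Add(s.Substring(open + 1, close - open - 1));
            pos = close + 1;                     // continue after this pair
        }
        return results;
    }

    static void Main()
    {
        foreach (string item in Extract("asjfaieprnv{1}oiuwehern{0}oaiwefn"))
        {
            Console.WriteLine(item);
        }
        // prints 1, then 0
    }
}
```

Searching from an index avoids allocating a new substring on every iteration; the logic is otherwise the same.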
This looks like a job for regular expressions
using System;
using System.Text.RegularExpressions;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
string str = "asjfaieprnv{1}oiuwe{}hern{0}oaiwefn";
Regex regex = new Regex(@"\{(.*?)\}");
foreach( Match match in regex.Matches(str))
{
Console.WriteLine(match.Groups[1].Value);
}
}
}
}
I have the following string that would require me to parse it via Regex in C#.
Format: rec_mnd.rate.current_rate.sum.QWD.RET : 214345
I would like to extract the following parts (QWD, RET and 214345) as group objects in a GroupCollection.
QWD = 1 group
RET = 1 group
214345 = 1 group
What would the message pattern be like?
It would be something like this:
string s = "Format: rec_mnd.rate.current_rate.sum.QWD.RET : 214345";
Match m = Regex.Match(s, #"^Format: rec_mnd\.rate\.current_rate\.sum\.(.+?)\.(.+?) : (\d+)$");
if( m.Success )
{
Console.WriteLine(m.Groups[1].Value);
Console.WriteLine(m.Groups[2].Value);
Console.WriteLine(m.Groups[3].Value);
}
The question mark in the first two groups makes that quantifier lazy: it will capture the least possible amount of characters. In other words, it captures until the first . it sees. Alternatively, you could use ([^.]+) in those groups, which explicitly captures everything except a period.
The last group explicitly only captures decimal digits. If your expression can have other values on the right side of the : you'd have to change that to .+ as well.
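Here is a minimal sketch of the ([^.]+) alternative mentioned above, with the rest of the pattern left unchanged. The names RateLine and Parse are made up for this example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, named here only for illustration.
static class RateLine
{
    // Same pattern as above, with each lazy (.+?) swapped for ([^.]+).
    const string Pattern =
        @"^Format: rec_mnd\.rate\.current_rate\.sum\.([^.]+)\.([^.]+) : (\d+)$";

    // Returns the three captured groups, or an empty array if the line does not match.
    public static string[] Parse(string line)
    {
        Match m = Regex.Match(line, Pattern);
        if (!m.Success) return new string[0];
        return new[] { m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value };
    }

    static void Main()
    {
        string s = "Format: rec_mnd.rate.current_rate.sum.QWD.RET : 214345";
        Console.WriteLine(string.Join(", ", Parse(s))); // QWD, RET, 214345
    }
}
```

The negated class cannot run past a period, so the first group stops at the dot without relying on laziness; the second group still backtracks to leave room for the literal " : " separator.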
Please, make it a lot easier on yourself and label your groups to make it easier to understand what is going on in code.
Regex myRegex = new Regex(@"rec_mnd\.rate\.current_rate\.sum\.(?<code>[A-Z]{3})\.(?<subCode>[A-Z]{3})\s*:\s*(?<number>\d+)");
var matches = myRegex.Matches(sourceString);
foreach(Match match in matches)
{
//do stuff
Console.WriteLine("Match");
Console.WriteLine("Code: " + match.Groups["code"].Value);
Console.WriteLine("SubCode: " + match.Groups["subCode"].Value);
Console.WriteLine("Number: " + match.Groups["number"].Value);
}
This should give you what you want regardless of what's between the .'s.
#"(?:.+\.){4}(.\w+)\.(\w+)\s?:\s?(\d+)"
I need to match the following strings and returns the values as groups:
abctic
abctac
xyztic
xyztac
ghhtic
ghhtac
The pattern I wrote, with grouping, is as follows:
(?<arch>[abc,xyz,ghh])(?<flavor>[tic,tac]$)
The above returns only parts of group names. (meaning match is not correct).
If I use * in each sub pattern instead of $ at the end, groups are correct, but that would mean that abcticff will also match.
Please let me know what my correct regex should be.
Your pattern is incorrect because a pipe symbol | is used to specify alternate matches, not a comma in brackets as you were using, i.e., [x,y].
Your pattern should be: ^(?<arch>abc|xyz|ghh)(?<flavor>tic|tac)$
The ^ and $ metacharacters ensure the string matches from start to end. If you need to match text in a larger string you could replace them with \b to match on a word boundary.
Try this approach:
string[] inputs = { "abctic", "abctac", "xyztic", "xyztac", "ghhtic", "ghhtac" };
string pattern = #"^(?<arch>abc|xyz|ghh)(?<flavor>tic|tac)$";
foreach (var input in inputs)
{
var match = Regex.Match(input, pattern);
if (match.Success)
{
Console.WriteLine("Arch: {0} - Flavor: {1}",
match.Groups["arch"].Value,
match.Groups["flavor"].Value);
}
else
Console.WriteLine("No match for: " + input);
}
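For the word-boundary variant mentioned earlier in this answer, a minimal sketch might look like this. The surrounding sentence used as input and the names ArchFlavor and Find are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, named here only for illustration.
static class ArchFlavor
{
    // Finds arch/flavor pairs that stand as whole words inside a larger text.
    public static List<string> Find(string text)
    {
        var found = new List<string>();
        string pattern = @"\b(?<arch>abc|xyz|ghh)(?<flavor>tic|tac)\b";
        foreach (Match m in Regex.Matches(text, pattern))
        {
            found.Add(m.Groups["arch"].Value + "/" + m.Groups["flavor"].Value);
        }
        return found;
    }

    static void Main()
    {
        // Assumed sample text; abcticff must not match.
        foreach (string pair in Find("use abctic, then xyztac; abcticff is ignored"))
        {
            Console.WriteLine(pair);
        }
        // prints abc/tic, then xyz/tac
    }
}
```

The trailing \b fails inside abcticff because the character after tic is another word character, which is the behavior the question asked for.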