How can I get a certain segment with regex? - c#

I have a simple string
testteststete(segment1)tete(segment2)sttes323testte(segment3)eteste
I need to get (segment2). Each segment may contain any text. I tried this regex:
\(.+\)
But it returns a single match that runs from the first ( to the last ).
How can I get (segment2)?
PS: I want to get all of the segments in brackets.

In C#, you can match all the (...) substrings and then access the second one at index 1:
var rx = @"\([^)]*\)";
var matches = Regex.Matches(str, rx).Cast<Match>().Select(p => p.Value).ToList();
var result = matches.Count > 1 ? matches[1] : string.Empty;
The regex matches
\( - an opening (
[^)]* - 0 or more characters other than )
\) - a closing ).
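A minimal self-contained sketch of the same idea, assuming the sample string from the question; the helper name `GetSegments` is hypothetical. It returns every bracketed segment, so the second one is at index 1 and the whole list answers the PS.

```csharp
// Top-level statements (C# 9+).
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

var str = "testteststete(segment1)tete(segment2)sttes323testte(segment3)eteste";

// Hypothetical helper: returns every "(...)" substring, brackets included.
// [^)]* cannot run past the first ")", so each match covers exactly one segment.
static List<string> GetSegments(string text) =>
    Regex.Matches(text, @"\([^)]*\)")
         .Cast<Match>()
         .Select(m => m.Value)
         .ToList();

var segments = GetSegments(str);
// Index 1 is the second segment; fall back to an empty string if it is missing.
var second = segments.Count > 1 ? segments[1] : string.Empty;

Console.WriteLine(string.Join(", ", segments)); // (segment1), (segment2), (segment3)
Console.WriteLine(second);                      // (segment2)
```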

I can't test it but this should probably work:
\([^\)]*\)

You can try this regex; group 1 captures the text without the brackets:
\(([^)]+)\)

(?<=^[^()]*\([^)]*\)[^()]*\()[^)]*
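You can simply use this. The variable-length lookbehind (supported in .NET) skips past the first (...) segment, so the match is the text inside the second pair of brackets, without the brackets themselves.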

var regex = new Regex(@"\(([^)]*)\)", RegexOptions.Compiled);
string secondMatch = regex.Matches(text).Cast<Match>()
    .Select(m => m.Groups[1].Value)
    .ElementAtOrDefault(1);

Related

Regular expression to replace string except in square brackets

I need to replace every forward slash (/) with > except the ones inside square brackets.
Input string:
string str = "//div[1]/li/a[@href='https://www.facebook.com/']";
Tried pattern (did not work):
string regex = @"\/(?=$|[^]]+\||\[[^]]+\]\/)";
var pattern = Regex.Replace(str, regex, ">");
Expected result:
">>div[1]>li>a[@href='https://www.facebook.com/']"
Your idea of using a lookaround was good, but use a negative lookbehind instead of a positive lookahead:
(?<!\[[^\]]*)(\/)
It matches a / that is not preceded by a [ followed only by non-] characters, i.e. a / that is not inside an open bracket.
After updating your C# code:
string pattern = @"(?<!\[[^\]]*)(\/)";
string input = "//div[1]/li/a[@href='https://www.facebook.com/']";
var result = Regex.Replace(input, pattern, ">");
you will get:
>>div[1]>li>a[@href='https://www.facebook.com/']
If you're willing to also use String.Replace, you can do the following:
string input = "//div[1]/li/a[@href='https://www.facebook.com/']";
string expected = ">>div[1]>li>a[@href='https://www.facebook.com/']";
var groups = Regex.Match(input, @"^(.*)(\[.*\])$")
.Groups
.Cast<Group>()
.Select(g => g.Value)
.Skip(1);
var left = groups.First().Replace('/', '>');
var right = groups.Last();
var actual = left + right;
Assert.Equal(expected, actual);
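This splits the string into two groups: the first is everything before the last [...] part, and in it every / is replaced by > as you describe; the second group (the last bracketed part) is appended as is. In other words, whatever is between the last pair of square brackets is left untouched. Note that this assumes the string ends with a bracketed part and that earlier brackets contain no /.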
(The Assert is from an xUnit unit test.)
You could match either a complete [...] section or a single / captured in a capturing group.
In the replacement, replace the captured / with > and put matched bracket sections back unchanged.
Pattern
\[[^]]+\]|(/)
\[[^]]+\] Match from opening [ till closing ]
| Or
(/) Capture / in group 1
For example
string str = "//div[1]/li/a[@href='https://www.facebook.com/']";
string regex = @"\[[^]]+\]|(/)";
str = Regex.Replace(str, regex, m => m.Groups[1].Success ? ">" : m.Value);
Console.WriteLine(str);
Output
>>div[1]>li>a[@href='https://www.facebook.com/']
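If a regex feels like overkill, the same replacement can be done without one, using a plain loop that tracks whether it is currently inside square brackets. This is a swapped-in, non-regex technique rather than any of the answers above, and the helper name `ReplaceSlashesOutsideBrackets` is hypothetical; it assumes every ] closes an earlier [.

```csharp
// Top-level statements (C# 9+).
using System;
using System.Text;

// Hypothetical helper: replaces every '/' that is not inside [...] with '>'.
// A depth counter tracks how many '[' are currently open, so nested
// brackets are also left untouched.
static string ReplaceSlashesOutsideBrackets(string input)
{
    var sb = new StringBuilder(input.Length);
    int depth = 0;
    foreach (char c in input)
    {
        if (c == '[') depth++;
        else if (c == ']' && depth > 0) depth--;

        // Only slashes at depth 0 (outside all brackets) are replaced.
        sb.Append(c == '/' && depth == 0 ? '>' : c);
    }
    return sb.ToString();
}

string str = "//div[1]/li/a[@href='https://www.facebook.com/']";
string result = ReplaceSlashesOutsideBrackets(str);
Console.WriteLine(result); // >>div[1]>li>a[@href='https://www.facebook.com/']
```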

C# regex matches: exclude first and last character

I know nothing about regex so I am asking this great community to help me out.
With the help of SO I managed to write this regex:
string input = "((isoCode=s)||(isoCode=a))&&(title=s)&&((ti=2)&&(t=2))||(t=2&&e>5)";
string pattern = @"\((?>\((?<DEPTH>)|\)(?<-DEPTH>)|.?)*(?(DEPTH)(?!))\)|&&|\|\|";
MatchCollection matches = Regex.Matches(input, pattern);
foreach (Match match in matches)
{
    Console.WriteLine(match.Groups[0].Value);
}
And the result is:
((isoCode=s)||(isoCode=a))
&&
(title=s)
&&
((ti=2)&&(t=2))
||
(t=2&&e>5)
but I need a result like this (without the first "(" and the last ")"):
(isoCode=s)||(isoCode=a)
&&
title=s
&&
(ti=2)&&(t=2)
||
t=2&&e>5
Can it be done? I know I can do it with substring (removing first and last character), but I want to know if it can be done with regex.
You may use
\((?<R>(?>\((?<DEPTH>)|\)(?<-DEPTH>)|[^()]+)*(?(DEPTH)(?!)))\)|(?<R>&&|\|\|)
Then grab the value of Group "R".
Details
\( - an opening (
(?<R> - start of the R named group:
(?> - start of the atomic group:
\((?<DEPTH>)| - an opening ( (an empty string is pushed onto the DEPTH group stack) or
\)(?<-DEPTH>)| - a closing ) (an empty string is popped off the DEPTH group stack) or
[^()]+ - 1+ chars other than ( and )
)* - zero or more repetitions
(?(DEPTH)(?!)) - a conditional construct that fails the match if the DEPTH stack is not empty, i.e. if the opening and closing parentheses are not balanced
) - end of R named group
\) - a closing )
| - or
(?<R>&&|\|\|) - another occurrence of Group R matching either of the 2 subpatterns:
&& - a && substring
| - or
\|\| - a || substring.
C# code:
var pattern = @"\((?<R>(?>\((?<DEPTH>)|\)(?<-DEPTH>)|[^()]+)*(?(DEPTH)(?!)))\)|(?<R>&&|\|\|)";
var results = Regex.Matches(input, pattern)
    .Cast<Match>()
    .Select(m => m.Groups["R"].Value)
    .ToList();
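The push/pop mechanism of the DEPTH stack used above can be seen in isolation in a smaller pattern that only checks whether parentheses are balanced. This is a minimal sketch; the group name `o` and the helper name `IsBalanced` are illustrative and not part of the answer.

```csharp
// Top-level statements (C# 9+).
using System;
using System.Text.RegularExpressions;

// (?<o>\() pushes a capture onto stack "o" for every "(",
// (?<-o>\)) pops one for every ")" and fails if the stack is already empty,
// and (?(o)(?!)) fails the match if any "(" is still unclosed at the end.
static bool IsBalanced(string s) =>
    Regex.IsMatch(s, @"^(?:[^()]|(?<o>\()|(?<-o>\)))*(?(o)(?!))$");

foreach (var s in new[] { "(a(b)c)", "(a))", "((a)" })
    Console.WriteLine($"{s}: {IsBalanced(s)}");
```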
Brief
You can use the regex below, but I'd still strongly suggest you write a proper parser for this instead of using regex.
Code
\(((?>\((?<DEPTH>)|\)(?<-DEPTH>)|.?)*(?(DEPTH)(?!)))\)|&{2}|\|{2}
Usage
using System;
using System.Text.RegularExpressions;
public class Test
{
    public static void Main()
    {
        string input = "((isoCode=s)||(isoCode=a))&&(title=s)&&((ti=2)&&(t=2))||(t=2&&e>5)";
        string pattern = @"\(((?>\((?<DEPTH>)|\)(?<-DEPTH>)|.?)*(?(DEPTH)(?!)))\)|&{2}|\|{2}";
        MatchCollection matches = Regex.Matches(input, pattern);
        foreach (Match match in matches)
        {
            Console.WriteLine(match.Groups[1].Success ? match.Groups[1].Value : match.Groups[0].Value);
        }
    }
}
Result
(isoCode=s)||(isoCode=a)
&&
title=s
&&
(ti=2)&&(t=2)
||
t=2&&e>5

C# regex: everything inside curly brackets {} and percent (%) characters

I'm trying to get the values between {} and between %% with a single regex.
This is what I have so far. I can get the values for each delimiter individually, but I'm curious how to combine both.
var regex = new Regex(@"%(.*?)%|\{([^}]*)\}");
String s = "This is a {test} %String%. %Stack% {Overflow}";
Expected result for the above string:
test
String
Stack
Overflow
Individual regexes:
@"%(.*?)%" gives me String and Stack
@"\{([^}]*)\}" gives me test and Overflow
Here is my code:
var regex = new Regex(@"%(.*?)%|\{([^}]*)\}");
var matches = regex.Matches(s);
foreach (Match match in matches)
{
    Console.WriteLine(match.Groups[1].Value);
}
Similar to your regex, but using named capturing groups; .NET allows the same group name in both alternatives:
String s = "This is a {test} %String%. %Stack% {Overflow}";
var list = Regex.Matches(s, @"\{(?<name>.+?)\}|%(?<name>.+?)%")
.Cast<Match>()
.Select(m => m.Groups["name"].Value)
.ToList();
If you want to learn how conditional expressions work, here is a solution using that kind of .NET regex capability:
(?:(?<p>%)|(?<b>{))(?<v>.*?)(?(p)%|})
Here is how it works:
(?:(?<p>%)|(?<b>{)) - match and capture either Group "p" with % (percentage), or Group "b" (brace) with {
(?<v>.*?) - match and capture into Group "v" (value) any character (even a newline since I will be using RegexOptions.Singleline) zero or more times, but as few as possible (lazy matching with *? quantifier)
(?(p)%|}) - a conditional expression meaning: if "p" group was matched, match %, else, match }.
C# code:
var s = "This is a {test} %String%. %Stack% {Overflow}";
var regex = "(?:(?<p>%)|(?<b>{))(?<v>.*?)(?(p)%|})";
var matches = Regex.Matches(s, regex, RegexOptions.Singleline);
// var matches_list = Regex.Matches(s, regex, RegexOptions.Singleline)
// .Cast<Match>()
// .Select(p => p.Groups["v"].Value)
// .ToList();
// Or just a demo writeline
foreach (Match match in matches)
Console.WriteLine(match.Groups["v"].Value);
Sometimes the capture is in group 1 and sometimes it's in group 2 because you have two pairs of parentheses.
Your original code will work if you do this instead:
Console.WriteLine(match.Groups[1].Value + match.Groups[2].Value);
because one group will be the empty string and the other will be the value you're interested in.
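Instead of concatenating, you can also pick whichever capturing group actually took part in the match by checking Group.Success. A minimal sketch using the question's pattern and sample string; the helper name `ExtractValues` is hypothetical.

```csharp
// Top-level statements (C# 9+).
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: for each match, return the value of whichever
// capturing group (1 for %...%, 2 for {...}) actually participated in it.
static List<string> ExtractValues(string s) =>
    Regex.Matches(s, @"%(.*?)%|\{([^}]*)\}")
         .Cast<Match>()
         .Select(m => m.Groups[1].Success ? m.Groups[1].Value : m.Groups[2].Value)
         .ToList();

var values = ExtractValues("This is a {test} %String%. %Stack% {Overflow}");
foreach (var v in values)
    Console.WriteLine(v);
```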
@"[\{|%](.*?)[\}|%]"
The idea is:
{ or % (inside a character class | is just a literal, so [{%] would suffice)
anything, matched lazily
} or %
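One caveat with this character-class version: it does not require the closing delimiter to match the opening one, so a mismatched pair such as {abc% is accepted. The sketch below, using a made-up input string, compares it with the conditional pattern from the earlier answer, which rejects the mismatched pair.

```csharp
// Top-level statements (C# 9+).
using System;
using System.Text.RegularExpressions;

var input = "{abc%"; // hypothetical input with mismatched delimiters

// Character classes: any of { | % opens, any of } | % closes.
int classMatches = Regex.Matches(input, @"[\{|%](.*?)[\}|%]").Count;

// Conditional: if group "p" (%) matched, require %, otherwise require }.
int conditionalMatches = Regex.Matches(input, @"(?:(?<p>%)|(?<b>{))(?<v>.*?)(?(p)%|})").Count;

Console.WriteLine($"character class: {classMatches}, conditional: {conditionalMatches}");
```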
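I think you should use a combination of alternation and nested groups, with lazy .*? so that each match stops at the first closing delimiter:
((\{(.*?)\})|(%(.*?)%))
Group 3 holds a {...} value and group 5 a %...% value.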

Extract string that contains only letters in C#

string input = "5991 Duncan Road";
var onlyLetters = new String(input.Where(Char.IsLetter).ToArray());
Output: DuncanRoad
But I expect the output to be Duncan Road. What do I need to change?
For input like yours, you do not need a regex: just skip all non-letter characters at the beginning with SkipWhile(). From its documentation:
Bypasses elements in a sequence as long as a specified condition is true and then returns the remaining elements.
C# code:
var input = "5991 Duncan Road";
var onlyLetters = new String(input.SkipWhile(p => !Char.IsLetter(p)).ToArray());
Console.WriteLine(onlyLetters);
A regex solution that removes numbers that are not part of words, along with the adjoining whitespace:
var res = Regex.Replace(str, @"\s+(?<!\p{L})\d+(?!\p{L})|(?<!\p{L})\d+(?!\p{L})\s+", string.Empty);
You can use this lookaround based regex:
repl = Regex.Replace(input, @"(?<![a-zA-Z])[^a-zA-Z]|[^a-zA-Z](?![a-zA-Z])", "");
//=> Duncan Road
(?<![a-zA-Z])[^a-zA-Z] matches a non-letter that is not preceded by a letter.
| is regex alternation
[^a-zA-Z](?![a-zA-Z]) matches a non-letter that is not followed by a letter.
You can still use LINQ filtering with Char.IsLetter || Char.IsWhiteSpace. To remove all leading and trailing whitespace chars you can call String.Trim:
string input = "5991 Duncan Road";
string res = String.Join("", input.Where(c => Char.IsLetter(c) || Char.IsWhiteSpace(c)))
.Trim();
Console.WriteLine(res); // Duncan Road
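To compare the three approaches above side by side on the question's input, here is a minimal sketch; the variable names are illustrative, and the comments only describe what each approach is designed to handle, not arbitrary addresses.

```csharp
// Top-level statements (C# 9+).
using System;
using System.Linq;
using System.Text.RegularExpressions;

var input = "5991 Duncan Road";

// 1. Skip leading non-letters (only handles a non-letter prefix).
var viaSkipWhile = new string(input.SkipWhile(c => !char.IsLetter(c)).ToArray());

// 2. Remove non-letters that do not sit between two letters.
var viaLookaround = Regex.Replace(input, @"(?<![a-zA-Z])[^a-zA-Z]|[^a-zA-Z](?![a-zA-Z])", "");

// 3. Keep letters and whitespace, then trim the leftover leading space.
var viaFilter = string.Concat(input.Where(c => char.IsLetter(c) || char.IsWhiteSpace(c))).Trim();

Console.WriteLine($"[{viaSkipWhile}] [{viaLookaround}] [{viaFilter}]");
```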

Regex starting with a string

I want to filter the following string with a regular expression:
TEST^AB^^HOUSE-1234~STR2255
I want to get only the string "HOUSE-1234", and the match must always start with "TEST^AB^^" and end with "~".
What should the regex look like?
You can use the pattern \^\^(.*?)\~, which matches text that starts with ^^ and ends with ~; the part in between is captured in group 1:
string s = @"TEST^AB^^HOUSE-1234~STR2255";
Match match = Regex.Match(s, @"\^\^(.*?)\~", RegexOptions.IgnoreCase);
if (match.Success)
{
    string key = match.Groups[1].Value;
    Console.WriteLine(key);
}
Output:
HOUSE-1234
string input = "TEST^AB^^HOUSE-1234~STR2255";
var matches = Regex.Matches(input, @"TEST\^AB\^\^(.+?)~").Cast<Match>()
.Select(m => m.Groups[1].Value)
.ToList();
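Since the question says the string must always begin with TEST^AB^^, you can also anchor the pattern with ^ so that strings with a different prefix are rejected. A minimal sketch; the helper name `ExtractKey` and the extra test inputs are hypothetical.

```csharp
// Top-level statements (C# 9+).
using System;
using System.Text.RegularExpressions;
#nullable enable

// Hypothetical helper. The leading ^ anchors at the start of the string;
// the literal carets are escaped as \^, and (.*?) lazily captures up to the first ~.
static string? ExtractKey(string s)
{
    var m = Regex.Match(s, @"^TEST\^AB\^\^(.*?)~");
    return m.Success ? m.Groups[1].Value : null;
}

Console.WriteLine(ExtractKey("TEST^AB^^HOUSE-1234~STR2255") ?? "(no match)");
Console.WriteLine(ExtractKey("XTEST^AB^^HOUSE-1234~STR2255") ?? "(no match)");
```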
string pattern = @"\^\^(.*)\~";
Regex re = new Regex(pattern);
With the little information you've given us (and assuming that the TEST^AB isn't necessarily constant), this might work:
(?:\^\^).*(?:~)
Or, if TEST^AB is constant, you can include it too:
(?:TEST\^AB\^\^).*(?:~)
The important part is to remember to escape the ^, because an unescaped ^ is the start-of-string anchor.
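A quick check on the question's sample string shows the difference between the escaped and unescaped forms. This is a minimal sketch; the variable names are illustrative.

```csharp
// Top-level statements (C# 9+).
using System;
using System.Text.RegularExpressions;

var s = "TEST^AB^^HOUSE-1234~STR2255";

// Unescaped: "^" asserts the start of the string, which position 4 is not,
// so "TEST^AB" can never match.
bool unescaped = Regex.IsMatch(s, "TEST^AB");

// Escaped: "\^" matches a literal caret character.
bool escaped = Regex.IsMatch(s, @"TEST\^AB");

Console.WriteLine($"unescaped: {unescaped}, escaped: {escaped}");
```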
You don't even need a regex for something this well defined. If you want to simplify:
string[] splitString;
if (yourstring.StartsWith("TEST^AB^^"))
{
yourstring = yourstring.Remove(0, 9);
splitString = yourstring.Split('~');
return splitString[0];
}
return null;
(TEST\^AB\^\^)((\w)+-(\w+))(\~.+)
There are three top-level groups:
(TEST\^AB\^\^) : matches your TEST^AB^^
((\w)+-(\w+)) : matches your HOUSE-1234
(\~.+) : matches the rest
You should do this without regex:
var str = "TEST^AB^^HOUSE-1234~STR2255";
var result = (str.StartsWith("TEST^AB^^") && str.IndexOf('~') > -1)
? new string(str.Skip(9).TakeWhile(c=>c!='~').ToArray())
: null;
Console.WriteLine(result);
