Matching any word enclosed in parentheses in a sentence - c#

I am trying to find a regex that matches any word enclosed in parentheses in a sentence.
Suppose I have this sentence:
"Welcome, (Hello, All of you) to the Stack Over flow."
If the word I am looking for is Hello,, All, of or you, the match should succeed.
A word can contain anything (digits, symbols) as long as it is separated from the other words by whitespace.
I tried \(([^)]*)\), but this returns everything enclosed by the parentheses as a single match:
static void Main(string[] args)
{
    string ss = "Welcome, (Hello, All of you) to the Stack Over flow.";
    Regex _regex = new Regex(@"\(([^)]*)\)");
    Match match = _regex.Match(ss.ToLower());
    if (match.Success)
    {
        ss = match.Groups[0].Value;
    }
}
Any help and guidance is very much appreciated.
Thanks.
Thanks, everyone, for your time and answers. I finally solved it by changing my code as Tim suggested in his reply.
For people with a similar problem, here is my final code:
static void Main(string[] args)
{
    string ss = "Welcome, (Hello, All of you) to the Stack Over flow.";
    Regex _regex = new Regex(@"[^\s()]+(?=[^()]*\))");
    Match match = _regex.Match(ss.ToLower());
    while (match.Success)
    {
        ss = match.Groups[0].Value;
        Console.WriteLine(ss);
        match = match.NextMatch();
    }
}

OK, so it seems that a "word" is anything that's not whitespace and doesn't contain parentheses, and that you want to match a word if the next parenthesis character that follows is a closing parenthesis.
So you can use
[^\s()]+(?=[^()]*\))
Explanation:
[^\s()]+ matches a "word" (should be easy to understand), and
(?=[^()]*\)) makes sure that a closing parenthesis follows:
(?= # Look ahead to make sure the following regex matches here:
[^()]* # Any number of characters except parentheses
\) # followed by a closing parenthesis.
) # (End of lookahead assertion)
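As a rough sketch of how the lookahead pattern could answer the original yes/no question ("is this word inside the parentheses?"), the helper below collects every match and compares it with the word being searched for. The class name ParenWords and the method ContainsWord are made up for this illustration, and the case-insensitive comparison is an assumption based on the ToLower() call in the question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class ParenWords
{
    // A "word" is a run of characters that are neither whitespace nor parentheses,
    // and the next parenthesis character after it must be a closing one.
    private static readonly Regex WordInParens = new Regex(@"[^\s()]+(?=[^()]*\))");

    // Hypothetical helper: true if `word` is one of the words found inside parentheses.
    public static bool ContainsWord(string sentence, string word)
    {
        return WordInParens.Matches(sentence)
            .Cast<Match>()
            .Any(m => string.Equals(m.Value, word, StringComparison.OrdinalIgnoreCase));
    }
}
```

For the sentence from the question, ParenWords.ContainsWord(sentence, "All") should succeed, while a word outside the parentheses such as "Stack" should not.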

I've written a C# function for you, if you are interested.
// Requires: using System.Collections.Generic; using System.Linq;
public static class WordsHelper
{
    public static List<string> GetWordsInsideParenthesis(string s)
    {
        List<int> StartIndices = new List<int>();
        var rtn = new List<string>();
        var numOfOpen = s.Count(m => m == '(');
        var numOfClose = s.Count(m => m == ')');
        // Only proceed when there are as many '(' as ')'.
        if (numOfClose == numOfOpen)
        {
            for (int i = 0; i < numOfOpen; i++)
            {
                // Look for the next '(' after the start of the previous group.
                int from = StartIndices.Count == 0 ? 0 : StartIndices.Last();
                int ss = s.IndexOf('(', from) + 1;
                int sss = s.IndexOf(')', ss);
                if (sss < 0) break; // no closing parenthesis after this '('
                // Remember every start, so that a third or later group is found too.
                StartIndices.Add(ss);
                var words = s.Substring(ss, sss - ss).Split(' ');
                rtn.AddRange(words);
            }
        }
        return rtn;
    }
}
Just call it this way:
var text = "Welcome, (Hello, All of you) to the (Stack Over flow).";
var words = WordsHelper.GetWordsInsideParenthesis(text);
Now you'll have a list of words in the words variable.
In general, plain C# string handling can be easier to read than a regex for a task like this, and it avoids the overhead of the regex engine, although the difference rarely matters for short inputs.
If you would rather stick with Regex, keep the pattern from Tim Pietzcker, [^\s()]+(?=[^()]*\)), but use it this way:
var text = "Welcome, (Hello, All of you) to the (Stack Over flow).";
var values = Regex.Matches(text, @"[^\s()]+(?=[^()]*\))");
Now values contains a MatchCollection.
You can access each match by index and read its Value property, like this:
string word = values[0].Value;

(?<=[(])[^)]+(?=[)])
Matches everything between a pair of parentheses as a single match (split the result on whitespace to get the individual words)
(?<=[(]) Checks for (
[^)]+ Matches everything up to but not including a )
(?=[)]) Checks for )

Related

Regex - replacing chars in C# string in specific cases

I want to replace all curly brackets in my input string with square brackets, but only when there are no digits between them. I wrote this working sample:
string pattern = @"(\{[^0-9]*?\})";
MatchCollection matches = Regex.Matches(inputString, pattern);
if (matches != null)
{
    foreach (var match in matches)
    {
        string outdateMatch = match.ToString();
        string updateMatch = outdateMatch.Replace('{', '[').Replace('}', ']');
        inputString = inputString.Replace(outdateMatch, updateMatch);
    }
}
So for:
string inputString = "{0}/{something}/{1}/{other}/something";
the result will be:
inputString = "{0}/[something]/{1}/[other]/something"
Is it possible to do this in one line using the Regex.Replace() method?
You may use
var output = Regex.Replace(input, @"\{([^0-9{}]*)}", "[$1]");
See the regex demo.
Details
\{ - a { char
([^0-9{}]*) - Capturing group 1: 0 or more chars other than digits, { and }
} - a } char.
The replacement is [$1], the contents of Group 1 enclosed with square brackets.
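A minimal sketch of that one-liner wrapped in a method, so it can be tried against the sample from the question. The names BraceRewriter and SquareNonNumeric are invented for this example.

```csharp
using System.Text.RegularExpressions;

public static class BraceRewriter
{
    // Replaces {text} with [text] when the text between the braces
    // contains no digits and no nested braces.
    public static string SquareNonNumeric(string input)
    {
        return Regex.Replace(input, @"\{([^0-9{}]*)}", "[$1]");
    }
}
```

Calling BraceRewriter.SquareNonNumeric("{0}/{something}/{1}/{other}/something") leaves {0} and {1} alone and rewrites the other two groups, giving "{0}/[something]/{1}/[other]/something".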
Regex.Replace(input, @"\{0}/\{(.+)\}/\{1\}/\{(.+)}/(.+)", "{0}/[$1]/{1}/[$2]/$3")
Could you do this?
Regex.Replace(inputString, @"\{([^0-9]*)\}", "[$1]");
That is, capture the non-digit part between the braces, then return the string with the braces replaced by square brackets.
Not sure if this is exactly what you are after, but it seems to fit the question :)

Split string by character in C#

I need to split this string by ',' in C#.
Sample string:
'DC0''008_','23802.76','23802.76','23802.76','Comm,erc,','2f17','3f44c0ba-daf1-44f0-a361-'
I could use string.Split(','), but as you can see, 'Comm,erc,' would then be split into
Comm
erc
Also, 'DC0''008_' should stay in one piece as
'DC0''008_'
and not be split into
'DC0'
'008_'
The expected output looks like this:
'DC0''008_'
'23802.76'
'23802.76'
'23802.76'
'Comm,erc,'
'2f17'
'3f44c0ba-daf1-44f0-a361-'
split can do it but regex will be more complex.
You can use Regex.Matches using this simpler regex:
'[^']*'
and get all quoted strings in a collection.
Code:
MatchCollection matches = Regex.Matches(input, @"'[^']*'");
To print all the matched values:
foreach (Match match in Regex.Matches(input, @"'[^']*'"))
    Console.WriteLine("Found {0}", match.Value);
To store all matched values in an ArrayList:
ArrayList list = new ArrayList();
foreach (Match match in Regex.Matches(input, @"'[^']*'")) {
    list.Add(match.Value);
}
EDIT: As per the comments below, if the OP wants to keep '' inside the captured string, use this lookaround regex:
'.*?(?<!')'(?!')
(?<!')'(?!') matches a single quote that is neither preceded nor followed by another single quote.
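Here is a small sketch of the lookaround pattern applied to a shortened version of the sample string. The class and method names (QuotedFields, Extract) are hypothetical. One limitation to keep in mind: an empty field written as '' would not be recognised as a field by this pattern, because both of its quotes sit next to another quote.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class QuotedFields
{
    // Returns each quoted field, quotes included. The closing quote must be
    // a single quote that has no other single quote directly before or after it.
    public static string[] Extract(string input)
    {
        return Regex.Matches(input, @"'.*?(?<!')'(?!')")
            .Cast<Match>()
            .Select(m => m.Value)
            .ToArray();
    }
}
```

For the input 'DC0''008_','23802.76','Comm,erc,','2f17' this yields four fields, with 'DC0''008_' and 'Comm,erc,' each kept in one piece.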
RegEx Demo
You can use this Regex to get all the things inside the commas and apostrophes:
(?<=')[^,].*?(?=')
Regex101 Explanation
To convert it into a string array, you can use the following:
var matches = Regex.Matches(strInput, "(?<=')[^,].*?(?=')");
var array = matches.Cast<Match>().Select(x => x.Value).ToArray();
EDIT: If you want it to handle doubled single quotes ('') as well, a regex that matches in every case becomes unwieldy. At that point, it's better to use a simpler pattern with Regex.Split:
var matches = Regex.Split(strInput, "^'|'$|','")
.Where(x => !string.IsNullOrEmpty(x))
.ToArray();
It is a good idea to modify your string first and then split it, something like this:
string data = "'DC0008_','23802.76','23802.76','23802.76','Comm,erc,','2f17','3f44c0ba-daf1-44f0-a361-'";
data = Process(data); // before splitting, replace each comma outside the quotes with another character, e.g. '#'
string[] result = data.Split('#'); // now the split works (not fully tested)
The Process() function is below:
private string Process(string input)
{
    bool flag = false; // true while we are inside a quoted field
    string temp = "";
    char[] data = input.ToCharArray();
    foreach (char ch in data)
    {
        if (ch == '\'' || ch == '"')
            flag = !flag;
        if (ch == ',')
        {
            if (flag) // inside quotes: keep the comma; outside: replace it with #
                temp += ch;
            else
                temp += "#";
        }
        else
            temp += ch;
    }
    return temp;
}
See the output here: http://rextester.com/COAH43918
using System;
using System.Linq;
using System.Text.RegularExpressions;
namespace ConsoleApplication15
{
    class Program
    {
        static void Main(string[] args)
        {
            string str = "'DC0008_','23802.76','23802.76','23802.76','Comm,erc,','2f17','3f44c0ba-daf1-44f0-a361-'";
            var matches = Regex.Matches(str, "(?<=')[^,].*?(?=')");
            var array = matches.Cast<Match>().Select(x => x.Value).ToArray();
            foreach (var item in array)
                Console.WriteLine("'" + item + "'");
        }
    }
}

How can I strip in-line comments from a text reader

Hi, I'm trying to remove comments from a text file by iterating through a StreamReader and checking whether each line starts with /*
private void StripComments()
{
    _list = new List<string>();
    using (_reader = new StreamReader(_path))
    {
        while ((_line = _reader.ReadLine()) != null)
        {
            var temp = _line.Trim();
            if (!temp.StartsWith(@"/*"))
            {
                _list.Add(temp);
            }
        }
    }
}
I need to remove comments of the form /* I AM A COMMENT */. I thought the file only had whole-line comments, but on closer inspection some comments are located at the ends of lines. .EndsWith(@"*/") can't be used, as this would also remove the code preceding the comment.
Thanks.
Thanks.
If you are comfortable with regex:
string pattern = "(?s)/[*].*?[*]/";
var output = Regex.Replace(File.ReadAllText(path), pattern, "");
. matches any character other than a newline.
(?s) turns on single-line mode, in which . also matches newlines.
.* matches zero or more characters; * is the quantifier.
.*? matches lazily, i.e. it matches as few characters as possible.
NOTE
This won't work if a string literal within "" contains /*. For that you should use a parser instead!
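To connect this with the list of lines built in the question, here is a rough sketch that strips the comments from the whole text first and only then splits it into trimmed lines. CommentStripper and StripToLines are names invented for this example, and dropping lines that end up empty is an assumption about what is wanted, not something the question specifies. The caveat about /* inside string literals still applies.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class CommentStripper
{
    // Removes /* ... */ comments (including ones that span several lines),
    // then returns the remaining text as trimmed, non-empty lines.
    public static List<string> StripToLines(string text)
    {
        string withoutComments = Regex.Replace(text, @"/\*.*?\*/", "", RegexOptions.Singleline);
        return withoutComments
            .Split('\n')
            .Select(line => line.Trim())
            .Where(line => line.Length > 0)
            .ToList();
    }
}
```

The text could come from File.ReadAllText(_path) in place of the line-by-line StreamReader loop, since a comment may start on one line and end on another.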
Regex is a good fit for this.
string START = Regex.Escape("/*");
string END = Regex.Escape("*/");
string input = @"aaa/* bcd
de */ f";
var str = Regex.Replace(input, START + ".+?" + END, "", RegexOptions.Singleline);
List<string> _list = new List<string>();
Regex r = new Regex("/[*]");
string temp = @"sadf/*slkdj*/";
if (temp.StartsWith(@"/*")) { }
else if (temp.EndsWith(@"*/") && temp.Contains(@"/*"))
{
    string pre = temp.Substring(0, r.Match(temp).Index);
    _list.Add(pre);
}
else
{
    _list.Add(temp);
}

C# Regex match all occurrences

I'm trying to write a regular expression in C# that will match strings like "", but my regex stops at the first match, and I'd like to match the whole string.
I've tried many ways to do this; currently, my code looks like this:
string sPattern = @"/&#\d{2};/";
Regex rExp = new Regex(sPattern);
MatchCollection mcMatches = rExp.Matches(txtInput.Text);
foreach (Match m in mcMatches) {
    if (!m.Success) {
        //Give Warning
    }
}
I also tried lblDebug.Text = Regex.IsMatch(txtInput.Text, "(&#[0-9]{2};)+").ToString(); but it also only finds the first match.
Any tips?
Edit:
The end result I'm after is that strings like &# are flagged as incorrect. As it is now, since only the first match is checked, my code marks such a string as correct.
Second Edit:
I changed my code to this
string sPattern = @"&#\d{2};";
Regex rExp = new Regex(sPattern);
MatchCollection mcMatches = rExp.Matches(txtInput.Text);
int iMatchCount = 0;
foreach (Match m in mcMatches) {
    if (m.Success) {
        iMatchCount++;
    }
}
int iTotalStrings = txtInput.Text.Length / 5;
int iVerify = txtInput.Text.Length % 5;
if (iTotalStrings == iMatchCount && iVerify == 0) {
    lblDebug.Text = "True";
} else {
    lblDebug.Text = "False";
}
And this works the way I expected, but I still think this can be achieved in a better way.
Third Edit:
As @devundef suggests, the expression "^(&#\d{2};)+$" does the job I was hoping for, so my final code looks like this:
string sPattern = @"^(&#\d{2};)+$";
Regex rExp = new Regex(sPattern);
lblDebug.Text = rExp.IsMatch(txtInput.Text).ToString();
I always forget the start- and end-of-string anchors (^ / $).
Remove the / at the start and end of the expression.
string sPattern = @"&#\d{2};";
EDIT
I tested the pattern and it works as expected. Not sure what you want.
Two options:
&#\d{2}; => will give N matches in the string. On the string  it will match 2 groups,  and 
(&#\d{2};)+ => will match the whole string as one single group. On the string  it will match 1 group, 
Edit 2:
What you want is not to get the groups but to know whether the string is in the right format. This is the pattern:
Regex rExp = new Regex(@"^(&#\d{2};)+$");
var isValid = rExp.IsMatch("") // isValid = true
var isValid = rExp.IsMatch("xyz") // isValid = false
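As a self-contained sketch of the anchored check, the helper below wraps the pattern in a method. The class name EntityValidator and the sample inputs mentioned afterwards (such as "&#12;&#34;") are made up for illustration; they are not the strings from the original post.

```csharp
using System.Text.RegularExpressions;

public static class EntityValidator
{
    // ^ and $ force the repeated group to cover the whole input,
    // so a trailing fragment such as "&#" makes the match fail.
    private static readonly Regex WholeString = new Regex(@"^(&#\d{2};)+$");

    public static bool IsValid(string s)
    {
        return WholeString.IsMatch(s);
    }
}
```

With this, an input like "&#12;&#34;" passes, while "&#12;&#" or "xyz" does not. Note that in .NET $ also matches just before a final newline; \z can be used in its place if that matters.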
Here you go: (&#\d{2};)+ This should work for one occurrence or more.
(&#\d{2};)*
Recommend: http://www.weitz.de/regex-coach/

Find all words without figures using RegEx

I found this code to get all words of a string,
static string[] GetWords(string input)
{
    MatchCollection matches = Regex.Matches(input, @"\b[\w']*\b");
    var words = from m in matches.Cast<Match>()
                where !string.IsNullOrEmpty(m.Value)
                select TrimSuffix(m.Value);
    return words.ToArray();
}
static string TrimSuffix(string word)
{
    int apostrapheLocation = word.IndexOf('\'');
    if (apostrapheLocation != -1)
    {
        word = word.Substring(0, apostrapheLocation);
    }
    return word;
}
Please explain the code.
How can I get words without figures?
2. How can I get words without figures?
You'll have to replace \w with [A-Za-z],
so that your regex becomes @"\b[A-Za-z']*\b"
And then you'll have to think about TrimSuffix(). The regex allows apostrophes, but TrimSuffix() keeps only the part to the left of the apostrophe. So "it's" will become "it".
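Putting the modified pattern together with the code from the question gives roughly the following. This is only a sketch: the class name LetterWords is invented, and the LINQ query is written with method syntax, which is a stylistic choice and not a requirement.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class LetterWords
{
    // Same approach as GetWords in the question, but the character class is
    // limited to ASCII letters and the apostrophe, so tokens containing digits
    // never produce a non-empty match.
    public static string[] GetWords(string input)
    {
        return Regex.Matches(input, @"\b[A-Za-z']*\b")
            .Cast<Match>()
            .Where(m => !string.IsNullOrEmpty(m.Value))
            .Select(m => TrimSuffix(m.Value))
            .ToArray();
    }

    // Keeps only the part of the word before the first apostrophe.
    private static string TrimSuffix(string word)
    {
        int apostropheLocation = word.IndexOf('\'');
        return apostropheLocation == -1 ? word : word.Substring(0, apostropheLocation);
    }
}
```

For the input "It's 2 cats and 3dogs" this returns It, cats and and: the lone 2 and the mixed token 3dogs produce only empty matches, which are filtered out. Accented letters are also excluded by [A-Za-z]; \p{L} could be used in the character class if they should count as letters.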
In
MatchCollection matches = Regex.Matches(input, @"\b[\w']*\b");
the code uses a regex that looks for any word: \b is a word boundary and \w is the word-character class, which matches letters (with or without accents), digits and the underscore; the ' is simply added to the set alongside them. So basically it finds the beginning and the end of each word and selects what lies between.
Then
var words = from m in matches.Cast<Match>()
            where !string.IsNullOrEmpty(m.Value)
            select TrimSuffix(m.Value);
is LINQ query syntax, which lets you write SQL-like queries inside your code. It takes every match from the regex, skips the empty ones, and passes each remaining value through TrimSuffix. This is also where you could add your check for figures.
And this:
static string TrimSuffix(string word)
{
    int apostrapheLocation = word.IndexOf('\'');
    if (apostrapheLocation != -1)
    {
        word = word.Substring(0, apostrapheLocation);
    }
    return word;
}
removes the apostrophe and everything after it from words that contain one, keeping only the part before it,
i.e. for the word don't it returns only don.
