I have a list of strings with the output below
stop = F6, quantity ( 1 ) // stop 0
stop = F8, quantity ( 1 ) // stop 1
stop = BN, quantity ( 1 ) // stop 2
stop = F6, quantity ( 1 ) // stop 3
stop = F8, quantity ( 1 ) // stop 4
stop = BN, quantity ( 1 ) // stop 5
stop = F6, quantity ( 1 ) // stop 6
stop = F8, quantity ( 1 ) // stop 7
stop = SC, quantity ( 1 ) // stop 8
etc
Using a foreach loop, I'm retrieving each line in the list, i.e.
`stop = F6, quantity ( 1 ) // stop 0`
However, I only need the value F6.
I know I need to use a regex to retrieve F6 in this instance, but I am unsure of the expression. Following a brief regex tutorial, I tried the code below, with no luck:
`Regex.Match(output, @"=\w*,").Value.Replace("\"", "");`
Any help would be appreciated.
You can use this pattern:
"=\\s([A-Za-z0-9]{2}),"
//or
"=\\s(\\w+),"
Code:
string str = "stop = F6, quantity ( 1 ) ";
var res = Regex.Matches(str, "=\\s([A-Za-z0-9]{2}),")[0].Groups[1].Value;
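To tie this back to the foreach loop in the question, here is a minimal sketch that applies the same capturing-group idea to every line. The names `StopCodes` and `Extract` are made up for illustration, and I use `\s*` instead of `\s` on the assumption that the spacing after `=` may vary.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class StopCodes
{
    // "=", optional whitespace, the code (captured in group 1), then a comma.
    private static readonly Regex StopPattern = new Regex(@"=\s*(\w+),");

    // Returns the captured code for every line that matches; other lines are skipped.
    public static List<string> Extract(IEnumerable<string> lines)
    {
        var codes = new List<string>();
        foreach (string line in lines)
        {
            Match m = StopPattern.Match(line);
            if (m.Success)
                codes.Add(m.Groups[1].Value);
        }
        return codes;
    }

    public static void Main()
    {
        var lines = new[]
        {
            "stop = F6, quantity ( 1 )",
            "stop = SC, quantity ( 1 )"
        };
        foreach (string code in Extract(lines))
            Console.WriteLine(code); // prints F6, then SC
    }
}
```

Checking `m.Success` before reading the group avoids the exception that indexing `Matches(...)[0]` would throw on a line that does not match.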
I don't know much C#, but the regex you need is this: "= (\w+),". It captures any run of word characters (letters, digits, underscore) between "= " and ",".
In regex, an expression between parentheses is called a "capturing group". Most languages provide an API to retrieve the content captured by a group. I found this for C#: https://msdn.microsoft.com/fr-fr/library/system.text.regularexpressions.match.groups(v=vs.110).aspx
So the code to retrieve your data looks like this:
string pattern = @"=\s(\w+),";
MatchCollection matches = Regex.Matches(input, pattern);
foreach (Match match in matches)
{
Console.WriteLine("Value : {0}", match.Groups[1].Value);
}
To test your regex live, https://regex101.com/ is very useful! Use it to see visually what the regex does while you write it.
I'm currently developing an application, and I have a problem with Regex.
I have a text file that contains email addresses like this:
test@test.uk
test1@test.uk
My function loademail must import the addresses from the text file and add them to the list result.
But it does not work: it never adds any email.
This is my code:
public class Loademail
{
public EmailAddress email;
public List<Loademail> loademail()
{
var result = new List<Loademail>();
string fileSocks = Path.GetFullPath(Path.Combine(Application.StartupPath, "liste.txt"));
var input = File.ReadAllText(fileSocks);
var r = new Regex(@"^(([\w-]+\.)+[\w-]+|([a-zA-Z]{1}|[\w-]{2,}))@"
+ @"((([0-1]?[0-9]{1,2}|25[0-5]|2[0-4][0-9])\.([0-1]?
[0-9]{1,2}|25[0-5]|2[0-4][0-9])\."
+ @"([0-1]?[0-9]{1,2}|25[0-5]|2[0-4][0-9])\.([0-1]?
[0-9]{1,2}|25[0-5]|2[0-4][0-9])){1}|"
+ @"([a-zA-Z0-9]+[\w-]+\.)+[a-zA-Z]{1}[a-zA-Z0-9-]{1,23})$", RegexOptions.IgnoreCase);
foreach (Match match in r.Matches(input))
{
string Email = match.Groups[1].Value;
Loademail bi = new Loademail();
bi.email = EmailAddress.Parse(Email);
result.Add(bi);
//result.Add(Email);
}
return result;
}
}
What should I do? Thanks.
Use RegexOptions.IgnorePatternWhitespace, so that the line breaks inside the pattern are ignored.
Edit
Try it using a while loop that calls NextMatch(),
like this:
Match _mData = Rx.Match( Input );
while (_mData.Success)
{
if (_mData.Groups[1].Success )
Console.WriteLine("{0} \r\n", _mData.Groups[1].Value);
_mData = _mData.NextMatch();
}
// -------------------
// Line breaks are placed only between tokens: even with IgnorePatternWhitespace,
// whitespace inside a character class or a {n,m} quantifier is not ignored.
Regex Rx = new Regex(
@"
^(([\w-]+\.)+[\w-]+|([a-zA-Z]{1}|[\w-]{2,}))@
((([0-1]?[0-9]{1,2}|25[0-5]|2[0-4][0-9])\.([0-1]?[0-9]{1,2}|25[0-5]|2[0-4][0-9])\.
([0-1]?[0-9]{1,2}|25[0-5]|2[0-4][0-9])\.([0-1]?[0-9]{1,2}|25[0-5]|2[0-4][0-9])){1}|
([a-zA-Z0-9]+[\w-]+\.)+[a-zA-Z]{1}[a-zA-Z0-9-]{1,23})$
",
RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace );
Use a good tool to format and process large expressions.
Formatted:
^
( # (1 start)
( [\w-]+ \. )+ # (2)
[\w-]+
| ( [a-zA-Z]{1} | [\w-]{2,} ) # (3)
) # (1 end)
@
( # (4 start)
( # (5 start)
( # (6 start)
[0-1]? [0-9]{1,2}
| 25 [0-5]
| 2 [0-4] [0-9]
) # (6 end)
\.
( # (7 start)
[0-1]?
[0-9]{1,2}
| 25 [0-5]
| 2 [0-4] [0-9]
) # (7 end)
\.
( # (8 start)
[0-1]? [0-9]{1,2}
| 25 [0-5]
| 2 [0-4] [0-9]
) # (8 end)
\.
( # (9 start)
[0-1]?
[0-9]{1,2}
| 25 [0-5]
| 2 [0-4] [0-9]
) # (9 end)
){1} # (5 end)
|
( [a-zA-Z0-9]+ [\w-]+ \. )+ # (10)
[a-zA-Z]{1} [a-zA-Z0-9-]{1,23}
) # (4 end)
$
As a side note, this is a good email regex as well.
# http://www.w3.org/TR/html5/forms.html#valid-e-mail-address
# ^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$
^
[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+
@
[a-zA-Z0-9]
(?:
[a-zA-Z0-9-]{0,61}
[a-zA-Z0-9]
)?
(?:
\.
[a-zA-Z0-9]
(?:
[a-zA-Z0-9-]{0,61}
[a-zA-Z0-9]
)?
)*
$
I'm taking a string like "4 + 5 + ( 7 - 9 ) + 8" and trying to split on the parentheses to get a list containing 4 + 5, (7-9), + 8. I'm using the regex below, but it gives me 4 + 5, (7-9), 7-9, + 8. Hoping it's just something easy. Thanks.
List<string> test = Regex.Split("4 + 5 + ( 7 - 9 ) + 8", @"(\(([^)]+)\))").ToList();
Remove the extra set of parentheses you have in your regex:
(\(([^)]+)\))  // your regex
(           )  // outer parens
 \(       \)   // literal parens match
   (     )     // extra parens you don't need
    [^)]+      // one or more 'not right parens'
The extra parens create a match for 'inside the literal parens', which is the extra 7 - 9 you see.
So you should have:
@"(\([^)]+\))"
List<string> test = Regex.Split("4 + 5 + ( 7 - 9 ) + 8", @"(\([^)]+\))").ToList();
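The behaviour behind this is that Regex.Split inserts the text of every capturing group into the result array, so each extra group adds an extra element. A small sketch to see both variants side by side (the class and method names are just placeholders for illustration):

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // One capturing group: the parenthesised part is kept once.
    public static string[] SplitOneGroup(string input)
    {
        return Regex.Split(input, @"(\([^)]+\))");
    }

    // Two capturing groups: the inner group's text is added as well.
    public static string[] SplitTwoGroups(string input)
    {
        return Regex.Split(input, @"(\(([^)]+)\))");
    }

    public static void Main()
    {
        string input = "4 + 5 + ( 7 - 9 ) + 8";

        // Brackets make the leading/trailing spaces visible.
        foreach (string part in SplitOneGroup(input))
            Console.WriteLine("[" + part + "]");
        // [4 + 5 + ]
        // [( 7 - 9 )]
        // [ + 8]

        Console.WriteLine(SplitTwoGroups(input).Length); // 4
    }
}
```

Note that the pieces keep their surrounding spaces and the trailing "+"; call Trim() on each piece if you need them cleaned up.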
I want to select a part of a string, but the problem is that the last character I want to select can have multiple occurrences.
I want to select 'Aggregate(' and end at the matching ')', any () in between can be ignored.
Examples:
string: Substr(Aggregate(SubQuery, SUM, [Model].Remark * [Object].Shortname + 10), 0, 1)
should return: Aggregate(SubQuery, SUM, [Model].Remark * [Object].Shortname + 10)
string: Substr(Aggregate(SubQuery, SUM, [Model].Remark * ([Object].Shortname + 10)), 0, 1)
should return: Aggregate(SubQuery, SUM, [Model].Remark * ([Object].Shortname + 10))
string: Substr(Aggregate(SubQuery, SUM, ([Model].Remark) * ([Object].Shortname + 10) ), 0, 1)
should return: Aggregate(SubQuery, SUM, ([Model].Remark) * ([Object].Shortname + 10) )
Is there a way to solve this with a regular expression? I'm using C#.
This is a little ugly, but you could use something like
Aggregate\(([^()]+|\(.*?\))*\)
It passes all your tests, but it can only match one level of nested parentheses.
This solution works with any level of nested parentheses by using .NET's balancing groups:
(?x) # allow comments and ignore whitespace
Aggregate\(
(?:
[^()] # anything but ( and )
| (?<open> \( ) # ( -> open++
| (?<-open> \) ) # ) -> open--
)*
(?(open) (?!) ) # fail if open > 0
\)
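For completeness, a sketch of how that pattern could be used from C#. `AggregateFinder` and `Find` are hypothetical names; instead of the inline `(?x)` I pass RegexOptions.IgnorePatternWhitespace, which should be equivalent here.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AggregateFinder
{
    private static readonly Regex Pattern = new Regex(@"
        Aggregate\(
        (?:
            [^()]               # anything but ( and )
          | (?<open> \( )       # opening paren: push onto the 'open' stack
          | (?<-open> \) )      # closing paren: pop from the 'open' stack
        )*
        (?(open) (?!) )         # fail if an opening paren is still unclosed
        \)",
        RegexOptions.IgnorePatternWhitespace);

    // Returns the Aggregate(...) call with balanced parentheses, or null if none is found.
    public static string Find(string expression)
    {
        Match m = Pattern.Match(expression);
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        string input = "Substr(Aggregate(SubQuery, SUM, ([Model].Remark) * ([Object].Shortname + 10) ), 0, 1)";
        Console.WriteLine(Find(input));
        // Aggregate(SubQuery, SUM, ([Model].Remark) * ([Object].Shortname + 10) )
    }
}
```

The pop alternative cannot succeed when the stack is empty, so the loop stops at the parenthesis that closes Aggregate( and the final `\)` consumes it.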
I'm not sure how much the input varies but for the string examples in the question something as simple as this would work:
Aggregate\(.*\)(?=,)
If you would eventually consider avoiding regular expressions, here's an alternative for parsing, which uses the System.Xml.Linq namespace:
using System;
using System.IO;
using System.Linq;
using System.Xml.Linq;
class Program
{
static void Main()
{
var input = File.ReadAllLines("input.txt");
input.ToList().ForEach(item => {
Console.WriteLine(item.GetParameter("Aggregate"));
});
}
}
static class X
{
public static string GetParameter(this string expression, string element)
{
XDocument doc;
var input1 = "<root>" + expression
.Replace("(", "<n1>")
.Replace(")", "</n1>")
.Replace("[", "<n2>")
.Replace("]", "</n2>") +
"</root>";
try
{
doc = XDocument.Parse(input1);
}
catch
{
return null;
}
var agg=doc.Descendants()
.Where(d => d.FirstNode?.ToString() == element) // FirstNode is null for an empty "()" pair
.FirstOrDefault();
if (agg == null)
return null;
var param = agg
.Elements()
.FirstOrDefault();
if (param == null)
return null;
return element +
param
.ToString()
.Replace("<n1>", "(")
.Replace("</n1>", ")")
.Replace("<n2>", "[")
.Replace("</n2>", "]");
}
}
This regex works with any number of pairs of brackets, and nested to any level:
Aggregate\(([^(]*\([^)]*\))*[^()]\)
For example, it will find the bolded text here:
Substr(Aggregate(SubQuery, SUM(foo(bar), baz()), ([Model].Remark) * ([Object].Shortname + 10) ), 0, 1)
Notice the SUM(foo(bar), baz()) in there.
See a live demo on rubular.
I have this string in C#
adj_con(CL2,1,3,0),adj_cont(CL1,1,3,0),NG, NG/CL, 5 value of CL(JK), HO
I want to use a RegEx to parse it to get the following:
adj_con(CL2,1,3,0)
adj_cont(CL1,1,3,0)
NG
NG/CL
5 value of CL(JK)
HO
In addition to the above example, I tested with the following, but am still unable to parse it correctly.
"%exc.uns: 8 hours let # = ABC, DEF", "exc_it = 1 day" , " summ=graffe ", " a,b,(c,d)"
The new text will be in one string
string mystr = @"""%exc.uns: 8 hours let # = ABC, DEF"", ""exc_it = 1 day"" , "" summ=graffe "", "" a,b,(c,d)""";
string str = "adj_con(CL2,1,3,0),adj_cont(CL1,1,3,0),NG, NG/CL, 5 value of CL(JK), HO";
var resultStrings = new List<string>();
int? firstIndex = null;
int scopeLevel = 0;
for (int i = 0; i < str.Length; i++)
{
if (str[i] == ',' && scopeLevel == 0)
{
resultStrings.Add(str.Substring(firstIndex.GetValueOrDefault(), i - firstIndex.GetValueOrDefault()));
firstIndex = i + 1;
}
else if (str[i] == '(') scopeLevel++;
else if (str[i] == ')') scopeLevel--;
}
resultStrings.Add(str.Substring(firstIndex.GetValueOrDefault()));
Even faster:
([^,]*\x28[^\x29]*\x29|[^,]+)
That should do the trick. Basically, look for either a "function thumbprint" or anything without a comma.
adj_con(CL2,1,3,0),adj_cont(CL1,1,3,0),NG, NG/CL, 5 value of CL(JK), HO
                  ^                   ^  ^      ^                  ^
The Carets symbolize where the grouping stops.
Just this regex:
[^,()]+(\([^()]*\))?
A test example:
var s= "adj_con(CL2,1,3,0),adj_cont(CL1,1,3,0),NG, NG/CL, 5 value of CL(JK), HO";
Regex regex = new Regex(#"[^,()]+(\([^()]*\))?");
var matches = regex.Matches(s)
.Cast<Match>()
.Select(m => m.Value);
returns
adj_con(CL2,1,3,0)
adj_cont(CL1,1,3,0)
NG
NG/CL
5 value of CL(JK)
HO
If you simply must use Regex, then you can split the string on the following:
, # match a comma
(?= # that is followed by
(?: # either
[^\(\)]* # no parens at all
| # or
(?: #
[^\(\)]* # ...
\( # (
[^\(\)]* # stuff in parens
\) # )
[^\(\)]* # ...
)+ # any number of times
)$ # until the end of the string
)
It breaks your input into the following:
adj_con(CL2,1,3,0)
adj_cont(CL1,1,3,0)
NG
NG/CL
5 value of CL(JK)
HO
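A sketch of how the split could be wired up in C#, assuming non-nested parentheses as above. `TopLevelSplitter` and `Split` are names I made up; the pattern is the one shown, condensed a little, and each piece is trimmed because the raw pieces keep the space that follows a comma.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TopLevelSplitter
{
    // A comma that is followed, up to the end of the string, only by text
    // whose parentheses are balanced (no nesting supported).
    private const string TopLevelComma = @"
        ,                                       # match a comma
        (?=                                     # that is followed by
          (?:                                   # either
            [^()]*                              # no parens at all
          |                                     # or
            (?: [^()]* \( [^()]* \) [^()]* )+   # text with complete paren groups
          )$                                    # until the end of the string
        )";

    public static string[] Split(string input)
    {
        return Regex.Split(input, TopLevelComma, RegexOptions.IgnorePatternWhitespace)
                    .Select(piece => piece.Trim())
                    .ToArray();
    }

    public static void Main()
    {
        string str = "adj_con(CL2,1,3,0),adj_cont(CL1,1,3,0),NG, NG/CL, 5 value of CL(JK), HO";
        foreach (string piece in Split(str))
            Console.WriteLine(piece);
    }
}
```

The pattern contains only non-capturing groups, so Regex.Split does not add any extra elements to the result.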
You can also use .NET's balanced grouping constructs to create a version that works with nested parens, but you're probably just as well off with one of the non-Regex solutions.
Another way to implement what Snowbear was doing:
public static string[] SplitNest(this string s, char src, string nest, string trg)
{
int scope = 0;
if (trg == null || nest == null) return null;
if (trg.Length == 0 || nest.Length < 2) return null;
if (trg.IndexOf(src) >= 0) return null;
if (nest.IndexOf(src) >= 0) return null;
for (int i = 0; i < s.Length; i++)
{
if (s[i] == src && scope == 0)
{
s = s.Remove(i, 1).Insert(i, trg);
}
else if (s[i] == nest[0]) scope++;
else if (s[i] == nest[1]) scope--;
}
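// Split(string[], StringSplitOptions) also compiles on .NET Framework, where Split(string) does not exist.
return s.Split(new[] { trg }, StringSplitOptions.None);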
}
The idea is to replace any non-nested delimiter with another delimiter that you can then use with an ordinary string.Split(). You can also choose what type of bracket to use - (), <>, [], or even something weird like \/, ][, or `'. For your purposes you would use
string str = "adj_con(CL2,1,3,0),adj_cont(CL1,1,3,0),NG, NG/CL, 5 value of CL(JK), HO";
string[] result = str.SplitNest(',',"()","~");
The function would first turn your string into
adj_con(CL2,1,3,0)~adj_cont(CL1,1,3,0)~NG~ NG/CL~ 5 value of CL(JK)~ HO
then split on the ~, ignoring the nested commas.
Assuming non-nested, matching parentheses, you can easily match the tokens you want instead of splitting the string:
MatchCollection matches = Regex.Matches(data, @"(?:[^(),]|\([^)]*\))+");
var s = "adj_con(CL2,1,3,0),adj_cont(CL1,1,3,0),NG, NG/CL, 5 value of CL(JK), HO";
var result = string.Join("\n", Regex.Split(s, @"(?<=\)),|,\s"));
The pattern matches a comma that is preceded by ) (the lookbehind keeps the ) out of the match)
or
a comma that is followed by a whitespace character.
result =
adj_con(CL2,1,3,0)
adj_cont(CL1,1,3,0)
NG
NG/CL
5 value of CL(JK)
HO
The TextFieldParser (msdn) class seems to have the functionality built-in:
TextFieldParser Class: - Provides methods and properties for parsing structured text files.
Parsing a text file with the TextFieldParser is similar to iterating over a text file, while the ReadFields method to extract fields of text is similar to splitting the strings.
The TextFieldParser can parse two types of files: delimited or fixed-width. Some properties, such as Delimiters and HasFieldsEnclosedInQuotes are meaningful only when working with delimited files, while the FieldWidths property is meaningful only when working with fixed-width files.
See the article which helped me find that
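A minimal sketch of what that might look like for the quoted variant of the input. `QuotedFields` and `Parse` are hypothetical names, and the sample line is simplified. Note the assumption: TextFieldParser understands quotes, not parentheses, so it would not keep adj_con(CL2,1,3,0) together.

```csharp
using System;
using System.IO;
using Microsoft.VisualBasic.FileIO; // TextFieldParser lives in the Microsoft.VisualBasic assembly

public static class QuotedFields
{
    // Splits one comma-delimited line, treating commas inside double quotes as data.
    public static string[] Parse(string line)
    {
        using (var parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;
            return parser.ReadFields();
        }
    }

    public static void Main()
    {
        // Simplified sample: three fields, two of them quoted and containing commas.
        string line = "\"ABC, DEF\",exc_it,\"a,b,(c,d)\"";
        foreach (string field in Parse(line))
            Console.WriteLine(field);
        // ABC, DEF
        // exc_it
        // a,b,(c,d)
    }
}
```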
Here's a stronger option, which parses the whole text, including nested parentheses:
string pattern = @"
\A
(?>
(?<Token>
(?:
[^,()] # Regular character
|
(?<Paren> \( ) # Opening paren - push to stack
|
(?<-Paren> \) ) # Closing paren - pop
|
(?(Paren),) # If inside parentheses, match comma.
)*?
)
(?(Paren)(?!)) # If we are not inside parentheses,
(?:,|\Z) # match a comma or the end
)*? # lazy just to avoid an extra empty match at the end,
# though it removes a last empty token.
\Z
";
Match match = Regex.Match(data, pattern, RegexOptions.IgnorePatternWhitespace);
You can get all matches by iterating over match.Groups["Token"].Captures.
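Put together, iterating the captures could look like the sketch below. `NestedTokenizer` and `Tokenize` are illustrative names only; the pattern is the one above, and each token is trimmed on the assumption that the space after a comma is not wanted.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NestedTokenizer
{
    private const string Pattern = @"
        \A
        (?>
            (?<Token>
                (?:
                    [^,()]              # Regular character
                    |
                    (?<Paren> \( )      # Opening paren - push to stack
                    |
                    (?<-Paren> \) )     # Closing paren - pop
                    |
                    (?(Paren),)         # If inside parentheses, match comma.
                )*?
            )
            (?(Paren)(?!))              # Fail while a paren is still open
            (?:,|\Z)                    # match a comma or the end
        )*?
        \Z
    ";

    // Returns the top-level comma-separated tokens, or an empty list if the text does not parse.
    public static List<string> Tokenize(string data)
    {
        var tokens = new List<string>();
        Match match = Regex.Match(data, Pattern, RegexOptions.IgnorePatternWhitespace);
        if (!match.Success)
            return tokens;
        // A group inside a repeated construct records one Capture per iteration.
        foreach (Capture capture in match.Groups["Token"].Captures)
            tokens.Add(capture.Value.Trim());
        return tokens;
    }

    public static void Main()
    {
        string data = "adj_con(CL2,1,3,0),adj_cont(CL1,1,3,0),NG, NG/CL, 5 value of CL(JK), HO";
        foreach (string token in Tokenize(data))
            Console.WriteLine(token);
    }
}
```

Because the whole string has to match from \A to \Z, unbalanced input produces no tokens at all rather than a partial result.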