18.jun. 7 noči od 515,00 EUR
Here I would like to extract 515,00 with a regular expression.
Regex regularExpr = new Regex(@rule.RegularExpression,
RegexOptions.Compiled | RegexOptions.Multiline |
RegexOptions.IgnoreCase | RegexOptions.Singleline |
RegexOptions.IgnorePatternWhitespace);
tagValue.Value = "18.jun. 7 noči od 515,00 EUR";
Match match = regularExpr.Match(tagValue.Value);
object value = match.Groups[2].Value;
regex is: \d+((.\d+)+(,\d+)?)?
but I always get an empty string (""). If I try this regex in Expresso I get an array of 3 values and the third is 515,00.
What is wrong with my C# code that I get an empty string?
Your regex matches the 18 (since the decimal parts are optional), and match.Groups[2] refers to the second capturing parenthesis (.\d+) which should correctly read (\.\d+) and hasn't participated in the match, therefore the empty string is returned.
You need to correct your regex and iterate over the results:
StringCollection resultList = new StringCollection();
Regex regexObj = new Regex(@"\d+(?:[.,]\d+)?");
Match matchResult = regexObj.Match(subjectString);
while (matchResult.Success) {
resultList.Add(matchResult.Value);
matchResult = matchResult.NextMatch();
}
resultList[2] will then contain your match.
Make sure you escaped everything properly when you created the regular expression.
Regex re = new Regex("\d+((.\d+)+(,\d+)?)?")
is very different from
Regex re = new Regex(@"\d+((.\d+)+(,\d+)?)?")
You probably want the second.
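To make the difference concrete, here is a minimal sketch (the class and member names are hypothetical, not from the question). In a regular C# string literal, \d is not a recognized escape sequence and the compiler rejects it, so every backslash meant for the regex has to be doubled. A verbatim literal passes the backslashes through unchanged.

```csharp
using System.Text.RegularExpressions;

public static class PatternLiterals
{
    // Regular string literal: every backslash meant for the regex is doubled.
    public const string Escaped = "\\d+(?:[.,]\\d+)?";

    // Verbatim string literal: backslashes reach the regex engine unchanged.
    public const string Verbatim = @"\d+(?:[.,]\d+)?";

    // Returns the first number found in the input, or null if there is none.
    public static string FirstNumber(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Value : null;
    }
}
```

Both constants hold the same characters, so either one can be handed to the Regex constructor; the verbatim form is simply easier to read and harder to get wrong.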
I suspect the result you're getting in Expresso is equivalent to this:
string s = "18.jun. 7 noči od 515,00 EUR";
Regex r = new Regex(@"\d+((.\d+)+(,\d+)?)?");
foreach (Match m in r.Matches(s))
{
Console.WriteLine(m.Value);
}
In other words, it's not the contents of the second capturing group you're seeing, it's the third match. This code shows it more clearly:
Console.WriteLine("{0,10} {1,10} {2,10} {3,10}",
@"Group 0", @"Group 1", @"Group 2", @"Group 3");
Regex r = new Regex(@"\d+((.\d+)+(,\d+)?)?");
foreach (Match m in r.Matches(s))
{
Console.WriteLine("{0,10} {1,10} {2,10} {3,10}",
m.Groups[0].Value, m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value);
}
output:
   Group 0    Group 1    Group 2    Group 3
        18
         7
    515,00        ,00        ,00
On to the regex itself. If you want to match only the price and not those other numbers, you need to be more specific. For example, if you know the ,00 part will always be present, you can use this regex:
@"(?n)\b\d+(\.\d+)*(,\d+)\b"
(?n) is the inline form of the ExplicitCapture option, which turns those two capturing groups into non-capturing groups. Of the RegexOptions you did specify, the only one that has any effect is Compiled, which speeds up matching of the regex slightly, at the expense of slowing down its construction and hogging memory. \b is a word boundary.
It looks like you're applying all those modifiers blindly to every regex when you construct them, which is not a good idea. If a particular regex needs a certain modifier, you should try to specify it in the regex itself with an inline modifier, like I did with (?n).
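As a rough illustration of that advice, the sketch below applies the price pattern with its inline (?n) option and no RegexOptions at all. The class and method names are made up for the example, and it assumes the decimal part (such as ,00) is always present.

```csharp
using System.Text.RegularExpressions;

public static class PriceExtractor
{
    // (?n) is inline ExplicitCapture: the unnamed groups only group, they do not capture.
    private static readonly Regex PriceRegex =
        new Regex(@"(?n)\b\d+(\.\d+)*(,\d+)\b");

    // Returns the first price-like token (digits, optional .ddd parts, mandatory ,dd part),
    // or null when the text contains none.
    public static string FindPrice(string text)
    {
        Match m = PriceRegex.Match(text);
        return m.Success ? m.Value : null;
    }
}
```

Calling PriceExtractor.FindPrice("18.jun. 7 noči od 515,00 EUR") skips 18 and 7, because neither is followed by a comma and digits, and returns 515,00.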
I am trying to find the substring between the nth and (n+1)th occurrence of a special character. For example:
one|two|three|four|five
Say I want the string between the 2nd and 3rd occurrence of the '|' character, which turns out to be 'three'. I want to do it using a regex. Could someone guide me?
My Current Attempt is as follows.
string subtext = "zero|one|two|three|four";
Regex r = new Regex(@"(?:([^|]*)|){3}");
var m = r.Match(subtext).Value;
If you have full access to C# code, you should consider a mere splitting approach:
var idx = 2; // Might be user-defined
var subtext = "zero|one|two|three|four";
var result = subtext.Split('|').ElementAtOrDefault(idx);
Console.WriteLine(result);
// => two
A regex can be used if you have no access to code (if you use some tool that is powered with .NET regex):
^(?:[^|]*\|){2}([^|]*)
See the regex demo. It matches
^ - start of string
(?:[^|]*\|){2} - exactly 2 sequences (adjust the number as needed) of:
[^|]* - zero or more chars other than |
\| - a | symbol
([^|]*) - Group 1 (access via .Groups[1]): zero or more chars other than |
C# code to test:
var pat = $@"^(?:[^|]*\|){{{idx}}}([^|]*)";
var m = Regex.Match(subtext, pat);
if (m.Success) {
Console.WriteLine(m.Groups[1].Value);
}
// => two
See the C# demo
If a tool does not let you access captured groups, turn the initial part into a non-consuming lookbehind pattern:
(?<=^(?:[^|]*\|){2})[^|]*
^^^^^^^^^^^^^^^^^^^^
See this regex demo. The (?<=...) positive lookbehind only checks for a pattern presence immediately to the left of the current location, and if the pattern is not matched, the match will fail.
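The lookbehind variant can be tried from C# as well, since the .NET regex engine accepts quantified patterns inside a lookbehind. The following is only a sketch: the class and method names are invented, and n is assumed to be a non-negative count of '|' separators to skip.

```csharp
using System.Text.RegularExpressions;

public static class PipeFields
{
    // Returns the field that follows the n-th '|' (n = 0 gives the first field),
    // or null when the input contains fewer than n separators.
    // Assumption: n is non-negative.
    public static string FieldAfter(string text, int n)
    {
        // The lookbehind asserts that exactly n "field|" sequences lie between the
        // start of the string and the current position, without consuming them.
        string pattern = @"(?<=^(?:[^|]*\|){" + n + @"})[^|]*";
        Match m = Regex.Match(text, pattern);
        return m.Success ? m.Value : null;
    }
}
```

Here the whole match (Match.Value) is the field itself, so no capturing group has to be read.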
Use this:
(?:.*?\|){n}(.[^|]*)
where n is the number of times you need to skip your special character. The first capturing group will contain the result.
Demo for n = 2
Use this regex and then select the n-th match (in this case 2) from the Matches collection:
string subtext = "zero|one|two|three|four";
Regex r = new Regex(@"(?<=\|)[^\|]*");
var m = r.Matches(subtext)[2];
New to using C# Regex, I am trying to capture two comma separated integers from a string into two variables.
Example: 13,567
I tried variations on
Regex regex = new Regex(@"(\d+),(\d+)");
var matches = regex.Matches("12,345");
foreach (var itemMatch in matches)
Debug.Print(itemMatch.Value);
This just captures one value, which is the entire string. I worked around this by changing the capture pattern to "(\d+)", but that ignores the middle comma entirely, and I would get a match if there were any text between the integers.
How do I get it to extract both integers and also ensure there is a comma between them?
Can do this with String.Split
Why not just use a split and parse?
var results = "123,456".Split(',').Select(int.Parse).ToArray();
var left = results[0];
var right = results[1];
Alternatively, you can use a loop and use int.TryParse to handle failures but for what you're looking for this should cover it
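A minimal sketch of that int.TryParse alternative might look like the following. The class and method names are hypothetical, and the strict NumberStyles.None setting (digits only, no sign or whitespace) is a choice made for this example rather than something the question requires.

```csharp
using System.Globalization;

public static class PairParser
{
    // Tries to read "left,right" as two integers separated by exactly one comma.
    // Returns false instead of throwing when the input does not fit.
    public static bool TryParsePair(string input, out int left, out int right)
    {
        left = 0;
        right = 0;
        string[] parts = input.Split(',');
        if (parts.Length != 2) return false;

        // NumberStyles.None accepts digits only: no sign, whitespace, or separators.
        return int.TryParse(parts[0], NumberStyles.None, CultureInfo.InvariantCulture, out left)
            && int.TryParse(parts[1], NumberStyles.None, CultureInfo.InvariantCulture, out right);
    }
}
```

Unlike the one-liner with int.Parse, this version reports malformed input such as "12 345" or "12,abc" through its return value.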
If you're really committed to a Regex
You can do this with a Regex too, just need to use groups of the match
Regex r = new Regex(@"(\d+)\,(\d+)", RegexOptions.Compiled);
var r1 = r.Match("123,456");
//first is total match
Console.WriteLine(r1.Groups[0].Value);
//Then first and second groups
var left = int.Parse(r1.Groups[1].Value);
var right = int.Parse(r1.Groups[2].Value);
Console.WriteLine("Left "+ left);
Console.WriteLine("Right "+right);
Made a dotnetfiddle you can test the solution in as well
With Regex, you can use this:
Regex regex = new Regex(@"\d+(?=,)|(?<=,)\d+");
var matches = regex.Matches("12,345");
foreach (Match itemMatch in matches)
Console.WriteLine(itemMatch.Value);
prints:
12
345
This uses a lookahead and a lookbehind on the comma:
\d+(?=,) <---- // Match numbers followed by a ,
| <---- // OR
(?<=,)\d+ <---- // Match numbers preceded by a ,
I'm trying to get the values between {} and %% in a same Regex.
This is what I have till now. I can successfully get values individually for each but I was curious to learn about how can I combine both.
var regex = new Regex(@"%(.*?)%|\{([^}]*)\}");
String s = "This is a {test} %String%. %Stack% {Overflow}";
Expected answer for the above string
test
String
Stack
Overflow
Individual regex
@"%(.*?)%" gives me String and Stack
@"\{([^}]*)\}" gives me test and Overflow
Following is my code.
var regex = new Regex(@"%(.*?)%|\{([^}]*)\}");
var matches = regex.Matches(s);
foreach (Match match in matches)
{
Console.WriteLine(match.Groups[1].Value);
}
Similar to your regex. You can use Named Capturing Groups
String s = "This is a {test} %String%. %Stack% {Overflow}";
var list = Regex.Matches(s, @"\{(?<name>.+?)\}|%(?<name>.+?)%")
.Cast<Match>()
.Select(m => m.Groups["name"].Value)
.ToList();
If you want to learn how conditional expressions work, here is a solution using that kind of .NET regex capability:
(?:(?<p>%)|(?<b>{))(?<v>.*?)(?(p)%|})
See the regex demo
Here is how it works:
(?:(?<p>%)|(?<b>{)) - match and capture either Group "p" with % (percentage), or Group "b" (brace) with {
(?<v>.*?) - match and capture into Group "v" (value) any character (even a newline since I will be using RegexOptions.Singleline) zero or more times, but as few as possible (lazy matching with *? quantifier)
(?(p)%|}) - a conditional expression meaning: if "p" group was matched, match %, else, match }.
C# demo:
var s = "This is a {test} %String%. %Stack% {Overflow}";
var regex = "(?:(?<p>%)|(?<b>{))(?<v>.*?)(?(p)%|})";
var matches = Regex.Matches(s, regex, RegexOptions.Singleline);
// var matches_list = Regex.Matches(s, regex, RegexOptions.Singleline)
// .Cast<Match>()
// .Select(p => p.Groups["v"].Value)
// .ToList();
// Or just a demo writeline
foreach (Match match in matches)
Console.WriteLine(match.Groups["v"].Value);
Sometimes the capture is in group 1 and sometimes it's in group 2 because you have two pairs of parentheses.
Your original code will work if you do this instead:
Console.WriteLine(match.Groups[1].Value + match.Groups[2].Value);
because one group will be the empty string and the other will be the value you're interested in.
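Rather than concatenating the two groups, one could also ask each group whether it took part in the match. The sketch below does that with Group.Success; the class and method names are invented for illustration, and the pattern is the one from the question.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class DelimitedValues
{
    private static readonly Regex Pattern = new Regex(@"%(.*?)%|\{([^}]*)\}");

    // Collects the text found between %...% or {...} pairs, in order of appearance.
    public static List<string> Extract(string s)
    {
        var values = new List<string>();
        foreach (Match m in Pattern.Matches(s))
        {
            // Only one alternative can match at a time, so only one group succeeded.
            Group g = m.Groups[1].Success ? m.Groups[1] : m.Groups[2];
            values.Add(g.Value);
        }
        return values;
    }
}
```

For the sample string this should yield test, String, Stack and Overflow, the same values as the concatenation trick, but the intent is a little more explicit.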
@"[{%](.*?)[}%]"
The idea being:
{ or %
anything
} or %
I think you should use a combination of conditional and nested groups:
((\{(.*)\})|(%(.*)%))
I am very new to regex and I am not sure what's going on with this one. A friend gave me this to solve my issue, but somehow it is not working.
string: department_name:womens AND item_type_keyword:base-layer-underwear
reg-ex: (department_name:([\\w-]+))?(item_type_keyword:([\\w-]+))?
desired output: array OR group
1st element should be: department_name:womens
2nd should be: womens
3rd: item_type_keyword:base-layer-underwear
4th: base-layer-underwear
Strings can contain department_name or item_type_keyword; neither is mandatory, and they can appear in any order.
C# Code
Regex regex = new Regex(@"(department_name:([\w-]+))?(item_type_keyword:([\w-]+))?");
Match match = regex.Match(query);
if (match.Success)
if (!String.IsNullOrEmpty(match.Groups[4].ToString()))
d1.ItemType = match.Groups[4].ToString();
This C# code only returns a string array with 3 elements:
1: department_name:womens
2: department_name:womens
3: womens
Somehow it duplicates the 1st and 2nd elements, and I don't know why. It also does not return the other elements that I expect.
Can someone help me, please?
When I test the regex online, it looks fine to me:
http://fiddle.re/crvw1
Thanks
You can use something like this to get the output you have in your question:
string txt = "department_name:womens AND item_type_keyword:base-layer-underwear";
var reg = new Regex(@"(?:department_name|item_type_keyword):([\w-]+)", RegexOptions.IgnoreCase);
var ms = reg.Matches(txt);
ArrayList results = new ArrayList();
foreach (Match match in ms)
{
results.Add(match.Groups[0].Value);
results.Add(match.Groups[1].Value);
}
// results is your final array containing all results
foreach (string elem in results)
{
Console.WriteLine(elem);
}
Prints:
department_name:womens
womens
item_type_keyword:base-layer-underwear
base-layer-underwear
match.Groups[0].Value gives the part that matched the pattern, while match.Groups[1].Value will give the part captured in the pattern.
In your original expression, the outer capture group wraps the whole of department_name:womens, so that text appears twice: once as the full match (group 0) and once as group 1.
Once you get the different elements, you should be able to put them in an array/list for further processing. (Added this part in edit)
The loop lets you iterate over every match, which you cannot do with if and .Match() (that is better suited to a single match). Because multiple matches are allowed here, neither the order nor the number of matches matters.
(?:
department_name # Match department_name
| # Or
item_type_keyword # Match item_type_keyword
)
:
([\w-]+) # Capture \w and - characters
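If the goal is to look values up by field name afterwards (for example to fill something like d1.ItemType), the same alternation can feed a dictionary. This is a sketch under assumptions: the class name, the method name, and the named groups key and value are all introduced here for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QueryFields
{
    private static readonly Regex FieldRegex = new Regex(
        @"(?<key>department_name|item_type_keyword):(?<value>[\w-]+)",
        RegexOptions.IgnoreCase);

    // Maps each recognised field name to its value, regardless of order.
    public static Dictionary<string, string> Parse(string query)
    {
        // Case-insensitive keys, to stay consistent with RegexOptions.IgnoreCase.
        var fields = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (Match m in FieldRegex.Matches(query))
        {
            // If a field appears more than once, the last occurrence wins.
            fields[m.Groups["key"].Value] = m.Groups["value"].Value;
        }
        return fields;
    }
}
```

A caller can then use TryGetValue("item_type_keyword", out var itemType) and simply skip fields that were not present in the query.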
It's better to use the alternation (or logical OR) operator | because we don't know the order of the input string.
(department_name:([\w-]+))|(item_type_keyword:([\w-]+))
String input = @"department_name:womens AND item_type_keyword:base-layer-underwear";
Regex rgx = new Regex(@"(?:(department_name:([\w-]+))|(item_type_keyword:([\w-]+)))");
foreach (Match m in rgx.Matches(input))
{
Console.WriteLine(m.Groups[1].Value);
Console.WriteLine(m.Groups[2].Value);
Console.WriteLine(m.Groups[3].Value);
Console.WriteLine(m.Groups[4].Value);
}
Another idea using a lookahead for capturing and getting all groups in one match:
^(?!$)(?=.*(department_name:([\w-]+))|)(?=.*(item_type_keyword:([\w-]+))|)
as a .NET String
"^(?!$)(?=.*(department_name:([\\w-]+))|)(?=.*(item_type_keyword:([\\w-]+))|)"
test at regexplanet (click on .NET); test at regex101.com
(add the m multiline modifier for multiline input: "(?m)^...")
If the parts are always joined by a separator such as AND or OR, you can use:
(department_name:(.*?)) AND (item_type_keyword:(.*?)$)
•1: department_name:womens
•2: womens
•3: item_type_keyword:base-layer-underwear
•4: base-layer-underwear
(?=(department_name:\w+)).*?:([\w-]+)|(?=(item_type_keyword:.*)$).*?:([\w-]+)
Try this. It uses a lookahead to capture, then backtracks and captures again. See the demo.
http://regex101.com/r/lS5tT3/52
I have some text like "item number - item description" eg "13-40 - Computer Keyboard" that I want to split into item number and item description.
Is this possible with 1 regular expression, or would I need 2 (one for item and one for description)?
I can't work out how to "group" it - like the item number can be this and the description can be this, without it thinking that everything is the item number. Eg:
(\w(\w|-|/)*\w)-.*
matches everything as 1 match.
This is the code I'm using:
Regex rx = new Regex(RegExString, RegexOptions.Compiled | RegexOptions.IgnoreCase);
MatchCollection matches = rx.Matches("13-40 - Computer Keyboard");
Assert.AreEqual("13-40", matches[0].Value);
Assert.AreEqual("Computer Keyboard", matches[1].Value);
From the code you posted, you are using regex wrong. You should be having one regex pattern to match the whole product and using the captures within the match to extract the number and description.
string RegExString = @"(?<number>[\d-]+)\s-\s(?<description>.*)";
Regex rx = new Regex(RegExString, RegexOptions.Compiled | RegexOptions.IgnoreCase);
Match match = rx.Match("13-40 - Computer Keyboard");
Debug.Assert("13-40" == match.Groups["number"].Value);
Debug.Assert("Computer Keyboard" == match.Groups["description"].Value);
Here is a regexp that works in Ruby; I'm not sure whether C# regexps differ:
/^([\d\-]+) \- (.+)$/
([0-9-]+)\s-\s(.*)
Group 1 contains the item number, and group 2 contains the description.
CaffeineFueled's answer is correct for C#.
Match match = Regex.Match("13-40 - Computer Keyboard", @"^([\d\-]+) \- (.+)$");
Console.WriteLine(match.Groups[1]);
Console.WriteLine(match.Groups[2]);
Results:
13-40
Computer Keyboard
If your text is always divided by a dash and you don't have to handle dashes within the data, you don't have to use regex.
// itemText holds the input; requires using System.Linq
string[] itemProperties = itemText
    .Split('-')
    .Select(p => p.Trim())
    .ToArray();
Item item = new Item()
{
    Number = itemProperties[0],
    Name = itemProperties[1],
    Description = itemProperties[2]
};
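When the data itself can contain dashes, as in the item number 13-40 from the question, splitting on the full " - " separator and limiting the result to two parts is one way to stay regex-free. The sketch below assumes the separator is always a dash surrounded by single spaces; the class and method names are hypothetical.

```csharp
using System;

public static class ItemText
{
    // Splits "number - description" on the first " - " only, so dashes inside
    // the number (and any later " - " in the description) are left alone.
    public static string[] SplitItem(string text)
    {
        return text.Split(new[] { " - " }, 2, StringSplitOptions.None);
    }
}
```

For "13-40 - Computer Keyboard" this gives the two elements 13-40 and Computer Keyboard. If the separator is missing, the array has a single element, which the caller should check before indexing.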
You don't seem to want to match groups, but have multiple matches.
Maybe this will do what you want?
(?:^.+(?=( - ))|(?<=( - )).+$)
Split up:
(?: Used to provide two possible matches
^.+ Match item ID text
(?=( - )) Text must be before " - "
| OR
(?<=( - )) Text must be after " - "
.+$ Match description text
)
This isn't as elegant as CaffeineFueled's answer but maybe easier to read for a regex beginner.
String RegExString = @"(\d*-\d*)\s*-\s*(.*)";
Regex rx = new Regex(RegExString, RegexOptions.Compiled | RegexOptions.IgnoreCase);
MatchCollection matches = rx.Matches("13-40 - Computer Keyboard");
Assert.AreEqual("13-40", matches[0].Groups[1].Value);
Assert.AreEqual("Computer Keyboard", matches[0].Groups[2].Value);
or even more readable:
String RegExString = @"(\d*-\d*) - (.*)";