Process regex matches and non-matches differently - c#

When the following code is run:
string input = "<td>abc</td><td></td><td>abc</td>)";
string pattern = "<td>(abc)</td>";
foreach (Match match in Regex.Matches(input, pattern))
    Console.Write(match.Groups[1].Value);
It outputs the following text:
abcabc
That makes sense since the pattern only matches the first and the last td elements in the input string. However, I'd like to change it so that it outputs the following:
abc
abc
In other words, I'd like it to output a new line when it encounters an empty td element. How could I accomplish this?

You could do that like this:
string input = "<td>abc</td><td></td><td>abc</td>)";
string pattern = "<td>(abc)?</td>";
foreach (Match match in Regex.Matches(input, pattern))
{
    if (match.Groups[1].Success)
        Console.Write(match.Groups[1].Value);
    else
        Console.WriteLine();
}
By changing your pattern from <td>(abc)</td> to <td>(abc)?</td>, the abc becomes optional, so both <td>abc</td> and <td></td> will match. Since the group is optional, you can then use the Group.Success property to determine whether the capturing group actually took part in each match: for <td></td> it did not, so Success is false.
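If you'd rather build the output as a string (for example, to test it) instead of writing straight to the console, here is a minimal sketch of the same Group.Success idea wrapped in a helper. The TdCells class and Render method are hypothetical names, and the sketch assumes the cells only ever contain abc or nothing, as in the question.

```csharp
using System.Text;
using System.Text.RegularExpressions;

public static class TdCells
{
    // Hypothetical helper: appends each captured "abc" and a newline
    // for every empty <td></td>, mirroring the Console loop above.
    public static string Render(string input)
    {
        var sb = new StringBuilder();
        foreach (Match match in Regex.Matches(input, "<td>(abc)?</td>"))
        {
            // Success is false when the optional group did not take part in the match.
            if (match.Groups[1].Success)
                sb.Append(match.Groups[1].Value);
            else
                sb.Append('\n');
        }
        return sb.ToString();
    }
}
```

For the question's input, TdCells.Render returns "abc\nabc", i.e. the same two lines as the console version.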

Related

I'm having trouble with a multiline regex in C#, how do I fix this?

I have the following code to attempt to extract the content of li tags.
string blah = @"<ul>
<li>foo</li>
<li>bar</li>
<li>oof</li>
</ul>";
string liRegexString = @"(?:.)*?<li>(.*?)<\/li>(?:.?)*";
Regex liRegex = new Regex(liRegexString, RegexOptions.Multiline);
Match liMatches = liRegex.Match(blah);
if (liMatches.Success)
{
    foreach (var group in liMatches.Groups)
    {
        Console.WriteLine(group);
    }
}
Console.ReadLine();
The Regex started much simpler and without the multiline option, but I've been tweaking it to try to make it work.
I want results foo, bar and oof but instead I get <li>foo</li> and foo.
On top of this, it seems to work fine in Regex101: https://regex101.com/r/jY6rnz/1
Any thoughts?
I will start by saying that, as mentioned in the comments, you should really be parsing HTML with a proper HTML parser such as HtmlAgilityPack. Moving on to actually answering your question, though...
The problem is that liRegex.Match(blah) only returns the first match, so you only ever see foo. What you want is liRegex.Matches(blah), which returns all matches.
So your use would be:
var liMatches = liRegex.Matches(blah);
foreach (Match match in liMatches)
{
    Console.WriteLine(match.Groups[1].Value);
}
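To see why Match() on its own stops at foo: it returns only the first match, and the remaining ones are reached by calling NextMatch() on the previous match until Success is false. Matches() gives you the same sequence as a collection. A minimal sketch that keeps the question's pattern and options; the LiWalker class and Items method are hypothetical names used only for illustration.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LiWalker
{
    // Same pattern and options as in the question.
    private static readonly Regex LiRegex =
        new Regex(@"(?:.)*?<li>(.*?)<\/li>(?:.?)*", RegexOptions.Multiline);

    // Hypothetical helper: walks the matches one at a time.
    public static List<string> Items(string html)
    {
        var items = new List<string>();
        // Match() gives only the first match; NextMatch() resumes where it ended.
        for (Match m = LiRegex.Match(html); m.Success; m = m.NextMatch())
        {
            items.Add(m.Groups[1].Value); // group 1 = text between <li> and </li>
        }
        return items;
    }
}
```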
Your regex produces multiple matches when run against blah, but the Match method only returns the first one, which is the foo one. You are then printing all groups of that first match, which gives you (1) the whole match, <li>foo</li>, and (2) group 1 of the match, foo.
If you want foo, bar and oof, you should print group 1 of each match. To do this, get all the matches with Matches first, then iterate over the MatchCollection and print Groups[1]:
string blah = @"<ul>
<li>foo</li>
<li>bar</li>
<li>oof</li>
</ul>";
string liRegexString = @"(?:.)*?<li>(.*?)<\/li>(?:.?)*";
Regex liRegex = new Regex(liRegexString, RegexOptions.Multiline);
MatchCollection liMatches = liRegex.Matches(blah);
// Cast<Match>() is a LINQ extension method, so this needs using System.Linq;
foreach (var match in liMatches.Cast<Match>())
{
    Console.WriteLine(match.Groups[1]);
}
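As a side note, once you use Matches() the (?:.)*? and (?:.?)* wrappers only consume extra text on the same line, so they don't change what group 1 captures, and RegexOptions.Multiline only affects ^ and $, which the pattern doesn't use. The plain pattern <li>(.*?)</li> therefore finds the same items. A minimal sketch, assuming each <li>...</li> pair sits on one line (. does not match a newline unless RegexOptions.Singleline is set) and contains no nested tags; for real HTML, a parser is still the safer choice. LiExtractor is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class LiExtractor
{
    // Hypothetical helper: returns the text of every <li> element,
    // assuming one <li>...</li> pair per line.
    public static List<string> Items(string html) =>
        Regex.Matches(html, @"<li>(.*?)</li>")
             .Cast<Match>()                  // MatchCollection -> IEnumerable<Match>
             .Select(m => m.Groups[1].Value) // group 1 = text between the tags
             .ToList();
}
```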

Looking for patterns in a string how to?

I'm trying to find all instances of the substring EnemyType('XXXX'), where XXXX is an arbitrary string and EnemyType('XXXX') can appear multiple times.
Right now I'm using a combination of IndexOf/Substring calls in C#, but I'd like to know if there's a cleaner way of doing it.
Use regex. Example:
using System.Text.RegularExpressions;
var inputString = " EnemyType('1234')abcdeEnemyType('5678')xyz";
var regex = new Regex(@"EnemyType\('\d{4}'\)");
var matches = regex.Matches(inputString);
foreach (Match i in matches)
{
    Console.WriteLine(i.Value);
}
It will print:
EnemyType('1234')
EnemyType('5678')
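The pattern to match is @"EnemyType\('\d{4}'\)", where \d{4} means exactly 4 digits (in .NET, \d matches any Unicode decimal digit, not only 0-9; use [0-9] if you need ASCII digits only). The literal parentheses are escaped with a backslash because ( and ) are regex metacharacters.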
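If you'd rather not escape the literal parentheses by hand, Regex.Escape can do it for the literal parts of the pattern. A minimal sketch that rebuilds the same pattern this way; EnemyPattern is a hypothetical name. Regex.Escape escapes ( and ) but leaves the single quote alone, since ' is not a regex metacharacter.

```csharp
using System.Text.RegularExpressions;

public static class EnemyPattern
{
    // Regex.Escape("EnemyType('") yields EnemyType\(' and Regex.Escape("')") yields '\),
    // so this is the same pattern as EnemyType\('\d{4}'\).
    public static readonly string Pattern =
        Regex.Escape("EnemyType('") + @"\d{4}" + Regex.Escape("')");

    // Counts the EnemyType('dddd') occurrences in the input.
    public static int Count(string input) => Regex.Matches(input, Pattern).Count;
}
```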
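Edit: Since you only want the string inside the quotes, not the whole match, you can use a named group instead. The pattern below also replaces \d{4} with [^']+ (one or more characters other than a single quote), so XXXX no longer has to be four digits.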
var inputString = " EnemyType('1234')abcdeEnemyType('5678')xyz";
var regex = new Regex(@"EnemyType\('(?<id>[^']+)'\)");
var matches = regex.Matches(inputString);
foreach (Match i in matches)
{
    Console.WriteLine(i.Groups["id"].Value);
}
Now it prints:
1234
5678
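Because [^']+ accepts any run of characters other than a single quote, this version also handles ids that contain letters or spaces. A minimal sketch that collects the ids into a list instead of printing them; the EnemyIds class and the sample ids in the usage note are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class EnemyIds
{
    private static readonly Regex IdRegex = new Regex(@"EnemyType\('(?<id>[^']+)'\)");

    // Returns the text between the quotes of every EnemyType('...') occurrence.
    public static List<string> Extract(string input)
    {
        var ids = new List<string>();
        foreach (Match m in IdRegex.Matches(input))
            ids.Add(m.Groups["id"].Value);
        return ids;
    }
}
```

For example, EnemyIds.Extract("EnemyType('goblin king') and EnemyType('1234')") returns the two strings goblin king and 1234.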
Regex is a really nice tool for parsing strings. If you often parse strings, regex can make life so much easier.

Match with blank and without blank

I want to get the name of each mp3.
I'm currently using this code
string str = "onClick=\"playVideo('upload/honour-3.mp3',this)\"/> onClick=\"playVideo('upload/honor is my honor .mp3',this)\"/> onClick=\"playVideo('upload/honour-6.mp3',this)\"/> ";
string Pattern = @"playVideo\(\'upload\/(?<mp3>\S*).mp3\'\,this\)";
if (Regex.IsMatch(str, Pattern))
{
    MatchCollection Matches = Regex.Matches(str, Pattern);
    foreach (Match match in Matches)
    {
        string fn = match.Groups["mp3"].Value;
        Debug.Log(match.Groups["mp3"].Value);
    }
}
But \S* only matches names like
honour-3
honour-6
I can't get "honor is my honor ".
I tried "\S*\s*", but it doesn't work.
The number of spaces in the names varies.
How do I use Regex to get the mp3 names?
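If you don't have to match "playVideo" and "upload", your regex is unnecessarily complicated. Since [\w\s-] (word characters, whitespace and hyphens) cannot match the / or the quote, each match starts right after upload/, and this pattern produces the expected results:
@"[\w\s-]+\.mp3"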
Results:
"honour-3.mp3",
"honor is my honor .mp3",
"honour-6.mp3"
If you don't want .mp3 at the end of the matches, you can change the regex to @"([\w\s-]+)\.mp3" and read Groups[1] instead (Groups[0] is always the whole match). Cast, Select and ToArray are LINQ methods, so this needs a using System.Linq directive:
Regex.Matches(str, @"([\w\s-]+)\.mp3").Cast<Match>().Select(m => m.Groups[1].Value).ToArray();
Results:
"honour-3",
"honor is my honor ",
"honour-6"
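If you do want to keep the playVideo('upload/ context (for example, because the page also contains other .mp3 links you want to ignore), another option is to let the name be any run of characters except a single quote instead of \S*. A sketch under that assumption, i.e. that file names never contain a single quote; Mp3Names is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class Mp3Names
{
    // [^']* allows spaces in the file name and cannot run past the closing quote;
    // backtracking then gives back ".mp3" so that \.mp3' can match.
    private const string Pattern = @"playVideo\('upload/(?<mp3>[^']*)\.mp3'";

    public static List<string> Extract(string html)
    {
        var names = new List<string>();
        foreach (Match m in Regex.Matches(html, Pattern))
            names.Add(m.Groups["mp3"].Value);
        return names;
    }
}
```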

REGEX help needed in c#

I am very new to regex and I'm not sure what's going on with this one. My friend gave me this to solve my issue, but somehow it is not working.
string: department_name:womens AND item_type_keyword:base-layer-underwear
reg-ex: (department_name:([\\w-]+))?(item_type_keyword:([\\w-]+))?
desired output: array OR group
1st element should be: department_name:womens
2nd should be: womens
3rd: item_type_keyword:base-layer-underwear
4th: base-layer-underwear
The string may contain department_name, item_type_keyword, both or neither (neither is mandatory), in any order.
C# Code
Regex regex = new Regex(@"(department_name:([\w-]+))?(item_type_keyword:([\w-]+))?");
Match match = regex.Match(query);
if (match.Success)
    if (!String.IsNullOrEmpty(match.Groups[4].ToString()))
        d1.ItemType = match.Groups[4].ToString();
This C# code only returns 3 elements:
1: department_name:womens
2: department_name:womens
3: womens
Somehow it is duplicating the 1st and 2nd elements, and I don't know why. And it doesn't return the other elements I expect.
Can someone help me, please?
When I test the regex online, it looks fine to me:
http://fiddle.re/crvw1
Thanks
You can use something like this to get the output you have in your question:
// needs: using System; using System.Collections.Generic; using System.Text.RegularExpressions;
string txt = "department_name:womens AND item_type_keyword:base-layer-underwear";
var reg = new Regex(@"(?:department_name|item_type_keyword):([\w-]+)", RegexOptions.IgnoreCase);
var ms = reg.Matches(txt);
var results = new List<string>();
foreach (Match match in ms)
{
    results.Add(match.Groups[0].Value); // the whole match, e.g. department_name:womens
    results.Add(match.Groups[1].Value); // the captured value, e.g. womens
}
// results is your final list containing all results
foreach (string elem in results)
{
    Console.WriteLine(elem);
}
Prints:
department_name:womens
womens
item_type_keyword:base-layer-underwear
base-layer-underwear
match.Groups[0].Value gives the entire text matched by the pattern, while match.Groups[1].Value gives the text captured by the first capturing group.
In your original expression, the outer group (department_name:([\w-]+)) captures exactly the same text as the whole match, which is why department_name:womens appears twice (as Groups[0] and Groups[1]). The item_type_keyword groups come back empty because Match() stops after the first match: right after department_name:womens comes a space, so the optional (item_type_keyword:...)? group matches nothing.
Once you have the different elements, you can put them in an array or list for further processing.
The loop lets you iterate over every match, which you can't really do with an if and .Match(): Match() is suited to a single match, whereas Matches() returns all of them, so neither the order of the keys nor how many of them appear matters.
(?:
department_name # Match department_name
| # Or
item_type_keyword # Match item_type_keyword
)
:
([\w-]+) # Capture \w and - characters
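If you'd rather end up with each key paired with its value instead of a flat list, one option is to make the key group capturing as well and fill a dictionary. A minimal sketch; QueryParser is a hypothetical name, and it assumes each key appears at most once (a repeated key simply overwrites the earlier value).

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QueryParser
{
    // Group 1 = key, group 2 = value; keys may appear in any order, or not at all.
    private static readonly Regex Pair =
        new Regex(@"(department_name|item_type_keyword):([\w-]+)");

    public static Dictionary<string, string> Parse(string query)
    {
        var result = new Dictionary<string, string>();
        foreach (Match m in Pair.Matches(query))
            result[m.Groups[1].Value] = m.Groups[2].Value; // a later duplicate overwrites
        return result;
    }
}
```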
It's better to use the alternation (logical OR) operator |, because we don't know in which order the keys appear in the input string.
(department_name:([\w-]+))|(item_type_keyword:([\w-]+))
String input = @"department_name:womens AND item_type_keyword:base-layer-underwear";
Regex rgx = new Regex(@"(?:(department_name:([\w-]+))|(item_type_keyword:([\w-]+)))");
foreach (Match m in rgx.Matches(input))
{
    Console.WriteLine(m.Groups[1].Value);
    Console.WriteLine(m.Groups[2].Value);
    Console.WriteLine(m.Groups[3].Value);
    Console.WriteLine(m.Groups[4].Value);
}
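With the alternation, each match fills either groups 1 and 2 or groups 3 and 4, and the other pair stays empty, which is why the loop above also prints blank lines. One way to tell which branch matched is Group.Success. A minimal sketch; AlternationDemo and the dept=/item= labels are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AlternationDemo
{
    private static readonly Regex Rgx =
        new Regex(@"(department_name:([\w-]+))|(item_type_keyword:([\w-]+))");

    // Labels each match according to the branch of the alternation that produced it.
    public static List<string> Describe(string input)
    {
        var lines = new List<string>();
        foreach (Match m in Rgx.Matches(input))
        {
            if (m.Groups[2].Success)        // first branch matched
                lines.Add("dept=" + m.Groups[2].Value);
            else if (m.Groups[4].Success)   // second branch matched
                lines.Add("item=" + m.Groups[4].Value);
        }
        return lines;
    }
}
```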
Another idea using a lookahead for capturing and getting all groups in one match:
^(?!$)(?=.*(department_name:([\w-]+))|)(?=.*(item_type_keyword:([\w-]+))|)
as a .NET String
"^(?!$)(?=.*(department_name:([\\w-]+))|)(?=.*(item_type_keyword:([\\w-]+))|)"
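You can test it at regexplanet (click on .NET) or at regex101.com.
(Add the m (multiline) modifier if the input spans multiple lines: "(?m)^...". Put (?m) before the ^, because an inline option only affects the part of the pattern that follows it.)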
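Because .NET keeps the captures made inside a positive lookahead, a single Regex.Match with this pattern fills all four groups regardless of the order of the keys. A minimal sketch that reads groups 2 and 4 and uses Group.Success to detect a missing key; LookaheadQuery is a hypothetical name.

```csharp
using System.Text.RegularExpressions;

public static class LookaheadQuery
{
    private static readonly Regex Rx = new Regex(
        @"^(?!$)(?=.*(department_name:([\w-]+))|)(?=.*(item_type_keyword:([\w-]+))|)");

    // Returns (department, itemType); a value is null when that key is absent.
    public static (string Department, string ItemType) Parse(string query)
    {
        Match m = Rx.Match(query);
        // A group whose lookahead took the empty alternative has Success == false.
        string dept = m.Groups[2].Success ? m.Groups[2].Value : null;
        string item = m.Groups[4].Success ? m.Groups[4].Value : null;
        return (dept, item);
    }
}
```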
If the parts are always separated by AND (or OR, etc.), you can use:
(department_name:(.*?)) AND (item_type_keyword:(.*?)$)
•1: department_name:womens
•2: womens
•3: item_type_keyword:base-layer-underwear
•4: base-layer-underwear
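Note that this pattern assumes both keys are present and that department_name comes first. A small sketch (FixedOrderQuery is a hypothetical name) that returns the two values for the sample string and null when the order is reversed:

```csharp
using System.Text.RegularExpressions;

public static class FixedOrderQuery
{
    private const string Pattern = @"(department_name:(.*?)) AND (item_type_keyword:(.*?)$)";

    // Returns { group 2, group 4 }, or null when the string is not in the expected shape.
    public static string[] Parse(string query)
    {
        Match m = Regex.Match(query, Pattern);
        return m.Success ? new[] { m.Groups[2].Value, m.Groups[4].Value } : null;
    }
}
```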
(?=(department_name:\w+)).*?:([\w-]+)|(?=(item_type_keyword:.*)$).*?:([\w-]+)
Try this. It uses a lookahead to capture, then backtracks and captures again. See the demo:
http://regex101.com/r/lS5tT3/52

Regex to match and return group names

I need to match the following strings and returns the values as groups:
abctic
abctac
xyztic
xyztac
ghhtic
ghhtac
The pattern I wrote, with grouping, is as follows:
(?<arch>[abc,xyz,ghh])(?<flavor>[tic,tac]$)
The above returns only parts of the group values (meaning the match is not correct).
If I use * in each subpattern instead of $ at the end, the groups are correct, but then abcticff would also match.
What should the correct regex be?
Your pattern is incorrect because square brackets define a character class, which matches exactly one character from the set: [abc,xyz,ghh] matches a single a, b, c, comma, x, y, z, g or h. To specify alternative strings, use the pipe symbol | instead.
Your pattern should be: ^(?<arch>abc|xyz|ghh)(?<flavor>tic|tac)$
The ^ and $ metacharacters ensure the whole string is matched from start to end. If you need to match text inside a larger string, you could replace them with \b to match on a word boundary.
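To illustrate the \b variant, here is a minimal sketch that picks the valid tokens out of a longer string while still rejecting abcticff; ArchFlavor and the sample text are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ArchFlavor
{
    // \b instead of ^/$: the token must start and end at a word boundary,
    // so "abcticff" is rejected because "ff" continues the word.
    private const string Pattern = @"\b(?<arch>abc|xyz|ghh)(?<flavor>tic|tac)\b";

    // Returns "arch/flavor" for every valid token found in the text.
    public static List<string> Find(string text)
    {
        var found = new List<string>();
        foreach (Match m in Regex.Matches(text, Pattern))
            found.Add(m.Groups["arch"].Value + "/" + m.Groups["flavor"].Value);
        return found;
    }
}
```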
Try this approach:
string[] inputs = { "abctic", "abctac", "xyztic", "xyztac", "ghhtic", "ghhtac" };
string pattern = @"^(?<arch>abc|xyz|ghh)(?<flavor>tic|tac)$";
foreach (var input in inputs)
{
    var match = Regex.Match(input, pattern);
    if (match.Success)
    {
        Console.WriteLine("Arch: {0} - Flavor: {1}",
            match.Groups["arch"].Value,
            match.Groups["flavor"].Value);
    }
    else
        Console.WriteLine("No match for: " + input);
}
