Regex that returns a list

I have a string that I am looking up that can have two possible values:
stuff 1
grouped stuff 1-3
I am not very familiar with regex, but I know it can be very powerful when used correctly, so forgive me if this question sounds ridiculous in any way. Is it possible to write a regex that leaves only the numbers in my string (in this case 1 and 1-3)? For the 1-3 case, I would like to get the 1 and the 3 separately so I can pass them into a function that produces the numbers in between.
I hope I am making sense. It is hard to put what I am looking for into words. If anyone needs any further clarification I would be more than happy to answer questions/edit my own question.

To create a list of the numbers in string y, use the following (it needs using System.Linq and using System.Text.RegularExpressions):
var listOfNumbers = Regex.Matches(y, @"\d+")
    .OfType<Match>()
    .Select(m => m.Value)
    .ToList();

This is entirely possible, but best done with two separate regexes, say SingleRegex and RangedRegex. Check for one or the other, and pass the result into a function when RangedRegex is the one that matched.
As long as you're checking for "numbers in a specific place", extra numbers won't confuse your algorithm. There are also several regex testers out there; a simple Google search will give you an interface for checking syntax and matches.
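As a rough sketch of the idea, the single and ranged cases can also be covered by one pattern with an optional second group. The class name RangeParser, the method Expand, and the assumption that the number (or range) sits at the end of the string are all illustrative, not part of the question:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RangeParser
{
    // Matches "N" or "N-M" at the end of the string (assumed position).
    private static readonly Regex Pattern = new Regex(@"(\d+)(?:-(\d+))?\s*$");

    // Returns every number from N to M inclusive, or just N for a single value.
    public static List<int> Expand(string text)
    {
        Match m = Pattern.Match(text);
        if (!m.Success)
        {
            return new List<int>();
        }

        int start = int.Parse(m.Groups[1].Value);
        // The second group only succeeds when a "-M" part is present.
        int end = m.Groups[2].Success ? int.Parse(m.Groups[2].Value) : start;
        if (end < start)
        {
            return new List<int>();
        }

        return Enumerable.Range(start, end - start + 1).ToList();
    }
}
```

With this sketch, RangeParser.Expand("stuff 1") gives a list containing only 1, and RangeParser.Expand("grouped stuff 1-3") gives 1, 2 and 3.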

Are you just wanting to loop through all of the numbers in the string?
Here's one way you can loop through each match of a regular expression.
using System;
using System.Text.RegularExpressions;
public class Program
{
public static void Main()
{
Regex r = new Regex(@"\d+");
string s = "grouped stuff 1-3";
Match m = r.Match(s);
while(m.Success)
{
string matchText = m.Groups[0].Value;
Console.WriteLine(matchText);
m = m.NextMatch();
}
}
}
This outputs
1
3

Related

Substitutions in Regular Expressions, and Replacement pattern

I have spent 4 hours on this and it is still not clear to me how it should work.
I want to use the logic from this link. I want to transform
Some123Grouping TO GroupingSome123
I have 3 parts and need to change their order using a replacement ($1, $2, $3).
I also need something to transform
name@gmail.com TO name
It is not clear to me how to define the replacement, or what is captured in my case.
Thanks for the help, I would really appreciate it.

$1, $2, etc. refer to groups by index, i.e. the order in which they are declared. So you need to define groups in your capturing regex, which you do with parentheses. For example:
Regex.Replace("Some123Grouping", @"(Some)(123)(Grouping)", @"$3$1$2")
yields "GroupingSome123".
Note that for better readability, groups can also be named and then referenced by their name. For example:
Regex.Replace("mr.smith@gmail.com", @"(?<name>.*)(@gmail.com)", @"${name}")
yields "mr.smith".
BTW, if you are looking for a general (non .NET specific but great) introduction to Regexes, I recommend Regular-Expressions.info.

Taking your requirement literally yields
Regex.Replace("name@gmail.com", @"(name)(@gmail.com)", @"$1")
but I suspect what you want is more along the lines of
Regex.Replace("name@gmail.com", @"(\w*)(@.*)", @"$1")

If I understood correctly:
The pattern is text followed by numbers followed by text. If that is correct, this should match it:
string pattern = @"([A-Za-z]+)(\d+)([A-Za-z]+)";
The next step is getting the groups out of it:
Regex rx = new Regex(pattern);
var match = rx.Match(input);
Then your result may be obtained in 2 ways. The short version:
string result = rx.Replace(input, "$3$1$2");
And the long version:
using System;
using System.Text.RegularExpressions;
public class Program
{
    public static void Main()
    {
        string input = "Some123Grouping";
        string pattern = @"([A-Za-z]+)(\d+)([A-Za-z]+)";
        Regex rx = new Regex(pattern);
        var match = rx.Match(input);
        // Groups[0] is the whole match, so subtract it to count the captured parts.
        Console.WriteLine("{0} groups captured in:\n {1}",
            match.Groups.Count - 1,
            input);
        // Reassemble in the order 3, 1, 2 to get "GroupingSome123".
        var newInput = match.Groups[3].Value + match.Groups[1].Value + match.Groups[2].Value;
        Console.WriteLine(newInput);
    }
}
Regarding your second issue, it seems to be as simple as:
var result = "name@gmail.com".Split('@')[0];

RegEx for split text on string .NET

I found this answer to my question, but it is for PHP. Is there an equivalent for .NET? I know about the Split method, but I don't understand how to keep the text outside my <#any_text#> tags, and I need to use a regular expression (it is a condition of the task).
For example:
string: aaa<#bbb#>aaa<#bb#>c
list: aaa
<#bbb#>
aaa
<#bb#>
c

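Here is a passing test. It wasn't hard to find on the web, and it would definitely be faster and better for you to first try finding a solution yourself and experimenting with some code before asking a question. That way you will actually learn something. Because the pattern is wrapped in a capturing group, Regex.Split keeps the matched tags in the result alongside the text between them.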
[TestMethod]
public void TestMethod1()
{
string source = "aaa<#bbb#>aaa<#bb#>c";
Regex r = new Regex("(<#.+?#>)");
string[] result = r.Split(source);
Assert.AreEqual(5, result.Length);
}

string input = @"aaa<#bbb#>aaa<#bb#>c";
var list = Regex.Matches(input, @"\<.+?\>|[^\<].+?[^\>]|.+?")
.Cast<Match>()
.Select(m => m.Value)
.ToList();

C# Regular Expression to return only the numbers

Let's say I have the following within my source code, and I want to return only the numbers within the string:
The source comes from a website, just FYI, and I already have it parsed out so that it comes into the program, but I still need to parse these numbers to return what I want. I'm just having a doozy of a time trying to figure it out, though :(
like: 13|100|0;
How could I write this regex?
var cData = new Array(
"g;13|g;100|g;0",
"g;40|g;100|g;1.37",
"h;43|h;100|h;0",
"h;27|h;100|h;0",
"i;34|i;100|i;0",
"i;39|i;100|i;0",
);

Not sure you actually need regex here.
var str = "g;13|g;100|g;0";
str = str.Replace("g;", "");
would give you "13|100|0".
Or a slight improvement on spinon's answer:
// \- is included in case numbers can be negative. Leave it out if not.
// The verbatim string (@) is needed so that \| \. \- reach the regex engine unchanged.
Regex.Replace("g;13|g;100|g;0", @"[^0-9\|\.\-]", "");
Or an option using split and join:
String.Join("|", "g;13|g;100|g;0".Split('|').Select(pipe => pipe.Split(';')[1]));

I would use something like this so you keep only the digits and the | separator (add . to the character class if values such as 1.37 must keep their decimal point):
Regex.Replace("g;13|g;100|g;0", "[^0-9|]", "");

Regex might be overkill in this case. Given the uniform delimiting of | and ; I would recommend String.Split(). Then you could either split again or use String.Replace() to get rid of the extra chars (i.e. g;).
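A minimal sketch of that Split-based approach, assuming every field has the form letter;number as in the sample data. The names FieldParser and ParseNumbers are made up for illustration, and parsing to decimal with the invariant culture is an assumption so that 1.37 is read the same way on any machine:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class FieldParser
{
    // Splits "g;13|g;100|g;0" on '|', then takes the part after ';' in each field.
    public static List<decimal> ParseNumbers(string row)
    {
        return row.Split('|')
                  .Select(field => field.Split(';')[1])
                  .Select(text => decimal.Parse(text, CultureInfo.InvariantCulture))
                  .ToList();
    }
}
```

Calling FieldParser.ParseNumbers("g;40|g;100|g;1.37") returns 40, 100 and 1.37. A field without a ; would throw, so real input may need validation first.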

It looks like you have a number of solutions, but I'll throw in one more where you iterate over each match and read the captured group to get the number out, if you want.
Regex regexObj = new Regex(@"\w;([\d.]+)\|?");
Match matchResults = regexObj.Match("g;13|g;100|g;0");
// Each match covers one "letter;number" field; NextMatch() advances to the next field.
while (matchResults.Success)
{
    Group groupObj = matchResults.Groups[1];
    if (groupObj.Success)
    {
        // groupObj.Value will be the number you want
    }
    matchResults = matchResults.NextMatch();
}
I hope this helps.

string "search and replace" using a .NET regex

I need to do a 2 rule "replace" -- my rules are, replace all open parens, "(" with a hyphen "-" and strip out all closing parens ")".
So for example this:
"foobar(baz2)" would become
"foobar-baz2"
I currently do it like this, but my hunch is that a regex would be cleaner.
myString.Replace("(", "-").Replace(")", "");

I wouldn't go to RegEx for this - what you're doing is just right. It's clear and straightforward ... regular expressions are unlikely to make this any simpler or clearer. You would still need to make two calls to Replace because your substitutions are different for each case.

You CAN use one regex to replace both those occurrences in one line, but it would be less 'forgiving' than two single rule string replacements.
Example:
The code that would be used to do what you want with regex would be:
Regex.Replace(myString, @"([^\(]*?)\(([^\)]*?)\)", "$1-$2");
This would work fine for EXACTLY the example that you provided. If there was the slightest change in where, and how many '(' and ')' characters there are, the regex would break. You could then fix that with more regex, but it would just get uglier and uglier from there.
Regex is an awesome choice, however, for applications that are more rigid.
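To make the brittleness concrete, here is a small comparison sketch. The class ParenDemo and its two methods are illustrative names; the regex is the one from this answer and the chained Replace is the one from the question:

```csharp
using System;
using System.Text.RegularExpressions;

public static class ParenDemo
{
    // Single-regex version: only rewrites a "(" that is followed by a matching ")".
    public static string WithRegex(string s)
    {
        return Regex.Replace(s, @"([^\(]*?)\(([^\)]*?)\)", "$1-$2");
    }

    // Chained version: treats each parenthesis independently.
    public static string WithReplace(string s)
    {
        return s.Replace("(", "-").Replace(")", "");
    }
}
```

Both return "foobar-baz2" for "foobar(baz2)". For the unbalanced input "foo(bar", the regex finds no match and returns the string unchanged, while the chained Replace still produces "foo-bar".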

Jamie Zawinski suddenly comes to my mind:
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
So I also think LBushkin is right in this case. Your solution works and is readable.

Nope. This is perfectly clean.
Point is, you'd have to have two regexes anyway, because your substitution strings are different.

I'd say use what you have - it's more-easily readable/maintainable. Regexes are super powerful but also sometimes super confusing. For something this simple, I'd say don't even use Regexes.

I'd think a regex is going to be kind of brittle for this kind of thing. If your version of .NET has extension methods and you'd like a cleaner syntax that scales you might introduce an extension method like this:
public static class StringExtensions
{
public static string ReplaceMany(this string s, Dictionary<string, string> replacements)
{
var sb = new StringBuilder(s);
foreach (var replacement in replacements)
{
sb = sb.Replace(replacement.Key, replacement.Value);
}
return sb.ToString();
}
}
So now you build up your dictionary of replacements...
var replacements = new Dictionary<string, string> { {"(", "-"}, {")", ""} };
And call ReplaceMany:
var result = "foobar(baz2)".ReplaceMany(replacements); // result = foobar-baz2
If you really want to show your intent you can alias Dictionary<string,string> to StringReplacements:
//At the top
using StringReplacements = System.Collections.Generic.Dictionary<string,string>;
//In your function
var replacements = new StringReplacements() { {"(", "-"}, {")", ""} };
var result = "foobar(baz2)".ReplaceMany(replacements);
Might be overkill for only two replacements, but if you have many to make it'll be cleaner than .Replace().Replace().Replace().Replace()....

Regex is overkill for such a simple scenario. What you have is perfect. Although your question has already been answered, I wanted to post to demonstrate that one regex pattern is sufficient:
string input = "foobar(baz2)";
string pattern = "([()])";
string result = Regex.Replace(input, pattern, m => m.Value == "(" ? "-" : "");
Console.WriteLine(result);
The idea is to capture the parentheses in a group. I used [()], which is a character class that'll match what we're after. Notice that inside a character class they don't need to be escaped. Alternatively, the pattern could've been @"(\(|\))", in which case escaping is necessary.
Next, the Replace method uses a MatchEvaluator and we check whether the captured value is an opening ( or not. If it is, a - is returned. If not we know, based on our limited pattern, that it must be a closing ) and we return an empty string.

Here's a fun LINQ-based solution to the problem. It may not be the best option, but it's an interesting one anyways:
public string SearchAndReplace(string input)
{
var openParen = '(';
var closeParen = ')';
var hyphen = '-';
var newChars = input
.Where(c => c != closeParen)
.Select(c => c == openParen ? hyphen : c);
return new string(newChars.ToArray());
}
2 interesting notes about this implementation:
It requires no complicated regex, so you get better performance and easier maintenance.
Unlike string.Replace implementations, this method allocates exactly 1 string.
Not bad!

How can I get a regex match to only be added once to the matches collection?

I have a string which has several html comments in it. I need to count the unique matches of an expression.
For example, the string might be:
var teststring = "<!--X1-->Hi<!--X1-->there<!--X2-->";
I currently use this to get the matches:
var regex = new Regex("<!--X.-->");
var matches = regex.Matches(teststring);
The results of this is 3 matches. However, I would like to have this be only 2 matches since there are only two unique matches.
I know I can probably loop through the resulting MatchCollection and remove the extra Match, but I'm hoping there is a more elegant solution.
Clarification: The sample string is greatly simplified from what is actually being used. There can easily be an X8 or X9, and there are likely dozens of each in the string.

I would just use the Enumerable.Distinct Method for example like this:
string subjectString = "<!--X1-->Hi<!--X1-->there<!--X2--><!--X1-->Hi<!--X1-->there<!--X2-->";
var regex = new Regex(@"<!--X\d-->");
var matches = regex.Matches(subjectString);
var uniqueMatches = matches
.OfType<Match>()
.Select(m => m.Value)
.Distinct();
uniqueMatches.ToList().ForEach(Console.WriteLine);
Outputs this:
<!--X1-->
<!--X2-->
For regular expression, you could maybe use this one?
(<!--X\d-->)(?!.*\1.*)
Seems to work on your test string in RegexBuddy at least =)
// (<!--X\d-->)(?!.*\1.*)
//
// Options: dot matches newline
//
// Match the regular expression below and capture its match into backreference number 1 «(<!--X\d-->)»
// Match the characters “<!--X” literally «<!--X»
// Match a single digit 0..9 «\d»
// Match the characters “-->” literally «-->»
// Assert that it is impossible to match the regex below starting at this position (negative lookahead) «(?!.*\1.*)»
// Match any single character «.*»
// Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
// Match the same text as most recently matched by capturing group number 1 «\1»
// Match any single character «.*»
// Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
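A hedged sketch of how that lookahead pattern might be used from C#. LastOccurrence and Find are hypothetical names, the trailing .* inside the lookahead is dropped because it does not change what matches, and RegexOptions.Singleline stands in for the "dot matches newline" option listed above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class LastOccurrence
{
    // Keeps a comment only when the same comment does not appear again later,
    // so each distinct comment is matched exactly once (at its last position).
    public static List<string> Find(string html)
    {
        var regex = new Regex(@"(<!--X\d-->)(?!.*\1)", RegexOptions.Singleline);
        return regex.Matches(html)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToList();
    }
}
```

For the question's test string this returns two matches. Note that the lookahead rescans the rest of the input for every candidate, so on long strings with dozens of comments the Distinct approach is likely the cheaper one.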

It appears you're doing two different things:
Matching comments like /<!--X.-->/
Finding the set of unique comments
So it is fairly logical to handle these as two different steps:
var regex = new Regex("<!--X.-->");
var matches = regex.Matches(teststring);
var uniqueMatches = matches.Cast<Match>().Distinct(new MatchComparer());
class MatchComparer : IEqualityComparer<Match>
{
public bool Equals(Match a, Match b)
{
return a.Value == b.Value;
}
public int GetHashCode(Match match)
{
return match.Value.GetHashCode();
}
}

Extract the comments and store them in an array. Then you can filter out the unique values.
But I don’t know how to implement this in C#.
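One possible way to do this in C# is to collect the matched comments into a HashSet, which discards duplicates on its own. This is only a sketch: CommentCounter and CountUnique are illustrative names, and the pattern is the one from the question:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CommentCounter
{
    // Extracts every comment matching <!--X.--> and counts the distinct ones.
    public static int CountUnique(string html)
    {
        var seen = new HashSet<string>();
        foreach (Match m in Regex.Matches(html, "<!--X.-->"))
        {
            // Add is a no-op when the value is already in the set.
            seen.Add(m.Value);
        }
        return seen.Count;
    }
}
```

For the sample string in the question, CommentCounter.CountUnique(teststring) returns 2.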

Depending on how many Xn's you have you might be able to use:
(\<!--X1--\>){1}.*(\<!--X2--\>){1}
That will only match each occurrence of the X1, X2 etc. once provided they are in order.

Capture the inner portion of the comment as a group, then put those strings into a hashtable (dictionary). Then ask the dictionary for its count, since it weeds out repeats by itself.
var teststring = "<!--X1-->Hi<!--X1-->there<!--X2-->";
var tokens = new Dictionary<string, string>();
Regex.Replace(teststring, @"<!--(.*?)-->",
match => {
tokens[match.Groups[1].Value] = match.Groups[1].Value;
return "";
});
var uniques = tokens.Keys.Count;
Using Regex.Replace gives you a lambda that is called on each match. Since you are not interested in the replaced string, you don't assign the result to anything. The pattern uses the lazy .*? so that each comment is matched separately; a greedy .* would swallow everything from the first <!-- to the last -->.
You must use Groups[1] because Groups[0] is the entire match.
The same value is stored as both key and value only to make it easy to put into the dictionary, which stores unique keys.

If you want a distinct Match list from a MatchCollection without converting to string, you can use something like this:
var distinctMatches = matchList.OfType<Match>().GroupBy(x => x.Value).Select(x =>x.First()).ToList();
I know it has been 12 years but sometimes we need this kind of solutions, so I wanted to share. C# evolved, .NET evolved, so it's easier now.
