Regex regular expression

Regex regular expression - c#

I'm programming a calculator in C# to begin with that.
I will separate a string in two variables= Nb1 and Nb2. I looked on the web for examples and I found something :
var numAlpha = new Regex("(?<Alpha>[a-zA-Z]*)(?<Numeric>[0-9]*)");
var match = numAlpha.Match("codename123");
var alpha = match.Groups["Alpha"].Value; // Alpha = codename
var num = match.Groups["Numeric"].Value; // Numeric = 123
I fails to only adapt for the numbers : " 121165468746*1132" or "4586/6953"
Nb1 =121165468746 || 4586
Nb2 =1132 || 6953
Can you help me ? I'm going crazy :-)

var numAlpha = new Regex("(?<NumOne>[0-9]+)(?<Operator>[^0-9])(?<NumTwo>[0-9]+)");
var match = numAlpha.Match("121165468746*1132");
var nb1 = match.Groups["NumOne"].Value; // nb1 = 121165468746
var nb2 = match.Groups["NumTwo"].Value; // nb2 = 1132
var op = match.Groups["Operator"].Value; // op = *
It looks like what you're trying to do is match some pair of integers separated by an operator. The above regex uses named groups (?<GroupName> ... ), and two simple regular expressions to achieve that. [0-9]+ will match one or more digits, and [^0-9] will match any one non-digit character, which here is being assumed to be the operator.
If this isn't what you were looking for, leave a comment and I'll try to help you out. In the mean time, some reading material:
Regular Expressions Page, with plenty of tutorials and references
Javascript Regular Expressions Tester with Syntax Highlighting
Java RegEx Tester. More powerful, fewer frills.

Try to use this pattern for your Regex: it assumes that there are at least two numbers with one or more digits, separated by one or more non-digit characters (in case the operator is not only one character). The groups are called n1 and n2.
^(?<n1>\d+)[^\d]+?(?<n2>\d+)$

Use the following to match numbers with predefined 4 basic operations Multiply, Subtract, Add, Divide. You can add more operators to the "op" expressions as per your need.
Regex rg = new Regex(#"(?<num1>[0-9]+)(?<op>[\*\-\+\\])(?<num2>[0-9]+)");

Related

RegEx string between N and (N+1)th Occurance

I am attempting to find nth occurrence of sub string between two special characters. For example.
one|two|three|four|five
Say, I am looking to find string between (n and n+1 th) 2nd and 3rd Occurrence of '|' character, which turns out to be 'three'.I want to do it using RegEx. Could someone guide me ?
My Current Attempt is as follows.
string subtext = "zero|one|two|three|four";
Regex r = new Regex(#"(?:([^|]*)|){3}");
var m = r.Match(subtext).Value;

If you have full access to C# code, you should consider a mere splitting approach:
var idx = 2; // Might be user-defined
var subtext = "zero|one|two|three|four";
var result = subtext.Split('|').ElementAtOrDefault(idx);
Console.WriteLine(result);
// => two
A regex can be used if you have no access to code (if you use some tool that is powered with .NET regex):
^(?:[^|]*\|){2}([^|]*)
See the regex demo. It matches
^ - start of string
(?:[^|]*\|){2} - 2 (or adjust it as you need) or more sequences of:
[^|]* - zero or more chars other than |
\| - a | symbol
([^|]*) - Group 1 (access via .Groups[1]): zero or more chars other than |
C# code to test:
var pat = $#"^(?:[^|]*\|){{{idx}}}([^|]*)";
var m = Regex.Match(subtext, pat);
if (m.Success) {
Console.WriteLine(m.Groups[1].Value);
}
// => two
See the C# demo
If a tool does not let you access captured groups, turn the initial part into a non-consuming lookbehind pattern:
(?<=^(?:[^|]*\|){2})[^|]*
^^^^^^^^^^^^^^^^^^^^
See this regex demo. The (?<=...) positive lookbehind only checks for a pattern presence immediately to the left of the current location, and if the pattern is not matched, the match will fail.

Use this:
(?:.*?\|){n}(.[^|]*)
where n is the number of times you need to skip your special character. The first capturing group will contain the result.
Demo for n = 2

Use this regex and then select the n-th match (in this case 2) from the Matches collection:
string subtext = "zero|one|two|three|four";
Regex r = new Regex("(?<=\|)[^\|]*");
var m = r.Matches(subtext)[2];

Regex to extract data from card swipe

I'm searching for a regular expression that will extract 0263563
from
;010263563=2119?
and 0267829
from
%00000026782904?;010267829=4119?
(Must be the same regular expression).

Start at the 3rd character after the semicolon and take 7 characters.
;..(\d{7})
or more general:
;..(.{7})
Or according to comment : "To clarify characters to take are digits"
;\d\d(\d{7})

Well, you need this:
;\d{2}(\d+)
The First group will contain the number you want.

;[\S]{2}([\S]{7})
Since you said characters and not numbers, but it would work either way

For rules that are that precise (start 3 chars after ;, then next 7), you could use a plain substring:
string s = "%00000026782904?;010267829=4119?";
var pos = s.IndexOf(';');
var number = s.Substring(pos+3, 7);
And of course, test whether that IndexOf really found the ;

The below regex would exactly 7 digits which must be preceded by a ; , any two characters.
(?<=;.{2})\d{7}
Code:
String input = #";010263563=2119?
%00000026782904?;010267829=4119?";
Regex rgx = new Regex(#"(?<=;.{2})\d{7}");
foreach (Match m in rgx.Matches(input))
Console.WriteLine(m.Groups[0].Value);
IDEONE
Output:
0263563
0267829

Assuming regex is not an absolute requirement, you could use:
var input = "%00000026782904?;010267829=4119?";
// output will be: 0267829
var digits = input.SkipWhile(x => x != ';').Skip(3).Take(7).ToArray();
var output = new string(digits);

Get sub-strings from a string that are enclosed using some specified character

Suppose I have a string
Likes (20)
I want to fetch the sub-string enclosed in round brackets (in above case its 20) from this string. This sub-string can change dynamically at runtime. It might be any other number from 0 to infinity. To achieve this my idea is to use a for loop that traverses the whole string and then when a ( is present, it starts adding the characters to another character array and when ) is encountered, it stops adding the characters and returns the array. But I think this might have poor performance. I know very little about regular expressions, so is there a regular expression solution available or any function that can do that in an efficient way?

If you don't fancy using regex you could use Split:
string foo = "Likes (20)";
string[] arr = foo.Split(new char[]{ '(', ')' }, StringSplitOptions.None);
string count = arr[1];
Count = 20
This will work fine regardless of the number in the brackets ()
e.g:
Likes (242535345)
Will give:
242535345

Works also with pure string methods:
string result = "Likes (20)";
int index = result.IndexOf('(');
if (index >= 0)
{
result = result.Substring(index + 1); // take part behind (
index = result.IndexOf(')');
if (index >= 0)
result = result.Remove(index); // remove part from )
}
Demo

For a strict matching, you can do:
Regex reg = new Regex(#"^Likes\((\d+)\)$");
Match m = reg.Match(yourstring);
this way you'll have all you need in m.Groups[1].Value.
As suggested from I4V, assuming you have only that sequence of digits in the whole string, as in your example, you can use the simpler version:
var res = Regex.Match(str,#"\d+")
and in this canse, you can get the value you are looking for with res.Value
EDIT
In case the value enclosed in brackets is not just numbers, you can just change the \d with something like [\w\d\s] if you want to allow in there alphabetic characters, digits and spaces.

Even with Linq:
var s = "Likes (20)";
var s1 = new string(s.SkipWhile(x => x != '(').Skip(1).TakeWhile(x => x != ')').ToArray());

const string likes = "Likes (20)";
int likesCount = int.Parse(likes.Substring(likes.IndexOf('(') + 1, (likes.Length - likes.IndexOf(')') + 1 )));

Matching when the part in paranthesis is supposed to be a number;
string inputstring="Likes (20)"
Regex reg=new Regex(#"\((\d+)\)")
string num= reg.Match(inputstring).Groups[1].Value
Explanation:
By definition regexp matches a substring, so unless you indicate otherwise the string you are looking for can occur at any place in your string.
\d stand for digits. It will match any single digit.
We want it to potentially be repeated several times, and we want at least one. The + sign is regexp for previous symbol or group repeated 1 or more times.
So \d+ will match one or more digits. It will match 20.
To insure that we get the number that is in paranteses we say that it should be between ( and ). These are special characters in regexp so we need to escape them.
(\d+) would match (20), and we are almost there.
Since we want the part inside the parantheses, and not including the parantheses we tell regexp that the digits part is a single group.
We do that by using parantheses in our regexp. ((\d+)) will still match (20), but now it will note that 20 is a subgroup of this match and we can fetch it by Match.Groups[].
For any string in parantheses things gets a little bit harder.
Regex reg=new Regex(#"\((.+)\)")
Would work for many strings. (the dot matches any character) But if the input is something like "This is an example(parantesis1)(parantesis2)", you would match (parantesis1)(parantesis2) with parantesis1)(parantesis2 as the captured subgroup. This is unlikely to be what you are after.
The solution can be to do the matching for "any character exept a closing paranthesis"
Regex reg=new Regex(#"\(([^\(]+)\)")
This will find (parantesis1) as the first match, with parantesis1 as .Groups[1].
It will still fail for nested paranthesis, but since regular expressions are not the correct tool for nested paranthesis I feel that this case is a bit out of scope.
If you know that the string always starts with "Likes " before the group then Saves solution is better.

How to find repeatable characters

I can't understand how to solve the following problem:
I have input string "aaaabaa" and I'm trying to search for string "aa" (I'm looking for positions of characters)
Expected result is
0 1 2 5
aa aabaa
a aa abaa
aa aa baa
aaaab aa
This problem is already solved by me using another approach (non-RegEx).
But I need a RegEx I'm new to RegEx so google-search can't help me really.
Any help appreciated! Thanks!
P.S.
I've tried to use (aa)* and "\b(\w+(aa))*\w+" but those expressions are wrong

You can solve this by using a lookahead
a(?=a)
will find every "a" that is followed by another "a".
If you want to do this more generally
(\p{L})(?=\1)
This will find every character that is followed by the same character. Every found letter is stored in a capturing group (because of the brackets around), this capturing group is then reused by the positive lookahead assertion (the (?=...)) by using \1 (in \1 there is the matches character stored)
\p{L} is a unicode code point with the category "letter"
Code
String text = "aaaabaa";
Regex reg = new Regex(#"(\p{L})(?=\1)");
MatchCollection result = reg.Matches(text);
foreach (Match item in result) {
Console.WriteLine(item.Index);
}
Output
0
1
2
5

The following code should work with any regular expression without having to change the actual expression:
Regex rx = new Regex("(a)\1"); // or any other word you're looking for.
int position = 0;
string text = "aaaaabbbbccccaaa";
int textLength = text.Length;
Match m = rx.Match(text, position);
while (m != null && m.Success)
{
Console.WriteLine(m.Index);
if (m.Index <= textLength)
{
m = rx.Match(text, m.Index + 1);
}
else
{
m = null;
}
}
Console.ReadKey();
It uses the option to change the start index of a regex search for each consecutive search. The actual problem comes from the fact that the Regex engine, by default, will always continue searching after the previous match. So it will never find a possible match within another match, unless you instruct it to by using a Look ahead construction or by manually setting the start index.
Another, relatively easy, solution is to just stick the whole expression in a forward look ahead:
string expression = "(a)\1"
Regex rx2 = new Regex("(?=" + expression + ")");
MatchCollection ms = rx2.Matches(text);
var indexes = ms.Cast<Match>().Select(match => match.Index);
That way the engine will automatically advance the index by one for every match it finds.
From the docs:
When a match attempt is repeated by calling the NextMatch method, the regular expression engine gives empty matches special treatment. Usually, NextMatch begins the search for the next match exactly where the previous match left off. However, after an empty match, the NextMatch method advances by one character before trying the next match. This behavior guarantees that the regular expression engine will progress through the string. Otherwise, because an empty match does not result in any forward movement, the next match would start in exactly the same place as the previous match, and it would match the same empty string repeatedly.

Try this:
How can I find repeated characters with a regex in Java?
It is in java, but the regex and non-regex way is there. C# Regex is very similar to the Java way.

How can I get a regex match to only be added once to the matches collection?

I have a string which has several html comments in it. I need to count the unique matches of an expression.
For example, the string might be:
var teststring = "<!--X1-->Hi<!--X1-->there<!--X2-->";
I currently use this to get the matches:
var regex = new Regex("<!--X.-->");
var matches = regex.Matches(teststring);
The results of this is 3 matches. However, I would like to have this be only 2 matches since there are only two unique matches.
I know I can probably loop through the resulting MatchCollection and remove the extra Match, but I'm hoping there is a more elegant solution.
Clarification: The sample string is greatly simplified from what is actually being used. There can easily be an X8 or X9, and there are likely dozens of each in the string.

I would just use the Enumerable.Distinct Method for example like this:
string subjectString = "<!--X1-->Hi<!--X1-->there<!--X2--><!--X1-->Hi<!--X1-->there<!--X2-->";
var regex = new Regex(#"<!--X\d-->");
var matches = regex.Matches(subjectString);
var uniqueMatches = matches
.OfType<Match>()
.Select(m => m.Value)
.Distinct();
uniqueMatches.ToList().ForEach(Console.WriteLine);
Outputs this:
<!--X1-->
<!--X2-->
For regular expression, you could maybe use this one?
(<!--X\d-->)(?!.*\1.*)
Seems to work on your test string in RegexBuddy at least =)
// (<!--X\d-->)(?!.*\1.*)
//
// Options: dot matches newline
//
// Match the regular expression below and capture its match into backreference number 1 «(<!--X\d-->)»
// Match the characters “<!--X” literally «<!--X»
// Match a single digit 0..9 «\d»
// Match the characters “-->” literally «-->»
// Assert that it is impossible to match the regex below starting at this position (negative lookahead) «(?!.*\1.*)»
// Match any single character «.*»
// Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
// Match the same text as most recently matched by capturing group number 1 «\1»
// Match any single character «.*»
// Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»

It appears you're doing two different things:
Matching comments like /<-- X. -->/
Finding the set of unique comments
So it is fairly logical to handle these as two different steps:
var regex = new Regex("<!--X.-->");
var matches = regex.Matches(teststring);
var uniqueMatches = matches.Cast<Match>().Distinct(new MatchComparer());
class MatchComparer : IEqualityComparer<Match>
{
public bool Equals(Match a, Match b)
{
return a.Value == b.Value;
}
public int GetHashCode(Match match)
{
return match.Value.GetHashCode();
}
}

Extract the comments and store them in an array. Then you can filter out the unique values.
But I don’t know how to implement this in C#.

Depending on how many Xn's you have you might be able to use:
(\<!--X1--\>){1}.*(\<!--X2--\>){1}
That will only match each occurrence of the X1, X2 etc. once provided they are in order.

Capture the inner portion of the comment as a group. Then put those strings into a hashtable(dictionary). Then ask the dictionary for its count, since it will self weed out repeats.
var teststring = "<!--X1-->Hi<!--X1-->there<!--X2-->";
var tokens = new Dicationary<string, string>();
Regex.Replace(teststring, #"<!--(.*)-->",
match => {
tokens[match.Groups[1].Value] = match.Groups[1].Valuel;
return "";
});
var uniques = tokens.Keys.Count;
By using the Regex.Replace construct you get to have a lambda called on each match. Since you are not interested in the replace, you don't set it equal to anything.
You must use Group[1] because group[0] is the entire match.
I'm only repeating the same thing on both sides, so that its easier to put into the dictionary, which only stores unique keys.

If you want a distinct Match list from a MatchCollection without converting to string, you can use something like this:
var distinctMatches = matchList.OfType<Match>().GroupBy(x => x.Value).Select(x =>x.First()).ToList();
I know it has been 12 years but sometimes we need this kind of solutions, so I wanted to share. C# evolved, .NET evolved, so it's easier now.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Regex regular expression - c#

Try to use this pattern for your Regex: it assumes that there are at least two numbers with one or more digits, separated by one or more non-digit characters (in case the operator is not only one character). The groups are called n1 and n2. ^(?<n1>\d+)[^\d]+?(?<n2>\d+)$

Use the following to match numbers with predefined 4 basic operations Multiply, Subtract, Add, Divide. You can add more operators to the "op" expressions as per your need. Regex rg = new Regex(#"(?<num1>[0-9]+)(?<op>[\*\-\+\\])(?<num2>[0-9]+)");

Related

RegEx string between N and (N+1)th Occurance

Regex to extract data from card swipe

Get sub-strings from a string that are enclosed using some specified character

How to find repeatable characters

How can I get a regex match to only be added once to the matches collection?

Categories

Resources