RegExp multiple matches in text - C#

I want to write a regexp that finds every occurrence of a leading character followed by three digits. Some valid examples:
A123,
V322,
R333.
I tried something like this:
[a-zA-Z](1)\d3
but it gets me just the first match!
Could you show me how to rewrite this regexp to get multiple results? Thank you so much and have a nice day!

Your regex does not work because it matches:
[a-zA-Z] - an ASCII letter, then
(1) - the digit 1 (and puts it into a capture group)
\d - any one digit
3 - the digit 3.
So, it matches Y193, E103, etc., even inside longer words, where Y and E are not the first letters.
You need to use a word boundary and fix your pattern as
\b[a-zA-Z][0-9]{3}
NOTE: if you need to match it as a whole word, add \b at the end: \b[a-zA-Z][0-9]{3}\b.
See the regex demo.
Details:
\b - leading word boundary
[a-zA-Z] - an ASCII letter
[0-9]{3} - 3 digits.
C# code:
var results = Regex.Matches(s, @"\b[a-zA-Z][0-9]{3}")
.Cast<Match>()
.Select(m => m.Value)
.ToList();
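For reference, here is a minimal self-contained sketch of the approach above. The class name CodeFinder, the method name FindCodes and the wholeWord parameter are made up for illustration; the parameter only switches on the trailing \b mentioned in the note.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class CodeFinder
{
    // Returns every "letter + 3 digits" token that starts at a word boundary.
    // wholeWord = true also requires a word boundary after the third digit.
    public static List<string> FindCodes(string input, bool wholeWord = false)
    {
        string pattern = wholeWord
            ? @"\b[a-zA-Z][0-9]{3}\b"
            : @"\b[a-zA-Z][0-9]{3}";

        // Regex.Matches returns all non-overlapping matches, not just the first.
        return Regex.Matches(input, pattern)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();
    }
}
```

With the input "A123, V322 and R333. xB456 C7890", the default call returns A123, V322, R333 and C789, while wholeWord = true drops C789 because a fourth digit follows it. B456 is skipped in both cases since B is not at a word boundary.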


How can I find a number in a given format with Regex?

My text is 0.123.456Vaaa.789.V
I want to find 123.456V
I am using this pattern in C#: \.[0-9]*[\.]?[0-9]*V
But it returns 2 values: 123.456V and 789.V
I don't want the case with nothing after the ".", i.e. 789.V
How can I fix my pattern?
Thank you.
In your pattern, [\.]? does not need to be a separate character class, and inside a character class the dot does not need to be escaped. I suggest writing the optional dot as \.?, which is the least ambiguous form. The [0-9]* after the optional dot matches zero or more digits, so it can match nothing at all, hence the unexpected 789.V match.
You do not seem to need the \. at the start, either.
You can use
[0-9]*\.?[0-9]+V
See the .NET regex demo.
Details:
[0-9]* - zero or more ASCII digits
\.? - an optional .
[0-9]+ - one or more digits
V - a V char.
See a C# regex demo:
var results = Regex.Matches(text, @"[0-9]*\.?[0-9]+V")
.Cast<Match>()
.Select(x => x.Value)
.ToList();
// => 123.456V
I think the simplest solution would be:
\d+\.\d+V
meaning you want to find some arbitrary number of digits, followed by a dot, followed by more digits, followed by the letter V.

C# equivalent for this regex pattern

I have this regular expression pattern: .{2}\@.{2}\K|\..*(*SKIP)(?!)|.(?=.*\.)
It works perfectly; replacing the matches gives
trabc@abtrec.com.lo => ***bc@ab*****.com.lo
demomail@demodomain.com => ******il@de*********.com
But when I try to use it in C#, \K, (*SKIP) and (*F) are not allowed.
What would be the C# version of this pattern? Or do you know a simpler way to mask the email without the unsupported constructs?
UPDATE:
(*SKIP): this verb causes the match to fail at the current starting position in the subject if the rest of the pattern does not match
(*F): forces a matching failure at the given position in the pattern (the same as (?!))
Try this regex:
\w(?=.{2,}@)|(?<=@[^\.]{2,})\w
Explanation:
\w - matches a word character
(?=.{2,}@) - positive lookahead to find the position immediately followed by 2+ occurrences of any character followed by @
| - OR
(?<=@[^\.]{2,}) - positive lookbehind to find the position immediately preceded by @ followed by 2+ occurrences of any character that is not a .
\w - matches a word character.
Replace each match with a *
You can achieve the same result with a regex that matches each masked part as one block, and a custom match evaluator:
var res = Regex.Replace(
s
, @"^.*(?=.{2}\@.{2})|(?<=.{2}\@.{2}).*(?=\.com.*$)"
, match => new string('*', match.ToString().Length)
);
The regex has two parts:
The one on the left, ^.*(?=.{2}\@.{2}), matches the user name portion except the last two characters
The one on the right, (?<=.{2}\@.{2}).*(?=\.com.*$), matches the domain after its first two characters, up to the ".com..." ending.
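A minimal sketch of the "replace each match with a *" suggestion from the first answer, assuming ordinary addresses with a single @ separator. The class name EmailMasker and method name Mask are invented for this example. It relies on .NET supporting variable-length lookbehind.

```csharp
using System.Text.RegularExpressions;

public static class EmailMasker
{
    // Masks every word character that is either
    //  - followed by at least two characters and then an @ (user name part), or
    //  - preceded by an @ plus at least two non-dot characters (first domain label).
    public static string Mask(string email)
    {
        return Regex.Replace(email, @"\w(?=.{2,}@)|(?<=@[^\.]{2,})\w", "*");
    }
}
```

Each masked character becomes exactly one asterisk here, so Mask("trabc@abtrec.com.lo") gives ***bc@ab****.com.lo. That is one asterisk fewer after the @ than in the question's sample output.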

Split String by Regex Expression

This is my string.
19282511~2017-08-28 13:24:28~Entering (A/B)~1013~283264/89282511~2017-08-28 13:24:28~Entering (A/B)~1013~283266/79282511~2017-08-28 13:24:28~Entering (A/B)~1013~283261
I would like this string to be split like below:
19282511~2017-08-28 13:24:28~Entering (A/B)~1013~283264
89282511~2017-08-28 13:24:28~Entering (A/B)~1013~283266
79282511~2017-08-28 13:24:28~Entering (A/B)~1013~283261
I cannot split my string blindly by slash (/), since the value A/B would also get split.
Any idea how to do this with a regex?
Your help will definitely be appreciated.
You may split with / that is in between digits:
(?<=\d)/(?=\d)
See the regex demo
Details
(?<=\d) - a positive lookbehind that requires a digit to appear immediately to the left of the current location
/ - a / char
(?=\d) - a positive lookahead that requires a digit to appear immediately to the right of the current location.
Since the \d pattern is inside non-consuming patterns, only / will be removed upon splitting and the digits will remain in the resulting items.
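The split described above can be sketched in C# with Regex.Split. The class name RecordSplitter and method name SplitRecords are made up for illustration.

```csharp
using System.Text.RegularExpressions;

public static class RecordSplitter
{
    // Splits only on a slash that has a digit on both sides.
    // The lookarounds do not consume the digits, so they stay in the parts,
    // and the slash in "(A/B)" is left alone because it sits between letters.
    public static string[] SplitRecords(string input)
    {
        return Regex.Split(input, @"(?<=\d)/(?=\d)");
    }
}
```

Because the pattern contains no capturing groups, Regex.Split returns just the pieces between the matched slashes.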
Another idea is to match and capture these strings using
/?([^~]*(?:~[^~]*){3}~\d+)
See this regex demo.
Details
/? - 1 or 0 / chars
([^~]*(?:~[^~]*){3}~\d+) - Group 1 (what you need to grab):
[^~]* - zero or more chars other than ~
(?:~[^~]*){3} - exactly 3 sequences of ~ and then 0+ chars other than ~
~\d+ - a ~ and then 1 or more digits.
The C# code will look like
var results = Regex.Matches(s, @"/?([^~]*(?:~[^~]*){3}~\d+)")
.Cast<Match>()
.Select(m => m.Groups[1].Value)
.ToList();
NOTE: By default, \d matches all Unicode digits. If you do not want this behavior, use the RegexOptions.ECMAScript option, or replace \d with [0-9] to only match ASCII digits.

C# Regular Expression for x number of groups of A-Z separated by hyphen

I am trying to match the following pattern.
A minimum of 3 'groups' of alphanumeric characters separated by a hyphen.
Eg: ABC1-AB-B5-ABC1
Each group can be any number of characters long.
I have tried the following:
^(\w*(-)){3,}?$
This gives me what I want to an extent.
ABC1-AB-B5-0001 fails, and ABC1-AB-B5-0001- passes.
I don't want the trailing hyphen to be a requirement.
I can't figure out how to modify the expression.
Your ^(\w*(-)){3,}?$ pattern even allows a string like ----- because the only required pattern here is a hyphen: \w* may match 0 word chars. The - may be both leading and trailing because of that.
You may use
\A\w+(?:-\w+){2,}\z
Details:
\A - start of string
\w+ - 1+ word chars (that is, letters, digits or _ symbols)
(?:-\w+){2,} - 2 or more sequences of:
- - a single hyphen
\w+ - 1 or more word chars
\z - the very end of string.
See the regex demo.
Or, if you do not want to allow _:
\A[^\W_]+(?:-[^\W_]+){2,}\z
or to only allow ASCII letters and digits:
\A[A-Za-z0-9]+(?:-[A-Za-z0-9]+){2,}\z
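As a quick check of the first pattern, here is a small validation sketch. The class name GroupValidator and method name IsValid are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class GroupValidator
{
    // True when the whole string is 3 or more non-empty groups of word
    // characters joined by single hyphens, with no leading or trailing hyphen.
    public static bool IsValid(string s)
    {
        return Regex.IsMatch(s, @"\A\w+(?:-\w+){2,}\z");
    }
}
```

IsValid accepts ABC1-AB-B5-ABC1 and A-B-C, and rejects ABC1-AB-B5-0001-, AB-CD and -----. Because \z only matches at the very end of the string, an input with a trailing newline is rejected as well.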
It can be like this:
^\w+-\w+-\w+(-\w+)*$
^(\w+-){2,}(\w+)-?$
Matches 2+ groups separated by a hyphen, then a single group possibly terminated by a hyphen.
((?:-?\w+){3,})
Matches minimum 3 groups, optionally starting with a hyphen, thus ignoring the trailing hyphen.
Note that \w also matches the underscore character _, in addition to letters and digits.

Numeric substrings between dots

I am trying to write a regex that finds substrings that start with a dot (.), contain only digits, and end either with another dot or at the end of the string.
To clarify, here are a few examples:
abc.123.ds => 123
aAsd.12sd.SAs.32.asd.3123 => 32 and 3123
111.2e2 => no result
aaa.bbb.13.320.a => 13 and 320
I tried different approaches; the closest I came to a result is "^[.][0-9]+\.?$", but it still fails.
Any tips would be greatly appreciated.
The ^[.][0-9]+\.?$ pattern fails because ^ forces the match to start at the start of the string and $ forces it to end at the end of the string, so only the full string can match. Also, the \.? at the end, when matched, consumes the . so the next number, which needs that same dot in front of it, cannot be matched.
I suggest using lookarounds:
(?<=\.)[0-9]+(?=\.|$)
See the regex demo
Details:
(?<=\.) - there must be a . immediately to the left of the current position
[0-9]+ - 1+ digits
(?=\.|$) - there must be a . or end of string immediately to the right of the current location.
C#:
var res = Regex.Matches(str, @"(?<=\.)[0-9]+(?=\.|$)")
.Cast<Match>()
.Select(m => m.Value)
.ToList();
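To make the point about consumed dots concrete, the sketch below puts the lookaround pattern next to a variant that consumes the trailing dot. The class and method names are invented, and the consuming pattern with its capture group is written here only for comparison.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class DotNumbers
{
    // Lookarounds check for the dots without consuming them,
    // so one dot can close a number and open the next one.
    public static List<string> WithLookarounds(string input)
    {
        return Regex.Matches(input, @"(?<=\.)[0-9]+(?=\.|$)")
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();
    }

    // For comparison only: the trailing dot is consumed here, so a number
    // that directly follows a match has no leading dot left to match.
    public static List<string> Consuming(string input)
    {
        return Regex.Matches(input, @"\.([0-9]+)(?:\.|$)")
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .ToList();
    }
}
```

For aaa.bbb.13.320.a, WithLookarounds returns 13 and 320, while Consuming returns only 13 because the dot between 13 and 320 is already part of the first match.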
Remove the beginning-of-line anchor and use an alternation for the end:
\.[0-9]+(\.|$)
It is pretty simple using capturing groups:
int[] result = Regex.Matches(str, @"\.(\d+)\.?").Cast<Match>().Select(x => int.Parse(x.Groups[1].Value)).ToArray();
Groups[0] is your entire match
\.(\d+)\.?
Groups[1] is the first parenthesized (capturing) expression
\d+
