Regex.Replace using regular expression as replacement - c#

I am new to the C# programming language and came across the following problem.
I have a string " avenue 4 TH some more words". I want to remove space between 4 and TH. I have written a regex which helps in determining whether "4 TH" is available in a string or not.
[0-9]+\s(th|nd|st|rd)
string result = "avenue 4 TH some more words";
var match = Regex.IsMatch(result,"\\b" + item + "\\b",RegexOptions.IgnoreCase) ;
Console.WriteLine(match);//True
Is there anything in C# that will remove the space, something like
Regex.Replace(result, "[0-9]+\\s(th|nd|st|rd)", "[0-9]+(th|nd|st|rd)", RegexOptions.IgnoreCase);
so that the end result looks like
avenue 4TH some more words

You may use
var pattern = @"(?i)(\d+)\s*(th|[nr]d|st)\b";
var match = string.Concat(Regex.Match(result, pattern)?.Groups.Cast<Group>().Skip(1));
See the C# demo yielding 4TH.
The regex - (?i)(\d+)\s*(th|[nr]d|st)\b - matches 1 or more digits capturing the value into Group 1, then 0 or more whitespaces are matched with \s*, and then th, nd, rd or st as whole words (as \b is a word boundary) are captured into Group 2.
The Regex.Match(result, pattern)? part tries to match the pattern in the string. If there is a match, the match object's Groups property is accessed and all groups are cast to a Group list with Groups.Cast<Group>(). Since the first group is the whole match value, we .Skip(1) it.
The rest - the values of Group 1 and Group 2 - are concatenated with string.Concat.
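If the goal is to rewrite the whole string rather than to extract the 4TH token, the same two groups can be passed to Regex.Replace. This is a minimal sketch, assuming a replacement is only needed when at least one whitespace character is present (hence \s+ instead of \s*). The OrdinalFixer class and JoinOrdinals method are names made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class OrdinalFixer
{
    // Hypothetical helper: removes the whitespace between a number
    // and a following ordinal suffix (th, nd, rd, st), ignoring case.
    public static string JoinOrdinals(string input) =>
        Regex.Replace(input, @"(\d+)\s+(th|[nr]d|st)\b", "$1$2", RegexOptions.IgnoreCase);

    public static void Main()
    {
        Console.WriteLine(JoinOrdinals("avenue 4 TH some more words"));
        // => avenue 4TH some more words
    }
}
```

The $1$2 replacement writes the digits and the suffix back without the whitespace that was matched between them.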

Related

How can I write a Regex with matching groups for a comma separated string

I've got a random input string to validate and tokenize.
My aim is to check if my string has the following pattern
[a-zA-Z]{2}\d{2} (one or unlimited times) comma separated
So:
aa12,af43,ad46 -> is valid
,aa12,aa44 -> is NOT valid (initial comma)
aa12, -> is NOT valid ( trailing comma)
That's the first part, validation
Then, with the same regex I've got to create a group for each occurrence of the pattern (match collection)
So:
aa12,af34,tg53
is valid and must create the following groups
Group 1 -> aa12
Group 2 -> af34
Group 3 -> tg53
Is it possible to have it done with only one regex that validates and creates the groups?
I've written this
^([a-zA-Z]{2}\d{2})(?:(?:[,])([a-zA-Z]{2}\d{2})(?:[,])([a-zA-Z]{2}\d{2}))*(?:[,])([a-zA-Z]{2}\d{2})*|$
but even if it creates the groups more or less correctly, it lacks in the validation process, getting also strings that have a wrong pattern.
Any hints would be very very welcome
You can use
var text = "aa12,af43,ad46";
var pattern = @"^(?:([a-zA-Z]{2}\d{2})(?:,\b|$))+$";
var result = Regex.Matches(text, pattern)
.Cast<Match>()
.Select(x => x.Groups[1].Captures.Cast<Capture>().Select(m => m.Value))
.ToList();
foreach (var list in result)
Console.WriteLine(string.Join("; ", list));
// => aa12; af43; ad46
See the C# demo online and the regex demo.
Regex details
^ - start of string
(?:([a-zA-Z]{2}\d{2})(?:,\b|$))+ - one or more occurrences of
([a-zA-Z]{2}\d{2}) - Group 1: two ASCII letters and then two digits
(?:,\b|$) - either a , followed by a word char, or the end of string
$ - end of string. You may use \z if you want to prevent matching before a trailing newline (LF) char.
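The validation and the tokenization can be kept in one call by checking Match.Success before reading the captures of Group 1. This is a sketch using the pattern above. The TokenValidator class and Tokenize method are illustrative names, and returning null for invalid input is an assumption about what the caller wants:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TokenValidator
{
    private static readonly Regex Pattern =
        new Regex(@"^(?:([a-zA-Z]{2}\d{2})(?:,\b|$))+$");

    // Returns every token when the whole input is valid, otherwise null.
    public static string[] Tokenize(string text)
    {
        var m = Pattern.Match(text);
        if (!m.Success) return null;
        // Group 1 is repeated, so its Captures collection holds one entry per token.
        return m.Groups[1].Captures.Cast<Capture>().Select(c => c.Value).ToArray();
    }

    public static void Main()
    {
        foreach (var s in new[] { "aa12,af43,ad46", ",aa12,aa44", "aa12," })
        {
            var tokens = Tokenize(s);
            var shown = tokens == null ? "invalid" : string.Join("; ", tokens);
            Console.WriteLine(s + " -> " + shown);
        }
    }
}
```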

Regex Replace between groups

So I have the following regex.replace in C#:
Regex.Replace(inputString, @"^([^,]*,){5}(.*)", @"$1somestring,$2");
where 5 is a variable number in code, but that's not really relevant since at the time of execution it will always have a set value (like 5, for example). Same with somestring,.
Essentially I want to input somestring, between the two groups. The output works for somestring,$2, but $1 is just printed as $1. So say whatever (.*) grabs = "2, a, f2" the resulting string I'd get out is $1somestring,2,a,f2 no matter what $1 is. Is this because of the repeating group feature {5}? If so, how do I grab the collection of repeats and put it in place of where I have $1 right now?
Edit: I know the first group captures correctly, as well. I grab the content of somestring, using this regex:
Regex.Match(line, @"^([^,]*,){5}([0-9]+\.[0-9]+),.*");
The first part is identical to the first group in the replacement regex, and it works fine, so there shouldn't be an issue (and they're both used on the same string).
Edit2:
Ok I'll try to explain more of the process since someone said it was hard to understand. I have three variables, line a string I work with, and latIndex and lonIndex which are just ints (tells me between what ,'s two doubles I look for are located). I have the two following matches:
var latitudeMatch = Regex.Match(line, @"^([^,]*,){" + latIndex + @"}([0-9]+\.[0-9]+),.*");
var longitudeMatch = Regex.Match(line, @"^([^,]*,){" + lonIndex + @"}([0-9]+\.[0-9]+),.*");
I then grab the doubles:
var latitude = latitudeMatch.Groups[2].Value;
var longitude = longitudeMatch.Groups[2].Value;
I use these doubles to get a string from a web API, which I store in a variable called veiRef. Then I want to insert it after the doubles, using the following code (insert after lat or lon, depending on which one appears last):
if (latIndex > lonIndex)
{
    line = Regex.Replace(line, @"^([^,]*,){" + (latIndex + 1) + @"}(.*)", $@"$1{veiRef},$2");
}
else
{
    line = Regex.Replace(line, @"^([^,]*,){" + (lonIndex + 1) + @"}(.*)", $@"$1{veiRef},$2");
}
However, this results in a string line which doesn't have the content of $1 inserted before it ($2 works fine).
You have a repeated capturing group at the start of the pattern that you need to turn into a non-capturing one and wrap with a capturing group. Then, you may access the whole part of the match with the $1 backreference.
var line = "a, s, f, double, double, 12, sd, 1";
var latIndex = 5;
var pat = $@"^((?:[^,]*,){{{latIndex+1}}})(.*)";
// Console.WriteLine(pat); // => ^((?:[^,]*,){6})(.*)
var veiRef = "str";
line = Regex.Replace(line, pat, $"${{1}}{veiRef.Replace("$","$$")}$2");
Console.WriteLine(line); // => a, s, f, double, double, 12,str sd, 1
See the C# demo
The pattern - ^((?:[^,]*,){6})(.*) - now contains ((?:[^,]*,){6}) after ^, and this is now what $1 holds after a match is found.
Since your replacement string is dynamic, you need to make sure any $ inside gets doubled (hence, .Replace("$","$$")) and that the first backreference is unambiguous, thus it should look like ${1} (it will work regardless of whether veiRef starts with a digit or not).
Replacement string in details:
It is an interpolated string literal...
$" - declaration of the interpolated string literal (start)
${{1}} - a literal ${1} string (the { and } must be doubled to denote literal symbols)
{veiRef.Replace("$","$$")} - a piece of C# code inside the interpolated string literal (we delimit this part where code is permitted with single {...})
$2 - a literal $2 string
" - end of the interpolated string literal.
Adding an extra group around the repeating capturing group seems to provide the desired output for the example you gave.
Regex.Replace("a, s, f, double, double, 12, sd, 1", @"^(([^,]*,){5})(.*)", @"$1somestring,$3");
I'm not an expert on RegEx and someone can probably explain it better than I, but:-
Group 1 is the set of 5 repeating capture groups
Group 2 is the last of the repeating capture groups
Group 3 is the text after the 5 repeating capture groups.
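The wrapped-group approach can be put into a small reusable helper. This is a sketch only: the FieldInserter class and InsertAfter method are hypothetical names, and it assumes the inserted value should be followed by a comma, as in the question. The ${1} form keeps the backreference unambiguous when the value starts with a digit, and any $ in the value is doubled so that it is treated literally:

```csharp
using System;
using System.Text.RegularExpressions;

public static class FieldInserter
{
    // Hypothetical helper: inserts value (plus a comma) after the first
    // fieldCount comma-terminated fields of line.
    public static string InsertAfter(string line, int fieldCount, string value)
    {
        // The outer group wraps the repeated non-capturing group,
        // so $1 holds all skipped fields rather than only the last one.
        var pattern = $@"^((?:[^,]*,){{{fieldCount}}})(.*)";
        var replacement = "${1}" + value.Replace("$", "$$") + ",$2";
        return Regex.Replace(line, pattern, replacement);
    }

    public static void Main()
    {
        Console.WriteLine(InsertAfter("a,b,c,d", 2, "9z")); // => a,b,9z,c,d
        Console.WriteLine(InsertAfter("a,b,c,d", 2, "$5")); // => a,b,$5,c,d
    }
}
```

If the line has fewer commas than fieldCount, the pattern does not match and the line is returned unchanged.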

RegEx string between N and (N+1)th Occurrence

I am attempting to find nth occurrence of sub string between two special characters. For example.
one|two|three|four|five
Say I am looking to find the string between the nth and (n+1)th occurrences of the '|' character, for example the 2nd and 3rd, which turns out to be 'three'. I want to do it using RegEx. Could someone guide me?
My current attempt is as follows.
string subtext = "zero|one|two|three|four";
Regex r = new Regex(@"(?:([^|]*)|){3}");
var m = r.Match(subtext).Value;
If you have full access to C# code, you should consider a mere splitting approach:
var idx = 2; // Might be user-defined
var subtext = "zero|one|two|three|four";
var result = subtext.Split('|').ElementAtOrDefault(idx);
Console.WriteLine(result);
// => two
A regex can be used if you have no access to code (if you use some tool that is powered with .NET regex):
^(?:[^|]*\|){2}([^|]*)
See the regex demo. It matches
^ - start of string
(?:[^|]*\|){2} - exactly 2 (adjust the number as you need) sequences of:
[^|]* - zero or more chars other than |
\| - a | symbol
([^|]*) - Group 1 (access via .Groups[1]): zero or more chars other than |
C# code to test:
var pat = $@"^(?:[^|]*\|){{{idx}}}([^|]*)";
var m = Regex.Match(subtext, pat);
if (m.Success) {
Console.WriteLine(m.Groups[1].Value);
}
// => two
See the C# demo
If a tool does not let you access captured groups, turn the initial part into a non-consuming lookbehind pattern:
(?<=^(?:[^|]*\|){2})[^|]*
^^^^^^^^^^^^^^^^^^^^
See this regex demo. The (?<=...) positive lookbehind only checks for a pattern presence immediately to the left of the current location, and if the pattern is not matched, the match will fail.
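The lookbehind variant can also be tried from C#, since .NET allows quantifiers inside a lookbehind. This is a sketch, with PipeFields and FieldAfter as illustrative names, where the number of pipes to skip is interpolated into the pattern and the match value itself is the result:

```csharp
using System;
using System.Text.RegularExpressions;

public static class PipeFields
{
    // Returns the field that follows the first `skip` pipe characters,
    // or null when the input has too few pipes.
    public static string FieldAfter(string text, int skip)
    {
        var m = Regex.Match(text, $@"(?<=^(?:[^|]*\|){{{skip}}})[^|]*");
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(FieldAfter("zero|one|two|three|four", 2)); // => two
        Console.WriteLine(FieldAfter("one|two|three|four|five", 2)); // => three
    }
}
```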
Use this:
(?:.*?\|){n}(.[^|]*)
where n is the number of times you need to skip your special character. The first capturing group will contain the result.
Demo for n = 2
Use this regex and then select the n-th match (zero-based; in this case 2) from the Matches collection:
string subtext = "zero|one|two|three|four";
Regex r = new Regex(@"(?<=\|)[^|]*");
var m = r.Matches(subtext)[2]; // => three

Extract tables and columns from SQL query using regular expression

I am trying to create a regex for this task, but I really can't grasp the understanding of regex apart from very simple cases :-( :
The problem: I have this ("SQL like") query:
SELECT tcmcs003.*, tccom130.nama, tccom705.dsca, tcmcs052.dsca, tccom100.nama
FROM tcmcs003, tccom130,tccom705,tcmcs052,tccom100
WHERE tcmcs003.cadr REFERS TO tccom130
AND tcmcs003.casi REFERS TO tccom705
AND tcmcs003.cprj REFERS TO tcmcs052
AND tcmcs003.bpid REFERS TO tccom100
ORDER BY tcmcs003._index1
I want to "extract" all the table names and column names, and after that I want to simply add my characters to them...
For example replace:
SELECT tcmcs003.*, tccom130.nama
with:
SELECT tcmcs003XXX.*, tccom130XXX.namaYYY
So far, the "best" regex I have is this:
(?<gselect>SELECT\s+)*(?<tname>\w{5}\d{3})*(?<spaces>[\.\,\s])+(?<colname>\w{4})*
And replacement pattern:
${gselect}${tname}XXX${spaces}${colname}YYY
The output is really terrible :-(
SELECT tcmcs003.
m130
.nama
m705
.dsca
s052
.dsca
m100
.nama
FROM
s003
m130
,m705
,s052
,m100
WHER
s003
.cadr
REFE
m130
s003
How can I write the regex?
I want to repeatedly capture something like
[(any string)(table name)(\.a dot or not)(column name)(any string) ] (repeat N times)
EDIT
I am writing in C#
The pattern should be a bit more general than:
\b(tc(?:mcs|com)\d{3}XXX.\w+)\b
in the sense that table name is 5 characters (the first is always a t, followed by 4 random chars) followed by 3 random digits
table column is 4 random chars
Instead of trying to match the whole command, I'll simply match each table or column independently. Since table names contain digits, there is little chance a pattern will match something else.
Match column names with:
\b(t\w{4}\d{3}\.\w{4})\b
Match table names with:
\b(t\w{4}\d{3})\b
Then, we can replace each with the desired value: "$1YYY" and "$1XXX" respectively. The patterns use these constructs:
\b Matches a word boundary (a word char on one side and not a word char on the other).
\w{4} Matches 4 word chars ([A-Za-z0-9_]).
\d{3} Matches 3 digits ([0-9]).
Code:
string input = @"SELECT tcmcs003.*, tccom130.nama, tccom705.dsca, tcmcs052.dsca, tccom100.nama
FROM tcmcs003, tccom130,tccom705,tcmcs052,tccom100
WHERE tcmcs003.cadr REFERS TO tccom130
AND tcmcs003.casi REFERS TO tccom705
AND tcmcs003.cprj REFERS TO tcmcs052
AND tcmcs003.bpid REFERS TO tccom100
ORDER BY tcmcs003._index1";
string Pattern1 = @"\b(t\w{4}\d{3}\.\w{4})\b";
string Pattern2 = @"\b(t\w{4}\d{3})\b";
Regex r1 = new Regex(Pattern1);
Regex r2 = new Regex(Pattern2);
string replacement1 = "YYY";
string replacement2 = "XXX";
string result = "";
result = r1.Replace(input, "$1" + replacement1);
result = r2.Replace(result, "$1" + replacement2);
Console.WriteLine(result);
ideone Demo
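As an alternative to two passes, both suffixes can be added in a single Regex.Replace call with a match evaluator. This is only a sketch under the same naming assumptions (a table is t plus 4 word chars plus 3 digits, a column is 4 word chars). The SqlTagger class and Tag method are names invented for the example:

```csharp
using System;
using System.Text.RegularExpressions;

public static class SqlTagger
{
    // Group 1: table name. Group 2 (optional): a 4-char column name after the dot.
    private static readonly Regex Name =
        new Regex(@"\b(t\w{4}\d{3})\b(?:\.(\w{4})\b)?");

    // Appends XXX to every table name and YYY to every column name.
    public static string Tag(string sql) =>
        Name.Replace(sql, m => m.Groups[2].Success
            ? m.Groups[1].Value + "XXX." + m.Groups[2].Value + "YYY"
            : m.Groups[1].Value + "XXX");

    public static void Main()
    {
        Console.WriteLine(Tag("SELECT tcmcs003.*, tccom130.nama"));
        // => SELECT tcmcs003XXX.*, tccom130XXX.namaYYY
    }
}
```

Names such as tcmcs003._index1 only get the table suffix, because _index1 is longer than four word characters and the optional column part is skipped.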

Regex problems with equal sign?

In C# I'm trying to validate a string that looks like:
I#paramname='test'
or
O#paramname=2827
Here is my code:
string t1 = "I#parameter='test'";
string r = @"^([Ii]|[Oo])#\w=\w";
var re = new Regex(r);
If I take the "=\w" off the end of variable r I get True. If I add an "=\w" after the \w it's False. I want the characters between # and = to be able to be any alphanumeric value. Anything after the = sign can have alphanumeric characters and ' (single quotes). What am I doing wrong here? I have very rarely used regular expressions and can normally find an example, but this is a custom format, and even with cheat sheets I'm having issues.
^([Ii]|[Oo])#\w+=(?<q>'?)[\w\d]+\k<q>$
Regular expression:
^ start of line
([Ii]|[Oo]) either (I or i) or (O or o)
\w+ 1 or more word characters
= equals sign
(?<q>'?) capture 0 or 1 quotes in named group q
[\w\d]+ 1 or more word or digit characters
\k<q> repeat of what was captured in named group q
$ end of line
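A quick way to check this pattern against the two sample strings from the question, plus an unbalanced quote. This is a sketch in which ParamValidator and IsValid are illustrative names. The backreference \k<q> forces the closing quote to be present exactly when the opening one was:

```csharp
using System;
using System.Text.RegularExpressions;

public static class ParamValidator
{
    private static readonly Regex Pattern =
        new Regex(@"^([Ii]|[Oo])#\w+=(?<q>'?)[\w\d]+\k<q>$");

    // True when the string has the form I#name=value or O#name='value'.
    public static bool IsValid(string s) => Pattern.IsMatch(s);

    public static void Main()
    {
        Console.WriteLine(IsValid("I#parameter='test'")); // => True
        Console.WriteLine(IsValid("O#paramname=2827"));   // => True
        Console.WriteLine(IsValid("I#parameter='test"));  // => False
    }
}
```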
Use \w+ instead of \w to match one or more characters, or \w* to match zero or more:
Try this: Live demo
^([Ii]|[Oo])#\w+=\'*\w+\'*
If you are being a bit more strict with using paramname:
^([Ii]|[Oo])#paramname=[']?[\w]+[']?
Here is a demo
You could try something like this:
Regex rx = new Regex( @"^([IO])#(\w+)=(.*)$" , RegexOptions.IgnoreCase ) ;
Match group 1 will give you the value of I or O (the parameter direction?)
Match group 2 will give you the name of the parameter
Match group 3 will give you the value of the parameter
You could be stricter about the 3rd group and match it as
(([^']+)|('(('')|([^']+))*'))
The first alternative matches 1 or more non-quote characters; the second alternative matches a quoted string literal with any internal (embedded) quotes escaped by doubling them, so it would match things like
'' (the empty string)
'foo bar'
'That''s All, Folks!'
