Replace regex Match with other value - c#

I have a query like this:
select * from tdirectories where tdirectories.parent in
(
select max(tdirectories.directoryid) from tdirectories
where tdirectories.ntfsdrivedocuid in
(
select ntfsdrivedocuid from tntfsdrives, tntfsdrivedocu
where tntfsdrivedocu.ntfsdriveid = tntfsdrives.ntfsdriveid and tntfsdrives.hostid in
(
select tdocu.hostid from tdocu, tshares
where tdocu.docuid = tshares.docuid
and tdocu.archiv = 0
)
and tntfsdrivedocu.archiv = 0
)
and tdirectories.pathhash in (select tshares.pathhash from tshares )
)
Using a regex, I want to find this part:
select max(tdirectories.directoryid)
The argument of max can be any value. I want to find the max(...) wrapper and remove it, so that the result is
select tdirectories.directoryid
The regex I have created looks like this:
Regex rgx = new Regex("(select\\s.+select)\\smax\\s*\\((?<VAR>[^)]+)\\)");
But this does not solve my issue. What am I missing?

You could use the following (in free-spacing mode):
select # select literally
\ # a space
max # max literally
\(([^)]+)\) # capture anything inside the parentheses
Then use the first group ($1) in the replacement, i.e. replace the match with select $1; see a demo on regex101.com.
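As a rough sketch of how the replacement might look in C#, the pattern can be passed to Regex.Replace with select $1 as the replacement string. The class and method names are hypothetical, and \s+ and \s* are used in place of a single literal space as an assumption about how the query is formatted.

```csharp
using System;
using System.Text.RegularExpressions;

static class StripMaxDemo
{
    // Matches "select max(<anything except ')'>)" and captures the argument in group 1.
    static readonly Regex MaxCall = new Regex(
        @"select\s+max\s*\(([^)]+)\)",
        RegexOptions.IgnoreCase);

    // Replaces every "select max(x)" with "select x".
    public static string StripMax(string sql) => MaxCall.Replace(sql, "select $1");

    static void Main()
    {
        string sql = "where parent in ( select max(tdirectories.directoryid) from tdirectories )";
        Console.WriteLine(StripMax(sql));
    }
}
```

Because the replacement string contains a literal select, a keyword originally written as SELECT would come back in lowercase.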

Related

Negative lookahead in Regex to exclude two words

I have the following regex:
(?!SELECT|FROM|WHERE|AND|OR|AS|[0-9])(?<= |^|\()([a-zA-Z0-9_]+)
that I'm matching against a string like this:
SELECT Static AS My_alias FROM Table WHERE Id = 400 AND Name = 'Something';
This already does 90% of what I want. What I would also like to do is exclude AS My_alias, where the alias can be any word.
I tried to add this to my regex, but this didn't work:
(?!SELECT|FROM|WHERE|AND|OR|AS [a-zA-Z0-9_]+|[0-9])(?<= |^|\()([a-zA-Z0-9_]+)
                            ^^^^^^^^^^^^^^^^
                            this is the new part
How can I exclude this part of the string using my regex?
This excludes the AS alias and gets the tokens you seek. It also handles multiple select values, along with zero to many WHERE conditions.
The idea is to use named captures together with RegexOptions.ExplicitCapture, which tells the regex engine to disregard any unnamed groups (match, but don't capture).
We also collect every wanted token into a single named group, (?<Token> ... ).
// Requires: using System.Linq; using System.Text.RegularExpressions;
var data = "SELECT Static AS My_alias FROM Table WHERE Id = 400 AND Name = 'Something';";
var pattern = @"
^
SELECT\s+
(
(?<Token>[^\s]+)
(\sAS\s[^\s]+)?
[\s,]+
)+ # One to many statements
FROM\s+
(?<Token>[^\s]+) # Table name
(
\s+WHERE\s+
(
(?<Token>[^\s]+)
(.+?AND\s+)?
)+ # One to many conditions
)? # Optional Where
";
var tokens =
Regex.Matches(data, pattern,
RegexOptions.IgnorePatternWhitespace // Lets us space out/comment pattern
| RegexOptions.ExplicitCapture) // Only consume named groups.
.OfType<Match>()
.SelectMany(mt => mt.Groups["Token"].Captures // Get the captures inserted into `Token`
.OfType<Capture>()
.Select(cp => cp.Value.ToString()))
;
tokens is a sequence of these strings: { "Static", "Table", "Id", "Name" }
This should cover most of the cases you will encounter. Use similar logic if you need to process selects with joins; either way, this is a good base to work from.

Find multiple groups matching in specific substring

I would like to capture the values (shown in bold in the original post) in the line below that starts with the word "need", while the values in the lines that start with "skip" and "ignore" must be ignored. I tried the pattern
need.+?(:"(?'index'\w+)"[,}])
but it found only the first (emphasized) value. How can I get the needed result using regex only?
"skip" : {"A":"ABCD123","B":"ABCD1234","C":"ABCD1235"}
"need" : {"A":"ZABCD123","B":"ZABCD1234","C":"ZABCD1235"}
"ignore" : {"A":"SABCD123","B":"SABCD1234","C":"SABCD1235"}
We are going to find "need" and collect what follows it into named match groups and their captures. There will be two groups: one named Index, which holds A, B and C, and one named Data, which holds the values.
From there we will join the two capture collections into a dictionary.
Here is the code to do that:
// Requires: using System.Collections.Generic; using System.Linq; using System.Text.RegularExpressions;
string data =
@"""skip"" : {""A"":""ABCD123"",""B"":""ABCD1234"",""C"":""ABCD1235""}
""need"" : {""A"":""ZABCD123"",""B"":""ZABCD1234"",""C"":""ZABCD1235""}
""ignore"" : {""A"":""SABCD123"",""B"":""SABCD1234"",""C"":""SABCD1235""}";
string pattern = @"
\x22need\x22\s*:\s*{ # Find need
( # Beginning of Captures
\x22 # Quote is \x22
(?<Index>[^\x22]+) # A into index.
\x22\:\x22 # Quote, colon, quote between key and value
(?<Data>[^\x22]+) # 'Z...' Data
\x22,? # Closing quote and an optional comma
)+ # End of 1 to many Captures";
var mt = Regex.Match(data,
pattern,
RegexOptions.IgnorePatternWhitespace | RegexOptions.ExplicitCapture);
// Get the data capture into a List<string>.
var captureData = mt.Groups["Data"].Captures.OfType<Capture>()
.Select(c => c.Value).ToList();
// Join the index capture data and project it into a dictionary.
var asDictionary = mt.Groups["Index"]
.Captures.OfType<Capture>()
.Select((cp, iIndex) => new KeyValuePair<string,string>
(cp.Value, captureData[iIndex]) )
.ToDictionary(kvp => kvp.Key, kvp => kvp.Value );
If the number of fields is fixed, you can write it like this:
^"need"\s*:\s*{"A":"(\w+)","B":"(\w+)","C":"(\w+)"}
If the tags came after the values, like this:
{"A":"ABCD123","B":"ABCD1234","C":"ABCD1235"} : "skip"
{"A":"ZABCD123","B":"ZABCD1234","C":"ZABCD1235"} : "need"
{"A":"SABCD123","B":"SABCD1234","C":"SABCD1235"} : "ignore"
Then you could use an unbounded positive lookahead:
"\w+?":"(\w+?)"(?=.*"need")
But unbounded positive lookbehinds are prohibited in PCRE (the * and + quantifiers are not allowed inside a lookbehind), so this is not very useful in your situation.
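Since the accepted code in this thread is C#, it may be worth noting that the .NET regex engine, unlike PCRE, does accept unbounded quantifiers inside a lookbehind. A minimal sketch under that assumption, with the tag in front of the values as in the question (the class and method names are hypothetical):

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class NeedLookbehindDemo
{
    // A key/value pair matches only if "need" : { appears earlier on the same line.
    // Without RegexOptions.Singleline, '.' does not cross a line feed, so the
    // lookbehind cannot reach back into a previous line.
    static readonly Regex Pair = new Regex(
        @"(?<=""need""\s*:\s*\{.*)""(?<Index>\w+)"":""(?<Data>\w+)""");

    public static Dictionary<string, string> Extract(string text)
    {
        var result = new Dictionary<string, string>();
        foreach (Match m in Pair.Matches(text))
            result[m.Groups["Index"].Value] = m.Groups["Data"].Value;
        return result;
    }

    static void Main()
    {
        string data = string.Join("\n",
            @"""skip"" : {""A"":""ABCD123"",""B"":""ABCD1234"",""C"":""ABCD1235""}",
            @"""need"" : {""A"":""ZABCD123"",""B"":""ZABCD1234"",""C"":""ZABCD1235""}",
            @"""ignore"" : {""A"":""SABCD123"",""B"":""SABCD1234"",""C"":""SABCD1235""}");

        foreach (var kvp in Extract(data))
            Console.WriteLine(kvp.Key + " = " + kvp.Value);
    }
}
```

Each pair is a separate match here, so no capture collections are needed; the trade-off is that the lookbehind is re-evaluated for every candidate position.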
In most regex engines you can't capture a dynamic number of groups, so I'd run something like this regex
"need".*{.*,?".*?":(".+?").*}
with a 'match_all' function, or use Agnius' suggestion.

Regular Expression - Match End if Start is Match

I want to match the following strings:
[anything can be here]
[{anything can be here}]
Can I achieve this using only one regular expression?
I am currently using this one, '^\[({)?.+(})?]$', but it also matches:
[anything can be here}]
[{anything can be here]
I need to match } only if { is used.
Please note that I can use only a regular expression match function, as I have implemented it as a SQL CLR function in order to use it in T-SQL statements.
Basically you can write (verbatim strings):
^\[(?:{.+}|[^]{}]+)]$
You can use something more complicated with a conditional statement (?(condition)then|else):
^\[({)?[^]{}]+(?(1)}|)]$
(if capture group 1 exists, then } else nothing)
But this way is probably less efficient.
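To see the conditional construct in action, here is a small C# sketch that runs it against the four sample strings from the question. The class and method names are hypothetical, and the braces and the closing bracket are escaped here purely for readability.

```csharp
using System;
using System.Text.RegularExpressions;

static class BracketDemo
{
    // Group 1 optionally captures '{'. The conditional (?(1)\}|) then requires
    // a closing '}' only if group 1 took part in the match.
    static readonly Regex Balanced = new Regex(@"^\[(\{)?[^\]{}]+(?(1)\}|)\]$");

    public static bool IsValid(string s) => Balanced.IsMatch(s);

    static void Main()
    {
        var samples = new[]
        {
            "[anything can be here]",
            "[{anything can be here}]",
            "[anything can be here}]",
            "[{anything can be here]"
        };

        foreach (var s in samples)
            Console.WriteLine(s + " -> " + IsValid(s));
    }
}
```

The first two samples should be accepted and the last two rejected, since a lone brace can satisfy neither the conditional nor the character class.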
I got this working: \[[^{].*[^}]\]|\[\{.*\}\]
EDIT
As pointed out by the OP, something needs to be between the brackets, so a 'one or more' match is better suited:
\[[^{].+[^}]\]|\[\{.+\}\]
Your regex ^\[({)?.+(})?]$ will match only an individual string like [{...}] or [{...] because 1) you have anchors (^ and $), and 2) both curly braces are independently optional in the same pattern.
I suggest using a negative lookahead to avoid matching strings that have just one curly brace inside the []-ed string, like this:
var rgx = new Regex(@"((?!\[\{[^}]+\]|\[[^{]+\})\[\{?.+?\}?\])");
var tst = "[anything can be here] [{anything can be here}] [anything can be here}] [{anything can be here]";
var mtch = rgx.Matches(tst).Cast<Match>().ToList();
This will make sure you match the []-ed strings even in larger context.
Try this:
\[[^{].*[^}]\]|\[[^{}]\]|\[\{.+\}\]
Which when broken down matches 3 types of string:
[] surrounding ≥ 2 characters provided the first character isn't { and the last character isn't }
[{}] surrounding anything
[] surrounding a single non curly brace character (an edge case not covered by previous answers)
Okay I know this question has been answered, but I thought I'd show a pure T-SQL solution just as an alternative.
DECLARE @yourTable TABLE (val VARCHAR(100));
INSERT INTO @yourTable
VALUES ('[anything can be here]'),
('[{anything can be here}]'),
('[anything can be here}]'),
('[{anything can be here]');
WITH CTE_Brackets
AS
(
SELECT val,
CASE
WHEN CHARINDEX('{',val) > 0 THEN CHARINDEX('{',val)
END AS L_curly,
CASE
WHEN CHARINDEX('}',val) > 0 THEN CHARINDEX('}',val)
END AS R_curly,
CASE
WHEN CHARINDEX('[',val) > 0 THEN CHARINDEX('[',val)
END AS L_bracket,
CASE
WHEN CHARINDEX(']',val) > 0 THEN CHARINDEX(']',val)
END AS R_bracket
FROM @yourTable
),
CTE_string
AS
(
SELECT val,
L_curly,
R_curly,
L_bracket,
R_bracket,
SUBSTRING(val,start_pos,end_pos - start_pos) val_string
FROM CTE_Brackets A
CROSS APPLY (SELECT COALESCE(L_curly,L_bracket) + 1 AS start_pos,
COALESCE(R_curly,R_bracket) AS end_pos
) CA
)
SELECT A.val,B.val
FROM CTE_string A
INNER JOIN CTE_string B
ON A.val_string = B.val_string
AND
(
(
A.L_curly IS NOT NULL
AND A.R_curly IS NULL
AND B.L_curly IS NULL
AND B.R_curly IS NOT NULL
) --left curly matching right only curly
OR
(
A.L_curly + A.R_curly IS NOT NULL
AND B.R_curly IS NULL
AND B.L_curly IS NULL
) --matches both curly to no curly
)
ORDER BY B.val

Extract a distinct list of numbers from a list of strings in C#

I have a list of strings with the following values:
"/manufacturers/244/rz-xvxcv/images/swed"
"/manufacturers/23/rz-gf/images/sltn"
"/manufacturers/34/rz-dffdf/images/five"
"/manufacturers/23/rz-gfgf/images/lead"
"/manufacturers/322/rz-dfg/images/carr"
"/manufacturers/3789/rz-fgdfgfg/images/zing"
I need to extract a distinct list of the number values that fall in the pattern /manufacturers/[int]/rz-
So in the above example my new list would contain: 244,23,34,322,3789
Can this be done using RegEx and Linq?
I'd just use Split instead of regular expressions, with Distinct to drop the repeated values:
var numbers = paths.Select(p => int.Parse(p.Split('/')[2])).Distinct().ToList();
Not super abstract or reusable, but very to-the-point.
var numbers = paths.Select(p => int.Parse(p.Substring(15, p.IndexOf('/', 15) - 15))).Distinct();
If you want to use RegEx and LINQ:
var regex = new Regex(@"(\d+)");
var numbers = paths.Select(i => regex.Match(i).Value).Distinct().ToList();
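A slightly stricter variant, sketched here as one possible approach, anchors the regex to the /manufacturers/[int]/rz- shape from the question, skips paths that do not fit, and returns integers. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class ManufacturerIds
{
    // Group 1 captures the integer between "/manufacturers/" and "/rz-".
    static readonly Regex IdPattern = new Regex(@"^/manufacturers/(\d+)/rz-");

    public static List<int> Extract(IEnumerable<string> paths) =>
        paths.Select(p => IdPattern.Match(p))
             .Where(m => m.Success)                      // ignore paths that do not fit the pattern
             .Select(m => int.Parse(m.Groups[1].Value))
             .Distinct()                                 // drop repeated ids such as 23
             .ToList();

    static void Main()
    {
        var paths = new[]
        {
            "/manufacturers/244/rz-xvxcv/images/swed",
            "/manufacturers/23/rz-gf/images/sltn",
            "/manufacturers/34/rz-dffdf/images/five",
            "/manufacturers/23/rz-gfgf/images/lead",
            "/manufacturers/322/rz-dfg/images/carr",
            "/manufacturers/3789/rz-fgdfgfg/images/zing"
        };

        Console.WriteLine(string.Join(",", Extract(paths)));
    }
}
```

Filtering on m.Success avoids the empty string that regex.Match(i).Value would produce for a path without digits, which int.Parse would reject.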
You could use this pattern:
(\d+)(?!.*\1)
The numbers are not extracted in order, though.
( # Capturing Group (1)
\d # <digit 0-9>
+ # (one or more)(greedy)
) # End of Capturing Group (1)
(?! # Negative Look-Ahead
. # Any character except line break
* # (zero or more)(greedy)
\1 # Back reference to group (1)
) # End of Negative Look-Ahead

Regex problems with equal sign?

In C# I'm trying to validate a string that looks like:
I#paramname='test'
or
O#paramname=2827
Here is my code:
string t1 = "I#parameter='test'";
string r = @"^([Ii]|[Oo])#\w=\w";
var re = new Regex(r);
If I take the "=\w" off the end of variable r, I get True. If I add "=\w" after the \w, it's False. I want the characters between # and = to be any alphanumeric value. Anything after the = sign can contain alphanumeric characters and ' (single quotes). What am I doing wrong here? I have very rarely used regular expressions and can normally find an example, but this is a custom format, and even with cheat sheets I'm having issues.
^([Ii]|[Oo])#\w+=(?<q>'?)[\w\d]+\k<q>$
Regular expression:
^             start of line
([Ii]|[Oo])   either (I or i) or (O or o)
#             a literal #
\w+           1 or more word characters
=             equals sign
(?<q>'?)      capture 0 or 1 quotes in named group q
[\w\d]+       1 or more word or digit characters (\d is already covered by \w)
\k<q>         repeat of what was captured in named group q
$             end of line
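The effect of the \k<q> backreference is easiest to see by running the pattern against a few inputs. The sketch below assumes the # separator exactly as written in the question; the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class ParamFormat
{
    // (?<q>'?) remembers whether an opening quote was present;
    // \k<q> then requires the same thing (a quote or nothing) at the end.
    static readonly Regex Pattern =
        new Regex(@"^([Ii]|[Oo])#\w+=(?<q>'?)[\w\d]+\k<q>$");

    public static bool IsValid(string s) => Pattern.IsMatch(s);

    static void Main()
    {
        var samples = new[]
        {
            "I#parameter='test'",   // quoted value
            "O#paramname=2827",     // unquoted value
            "I#parameter='test",    // opening quote without a closing one
            "I#parameter=test'"     // closing quote without an opening one
        };

        foreach (var s in samples)
            Console.WriteLine(s + " -> " + IsValid(s));
    }
}
```

The first two samples should validate and the last two should not, because the quotes must be either both present or both absent.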
Use \w+ instead of \w to match one or more characters, or \w* to match zero or more.
Try this:
^([Ii]|[Oo])#\w+=\'*\w+\'*
If you want to be stricter and require the literal name paramname:
^([Ii]|[Oo])#paramname=[']?[\w]+[']?
You could try something like this:
Regex rx = new Regex( @"^([IO])#(\w+)=(.*)$" , RegexOptions.IgnoreCase ) ;
Match group 1 will give you the value of I or O (the parameter direction?)
Match group 2 will give you the name of the parameter
Match group 3 will give you the value of the parameter
You could be stricter about the 3rd group and match it as
(([^']+)|('(('')|([^']+))*'))
The first alternative matches one or more non-quote characters; the second alternative matches a quoted string literal in which any internal (embedded) quotes are escaped by doubling them, so it would match things like
'' (the empty string)
'foo bar'
'That''s All, Folks!'
