C# regex to replace a delimiter with another one

I'm working on PL/SQL code where I want to replace each ';' that appears inside a comment with '~'.
For example, if I have this code:
--comment 1 with;
select id from t_id;
--comment 2 with ;
select name from t_id;
/*comment 3
with ;*/
Then I want the result text to be:
--comment 1 with~
select id from t_id;
--comment 2 with ~
select name from t_id;
/*comment 3
with ~*/
Can it be done using regex in C#?

Regular expression:
((?:--|/\*)[^;]*);(\*/)?
C# code to use it:
string code = "all that text of yours";
Regex regex = new Regex(@"((?:--|/\*)[^;]*);(\*/)?", RegexOptions.Multiline);
string result = regex.Replace(code, "$1~$2");
Not tested with C#, but the regular expression and the replacement work in RegexBuddy with your text =)
Note: I am not a very brilliant regular expression writer, so it could probably have been written better. But it works, and it handles both of your cases: one-line comments starting with -- and multiline ones in /* */.
Edit: I read your comment on the other answer and removed the ^ anchor, so that it also handles comments that do not start on a new line.
Edit 2: I figured it could be simplified a bit. It also works fine without the trailing $ anchor.
Explanation:
// ((?:--|/\*)[^;]*);(\*/)?
//
// Options: ^ and $ match at line breaks
//
// Match the regular expression below and capture its match into backreference number 1 «((?:--|/\*)[^;]*)»
// Match the regular expression below «(?:--|/\*)»
// Match either the regular expression below (attempting the next alternative only if this one fails) «--»
// Match the characters “--” literally «--»
// Or match regular expression number 2 below (the entire group fails if this one fails to match) «/\*»
// Match the character “/” literally «/»
// Match the character “*” literally «\*»
// Match any character that is NOT a “;” «[^;]*»
// Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
// Match the character “;” literally «;»
// Match the regular expression below and capture its match into backreference number 2 «(\*/)?»
// Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
// Match the character “*” literally «\*»
// Match the character “/” literally «/»
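A sketch of a different technique, in case a single [^…]* pattern proves too fragile: match each complete comment, then swap ';' for '~' only inside that match using a MatchEvaluator. The --[^\r\n]* branch cannot spill past the end of a line comment into the next statement, and every ';' inside a comment is replaced, not only the first one. It assumes comment markers never occur inside string literals, which this pattern does not check.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample PL/SQL from the question.
string code =
    "--comment 1 with;\n" +
    "select id from t_id;\n" +
    "--comment 2 with ;\n" +
    "select name from t_id;\n" +
    "/*comment 3\n" +
    "with ;*/";

// Match whole comments: "--" up to the end of the line, or a lazy "/* ... */"
// block ([\s\S] lets it span line breaks).
var comment = new Regex(@"--[^\r\n]*|/\*[\s\S]*?\*/");

// Replace every ';' inside each matched comment; code outside comments is untouched.
string result = comment.Replace(code, m => m.Value.Replace(';', '~'));

Console.WriteLine(result);
```

Because the evaluator works on the whole comment text, a comment such as "--a; b;" becomes "--a~ b~" in one pass.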

A regex is not really needed: you can iterate over the lines, find the ones starting with "--", and replace ";" with "~" in them.
String.StartsWith("--") - Determines whether the beginning of an instance of String matches a specified string.
String.Replace(";", "~") - Returns a new string in which all occurrences of a specified Unicode character or String in this instance are replaced with another specified Unicode character or String.
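The line-by-line approach described above might look like this minimal sketch. It assumes line comments begin their line (leading whitespace allowed); a trailing comment such as "select 1; -- x;" and /* */ blocks are left untouched, so they would need separate handling.

```csharp
using System;
using System.Linq;

string code =
    "--comment 1 with;\n" +
    "select id from t_id;\n" +
    "--comment 2 with ;\n" +
    "select name from t_id;";

// Rewrite only lines whose first non-blank characters are "--";
// all other lines are passed through unchanged.
string result = string.Join("\n",
    code.Split('\n').Select(line =>
        line.TrimStart().StartsWith("--", StringComparison.Ordinal)
            ? line.Replace(";", "~")
            : line));

Console.WriteLine(result);
```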

Related

Pattern Match At Specific Location For Validation

With these data examples:
/test -test/test/2016/April
/test -test/test/2016
How can a pattern determine whether or not the number 2016 is located in this exact position?
A regex pattern can validate content and, as you imply, its position. The key is to set up pattern anchors based on the literal characters that appear before the number.
For your case you have a literal /, then text, then a literal -, then literal / separators with text between them, and so on. By alternating those literal anchors with generic text, you can require the number to sit at a specific position.
Other numbers could still produce false matches (noise, so to speak), and you appear to be looking for a year. The following makes sure that only a four-digit year starting with 19 or 20 is valid at that position.
string pattern = @"
^ # Beginning of line (anchor)
/ # / anchor
[^-]+ # Anything not a dash.
- # Anchor dash
[^/]+ # Anything not a /
/ # / anchor
[^/]+ # Anything not a /
/ # / anchor
[12][90]\d\d # Year-like field: 19XX or 20XX (the classes also let 10XX and 29XX through).
";
// IgnorePatternWhitespace *only* lets us comment the pattern
// and place it on multiple lines (unescaped whitespace in the pattern is ignored);
// it does not affect how the data is processed.
// Compiled compiles the pattern to IL, which costs more at start-up
// but speeds up matching when the regex is reused.
var validator = new Regex(pattern, RegexOptions.IgnorePatternWhitespace |
RegexOptions.Compiled);
validator.IsMatch("/ -test/test/2016/April"); // True
validator.IsMatch("/ -test/test/2016"); // True
validator.IsMatch("/ -test/test/1985/April"); // True
validator.IsMatch("/ -2017/test/1985/April"); // True
// Negative Tests
validator.IsMatch("/ -2017/test/WTF/April"); // False
validator.IsMatch("/jabberwocky/test/1985/April"); // False, no dash!
validator.IsMatch("////April"); // False
validator.IsMatch("///2016/April"); // False because no text between `/`
validator.IsMatch("/ -test/test/ 2016/April"); // False because pattern
// does not allow a space
Pattern Notes
Instead of looking for the date with \d\d\d\d, I am giving the regex parser a more specific hint that this is a year in the twentieth century, 19XX, or the twenty-first century, 20XX. So I spell out the first two places of the \d\d\d\d pattern as character sets: [12] for the first digit (1 for a 19XX year or 2 for a 20XX year), followed by [90] for the second digit (a nine or a zero). Because each set is checked independently, this also lets 10XX and 29XX through; use (?:19|20)\d\d if those must be rejected. In a modern computer system most dates will fall within these two centuries, so it makes sense to craft the regex that way.
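To make the caveat about the character classes concrete, this sketch compares [12][90]\d\d with the alternation (?:19|20)\d\d on a few sample years. The years are made up for illustration, and the patterns are anchored with ^ and $ so only the year itself is tested.

```csharp
using System;
using System.Text.RegularExpressions;

// [12][90] checks each leading digit on its own, so 10 and 29 also pass.
var loose = new Regex(@"^[12][90]\d\d$");
// An alternation pins the first two digits to exactly 19 or 20.
var strict = new Regex(@"^(?:19|20)\d\d$");

foreach (var year in new[] { "1985", "2016", "1066", "2999" })
{
    Console.WriteLine($"{year}: loose={loose.IsMatch(year)}, strict={strict.IsMatch(year)}");
}
```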
Assuming, that "exact position" means "third position", the following regex would work:
/(?:[^/]*/){2}(\d{4}).*
In C#, this can be used with the Regex constructor and the @"" verbatim string syntax, which removes the need to escape backslashes:
var rx = new Regex(@"/(?:[^/]*/){2}(\d{4}).*");
If this regex matches a string, the four digits of the year are captured as a result.
Explanation
/ matches the leading slash character.
[^/]* matches any sequence of characters other than a slash.
/ matches a slash character.
The preceding two parts are now wrapped in non-capturing parentheses, which are marked with ?: as the first two characters inside them.
Since (?:[^/]*/) now matches a "path segment" like "test/", it must match exactly two times in a row. That's why the parentheses are followed by the quantifier {2}.
Then the actual number must be matched: it consists of four digits in a row. This is written as (\d{4}), where \d means "any digit" and, once again, the quantifier says there must be 4 in a row.
Finally, there can be arbitrary characters after the number ("the path can continue"). This is specified by . ("match any character" except a newline) and the quantifier *, which means "any number of occurrences".
Note: There are many dialects of regular expressions. This one works with the C# (.NET) regex implementation, but it should work with many others as well.
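A minimal usage sketch of the pattern above, reading the year from group 1. It also shows that, because the pattern is not anchored, the engine may start at any '/' in the input, so a deeper path can still match. The ^ anchor in the second regex is an assumption about what "exact position" should mean, not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

// The pattern from this answer (unanchored).
var rx = new Regex(@"/(?:[^/]*/){2}(\d{4}).*");
// Anchored variant: the year must follow exactly the first two segments.
var anchored = new Regex(@"^/(?:[^/]*/){2}(\d{4})");

Match m = rx.Match("/test -test/test/2016/April");
Console.WriteLine(m.Success ? m.Groups[1].Value : "no match");   // 2016

// Here the year is the fourth segment, yet the unanchored pattern still
// matches by starting at a later '/'.
string deeper = "/a/b/c/2016";
Console.WriteLine(rx.IsMatch(deeper));        // True
Console.WriteLine(anchored.IsMatch(deeper));  // False
```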
Your regex will be:
\-(?:[^\/]+\/){2}(\d+)
It captures the number that appears after the -xx/xx/ part; the number of xx/ segments ({2}) can be adjusted.
Example:
using System;
using System.Text.RegularExpressions;
var s1 = "/test -test/test/2016/April";
var s2 = "/test -test/test/2016";
var rx = new Regex ("\\-(?:[^\\/]+\\/){2}(\\d+)");
var m1 = rx.Match(s1);
var m2 = rx.Match(s2);
if (m1.Success && m2.Success) {
if (m1.Groups[1].Value == m2.Groups[1].Value) {
Console.WriteLine ("s1 == s2");
}
}
Based on provided input string s1 and s2, it will print:
s1 == s2

C# regular expression to match a character not following pairs of the same character

Objective: Regex Matching
For this example I'm interested in matching a "|" pipe character.
I need to match it if it's alone: "aaa|aaa"
I need to match it (the last pipe) only if it's preceded by pairs of pipe: (2,4,6,8...any even number)
Another way: I want to ignore ALL pipe pairs "||" (right to left)
or I want to select bachelor bars only (the odd man out)
// The last pipe of each odd-length run should match:
string twomatches = "aaaaaaaaa|||||aaaaaa|||aaaaaa";
string onematch = "aaaaaaaaa|||aaaaaaa||aaaaaaaa";
string noMatch1 = "||";
string noMatch2 = "||||";
I'm trying to select the last "|" only when preceded by an even sequence of "|" pairs or in a string when a single bar exists by itself.
Regardless of the number of "|"
You may use the following regex to select just odd one pipe out:
(?<=(?<!\|)(?:\|{2})*)\|(?!\|)
See regex demo.
The regex breakdown:
(?<=(?<!\|)(?:\|{2})*) - if a pipe is preceded by an even number of pipes ((?:\|{2})* - 0 or more sequences of exactly 2 pipes) from a position that has no preceding pipe ((?<!\|))
\| - match an odd pipe on the right
(?!\|) - if it is not followed by another pipe.
Please note that this regex uses a variable-width look-behind and is very resource-consuming. I'd rather use a capturing group mechanism here, but it all depends on the actual purpose of matching that odd pipe.
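A quick check of the lookbehind pattern against the question's sample strings (with the bold markers removed), counting matches with Regex.Matches. The "aaa|aaa" sample is the single-pipe case from the question.

```csharp
using System;
using System.Text.RegularExpressions;

// Odd-one-out pipe: a '|' not followed by another '|', preceded by an even
// number of pipes that starts right after a non-pipe position.
var oddPipe = new Regex(@"(?<=(?<!\|)(?:\|{2})*)\|(?!\|)");

string single = "aaa|aaa";
string twoMatches = "aaaaaaaaa|||||aaaaaa|||aaaaaa";
string oneMatch = "aaaaaaaaa|||aaaaaaa||aaaaaaaa";
string noMatch1 = "||";
string noMatch2 = "||||";

foreach (var s in new[] { single, twoMatches, oneMatch, noMatch1, noMatch2 })
{
    Console.WriteLine($"{s}: {oddPipe.Matches(s).Count}");
}
```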
Here is a modified version of the regex for removing the odd one out:
var s = "1|2||3|||4||||5|||||6||||||7|||||||";
var data = Regex.Replace(s, @"(?<!\|)(?<even_pipes>(?:\|{2})*)\|(?!\|)", "${even_pipes}");
Console.WriteLine(data);
See IDEONE demo. Here, the quantified part is moved from lookbehind to an even_pipes named capturing group, so that it could be restored with the backreference in the replaced string. Regexhero.net shows 129,046 iterations per second for the version with a capturing group and 69,206 with the original version with variable-width lookbehind.
Only use variable-width look-behind if it is absolutely necessary!
Oh, it's reopened! If you need better performance, also try this improved version, which uses a negative lookbehind.
\|(?!\|)(?<!(?:[^|]|^)(?:\|\|)*)
The idea here is to first match the last literal | at the right side of a sequence (or a single |), and then apply a negated version of the lookbehind just after the match. This should perform considerably better.
\|(?!\|) matches a literal | if it is NOT followed by another pipe character (the rightmost pipe of a sequence).
(?<!(?:[^|]|^)(?:\|\|)*) succeeds if the position right after the matched | is NOT preceded by (?:\|\|)*, any number of literal || pairs, back to a non-| character or the ^ start. In other words: this position must not be preceded by an even number of pipe characters.
By the way, using \|{2} instead of \|\| gives no performance gain; it may just be more readable.
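One way to gain confidence that the negated-lookbehind version is a drop-in replacement: this sketch runs both patterns over small generated strings (pipe runs of length 0 to 9, alone and next to letters) and compares the match positions. It is a spot check over those shapes only, not a proof.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var lookbehindVersion = new Regex(@"(?<=(?<!\|)(?:\|{2})*)\|(?!\|)");
var negatedVersion = new Regex(@"\|(?!\|)(?<!(?:[^|]|^)(?:\|\|)*)");

bool allAgree = true;
for (int n = 0; n <= 9; n++)
{
    string pipes = new string('|', n);
    foreach (var s in new[] { pipes, "a" + pipes, pipes + "a", "a" + pipes + "b" + pipes })
    {
        // Compare the positions of the "odd one out" pipes found by each pattern.
        var first = lookbehindVersion.Matches(s).Cast<Match>().Select(m => m.Index);
        var second = negatedVersion.Matches(s).Cast<Match>().Select(m => m.Index);
        if (!first.SequenceEqual(second))
        {
            allAgree = false;
            Console.WriteLine($"Mismatch on \"{s}\"");
        }
    }
}
Console.WriteLine(allAgree ? "Both patterns agree" : "Patterns differ");
```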
See demo at regexstorm

Regex - documentation

Hello all, I am going through some old code and ran across a regex. I can't figure out what it does; can anyone shed some light on it?
<(.|\n)*?>|{(.|\n)*?}
It was used in a string replace statement.
Put your regex into Regex101.com
At the bottom is a guide titled "Your regular expression explained".
According to RegexBuddy, this is what it does:
Match either the regular expression below (attempting the next alternative only if this one fails) «<(.|\n)*?>»
Match the character “<” literally «<»
Match the regular expression below and capture its match into backreference number 1 «(.|\n)*?»
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Note: You repeated the capturing group itself. The group will capture only the last iteration. Put a capturing group around the repeated group to capture all iterations. «*?»
Match either the regular expression below (attempting the next alternative only if this one fails) «.»
Match any single character that is not a line break character «.»
Or match regular expression number 2 below (the entire group fails if this one fails to match) «\n»
Match a line feed character «\n»
Match the character “>” literally «>»
Or match regular expression number 2 below (the entire match attempt fails if this one fails to match) «{(.|\n)*?}»
Match the character “{” literally «{»
Match the regular expression below and capture its match into backreference number 2 «(.|\n)*?»
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Note: You repeated the capturing group itself. The group will capture only the last iteration. Put a capturing group around the repeated group to capture all iterations. «*?»
Match either the regular expression below (attempting the next alternative only if this one fails) «.»
Match any single character that is not a line break character «.»
Or match regular expression number 2 below (the entire group fails if this one fails to match) «\n»
Match a line feed character «\n»
Match the character “}” literally «}»
The Matches are:
<>
<...>
{}
{...}
where ... is any text, including line breaks
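A small sketch of what the pattern does in a Replace call: it removes <...> and {...} spans, even across line breaks. The sample input is invented for illustration. In .NET, '.' does not match '\n', which is why the original pattern spells out (.|\n); the second call shows an equivalent spelling using RegexOptions.Singleline, under which '.' matches '\n' as well.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "Hello <b>world</b> {name}!\n<div\nclass=\"x\">text</div>";

// The pattern from the question: a lazy run of any characters between
// '<' and '>' or between '{' and '}'.
string original = Regex.Replace(input, @"<(.|\n)*?>|{(.|\n)*?}", "");

// Same effect: with Singleline, '.' also matches '\n'.
string singleline = Regex.Replace(input, @"<.*?>|{.*?}", "", RegexOptions.Singleline);

Console.WriteLine(original);
Console.WriteLine(original == singleline);   // True
```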

.NET Regex: negate previous character for the first character in string

Consider following string
"Some" string with "quotes" and \"pre-slashed\" quotes
Using regex, I want to find all the double quotes with no slash before them. So I want the regex to find four matches for the example sentence
This....
[^\\]"
...would find only three of them. I suppose that's because of the regex's state machine which is first validating the command to negate the presence of the slash.
That means I need to write a regex with some kind of look-behind, but I don't know how to work with lookaheads and lookbehinds... I'm not even sure that's what I'm looking for.
The following attempt returns 6, not 4 matches...
"(?<!\\)
"(?<!\\")
Is what you're looking for
If you want to match "Some" and "quotes", then
(?<!\\")(?!\\")"[a-zA-Z0-9]*"
will do it.
Explanation:
(?<!\\") - Negative lookbehind. Specifies a group that cannot match right before your main expression
(?!\\") - Negative lookahead. Specifies a group that cannot match at the position where your main expression starts
"[a-zA-Z0-9]*" - String to match between regular quotes
Which means: match a run of letters and digits inside double quotes that does not come right after \"
You almost got it, move the quote after the lookbehind, like:
(?<!\\)"
Also beware of cases like
"escaped" backslash \\"string\"
You can use an expression like this to handle those:
(?<!\\)(?:\\\\)*"
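A sketch comparing the patterns from this thread on the question's sentence, counting matches with Regex.Matches. The [^\\]" attempt has to consume a character before the quote, so it cannot match the quote at the very start of the string; both lookbehind forms only assert, so they find all four. The last line checks the escaped-backslash case with (?<!\\)(?:\\\\)*".

```csharp
using System;
using System.Text.RegularExpressions;

// "Some" string with "quotes" and \"pre-slashed\" quotes
string text = "\"Some\" string with \"quotes\" and \\\"pre-slashed\\\" quotes";

int Count(string pattern, string input) => Regex.Matches(input, pattern).Count;

Console.WriteLine(Count(@"[^\\]""", text));      // 3
Console.WriteLine(Count(@"(?<!\\)""", text));    // 4
Console.WriteLine(Count(@"""(?<!\\"")", text));  // 4

// "escaped" backslash \\"string\"
string tricky = "\"escaped\" backslash \\\\\"string\\\"";
Console.WriteLine(Count(@"(?<!\\)(?:\\\\)*""", tricky));  // 3
```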
Try this
(?<!\\)(?<qs>"[^"]+")
Explanation
(?<!\\)(?<qs>"[^"]+")
Options: case insensitive
Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind) «(?<!\\)»
Match the character “\” literally «\\»
Match the regular expression below and capture its match into backreference with name “qs” «(?<qs>"[^"]+")»
Match the character “"” literally «"»
Match any character that is NOT a “"” «[^"]+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the character “"” literally «"»
Code:
try {
if (Regex.IsMatch(subjectString, @"(?<!\\)(?<qs>""[^""]+"")", RegexOptions.IgnoreCase)) {
// Successful match
} else {
// Match attempt failed
}
} catch (ArgumentException ex) {
// Syntax error in the regular expression
}
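A usage sketch for the qs group: instead of only testing with IsMatch, iterate over the matches and read each quoted string through the named group. The subject is the question's example sentence.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string subject = "\"Some\" string with \"quotes\" and \\\"pre-slashed\\\" quotes";

// Same pattern as above; each match exposes the quoted text via the "qs" group.
var quoted = new Regex(@"(?<!\\)(?<qs>""[^""]+"")");

var found = new List<string>();
foreach (Match m in quoted.Matches(subject))
{
    found.Add(m.Groups["qs"].Value);
}
Console.WriteLine(string.Join(", ", found));
```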

Regular Expression for string

I have a string like
e.g. AHDFFH XXXX
where 'AHDFFH' can be a character string of any length,
and 'XXXX' is a run of repeated 'X' characters of any length, which needs to be replaced by an auto-incremented value from a database table.
I need to find the repeated 'X' characters in the string above using a regular expression.
Can anyone please help me figure this out?
Try this:
\b(\p{L})\1+\b
Explanation:
\b(\p{L})\1+\b
Options: case insensitive; ^ and $ match at line breaks
Assert position at a word boundary «\b»
Match the regular expression below and capture its match into backreference number 1 «(\p{L})»
A character with the Unicode property “letter” (any kind of letter from any language) «\p{L}»
Match the same text as most recently matched by capturing group number 1 «\1+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Assert position at a word boundary «\b»
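A sketch of the replacement step the question describes. nextId is a hypothetical stand-in for the auto-incremented database value; fetching it is out of scope here. Note that this pattern accepts any whole word made of one repeated letter (e.g. "AA"), so if the placeholder is always X, the narrower \bX+\b may be safer.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "AHDFFH XXXX";

// A whole word made of one letter repeated two or more times.
var repeated = new Regex(@"\b(\p{L})\1+\b");

// Hypothetical value; in the real code this would come from the database.
int nextId = 42;

// The count argument (1) replaces only the first run found.
string result = repeated.Replace(input, nextId.ToString(), 1);
Console.WriteLine(result);   // AHDFFH 42
```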
Do you mean some characters, followed by one or more spaces, followed by some digits?
If so, you can use this regular expression:
\w+\s+(\d+)
C# code like this:
System.Text.RegularExpressions.Regex regex = new System.Text.RegularExpressions.Regex(@"\w+\s+(\d+)");
System.Text.RegularExpressions.Match m = regex.Match("aaaa 3333");
if(m.Success) {
MessageBox.Show(m.Groups[1].Value);
}
