Semi-Fancy Regex.Replace() function - c#

Words placed after these punctuation marks must be capitalized (note that there may be spaces or special characters on either side of these when used):
dash ( - ), slash ( / ), colon ( : ), period ( . ), question mark ( ? ), exclamation point ( ! ), ellipsis (... OR …) (they are different)
I am sort of bogged down on this puzzle because of all of the special regex characters that I am trying to literally look for in my search. I believe I can use Regex.Escape although I cannot get it working for me right now in this case.
A few examples of starting strings and what they should be changed to:
Change this:
This is a dash - example
To this:
This is a dash - Example <--capitalize "Example" with Regex
This is another dash -example
This is another dash -Example
This is an ellipsis ... example
This is an ellipsis ... Example
This is another ellipsis …example
This is another ellipsis …Example
This is a slash / example
This is a slash / Example
This is a question mark ? example
This is a question mark ? Example
Here is the code I have so far:
private static string[] postCaps = { "-", "/", ":", "?", "!", "...", "…"};
private static string ReplacePostCaps(string strString)
{
foreach (string postCap in postCaps)
{
strString = Regex.Replace(strString, Regex.Escape(postCap), "/(?<=(" + Regex.Escape(postCap) + "))./", RegexOptions.IgnoreCase);
}
return strString;
}
Thank you very much!

You shouldn't need to iterate over a list of punctuation but instead could just add a character set in a single regex:
(?:[/:?!…-]|\.\.\.)\s*([a-z])
To use it with Regex.Replace():
strString = Regex.Replace(
    strString,
    @"(?:[/:?!…-]|\.\.\.)\s*([a-z])",
    m => m.ToString().ToUpper()
);
Regex Explained:
(?: # non-capturing group
[/:?!…-] # match any of these characters
| \.\.\. # *or* match three `.` characters in a row
)
\s* # allow any whitespace between matched character and letter
([a-z]) # match, and capture, a single lowercase character
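A minimal sketch of the single-pattern approach above, wrapped in a helper so it can be tried in isolation. The class and method names (PostCaps, Capitalize) are hypothetical and not from the original post. It upper-cases the whole match, which is harmless because the punctuation and whitespace have no upper-case form.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PostCaps
{
    // One punctuation mark (or three dots), optional whitespace,
    // then a single lowercase ASCII letter.
    private static readonly Regex AfterPunctuation =
        new Regex(@"(?:[/:?!…-]|\.\.\.)\s*([a-z])");

    // Hypothetical helper: capitalizes the first letter after each mark.
    public static string Capitalize(string input)
    {
        return AfterPunctuation.Replace(input, m => m.Value.ToUpperInvariant());
    }
}
```

Note that, as written, the pattern does not treat a single period as a trigger, and it will also capitalize after the hyphen in a word such as "well-known"; whether that is acceptable depends on the input.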

Maybe this works for you:
var phrase = "This is another dash ... example";
var rx = new System.Text.RegularExpressions.Regex(@"(?<=[\-./:?!]) *\w");
var newString = rx.Replace(phrase, new System.Text.RegularExpressions.MatchEvaluator(m => m.Value.ToUpperInvariant()));

Related

Regex to extract text between parenthesis paired with asterisk

This is slightly different from similar posts in that the parentheses are paired with an asterisk.
example input:
yada yada (* need to grab this text *) yoda
I thought Jennifer's post could be altered for this, but my attempts fail.
//Regex regex = new Regex("\\((?<TextInsideBrackets>\\w+)\\)"); //original
Regex regex = new Regex("\\(\\*(?<TextInsideBrackets>\\w+)\\*\\)"); // my attempt
string incomingValue = "Autocycleprestartcase := 20; (* Yayitme ve Konveyoru bosaltabilir *)";
string insideBrackets = null;
Match match = regex.Match(incomingValue);
if (match.Success)
{
insideBrackets = match.Groups["TextInsideBrackets"].Value;
}
Suggestions?
Also, I'd like to remove the enclosed text, together with the enclosing parenthesis/asterisk pairs, from the input line.
So the output of above would give me
yada yada yoda
and the value
need to grab this text
Thanks
Change it to
Regex regex = new Regex("\\(\\*(?<TextInsideBrackets>[\\w ]+)\\*\\)");
The character class [\\w ] (in place of \\w) allows spaces.
Here is a solution to get both the values while re-using the pattern dynamically:
string incomingValue = "Autocycleprestartcase := 20; (* Yayitme ve Konveyoru bosaltabilir *)";
string pattern = @"\(\*\s*(.*?)\s*\*\)";
string insideBrackets = Regex.Match(incomingValue, pattern).Groups[1].Value;
Console.WriteLine(insideBrackets); // => Yayitme ve Konveyoru bosaltabilir
Console.WriteLine(Regex.Replace(incomingValue, $@"\s*{pattern}", string.Empty)); // => Autocycleprestartcase := 20;
See the C# demo
Pattern details:
\( - a literal ( (note the single backslash is used as the string is defined via a verbatim string literal, @"")
\* - a literal *
\s* - 0+ whitespaces (trimming the value from the left)
(.*?) - Group 1 capturing zero or more chars other than newline, as few as possible, up to the first occurrence of the subsequent subpatterns
\s* - zero or more whitespaces (trimming from the right)
\* - a literal *
\) - a literal )
To get the second value, you may use the same pattern with \s* (zero or more whitespaces) added at the beginning, which is what Regex.Replace(incomingValue, $@"\s*{pattern}", string.Empty) does.
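The two operations described above can be sketched as a pair of small helpers. The names StarComments, Extract and Strip are made up for illustration; only the pattern comes from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StarComments
{
    // "(*", optional whitespace, lazily captured text, optional whitespace, "*)".
    private const string Pattern = @"\(\*\s*(.*?)\s*\*\)";

    // Returns the trimmed text of the first (* ... *) span,
    // or an empty string when the line has none.
    public static string Extract(string line)
    {
        Match m = Regex.Match(line, Pattern);
        return m.Success ? m.Groups[1].Value : string.Empty;
    }

    // Removes every (* ... *) span together with the whitespace in front of it.
    public static string Strip(string line)
    {
        return Regex.Replace(line, @"\s*" + Pattern, string.Empty);
    }
}
```

Because . does not match a newline by default, this sketch only handles spans that open and close on the same line.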

Regex matching excluding a specific context

I'm trying to search a string for words within single quotes, but only if those single quotes are not within parentheses.
Example string:
something, 'foo', something ('bar')
So for the given example I'd like to match foo, but not bar.
After searching for regex examples I'm able to match within single quotes (see below code snippet), but am not sure how to exclude matches in the context previously described.
string line = "something, 'foo', something ('bar')";
Match name = Regex.Match(line, @"'([^']*)");
if (name.Success)
{
string matchedName = name.Groups[1].Value;
Console.WriteLine(matchedName);
}
I would recommend using a negative lookbehind and a negative lookahead instead:
(?<!\()'([^']*)'(?!\))
Or with C#:
string line = "something, 'foo', something ('bar')";
Match name = Regex.Match(line, @"(?<!\()'([^']*)'(?!\))");
if (name.Success)
{
Console.WriteLine(name.Groups[1].Value);
}
The easiest way to get what you need is to use an alternation group and match and capture what you need and only match what you do not need:
\([^()]*\)|'([^']*)'
See the regex demo
Details:
\( - a (
[^()]* - 0+ chars other than ( and )
\) - a )
| - or
' - a '
([^']*) - Group 1 capturing 0+ chars other than '
' - a single quote.
In C#, use .Groups[1].Value to get the values you need. See the online demo:
var str = "something, 'foo', something ('bar')";
var result = Regex.Matches(str, @"\([^()]*\)|'([^']*)'")
    .Cast<Match>()
    .Where(m => m.Groups[1].Success) // skip matches of the parenthesized alternative
    .Select(m => m.Groups[1].Value)
    .ToList();
Another alternative is the one mentioned by Thomas, but since it is .NET, you may use infinite-width lookbehind:
(?<!\([^()]*)'([^']*)'(?![^()]*\))
See this regex demo.
Details:
(?<!\([^()]*) - a negative lookbehind failing the match if there is a ( followed with 0+ chars other than ( and ) up to the current location
'([^']*)' - a quote, 0+ chars other than single quote captured into Group 1, and another single quote
(?![^()]*\)) - a negative lookahead that fails the match if there are 0+ chars other than ( and ) followed with ) right after the ' from the preceding subpattern.
Since you'd want to exclude ', the same code as above applies.
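As a rough sketch of the "match what you do not need, capture what you do" technique, the helper below collects only the quoted words that were captured. QuotedNames and OutsideParentheses are hypothetical names. It checks Groups[1].Success because a match of the parenthesized alternative leaves Group 1 empty.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QuotedNames
{
    // First alternative consumes a whole (...) span without capturing;
    // second alternative captures the text between single quotes.
    private static readonly Regex Pattern = new Regex(@"\([^()]*\)|'([^']*)'");

    public static List<string> OutsideParentheses(string line)
    {
        var names = new List<string>();
        foreach (Match m in Pattern.Matches(line))
        {
            // Group 1 participates only when the quoted alternative matched.
            if (m.Groups[1].Success)
            {
                names.Add(m.Groups[1].Value);
            }
        }
        return names;
    }
}
```

This assumes parentheses are not nested and that quotes come in pairs; other inputs would need a real parser.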

Add prefix to special characters with Regular Expressions

I have a list of special characters that includes ^ $ ( ) % . [ ] * + - ?. I want to put % in front of these special characters in a string value.
I need this to generate a Lua script to use in Redis.
For example, Test$String? must be changed to Test%$String%?.
Is there any way to do this with regular expressions in C#?
In C#, you just need a Regex.Replace:
var LuaEscapedString = Regex.Replace(input, @"[][$^()%.*+?-]", "%$&");
See the regex demo
The [][$^()%.*+?-] character class will match a single character, either a ], [, $, ^, (, ), %, ., *, +, ? or - and will reinsert it back with the $& backreference in the replacement pattern pre-pending with a % character.
A lookahead is just a redundant overhead here (or a show-off trick for your boss).
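A small sketch of the character-class replacement, packaged so it can be called from code that builds a Lua script. LuaPattern.Escape is a made-up name, not an existing API.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LuaPattern
{
    // ']' is placed first and '-' last so that both are taken literally
    // inside the character class.
    private static readonly Regex Magic = new Regex(@"[][$^()%.*+?-]");

    // Hypothetical helper: prefixes every special character with '%'.
    public static string Escape(string input)
    {
        // "$&" re-inserts the matched character after the literal '%'.
        return Magic.Replace(input, "%$&");
    }
}
```

For instance, LuaPattern.Escape("Test$String?") yields Test%$String%?, the result asked for in the question.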
You can use lookaheads and replace with %
/(?=[]*+$?)[(.-])/
Regex Demo
(?=[]*+$?)[(.-]) Positive lookahead: checks whether the following character is any one of those in the character class []. If so, a % is inserted at that position.
You can use this regex: ([\\^$()%.\\[\\]*+\\-?])
It will match and capture characters inside the character class. Then you can use $1 to reference the captured character and insert % before it, like so: %$1.
Here is an example code and demo:
string input = "Test$String?";
string pattern = "([\\^$()%.\\[\\]*+\\-?])";
string replacement = "%$1";
Regex rgx = new Regex(pattern);
string result = rgx.Replace(input, replacement);
Console.WriteLine("Original String: {0}", input);
Console.WriteLine("Replacement String: {0}", result);
You can also match with the lookahead (?=[\^\$()%.\[\]*+\-?]) and replace each (empty) match with "%".

Regex removing empty spaces when using replace

My situation is not about removing empty spaces, but keeping them. I have strings of the form >[database values] that I would like to find. I created a regex to find them and then remove the >, [ and ]. The code below takes a string that comes from a document. The first pattern looks for anything of the form >[some stuff]; the second then "removes" the >, [ and ].
string decoded = "document in string format";
string pattern = @">\[[A-z, /, \s]*\]";
string pattern2 = @"[>, \[, \]]";
Regex rgx = new Regex(pattern);
Regex rgx2 = new Regex(pattern2);
foreach (Match match in rgx.Matches(decoded))
{
    string replacedValue = rgx2.Replace(match.Value, "");
    Console.WriteLine(match.Value);
    Console.WriteLine(replacedValue);
}
The output of my first Console.WriteLine is correct, so I get things like >[123 sesame St]. But the second output shows that the replace removes not just those characters but also the spaces, so I get something like 123sesameSt. I don't see any space being replaced in my regex. Am I forgetting something? Is it perhaps implicit in a replace?
The [A-z, /, \s] and [>, \[, \]] in your patterns are also looking for commas and spaces. Just list the characters without delimiting them, like this: [A-Za-z/\s]
string pattern = @">\[[A-Za-z/\s]*\]";
string pattern2 = @"[>\[\]]";
Edit to include Casimir's tip.
After rereading your question, I realize (if I understand correctly) that your two-step approach is unnecessary. You only need one replacement using a capture group:
string pattern = @">\[([^]]*)]";
Regex rgx = new Regex(pattern);
string result = rgx.Replace(yourtext, "$1");
pattern details:
>\[ # literals: >[
( # open the capture group 1
[^]]* # all that is not a ]
) # close the capture group 1
] # literal ]
the replacement string refers to the capture group 1 with $1
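A minimal sketch of the single-replacement idea: the marker characters are dropped and the captured text, spaces included, is kept. Placeholders.Unwrap is a hypothetical name. The brackets are escaped here purely for readability; the pattern is meant to behave the same as the unescaped one above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class Placeholders
{
    // ">[" followed by anything that is not "]" (captured), then "]".
    private static readonly Regex Marker = new Regex(@">\[([^\]]*)\]");

    // Replaces each >[...] marker with just its inner text.
    public static string Unwrap(string text)
    {
        return Marker.Replace(text, "$1");
    }
}
```

Since the replacement is applied to the whole document string at once, there is no need to loop over the matches first.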
By defining [>, \[, \]] in pattern2 you define a character class that matches any single one of the listed characters: >, the comma, the space, [ and ]. But I guess you don't want to match the space and the comma, so leave them out:
string pattern2 = @"[>\[\]]";
Alternatively, you could use
string pattern2 = @"(>\[|\])";
Thereby, you either match >[ or ] which better expresses your intention.

Regular expression to remove whitespace around a comma, except when quoted

I have a CSV file that has rows resembling this:
1, 4, 2, "PUBLIC, JOHN Q" ,ACTIVE , 1332
I am looking for a regular expression replacement that will match against these rows and spit out something resembling this:
1,4,2,"PUBLIC, JOHN Q",ACTIVE,1332
I thought this would be rather easy: I made the expression ([ \t]+,) and replaced it with ,. I made a complement expression (,[ \t]+) with a replacement of , and I thought I had achieved a good means of right-trimming and left-trimming strings.
...but then I noticed that my "PUBLIC, JOHN Q" was now "PUBLIC,JOHN Q" which isn't what I wanted. (Note the space following the comma is now gone).
What would be the appropriate expression to trim the white space before and after a comma, but leave quoted text untouched?
UPDATE
To clarify, I am using an application to handle the file. This application allows me to define multiple regular expression replacements; it does not provide a parsing capability. While this may not be the ideal mechanism for this, it would sure beat making another application for this one file.
If the engine used by your tool is the C# regular expression engine, then you can try the following expression:
(?<!,\s*"(?:[^\\"]|\\")*)\s+(?!(?:[^\\"]|\\")*"\s*,)
replace with empty string.
The other answers assume the quotes are balanced and use counting to determine whether a space is part of a quoted value.
My expression looks for all spaces that are not part of a quoted value.
RegexHero Demo
Something like this might do the job:
(?<!(^[^"]*"[^"]*(("[^"]*){2})*))[\t ]*,[ \t]*
Which matches [\t ]*,[ \t]*, only when not preceded by an odd number of quotes.
Using a CSV library or parsing the file yourself would be much easier, and IMO should be the preferred option here.
But if you really insist on a regex, you can use this one:
"\s+(?=([^\"]*\"[^\"]*\")*[^\"]*$)"
And replace it with empty string - ""
This regex matches one or more whitespace characters that are followed by an even number of quotes. This will of course work only if the quotes are balanced.
(?x) # Ignore Whitespace
\s+ # One or more whitespace characters
(?= # Followed by
( # A group - This group captures even number of quotes
[^\"]* # Zero or more non-quote characters
\" # A quote
[^\"]* # Zero or more non-quote characters
\" # A quote
)* # Zero or more repetition of previous group
[^\"]* # Zero or more non-quote characters
$ # Till the end
) # Look-ahead end
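The lookahead above can be tried out with a sketch like the following, assuming the tool's engine behaves like the .NET one. CsvSpaces.RemoveOutsideQuotes is a made-up name, and the group is written as non-capturing since its contents are never used.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CsvSpaces
{
    // Whitespace that is followed by an even number of quotes up to the end
    // of the input, i.e. whitespace that is not inside a quoted value.
    private static readonly Regex OutsideQuotes =
        new Regex(@"\s+(?=(?:[^""]*""[^""]*"")*[^""]*$)");

    // Deletes all whitespace outside quoted values (quotes must be balanced).
    public static string RemoveOutsideQuotes(string row)
    {
        return OutsideQuotes.Replace(row, string.Empty);
    }
}
```

Be aware that this removes every whitespace run outside quotes, not only those next to a comma, so an unquoted field such as JOHN Q would become JOHNQ. The lookahead also rescans the rest of the input for each run, which is fine for single rows but may be slow on very long inputs.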
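string format(string val)
{
    if (val.StartsWith("\"")) val = " " + val;
    // Even-indexed segments lie outside quotes; strip spaces and tabs from those only.
    string[] vals = val.Split('\"');
    for (int i = 0; i < vals.Length; i += 2) vals[i] = vals[i].Replace(" ", "").Replace("\t", "");
    // Rejoin with the quote characters that Split removed.
    return string.Join("\"", vals);
}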
This will work if you have properly closed quoted strings in between
Forget the regex (See Bart's comment on the question, regular expressions aren't suitable for CSV).
public static string ReduceSpaces( string input )
{
    char[] a = input.ToCharArray();
    int placeComma = 0, placeOther = 0;
    bool inQuotes = false;
    bool followedComma = true;
    foreach( char c in a ) {
        inQuotes ^= (c == '\"');
        if (c == ' ' && !inQuotes) {
            // Drop spaces right after a comma; keep others tentatively.
            if (!followedComma)
                a[placeOther++] = c;
        }
        else if (c == ',' && !inQuotes) {
            // Writing at placeComma overwrites spaces kept before the comma.
            a[placeComma++] = c;
            placeOther = placeComma;
            followedComma = true;
        }
        else {
            // Any other character, including everything inside quotes.
            a[placeOther++] = c;
            placeComma = placeOther;
            followedComma = false;
        }
    }
    return new String(a, 0, placeComma);
}
Demo: http://ideone.com/NEKm09
