Replace certain combination of characters with another - c#

In C# I'm trying to replace characters in a string. To be more precise, wherever there is a double quote which is NOT followed, nor preceded, by a comma, I'd like to replace that double quote with a single quote. So, for example:
John",123
and
123,"John
are both fine, because there is a comma either before or after the double quote, but:
John"Marks
is not fine because there is a double quote which is neither preceded nor followed by a comma, so it should be replaced with a single quote. I.e. it should become:
John'Marks
I'm struggling to figure this one out... any ideas anyone? Thanks...

You can use look arounds for your search regex:
(?<!,)"(?!,)
RegEx Breakup:
(?<!,) - Negative Lookbehind to assert previous character is not a comma
" - Match a double quote
(?!,) - Negative Lookahead to assert next character is not a comma
Replacement string would be just a single quote "'"
Code:
string repl = Regex.Replace(str, @"(?<!,)""(?!,)", "'");
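As a quick check of the lookaround pattern above, here is a minimal sketch run against the three inputs from the question. The class and method names (`LoneQuoteReplacer`, `Replace`) are made up for illustration and are not part of the original answer.

```csharp
using System.Text.RegularExpressions;

public static class LoneQuoteReplacer
{
    // Matches a double quote with no comma directly before or after it.
    // Inside a verbatim string (@"..."), a literal double quote is written as "".
    private static readonly Regex LoneQuote = new Regex(@"(?<!,)""(?!,)");

    // Hypothetical helper: replaces every such quote with a single quote.
    public static string Replace(string input)
    {
        return LoneQuote.Replace(input, "'");
    }
}
```

Calling `LoneQuoteReplacer.Replace` on `John"Marks` should give `John'Marks`, while `John",123` and `123,"John` should come back unchanged.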

Related

Regex for escaping some single quotes

I was creating some regex for matching strings like :
'pulkit'
'989'
basically anything in between the two single quotes.
so I created a regex something like ['][^']*['].
But this is not working for cases like:
'burger king's'. The expected output is burger king's but from my logic
it is burger king only.
As another example, for 'pulkit'sharma' the expected output should be pulkit'sharma
So can anyone help me with this? How do I handle the inner single quotes in this case?
Try a positive lookahead to match a space or end of line for matching the closing single quote
'.+?'(?=\s|$)
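A small sketch of how this lookahead pattern might be applied in C#. The names `QuotedSpanFinder` and `Find` are assumptions for illustration. Note that the matched values include the surrounding single quotes, and that an inner apostrophe followed by a space (as in `'the dogs' bowl'`) would likely end the match early.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class QuotedSpanFinder
{
    // Hypothetical helper: returns each quoted span, quotes included.
    // The closing quote must be followed by whitespace or the end of the input.
    public static List<string> Find(string text)
    {
        return Regex.Matches(text, @"'.+?'(?=\s|$)")
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();
    }
}
```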
You may match single quote that is not preceded with a word char and is followed with a word char, and match any text up to the ' that is preceded with a word char and not followed with a word char:
(?s)\B'\b(.*?)\b'\B
Note you do not have to wrap single quotation marks with square brackets, they are not special regex metacharacters.
C# code:
var matches = Regex.Matches(text, @"(?s)\B'\b(.*?)\b'\B")
.Cast<Match>()
.Select(x => x.Groups[1].Value)
.ToList();
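To try the word-boundary pattern on the strings from the question, the snippet above can be wrapped in a small helper. This is only a sketch; `QuotedTextExtractor` and `Extract` are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class QuotedTextExtractor
{
    // Hypothetical helper: returns the text between an opening ' (not preceded
    // by a word character, followed by one) and a closing ' (preceded by a
    // word character, not followed by one). Group 1 excludes the outer quotes.
    public static List<string> Extract(string text)
    {
        return Regex.Matches(text, @"(?s)\B'\b(.*?)\b'\B")
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .ToList();
    }
}
```

For `'burger king's'` this should return the single value `burger king's`, and for `'pulkit'sharma'` the value `pulkit'sharma`.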

Regex to replace single quote by ignoring "already replaced single quote" and "beginning/ ending single quote"

I have a string like this:
var path = "'Ah'This is a 'sample\'e'";
In the above string, the beginning and ending single quotes (just inside the double quotes) are as expected,
i.e "'...............'";
In the rest of the string there are single quotes, both already escaped (i.e. \') and unescaped. I need to escape every single quote that is not already escaped. If it is already escaped, no action is needed. I am having a hard time finding a suitable regex for this.
After replacing, the string must look like this (please note that the beginning and ending single quotes must not be replaced):
"'Ah\'This is a \'sample\'e'";
Could someone please help?
You may use
s = Regex.Replace(s, @"(?<!\\)(?!^)'(?!$)", @"\'");
Details
(?<!\\) - a negative lookbehind that matches a location in string that is not immediately preceded with \
(?!^) - a negative lookahead that matches a location in string that is not immediately followed with start of string (it is just failing the match if the current position is the start of string)
' - a ' char
(?!$) - a negative lookahead that matches a location in string that is not immediately followed with the end of string (it is failing the match if the current position is the end of string).
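A minimal sketch of the replacement applied to the sample from the question. The names `QuoteEscaper` and `EscapeInner` are hypothetical. One assumption worth flagging: the input is taken to really contain the backslash, so in C# source it would need to be written as a verbatim string (`@"'Ah'This is a 'sample\'e'"`), since in a regular string literal `\'` is just an escape for `'`.

```csharp
using System.Text.RegularExpressions;

public static class QuoteEscaper
{
    // Hypothetical helper: escapes single quotes that are not already escaped
    // and that are not the first or last character of the string.
    public static string EscapeInner(string s)
    {
        // The replacement is a verbatim string, so it is a literal backslash plus '.
        // Backslash has no special meaning in .NET replacement patterns.
        return Regex.Replace(s, @"(?<!\\)(?!^)'(?!$)", @"\'");
    }
}
```

With the input `'Ah'This is a 'sample\'e'` this should produce `'Ah\'This is a \'sample\'e'`.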

Regex quotes and not a specific word before quotes

What I'm trying to do is simple, but at the same time it is not.
I have a regex in C# that finds all the text inside quotes,
but if a specific word appears before the quotes, it should ignore the whole match and continue to the next row,
while also ignoring quoted text that contains certain symbols.
Example -
My RegEx = @"(?<!Foo\()\""[^{}\r\n]*\""";
Text -
dontfindme1 = "Hello{}"
dontfindme2 = Foo("ABC")
findme1 = "Just a simple text to find"
findme2 = SuperFoo("WORKS")
Output example -
"ABC"
"Just a simple text to find"
"WORKS"
Now my problem is that I don't want to match when the name "Foo(" comes before the quotes,
and I don't want to match "{" or "}" or "(" or ")" or new lines.
I only need "ABC" not to be found, skipping to the next row.
You could use a negative lookahead (?! to check that the string does not match either {} between double quotes or Foo(
^(?!.*\bFoo\()(?!.*"[^"\r\n]*[{}][^"\r\n]*").*$
In C#: string pattern = @"^(?!.*\bFoo\()(?!.*""[^""\r\n]*[{}][^""\r\n]*"").*$";
Explanation
^ Assert the start of the string
(?! Negative lookahead, assert that what follows does not
.*\bFoo\( Match any character 0+ times followed by a word boundary and Foo(
) Close negative lookahead
(?! Negative lookahead, assert that what follows does not
.* Match any character 0+ times
"[^"\r\n]* Match a double quote, match 0+ times not ", \r, \n
[{}] Match { or }
[^"\r\n]*" Match 0+ times not ", \r, \n followed by matching a double quote
) Close negative lookahead
.* Match any character 0+ times
$ Assert the end of the string
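Because the pattern is anchored with ^ and $, it matches whole lines rather than just the quoted text, and it needs RegexOptions.Multiline when the input contains several rows. A rough sketch, assuming the rows are separated by "\n" (the names `LineFilter` and `KeepLines` are hypothetical):

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class LineFilter
{
    // Rejects lines containing the word Foo followed by "(", and lines that
    // have { or } between double quotes. Everything else matches as a whole line.
    private static readonly Regex Wanted = new Regex(
        @"^(?!.*\bFoo\()(?!.*""[^""\r\n]*[{}][^""\r\n]*"").*$",
        RegexOptions.Multiline);

    // Hypothetical helper: returns the lines that pass both lookaheads.
    public static List<string> KeepLines(string text)
    {
        return Wanted.Matches(text)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();
    }
}
```

On the four sample rows from the question, this should keep the `findme1` and `findme2` rows: `SuperFoo(` survives because `\b` requires a word boundary right before `Foo`.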

How to insert spaces between characters using Regex?

Trying to learn a little more about using Regex (Regular expressions). Using Microsoft's version of Regex in C# (VS 2010), how could I take a simple string like:
"Hello"
and change it to
"H e l l o"
This could be a string of any letter or symbol, capitals, lowercase, etc., and there are no other letters or symbols following or leading this word. (The string consists of only the one word).
(I have read the other posts, but I can't seem to grasp Regex. Please be kind :) ).
Thanks for any help with this. (an explanation would be most useful).
You can do this with regex alone; no other built-in C# functions are needed.
Use the regex below and replace each matched boundary with a space.
(?<=.)(?!$)
string result = Regex.Replace(yourString, @"(?<=.)(?!$)", " ");
Explanation:
(?<=.) Positive lookbehind asserts that the match must be preceded by a character.
(?!$) Negative lookahead which asserts that the match won't be followed by an end of the line anchor. So the boundaries next to all the characters would be matched but not the one which was next to the last character.
OR
You could also use word boundaries.
(?<!^)(\B|\b)(?!$)
string result = Regex.Replace(yourString, @"(?<!^)(\B|\b)(?!$)", " ");
Explanation:
(?<!^) Negative lookbehind which asserts that the match won't be at the start.
(\B|\b) Matches the boundary between two word characters or between two non-word characters (\B), or the boundary between a word character and a non-word character (\b).
(?!$) Negative lookahead asserts that the match won't be followed by an end of the line anchor.
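Both lookaround variants can be compared side by side with a short sketch like the one below. `CharSpacer` and its method names are made up for illustration. Both patterns match only zero-width positions, so the replacement inserts a space without consuming any character.

```csharp
using System.Text.RegularExpressions;

public static class CharSpacer
{
    // Hypothetical helper: matches every position that has a character before
    // it and is not at the end of the string.
    public static string WithLookarounds(string input)
    {
        return Regex.Replace(input, @"(?<=.)(?!$)", " ");
    }

    // Hypothetical helper: matches every boundary or non-boundary position
    // except the start and the end of the string.
    public static string WithBoundaries(string input)
    {
        return Regex.Replace(input, @"(?<!^)(\B|\b)(?!$)", " ");
    }
}
```

For the input `Hello`, both methods should return `H e l l o`.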
Regex.Replace("Hello", "(.)", "$1 ").TrimEnd();
Explanation
The dot character class matches every character of your string "Hello".
The parentheses around the dot are required so that we can refer to the captured character through the $n notation.
Each captured character is replaced by the replacement string. Our replacement string is "$1 " (notice the space at the end). Here $1 represents the first captured group in the input, therefore our replacement string will replace each character by that character plus one space.
This technique will add one space after the final character "o" as well, so we call TrimEnd() to remove that.
For the enthusiast, the same effect can be achieved through LINQ using this one-liner:
String.Join(" ", YourString.AsEnumerable())
or if you don't want to use the extension method:
String.Join(" ", YourString.ToCharArray())
It's very simple. To match any character use . dot and then replace with that character along with one extra space
Here parenthesis (...) are used for grouping that can be accessed by $index
Find what : "(.)"
Replace with "$1 "

Finding C#-style unescaped strings using regular expressions

I'm trying to write a regular expression that finds C#-style unescaped strings, such as
string x = @"hello
world";
The problem I'm having is how to write a rule that handles double quotes within the string correctly, like in this example
string x = @"before quote ""junk"" after quote";
This should be an easy one, right?
Try this one:
@".*?(""|[^"])"([^"]|$)
The first parentheses mean 'If there is a " before the finishing quote, it had better be two of them'; the second parentheses mean 'After the finishing quote, there should either be a non-quote character or the end of the line'.
How 'bout the regex @\"([^\"]|\"\")*\"(?=[^\"])
Due to greedy matching, the final lookahead clause is likely not to be needed in your regex engine, although it is more specific.
If I remember correctly, you have to use \"" - the double-double quotes to escape it for C# and the backslash to escape it for regex.
Try this:
@"[^"]*?(""[^"]*?)*";
It looks for the starting characters @", for the ending characters "; (you can leave the semicolon out if you need to) and in between it can have any characters except quotes, or if there are quotes they have to be doubled.
@"(?:""|[^"])*"(?!")
is the right regex for this job. It matches the @, a quote, then either two quotes in a row or any non-quote character, repeating this up to the next quote (that isn't doubled).
^@"(""|[^"])*"$ is the regex you want, looking for first an at-sign and a double-quote, then a sequence of any characters (except double-quotes) or double double-quotes, and finally a double-quote.
As a string literal in C#, you'd have to write it string regex = "^@\"(\"\"|[^\"])*\"$"; or string regex = @"^@""(""""|[^""])*""$";. Choose your poison.
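For comparison, here is a rough sketch that runs the unanchored pattern `@"(?:""|[^"])*"(?!")` from one of the answers above over a piece of source text. The names `VerbatimLiteralFinder` and `Find` are hypothetical, and the sketch makes no attempt to skip comments or regular string literals that happen to contain `@"`.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class VerbatimLiteralFinder
{
    // Regex: @"(?:""|[^"])*"(?!")
    // Written as a verbatim string, every literal " in the regex is doubled.
    private static readonly Regex Literal = new Regex(@"@""(?:""""|[^""])*""(?!"")");

    // Hypothetical helper: returns each verbatim string literal found in the
    // given source text, including the leading @ and both outer quotes.
    public static List<string> Find(string source)
    {
        return Literal.Matches(source)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToList();
    }
}
```

Since `[^"]` also matches line breaks, a literal that spans several lines is returned as one match. The trailing `(?!")` keeps the engine from backtracking into the middle of a doubled quote when the literal is unterminated.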
