Semi-complex string replace? - C#

I have this text
'Random Text', 'a\nb\\c\'d\\', 'ok'
I want it to become
'Random Text', 'a\nb\c''d\', 'ok'
The issue is escaping. Instead of escaping with \, I now escape only ' with ''. This is for a third-party program that I can't change, so I need to convert from one escaping method to the other.
The issue is \\'. If I do a plain string replace, it becomes \'' rather than \'. Also, \n is not a newline but the literal text \n, which shouldn't be modified. I tried using a regex, but I couldn't think of a way to say "if \' replace it with '', else if \\ replace it with \". Doing it in two steps obviously creates the problem.
How do I replace this string properly?

If I understand your question correctly, the issue lies in replacing \\ with \, which can then trigger another replacement if it occurs right before '. One technique is to replace it with an intermediate placeholder string that you're sure will not occur anywhere else, then replace the placeholder back after you're done.
var str = @"'Random Text', 'a\nb\\c\'d\\', 'ok'";
str = str.Replace(@"\\", "NON_OCCURRING_TEMP")
.Replace(@"\'", "''")
.Replace("NON_OCCURRING_TEMP", @"\");
As pointed out by @AlexeiLevenkov, you can also use Regex.Replace to do both modifications in a single pass.
var result = Regex.Replace(str, @"(\\\\)|(\\')",
match => match.Value == @"\\" ? @"\" : @"''");
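A self-contained version of the single-pass idea, so the conversion can be checked against the sample from the question. The class and method names (`EscapeConverter`, `ToDoubledQuotes`) are hypothetical, and the sketch assumes the only backslash escapes that need rewriting are `\\` and `\'`; any other sequence, such as the literal text `\n`, is left untouched.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: converts backslash escaping (\\ and \') into the
// quote-doubling style ('') expected by the third-party program.
public static class EscapeConverter
{
    // Matches either an escaped backslash or an escaped single quote.
    // Both are found in one left-to-right scan, so the backslash produced
    // from \\ is never re-examined as the start of a \' sequence.
    private static readonly Regex Escapes = new Regex(@"\\\\|\\'");

    public static string ToDoubledQuotes(string input)
    {
        // \\ becomes \ and \' becomes ''; everything else is copied as-is.
        return Escapes.Replace(input, match => match.Value == @"\\" ? @"\" : "''");
    }
}
```

For example, `EscapeConverter.ToDoubledQuotes(@"'Random Text', 'a\nb\\c\'d\\', 'ok'")` yields `'Random Text', 'a\nb\c''d\', 'ok'`, the output asked for in the question.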

It seems voithos' interpretation of the question is the right one. Another approach is to use a regex to find all the tokens at once and replace each of them with Regex.Replace.
Starting point:
var matches = new Regex(@"\\\\'|\\'|'");
Console.Write(matches.Replace(@"'a b\nc d\\e\'f\\'",
match =>"["+match + "]"));

Related

Text files - how to programmatically mimic opening in Wordpad and overwriting as plain text [duplicate]

How can I replace lone instances of \n with \r\n (LF alone with CRLF) using a regular expression in C#?
I know how to do it using plain String.Replace, like this:
myStr = myStr.Replace("\n", "\r\n");
myStr = myStr.Replace("\r\r\n", "\r\n");
However, this is inelegant, and it would also collapse any "\r" + "\r\n" sequence already in the text (although such sequences are not likely to exist).
It might be faster if you use this.
(?<!\r)\n
It basically looks for any \n that is not preceded by a \r. This would most likely be faster, because in the other case almost every character matches [^\r], so the engine would capture it and then look for a \n after it. With the pattern I gave, it only stops when it finds a \n, and then looks behind it to see whether there is a \r.
Will this do?
[^\r]\n
Basically it matches a '\n' that is preceded with a character that is not '\r'.
If you want it to detect lines that start with just a single '\n' as well, then try
([^\r]|$)\n
This says that it should match a '\n', but only one that is the first character of a line or one that is not preceded by '\r'.
There might be special cases to check: since you're messing with the definition of lines itself, the '$' might not work too well. But I think you get the idea.
EDIT: credit @Kibbee. Using a negative lookbehind is clearly better, since it won't capture the preceding character and should help with edge cases as well. So here's a better regex, and the code becomes:
myStr = Regex.Replace(myStr, "(?<!\r)\n", "\r\n");
I tried the code below on a string and it did not work:
myStr.Replace("(?<!\r)\n", "\r\n")
I used Regex.Replace and it worked
Regex.Replace( oldValue, "(?<!\r)\n", "\r\n")
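A runnable sketch of the lookbehind approach (the `LineEndings` class and `ToCrLf` name are hypothetical). The test also shows why the `String.Replace` attempt above had no effect: `String.Replace` searches for the pattern text literally instead of interpreting it as a regular expression.

```csharp
using System.Text.RegularExpressions;

public static class LineEndings
{
    // (?<!\r)\n is a negative lookbehind: it matches an LF only when the
    // character before it is not a CR, so existing CRLF pairs are left alone.
    // A verbatim string is used so the regex engine sees the \r and \n escapes.
    public static string ToCrLf(string text)
    {
        return Regex.Replace(text, @"(?<!\r)\n", "\r\n");
    }
}
```

For instance, `LineEndings.ToCrLf("a\nb\r\nc\n")` gives `"a\r\nb\r\nc\r\n"`, whereas `"a\nb".Replace(@"(?<!\r)\n", "\r\n")` returns `"a\nb"` unchanged.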
I guess that "myStr" is of type String; in that case, String.Replace does not use regex at all.
\r and \n are the equivalents for CR and LF.
My best guess is that if you know that you have an \n for EACH line, no matter what, then you should first strip out every \r and then replace all \n with \r\n.
The answer chakrit gives would also work, but then you need to use regex, and you don't say what "myStr" is...
Edit: looking at the other examples tells me one thing: why do it the difficult way when you can do it the easy way? The fact that regex exists does not mean you "must use" it :D
Edit2: A tool is very valuable when fiddling with regex, XPath and the like gives you strange results. May I point you to: http://www.regexbuddy.com/
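The strip-then-replace suggestion in this answer needs no regex. A minimal sketch (the class name is hypothetical), assuming every line ends in either LF or CRLF; note that any stray CR elsewhere in the text is deleted as a side effect.

```csharp
public static class LineEndingsTwoStep
{
    // Step 1 removes every CR, so CRLF and LF endings both become plain LF.
    // Step 2 turns each LF into CRLF. A lone CR that is not part of a line
    // ending is removed as well.
    public static string ToCrLf(string text)
    {
        return text.Replace("\r", "").Replace("\n", "\r\n");
    }
}
```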
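myStr = Regex.Replace(myStr, "([^\r])\n", "$1\r\n");
In .NET, $1 is the correct substitution syntax for the first capture group (some other regex flavors use \1 instead).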
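Try this: Replace(Char.ConvertFromUtf32(10), Char.ConvertFromUtf32(13) + Char.ConvertFromUtf32(10)). Note that, like the plain String.Replace in the question, this also turns an existing "\r\n" into "\r\r\n".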
If I know the line endings must be one of CRLF or LF, something that works for me is
myStr = Regex.Replace(myStr, "\r?\n", "\r\n");
This does essentially the same as neslekkiM's answer, except that it performs only one replace operation on the string rather than two. It is also compatible with regex engines that don't support negative lookbehinds or backreferences.
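A minimal sketch of the one-pass `\r?\n` replacement (the class name is hypothetical). Because a CRLF pair is matched as a unit and rewritten as itself, existing CRLF endings are not doubled, and the "\r\r\n" case mentioned in the question keeps its extra CR instead of being collapsed.

```csharp
using System.Text.RegularExpressions;

public static class LineEndingsOnePass
{
    // \r?\n matches both CRLF and a bare LF; every match is rewritten as CRLF.
    public static string ToCrLf(string text)
    {
        return Regex.Replace(text, @"\r?\n", "\r\n");
    }
}
```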

Match regex pattern in a line of text without targeting the text within quotations

Stackoverflow has been very generous with answers to my regex questions so far, but with this one I'm blanking on what to do and just can't seem to find it answered here.
So I'm parsing a string, let's say for example's sake, a line of VB-esque code like either of the following:
Call Function ( "Str ing 1 ", "String 2" , " String 3 ", 1000 ) As Integer
Dim x = "This string should not be affected "
I'm trying to parse the text in order to eliminate all leading spaces, trailing spaces, and extra internal spaces (when two "words/chunks" are separated by two or more spaces, or when there are one or more spaces between a character and a parenthesis) using regex in C#. The result after parsing the above should look like:
Call Function("Str ing 1 ", "String 2", " String 3 ", 1000) As Integer
Dim x = "This string should not be affected "
The issue I'm running into is that, I want to parse all of the line except any text contained within quotation marks (i.e. a string). Basically if there are extra spaces or whatever inside a string, I want to assume that it was intended and move on without changing the string at all, but if there are extra spaces in the line text outside of the quotation marks, I want to parse and adjust that accordingly.
So far I have the following regex which does all of the parsing I mentioned above, the only issue is it will affect the contents of strings just like any other part of the line:
var rx = new Regex(@"\A\s+|(?<=\s)\s+|(?<=.)\s+(?=\()|(?<=\()\s+(?=.)|(?<=.)\s+(?=\))|\s+\z");
.
.
.
lineOfText = rx.Replace(lineOfText, String.Empty);
Anyone have any idea how I can approach this, or know of a past question answering this that I couldn't find? Thank you!
Since you are reading the file line by line, you can use the following fix:
("[^"]*(?:""[^"]*)*")|^\s+|(?<=\s)\s+|(?<=\w)\s+(?=\()|(?<=\()\s+(?=\w)|(?<=\w)\s+(?=\))|\s+$
Replace each match with $1: string literals captured by ("[^"]*(?:""[^"]*)*") are written back unchanged, while for the other alternatives group 1 is empty, so the whitespace they match is removed.
See demo
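A C# sketch of the capture-and-restore technique. It keeps the answer's string-literal group but uses the question's original `(?<=.)`/`(?=.)` lookarounds, because the answer's `(?<=\()\s+(?=\w)` does not remove the space between `(` and a following quote. The `\s+(?=,)` alternative is an addition of this sketch, not part of either regex, to drop the space before a comma as in the question's expected output. Class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class LineCleaner
{
    // Alternative 1 captures a VB-style string literal ("" is an escaped quote)
    // so it is written back unchanged through $1. The other alternatives only
    // match whitespace outside literals; group 1 does not take part in those
    // matches, so $1 expands to an empty string and the whitespace is removed.
    private static readonly Regex Cleanup = new Regex(
        @"(""[^""]*(?:""""[^""]*)*"")" +  // string literal (kept)
        @"|\A\s+" +                       // leading whitespace
        @"|(?<=\s)\s+" +                  // extra internal whitespace
        @"|\s+(?=,)" +                    // whitespace before a comma (added)
        @"|(?<=.)\s+(?=\()" +             // whitespace before (
        @"|(?<=\()\s+(?=.)" +             // whitespace after (
        @"|(?<=.)\s+(?=\))" +             // whitespace before )
        @"|\s+\z");                       // trailing whitespace

    public static string Clean(string line) => Cleanup.Replace(line, "$1");
}
```

Applied to the two example lines from the question, this produces the two expected results shown there.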

Custom replace for json encoding not outputting double quotes as expected

After I created my own json encoder, I realized it was replacing double-quotes with two escaping backslashes instead of one.
I realize now that C# has a built-in Json.Encode() method, and yes, I have gotten it to work; however, I am left baffled by why the following code (the JSON encoder I had built) didn't replace quotes as I expected.
Here is my json encoder method:
public static string jsonEncode(string val)
{
val = val.Replace("\r\n", " ")
.Replace("\r", " ")
.Replace("\n", " ")
.Replace("\"", "\\\"")
.Replace("\\", "\\\\");
return val;
}
The replace call: Replace("\"", "\\\"") is replacing " with \\", which of course produces invalid json, as it sees the two backslashes (one as an escape character, much like the above C#) as a single 'real' backslash in the json file, thus not escaping the double-quote, as intended. The Replace("\\", "\\\\") call works perfectly, however (i.e., it replaces one backslash with two, as I would expect).
It is easy for me to tell that the Replace method is not performing the functions, based on my arguments, like I would expect. My question is why? I know I can't use Replace("\"", "\\"") as the backslash is also an escape character for C#, so it will produce a syntax error. Using Replace("\"", "\"") would also be silly, as it would replace a double-quote with a double-quote.
For better understanding of the replace method in C#, I would love to know why the Replace method is behaving differently than I'd expect. How does Json.Encode achieve this level of coding?
You're replacing " with \" and then replacing any backslashes with two backslashes... which will include the backslash you've already created. Perform the operations one at a time on paper and you'll see the same effect.
All you need to do is reverse the ordering of the escaping, so that you escape backslashes first and then quotes:
return val.Replace("\r\n", " ")
.Replace("\r", " ")
.Replace("\n", " ")
.Replace("\\", "\\\\")
.Replace("\"", "\\\"");
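A self-contained sketch of the corrected ordering, so the effect can be checked (class and method names are hypothetical). It is still not a complete JSON encoder: other control characters are not escaped, which is why a real encoder remains the better choice.

```csharp
public static class JsonEscaping
{
    // Backslashes are doubled *before* quotes are escaped, so the backslash
    // introduced by \" is not doubled a second time. Line breaks are flattened
    // to spaces, as in the original method.
    public static string JsonEncode(string val)
    {
        return val.Replace("\r\n", " ")
                  .Replace("\r", " ")
                  .Replace("\n", " ")
                  .Replace("\\", "\\\\")
                  .Replace("\"", "\\\"");
    }
}
```

With this order, the text `say "hi"\now` becomes `say \"hi\"\\now`: one backslash before each quote and a doubled original backslash.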
The problem is here:
Replace("\"", "\\\""); // convert " to \"
Replace("\\", "\\\\"); // which are then converted to \\"
The first line replaces " with \". The second line replaces those new \" with \\".
As Jon says, you need the replacement that escapes the escape character to run before introducing any escape characters.
But, I think you should use a real encoder. ;-)

c#.net Regex - Need to find a sequence of characters, then replace one character within it

I'm trying to build a regex that can look for names that contain apostrophes (O'Connor, O'Neil) and replace the apostrophes with 2 apostrophes (O''Connor, O''Neil).
I don't want to do this with all apostrophes in the string in question, just apostrophes that appear between two letters (upper or lower case). Now, I have no trouble finding instances of LETTER-APOSTROPHE-LETTER, but I'm not sure how to take that sequence and change the ' to ''.
You said this is for inserting values into a database. Don't do this - use parameterized queries instead, which will handle escaping properly. Jon Skeet says so.
new Regex("([a-zA-Z])'([a-zA-Z])").Replace(input, match => match.Groups[1] + "''" + match.Groups[2])
string result = Regex.Replace(input, @"(?<=[^'])(')(?=[^'])", "''");
This will work; I just tested it:
Regex.Replace(input, @"(\w)'(\w)", "$1''$2");
(O'Connor, O'Neil) turns into (O''Connor, O''Neil)
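Another way to express "an apostrophe between two letters" is with a lookbehind and a lookahead, which check the neighbouring letters without consuming them, so only the apostrophe itself is replaced and no `$1`/`$2` bookkeeping is needed. A minimal sketch with hypothetical names; as the answer above notes, parameterized queries are still the right fix when the goal is inserting values into a database.

```csharp
using System.Text.RegularExpressions;

public static class ApostropheDoubler
{
    // Matches ' only when an ASCII letter is directly before and after it.
    // Use \p{L} instead of [A-Za-z] if accented letters should count too.
    private static readonly Regex BetweenLetters = new Regex(@"(?<=[A-Za-z])'(?=[A-Za-z])");

    public static string DoubleApostrophes(string input) => BetweenLetters.Replace(input, "''");
}
```

Quotes that are not surrounded by letters, such as those in `'quoted'` or `rock 'n' roll`, are left alone.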

regex can't recognize "\n"?

So in the end (after a few days of debugging) I found the problem. It isn't in the regex at all :/. It turns out I was trimming extra white space with
input = Regex.Replace(input, "\\s+", " ");
so all newlines were replaced with " ". Stupid! Moderator, please remove this if it's unnecessary!
I have a regex for tokenizing some text, and it looks like this:
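The cause identified above is that `\s` also matches `\r` and `\n`. If the goal is to collapse only runs of spaces and tabs while keeping line breaks, one option is the class `[^\S\r\n]`, meaning "whitespace that is not CR or LF". A minimal sketch with hypothetical names:

```csharp
using System.Text.RegularExpressions;

public static class WhitespaceCollapser
{
    // \s includes \r and \n, so this also flattens line breaks into spaces.
    public static string CollapseAll(string input) => Regex.Replace(input, @"\s+", " ");

    // [^\S\r\n] is any whitespace character except CR and LF, so runs of
    // spaces and tabs are collapsed while line breaks survive.
    public static string CollapseHorizontal(string input) => Regex.Replace(input, @"[^\S\r\n]+", " ");
}
```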
"(?<html>Ç)|
(?<number>\\d+(?:[.]\\d+)?(?=[][ \f\n\r\t\v!?.,():;\"'„Ç]|$))|
(?<other>(?:[^][Ç \f\n\r\t\v!?.,():;\"'„A-Za-zčćšđžČĆŠĐŽäöÖü][^ Ç\f\n\r\t\vA-Za-zčćšđžČĆŠĐŽäöÖü]*)?[^][ Ç\f\n\r\t\v!?.,():;\"'„A-Za-zčćšđžČĆŠĐŽäöÖü](?=[][!?.,():;\"'„]*(?:$|[ Ç\f\n\r\t\v])))|
(?<word>(?:[^][ Ç\f\n\r\t\v!?.,():;\"'„][^ Ç\f\n\r\t\v]*)?[^][ Ç\f\n\r\t\v!?.,():;\"'„])|
(?<punctuation>[][ \f\n\r\t\v!?.,():;\"'„])"
The problem is in this part: (?<punctuation>[][ \f\n\r\t\v!?.,():;\"'„]). When I'm parsing text with the input "\n\n", the punctuation group matches " " and " " - in other words, space and space... and I don't know why.
I could be wrong, but you need to pass the escape sequences to the Regex as text, which means you need to escape the backslashes:
... (?=[][ \\f\\n\\r\\t\\v!?.,():;\\" ...
Otherwise C# will replace \n with an actual line break within the regex pattern.
Edit: It's also possible to use verbatim string literals, but they need to be prefixed with @ (see Martin's answer).
If you put an @ in front of the string, you can use single backslashes, and line breaks will be recognized.
@"(?<html>Ç)|
Set RegexOptions.IgnorePatternWhitespace
Update:
Are you sure [^] is correct? Unless it's some kind of character group (that I have never used), that will be the same as ".". The same goes for []. Perhaps I just haven't used all of RE before :p
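On the [] and [^] question: in .NET, a ] that comes first in a character class (directly after [ or [^) is taken literally, so the question's [][ ...] is a set that includes ] and [, and [^][ ...] excludes them. A small check, with hypothetical names:

```csharp
using System.Text.RegularExpressions;

public static class CharClassCheck
{
    // "[][]" is the set { ']', '[' }: the leading ']' is literal, the final ']' closes the class.
    public static bool IsBracket(string s) => Regex.IsMatch(s, @"^[][]$");

    // "[^][]" is any single character except ']' and '['.
    public static bool IsNotBracket(string s) => Regex.IsMatch(s, @"^[^][]$");
}
```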
