After I created my own JSON encoder, I realized it was escaping double quotes with two backslashes instead of one.
I now realize that C# has a built-in Json.Encode() method, and I have gotten it to work. However, I am still baffled by why the following code (the JSON encoder I had built) didn't replace quotes as I would expect.
Here is my JSON encoder method:
public static string jsonEncode(string val)
{
val = val.Replace("\r\n", " ")
.Replace("\r", " ")
.Replace("\n", " ")
.Replace("\"", "\\\"")
.Replace("\\", "\\\\");
return val;
}
The replace call: Replace("\"", "\\\"") is replacing " with \\", which of course produces invalid json, as it sees the two backslashes (one as an escape character, much like the above C#) as a single 'real' backslash in the json file, thus not escaping the double-quote, as intended. The Replace("\\", "\\\\") call works perfectly, however (i.e., it replaces one backslash with two, as I would expect).
It is easy for me to tell that the Replace method is not doing what I would expect, given my arguments. My question is why? I know I can't use Replace("\"", "\\"") as the backslash is also an escape character for C#, so it will produce a syntax error. Using Replace("\"", "\"") would also be silly, as it would replace a double-quote with a double-quote.
For better understanding of the replace method in C#, I would love to know why the Replace method is behaving differently than I'd expect. How does Json.Encode achieve this level of coding?
You're replacing " with \" and then replacing any backslashes with two backslashes... which will include the backslash you've already created. Perform the operations one at a time on paper and you'll see the same effect.
All you need to do is reverse the ordering of the escaping, so that you escape backslashes first and then quotes:
return val.Replace("\r\n", " ")
.Replace("\r", " ")
.Replace("\n", " ")
.Replace("\\", "\\\\")
.Replace("\"", "\\\"");
The problem is here:
Replace("\"", "\\\""); // convert " to \"
Replace("\\", "\\\\"); // which are then converted to \\"
The first line replaces " with \". The second line replaces those new \" with \\".
As Jon says, you need the replacement that escapes the escape character to run before introducing any escape characters.
But, I think you should use a real encoder. ;-)
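To see the ordering effect concretely, here is a minimal sketch that runs both orderings on the same input. The class and method names (EscapeOrderDemo, QuotesFirst, BackslashesFirst) are made up for illustration and are not part of the original code.

```csharp
using System;

static class EscapeOrderDemo
{
    // Original order: quotes first, then backslashes.
    // The backslash added in step 1 is doubled again in step 2.
    public static string QuotesFirst(string val) =>
        val.Replace("\"", "\\\"").Replace("\\", "\\\\");

    // Corrected order: backslashes first, then quotes.
    public static string BackslashesFirst(string val) =>
        val.Replace("\\", "\\\\").Replace("\"", "\\\"");

    public static void Main()
    {
        string input = "say \"hi\"";
        Console.WriteLine(QuotesFirst(input));      // say \\"hi\\"
        Console.WriteLine(BackslashesFirst(input)); // say \"hi\"
    }
}
```

Only the second output is a valid JSON string body, because each quote is preceded by exactly one escaping backslash.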
Related
When I receive input via C# it comes in with the \ escaped. When I try to parse the string it causes an error because it's using \\r instead of \r in the string. Is there some way to prevent it from escaping the \, or perhaps to turn \\ into \ in the string? I've tried:
protected string UnEscape(string s)
{
if (s == "")
return " ";
return s.Replace(@"\\", @"\");
}
With no luck. Any other suggestions?
EDIT:
I was not specific enough as some of you seemed confused as to what I'm trying to achieve. In debug I was reading "\\t" in a string but I wanted "\t" not because I want to output \t but because I want to output a [tab]. With the code above I was sort of trying to recreate something that has already been done through Regex.Unescape(string).
The problem is that most .NET components do not process backslash escape sequences in strings: the compiler does it for them when the string is presented as a literal. However, there is another .NET component that processes escape sequences - the regex engine. You can use Regex.Unescape to do unescaping for you:
string escaped = @"Hello\thello\nWorld!";
string res = Regex.Unescape(escaped);
Console.WriteLine(res);
This prints
Hello hello
World!
Note that the example uses a verbatim string, so \t and \n are not replaced by the compiler. The string escaped is presented to the regex engine with single backslashes (although you would see double backslashes if you look at the string in the debugger).
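One way to confirm that the debugger's doubled backslashes are only a display convention is to check string lengths. This is a small sketch; the class name UnescapeLengthDemo is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class UnescapeLengthDemo
{
    public static void Main()
    {
        // Verbatim literal: exactly two characters, a backslash and 't'.
        string escaped = @"\t";
        Console.WriteLine(escaped.Length);      // 2

        // Regex.Unescape turns the two-character sequence into one tab character.
        string unescaped = Regex.Unescape(escaped);
        Console.WriteLine(unescaped.Length);    // 1
        Console.WriteLine(unescaped == "\t");   // True
    }
}
```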
The problem is not that it's escaping the backslash, it's that it's not parsing escape sequences into characters. Instead of getting the \r character when the characters \ and r are entered, you get them as the two separate characters.
You can't turn @"\\" into @"\" in the string, because there aren't any double backslashes; that's only how the string is displayed when you look at it using debugging tools. It's actually a single backslash, and you can't turn that into the \ part of an escape sequence, because that's not a character by itself.
You need to replace any escape sequence in the input that you want to convert with the corresponding character:
s = s.Replace("\\r", "\r");
Edit:
To handle the special case that Servy is talking about, you replace all escape sequences at once. Example:
s = Regex.Replace(s, @"\\([\\rntb])", m => {
switch (m.Groups[1].Value) {
case "r": return "\r";
case "n": return "\n";
case "t": return "\t";
case "b": return "\b";
default: return "\\";
}
});
If you have the three characters \, \, r in the input and you want to change this to the \r character then try
input = input.Replace(@"\\r", "\r");
If you have the two characters \, r in the input and you want to change this to the \r character then try
input = input.Replace(@"\r", "\r");
I have this text
'Random Text', 'a\nb\\c\'d\\', 'ok'
I want it to become
'Random Text', 'a\nb\c''d\', 'ok'
The issue is escaping. Instead of escaping with \, I now escape only ' with ''. This is for a 3rd party program that I can't change, so I need to convert one escaping method to the other.
The issue is \\'. If I do a string replace it will become \'' rather than \'. Also \n is not a newline but the actual text \n, which shouldn't be modified. I tried using regex but I couldn't think of a way to say "if ' replace with '', else if \\ replace with \". Obviously doing this in two steps creates the problem.
How do I replace this string properly?
If I understand your question correctly, the issue lies in replacing \\ with \, which can then cause another replacement if it occurs right before '. One technique would be to replace it to an intermediary string first that you're sure will not occur anywhere else, then replace it back after you're done.
var str = @"'Random Text', 'a\nb\\c\'d\\', 'ok'";
str = str.Replace(@"\\", "NON_OCCURRING_TEMP")
.Replace(@"\'", "''")
.Replace("NON_OCCURRING_TEMP", @"\");
As pointed out by @AlexeiLevenkov, you can also use Regex.Replace to do both modifications simultaneously.
Regex.Replace(str, @"(\\\\)|(\\')",
match => match.Value == @"\\" ? @"\" : @"''");
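The difference between chaining two Replace calls and a single regex pass can be checked on the question's own sample text. The sketch below uses hypothetical names (ReEscapeDemo, TwoStep, SinglePass) and a pattern without capture groups, which matches the same tokens as the one above.

```csharp
using System;
using System.Text.RegularExpressions;

static class ReEscapeDemo
{
    // Naive two-step version: the backslash produced by step 1
    // is read again by step 2 when it lands right before a quote.
    public static string TwoStep(string s) =>
        s.Replace(@"\\", @"\").Replace(@"\'", "''");

    // Single pass: every token is matched once against the original text,
    // so output from one replacement is never re-scanned.
    public static string SinglePass(string s) =>
        Regex.Replace(s, @"\\\\|\\'", m => m.Value == @"\\" ? @"\" : "''");

    public static void Main()
    {
        string input = @"'Random Text', 'a\nb\\c\'d\\', 'ok'";
        Console.WriteLine(TwoStep(input));    // 'Random Text', 'a\nb\c''d'', 'ok'
        Console.WriteLine(SinglePass(input)); // 'Random Text', 'a\nb\c''d\', 'ok'
    }
}
```

The two-step output wrongly turns the trailing \\' into '', while the single pass leaves \' there as the question requires.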
It seems voithos' interpretation of the question is the right one. Another approach is to use a regex to find all tokens at once and replace them with Regex.Replace.
Starting point:
var matches = new Regex(@"\\\\'|\\'|'");
Console.Write(matches.Replace(@"'a b\nc d\\e\'f\\'",
match => "[" + match + "]"));
I'm trying to escape quotes in an xpath string like so:
var mktCapNode = htmlDoc.DocumentNode.SelectSingleNode("//*[@id=""yfs_j10_a""]");
The actual string I want passed is:
//*[@id="yfs_j10_a"]
This gives compiler errors: ) expected and ; expected
I'm sure it's simple but I'm stumped. Any ideas?
You need to make this a verbatim string to use the "" as an escape
@"//*[@id=""yfs_j10_a""]"
For a normal string literal you need to use backslashes to escape the double quotes
"//*[@id=\"yfs_j10_a\"]"
Or use the escape char '\':
"//*[@id=\"yfs_j10_a\"]"
In C# the \ character is used to escape (see documentation).
This is different from VB where there are no escape characters except "" which escapes to ".
This means in C# you do not need vbCrLf to start a new line or vbTab to add a tab character to a string. Instead use "\r\n" and "\t".
You can also make the string a verbatim literal using the @ prefix; inside a verbatim literal a quotation mark is written as "" rather than \".
Add the @ prefix to your string.
@"//*[@id=""yfs_j10_a""]"
or escape the quotes with a \
"//*[@id=\"yfs_j10_a\"]"
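A quick way to check that the two spellings are interchangeable is to compare them at run time. This is only a sketch; the class name XPathLiteralDemo is made up for illustration.

```csharp
using System;

static class XPathLiteralDemo
{
    // Regular literal: quotes escaped with a backslash.
    public const string Regular = "//*[@id=\"yfs_j10_a\"]";

    // Verbatim literal: quotes escaped by doubling them.
    public const string Verbatim = @"//*[@id=""yfs_j10_a""]";

    public static void Main()
    {
        Console.WriteLine(Regular == Verbatim); // True
        Console.WriteLine(Verbatim);            // //*[@id="yfs_j10_a"]
    }
}
```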
SHORT QUESTION
Let's have a regex which reads a string inside double quotes. This string is valid only if it has NO double quotes inside.
("([^"]+)")
How would one write a regex which has the same functionality but also works for a string containing double quotes that are each preceded by a backslash?
"Valid string" //VALID
"Valid \"string\"" //VALID
"Invalid " + "string" //INVALID
"Invalid " + "\"string\"" //INVALID
LONG QUESTION
I'm building my own gettext implementation - I found out that the official gettext apps ( http://www.gnu.org/s/gettext/ ) are not sufficient for my needs.
That means I need to find all strings inside each C# code file myself, but only those which are passed to a particular function as the only parameter.
I built a regex which gets most of the strings. The function Translate is public, static and is situated in the namespace GetTextLocalization and in the class Localization.
(GetTextLocalization\.)?(Localization\.Translate)\("([^"]+)"\)
Of course, this will ONLY find plain string literals. If a string parameter is being passed as an operation ("string a" + "string b") or starts with a verbatim prefix (@"Verbatim string"), it will not parse, but that is not the problem.
The regex definition:
([^"]+)
says that there must be no double quotes inside the string, and I know that no one in the company is concatenating strings while passing them in the parameter. Still, I need to have this construction as a safety "what if" measure.
But that also causes the problem. The double quotes actually can be there.
Localization.Translate("Perfectly valid String with \"double quotes\"")
I need to change the regex so it will include the strings with a double quote (so I skip anything like Translate("a" + "b") which would mess with the translation catalog) but only those which are preceded by a backslash.
I thought I might need to use this (?!) grouping construct somehow but I have no idea where to place it.
Since you probably want to allow doubled backslashes before a quote, I suggest
"(?:\\.|[^"\\])*"
Explanation:
" # Match "
(?: # Either match
\\. # an escaped character
| # or
[^"\\] # any character except " or \
)* # any number of times.
" # Match "
This matches "hello", "hello\"there" or "hello\\" but fails on "hello" there" or "hello\\" there".
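As a rough sketch of how this pattern might be used from C#, the snippet below checks the sample strings from the question. The ^ and $ anchors, the class name QuotedStringDemo, and the method IsSingleLiteral are assumptions added for illustration so that the whole input must be one literal; they are not part of the pattern above.

```csharp
using System;
using System.Text.RegularExpressions;

static class QuotedStringDemo
{
    // The pattern from the answer, anchored (an added assumption) so the
    // entire input has to be a single quoted literal. In the verbatim
    // C# literal every " of the regex is written as "".
    static readonly Regex Literal = new Regex(@"^""(?:\\.|[^""\\])*""$");

    public static bool IsSingleLiteral(string source) => Literal.IsMatch(source);

    public static void Main()
    {
        string[] samples =
        {
            @"""Valid string""",               // "Valid string"
            @"""Valid \""string\""""",         // "Valid \"string\""
            @"""Invalid "" + ""string""",      // "Invalid " + "string"
            @"""Invalid "" + ""\""string\""""" // "Invalid " + "\"string\""
        };
        foreach (string s in samples)
            Console.WriteLine(s + " => " + IsSingleLiteral(s));
    }
}
```

The first two samples should report True and the concatenated ones False, since an unescaped quote ends the literal and anything after it breaks the anchored match.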
The following works in vb.net, and basically only allows characters on a standard US Keyboard. Any other character pasted gets deleted. I use the following regular expression code:
"[^A-Za-z0-9\[\{\}\]`~!@#$%\^&*\(\)_\-+=\\/:;'""<>,\.|? ]", "")
However, when I try to use it in C# it won't work, even though I used '\' as an escape character. C# seems a bit different when it comes to escape sequences? Any help would be appreciated.
Prefix the string with @. That's it. From there you can use the regex string from VB as is (including doubling up on the " character).
// Note: exact same string you're using, only with a @ verbatim prefix.
string regex = @"[^A-Za-z0-9\[\{\}\]`~!@#$%\^&*\(\)_\-+=\\/:;'""<>,\.|? ]";
string crazy = "hĀečlĤlŁoźtƢhǣeǮrȡe";
Console.WriteLine(Regex.Replace(crazy, regex, ""));
Output:
hellothere
Prefix your string with "@" and double any quotes within the string.
I.e. this string
abc\def"hij
in C# would be encoded as
@"abc\def""hij"
You need to escape your " character. Do this by putting a \ before your " character.
"[^A-Za-z0-9[{}]`~!@#$%\^&*()_-+=\/:;'""<>,.|? ]"
should become
"[^A-Za-z0-9[{}]`~!@#$%\^&*()_-+=\/:;'\"\"<>,.|? ]"
If you use the @ prefix before this, it will treat the backslash literally instead of as an escape character and you won't get the desired result.
Escape your characters:
"[^A-Za-z0-9[{}]`~!@#$%\^&*()_-+=\\/:;'\"<>,.|? ]"
A good tool for regular expression design and testing (free) is:
http://www.radsoftware.com.au/regexdesigner/
You need to escape your regex for use in C#
[^A-Za-z0-9\[\{\}\]`~!@#$%\^&*\(\)_\-+=\\/:;'\"<>,\.|? ]
Try this one!