I would like .NET Core's System.Text.Json to leave the single quote character unescaped when serializing, but I just can't get it to work:
using System.Text.Encodings.Web;
using System.Text.Json;
using System.Text.Unicode;
var encoderSettings = new TextEncoderSettings();
encoderSettings.AllowRange(UnicodeRanges.BasicLatin);
encoderSettings.AllowCharacters('\u0027');
var options = new JsonSerializerOptions{
Encoder = JavaScriptEncoder.Create(encoderSettings)
};
System.Text.Json.JsonSerializer.Serialize(new { text = "abc 'zorro' 123" }, options);
This will result in a string:
{"text":"abc \u0027zorro\u0027 123"}
when I would like it to be:
{"text":"abc 'zorro' 123"}
Any ideas here? Just want to not escape the single quotes. I've also tried to replace the \u0027 with a \'.
If I do it like this - it works:
options = new JsonSerializerOptions
{
Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping
};
jsonString = JsonSerializer.Serialize(new { text = "abc 'zorro' 123" }, options);
...but this will also disable escaping for all characters including <, > and & (excepting only characters that the JSON standard requires to be escaped), which I also don't want.
This behavior is documented in How to customize character encoding with System.Text.Json:
Block lists
The preceding sections show how to specify allow lists of code points or ranges that you don't want to be escaped. However, there are global and encoder-specific block lists that can override certain code points in your allow list. Code points in a block list are always escaped, even if they're included in your allow list.
Global block list
The global block list includes things like private-use characters, control characters, undefined code points, and certain Unicode categories, such as the Space_Separator category, excluding U+0020 SPACE. ... <snip>
Encoder-specific block lists
Examples of encoder-specific blocked code points include '<' and '&' for the HTML encoder, '\' for the JSON encoder, and '%' for the URL encoder. ... <snip>
So, as documented, JavaScriptEncoder.Create() may override your allowed characters and escape certain "blocked" characters. While the full set of blocked characters is not documented, the reference source shows that JavaScriptEncoder.Create(TextEncoderSettings settings) constructs an encoder that blocks "HTML sensitive" characters, which are defined in AllowedBmpCodePointsBitmap.cs and include ':
public void ForbidHtmlCharacters()
{
ForbidChar('<');
ForbidChar('>');
ForbidChar('&');
ForbidChar('\''); // can be used to escape attributes
ForbidChar('\"'); // can be used to escape attributes
ForbidChar('+'); // technically not HTML-specific, but can be used to perform UTF7-based attacks
}
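A quick way to confirm this (a sketch, assuming .NET Core 3.0 or later) is to build the same encoder as in the question and ask it directly. WillEncode and Encode are public members of TextEncoder, the base class of JavaScriptEncoder:

```csharp
using System;
using System.Text.Encodings.Web;
using System.Text.Unicode;

// Same settings as in the question: allow Basic Latin and explicitly allow '.
var settings = new TextEncoderSettings();
settings.AllowRange(UnicodeRanges.BasicLatin);
settings.AllowCharacters('\'');
JavaScriptEncoder encoder = JavaScriptEncoder.Create(settings);

// The HTML-sensitive block list still wins over the allow list.
bool quoteIsEscaped = encoder.WillEncode('\'');
string encoded = encoder.Encode("abc 'zorro' 123");

Console.WriteLine(quoteIsEscaped); // True
Console.WriteLine(encoded);        // abc \u0027zorro\u0027 123
```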
If you do not want to use JavaScriptEncoder.UnsafeRelaxedJsonEscaping but also don't want ' to be escaped, you could create a custom JsonConverter<string> that manually pieces together the required encoded JSON string, then writes it out using Utf8JsonWriter.WriteRawValue() (first introduced in .NET 6):
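using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Encodings.Web;
using System.Text.Json;
using System.Text.Json.Serialization;
using System.Text.Unicode;
public class StringConverter : JsonConverter<string>
{
// Allows Basic Latin, but JavaScriptEncoder.Create() still escapes ' (it is HTML-sensitive),
// so Write() below writes the quote characters itself
static readonly Lazy<JavaScriptEncoder> Encoder = new (() =>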
{
var encoderSettings = new TextEncoderSettings();
encoderSettings.AllowRange(UnicodeRanges.BasicLatin);
encoderSettings.AllowCharacters('\u0027');
return JavaScriptEncoder.Create(encoderSettings);
});
public override string? Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options) => reader.GetString();
public override void Write(Utf8JsonWriter writer, string value, JsonSerializerOptions options)
{
var encoder = Encoder.Value;
using var textWriter = new StringWriter();
textWriter.Write("\"");
foreach (var (startIndex, characterCount, final) in value.SplitIndices('\''))
{
encoder.Encode(textWriter, value, startIndex, characterCount);
if (!final)
textWriter.Write('\'');
}
textWriter.Write("\"");
writer.WriteRawValue(textWriter.ToString(), true);
}
}
public static class StringExtensions
{
public static IEnumerable<(int startIndex, int characterCount, bool final)> SplitIndices(this string value, char separator)
{
if (value == null)
throw new ArgumentNullException(nameof(value));
int index = 0;
int nextIndex;
while ((nextIndex = value.IndexOf(separator, index)) >= 0)
{
yield return (index, nextIndex - index, false);
index = nextIndex + 1;
}
yield return (index, value.Length - index, true);
}
}
Then serialize as follows:
var model = new { text = "abc 'zorro' 123" };
var options = new JsonSerializerOptions
{
Converters = { new StringConverter() },
};
var json = JsonSerializer.Serialize(model, options);
Which results in {"text":"abc 'zorro' 123"} as required. Demo fiddle here.
You could also try to create your own JavaScriptEncoder subclass that ignores the block lists, though that would likely be more involved than creating the custom converter.
Related
JsonNode.Parse() seems to convert my < and > to the escape sequences \u003C and \u003E when they appear inside double-quotes "".
How can I convert these escape sequences back to their original characters?
This is my C# code:
using System.Text.Json.Nodes;
Console.WriteLine("JsonNode test");
var testString = "{ \"testString\" : \"<...>\" }";
Console.WriteLine($"{testString}, {testString.Length}");
var jsonNode = JsonNode.Parse(testString);
var jsonString = jsonNode.ToJsonString();
Console.WriteLine($"{jsonString}, {jsonString.Length}");
Output:
JsonNode test
{ "testString" : "<...>" }, 26
{"testString":"\u003C...\u003E"}, 32
I've tried the HtmlDecode and UrlDecode methods, but they are not right for this situation.
The JSON is still valid. I usually recommend Newtonsoft.Json since it has far fewer problems, but you can use a string Replace as well:
var jsonNode = JsonNode.Parse(testString);
var jsonString = jsonNode.ToJsonString().Replace("\\u003C","<").Replace("\\u003E",">");
result
{"testString":"<...>"}
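Another option is to use UnsafeRelaxedJsonEscaping, but it is not safe in some cases (for example when the JSON ends up inside an HTML page or a <script> block, since <, > and & are no longer escaped):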
var options = new JsonSerializerOptions
{
Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping,
WriteIndented = true
};
var jsonString = System.Text.Json.JsonSerializer.Serialize(jsonNode, options);
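If the goal is just to get unescaped text back out of the JsonNode, JsonNode.ToJsonString() also accepts an optional JsonSerializerOptions argument (JsonNode itself arrived in .NET 6), so the relaxed encoder can be passed there directly. A minimal sketch using the question's input; the same safety caveat about UnsafeRelaxedJsonEscaping applies:

```csharp
using System;
using System.Text.Encodings.Web;
using System.Text.Json;
using System.Text.Json.Nodes;

var jsonNode = JsonNode.Parse("{ \"testString\" : \"<...>\" }")!;

// The relaxed encoder only escapes what JSON itself requires, so < and > survive.
var relaxed = new JsonSerializerOptions { Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping };
string jsonString = jsonNode.ToJsonString(relaxed);

Console.WriteLine($"{jsonString}, {jsonString.Length}"); // {"testString":"<...>"}, 22
```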
I am editing a serialized string because deserializing it gives a parse error.
So in a long serialized string I want to replace "myVar\": \"0.2 mm" with "myVar\": \"0.2".
If I use the following code it works:
string NewString = Serializedstring.Replace($"myVar\": \"0.2 mm", $"myVar\": \"0.2");
but my 0.2 is a variable that may change with every occurrence, so all I want is to remove mm from the string "myVar\": \"0.2 mm"
If your JSON is coming in with two different formats, rather than trying to hack the JSON string into something usable, it is much safer to use a custom JsonConverter. For example:
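using System;
using System.Globalization;
using System.Text.Json;
using System.Text.Json.Serialization;
public class MillimetreJsonConverter : JsonConverter<double>
{
public override double Read(ref Utf8JsonReader reader, Type typeToConvert,
JsonSerializerOptions options)
{
// If the token is a JSON number, simply return it
// (TryGetDouble throws if the token is not a number, so check the token type first)
if (reader.TokenType == JsonTokenType.Number && reader.TryGetDouble(out var val))
{
return val;
}
// Otherwise we get the string value e.g. "0.2 mm" and
// do some simple string manipulation on it
var value = reader.GetString()!;
value = value.Replace(" mm", "");
// Parse with the invariant culture so "0.2" works regardless of the machine's locale
if (double.TryParse(value, NumberStyles.Float, CultureInfo.InvariantCulture, out var result))
{
return result;
}
// If we get here, perhaps we should throw an exception?
return 0;
}
public override void Write(Utf8JsonWriter writer, double value,
JsonSerializerOptions options)
{
// You can fill this in if you need it, e.g. writer.WriteNumberValue(value);
throw new NotImplementedException();
}
}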
Now you can modify the class you deserialise into, assuming your JSON looks like {"myVar": "0.2 mm"}:
public class Foo
{
[JsonConverter(typeof(MillimetreJsonConverter))]
public double myVar { get; set; }
}
Finally, it's simple to deserialise:
var foo = JsonSerializer.Deserialize<Foo>(json);
I can recommend two alternative approaches:
1) Match what you can.
var str = Serializedstring.Replace(" mm\"", "\"");
If your string is not going to contain a space followed by mm" anywhere else, that should be fine.
2) A safer option would be to deserialize it into something that can handle it (I assume you're currently deserializing that property to a number), then string-replace the property value and map it to what you need.
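The second approach can be sketched with System.Text.Json's JsonDocument (an assumption; the answer does not name a library): read myVar as raw JSON, strip the unit if it arrives as a string, and convert. ReadMyVar is a hypothetical helper name, and the sketch assumes the unit is always written as " mm":

```csharp
using System;
using System.Globalization;
using System.Text.Json;

// Hypothetical helper: reads myVar whether it arrives as 0.2 or as "0.2 mm".
static double ReadMyVar(string json)
{
    using JsonDocument doc = JsonDocument.Parse(json);
    JsonElement element = doc.RootElement.GetProperty("myVar");

    // Already a JSON number: nothing to clean up.
    if (element.ValueKind == JsonValueKind.Number)
        return element.GetDouble();

    // Otherwise treat it as text, drop the unit and parse culture-independently.
    string text = element.GetString()!.Replace(" mm", "");
    return double.Parse(text, NumberStyles.Float, CultureInfo.InvariantCulture);
}

double fromString = ReadMyVar("{\"myVar\": \"0.2 mm\"}");
double fromNumber = ReadMyVar("{\"myVar\": 0.2}");
Console.WriteLine(fromString == fromNumber); // True
```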
There are lots of ways to do this: RegEx, string tokenisation, char array searching.
RegEx is probably the closest to what you are currently doing.
The regular expression syntax is described here: Basics of RegEx
The pattern [0-9.]+ should match a decimal number, but be warned that it will also match anything made up of digits and dots, such as IP addresses.
So if you have a regex of
Regex rx = new Regex(@"([0-9.]+) mm");
Serializedstring = rx.Replace(Serializedstring, "$1");
the details are:
[0-9.] any digit or dot
+ one or more of whatever precedes it
() a capturing group that is of special interest
$1 in the replacement stands for whatever the first group captured
note that $0 is the entire match (not the entire input string)
This will replace any text of the form (number) mm with just the number.
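Putting those pieces together, a runnable sketch of the regex approach; the sample JSON (including the extra "other" property) is made up for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

string serialized = "{\"myVar\": \"0.2 mm\", \"other\": \"10.5 mm\"}";

// ([0-9.]+) captures the number; " mm" must follow it literally.
var rx = new Regex(@"([0-9.]+) mm");

// "$1" re-inserts the captured number, dropping the unit.
string cleaned = rx.Replace(serialized, "$1");

Console.WriteLine(cleaned); // {"myVar": "0.2", "other": "10.5"}
```

The numbers stay quoted ("0.2"), so whatever deserializes the result still has to accept a numeric string for a double property.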
C#: I have an Automobile class, and in that class I have a vehicleTrim field.
I use JsonConvert.SerializeObject to serialize that class, and it is not escaping the single quote.
This is causing an issue when I try to set the value of an object in the web page via the window.localStorage.setItem function.
example:
public class Automobile
{
public string vehicleTrim { get; set; }
}
var test = new Automobile()
{
vehicleTrim = "designer's package"
};
var serialized = JsonConvert.SerializeObject(test, Formatting.None);
// serialized output: {"vehicleTrim":"designer's package"}
// expected output : {"vehicleTrim":"designer\'s package"}
So now I want to set this JSON object in the localStorage of my web page by calling this:
var jsSetScript = $"window.localStorage.setItem('automobile', '{serialized}');";
await Control.EvaluateJavascriptAsync(jsSetScript);
EvaluateJavascriptAsync returns this error trying to read the json SyntaxError: Unexpected identifier 's'. Expected ')' to end an argument list.
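I manually tried this with the escaped single quote and it was fine. So the question is: how can I make the SerializeObject method escape the single quote?
"\'" is not even a valid JSON string literal. Per the JSON spec, the only escape sequences allowed inside a string are \" \\ \/ \b \f \n \r \t, plus \uXXXX (four hex digits) for any other character.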
Thus ' does not need to be escaped, but if it is, it must appear as "\u0027". Only the 8 listed characters have a special, abbreviated escaping syntax. (For further details see RFC 8259.)
If "\u0027" meets your needs, then setting JsonSerializerSettings.StringEscapeHandling to StringEscapeHandling.EscapeHtml should do the trick. From the docs:
StringEscapeHandling Enumeration
Specifies how strings are escaped when writing JSON text.
Default (0): Only control characters (e.g. newline) are escaped.
EscapeNonAscii (1): All non-ASCII and control characters (e.g. newline) are escaped.
EscapeHtml (2): HTML (<, >, &, ', ") and control characters (e.g. newline) are escaped.
Thus the following now succeeds:
var settings = new JsonSerializerSettings
{
StringEscapeHandling = StringEscapeHandling.EscapeHtml,
};
var serialized = JsonConvert.SerializeObject(test, Formatting.None, settings);
Console.WriteLine(serialized);
// Outputs {"vehicleTrim":"designer\u0027s package"}
Assert.IsTrue(!serialized.Contains('\''));
// Succeeds
Demo fiddle here.
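One caveat, going back to the original goal: splicing JSON into a single-quoted JavaScript literal can still break if a value contains " or \, because the JavaScript parser decodes escape sequences such as \" and \\ inside the literal, so the text that reaches localStorage.setItem loses the escaping that kept it valid JSON. A sketch of a more robust alternative (assuming the Newtonsoft.Json package, and using an anonymous object in place of Automobile to stay self-contained) is to serialize the JSON text once more, so it becomes a complete double-quoted string literal that JavaScript parses back to the original text:

```csharp
using System;
using Newtonsoft.Json;

var serialized = JsonConvert.SerializeObject(new { vehicleTrim = "designer's package" }, Formatting.None);

// Serializing the JSON text itself yields a double-quoted string literal in which
// " and \ are escaped; ' needs no escaping inside double quotes.
string jsLiteral = JsonConvert.SerializeObject(serialized);

// No surrounding single quotes: jsLiteral already carries its own quotes.
var jsSetScript = $"window.localStorage.setItem('automobile', {jsLiteral});";
Console.WriteLine(jsSetScript);
// window.localStorage.setItem('automobile', "{\"vehicleTrim\":\"designer's package\"}");
```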
I am confused by all the different escaping mechanisms for strings in C#. What I want is an escaping/unescaping method that:
1) Can be used on any string
2) escape+unescape is guaranteed to return the initial string
3) Replaces all punctuation with something else. If that is too much to ask, then at least commas, braces, and #. I am fine with spaces not being escaped.
4) Is unlikely to ever change.
Does it exist?
EDIT: This is for the purpose of serializing and deserializing app-generated attributes. So my object may or may not have values for Attribute1, Attribute2, Attribute3, etc. Simplifying a bit, the idea is to do something like the code below. The goal is to have the encoded collection be brief and more or less human-readable.
I am asking what methods would make sense to use for Escape and Unescape.
public abstract class GenericAttribute {
const string key1 = "KEY1"; //It is fine to put some restrictions on the keys, i.e. no punctuation
const string key2 = "KEY2";
public abstract string Encode(); // NO RESTRICTIONS ON WHAT ENCODE MIGHT RETURN
public static GenericAttribute FromKeyValuePair (string key, string value) {
switch (key) {
case key1: return new ConcreteAttribute1(value);
case key2: return new ConcreteAttribute2(value);
// etc.
default: throw new ArgumentException("Unknown attribute key: " + key, nameof(key));
}
}
}
public class AttributeCollection {
Dictionary <string, GenericAttribute> Content {get;set;}
public string Encode() {
string r = "";
bool first = true;
foreach (KeyValuePair<string, GenericAttribute> pair in this.Content) {
if (first) {
first = false;
} else {
r+=",";
}
r+=(pair.Key + "=" + Escape(pair.Value.Encode()));
}
return r;
}
public AttributeCollection(string encodedCollection) {
// input string is the return value of the Encode method
this.Content = new Dictionary<string, GenericAttribute>();
string[] array = encodedCollection.Split(',');
foreach(string component in array) {
int equalsIndex = component.IndexOf('=');
string key = component.Substring(0, equalsIndex);
string value = component.Substring(equalsIndex+1);
GenericAttribute attribute = GenericAttribute.FromKeyValuePair(key, Unescape(value));
this.Content[key]=attribute;
}
}
}
I'm not entirely sure what you're asking, but I believe your intent is for the escape character itself to be included in the string.
var content = @"\'Hello";
Console.WriteLine(content);
// Output:
\'Hello
By using the @ prefix (a verbatim string literal), the backslash is not treated as an escape character, so it becomes part of your string. That covers the server side in C#; to account for other languages and escape formats, only you would know what is needed.
You can find some great information on C# escaping here:
MSDN Blog
Try using HttpServerUtility.UrlEncode and HttpServerUtility.UrlDecode. I think that will encode and decode all the things you want.
See the MSDN Docs and here is a description of the mapping on Wikipedia.
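HttpServerUtility is part of classic ASP.NET (System.Web). As a hedged alternative that works on any .NET version, here is a minimal sketch of the question's Escape/Unescape methods built on Uri.EscapeDataString and Uri.UnescapeDataString, assuming the attribute values are well-formed Unicode strings. EscapeDataString percent-encodes everything except A-Z, a-z, 0-9 and - _ . ~, so commas, =, braces and # never appear raw in the output (spaces become %20 rather than the + that UrlEncode produces):

```csharp
using System;

// Escape/Unescape as named in the question; the bodies are an assumption, not the asker's code.
static string Escape(string value) => Uri.EscapeDataString(value);
static string Unescape(string value) => Uri.UnescapeDataString(value);

string original = "a,b={c} #1";
string escaped = Escape(original);

Console.WriteLine(escaped);                       // a%2Cb%3D%7Bc%7D%20%231
Console.WriteLine(Unescape(escaped) == original); // True
```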
I am trying to create a generic formatter/parser combination.
Example scenario:
I have a string for string.Format(), e.g. var format = "{0}-{1}"
I have an array of object (string) for the input, e.g. var arr = new[] { "asdf", "qwer" }
I am formatting the array using the format string, e.g. var res = string.Format(format, arr)
What I am trying to do is to revert back the formatted string back into the array of object (string). Something like (pseudo code):
var arr2 = string.Unformat(format, res)
// when: res = "asdf-qwer"
// arr2 should be equal to arr
Anyone have experience doing something like this? I'm thinking about using regular expressions (modify the original format string, and then pass it to Regex.Matches to get the array) and run it for each placeholder in the format string. Is this feasible or is there any other more efficient solution?
While the comments about lost information are valid, sometimes you just want to get the string values out of a string with known formatting.
One method is this blog post written by a friend of mine. He implemented an extension method called string[] ParseExact(), akin to DateTime.ParseExact(). Data is returned as an array of strings, but if you can live with that, it is terribly handy.
using System;
using System.Text.RegularExpressions;
public static class StringExtensions
{
public static string[] ParseExact(
this string data,
string format)
{
return ParseExact(data, format, false);
}
public static string[] ParseExact(
this string data,
string format,
bool ignoreCase)
{
string[] values;
if (TryParseExact(data, format, out values, ignoreCase))
return values;
else
throw new ArgumentException("Format not compatible with value.");
}
public static bool TryExtract(
this string data,
string format,
out string[] values)
{
return TryParseExact(data, format, out values, false);
}
public static bool TryParseExact(
this string data,
string format,
out string[] values,
bool ignoreCase)
{
int tokenCount = 0;
format = Regex.Escape(format).Replace("\\{", "{");
for (tokenCount = 0; ; tokenCount++)
{
string token = string.Format("{{{0}}}", tokenCount);
if (!format.Contains(token)) break;
format = format.Replace(token,
string.Format("(?'group{0}'.*)", tokenCount));
}
RegexOptions options =
ignoreCase ? RegexOptions.IgnoreCase : RegexOptions.None;
// Anchor the pattern so the entire input has to match the format
Match match = new Regex("^" + format + "$", options).Match(data);
if (!match.Success || tokenCount != (match.Groups.Count - 1))
{
values = new string[] { };
return false;
}
else
{
values = new string[tokenCount];
for (int index = 0; index < tokenCount; index++)
values[index] =
match.Groups[string.Format("group{0}", index)].Value;
return true;
}
}
}
You can't unformat because information is lost. String.Format is a "destructive" algorithm, which means you can't (always) go back.
You can't inherit from string (System.String is sealed), so instead create a new class that keeps track of both the format ("{0}-{1}") and the arguments ({ "asdf", "qwer" }), override ToString(), and modify your code a little to pass that object around instead of the formatted string.
IMO, that's the best way to do this.
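A minimal sketch of that wrapper class; FormattedString is a hypothetical name, and it composes the format and arguments rather than inheriting from string:

```csharp
using System;

var res = new FormattedString("{0}-{1}", "asdf", "qwer");
Console.WriteLine(res);      // asdf-qwer
object[] arr2 = res.Args;    // { "asdf", "qwer" }: no parsing needed

// Keeps the format and its arguments together and formats on demand.
public sealed class FormattedString
{
    public string Format { get; }
    public object[] Args { get; }

    public FormattedString(string format, params object[] args)
    {
        Format = format;
        Args = args;
    }

    public override string ToString() => string.Format(Format, Args);
}
```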
It's simply not possible in the generic case. Some information will be "lost" (string boundaries) in the Format method. Assume:
String.Format("{0}-{1}", "hello-world", "stack-overflow");
How would you "Unformat" it?
Assuming "-" is not in the original strings, can you not just use Split?
var arr2 = formattedString.Split('-');
Note that this only applies to the presented example with an assumption. Any reverse algorithm is dependent on the kind of formatting employed; an inverse operation may not even be possible, as noted by the other answers.
A simple solution might be to
replace all format tokens with (.*)
escape all other special characters in format
make the regex match non-greedy
This would resolve the ambiguities to the shortest possible match.
(I'm not good at RegEx, so please correct me, folks :))
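The steps above can be sketched as follows. Unformat is a hypothetical helper name, and the sketch assumes the format contains only plain numbered placeholders such as {0} and {1} (no {{ }} escapes, and no alignment or format specifiers like {0:N2}):

```csharp
#nullable enable
using System;
using System.Text.RegularExpressions;

static string[]? Unformat(string format, string input)
{
    // Escape regex metacharacters in the literal text. Regex.Escape turns "{" into
    // "\{" but leaves "}" alone, so "{0}" becomes "\{0}".
    string pattern = Regex.Escape(format);

    int count = 0;
    pattern = Regex.Replace(pattern, @"\\\{(\d+)}", m =>
    {
        int index = int.Parse(m.Groups[1].Value);
        count = Math.Max(count, index + 1);
        return $"(?<g{index}>.*?)"; // lazy group: shortest possible match
    });

    // Anchor so the whole input has to be consumed.
    Match match = Regex.Match(input, "^" + pattern + "$");
    if (!match.Success) return null;

    var values = new string[count];
    for (int i = 0; i < count; i++)
        values[i] = match.Groups["g" + i].Value;
    return values;
}

var arr2 = Unformat("{0}-{1}", "asdf-qwer");                        // { "asdf", "qwer" }
var ambiguous = Unformat("{0}-{1}", "hello-world-stack-overflow");  // { "hello", "world-stack-overflow" }
```

With the lazy groups, the first placeholder gets the shortest possible piece, so the ambiguous example splits at the first "-"; which split was originally intended still can't be recovered from the string alone.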
After formatting, you can put the resulting string and the array of objects into a dictionary with the string as key:
Dictionary<string, string[]> unFormatLookup = new Dictionary<string, string[]>();
...
var arr = new string [] {"asdf", "qwer" };
var res = string.Format(format, arr);
unFormatLookup.Add(res,arr);
and in Unformat method, you can simply pass a string and look up that string and return the array used:
string [] Unformat(string res)
{
string [] arr;
unFormatLookup.TryGetValue(res, out arr); // you can also check the return value of TryGetValue and throw an exception if the input string is not in the dictionary.
return arr;
}