CBOR serialization (string escaping) - C#

I was serializing a JSON object to CBOR in C++ with the nlohmann::json library, and my use case involves reading the CBOR byte string output in C#. I've noticed that when dumping a JSON object to a string with nlohmann::json, JSON string values (i.e., case value_t::string) are escaped (a call to escape_string is made), whereas no such call is made for string values in the CBOR path.
I was reading the CBOR RFC 7049, and it seems that strings do not need to be escaped when serializing to CBOR.
The behavior of the nlohmann::json library is consistent: strings are not escaped when serializing, nor expected to be escaped when deserializing.
But it appears that Newtonsoft.Json (a C# library) expects them to be escaped. Is that a valid expectation? Or am I doing something wrong in the process?
C++ side:
nlohmann::json json_doc;
json_doc["characters"] = nlohmann::json::array();
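// characters and output come from the surrounding code (not shown)
for (std::size_t i = 0; i < characters.size(); i++) {
    json_doc["characters"][i]["name"] = (characters[i] != nullptr) ? characters[i]->name() : "";
}
std::vector<uint8_t> cbor = nlohmann::json::to_cbor(json_doc);
// data() avoids indexing into an empty vector
output->assign(reinterpret_cast<const char*>(cbor.data()), cbor.size());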
C# side (cbor_bytes is the CBOR byte string, i.e., the C++ output vector):
CBORObject cbor = CBORObject.DecodeFromBytes(cbor_bytes);
output = cbor.ToString();
By then, the output string is malformed:
{"characters": [{"name": "Clara Oswald"}, {"name": "Kensi Blye"}, {"name": "Temperance "Bones" Brennan"}]}
and obviously cannot be parsed:
JObject output_obj = JObject.Parse(output);

CBOR (Concise Binary Object Representation) is not JSON (JavaScript Object Notation). Although CBOR may have borrowed some concepts from JSON, it is clearly a different format with different rules and goals. CBOR is a binary format; JSON is text. In CBOR, strings have length prefixes, whereas they do not in JSON. Furthermore, CBOR does not allow arbitrary whitespace between elements (it wouldn't make sense for a binary format), whereas JSON does (for human readability). Ultimately, CBOR does not need a mechanism to escape strings because it does not require delimiters to tell where a string starts and ends. JSON, on the other hand, requires double quotes to mark the beginning and end of each string. As a consequence, quotes and control characters within strings must be escaped with backslashes in JSON, as well as literal backslashes themselves. There is no getting around this rule if you want to ensure the JSON will be parsable.
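To make the contrast concrete, here is a minimal sketch in C++ (the language of the serializing side in the question). It is not nlohmann::json's code: cbor_text_string and json_quote are hypothetical helpers written only for illustration. The first writes a string the CBOR way (RFC 7049 major type 3, a header carrying the length followed by the raw UTF-8 bytes). The second writes it the JSON way, delimited by quotes, with quotes, backslashes and control characters escaped. It assumes the input is already valid UTF-8 and does not check it.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// CBOR text string (major type 3): an initial byte holding the major type
// and the length, then the raw UTF-8 bytes. No quoting or escaping is
// needed, because the length says where the string ends.
std::vector<std::uint8_t> cbor_text_string(const std::string& s) {
    std::vector<std::uint8_t> out;
    const std::uint64_t len = s.size();
    const std::uint8_t major = 0x60;  // major type 3 shifted into the top 3 bits
    if (len < 24) {
        out.push_back(static_cast<std::uint8_t>(major | len));
    } else {
        // Longer strings use additional info 24..27, followed by a
        // 1, 2, 4 or 8 byte big-endian length.
        int nbytes = len <= 0xFF ? 1 : len <= 0xFFFF ? 2 : len <= 0xFFFFFFFFull ? 4 : 8;
        std::uint8_t info = nbytes == 1 ? 24 : nbytes == 2 ? 25 : nbytes == 4 ? 26 : 27;
        out.push_back(static_cast<std::uint8_t>(major | info));
        for (int i = nbytes - 1; i >= 0; --i)
            out.push_back(static_cast<std::uint8_t>(len >> (8 * i)));
    }
    out.insert(out.end(), s.begin(), s.end());  // quotes are copied verbatim
    return out;
}

// JSON string: delimited by quotes, so quotes, backslashes and control
// characters U+0000..U+001F inside it must be escaped.
std::string json_quote(const std::string& s) {
    std::string out = "\"";
    for (unsigned char c : s) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n"; break;
            case '\r': out += "\\r"; break;
            case '\t': out += "\\t"; break;
            default:
                if (c < 0x20) {
                    char buf[8];
                    std::snprintf(buf, sizeof buf, "\\u%04x", c);
                    out += buf;
                } else {
                    out += static_cast<char>(c);  // other UTF-8 bytes pass through
                }
        }
    }
    out += '"';
    return out;
}
```

For the name Temperance "Bones" Brennan (26 bytes), the CBOR bytes are 0x78 0x1A followed by the name with its quotes verbatim, while json_quote produces "Temperance \"Bones\" Brennan". A CBOR-to-JSON converter has to perform that second step; skipping it gives exactly the malformed output shown in the question.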
In your code above you are using the CBORObject.ToString() method to turn the object into a string. If this CBORObject is from a third-party library, does the documentation state that ToString() will produce valid JSON? If so, then it definitely has a bug; it should be doing the proper escaping as required by the JSON spec. If there is no such promise of valid JSON, then you can't expect that Json.Net will be able to parse the string, even if it sort of looks like JSON. (You might check to see whether the CBORObject has some other dedicated method like ToJson() for performing this conversion.) If CBORObject is your own code, then it is on you to escape the strings properly when converting from CBOR to JSON.

Related

Are there any methods to automatically convert strings inside JSON response to BASE64?

I'm returning objects as JSON results from my ApiController in an ASP.NET MVC Web API project.
To convert objects to JSON results, I use the following syntax:
return Ok(new WebServiceResult(...));
Are there any methods to convert strings automatically to BASE64 encoding during object to JSON conversions?
Can Ok handle non-English characters during JSON conversion without the string failing to decode on other platforms? That is, does it support encodings like UTF-8, and does it handle characters in the content being converted that have special meaning in JSON, such as { or :?
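Base64 works on bytes, not characters, so a string has to be converted to bytes first (typically UTF-8). That step is also what keeps non-English text intact across platforms. Below is a minimal C++ sketch of the standard RFC 4648 alphabet with '=' padding, for illustration only. base64_encode is a hypothetical helper. In C#, the built-in equivalent is Convert.ToBase64String(Encoding.UTF8.GetBytes(s)). The sketch does not answer whether Ok(...) can be configured to apply this automatically.

```cpp
#include <cstddef>
#include <string>

// Encodes raw bytes (e.g. a UTF-8 string) with the standard Base64 alphabet.
std::string base64_encode(const std::string& bytes) {
    static const char table[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    std::size_t i = 0;
    // Each full group of 3 input bytes (24 bits) becomes 4 output characters.
    for (; i + 2 < bytes.size(); i += 3) {
        unsigned n = (static_cast<unsigned char>(bytes[i]) << 16) |
                     (static_cast<unsigned char>(bytes[i + 1]) << 8) |
                     static_cast<unsigned char>(bytes[i + 2]);
        out += table[(n >> 18) & 63];
        out += table[(n >> 12) & 63];
        out += table[(n >> 6) & 63];
        out += table[n & 63];
    }
    // One or two leftover bytes are padded with '='.
    std::size_t rest = bytes.size() - i;
    if (rest == 1) {
        unsigned n = static_cast<unsigned char>(bytes[i]) << 16;
        out += table[(n >> 18) & 63];
        out += table[(n >> 12) & 63];
        out += "==";
    } else if (rest == 2) {
        unsigned n = (static_cast<unsigned char>(bytes[i]) << 16) |
                     (static_cast<unsigned char>(bytes[i + 1]) << 8);
        out += table[(n >> 18) & 63];
        out += table[(n >> 12) & 63];
        out += table[(n >> 6) & 63];
        out += '=';
    }
    return out;
}
```

For example, "Man" encodes to TWFu, and "é" (UTF-8 bytes C3 A9) encodes to w6k=. The output contains only ASCII letters, digits, '+', '/' and '=', none of which need escaping in JSON.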

How can I deserialize emoji in JSON in C#

I have a JSON file that includes an emoji. When I try to deserialize it, the emoji cannot be deserialized to a string.
My code is:
var mystring = @"{""message"":""jjasdajdasjdj laslla aasdasd ssdfdsf!!! 🙌\u{1F3FD}"", ""updated_time"":""2015-04-14T22:37:13+0000"", ""id"":""145193995506_148030368559""}";
FaceBookIdea ideaDetails = JsonConvert.DeserializeObject<FaceBookIdea>(mystring);
The error is:
{"Input string was not in a correct format."}
When I remove the emoji, it works well.
Thanks a lot for your help.
Your problem is that this portion of your message string does not conform to the JSON standard:
"\u{1F3FD}"
According to the standard, \u followed by four hex digits represents a Unicode character given by the hex value of its code point. Your string \u{1F3FD}, with its curly braces, does not conform to this convention, so Json.NET throws an exception when trying to parse it. You will see a similar error if you upload your JSON to https://jsonformatter.curiousconcept.com/.
Thus, to make your JSON conform to the standard, you need to format your character as \uXXXX using the appropriate four hex digits. However, your character, U+1F3FD, is larger than 0xFFFF and lies outside the Unicode Basic Multilingual Plane, so it cannot be represented as a single four-digit hex number. C# (and UTF-16 in general) represents such characters as surrogate pairs, i.e., pairs of two-byte chars. You will need to do the same here. The UTF-16 (hex) representation of your character is
0xD83C 0xDFFD
Thus your JSON character needs to be:
\uD83C\uDFFD
And for your entire string:
{"message":"jjasdajdasjdj laslla aasdasd ssdfdsf!!! 🙌\uD83C\uDFFD", "updated_time":"2015-04-14T22:37:13+0000", "id":"145193995506_148030368559"}
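The surrogate-pair arithmetic used above can be sketched as follows. The sketch is in C++, and json_escape_code_point is a hypothetical helper. It assumes its input is a valid code point (at most U+10FFFF and not itself a surrogate). Subtract 0x10000, then put the top 10 bits into a high surrogate starting at 0xD800 and the bottom 10 bits into a low surrogate starting at 0xDC00.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Renders a code point as JSON escape text. Code points above U+FFFF are
// split into a UTF-16 surrogate pair, written as two 4-hex-digit escapes.
std::string json_escape_code_point(std::uint32_t cp) {
    char buf[16];
    if (cp <= 0xFFFF) {
        std::snprintf(buf, sizeof buf, "\\u%04X", static_cast<unsigned>(cp));
        return buf;
    }
    std::uint32_t v = cp - 0x10000;                              // 20 bits remain
    unsigned high = 0xD800 + static_cast<unsigned>(v >> 10);     // top 10 bits
    unsigned low  = 0xDC00 + static_cast<unsigned>(v & 0x3FF);   // bottom 10 bits
    std::snprintf(buf, sizeof buf, "\\u%04X\\u%04X", high, low);
    return buf;
}
```

json_escape_code_point(0x1F3FD) gives \uD83C\uDFFD, matching the answer. The 🙌 (U+1F64C) in the same message would give \uD83D\uDE4C. JSON also allows such characters to appear unescaped as UTF-8, as 🙌 already does in the string.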

Best method of converting a standard string to an XML-legal string - C#

Currently, my understanding of XML-legal strings is that all that is required is to replace any instances of &, ", ', <, > with &amp;, &quot;, &apos;, &lt;, &gt;.
So I made the following parser:
private static string ToXmlCompliantStr(string uriStr)
{
    string uriXml = uriStr;
    uriXml = uriXml.Replace("&", "&amp;");   // must come first, or the other entities get double-escaped
    uriXml = uriXml.Replace("\"", "&quot;");
    uriXml = uriXml.Replace("'", "&apos;");
    uriXml = uriXml.Replace("<", "&lt;");
    uriXml = uriXml.Replace(">", "&gt;");
    return uriXml;
}
I am aware that there are similar questions out there with good answers (which is how I was able to write this function). I am asking whether this code will translate ANY string that C# can throw at it so that XDocument can parse it as part of a whole document without any complaints. All the questions I've found state that these are the only escape characters, not that escaping them guarantees a 100% valid XML string. I've gone as far as reading through the decompiled XNode class to see how it parses.
Thanks
Firstly, you should absolutely not do this yourself. Use an XML API - that way you can trust that to do the right thing, rather than worrying about covering corner cases etc. You generally shouldn't be trying to come up with an "escaped string" at all - you should pass the string to the XElement constructor (or XAttribute, or whatever your situation is).
In other words, I think you should try really hard to design your application so that you don't need a method like the one in your question at all. Look at where you'd be using that method, and see whether you can just create an XElement (or whatever) instead. In my experience, if you treat XML as a data structure in itself rather than just as text, you'll have a much better time.
Secondly, you need to understand that in XML 1.0 at least, there are Unicode characters that cannot be validly represented in XML, no matter how much escaping you use. In particular, values U+0000 to U+001F are unrepresentable other than U+0009 (tab), U+000A (line feed) and U+000D (carriage return). Also if you have a string which contains invalid UTF-16 (e.g. an unmatched half of a surrogate pair), that can't be correctly represented in XML.
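The point about unrepresentable characters can be checked mechanically. The sketch below is in C++ and operates on a UTF-16 std::u16string to mirror C# strings. The function names are hypothetical. It tests each character against the XML 1.0 Char production, which also excludes U+FFFE and U+FFFF in addition to the control characters mentioned above, and it rejects unmatched surrogates. It only reports whether the text can be represented. It does not escape anything; that is still best left to the XML API.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// XML 1.0 "Char" production:
//   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
bool is_xml10_char(std::uint32_t cp) {
    return cp == 0x9 || cp == 0xA || cp == 0xD ||
           (cp >= 0x20 && cp <= 0xD7FF) ||
           (cp >= 0xE000 && cp <= 0xFFFD) ||
           (cp >= 0x10000 && cp <= 0x10FFFF);
}

// True if every character of a UTF-16 string can appear in an XML 1.0
// document. Surrogate pairs are combined first; an unmatched half fails.
bool is_xml10_representable(const std::u16string& s) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        std::uint32_t cp = s[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: must be followed by a low surrogate.
            if (i + 1 >= s.size() || s[i + 1] < 0xDC00 || s[i + 1] > 0xDFFF)
                return false;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (s[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return false;  // low surrogate with no preceding high surrogate
        }
        if (!is_xml10_char(cp)) return false;
    }
    return true;
}
```

A string containing U+0007 fails this check, and so does a lone 0xD83C. No amount of entity escaping in ToXmlCompliantStr changes that.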

How to automatically escape quotes in JSON (C#)

I receive JSON from a server. The JSON is very large; here is a small piece of it:
{
"id": "9429531978965160",
"name": "Morning in "Paris"", // Json.NET cannot deserialize this line because the inner quotes are not escaped.
"alias": "ThisAlias"
}
The problem is the server side that generates invalid JSON.
You could try writing a regex that fixes this (one that finds any quotes between the third quote and the last quote on a line and escapes them). Just note that there might be many other issues with the JSON, such as unescaped newlines.
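If fixing the server really is impossible, the regex idea above can be sketched as a per-line function. This is a C++ heuristic, not a JSON parser. escape_inner_quotes is hypothetical, and it assumes each line holds at most one "key": "value" pair, as in the sample, with the value's closing quote being the last quote on the line.

```cpp
#include <cstddef>
#include <string>

// Heuristic repair: every quote strictly between the third quote on the line
// (taken as the value's opening quote) and the last quote on the line (taken
// as its closing quote) is escaped, unless it is already preceded by '\'.
std::string escape_inner_quotes(const std::string& line) {
    std::size_t first = line.find('"');
    if (first == std::string::npos) return line;
    std::size_t second = line.find('"', first + 1);
    if (second == std::string::npos) return line;
    std::size_t third = line.find('"', second + 1);
    std::size_t last = line.rfind('"');
    if (third == std::string::npos || third >= last) return line;  // no string value

    std::string out = line.substr(0, third + 1);
    for (std::size_t i = third + 1; i < last; ++i) {
        if (line[i] == '"' && line[i - 1] != '\\') out += '\\';
        out += line[i];
    }
    out += line.substr(last);
    return out;
}
```

On the sample line it produces "name": "Morning in \"Paris\"",. On a line holding two properties, such as {"a": "A", "b": "B"}, it produces {"a": "A\", \"b\": \"B"}, which is exactly the wrong parse described in the next answer. Treat it as a stopgap at best.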
It's not just that the output you are receiving is non-standard JSON; it's broken in such a way that it's not a well-defined language and doesn't parse unambiguously even in the simple cases. How should you parse {"a": "A", "b": "B"}? One way is as legal JSON. Another valid parse is a single property a with the value "A\", \"b\": \"B".
As others have said, the best resolution is to fix the server so that it no longer outputs invalid garbage. If that's not an option, you'll have to write your own parser. A normal parser would report a syntax error at the 'P' in "Paris". Your parser could back up to the last quote token and try to treat it as if it were escaped. The next syntax error is at the second of the consecutive quotes, and again it could back up and treat that quote token as if it were escaped. If the input deviates from legal JSON in any other way, you'll need to handle that as well.
If you're not familiar with parsers, this will take a while. And when you're done you'll have a parser that recognizes a poorly-specified and almost totally useless language, which is to say that it will largely be a waste of time. Do what you can to fix it on the server side.

C# UTF8 encoding

I have a C# program that retrieves some JSON data and uses Newtonsoft JSON to deserialize it.
Because I use Persian characters in my program, the JSON shows them like this: \u060c \u067e\u0644\u0627\u06a9 .... After I retrieve the JSON data, these characters still appear in that escaped form, but after I deserialize it they are converted to ???? characters.
What should I do?
Your JSON deserializer is broken; \uXXXX is supposed to be turned into proper characters.
To do that yourself, you can use this function:
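using System.Globalization;
using System.Text.RegularExpressions;

// Turns every occurrence of \uXXXX into a proper character
static string UnencodeJSONUnicode(string str)
{
    return Regex.Replace(str,
        @"\\u(?<value>[0-9a-fA-F]{4})",
        match => {
            string digits = match.Groups["value"].Value;
            int number = int.Parse(digits, NumberStyles.HexNumber);
            // Cast rather than char.ConvertFromUtf32, which throws on surrogate
            // halves; a surrogate pair arrives as two escapes and is rebuilt
            // when both UTF-16 code units end up side by side.
            return ((char)number).ToString();
        });
}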
(Untested code; I don't have VS available at the moment. Some exception handling would probably be nice too)
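For illustration, the same decoding can be sketched in C++. decode_json_unicode_escapes is hypothetical and is not a Json.NET API; a conforming JSON parser already performs this step while parsing. It decodes each backslash-u escape to UTF-8 and combines surrogate pairs such as \uD83C\uDFFD into one code point. It replaces unpaired surrogates with U+FFFD and leaves an escaped backslash followed by u untouched, a case the regex version does not handle.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Appends code point cp to out, encoded as UTF-8.
static void append_utf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Parses 4 hex digits at s[pos..pos+3]; returns -1 if missing or not hex.
static long read_hex4(const std::string& s, std::size_t pos) {
    if (pos + 4 > s.size()) return -1;
    long v = 0;
    for (std::size_t i = pos; i < pos + 4; ++i) {
        char c = s[i];
        int d = (c >= '0' && c <= '9') ? c - '0'
              : (c >= 'a' && c <= 'f') ? c - 'a' + 10
              : (c >= 'A' && c <= 'F') ? c - 'A' + 10
              : -1;
        if (d < 0) return -1;
        v = v * 16 + d;
    }
    return v;
}

// Replaces each backslash-u escape with its character in UTF-8. A high/low
// surrogate pair of escapes becomes one code point; a lone surrogate becomes
// U+FFFD. Escaped backslashes and all other text are copied unchanged.
std::string decode_json_unicode_escapes(const std::string& s) {
    std::string out;
    std::size_t i = 0;
    while (i < s.size()) {
        if (s[i] == '\\' && i + 1 < s.size() && s[i + 1] == '\\') {
            out += "\\\\";  // keep an escaped backslash as-is
            i += 2;
            continue;
        }
        long u = (s[i] == '\\' && i + 1 < s.size() && s[i + 1] == 'u')
                     ? read_hex4(s, i + 2) : -1;
        if (u < 0) {
            out += s[i++];
            continue;
        }
        std::uint32_t cp = static_cast<std::uint32_t>(u);
        i += 6;
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < s.size() &&
            s[i] == '\\' && s[i + 1] == 'u') {
            long lo = read_hex4(s, i + 2);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) +
                     (static_cast<std::uint32_t>(lo) - 0xDC00);
                i += 6;
            }
        }
        if (cp >= 0xD800 && cp <= 0xDFFF) cp = 0xFFFD;  // unpaired surrogate
        append_utf8(out, cp);
    }
    return out;
}
```

For example, the text \u060c decodes to the Arabic comma (UTF-8 bytes D8 8C), and \uD83C\uDFFD decodes to the single character U+1F3FD (F0 9F 8F BD).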
It looks like the text has been JSON-encoded, so you need to decode it. The DataContractJsonSerializer class can do this.
See this MSDN link for more information.
