C# UTF8 encoding

I have a C# program that retrieves some JSON data and uses Newtonsoft JSON to deserialize it.
Because I use Persian characters in my program, the JSON text shows them like this: \u060c \u067e\u0644\u0627\u06a9 .... After I retrieve the JSON data in my program, these characters still appear in that escaped form, but after I deserialize it they are converted to ???? characters.
What should I do?

Your JSON deserializer is broken; \uXXXX is supposed to be turned into proper characters.
To do that yourself, use this function:
using System.Globalization;
using System.Text.RegularExpressions;
// Turns every occurrence of \uXXXX into a proper character
static string UnencodeJSONUnicode(string str)
{
    return Regex.Replace(str,
        @"\\u(?<value>[0-9a-fA-F]{4})",
        match =>
        {
            string digits = match.Groups["value"].Value;
            int number = int.Parse(digits, NumberStyles.HexNumber);
            // A cast handles every UTF-16 code unit; char.ConvertFromUtf32
            // would throw on surrogate halves such as \uD83D
            return ((char)number).ToString();
        });
}
(Untested code; I don't have VS available at the moment. Some exception handling would probably be nice too.)

It looks like the text has been JSON-encoded, so you need to decode it. The DataContractJsonSerializer class can do this.
See this MSDN link for more information.
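A minimal sketch of that suggestion, assuming the escaped Persian text arrives as a JSON string value. To stay short it reads a bare JSON string (a hypothetical payload built from characters in the question), so no data-contract class is needed; a real payload would normally map onto a [DataContract] type instead. If the decoded text still prints as ????, the display (for example a console whose output encoding is not UTF-8) is a likely culprit rather than the parser.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization.Json;
using System.Text;

// Hypothetical payload: a JSON string literal whose content is escaped Persian text.
string json = "\"\\u067e\\u0644\\u0627\\u06a9\"";

// DataContractJsonSerializer decodes the \uXXXX escapes while reading.
var serializer = new DataContractJsonSerializer(typeof(string));
using var stream = new MemoryStream(Encoding.UTF8.GetBytes(json));
var decoded = (string)serializer.ReadObject(stream);

// The result holds real characters, not the six-character escape sequences.
Console.WriteLine(decoded == "\u067e\u0644\u0627\u06a9"); // True
```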

Related

Preparing a String to be used in Json

I have a string that I need to use as the body of a JSON object. I know it's possible that the data could contain quotes, so I parse through it to add an escape character to those instances of quotes, like so:
string NewComment = comment.Replace("\"", "\\\"");
However, in some edge cases a quote still makes it through. I don't know if this is something with UTF or some other issue, but I am trying to find a function that would safely create a JSON-compatible string. I figured there has to be something like this out there, or a regex way of doing it.
TL;DR: how do I create a JSON-syntax-safe string from a C# string?
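The simple answer is: don't do it this way. What if your string already contains escaped quotes? "Hello \"World\"" would become invalid with such a simple approach: "Hello \\"World\\"". The doubled backslash is read as an escaped backslash, so the quote that follows ends the string early. Backslashes and control characters such as newlines have to be escaped too. Json.NET (Newtonsoft.Json) is going to save you so many headaches in the long run.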
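A minimal sketch of letting a serializer do the escaping. It uses System.Text.Json (built into .NET Core 3.0 and later) in place of the Json.NET library named above; with Json.NET, JsonConvert.ToString(comment) plays the same role. The input string is a hypothetical one chosen to defeat the naive Replace.

```csharp
using System;
using System.Text.Json;

// Input that breaks Replace("\"", "\\\""): it already contains backslash-quote
// pairs and a newline.
string comment = "Hello \\\"World\\\"\nsecond line";

// Serializing a plain string yields a quoted, fully escaped JSON string literal.
string literal = JsonSerializer.Serialize(comment);
Console.WriteLine(literal);

// Parsing the literal gives back exactly the original text.
string roundTrip = JsonSerializer.Deserialize<string>(literal);
Console.WriteLine(roundTrip == comment); // True
```

The exact escape spelling (for example whether a quote becomes \" or \u0022) depends on the encoder settings, but any compliant JSON parser reads it back to the same text.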

CBOR serialization (string escaping)

I was CBOR-serializing a JSON object in C++ with the nlohmann::json library, and my use case involves reading the CBOR byte-string output in C#. I've noticed that when dumping a json object to a string in C++ with nlohmann::json, string values (i.e., case value_t::string) are escaped (a call to escape_string is made), whereas no such call is made for string values in the CBOR approach.
I was reading the CBOR RFC 7049, and it seems that strings do not need to be escaped when serializing to CBOR.
The behavior in the nlohmann::json library is consistent: strings are not escaped when serializing, nor expected to be escaped when deserializing.
But it appears that Newtonsoft.Json (a C# library) expects that. Is that a valid expectation? Or am I doing something wrong in the process?
C++ side:
nlohmann::json json_doc;
json_doc["characters"] = nlohmann::json::array();
for (std::size_t i = 0; i < characters.size(); i++) {
    json_doc["characters"][i]["name"] = (characters[i] != nullptr) ? characters[i]->name() : "";
}
std::vector<uint8_t> cbor = nlohmann::json::to_cbor(json_doc);
output->assign(reinterpret_cast<const char*>(cbor.data()), cbor.size());
C# side (cbor_bytes is the CBOR byte string, i.e. the C++ output vector):
CBORObject cbor = CBORObject.DecodeFromBytes(cbor_bytes);
output = cbor.ToString();
By then, the output string is malformed:
{"characters": [{"name": "Clara Oswald"}, {"name": "Kensi Blye"}, {"name": "Temperance "Bones" Brennan"}]}
and obviously cannot be parsed:
JObject output_obj = JObject.Parse(output);
CBOR (Concise Binary Object Representation) is not JSON (JavaScript Object Notation). Although CBOR may have borrowed some concepts from JSON, it is clearly a different format with different rules and goals. CBOR is a binary format; JSON is text. In CBOR, strings have length prefixes, whereas they do not in JSON. Furthermore, CBOR does not allow arbitrary whitespace between elements (it wouldn't make sense for a binary format), whereas JSON does (for human readability). Ultimately, CBOR does not need a mechanism to escape strings because it does not require delimiters to tell where a string starts and ends. JSON, on the other hand, requires double quotes to mark the beginning and end of each string. As a consequence, quotes and control characters within strings must be escaped with backslashes in JSON, as well as literal backslashes themselves. There is no getting around this rule if you want to ensure the JSON will be parsable.
In your code above you are using the CBORObject.ToString() method to turn the object into a string. If this CBORObject is from a third-party library, does the documentation state that ToString() will produce valid JSON? If so, then it definitely has a bug; it should be doing the proper escaping as required by the JSON spec. If there is no such promise of valid JSON, then you can't expect that Json.Net will be able to parse the string, even if it sort of looks like JSON. (You might check to see whether the CBORObject has some other dedicated method like ToJson() for performing this conversion.) If CBORObject is your own code, then it is on you to escape the strings properly when converting from CBOR to JSON.
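To illustrate the point about length prefixes, here is a minimal sketch, not a full CBOR implementation: a hand-rolled decoder for a single CBOR text string (major type 3, lengths under 256 only), followed by conversion to JSON. The helper name DecodeCborTextString is hypothetical, and System.Text.Json is used for the JSON side. The quotes pass through CBOR untouched and only get escaped when the JSON is written.

```csharp
using System;
using System.Text;
using System.Text.Json;

// Hypothetical helper: decodes one CBOR text string (major type 3).
// Only lengths below 256 are handled, to keep the sketch short.
static string DecodeCborTextString(byte[] data)
{
    int majorType = data[0] >> 5;
    int info = data[0] & 0x1F;
    if (majorType != 3) throw new FormatException("not a text string");

    int length, offset;
    if (info < 24) { length = info; offset = 1; }          // length stored in the initial byte
    else if (info == 24) { length = data[1]; offset = 2; } // one-byte length follows
    else throw new NotSupportedException("longer lengths omitted in this sketch");

    // The length prefix marks where the string ends, so no escaping is involved.
    return Encoding.UTF8.GetString(data, offset, length);
}

string name = "Temperance \"Bones\" Brennan";
byte[] utf8 = Encoding.UTF8.GetBytes(name);

// Hand-encode: 0x78 = major type 3 with a one-byte length (24..255) following.
byte[] cbor = new byte[utf8.Length + 2];
cbor[0] = 0x78;
cbor[1] = (byte)utf8.Length;
Array.Copy(utf8, 0, cbor, 2, utf8.Length);

string decoded = DecodeCborTextString(cbor);

// Converting to JSON is where escaping must happen; the serializer does it.
string json = JsonSerializer.Serialize(new { name = decoded });
Console.WriteLine(json);

// The JSON parses and yields the original quotes intact.
using JsonDocument doc = JsonDocument.Parse(json);
Console.WriteLine(doc.RootElement.GetProperty("name").GetString() == name); // True
```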

How do I read this Object

I want to read the alert message from this object:
{
alert = "1\n2\n3";
sound = default;
}
I have tried serializing it to JSON with Newtonsoft, and I've also tried converting it to a class, but both failed because of the formatting.
That is not valid JSON, so the best thing you can do is parse it yourself.
You can use a full-fledged parser/lexer for it, like ANTLR. Depending on what other outputs you expect, some plain C# or a regex might be enough.
This regex might be a start (the \s* lets it skip the line break and indentation before sound):
alert = "(.*?)";\s*sound = (.*?);
I have tried serializing it to JSON
But that's not JSON.
So you could try a regex to extract the desired value:
var match = Regex.Match(payloadStr, @"alert\s=\s""(.+)""");
if (match.Success)
{
    string alertText = match.Groups[1].Value;
}
How reliable this regex is depends very much on the custom format being used and the values it can take.
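If more fields are needed, a slightly more general sketch is to collect every key = value; pair into a dictionary. It assumes, based only on the single sample in the question, that values are either double-quoted (possibly containing backslash escapes such as \n) or bare tokens ending at a semicolon. payloadStr reuses the variable name from the answer above and holds a hypothetical copy of the sample.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Sample from the question; the \n sequences are assumed to be literal backslash-n.
string payloadStr = "{\n    alert = \"1\\n2\\n3\";\n    sound = default;\n}";

// Assumed grammar: key = "quoted value"; or key = bareValue;
var pairs = new Dictionary<string, string>();
foreach (Match m in Regex.Matches(payloadStr, @"(\w+)\s*=\s*(?:""((?:[^""\\]|\\.)*)""|([^;]*));"))
{
    // Group 2 is the quoted form, group 3 the bare form.
    string value = m.Groups[2].Success ? m.Groups[2].Value : m.Groups[3].Value.Trim();
    pairs[m.Groups[1].Value] = value;
}

// Turn the literal \n escapes into real line breaks (an assumed meaning).
string alert = pairs["alert"].Replace("\\n", "\n");
Console.WriteLine(alert);
Console.WriteLine(pairs["sound"]); // default
```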

Unicode literal string

I'm sending some JSON in an HTTP POST request. Some of the text within the JSON object is supposed to have superscripts.
If I create my string in C# like this:
string s = "here is my superscript: \u00B9";
... it converts the \u00B9 to the actual superscript 1, which breaks my JSON. I want the \u00B9 to show up exactly as I write it in the string, not as a superscript.
If I add an escape character, then it shows up like:
"here is my superscript: \\u00B9"
I don't want to use an escape character, but I also don't want it to be converted to the actual superscript. Is there a way to have C# not do Unicode conversion and leave it as literally: "\u00B9"?
If I understand your question correctly, add the at symbol (@) before the string to keep the escape sequences from being processed:
string s = @"here is my superscript: \u00B9";
http://msdn.microsoft.com/en-us/library/362314fe(v=vs.80).aspx
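I like @NinjaNye's answer, but the other approach is to use a double backslash to make it literal. Thus: string s = "here is my superscript: \\u00B9"; (The Visual Studio debugger displays this string with the backslash doubled, but the string itself contains a single backslash, so it is identical to the verbatim version.)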
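Both suggestions produce the same string: the six characters \u00B9 rather than the superscript. The sketch below checks that, and also shows a different route that is swapped in here rather than taken from the answers: keep the real character and let System.Text.Json, whose default encoder escapes non-ASCII characters, write the \u00B9 escape into the JSON.

```csharp
using System;
using System.Text.Json;

// Verbatim string: the backslash is kept, so the text ends with \u00B9 literally.
string verbatim = @"here is my superscript: \u00B9";
// Regular string with an escaped backslash: identical contents.
string doubled = "here is my superscript: \\u00B9";
Console.WriteLine(verbatim == doubled); // True

// Alternative: keep the real superscript and let the serializer escape it.
string real = "here is my superscript: \u00B9";
string json = JsonSerializer.Serialize(real);
Console.WriteLine(json);
```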
It is recommended that you encode your string before sending it to the server. You can encode it using Base64 or URL encoding on the client and decode it on the server side.

How to divide a string into an array

If I have the following plain string, how do I divide it into an array of three elements?
{["a","English"],["b","US"],["c","Chinese"]}
["a","English"],["b","US"],["c","Chinese"]
This problem is related to JSON string parsing, so I wonder if there is any API to facilitate the conversion.
Use DataContract serialization: http://msdn.microsoft.com/en-us/library/bb412179.aspx
I wrote a little console example using a regex; there is most likely a better way to do it.
using System;
using System.Text.RegularExpressions;
static void Main(string[] args)
{
    string str = "{[\"a\",\"English\"],[\"b\",\"US\"],[\"c\",\"Chinese\"]}";
    // Each non-greedy match is one bracketed element such as ["a","English"]
    foreach (Match m in Regex.Matches(str, @"\[.*?\]"))
    {
        Console.WriteLine(m.Value);
    }
}
ASP.NET MVC comes with methods for easily converting collections to JSON format. There is also the JavaScriptSerializer class in System.Web.Script.Serialization. Lastly, there is a good third-party library by James Newton-King called Json.NET that you can use.
Remove the curly braces then use String.Split, with ',' as the separator.
Unfortunately, I've never done JSON work, so I don't know a parsing library. Can't you let WCF do this for you?
Using String.Split on commas won't work, because each element itself contains a comma. If I understood the requirements, the array elements should end up being:
["a","English"]
["b","US"]
["c","Chinese"]
If you use string.Split with a comma as the delimiter, the array will be made up of:
["a"
"English"]
["b"
"US"]
["c"
"Chinese"]
A JSON parser that I've read about but never used is available here:
http://james.newtonking.com/pages/json-net.aspx
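For comparison with the regex and String.Split approaches, a minimal sketch that hands the text to a JSON parser. The outer {...} is not valid JSON around bare arrays, so this assumes it is meant as a list and swaps it for [...] before parsing. System.Text.Json is used here in place of the libraries mentioned in the answers.

```csharp
using System;
using System.Text.Json;

string str = "{[\"a\",\"English\"],[\"b\",\"US\"],[\"c\",\"Chinese\"]}";

// Assumption: the outer braces stand for an array; replace them with brackets.
string asJsonArray = "[" + str.Substring(1, str.Length - 2) + "]";

// Parse into an array of string pairs.
string[][] pairs = JsonSerializer.Deserialize<string[][]>(asJsonArray);
foreach (string[] pair in pairs)
{
    // Re-serialize each element to get text like ["a","English"] back.
    Console.WriteLine(JsonSerializer.Serialize(pair));
}
Console.WriteLine(pairs.Length); // 3
```

Parsing this way keeps each element's internal comma intact, which is exactly what splitting on ',' loses.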
