Problem when serializing objects with strings that contain "/"


I am using DataContractJsonSerializer to serialize an object, and to do this I am using the following function:
using System;
using System.IO;
using System.Runtime.Serialization.Json;
using System.Text;

public static string Serialize<T>(T obj)
{
    string returnVal = "";
    try
    {
        DataContractJsonSerializer serializer = new DataContractJsonSerializer(obj.GetType());
        using (MemoryStream ms = new MemoryStream())
        {
            serializer.WriteObject(ms, obj);
            returnVal = Encoding.UTF8.GetString(ms.ToArray());
        }
    }
    catch (Exception /*exception*/)
    {
        returnVal = "";
        //log error
    }
    return returnVal;
}
This function works well, except in the following situation (I am hesitant to change it, since I don't know how that would affect the rest of my code).
The situation in which it does not work well
Say obj (the argument) is an object such as:
[DataContract()]
public class theObject
{
    [DataMember()]
    public string image;
}
where image holds the Base64-encoded contents of a BMP file.
It is a long value; for example, it starts as: "Qk1W/QAAAAAAADYAAAAoAAAAawAAAMgAAAABABgAAAAAACD9AADEDgAAxA4AAAAAAAAAAAAA////////////////////////////////////7+/...."
As you can see, it contains a lot of / characters.
When I pass this object to Serialize, it writes the object into ms with WriteObject, converts the stream to a byte array, and decodes that into returnVal.
Now let's examine returnVal. It is valid JSON, and when you view it with the JSON visualizer it shows:
image:"Qk1W/QAAAAAAADYAAAAoAAAAawAAAMgAAAABABgAAAAAACD9AADEDgAAxA4AAAAAAAAAAAAA////////////////////////////////////7+/...."
However, when you view it with the text visualizer it shows:
"image":"Qk1W\/QAAAAAAADYAAAAoAAAAawAAAMgAAAABABgAAAAAACD9AADEDgAAxA4AAAAAAAAAAAAA\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/7+\/..."
Notice that a \ has been inserted before every /, and that makes a big difference.
So my questions are:
Why do the JSON visualizer and the text visualizer show different things?
How can I get the correct value after serialization (without the added \s)?
EDIT:
Although one could argue that \/ and / are equivalent, the consequences are not. Later, when I send this JSON to a Web API using
byte[] bytes = Encoding.UTF8.GetBytes(json);
ByteArrayContent byteContent = new ByteArrayContent(bytes);
byteContent.Headers.ContentType = new MediaTypeWithQualityHeaderValue(content);
the version with the added \ characters produces a 115442-byte array, while the version that uses plain / produces 86535 bytes, so the payloads differ considerably in size.
So how can I get the result without the added \s?

The standard behavior of the DataContractJsonSerializer is to escape / characters in strings so that they become \/ in JSON. When the JSON is deserialized back to an object, the \/ escape sequences are turned back into /, so no data is lost or corrupted. (Try it and see.) However, it does result in a larger JSON size in bytes. If this is really a concern for you, there are a couple of things you can do to work around it:
Approach 1
Immediately after serializing, you could use string.Replace() to get rid of all backslashes which appear directly before slashes. You can do this right in your Serialize method by changing this line:
returnVal = Encoding.UTF8.GetString(ms.ToArray());
to this:
returnVal = Encoding.UTF8.GetString(ms.ToArray()).Replace("\\/", "/");
Because / has no special meaning in JSON, it's not actually necessary to escape it with \, although doing so is permitted. (See page 5 of the JSON specification.) DataContractJsonSerializer will still deserialize the JSON correctly when slashes are not escaped. (Try it yourself and see. I'd make a fiddle for this, but .NET Fiddle doesn't support DataContractJsonSerializer.)
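A minimal sketch of Approach 1 with a round-trip check. The ImagePayload contract stands in for theObject, and the SlashFreeJson helper class is hypothetical. Serialize applies the Replace described above, and Deserialize shows that both the escaped and the unescaped forms read back to the same string.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Json;
using System.Text;

// Hypothetical contract mirroring theObject from the question
[DataContract]
public class ImagePayload
{
    [DataMember]
    public string image;
}

public static class SlashFreeJson
{
    // Serializes with DataContractJsonSerializer, then removes the optional \/ escaping
    public static string Serialize<T>(T obj)
    {
        var serializer = new DataContractJsonSerializer(typeof(T));
        using (var ms = new MemoryStream())
        {
            serializer.WriteObject(ms, obj);
            return Encoding.UTF8.GetString(ms.ToArray()).Replace("\\/", "/");
        }
    }

    // Deserializes JSON text; accepts both escaped (\/) and bare (/) slashes
    public static T Deserialize<T>(string json)
    {
        var serializer = new DataContractJsonSerializer(typeof(T));
        using (var ms = new MemoryStream(Encoding.UTF8.GetBytes(json)))
        {
            return (T)serializer.ReadObject(ms);
        }
    }
}
```

For example, serializing an ImagePayload whose image is "Qk1W/QAA////7+/" yields JSON with no \/ sequences, and deserializing that JSON gives back the original string.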
Approach 2 (recommended)
Switch to a better JSON serializer such as Json.NET, which does not escape the slashes in the first place. You can simplify your code and replace your entire Serialize method with a call to JsonConvert.SerializeObject().
Fiddle: https://dotnetfiddle.net/MQKXSD
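A minimal sketch of Approach 2, assuming the Newtonsoft.Json NuGet package is referenced. Json.NET honours [DataContract]/[DataMember] attributes, and its default string escaping leaves / alone. The ImageMessage and JsonNetSerializer names are illustrative.

```csharp
using System.Runtime.Serialization;
using Newtonsoft.Json;

// Hypothetical contract mirroring theObject from the question
[DataContract]
public class ImageMessage
{
    [DataMember]
    public string image;
}

public static class JsonNetSerializer
{
    // Drop-in replacement for the question's Serialize<T>; '/' is written as-is
    public static string Serialize<T>(T obj) => JsonConvert.SerializeObject(obj);
}
```

Serializing an ImageMessage whose image is "Qk1W/QAA" gives {"image":"Qk1W/QAA"}, with no backslashes added.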

Related

Prevent Newtonsoft JSON deserializer from accepting strings with new lines (enforce strict JSON rules)?

Json.NET allows newlines in a string value during deserialization, which is against the JSON specification. How can I prevent that and make Json.NET strictly enforce JSON rules?
We have some server-side code that uses Newtonsoft to parse some JSON. The same JSON fails to parse in JavaScript, and MySQL's JSON_VALID function returns 0. I'm wondering whether there is a way to make Newtonsoft stricter about deserialization. For example, here is code that runs but should throw an exception, because JSON cannot have embedded newlines in strings.
string jsonStr = "{ \"bob\":\"line1\nline2\" }";
var obj = Newtonsoft.Json.JsonConvert.DeserializeObject(jsonStr);
If you look at jsonStr in the debugger, specifically using the text visualizer, you see the line break. As expected, when this exact string is passed to an actual JavaScript engine, parsing fails:
JSON.parse("{ \"bob\":\"line1\nline2\" }")
VM137:1 Uncaught SyntaxError: Unexpected token
Note that the serialization code seems to do the "right" thing, i.e. it escapes the newline as \n when creating output.
public class Test
{
    public string Name { get; set; }
}
Test t = new Test();
t.Name = "Bob\nFrank";
string jsonOut = Newtonsoft.Json.JsonConvert.SerializeObject(t);
Am I missing something?

I've debugged Newtonsoft Json.NET, and I'd say you can't. Everything interesting happens in the JsonTextReader class, and there is no useful override point. The path you are interested in is Read()->ParseValue()->ParseString()->ReadStringIntoBuffer(), and toward the end of that method there is:
case StringUtils.CarriageReturn:
    _charPos = charPos - 1;
    ProcessCarriageReturn(true);
    charPos = _charPos;
    break;
case StringUtils.LineFeed:
    _charPos = charPos - 1;
    ProcessLineFeed();
    charPos = _charPos;
    break;
that will "accept" newlines inside strings.
Worse, there is not even an overridable method or event about "begin and end of string parsing".
You could clearly rewrite the whole JsonTextReader, but you can't simply copy and paste it in a new file, because it uses internal classes of Json.NET (like StringBuffer, StringReference, StringUtils, CollectionUtils, ConvertUtils, MiscellaneousUtils (but only for .Assert) plus internal methods of JsonReader...)
Remember that if all you want is to check whether the JSON is valid, you can try other existing parsers. Microsoft gives you two: the (old) JavaScriptSerializer and the (new) JsonSerializer.
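A minimal sketch of the validation-only route, assuming .NET Core 3.0 or later, where System.Text.Json (home of the new JsonSerializer) is built in. Its reader rejects unescaped control characters such as a raw line feed inside a string, so JsonDocument.Parse can act as a strict validity check while Json.NET remains the deserializer. The StrictJson.IsStrictJson helper is hypothetical.

```csharp
using System;
using System.Text.Json;

public static class StrictJson
{
    // Returns true only if System.Text.Json accepts the text; its reader
    // rejects raw control characters (e.g. '\n') inside string values
    public static bool IsStrictJson(string text)
    {
        try
        {
            using (JsonDocument.Parse(text))
            {
                return true;
            }
        }
        catch (JsonException)
        {
            return false;
        }
    }
}
```

With the question's input, IsStrictJson("{ \"bob\":\"line1\nline2\" }") returns false, while the properly escaped "{ \"bob\":\"line1\\nline2\" }" returns true.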

Characters added and wrong output during serialization with Json.NET

JSON.NET seems to serialize my objects into what appear to be strings instead of objects. Here's an example of what it returns:
"{\"kvk_nummer\":11111111,\"onderneming\":\"berijf B.V.\",\"vestigingsplaats\":\"AMSTERDAM\",\"actief\":1}"
It also adds strange backslashes. I tried to get rid of them, but none of the answers I've found seemed to help. Here is the code that returns the string:
getregister r = new getregister
{
    kvk_nummer = col1, //contains an 8 digit number
    onderneming = checkTotaal[col1], //contains a name
    vestigingsplaats = checkTotaal2[col1], //contains a location
    actief = 1 // bool that represents whether the company is active or not
};
yield return JsonConvert.SerializeObject(r);
How can I get JSON.NET to output an object instead of JSON strings?

It looks like you're confusing a few things. From Serialization (C#):
Serialization is the process of converting an object into a stream of bytes to store the object or transmit it to memory, a database, or a file. Its main purpose is to save the state of an object in order to be able to recreate it when needed. The reverse process is called deserialization.
When you serialize into JSON, you get a JSON representation of your object, which is a string representation. From the JSON Wikipedia page:
JavaScript Object Notation or JSON is an open-standard file format that uses human-readable text to transmit data objects consisting of attribute–value pairs and array data types (or any other serializable value).
In short: your code is doing what you're asking it to do. As for the slashes, those are escape characters. If you want an object, return the object you're creating (r) instead of serializing it:
// Inside an iterator, use yield return and change the method's return type to IEnumerable<getregister>
yield return new getregister
{
    kvk_nummer = col1, //contains an 8 digit number
    onderneming = checkTotaal[col1], //contains a name
    vestigingsplaats = checkTotaal2[col1], //contains a location
    actief = 1 // bool that represents whether the company is active or not
};
If you're looking for a way to have JSON.NET give you an object, look into deserializing: it takes the string representation (JSON) of your object and turns it back into an actual object for you.
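A minimal sketch of that round trip, assuming Newtonsoft.Json is referenced and assuming a getregister class whose property types are guessed from the question (kvk_nummer and actief as int). SerializeObject produces the JSON text; DeserializeObject<T> turns that text back into an object. The RegisterJson helper is hypothetical.

```csharp
using Newtonsoft.Json;

// Hypothetical shape of the question's class; property types are assumptions
public class getregister
{
    public int kvk_nummer { get; set; }
    public string onderneming { get; set; }
    public string vestigingsplaats { get; set; }
    public int actief { get; set; }
}

public static class RegisterJson
{
    // Object -> JSON text (a string representation)
    public static string ToJson(getregister r) => JsonConvert.SerializeObject(r);

    // JSON text -> object again
    public static getregister FromJson(string json) => JsonConvert.DeserializeObject<getregister>(json);
}
```

The string returned by ToJson contains no backslashes. The ones in the question most likely come from the debugger displaying the quotes in escaped form.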

convert byte array to string but not with Convert.ToBase64

Dear all,
I have a byte array returned from a web server; it is part of a JSON-serialized object (a property value).
It looks like this in the JSON string:
,"n":"y1GpP7FibyTYl40Jhx1B90WOi1mecJfpi4IEhbHPbAB64jhV16UlpEPyGpNIzDS4Lct80sIs7FW5Vnf38Z-tzPbtHyFVYYU2AC4SVrwQp9-ELz-..._xW3bmMxuwoBgHpWDTw"
Note that there is no double equals sign at the end, as there usually is for Base64 strings. I've used three dots (...) to make the string representation a little shorter.
I can deserialize the object and get the proper byte array:
var kb = JsonConvert.DeserializeObject<KeyBundle>(Properties.Resources.keyBundleJson);
And I can serialize it back to JSON:
JsonSerializerSettings settings = new JsonSerializerSettings
{
    TypeNameHandling = TypeNameHandling.None,
    Formatting = Formatting.Indented
};
string json = JsonConvert.SerializeObject(kb, settings);
But the problem is that the resulting property value doesn't look the same as the original string.
From the web server:
y1GpP7FibyTYl40Jhx1B90WOi1mecJfpi4IEhbHPbAB64jhV16UlpEPyGpNIzDS4Lct80sIs7FW5Vnf38Z-tzPbtHyFVYYU2AC4SVrwQp9-ELz-..._xW3bmMxuwoBgHpWDTw
Serialized locally:
y1GpP7FibyTYl40Jhx1B90WOi1mecJfpi4IEhbHPbAB64jhV16UlpEPyGpNIzDS4Lct80sIs7FW5Vnf38Z+tzPbtHyFVYYU2AC4SVrwQp9+ELz+.../xW3bmMxuwoBgHpWDTw==
The web server uses underscores and minus signs where the local output has slashes and plus signs, and the local output has two equals signs at the end.
Is it possible to serialize the byte array exactly as the web server does?
My idea is to serialize it with Json.NET, then replace plus with minus and slash with underscore, and remove the two trailing equals signs.
Is there any other way to get this out of the box?
Regards

URLs use a different variant of Base64 with - and _, which doesn't require additional encoding (e.g. + would otherwise be encoded as %2B). To convert, you can simply use the string Replace method to swap those characters.
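If you want an out-of-the-box solution, you can try the Microsoft.IdentityModel.Tokens NuGet package:
using Microsoft.IdentityModel.Tokens;
var encoded = Base64UrlEncoder.Encode(someString);
var decoded = Base64UrlEncoder.Decode(encoded);
// For byte arrays: Base64UrlEncoder.Encode(byte[]) and Base64UrlEncoder.DecodeBytes(string)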
For more info: https://en.wikipedia.org/wiki/Base64#URL_applications
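A minimal sketch of the string Replace approach using only the built-in Convert class. Encoding swaps + and / for - and _ and strips the = padding; decoding reverses the swap and restores padding to a multiple of four. The Base64Url class and its method names are illustrative, not part of any library.

```csharp
using System;

public static class Base64Url
{
    // Standard Base64 -> base64url: '+' -> '-', '/' -> '_', no '=' padding
    public static string Encode(byte[] data)
    {
        return Convert.ToBase64String(data)
            .TrimEnd('=')
            .Replace('+', '-')
            .Replace('/', '_');
    }

    // base64url -> bytes: undo the character swap and restore the padding
    public static byte[] Decode(string text)
    {
        string s = text.Replace('-', '+').Replace('_', '/');
        switch (s.Length % 4)
        {
            case 2: s += "=="; break;
            case 3: s += "="; break;
        }
        return Convert.FromBase64String(s);
    }
}
```

For example, the bytes 0xFB 0xFF encode to "+/8=" in standard Base64 and to "-_8" with this helper.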

Xml exception due to leading unicode character in REST API response

When I try to parse a response from a certain REST API, I'm getting an XmlException saying "Data at the root level is invalid. Line 1, position 1." Looking at the XML it looks fine, but then examining the first character I see that it is actually a zero-width no-break space (character code 65279 or 0xFEFF).
Is there any good reason for that character to be there? Maybe I'm supposed to be setting a different Encoding when I make my request? Currently I'm using Encoding.UTF8.
I've thought about just removing the character from the string, or asking the developer of the REST API to fix it, but before doing either I wanted to check whether there is a valid reason for that character to be there. I'm no Unicode expert. Is there something different I should be doing?
Edit: I suspected that it might be something like that (BOM). So, the question becomes, should I have to deal with this character specially? I've tried loading the XML two ways and both throw the same exception:
public static User GetUser()
{
    WebClient req = new WebClient();
    req.Encoding = Encoding.UTF8;
    string response = req.DownloadString(url);
    XmlSerializer ser = new XmlSerializer(typeof(User));
    User user = ser.Deserialize(new StringReader(response)) as User;
    XElement xUser = XElement.Parse(response);
    ...
    return user;
}

U+FEFF is a byte order mark. It's there at the start of the document to indicate the character encoding (or rather, the byte order of an encoding which could be either way; most specifically UTF-16). It's entirely reasonable for it to be there at the start of an XML document. Its use as a zero-width non-breaking space is deprecated in favour of U+2060.
It would be unreasonable if the byte order mark were in a different encoding, e.g. if it were a UTF-16 BOM in a document which claimed to be UTF-8.
How are you loading the document? Perhaps you're specifying an inappropriate encoding somewhere? It's best to let the XML API detect the encoding if at all possible.
EDIT: After you've downloaded it as a string, I can imagine that could cause problems, given that the BOM is used to detect the encoding, which has already been applied by then. Don't download it as a string; download it as binary data (WebClient.DownloadData), and then you should be able to parse it okay, I believe. However, you probably still shouldn't use XElement.Parse, as there may well be a document declaration; use XDocument.Parse. I'd be slightly surprised if the result of the call could be fed straight into XmlSerializer, but you can have a go... wrap it in a MemoryStream if necessary.
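A minimal sketch of the download-as-bytes advice, assuming the response is already in a byte[] (for example from WebClient.DownloadData). Loading from a stream lets the XML reader consume the BOM during encoding detection, so it never reaches the parser as a character. XDocument.Load(Stream) is used here in place of XDocument.Parse because the input is bytes; the BomSafeXml helper is hypothetical.

```csharp
using System.IO;
using System.Xml.Linq;

public static class BomSafeXml
{
    // Parses raw response bytes; the reader detects the encoding and skips the BOM
    public static XDocument LoadFromBytes(byte[] payload)
    {
        using (var stream = new MemoryStream(payload))
        {
            return XDocument.Load(stream);
        }
    }
}
```

The same MemoryStream could instead be handed to XmlSerializer.Deserialize if you need a typed User object.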

That is called a Byte Order Mark. It's not required in UTF-8 though.

Instead of using Encoding.UTF8, create your own UTF-8 encoder, using the constructor overload that lets you specify whether or not the BOM is to be emitted:
req.Encoding = new UTF8Encoding( false ) ; // omit the BOM
I believe that will do the trick for you.
Amended to Note: The following will work:
public static User GetUser()
{
    WebClient req = new WebClient();
    req.Encoding = Encoding.UTF8;
    // Download raw bytes so the XML reader can detect the encoding from the BOM itself
    byte[] response = req.DownloadData(url);
    User instance ;
    using ( MemoryStream stream = new MemoryStream(response) )
    using ( XmlReader reader = XmlReader.Create( stream ) )
    {
        XmlSerializer serializer = new XmlSerializer(typeof(User)) ;
        instance = (User) serializer.Deserialize( reader ) ;
    }
    return instance ;
}

That character at the beginning is the BOM (Byte Order Mark). It's placed as the first character in Unicode text files to specify which encoding was used to create the file.
The BOM should not be part of the response, as the encoding is specified differently for HTTP content (in the Content-Type header).
Typically a BOM in the response comes from sending a text file as the response, where the text file was saved with the BOM signature. Visual Studio, for example, has an option to save a file without the BOM signature so that it can be sent directly as a response.
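A minimal sketch of the file side, assuming the response body comes from a text file you write yourself. The UTF8Encoding constructor flag decides whether the three-byte EF BB BF signature is emitted. The BomFiles helper is hypothetical, and the demo writes to a temporary file.

```csharp
using System.IO;
using System.Text;

public static class BomFiles
{
    // Writes text as UTF-8 (with or without the BOM) and returns the raw file bytes
    public static byte[] WriteAndRead(string path, string text, bool emitBom)
    {
        File.WriteAllText(path, text, new UTF8Encoding(emitBom));
        return File.ReadAllBytes(path);
    }
}
```

Writing "<x/>" produces 7 bytes starting with 0xEF when emitBom is true, and just the 4 bytes of the text when it is false.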

C# UTF8 encoding

I have a C# program that retrieves some JSON data and uses Newtonsoft JSON to deserialize it.
Since I use Persian characters in my program, the JSON shows them as escape codes like this: \u060c \u067e\u0644\u0627\u06a9 .... After I retrieve the JSON data in my program, these characters still appear in that escaped form, but after I deserialize it they are converted to ???? characters.
What should I do?

Your JSON deserializer is broken; \uXXXX is supposed to be turned into proper characters.
To do that yourself, you can use this function:
using System.Globalization;
using System.Text.RegularExpressions;

// Turns every occurrence of \uXXXX into a proper character
static string UnencodeJSONUnicode(string str) {
    return Regex.Replace(str,
        @"\\u(?<value>[0-9a-fA-F]{4})",
        match => {
            string digits = match.Groups["value"].Value;
            int number = int.Parse(digits, NumberStyles.HexNumber);
            // Cast to char so surrogate halves (e.g. \uD83D\uDE00) are kept and recombine
            return ((char)number).ToString();
        });
}
(Untested code; I don't have VS available at the moment. Some exception handling would probably be nice too)

It looks like the text has been JSON-encoded, so you need to decode it. The DataContractJsonSerializer class can do this.
See this MSDN link for more information.
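A minimal sketch of that suggestion, assuming the escaped text is a complete JSON string literal, quotes included. Deserializing it as a string with DataContractJsonSerializer turns each \uXXXX escape back into the character it represents. The JsonStringDecoder helper is hypothetical.

```csharp
using System.IO;
using System.Runtime.Serialization.Json;
using System.Text;

public static class JsonStringDecoder
{
    // Decodes a JSON string literal such as "\u067e\u0644" into a .NET string
    public static string Decode(string jsonStringLiteral)
    {
        var serializer = new DataContractJsonSerializer(typeof(string));
        using (var ms = new MemoryStream(Encoding.UTF8.GetBytes(jsonStringLiteral)))
        {
            return (string)serializer.ReadObject(ms);
        }
    }
}
```

If the decoded string still appears as ???? afterwards, the characters may be getting lost where they are displayed or written out rather than during decoding.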
