In the .NET world, object serialization usually comes down to inspecting an object's fields and properties at runtime. Using reflection for this job is usually slow, which is undesirable when dealing with large sets of objects. The alternative is to emit IL or build expression trees, which gives a significant performance gain over reflection, and this is what most modern libraries choose for serialization. However, building and emitting IL at runtime takes time, and the investment only pays off if the result is cached and reused for objects of the same type.
When using Json.NET, it is not clear to me which of these methods is used and, if it is the latter, whether caching is used.
For example, when I do:
JsonConvert.SerializeObject(new Foo { value = 1 });
Does Json.NET build Foo's member access info and cache it for later reuse?
Yes, it does. Json.NET caches type serialization information inside its IContractResolver classes DefaultContractResolver and CamelCasePropertyNamesContractResolver. Unless you specify a custom contract resolver, this information is cached and reused.
For DefaultContractResolver, a global static instance is maintained internally that Json.NET uses whenever the application does not specify its own contract resolver. CamelCasePropertyNamesContractResolver, on the other hand, maintains static tables that are shared across all instances. (I believe the inconsistency arises from legacy issues.)
Both of these types are designed to be fully thread-safe so sharing between threads should not be a problem.
If you choose to implement and instantiate your own contract resolver, then type information will only be cached and reused if you cache and reuse the contract resolver instance itself. Thus, Newtonsoft recommends:
For performance you should create a contract resolver once and reuse instances when possible. Resolving contracts is slow and implementations of IContractResolver typically cache contracts.
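As a minimal sketch of that recommendation, assuming a custom resolver is needed at all: the holder class `SharedJson` below is a hypothetical name, and the snake-case naming strategy is only there to give the resolver a reason to exist. The point is that the resolver is created once and every call goes through the same instance.

```csharp
using Newtonsoft.Json;
using Newtonsoft.Json.Serialization;

public static class SharedJson
{
    // Hypothetical holder: one settings object whose resolver lives for the
    // lifetime of the process, so each type's contract is built only once.
    private static readonly JsonSerializerSettings Settings = new JsonSerializerSettings
    {
        ContractResolver = new DefaultContractResolver
        {
            NamingStrategy = new SnakeCaseNamingStrategy()
        }
    };

    public static string Serialize(object value)
    {
        // Every call reuses the same resolver instance and therefore its contract cache.
        return JsonConvert.SerializeObject(value, Settings);
    }
}
```

Creating `new DefaultContractResolver { ... }` inside `Serialize` instead would compile and produce the same JSON, but the contracts would be rebuilt on every call.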
If memory consumption is a problem and for whatever reason you need to minimize the memory permanently taken by cached contracts, you can construct your own local instance of DefaultContractResolver (or some custom subclass), serialize using that, and then immediately remove all references to it, e.g.:
public class JsonExtensions
{
    public static string SerializeObjectNoCache<T>(T obj, JsonSerializerSettings settings = null)
    {
        settings = settings ?? new JsonSerializerSettings();
        bool reset = (settings.ContractResolver == null);
        if (reset)
            // To reduce memory footprint, do not cache contract information in the global contract resolver.
            settings.ContractResolver = new DefaultContractResolver();
        try
        {
            return JsonConvert.SerializeObject(obj, settings);
        }
        finally
        {
            if (reset)
                settings.ContractResolver = null;
        }
    }
}
And if you are using CamelCasePropertyNamesContractResolver, switch to DefaultContractResolver with an appropriate naming strategy such as:
settings.ContractResolver = new DefaultContractResolver { NamingStrategy = new CamelCaseNamingStrategy() };
The majority of cached contract memory (but not all) will eventually get garbage collected. Of course, by doing this, serialization performance may suffer substantially. (Some tables containing reflected information about e.g. enum types and data contract attributes are shared globally and not reclaimed.)
For further information see Newtonsoft's Performance Tips: Reuse Contract Resolver.
On our API we need to take in json, deserialize it to an interface, set a field, and ship it off. To achieve this, on both ends I'm setting the jsonConvert to use TypeNameHandling.All. The endpoint in question is supposed to be fairly locked down, but there's always a chance of someone gaining access and setting $type to a system class with a dangerous constructor or garbage collection method.
My question is: would checking the namespace of the type before attempting to deserialize it be sufficiently safe? Or would there still be a risk of having something like a sub-object with a dangerous class type in the json? If there is still a risk or an exploit I've missed, what other steps can I take to mitigate the danger?
Our company name is at the start of every namespace we use, so in the code below we just check that the type set in the json starts with our company name. The {} at the start is just so the compiler knows it doesn't need to keep the JObject in memory after the check.
{ //check the type is valid
    var securityType = JsonConvert.DeserializeObject<JObject>(request.requestJson);
    JToken type;
    if (securityType.TryGetValue("$type", out type))
    {
        if (!type.ToString().ToLower().StartsWith("foo")) { //'foo' is our company name, all our namespaces start with foo
            await logError($"Possible security violation, client tried to instantiate {type}", clientId: ClientId);
            throw new Exception($"Request type {type} not supported, please use an IFoo");
        }
    }
    else
    {
        throw new Exception("set a type...");
    }
}
IFoo requestObject = JsonConvert.DeserializeObject<IFoo>(request.requestJson, new JsonSerializerSettings()
{
    TypeNameHandling = TypeNameHandling.All
});
The risk with TypeNameHandling is that an attacker may trick the receiver into constructing an attack gadget - an instance of a type that when constructed, populated or disposed effects an attack on the receiving system. For an overview see
TypeNameHandling caution in Newtonsoft Json
External json vulnerable because of Json.Net TypeNameHandling auto?
If you are going to protect against such attacks by requiring all deserialized types to be in your own company's .NET namespace, be aware that, when serializing with TypeNameHandling.All, "$type" information will appear throughout the JSON token hierarchy, for all arrays and objects (including for .NET types such as List<T>). As such, you must apply your "$type" check everywhere type information might occur. The easiest way to do this is with a custom serialization binder such as the following:
public class MySerializationBinder : DefaultSerializationBinder
{
    const string MyNamespace = "foo"; //'foo' is our company name, all our namespaces start with foo

    public override Type BindToType(string assemblyName, string typeName)
    {
        if (!typeName.StartsWith(MyNamespace, StringComparison.OrdinalIgnoreCase))
            throw new JsonSerializationException($"Request type {typeName} not supported, please use an IFoo");
        var type = base.BindToType(assemblyName, typeName);
        return type;
    }
}
Which can be used as follows:
var settings = new JsonSerializerSettings
{
    SerializationBinder = new MySerializationBinder(),
    TypeNameHandling = TypeNameHandling.All,
};
This has the added advantage of being more performant than your solution since pre-loading into a JObject is no longer required.
However, having done so, you may encounter the following issues:
Even if the root object is always from your company's namespace, the "$type" properties for nested values may not necessarily be in your company's namespace. Specifically, type information for harmless generic system collections such as List<T> and Dictionary<TKey, TValue>, as well as arrays, will be included. You may need to enhance BindToType() to whitelist such types.
Serializing with TypeNameHandling.Objects or TypeNameHandling.Auto can ameliorate the need to whitelist such harmless system types, as type information for such system types is less likely to get included during serialization as compared to TypeNameHandling.All.
To further simplify the type checking as well as to reduce your attack surface overall, you might consider only allowing type information on the root object. To do that, see json.net - how to add property $type ONLY on root object. SuppressItemTypeNameContractResolver from the accepted answer can be used on the receiving side as well as the sending side, to ignore type information on non-root objects.
Alternatively, you could serialize and deserialize with TypeNameHandling.None globally and wrap your root object in a container marked with [JsonProperty(TypeNameHandling = TypeNameHandling.Auto)] like so:
public class Root<TBase>
{
    [JsonProperty(TypeNameHandling = TypeNameHandling.Auto)]
    public TBase Data { get; set; }
}
Since your root objects all seem to implement some interface IFoo you would serialize and deserialize a Root<IFoo> which would restrict the space of possible attack gadgets to classes implementing IFoo -- a much smaller attack surface.
When deserializing generics, both the outer generic and the inner generic parameter types may need to be sanitized recursively. For instance, if your namespace contains a Generic<T> then checking that the typeName begins with your company's namespace will not protect against an attack via a Generic<SomeAttackGadget>.
Even if you only allow types from your own namespace, it's hard to say that's enough to be sufficiently safe, because we don't know whether any of the classes in your own namespace might be repurposed as attack gadgets.
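The whitelisting of harmless collections and the recursive check of generic arguments described above could be sketched roughly as follows. This is an illustration only: `TypeAllowList`, the `"Foo"` root namespace and the choice of allowed collections are all assumptions, and it works on an already resolved `Type` rather than on the raw type name string.

```csharp
using System;
using System.Collections.Generic;

public static class TypeAllowList
{
    // Assumption: "Foo" stands in for the company root namespace.
    private const string CompanyNamespace = "Foo";

    // Assumption: open generic system collections that are considered harmless.
    private static readonly HashSet<Type> HarmlessGenerics = new HashSet<Type>
    {
        typeof(List<>), typeof(Dictionary<,>)
    };

    public static bool IsAllowed(Type type)
    {
        // An array is only as safe as its element type.
        if (type.IsArray)
            return IsAllowed(type.GetElementType());

        if (type.IsGenericType)
        {
            // Check the open generic definition, then every type argument recursively,
            // so that List<SomeAttackGadget> or Foo.Generic<SomeAttackGadget> is rejected.
            Type definition = type.GetGenericTypeDefinition();
            if (!HarmlessGenerics.Contains(definition) && !IsInCompanyNamespace(definition))
                return false;
            foreach (Type argument in type.GetGenericArguments())
            {
                if (!IsAllowed(argument))
                    return false;
            }
            return true;
        }

        // Primitives and strings are plain values; anything else must be a company type.
        return type.IsPrimitive || type == typeof(string) || IsInCompanyNamespace(type);
    }

    private static bool IsInCompanyNamespace(Type type)
    {
        string ns = type.Namespace ?? string.Empty;
        // Match "Foo" or "Foo.Something", but not an unrelated namespace such as "Foobar".
        return ns == CompanyNamespace
            || ns.StartsWith(CompanyNamespace + ".", StringComparison.Ordinal);
    }
}
```

One way to use it would be inside BindToType(): pass the Type returned by base.BindToType(assemblyName, typeName) to IsAllowed and throw a JsonSerializationException when it returns false. Note that this only narrows which types can be named; it says nothing about whether a company type can itself be abused.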
The normal approach to serialisation is to apply attributes to your class to describe how serialisation (or deserialization) is to proceed. For example:
[DataContract]
class MyClass
{
    [DataMember]
    public string Name { get; set; }
}
Is there a way to perform serialisation using JSON.NET without applying attributes to your class, by instead providing a "sidecar" object that describes which aspects of the class are to be serialised?
The reason I ask relates to separation of concerns. If you have an API that is meant to be agnostic about how requests get to it, then the natural extension of that is that your API data structures should not be getting embellished with serialisation attributes.
Now of course I could take the "content" of one of my API result objects and copy it into another object having a class that does have appropriate serialisation attributes, but in some cases it would seem more desirable to say "Hey, I want to serialise this object, and the object has no serialisation attributes, so here is a separate data structure to describe what to do."
The other place where this would be handy, of course, is with third-party libraries where you have no opportunity to modify the objects (again, you could make copies of the values, but I'm looking for other ways).
You can use JsonSerializerSettings to specify various serialization options: whether to serialize a particular property, how to serialize a particular type or convert its value, and so on.
var settings = new JsonSerializerSettings
{
    Formatting = Formatting.Indented,
    .....
};
settings.Converters.Add(new StringEnumConverter { CamelCaseText = true });
settings.ContractResolver = new CamelCasePropertyNamesContractResolver();
settings.Binder = new SomeSerializationBinder(new DefaultSerializationBinder());
var result = JsonConvert.SerializeObject(yourObject, settings);
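For per-member decisions that would otherwise need attributes, one possible "sidecar" is a custom contract resolver that holds the rules itself. The sketch below is not a built-in Json.NET feature: `SidecarContractResolver`, its `Ignore` and `Rename` methods, and the `ApiResult` type are hypothetical names made up for illustration. Only `DefaultContractResolver.CreateProperty`, `JsonProperty.Ignored` and `JsonProperty.PropertyName` come from the library.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;
using Newtonsoft.Json;
using Newtonsoft.Json.Serialization;

// Hypothetical "sidecar": holds the serialization rules so the API types stay attribute-free.
public class SidecarContractResolver : DefaultContractResolver
{
    private readonly HashSet<Tuple<Type, string>> _ignored = new HashSet<Tuple<Type, string>>();
    private readonly Dictionary<Tuple<Type, string>, string> _renamed =
        new Dictionary<Tuple<Type, string>, string>();

    public SidecarContractResolver Ignore(Type type, string memberName)
    {
        _ignored.Add(Tuple.Create(type, memberName));
        return this;
    }

    public SidecarContractResolver Rename(Type type, string memberName, string jsonName)
    {
        _renamed[Tuple.Create(type, memberName)] = jsonName;
        return this;
    }

    protected override JsonProperty CreateProperty(MemberInfo member, MemberSerialization memberSerialization)
    {
        JsonProperty property = base.CreateProperty(member, memberSerialization);
        // Rules are keyed by the declaring type, so inherited members match on the base class.
        Tuple<Type, string> key = Tuple.Create(member.DeclaringType, member.Name);

        if (_ignored.Contains(key))
            property.Ignored = true;

        string jsonName;
        if (_renamed.TryGetValue(key, out jsonName))
            property.PropertyName = jsonName;

        return property;
    }
}

// Hypothetical attribute-free API type.
public class ApiResult
{
    public string Name { get; set; }
    public string InternalNote { get; set; }
}
```

A resolver configured with `.Ignore(typeof(ApiResult), "InternalNote")` and assigned to `settings.ContractResolver` would leave that member out of the output. Because contracts are cached per resolver instance, the rules should be registered before the resolver first sees the type, and the instance should be kept and reused.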
I use protobuf-net serializer like this:
ProtoBuf.Serializer.Serialize(fileStream, data);
How do I get a non-static serializer instance? I want to use it something like this:
var myProtobufNetSerializer = MyProtobufNetSerializerFactory();
myProtobufNetSerializer.Serialize(fileStream, data);
Edit:
Marc Gravell, the protobuf-net's author, replied (to this question) in his answer that it's possible, but I couldn't find how...
The important question I have is why do you want to do that? The static methods actually just expose the v1 API on the default instance, aka RuntimeTypeModel.Default. So I could answer your question with just:
TypeModel serializer = RuntimeTypeModel.Default;
However, there would be very little benefit to doing this - you might just as well use the static methods. If, however, you want to do something more interesting, then you probably want a custom model:
RuntimeTypeModel serializer = RuntimeTypeModel.Create();
// exercise for reader: configure it, store it somewhere, re-use it
You should not create a new TypeModel per serialization required, since the TypeModel (or more specifically: RuntimeTypeModel) caches the generated strategies internally. It would be inefficient and a memory drain to keep doing this unnecessarily.
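A rough sketch of the "configure it, store it somewhere, re-use it" step might look like the following. `Person` and `PersonSerializer` are hypothetical names, the field numbers are arbitrary, and the calls are written against the v2-style protobuf-net API (`MetaType.Add`, `TypeModel.Serialize`, `TypeModel.Deserialize`); check them against the version you actually use.

```csharp
using System.IO;
using ProtoBuf.Meta;

// Hypothetical DTO with no protobuf-net attributes.
public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class PersonSerializer
{
    // Built once and reused, so the generated strategies stay cached inside this model.
    private static readonly RuntimeTypeModel Model = CreateModel();

    private static RuntimeTypeModel CreateModel()
    {
        RuntimeTypeModel model = RuntimeTypeModel.Create();
        // applyDefaultBehaviour: false means attributes are ignored and members are mapped explicitly.
        model.Add(typeof(Person), false).Add(1, "Id").Add(2, "Name");
        return model;
    }

    public static void Save(Stream destination, Person value)
    {
        Model.Serialize(destination, value);
    }

    public static Person Load(Stream source)
    {
        return (Person)Model.Deserialize(source, null, typeof(Person));
    }
}
```

Keeping the model in a static field is just one way to store it; an instance registered once in a dependency injection container would serve the same purpose.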
Times when you might not want to use the default type-model:
you need to support 2 different incompatible layouts (perhaps for versioning reasons) at the same time
you are using a runtime that doesn't support reflection-emit, and must use pre-built serializer types
you are doing unit testing of the library itself
probably a few others that I'm not remembering
I have an application which supports multiple types and versions of some devices. It can connect to these devices and retrieve various information.
Depending on the type of the device, I have (among other things) a class which can contain various properties. Some properties are common to all devices, some are unique to a particular device.
This data is serialized to xml.
What would be a preferred way to implement a class which would support future properties in future versions of these devices, as well as be backwards compatible with previous application versions?
I can think of several ways, but I find none of them great:
1. Use a collection of name-value pairs:
   pros: good backward compatibility (both xml and previous versions of my app) and extensibility,
   cons: no type safety, no intellisense, requires implementation of custom xml serialization (to handle different value objects)
2. Create derived properties class for each new device:
   pros: type safety
   cons: have to use XmlInclude or custom serialization to deserialize derived classes, no backward compatibility with previous xml schema (although by implementing custom serialization I could skip unknown properties?), requires casting for accessing properties in derived classes.
3. Another way to do it?
I am using C#, by the way.
How about something similar to a PropertyBag ?
If you're not limited to interoperability with an external schema, then you should use Runtime Serialization and the SoapFormatter. The pattern for runtime serialization permits derived classes to specify which of their properties need to be serialized and what to do with them when deserialized.
The XML Serializer requires XmlInclude because, in effect, it needs to define the schema to use.
I like name/value sets for this sort of thing.
Many of your cons can be dealt with -- consider a base class that acts as a general name/value set with no-op methods for validating incoming name/value pairs. For known sets of names (i.e. keys), you can create derived classes that implement validation methods.
For example, Printer may have a known key "PrintsColor" that can only be "true" or "false". If someone tries to load PrintsColor = "CMYK", your Printer class would throw an exception.
Depending on what you're doing, you can go a few different ways in terms of making the validation more convenient -- utility methods in the base class (e.g. checkForValidBoolean()) or a base class that accepts name/type information in its constructor for cleaner code in your derived classes, and perhaps a mostly automated XML serialization.
For intellisense -- your derived classes could have basic accessors that are implemented in terms of the key lookup. Intellisense would present those accessor names.
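To make the idea concrete, here is a minimal sketch of such a base class and one derived class, using the Printer example from above. All names (`DeviceProperties`, `PrinterProperties`, `CheckForValidBoolean`) are illustrative assumptions, values are kept as strings for simplicity, and XML serialization is left out.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical base class: a general name/value set whose validation hook is a no-op.
public class DeviceProperties
{
    private readonly Dictionary<string, string> _values = new Dictionary<string, string>();

    public void Set(string name, string value)
    {
        Validate(name, value);
        _values[name] = value;
    }

    public string Get(string name)
    {
        string value;
        return _values.TryGetValue(name, out value) ? value : null;
    }

    // Unknown keys pass through untouched, which keeps old and new versions compatible.
    protected virtual void Validate(string name, string value) { }

    // Utility for derived classes; bool.TryParse accepts "true"/"false" in any casing.
    protected static void CheckForValidBoolean(string name, string value)
    {
        bool ignored;
        if (!bool.TryParse(value, out ignored))
            throw new ArgumentException(name + " must be \"true\" or \"false\", got \"" + value + "\".");
    }
}

public class PrinterProperties : DeviceProperties
{
    public const string PrintsColorKey = "PrintsColor";

    protected override void Validate(string name, string value)
    {
        // Only the known key is validated; everything else is accepted as-is.
        if (name == PrintsColorKey)
            CheckForValidBoolean(name, value);
    }

    // Typed accessor implemented in terms of the key lookup; IntelliSense lists it.
    public bool PrintsColor
    {
        get { return bool.Parse(Get(PrintsColorKey) ?? "false"); }
        set { Set(PrintsColorKey, value ? "true" : "false"); }
    }
}
```

With this shape, loading PrintsColor = "CMYK" into a PrinterProperties throws, while a key introduced by a newer device version is simply stored.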
This approach has worked well for me -- there's sort of a short-sightedness to classic OO design, especially for large systems with plugged-in components. IMO, the clunkier type checking here is a bit of a drag, but the flexibility makes it worthwhile.
I believe that creating derived properties is the best choice.
You can design your new classes using xml schema. And then just generate the class code with xsd.exe.
With .NET it isn't hard to develop a generic class that can serialize and deserialize all types to and from xml.
// Requires: using System; using System.IO; using System.Text; using System.Xml.Serialization;
public static String toXmlString<T>(T value)
{
    XmlSerializer xmlSerializer = new XmlSerializer(typeof(T));
    // Read the text before the writer is disposed at the end of the using block.
    using (StringWriter stringWriter = new StringWriter())
    {
        xmlSerializer.Serialize(stringWriter, value);
        return stringWriter.ToString();
    }
}

public static T fromXmlFile<T>(string fileName, Encoding encoding)
{
    XmlSerializer xmlSerializer = new XmlSerializer(typeof(T));
    try
    {
        // The using blocks close the reader and the file even if deserialization fails.
        using (Stream stream = File.OpenRead(fileName))
        using (TextReader textReader = encoding == null
            ? new StreamReader(stream)
            : new StreamReader(stream, encoding))
        {
            return (T)xmlSerializer.Deserialize(textReader);
        }
    }
    catch (Exception e)
    {
        // Attach context, then rethrow with "throw;" so the original stack trace is preserved.
        e.Data["File Name"] = fileName;
        e.Data["Type"] = typeof(T).ToString();
        throw;
    }
}
Programmatically speaking, this sounds like it might be a job for the Decorator Pattern. Essentially, you have a superclass which defines a common interface for all these types of devices. Then you have decorator classes which add other properties that a device might have. When creating these devices, you can dynamically add these decorations to define new properties for the device.
You can look at the Wikipedia page for a more detailed description. After that, it would just be a matter of doing some serialization to tell the program which decorators to load.
The general idea of what you're trying to accomplish here is precisely what the EAV pattern solves. EAV is a pattern most commonly used in database development but the concept is equally valid for applications.