XmlSerializer & Activator.CreateInstance() - c#

Okay, we all know that it is not possible to use the XmlSerializer for classes without a parameterless constructor, as the deserializer creates the object and then sets all properties. However, by using Activator.CreateInstance() one can instantiate classes without a parameterless constructor. For example, we could instantiate the following class:
public class Foo
{
public Foo(string bar){}
}
That class can easily be instantiated with the Activator:
Activator.CreateInstance(typeof(Foo),"some string");
Unfortunately, 'Foo' cannot be serialized using the XmlSerializer as it has no parameterless constructor. Why is there no way to use the XmlSerializer like this:
new XmlSerializer(typeof(Foo)).Deserialize(stream,"some string");
Of course I could implement my own serializer that simply stores the type and all properties & fields of an object, and then uses the Activator to instantiate the object and set the previously stored properties. The question is: would that make sense? I guess there must be a strong reason against it, because otherwise that would be implemented already, right?!?

XmlSerializer works via C# code-generation and dynamic compilation; because it uses the C# compiler, it is necessary that the code it generates would compile - noting that it must follow the rules as a separate assembly (no internal or private access).
Basically, it wants to use new Foo(), because it literally emits the C# code "new Foo()" which is fed to the compiler.
Yes, it could have chosen to use a different instantiation technique, but new Foo() is what the authors chose to go with - and it is a reasonable default.
Some other serializers choose to use Activator, and others still use sneaky IL techniques that provide direct access to non-public methods without any indirection.
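To make the Activator-based idea from the question concrete, here is a minimal sketch. It is not how XmlSerializer or any particular library works; it only captures public read/write properties into a dictionary and later rebuilds the object through Activator.CreateInstance with constructor arguments supplied by the caller. The names PropertySnapshot, Capture and Restore, and the extra Count property on Foo, are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public class Foo
{
    public string Bar { get; set; }
    public int Count { get; set; }   // hypothetical extra property for the demo

    public Foo(string bar) { Bar = bar; }
}

public static class PropertySnapshot
{
    // Public, non-indexed instance properties with a public getter and setter.
    private static IEnumerable<PropertyInfo> ReadWriteProperties(Type type) =>
        type.GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Where(p => p.GetIndexParameters().Length == 0
                        && p.GetGetMethod() != null
                        && p.GetSetMethod() != null);

    // "Serialize": copy the property values into a name/value map.
    public static Dictionary<string, object> Capture(object source) =>
        ReadWriteProperties(source.GetType())
            .ToDictionary(p => p.Name, p => p.GetValue(source));

    // "Deserialize": the caller must supply the constructor arguments,
    // because nothing in the captured state says what they should be.
    public static T Restore<T>(IDictionary<string, object> state, params object[] ctorArgs)
        where T : class
    {
        var instance = (T)Activator.CreateInstance(typeof(T), ctorArgs);
        foreach (var p in ReadWriteProperties(typeof(T)))
        {
            if (state.TryGetValue(p.Name, out var value))
                p.SetValue(instance, value);
        }
        return instance;
    }
}
```

A call such as PropertySnapshot.Restore<Foo>(PropertySnapshot.Capture(foo), "placeholder") shows the awkward part: the deserializer has no way to know which constructor arguments to pass, so the caller has to provide them.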

Related

Passing C# parameters which can "fit" an interface, but do not actually implement it

Note: I know this is an awful idea in practice; I'm just curious about what the CLR allows you to do, with the goal of creating some sort of 'modify a class after creating it' preprocessor.
Suppose I have the following class, which was defined in another assembly so I can't change it.
class Person {
public string Greet() => "Hello!";
}
I now define an interface, and a method, like the following:
interface IGreetable {
string Greet();
}
// ...
void PrintGreeting(IGreetable g) => Console.WriteLine(g.Greet());
The class Person does not explicitly implement IGreetable, but it could do so without any modification to its methods.
With that, is there any way whatsoever, using Reflection, the DLR or anything else, in which an instance of Person could be passed successfully to PrintGreeting without modifying any of the code above?
Try using the Impromptu-Interface library:
[The Impromptu-Interface] framework to allow you to wrap any object (static or dynamic) with a static interface even though it didn't inherit from it. It does this by emitting cached dynamic binding code inside a proxy.
This allows you to do something like this:
var person = new Person();
var greeter = person.ActLike<IGreetable>();
You could use a dynamic wrapper object to wire this up yourself, but you lose type safety inside the wrapping class:
class GreetableWrapper : IGreetable
{
private dynamic _wrapped;
public GreetableWrapper(dynamic wrapped)
{
_wrapped = wrapped;
}
public string Greet()
{
return _wrapped.Greet();
}
}
static void PrintGreeting(IGreetable g) => Console.WriteLine(g.Greet());
static void Main(string[] args)
{
PrintGreeting(new GreetableWrapper(new Person()));
Console.ReadLine();
}
This may be quite easy soon. Type classes may be introduced to C# as shapes where you will be able to define features of a class and code against that shape and then make use of your code with any type that matches without the author of that code having to declare anything, pretty much as you describe.
The closest thing in C# right now is perhaps how foreach works with a type that has a GetEnumerator() returning an object of a type with a MoveNext() and Current, even if they don't implement IEnumerable etc. The difference is that foreach is a built-in concept the compiler deals with, whereas with shapes you could define such patterns yourself.
Interestingly, it will also let you define static members.
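The foreach pattern mentioned above can be sketched as follows. Countdown and its nested Enumerator are hypothetical types written for this example; neither implements IEnumerable or IEnumerator, yet the compiler accepts them in a foreach because the members have the right shape.

```csharp
using System;
using System.Collections.Generic;

public class Countdown
{
    private readonly int _start;

    public Countdown(int start) { _start = start; }

    // foreach only needs a public GetEnumerator() method...
    public Enumerator GetEnumerator() => new Enumerator(_start);

    public class Enumerator
    {
        private int _current;

        public Enumerator(int start) { _current = start + 1; }

        // ...whose return type has a Current property and a MoveNext() method.
        public int Current => _current;

        public bool MoveNext() => --_current >= 1;
    }
}

public static class CountdownDemo
{
    // Collects the values produced by iterating a Countdown with foreach.
    public static List<int> Collect(int start)
    {
        var seen = new List<int>();
        foreach (var n in new Countdown(start))
            seen.Add(n);
        return seen;
    }
}
```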
I don't believe this is possible. The compiler needs to see something that explicitly implements the interface or class so that the compiler can confirm everything is implemented.
If you could do it using redirection, you could fail to implement something. And that goes against the safety approach embraced by .NET.
One option is to create a wrapper class around the person and pass this wrapper to the method; the wrapper needs to explicitly implement the interface.
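A statically typed version of such a wrapper might look like the sketch below. Person and IGreetable are re-declared here only so the snippet stands on its own; PersonGreeter and GreeterDemo are names invented for the example.

```csharp
using System;

// Stand-in for the class from the other assembly.
public class Person
{
    public string Greet() => "Hello!";
}

public interface IGreetable
{
    string Greet();
}

// Adapter: implements the interface and forwards to the wrapped Person.
// Unlike the dynamic wrapper, a missing or renamed method is a compile error.
public sealed class PersonGreeter : IGreetable
{
    private readonly Person _person;

    public PersonGreeter(Person person)
    {
        _person = person ?? throw new ArgumentNullException(nameof(person));
    }

    public string Greet() => _person.Greet();
}

public static class GreeterDemo
{
    public static string Greeting(IGreetable g) => g.Greet();

    public static string Run() => Greeting(new PersonGreeter(new Person()));
}
```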
If you have control of the external code, and are willing to wrap the object (and it seems like all of the answers here wrap), dynamic binding and libraries like Impromptu-Interface seem to me like a lot of trouble for something that's essentially a one liner.
class GreetablePerson : Person, IGreetable { }
And you're done.
When the compiler is building up the GreetablePerson class, the method from Person ends up doing an implicit implementation of the interface, and everything "just works." The only irritation is that the code outside has to instantiate GreetablePerson objects, but in standard object oriented terminology, an instance of GreetablePerson is an instance of Person, so this seems to me like a valid answer to the question as asked.
If the requirements are changed such that you also have pre-existing instances of Person, then something like Impromptu-Interface may become more tempting, but even then you may want to consider giving GreetablePerson a constructor that copies from Person. Choosing the best path forward from there requires getting more details about the requirements and the actual implementation details of the Person class in question.
On a somewhat unrelated note, this is something that is commonly done in other languages, such as Scala and Haskell.
It's known as using what are called "type classes". Type classes essentially allow you to define behavior for a type as if it explicitly implemented an interface, without actually requiring it to do so. You can read more about it here.

How to tell if Type has been generated by the yield return? How to create an instance of this class?

I'm reflecting over an Assembly and there are some types which have been generated by yield return, is there a reliable way to filter them out?
There's no default constructor for the Type generated by yield return. It is a class and not a value type. How to make an instance of this class?
The generated type-name looks like this:
SomeNamespace.RootMode+<GetMouseArrow>d__1, where RootMode is the class name and GetMouseArrow is the function which has yield-return in it.
Use case: Serialization of complete state of a running application, including the state of yield-return statements
UPDATE:
Types generated by the yield return have a single public constructor which takes int as parameter. This is the state variable used to determine where in the function the IEnumerator is.
The generated classes are private and sealed, they implement a set of IEnumerator*/IEnumerable* interfaces.
The generated class has [CompilerGenerated] attribute set to it.
The [CompilerGenerated] is most likely what you want. It's used not just for "iterator methods" (the name of the feature you're describing), but if you're looking to filter out types, you probably want to filter out all compiler-generated types (which would also include types generated for anonymous methods, and for async methods).
As far as instantiating an instance of such a class, you can use reflection, but of course you'll have to identify the parameter to pass and use the appropriate constructor overload. Without the default constructor, the usual serialization techniques won't work; you'll have to implement something yourself to handle this explicitly. I.e. get the appropriate value from within the class instance when serializing, and then pass it to the constructor…of course, to fully restore the state of the class, you'll need to serialize all the private fields and restore those again on deserialization, all via reflection.
I hope the value of implementing this is very high to you, because the cost sure will be. :)
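As a rough sketch of the two steps described above, the following filters types by [CompilerGenerated] and constructs an iterator class through its int constructor. IteratorTypes, Numbers and CreateWithState are names invented for the example, and the meaning of the state value is a compiler implementation detail, so the value passed here is only a placeholder.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;
using System.Runtime.CompilerServices;

public static class IteratorTypes
{
    // Sample iterator method, so the compiler emits a state-machine class.
    public static IEnumerable<int> Numbers()
    {
        yield return 1;
        yield return 2;
    }

    // True for iterator classes, but also for closures, async state machines, etc.
    public static bool IsCompilerGenerated(Type type) =>
        type.IsDefined(typeof(CompilerGeneratedAttribute), inherit: false);

    public static IEnumerable<Type> CompilerGeneratedTypes(Assembly assembly) =>
        assembly.GetTypes().Where(IsCompilerGenerated);

    // Invokes the single-int constructor described in the question's update.
    // Fully restoring an iterator would also require setting its private fields.
    public static object CreateWithState(Type iteratorType, int state) =>
        Activator.CreateInstance(
            iteratorType,
            BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic,
            null,
            new object[] { state },
            null);
}
```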

A difference in style: IDictionary vs Dictionary

I have a friend who's just getting into .NET development after developing in Java for ages and, after looking at some of his code I notice that he's doing the following quite often:
IDictionary<string, MyClass> dictionary = new Dictionary<string, MyClass>();
He's declaring dictionary as the Interface rather than the Class. Typically I would do the following:
Dictionary<string, MyClass> dictionary = new Dictionary<string, MyClass>();
I'd only use the IDictionary interface when it's needed (say, for example to pass the dictionary to a method that accepts an IDictionary interface).
My question is: are there any merits to his way of doing things? Is this a common practice in Java?
If IDictionary is a "more generic" type than Dictionary then it makes sense to use the more generic type in declaring variables. That way you don't have to care as much about the implementing class assigned to the variable and you can change the type easily in the future without having to change a lot of following code. For example, in Java it's often considered better to do
List<Integer> intList=new LinkedList<Integer>();
than it is to do
LinkedList<Integer> intList=new LinkedList<Integer>();
That way I'm sure all following code treats the list as a List and not a LinkedList, making it easy in the future to switch out LinkedList for Vector or any other class which implements List. I'd say this is common to Java and good programming in general.
This practice isn't just limited to Java.
It's often used in .NET as well when you want to de-couple the instance of the object from the class you're using. If you use the Interface rather than the Class, you can change the backing type whenever needed without breaking the rest of your code.
You'll also see this practice used heavily when dealing with IoC containers and instantiation using the Factory pattern.
Your friend is following the very useful principle:
"Abstract yourself from implementation details"
You should always attempt to program to the interface rather than the concrete class.
In Java or any other object oriented programming language.
In the .NET world it is common to use an I prefix to indicate that what you're using is an interface. I think this is more common because C# doesn't have implements and extends to distinguish class inheritance from interface inheritance.
I think they would type
class MyClass:ISome,Other,IMore
{
}
And you can tell ISome and IMore are interfaces while Other is a class.
In Java there is no need for such a thing
class MyClass extends Other implements Some, More {
}
The concept still applies, you should try to code to the interface.
For local variables and private fields, which are already implementation details, it's better to use concrete types than interfaces for the declarations because the concrete classes offer a performance boost (direct dispatch is faster than virtual/interface dispatch). The JIT will also be able to more easily inline methods if you don't unnecessarily cast to interface types in the local implementation details. If an instance of a concrete type is returned from a method that returns an interface, the cast is automatic.
Most often, you see the interface type (IDictionary) used when the member is exposed to external code, whether that be outside the assembly or just outside the class. Typically, most developers use the concrete type internally to a class definition while they expose an encapsulated property using the interface type. In this way, they can leverage the concrete type's capabilities, but if they change the concrete type, the declaring class's interface doesn't need to change.
public class Widget
{
private Dictionary<string, string> map = new Dictionary<string, string>();
public IDictionary<string, string> Map
{
get { return map; }
}
}
later can become:
class SpecialMap<TKey, TValue> : IDictionary<TKey, TValue> { ... }
public class Widget
{
private SpecialMap<string, string> map = new SpecialMap<string, string>();
public IDictionary<string, string> Map
{
get { return map; }
}
}
without changing Widget's interface and having to change other code already using it.
IDictionary is an interface and Dictionary is a class.
Dictionary implements IDictionary.
That means it is possible to refer to a Dictionary instance through an IDictionary reference and invoke most of the Dictionary methods and properties through that reference.
Using interfaces wherever possible is highly recommended, because interfaces decouple the modules and assemblies of an application, allow polymorphism (which is both common and useful in many situations), and allow replacing one module with another without touching the other modules.
Suppose that, at present, the programmer wrote:
IDictionary<string, string> dictionary = new Dictionary<string, string>();
And now dictionary invokes the methods and properties of Dictionary<string, string>.
In the future, the database has grown in size and we find out that Dictionary<string, string> is too slow, so we want to replace it with RedBlackTree<string, string>, which is faster.
So all that needs to be done is to replace the above statement with:
IDictionary<string, string> dictionary = new RedBlackTree<string, string>();
Of course, if RedBlackTree implements IDictionary, then all the code of the application compiles successfully and you have a newer version of your application that performs faster and more efficiently than the previous version.
Without interfaces, this replacement would be more difficult and would require the programmers to change more code, which risks introducing bugs.
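A small sketch of that kind of swap, using two implementations that ship with .NET rather than the hypothetical RedBlackTree. PhoneBook, Create and Describe are made-up names; the point is only that Describe never needs to know which class it was handed.

```csharp
using System;
using System.Collections.Generic;

public static class PhoneBook
{
    // The one place that names a concrete class.
    public static IDictionary<string, string> Create(bool sorted)
    {
        if (sorted)
            return new SortedDictionary<string, string>();
        return new Dictionary<string, string>();
    }

    // Written purely against the interface, so it works with either choice.
    public static string Describe(IDictionary<string, string> book)
    {
        book["bob"] = "555-0101";
        book["alice"] = "555-0100";
        return string.Join(",", book.Keys);
    }
}
```

With the sorted implementation the keys come back in key order; with Dictionary the enumeration order is unspecified, which is the kind of behavioural difference to check for before swapping.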
As far as I've seen Java developers tend to use abstraction (and design patterns) more often than .NET developers. This seems another example of it: why declare the concrete class when he'll essentially only be working with the interface members?
In the described situation almost every Java developer would use the interface to declare the variable. The way the Java collections are used is probably one of the best examples:
Map map = new HashMap();
List list = new ArrayList();
Guess it just accomplishes loose coupling in a lot of situations.
Java Collections include a multitude of implementations. Therefore, it's much easier for me to make use of
List<String> myList = new ArrayList<String>();
Then, in the future, when I realize I need "myList" to be thread safe, I simply change this single line to
List<String> myList = new Vector<String>();
And change no other line of code. This includes getters/setters as well. If you look at the number of implementations of Map, for example, you can imagine why this might be good practice. In other languages there may be only a couple of implementations of something (sorry, not a big .NET guy); in Objective-C, for example, there is really only NSDictionary and NSMutableDictionary, so it doesn't make as much sense there.
Edit:
Failed to hit on one of my key points (just alluded to it with the getter/setters).
The above allows you to have:
public void setMyList(List<String> myList) {
this.myList = myList;
}
And the client using this call need not worry about the underlying implementation; they can pass whatever object they have that conforms to the List interface.
Coming from a Java world, I agree that the "program to an interface" mantra is drilled into you. By programming to an interface, not an implementation, you make your methods more extensible to future needs.
I've found that for local variables it generally doesn't much matter whether you use the interface or the concrete class.
Unlike class members or method signatures, there is very little refactoring effort if you decide to change types, nor is the variable visible outside its usage site. In fact, when you use var to declare locals, you are not getting the interface type but rather the class type (unless you explicitly cast to the interface).
However, when declaring methods, class members, or interfaces, I think that it will save you quite a bit of headache to use the interface type up front, rather than coupling the public API to a specific class type.
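The distinction can be sketched like this, with Inventory, HasStock and Demo as made-up names. The local declared with var gets the concrete Dictionary type, so class-only members such as ContainsValue are available, while the method signature commits only to the interface.

```csharp
using System;
using System.Collections.Generic;

public static class Inventory
{
    // Public signature uses the interface, so callers may pass any implementation.
    public static bool HasStock(IDictionary<string, int> stock, string item) =>
        stock.TryGetValue(item, out var quantity) && quantity > 0;

    public static bool Demo()
    {
        // var infers Dictionary<string, int>, not IDictionary<string, int>.
        var local = new Dictionary<string, int> { ["apple"] = 3 };

        // ContainsValue is declared on Dictionary, not on IDictionary<TKey, TValue>.
        bool anyThree = local.ContainsValue(3);

        // The concrete type converts implicitly to the interface at the call site.
        return anyThree && HasStock(local, "apple");
    }
}
```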
Using interfaces means that "dictionary" in the following code might be any implementation of IDictionary.
Dictionary1 dictionary = new Dictionary1();
dictionary.operation1(); // if operation1 is implemented only in Dictionary1() this will fail for every other implementation
It's best seen when you hide the construction of the object:
IDictionary dictionary = DictionaryFactory.getDictionary(...);
I've encountered the same situation with a Java developer. He instantiates collections AND objects to their interface in the same way.
For instance,
IAccount account = new Account();
Properties are always get/set as interfaces. This causes problems with serialization, which is explained very well here

Why does XmlSerializer require the classes of the serialized object to be declared as public?

It's totally well known that in order to be able to serialize your objects using XmlSerializer you have to declare their classes as public -otherwise you get an InvalidOperationException. The question here is why? I Googled and I found out that XmlSerializer actually generates and compiles a brand new assembly and then uses this assembly to serialize your objects. The question is, still, why does it require the class to be public, while it's easy to get access to internal types in my assembly using reflection?
Quite simply because it doesn't use reflection to serialise/deserialise your class - it accesses the public properties (and classes) directly.
Using reflection to access members would be extremely expensive, so instead, as you mention in your question, it generates a serializer class once using reflection, caches it*, and from this point onwards uses direct member access.
* I should qualify this: it only generates a serializer once and caches it for certain constructor overloads on the XmlSerializer. For others, it re-generates the serializer class every time you create an instance of the serializer.
As long as you use the vanilla constructor you are alright:
XmlSerializer ser = new XmlSerializer(typeof(MyType));
The simple reason is because it's been that way since Day 1.
Also, Reflection is expensive. Why do it if you don't have to?
Also, the XML Serializer isn't intended to serialize every class in the world. It's meant to serialize classes designed to be serialized. As such, it's no great burden to make sure the data you want is in public fields and properties of a public class with a public parameterless constructor.
It's only when you try to serialize a type that was not designed to be serialized that you run into trouble.
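The visibility requirement is easy to observe with a small probe. This is only a sketch: VisibleItem, HiddenItem and VisibilityCheck are invented names, and the check relies on the InvalidOperationException mentioned in the question being thrown from the XmlSerializer constructor.

```csharp
using System;
using System.Xml.Serialization;

public class VisibleItem
{
    public int Id { get; set; }
}

// Same shape as VisibleItem, but not public.
internal class HiddenItem
{
    public int Id { get; set; }
}

public static class VisibilityCheck
{
    // Returns false when XmlSerializer rejects the type up front.
    public static bool CanCreateSerializer(Type type)
    {
        try
        {
            new XmlSerializer(type);
            return true;
        }
        catch (InvalidOperationException)
        {
            return false;
        }
    }
}
```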

Why does an XML-serializable class need a parameterless constructor?

I'm writing code to do XML serialization, using the function below.
public static string SerializeToXml(object obj)
{
XmlSerializer serializer = new XmlSerializer(obj.GetType());
using (StringWriter writer = new StringWriter())
{
serializer.Serialize(writer, obj);
return writer.ToString();
}
}
If the argument is an instance of a class without a parameterless constructor, it will throw an exception.
Unhandled Exception: System.InvalidOperationException: CSharpConsole.Foo cannot be serialized because it does not have a parameterless constructor.
at System.Xml.Serialization.TypeDesc.CheckSupported()
at System.Xml.Serialization.TypeScope.GetTypeDesc(Type type, MemberInfo source, Boolean directReference, Boolean throwOnError)
at System.Xml.Serialization.ModelScope.GetTypeModel(Type type, Boolean directReference)
at System.Xml.Serialization.XmlReflectionImporter.ImportTypeMapping(Type type, XmlRootAttribute root, String defaultNamespace)
at System.Xml.Serialization.XmlSerializer..ctor(Type type, String defaultNamespace)
at System.Xml.Serialization.XmlSerializer..ctor(Type type)
Why must there be a parameterless constructor in order to allow xml serialization to succeed?
EDIT: thanks for cfeduke's answer. The parameterless constructor can be private or internal.
During an object's de-serialization, the class responsible for de-serializing an object creates an instance of the serialized class and then proceeds to populate the serialized fields and properties only after acquiring an instance to populate.
You can make your constructor private or internal if you want, just so long as it's parameterless.
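A minimal sketch of that arrangement, assuming (as this answer states) that XmlSerializer accepts a non-public parameterless constructor. Order and OrderXml are hypothetical names; the public constructor is what application code uses, while the private one exists only for deserialization.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

public class Order
{
    public string Customer { get; set; }

    // Present only so the serializer can create an instance.
    private Order() { }

    public Order(string customer) { Customer = customer; }
}

public static class OrderXml
{
    public static string ToXml(Order order)
    {
        var serializer = new XmlSerializer(typeof(Order));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, order);
            return writer.ToString();
        }
    }

    public static Order FromXml(string xml)
    {
        var serializer = new XmlSerializer(typeof(Order));
        using (var reader = new StringReader(xml))
        {
            return (Order)serializer.Deserialize(reader);
        }
    }
}
```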
This is a limitation of XmlSerializer. Note that BinaryFormatter and DataContractSerializer do not require this - they can create an uninitialized object out of the ether and initialize it during deserialization.
Since you are using xml, you might consider using DataContractSerializer and marking your class with [DataContract]/[DataMember], but note that this changes the schema (for example, there is no equivalent of [XmlAttribute] - everything becomes elements).
Update: if you really want to know, BinaryFormatter et al use FormatterServices.GetUninitializedObject() to create the object without invoking the constructor. Probably dangerous; I don't recommend using it too often ;-p See also the remarks on MSDN:
Because the new instance of the object is initialized to zero and no constructors are run, the object might not represent a state that is regarded as valid by that object. The current method should only be used for deserialization when the user intends to immediately populate all fields. It does not create an uninitialized string, since creating an empty instance of an immutable type serves no purpose.
I have my own serialization engine, but I don't intend making it use FormatterServices; I quite like knowing that a constructor (any constructor) has actually executed.
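The effect described in the MSDN remarks can be seen in a short sketch. Tracked and UninitializedDemo are names made up here. The call used is FormatterServices.GetUninitializedObject as named above; as far as I know, recent .NET versions flag FormatterServices as obsolete and offer RuntimeHelpers.GetUninitializedObject as the replacement, so treat the exact API choice as an assumption about your target framework.

```csharp
using System;
using System.Runtime.Serialization;

public class Tracked
{
    public bool ConstructorRan;
    public string Name = "default";   // field initializers run as part of the constructor

    public Tracked() { ConstructorRan = true; }
}

public static class UninitializedDemo
{
    // Allocates a Tracked instance without running any constructor.
    public static Tracked Create() =>
        (Tracked)FormatterServices.GetUninitializedObject(typeof(Tracked));
}
```

An instance obtained this way has ConstructorRan left at false and Name left at null, which is exactly the "might not represent a valid state" caveat from the quote.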
The answer is: for no good reason whatsoever.
Contrary to its name, the XmlSerializer class is used not only for serialization, but also for deserialization. It performs certain checks on your class to make sure that it will work, and some of those checks are only pertinent to deserialization, but it performs them all anyway, because it does not know what you intend to do later on.
The check that your class fails to pass is one of the checks that are only pertinent to deserialization. Here is what happens:
During deserialization, the XmlSerializer class will need to create instances of your type.
In order to create an instance of a type, a constructor of that type needs to be invoked.
If you did not declare a constructor, the compiler has already supplied a default parameterless constructor, but if you did declare a constructor, then that's the only constructor available.
So, if the constructor that you declared accepts parameters, then the only way to instantiate your class is by invoking that constructor which accepts parameters.
However, XmlSerializer is not capable of invoking any constructor except a parameterless constructor, because it does not know what parameters to pass to constructors that accept parameters. So, it checks to see if your class has a parameterless constructor, and since it does not, it fails.
So, if the XmlSerializer class had been written in such a way as to only perform the checks pertinent to serialization, then your class would pass, because there is absolutely nothing about serialization that makes it necessary to have a parameterless constructor.
As others have already pointed out, the quick solution to your problem is to simply add a parameterless constructor. Unfortunately, it is also a dirty solution, because it means that you cannot have any readonly members initialized from constructor parameters.
In addition to all this, the XmlSerializer class could have been written in such a way as to allow even deserialization of classes without parameterless constructors. All it would take would be to make use of "The Factory Method Design Pattern" (Wikipedia). From the looks of it, Microsoft decided that this design pattern is far too advanced for DotNet programmers, who apparently should not be unnecessarily confused with such things. So, DotNet programmers should better stick to parameterless constructors, according to Microsoft.
Seems nobody actually read the original post... it is about SERIALIZATION and not DE-... and for this, no constructors are needed or called at all. The issue is simply poor coding practice from Microsoft.
First of all, this is what is written in the documentation. I think the problem is with one of your class's fields, not the main class - and how would you want the deserialiser to construct it back without a parameterless constructor?
I think a workaround is to make the constructor private.
