I have a situation where I'd like to build MVC style views at runtime using their EditorFor/DisplayFor templates (or something similar).
Ideally our application would let the user choose which fields they want in their UI (so they can add /remove any as they see fit), to this end I thinking it'd be handy to be create viewmodel classess at runtime and add the various dataannotation attributes to them according to what user selects (ie. stringlength, required etc).
One thing I need to be able to support is changing of the generated classes at runtime without affecting other users or having to do a full iisreset.
To go about this I've been doing a bit of research and it seems like there might be 3 different approaches, CodeDom, RunSharp / Relfection.Emit,Roslyn.
From what I can tell reflection.Emit/Runsharp would allow me to create the classes and add attibutes and properties to them at runtime and probably also modify them when I need to without adverse effects.
I'm not sure if Roslyn would allow this, I haven't been able to track down any simple examples of creating a class with properties or attributes in it, and I've seen a few mentions that Roslyn's output is immutable so I'm not sure how that goes for allowing me to modify it at a later date without adverse effects.
In general from what I've seen most people don't recommend CodeDom so I'm not entirely sure if I should bother going down that route.
Can anyone give me an idea of which of these directions might be viable for me?
So, none of these solutions are going to work, and honestly, generating type at runtime really isn't what you want here.
When it comes to the CLR, once you have a type with fields and methods, you can't really add new members or change members at runtime. The closest we come to doing that is the edit-and-continue features in Visual Studio, we're highly restricted to what changes we can make. We often 'cheat' by not adding methods or attributes where you think they added, but we hide them somewhere else and emit IL that references this secret location when you make an edit. Crazy things like removing members is entirely unsupported. Even if it was supported, lots of code likes to presume that doing someObject.GetType().GetMembers() returns the same thing over and over again.
As far as Roslyn is concerned, when we say the results are "immutable" we don't mean that puts any requirement on any IL that you might generate with it. Rather, when you ask Roslyn to parse something or analyze source code, the objects (syntax trees, type information, etc) is immutable. Still, it doesn't matter since you can't modify types in the CLR once they exist.
I'm with svick in his comment -- this isn't what you want to do. Use some appropriate data structures to represent your information at runtime, rather than trying to think of this as a concrete class that can be mutated somehow.
I'm after a means of deep cloning an object graph in a perfomant way. I'm going to have multiple threads cloning a graph extremely quickly such that they can play with some state and throw away the results if they're not interesting, returning to the original to try again.
I'm currently using a deep clone via binary serialization, which although it works, isn't amazingly fast. I've seen other libraries like protobuf, but the classes in my object graph may be defined in external assemblies, inheriting from classes in the main assembly and don't wish to add any complexity in those consuming assemblies if possible.
One of the interesting things I did come across was cloning using automatically generated IL. It seems it's not quite finished and I've posted to see if the author has done any more on it, but I'm guessing not. Has anyone else developed or seen a more fully functional way of deep cloning via IL? Or another method that is going to be fast?
Other than serialisation, I only consider three options:
Stick with serialisation, but customise it. This might be useful if you want to declaratively bin stuff and there are very likely performance gains to be had.
Reflection-based object walking, in conjunction with an IL emitter such as Fasterflect.
Code-gen or code your own cloning by literally assigning properties to each other (we have some old code that uses what we call a copy-constructor for this, takes an instance of itself and manually copies the properties / fields across).
We have some instances of code where we control the binary serialisation so that we can serialise an interned GUID table (we have lots of repeating GUIDs and serialise very large lists over .NET Remoting). It works well for us and we haven't needed a third party serialisation framework, however, it's hand-crafted stuff with a little code-gen.
The CSLA.NET framework features a class called UndoableBase that uses reflection to serialise a Hashtable of property/field values. Used for allowing rollbacks on objects in memory. This might fit with your "returning to original to try again" sentence.
Personally I'd look further into a reflection-based (preferably with emitted IL for better performance) solution, this then allows you to take advantage of class/member attributes for control over the cloning process. If performance is king, this may not cut it.
What is the simplest way to identify if a given method is reading or writing a member variable or property?
I am writing a tool to assist in an RPC system, in which access to remote objects is expensive. Being able to detect if a given object is not used in a method could allow us to avoid serializing its state. Doing it on source code is perfectly reasonable (but being able to do it on compiled code would be amazing)
I think I can either write my own simple parser, I can try to use one of the existing C# parsers and work with the AST. I am not sure if it is possible to do this with Assemblies using Reflection. Are there any other ways? What would be the simplest?
EDIT: Thanks for all the quick replies. Let me give some more information to make the question clearer. I definitely prefer correct, but it definitely shouldn't be extremely complex. What I mean is that we can't go too far checking for extremes or impossibles (as the passed-in delegates that were mentioned, which is a great point). It would be enough to detect those cases and assume everything could be used and not optimize there. I would assume that those cases would be relatively uncommon.
The idea is for this tool to be handed to developers outside of our team, that should not be concerned about this optimization. The tool takes their code and generates proxies for our own RPC protocol. (we are using protobuf-net for serialization only, but no wcf nor .net remoting). For this reason, anything we use has to be free or we wouldn't be able to deploy the tool for licensing issues.
You can have simple or you can have correct - which do you prefer?
The simplest way would be to parse the class and the method body. Then identify the set of tokens which are properties and field names of the class. The subset of those tokens which appears in the method body are the properties and field names you care about.
This trivial analysis of course is not correct. If you had
class C
{
int Length;
void M() { int x = "".Length; }
}
Then you would incorrectly conclude that M references C.Length. That's a false positive.
The correct way to do it is to write a full C# compiler, and use the output of its semantic analyzer to answer your question. That's how the IDE implements features like "go to definition".
Before attempting to write this kind of logic yourself, I would check to see if you can leverage NDepend to meet your needs.
NDepend is a code dependency analysis tool ... and much more. It implements a sophisticated analyzer for examining relationships between code constructs and should be able to answer that question. It also operates on both source and IL, if I'm not mistaken.
NDepend exposes CQL - Code Query Language - which allows you to write SQL-like queries against the relationships between structures in your code. NDepend has some support for scripting and is capable of being integrated with your build process.
To complete the LBushkin answer on NDepend (Disclaimer: I am one of the developer of this tool), NDepend can indeed help you on that. The Code LINQ Query (CQLinq) below, actually match methods that...
shouldn't provoque any RPC calls but
that are reading/writing any fields of any RPC types,
or that are reading/writing any properties of any RPC types,
Notice how first we define the 4 sets: typesRPC, fieldsRPC, propertiesRPC, methodsThatShouldntUseRPC - and then we match methods that violate the rule. Of course this CQLinq rule needs to be adapted to match your own typesRPC and methodsThatShouldntUseRPC:
warnif count > 0
// First define what are types whose call are RDC
let typesRPC = Types.WithNameIn("MyRpcClass1", "MyRpcClass2")
// Define instance fields of RPC types
let fieldsRPC = typesRPC.ChildFields()
.Where(f => !f.IsStatic).ToHashSet()
// Define instance properties getters and setters of RPC types
let propertiesRPC = typesRPC.ChildMethods()
.Where(m => !m.IsStatic && (m.IsPropertyGetter || m.IsPropertySetter))
.ToHashSet()
// Define methods that shouldn't provoke RPC calls
let methodsThatShouldntUseRPC =
Application.Methods.Where(m => m.NameLike("XYZ"))
// Filter method that should do any RPC call
// but that is using any RPC fields (reading or writing) or properties
from m in methodsThatShouldntUseRPC.UsingAny(fieldsRPC).Union(
methodsThatShouldntUseRPC.UsingAny(propertiesRPC))
let fieldsRPCUsed = m.FieldsUsed.Intersect(fieldsRPC )
let propertiesRPCUsed = m.MethodsCalled.Intersect(propertiesRPC)
select new { m, fieldsRPCUsed, propertiesRPCUsed }
My intuition is that detecting which member variables will be accessed is the wrong approach. My first guess at a way to do this would be to just request serialized objects on an as-needed basis (preferably at the beginning of whatever function needs them, not piecemeal). Note that TCP/IP (i.e. Nagle's algorithm) should stuff these requests together if they are made in rapid succession and are small
Eric has it right: to do this well, you need what amounts to a compiler front end. What he didn't emphasize enough is the need for strong flow analysis capabilities (or a willingness to accept very conservative answers possibly alleviated by user annotations). Maybe he meant that in the phrase "semantic analysis" although his example of "goto definition" just needs a symbol table, not flow analysis.
A plain C# parser could only be used to get very conservative answers (e.g., if method A in class C contains identifier X, assume it reads class member X; if A contains no calls then you know it can't read member X).
The first step beyond this is having a compiler's symbol table and type information (if method A refers to class member X directly, then assume it reads member X; if A contains **no* calls and mentions identifier X only in the context of accesses to objects which are not of this class type then you know it can't read member X). You have to worry about qualified references, too; Q.X may read member X if Q is compatible with C.
The sticky point are calls, which can hide arbitrary actions. An analysis based on just parsing and symbol tables could determine that if there are calls, the arguments refer only to constants or to objects which are not of the class which A might represent (possibly inherited).
If you find an argument that has an C-compatible class type, now you have to determine whether that argument can be bound to this, requiring control and data flow analysis:
method A( ) { Object q=this;
...
...q=that;...
...
foo(q);
}
foo might hide an access to X. So you need two things: flow analysis to determine whether the initial assignment to q can reach the call foo (it might not; q=that may dominate all calls to foo), and call graph analysis to determine what methods foo might actually invoke, so that you can analyze those for accesses to member X.
You can decide how far you want to go with this simply making the conservative assumption "A reads X" anytime you don't have enough information to prove otherwise. This will you give you a "safe" answer (if not "correct" or what I'd prefer to call "precise").
Of frameworks that might be helpful, you might consider Mono, which surely parses and builds symbol tables. I don't know what support it provides for flow analysis or call graph extraction; I would not expect the Mono-to-IL front-end compiler to do a lot of that, as people usually hide that machinery in the JIT part of JIT-based systems. A downside is that Mono may be behind the "modern C#" curve; last time I heard, it handled only C# 2.0 but my information may be stale.
An alternative is our DMS Software Reengineering Toolkit and its C# Front End.
(Not an open source product).
DMS provides general source code parsing, tree building/inspection/analysis, general symbol table support and built-in machinery for implementing control-flow analysis, data flow analysis, points-to analysis (needed for "What does object O point to?"), and call graph construction. This machinery has all been tested by fire with DMS's Java and C front ends, and the symbol table support has been used to implement full C++ name and type resolution, so its pretty effective. (You don't want to underestimate the work it takes to build all that machinery; we've been working on DMS since 1995).
The C# Front End provides for full C# 4.0 parsing and full tree building. It presently does not build symbol tables for C# (we're working on this) and that's a shortcoming compared to Mono. With such a symbol table, however, you would have access to all that flow analysis machinery (which has been tested with DMS's Java and C front ends) and that might be a big step up from Mono if it doesn't provide that.
If you want to do this well, you have a considerable amount of work in front of you. If you want to stick with "simple", you'll have to do with just parsing the tree and being OK with being very conservative.
You didn't say much about knowing if a method wrote to a member. If you are going to minimize traffic the way you describe, you want to distinguish "read", "write" and "update" cases and optimize messages in both directions. The analysis is obviously pretty similar for the various cases.
Finally, you might consider processing MSIL directly to get the information you need; you'll still have the flow analysis and conservative analysis issues. You might find the following technical paper interesting; it describes a fully-distributed Java object system that has to do the same basic analysis you want to do,
and does so, IIRC, by analyzing class files and doing massive byte code rewriting.
Java Orchestra System
By RPC do you mean .NET Remoting? Or DCOM? Or WCF?
All of these offer the opportunity to monitor cross process communication and serialization via sinks and other constructs, but they are all platform specific, so you'll need to specify the platform...
You could listen for the event that a property is being read/written to with an interface similar to INotifyPropertyChanged (although you obviously won't know which method effected the read/write.)
I think the best you can do is explicitly maintain a dirty flag.
I have a class in an SDK, for which every property I am in interested in calling. I know that the only way (I think the only way this is), is to use reflection, which most people claim as being slow etc (although I've seen articles which illustrate how in some cases it is not as slow as originally thought).
Is there a better way than to loop through and invoke each property in the target class?
Also, why is reflection deemed to be so slow?
It might be worth taking a looking at TypeDescriptors. As far as I am aware they have some performance benefits over using reflection and work in a slightly different way (they cache metadata for example). The MSDN article confused me in the way it describes how reflection is used by type descriptors, so you might need to find a more expansive explanation (therfore the 3rd link might be more helpful) .
The API for type descriptors is similar to that used for reflection.
Navigate to:
http://msdn.microsoft.com/en-us/library/ms171819.aspx
http://msdn.microsoft.com/en-us/library/system.componentmodel.typedescriptor.aspx
And
http://blogs.msdn.com/b/parthopdas/archive/2006/01/03/509103.aspx
Soom loose answers to your questions then:
1) Because of caching and a slightly different implementation to reflection TypeDescriptors my provide a performance improvement over relfection alone
2) You may be able to retrieve the properties and (invoke/set/get?) the properties in one fell swoop. This may be a case of calling an invoke type method and writing a lambda statement to peform some action on the collection returned?
You can use reflection to generate C# code accessing directly to all properties you are interested in. That would be a faster way to perform the calls.
I think Reflection is not a bad option for you though. It's not that slow.
I would use reflection to generate the code that calls all the properties. Then you don't have to worry about reflection being slow.
I have heard people state that Code Generators and T4 templates should not be used. The logic behind that is that if you are generating code with a generator then there is a better more efficient way to build the code through generics and templating.
While I slightly agree with this statement above, I have not really found effective ways to build templates that can say for instance instantiate themselves. In otherwords I can never do :
return new T();
Additionally, if I want to generate code based on database values I have found that using Microsoft.SqlServer.Management.SMO in conjunction with T4 templates have been wonderful at generating mass amounts of code without having to copy / paste or use resharper.
Many of the problems I have found with Generics too is that to my shock there are a lot of developers who do not understand them. When I do examine generics for a solution, there are times where it gets complicated because C# states that you cannot do something that may seem logical in my mind.
What are your thoughts? Do you prefer to build a generator, or do you prefer to use generics? Also, how far can generics go? I know a decent amount about generics, but there are traps and pitfalls that I always run into that cause me to resort to a T4 template.
What is the more proper way to handle scenarios where you need a large amount of flexibility? Oh and as a bonus to this question, what are good resources on C# and Generics?
You can do new T(); if you do this
public class Meh<T>
where T : new()
{
public static T CreateOne()
{
return new T();
}
}
As for code-generators. I use one every day without any problems. I'm using one right now in fact :-)
Generics solve one problem, code-generators solve another. For example, creating a business model using a UML editor and then generating your classes with persistence code as I do all of the time using this tool couldn't be achieved with generics, because each persistent class is completely different.
As for a good source on generics. The best has got to be Jon Skeet's book of course! :-)
As the originator of T4, I've had to defend this question quite a few times as you can imagine :-)
My belief is that at its best code generation is a step on the way to producing equivalent value using reusable libraries.
As many others have said, the key concept to maintain DRY is never, ever changing generated code manually, but rather preserving your ability to regenerate when the source metadata changes or you find a bug in the code generator. At that point the generated code has many of the characteristics of object code and you don't run into copy/paste type problems.
In general, it's much less effort to produce a parameterized code generator (especially with template-based systems) than it is to correctly engineer a high quality base library that gets the usage cost down to the same level, so it's a quick way to get value from consistency and remove repetition errors.
However, I still believe that the finished system would most often be improved by having less total code. If nothing else, its memory footprint would almost always be significantly smaller (although folks tend to think of generics as cost free in this regard, which they most certainly are not).
If you've realised some value using a code generator, then this often buys you some time or money or goodwill to invest in harvesting a library from the generated codebase. You can then incrementally reengineer the code generator to target the new library and hopefully generate much less code. Rinse and repeat.
One interesting counterpoint that has been made to me and that comes up in this thread is that rich, complex, parametric libraries are not the easiest thing in terms of learning curve, especially for those not deeply immersed in the platform. Sticking with code generation onto simpler basic frameworks can produce verbose code, but it can often be quite simple and easy to read.
Of course, where you have a lot of variance and extremely rich parameterization in your generator, you might just be trading off complexity an your product for complexity in your templates. This is an easy path to slide into and can make maintenance just as much of a headache - watch out for that.
Generating code isn't evil and it doesn't smell! The key is to generate the right code at the right time. I think T4 is great--I only use it occasionally, but when I do it is very helpful. To say, unconditionally, that generating code is bad is unconditionally crazy!
It seems to me code generators are fine as long as the code generation is part of your normal build process, rather than something you run once and then keep its output. I add this caveat because if just use the code generator once and discard the data that created it, you're just automatically creating a massive DRY violation and maintenance headache; whereas generating the code every time effectively means that whatever you are using to do the generating is the real source code, and the generated files are just intermediate compile stages that you should mostly ignore.
Lex and yacc are classic examples of tools of allow you to specify functionality in an efficient manner and generate efficient code from it. Trying to do their jobs by hand will lengthen your development time and probably produce less efficient and less readable code. And while you could certainly incorporate something like lex and yacc directly into your code and do their jobs at run time instead of at compile time, that would certainly add considerable complexity to your code and slow it down. If you actually need to change your specification at run time it might be worth it, but in most normal cases using lex/yacc to generate code for you at compile time is a big win.
A good percentage of what is in Visual Studio 2010 would not be possible without code generation. Entity Framework would not be possible. The simple act of dragging and dropping a control onto a form would not be possible, nor would Linq. To say that code generation should not be used is strange as so many use it without even thinking about it.
Maybe it is a bit harsh, but for me code generation smells.
That code generation is used means that there are numerous underlying common principles which may be expressed in a "Don't repeat yourself" fashion. It may take a bit longer, but it is satisfying when you end up with classes that only contain the bits that really change, based on an infrastructure that contains the mechanics.
As to Generics...no I don't have too many issues with it. The only thing that currently doesn't work is saying that
List<Animal> a = new List<Animal>();
List<object> o = a;
But even that will be possible in the next version of C#.
Code generation is for me a workaround for many problems found in language, frameworks, etc. They are not evil by themselves, I would say it is very very bad (i.e. evil) to release a language (C#) and framework which forces you to copy&paste (swap on properties, events triggering, lack of macros) or use magical numbers (wpf binding).
So, I cry, but I use them, because I have to.
I've used T4 for code generation and also Generics. Both are good, have their pros and cons, and are suited for different purposes.
In my case, I use T4 to generate Entities, DAL and BLL based on a database schema. However, DAL and BLL reference a mini-ORM I built, based on Generics and Reflection. So I think you can use them side by side, as long as you keep in control and keep it small and simple.
T4 generates static code, while Generics is dynamic. If you use Generics, you use Reflection which is said to be less performant than "hard-coded" solution. Of course you can cache reflection results.
Regarding "return new T();", I use Dynamic Methods like this:
public class ObjectCreateMethod
{
delegate object MethodInvoker();
MethodInvoker methodHandler = null;
public ObjectCreateMethod(Type type)
{
CreateMethod(type.GetConstructor(Type.EmptyTypes));
}
public ObjectCreateMethod(ConstructorInfo target)
{
CreateMethod(target);
}
void CreateMethod(ConstructorInfo target)
{
DynamicMethod dynamic = new DynamicMethod(string.Empty,
typeof(object),
new Type[0],
target.DeclaringType);
ILGenerator il = dynamic.GetILGenerator();
il.DeclareLocal(target.DeclaringType);
il.Emit(OpCodes.Newobj, target);
il.Emit(OpCodes.Stloc_0);
il.Emit(OpCodes.Ldloc_0);
il.Emit(OpCodes.Ret);
methodHandler = (MethodInvoker)dynamic.CreateDelegate(typeof(MethodInvoker));
}
public object CreateInstance()
{
return methodHandler();
}
}
Then, I call it like this:
ObjectCreateMethod _MetodoDinamico = new ObjectCreateMethod(info.PropertyType);
object _nuevaEntidad = _MetodoDinamico.CreateInstance();
More code means more complexity. More complexity means more places for bugs to hide, which means longer fix cycles, which in turn means higher costs throughout the project.
Whenever possible, I prefer to minimize the amount of code to provide equivalent functionality; ideally using dynamic (programmatic) approaches rather than code generation. Reflection, attributes, aspects and generics provide lots of options for a DRY strategy, leaving generation as a last resort.
Generics and code generation are two different things. In some cases you could use generics instead of code generation and for those I believe you should. For the other cases code generation is a powerful tool.
For all the cases where you simply need to generate code based on some data input, code generation is the way to go. The most obvious, but by no means the only example is the forms editor in Visual Studio. Here the input is the designer data and the output is the code. In this case generics is really no help at all, but it is very nice that VS simply generates the code based on the GUI layout.
Code generators could be considered a code smell that indicate a flaw or lack of functionality in the target langauge.
For example, while it has been said here that "Objects that persist can not be generalized", it would be better to think of it as "Objects in C# that automatically persist their data can not be generalized in C#", because I surely can in Python through the use of various methods.
The Python approach could, however, be emulated in static languages through the use of operator[ ](method_name as string), which either returns a functor or a string, depending on requirements. Unfortunately that solution is not always applicable, and returning a functor can be inconvenient.
The point I am making is that code generators indicate a flaw in a chosen language that are addressed by providing a more convenient specialised syntax for the specific problem at hand.
The copy/paste type of generated code (like ORMs make) can also be very useful...
You can create your database, and then having the ORM generate a copy of that database definition expressed in your favorite language.
The advantage comes when you change your original definition (the database), press compile and the ORM (if you have a good one) can re-generates your copy of the definition. Now all references to your database can be checked by the compilers type checker and your code will fail to compile when you're using tables or columns that do not exist anymore.
Think about this: If I call a method a few times in my code, am I not referring to the name I gave to this method originally? I keep repeating that name over and over... Language designers recognized this problem and came up with "Type-safety" as the solution. Not removing the copies (as DRY suggests we should do), but checking them for correctness instead.
The ORM generated code brings the same solution when referring to table and column names. Not removing the copies/references, but bringing the database definition into your (type-safe) language where you can refer to classes and properties instead. Together with the compilers type checking, this solves a similar problem in a similar way: Guarantee compile-time errors instead of runtime ones when you refer to outdated or misspelled tables (classes) or columns (properties).
quote:
I have not really found effective ways to build templates that can say for instance instantiate themselves. In otherwords I can never do :
return new T();
public abstract class MehBase<TSelf, TParam1, TParam2>
where TSelf : MehBase<TSelf, TParam1, TParam2>, new()
{
public static TSelf CreateOne()
{
return new TSelf();
}
}
public class Meh<TParam1, TParam2> : MehBase<Meh<TParam1, TParam2>, TParam1, TParam2>
{
public void Proof()
{
Meh<TParam1, TParam2> instanceOfSelf1 = Meh<TParam1, TParam2>.CreateOne();
Meh<int, string> instanceOfSelf2 = Meh<int, string>.CreateOne();
}
}
Why does being able to copy/paste really, really fast, make it any more acceptable?
That's the only justification for code generation that I can see.
Even if the generator provides all the flexibility you need, you still have to learn how to use that flexibility - which is yet another layer of learning and testing required.
And even if it runs in zero time, it still bloats the code.
I rolled my own data access class. It knows everything about connections, transactions, stored procedure parms, etc, etc, and I only had to write all the ADO.NET stuff once.
It's now been so long since I had to write (or even look at) anything with a connection object in it, that I'd be hard pressed to remember the syntax offhand.
Code generation, like generics, templates, and other such shortcuts, is a powerful tool. And as with most powerful tools, it amplifies the capaility of its user for good and for evil - they can't be separated.
So if you understand your code generator thoroughly, anticipate everything it will produce, and why, and intend it to do so for valid reasons, then have at it. But don't use it (or any of the other technique) to get you past a place where you're not to sure where you're headed, or how to get there.
Some people think that, if you get your current problem solved and some behavior implemented, you're golden. It's not always obvious how much cruft and opaqueness you leave in your trail for the next developer (which might be yourself.)