I've written a multi-threaded windows service in C#. For some reason, csc.exe is being launched each time a thread is spawned. I doubt it's related to threading per se, but the fact that it is occurring on a per-thread basis, and that these threads are short-lived, makes the problem very visible: lots of csc.exe processes constantly starting and stopping.
Performance is still pretty good, but I expect it would improve if I could eliminate this. However, what concerns me even more is that McAfee is attempting to scan the csc.exe instances and eventually kills the service, apparently when one of the instances exits mid-scan. I need to deploy this service commercially, so changing McAfee settings is not a solution.
I assume that something in my code is triggering dynamic compilation, but I'm not sure what. Anyone else encounter this problem? Any ideas for resolving it?
Update 1:
After further research based on the suggestion and links from #sixlettervariables, the problem appears to stem from the implementation of XML serialization, as indicated in Microsoft's documentation on XmlSerializer:
To increase performance, the XML serialization infrastructure dynamically generates assemblies to serialize and deserialize specified types.
Microsoft notes an optimization further on in the same doc:
The infrastructure finds and reuses those assemblies. This behavior occurs only when using the following constructors:
XmlSerializer.XmlSerializer(Type)
XmlSerializer.XmlSerializer(Type, String)
which appears to indicate that the codegen and compilation occur only once, at first use, as long as one of the two specified constructors is used. However, I don't benefit from this optimization because I am using another form of the constructor, specifically:
public XmlSerializer(Type type, Type[] extraTypes)
Reading a bit further, it turns out that this also happens to be a likely explanation for a memory leak that I have been observing when my code executes. Again, from the same doc:
If you use any of the other constructors, multiple versions of the same assembly are generated and never unloaded, which results in a memory leak and poor performance. The easiest solution is to use one of the previously mentioned two constructors. Otherwise, you must cache the assemblies in a Hashtable.
The two workarounds that Microsoft suggests above are a last resort for me. Switching to another form of the constructor is not preferred (I am using the extraTypes form for serialization of derived classes, which is a supported use per Microsoft's docs), and I'm not sure I like the idea of managing a cache of assemblies for use across multiple threads.
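For what it's worth, here is a minimal sketch of the caching approach Microsoft alludes to, should I end up needing it. The class and key format are hypothetical, and caching the XmlSerializer instance itself (rather than the generated assembly) is the usual way to do it; each (type, extraTypes) combination then triggers codegen only once:

using System;
using System.Collections;
using System.Text;
using System.Xml.Serialization;

static class SerializerCache
{
    private static readonly Hashtable cache = new Hashtable();

    public static XmlSerializer Get(Type type, Type[] extraTypes)
    {
        // Key the cache on the root type plus all extra types.
        StringBuilder key = new StringBuilder(type.FullName);
        foreach (Type extra in extraTypes)
            key.Append(':').Append(extra.FullName);

        lock (cache)
        {
            XmlSerializer serializer = (XmlSerializer)cache[key.ToString()];
            if (serializer == null)
            {
                // Codegen and csc.exe run here exactly once per key.
                serializer = new XmlSerializer(type, extraTypes);
                cache[key.ToString()] = serializer;
            }
            return serializer;
        }
    }
}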
So, I have sgen'd, and I see the resulting assembly of serializers for my types produced as expected, but when my code executes, the sgen-produced assembly is not loaded (per observation in the Fusion Log Viewer and Process Monitor). I'm currently exploring why this is the case.
Update 2:
The sgen'd assembly loads fine when I use one of the two "friendlier" XmlSerializer constructors (see Update 1, above). When I use XmlSerializer(Type), for example, the sgen'd assembly loads and no run-time codegen/compilation is performed. However, when I use XmlSerializer(Type, Type[]), the assembly does not load. Can't find any reasonable explanation for this.
So I'm reverting to using one of the supported constructors and sgen'ing. This combination eliminates my original problem (the launching of csc.exe), plus one other related problem (the XmlSerializer-induced memory leak mentioned in Update 1 above). It does mean, however, that I have to revert to a less optimal form of serialization for derived types (the use of XmlInclude on the base type) until something changes in the framework to address this situation.
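For reference, a minimal sketch of that XmlInclude workaround (type names are hypothetical):

using System.Xml.Serialization;

[XmlInclude(typeof(Circle))]
[XmlInclude(typeof(Square))]
public class Shape { }
public class Circle : Shape { }
public class Square : Shape { }

static class Demo
{
    static void Main()
    {
        // The simple constructor now covers the derived types, so the
        // sgen'd serializers assembly is found and loaded at run time.
        XmlSerializer serializer = new XmlSerializer(typeof(Shape));
    }
}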
Psychic debugging:
Your Windows Service does XML serialization/deserialization
To increase performance, the XML serialization infrastructure dynamically generates assemblies to serialize and deserialize specified types.
If this is the case, you can build these XML serializer assemblies a priori.
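For example (assuming your types live in a hypothetical MyService.dll), running sgen as a post-build step produces MyService.XmlSerializers.dll alongside it, which the runtime picks up instead of invoking csc.exe:

sgen.exe /assembly:MyService.dll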
Related
Going over some legacy code, I ran into a piece of code that was using reflection to load some DLLs whose source code was available (they were another project in the solution).
I was cracking my skull trying to figure out why it was done this way (naturally the code was not documented...).
My question is, can you think about any good reason for preferring to load an assembly via reflection rather than referencing it?
Yes, if you have a dynamic module system, where different DLLs should be loaded depending on conditions at runtime. We do this where I work; we do a license check for different optional modules that may be loaded into our system, and then only load the DLLs associated with each module if the license checks out. This prevents code that should never be executed from being loaded, which can both improve performance slightly and prevent bugs.
Dynamically loading DLLs may also allow you to drastically change functionality without changing any source code. The main assembly may for instance set in motion a discovery process where it finds all classes that implement some interface, and chooses which one to use depending on some runtime criterion.
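A minimal sketch of such a discovery step, with hypothetical names (IModule, the plugin directory); not taken from any particular codebase:

using System;
using System.IO;
using System.Reflection;

public interface IModule
{
    void Start();
}

public static class ModuleLoader
{
    // Scans a directory for DLLs and instantiates every concrete IModule found.
    public static void LoadAll(string pluginDir)
    {
        foreach (string path in Directory.GetFiles(pluginDir, "*.dll"))
        {
            Assembly assembly = Assembly.LoadFrom(path);
            foreach (Type type in assembly.GetTypes())
            {
                if (!type.IsAbstract && typeof(IModule).IsAssignableFrom(type))
                {
                    IModule module = (IModule)Activator.CreateInstance(type);
                    module.Start();
                }
            }
        }
    }
}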
These days you'll typically want to use MEF for this kind of task, but that's only been around since .NET 4.0, so there are probably many codebases out there that do it manually. (I don't know much about MEF. Maybe you have to do this part manually there as well.)
But anyway, the answer to your question is that there certainly are good reasons to dynamically load DLLs using reflection. Whether it applies in your case is impossible to say without more details.
Without knowing your specific project, no one here can tell you why it was done that way in your case.
But the general reasons are:
updateability: you can simply recompile and replace the updated library instead of having to recompile and replace the whole application.
cooperation: if the interface is clear, multiple teams can work together this way: one on the main application and others on the DLLs.
reusability: sometimes you need the same functionality in multiple projects, so the same DLL can be used again and again.
extensibility: in some cases you want to be able to extend your program later with plugins that were not present at shipment time. This can be realized using DLLs.
I hope this helps you understand some of your setup.
Reason for loading an assembly via reflection rather than referencing it?
Let us consider a scenario where there are three classes, each with a DoWork() method that returns a string, and you pick among them by checking a condition (strongly typed).
Now you get two more classes in two different DLLs; how would you cope with the change?
1) You can add references to the new DLLs, change the conditional check, and make it work.
2) You can use reflection, passing the condition and assembly name at runtime; this allows you to add any amount of functionality at runtime without any code change in the primary application.
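A rough sketch of option 2 (names are hypothetical; the assembly path and type name would come from configuration at run time):

using System;
using System.Reflection;

static class WorkerDispatcher
{
    public static string Dispatch(string assemblyPath, string typeName)
    {
        Assembly assembly = Assembly.LoadFrom(assemblyPath);      // e.g. "WorkerB.dll"
        object worker = Activator.CreateInstance(assembly.GetType(typeName, true));
        MethodInfo doWork = worker.GetType().GetMethod("DoWork");
        return (string)doWork.Invoke(worker, null);
    }
}

New implementations then ship as new DLLs, and the primary application never needs to be recompiled.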
The Assembly class has a GetReferencedAssemblies method that returns the referenced assemblies. Is there a way to find what Types are referenced?
The CLR won't be able to tell you at runtime. You would have to do some serious static analysis of the source files - similar to the static analysis done by ReSharper or Visual Studio.
Static analysis is a fairly major undertaking. You basically need a C# parser, a symbol table, and plenty of time to work through all the cases that come up in abstract syntax trees.
Why can't the CLR tell you at run time? It is just-in-time compiled; this means that CLR bytecode is converted into machine code just before execution. Reflection only tells you what is known statically about your types, and the CLR only discovers that a type is referenced when the code that uses it is actually run - at the point of just-in-time compilation.
Use System.Reflection.Assembly.GetTypes().
Types are not referenced separately from assemblies. If an assembly references another assembly, it automatically references (at least in the technical context) all the types within that assembly, as well. In order to get all the types defined (not referenced) in an assembly, you can use the Assembly.GetTypes method.
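For example (with a placeholder assembly name):

using System;
using System.Reflection;

static class TypeLister
{
    static void Main()
    {
        // Lists every type defined in (not referenced by) the assembly.
        Assembly assembly = Assembly.Load("SomeAssembly");
        foreach (Type type in assembly.GetTypes())
            Console.WriteLine(type.FullName);
    }
}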
It may be possible, though it sounds like a rather arduous task, to scan an assembly for which types it actually references (i.e., which types it invokes or otherwise mentions). This would probably involve working with IL. Something like this is best avoided.
Edit: Actually, when I think about it, this is not possible at all. Whatsoever. On a quite basic level. The thing is, types can be instantiated and referenced willy-nilly. It's not even uncommon for this to happen. Not to mention late binding. All this means trying to analyze an assembly for all the types it references is something like predicting the future.
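A tiny illustration of why no static scan can succeed: here the type to instantiate is decided by input at run time, so the assembly's metadata cannot possibly list it in advance:

using System;

static class LateBindingDemo
{
    static void Main()
    {
        // The type name arrives at run time - it could be anything.
        string typeName = Console.ReadLine();
        object instance = Activator.CreateInstance(Type.GetType(typeName, true));
        Console.WriteLine(instance);
    }
}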
Edit 2: In response to the comments:
While what the question asks for isn't possible due to all sorts of dynamic references, it is possible to greatly shrink all sorts of binary files using difference encoding. This basically gives you a file containing the differences between two binary files, which in the case of executables/libraries tends to be vastly smaller than either of the actual files. Here are some applications that perform this operation. Note that bsdiff doesn't run on Windows, but there is a link to a port there, and you can find many more ports (including to .NET) with the aid of Google.
XDelta
bsdiff
If you'd look, you'll find many more such applications. One of the best parts is, they are totally self-contained and involve very little work on your part.
At my workplace we deploy an internal application by replacing only the assemblies that have changed (not my idea).
We can tell which assemblies we need to deploy by looking at whether the source files compiled into them have changed. Most of the time we don't need to redeploy assemblies that depend on assemblies that have changed. However, we have found some cases where, even though no source files in an assembly have changed, we need to redeploy it.
So far we know that any of these changes in an assembly will require all dependent assemblies to be recompiled and redeployed:
Constant changes (see the sketch after this list)
Enum definition changes (order of values)
Return type of a function changes and caller uses var (sometimes)
Namespace of a class changes to another already referenced namespace.
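To illustrate the constants case with a hypothetical sketch:

// In LibraryA:
public static class Defaults
{
    public const int MaxRetries = 3;   // const, not readonly
}

// In a dependent assembly, the compiler inlines the value, so this:
//     int retries = Defaults.MaxRetries;
// is compiled as if it were:
//     int retries = 3;
// Changing MaxRetries to 5 and redeploying only LibraryA leaves every
// non-recompiled dependent still using 3. (A static readonly field, by
// contrast, is read at run time and doesn't have this problem.)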
Are there any other cases that we're missing? I'm also open to arguments why this entire approach is flawed (although it's been used for years).
Edit: To be clear, we're always recompiling, but only deploying assemblies where the source files in them have changed.
So anything that breaks compilation will be picked up (method name changes, etc.), since they require changes in the calling code.
Here is another one:
Changes to optional parameter values.
The default values get compiled directly into the assembly using them (when the caller doesn't specify a value):
public void MyOptMethod(int optInt = 5) {}
Any calling code such as this:
theClass.MyOptMethod();
Will end up compiled to:
theClass.MyOptMethod(5);
If you change the method to:
public void MyOptMethod(int optInt = 10) {}
You will need to recompile all dependent assemblies if you want the new default to apply.
Additional changes that will require recompilation (thanks Polynomial):
Changes to generic type parameter constraints
Changes to method names (especially problematic when using reflection, as private methods may also be inspected)
Changes to exception handling (different exception type being thrown)
Changes to thread handling
Etc... etc... etc...
So - always recompile everything.
First off, we have sometimes deployed only a few assemblies in an application instead of the complete app. However, this is by no means the norm and has ONLY been done in our test environments when the developer had very recently (as in within the last few minutes) published the whole site and was just making a minor tweak. However, once the dev is satisfied they will go ahead and do a full recompile and republish.
The final push to testing is always based off a full recompile / deploy. The pushes to staging and ultimately production are based off of that full copy.
Besides repeatability, one reason is that you really can't be 100% positive that a human didn't miss something in the comparisons. Next, the amount of time to deploy 100 assemblies versus 5 is trivial and quite frankly not worth the amount of human time it takes to try and figure out what really changed.
Quite frankly, the list you have in combination with Oded's answer ought to be enough to convince others of the potential for failure. However, the very fact that you have already run into failures due to this lackadaisical approach should be enough of a warning flag to stop it from continuing.
At the end of the day, it really boils down to a question of professionalism. Standardization and repeatability of the process of moving code out of development, through the various hoops, and ultimately into production are extremely important in creating robust mission-critical applications. If your deployment process is fraught with the potential for failure due to these types of risk-inducing shortcuts, it raises questions about the quality of the code being produced.
A similar question has been asked in Ordering of reflection requests in dotnet
But I'm hoping for a different answer... I'm writing a plugin for a program that uses reflection to interrogate plugins to find the entry point. Unfortunately it has a bug which means if it encounters an interface declaration during this process it crashes with an unhandled exception. I have spoken to the development team and this is unlikely to be fixed. This is extremely limiting for me for obvious reasons. One workaround I have already thought of is to have my assembly load another assembly with the interfaces in it, but for reasons I won't go into this is not a great solution. It was a while before I encountered this problem because for some reason my entry class always preceded my interfaces in the reflection enumeration order.
My question is, is there any way to influence the ordering of classes and interfaces in the assembly?
Note: I have already tried setting different accessibility levels on my interfaces but that doesn't work for me.
Cheers,
J
I'd bet the code is using AppDomain.GetAssemblies(), and the results are then inspected. The implementation of AppDomain.GetAssemblies() leads to an external method, so Reflector is of mostly no help here.
However, without actually trying it and inspecting the result, there are two logical options for the ordering of assemblies in the result:
Load order
Alphabetical order
In the first case you'd probably have to organize the references among your assemblies and the load order in such a way that the foreign code finds the right assembly with the entry-point class and stops. In the second case it would be purely a matter of naming the assemblies the 'right' way, but I doubt it's this case.
(However, the order may be completely different from the two above, e.g. 'mostly' random as well.)
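Rather than guessing, you can dump the order empirically on your target machine (a quick sketch):

using System;
using System.Reflection;

static class AssemblyOrderDump
{
    static void Main()
    {
        foreach (Assembly assembly in AppDomain.CurrentDomain.GetAssemblies())
            Console.WriteLine(assembly.FullName);
    }
}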
In either case I think sooner or later the buggy code will encounter the problematic assembly and crash anyway. Thus the strong recommendation is: insist on having the bug fixed.
I have a .NET 2.0 C# ClickOnce app that connects to its data via web services. I've been told that one way to potentially speed up the application is to generate a serialization assembly beforehand. I have several questions on this front.
The default setting for whether to generate a serialization assembly is Auto. What criteria does VS2005 use to decide whether to generate a serialization assembly or not? It seems like it does not generate one under the Debug configuration but does under the Release configuration, but I can't tell for sure and can't find the information anywhere.
Does a serialization assembly actually improve the startup of the application? Specifically, what does it improve? Do I actually need a serialization assembly?
It is really asking, "Shall I pre-generate the serialization assemblies and include them in the deployed project, or shall I fall back on the default of generating the assemblies on the fly?" Generally, the on-the-fly approach won't hurt too much after the first hit, perf-wise. Where it can come into play is that the serialization assemblies are generated in %SYSTEMROOT%\TEMP, which, in some cases, the process can't access, leading to fatal exceptions.
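For what it's worth, the IDE setting maps to the GenerateSerializationAssemblies MSBuild property in the project file; if you want to take Auto out of the equation, you can set it explicitly (valid values are Off, On, and Auto):

<PropertyGroup>
  <GenerateSerializationAssemblies>On</GenerateSerializationAssemblies>
</PropertyGroup>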
This is not relevant to your situation, but there's another good reason for pre-generating the serialization assembly - it's necessary when hosting your code in SQL Server (i.e. SQLCLR). SQL Server doesn't allow these assemblies to be generated dynamically, so your serialization code would fail inside SQL Server.
In most cases, you aren't likely to see a huge benefit from this, especially if you keep the app open for a while. Pre-generating a serialization assembly mainly helps the first time (in an exe lifetime) that you serialize a specific type as xml.
According to IntelliTrace, the first time (and only the first time) you XML-serialize a type, a FileNotFoundException is thrown and then caught. This is because the CLR expects to load an assembly containing all the XML serializers for that specific assembly, and when it isn't found, a FileNotFoundException is thrown to signal the XmlSerializer: "Hey! Generate the darn assembly!" That is what happens during that catch, after which the previously missing file exists.
I've read somewhere that using try/catch for control flow is bad practice. I don't know why Microsoft used this approach...