Partially reference a DLL - C#

I have a library DLL full of sorting algorithms, parsers, validators, converters, etc. The DLL is about 40 MB (not much, I know, but still). Now I would like to reference just the parsers in that DLL. The point is to get those parsers out without shipping all 40 MB to the customer.
Is there a way, every time I make a release build, to take just those up-to-date parsers from my library, store them in some kind of .partialDll file, and deliver only them to the customer? The result would be that I keep all my helper classes in one big library, which keeps growing, while the customers get just what they ordered.
I guess I would need to deal with a lot of reflection to achieve something like this, right? Any ideas?

Let me start with a quote from MSDN:
"Assemblies are the building blocks of .NET Framework applications; they form the fundamental unit of deployment […]."
Note that the quote is about assemblies, not about DLLs. There's a difference!
Although most .NET assemblies consist of exactly one DLL file, that is not a strict requirement: An assembly can in fact consist of more than one file; such a "multi-file assembly" can, for instance, consist of several DLLs, which in turn are called "netmodules". (A netmodule might have a .netmodule file extension by convention, but it's really a DLL containing .NET metadata and bytecode.) Each multi-file assembly has exactly one "main" module which carries the metadata that references all the other assembly files and so ties them together into a logical whole.
While an assembly has to be deployed in full (as per the above quote), the .NET runtime can load only those netmodules that are actually required for JIT code compilation and execution.
So you can split up an assembly into several parts, and have the runtime load only what is actually needed; but you cannot do the same to a netmodule / DLL file. A DLL file can only be deployed and loaded in its entirety.
Note also that Visual Studio's support for netmodules is non-existent for all practical purposes, so most people don't use them, which is why you see so few multi-file assemblies in the real world.
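For completeness, here is roughly how a multi-file assembly is built with the command-line compiler (the file names are made up for illustration):

    rem Compile the parsers into a netmodule (a DLL without an assembly manifest):
    csc /target:module /out:Parsers.netmodule JsonParser.cs XmlParser.cs

    rem Compile the main module and attach the netmodule to the assembly:
    csc /target:library /addmodule:Parsers.netmodule /out:MyLib.dll Sorting.cs Validators.cs

The runtime will then load Parsers.netmodule only when a type inside it is first needed.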
The bottom line is this: In practice, if you or your clients are interested in only a part of an assembly ("DLL"), then it's usually easier to split a large assembly (that is, one large Visual Studio project) into several inter-dependent assemblies (several smaller Visual Studio projects).

In general, no, there is no way to achieve that. Once you pack "everything" into a module and compile it, you can't split that module into smaller ones later. (Well, OK, you can analyze the bytecode and rewrite the assembly; see the end of this post.)
Also, your premise seems wrong to me. You don't need to work with "one huge library that holds all your helper classes", and really, you don't want to, or at least you won't want to for long. If you don't feel that way now, I assure you that in time, years maybe, you will come to hate such a one-library-to-hold-it-all approach.
This is exactly what you want to escape from, and this is why .NET and many other languages/environments support the concept of "libraries" or "modules" and let you use several of them, and why most of the projects you see everywhere aren't built as "one huge EXE". It's much easier to reuse code, analyze it, and even hunt bugs when you have it in smaller chunks.
--
However, if you insist, there are (ugly) ways to achieve something like what you have in mind. I assume that the "huge DLL" is written in C# and is controlled by you.
The first, somewhat naive but working, way is to use "file links". In Visual Studio you can have a project that contains tons of files and produces a big "all.dll", and right beside it you can create another project that contains no files of its own at all, only links to the first project's files. Use the usual "Add Existing Item..." option on the project, and note that next to the final "Add" button there's a down arrow that expands to "Add As Link".
This way the file stays in HugeProject, but SmallProject sees it too, and when SmallProject is compiled it pulls in the code from that file as well.
Note that this way you actually build two separate assemblies, a big one and a small one, and your final product will need to reference the small one.
This approach is naive and ugly; it is just as if you manually copied/split the huge project into smaller ones, but with the tiny advantage that you don't need to copy the code files around.
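For the record, such a link ends up in SmallProject's .csproj as an MSBuild item along these lines (the paths are hypothetical):

    <Compile Include="..\HugeProject\Parsers\JsonParser.cs">
      <Link>Parsers\JsonParser.cs</Link>
    </Compile>

so you can also maintain the links by hand-editing the project file instead of clicking through the UI.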
--
An intermission for some side thoughts (see the sketch after this list):
you can use #if to conditionally compile out currently-unused code, but setting the flags that drive those #ifs will be cumbersome
you can edit the .csproj files and use MSBuild conditional clauses to automatically exclude unused code files from HugeProject during final builds, but setting the properties that drive those conditions will be cumbersome too
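To illustrate the second point, the .csproj entry could look roughly like this (the property name is made up); the #if counterpart is just an ordinary #if INCLUDE_SORTING ... #endif guard inside the code files:

    <Compile Include="Sorting\QuickSorter.cs" Condition="'$(IncludeSorting)' == 'true'" />

A final customer build would then pass something like /p:IncludeSorting=false to MSBuild.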
--
The second way is to keep everything in HugeProject, have your application(s) reference it directly, and then, after building and testing everything, just before packaging it up and sending it to the customer, run some kind of trimming utility that checks which parts of the code are referenced and removes all dead code from the assemblies. I can't give you a name for such a utility, but many obfuscators come with this feature.
They run through your compiled code, cross-reference everything, change/remove/mangle class, method, and property names, and may as a bonus also remove the unused bits. Then they write the mangled assemblies back to disk, making sure they reference each other and not the originals from before the mangling.
There are existing questions and example utilities around that cover this; also consider ILMerge for better results.
Cons: the utility may leave behind some code it couldn't decide about (used or not); finding/testing/buying it may take some time and resources; you can run into signing problems, since the stripped assembly is a brand-new assembly; and so on. Also, such utilities have trouble when you invoke code only via reflection, and they may require you to provide extra hints or to make sure the code "appears to be used". Example: a whole namespace of "plugins" that implement "IPlugin", where your app searches that namespace for types and uses Activator.CreateInstance to instantiate them; with no hard-linked usages, the trimmer may decide to remove all the plugins as "unused". You'll need to configure the trimmer carefully, or be surprised.
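To make that plugin pitfall concrete, here is a minimal C# sketch (all names hypothetical) of the reflection-only pattern a trimmer cannot see as "used":

    using System;
    using System.Linq;

    public interface IPlugin { void Run(); }

    public static class PluginLoader
    {
        public static IPlugin[] LoadAll()
        {
            // No plugin class is referenced by name anywhere in the code, so a
            // trimmer that only follows hard references sees them all as dead code.
            return typeof(PluginLoader).Assembly.GetTypes()
                .Where(t => typeof(IPlugin).IsAssignableFrom(t) && t.IsClass && !t.IsAbstract)
                .Select(t => (IPlugin)Activator.CreateInstance(t))
                .ToArray();
        }
    }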
Probably a few other ways could be found too, but seriously, most of the time you don't want to waste your effort on this, especially not manually. So either tidy up your code and split it into small libraries, or start looking for an automatic obfuscator-and-trimmer.

Related

How to handle your code that later versions of the framework include?

I have to work with an old version of Mono in Unity projects. I find myself recreating some classes and extension methods that exist in later versions of .NET. Should I be marking these with an attribute that will make it easy to take them out at a later point, or just wait for the inevitable errors and then delete the duplicate code, or take some other approach I'm not familiar with yet? If the attribute route is the way to go, is there already an appropriate attribute for this kind of thing?
Here's what I'd like:
[PresentInDotNET(3.5)]
I fill in the version and get alerted when the framework is at that level or higher.
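No such attribute ships with the framework, so it would have to be a custom one. A minimal sketch of what I have in mind (the name and shape are mine; I use a string rather than 3.5 so versions like "4.5.1" fit):

    using System;

    // Marks code that a given framework version would make redundant.
    [AttributeUsage(AttributeTargets.All, Inherited = false, AllowMultiple = false)]
    public sealed class PresentInDotNETAttribute : Attribute
    {
        public PresentInDotNETAttribute(string version) { Version = version; }
        public string Version { get; private set; }
    }

A build step or unit test could then reflect over the assembly and flag every member whose recorded version is at or below the framework version being targeted.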
Split them off to a separate assembly, and change the set of assemblies that make up the final delivery based on the .NET version. You need to rebuild your main assembly to refer to the correct assemblies (depending on whether Foo is in MySystem or System), but as long as you keep namespaces identical, that's all. If you are not even interested in keeping compatibility with older versions, you can simply delete classes from this assembly as they become available.
Alternatively, if the classes/extension methods you are recreating are not interesting (in the sense that you gain nothing by having .NET supply them for you), simply put them in their separate namespace and accept that you are duplicating code already present in newer versions. It doesn't matter a whole lot which assembly gets the job done, after all, as long as it happens.
Whatever you do, try to avoid going the route of #ifdefs, runtime discovery, and other conditional code, as this is much harder to maintain.
How about adding "// TODO" comments for places like this? Visual Studio will display these in the Task window and you can get at them pretty easily.

Is there a good reason for preferring reflection over reference?

Going over some legacy code, I ran into a piece of code that used reflection to load some DLLs whose source code was available (they were another project in the same solution).
I was cracking my skull trying to figure out why it was done this way (naturally, the code was not documented...).
My question is: can you think of any good reason to prefer loading an assembly via reflection over referencing it?
Yes, if you have a dynamic module system, where different DLLs should be loaded depending on conditions at runtime. We do this where I work; we do a license check for different optional modules that may be loaded into our system, and then only load the DLLs associated with each module if the license checks out. This prevents code that should never be executed from being loaded, which can both improve performance slightly and prevent bugs.
Dynamically loading DLLs may also allow you to drastically change functionality without changing any source code. The main assembly may for instance set in motion a discovery process where it finds all classes that implement some interface, and chooses which one to use depending on some runtime criterion.
These days you'll typically want to use MEF for this kind of task, but that's only been around since .NET 4.0, so there are probably many codebases out there that do it manually. (I don't know much about MEF. Maybe you have to do this part manually there as well.)
But anyway, the answer to your question is that there certainly are good reasons to dynamically load DLLs using reflection. Whether it applies in your case is impossible to say without more details.
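A minimal sketch of the manual approach (the folder layout, interface, and license check are all hypothetical):

    using System;
    using System.IO;
    using System.Linq;
    using System.Reflection;

    public interface IModule { void Start(); }

    public static class ModuleLoader
    {
        // Load only the licensed module DLLs from a folder and instantiate their IModule types.
        public static IModule[] LoadModules(string folder, Func<string, bool> isLicensed)
        {
            return Directory.GetFiles(folder, "*.dll")
                .Where(isLicensed)                 // unlicensed DLLs are never even loaded
                .Select(Assembly.LoadFrom)
                .SelectMany(a => a.GetTypes())
                .Where(t => typeof(IModule).IsAssignableFrom(t) && t.IsClass && !t.IsAbstract)
                .Select(t => (IModule)Activator.CreateInstance(t))
                .ToArray();
        }
    }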
Without knowing your specific project, no one here can tell you why it was done that way in your case.
But the general reasons are:
Updateability: you can simply recompile and replace the updated library instead of having to recompile and replace the whole application.
Cooperation: if the interface is clear, multiple teams can work together this way, one on the main application and others on the DLLs.
Reusability: sometimes you need the same functionality in multiple projects, so the same DLL can be used again and again.
Extensibility: in some cases you want to be able to extend your program later with plugins that were not present at shipping time. This can be realized using DLLs.
I hope this helps you understand some of your setup.
Reason for loading an assembly via reflection rather than referencing it?
Consider a scenario where there are three classes, each with a DoWork() method that returns a string, and you pick between them by checking a condition (strongly typed).
Now two more such classes arrive in two different DLLs; how would you cope with the change?
1) You can add references to the new DLLs, change the conditional check, and make it work.
2) You can use reflection and pass the condition and assembly name at run time; this allows you to add any number of implementations later without any code change in the primary application (see the sketch below).
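A sketch of option 2 (the assembly path and type name are hypothetical and would typically come from config or user input):

    using System;
    using System.Reflection;

    public static class WorkerFactory
    {
        // The primary application compiles against no worker type at all;
        // new workers are added just by dropping in a new DLL.
        public static string DoWork(string assemblyPath, string typeName)
        {
            Assembly asm = Assembly.LoadFrom(assemblyPath);
            object worker = Activator.CreateInstance(asm.GetType(typeName, true));
            return (string)worker.GetType().GetMethod("DoWork").Invoke(worker, null);
        }
    }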

Does an ILSplit.exe exist, equivalent to ILMerge.exe, or how could this be made?

Does a utility for splitting a single .NET assembly into a subset of the full assembly exist? I.e. the "functional inverse" of ILMerge.exe?
This tool, of course, would be difficult to produce if it had to track dependencies between classes, functions, and so on.
However, what I am looking at is a case where I have a very big (hundreds of MB) mixed-mode assembly consisting mostly of static classes with static methods; basically just a function library, although with some DllMain initialization and the like.
What I would like is to be able to specify a list of static methods on specific static classes that I want to keep in the subset assembly. Technically this should be possible, as an assembly is just binary information in a standardized format.
So does this exist, how could it be made, or why would it be impractical?
No; there are very high odds that this tool doesn't exist, albeit the absence of a tool can never be positively proven.
These IL rewriting tricks don't work on mixed-mode assemblies anyway; ILMerge doesn't support them either. Such assemblies don't just contain IL, they also contain machine code and a relocation table. There is no simple way to pick machine code apart, mostly because it isn't pure code but also contains data, like the jump tables for a switch statement. This is also the reason that programmers who write in a native language don't bother with obfuscators: decompiling machine code is a major time sink and always imperfect.
So, for one, it is likely that your assembly is large because it contains a lot of native code. You will need to tackle this at the project level and split this mongo project into smaller ones, distributing the source code between them. That's work, and not always easy: linker errors are a common scourge when you do this, and you're likely to have to change code declarations so they can be exported. There's only one way to do it: start at the beginning and split off a sub-section first, so that you have a not-so-big assembly and a small one; rinse and repeat. And do beware the cost: many assemblies make the cold start of a program slower.
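That said, for a pure-IL assembly the basic mechanics are easy to demonstrate with Mono.Cecil. A deliberately naive sketch that keeps a whitelist of types and makes no attempt to chase dependencies between them (which is the genuinely hard part):

    using System.Collections.Generic;
    using System.Linq;
    using Mono.Cecil;

    class ILSplit
    {
        static void Main()
        {
            var keep = new HashSet<string> { "MyLib.Parsers.JsonParser" }; // hypothetical names
            var asm = AssemblyDefinition.ReadAssembly("Big.dll");
            var dead = asm.MainModule.Types
                .Where(t => t.Name != "<Module>" && !keep.Contains(t.FullName))
                .ToList();
            foreach (var type in dead)
                asm.MainModule.Types.Remove(type); // dangling references are NOT fixed up
            asm.Write("Small.dll");
        }
    }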

Can language localisations be automatically reverse engineered from .resx files?

I'm working with an MVC 4 app that was originally created with the intention of possibly requiring language localisation, so there's heavy use of .resx files and corresponding embedded references throughout the project. As it turns out, the app will only ever be used by English-speaking audiences, and indeed no other languages were ever loaded in. What we've got now is overhead every time we need to put text on a page, and increasing inconsistency as English text gets hard-coded into places which can't directly access the resource files, such as .js files and reference data in the DB.
Short of a lot of copying and pasting, is there any automated way to extract the English-language values from the resource files and replace their references in the views? In a perfect world there'd be a tool to do this, and it's certainly conceptually scriptable; does anything like this exist already?
You will have to script it. I have done similar stuff with the O2 Platform's AST manipulation and Mono.Cecil APIs.
If you give me a small project with the use case you need (a resx file and an MVC view), I can show you a code snippet example.
I haven't seen anything that would take care of this. My first thought is that this is because of the localization issues most "out of the box" solutions would run into.
This may be far-fetched, but I'm giving it a shot: could you write a C# app that loads the assembly holding the resource file, then loops through every file in the project and replaces the resource keys with the values? (A sketch follows below.)
As you said, it can be scripted, and this seems like the easiest yet crudest way to complete the task. Depending on the number of resources you're talking about, it may of course be easier and safer to copy/paste.
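A rough sketch of such a script, assuming the views reference resources as @Resources.Strings.SomeKey (that pattern, and all the paths, are guesses you would adapt):

    using System.IO;
    using System.Linq;
    using System.Xml.Linq;

    class InlineResources
    {
        static void Main()
        {
            // A .resx file is plain XML: <data name="Key"><value>Text</value></data>
            var pairs = XDocument.Load(@"Resources\Strings.resx")
                .Descendants("data")
                .Where(d => d.Attribute("type") == null) // keep plain string entries only
                .ToDictionary(d => (string)d.Attribute("name"),
                              d => (string)d.Element("value"));

            foreach (var view in Directory.EnumerateFiles("Views", "*.cshtml", SearchOption.AllDirectories))
            {
                var text = File.ReadAllText(view);
                foreach (var kv in pairs)
                    text = text.Replace("@Resources.Strings." + kv.Key, kv.Value);
                File.WriteAllText(view, text);
            }
        }
    }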
Satellite assemblies: if you have all the app's resources placed in one project, create a resource set for each non-default language you want to implement, for example fr-CA.ErrorMsg, en-GB.ErrorMsg, and en-US.ErrorMsg. The default language can be specified via the main thread's CultureInfo. If that is en-US, fill your en-US file with all the entries needed; the other resource sets are only consulted for entries that do not exist in the default (en-US) resources.

Best practices for assembly naming and versioning?

I am looking for some good practices on naming assemblies and versioning them. How often do you increment the major or minor versions?
In some cases, I have seen releases go straight from version 1.0 to 3.0. In other cases, it seems to be stuck at version 1.0.2.xxxx.
This will be for a shared assembly used in multiple projects across the company. Looking forward to some good input.
Some good information from this article on Suzanne Cook's blog on MSDN (posted 2003-05-30):
When to Change File/Assembly Versions

First of all, file versions and assembly versions need not coincide with each other. I recommend that file versions change with each build. But, don't change assembly versions with each build just so that you can tell the difference between two versions of the same file; use the file version for that. Deciding when to change assembly versions takes some discussion of the types of builds to consider: shipping and non-shipping.

Non-Shipping Builds
In general, I recommend keeping non-shipping assembly versions the same between shipping builds. This avoids strongly-named assembly loading problems due to version mismatches. Some people prefer using publisher policy to redirect new assembly versions for each build. I recommend against that for non-shipping builds, however: it doesn't avoid all of the loading problems. For example, if a partner x-copies your app, they may not know to install publisher policy. Then, your app will be broken for them, even though it works just fine on your machine.
But, if there are cases where different applications on the same machine need to bind to different versions of your assembly, I recommend giving those builds different assembly versions so that the correct one for each app can be used without having to use LoadFrom/etc.

Shipping Builds
As for whether it's a good idea to change that version for shipping builds, it depends on how you want the binding to work for end-users. Do you want these builds to be side-by-side or in-place? Are there many changes between the two builds? Are they going to break some customers? Do you care that it breaks them (or do you want to force users to use your important updates)? If yes, you should consider incrementing the assembly version. But, then again, consider that doing that too many times can litter the user's disk with outdated assemblies.

When You Change Your Assembly Versions
To change hardcoded versions to the new one, I recommend setting a variable to the version in a header file and replacing the hardcoding in sources with the variable. Then, run a pre-processor during the build to put in the correct version. I recommend changing versions right after shipping, not right before, so that there's more time to catch bugs due to the change.
One way to define your versioning is to give semantic meaning to each portion:
Go from N.x to N+1.0 when compatibility breaks with the new release
Go from N.M to N.M+1 when new features are added which do not break compatibility
Go from N.M.X to N.M.X+1 when bug fixes are added
The above is just an example -- you'd want to define the rules that make sense for you. But it is very nice for users to quickly tell if incompatibilities are expected just by looking at the version.
Oh, and don't forget to publish the rules you come up with so people know what to expect.
Semantic Versioning has a set of guidelines and rules as to how to apply this (and when). Very simple to follow and it just works.
http://semver.org/
The first thing I would recommend is to become familiar with the differences between the Assembly version and the File version. Unfortunately, .NET tends to treat these as the same when it comes to the AssemblyInfo files in that it usually only puts AssemblyVersion and allows the FileVersion to default to the same value.
Since you said this is a shared assembly, I'm assuming you mean it's shared at a binary level (not by including the project in the various solutions). If that's the case, you want to be very deliberate about changing the assembly version, as that is what .NET uses to strong-name the assembly (to allow you to put it in the GAC) and what makes up the "assembly full name". When the assembly version changes, it can break the applications that use it unless they add assembly binding redirect entries to their app.config files.
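In AssemblyInfo.cs the two are set independently. A common arrangement (the numbers are purely illustrative) keeps the assembly version stable between releases while the file version changes with every build:

    using System.Reflection;

    [assembly: AssemblyVersion("2.1.0.0")]        // part of the strong name; change deliberately
    [assembly: AssemblyFileVersion("2.1.4.1234")] // free to change with every build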
As for naming, I think it depends on what your company naming rules are (if any) and the purpose of the library. For example, if this library provides "core" (or system-level) functionality that isn't specific to any particular product or line of business, you could name it:
CompanyName.Framework.Core
if it's part of a larger library, or simply
CompanyName.Shared
CompanyName.Core
CompanyName.Framework
As far as when to increment version numbers, it's still rather subjective and depends on what you consider each portion of the build number to represent. The default Microsoft scheme is Major.Minor.Build.Revision, but that doesn't mean you can't come up with your own definitions. The most important thing is to be consistent in your strategy and make sure that the definitions and rules make sense across all of your products.
In almost every version scheme I've seen, the first two portions are Major.Minor. The major version number usually increments when there are large and/or breaking changes, while the minor version number usually increments to indicate that something changed which was not a breaking change. The other two numbers are considerably more subjective: they can be the "build" (often a serial date value or a sequentially updating number that changes each day) and the "revision" or patch number. I've also seen them reversed (giving Major.Minor.Revision.Build), where build is a sequentially incrementing number from an automated build system.
Keep in mind that the assembly major and minor versions are used as the type library version number when the assembly is exported.
Finally, take a look at some of these resources for more information:
http://msdn.microsoft.com/en-us/library/51ket42z.aspx
http://msdn.microsoft.com/en-us/library/system.reflection.assemblyversionattribute.aspx
http://blogs.msdn.com/suzcook/archive/2003/05/29/57148.aspx
