Building is the sequence of compiling and linking.
In .NET, source code is compiled into an assembly that contains Common Intermediate Language (CIL) and type information. At run time, the JIT compiler converts the CIL code into native code.
What I do not understand is how and when linking occurs in .NET.
Can someone please explain the process ?
Thanks in advance
There's no linking in the C++ sense.
I mean, there are no intermediate "obj"/"lib" files that can be distributed and linked with other "obj" files later. A reference to an assembly always has dynamic behavior (it is always a dynamic-link library), as opposed to C++ static linking.
Something like linking is the creation of a .netmodule. You can build .NET source code with the compiler into a .netmodule instead of an assembly (look here, especially the section "Differences Between C# Compiler and C++ Compiler Output"), and later you can link these modules together into a single assembly (see al.exe).
But this is uncommon practice - most assemblies contain a single module, and this work (source -> module -> assembly) is done by the compiler (e.g., csc.exe) behind the scenes. Also, I can't recall any product being redistributed as a set of .netmodule files (rather than as a set of assemblies).
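For the record, a minimal sketch of that path (file names are made up): compile each source file into a module, then have al.exe produce the assembly manifest. Note that the result is a multi-file assembly - the .netmodule files still ship alongside the manifest DLL:

csc /target:module /out:Helper.netmodule Helper.cs
csc /target:module /out:Util.netmodule Util.cs
al /target:library /out:Combined.dll Helper.netmodule Util.netmodule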
Since version 3.0, .NET installs a bunch of different 'reference assemblies' under C:\Program Files\Reference Assemblies\Microsoft...., to support different profiles (say .NET 3.5 client profile, Silverlight profile). Each of these is a proper .NET assembly that contains only metadata - no IL code - and each assembly is marked with the ReferenceAssemblyAttribute. The metadata is restricted to the types and members available under the applicable profile - that's how intellisense shows a restricted set of types and members. The reference assemblies are not used at runtime.
I learnt a bit about it from this blog post.
I'd like to create and use such a reference assembly for my library.
How do I create a metadata-only assembly - is there some compiler flag or ildasm post-processor?
Are there attributes that control which types are exported to different 'profiles'?
How does reference assembly resolution work at runtime? If I had the reference assembly present in my application directory instead of the 'real' assembly, and not in the GAC at all, would probing continue and my AssemblyResolve event fire, so that I can supply the actual assembly at runtime?
Any ideas or pointers to where I could learn more about this would be greatly appreciated.
Update: Looking around a bit, I see the .NET 3.0 'reference assemblies' do seem to have some code, and the Reference Assembly attribute was only added in .NET 4.0. So the behaviour might have changed a bit with the new runtime.
Why? For my Excel-DNA ( http://exceldna.codeplex.com ) add-in library, I create a single-file .xll add-in by packing the referenced assemblies into the .xll file as resources. The packed assemblies include the user's add-in code, as well as the Excel-DNA managed library (which might be referenced by the user's assembly).
It sounds rather complicated, but works wonderfully well most of the time - the add-in is a single small file, so there are no installation or distribution issues. I do run into (not unexpected) problems because of different versions - if an old version of the Excel-DNA managed library is present as a file, the runtime will load that instead of the packed one (I never get a chance to interfere with the loading).
I hope to make a reference assembly for my Excel-DNA managed part that users can point to when compiling their add-ins. But if they mistakenly have a version of this assembly at runtime, the runtime should fail to load it, and give me a chance to load the real assembly from resources.
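For context, the pack-and-resolve pattern I use looks roughly like this (the resource naming scheme here is simplified; the real packing is more involved):

using System;
using System.IO;
using System.Reflection;

static void RegisterPackedAssemblyResolver()
{
    AppDomain.CurrentDomain.AssemblyResolve += (sender, args) =>
    {
        // Simplified scheme: each packed assembly is an embedded resource named "<SimpleName>.dll"
        string resourceName = new AssemblyName(args.Name).Name + ".dll";
        using (Stream stream = Assembly.GetExecutingAssembly().GetManifestResourceStream(resourceName))
        {
            if (stream == null) return null;   // not packed - let normal probing continue
            var buffer = new MemoryStream();
            stream.CopyTo(buffer);
            return Assembly.Load(buffer.ToArray());   // load the packed copy from memory
        }
    };
}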
To create a reference assembly, you would add this to your AssemblyInfo.cs file:

using System.Runtime.CompilerServices;

[assembly: ReferenceAssembly]
To load others, you can reference them as usual from your VisualStudio project references, or dynamically at runtime using:
Assembly.ReflectionOnlyLoad()
or
Assembly.ReflectionOnlyLoadFrom()
If you have added a reference to a metadata/reference assembly using VisualStudio, then intellisense and building your project will work just fine; however, if you try to execute your application against one, you will get the error:
System.BadImageFormatException: Cannot load a reference assembly for execution.
So the expectation is that at runtime you would substitute in a real assembly that has the same metadata signature.
If you have loaded an assembly dynamically with Assembly.ReflectionOnlyLoad(), then you can perform all the reflection operations against it (read the types, methods, properties, attributes, etc.), but you cannot dynamically invoke any of them.
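For example, a small reflection-only sketch (the path and names are hypothetical):

using System;
using System.Reflection;

class InspectReferenceAssembly
{
    static void Main()
    {
        // Loads metadata only - no code from this assembly can ever execute
        Assembly asm = Assembly.ReflectionOnlyLoadFrom(@"C:\libs\MyLibrary.dll");
        foreach (Type t in asm.GetTypes())
            Console.WriteLine(t.FullName);            // inspection works fine
        // asm.CreateInstance("MyLibrary.SomeType")   // ...but any invocation would throw
    }
}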
I am curious as to what your use case is for creating a metadata-only assembly. I've never had to do that before, and would love to know if you have found some interesting use for them...
If you are still interested in this possibility, I've made a fork of the il-repack project based on Mono.Cecil which accepts a "/meta" command line argument to generate a metadata only assembly for the public and protected types.
https://github.com/KarimLUCCIN/il-repack/tree/xna
(I tried it on the full XNA Framework and it's working, AFAIK...)
Yes, this is new for .NET 4.0. I'm fairly sure this was done to avoid the nasty versioning problems in the .NET 2.0 service packs. The best example is the WaitHandle.WaitOne(int) overload, added and documented in SP2. A popular overload, because it avoids having to guess at the proper value for exitContext in the WaitOne(int, bool) overload. Problem is, the program bombs when it is run on a version of 2.0 that's older than SP2 - and not with a happy diagnostic either. Isolating the reference assemblies ensures that this can't happen again.
I think those reference assemblies were created by starting from a copy of the compiled assemblies (like it was done in previous versions) and running them through a tool that strips the IL from the assembly. That tool is, however, not available to us - there is nothing in the bin\NETFX 4.0 Tools subdirectory of the Windows 7.1 SDK that could do this. Not exactly a tool that gets used often, so it is probably not production quality :)
You might have luck with the Cecil library (from Mono); I think the implementation supports ILMerge functionality, so it might just as well be able to write metadata-only assemblies.
I have scanned the code base (documentation is sparse), but haven't found any obvious clues yet...
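If you want to experiment, here is a rough sketch of the idea using Mono.Cecil - strip every method body down to a bare throw, then save the result. This is only my guess at an approach, not a tested tool:

using Mono.Cecil;
using Mono.Cecil.Cil;

class MetadataStripper
{
    static void Main(string[] args)
    {
        // args[0] = input assembly, args[1] = output path (hypothetical usage)
        AssemblyDefinition assembly = AssemblyDefinition.ReadAssembly(args[0]);
        foreach (TypeDefinition type in assembly.MainModule.Types)   // top-level types only
        {
            foreach (MethodDefinition method in type.Methods)
            {
                if (!method.HasBody) continue;
                // Replace the IL with "throw null" so no real code remains
                method.Body.Variables.Clear();
                method.Body.ExceptionHandlers.Clear();
                method.Body.Instructions.Clear();
                ILProcessor il = method.Body.GetILProcessor();
                il.Append(il.Create(OpCodes.Ldnull));
                il.Append(il.Create(OpCodes.Throw));
            }
        }
        assembly.Write(args[1]);
    }
}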
YMMV
I wrote a Win32 DLL (C++ with CLR support, VS 2010/13) as an extension for another/old VB6 app, and it uses the open-source DLL PDFSharp.
It works fine, but if "PDFSharp.dll" is removed from the directory, the application crashes when the program tries to load my DLL.
I want to include the PDFSharp DLL in mine, so that only one DLL is needed.
I tried to add it to the resources and to load/catch the error at run time with

AppDomain^ root = AppDomain::CurrentDomain;
root->AssemblyResolve += gcnew ResolveEventHandler(MyResolveEventHandler);

in the first function that the app calls, but my problem is that the app/DLL crashes before I can handle anything.
ILMerge can't help, because it is a mixed Win32/CLR DLL, not a 100% .NET DLL.
C++/CLI mixed-mode DLLs have two sets of references: the native imports in the PE header, and the .NET assembly references. Problems finding the native imports cause the symptom you observed: the assembly fails early during loading, and the failure cannot be intercepted or recovered from.
It's not clear to me why the native dependency rules are applicable here. For a true native dependency that needs to be located using an alternate search order under your control, delay-loading could be applied. But that can't be used with a referenced .NET assembly.
In any case, the simplest fix is to not need a separate assembly at all. Your goal is single file deployment, and the ideal single file deployment scenario is when all the code is contained in a single DLL and you don't need to unpack a second file at runtime.
For pure .NET assemblies, there is an ILMerge tool that combines multiple DLLs into a single file. But your case has a C++/CLI mixed mode DLL, not pure MSIL.
Using multiple languages in a native program generally works a little bit differently. Instead of producing a complete executable from each toolset, native code standardizes an object file format (Windows .obj, Linux .o) which all the various toolsets know how to produce, and then the link step can link together object files from a variety of languages. The object files are often bundled into static libraries. (A static library is just an archive of object files, with a symbol index) Because the C++/CLI toolset is patterned on native C++, it uses this model as well.
The .NET version of this language-independent "object file" which can be further linked is a .netmodule file. Internally, it is a .NET assembly without a manifest. Functionally, it acts like a static library. And the C++/CLI link.exe can link C# (and VB, and F#, etc.) .netmodule static libraries together with C++/CLI object files and static libraries, and with native object files and libraries, when it creates the mixed-mode assembly.
This isn't the most straightforward process, because while it is supported by the underlying toolchains, the Visual Studio project options dialog boxes don't have a UI for either creating or consuming .netmodule static libraries.
For the C# side to produce a .netmodule, you should open your .csproj file and change the <OutputType> setting to module. Then reopen the project in Visual Studio and build as usual.
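That is, inside the relevant property group of the .csproj (a minimal sketch):

<PropertyGroup>
  <OutputType>module</OutputType>
</PropertyGroup>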
On the C++/CLI side, the project options dialog allows you to customize the compile and link command-lines. Change the linker command to include /link and the name of the .netmodule file.
If you've done it right, the C++/CLI linker will create a single mixed-mode DLL with all the types and code from both the C# and C++/CLI source files. And all the internal usage between C# and C++/CLI will be already resolved, so you won't have to worry about missing dependencies at run time. Well, at least not these dependencies; any you didn't choose to link in will still be handled normally.
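Outside the IDE, the equivalent command line looks roughly like this (file names are hypothetical, and the exact switches may vary by toolset version):

csc /target:module /out:Managed.netmodule ManagedPart.cs
cl /clr /c Wrapper.cpp
link /DLL /OUT:Mixed.dll Wrapper.obj Managed.netmodule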
I'm in the process of wrapping a pure unmanaged VC++ 9 project in C++/CLI in order to use it plainly from a .NET app. I know how to write the wrappers, and that unmanaged code can be executed from .NET, but what I can't quite wrap my head around:
The unmanaged lib is a very complex C++ library and uses a lot of inlining and other features, so I cannot compile it into the /clr-marked managed DLL. I need to compile it into a separate DLL using the normal VC++ compiler.
How do I export symbols from this unmanaged code so that it can be used from the C++/CLI project? Do I mark every class I need visible as extern? Is it that simple or are there some more complexities?
How do I access the exported symbols from the C++/CLI project? Do I simply include the header files of the unmanaged source code, and will the C++ linker take the actual code from the unmanaged DLL? Or do I have to hand-write a separate set of "extern" classes in a new header file that points to the classes in the DLL?
When my C++/CLI project creates the unmanaged classes, will the unmanaged code run perfectly fine in the normal VC9 runtime, or will it be forced to run within .NET, causing more compatibility issues?
The C++ project creates lots of instances and has its own custom-implemented garbage collector, all written in plain C++, it is a DirectX sound renderer and manages lots of DirectX objects. Will all this work normally or would such Win32 functionality be affected in any way?
You can start with an ordinary native C++ project (imported from, say, Visual Studio 6.0 from well over a decade ago) and when you build it today, it will link to the current version of the VC runtime.
Then you can add a single new foo.cpp file to it, but configure that file so it has the /CLR flag enabled. This will cause the compiler to generate IL from that one file, and also link in some extra support that causes the .NET framework to be loaded into the process as it starts up, so it can JIT compile and then execute the IL.
The remainder of the application is still compiled natively as before, and is totally unaffected.
The truth is that even a "pure" CLR application is really a hybrid, because the CLR itself is (obviously) native code. A mixed C++/CLI application just extends this by allowing you to add more native code that shares the process with some CLR-hosted code. They co-exist for the lifetime of the process.
If you make a header foo.h with a declaration:
void bar(int a, int b);
You can freely implement or call this either in your native code or in the foo.cpp CLR code. The compiler/linker combination takes care of everything. There should be no need to do anything special to call into native code from within your CLR code.
You may get compile errors about incompatible switches:
/ZI - Program database for edit and continue, change it to just Program database
/Gm - you need to disable Minimal rebuild
/EHsc - C++ exceptions, change it to Yes with SEH Exceptions (/EHa)
/RTC - Runtime checks, change it to Default
Precompiled headers - change it to Not Using Precompiled Headers
/GR- - Runtime Type Information - change it to On (/GR)
All these changes only need to be made on your specific /CLR enabled files.
As mentioned by Daniel, you can fine-tune your settings at file level. You can also play with '#pragma managed' inside files, but I wouldn't do that without reason.
Keep in mind that you can create a complete mixed-mode assembly. That means you can compile your native code unchanged into this file, PLUS some C++/CLI wrapper around that code. In the end, you will have the same file acting as a native DLL with all your exported native symbols AND as a full-fledged .NET assembly (exposing C++/CLI objects) at the same time!
That also means you only have to care about exports as far as native client code outside your file is concerned. Your C++/CLI code inside the mixed DLL/assembly can access the native data structures using the usual access rules (provided simply by including the header).
Because you mentioned it, I did this for some non-trivial native C++ class hierarchy including a fair amount of DirectX code. So, no principal problem here.
I would advise against using P/Invoke in a .NET-driven environment. True, it works. But for anything non-trivial (say, more than 10 functions) you are certainly better off with an OO approach as provided by C++/CLI. Your C# client developers will be thankful. You have all the .NET stuff like delegates/properties, managed threading, and much more at your fingertips in C++/CLI. Starting with VS 2012, there is somewhat usable Intellisense too.
You can use P/Invoke to call exported functions from unmanaged DLLs. This is how the unmanaged Windows API is accessed from .Net. However, you may run into problems if your exported functions use C++ objects, and not just plain C data structures.
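A minimal P/Invoke sketch in C# (the DLL and function names here are hypothetical; the native side must export a plain C function):

using System;
using System.Runtime.InteropServices;

class NativeMethods
{
    // Assumes MyNativeLib.dll exports: extern "C" __declspec(dllexport) int Add(int a, int b);
    [DllImport("MyNativeLib.dll", CallingConvention = CallingConvention.Cdecl)]
    public static extern int Add(int a, int b);
}

class Program
{
    static void Main()
    {
        Console.WriteLine(NativeMethods.Add(2, 3));   // the call crosses into the unmanaged DLL
    }
}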
There also seems to be C++ interop technology that can be of use to you: http://msdn.microsoft.com/en-us/library/2x8kf7zx(v=vs.80).aspx
What's the relation (if any) of MASM assembly language and ILASM? Is there a one-to-one conversion? I'm trying to incorporate Quantum GIS into a program I'm kind of writing as I go along! I have GIS on my computer, and I have Red Gate Reflector; neither it nor the Object Browser of Visual Studio 2008 could open one of the .dlls in Quantum (one of several whose behavior I don't have a strong clue about). I used the MASM assembly editor and "opened" the same dll, and it spewed something I didn't expect to necessarily understand in the first place. How can I/can I make a conversion of that same "code" to something I can interact with in ILASM, and, I'm assuming, consequently in C#? Thanks a ton for reading and for all the responses to earlier questions... please bear in mind I'm relatively new to programming in C#, and even fresher to MASM and ILASM.
MASM deals with the x86 instructions and is platform/processor dependent, while ILASM refers to the .NET CIL (Common Intermediate Language) instructions, which are platform/processor independent. Converting from something specific to something more general is hard to achieve; that's why, AFAIK, there is no converter from MASM to ILASM (the other way around, there is!)
IL is a platform-independent layer of abstraction over native code. Code written on the .NET platform in C#, VB.NET, or any other .NET language compiles down to an assembly (.EXE/.DLL) containing IL. Typically, the first time a piece of IL code is executed, the .NET runtime's JIT compiler translates it down to native code for the current processor and executes that (NGen can instead do this compilation ahead of time and cache the native image on disk). This allows .NET code to be deployed to any platform supporting that .NET framework, regardless of the processor or architecture of the system.
As you've seen, Reflector is great for viewing the code in an assembly, because IL can easily be previewed in C# or VB.NET form. This is because IL instructions are generally a little higher level, and assemblies also contain a lot of metadata that native code wouldn't normally have, such as class, method, and variable names.
It's also possible to compile a .NET assembly ahead of time to native code by calling Ngen.exe directly on the assembly. Once that's done, it's really difficult to make sense of the native code.
There is no relationship between the MASM assembly language and ILASM. I don't see any way for you to convert native code to IL code. IL can be understood by the CLR only, while MASM assembly language is about native machine code. The CLR turns the IL into native code at run time.
This question is based on a previous question: How does C# compilation get around needing header files?.
Confirmation that C# compilation makes use of multiple passes essentially answers my original question. Also, the answers indicated that C# uses type and method signature metadata stored in assemblies to check code syntax at compile time.
Q: How does C/C++/Objective-C know what code to load at run time that was linked at compile time? And to tie it into a technology I'm familiar with: how does C#/the CLR do this?
Correct me if I'm wrong, but for C#/CLR, my intuitive understanding is that certain paths are checked for assemblies upon execution, and basically all code is loaded and linked dynamically at run time.
Edit: Updated to include C++ and Objective-C with C.
Update: To clarify, what I am really curious about is how C/C++/Objective-C compilation matches an "externally defined" symbol in my source with the actual implementation of that code, what the compilation output is, and basically how the compilation output is executed by the microprocessor so that control passes seamlessly into the library code (in terms of the instruction pointer). I have done this with the CLR virtual machine, but am curious to know how this works conceptually in C++/Objective-C on an actual microprocessor.
The linker plays an essential role in C/C++ building to resolve external dependencies. .NET languages don't use a linker.
There are two kinds of external dependencies: those whose implementation is available at link time, in another .obj or .lib file offered as input to the linker; and those that are available in another executable module - a DLL on Windows.
The linker resolves the first kind at link time; nothing complicated happens, since the linker knows the address of the dependency. The second kind is highly platform dependent. On Windows, the linker must be provided with an import library - a pretty simple file that merely declares the name of the DLL and lists the exported definitions in the DLL. The linker resolves the dependency by emitting a jump in the code and adding a record to the external dependency table that indicates the jump location, so that it can be patched at runtime. The loading of the DLL and the setup of the import table are done at runtime by the Windows loader. This is a bird's-eye view of the process; there are many boring details that make this happen as quickly as possible.
In managed code all of this is done at runtime, driven by the JIT compiler. It translates IL into machine code, driven by program execution. Whenever code executes that references another type, the JIT compiler springs into action, loads the type and translates the called method of the type. A side-effect of loading the type is loading the assembly that contains the type, if it wasn't loaded before.
Notable too is the difference for external dependencies that are available at build time. A C/C++ compiler compiles one source file at a time, the dependencies are resolved by the linker. A managed compiler normally takes all source files that create an assembly as input instead of compiling them one at a time. Separate compilation and linking is in fact supported (.netmodule and al.exe) but is not well supported by available tools and thus rarely done. Also, it cannot support features like extension methods and partial classes. Accordingly, a managed compiler needs many more system resources to get the job done. Readily available on modern hardware. The build process for C/C++ was established in an era where those resources were not available.
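You can actually observe this JIT-driven loading from C#. A small sketch (which assembly loads, and exactly when, varies by framework version and optimization settings):

using System;

class LazyLoadDemo
{
    static void Main()
    {
        // Log every assembly as the CLR loads it
        AppDomain.CurrentDomain.AssemblyLoad += (s, e) =>
            Console.WriteLine("Loaded: " + e.LoadedAssembly.GetName().Name);

        Console.WriteLine("Before calling UseXml");
        UseXml();   // System.Xml is typically loaded here, when UseXml is JIT compiled
    }

    static void UseXml()
    {
        var doc = new System.Xml.XmlDocument();
        doc.LoadXml("<root/>");
    }
}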
I believe the process you're asking about is the one called symbol resolution. In the common case, it works along these lines (I've tried to keep it pretty OS-neutral):
The first step is the compiling of individual source files to create object files. The source code is turned into machine language instructions, and any symbols (i.e. function or external variable names) that aren't defined in the source file itself result in placeholders being left in the compiled machine language code wherever they are referenced. The unknown symbol is also added to a list in the object file - at the end of compilation, this list contains every unresolved symbol in the object file, cross-referenced with the locations in the object file of all the placeholders that were added. Each object file also contains a list of the symbols exported by that object file - that is, the symbols defined in that object file that it wants to make visible to code outside that object file - along with the values of those symbols.
The second step is static linking. This also happens at compile-time. During the static linking process, all of the object files created in the first step and any static library files (which are just a special kind of object file) are combined into a single executable. The static linker does a pass through the symbols exported by each object file and static library it has been told to link together, and builds a complete list of the exported symbols (and their values). It then does a pass through the unresolved symbols in each object file, and where the symbol is found in the master list, replaces all of the placeholders with the actual value of the symbol. For any symbols that still remain unresolved at the end of this process, the linker looks through the list of symbols exported by all dynamic libraries it knows about. It builds a list of dynamic libraries that are required, and stores this in the executable. If any symbols still haven't been found, the link process fails.
The third step is dynamic linking, which happens at run time. The dynamic linker loads the dynamic libraries in the list contained in the executable, and replaces the placeholders for the remaining unresolved symbols with their corresponding values from the dynamic libraries. This can either be done "eagerly" - after the executable loads but before it runs - or "lazily", which is on-demand, when an unresolved symbol is first accessed.
The C and C++ Standards have nothing to say about run-time loading - this is entirely OS-specific. In the case of Windows, one links the code with an import library (generated when a DLL is created) that contains the names of functions and the name of the DLL they are in. The linker creates stubs in the code containing this information. At run time, these stubs are used by the C/C++ runtime together with the Windows LoadLibrary() and associated functions to load the function code into memory and execute it.
By libraries you are referring to DLLs, right?
There are certain patterns the OS follows when looking for required files (usually starting from the application's local path, then proceeding to the folders specified by the PATH environment variable).