How is Interop.xxxxx.dll generated?

I recently took over a project and have little Visual Studio COM experience, so please forgive me if I am not asking the right questions.
I have a C++ project that generates a COM DLL; let's call it abc.dll.
I have another C# project that references the COM DLL; however, under its references it points to Interop.abc.dll. To see how the project would react, I deleted every abc.dll and Interop.abc.dll within the directory. When I started the project, Interop.abc.dll was automatically regenerated. This boggles my mind because I don't know how Interop.abc.dll is generated.
So here are my questions:
How does the C# project reference Interop.abc.dll initially, if it is generated?
How is Interop.abc.dll generated if there is no abc.dll to begin with (I haven't built it)?
I played around with the project, and then Interop.abc.dll stopped being generated and started causing errors. Why is that?

COM declarations are stored in a type library, which is quite similar to .NET metadata; COM was the grandfather of .NET. Metadata tells a compiler what types are stored in an assembly. Type libraries, however, use a fairly awkward binary format that is very different from .NET metadata. Different enough to make the conversion from a type library to .NET metadata non-trivial: some constructs have no conversion at all, and some are troublesome enough that you ought to know about them.
So the .NET team decided that the conversion should be a separate step, one that could display warnings or errors when the type library content doesn't match .NET metadata closely enough. That conversion tool is Tlbimp.exe, the type library import tool. They also added this feature to the Project + Add Reference dialog. As long as the conversion doesn't generate any warnings, this works just fine, and it very often does.
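You can also run the conversion by hand; an illustrative invocation (the file names are made up to match the question) looks like:

tlbimp.exe abc.dll /out:Interop.abc.dll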
So the job of Tlbimp.exe, and of the IDE in the case of Add Reference, is to mechanically translate the type library content into .NET metadata. By convention, a type library named "Foo" is converted to "Interop.Foo.dll". It doesn't have to be; that's just the default name. You end up with a .NET assembly that doesn't contain any code, just metadata: .NET types that any .NET compiler can directly consume and that otherwise match the type library declarations. The [ComImport] attribute is a very important one; it tells the CLR that the metadata is actually for COM types.
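If you open such an interop assembly in a decompiler, the imported types look roughly like this. A minimal sketch; the interface name, GUID, and method here are invented for illustration:

using System.Runtime.InteropServices;

// What Tlbimp-generated metadata looks like when rendered as C#:
// no method bodies, just declarations the CLR marshals at runtime.
[ComImport]
[Guid("00000000-0000-0000-0000-000000000001")]  // invented GUID
[InterfaceType(ComInterfaceType.InterfaceIsDual)]
public interface IWidget
{
    void DoSomething(int value);
}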
So this all explains question 1. Question 2 is a no-go: you really do need the type library. It is very often embedded as a resource in the unmanaged DLL, and Tlbimp.exe knows how to find it there. Nobody can see the errors in question 3, so of course it is unguessable what the problem might be. Just keep in mind that the conversion is not always trouble-free; that is the point of making the conversion step explicit.

Related

Create placeholder .NET assembly [duplicate]

Since version 3.0, .NET installs a bunch of different 'reference assemblies' under C:\Program Files\Reference Assemblies\Microsoft...., to support different profiles (say the .NET 3.5 Client Profile, or the Silverlight profile). Each of these is a proper .NET assembly that contains only metadata - no IL code - and each assembly is marked with the ReferenceAssemblyAttribute. The metadata is restricted to the types and members available under the applicable profile - that's how IntelliSense shows a restricted set of types and members. The reference assemblies are not used at runtime.
I learnt a bit about it from this blog post.
I'd like to create and use such a reference assembly for my library.
How do I create a metadata-only assembly - is there some compiler flag or ildasm post-processor?
Are there attributes that control which types are exported to different 'profiles'?
How does reference assembly resolution work at runtime? If I had the reference assembly present in my application directory instead of the 'real' assembly, and not in the GAC at all, would probing continue and my AssemblyResolve event fire so that I can supply the actual assembly at runtime?
Any ideas or pointers to where I could learn more about this would be greatly appreciated.
Update: Looking around a bit, I see that the .NET 3.0 'reference assemblies' do seem to contain some code, and the ReferenceAssemblyAttribute was only added in .NET 4.0. So the behaviour might have changed a bit with the new runtime.
Why? For my Excel-DNA ( http://exceldna.codeplex.com ) add-in library, I create a single-file .xll add-in by packing the referenced assemblies into the .xll file as resources. The packed assemblies include the user's add-in code, as well as the Excel-DNA managed library (which might be referenced by the user's assembly).
It sounds rather complicated, but it works wonderfully well most of the time - the add-in is a single small file, so no installation or distribution issues. I run into (not unexpected) problems because of different versions - if there is an old version of the Excel-DNA managed library present as a file, the runtime will load it instead of the packed one (I never get a chance to interfere with the loading).
I hope to make a reference assembly for my Excel-DNA managed part that users can point to when compiling their add-ins. But if they mistakenly have a version of this assembly at runtime, the runtime should fail to load it, and give me a chance to load the real assembly from resources.
To create a reference assembly, you would add this line to your AssemblyInfo.cs file (the ReferenceAssemblyAttribute lives in the System.Runtime.CompilerServices namespace):
[assembly: ReferenceAssembly]
To load other assemblies, you can reference them as usual from your Visual Studio project references, or load them dynamically at runtime using:
Assembly.ReflectionOnlyLoad()
or
Assembly.ReflectionOnlyLoadFrom()
If you have added a reference to a metadata-only reference assembly in Visual Studio, then IntelliSense and building your project will work just fine. However, if you try to execute your application against one, you will get an error:
System.BadImageFormatException: Cannot load a reference assembly for execution.
So the expectation is that at runtime you would substitute in a real assembly that has the same metadata signature.
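As a rough sketch of that substitution (the assembly and resource names here are invented, and this assumes the AssemblyResolve event does in fact fire in your scenario), you could hook the event and feed the runtime the real bits from an embedded resource:

using System;
using System.IO;
using System.Reflection;

static class PackedAssemblyLoader
{
    public static void Install()
    {
        AppDomain.CurrentDomain.AssemblyResolve += (sender, args) =>
        {
            // "MyLibrary" is a made-up name; match whatever your real assembly is called.
            if (new AssemblyName(args.Name).Name != "MyLibrary")
                return null;

            // Load the packed copy from an embedded resource and hand it to the runtime.
            using (Stream s = Assembly.GetExecutingAssembly()
                .GetManifestResourceStream("MyHost.MyLibrary.dll"))
            {
                var raw = new byte[s.Length];
                s.Read(raw, 0, raw.Length);
                return Assembly.Load(raw);
            }
        };
    }
}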
If you have loaded an assembly dynamically with Assembly.ReflectionOnlyLoad(), then you can only perform reflection operations against it (read the types, methods, properties, attributes, etc.); you cannot dynamically invoke any of them.
I am curious as to what your use case is for creating a metadata-only assembly. I've never had to do that before, and would love to know if you have found some interesting use for them...
If you are still interested in this possibility, I've made a fork of the il-repack project based on Mono.Cecil which accepts a "/meta" command line argument to generate a metadata-only assembly for the public and protected types.
https://github.com/KarimLUCCIN/il-repack/tree/xna
(I tried it on the full XNA Framework and it's working, as far as I know...)
Yes, this is new for .NET 4.0. I'm fairly sure this was done to avoid the nasty versioning problems in the .NET 2.0 service packs. The best example is the WaitHandle.WaitOne(int) overload, added and documented in SP2. A popular overload, because it avoids having to guess at the proper value for exitContext in the WaitOne(int, bool) overload. The problem is that the program bombs when it is run on a version of 2.0 that's older than SP2, and not with a happy diagnostic either. Isolating the reference assemblies ensures that this can't happen again.
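To make the hazard concrete, a minimal illustration; on a pre-SP2 runtime the failure would surface as a MissingMethodException:

using System.Threading;

class WaitDemo
{
    static void Main()
    {
        var evt = new ManualResetEvent(false);
        // SP2+ convenience overload: compiles fine against a 2.0 SP2 install,
        // but bombs on a pre-SP2 runtime that doesn't have it.
        bool signaled = evt.WaitOne(1000);
        // The overload available since 2.0 RTM, with the awkward exitContext argument:
        bool signaled2 = evt.WaitOne(1000, false);
    }
}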
I think those reference assemblies were created by starting from a copy of the compiled assemblies (as was done in previous versions) and running them through a tool that strips the IL from the assembly. That tool is, however, not available to us; there is nothing in the bin\netfx 4.0 tools subdirectory of the Windows 7.1 SDK that could do this. Not exactly a tool that gets used often, so it is probably not production quality :)
You might have luck with the Cecil library (from Mono); I think its implementation allows ILMerge-style functionality, and it might just as well be able to write metadata-only assemblies.
I have scanned the code base (documentation is sparse), but haven't found any obvious clues yet...
YMMV

How to explicitly link a C++/CLI file to a C# library .dll?

I have a C++/CLI library that in turn calls a C# library. That is fine; it links implicitly and all is good with the world. But for various reasons the libraries are not getting quite the perfect treatment from our automated build process, and they do not find each other unless we move them to locations we would rather not keep them in and would rather not fold into our build process.
It has been suggested that we could write a post-build event that uses XCOPY, but let's say we don't want to do that.
Another suggestion is to explicitly load the DLL. The Windows documentation says that to link explicitly, "Applications must make a function call to explicitly load the DLL at run time." The problem is that Microsoft's example is not enough for my small mind to understand how to proceed with this idea. Worse, the only example I could find is out of date. Perhaps I am not using the right search terms, but I am having difficulty finding more about it with Google.
How do we explicitly link a C++/CLI library to a C# .dll?
----edit
OK: how do we explicitly link C++/CLI code, which exports a library using __declspec(), to a C# .dll?
There is no such thing as a "C++/CLI library"; only assemblies are supported. There is no explicit or implicit linking; binding always happens at runtime. Assemblies are found at runtime by the CLR, and the rules it uses to locate them are described in detail in the MSDN Library.
Copying all dependencies into the same directory as the EXE is the sane way to go about it while you are developing the code. It is well supported by the build system, though the C# and C++ rules differ: C++ projects build to the solution's Debug directory, while C# projects build to the EXE project's bin\Debug directory. So yes, altering a C++ project's Output Directory setting or copying files with a post-build event is usually required to get everything together.
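For example, a post-build event on the C++ project along these lines (the destination path is illustrative; adjust it to your C# project's output directory) keeps everything together:

xcopy /y "$(TargetPath)" "$(SolutionDir)MyCSharpApp\bin\$(Configuration)\"

The $(TargetPath), $(SolutionDir), and $(Configuration) macros are expanded by Visual Studio.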

Microsoft .NET namespace casing convention

I was in the process of adding a reference to a DLL when I noticed that the vast majority of instances of the Microsoft namespace have an uppercase M, while on rare occasions they have a lowercase m.
Is there a reason or any logic for this?
Does anyone know the reasoning for this decision by Microsoft?
Those entries you pointed out are not normal .NET Framework assemblies, and they are auto-generated by tooling - two good reasons why they don't follow the .NET Framework naming conventions. They are PIAs, Primary Interop Assemblies. They contain the declarations retrieved from a COM component's type library, converted into .NET metadata to make it easy for the CLR to interop with the COM component. The types in these PIAs have the [ComImport] attribute.
Tlbimp.exe is the tool used to auto-generate these assemblies; the /primary command line option generates a PIA. The ones you got from Microsoft are slightly different from the ones you'll get when you run Tlbimp.exe yourself. For one, Microsoft includes a version resource in the assembly. For another, the names of these PIAs are not the default names that Tlbimp.exe generates, so you are seeing what the build engineer at Microsoft typed for the /out command line option. Clearly he wasn't paying much attention to casing.
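An illustrative invocation (the key file is whatever you sign your assemblies with; a PIA must be strong-named):

tlbimp.exe msxml3.dll /primary /keyfile:mykey.snk /out:Microsoft.msxml.dll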
Microsoft.msxml is the PIA for c:\windows\system32\msxml3.dll, very commonly used in older code to read and write XML documents. Microsoft.mshtml is the PIA for c:\windows\system32\mshtml.tlb, the type library for the DOM interface supported by Internet Explorer and the one you'll need when you want to dig through the HTML elements of a web page. You can look at these type libraries in their "native" format with the Oleview.exe tool, File + View Typelib. What you'll see looks very similar to the Object Browser, except that the declarations are expressed in IDL, the Interface Definition Language, the language originally used to generate these type libraries.
PIAs are mostly a historical artifact; the Embed Interop Types feature available since .NET 4 has made them unnecessary.

Requiring library consumers reference additional assembly when using certain types

I have library code that uses ICSharpCode.SharpZipLib under the hood to make it easy to use ZIP files as data sources when running integration tests.
As it stands, if I reference my library from another project, the other project compiles just fine, but when it reaches the code that uses SharpZipLib, I get an exception because the zip library cannot be found:
failed: System.IO.FileNotFoundException : Could not load file or assembly 'ICSharpCode.SharpZipLib, Version=0.85.5.452, Culture=neutral, PublicKeyToken=1b03e6acf1164f73' or one of its dependencies. The system cannot find the file specified.
If the types in my library derived from a class in SharpZipLib, that would generate compile error CS0012. What other ways are there of triggering a CS0012, so that code requiring SharpZipLib (but not clearly indicating it) would cause consumer code to fail compilation?
I've had similar problems in the past when I've used libraries like DeftTech.DuckTyping under the hood. I'd add my library code to a new project, start working, compile, run, and then suddenly hit an edge case that I'd used duck typing to get around and get a runtime error.
What I'd most like is to have the same behavior as if I'd derived from a type in the 3rd-party library, so that a reference to my derived type generates a CS0012:
The type 'type' is defined in an assembly that is not referenced. You must add a reference to assembly 'assembly'.
You only get compiler errors if you are DIRECTLY interacting with libraries that aren't referenced.
If you use other libraries that internally use a third-party library, you will never get a compiler error. A compile error would not make much sense there, because:
It does not affect compilation at all, so why raise a compiler error?
Your application MIGHT run correctly, because there is no guarantee the third-party library EVER gets called.
It would actually break several libraries that, for example, reference external libraries for debugging but simply don't ship them in release builds.
Edit: If your problem is that you keep forgetting about the third-party library, you can simply reference it directly from your application even if you never use it. Then, e.g., Visual Studio will automatically copy it to your output bin folder, include it in setups, and so on.
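For completeness, the one reliable trigger is making the third-party type part of your library's public surface. A minimal sketch (OpenZip is an invented helper name):

using ICSharpCode.SharpZipLib.Zip;

public static class TestData
{
    // Because ZipFile appears in the public signature, any consumer that
    // calls OpenZip must reference ICSharpCode.SharpZipLib, or the compiler
    // reports CS0012 at the call site.
    public static ZipFile OpenZip(string path)
    {
        return new ZipFile(path);
    }
}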
If you're seeing this in Visual Studio, it's probably because ICSharpCode.SharpZipLib.dll isn't being copied to the build folder of your "other" project.
So this won't be a problem when you distribute your library for consumption by third parties, because ICSharpCode.SharpZipLib.dll will be in the same folder as your library.
During development and testing, though, it can be a bit of a hassle. Generally, when setting up a multi-project solution, I just have all the projects target a single solution-wide Build folder as their output. That way all the dependencies are copied to the same location for testing.
You just have to copy ICSharpCode.SharpZipLib.dll to C:\Windows\assembly and your problem will be solved.

How do C/C++/Objective-C compare with C# when it comes to using libraries?

This question is based on a previous question: How does C# compilation get around needing header files?
Confirmation that C# compilation makes use of multiple passes essentially answers my original question. Also, the answers indicated that C# uses type and method signature metadata stored in assemblies to check code syntax at compile time.
Q: How does C/C++/Objective-C know what code to load at run time that was linked at compile time? And to tie it into a technology I'm familiar with, how does C#/the CLR do this?
Correct me if I'm wrong, but for C#/CLR, my intuitive understanding is that certain paths are checked for assemblies upon execution, and basically all code is loaded and linked dynamically at run time.
Edit: Updated to include C++ and Objective-C with C.
Update: To clarify, what I'm really curious about is how C/C++/Objective-C compilation matches an "externally defined" symbol in my source with the actual implementation of that code, what the compilation output is, and basically how that output is executed by the microprocessor so that control passes seamlessly into the library code (in terms of the instruction pointer). I have done this with the CLR virtual machine, but am curious to know how this works conceptually in C++/Objective-C on an actual microprocessor.
The linker plays an essential role in C/C++ building to resolve external dependencies. .NET languages don't use a linker.
There are two kinds of external dependencies: those whose implementation is available at link time, in another .obj or .lib file offered as input to the linker, and those that are available in another executable module - a DLL, on Windows.
The linker resolves the first kind at link time; nothing complicated happens, since the linker knows the address of the dependency. The second kind is highly platform dependent. On Windows, the linker must be provided with an import library: a pretty simple file that merely declares the name of the DLL and lists the definitions the DLL exports. The linker resolves the dependency by emitting a jump in the code and adding a record to the external dependency table that notes the jump location, so that it can be patched at runtime. The loading of the DLL and the setup of the import table are done at runtime by the Windows loader. This is a bird's-eye view of the process; there are many boring details that make it happen as quickly as possible.
In managed code, all of this is done at runtime, driven by the JIT compiler. It translates IL into machine code, driven by program execution. Whenever code executes that references another type, the JIT compiler springs into action, loads the type, and translates the called method of that type. A side effect of loading the type is loading the assembly that contains it, if it wasn't loaded before.
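You can watch this lazy loading happen from C#; a small sketch for the classic .NET Framework (the System.Xml dependency is just an arbitrary example of a referenced assembly):

using System;
using System.Linq;
using System.Runtime.CompilerServices;

class JitLoadDemo
{
    // NoInlining keeps the System.Xml reference inside this method, so the
    // assembly is pulled in when the method is first JIT-compiled.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static void TouchXml()
    {
        var doc = new System.Xml.XmlDocument();
    }

    static bool XmlLoaded()
    {
        return AppDomain.CurrentDomain.GetAssemblies()
            .Any(a => a.GetName().Name == "System.Xml");
    }

    static void Main()
    {
        Console.WriteLine(XmlLoaded());  // typically False: not loaded yet
        TouchXml();
        Console.WriteLine(XmlLoaded());  // True: loaded as a side effect of the JIT
    }
}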
Notable too is the difference for external dependencies that are available at build time. A C/C++ compiler compiles one source file at a time, and the dependencies are resolved by the linker. A managed compiler normally takes all the source files that make up an assembly as input, instead of compiling them one at a time. Separate compilation and linking is in fact supported (.netmodule files and al.exe) but is not well supported by the available tools and thus rarely done. It also cannot support features like extension methods and partial classes. Accordingly, a managed compiler needs many more system resources to get the job done - readily available on modern hardware. The build process for C/C++ was established in an era when those resources were not available.
I believe the process you're asking about is the one called symbol resolution. In the common case, it works along these lines (I've tried to keep it pretty OS-neutral):
The first step is the compilation of individual source files to create object files. The source code is turned into machine language instructions, and any symbols (i.e. function or external variable names) that aren't defined in the source file itself result in placeholders being left in the compiled machine language code, wherever they are referenced. The unknown symbol is also added to a list in the object file - at the end of compilation, this list contains every unresolved symbol in the object file, cross-referenced with the locations in the object file of all the placeholders that were added. Each object file also contains a list of the symbols exported by that object file - that is, the symbols defined in that object file that it wants to make visible to code outside it - along with the values of those symbols.
The second step is static linking. This also happens at compile-time. During the static linking process, all of the object files created in the first step and any static library files (which are just a special kind of object file) are combined into a single executable. The static linker does a pass through the symbols exported by each object file and static library it has been told to link together, and builds a complete list of the exported symbols (and their values). It then does a pass through the unresolved symbols in each object file, and where the symbol is found in the master list, replaces all of the placeholders with the actual value of the symbol. For any symbols that still remain unresolved at the end of this process, the linker looks through the list of symbols exported by all dynamic libraries it knows about. It builds a list of dynamic libraries that are required, and stores this in the executable. If any symbols still haven't been found, the link process fails.
The third step is dynamic linking, which happens at run time. The dynamic linker loads the dynamic libraries in the list contained in the executable, and replaces the placeholders for the remaining unresolved symbols with their corresponding values from the dynamic libraries. This can either be done "eagerly" - after the executable loads but before it runs - or "lazily", which is on-demand, when an unresolved symbol is first accessed.
The C and C++ standards have nothing to say about run-time loading - this is entirely OS-specific. In the case of Windows, one links the code with an import library (generated when the DLL is created) that contains the names of the functions and the name of the DLL they are in. The linker creates stubs in the code containing this information. At run time, these stubs are used by the C/C++ runtime together with the Windows LoadLibrary() and associated functions to load the function code into memory and execute it.
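The explicit flavour of this - load the DLL and resolve the export yourself - can even be sketched from C# via P/Invoke; user32.dll and MessageBoxA are just familiar examples:

using System;
using System.Runtime.InteropServices;

static class ExplicitLinkDemo
{
    [DllImport("kernel32.dll", SetLastError = true)]
    static extern IntPtr LoadLibrary(string fileName);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern IntPtr GetProcAddress(IntPtr module, string procName);

    delegate int MessageBoxA(IntPtr hwnd, string text, string caption, uint type);

    static void Main()
    {
        // Explicit linking: load the DLL and look up the export at run time,
        // instead of letting the loader patch an import table for us.
        IntPtr module = LoadLibrary("user32.dll");
        IntPtr proc = GetProcAddress(module, "MessageBoxA");
        var mb = (MessageBoxA)Marshal.GetDelegateForFunctionPointer(proc, typeof(MessageBoxA));
        mb(IntPtr.Zero, "Hello from an explicitly loaded export!", "Demo", 0);
    }
}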
By libraries, are you referring to DLLs?
The OS follows certain search patterns to look for the required files (usually starting from the application's local path, then proceeding through the folders specified by the PATH environment variable).
