Faster C# compilation by compiling to "object" files and merging them later? - c#

I'm very new to C# and have more knowledge in the field of C++ and Java. When compiling a C++ or Java project, I am used to compilation being performed for each source file on its own. In C++, an additional step then links all the object files into one library/exe/dll.
I see several advantages in this method, but I cannot find a way to realize it in C# using the mono dmcs compiler. Assume I have two files, each with a single type.
OptionsSet.cs
interface OptionsSet {
//
}
DefaultOptionsSet.cs
class DefaultOptionsSet : OptionsSet {
//
}
I can successfully compile this into a library by invoking
dmcs mylib/OptionsSet.cs mylib/DefaultOptionsSet.cs -target:library -out:mylib.dll
But I do not want to recompile all source files when I have changed a single file! Doing the following:
dmcs mylib/DefaultOptionsSet.cs -target:library -out:mylib/DefaultOptionsSet.dll
yields
mylib\DefaultOptionsSet.cs(15,27): error CS0246: The type or namespace name `OptionsSet' could not be found. Are you missing an assembly reference?
Compilation failed: 1 error(s), 0 warnings
I know I can add assembly references with the -r option, but what if the assembly reference was not yet compiled?
In Java, my Makefile would look like this:
SOURCE_DIRS = mylib
SOURCES = $(foreach dir,$(SOURCE_DIRS),$(wildcard $(dir)/*.java))
CLASSES = $(SOURCES:java=class)
compile: $(CLASSES)
mylib/%.class: mylib/%.java
	javac $< -classpath .
But I cannot directly translate this to build my C# library.

C# does not support what you want to do; there is only a single step from the source files (.cs) to the final assemblies (.dll and .exe).
Keep in mind that a C# compiler is in general a lot faster than a C/C++ compiler for the same amount of source code (among other things because the C# compiler doesn't have to read megabytes of header files for each source file), so compilation speed is generally not an issue.
If you really want to, what you can do is split your work into several assemblies (.dll files), compile each separately, and then just reference those assemblies when building your main executable. However, unless your project is really big, you'll spend more time implementing this split than you'd save from building less once it's done.
I'd recommend not to worry about long compile-times unless you actually have a problem with long compile-times.
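If you do go the multi-assembly route, here is a hedged Makefile sketch of what assembly-level (rather than file-level) rebuild granularity looks like; the app/ directory and myapp.exe are hypothetical names for a consuming executable:
LIB_SOURCES = $(wildcard mylib/*.cs)
APP_SOURCES = $(wildcard app/*.cs)

# The assembly is the unit of rebuild: any change under mylib/
# recompiles mylib.dll in a single dmcs invocation.
mylib.dll: $(LIB_SOURCES)
	dmcs $(LIB_SOURCES) -target:library -out:mylib.dll

# A dependent assembly is rebuilt only when its own sources
# or mylib.dll change, and references mylib.dll with -r:.
myapp.exe: $(APP_SOURCES) mylib.dll
	dmcs $(APP_SOURCES) -r:mylib.dll -out:myapp.exe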


How can we be sure all our internal things will actually be compiled into the same assembly?

I already know that C# provides internal modifier:
internal: The type or member can be accessed by any code in the same assembly, but not from another assembly.
--- Quoted from this official document
And I also understand what an assembly is: basically a DLL or EXE.
Now, how can we be sure all our internal things will actually be compiled into the same assembly? The reason I ask is that I got the impression from here that the build process can be very flexible, and different source files can potentially be compiled into different assemblies:
csc /target:library /out:MathLibraryPart1.DLL Add.cs
csc /target:library /out:MathLibraryPart2.DLL Mult.cs
to the point that we would even need a way to detect whether a file ended up in a particular assembly.
Put another way: suppose I write my library with the internal modifier here and there and commit my code, but the eventual binary output may or may not work as intended, depending on how the source code is built. That doesn't sound right to me. Unless the way my source code is built is (forced to become) an integral part of my project and needs to be checked into my code repo as well. Is that the case?
The build is deterministic; it is the result of running the Visual Studio builder or MSBuild on the solution file and the project files. They define strictly which assemblies are produced; there is no flexibility there. The general rule is "one .csproj file produces one assembly", and one .cs file belongs to one .csproj.
As for you modifying the access of a method or type to internal and then discovering at runtime that something is broken, you can rest assured: the discovery occurs at compile time. Your code won't even compile anymore.
Also, the worry that your binary 'may or may not work' suggests you're missing basic unit tests. Add unit tests, make them part of your build, and then you'll know whether the code works (at least the part that is tested). Integration tests also help. Get started with developer testing tools.
Also, read Code Complete. So many of your questions were answered years ago, and it is sad to see them come back again and again.
Thanks for the hint (and motivation) from @Remus-Rusanu; I think the best way to understand this is to do a hands-on experiment.
//File: foo.cs
namespace Space
{
    public class FooClass
    {
        public static int Foo() { return BarClass.Bar(); }
    }
}
//File: bar.cs
namespace Space
{
    internal class BarClass
    {
        public static int Bar() { return 123; }
    }
}
Now, an attempt to compile these two files into separate assemblies fails, which is correct behavior:
csc /target:library /out:foo.dll foo.cs
error CS0103: The name 'Bar' does not exist in the current context
Compiling them together, you will get the library, and all the internals inside this DLL will work as expected.
csc /target:library /out:foobar.dll foo.cs bar.cs
# It will generate foobar.dll
So this clarifies my previous question. Yes, we can be sure that "all our internal things will actually be compiled into the same assembly", because otherwise the compile attempt would fail.

Compile to Intermediate type

This is a learning exercise for me.
What output type should I compile to, or how can I compile a C# class library to an intermediate file, but not a DLL, which can be used in another project without having the source code and without passing it to the end user?
This is achievable in Delphi/C/C++ as far as I know.
which can be used in another project without having the source code and without passing it to the end user
It sounds to me like you should compile it to a dll, but perhaps consider ILMerge as part of your build/deploy strategy. And frankly there is rarely any good reason not to simply ship the dll without merging.
Note that csc does allow you to output raw modules, via /target:module (presumably then re-combining with /addmodule) - but frankly that will be a real pain to work with.
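For completeness, a minimal sketch of that module route (the file names are hypothetical):
csc /target:module /out:Helpers.netmodule Helpers.cs
csc /target:library /addmodule:Helpers.netmodule /out:MyLib.dll MyLib.cs
Note that /addmodule does not merge the module into the DLL; the .netmodule remains a separate file that has to ship alongside MyLib.dll, which is part of why this approach is painful in practice.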

Include file in C#, Mono (gmcs)

I'm writing a really small application in C# and I need to include a .cs file with some class. How do I do that? I'm looking for some equivalent of PHP's "include file" or Python's "from file import something". Obviously "using" is not enough; the file has to be linked in somehow. And I really don't want to start some new super-huge-totally-useless project in MonoDevelop or Visual Studio, I would like to stay with simple gmcs and the command line.
You simply include both file names on the command line and ensure that the namespaces are the same, or that the namespace of the included file is imported via a using directive or referenced via fully qualified names. The command line for compilation then looks like this:
gmcs mainFile.cs includeFile.cs
Note that the Mono command line is designed to support the exact same syntax (with a few additions) as the Microsoft compiler, so this is true for both of them.
Fundamentally this is what the project files and Visual Studio are doing (albeit going through an in-memory MSBuild equivalent).
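For example, a minimal sketch with hypothetical file contents, where both files share the same namespace so no using directive is needed:
// includeFile.cs
namespace MyApp
{
    public class Helper
    {
        public static string Greet() { return "hello from the included file"; }
    }
}
// mainFile.cs
namespace MyApp
{
    class Program
    {
        static void Main()
        {
            System.Console.WriteLine(Helper.Greet());
        }
    }
}
Compiling with gmcs mainFile.cs includeFile.cs produces a single mainFile.exe containing both classes.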
There are two ways to "include" a file in .NET (and Mono):
1. Compile several files together:
gmcs mainFile.cs includeFile.cs
The files are then compiled together into a single assembly.
2. Compile the includeFile into a separate assembly and reference that from the main assembly:
gmcs -target:library includeFile.cs
gmcs -r:includeFile.dll mainFile.cs
This way you get two assemblies.

Requiring library consumers to reference an additional assembly when using certain types

I have library code that uses ICSharpCode.SharpZipLib under the hood to make it easy to use ZIP files as data sources when running integration tests.
As it stands, if I reference my library from another project, the other project will compile just fine, but when it accesses the code that uses SharpZipLib, I get an exception because the zip library cannot be found:
failed: System.IO.FileNotFoundException : Could not load file or assembly 'ICSharpCode.SharpZipLib, Version=0.85.5.452, Culture=neutral, PublicKeyToken=1b03e6acf1164f73' or one of its dependencies. The system cannot find the file specified.
If the types in my library derived from a class in SharpZipLib, that would generate compile error CS0012. What other ways are there to trigger a CS0012, so that using code that requires SharpZipLib (but doesn't clearly indicate it) would cause consumer code to fail compilation?
I've had similar problems in the past when I've used libraries like DeftTech.DuckTyping under the hood. I'd add my library code to a new project, start working, compile, run, and then suddenly hit an edge case that I'd used duck typing to get around and get a runtime error.
What I'd most like is to have the same behavior as if I'd derived from a type in the 3rd-party library, so that a reference to my derived type generates a CS0012:
The type 'type' is defined in an assembly that is not referenced. You must add a reference to assembly 'assembly'.
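For illustration, here is a minimal sketch of the kind of exposure that does trigger CS0012 in consumer code (ZipDataSource is a hypothetical name; ICSharpCode.SharpZipLib.Zip.ZipFile is the real SharpZipLib type):
// In the library project, which references ICSharpCode.SharpZipLib:
public class ZipDataSource
{
    // Exposing the SharpZipLib type in a public signature means any
    // consumer code that touches this member must also reference
    // ICSharpCode.SharpZipLib, or compilation fails with CS0012.
    public ICSharpCode.SharpZipLib.Zip.ZipFile Archive { get; set; }
}
// In the consuming project (references the library, but not SharpZipLib):
// var source = new ZipDataSource();
// var archive = source.Archive;   // error CS0012 at compile time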
You only get compiler errors if you are DIRECTLY interacting with libraries that aren't referenced.
If you use other libraries that internally use a third-party library, then you will never get a compiler error. The reason is that a compiler error just doesn't make much sense here, because:
It does not affect compiling at all, so why a compiler error?
Your application MIGHT run correctly, because there is no guarantee the third-party library EVER gets called.
It might actually break several libraries that, for example, reference external libraries for debugging but just don't ship them for release.
Edit: If your problem is that you keep forgetting about the third-party library, you can simply reference it directly from your application even if you never use it. Then e.g. Visual Studio will automatically copy it to your output bin folder, include it in setups, and so on.
If you're seeing this while in Visual Studio it's probably because the ICSharpCode.SharpZipLib.dll isn't being copied to the build folder of your "other" project.
So this won't be a problem when you distribute your library for consumption by third parties, because ICSharpCode.SharpZipLib.dll will be in the same folder as your library.
During development and testing though it can be a bit of a hassle. Generally when setting up a multi-project solution I just have all the projects target their Output folder to a single solution-wide Build folder. That way all the dependencies are copied to the same location for testing.
You just have to copy ICSharpCode.SharpZipLib.dll to C:\Windows\assembly and your problem will be solved.

How do C/C++/Objective-C compare with C# when it comes to using libraries?

This question is based on a previous question: How does C# compilation get around needing header files?.
Confirmation that C# compilation makes use of multiple passes essentially answers my original question. Also, the answers indicated that C# uses type and method signature metadata stored in assemblies to check code syntax at compile time.
Q: How do C/C++/Objective-C know what code to load at run time that was linked at compile time? And to tie it into a technology I'm familiar with, how do C# and the CLR do this?
Correct me if I'm wrong, but for C#/CLR, my intuitive understanding is that certain paths are checked for assemblies upon execution, and basically all code is loaded and linked dynamically at run time.
Edit: Updated to include C++ and Objective-C with C.
Update: To clarify, what I'm really curious about is how C/C++/Objective-C compilation matches an "externally defined" symbol in my source with the actual implementation of that code, what the compilation output is, and basically how the compilation output is executed by the microprocessor so that control passes seamlessly into the library code (in terms of the instruction pointer). I have worked through this for the CLR virtual machine, but am curious how it works conceptually in C++/Objective-C on an actual microprocessor.
The linker plays an essential role in C/C++ building to resolve external dependencies. .NET languages don't use a linker.
There are two kinds of external dependencies: those whose implementation is available at link time in another .obj or .lib file offered as input to the linker, and those that are available in another executable module, a DLL on Windows.
The linker resolves the first kind at link time; nothing complicated happens, since the linker knows the address of the dependency. The second kind is highly platform-dependent. On Windows, the linker must be provided with an import library, a pretty simple file that merely declares the name of the DLL and lists the definitions the DLL exports. The linker resolves the dependency by entering a jump in the code and adding a record to the external dependency table that indicates the jump location so that it can be patched at runtime. The loading of the DLL and the setting up of the import table are done at runtime by the Windows loader. This is a bird's-eye view of the process; there are many boring details involved in making it happen as quickly as possible.
In managed code all of this is done at runtime, driven by the JIT compiler. It translates IL into machine code, driven by program execution. Whenever code executes that references another type, the JIT compiler springs into action, loads the type and translates the called method of the type. A side-effect of loading the type is loading the assembly that contains the type, if it wasn't loaded before.
Notable too is the difference for external dependencies that are available at build time. A C/C++ compiler compiles one source file at a time, the dependencies are resolved by the linker. A managed compiler normally takes all source files that create an assembly as input instead of compiling them one at a time. Separate compilation and linking is in fact supported (.netmodule and al.exe) but is not well supported by available tools and thus rarely done. Also, it cannot support features like extension methods and partial classes. Accordingly, a managed compiler needs many more system resources to get the job done. Readily available on modern hardware. The build process for C/C++ was established in an era where those resources were not available.
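As a hedged sketch of that rarely-used separate-compilation path (the file names are hypothetical), modules are produced with csc and then combined into an assembly with al.exe:
csc /target:module /out:part1.netmodule part1.cs
csc /target:module /out:part2.netmodule part2.cs
al /target:library /out:combined.dll part1.netmodule part2.netmodule
Even then the .netmodule files are only referenced by the manifest in combined.dll, not merged into it, so they still have to be deployed alongside it.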
I believe the process you're asking about is the one called symbol resolution. In the common case, it works along these lines (I've tried to keep it pretty OS-neutral):
The first step is the compilation of individual source files to create object files. The source code is turned into machine language instructions, and any symbols (i.e. function or external variable names) that aren't defined in the source file itself result in placeholders being left in the compiled machine language code wherever they are referenced. The unknown symbol is also added to a list in the object file; at the end of compilation, this list contains every unresolved symbol in the object file, cross-referenced with the locations in the object file of all the placeholders that were added. Each object file also contains a list of the symbols exported by that object file (that is, the symbols defined in that object file that it wants to make visible to code outside it) along with the values of those symbols.
The second step is static linking. This also happens at compile-time. During the static linking process, all of the object files created in the first step and any static library files (which are just a special kind of object file) are combined into a single executable. The static linker does a pass through the symbols exported by each object file and static library it has been told to link together, and builds a complete list of the exported symbols (and their values). It then does a pass through the unresolved symbols in each object file, and where the symbol is found in the master list, replaces all of the placeholders with the actual value of the symbol. For any symbols that still remain unresolved at the end of this process, the linker looks through the list of symbols exported by all dynamic libraries it knows about. It builds a list of dynamic libraries that are required, and stores this in the executable. If any symbols still haven't been found, the link process fails.
The third step is dynamic linking, which happens at run time. The dynamic linker loads the dynamic libraries in the list contained in the executable, and replaces the placeholders for the remaining unresolved symbols with their corresponding values from the dynamic libraries. This can either be done "eagerly" - after the executable loads but before it runs - or "lazily", which is on-demand, when an unresolved symbol is first accessed.
The C and C++ Standards have nothing to say about run-time loading - this is entirely OS-specific. In the case of Windows, one links the code with an export library (generated when a DLL is created) that contains the names of functions and the name of the DLL they are in. The linker creates stubs in the code containing this information. At run-time, these stubs are used by the C/C++ runtime together with the Windows LoadLibrary() and associated functions to load the function code into memory and execute it.
By libraries you are referring to DLLs, right?
The OS follows certain search patterns to look for the required files (usually starting from the application's local path, then proceeding to the folders specified by the PATH environment variable).
