I noticed that sometimes a .NET 4.0 C# application takes a long time to start, without any apparent reason. How can I determine what's actually happening and which modules are loaded? I'm using a number of external assemblies. Can putting them into the GAC improve performance?
Is .NET 4 slower than .NET 2?
.NET programs have two distinct start-up behaviors. They are called cold-start and warm-start. The cold-start is the slow one, you'll get it when no .NET program was started before. Or when the program you start is large and was never run before. The operating system has to find the assembly files on disk, they won't be available in the file system cache (RAM). That takes a while, hard disks are slow and there are a lot of files to find. A small do-nothing Winforms app has to load 51 DLLs to get started. A do-nothing WPF app weighs in at 77 DLLs.
You get a warm start when the assembly files were loaded before, not too long ago. The assembly file data now comes from RAM instead of the slow disk, that's zippedy-doodah. The only startup overhead is now the jitter.
There's little you can do about cold starts, the assemblies have to come off the disk one way or another. A fast disk makes a big difference; SSDs are especially effective. Using ngen.exe to pre-jit an assembly actually makes the problem worse, it creates another file that needs to be found and loaded. Which is the reason that Microsoft recommends not pre-jitting small assemblies. Seeing this problem with .NET 4 programs is also to be expected right now: there aren't many programs yet that bind to the version 4 CLR and framework assemblies, so they are less likely to already be in the cache. This solves itself over time.
There's another way this problem automatically disappears. The Windows SuperFetch feature will start to notice that you often load the CLR and the jitted Framework assemblies and will start to pre-load them into RAM automatically. The same kind of trick that the Microsoft Office and Adobe Reader 'optimizers' use. They are also programs that have a lot of DLL dependencies. Unmanaged ones, the problem isn't specific to .NET. These optimizers are crude, they preload the DLLs when you login. Which is the 'I'm really important, screw everything else' approach to working around the problem, make sure you disable them so they don't crowd out the RAM space that SuperFetch could use.
The startup time is most likely due to the runtime JIT compiling assembly IL into machine code for execution. It can also be affected by the debugger - as another answerer has suggested.
Excluding that - I'll talk about an application run 'in the wild' on a user's machine, with no debugger etc.
The JIT compiler in .Net 4 is, I think it's fair to say, better than in .Net 2 - so no; it's not slower.
You can improve this startup time significantly by running ngen on your application's assemblies - this pre-compiles the EXEs and DLLs into native images. However you lose some flexibility by doing this and, in general, there is not much point.
You should see the startup time of some MFC apps written in C++ - all native code, and yet depending on how they are linked they can take just as long.
It does, of course, also depend on what an application is actually doing at startup!
I don't think putting your assemblies in the GAC will boost performance.
If possible, add logging around each statement in your Load or Initialize events. That will help you identify which statement is actually taking time, and from there which library is slow to load.
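As a minimal sketch of that kind of timing log, assuming a WinForms form with a Load handler; LoadSettings and ConnectDatabase are hypothetical placeholders for your own startup code:

    using System;
    using System.Diagnostics;
    using System.Windows.Forms;

    public class MainForm : Form
    {
        public MainForm()
        {
            Load += MainForm_Load;
        }

        private void MainForm_Load(object sender, EventArgs e)
        {
            var sw = Stopwatch.StartNew();

            LoadSettings();
            Trace.WriteLine("LoadSettings: " + sw.ElapsedMilliseconds + " ms");
            sw.Restart();

            ConnectDatabase();
            Trace.WriteLine("ConnectDatabase: " + sw.ElapsedMilliseconds + " ms");
        }

        // Placeholders for whatever your form really does at startup.
        private void LoadSettings() { }
        private void ConnectDatabase() { }
    }

Whichever step dominates the trace output is the one worth investigating further.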
Related
A .NET program is first compiled into MSIL code. When it is executed, the JIT compiler will compile it into native machine code.
I am wondering:
Where is this JIT-compiled machine code stored? Is it only stored in the address space of the process? Since the second startup of the program is much faster than the first, I would think this native code must be stored on disk somewhere even after the execution has finished. But where?
Memory. It can be cached, that's the job of ngen.exe. It generates a .ni.dll version of the assembly, containing machine code, and stores it in the GAC. That version automatically gets loaded afterwards, bypassing the JIT step.
But that has little to do with why your program starts faster the 2nd time. The 1st time you have a so-called "cold start". Which is completely dominated by the time spent on finding the DLLs on the hard drive. The second time you've got a warm start, the DLLs are already available in the file system cache.
Disks are slow. An SSD is an obvious fix.
Fwiw: this is not a problem that's exclusive to managed code. Large unmanaged programs with lots of DLLs have it too. Two canonical examples, present on most dev machines, are Microsoft Office and Acrobat Reader. They cheat. When installed, they put an "optimizer" in the Run registry key or the Startup folder. All these optimizers do is load all the DLLs that the main program uses, then exit. This primes the file system cache; when the user subsequently starts the program, it gets a warm start and comes up quickly.
Personally, I find this extraordinarily annoying, because what they really do is slow down any other program that I may want to start after logging in. Which is rarely Office or Acrobat. I make it a point to delete these optimizers, repeatedly if necessary when a blasted update puts them back.
You can use this trick too, but use it responsibly please.
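If you do want to try it, a minimal sketch of such a preloader might look like the following; it is a hypothetical console app that simply reads your application's DLLs, which is enough to prime the file system cache, and then exits. The folder path is a placeholder:

    using System;
    using System.IO;

    class Preloader
    {
        static void Main()
        {
            // Hypothetical folder: the install directory of the application to warm up.
            string appDir = @"C:\Program Files\MyApp";

            foreach (string file in Directory.GetFiles(appDir, "*.dll"))
            {
                // Reading the file data is enough to pull it into the file system cache;
                // there is no need to actually load the assembly into the CLR.
                File.ReadAllBytes(file);
            }
            // Exit immediately; the only goal was to warm the cache.
        }
    }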
As others have pointed out, code is JIT'd on a per process basis in your case, and is not cached - the speed-up you are seeing on second load is OS disk caching (i.e. in-memory) of the assemblies.
However, whilst there is no caching (apart from OS disk caching) in the desktop\server version of the framework, there is caching of JIT'd machine code in another version of the framework.
Of interest is what is happening in the .NET Compact Framework (NETCF for the Windows Phone 7 release). Recent advances see sharing of some JIT'd framework code between processes, where the JIT'd code is indeed cached. This has been done primarily for better performance (load time and memory usage) on constrained devices such as mobile phones.
So in answer to the question there is no direct framework caching of JIT'd code in the desktop\server version of the CLR, but there will be in the latest version of the compact framework i.e. NETCF.
Reference: We Believe in Sharing
JIT-compiled machine code is cached in memory per method, the first time each method is executed. I don't think it is ever cached to disk.
You may find that the process is faster to load the second time because Windows cached (in memory) the files used by your process (DLLs, resources, etc.) on the first run. On the second run there is no need to go to disk, where this may have been needed on the first run.
You could confirm this by running NGen.exe to actually pre-compile the machine code for your architecture, and compare the performance of the first and second runs. My bet is that the second run would still be faster, due to caching in the OS.
In short, the IL is JIT-compiled for each invocation of the program and is maintained in code pages of the process address space. See Chapter 1 of Richter for great coverage of the .NET execution model.
I believe that the JIT-compiled code is never stored or swapped out of memory. The performance boost you perceive on a second execution of an assembly is due to dependent assemblies already being in memory or the disk cache.
Yes, NGEN.EXE will place a JIT-compiled version of a .NET executable in the GAC, even when the MSIL version is not there. I have tried that, but to no avail. I believe that unless the original MSIL version is also in the GAC and would be loaded from there, the JIT version in the GAC will not be used.
I also believe that on-the-fly JIT compiles (not NGEN) are never cached; they occupy process memory only. I believe this from reading the MS doc and from various experiments. I would welcome either a confirmation or rebuttal of my assertions from those "who know".
It is well known that
If compiling takes even 15 seconds, programmers will get bored while the compiler runs and switch over to reading The Onion, which will suck them in and kill hours of productivity.
Our MonoTouch app takes 40 seconds to compile on a MacBook Air in the Debug/Simulator configuration.
We have about 10 assemblies in the solution.
We're also linking against some native libraries with gcc_flags.
I'm sure there are ways to optimize compilation time that I'm not aware of, which might have to do with references, linker, whatever.
I'm asking this question in hope that someone with better knowledge than me will compile (no pun intended) a list of tips and things to check to reduce MonoTouch compilation time for debug builds.
Please don't suggest hardware optimizations or optimizations not directly related to MonoTouch.
Build Time Improvements in Xamarin.iOS 6.4
Xamarin.iOS 6.4 has significant build time improvements, and there is now an option to only send updated bits of code to the device. See for yourself:
(Chart comparing build times; source: xamarin.com)
Read more and learn how to enable incremental build in Rolf's post.
Evolve 2013 Video
An updated and expanded version of this content can be seen in the video of the Advanced iOS Build mechanics talk I gave at Evolve 2013.
Original Answer
There are several factors affecting build speed. However most of them have more impact on device builds, including the use of the managed linker that you mentioned.
Managed Linker
For device builds, Link all is the fastest, followed by Link SDK and (at the very end) Don't link. The reason is that the linker can eliminate code faster than the AOT compiler can build it (a net gain). Also, the smaller .app will upload faster to your device.
For the simulator, Don't link is always faster because there is no AOT step (the JIT is used). You should not use the other linking options unless you want to test them (it's still faster than doing a device build).
Device tricks
Building a single architecture (e.g. ARMv7) is faster than a FAT binary (e.g. ARMv7 + ARMv7s). A smaller application also means less time to upload to the device;
The default AOT compiler (mono) is a lot faster than using the LLVM compilers. However, the latter will generate better code and also supports ARMv7s and Thumb2;
If you have large assets bundled in your .app then it will take time to deploy/upload them (every time, since they must be signed) with your app. I wrote a blog post on how you can work around this - it can save a lot of time if you have large assets;
Object file caching was implemented in MonoTouch 5.4. Some builds will be a lot faster; others (when the cache must be purged) won't be, but they are never slower ;-). More information on why this often happens is available here.
Debug builds take longer because of symbols, running dsymutil and, since the .app ends up being larger, the extra time to upload to the device.
Release builds will, by default (you can turn it off), do an IL strip of the assemblies. That takes only a bit of time - likely gained back when deploying the (smaller) .app to the device.
Simulator tricks
As said earlier, try to avoid linking since it takes more time and requires copying the assemblies (instead of symlinking them);
Using native libraries is slower because we cannot reuse the shared simlauncher main executable in such cases and need to ask gcc to compile one for the application (and that's slow).
Finally, whenever in doubt, time it! By that I mean you can add --time --time to your project's extra mtouch arguments to see a timestamp after each operation :-)
This is not really meant as an answer, rather a temporary placeholder until there is a better one.
I found this quote by Seb:
Look at your project's build options and make sure the "Linker behavior" is at the default "Link SDK assemblies".
If it's showing "Don't link" then you'll experience very long build times (a large part of it in dsymutil).
I don't know if it is still relevant though, because MonoDevelop shows a warning sign when I choose this option, and it doesn't seem to affect performance much.
You cannot expect your compiler to be lightning quick without understanding everything that it is required to do. Larger applications will naturally take longer. Different languages, or different compilers for the same language, can make a huge difference in how long it takes to compile your code.
We have a project that will take almost 2 minutes to compile. Your best solution is to figure out a way to reduce the number of times you compile your code.
Instead of trying to fix one line of code and rebuilding over and over again, get a group of people together to discuss the problem, or create a list of 3 or 4 things you want to work on, complete them all, then test.
These are just some suggestions and they will not work in all cases.
It seems that when I re-run my .NET application, it starts much faster than before. Why?
Also, is there any way to make my software start up faster?
Regards
If it's the first .NET application running in your system, then the first time you run it, all the .NET libraries and the CLR have to be loaded from physical disk. The second time you run, everything will be in the file system cache, so it'll be loading it from memory. There may well be other caching effects in play beyond the file system cache, but that's the most obvious one.
The same is true of your specific application, although that's likely to be a lot smaller than the framework itself.
One option to try to bootstrap this is to have a small no-op application (e.g. a WinForms app that never actually launches a window) which runs on startup. Of course, this will slow down the rest of your startup a bit - and if the computer doesn't run any .NET applications for a long time, the framework will be ejected from the cache eventually.
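A minimal sketch of such a no-op bootstrapper, assuming the only goal is to pull the CLR and the WinForms assemblies into the file system cache (nothing here is tied to any particular application):

    using System;
    using System.Windows.Forms;

    static class WarmUp
    {
        [STAThread]
        static void Main()
        {
            // Touching a WinForms type forces System.Windows.Forms.dll, System.Drawing.dll
            // and their dependencies to be loaded, which primes the file system cache.
            using (var form = new Form())
            {
                // Deliberately never call Application.Run(form); no window should appear.
            }
            // Exit immediately; the framework assemblies now sit in the OS cache.
        }
    }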
The first time you run your .NET app, the following happens:
1) Loading of your application, the runtime, and the framework from the hard disk (which is slow) into memory (which is much faster).
2) Then your application and the associated libraries are just-in-time (JIT) compiled to native code, as needed. This native code stays around in memory, and the runtime keeps a record of the code it has already compiled to native code.
3) Only in the third step does this native code actually get executed by the processor.
If you don't shut down your computer and re-run your application, the following happens:
1) When the runtime encounters managed code that has already been compiled to native code by the JIT compiler, it does not recompile it. It simply executes the already compiled native code in memory.
2) Only the code that was not JIT compiled to native in the first run is now compiled from managed to native, and only if needed.
So on a second run of your application two things get really fast:
1) Loading either doesn't happen at all or is far smaller than the first time.
2) Compilation from managed to native either doesn't happen or is minimal.
That's why your second run of the application is almost always faster than the first run.
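If you want to see the per-method JIT cost for yourself, a small timing sketch like the following works; the exact numbers will vary, the point is only that the first call of a method includes JIT compilation and later calls run the already-compiled native code:

    using System;
    using System.Diagnostics;

    class JitDemo
    {
        static void Main()
        {
            var sw = Stopwatch.StartNew();
            Work();                        // first call: includes JIT compilation of Work()
            Console.WriteLine("First call:  " + sw.Elapsed.TotalMilliseconds + " ms");

            sw.Restart();
            Work();                        // second call: runs the cached native code
            Console.WriteLine("Second call: " + sw.Elapsed.TotalMilliseconds + " ms");
        }

        static void Work()
        {
            // A trivial method body; larger methods make the JIT difference easier to see.
            int sum = 0;
            for (int i = 0; i < 1000; i++) sum += i;
        }
    }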
This is almost certainly because the OS has loaded needed DLLs which stay in memory (unless the memory is needed elsewhere) after your application exits.
You can run your program in a special mode (one that just loads and exits) so that those DLLs get loaded; this is a trick used by a few applications (MS Office and OpenOffice.org are two that spring to mind immediately).
Some people will run their programs at startup to make their first invocation seem faster but it's my opinion that this should be left to the user. It is their machine after all. By all means show them how they can do it (e.g., add yourprogram.exe /loadandexit to your startup folder) but leave it up to them.
I, for one, don't want every application I run slowing down my boot time.
Has anybody here ever used ngen? Where? Why? Was there any performance improvement? When and where does it make sense to use it?
I don't use it day-to-day, but it is used by tools that want to boost performance; for example, Paint.NET uses NGEN during the installer (or maybe first use). It is possible (although I don't know for sure) that some of the MS tools do, too.
Basically, NGEN performs much of the JIT for an assembly up front, so that there is very little delay on a cold start. Of course, in most typical usage, not 100% of the code is ever reached, so in some ways this does a lot of unnecessary work - but it can't tell that ahead of time.
The downside, IMO, is that you need to use the GAC to use NGEN; I try to avoid the GAC as much as possible, so that I can use robocopy-deployment (to servers) and ClickOnce (to clients).
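As an illustration of the installer approach mentioned above, here is a hedged sketch of invoking ngen.exe from managed code, e.g. as a post-install step. The path handling and error checking are simplified, and MyApp.exe is a placeholder for your own assembly:

    using System.Diagnostics;
    using System.IO;
    using System.Runtime.InteropServices;

    class NgenInstallStep
    {
        static void Main()
        {
            // ngen.exe lives in the runtime directory of the CLR the application targets.
            string ngen = Path.Combine(RuntimeEnvironment.GetRuntimeDirectory(), "ngen.exe");

            // "MyApp.exe" is a placeholder for your main assembly; ngen follows its
            // references, so dependencies are usually picked up as well.
            var psi = new ProcessStartInfo(ngen, "install MyApp.exe")
            {
                UseShellExecute = false,
                CreateNoWindow = true
            };

            using (var p = Process.Start(psi))
            {
                p.WaitForExit();   // requires administrative rights to succeed
            }
        }
    }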
Yes, I've seen performance improvements. My measurements indicated that it did improve startup performance if I also put my assemblies into the GAC, since my assemblies are all strong named. If your assemblies are strong named, NGen won't make any difference without using the GAC. The reason for this is that if you have strong named assemblies that are not in the GAC, the .NET runtime validates that your strong named assembly hasn't been tampered with by loading the whole managed assembly from disk so it can validate it, circumventing one of the major benefits of NGen.
This wasn't a very good option for my application since we rely on common assemblies from our company (that are also strong named). The common assemblies are used by many products that use many different versions; putting them in the GAC meant that if one of our applications didn't say "use specific version" of one of the common assemblies, it would load the GAC version regardless of what version was in its executing directory. We decided that the benefits of NGen weren't worth the risks.
Ngen mainly reduces the start-up time of a .NET app and the application's working set. But it has some disadvantages (from CLR via C# by Jeffrey Richter):
No Intellectual Property Protection
NGen'd files can get out of sync
Inferior Load-Time Performance (Rebasing/Binding)
Inferior Execution-Time Performance
Due to all of the issues just listed, you should be very cautious when considering the use of NGen.exe. For server-side applications, NGen.exe makes little or no sense because only the first client request experiences a performance hit; future client requests run at high speed. In addition, for most server applications, only one instance of the code is required, so there is no working set benefit.
For client applications, NGen.exe might make sense to improve startup time or to reduce working set if an assembly is used by multiple applications simultaneously. Even in a case in which an assembly is not used by multiple applications, NGen'ing an assembly could improve working set. Moreover, if NGen.exe is used for all of a client application's assemblies, the CLR will not need to load the JIT compiler at all, reducing working set even further. Of course, if just one assembly isn't NGen'd or if an assembly's NGen'd file can't be used, the JIT compiler will load, and the application's working set increases.
ngen is mostly known for improving startup time (by eliminating JIT compilation). It might improve (by reducing JIT time) or decrease overall performance of the application (since some JIT optimizations won't be available).
.NET Framework itself uses ngen for many assemblies upon installation.
I have used it, but just for research purposes. Use it ONLY if you are sure about the CPU architecture of your deployment environment (i.e. it won't change).
But let me tell you, JIT compilation is not too bad, and if you deploy across multiple CPU environments (for example a Windows client application which is updated often) then DO NOT USE NGEN. That's because a valid ngen cache depends on many attributes; if one of these fails, your assembly falls back to the JIT again.
JIT is a clear winner in such cases, as it optimizes code on the fly based on the CPU architecture it's running on (for example, it can detect whether there is more than one CPU).
And the CLR is getting better with every release, so in short: stick with JIT unless you are dead sure of your deployment environment. Even then, your performance gains would hardly justify using ngen.exe (the gains would probably be a few hundred ms). IMHO, it's not worth the effort.
Also check this really nice link on the topic: JIT Compilation and Performance - To NGen or Not to NGen?
Yes. I used it on a WPF application to speed up startup time. Startup time went from 9 seconds to 5 seconds. Read about it in my blog:
I recently discovered how great NGEN can be for performance. The application I currently work on has a data access layer (DAL) that is generated. The database schema is quite large, and we also generate some of the data (lists of values) directly into the DAL. Result: many classes with many fields, and many methods. JIT overhead often showed up when profiling the application, but after a search on JIT compiling and NGEN I thought it wasn't worth it. Install-time overhead, with management my major concern, made me ignore the signs and focus on adding more functionality to the application instead. When we changed architecture to "Any CPU" running on 64-bit machines things got worse: we experienced hangs in our application of up to 10 seconds on a single statement, with the profiler showing only JIT overhead in the problem area. NGEN solved the problem: the statement went from 10 seconds to 1 millisecond. This statement was not part of the startup procedure, so I was eager to find out what NGEN'ing the whole application could do to the startup time. It went from 8 seconds to 3.5 seconds.
Conclusion: I really recommend giving NGEN a try on your application!
As an addition to Mehrdad Afshari's comment about JIT compilation: if you serialize a class with many properties via the XmlSerializer on a 64-bit system, an SGEN + NGEN combo has a potentially huge effect (in our case gigabytes and minutes).
More info here:
XmlSerializer startup HUGE performance loss on 64bit systems - see Nick Martyshchenko's answer especially.
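For reference, a small sketch of where that cost shows up. The first construction of an XmlSerializer for a type triggers code generation and compilation at runtime; pre-generating the serialization assembly with sgen.exe (and optionally NGen'ing it) avoids that. The Order type here is purely illustrative:

    using System;
    using System.Diagnostics;
    using System.Xml.Serialization;

    public class Order          // placeholder type; imagine dozens of properties
    {
        public int Id { get; set; }
        public string Customer { get; set; }
    }

    class SerializerCost
    {
        static void Main()
        {
            var sw = Stopwatch.StartNew();
            new XmlSerializer(typeof(Order));    // first use: generates and compiles a serialization assembly
            Console.WriteLine("First serializer:  " + sw.ElapsedMilliseconds + " ms");

            sw.Restart();
            new XmlSerializer(typeof(Order));    // second use: reuses the cached generated assembly
            Console.WriteLine("Second serializer: " + sw.ElapsedMilliseconds + " ms");
        }
    }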
Yes, I tried it with a small single CPU-intensive exe and with ngen it was slightly slower!
I installed and uninstalled the ngen image multiple times and ran a benchmark.
I always got the following times, reproducible to +/- 0.1 s: 33.9 s without ngen, 35.3 s with ngen.