Has anybody here ever used NGen? Where, and why? Was there any performance improvement? When and where does it make sense to use it?
I don't use it day-to-day, but it is used by tools that want to boost performance; for example, Paint.NET uses NGEN during installation (or maybe on first use). It is possible (although I don't know for sure) that some of the MS tools do, too.
Basically, NGEN performs much of the JIT work for an assembly up front, so that there is very little delay on a cold start. Of course, in typical usage not 100% of the code is ever reached, so in some ways this does a lot of unnecessary work, but it can't tell that ahead of time.
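To make the "unnecessary work" trade-off concrete, here is a toy model in Python (not .NET, and not how NGen works internally): "compiling" a method just records its name. Eager precompilation, in the spirit of NGen, compiles all ten hypothetical methods, while lazy JIT-style compilation only compiles the three that a given run actually reaches.

```python
from typing import Callable, Dict, Iterable, Set


class ToyRuntime:
    """Toy model of eager (NGen-style) vs lazy (JIT-style) compilation.

    'Compiling' a method here only records its name; no real code
    generation or .NET API is involved.
    """

    def __init__(self, methods: Dict[str, Callable[[], None]]) -> None:
        self.methods = methods
        self.compiled: Set[str] = set()

    def precompile_all(self) -> None:
        # Eager: compile every method up front, whether or not it is ever called.
        self.compiled.update(self.methods)

    def call(self, name: str) -> None:
        # Lazy: compile a method the first time it is called, then reuse it.
        if name not in self.compiled:
            self.compiled.add(name)
        self.methods[name]()


def compiled_count(eager: bool, calls: Iterable[str]) -> int:
    """Return how many of 10 hypothetical methods end up compiled."""
    methods = {f"m{i}": (lambda: None) for i in range(10)}
    runtime = ToyRuntime(methods)
    if eager:
        runtime.precompile_all()
    for name in calls:
        runtime.call(name)
    return len(runtime.compiled)


run = ["m0", "m1", "m0", "m2"]  # only 3 distinct methods are reached
print(compiled_count(eager=True, calls=run))   # 10
print(compiled_count(eager=False, calls=run))  # 3
```

The eager version pays for seven methods this run never touches; in exchange, none of the reached methods has to wait for compilation when it is first called.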
The downside, IMO, is that you need to use the GAC to use NGEN; I try to avoid the GAC as much as possible, so that I can use robocopy-deployment (to servers) and ClickOnce (to clients).
Yes, I've seen performance improvements. My measurements indicated that NGen did improve startup performance, but only once I also put my assemblies into the GAC, since they are all strong-named. If your assemblies are strong-named, NGen won't make any difference unless they are in the GAC. The reason is that for strong-named assemblies outside the GAC, the .NET runtime verifies that the assembly hasn't been tampered with by loading the whole managed assembly from disk, which circumvents one of the major benefits of NGen.
This wasn't a very good option for my application, since we rely on common assemblies from our company (which are also strong-named). These common assemblies are used by many products, in many different versions. Putting them in the GAC meant that if one of our applications didn't specify "use specific version" for one of the common assemblies, it would load the GAC version regardless of which version was in its executing directory. We decided that the benefits of NGen weren't worth the risks.
NGen mainly reduces the start-up time and working set of .NET applications. But it has some disadvantages (from Jeffrey Richter's CLR via C#):
No Intellectual Property Protection
NGen'd files can get out of sync
Inferior Load-Time Performance (Rebasing/Binding)
Inferior Execution-Time Performance
Due to all of the issues just listed, you should be very cautious when considering the use of NGen.exe. For server-side applications, NGen.exe makes little or no sense because only the first client request experiences a performance hit; future client requests run at high speed. In addition, for most server applications, only one instance of the code is required, so there is no working set benefit.
For client applications, NGen.exe might make sense to improve startup time or to reduce working set if an assembly is used by multiple applications simultaneously. Even in a case in which an assembly is not used by multiple applications, NGen'ing an assembly could improve working set. Moreover, if NGen.exe is used for all of a client application's assemblies, the CLR will not need to load the JIT compiler at all, reducing working set even further. Of course, if just one assembly isn't NGen'd or if an assembly's NGen'd file can't be used, the JIT compiler will load, and the application's working set increases.
NGen is mostly known for improving startup time (by eliminating JIT compilation). It might improve overall application performance (by reducing JIT time) or degrade it (since some JIT optimizations won't be available).
.NET Framework itself uses ngen for many assemblies upon installation.
I have used it, but just for research purposes. Use it ONLY if you are sure about the CPU architecture of your deployment environment (and that it won't change).
But let me tell you, JIT compilation is not too bad, and if you have deployments across multiple CPU environments (for example, a Windows client application that is updated often), THEN DO NOT USE NGEN. That's because a valid NGen cache depends on many attributes; if any one of them fails, your assembly falls back to the JIT again.
The JIT is a clear winner in such cases, as it optimizes code on the fly for the CPU architecture it's running on (for example, it can detect whether there is more than one CPU).
And the CLR is getting better with every release. So, in short: stick with the JIT unless you are dead sure of your deployment environment. Even then, your performance gains would hardly justify using ngen.exe (the gains would probably be a few hundred ms). IMHO, it's not worth the effort.
Also check out this really nice article on the topic: JIT Compilation and Performance - To NGen or Not to NGen?
Yes. I used it on a WPF application to speed up startup time; startup went from 9 seconds to 5 seconds. I wrote about it on my blog:
I recently discovered how great NGEN can be for performance. The application I currently work on has a data access layer (DAL) that is generated. The database schema is quite large, and we also generate some of the data (list of values) directly into the DAL. Result: many classes with many fields, and many methods. JIT overhead often showed up when profiling the application, but after a search on JIT compiling and NGEN I thought it wasn’t worth it. Install-time overhead, with management being my major concern, made me ignore the signs and focus on adding more functionality to the application instead. When we changed architecture to “Any CPU” running on 64-bit machines, things got worse: we experienced hangs in our application of up to 10 seconds on a single statement, with the profiler showing only JIT overhead in the problem area. NGEN solved the problem: the statement went from 10 seconds to 1 millisecond. This statement was not part of the startup procedure, so I was eager to find out what NGEN’ing the whole application could do to the startup time. It went from 8 seconds to 3.5 seconds.
Conclusion: I really recommend giving NGEN a try on your application!
To add to Mehrdad Afshari's comment about JIT compilation: if you serialize a class with many properties via the XmlSerializer on a 64-bit system, an SGEN plus NGEN combination can have a potentially huge effect (in our case, gigabytes and minutes).
More info here:
XmlSerializer startup HUGE performance loss on 64bit systems (see especially Nick Martyshchenko's answer).
Yes, I tried it with a small single CPU-intensive exe and with ngen it was slightly slower!
I installed and uninstalled the ngen image multiple times and ran a benchmark.
I consistently got the following times, reproducible to ±0.1 s:
33.9 s without NGen,
35.3 s with NGen.
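For scale, the two timings above work out to roughly a 4% slowdown. A small Python sketch of the arithmetic (the helper name is hypothetical; the numbers are the ones reported):

```python
def relative_change_pct(baseline_s: float, candidate_s: float) -> float:
    """Percentage change of candidate vs. baseline (positive means slower)."""
    return (candidate_s - baseline_s) / baseline_s * 100.0


without_ngen = 33.9  # seconds, as measured above
with_ngen = 35.3     # seconds, as measured above
spread = 0.1         # reported reproducibility, +/- seconds

print(f"{relative_change_pct(without_ngen, with_ngen):.1f}% slower with NGen")  # 4.1% slower with NGen
# The 1.4 s gap is far larger than the +/-0.1 s run-to-run spread.
print(with_ngen - without_ngen > 2 * spread)  # True
```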
A .NET program is first compiled into MSIL code. When it is executed, the JIT compiler will compile it into native machine code.
I am wondering:
Where is this JIT-compiled machine code stored? Is it only stored in the address space of the process? But since the second startup of the program is much faster than the first, I think this native code must have been stored on disk somewhere even after the execution has finished. But where?
In memory. It can be cached; that's the job of ngen.exe. It generates a .ni.dll version of the assembly containing machine code, stored in the GAC, which automatically gets loaded afterward, bypassing the JIT step.
But that has little to do with why your program starts faster the 2nd time. The 1st time you have a so-called "cold start", which is completely dominated by the time spent finding the DLLs on the hard drive. The second time you've got a warm start: the DLLs are already available in the file system cache.
Disks are slow. An SSD is an obvious fix.
FWIW: this is not a problem that's exclusive to managed code. Large unmanaged programs with lots of DLLs have it too. Two canonical examples, present on most dev machines, are Microsoft Office and Acrobat Reader. They cheat: when installed, they put an "optimizer" in the Run registry key or the Startup folder. All these optimizers do is load all the DLLs that the main program uses, then exit. This primes the file system cache, so when the user subsequently starts the program, it gets a fast warm start.
Personally, I find this extraordinarily annoying, because what they really do is slow down any other program that I may want to start after logging in, which is rarely Office or Acrobat. I make it a point to delete these optimizers, repeatedly if necessary when a blasted update puts them back.
You can use this trick too, but use it responsibly please.
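A minimal sketch of such an "optimizer", assuming Python is available on the machine: it simply reads every .dll and .exe under a directory once and throws the bytes away, which normally leaves the file data in the OS file system cache. The function name and the directory you pass to it are hypothetical, and the OS is free to evict those pages again, so a warm start afterwards is likely but not guaranteed.

```python
import os
from typing import Iterable


def prime_file_cache(directory: str,
                     extensions: Iterable[str] = (".dll", ".exe"),
                     chunk_size: int = 1 << 20) -> int:
    """Read every matching file under `directory` once and discard the data.

    Reading a file normally leaves its pages in the OS file system cache,
    so a program started shortly afterwards is more likely to get a warm
    start. Returns the total number of bytes read.
    """
    exts = tuple(e.lower() for e in extensions)
    total = 0
    for root, _dirs, files in os.walk(directory):
        for name in files:
            if not name.lower().endswith(exts):
                continue
            path = os.path.join(root, name)
            try:
                with open(path, "rb") as f:
                    # Read in chunks so large files don't need to fit in memory.
                    while True:
                        chunk = f.read(chunk_size)
                        if not chunk:
                            break
                        total += len(chunk)
            except OSError:
                # Skip files we cannot open (locked, no permission, etc.).
                continue
    return total
```

Running it on demand, for example just before launching the program, keeps the cost where it belongs instead of charging it to every login.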
As others have pointed out, code is JIT'd on a per process basis in your case, and is not cached - the speed-up you are seeing on second load is OS disk caching (i.e. in-memory) of the assemblies.
However, while there is no caching (apart from OS disk caching) in the desktop/server version of the framework, there is caching of JIT'd machine code in another version of the framework.
Of interest is what is happening in the .NET Compact Framework (NETCF, for the Windows Phone 7 release). Recent advances share some JIT'd framework code between processes, where the JIT'd code is indeed cached. This has been done primarily for better performance (load time and memory usage) on constrained devices such as mobile phones.
So, in answer to the question: there is no direct framework caching of JIT'd code in the desktop/server version of the CLR, but there will be in the latest version of the Compact Framework, i.e. NETCF.
Reference: We Believe in Sharing
JIT compiled machine code is cached in memory per-method, each time that a method is executed for the first time. I don't think it is ever cached to disk.
You may find that the process is faster to load the second time because Windows cached (in memory) the files used by your process (DLLs, resources, etc.) on the first run. On the second run there is no need to go to disk, as there may have been on the first run.
You could confirm this by running NGen.exe to actually pre-compile the machine code for your architecture, and compare the performance of the first and second runs. My bet is that the second run would still be faster, due to caching in the OS.
In short, the IL is JIT-compiled for each invocation of the program and is maintained in code pages of the process address space. See Chapter 1 of Richter for great coverage of the .NET execution model.
I believe that JIT-compiled code is never stored or swapped out of memory. The performance boost you perceive on a second execution of an assembly is due to dependent assemblies already being in memory or in the disk cache.
Yes, NGEN.EXE will place a pre-JIT-compiled (native) version of a .NET executable in the GAC, even when the MSIL version is not there. I have tried that, but to no avail. I believe that unless the original MSIL version is also in the GAC and would be loaded from there, the native version in the GAC will not be used.
I also believe that on-the-fly JIT compiles (not NGEN) are never cached; they occupy process memory only.
I believe this from reading the MS docs and from various experiments. I would welcome either a confirmation or a rebuttal of my assertions from those "who know".
As we know, .NET uses the CLI and a JIT to execute programs, but these two stages may lower speed and performance compared with C++, which compiles all code in one stage. I want to know how .NET languages overcome this disadvantage and deal with it.
Having worked on both C++ compilers and now having spent the past few years working on the .Net JIT, I think there are a few things worth considering:
As many others have pointed out, the JIT is running in process with your app, and it tries to carefully balance quick JIT times versus the quality of jitted code. The more elaborate optimizations seen in C++ often come with very high compile time price tags, and there are some pretty sharp knees in the compile-time-vs-code-quality graph.
Prejitting seemingly can change this equation somewhat as the jit runs beforehand and could take more time, but prejitting's ability to enlarge optimization scope is quite limited (for instance we try and avoid introducing fragile cross-assembly dependencies, and so for example won't inline across assembly boundaries). So prejitted code tends to run somewhat more slowly than jitted code, and mainly helps application startup times.
.Net's default execution model precludes many interprocedural optimizations, because of dynamic class loading, reflection, and the ability of a profiler to update method bodies in a running process. We think, by and large, that the productivity and app architecture gains from these features are worth the trouble. But for cases where these features are not needed we are looking for ways to ensure that if your app doesn't need it, your app won't pay for it.
For example, we have some "pure" AOT work going on over in CoreRT, but as a consequence reflection is limited.
.Net Core 2.1 includes a preview of Tiered jitting, which will allow us to ease some of the constraints on jit time -- we'll be able to invest more time jitting methods that we know are frequently executed. So I would expect to see more sophisticated optimizations get added to the JIT over time.
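The tiering idea can be sketched as a toy in Python, purely for illustration (it is not the .NET runtime's implementation): a method starts on a cheap tier-0 body, counts its calls, and switches to an optimized tier-1 body once it proves hot. The class name, the call-count threshold, and both bodies are assumptions.

```python
from typing import Callable


class ToyTieredMethod:
    """Toy tiered compilation: run a quick tier-0 body, count calls, and
    switch to an optimized tier-1 body once the method proves hot."""

    def __init__(self, tier0: Callable[[int], int], tier1: Callable[[int], int],
                 threshold: int = 30) -> None:
        self.tier0, self.tier1 = tier0, tier1
        self.threshold = threshold  # illustrative value, not a runtime setting
        self.calls = 0
        self.tier = 0

    def __call__(self, x: int) -> int:
        if self.tier == 0:
            self.calls += 1
            if self.calls >= self.threshold:
                self.tier = 1  # promote: later calls use the optimized body
            return self.tier0(x)
        return self.tier1(x)


def slow_sum(n: int) -> int:  # quick-to-produce, unoptimized version
    return sum(range(n))


def fast_sum(n: int) -> int:  # "optimized" version with the same result
    return n * (n - 1) // 2


method = ToyTieredMethod(slow_sum, fast_sum, threshold=3)
results = [method(10) for _ in range(5)]
print(results, method.tier)  # [45, 45, 45, 45, 45] 1
```

Only methods that cross the threshold get the more expensive treatment, which is what lets a tiered JIT spend more time on hot code without slowing startup.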
.Net Core 2.1 also includes a preview of Hardware Intrinsics so you can take full advantage of the rich instruction sets available on modern hardware.
.Net's JIT does not yet get much benefit from profile feedback. This is something we are actively working on changing, though it will take time, and will likely be tied into tiering.
The .Net execution model fundamentally alters the way one needs to think about certain compiler optimizations. For instance, from the compiler's standpoint, many operations -- including low level things like field access -- can raise semantically meaningful exceptions (in C++ only calls/throws can cause exceptions). And .Net's GC is precise and relocating which imposes constraints on optimizations in other ways.
I'm working on a project at work where there's a performance issue with the code.
I've got some changes I think will improve performance, but no real way of gauging how my changes affect it.
I wrote a unit test that does things the way they're currently implemented, with a Stopwatch to measure how fast the function runs. I also wrote a similar unit test that does things slightly differently.
If the tests are run together, one takes 1 s to complete and the other takes 73 ms.
If the tests are run separately, they both take around 1 s to complete (yeah... the change I made didn't seem to make much difference).
If the tests are identical, I have the same issue: one runs faster than the other.
Is Visual Studio doing something behind the scenes to improve performance? If so, can I turn it off?
I've tried moving tests into different files, which didn't fix the issue I'm having.
I'd like to be able to run all the tests, but have them run as if there's only one test running at a time.
My guess: it's likely down to dll loading and JIT compiling
1. Assembly loading.
.NET lazily loads assemblies (DLLs). If you add a reference to FooLibrary, that doesn't mean it gets loaded when your code loads.
Instead, what happens is that the first time you call a function or instantiate a class from FooLibrary, then the CLR will go and load the dll it lives in. This involves searching for it in the filesystem, possible security checks, etc.
If your code is even moderately complex, then the "first test" can often end up causing dozens of assemblies to get loaded, which obviously takes some time.
Subsequent tests appear fast because everything's already loaded.
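A toy Python model of this lazy, probe-on-first-use behavior (it is not the real CLR binder, whose probing rules are more involved): an "assembly" is a file named <name>.dll, the first request searches a list of directories in order, and later requests hit the cache. The class name and probe order are hypothetical; FooLibrary is the example name used above.

```python
import os
from typing import Dict, List, Optional


class ToyAssemblyLoader:
    """Toy lazy loader: search the probe directories on first use, then cache."""

    def __init__(self, probe_dirs: List[str]) -> None:
        self.probe_dirs = probe_dirs
        self.loaded: Dict[str, str] = {}  # name -> resolved path
        self.probes = 0                   # file-system checks performed so far

    def load(self, name: str) -> Optional[str]:
        if name in self.loaded:           # already loaded: no disk search
            return self.loaded[name]
        for directory in self.probe_dirs:
            candidate = os.path.join(directory, name + ".dll")
            self.probes += 1
            if os.path.isfile(candidate):
                self.loaded[name] = candidate
                return candidate
        return None                       # not found in any probe directory
```

Only the first `load("FooLibrary")` pays for the directory search; every later request is a dictionary lookup, which mirrors why only the first test looks slow.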
2. JIT Compiling
Remember, your .NET assemblies don't contain code that the CPU can directly execute. Whenever you call any .NET function, the CLR takes the MSIL bytecode and compiles it into executable machine code, and then it goes and runs this machine code. It does this on a per-function basis.
So the first time you call any function, there is a small delay while it is JIT compiled, and these delays add up. This can be particularly bad if you're calling a lot of functions or initializing a big third-party library (think Entity Framework, etc.).
As above, subsequent tests appear fast, because many of the functions will have already been JIT compiled, and cached in memory.
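A toy Python sketch of the per-method cost described above: the first call to a method pays a simulated compile delay (an arbitrary 20 ms, not a measured .NET figure), and later calls reuse the cached body. `ToyJit` and `timed_call` are hypothetical names.

```python
import time
from typing import Callable, Dict


class ToyJit:
    """Toy per-method JIT: compile on first call, then reuse the cached code."""

    def __init__(self, compile_delay_s: float = 0.02) -> None:
        self.compile_delay_s = compile_delay_s  # arbitrary simulated cost
        self.code_cache: Dict[str, Callable[[], int]] = {}

    def invoke(self, name: str, body: Callable[[], int]) -> int:
        compiled = self.code_cache.get(name)
        if compiled is None:
            time.sleep(self.compile_delay_s)  # stand-in for IL -> machine code
            compiled = body
            self.code_cache[name] = compiled
        return compiled()


def timed_call(jit: ToyJit, name: str, body: Callable[[], int]) -> float:
    """Return the wall-clock seconds one invocation took."""
    start = time.perf_counter()
    jit.invoke(name, body)
    return time.perf_counter() - start


jit = ToyJit()
first = timed_call(jit, "Add", lambda: 1 + 1)
second = timed_call(jit, "Add", lambda: 1 + 1)
print(first > second)  # True: only the first call paid the compile delay
```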
So, how can you get around this?
You can improve assembly loading time by having fewer assemblies, which means fewer file searches and so on. The Microsoft .NET performance guidelines go into more detail.
Also, I believe installing them in the global assembly cache may (??) help, but I haven't tested that at all so please take it with a large grain of salt.
Installing into the GAC requires administrative permissions and is quite a heavyweight operation. You don't want to be doing it during development, as it will cause you problems (assemblies get loaded from the GAC in preference to the filesystem, so you can end up loading old copies of your code without realizing it).
You can improve the JIT time by using ngen to pre-compile your assemblies. However, like with the GAC, this requires administrative permissions and takes some time, so you do not want to do it during development either.
My advice?
First, measuring performance in unit tests is not a particularly good or reliable thing to do. Who knows what else Visual Studio is doing in the background that may or may not affect your tests.
Once you've moved the code you're trying to benchmark out into a standalone app, have it loop and run all the tests twice, and discard the first result :-)
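A minimal sketch of that advice, written in Python rather than C# (in .NET you would time with Stopwatch instead of `time.perf_counter`): run each candidate several times, discard the warm-up run(s) that absorb one-time loading and compilation costs, and compare the remaining timings. `old_way` and `new_way` are hypothetical stand-ins for the two implementations being compared.

```python
import time
from typing import Callable, List


def benchmark(fn: Callable[[], object], runs: int = 5, warmup: int = 1) -> List[float]:
    """Time `fn` warmup + runs times and return only the last `runs` timings.

    The discarded warm-up runs absorb one-time costs (loading, compiling,
    cache filling) so the result reflects steady-state speed.
    """
    if runs < 1 or warmup < 0:
        raise ValueError("runs must be >= 1 and warmup must be >= 0")
    timings: List[float] = []
    for i in range(warmup + runs):
        start = time.perf_counter()
        fn()
        elapsed = time.perf_counter() - start
        if i >= warmup:
            timings.append(elapsed)
    return timings


def old_way(n: int = 100_000) -> int:  # hypothetical: sum of squares by looping
    return sum(i * i for i in range(n))


def new_way(n: int = 100_000) -> int:  # hypothetical: same result, closed form
    return (n - 1) * n * (2 * n - 1) // 6


old_t, new_t = benchmark(old_way), benchmark(new_way)
print(f"old: {min(old_t):.6f} s   new: {min(new_t):.6f} s")
```

Comparing the minimum (or median) of the kept runs is less sensitive to background noise than comparing single runs.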
"Premature optimization is the root of all evil."
If you didn't measure before, how do you know you are fixing anything now? How do you even know you had a problem that needed to be solved?
Unit tests are for operational correctness. They could be used for performance, but I would not depend on that because many other factors come into play at run-time.
Your best bet is to get a profiler (or use one that comes with VS) and start measuring.
I noticed that sometimes a .NET 4.0 C# application takes a long time to start, without any apparent reason. How can I determine what's actually happening and which modules are loaded? I'm using a number of external assemblies. Can putting them into the GAC improve performance?
Is .NET 4 slower than .NET 2?
.NET programs have two distinct start-up behaviors. They are called cold-start and warm-start. The cold-start is the slow one, you'll get it when no .NET program was started before. Or when the program you start is large and was never run before. The operating system has to find the assembly files on disk, they won't be available in the file system cache (RAM). That takes a while, hard disks are slow and there are a lot of files to find. A small do-nothing Winforms app has to load 51 DLLs to get started. A do-nothing WPF app weighs in at 77 DLLs.
You get a warm start when the assembly files were loaded before, not too long ago. The assembly file data now comes from RAM instead of the slow disk, that's zippedy-doodah. The only startup overhead is now the jitter.
There's little you can do about cold starts; the assemblies have to come off the disk one way or another. A fast disk makes a big difference, and SSDs are especially effective. Using ngen.exe to pre-JIT an assembly actually makes the problem worse, since it creates another file that needs to be found and loaded. That is why Microsoft recommends not pre-JITting small assemblies. Seeing this problem with .NET 4 programs is also to be expected: you don't yet have many programs that bind to the version 4 CLR and framework assemblies. This solves itself over time.
There's another way this problem automatically disappears. The Windows SuperFetch feature will start to notice that you often load the CLR and the jitted Framework assemblies and will start to preload them into RAM automatically. It's the same kind of trick that the Microsoft Office and Adobe Reader 'optimizers' use. They are also programs with a lot of DLL dependencies, unmanaged ones; the problem isn't specific to .NET. These optimizers are crude: they preload the DLLs when you log in, which is the 'I'm really important, screw everything else' approach to working around the problem. Make sure you disable them so they don't crowd out the RAM space that SuperFetch could use.
The startup time is most likely due to the runtime JIT compiling assembly IL into machine code for execution. It can also be affected by the debugger - as another answerer has suggested.
Excluding that, I'll talk about an application run 'in the wild' on a user's machine, with no debugger etc.
The JIT compiler in .Net 4 is, I think it's fair to say, better than in .Net 2 - so no; it's not slower.
You can improve this startup time significantly by running ngen on your application's assemblies - this pre-compiles the EXEs and DLLs into native images. However you lose some flexibility by doing this and, in general, there is not much point.
You should see the startup time of some MFC apps written in C++ - all native code, and yet depending on how they are linked they can take just as long.
It does, of course, also depend on what an application is actually doing at startup!
I don't think putting your assemblies in the GAC will boost performance.
If possible, add logging around each statement in your Load or Initialize event handlers. That can help you identify which statement is actually taking the time, and from that, which library is slow to load.
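A minimal sketch of that per-step logging, in Python rather than .NET: wrap each startup step in a timing context manager, log its duration, and list the slowest steps afterwards. The `timed` and `slowest` helpers and the step names are hypothetical.

```python
import logging
import time
from contextlib import contextmanager
from typing import Iterator, List, Tuple

log = logging.getLogger("startup")
timings: List[Tuple[str, float]] = []  # (step name, seconds), in execution order


@contextmanager
def timed(step: str) -> Iterator[None]:
    """Log how long the enclosed block takes, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings.append((step, elapsed))
        log.info("%s took %.1f ms", step, elapsed * 1000)


def slowest(n: int = 3) -> List[Tuple[str, float]]:
    """Return the n slowest recorded steps, slowest first."""
    return sorted(timings, key=lambda t: t[1], reverse=True)[:n]


logging.basicConfig(level=logging.INFO)
with timed("load settings"):        # hypothetical startup steps
    time.sleep(0.01)
with timed("connect to database"):
    time.sleep(0.05)
for step, seconds in slowest():
    print(f"{step}: {seconds * 1000:.0f} ms")
```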