JIT & loop optimization - C#

using System;

namespace ConsoleApplication1
{
    class TestMath
    {
        static void Main()
        {
            double res = 0.0;
            for (int i = 0; i < 1000000; ++i)
                res += System.Math.Sqrt(2.0);
            Console.WriteLine(res);
            Console.ReadKey();
        }
    }
}
By benchmarking this code against the C++ version, I discovered that it runs about 10 times slower than the C++ version. I have no problem with that, but it leads me to the following question:
It seems (after a little searching) that the JIT compiler can't optimize this code the way a C++ compiler can, namely by calling sqrt once and multiplying the result by 1000000.
Is there a way to force JIT to do it ?

I repro, I clock the C++ version at 1.2 msec, the C# version at 12.2 msec. The reason is readily visible if you take a look at the machine code the C++ code generator and optimizer emits. It rewrites the loop like this (using the C# equivalent):
double temp = Math.Sqrt(2.0);
for (int i = 0; i < 1000000; ++i) {
    res += temp;
}
That's a combination of two optimizations, called "invariant code motion" and "loop hoisting". In other words, the C++ compiler knows enough about the sqrt() function to know that its return value is not affected by the surrounding code, so the call can be moved at will. And that it is then worthwhile to move that code outside of the loop and create an extra local variable to store the result. And that calculating sqrt() is slower than adding. Sounds obvious, but that's a rule that has to be built into the optimizer and has to be considered, one of many, many rules.
And yes, the jitter optimizer misses that one. It is guilty of not being able to spend the same amount of time as the C++ optimizer; it operates under heavy time constraints. Because if it takes too long, then the program takes too much time getting started.
Tongue in cheek: a C# programmer needs to be a bit smarter than the code generator and recognize these optimization opportunities himself. This is a fairly obvious one. Well, now that you know about it anyway :)

To perform the optimization you want, the compiler has to ensure that the function Sqrt() will always return the same value for a given input.
The compiler can do all kinds of checks to see whether the function is stateless, for instance verifying that it doesn't use any "outer" variables. But even that doesn't always mean it can't be affected by side effects.
When a function is called in a loop, it should be called in each iteration (think of a multithreaded environment to see why this is important). So it's usually up to the user to move constant expressions out of the loop if they want that kind of optimization.
Back to the C++ compiler: the compiler might have specific optimizations for its library functions. A lot of compilers try to optimize important libraries like the math library, so this might be compiler specific.
Another big difference is that in C++ you usually include that kind of code from a header file. This means the compiler may have all the information it needs to decide whether the function call changes between invocations.
The .NET compiler (at compile time - Visual Studio) doesn't always have all the code to parse. Most of the library functions are already compiled (into IL - the first stage), so it might not be able to do deep optimizations involving third-party DLLs. And at JIT (runtime) compilation, it would probably be too costly to do these kinds of optimizations across assemblies.

It might help the JIT (or even the C# compiler) if Math.Sqrt was annotated as [Pure]. Then, assuming the arguments to the function are constant as they are in your example, the calculation of the value could be lifted outside the loop.
What's more, such a loop could reasonably be converted into the code:
double res = 1000000 * Math.Sqrt(2.0);
In theory the compiler or JIT could perform this automatically. However I suspect that it would be optimising for a pattern that happens rarely in actual code.
I opened a feature request for ReSharper, suggesting that the design-time tool suggests such a refactoring.
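For what it's worth, the attribute already exists in Code Contracts; here is a minimal sketch of what such an annotation looks like (MathHelpers and Root are made-up names, and as far as I know the current jitter does not act on this attribute for optimization purposes):

using System;
using System.Diagnostics.Contracts;

static class MathHelpers
{
    // [Pure] declares that the method has no visible side effects, so its
    // result depends only on its arguments. Tooling (and, in principle, an
    // optimizer) could then hoist calls with constant arguments out of loops.
    [Pure]
    public static double Root(double x)
    {
        return Math.Sqrt(x);
    }
}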

Related

Repeated access to properties and speed in C# [duplicate]

Please ignore code readability in this question.
In terms of performance, should the following code be written like this:
int maxResults = criteria.MaxResults;
if (maxResults > 0)
{
    while (accounts.Count > maxResults)
        accounts.RemoveAt(maxResults);
}
or like this:
if (criteria.MaxResults > 0)
{
    while (accounts.Count > criteria.MaxResults)
        accounts.RemoveAt(criteria.MaxResults);
}
?
Edit: criteria is a class, and MaxResults is a simple integer property (i.e., public int MaxResults { get { return _maxResults; } }).
Does the C# compiler treat MaxResults as a black box and evaluate it every time? Or is it smart enough to figure out that I've got 3 calls to the same property with no modification of that property between the calls? What if MaxResults was a field?
One of the laws of optimization is precalculation, so I instinctively wrote this code like the first listing, but I'm curious if this kind of thing is being done for me automatically (again, ignore code readability).
(Note: I'm not interested in hearing the 'micro-optimization' argument, which may be valid in the specific case I've posted. I'd just like some theory behind what's going on or not going on.)
First off, the only way to actually answer performance questions is to actually try it both ways and test the results in realistic conditions.
That said, the other answers which say that "the compiler" does not do this optimization because the property might have side effects are both right and wrong. The problem with the question (aside from the fundamental problem that it simply cannot be answered without actually trying it and measuring the result) is that "the compiler" is actually two compilers: the C# compiler, which compiles to MSIL, and the JIT compiler, which compiles IL to machine code.
The C# compiler never ever does this sort of optimization; as noted, doing so would require that the compiler peer into the code being called and verify that the result it computes does not change over the lifetime of the callee's code. The C# compiler does not do so.
The JIT compiler might. No reason why it couldn't. It has all the code sitting right there. It is completely free to inline the property getter, and if the jitter determines that the inlined property getter returns a value that can be cached in a register and re-used, then it is free to do so. (If you don't want it to do so because the value could be modified on another thread then you already have a race condition bug; fix the bug before you worry about performance.)
Whether the jitter actually does inline the property fetch and then enregister the value, I have no idea. I know practically nothing about the jitter. But it is allowed to do so if it sees fit. If you are curious about whether it does so or not, you can either (1) ask someone who is on the team that wrote the jitter, or (2) examine the jitted code in the debugger.
And finally, let me take this opportunity to note that computing results once, storing the result and re-using it is not always an optimization. This is a surprisingly complicated question. There are all kinds of things to optimize for:
execution time
executable code size -- this has a major effect on execution time, because big code takes longer to load, increases the working set size, and puts pressure on processor caches, RAM and the page file. Small slow code is often in the long run faster than big fast code in important metrics like startup time and cache locality.
register allocation -- this also has a major effect on execution time, particularly in architectures like x86 which have a small number of available registers. Enregistering a value for fast re-use can mean that there are fewer registers available for other operations that need optimization; perhaps optimizing those operations instead would be a net win.
and so on. It gets real complicated real fast.
In short, you cannot possibly know whether writing the code to cache the result rather than recomputing it is actually (1) faster, or (2) better performing. Better performance does not always mean making execution of a particular routine faster. Better performance is about figuring out what resources are important to the user -- execution time, memory, working set, startup time, and so on -- and optimizing for those things. You cannot do that without (1) talking to your customers to find out what they care about, and (2) actually measuring to see if your changes are having a measurable effect in the desired direction.
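If you do want to measure it, here is a crude harness for both variants from the question (Criteria here is a stand-in for the real class; run a Release build outside the debugger, and keep the caveats above in mind):

using System;
using System.Diagnostics;

class Criteria
{
    public int MaxResults { get; set; }
}

class Program
{
    static void Main()
    {
        var criteria = new Criteria { MaxResults = 1000 };
        const int N = 100000000;
        long sum = 0;

        var sw = Stopwatch.StartNew();
        int maxResults = criteria.MaxResults;   // fetch once, cache in a local
        for (int i = 0; i < N; i++)
            sum += maxResults;
        sw.Stop();
        Console.WriteLine("cached local:  " + sw.ElapsedMilliseconds + " ms (" + sum + ")");

        sum = 0;
        sw.Restart();
        for (int i = 0; i < N; i++)
            sum += criteria.MaxResults;         // go through the property every time
        sw.Stop();
        Console.WriteLine("property call: " + sw.ElapsedMilliseconds + " ms (" + sum + ")");
        // printing sum keeps the loops from being optimized away entirely
    }
}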
If MaxResults is a property then no, it will not optimize it, because the getter may have complex logic, say:
private int _maxResults;
public int MaxResults {
    get { return _maxResults++; }
    set { _maxResults = value; }
}
See how the behavior would change if it in-lines your code?
If there's no logic...either method you wrote is fine, it's a very minute difference and all about how readable it is TO YOU (or your team)...you're the one looking at it.
Your two code samples are only guaranteed to have the same result in single-threaded environments, which .Net isn't, and if MaxResults is a field (not a property). The compiler can't assume, unless you use the synchronization features, that criteria.MaxResults won't change during the course of your loop. If it's a property, it can't assume that using the property doesn't have side effects.
Eric Lippert points out quite correctly that it depends a lot on what you mean by "the compiler". The C# -> IL compiler? Or the IL -> machine code (JIT) compiler? And he's right to point out that the JIT may well be able to optimize the property getter, since it has all of the information (whereas the C# -> IL compiler doesn't, necessarily). It won't change the situation with multiple threads, but it's a good point nonetheless.
It will be called and evaluated every time. The compiler has no way of determining if a method (or getter) is deterministic and pure (no side effects).
Note that actual evaluation of the property may be inlined by the JIT compiler, making it effectively as fast as a simple field.
It's good practise to make property evaluation an inexpensive operation. If you do some heavy calculation in the getter, consider caching the result manually, or changing it to a method.
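A minimal caching sketch (the names are illustrative, and this version is not thread-safe):

class Report
{
    private int? _cachedValue;   // illustrative backing field

    public int ExpensiveValue
    {
        get
        {
            if (_cachedValue == null)
                _cachedValue = ComputeExpensiveValue();   // compute once, reuse afterwards
            return _cachedValue.Value;
        }
    }

    private int ComputeExpensiveValue()
    {
        // hypothetical heavy calculation standing in for real work
        int sum = 0;
        for (int i = 0; i < 1000000; i++) sum += i % 7;
        return sum;
    }
}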
why not test it?
Just set up 2 console apps, make them loop 10 million times, and compare the results... Remember to run them as properly released apps that have been installed properly, or else you cannot guarantee that you are not just running the MSIL.
Really, you are probably going to get about 5 answers saying "you shouldn't worry about optimisation". They clearly do not write routines that need to be as fast as possible before being readable (e.g. games).
If this piece of code is part of a loop that is executed billions of times, then this optimisation could be worthwhile. For instance, MaxResults could be an overridden method, and so you may need to consider the cost of virtual method calls.
Really, the ONLY way to answer any of these questions is to figure out whether this is a piece of code that will benefit from optimisation. Then you need to know the kinds of things that are increasing the time to execute. Really, us mere mortals cannot do this a priori and so have to simply try 2-3 different versions of the code and then test them.
If criteria is a class type, I doubt it would be optimized, because another thread could always change that value in the meantime. For structs I'm not sure, but my gut feeling is that it won't be optimized, but I think it wouldn't make much difference in performance in that case anyhow.

The 32-byte limit for inlined functions ... isn't it too small?

I have some very small C# methods marked for inlining, but one of them doesn't get inlined. I've noticed that the longer function generates more than 32 bytes of IL code. Isn't the 32-byte limit too small?
// inlined
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static bool INL_IsInRange(this byte pValue, byte pMin)
{
    return pValue >= pMin;
}

// NOT inlined
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static bool INL_IsInRange(this byte pValue, byte pMin, byte pMax)
{
    return pValue >= pMin && pValue <= pMax;
}
Is it possible to change that limit?
I've been looking into the inlining criteria as well. In your case, I believe that JIT optimization timed out before it could reach the decision to inline your second function. For the JIT, inlining a function is not a priority, so it was busy analyzing your long code. However, if you place your calls inside tight loops, the JIT will probably inline them, as inner calls gain priority for inlining. If you really care about this type of micro-optimization, it's time to switch to C++. It's a whole brave new world out there for you to explore and exploit!
I noticed that the question was edited right after this answer was posted, meaning a high level of interactivity. Well, I don't know why there is a limit of 32 bytes, but that seems to be exactly the size of a CPU cache block, conservatively speaking. What a coincidence! In any case, code optimization must be done against a particular hardware configuration, and is better saved in an extra file side by side with its assembly. The timeout policy is stupid, because optimization is not supposed to be done at run-time, competing against the precious code execution time. Optimization is supposed to be done at application load-time, only the first time it's run on the machine, once and for all. It can be triggered again when a hardware configuration change is detected. Again, if you really need performance, just go with C/C++. C# is not designed for performance and will never make performance its top priority. Like Java, C# is designed for safety, with a much stronger caution against possible negative performance impacts.
Up to the "32-bytes of IL" limit, there are a number of other factors which affect whether a method would be inlined or not. There are at least a couple of articles that describe these factors.
One article explains that a scoring heuristic is used to adjust an initial guess about the relative size of the code when inlined vs not (i.e. whether the call site is larger or smaller than the inlined code itself):
If inlining makes code smaller than the call it replaces, it is ALWAYS good. Note that we are talking about the NATIVE code size, not the IL code size (which can be quite different).
The more a particular call site is executed, the more it will benefit from inlining. Thus code in loops deserves to be inlined more than code that is not in loops.
If inlining exposes important optimizations, then inlining is more desirable. In particular, methods with value-type arguments benefit more than normal because of optimizations like this, and thus having a bias to inline these methods is good.
Thus the heuristic the x86 JIT compiler uses is, given an inline candidate:
Estimate the size of the call site if the method were not inlined.
Estimate the size of the call site if it were inlined (this is an estimate based on the IL, we employ a simple state machine (Markov Model), created using lots of real data to form this estimator logic)
Compute a multiplier. By default it is 1
Increase the multiplier if the code is in a loop (the current heuristic bumps it to 5 in a loop)
Increase the multiplier if it looks like struct optimizations will kick in.
If InlineSize <= NonInlineSize * Multiplier do the inlining.
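That decision, paraphrased as code (a sketch: the names and the struct-related bump are invented; only the 5x loop multiplier is quoted above):

// The inlining heuristic described in the article, expressed as C#.
// All identifiers are illustrative, not the actual JIT implementation.
static bool ShouldInline(int nonInlineSize, int inlineSizeEstimate,
                         bool inLoop, bool structOptsLikely)
{
    double multiplier = 1.0;
    if (inLoop)
        multiplier *= 5.0;        // call sites in loops get a 5x bump
    if (structOptsLikely)
        multiplier *= 2.0;        // illustrative value; the article gives no number
    return inlineSizeEstimate <= nonInlineSize * multiplier;
}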
Another article explains several conditions that will prevent a method from being inlined based on their mere existence (including the "32-bytes of IL" limit):
These are some of the reasons for which we won't inline a method:
Method is marked as not inline with the CompilerServices.MethodImpl attribute.
Size of inlinee is limited to 32 bytes of IL: This is a heuristic; the rationale behind it is that usually, when you have methods bigger than that, the overhead of the call will not be as significant compared to the work the method does. Of course, as a heuristic, it fails in some situations. There have been suggestions for us to add an attribute to control this threshold. For Whidbey, that attribute has not been added (it has some very bad properties: it's x86-JIT specific, and its long-term value, as compilers get smarter, is dubious).
Virtual calls: We don't inline across virtual calls. The reason for not doing this is that we don't know the final target of the call. We could potentially do better here (for example, if 99% of calls end up in the same target, you can generate code that does a check on the method table of the object the virtual call is going to execute on, if it's not the 99% case, you do a call, else you just execute the inlined code), but unlike the J language, most of the calls in the primary languages we support, are not virtual, so we're not forced to be so aggressive about optimizing this case.
Valuetypes: We have several limitations regarding value types an inlining. We take the blame here, this is a limitation of our JIT, we could do better and we know it. Unfortunately, when stack ranked against other features of Whidbey, getting some statistics on how frequently methods cannot be inlined due to this reason and considering the cost of making this area of the JIT significantly better, we decided that it made more sense for our customers to spend our time working in other optimizations or CLR features. Whidbey is better than previous versions in one case: value types that only have a pointer size int as a member, this was (relatively) not expensive to make better, and helped a lot in common value types such as pointer wrappers (IntPtr, etc).
MarshalByRef: Call targets that are in MarshalByRef classes won't be inlined (call has to be intercepted and dispatched). We've got better in Whidbey for this scenario
VM restrictions: These are mostly security, the JIT must ask the VM for permission to inline a method (see CEEInfo::canInline in Rotor source to get an idea of what kind of things the VM checks for).
Complicated flowgraph: We don't inline loops, methods with exception handling regions, etc...
If basic block that has the call is deemed as it won't execute frequently (for example, a basic block that has a throw, or a static class constructor), inlining is much less aggressive (as the only real win we can make is code size)
Other: Exotic IL instructions, security checks that need a method frame, etc...

Prime numbers C# and C++ [duplicate]

Or is it now the other way around?
From what I've heard there are some areas in which C# proves to be faster than C++, but I've never had the guts to test it by myself.
I thought some of you could explain these differences in detail or point me to the right place for information on this.
There is no strict reason why a bytecode based language like C# or Java that has a JIT cannot be as fast as C++ code. However C++ code used to be significantly faster for a long time, and also today still is in many cases. This is mainly due to the more advanced JIT optimizations being complicated to implement, and the really cool ones are only arriving just now.
So C++ is faster, in many cases. But this is only part of the answer. The cases where C++ is actually faster, are highly optimized programs, where expert programmers thoroughly optimized the hell out of the code. This is not only very time consuming (and thus expensive), but also commonly leads to errors due to over-optimizations.
On the other hand, code in interpreted languages gets faster in later versions of the runtime (.NET CLR or Java VM), without you doing anything. And there are a lot of useful optimizations JIT compilers can do that are simply impossible in languages with pointers. Also, some argue that garbage collection should generally be as fast as or faster than manual memory management, and in many cases it is. You can generally implement and achieve all of this in C++ or C, but it's going to be much more complicated and error prone.
As Donald Knuth said, "premature optimization is the root of all evil". If you really know for sure that your application will mostly consist of very performance critical arithmetic, and that it will be the bottleneck, and it's certainly going to be faster in C++, and you're sure that C++ won't conflict with your other requirements, go for C++. In any other case, concentrate on first implementing your application correctly in whatever language suits you best, then find performance bottlenecks if it runs too slow, and then think about how to optimize the code. In the worst case, you might need to call out to C code through a foreign function interface, so you'll still have the ability to write critical parts in lower level language.
Keep in mind that it's relatively easy to optimize a correct program, but much harder to correct an optimized program.
Giving actual percentages of speed advantages is impossible, it largely depends on your code. In many cases, the programming language implementation isn't even the bottleneck. Take the benchmarks at http://benchmarksgame.alioth.debian.org/ with a great deal of scepticism, as these largely test arithmetic code, which is most likely not similar to your code at all.
I'm going to start by disagreeing with part of the accepted (and well-upvoted) answer to this question by stating:
There are actually plenty of reasons why JITted code will run slower than a properly optimized C++ (or other language without runtime overhead) program, including:
compute cycles spent on JITting code at runtime are by definition unavailable for use in program execution.
any hot paths in the JITter will be competing with your code for instruction and data cache in the CPU. We know that cache dominates when it comes to performance and native languages like C++ do not have this type of contention, by design.
a run-time optimizer's time budget is necessarily much more constrained than that of a compile-time optimizer's (as another commenter pointed out)
Bottom line: Ultimately, you will almost certainly be able to create a faster implementation in C++ than you could in C#.
Now, with that said, how much faster really isn't quantifiable, as there are too many variables: the task, problem domain, hardware, quality of implementations, and many other factors. You'll have to run tests on your scenario to determine the difference in performance, and then decide whether it is worth the additional effort and complexity.
This is a very long and complex topic, but I feel it's worth mentioning for the sake of completeness that C#'s runtime optimizer is excellent, and is able to perform certain dynamic optimizations at runtime that are simply not available to C++ with its compile-time (static) optimizer. Even with this, the advantage is still typically deeply in the native application's court, but the dynamic optimizer is the reason for the "almost certainly" qualifier given above.
--
In terms of relative performance, I was also disturbed by the figures and discussions I saw in some other answers, so I thought I'd chime in and at the same time, provide some support for the statements I've made above.
A huge part of the problem with those benchmarks is you can't write C++ code as if you were writing C# and expect to get representative results (eg. performing thousands of memory allocations in C++ is going to give you terrible numbers.)
Instead, I wrote slightly more idiomatic C++ code and compared it against the C# code @Wiory provided. The two major changes I made to the C++ code were:
used vector::reserve()
flattened the 2d array to 1d to achieve better cache locality (contiguous block)
C# (.NET 4.6.1)
private static void TestArray()
{
    const int rows = 5000;
    const int columns = 9000;

    DateTime t1 = System.DateTime.Now;
    double[][] arr = new double[rows][];
    for (int i = 0; i < rows; i++)
        arr[i] = new double[columns];
    DateTime t2 = System.DateTime.Now;
    Console.WriteLine(t2 - t1);

    t1 = System.DateTime.Now;
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < columns; j++)
            arr[i][j] = i;
    t2 = System.DateTime.Now;
    Console.WriteLine(t2 - t1);
}
Run time (Release): Init: 124ms, Fill: 165ms
C++14 (Clang v3.8/C2)
#include <chrono>
#include <utility>
#include <vector>

auto ColMajorArray()
{
    constexpr size_t ROWS = 5000;
    constexpr size_t COLS = 9000;

    auto initStart = std::chrono::steady_clock::now();
    auto arr = std::vector<double>();
    arr.reserve(ROWS * COLS);   // one allocation up front, no reallocations in the loop
    auto initFinish = std::chrono::steady_clock::now();
    auto initTime = std::chrono::duration_cast<std::chrono::microseconds>(initFinish - initStart);

    auto fillStart = std::chrono::steady_clock::now();
    for (size_t r = 0; r < ROWS; ++r)
    {
        for (size_t c = 0; c < COLS; ++c)
        {
            // push_back stays within the reserved capacity; indexing into a
            // reserved-but-empty vector would be undefined behavior
            arr.push_back(static_cast<double>(r * c));
        }
    }
    auto fillFinish = std::chrono::steady_clock::now();
    auto fillTime = std::chrono::duration_cast<std::chrono::milliseconds>(fillFinish - fillStart);

    return std::make_pair(initTime, fillTime);
}
Run time (Release): Init: 398µs (yes, that's microseconds), Fill: 152ms
Total Run times: C#: 289ms, C++ 152ms (roughly 90% faster)
Observations
Changing the C# implementation to the same 1d array implementation yielded Init: 40ms, Fill: 171ms, Total: 211ms (C++ was still almost 40% faster); a sketch of that variant follows after these observations.
It is much harder to design and write "fast" code in C++ than it is to write "regular" code in either language.
It's (perhaps) astonishingly easy to get poor performance in C++; we saw that with unreserved vectors performance. And there are lots of pitfalls like this.
C#'s performance is rather amazing when you consider all that is going on at runtime. And that performance is comparatively easy to access.
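The flattened 1d C# variant mentioned in the first observation presumably looked something like this (a sketch, not the exact code that produced those numbers):

private static void TestFlatArray()
{
    const int rows = 5000;
    const int columns = 9000;

    DateTime t1 = System.DateTime.Now;
    double[] arr = new double[rows * columns];   // one contiguous allocation
    DateTime t2 = System.DateTime.Now;
    Console.WriteLine(t2 - t1);

    t1 = System.DateTime.Now;
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < columns; j++)
            arr[i * columns + j] = i;            // manual row-major indexing
    t2 = System.DateTime.Now;
    Console.WriteLine(t2 - t1);
}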
More anecdotal data comparing the performance of C++ and C#: https://benchmarksgame.alioth.debian.org/u64q/compare.php?lang=gpp&lang2=csharpcore
The bottom line is that C++ gives you much more control over performance. Do you want to use a pointer? A reference? Stack memory? Heap? Dynamic polymorphism or eliminate the runtime overhead of a vtable with static polymorphism (via templates/CRTP)? In C++ you have to... er, get to make all these choices (and more) yourself, ideally so that your solution best addresses the problem you're tackling.
Ask yourself if you actually want or need that control, because even for the trivial example above, you can see that although there is a significant improvement in performance, it requires a deeper investment to access.
It's five oranges faster. Or rather: there can be no (correct) blanket answer. C++ is a statically compiled language (but then, there's profile guided optimization, too), C# runs aided by a JIT compiler. There are so many differences that questions like “how much faster” cannot be answered, not even by giving orders of magnitude.
In my experience (and I have worked a lot with both languages), the main problem with C# compared to C++ is high memory consumption, and I have not found a good way to control it. It was the memory consumption that would eventually slow down .NET software.
Another factor is that JIT compiler cannot afford too much time to do advanced optimizations, because it runs at runtime, and the end user would notice it if it takes too much time. On the other hand, a C++ compiler has all the time it needs to do optimizations at compile time. This factor is much less significant than memory consumption, IMHO.
One particular scenario where C++ still has the upper hand (and will, for years to come) occurs when polymorphic decisions can be predetermined at compile time.
Generally, encapsulation and deferred decision-making is a good thing because it makes the code more dynamic, easier to adapt to changing requirements and easier to use as a framework. This is why object oriented programming in C# is very productive and it can be generalized under the term “generalization”. Unfortunately, this particular kind of generalization comes at a cost at run-time.
Usually, this cost is non-substantial but there are applications where the overhead of virtual method calls and object creation can make a difference (especially since virtual methods prevent other optimizations such as method call inlining). This is where C++ has a huge advantage because you can use templates to achieve a different kind of generalization which has no impact on runtime but isn't necessarily any less polymorphic than OOP. In fact, all of the mechanisms that constitute OOP can be modelled using only template techniques and compile-time resolution.
In such cases (and admittedly, they're often restricted to special problem domains), C++ wins against C# and comparable languages.
C++ (or C for that matter) gives you fine-grained control over your data structures. If you want to bit-twiddle, you have that option. Large managed Java or .NET apps (OWB, Visual Studio 2005) that use the internal data structures of the Java/.NET libraries carry that baggage with them. I've seen OWB designer sessions using over 400 MB of RAM, and BIDS for cube or ETL design getting into the hundreds of MB as well.
On a predictable workload (such as most benchmarks that repeat a process many times) a JIT can get you code that is optimised well enough that there is no practical difference.
IMO on large applications the difference is not so much the JIT as the data structures that the code itself is using. Where an application is memory-heavy you will get less efficient cache usage. Cache misses on modern CPUs are quite expensive. Where C or C++ really win is where you can optimise your usage of data structures to play nicely with the CPU cache.
For graphics the standard C# Graphics class is way slower than GDI accessed via C/C++.
I know this has nothing to do with the language per se, more with the total .NET platform, but Graphics is what is offered to the developer as a GDI replacement, and its performance is so bad I wouldn't even dare to do graphics with it.
We have a simple benchmark we use to see how fast a graphics library is, and that is simply drawing random lines in a window. C++/GDI is still snappy with 10000 lines while C#/Graphics has difficulty doing 1000 in real-time.
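The C# side of such a test boils down to a loop like this (a sketch of the idea, not the actual benchmark; the names are ours):

using System;
using System.Drawing;

static class LineBenchmark
{
    // Draw `count` random lines on the given Graphics surface; the same idea
    // works against a Form's CreateGraphics() or a Bitmap.
    public static void DrawRandomLines(Graphics g, int count, int width, int height)
    {
        var rnd = new Random();
        using (var pen = new Pen(Color.Black))
        {
            for (int i = 0; i < count; i++)
            {
                g.DrawLine(pen,
                    rnd.Next(width), rnd.Next(height),
                    rnd.Next(width), rnd.Next(height));
            }
        }
    }
}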
Garbage collection is the main reason Java and C# CANNOT be used for real-time systems.
When will the GC happen?
How long will it take?
This is non-deterministic.
We have had to determine if C# was comparable to C++ in performance and I wrote some test programs for that (using Visual Studio 2005 for both languages). It turned out that without garbage collection and only considering the language (not the framework) C# has basically the same performance as C++. Memory allocation is way faster in C# than in C++ and C# has a slight edge in determinism when data sizes are increased beyond cache line boundaries. However, all of this had eventually to be paid for and there is a huge cost in the form of non-deterministic performance hits for C# due to garbage collection.
C/C++ can perform vastly better in programs where there are either large arrays or heavy looping/iteration over arrays (of any size). This is the reason that graphics are generally much faster in C/C++, because heavy array operations underlie almost all graphics operations. .NET is notoriously slow in array indexing operations due to all the safety checks, and this is especially true for multi-dimensional arrays (and, yes, rectangular C# arrays are even slower than jagged C# arrays).
The bonuses of C/C++ are most pronounced if you stick directly with pointers and avoid Boost, std::vector and other high-level containers, as well as inline every small function possible. Use old-school arrays whenever possible. Yes, you will need more lines of code to accomplish the same thing you did in Java or C# as you avoid high-level containers. If you need a dynamically sized array, you will just need to remember to pair your new T[] with a corresponding delete[] statement (or use std::unique_ptr)—the price for the extra speed is that you must code more carefully. But in exchange, you get to rid yourself of the overhead of managed memory / garbage collector, which can easily be 20% or more of the execution time of heavily object-oriented programs in both Java and .NET, as well as those massive managed memory array indexing costs. C++ apps can also benefit from some nifty compiler switches in certain specific cases.
I am an expert programmer in C, C++, Java, and C#. I recently had the rare occasion to implement the exact same algorithmic program in the latter 3 languages. The program had a lot of math and multi-dimensional array operations. I heavily optimized this in all 3 languages. The results were typical of what I normally see in less rigorous comparisons: Java was about 1.3x faster than C# (most JVMs are more optimized than the CLR), and the C++ raw pointer version came in about 2.1x faster than C#. Note that the C# program only used safe code—it is my opinion that you might as well code it in C++ before using the unsafe keyword.
Lest anyone think I have something against C#, I will close by saying that C# is probably my favorite language. It is the most logical, intuitive and rapid development language I've encountered so far. I do all my prototyping in C#. The C# language has many small, subtle advantages over Java (yes, I know Microsoft had the chance to fix many of Java's shortcomings by entering the game late and arguably copying Java). Toast to Java's Calendar class anyone? If Microsoft ever spends real effort to optimize the CLR and the .NET JITter, C# could seriously take over. I'm honestly surprised they haven't already—they did so many things right in the C# language, why not follow it up with heavy-hitting compiler optimizations? Maybe if we all beg.
As usual, it depends on the application. There are cases where C# is probably negligibly slower, and other cases where C++ is 5 or 10 times faster, especially in cases where operations can be easily SIMD'd.
I know it isn't what you were asking, but C# is often quicker to write than C++, which is a big bonus in a commercial setting.
> From what I've heard ...
Your difficulty seems to be in deciding whether what you have heard is credible, and that difficulty will just be repeated when you try to assess the replies on this site.
How are you going to decide if the things people say here are more or less credible than what you originally heard?
One way would be to ask for evidence.
When someone claims "there are some areas in which C# proves to be faster than C++" ask them why they say that, ask them to show you measurements, ask them to show you programs. Sometimes they will simply have made a mistake. Sometimes you'll find out that they are just expressing an opinion rather than sharing something that they can show to be true.
Often information and opinion will be mixed up in what people claim, and you'll have to try and sort out which is which. For example, from the replies in this forum:
"Take the benchmarks at http://shootout.alioth.debian.org/
with a great deal of scepticism, as
these largely test arithmetic code,
which is most likely not similar to
your code at all."
Ask yourself if you really understand what "these largely test arithmetic code" means, and then ask yourself if the author has actually shown you that his claim is true.
"That's a rather useless test, since it really depends on how well
the individual programs have been
optimized; I've managed to speed up
some of them by 4-6 times or more,
making it clear that the comparison
between unoptimized programs is
rather silly."
Ask yourself whether the author has actually shown you that he's managed to "speed up some of them by 4-6 times or more" - it's an easy claim to make!
For 'embarrassingly parallel' problems, when using Intel TBB and OpenMP in C++ I have observed roughly 10x performance increases compared to similar (pure math) problems done with C# and the TPL. SIMD is one area where C# cannot compete, but I also got the impression that the TPL has a sizeable overhead.
That said, I only use C++ for performance-critical tasks where I know I will be able to multithread and get results quickly. For everything else, C# (and occasionally F#) is just fine.
In theory, for long running server-type application, a JIT-compiled language can become much faster than a natively compiled counterpart. Since the JIT compiled language is generally first compiled to a fairly low-level intermediate language, you can do a lot of the high-level optimizations right at compile time anyway. The big advantage comes in that the JIT can continue to recompile sections of code on the fly as it gets more and more data on how the application is being used. It can arrange the most common code-paths to allow branch prediction to succeed as often as possible. It can re-arrange separate code blocks that are often called together to keep them both in the cache. It can spend more effort optimizing inner loops.
I doubt that this is done by .NET or any of the JREs, but it was being researched back when I was in university, so it's not unreasonable to think that these sort of things may find their way into the real world at some point soon.
Applications that require intensive memory access eg. image manipulation are usually better off written in unmanaged environment (C++) than managed (C#). Optimized inner loops with pointer arithmetics are much easier to have control of in C++. In C# you might need to resort to unsafe code to even get near the same performance.
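For instance, a minimal sketch of such an unsafe inner loop (compile with /unsafe; the method and the assumption of an 8-bit grayscale buffer are illustrative):

// Invert every pixel in a raw byte buffer using pointer arithmetic.
static unsafe void Invert(byte[] pixels)
{
    fixed (byte* p = pixels)            // pin the array so the GC can't move it
    {
        byte* cur = p;
        byte* end = p + pixels.Length;
        while (cur < end)
        {
            *cur = (byte)(255 - *cur);  // no per-access bounds checks here
            cur++;
        }
    }
}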
I've tested C++'s vector against its C# equivalent, List, as well as simple 2d arrays in both.
I'm using Visual C#/C++ 2010 Express editions. Both projects are simple console applications, I've tested them in standard (no custom settings) release and debug mode.
C# lists run faster on my pc, array initialization is also faster in C#, math operations are slower.
I'm using Intel Core2Duo P8600#2.4GHz, C# - .NET 4.0.
I know that vector implementation is different than C# list, but I just wanted to test collections that I would use to store my objects (and being able to use index accessor).
Of course you need to clear memory (let's say for every use of new), but I wanted to keep the code simple.
C++ vector test:
static void TestVector()
{
    clock_t start, finish;
    start = clock();
    vector<vector<double>> myList = vector<vector<double>>();
    int i = 0;
    for (i = 0; i < 500; i++)
    {
        myList.push_back(vector<double>());
        for (int j = 0; j < 50000; j++)
            myList[i].push_back(j + i);
    }
    finish = clock();
    cout << (finish - start) << endl;
    cout << (double(finish - start) / CLOCKS_PER_SEC);
}
C# list test:
private static void TestVector()
{
    DateTime t1 = System.DateTime.Now;
    List<List<double>> myList = new List<List<double>>();
    int i = 0;
    for (i = 0; i < 500; i++)
    {
        myList.Add(new List<double>());
        for (int j = 0; j < 50000; j++)
            myList[i].Add(j * i);
    }
    DateTime t2 = System.DateTime.Now;
    Console.WriteLine(t2 - t1);
}
C++ - array:
static void TestArray()
{
    cout << "Normal array test:" << endl;
    const int rows = 5000;
    const int columns = 9000;
    clock_t start, finish;

    start = clock();
    double** arr = new double*[rows];
    for (int i = 0; i < rows; i++)
        arr[i] = new double[columns];
    finish = clock();
    cout << (finish - start) << endl;

    start = clock();
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < columns; j++)
            arr[i][j] = i * j;
    finish = clock();
    cout << (finish - start) << endl;
}
C# - array:
private static void TestArray()
{
    const int rows = 5000;
    const int columns = 9000;

    DateTime t1 = System.DateTime.Now;
    double[][] arr = new double[rows][];
    for (int i = 0; i < rows; i++)
        arr[i] = new double[columns];
    DateTime t2 = System.DateTime.Now;
    Console.WriteLine(t2 - t1);

    t1 = System.DateTime.Now;
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < columns; j++)
            arr[i][j] = i * j;
    t2 = System.DateTime.Now;
    Console.WriteLine(t2 - t1);
}
Time: (Release/Debug)
C++
600 / 606 ms array init,
200 / 270 ms array fill,
1sec /13sec vector init & fill.
(Yes, 13 seconds, I always have problems with lists/vectors in debug mode.)
C#:
20 / 20 ms array init,
403 / 440 ms array fill,
710 / 742 ms list init & fill.
It's an extremely vague question without real definitive answers.
For example, I'd rather play 3D games that are created in C++ than in C#, because the performance is certainly a lot better. (And I know about XNA, etc., but it comes nowhere near the real thing.)
On the other hand, as previously mentioned; you should develop in a language that lets you do what you want quickly, and then if necessary optimize.
.NET languages can be as fast as C++ code, or even faster, but C++ code will have a more constant throughput as the .NET runtime has to pause for GC, even if it's very clever about its pauses.
So if you have some code that has to consistently run fast without any pause, .NET will introduce latency at some point, even if you are very careful with the runtime GC.
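You can nudge the runtime in that direction; a sketch using the real GCSettings API (SustainedLowLatency requires .NET 4.5+, and DoLatencyCriticalWork is a hypothetical placeholder):

using System;
using System.Runtime;

class LatencySensitiveLoop
{
    static void Main()
    {
        // Ask the runtime to avoid blocking collections while this region runs.
        // This reduces pauses; it does not eliminate them.
        GCLatencyMode old = GCSettings.LatencyMode;
        GCSettings.LatencyMode = GCLatencyMode.SustainedLowLatency;
        try
        {
            DoLatencyCriticalWork();   // hypothetical workload
        }
        finally
        {
            GCSettings.LatencyMode = old;
        }
    }

    static void DoLatencyCriticalWork() { /* ... */ }
}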
I suppose there are applications written in C# that run fast, just as there are more C++-written apps that run fast (well, C++ is just older... and take UNIX too...) - the question really is: what is it that users and developers complain about?
Well, IMHO, in the case of C# we have a very comfortable UI, a very nice hierarchy of libraries, and the whole interface system of the CLI. In the case of C++ we have templates, ATL, COM, MFC, and a whole shebang of already-written and running code like OpenGL, DirectX, and so on... Developers complain about indeterminately arising GC calls in the case of C# (meaning the program runs fast, and then in one second - bang! it's stuck).
Writing code in C# is very simple and fast (not forgetting that this also increases the chance of errors). In the case of C++, developers complain about memory leaks (meaning crashes), calls between DLLs, as well as "DLL hell" - problems with supporting and replacing libraries with newer ones...
I think the more skill you have in the programming language, the more quality (and speed) will characterize your software.
I would put it this way: programmers who write faster code, are the ones who are the more informed of what makes current machines go fast, and incidentally they are also the ones who use an appropriate tool that allows for precise low-level and deterministic optimisation techniques. For these reasons, these people are the ones who use C/C++ rather than C#. I would go as far as stating this as a fact.
Well, it depends. If the byte-code is translated into machine code (and not just JIT'ed) (I mean, if you execute the program), and if your program uses many allocations/deallocations, it could be faster, because the GC algorithm needs just one pass (theoretically) through the whole memory, while normal malloc/realloc/free C/C++ calls cause an overhead on every call (call overhead, data-structure overhead, cache misses ;) ).
So it is theoretically possible (also for other GC languages).
I don't really see the extreme disadvantage of not being able to use metaprogramming in C# for most applications, because most programmers don't use it anyway.
Another big advantage is that the SQL-like LINQ "extension" provides opportunities for the compiler to optimize calls to databases (in other words, the compiler could compile the whole LINQ into one "blob" binary where the called functions are inlined or optimized for your usage, but I'm speculating here).
If I'm not mistaken, C# generics are resolved at runtime. That must be slower than the compile-time templates of C++.
And when you take in all the other compile-time optimizations mentioned by so many others, as well as the lack of safety that does, indeed, mean more speed...
I'd say C++ is the obvious choice in terms of raw speed and minimum memory consumption. But this also translates into more time developing the code and ensuring you aren't leaking memory or causing any null pointer exceptions.
Verdict:
C#: Faster development, slower run
C++: Slow development, faster run.
There are some major differences between C# and C++ on the performance aspect:
C# is GC/heap based. The allocation and the GC itself are overhead, as is the non-locality of the memory accesses.
C++ optimizers have become very good over the years. JIT compilers cannot achieve the same level, since they have only limited compilation time and don't see the global scope.
Besides that, programmer competence also plays a role. I have seen bad C++ code where classes were passed by value as arguments all over the place. You can actually make performance worse in C++ if you don't know what you are doing.
I found this April 2020 read: https://www.quora.com/Why-is-C-so-slow-compared-to-Python by a real-world programmer with 15+ years of Software Development experience.
It states that C# is usually slower because it is compiled to Common Intermediate Language (CIL) instead of machine code like C++. The CIL is then put through the Common Language Runtime (CLR), which outputs machine code. However, if you keep executing C#, it will cache the machine-code output, so the machine code is saved for the next execution. All in all, C# can be faster if you execute it multiple times, since it runs as machine code after multiple executions.
There are also comments that a good C++ programmer can do time-consuming optimizations that will pay off in the end.
> After all, the answers have to be somewhere, haven't they? :)
Umm, no.
As several replies noted, the question is under-specified in ways that invite questions in response, not answers. To take just one way:
the question conflates language with language implementation - this C program is both 2,194 times slower and 1.17 times faster than this C# program - we would have to ask you: Which language implementations?
And then which programs? Which machine? Which OS? Which data set?
It really depends on what you're trying to accomplish in your code. I've heard that it's just stuff of urban legend that there is any performance difference between VB.NET, C# and managed C++. However, I've found, at least in string comparisons, that managed C++ beats the pants off of C#, which in turn beats the pants off of VB.NET.
I've by no means done any exhaustive comparisons in algorithmic complexity between the languages. I'm also just using the default settings in each of the languages. In VB.NET I'm using settings to require declaration of variables, etc. Here is the code I'm using for managed C++: (As you can see, this code is quite simple). I'm running the same in the other languages in Visual Studio 2013 with .NET 4.6.2.
#include "stdafx.h"
using namespace System;
using namespace System::Diagnostics;
bool EqualMe(String^ first, String^ second)
{
return first->Equals(second);
}
int main(array<String ^> ^args)
{
Stopwatch^ sw = gcnew Stopwatch();
sw->Start();
for (int i = 0; i < 100000; i++)
{
EqualMe(L"one", L"two");
}
sw->Stop();
Console::WriteLine(sw->ElapsedTicks);
return 0;
}
One area that I was instrumenting code in C++ vs C# was in creating a database connection to SQL Server and returning a resultset. I compared C++ (Thin layer over ODBC) vs C# (ADO.NET SqlClient) and found that C++ was about 50% faster than the C# code. ADO.NET is supposed to be a low-level interface for dealing with the database. Where you see perhaps a bigger difference is in memory consumption rather than raw speed.
Another thing that makes C++ code faster is that you can tune the compiler options at a granular level, optimizing things in a way you can't in C#.
Inspired by this, I did a quick test exercising the kinds of common instructions needed in most programs.
Here’s the C# code:
for (int i = 0; i < 1000; i++)
{
    StreamReader str = new StreamReader("file.csv");
    StreamWriter stw = new StreamWriter("examp.csv");
    string strL = "";
    while ((strL = str.ReadLine()) != null)
    {
        ArrayList al = new ArrayList();
        string[] strline = strL.Split(',');
        al.AddRange(strline);
        foreach (string str1 in strline)
        {
            stw.Write(str1 + ",");
        }
        stw.Write("\n");
    }
    str.Close();
    stw.Close();
}
The string array and ArrayList are used purposely to include those instructions.
Here's the C++ code:
for (int i = 0; i < 1000; i++)
{
    std::fstream file("file.csv", ios::in);
    if (!file.is_open())
    {
        std::cout << "File not found!\n";
        return 1;
    }
    ofstream myfile;
    myfile.open("example.txt");
    std::string csvLine;
    while (std::getline(file, csvLine))
    {
        std::istringstream csvStream(csvLine);
        std::vector<std::string> csvColumn;
        std::string csvElement;
        while (std::getline(csvStream, csvElement, ','))
        {
            csvColumn.push_back(csvElement);
        }
        for (std::vector<std::string>::iterator j = csvColumn.begin(); j != csvColumn.end(); ++j)
        {
            myfile << *j << ", ";
        }
        csvColumn.clear();
        csvElement.clear();
        csvLine.clear();
        myfile << "\n";
    }
    myfile.close();
    file.close();
}
The input file size I used was 40 KB.
And here's the result -
C++ code ran in 9 seconds.
C# code: 4 seconds!!!
Oh, but this was on Linux... With C# running on Mono... And C++ with g++.
OK, this is what I got on Windows – Visual Studio 2003:
C# code ran in 9 seconds.
C++ code – horrible 370 seconds!!!

Can I check if the C# compiler inlined a method call?

I'm writing an XNA game where I do per-pixel collision checks. The loop which checks this does so by shifting an int and bitwise ORing and is generally difficult to read and understand.
I would like to add private methods such as private bool IsTransparent(int pixelColorValue) to make the loop more readable, but I don't want the overhead of method calls since this is very performance sensitive code.
Is there a way to force the compiler to inline this call, or should I just hope that the compiler will do this optimization?
If there isn't a way to force this, is there a way to check if the method was inlined, short of reading the disassembly? Will the method show up in reflection if it was inlined and no other callers exist?
Edit: I can't force it, so can I detect it?
No, you can't. Even more, the one who decides on inlining isn't the VS compiler that takes your code and converts it into IL, but the JIT compiler that takes the IL and converts it to machine code. This is because only the JIT compiler knows enough about the processor architecture to decide if putting a method inline is appropriate, as it's a tradeoff between instruction pipelining and cache size.
So even looking in .NET Reflector will not help you.
"You can check
System.Reflection.MethodBase.GetCurrentMethod().Name.
If the method is inlined, it will
return the name of the caller
instead."
--Joel Coehoorn
There is a new way to encourage more aggressive inlining in .NET 4.5 that is described here: http://blogs.microsoft.co.il/blogs/sasha/archive/2012/01/20/aggressive-inlining-in-the-clr-4-5-jit.aspx
Basically it is just a flag to tell the compiler to inline if possible. Unfortunately, it's not available in the current version of XNA (Game Studio 4.0), but should be available when XNA catches up to VS 2012 some time this year. It is already available if you are somehow running on Mono.
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static int LargeMethod(int i, int j)
{
    if (i + 14 > j)
    {
        return i + j;
    }
    else if (j * 12 < i)
    {
        return 42 + i - j * 7;
    }
    else
    {
        return i % 14 - j;
    }
}
Be aware that the Xbox works differently.
A Google search turned up this:
"The inline method which mitigates the overhead of a call of a method.
JIT forms into an inline what fulfills the following conditions.
The IL code size is 16 bytes or less.
The branch command is not used (if
sentence etc.).
The local variable is not used.
Exception handling has not been
carried out (try, catch, etc.).
float is not used as the argument or
return value of a method (probably by
the Xbox 360, not applied).
When two or more arguments are in a
method, it uses for the turn
declared.
However, a virtual function is not formed into an inline."
http://xnafever.blogspot.com/2008/07/inline-method-by-xna-on-xbox360.html
I have no idea if he is correct. Anyone?
Nope, you can't.
Basically, you can't do that in most modern C++ compilers either. inline is just an offer to the compiler. It's free to take it or not.
The C# compiler does not do any special inlining at the IL level. JIT optimizer is the one that will do it.
Why not use unsafe code ("inline C" as it's known) and make use of C/C++-style pointers? This is safe from the GC (i.e. not affected by collection) but comes with its own security implications (you can't use it for internet-zone apps). It is excellent for the kind of thing you appear to be trying to achieve, especially for performance, and even more so with arrays and bitwise operations.
To summarise: you want performance for a small part of your app? Use unsafe code and make use of pointers, etc. That seems the best option to me.
EDIT: a bit of a starter ?
http://msdn.microsoft.com/en-us/library/aa288474(VS.71).aspx
The only way to check this is to get or write a profiler and hook into the JIT events; you must also make sure inlining is not turned off, as it is by default when profiling.
You can detect it at runtime with the aforementioned GetCurrentMethod call. But, that'd seem to be a bit of a waste[1]. The easiest thing to do would to just ILDASM the MSIL and check there.
Note that this is specifically for the compiler inlining the call, and is covered in the various Reflection docs on MSDN.
If the method that calls the GetCallingAssembly method is expanded inline by the compiler (that is, if the compiler inserts the function body into the emitted Microsoft intermediate language (MSIL), rather than emitting a function call), then the assembly returned by the GetCallingAssembly method is the assembly containing the inline code. This might be different from the assembly that contains the original method. To ensure that a method that calls the GetCallingAssembly method is not inlined by the compiler, you can apply the MethodImplAttribute attribute with MethodImplOptions.NoInlining.
However, the JITter is also free to inline calls - but I think a disassembler would be the only way to verify what is and isn't done at that level.
Edit: Just to clear up some confusion in this thread, csc.exe will inline MSIL calls - though the JITter will (probably) be more aggressive in it.
[1] And, by waste - I mean that (a) that it defeats the purpose of the inlining (better performance) because of the Reflection lookup. And (b), it'd probably change the inlining behavior so that it's no longer inlined anyway. And, before you think you can just turn it on Debug builds with an Assert or something - realize that it will not be inlined during Debug, but may be in Release.
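For completeness, the runtime probe mentioned above looks like this in code (a sketch; as noted, the reflection call can itself change the inlining decision):

using System;
using System.Reflection;

static class InlineProbe
{
    // Per the quote above: if the jitter inlines this method, the reported
    // name may be the caller's rather than "WhoAmI". Treat the result as a
    // hint only, since the probe perturbs the very thing it measures.
    static string WhoAmI()
    {
        return MethodBase.GetCurrentMethod().Name;
    }

    static void Main()
    {
        Console.WriteLine(WhoAmI());   // compare output between Debug and Release
    }
}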
Is there a way to force the compiler to inline this call, or should I just hope that the compiler will do this optimization?
If it is cheaper to inline the function, it will. So don't worry about it unless your profiler says that it actually is a problem.
For more information
JIT Enhancements in .NET 3.5 SP1
For simple code, you can even inspect the generated asm online: https://sharplab.io/
For more complex cases, try https://github.com/szehetner/InliningAnalyzer (I've not tried it yet).

Turning off JIT, and controlling codeflow in MSIL (own VM)

I'm writing my own scripting language in C#, with some features I like, and I chose to use MSIL as the output's bytecode (Reflection.Emit is quite useful, and I don't have to think up another bytecode). It works, emits an executable which can be run (even decompiled with Reflector :) ) and is quite fast.
But - I want to run multiple 'processes' in one process + one thread, and control their assigned CPU time manually (and also implement much more robust IPC than is offered by the .NET framework). Is there any way to entirely disable the JIT and create my own VM, stepping instruction after instruction using the .NET framework (and controlling memory usage, etc.), without writing anything on my own - or must I write an entire MSIL interpreter to achieve this?
EDIT 1): I know that interpreting IL isn't the fastest thing in the universe :)
EDIT 2): To clarify - I want my VM to be some kind of 'operating system': it gets some CPU time and divides it between processes, controls memory allocation for them, and so on. It doesn't have to be fast, nor efficient, but just a proof of concept for some of my experiments. I don't need to implement it at the level of processing every instruction - if that can be done by .NET, I won't mind; I just want to say: step one instruction, and wait till I tell you to step the next.
EDIT 3): I realized that ICorDebug can maybe accomplish my needs; now looking at the implementation of Mono's runtime.
You could use Mono - I believe that allows an option to interpret the IL instead of JITting it. The fact that it's open source means (subject to licensing) that you should be able to modify it according to your needs, too.
Mono doesn't have all of .NET's functionality, admittedly - but it may do all you need.
Beware that MSIL was designed to be parsed by a JIT compiler. It is not very suitable for an interpreter. A good example is perhaps the ADD instruction. It is used to add a wide variety of value type values: byte, short, int32, int64, ushort, uint32, uint64. Your compiler knows what kind of add is required but you'll lose that type info when generating the MSIL.
Now you need to recover that information at runtime, and that requires checking the types of the values on the evaluation stack. Very slow.
An easily interpreted IL has dedicated ADD instructions like ADD8, ADD16, etc.
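To illustrate, a toy dispatch loop over width-specific opcodes (everything here is invented for illustration; it is not real MSIL):

// With typed opcodes, the interpreter never has to inspect operand types
// at runtime; the opcode itself says exactly what to do.
enum Op : byte { Add8, Add16, Add32 }

static class TinyInterpreter
{
    public static int Run(Op[] code, int[] stack, int sp)
    {
        foreach (Op op in code)
        {
            switch (op)
            {
                case Op.Add32:
                    stack[sp - 2] += stack[sp - 1];   // plain 32-bit add
                    sp--;
                    break;
                case Op.Add16:
                    stack[sp - 2] = (short)(stack[sp - 2] + stack[sp - 1]);
                    sp--;
                    break;
                case Op.Add8:
                    stack[sp - 2] = (sbyte)(stack[sp - 2] + stack[sp - 1]);
                    sp--;
                    break;
            }
        }
        return stack[sp - 1];   // top of the evaluation stack
    }
}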
Microsoft's implementation of the Common Language Runtime has only one execution system, the JIT. Mono, on the other hand, comes with both a JIT and an interpreter.
I, however, do not fully understand what exactly you want to do yourself and what you would like to leave to Microsoft's implementation:
Is there any way to entirely disable the JIT and create my own VM?
and
... without writing anything on my own, or must I write an entire MSIL interpreter to achieve this?
is sort of contradictory.
If you think you can write a better execution system than Microsoft's JIT, you will have to write it from scratch. Bear in mind, however, that both Microsoft's and Mono's JITs are highly optimized compilers. (Programming language shootout)
Scheduling CPU time for operating system processes exactly is not possible from user mode; that's the operating system's task.
Some implementation of green threads might be an idea, but that is definitely a topic for unmanaged code. If that's what you want, have a look at the CLR hosting API.
I would suggest you try to implement your language in CIL. After all, it gets compiled down to raw x86. If you don't care about verifiability, you can use pointers where necessary.
One thing you could consider doing is generating code in a state-machine style. Let me explain what I mean by this.
When you write generator methods in C# with yield return, the method is compiled into an inner IEnumerator class that implements a state machine. The method's code is compiled into logical blocks that are terminated with a yield return or yield break statement, and each block corresponds to a numbered state. Because each yield return must provide a value, each block ends by storing a value in a local field. The enumerator object, in order to generate its next value, calls a method that consists of a giant switch statement on the current state number in order to run the current block, then advances the state and returns the value of the local field.
Your scripting language could generate its methods in a similar style, where a method corresponds to a state machine object, and the VM allocates time by advancing the state machine during the time allotted. A few tricky parts to this approach: implementing things like method calls and try/finally blocks is harder than generating straight-up MSIL.
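To make the shape concrete, here is a hand-rolled sketch of that kind of state machine (simplified; the compiler-generated class has more plumbing):

// A hand-rolled version of the state machine the C# compiler emits for:
//     IEnumerable<int> Counter() { yield return 1; yield return 2; }
class CounterStateMachine
{
    private int _state;
    public int Current { get; private set; }

    // Runs one block, stores the value, advances the state - a VM could
    // likewise advance a script "process" one step at a time.
    public bool MoveNext()
    {
        switch (_state)
        {
            case 0: Current = 1; _state = 1; return true;
            case 1: Current = 2; _state = 2; return true;
            default: return false;
        }
    }
}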
