How do the .NET JITs optimise generated code layout?

How do the .NET JITs optimise generated code layout? - c#

Back in 2009 I posted this answer to a question about optimisations for nested try/catch/finally blocks.
Thinking about this again some years later, it seems the question could be extended to that other control flow, not only try/catch/finally, but also if/else.
At each of these junctions, execution will follow one path. Code must be generated for both, obviously, but the order in which they're placed in memory, and the number of jumps required to navigate through them will differ.
The order generated code is laid out in memory has implications for the miss rate on the CPU's instruction cache. Having the instruction pipeline stalled, waiting for memory reads, can really kill loop performance.
I don't think loops (for/foreach/while) are a such a good fit unless you expect the loop have zero iterations more often than it has some, as the natural generation order seems pretty optimal.
Some questions:
In what ways do the available .NET JITs optimise for generated instruction order?
How much difference can this make in practice to common code? What about perfectly suited cases?
Is there anything the developer can do to influence this layout? What about mangling with the forbidden goto?
Does the specific JIT being used make much difference to layout?
Does the method inlining heuristic come into play here too?
Basically anything interesting related to this aspect of the JIT!
Some initial thoughts:
Moving catch blocks out of line is an easy job, as they're supposed to be the exceptional case by definition. Not sure this happens.
For some loops I suspect you can increase performance non-trivially. However in general I don't think it'll make that much difference.
I don't know how the JIT decides the order of generated code. In C on Linux you have likely(cond) and unlikely(cond) which you can use to tell to the compiler which branch is the common path to optimise for. I'm not sure that all compilers respect these macros.
Instruction ordering is distinct from the problem of branch prediction, in which the CPU guesses (on its own, afaik) which branch will be taken in order to start the pipeline (oversimplied steps: decode, fetch operands, execute, write back) on instructions, before the execute step has determined the value of the condition variable.
I can't think of any way to influence this order in the C# language. Perhaps you can manipulate it a bit by gotoing to labels explicitly, but is this portable, and are there any other problems with it?
Perhaps this is what profile guided optimisation is for. Do we have that in the .NET ecosystem, now or in plan? Maybe I'll go and have a read about LLILC.

The optimization you are referring to is called the code layout optimization which is defined as follows:
Those pieces of code that are executed close in time in the same thread should be be close in the virtual address space so that they fit in a single or few consecutive cache lines. This reduces cache misses.
Those pieces of code that are executed close in time in different threads should be be close in the virtual address space so that they fit in a single or few consecutive cache lines as long as there is no self-modifying code. This gets lower priority than the previous one. This reduces cache misses.
Those pieces of code that are executed frequently (hot code) should be close in the virtual address space so that they fit in as few virtual pages as possible. This reduces page faults and working set size.
Those pieces of code that are rarely executed (cold code) should be close in the virtual address space so that they fit in as few virtual pages as possible. This reduces page faults and working set size.
Now to your questions.
In what ways do the available .NET JITs optimise for generated
instruction order?
"Instruction order" is really a very general term. Many optimizations affect instruction order. I'll assume that you're referring to code layout.
JITters by design should take the minimum amount of time to compile code while at the same time produce high-quality code. To achieve this, they only perform the most important optimizations so that it's really worth spending time doing them. Code layout optimization is not one of them because without profiling, it may not be beneficial. While a JITter can certainly perform profiling and dynamic optimization, there is a generally preferred way.
How much difference can this make in practice to common code? What
about perfectly suited cases?
Code layout optimization by itself can improve overall performance typically by -1% (negative one) to 4%, which is enough to make compiler writers happy. I would like to add that it reduces energy consumption indirectly by reducing cache misses. The reduction in miss ratio of the instruction cache can be typically up to 35%.
Is there anything the developer can do to influence this layout? What
about mangling with the forbidden goto?
Yes, there are numerous ways. I would like to mention the generally recommended one which is mpgo.exe. Please do not use goto for this purpose. It's forbidden.
Does the specific JIT being used make much difference to layout?
No.
Does the method inlining heuristic come into play here too?
Inlining can indeed improve code layout with respect to function calls. It's one of the most important optimizations and all .NET JITs perform it.
Moving catch blocks out of line is an easy job, as they're supposed to
be the exceptional case by definition. Not sure this happens.
Yes it might be "easy", but what is the potential gained benefit? catch blocks are typically small in size (containing a call to a function that handles the exception). Handling this particular case of code layout does not seem promising. If you really care, use mpgo.exe.
I don't know how the JIT decides the order of generated code. In C on
Linux you have likely(cond) and unlikely(cond) which you can use to
tell to the compiler which branch is the common path to optimise for.
Using PGO is much more preferable over using likely(cond) and unlikely(cond) for two reasons:
The programmer might inadvertently make mistakes while placing likely(cond) and unlikely(cond) in the code. It actually happens a lot. Making big mistakes while trying to manually optimize the code is very typical.
Adding likely(cond) and unlikely(cond) all over the code makes it less maintainable in the future. You'll have to make sure that these hints hold every time you change the source code. In large code bases, this could be ( or rather is) a nightmare.
Instruction ordering is distinct from the problem of branch
prediction...
Assuming you are talking about code layout, yes they are distinct. But code layout optimization is usually guided by a profile which really includes branch statistics. Hardware branch prediction is of course totally different.
Maybe I'll go and have a read about LLILC.
While using mpgo.exe is the mainstream way of performing this optimization, you can use LLILC also since LLVM support profile-guided optimization as well. But I don't think you need to go this far.

Related

Repeated access to properties and speed in C# [duplicate]

Please ignore code readability in this question.
In terms of performance, should the following code be written like this:
int maxResults = criteria.MaxResults;
if (maxResults > 0)
{
while (accounts.Count > maxResults)
accounts.RemoveAt(maxResults);
}
or like this:
if (criteria.MaxResults > 0)
{
while (accounts.Count > criteria.MaxResults)
accounts.RemoveAt(criteria.MaxResults);
}
?
Edit: criteria is a class, and MaxResults is a simple integer property (i.e., public int MaxResults { get { return _maxResults; } }.
Does the C# compiler treat MaxResults as a black box and evaluate it every time? Or is it smart enough to figure out that I've got 3 calls to the same property with no modification of that property between the calls? What if MaxResults was a field?
One of the laws of optimization is precalculation, so I instinctively wrote this code like the first listing, but I'm curious if this kind of thing is being done for me automatically (again, ignore code readability).
(Note: I'm not interested in hearing the 'micro-optimization' argument, which may be valid in the specific case I've posted. I'd just like some theory behind what's going on or not going on.)

First off, the only way to actually answer performance questions is to actually try it both ways and test the results in realistic conditions.
That said, the other answers which say that "the compiler" does not do this optimization because the property might have side effects are both right and wrong. The problem with the question (aside from the fundamental problem that it simply cannot be answered without actually trying it and measuring the result) is that "the compiler" is actually two compilers: the C# compiler, which compiles to MSIL, and the JIT compiler, which compiles IL to machine code.
The C# compiler never ever does this sort of optimization; as noted, doing so would require that the compiler peer into the code being called and verify that the result it computes does not change over the lifetime of the callee's code. The C# compiler does not do so.
The JIT compiler might. No reason why it couldn't. It has all the code sitting right there. It is completely free to inline the property getter, and if the jitter determines that the inlined property getter returns a value that can be cached in a register and re-used, then it is free to do so. (If you don't want it to do so because the value could be modified on another thread then you already have a race condition bug; fix the bug before you worry about performance.)
Whether the jitter actually does inline the property fetch and then enregister the value, I have no idea. I know practically nothing about the jitter. But it is allowed to do so if it sees fit. If you are curious about whether it does so or not, you can either (1) ask someone who is on the team that wrote the jitter, or (2) examine the jitted code in the debugger.
And finally, let me take this opportunity to note that computing results once, storing the result and re-using it is not always an optimization. This is a surprisingly complicated question. There are all kinds of things to optimize for:
execution time
executable code size -- this has a major effect on executable time because big code takes longer to load, increases the working set size, puts pressure on processor caches, RAM and the page file. Small slow code is often in the long run faster than big fast code in important metrics like startup time and cache locality.
register allocation -- this also has a major effect on execution time, particularly in architectures like x86 which have a small number of available registers. Enregistering a value for fast re-use can mean that there are fewer registers available for other operations that need optimization; perhaps optimizing those operations instead would be a net win.
and so on. It get real complicated real fast.
In short, you cannot possibly know whether writing the code to cache the result rather than recomputing it is actually (1) faster, or (2) better performing. Better performance does not always mean making execution of a particular routine faster. Better performance is about figuring out what resources are important to the user -- execution time, memory, working set, startup time, and so on -- and optimizing for those things. You cannot do that without (1) talking to your customers to find out what they care about, and (2) actually measuring to see if your changes are having a measurable effect in the desired direction.

If MaxResults is a property then no, it will not optimize it, because the getter may have complex logic, say:
private int _maxResults;
public int MaxReuslts {
get { return _maxResults++; }
set { _maxResults = value; }
}
See how the behavior would change if it in-lines your code?
If there's no logic...either method you wrote is fine, it's a very minute difference and all about how readable it is TO YOU (or your team)...you're the one looking at it.

Your two code samples are only guaranteed to have the same result in single-threaded environments, which .Net isn't, and if MaxResults is a field (not a property). The compiler can't assume, unless you use the synchronization features, that criteria.MaxResults won't change during the course of your loop. If it's a property, it can't assume that using the property doesn't have side effects.
Eric Lippert points out quite correctly that it depends a lot on what you mean by "the compiler". The C# -> IL compiler? Or the IL -> machine code (JIT) compiler? And he's right to point out that the JIT may well be able to optimize the property getter, since it has all of the information (whereas the C# -> IL compiler doesn't, necessarily). It won't change the situation with multiple threads, but it's a good point nonetheless.

It will be called and evaluated every time. The compiler has no way of determining if a method (or getter) is deterministic and pure (no side effects).
Note that actual evaluation of the property may be inlined by the JIT compiler, making it effectively as fast as a simple field.
It's good practise to make property evaluation an inexpensive operation. If you do some heavy calculation in the getter, consider caching the result manually, or changing it to a method.

why not test it?
just set up 2 console apps make it look 10 million times and compare the results ... remember to run them as properly released apps that have been installed properly or else you cannot gurantee that you are not just running the msil.
Really you are probably going to get about 5 answers saying 'you shouldn't worry about optimisation'. they clearly do not write routines that need to be as fast as possible before being readable (eg games).
If this piece of code is part of a loop that is executed billions of times then this optimisation could be worthwhile. For instance max results could be an overridden method and so you may need to discuss virtual method calls.
Really the ONLY way to answer any of these questions is to figure out is this is a piece of code that will benefit from optimisation. Then you need to know the kinds of things that are increasing the time to execute. Really us mere mortals cannot do this a priori and so have to simply try 2-3 different versions of the code and then test it.

If criteria is a class type, I doubt it would be optimized, because another thread could always change that value in the meantime. For structs I'm not sure, but my gut feeling is that it won't be optimized, but I think it wouldn't make much difference in performance in that case anyhow.

Compile C#, so that it runs with the speed of C++

Alright, so I wanted to ask if it's actually possible to make a parser from c# to c++.
So that code written in C# would be able to run as fast as code written in C++.
Is it actually possible to do? I'm not asking how hard is it going to be.

What makes you think that translating your C# code to C++ would magically make it faster?
Languages don't have a speed. Assuming that C# code is slower (I'll get back to that), it is because of what that code does (including the implicit requirements placed by C#, such as bounds checking on arrays), and not because of the language it is written in.
If you converted your C# code to C++, it would still need to do bounds checking on arrays, because the original source code expected this to happen, so it would have to do just as much work.
Moreover, C# often isn't slower than C++. There are plenty of benchmarks floating around on the internet, generally showing that for the most part, C# is as fast as (or faster than) C++. Only when you spend a lot of time optimizing your code, does C++ become faster.
If you want faster code, you need to write code that requires less work to execute, not try to change the source language. That's just cargo-cult programming at its worst. You once saw some efficient code, and that was written in C++, so now you try to make things C++, in the hope of attracting any efficiency that might be passing by.
It just doesn't work that way.

Although you could translate C# code to C++, there would be the issue that C# depends on the .Net framework libraries which are not native, so you could not simply translate C# code to C++.
Update
Also C# code depends on the runtime to do things such as memory management i.e. Garbage Collection. If you translated the C# code to C++, where would the memory management code be? Parsing and translating is not going to fix issues like that.

The Mono project has invested quite a lot of energy in turning LLVM into a native machine code compiler for the C# runtime, although there are some problems with specific language constructs like shared generics etc.. Check it out and take it for a spin.

You can use NGen to compile IL to native code

Performance related tweaks:
Platform independent
use a profiler to spot the bottlenecks;
prevent unnecessary garbage (spot it using generation #0 collect count and the Large Object heap)
prevent unnecessary copying (use struct wisely)
prevent unwarranted generics (code-sharing has unexpected performance side effects)
prefer oldfashioned loops over enumerator blocks when performance is an issue
When using LINQ watch closely where you maintain/break deferred evaluation. Both can be enormous boosts to performance
use reflection.emit/Expression Trees to precompile certain dynamic logic that is performance bottleneck
Mono
use Mono --gc=sgen --optimize=inline,... (the SGEN garbage collector can make orders of magnitude difference). See also man mono for a lot of tuning/optimization options
use MONO_GENERIC_SHARING=none to disable sharing of generics (making particular tasks a lot quicker especially when supporting both valuetypes and reftypes) (not recommended for regular production use)
use the -optimize+ compile flag (optimizing the CLR code independently from what the JITter may do with that)
Less mainstream:
use the LLVM backend (click the quote:)
This allows Mono to benefit from all of the compiler optimizations done in LLVM. For example the SciMark score goes from 482 to 610.
use mkbundle to create a statically linked NATIVE binary image (already fully JITted, i.e. AOT (ahead-of-time compiled))
MS .NET
Most of the above have direct Microsoft pendants (NGen, `/Optimize' etc.)
Of course MS don't have a switchable/tunable garbage collector, and I don't think a fully compiled native binary can be achieved like with mono.

As always the answer to making code run faster is:
Find the bottleneck and optimize that
Most of the time the bottleneck is either:
time spend in a critical loop
Review your algorithm and datastructure, do not change the language, the latter will give a 10% speedup, the first will give you a 1000x speedup.
If you're stuck on the best algorithm, you can always ask a specific, short and detailed question on SO.
time waiting for resources for a slow source
Reduce the amount of stuff you're requesting from the source
instead of:
SELECT * FROM bigtable
do
SELECT TOP 10 * FROM bigtable ORDER BY xxx
The latter will return instantly and you cannot show a million records in a meaningful way anyhow.
Or you can have the server at the order end reduce the data so that it doesn't take a 100 years to cross the network.
Alternativly you can execute the slow datafetch routine in a separate thread, so the rest of your program can do meaningful stuff instead of waiting.
Time spend because you are overflowing memory with Gigabytes of data
Use a different algorithm that works on a smaller dataset at a time.
Try to optimize cache usage.
The answer to efficient coding is measure where your coding time goes
Use a profiler.
see: http://csharp-source.net/open-source/profilers
And optimize those parts that eat more than 50% of your CPU time.
Do this for a number of iterations, and soon your 10 hour running time will be down to a manageable 3 minutes, instead of the 9.5 hours that you will get from switching to this or that better language.

When is optimization premature? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
I see this term used a lot but I feel like most people use it out of laziness or ignorance. For instance, I was reading this article:
http://blogs.msdn.com/b/ricom/archive/2006/09/07/745085.aspx
where he talks about his decisions he makes to implement the types necessary for his app.
If it was me, talking about these for code that we need to write, other programmers would think either:
I am thinking way too much ahead when there is nothing and thus prematurely optimizing.
Over-thinking insignificant details when there is no slowdowns or performance problems experienced.
or both.
and would suggest to just implement it and not worry about these until they become a problem.
Which is more preferential?
How to make the differentiation between premature optimization vs informed decision making for a performance critical application before any implementation is done?

Optimization is premature if:
Your application isn't doing anything time-critical. (Which means, if you're writing a program that adds up 500 numbers in a file, the word "optimization" shouldn't even pop into your brain, since all it'll do is waste your time.)
You're doing something time-critical in something other than assembly, and still worrying whether i++; i++; is faster or i += 2... if it's really that critical, you'd be working in assembly and not wasting time worrying about this. (Even then, this particular example most likely won't matter.)
You have a hunch that one thing might be a bit faster than the other, but you need to look it up. For example, if something is bugging you about whether StopWatch is faster or Environment.TickCount, it's premature optimization, since if the difference was bigger, you'd probably be more sure and wouldn't need to look it up.
If you have a guess that something might be slow but you're not too sure, just put a //NOTE: Performance? comment, and if you later run into bottlenecks, check such places in your code. I personally don't worry about optimizations that aren't too obvious; I just use a profiler later, if I need to.
Another technique:
I just run my program, randomly break into it with the debugger, and see where it stopped -- wherever it stops is likely a bottleneck, and the more often it stops there, the worse the bottleneck. It works almost like magic. :)

This proverb does not (I believe) refer to optimizations that are built into a good design as it is created. It refers to tasks specifically targeted at performance, which otherwise would not be undertaken.
This kind of optimization does not "become" premature, according to the common wisdom — it is guilty until proven innocent.

Optimisation is the process of making existing code run more efficiently (faster speed, and/or less resource usage)
All optimisation is premature if the programmer has not proven that it is necessary. (For example, by running the code to determine if it achieves the correct results in an acceptable timeframe. This could be as simple as running it to "see" if it runs fast enough, or running under a profiler to analyze it more carefully).
There are several stages to programming something well:
1) Design the solution and pick a good, efficient algorithm.
2) Implement the solution in a maintainable, well coded manner.
3) Test the solution and see if it meets your requirements on speed, RAM usage, etc. (e.g. "When the user clicks "Save", does it take less than 1 second?" If it takes 0.3s, you really don't need to spend a week optimising it to get that time down to 0.2s)
4) IF it does not meet the requirements, consider why. In most cases this means go to step (1) to find a better algorithm now that you understand the problem better. (Writing a quick prototype is often a good way of exploring this cheaply)
5) IF it still does not meet the requirements, start considering optimisations that may help speed up the runtime (for example, look-up tables, caching, etc). To drive this process, profiling is usually an important tool to help you locate the bottle-necks and inefficiences in the code, so you can make the greatest gain for the time you spend on the code.
I should point out that an experienced programmer working on a reasonably familiar problem may be able to jump through the first steps mentally and then just apply a pattern, rather than physically going through this process every time, but this is simply a short cut that is gained through experience
Thus, there are many "optimisations" that experienced programmers will build into their code automatically. These are not "premature optimisations" so much as "common-sense efficiency patterns". These patterns are quick and easy to implement, but vastly improve the efficiency of the code, and you don't need to do any special timing tests to work out whether or not they will be of benefit:
Not putting unnecessary code into loops. (Similar to the optimisation of removing unnecessary code from existing loops, but it doesn't involve writing the code twice!)
Storing intermediate results in variables rather than re-calculating things over and over.
Using look-up tables to provide precomputed values rather than calculating them on the fly.
Using appropriate-sized data structures (e.g. storing a percentage in a byte (8 bits) rather than a long (64 bits) will use 8 times less RAM)
Drawing a complex window background using a pre-drawn image rather than drawing lots of individual components
Applying compression to packets of data you intend to send over a low-speed connection to minimise the bandwidth usage.
Drawing images for your web page in a style that allows you to use a format that will get high quality and good compression.
And of course, although it's not technically an "optmisation", choosing the right algorithm in the first place!
For example, I just replaced an old piece of code in our project. My new code is not "optimised" in any way, but (unlike the original implementation) it was written with efficiency in mind. The result: Mine runs 25 times faster - simply by not being wasteful. Could I optimise it to make it faster? Yes, I could easily get another 2x speedup. Will I optimise my code to make it faster? No - a 5x speed improvement would have been sufficient, and I have already achieved 25x. Further work at this point would just be a waste of precious programming time. (But I can revisit the code in future if the requirements change)
Finally, one last point: The area you are working in dictates the bar you must meet. If you are writing a graphics engine for a game or code for a real-time embedded controller, you may well find yourself doing a lot of optimisation. If you are writing a desktop application like a notepad, you may never need to optimise anything as long as you aren't overly wasteful.

When starting out, just delivering a product is more important than optimizing.
Over time you are going to profile various applications and will learn coding skills that will naturally lead to optimized code. Basically at some point you'll be able to spot potential trouble spots and build things accordingly.
However don't sweat it until you've found an actual problem.

Premature optimization is making an optimization for performance at the cost of some other positive attribute of your code (e.g. readability) before you know that it is necessary to make this tradeoff.
Usually premature optimizations are made during the development process without using any profiling tools to find bottlenecks in the code. In many cases the optimization will make the code harder to maintain and sometimes also increases the development time, and therefore the cost of the software. Worse... some premature optimizations turn out not to be make the code any faster at all and in some cases can even make the code slower than it was before.

When you have less that 10 years of coding experience.

Having (lots of) experience might be a trap. I know many very experienced programmers (C\C++, assembly) who tend to worry too much because they are used to worry about clock ticks and superfluous bits.
There are areas such as embedded or realtime systems where these do count but in regular OLTP/LOB apps most of your effort should be directed towards maintainability, readability and changeabilty.

Optimization is tricky. Consider the following examples:
Deciding on implementing two servers, each doing its own job, instead of implementing a single server that will do both jobs.
Deciding to go with one DBMS and not another, for performance reasons.
Deciding to use a specific, non-portable API when there is a standard (e.g., using Hibernate-specific functionality when you basically need the standard JPA), for performance reasons.
Coding something in assembly for performance reasons.
Unrolling loops for performance reasons.
Writing a very fast but obscure piece of code.
My bottom line here is simple. Optimization is a broad term. When people talk about premature optimization, they don't mean you need to just do the first thing that comes to mind without considering the complete picture. They are saying you should:
Concentrate on the 80/20 rule - don't consider ALL the possible cases, but the most probable ones.
Don't over-design stuff without any good reason.
Don't write code that is not clear, simple and easily maintainable if there is no real, immediate performance problem with it.
It really all boils down to your experience. If you are an expert in image processing, and someone requests you do something you did ten times before, you will probably push all your known optimizations right from the beginning, but that would be ok. Premature optimization is when you're trying to optimize something when you don't know it needs optimization to begin with. The reason for that is simple - it's risky, it's wasting your time, and it will be less maintainable. So unless you're experienced and you've been down that road before, don't optimize if you don't know there's a problem.

Note that optimization is not free (as in beer)
it takes more time to write
it takes more time to read
it takes more time to test
it takes more time to debug
...
So before optimizing anything, you should be sure it's worth it.
That Point3D type you linked to seems like the cornerstone of something, and the case for optimization was probably obvious.
Just like the creators of the .NET library didn't need any measurements before they started optimizing System.String. They would have to measure during though.
But most code does not play a significant role in the performance of the end product. And that means any effort in optimization is wasted.
Besides all that, most 'premature optimizations' are untested/unmeasured hacks.

Optimizations are premature if you spend too much time designing those during the earlier phases of implementation. During the early stages, you have better things to worry about: getting core code implemented, unit tests written, systems talking to each other, UI, and whatever else. Optimizing comes with a price, and you might well be wasting time on optimizing something that doesn't need to be, all the while creating code that is harder to maintain.
Optimizations only make sense when you have concrete performance requirements for your project, and then performance will matter after the initial development and you have enough of your system implemented in order to actually measure whatever it is you need to measure. Never optimize without measuring.
As you gain more experience, you can make your early designs and implementations with a small eye towards future optimizations, that is, try to design in such a way that will make it easier to measure performance and optimize later on, should that even be necessary. But even in this case, you should spend little time on optimizations in the early phases of development.

Tips for optimizing C#/.NET programs [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
It seems like optimization is a lost art these days. Wasn't there a time when all programmers squeezed every ounce of efficiency from their code? Often doing so while walking five miles in the snow?
In the spirit of bringing back a lost art, what are some tips that you know of for simple (or perhaps complex) changes to optimize C#/.NET code? Since it's such a broad thing that depends on what one is trying to accomplish it'd help to provide context with your tip. For instance:
When concatenating many strings together use StringBuilder instead. See link at the bottom for caveats on this.
Use string.Compare to compare two strings instead of doing something like string1.ToLower() == string2.ToLower()
The general consensus so far seems to be measuring is key. This kind of misses the point: measuring doesn't tell you what's wrong, or what to do about it if you run into a bottleneck. I ran into the string concatenation bottleneck once and had no idea what to do about it, so these tips are useful.
My point for even posting this is to have a place for common bottlenecks and how they can be avoided before even running into them. It's not even necessarily about plug and play code that anyone should blindly follow, but more about gaining an understanding that performance should be thought about, at least somewhat, and that there's some common pitfalls to look out for.
I can see though that it might be useful to also know why a tip is useful and where it should be applied. For the StringBuilder tip I found the help I did long ago at here on Jon Skeet's site.

It seems like optimization is a lost art these days.
There was once a day when manufacture of, say, microscopes was practiced as an art. The optical principles were poorly understood. There was no standarization of parts. The tubes and gears and lenses had to be made by hand, by highly skilled workers.
These days microscopes are produced as an engineering discipline. The underlying principles of physics are extremely well understood, off-the-shelf parts are widely available, and microscope-building engineers can make informed choices as to how to best optimize their instrument to the tasks it is designed to perform.
That performance analysis is a "lost art" is a very, very good thing. That art was practiced as an art. Optimization should be approached for what it is: an engineering problem solvable through careful application of solid engineering principles.
I have been asked dozens of times over the years for my list of "tips and tricks" that people can use to optimize their vbscript / their jscript / their active server pages / their VB / their C# code. I always resist this. Emphasizing "tips and tricks" is exactly the wrong way to approach performance. That way leads to code which is hard to understand, hard to reason about, hard to maintain, that is typically not noticably faster than the corresponding straightforward code.
The right way to approach performance is to approach it as an engineering problem like any other problem:
Set meaningful, measurable, customer-focused goals.
Build test suites to test your performance against these goals under realistic but controlled and repeatable conditions.
If those suites show that you are not meeting your goals, use tools such as profilers to figure out why.
Optimize the heck out of what the profiler identifies as the worst-performing subsystem. Keep profiling on every change so that you clearly understand the performance impact of each.
Repeat until one of three things happens (1) you meet your goals and ship the software, (2) you revise your goals downwards to something you can achieve, or (3) your project is cancelled because you could not meet your goals.
This is the same as you'd solve any other engineering problem, like adding a feature -- set customer focused goals for the feature, track progress on making a solid implementation, fix problems as you find them through careful debugging analysis, keep iterating until you ship or fail. Performance is a feature.
Performance analysis on complex modern systems requires discipline and focus on solid engineering principles, not on a bag full of tricks that are narrowly applicable to trivial or unrealistic situations. I have never once solved a real-world performance problem through application of tips and tricks.

Get a good profiler.
Don't bother even trying to optimize C# (really, any code) without a good profiler. It actually helps dramatically to have both a sampling and a tracing profiler on hand.
Without a good profiler, you're likely to create false optimizations, and, most importantly, optimize routines that aren't a performance problem in the first place.
The first three steps to profiling should always be 1) Measure, 2) measure, and then 3) measure....

Optimization guidelines:
Don't do it unless you need to
Don't do it if it's cheaper to throw new hardware at the problem instead of a developer
Don't do it unless you can measure the changes in a production-equivalent environment
Don't do it unless you know how to use a CPU and a Memory profiler
Don't do it if it's going to make your code unreadable or unmaintainable
As processors continue to get faster the main bottleneck in most applications isn't CPU, it's bandwidth: bandwidth to off-chip memory, bandwidth to disk and bandwidth to net.
Start at the far end: use YSlow to see why your web site is slow for end-users, then move back and fix you database accesses to be not too wide (columns) and not too deep (rows).
In the very rare cases where it's worth doing anything to optimize CPU usage be careful that you aren't negatively impacting memory usage: I've seen 'optimizations' where developers have tried to use memory to cache results to save CPU cycles. The net effect was to reduce the available memory to cache pages and database results which made the application run far slower! (See rule about measuring.)
I've also seen cases where a 'dumb' un-optimized algorithm has beaten a 'clever' optimized algorithm. Never underestimate how good compiler-writers and chip-designers have become at turning 'inefficient' looping code into super efficient code that can run entirely in on-chip memory with pipelining. Your 'clever' tree-based algorithm with an unwrapped inner loop counting backwards that you thought was 'efficient' can be beaten simply because it failed to stay in on-chip memory during execution. (See rule about measuring.)

When working with ORMs be aware of N+1 Selects.
List<Order> _orders = _repository.GetOrders(DateTime.Now);
foreach(var order in _orders)
{
Print(order.Customer.Name);
}
If the customers are not eagerly loaded this could result in several round trips to the database.

Don't use magic numbers, use enumerations
Don't hard-code values
Use generics where possible since it's typesafe & avoids boxing & unboxing
Use an error handler where it's absolutely needed
Dispose, dispose, dispose. CLR wound't know how to close your database connections, so close them after use and dispose of unmanaged resources
Use common-sense!

OK, I have got to throw in my favorite: If the task is long enough for human interaction, use a manual break in the debugger.
Vs. a profiler, this gives you a call stack and variable values you can use to really understand what's going on.
Do this 10-20 times and you get a good idea of what optimization might really make a difference.

If you identify a method as a bottleneck, but you don't know what to do about it, you are essentially stuck.
So I'll list a few things. All of these things are not silver bullets and you will still have to profile your code. I'm just making suggestions for things you could do and can sometimes help. Especially the first three are important.
Try solving the problem using just (or: mainly) low-level types or arrays of them.
Problems are often small - using a smart but complex algorithm does not always make you win, especially if the less-smart algorithm can be expressed in code that only uses (arrays of) low level types. Take for example InsertionSort vs MergeSort for n<=100 or Tarjan's Dominator finding algorithm vs using bitvectors to naively solve the data-flow form of the problem for n<=100. (the 100 is of course just to give you some idea - profile!)
Consider writing a special case that can be solved using just low-level types (often problem instances of size < 64), even if you have to keep the other code around for larger problem instances.
Learn bitwise arithmetic to help you with the two ideas above.
BitArray can be your friend, compared to Dictionary, or worse, List. But beware that the implementation is not optimal; You can write a faster version yourself. Instead of testing that your arguments are out of range etc., you can often structure your algorithm so that the index can not go out of range anyway - but you can not remove the check from the standard BitArray and it is not free.
As an example of what you can do with just arrays of low level types, the BitMatrix is a rather powerful structure that can be implemented as just an array of ulongs and you can even traverse it using an ulong as "front" because you can take the lowest order bit in constant time (compared with the Queue in Breadth First Search - but obviously the order is different and depends on the index of the items rather than purely the order in which you find them).
Division and modulo are really slow unless the right hand side is a constant.
Floating point math is not in general slower than integer math anymore (not "something you can do", but "something you can skip doing")
Branching is not free. If you can avoid it using a simple arithmetic (anything but division or modulo) you can sometimes gain some performance. Moving a branch to outside a loop is almost always a good idea.

People have funny ideas about what actually matters. Stack Overflow is full of questions about, for example, is ++i more "performant" than i++. Here's an example of real performance tuning, and it's basically the same procedure for any language. If code is simply written a certain way "because it's faster", that's guessing.
Sure, you don't purposely write stupid code, but if guessing worked, there would be no need for profilers and profiling techniques.

The truth is that there is no such thing as the perfect optimised code. You can, however, optimise for a specific portion of code, on a known system (or set of systems) on a known CPU type (and count), a known platform (Microsoft? Mono?), a known framework / BCL version, a known CLI version, a known compiler version (bugs, specification changes, tweaks), a known amount of total and available memory, a known assembly origin (GAC? disk? remote?), with known background system activity from other processes.
In the real world, use a profiler, and look at the important bits; usually the obvious things are anything involving I/O, anything involving threading (again, this changes hugely between versions), and anything involving loops and lookups, but you might be surprised at what "obviously bad" code isn't actually a problem, and what "obviously good" code is a huge culprit.

Tell the compiler what to do, not how to do it. As an example, foreach (var item in list) is better than for (int i = 0; i < list.Count; i++) and m = list.Max(i => i.value); is better than list.Sort(i => i.value); m = list[list.Count - 1];.
By telling the system what you want to do it can figure out the best way to do it. LINQ is good because its results aren't computed until you need them. If you only ever use the first result, it doesn't have to compute the rest.
Ultimately (and this applies to all programming) minimize loops and minimize what you do in loops. Even more important is to minimize the number of loops inside your loops. What's the difference between an O(n) algorithm and an O(n^2) algorithm? The O(n^2) algorithm has a loop inside of a loop.

I don't really try to optimize my code but at times I will go through and use something like reflector to put my programs back to source. It is interesting to then compare what I wrong with what the reflector will output. Sometimes I find that what I did in a more complicated form was simplified. May not optimize things but helps me to see simpler solutions to problems.

What is code optimization?

When said this code need some optimization, or can be some how optimized, what does that mean? which kind of code need optimization? How to apply optimization to the code in c#? What the benefits from that?

Optimization is a very broad term. In general it implies modifying the system to make some of its aspect to work more efficiently or use fewer resources or be more robust. For example, a computer program may be optimized so that it will execute faster or use less memory or disk storage or be more responsive in terms of UI.
Although "optimization" has the same root as "optimal", the process of optimization does not produce a totally optimal system: there's always a trade-off, so only attributes of greatest interest are optimized.
And remember:
The First Rule of Program Optimization: Don't do it. The Second Rule of Program Optimization (for experts only!): Don't do it yet. (Michael A. Jackson)

Optimization is the process of modifying a system to make some aspect of it work more efficiently or use fewer resources.
In your case refers mainly to 2 levels:
Design level
At the highest level, the design may be optimized to make best use of the available resources. The implementation of this design will benefit from a good choice of efficient algorithms and the implementation of these algorithms will benefit from writing good quality code. The architectural design of a system overwhelmingly affects its performance. The choice of algorithm affects efficiency more than any other item of the design. In some cases, however, optimization relies on using fancier algorithms, making use of special cases and special tricks and performing complex trade-offs; thus, a fully optimized program can sometimes, if insufficiently commented, be more difficult for less experienced programmers to comprehend and hence may contain more faults than unoptimized versions.
Source code level
Avoiding bad quality coding can also improve performance, by avoiding obvious slowdowns. After that, however, some optimizations are possible which actually decrease maintainability; some, but not all of them can nowadays be performed by optimizing compilers. For instance, using more indirection is often needed to simplify or improve a software, but that indirection has a cost.

Code optimization is making code run faster. There are two primary ways of doing this:
1) Squeezing more work into less cycles. Figure out where the code is doing an extra copy or if there is a branch in a tight loop. This is optimizing in the small.
2) Making your algorithms scale better. You may have heard of "Big O" notation. This is making an algorithm degrade much less quickly with large sets of data.
For instance, if you naively search a phone book for a name you will start on page 1 and read all the names until you find the one you are looking for. This will take a number of instructions scaled by the number of names in the phone book. We call this O(n). Now think about how you really search the phone book. You open to some place toward the middle and see which side the name you are looking for is on. This is called a binary search and scales at the logarithm of the number of names. We call this O(logn). It's much faster.
Remember the first rule of optimization: Measure first. Many man years have been spent optimizing code that wasn't run very much.

When doing code optimization, you take a metric on your code and try to make it more efficient. The metric usually refers to a scarce resource.
Here are common metrics
Execution speed (usually the first that comes to mind when saying optimization)
Memory consumption
Executable size (on embedded systems it can be important)
Database access
Remote service access (Make it less chatty, caching..)
Simplicity, readability, maintainability of the code
After optimization the code should give the same result.
The problem is that you have to make choices. Execution speed often comes with more memory consuption...
You should also alwas consider optimization globally. Having a gain of 10ms in a loop when you then spend 1000ms waiting for a web service is totaly useless.

To add to Anton Gogolev's answer, when a piece of code needs optimisation, it is because a particular performance requirement is not met. We develop programs to meet users requirements, right? Most programmers tend to think largely in terms of functional requirements, i.e. what the program does, but users will also have performance requirements, what is the resource cost (network bandwidth, CPU cycles, memory, disk space, etc...) of providing the functionality. Optimization is the process of changing a piece of code to meet a specific performance requirement. IMHO this should happen at design time, but you will sometimes write a piece of code only to discover it underperforms. To optimize the code, you first have to find out which is the resource that you are over using. If it is CPU cycles or memory, a profiler might help. If it is network bandwidth, which is a very common one these days, you will need to do some load testing and comms profiling.
My advice would be to always understand your current and probable future perfromance requirements before writing code, and optimize at design stage. Late optimization is expensive, difficult, and often either fails or results in ugly code.

Optimization has two main purposes:
getting your software use less resources, e.g., run faster, be smaller, use less RAM, less hard disk space both when running and when storing documents, less network access, ...
getting your software be more maintainable, by refactoring it.
You don't need to optimize as long as no related issue has been raised: It is far more difficult to debug optimized code than to optimize correct code.

It might be for example that the code has a block of code which is duplicated, and could/should be put into a method, you might be using deprecated methods/classes, there might be simpler ways to do what the code is doing, there might be some cleaning up to do (e.g. remove hard coding) etc...

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.