Many IF statements vs Delegates?

Many IF statements vs Delegates? - c#

Would it be better for performance to use if statements over and over again, or to use them once and use delegates to call functions based on the output from the IF statements? I want to say that the answer is obviously delegates, but I'm not sure if going to different methods over and over is faster or slower than many IF statements that do the same thing. I hope I explained it right.
PS The framework I need to know this for is XNA, if it matters.

You have your trade offs. The best answer was commented already and that is to profile both and then figure it out. IF statements may take more CPU because it has to do the comparisons again and again. On the other hand using delegates takes more memory and it's another object you need to keep around.
Personally what I've like doing the best (when applicable, don't know the full context of your question) is turning your IF ELSE statements into a switch-case. This works really well for state machines and other repetitive processes plus you eliminate all that branching that comes with IFs. However, this is assuming that the values your are checking for are all relatively close in range or else you'll be causing a lot of pain for the compiler.

That is a useful technique, sure. Clearly, for only a few if's delegates don't make sense because they are costly and the CPU cannot easily predict the branch target. And for very many if's a one-time initialized delegate makes sense.
It is unclear where the break-even point is. That one needs to be measured.

Related

Repeated access to properties and speed in C# [duplicate]

Please ignore code readability in this question.
In terms of performance, should the following code be written like this:
int maxResults = criteria.MaxResults;
if (maxResults > 0)
{
while (accounts.Count > maxResults)
accounts.RemoveAt(maxResults);
}
or like this:
if (criteria.MaxResults > 0)
{
while (accounts.Count > criteria.MaxResults)
accounts.RemoveAt(criteria.MaxResults);
}
?
Edit: criteria is a class, and MaxResults is a simple integer property (i.e., public int MaxResults { get { return _maxResults; } }.
Does the C# compiler treat MaxResults as a black box and evaluate it every time? Or is it smart enough to figure out that I've got 3 calls to the same property with no modification of that property between the calls? What if MaxResults was a field?
One of the laws of optimization is precalculation, so I instinctively wrote this code like the first listing, but I'm curious if this kind of thing is being done for me automatically (again, ignore code readability).
(Note: I'm not interested in hearing the 'micro-optimization' argument, which may be valid in the specific case I've posted. I'd just like some theory behind what's going on or not going on.)

First off, the only way to actually answer performance questions is to actually try it both ways and test the results in realistic conditions.
That said, the other answers which say that "the compiler" does not do this optimization because the property might have side effects are both right and wrong. The problem with the question (aside from the fundamental problem that it simply cannot be answered without actually trying it and measuring the result) is that "the compiler" is actually two compilers: the C# compiler, which compiles to MSIL, and the JIT compiler, which compiles IL to machine code.
The C# compiler never ever does this sort of optimization; as noted, doing so would require that the compiler peer into the code being called and verify that the result it computes does not change over the lifetime of the callee's code. The C# compiler does not do so.
The JIT compiler might. No reason why it couldn't. It has all the code sitting right there. It is completely free to inline the property getter, and if the jitter determines that the inlined property getter returns a value that can be cached in a register and re-used, then it is free to do so. (If you don't want it to do so because the value could be modified on another thread then you already have a race condition bug; fix the bug before you worry about performance.)
Whether the jitter actually does inline the property fetch and then enregister the value, I have no idea. I know practically nothing about the jitter. But it is allowed to do so if it sees fit. If you are curious about whether it does so or not, you can either (1) ask someone who is on the team that wrote the jitter, or (2) examine the jitted code in the debugger.
And finally, let me take this opportunity to note that computing results once, storing the result and re-using it is not always an optimization. This is a surprisingly complicated question. There are all kinds of things to optimize for:
execution time
executable code size -- this has a major effect on executable time because big code takes longer to load, increases the working set size, puts pressure on processor caches, RAM and the page file. Small slow code is often in the long run faster than big fast code in important metrics like startup time and cache locality.
register allocation -- this also has a major effect on execution time, particularly in architectures like x86 which have a small number of available registers. Enregistering a value for fast re-use can mean that there are fewer registers available for other operations that need optimization; perhaps optimizing those operations instead would be a net win.
and so on. It get real complicated real fast.
In short, you cannot possibly know whether writing the code to cache the result rather than recomputing it is actually (1) faster, or (2) better performing. Better performance does not always mean making execution of a particular routine faster. Better performance is about figuring out what resources are important to the user -- execution time, memory, working set, startup time, and so on -- and optimizing for those things. You cannot do that without (1) talking to your customers to find out what they care about, and (2) actually measuring to see if your changes are having a measurable effect in the desired direction.

If MaxResults is a property then no, it will not optimize it, because the getter may have complex logic, say:
private int _maxResults;
public int MaxReuslts {
get { return _maxResults++; }
set { _maxResults = value; }
}
See how the behavior would change if it in-lines your code?
If there's no logic...either method you wrote is fine, it's a very minute difference and all about how readable it is TO YOU (or your team)...you're the one looking at it.

Your two code samples are only guaranteed to have the same result in single-threaded environments, which .Net isn't, and if MaxResults is a field (not a property). The compiler can't assume, unless you use the synchronization features, that criteria.MaxResults won't change during the course of your loop. If it's a property, it can't assume that using the property doesn't have side effects.
Eric Lippert points out quite correctly that it depends a lot on what you mean by "the compiler". The C# -> IL compiler? Or the IL -> machine code (JIT) compiler? And he's right to point out that the JIT may well be able to optimize the property getter, since it has all of the information (whereas the C# -> IL compiler doesn't, necessarily). It won't change the situation with multiple threads, but it's a good point nonetheless.

It will be called and evaluated every time. The compiler has no way of determining if a method (or getter) is deterministic and pure (no side effects).
Note that actual evaluation of the property may be inlined by the JIT compiler, making it effectively as fast as a simple field.
It's good practise to make property evaluation an inexpensive operation. If you do some heavy calculation in the getter, consider caching the result manually, or changing it to a method.

why not test it?
just set up 2 console apps make it look 10 million times and compare the results ... remember to run them as properly released apps that have been installed properly or else you cannot gurantee that you are not just running the msil.
Really you are probably going to get about 5 answers saying 'you shouldn't worry about optimisation'. they clearly do not write routines that need to be as fast as possible before being readable (eg games).
If this piece of code is part of a loop that is executed billions of times then this optimisation could be worthwhile. For instance max results could be an overridden method and so you may need to discuss virtual method calls.
Really the ONLY way to answer any of these questions is to figure out is this is a piece of code that will benefit from optimisation. Then you need to know the kinds of things that are increasing the time to execute. Really us mere mortals cannot do this a priori and so have to simply try 2-3 different versions of the code and then test it.

If criteria is a class type, I doubt it would be optimized, because another thread could always change that value in the meantime. For structs I'm not sure, but my gut feeling is that it won't be optimized, but I think it wouldn't make much difference in performance in that case anyhow.

How do the .NET JITs optimise generated code layout?

Back in 2009 I posted this answer to a question about optimisations for nested try/catch/finally blocks.
Thinking about this again some years later, it seems the question could be extended to that other control flow, not only try/catch/finally, but also if/else.
At each of these junctions, execution will follow one path. Code must be generated for both, obviously, but the order in which they're placed in memory, and the number of jumps required to navigate through them will differ.
The order generated code is laid out in memory has implications for the miss rate on the CPU's instruction cache. Having the instruction pipeline stalled, waiting for memory reads, can really kill loop performance.
I don't think loops (for/foreach/while) are a such a good fit unless you expect the loop have zero iterations more often than it has some, as the natural generation order seems pretty optimal.
Some questions:
In what ways do the available .NET JITs optimise for generated instruction order?
How much difference can this make in practice to common code? What about perfectly suited cases?
Is there anything the developer can do to influence this layout? What about mangling with the forbidden goto?
Does the specific JIT being used make much difference to layout?
Does the method inlining heuristic come into play here too?
Basically anything interesting related to this aspect of the JIT!
Some initial thoughts:
Moving catch blocks out of line is an easy job, as they're supposed to be the exceptional case by definition. Not sure this happens.
For some loops I suspect you can increase performance non-trivially. However in general I don't think it'll make that much difference.
I don't know how the JIT decides the order of generated code. In C on Linux you have likely(cond) and unlikely(cond) which you can use to tell to the compiler which branch is the common path to optimise for. I'm not sure that all compilers respect these macros.
Instruction ordering is distinct from the problem of branch prediction, in which the CPU guesses (on its own, afaik) which branch will be taken in order to start the pipeline (oversimplied steps: decode, fetch operands, execute, write back) on instructions, before the execute step has determined the value of the condition variable.
I can't think of any way to influence this order in the C# language. Perhaps you can manipulate it a bit by gotoing to labels explicitly, but is this portable, and are there any other problems with it?
Perhaps this is what profile guided optimisation is for. Do we have that in the .NET ecosystem, now or in plan? Maybe I'll go and have a read about LLILC.

The optimization you are referring to is called the code layout optimization which is defined as follows:
Those pieces of code that are executed close in time in the same thread should be be close in the virtual address space so that they fit in a single or few consecutive cache lines. This reduces cache misses.
Those pieces of code that are executed close in time in different threads should be be close in the virtual address space so that they fit in a single or few consecutive cache lines as long as there is no self-modifying code. This gets lower priority than the previous one. This reduces cache misses.
Those pieces of code that are executed frequently (hot code) should be close in the virtual address space so that they fit in as few virtual pages as possible. This reduces page faults and working set size.
Those pieces of code that are rarely executed (cold code) should be close in the virtual address space so that they fit in as few virtual pages as possible. This reduces page faults and working set size.
Now to your questions.
In what ways do the available .NET JITs optimise for generated
instruction order?
"Instruction order" is really a very general term. Many optimizations affect instruction order. I'll assume that you're referring to code layout.
JITters by design should take the minimum amount of time to compile code while at the same time produce high-quality code. To achieve this, they only perform the most important optimizations so that it's really worth spending time doing them. Code layout optimization is not one of them because without profiling, it may not be beneficial. While a JITter can certainly perform profiling and dynamic optimization, there is a generally preferred way.
How much difference can this make in practice to common code? What
about perfectly suited cases?
Code layout optimization by itself can improve overall performance typically by -1% (negative one) to 4%, which is enough to make compiler writers happy. I would like to add that it reduces energy consumption indirectly by reducing cache misses. The reduction in miss ratio of the instruction cache can be typically up to 35%.
Is there anything the developer can do to influence this layout? What
about mangling with the forbidden goto?
Yes, there are numerous ways. I would like to mention the generally recommended one which is mpgo.exe. Please do not use goto for this purpose. It's forbidden.
Does the specific JIT being used make much difference to layout?
No.
Does the method inlining heuristic come into play here too?
Inlining can indeed improve code layout with respect to function calls. It's one of the most important optimizations and all .NET JITs perform it.
Moving catch blocks out of line is an easy job, as they're supposed to
be the exceptional case by definition. Not sure this happens.
Yes it might be "easy", but what is the potential gained benefit? catch blocks are typically small in size (containing a call to a function that handles the exception). Handling this particular case of code layout does not seem promising. If you really care, use mpgo.exe.
I don't know how the JIT decides the order of generated code. In C on
Linux you have likely(cond) and unlikely(cond) which you can use to
tell to the compiler which branch is the common path to optimise for.
Using PGO is much more preferable over using likely(cond) and unlikely(cond) for two reasons:
The programmer might inadvertently make mistakes while placing likely(cond) and unlikely(cond) in the code. It actually happens a lot. Making big mistakes while trying to manually optimize the code is very typical.
Adding likely(cond) and unlikely(cond) all over the code makes it less maintainable in the future. You'll have to make sure that these hints hold every time you change the source code. In large code bases, this could be ( or rather is) a nightmare.
Instruction ordering is distinct from the problem of branch
prediction...
Assuming you are talking about code layout, yes they are distinct. But code layout optimization is usually guided by a profile which really includes branch statistics. Hardware branch prediction is of course totally different.
Maybe I'll go and have a read about LLILC.
While using mpgo.exe is the mainstream way of performing this optimization, you can use LLILC also since LLVM support profile-guided optimization as well. But I don't think you need to go this far.

Is this a good practice to avoid constant object creation?

Good morning,
Say I have a class ClassA, an operator + which sums up two objects of type ClassA, an implicit casting from intto ClassA, and that I want to overload the operator ++... Supposing the code for + is rather long, but that the sum of a ClassA and 1 is a very particular case of it, which option is better?
Implement ++ using + and the implicit casting already defined.
Repeat part of the code which simplifies alot when adding just 1.
My idea is that (2) is better since it saves the creation of a new ClassA object by the implicit casting, which can be quite useful if the ++ operator is used, for example, in a for cycle. Also, speed is a must.
Thank you very much.

You've answered your own question. If speed is a must, then go with the second, faster option (it's a good idea to benchmark it to make sure it really is significantly faster though).
Otherwise, go with the first option since less code is better (and staying DRY doubly-so). Less code means less potential bugs, less to maintain, less to write, and less to read. If the code largely duplicates another section of code, then you'd have to keep the two in sync as you make changes -- this would be inviting trouble, as it's easy to forget to update one (and even if you always remember to make changes to both places, since they're not exactly identical it's possible to correctly update one section and incorrectly update the other).
Make sure speed really is a must though before making your final decision -- you don't want premature optimization.

Either way is acceptable. It sounds like the second way is what you're already leaning towards, so try it. In fact, try it both ways and measure the time it takes to increment a million times. Benchmarking is always the way to make these decisions.
In case you haven't done any benchmarking before, the simplest way is to create a System.Diagnostics.Stopwatch and start/stop it around the relevant code. You can then write the elapsed time to a console.

My opinion is that if +1 is a real special case that is much simpler, do the special ++ implementation. You can always comment it out and refer it to + 1 if you want to keep your code small.
Otherwise, it will be too easy to forget about this special optimization 6 mo. down the road when you are trying to optimize.
Premature optimization refers to the fact that you are optimizing before you know how to, not when you have a clear rationale to do so. How to draw the line is difficult, however; you'd need to decide how much simpler the ++ code is in order to consider putting it in.

Cost of using params in C#

Does anyone have advice for using the params in C# for method argument passing. I'm contemplating making overloads for the first 6 arguments and then a 7th using the params feature. My reasoning is to avoid the extra array allocation the params feature require. This is for some high performant utility methods. Any advice? Is it a waste of code to create all the overloads?

Honestly, I'm a little bothered by everyone shouting "premature optimization!" Here's why.
What you say makes perfect sense, particularly as you have already indicated you are working on a high-performance library.
Even BCL classes follow this pattern. Consider all the overloads of string.Format or Console.WriteLine.
This is very easy to get right. The whole premise behind the movement against premature optimization is that when you do something tricky for the purposes of optimizing performance, you're liable to break something by accident and make your code less maintainable. I don't see how that's a danger here; it should be very straightforward what you're doing, to yourself as well as any future developer who may deal with your code.
Also, even if you profiled the results of both approaches and saw only a very small difference in speed, there's still the issue of memory allocation. Creating a new array for every method call entails allocating more memory that will need to be garbage collected later. And in some scenarios where "nearly" real-time behavior is desired (such as algorithmic trading, the field I'm in), minimizing garbage collections is just as important as maximizing execution speed.
So, even if it earns me some downvotes: I say go for it.
(And to those who claim "the compiler surely already does something like this"--I wouldn't be so sure. Firstly, if that were the case, I fail to see why BCL classes would follow this pattern, as I've already mentioned. But more importantly, there is a very big semantic difference between a method that accepts multiple arguments and one that accepts an array. Just because one can be used as a substitute for the other doesn't mean the compiler would, or should, attempt such a substitution).

Yes, that's the strategy that the .NET framework uses. String.Concat() would be a good example. It has overloads for up to 4 strings, plus a fallback one that takes a params string[]. Pretty important here, Concat needs to be fast and is there to help the user fall in the pit of success when he uses the + operator instead of a StringBuilder.
The code duplication you'll get is the price. You'd profile them to see if the speedup is worth the maintenance headache.
Fwiw: there are plenty of micro-optimizations like this in the .NET framework. Somewhat necessary because the designers could not really predict how their classes were going to be used. String.Concat() is just as likely to be used in a tight inner loop that is critical to program perf as, say, a config reader that only runs once at startup. As the end-user of your own code, you typically have the luxury of not having to worry about that. The reverse is also true, the .NET framework code is remarkably free of micro-optimizations when it is unlikely that their benefit would be measurable. Like providing overloads when the core code is slow anyway.

You can always pass Tuple as a parameter, or if the types of the parameters are always the same, an IList<T>.
As other answers and comments have said, you should only optimize after:
Ensuring correct behavior.
Determining the need to optimize.

My point is, if your method is capable of getting unlimited number of parameters, then the logic inside it works in an array-style. So, having overloads for limited number of parameters wouldn't be helping. Unless, you can implement limited number of parameters in a whole different way that is much faster.
For example, if you're handing the parameters to a Console.WriteLine, there's a hidden array creation in there too, so either way you end up having an array.
And, sorry for bothering Dan Tao, I also feel like it is premature optimization. Because you need to know what difference would it make to have overloads with limited number of parameters. If your application is that much performance-critical, you'd need to implement both ways and try to run a test and compare execution times.

Don't even think about performance at this stage. Create whatever overloads will make your code easier to write and easier to understand at 4am two years from now. Sometimes that means params, sometimes that means avoiding it.
After you've got something that works, figure out if these are a performance problem. It's not hard to make the parameters more complicated, but if you add unnecessary complexity now, you'll never make them less so later.

You can try something like this to benchmark the performance so you have some concrete numbers to make decisions with.
In general, object allocation is slightly faster than in C/C++ and deletion is much, much faster for small objects -- until you have tens of thousands of them being made per second. Here's an old article regarding memory allocation performance.

Tips for optimizing C#/.NET programs [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
It seems like optimization is a lost art these days. Wasn't there a time when all programmers squeezed every ounce of efficiency from their code? Often doing so while walking five miles in the snow?
In the spirit of bringing back a lost art, what are some tips that you know of for simple (or perhaps complex) changes to optimize C#/.NET code? Since it's such a broad thing that depends on what one is trying to accomplish it'd help to provide context with your tip. For instance:
When concatenating many strings together use StringBuilder instead. See link at the bottom for caveats on this.
Use string.Compare to compare two strings instead of doing something like string1.ToLower() == string2.ToLower()
The general consensus so far seems to be measuring is key. This kind of misses the point: measuring doesn't tell you what's wrong, or what to do about it if you run into a bottleneck. I ran into the string concatenation bottleneck once and had no idea what to do about it, so these tips are useful.
My point for even posting this is to have a place for common bottlenecks and how they can be avoided before even running into them. It's not even necessarily about plug and play code that anyone should blindly follow, but more about gaining an understanding that performance should be thought about, at least somewhat, and that there's some common pitfalls to look out for.
I can see though that it might be useful to also know why a tip is useful and where it should be applied. For the StringBuilder tip I found the help I did long ago at here on Jon Skeet's site.

It seems like optimization is a lost art these days.
There was once a day when manufacture of, say, microscopes was practiced as an art. The optical principles were poorly understood. There was no standarization of parts. The tubes and gears and lenses had to be made by hand, by highly skilled workers.
These days microscopes are produced as an engineering discipline. The underlying principles of physics are extremely well understood, off-the-shelf parts are widely available, and microscope-building engineers can make informed choices as to how to best optimize their instrument to the tasks it is designed to perform.
That performance analysis is a "lost art" is a very, very good thing. That art was practiced as an art. Optimization should be approached for what it is: an engineering problem solvable through careful application of solid engineering principles.
I have been asked dozens of times over the years for my list of "tips and tricks" that people can use to optimize their vbscript / their jscript / their active server pages / their VB / their C# code. I always resist this. Emphasizing "tips and tricks" is exactly the wrong way to approach performance. That way leads to code which is hard to understand, hard to reason about, hard to maintain, that is typically not noticably faster than the corresponding straightforward code.
The right way to approach performance is to approach it as an engineering problem like any other problem:
Set meaningful, measurable, customer-focused goals.
Build test suites to test your performance against these goals under realistic but controlled and repeatable conditions.
If those suites show that you are not meeting your goals, use tools such as profilers to figure out why.
Optimize the heck out of what the profiler identifies as the worst-performing subsystem. Keep profiling on every change so that you clearly understand the performance impact of each.
Repeat until one of three things happens (1) you meet your goals and ship the software, (2) you revise your goals downwards to something you can achieve, or (3) your project is cancelled because you could not meet your goals.
This is the same as you'd solve any other engineering problem, like adding a feature -- set customer focused goals for the feature, track progress on making a solid implementation, fix problems as you find them through careful debugging analysis, keep iterating until you ship or fail. Performance is a feature.
Performance analysis on complex modern systems requires discipline and focus on solid engineering principles, not on a bag full of tricks that are narrowly applicable to trivial or unrealistic situations. I have never once solved a real-world performance problem through application of tips and tricks.

Get a good profiler.
Don't bother even trying to optimize C# (really, any code) without a good profiler. It actually helps dramatically to have both a sampling and a tracing profiler on hand.
Without a good profiler, you're likely to create false optimizations, and, most importantly, optimize routines that aren't a performance problem in the first place.
The first three steps to profiling should always be 1) Measure, 2) measure, and then 3) measure....

Optimization guidelines:
Don't do it unless you need to
Don't do it if it's cheaper to throw new hardware at the problem instead of a developer
Don't do it unless you can measure the changes in a production-equivalent environment
Don't do it unless you know how to use a CPU and a Memory profiler
Don't do it if it's going to make your code unreadable or unmaintainable
As processors continue to get faster the main bottleneck in most applications isn't CPU, it's bandwidth: bandwidth to off-chip memory, bandwidth to disk and bandwidth to net.
Start at the far end: use YSlow to see why your web site is slow for end-users, then move back and fix you database accesses to be not too wide (columns) and not too deep (rows).
In the very rare cases where it's worth doing anything to optimize CPU usage be careful that you aren't negatively impacting memory usage: I've seen 'optimizations' where developers have tried to use memory to cache results to save CPU cycles. The net effect was to reduce the available memory to cache pages and database results which made the application run far slower! (See rule about measuring.)
I've also seen cases where a 'dumb' un-optimized algorithm has beaten a 'clever' optimized algorithm. Never underestimate how good compiler-writers and chip-designers have become at turning 'inefficient' looping code into super efficient code that can run entirely in on-chip memory with pipelining. Your 'clever' tree-based algorithm with an unwrapped inner loop counting backwards that you thought was 'efficient' can be beaten simply because it failed to stay in on-chip memory during execution. (See rule about measuring.)

When working with ORMs be aware of N+1 Selects.
List<Order> _orders = _repository.GetOrders(DateTime.Now);
foreach(var order in _orders)
{
Print(order.Customer.Name);
}
If the customers are not eagerly loaded this could result in several round trips to the database.

Don't use magic numbers, use enumerations
Don't hard-code values
Use generics where possible since it's typesafe & avoids boxing & unboxing
Use an error handler where it's absolutely needed
Dispose, dispose, dispose. CLR wound't know how to close your database connections, so close them after use and dispose of unmanaged resources
Use common-sense!

OK, I have got to throw in my favorite: If the task is long enough for human interaction, use a manual break in the debugger.
Vs. a profiler, this gives you a call stack and variable values you can use to really understand what's going on.
Do this 10-20 times and you get a good idea of what optimization might really make a difference.

If you identify a method as a bottleneck, but you don't know what to do about it, you are essentially stuck.
So I'll list a few things. All of these things are not silver bullets and you will still have to profile your code. I'm just making suggestions for things you could do and can sometimes help. Especially the first three are important.
Try solving the problem using just (or: mainly) low-level types or arrays of them.
Problems are often small - using a smart but complex algorithm does not always make you win, especially if the less-smart algorithm can be expressed in code that only uses (arrays of) low level types. Take for example InsertionSort vs MergeSort for n<=100 or Tarjan's Dominator finding algorithm vs using bitvectors to naively solve the data-flow form of the problem for n<=100. (the 100 is of course just to give you some idea - profile!)
Consider writing a special case that can be solved using just low-level types (often problem instances of size < 64), even if you have to keep the other code around for larger problem instances.
Learn bitwise arithmetic to help you with the two ideas above.
BitArray can be your friend, compared to Dictionary, or worse, List. But beware that the implementation is not optimal; You can write a faster version yourself. Instead of testing that your arguments are out of range etc., you can often structure your algorithm so that the index can not go out of range anyway - but you can not remove the check from the standard BitArray and it is not free.
As an example of what you can do with just arrays of low level types, the BitMatrix is a rather powerful structure that can be implemented as just an array of ulongs and you can even traverse it using an ulong as "front" because you can take the lowest order bit in constant time (compared with the Queue in Breadth First Search - but obviously the order is different and depends on the index of the items rather than purely the order in which you find them).
Division and modulo are really slow unless the right hand side is a constant.
Floating point math is not in general slower than integer math anymore (not "something you can do", but "something you can skip doing")
Branching is not free. If you can avoid it using a simple arithmetic (anything but division or modulo) you can sometimes gain some performance. Moving a branch to outside a loop is almost always a good idea.

People have funny ideas about what actually matters. Stack Overflow is full of questions about, for example, is ++i more "performant" than i++. Here's an example of real performance tuning, and it's basically the same procedure for any language. If code is simply written a certain way "because it's faster", that's guessing.
Sure, you don't purposely write stupid code, but if guessing worked, there would be no need for profilers and profiling techniques.

The truth is that there is no such thing as the perfect optimised code. You can, however, optimise for a specific portion of code, on a known system (or set of systems) on a known CPU type (and count), a known platform (Microsoft? Mono?), a known framework / BCL version, a known CLI version, a known compiler version (bugs, specification changes, tweaks), a known amount of total and available memory, a known assembly origin (GAC? disk? remote?), with known background system activity from other processes.
In the real world, use a profiler, and look at the important bits; usually the obvious things are anything involving I/O, anything involving threading (again, this changes hugely between versions), and anything involving loops and lookups, but you might be surprised at what "obviously bad" code isn't actually a problem, and what "obviously good" code is a huge culprit.

Tell the compiler what to do, not how to do it. As an example, foreach (var item in list) is better than for (int i = 0; i < list.Count; i++) and m = list.Max(i => i.value); is better than list.Sort(i => i.value); m = list[list.Count - 1];.
By telling the system what you want to do it can figure out the best way to do it. LINQ is good because its results aren't computed until you need them. If you only ever use the first result, it doesn't have to compute the rest.
Ultimately (and this applies to all programming) minimize loops and minimize what you do in loops. Even more important is to minimize the number of loops inside your loops. What's the difference between an O(n) algorithm and an O(n^2) algorithm? The O(n^2) algorithm has a loop inside of a loop.

I don't really try to optimize my code but at times I will go through and use something like reflector to put my programs back to source. It is interesting to then compare what I wrong with what the reflector will output. Sometimes I find that what I did in a more complicated form was simplified. May not optimize things but helps me to see simpler solutions to problems.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.