When someone says this code needs some optimization, or can somehow be optimized, what does that mean? Which kind of code needs optimization? How do you apply optimization to code in C#? What are the benefits of doing that?
Optimization is a very broad term. In general it means modifying the system so that some aspect of it works more efficiently, uses fewer resources, or is more robust. For example, a computer program may be optimized so that it executes faster, uses less memory or disk storage, or is more responsive in terms of UI.
Although "optimization" has the same root as "optimal", the process of optimization does not produce a totally optimal system: there's always a trade-off, so only attributes of greatest interest are optimized.
And remember:
The First Rule of Program Optimization: Don't do it. The Second Rule of Program Optimization (for experts only!): Don't do it yet. (Michael A. Jackson)
Optimization is the process of modifying a system to make some aspect of it work more efficiently or use fewer resources.
In your case it refers mainly to two levels:
Design level
At the highest level, the design may be optimized to make best use of the available resources. The implementation of this design will benefit from a good choice of efficient algorithms and the implementation of these algorithms will benefit from writing good quality code. The architectural design of a system overwhelmingly affects its performance. The choice of algorithm affects efficiency more than any other item of the design. In some cases, however, optimization relies on using fancier algorithms, making use of special cases and special tricks and performing complex trade-offs; thus, a fully optimized program can sometimes, if insufficiently commented, be more difficult for less experienced programmers to comprehend and hence may contain more faults than unoptimized versions.
Source code level
Avoiding bad quality coding can also improve performance, by avoiding obvious slowdowns. After that, however, some optimizations are possible which actually decrease maintainability; some, but not all, of them can nowadays be performed by optimizing compilers. For instance, using more indirection is often needed to simplify or improve software, but that indirection has a cost.
Code optimization is making code run faster. There are two primary ways of doing this:
1) Squeezing more work into fewer cycles. Figure out where the code is doing an extra copy or if there is a branch in a tight loop. This is optimizing in the small.
2) Making your algorithms scale better. You may have heard of "Big O" notation. This is making an algorithm degrade much less quickly with large sets of data.
For instance, if you naively search a phone book for a name you will start on page 1 and read all the names until you find the one you are looking for. This will take a number of instructions scaled by the number of names in the phone book. We call this O(n). Now think about how you really search the phone book. You open to some place toward the middle and see which side the name you are looking for is on. This is called a binary search and scales at the logarithm of the number of names. We call this O(log n). It's much faster.
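To make the difference concrete, here is a minimal C# sketch of both approaches over a small sorted array standing in for the phone book (in practice Array.BinarySearch does the same job):

using System;

class PhoneBookSearch
{
    // O(n): scan every entry until the name is found.
    static int LinearSearch(string[] names, string target)
    {
        for (int i = 0; i < names.Length; i++)
            if (names[i] == target)
                return i;
        return -1;
    }

    // O(log n): repeatedly halve the range; requires 'names' to be sorted.
    static int BinarySearch(string[] names, string target)
    {
        int lo = 0, hi = names.Length - 1;
        while (lo <= hi)
        {
            int mid = lo + (hi - lo) / 2;
            int cmp = string.CompareOrdinal(names[mid], target);
            if (cmp == 0) return mid;
            if (cmp < 0) lo = mid + 1; else hi = mid - 1;
        }
        return -1;
    }

    static void Main()
    {
        string[] phoneBook = { "Adams", "Baker", "Clark", "Davis", "Evans" }; // already sorted
        Console.WriteLine(LinearSearch(phoneBook, "Davis"));   // 3
        Console.WriteLine(BinarySearch(phoneBook, "Davis"));   // 3
    }
}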
Remember the first rule of optimization: Measure first. Many man years have been spent optimizing code that wasn't run very much.
When doing code optimization, you take a metric on your code and try to make it more efficient. The metric usually refers to a scarce resource.
Here are common metrics
Execution speed (usually the first that comes to mind when saying optimization)
Memory consumption
Executable size (on embedded systems it can be important)
Database access
Remote service access (Make it less chatty, caching..)
Simplicity, readability, maintainability of the code
After optimization the code should give the same result.
The problem is that you have to make choices. Execution speed often comes with more memory consumption...
You should also always consider optimization globally. Gaining 10ms in a loop when you then spend 1000ms waiting for a web service is totally useless.
To add to Anton Gogolev's answer, when a piece of code needs optimisation, it is because a particular performance requirement is not met. We develop programs to meet users' requirements, right? Most programmers tend to think largely in terms of functional requirements, i.e. what the program does, but users will also have performance requirements: what is the resource cost (network bandwidth, CPU cycles, memory, disk space, etc...) of providing the functionality. Optimization is the process of changing a piece of code to meet a specific performance requirement. IMHO this should happen at design time, but you will sometimes write a piece of code only to discover it underperforms. To optimize the code, you first have to find out which resource you are overusing. If it is CPU cycles or memory, a profiler might help. If it is network bandwidth, which is a very common one these days, you will need to do some load testing and comms profiling.
My advice would be to always understand your current and probable future performance requirements before writing code, and optimize at the design stage. Late optimization is expensive, difficult, and often either fails or results in ugly code.
Optimization has two main purposes:
getting your software to use fewer resources, e.g., run faster, be smaller, use less RAM, less hard disk space (both when running and when storing documents), less network access, ...
getting your software to be more maintainable, by refactoring it.
You don't need to optimize as long as no related issue has been raised: It is far more difficult to debug optimized code than to optimize correct code.
It might be for example that the code has a block of code which is duplicated, and could/should be put into a method, you might be using deprecated methods/classes, there might be simpler ways to do what the code is doing, there might be some cleaning up to do (e.g. remove hard coding) etc...
Related
Back in 2009 I posted this answer to a question about optimisations for nested try/catch/finally blocks.
Thinking about this again some years later, it seems the question could be extended to that other control flow, not only try/catch/finally, but also if/else.
At each of these junctions, execution will follow one path. Code must be generated for both, obviously, but the order in which they're placed in memory, and the number of jumps required to navigate through them will differ.
The order generated code is laid out in memory has implications for the miss rate on the CPU's instruction cache. Having the instruction pipeline stalled, waiting for memory reads, can really kill loop performance.
I don't think loops (for/foreach/while) are such a good fit, unless you expect the loop to have zero iterations more often than it has some, as the natural generation order seems pretty optimal.
Some questions:
In what ways do the available .NET JITs optimise for generated instruction order?
How much difference can this make in practice to common code? What about perfectly suited cases?
Is there anything the developer can do to influence this layout? What about mangling with the forbidden goto?
Does the specific JIT being used make much difference to layout?
Does the method inlining heuristic come into play here too?
Basically anything interesting related to this aspect of the JIT!
Some initial thoughts:
Moving catch blocks out of line is an easy job, as they're supposed to be the exceptional case by definition. Not sure this happens.
For some loops I suspect you can increase performance non-trivially. However in general I don't think it'll make that much difference.
I don't know how the JIT decides the order of generated code. In C on Linux you have likely(cond) and unlikely(cond), which you can use to tell the compiler which branch is the common path to optimise for. I'm not sure that all compilers respect these macros.
Instruction ordering is distinct from the problem of branch prediction, in which the CPU guesses (on its own, afaik) which branch will be taken in order to start the pipeline (oversimplified steps: decode, fetch operands, execute, write back) on instructions, before the execute step has determined the value of the condition variable.
I can't think of any way to influence this order in the C# language. Perhaps you can manipulate it a bit by gotoing to labels explicitly, but is this portable, and are there any other problems with it?
Perhaps this is what profile guided optimisation is for. Do we have that in the .NET ecosystem, now or in plan? Maybe I'll go and have a read about LLILC.
The optimization you are referring to is called the code layout optimization which is defined as follows:
Those pieces of code that are executed close in time in the same thread should be close in the virtual address space so that they fit in a single cache line or a few consecutive cache lines. This reduces cache misses.
Those pieces of code that are executed close in time in different threads should be close in the virtual address space so that they fit in a single cache line or a few consecutive cache lines, as long as there is no self-modifying code. This gets lower priority than the previous one. This reduces cache misses.
Those pieces of code that are executed frequently (hot code) should be close in the virtual address space so that they fit in as few virtual pages as possible. This reduces page faults and working set size.
Those pieces of code that are rarely executed (cold code) should be close in the virtual address space so that they fit in as few virtual pages as possible. This reduces page faults and working set size.
Now to your questions.
In what ways do the available .NET JITs optimise for generated instruction order?
"Instruction order" is really a very general term. Many optimizations affect instruction order. I'll assume that you're referring to code layout.
JITters by design should take the minimum amount of time to compile code while at the same time produce high-quality code. To achieve this, they only perform the most important optimizations so that it's really worth spending time doing them. Code layout optimization is not one of them because without profiling, it may not be beneficial. While a JITter can certainly perform profiling and dynamic optimization, there is a generally preferred way.
How much difference can this make in practice to common code? What about perfectly suited cases?
Code layout optimization by itself can improve overall performance typically by -1% (negative one) to 4%, which is enough to make compiler writers happy. I would like to add that it reduces energy consumption indirectly by reducing cache misses. The reduction in miss ratio of the instruction cache can be typically up to 35%.
Is there anything the developer can do to influence this layout? What about mangling with the forbidden goto?
Yes, there are numerous ways. I would like to mention the generally recommended one which is mpgo.exe. Please do not use goto for this purpose. It's forbidden.
Does the specific JIT being used make much difference to layout?
No.
Does the method inlining heuristic come into play here too?
Inlining can indeed improve code layout with respect to function calls. It's one of the most important optimizations and all .NET JITs perform it.
Moving catch blocks out of line is an easy job, as they're supposed to be the exceptional case by definition. Not sure this happens.
Yes it might be "easy", but what is the potential gained benefit? catch blocks are typically small in size (containing a call to a function that handles the exception). Handling this particular case of code layout does not seem promising. If you really care, use mpgo.exe.
I don't know how the JIT decides the order of generated code. In C on Linux you have likely(cond) and unlikely(cond) which you can use to tell the compiler which branch is the common path to optimise for.
Using PGO is much more preferable over using likely(cond) and unlikely(cond) for two reasons:
The programmer might inadvertently make mistakes while placing likely(cond) and unlikely(cond) in the code. It actually happens a lot. Making big mistakes while trying to manually optimize the code is very typical.
Adding likely(cond) and unlikely(cond) all over the code makes it less maintainable in the future. You'll have to make sure that these hints hold every time you change the source code. In large code bases, this could be (or rather is) a nightmare.
Instruction ordering is distinct from the problem of branch prediction...
Assuming you are talking about code layout, yes they are distinct. But code layout optimization is usually guided by a profile which really includes branch statistics. Hardware branch prediction is of course totally different.
Maybe I'll go and have a read about LLILC.
While using mpgo.exe is the mainstream way of performing this optimization, you can use LLILC as well, since LLVM supports profile-guided optimization too. But I don't think you need to go this far.
I see this term used a lot but I feel like most people use it out of laziness or ignorance. For instance, I was reading this article:
http://blogs.msdn.com/b/ricom/archive/2006/09/07/745085.aspx
where he talks about the decisions he makes to implement the types necessary for his app.
If it were me talking about these things for code that we need to write, other programmers would think either:
I am thinking too far ahead when nothing exists yet, and thus prematurely optimizing.
I am over-thinking insignificant details when there are no slowdowns or performance problems being experienced.
or both.
and would suggest to just implement it and not worry about these until they become a problem.
Which approach is preferable?
How do you differentiate between premature optimization and informed decision-making for a performance-critical application, before any implementation is done?
Optimization is premature if:
Your application isn't doing anything time-critical. (Which means, if you're writing a program that adds up 500 numbers in a file, the word "optimization" shouldn't even pop into your brain, since all it'll do is waste your time.)
You're doing something time-critical in something other than assembly, and still worrying whether i++; i++; is faster or i += 2... if it's really that critical, you'd be working in assembly and not wasting time worrying about this. (Even then, this particular example most likely won't matter.)
You have a hunch that one thing might be a bit faster than the other, but you need to look it up. For example, if something is bugging you about whether Stopwatch is faster or Environment.TickCount, it's premature optimization, since if the difference were bigger, you'd probably be more sure and wouldn't need to look it up.
If you have a guess that something might be slow but you're not too sure, just put a //NOTE: Performance? comment, and if you later run into bottlenecks, check such places in your code. I personally don't worry about optimizations that aren't too obvious; I just use a profiler later, if I need to.
Another technique:
I just run my program, randomly break into it with the debugger, and see where it stopped -- wherever it stops is likely a bottleneck, and the more often it stops there, the worse the bottleneck. It works almost like magic. :)
This proverb does not (I believe) refer to optimizations that are built into a good design as it is created. It refers to tasks specifically targeted at performance, which otherwise would not be undertaken.
This kind of optimization does not "become" premature, according to the common wisdom — it is guilty until proven innocent.
Optimisation is the process of making existing code run more efficiently (faster speed, and/or less resource usage)
All optimisation is premature if the programmer has not proven that it is necessary. (For example, by running the code to determine if it achieves the correct results in an acceptable timeframe. This could be as simple as running it to "see" if it runs fast enough, or running under a profiler to analyze it more carefully).
There are several stages to programming something well:
1) Design the solution and pick a good, efficient algorithm.
2) Implement the solution in a maintainable, well coded manner.
3) Test the solution and see if it meets your requirements on speed, RAM usage, etc. (e.g. "When the user clicks "Save", does it take less than 1 second?" If it takes 0.3s, you really don't need to spend a week optimising it to get that time down to 0.2s). (See the Stopwatch sketch after this list.)
4) IF it does not meet the requirements, consider why. In most cases this means go to step (1) to find a better algorithm now that you understand the problem better. (Writing a quick prototype is often a good way of exploring this cheaply)
5) IF it still does not meet the requirements, start considering optimisations that may help speed up the runtime (for example, look-up tables, caching, etc). To drive this process, profiling is usually an important tool to help you locate the bottlenecks and inefficiencies in the code, so you can make the greatest gain for the time you spend on the code.
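As a rough sketch of the timing check in step 3 above, System.Diagnostics.Stopwatch is enough for a first pass; SaveDocument here is just a placeholder for whatever operation carries the requirement:

using System;
using System.Diagnostics;
using System.Threading;

class TimingCheck
{
    static void Main()
    {
        var sw = Stopwatch.StartNew();
        SaveDocument();                     // the operation under test (placeholder)
        sw.Stop();

        Console.WriteLine("Save took {0} ms", sw.ElapsedMilliseconds);

        // Requirement from step 3: "Save" should take less than 1 second.
        if (sw.ElapsedMilliseconds < 1000)
            Console.WriteLine("Requirement met - no optimisation needed.");
        else
            Console.WriteLine("Requirement not met - revisit the design or profile.");
    }

    static void SaveDocument()
    {
        Thread.Sleep(300);                  // stand-in for the real work
    }
}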
I should point out that an experienced programmer working on a reasonably familiar problem may be able to jump through the first steps mentally and then just apply a pattern, rather than physically going through this process every time, but this is simply a short cut that is gained through experience
Thus, there are many "optimisations" that experienced programmers will build into their code automatically. These are not "premature optimisations" so much as "common-sense efficiency patterns". These patterns are quick and easy to implement, but vastly improve the efficiency of the code, and you don't need to do any special timing tests to work out whether or not they will be of benefit:
Not putting unnecessary code into loops. (Similar to the optimisation of removing unnecessary code from existing loops, but it doesn't involve writing the code twice!)
Storing intermediate results in variables rather than re-calculating things over and over.
Using look-up tables to provide precomputed values rather than calculating them on the fly (see the sketch after this list).
Using appropriate-sized data structures (e.g. storing a percentage in a byte (8 bits) rather than a long (64 bits) will use 8 times less RAM)
Drawing a complex window background using a pre-drawn image rather than drawing lots of individual components
Applying compression to packets of data you intend to send over a low-speed connection to minimise the bandwidth usage.
Drawing images for your web page in a style that allows you to use a format that will get high quality and good compression.
And of course, although it's not technically an "optimisation", choosing the right algorithm in the first place!
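A small sketch of a few of the patterns above: keeping unnecessary work out of the loop, storing an intermediate result, a look-up table, and a byte where a byte is enough. The numbers and names are made up purely for illustration:

using System;

class EfficiencyPatterns
{
    // Look-up table: precomputed squares for inputs 0..255,
    // instead of recalculating on every call.
    static readonly int[] SquareTable = BuildSquareTable();

    static int[] BuildSquareTable()
    {
        var table = new int[256];
        for (int i = 0; i < table.Length; i++)
            table[i] = i * i;
        return table;
    }

    static long SumOfScaledSquares(byte[] values, double scale)
    {
        // Store the intermediate result once instead of recomputing it
        // inside the loop on every iteration.
        double factor = Math.Round(scale * 100);

        long sum = 0;
        foreach (byte v in values)
            sum += (long)(SquareTable[v] * factor);   // table lookup, no per-item recalculation
        return sum;
    }

    static void Main()
    {
        byte[] percentages = { 10, 20, 30 };          // a byte is plenty for a percentage
        Console.WriteLine(SumOfScaledSquares(percentages, 0.5));
    }
}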
For example, I just replaced an old piece of code in our project. My new code is not "optimised" in any way, but (unlike the original implementation) it was written with efficiency in mind. The result: Mine runs 25 times faster - simply by not being wasteful. Could I optimise it to make it faster? Yes, I could easily get another 2x speedup. Will I optimise my code to make it faster? No - a 5x speed improvement would have been sufficient, and I have already achieved 25x. Further work at this point would just be a waste of precious programming time. (But I can revisit the code in future if the requirements change)
Finally, one last point: The area you are working in dictates the bar you must meet. If you are writing a graphics engine for a game or code for a real-time embedded controller, you may well find yourself doing a lot of optimisation. If you are writing a desktop application like a notepad, you may never need to optimise anything as long as you aren't overly wasteful.
When starting out, just delivering a product is more important than optimizing.
Over time you are going to profile various applications and will learn coding skills that will naturally lead to optimized code. Basically at some point you'll be able to spot potential trouble spots and build things accordingly.
However don't sweat it until you've found an actual problem.
Premature optimization is making an optimization for performance at the cost of some other positive attribute of your code (e.g. readability) before you know that it is necessary to make this tradeoff.
Usually premature optimizations are made during the development process without using any profiling tools to find bottlenecks in the code. In many cases the optimization will make the code harder to maintain and sometimes also increases the development time, and therefore the cost of the software. Worse... some premature optimizations turn out not to make the code any faster at all, and in some cases can even make the code slower than it was before.
When you have less than 10 years of coding experience.
Having (lots of) experience might be a trap. I know many very experienced programmers (C/C++, assembly) who tend to worry too much because they are used to worrying about clock ticks and superfluous bits.
There are areas such as embedded or realtime systems where these do count, but in regular OLTP/LOB apps most of your effort should be directed towards maintainability, readability and changeability.
Optimization is tricky. Consider the following examples:
Deciding on implementing two servers, each doing its own job, instead of implementing a single server that will do both jobs.
Deciding to go with one DBMS and not another, for performance reasons.
Deciding to use a specific, non-portable API when there is a standard (e.g., using Hibernate-specific functionality when you basically need the standard JPA), for performance reasons.
Coding something in assembly for performance reasons.
Unrolling loops for performance reasons.
Writing a very fast but obscure piece of code.
My bottom line here is simple. Optimization is a broad term. When people talk about premature optimization, they don't mean you need to just do the first thing that comes to mind without considering the complete picture. They are saying you should:
Concentrate on the 80/20 rule - don't consider ALL the possible cases, but the most probable ones.
Don't over-design stuff without any good reason.
Don't write code that is not clear, simple and easily maintainable if there is no real, immediate performance problem with it.
It really all boils down to your experience. If you are an expert in image processing, and someone requests you do something you did ten times before, you will probably push all your known optimizations right from the beginning, but that would be ok. Premature optimization is when you're trying to optimize something when you don't know it needs optimization to begin with. The reason for that is simple - it's risky, it's wasting your time, and it will be less maintainable. So unless you're experienced and you've been down that road before, don't optimize if you don't know there's a problem.
Note that optimization is not free (as in beer)
it takes more time to write
it takes more time to read
it takes more time to test
it takes more time to debug
...
So before optimizing anything, you should be sure it's worth it.
That Point3D type you linked to seems like the cornerstone of something, and the case for optimization was probably obvious.
Just like the creators of the .NET library didn't need any measurements before they started optimizing System.String. They would have to measure during though.
But most code does not play a significant role in the performance of the end product. And that means any effort in optimization is wasted.
Besides all that, most 'premature optimizations' are untested/unmeasured hacks.
Optimizations are premature if you spend too much time designing those during the earlier phases of implementation. During the early stages, you have better things to worry about: getting core code implemented, unit tests written, systems talking to each other, UI, and whatever else. Optimizing comes with a price, and you might well be wasting time on optimizing something that doesn't need to be, all the while creating code that is harder to maintain.
Optimizations only make sense when you have concrete performance requirements for your project, and performance will only matter after the initial development, once you have enough of your system implemented to actually measure whatever it is you need to measure. Never optimize without measuring.
As you gain more experience, you can make your early designs and implementations with a small eye towards future optimizations, that is, try to design in such a way that will make it easier to measure performance and optimize later on, should that even be necessary. But even in this case, you should spend little time on optimizations in the early phases of development.
The following article discusses an alternative heap structure that takes into consideration that most servers are virtualized and therefore most memory is paged to disk.
http://queue.acm.org/detail.cfm?id=1814327
Can (or should) a .NET developer implement a B-Heap data structure so that parent-child relationships are maintained within the same Virtual Memory Page? How or where would this be implemented?
Clarification
In other words, is this type of data structure needed within .NET as a primitive type? If so, it should be implemented either natively in the CLR or via P/Invoke.
When a server administrator deploys my .NET app within a virtual machine, does this binary heap optimization make sense? If so, when does it make sense? (number of objects, etc)
To at least a certain extent, BCL collections do seem to take paging concerns into account. They also take CPU cache concerns into account (which overlaps in some regard, as locality of memory can affect both, though in different ways).
Consider that Queue<T> uses arrays for internal storage. In purely random-access terms (that is to say, where there is never any cost for paging or CPU cache flushing) this is a poor choice; the queue will almost always be added to at one point and removed from at another, so an internal implementation as a singly linked list would win in almost every way (for that matter, in terms of iterating through the queue - which it also supports - a linked list shouldn't do much worse than an array in a pure random-access situation). Where the array-based implementation fares better than a singly linked list is precisely when paging and the CPU cache are considered. That MS went for a solution that is worse in the pure random-access situation but better in the real-world case where paging matters shows that they are paying attention to the effects of paging.
Of course, from the outside that isn't obvious - and shouldn't be. From the outside we want something that works like a queue; making the inside efficient is a different concern.
These concerns are also met in other ways. The way the GC works, for example, minimises the amount of paging necessary as its moving objects not only makes for less fragmentation, but also makes for fewer page faults. Other collections are also implemented in ways to make paging less frequent than the most immediate solution would suggest.
That's just a few things that stand out to me from things I have looked at. I'd bet good money such concerns are also considered at many other places in the .NET team's work. Likewise with other frameworks. Consider that the one big performance concern Cliff Click mentions repeatedly in terms of his Java lock-free hashtable (I really must finish checking my C# implementation), apart from those of lock-free concurrency (the whole point of the exercise), is cache lines; and it's also the one other performance concern he doesn't dismiss!
Consider also, that most uses of most collections are going to fit in one page anyway!
If you are implementing your own collections, or putting a standard collection into particularly heavy use, then these are things you need to think about (sometimes "nah, not an issue" is enough thinking, sometimes it isn't) but that doesn't mean they aren't already thought about in terms of what we get from the BCL.
If you have an especially special-case scenario and algorithm, then you might benefit from that kind of optimization.
But generally speaking, when reimplementing core parts of the CLR framework (on top of the CLR, I might add, i.e. in managed code), your chances of doing it more efficiently than the CLR team did are incredibly slim. So I wouldn't recommend it unless you have already profiled the heck out of your current implementation and have positively identified issues related to locality of data in memory. And even then, you will get more bang for your buck by tweaking your algorithm to work better with the CLR memory management scheme than by trying to bypass or work around it.
I'm writing a book on multicore programming using .NET 4 and I'm curious to know what parts of multicore programming people have found difficult to grok or anticipate being difficult to grok?
What's a useful unit of work to parallelize, and how do I find/organize one?
All these parallelism primitives aren't helpful if you fork a piece of work that is smaller than the forking overhead; in fact, that buys you a nice slowdown instead of what you are expecting.
So one of the big problems is finding units of work that are obviously more expensive than the parallelism primitives. A key problem here is that nobody knows what anything costs to execute, including the parallelism primitives themselves. Clearly, calibrating these costs would be very helpful. (As an aside, we designed, implemented, and daily use a parallel programming language, PARLANSE, whose objective was to minimize the cost of the parallelism primitives by allowing the compiler to generate and optimize them, with the goal of making smaller bits of work "more parallelizable").
One might also consider discussing big-Oh notation and its applications. We all hope that the parallelism primitives have cost O(1). If that's the case, then if you find work with cost O(x) > O(1), that work is a good candidate for parallelization. If your proposed work is also O(1), then whether it is effective or not depends on the constant factors and we are back to calibration as above.
There's the problem of collecting work into large enough units, if none of the pieces are large enough. Code motion, algorithm replacement, ... are all useful ideas to achieve this effect.
Lastly, there's the problem of synchronization: when do my parallel units have to interact, what primitives should I use, and how much do those primitives cost? (More than you expect!)
I guess some of it depends on how basic or advanced the book/audience is. When you go from single-threaded to multi-threaded programming for the first time, you typically fall off a huge cliff (and many never recover, see e.g. all the muddled questions about Control.Invoke).
Anyway, to add some thoughts that are less about the programming itself, and more about the other related tasks in the software process:
Measuring: deciding what metric you are aiming to improve, measuring it correctly (it is so easy to accidentally measure the wrong thing), using the right tools, differentiating signal versus noise, interpreting the results and understanding why they are as they are.
Testing: how to write tests that tolerate unimportant non-determinism/interleavings, but still pin down correct program behavior.
Debugging: tools, strategies, when "hard to debug" implies feedback to improve your code/design and better partition mutable state, etc.
Physical versus logical thread affinity: understanding the GUI thread, understanding how e.g. an F# MailboxProcessor/agent can encapsulate mutable state and run on multiple threads but always with only a single logical thread (one program counter).
Patterns (and when they apply): fork-join, map-reduce, producer-consumer, ...
I expect that there will be a large audience for e.g. "help, I've got a single-threaded app with 12% CPU utilization, and I want to learn just enough to make it go 4x faster without much work" and a smaller audience for e.g. "my app is scaling sub-linearly as we add cores because there seems to be contention here, is there a better approach to use?", and so a bit of the challenge may be serving each of those audiences.
Since you are writing a whole book on multi-core programming in .NET, I think you can also go a little beyond multi-core.
For example, you could use a chapter to talk about parallel computing in a distributed system in .NET. Unfortunately, there are no mature frameworks in .NET yet; DryadLinq is the closest. (On the other side, Hadoop and its friends on the Java platform are really good.)
You can also use a chapter demonstrating some GPU computing stuff.
One thing that has tripped me up is which approach to use to solve a particular type of problem. There's agents, there's tasks, async computations, MPI for distribution - for many problems you could use multiple methods but I'm having difficulty understanding why I should use one over another.
To understand: low level memory details like the difference between acquire and release semantics of memory.
Most of the rest of the concepts and ideas (anything can interleave, race conditions, ...) are not that difficult with a little usage.
Of course the practice, especially if something is failing sometimes, is very hard as you need to work at multiple levels of abstraction to understand what is going on, so keep your design simple and as far as possible design out the need for locking etc. (e.g. using immutable data and higher level abstractions).
It's not so much the theoretical details, but more the practical implementation details, that trip people up.
What's the deal with immutable data structures?
All the time, people try to update a data structure from multiple threads, find it too hard, and someone chimes in "use immutable data structures!", and so our persistent coder writes this:
ImmutableSet set;   // 'ImmutableSet' stands in for any persistent/immutable set type

void ThreadLoop1()
{
    foreach (Customer c in dataStore1)
        set = set.Add(ProcessCustomer(c));   // read-modify-write on a shared variable: not atomic
}

void ThreadLoop2()
{
    foreach (Customer c in dataStore2)
        set = set.Add(ProcessCustomer(c));   // races with the other thread and silently loses updates
}
Coder has heard all their lives that immutable data structures can be updated without locking, but the new code doesn't work for obvious reasons.
Even if you're targeting academics and experienced devs, a little primer on the basics of immutable programming idioms can't hurt.
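For completeness, a minimal sketch of one way to update a shared immutable set safely: a compare-and-swap retry loop. ImmutableHashSet<T> here assumes the System.Collections.Immutable package (any persistent set type works the same way), and the customer processing is just a placeholder:

using System.Collections.Immutable;
using System.Threading;

class CustomerProcessor
{
    // A mutable *reference* to an immutable set, shared between threads.
    static ImmutableHashSet<string> set = ImmutableHashSet<string>.Empty;

    static string ProcessCustomer(string name)
    {
        return name.ToUpperInvariant();      // placeholder for the real processing
    }

    static void AddProcessed(string customerName)
    {
        string processed = ProcessCustomer(customerName);
        while (true)
        {
            var current = set;
            var updated = current.Add(processed);
            // Publish 'updated' only if no other thread replaced 'set' in the meantime;
            // otherwise retry against the newer value. No locks, and no lost updates.
            if (Interlocked.CompareExchange(ref set, updated, current) == current)
                return;
        }
    }
}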
How to partition roughly equal amounts of work between threads?
Getting this step right is hard. Sometimes you break up a single process into 10,000 steps which can be executed in parallel, but not all steps take the same amount of time. If you split the work on 4 threads, and the first 3 threads finish in 1 second, and the last thread takes 60 seconds, your multithreaded program isn't much better than the single-threaded version, right?
So how do you partition problems with roughly equal amounts of work between all threads? Lots of good heuristics on solving bin packing problems should be relevant here.
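In .NET 4, one practical starting point (a sketch, not a full answer) is to let Parallel.ForEach hand out chunks on demand via a Partitioner, so threads that finish their cheap items early keep pulling more work instead of idling behind the thread stuck with the expensive ones:

using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

class LoadBalancingDemo
{
    static void Main()
    {
        int[] workItems = new int[10000];
        for (int i = 0; i < workItems.Length; i++)
            workItems[i] = i % 100;               // deliberately uneven "cost" per item

        // Ranges are handed out dynamically, so fast threads keep taking
        // new chunks instead of waiting for a slow one to finish.
        Parallel.ForEach(
            Partitioner.Create(0, workItems.Length),
            range =>
            {
                for (int i = range.Item1; i < range.Item2; i++)
                    Simulate(workItems[i]);
            });
    }

    static void Simulate(int cost)
    {
        Thread.SpinWait(cost * 10);               // stand-in for real, uneven work
    }
}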
How many threads?
If your problem is nicely parallelizable, adding more threads should make it faster, right? Well not really, lots of things to consider here:
Even on a single-core processor, adding more threads can make a program faster, because more threads give the OS more opportunities to schedule your work, so your program gets more execution time than the single-threaded version would. But the law of diminishing returns applies: adding more threads increases context switching, so at a certain point, even if your program is getting the most execution time, its performance could still be worse than the single-threaded version.
So how do you spin off just enough threads to minimize execution time?
And if there are lots of other apps spinning up threads and competing for resources, how do you detect performance changes and adjust your program automagically?
I find the conceptions of synchronized data moving across worker nodes in complex patterns very hard to visualize and program.
Usually I find debugging to be a bear, also.
It seems like optimization is a lost art these days. Wasn't there a time when all programmers squeezed every ounce of efficiency from their code? Often doing so while walking five miles in the snow?
In the spirit of bringing back a lost art, what are some tips that you know of for simple (or perhaps complex) changes to optimize C#/.NET code? Since it's such a broad thing that depends on what one is trying to accomplish it'd help to provide context with your tip. For instance:
When concatenating many strings together use StringBuilder instead. See link at the bottom for caveats on this.
Use string.Compare to compare two strings instead of doing something like string1.ToLower() == string2.ToLower()
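A minimal sketch of both tips (the names are illustrative only); note that string.Equals with a StringComparison works just as well for the second one:

using System;
using System.Text;

class StringTips
{
    static void Main()
    {
        // Tip 1: build many concatenations with StringBuilder instead of repeated '+'.
        var sb = new StringBuilder();
        for (int i = 0; i < 1000; i++)
            sb.Append(i).Append(',');
        string csv = sb.ToString();

        // Tip 2: compare case-insensitively without allocating lowered copies.
        string a = "Hello", b = "HELLO";
        bool same = string.Compare(a, b, StringComparison.OrdinalIgnoreCase) == 0;

        Console.WriteLine(csv.Length);
        Console.WriteLine(same);
    }
}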
The general consensus so far seems to be measuring is key. This kind of misses the point: measuring doesn't tell you what's wrong, or what to do about it if you run into a bottleneck. I ran into the string concatenation bottleneck once and had no idea what to do about it, so these tips are useful.
My point for even posting this is to have a place for common bottlenecks and how they can be avoided before even running into them. It's not even necessarily about plug and play code that anyone should blindly follow, but more about gaining an understanding that performance should be thought about, at least somewhat, and that there's some common pitfalls to look out for.
I can see though that it might be useful to also know why a tip is useful and where it should be applied. For the StringBuilder tip I found the help I needed long ago here on Jon Skeet's site.
It seems like optimization is a lost art these days.
There was once a day when the manufacture of, say, microscopes was practiced as an art. The optical principles were poorly understood. There was no standardization of parts. The tubes and gears and lenses had to be made by hand, by highly skilled workers.
These days microscopes are produced as an engineering discipline. The underlying principles of physics are extremely well understood, off-the-shelf parts are widely available, and microscope-building engineers can make informed choices as to how to best optimize their instrument to the tasks it is designed to perform.
That performance analysis is a "lost art" is a very, very good thing, because that art really was practiced as an art. Optimization should be approached for what it is: an engineering problem solvable through careful application of solid engineering principles.
I have been asked dozens of times over the years for my list of "tips and tricks" that people can use to optimize their vbscript / their jscript / their active server pages / their VB / their C# code. I always resist this. Emphasizing "tips and tricks" is exactly the wrong way to approach performance. That way leads to code which is hard to understand, hard to reason about, hard to maintain, and that is typically not noticeably faster than the corresponding straightforward code.
The right way to approach performance is to approach it as an engineering problem like any other problem:
Set meaningful, measurable, customer-focused goals.
Build test suites to test your performance against these goals under realistic but controlled and repeatable conditions.
If those suites show that you are not meeting your goals, use tools such as profilers to figure out why.
Optimize the heck out of what the profiler identifies as the worst-performing subsystem. Keep profiling on every change so that you clearly understand the performance impact of each.
Repeat until one of three things happens (1) you meet your goals and ship the software, (2) you revise your goals downwards to something you can achieve, or (3) your project is cancelled because you could not meet your goals.
This is the same as you'd solve any other engineering problem, like adding a feature -- set customer focused goals for the feature, track progress on making a solid implementation, fix problems as you find them through careful debugging analysis, keep iterating until you ship or fail. Performance is a feature.
Performance analysis on complex modern systems requires discipline and focus on solid engineering principles, not on a bag full of tricks that are narrowly applicable to trivial or unrealistic situations. I have never once solved a real-world performance problem through application of tips and tricks.
Get a good profiler.
Don't bother even trying to optimize C# (really, any code) without a good profiler. It actually helps dramatically to have both a sampling and a tracing profiler on hand.
Without a good profiler, you're likely to create false optimizations, and, most importantly, optimize routines that aren't a performance problem in the first place.
The first three steps to profiling should always be 1) Measure, 2) measure, and then 3) measure....
Optimization guidelines:
Don't do it unless you need to
Don't do it if it's cheaper to throw new hardware at the problem instead of a developer
Don't do it unless you can measure the changes in a production-equivalent environment
Don't do it unless you know how to use a CPU and a Memory profiler
Don't do it if it's going to make your code unreadable or unmaintainable
As processors continue to get faster the main bottleneck in most applications isn't CPU, it's bandwidth: bandwidth to off-chip memory, bandwidth to disk and bandwidth to net.
Start at the far end: use YSlow to see why your web site is slow for end-users, then move back and fix your database accesses to be not too wide (columns) and not too deep (rows).
In the very rare cases where it's worth doing anything to optimize CPU usage be careful that you aren't negatively impacting memory usage: I've seen 'optimizations' where developers have tried to use memory to cache results to save CPU cycles. The net effect was to reduce the available memory to cache pages and database results which made the application run far slower! (See rule about measuring.)
I've also seen cases where a 'dumb' un-optimized algorithm has beaten a 'clever' optimized algorithm. Never underestimate how good compiler-writers and chip-designers have become at turning 'inefficient' looping code into super efficient code that can run entirely in on-chip memory with pipelining. Your 'clever' tree-based algorithm with an unwrapped inner loop counting backwards that you thought was 'efficient' can be beaten simply because it failed to stay in on-chip memory during execution. (See rule about measuring.)
When working with ORMs be aware of N+1 Selects.
// One query to fetch the orders...
List<Order> _orders = _repository.GetOrders(DateTime.Now);
foreach (var order in _orders)
{
    // ...plus, if Customer is lazily loaded, one more query per order: the "N+1 selects" problem.
    Print(order.Customer.Name);
}
If the customers are not eagerly loaded this could result in several round trips to the database.
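If the repository is backed by something like Entity Framework, one common fix is to eagerly load the customers in the same query. This is only a sketch: the Include call and the entity shapes below are assumptions about how GetOrders might be implemented, not code from the question.

using System;
using System.Collections.Generic;
using System.Data.Entity;   // assumes Entity Framework 6; EF Core uses Microsoft.EntityFrameworkCore
using System.Linq;

class OrderRepository
{
    private readonly DbContext _context;    // concrete context type omitted for brevity

    public OrderRepository(DbContext context) { _context = context; }

    public List<Order> GetOrders(DateTime since)
    {
        return _context.Set<Order>()
                       .Include(o => o.Customer)          // eager-load customers in the same query
                       .Where(o => o.OrderDate >= since)
                       .ToList();
    }
}

class Order
{
    public int Id { get; set; }
    public DateTime OrderDate { get; set; }
    public virtual Customer Customer { get; set; }
}

class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
}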
Don't use magic numbers, use enumerations
Don't hard-code values
Use generics where possible since it's typesafe & avoids boxing & unboxing
Use an error handler where it's absolutely needed
Dispose, dispose, dispose. The CLR wouldn't know how to close your database connections, so close them after use and dispose of unmanaged resources (see the sketch after this list).
Use common-sense!
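For the Dispose point above, the idiomatic pattern is a using block, which closes the connection even if an exception is thrown (a sketch assuming System.Data.SqlClient and a hypothetical Customers table):

using System.Data.SqlClient;

class CustomerCount
{
    static int CountCustomers(string connectionString)
    {
        // 'using' guarantees Dispose() (and therefore Close()) runs,
        // even if ExecuteScalar throws.
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand("SELECT COUNT(*) FROM Customers", connection))
        {
            connection.Open();
            return (int)command.ExecuteScalar();
        }
    }
}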
OK, I have got to throw in my favorite: If the task is long enough for human interaction, use a manual break in the debugger.
Vs. a profiler, this gives you a call stack and variable values you can use to really understand what's going on.
Do this 10-20 times and you get a good idea of what optimization might really make a difference.
If you identify a method as a bottleneck, but you don't know what to do about it, you are essentially stuck.
So I'll list a few things. All of these things are not silver bullets and you will still have to profile your code. I'm just making suggestions for things you could do and can sometimes help. Especially the first three are important.
Try solving the problem using just (or: mainly) low-level types or arrays of them.
Problems are often small - using a smart but complex algorithm does not always make you win, especially if the less-smart algorithm can be expressed in code that only uses (arrays of) low level types. Take for example InsertionSort vs MergeSort for n<=100 or Tarjan's Dominator finding algorithm vs using bitvectors to naively solve the data-flow form of the problem for n<=100. (the 100 is of course just to give you some idea - profile!)
Consider writing a special case that can be solved using just low-level types (often problem instances of size < 64), even if you have to keep the other code around for larger problem instances.
Learn bitwise arithmetic to help you with the two ideas above.
BitArray can be your friend, compared to Dictionary, or worse, List. But beware that the implementation is not optimal; you can write a faster version yourself. Instead of testing that your arguments are out of range etc., you can often structure your algorithm so that the index cannot go out of range anyway - but you cannot remove the check from the standard BitArray, and it is not free.
As an example of what you can do with just arrays of low level types, the BitMatrix is a rather powerful structure that can be implemented as just an array of ulongs, and you can even traverse it using a ulong as "front" because you can take the lowest order bit in constant time (compared with the Queue in Breadth First Search - but obviously the order is different and depends on the index of the items rather than purely the order in which you find them). See the sketch after this list.
Division and modulo are really slow unless the right hand side is a constant.
Floating point math is not in general slower than integer math anymore (not "something you can do", but "something you can skip doing")
Branching is not free. If you can avoid it using simple arithmetic (anything but division or modulo) you can sometimes gain some performance. Moving a branch to outside a loop is almost always a good idea.
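As a sketch of the BitArray/BitMatrix idea referenced above: a tiny fixed-size bit set backed by ulong words, with no range checks and removal of the lowest set bit. Everything here is illustrative, not a drop-in replacement for the BCL types:

using System;

class TinyBitSet
{
    private readonly ulong[] words;

    public TinyBitSet(int capacity)
    {
        words = new ulong[(capacity + 63) >> 6];
    }

    // No argument validation: structure the caller so the index cannot go out of range.
    public void Add(int index)
    {
        words[index >> 6] |= 1UL << (index & 63);
    }

    public bool Contains(int index)
    {
        return (words[index >> 6] & (1UL << (index & 63))) != 0;
    }

    // Remove and return the lowest set bit, or -1 if the set is empty.
    // Usable as the "front" when traversing a bit-matrix row instead of a Queue.
    public int PopLowest()
    {
        for (int w = 0; w < words.Length; w++)
        {
            if (words[w] == 0) continue;
            int bit = 0;
            ulong v = words[w];
            while ((v & 1) == 0) { v >>= 1; bit++; }   // naive scan; real code would use a bit trick or intrinsic
            words[w] &= words[w] - 1;                  // clear the lowest set bit
            return (w << 6) + bit;
        }
        return -1;
    }
}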
People have funny ideas about what actually matters. Stack Overflow is full of questions about, for example, is ++i more "performant" than i++. Here's an example of real performance tuning, and it's basically the same procedure for any language. If code is simply written a certain way "because it's faster", that's guessing.
Sure, you don't purposely write stupid code, but if guessing worked, there would be no need for profilers and profiling techniques.
The truth is that there is no such thing as the perfect optimised code. You can, however, optimise for a specific portion of code, on a known system (or set of systems) on a known CPU type (and count), a known platform (Microsoft? Mono?), a known framework / BCL version, a known CLI version, a known compiler version (bugs, specification changes, tweaks), a known amount of total and available memory, a known assembly origin (GAC? disk? remote?), with known background system activity from other processes.
In the real world, use a profiler, and look at the important bits; usually the obvious things are anything involving I/O, anything involving threading (again, this changes hugely between versions), and anything involving loops and lookups, but you might be surprised at what "obviously bad" code isn't actually a problem, and what "obviously good" code is a huge culprit.
Tell the compiler what to do, not how to do it. As an example, foreach (var item in list) is better than for (int i = 0; i < list.Count; i++), and m = list.Max(i => i.value); is better than list.Sort((a, b) => a.value.CompareTo(b.value)); m = list[list.Count - 1].value;.
By telling the system what you want to do it can figure out the best way to do it. LINQ is good because its results aren't computed until you need them. If you only ever use the first result, it doesn't have to compute the rest.
Ultimately (and this applies to all programming) minimize loops and minimize what you do in loops. Even more important is to minimize the number of loops inside your loops. What's the difference between an O(n) algorithm and an O(n^2) algorithm? The O(n^2) algorithm has a loop inside of a loop.
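To make the last point concrete, here is a sketch (with made-up data) of removing a loop from inside a loop by swapping a repeated linear scan for a HashSet lookup:

using System;
using System.Collections.Generic;

class DuplicateCheck
{
    // O(n^2): for every id we scan the whole 'known' list again.
    static int CountKnownSlow(int[] ids, List<int> known)
    {
        int count = 0;
        foreach (int id in ids)
            foreach (int k in known)       // inner loop => quadratic
                if (k == id) { count++; break; }
        return count;
    }

    // Roughly O(n): one pass, constant-time lookups.
    static int CountKnownFast(int[] ids, List<int> known)
    {
        var knownSet = new HashSet<int>(known);
        int count = 0;
        foreach (int id in ids)
            if (knownSet.Contains(id))
                count++;
        return count;
    }

    static void Main()
    {
        int[] ids = { 1, 2, 3, 4, 5 };
        var known = new List<int> { 2, 4, 6 };
        Console.WriteLine(CountKnownSlow(ids, known));  // 2
        Console.WriteLine(CountKnownFast(ids, known));  // 2
    }
}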
I don't really try to optimize my code, but at times I will go through and use something like Reflector to turn my programs back into source. It is interesting to then compare what I wrote with what Reflector outputs. Sometimes I find that what I did in a more complicated form has been simplified. It may not optimize things, but it helps me to see simpler solutions to problems.