Performance cost of method encapsulation - C#

Is there a performance cost to encapsulating methods? A very brief, arbitrary example:
public static decimal Floor(decimal value)
{
    return Math.Floor(value);
}
Would the above function be inlined? And if so, would it be the exact same as calling Math.Floor() from the code? I did Google before writing this.

The method will most likely be inlined at JIT time (the C# compiler does not inline methods in IL). Even if it is not, the cost is unlikely to affect your overall program. Since optimization and performance numbers are specific to particular code and applications, you need to measure your own case if you see a performance problem.
In particular, the MSDN article Writing Faster Managed Code: Know What Things Cost gives the following estimate for the cost of a method call: at most 6.8 nanoseconds (on a 2003-level machine) if the call is not optimized.
Consider reading the rest of the article. Table 3 in particular covers not only the cost of method calls, but also the cost of operations as trivial as addition, subtraction, multiplication and division.
If you need to confirm whether a method is inlined, that is covered in many SO questions, such as Can I check if the C# compiler inlined a method call?
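A minimal sketch of one technique those questions describe: inspect the stack trace at run time and see whether the method's frame is still there (run a Release build without a debugger attached, since a debugger suppresses inlining; the AggressiveInlining attribute is only a hint):

using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

static class InlineProbe
{
    // If the JIT inlines this method into its caller, the frame printed
    // below is the caller's ("Main"), not "ReportTopFrame".
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static void ReportTopFrame()
    {
        Console.WriteLine(new StackTrace().GetFrame(0).GetMethod().Name);
    }

    static void Main()
    {
        ReportTopFrame();   // prints "Main" if the call was inlined
    }
}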

Related

Repeated access to properties and speed in C# [duplicate]

Please ignore code readability in this question.
In terms of performance, should the following code be written like this:
int maxResults = criteria.MaxResults;
if (maxResults > 0)
{
    while (accounts.Count > maxResults)
        accounts.RemoveAt(maxResults);
}
or like this:
if (criteria.MaxResults > 0)
{
    while (accounts.Count > criteria.MaxResults)
        accounts.RemoveAt(criteria.MaxResults);
}
?
Edit: criteria is a class, and MaxResults is a simple integer property (i.e., public int MaxResults { get { return _maxResults; } }).
Does the C# compiler treat MaxResults as a black box and evaluate it every time? Or is it smart enough to figure out that I've got 3 calls to the same property with no modification of that property between the calls? What if MaxResults was a field?
One of the laws of optimization is precalculation, so I instinctively wrote this code like the first listing, but I'm curious if this kind of thing is being done for me automatically (again, ignore code readability).
(Note: I'm not interested in hearing the 'micro-optimization' argument, which may be valid in the specific case I've posted. I'd just like some theory behind what's going on or not going on.)
First off, the only way to actually answer performance questions is to actually try it both ways and test the results in realistic conditions.
That said, the other answers which say that "the compiler" does not do this optimization because the property might have side effects are both right and wrong. The problem with the question (aside from the fundamental problem that it simply cannot be answered without actually trying it and measuring the result) is that "the compiler" is actually two compilers: the C# compiler, which compiles to MSIL, and the JIT compiler, which compiles IL to machine code.
The C# compiler never ever does this sort of optimization; as noted, doing so would require that the compiler peer into the code being called and verify that the result it computes does not change over the lifetime of the callee's code. The C# compiler does not do so.
The JIT compiler might. No reason why it couldn't. It has all the code sitting right there. It is completely free to inline the property getter, and if the jitter determines that the inlined property getter returns a value that can be cached in a register and re-used, then it is free to do so. (If you don't want it to do so because the value could be modified on another thread then you already have a race condition bug; fix the bug before you worry about performance.)
Whether the jitter actually does inline the property fetch and then enregister the value, I have no idea. I know practically nothing about the jitter. But it is allowed to do so if it sees fit. If you are curious about whether it does so or not, you can either (1) ask someone who is on the team that wrote the jitter, or (2) examine the jitted code in the debugger.
And finally, let me take this opportunity to note that computing results once, storing the result and re-using it is not always an optimization. This is a surprisingly complicated question. There are all kinds of things to optimize for:
execution time
executable code size -- this has a major effect on execution time because big code takes longer to load, increases the working set size, and puts pressure on processor caches, RAM and the page file. Small slow code is often in the long run faster than big fast code in important metrics like startup time and cache locality.
register allocation -- this also has a major effect on execution time, particularly in architectures like x86 which have a small number of available registers. Enregistering a value for fast re-use can mean that there are fewer registers available for other operations that need optimization; perhaps optimizing those operations instead would be a net win.
and so on. It gets real complicated real fast.
In short, you cannot possibly know whether writing the code to cache the result rather than recomputing it is actually (1) faster, or (2) better performing. Better performance does not always mean making execution of a particular routine faster. Better performance is about figuring out what resources are important to the user -- execution time, memory, working set, startup time, and so on -- and optimizing for those things. You cannot do that without (1) talking to your customers to find out what they care about, and (2) actually measuring to see if your changes are having a measurable effect in the desired direction.
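In that spirit, a minimal sketch of the kind of measurement involved, using a made-up Criteria class with a trivial getter (the numbers will depend entirely on your hardware, runtime and build settings, and a microbenchmark is still no substitute for measuring under realistic conditions):

using System;
using System.Diagnostics;

class Criteria
{
    private int _maxResults = 5;
    public int MaxResults { get { return _maxResults; } }
}

class Program
{
    static void Main()
    {
        var criteria = new Criteria();
        const int iterations = 100000000;

        // Version 1: cache the property in a local before the loop.
        var sw = Stopwatch.StartNew();
        long sum1 = 0;
        int maxResults = criteria.MaxResults;
        for (int i = 0; i < iterations; i++)
            if (maxResults > 0) sum1 += maxResults;
        sw.Stop();
        Console.WriteLine("cached:   {0} ms ({1})", sw.ElapsedMilliseconds, sum1);

        // Version 2: read the property on every iteration.
        sw.Restart();
        long sum2 = 0;
        for (int i = 0; i < iterations; i++)
            if (criteria.MaxResults > 0) sum2 += criteria.MaxResults;
        sw.Stop();
        Console.WriteLine("repeated: {0} ms ({1})", sw.ElapsedMilliseconds, sum2);
    }
}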
If MaxResults is a property then no, it will not optimize it, because the getter may have complex logic, say:
private int _maxResults;
public int MaxResults {
    get { return _maxResults++; }
    set { _maxResults = value; }
}
See how the behavior would change if the compiler cached that getter's result for you?
If there's no logic in the getter, either version you wrote is fine; the difference is minute, and it's all about how readable it is TO YOU (or your team), since you're the ones looking at it.
Your two code samples are only guaranteed to have the same result in single-threaded environments (which .NET isn't) and only if MaxResults is a field (not a property). The compiler can't assume, unless you use the synchronization features, that criteria.MaxResults won't change during the course of your loop. If it's a property, it can't assume that calling the property doesn't have side effects.
Eric Lippert points out quite correctly that it depends a lot on what you mean by "the compiler". The C# -> IL compiler? Or the IL -> machine code (JIT) compiler? And he's right to point out that the JIT may well be able to optimize the property getter, since it has all of the information (whereas the C# -> IL compiler doesn't, necessarily). It won't change the situation with multiple threads, but it's a good point nonetheless.
It will be called and evaluated every time. The compiler has no way of determining if a method (or getter) is deterministic and pure (no side effects).
Note that actual evaluation of the property may be inlined by the JIT compiler, making it effectively as fast as a simple field.
It's good practice to make property evaluation an inexpensive operation. If you do some heavy calculation in the getter, consider caching the result manually, or changing it to a method.
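For example, a sketch of caching an expensive getter manually (ComputeMaxResults stands in for whatever the heavy calculation is):

private int? _maxResults;                      // null until first access
public int MaxResults
{
    get
    {
        if (_maxResults == null)
            _maxResults = ComputeMaxResults(); // hypothetical expensive calculation
        return _maxResults.Value;
    }
}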
Why not test it?
Just set up two console apps, make each loop 10 million times, and compare the results... remember to run them as properly released, properly installed builds, or else you cannot guarantee that you are measuring optimized code.
Really, you are probably going to get about five answers saying "you shouldn't worry about optimisation". They clearly do not write routines that need to be as fast as possible before being readable (e.g. games).
If this piece of code is part of a loop that is executed billions of times, then this optimisation could be worthwhile. For instance, MaxResults could be an overridden member, in which case you also need to consider the cost of virtual calls.
Really, the ONLY way to answer any of these questions is to figure out whether this is a piece of code that will benefit from optimisation. Then you need to know what kinds of things are increasing the time to execute. We mere mortals cannot do this a priori, so we simply have to try two or three different versions of the code and test them.
If criteria is a class type, I doubt it would be optimized, because another thread could always change that value in the meantime. For structs I'm not sure, but my gut feeling is that it won't be optimized, but I think it wouldn't make much difference in performance in that case anyhow.

Will calling many small methods consume more memory? (C#)

I have one question. Instead of writing a big method (containing a lot of business logic), I prefer to divide it into small methods and call them from one method, because to me that looks neat and is easy to maintain. But my team lead said: "Don't write small methods and call them from one method, because calling small methods consumes more memory." Is that correct?
Please suggest what I should do in this case, and once again thank you for your valuable time.
There are many factors that come into play here. More context of your project would be required to give any strict conclusions.
Generally speaking though, C#, VB and managed languages in general were devised to prioritize developer productivity over performance. In that light, worrying about method call memory consumption seems questionable.
Additionally, IL-based languages (C#, VB, ...) use a JIT that compiles the intermediate code to CPU-specific assembly in runtime. JIT's unit of work is a method. The bigger the method, the less optimizations JIT can do. Therefore a big method may yield worse performance than many small methods doing the same work. In addition, JIT can also do an optimization called inlining where a small method code is generated inside its caller, eliding the function call altogether.
A function call takes very little memory by C#/VB standards. Unless you're working in a very constrained environment (e.g. embedded), such an optimization doesn't really make sense, especially when it is not backed by any reasonable arguments.
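As a rough illustration of the inlining point, a sketch (the method names and the calculation are made up): each helper is tiny, so the JIT is free to inline the calls back into the caller, keeping the readability of small methods without necessarily paying for extra calls.

using System.Runtime.CompilerServices;

static class OrderMath
{
    // One readable method composed of small helpers.
    public static double Total(double price, int quantity, double taxRate)
    {
        double subtotal = Subtotal(price, quantity);
        return subtotal + Tax(subtotal, taxRate);
    }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]   // a hint only; the JIT decides
    static double Subtotal(double price, int quantity)
    {
        return price * quantity;
    }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    static double Tax(double subtotal, double rate)
    {
        return subtotal * rate;
    }
}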
You are both mistaken.
OOP is built on the concept of divide and conquer, so you should divide your method into small methods for the sake of reusability and maintainability.
As for the memory consumed, I don't think it will consume more memory, although that may happen if you create a method for every tiny task.
So yes, divide them into small methods, but only where you need to, with respect for resources and shared variables.

The 32-byte limit for inlined functions... isn't it too small?

I have some very small C# code marked for inlining, but it doesn't work. I have seen that the longer function generates more than 32 bytes of IL code. Isn't the 32-byte limit too short?
// inlined
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static bool INL_IsInRange(this byte pValue, byte pMin) {
    return (pValue >= pMin);
}

// NOT inlined
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static bool INL_IsInRange(this byte pValue, byte pMin, byte pMax) {
    return (pValue >= pMin && pValue <= pMax);
}
Is it possible to change that limit?
I am looking for inline function criteria also. In your case, I believe that JIT optimization timed out before it could reach the decision to inline your second function. For the JIT, inlining a function is not a priority, so it was busy analyzing your long code. However, if you place your calls inside tight loops, the JIT will probably inline them, as inner calls gain priority for inlining. If you really care about this kind of micro-optimization, it's time to switch to C++. It's a whole brave new world out there for you to explore and exploit!
I noticed that the question was edited right after this answer was posted, meaning a high level of interactivity. Well, I don't know why there is a limit of 32 bytes, but that seems to be exactly the size of a CPU cache block, conservatively speaking. What a coincidence! In any case, code optimization must be done against a particular hardware configuration, and is better saved in an extra file side by side with its assembly. The timeout policy is stupid, because optimization is not supposed to be done at run time, competing against precious code execution time. Optimization is supposed to be done at application load time, only the first time it's run on the machine, once and for all. It can be triggered again when a hardware configuration change is detected. Again, if you really need performance, just go with C/C++. C# is not designed for performance and will never make performance its top priority. Like Java, C# is designed for safety, with a much stronger caution against possible negative performance impacts.
Beyond the "32 bytes of IL" limit, there are a number of other factors which affect whether a method will be inlined or not. There are at least a couple of articles that describe these factors.
One article explains that a scoring heuristic is used to adjust an initial guess about the relative size of the code when inlined vs not (i.e. whether the call site is larger or smaller than the inlined code itself):
If inlining makes code smaller than the call it replaces, it is ALWAYS good. Note that we are talking about the NATIVE code size, not the IL code size (which can be quite different).
The more a particular call site is executed, the more it will benefit from inlining. Thus code in loops deserves to be inlined more than code that is not in loops.
If inlining exposes important optimizations, then inlining is more desirable. In particular, methods with value type arguments benefit more than normal because of optimizations like this, and thus having a bias to inline these methods is good.
Thus the heuristic the x86 JIT compiler uses is, given an inline candidate:
Estimate the size of the call site if the method were not inlined.
Estimate the size of the call site if it were inlined (this is an estimate based on the IL, we employ a simple state machine (Markov Model), created using lots of real data to form this estimator logic)
Compute a multiplier. By default it is 1
Increase the multiplier if the code is in a loop (the current heuristic bumps it to 5 in a loop)
Increase the multiplier if it looks like struct optimizations will kick in.
If InlineSize <= NonInlineSize * Multiplier do the inlining.
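Paraphrased as code, that decision roughly looks like this (an illustrative sketch of the description above, not the actual JIT source; the struct-optimization bump is a placeholder because the article gives no number for it):

// Illustrative paraphrase of the heuristic quoted above.
static bool ShouldInline(int inlineSize, int nonInlineSize,
                         bool inLoop, bool structOptimizationsLikely)
{
    int multiplier = 1;            // default multiplier
    if (inLoop)
        multiplier = 5;            // the in-loop bump mentioned above
    if (structOptimizationsLikely)
        multiplier *= 2;           // placeholder value, not from the article
    return inlineSize <= nonInlineSize * multiplier;
}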
Another article explains several conditions that will prevent a method from being inlined based on their mere existence (including the "32-bytes of IL" limit):
These are some of the reasons for which we won't inline a method:
Method is marked as not inline with the CompilerServices.MethodImpl attribute.
Size of inlinee is limited to 32 bytes of IL: This is a heuristic; the rationale behind it is that usually, when you have methods bigger than that, the overhead of the call will not be as significant compared to the work the method does. Of course, as a heuristic, it fails in some situations. There have been suggestions for us to add an attribute to control this threshold. For Whidbey, that attribute has not been added (it has some very bad properties: it's x86 JIT specific and its long-term value, as compilers get smarter, is dubious).
Virtual calls: We don't inline across virtual calls. The reason for not doing this is that we don't know the final target of the call. We could potentially do better here (for example, if 99% of calls end up in the same target, you can generate code that does a check on the method table of the object the virtual call is going to execute on, if it's not the 99% case, you do a call, else you just execute the inlined code), but unlike the J language, most of the calls in the primary languages we support, are not virtual, so we're not forced to be so aggressive about optimizing this case.
Valuetypes: We have several limitations regarding value types and inlining. We take the blame here; this is a limitation of our JIT, we could do better and we know it. Unfortunately, when stack ranked against other features of Whidbey, getting some statistics on how frequently methods cannot be inlined due to this reason and considering the cost of making this area of the JIT significantly better, we decided that it made more sense for our customers to spend our time working on other optimizations or CLR features. Whidbey is better than previous versions in one case: value types that only have a pointer-sized int as a member; this was (relatively) inexpensive to improve, and helped a lot with common value types such as pointer wrappers (IntPtr, etc).
MarshalByRef: Call targets that are in MarshalByRef classes won't be inlined (the call has to be intercepted and dispatched). We've got better in Whidbey for this scenario.
VM restrictions: These are mostly security, the JIT must ask the VM for permission to inline a method (see CEEInfo::canInline in Rotor source to get an idea of what kind of things the VM checks for).
Complicated flowgraph: We don't inline loops, methods with exception handling regions, etc...
If basic block that has the call is deemed as it won't execute frequently (for example, a basic block that has a throw, or a static class constructor), inlining is much less aggressive (as the only real win we can make is code size)
Other: Exotic IL instructions, security checks that need a method frame, etc...
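To make the "complicated flowgraph" point concrete, a small sketch: per the list above, the second method is rejected as an inlining candidate simply because it contains an exception-handling region, even though it is tiny.

static int AddSimple(int a, int b)
{
    return a + b;              // small, no EH region: an inlining candidate
}

static int AddGuarded(int a, int b)
{
    try
    {
        return checked(a + b); // the try/catch makes this a non-candidate
    }
    catch (OverflowException)
    {
        return int.MaxValue;
    }
}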

JIT & loop optimization

using System;

namespace ConsoleApplication1
{
    class TestMath
    {
        static void Main()
        {
            double res = 0.0;
            for (int i = 0; i < 1000000; ++i)
                res += System.Math.Sqrt(2.0);
            Console.WriteLine(res);
            Console.ReadKey();
        }
    }
}
Benchmarking this code against the C++ version, I discovered that it is about 10 times slower. I have no problem with that, but it leads me to the following question:
It seems (after some searching) that the JIT compiler can't optimize this code the way a C++ compiler can, namely call sqrt once and multiply the result by 1000000.
Is there a way to force JIT to do it ?
I repro, I clock the C++ version at 1.2 msec, the C# version at 12.2 msec. The reason is readily visible if you take a look at the machine code the C++ code generator and optimizer emits. It rewrites the loop like this (using the C# equivalent):
double temp = Math.Sqrt(2.0);
for (int i = 0; i < 1000000; ++i) {
    res += temp;
}
That's a combination of two optimizations, called "invariant code motion" and "loop hoisting". In other words, the C++ compiler knows enough about the sqrt() function to know that its return value is not affected by the surrounding code, so it can be moved at will. And that it is then worthwhile to move that code outside of the loop and create an extra local variable to store the result. And that calculating sqrt() is slower than adding. Sounds obvious, but that's a rule that has to be built into the optimizer and has to be considered, one of many, many rules.
And yes, the jitter optimizer misses that one. It is guilty of not being able to spend the same amount of time as the C++ optimizer; it operates under heavy time constraints. Because if it takes too long, the program takes too much time to start.
Tongue in cheek: a C# programmer needs to be a bit smarter than the code generator and recognize these optimization opportunities himself. This is a fairly obvious one. Well, now that you know about it anyway :)
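A sketch of doing the hoisting by hand and timing both versions yourself (the absolute numbers will of course differ by machine and runtime):

using System;
using System.Diagnostics;

class SqrtHoist
{
    static void Main()
    {
        const int n = 1000000;

        var sw = Stopwatch.StartNew();
        double res1 = 0.0;
        for (int i = 0; i < n; ++i)
            res1 += Math.Sqrt(2.0);       // call inside the loop
        sw.Stop();
        Console.WriteLine("in loop: {0} ms ({1})", sw.ElapsedMilliseconds, res1);

        sw.Restart();
        double res2 = 0.0;
        double temp = Math.Sqrt(2.0);     // hoisted by hand, as above
        for (int i = 0; i < n; ++i)
            res2 += temp;
        sw.Stop();
        Console.WriteLine("hoisted: {0} ms ({1})", sw.ElapsedMilliseconds, res2);
    }
}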
To do the optimization you want, the compiler has to assure that the function Sqrt() will always return the same value for a certain input.
The compiler can do all kinds of checks to verify that the function isn't using any "outer" variables, i.e. that it is stateless. But even that doesn't always mean the result can't be affected by side effects.
When a function is called in a loop it should be called in each iteration (think of a multithreaded environment to see why this is important). So usually it's up to the user to take constant stuff out of the loop if he wants that kind of optimization.
Back to the C++ compiler - the compiler might have certain optimization for its library functions. A lot of compilers try to optimize important libraries like the math library, so that might be compiler specific.
Another big difference is in C++ you usually include that kinda stuff from a header file. This means the compiler may have all the information it needs to decide if the function call doesn't change between calls.
The .Net compiler (at compile time - Visual Studio) doesn't always have all the code to parse. Most of the library functions are already compiled (into IL - first stage). And so might not be able to do deep optimizations considering 3rd party dlls. And at the JIT (runtime) compilation it will probably be too costly to do these kind of optimizations across assemblies.
It might help the JIT (or even the C# compiler) if Math.Sqrt was annotated as [Pure]. Then, assuming the arguments to the function are constant as they are in your example, the calculation of the value could be lifted outside the loop.
What's more, such a loop could reasonably be converted into the code:
double res = 1000000 * Math.Sqrt(2.0);
In theory the compiler or JIT could perform this automatically. However I suspect that it would be optimising for a pattern that happens rarely in actual code.
I opened a feature request for ReSharper, suggesting that the design-time tool suggests such a refactoring.

Cost of using params in C#

Does anyone have advice for using params in C# for method argument passing? I'm contemplating making overloads for the first 6 arguments and then a 7th using the params feature. My reasoning is to avoid the extra array allocation the params feature requires. This is for some high-performance utility methods. Any advice? Is it a waste of code to create all the overloads?
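For reference, the pattern being contemplated looks roughly like this (Log is a made-up utility method; the real ones would do whatever your library does). Callers with few arguments bind to the fixed-arity overloads and allocate nothing; only the rest fall through to the params version, which is the same shape String.Concat uses.

public static class Util
{
    // Dedicated overloads for the common cases: no array is allocated.
    public static void Log(object a0)
    {
        System.Console.WriteLine(a0);
    }

    public static void Log(object a0, object a1)
    {
        System.Console.WriteLine("{0} {1}", a0, a1);
    }

    public static void Log(object a0, object a1, object a2)
    {
        System.Console.WriteLine("{0} {1} {2}", a0, a1, a2);
    }

    // Fallback: more arguments pay for one object[] allocation per call.
    public static void Log(params object[] args)
    {
        System.Console.WriteLine(string.Join(" ", args));
    }
}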
Honestly, I'm a little bothered by everyone shouting "premature optimization!" Here's why.
What you say makes perfect sense, particularly as you have already indicated you are working on a high-performance library.
Even BCL classes follow this pattern. Consider all the overloads of string.Format or Console.WriteLine.
This is very easy to get right. The whole premise behind the movement against premature optimization is that when you do something tricky for the purposes of optimizing performance, you're liable to break something by accident and make your code less maintainable. I don't see how that's a danger here; it should be very straightforward what you're doing, to yourself as well as any future developer who may deal with your code.
Also, even if you profiled the results of both approaches and saw only a very small difference in speed, there's still the issue of memory allocation. Creating a new array for every method call entails allocating more memory that will need to be garbage collected later. And in some scenarios where "nearly" real-time behavior is desired (such as algorithmic trading, the field I'm in), minimizing garbage collections is just as important as maximizing execution speed.
So, even if it earns me some downvotes: I say go for it.
(And to those who claim "the compiler surely already does something like this"--I wouldn't be so sure. Firstly, if that were the case, I fail to see why BCL classes would follow this pattern, as I've already mentioned. But more importantly, there is a very big semantic difference between a method that accepts multiple arguments and one that accepts an array. Just because one can be used as a substitute for the other doesn't mean the compiler would, or should, attempt such a substitution).
Yes, that's the strategy that the .NET framework uses. String.Concat() would be a good example. It has overloads for up to 4 strings, plus a fallback one that takes a params string[]. Pretty important here, Concat needs to be fast and is there to help the user fall in the pit of success when he uses the + operator instead of a StringBuilder.
The code duplication you'll get is the price. You'd profile them to see if the speedup is worth the maintenance headache.
Fwiw: there are plenty of micro-optimizations like this in the .NET framework. Somewhat necessary because the designers could not really predict how their classes were going to be used. String.Concat() is just as likely to be used in a tight inner loop that is critical to program perf as, say, a config reader that only runs once at startup. As the end-user of your own code, you typically have the luxury of not having to worry about that. The reverse is also true, the .NET framework code is remarkably free of micro-optimizations when it is unlikely that their benefit would be measurable. Like providing overloads when the core code is slow anyway.
You can always pass Tuple as a parameter, or if the types of the parameters are always the same, an IList<T>.
As other answers and comments have said, you should only optimize after:
Ensuring correct behavior.
Determining the need to optimize.
My point is, if your method is capable of taking an unlimited number of parameters, then the logic inside it works on an array anyway. So having overloads for a limited number of parameters wouldn't help, unless you can implement the limited-parameter versions in a completely different way that is much faster.
For example, if you're handing the parameters to a Console.WriteLine, there's a hidden array creation in there too, so either way you end up having an array.
And, sorry to bother Dan Tao, but I also feel this is premature optimization, because you need to know what difference it would make to have overloads with a limited number of parameters. If your application is that performance-critical, you'd need to implement it both ways, run a test and compare execution times.
Don't even think about performance at this stage. Create whatever overloads will make your code easier to write and easier to understand at 4am two years from now. Sometimes that means params, sometimes that means avoiding it.
After you've got something that works, figure out if these are a performance problem. It's not hard to make the parameters more complicated, but if you add unnecessary complexity now, you'll never make them less so later.
You can try something like this to benchmark the performance so you have some concrete numbers to make decisions with.
In general, object allocation is slightly faster than in C/C++ and deletion is much, much faster for small objects -- until you have tens of thousands of them being made per second. Here's an old article regarding memory allocation performance.
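Since the "something like this" link above isn't reproduced here, a minimal sketch of that kind of benchmark (Sum3 and SumParams are made-up stand-ins for your own utility methods):

using System;
using System.Diagnostics;

class ParamsBenchmark
{
    static long Sum3(long a, long b, long c) { return a + b + c; }

    static long SumParams(params long[] values)
    {
        long total = 0;
        for (int i = 0; i < values.Length; i++) total += values[i];
        return total;
    }

    static void Main()
    {
        const int iterations = 10000000;
        long sink = 0;

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            sink += Sum3(i, i, i);          // fixed-arity overload: no allocation per call
        sw.Stop();
        Console.WriteLine("overloads: {0} ms", sw.ElapsedMilliseconds);

        sw.Restart();
        for (int i = 0; i < iterations; i++)
            sink += SumParams(i, i, i);     // params: allocates a long[3] per call
        sw.Stop();
        Console.WriteLine("params:    {0} ms ({1})", sw.ElapsedMilliseconds, sink);
    }
}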
