I have a helper method that takes a begin date and an end date and through certain business logic yields an integer result. This helper method is sometimes called in excess of 10,000 times for a given set of data (though this doesn't occur often).
Question:
Considering performance only, is it more efficient to make this helper a static method on some helper class, or would it be better to make it a public instance method of a class?
Static method example:
// an iterative loop
foreach (var result in results) {
    int daysInQueue = HelperClass.CalcDaysInQueue(dtBegin, dtEnd);
}
Public member method example:
// an iterative loop
HelperClass hc = new HelperClass();
foreach (var result in results) {
    int daysInQueue = hc.CalcDaysInQueue(dtBegin, dtEnd);
}
Thanks in advance for the help!
When you call an instance method, the compiler always invisibly passes one extra parameter, available inside that method under the name 'this'. Static methods are not called on behalf of any object, so they don't have a 'this' reference.
I see a few benefits of marking utility methods as static:
Small performance improvement: you don't pay for passing a reference to this that you don't actually use. However, I doubt you will ever see the difference.
Convenience: you can call a static method wherever and whenever you want; the compiler doesn't force you to provide an instance of an object that the method doesn't really need.
Readability: an instance method should operate on the instance's state, not merely on its parameters. An instance method that doesn't need an instance to work is confusing.
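To make the difference concrete, here is a minimal sketch with the same calculation written both ways. The class names QueueMath and QueueCalculator are hypothetical, and the body (counting whole calendar days between the two dates) is an assumed stand-in for the unspecified business logic in CalcDaysInQueue.

```csharp
using System;

// Hypothetical business rule for illustration only: whole calendar days between
// the two dates. The real CalcDaysInQueue logic is not shown in the question.
public static class QueueMath
{
    // Static version: callable without an instance, and no 'this' is passed.
    public static int CalcDaysInQueue(DateTime begin, DateTime end)
    {
        return (end.Date - begin.Date).Days;
    }
}

public class QueueCalculator
{
    // Instance version: the caller must create a QueueCalculator first, and 'this'
    // is passed implicitly even though the method never reads any instance state.
    public int CalcDaysInQueue(DateTime begin, DateTime end)
    {
        return (end.Date - begin.Date).Days;
    }
}
```

Both produce the same result; the only differences are the hidden 'this' argument and the need to construct an object before calling the instance version.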
The difference in performance here is effectively nothing. You will have a hard time actually measuring the difference in time (and getting over the "noise" of everything else going on with your CPU); that's how small it will be.
Unless you happen to perform a whole bunch of database queries or read several gigabytes of data from files in the object's constructor (I'm assuming here that it's empty), creating the object has a fairly small cost, and since it happens outside the loop, that cost doesn't grow with the number of iterations.
You should be making this decision based on what logically makes sense, not based on performance, until you have a strong reason to believe that there is a significant, and necessary performance gain to be had by violating standard practices/readability/etc.
In this particular case your operation is logically 'static'. There is no state that is used, so there is no need to have an instance of the object, as such the method should be made static. Others have said that it might perform better, which is very possibly true, but that shouldn't be why you make it static. If the operation logically made sense as an instance method you shouldn't try to force it into a static method just to try to get it to run faster; that's learning the wrong lesson here.
Just benchmark it :) In theory a static method should be faster, since it avoids the virtual call overhead, but that overhead might not be significant in your case (and I'm not even sure what language the example is in). Just time both loops with enough iterations that each takes a minute or so, and see for yourself. Just make sure you use non-trivial data so your compiler doesn't optimize the calls out.
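A rough C# timing harness along these lines might look like the sketch below. StaticDays, Helper and InstanceDays are hypothetical stand-ins for CalcDaysInQueue; summing every result into a returned checksum keeps the calls from being treated as dead code. Stopwatch timings like these are noisy (JIT warm-up, other processes), so treat the printed numbers as indicative only.

```csharp
using System;
using System.Diagnostics;

public static class CallTiming
{
    // Hypothetical static helper standing in for CalcDaysInQueue.
    public static int StaticDays(DateTime begin, DateTime end) => (end.Date - begin.Date).Days;

    // Hypothetical instance helper with the same body.
    public sealed class Helper
    {
        public int InstanceDays(DateTime begin, DateTime end) => (end.Date - begin.Date).Days;
    }

    // Times both loops and returns their checksums; because every result feeds
    // the returned sums, the JIT cannot discard the calls as unused.
    public static (long staticSum, long instanceSum) Run(int iterations)
    {
        var begin = new DateTime(2024, 1, 1);
        var helper = new Helper(); // created once, outside the loop

        long staticSum = 0;
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            staticSum += StaticDays(begin, begin.AddDays(i % 365));
        sw.Stop();
        Console.WriteLine($"static:   {sw.ElapsedMilliseconds} ms");

        long instanceSum = 0;
        sw.Restart();
        for (int i = 0; i < iterations; i++)
            instanceSum += helper.InstanceDays(begin, begin.AddDays(i % 365));
        sw.Stop();
        Console.WriteLine($"instance: {sw.ElapsedMilliseconds} ms");

        return (staticSum, instanceSum);
    }
}
```

Calling something like CallTiming.Run(10_000_000) from a Release build gives a rough comparison; matching checksums confirm both loops did the same work.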
Based on my understanding, making it a static method would be more beneficial for performance, because no instance of the object has to be created, although I think the difference would be negligible. That holds only if the static function doesn't have to recreate, on every call, some data that an instance could otherwise keep in its fields.
You say 'considering performance only'. In that case, you should focus entirely on what's inside
HelperClass.CalcDaysInQueue(dtBegin, dtEnd);
And not on the 0.0001% of runtime spent in calling that routine. If it's a short routine the JIT compiler will inline it anyway and in that case there will be NO performance difference between the static and instance method.
I have a class like the one below (adapted for this question) in a PR, and one of the seniors, who is now on holiday, said I was misusing static variables and should pass variables from method to method.
class Class {
    static int dataPool;

    // The methods below are called cyclically for the duration of the application,
    // xxxx times: First is called, then Second is called,
    // over and over.
    public static void First()
    {
        // at the start, reads dataPool count to do computation
        // at the end, sets dataPool count based on computation
    }

    public static void Second()
    {
        // at the start, reads dataPool count to do some computation
        // at the end, sets dataPool count based on computation
    }
}
I want to understand why using static variables like this is 'bad', so that I can learn from it. Can anyone explain, please?
Main reasons:
It means your code is no longer re-entrant.
It means your code is no longer thread-safe (or is even less thread-safe) and cannot run concurrently.
It means your code has state in unexpected places (see: Principle of Least Astonishment).
It means your code will perform poorly in a multi-threaded system (even if it's strictly single-threaded and without reentrancy), because mutable static state means the compiler cannot optimize your program's use of memory: it cannot guarantee thread ownership of data. Cache coherency is expensive.
It means you cannot correctly unit-test your code (using the strict definition of "unit test", not Visual Studio's).
It means the JIT compiler cannot optimize your program as much as it otherwise could, because reasoning about the lifetime of static state is very difficult.
It means your code could break if another function or thread mutates your static state without your knowledge.
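One way to follow the reviewer's advice is to have each step take the current count as a parameter and return the new count, so the caller owns the state. This is only a sketch: the arithmetic inside First and Second is a placeholder for the real computation, and RunCycles is a hypothetical driver loop.

```csharp
using System;

public static class Pipeline
{
    // Each step receives the current count and returns the new one instead of
    // reading and writing a shared static field. The arithmetic is a placeholder.
    public static int First(int dataPool) => dataPool + 1;

    public static int Second(int dataPool) => dataPool * 2;

    // The caller owns the state, so two callers (or two threads) each running
    // their own loop cannot interfere with each other, and each step can be
    // unit-tested in isolation by passing in a value.
    public static int RunCycles(int initial, int cycles)
    {
        int dataPool = initial;
        for (int i = 0; i < cycles; i++)
        {
            dataPool = First(dataPool);
            dataPool = Second(dataPool);
        }
        return dataPool;
    }
}
```

Because First and Second are now pure functions of their input, the re-entrancy, thread-safety and testability concerns listed above no longer apply to them.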
I've noticed that the following code generates heap allocations which trigger the garbage collector at some point and I would like to know why this is the case and how to avoid it:
private Dictionary<Type, Action> actionTable = new Dictionary<Type, Action>();

private void Update(int num)
{
    Action action = null; // must be assigned, or action?.Invoke() below does not compile
    // if (!actionTable.TryGetValue(typeof(int), out action))
    if (false)
    {
        action = () => Debug.Log(num);
        actionTable.Add(typeof(int), action);
    }
    action?.Invoke();
}
I understand that using a lambda such as () => Debug.Log(num) will generate a small helper class (e.g. <>c__DisplayClass7_0) to hold the local variable. This is why I wanted to test whether I could cache this allocation in a dictionary. However, I noticed that the call to Update leads to allocations even when the lambda code is never reached because of the if statement. When I comment out the lambda, the allocation disappears from the profiler. I am using the Unity Profiler (a performance reporting tool within the Unity game engine), which shows such allocations in bytes per frame while in development/debug mode.
I surmise that the compiler or JIT compiler generates the helper class for the lambda for the scope of the method even though I don't understand why this would be desirable.
Finally, is there any way of caching delegates in this manner without allocating and without forcing the calling code to cache the action in advance? (I do know that I could also allocate the action once in the client code, but in this example I would strictly like to implement some kind of automatic caching because I do not have complete control over the client.)
Disclaimer: This is mostly a theoretical question out of interest. I do realize that most applications will not benefit from micro-optimizations like this.
Servy's answer is correct and gives a good workaround. I thought I might add a few more details.
First off: implementation choices of the C# compiler are subject to change at any time and for any reason; nothing I say here is a requirement of the language and you should not depend on it.
If you have a closed-over outer variable of a lambda then all closed-over variables are made into fields of a closure class, and that closure class is allocated from the long-term pool ("the heap") as soon as the function is activated. This happens regardless of whether the closure class is ever read from.
The compiler team could have chosen to defer creation of the closure class until the first point where it was used: where a local was read or written or a delegate was created. However, that would then add additional complexity to the method! That makes the method larger, it makes it slower, it makes it more likely that you'll have a cache miss, it makes the jitter work harder, it makes more basic blocks so the jitter might skip an optimization, and so on. This optimization likely does not pay for itself.
However, the compiler team does make similar optimizations in cases where it is more likely to pay off. Two examples:
The 99.99% likely scenario for an iterator block (a method with a yield return in it) is that the IEnumerable will have GetEnumerator called exactly once. The generated enumerable therefore has logic that implements both IEnumerable and IEnumerator; the first time GetEnumerator is called, the object is cast to IEnumerator and returned. The second time, we allocate a second enumerator. This saves one object in the highly likely scenario, and the extra code generated is pretty simple and rarely called.
It is common for async methods to have a "fast path" that returns without ever awaiting -- for example, you might have an expensive asynchronous call the first time, and then the result is cached and returned the second time. The C# compiler generates code that avoids creating the "state machine" closure until the first await is encountered, and therefore prevents an allocation on the fast path, if there is one.
These optimizations tend to pay off, but 99% of the time when you have a method that makes a closure, it actually makes the closure. It's not really worth deferring it.
I surmise that the compiler or JIT compiler generates the helper class for the lambda for the scope of the method even though I don't understand why this would be desirable.
Consider the case where there's more than one anonymous method with a closure in the same method (a common enough occurrence). Do you want to create a new instance for every single one, or just have them all share a single instance? They went with the latter. There are advantages and disadvantages to either approach.
Finally, is there any way of caching delegates in this manner without allocating and without forcing the calling code to cache the action in advance?
Simply move that anonymous method into its own method, so that when that method is called the anonymous method is created unconditionally.
private void Update(int num)
{
    Action action = null;
    // if (!actionTable.TryGetValue(typeof(int), out action))
    if (false)
    {
        Action CreateAction()
        {
            return () => Debug.Log(num);
        }
        action = CreateAction();
        actionTable.Add(typeof(int), action);
    }
    action?.Invoke();
}
(I didn't check if the allocation happened for a nested method. If it does, make it a non-nested method and pass in the int.)
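Here is a sketch of the non-nested fallback mentioned in the note above: the lambda lives in a separate method that receives the int. ActionCache and its Log list are hypothetical names; Log stands in for Unity's Debug.Log so the code runs outside Unity. Where the closure object gets allocated is a compiler implementation detail, so confirm the effect in the profiler. As in the question, the table holds one action per type, so later calls reuse the action created for the first num.

```csharp
using System;
using System.Collections.Generic;

public class ActionCache
{
    private readonly Dictionary<Type, Action> actionTable = new Dictionary<Type, Action>();

    // Hypothetical stand-in for Debug.Log.
    public List<string> Log { get; } = new List<string>();

    public void Update(int num)
    {
        // Update contains no lambda itself, so it has no closure object of its own.
        if (!actionTable.TryGetValue(typeof(int), out Action action))
        {
            action = CreateAction(num);
            actionTable.Add(typeof(int), action);
        }
        action.Invoke();
    }

    // The captured variable belongs to this method, so its closure object is
    // only created when a new action actually has to be built.
    private Action CreateAction(int value)
    {
        return () => Log.Add(value.ToString());
    }
}
```

After the first call, Update only performs a dictionary lookup and an invocation of the cached delegate.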
Which of the following two pieces of code will perform better in different cases and why?
1.
private readonly ConcurrentDictionary<int, List<T>> _coll;
_coll.GetOrAdd(1, new List<T>());
This creates a new List on every call even when it is not needed (How much does this statement still matter if we pass the capacity as 0?).
2.
private readonly ConcurrentDictionary<int, List<T>> _coll;
_coll.GetOrAdd(1, (val) => new List<T>());
This only creates the List on demand, but has a delegate call.
In terms of memory, the first way is going to cause an allocation every time, while the second will use a cached delegate object since it does not capture any variables. The compiler handles the generation of the cached delegate. There is no difference in the first case for capacity set to zero since the default constructor for List<T> uses an empty array on initialization, the same as an explicit capacity of 0.
In terms of execution instructions, they are the same when the key is found since the second argument is not used. If the key is not found, the first way simply has to read a local variable while the second way will have a layer of indirection to invoke a delegate. Also, looking into the source code, it appears that GetOrAdd with the factory will do an additional lookup (via TryGetValue) to avoid invoking the factory. The delegate could also potentially be executed multiple times. GetOrAdd simply guarantees that you see one entry in the dictionary, not that the factory is invoked only once.
In summary, the first way might be more performant if the key is typically not found since the allocation needs to happen anyway and there is no indirection via a delegate. However if the key is typically found, the second way is more performant because there are fewer allocations. For an implementation in a cache, you typically expect there to be lots of hits so if that is where this is, I would recommend the second way. In practice, the difference between the two depends on how sensitive the overall application is to allocations in this code path.
Also whatever implementation that is using this will likely need to implement locking around the List<T> that is returned since it is not thread-safe.
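A sketch of both overloads side by side, with int standing in for T; ListRegistry and its methods are hypothetical names. It also shows one way to handle the thread-safety caveat: locking on the returned list before mutating it.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

public class ListRegistry
{
    private readonly ConcurrentDictionary<int, List<int>> _coll =
        new ConcurrentDictionary<int, List<int>>();

    // Overload 1: a new List is allocated on every call, even when the key exists
    // and the new List is simply discarded.
    public List<int> GetEager(int key) => _coll.GetOrAdd(key, new List<int>());

    // Overload 2: the factory runs only when the key is missing. The lambda captures
    // nothing, so the compiler can cache the delegate. Under contention the factory
    // may still run more than once; only one resulting List is stored.
    public List<int> GetLazy(int key) => _coll.GetOrAdd(key, k => new List<int>());

    // List<T> is not thread-safe, so mutations are serialized on the list itself.
    public void AddItem(int key, int item)
    {
        List<int> list = GetLazy(key);
        lock (list)
        {
            list.Add(item);
        }
    }
}
```

Every caller gets the same stored List for a given key, whichever overload it uses; the overloads differ only in whether a throwaway List is allocated on a hit.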
I can't imagine you'd see much of a difference in performance unless you were working with an extremely large dataset. It would also depend on how likely each of your items is to be hit. Generics are extremely well optimised at the runtime level, and using a delegate results in an allocation either way.
My suggestion would be to use Enumerable.Empty<T>() as you'll be saving yourself an allocation on each array item.
Are there overall rules/guidelines for what makes a method thread-safe? I understand that there are probably a million one-off situations, but what about in general? Is it this simple?
If a method only accesses local variables, it's thread safe.
Is that it? Does that apply for static methods as well?
One answer, provided by @Cybis, was:
Local variables cannot be shared among threads because each thread gets its own stack.
Is that the case for static methods as well?
If a method is passed a reference object, does that break thread safety? I have done some research, and there is a lot out there about certain cases, but I was hoping to be able to define, by using just a few rules, guidelines to follow to make sure a method is thread safe.
So, I guess my ultimate question is: "Is there a short list of rules that define a thread-safe method? If so, what are they?"
EDIT
A lot of good points have been made here. I think the real answer to this question is: "There are no simple rules to ensure thread safety." Cool. Fine. But in general I think the accepted answer provides a good, short summary. There are always exceptions. So be it. I can live with that.
If a method (instance or static) only references variables scoped within that method then it is thread safe because each thread has its own stack:
In this instance, multiple threads could call ThreadSafeMethod concurrently without issue.
public class Thing
{
    public int ThreadSafeMethod(string parameter1)
    {
        int number; // each thread will have its own variable for number.
        number = parameter1.Length;
        return number;
    }
}
This is also true if the method calls other class methods that only reference locally scoped variables:
public class Thing
{
    public int ThreadSafeMethod(string parameter1)
    {
        int number;
        number = this.GetLength(parameter1);
        return number;
    }

    private int GetLength(string value)
    {
        int length = value.Length;
        return length;
    }
}
If a method accesses any (object state) properties or fields (instance or static) then you need to use locks to ensure that the values are not modified by a different thread:
public class Thing
{
    private string someValue; // all threads will read and write to this same field value

    public int NonThreadSafeMethod(string parameter1)
    {
        this.someValue = parameter1;
        int number;
        // Since access to someValue is not synchronised by the class, a separate thread
        // could have changed its value between this thread setting its value at the start
        // of the method and this line reading its value.
        number = this.someValue.Length;
        return number;
    }
}
You should be aware that any parameters passed in to the method that are neither structs nor immutable could be mutated by another thread outside the scope of the method.
To ensure proper concurrency, you need to use locking.
For further information, see the C# reference for the lock statement and ReaderWriterLockSlim.
lock is mostly useful for providing one-at-a-time (mutually exclusive) access.
ReaderWriterLockSlim is useful if you need multiple concurrent readers and a single writer.
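For illustration, here is a minimal sketch that makes the earlier NonThreadSafeMethod safe with lock, and guards a read-mostly dictionary with ReaderWriterLockSlim. SafeThing and ReadMostlyCache are hypothetical names, and the private lock object and dictionary are assumptions added for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public class SafeThing
{
    private readonly object _sync = new object(); // private lock object
    private string someValue;

    public int ThreadSafeMethod(string parameter1)
    {
        // The write and the following read happen under the same lock, so no other
        // thread can change someValue in between.
        lock (_sync)
        {
            someValue = parameter1;
            return someValue.Length;
        }
    }
}

public class ReadMostlyCache
{
    private readonly ReaderWriterLockSlim _rw = new ReaderWriterLockSlim();
    private readonly Dictionary<string, int> _data = new Dictionary<string, int>();

    public bool TryGet(string key, out int value)
    {
        _rw.EnterReadLock(); // many readers may hold the read lock at once
        try { return _data.TryGetValue(key, out value); }
        finally { _rw.ExitReadLock(); }
    }

    public void Set(string key, int value)
    {
        _rw.EnterWriteLock(); // exclusive: waits until no readers or writers remain
        try { _data[key] = value; }
        finally { _rw.ExitWriteLock(); }
    }
}
```

The try/finally pairs ensure a lock is released even if the protected code throws.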
If a method only accesses local variables, it's thread safe. Is that it?
Absolutely not. You can write a program with only a single local variable accessed from a single thread that is nevertheless not thread-safe:
https://stackoverflow.com/a/8883117/88656
Does that apply for static methods as well?
Absolutely not.
One answer, provided by @Cybis, was: "Local variables cannot be shared among threads because each thread gets its own stack."
Absolutely not. The distinguishing characteristic of a local variable is that it is only visible from within the local scope, not that it is allocated on the temporary pool. It is perfectly legal and possible to access the same local variable from two different threads. You can do so by using anonymous methods, lambdas, iterator blocks or async methods.
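A minimal sketch of the lambda case: the local variable counter is captured by a lambda that two tasks run at the same time, so both threads read and write the very same variable. SharedLocalDemo is a hypothetical name. Interlocked.Increment keeps the result correct; with a plain counter++ the final count could come out lower.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class SharedLocalDemo
{
    public static int CountWithTwoThreads(int incrementsPerThread)
    {
        int counter = 0; // a local variable...

        // ...captured by a lambda. The compiler hoists it into a closure object on
        // the heap, so both tasks below operate on the same storage location.
        Action work = () =>
        {
            for (int i = 0; i < incrementsPerThread; i++)
                Interlocked.Increment(ref counter); // counter++ here would be a data race
        };

        Task t1 = Task.Run(work);
        Task t2 = Task.Run(work);
        Task.WaitAll(t1, t2);
        return counter;
    }
}
```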
Is that the case for static methods as well?
Absolutely not.
If a method is passed a reference object, does that break thread safety?
Maybe.
I've done some research, and there is a lot out there about certain cases, but I was hoping to be able to define, by using just a few rules, guidelines to follow to make sure a method is thread safe.
You are going to have to learn to live with disappointment. This is a very difficult subject.
So, I guess my ultimate question is: "Is there a short list of rules that define a thread-safe method?
Nope. As you saw from my example earlier, an empty method can be non-thread-safe. You might as well ask "is there a short list of rules that ensures a method is correct?" No, there is not. Thread safety is nothing more than an extremely complicated kind of correctness.
Moreover, the fact that you are asking the question indicates your fundamental misunderstanding about thread safety. Thread safety is a global, not a local property of a program. The reason why it is so hard to get right is because you must have a complete knowledge of the threading behaviour of the entire program in order to ensure its safety.
Again, look at my example: every method is trivial. It is the way that the methods interact with each other at a "global" level that makes the program deadlock. You can't look at every method and check it off as "safe" and then expect that the whole program is safe, any more than you can conclude that because your house is made of 100% non-hollow bricks that the house is also non-hollow. The hollowness of a house is a global property of the whole thing, not an aggregate of the properties of its parts.
There is no hard and fast rule.
Here are some rules to make code thread safe in .NET and why these are not good rules:
Function and all functions it calls must be pure (no side effects) and use only local variables. Although this will make your code thread-safe, there are very few interesting things you can do in .NET under this restriction.
Every function that operates on a common object must lock on a common thing, and all locks must be acquired in the same order. This will make the code thread-safe, but it will be incredibly slow, and you might as well not use multiple threads.
...
There is no rule that makes code thread-safe. The only thing you can do is make sure that your code works no matter how many times it is being executed concurrently, given that each thread can be interrupted at any point and each thread is in its own state/location, and do this for every function (static or otherwise) that accesses common objects.
It must be synchronized (using an object lock), stateless, or immutable.
link: http://docs.oracle.com/javase/tutorial/essential/concurrency/immutable.html
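The linked tutorial is about Java, but the immutable option translates directly to C#. A minimal sketch using a hypothetical DateRange type: all state is set in the constructor, and "modifying" it produces a new object, so instances can be shared between threads without locking. Immutability only protects the object's own fields; a shared variable holding a reference to it can still be reassigned, and that reassignment still needs synchronization.

```csharp
using System;

public sealed class DateRange
{
    // Get-only auto-properties can only be assigned in the constructor.
    public DateTime Begin { get; }
    public DateTime End { get; }

    public DateRange(DateTime begin, DateTime end)
    {
        if (end < begin) throw new ArgumentException("end must not precede begin");
        Begin = begin;
        End = end;
    }

    // "Changing" a value returns a new instance instead of mutating this one.
    public DateRange WithEnd(DateTime newEnd) => new DateRange(Begin, newEnd);

    public int Days => (End.Date - Begin.Date).Days;
}
```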
What is the performance difference between static and non-static methods? I have read that static methods are better in terms of performance, but I want to know how they are faster. If a method doesn't use any instance member, shouldn't the compiler take care of that and treat it as a static method?
Edit: Eric comments more on this here, and hints that there are some times when call is used... although note that his new() example isn't guaranteed ;-p
In the original compiler (pre-1.1), the compiler did treat non-virtual instance methods (without this) as static; the problem was that this led to some odd problems with null checking, i.e.
obj.SomeMethod();
didn't throw an exception (for obj = null and a non-virtual method SomeMethod that didn't touch this). This was bad if you ever changed the implementation of SomeMethod. When they investigated the cost of adding an explicit null check (i.e. null check, then static call), it turned out to be the same as using a virtual call, so they did that instead, which is far more flexible and predictable.
Note that the "don't throw an exception" is also entirely the behaviour if SomeMethod is an extension-method (static).
I think at one point you could emit IL to invoke a regular instance method via static-call, but the last time I tried I got the "oh no you don't!" message from the CLR (this operation may destabilise the runtime); either they blocked this entirely, or (perhaps more likely) I borked the custom IL.
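The null-check behaviour described above can be observed directly. In this sketch (Widget, WidgetExtensions and NullCallDemo are hypothetical names), calling an instance method on a null reference throws even though the method never touches this, while the equivalent extension method, being a static call, runs without error. The mechanism (the compiler emitting callvirt for instance calls) is an implementation detail; the exception itself is what C# requires.

```csharp
using System;

public class Widget
{
    // Never touches 'this'.
    public int Constant() => 42;
}

public static class WidgetExtensions
{
    // An extension method is a static call, so a null 'w' is just a null argument.
    public static int ConstantExt(this Widget w) => 42;
}

public static class NullCallDemo
{
    public static bool InstanceCallThrows()
    {
        Widget w = null;
        try
        {
            w.Constant(); // the receiver is null-checked before the call
            return false;
        }
        catch (NullReferenceException)
        {
            return true;
        }
    }

    public static int ExtensionCallOnNull()
    {
        Widget w = null;
        return w.ConstantExt(); // no exception
    }
}
```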
Yes, a static call would be faster: you don't need to create an instance of the object before you call the method (although you obviously won't notice the difference).
In practical terms it doesn't matter whether the compiler optimizes a method (makes the instance method static): you won't call the instance method unless you've already created the instance, right?
At the end of the day you should rather try to optimize your code for maintainability rather than trying to save 3 nanoseconds here or there.
See this question.
Here's the excerpt:
a static call is 4 to 5 times faster than constructing an instance every time you call an instance method. However, we're still only talking about tens of nanoseconds per call.
I doubt the compiler will treat it as a static method, although you can check for yourself. The benefit would be no creation of the instance. No garbage collector to worry about. And only the static constructor to be called, if there is one.
Static methods are fast because no instance has to be constructed.
But if you create the instance only once and keep it (for example in a static member), performance is equal.
Either way, these differences are very small relative to total performance.
Yes, static methods are fast, but objects referenced by static variables stay reachable for the lifetime of the application, so the GC does not release them even when they are no longer needed. That is an issue.
More than anything else, though, you should consider the design of the application: memory and speed keep increasing, but your design may suffer if you don't use static variables properly.