I have a class like below (refactored for purpose) in a PR and one of the seniors who is now on holiday stated I was misusing static variables and I should pass variables from method to method.
class Class {
    static int dataPool;
    // the methods below are called cyclically for the duration of the application,
    // xxxx times: First is called and then Second is called,
    // over and over
    public static void First()
    {
        // at the start, reads dataPool count to do computation
        // at the end, sets dataPool count based on computation
    }
    public static void Second()
    {
        // at the start, reads dataPool count to do some computation
        // at the end, sets dataPool count based on computation
    }
}
I want to understand why using variables like the above is 'bad' to learn. Can anyone explain please?
Main reasons:
It means your code is no longer re-entrant.
It means your code is no longer thread-safe (or is even less thread-safe) and cannot run concurrently.
It means your code has state in unexpected places (see: Principle of Least Astonishment).
It means your code will perform poorly in a multi-threaded system (even if it's strictly single-threaded and without reentrancy), as having mutable static state means your compiler cannot optimize your program's use of memory because it cannot guarantee thread ownership of data. Cache coherency is expensive.
It means you cannot correctly unit-test your code (using the strict definition of "unit test", not Visual Studio's).
It means the JIT compiler cannot optimize your program as much as it otherwise could, because reasoning about the lifetime of static state is very difficult.
It means your code could be broken if another function or thread decides to mutate your static state without you knowing it.
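As a rough illustration of what the reviewer's suggestion could look like, here is a minimal sketch in which the count is passed in and returned instead of living in a static field. The class name Pipeline, the RunCycles helper and the arithmetic inside First and Second are placeholders invented for this example, not the original logic.

```csharp
using System;

// Hypothetical rewrite of the question's class: no static state.
public static class Pipeline
{
    public static int First(int dataPool)
    {
        // reads the incoming count, computes, returns the new count
        return dataPool + 1; // placeholder computation
    }

    public static int Second(int dataPool)
    {
        return dataPool * 2; // placeholder computation
    }

    // The caller owns the state and threads it through every cycle.
    public static int RunCycles(int initial, int cycles)
    {
        int dataPool = initial;
        for (int i = 0; i < cycles; i++)
        {
            dataPool = First(dataPool);
            dataPool = Second(dataPool);
        }
        return dataPool;
    }
}

public static class PipelineDemo
{
    public static void Main()
    {
        Console.WriteLine(Pipeline.RunCycles(0, 3)); // prints 14
    }
}
```

Because each method depends only on its argument, First and Second can be tested in isolation, and two independent pipelines can run side by side without interfering.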
I have learned that accessing the same object from different threads is not thread-safe and should be protected, be it through locking, Interlocked.Exchange, immutables, or any other means.
This means that the following code is potentially NOT threadsafe because it does not protect access to the shared 'test' object.
My Questions are:
Is the following code Safe or Not?
If Not, what is the worst that could happen?
If Not, is there any exception that is thrown on a dirty Read or Writes that I could catch to prevent the worst?
using System;
using System.Threading;
using System.Threading.Tasks;
class Test
{
    public Test()
    {
        Foo = new Random().Next(10000);
    }
    public int Foo;
}
internal class Program
{
    public static async Task Main(string[] args)
    {
        var test = new Test();
        var exitToken = new CancellationTokenSource(TimeSpan.FromSeconds(120)).Token;
        // reader: prints the current Foo every 5 seconds
        var readerTask = Task.Run(async () =>
        {
            while (!exitToken.IsCancellationRequested)
            {
                Console.WriteLine("Random Foo: " + test.Foo);
                await Task.Delay(TimeSpan.FromSeconds(5));
            }
        });
        // writer: replaces the shared reference every 5 seconds
        var writerTask = Task.Run(async () =>
        {
            while (!exitToken.IsCancellationRequested)
            {
                test = new Test();
                await Task.Delay(TimeSpan.FromSeconds(5));
            }
        });
        await Task.WhenAll(readerTask, writerTask);
    }
}
Is the following code Safe or Not?
When talking about "safe" it is important to specify what it is "safe" for. Usually when designing a class we have some requirements we want the class to fulfill (sometimes called "invariants"). If these requirements are fulfilled even if the object is used from multiple threads we call it "thread safe". Take a trivial class:
public class Incrementer{
public int Value {get; private set;}
public void Increment() => Value++;
}
We would probably have a requirement that the value should exactly equal the number of times the Increment method was called. This requirement would not be fulfilled if it is called concurrently, but that does not mean it will throw an exception, blow up, or crash, just that the value does not match our requirement.
If we change the requirement to be "value > 0 if Increment is called at least one time", then the read-modify-write issue is not relevant, since we only care if the value has been written once.
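If the stricter requirement (value equals the number of calls) does matter, one possible fix is to make the read-modify-write atomic. The sketch below uses Interlocked.Increment; the names SafeIncrementer and IncrementDemo are made up for this example.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public class SafeIncrementer
{
    private int _value;

    // Volatile.Read makes sure we observe the latest written value.
    public int Value => Volatile.Read(ref _value);

    // Atomic read-modify-write: concurrent calls cannot lose an update.
    public void Increment() => Interlocked.Increment(ref _value);
}

public static class IncrementDemo
{
    public static int CountConcurrently(int total)
    {
        var counter = new SafeIncrementer();
        Parallel.For(0, total, _ => counter.Increment());
        return counter.Value;
    }

    public static void Main()
    {
        Console.WriteLine(CountConcurrently(100000)); // prints 100000
    }
}
```

With the plain Value++ from the original class, the same loop may report fewer than 100000 increments, without any exception being thrown.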
In your specific example the only variable that is written and read concurrently is the test variable. Since reference writes are atomic, we know it will always point to some object. There could potentially be issues with reordering when the object is constructed: the reference could be updated before the field has actually been set, in which case Foo would be observed as zero. I do not think this would occur with common x86 hardware and software, but the safe version would be to either use a lock, or create the object, issue a memory barrier and then update the reference.
The other potential risk is that the reading thread does not reload the reference from memory, and just loads it once and reuses the same value in each iteration. I do not think this could actually occur in this case, since the loop also calls Task.Delay and Console.WriteLine, and I would expect both of these to issue memory barriers or something else that ensures a read actually occurs and that no reordering is done. But I would probably still recommend using a lock or marking the variable as volatile (which requires turning it into a field, since volatile cannot be applied to locals); it is usually a good idea to err on the side of caution.
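A cautious version of the publish/read pattern described above might look like the following sketch. The TestHolder class is hypothetical, and the Test constructor takes the value as a parameter here only to make the example deterministic.

```csharp
using System;
using System.Threading;

public class Test
{
    public int Foo;
    public Test(int foo) { Foo = foo; }
}

// Hypothetical wrapper around the shared reference.
public class TestHolder
{
    private Test _current = new Test(0);

    // Release semantics: the writes made while constructing the object
    // cannot be moved after the store of the reference.
    public void Publish(int foo) => Volatile.Write(ref _current, new Test(foo));

    // Acquire semantics: the reference is re-read on every call.
    public int ReadFoo() => Volatile.Read(ref _current).Foo;
}

public static class HolderDemo
{
    public static void Main()
    {
        var holder = new TestHolder();
        holder.Publish(42);
        Console.WriteLine(holder.ReadFoo()); // prints 42
    }
}
```

A lock around both the write and the read would give the same guarantees and is usually easier to reason about.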
If Not, what is the worst that could happen?
In this case, that the same value would always be printed. But as I mentioned above, this will most likely not occur.
If Not, is there any exception that is thrown on a dirty Read or Writes that I could catch to prevent the worst?
In general, no. Some types may throw exceptions if they are used concurrently or from the wrong thread, a typical example would be any UI classes. But this is not something that should be relied upon.
This is also one of the reasons why multi-threading bugs are so devious. If you are unlucky, the only effect is that the value is wrong, and it might not even be obviously wrong; it might just be off by a little bit. And if you are even more unlucky, it only occurs in special circumstances, like when running under full load on a 20-core server, so you might never be able to reproduce it in a development environment.
I think I slightly got the idea what exactly Volatile.Write and Volatile.Read do, but I've seen some examples where Volatile.Write is used at the beginning of a method like in the CLR via C# book where Jeffrey shows how to implement a Simple Spin Lock using Interlocked. Here's the code:
struct SimpleSpinLock {
private int _resourceInUse; // 0 = false, 1 = true
public void Enter() {
while (true) {
if (Interlocked.Exchange(ref _resourceInUse, 1) == 0) return;
}
}
public void Leave() {
Volatile.Write(ref _resourceInUse, 0); // Why is it here??
}
}
This is how the class is supposed to be used:
class SomeResource {
private SimpleSpinLock _s1 = new SimpleSpinLock();
public void AccessResource() {
_s1.Enter();
// Some code that only one thread at a time can get in..
_s1.Leave();
}
}
As far as I know, Volatile.Write is used to guarantee that the instructions above it are executed before the Volatile.Write itself. But the Leave method contains only one instruction, so what is the reason for using Volatile.Write here? I have probably misunderstood things completely, so I'd be grateful if someone could point me in the right direction.
I would not claim that I have enough brainpower to fully understand or explain this, but here are my 5 cents. First of all, the compiler can inline the Leave method call because it consists of only one line, so the actual write can be surrounded by other instructions. Second (and mainly, I suppose), the Volatile class docs state the following:
On a multiprocessor system, a volatile write operation ensures that a value written to a memory location is immediately visible to all processors.
So the goal of this Volatile.Write call is to make the change visible to other processors as soon as possible.
Also see answers to this question and read about volatile keyword
Volatile.Write
"Writes a value to a field. On systems that require it, inserts a memory barrier that prevents the processor from reordering memory operations as follows: If a read or write appears before this method in the code, the processor cannot move it after this method."
The way I understand this is:
The method this sentence is referring to is Volatile.Write(). So, "if a read or write operation appears before Volatile.Write in the code, the processor cannot move it after Volatile.Write". That means the reads and writes a thread performs while it holds the lock (between Enter and Leave) cannot be moved after the write that releases the lock, so another thread that subsequently acquires the lock will see their effects.
It makes sense, no? I don't think it's a matter of position of read/write instructions in the "hosting" method (Leave), but it's more about read/writes "happening" in between Enter and Leave.
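To make the "reads/writes between Enter and Leave" point concrete, here is a small sketch that uses the spin lock from the question to protect a counter. The Counter class and the demo are invented for this example; the lock itself follows the code quoted above.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

struct SimpleSpinLock
{
    private int _resourceInUse; // 0 = free, 1 = taken

    public void Enter()
    {
        // Spin until we are the thread that flips 0 -> 1.
        while (Interlocked.Exchange(ref _resourceInUse, 1) != 0) { }
    }

    public void Leave()
    {
        // Release: everything written inside the critical section
        // stays before this store.
        Volatile.Write(ref _resourceInUse, 0);
    }
}

class Counter
{
    private SimpleSpinLock _lock; // not readonly: the struct is mutated in place
    private int _count;

    public void Add()
    {
        _lock.Enter();
        _count++; // must not be moved past the release in Leave
        _lock.Leave();
    }

    public int Read()
    {
        _lock.Enter();
        int value = _count;
        _lock.Leave();
        return value;
    }
}

static class SpinLockDemo
{
    public static int Run(int total)
    {
        var counter = new Counter();
        Parallel.For(0, total, _ => counter.Add());
        return counter.Read();
    }

    public static void Main()
    {
        Console.WriteLine(Run(100000)); // prints 100000
    }
}
```

If the increment could drift past the store that frees the lock, a second thread could enter while the first was still updating _count, which is exactly what the barrier in Leave rules out.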
I have a helper method that takes a begin date and an end date and through certain business logic yields an integer result. This helper method is sometimes called in excess of 10,000 times for a given set of data (though this doesn't occur often).
Question:
Considering performance only, is it more efficient to make this helper method as a static method to some helper class, or would it be more gainful to have the helper method as a public method to a class?
Static method example:
// an iterative loop
foreach (var result in results) {
int daysInQueue = HelperClass.CalcDaysInQueue(dtBegin, dtEnd);
}
Public member method example:
// an iterative loop
HelperClass hc = new HelperClass();
foreach (var result in results) {
int daysInQueue = hc.CalcDaysInQueue(dtBegin, dtEnd);
}
Thanks in advance for the help!
When you call an instance method, the compiler always invisibly passes one extra parameter, available inside that method under the name this. Static methods are not called on behalf of any object, so they don't have a this reference.
I see few benefits of marking utility methods as static:
small performance improvement, you don't pay for a reference to this which you don't really use. However I doubt you will ever see the difference.
convenience - you can call static method wherever and whenever you want, the compiler is not forcing you to provide an instance of an object, which is not really needed for that method
readability: instance method should operate on instance's state, not merely on parameters. If it's an instance method not needing an instance to work, it's confusing.
The difference in performance here is effectively nothing. You will have a hard time actually measuring the difference in time (and getting over the "noise" of other stuff going on with your CPU), that's how small it will be.
Unless you happen to go and perform a whole bunch of database queries or read in several gigabytes of info from files in the constructor of the object (I'm assuming here that it's just empty), it will have a fairly small cost, and since it's outside the loop it doesn't scale with the number of iterations at all.
You should be making this decision based on what logically makes sense, not based on performance, until you have a strong reason to believe that there is a significant, and necessary performance gain to be had by violating standard practices/readability/etc.
In this particular case your operation is logically 'static'. There is no state that is used, so there is no need to have an instance of the object, as such the method should be made static. Others have said that it might perform better, which is very possibly true, but that shouldn't be why you make it static. If the operation logically made sense as an instance method you shouldn't try to force it into a static method just to try to get it to run faster; that's learning the wrong lesson here.
Just benchmark it :) In theory a static method should be faster since it avoids any virtual call overhead, but this overhead might not be significant in your case (and I'm not even sure what language the example is in). Just time both loops with a large enough number of iterations for it to take a minute or so and see for yourself. Just make sure you use non-trivial data so your compiler doesn't optimize the calls out.
Based on my understanding, it would be more beneficial for performance to make it a static method. This means that there isn't an instance of the object created, although the performance difference would be negligible, I think. That is the case if there isn't some data that has to be recreated every time you call the static function, which could be stored in the class object.
You say 'considering performance only'. In that case you should fully focus on what's inside
HelperClass.CalcDaysInQueue(dtBegin, dtEnd);
And not on the 0.0001% of runtime spent in calling that routine. If it's a short routine the JIT compiler will inline it anyway and in that case there will be NO performance difference between the static and instance method.
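For anyone who wants to measure it anyway, a crude timing harness could look like the sketch below. The body of the helper (whole days between two dates) is a stand-in for the real business logic, and the method names CalcDaysStatic and CalcDaysInstance are invented so both variants can live in one class. Stopwatch loops like this are noisy; a dedicated benchmarking harness would give more trustworthy numbers.

```csharp
using System;
using System.Diagnostics;

public class HelperClass
{
    // Placeholder logic (an assumption): whole days between the dates.
    public static int CalcDaysStatic(DateTime begin, DateTime end)
        => (end - begin).Days;

    public int CalcDaysInstance(DateTime begin, DateTime end)
        => (end - begin).Days;
}

public static class CallBenchmark
{
    public static void Main()
    {
        const int iterations = 10000000;
        var begin = new DateTime(2024, 1, 1);
        var hc = new HelperClass();
        long checksum = 0; // printed at the end so the calls are not optimized away

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            checksum += HelperClass.CalcDaysStatic(begin, begin.AddDays(i % 365));
        sw.Stop();
        Console.WriteLine("static:   " + sw.ElapsedMilliseconds + " ms");

        sw.Restart();
        for (int i = 0; i < iterations; i++)
            checksum += hc.CalcDaysInstance(begin, begin.AddDays(i % 365));
        sw.Stop();
        Console.WriteLine("instance: " + sw.ElapsedMilliseconds + " ms");

        Console.WriteLine("checksum: " + checksum);
    }
}
```

Run it in a release build and repeat it a few times; any difference between the two loops is likely to be within the run-to-run noise.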
Are there overall rules/guidelines for what makes a method thread-safe? I understand that there are probably a million one-off situations, but what about in general? Is it this simple?
If a method only accesses local variables, it's thread safe.
Is that it? Does that apply for static methods as well?
One answer, provided by @Cybis, was:
Local variables cannot be shared among threads because each thread gets its own stack.
Is that the case for static methods as well?
If a method is passed a reference object, does that break thread safety? I have done some research, and there is a lot out there about certain cases, but I was hoping to be able to define, by using just a few rules, guidelines to follow to make sure a method is thread safe.
So, I guess my ultimate question is: "Is there a short list of rules that define a thread-safe method? If so, what are they?"
EDIT
A lot of good points have been made here. I think the real answer to this question is: "There are no simple rules to ensure thread safety." Cool. Fine. But in general I think the accepted answer provides a good, short summary. There are always exceptions. So be it. I can live with that.
If a method (instance or static) only references variables scoped within that method then it is thread safe because each thread has its own stack:
In this instance, multiple threads could call ThreadSafeMethod concurrently without issue.
public class Thing
{
public int ThreadSafeMethod(string parameter1)
{
int number; // each thread will have its own variable for number.
number = parameter1.Length;
return number;
}
}
This is also true if the method calls other methods of the class which only reference locally scoped variables:
public class Thing
{
public int ThreadSafeMethod(string parameter1)
{
int number;
number = this.GetLength(parameter1);
return number;
}
private int GetLength(string value)
{
int length = value.Length;
return length;
}
}
If a method accesses any (object state) properties or fields (instance or static) then you need to use locks to ensure that the values are not modified by a different thread:
public class Thing
{
private string someValue; // all threads will read and write to this same field value
public int NonThreadSafeMethod(string parameter1)
{
this.someValue = parameter1;
int number;
// Since access to someValue is not synchronised by the class, a separate thread
// could have changed its value between this thread setting its value at the start
// of the method and this line reading its value.
number = this.someValue.Length;
return number;
}
}
You should be aware that any parameters passed in to the method which are not either a struct or immutable could be mutated by another thread outside the scope of the method.
To ensure proper concurrency you need to use locking.
For further information see the lock statement C# reference and ReaderWriterLockSlim.
lock is mostly useful for providing one-at-a-time functionality.
ReaderWriterLockSlim is useful if you need multiple readers and a single writer.
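As a minimal sketch of both options: the first class below guards the field from the NonThreadSafeMethod example with lock, and the second uses ReaderWriterLockSlim for a value that is read often and written rarely. The names SynchronizedThing, SharedSetting and _gate are invented for this example.

```csharp
using System;
using System.Threading;

public class SynchronizedThing
{
    private readonly object _gate = new object();
    private string _someValue = "";

    public int SetAndMeasure(string parameter1)
    {
        // Only one thread at a time can run this block, so no other
        // thread can change _someValue between the write and the read.
        lock (_gate)
        {
            _someValue = parameter1;
            return _someValue.Length;
        }
    }
}

public class SharedSetting
{
    private readonly ReaderWriterLockSlim _rw = new ReaderWriterLockSlim();
    private string _value = "";

    public string Read()
    {
        _rw.EnterReadLock(); // many readers may hold this at once
        try { return _value; }
        finally { _rw.ExitReadLock(); }
    }

    public void Write(string value)
    {
        _rw.EnterWriteLock(); // exclusive: waits for readers to leave
        try { _value = value; }
        finally { _rw.ExitWriteLock(); }
    }
}

public static class LockDemo
{
    public static void Main()
    {
        Console.WriteLine(new SynchronizedThing().SetAndMeasure("hello")); // prints 5
        var setting = new SharedSetting();
        setting.Write("on");
        Console.WriteLine(setting.Read()); // prints on
    }
}
```

Note that locking only helps if every access to the shared field goes through the same lock object.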
If a method only accesses local variables, it's thread safe. Is that it?
Absolutely not. You can write a program with only a single local variable accessed from a single thread that is nevertheless not threadsafe:
https://stackoverflow.com/a/8883117/88656
Does that apply for static methods as well?
Absolutely not.
One answer, provided by @Cybis, was: "Local variables cannot be shared among threads because each thread gets its own stack."
Absolutely not. The distinguishing characteristic of a local variable is that it is only visible from within the local scope, not that it is allocated on the temporary pool. It is perfectly legal and possible to access the same local variable from two different threads. You can do so by using anonymous methods, lambdas, iterator blocks or async methods.
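A small sketch of the lambda case: the local below is captured by the delegate passed to Parallel.For, so many threads end up mutating the same variable. The class and method names are made up for this example.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CapturedLocalDemo
{
    public static int RacyCount(int total)
    {
        int count = 0; // a local, but captured by the lambda below
        // Every worker thread increments the same variable without
        // synchronization: a data race, updates can be lost.
        Parallel.For(0, total, _ => count++);
        return count; // may be less than total
    }

    public static int SafeCount(int total)
    {
        int count = 0;
        // Same captured local, but the increment is now atomic.
        Parallel.For(0, total, _ => Interlocked.Increment(ref count));
        return count;
    }

    public static void Main()
    {
        Console.WriteLine("racy: " + RacyCount(1000000));
        Console.WriteLine("safe: " + SafeCount(1000000));
    }
}
```

The racy result varies from run to run and depends on the machine; the safe variant always returns the requested total.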
Is that the case for static methods as well?
Absolutely not.
If a method is passed a reference object, does that break thread safety?
Maybe.
I've done some research, and there is a lot out there about certain cases, but I was hoping to be able to define, by using just a few rules, guidelines to follow to make sure a method is thread safe.
You are going to have to learn to live with disappointment. This is a very difficult subject.
So, I guess my ultimate question is: "Is there a short list of rules that define a thread-safe method?
Nope. As you saw from my example earlier an empty method can be non-thread-safe. You might as well ask "is there a short list of rules that ensures a method is correct". No, there is not. Thread safety is nothing more than an extremely complicated kind of correctness.
Moreover, the fact that you are asking the question indicates your fundamental misunderstanding about thread safety. Thread safety is a global, not a local property of a program. The reason why it is so hard to get right is because you must have a complete knowledge of the threading behaviour of the entire program in order to ensure its safety.
Again, look at my example: every method is trivial. It is the way that the methods interact with each other at a "global" level that makes the program deadlock. You can't look at every method and check it off as "safe" and then expect that the whole program is safe, any more than you can conclude that because your house is made of 100% non-hollow bricks that the house is also non-hollow. The hollowness of a house is a global property of the whole thing, not an aggregate of the properties of its parts.
There is no hard and fast rule.
Here are some rules to make code thread safe in .NET and why these are not good rules:
The function and all functions it calls must be pure (no side effects) and use only local variables. Although this will make your code thread-safe, there are very few interesting things you can do under this restriction in .NET.
Every function that operates on a common object must lock on a common thing. All locks must be done in same order. This will make the code thread safe, but it will be incredibly slow, and you might as well not use multiple threads.
...
There is no rule that makes code thread safe. The only thing you can do is make sure that your code works no matter how many threads are actively executing it, given that each thread can be interrupted at any point and each thread is in its own state and location. This applies to every function (static or otherwise) that accesses common objects.
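The "all locks must be taken in the same order" rule can be sketched as follows. The Account class is hypothetical and assumes every account has a unique Id; locking the object itself is done here only to keep the example short.

```csharp
using System;

public class Account
{
    public int Id { get; }
    public int Balance { get; private set; }

    public Account(int id, int balance)
    {
        Id = id;
        Balance = balance;
    }

    public static void Transfer(Account from, Account to, int amount)
    {
        // Always lock the account with the lower Id first, whatever the
        // direction of the transfer. Two opposite transfers then cannot
        // each hold one lock while waiting for the other.
        Account first = from.Id < to.Id ? from : to;
        Account second = from.Id < to.Id ? to : from;

        lock (first)
        {
            lock (second)
            {
                from.Balance -= amount;
                to.Balance += amount;
            }
        }
    }
}

public static class TransferDemo
{
    public static void Main()
    {
        var a = new Account(1, 100);
        var b = new Account(2, 50);
        Account.Transfer(a, b, 30);
        Console.WriteLine(a.Balance + " " + b.Balance); // prints 70 80
    }
}
```

If Transfer locked from and then to instead, one thread moving money from a to b and another moving it from b to a could deadlock.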
It must be synchronized (using an object lock), stateless, or immutable.
link: http://docs.oracle.com/javase/tutorial/essential/concurrency/immutable.html
Have I understood correctly that all threads have copy of method's variables in their own stack so there won't be problems when a static method is called from different threads?
Yes and no. If the parameters are value types, then yes they have their own copies. Or if the reference type is immutable, then it can't be altered and you have no issues. However, if the parameters are mutable reference types, then there are still possible thread safety issues to consider with the arguments being passed in.
Does that make sense? If you pass a reference type as an argument, its reference is passed "by value", so it's a new reference that refers back to the same object. Thus you could have two different threads potentially altering the same object in a non-thread-safe way.
If each of those instances are created and used only in the thread using them, then chances are low that you'd get bit, but I just wanted to emphasize that just because you're using static methods with only locals/parameters is not a guarantee of thread-safety (same with instance of course as noted by Chris).
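Here is a sketch of that situation: a static method that touches nothing but its parameters, yet is unsafe when two threads hand it the same list. ListHelper and its methods are invented for this example, and locking on the list itself only works if every piece of code that touches the list agrees to do the same.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ListHelper
{
    // Only parameters and no static state, but List<int>.Add is not
    // safe for concurrent callers sharing the same list.
    public static void AddUnsafe(List<int> target, int value)
    {
        target.Add(value);
    }

    // One possible fix: serialize all access to the shared list.
    public static void AddSynchronized(List<int> target, int value)
    {
        lock (target)
        {
            target.Add(value);
        }
    }

    public static int FillConcurrently(int total)
    {
        var shared = new List<int>();
        Parallel.For(0, total, i => AddSynchronized(shared, i));
        return shared.Count;
    }

    public static void Main()
    {
        Console.WriteLine(FillConcurrently(100000)); // prints 100000
    }
}
```

Swapping AddSynchronized for AddUnsafe in FillConcurrently can lose items or throw from inside the list, depending on timing.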
Have I understood correctly that all threads have copy of method's variables in their own stack so there won't be problems when a static method is called from different threads?
No.
First off, it is false that "all threads have a copy of the method's local variables in their own stack." A local variable is only generated on the stack when it has a short lifetime; local variables may have arbitrarily long lifetimes if they are (1) closed-over outer variables, (2) declared in an iterator block, or (3) declared in an async method.
In all of those cases a local variable created by an activation of a method on one thread can later be mutated by multiple threads. Doing so is not threadsafe.
Second, there are plenty of possible problems when calling static methods from different threads. The fact that local variables are sometimes allocated on the stack does not magically make access to shared memory by static methods suddenly correct.
can there be concurrency issues when using C# class with only static methods and no variables?
I assume you mean "no static variables" and not "no local variables".
Absolutely there can be. For example, here's a program with no static variables, no non-static methods, no objects created apart from the second thread, and a single local variable to hold a reference to that thread. None of the methods other than the cctor actually do anything at all. This program deadlocks. You cannot assume that just because your program is dead simple that it contains no threading bugs!
Exercise to the reader: describe why this program that appears to contain no locks actually deadlocks.
class MyClass
{
static MyClass()
{
// Let's run the initialization on another thread!
var thread = new System.Threading.Thread(Initialize);
thread.Start();
thread.Join();
}
static void Initialize()
{ /* TODO: Add initialization code */ }
static void Main()
{ }
}
It sounds like you are looking for some magical way of knowing that your program has no threading issues. There is no such magical way of knowing that, short of making it single-threaded. You're going to have to analyze your use of threads and shared data structures.
There is no such guarantee unless all of the variables are immutable reference types or value types.
If the variables are mutable reference types, proper synchronization needs to be performed.
EDIT: Mutable variables only need to be synchronized if they are shared between threads. Locally declared mutables that are not exposed outside of the method need not be synchronized.
Yes, there can be, unless the methods use only locally scoped variables and no global variables, so that none of them can affect the state of any object. If that is true, you will have no problems using them from multiple threads. I would say that under these conditions, whether the methods are static or not is irrelevant.
If they are variables local to the method then yes, you have nothing to worry about. Just make sure you are not passing parameters by reference or accessing global variables and changing them in different threads. Then you will be in trouble.
static methods can refer to data in static fields -- either in their class or outside of it -- which may not be thread safe.
So ultimately the answer to your question is "no", because there may be problems, although usually there won't be.
Two threads can still operate on the same object, either because the object is passed to methods on different threads as a parameter, or because it can be accessed globally via a Singleton or the like. In that case all bets are off.
Mark
As an addendum to the answers about why static methods are not necessarily thread-safe, it's worth considering why they might be, and why they often are.
The first reason why they might be is, I think, the sort of case you were thinking of:
public static int Max(int x, int y)
{
return x > y ? x : y;
}
This pure function is thread-safe because there is no way for it to affect code on any other thread. The locals x and y remain local to the thread they are on, not being stored in a shared location, captured in a delegate, or otherwise leaving the purely local context.
It's always worth noting that combinations of thread-safe operations can be non-thread-safe (e.g. doing a thread-safe read of whether a concurrent dictionary has a key, followed by a thread-safe read of the value for that key, is not thread-safe, as state can change between those two thread-safe operations). Static members tend not to be members that can be combined in such non-thread-safe ways, in order to avoid this.
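The concurrent dictionary case can be sketched like this. CacheLookup and its two methods are made up for this example; ContainsKey, the indexer and TryGetValue are the real ConcurrentDictionary members.

```csharp
using System;
using System.Collections.Concurrent;

public static class CacheLookup
{
    // Two thread-safe calls, but the pair is not atomic: another thread
    // can remove the key in between, and the indexer then throws
    // KeyNotFoundException.
    public static string GetRacy(ConcurrentDictionary<string, string> cache, string key)
    {
        return cache.ContainsKey(key) ? cache[key] : null;
    }

    // A single call that checks and reads in one step.
    public static string GetAtomic(ConcurrentDictionary<string, string> cache, string key)
    {
        string value;
        return cache.TryGetValue(key, out value) ? value : null;
    }

    public static void Main()
    {
        var cache = new ConcurrentDictionary<string, string>();
        cache["a"] = "1";
        Console.WriteLine(GetAtomic(cache, "a"));                 // prints 1
        Console.WriteLine(GetAtomic(cache, "missing") == null);   // prints True
    }
}
```

On a single thread both methods behave identically; the difference only shows up when another thread removes keys concurrently.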
A static method may also guarantee its own thread-safety:
public static object GetCachedEntity(string key)
{
    object ret; // local and remains so.
    lock (_cache) // same lock used on every operation that deals with _cache
        return _cache.TryGetValue(key, out ret) ? ret : null;
}
OR:
public static object GetCachedEntity(string key)
{
    object ret;
    return _cache.TryGetValue(key, out ret) ? ret : null; // _cache is known to be thread-safe in itself!
}
Of course here this is no different than an instance member which protects itself against corruption from other threads (by co-operating with all other code that deals with the objects they share).
Notably though, it is very common for static members to be thread-safe, and instance members to not be thread-safe. Almost every static member of the FCL guarantees thread-safety in the documentation, and almost every instance member does not, barring some classes specifically designed for concurrent use (even in some cases where the instance member actually is thread-safe).
The reasons are two-fold:
The sort of operations most commonly useful for static members are either pure functions (most of the static members of the Math class, for example) or read static read-only variables which will not be changed by other threads.
It's very hard to bring your own synchronisation to a third-party's static members.
The second point is important. If I have an object whose instance members are not thread-safe, then assuming that calls do not affect non-thread-safe data shared between different instances (possible, but almost certainly a bad design), then if I want to share it between threads, I can provide my own locking to do so.
If however, I am dealing with static members that are not thread-safe, it is much harder for me to do this. Indeed, considering that I may be racing not just with my own code, but with code from other parties, it may be impossible. This would make any such public static member next to useless.
Ironically, the reason that static members tend to be thread-safe is not that it's easier to make them so (though that does cover the pure functions), but that it's harder to! So hard, in fact, that the author of the code has to do it for the user, because the user won't be able to do it themselves.