Are there overall rules/guidelines for what makes a method thread-safe? I understand that there are probably a million one-off situations, but what about in general? Is it this simple?
If a method only accesses local variables, it's thread safe.
Is that it? Does that apply for static methods as well?
One answer, provided by #Cybis, was:
Local variables cannot be shared among threads because each thread gets its own stack.
Is that the case for static methods as well?
If a method is passed a reference object, does that break thread safety? I have done some research, and there is a lot out there about certain cases, but I was hoping to be able to define, by using just a few rules, guidelines to follow to make sure a method is thread safe.
So, I guess my ultimate question is: "Is there a short list of rules that define a thread-safe method? If so, what are they?"
EDIT
A lot of good points have been made here. I think the real answer to this question is: "There are no simple rules to ensure thread safety." Cool. Fine. But in general I think the accepted answer provides a good, short summary. There are always exceptions. So be it. I can live with that.
If a method (instance or static) only references variables scoped within that method then it is thread safe because each thread has its own stack:
In this instance, multiple threads could call ThreadSafeMethod concurrently without issue.
public class Thing
{
public int ThreadSafeMethod(string parameter1)
{
int number; // each thread will have its own variable for number.
number = parameter1.Length;
return number;
}
}
This is also true if the method calls other class method which only reference locally scoped variables:
public class Thing
{
public int ThreadSafeMethod(string parameter1)
{
int number;
number = this.GetLength(parameter1);
return number;
}
private int GetLength(string value)
{
int length = value.Length;
return length;
}
}
If a method accesses any (object state) properties or fields (instance or static) then you need to use locks to ensure that the values are not modified by a different thread:
public class Thing
{
private string someValue; // all threads will read and write to this same field value
public int NonThreadSafeMethod(string parameter1)
{
this.someValue = parameter1;
int number;
// Since access to someValue is not synchronised by the class, a separate thread
// could have changed its value between this thread setting its value at the start
// of the method and this line reading its value.
number = this.someValue.Length;
return number;
}
}
You should be aware that any parameters passed in to the method which are not either a struct or immutable could be mutated by another thread outside the scope of the method.
To ensure proper concurrency you need to use locking.
for further information see lock statement C# reference and ReadWriterLockSlim.
lock is mostly useful for providing one at a time functionality,
ReadWriterLockSlim is useful if you need multiple readers and single writers.
If a method only accesses local variables, it's thread safe. Is that it?
Absolultely not. You can write a program with only a single local variable accessed from a single thread that is nevertheless not threadsafe:
https://stackoverflow.com/a/8883117/88656
Does that apply for static methods as well?
Absolutely not.
One answer, provided by #Cybis, was: "Local variables cannot be shared among threads because each thread gets its own stack."
Absolutely not. The distinguishing characteristic of a local variable is that it is only visible from within the local scope, not that it is allocated on the temporary pool. It is perfectly legal and possible to access the same local variable from two different threads. You can do so by using anonymous methods, lambdas, iterator blocks or async methods.
Is that the case for static methods as well?
Absolutely not.
If a method is passed a reference object, does that break thread safety?
Maybe.
I've done some research, and there is a lot out there about certain cases, but I was hoping to be able to define, by using just a few rules, guidelines to follow to make sure a method is thread safe.
You are going to have to learn to live with disappointment. This is a very difficult subject.
So, I guess my ultimate question is: "Is there a short list of rules that define a thread-safe method?
Nope. As you saw from my example earlier an empty method can be non-thread-safe. You might as well ask "is there a short list of rules that ensures a method is correct". No, there is not. Thread safety is nothing more than an extremely complicated kind of correctness.
Moreover, the fact that you are asking the question indicates your fundamental misunderstanding about thread safety. Thread safety is a global, not a local property of a program. The reason why it is so hard to get right is because you must have a complete knowledge of the threading behaviour of the entire program in order to ensure its safety.
Again, look at my example: every method is trivial. It is the way that the methods interact with each other at a "global" level that makes the program deadlock. You can't look at every method and check it off as "safe" and then expect that the whole program is safe, any more than you can conclude that because your house is made of 100% non-hollow bricks that the house is also non-hollow. The hollowness of a house is a global property of the whole thing, not an aggregate of the properties of its parts.
There is no hard and fast rule.
Here are some rules to make code thread safe in .NET and why these are not good rules:
Function and all functions it calls must be pure (no side effects) and use local variables. Although this will make your code thread-safe, there is also very little amount of interesting things you can do with this restriction in .NET.
Every function that operates on a common object must lock on a common thing. All locks must be done in same order. This will make the code thread safe, but it will be incredibly slow, and you might as well not use multiple threads.
...
There is no rule that makes the code thread safe, the only thing you can do is make sure that your code will work no matter how many times is it being actively executed, each thread can be interrupted at any point, with each thread being in its own state/location, and this for each function (static or otherwise) that is accessing common objects.
It must be synchronized, using an object lock, stateless, or immutable.
link: http://docs.oracle.com/javase/tutorial/essential/concurrency/immutable.html
Related
I have a class like below (refactored for purpose) in a PR and one of the seniors who is now on holiday stated I was misusing static variables and I should pass variables from method to method.
class Class {
static int dataPool;
// the below methods are called cyclicly for the duration of the application,
// xxxx times, First method is called and then Second is called,
// over and over
public static void First()
{
// at the start, reads dataPool count to do computation
// at the end, set dataPool count based on computation
}
public static void Second()
{
// at the start, read dataPool count to do some computation
// at the end, set dataPool count based on computation
}
}
I want to understand why using variables like the above is 'bad' to learn. Can anyone explain please?
I want to understand why using variables like the above is 'bad' to learn. Can anyone explain please?
Main reasons:
It means your code is no-longer re-entrant.
It means your code is no-longer thread-safe (or even less thread-safe) and cannot run concurrently.
It means your code has state in unexpected places (see: Principle of Least Astonishment).
It means your code will perform poorly in a multi-threaded system (even if it's strictly single-threaded and without reentrancy) as having mutable static-state means your compiler cannot optimize your program's use of memory because it cannot guarantee thread ownership of data. Cache coherency is expensive.
It means you cannot correctly unit-test your code (using the strict definition of "unit test", not Visual Studio's).
It means the JIT compiler cannot optimize your program as much as it could if it could otherwise because reasoning about the lifetime of static state is very difficult.
It means your code could be broken if another function or thread decides to mutate your static state without you knowing it.
If I have a local variable like so:
Increment()
{
int i = getFromDb(); // get count for a customer from db
};
And this is an instance class which gets incremented (each time a customer - an instance object - makes a purchase), is this variable thread safe? I hear that local variables are thread safe because each thread gets its own stack, etc etc.
Also, am I right in thinking that this variable is shared state? What I'm lacking in the thinking dept is that this variable will be working with different customer objects (e.g. John, Paul, etc) so is thread safe but this is flawed thinking and a bit of inexperience in concurrent programming. This sounds very naive, but then I don't have a lot of experience in concurrent coding like I do in general, synchronous coding.
EDIT: Also, the function call getFromDb() isn't part of the question and I don't expect anyone to guess on its thread safety as it's just a call to indicate the value is assigned from a function which gets data from the db. :)
EDIT 2: Also, getFromDb's thread safety is guaranteed as it only performs read operations.
i is declared as a local (method) variable, so it only normally exists in the stack-frame of Increment() - so yes, i is thread safe... (although I can't comment on getFromDb).
except if:
Increment is an iterator block (i.e. uses yield return or yield break)
i is used in an anonymous method ( delegate { i = i + 1;} ) or lambda (foo => {i=i+foo;}
In the above two scenarios, there are some cases when it can be exposed outside the stack. But I doubt you are doing either.
Note that fields (variables on the class) are not thread-safe, as they are trivially exposed to other threads. This is even more noticeable with static fields, since all threads automatically share the same field (except for thread-static fields).
Your statement has two separate parts - a function call, and an assignment.
The assignment is thread safe, because the variable is local. Every different invocation of this method will get its own version of the local variable, each stored in a different stack frame in a different place in memory.
The call to getFromDb() may or may not be threadsafe - depending on its implementation.
As long as the variable is local to the method it is thread-safe. If it was a static variable, then it wouldn't be by default.
class Example
{
static int var1; //not thread-safe
public void Method1()
{ int var2; //thread-safe
}
}
While your int i is thread safe, your whole case is case is probably not thread safe. Your int i is, as you said, thread safe because each thread has its own stack trace and therefore each thread has his own i. However, your thread all share the same database, therefore your database accesses are not thread safe. You need to properly synchronize your database accesses to make sure that each thread will see the database only at the correct moment.
As usual with concurrency and multithreading, you do not need to synchronize on your DB if you only read information. You do need to synchronize as soon as two thread will try to read/write the same set of informations from your DB.
i will be "thread safe" as each thread will have it's own copy of i on the stack as you suggest. The real question will be are the contents of getFromDb() thread safe?
i is a local variable so it's not shared state.
If your getFromDb() is reading from, say, an oracle sequence or a sql server autoincrement field then the db is taking care of synchronization (in most scenarios, excluding replication/distributed DBs) so you can probably safely return the result to any calling thread. That is, the DB is guaranteeing that every getFromDB() call will get a different value.
Thread-safety is usually a little bit of work - changing the type of a variable will rarely get you thread safety since it depends on how your threads will access the data. You can save yourself some headache by reworking your algorithm so that it uses a queue that all consumers synchronize against instead of trying to orchestrate a series of locks/monitors. Or better yet make the algorithm lock-free if possible.
i is thread safe syntactically. But when you assign instance variable's values,instance method return value to i, then the shared data is manipulated by multiple threads.
Have I understood correctly that all threads have copy of method's variables in their own stack so there won't be problems when a static method is called from different threads?
Yes and no. If the parameters are value types, then yes they have their own copies. Or if the reference type is immutable, then it can't be altered and you have no issues. However, if the parameters are mutable reference types, then there are still possible thread safety issues to consider with the arguments being passed in.
Does that make sense? If you pass a reference type as an argument, it's reference is passed "by value" so it's a new reference that refers back to the old object. Thus you could have two different threads potentially altering the same object in a non-thread-safe way.
If each of those instances are created and used only in the thread using them, then chances are low that you'd get bit, but I just wanted to emphasize that just because you're using static methods with only locals/parameters is not a guarantee of thread-safety (same with instance of course as noted by Chris).
Have I understood correctly that all threads have copy of method's variables in their own stack so there won't be problems when a static method is called from different threads?
No.
First off, it is false that "all threads have a copy of the method's local variables in their own stack." A local variable is only generated on the stack when it has a short lifetime; local variables may have arbitrarily long lifetimes if they are (1) closed-over outer variables, (2) declared in an iterator block, or (3) declared in an async method.
In all of those cases a local variable created by an activation of a method on one thread can later be mutated by multiple threads. Doing so is not threadsafe.
Second, there are plenty of possible problems when calling static methods from different threads. The fact that local variables are sometimes allocated on the stack does not magically make access to shared memory by static methods suddenly correct.
can there be concurrency issues when using C# class with only static methods and no variables?
I assume you mean "no static variables" and not "no local variables".
Absolutely there can be. For example, here's a program with no static variables, no non-static methods, no objects created apart from the second thread, and a single local variable to hold a reference to that thread. None of the methods other than the cctor actually do anything at all. This program deadlocks. You cannot assume that just because your program is dead simple that it contains no threading bugs!
Exercise to the reader: describe why this program that appears to contain no locks actually deadlocks.
class MyClass
{
static MyClass()
{
// Let's run the initialization on another thread!
var thread = new System.Threading.Thread(Initialize);
thread.Start();
thread.Join();
}
static void Initialize()
{ /* TODO: Add initialization code */ }
static void Main()
{ }
}
It sounds like you are looking for some magical way of knowing that your program has no threading issues. There is no such magical way of knowing that, short of making it single-threaded. You're going to have to analyze your use of threads and shared data structures.
There is no such guarantee unless all of the variables are immutable reference types or value types.
If the variables are mutable reference types, proper synchronization needs to be performed.
EDIT: Mutable variables only need to be synchronized if they are shared between threads- locally declared mutables that are not exposed outside of the method need not be synchronized.
Yes, unless methods use only local scope variable and no any gloval variable, so there is no any way any of that methods can impact on the state of any object, if this is true, you have no problems to use it in multithreading. I would say, that even , in this conditions, static they or not, is not relevant.
If they are variables local to the method then yes, you have nothing to worry about. Just make sure you are not passing parameters by reference or accessing global variables and changing them in different threads. Then you will be in trouble.
static methods can refer to data in static fields -- either in their class or outside of it -- which may not be thread safe.
So ultimately the answer to your question is "no", because there may be problems, although usually there won't be.
Two threads should still be able to operate on the same object either by the object being passed in to methods on different threads as parameters, or if an object can be accessed globally via Singleton or the like all bets are off.
Mark
As an addendum to the answers about why static methods are not necessarily thread-safe, it's worth considering why they might be, and why they often are.
The first reason why they might be is, I think, the sort of case you were thinking of:
public static int Max(int x, int y)
{
return x > y ? x : y;
}
This pure function is thread-safe because there is no way for it to affect code on any other thread, the locals x and y remain local to the thead they are on, not being stored in a shared location, captured in a delegate, or otherwise leaving the purely local context.
It's always worth noting, that combinations of thread-safe operations can be non thread-safe (e.g. doing a thread-safe read of whether a concurrent dictionary has a key followed by a thread-safe read of the value for that key, is not thread-safe as state can change between those two thread-safe operations). Static members tend not to be members that can be combined in such non thread-safe ways in order to avoid this.
A static method may also guarantee it's own thread-safety:
public object GetCachedEntity(string key)
{
object ret; //local and remains so.
lock(_cache) //same lock used on every operation that deals with _cache;
return _cache.TryGetValue(key, out ret) ? ret : null;
}
OR:
public object GetCachedEntity(string key)
{
object ret;
return _cache.TryGetValue(key, out ret) ? ret : null; //_cache is known to be thread-safe in itself!
}
Of course here this is no different than an instance member which protects itself against corruption from other threads (by co-operating with all other code that deals with the objects they share).
Notably though, it is very common for static members to be thread-safe, and instance members to not be thread-safe. Almost every static member of the FCL guarantees thread-safety in the documentation, and almost every instance member does not barring some classes specifically designed for concurrent use (even in some cases where the instance member actually is thread-safe).
The reasons are two-fold:
The sort of operations most commonly useful for static members are either pure functions (most of the static members of the Math class, for example) or read static read-only variables which will not be changed by other threads.
It's very hard to bring your own synchronisation to a third-party's static members.
The second point is important. If I have an object whose instance members are not thread-safe, then assuming that calls do not affect non-thread-safe data shared between different instances (possible, but almost certainly a bad design), then if I want to share it between threads, I can provide my own locking to do so.
If however, I am dealing with static members that are not thread-safe, it is much harder for me to do this. Indeed, considering that I may be racing not just with my own code, but with code from other parties, it may be impossible. This would make any such public static member next to useless.
Ironically, the reason that static members tend to be thread-safe is not that it's easier to make them so (though that does cover the pure functions), but that it's harder to! So hard in-fact that the author of the code has to do it for the user, because the user won't be able to themselves.
I read all documentation about thread-safe types and the "lock" statement, but I am still not getting it 100%.
When exactly do I need to use the "lock" statement? How it relates to (non) thread-safe types? Thank you.
Imagine an instance of a class with a global variable in it. Imagine two threads call a method on that object at exactly the same time, and that method updates the global variable inside.
The likelihood is that value in the variable will get corrupted. Different languages and compilers/interpreters will deal with this in different ways (or not at all...) but the point is that you get "undesired" and "unpredictable" results.
Now imagine that the method obtains a "lock" on the variable before attempting to read from or write to it. The first thread to call the method will get a "lock" on the variable, the second thread to call the method will have to wait until the lock is released by the first thread. While you still have a race condition (i.e. the second thread might overwrite the value from the first) at least you have predictable results because no two threads (that are unaware of each other) can modify the value at the same time.
You use the lock statement to obtain that lock on the variable. Typically you'd define a separate object variable and use that for the lock object:
public class MyThreadSafeClass
{
private readonly object lockObject = new object();
private string mySharedString;
public void ThreadSafeMethod(string newValue)
{
lock (lockObject)
{
// Once one thread has got inside this lock statement, any others will have to wait outside for their turn...
mySharedString = newValue;
}
}
}
A type is deemed "thread-safe" if it applies the principle that no corruption will occur if shared data is accessed by multiple threads at the same time.
Beware the difference between "immutable" and "thread-safe". Thread-safe says that you have coded for the scenario and won't get corruption if two threads access shared state at the same time, whereas immutability is simply saying you return a new object rather than modifying it. Immutable objects are thread-safe, but not all thread-safe objects are immutable.
Thread safe code means code that can be accessed with many threads and still operate correctly.
In C#, this normally requires some sort of synchronization mechanism. A simple one is the lock statement (which is behind the scenes a call to Monitor.Enter). A code block that is surrounded by a lock block can only be accessed by one thread at a time.
Any use of a type that is not thread safe requires you to manage synchronization yourself.
A good resource to learn about threading in C# is the free eBook by Joe Albahari, found here.
http://en.wikipedia.org/wiki/Thread_safety
So the following code is possible in C# to start a new thread in a method:
int myInt;
new Thread(delegate() {
// yeah, I'm in a separate thread! And I can access myInt as well, awesome
}).Start();
and all your local variables in that scope are also available in that thread. I've used this in times where I need to perform small calculations "on the side" instead of creating small methods and starting them as threads instead, so I find it quite useful, even if it's not that pretty always.
Recently I also got in the situation where I needed to break of a long going thread that needed 5 variables from the calling method, and it got me wondering whether or not this was "good coding" considering both scopes need references to the local variables.
I am therefore wondering if it is better to create a new class (something in the lines of SpecificContext) and pass that as an object to a Thread using the ParameterizedThreadStart delegate or if this design is actually considered as "best practice" since we kind of get type checking of the variables? Also, are there any downsides to using one approach over the other, in both design and/or performance?
Since myInt was used by the delegate, the compiler created a hidden class to convert myInt from a normal stack variable to a reference of an heap variable. In effect, it created a start parameter object for you. Since the compiler is doing all right things for you, I would use the most readable code. In this case, I suspect it would be using myInt 'directly'.
Note that you have no better control over this automaticaly generated object over one you generate yourself. You still have to worry about synchronization, if both threads are accessing these start objects.
When you have a anonymous method, C# compiler generates a class for you. This class contains all local variables, referenced inside a delegate and it is not synchronized.
It is definetely better to create a class and pass variables because you'll have control over the synchonization.