(I wish I could tag this question for every class-based language that implements threads, but I've tagged it under Java, C++, C# and Ruby; not that I'm fluent in all of these.)
I think I have seen statements to this effect (that class constructors are threadsafe) in blog posts and tutorials. I can't trace any direct statements, but many posts and tutorials make the assumption, or do not even mention the problem of threads running in constructors and destructors. Sticking to Java, which has a history of, and some formal approach to, multi-threading, consider:
Javamex
Jankov's tutorials
Oracle tutorials
All these articles/webpages are written in a confident way and contain rounded discussions. They all mention the Java feature of method synchronization, so you would hope they might mention how this would affect the special methods of construction and destruction. But they do not.
But class constructors and destructors need to be considered like any class methods. Here is an article on Java,
Safe construction techniques in Java
about leaking 'this' references from constructors. And here are a couple of StackOverflow posts,
Incompletely constructed objects in Java,
Java constructor needs locking
showing constructors with thread issues. I doubt threading issues in special methods are limited to Java.
So, I'm wondering,
Is the assumption of threadsafety (however defined) based on the general layout of constructors?
A tightly coded constructor with not much code would be close to re-entrant code (accepting data through parameters, etc.)
Or do interpreters/compilers handle constructors/destructors with special treatment or protections?
For example, the Java memory model makes some comments on expectations at the end of construction, and I expect other language specifications will too.
Wikipedia on constructors has little on this. In a different context this post Constructors in Programming languages contains some hints, but is not about threadsafety.
While there may be information in specialist books, it would be good to have general (though language-specific mentions are interesting!) explanations/discussion on StackOverflow.
In general, local variables which do not point to shared data are thread safe. As you normally create an object in only one thread, it is effectively a thread-local data structure and thus thread safe (mostly).
In Java, you can break this assumption in a number of ways which include
starting a new thread in a constructor
setting a reference to the object which is visible to another thread.
using non-final fields and adding the object to a thread unsafe container or shared data structure.
Normally these actions are all considered bad practice, so if you avoid these, you have a thread safe constructor without the need for locking.
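As a sketch of the first point (the class name is mine, not from any of the linked posts), here is the usual safe alternative: construct fully, then start the thread from a factory method rather than from inside the constructor:

```java
class SafePublication {
    private final int value;
    private Thread worker;

    private SafePublication(int value) {
        this.value = value;
        // Do NOT start a thread here: a thread started from inside the
        // constructor could observe 'this' before construction finishes.
    }

    // Factory method: the thread is started only after the constructor
    // has returned, so it can only ever see a fully constructed object.
    static SafePublication create(int value) {
        SafePublication sp = new SafePublication(value);
        sp.worker = new Thread(() -> System.out.println(sp.value));
        sp.worker.start();
        return sp;
    }

    int getValue() { return value; }
}
```

Callers use `SafePublication.create(...)` instead of `new`, which keeps the "leaking this" problem out of the constructor entirely.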
I think the original question is based on some misunderstanding. Constructors are not considered threadsafe.
If the constructor is affecting anything outside of the object itself, then it is not threadsafe, just like any other member functions of the class.
I think this assumption is based on a constructor that affects nothing beyond the object's own contents (and there are no static member variables). In that case it's threadsafe, because nothing outside the object itself is affected, and until the constructor has finished nothing else knows the object exists, so there is no possibility for another thread to "use" the object. But this fails as soon as some global state (any global/static variable, I/O, etc.) gets involved, and at that point thread safety depends on proper locking (of some sort).
An example of a Java constructor thread-safety problem is the Double-Checked Locking pattern; see http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html. In other words,
X x = new X();
is always safe, but
X x;          // field
x = new X();  // in a method
is not necessarily safe
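A minimal sketch of the idiom the linked article discusses, in its fixed form (assuming the Java 5+ memory model; the class name Holder is mine). The `volatile` modifier is what makes the second assignment safe:

```java
class Holder {
    // 'volatile' is the fix: without it, another thread could see a
    // non-null 'instance' whose fields are not yet initialized.
    private static volatile Holder instance;
    private final int x;

    private Holder() { x = 5; }

    static Holder getInstance() {
        Holder result = instance;
        if (result == null) {                 // first check, without the lock
            synchronized (Holder.class) {
                result = instance;
                if (result == null) {         // second check, under the lock
                    instance = result = new Holder();
                }
            }
        }
        return result;
    }

    int getX() { return x; }
}
```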
Here is a class field,
class someClass {
Int someClassField = nil
...
(please, please(!) ignore issues of visibility; this question is concerned with overall design, not language implementation). If I look online at tutorials, I am told this field is safe to use from multiple threads. When the tutorials say safe, they do not mean that one thread cannot interfere with the value visible to another. Such interference may be the intention: the field may be a counter. What the tutorials mean is that when one thread changes this field, the field will not be left in an unsafe state. Take this field,
class someClass {
List<Int> someClassField = new List<Int>()
...
As I understand, if the field is a simple list, one thread could leave it in an inconsistent state (i.e. partially disconnected). If another thread uses the list it will fail - in a language like C this would be a disaster. Even reading could fail.
Well then, the class used in the field could be asked to copy out its state (the copying could be extended to a full defence of immutability, but I'm keeping the discussion simple). If the class copies out its state, then modifications are done away from the copy in the field, in a new copy modified for return. This new, modified copy can be reassigned to the field. But is that assignment threadsafe, in the sense that the value of the field cannot be in an inconsistent state, because the assignment of the new object's reference to the field is atomic?
I'm ignoring all issues of whether a language engine might reorder, cache, etc. See the many posts below (Java especially, it seems):
c# question has hints
Rule of thumb answers in Scala, but seems to muddle linear synchronisation with outright disaster?
Dark information on Java's thread visibility issue. One post suggests, yes, reference writing is atomic
Java question related to this. More of the same Java confusion between visibility and objects being unformed
immutable-objects-are-thread-safe-but-why Java question. Sounds like the right question, but what kind of thread safety?
.net question slews off course
I'd like to work this question on a smaller scale...
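To make the copy-modify-reassign idea concrete, here is a minimal Java sketch (the class name is mine; `volatile` is included purely for cross-thread visibility, which the question sets aside):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class CopyOnAssign {
    // Readers always see either the old complete list or the new complete
    // list, never a half-modified one: only the reference write is shared,
    // and reference writes are atomic. 'volatile' covers visibility.
    private volatile List<Integer> items = Collections.emptyList();

    void add(int value) {
        List<Integer> copy = new ArrayList<>(items);   // modify a private copy
        copy.add(value);
        items = Collections.unmodifiableList(copy);    // single atomic reference write
    }

    List<Integer> snapshot() {
        return items;   // always a complete, immutable list
    }
}
```

Note this sketch is only safe for a single writer; concurrent writers would need a lock around `add`, or a structure like `CopyOnWriteArrayList`.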
In most languages object assignment is atomic.
In this specific case you need to be careful though: when doing x = new X(), there is no guarantee in all languages that X is fully initialized before the assignment. I'm not sure where C# stands on that.
You also have to consider visibility as well as atomicity. In Java for example you would need to make the variable volatile as otherwise changes made in one thread may not be visible in another thread at all.
C++ defines a data race as two or more threads potentially accessing the same memory location simultaneously, at least one of which is a modification. The behavior of programs with data races is undefined. So no, it is not safe for multiple threads to access this field if at least one of them may modify it.
Writing a reference in Java is atomic (writes to longs or doubles are only atomic if the field is volatile, by the way), but that alone doesn't help you at all.
Example to demonstrate:
class Foo {
int x;
public Foo() { x = 5; }
}
Now assume we do an assignment such as foo = new Foo() (without final or volatile modifiers for foo!). From a low level point of view that means we have to do the following:
allocate memory
run the constructor
assign memory address to the field.
but as long as the constructor doesn't read the field we're assigning it to, the compiler can just as well do the following:
allocate memory
assign memory address to the field.
run the constructor
Thread-safe? Certainly not (and you're never guaranteed to actually see the update if you don't put memory barriers in). Java gives more guarantees when final fields are involved, so creating a new immutable object will be thread-safe (you'll never see the uninitialized value of a final field). Volatile fields (we're talking about the assignment here, not fields in the object) avoid this problem too, in both Java and C#. Not sure about C# and readonly, though.
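A sketch of the final-field guarantee mentioned above (the Point class is a made-up example):

```java
final class Point {
    // Because the fields are final, the Java memory model guarantees that
    // any thread that obtains a reference to a Point, even via a data race,
    // sees x and y fully initialized.
    private final int x;
    private final int y;

    Point(int x, int y) { this.x = x; this.y = y; }

    int x() { return x; }
    int y() { return y; }
}
```

This is why building immutable objects with final fields is the simplest route to thread-safe publication in Java.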
In Java, assignments to references and primitives are atomic, except for the 64-bit primitive types long and double. Assignments to Java longs and doubles can be made atomic by declaring them with the volatile modifier. See: Are 64 bit assignments in Java atomic on a 32 bit machine?
This is so because the Java VM specification requires it in order for the VM to be Java compliant.
Scala runs on top of a standard Java VM and so will also provide the same guarantees as Java with respect to assignments unless they start using JNI.
One of the problems with C/C++ (and one of its strengths) is that both languages allow very fine-grained mapping of data structures to memory addresses. At this level, whether writes to memory are atomic or not depends very much on the hardware platform. For instance, CPUs are usually unable to atomically read, let alone write, variables that are not aligned appropriately, e.g. when 16-bit variables aren't aligned to even addresses, or when 32-bit variables aren't aligned to addresses that are a multiple of 4, and so on. It gets worse when the variable extends beyond one cache line into the next, or beyond one page into the next. Hence C does not guarantee that assignments will be atomic.
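The Java side of the long/double rule above can be sketched in a few lines (the class name is mine):

```java
class Counter64 {
    // Plain 'long' writes may be split into two 32-bit halves on some JVMs,
    // so a reader could see a torn value; 'volatile' makes the write atomic
    // (and also visible to other threads).
    private volatile long value;

    void set(long v) { value = v; }
    long get() { return value; }
}
```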
John's console application calls my DLL function many times (~15x per second). I am thinking of making this function a static method.
I know that :
It can only access static props and objects.
It doesn't need an instance to run the function.
But I don't know if these are the only questions I need to ask myself.
Each of John's calls to my function runs in a new thread that he creates.
If there is an error in my function, how will this affect all other calls?
Should I make this function a regular function with instance to the class (which John will create)?
What about GC?
What is the best practice answer for this question?
Sounds like there could be a problem. Calling a method which operates on static (shared) objects in a multithread environment should ring some alert bells for you.
Review your code and if there's a chance that a shared object is accessed from two (or more) threads at the same time, make the object an instance field and make your methods instance methods.
Of course, whether or not there is a risk depends much on the actual code (which you don't show) but making all calls nonstatic means that you lower the potential risk.
Generally, if your static method doesn't use any state (i.e. both reading and writing to static fields and properties, or calling other static methods that do so), there won't be any side effects for the caller, no matter how many threads they start.
If you're asking for best practice, static methods are mostly a bad idea though. If at all, static methods should be only used for very generic utility functionality.
It's not recommended because you can't predict whether requirements will change and you'll need some state one day. Then you'd have to switch to a class that the caller can instantiate, and you'd break all existing code that uses your function.
About garbage collection, yes sure that has some overhead, but that is currently the sacrifice if you go the route of memory-managed OO. If you want more control, use unsafe code or a language like C++ or Vala.
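A Java sketch of the "stateless static method" rule from the answers above (the original question is about C#, but the rule is the same; the class and method names are invented):

```java
final class MathUtil {
    private MathUtil() {}   // no instances, no instance state

    // Touches no static state: every call works only with its parameters
    // and locals, so any number of threads can call it concurrently
    // without locking.
    static int clamp(int value, int lo, int hi) {
        return Math.max(lo, Math.min(hi, value));
    }
}
```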
I would agree with Wiktor (+1), but would also add that if synchronization is required in the static method, it may be more efficient to use multiple instances. Otherwise, having many threads might be pointless as only one can access a critical section at a time.
In a complex application (involving inversion of control and quite a few classes) it is hardly possible to know when a certain object won't be referenced any longer.
First question: does the statement above suggest that there is a design flaw in such an application? There is a pattern saying: "In all OO programming it is about objects using other types of objects to ease up implementation. However: for any object created there should be some owner that will take care of its lifetime."
I assume it is safe to state that traditional unmanaged OO programming works as stated above: some owner will eventually free/release the used object.
However, the benefit of a managed language is that in principle you don't have to care about lifetime management any more. As long as an object is referenced somehow (event handler...) and from anywhere (maybe not the "owner"), it lives and should live, since it is still in use.
I really like that idea and that you don't have to think in terms of owner relationships. However at some point in a program it might get obvious that you want to get rid of an object (or at least mute it in a way as it wouldn't be there).
IStoppable: a suggestion of a design pattern
There could be an interface like "IStoppable", with a "Stop()" method and a "Stopped" event, so that any other object using it can remove its references to the object. (It would therefore need to unplug its OnStopped event handler within the event handler, if that is possible.) As a result the object is no longer needed and will get collected.
Maybe it is naive, but what I like to believe about that idea is that there wouldn't be an undefined state of the object. Even if some other object fails to unregister itself on OnStopped, the object will just stay alive and can still get called. Nothing got broken just by removing most references to it.
I think this pattern can be viewed as an anarchistic app design, since
it is based on the idea that ANY other object can manage the lifetime of an IStoppable
there is no need for an owner
it would be considered as OK to leave the decision of unregistering from an IStoppable to those using it
you don't need to dispose, destroy or throw away - you just stop and let live (let GC do the dirty part)
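A minimal Java sketch of the IStoppable idea described above (the names Stoppable and StopListener are mine, and this is the question's proposal, not an established pattern):

```java
import java.util.concurrent.CopyOnWriteArrayList;

// A consumer registers a listener; on stop() it is told to drop its
// references. A consumer that forgets simply keeps the object alive,
// which is the failure mode the question describes as harmless.
interface StopListener {
    void stopped(Stoppable source);
}

class Stoppable {
    private final CopyOnWriteArrayList<StopListener> listeners = new CopyOnWriteArrayList<>();
    private volatile boolean stopped;

    void onStopped(StopListener listener) { listeners.add(listener); }

    void stop() {
        stopped = true;
        for (StopListener l : listeners) {
            l.stopped(this);   // listeners drop their references here
        }
        listeners.clear();
    }

    boolean isStopped() { return stopped; }
}
```

CopyOnWriteArrayList is used so listeners can unregister from inside the callback without a ConcurrentModificationException.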
IDisposable: from scratch, and just to check a related pattern:
The disposable pattern suggests that you should still think and work like in unmanaged OO programming: dispose of an object that you don't need any longer.
using is your friend in a method (very comfortable!)
an own IDisposable implementation is your friend otherwise.
after using it / calling Dispose you shouldn't call it any longer: undefined behaviour.
implementation and resource centric: it is not so much about when and why, but more about the details of reclaiming resources
So again: in an application where I don't keep in mind whether anything but an "owner" is pointing to an object, it is hard to ensure that no one will reference and call it any longer.
I read of a "Dispose" event in the Component class of .NET. Is there a design pattern around it?
Why would i want to think in terms of Disposables? Why should i?
In a managed world...
Thanks!
Sebastian
I personally don't like the idea of IStoppable, as defined above. You're saying you want any object to manage the lifetime of the object; however, a defined lifecycle really suggests ownership. Allowing multiple objects to manage the lifetime of a single object is going to cause issues in the long run.
IDisposable is, however, a well-defined pattern in the .NET world. I wrote an entire series on implementing IDisposable which is a decent introduction to its usage. However, its purpose is handling resources which have an unmanaged component: when you have a managed object that refers to a native resource, it's often desirable to have explicit control of the lifetime of that resource. IDisposable is a defined pattern for handling that situation.
That being said, a proper implementation of IDisposable will still clean up your resources if you fail to call Dispose(). The downside is that the resource will be cleaned up during the object's finalization, which could occur at any arbitrary point after the object is no longer used. This can be very bad for quite a few reasons - especially if you're using native resources that are limited in nature. By not disposing of the resource immediately, you can run out of resources before the GC runs on the object, especially if there isn't a lot of memory pressure in the system.
Ok first I would point out a few things I find uncomfortable about your IStoppable suggestion.
IStoppable raises event Stopped, consumers must know about this and release references. This is a bit complex at best, problematic at worst. Consumers must know where every reference is in order to remove/reset the reference.
You claim "... Nothing got broken just by removing most references onto it." That entirely depends on the object implementing IStoppable and its uses. Say, for example, my IStoppable object is an object cache. Now I forget about or ignore the event, and suddenly I'm using a different object cache from the rest of the world... maybe that is OK, maybe not.
Events are a horrible way to provide behavior like this, due to the fact that exceptions prove difficult to handle. What does it mean when the third of 10 event handlers throws an exception in the IStoppable.Stopped event?
I think what you're trying to express is an object that may be 'owned' by many things and can be forcefully released by one? In this case you might consider using a reference-counter pattern, more like old-school COM. That of course has issues as well, but they are less of a problem in a managed world.
The issue with a reference counter around an object is that you come back to the idea of an invalid/uninitialized object. One possible way to solve this is to provide the reference counter with a valid 'default' instance (or a factory delegate) to use when all references have been released and someone still wants an instance.
I think you have a misunderstanding of modern OO languages; in particular scope and garbage collection.
The lifetime of objects is very much controlled by their scope, whether that scope is limited to a using clause, a method, or even the appdomain.
Although you don't necessarily "care" about the lifetime of the object, the compiler does and will set it aside for garbage collection as soon as it goes out of scope.
You can speed up that process by purposely telling the garbage collector to run now, but that's usually a pointless exercise as the compiler will optimize the code to do so at the most opportune time anyway.
If you are talking about objects in multi-threaded applications, these already expose mechanisms to stop their execution or otherwise kill them on demand.
Which leaves us with unmanaged resources. For those, the wrapper should implement IDisposable. I'll skip talking about it as Reed Copsey has already covered that ground nicely.
While there are times a Disposed event (like the one used by Windows Forms) can be useful, events do add a fair bit of overhead. In cases where an object will keep all the IDisposables it ever owns until it's disposed (a common situation) it may be better to keep a List(Of IDisposable) and have a private function "T RegDisp<T>(T obj) where T:IDisposable" which will add an object to the disposables list and return it. Instead of setting a field to SomeDisposable, set it to RegDisp(SomeDisposable). Note that in VB, provided all constructor calls are wrapped in factory methods, it's possible to safely use RegDisp() within field initializers, but that cannot be done in C#.
Incidentally, if an IDisposable's constructor accepts an IDisposable as a parameter, it may often be helpful to have it accept a Boolean indicating whether or not ownership of that object will be transferred. If a possibly-owned IDisposable will be exposed in a mutable property (e.g. PictureBox.Image) the property itself should be read-only, with a setter method that accepts an ownership flag. Calling the set method when the object owns the old object should Dispose the old object before setting the new one. Using that approach will eliminate much of the need for a Disposed event.
In the current implementation of CPython, there is an object known as the "GIL" or "Global Interpreter Lock". It is essentially a mutex that prevents two Python threads from executing Python code at the same time. This prevents two threads from being able to corrupt the state of the Python interpreter, but also prevents multiple threads from really executing together. Essentially, if I do this:
# Thread A
some_list.append(3)
# Thread B
some_list.append(4)
I can't corrupt the list, because at any given time only one of those threads is executing, since they must hold the GIL to do so. Now, the items in the list might be added in some indeterminate order, but the point is that the list isn't corrupted, and two things will always get added.
So, now to C#. C# essentially faces the same problem as Python, so, how does C# prevent this? I'd also be interested in hearing Java's story, if anyone knows it.
Clarification: I'm interested in what happens without explicit locking statements, especially to the VM. I am aware that locking primitives exist for both Java & C# - they exist in Python as well: The GIL is not used for multi-threaded code, other than to keep the interpreter sane. I am interested in the direct equivalent of the above, so, in C#, if I can remember enough... :-)
List<String> s;
// Reference to s is shared by two threads, which both execute this:
s.Add("hello");
// State of s?
// State of the VM? (And if sane, how so?)
Here's another example:
class A
{
public String s;
}
// Thread A & B
some_A.s = some_other_value;
// some_A's state must change: how does it change?
// Is the VM still in good shape afterwards?
I'm not looking to write bad C# code, I understand the lock statements. Even in Python, the GIL doesn't give you magic-multi-threaded code: you must still lock shared resources. But the GIL prevents Python's "VM" from being corrupted - it is this behavior that I'm interested in.
Most other languages that support threading don't have an equivalent of the Python GIL; they require you to use mutexes, either implicitly or explicitly.
Using lock, you would do this:
lock(some_list)
{
some_list.Add(3);
}
and in thread 2:
lock(some_list)
{
some_list.Add(4);
}
The lock statement ensures that the object inside the lock statement, some_list in this case, can only be accessed by a single thread at a time. See http://msdn.microsoft.com/en-us/library/c5kehkcz(VS.80).aspx for more information.
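For comparison, Java's story with the same example: ArrayList is not synchronized, so the usual equivalent of the C# lock blocks above is either an explicit synchronized block or a Collections.synchronizedList wrapper (the class name below is mine):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SyncListDemo {
    // Two threads append concurrently; the synchronized wrapper serializes
    // the adds, so both land (in some order) and the list is not corrupted.
    static int appendFromTwoThreads() {
        List<Integer> list = Collections.synchronizedList(new ArrayList<>());
        Thread a = new Thread(() -> list.add(3));
        Thread b = new Thread(() -> list.add(4));
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return list.size();
    }
}
```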
C# does not have an equivalent of Python's GIL.
Though they face the same issue, their design goals make them different.
With the GIL, CPython ensures that operations such as appending to a list from two threads are simple. That also means it allows only one thread to run at any time. This makes lists and dictionaries thread safe. Though this makes the job simpler and more intuitive, it makes it harder to exploit the multithreading advantage on multicore machines.
With no GIL, C# does the opposite: it puts the burden of integrity on the developer of the program, but allows you to take advantage of running multiple threads simultaneously.
As per one of the discussions, the GIL in CPython is purely a design choice of having one big lock versus a lock per object and synchronisation to make sure that objects are kept in a coherent state. This comes with a trade-off: giving up the full power of multithreading.
It has been found that most problems do not suffer from this disadvantage, and there are libraries which help you solve this issue when required. That means that, for a certain class of problems, the burden of utilizing the multicore is passed to the developer, so that the rest can enjoy the simpler, more intuitive approach.
Note: other implementations, like IronPython, do not have a GIL.
It may be instructive to look at the documentation for the Java equivalent of the class you're discussing:
Note that this implementation is not synchronized. If multiple threads access an ArrayList instance concurrently, and at least one of the threads modifies the list structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more elements, or explicitly resizes the backing array; merely setting the value of an element is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the list. If no such object exists, the list should be "wrapped" using the Collections.synchronizedList method. This is best done at creation time, to prevent accidental unsynchronized access to the list:
List list = Collections.synchronizedList(new ArrayList(...));
The iterators returned by this class's iterator and listIterator methods are fail-fast: if the list is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove or add methods, the iterator will throw a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.
Note that the fail-fast behavior of an iterator cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. Therefore, it would be wrong to write a program that depended on this exception for its correctness: the fail-fast behavior of iterators should be used only to detect bugs.
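The fail-fast behavior quoted above can even be triggered from a single thread, by structurally modifying the list in the middle of iterating it:

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

class FailFastDemo {
    // Removing an element through the list (not through the iterator) while
    // a for-each loop is running makes the next iterator step throw.
    static boolean triggersFailFast() {
        List<String> list = new ArrayList<>(List.of("a", "b", "c"));
        try {
            for (String s : list) {
                if (s.equals("a")) {
                    list.remove(s);   // structural modification mid-iteration
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }
}
```

The fix in single-threaded code is to use `Iterator.remove()`; in multi-threaded code, external synchronization as the documentation describes.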
Most complex data structures (for example, lists) can be corrupted when used without locking from multiple threads.
Since changes of references are atomic, a reference always stays a valid reference.
But there is a problem when interacting with security-critical code. So any data structures used by critical code must be one of the following:
Inaccessible from untrusted code, and locked/used correctly by trusted code
Immutable (String class)
Copied before use (valuetype parameters)
Written in trusted code and uses internal locking to guarantee a safe state
For example, critical code cannot trust a list accessible from untrusted code. If it gets passed a List, it has to create a private copy, do its precondition checks on the copy, and then operate on the copy.
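A sketch of that defensive-copy rule in Java (the class and method names are invented):

```java
import java.util.ArrayList;
import java.util.List;

class TrustedApi {
    // Copy first, then validate and use only the copy: the untrusted caller
    // cannot mutate our snapshot between the check and the use (a classic
    // time-of-check/time-of-use race otherwise).
    static List<String> process(List<String> untrusted) {
        List<String> copy = new ArrayList<>(untrusted);   // private snapshot
        for (String s : copy) {
            if (s == null) {
                throw new IllegalArgumentException("null element");
            }
        }
        return copy;   // operate only on the copy from here on
    }
}
```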
I'm going to take a wild guess at what the question really means...
In Python, the interpreter's internal data structures could get corrupted without the GIL, because Python uses a form of reference counting.
Both C# and Java use garbage collection and in fact they do use a global lock when doing a full heap collection.
Data can be marked and moved between "generations" without a lock. But to actually clean it up everything must come to a stop. Hopefully a very short stop, but a full stop.
Here is an interesting link on CLR garbage collection as of 2007:
http://vineetgupta.spaces.live.com/blog/cns!8DE4BDC896BEE1AD!1104.entry
MSDN gives the following warning about the lock keyword in C#:
In general, avoid locking on a public type, or instances beyond your code's control. The common constructs lock (this), lock (typeof (MyType)), and lock ("myLock") violate this guideline:
* lock (this) is a problem if the instance can be accessed publicly.
* lock (typeof (MyType)) is a problem if MyType is publicly accessible.
Yet it gives no solid reasoning for it. The lock(this) is explained here on SO. I'm interested in lock(typeof(MyType)) case. What is dangerous about it?
Thank you.
It's dangerous because anything can take that lock so it's difficult or impossible to prevent a deadlock situation.
There used to be an article on this ("Don't Lock Type Objects!", a Dr. GUI article) with some comments by Rico Mariani. Apparently the article is no longer directly available, but there are 'mirrors' floating around, including at http://bytes.com/topic/c-sharp/answers/249277-dont-lock-type-objects.
Here's an excerpt:
The basic problem here is that you don't own the type object, and you don't know who else could access it. In general, it's a very bad idea to rely on locking an object you didn't create and don't know who else might be accessing. Doing so invites deadlock. The safest way is to only lock private objects.
But wait; it's even worse than all that. As it turns out, type objects are sometimes shared across application domains (but not across processes) in current versions of the .NET runtime. (This is generally okay since they're immutable.) That means that it's possible for ANOTHER APPLICATION, running even in a different application domain (but in the same process), to deadlock your application by getting a lock on a type object you want to lock and never releasing it. And it would be easy to get access to that type object, because the object has a name: the fully qualified name of the type! Remember that lock/SyncLock blocks (that's a polite word for hangs) until it can obtain a lock. It's obviously really quite bad to rely on a lock that another program or component can lock and cause you to deadlock.
It's the same problem as with lock(this) - you're locking on a reference which other code has access to, so it could be locking on it too.
If you have two unrelated pieces of code locking on the same reference without intending to exclude each other, then in the best case you could lose a bit of performance due to a lack of concurrency - and in the worst case you could introduce a deadlock.
Because the result of typeof(MyType) (which is an object of type Type) is widely accessible, another thread can lock on the same object and hold that lock indefinitely. The internal logic of MyType has then effectively given away significant control over its own synchronization. This might not be an actual problem if it is intended, but coding defensively/skeptically should be your modus operandi.
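The same advice carries over to Java, where synchronized (MyType.class) locks the process-wide Class object just as lock(typeof(MyType)) locks the Type object in C#. The usual fix, sketched here in Java (the class name is mine), is a private lock object:

```java
class SafeLocking {
    // A private lock object: no foreign code can synchronize on it, unlike
    // SafeLocking.class (the Java analogue of typeof(SafeLocking) in C#).
    private final Object lock = new Object();
    private int counter;

    void increment() {
        synchronized (lock) {   // instead of synchronized (SafeLocking.class)
            counter++;
        }
    }

    int get() {
        synchronized (lock) {
            return counter;
        }
    }
}
```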
This wouldn't be "an issue" if this modified, parallel form of the advice were followed:
In general, avoid locking on a public type, or instances that you did not create or define. The common constructs lock (this) and lock (typeof (MyType)) violate this guideline if you did not create the instance or declare the type.
However, as the above 'cannot be guaranteed' for public types or accessible instances across all encountered code, MSDN and other sources argue that these should be avoided as defensive programming against a singular, potentially hard-to-detect runtime (deadlock) issue. This is good advice, given that most coders are not very good or diligent about rules.
And someone who encountered such a bug in the wild would be much more adamant about not allowing this specific issue to occur again, by imposing the guidelines stated. (Java 1.0/1.1 with the threaded AWT UI model was especially problematic.)
The case of lock ("mylock") is singularly special, in that it should be avoided due to string interning, as one generally cannot "know" whether they are violating the advice above.
because the target of a lock serves ONLY as a place to store the lock state (am I locked or not?) for other threads to look at.
The common misconception that the target of a lock is actually somehow being locked is just wrong. What is "locked" is nothing, unless, in methods which can access some shared memory in an unsafe manner, you write the code to look at this lock and not proceed until it has been released. Using a Type object as a lock target is wrong because code anywhere in the entire process space can access that Type object and contend for the sync block that the lock state is stored in. Creating a locally scoped object lets you better ensure that only those threads and methods that can access or mess with your "at risk" shared memory can also access and/or modify the lock.
This is also stated in the documentation, under the topic "Managed Threading Best Practices":
https://msdn.microsoft.com/en-us/library/1c9txz50(v=vs.110).aspx
It says:
Don't use types as lock objects. That is, avoid code such as lock(typeof(X)) in C# or SyncLock(GetType(X)) in Visual Basic, or the use of Monitor.Enter with Type objects. For a given type, there is only one instance of System.Type per application domain. If the type you take a lock on is public, code other than your own can take locks on it, leading to deadlocks. For additional issues, see Reliability Best Practices.
Use caution when locking on instances, for example lock(this) in C# or SyncLock(Me) in Visual Basic. If other code in your application, external to the type, takes a lock on the object, deadlocks could occur.