Maybe the question sounds silly, but I don't understand something about threads and locking and I would like to get a confirmation (here's why I ask).
So, if I have 10 servers and 10 requests come in to each server at the same time, that's 100 requests across the farm. Without locking, that's 100 requests to the database.
If I do something like this:
private static readonly object myLockHolder = new object();

if (Cache[key] == null)
{
    lock (myLockHolder)
    {
        if (Cache[key] == null)
        {
            Cache[key] = LengthyDatabaseCall();
        }
    }
}
How many database requests will I make? 10? 100? Or as many as I have threads?
You have a hierarchy of objects:
You have servers (10)
On each server you have processes (probably only 1 - your service/app pool)
In each process you have threads (probably many)
Your code will only prevent threads within the same process on the same server from modifying the Cache object simultaneously. You can create locks across processes and even across servers, but the cost increases significantly as you move up the hierarchy.
Using the lock statement does not actually lock any threads. However, if one thread is executing the code inside the lock (that is, the block of code following the lock statement), any other thread that wants to take the lock and execute the same code has to wait until the first thread holding the lock leaves the block and releases the lock.
The C# lock statement uses a Windows critical section, which is a lightweight locking mechanism. If you want to lock across processes you can use a mutex instead. To lock across servers you can use a database or a shared file.
As dkackman has pointed out .NET has the concept of an AppDomain that is a kind of lightweight process. You can have multiple AppDomains per process. The C# lock statement only locks a resource within a single AppDomain, and a proper description of the hierarchy would include the AppDomain below the process and above the threads. However, quite often you only have a single AppDomain in a process making the distinction somewhat irrelevant.
The C# lock statement locks on a particular instance of an object (the object you created with new object()). Objects are (in most cases) not shared across AppDomains, so if you have 10 servers, 10 threads can concurrently access your database with that piece of code.
lock does not block threads by itself.
It locks on an instance of an object, and any thread that tries to acquire the lock on that same instance is blocked until the lock is released.
So in your case, only the threads that try to lock on myLockHolder will wait, not all threads.
In other words, the lock statement is syntactic sugar for using a critical section.
As you can see on MSDN:
lock(expression) statement_block

where:

expression
Specifies the object that you want to lock on. expression must be a reference type. Typically, expression will either be this, if you want to protect an instance variable, or typeof(class), if you want to protect a static variable (or if the critical section occurs in a static method in the given class).

statement_block
The statements of the critical section.
lock will block all threads in that application from entering the protected block while another thread holds the lock on the myLockHolder object.
So if you have 10 instances of the application running, you'll get 10 requests to the server while the object is locked in each. The moment a thread exits the lock statement, the next request in that application is processed, but as long as Cache[key] is not null, it won't access the database.
The number of actual requests you get depends on what happens here:
if (Cache[key] == null)
{
    Cache[key] = LengthyDatabaseCall();
}
If LengthyDatabaseCall() fails, the next request will try to access the database server and retrieve the information as well, so really your best-case scenario is that there will only be 10 requests to the server.
Only the threads that need access to your shared variable at the moment another thread is using it will go into a wait state.
How many that is at any given time is hard to determine.
Your DB will get 10 requests, with odds being good that requests 2-10 run much faster than request 1.
I am using TvdbLib in a program. This library can use a cache for loading TV series quicker. To further improve the speed of the program, I do all my loading of TV series on separate threads. When two threads run simultaneously and try to read/write the cache at the same time, I get the following error:
The process cannot access the file
'C:\BinaryCache\79349\series_79349.ser' because it is being used by
another process.
Does anyone know how to avoid this and still have the program running smoothly?
CacheProvider is not built for multi-threaded scenarios... either use it from one thread only, lock on every access via a shared object, or supply every thread with its own CacheProvider and its own distinct _root directory (in the constructor).
You can use the lock statement to ensure only one thread is accessing the cache at the same time:
http://msdn.microsoft.com/en-us/library/c5kehkcz(v=vs.71).aspx
From the error I assume that TvdbLib does not support multiple concurrent threads accessing the same cache. As it is an open source project, you could get the source code and implement your own protection around the cache access, e.g., using the lock statement. Of course, you could lock within your own code before it calls TvdbLib, but because this sits at a higher level, the lock will be held for longer and you may not get the fine-grained concurrency that you want.
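For example, a minimal sketch of the higher-level locking approach (the method names here are placeholders, not TvdbLib's actual API):

private static readonly object cacheLock = new object();

// every thread that touches the shared cache takes the same lock,
// so only one thread reads or writes the cache files at a time
TvdbSeries LoadSeries(int seriesId)
{
    lock (cacheLock)
    {
        return LoadSeriesUsingCache(seriesId);   // hypothetical TvdbLib call
    }
}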
I have read about lock, but I understood nothing at all.
My question is: why do we use an otherwise unused object and lock on it, and how does this make something thread-safe or help with multi-threading? Isn't there another way to make code thread-safe?
public class test
{
    private object Lock { get; set; }
    ...
    lock (this.Lock) { ... }
    ...
}
Sorry if my question is very stupid, but I don't understand, although I've used it many times.
Accessing a piece of data from one thread while other thread is modifying it is called "data race condition" (or just "data race") and can lead to corruption of data. (*)
Locks are simply a mechanism for avoiding data races. If two (or more) concurrent threads lock the same lock object, then they are no longer concurrent and can no longer cause data races, for the duration of the lock. Essentially, we are serializing the access to shared data.
The trick is to keep your locks as "wide" as you must to avoid data races, yet as "narrow" as you can to gain performance through concurrent execution. This is a fine balance that can easily go out of whack in either direction, which is why multi-threaded programming is hard.
Some guidelines:
As long as all threads are just reading the data and none will ever modify it, a lock is unnecessary.
Conversely, if at least one thread might at some point modify the data, then all concurrent code paths accessing that same data must be properly serialized through locks, even those that only read the data.
Using a lock in one code path but not the other will leave the data wide open to race conditions.
Also, using one lock object in one code path, but a different lock object in another (concurrent) code path does not serialize these code paths and leaves you wide open to data races.
On the other hand, if two concurrent code paths access different data, they can use different lock objects. But, whenever there is more than one lock object, watch out for deadlocks. A deadlock is often also a "code race condition" (and a heisenbug, see below).
The lock object does not need to be (and usually isn't) the same thing as the data you are trying to protect. Unfortunately, there is no language facility that lets you "declare" which data is protected by which lock object, so you'll have to very carefully document your "locking convention" both for other people that might maintain your code, and for yourself (since even after a short time you will forget some of the nooks and crannies of your locking convention).
It's usually a good idea to protect the lock object from the outside world as much as you can. After all, you are using it for the very sensitive task of locking and you don't want it locked by external actors in unforeseen ways. That's why using this or a public field as a lock object is usually a bad idea.
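A minimal sketch of these guidelines in practice (the class is made up for illustration):

public class Counter
{
    // dedicated, private, readonly lock object: no outside code can
    // lock on it, and it can never be reassigned
    private readonly object _sync = new object();
    private int _value;

    public void Increment()
    {
        lock (_sync) { _value++; }
    }

    public int Read()
    {
        // readers take the same lock, per the guidelines above
        lock (_sync) { return _value; }
    }
}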
The lock keyword is simply a more convenient syntax for Monitor.Enter and Monitor.Exit.
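That is, lock (obj) { ... } compiles to roughly the following (simplified; newer compilers use the Monitor.Enter(obj, ref lockTaken) overload):

Monitor.Enter(obj);
try
{
    // ... the statements of the lock block ...
}
finally
{
    Monitor.Exit(obj);   // released even if the block throws
}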
The lock object can be any object in .NET, but value objects will be boxed in the call to Monitor.Enter, which means threads will not share the same lock object, leaving the data unprotected. Therefore, only use reference types as lock objects.
For inter-process communication you can use a global mutex, which can be created by passing a non-empty name to Mutex Constructor. Global mutexes provide essentially the same functionality as regular "local" locking, except they can be shared between separate processes.
There are synchronization mechanisms other than locks, such as semaphores, condition variables, message queues or atomic operations. Be careful when mixing different synchronization mechanisms.
Locks also act as memory barriers, which is increasingly important on modern multi-core, multi-cache CPUs. This is part of the reason why you need locks when reading the data, not just when writing.
(*) It is called "race" because concurrent threads are "racing" towards performing an operation on the shared data and whoever wins that race determines the outcome of the operation. So the outcome depends on timing of the execution, which is essentially random on modern preemptive multitasking OSes. Worse yet, timing is easily modified by a simple act of observing the program execution through tools such as debugger, which makes them "heisenbugs" (i.e. the phenomenon being observed is changed by the mere act of observation).
A lock object is like a door into a single room where only one guest can enter at a time.
The room can be your data, the guest can be your function.
define the data (the room)
add a door (the lock object)
invite guests (the functions)
use the lock instruction to close/open the door, allowing only one guest into the room at a time.
Why do we need this? If you write data to a file simultaneously from several threads (just an example; there can be thousands of others), you need to synchronize your functions' access to the file (close/open the door for the guests) so that each function appends to the end of the file (assuming that is the requirement of this example).
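A minimal sketch of the file example, assuming several threads appending lines to one shared file:

private static readonly object door = new object();   // the "door"

static void AppendLine(string path, string line)
{
    lock (door)   // only one "guest" enters at a time
    {
        System.IO.File.AppendAllText(path, line + System.Environment.NewLine);
    }
}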
This is naturally not the only way to synchronize threads; there are more out there:
Monitors
Wait handles
...
Check out the link for complete information and description of each of them
Thread Synchronization
Yes, there is indeed another way:
using System.Runtime.CompilerServices;

class Test
{
    private object Lock { get; set; }

    [MethodImpl(MethodImplOptions.Synchronized)]
    public void Foo()
    {
        // Now this instance is locked
    }
}
While it looks more "natural", it's not used often, because the method locks on the object itself this way, so no other code can safely lock on this object: it could cause a deadlock.
Because of this, you usually create a (lazy-initialized) private field referring to an object, and use that object as a lock instead. This will guarantee that no one else can lock against the same object as you.
A little more detail on what's happening beneath the hood:
When you "lock on an object", you're not locking on the object itself. Rather, you're using the object as a guaranteed-to-be-unique-address-in-memory throughout your program. When you "lock", the runtime takes the object's address, uses it to look up the actual lock inside another table (which is hidden from you), and uses that object as the ""lock" (also known as a "critical section").
So really, for you, an object is just a proxy/symbol -- it isn't doing anything by itself; it's just acting as a unique indicator that will never clash with another valid object in the same program.
When you have different threads accessing the same variable/resource at the same time, they may overwrite each other's changes to this variable/resource and you can get unexpected results. A lock will make sure only one thread can access the variable at a time; the remaining threads will queue for access to this variable/resource until the lock is released.
Suppose we have a balance variable of an account.
Two different threads read its value, which was 100.
The first thread adds 50 to it (100 + 50) and saves it, so the balance is 150.
The second thread had already read 100 and meanwhile subtracts 50 (100 - 50). But the point to note here is that the first thread has already made the balance 150, so the second thread should have computed 150 - 50. This can cause serious problems.
So a lock makes sure that when one thread wants to change a resource's state, it locks the resource and releases it only after committing the change.
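A minimal sketch of the balance example (a hypothetical Account class):

public class Account
{
    private readonly object _sync = new object();
    private decimal _balance = 100;

    public void Deposit(decimal amount)
    {
        // without the lock, two threads could both read 100 and one
        // update would be lost; the lock makes read-modify-write atomic
        lock (_sync) { _balance += amount; }
    }

    public void Withdraw(decimal amount)
    {
        lock (_sync) { _balance -= amount; }
    }
}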
The lock statement introduces the concept of mutual exclusion. Only one thread can acquire a lock on a given object at any one time. This prevents threads from accessing shared data structures concurrently, thus corrupting them.
If another thread already holds the lock, the lock statement will block until it is able to acquire an exclusive lock on its argument, and only then allow its block to execute.
Note that the only thing lock does is control entry to the block of code. Access to members of the class is completely unrelated to the lock. It is up to the class itself to ensure that accesses that must be synchronized are coordinated by the use of lock or other synchronization primitives. Also note that access to some or all members may not have to be synchronized. For instance, if you want to maintain a counter, you could use the Interlocked class without locking.
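For instance, a shared counter can be maintained with Interlocked instead of a lock (sketch):

private static int _requestCount;

void OnRequest()
{
    // atomically increments the counter and returns the new value;
    // no lock statement needed
    int count = System.Threading.Interlocked.Increment(ref _requestCount);
}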
An alternative to locking is lock-free data structures, which behave correctly in the presence of multiple threads. Operations on lock-free data structures must be designed very carefully, usually with the assistance of lock-free primitives such as compare-and-swap (CAS).
The general theme of such techniques is to try to perform operations on data structures atomically and detect when operations fail due to concurrent actions by other threads, followed by retries. This works well on a lightly loaded system where failures are unlikely, but can produce runaway behaviour as the failure rate climbs and retries become a dominant load. This problem can be ameliorated by backing off the retry rate, effectively throttling the load.
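A typical compare-and-swap retry loop in .NET uses Interlocked.CompareExchange; here is a sketch that atomically doubles a shared integer:

static int _shared;

static void DoubleShared()
{
    int oldValue, newValue;
    do
    {
        oldValue = _shared;        // snapshot the current value
        newValue = oldValue * 2;   // compute the desired result
        // CompareExchange stores newValue only if _shared still equals
        // oldValue; otherwise another thread won the race and we retry
    }
    while (System.Threading.Interlocked.CompareExchange(ref _shared, newValue, oldValue) != oldValue);
}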
A more sophisticated alternative is software transactional memory. Unlike CAS, STM generalizes the concept of fail-and-retry to arbitrarily complex memory operations. In simple terms, you start a transaction, perform all your operations, and finally commit. The system detects if the operations cannot succeed due to conflicting operations performed by other threads that beat the current thread to the punch. In such cases, STM can either fail outright, requiring the application to take corrective action, or, in more sophisticated implementations, it can automatically go back to the start of the transaction and try again.
Your confusion is pretty typical for those just getting familiar with the lock keyword in C#. You are right, the object used in the lock statement is really nothing more than a token that defines a critical section. That object, in no way, has any protection from multithreaded access itself.
The way this works is that the CLR reserves a 4-byte section (on 32-bit systems) in the object header called the sync block index. The sync block index is nothing more than an index into an array that stores the actual critical section information. When you use the lock keyword, the CLR modifies this sync block value accordingly.
There are advantages and disadvantages to this scheme. The advantage is that it made for a fairly elegant solution to defining critical sections. One obvious disadvantage is that each object instance contains the sync block and most instances never use it so it would seem to be a waste of space in most cases. Another disadvantage is that boxed value types can be used which is almost always wrong and certainly leads to confusion.
I remember way back when .NET was first released that there was a lot of chatter over whether the lock keyword was good or bad for the language. The general consensus (at least as I remember it) was that it was bad, because the using keyword could easily have been used instead. In fact, a solution that used the using keyword would actually have made more sense, because it could have been done without the need for the sync block. The C# design team even went on record to say that had they been given a second chance, the lock keyword never would have made it into the language. [1]
[1] The only reference I could find for this is on Jon Skeet's website here.
I am making use of the C# code located at the following links to implement a Ram-disk project.
Link to description of source code
Link to source code
As a summary, the code indicated above makes use of a simple tree structure to store the directories, sub-directories and files. At the root is a MemoryFolder object which stores zero or more MemoryFolder objects and/or MemoryFile objects. Each MemoryFolder object in turn stores zero or more MemoryFolder objects and/or MemoryFile objects, and so forth to an unlimited depth.
However, the code is not thread safe. What is the most elegant way of implementing thread safety? In addition, how should the following non-exhaustive list of multithreading requirements for a typical file system be enforced by using the appropriate locking strategy?
The creation of two different folders (each by a different thread) under the same parent folder can occur concurrently if the thread-safe implementation allows it. Otherwise, some locking strategy should be implemented to allow only sequential creation.
None of the direct or indirect parent folders of the folder containing a specific file (that is currently being read by another thread), propagating all the way up to the root folder, can be moved or deleted by another thread until the ReadFile thread completes its execution.
With regard to each unique file, concurrent access is allowed for multiple ReadFile threads, but access is restricted to a single WriteFile thread.
If two separate ReadFile threads (fired almost simultaneously), each from a different application, attempt to create a folder with the same name (assuming that the folder does not already exist before both threads are fired), the first thread that enters the Ram-disk always succeeds while the second one always fails. In other words, the order of thread execution is deterministic.
The total disk space calculation method GetDiskFreeSpace, running under a separate thread, should not complete its execution until all WriteFile threads already in progress complete their execution. All subsequent WriteFile threads that have not begun executing are blocked until the GetDiskFreeSpace thread completes its execution.
The easiest way to do this would be to protect the entire tree with a ReaderWriterLockSlim. That allows concurrent access by multiple readers or exclusive access by a single writer. Any method that will modify the structure in any way will have to acquire the write lock, and no other threads will be allowed to read or write to the structure until that thread releases the write lock.
Any thread that wants to read the structure has to acquire the read lock. Multiple readers can acquire the read lock concurrently, but a thread that wants to acquire the write lock has to wait until all existing read locks are released.
There might be a way to make that data structure lock-free. Doing so, however, could be quite difficult. The reader/writer lock will give you the functionality you want, and I suspect it would be fast enough.
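A sketch of how the tree access might be wrapped (LookupFile and AddFolderNode are hypothetical stand-ins for the tree operations):

private static readonly System.Threading.ReaderWriterLockSlim treeLock =
    new System.Threading.ReaderWriterLockSlim();

MemoryFile FindFile(string path)
{
    treeLock.EnterReadLock();        // many readers may hold this at once
    try
    {
        return LookupFile(path);     // read-only traversal of the tree
    }
    finally
    {
        treeLock.ExitReadLock();
    }
}

void CreateFolder(string path)
{
    treeLock.EnterWriteLock();       // exclusive: waits out all readers
    try
    {
        AddFolderNode(path);         // modifies the tree structure
    }
    finally
    {
        treeLock.ExitWriteLock();
    }
}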
If you want to share this across processes, that's another story. The ReaderWriterLockSlim doesn't work across processes. You could, however, implement something similar using a combination of the synchronization primitives, or create a device driver (or service) that serves the requests, thereby keeping it all in the same process.
The environment:
3 web services: 2 in the same application pool, 1 in a different application pool.
They all have the same code trying to access something that is not thread safe, say a file that they write to.
I try and lock this code the same way for each web service. I'm not sure if the lock keyword is doing what I want.
One lock I try is this in each web service:
string stringValue;
lock (stringValue)
The other lock I try is:
lock (typeof(MyWebServiceClass))
Will these locks prevent any simultaneous writes to the file while it is in use? In this case there are multiple clients hitting each of these web services.
You need a named Mutex to lock across application pools / processes:
The C# lock keyword is syntactic sugar for Monitor.Enter() and Monitor.Exit() method calls in a try/finally block. Monitor is a lightweight (fully managed) synchronization primitive for in-process locking.
A Mutex on the other hand can be either local or global (across processes on the same machine) - global mutexes, also called named mutexes, are visible throughout the operating system, and can be used to synchronize threads in multiple application domains or processes. Also see MSDN.
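A minimal sketch of serializing the file writes across processes with a named mutex (the mutex name and file path are placeholders):

using (var mutex = new System.Threading.Mutex(false, @"Global\MyApp.SharedFileMutex"))
{
    mutex.WaitOne();      // blocks until no other thread or process holds it
    try
    {
        System.IO.File.AppendAllText(@"C:\shared\output.log", "entry\n");
    }
    finally
    {
        mutex.ReleaseMutex();
    }
}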
I think you need to use a Mutex to lock between AppDomains.
Also, for what it's worth, avoid locking on a type. That can often result in deadlocks if code elsewhere tries to take the same lock after the first lock has been obtained. It's best to lock on an object whose only purpose is to act as a lock.
For example:
private static readonly object padlock = new object();

lock (padlock)
{
    // work here
}
I am writing a server application which processes request from multiple clients. For the processing of requests I am using the threadpool.
Some of these requests modify a database record, and I want to restrict the access to that specific record to one threadpool thread at a time. For this I am using named semaphores (other processes are also accessing these records).
For each new request that wants to modify a record, the thread should wait in line for its turn.
And this is where the question comes in:
As I don't want the threadpool to fill up with threads waiting for access to a record, I found the RegisterWaitForSingleObject method in the threadpool.
But when I read the documentation (MSDN) under the section Remarks:
New wait threads are created automatically when required. ...
Does this mean that the threadpool will fill up with wait-threads? And how does this affect the performance of the threadpool?
Any other suggestions to boost performance are more than welcome!
Thanks!
Your solution is a viable option. In the absence of more specific details I do not think I can offer other tangible options. However, let me try to illustrate why I think your current solution is, at the very least, based on sound theory.
Let's say you have 64 requests that come in simultaneously. It is reasonable to assume that the thread pool could dispatch each one of those requests to a thread immediately. So you might have 64 threads that immediately begin processing. Now let's assume that the mutex has already been acquired by another thread and it is held for a really long time. That means those 64 threads will be blocked for a long time waiting for the thread that currently owns the mutex to release it. That means those 64 threads are wasted doing nothing.
On the other hand, if you choose to use RegisterWaitForSingleObject as opposed to using a blocking call to wait for the mutex to be released, then you can immediately release those 64 waiting threads (work items) and allow them to be put back into the pool. If I were to implement my own version of RegisterWaitForSingleObject, I would use the WaitHandle.WaitAny method, which allows me to specify up to 64 handles (I did not choose 64 for the number of requests randomly, after all) in a single blocking method call. I am not saying it would be easy, but I could replace my 64 waiting threads with only a single thread from the pool. I do not know how Microsoft implemented the RegisterWaitForSingleObject method, but I am guessing they did it in a manner that is at least as efficient as my strategy. To put this another way, you should be able to reduce the number of pending work items in the thread pool by at least a factor of 64 by using RegisterWaitForSingleObject.
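A sketch of the registration pattern being described (the semaphore name, the request state object, and the ModifyRecord callback are placeholders):

// instead of blocking a pool thread on WaitOne, register a callback that
// the thread pool runs once the named semaphore is signaled
var recordSemaphore = System.Threading.Semaphore.OpenExisting("MyRecordSemaphore");

System.Threading.ThreadPool.RegisterWaitForSingleObject(
    recordSemaphore,                     // the wait handle to watch
    (state, timedOut) =>
    {
        if (!timedOut)
        {
            try { ModifyRecord(state); }            // do the update
            finally { recordSemaphore.Release(); }  // let the next one in
        }
    },
    request,                             // passed to the callback as 'state'
    System.Threading.Timeout.Infinite,   // wait indefinitely
    true);                               // run the callback only once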
So you see, your solution is based on sound theory. I am not saying that your solution is optimal, but I do believe your concern is unwarranted in regards to the specific question asked.
IMHO you should let the database do its own synchronization. All you need to do is ensure that you're synchronized within your process.
The Interlocked class might be a premature optimization that is too complex to implement correctly. I would recommend using higher-level sync objects, such as ReaderWriterLockSlim. Or better yet, a Monitor.
An approach to this problem that I've used before is to have the first thread that gets one of these work items be responsible for any others that arrive while it's processing. This is done by queueing the work items, then dropping into a critical section to process the queue. Only the 'first' thread will drop into the critical section. If a thread can't get the critical section, it'll leave and let the thread already operating in the critical section handle the queued object.
It's really not very complicated - the only thing that might not be obvious is that when leaving the critical section, the processing thread has to do it in a way that doesn't potentially leave a late-arriving workitem on the queue. Basically, the 'processing' critical section lock has to be released while holding the queue lock. If not for this one requirement, a synchronized queue would be sufficient, and the code would really be simple!
Pseudo code:
// `workitem` is an object that contains the database modification request
//
// `queue` is a Queue<T> that can hold these workitem requests
//
// `processing_lock` is an object used to provide a lock
// that indicates a thread is processing the queue

// any number of threads can call this function, but only one
// will end up processing all the workitems.
//
// The other threads will simply drop the workitem in the queue
// and leave
void threadpoolHandleDatabaseUpdateRequest(workitem)
{
    // put the workitem on the queue
    Monitor.Enter(queue.SyncRoot);
    queue.Enqueue(workitem);
    Monitor.Exit(queue.SyncRoot);

    // try to become the processing thread; TryEnter returns false
    // if another thread already holds the processing lock
    bool doProcessing = Monitor.TryEnter(processing_lock);
    if (!doProcessing)
    {
        // another thread has the processing lock, it'll
        // handle the workitem
        return;
    }

    for (;;)
    {
        Monitor.Enter(queue.SyncRoot);
        if (queue.Count == 0)
        {
            // done processing the queue;
            // release the locks in an order that ensures
            // a workitem won't get stranded on the queue
            Monitor.Exit(processing_lock);
            Monitor.Exit(queue.SyncRoot);
            break;
        }
        workitem = queue.Dequeue();
        Monitor.Exit(queue.SyncRoot);

        // this will get the database mutex, do the update and release
        // the database mutex
        doDatabaseModification(workitem);
    }
}
The ThreadPool creates one wait thread per ~64 waitable objects.
Good comments are here: Thread.sleep vs Monitor.Wait vs RegisteredWaitHandle?