I am using TvdbLib in a program. This library can use a cache for loading TV series quicker. To further improve the speed of the program, I do all my loading of TV series on separate threads. When two threads run simultaneously and try to read/write from the cache simultaneously, I will get the following error:
The process cannot access the file
'C:\BinaryCache\79349\series_79349.ser' because it is being used by
another process.
Does anyone know how to avoid this and still have the program running smoothly?
CacheProvider is not built for being used in multi-threaded scenarios... either use it in one thread only or lock on every access via a shared object or supply every thread with its own CacheProvider and its own distinct _root directory (in the constructor).
You can use the lock statement to ensure only one thread is accessing the cache at the same time:
http://msdn.microsoft.com/en-us/library/c5kehkcz(v=vs.71).aspx
From the error I assume that TvdbLib does not support multiple concurrent threads accessing the same cache. As it is an open source project, you could get the source code and implement your own protection around the cache access, e.g., using the lock statement. Of course, you could lock within your own code before it calls TvdbLib, but because this will be a higher level, the lock will be maintained for longer and you may not get the fine-grained concurrency that you want.
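A minimal sketch of the coarse-grained option, locking in your own code before it calls into the library. `SeriesLoader` and the `loadFromCache` delegate are hypothetical stand-ins for whatever call reads or writes the cache; they are not part of TvdbLib.

```csharp
using System;

// Hypothetical wrapper: serializes every cache-backed load through one shared lock.
public sealed class SeriesLoader
{
    // Single lock object shared by all threads that use this loader.
    private readonly object _cacheLock = new object();

    // Assumption: stands in for the real call that touches the cache files.
    private readonly Func<int, string> _loadFromCache;

    public SeriesLoader(Func<int, string> loadFromCache)
    {
        _loadFromCache = loadFromCache;
    }

    public string Load(int seriesId)
    {
        // Only one thread at a time can be inside the cache-touching call.
        lock (_cacheLock)
        {
            return _loadFromCache(seriesId);
        }
    }
}
```

All threads must share the same `SeriesLoader` instance for this to help. Because the lock is held for the whole load, loads are fully serialized, which is the loss of fine-grained concurrency mentioned above.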
Related
I have a .NET Windows service which gets a list of image files from a folder, does some conversion, and sends the converted files to another directory. I want to achieve more throughput by adding another instance of the service watching the same folder. I want the two instances to process files independently, without any duplicate processing.
What patterns can be used?
Would file locking work for this?
I don't want to use a database or any other messaging platform.
I can use text files etc. to create synchronization if needed.
If using .NET, I would consider creating multiple threads (using the TPL) to process the files in parallel. This way a single process has control over the entire job, so there is no need to track which process (exe) is processing a file: no databases, no locking, etc.
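A rough sketch of the single-process approach, assuming the conversion step can be expressed as a delegate. `ParallelFileProcessor` and `convert` are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ParallelFileProcessor
{
    // Runs 'convert' once per path, using several worker threads inside one process.
    // 'convert' is a hypothetical placeholder for the image conversion step.
    public static void ProcessAll(IEnumerable<string> paths, Action<string> convert, int maxThreads)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxThreads };

        // Parallel.ForEach hands each path to exactly one worker,
        // so no file is processed twice within this process.
        Parallel.ForEach(paths, options, convert);
    }
}
```

A call might look like `ParallelFileProcessor.ProcessAll(Directory.EnumerateFiles(inputFolder), ConvertImage, 4)`, where `inputFolder` and `ConvertImage` are your own folder path and conversion method.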
However if you wish to have multiple processes processing the files, then one option of synchronizing the processing is to make use of a Mutex.
I would use this option along with Solution 1.
I.e. use TPL (multiple threads) in one service. And also use Mutexes. This way you have the benefit of multiple threads and multiple services. Hopefully this is what you are after.
https://msdn.microsoft.com/en-us/library/bwe34f1k(v=vs.110).aspx
Before processing any file, create a Mutex with a particular name and if ownership has been granted, then continue processing the file. If ownership hasn't been granted you can safely assume that another process or another thread (within the same application) has acquired a lock on this Mutex, meaning another process/thread is already processing the file.
Sample code:
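using System.Threading;
bool mutexWasCreated;
// Request initial ownership; mutexWasCreated is true only if this call created the mutex.
var fileMutex = new Mutex(true, "File Name", out mutexWasCreated);
if (mutexWasCreated){
//This thread created and owns the mutex, so start processing the file
//Call fileMutex.ReleaseMutex() and fileMutex.Dispose() when done
}
else {
//Some other process/thread is processing this file, so nothing to do
fileMutex.Dispose();
}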
If one service (exe) goes down, then the threads would die, meaning the mutexes would be released and those files will be available for processing by another process.
I have read about lock, but I did not understand it at all.
My question is: why do we lock on an otherwise unused object, and how does this make something thread-safe or help in multi-threading? Isn't there another way to write thread-safe code?
public class test {
private object Lock { get; set; }
...
lock (this.Lock) { ... }
...
}
Sorry if my question is very stupid, but I don't understand it, although I've used it many times.
Accessing a piece of data from one thread while other thread is modifying it is called "data race condition" (or just "data race") and can lead to corruption of data. (*)
Locks are simply a mechanism for avoiding data races. If two (or more) concurrent threads lock the same lock object, then they are no longer concurrent and can no longer cause data races, for the duration of the lock. Essentially, we are serializing the access to shared data.
The trick is to keep your locks as "wide" as you must to avoid data races, yet as "narrow" as you can to gain performance through concurrent execution. This is a fine balance that can easily go out of whack in either direction, which is why multi-threaded programming is hard.
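To illustrate serializing access to shared data, here is a small sketch in which many threads increment one counter. `LockedCounterDemo` is a made-up name. The lock covers only the shared update, keeping it as "narrow" as possible.

```csharp
using System.Threading.Tasks;

public static class LockedCounterDemo
{
    private static readonly object Sync = new object();
    private static int _count;

    // Increments a shared counter from many threads.
    // The lock serializes each read-modify-write, so no increment is lost.
    public static int Run(int iterations)
    {
        _count = 0;
        Parallel.For(0, iterations, _ =>
        {
            lock (Sync)
            {
                _count++; // only the shared update sits inside the lock
            }
        });
        return _count;
    }
}
```

Without the lock, `_count++` from several threads can interleave and lose updates, so the result may come out lower than `iterations`.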
Some guidelines:
As long as all threads are just reading the data and none will ever modify it, a lock is unnecessary.
Conversely, if at least one thread might at some point modify the data, then all concurrent code paths accessing that same data must be properly serialized through locks, even those that only read the data.
Using a lock in one code path but not the other will leave the data wide open to race conditions.
Also, using one lock object in one code path, but a different lock object in another (concurrent) code path does not serialize these code paths and leaves you wide open to data races.
On the other hand, if two concurrent code paths access different data, they can use different lock objects. But, whenever there is more than one lock object, watch out for deadlocks. A deadlock is often also a "code race condition" (and a heisenbug, see below).
The lock object does not need to be (and usually isn't) the same thing as the data you are trying to protect. Unfortunately, there is no language facility that lets you "declare" which data is protected by which lock object, so you'll have to very carefully document your "locking convention" both for other people that might maintain your code, and for yourself (since even after a short time you will forget some of the nooks and crannies of your locking convention).
It's usually a good idea to protect the lock object from the outside world as much as you can. After all, you are using it for the very sensitive task of locking and you don't want it locked by external actors in unforeseen ways. That's why using this or a public field as a lock object is usually a bad idea.
The lock keyword is simply a more convenient syntax for Monitor.Enter and Monitor.Exit.
The lock object can be any object in .NET, but value objects will be boxed in the call to Monitor.Enter, which means threads will not share the same lock object, leaving the data unprotected. Therefore, only use reference types as lock objects.
For inter-process synchronization you can use a global mutex, which can be created by passing a non-empty name to the Mutex constructor. Global mutexes provide essentially the same functionality as regular "local" locking, except they can be shared between separate processes.
There are synchronization mechanisms other than locks, such as semaphores, condition variables, message queues or atomic operations. Be careful when mixing different synchronization mechanisms.
Locks also behave as memory barriers, which is increasingly important on modern multi-core, multi-cache CPUs. This is part of the reason why you need locks on reading the data and not just writing.
(*) It is called "race" because concurrent threads are "racing" towards performing an operation on the shared data and whoever wins that race determines the outcome of the operation. So the outcome depends on timing of the execution, which is essentially random on modern preemptive multitasking OSes. Worse yet, timing is easily modified by a simple act of observing the program execution through tools such as debugger, which makes them "heisenbugs" (i.e. the phenomenon being observed is changed by the mere act of observation).
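The guidelines above about a private lock object, locking reads as well as writes, and lock being shorthand for Monitor.Enter and Monitor.Exit can be sketched as follows. `Inventory` is a hypothetical class, and the `Read` method shows roughly what the compiler generates for a lock block in C# 4 and later.

```csharp
using System.Threading;

public sealed class Inventory
{
    private readonly object _sync = new object(); // private reference-type lock object
    private int _items;

    // Written with the lock keyword.
    public void Add(int n)
    {
        lock (_sync) { _items += n; }
    }

    // Roughly the expanded form of a lock block.
    public int Read()
    {
        bool lockTaken = false;
        try
        {
            Monitor.Enter(_sync, ref lockTaken);
            return _items; // reads take the same lock as writes
        }
        finally
        {
            if (lockTaken) Monitor.Exit(_sync);
        }
    }
}
```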
A lock object is like a door into a room that only one guest at a time can enter.
The room can be your data, the guest can be your function.
define data (room)
add door (lock object)
invite guests (functions)
using the lock instruction, close/open the door to allow only one guest at a time to enter the room.
Why do we need this? If you simultaneously write data to a file (just an example; there could be thousands of others), you need to synchronize your functions' access to that file (close/open the door for guests), so that each function appends to the end of the file (assuming that is the requirement in this example).
This is naturally not the only way to synchronize threads; there are more out there:
Monitors
Wait handles
...
Check out the link for complete information and a description of each of them:
Thread Synchronization
Yes, there is indeed another way:
using System.Runtime.CompilerServices;
class Test
{
private object Lock { get; set; }
[MethodImpl(MethodImplOptions.Synchronized)]
public void Foo()
{
// Now this instance is locked
}
}
While it looks more "natural", it's not used often, because of the fact that the object is locking on itself this way, so other code could not risk locking on this object -- it could cause a deadlock.
Because of this, you usually create a (lazy-initialized) private field referring to an object, and use that object as a lock instead. This will guarantee that no one else can lock against the same object as you.
A little more detail on what's happening beneath the hood:
When you "lock on an object", you're not locking on the object itself. Rather, you're using the object as a guaranteed-to-be-unique-address-in-memory throughout your program. When you "lock", the runtime takes the object's address, uses it to look up the actual lock inside another table (which is hidden from you), and uses that object as the "lock" (also known as a "critical section").
So really, for you, an object is just a proxy/symbol -- it isn't doing anything by itself; it's just acting as a unique indicator that will never clash with another valid object in the same program.
When different threads access the same variable/resource at the same time, they may overwrite each other's changes and you can get unexpected results. A lock makes sure only one thread can access the variable at a time; the remaining threads queue up for access to this variable/resource until the lock is released.
Suppose we have a balance variable for an account.
Two different threads read its value, which is 100.
Suppose the first thread adds 50 to it (100 + 50) and saves it, so the balance is 150.
The second thread has already read 100. Suppose that in the meantime it subtracts 50 (100 - 50) and saves 50. But the first thread has already made the balance 150, so the second thread should have computed 150 - 50. This can cause serious problems.
So lock makes sure that when one thread wants to change the state of some resource, it locks the resource and releases it only after committing the change.
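The balance scenario above might look like this with a lock in place. `Account` is an illustrative class, not taken from any library.

```csharp
public sealed class Account
{
    private readonly object _sync = new object();
    private decimal _balance;

    public Account(decimal opening) { _balance = opening; }

    public void Deposit(decimal amount)
    {
        // Read, add, and save happen as one uninterrupted step.
        lock (_sync) { _balance += amount; }
    }

    public void Withdraw(decimal amount)
    {
        lock (_sync) { _balance -= amount; }
    }

    public decimal Balance
    {
        get { lock (_sync) { return _balance; } }
    }
}
```

Starting from 100, a concurrent `Deposit(50)` and `Withdraw(50)` now leave the balance at 100 whichever thread runs first, instead of the 150 or 50 that the unsynchronized version can produce.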
The lock statement introduces the concept of mutual exclusion. Only one thread can acquire a lock on a given object at any one time. This prevents threads from accessing shared data structures concurrently, thus corrupting them.
If other threads already hold a lock, the lock statement will block until it is able to acquire an exclusive lock on its argument before allowing its block to execute.
Note that the only thing lock does is control entry to the block of code. Access to members of the class is completely unrelated to the lock. It is up to the class itself to ensure that accesses that must be synchronized are coordinated by the use of lock or other synchronization primitives. Also note that access to some or all members may not have to be synchronized. For instance, if you want to maintain a counter, you could use the Interlocked class without locking.
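For the counter case mentioned above, a sketch using the Interlocked class instead of a lock could look like this. `HitCounter` is a hypothetical name.

```csharp
using System.Threading;

public sealed class HitCounter
{
    private int _hits;

    // Atomic increment; no lock object involved.
    public int Record()
    {
        return Interlocked.Increment(ref _hits);
    }

    // Volatile.Read makes sure the latest written value is observed.
    public int Current
    {
        get { return Volatile.Read(ref _hits); }
    }
}
```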
An alternative to locking is lock-free data structures, which behave correctly in the presence of multiple threads. Operations on lock-free data structures must be designed very carefully, usually with the assistance of lock-free primitives such as compare-and-swap (CAS).
The general theme of such techniques is to try to perform operations on data structures atomically and detect when operations fail due to concurrent actions by other threads, followed by retries. This works well on a lightly loaded system where failures are unlikely, but can produce runaway behaviour as the failure rate climbs and retries become a dominant load. This problem can be ameliorated by backing off the retry rate, effectively throttling the load.
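The fail-and-retry idea can be sketched with Interlocked.CompareExchange, the CAS primitive available in .NET. `MaxTracker` is a made-up example that tracks the largest value reported by any thread; it omits the back-off mentioned above.

```csharp
using System.Threading;

public sealed class MaxTracker
{
    private int _max = int.MinValue;

    public int Max
    {
        get { return Volatile.Read(ref _max); }
    }

    // Records 'candidate' if it exceeds the current maximum, using a CAS retry loop.
    public void Report(int candidate)
    {
        while (true)
        {
            int seen = Volatile.Read(ref _max);
            if (candidate <= seen) return; // nothing to update

            // Swap in the candidate only if _max is still the value we read.
            if (Interlocked.CompareExchange(ref _max, candidate, seen) == seen) return;

            // Another thread changed _max in between: loop and retry.
        }
    }
}
```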
A more sophisticated alternative is software transactional memory. Unlike CAS, STM generalizes the concept of fail-and-retry to arbitrarily complex memory operations. In simple terms, you start a transaction, perform all your operations, and finally commit. The system detects if the operations cannot succeed due to conflicting operations performed by other threads that beat the current thread to the punch. In such cases, STM can either fail outright, requiring the application to take corrective action, or, in more sophisticated implementations, it can automatically go back to the start of the transaction and try again.
Your confusion is pretty typical for those just getting familiar with the lock keyword in C#. You are right, the object used in the lock statement is really nothing more than a token that defines a critical section. That object, in no way, has any protection from multithreaded access itself.
The way this works is that the CLR reserves a 4-byte (on 32-bit systems) section in the object header, located just before the type handle, called the sync block. The sync block is nothing more than an index into an array that stores the actual critical section information. When you use the lock keyword the CLR will modify this sync block value accordingly.
There are advantages and disadvantages to this scheme. The advantage is that it made for a fairly elegant solution to defining critical sections. One obvious disadvantage is that each object instance contains the sync block and most instances never use it so it would seem to be a waste of space in most cases. Another disadvantage is that boxed value types can be used which is almost always wrong and certainly leads to confusion.
I remember way back when .NET was first released that there was a lot of chatter over whether the lock keyword was good or bad for the language. The general consensus (at least as I remember it) was that it was bad because the using keyword could have been easily used instead. In fact, a solution that used the using keyword actually would have made more sense because it could have been done without the need for the sync block. The C# design team even went on record to say that had they been given a second chance the lock keyword never would have made it into the language. (1)
(1) The only reference I could find for this is on Jon Skeet's website.
I have a multithreaded program in which all threads need to save results to the same text file.
I get an access violation error.
How can I avoid that?
Wrap file IO into a lock statement:
private static readonly object _syncRoot = new object();
and then:
lock(_syncRoot)
{
// do whatever you have to do with this file
}
Take a look at the lock statement: http://msdn.microsoft.com/en-us/library/c5kehkcz.aspx
The simplest is to simply make sure you have some locking construct (mutex, monitor, etc) against access to the file, then each thread can access it in isolation. This could either be accessing the same underlying Stream/TextWriter/etc, or could be opening/closing the file inside the locked region.
A more complex approach would be to have a dedicated writer thread, and a synchronised work queue. Then all threads can add to the queue and a single thread dequeues and writes to the file. This means your main threads are only blocked while adding to a queue (very brief), rather than blocked on IO (slower). However, note that if the process exits abnormally, data in the queue may be lost.
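One possible shape for the dedicated-writer approach, assuming .NET 4 or later so that BlockingCollection is available. `QueuedFileWriter` is a hypothetical class name.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

public sealed class QueuedFileWriter : IDisposable
{
    private readonly BlockingCollection<string> _queue = new BlockingCollection<string>();
    private readonly Task _writer;

    public QueuedFileWriter(string path)
    {
        // The only code that ever touches the file runs on this one task.
        _writer = Task.Factory.StartNew(() =>
        {
            using (var output = new StreamWriter(path, true))
            {
                foreach (string line in _queue.GetConsumingEnumerable())
                    output.WriteLine(line);
            }
        }, TaskCreationOptions.LongRunning);
    }

    // Called from any thread; blocks only for the brief enqueue.
    public void Write(string line)
    {
        _queue.Add(line);
    }

    // Stops accepting lines, then waits until everything queued is on disk.
    public void Dispose()
    {
        _queue.CompleteAdding();
        _writer.Wait();
        _queue.Dispose();
    }
}
```

Worker threads call `Write`, and the owner disposes the writer once at shutdown so the queue is drained. Lines still in the queue when the process dies are lost, as noted above.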
I would recommend reading up on the ReaderWriterLock class, or the ReaderWriterLockSlim class, which is faster but has some gotchas. I believe it would suit your needs perfectly.
ReaderWriterLock
ReaderWriterLockSlim
I am making use of the C# code located at the following links to implement a Ram-disk project.
Link to description of source code
Link to source code
As a summary, the code indicated above makes use of a simple tree structure to store the directories, sub-directories and files. At the root is a MemoryFolder object which stores zero or more 'MemoryFolder' objects and/or MemoryFile objects. Each MemoryFolder object in turn stores zero or more MemoryFolder objects and/or MemoryFile objects and so forth up to an unlimited depth.
However, the code is not thread safe. What is the most elegant way of implementing thread safety? In addition, how should the following non-exhaustive list of multithreading requirements for a typical file system be enforced by using the appropriate locking strategy?
The creation of two different folders (each by a different thread) under the same parent folder can occur concurrently if the thread-safe implementation allows it. Otherwise, some locking strategy should be implemented to only allow sequential creation.
None of the direct or indirect parent folders of the folder containing a specific file (that is currently being read by another thread), all the way up to the root folder, can be moved or deleted by another thread until the ReadFile thread completes its execution.
With regard to each unique file, allow concurrent access for multiple ReadFile threads but restrict access to a single WriteFile thread.
If two separate ReadFile threads (fired almost simultaneously), each from a different application, attempt to create a folder with the same name (assuming that the folder does not already exist before both threads are fired), the first thread that enters the Ram-Disk always succeeds while the second one always fails. In other words, the order of thread execution is deterministic.
The total disk space calculation method GetDiskFreeSpace, running under a separate thread, should not complete its execution until all WriteFile threads that are already in progress complete their execution. All subsequent WriteFile threads that have not begun executing are blocked until the GetDiskFreeSpace thread completes its execution.
The easiest way to do this would be to protect the entire tree with a ReaderWriterLockSlim. That allows concurrent access by multiple readers or exclusive access by a single writer. Any method that will modify the structure in any way will have to acquire the write lock, and no other threads will be allowed to read or write to the structure until that thread releases the write lock.
Any thread that wants to read the structure has to acquire the read lock. Multiple readers can acquire the read lock concurrently, but a thread that wants to acquire the write lock has to wait until all existing read locks are released.
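A minimal sketch of the single-lock approach. `GuardedFolderIndex` is a hypothetical stand-in for the folder tree (a flat set of paths rather than the real MemoryFolder structure), meant only to show where the read and write locks go.

```csharp
using System.Collections.Generic;
using System.Threading;

// Hypothetical stand-in for the folder tree: one lock guards the whole structure.
public sealed class GuardedFolderIndex
{
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();
    private readonly HashSet<string> _folders = new HashSet<string>();

    // Readers may run concurrently with each other.
    public bool Exists(string path)
    {
        _lock.EnterReadLock();
        try { return _folders.Contains(path); }
        finally { _lock.ExitReadLock(); }
    }

    // Writers get exclusive access; returns false if the folder already exists.
    public bool Create(string path)
    {
        _lock.EnterWriteLock();
        try { return _folders.Add(path); }
        finally { _lock.ExitWriteLock(); }
    }
}
```

With this shape, two threads creating the same folder name are ordered by whoever takes the write lock first: that call returns true and the other returns false.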
There might be a way to make that data structure lock-free. Doing so, however, could be quite difficult. The reader/writer lock will give you the functionality you want, and I suspect it would be fast enough.
If you want to share this across processes, that's another story. The ReaderWriterLockSlim doesn't work across processes. You could, however, implement something similar using a combination of the synchronization primitives, or create a device driver (or service) that serves the requests, thereby keeping it all in the same process.
Maybe the question sounds silly, but I don't understand something about threads and locking and I would like to get confirmation (here's why I ask).
So, if I have 10 servers and 10 requests arrive at each server at the same time, that's 100 requests across the farm. Without locking, that's 100 requests to the database.
If I do something like this:
private static readonly object myLockHolder = new object();
if (Cache[key] == null)
{
lock(myLockHolder)
{
if (Cache[key] == null)
{
Cache[key] = LengthyDatabaseCall();
}
}
}
How many database requests will I make? 10? 100? Or as many as I have threads?
You have a hierarchy of objects:
You have servers (10)
On each server you have processes (probably only 1 - your service/app pool)
In each process you have threads (probably many)
Your code will only prevent threads within the same process on the same server from modifying the Cache object simultaneously. You can create locks across processes and even across servers, but the cost increases a lot as you move up the hierarchy.
Using the lock statement does not actually lock any threads. However, if one thread is executing code inside the lock (that is in the block of code following the lock statement) any other thread that wants to take the lock and execute the same code has to wait until the first thread holding the lock leaves the block of code and releases the lock.
The C# lock statement uses a monitor (System.Threading.Monitor), which is a lightweight in-process locking mechanism comparable to a Windows critical section. If you want to lock across processes you can use a mutex instead. To lock across servers you can use a database or a shared file.
As dkackman has pointed out .NET has the concept of an AppDomain that is a kind of lightweight process. You can have multiple AppDomains per process. The C# lock statement only locks a resource within a single AppDomain, and a proper description of the hierarchy would include the AppDomain below the process and above the threads. However, quite often you only have a single AppDomain in a process making the distinction somewhat irrelevant.
The C# lock statement locks on a particular instance of an object (the object you created with new object()). Objects are (in most cases) not shared across AppDomains, thus if you have 10 servers, 10 threads can concurrently access your database with that piece of code.
Lock is not blocking threads.
It is locking some instance of an object. And each thread which tries to access it is blocked.
So in your case only the threads that try to lock myLockHolder will be blocked, not all threads.
In other words we can say that Lock statement is syntactic sugar for using Critical Section.
As you can see in MSDN:
lock(expression) statement block
where:
expression Specifies the object that you want to lock on. expression must be a reference type. Typically, expression will either be this, if you want to protect an instance variable, or typeof(class), if you want to protect a static variable (or if the critical section occurs in a static method in the given class).
statement block The statements of the critical section.
lock will block all threads in that application from accessing the myLockHolder object.
So if you have 10 instances of the application running, you'll get 10 requests to the server, one per instance while the object is locked in each. The moment you exit the lock statement, the next request will be processed in that application, but as long as Cache[key] is not null, it won't access the database.
The number of actual requests you get depends on what happens here:
if (Cache[key] == null)
{
Cache[key] = LengthyDatabaseCall();
}
If LengthyDatabaseCall(); fails, the next request will try and access the database server and retrieve the information as well, so really your best case scenario is that there will only be 10 requests to the server.
Only the threads that need access to your shared variable at the moment another thread is using it will go into a wait state.
How many that is at any given time is hard to determine.
Your DB will get 10 requests, with odds being good that requests 2-10 run much faster than request 1.
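The per-process behaviour can be checked with a small simulation. This is only a sketch: a ConcurrentDictionary stands in for the ASP.NET Cache, and `LengthyDatabaseCall` is a hypothetical slow method that counts how often it actually runs.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class DoubleCheckDemo
{
    private static readonly object myLockHolder = new object();

    // Assumption: stands in for the ASP.NET Cache object from the question.
    private static readonly ConcurrentDictionary<string, string> Cache =
        new ConcurrentDictionary<string, string>();

    private static int _databaseCalls;

    // Hypothetical slow call; counts how often it actually runs.
    private static string LengthyDatabaseCall()
    {
        Interlocked.Increment(ref _databaseCalls);
        Thread.Sleep(50);
        return "value";
    }

    private static string Get(string key)
    {
        string value;
        if (!Cache.TryGetValue(key, out value))
        {
            lock (myLockHolder)
            {
                // Second check: another thread may have filled the cache while we waited.
                if (!Cache.TryGetValue(key, out value))
                {
                    value = LengthyDatabaseCall();
                    Cache[key] = value;
                }
            }
        }
        return value;
    }

    // Simulates 'requests' simultaneous requests hitting one process.
    public static int Run(int requests)
    {
        Cache.Clear();
        _databaseCalls = 0;
        Parallel.For(0, requests, i => { Get("key"); });
        return _databaseCalls;
    }
}
```

Within one process, `Run(10)` reports a single database call. Each of the 10 servers has its own lock object and its own cache, which is where the figure of 10 requests across the farm comes from.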