Thread safe mechanism to swap immutable references - c#

Let's say I have a complex object that is immutable that stores a bunch of data. Let's call this type MyDataCache.
Now, let's say I have a MyDataCacheManager which holds a reference to a MyDataCache. Callers can ask the manager for a reference of the cache and it will give that. There are 1000s of calls per second to get a reference for reading from the cache.
Very infrequently (once every few hours), the cache needs to be updated. So my thinking is, a new MyDataCache is created in the background with the refreshed data and then the reference inside the manager is swapped.
However, I need this to be thread safe. Given that MyDataCache is immutable, can I create a method in the manager that simply takes a new MyDataCache as a parameter and sets it as its new internal reference? Given it's just a reference switch of an immutable object, would this be thread safe?
If not, can I use Interlocked.Exchange to achieve this?
Ultimately, given 1000s of reads per second and only 1 update every few hours, what's the best thread safety strategy for this?
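The swap described in the question can be sketched as below. This is a minimal sketch, not a definitive implementation: the members of MyDataCache (a string dictionary and TryGet), the Current property and the Swap method are hypothetical names, since the question does not show the real types. It relies on two facts about .NET: reference reads and writes are atomic, and Volatile.Read / Interlocked.Exchange add the ordering guarantees needed so that readers see a fully constructed snapshot.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical immutable snapshot; the real MyDataCache is not shown in the question.
public sealed class MyDataCache
{
    private readonly IReadOnlyDictionary<string, string> _data;

    public MyDataCache(IDictionary<string, string> data)
    {
        // Defensive copy, so later changes to the caller's dictionary are not visible here.
        _data = new Dictionary<string, string>(data);
    }

    public bool TryGet(string key, out string value) => _data.TryGetValue(key, out value);
}

public sealed class MyDataCacheManager
{
    private MyDataCache _current;

    public MyDataCacheManager(MyDataCache initial)
    {
        _current = initial ?? throw new ArgumentNullException(nameof(initial));
    }

    // Readers: a single atomic reference read with acquire semantics, no lock.
    public MyDataCache Current => Volatile.Read(ref _current);

    // Writer: atomically publishes the new snapshot and returns the previous one.
    public MyDataCache Swap(MyDataCache replacement)
    {
        if (replacement == null) throw new ArgumentNullException(nameof(replacement));
        return Interlocked.Exchange(ref _current, replacement);
    }
}
```

A caller should read Current once into a local variable and use that local for the whole operation. Two separate reads of Current may return two different snapshots if a swap happens in between.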

Related

How Instances of immutable types are inherently thread-safe

I searched for why the .NET String type is immutable and got this answer:
Instances of immutable types are inherently thread-safe, since no
thread can modify it, the risk of a thread modifying it in a way that
interferes with another is removed (the reference itself is a different
matter).
So I want to know: how are instances of immutable types inherently thread-safe?
Why are instances of immutable types inherently thread-safe?
Because an instance of a string type can't be mutated across multiple threads. This effectively means that one thread changing the string won't result in that same string being changed in another thread, since a new string is allocated in the place the mutation is taking place.
Generally, everything becomes easier when you create an object once, and then only observe it. Once you need to modify it, a new local copy gets created.
Wikipedia:
Immutable objects can be useful in multi-threaded applications.
Multiple threads can act on data represented by immutable objects
without concern of the data being changed by other threads. Immutable
objects are therefore considered to be more thread-safe than mutable
objects.
@xanatos (and Wikipedia) point out that immutable isn't always thread-safe. We like to make that correlation because we say "any type which has persistent non-changing state is safe across thread boundaries", but that may not always be the case. Assume a type is immutable from the "outside", but internally needs to modify its state in a way which may not be safe when done in parallel from multiple threads, and may cause undefined behavior. This means that although immutable, it is not thread safe.
To conclude, immutable != thread-safe. But immutability does take you one step closer, when done right, towards being able to do multi-threaded work correctly.
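A small sketch of that distinction, under stated assumptions: both Polynomial classes and their memo caches are hypothetical examples, not taken from the answers. Each type has no public mutators, but each keeps an internal cache that is written to on demand. The first uses a plain Dictionary, which is not safe for concurrent writes; the second swaps in a ConcurrentDictionary, one possible way to make the hidden mutation safe.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Looks immutable from the outside, but Evaluate mutates hidden state.
public sealed class UnsafePolynomial
{
    private readonly int[] _coefficients; // highest degree first
    private readonly Dictionary<int, long> _memo = new Dictionary<int, long>();

    public UnsafePolynomial(int[] coefficients)
    {
        _coefficients = (int[])coefficients.Clone();
    }

    public long Evaluate(int x)
    {
        long y;
        if (!_memo.TryGetValue(x, out y))
        {
            y = PolynomialMath.Horner(_coefficients, x);
            _memo[x] = y; // concurrent writes to a Dictionary can corrupt it
        }
        return y;
    }
}

// Same public surface; the internal cache tolerates concurrent callers.
public sealed class SafePolynomial
{
    private readonly int[] _coefficients;
    private readonly ConcurrentDictionary<int, long> _memo = new ConcurrentDictionary<int, long>();

    public SafePolynomial(int[] coefficients)
    {
        _coefficients = (int[])coefficients.Clone();
    }

    public long Evaluate(int x) =>
        _memo.GetOrAdd(x, key => PolynomialMath.Horner(_coefficients, key));
}

internal static class PolynomialMath
{
    // Horner's method: ((c0 * x + c1) * x + c2) ...
    public static long Horner(int[] coefficients, int x)
    {
        long result = 0;
        foreach (int c in coefficients) result = result * x + c;
        return result;
    }
}
```

Note that GetOrAdd may run the factory more than once under contention; that is harmless here because the computation is pure.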
The short answer:
Because you only write the data in 1 thread and always read it after writing in multiple threads. Because there is no read/write conflict possible, it's thread safe.
The long answer:
A string is essentially a pointer to a buffer of memory. Basically what happens is that you create a buffer, fill it with characters and then expose the pointer to the outside world.
Note that you cannot access the contents of the string before the string object itself is constructed, which enforces this ordering of 'write data', then 'expose pointer'. If you did it the other way around (I guess that's theoretically possible), problems might arise.
If another thread (let's say: CPU) reads the pointer, it is a 'new pointer' for the CPU, which therefore requires the CPU to go to the 'real' memory and then read the data. If it would take the pointer contents from cache, we would have had a problem.
The last piece of the puzzle has to do with memory management: we have to know it's a 'new' pointer. In .NET we know this is the case: memory on the heap is basically never re-used until a GC occurs. The garbage collector then does a mark, sweep and compact.
Now, you might argue that the 'compact' phase moves objects and therefore changes the pointers. While this is true, the GC also has to stop the threads and force a full memory fence, which in simple terms makes all earlier writes visible to every CPU before the threads resume. After that, every memory access is guaranteed to see the relocated data once the GC phase completes.
As you can see there is no way to read the data by not reading it directly from memory (the way it was written). Since it's immutable, the contents remain the same for all threads until it's eventually collected. As such, it's thread safe.
I've seen some discussion about immutable here, that suggests you can change an internal state. Of course, the moment you start changing things, you can potentially introduce read/write conflicts.
The definition of immutable that I'm using here is to keep the contents constant after creation. That is: write once, read many, don't change (any) state after exposing the pointer. You get the picture.
One of the biggest problems in multi-threaded code is two threads accessing the same memory cell at the same time with at least one of them modifying this memory cell.
If none of the threads can modify a memory cell, the problem does not exist any longer.
Because an immutable variable is not modifiable, it can be used from several threads without any further measures (for example locks).

How should I share a large read-only List<T> with each Task.Factory.StartNew() method

Consider that I have a custom class called Terms and that class contains a number of strings properties. Then I create a fairly large (say 50,000) List<Terms> object. This List<Terms> only needs to be read from but it needs to be read from by multiple instances of Task.Factory.StartNew (the number of instances could vary from 1 to 100s).
How would I best pass that list into the long running task? Memory isn't too much of a concern as this is a custom application for a specific use on a specific server with plenty of memory. Should I reference it or should I just pass it off as a normal argument into the method doing the work?
Since you're passing a reference it doesn't really matter how you pass it, it won't copy the list itself. As Ket Smith said, I would pass it as a parameter to the method you are executing.
The issue is List<T> is not entirely thread-safe. Reads by multiple threads are safe but a write can cause some issues:
It is safe to perform multiple read operations on a List, but issues can occur if the collection is modified while it’s being read. To ensure thread safety, lock the collection during a read or write operation.
From List<T>
You say your list is read-only so that may be a non-issue, but a single unpredictable change could lead to unexpected behavior and so it's bug-prone.
I recommend using ImmutableList<T> which is inherently thread-safe since it's immutable.
So long as you don't try to copy it into each separate task, it shouldn't make much difference: more a matter of coding style than anything else. Each task will still be working with the same list in memory: just a different reference to the same underlying list.
That said, sheerly as a matter of coding style and maintainability, I'd probably try to pass it in as a parameter to whatever method you're executing in your Task.Factory.StartNew() (or better yet, Task.Run() - see here). That way, you've clearly called out your task's dependencies, and if you decide that you need to get the list from some other place, it's more clear what you've got to change. (But you could probably find 20 places in my own code where I haven't followed that rule: sometimes I go with what's easier for me now than with what's likely to be easier for the me six months from now.)
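The advice in these answers can be sketched as follows. This is only an illustration under assumptions: the Terms properties (Name, Definition), the TermsWorker class and its methods are hypothetical, and ImmutableList<T> comes from System.Collections.Immutable, which is a separate NuGet package on older .NET Framework versions. The list is passed as an explicit parameter, and every task shares the same reference, so the list itself is never copied.

```csharp
using System;
using System.Collections.Immutable;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical shape of the Terms class; the question only says it holds strings.
public sealed class Terms
{
    public Terms(string name, string definition)
    {
        Name = name;
        Definition = definition;
    }

    public string Name { get; }
    public string Definition { get; }
}

public static class TermsWorker
{
    // The list is an explicit dependency of the work item.
    public static int CountMatches(ImmutableList<Terms> terms, string prefix) =>
        terms.Count(t => t.Name.StartsWith(prefix, StringComparison.Ordinal));

    public static async Task<int[]> RunAsync(ImmutableList<Terms> terms, string[] prefixes)
    {
        // Each task captures the same reference; only the reference is shared.
        var tasks = prefixes.Select(p => Task.Run(() => CountMatches(terms, p)));
        return await Task.WhenAll(tasks);
    }
}
```

Results come back in the same order as the prefixes, because Task.WhenAll preserves the order of the tasks it was given.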

c# Static or Non Static Class

I have a c# windows forms mp3 player application. I have the audio files in my Resources folder and a separate "MyAudio" static class which handles all the audio related work like playing and increasing volume etc.
From my Form, I just call the play method using:
MyAudio.Play(track);
In the MyAudio class, I have a WindowsMediaPlayer object declared as:
private static WindowsMediaPlayer obj=new WindowsMediaPlayer();
My question is, in terms of efficiency and lower memory usage, is it better to declare the MyAudio class as static or non-static? Is it wise to create an object of the MyAudio class in the form and then call the methods, or to call them directly using the class name?
Also is it good practice to declare the instance variables as static?
Your question is indeed broad, but there are a few design principles you can consider while designing a class:
Do I need the object and its state throughout the application lifetime
Do I need to maintain the state of class variables for future use
Do I need to multi-thread or parallelize the application at any point in time
Do I need to decouple the component in the future and use it in other scenarios, like an Ajax-based web scenario
The important thing in this case is that you want to maintain the state for the application lifetime, and that the memory usage is acceptable for the application environment, since after initialization you can get all the data from memory without querying a source such as a database. However, this only suits the scenario where you initialize once and read the data as static information for the rest of the application. If you plan to re-query the information, part of the purpose of using a static type is lost.
Suppose that in the future you need to parallelize your code for performance. Then static will come back to haunt you: the state is shared among threads and will invariably need a synchronization construct such as a lock or mutex, which serializes all threads and defeats the purpose. The same thing happens in a Web / Ajax scenario: a static component cannot handle multiple parallel requests and will get corrupted unless it is synchronized. Here an instance variable per thread is a boon, as it allows task / data parallelization without requiring a lock or mutex.
In my understanding static is a convenience that many programmers misuse, avoiding instance variables and using static at will without understanding the implications. From the GC perspective, an object referenced by a static variable cannot be collected, so the working set of the application grows until it stabilizes and does not shrink unless the program explicitly releases the reference. That is not good for any application, unless we are storing the data to avoid network or database calls.
An ideal design would always use an instance class, which gets created, does its work and gets released rather than lingering around. If there's information that needs to be passed from one function to another, as in your case from Play to Pause to Stop, then that data can be persisted to a static variable and modified in a thread-safe manner, which is a much better approach.
If we take just the example you gave: since it's a Windows Forms application that performs operations like Play, static would be fine, as it is an executable running on a single system. But for testing, imagine a scenario where you start multiple instances by double-clicking and press different operations on each one; then they will all access the same static object and you may get a corruption issue. To resolve such a scenario you may even choose to make your class a singleton, where no more than one instance can exist in memory at a given moment, as happens with Yahoo Messenger: no matter how many times you click, the same instance always comes up.
There are no static instance variables. However, it's best practice to define members as static if they don't have anything to do with a particular instance of the class.

Create thread-safe cache of disposable objects

I have to create a thread-safe cache of disposable objects. Here is how I see it:
I have some data class that I want to cache, e.g. MyData
I create some collection (ConcurrentDictionary, for example) for MyData
I have a method that creates a new instance of MyData, using some key.
When I need to get MyData for some key, I check whether it exists in my storage. If it does, I use it from the collection; otherwise I create a new instance and put it into the collection
I have some event on which I should invalidate the cache. On this event I clear the storage.
The problem is that MyData is disposable. I don't know when I should call the Dispose method. I can't call Dispose when clearing the collection on the clear-cache event, because some thread may be using these instances of MyData at that moment.
What pattern should I choose?
I can't see much sense in a concurrent collection in a case where consumers of the cache should be able to complete their current operation with the item from the cache. You really need synchronization here.
The most obvious thing you can do here is to use ReaderWriterLockSlim and a wrapper around a regular Dictionary<TKey, TValue>.
When someone wants to use an item from the cache, it acquires read access. When someone wants to modify the cache (add an item, or invalidate the cache entirely), it acquires write access (hence, a writer can't invalidate the cache until the last reader releases the lock).
Another option is to consider an approach where you just catch ObjectDisposedException. But this approach assumes that the current operation can be interrupted from the outside.
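The ReaderWriterLockSlim approach from this answer might look roughly like the sketch below. It rests on several assumptions: DisposableCache, its Use and Invalidate methods, and the shape of MyData (a Key and an IsDisposed flag) are all hypothetical names introduced for illustration. The key idea is that the caller's work runs while the read lock is held, so Invalidate, which needs the write lock, cannot dispose an item that is still in use.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical disposable payload; the real MyData is not shown in the question.
public sealed class MyData : IDisposable
{
    public MyData(string key) { Key = key; }
    public string Key { get; }
    public bool IsDisposed { get; private set; }
    public void Dispose() { IsDisposed = true; }
}

public sealed class DisposableCache<TKey, TValue> : IDisposable where TValue : IDisposable
{
    private readonly Dictionary<TKey, TValue> _items = new Dictionary<TKey, TValue>();
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();
    private readonly Func<TKey, TValue> _factory;

    public DisposableCache(Func<TKey, TValue> factory)
    {
        _factory = factory ?? throw new ArgumentNullException(nameof(factory));
    }

    // Runs 'action' against the cached item while holding the read lock.
    public TResult Use<TResult>(TKey key, Func<TValue, TResult> action)
    {
        while (true)
        {
            _lock.EnterReadLock();
            try
            {
                TValue item;
                if (_items.TryGetValue(key, out item))
                    return action(item); // cannot be disposed while readers hold the lock
            }
            finally { _lock.ExitReadLock(); }

            // Not cached yet: create it under the write lock, then retry the read path.
            _lock.EnterWriteLock();
            try
            {
                if (!_items.ContainsKey(key))
                    _items[key] = _factory(key);
            }
            finally { _lock.ExitWriteLock(); }
        }
    }

    // Waits for all readers to finish, then disposes and removes every item.
    public void Invalidate()
    {
        _lock.EnterWriteLock();
        try
        {
            foreach (var item in _items.Values) item.Dispose();
            _items.Clear();
        }
        finally { _lock.ExitWriteLock(); }
    }

    public void Dispose()
    {
        Invalidate();
        _lock.Dispose();
    }
}
```

Two caveats with this sketch: the action must not let the item escape beyond the call (the lock only protects it for the duration of Use), and it must not call Invalidate itself, because the default ReaderWriterLockSlim does not allow lock recursion.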
I think you should encapsulate the cache mechanism so that no one outside calls Dispose; the object that wraps the dictionary should do it without outside intervention. A singleton pattern would be good here, and all access to the dictionary should be locked. On the event where you invalidate the cache, just lock it again, dispose of the items and clear it. If you need help with the singleton, tell me.
If the objects are logically immutable, are somewhat expensive to create, and are known to consume no non-fungible resources and only a modest quantity of fungible ones; and if tracking all references to them would be impractical, you might consider using a cache of short weak references. Such a situation would be one of the few times I might consider it reasonable to "rely" upon finalization, since only one object matching a given key would ever exist outside the "freachable" queue (list of objects needing cleanup) at any given time. One advantage of that approach is that provided you avoid resurrecting objects which have become eligible for finalization, your code shouldn't have to worry too much about threading issues in most cases. You'll have to periodically clean out entries whose associated weak references have gone dead, and watch out for the possibility that a cache entry might get updated with a new object after it's been recognized as dead, but the GC should ensure that objects don't get cleaned up while anyone's using them.

C# .Net 4.5 Multithreading sharing variables

I am new to multithreading and have a question on sharing objects. I am doing this in C# .NET 4.5.
I have a list that contains objects of a class called Price. The Price class contains 12 properties, one of type DateTime and the others of type double.
I then run 4 tasks which all reference this List object. None of the tasks will change the List object; they are just reading from it.
So, given that the tasks all reference the same object but only read from it, am I right to think that I will not need any locking?
Yes, the read does not modify anything for those types (and indeed most types), so it's safe.
As long as no other thread is updating or adding to the list, you do not need locking. If another thread does update or edit it, then do consider using locking.
ReaderWriterLockSlim provides an easy and efficient way to provide advanced Reader and Writer locks.
Moreover as mentioned in Thread Safety section in documentation,
It is safe to perform multiple read operations on a List, but issues can occur if the collection is modified while it’s being read.
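As an illustration of lock-free reading, here is a minimal sketch under assumptions: the Price class is reduced to two hypothetical properties (Timestamp and Close) instead of the twelve in the question, and PriceReader.SumInFourTasks is an invented helper. Four tasks each read a different slice of the same list; nothing writes to the list while they run, so no lock is taken.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical, reduced version of the Price class from the question.
public sealed class Price
{
    public Price(DateTime timestamp, double close)
    {
        Timestamp = timestamp;
        Close = close;
    }

    public DateTime Timestamp { get; }
    public double Close { get; }
}

public static class PriceReader
{
    // Sums Close over four contiguous slices, one task per slice.
    public static double[] SumInFourTasks(IReadOnlyList<Price> prices)
    {
        const int taskCount = 4;
        int chunk = (prices.Count + taskCount - 1) / taskCount; // ceiling division

        var tasks = Enumerable.Range(0, taskCount)
            .Select(i => Task.Run(() =>
            {
                double sum = 0;
                int end = Math.Min(prices.Count, (i + 1) * chunk);
                for (int j = i * chunk; j < end; j++)
                    sum += prices[j].Close; // read-only access, no lock needed
                return sum;
            }))
            .ToArray();

        return Task.WhenAll(tasks).GetAwaiter().GetResult();
    }
}
```

Taking the list as IReadOnlyList<Price> does not make the underlying List<Price> immutable, but it documents that this code only reads. The guarantee still depends on no other thread modifying the list in the meantime.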
