Dual-queue producer-consumer in .NET (forcing member variable flush) - c#

I have a thread which produces data in the form of a simple object (a record). The thread may produce a thousand records for each one that successfully passes a filter and is actually enqueued. Once the object is enqueued, it is read-only.
I have one lock, which I acquire once the record has passed the filter, and I add the item to the back of the producer_queue.
On the consumer thread, I acquire the lock, confirm that the producer_queue is not empty, set consumer_queue to equal producer_queue, create a new (empty) queue, and set it on producer_queue. Without any further locking, I process consumer_queue until it's empty and repeat.
Everything works beautifully on most machines, but on one particular dual-quad server I see in ~1/500k iterations an object that is not fully initialized when I read it out of consumer_queue. The condition is so fleeting that when I dump the object after detecting the condition the fields are correct 90% of the time.
So my question is this: how can I assure that the writes to the object are flushed to main memory when the queue is swapped?
Edit:
On the producer thread:
(producer_queue above is m_fillingQueue; consumer_queue above is m_drainingQueue)
private void FillRecordQueue() {
while (!m_done) {
int count;
lock (m_swapLock) {
count = m_fillingQueue.Count;
}
if (count > 5000) {
Thread.Sleep(60);
} else {
DataRecord rec = GetNextRecord();
if (rec == null) break;
lock (m_swapLock) {
m_fillingQueue.AddLast(rec);
}
}
}
}
In the consumer thread:
private DataRecord Next(bool remove) {
bool drained = false;
while (!drained) {
if (m_drainingQueue.Count > 0) {
DataRecord rec = m_drainingQueue.First.Value;
if (remove) m_drainingQueue.RemoveFirst();
if (rec.Time < FIRST_VALID_TIME) {
throw new InvalidOperationException("Detected invalid timestamp in Next(): " + rec.Time + " from record " + rec);
}
return rec;
} else {
lock (m_swapLock) {
m_drainingQueue = m_fillingQueue;
m_fillingQueue = new LinkedList<DataRecord>();
if (m_drainingQueue.Count == 0) drained = true;
}
}
}
return null;
}
The producer is rate-limited, so it can't get too far ahead of the consumer.
The behavior I see is that sometimes the Time field reads as DateTime.MinValue; by the time I construct the string to throw the exception, however, it's perfectly fine.

Have you tried the obvious: is the latest microcode update applied on the fancy 8-core box (via a BIOS update)? Did you run Windows Update to get the latest processor driver?
At first glance, it looks like you're locking your containers. So I recommend the systems approach, since it sounds like you're not seeing this issue on a good ol' dual-core box.

Assuming these are in fact the only methods that interact with the m_fillingQueue variable, and that DataRecord cannot be changed after GetNextRecord() creates it (read-only properties hopefully?), then the code at least on the face of it appears to be correct.
In which case I suggest that GregC's answer is the first thing to check: make sure the failing machine is fully updated (OS / drivers / .NET Framework), because the lock statement should involve all the memory barriers required to ensure that the object referenced by rec is fully flushed out of any caches before it is added to the list.
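To make the claim about the lock concrete, here is a minimal sketch of the question's swap pattern, assuming exactly one producer thread and one consumer thread. `SwapQueue`, `Add` and `TryTake` are hypothetical names, and `Queue<T>` stands in for the question's `LinkedList<DataRecord>`. Both the append and the swap run under the same lock, so the lock's release/acquire pair orders the producer's writes to a record before the consumer's reads of it; the draining buffer is only ever touched by the consumer, so reading it needs no lock.

```csharp
using System.Collections.Generic;

// Hypothetical single-producer / single-consumer queue that swaps two buffers under one lock.
public sealed class SwapQueue<T> where T : class
{
    private readonly object _swapLock = new object();
    private Queue<T> _filling = new Queue<T>();   // guarded by _swapLock
    private Queue<T> _draining = new Queue<T>();  // touched only by the consumer thread

    // Producer side: the item must be fully initialized before this call.
    public void Add(T item)
    {
        lock (_swapLock)
        {
            _filling.Enqueue(item);
        }
    }

    // Consumer side: returns null when both buffers are empty.
    public T TryTake()
    {
        if (_draining.Count == 0)
        {
            lock (_swapLock)
            {
                // Acquiring the lock makes everything the producer wrote before
                // its last Add (including the item's fields) visible to this thread.
                var empty = _draining;
                _draining = _filling;
                _filling = empty;       // reuse the empty buffer for the producer
            }
        }
        return _draining.Count > 0 ? _draining.Dequeue() : null;
    }
}
```

Because the emptied draining buffer is handed back as the new filling buffer, this variant also avoids allocating a new list on every swap.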

Related

Trying to find a lock-less solution for a C# concurrent queue

I have the following code in C#:
(_StoreQueue is a ConcurrentQueue)
var S = _StoreQueue.FirstOrDefault(_ => _.TimeStamp == T);
if (S == null)
{
lock (_QueueLock)
{
// try again
S = _StoreQueue.FirstOrDefault(_ => _.TimeStamp == T);
if (S == null)
{
S = new Store(T);
_StoreQueue.Enqueue(S);
}
}
}
The system is collecting data in real time (fairly high frequency, around 300-400 calls / second) and puts it in bins (Store objects) that represent a 5 second interval. These bins are in a queue as they get written and the queue gets emptied as data is processed and written.
So, when data is arriving, a check is done to see if there is a bin for that timestamp (rounded by 5 seconds), if not, one is created.
Since this is quite heavily multi-threaded, the system goes with the following logic:
If there is a bin, it is used to put data.
If there is no bin, a lock is taken, and within that lock the check is done again to make sure the bin wasn't created by another thread in the meantime; if there is still no bin, one gets created.
With this system, the lock is taken roughly once every 2k calls.
I am trying to see if there is a way to remove the lock, mostly because I think there has to be a better solution than the double check.
An alternative I have been thinking about is to create empty bins ahead of time, which would entirely remove the need for any locks, but the search for the right bin would become slower, as it would have to scan the list of pre-built bins to find the proper one.
Using a ConcurrentDictionary can fix the issue you are having. Here I assumed a type of double for your TimeStamp property, but it can be anything, as long as the ConcurrentDictionary key type matches it.
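using System.Collections.Concurrent;

class Program
{
    // static, so that it can be used from the static Main method
    static ConcurrentDictionary<double, Store> _StoreQueue = new ConcurrentDictionary<double, Store>();

    static void Main(string[] args)
    {
        var T = 17d;
        // add the Store for 17 if it does not exist yet, otherwise return the existing one;
        // the factory overload only allocates a Store when the key is missing
        Store S = _StoreQueue.GetOrAdd(T, t => new Store(t));
    }

    public class Store
    {
        public double TimeStamp { get; set; }

        public Store(double timeStamp)
        {
            TimeStamp = timeStamp;
        }
    }
}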
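Applied to the 5-second bins from the question, a sketch might round each timestamp down to its interval and use that as the key. `GetOrAdd` with a value factory can run the factory more than once under contention (only one result is stored), so wrapping the value in `Lazy<Store>` ensures only one `Store` is ever constructed per bin. The tick-based `long` key, the `BinIndex` class and the `Created` demo counter are assumptions for illustration, not part of the original answer.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public class Store
{
    public static int Created;             // counts constructions, for the demo only
    public DateTime Start { get; }

    public Store(DateTime start)
    {
        Start = start;
        Interlocked.Increment(ref Created);
    }
}

// Hypothetical bin index: one Store per 5-second interval, no explicit lock.
public class BinIndex
{
    private static readonly long BinTicks = TimeSpan.FromSeconds(5).Ticks;
    private readonly ConcurrentDictionary<long, Lazy<Store>> _bins =
        new ConcurrentDictionary<long, Lazy<Store>>();

    public Store GetBin(DateTime timestamp)
    {
        long key = timestamp.Ticks - timestamp.Ticks % BinTicks;  // round down to the 5 s boundary
        // Several Lazy wrappers may be created under contention, but only the stored one
        // is ever evaluated, and Lazy<T> (default mode) runs its factory exactly once.
        return _bins.GetOrAdd(key,
            k => new Lazy<Store>(() => new Store(new DateTime(k, timestamp.Kind)))).Value;
    }

    // Removes a processed bin; returns false if it was not present.
    public bool TryRemove(DateTime binStart) =>
        _bins.TryRemove(binStart.Ticks - binStart.Ticks % BinTicks, out _);
}
```

Lookups stay O(1) on average, which also addresses the concern about scanning a list of pre-built bins.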

Chance of hitting the same function at the same time by two Threads/Tasks

Assuming the following case:
public Hashtable map = new Hashtable();

public void Cache(string fileName) {
    if (!map.ContainsKey(fileName))
    {
        map.Add(fileName, new object());
        _Cache(fileName);
    }
}

private void _Cache(string fileName) {
    lock (map[fileName])
    {
        if (IsAlreadyCached(fileName))   // hypothetical check: file already cached
            return;
        else {
            CacheFile(fileName);         // hypothetical: cache the file
        }
    }
}
When having the following consumers:
Task.Run(()=> {
Cache("A");
});
Task.Run(()=> {
Cache("A");
});
Would it be possible in any way for the Cache method to throw a duplicate key exception, meaning that both tasks hit the map.Add method and try to add the same key?
Edit:
Would using the following data structure solve this concurrency problem?
public class HashMap<Key, Value>
{
private HashSet<Key> Keys = new HashSet<Key>();
private List<Value> Values = new List<Value>();
public int Count => Keys.Count;
public Boolean Add(Key key, Value value) {
int oldCount = Keys.Count;
Keys.Add(key);
if (oldCount != Keys.Count) {
Values.Add(value);
return true;
}
return false;
}
}
Yes, of course it would be possible. Consider the following fragment:
if (!map.ContainsKey(fileName))
{
map.Add(fileName, new Object());
Thread 1 may execute if (!map.ContainsKey(fileName)) and find that the map does not contain the key, so it will proceed to add it, but before it gets the chance to add it, Thread 2 may also execute if (!map.ContainsKey(fileName)), at which point it will also find that the map does not contain the key, so it will also proceed to add it. Of course, that will fail.
EDIT (after clarifications)
So, the problem seems to be how to keep the main map locked for as little as possible, and how to prevent cached objects from being initialized twice.
This is a complex problem, so I cannot give you a ready-to-run answer that will work (especially since I do not currently even have a C# development environment handy), but generally speaking, I think that you should proceed as follows:
Fully guard your map with lock().
Keep your map locked as little as possible; when an object is not found to be in the map, add an empty object to the map and exit the lock immediately. This will ensure that this map will not become a point of contention for all requests coming in to the web server.
After the check-if-present-and-add-if-not fragment, you are holding an object which is guaranteed to be in the map. However, this object may or may not be initialized at this point. That's fine. We will take care of that next.
Repeat the lock-and-check idiom, this time with the cached object: every single incoming request interested in that specific object will need to lock it, check whether it is initialized, and if not, initialize it. Of course, only the first request will suffer the penalty of initialization. Also, any requests that arrive before the object has been fully initialized will have to wait on their lock until the object is initialized. But that's all very fine, that's exactly what you want.
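The steps above can be sketched as follows, assuming a `Dictionary` as the map and a caller-supplied loader delegate in place of the actual file caching; `FileCache`, `Entry`, `Get` and the loader are hypothetical names. The map lock only covers the check-and-add, while the slow initialization runs under a per-entry lock, so only callers interested in the same file wait for each other.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical cache: a short lock on the map, then a per-entry lock for initialization.
public sealed class FileCache
{
    private sealed class Entry
    {
        public bool Initialized;   // guarded by lock(entry)
        public string Content;     // guarded by lock(entry)
    }

    private readonly Dictionary<string, Entry> _map = new Dictionary<string, Entry>();
    private readonly Func<string, string> _load;   // assumed loader, e.g. File.ReadAllText

    public FileCache(Func<string, string> load) { _load = load; }

    public string Get(string fileName)
    {
        Entry entry;
        lock (_map)    // step 1: check-and-add only, nothing slow in here
        {
            if (!_map.TryGetValue(fileName, out entry))
            {
                entry = new Entry();
                _map.Add(fileName, entry);
            }
        }
        lock (entry)   // step 2: lock-and-check on the entry itself
        {
            if (!entry.Initialized)
            {
                entry.Content = _load(fileName);   // only the first caller pays for this
                entry.Initialized = true;
            }
            return entry.Content;
        }
    }
}
```

If the loader throws, the entry stays uninitialized and the next caller simply retries.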

Executing part of code exactly 1 time inside Parallel.ForEach

I have to query my company's CRM solution (Oracle's Right Now) for our 600k users, and update them there if they exist or create them if they don't. To know whether a user already exists in Right Now, I consume a third-party WS. With 600k users this can be a real pain due to the time each response takes (around 1 second). So I changed my code to use Parallel.ForEach, querying each record in just 0.35 seconds, and adding it to a List<User> of records to be created or to be updated (Right Now is kinda dumb, so I need to separate them into 2 lists and call 2 distinct WS methods).
My code used to run perfectly before I made it multithreaded, but it took too long. The problem is that I can't make a batch too large or I get a timeout when I try to update or create via the Web Service. So I'm sending around 500 records at once, and when it runs the critical code part, it executes many times.
Parallel.ForEach(boDS.USERS.AsEnumerable(), new ParallelOptions { MaxDegreeOfParallelism = -1 }, row =>
{
...
user = null;
user = QueryUserById(row["USER_ID"].Trim());
if (user == null)
{
isUpdate = false;
gObject.ID = new ID();
}
else
{
isUpdate = true;
gObject.ID = user.ID;
}
... fill user attributes as generic fields ...
gObject.GenericFields = listGenericFields.ToArray();
if (isUpdate)
listUserUpdate.Add(gObject);
else
listUserCreate.Add(gObject);
if (i == batchSize - 1 || i == (boDS.USERS.Rows.Count - 1))
{
UpdateProcessingOptions upo = new UpdateProcessingOptions();
CreateProcessingOptions cpo = new CreateProcessingOptions();
upo.SuppressExternalEvents = false;
upo.SuppressRules = false;
cpo.SuppressExternalEvents = false;
cpo.SuppressRules = false;
RNObject[] results = null;
// <Critical_code>
if (listUserCreate.Count > 0)
{
results = _service.Create(_clientInfoHeader, listUserCreate.ToArray(), cpo);
}
if (listUserUpdate.Count > 0)
{
_service.Update(_clientInfoHeader, listUserUpdate.ToArray(), upo);
}
// </Critical_code>
listUserUpdate = new List<RNObject>();
listUserCreate = new List<RNObject>();
}
i++;
});
I thought about using lock or mutex, but it isn't gonna help me, since the other threads will just wait and execute it afterwards. I need some way to execute that part of the code only ONCE, in only ONE thread. Is it possible? Can anyone shed some light?
Thanks and kind regards,
Leandro
As you stated in the comments you're declaring the variables outside of the loop body. That's where your race conditions originate from.
Let's take variable listUserUpdate for example. It's accessed randomly by parallel executing threads. While one thread is still adding to it, e.g. in listUserUpdate.Add(gObject); another thread could already be resetting the lists in listUserUpdate = new List<RNObject>(); or enumerating it in listUserUpdate.ToArray().
You really need to refactor that code to:
make each iteration as independent of the others as you can, by moving variables inside the loop body, and
access shared data in a synchronized way, using locks and/or concurrent collections.
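One way to apply both points, sketched under assumptions: the slow per-item work runs in parallel with only local variables, while adding to the shared batch and deciding whether it is full happens under a lock. Whichever thread fills a batch swaps it out under the lock and sends it outside the lock, so each batch is sent exactly once. `BatchSender`, `Process` and the `work`/`send` delegates are hypothetical stand-ins for the lookup and the Create/Update web-service calls; a final flush after the loop sends the remainder.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper: items are processed in parallel, each full batch is sent exactly once.
public static class BatchSender
{
    public static void Process<TIn, TOut>(
        IEnumerable<TIn> source,
        Func<TIn, TOut> work,          // slow per-item work, e.g. the lookup WS call
        Action<List<TOut>> send,       // e.g. a wrapper around _service.Create / _service.Update
        int batchSize)
    {
        var gate = new object();
        var pending = new List<TOut>();   // shared; only touched under gate

        Parallel.ForEach(source, item =>
        {
            TOut result = work(item);      // locals only: no shared state here
            List<TOut> full = null;
            lock (gate)
            {
                pending.Add(result);
                if (pending.Count >= batchSize)
                {
                    full = pending;         // this thread now owns the full batch
                    pending = new List<TOut>();
                }
            }
            if (full != null) send(full);   // outside the lock, other threads keep working
        });

        if (pending.Count > 0) send(pending);  // remainder, after all iterations finished
    }
}
```

`send` can run on several threads at once for different batches; if the service cannot handle that, wrap the call in its own lock. Keeping separate pending lists for creates and updates would follow the same pattern.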
You can use the Double-checked locking pattern. This is usually used for singletons, but you're not making a singleton here so generic singletons like Lazy<T> do not apply.
It works like this:
Separate out your shared data into some sort of class:
class QuerySharedData {
// All the write-once-read-many fields that need to be shared between threads
public QuerySharedData() {
// Compute all the write-once-read-many fields. Or use a static Create method if that's handy.
}
}
In your outer class add the following:
readonly object padlock = new object(); // must be initialized: lock(null) throws ArgumentNullException
volatile QuerySharedData data;
In your thread's callback delegate, do this:
if (data == null)
{
lock (padlock)
{
if (data == null)
{
data = new QuerySharedData(); // this does all the work to initialize the shared fields
}
}
}
var localData = data;
Then use the shared query data from localData. By grouping the shared query data into a subordinate class, you avoid the need to make its individual fields volatile.
More about volatile here: Part 4: Advanced Threading.
Update: my assumption here is that all the classes and fields held by QuerySharedData are read-only once initialized. If this is not true, for instance if you initialize a list once but add to it from many threads, this pattern will not work for you. You will have to consider using things like Thread-Safe Collections.
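A runnable sketch of the pattern above, assuming `QuerySharedData` only needs to compute a single read-only value; `QueryRunner`, `GetShared` and the `Constructions` counter are hypothetical. Note that `padlock` must be initialized, since locking on a null reference throws `ArgumentNullException`. Reading the volatile field into a local keeps the fast path to a single volatile read.

```csharp
using System.Threading;

// Write-once-read-many data; stands in for the results of the one-time query.
public sealed class QuerySharedData
{
    public static int Constructions;   // demo counter only
    public readonly string Value;

    public QuerySharedData()
    {
        Interlocked.Increment(ref Constructions);
        Value = "computed once";
    }
}

public sealed class QueryRunner
{
    private readonly object padlock = new object();
    private volatile QuerySharedData data;

    public QuerySharedData GetShared()
    {
        var local = data;                  // single volatile read on the fast path
        if (local == null)
        {
            lock (padlock)
            {
                local = data;              // re-check under the lock
                if (local == null)
                {
                    local = new QuerySharedData();  // fully constructed before publication
                    data = local;                   // volatile write publishes it
                }
            }
        }
        return local;
    }
}
```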

How to freeze a popsicle in .NET (make a class immutable)

I'm designing a class that I wish to make readonly after a main thread is done configuring it, i.e. "freeze" it. Eric Lippert calls this popsicle immutability. After it is frozen, it can be accessed by multiple threads concurrently for reading.
My question is how to write this in a thread safe way that is realistically efficient, i.e. without trying to be unnecessarily clever.
Attempt 1:
public class Foobar
{
    private Boolean _isFrozen;
    private Object _value;
    public void Freeze() { _isFrozen = true; }
    // Only intended to be called by the main thread, so it checks whether the class is frozen. If it is, the operation is invalid.
    public void WriteValue(Object val)
    {
        if (_isFrozen)
            throw new InvalidOperationException();
        _value = val; // write ...
    }
    public Object ReadSomething()
    {
        return _value; // read ...
    }
}
Eric Lippert seems to suggest this would be OK in this post.
I know writes have release semantics, but as far as I understand this only pertains to ordering, and it doesn't necessarily mean that all threads will see the value immediately after the write. Can anyone confirm this? This would mean this solution is not thread safe (this may not be the only reason of course).
Attempt 2:
The above, but using Interlocked.Exchange to ensure the value is actually published:
public class Foobar
{
private Int32 _isFrozen;
public void Freeze() { Interlocked.Exchange(ref _isFrozen, 1); }
public void WriteValue(Object val)
{
if (_isFrozen == 1)
throw new InvalidOperationException();
// write ...
}
}
The advantage here would be that we ensure the value is published without suffering the overhead on every read. If none of the reads are moved before the write to _isFrozen (the Interlocked method uses a full memory barrier), I would guess this is thread safe. However, who knows what the compiler will do (and according to section 3.10 of the C# spec that seems like quite a lot), so I don't know if this is thread safe.
Attempt 3:
Also do the read using Interlocked.
public class Foobar
{
private Int32 _isFrozen;
public void Freeze() { Interlocked.Exchange(ref _isFrozen, 1); }
public void WriteValue(Object val)
{
if (Interlocked.CompareExchange(ref _isFrozen, 0, 0) == 1)
throw new InvalidOperationException();
// write ...
}
}
Definitely thread safe, but it seems a little wasteful to have to do the compare exchange for every read. I know this overhead is probably minimal, but I'm looking for a reasonably efficient method (although perhaps this is it).
Attempt 4:
Using volatile:
public class Foobar
{
private volatile Boolean _isFrozen;
public void Freeze() { _isFrozen = true; }
public void WriteValue(Object val)
{
if (_isFrozen)
throw new InvalidOperationException();
// write ...
}
}
But Joe Duffy declared "sayonara volatile", so I won't consider this a solution.
Attempt 5:
Lock everything, seems a bit overkill:
public class Foobar
{
    private readonly Object _syncRoot = new Object();
    private Boolean _isFrozen;
    public void Freeze() { lock(_syncRoot) _isFrozen = true; }
    public void WriteValue(Object val)
    {
        lock(_syncRoot) // as above we could include an attempt that reads *without* this lock
        {
            if (_isFrozen)
                throw new InvalidOperationException();
            // write ...
        }
    }
}
Also seems definitely thread safe, but has more overhead than using the Interlocked approach above, so I would favour attempt 3 over this one.
And then I can come up with at least some more (I'm sure there are many more):
Attempt 6: use Thread.VolatileWrite and Thread.VolatileRead, but these are supposedly a little on the heavy side.
Attempt 7: use Thread.MemoryBarrier, seems a little too internal.
Attempt 8: create an immutable copy - don't want to do this
Summarising:
which attempt would you use and why (or how would you do it if entirely different)? (i.e. what is the best way for publishing a value once that is then read concurrently, while being reasonably efficient without being overly "clever"?)
do the "release" semantics of writes in .NET's memory model imply that all other threads see updates (cache coherency etc.)? I generally don't want to think too much about this, but it's nice to have an understanding.
EDIT:
Perhaps my question wasn't clear, but I am looking in particular for reasons as to why the above attempts are good or bad. Note that I am talking here about a scenario of one single writer that writes then freezes before any concurrent reads. I believe attempt 1 is OK but I'd like to know exactly why (as I wonder if reads could be optimized away somehow, for example).
I care less about whether or not this is good design practice but more about the actual threading aspect of it.
Many thanks for the responses the question received, but I have chosen to mark this as the answer myself, because I feel that the answers given do not quite answer my question and I do not want to give anyone visiting the site the impression that the marked answer is correct simply because it was automatically marked as such when the bounty expired.
Furthermore I do not think the answer with the highest number of votes was overwhelmingly voted for, not enough to mark it automatically as an answer.
I am still leaning towards attempt #1 being correct; however, I would have liked some authoritative answers. I understand x86 has a strong memory model, but I don't want to (and shouldn't) code for a particular architecture; after all, that's one of the nice things about .NET.
If you are in doubt about the answer, go for one of the locking approaches, perhaps with the optimizations shown here to avoid a lot of contention on the lock.
Maybe slightly off topic, but just out of curiosity :) Why don't you use "real" immutability? E.g. make Freeze() return an immutable copy (without "write methods" or any other way to change the inner state) and use this copy instead of the original object. You could even avoid changing state altogether and return a new copy (with the changed state) on each write operation instead (afaik the string class works this way). "Real immutability" is inherently thread safe.
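A minimal sketch of that suggestion, in the StringBuilder/string style: a mutable builder owned by one thread, and a `Freeze()` that returns a separate object with no write methods at all. `FoobarBuilder` and `FrozenFoobar` are hypothetical names. The frozen object copies the data, so later writes to the builder cannot leak into it; it still has to be handed to reader threads through some synchronized hand-off, as in the other attempts.

```csharp
using System.Collections.Generic;
using System.Collections.ObjectModel;

// Mutable, single-owner builder (like StringBuilder).
public sealed class FoobarBuilder
{
    private readonly List<object> _values = new List<object>();
    public void WriteValue(object val) => _values.Add(val);
    public FrozenFoobar Freeze() => new FrozenFoobar(_values);
}

// Immutable snapshot (like string): there is simply no way to write to it.
public sealed class FrozenFoobar
{
    private readonly ReadOnlyCollection<object> _values;

    internal FrozenFoobar(IEnumerable<object> values)
    {
        _values = new List<object>(values).AsReadOnly();  // private copy of the builder's data
    }

    public int Count => _values.Count;
    public object this[int index] => _values[index];
}
```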
I vote for Attempt 5, use the lock(this) implementation.
This is the most reliable means of making this work. Reader/writer locks could be employed, but to very little gain. Just go with using a normal lock.
If necessary you could improve the 'frozen' performance by first checking _isFrozen and then locking:
void Freeze() { lock (this) _isFrozen = true; }
object ReadValue()
{
if (_isFrozen)
return Read();
else
lock (this) return Read();
}
void WriteValue(object value)
{
lock (this)
{
if (_isFrozen) throw new InvalidOperationException();
Write(value);
}
}
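A self-contained variant of the code above, as a sketch: it uses a private lock object instead of `this` and replaces the unspecified `Read()`/`Write()` with a single stored value; `LockedFoobar` is a hypothetical name. One change is deliberate: the unlocked fast-path read of `_isFrozen` uses `Volatile.Read` (paired with `Volatile.Write` in `Freeze`), which gives that read acquire semantics, so the value read after it cannot be older than the freeze. That does not rely on x86's strong ordering.

```csharp
using System;
using System.Threading;

// Hypothetical lock-based popsicle: writes always take the lock, reads skip it once frozen.
public sealed class LockedFoobar
{
    private readonly object _sync = new object();
    private bool _isFrozen;   // written under _sync, read with Volatile.Read on the fast path
    private object _value;    // guarded by _sync until frozen, read-only afterwards

    public void Freeze()
    {
        lock (_sync) Volatile.Write(ref _isFrozen, true);
    }

    public object ReadValue()
    {
        if (Volatile.Read(ref _isFrozen))   // acquire: _value below is at least as new as the freeze
            return _value;
        lock (_sync) return _value;         // not frozen yet: read under the lock
    }

    public void WriteValue(object value)
    {
        lock (_sync)
        {
            if (_isFrozen) throw new InvalidOperationException("Object is frozen.");
            _value = value;
        }
    }
}
```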
If you really create, fill and freeze the object before showing it to other threads, then you don't need anything special to deal with thread safety (the strong memory model of .NET is already your guarantee), so solution 1 is valid.
But if you give the unfrozen object to another thread (or if you are simply creating your class without knowing how users will use it), then the solution that returns a new, fully immutable instance is probably better. In this case, the mutable instance is like StringBuilder and the immutable instance is like string. If you need an extra guarantee, the mutable instance may check its creator thread and throw exceptions if it is used from any other thread (in all methods, to avoid possible partial reads).
Attempt 2 is thread safe on x86 and other processors that have a strong memory model, but the way I would do it is to make thread safety the consumer's problem, because there is no way for you to do it efficiently within the consumed code. Consider:
if(!foo.frozen)
{
foo.apropery = "avalue";
}
The thread safety of the frozen property and the guard code in apropery's setter doesn't really matter, because even if they are perfectly thread safe you still have a race condition. Instead I would write it like this:
lock(foo)
{
if(!foo.frozen)
{
foo.apropery = "avalue";
}
}
and have neither of the properties inherently thread safe.
#1 - reader not threadsafe - I believe problem would be in reader side, not writer (code not shown)
#2 - reader not threadsafe - same as #1
#3 - promising, read check can be optimized out for most cases (when CPU caches are in sync)
Attempt 3:
Also do the read using Interlocked.
public class Foobar {
private object _syncRoot = new object();
private object _val;
private int _isFrozen = 0; // perf compiler warning, but training code, so show defaults
// Why Exchange to 1 then throw away result. Best to just increment.
//public void Freeze() { Interlocked.Exchange(ref _isFrozen, 1); }
public void Freeze() { Interlocked.Increment(ref _isFrozen); }
public void WriteValue(Object val) {
// if this core can see _isFrozen then no special lock or sync needed
if (_isFrozen != 0)
throw new InvalidOperationException();
lock(_syncRoot) {
if (_isFrozen != 0)
throw new InvalidOperationException(); // the 'throw' is 100x-1000x more costly than the lock, just eat it
_val = val;
}
}
public object Read() {
// frozen is one-way, if one-way state has been published
// to my local CPU cache then just read _val.
// There are very strange corner cases when the _isFrozen and _val fields are in
// different cache lines, but they should be nearly impossible to hit unless
// dealing with very large structs (which makes it more likely that the fields
// straddle a cache-line boundary, typically 64 bytes).
if (_isFrozen != 0)
return _val;
// else
lock(_syncRoot) { // _isFrozen is 0 here
if (_isFrozen != 0) // if _isFrozen is 1 here we just collided with writer using lock on other thread, or our CPU cache was out of sync and lock() forced the dirty cache line to be read from main memory
return _val;
throw new InvalidOperationException(); // throw is 100x-1000x more expensive than lock, eat the cost of lock
}
}
}
Joe Duffy's post about 'volatile is dead' is, I think, in the context of his next-gen CLR/OS architecture and of the CLR on ARM. For those of us doing multi-core x64/x86, I think volatile is fine. If perf is the primary concern, I suggest you measure the code above and compare it to volatile.
Unlike other folks posting answers I wouldn't jump straight to lock() if you have lots of readers (3 or more threads likely to read the same object at the same time). But in your sample you mix perf-sensitive question with exceptions when a collision happens, which doesn't make much sense. If you're using exceptions, then you can also use other higher-level constructs.
If you want complete safety but need to optimize for lots of concurrent readers change lock()/Monitor to ReaderWriterLockSlim.
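A sketch of the ReaderWriterLockSlim variant, assuming a single stored value in place of the question's unspecified data; `RwFoobar` is a hypothetical name. Readers take the read lock, which does not block other readers, while writes and `Freeze()` take the exclusive write lock.

```csharp
using System;
using System.Threading;

// Hypothetical popsicle guarded by ReaderWriterLockSlim: many concurrent readers, one writer.
public sealed class RwFoobar : IDisposable
{
    private readonly ReaderWriterLockSlim _rw = new ReaderWriterLockSlim();
    private bool _isFrozen;   // guarded by _rw
    private object _value;    // guarded by _rw

    public void Freeze()
    {
        _rw.EnterWriteLock();
        try { _isFrozen = true; }
        finally { _rw.ExitWriteLock(); }
    }

    public void WriteValue(object val)
    {
        _rw.EnterWriteLock();
        try
        {
            if (_isFrozen) throw new InvalidOperationException("Object is frozen.");
            _value = val;
        }
        finally { _rw.ExitWriteLock(); }
    }

    public object ReadValue()
    {
        _rw.EnterReadLock();      // read locks do not block each other
        try { return _value; }
        finally { _rw.ExitReadLock(); }
    }

    public void Dispose() => _rw.Dispose();
}
```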
.NET has new primitives to handle publishing values. Take a look at Rx. It can be very fast and lockless for some cases (I think they use optimizations similar to above).
If it is written multiple times but only one value is kept, in Rx that is "new ReplaySubject<T>(bufferSize: 1)". If you try it, you might be surprised how fast it is. At the same time, I applaud your attempt to learn this level of detail.
If you want to go lockless, get over your distaste for Thread.MemoryBarrier(). It is extremely important. But it has the same gotchas as volatile, as described by Joe Duffy: it prevents the compiler and CPU from reordering memory accesses across it (memory reads take a long time in CPU terms, so they are aggressively reordered when no such barrier is present). When this reordering is combined with CLR optimizations like automatic inlining of functions, you can see very surprising behavior at the memory and register level. MemoryBarrier() just disables those single-threaded memory access assumptions that the CPU and CLR use most of the time.
Perhaps my question wasn't clear, but I am looking in particular for reasons as to why the above attempts are good or bad. Note that I am talking here about a scenario of one single writer that writes then freezes before any concurrent reads. I believe attempt 1 is OK but I'd like to know exactly why (as I wonder if reads could be optimized away somehow, for example). I care less about whether or not this is good design practice but more about the actual threading aspect of it.
Ok, now I better understand what you are doing and looking for in a response. Allow me to elaborate on my previous answer promoting the use of locks by first addressing each of your attempts.
Attempt 1:
The approach of using a simple class that has no synchronization primitives of any form is entirely viable in your example. Since the 'authoring' thread is the only thread with access to this class while it is being mutated, this should be safe. Only if another thread could access it before the class is 'frozen' would you need to provide synchronization. Essentially, it's not possible for a thread to have a cache of something it has never seen.
Aside from a thread having a cached copy of the internal state of this list, there is one other concurrency issue that you should be concerned with: write reordering by the authoring thread. Your example solution doesn't have enough code for me to address this, but the process of handing this 'frozen' list to another thread is the heart of the issue. Are you using Interlocked.Exchange or writing to a volatile field?
I still maintain that this is not the best approach, simply because there is no guarantee that another thread has not seen the instance while it's being mutated.
Attempt 2:
Attempt 2 should not be used. If you are using atomic writes to a member, you should also use atomic reads. I would never recommend one without the other, because unless both reads and writes are atomic you haven't gained anything. The correct application of atomic reads and writes is your 'Attempt 3'.
Attempt 3:
This will guarantee an exception is thrown if a thread attempts to mutate a frozen list. However, it makes no assertion that a read is only acceptable on a frozen instance. This, IMHO, is just as bad as accessing our _isFrozen variable with both atomic and non-atomic accessors. If you are going to say that it's important to safeguard writes, then you should always safeguard reads. One without the other is just 'odd'.
Overlooking my own feelings about writing code that guards writes but not reads, this is an acceptable approach given your specific use: I have one writer, I write, I freeze, then I make it available to readers. Under this scenario your code works correctly. You rely on the atomic operation on the set of _isFrozen to provide the required memory barrier before handing the class to another thread.
In a nutshell this approach works, but again if a thread has an instance that is not frozen it's going to break.
Attempt 4:
While at heart this is nearly the same as attempt 3 (given one writer) there is one big difference. In this example, if you check _isFrozen in the reader then every access will require a memory barrier. This is unnecessary overhead once the list is frozen.
Still this has the same issue as Attempt 3 in that no assertions are made about the state of _isFrozen during the read so the performance should be identical in your example usage.
Attempt 5:
As I said this is my preference given the modification to read as appears in my other answer.
Attempt 6:
Is essentially the same as #4.
Attempt 7:
You could solve your specific needs with a Thread.MemoryBarrier. Essentially using the code from Attempt 1, you create the instance, call Freeze(), add your Thread.MemoryBarrier, and then share the instance (or share it within a lock). This should work great, again only under your limited use case.
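A sketch of that publication step, assuming a hypothetical static field through which readers pick up the instance (`Publisher`, `ConfigureAndPublish` and `TryGet` are illustrative names). `Volatile.Write` is a release: everything written before it, including the configuration and the freeze, becomes visible no later than the reference itself, and readers pair it with `Volatile.Read`. A `Thread.MemoryBarrier()` followed by a plain write, as described above, is a stronger (full-fence) way to do the writer's half.

```csharp
using System;
using System.Threading;

// Attempt 1 style: no synchronization inside the class itself.
public sealed class Foobar
{
    private bool _isFrozen;
    private object _value;
    public void Freeze() { _isFrozen = true; }
    public void WriteValue(object val)
    {
        if (_isFrozen) throw new InvalidOperationException();
        _value = val;
    }
    public object ReadSomething() => _value;
}

public static class Publisher
{
    private static Foobar _shared;   // hypothetical hand-off point to reader threads

    public static void ConfigureAndPublish(object val)
    {
        var foo = new Foobar();       // only this thread can see foo so far
        foo.WriteValue(val);
        foo.Freeze();
        Volatile.Write(ref _shared, foo);   // release: prior writes are visible before the reference
    }

    public static Foobar TryGet() => Volatile.Read(ref _shared);  // acquire, pairs with the write
}
```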
Attempt 8:
Without knowing more about this, I can't advise on the cost of the copy.
Summary
Again I prefer using a class that has some threading guarantee or none at all. Creating a class that is only 'partially' thread safe is, IMO, dangerous.
In the words of a famous jedi master:
Either do or do not there is no try.
The same goes for thread safety. The class should either be thread safe or not. Taking this approach you are left with either using my augmentation of Attempt 5, or using Attempt 7. Given the choice, I would never recommend #7.
So my recommendation stands firmly behind a completely thread-safe version. The performance difference between the two is so infinitesimally small it's almost non-existent. The reader threads will never hit the lock, simply because of your usage scenario of having a single writer. Yet, if they do, proper behavior is still a certainty. Thus, as your code changes over time and suddenly your instance is being shared before it is frozen, you don't wind up with a race condition that crashes your program. Thread safe or not, don't be half-in, or you'll wind up with a nasty surprise someday.
My preference is all classes shared by more than one thread are one of two types:
Completely immutable.
Completely Thread-safe.
Since a popsicle list is not immutable by design it does not fit #1. Therefore if you are going to share the object across threads it should fit #2.
Hopefully all this ranting further explains my reasoning :)
_syncRoot
Many people have noticed that I skipped the use of a _syncRoot on my locking implementation. While the reasons to use _syncRoot are valid they are not always necessary. In your example usage where you have a single writer the use of lock(this) should suffice nicely without adding another heap allocation for _syncRoot.
Is the thing constructed and written to, then permanently frozen and read multiple times?
Or do you freeze and unfreeze and refreeze it multiple times?
If it's the former, then perhaps the "is frozen" check should be in the reader method not the writer method (to prevent it reading before it's frozen).
Or, if it's the latter, then the use case you need to beware of is:
Main thread invokes the writer method, finds that it's not frozen, and therefore begins to write
Before the write has finished, someone tries to freeze the object and then reads from it, while the other (main) thread is still writing
In the latter case, Google shows a lot of results for multiple reader single writer which you might find interesting.
In general, each mutable object should have precisely one clearly-defined "owner"; shared objects should be immutable. Popsicles should not be accessible by multiple threads until after they are frozen.
Personally, I don't like forms of popsicle immutability with an exposed "freeze" method. I think a cleaner approach is to have AsMutable and AsImmutable methods (each of which would simply return the object unmodified when appropriate). Such an approach can allow for more robust promises about immutability. For example, if an "unshared mutable object" is being mutated while its AsImmutable member is being called (behavior which would be contrary to the object being "unshared"), the state of the data in the copy may be indeterminate, but whatever was returned would be immutable. By contrast, if one thread froze an object and then assumed it was immutable while another thread was writing to it, the "immutable" object could end up changing after it was frozen and its values were read.
Edit
Based on further description, I would suggest having code which writes to the object do so within a monitor lock, and having the freeze routine look something like:
public Thingie Freeze(void) // Returns the object in question
{
if (isFrozen) // Private field
return this;
else
return DoFreeze();
}
Thingie DoFreeze()
{
if (Monitor.TryEnter(whatever))
{
isFrozen = true;
return this;
}
else if (isFrozen)
return this;
else
throw new InvalidOperationException("Object in use by writer");
}
The Freeze method may be called any number of times by any number of threads; it should be short enough to be inlined (though I haven't profiled it), and should thus take almost no time to execute. If the first access of the object in any thread is via the Freeze method, that should guarantee proper visibility under any reasonable memory model (even if the thread didn't see the updates to the object performed by the thread which created and originally froze it, it would perform the TryEnter, which would guarantee a memory barrier, and after that failed it would notice that the object was frozen and return it).
If code which is going to write the object acquires the lock first, an attempt to write to a frozen object could deadlock. If one would rather have such code throw an exception, one can use TryEnter and throw an exception if it can't get the lock.
The object used for locking should be something which is exclusively held by the object to be frozen. If the object to be frozen doesn't hold a purely-private reference to anything, one could either lock on this or create a private object purely for locking purposes. Note that it is safe to abandon 'entered' monitor locks without cleanup; the GC will simply forget about them, since if no references exist to a lock there's no way anybody will ever care (or could even ask) whether the lock was entered at the time it was abandoned.
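The writer side described above might look like the following sketch. Thingie, SetValue, Value and the _gate field are assumed names. It uses Monitor.TryEnter so that a write attempt after freezing throws instead of deadlocking, and it re-checks the frozen flag inside the lock because Monitor is reentrant: the thread that froze the object still holds the lock and could otherwise write.

```csharp
using System;
using System.Threading;

public sealed class Thingie
{
    private readonly object _gate = new object(); // held exclusively by this object
    private volatile bool _isFrozen;
    private int _value;

    public int Value => _value;

    public void SetValue(int v)
    {
        // TryEnter instead of lock: once frozen the gate is never released,
        // so a blocking lock here would wait forever.
        if (!Monitor.TryEnter(_gate))
            throw new InvalidOperationException("Object is frozen or in use by another writer");
        try
        {
            // Monitor is reentrant, so the freezing thread itself can reach this point.
            if (_isFrozen) throw new InvalidOperationException("Object is frozen");
            _value = v;
        }
        finally { Monitor.Exit(_gate); }
    }

    public Thingie Freeze()
    {
        if (_isFrozen) return this;
        if (Monitor.TryEnter(_gate))
        {
            // Deliberately never exited: the gate stays held so other writers always fail.
            _isFrozen = true;
            return this;
        }
        if (_isFrozen) return this;
        throw new InvalidOperationException("Object in use by writer");
    }
}
```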
I am not sure how the following approach will do in terms of cost, but it is a bit different. Threads will contend for the lock only if several of them try to write the value at the same time before it is frozen; once it is frozen, all later calls get the exception directly.
Attempt 9:
public class Foobar
{
private readonly Object _syncRoot = new Object();
private object _val;
private Boolean _isFrozen;
private Action<object> WriteValInternal;
public void Freeze() { _isFrozen = true; }
public Foobar()
{
WriteValInternal = BeforeFreeze;
}
private void BeforeFreeze(object val)
{
lock (_syncRoot)
{
if (_isFrozen == false)
{
//Write the values....
_val = val;
//...
//...
//...
//and then modify the write value function
WriteValInternal = AfterFreeze;
Freeze();
}
else
{
throw new InvalidOperationException();
}
}
}
private void AfterFreeze(object val)
{
throw new InvalidOperationException();
}
public void WriteValue(Object val)
{
WriteValInternal(val);
}
public Object ReadSomething()
{
return _val;
}
}
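If the goal of Attempt 9 is simply "the first write wins, every later write throws", the delegate swap and lock can be replaced with a single Interlocked.CompareExchange. This is a different technique from the one above, sketched under a hypothetical WriteOnce<T> name; Volatile.Write and Volatile.Read ensure that a reader which sees the published state also sees the value.

```csharp
using System;
using System.Threading;

public sealed class WriteOnce<T>
{
    private int _state; // 0 = unwritten, 1 = write in progress, 2 = published
    private T _value;

    public void WriteValue(T value)
    {
        // Only the caller that moves _state from 0 to 1 may write.
        if (Interlocked.CompareExchange(ref _state, 1, 0) != 0)
            throw new InvalidOperationException("Value has already been written");
        _value = value;
        Volatile.Write(ref _state, 2); // release: _value is visible before state 2
    }

    public bool TryRead(out T value)
    {
        if (Volatile.Read(ref _state) == 2) // acquire: pairs with the Volatile.Write above
        {
            value = _value;
            return true;
        }
        value = default(T);
        return false;
    }
}
```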
Have you checked out Lazy<T>
http://msdn.microsoft.com/en-us/library/dd642331.aspx
which uses ThreadLocal<T>?
http://msdn.microsoft.com/en-us/library/dd642243.aspx
And actually, looking further, there is a Freezable class:
http://msdn.microsoft.com/en-us/library/vstudio/ms602734(v=vs.100).aspx
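A minimal Lazy<T> sketch, with a hypothetical Config type and ConfigHolder class: with LazyThreadSafetyMode.ExecutionAndPublication the factory runs at most once, and every thread that reads Value gets the same fully constructed instance. The call counter exists only to make that observable.

```csharp
using System;
using System.Threading;

// Hypothetical immutable payload.
public sealed class Config
{
    public Config(string name, int limit) { Name = name; Limit = limit; }
    public string Name { get; }
    public int Limit { get; }
}

public static class ConfigHolder
{
    private static int _factoryCalls;
    public static int FactoryCalls => Volatile.Read(ref _factoryCalls);

    // ExecutionAndPublication: one thread runs the factory; the others wait for its result.
    private static readonly Lazy<Config> _config = new Lazy<Config>(() =>
    {
        Interlocked.Increment(ref _factoryCalls);
        return new Config("default", 5000);
    }, LazyThreadSafetyMode.ExecutionAndPublication);

    public static Config Instance => _config.Value;
}
```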
You may achieve this using PostSharp.
Take one interface:
public interface IPseudoImmutable
{
bool IsFrozen { get; }
bool Freeze();
}
Then derive your attribute from InstanceLevelAspect, like this:
/// <summary>
/// implement by divyang
/// </summary>
[Serializable]
[IntroduceInterface(typeof(IPseudoImmutable),
AncestorOverrideAction = InterfaceOverrideAction.Ignore, OverrideAction = InterfaceOverrideAction.Fail)]
public class PseudoImmutableAttribute : InstanceLevelAspect, IPseudoImmutable
{
private volatile bool isFrozen;
#region "IPseudoImmutable"
[IntroduceMember]
public bool IsFrozen
{
get
{
return this.isFrozen;
}
}
[IntroduceMember(IsVirtual = true, OverrideAction = MemberOverrideAction.Fail)]
public bool Freeze()
{
if (!this.isFrozen)
{
this.isFrozen = true;
}
return this.IsFrozen;
}
#endregion
[OnLocationSetValueAdvice]
[MulticastPointcut(Targets = MulticastTargets.Property | MulticastTargets.Field)]
public void OnValueChange(LocationInterceptionArgs args)
{
if (!this.IsFrozen)
{
args.ProceedSetValue();
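}
else
{
// Reject writes once frozen instead of silently ignoring them
throw new ImmutableException("Cannot modify a frozen object.", args.LocationName);
}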
}
}
public class ImmutableException : Exception
{
/// <summary>
/// The location name.
/// </summary>
private readonly string locationName;
/// <summary>
/// Initializes a new instance of the <see cref="ImmutableException"/> class.
/// </summary>
/// <param name="message">
/// The message.
/// </param>
public ImmutableException(string message)
: base(message)
{
}
public ImmutableException(string message, string locationName)
: base(message)
{
this.locationName = locationName;
}
public string LocationName
{
get
{
return this.locationName;
}
}
}
Then apply it to your class like this:
[PseudoImmutableAttribute]
public class MyTestClass
{
public string MyString { get; set; }
public int MyInitval { get; set; }
}
Then run it on multiple threads:
/// <summary>
/// The program.
/// </summary>
public class Program
{
/// <summary>
/// The main.
/// </summary>
/// <param name="args">
/// The args.
/// </param>
public static void Main(string[] args)
{
Console.Title = "Divyang Demo ";
var w = new Worker();
w.Run();
Console.ReadLine();
}
}
internal class Worker
{
private object SyncObject = new object();
public Worker()
{
var r = new Random();
this.ObjectOfMyTestClass = new MyTestClass { MyInitval = r.Next(500) };
}
public MyTestClass ObjectOfMyTestClass { get; set; }
public void Run()
{
Task readWork;
readWork = Task.Factory.StartNew(
action: () =>
{
for (;;)
{
System.Threading.Thread.Sleep(1000); // Task.Delay(1000) without await/Wait() does not pause
try
{
this.DoReadWork();
}
catch (Exception exception)
{
// Console.SetCursorPosition(80,80);
// Console.SetBufferSize(100,100);
Console.WriteLine("Read Exception : {0}", exception.Message);
}
}
// ReSharper disable FunctionNeverReturns
});
Task writeWork;
writeWork = Task.Factory.StartNew(
action: () =>
{
for (int i = 0; i < int.MaxValue; i++)
{
System.Threading.Thread.Sleep(1000); // Task.Delay(1000) without await/Wait() does not pause
try
{
this.DoWriteWork();
}
catch (Exception exception)
{
Console.SetCursorPosition(80, 80);
Console.SetBufferSize(100, 100);
Console.WriteLine("write Exception : {0}", exception.Message);
}
if (i == 5000)
{
((IPseudoImmutable)this.ObjectOfMyTestClass).Freeze();
}
}
});
Task.WaitAll();
}
/// <summary>
/// The do read work.
/// </summary>
public void DoReadWork()
{
// ThreadId where reading is done
var threadId = System.Threading.Thread.CurrentThread.ManagedThreadId;
// printing on screen
lock (this.SyncObject)
{
Console.SetCursorPosition(0, 0);
Console.SetBufferSize(290, 290);
Console.WriteLine("\n");
Console.WriteLine("Read Start");
Console.WriteLine("Read => Thread Id: {0} ", threadId);
Console.WriteLine("Read => this.objectOfMyTestClass.MyInitval: {0} ", this.ObjectOfMyTestClass.MyInitval);
Console.WriteLine("Read => this.objectOfMyTestClass.MyString: {0} ", this.ObjectOfMyTestClass.MyString);
Console.WriteLine("Read End");
Console.WriteLine("\n");
}
}
/// <summary>
/// The do write work.
/// </summary>
public void DoWriteWork()
{
// ThreadId where writing is done
var threadId = System.Threading.Thread.CurrentThread.ManagedThreadId;
// random number generator
var r = new Random();
var count = r.Next(15);
// new value for Int property
var tempInt = r.Next(5000);
this.ObjectOfMyTestClass.MyInitval = tempInt;
// new value for string Property
var tempString = "Randome" + r.Next(500).ToString(CultureInfo.InvariantCulture);
this.ObjectOfMyTestClass.MyString = tempString;
// printing on screen
lock (this.SyncObject)
{
Console.SetBufferSize(290, 290);
Console.SetCursorPosition(125, 25);
Console.WriteLine("\n");
Console.WriteLine("Write Start");
Console.WriteLine("Write => Thread Id: {0} ", threadId);
Console.WriteLine("Write => this.objectOfMyTestClass.MyInitval: {0} and New Value :{1} ", this.ObjectOfMyTestClass.MyInitval, tempInt);
Console.WriteLine("Write => this.objectOfMyTestClass.MyString: {0} and New Value :{1} ", this.ObjectOfMyTestClass.MyString, tempString);
Console.WriteLine("Write End");
Console.WriteLine("\n");
}
}
}
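However, this still allows the contents of properties such as arrays and lists to be changed, because the aspect only intercepts assignments to the property itself. If you add more logic to the aspect, it may work for all types of properties and fields.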
I'd do something like this, inspired by C++ movable types. Just remember not to access the object after Freeze/Thaw.
Of course, you can add a _data != null check that throws if you want to make it clear why the user gets a NullReferenceException when accessing the object after Freeze/Thaw.
public class Data
{
public string _foo;
public int _bar;
}
public class Mutable
{
private Data _data = new Data();
public Mutable() {}
internal Mutable(Data data) => _data = data; // used by Frozen.Thaw()
public string Foo { get => _data._foo; set => _data._foo = value; }
public int Bar { get => _data._bar; set => _data._bar = value; }
public Frozen Freeze()
{
var f = new Frozen(_data);
_data = null;
return f;
}
}
public class Frozen
{
private Data _data;
public Frozen(Data data) => _data = data;
public string Foo => _data._foo;
public int Bar => _data._bar;
public Mutable Thaw()
{
var m = new Mutable(_data);
_data = null;
return m;
}
}

Parallel.ForEach Taking the Same Time as foreach

All,
I am using Parallel.ForEach as follows:
private void fillEventDifferencesParallels(IProducerConsumerCollection<IEvent> events, Dictionary<string, IEvent> originalEvents)
{
Parallel.ForEach<IEvent>(events, evt =>
{
IEvent originalEventInfo = originalEvents[evt.EventID];
evt.FillDifferences(originalEventInfo);
});
}
OK, so the problem I'm having is that I have a list of 28 of these (a test sample; this should be able to scale to 200+), and the FillDifferences method is quite time-consuming (about 4s per call). So the average time for this to run in a normal foreach has been around 100-130s. When I run the same thing in parallel, it takes the same amount of time and spikes my CPU (Intel i5, 2 cores, 2 threads per core), causing the app to become sluggish while this query is running (this is running on a thread that was spawned by the GUI thread).
So my question is: what am I doing wrong that is causing this to take the same amount of time? I read that List<T> wasn't thread-safe, so I rewrote this to use IProducerConsumerCollection<T>. Are there any other pitfalls that may be causing this?
The FillDifferences method calls a static class that uses reflection to find out how many differences there are between the original and the modified object. The static class has no 'global' variables, just ones local to the methods being invoked.
Some of you wanted to see what the FillDifferences() method called. This is where it ends up ultimately:
public List<IDifferences> ShallowCompare(object orig, object changed, string currentName)
{
List<IDifferences> differences = new List<IDifferences>();
foreach (MemberInfo m in orig.GetType().GetMembers())
{
List<IDifferences> temp = null;
//Go through all MemberInfos until you find one that is a Property.
if (m.MemberType == MemberTypes.Property)
{
PropertyInfo p = (PropertyInfo)m;
string newCurrentName = "";
if (currentName != null && currentName.Length > 0)
{
newCurrentName = currentName + ".";
}
newCurrentName += p.Name;
object propertyOrig = null;
object propertyChanged = null;
//Find the property Information from the orig object
if (orig != null)
{
propertyOrig = p.GetValue(orig, null);
}
//Find the property Information from the changed object
if (changed != null)
{
propertyChanged = p.GetValue(changed, null);
}
//Send the property to find the differences, if any. This is a SHALLOW compare.
temp = objectComparator(p, propertyOrig, propertyChanged, true, newCurrentName);
}
if (temp != null && temp.Count > 0)
{
foreach (IDifferences difference in temp)
{
addDifferenceToList(differences, difference);
}
}
}
return differences;
}
I believe you may be running into the cost of thread context switching. Since these tasks are long-running, I can imagine many threads being created on the ThreadPool to handle them, with the thread count growing over time roughly like this:
0ms == 1 thread
500ms == 2 threads
1000 ms == 3 threads
1500 ms == 4 threads
2000 ms == 5 threads
2500 ms == 6 threads
3000 ms == 7 threads
3500 ms == 8 threads
4000 ms == 9 threads
By 4000ms only the first task has completed, so this process will continue. A possible solution is as follows:
System.Threading.ThreadPool.SetMaxThreads(4, 4);
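SetMaxThreads changes the limit for the whole process. An alternative, not what this answer proposes, is to cap just this loop with ParallelOptions.MaxDegreeOfParallelism. The sketch below uses hypothetical names (BoundedLoop, CompareAll), a Thread.Sleep stand-in for FillDifferences, and records the peak number of loop bodies running at once so the cap can be checked.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedLoop
{
    public static int MaxObservedConcurrency;
    private static int _running;

    public static ConcurrentBag<string> CompareAll(string[] items, int maxDegree)
    {
        var results = new ConcurrentBag<string>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxDegree };
        Parallel.ForEach(items, options, item =>
        {
            int now = Interlocked.Increment(ref _running);
            try
            {
                // Raise MaxObservedConcurrency to 'now' if it is higher (lock-free max).
                int seen;
                while (now > (seen = Volatile.Read(ref MaxObservedConcurrency)) &&
                       Interlocked.CompareExchange(ref MaxObservedConcurrency, now, seen) != seen) { }
                Thread.Sleep(20); // stand-in for the slow reflection-based comparison
                results.Add(item.ToUpperInvariant());
            }
            finally { Interlocked.Decrement(ref _running); }
        });
        return results;
    }
}
```

For the real, CPU-bound comparison, Environment.ProcessorCount is a natural cap; on the 2-core, 4-thread i5 from the question, the best case is at most about 4x over the sequential loop, and likely closer to 2x given only two physical cores.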
Looking at what it's doing, the only time your threads aren't doing anything is when the OS switches them out to give another thread a go, so what you gain by being able to run on another core is offset by the cost of all the context switches.
You'd have to chuck some logging in to find out for definite, but I suspect the bottleneck is physical threads, unless you have one somewhere else you haven't posted.
If that's true, I'd be tempted to rejig the code: have two threads, one for finding properties to compare and one for comparing them, with a common queue between them. Maybe another one to throw classes into the list and collate the results.
Could be me old time batch processing head though.
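The two-thread pipeline suggested above could be sketched with BlockingCollection<T>: one task reflects over the properties and enqueues value pairs, the other dequeues and compares them, and CompleteAdding lets the consumer's loop end. DiffPipeline, PropertyPair and the object.Equals comparison are assumptions standing in for the poster's objectComparator, and both objects are assumed to be of the same type.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Reflection;
using System.Threading.Tasks;

public sealed class PropertyPair
{
    public PropertyPair(string name, object orig, object changed) { Name = name; Orig = orig; Changed = changed; }
    public string Name { get; }
    public object Orig { get; }
    public object Changed { get; }
}

public static class DiffPipeline
{
    public static List<string> FindDifferences(object orig, object changed)
    {
        using (var queue = new BlockingCollection<PropertyPair>(boundedCapacity: 100))
        {
            // Stage 1: reflect over non-indexed properties and enqueue each pair of values.
            var producer = Task.Run(() =>
            {
                try
                {
                    foreach (PropertyInfo p in orig.GetType().GetProperties())
                        if (p.GetIndexParameters().Length == 0)
                            queue.Add(new PropertyPair(p.Name, p.GetValue(orig, null), p.GetValue(changed, null)));
                }
                finally { queue.CompleteAdding(); } // ends the consumer's enumeration
            });

            // Stage 2: compare pairs as they arrive and collect the names that differ.
            var consumer = Task.Run(() =>
            {
                var diffs = new List<string>();
                foreach (PropertyPair pair in queue.GetConsumingEnumerable())
                    if (!Equals(pair.Orig, pair.Changed))
                        diffs.Add(pair.Name);
                return diffs;
            });

            Task.WaitAll(producer, consumer);
            return consumer.Result;
        }
    }
}
```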
