My question is, in the below code, can I be sure that the instance methods will be accessing the variables I think they will, or can they be changed by another thread while I'm still working? Do closures have anything to do with this, i.e. will I be working on a local copy of the IEnumerable<T> so enumeration is safe?
To paraphrase my question, do I need any locks if I'm never writing to shared variables?
public class CustomerClass
{
    private Config cfg = (Config)ConfigurationManager.GetSection("Customer");

    public void Run()
    {
        var serviceGroups = this.cfg.ServiceDeskGroups.Select(n => n.Group).ToList();
        var groupedData = DataReader.GetSourceData().AsEnumerable().GroupBy(n => n.Field<int>("ID"));

        Parallel.ForEach<IGrouping<int, DataRow>, CustomerDataContext>(
            groupedData,
            () => new CustomerDataContext(),
            (g, _, ctx) =>
            {
                var inter = this.FindOrCreateInteraction(ctx, g.Key);
                inter.ID = g.Key;
                inter.Title = g.First().Field<string>("Title");
                this.CalculateSomeProperty(ref inter, serviceGroups);
                return ctx;
            },
            ctx => ctx.SubmitAllChanges());
    }

    private Interaction FindOrCreateInteraction(CustomerDataContext ctx, int ID)
    {
        var inter = ctx.Interactions.Where(n => n.Id == ID).SingleOrDefault();
        if (inter == null)
        {
            inter = new Interaction();
            ctx.InsertOnSubmit(inter);
        }
        return inter;
    }

    private void CalculateSomeProperty(ref Interaction inter, IEnumerable<string> serviceDeskGroups)
    {
        // Reads from the List<T> passed in. Changes the state of the ref'd object.
        if (serviceDeskGroups.Contains(inter.Group))
        {
            inter.Ours = true;
        }
    }
}
I seem to have found the answer and in the process, also the question.
The real question was whether local "variables", which turn out to actually be objects, can be trusted for concurrent access. The answer is no: if they happen to have internal state that is not handled in a thread-safe manner, all bets are off. The closure doesn't help; it just captures a reference to said object.
In my specific case - concurrent reads from an IEnumerable<T> and no writes to it - it actually is thread safe, because each call to foreach, Contains(), Where(), etc. gets a fresh new IEnumerator, which is only visible to the thread that requested it. Any other objects, however, must also be checked, one by one.
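As a minimal sketch of that conclusion (hypothetical data, not the app's code): many threads can call Contains() on one shared, unchanging list, because each call walks the list with its own enumerator:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

static class FreshEnumeratorDemo
{
    // Returns how many of 1000 parallel Contains() reads succeeded.
    public static int CountHits()
    {
        // Shared list: written once, then only read concurrently.
        var serviceGroups = new List<string> { "Desk", "Ops" };

        int hits = 0;
        Parallel.For(0, 1000, i =>
        {
            // Each Contains() call gets its own IEnumerator,
            // so concurrent reads never interfere with each other.
            if (serviceGroups.Contains("Desk"))
                Interlocked.Increment(ref hits);
        });
        return hits;
    }

    static void Main() => Console.WriteLine(CountHits()); // 1000
}
```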
So, hooray, no locks or synchronized collections for me :)
Thanks to @ebb and @Dave, although you didn't answer the question directly, you pointed me in the right direction.
If you're interested in the results, this is a run on my home PC (a quad-core) with Thread.SpinWait to simulate the processing time of a row. The real app had an improvement of almost 2X (01:03 vs 00:34) on a dual-core hyper-threaded machine with SQL Server on the local network.
Single-threaded, using foreach. I don't know why, but there is a pretty high number of cross-core context switches.
Using Parallel.ForEach, lock-free with thread-locals where needed.
Right now, from what I can tell, your instance methods don't use any member variables. That makes them stateless and therefore thread-safe. However, in that case you'd be better off marking them static, for code clarity and a slight performance benefit.
If those instance methods were using a member variable, then they'd only be as thread-safe as that variable (for example, if you used a simple List, it would not be thread-safe and you might see weird behavior). Long story short, member variables are the enemy of easy thread safety.
Here's my refactor (disclaimer: not tested). If you want to provide data to these methods, you'll stay saner if you pass it as parameters rather than keeping it in member variables:
UPDATE: You asked for a way to reference your read only list, so I've added that and removed the static tags (so that the instance variable can be shared).
public class CustomerClass
{
    private readonly IReadOnlyList<string> someReadOnlyList;

    public CustomerClass()
    {
        var tempList = new List<string>() { "string1", "string2" };
        // The list is never mutated after construction, so concurrent reads are safe.
        someReadOnlyList = tempList.AsReadOnly();
    }

    public void Run()
    {
        var groupedData = DataReader.GetSourceData().AsEnumerable().GroupBy(n => n.Field<int>("ID"));

        Parallel.ForEach<IGrouping<int, DataRow>, CustomerDataContext>(
            groupedData,
            () => new CustomerDataContext(),
            (g, _, ctx) =>
            {
                var inter = FindOrCreateInteraction(ctx, g.Key);
                inter.ID = g.Key;
                inter.Title = g.First().Field<string>("Title");
                CalculateSomeProperty(ref inter);
                return ctx;
            },
            ctx => ctx.SubmitAllChanges());
    }

    private Interaction FindOrCreateInteraction(CustomerDataContext ctx, int ID)
    {
        var query = ctx.Interactions.Where(n => n.Id == ID);
        if (query.Any())
        {
            return query.Single();
        }
        else
        {
            var inter = new Interaction();
            ctx.InsertOnSubmit(inter);
            return inter;
        }
    }

    private void CalculateSomeProperty(ref Interaction inter)
    {
        Console.WriteLine(someReadOnlyList[0]);
        // do some other stuff
    }
}
I have the following function, which is intended to "memoize" argument-less functions - meaning the function is called only once, with the same result returned every time after that.
private static Func<T> Memoize<T>(Func<T> func)
{
    var lockObject = new object();
    var value = default(T);
    var inited = false;

    return () => {
        if (inited)
            return value;

        lock (lockObject) {
            if (!inited) {
                value = func();
                inited = true;
            }
        }

        return value;
    };
}
Can I be certain that if a thread reads "inited == true" outside the lock, it will then read the "value" which was written before "inited" was set to true?
Note: Double-checked locking in .NET covers the fact that it should work; this question is mainly to check whether my implementation is correct, and maybe to get better alternatives.
No, because inited is not volatile. volatile gives you the memory release and acquire fences you need in order to establish the correct happens-before relationship.
If there's no release fence before inited is set to true, then the value may not be completely written by the time another thread reads inited and sees it as true, which could result in a half-constructed object being returned. Similarly, if there's a release fence but no corresponding acquire fence before reading inited in the first check, it's possible that the object is fully constructed, but that the CPU core that saw inited as true hasn't yet seen the memory effects of value being written (cache coherency does not necessarily mandate that the effects of consecutive writes are seen in order on other cores). This would again potentially result in a half-constructed object being returned.
This is, by the way, an instance of the already very well-documented double-checked locking pattern.
Instead of using a lambda that captures local variables (which makes the compiler generate an implicit class to hold the closed-over variables in non-volatile fields), I suggest explicitly creating your own class with a volatile field for the inited flag.
private class Memoized<T>
{
    public T value;
    public volatile bool inited;
}

private static Func<T> Memoize<T>(Func<T> func)
{
    var memoized = new Memoized<T>();

    return () => {
        if (memoized.inited)
            return memoized.value;

        lock (memoized) {
            if (!memoized.inited) {
                memoized.value = func();
                memoized.inited = true;
            }
        }

        return memoized.value;
    };
}
Of course, as others have mentioned Lazy<T> exists for this very purpose. Use it instead of rolling your own, but it's always a good idea to know the theory behind how something works.
I think you would be better off using the standard Lazy<T> class to implement the functionality you need, as in:
private static Func<T> Memoize<T>(Func<T> func)
{
    var lazyValue = new Lazy<T>(func, isThreadSafe: true);
    return () => lazyValue.Value;
}
No, that code is not safe. The compiler is free to reorder the writes to value and inited; so is the memory system. This means that another thread might see inited set to true whilst value is still at its default.
This pattern is called double-checked locking, and is discussed by Albahari under Lazy Initialization. The recommended solution is to use the built-in Lazy<T> class. An equivalent implementation would be the following:
private static Func<T> Memoize<T>(Func<T> func)
{
    var lazy = new Lazy<T>(func);
    return () => lazy.Value;
}
When an online user sends a message to an offline user, I keep those messages in a ConcurrentDictionary. Each user is running/spinning in its own Task (thread).
public static ConcurrentDictionary<int, List<MSG>> DIC_PROFILEID__MSGS = new ConcurrentDictionary...
So the method look like :
/*1*/  public static void SaveLaterMessages(MSG msg)
/*2*/  {
/*3*/      var dic = Globals.DIC_PROFILEID__MSGS;
/*4*/
/*5*/
/*6*/      lock (saveLaterMessagesLocker)
/*7*/      {
/*8*/          List<MSG> existingLst;
/*9*/          if (!dic.TryGetValue(msg.To, out existingLst))
/*10*/         {
/*11*/             existingLst = new List<MSG>();
/*12*/             dic.TryAdd(msg.To, existingLst);
/*13*/         }
/*14*/         existingLst.Add(msg);
/*15*/     }
/*16*/ }
Please notice the lock at #6. I did this because if two threads both reach #10, they will each create a new List (which is bad).
But I'm bothered by the fact that I lock "too much".
In other words, if I send a message to offline-user-20, there is no reason why someone shouldn't be able to send a message to offline-user-22 at the same time.
So I'm thinking about creating an additional dictionary of locks:
Dictionary<int, object> DicLocks = new Dictionary<int, object>();
where the int key is the user ID, and each entry is initialized with a new object().
So now my method will look like :
public static void SaveLaterMessages(MSG msg)
{
    var dic = Globals.DIC_PROFILEID__MSGS;

    lock (Globals.DicLocks[msg.To]) // changed here !!!
    {
        List<MSG> existingLst;
        if (!dic.TryGetValue(msg.To, out existingLst))
        {
            existingLst = new List<MSG>();
            dic.TryAdd(msg.To, existingLst);
        }
        existingLst.Add(msg);
    }
}
Now, users can insert messages for different offline users without interfering with each other.
Question
1) Am I right with this approach, or is there a better one?
2) I really hate locking around the ConcurrentDictionary; it feels 100% wrong. Should I make it a regular Dictionary?
ConcurrentDictionary has tools to help with exactly the situation you are in. If you switch your retrieval/creation of the existing list to a single thread-safe operation, the problem becomes much simpler:
public static void SaveLaterMessages(MSG msg)
{
    var dic = Globals.DIC_PROFILEID__MSGS;
    List<MSG> existingLst = dic.GetOrAdd(msg.To, (key) => new List<MSG>());

    lock (((ICollection)existingLst).SyncRoot)
    {
        existingLst.Add(msg);
    }
}
This attempts to get the list from the dictionary, creating a new list if it did not exist; it then locks only for the non-thread-safe operation of adding to the list, using the list object itself as the lock.
If possible, an even better option is to replace your List<MSG> with a thread-safe collection like ConcurrentQueue<MSG>, and then you won't need to perform any locks at all (whether you can do this depends on how messages are used once they are in the list). If you do need to use a List, you don't need Globals.DicLocks[msg.To] to lock on; it is perfectly acceptable to lock on the list object returned from the collection.
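A sketch of that lock-free variant (with a minimal stand-in for the question's MSG type):

```csharp
using System;
using System.Collections.Concurrent;

class MSG { public int To; public string Text; } // stand-in for the question's MSG type

static class Globals
{
    public static ConcurrentDictionary<int, ConcurrentQueue<MSG>> DIC_PROFILEID__MSGS =
        new ConcurrentDictionary<int, ConcurrentQueue<MSG>>();
}

static class Saver
{
    public static void SaveLaterMessages(MSG msg)
    {
        // GetOrAdd and Enqueue are both thread-safe, so no lock is needed.
        var queue = Globals.DIC_PROFILEID__MSGS.GetOrAdd(msg.To, _ => new ConcurrentQueue<MSG>());
        queue.Enqueue(msg);
    }
}
```

Draining the saved messages when the user comes online is then a lock-free loop of TryDequeue calls.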
One advantage you could get from a second lock object is if you are going to have lots of reads but very few writes you could use a ReaderWriterLockSlim to allow multiple concurrent readers but only one writer.
public static void SaveLaterMessages(MSG msg)
{
    var dic = Globals.DIC_PROFILEID__MSGS;
    List<MSG> existingLst = dic.GetOrAdd(msg.To, (key) => new List<MSG>());
    var lockingObj = GetLockingObject(existingLst);

    lockingObj.EnterWriteLock();
    try
    {
        existingLst.Add(msg);
    }
    finally
    {
        lockingObj.ExitWriteLock();
    }
}
private static ConcurrentDictionary<List<MSG>, ReaderWriterLockSlim> _msgLocks = new ConcurrentDictionary<List<MSG>, ReaderWriterLockSlim>();

public static ReaderWriterLockSlim GetLockingObject(List<MSG> msgList)
{
    return _msgLocks.GetOrAdd(msgList, (key) => new ReaderWriterLockSlim());
}
// Elsewhere, in multiple threads.
public MSG PeekNewestMessage(int myId)
{
    var dic = Globals.DIC_PROFILEID__MSGS;
    var list = dic[myId];
    var lockingObj = GetLockingObject(list);

    lockingObj.EnterReadLock();
    try
    {
        return list.FirstOrDefault();
    }
    finally
    {
        lockingObj.ExitReadLock();
    }
}
However I would still recommend the ConcurrentQueue<MSG> approach over this approach.
You could also wrap your list handling in a class with a singleton pattern that manages the lifetime of the per-user list and the adds/removes from it; then your ConcurrentDictionary and support code become lock-free. That list could also be a ConcurrentQueue, which further reduces the locking work you need for future adds.
http://msdn.microsoft.com/en-us/library/dd267265(v=vs.110).aspx
For high volumes, you could also use a service bus type pattern to cross thread boundaries and create a queue for each user and push messages down it for consumption when the user comes back online.
I am trying to build a subscription list. Let's take the example:
list of Publishers, each having a list of Magazines, each having a list of subscribers
Publishers --> Magazines --> Subscribers
It makes sense to use a Dictionary within a Dictionary within a Dictionary in C#. Is it possible to do this without locking the entire structure when adding/removing a subscriber, and without race conditions?
Also, the code gets messy very quickly in C#, which makes me think I am not going down the right path. Is there an easier way to do this? Here are the constructor and subscribe method:
Note: The code uses Source, Type, Subscriber instead of the names above
Source ---> Type ---> Subscriber
public class SubscriptionCollection<SourceT, TypeT, SubscriberT>
{
    // Race conditions here I'm sure! Not locking anything yet but should revisit at some point
    ConcurrentDictionary<SourceT, ConcurrentDictionary<TypeT, ConcurrentDictionary<SubscriberT, SubscriptionInfo>>> SourceTypeSubs;

    public SubscriptionCollection()
    {
        SourceTypeSubs = new ConcurrentDictionary<SourceT, ConcurrentDictionary<TypeT, ConcurrentDictionary<SubscriberT, SubscriptionInfo>>>();
    }

    public void Subscribe(SourceT sourceT, TypeT typeT, SubscriberT subT)
    {
        ConcurrentDictionary<TypeT, ConcurrentDictionary<SubscriberT, SubscriptionInfo>> typesANDsubs;
        if (SourceTypeSubs.TryGetValue(sourceT, out typesANDsubs))
        {
            ConcurrentDictionary<SubscriberT, SubscriptionInfo> subs;
            if (typesANDsubs.TryGetValue(typeT, out subs))
            {
                SubscriptionInfo subInfo;
                if (subs.TryGetValue(subT, out subInfo))
                {
                    // Subscription already exists - do nothing
                }
                else
                {
                    subs.TryAdd(subT, new SubscriptionInfo());
                }
            }
            else
            {
                // This type does not exist - first add type, then subscription
                var newType = new ConcurrentDictionary<SubscriberT, SubscriptionInfo>();
                newType.TryAdd(subT, new SubscriptionInfo());
                typesANDsubs.TryAdd(typeT, newType);
            }
        }
        else
        {
            // this source does not exist - first add source, then type, then subscriptions
            var newSource = new ConcurrentDictionary<TypeT, ConcurrentDictionary<SubscriberT, SubscriptionInfo>>();
            var newType = new ConcurrentDictionary<SubscriberT, SubscriptionInfo>();
            newType.TryAdd(subT, new SubscriptionInfo());
            newSource.TryAdd(typeT, newType);
            SourceTypeSubs.TryAdd(sourceT, newSource);
        }
    }
}
If you use ConcurrentDictionary, like you already do, you don't need locking, that's already taken care of.
But you still have to think about race conditions and how to deal with them. Fortunately, ConcurrentDictionary gives you exactly what you need. For example, if you have two threads that both try to subscribe to a source that doesn't exist yet at the same time, only one of them will succeed. But that's why TryAdd() returns whether the addition was successful. You can't just ignore its return value: if it returns false, you know some other thread already added that source, so you can retrieve its dictionary now.
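A sketch of that check-the-return-value pattern, with concrete stand-ins for the question's generic types:

```csharp
using System;
using System.Collections.Concurrent;

// Stand-in for SourceTypeSubs, with string/object in place of the generic parameters.
var sourceTypeSubs = new ConcurrentDictionary<string, ConcurrentDictionary<string, object>>();

var newSource = new ConcurrentDictionary<string, object>();
if (!sourceTypeSubs.TryAdd("source1", newSource))
{
    // Another thread won the race: discard our instance and use the one it added.
    sourceTypeSubs.TryGetValue("source1", out newSource);
}
// Either way, newSource is now the dictionary stored under "source1".
```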
Another option is to use the GetOrAdd() method. It retrieves the already existing value, creating it first if it doesn't exist yet.
I would rewrite your code like this (and make it much simpler along the way):
public void Subscribe(SourceT sourceT, TypeT typeT, SubscriberT subT)
{
    var typesAndSubs = SourceTypeSubs.GetOrAdd(sourceT,
        _ => new ConcurrentDictionary<TypeT, ConcurrentDictionary<SubscriberT, SubscriptionInfo>>());

    var subs = typesAndSubs.GetOrAdd(typeT,
        _ => new ConcurrentDictionary<SubscriberT, SubscriptionInfo>());

    subs.GetOrAdd(subT, _ => new SubscriptionInfo());
}
While I was looking at some legacy application code, I noticed it is using a string object for thread synchronization. I'm trying to resolve some thread contention issues in this program and was wondering if this could lead to some strange situations. Any thoughts?
private static string mutex = "ABC";

internal static void Foo(Rpc rpc)
{
    lock (mutex)
    {
        // do something
    }
}
Strings like that (from the code) could be "interned". This means all instances of "ABC" point to the same object. Even across AppDomains you can point to the same object (thx Steven for the tip).
If you have a lot of string-mutexes, from different locations, but with the same text, they could all lock on the same object.
The intern pool conserves string storage. If you assign a literal string constant to several variables, each variable is set to reference the same constant in the intern pool instead of referencing several different instances of String that have identical values.
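A quick illustration of what interning means in practice - compile-time string constants fold to the same interned instance, while strings built at runtime do not:

```csharp
using System;

string a = "ABC";
string b = "AB" + "C"; // constant-folded at compile time, same interned "ABC"
string c = new string(new[] { 'A', 'B', 'C' }); // a distinct runtime object

Console.WriteLine(ReferenceEquals(a, b));                // True
Console.WriteLine(ReferenceEquals(a, c));                // False
Console.WriteLine(ReferenceEquals(a, string.Intern(c))); // True
```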
It's better to use:
private static readonly object mutex = new object();
Also, since your string is not const or readonly, you can change it. So (in theory) it is possible to lock on mutex, change mutex to point to another reference, and then enter the critical section again, because the lock is now taken on a different object/reference. Example:
private static string mutex = "1";
private static string mutex2 = "1"; // for 'lock', mutex2 and mutex are the same

private static void CriticalButFlawedMethod() {
    lock (mutex) {
        mutex += "."; // Hey, now mutex points to another reference/object
        // You are free to re-enter
        ...
    }
}
To answer your question (as some others already have), there are some potential problems with the code example you provided:
private static string mutex= "ABC";
The variable mutex is not immutable.
The string literal "ABC" will refer to the same interned object reference everywhere in your application.
In general, I would advise against locking on strings. However, there is a case I've run into where it is useful to do this.
There have been occasions where I have maintained a dictionary of lock objects where the key is something unique about some data that I have. Here's a contrived example:
void Main()
{
    var a = new SomeEntity { Id = 1 };
    var b = new SomeEntity { Id = 2 };

    Task.Run(() => DoSomething(a));
    Task.Run(() => DoSomething(a));
    Task.Run(() => DoSomething(b));
    Task.Run(() => DoSomething(b));
}

ConcurrentDictionary<int, object> _locks = new ConcurrentDictionary<int, object>();

void DoSomething(SomeEntity entity)
{
    var mutex = _locks.GetOrAdd(entity.Id, id => new object());

    lock (mutex)
    {
        Console.WriteLine("Inside {0}", entity.Id);
        // do some work
    }
}
The goal of code like this is to serialize concurrent invocations of DoSomething() within the context of the entity's Id. The downside is the dictionary: the more entities there are, the larger it gets. It's also just more code to read and think about.
I think .NET's string interning can simplify things:
void Main()
{
    var a = new SomeEntity { Id = 1 };
    var b = new SomeEntity { Id = 2 };

    Task.Run(() => DoSomething(a));
    Task.Run(() => DoSomething(a));
    Task.Run(() => DoSomething(b));
    Task.Run(() => DoSomething(b));
}

void DoSomething(SomeEntity entity)
{
    lock (string.Intern("dee9e550-50b5-41ae-af70-f03797ff2a5d:" + entity.Id))
    {
        Console.WriteLine("Inside {0}", entity.Id);
        // do some work
    }
}
The difference here is that I am relying on the string interning to give me the same object reference per entity id. This simplifies my code because I don't have to maintain the dictionary of mutex instances.
Notice the hard-coded UUID string that I'm using as a namespace. This is important if I choose to adopt the same approach of locking on strings in another area of my application.
Locking on strings can be a good idea or a bad idea depending on the circumstances and the attention that the developer gives to the details.
If you need a lock associated with a string, you can create a small class that pairs the string with an object to lock on.
class LockableString
{
    public string _String;
    public object MyLock; // Provides a lock to pair with the data.

    public LockableString()
    {
        MyLock = new object();
    }
}
My 2 cents:
ConcurrentDictionary is 1.5X faster than interned strings. I did a benchmark once.
To solve the "ever-growing dictionary" problem you can use a dictionary of semaphores instead of a dictionary of objects - i.e. use ConcurrentDictionary<string, SemaphoreSlim> instead of <string, object>. Unlike lock statements, semaphores can track how many threads have locked on them, and once all the locks are released you can remove the entry from the dictionary. See this question for solutions like that: Asynchronous locking based on a key
Semaphores are even better because you can also control the concurrency level: instead of "limit to one concurrent run" you can "limit to 5 concurrent runs". Awesome free bonus, isn't it? I had to code an email service that needed to limit the number of concurrent connections to a server, and this came in very handy.
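A sketch of such a keyed-semaphore map (hypothetical helper names; the release/removal bookkeeping from the linked question is elided for brevity):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

static class KeyedLock
{
    private static readonly ConcurrentDictionary<string, SemaphoreSlim> _locks =
        new ConcurrentDictionary<string, SemaphoreSlim>();

    public static async Task RunAsync(string key, Func<Task> action, int maxConcurrency = 1)
    {
        // maxConcurrency = 1 mimics a lock; 5 would allow five concurrent runs per key.
        var sem = _locks.GetOrAdd(key, _ => new SemaphoreSlim(maxConcurrency, maxConcurrency));
        await sem.WaitAsync();
        try
        {
            await action();
        }
        finally
        {
            sem.Release();
        }
    }
}
```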
I imagine that locking on interned strings could lead to memory bloat if many unique strings are generated, since interned strings are never collected. An approach that should be more memory efficient and still solve the immediate issue is:
// Returns an object to lock on, based on a string value.
// Note: ConditionalWeakTable compares keys by reference, so the normalized
// string must be reduced to a stable reference first - hence the Intern call
// (ToLower() alone would return a fresh object, and a fresh lock, every time).
private static readonly ConditionalWeakTable<string, object> _weakTable = new ConditionalWeakTable<string, object>();

public static object GetLock(string value)
{
    if (value == null) throw new ArgumentNullException(nameof(value));
    return _weakTable.GetOrCreateValue(string.Intern(value.ToLower()));
}
OK, playing around with the .NET 4.0 Parallel Extensions in System.Threading.Tasks, I'm finding what seems like weird behavior, but I assume I'm just doing something wrong. I have an interface and a couple of implementing classes; they're kept simple for this example.
interface IParallelPipe
{
    void Process(ref BlockingCollection<Stream> stream, long stageId);
}

class A : IParallelPipe
{
    public void Process(ref BlockingCollection<Stream> stream, long stageId)
    {
        // do stuff
    }
}

class B : IParallelPipe
{
    public void Process(ref BlockingCollection<Stream> stream, long stageId)
    {
        // do stuff
    }
}
I then have my class that starts things off. This is where the problem arises. I essentially get information about which implementing class to invoke from a type passed in, call a factory to instantiate it, and then create a task with it and start it up. Shown here:
BlockingCollection<Stream> bcs = new BlockingCollection<Stream>();

foreach (Stage s in pipeline.Stages)
{
    IParallelPipe p = (IParallelPipe)Factory.GetPipe(s.type);
    Task.Factory.StartNew(() => p.Process(ref bcs, s.id));
}
In each run of this sample, pipeline.Stages contains two elements: one that gets instantiated as class A and the other as class B. This is fine; I can see in the debugger that p comes away with the two different types. However, class B never gets called; instead I get two invocations of the A.Process(...) method. Both contain the stageId that was passed in (i.e. the two invocations have different stageIds).
Now, if I separate things out a bit, just for testing, I can get things to work by doing something like this:
BlockingCollection<Stream> bcs = new BlockingCollection<Stream>();
A a = null;
B b = null;

foreach (Stage s in pipeline.Stages)
{
    IParallelPipe p = (IParallelPipe)Factory.GetPipe(s.type);
    if (p is A)
        a = (A)p;
    else
        b = (B)p;
}

Task.Factory.StartNew(() => a.Process(ref bcs, idThatINeed));
Task.Factory.StartNew(() => b.Process(ref bcs, idThatINeed));
This invokes the appropriate class!
Any thoughts???
The behaviour you're describing seems odd to me - I'd expect the right instances to be used, but potentially with the wrong stage ID - the old foreach variable capture problem. The variable s is being captured, and by the time the task factory evaluates the closure, the value of s has changed.
The capture issue is definitely a problem in your code, but it doesn't explain the wrong-instance behaviour you're seeing. Just to check: you really are declaring p within the loop, and not outside it? If p were declared outside the loop, that would explain everything.
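For what it's worth, the capture semantics can be shown deterministically with a for loop, where one loop variable is shared by every closure (note that since C# 5.0, foreach gives each iteration its own variable, so only for loops, and foreach under the compiler this question dates from, behave this way):

```csharp
using System;
using System.Collections.Generic;

var actions = new List<Action>();
for (int i = 0; i < 3; i++)
{
    // All three lambdas capture the same variable i, not its value at creation time.
    actions.Add(() => Console.WriteLine(i));
}

foreach (var action in actions)
    action(); // prints 3 three times: each closure sees the final value of i
```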
Here's the fix for the capture problem though:
BlockingCollection<Stream> bcs = new BlockingCollection<Stream>();

foreach (Stage s in pipeline.Stages)
{
    Stage copy = s;
    IParallelPipe p = (IParallelPipe)Factory.GetPipe(s.type);
    Task.Factory.StartNew(() => p.Process(ref bcs, copy.id));
}
Note that we're just taking a copy inside the loop, and capturing that copy, to get a different "instance" of the variable each time.
Alternatively, instead of capturing the stage, we could just capture the ID as that's all we need:
BlockingCollection<Stream> bcs = new BlockingCollection<Stream>();

foreach (Stage s in pipeline.Stages)
{
    long id = s.id;
    IParallelPipe p = (IParallelPipe)Factory.GetPipe(s.type);
    Task.Factory.StartNew(() => p.Process(ref bcs, id));
}
If that doesn't help, could you post a short but complete program which demonstrates the problem? That would make it a lot easier to track down.