OK, I'm playing around with the .NET 4.0 Parallel Extensions in System.Threading.Tasks. I'm seeing what looks like weird behaviour, but I assume I'm just doing something wrong. I have an interface and a couple of implementing classes; they're kept simple for this example.
interface IParallelPipe
{
void Process(ref BlockingCollection<Stream> stream, long stageId);
}
class A:IParallelPipe
{
public void Process(ref BlockingCollection<Stream> stream, long stageId)
{
//do stuff
}
}
class B:IParallelPipe
{
public void Process(ref BlockingCollection<Stream> stream, long stageId)
{
//do stuff
}
}
I then have my class that starts things off on these. This is where the problem arises. I essentially get information about what implementing class to invoke from a type passed in and then call a factory to instantiate it and then I create a task with it and start it up. Shown here:
BlockingCollection<Stream> bcs = new BlockingCollection<Stream>();
foreach (Stage s in pipeline.Stages)
{
IParallelPipe p = (IParallelPipe)Factory.GetPipe(s.type);
Task.Factory.StartNew(() => p.Process(ref bcs, s.id));
}
In each run of this in my sample, pipeline.Stages contains two elements, one that gets instantiated as class A and the other as class B. This is fine; I see it in the debugger as p coming away with the two different types. However, class B never gets called. Instead I get two invocations of the A.Process(...) method. Each contains the stageId that was passed in (i.e. the two invocations have different stageIds).
Now, if I take and separate things out a bit, just for testing I can get things to work by doing something like this:
BlockingCollection<Stream> bcs = new BlockingCollection<Stream>();
A a = null;
B b = null;
foreach (Stage s in pipeline.Stages)
{
IParallelPipe p = (IParallelPipe)Factory.GetPipe(s.type);
if(p is A)
a = (A)p; // explicit cast needed: p is typed as IParallelPipe
else
b = (B)p;
}
Task.Factory.StartNew(() => a.Process(ref bcs, idThatINeed));
Task.Factory.StartNew(() => b.Process(ref bcs, idThatINeed));
This invokes the appropriate class!
Any thoughts???
The behaviour you're describing seems odd to me - I'd expect the right instances to be used, but potentially with the wrong stage ID - the old foreach variable capture problem. The variable s is being captured, and by the time the task factory evaluates the closure, the value of s has changed.
This is definitely a problem in your code, but it doesn't explain the behaviour you're actually seeing. Just to check, you really are declaring p within the loop, and not outside it? If you were declaring p outside the loop, that would explain everything.
Here's the fix for the capture problem though:
BlockingCollection<Stream> bcs = new BlockingCollection<Stream>();
foreach (Stage s in pipeline.Stages)
{
Stage copy = s;
IParallelPipe p = (IParallelPipe)Factory.GetPipe(s.type);
Task.Factory.StartNew(() => p.Process(ref bcs, copy.id));
}
Note that we're just taking a copy inside the loop, and capturing that copy, to get a different "instance" of the variable each time.
Alternatively, instead of capturing the stage, we could just capture the ID as that's all we need:
BlockingCollection<Stream> bcs = new BlockingCollection<Stream>();
foreach (Stage s in pipeline.Stages)
{
long id = s.id;
IParallelPipe p = (IParallelPipe)Factory.GetPipe(s.type);
Task.Factory.StartNew(() => p.Process(ref bcs, id));
}
If that doesn't help, could you post a short but complete program which demonstrates the problem? That would make it a lot easier to track down.
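To make the capture problem concrete, here is a minimal sketch that is separate from the pipeline code above. The class and method names (CaptureDemo and so on) are made up for illustration. It uses a for loop because a for loop shares a single variable across iterations in every C# version, whereas foreach gives each iteration a fresh variable from C# 5 onward.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo type; not part of the original question's code.
static class CaptureDemo
{
    // Every lambda captures the same variable i, so all of them
    // observe whatever value i holds when they finally run.
    public static List<int> CaptureLoopVariable(int count)
    {
        var actions = new List<Func<int>>();
        for (int i = 0; i < count; i++)
        {
            actions.Add(() => i);
        }
        return InvokeAll(actions);
    }

    // Each iteration declares its own copy, so each lambda
    // captures a distinct variable holding that iteration's value.
    public static List<int> CaptureCopy(int count)
    {
        var actions = new List<Func<int>>();
        for (int i = 0; i < count; i++)
        {
            int copy = i;
            actions.Add(() => copy);
        }
        return InvokeAll(actions);
    }

    // Runs the delegates only after the loop has finished,
    // which is roughly what a delayed task start does.
    static List<int> InvokeAll(List<Func<int>> actions)
    {
        var results = new List<int>();
        foreach (var action in actions)
        {
            results.Add(action());
        }
        return results;
    }
}
```

With a count of 3, CaptureLoopVariable yields 3, 3, 3 while CaptureCopy yields 0, 1, 2.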
Related
I have this code:
// positions is a List<Position>
Parallel.ForEach(positions, (position) =>
{
DeterminePostPieceIsVisited(position, postPieces);
});
private void DeterminePostPieceIsVisited(Position position, IEnumerable<Postpieces> postPieces)
{
foreach (var postPiece in postPieces)
{
if (postPiece.Deliverd)
continue;
var distanceToClosestPosition = postPiece.GPS.Distance(position.GPS);
postPiece.Deliverd = distanceToClosestPosition.HasValue && IsInRadius(distanceToClosestPosition.Value);
}
}
}
I know that 50 post pieces must have the property Deliverd set to true. But when running this code, I get varying results. Sometimes I get 44; when I run it another time I get 47. The results differ from one execution to the next.
When I run this code using a plain foreach-loop I get the expected result. So I know my implementation of the method DeterminePostPieceIsVisited is correct.
Could someone explain to me why using the Parallel foreach gives me different results each time I execute this code?
You've already, I think, tried to avoid a race, but there is still one - if two threads are examining the same postPiece at the same time, they may both observe that Deliverd (sic) is false, and then both assess whether it's been delivered to position (a distinct value for each thread) and both attempt to set a value for Deliverd - and often, I would guess, one of them will be trying to set it to false. Simple fix:
private void DeterminePostPieceIsVisited(Position position, IEnumerable<Postpieces> postPieces)
{
foreach (var postPiece in postPieces)
{
if (postPiece.Deliverd)
continue;
var distanceToClosestPosition = postPiece.GPS.Distance(position.GPS);
var delivered = distanceToClosestPosition.HasValue && IsInRadius(distanceToClosestPosition.Value);
if(delivered)
postPiece.Deliverd = true;
}
}
Also, by the way:
When I run this code using a plain foreach-loop I get the expected result. So I know my implementation of the method DeterminePostPieceIsVisited is correct.
The correct thing to state would be "I know my implementation is correct for single-threaded access". What you hadn't established is that the method was safe to call from multiple threads.
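As a rough, self-contained illustration of the "only ever write true" fix, here is a sketch under simplified assumptions: the Piece type, the integer locations standing in for GPS distances, and the volatile flag are all made up for this example and are not from the original code. Because the flag can only move from false to true, the final count no longer depends on thread timing.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for a post piece; Location replaces the GPS data.
class Piece
{
    public int Location;
    public volatile bool Delivered; // volatile is an assumption made for visibility across threads
}

static class DeliveryDemo
{
    public static int CountDelivered(int[] positions, Piece[] pieces, int radius)
    {
        Parallel.ForEach(positions, position =>
        {
            foreach (var piece in pieces)
            {
                if (piece.Delivered)
                    continue;

                bool inRadius = Math.Abs(piece.Location - position) <= radius;

                // Never write false: a thread whose position is out of range
                // must not undo a "true" written by another thread.
                if (inRadius)
                    piece.Delivered = true;
            }
        });

        return pieces.Count(p => p.Delivered);
    }
}
```

For example, 50 pieces at locations 0 to 49, positions 0, 10, 20, 30 and 40, and a radius of 5 give 46 delivered pieces on every run.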
I have solved my issue with ConcurrentBag<T>. Here's what I use now:
var concurrentPostPiecesList = new ConcurrentBag<Postpieces>(postPieces);
Parallel.ForEach(positions, (position) =>
{
DeterminePostPieceIsVisited(position, concurrentPostPiecesList);
});
private void DeterminePostPieceIsVisited(Position position, ConcurrentBag<Postpieces> postPieces)
{
foreach (var postPiece in postPieces)
{
if (postPiece.Deliverd)
continue;
var distanceToClosestPosition = postPiece.GPS.Distance(position.GPS);
postPiece.Deliverd = distanceToClosestPosition.HasValue && IsInRadius(distanceToClosestPosition.Value);
}
}
After Deserializing a file with just one record
It seems that it's in an infinite loop
IndexSeries = (List<string>)bFormatter.Deserialize(fsSeriesIndexGet);
IndexSeries.ForEach(name => AddSerie(name));
//IndexSeries.ForEach(delegate(String name)
//{
// AddSerie(name);
//});
AddSerie will be executed infinitively !
You use ambiguous terms. Firstly you mention an infinite loop, and then mention that AddSerie will be executed 'infinitively' [sic]; based on this, I would think that the issue you're bringing up is not with ForEach going on and on forever (as implied/stated), but instead that AddSerie does something once that seems to be taking forever.
This could even amount to something mentioned by Joey: if you're adding an element to a list while within the context of a ForEach call, then you're always one step behind in completion, and hence won't 'complete'. However, you would get an OutOfMemoryException relatively quickly if, say, AddSerie does nothing but that; it might take longer to reach that point if AddSerie is a relatively time-consuming method. Then again, you might never get such an exception (in the context discussed) if AddSerie simply takes a dog's age to complete without contributing to the length of the list.
Showing your AddSerie code would be potentially most helpful in determining the actual issue.
If I define:
//class level declaration (in a console app)
static List<string> strings;
static void Loop(string s)
{
Console.WriteLine(s);
strings.Add(s + "!");
}
Then
static void Main(string[] args)
{
strings = new List<string> { "sample" };
strings.ForEach(s => Console.WriteLine(s));
}
executes normally, outputting a single string, while
static void Main(string[] args)
{
strings = new List<string> { "sample" };
strings.ForEach(s => Loop(s));
}
loops indefinitely, adding '!'s in the process, and
static void Main(string[] args)
{
strings = new List<string> { "sample" };
foreach (string s in strings)
{
Loop(s);
}
}
throws an InvalidOperationException (Collection was modified; enumeration operation may not execute), which, in my opinion is the correct behavior. Why the List.ForEach method allows the list to be changed by the action, I do not know, but would like to find out :)
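If the goal really is to append to the list while walking over its current contents, one common way around both behaviours is to iterate over a snapshot. This is only a sketch; SnapshotLoopDemo and AppendMarked are names invented for this example.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo type illustrating iteration over a copy.
static class SnapshotLoopDemo
{
    public static List<string> AppendMarked(List<string> strings)
    {
        // ToList() copies the current elements, so adding to the original
        // list neither extends the loop nor invalidates the enumerator.
        foreach (string s in strings.ToList())
        {
            strings.Add(s + "!");
        }
        return strings;
    }
}
```

Starting from a list holding only "sample", this finishes after one pass and leaves "sample" and "sample!" in the list.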
I am trying to build a subscription list. Let's take the example:
list of Publishers, each having a list of Magazines, each having a list of subscribers
Publishers --> Magazines --> Subscribers
It makes sense to use a Dictionary within a Dictionary within a Dictionary in C#. Is it possible to add or remove a subscriber without locking the entire structure and without race conditions?
Also the code gets messy very quickly in C# which makes me think I am not going down the right path. Is there an easier way to do this? Here are the constructor and subscribe method:
Note: The code uses Source, Type, Subscriber instead of the names above
Source ---> Type ---> Subscriber
public class SubscriptionCollection<SourceT, TypeT, SubscriberT>
{
// Race conditions here I'm sure! Not locking anything yet but should revisit at some point
ConcurrentDictionary<SourceT, ConcurrentDictionary<TypeT, ConcurrentDictionary<SubscriberT, SubscriptionInfo>>> SourceTypeSubs;
public SubscriptionCollection()
{
SourceTypeSubs = new ConcurrentDictionary<SourceT, ConcurrentDictionary<TypeT, ConcurrentDictionary<SubscriberT, SubscriptionInfo>>>();
}
public void Subscribe(SourceT sourceT, TypeT typeT, SubscriberT subT) {
ConcurrentDictionary<TypeT, ConcurrentDictionary<SubscriberT, SubscriptionInfo>> typesANDsubs;
if (SourceTypeSubs.TryGetValue(sourceT, out typesANDsubs))
{
ConcurrentDictionary<SubscriberT, SubscriptionInfo> subs;
if (typesANDsubs.TryGetValue(typeT, out subs))
{
SubscriptionInfo subInfo;
if (subs.TryGetValue(subT, out subInfo))
{
// Subscription already exists - do nothing
}
else
{
subs.TryAdd(subT, new SubscriptionInfo());
}
}
else
{
// This type does not exist - first add type, then subscription
var newType = new ConcurrentDictionary<SubscriberT, SubscriptionInfo>();
newType.TryAdd(subT, new SubscriptionInfo());
typesANDsubs.TryAdd(typeT, newType);
}
}
else
{
// this source does not exist - first add source, then type, then subscriptions
var newSource = new ConcurrentDictionary<TypeT, ConcurrentDictionary<SubscriberT, SubscriptionInfo>>();
var newType = new ConcurrentDictionary<SubscriberT, SubscriptionInfo>();
newType.TryAdd(subT, new SubscriptionInfo());
newSource.TryAdd(typeT, newType);
SourceTypeSubs.TryAdd(sourceT, newSource);
}
}
}
If you use ConcurrentDictionary, like you already do, you don't need locking, that's already taken care of.
But you still have to think about race conditions and how to deal with them. Fortunately, ConcurrentDictionary gives you exactly what you need. For example, if two threads both try to subscribe to a source that doesn't exist yet at the same time, only one of them will succeed in adding it. That's why TryAdd() returns whether the addition was successful. You can't just ignore its return value. If it returns false, you know some other thread already added that source, so you can retrieve the existing dictionary now.
Another option is to use the GetOrAdd() method. It retrieves the existing value, or creates and adds it if it doesn't exist yet.
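A small sketch of what honouring the TryAdd() return value might look like. The helper name GetOrCreateInner and the string/int type arguments are invented for this example, and it assumes entries are never removed from the outer dictionary.

```csharp
using System.Collections.Concurrent;

// Hypothetical helper; not part of the original SubscriptionCollection.
static class TryAddDemo
{
    public static ConcurrentDictionary<string, int> GetOrCreateInner(
        ConcurrentDictionary<string, ConcurrentDictionary<string, int>> outer,
        string key)
    {
        var candidate = new ConcurrentDictionary<string, int>();

        // TryAdd returns true only for the thread whose value was stored.
        if (outer.TryAdd(key, candidate))
            return candidate;

        // Another thread won the race: discard our candidate and use theirs.
        // Safe only under the stated assumption that nothing removes keys.
        return outer[key];
    }
}
```

Calling the helper twice with the same key returns the same inner dictionary both times, so no subscriber ends up in a dictionary that was silently thrown away.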
I would rewrite your code like this (and make it much simpler along the way):
public void Subscribe(SourceT sourceT, TypeT typeT, SubscriberT subT)
{
var typesAndSubs = SourceTypeSubs.GetOrAdd(sourceT,
_ => new ConcurrentDictionary<TypeT, ConcurrentDictionary<SubscriberT, SubscriptionInfo>>());
var subs = typesAndSubs.GetOrAdd(typeT,
_ => new ConcurrentDictionary<SubscriberT, SubscriptionInfo>());
subs.GetOrAdd(subT, _ => new SubscriptionInfo());
}
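To see the GetOrAdd() approach hold up under contention, here is a self-contained sketch with concrete key types. The Subscriptions class, the string and int keys, and the bool placeholder used instead of SubscriptionInfo are assumptions for illustration only. The value factory may run more than once when threads race, but only one result is stored and every caller gets that stored instance back.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical, simplified version of the nested subscription structure.
class Subscriptions
{
    private readonly ConcurrentDictionary<string,
        ConcurrentDictionary<string, ConcurrentDictionary<int, bool>>> sources =
        new ConcurrentDictionary<string,
            ConcurrentDictionary<string, ConcurrentDictionary<int, bool>>>();

    public void Subscribe(string source, string type, int subscriber)
    {
        var types = sources.GetOrAdd(source,
            _ => new ConcurrentDictionary<string, ConcurrentDictionary<int, bool>>());
        var subs = types.GetOrAdd(type,
            _ => new ConcurrentDictionary<int, bool>());
        subs.GetOrAdd(subscriber, _ => true);
    }

    public int CountSubscribers(string source, string type)
    {
        return sources[source][type].Count;
    }
}

static class SubscriptionDemo
{
    // Subscribes many distinct subscribers in parallel and reports how many were kept.
    public static int SubscribeInParallel(int subscriberCount)
    {
        var subscriptions = new Subscriptions();
        Parallel.For(0, subscriberCount,
            id => subscriptions.Subscribe("publisher", "magazine", id));
        return subscriptions.CountSubscribers("publisher", "magazine");
    }
}
```

Every one of the distinct subscriber ids should survive, so SubscribeInParallel(1000) is expected to return 1000.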
My question is, in the below code, can I be sure that the instance methods will be accessing the variables I think they will, or can they be changed by another thread while I'm still working? Do closures have anything to do with this, i.e. will I be working on a local copy of the IEnumerable<T> so enumeration is safe?
To paraphrase my question, do I need any locks if I'm never writing to shared variables?
public class CustomerClass
{
private Config cfg = (Config)ConfigurationManager.GetSection("Customer");
public void Run()
{
var serviceGroups = this.cfg.ServiceDeskGroups.Select(n => n.Group).ToList();
var groupedData = DataReader.GetSourceData().AsEnumerable().GroupBy(n => n.Field<int>("ID"));
Parallel.ForEach<IGrouping<int, DataRow>, CustomerDataContext>(
groupedData,
() => new CustomerDataContext(),
(g, _, ctx) =>
{
var inter = this.FindOrCreateInteraction(ctx, g.Key);
inter.ID = g.Key;
inter.Title = g.First().Field<string>("Title");
this.CalculateSomeProperty(ref inter, serviceGroups);
return ctx;
},
ctx => ctx.SubmitAllChanges());
}
private Interaction FindOrCreateInteraction(CustomerDataContext ctx, int ID)
{
var inter = ctx.Interactions.Where(n => n.Id == ID).SingleOrDefault();
if (inter == null)
{
inter = new Interaction();
ctx.InsertOnSubmit(inter);
}
return inter;
}
private void CalculateSomeProperty(ref Interaction inter, IEnumerable<string> serviceDeskGroups)
{
// Reads from the List<T> class instance variable. Changes the state of the ref'd object.
if (serviceDeskGroups.Contains(inter.Group))
{
inter.Ours = true;
}
}
}
I seem to have found the answer and in the process, also the question.
The real question was whether local "variables", that turn out to be actually objects, can be trusted for concurrent access. The answer is no, if they happen to have internal state that is not handled in a thread-safe manner, all bets are off. The closure doesn't help, it just captures a reference to said object.
In my specific case - concurrent reads from IEnumerable<T> and no writes to it, it is actually thread safe, because each call to foreach, Contains(), Where(), etc. gets a fresh new IEnumerator, which is only visible from the thread that requested it. Any other objects, however, must also be checked, one by one.
So, hooray, no locks or synchronized collections for me :)
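A minimal sketch of that situation: many threads reading one shared list that nobody modifies. The names SharedReadDemo and CountOurs are invented for this example. List<T> is documented as safe for multiple concurrent readers as long as the collection is not changed while they read; the shared counter, which is written to, still needs Interlocked.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical demo type; the group names are plain strings here.
static class SharedReadDemo
{
    public static int CountOurs(IEnumerable<string> candidates, List<string> serviceGroups)
    {
        int ours = 0;

        Parallel.ForEach(candidates, group =>
        {
            // Read-only access to the shared list: no lock needed
            // provided no thread modifies serviceGroups meanwhile.
            if (serviceGroups.Contains(group))
            {
                // The counter is shared and written, so it must be updated atomically.
                Interlocked.Increment(ref ours);
            }
        });

        return ours;
    }
}
```

For candidates "a", "b", "x", "a" and service groups "a", "b", the result is 3 regardless of scheduling.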
Thanks to @ebb and @Dave; although you didn't answer the question directly, you pointed me in the right direction.
If you're interested in the results, this is a run on my home PC (a quad-core) with Thread.SpinWait to simulate the processing time of a row. The real app had an improvement of almost 2X (01:03 vs 00:34) on a dual-core hyper-threaded machine with SQL Server on the local network.
Single-threaded, using foreach. I don't know why, but there is a pretty high number of cross-core context switches.
Using Parallel.ForEach, lock-free with thread-locals where needed.
Right now, from what I can tell, your instance methods are not using any member variables. That makes them stateless and therefore threadsafe. However, in that same case, you'd be better off marking them "static" for code clarity and a slight performance benefit.
If those instance methods were using a member variable, then they'd only be as threadsafe as that variable (for example, if you used a simple list, it would not be threadsafe and you may see weird behavior). Long story short, member variables are the enemy of easy thread safety.
Here's my refactor (disclaimer: not tested). If you want to provide data to these methods, you'll stay saner if you pass it in as parameters rather than keeping it in member variables:
UPDATE: You asked for a way to reference your read only list, so I've added that and removed the static tags (so that the instance variable can be shared).
public class CustomerClass
{
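// Non-generic System.Collections.IList: that is what ArrayList.Synchronized returns for an IList argument
private IList someReadOnlyList;
public CustomerClass(){
List<string> tempList = new List<string>() { "string1", "string2" };
someReadOnlyList = ArrayList.Synchronized(tempList);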
}
public void Run()
{
var groupedData = DataReader.GetSourceData().AsEnumerable().GroupBy(n => n.Field<int>("ID"));
Parallel.ForEach<IGrouping<int, DataRow>, CustomerDataContext>(
groupedData,
() => new CustomerDataContext(),
(g, _, ctx) =>
{
var inter = FindOrCreateInteraction(ctx, g.Key);
inter.ID = g.Key;
inter.Title = g.First().Field<string>("Title");
CalculateSomeProperty(ref inter);
return ctx;
},
ctx => ctx.SubmitAllChanges());
}
private Interaction FindOrCreateInteraction(CustomerDataContext ctx, int ID)
{
var query = ctx.Interactions.Where(n => n.Id == ID);
if (query.Any())
{
return query.Single();
}
else
{
var inter = new Interaction();
ctx.InsertOnSubmit(inter);
return inter;
}
}
private void CalculateSomeProperty(ref Interaction inter)
{
Console.WriteLine(someReadOnlyList[0]);
//do some other stuff
}
}
I asked a question about building a custom thread-safe generic list. Now I am trying to unit test it and I have absolutely no idea how to do that. Since the lock happens inside the ThreadSafeList class, I am not sure how to make the list lock for a period of time while I try to mimic multiple Add calls. Thanks.
Can_add_one_item_at_a_time
[Test]
public void Can_add_one_item_at_a_time() //this test won't pass
{
//I am not sure how to do this test...
var list = new ThreadSafeList<string>();
//somehow need to call lock and sleep inside list instance
//say somehow list locks for 1 sec
var ta = new Thread(x => list.Add("a"));
ta.Start(); //does it need to abort, say before 1 sec, if locked
var tb = new Thread(x => list.Add("b"));
tb.Start(); //does it need to abort, say before 1 sec, if locked
//it involves using GetSnapshot()
//which is bad idea for unit testing I think
var snapshot = list.GetSnapshot();
Assert.IsFalse(snapshot.Contains("a"), "Should not contain a.");
Assert.IsFalse(snapshot.Contains("b"), "Should not contain b.");
}
Snapshot_should_be_point_of_time_only
[Test]
public void Snapshot_should_be_point_of_time_only()
{
var list = new ThreadSafeList<string>();
var ta = new Thread(x => list.Add("a"));
ta.Start();
ta.Join();
var snapshot = list.GetSnapshot();
var tb = new Thread(x => list.Add("b"));
tb.Start();
var tc = new Thread(x => list.Add("c"));
tc.Start();
tb.Join();
tc.Join();
Assert.IsTrue(snapshot.Count == 1, "Snapshot should only contain 1 item.");
Assert.IsFalse(snapshot.Contains("b"), "Should not contain b.");
Assert.IsFalse(snapshot.Contains("c"), "Should not contain c.");
}
Instance method
public ThreadSafeList<T> Instance<T>()
{
return new ThreadSafeList<T>();
}
Let's look at your first test, Can_add_one_item_at_a_time.
First of all, your exit conditions don't make sense. Both items should be added, just one at a time. So of course your test will fail.
You also don't need to make a snapshot; remember, this is a test, nothing else is going to be touching the list while your test is running.
Last but not least, you need to make sure that you aren't trying to evaluate your exit conditions until all of the threads have actually finished. Simplest way is to use a counter and a wait event. Here's an example:
[Test]
public void Can_add_from_multiple_threads()
{
const int MaxWorkers = 10;
var list = new ThreadSafeList<int>(MaxWorkers);
int remainingWorkers = MaxWorkers;
var workCompletedEvent = new ManualResetEvent(false);
for (int i = 0; i < MaxWorkers; i++)
{
int workerNum = i; // Make a copy of local variable for next thread
ThreadPool.QueueUserWorkItem(s =>
{
list.Add(workerNum);
if (Interlocked.Decrement(ref remainingWorkers) == 0)
workCompletedEvent.Set();
});
}
workCompletedEvent.WaitOne();
workCompletedEvent.Close();
for (int i = 0; i < MaxWorkers; i++)
{
Assert.IsTrue(list.Contains(i), "Element was not added");
}
Assert.AreEqual(MaxWorkers, list.Count,
"List count does not match worker count.");
}
Now this does carry the possibility that the Add happens so quickly that no two threads will ever attempt to do it at the same time. No Refunds No Returns partially explained how to insert a conditional delay. I would actually define a special testing flag, instead of DEBUG. In your build configuration, add a flag called TEST, then add this to your ThreadSafeList class:
public class ThreadSafeList<T>
{
// snip fields
public void Add(T item)
{
lock (sync)
{
TestUtil.WaitStandardThreadDelay();
innerList.Add(item);
}
}
// snip other methods/properties
}
static class TestUtil
{
[Conditional("TEST")]
public static void WaitStandardThreadDelay()
{
Thread.Sleep(1000);
}
}
This will cause the Add method to wait 1 second before actually adding the item as long as the build configuration defines the TEST flag. The entire test should take at least 10 seconds; if it finishes any faster than that, something's wrong.
With that in mind, I'll leave the second test up to you. It's similar.
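For orientation only, here is one way the snapshot check could be arranged. The ThreadSafeList<T> below is a bare-bones stand-in written for this sketch (a lock around a List<T>, with GetSnapshot returning a copy); the real class from the earlier question may differ. The point is simply to join both writer threads before inspecting the snapshot.

```csharp
using System.Collections.Generic;
using System.Threading;

// Hypothetical minimal implementation, assumed for this example only.
class ThreadSafeList<T>
{
    private readonly object sync = new object();
    private readonly List<T> innerList = new List<T>();

    public void Add(T item)
    {
        lock (sync) { innerList.Add(item); }
    }

    // Returns a copy, so later Add calls cannot affect it.
    public List<T> GetSnapshot()
    {
        lock (sync) { return new List<T>(innerList); }
    }
}

static class SnapshotTestSketch
{
    public static List<string> SnapshotThenAdd()
    {
        var list = new ThreadSafeList<string>();
        list.Add("a");

        var snapshot = list.GetSnapshot();

        var tb = new Thread(() => list.Add("b"));
        var tc = new Thread(() => list.Add("c"));
        tb.Start();
        tc.Start();

        // Wait for both writers before looking at the snapshot.
        tb.Join();
        tc.Join();

        return snapshot;
    }
}
```

In a test, the returned snapshot would be asserted to contain exactly one item, "a", and neither "b" nor "c".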
You will need to insert some TESTONLY code that adds a delay in your lock. You can create a function like this:
[Conditional("DEBUG")]
void SleepForABit(int delay) { Thread.Sleep(delay); }
and then call it in your class. The Conditional attribute ensures it is only called in DEBUG builds, so you can leave it in your compiled code.
Write something which consistently delays 100 ms or so and something that never waits, and let 'em slug it out.
You might want to take a look at Chess. It's a program specifically designed to find race conditions in multi-threaded code.