Basically what I want to do is this pseudocode:
List<DatabaseRecord> records;
List<ChangedItem> changedItems;

Parallel.ForEach<DatabaseRecord>(records, (item, loopState) =>
{
    if (item.HasChanged)
    {
        lock (changedItems)
        {
            changedItems.Add(new ChangedItem(item));
        }
    }
});
But what I'm worried about is locking on the changedItems. While it works, I have heard that it has to serialize the locked object over and over. Is there a better way of doing this?
Why don't you use PLINQ instead? No locking needed:
changedItems = records.AsParallel()
.Where(x => x.HasChanged)
.Select(x => new ChangedItem(x))
.ToList();
Since you are projecting into a new list of ChangedItem and do not have any side effects this would be the way to go in my opinion.
Could you use a concurrent collection for your changedItems, something like ConcurrentQueue<T>? Then you wouldn't need to lock at all.
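As a sketch, the question's loop with the lock replaced by a ConcurrentQueue<T> (DatabaseRecord and ChangedItem are the question's own types; the final ToList call is an assumption about how the results are consumed):

```csharp
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

var changedItems = new ConcurrentQueue<ChangedItem>();

Parallel.ForEach(records, item =>
{
    if (item.HasChanged)
    {
        // Enqueue is thread-safe; no explicit lock required.
        changedItems.Enqueue(new ChangedItem(item));
    }
});

// If a List<ChangedItem> is needed afterwards:
var result = changedItems.ToList();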
Update:
With regard to ConcurrentQueue, Enqueue doesn't block the thread to keep the operation thread-safe; it stays in user mode, using a SpinWait:
public void Enqueue(T item)
{
    SpinWait wait = new SpinWait();
    while (!this.m_tail.TryAppend(item, ref this.m_tail))
    {
        wait.SpinOnce();
    }
}
I don't think that locking on the list will cause the list to serialize/deserialize as locking takes place on a private field available on all objects (syncBlockIndex). However, the recommended way to go about locking is to create a private field that you will specifically use for locking:
private readonly object _lock = new object();
This is because you have control over what you lock on. If you publish access to your list via a property, then code outside of your control might take a lock on that object and thus introduce a deadlock situation.
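A minimal sketch of that pattern: the collection and its lock object are both private, so no code outside the class can ever take the lock (the type names here are illustrative, not from the question):

```csharp
using System.Collections.Generic;

public class ChangeCollector
{
    // Only this class can ever lock on _lock, so no outside
    // code can introduce a deadlock against it.
    private readonly object _lock = new object();
    private readonly List<ChangedItem> _changedItems = new List<ChangedItem>();

    public void Add(ChangedItem item)
    {
        lock (_lock)
        {
            _changedItems.Add(item);
        }
    }
}
```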
With regards to PLINQ, I think deciding to use it depends on what your host and load are like. For example, in ASP.NET, PLINQ is more processor-hungry, which gets the work done more quickly but at the expense of taking processing power away from serving other web requests. The syntax is admittedly a lot cleaner.
It seems that you are using this code in a single thread?
If it's single-threaded, no lock is needed.
Does it work correctly when you remove the lock (changedItems) line?
The code that @BrokenGlass posted is clearer and easier to understand.
Related
I would like to learn the best way to use locking in a Parallel.ForEach. Should I lock the whole code block inside the iteration, or should I lock only the object I want to keep thread-safe before doing any processing?
for example:
Parallel.ForEach(list, item =>
{
    lock (secondList)
    {
        // consider other processing happening here
        if (item.Active)
            secondList.Add(item);
    }
});
or
Parallel.ForEach(list, item =>
{
    // consider other processing happening here
    if (item.Active)
    {
        lock (secondList)
            secondList.Add(item);
    }
});
If your application is concurrent (parallelism is one type of concurrency) and you want a thread-safe collection, there is no reason to lock collections on your own. There is a bunch of concurrent collections provided by Microsoft in System.Collections.Concurrent
Thread-safe Collections
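For illustration, a sketch of the question's loop with the lock replaced by a ConcurrentBag<T> (Item stands in for whatever element type the lists actually hold):

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

var secondList = new ConcurrentBag<Item>();

Parallel.ForEach(list, item =>
{
    // other processing happens here

    if (item.Active)
        secondList.Add(item); // Add is thread-safe; no lock needed
});
```

Note that ConcurrentBag is unordered; if the order of secondList matters, a different approach (such as PLINQ with AsOrdered) would be needed.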
Parallel.ForEach is a way to try to get more parallelism into your code. lock tends to reduce parallelism in your code[1]. As such, it's rarely correct to want to combine them[2].
As Olegl suggested, Concurrent Collections could be one way to go here to avoid the lock.
Another interesting approach would be to use PLINQ here instead of Parallel.ForEach. It's 2019 already, what's interesting about writing another loop?
This would do something like this instead:
secondList.AddRange(list.AsParallel().Where(item =>
{
    // consider other processing happening here
    return item.Active;
}));
This allows you to keep your non-thread-safe secondList collection but still not worry about locks - because it's your own existing thread calling AddRange that ends up consuming that IEnumerable<T> that PLINQ offers; so only that one thread is adding items to the collection.
PLINQ tries to tune its buffering options but may not do a good enough job, depending on the size of the input list and how many threads it chooses to use. If you're unhappy with the speedup it achieves (or it doesn't achieve any), try playing with the WithXxx methods it offers before writing it off.
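For example, a couple of those WithXxx knobs applied to the query above (the specific values are arbitrary illustrations, not recommendations):

```csharp
using System;
using System.Linq;

secondList.AddRange(
    list.AsParallel()
        .WithDegreeOfParallelism(Environment.ProcessorCount) // cap worker threads
        .WithMergeOptions(ParallelMergeOptions.NotBuffered)  // stream results as they arrive
        .Where(item => item.Active));
```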
If I had to pick between your two examples (assuming that they're both otherwise correct), I'd choose option 2, because it does less work whilst holding a lock that is being hotly contested by all of the other workers.
[1] Unless you know that all of the locks that will be requested are fine-grained enough that no two parallel threads will attempt to acquire the same lock. But if we know that, why are we using locks again?
[2] And so I'll go out on a limb and say it's "always" incorrect to combine them when all parallel locks are on the same lock object, unless there's significant processing happening in parallel outside of the lock.
For example, usages like this:
public static T CastTo<T>(this ArrayOfKeyValueOfstringstringKeyValueOfstringstring[] item)
{
    var obj = Activator.CreateInstance(typeof(T), true);
    var padlock = new object();

    Parallel.ForEach(typeof(T).GetProperties(BindingFlags.Instance | BindingFlags.Public), prop =>
    {
        lock (padlock)
        {
            if (!prop.TryGetAttribute<GldnhrnFieldAttribute>(out var fieldAttribute))
                return;

            var code = fieldAttribute?.Code;
            if (string.IsNullOrEmpty(code)) return;

            SetPropertyValue(item, obj, prop);
        }
    });

    return (T)obj;
}
As you can see, I want to cast my data to a class here. Same question with a different code block: should I lock the whole block, or only around the call to the SetPropertyValue method?
public static T CastTo<T>(this ArrayOfKeyValueOfstringstringKeyValueOfstringstring[] item)
{
    var obj = Activator.CreateInstance(typeof(T), true);
    var padlock = new object();

    Parallel.ForEach(typeof(T).GetProperties(BindingFlags.Instance | BindingFlags.Public), prop =>
    {
        if (!prop.TryGetAttribute<GldnhrnFieldAttribute>(out var fieldAttribute))
            return;

        var code = fieldAttribute?.Code;
        if (string.IsNullOrEmpty(code)) return;

        lock (padlock)
            SetPropertyValue(item, obj, prop);
    });

    return (T)obj;
}
I am upgrading some legacy WinForms code and I am trying to figure out what the "right way" as of .NET 4.6.1 to refactor the following.
The current code is doing a tight while(true) loop while checking a bool property. This property puts a lock() on a generic List<T> and then returns true if it has no items (list.Count == 0).
The loop has the dreaded Application.DoEvents() in it to make sure the message pump continues processing, otherwise it would lock up the application.
Clearly, this needs to go.
My confusion is how to start a basic refactoring where it still can check the queue length, while executing on a thread and not blowing out the CPU for no reason. A delay between checks here is fine, even a "long" one like 100ms+.
I was going to go with an approach that makes the method async and lets a Task run to do the check:
await Task.Run(() => KeepCheckingTheQueue());
Of course, this keeps me in the situation of the method needing to ... loop to check the state of the queue.
Between the waiting, awaiting, and various other methods that can be used to move this stuff to the thread pool ... any suggestion on how best to handle this?
What I need is the best way to "poll" a boolean member (or property) while freeing the UI, without the DoEvents().
The answer you're asking for:
private async Task WaitUntilAsync(Func<bool> func)
{
    while (!func())
        await Task.Delay(100);
}

await WaitUntilAsync(() => list.Count == 0);
However, polling like this is a really poor approach. If you can describe the actual problem your code is solving, then you can get better solutions.
For example, if the list represents some queue of work, and your code is wanting to asynchronously wait until it's done, then this can be better coded using an explicit signal (e.g., TaskCompletionSource<T>) or a true producer/consumer queue (e.g., TPL Dataflow).
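As a rough sketch of the explicit-signal idea with TaskCompletionSource (all names here are hypothetical; the real design depends on what the queue actually does):

```csharp
using System.Threading;
using System.Threading.Tasks;

class WorkTracker
{
    private int _pending;

    // RunContinuationsAsynchronously avoids running awaiters inline
    // on whichever worker thread happens to finish last.
    private readonly TaskCompletionSource<bool> _drained =
        new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);

    public void ItemStarted() => Interlocked.Increment(ref _pending);

    public void ItemFinished()
    {
        if (Interlocked.Decrement(ref _pending) == 0)
            _drained.TrySetResult(true); // signal any awaiter, exactly once
    }

    // The UI can 'await tracker.WhenDrained;' instead of polling.
    public Task WhenDrained => _drained.Task;
}
```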
It's generally never a good idea for client code to worry about locking a collection (or to sprinkle your code with lock() blocks everywhere) before querying it. It's best to encapsulate that complexity.
Instead I recommend using one of the .NET concurrent collections such as ConcurrentBag. No need for creating a Task which is somewhat expensive.
If your collection does not change much you might want to consider one of the immutable thread-safe collections such as ImmutableList<>.
EDIT: Upon reading your comments, I suggest you use a WinForms Timer, the Application.Idle event, or a BackgroundWorker. The problem with async is that you still need to call it periodically. Using a timer or an idle callback has the benefit of running on the GUI thread.
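A sketch of the WinForms Timer variant (QueueIsEmpty stands in for the existing lock-protected property from the question):

```csharp
// A minimal sketch, assuming a form with a QueueIsEmpty property.
var timer = new System.Windows.Forms.Timer { Interval = 100 };
timer.Tick += (sender, e) =>
{
    if (QueueIsEmpty)
    {
        timer.Stop();
        // Continue the work here; no Invoke is needed because
        // Tick fires on the GUI thread.
    }
};
timer.Start();
```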
Depending on the use case, you could start a background thread or a background worker. Or maybe even a timer.
Those are executed in a different thread, and are therefore not locking the execution of your other form related code. Invoke the original thread if you have to perform actions on the UI thread.
I would also recommend avoiding locking as much as possible, for example by doing a check before actually locking:
if (list.Count == 0)
{
    lock (lockObject)
    {
        if (list.Count == 0)
        {
            // execute your code here
        }
    }
}
That way you are only locking if you really need to and you avoid unnecessary blocking of your application.
I think what you're after here is the ability to await Task.Yield().
class TheThing
{
    private readonly List<int> _myList = new List<int>();

    public async Task WaitForItToNotBeEmpty()
    {
        bool hadItems;
        do
        {
            await Task.Yield();
            lock (_myList) // Other answers have touched upon this locking concern
                hadItems = _myList.Count != 0;
        } while (!hadItems);
    }

    // ...
}
This is more of a conceptual question. I was wondering whether using a lock inside a Parallel.ForEach<> loop would take away the benefits of parallelizing the loop.
Here is some sample code where I have seen it done.
Parallel.ForEach<KeyValuePair<string, XElement>>(binReferences.KeyValuePairs, reference =>
{
    lock (fileLockObject)
    {
        if (fileLocks.ContainsKey(reference.Key) == false)
        {
            fileLocks.Add(reference.Key, new object());
        }
    }

    RecursiveBinUpdate(reference.Value, testPath, reference.Key, maxRecursionCount, ref recursionCount);

    lock (fileLocks[reference.Key])
    {
        reference.Value.Document.Save(reference.Key);
    }
});
Where fileLockObject and fileLocks are as follows.
private static object fileLockObject = new object();
private static Dictionary<string, object> fileLocks = new Dictionary<string, object>();
Does this technique completely make the loop not parallel?
I would like to see your thoughts on this.
It means all of the work inside of the lock can't be done in parallel. This greatly harms the performance here, yes. Since the entire body is not all locked (and locked on the same object) there is still some parallelization here though. Whether the parallelization that you do get adds enough benefit to surpass the overhead that comes with managing the threads and synchronizing around the locks is something you really just need to test yourself with your specific data.
That said, it looks like what you're doing (at least in the first locked block, which is the one I'd be more concerned with, as every thread locks on the same object there) is guarding access to a Dictionary. You can instead use a ConcurrentDictionary, which is specifically designed to be used from multiple threads and will minimize the amount of synchronization that needs to be done.
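A sketch of how the first locked block could collapse into a single GetOrAdd call (using the same field names as the question):

```csharp
using System.Collections.Concurrent;

private static ConcurrentDictionary<string, object> fileLocks =
    new ConcurrentDictionary<string, object>();

// Inside the Parallel.ForEach body: GetOrAdd atomically returns the
// existing lock object for this key, or creates and stores a new one.
var fileLock = fileLocks.GetOrAdd(reference.Key, _ => new object());

lock (fileLock)
{
    reference.Value.Document.Save(reference.Key);
}
```

Note that the valueFactory may run more than once under contention, but only one resulting object is ever stored and returned, which is all that matters here.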
if I used a lock ... if that would take away the benefits of parallelizing a foreach loop.
Proportionally. When RecursiveBinUpdate() is a big chunk of work (and independent), then it will still pay off. The locked part could be less than 1%, or 99%. Look up Amdahl's law; it applies here.
But worse, your code is not thread-safe. From your 2 operations on fileLocks, only the first is actually inside a lock.
lock (fileLockObject)
{
    if (fileLocks.ContainsKey(reference.Key) == false)
    {
        ...
    }
}
and
lock (fileLocks[reference.Key]) // this access to fileLocks[] is not protected
change the 2nd part to:
lock (fileLockObject)
{
    reference.Value.Document.Save(reference.Key);
}
and the use of ref recursionCount as a parameter looks suspicious too. It might work with Interlocked.Increment though.
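If recursionCount is only ever incremented, a sketch of that Interlocked variant (assuming the counter is an int):

```csharp
using System.Threading;

int recursionCount = 0;

// Inside the parallel body, instead of unsynchronized recursionCount++:
Interlocked.Increment(ref recursionCount); // atomic, no lock needed
```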
The "locked" portion of the loop will end up running serially. If the RecursiveBinUpdate function is the bulk of the work, there may be some gain, but it would be better if you could figure out how to handle the lock generation in advance.
When it comes to locks, there's no difference in the way PLINQ/TPL threads have to wait to gain access. So, in your case, it only makes the loop not parallel in those areas that you're locking and any work outside those locks is still going to execute in parallel (i.e. all the work in RecursiveBinUpdate).
Bottom line, I see nothing substantially wrong with what you're doing here.
Suppose we have an immutable object like an ImmutableList<T>. What is the preferred method for using this object in a multithreaded environment?
E.g.
public class MutableListOfObjects
{
    private volatile ImmutableList<object> objList;

    public MutableListOfObjects()
    {
        objList = ImmutableList<object>.Empty;
    }

    public void Add(object o)
    {
        // Adding a new object to the list creates a new list,
        // to preserve the immutability of the lists.
        // Is declaring the field volatile enough, or do we want to
        // use other threading concepts?
        objList = objList.Add(o);
    }

    // Will objList always see the latest version of the list?
    public bool Exist(object o)
    {
        return objList.Contains(o);
    }
}
Is declaring the reference volatile sufficient for achieving the desired behavior? Or is it preferable to use other threading functions?
"Preferred" is contextual. The simplest approach is to use a lock, and in most cases that will do the job very effectively. If you have good reason to think that lock is a problem, then Interlocked is useful:
bool retry;
do
{
    var snapshot = objList;
    var combined = snapshot.Add(o);
    retry = Interlocked.CompareExchange(ref objList, combined, snapshot)
            != snapshot;
} while (retry);
This basically works on an optimistic but checked path: most times through, it'll only go through once. Occasionally somebody will change the value of objList while we aren't looking - that's fine, we just try again.
There are, however, pre-canned implementations of thread-safe lists etc, by people who really know what they are talking about. Consider using ConcurrentBag<T> etc. Or just a List<T> with a lock.
A simple and efficient approach is to use ImmutableInterlocked.Update. You pass it a method to perform the add. It calls your add method and then atomically assigns the new value to objList if the list didn't change during the add. If the list changed, Update calls your add method again to retry. It keeps retrying until it is able to write the change.
ImmutableInterlocked.Update(ref objList, l => l.Add(o));
If you have a lot of write contention, such that you'd spend too much time on retries, then using a lock on some stable object (not objList) is preferable.
volatile will not help you in this case; it will not create a lock spanning the read of objList, the call to Add(), and the assignment back to objList. You should use a locking mechanism instead. volatile only protects against operation reordering.
In your case you are creating a new list every time an object is added. Usually a much better alternative is to build the list in a local variable on one thread (so that it is not subject to multithreading issues) and, once the list is complete, mark it as immutable or create an immutable wrapper for it. This way you will get much better performance and memory usage.
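A sketch of that build-then-freeze approach using System.Collections.Immutable (the contents are purely illustrative):

```csharp
using System.Collections.Immutable;

// Build on a single thread with a cheap mutable builder...
var builder = ImmutableList.CreateBuilder<string>();
builder.Add("one");
builder.Add("two");

// ...then freeze it once; the resulting list is safe to share
// between threads without any locking.
ImmutableList<string> shared = builder.ToImmutable();
```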
I have a function that's the main bottleneck of my application, because it's doing heavy string comparisons against a global list shared among the threads. My question is basically this:
Is it bad practice to lock the list (called List gList) multiple times in one function: locking while doing the lookup, unlocking while getting a new item ready for insertion, then locking again and adding the new item?
When I use a profiler I don't see any indication that I'm paying a heavy price for this, but could I be at a later point, or once the code is out in the wild? Does anyone have any best practices or personal experience with this?
How do you perform the locking? You may want to look into using ReaderWriterLockSlim, if that is not already the case.
Here is a simple usage example:
class SomeData
{
    private IList<string> _someStrings = new List<string>();
    private ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();

    public void Add(string text)
    {
        _lock.EnterWriteLock();
        try
        {
            _someStrings.Add(text);
        }
        finally
        {
            _lock.ExitWriteLock();
        }
    }

    public bool Contains(string text)
    {
        _lock.EnterReadLock();
        try
        {
            return _someStrings.Contains(text);
        }
        finally
        {
            _lock.ExitReadLock();
        }
    }
}
It sounds like you don't want to be releasing the lock between the lookup and the insertion. Either that, or you don't need to lock during the lookup at all.
Are you trying to add to the list only if the element is not already there? If so, then releasing the lock between the two steps allows another thread to add to the list while you are preparing your element. By the time you are ready to add, your lookup is out of date.
If it is not a problem that the lookup might be out of date, then you probably don't need to lock during the lookup at all.
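In other words, keep the lookup and the insertion inside a single lock so the check cannot go stale between the two steps (newText is a hypothetical name for the item being prepared; the element type is assumed to be string, as in the question):

```csharp
lock (gList)
{
    // Lookup and insert are now one atomic step: no other thread
    // can add a duplicate between the Contains check and the Add.
    if (!gList.Contains(newText))
        gList.Add(newText);
}
```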
In general, you want to hold a lock for as short a time as possible. The cost of a contended lock is much, much higher (the thread may have to block in the kernel) than the cost of a contention-free acquisition (which can be done in user space), so finer-grained locking will usually be good for performance even if it means acquiring the lock more times.
That said, make sure you profile in an appropriate situation for this: one with a high amount of simultaneous load. Otherwise your results will have little relationship to reality.
In my opinion there is too little data here to give a concrete answer. Generally it is not the number of locks that creates a performance issue, but the number of threads waiting for those locks.