I've created a Windows service that runs a multi-threaded routine, using Parallel.ForEach, on a machine with 24 cores (48 logical). The service bulk copies data into a SQL Server database and has been running well in production at around 6000 inserts per second, but I believe it can be tuned further. Below is part of the code I am using, showing both the current behaviour and a proposed change. As the code shows, a lock is currently taken on every call to Add, which I believe makes the Parallel.ForEach somewhat non-parallel. So I'm looking for a fix, and hoping my new method, also defined in the code, will do the trick.
public class MainLoop
{
public void DoWork()
{
var options = new ParallelOptions
{
MaxDegreeOfParallelism = System.Environment.ProcessorCount * 2
};
var workQueueManager = new ObjWorkQueueManager(queueSize: 1000);
// ignore the fact that this while loop would be a never ending loop,
// there's other logic not shown here that exits the loop!
while (true)
{
ICollection<object> work = GetWork();
Parallel.ForEach(work, options, (item) =>
{
workQueueManager.AddOLD(item);
});
}
}
private ICollection<object> GetWork()
{
// return list of work from some arbitrary source
throw new NotImplementedException();
}
}
public class ObjWorkQueueManager
{
private readonly int _queueSize;
private ObjDataReader _queueDataHandler;
private readonly object _sync;
public ObjWorkQueueManager(int queueSize)
{
_queueSize = queueSize;
_queueDataHandler = new ObjDataReader(queueSize);
_sync = new object();
}
// current Add method works great, but blocks with EVERY call
public void AddOLD(object value)
{
lock (_sync)
{
if (_queueDataHandler.Add(value) == _queueSize)
{
// create a new thread to handle copying the queued data to repository
Thread t = new Thread(SaveQueuedData);
t.Start(_queueDataHandler);
// start a new queue
_queueDataHandler = new ObjDataReader(_queueSize);
}
}
}
// hoping for a new Add method to work better by blocking only
// every nth call where n = _queueSize
public void AddNEW(object value)
{
int queued;
if ((queued = _queueDataHandler.Add(value)) >= _queueSize)
{
lock (_sync)
{
if (queued == _queueSize)
{
Thread t = new Thread(SaveQueuedData);
t.Start(_queueDataHandler);
}
}
}
else if (queued == 0)
{
lock (_sync)
{
_queueDataHandler = new ObjDataReader(_queueSize);
AddNEW(value);
}
}
}
// this method will Bulk Copy data into an SQL DB
private void SaveQueuedData(object o)
{
// do something with o as ObjDataReader
}
}
// implements IDataReader, Read method of IDataReader dequeues from _innerQueue
public class ObjDataReader
{
private readonly int _capacity;
private Queue<object> _innerQueue;
public ObjDataReader(int capacity)
{
_capacity = capacity;
_innerQueue = new Queue<object>(capacity);
}
public int Add(object value)
{
if (_innerQueue.Count < _capacity)
{
_innerQueue.Enqueue(value);
return _innerQueue.Count;
}
return 0;
}
}
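One way to test the hypothesis that the per-item lock is the bottleneck is to avoid the shared queue entirely and let each worker fill its own batch. The sketch below is not the service's code. It assumes the Parallel.ForEach overload that carries thread-local state, a hypothetical LocalBatching helper, and a save callback standing in for SaveQueuedData, which must itself be safe to call from several threads at once. Batches that are only partly filled when the loop ends are flushed at their current size, so some batches can be smaller than batchSize.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper, not part of the original service.
public static class LocalBatching
{
    public static void Process<T>(IEnumerable<T> work, int batchSize, Action<List<T>> save)
    {
        Parallel.ForEach(
            work,
            // localInit: each worker task starts with its own private batch
            () => new List<T>(batchSize),
            // body: no lock is needed because the batch is never shared
            (item, loopState, batch) =>
            {
                batch.Add(item);
                if (batch.Count == batchSize)
                {
                    save(batch);                     // hand off the full batch
                    batch = new List<T>(batchSize);  // and start a fresh one
                }
                return batch;
            },
            // localFinally: flush whatever is left when the worker finishes
            batch => { if (batch.Count > 0) save(batch); });
    }
}
```

Whether this is actually faster than the locked version depends on how expensive the work per item is, so it is worth measuring both before changing production code.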
I want to implement the following data type:
public class MyType
{
void Set(int i);
void AddHandler(int i, Action action);
}
Semantics are as follows.
Both methods must be concurrency safe.
Maximum value of 'i' is known and is relatively low (~100).
Trying to set i more than once should fail.
Calling set with value i should call all handlers registered for that i.
AddHandler registers new handler for given i. If i is already set, action is immediately called.
For example, consider the following sequence
Set(1)
Set(2)
AddHandler(3, f1)
AddHandler(3, f2)
Set(1) // Fails, 1 is already set
AddHandler(2, g) // g is called as 2 is already set
Set(3) // f1, f2 are called
AddHandler(3, h) // h is called as 3 is now set
The goal is to minimize the allocations needed for each method call. Here is the code for my attempt at implementing it.
public class MyType
{
const int N = 10;
static readonly Action[] s_emptyHandler = new Action[0];
readonly bool[] m_vars = new bool[N];
readonly List<Action>[] m_handlers = new List<Action>[N];
public void Set(int i)
{
Action[] handlers;
lock (this)
{
if (m_vars[i]) throw new InvalidOperationException();
m_vars[i] = true;
handlers = m_handlers[i] != null ? m_handlers[i].ToArray() : s_emptyHandler;
}
foreach (var action in handlers)
action();
}
public void AddHandler(int i, Action action)
{
var done = false;
lock (this)
{
if (m_vars[i])
done = true;
else
{
if(m_handlers[i] == null)
m_handlers[i] = new List<Action>();
m_handlers[i].Add(action);
}
}
if (done)
action();
}
}
Taking an array snapshot on every call to Set is inefficient. On the other hand, since you need additional synchronization anyway, using BlockingCollection doesn't make sense. For your case, some immutable collection would fit better.
There is an even simpler method that takes advantage of the fact that you only ever add handlers. We can use an explicit array plus a count field instead of a List, so all the Set method needs to do inside the protected block is take the array reference and the count value. It can then safely iterate the array up to that count and invoke the handlers outside the lock. Here is code using the approach described:
public class MyType
{
struct Entry
{
public bool IsSet;
public int HandlerCount;
public Action[] HandlerList;
public void Add(Action handler)
{
if (HandlerList == null) HandlerList = new Action[4];
else if (HandlerList.Length == HandlerCount) Array.Resize(ref HandlerList, 2 * HandlerCount);
HandlerList[HandlerCount++] = handler;
}
}
const int N = 10;
readonly Entry[] entries = new Entry[N];
readonly object syncLock = new object();
public void Set(int index)
{
int handlerCount;
Action[] handlerList;
lock (syncLock)
{
if (entries[index].IsSet) throw new InvalidOperationException();
entries[index].IsSet = true;
handlerCount = entries[index].HandlerCount;
handlerList = entries[index].HandlerList;
}
for (int i = 0; i < handlerCount; i++)
handlerList[i]();
}
public void AddHandler(int index, Action handler)
{
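if (handler == null) throw new ArgumentNullException("handler");
lock (syncLock)
{
// Register only while the index is unset; Set has not taken its snapshot yet
if (!entries[index].IsSet)
{
entries[index].Add(handler);
return;
}
}
// Already set: invoke immediately, outside the lock
handler();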
}
}
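For comparison, the immutable-collection idea mentioned above can be taken one step further and made lock-free with a copy-on-write array per slot. This is a sketch under assumptions, not a drop-in replacement: the class name MyTypeLockFree and the two sentinel arrays are made up for illustration, and it trades the lock for a new array allocation on every AddHandler call, which works against the stated goal of minimizing allocations.

```csharp
using System;
using System.Threading;

// Hypothetical lock-free variant; names are illustrative only.
public class MyTypeLockFree
{
    const int N = 10;
    // Two distinct empty arrays: one means "no handlers yet", the other "already set".
    static readonly Action[] s_empty = new Action[0];
    static readonly Action[] s_set = new Action[0];
    readonly Action[][] m_slots = new Action[N][];

    public MyTypeLockFree()
    {
        for (int i = 0; i < N; i++) m_slots[i] = s_empty;
    }

    public void Set(int i)
    {
        // Atomically mark the slot as set and take ownership of its handlers.
        Action[] handlers = Interlocked.Exchange(ref m_slots[i], s_set);
        if (ReferenceEquals(handlers, s_set)) throw new InvalidOperationException();
        foreach (var handler in handlers) handler();
    }

    public void AddHandler(int i, Action action)
    {
        if (action == null) throw new ArgumentNullException("action");
        while (true)
        {
            Action[] current = Volatile.Read(ref m_slots[i]);
            if (ReferenceEquals(current, s_set)) { action(); return; }
            // Copy-on-write: arrays already published are never modified.
            var next = new Action[current.Length + 1];
            Array.Copy(current, next, current.Length);
            next[current.Length] = action;
            // Publish only if nobody changed the slot in the meantime; otherwise retry.
            if (ReferenceEquals(Interlocked.CompareExchange(ref m_slots[i], next, current), current))
                return;
        }
    }
}
```

Because Set swaps in the sentinel atomically, a handler racing with Set either lands in the array that Set reads or sees the sentinel and runs immediately, so each handler runs exactly once.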
public class MyType
{
private HashSet<int> set = new HashSet<int>();
private Dictionary<int, BlockingCollection<Action>> actions = new Dictionary<int, BlockingCollection<Action>>();
private void ExecuteActions(BlockingCollection<Action> toExecute)
{
Task.Run(() =>
{
while (!toExecute.IsCompleted)
{
try
{
Action action = toExecute.Take();
action();
}
catch { }
}
});
}
public void Set(int i)
{
lock (this)
{
if (!set.Contains(i))
{
set.Add(i);
BlockingCollection<Action> toExecute;
if (!actions.TryGetValue(i, out toExecute))
{
actions[i] = toExecute = new BlockingCollection<Action>();
}
ExecuteActions(toExecute);
}
}
}
public void AddHandler(int i, Action action)
{
lock (this)
{
BlockingCollection<Action> toExecute;
if (!actions.TryGetValue(i, out toExecute))
{
actions[i] = toExecute = new BlockingCollection<Action>();
}
toExecute.Add(action);
}
}
}
I have been working on a mock-up for an import service which currently runs sequentially. However, my mock-up exhibits a strange problem whereby sometimes one or two iterations of the for loop aren't executed.
class Service
{
private Thread _worker;
private bool _stopping;
private CancellationTokenSource _cts;
private ParallelOptions _po;
private Repository _repository;
public void Start(Repository repository)
{
_repository = repository;
_cts = new CancellationTokenSource();
_po = new ParallelOptions {
CancellationToken = _cts.Token
};
_worker = new Thread(ProcessImport);
_worker.Start();
}
public void Stop()
{
_stopping = true;
_cts.Cancel();
if(_worker != null && _worker.IsAlive)
_worker.Join();
}
private void ProcessImport()
{
while (!_stopping)
{
var import = _repository.GetInProgressImport();
if (import == null)
{
Thread.Sleep(1000);
continue;
}
try
{
Parallel.For(0, 1000, _po, i => Work.DoWork(i, import, _cts.Token, _repository));
}
catch (OperationCanceledException)
{
// Unmark batch so it can be started again
var batch = _repository.GetBatch(import.BatchId);
batch.Processing = false;
_repository.UpdateBatch(batch);
Console.WriteLine("Aborted import {0}", import.ImportId);
}
catch (Exception ex)
{
Console.WriteLine("Something went wrong: {0}", ex.Message);
}
}
}
}
class Work
{
public static void DoWork(int i, Import import, CancellationToken ct, Repository repository)
{
// Simulate doing some work
Thread.Sleep(100);
HandleAbort(ct);
Thread.Sleep(100);
HandleAbort(ct);
Thread.Sleep(100);
// Update the batch
var batch = repository.GetBatch(import.BatchId);
batch.Processed++;
if (batch.Processed == batch.Total)
{
batch.Finished = DateTime.Now;
batch.Processing = false;
}
repository.UpdateBatch(batch);
}
private static void HandleAbort(CancellationToken ct)
{
if (!ct.IsCancellationRequested)
return;
ct.ThrowIfCancellationRequested();
}
}
With this code, I often find that the batches are never complete and that batch.Processed = 999 or 998.
Can anyone shed any light on what I've done wrong?
Thanks in advance.
Edit:
To be clear about the repository/batch object - I believe in my current mock-up that it is threadsafe
class Repository
{
private ConcurrentBag<Batch> _batchData = new ConcurrentBag<Batch>();
private ConcurrentBag<Import> _importData = new ConcurrentBag<Import>();
public void CreateImport(Import import)
{
_importData.Add(import);
}
public Import GetInProgressImport()
{
var import = _importData
.Join(_batchData, i => i.BatchId, b => b.BatchId, (i, b) => new
{
Import = i,
Batch = b
})
.Where(j => j.Batch.Processed < j.Batch.Total && !j.Batch.Processing)
.OrderByDescending(j => j.Batch.Total - j.Batch.Processed)
.ThenBy(j => j.Batch.BatchId)
.Select(j => j.Import)
.FirstOrDefault();
if (import == null)
return null;
// mark the batch as processing
var batch = GetBatch(import.BatchId);
batch.Processing = true;
UpdateBatch(batch);
return import;
}
public List<Import> ListImports()
{
return _importData.ToList();
}
public void CreateBatch(Batch batch)
{
_batchData.Add(batch);
}
public Batch GetBatch(Int64 batchId)
{
return _batchData.FirstOrDefault(b => b.BatchId == batchId);
}
public void UpdateBatch(Batch batch)
{
var batchData = _batchData.First(b => b.BatchId == batch.BatchId);
batchData.Total = batch.Total;
batchData.Processed = batch.Processed;
batchData.Started = batch.Started;
batchData.Finished = batch.Finished;
batchData.Processing = batch.Processing;
}
}
class Import
{
public Int64 ImportId { get; set; }
public Int64 BatchId { get; set; }
}
class Batch
{
public Int64 BatchId { get; set; }
public int Total { get; set; }
public int Processed { get; set; }
public DateTime Created { get; set; }
public DateTime Started { get; set; }
public DateTime Finished { get; set; }
public bool Processing { get; set; }
}
This is only a mock-up so there is no DB or other persistence behind my repository.
Also, I'm not completing my batch based on the value of i, but on the number of loop iterations that have actually done their work, as indicated by the Processed property of the batch object.
Thanks
Solution:
I had forgotten about the need to synchronise the update of the batch. It should look like:
class Work
{
private static object _sync = new object();
public static void DoWork(int i, Import import, CancellationToken ct, Repository repository)
{
// Do work
Thread.Sleep(100);
HandleAbort(ct);
Thread.Sleep(100);
HandleAbort(ct);
Thread.Sleep(100);
lock (_sync)
{
// Update the batch
var batch = repository.GetBatch(import.BatchId);
batch.Processed++;
if (batch.Processed == batch.Total)
{
batch.Finished = DateTime.Now;
batch.Processing = false;
}
repository.UpdateBatch(batch);
}
}
private static void HandleAbort(CancellationToken ct)
{
if (!ct.IsCancellationRequested)
return;
ct.ThrowIfCancellationRequested();
}
}
Looks like lost updates on batch.Processed. Increments are not atomic. batch.Processed++; is racy. Use Interlocked.Increment.
It seems to me like you don't have a good understanding of threading right now. It's very dangerous to perform such elaborate threading without a good understanding. The mistakes you make are hard to test for but production will find them.
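As a rough illustration of the Interlocked.Increment suggestion: Interlocked needs a ref to a field, so an auto-property such as Batch.Processed cannot be passed to it directly. The sketch below assumes a hypothetical BatchCounter class with a backing field. It is not the asker's Batch type, and the CounterDemo helper exists only to exercise it.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the Processed/Total part of Batch.
public class BatchCounter
{
    private int _processed; // a field, so it can be passed by ref to Interlocked

    public int Total { get; private set; }
    public int Processed { get { return Volatile.Read(ref _processed); } }

    public BatchCounter(int total) { Total = total; }

    // Atomically adds one and reports whether this call completed the batch.
    public bool MarkOneProcessed()
    {
        return Interlocked.Increment(ref _processed) == Total;
    }
}

public static class CounterDemo
{
    // Returns how many iterations observed the batch becoming complete.
    public static int Run(BatchCounter counter)
    {
        int completions = 0;
        Parallel.For(0, counter.Total, i =>
        {
            if (counter.MarkOneProcessed())
                Interlocked.Increment(ref completions);
        });
        return completions;
    }
}
```

Using the value returned by Interlocked.Increment, rather than re-reading the counter afterwards, is what guarantees that exactly one iteration sees Processed reach Total and can set Finished.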
According to MSDN, the overloads of Parallel.For name the second integer toExclusive, meaning the loop goes up to but does not include that value. In other words, the last value of i is 999, not 1000. Note, however, that because the loop starts at 0, the body still executes 1,000 times.
From a glance, your code is parallel, so make sure you're not simply seeing the "999" call arrive out of order relative to the "998" one: code executed in parallel is inherently unordered, and iterations can complete in any order. Also, read up on lock, as your code may be reading values that another thread is still updating.
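The toExclusive behaviour is easy to check in isolation. The following sketch uses a hypothetical RangeDemo helper that records which indices Parallel.For actually visits. Each iteration writes only its own array slot, so no locking is involved.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper used only to demonstrate the loop bounds.
public static class RangeDemo
{
    // Marks every index the loop visits. The array has one extra slot so that
    // an (incorrect) visit to toExclusive itself would be recorded, not thrown.
    public static bool[] Visit(int fromInclusive, int toExclusive)
    {
        var seen = new bool[toExclusive + 1];
        Parallel.For(fromInclusive, toExclusive, i => { seen[i] = true; });
        return seen;
    }
}
```

Calling RangeDemo.Visit(0, 1000) marks indices 0 through 999, which is 1,000 iterations, and leaves index 1000 untouched.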
I have a WCF service and a resource with records (each identified by an ID). I want each ID to be accessed by only one caller at a time, so I have written a little resource helper:
public sealed class ConcurrencyIdManager
{
private static object _syncRootGrant = new object();
private static List<int> _IdsInUse = new List<int>();
... // singleton
public void RequestAndWaitForIdGrant(int id)
{
lock (_syncRootGrant)
{
while (_IdsInUse.Contains(id))
{
Monitor.Wait(_syncRootGrant);
}
_IdsInUse.Add(id);
}
}
public void ReleaseGrantForId(int id)
{
lock (_syncRootGrant)
{
_IdsInUse.Remove(id);
Monitor.PulseAll(_syncRootGrant);
}
}
}
So in my WCF service I have:
public void UpdateMySpecialEntity(Entity foo)
{
ConcurrencyIdManager.Instance.RequestAndWaitForIdGrant(foo.Id);
try {
// do something with the entity foo
}
finally { ConcurrencyIdManager.Instance.ReleaseGrantForId(foo.Id); }
}
Is the implementation correct so far? :-)
If I am reading your notes right, you want IDs 3, 4 and 5 to be edited simultaneously, but two threads with ID 5 to block and wait for each other.
In that case use a concurrent collection of lock objects and use a simple lock on the object for that Id.
e.g. in pseudo c#
ConcurrentDictionary<int, object> lockObjects = new ConcurrentDictionary<int, object>();
public void UpdateMySpecialEntity(Entity foo)
{
// GetOrAdd hands the same lock object to every caller that passes the same Id
object idLock = lockObjects.GetOrAdd(foo.Id, _ => new object());
lock (idLock)
{
// do lock sensitive stuff in here.
}
}
This is a design question, not a bug fix problem.
The situation is this. I have a lot of collections and objects contained in one class. Their contents are only changed by a single message handler thread. There is one other thread which is doing rendering. Each frame it iterates through some of these collections and draws to the screen based on the value of these objects. It does not alter the objects in any way, it is just reading their values.
Now when the rendering is being done, if any of the collections are altered, my foreach loops in the rendering method fail. How should I make this thread safe? Edit: So I have to lock the collections outside each foreach loop I run on them. This works, but it seems like a lot of repetitive code to solve this problem.
As a short, contrived example:
class State
{
public object LockObjects;
public List<object> Objects;
// Called by message handler thread
void HandleMessage()
{
lock (LockObjects)
{
Objects.Add(new object());
}
}
}
class Renderer
{
State m_state;
// Called by rendering thread
void Render()
{
lock (m_state.LockObjects)
{
foreach (var obj in m_state.Objects)
{
DrawObject(obj);
}
}
}
}
This is all well and good, but I'd rather not put locks on all my state collections if there's a better way. Is this "the right" way to do it or is there a better way?
A better way is to use begin/end methods, separate lists for the two threads, and synchronization through, for example, an auto-reset event. The message handler thread then never waits on the renderer, and you can have many render or message handler threads:
class State : IDisposable
{
private List<object> _objects;
private ReaderWriterLockSlim _locker;
private object _cacheLocker;
private List<object> _objectsCache;
private Thread _synchronizeThread;
private AutoResetEvent _synchronizationEvent;
private bool _abortThreadToken;
public State()
{
_objects = new List<object>();
_objectsCache = new List<object>();
_cacheLocker = new object();
_locker = new ReaderWriterLockSlim();
_synchronizationEvent = new AutoResetEvent(false);
_abortThreadToken = false;
_synchronizeThread = new Thread(Synchronize);
_synchronizeThread.Start();
}
private void Synchronize()
{
while (!_abortThreadToken)
{
_synchronizationEvent.WaitOne();
int objectsCacheCount;
lock (_cacheLocker)
{
objectsCacheCount = _objectsCache.Count;
}
if (objectsCacheCount > 0)
{
_locker.EnterWriteLock();
try
{
lock (_cacheLocker)
{
_objects.AddRange(_objectsCache);
_objectsCache.Clear();
}
}
finally
{
// Always release the write lock, even if AddRange throws
_locker.ExitWriteLock();
}
}
}
}
public IEnumerator<object> GetEnumerator()
{
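_locker.EnterReadLock();
try
{
foreach (var o in _objects)
{
yield return o;
}
}
finally
{
// Runs when enumeration completes or the enumerator is disposed early (e.g. break in foreach)
_locker.ExitReadLock();
}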
}
// Called by message handler thread
public void HandleMessage()
{
lock (_cacheLocker)
{
_objectsCache.Add(new object());
}
_synchronizationEvent.Set();
}
public void Dispose()
{
_abortThreadToken = true;
_synchronizationEvent.Set();
}
}
Or (the simpler way) you can use ReaderWriterLockSlim (or just plain locks if you are sure you have only one reader), as in the following code:
class State
{
List<object> m_objects = new List<object>();
ReaderWriterLockSlim locker = new ReaderWriterLockSlim();
public IEnumerator<object> GetEnumerator()
{
locker.EnterReadLock();
try
{
foreach (var o in Objects)
{
yield return o;
}
}
finally
{
// Released on normal completion and when the caller stops enumerating early
locker.ExitReadLock();
}
}
private List<object> Objects
{
get { return m_objects; }
set { m_objects = value; }
}
// Called by message handler thread
public void HandleMessage()
{
locker.EnterWriteLock();
try
{
Objects.Add(new object());
}
finally
{
locker.ExitWriteLock();
}
}
}
Hmm... have you tried a ReaderWriterLockSlim? Guard each collection with one of these, and make sure you enter a read or write lock each time you access it.
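To keep the enter/exit calls from being repeated around every loop, the lock can be packaged together with the collection it guards. The following is a minimal sketch, assuming a hypothetical GuardedList wrapper that is not part of the question's code. Callers pass a delegate and the wrapper takes care of acquiring and releasing the lock.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical wrapper: one ReaderWriterLockSlim per guarded collection.
public sealed class GuardedList<T>
{
    private readonly List<T> _items = new List<T>();
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();

    // Called by the writer (message handler) thread.
    public void Write(Action<List<T>> mutate)
    {
        _lock.EnterWriteLock();
        try { mutate(_items); }
        finally { _lock.ExitWriteLock(); }
    }

    // Called by reader (rendering) threads; several readers may run at once.
    public void Read(Action<IReadOnlyList<T>> visit)
    {
        _lock.EnterReadLock();
        try { visit(_items); }
        finally { _lock.ExitReadLock(); }
    }
}
```

A render pass would then look like objects.Read(list => { foreach (var o in list) DrawObject(o); }), and the handler would call objects.Write(list => list.Add(item)). The delegate must not keep a reference to the list after it returns, since the lock no longer protects it at that point.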
I'm writing a wrapper around a 3rd party library, and it has a method to scan the data it manages. The method takes a callback method that it calls for each item in the data that it finds.
e.g. The method is essentially: void Scan(Action<object> callback);
I want to wrap it and expose a method like IEnumerable<object> Scan();
Is this possible without resorting to a separate thread to do the actual scan and a buffer?
You can do this quite simply with Reactive:
class Program
{
static void Main(string[] args)
{
foreach (var x in CallBackToEnumerable<int>(Scan))
Console.WriteLine(x);
}
static IEnumerable<T> CallBackToEnumerable<T>(Action<Action<T>> functionReceivingCallback)
{
return Observable.Create<T>(o =>
{
// Schedule this onto another thread, otherwise it will block:
Scheduler.Later.Schedule(() =>
{
functionReceivingCallback(o.OnNext);
o.OnCompleted();
});
return () => { };
}).ToEnumerable();
}
public static void Scan(Action<int> act)
{
for (int i = 0; i < 100; i++)
{
// Delay to prove this is working asynchronously.
Thread.Sleep(100);
act(i);
}
}
}
Remember that this doesn't take care of things like cancellation, since the callback method doesn't really allow it. A proper solution would require work on the part of the external library.
You should investigate the Rx project — this allows an event source to be consumed as an IEnumerable.
I'm not sure if it allows vanilla callbacks to be presented as such (it's aimed at .NET events) but it would be worth a look as it should be possible to present a regular callback as an IObservable.
Here is a blocking enumerator (the Scan method needs to run in a separate thread)
public class MyEnumerator : IEnumerator<object>
{
private readonly Queue<object> _queue = new Queue<object>();
private ManualResetEvent _event = new ManualResetEvent(false);
public void Callback(object value)
{
lock (_queue)
{
_queue.Enqueue(value);
_event.Set();
}
}
public void Dispose()
{
}
public bool MoveNext()
{
_event.WaitOne();
lock (_queue)
{
Current = _queue.Dequeue();
if (_queue.Count == 0)
_event.Reset();
}
return true;
}
public void Reset()
{
_queue.Clear();
}
public object Current { get; private set; }
object IEnumerator.Current
{
get { return Current; }
}
}
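static void Main(string[] args)
{
var enumerator = new MyEnumerator();
// Scan must run on its own thread; called inline, it would finish before the first MoveNext
new Thread(() => Scan(enumerator.Callback)).Start();
// Note: MoveNext never returns false, so this loop blocks once the scan has finished
while (enumerator.MoveNext())
{
Console.WriteLine(enumerator.Current);
}
}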
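The same thread-plus-buffer idea can be written with BlockingCollection, which also gives the consumer a way to learn that the scan has finished. This is a sketch under assumptions rather than a definitive wrapper: CallbackAdapter and ToEnumerable are hypothetical names, the scan is passed in as a delegate instead of calling the third-party library directly, and the capacity of 64 is arbitrary. It still uses a second thread and a buffer, and if the consumer stops enumerating early the producer is not cancelled and may stay blocked on a full buffer.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical adapter; 'scan' stands for the library's callback-based method.
public static class CallbackAdapter
{
    public static IEnumerable<T> ToEnumerable<T>(Action<Action<T>> scan, int boundedCapacity = 64)
    {
        var buffer = new BlockingCollection<T>(boundedCapacity);

        // Producer: runs the scan on a thread-pool thread and pushes items into the buffer.
        Task producer = Task.Run(() =>
        {
            try { scan(buffer.Add); }
            finally { buffer.CompleteAdding(); } // lets the consumer loop terminate
        });

        // Consumer: blocks until an item is available, ends once adding is completed.
        foreach (T item in buffer.GetConsumingEnumerable())
            yield return item;

        // Surfaces any exception thrown by the scan (wrapped in AggregateException).
        producer.Wait();
    }
}
```

Because the method is an iterator, nothing starts until the caller begins enumerating, and each enumeration runs the scan again.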
You could wrap it in a simple IEnumerable<Object>, but I would not recommend it. An IEnumerable implies that you can run multiple enumerators over the same sequence, which you can't in this case.
How about this one:
IEnumerable<Object> Scan()
{
List<Object> objList = new List<Object>();
Action<Object> action = (obj) => { objList.Add(obj); };
Scan(action);
return objList;
}
Take a look at the yield keyword -- which will allow you to have a method that looks like an IEnumerable but which actually does processing for each return value.