PLinq and Object Pooling using ConcurrentCollections - c#

I have a method that has to iterate over a large set of data and return the processed results to a consumer thread for serialization. Streaming PLINQ fits best performance-wise.
Because these operations are frequent, I am using an object pool to cache the containers for my processing, to minimize object creation. I tried implementing the object pool using a ConcurrentStack (ConcurrentBag and ConcurrentQueue exhibit the same problem). In some rare cases, the same item (judging by the hash code) is acquired from the pool twice by the same thread, although it was not released by the consumer thread in between. I added tracing in the acquire and release methods of the pool, and this is the output:
5:11:32.250 PM Get item 16071020 for Thread 31
5:11:32.254 PM Get item 16071020 for Thread 31
5:11:32.260 PM Put item 16071020 for Thread 27
5:11:32.286 PM Put item 16071020 for Thread 27
Here is the code I am using:
var itemsToProcess = data.AsParallel()
.Where(x => Filter(x))
.Select(row => Process(row));
In the Process method, I get the object from the pool:
result = ObjectPool.Instance.GetObject();
The Pool class implementation:
public class ObjectPool
{
private ConcurrentStack<object[]> _objects;
private int size;
private const int maxSize = 20000;
private static ObjectPool instance = new ObjectPool(500);
public static ObjectPool Instance
{
get { return instance; }
}
private ObjectPool(int size)
{
this.size = size;
_objects = new ConcurrentStack<object[]>();
}
public object[] GetObject()
{
object[] item;
if (_objects.TryPop(out item))
{
Trace.WriteLine(string.Format("Get item {0} for Thread {1}", item.GetHashCode(), Thread.CurrentThread.ManagedThreadId));
return item;
}
return new object[size];
}
public void Clear()
{
_objects.Clear();
}
public void PutObject(object[] item)
{
Trace.WriteLine(string.Format("Put item {0} for Thread {1}", item.GetHashCode(), Thread.CurrentThread.ManagedThreadId));
if (_objects.Count < maxSize)
{
_objects.Push(item);
}
}
}
I am at a loss as to how to prevent this kind of situation from occurring. Any ideas on why this can happen and how to prevent it?

I can't see anything wrong with the code you posted.
To me, the most likely case seems to be that you call PutObject() twice on the same array. But without seeing more of your code, it's impossible to tell.
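One way to test the double-release theory is to make the pool remember which arrays are currently handed out. The following is only a minimal sketch: `TrackingPool` and its members are hypothetical names rather than the asker's class, and it assumes that throwing on misuse is acceptable while debugging.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical debugging variant of the pool from the question.
public sealed class TrackingPool
{
    private readonly ConcurrentStack<object[]> _free = new ConcurrentStack<object[]>();
    // Arrays compare by reference, so this dictionary acts as a set of leased instances.
    private readonly ConcurrentDictionary<object[], bool> _leased =
        new ConcurrentDictionary<object[], bool>();
    private readonly int _arraySize;

    public TrackingPool(int arraySize) { _arraySize = arraySize; }

    public object[] GetObject()
    {
        object[] item;
        if (!_free.TryPop(out item))
            item = new object[_arraySize];
        // TryAdd fails if this instance is already marked as handed out.
        if (!_leased.TryAdd(item, true))
            throw new InvalidOperationException("Item handed out twice.");
        return item;
    }

    public void PutObject(object[] item)
    {
        bool ignored;
        // TryRemove fails if the item is not currently leased, i.e. on a double return.
        if (!_leased.TryRemove(item, out ignored))
            throw new InvalidOperationException("Item returned twice or never leased.");
        _free.Push(item);
    }
}
```

If the second `PutObject` call for the same array throws, the stack trace points at the caller that releases the item twice.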

Related

Proper data access in Multithreading

I have a method which is used by multiple threads at the same time. Each of these threads calls another method to receive the data it needs from a List (each one should get different data, not the same).
I wrote this code to get data from a list and use it in the threads.
public static List<string> ownersID;
static int idIdx = 0;
public static string[] GetUserID()
{
if (idIdx < ownersID.Count-1)
{
string[] ret = { ownersID[idIdx], idIdx.ToString() };
idIdx++;
return ret;
}
else if (idIdx >= ownersID.Count)
{
string[] ret = { "EndOfThat" };
return ret;
}
return new string[0];
}
Then each thread uses this code to receive the data and remove it from the list:
string[] arrOwner = GetUserID();
string id = arrOwner[0];
ownersID.RemoveAt(Convert.ToInt32(arrOwner[1]));
But sometimes two or more threads end up with the same data.
Is there a better way to do this?
If you want to do it with a List, just add a little bit of locking:
private object _lock = new object();
private List<string> _list = new List<string>();
public void Add(string someStr)
{
lock(_lock)
{
if (_list.Any(s => s == someStr)) // already added (inside lock)
return;
_list.Add(someStr);
}
}
public void Remove(string someStr)
{
lock(_lock)
{
if (!_list.Any(s => s == someStr)) // already removed (inside lock)
return;
_list.Remove(someStr);
}
}
With that, no thread will be adding or removing anything while another thread does the same. Your list is protected from concurrent access, and you make sure that each value appears only once. Alternatively, you can achieve this using ConcurrentDictionary<T1, T2>.
Update: I removed the pre-lock check due to this MSDN thread safety statement:
It is safe to perform multiple read operations on a List (read - multithreading), but issues can occur if the collection is modified while it's being read.
In a larger application you can use a thread-safe queue, such as ConcurrentQueue<T>, to communicate between threads.
The benefit of such a queue is that you don't need to take a lock yourself, which decreases latency. Data is passed from the main thread to Thread A, Thread B and Thread C through the queue, with no explicit locking.
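The queue idea can be sketched as follows. This is an illustration under assumptions, not the asker's code: `OwnerDispatch` and `DrainInParallel` are made-up names, and the owner IDs are assumed to be known before the workers start.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class OwnerDispatch
{
    // Hypothetical helper: several workers drain one queue, and everything
    // they received is returned so that duplicates would be visible.
    public static List<string> DrainInParallel(IEnumerable<string> ownerIds, int workers)
    {
        var queue = new ConcurrentQueue<string>(ownerIds);
        var taken = new ConcurrentBag<string>();
        var tasks = new Task[workers];
        for (int i = 0; i < workers; i++)
        {
            tasks[i] = Task.Run(() =>
            {
                string id;
                // TryDequeue removes and returns one element atomically,
                // so no two workers can receive the same element.
                while (queue.TryDequeue(out id))
                    taken.Add(id);
            });
        }
        Task.WaitAll(tasks);
        return new List<string>(taken);
    }
}
```

Compared with the index-based GetUserID plus RemoveAt, reading and removing happen in a single call, which closes the window in which two threads can read the same index.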

Async Producer/Consumer

I have an instance of a class that is accessed from several threads. This class takes these calls and adds a tuple to a database. I need this to be done serially because, due to some DB constraints, parallel threads could result in an inconsistent database.
As I am new to parallelism and concurrency in C#, I did this:
private BlockingCollection<Task> _tasks = new BlockingCollection<Task>();
public void AddDData(string info)
{
Task t = new Task(() => { InsertDataIntoBase(info); });
_tasks.Add(t);
}
private void InsertWorker()
{
Task.Factory.StartNew(() =>
{
while (!_tasks.IsCompleted)
{
Task t;
if (_tasks.TryTake(out t))
{
t.Start();
t.Wait();
}
}
});
}
AddDData is the method that is called by multiple threads, and InsertDataIntoBase is a very simple insert that should take a few milliseconds.
The problem is that, for some reason that my lack of knowledge doesn't allow me to figure out, sometimes a task is called twice! It always goes like this:
T1
T2
T3
T1 <- PK error.
T4
...
Did I understand .Take() completely wrong, am I missing something, or is my producer/consumer implementation really bad?
Best Regards,
Rafael
UPDATE:
As suggested, I made a quick sandbox test implementation with this architecture and as I was suspecting, it does not guarantee that a task will not be fired before the previous one finishes.
So the question remains: how to properly queue tasks and fire them sequentially?
UPDATE 2:
I simplified the code:
private BlockingCollection<Data> _tasks = new BlockingCollection<Data>();
public void AddDData(Data info)
{
_tasks.Add(info);
}
private void InsertWorker()
{
Task.Factory.StartNew(() =>
{
while (!_tasks.IsCompleted)
{
Data info;
if (_tasks.TryTake(out info))
{
InsertIntoDB(info);
}
}
});
}
Note that I got rid of Tasks, as I'm relying on the synchronous InsertIntoDB call (it is inside a loop), but still no luck... The generation is fine and I'm absolutely sure that only unique instances are going to the queue. But no matter what I try, sometimes the same object is used twice.
I think this should work:
private static BlockingCollection<string> _itemsToProcess = new BlockingCollection<string>();
static void Main(string[] args)
{
InsertWorker();
GenerateItems(10, 1000);
_itemsToProcess.CompleteAdding();
}
private static void InsertWorker()
{
Task.Factory.StartNew(() =>
{
while (!_itemsToProcess.IsCompleted)
{
string t;
if (_itemsToProcess.TryTake(out t))
{
// Do whatever needs doing here
// Order should be guaranteed since BlockingCollection
// uses a ConcurrentQueue as a backing store by default.
// http://msdn.microsoft.com/en-us/library/dd287184.aspx#remarksToggle
Console.WriteLine(t);
}
}
});
}
private static void GenerateItems(int count, int maxDelayInMs)
{
Random r = new Random();
string[] items = new string[count];
for (int i = 0; i < count; i++)
{
items[i] = i.ToString();
}
// Simulate many threads adding items to the collection
items
.AsParallel()
.WithDegreeOfParallelism(4)
.WithExecutionMode(ParallelExecutionMode.ForceParallelism)
.Select((x) =>
{
Thread.Sleep(r.Next(maxDelayInMs));
_itemsToProcess.Add(x);
return x;
}).ToList();
}
This does mean that the consumer is single threaded, but allows for multiple producer threads.
From your comment
"I simplified the code shown here, as the data is not a string"
I assume that the info parameter passed into AddDData is a mutable reference type. Make sure that the caller is not using the same info instance for multiple calls, since that reference is captured in the Task lambda.
Based on the trace that you provided the only logical possibility is that you have called InsertWorker twice (or more). There are thus two background threads waiting for items to appear in the collection and occasionally they both manage to grab an item and begin executing it.
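To rule out a second worker by construction, the consumer can be started exactly once, in the constructor of the object that owns the collection. The sketch below is hypothetical (`SerialInserter` is an illustrative name, and a `string` payload stands in for the real data type); it uses `GetConsumingEnumerable` instead of polling with `TryTake`.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical wrapper: many producers, exactly one consumer.
public sealed class SerialInserter : IDisposable
{
    private readonly BlockingCollection<string> _items = new BlockingCollection<string>();
    private readonly Task _worker;

    public SerialInserter(Action<string> insert)
    {
        // The only consumer is started here, once per instance.
        _worker = Task.Factory.StartNew(() =>
        {
            // Blocks until an item is available; the loop ends after
            // CompleteAdding has been called and the collection is empty.
            foreach (string item in _items.GetConsumingEnumerable())
                insert(item);
        }, TaskCreationOptions.LongRunning);
    }

    // Safe to call from any number of threads.
    public void Add(string item) { _items.Add(item); }

    public void Dispose()
    {
        _items.CompleteAdding();
        _worker.Wait(); // let the worker drain what is still queued
        _items.Dispose();
    }
}
```

Because the `insert` delegate only ever runs on the single worker, the inserts are serialized without any extra locking.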

C#: Locking a Queue properly for Iteration

I am using a Queue (C#) to store data that has to be sent to any client that connects.
My lock object is private readonly:
private readonly object completedATEQueueSynched = new object();
Only two methods enqueue:
1) Started by mouse movement, executed by the main form thread:
public void handleEddingToolMouseMove(MouseEventArgs e)
{
AbstractTrafficElement de = new...
sendElementToAllPlayers(de);
lock (completedATEQueueSynched)
{
completedATEQueue.Enqueue(de);
}
}
2) Started by a button event, also executed by the main form thread (the lock does not matter here, but better safe than sorry):
public void handleBLC(EventArgs e)
{
AbstractTrafficElement de = new...
sendElementToAllPlayers(de);
lock (completedATEQueueSynched)
{
completedATEQueue.Enqueue(de);
}
}
The following method is called by the thread responsible for the specific connected client:
private void sendSetData(TcpClient c)
{
NetworkStream clientStream = c.GetStream();
lock (completedATEQueueSynched)
{
foreach (AbstractTrafficElement ate in MainForm.completedATEQueue)
{
binaryF.Serialize(clientStream, ate);
}
}
}
If a client connects while I am moving my mouse, a deadlock occurs.
If I lock the iteration only, an InvalidOperationException is thrown, because the queue changed.
I have tried the synchronized Queue wrapper as well, but it doesn't work for iterating (even in combination with locks).
Any ideas? I just don't see my mistake.
You can reduce the contention, probably enough to make it acceptable:
private void sendSetData(TcpClient c)
{
IEnumerable<AbstractTrafficElement> list;
lock (completedATEQueueSynched)
{
list = MainForm.completedATEQueue.ToList(); // take a snapshot
}
NetworkStream clientStream = c.GetStream();
foreach (AbstractTrafficElement ate in list)
{
binaryF.Serialize(clientStream, ate);
}
}
But of course a snapshot introduces its own bit of timing logic. What exactly does 'all elements' mean at any given moment?
It looks like ConcurrentQueue is what you want.
UPDATE
Yes, it works fine. Internally, TryDequeue uses Interlocked.CompareExchange and SpinWait. A lock is not a good choice here because it is too expensive; take a look at SpinLock, and don't forget about Data Structures for Parallel Programming.
Here is Enqueue from ConcurrentQueue. As you can see, only SpinWait and Interlocked.Increment are used, which looks pretty nice:
public void Enqueue(T item)
{
SpinWait spinWait = new SpinWait();
while (!this.m_tail.TryAppend(item, ref this.m_tail))
spinWait.SpinOnce();
}
internal void Grow(ref ConcurrentQueue<T>.Segment tail)
{
this.m_next = new ConcurrentQueue<T>.Segment(this.m_index + 1L);
tail = this.m_next;
}
internal bool TryAppend(T value, ref ConcurrentQueue<T>.Segment tail)
{
if (this.m_high >= 31)
return false;
int index = 32;
try
{
}
finally
{
index = Interlocked.Increment(ref this.m_high);
if (index <= 31)
{
this.m_array[index] = value;
this.m_state[index] = 1;
}
if (index == 31)
this.Grow(ref tail);
}
return index <= 31;
}
Henk Holterman's approach is good if the rate of enqueue and dequeue operations is not very high. Here, I think you are capturing mouse movements. If you expect to generate a lot of data in the queue, that approach does not work well: the network code and the enqueue code contend for the same lock, whose granularity is the whole queue.
In this case I'd recommend what GSerjo mentioned: ConcurrentQueue. I've looked into the implementation of this queue. It is very granular and operates at the level of single elements. While one thread is dequeuing, other threads can enqueue in parallel without stopping.
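For the iteration problem in particular, the documented behaviour of `ConcurrentQueue<T>.GetEnumerator` is that the enumeration is a moment-in-time snapshot, so a `foreach` does not throw when other code enqueues concurrently. A small sketch of that behaviour follows; `SnapshotDemo` is a hypothetical name, and `int` stands in for the traffic element type.

```csharp
using System.Collections.Concurrent;

public static class SnapshotDemo
{
    // Returns how many elements a single enumeration saw while the loop
    // body kept enqueuing new elements into the same queue.
    public static int CountWhileEnqueuing(ConcurrentQueue<int> queue)
    {
        int seen = 0;
        foreach (int item in queue)
        {
            // With Queue<T> this modification would make the enumerator
            // throw InvalidOperationException on the next iteration.
            queue.Enqueue(item + 1000);
            seen++;
        }
        return seen;
    }
}
```

The enumeration only covers the elements that were present when it started, so a client that connects mid-stream still needs the live updates sent through `sendElementToAllPlayers` to receive anything enqueued afterwards.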

Why does the process lose threads?

Here is some code that perpetually generates GUIDs. I've written it to learn about threading. In it you'll notice that I've got a lock around the place where I generate GUIDs and enqueue them, even though ConcurrentQueue is thread-safe. That's because my actual code will need to use NHibernate, so I must make sure that only one thread gets to fill the queue.
While monitoring this code in Task Manager, I notice that the process drops its number of threads from 18 (on my machine) to 14, but never lower. Is this because my code isn't good?
Also can someone refactor this if they see fit? I love shorter code.
class Program
{
ConcurrentNewsBreaker Breaker;
static void Main(string[] args)
{
new Program().Execute();
Console.Read();
}
public void Execute()
{
Breaker = new ConcurrentNewsBreaker();
QueueSome();
}
public void QueueSome()
{
ThreadPool.QueueUserWorkItem(DoExecute);
}
public void DoExecute(Object State)
{
String Id = Breaker.Pop();
Console.WriteLine(String.Format("- {0} {1}", Thread.CurrentThread.ManagedThreadId, Id));
if (Breaker.Any())
QueueSome();
else
Console.WriteLine(String.Format("- {0} XXXX ", Thread.CurrentThread.ManagedThreadId));
}
}
public class ConcurrentNewsBreaker
{
static readonly Object LockObject = new Object();
ConcurrentQueue<String> Store = new ConcurrentQueue<String>();
public String Pop()
{
String Result = null;
if (Any())
Store.TryDequeue(out Result);
return Result;
}
public Boolean Any()
{
if (!Store.Any())
{
Task FillTask = new Task(FillupTheQueue, Store);
FillTask.Start();
FillTask.Wait();
}
return Store.Any();
}
private void FillupTheQueue(Object StoreObject)
{
ConcurrentQueue<String> Store = StoreObject as ConcurrentQueue<String>;
lock(LockObject)
{
for(Int32 i = 0; i < 100; i++)
Store.Enqueue(Guid.NewGuid().ToString());
}
}
}
You are using .NET's ThreadPool so .NET/Windows manages the number of threads based on the amount of work waiting to be processed.
"While I monitor this code in Task Manager, I notice the process drops the number of threads from 18 (on my machine) to 14 but no less. Is this because my code isn't good?"
This does not indicate a problem. 14 is still high, unless you've got a 16-core cpu.
The threadpool will try to adjust and do the work with as few threads as possible.
You should start to worry when the number of threads goes up significantly.
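If you want numbers rather than Task Manager's view, the pool's limits and the process thread count can be read from code. The following is a minimal sketch; `PoolInfo.Describe` is a hypothetical helper, and the values it reports depend on the machine and runtime.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class PoolInfo
{
    // Hypothetical helper that summarizes the thread pool limits and the
    // number of OS threads currently in this process.
    public static string Describe()
    {
        int minWorker, minIo, maxWorker, maxIo, freeWorker, freeIo;
        ThreadPool.GetMinThreads(out minWorker, out minIo);
        ThreadPool.GetMaxThreads(out maxWorker, out maxIo);
        ThreadPool.GetAvailableThreads(out freeWorker, out freeIo);
        // Counts every thread in the process, not only pool threads.
        int osThreads = Process.GetCurrentProcess().Threads.Count;
        return string.Format(
            "worker threads: min {0}, max {1}, available {2}; OS threads in process: {3}",
            minWorker, maxWorker, freeWorker, osThreads);
    }
}
```

Logging this string periodically shows whether the process thread count keeps climbing, which is the case worth worrying about.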

Is this a good impl for a Producer/Consumer unique keyed buffer?

Can anyone see any problems with this Producer/Consumer unique keyed buffer impl? The idea is that if you add items for processing with the same key, only the latest value will be processed and the old/existing value will be thrown away.
public sealed class PCKeyedBuffer<K,V>
{
private readonly object _locker = new object();
private readonly Thread _worker;
private readonly IDictionary<K, V> _items = new Dictionary<K, V>();
private readonly Action<V> _action;
private volatile bool _shutdown;
public PCKeyedBuffer(Action<V> action)
{
_action = action;
(_worker = new Thread(Consume)).Start();
}
public void Shutdown(bool waitForWorker)
{
_shutdown = true;
if (waitForWorker)
_worker.Join();
}
public void Add(K key, V value)
{
lock (_locker)
{
_items[key] = value;
Monitor.Pulse(_locker);
}
}
private void Consume()
{
while (true)
{
IList<V> values;
lock (_locker)
{
while (_items.Count == 0) Monitor.Wait(_locker);
values = new List<V>(_items.Values);
_items.Clear();
}
foreach (V value in values)
{
_action(value);
}
if(_shutdown) return;
}
}
}
static void Main(string[] args)
{
PCKeyedBuffer<string, double> l = new PCKeyedBuffer<string, double>(delegate(double d)
{
Thread.Sleep(10);
Console.WriteLine(
"Processed: " + d.ToString());
});
for (double i = 0; i < 100; i++)
{
l.Add(i.ToString(), i);
}
for (double i = 0; i < 100; i++)
{
l.Add(i.ToString(), i);
}
for (double i = 0; i < 100; i++)
{
l.Add(i.ToString(), i);
}
Console.WriteLine("Done Enqeueing");
Console.ReadLine();
}
After a quick once over I would say that the following code in the Consume method
while (_items.Count == 0) Monitor.Wait(_locker);
should probably Wait using a timeout and check the _shutdown flag on each iteration, especially since you are not setting your consumer thread to be a background thread.
In addition, the Consume method does not appear very scalable, since it single-handedly tries to process an entire queue of items. Of course, this might depend on the rate at which items are produced. I would probably have the consumer focus on a single item in the list and then use the TPL to run multiple concurrent consumers; this way you can take advantage of multiple cores while letting the TPL balance the workload for you. To reduce the locking required for a consumer processing a single item, you could use a ConcurrentDictionary.
As Chris pointed out, ConcurrentDictionary already exists and is more scalable. It was added to the base libraries in .NET 4.0, and is also available as an add-on to .NET 3.5.
This is one of the few attempts at creating a custom producer/consumer that is actually correct, so job well done in that regard. However, like Chris pointed out, your stop flag will be ignored while Monitor.Wait is blocked. There is no need to rehash his suggestion for fixing that. The advice I can offer is to use a BlockingCollection instead of doing the Wait/Pulse calls manually. That would also solve the shutdown problem, since the Take method is cancellable. If you are not using .NET 4.0, then it is available in the Reactive Extensions download that Stephen linked to. If that is not an option, then Stephen Toub has a correct implementation here (except his is not cancellable, but you can always do a Thread.Interrupt to safely unblock it). What you can do is feed KeyValuePair items into the queue instead of using a Dictionary.
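The timed-wait suggestion could look roughly like the sketch below. It is a hypothetical rework of the class from the question, not a definitive fix: `KeyedBuffer` is an illustrative name, the 100 ms timeout is an arbitrary assumption, and shutdown is assumed to mean "finish pending items, then stop".

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical rework of PCKeyedBuffer with a shutdown-aware wait.
public sealed class KeyedBuffer<K, V>
{
    private readonly object _locker = new object();
    private readonly Dictionary<K, V> _items = new Dictionary<K, V>();
    private readonly Action<V> _action;
    private readonly Thread _worker;
    private volatile bool _shutdown;

    public KeyedBuffer(Action<V> action)
    {
        _action = action;
        // A background thread does not keep the process alive on its own.
        _worker = new Thread(Consume) { IsBackground = true };
        _worker.Start();
    }

    public void Add(K key, V value)
    {
        lock (_locker)
        {
            _items[key] = value; // a newer value replaces a pending one
            Monitor.Pulse(_locker);
        }
    }

    public void Shutdown()
    {
        _shutdown = true;
        _worker.Join();
    }

    private void Consume()
    {
        while (true)
        {
            List<V> values;
            lock (_locker)
            {
                // Wake up at least every 100 ms to re-check the flag.
                while (_items.Count == 0 && !_shutdown)
                    Monitor.Wait(_locker, 100);
                if (_items.Count == 0)
                    return; // shutdown requested and nothing is pending
                values = new List<V>(_items.Values);
                _items.Clear();
            }
            foreach (V value in values)
                _action(value);
        }
    }
}
```

With this shape, `Shutdown` returns within roughly one timeout period even when no further items arrive, instead of blocking in `Join` forever.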
