How to access the underlying default concurrent queue of a blocking collection?

I have multiple producers and a single consumer. However, if an item is already in the queue and not yet consumed, a producer should not queue it again (in other words, a unique, no-duplicates blocking collection that uses the default concurrent queue).
if (!myBlockingColl.Contains(item))
    myBlockingColl.Add(item);
However, the blocking collection has neither a Contains method nor any kind of TryPeek-like method. How can I access the underlying concurrent queue so I can do something like:
if (!myBlockingColl.myConcurQ.TryPeek(item))
    myBlockingColl.Add(item);
In a tail spin?

This is an interesting question. This is the first time I have seen someone ask for a blocking queue that ignores duplicates. Oddly enough, I could find nothing like what you want that already exists in the BCL. I say this is odd because BlockingCollection can accept an IProducerConsumerCollection<T> as the underlying collection, and that interface has a TryAdd method that is documented as being able to fail when duplicates are detected. The problem is that I see no concrete implementation of IProducerConsumerCollection<T> that prevents duplicates. At least we can write our own.
public class NoDuplicatesConcurrentQueue<T> : IProducerConsumerCollection<T>
{
    // TODO: You will need to fully implement IProducerConsumerCollection<T>.
    private readonly Queue<T> queue = new Queue<T>();

    public bool TryAdd(T item)
    {
        lock (queue)
        {
            if (!queue.Contains(item))
            {
                queue.Enqueue(item);
                return true;
            }
            return false;
        }
    }

    public bool TryTake(out T item)
    {
        lock (queue)
        {
            if (queue.Count > 0)
            {
                item = queue.Dequeue();
                return true;
            }
            item = default(T); // assigning null here would not compile for an unconstrained T
            return false;
        }
    }
}
Now that we have our IProducerConsumerCollection that does not accept duplicates we can use it like this:
public class Example
{
    private BlockingCollection<object> queue = new BlockingCollection<object>(new NoDuplicatesConcurrentQueue<object>());

    public Example()
    {
        new Thread(Consume).Start();
    }

    public void Produce(object item)
    {
        bool unique = queue.TryAdd(item);
    }

    private void Consume()
    {
        while (true)
        {
            object item = queue.Take();
        }
    }
}
You may not like my implementation of NoDuplicatesConcurrentQueue. You are certainly free to implement your own using ConcurrentQueue<T> or another collection if you need the low-lock performance that the TPL collections provide.
Update:
I was able to test the code this morning. There is some good news and bad news. The good news is that this will technically work. The bad news is that you probably will not want to do this because BlockingCollection.TryAdd intercepts the return value from the underlying IProducerConsumerCollection.TryAdd method and throws an exception when false is detected. Yep, that is right. It does not return false like you would expect and instead generates an exception. I have to be honest, this is both surprising and ridiculous. The whole point of the TryXXX methods is that they should not throw exceptions. I am deeply disappointed.
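To make the caveat concrete, here is a minimal sketch (not part of the original answer) of the behavior, assuming the NoDuplicatesConcurrentQueue<T> defined above; the exception raised by BlockingCollection<T>.TryAdd in this case is InvalidOperationException:

var queue = new BlockingCollection<object>(new NoDuplicatesConcurrentQueue<object>());
var item = new object();

queue.TryAdd(item); // returns true; the item was accepted

try
{
    queue.TryAdd(item); // the underlying TryAdd returns false...
}
catch (InvalidOperationException)
{
    // ...but instead of surfacing false, BlockingCollection throws.
}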

In addition to the caveat Brian Gideon mentioned in his update, his solution suffers from these performance issues:
- O(n) operations on the queue (queue.Contains(item)) have a severe impact on performance as the queue grows
- locks limit concurrency (which he does mention)
The following code improves on Brian's solution by:
- using a hash set to do O(1) lookups
- combining 2 data structures from the System.Collections.Concurrent namespace
N.B. As there is no ConcurrentHashSet, I'm using a ConcurrentDictionary, ignoring the values.
In this rare case it is luckily possible to simply compose a more complex concurrent data structure out of multiple simpler ones, without adding locks. The order of operations on the 2 concurrent data structures is important here.
public class NoDuplicatesConcurrentQueue<T> : IProducerConsumerCollection<T>
{
    private readonly ConcurrentDictionary<T, bool> existingElements = new ConcurrentDictionary<T, bool>();
    private readonly ConcurrentQueue<T> queue = new ConcurrentQueue<T>();

    public bool TryAdd(T item)
    {
        if (existingElements.TryAdd(item, false))
        {
            queue.Enqueue(item);
            return true;
        }
        return false;
    }

    public bool TryTake(out T item)
    {
        if (queue.TryDequeue(out item))
        {
            bool _;
            existingElements.TryRemove(item, out _);
            return true;
        }
        return false;
    }

    // ... (the remaining IProducerConsumerCollection<T> members still need to be implemented)
}
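A quick usage sketch (illustrative only): an element can be re-added only after it has been taken, which is exactly the no-duplicates-while-queued behavior the question asks for.

var q = new NoDuplicatesConcurrentQueue<string>();
q.TryAdd("job-1"); // true
q.TryAdd("job-1"); // false: "job-1" is still queued

string taken;
q.TryTake(out taken); // true, taken == "job-1"
q.TryAdd("job-1");    // true again: the element has left the queue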
N.B. Another way of looking at this problem: you want a set that preserves insertion order.

I would suggest guarding your operations with a lock so that the check and the add cannot interleave with another thread's writes, making them effectively atomic. For example, with any IEnumerable:
object bcLocker = new object();

// ...

lock (bcLocker)
{
    bool foundTheItem = false;
    foreach (someClass nextItem in myBlockingColl)
    {
        if (nextItem.Equals(item))
        {
            foundTheItem = true;
            break;
        }
    }
    if (foundTheItem == false)
    {
        // Add here
    }
}

How to access the underlying default concurrent queue of a blocking collection?
The BlockingCollection<T> is backed by a ConcurrentQueue<T> by default. In other words, if you don't explicitly specify its backing storage, it will create a ConcurrentQueue<T> behind the scenes. Since you want direct access to the underlying storage, you can manually create a ConcurrentQueue<T> and pass it to the constructor of the BlockingCollection<T>:
ConcurrentQueue<Item> queue = new();
BlockingCollection<Item> collection = new(queue);
Unfortunately the ConcurrentQueue<T> collection doesn't have a TryPeek method with an input parameter, so what you intend to do is not possible:
if (!queue.TryPeek(item)) // Compile error: missing out keyword
    collection.Add(item);
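For reference, the TryPeek overload that does exist takes an out parameter and only exposes the head of the queue, so the best you can do is compare the head against your item (a sketch, not a substitute for a real Contains check):

if (queue.TryPeek(out Item head) && head.Equals(item))
{
    // item happens to be at the head; items deeper in the queue are invisible to TryPeek
}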
Also be aware that the queue is now owned by the collection. If you mutate it directly (by issuing Enqueue or TryDequeue calls), the collection's internal bookkeeping will get out of sync, and subsequent operations on the collection may throw exceptions.

Related

Handling concurrency at group level rather than application level

I would like to handle a concurrency issue in the API. Here is the situation: we get requests from multiple users for the same group, and there can be multiple groups as well. I think the solution below should work; correct me if I'm wrong.
// This will be a singleton across the API
ConcurrentDictionary<string, string> dict = new ConcurrentDictionary<string, string>();

if (dict.ContainsKey(groupId))
{
    throw new Exception("request already accepted");
}
else
{
    // Thinking this is a thread-safe operation, or I could put a lock statement here
    if (dict.TryAdd(groupId, "Added") == false)
    {
        throw new Exception("request already accepted");
    }
    // continue the original logic
}
Every 10 minutes, we will clean the older keys out of the dictionary (note that this cleanup should not block normal operation, since it only touches old, already-used keys). Does ConcurrentDictionary lock at the key level rather than the dictionary level, so that we block only the requests for a particular group instead of all requests? One quick solution is a lock wrapper around the dictionary's get and add operations, but that would stop all requests from proceeding, and we want to block only at the group level. Any help is greatly appreciated.
Adding an item to a ConcurrentDictionary is a very fast operation. You are also not making threads wait for the first one to finish; you throw right away if the key cannot be added. (Internally, ConcurrentDictionary locks at the hash-bucket level for writes and reads without locking at all, so operations on different keys rarely contend.)
That makes me think that a double-checked lock is probably not needed in your case, so I would simply do your inner check without the outer one:
if (dict.TryAdd(groupId, "Added") == false)
{
    throw new Exception("request already accepted");
}
If you get way too many requests after the first one, then I would do what you have done, since ContainsKey does not lock.
Another interesting topic is how you are going to clean this up.
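One possible shape for that cleanup (a sketch, under the assumption that the dictionary stores the time each group was accepted rather than a fixed string; enumerating a ConcurrentDictionary takes no locks and is safe to run concurrently with writers):

// Hypothetical variant: store the acceptance time as the value...
ConcurrentDictionary<string, DateTime> dict = new ConcurrentDictionary<string, DateTime>();

// ...and periodically sweep entries older than 10 minutes.
void CleanOldKeys()
{
    DateTime cutoff = DateTime.UtcNow.AddMinutes(-10);
    foreach (var pair in dict) // does not block concurrent TryAdd calls
    {
        if (pair.Value < cutoff)
        {
            DateTime _;
            dict.TryRemove(pair.Key, out _);
        }
    }
}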
Alternatively, maybe you could do all this locking in an IDisposable object that removes itself from the dictionary at dispose time. For example:
// NOTE: THIS IS JUST A SKETCH
// In your controller, you can simply do this...
//
public void SomeControllerAction(int groupId)
{
    using (var operation = new GroupOperation(groupId))
    {
        // In here I am sure I am the only operation for this group
    }
    // In here I am sure that the operation got removed from the dictionary
}

// This class hides all the complexity of the concurrent dictionary
//
public class GroupOperation : IDisposable
{
    private static readonly ConcurrentDictionary<int, int> InProgressGroups =
        new ConcurrentDictionary<int, int>();

    private readonly int groupId;

    public GroupOperation(int groupId)
    {
        this.groupId = groupId;
        if (!InProgressGroups.TryAdd(groupId, 1))
        {
            throw new Exception("Sorry, operation in progress for your group");
        }
    }

    public void Dispose()
    {
        int _;
        InProgressGroups.TryRemove(groupId, out _);
    }
}

How to implement this specific Producer-Consumer pattern

I'm trying to write a Windows service whose producers and consumers work like this:
- Producer: at scheduled times, get all unprocessed items (Processed = 0 on their row in the db) and add each one to the work queue if it isn't already in the work queue
- Consumer: constantly pull items from the work queue, process them, and update the db (Processed = 1 on their row)
I've tried to look for examples of this exact data flow in C#/.NET so I can leverage existing libraries, but so far I haven't found exactly that.
I see on https://blog.stephencleary.com/2012/11/async-producerconsumer-queue-using.html the example
private static void Produce(BufferBlock<int> queue, IEnumerable<int> values)
{
    foreach (var value in values)
    {
        queue.Post(value);
    }
    queue.Complete();
}

private static async Task<IEnumerable<int>> Consume(BufferBlock<int> queue)
{
    var ret = new List<int>();
    while (await queue.OutputAvailableAsync())
    {
        ret.Add(await queue.ReceiveAsync());
    }
    return ret;
}
Here's the "idea" of what I'm trying to modify that to do:
while (true)
{
    if (await WorkQueue.OutputAvailableAsync())
    {
        ProcessItem(await WorkQueue.ReceiveAsync());
    }
    else
    {
        await Task.Delay(5000);
    }
}
...would be how the Consumer works, and
MyTimer.Elapsed += Produce;

static async void Produce(object source, ElapsedEventArgs e)
{
    IEnumerable<Item> items = GetUnprocessedItemsFromDb();
    foreach (var item in items)
        if (!WorkQueue.Contains(w => w.Id == item.Id))
            WorkQueue.Enqueue(item);
}
...would be how the Producer works.
That's a rough idea of what I'm trying to do. Can any of you show me the right way to do it, or link me to the proper documentation for solving this type of problem?
Creating a custom BufferBlock<T> that rejects duplicate messages is anything but trivial. The TPL Dataflow components do not expose their internal state for the purpose of customization. You can see here an attempt to circumvent this limitation by creating a custom ActionBlock<T> with an exposed IEnumerable<T> InputQueue property. The code is lengthy and obscure, and creating a custom BufferUniqueBlock<T> might need double the amount of code, because that class implements the ISourceBlock<T> interface too.
My suggestion is to find some other way to avoid processing an Item twice, instead of preventing duplicates from entering the queue. Maybe you could make the Consumer responsible for querying the database and checking whether the currently received item is still unprocessed before actually processing it.
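A minimal sketch of that suggestion, with hypothetical helper names (IsAlreadyProcessedInDbAsync, MarkProcessedInDbAsync and ProcessItem stand in for your own data access and processing code):

private static async Task ConsumeAsync(BufferBlock<Item> workQueue)
{
    while (await workQueue.OutputAvailableAsync())
    {
        Item item = await workQueue.ReceiveAsync();

        // Re-check the database, so a duplicate that slipped into the
        // queue is skipped instead of being processed twice.
        if (await IsAlreadyProcessedInDbAsync(item.Id))
            continue;

        ProcessItem(item);
        await MarkProcessedInDbAsync(item.Id); // set Processed = 1 on its row
    }
}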

Are threads automatically queued in order to access locked code?

I have code that manages a large queue of data; it's guarded with a lock statement to ensure only a single thread works on it at a time.
The order of data in the queue is really important, and each thread, with its parameters, can either add to it or take from it.
How do I ensure threads acquire the lock in FIFO order, like my queue? Does the lock statement guarantee this?
var t = new Thread(() => parse(parameters)); // This is how I start my threads.
t.Start();
No, the lock statement does not guarantee FIFO ordering. Per Albahari:
If more than one thread contends the lock, they are queued on a “ready queue” and granted the lock on a first-come, first-served basis (a caveat is that nuances in the behavior of Windows and the CLR mean that the fairness of the queue can sometimes be violated).
If you want to ensure that your items are retrieved in a FIFO order, you should use the ConcurrentQueue<T> collection instead.
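For example (a minimal sketch; ConcurrentQueue<T> preserves insertion order without any explicit locking on your part):

ConcurrentQueue<string> queue = new ConcurrentQueue<string>();
queue.Enqueue("first");
queue.Enqueue("second");

string item;
while (queue.TryDequeue(out item))
{
    Console.WriteLine(item); // prints "first", then "second"
}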
Edit: If you're targeting .NET 2.0, you could use a custom implementation for a concurrent thread-safe queue. Here's a trivial one:
public class ThreadSafeQueue<T>
{
    private readonly object syncLock = new object();
    private readonly Queue<T> innerQueue = new Queue<T>();

    public void Enqueue(T item)
    {
        lock (syncLock)
            innerQueue.Enqueue(item);
    }

    public bool TryDequeue(out T item)
    {
        lock (syncLock)
        {
            if (innerQueue.Count == 0)
            {
                item = default(T);
                return false;
            }
            item = innerQueue.Dequeue();
            return true;
        }
    }
}
lock doesn't guarantee first-in, first-out access. An alternate approach would be a Queue if you are limited to .NET 2.0. Keep in mind that Queue is not thread safe, so you should synchronize access to it.

Is there anything like an expandable Queue in C#?

I have a set of IDs on which I do some operations:
Queue<string> queue = new Queue<string>();
queue.Enqueue("1");
queue.Enqueue("2");
...
queue.Enqueue("10");

foreach (string id in queue)
{
    DoSomeWork(id);
}

static void DoSomeWork(string id)
{
    // Do some work and oooo there are new ids which should also be processed :)
    foreach (string newID in newIDs)
    {
        if (!queue.Contains(newID)) queue.Enqueue(newID);
    }
}
Is it possible to add new items to the queue in DoSomeWork() so that they will also be processed by the main foreach loop?
What you're doing is using an iterator over a changing collection. This is bad practice, since some collections will throw an exception when you do this (the collection should not change during enumeration).
Use the following approach, which picks up new items as well:
while (queue.Count > 0)
{
    DoSomeWork(queue.Dequeue());
}
Use Dequeue instead of a foreach loop. Most enumerators become invalid whenever the underlying container is changed, and Enqueue/Dequeue are the natural operations on a Queue. Otherwise you could use List<T> or HashSet<T>.
while (queue.Count > 0)
{
    var value = queue.Dequeue();
    ...
}
To check whether an item has already been processed, a HashSet<T> is a fast solution. I typically use a combination of HashSet and Queue in those cases. The advantage of this solution is that it's O(n) overall, because checking and adding to a HashSet is O(1); your original code was O(n^2), since Contains on a Queue is O(n).
Queue<string> queue = new Queue<string>();
HashSet<string> allItems = new HashSet<string>();

void Add(string item)
{
    if (allItems.Add(item))
        queue.Enqueue(item);
}

void DoWork()
{
    while (queue.Count > 0)
    {
        var value = queue.Dequeue();
        ...
    }
}
It is common for loop iterations to add more work; just pass the queue into the method as an argument and adding to it should work fine.
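For example, a sketch of that approach with illustrative names (ComputeNewIds is a hypothetical helper standing in for the real work; the HashSet gives the O(1) duplicate check discussed above):

static void DoSomeWork(string id, Queue<string> queue, HashSet<string> seen)
{
    foreach (string newId in ComputeNewIds(id)) // hypothetical helper
    {
        if (seen.Add(newId)) // false if already seen, so no duplicates are queued
            queue.Enqueue(newId);
    }
}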
The problem is that you should be using Dequeue:
while (queue.Count > 0)
{
    DoSomeWork(queue.Dequeue());
}

Producer Consumer queue does not dispose

I have built a producer/consumer queue wrapping a .NET 4.0 ConcurrentQueue, with ManualResetEventSlim signaling between the producing side (Enqueue) and the consuming side (a while(true) loop per thread).
The queue looks like:
public class ProducerConsumerQueue<T> : IDisposable, IProducerConsumerQueue<T>
{
    private bool _IsActive = true;

    public int Count
    {
        get { return this._workerQueue.Count; }
    }

    public bool IsActive
    {
        get { return _IsActive; }
        set { _IsActive = value; }
    }

    public event Dequeued<T> OnDequeued = delegate { };
    public event LoggedHandler OnLogged = delegate { };

    private ConcurrentQueue<T> _workerQueue = new ConcurrentQueue<T>();
    private object _locker = new object();
    Thread[] _workers;

    #region IDisposable Members

    int _workerCount = 0;
    ManualResetEventSlim _mres = new ManualResetEventSlim();

    public void Dispose()
    {
        _IsActive = false;
        _mres.Set();
        LogWriter.Write("55555555555");
        for (int i = 0; i < _workerCount; i++)
        {
            // Wait for the consumer's thread to finish.
            _workers[i].Join();
        }
        LogWriter.Write("6666666666");
        // Release any OS resources.
    }

    public ProducerConsumerQueue(int workerCount)
    {
        try
        {
            _workerCount = workerCount;
            _workers = new Thread[workerCount];
            // Create and start a separate thread for each worker
            for (int i = 0; i < workerCount; i++)
                (_workers[i] = new Thread(Work)).Start();
        }
        catch (Exception ex)
        {
            OnLogged(ex.Message + ex.StackTrace);
        }
    }

    #endregion

    #region IProducerConsumerQueue<T> Members

    public void EnqueueTask(T task)
    {
        if (_IsActive)
        {
            _workerQueue.Enqueue(task);
            //Monitor.Pulse(_locker);
            _mres.Set();
        }
    }

    public void Work()
    {
        while (_IsActive)
        {
            try
            {
                T item = Dequeue();
                if (item != null)
                    OnDequeued(item);
            }
            catch (Exception ex)
            {
                OnLogged(ex.Message + ex.StackTrace);
            }
        }
    }

    #endregion

    private T Dequeue()
    {
        try
        {
            T dequeueItem;
            //if (_workerQueue.Count > 0)
            //{
            _workerQueue.TryDequeue(out dequeueItem);
            if (dequeueItem != null)
                return dequeueItem;
            //}
            if (_IsActive)
            {
                _mres.Wait();
                _mres.Reset();
            }
            //_workerQueue.TryDequeue(out dequeueItem);
            return dequeueItem;
        }
        catch (Exception ex)
        {
            OnLogged(ex.Message + ex.StackTrace);
            T dequeueItem;
            //if (_workerQueue.Count > 0)
            //{
            _workerQueue.TryDequeue(out dequeueItem);
            return dequeueItem;
        }
    }

    public void Clear()
    {
        _workerQueue = new ConcurrentQueue<T>();
    }
}
When calling Dispose, it sometimes blocks on the Join (with one consuming thread) and the Dispose method gets stuck. I guess it gets stuck on the Wait of the reset event, but I call Set in Dispose precisely to avoid that.
Any suggestions?
Update: I understand your point about needing a queue internally. My suggestion to use a BlockingCollection<T> is based on the fact that your code contains a lot of logic to provide the blocking behavior. Writing such logic yourself is very prone to bugs (I know this from experience); so when there's an existing class within the framework that does at least some of the work for you, it's generally preferable to go with that.
A complete example of how you can implement this class using a BlockingCollection<T> is a little bit too large to include in this answer, so I've posted a working example on pastebin.com; feel free to take a look and see what you think.
I also wrote an example program demonstrating the above example here.
Is my code correct? I wouldn't say yes with too much confidence; after all, I haven't written unit tests, run any diagnostics on it, etc. It's just a basic draft to give you an idea how using BlockingCollection<T> instead of ConcurrentQueue<T> cleans up a lot of your logic (in my opinion) and makes it easier to focus on the main purpose of your class (consuming items from a queue and notifying subscribers) rather than a somewhat difficult aspect of its implementation (the blocking behavior of the internal queue).
Question posed in a comment:
Any reason you're not using BlockingCollection<T>?
Your answer:
[...] i needed a queue.
From the MSDN documentation on the default constructor for the BlockingCollection<T> class:
The default underlying collection is a ConcurrentQueue<T>.
If the only reason you opted to implement your own class instead of using BlockingCollection<T> is that you need a FIFO queue, well then... you might want to rethink your decision. A BlockingCollection<T> instantiated using the default parameterless constructor is a FIFO queue.
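To illustrate (a minimal sketch, not the pastebin example linked above):

// BlockingCollection<T> created with the default constructor is FIFO,
// backed by a ConcurrentQueue<T>.
var queue = new BlockingCollection<string>();

var worker = new Thread(() =>
{
    // Blocks while the queue is empty; exits after CompleteAdding is called.
    foreach (string item in queue.GetConsumingEnumerable())
    {
        Console.WriteLine(item);
    }
});
worker.Start();

queue.Add("work item");
queue.CompleteAdding(); // lets the consumer finish instead of blocking forever
worker.Join();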
That said, while I don't think I can offer a comprehensive analysis of the code you've posted, I can at least offer a couple of pointers:
- I'd be very hesitant to use events in the way that you are here for a class that deals with such tricky multithreaded behavior. Calling code can attach any event handlers it wants, and these can in turn throw exceptions (which you don't catch), block for long periods of time, or possibly even deadlock for reasons completely outside your control, which is very bad in the case of a blocking queue.
- There's a race condition in your Dequeue and Dispose methods.
Look at these lines of your Dequeue method:
if (_IsActive) // point A
{
    _mres.Wait();  // point C
    _mres.Reset(); // point D
}
And now take a look at these two lines from Dispose:
_IsActive = false;
_mres.Set(); // point B
Let's say you have three threads, T1, T2, and T3. T1 and T2 are both at point A, where each checks _IsActive and finds true. Then Dispose is called, and T3 sets _IsActive to false (but T1 and T2 have already passed point A) and then reaches point B, where it calls _mres.Set(). Then T1 gets to point C, moves on to point D, and calls _mres.Reset(). Now T2 reaches point C and will be stuck forever since _mres.Set will not be called again (any thread executing Enqueue will find _IsActive == false and return immediately, and the thread executing Dispose has already passed point B).
I'd be happy to try and offer some help on solving this race condition, but I'm skeptical that BlockingCollection<T> isn't in fact exactly the class you need for this. If you can provide some more information to convince me that this isn't the case, maybe I'll take another look.
Since _IsActive isn't marked volatile and there's no lock around all accesses to it, each core can keep a separate cached copy of the value, and that cache may never be refreshed. So setting _IsActive to false in Dispose may never be observed by the running threads.
http://igoro.com/archive/volatile-keyword-in-c-memory-model-explained/
private volatile bool _IsActive=true;
