I understand that BlockingCollection using ConcurrentQueue has a boundedcapacity of 100.
However I'm unsure as to what that means.
I'm trying to achieve a concurrent cache which, can dequeue, can deque/enque in one operation if the queue size is too large, (i.e. loose messages when the cache overflows). Is there a way to use boundedcapacity for this or is it better to manually do this or create a new collection.
Basically I have a reading thread and several writing threads. I would like it if the data in the queue is the "freshest" of all the writers.
A bounded capacity of N means that if the queue already contains N items, any thread attempting to add another item will block until a different thread removes an item.
What you seem to want is a different concept - you want most recently added item to be the first item that is dequeued by the consuming thread.
You can achieve that by using a ConcurrentStack rather than a ConcurrentQueue for the underlying store.
You would use this constructor and pass in a ConcurrentStack.
For example:
var blockingCollection = new BlockingCollection<int>(new ConcurrentStack<int>());
By using ConcurrentStack, you ensure that each item that the consuming thread dequeues will be the freshest item in the queue at that time.
Also note that if you specify an upper bound for the blocking collection, you can use BlockingCollection.TryAdd() which will return false if the collection was full at the time you called it.
It sounds to me like you're trying to build something like an MRU (most recently used) cache. BlockingCollection is not the best way to do that.
I would suggest instead that you use a LinkedList. It's not thread-safe, so you'll have to provide your own synchronization, but that's not too tough. Your enqueue method looks like this:
LinkedList<MyType> TheQueue = new LinkedList<MyType>();
object listLock = new object();
void Enqueue(MyType item)
{
lock (listLock)
{
TheQueue.AddFirst(item);
while (TheQueue.Count > MaxQueueSize)
{
// Queue overflow. Reduce to max size.
TheQueue.RemoveLast();
}
}
}
And dequeue is even easier:
MyType Dequeue()
{
lock (listLock)
{
return (TheQueue.Count > 0) ? TheQueue.RemoveLast() : null;
}
}
It's a little more involved if you want the consumers to do non-busy waits on the queue. You can do it with Monitor.Wait and Monitor.Pulse. See the example on the Monitor.Pulse page for an example.
Update:
It occurs to me that you could do the same thing with a circular buffer (an array). Just maintain head and tail pointers. You insert at head and remove at tail. If you go to insert, and head == tail, then you need to increment tail, which effectively removes the previous tail item.
If you want a custom BlockingCollection that holds the N most recent elements, and drops the oldest elements when it's full, you could create one quite easily based on a Channel<T>. The Channels are intended to be used in asynchronous scenarios, but making them block the consumer is trivial and should not cause any unwanted side-effects (like deadlocks), even if used in an environment with a SynchronizationContext installed.
public class MostRecentBlockingCollection<T>
{
private readonly Channel<T> _channel;
public MostRecentBlockingCollection(int capacity)
{
_channel = Channel.CreateBounded<T>(new BoundedChannelOptions(capacity)
{
FullMode = BoundedChannelFullMode.DropOldest,
});
}
public bool IsCompleted => _channel.Reader.Completion.IsCompleted;
public void Add(T item)
=> _channel.Writer.WriteAsync(item).AsTask().GetAwaiter().GetResult();
public T Take()
=> _channel.Reader.ReadAsync().AsTask().GetAwaiter().GetResult();
public void CompleteAdding() => _channel.Writer.Complete();
public IEnumerable<T> GetConsumingEnumerable()
{
while (_channel.Reader.WaitToReadAsync().AsTask().GetAwaiter().GetResult())
while (_channel.Reader.TryRead(out var item))
yield return item;
}
}
The MostRecentBlockingCollection class blocks only the consumer. The producer can always add items in the collection, causing (potentially) some previously added elements to be dropped.
Adding cancellation support should be straightforward, since the Channel<T> API already supports it. Adding support for timeout is less trivial, but shouldn't be very difficult to do.
Related
I have a System.Collections.Generic.List<T> to which I only ever add items in a timer callback. The timer is restarted only after the operation completes.
I have a System.Collections.Concurrent.ConcurrentQueue<T> which stores indices of added items in the list above. This store operation is also always performed in the same timer callback described above.
Is a read operation that iterates the queue and accesses the corresponding items in the list thread safe?
Sample code:
private List<Object> items;
private ConcurrentQueue<int> queue;
private Timer timer;
private void callback(object state)
{
int index = items.Count;
items.Add(new object());
if (true)//some condition here
queue.Enqueue(index);
timer.Change(TimeSpan.FromMilliseconds(500), TimeSpan.FromMilliseconds(-1));
}
//This can be called from any thread
public IEnumerable<object> AccessItems()
{
foreach (var index in queue)
{
yield return items[index];
}
}
My understanding:
Even if the list is resized when it is being indexed, I am only accessing an item that already exists, so it does not matter whether it is read from the old array or the new array. Hence this should be thread-safe.
Is a read operation that iterates the queue and accesses the corresponding items in the list thread safe?
Is it documented as being thread safe?
If no, then it is foolish to treat it as thread safe, even if it is in this implementation by accident. Thread safety should be by design.
Sharing memory across threads is a bad idea in the first place; if you don't do it then you don't have to ask whether the operation is thread safe.
If you have to do it then use a collection designed for shared memory access.
If you can't do that then use a lock. Locks are cheap if uncontended.
If you have a performance problem because your locks are contended all the time then fix that problem by changing your threading architecture rather than trying to do dangerous and foolish things like low-lock code. No one writes low-lock code correctly except for a handful of experts. (I am not one of them; I don't write low-lock code either.)
Even if the list is resized when it is being indexed, I am only accessing an item that already exists, so it does not matter whether it is read from the old array or the new array.
That's the wrong way to think about it. The right way to think about it is:
If the list is resized then the list's internal data structures are being mutated. It is possible that the internal data structure is mutated into an inconsistent form halfway through the mutation, that will be made consistent by the time the mutation is finished. Therefore my reader can see this inconsistent state from another thread, which makes the behaviour of my entire program unpredictable. It could crash, it could go into an infinite loop, it could corrupt other data structures, I don't know, because I'm running code that assumes a consistent state in a world with inconsistent state.
Big edit
The ConcurrentQueue is only safe with regard to the Enqueue(T) and T Dequeue() operations.
You're doing a foreach on it and that doesn't get synchronized at the required level.
The biggest problem in your particular case is the fact the enumerating of the Queue (which is a Collection in it's own right) might throw the wellknown "Collection has been modified" exception. Why is that the biggest problem ? Because you are adding things to the queue after you've added the corresponding objects to the list (there's also a great need for the List to be synchronized but that + the biggest problem get solved with just one "bullet"). While enumerating a collection it is not easy to swallow the fact that another thread is modifying it (even if on a microscopic level the modification is a safe - ConcurrentQueue does just that).
Therefore you absolutely need synchronize the access to the queues (and the central List while you're at it) using another means of synchronization (and by that I mean you can also forget abount ConcurrentQueue and use a simple Queue or even a List since you never Dequeue things).
So just do something like:
public void Writer(object toWrite) {
this.rwLock.EnterWriteLock();
try {
int tailIndex = this.list.Count;
this.list.Add(toWrite);
if (..condition1..)
this.queue1.Enqueue(tailIndex);
if (..condition2..)
this.queue2.Enqueue(tailIndex);
if (..condition3..)
this.queue3.Enqueue(tailIndex);
..etc..
} finally {
this.rwLock.ExitWriteLock();
}
}
and in the AccessItems:
public IEnumerable<object> AccessItems(int queueIndex) {
Queue<object> whichQueue = null;
switch (queueIndex) {
case 1: whichQueue = this.queue1; break;
case 2: whichQueue = this.queue2; break;
case 3: whichQueue = this.queue3; break;
..etc..
default: throw new NotSupportedException("Invalid queue disambiguating params");
}
List<object> results = new List<object>();
this.rwLock.EnterReadLock();
try {
foreach (var index in whichQueue)
results.Add(this.list[index]);
} finally {
this.rwLock.ExitReadLock();
}
return results;
}
And, based on my entire understanding of the cases in which your app accesses the List and the various Queues, it should be 100% safe.
End of big edit
First of all: What is this thing you call Thread-Safe ? by Eric Lippert
In your particular case, I guess the answer is no.
It is not the case that inconsistencies might arrise in the global context (the actual list).
Instead it is possible that the actual readers (who might very well "collide" with the unique writer) end up with inconsistencies in themselves (their very own Stacks meaning: local variables of all methods, parameters and also their logically isolated portion of the heap)).
The possibility of such "per-Thread" inconsistencies (the Nth thread wants to learn the number of elements in the List and finds out that value is 39404999 although in reality you only added 3 values) is enough to declare that, generally speaking that architecture is not thread-safe ( although you don't actually change the globally accessible List, simply by reading it in a flawed manner ).
I suggest you use the ReaderWriterLockSlim class.
I think you will find it fits your needs:
private ReaderWriterLockSlim rwLock = new ReaderWriterLockSlim(LockRecursionPolicy.SupportsRecursion);
private List<Object> items;
private ConcurrentQueue<int> queue;
private Timer timer;
private void callback(object state)
{
int index = items.Count;
this.rwLock.EnterWriteLock();
try {
// in this place, right here, there can be only ONE writer
// and while the writer is between EnterWriteLock and ExitWriteLock
// there can exist no readers in the following method (between EnterReadLock
// and ExitReadLock)
// we add the item to the List
// AND do the enqueue "atomically" (as loose a term as thread-safe)
items.Add(new object());
if (true)//some condition here
queue.Enqueue(index);
} finally {
this.rwLock.ExitWriteLock();
}
timer.Change(TimeSpan.FromMilliseconds(500), TimeSpan.FromMilliseconds(-1));
}
//This can be called from any thread
public IEnumerable<object> AccessItems()
{
List<object> results = new List<object>();
this.rwLock.EnterReadLock();
try {
// in this place there can exist a thousand readers
// (doing these actions right here, between EnterReadLock and ExitReadLock)
// all at the same time, but NO writers
foreach (var index in queue)
{
this.results.Add ( this.items[index] );
}
} finally {
this.rwLock.ExitReadLock();
}
return results; // or foreach yield return you like that more :)
}
No because you are reading and writing to/from the same object concurrently. This is not documented to be safe so you can't be sure it is safe. Don't do it.
The fact that it is in fact unsafe as of .NET 4.0 means nothing, btw. Even if it was safe according to Reflector it could change anytime. You can't rely on the current version to predict future versions.
Don't try to get away with tricks like this. Why not just do it in an obviously safe way?
As a side note: Two timer callbacks can execute at the same time, so your code is doubly broken (multiple writers). Don't try to pull off tricks with threads.
It is thread-safish. The foreach statement uses the ConcurrentQueue.GetEnumerator() method. Which promises:
The enumeration represents a moment-in-time snapshot of the contents of the queue. It does not reflect any updates to the collection after GetEnumerator was called. The enumerator is safe to use concurrently with reads from and writes to the queue.
Which is another way of saying that your program isn't going to blow up randomly with an inscrutable exception message like the kind you'll get when you use the Queue class. Beware of the consequences though, implicit in this guarantee is that you may well be looking at a stale version of the queue. Your loop will not be able to see any elements that were added by another thread after your loop started executing. That kind of magic doesn't exist and is impossible to implement in a consistent way. Whether or not that makes your program misbehave is something you will have to think about and can't be guessed from the question. It is pretty rare that you can completely ignore it.
Your usage of the List<> is however utterly unsafe.
(NOTE: I'm using .Net 4, not .Net 4.5, so I cannot use the TPL's DataflowBlock classes.)
TL;DR Version
Ultimately, I'm just looking for a way to process sequential work items using multiple threads in a way that preserves their order in the final output, without requiring an unbounded output buffer.
Motivation
I have existing code to provide a multithreaded mechanism for processing multiple blocks of data where one I/O-bound thread (the "supplier") is reponsible for enqueuing blocks of data for processing. These blocks of data comprise the work items.
One or more threads (the "processors") are responsible for dequeuing one work item at a time, which they process and then write the processed data to an output queue before dequeuing their next work item.
A final I/O-bound thread (the "consumer") is responsible for dequeuing completed work items from the output queue and writing them to the final destination. These work items are (and must be) written in the same order that they were enqueued. I implemented this using a concurrent priority queue, where the priority of each item is defined by its source index.
I'm using this scheme to do some custom compression on a large data stream, where the compression itself is relatively slow but the reading of the uncompressed data and the writing of the compressed data is relatively fast (although I/O-bound).
I process the data in fairly large chunks of the order of 64K, so the overhead of the pipeline is relatively small.
My current solution is working well but it involves a lot of custom code written 6 years ago using many synchronisation events, and the design seems somewhat clunky; therefore I have embarked on academic excercise to see if it can be rewritten using more modern .Net libraries.
The new design
My new design uses the BlockingCollection<> class, and is based somewhat on this Microsoft article.
In particular, look at the section entitled Load Balancing Using Multiple Producers. I have tried using that approach, and therefore I have several processing tasks each of which takes work items from a shared input BlockingCollection and writes its completed items to its own BlockingCollection output queue.
Because each processing task has its own output queue, I'm trying to use BlockingCollection.TakeFromAny() to dequeue the first available completed work item.
The Multiplexer problem
So far so good, but now here comes the problem. The Microsoft article states:
The gaps are a problem. The next stage of the pipeline, the Display Image stage, needs to show images in order and without gaps in the sequence. This is where the multiplexer comes in. Using the TakeFromAny method, the multiplexer waits for input from both of the filter stage producer queues. When an image arrives, the multiplexer looks to see if the image's sequence number is the next in the expected sequence. If it is, the multiplexer passes it to the Display Image stage. If the image is not the next in the sequence, the multiplexer holds the value in an internal look-ahead buffer and repeats the take operation for the input queue that does not have a look-ahead value. This algorithm allows the multiplexer to put together the inputs from the incoming producer queues in a way that ensures sequential order without sorting the values.
Ok, so what happens is that the processing tasks can produce finished items in pretty much any order. The multiplexer is responsible for outputting these items in the correct order.
However...
Imagine that we have 1000 items to process. Further imagine that for some weird reason, the very first item takes longer to process that all the other items combined.
Using my current scheme, the multiplexer will keep reading and buffering items from all the processing output queues until it finds the next one that it's supposed to output. Since the item that its waiting for is (according to my "imagine if" above) only going to appear after ALL the other work items have been processed, I will effectively be buffering all the work items in the entire input!
The amount of data is way too large to allow this to happen. I need to be able to stop the processing tasks from outputting completed work items when the output queue has reached a certain maximum size (i.e. it's a bounded output queue) UNLESS the work item happens to be the one the multiplexer is waiting for.
And that's where I'm getting a bit stuck. I can think of many ways to actually implement this, but they all seem to be over-complex to the extent that they are no better than the code I'm thinking to replace!
What's my question?
My question is: Am I going about this the right way?
I would have thought this would be a well-understood problem, but my research has only turned up articles that seem to ignore the unbounded buffering problem that occurs if a work item takes a very long time compared to all the other work items.
Can anyone point me at any articles that describe a reasonable way to achieve this?
TL;DR Version
Ultimately, I'm just looking for a way to process sequential work items using multiple threads in a way that preserves their order in the final output, without requiring an unbounded output buffer.
Create a pool of items at startup, 1000, say. Store them on a BlockingCollection - a 'pool queue'.
The supplier gets items from the pool queue, loads them from the file, loads in the sequence-number/whatever and submits them to the processors threadpool.
The processors do their stuff and sends the output to the multiplexer. The multiplexer does it job of storing any out-of-order items until earlier items have been processed.
When an item has been completely consumed by whatever the multiplexer outputs to, they are returned to the pool queue for re-use by the supplier.
If one 'slow item' does require enormous amounts of processing, the out-of-order collection in the multiplexer will grow as the 'quick items' slip through on the other pool threads, but because the multiplexer is not actually feeding its items to its output, the pool queue is not being replenished.
When the pool empties, the supplier will block on it and will be unable to supply any more items.
The 'quick items' remaining on the processing pool input will get processed and then processing will stop except for the 'slow item'. The supplier is blocked, the multiplexer has [poolSize-1] items in its collection. No extra memory is being used, no CPU is being wasted, the only thing happening is the processing of the 'slow item'.
When the 'slow item' is finally done, it gets output to the multiplexer.
The multiplexer can now output all [poolSize] items in the required sequential order. As these items are consumed, the pool gets filled up again and the supplier, now able to get items from the pool, runs on, again reading its file an queueing up items to the processor pool.
Auto-regulating, no bounded buffers required, no memory runaway.
Edit: I meant 'no bounded buffers required' :)
Also, no GC holdups - since the items are re-used, they don't need GC'ing.
I think you misunderstand the article. According to the description, it doesn't have an unbounded buffer, there will be at most one value in the look-ahread buffer for each queue. When you dequeue a value that's not the next one, you save it and then wait only on the queue that doesn't have a value in the buffer. (If you have multiple input buffers, the logic will have to be more complicated, or you would need a tree of 2 queue multiplexers.)
If you combine this with BlockingCollections that have specified bounded capacity, you get exactly the behavior you want: if one producer is too slow, the others will pause until the slow thread catches up.
Have you considered not using manual producer/consumer buffering but instead the .AsParallel().AsOrdered() PLINQ alternative? Semantically, this is exactly what you want - a sequence of items processed in parallel but ordered in output. Your code could look as simple as...
var orderedOutput =
ReadSequentialBlocks()
.AsParallel()
.AsOrdered()
.Select(ProcessBlock)
foreach(var item in orderedOutput)
Sink(item);
The default degree of parallelism is the number of processors on your machine, but you can tune it. There is an automatic output buffer. If the default buffering consumes too many resources, you can turn it off:
.WithMergeOptions(ParallelMergeOptions.NotBuffered)
However, I'd certainly give the plain unadorned version a shot first - you never know, it might just work fine out of the box. Finally, if you want the simplicity of auto-multiplexing but a larger-than-zero yet non-automatic buffer, you could always use the PLINQ query to fill a fixed-size BlockingCollection<> which is read with a consuming enumerable on another thread.
Follow up
For completeness, here is the code that I wound up with. Thanks to Martin James for his answer, which provided the basis for the solution.
I'm still not completely happy with the multiplexor (see ParallelWorkProcessor.multiplex()). It works, but it seems a bit klunky.
I used Martin James' idea about a work pool to prevent unbounded growth of the multiplexor buffer, however I substituted a SemaphoreSlim for the work pool queue (since it provides the same functionality, but it's a bit simpler to use and uses less resources).
The worker tasks write their completed items to a concurrent priority queue. This allows me to easily and efficiently find the next item to output.
I used a sample concurrent priority queue from Microsoft, modified to provide an autoreset event that's signalled whenever a new item is enqueued.
Here's the ParallelWorkProcessor class. You use it by providing it with three delegates; one to provide the work items, one to process a work item, and one to output a completed work item.
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics.Contracts;
using System.Threading;
using System.Threading.Tasks;
namespace Demo
{
public sealed class ParallelWorkProcessor<T> where T: class // T is the work item type.
{
public delegate T Read(); // Called by only one thread.
public delegate T Process(T block); // Called simultaneously by multiple threads.
public delegate void Write(T block); // Called by only one thread.
public ParallelWorkProcessor(Read read, Process process, Write write, int numWorkers = 0)
{
_read = read;
_process = process;
_write = write;
numWorkers = (numWorkers > 0) ? numWorkers : Environment.ProcessorCount;
_workPool = new SemaphoreSlim(numWorkers*2);
_inputQueue = new BlockingCollection<WorkItem>(numWorkers);
_outputQueue = new ConcurrentPriorityQueue<int, T>();
_workers = new Task[numWorkers];
startWorkers();
Task.Factory.StartNew(enqueueWorkItems);
_multiplexor = Task.Factory.StartNew(multiplex);
}
private void startWorkers()
{
for (int i = 0; i < _workers.Length; ++i)
{
_workers[i] = Task.Factory.StartNew(processBlocks);
}
}
private void enqueueWorkItems()
{
int index = 0;
while (true)
{
T data = _read();
if (data == null) // Signals end of input.
{
_inputQueue.CompleteAdding();
_outputQueue.Enqueue(index, null); // Special sentinel WorkItem .
break;
}
_workPool.Wait();
_inputQueue.Add(new WorkItem(data, index++));
}
}
private void multiplex()
{
int index = 0; // Next required index.
int last = int.MaxValue;
while (index != last)
{
KeyValuePair<int, T> workItem;
_outputQueue.WaitForNewItem(); // There will always be at least one item - the sentinel item.
while ((index != last) && _outputQueue.TryPeek(out workItem))
{
if (workItem.Value == null) // The sentinel item has a null value to indicate that it's the sentinel.
{
last = workItem.Key; // The sentinel's key is the index of the last block + 1.
}
else if (workItem.Key == index) // Is this block the next one that we want?
{
// Even if new items are added to the queue while we're here, the new items will be lower priority.
// Therefore it is safe to assume that the item we will dequeue now is the same one we peeked at.
_outputQueue.TryDequeue(out workItem);
Contract.Assume(workItem.Key == index); // This *must* be the case.
_workPool.Release(); // Allow the enqueuer to queue another work item.
_write(workItem.Value);
++index;
}
else // If it's not the block we want, we know we'll get a new item at some point.
{
_outputQueue.WaitForNewItem();
}
}
}
}
private void processBlocks()
{
foreach (var block in _inputQueue.GetConsumingEnumerable())
{
var processedData = _process(block.Data);
_outputQueue.Enqueue(block.Index, processedData);
}
}
public bool WaitForFinished(int maxMillisecondsToWait) // Can be Timeout.Infinite.
{
return _multiplexor.Wait(maxMillisecondsToWait);
}
private sealed class WorkItem
{
public WorkItem(T data, int index)
{
Data = data;
Index = index;
}
public T Data { get; private set; }
public int Index { get; private set; }
}
private readonly Task[] _workers;
private readonly Task _multiplexor;
private readonly SemaphoreSlim _workPool;
private readonly BlockingCollection<WorkItem> _inputQueue;
private readonly ConcurrentPriorityQueue<int, T> _outputQueue;
private readonly Read _read;
private readonly Process _process;
private readonly Write _write;
}
}
And here's my test code:
using System;
using System.Diagnostics;
using System.Threading;
namespace Demo
{
public static class Program
{
private static void Main(string[] args)
{
_rng = new Random(34324);
int threadCount = 8;
_maxBlocks = 200;
ThreadPool.SetMinThreads(threadCount + 2, 4); // Kludge to prevent slow thread startup.
var stopwatch = new Stopwatch();
_numBlocks = _maxBlocks;
stopwatch.Restart();
var processor = new ParallelWorkProcessor<byte[]>(read, process, write, threadCount);
processor.WaitForFinished(Timeout.Infinite);
Console.WriteLine("\n\nFinished in " + stopwatch.Elapsed + "\n\n");
}
private static byte[] read()
{
if (_numBlocks-- == 0)
{
return null;
}
var result = new byte[128];
result[0] = (byte)(_maxBlocks-_numBlocks);
Console.WriteLine("Supplied input: " + result[0]);
return result;
}
private static byte[] process(byte[] data)
{
if (data[0] == 10) // Hack for test purposes. Make it REALLY slow for this item!
{
Console.WriteLine("Delaying a call to process() for 5s for ID 10");
Thread.Sleep(5000);
}
Thread.Sleep(10 + _rng.Next(50));
Console.WriteLine("Processed: " + data[0]);
return data;
}
private static void write(byte[] data)
{
Console.WriteLine("Received output: " + data[0]);
}
private static Random _rng;
private static int _numBlocks;
private static int _maxBlocks;
}
}
I need to implement the producer/consumer pattern around a fixed-size FIFO queue. I think a wrapper class around a ConcurrentQueue might work for this but I'm not completely sure (and I've never worked with a ConcurrentQueue before). The twist in this is that the queue needs to only hold a fixed number of items (strings, in my case). My application will have one producer task/thread and one consumer task/thread. When my consumer task runs, it needs to dequeue all of the items that exist in the queue at that moment in time and process them.
For what it's worth, processing of the queued items by my consumer is nothing more than uploading them via SOAP to a web app that isn't 100% reliable. If the connection can't be established or the call SOAP call fails, I'm supposed to discard those items and go back to the queue for more. Because of the overhead of SOAP, I was trying to maximize the number of items from the queue that I could send in one SOAP call.
At times, my producer may add items faster than my consumer is able to remove and process them. If the queue is already full and my producer needs to add another item, I need to enqueue the new item but then dequeue the oldest item so that the size of the queue remains fixed. Basically, I need to keep the most recent items that are produced in the queue at all time (even if it means some items don't get consumed because my consumer is currently processing previous items).
With regard to the producer keeping the number if items in the queue fixed, I found one potential idea from this question:
Fixed size queue which automatically dequeues old values upon new enques
I'm currently using a wrapper class (based on that answer) around a ConcurrentQueue with an Enqueue() method like this:
public class FixedSizeQueue<T>
{
readonly ConcurrentQueue<T> queue = new ConcurrentQueue<T>();
public int Size { get; private set; }
public FixedSizeQueue(int size)
{
Size = size;
}
public void Enqueue(T obj)
{
// add item to the queue
queue.Enqueue(obj);
lock (this) // lock queue so that queue.Count is reliable
{
while (queue.Count > Size) // if queue count > max queue size, then dequeue an item
{
T objOut;
queue.TryDequeue(out objOut);
}
}
}
}
I create an instance of this class with a size limit on the queue like this:
FixedSizeQueue<string> incomingMessageQueue = new FixedSizeQueue<string>(10); // 10 item limit
I start up my producer task and it begins filling the queue. The code in my Enqueue() method seems to be working properly with regard to removing the oldest item from the queue when adding an item causes the queue count to exceed the max size. Now I need my consumer task to dequeue items and process them but here's where my brain gets confused. What's the best way to implement a Dequeue method for my consumer that will take a snapshot of the queue at a moment in time and dequeue all items for processing (the producer may still be adding items to the queue during this process)?
Simply stated, the ConcurrentQueue has a "ToArray" method which, when entered, will lock the collection and produce a "snapshot" of all current items in the queue. If you want your consumer to be given a block of things to work on, you can lock the same object the enqueueing method has, Call ToArray(), and then spin through a while(!queue.IsEmpty) queue.TryDequeue(out trash) loop to clear the queue, before returning the array you extracted.
This would be your GetAll() method:
public T[] GetAll()
{
lock (syncObj) // so that we don't clear items we didn't get with ToArray()
{
var result = queue.ToArray();
T trash;
while(!queue.IsEmpty) queue.TryDequeue(out trash);
}
}
Since you have to clear out the queue, you could simply combine the two operations; create an array of the proper size (using queue.Count), then while the queue is not empty, Dequeue an item and put it in the array, before returning.
Now, that's the answer to the specific question. I must now in good conscience put on my CodeReview.SE hat and point out a few things:
NEVER use lock(this). You never know what other objects may be using your object as a locking focus, and thus would be blocked when the object locks itself from the inside. The best practice is to lock a privately scoped object instance, usually one created just to be locked: private readonly object syncObj = new object();
Since you're locking critical sections of your wrapper anyway, I would use an ordinary List<T> instead of a concurrent collection. Access is faster, it's more easily cleaned out, so you'll be able to do what you're doing much more simply than ConcurrentQueue allows. To enqueue, lock the sync object, Insert() before index zero, then remove any items from index Size to the list's current Count using RemoveRange(). To dequeue, lock the same sync object, call myList.ToArray() (from the Linq namespace; does pretty much the same thing as ConcurrentQueue's does) and then call myList.Clear() before returning the array. Couldn't be simpler:
public class FixedSizeQueue<T>
{
private readonly List<T> queue = new List<T>();
private readonly object syncObj = new object();
public int Size { get; private set; }
public FixedSizeQueue(int size) { Size = size; }
public void Enqueue(T obj)
{
lock (syncObj)
{
queue.Insert(0,obj)
if(queue.Count > Size)
queue.RemoveRange(Size, Count-Size);
}
}
public T[] Dequeue()
{
lock (syncObj)
{
var result = queue.ToArray();
queue.Clear();
return result;
}
}
}
You seem to understand that you are throwing enqueued items away using this model. That's usually not a good thing, but I'm willing to give you the benefit of the doubt. However, I will say there is a lossless way to achieve this, using a BlockingCollection. A BlockingCollection wraps any IProducerConsumerCollection including most System.Collections.Concurrent classes, and allows you to specify a maximum capacity for the queue. The collection will then block any thread attempting to dequeue from an empty queue, or any thread attempting to add to a full queue, until items have been added or removed such that there is something to get or room to insert. This is the best way to implement a producer-consumer queue with a maximum size, or one that would otherwise require "polling" to see if there's something for the consumer to work on. If you go this route, only the ones the consumer has to throw away are thrown away; the consumer will see all the rows the producer puts in and makes its own decision about each.
You don't want to use lock with this. See Why is lock(this) {…} bad? for more details.
This code
// if queue count > max queue size, then dequeue an item
while (queue.Count > Size)
{
T objOut;
queue.TryDequeue(out objOut);
}
suggests that you need to somehow wait or notify the consumer about the item's availability. In this case consider using BlockingCollection<T> instead.
If I have a ConcurrentQueue, is there a preferred way to consume it with a Linq statement? It doesn't have a method to dequeue all the items as a sequence, and it's enumerator doesn't remove items.
I'm doing batch consumption, meaning periodically I want to process the queue and empty it, instead of processing it until it is empty and blocking until more items are enqueued. BlockingCollection doesn't seem like it will work because it will block when it gets to the last item, and I want that thread to do other stuff, like clear other queues.
static ConcurrentQueue<int> MyQueue = new ConcurrentQueue<int>();
void Main()
{
MyQueue.Enqueue(1);MyQueue.Enqueue(2);MyQueue.Enqueue(3);MyQueue.Enqueue(4);MyQueue.Enqueue(5);
var lst = MyQueue.ToLookup(x => x.SomeProperty);
//queue still has all elements
MyQueue.Dump("queue");
}
For now, I've made a helper method
static IEnumerable<T> ReadAndEmptyQueue<T>(this ConcurrentQueue<T> q)
{
T item;
while(q.TryDequeue(out item))
{
yield return item;
}
}
var lk = MyQueue.ReadAndEmptyQueue().ToLookup(x => x.SomeProperty);
MyQueue.Dump(); //size is now zero
Is there a better way, or am I doing it right?
Your approach is very reasonable, in my opinion. Allowing a consumer to empty the queue in this fashion is clean and simple.
BlockingCollection doesn't seem like it will work because it will block when it gets to the last item, and I want that thread to do other stuff, like clear other queues.
The one thing I'd mention - sometimes, from a design standpoint, it's easier to just fire off a separate consumer thread per queue. If you do that, each BlockingCollection<T> can just use GetConsumingEnumerable() and block as needed, as they'll be in a wait state when the queue is empty.
This is the approach I take more frequently, as it's often much simpler from a synchronization standpoint if each collection has one or more dedicated consumers, instead of a consumer switching between what it's consuming.
I'm creating a Windows service that makes use of a FileSystemWatcher to monitor a particular folder for additions of a particular file type. Due the gap between the Created event and when the file is actually ready to be manipulated, I created a Queue<T> to hold the file names that need processing. In the Created event handler, the item is added to the queue. Then using a timer, I periodically grab the first item from the queue and process it. If the processing fails, the item is added back to the queue so the service can retry processing it later.
This works fine but I've found it has one side-effect: the first processing attempt for new items does not happen until all the old retry items have been retried. Since it's possible the queue could contain many items, I'd like to force the new items to the front of the queue so they are processed first. But from the Queue<T> documentation, there is no obvious way of adding an item to the front of the queue.
I suppose I could create a second queue for new items and process that one preferentially but having a single queue seems simpler.
So is there an easy way to add an item to the front of the queue?
It kind of looks like you want a LinkedList<T>, which allows you to do things like AddFirst(), AddLast(), RemoveFirst(), and RemoveLast().
Simply use the Peek method in your timer callback instead of Dequeue.
If processing succeeds, then dequeue the item.
Well, I agree with CanSpice; however, you could:
var items = queue.ToArray();
queue.Clear();
queue.Enqueue(newFirstItem);
foreach(var item in items)
queue.Enqueue(item);
Nasty hack, but it will work ;)
Rather you might think about adding a second queue instance. This one being the 'priority' queue that you check/execute first. This would be a little cleaner. You might even create your own queue class to wrap it all up nice and neat like ;)
Do not take from the queue until you processed it, is an alternative solution.
Use queue.Peek() to get your first item and only Dequeue() when the operation succeeds.
Use queue.Count > 0 before you Peek(), else you will get the 'queue is empty' result.
I would suggest using two queues: one for new items, and one for retry items. Wrap both queues in a single object that has the same semantics as a queue as far as removal goes, but allows you to flag things as going into the New queue or the Retry queue on insert. Something like:
public class DoubleQueue<T>
{
private Queue<T> NewItems = new Queue<T>();
private Queue<T> RetryItems = new Queue<T>();
public Enqueue(T item, bool isNew)
{
if (isNew)
NewItems.Enqueue(item);
else
RetryItems.Enqueue(item);
}
public T Dequeue()
{
if (NewItems.Count > 0)
return NewItems.Dequeue();
else
return RetryItems.Dequeue();
}
}
Of course, you'll need to have a Count property that returns the number of items in both queues.
If you have more than two types of items, then it's time to upgrade to a priority queue.
Sounds like what you're after is a Stack - This is a LIFO (Last in, first out) buffer.
You need a Priority Queue. Take a look at the C5 Collections Library. It's IntervalHeap implements the IPriorityQueue interface. The C5 collections library is pretty good, too.
I believe you can find implementations at http://www.codeproject.com and http://www.codeplex.com as well.