I'm creating a Windows service that uses a FileSystemWatcher to monitor a particular folder for additions of a particular file type. Because of the gap between the Created event and when the file is actually ready to be manipulated, I created a Queue<T> to hold the file names that need processing. In the Created event handler, the item is added to the queue. Then, using a timer, I periodically grab the first item from the queue and process it. If the processing fails, the item is added back to the queue so the service can retry processing it later.
This works fine but I've found it has one side-effect: the first processing attempt for new items does not happen until all the old retry items have been retried. Since it's possible the queue could contain many items, I'd like to force the new items to the front of the queue so they are processed first. But from the Queue<T> documentation, there is no obvious way of adding an item to the front of the queue.
I suppose I could create a second queue for new items and process that one preferentially but having a single queue seems simpler.
So is there an easy way to add an item to the front of the queue?
It kind of looks like you want a LinkedList<T>, which allows you to do things like AddFirst(), AddLast(), RemoveFirst(), and RemoveLast().
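If you go the LinkedList<T> route, a rough sketch of the idea might look like this (the variable names are just illustrative):

// using System.Collections.Generic;
var pending = new LinkedList<string>();

// New file detected by the FileSystemWatcher: put it at the front so it is processed first.
pending.AddFirst(newFilePath);

// A failed item goes to the back, so it is retried after everything else.
pending.AddLast(failedFilePath);

// Timer callback: take the next item from the front, if any.
if (pending.Count > 0)
{
    string next = pending.First.Value;
    pending.RemoveFirst();
    // ... process 'next' here ...
}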
Simply use the Peek method in your timer callback instead of Dequeue.
If processing succeeds, then dequeue the item.
Well, I agree with CanSpice; however, you could:
var items = queue.ToArray();
queue.Clear();
queue.Enqueue(newFirstItem);
foreach (var item in items)
    queue.Enqueue(item);
Nasty hack, but it will work ;)
Rather you might think about adding a second queue instance. This one being the 'priority' queue that you check/execute first. This would be a little cleaner. You might even create your own queue class to wrap it all up nice and neat like ;)
An alternative solution: don't remove the item from the queue until you have processed it.
Use queue.Peek() to look at the first item, and only Dequeue() when the operation succeeds.
Check queue.Count > 0 before you Peek(), or you will get an InvalidOperationException because the queue is empty.
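A rough sketch of that timer callback, with the Count check in place (TryProcess is a hypothetical stand-in for your processing code):

if (queue.Count > 0)
{
    string path = queue.Peek();  // look at the next item without removing it
    if (TryProcess(path))        // hypothetical; returns true when processing succeeds
    {
        queue.Dequeue();         // only remove the item once it has been processed
    }
}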
I would suggest using two queues: one for new items, and one for retry items. Wrap both queues in a single object that has the same semantics as a queue as far as removal goes, but allows you to flag things as going into the New queue or the Retry queue on insert. Something like:
public class DoubleQueue<T>
{
    private Queue<T> NewItems = new Queue<T>();
    private Queue<T> RetryItems = new Queue<T>();

    public void Enqueue(T item, bool isNew)
    {
        if (isNew)
            NewItems.Enqueue(item);
        else
            RetryItems.Enqueue(item);
    }

    public T Dequeue()
    {
        if (NewItems.Count > 0)
            return NewItems.Dequeue();
        else
            return RetryItems.Dequeue();
    }
}
Of course, you'll need to have a Count property that returns the number of items in both queues.
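That Count could be a simple pass-through over both queues, something like:

public int Count
{
    get { return NewItems.Count + RetryItems.Count; }
}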
If you have more than two types of items, then it's time to upgrade to a priority queue.
Sounds like what you're after is a Stack - a LIFO (last in, first out) buffer.
You need a priority queue. Take a look at the C5 Collections Library; its IntervalHeap implements the IPriorityQueue interface. The C5 collections library is pretty good in general, too.
I believe you can find implementations at http://www.codeproject.com and http://www.codeplex.com as well.
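From memory, basic usage of C5's IntervalHeap looks roughly like this (an untested sketch; check the library's documentation for the exact API):

using C5;

IPriorityQueue<int> heap = new IntervalHeap<int>();
heap.Add(5);
heap.Add(1);
heap.Add(3);

int min = heap.FindMin();       // 1 - smallest item, without removing it
int removed = heap.DeleteMin(); // removes and returns the smallest item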
My application is a multi-threaded application. I use threads and Tasks to enqueue and dequeue items from ~4 queues using locks.
Sometimes when I dequeue, the item is null, and when I look inside the queue I can see that some other items are null too (e.g. the 5th item is null).
Whenever I enqueue I always create a new item, so it being null should be impossible. At first I thought another thread was messing with my items, but when I saw that the 5th item was null while the 2nd, 3rd and 4th weren't, I realized that's impossible, because you can't touch the 5th item before dequeueing the previous items.
I cannot share my code.
Is someone familiar with this kind of situation? What could be the cause?
EDIT:
The class that enqueues inherits from SerialPort and enqueues like this:
if (BytesToRead > 0)
{
    byte[] data = new byte[BytesToRead];
    Read(data, 0, data.Length);
    MyClass c = new MyClass() { m_data = data, m_tod = DateTime.Now };
    _dataQueue.Enqueue(c);
}
The classes that dequeue vary, but the idea is similar:
lock (_sync)
{
    var item = _dataQueue.Dequeue();
}
When I dequeue, I get null. As you can see, I use DateTime.Now, so it's really weird that the item comes out null. I mean, if a thread were using this item, it shouldn't be in the queue at all, right?
Each class that uses the queue has a copy of it, and inside every class there are about 3 threads that use the queue.
that's impossible, because you can't touch the 5th item before dequeueing the previous items.
That reasoning does not hold when a collection that is not thread-safe is used from multiple threads. A race condition can lead to all sorts of symptoms.
So the best thing to do is check your locking. If possible, post a mock-up of the code.
After the edit:
You are not locking around the Enqueue. The basic fix:
//_dataQueue.Enqueue(c);
lock (_sync)
{
    _dataQueue.Enqueue(c);
}
I understand that a BlockingCollection using a ConcurrentQueue can be given a boundedCapacity of, say, 100.
However I'm unsure as to what that means.
I'm trying to achieve a concurrent cache that can dequeue, and that can dequeue/enqueue in one operation if the queue gets too large (i.e. lose messages when the cache overflows). Is there a way to use boundedCapacity for this, or is it better to do it manually or to create a new collection?
Basically I have a reading thread and several writing threads. I would like it if the data in the queue is the "freshest" of all the writers.
A bounded capacity of N means that if the queue already contains N items, any thread attempting to add another item will block until a different thread removes an item.
What you seem to want is a different concept - you want the most recently added item to be the first item dequeued by the consuming thread.
You can achieve that by using a ConcurrentStack rather than a ConcurrentQueue for the underlying store.
You would use the BlockingCollection<T>(IProducerConsumerCollection<T>) constructor and pass in a ConcurrentStack<T>.
For example:
var blockingCollection = new BlockingCollection<int>(new ConcurrentStack<int>());
By using ConcurrentStack, you ensure that each item that the consuming thread dequeues will be the freshest item in the queue at that time.
Also note that if you specify an upper bound for the blocking collection, you can use BlockingCollection.TryAdd() which will return false if the collection was full at the time you called it.
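Putting those pieces together (the bound of 5 is arbitrary, just for illustration):

// using System.Collections.Concurrent;
var stack = new ConcurrentStack<int>();
var collection = new BlockingCollection<int>(stack, 5); // upper bound of 5 items

// Writer: TryAdd returns false instead of blocking if the collection is full.
if (!collection.TryAdd(42))
{
    // collection was full; drop the message or handle it some other way
}

// Reader: Take blocks until an item is available, and because the backing store
// is a ConcurrentStack it returns the most recently added item.
int freshest = collection.Take();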
It sounds to me like you're trying to build something like an MRU (most recently used) cache. BlockingCollection is not the best way to do that.
I would suggest instead that you use a LinkedList. It's not thread-safe, so you'll have to provide your own synchronization, but that's not too tough. Your enqueue method looks like this:
LinkedList<MyType> TheQueue = new LinkedList<MyType>();
object listLock = new object();

void Enqueue(MyType item)
{
    lock (listLock)
    {
        TheQueue.AddFirst(item);
        while (TheQueue.Count > MaxQueueSize)
        {
            // Queue overflow. Reduce to max size.
            TheQueue.RemoveLast();
        }
    }
}
And dequeue is even easier:
MyType Dequeue()
{
    lock (listLock)
    {
        if (TheQueue.Count == 0)
            return null;
        // LinkedList<T>.RemoveLast() returns void, so read the value first.
        MyType item = TheQueue.Last.Value;
        TheQueue.RemoveLast();
        return item;
    }
}
It's a little more involved if you want the consumers to do non-busy waits on the queue. You can do it with Monitor.Wait and Monitor.Pulse. See the example on the Monitor.Pulse page for an example.
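For reference, a rough sketch of that non-busy wait built on the same listLock (see the Monitor documentation for the full pattern):

// Inside the lock in Enqueue, after AddFirst(item), wake up a waiting consumer:
Monitor.Pulse(listLock);

// A blocking version of Dequeue:
MyType DequeueBlocking()
{
    lock (listLock)
    {
        while (TheQueue.Count == 0)
        {
            Monitor.Wait(listLock); // releases the lock and waits for a Pulse
        }
        MyType item = TheQueue.Last.Value;
        TheQueue.RemoveLast();
        return item;
    }
}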
Update:
It occurs to me that you could do the same thing with a circular buffer (an array). Just maintain head and tail pointers. You insert at head and remove at tail. If you go to insert, and head == tail, then you need to increment tail, which effectively removes the previous tail item.
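A rough, non-thread-safe sketch of that circular buffer idea (wrap the calls in a lock just like the LinkedList version); it tracks a count rather than comparing head and tail, which avoids the full/empty ambiguity:

public class RingBuffer<T>
{
    private readonly T[] _items;
    private int _head;  // next slot to write
    private int _tail;  // next slot to read
    private int _count;

    public RingBuffer(int capacity) { _items = new T[capacity]; }

    public void Enqueue(T item)
    {
        _items[_head] = item;
        _head = (_head + 1) % _items.Length;
        if (_count == _items.Length)
            _tail = (_tail + 1) % _items.Length; // full: overwrite the oldest item
        else
            _count++;
    }

    public bool TryDequeue(out T item)
    {
        if (_count == 0) { item = default(T); return false; }
        item = _items[_tail];
        _tail = (_tail + 1) % _items.Length;
        _count--;
        return true;
    }
}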
If you want a custom BlockingCollection that holds the N most recent elements, and drops the oldest elements when it's full, you could create one quite easily based on a Channel<T>. The Channels are intended to be used in asynchronous scenarios, but making them block the consumer is trivial and should not cause any unwanted side-effects (like deadlocks), even if used in an environment with a SynchronizationContext installed.
public class MostRecentBlockingCollection<T>
{
    private readonly Channel<T> _channel;

    public MostRecentBlockingCollection(int capacity)
    {
        _channel = Channel.CreateBounded<T>(new BoundedChannelOptions(capacity)
        {
            FullMode = BoundedChannelFullMode.DropOldest,
        });
    }

    public bool IsCompleted => _channel.Reader.Completion.IsCompleted;

    public void Add(T item)
        => _channel.Writer.WriteAsync(item).AsTask().GetAwaiter().GetResult();

    public T Take()
        => _channel.Reader.ReadAsync().AsTask().GetAwaiter().GetResult();

    public void CompleteAdding() => _channel.Writer.Complete();

    public IEnumerable<T> GetConsumingEnumerable()
    {
        while (_channel.Reader.WaitToReadAsync().AsTask().GetAwaiter().GetResult())
            while (_channel.Reader.TryRead(out var item))
                yield return item;
    }
}
The MostRecentBlockingCollection class blocks only the consumer. The producer can always add items in the collection, causing (potentially) some previously added elements to be dropped.
Adding cancellation support should be straightforward, since the Channel<T> API already supports it. Adding support for timeout is less trivial, but shouldn't be very difficult to do.
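A quick usage example (the capacity of 3 is arbitrary):

var collection = new MostRecentBlockingCollection<int>(capacity: 3);

for (int i = 1; i <= 5; i++)
    collection.Add(i); // never blocks; 1 and 2 are dropped as 4 and 5 arrive

Console.WriteLine(collection.Take()); // 3
Console.WriteLine(collection.Take()); // 4
Console.WriteLine(collection.Take()); // 5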
I need to implement the producer/consumer pattern around a fixed-size FIFO queue. I think a wrapper class around a ConcurrentQueue might work for this but I'm not completely sure (and I've never worked with a ConcurrentQueue before). The twist in this is that the queue needs to only hold a fixed number of items (strings, in my case). My application will have one producer task/thread and one consumer task/thread. When my consumer task runs, it needs to dequeue all of the items that exist in the queue at that moment in time and process them.
For what it's worth, processing of the queued items by my consumer is nothing more than uploading them via SOAP to a web app that isn't 100% reliable. If the connection can't be established or the SOAP call fails, I'm supposed to discard those items and go back to the queue for more. Because of the overhead of SOAP, I was trying to maximize the number of items from the queue that I could send in one SOAP call.
At times, my producer may add items faster than my consumer is able to remove and process them. If the queue is already full and my producer needs to add another item, I need to enqueue the new item but then dequeue the oldest item so that the size of the queue remains fixed. Basically, I need to keep the most recent items that are produced in the queue at all time (even if it means some items don't get consumed because my consumer is currently processing previous items).
With regard to the producer keeping the number of items in the queue fixed, I found one potential idea in this question:
Fixed size queue which automatically dequeues old values upon new enques
I'm currently using a wrapper class (based on that answer) around a ConcurrentQueue with an Enqueue() method like this:
public class FixedSizeQueue<T>
{
    readonly ConcurrentQueue<T> queue = new ConcurrentQueue<T>();

    public int Size { get; private set; }

    public FixedSizeQueue(int size)
    {
        Size = size;
    }

    public void Enqueue(T obj)
    {
        // add item to the queue
        queue.Enqueue(obj);

        lock (this) // lock queue so that queue.Count is reliable
        {
            while (queue.Count > Size) // if queue count > max queue size, then dequeue an item
            {
                T objOut;
                queue.TryDequeue(out objOut);
            }
        }
    }
}
I create an instance of this class with a size limit on the queue like this:
FixedSizeQueue<string> incomingMessageQueue = new FixedSizeQueue<string>(10); // 10 item limit
I start up my producer task and it begins filling the queue. The code in my Enqueue() method seems to be working properly with regard to removing the oldest item from the queue when adding an item causes the queue count to exceed the max size. Now I need my consumer task to dequeue items and process them but here's where my brain gets confused. What's the best way to implement a Dequeue method for my consumer that will take a snapshot of the queue at a moment in time and dequeue all items for processing (the producer may still be adding items to the queue during this process)?
Simply stated, the ConcurrentQueue has a "ToArray" method which, when entered, will lock the collection and produce a "snapshot" of all current items in the queue. If you want your consumer to be given a block of things to work on, you can lock the same object the enqueueing method has, Call ToArray(), and then spin through a while(!queue.IsEmpty) queue.TryDequeue(out trash) loop to clear the queue, before returning the array you extracted.
This would be your GetAll() method:
public T[] GetAll()
{
    lock (syncObj) // so that we don't clear items we didn't get with ToArray()
    {
        var result = queue.ToArray();
        T trash;
        while (!queue.IsEmpty) queue.TryDequeue(out trash);
        return result;
    }
}
Since you have to clear out the queue, you could simply combine the two operations; create an array of the proper size (using queue.Count), then while the queue is not empty, Dequeue an item and put it in the array, before returning.
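That combined version might look something like this, still taking the same syncObj lock as the enqueue side:

public T[] GetAll()
{
    lock (syncObj)
    {
        var result = new T[queue.Count]; // size the buffer once, under the lock
        for (int i = 0; i < result.Length; i++)
        {
            queue.TryDequeue(out result[i]);
        }
        return result;
    }
}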
Now, that's the answer to the specific question. I must now in good conscience put on my CodeReview.SE hat and point out a few things:
NEVER use lock(this). You never know what other objects may be using your object as a locking focus, and thus would be blocked when the object locks itself from the inside. The best practice is to lock a privately scoped object instance, usually one created just to be locked: private readonly object syncObj = new object();
Since you're locking critical sections of your wrapper anyway, I would use an ordinary List<T> instead of a concurrent collection. Access is faster and it's more easily cleaned out, so you'll be able to do what you're doing much more simply than ConcurrentQueue allows. To enqueue, lock the sync object, Insert() at index zero, then remove any items from index Size up to the list's current Count using RemoveRange(). To dequeue, lock the same sync object, call myList.ToArray() (List<T> has its own ToArray(), which does pretty much the same thing as ConcurrentQueue's) and then call myList.Clear() before returning the array. Couldn't be simpler:
public class FixedSizeQueue<T>
{
    private readonly List<T> queue = new List<T>();
    private readonly object syncObj = new object();

    public int Size { get; private set; }

    public FixedSizeQueue(int size) { Size = size; }

    public void Enqueue(T obj)
    {
        lock (syncObj)
        {
            queue.Insert(0, obj);
            if (queue.Count > Size)
                queue.RemoveRange(Size, queue.Count - Size);
        }
    }

    public T[] Dequeue()
    {
        lock (syncObj)
        {
            var result = queue.ToArray();
            queue.Clear();
            return result;
        }
    }
}
You seem to understand that you are throwing enqueued items away using this model. That's usually not a good thing, but I'm willing to give you the benefit of the doubt. However, I will say there is a lossless way to achieve this, using a BlockingCollection. A BlockingCollection wraps any IProducerConsumerCollection including most System.Collections.Concurrent classes, and allows you to specify a maximum capacity for the queue. The collection will then block any thread attempting to dequeue from an empty queue, or any thread attempting to add to a full queue, until items have been added or removed such that there is something to get or room to insert. This is the best way to implement a producer-consumer queue with a maximum size, or one that would otherwise require "polling" to see if there's something for the consumer to work on. If you go this route, only the ones the consumer has to throw away are thrown away; the consumer will see all the rows the producer puts in and makes its own decision about each.
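A minimal sketch of that lossless alternative (the capacity of 10 and the message/Upload names are placeholders):

// using System.Collections.Concurrent;
var queue = new BlockingCollection<string>(boundedCapacity: 10);

// Producer: blocks when the queue already holds 10 items, so nothing is dropped.
queue.Add(message);

// Consumer: blocks while the queue is empty, so no polling is needed.
foreach (string item in queue.GetConsumingEnumerable())
{
    Upload(item); // placeholder for the SOAP upload of a single item
}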
You don't want to lock on this. See Why is lock(this) {…} bad? for more details.
This code
// if queue count > max queue size, then dequeue an item
while (queue.Count > Size)
{
T objOut;
queue.TryDequeue(out objOut);
}
suggests that you need to somehow wait or notify the consumer about the item's availability. In this case consider using BlockingCollection<T> instead.
If I have a ConcurrentQueue, is there a preferred way to consume it with a LINQ statement? It doesn't have a method to dequeue all the items as a sequence, and its enumerator doesn't remove items.
I'm doing batch consumption, meaning periodically I want to process the queue and empty it, instead of processing it until it is empty and blocking until more items are enqueued. BlockingCollection doesn't seem like it will work because it will block when it gets to the last item, and I want that thread to do other stuff, like clear other queues.
static ConcurrentQueue<int> MyQueue = new ConcurrentQueue<int>();

void Main()
{
    MyQueue.Enqueue(1); MyQueue.Enqueue(2); MyQueue.Enqueue(3); MyQueue.Enqueue(4); MyQueue.Enqueue(5);

    var lst = MyQueue.ToLookup(x => x.SomeProperty);

    // queue still has all elements
    MyQueue.Dump("queue");
}
For now, I've made a helper method
public static class ConcurrentQueueExtensions
{
    public static IEnumerable<T> ReadAndEmptyQueue<T>(this ConcurrentQueue<T> q)
    {
        T item;
        while (q.TryDequeue(out item))
        {
            yield return item;
        }
    }
}
var lk = MyQueue.ReadAndEmptyQueue().ToLookup(x => x.SomeProperty);
MyQueue.Dump(); //size is now zero
Is there a better way, or am I doing it right?
Your approach is very reasonable, in my opinion. Allowing a consumer to empty the queue in this fashion is clean and simple.
BlockingCollection doesn't seem like it will work because it will block when it gets to the last item, and I want that thread to do other stuff, like clear other queues.
The one thing I'd mention - sometimes, from a design standpoint, it's easier to just fire off a separate consumer thread per queue. If you do that, each BlockingCollection<T> can just use GetConsumingEnumerable() and block as needed, as they'll be in a wait state when the queue is empty.
This is the approach I take more frequently, as it's often much simpler from a synchronization standpoint if each collection has one or more dedicated consumers, instead of a consumer switching between what it's consuming.
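For example, one dedicated consumer task per collection (a hedged sketch; MyQueueA, MyQueueB and Process are placeholders):

// using System.Collections.Concurrent; using System.Threading.Tasks;
foreach (var queue in new[] { MyQueueA, MyQueueB }) // each is a BlockingCollection<int>
{
    var q = queue;
    Task.Run(() =>
    {
        // Blocks while q is empty; exits once CompleteAdding() is called on q.
        foreach (var item in q.GetConsumingEnumerable())
        {
            Process(item); // placeholder for the real per-item work
        }
    });
}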
First of all, sorry about the title -- I couldn't figure out one that was short and clear enough.
Here's the issue: I have a list List<MyClass> list to which I always add newly-created instances of MyClass, like this: list.Add(new MyClass()). I don't add elements any other way.
However, then I iterate over the list with foreach and find that there are some null entries. That is, the following code:
foreach (MyClass entry in list)
    if (entry == null)
        throw new Exception("null entry!");
will sometimes throw an exception.
I should point out that the list.Add(new MyClass()) are performed from different threads running concurrently. The only thing I can think of to account for the null entries is the concurrent accesses. List<> isn't thread-safe, after all. Though I still find it strange that it ends up containing null entries, instead of just not offering any guarantees on ordering.
Can you think of any other reason?
Also, I don't care in which order the items are added, and I don't want the calling threads to block waiting to add their items. If synchronization is truly the issue, can you recommend a simple way to call the Add method asynchronously, i.e., create a delegate that takes care of that while my thread keeps running its code? I know I can create a delegate for Add and call BeginInvoke on it. Does that seem appropriate?
Thanks.
EDIT: A simple solution based on Kevin's suggestion:
public class AsynchronousList<T> : List<T> {
    private AddDelegate addDelegate;

    public delegate void AddDelegate(T item);

    public AsynchronousList() {
        addDelegate = new AddDelegate(this.AddBlocking);
    }

    public void AddAsynchronous(T item) {
        addDelegate.BeginInvoke(item, null, null);
    }

    private void AddBlocking(T item) {
        lock (this) {
            Add(item);
        }
    }
}
I only need to control Add operations and I just need this for debugging (it won't be in the final product), so I just wanted a quick fix.
Thanks everyone for your answers.
List<T> can only support multiple readers concurrently. If you are going to use multiple threads to add to the list, you'll need to lock the object first. There is really no way around this, because without a lock you can still have someone reading from the list while another thread updates it (or multiple objects trying to update it concurrently also).
http://msdn.microsoft.com/en-us/library/6sh2ey19.aspx
Your best bet is probably to encapsulate the list in another object, and have that object handle the locking and unlocking of the internal list. That way you could make your new object's "Add" method asynchronous and let the calling objects go on their merry way. Any time you read from it, though, you'll most likely still have to wait for any other objects to finish their updates.
The only thing I can think of to account for the null entries is the concurrent accesses. List<> isn't thread-safe, after all.
That's basically it. We are specifically told it's not thread-safe, so we shouldn't be surprised that concurrent access results in contract-breaking behaviour.
As to why this specific problem occurs, we can but speculate, since List<>'s private implementation is, well, private (I know we have Reflector and Shared Source - but in principle it is private). Suppose the implementation involves an array and a 'last populated index'. Suppose also that 'Add an item' looks like this:
Ensure the array is big enough for another item
last populated index <- last populated index + 1
array[last populated index] = incoming item
Now suppose there are two threads calling Add. If the interleaved sequence of operations ends up like this:
Thread A : last populated index <- last populated index + 1
Thread B : last populated index <- last populated index + 1
Thread A : array[last populated index] = incoming item
Thread B : array[last populated index] = incoming item
then not only will there be a null in the array, but also the item that thread A was trying to add won't be in the array at all!
Now, I don't know for sure how List<> does its stuff internally. I have half a memory that it's backed by an array that uses this kind of scheme internally; but in fact it doesn't matter. I suspect that any list mechanism that expects to be run non-concurrently can be made to break with concurrent access and a sufficiently 'unlucky' interleaving of operations. If we want thread-safety from an API that doesn't provide it, we have to do some work ourselves - or at least, we shouldn't be surprised if the API sometimes breaks its contract when we don't.
For your requirement of
I don't want the calling threads to block waiting to add their item
my first thought is a Multiple-Producer-Single-Consumer queue, wherein the threads wanting to add items are the producers, which dispatch items to the queue async, and there is a single consumer which takes items off the queue and adds them to the list with appropriate locking. My second thought is that this feels as if it would be heavier than this situation warrants, so I'll let it mull for a bit.
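A rough sketch of that first thought, using a BlockingCollection (available in .NET 4) as the multiple-producer-single-consumer queue; 'list' is your List<MyClass>, and the single consumer still takes a lock so that readers can iterate the list safely:

// using System.Collections.Concurrent; using System.Threading.Tasks;
var pending = new BlockingCollection<MyClass>();
var listLock = new object();

// Producers (any thread): dispatch the item and keep going, without touching the list.
pending.Add(new MyClass());

// Single consumer thread: takes items off the queue and adds them to the list.
Task.Run(() =>
{
    foreach (MyClass item in pending.GetConsumingEnumerable())
    {
        lock (listLock) // same lock the readers take when iterating the list
        {
            list.Add(item);
        }
    }
});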
If you're using .NET Framework 4, you might check out the new Concurrent Collections. When it comes to threading, it's better not to try to be clever, as it's extremely easy to get it wrong. Synchronization can impact performance, but the effects of getting threading wrong can also result in strange, infrequent errors that are a royal pain to track down.
If you're still using Framework 2 or 3.5 for this project, I recommend simply wrapping your calls to the list in a lock statement. If you're concerned about performance of Add (are you performing some long-running operation using the list somewhere else?) then you can always make a copy of the list within a lock and use that copy for your long-running operation outside the lock. Simply blocking on the Adds themselves shouldn't be a performance issue, unless you have a very large number of threads. If that's the case, you can try the Multiple-Producer-Single-Consumer queue that AakashM recommended.