I'm trying to use WebClient to download a bunch of files asynchronously. From my understanding, this is possible, but you need to have one WebClient object for each download. So I figured I'd just throw a bunch of them in a queue at the start of my program, then pop them off one at a time and tell them to download a file. When the file is done downloading, they can get pushed back onto the queue.
Pushing stuff onto my queue shouldn't be too bad, I just have to do something like:
lock(queue) {
queue.Enqueue(webClient);
}
Right? But what about popping them off? I want my main thread to sleep when the queue is empty (wait until another web client is ready so it can start the next download). I suppose I could use a Semaphore alongside the queue to keep track of how many elements are in the queue, and that would put my thread to sleep when necessary, but it doesn't seem like a very good solution. What happens if I forget to decrement/increment my Semaphore every time I push/pop something on/off my queue and they get out of sync? That would be bad. Isn't there some nice way to have queue.Dequeue() automatically sleep until there is an item to dequeue then proceed?
I'd also welcome solutions that don't involve a queue at all. I just figured a queue would be the easiest way to keep track of which WebClients are ready for use.
Here's an example using a Semaphore. IMO it is a lot cleaner than using a Monitor:
using System;
using System.Collections.Generic;
using System.Threading;

public class BlockingQueue<T>
{
    Queue<T> _queue = new Queue<T>();
    // The semaphore's count mirrors the number of items in the queue.
    Semaphore _sem = new Semaphore(0, Int32.MaxValue);

    public void Enqueue(T item)
    {
        lock (_queue)
        {
            _queue.Enqueue(item);
        }
        _sem.Release();
    }

    public T Dequeue()
    {
        // Blocks until at least one item has been enqueued.
        _sem.WaitOne();
        lock (_queue)
        {
            return _queue.Dequeue();
        }
    }
}
What you want is a producer/consumer queue.
I have a simple example of this in my threading tutorial - scroll about half way down that page. It was written pre-generics, but it should be easy enough to update. There are various features you may need to add, such as the ability to "stop" the queue: this is often performed by using a sort of "null work item" token; you inject as many "stop" items in the queue as you have dequeuing threads, and each of them stops dequeuing when it hits one.
Searching for "producer consumer queue" may well provide you with better code samples - this was really just to demonstrate waiting/pulsing.
IIRC, there are types in .NET 4.0 (as part of Parallel Extensions) which will do the same thing but much better :) I think you want a BlockingCollection wrapping a ConcurrentQueue.
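For reference, here is a minimal sketch of that .NET 4.0 approach; the class name and the work items are made up purely for illustration:

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class Demo
{
    static void Main()
    {
        // BlockingCollection wraps a ConcurrentQueue by default.
        var queue = new BlockingCollection<string>(new ConcurrentQueue<string>());

        var consumer = Task.Factory.StartNew(() =>
        {
            // GetConsumingEnumerable blocks when the collection is empty
            // and ends once CompleteAdding has been called and it is drained.
            foreach (string item in queue.GetConsumingEnumerable())
                Console.WriteLine("Processed " + item);
        });

        for (int i = 0; i < 10; i++)
            queue.Add("work item " + i);

        queue.CompleteAdding();   // signal "no more items"
        consumer.Wait();
    }
}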
I use a BlockingQueue to deal with exactly this type of situation. You can call .Dequeue when the queue is empty, and the calling thread will simply wait until there is something to Dequeue.
using System;
using System.Collections;
using System.Collections.Generic;
using System.Threading;

public class BlockingQueue<T> : IEnumerable<T>
{
    private int _count = 0;
    private Queue<T> _queue = new Queue<T>();

    public T Dequeue()
    {
        lock (_queue)
        {
            // Wait (releasing the lock) until Enqueue pulses us.
            while (_count <= 0)
                Monitor.Wait(_queue);
            _count--;
            return _queue.Dequeue();
        }
    }

    public void Enqueue(T data)
    {
        if (data == null)
            throw new ArgumentNullException("data");
        lock (_queue)
        {
            _queue.Enqueue(data);
            _count++;
            // Wake one thread blocked in Dequeue, if any.
            Monitor.Pulse(_queue);
        }
    }

    // Enumerating the queue consumes it: each iteration blocks until an item arrives.
    IEnumerator<T> IEnumerable<T>.GetEnumerator()
    {
        while (true)
            yield return Dequeue();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return ((IEnumerable<T>)this).GetEnumerator();
    }
}
Just use this in place of a normal Queue and it should do what you need.
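For the WebClient scenario in the question, usage could look something like the sketch below; the pool size is arbitrary and 'urls' is assumed to be defined elsewhere:

var pool = new BlockingQueue<WebClient>();
for (int i = 0; i < 4; i++)   // hypothetical pool size
{
    var wc = new WebClient();
    // Hand the client back to the pool once its download finishes.
    wc.DownloadFileCompleted += (s, e) => pool.Enqueue((WebClient)s);
    pool.Enqueue(wc);
}

foreach (string url in urls)  // 'urls' is a placeholder collection
{
    WebClient client = pool.Dequeue();   // blocks until a client is available
    client.DownloadFileAsync(new Uri(url), Path.GetFileName(url));
}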
Related
I have a simple scenario with two threads: the first thread continuously reads some data and enqueues it into a queue. The second thread first peeks at a single object from that queue and makes some conditional checks. If these pass, the single object is dequeued and passed on for processing.
I have tried to use the ConcurrentQueue, which is a thread-safe implementation of a simple queue, but the problem with this one is that all calls are blocking. This means that if the first thread is enqueuing an object, the second thread can't peek at or dequeue an object.
In my situation I need to enqueue at the end and dequeue from the beginning of the queue at the same time.
The lock statement of C# would also block.
So my question is whether it is possible to do these both operations in parallel without blocking each other in a thread safe way.
These are my first tries, and this is a similar example of my problem.
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
namespace Scenario {
public class Program {
public static void Main(string[] args) {
Scenario scenario = new Scenario();
scenario.Start();
Console.ReadKey();
}
public class Scenario {
public Scenario() {
someData = new Queue<int>();
}
public void Start() {
Task.Factory.StartNew(firstThread);
Task.Factory.StartNew(secondThread);
}
private void firstThread() {
Random random = new Random();
while (true) {
int newData = random.Next(1, 100);
someData.Enqueue(newData);
Console.WriteLine("Enqueued " + newData);
}
}
private void secondThread() {
Random random = new Random();
while (true) {
if (someData.Count == 0) {
continue;
}
int singleData = someData.Peek();
int someValue = random.Next(1, 100);
if (singleData > someValue || singleData == 1 || singleData == 99) {
singleData = someData.Dequeue();
Console.WriteLine("Dequeued " + singleData);
// ... processing ...
}
}
}
private readonly Queue<int> someData;
}
}
}
Second example:
public class Scenario {
public Scenario() {
someData = new ConcurrentQueue<int>();
}
public void Start() {
Task.Factory.StartNew(firstThread);
Task.Factory.StartNew(secondThread);
}
private void firstThread() {
Random random = new Random();
while (true) {
int newData = random.Next(1, 100);
someData.Enqueue(newData);
lock (syncRoot) { Console.WriteLine($"Enqued {enqued++} Dequed {dequed}"); }
}
}
private void secondThread() {
Random random = new Random();
while (true) {
if (!someData.TryPeek(out int singleData)) {
continue;
}
int someValue = random.Next(1, 100);
if (singleData > someValue || singleData == 1 || singleData == 99) {
if (!someData.TryDequeue(out singleData)) {
continue;
}
lock (syncRoot) { Console.WriteLine($"Enqued {enqued} Dequed {dequed++}"); }
// ... processing ...
}
}
}
private int enqued = 0;
private int dequed = 0;
private readonly ConcurrentQueue<int> someData;
private static readonly object syncRoot = new object();
}
First off: I strongly encourage you to reconsider whether your technique of having multiple threads and a shared memory data structure is even the right approach at all. Code that has multiple threads of control sharing access to data structures is hard to get right, and failures can be subtle, catastrophic, and hard to debug.
Second: If you are bent upon multiple threads and a shared memory data structure, I strongly encourage you to use designed-by-experts data types like concurrent queues, rather than rolling your own.
Now that I've got those warnings out of the way: here is a way to address your concern. It is sufficiently complicated that you should obtain the services of an expert on the C# memory model to verify the correctness of your solution if you go with this. I would not consider myself to be competent to implement the scheme I'm about to describe, not without help of someone who is actually an expert on the memory model.
The goal is to have a queue that supports simultaneous enqueue and dequeue operations and low lock contention.
What you want is two immutable stack variables called the enqueue stack and the dequeue stack, each with their own lock.
The enqueue operation is:
Take the enqueue lock
Push the item onto the enqueue stack; this produces a new stack in O(1) time.
Assign the newly produced stack to the enqueue stack variable.
Release the enqueue lock
The dequeue operation is:
Take the dequeue lock
If the dequeue stack is empty then
take the enqueue lock
enumerate the enqueue stack and use it to build the dequeue stack; this reverses the enqueue stack, which maintains the property we want: that the first in is the first out.
assign an empty immutable stack to the enqueue stack variable
release the enqueue lock
assign the new stack to the dequeue stack
If the dequeue stack is still empty, throw, or abandon and retry later, or sleep until signaled by the enqueue operation, or whatever the right thing to do here is.
Otherwise, the dequeue stack is now not empty.
Pop an item from the dequeue stack, which produces a new stack in O(1).
Assign the new stack to the dequeue stack variable.
Release the dequeue lock.
Process the item.
Note that of course if there is only one thread dequeuing, then we don't need the dequeue lock at all, but with this scheme there can be many threads dequeuing.
Suppose there are 1000 items on the enqueue stack and zero on the dequeue stack. When we dequeue the first time, we do an expensive O(n) operation of reversing the enqueue stack once, but now we have 1000 items on the dequeue stack. Once the dequeue stack is big, the dequeueing thread can spend most of its time processing, while the enqueuing thread spends most of its time enqueuing. Contention on the enqueue lock is rare, but expensive when it happens.
Why use immutable data structures? Everything I described here would also work with mutable stacks, but (1) it is easier to reason about immutable stacks, (2) if you want to really live dangerously you can elide some of the locks and go for interlocked swap operations; make sure you understand everything about the possible re-orderings of operations in low-lock conditions if you're doing that.
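For illustration only, here is roughly what that scheme could look like using ImmutableStack<T> from the System.Collections.Immutable package. It omits the "sleep until signaled" option (it simply reports an empty queue), and it has not been vetted by a memory-model expert:

using System.Collections.Immutable;

public class TwoStackQueue<T>
{
    private readonly object enqueueLock = new object();
    private readonly object dequeueLock = new object();
    private ImmutableStack<T> enqueueStack = ImmutableStack<T>.Empty;
    private ImmutableStack<T> dequeueStack = ImmutableStack<T>.Empty;

    public void Enqueue(T item)
    {
        lock (enqueueLock)
        {
            // Push produces a new stack in O(1); the old stack is unchanged.
            enqueueStack = enqueueStack.Push(item);
        }
    }

    public bool TryDequeue(out T item)
    {
        lock (dequeueLock)
        {
            if (dequeueStack.IsEmpty)
            {
                ImmutableStack<T> snapshot;
                lock (enqueueLock)
                {
                    snapshot = enqueueStack;
                    enqueueStack = ImmutableStack<T>.Empty;
                }
                // Reversing the enqueue stack restores first-in, first-out order.
                foreach (T t in snapshot)
                    dequeueStack = dequeueStack.Push(t);
            }

            if (dequeueStack.IsEmpty)
            {
                item = default(T);   // caller decides whether to retry, sleep, etc.
                return false;
            }

            dequeueStack = dequeueStack.Pop(out item);
            return true;
        }
    }
}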
UPDATE:
The real problem is that I can't dequeue and process a lot of points because I am permanently reading and enqueuing new points. Those enqueue calls are blocking the processing step.
Well if that is your real problem then mentioning it in the question instead of burying it in a comment would be a good idea. Help us help you.
There are a number of things you could do here. You could for example set the priority of the enqueuing thread lower than the priority of the dequeuing thread. Or you could have multiple dequeuing threads, as many as there are CPUs in your machine. Or you could dynamically choose to drop some enqueue operations if the dequeues are not keeping up. Without knowing a lot more about your actual problem it is hard to give advice on how to solve it.
I have code that manages a large queue of data; it's locked with a lock statement to ensure only a single thread is working on it at a time.
The order of data in the queue is really important, and each thread, with its parameters, can either add to it or take from it.
How do I ensure threads are queued to start in order of FIFO like my queue? Does the lock statement guarantee this?
var t = new Thread(() => parse(params)); //This is how I start my threads.
t.Start();
No, the lock statement does not guarantee FIFO ordering. Per Albahari:
If more than one thread contends the lock, they are queued on a “ready queue” and granted the lock on a first-come, first-served basis (a caveat is that nuances in the behavior of Windows and the CLR mean that the fairness of the queue can sometimes be violated).
If you want to ensure that your items are retrieved in a FIFO order, you should use the ConcurrentQueue<T> collection instead.
Edit: If you're targeting .NET 2.0, you could use a custom implementation for a concurrent thread-safe queue. Here's a trivial one:
using System.Collections.Generic;

public class ThreadSafeQueue<T>
{
private readonly object syncLock = new object();
private readonly Queue<T> innerQueue = new Queue<T>();
public void Enqueue(T item)
{
lock (syncLock)
innerQueue.Enqueue(item);
}
public bool TryDequeue(out T item)
{
lock (syncLock)
{
if (innerQueue.Count == 0)
{
item = default(T);
return false;
}
item = innerQueue.Dequeue();
return true;
}
}
}
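A brief usage sketch (the item and the polling interval are arbitrary; on .NET 2.0 there is no built-in blocking dequeue, so the consumer simply retries):

var queue = new ThreadSafeQueue<string>();

// Producer thread:
queue.Enqueue("item 1");

// Consumer thread:
string item;
while (!queue.TryDequeue(out item))
{
    Thread.Sleep(10);   // nothing available yet; back off briefly
}
Console.WriteLine(item);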
lock doesn't guarantee first-in, first-out access. An alternative approach would be Queue<T> if you are limited to .NET 2.0. Keep in mind that Queue<T> is not thread-safe, hence you should synchronize access to it.
I have a WinForms app with one consumer and one producer task. My producer task periodically connects to a web service and retrieves a specified number of strings, which then need to be placed into some kind of concurrent fixed-size FIFO queue. My consumer task then processes these strings and sends them out as SMS messages (one string per message). The SOAP function that my producer task calls requires a parameter to specify the number of strings that I want to get. This number will be determined by the space available in my queue. So if I have a max queue size of 100 strings and I have 60 strings in the queue the next time my producer polls the web service, I need it to ask for 40 strings since that's all that I can fit in my queue at that moment.
Here's the code that I'm using to represent my fixed-size FIFO queue:
public class FixedSizeQueue<T>
{
private readonly List<T> queue = new List<T>();
private readonly object syncObj = new object();
public int Size { get; private set; }
public FixedSizeQueue(int size)
{
Size = size;
}
public void Enqueue(T obj)
{
lock (syncObj)
{
queue.Insert(0, obj);
if (queue.Count > Size)
{
queue.RemoveRange(Size, queue.Count - Size);
}
}
}
public T[] Dequeue()
{
lock (syncObj)
{
var result = queue.ToArray();
queue.Clear();
return result;
}
}
public T Peek()
{
lock (syncObj)
{
var result = queue[0];
return result;
}
}
public int GetCount()
{
lock (syncObj)
{
return queue.Count;
}
}
}
My producer task doesn't currently specify the number of strings that I need from the web service but it seems like it could be as simple as getting the current item count in my queue (q.GetCount()) and then subtracting it from my max queue size. However, even though GetCount() uses a lock, isn't it possible that as soon as GetCount() exits, my consumer task could process 10 strings in the queue meaning that I'll never actually be able to keep the queue 100% full?
Also, my consumer task basically needs to "peek" at the first string in the queue before trying to send it in an SMS message. In the event that the message can't be sent, I need to leave the string in its original position in the queue. My first thought about accomplishing this is to "peek" at the first string in the queue, try to send it in an SMS message, and then remove it from the queue if the send was successful. This way, if the send fails, the string is still in the queue at its original position. Does that sound reasonable?
This is a broad question, so there really is no definitive answer, but here are my thoughts.
However, even though GetCount() uses a lock, isn't it possible that as soon as GetCount() exits, my consumer task could process 10 strings in the queue meaning that I'll never actually be able to keep the queue 100% full?
Yes, it is, unless you lock on syncObj for the entire duration of your query to the web service. But the point of producer/consumer is to allow the consumer to process items while the producer is fetching more. There's really not much you can do about this; at some point, the queue will not be 100% full. If it always was 100% full then that would mean that the consumer isn't doing anything at all.
This way, if the send fails, the string is still in the queue at its original position. Does that sound reasonable?
Perhaps, but the way you have this coded, a Dequeue() operation returns the entire state of the queue and clears it. Your only option given this interface is to re-queue failed items to be processed later, which is a perfectly reasonable technique.
I would also consider adding a way for the consumer to block itself until there are items to be processed. For example:
public T[] WaitForItemAndDequeue(TimeSpan timeout)
{
lock (syncObj) {
if (queue.Count == 0 && !Monitor.Wait(syncObj, timeout)) {
return null; // Timeout expired
}
return Dequeue();
}
}
public T[] WaitForItem()
{
lock (syncObj) {
while (queue.Count == 0) {
Monitor.Wait(syncObj);
}
return Dequeue();
}
}
Then you have to change Enqueue() to call Monitor.Pulse(syncObj) after it has manipulated the list (so at the end of the method, but inside of the lock block).
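In other words, with that change Enqueue() would end up looking something like this:

public void Enqueue(T obj)
{
    lock (syncObj)
    {
        queue.Insert(0, obj);
        if (queue.Count > Size)
        {
            queue.RemoveRange(Size, queue.Count - Size);
        }
        Monitor.Pulse(syncObj);   // wake a consumer waiting in WaitForItem/WaitForItemAndDequeue
    }
}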
I need to implement a sort of task buffer. Basic requirements are:
Process tasks in a single background thread
Receive tasks from multiple threads
Process ALL received tasks, i.e. make sure the buffer is drained of buffered tasks after a stop signal is received
Order of tasks received per thread must be maintained
I was thinking of implementing it using a Queue like below. Would appreciate feedback on the implementation. Are there any other brighter ideas to implement such a thing?
public class TestBuffer
{
private readonly object queueLock = new object();
private Queue<Task> queue = new Queue<Task>();
private bool running = false;
public TestBuffer()
{
}
public void start()
{
Thread t = new Thread(new ThreadStart(run));
t.Start();
}
private void run()
{
running = true;
bool run = true;
while(run)
{
Task task = null;
// Lock queue before doing anything
lock (queueLock)
{
// If the queue is currently empty and it is still running
// we need to wait until we're told something changed
if (queue.Count == 0 && running)
{
Monitor.Wait(queueLock);
}
// Check there is something in the queue
// Note - there might not be anything in the queue if we were waiting for something to change and the queue was stopped
if (queue.Count > 0)
{
task = queue.Dequeue();
}
}
// If something was dequeued, handle it
if (task != null)
{
handle(task);
}
// Lock the queue again and check whether we need to run again
// Note - Make sure we drain the queue even if we are told to stop before it is empty
lock (queueLock)
{
run = queue.Count > 0 || running;
}
}
}
public void enqueue(Task toEnqueue)
{
lock (queueLock)
{
queue.Enqueue(toEnqueue);
Monitor.PulseAll(queueLock);
}
}
public void stop()
{
lock (queueLock)
{
running = false;
Monitor.PulseAll(queueLock);
}
}
public void handle(Task dequeued)
{
dequeued.execute();
}
}
You can actually handle this with the out-of-the-box BlockingCollection.
It is designed to have 1 or more producers, and 1 or more consumers. In your case, you would have multiple producers and one consumer.
When you receive a stop signal, have that signal handler first signal the producer threads to stop, and then call CompleteAdding on the BlockingCollection instance.
The consumer thread will continue to run until all queued items are removed and processed, then it will encounter the condition that the BlockingCollection is complete. When the thread encounters that condition, it just exits.
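A rough sketch of what that could look like, mirroring the shape of the question's TestBuffer (this is an illustration rather than a drop-in replacement; 'Task' here means the question's own task type):

using System.Collections.Concurrent;
using System.Threading;

public class TestBuffer
{
    private readonly BlockingCollection<Task> buffer = new BlockingCollection<Task>();

    public void start()
    {
        new Thread(() =>
        {
            // Blocks while the buffer is empty; exits only after CompleteAdding
            // has been called AND every remaining item has been handled.
            foreach (Task task in buffer.GetConsumingEnumerable())
                handle(task);
        }).Start();
    }

    public void enqueue(Task toEnqueue)
    {
        buffer.Add(toEnqueue);    // safe to call from multiple producer threads
    }

    public void stop()
    {
        buffer.CompleteAdding();  // producers must be told to stop adding first
    }

    public void handle(Task dequeued)
    {
        dequeued.execute();
    }
}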
You should think about ConcurrentQueue, which is in fact FIFO. If it isn't suitable, try some of its relatives in the Thread-Safe Collections. By using these you can avoid some risks.
I suggest you take a look at TPL DataFlow. BufferBlock is what you're looking for, but it offers so much more.
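A rough sketch of that direction, using ActionBlock from TPL Dataflow (an ActionBlock is essentially a BufferBlock plus a processing delegate); 'someTask' is a placeholder, and 'Task' here means the question's own task type, not System.Threading.Tasks.Task:

using System.Threading.Tasks.Dataflow;

// Buffers posted items and processes them one at a time on a background thread.
var worker = new ActionBlock<Task>(t => t.execute());

// Producers, from any thread:
worker.Post(someTask);        // 'someTask' is a placeholder

// On the stop signal: refuse new items but keep processing what is already buffered.
worker.Complete();
worker.Completion.Wait();     // returns once the buffered tasks have been drained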
Take a look at my lightweight implementation of a thread-safe FIFO queue. It's a non-blocking synchronization tool that uses the thread pool, which in most cases is better than creating your own threads or using blocking synchronization tools such as locks and mutexes. https://github.com/Gentlee/SerialQueue
Usage:
var queue = new SerialQueue();
var result = await queue.Enqueue(() => /* code to synchronize */);
You could use Rx on .NET 3.5 for this. It might never have come out of RC, but I believe it is stable* and in use by many production systems. Even if you don't need Subject, the Rx build for .NET 3.5 gives you primitives (like the concurrent collections) that didn't ship with the .NET Framework until 4.0.
Alternative to Rx (Reactive Extensions) for .net 3.5
* - Nit picker's corner: Except for maybe advanced time windowing, which is out of scope, but buffers (by count and time), ordering, and schedulers are all stable.
I have scenarios where I need a main thread to wait until every one of a set of possibly more than 64 threads has completed its work, and for that I wrote the following helper utility (to avoid the 64-handle limit on WaitHandle.WaitAll()):
public static void WaitAll(WaitHandle[] handles)
{
if (handles == null)
throw new ArgumentNullException("handles",
"WaitHandle[] handles was null");
foreach (WaitHandle wh in handles) wh.WaitOne();
}
With this utility method, however, each WaitHandle is only examined after every preceding one in the array has been signalled, so it is in effect synchronous, and it will not work if the WaitHandles are AutoResetEvent wait handles (which reset as soon as a waiting thread has been released).
To fix this issue I am considering changing this code to the following, but would like others to check and see if it looks like it will work, or if anyone sees any issues with it, or can suggest a better way ...
Thanks in advance:
public static void WaitAllParallel(WaitHandle[] handles)
{
if (handles == null)
throw new ArgumentNullException("handles",
"WaitHandle[] handles was null");
int actThreadCount = handles.Length;
object locker = new object();
foreach (WaitHandle wh in handles)
{
WaitHandle qwH = wh;
ThreadPool.QueueUserWorkItem(
delegate
{
try { qwH.WaitOne(); }
finally { lock(locker) --actThreadCount; }
});
}
while (actThreadCount > 0) Thread.Sleep(80);
}
If you know how many threads you have, you can use an interlocked decrement. This is how I usually do it:
{
    eventDone = new AutoResetEvent(false);
    totalCount = 128;
    for (int i = 0; i < totalCount; i++)
        ThreadPool.QueueUserWorkItem(ThreadWorker);
}
void ThreadWorker(object state)
{
    try
    {
        // ... work and more work
    }
    finally
    {
        int runningCount = Interlocked.Decrement(ref totalCount);
        if (0 == runningCount)
        {
            // This is the last thread, notify the waiters
            eventDone.Set();
        }
    }
}
Actually, most times I don't even signal but instead invoke a callback that continues the processing from where the waiter would have continued. Fewer blocked threads, more scalability.
I know this is different and may not apply to your case (e.g. it definitely will not work if some of those handles are not threads but I/O or events), but it may be worth thinking about.
I'm not sure what exactly you're trying to do, but would a CountdownEvent (.NET 4.0) conceptually solve your problem?
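A minimal sketch of that idea, with the number of work items (128) borrowed from the answer above purely for illustration:

CountdownEvent countdown = new CountdownEvent(128);   // one signal expected per work item

for (int i = 0; i < 128; i++)
{
    ThreadPool.QueueUserWorkItem(delegate
    {
        try { /* ... work and more work ... */ }
        finally { countdown.Signal(); }               // decrement the remaining count
    });
}

countdown.Wait();   // blocks until the count reaches zero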
I'm not a C# or .NET programmer, but you could use a semaphore that is posted when one of your worker threads exits. The monitoring thread would simply wait on the semaphore n times, where n is the number of worker threads. Semaphores are traditionally used to count resources in use, but they can also be used to count completed jobs by waiting on the same semaphore n times.
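Translated into C#, that idea might look roughly like the sketch below; the worker count and the work itself are placeholders:

int workerCount = 8;                                 // hypothetical number of worker threads
Semaphore done = new Semaphore(0, workerCount);      // starts at 0; each worker posts once

for (int i = 0; i < workerCount; i++)
{
    ThreadPool.QueueUserWorkItem(delegate
    {
        try { /* ... work ... */ }
        finally { done.Release(); }                  // count one finished job
    });
}

// Monitoring thread: wait once per worker.
for (int i = 0; i < workerCount; i++)
    done.WaitOne();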
When working with lots of simultaneous threads, I prefer to add each thread's ManagedThreadId into a Dictionary when I start the thread, and then have each thread invoke a callback routine that removes the dying thread's id from the Dictionary. The Dictionary's Count property tells you how many threads are active. Use the value side of the key/value pair to hold info that your UI thread can use to report status. Wrap the Dictionary with a lock to keep things safe.
ThreadPool.QueueUserWorkItem(o =>
{
try
{
using (var h = (o as WaitHandle))
{
if (!h.WaitOne(100000))
{
// Alert main thread of the timeout
}
}
}
finally
{
Interlocked.Decrement(ref actThreadCount);
}
}, wh);