I have a fairly basic question about the concepts behind ConcurrentQueue. A queue is FIFO. When multiple threads start accessing it, how do we guarantee FIFO?
Suppose I have added Apple, Oranges, Lemon, Peach and Apricot, in that order. The first TryTake should return Apple. But what happens when multiple threads start issuing their own TryTake requests? Wouldn't there be a possibility that one thread could return Lemon even before another thread could return Apple? I am assuming the other items will also be returned until the queue is empty. But will these returns follow the basic principles of FIFO?
The behavior of the ConcurrentQueue itself is always going to be FIFO.
When we talk about threads "returning" items from the ConcurrentQueue, we're talking about an operation that involves both dequeuing an item and performing some sort of an operation that enables you to observe what has been dequeued. Whether it's printing an output or adding that item to another list, you don't actually know which item has been taken out of the queue until you inspect it.
While the queue itself is FIFO, you can't predict the sequence in which those other events, such as inspecting the dequeued items, will occur. The items will be dequeued FIFO, but you may or may not be able to observe what comes out of the queue in that order. The different threads may not perform that inspection or output in exactly the same order in which they removed items from the queue.
In other words, it's going to happen FIFO, but it may or may not always look like it. You wouldn't want to read from a ConcurrentQueue concurrently if the exact sequence in which the items are handled is critical.
If you were to test this (I'm about to write something) then you'd probably find items getting processed in exact FIFO sequence most of the time, but then every once in a while they wouldn't be.
Here's a console app. It's going to:
Insert the numbers from 1 to 5000 in a ConcurrentQueue, single-threaded.
Perform concurrent operations to dequeue each of those items and move them to another ConcurrentQueue. This is the "multithreaded consumer."
Read the items in the second queue (single-threaded, again) and report any numbers that are out of sequence.
Many times I run it and nothing is out of sequence, but about 50% of the time it reports a few numbers out of sequence. So if you were counting on all of the numbers getting processed in their original sequence, it would happen for almost all of the numbers most of the time, but then it wouldn't. That's fine if you don't care about the exact sequence, but buggy and unpredictable if you do.
The conclusion - don't depend on the exact sequence of multithreaded operations.
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

namespace ConcurrentQueueExperiment
{
    class Program
    {
        static void Main(string[] args)
        {
            var inputQueue = new ConcurrentQueue<int>();
            var outputQueue = new ConcurrentQueue<int>();
            var tasks = new List<Task>();

            Enumerable.Range(1, 5000).ToList().ForEach(inputQueue.Enqueue);

            // Multithreaded consumer: each task dequeues one item and moves it
            // to the output queue.
            while (inputQueue.Any())
            {
                tasks.Add(Task.Factory.StartNew(() =>
                {
                    int dequeued;
                    if (inputQueue.TryDequeue(out dequeued))
                    {
                        outputQueue.Enqueue(dequeued);
                    }
                }));
            }

            // Wait for all the consumer tasks so that the output queue is
            // complete before we inspect it.
            Task.WaitAll(tasks.ToArray());

            int output = 0;
            var previous = 0;
            while (outputQueue.TryDequeue(out output))
            {
                if (output != previous + 1)
                    Console.WriteLine("Out of sequence: {0}, {1}", previous, output);
                previous = output;
            }
            Console.WriteLine("Done!");
            Console.ReadLine();
        }
    }
}
Related
I'm going to start by describing my use case:
I have built an app which processes LARGE datasets, runs various transformations on them and then spits them out. This process is very time sensitive, so a lot of time has gone into optimising it.
The idea is to read a bunch of records at a time, process each one on different threads and write the results to file. But instead of writing them to one file, the results are written to one of many temp files which get combined into the desired output file at the end. This is so that we avoid memory write protection exceptions or bottlenecks (as much as possible).
To achieve that, we have an array of 10 fileUtils, one of which gets passed to each thread as it is initiated. There is a threadCountIterator which increments at each localInit and is reset back to zero when that count reaches 10. That value determines which of the fileUtils objects gets passed to the record-processing object for each thread. The idea is that each util class is responsible for collecting and writing to just one of the temp output files.
It's worth noting that each FileUtils object gathers about 100 records in a member outputBuildString variable before writing them out, hence having them exist separately and outside of the threading process, where an object's lifespan is limited.
The idea is to more or less evenly spread the responsibility for collecting, storing and then writing the output data across multiple fileUtil objects, which means we can write more per second than if we were just writing to one file.
My problem is that this approach results in an array-out-of-bounds exception, as my threadedOutputIterator jumps above the upper limit value despite there being code that is supposed to reset it when this happens:
//by default threadCount = 10
private void ProcessRecords()
{
    try
    {
        Parallel.ForEach(clientInputRecordList, new ParallelOptions { MaxDegreeOfParallelism = threadCount }, LocalInit, ThreadMain, LocalFinally);
    }
    catch (Exception e)
    {
        Console.WriteLine("The following error occured: " + e);
    }
}

private SplitLineParseObject LocalInit()
{
    if (threadedOutputIterator >= threadCount)
    {
        threadedOutputIterator = 0;
    }

    // still somehow goes above 10, and this is where the exception hits since there are only 10 objects in the threadedFileUtils array
    SplitLineParseObject splitLineParseUtil = new SplitLineParseObject(parmUtils, ref recCount, ref threadedFileUtils[threadedOutputIterator], ref recordsPassedToFileUtils);

    if (threadedOutputIterator < threadCount)
    {
        threadedOutputIterator++;
    }
    return splitLineParseUtil;
}

private SplitLineParseObject ThreadMain(ClientInputRecord record, ParallelLoopState state, SplitLineParseObject threadLocalObject)
{
    threadLocalObject.clientInputRecord = record;
    threadLocalObject.ProcessRecord();
    recordsPassedToObject++;
    return threadLocalObject;
}

private void LocalFinally(SplitLineParseObject obj)
{
    obj = null;
}
As explained in the comment above, it still manages to jump above 10, and this is where the exception hits, since there are only 10 objects in the threadedFileUtils array. I understand that this is because multiple threads can increment that number at the same time before either of the if statements runs, meaning there's still a chance it will fail in its current state.
How could I better approach this such that I avoid that exception, while still being able to take advantage of the read, store and write efficiency that having multiple fileUtils gives me?
Thanks!
But instead of writing them to one file, the results are written to one of many temp files which get combined into the desired output file at the end
That is probably not a great idea. If you can fit the data in memory it is most likely better to keep it in memory, or do the merging of data concurrently with the production of data.
To achieve that, we have an array of 10 fileUtils, 1 of which get passed to a thread as it is initiated. There is a threadCountIterator which increments at each localInit, and is reset back to zero when that count reaches 10
This does not sound safe to me. The parallel loop guarantees that no more than 10 threads will run concurrently (if that is your limit), and that localInit will run once for each thread that is used. As far as I know it makes no guarantee that no more than 10 threads will be used in total, so it seems possible that thread #0 and thread #10 could run concurrently.
The correct usage would be to create a new fileUtils-object in the localInit.
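For illustration, here is a rough sketch of what that could look like with the localInit overload of Parallel.ForEach. The FileUtils constructor, its Write/Dispose members, and ProcessRecord are placeholders for whatever your real types do; the point is only that each worker thread gets its own private FileUtils, so no shared index is needed at all:

// Sketch only: FileUtils, Write, Dispose and ProcessRecord are assumed names.
Parallel.ForEach(
    clientInputRecordList,
    new ParallelOptions { MaxDegreeOfParallelism = threadCount },
    // localInit: runs once per worker thread that the loop actually uses.
    () => new FileUtils(Path.GetTempFileName()),
    // body: each record is processed using this thread's own FileUtils.
    (record, loopState, fileUtils) =>
    {
        fileUtils.Write(ProcessRecord(record));
        return fileUtils;
    },
    // localFinally: flush and close this thread's temp file when it finishes.
    fileUtils => fileUtils.Dispose());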
This more or less works and ends up being more efficient than if we are writing to just one file
Are you sure? Typically, IO does not scale very well with concurrency. While SSDs are absolutely better than HDDs, both tend to work best with sequential IO.
How could I better approach this?
My approach would be to use a single writing thread, and a BlockingCollection as a thread-safe buffer between the producers and the writer. This assumes that the order of items is not significant:
public async Task ProcessAndWriteItems(List<int> myItems)
{
    // BlockingCollection uses a ConcurrentQueue by default.
    // A max size can also be set, in case the writer cannot keep up with the producers.
    var writeQueue = new BlockingCollection<string>();
    var writeTask = Task.Run(() => Writer(writeQueue));

    Parallel.ForEach(
        myItems,
        item =>
        {
            writeQueue.Add(item.ToString());
        });

    writeQueue.CompleteAdding(); // signal the writer to stop once all items have been processed
    await writeTask;
}

private void Writer(BlockingCollection<string> queue)
{
    using var stream = new StreamWriter(myFilePath);
    foreach (var line in queue.GetConsumingEnumerable())
    {
        stream.WriteLine(line);
    }
}
There is also TPL Dataflow, which should be suitable for tasks like this. But I have not used it, so I cannot provide specific recommendations.
Note that multithreaded programming is difficult. While it can be made easier by proper use of modern programming techniques, you still need to know a fair bit about thread safety to understand the problems, and what options and tools exist to solve them. You will not always be so lucky as to get actual exceptions; a more typical result of multithreading bugs is that your program just produces the wrong result. If you are unlucky this only occurs in production, on a full moon, and only when processing important data.
LocalInit obviously is not thread safe, so when invoked multiple times in parallel it will have all the multithreading problems caused by non-atomic operations. As a quick fix you can lock the whole method:
private object locker = new object();

private SplitLineParseObject LocalInit()
{
    lock (locker)
    {
        if (threadedOutputIterator >= threadCount)
        {
            threadedOutputIterator = 0;
        }

        SplitLineParseObject splitLineParseUtil = new SplitLineParseObject(parmUtils, ref recCount,
            ref threadedFileUtils[threadedOutputIterator], ref recordsPassedToFileUtils);

        if (threadedOutputIterator < threadCount)
        {
            threadedOutputIterator++;
        }
        return splitLineParseUtil;
    }
}
Or you could try to work around it with Interlocked for more fine-grained control and better performance (though that would not be very easy, if it is possible at all); a rough sketch of one option follows.
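One way the Interlocked idea might look, as a sketch only (it reuses the question's field names, and it removes the out-of-range index but not the shared-file concern discussed next):

// Sketch only: the counter is never reset; it grows monotonically and is
// reduced modulo threadCount on a local copy, so no two threads can ever
// observe a half-updated value or an index >= threadCount.
// (This ignores int overflow after ~2 billion calls.)
private int nextFileUtilTicket = -1; // replaces threadedOutputIterator

private SplitLineParseObject LocalInit()
{
    int ticket = Interlocked.Increment(ref nextFileUtilTicket);
    int index = ticket % threadCount; // always 0 .. threadCount - 1
    return new SplitLineParseObject(parmUtils, ref recCount,
        ref threadedFileUtils[index], ref recordsPassedToFileUtils);
}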
Note that even if you implement this in the current code, there is still no guarantee that all previous writes have actually finished: with 10 files there is a possibility that the one with index 0 is not yet finished while the next 9 are, and the 10th thread will try writing to the same file that the 0th is still writing to. Possibly you should consider another approach. If you still want to write to multiple files (though IO does not usually scale that well, so a blocking write with a queue into one file may be the way to go), you can consider splitting your data into chunks and processing them in parallel (i.e. a "thread" per chunk), with every chunk writing to its own file so there is no need for synchronization; a rough sketch of that follows.
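A rough sketch of that chunk-per-file idea, assuming the records are already in memory and that ProcessRecord returns the line to be written (both are assumptions, not the question's actual API); the temp files would still be concatenated at the end as before:

// Sketch only: split the records into threadCount chunks; each chunk is
// processed by one worker and writes to its own file, so no writer is shared.
var chunks = clientInputRecordList
    .Select((record, i) => new { record, i })
    .GroupBy(x => x.i % threadCount, x => x.record);

Parallel.ForEach(chunks, chunk =>
{
    using (var writer = new StreamWriter("chunk_" + chunk.Key + ".tmp"))
    {
        foreach (var record in chunk)
        {
            writer.WriteLine(ProcessRecord(record)); // ProcessRecord is assumed
        }
    }
});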
Some potentially useful reading:
Overview of synchronization primitives
System.Threading.Channels
TPL Dataflow
Threading in C# by Joseph Albahari
I'm not completely new to C#, but I'm not familiar enough with the language to know how to do what I need to do.
I have a file, call it File1.txt. File1.txt has 100,000 lines or so.
I will duplicate File1.txt and call it File1_untested.txt.
I will also create an empty file "Successes.txt"
For each line in the file:
Remove this line from File1_untested.txt
If this line passes the test, write it to Successes.txt
So, my question is, how can I multithread this?
My approach so far has been to create an object (LineChecker), give the object its line to check, and pass the object into a ThreadPool. I understand how to use ThreadPools for a few tasks with a CountdownEvent. However, it seems unreasonable to queue up 100,000 tasks all at once. How can I gradually feed the pool? Maybe 1000 lines at a time or something like that.
Also, I need to ensure that no two threads are adding to Successes.txt or removing from File1_untested.txt at the same time. I can handle this with lock(), right? What should I be passing into lock()? Can I use a static member of LineChecker?
I'm just trying to get a broad understanding of how something like this can be designed.
Since the test takes a relatively significant amount of time, it makes sense to utilize multiple CPU cores. However, such utilization should be done only for the relatively expensive test, not for reading/updating the file, because reading/updating the file is relatively cheap.
Here is some example code that you can use:
Assuming you have a relatively expensive Test method:
private bool Test(string line)
{
//This test is expensive
}
Here is a code sample that can utilize multiple CPU cores for testing:
Here we limit the number of items in the collection to 10, so that the thread that is reading from the file will wait for the other threads to catch up before reading more lines from the file.
The input thread will read much faster than the other threads can test, so at worst we will have read 10 more lines than the testing threads have finished testing. This keeps memory consumption under control.
CancellationTokenSource cancellation_token_source = new CancellationTokenSource();
CancellationToken cancellation_token = cancellation_token_source.Token;
BlockingCollection<string> blocking_collection = new BlockingCollection<string>(10);

using (StreamReader reader = new StreamReader(new FileStream(filename, FileMode.Open, FileAccess.Read)))
{
    using (StreamWriter writer =
        new StreamWriter(new FileStream(success_filename, FileMode.OpenOrCreate, FileAccess.Write)))
    {
        var input_task = Task.Factory.StartNew(() =>
        {
            try
            {
                while (!reader.EndOfStream)
                {
                    if (cancellation_token.IsCancellationRequested)
                        return;
                    blocking_collection.Add(reader.ReadLine());
                }
            }
            finally // In all cases, even in the case of an exception, we need to mark that we are done adding to the collection so that the Parallel.ForEach loop will exit. Note that Parallel.ForEach will not exit until we call CompleteAdding.
            {
                blocking_collection.CompleteAdding();
            }
        });

        try
        {
            Parallel.ForEach(blocking_collection.GetConsumingEnumerable(), (line) =>
            {
                bool test_result = Test(line);
                if (test_result)
                {
                    lock (writer)
                    {
                        writer.WriteLine(line);
                    }
                }
            });
        }
        catch
        {
            cancellation_token_source.Cancel(); // If Parallel.ForEach throws an exception, we inform the input thread to stop
            throw;
        }

        input_task.Wait(); // This will make sure that exceptions thrown in the input thread will be propagated here
    }
}
If your "test" was fast, then multithreading would not have given you any advantage whatsoever, because your code would be 100% disk-bound, and presumably you have all of your files on the same disk: you cannot improve the throughput of a single disk with multithreading.
But since your "test" will be waiting for a response from a webserver, this means that the test is going to be slow, so there is plenty of room for improvement by multithreading. Basically, the number of threads you need depends on how many requests the webserver can be servicing simultaneously without degrading the performance of the webserver. This number might still be low, so you might end up not gaining anything, but at least you can try.
If your file is not really huge, then you can read it all at once, and write it all at once. If each line is only 80 characters long, then this means that your file is only 8 megabytes, which is peanuts, so you can read all the lines into a list, work on the list, produce another list, and in the end write out the entire list.
This will allow you to create a structure, say, MyLine which contains the index of each line and the text of each line, so that you can sort all lines before writing them, so that you do not have to worry about out-of-order responses from the server.
Then, what you need to do is use a bounding blocking queue like BlockingCollection as #Paul suggested.
BlockingCollection accepts as a constructor parameter its maximum capacity. This means that once its maximum capacity has been reached, any further attempts to add to it are blocked (the caller sits there waiting) until some items are removed. So, if you want to have up to 10 simultaneously pending requests, you would construct it as follows:
var sourceCollection = new BlockingCollection<MyLine>(10);
Your main thread will be stuffing sourceCollection with MyLine objects, and you will have 10 threads which block waiting to read MyLines from the collection. Each thread sends a request to the server, waits for a response, saves the result into a thread-safe resultCollection, and attempts to fetch the next item from sourceCollection.
Instead of using multiple threads you could instead use the async features of C#, but I am not terribly familiar with them, so I cannot advise you on precisely how you would do that.
In the end, copy the contents of resultCollection into a List, sort the list, and write it to the output file. (The copy into a separate List is probably a good idea because sorting the thread-safe resultCollection will probably be much slower than sorting a non-thread-safe List. I said probably.)
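To tie the pieces together, here is a hedged sketch of that flow. MyLine is the structure suggested above, Test stands in for the slow per-line check (e.g. the web request), and the paths, class name and worker count are all placeholders:

using System.Collections.Concurrent;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

public static class LineTester
{
    // Sketch only: read everything, fan the lines out to a small pool of worker
    // threads through a bounded BlockingCollection, collect indexed results,
    // then sort and write once at the end.
    public sealed class MyLine
    {
        public MyLine(int index, string text) { Index = index; Text = text; }
        public int Index { get; private set; }
        public string Text { get; private set; }
    }

    public static void ProcessFile(string inputPath, string successPath, int workerCount)
    {
        var source = new BlockingCollection<MyLine>(10);   // at most 10 pending requests
        var results = new ConcurrentBag<MyLine>();         // thread-safe result store

        var workers = Enumerable.Range(0, workerCount)
            .Select(_ => Task.Factory.StartNew(() =>
            {
                foreach (var line in source.GetConsumingEnumerable())
                {
                    if (Test(line.Text))         // the expensive check
                        results.Add(line);
                }
            }))
            .ToArray();

        int index = 0;
        foreach (var text in File.ReadLines(inputPath))    // the file is small enough for this
            source.Add(new MyLine(index++, text));
        source.CompleteAdding();
        Task.WaitAll(workers);

        // Copy into a plain ordered sequence, restore the original order, write in one go.
        var ordered = results.OrderBy(l => l.Index).Select(l => l.Text).ToList();
        File.WriteAllLines(successPath, ordered);
    }

    private static bool Test(string text)
    {
        // Placeholder for the expensive check (e.g. waiting on a webserver).
        return text.Length % 2 == 0;
    }
}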
(NOTE: I'm using .Net 4, not .Net 4.5, so I cannot use the TPL's DataflowBlock classes.)
TL;DR Version
Ultimately, I'm just looking for a way to process sequential work items using multiple threads in a way that preserves their order in the final output, without requiring an unbounded output buffer.
Motivation
I have existing code to provide a multithreaded mechanism for processing multiple blocks of data where one I/O-bound thread (the "supplier") is responsible for enqueuing blocks of data for processing. These blocks of data comprise the work items.
One or more threads (the "processors") are responsible for dequeuing one work item at a time, which they process and then write the processed data to an output queue before dequeuing their next work item.
A final I/O-bound thread (the "consumer") is responsible for dequeuing completed work items from the output queue and writing them to the final destination. These work items are (and must be) written in the same order that they were enqueued. I implemented this using a concurrent priority queue, where the priority of each item is defined by its source index.
I'm using this scheme to do some custom compression on a large data stream, where the compression itself is relatively slow but the reading of the uncompressed data and the writing of the compressed data is relatively fast (although I/O-bound).
I process the data in fairly large chunks of the order of 64K, so the overhead of the pipeline is relatively small.
My current solution is working well, but it involves a lot of custom code written 6 years ago using many synchronisation events, and the design seems somewhat clunky; therefore I have embarked on an academic exercise to see if it can be rewritten using more modern .NET libraries.
The new design
My new design uses the BlockingCollection<> class, and is based somewhat on this Microsoft article.
In particular, look at the section entitled Load Balancing Using Multiple Producers. I have tried using that approach, and therefore I have several processing tasks each of which takes work items from a shared input BlockingCollection and writes its completed items to its own BlockingCollection output queue.
Because each processing task has its own output queue, I'm trying to use BlockingCollection.TakeFromAny() to dequeue the first available completed work item.
The Multiplexer problem
So far so good, but now here comes the problem. The Microsoft article states:
The gaps are a problem. The next stage of the pipeline, the Display Image stage, needs to show images in order and without gaps in the sequence. This is where the multiplexer comes in. Using the TakeFromAny method, the multiplexer waits for input from both of the filter stage producer queues. When an image arrives, the multiplexer looks to see if the image's sequence number is the next in the expected sequence. If it is, the multiplexer passes it to the Display Image stage. If the image is not the next in the sequence, the multiplexer holds the value in an internal look-ahead buffer and repeats the take operation for the input queue that does not have a look-ahead value. This algorithm allows the multiplexer to put together the inputs from the incoming producer queues in a way that ensures sequential order without sorting the values.
Ok, so what happens is that the processing tasks can produce finished items in pretty much any order. The multiplexer is responsible for outputting these items in the correct order.
However...
Imagine that we have 1000 items to process. Further imagine that, for some weird reason, the very first item takes longer to process than all the other items combined.
Using my current scheme, the multiplexer will keep reading and buffering items from all the processing output queues until it finds the next one that it's supposed to output. Since the item that it's waiting for is (according to my "imagine if" above) only going to appear after ALL the other work items have been processed, I will effectively be buffering all the work items in the entire input!
The amount of data is way too large to allow this to happen. I need to be able to stop the processing tasks from outputting completed work items when the output queue has reached a certain maximum size (i.e. it's a bounded output queue) UNLESS the work item happens to be the one the multiplexer is waiting for.
And that's where I'm getting a bit stuck. I can think of many ways to actually implement this, but they all seem to be over-complex to the extent that they are no better than the code I'm thinking to replace!
What's my question?
My question is: Am I going about this the right way?
I would have thought this would be a well-understood problem, but my research has only turned up articles that seem to ignore the unbounded buffering problem that occurs if a work item takes a very long time compared to all the other work items.
Can anyone point me at any articles that describe a reasonable way to achieve this?
TL;DR Version
Ultimately, I'm just looking for a way to process sequential work items using multiple threads in a way that preserves their order in the final output, without requiring an unbounded output buffer.
Create a pool of items at startup, 1000, say. Store them on a BlockingCollection - a 'pool queue'.
The supplier gets items from the pool queue, loads them from the file, loads in the sequence-number/whatever and submits them to the processors threadpool.
The processors do their stuff and send the output to the multiplexer. The multiplexer does its job of storing any out-of-order items until earlier items have been processed.
When an item has been completely consumed by whatever the multiplexer outputs to, they are returned to the pool queue for re-use by the supplier.
If one 'slow item' does require enormous amounts of processing, the out-of-order collection in the multiplexer will grow as the 'quick items' slip through on the other pool threads, but because the multiplexer is not actually feeding its items to its output, the pool queue is not being replenished.
When the pool empties, the supplier will block on it and will be unable to supply any more items.
The 'quick items' remaining on the processing pool input will get processed and then processing will stop except for the 'slow item'. The supplier is blocked, the multiplexer has [poolSize-1] items in its collection. No extra memory is being used, no CPU is being wasted, the only thing happening is the processing of the 'slow item'.
When the 'slow item' is finally done, it gets output to the multiplexer.
The multiplexer can now output all [poolSize] items in the required sequential order. As these items are consumed, the pool gets filled up again and the supplier, now able to get items from the pool, runs on, again reading its file and queueing up items to the processor pool.
Auto-regulating, no bounded buffers required, no memory runaway.
Edit: I meant 'no bounded buffers required' :)
Also, no GC holdups - since the items are re-used, they don't need GC'ing.
I think you misunderstand the article. According to the description, it doesn't have an unbounded buffer: there will be at most one value in the look-ahead buffer for each queue. When you dequeue a value that's not the next one, you save it and then wait only on the queue that doesn't have a value in the buffer. (If you have more than two input buffers, the logic will have to be more complicated, or you would need a tree of 2-queue multiplexers.)
If you combine this with BlockingCollections that have specified bounded capacity, you get exactly the behavior you want: if one producer is too slow, the others will pause until the slow thread catches up.
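For what it's worth, here is a hedged sketch of that multiplexer for N bounded queues. It assumes (as in the article and in your pipeline) that each producer's own output queue is already in ascending index order, that every index appears exactly once, and that the producers don't call CompleteAdding on their queues (termination is handled via itemCount). It keeps at most one buffered item per queue:

// Sketch only: queues[] are the producers' bounded output queues of
// (index, block) pairs; write() is the in-order consumer; itemCount is the
// total number of work items.
static void Multiplex<T>(BlockingCollection<KeyValuePair<int, T>>[] queues,
                         Action<T> write, int itemCount)
{
    var lookAhead = new KeyValuePair<int, T>?[queues.Length]; // one slot per queue
    int next = 0;

    while (next < itemCount)
    {
        // Flush any buffered item that has become the next one we need.
        bool advanced = false;
        for (int i = 0; i < lookAhead.Length; i++)
        {
            if (lookAhead[i].HasValue && lookAhead[i].Value.Key == next)
            {
                write(lookAhead[i].Value.Value);
                lookAhead[i] = null;
                next++;
                advanced = true;
            }
        }
        if (advanced) continue;

        // Otherwise take only from queues whose look-ahead slot is empty.
        // (Given ascending per-queue order, the queue that will yield 'next'
        // is always among these candidates.)
        var candidates = new List<BlockingCollection<KeyValuePair<int, T>>>();
        var indices = new List<int>();
        for (int i = 0; i < queues.Length; i++)
        {
            if (!lookAhead[i].HasValue)
            {
                candidates.Add(queues[i]);
                indices.Add(i);
            }
        }

        KeyValuePair<int, T> item;
        int from = BlockingCollection<KeyValuePair<int, T>>.TakeFromAny(candidates.ToArray(), out item);

        if (item.Key == next) { write(item.Value); next++; }
        else { lookAhead[indices[from]] = item; } // park it; never more than one per queue
    }
}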
Have you considered not using manual producer/consumer buffering but instead the .AsParallel().AsOrdered() PLINQ alternative? Semantically, this is exactly what you want - a sequence of items processed in parallel but ordered in output. Your code could look as simple as...
var orderedOutput =
    ReadSequentialBlocks()
    .AsParallel()
    .AsOrdered()
    .Select(ProcessBlock);

foreach (var item in orderedOutput)
    Sink(item);
The default degree of parallelism is the number of processors on your machine, but you can tune it. There is an automatic output buffer. If the default buffering consumes too many resources, you can turn it off:
.WithMergeOptions(ParallelMergeOptions.NotBuffered)
However, I'd certainly give the plain unadorned version a shot first - you never know, it might just work fine out of the box. Finally, if you want the simplicity of auto-multiplexing but a larger-than-zero yet non-automatic buffer, you could always use the PLINQ query to fill a fixed-size BlockingCollection<> which is read with a consuming enumerable on another thread.
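That last combination could look something like this sketch (Block, ReadSequentialBlocks, ProcessBlock and Sink are the same assumed names as in the snippet above, and the capacity of 16 is arbitrary):

// Sketch only: the ordered, unbuffered PLINQ query feeds a small bounded
// buffer, and a single thread drains that buffer into Sink in order.
var buffer = new BlockingCollection<Block>(boundedCapacity: 16);

var writer = Task.Factory.StartNew(() =>
{
    foreach (var item in buffer.GetConsumingEnumerable())
        Sink(item);
});

foreach (var item in ReadSequentialBlocks()
                        .AsParallel()
                        .AsOrdered()
                        .WithMergeOptions(ParallelMergeOptions.NotBuffered)
                        .Select(ProcessBlock))
{
    buffer.Add(item); // blocks the query when the writer falls 16 items behind
}

buffer.CompleteAdding();
writer.Wait();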
Follow up
For completeness, here is the code that I wound up with. Thanks to Martin James for his answer, which provided the basis for the solution.
I'm still not completely happy with the multiplexor (see ParallelWorkProcessor.multiplex()). It works, but it seems a bit klunky.
I used Martin James' idea about a work pool to prevent unbounded growth of the multiplexor buffer, however I substituted a SemaphoreSlim for the work pool queue (since it provides the same functionality, but it's a bit simpler to use and uses less resources).
The worker tasks write their completed items to a concurrent priority queue. This allows me to easily and efficiently find the next item to output.
I used a sample concurrent priority queue from Microsoft, modified to provide an autoreset event that's signalled whenever a new item is enqueued.
Here's the ParallelWorkProcessor class. You use it by providing it with three delegates; one to provide the work items, one to process a work item, and one to output a completed work item.
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics.Contracts;
using System.Threading;
using System.Threading.Tasks;

namespace Demo
{
    public sealed class ParallelWorkProcessor<T> where T : class // T is the work item type.
    {
        public delegate T Read();            // Called by only one thread.
        public delegate T Process(T block);  // Called simultaneously by multiple threads.
        public delegate void Write(T block); // Called by only one thread.

        public ParallelWorkProcessor(Read read, Process process, Write write, int numWorkers = 0)
        {
            _read = read;
            _process = process;
            _write = write;

            numWorkers = (numWorkers > 0) ? numWorkers : Environment.ProcessorCount;

            _workPool    = new SemaphoreSlim(numWorkers * 2);
            _inputQueue  = new BlockingCollection<WorkItem>(numWorkers);
            _outputQueue = new ConcurrentPriorityQueue<int, T>();
            _workers     = new Task[numWorkers];

            startWorkers();
            Task.Factory.StartNew(enqueueWorkItems);
            _multiplexor = Task.Factory.StartNew(multiplex);
        }

        private void startWorkers()
        {
            for (int i = 0; i < _workers.Length; ++i)
            {
                _workers[i] = Task.Factory.StartNew(processBlocks);
            }
        }

        private void enqueueWorkItems()
        {
            int index = 0;

            while (true)
            {
                T data = _read();

                if (data == null) // Signals end of input.
                {
                    _inputQueue.CompleteAdding();
                    _outputQueue.Enqueue(index, null); // Special sentinel WorkItem.
                    break;
                }

                _workPool.Wait();
                _inputQueue.Add(new WorkItem(data, index++));
            }
        }

        private void multiplex()
        {
            int index = 0; // Next required index.
            int last = int.MaxValue;

            while (index != last)
            {
                KeyValuePair<int, T> workItem;
                _outputQueue.WaitForNewItem(); // There will always be at least one item - the sentinel item.

                while ((index != last) && _outputQueue.TryPeek(out workItem))
                {
                    if (workItem.Value == null) // The sentinel item has a null value to indicate that it's the sentinel.
                    {
                        last = workItem.Key; // The sentinel's key is the index of the last block + 1.
                    }
                    else if (workItem.Key == index) // Is this block the next one that we want?
                    {
                        // Even if new items are added to the queue while we're here, the new items will be lower priority.
                        // Therefore it is safe to assume that the item we will dequeue now is the same one we peeked at.
                        _outputQueue.TryDequeue(out workItem);
                        Contract.Assume(workItem.Key == index); // This *must* be the case.
                        _workPool.Release(); // Allow the enqueuer to queue another work item.
                        _write(workItem.Value);
                        ++index;
                    }
                    else // If it's not the block we want, we know we'll get a new item at some point.
                    {
                        _outputQueue.WaitForNewItem();
                    }
                }
            }
        }

        private void processBlocks()
        {
            foreach (var block in _inputQueue.GetConsumingEnumerable())
            {
                var processedData = _process(block.Data);
                _outputQueue.Enqueue(block.Index, processedData);
            }
        }

        public bool WaitForFinished(int maxMillisecondsToWait) // Can be Timeout.Infinite.
        {
            return _multiplexor.Wait(maxMillisecondsToWait);
        }

        private sealed class WorkItem
        {
            public WorkItem(T data, int index)
            {
                Data = data;
                Index = index;
            }

            public T Data { get; private set; }
            public int Index { get; private set; }
        }

        private readonly Task[] _workers;
        private readonly Task _multiplexor;
        private readonly SemaphoreSlim _workPool;
        private readonly BlockingCollection<WorkItem> _inputQueue;
        private readonly ConcurrentPriorityQueue<int, T> _outputQueue;

        private readonly Read _read;
        private readonly Process _process;
        private readonly Write _write;
    }
}
And here's my test code:
using System;
using System.Diagnostics;
using System.Threading;

namespace Demo
{
    public static class Program
    {
        private static void Main(string[] args)
        {
            _rng = new Random(34324);

            int threadCount = 8;
            _maxBlocks = 200;
            ThreadPool.SetMinThreads(threadCount + 2, 4); // Kludge to prevent slow thread startup.

            var stopwatch = new Stopwatch();
            _numBlocks = _maxBlocks;
            stopwatch.Restart();

            var processor = new ParallelWorkProcessor<byte[]>(read, process, write, threadCount);
            processor.WaitForFinished(Timeout.Infinite);

            Console.WriteLine("\n\nFinished in " + stopwatch.Elapsed + "\n\n");
        }

        private static byte[] read()
        {
            if (_numBlocks-- == 0)
            {
                return null;
            }

            var result = new byte[128];
            result[0] = (byte)(_maxBlocks - _numBlocks);
            Console.WriteLine("Supplied input: " + result[0]);
            return result;
        }

        private static byte[] process(byte[] data)
        {
            if (data[0] == 10) // Hack for test purposes. Make it REALLY slow for this item!
            {
                Console.WriteLine("Delaying a call to process() for 5s for ID 10");
                Thread.Sleep(5000);
            }

            Thread.Sleep(10 + _rng.Next(50));
            Console.WriteLine("Processed: " + data[0]);
            return data;
        }

        private static void write(byte[] data)
        {
            Console.WriteLine("Received output: " + data[0]);
        }

        private static Random _rng;
        private static int _numBlocks;
        private static int _maxBlocks;
    }
}
We have a ConcurrentQueue which is used to share data among 3 threads. Thread A continuously fills the queue with data. Thread B is designed to record this data to a file. Thread C is supposed to retrieve the youngest entry in the queue (or as close to youngest as possible), perform some operations on it and display results on the screen.
Thread B, in order to cluster the file write operations in time, does something like this:
if (cq.Count > 100)
{
    while (cq.Count > 1)
    {
        qElement = PopFromCq(cq); // PopFromCq uses cq.TryDequeue()
        bw.Write(qElement.data);  // bw is a binary writer
    }
}
else
{
    System.Threading.Thread.Sleep(10);
}
I.e., it waits for at least 100 elements to queue up, then writes them to disk. It always maintains at least one item in the queue, though, because we want Thread C to always have access to at least one item.
The loop in thread C looks like:
while (threadsRunning)
{
    System.Threading.Thread.Sleep(500); // Update twice per second
    ProcessDataAndUpdateScreen(cq.ElementAt(cq.Count - 1)); // our terrible attempt at looking at the latest (or close to latest) entry in the queue
}
In this loop, we sometimes get an exception due to the race between the thread that writes the data to disk, and the cq.ElementAt(cq.Count-1) call. I believe what is happening is as follows:
cq.Count is calculated to be, say 90.
By that time, Thread B has already started its loop and is dequeuing data from the queue to write to the disk.
By the time cq.ElementAt() is called, Thread B has consumed enough items that (cq.Count - 1) no longer points to a valid entry in the queue.
Any ideas on what would be a nice way of accessing the youngest entry in the queue in presence of multiple threads operating on the queue?
Regards,
Is it necessary for the A-B communication and A-C communication to both go through the queue? What if you have thread A write each entry to the queue (for B to read and log) and also save the entry it has just queued in a volatile property somewhere? Every time C wants to get the youngest element, it can just read directly from that property.
EDIT: Instead of just relying on a volatile property, you should actually use Interlocked.CompareExchange<T>(ref T, T, T) to set and read the "youngest entry" property.
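A hedged sketch of that idea, assuming the queue's element type (called QElement here) is a reference type:

// Sketch only: thread A publishes the most recent element; thread C reads it.
private QElement _youngest; // shared between A and C

// Thread A, right after cq.Enqueue(item):
Interlocked.Exchange(ref _youngest, item);

// Thread C, on each 500 ms tick:
QElement latest = Interlocked.CompareExchange(ref _youngest, null, null); // atomic read
if (latest != null)
{
    ProcessDataAndUpdateScreen(latest);
}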
I've modified the Producer/Consumer example at http://msdn.microsoft.com/en-us/library/yy12yx1f(v=vs.80).aspx. I don't want the Consumer to process the queue "on event". Instead I'm using an infinite loop (the same as the one used in the Producer) and trying to process all elements as soon as possible. Are there any problems with this approach? Why do we need "events" between Consumer and Producer if we can use an infinite loop?
// Consumer.ThreadRun
public void ThreadRun()
{
    int count = 0;
    while (!_syncEvents.ExitThreadEvent.WaitOne(0, false))
    {
        lock (((ICollection)_queue).SyncRoot)
        {
            while (_queue.Count > 0)
            {
                int item = _queue.Dequeue();
                count++;
            }
        }
    }
    Console.WriteLine("Consumer Thread: consumed {0} items", count);
}
I see two potential problems with what you have:
When the queue is empty, your version will sit in a busy loop burning precious CPU; using an event puts the thread to sleep until there is actual work to be done.
By locking the queue and processing all the elements in a single loop like you are doing, you negate the potential benefit of having multiple consumer threads processing the queue. Because you only increment a count in your example this might not seem like a big deal, but if you start doing real work with the items that you dequeue, you could benefit from having multiple threads handle that work.
If you are using .NET 4 you might want to take a look at the BlockingCollection(T) class, which would give an even cleaner solution to all of this, with less locking to boot.
A potential problem could occur if your setting of the ExitThreadEvent gets into a race condition (since you don't show that part of the code it's hard to tell if that could happen).
If you are able to use .NET 4.0 you can use the built in BlockingCollection class to solve this problem simply and efficiently.
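A minimal sketch of what that looks like (the int items and the count are just stand-ins for the real queue payload):

// Sketch only: GetConsumingEnumerable blocks while the collection is empty,
// so there is no busy loop, and CompleteAdding replaces the ExitThreadEvent.
var queue = new BlockingCollection<int>();

// Producer thread:
for (int i = 0; i < 1000; i++)
{
    queue.Add(i);
}
queue.CompleteAdding(); // tells consumers no more items are coming

// Consumer thread (several of these can safely share the same collection):
int count = 0;
foreach (int item in queue.GetConsumingEnumerable())
{
    count++; // do the real per-item work here
}
Console.WriteLine("Consumer Thread: consumed {0} items", count);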