Handle Queue faster - c#

I have a queue that receives data over time.
I use multiple threads to dequeue items and save them to a database.
I create an array of Thread objects to do this job:
for (int i = 0; i < thr.Length; i++)
{
thr[i] = new Thread(new ThreadStart(SaveData));
thr[i].Start();
}
SaveData
Note: eQ and eiQ are two global queues. I use a while loop to keep the thread alive.
public void SaveData()
{
var imgDAO = new imageDAO ();
string exception = "";
try
{
while (eQ.Count > 0 && eiQ.Count > 0)
{
var newRecord = eQ.Dequeue();
var newRecordImage = eiQ.Dequeue();
imgDAO.SaveEvent(newRecord, newRecordImage);
var storepath = Properties.Settings.Default.StorePath;
save.WriteFile(storepath, newRecord, newRecordImage);
}
}
catch (Exception e)
{
Global._logger.Info(e.Message + e.Source);
}
}
It did create multiple threads, but when I debug, only one thread is alive; the rest are dead.
I don't know why. Does anyone have an idea? Thanks.

You are calling WriteFile in that thread function.
Is it possible that you are trying to write a file that is locked by another thread (the same file name, for example)?
One more thing: I don't like saving data to disk from multiple threads.
I think you should use a buffer instead of many threads and write it out every few records.

As mentioned in the comments, your threads only live as long as there are elements in the queues. Because the loop condition requires both queues to be non-empty, a thread terminates as soon as either queue is empty. This could explain why you see only one living thread while debugging.
A potential answer to your question would be to use a BlockingCollection from the System.Collections.Concurrent namespace instead of a Queue. It can do a blocking dequeue, which stops the thread(s) until more elements are available for processing.
Another problem is the nice race condition between eQ and eiQ -- consider using a single queue with a Tuple or custom data type so you can dequeue both newRecord and newRecordImage in a single operation.
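A minimal sketch of both suggestions combined, not the asker's actual code: the record and image are represented as plain strings, and the `save` delegate is a hypothetical stand-in for the `SaveEvent` and `WriteFile` calls. The class name `PairedSaver` is also an assumption.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public sealed class PairedSaver
{
    // One queue of pairs removes the race between the two original queues.
    private readonly BlockingCollection<(string Record, string Image)> _queue =
        new BlockingCollection<(string Record, string Image)>();
    private readonly Thread[] _workers;

    // 'save' is a hypothetical stand-in for the database and file writes.
    public PairedSaver(int workerCount, Action<string, string> save)
    {
        _workers = new Thread[workerCount];
        for (int i = 0; i < workerCount; i++)
        {
            _workers[i] = new Thread(() =>
            {
                // Blocks while the queue is empty; the loop ends only after
                // CompleteAdding was called and the queue has been drained.
                foreach (var (record, image) in _queue.GetConsumingEnumerable())
                {
                    save(record, image);
                }
            });
            _workers[i].Start();
        }
    }

    public void Enqueue(string record, string image) => _queue.Add((record, image));

    // Signal that no more data will arrive, then wait for the workers.
    public void Shutdown()
    {
        _queue.CompleteAdding();
        foreach (var worker in _workers) worker.Join();
    }
}
```

With this shape the worker threads stay alive while the queue is temporarily empty, and they exit only after `Shutdown` is called.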

Related

In C#, how do I process a large text file with multiple threads/tasks, but with conditions?

I am writing a file-processing program in C#. I have a HUGE text file, with 5 columns of data each separated by a bar(|). The first column in each row is a column containing a person's name, and each person has a unique name.
It's a very large text file, so I want to process it concurrently using multiple tasks. But I want every row with the same name to be processed by the SAME task, not a different task. For example, if (part of) my file reads:
Jason|BMW|354|23|1/1/2000|1:03
Jason|BMW|354|23|1/1/2000|1:03
Jason|BMW|354|23|1/1/2000|1:03
Jason|Acura|354|23|1/1/2000|1:03
Jason|BMW|354|23|1/1/2000|1:03
Jason|BMW|354|23|1/1/2000|1:03
Jason|Hyundai|392|17|1/1/2000|1:06
Mike|Infiniti|335|18|8/24/2005|7:11
Mike|Infiniti|335|18|8/24/2005|7:11
Mike|Infiniti|335|18|8/24/2005|7:11
Mike|Dodge|335|18|8/24/2005|7:18
Mike|Infiniti|335|18|8/24/2005|7:11
Mike|Infiniti|335|18|8/24/2005|7:14
Then I want one task processing ALL the Jason rows, and another task processing ALL the Mike rows. I don't want the first task processing any Mike rows, and conversely I don't want the second task processing any Jason rows. Essentially, how can I make it so that all rows of a certain name are all processed by the SAME task? ALSO, how will I know when all the processing of all the rows has been completed? I've been racking my tiny brain and I can't come up with a solution.
One idea is to implement the producer-consumer pattern, with one producer that reads the file line-by-line, and multiple consumers that process the lines, one consumer per name. Since the number of unique names may be large, it would be impractical to dedicate a Thread for each consumer, so the consumers should process the data asynchronously. Each consumer should have its own private queue with data to process. The most efficient asynchronous queue currently available in .NET is the Channel<T> class, and using it as a building block would be a good idea, but I will suggest something higher-level than this: an ActionBlock<T> from the TPL Dataflow library. This component combines a processor and a queue, is async-enabled, and is highly configurable. So it will make for a succinct, quite readable, and hopefully quite efficient solution:
var processors = new Dictionary<string, ActionBlock<string>>();
foreach (var line in File.ReadLines(filePath))
{
string name = ExtractName(line); // Reads the first part of the line
if (!processors.TryGetValue(name, out ActionBlock<string> processor))
{
processor = CreateProcessor(name);
processors.Add(name, processor);
}
var accepted = processor.Post(line);
if (!accepted) break; // The processor has failed
}
// Signal that no more lines will be sent to the processors
foreach (var processor in processors.Values) processor.Complete();
// Aggregate the completion of all processors
Task allCompletions = Task.WhenAll(processors.Values.Select(p => p.Completion));
// Wait for the completion of all processors, and allow errors to propagate
allCompletions.Wait(); // or await allCompletions;
static ActionBlock<string> CreateProcessor(string name)
{
return new ActionBlock<string>((string line) =>
{
// Process the line
}, new ExecutionDataflowBlockOptions()
{
// Configure the options if the defaults are not optimal
});
}
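For comparison, here is a rough sketch of the lower-level Channel<T> building block mentioned in this answer. It assumes .NET Core 3.0 or later (for `System.Threading.Channels` and `await foreach`); the class name `ChannelPerName` and the `processLine` delegate are hypothetical. The channels are unbounded, so a slow consumer lets memory grow.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

public static class ChannelPerName
{
    // 'processLine' is a hypothetical callback receiving (name, line).
    public static async Task ProcessAsync(IEnumerable<string> lines,
                                          Func<string, string, Task> processLine)
    {
        var channels = new Dictionary<string, Channel<string>>();
        var consumers = new List<Task>();

        foreach (var line in lines)
        {
            int bar = line.IndexOf('|');
            string name = bar < 0 ? line : line.Substring(0, bar);
            if (!channels.TryGetValue(name, out var channel))
            {
                channel = Channel.CreateUnbounded<string>(
                    new UnboundedChannelOptions { SingleReader = true, SingleWriter = true });
                channels.Add(name, channel);
                // One asynchronous consumer per name; no dedicated thread.
                consumers.Add(ConsumeAsync(name, channel.Reader, processLine));
            }
            // TryWrite succeeds on an unbounded channel that is not completed.
            channel.Writer.TryWrite(line);
        }

        // Signal end of input, then wait for every consumer to drain its channel.
        foreach (var channel in channels.Values) channel.Writer.Complete();
        await Task.WhenAll(consumers);
    }

    private static async Task ConsumeAsync(string name, ChannelReader<string> reader,
                                           Func<string, string, Task> processLine)
    {
        await foreach (var line in reader.ReadAllAsync())
        {
            await processLine(name, line);
        }
    }
}
```

Awaiting `Task.WhenAll` also answers the "how will I know when all rows are done" part of the question, and it propagates any exception thrown by a consumer.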
I'd go for a concurrent dictionary of concurrent queues, keyed by name.
In the main thread (call it the reader), loop line by line, enqueueing each line to the appropriate concurrent queue (call these the worker queues), and create a new worker queue and dedicated task as needed when a new name is encountered.
It would look something like this (note: this is semi-pseudo code and semi-real code and has no error checking, so treat it as a base for a solution, not the solution).
class FileProcessor
{
private ConcurrentDictionary<string, Worker> workers = new ConcurrentDictionary<string, Worker>();
class Worker
{
public Worker() => Task = Task.Run(Process);
private void Process()
{
foreach (var row in Queue.GetConsumingEnumerable())
{
if (row.Length == 0) break;
ProcessRow(row);
}
}
private void ProcessRow(string[] row)
{
// your implementation here
}
public Task Task { get; }
public BlockingCollection<string[]> Queue { get; } = new BlockingCollection<string[]>(new ConcurrentQueue<string[]>());
}
void ProcessFile(string fileName)
{
foreach (var line in GetLinesOfFile(fileName))
{
var row = line.Split('|');
var name = row[0];
// create worker as needed
var worker = workers.GetOrAdd(name, x => new Worker());
// add a row for the worker to work on
worker.Queue.Add(row);
}
// send an empty array to each worker to signal end of input
foreach (var worker in workers.Values)
worker.Queue.Add(new string[0]);
// now wait for all workers to be done
Task.WaitAll(workers.Values.Select(x => x.Task).ToArray());
}
private static IEnumerable<string> GetLinesOfFile(string fileName)
{
// this helps limit memory consumption by not loading
// the whole file at once
return File.ReadLines(fileName);
}
}
I suggest that your reader thread stream the file rather than reading the entire file (you stated the file was huge, so streaming would be memory friendly). That reader thread is I/O bound, so if you can async/await it, that would be better than my simple Process() doing a foreach with no awaiting.
The features of this approach:
dedicated task per person's name
use of a sentinel value to signal end of input
use of Task.WaitAll to join back to the main thread
assumes the tasks are CPU bound. If they are I/O bound, consider using async/await and Task.WhenAll instead
file is streamed into memory with File.ReadLines()
names do not need to be sorted because the queue to enqueue to is selected by name on-demand
Refinements
In the interest of completeness, the approach above is a bit naive and can be refined by... reading all of the comments and answers; users Zoulias and Mercer in particular have good points. We can refine this approach by:
adapting it to Channels (System.Threading.Channels) and signaling the end of input by completing the queue, for example with CompleteAdding on a BlockingCollection, instead of sending a sentinel. These are not only better abstractions, but also more efficient (abstraction and efficiency can often be at odds, but not in this case).
reducing the name-to-thread or name-to-task dedication, which can exhaust resources when there is a large number of names, and instead mapping names to buckets or partitions where each bucket/partition has a dedicated task/thread.
For the second point, for example, you could have
// create worker as needed
var worker = workers.GetOrAdd(GetPartitionKey(name), x => new Worker());
where GetPartitionKey() could be implemented something like
private string GetPartitionKey(string name) =>
name[0] switch
{
>= 'a' and <= 'f' => "A thru F bucket",
>= 'A' and <= 'F' => "A thru F bucket",
>= 'g' and <= 'k' => "G thru K bucket",
>= 'G' and <= 'K' => "G thru K bucket",
_ => "everything else bucket"
};
or whatever algorithm you want to use as a partition selector.
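As one possible partition selector, here is a sketch that hashes the name into a fixed number of buckets and uses CompleteAdding instead of a sentinel. The class name `PartitionedProcessor` and the `processRow` delegate are hypothetical, and the hash-based selector is just one choice among many.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class PartitionedProcessor
{
    // Maps a name to one of 'partitionCount' buckets. string.GetHashCode is
    // stable within one process, which is all this needs; masking the sign
    // bit keeps the index non-negative.
    public static int GetPartition(string name, int partitionCount) =>
        (name.GetHashCode() & int.MaxValue) % partitionCount;

    public static void Process(IEnumerable<string> lines, int partitionCount,
                               Action<string[]> processRow)
    {
        var queues = Enumerable.Range(0, partitionCount)
            .Select(_ => new BlockingCollection<string[]>())
            .ToArray();

        // One dedicated consumer task per partition.
        var tasks = queues
            .Select(q => Task.Run(() =>
            {
                foreach (var row in q.GetConsumingEnumerable()) processRow(row);
            }))
            .ToArray();

        foreach (var line in lines)
        {
            var row = line.Split('|');
            queues[GetPartition(row[0], partitionCount)].Add(row);
        }

        // Completing each queue ends its consumer loop once it is drained.
        foreach (var q in queues) q.CompleteAdding();
        Task.WaitAll(tasks);
    }
}
```

All rows with the same name land in the same bucket, so they are still processed by one task and in file order, while the number of tasks stays fixed no matter how many names the file contains.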
how can I make it so that all rows of a certain name are all processed by the SAME task?
A System.Threading.Tasks.Task can be created with various TaskCreationOptions that dictate how and when its threads and resources are managed during its lifetime. For an operation that consumes a large amount of data and segregates that consumption onto specific threads, you may want to create the tasks responsible for individual names with TaskCreationOptions.LongRunning. This hints to the task scheduler that an additional thread might be required for the task, so that it does not block the forward progress of other threads or work items on the local thread-pool queue.
For the actual how, I would recommend starting various 'Worker' threads, each with their own Task and a way for your main task (the one reading the file, or parsing the JSON data) to communicate between the two that more work needs to be completed.
Consider the use of thread-safe collections such as a ConcurrentQueue<T> or other various collections that may help you in streaming data between threads for consumption safely.
Here's a very limited example of the structure you may want to consider:
void Worker(ConcurrentQueue<string> Queue, CancellationToken Token)
{
// keep the worker in a loop until cancellation is requested
while (Token.IsCancellationRequested is false)
{
// check to see if the queue has stuff, and consume it
if (Queue.TryDequeue(out string line))
{
Console.WriteLine($"Consumed Line {line} {Thread.CurrentThread.ManagedThreadId}");
}
// yield the thread in case other threads have work to do
Thread.Sleep(10);
}
Console.WriteLine("Finished Work");
}
// data could be a reader, list, array anything really
IEnumerable<string> Data()
{
yield return "JASON";
yield return "Mike";
yield return "JASON";
yield return "Mike";
}
void Reader()
{
// create some collections to stream the data to other tasks
ConcurrentQueue<string> Jason = new();
ConcurrentQueue<string> Mike = new();
// make sure we have a way to cancel the workers if we need to
CancellationTokenSource tokenSource = new();
// start some worker tasks that will consume the data
Task[] workers = {
new Task(()=> Worker(Jason, tokenSource.Token), TaskCreationOptions.LongRunning),
new Task(()=> Worker(Mike, tokenSource.Token), TaskCreationOptions.LongRunning)
};
for (int i = 0; i < workers.Length; i++)
{
workers[i].Start();
}
// iterate the data and send it off to the queues for consumption
foreach (string line in Data())
{
switch (line)
{
case "JASON":
Console.WriteLine($"Sent line to JASON {Thread.CurrentThread.ManagedThreadId}");
Jason.Enqueue(line);
break;
case "Mike":
Console.WriteLine($"Sent line to Mike {Thread.CurrentThread.ManagedThreadId}");
Mike.Enqueue(line);
break;
default:
Console.WriteLine($"Disposed unknown line {Thread.CurrentThread.ManagedThreadId}");
break;
}
}
// make sure that worker threads are cancelled if parent task has been cancelled
try
{
// wait for workers to finish by checking collections
do
{
Thread.Sleep(10);
} while (Jason.IsEmpty is false || Mike.IsEmpty is false);
}
finally
{
// cancel the worker threads, if they havent already
tokenSource.Cancel();
}
}
// make sure we have a way to cancel the reader if we need to
CancellationTokenSource tokenSource = new();
// start the reader thread
Task[] tasks = { Task.Run(Reader, tokenSource.Token) };
Console.WriteLine("Starting Reader");
Task.WaitAll(tasks);
Console.WriteLine("Finished Reader");
// cleanup the tasks if they are still running some how
tokenSource?.Cancel();
// dispose of IDisposable Object
tokenSource?.Dispose();
Console.ReadLine();

Reading a lot of files "at the same time"

I'm using FileSystemWatcher to catch every created, changed, deleted and renamed event for any file in a folder.
On these changes I need to compute a simple checksum of the contents of the files. I simply open a FileStream and pass it to the MD5 class:
private byte[] calculateChecksum(string frl)
{
using (FileStream stream = File.Open(frl, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
return this.md5.ComputeHash(stream);
}
}
The problem is the number of files I need to handle. For example, imagine I have 200 files created over time in a folder, and then I copy all of them and paste them into the same folder. This action causes 200 events and 200 calculateChecksum() calls.
How could I solve this kind of problem?
In the FileSystemWatcher handler, put tasks into a queue that is processed by a worker. The worker can process the checksum tasks at a targeted speed and/or frequency. One worker is probably better, because many concurrent readers can slow down the HDD with extra read seeks.
Try reading about BlockingCollection:
https://msdn.microsoft.com/ru-ru/library/dd997371(v=vs.110).aspx
and Producer-Consumer Dataflow Pattern
https://msdn.microsoft.com/ru-ru/library/hh228601(v=vs.110).aspx
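var workerCount = 2;
BlockingCollection<string>[] filesQueues = new BlockingCollection<string>[workerCount];
for (int i = 0; i < workerCount; i++)
{
    // Copy to a local so each worker captures its own queue, not the loop variable
    var queue = filesQueues[i] = new BlockingCollection<string>(500);
    // Worker
    Task.Run(() =>
    {
        while (!queue.IsCompleted)
        {
            string url = null;
            try
            {
                url = queue.Take();
            }
            catch (InvalidOperationException) { } // thrown once adding is completed and the queue is empty
            if (!string.IsNullOrWhiteSpace(url))
            {
                calculateChecksum(url);
            }
        }
    });
}
// inside of FileSystemWatcher handler
// any hash that is stable within the process works; the mask keeps the index non-negative
var queueIndex = (fileName.GetHashCode() & int.MaxValue) % workerCount;
// Warning!!
// Blocks if the queue is full (Count == BoundedCapacity)
filesQueues[queueIndex].Add(fileName);
// on shutdown only, for every queue: nothing can be added after this call
filesQueues[queueIndex].CompleteAdding();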
You can also have multiple consumers on one queue: just call Take or TryTake concurrently, and each item will be consumed by only a single consumer. But take into account that in that case events for the same file can end up on different workers, and multiple HDD readers can slow down the HDD.
UPD: with multiple workers, it is better to create one BlockingCollection per worker and push each file into the queue selected by index, as in the code above.
I've sketched a consumer-producer pattern to solve that, and I've tried to use a thread pool to smooth out the large amount of work, sharing a BlockingCollection.
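A minimal sketch of the single-worker variant, assuming a hypothetical `ChecksumWorker` class and `onChecksum` callback (neither is from the question). The worker owns its own MD5 instance, because a hash algorithm instance should not be shared between threads the way `this.md5` would be with several workers.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Security.Cryptography;
using System.Threading.Tasks;

public sealed class ChecksumWorker : IDisposable
{
    // Bounded so a burst of events cannot queue an unlimited number of paths.
    private readonly BlockingCollection<string> _paths = new BlockingCollection<string>(500);
    private readonly Task _consumer;

    // 'onChecksum' is a hypothetical callback receiving the path and its MD5 hash.
    public ChecksumWorker(Action<string, byte[]> onChecksum)
    {
        _consumer = Task.Factory.StartNew(() =>
        {
            using (var md5 = MD5.Create())
            {
                foreach (var path in _paths.GetConsumingEnumerable())
                {
                    try
                    {
                        using (var stream = File.Open(path, FileMode.Open,
                                   FileAccess.Read, FileShare.ReadWrite))
                        {
                            onChecksum(path, md5.ComputeHash(stream));
                        }
                    }
                    catch (Exception ex) when (ex is IOException || ex is UnauthorizedAccessException)
                    {
                        // File was deleted, renamed, locked or inaccessible: skip it.
                    }
                }
            }
        }, TaskCreationOptions.LongRunning);
    }

    // Call from the FileSystemWatcher event handlers.
    // Blocks while 500 paths are already pending.
    public void Enqueue(string path) => _paths.Add(path);

    public void Dispose()
    {
        _paths.CompleteAdding();
        _consumer.Wait();
        _paths.Dispose();
    }
}
```

The watcher handlers return quickly because they only enqueue a path, and the disk sees one sequential reader instead of 200 competing ones.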
BlockingCollection & ThreadPool:
private BlockingCollection<Index.ResourceIndexDocument> documents;
this.pool = new SmartThreadPool(SmartThreadPool.DefaultIdleTimeout, 4);
this.documents = new BlockingCollection<string>();
As you can see, I've created a thread pool with concurrency set to 4. So only 4 threads will work at the same time, regardless of whether there are more than 4 work units to handle in the pool.
Producer:
public void warn(string channel, string frl)
{
this.pool.QueueWorkItem<string, string>(
(file) => this.files.Add(file),
channel,
frl
);
}
Consumer:
Task.Factory.StartNew(() =>
{
Index.ResourceIndexDocument document = null;
while (this.documents.TryTake(out document, TimeSpan.FromSeconds(1)))
{
IEnumerable<Index.ResourceIndexDocument> documents = this.documents.Take(this.documents.Count);
Index.IndexEngine.Instance.index(documents);
}
},
TaskCreationOptions.LongRunning
);

Advice on processing giant text file and processing URL's

I'm currently trying to loop through a text file that is about 1.5 GB in size and then use the URLs grabbed from it to pull down the HTML from each site.
For speed I'm trying to process each HTTP request on a new thread, but C# is not my strongest language (it is a requirement for what I'm doing), so I'm a bit confused about good threading practice.
This is how I'm processing the list
private static void Main()
{
const Int32 BufferSize = 128;
using (var fileStream = File.OpenRead("dump.txt"))
using (var streamReader = new StreamReader(fileStream, Encoding.UTF8, true, BufferSize))
{
String line;
var progress = 0;
while ((line = streamReader.ReadLine()) != null)
{
var stuff = line.Split('|');
getHTML(stuff[3]);
progress += 1;
Console.WriteLine(progress);
}
}
}
And I'm pulling down the HTML as so
private static void getHTML(String url)
{
new Thread(() =>
{
var client = new DecompressGzipResponse();
var html = client.DownloadString(url);
}).Start();
}
Though the speeds are fast doing this initially, after about 20 thousand they slow down and eventually after 32 thousand the application will hang and crash. I was under the impression C# threads terminated when the function completed?
Can anyone give any examples/ suggestions on how to do this better?
One very reliable way to do this is by using the producer-consumer pattern. You create a thread-safe queue of URLs (for example, BlockingCollection<Uri>). Your main thread is the producer, which adds items to the queue. You then have multiple consumer threads, each of which reads Urls from the queue and does the HTTP requests. See BlockingCollection.
Setting it up isn't terribly difficult:
BlockingCollection<Uri> UrlQueue = new BlockingCollection<Uri>();
// Main thread starts the consumer threads
Task t1 = Task.Factory.StartNew(() => ProcessUrls(), TaskCreationOptions.LongRunning);
Task t2 = Task.Factory.StartNew(() => ProcessUrls(), TaskCreationOptions.LongRunning);
// create more tasks if you think necessary.
// Now read your file
foreach (var line in File.ReadLines(inputFileName))
{
var theUri = ExtractUriFromLine(line);
UrlQueue.Add(theUri);
}
// when done adding lines to the queue, mark the queue as complete
UrlQueue.CompleteAdding();
// now wait for the tasks to complete.
t1.Wait();
t2.Wait();
// You could also use Task.WaitAll if you have an array of tasks
The individual threads process the urls with this method:
void ProcessUrls()
{
foreach (var uri in UrlQueue.GetConsumingEnumerable())
{
// code here to do a web request on that url
}
}
That's a simple and reliable way to do things, but it's not especially quick. You can do much better by using a second queue of WebClient objects that make asynchronous requests. For example, say you want to have 15 asynchronous requests. You start the same way with a BlockingCollection, but you only have one persistent consumer thread.
const int MaxRequests = 15;
BlockingCollection<WebClient> Clients = new BlockingCollection<WebClient>();
// start a single consumer thread
var ProcessingThread = Task.Factory.StartNew(() => ProcessUrls(), TaskCreationOptions.LongRunning);
// Create the WebClient objects and add them to the queue
for (var i = 0; i < MaxRequests; ++i)
{
var client = new WebClient();
// Add an event handler for the DownloadDataCompleted event
client.DownloadDataCompleted += DownloadDataCompletedHandler;
// And add this client to the queue
Clients.Add(client);
}
// add the code from above that reads the file and populates the queue
Your processing function is somewhat different:
void ProcessUrls()
{
foreach (var uri in UrlQueue.GetConsumingEnumerable())
{
// Wait for an available client
var client = Clients.Take();
// and make an asynchronous request
client.DownloadDataAsync(uri, client);
}
// When the queue is empty, you need to wait for all of the
// clients to complete their requests.
// You know they're all done when you dequeue all of them.
for (int i = 0; i < MaxRequests; ++i)
{
var client = Clients.Take();
client.Dispose();
}
}
Your DownloadDataCompleted event handler does something with the data that was downloaded, and then adds the WebClient instance back to the queue of clients.
void DownloadDataCompletedHandler(Object sender, DownloadDataCompletedEventArgs e)
{
// The data downloaded is in e.Result
// be sure to check the e.Error and e.Cancelled values to determine if an error occurred
// do something with the data
// And then add the client back to the queue
WebClient client = (WebClient)e.UserState;
Clients.Add(client);
}
This should keep you going with 15 concurrent requests, which is about all you can do without getting a bit more complicated. Your system can likely handle many more concurrent requests, but the way that WebClient starts asynchronous requests requires some synchronous work up front, and that overhead makes 15 about the maximum number you can handle.
You might be able to have multiple threads initiating the asynchronous requests. In that case, you could potentially have as many threads as you have processor cores. So on a quad core machine, you could have the main thread and three consumer threads. With three consumer threads this technique could give you 45 concurrent requests. I'm not certain that it scales that well, but it might be worth a try.
There are ways to have hundreds of concurrent requests, but they're quite a bit more complicated to implement.
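One such way, sketched here under assumptions, is a different technique from the WebClient queue above: async/await with a SemaphoreSlim that caps the number of requests in flight. The class name `ThrottledDownloader` and the `fetch` and `handle` delegates are hypothetical; in practice `fetch` could wrap `HttpClient.GetStringAsync` on a single shared HttpClient.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledDownloader
{
    // 'fetch' downloads one URL; 'handle' receives the result.
    // 'handle' may be called from several threads at once.
    public static async Task RunAsync(IEnumerable<Uri> uris, int maxConcurrent,
        Func<Uri, Task<string>> fetch, Action<Uri, string> handle)
    {
        using (var gate = new SemaphoreSlim(maxConcurrent))
        {
            var inFlight = new List<Task>();
            foreach (var uri in uris)
            {
                // Waits here while 'maxConcurrent' requests are outstanding.
                await gate.WaitAsync();
                inFlight.Add(FetchOneAsync(uri, gate, fetch, handle));
                // Keep the list small; these tasks never fault (see below).
                inFlight.RemoveAll(t => t.IsCompleted);
            }
            await Task.WhenAll(inFlight);
        }
    }

    private static async Task FetchOneAsync(Uri uri, SemaphoreSlim gate,
        Func<Uri, Task<string>> fetch, Action<Uri, string> handle)
    {
        try
        {
            handle(uri, await fetch(uri));
        }
        catch (Exception)
        {
            // One failed request should not stop the others; log it here.
        }
        finally
        {
            gate.Release();
        }
    }
}
```

No thread is blocked while a request is pending, so `maxConcurrent` can be set well above the number of cores. The right value depends on the remote servers and the network, and has to be found by measuring.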
You need thread management.
My advice is to use Tasks instead of creating your own Threads.
By using the Task Parallel Library, you let the runtime deal with the thread management. By default, it will allocate your tasks on threads from the ThreadPool, and will allow a level of concurrency which is contingent on the number of CPU cores you have. It will also reuse existing Threads when they become available instead of wasting time creating new ones.
If you want to get more advanced, you can create your own task scheduler to manage the scheduling aspect yourself.
See also What is difference between Task and Thread?

Shouldn't this fail without the use of locking? Simple producer consumer

I have a queue, a list with producer threads and a list with consumer threads.
My code looks like this
public class Runner
{
List<Thread> Producers;
List<Thread> Consumers;
Queue<int> queue;
Random random;
public Runner()
{
Producers = new List<Thread>();
Consumers = new List<Thread>();
for (int i = 0; i < 2; i++)
{
Thread thread = new Thread(Produce);
Producers.Add(thread);
}
for (int i = 0; i < 2; i++)
{
Thread thread = new Thread(Consume);
Consumers.Add(thread);
}
queue = new Queue<int>();
random = new Random();
Producers.ForEach(( thread ) => { thread.Start(); });
Consumers.ForEach(( thread ) => { thread.Start(); });
}
protected void Produce()
{
while (true)
{
int number = random.Next(0, 99);
queue.Enqueue(number);
Console.WriteLine(Thread.CurrentThread.ManagedThreadId + " Produce: " + number);
}
}
protected void Consume()
{
while (true)
{
if (queue.Any())
{
int number = queue.Dequeue();
Console.WriteLine(Thread.CurrentThread.ManagedThreadId + " Consume: " + number);
}
else
{
Console.WriteLine("No items to consume");
}
}
}
}
Shouldn't this fail miserably because of the missing lock keyword?
It failed once because it tried to dequeue when the queue was empty. Using the lock keyword will fix that, right?
If the lock keyword is not needed for the above code, when is it needed then?
Thank you in advance! =)
Locking is done to eliminate aberrant behavior in an application, most specifically in multithreaded code. The most common goal is the elimination of a "race condition", which causes non-deterministic program behavior.
This is the behavior you saw. In one run you get an error for the queue having no items, in another run you have no issues. This is a race condition. Proper usage of locking will eliminate this scenario.
Using Queue without locks is indeed not thread safe. But rather than using locks, you may try ConcurrentQueue. Search for "C# ConcurrentQueue" and you will find quite a lot of examples, including comparisons of the use and performance of Queue with a lock versus ConcurrentQueue.
To clarify the existing answers, if you have a multithreading problem (such as a race condition) then it isn't guaranteed to always fail - it may fail, in a very unpredictable manner.
The reason is that two (or more) threads that are accessing a resource may try to access it at different times - precisely when each of them tries to access it will depend on many factors (how fast your CPU is, how many processor cores it has available, what other programs are running at the time, whether you are running a release or debug build, or running under a debugger, etc). You could run it many times without the failure showing up, and then have it suddenly and "inexplicably" fail - this can make these errors extremely hard to track down because they don't often show up while you're writing the faulty code, but more often when you are writing a different unrelated piece of code.
If you are going to use multithreading it is vital that you read up on the subject and gain an understanding of what can go wrong, when, and how to handle it properly. Bad use of locking can be just as dangerous as (if not more so than) not using locks at all, because locking can cause deadlocks where your program simply "locks up". This area of programming must be approached carefully!
Yes this code will fail. The queue needs to support multi-threading. Use a ConcurrentQueue. See http://msdn.microsoft.com/en-us/library/dd267265.aspx
By running your code I received InvalidOperationException - "Collection was modified after the enumerator was instantiated." It means that you modify data while using several threads.
You can use lock every time you Enqueue or Dequeue, because you modify the queue from several threads. A far better option is ConcurrentQueue, as it is a thread-safe, lock-free concurrent collection. It also provides better performance.
Yep, you definitely need to synchronize access to the Queue to make it thread-safe. But you have another problem: there is no mechanism that keeps the consumers from spinning wildly around the loop. Synchronizing access to the Queue or using ConcurrentQueue will not fix that problem.
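A minimal sketch of the lock-based option, assuming a hypothetical wrapper class `LockedQueue<T>`. The point is that the emptiness check and the Dequeue must happen under the same lock; checking first and dequeuing afterwards is exactly the race that produced the empty-queue failure in the question.

```csharp
using System.Collections.Generic;

public sealed class LockedQueue<T>
{
    private readonly Queue<T> _queue = new Queue<T>();
    private readonly object _gate = new object();

    public void Enqueue(T item)
    {
        lock (_gate) _queue.Enqueue(item);
    }

    // Check and dequeue in one critical section, so no other thread can
    // empty the queue between the two steps.
    public bool TryDequeue(out T item)
    {
        lock (_gate)
        {
            if (_queue.Count > 0)
            {
                item = _queue.Dequeue();
                return true;
            }
            item = default(T);
            return false;
        }
    }
}
```

ConcurrentQueue<T>.TryDequeue offers the same check-and-take semantics without writing the lock by hand.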
The simplest way to implement the producer-consumer pattern is to use a blocking queue. Fortunately, .NET 4.0 provides the BlockingCollection which is, despite the name, an implementation of a blocking queue.
public class Runner
{
private BlockingCollection<int> queue = new BlockingCollection<int>();
private Random random = new Random();
public Runner()
{
for (int i = 0; i < 2; i++)
{
var thread = new Thread(Produce);
thread.Start();
}
for (int i = 0; i < 2; i++)
{
var thread = new Thread(Consume);
thread.Start();
}
}
protected void Produce()
{
while (true)
{
int number = random.Next(0, 99);
queue.Add(number);
Console.WriteLine(Thread.CurrentThread.ManagedThreadId + " Produce: " + number);
}
}
protected void Consume()
{
while (true)
{
int number = queue.Take();
Console.WriteLine(Thread.CurrentThread.ManagedThreadId + " Consume: " + number);
}
}
}

Is a Semaphore the right tool for this video sequence capture/save job?

I am working on a WPF project with C# (.NET 4.0) to capture a sequence of 300 video frames from a high-speed camera that need to be saved to disk (BMP format). The video frames need to be captured in near-exact time intervals, so I can't save the frames to disk as they're being captured -- the disk I/O is unpredictable and it throws off the time intervals between frames. The capture card has about 60 frame buffers available.
I'm not sure what the best approach is for implementing a solution to this problem. My initial thoughts are to create a "BufferToDisk" thread that saves the images from the frame buffers as they become available. In this scenario, the main thread captures a frame buffer and then signals the thread to indicate that it is OK to save the frame. The problem is that the frames are being captured quicker than the thread can save the files, so there needs to be some kind of synchronization to deal with this. I was thinking a Semaphore would be a good tool for this job. I have never used a Semaphore in this way, though, so I'm not sure how to proceed.
Is this a reasonable approach to this problem? If so, can someone post some code to get me started?
Any help is much appreciated.
Edit:
After looking over the linked "Threading in C# - Part 2" book excerpt, I decided to implement the solution by adapting the "ProducerConsumerQueue" class example. Here is my adapted code:
class ProducerConsumerQueue : IDisposable
{
EventWaitHandle _wh = new AutoResetEvent(false);
Thread _worker;
readonly object _locker = new object();
Queue<string> _tasks = new Queue<string>();
public ProducerConsumerQueue()
{
_worker = new Thread(Work);
_worker.Start();
}
public void EnqueueTask(string task)
{
lock (_locker) _tasks.Enqueue(task);
_wh.Set();
}
public void Dispose()
{
EnqueueTask(null); // Signal the consumer to exit.
_worker.Join(); // Wait for the consumer's thread to finish.
_wh.Close(); // Release any OS resources.
}
void Work()
{
while (true)
{
string task = null;
lock (_locker)
if (_tasks.Count > 0)
{
task = _tasks.Dequeue();
if (task == null)
{
return;
}
}
if (task != null)
{
// parse the parameters from the input queue item
string[] indexVals = task.Split(',');
int frameNum = Convert.ToInt32(indexVals[0]);
int fileNum = Convert.ToInt32(indexVals[1]);
string path = indexVals[2];
// build the file name
string newFileName = String.Format("img{0:d3}.bmp", fileNum);
string fqfn = System.IO.Path.Combine(path, newFileName);
// save the captured image to disk
int ret = pxd_saveBmp(1, fqfn, frameNum, 0, 0, -1, -1, 0, 0);
}
else
{
_wh.WaitOne(); // No more tasks - wait for a signal
}
}
}
}
Using the class in the main routine:
// capture bitmap images and save them to disk
using (ProducerConsumerQueue q = new ProducerConsumerQueue())
{
for (int i = 0; i < 300; i++)
{
if (curFrmBuf > numFrmBufs)
{
curFrmBuf = 1; // wrap around to the first frame buffer
}
// snap an image to the image buffer
int ret = pxd_doSnap(1, curFrmBuf, 0);
// build the parameters for saving the frame to image file (for the queue)
string fileSaveParams = curFrmBuf + "," + (i + 1) + "," + newPath;
q.EnqueueTask(fileSaveParams);
curFrmBuf++;
}
}
Pretty slick class -- a small amount of code for this functionality.
Thanks so much for the suggestions, guys.
Sure, sounds reasonable. You can use semaphores or other thread synchronization primitives. This sounds like a standard producer/consumer problem. Take a look here for some pseudo-code.
What happens if the disk is so slow (e.g. some other process pegs it) that 60 frame buffers are not enough? Maybe you'll need a BufferToMemory and BufferToDisk thread or some sort of combination. You'll want the main thread (capture to buffer) to have the highest priority, BufferToMemory medium, and BufferToDisk the lowest.
Anyway, back to Semaphores, I recommend you read this: http://www.albahari.com/threading/part2.aspx#_Semaphore. Semaphores should do the trick for you, though I would recommend SemaphoreSlim (.NET 4).
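A rough sketch of how a SemaphoreSlim could guard the frame buffers: the semaphore counts free buffers, the capture loop reserves one before snapping, and the saver thread releases it after the frame is on disk. The class name `FrameBufferGate` and the `saveBuffer` delegate are hypothetical; `saveBuffer` stands in for the call that writes one buffer to a BMP file.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public sealed class FrameBufferGate : IDisposable
{
    private readonly SemaphoreSlim _freeBuffers;
    private readonly BlockingCollection<int> _toSave = new BlockingCollection<int>();
    private readonly Task _saver;

    // 'saveBuffer' is a hypothetical stand-in for writing one buffer to disk.
    public FrameBufferGate(int bufferCount, Action<int> saveBuffer)
    {
        _freeBuffers = new SemaphoreSlim(bufferCount, bufferCount);
        _saver = Task.Factory.StartNew(() =>
        {
            foreach (int buffer in _toSave.GetConsumingEnumerable())
            {
                saveBuffer(buffer);
                _freeBuffers.Release(); // the buffer may be reused now
            }
        }, TaskCreationOptions.LongRunning);
    }

    // Returns false without waiting when every buffer still awaits saving,
    // i.e. capturing now would overwrite a frame that is not on disk yet.
    public bool TryReserveBuffer() => _freeBuffers.Wait(0);

    // Call after a frame has been captured into 'buffer'.
    public void MarkCaptured(int buffer) => _toSave.Add(buffer);

    public void Dispose()
    {
        _toSave.CompleteAdding();
        _saver.Wait();
        _toSave.Dispose();
        _freeBuffers.Dispose();
    }
}
```

Because the capture loop must keep its timing, this sketch never blocks the producer: a false return from `TryReserveBuffer` tells the caller that the disk has fallen behind by a full set of buffers, and the caller decides whether to drop the frame or abort.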
Since you're treating this as a producer/consumer problem (judging by your reply to #siz's answer), you might want to look at BlockingCollection<T>, which is designed for precisely this sort of scenario.
It allows any number of producer threads to push data into the collection, and any number of consumer threads to pull it out again. In this case, you probably want just one producer and one consumer thread.
The BlockingCollection<T> does all the work of making sure the consumer thread only wakes up and processes work once the producing thread has said that there's more work to do. And it also takes care of allowing a queue of work to build up.
