Managing multi-threading in C# .NET, controlling thread count per operation - c#

I think the title I've given is a bit confusing, but it's hard to express what I'm trying to ask.
Basically, I am writing a program in C# using .NET that uses the Google Cloud API to upload data.
I am trying to do this in an efficient way and have used Parallel.ForEach with success, but I need finer control. I collect the files to be uploaded into one list, which I want to sort by file size and then split into, say, 3 equally sized (in terms of gigabytes, not file count) lists.
One of these lists will contain a third of the total upload size but be comprised of the largest files (in gigabytes) and therefore the smallest count of files; the next list will total the same gigabytes as the first but be comprised of a greater number of smaller files; and the last list will be comprised of many, many small files that should also total the same size as the other sub-lists.
I then want to assign a set number of threads to each upload process (e.g. I want the largest-file list to have 5 threads assigned, the middle to have 3, and the small-file list to have only 2 threads). Is it possible to set up these 3 lists to be iterated over in parallel, while controlling how many threads are allocated?
What is the best method to do so?

Parallel.ForEach and PLINQ are meant for data parallelism - processing big chunks of data using multiple cores. They're meant for scenarios where you have e.g. 1GB of data in memory (or a very fast IEnumerable source) and want to process it using all cores. In such scenarios, you need to partition the data into independent chunks and have one worker crunch one chunk at a time, to limit the synchronization overhead.
What you describe though is concurrent uploads for a large number of files. That's pure I/O, not data parallelism. Most of the time will be spent loading the data from disk or writing it to the network. This is a job for Task.Run and async/await. To upload multiple files concurrently, you could use an ActionBlock or a Channel to queue the files and upload them asynchronously. With channels you have to write a bit of worker boilerplate, but you get greater control, especially in cases where you want to use e.g. the same client instance for multiple calls. An ActionBlock is essentially stateless.
Finally, you describe queues with different DOP (degree of parallelism) based on size, which is a very nice idea when you have both big and small files. You can do that by using multiple ActionBlock instances, each with a different DOP, or multiple Channel workers, each with a different DOP.
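For illustration, here is a minimal sketch of the Channel variant (it assumes the System.Threading.Channels namespace, an enclosing async method, and the same files enumeration and UploadFile method used in the Dataflow example below). The DOP of a queue is simply the number of worker loops draining it:
var channel = Channel.CreateUnbounded<FileInfo>();
// e.g. 5 worker loops = DOP of 5 for this queue
var workers = Enumerable.Range(0, 5)
    .Select(async _ =>
    {
        await foreach (var file in channel.Reader.ReadAllAsync())
        {
            await UploadFile(service, file); // see the method below
        }
    })
    .ToArray();
// producer side: feed the files, then signal completion
foreach (var file in files)
{
    await channel.Writer.WriteAsync(file);
}
channel.Writer.Complete();
await Task.WhenAll(workers);
You would create one such channel-plus-workers group per size bucket, just like the three ActionBlock instances below.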
Dataflows
Let's say you already have a method that uploads a single file:
// Adapted from the Google SDK example
async Task UploadFile(DriveService service, FileInfo file)
{
    using var uploadStream = file.OpenRead();
    var insertRequest = service.Files.Insert(
        new File { Title = file.Name },
        uploadStream,
        "image/jpeg");
    await insertRequest.UploadAsync();
}
You can create three different ActionBlock instances, each with a different DOP:
var small = new ActionBlock<FileInfo>(
    file => UploadFile(service, file),
    new ExecutionDataflowBlockOptions
    {
        MaxDegreeOfParallelism = 15
    });
var medium = new ActionBlock<FileInfo>(
    file => UploadFile(service, file),
    new ExecutionDataflowBlockOptions
    {
        MaxDegreeOfParallelism = 10
    });
var big = new ActionBlock<FileInfo>(
    file => UploadFile(service, file),
    new ExecutionDataflowBlockOptions
    {
        MaxDegreeOfParallelism = 2
    });
And post different files to different blocks based on size:
var directory = new DirectoryInfo(...);
var files = directory.EnumerateFiles(...);
foreach (var file in files)
{
    switch (file.Length)
    {
        // FileInfo.Length is a long, so the patterns must be long too
        case long x when x < 1024:
            small.Post(file);
            break;
        case long x when x < 10240:
            medium.Post(file);
            break;
        default:
            big.Post(file);
            break;
    }
}
Or, in C# 8:
foreach (var file in files)
{
    var block = file.Length switch
    {
        long x when x < 1024 => small,
        long x when x < 10240 => medium,
        _ => big
    };
    block.Post(file);
}
When iteration completes, we need to tell the blocks we are done by calling Complete() on each one, then wait for all of them to finish:
small.Complete();
medium.Complete();
big.Complete();
await Task.WhenAll(small.Completion, medium.Completion, big.Completion);

Here is another idea. You could have a single list, but upload the files with a dynamic degree of parallelism. This would be easy to implement if the SemaphoreSlim class had a WaitAsync method that could reduce the CurrentCount by a value other than 1. You could then initialize the SemaphoreSlim with a large initialCount like 1000, and then call WaitAsync with a value relative to the size of each file. Let's call this value weight. The semaphore would guarantee that the summed weight of all files currently being uploaded would not exceed 1000. This could be a single huge file with a weight of 1000, or 10 medium files each weighing 100, or a mix of small, medium and large files with a total weight around 1000. The degree of parallelism would constantly change depending on the weight of the next file in the list.
This is an example of the code that you'd have to write:
var semaphore = new SemaphoreSlim(1000);
var tasks = Directory.GetFiles(@"D:\FilesToUpload")
    .Select(async filePath =>
    {
        var fi = new FileInfo(filePath);
        var weight = (int)Math.Min(1000, fi.Length / 1_000_000);
        await semaphore.WaitAsync(weight); // Imaginary overload that accepts a weight
        try
        {
            await cloudService.UploadFile(filePath);
        }
        finally
        {
            semaphore.Release(weight);
        }
    })
    .ToArray();
await Task.WhenAll(tasks);
Below is a custom AsyncSemaphorePlus class that provides the missing overload. It is based on Stephen Toub's AsyncSemaphore class from the blog post Building Async Coordination Primitives, Part 5: AsyncSemaphore. It is slightly modernized with features like Task.CompletedTask and TaskCreationOptions.RunContinuationsAsynchronously, that were not available at the time the blog post was written.
public class AsyncSemaphorePlus
{
    private readonly object _locker = new object();
    private readonly Queue<(TaskCompletionSource<bool>, int)> _queue
        = new Queue<(TaskCompletionSource<bool>, int)>();
    private int _currentCount;

    public int CurrentCount { get { lock (_locker) return _currentCount; } }

    public AsyncSemaphorePlus(int initialCount)
    {
        if (initialCount < 0)
            throw new ArgumentOutOfRangeException(nameof(initialCount));
        _currentCount = initialCount;
    }

    public Task WaitAsync(int count)
    {
        lock (_locker)
        {
            if (_currentCount - count >= 0)
            {
                _currentCount -= count;
                return Task.CompletedTask;
            }
            else
            {
                var tcs = new TaskCompletionSource<bool>(
                    TaskCreationOptions.RunContinuationsAsynchronously);
                _queue.Enqueue((tcs, count));
                return tcs.Task;
            }
        }
    }

    public void Release(int count)
    {
        lock (_locker)
        {
            _currentCount += count;
            while (_queue.Count > 0)
            {
                var (tcs, weight) = _queue.Peek();
                if (weight > _currentCount) break;
                _queue.Dequeue(); // remove the item we just peeked
                _currentCount -= weight;
                tcs.SetResult(true);
            }
        }
    }
}
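With this class in place, the earlier sketch needs only its first line changed, and the "imaginary overload" comment can go away:
var semaphore = new AsyncSemaphorePlus(1000);
// ...rest as before...
await semaphore.WaitAsync(weight); // now a real overload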
Update: This approach is intended for uploading a medium/large number of files. It is not suitable for an extremely huge number of files, because all uploading tasks are created upfront. If the files that have to be uploaded number, say, 100,000,000, then the memory required to store the state of all these tasks may exceed the available RAM of the machine. For uploading that many files the solution proposed by Panagiotis Kanavos is probably preferable, because it can easily be modified with bounded dataflow blocks, feeding them with SendAsync instead of Post, so that the memory required for the whole operation is kept under control.
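For example, a hedged sketch of that bounded variant (reusing the cloudService call from the earlier example; the exact capacity and DOP values are illustrative):
var uploader = new ActionBlock<string>(
    filePath => cloudService.UploadFile(filePath),
    new ExecutionDataflowBlockOptions
    {
        MaxDegreeOfParallelism = 10,
        BoundedCapacity = 100 // at most 100 file paths queued at any time
    });
foreach (var filePath in Directory.EnumerateFiles(@"D:\FilesToUpload"))
{
    // unlike Post, SendAsync waits asynchronously while the block is full,
    // so memory usage stays bounded no matter how many files there are
    await uploader.SendAsync(filePath);
}
uploader.Complete();
await uploader.Completion;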

Related

In C#, how do I process a large text file with multiple threads/tasks, but with conditions?

I am writing a file-processing program in C#. I have a HUGE text file, with 5 columns of data each separated by a bar (|). The first column in each row contains a person's name, and each person has a unique name.
It's a very large text file, so I want to process it concurrently using multiple tasks. But I want every row with the same name to be processed by the SAME task, not a different task. For example, if (part of) my file reads:
Jason|BMW|354|23|1/1/2000|1:03
Jason|BMW|354|23|1/1/2000|1:03
Jason|BMW|354|23|1/1/2000|1:03
Jason|Acura|354|23|1/1/2000|1:03
Jason|BMW|354|23|1/1/2000|1:03
Jason|BMW|354|23|1/1/2000|1:03
Jason|Hyundai|392|17|1/1/2000|1:06
Mike|Infiniti|335|18|8/24/2005|7:11
Mike|Infiniti|335|18|8/24/2005|7:11
Mike|Infiniti|335|18|8/24/2005|7:11
Mike|Dodge|335|18|8/24/2005|7:18
Mike|Infiniti|335|18|8/24/2005|7:11
Mike|Infiniti|335|18|8/24/2005|7:14
Then I want one task processing ALL the Jason rows, and another task processing ALL the Mike rows. I don't want the first task processing any Mike rows, and conversely I don't want the second task processing any Jason rows. Essentially, how can I make it so that all rows of a certain name are all processed by the SAME task? ALSO, how will I know when all the processing of all the rows has been completed? I've been racking my tiny brain and I can't come up with a solution.
One idea is to implement the producer-consumer pattern, with one producer that reads the file line by line, and multiple consumers that process the lines, one consumer per name. Since the number of unique names may be large, it would be impractical to dedicate a Thread to each consumer, so the consumers should process the data asynchronously. Each consumer should have its own private queue with data to process. The most efficient asynchronous queue currently available in .NET is the Channel<T> class, and using it as a building block would be a good idea, but I will suggest something higher-level than this: an ActionBlock<T> from the TPL Dataflow library. This component combines a processor and a queue, is async-enabled, and is highly configurable. So it will make for a succinct, quite readable, and hopefully quite efficient solution:
var processors = new Dictionary<string, ActionBlock<string>>();
foreach (var line in File.ReadLines(filePath))
{
    string name = ExtractName(line); // Reads the first part of the line
    if (!processors.TryGetValue(name, out ActionBlock<string> processor))
    {
        processor = CreateProcessor(name);
        processors.Add(name, processor);
    }
    var accepted = processor.Post(line);
    if (!accepted) break; // The processor has failed
}

// Signal that no more lines will be sent to the processors
foreach (var processor in processors.Values) processor.Complete();

// Aggregate the completion of all processors
Task allCompletions = Task.WhenAll(processors.Values.Select(p => p.Completion));

// Wait for the completion of all processors, and allow errors to propagate
allCompletions.Wait(); // or await allCompletions;

static ActionBlock<string> CreateProcessor(string name)
{
    return new ActionBlock<string>((string line) =>
    {
        // Process the line
    }, new ExecutionDataflowBlockOptions()
    {
        // Configure the options if the defaults are not optimal
    });
}
I'd go for a concurrent dictionary of concurrent queues, keyed by name.
In the main thread (call it the reader), loop line by line, enqueueing the lines to the appropriate concurrent queue (call these the worker queues), creating a new worker queue and dedicated task as needed when a new name is encountered.
It would look something like this (note: this is semi-pseudo code and semi-real code and has no error checking, so treat it as a base for a solution, not the solution).
class FileProcessor
{
    private ConcurrentDictionary<string, Worker> workers = new ConcurrentDictionary<string, Worker>();

    class Worker
    {
        public Worker() => Task = Task.Run(Process);

        private void Process()
        {
            foreach (var row in Queue.GetConsumingEnumerable())
            {
                if (row.Length == 0) break;
                ProcessRow(row);
            }
        }

        private void ProcessRow(string[] row)
        {
            // your implementation here
        }

        public Task Task { get; }
        public BlockingCollection<string[]> Queue { get; } = new BlockingCollection<string[]>(new ConcurrentQueue<string[]>());
    }

    void ProcessFile(string fileName)
    {
        foreach (var line in GetLinesOfFile(fileName))
        {
            var row = line.Split('|');
            var name = row[0];
            // create worker as needed
            var worker = workers.GetOrAdd(name, x => new Worker());
            // add a row for the worker to work on
            worker.Queue.Add(row);
        }
        // send an empty array to each worker to signal end of input
        foreach (var worker in workers.Values)
            worker.Queue.Add(new string[0]);
        // now wait for all workers to be done
        Task.WaitAll(workers.Values.Select(x => x.Task).ToArray());
    }

    private static IEnumerable<string> GetLinesOfFile(string fileName)
    {
        // this helps limit memory consumption by not loading
        // the whole file at once
        return File.ReadLines(fileName);
    }
}
I suggest that your reader thread stream the file rather than reading the entire file (you stated the file was huge, so streaming would be memory friendly). That reader thread is I/O bound, so if you can async/await it, that would be better than my simple Process() doing a foreach with no awaiting.
The features of this approach:
dedicated task per person's name
use of a sentinel value to signal end of input
use of Task.WaitAll to join back to the main thread
assumes the tasks are CPU bound. If they are I/O bound, consider using async/await and Task.WhenAll instead
file is streamed into memory with File.ReadLines()
names do not need to be sorted because the queue to enqueue to is selected by name on-demand
Refinements
In the interest of completeness, the approach above is a bit naive and can be refined by... reading all of the comments and answers; users Zoulias and Mercer in particular have good points. We can refine this approach with
adapt this to TPL Channels, where Writer.Complete() plays the role of CompleteAdding. These are not only better abstractions, but more efficient (abstraction and efficiency can often be at odds, but not in this case). See the sketch after the partitioning example below.
reduce the name-to-thread or name-to-task dedication, which can exhaust resources in the case of a large number of names, and instead map names to buckets or partitions where each bucket/partition has a dedicated task/thread.
For the second point, for example, you could have
// create worker as needed
var worker = workers.GetOrAdd(GetPartitionKey(name), x => new Worker());
where GetPartitionKey() could be implemented something like
private string GetPartitionKey(string name) =>
    name[0] switch
    {
        >= 'a' and <= 'f' => "A thru F bucket",
        >= 'A' and <= 'F' => "A thru F bucket",
        >= 'g' and <= 'k' => "G thru K bucket",
        >= 'G' and <= 'K' => "G thru K bucket",
        _ => "everything else bucket"
    };
or whatever algorithm you want to use as a partition selector.
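And for the first point, a minimal sketch of the Channel adaptation of the Worker class (this is an illustration requiring System.Threading.Channels, not the original answer's code; ProcessRow stands in for your per-row logic as before):
class ChannelWorker
{
    private readonly Channel<string[]> _channel = Channel.CreateUnbounded<string[]>();
    public Task Task { get; }
    public ChannelWorker() => Task = Task.Run(ProcessAsync);
    public ValueTask AddAsync(string[] row) => _channel.Writer.WriteAsync(row);
    public void Complete() => _channel.Writer.Complete(); // replaces the empty-array sentinel
    private async Task ProcessAsync()
    {
        // ReadAllAsync completes when the writer is completed and the queue is drained
        await foreach (var row in _channel.Reader.ReadAllAsync())
        {
            ProcessRow(row);
        }
    }
    private void ProcessRow(string[] row) { /* your implementation here */ }
}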
how can I make it so that all rows of a certain name are all processed by the SAME task?
A System.Threading.Task can be created using various TaskCreationOptions that dictate how and when its threads and resources are managed during its lifetime. For an operation that consumes a large amount of data and, furthermore, segregates the consumption of data to specific threads, you may want to consider creating the tasks that are responsible for individual names with the option TaskCreationOptions.LongRunning. This provides a hint to the task scheduler that an additional thread might be required for the task, so that it does not block the forward progress of other threads or work items on the local thread-pool queue.
For the actual how, I would recommend starting various 'Worker' threads, each with their own Task, and a way for your main task (the one reading the file, or parsing the JSON data) to communicate to them that more work needs to be completed.
Consider the use of thread-safe collections such as ConcurrentQueue<T> or other collections that may help you in safely streaming data between threads for consumption.
Here's a very limited example of the structure you may want to consider:
void Worker(ConcurrentQueue<string> Queue, CancellationToken Token)
{
    // keep the worker in a loop until cancellation is requested
    while (Token.IsCancellationRequested is false)
    {
        // check to see if the queue has stuff, and consume it
        if (Queue.TryDequeue(out string line))
        {
            Console.WriteLine($"Consumed Line {line} {Thread.CurrentThread.ManagedThreadId}");
        }
        // yield the thread in case other threads have work to do
        Thread.Sleep(10);
    }
    Console.WriteLine("Finished Work");
}
// data could be a reader, list, array, anything really
IEnumerable<string> Data()
{
    yield return "JASON";
    yield return "Mike";
    yield return "JASON";
    yield return "Mike";
}
void Reader()
{
    // create some collections to stream the data to other tasks
    ConcurrentQueue<string> Jason = new();
    ConcurrentQueue<string> Mike = new();
    // make sure we have a way to cancel the workers if we need to
    CancellationTokenSource tokenSource = new();
    // start some worker tasks that will consume the data
    Task[] workers = {
        new Task(() => Worker(Jason, tokenSource.Token), TaskCreationOptions.LongRunning),
        new Task(() => Worker(Mike, tokenSource.Token), TaskCreationOptions.LongRunning)
    };
    for (int i = 0; i < workers.Length; i++)
    {
        workers[i].Start();
    }
    // iterate the data and send it off to the queues for consumption
    foreach (string line in Data())
    {
        switch (line)
        {
            case "JASON":
                Console.WriteLine($"Sent line to JASON {Thread.CurrentThread.ManagedThreadId}");
                Jason.Enqueue(line);
                break;
            case "Mike":
                Console.WriteLine($"Sent line to Mike {Thread.CurrentThread.ManagedThreadId}");
                Mike.Enqueue(line);
                break;
            default:
                Console.WriteLine($"Disposed unknown line {Thread.CurrentThread.ManagedThreadId}");
                break;
        }
    }
    // make sure that worker threads are cancelled if the parent task has been cancelled
    try
    {
        // wait for the workers to finish by checking that both queues have drained
        do
        {
            Thread.Sleep(10);
        } while (Jason.IsEmpty is false || Mike.IsEmpty is false);
    }
    finally
    {
        // cancel the worker threads, if they haven't already stopped
        tokenSource.Cancel();
    }
}
// make sure we have a way to cancel the reader if we need to
CancellationTokenSource tokenSource = new();
// start the reader thread
Task[] tasks = { Task.Run(Reader, tokenSource.Token) };
Console.WriteLine("Starting Reader");
Task.WaitAll(tasks);
Console.WriteLine("Finished Reader");
// clean up the tasks if they are somehow still running
tokenSource?.Cancel();
// dispose of the IDisposable object
tokenSource?.Dispose();
Console.ReadLine();

Reading a lot of files "at the same time"

I'm using FileSystemWatcher in order to catch every created, changed, deleted and renamed change for any file in a folder.
For these changes I need to perform a simple checksum of the contents of these files. Simply, I'm opening a FileStream and passing it to the MD5 class:
private byte[] calculateChecksum(string frl)
{
    using (FileStream stream = File.Open(frl, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
    {
        return this.md5.ComputeHash(stream);
    }
}
The problem is the number of files I need to handle. For example, imagine I have 200 files created over time in a folder, and then I copy all of them and paste them into the same folder. This action is going to cause 200 events and 200 calculateChecksum() executions.
How could I solve this kind of problem?
In the FileSystemWatcher handler, put tasks into a queue that will be processed by some worker. The worker can process checksum-calculation tasks at a targeted speed and/or frequency. One worker will probably be better, because many readers can slow down an HDD with many read seeks.
Try reading about BlockingCollection:
https://msdn.microsoft.com/ru-ru/library/dd997371(v=vs.110).aspx
and the Producer-Consumer Dataflow Pattern:
https://msdn.microsoft.com/ru-ru/library/hh228601(v=vs.110).aspx
var workerCount = 2;
BlockingCollection<string>[] filesQueues = new BlockingCollection<string>[workerCount];
for (int i = 0; i < workerCount; i++)
{
    filesQueues[i] = new BlockingCollection<string>(500);
    int index = i; // copy the loop variable; the lambda below would otherwise share it
    // Worker
    Task.Run(() =>
    {
        while (!filesQueues[index].IsCompleted)
        {
            string url = null;
            try
            {
                url = filesQueues[index].Take();
            }
            catch (InvalidOperationException) { }
            if (!string.IsNullOrWhiteSpace(url))
            {
                calculateChecksum(url);
            }
        }
    });
}

// inside of the FileSystemWatcher handler
var queueIndex = hash(fileName) % workerCount;
// Warning!!
// Add blocks if the queue is full (Count == BoundedCapacity)
filesQueues[queueIndex].Add(fileName);

// call this only when no more files will arrive, e.g. on shutdown,
// so that the workers' IsCompleted checks can end their loops
foreach (var queue in filesQueues)
    queue.CompleteAdding();
Also you can make multiple consumers: just call Take or TryTake concurrently - each item will only be consumed by a single consumer. But take into account that in that case one file can be processed by many workers, and multiple HDD readers can slow down the HDD.
UPD: In the case of multiple workers, it is better to make multiple BlockingCollections and push each file into a queue chosen by index, as shown in the code above.
I've sketched a consumer-producer pattern to solve that, and I've tried to use a thread pool in order to smooth out the big amount of work, sharing a BlockingCollection.
BlockingCollection & ThreadPool:
private BlockingCollection<Index.ResourceIndexDocument> documents;
this.pool = new SmartThreadPool(SmartThreadPool.DefaultIdleTimeout, 4);
this.documents = new BlockingCollection<Index.ResourceIndexDocument>();
As you can see, I've created a thread pool with concurrency set to 4. So, only 4 threads are going to work at the same time, regardless of whether there are x > 4 units of work to handle in the pool.
Producer:
public void warn(string channel, string frl)
{
    this.pool.QueueWorkItem<string, string>(
        (file) => this.files.Add(file),
        channel,
        frl
    );
}
Consumer:
Task.Factory.StartNew(() =>
{
    Index.ResourceIndexDocument document = null;
    while (this.documents.TryTake(out document, TimeSpan.FromSeconds(1)))
    {
        IEnumerable<Index.ResourceIndexDocument> documents = this.documents.Take(this.documents.Count);
        Index.IndexEngine.Instance.index(documents);
    }
},
TaskCreationOptions.LongRunning);

Wanting to limit Threadpool so it doesn't max out CPU

I am programming with Threads for the first time. My program only shows a small amount of data at a time; as the user moves through the data, I want it to load all the data that could be accessed next, so there is as little lag as possible when the user switches to a new section.
Worst case, I might need to preload 6 sections of data. So I use something like:
if (SectionOne == null)
{
    ThreadPool.QueueUserWorkItem(new System.Threading.WaitCallback(PreloadSection),
        Tuple.Create(thisSection, SectionOne));
}
if (SectionTwo == null)
{
    ThreadPool.QueueUserWorkItem(new System.Threading.WaitCallback(PreloadSection),
        Tuple.Create(thisSection, SectionTwo));
}
//....
to preload each area. It works great on my main system, which has 8 cores; but on my test system, which only has 4 cores, the entire system slows to a crawl while it is running the threads.
I am thinking that I want to run a maximum of TotalCores - 2 threads at the same time. But really I have no idea.
I'm looking for any help in getting this to run as efficiently as possible on multiple system setups (single core through 8 cores or whatever). Also, I am using C# and this is a Portable Class Library project, so some of my options are limited.
I would be using this built-in .NET parallelism magic:
Task Parallelism
With Task operations the threading is managed for you, but you still have control over how many cores and threads you want to use.
Example:
const int MAX = 10000;
var options = new ParallelOptions
{
    MaxDegreeOfParallelism = 2
};
IList<int> threadIds = new List<int>();
Parallel.For(0, MAX, options, i =>
{
    var id = Thread.CurrentThread.ManagedThreadId;
    Console.WriteLine("Number '{0}' on thread {1}", i, id);
    // note: List<int> is not thread-safe; use a lock or a concurrent
    // collection in real code - this is fine only for a demo
    threadIds.Add(id);
});
You can even do it with Extensions if you want:
const int MAX_TASKS = 8;
var numbers = Enumerable.Range(0, 10000000);
IList<int> threadIds = new List<int>(MAX_TASKS);
numbers.AsParallel()
    .WithDegreeOfParallelism(MAX_TASKS)
    .ForAll(i =>
    {
        var id = Thread.CurrentThread.ManagedThreadId;
        if (!threadIds.Contains(id)) // same caveat: not thread-safe, demo only
        {
            threadIds.Add(id);
        }
    });
Assert.IsTrue(threadIds.Count > 2);
Assert.IsTrue(threadIds.Count <= MAX_TASKS);
Console.WriteLine(threadIds.Count);
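To realize the asker's "TotalCores - 2" idea with either approach, the degree of parallelism can be derived from Environment.ProcessorCount; a sketch (sections and PreloadSection are hypothetical names standing in for the question's data):
// leave two cores free for the UI and the rest of the system
int maxDop = Math.Max(1, Environment.ProcessorCount - 2);
var options = new ParallelOptions { MaxDegreeOfParallelism = maxDop };
Parallel.For(0, sections.Length, options, i =>
{
    PreloadSection(sections[i]); // hypothetical per-section preload
});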

Anyway to Parallel Yield c#

I have multiple enumerators that enumerate over flat files. I originally had each enumerator in a Parallel.Invoke, each Action was adding to a BlockingCollection<Entity>, and that collection was returning a ConsumingEnumerable():
public interface IFlatFileQuery
{
    IEnumerable<Entity> Run();
}

public class FlatFile1 : IFlatFileQuery
{
    public IEnumerable<Entity> Run()
    {
        // loop over a flat file and yield each result
        yield return new Entity();
    }
}

public class Main
{
    public IEnumerable<Entity> DoLongTask(ICollection<IFlatFileQuery> _flatFileQueries)
    {
        // do some other stuff that needs to be returned first:
        yield return new Entity();
        // then enumerate and return the flat file data
        foreach (var entity in GetData(_flatFileQueries))
        {
            yield return entity;
        }
    }

    private IEnumerable<Entity> GetData(ICollection<IFlatFileQuery> _flatFileQueries)
    {
        var buffer = new BlockingCollection<Entity>(100);
        var actions = _flatFileQueries.Select(fundFileQuery => (Action)(() =>
        {
            foreach (var entity in fundFileQuery.Run())
            {
                buffer.TryAdd(entity, Timeout.Infinite);
            }
        })).ToArray();
        Task.Factory.StartNew(() =>
        {
            Parallel.Invoke(actions);
            buffer.CompleteAdding();
        });
        return buffer.GetConsumingEnumerable();
    }
}
However after a bit of testing it turns out that the code change below is about 20-25% faster.
private IEnumerable<Entity> GetData(ICollection<IFlatFileQuery> _flatFileQueries)
{
    return _flatFileQueries.AsParallel().SelectMany(ffq => ffq.Run());
}
The trouble with the code change is that it waits till all flat file queries are enumerated before it returns the whole lot that can then be enumerated and yielded.
Would it be possible to yield in the above bit of code somehow to make it even faster?
I should add that at most the combined results of all the flat file queries might only be 1000 or so Entities.
Edit:
Changing it to the below doesn't make a difference to the run time. (R# even suggests to go back to the way it was)
private IEnumerable<Entity> GetData(ICollection<IFlatFileQuery> _flatFileQueries)
{
    foreach (var entity in _flatFileQueries.AsParallel().SelectMany(ffq => ffq.Run()))
    {
        yield return entity;
    }
}
The trouble with the code change is that it waits till all flat file queries are enumerated before it returns the whole lot that can then be enumerated and yielded.
Let's prove that it's false with a simple example. First, let's create a TestQuery class that will yield a single entity after a given time. Second, let's execute several test queries in parallel and measure how long it takes them to yield their results.
public class TestQuery : IFlatFileQuery
{
    private readonly int _sleepTime;

    public TestQuery(int sleepTime)
    {
        _sleepTime = sleepTime;
    }

    public IEnumerable<Entity> Run()
    {
        Thread.Sleep(_sleepTime);
        return new[] { new Entity() };
    }
}

internal static class Program
{
    private static void Main()
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        var queries = new IFlatFileQuery[] {
            new TestQuery(2000),
            new TestQuery(3000),
            new TestQuery(1000)
        };
        foreach (var entity in queries.AsParallel().SelectMany(ffq => ffq.Run()))
            Console.WriteLine("Yielded after {0:N0} seconds", stopwatch.Elapsed.TotalSeconds);
        Console.ReadKey();
    }
}
This code prints:
Yielded after 1 seconds
Yielded after 2 seconds
Yielded after 3 seconds
You can see from this output that AsParallel() will yield each result as soon as it is available, so everything works fine. Note that you might get different timings depending on the degree of parallelism (such as "2s, 5s, 6s" with a degree of parallelism of 1, effectively making the whole operation not parallel at all). This output comes from a 4-core machine.
Your long processing will probably scale with the number of cores, if there is no common bottleneck between the threads (such as a shared locked resource). You might want to profile your algorithm to see if there are slow parts that can be improved using tools such as dotTrace.
I don't think there is a red flag in your code anywhere. There are no outrageous inefficiencies. I think it comes down to multiple smaller differences.
PLINQ is very good at processing streams of data. Internally, it works more efficiently than adding items to a synchronized list one by one. I suspect that your calls to TryAdd are a bottleneck, because each call requires at least two Interlocked operations internally. Those can put an enormous load on the inter-processor memory bus, because all threads will compete for the same cache line.
PLINQ is cheaper because, internally, it does some buffering. I'm sure it doesn't output items one by one. Probably it batches them and amortizes the synchronization cost that way over multiple items.
A second issue would be the bounded capacity of the BlockingCollection. 100 is not a lot. This might lead to a lot of waiting. Waiting is costly, because it requires a call to the kernel and a context switch.
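For instance, raising the bound in the original GetData sketch would make the producers block less often (the exact value is illustrative):
// a larger bound means TryAdd rarely has to wait for the consumer
var buffer = new BlockingCollection<Entity>(10000);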
Here is an alternative that works well for me in any scenario. Inside a Task, a Parallel.ForEach transforms each item and enqueues it into a ConcurrentQueue for processing. The task gets a continuation that sets a flag when the task ends. Meanwhile, on the calling thread, a while loop dequeues and yields until the flag is set and the queue is drained.
Fast, with excellent results for me:
// assumed fields from the original answer, declared elsewhere:
// ConcurrentQueue<FileLineData> qlines; volatile bool isCompleted; FileLineData returnLine;
Task.Factory.StartNew(() =>
{
    Parallel.ForEach<string>(TextHelper.ReadLines(FileName), ProcessHelper.DefaultParallelOptions,
        (string currentLine) =>
        {
            // Read the line, validate it, and enqueue it to qlines as an
            // instance of FileLineData (custom class)
        });
}).
ContinueWith
(
    ic => isCompleted = true
);

while (!isCompleted || qlines.Count > 0)
{
    if (qlines.TryDequeue(out returnLine))
    {
        yield return returnLine;
    }
}
By default the ParallelQuery class, when working on IEnumerable<T> sources, employs a partitioning strategy known as "chunk partitioning". With this strategy, each worker thread grabs a progressively larger number of items each time. This means that it has an input buffer. The results are then accumulated into an output buffer, with a size chosen by the system, before they are made available to the consumer of the query. You can disable both buffers by using the configuration options EnumerablePartitionerOptions.NoBuffering and ParallelMergeOptions.NotBuffered.
private IEnumerable<Entity> GetData(ICollection<IFlatFileQuery> flatFileQueries)
{
    return Partitioner
        .Create(flatFileQueries, EnumerablePartitionerOptions.NoBuffering)
        .AsParallel()
        .AsOrdered()
        .WithMergeOptions(ParallelMergeOptions.NotBuffered)
        .SelectMany(ffq => ffq.Run());
}
This way each worker thread will grab only one item at a time, and will propagate the result as soon as it is computed.
NoBuffering: Create a partitioner that takes items from the source enumerable one at a time and does not use intermediate storage that can be accessed more efficiently by multiple threads. This option provides support for low latency (items will be processed as soon as they are available from the source) and provides partial support for dependencies between items (a thread cannot deadlock waiting for an item that the thread itself is responsible for processing).
NotBuffered: Use a merge without output buffers. As soon as result elements have been computed, make that element available to the consumer of the query.

task background worker c#

Is there any chance that multiple BackgroundWorkers perform better than Tasks on 5-second running processes? I remember reading in a book that a Task is designed for short-running processes.
The reason I ask is this:
I have a process that takes 5 seconds to complete, and there are 4000 processes to complete. At first I did:
for (int i = 0; i < 4000; i++) {
    Task.Factory.StartNew(action);
}
and this had poor performance (after the first minute, 3-4 tasks were completed, and the console application had 35 threads). Maybe this was stupid, but I thought that the thread pool would handle this kind of situation (it would put all actions in a queue, and when a thread is free, it would take an action and execute it).
The second step now was to do manually Environment.ProcessorCount background workers, and all the actions to be placed in a ConcurentQueue. So the code would look something like this:
var workers = new List<BackgroundWorker>();
//initialize workers
workers.ForEach((bk) =>
{
bk.DoWork += (s, e) =>
{
while (toDoActions.Count > 0)
{
Action a;
if (toDoActions.TryDequeue(out a))
{
a();
}
}
}
bk.RunWorkerAsync();
});
This performed way better. It performed much better than the tasks, even when I had 30 background workers (as many as the tasks in the first case).
Later edit:
I start the Tasks like this:
public static Task IndexFile(string file)
{
    Action<object> indexAction = new Action<object>((f) =>
    {
        Index((string)f);
    });
    return Task.Factory.StartNew(indexAction, file);
}
And the Index method is this one:
private static void Index(string file)
{
    AudioDetectionServiceReference.AudioDetectionServiceClient client = new AudioDetectionServiceReference.AudioDetectionServiceClient();
    client.IndexCompleted += (s, e) =>
    {
        if (e.Error != null)
        {
            if (FileError != null)
            {
                FileError(client,
                    new FileIndexErrorEventArgs((string)e.UserState, e.Error));
            }
        }
        else
        {
            if (FileIndexed != null)
            {
                FileIndexed(client, new FileIndexedEventArgs((string)e.UserState));
            }
        }
    };
    using (IAudio proxy = new BassProxy())
    {
        List<int> max = new List<int>();
        if (proxy.ReadFFTData(file, out max))
        {
            while (max.Count > 0 && max.First() == 0)
            {
                max.RemoveAt(0);
            }
            while (max.Count > 0 && max.Last() == 0)
            {
                max.RemoveAt(max.Count - 1);
            }
            client.IndexAsync(max.ToArray(), file, file);
        }
        else
        {
            throw new CouldNotIndexException(file, "The audio proxy did not return any data for this file.");
        }
    }
}
This method reads some data from an mp3 file, using the Bass.net library. Then that data is sent to a WCF service, using the async method.
The IndexFile(string file) method, which creates tasks, is called 4000 times in a for loop.
Those two events, FileIndexed and FileError, are not handled, so they are never raised.
The reason why the performance for Tasks was so poor was that you spawned too many small tasks (4000). Remember the CPU needs to schedule the tasks as well, so spawning lots of short-lived tasks causes an extra workload for the CPU. More information can be found in the second paragraph of the TPL documentation:
Starting with the .NET Framework 4, the TPL is the preferred way to write multithreaded and parallel code. However, not all code is suitable for parallelization; for example, if a loop performs only a small amount of work on each iteration, or it doesn't run for many iterations, then the overhead of parallelization can cause the code to run more slowly.
When you used the background workers, you limited the number of possible alive threads to the ProcessorCount, which reduced a lot of the scheduling overhead.
Given that you have a strictly defined list of things to do, I'd use the Parallel class (either For or ForEach depending on what suits you better). Furthermore you can pass a configuration parameter to any of these methods to control how many tasks are actually performed at the same time:
System.Threading.Tasks.Parallel.For(0, 20000, new ParallelOptions() { MaxDegreeOfParallelism = 5 }, i =>
{
//do something
});
The above code will perform 20000 operations, but will NOT perform more than 5 operations at the same time.
I SUSPECT the reason the background workers did better for you was because you had them created and instantiated at the start, while in your sample Task code it seems you're creating a new Task object for every operation.
Alternatively, did you think about using a fixed number of Task objects instantiated at the start and then performing a similar action with a ConcurrentQueue like you did with the background workers? That should also prove to be quite efficient.
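A sketch of that alternative, with names assumed from the question's code: a fixed set of long-running tasks draining a shared ConcurrentQueue, which is essentially the BackgroundWorker version expressed with tasks.
// toDoActions is the same ConcurrentQueue<Action> as in the question
var workers = Enumerable.Range(0, Environment.ProcessorCount)
    .Select(_ => Task.Factory.StartNew(() =>
    {
        while (toDoActions.TryDequeue(out Action a))
        {
            a();
        }
    }, TaskCreationOptions.LongRunning))
    .ToArray();
Task.WaitAll(workers);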
Have you considered using threadpool?
http://msdn.microsoft.com/en-us/library/system.threading.threadpool.aspx
If your performance is slower when using threads, it can only be due to threading overhead (allocating and destroying individual threads).
