I have the following use case. Multiple threads are creating data points which are collected in a ConcurrentBag. Every x ms a single consumer thread looks at the data points that came in since the last time and processes them (e.g. count them + calculate average).
The following code more or less represents the solution that I came up with:
private static ConcurrentBag<long> _bag = new ConcurrentBag<long>();
static void Main()
{
Task.Run(() => Consume());
var producerTasks = Enumerable.Range(0, 8).Select(i => Task.Run(() => Produce()));
Task.WaitAll(producerTasks.ToArray());
}
private static void Produce()
{
for (int i = 0; i < 100000000; i++)
{
_bag.Add(i);
}
}
private static void Consume()
{
while (true)
{
var oldBag = _bag;
_bag = new ConcurrentBag<long>();
var average = oldBag.DefaultIfEmpty().Average();
var count = oldBag.Count;
Console.WriteLine($"Avg = {average}, Count = {count}");
// Wait x ms
}
}
Is a ConcurrentBag the right tool for the job here?
Is switching the bags the right way to achieve clearing the list for new data points and then processing the old ones?
Is it safe to operate on oldBag or could I run into trouble when I iterate over oldBag and a thread is still adding an item?
Should I use Interlocked.Exchange() for switching the variables?
EDIT
I guess the above code was not really a good representation of what I'm trying to achieve. So here is some more code to show the problem:
public class LogCollectorTarget : TargetWithLayout, ILogCollector
{
private readonly List<string> _logMessageBuffer;
public LogCollectorTarget()
{
_logMessageBuffer = new List<string>();
}
protected override void Write(LogEventInfo logEvent)
{
var logMessage = Layout.Render(logEvent);
lock (_logMessageBuffer)
{
_logMessageBuffer.Add(logMessage);
}
}
public string GetBuffer()
{
lock (_logMessageBuffer)
{
var messages = string.Join(Environment.NewLine, _logMessageBuffer);
_logMessageBuffer.Clear();
return messages;
}
}
}
The class' purpose is to collect logs so they can be sent to a server in batches. Every x seconds GetBuffer is called. This should get the current log messages and clear the buffer for new messages. It works with locks, but as they are quite expensive I don't want to lock on every logging operation in my program. That's why I wanted to use a ConcurrentBag as a buffer. But then I still need to switch or clear it when I call GetBuffer, without losing any log messages that happen during the switch.
Since you have a single consumer, you can make do with a simple ConcurrentQueue, without swapping collections:
public class LogCollectorTarget : TargetWithLayout, ILogCollector
{
private readonly ConcurrentQueue<string> _logMessageBuffer;
public LogCollectorTarget()
{
_logMessageBuffer = new ConcurrentQueue<string>();
}
protected override void Write(LogEventInfo logEvent)
{
var logMessage = Layout.Render(logEvent);
_logMessageBuffer.Enqueue(logMessage);
}
public string GetBuffer()
{
// How many messages should we dequeue?
var count = _logMessageBuffer.Count;
var messages = new StringBuilder();
while (count > 0 && _logMessageBuffer.TryDequeue(out var message))
{
messages.AppendLine(message);
count--;
}
return messages.ToString();
}
}
If memory allocations become an issue, you can instead dequeue them to a fixed-size array and call string.Join on it. This way, you're guaranteed to do only two allocations (whereas the StringBuilder could do many more if the initial buffer isn't properly sized):
public string GetBuffer()
{
// How many messages should we dequeue?
var count = _logMessageBuffer.Count;
var buffer = new string[count];
for (int i = 0; i < count; i++)
{
_logMessageBuffer.TryDequeue(out var message);
buffer[i] = message;
}
return string.Join(Environment.NewLine, buffer);
}
Is a ConcurrentBag the right tool for the job here?
It's the right tool for a job; whether it's the right tool for this job depends on what you are trying to do, and why. The example you have given is very simplistic and without context, so it's hard to tell.
Is switching the bags the right way to achieve clearing the list for new data points and then processing the old ones?
The answer is no, for probably many reasons. What happens if a thread writes to it while you are switching it?
Is it safe to operate on oldBag or could I run into trouble when I iterate over oldBag and a thread is still adding an item?
No. You have just copied the reference; this will achieve nothing.
Should I use Interlocked.Exchange() for switching the variables?
The Interlocked methods are great, but they will not help you with your current problem on their own; they provide atomic operations on simple values such as integers and references. You seem confused about the primitives involved, and you should look up more thread-safe examples.
However, let's point you in the right direction: forget about ConcurrentBag and those fancy classes. My advice is to start simple and use locking, so you understand the nature of the problem.
If you want multiple tasks/threads to access a list, you can easily use the lock statement to guard access to the list/array so other threads aren't modifying it underneath you, as sketched below.
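For instance, a minimal sketch of that idea, with a plain List<long> guarded by a private lock object (the names here are illustrative, not taken from your code):
private readonly object _sync = new object();
private readonly List<long> _points = new List<long>();
public void AddPoint(long value)
{
    lock (_sync)            // every reader and writer takes the same lock object
    {
        _points.Add(value);
    }
}
public double DrainAverage()
{
    lock (_sync)
    {
        var average = _points.DefaultIfEmpty().Average();
        _points.Clear();    // reset for the next interval
        return average;
    }
}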
Obviously the code you have written is a contrived example: you are just adding consecutive numbers to a list and getting another thread to average them. This hardly needs to be producer/consumer at all, and would make more sense as synchronous code.
At this point I would point you to architectures better suited to this pattern, e.g. TPL Dataflow, but I fear this is just a learning exercise, and you really need to do more reading on multithreading and try more examples before we can truly help you with a concrete problem.
It works with locks, but as they are quite expensive I don't want to lock on every logging operation in my program.
Acquiring an uncontended lock is actually quite cheap. Quoting from Joseph Albahari's book:
You can expect to acquire and release a lock in as little as 20 nanoseconds on a 2010-era computer if the lock is uncontended.
Locking becomes expensive when it is contended. You can minimize the contention by reducing the work inside the critical region to the absolute minimum. In other words, don't do anything inside the lock that can be done outside the lock. In your second example the method GetBuffer does a String.Join inside the lock, delaying the release of the lock and increasing the chances of blocking other threads. You can improve it like this:
public string GetBuffer()
{
string[] messages;
lock (_logMessageBuffer)
{
messages = _logMessageBuffer.ToArray();
_logMessageBuffer.Clear();
}
return String.Join(Environment.NewLine, messages);
}
But it can be optimized even further. You could use the technique of your first example, and instead of clearing the existing List<string>, just swap it with a new list (note that the _logMessageBuffer field can then no longer be readonly):
public string GetBuffer()
{
List<string> oldList;
lock (_logMessageBuffer)
{
oldList = _logMessageBuffer;
_logMessageBuffer = new();
}
return String.Join(Environment.NewLine, oldList);
}
Starting from .NET Core 3.0, the Monitor class has the property Monitor.LockContentionCount, that returns the number of times there was contention at the entry point of a lock. You could watch the delta of this property every second, and see if the number is concerning. If you get single-digit numbers, there is nothing to worry about.
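For illustration, a minimal sketch of such a watcher (the property requires .NET Core 3.0 or later; the one-second interval is just an example):
long previous = Monitor.LockContentionCount;
while (true)
{
    Thread.Sleep(1000);
    long current = Monitor.LockContentionCount;
    Console.WriteLine($"Lock contentions during the last second: {current - previous}");
    previous = current;
}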
Touching some of your questions:
Is a ConcurrentBag the right tool for the job here?
No. The ConcurrentBag<T> is a very specialized collection intended for mixed producer-consumer scenarios, where each thread both adds and takes items, mainly object pools. You don't have such a scenario here. A ConcurrentQueue<T> is preferable to a ConcurrentBag<T> in almost all scenarios.
Should I use Interlocked.Exchange() for switching the variables?
Only if the collection was immutable. If the _logMessageBuffer was an ImmutableQueue<T>, then it would be excellent to swap it with Interlocked.Exchange. With mutable types you have no idea if the old collection is still in use by another thread, and for how long. The operating system can suspend any thread at any time for a duration of 10-30 milliseconds or even more (demo). So it's not safe to use lock-free techniques. You have to lock.
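For illustration, here is a minimal sketch of that immutable variant, assuming the field is changed to an ImmutableQueue<string> from the System.Collections.Immutable package (the CAS loop is needed because Enqueue returns a new queue instead of mutating the old one):
private ImmutableQueue<string> _logMessageBuffer = ImmutableQueue<string>.Empty;
protected override void Write(LogEventInfo logEvent)
{
    var logMessage = Layout.Render(logEvent);
    // CAS loop: retry until our updated queue is swapped in atomically.
    ImmutableQueue<string> current, updated;
    do
    {
        current = _logMessageBuffer;
        updated = current.Enqueue(logMessage);
    } while (Interlocked.CompareExchange(ref _logMessageBuffer, updated, current) != current);
}
public string GetBuffer()
{
    // Atomically take the whole queue and leave an empty one behind.
    var taken = Interlocked.Exchange(ref _logMessageBuffer, ImmutableQueue<string>.Empty);
    return string.Join(Environment.NewLine, taken);
}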
Related
So I'm not the most experienced with the C# programming language; however, I've been making a few test applications here and there.
I've noticed that the more threads I create for an application I'm working on, the more my GUI starts to freeze. I'm not sure why this occurs; I previously thought that part of the point of multithreading an application was to prevent the GUI from freezing up.
An explanation would be appreciated.
Also, here's the code I use to create the threads:
private void runThreads(int amount, ThreadStart address)
{
for (int i = 0; i < amount; i++)
{
threadAmount += 1;
Thread currentThread = new Thread(address);
currentThread.Start();
}
}
and here's what the threads run:
private void checkProxies()
{
while (started)
{
try
{
WebRequest request = WebRequest.Create("http://google.co.nz/");
request.Timeout = (int)timeoutCounter.Value * 1000;
request.Proxy = new WebProxy(proxies[proxyIndex]);
Thread.SetData(Thread.GetNamedDataSlot("currentProxy"), proxies[proxyIndex]);
if (proxyIndex != proxies.Length)
{
proxyIndex += 1;
}
else
{
started = false;
}
request.GetResponse();
workingProxies += 1;
}
catch (WebException)
{
deadProxies += 1;
}
lock ("threadAmount")
{
if (threadAmount > proxies.Length - proxyIndex)
{
threadAmount -= 1;
break;
}
}
}
}
While I cannot tell you exactly why your code is slowing the GUI down, there are a few things you should do to make it all-round better. If the problem persists after that, it should be a lot easier to pinpoint the issue.
Creating Thread objects is expensive. This is why new classes were added to .NET to better handle multithreading. Now you have access to the Task class and the Parallel class (described below).
Judging from the comments, you're running a LOT of threads at the same time. While it shouldn't be an issue to just run them, you're not really getting much use out of them if what you're doing is firing WebRequests (unless you have an awesome network). Use multiple threads, sure, but limit their number.
A Task is great when you want to do a specific operation in the background. But when you want to repeat a single operation for a specific set of data in the background... why not use the System.Threading.Tasks.Parallel class? Specifically, Parallel.ForEach (where you can specify your list of proxies as the parameter). This method also lets you set up how many threads are supposed to run concurrently at any given moment by using ParallelOptions.
One more way to code would be to make use of the async and await keywords available in .NET 4.5. In this case your GUI (button press?) should invoke an async method.
Use thread-safe methods like Interlocked.Increment or Interlocked.Add to increase/decrease counters available from multiple threads. Also, consider that you could change your list of proxies to a ConcurrentDictionary<string, bool> (where bool indicates if the proxy works or not) and set the values without any worry, as each thread will only access its own entry in the dictionary. You can easily query the totals at the end using LINQ: dictionary.Where(q => q.Value).Count() to get the number of working proxies, for example. Of course other classes are also available, depending on how you want to tackle the issue - perhaps a Queue (or ConcurrentQueue)?
Your lock shouldn't really work... as in, it seems to be working more by accident than by design in your code (thanks Luaan for the comment). You really shouldn't lock on a string literal. Consult the MSDN documentation on lock to get a better understanding of how it works. The Object created in the MSDN example isn't just for show.
You can also make the requests themselves multi-threaded by using BeginGetResponse and EndGetResponse methods. You can, in fact, combine this with the Task class for a much cleaner code (the Task class can convert Begin/End method pairs into a single Task object).
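As a minimal sketch of that conversion (a standalone helper, not part of the example below):
Task<WebResponse> GetResponseAsync(WebRequest request)
{
    // Task.Factory.FromAsync bridges the Begin/End (APM) pair into a single Task.
    return Task<WebResponse>.Factory.FromAsync(
        request.BeginGetResponse,
        request.EndGetResponse,
        null);
}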
So, a quick recap - use the Parallel class for multi-threading, and use concurrency classes for keeping things in place.
Here's a quick example I wrote:
private ConcurrentDictionary<string, bool?> values = new ConcurrentDictionary<string, bool?>();
private async void Button_Click(object sender, RoutedEventArgs e)
{
var result = await CheckProxies();
label.Content = result.ToString();
}
async Task<int> CheckProxies()
{
//I don't actually HAVE a list of proxies, so I make up some data
for (int i = 0; i < 1000; i++)
values[Guid.NewGuid().ToString()] = null;
await Task.Factory.StartNew(() => Parallel.ForEach(values, new ParallelOptions() { MaxDegreeOfParallelism = 10 }, this.PerformOperation));
//note that with maxDegreeOfParallelism set to a high value (like 1000)
//then I'll get a TON of failed requests simply because I'm overloading the network
//either that or google thinks I'm DDOSing them... >_<
return values.Where(v => v.Value == true).Count();
}
void PerformOperation(KeyValuePair<string, bool?> kvp)
{
try
{
WebRequest request = WebRequest.Create("http://google.co.nz/");
request.Timeout = 100;
//I'm not actually setting up the proxy from kvp,
//because it's populated with bogus data
request.GetResponse();
values[kvp.Key] = true;
}
catch (WebException)
{
values[kvp.Key] = false;
}
}
Despite the other comments being correct in that you should use either the Task class, or better yet, the async API, that is not the reason you're locking up.
The line of code that is causing your threads to lock up is this:
request.Timeout = (int)timeoutCounter.Value * 1000;
I am assuming that timeoutCounter is a control on the WinForm - which is running on the main GUI thread.
In other words, your threads' code is trying to access a control that lives on another thread (the UI thread), which is not really "allowed", at least not so simply.
For example, this question shows how to do this, albeit most of the answers there are a bit dated.
From a quick google (okay who am I kidding, I binged it) I found this article that explains the problem rather well.
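One simple workaround, as a hedged sketch: read the control's value once on the UI thread before starting the workers, and hand the plain value to them (this assumes checkProxies is changed to take the timeout as a parameter):
// On the UI thread, before spawning any workers:
int timeoutMs = (int)timeoutCounter.Value * 1000;
// The workers receive a plain int and never touch the control:
runThreads(amount, () => checkProxies(timeoutMs));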
I currently have an algorithm that goes like this:
public class Executor
{
private ParallelOptions options = new ParallelOptions();
private IList<Step> AllSteps;
public void Execute()
{
options.MaxDegreeOfParallelism = 4;
var rootSteps = AllSteps.Where(s => !s.Parents.Any());
Parallel.ForEach(rootSteps, options, RecursivelyExecuteStep);
}
private void RecursivelyExecuteStep(Step step)
{
    ExecuteStep(step);
    var childSteps = AllSteps.Where(s => s.Parents.Contains(step)
        && s.Parents.All(p => p.IsComplete));
    Parallel.ForEach(childSteps, options, RecursivelyExecuteStep);
}
}
ParallelOptions.MaxDegreeOfParallelism will be an input variable (but left it out of the code example for brevity).
I was wondering if thread pooling is handled for me automatically, or if this creates new threads every time. Also, what's the best way to optimize this; is thread pooling something I want? How do I use thread pooling? I'm fairly new to multithreading and to what's new in .NET 4.5[.1].
Won't each Parallel.ForEach have its own MaxDegreeOfParallelism of 4, so that the app as a whole is not limited to 4 threads? How do I achieve limiting all threading in the app to 4?
Edit: MaxDegreeOfParallelism
You can solve this problem with the TPL Dataflow library (you can get it via NuGet). As is said in another answer, the Parallel class uses the ThreadPool internally, so you should not be bothered with that.
With TPL Dataflow the only thing you need is to create a TransformManyBlock<TInput, TOutput> linked to itself (or link a BufferBlock to an ActionBlock with the Encapsulate extension), and set MaxDegreeOfParallelism = 4 or whatever constant you think it should be.
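A rough sketch of the self-linked block, reusing Step, AllSteps and ExecuteStep from the question (completion handling and de-duplication are left out; a step with several parents may be posted more than once):
var stepBlock = new TransformManyBlock<Step, Step>(step =>
{
    ExecuteStep(step);
    // Return the children that became ready; they flow back into this same block.
    return AllSteps.Where(s => s.Parents.Contains(step)
                            && s.Parents.All(p => p.IsComplete));
},
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });
stepBlock.LinkTo(stepBlock);  // link the block to itself
foreach (var root in AllSteps.Where(s => !s.Parents.Any()))
    stepBlock.Post(root);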
Parallel.ForEach is basically a convenient way to queue work items to the .NET ThreadPool.
Your application (process) has only one instance of the ThreadPool, and it tries to be as smart as possible regarding how many concurrent threads it uses, taking things like number of available cores and virtual memory into account.
So yes, the .NET ThreadPool handles thread pooling for you, and in many cases you don't need to worry about it: use Parallel.ForEach and let it get on with it.
EDIT: As noted by others, you should be careful in overusing the ThreadPool since it is a shared resource and it may disturb other parts of your application. It will also start creating new threads if your items are blocking or very long-running, which often is wasteful. A rule of thumb is that the work items should be relatively quick and preferably non-blocking. You should test and benchmark, and if it works for your use case then it is very convenient.
You can control the maximum number of concurrent threads used by the ThreadPool in your application if you want explicit control, by calling ThreadPool.SetMaxThreads. I'd advise against that unless you really have to, though, and know what you are doing. The ThreadPool already tries to avoid using more concurrent threads than you have cores, for example.
What you can do with ParallelOptions.MaxDegreeOfParallelism is only to further limit the number of concurrent ThreadPool threads that are used to execute that particular call to Parallel.ForEach.
If you need more explicit control of how many concurrent threads an invocation of your algorithm uses, here are some possible alternatives (in, arguably, increasing implementation complexity):
With the default ThreadPool you can't limit concurrency while calling Parallel.ForEach recursively. You could, for example, consider using Parallel.ForEach only at the top level (with a ParallelOptions.MaxDegreeOfParallelism argument) and let RecursivelyExecuteStep use a standard foreach, as sketched after this list.
Modify (or replace) the ThreadPool for your algorithm to limit concurrency by setting ParallelOptions.TaskScheduler to an instance of QueuedTaskScheduler from Parallel Extension Extras as described here.
As suggested by @VMAtm, you can use TPL Dataflow to get more explicit control of how your computations are performed, including concurrency (this can also be combined with a custom task scheduler if you really want to knock yourself out).
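For the first of these alternatives, a minimal sketch using the names from the question:
public void Execute()
{
    var rootSteps = AllSteps.Where(s => !s.Parents.Any());
    Parallel.ForEach(rootSteps, options, RecursivelyExecuteStep);
}
private void RecursivelyExecuteStep(Step step)
{
    ExecuteStep(step);
    var childSteps = AllSteps.Where(s => s.Parents.Contains(step)
        && s.Parents.All(p => p.IsComplete));
    foreach (var child in childSteps)   // plain foreach: no nested parallelism
        RecursivelyExecuteStep(child);
}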
A simple straightforward implementation could look like the following:
ParallelOptions Options = new ParallelOptions{MaxDegreeOfParallelism = 4};
IList<Step> AllSteps;
public void Execute()
{
var RemainingSteps = new HashSet<Step>(AllSteps);
while(RemainingSteps.Count > 0)
{
var ExecutableSteps = RemainingSteps.Where(s => s.Parents.All(p => p.IsComplete)).ToList();
Parallel.ForEach(ExecutableSteps, Options, ExecuteStep);
RemainingSteps.ExceptWith(ExecutableSteps);
}
}
Granted, this will execute steps in phases, so you will not always have maximum concurrency. You may only be executing one step at the end of each phase, since the next steps to execute are only realized after all steps in the current phase complete.
If you want to improve concurrency, I would suggest using a BlockingCollection. You'll need to implement a custom partitioner to use Parallel.ForEach against the blocking collection in this case. You'll also want a concurrent collection of the remaining steps, so that you don't queue the same step multiple times (the race condition previously commented on).
public class Executor
{
ParallelOptions Options = new ParallelOptions() { MaxDegreeOfParallelism = 4 };
IList<Step> AllSteps;
//concurrent hashset of remaining steps (used to prevent race conditions)
ConcurrentDictionary<Step, Step> RemainingSteps = new ConcurrentDictionary<Step, Step>();
//blocking collection of steps that can execute next
BlockingCollection<Step> ExecutionQueue = new BlockingCollection<Step>();
public void Execute()
{
foreach(var step in AllSteps)
{
if(step.Parents.All(p => p.IsComplete))
{
ExecutionQueue.Add(step);
}
else
{
RemainingSteps.TryAdd(step, step);
}
}
Parallel.ForEach(
GetConsumingPartitioner(ExecutionQueue),
Options,
Execute);
}
void Execute(Step step)
{
ExecuteStep(step);
if(RemainingSteps.IsEmpty)
{
//we're done, all steps are complete
ExecutionQueue.CompleteAdding();
return;
}
//queue up the steps that can execute next (concurrent dictionary enumeration returns a copy, so subsequent removal is safe)
foreach(var step in RemainingSteps.Values.Where(s => s.Parents.All(p => p.IsComplete)))
{
//note, removal only occurs once, so this eliminates the race condition
Step NextStep;
if(RemainingSteps.TryRemove(step, out NextStep))
{
ExecutionQueue.Add(NextStep);
}
}
}
Partitioner<T> GetConsumingPartitioner<T>(BlockingCollection<T> collection)
{
return new BlockingCollectionPartitioner<T>(collection);
}
class BlockingCollectionPartitioner<T> : Partitioner<T>
{
readonly BlockingCollection<T> Collection;
public BlockingCollectionPartitioner(BlockingCollection<T> collection)
{
if (collection == null) throw new ArgumentNullException("collection");
Collection = collection;
}
public override bool SupportsDynamicPartitions { get { return true; } }
public override IList<IEnumerator<T>> GetPartitions(int partitionCount)
{
if (partitionCount < 1) throw new ArgumentOutOfRangeException("partitionCount");
var dynamicPartitions = GetDynamicPartitions();
return Enumerable.Range(0, partitionCount)
    .Select(i => dynamicPartitions.GetEnumerator()).ToList();
}
public override IEnumerable<T> GetDynamicPartitions()
{
return Collection.GetConsumingEnumerable();
}
}
}
I have multiple enumerators that enumerate over flat files. I originally had each enumerator in a Parallel.Invoke, and each Action was adding to a BlockingCollection<Entity>, and that collection was returning a ConsumingEnumerable():
public interface IFlatFileQuery
{
IEnumerable<Entity> Run();
}
public class FlatFile1 : IFlatFileQuery
{
public IEnumerable<Entity> Run()
{
// loop over a flat file and yield each result
yield return new Entity(); // placeholder for each parsed result
}
}
public class Main
{
public IEnumerable<Entity> DoLongTask(ICollection<IFlatFileQuery> _flatFileQueries)
{
// do some other stuff that needs to be returned first:
yield return new Entity(); // placeholder
// then enumerate and return the flat file data
foreach (var entity in GetData(_flatFileQueries))
{
yield return entity;
}
}
private IEnumerable<Entity> GetData(ICollection<IFlatFileQuery> _flatFileQueries)
{
var buffer = new BlockingCollection<Entity>(100);
var actions = _flatFileQueries.Select(fundFileQuery => (Action)(() =>
{
foreach (var entity in fundFileQuery.Run())
{
buffer.TryAdd(entity, Timeout.Infinite);
}
})).ToArray();
Task.Factory.StartNew(() =>
{
Parallel.Invoke(actions);
buffer.CompleteAdding();
});
return buffer.GetConsumingEnumerable();
}
}
However after a bit of testing it turns out that the code change below is about 20-25% faster.
private IEnumerable<Entity> GetData(ICollection<IFlatFileQuery> _flatFileQueries)
{
return _flatFileQueries.AsParallel().SelectMany(ffq => ffq.Run());
}
The trouble with the code change is that it waits till all flat file queries are enumerated before it returns the whole lot that can then be enumerated and yielded.
Would it be possible to yield in the above bit of code somehow to make it even faster?
I should add that at most the combined results of all the flat file queries might only be 1000 or so Entities.
Edit:
Changing it to the below doesn't make a difference to the run time. (R# even suggests to go back to the way it was)
private IEnumerable<Entity> GetData(ICollection<IFlatFileQuery> _flatFileQueries)
{
foreach (var entity in _flatFileQueries.AsParallel().SelectMany(ffq => ffq.Run()))
{
yield return entity;
}
}
The trouble with the code change is that it waits till all flat file queries are enumerated before it returns the whole lot that can then be enumerated and yielded.
Let's prove that it's false by a simple example. First, let's create a TestQuery class that will yield a single entity after a given time. Second, let's execute several test queries in parallel and measure how long it took to yield their result.
public class TestQuery : IFlatFileQuery {
private readonly int _sleepTime;
public IEnumerable<Entity> Run() {
Thread.Sleep(_sleepTime);
return new[] { new Entity() };
}
public TestQuery(int sleepTime) {
_sleepTime = sleepTime;
}
}
internal static class Program {
private static void Main() {
Stopwatch stopwatch = Stopwatch.StartNew();
var queries = new IFlatFileQuery[] {
new TestQuery(2000),
new TestQuery(3000),
new TestQuery(1000)
};
foreach (var entity in queries.AsParallel().SelectMany(ffq => ffq.Run()))
Console.WriteLine("Yielded after {0:N0} seconds", stopwatch.Elapsed.TotalSeconds);
Console.ReadKey();
}
}
This code prints:
Yielded after 1 seconds
Yielded after 2 seconds
Yielded after 3 seconds
You can see from this output that AsParallel() yields each result as soon as it's available, so everything works fine. Note that you might get different timings depending on the degree of parallelism (such as "2s, 5s, 6s" with a degree of parallelism of 1, effectively making the whole operation not parallel at all). This output comes from a 4-core machine.
Your long processing will probably scale with the number of cores, if there is no common bottleneck between the threads (such as a shared locked resource). You might want to profile your algorithm to see if there are slow parts that can be improved using tools such as dotTrace.
I don't think there is a red flag in your code anywhere. There are no outrageous inefficiencies. I think it comes down to multiple smaller differences.
PLINQ is very good at processing streams of data. Internally, it works more efficiently than adding items to a synchronized list one-by-one. I suspect that your calls to TryAdd are a bottleneck because each call requires at least two Interlocked operations internally. Those can put enormous load on the inter-processor memory bus because all threads will compete for the same cache line.
PLINQ is cheaper because internally, it does some buffering. I'm sure it doesn't output items one-by-one. Probably it batches them and amortizes the synchronization cost that way over multiple items.
A second issue would be the bounded capacity of the BlockingCollection. 100 is not a lot. This might lead to a lot of waiting. Waiting is costly because it requires a call to the kernel and a context switch.
I came up with this alternative, which works well for me in any scenario:
Inside a Task, a Parallel.ForEach transforms each item and enqueues it into a ConcurrentQueue for processing.
The Task has a continuation that sets a flag when the Task ends.
On the original thread of execution, a while loop dequeues and yields until the flag is set and the queue is empty.
Fast, with excellent results for me:
var qlines = new ConcurrentQueue<FileLineData>();
bool isCompleted = false;
Task.Factory.StartNew(() =>
{
    Parallel.ForEach(TextHelper.ReadLines(FileName), ProcessHelper.DefaultParallelOptions,
        (string currentLine) =>
        {
            // Read the line, validate it and enqueue it as a FileLineData (custom class)
        });
})
.ContinueWith
(
    ic => isCompleted = true
);
while (!isCompleted || qlines.Count > 0)
{
    if (qlines.TryDequeue(out FileLineData returnLine))
    {
        yield return returnLine;
    }
}
By default the ParallelQuery class, when working on IEnumerable<T> sources, employs a partitioning strategy known as "chunk partitioning". With this strategy each worker thread grabs a progressively larger number of items each time. This means that it has an input buffer. The results are then accumulated into an output buffer, of a size chosen by the system, before they are made available to the consumer of the query. You can disable both buffers by using the configuration options EnumerablePartitionerOptions.NoBuffering and ParallelMergeOptions.NotBuffered.
private IEnumerable<Entity> GetData(ICollection<IFlatFileQuery> flatFileQueries)
{
return Partitioner
.Create(flatFileQueries, EnumerablePartitionerOptions.NoBuffering)
.AsParallel()
.AsOrdered()
.WithMergeOptions(ParallelMergeOptions.NotBuffered)
.SelectMany(ffq => ffq.Run());
}
This way each worker thread will grab only one item at a time, and will propagate the result as soon as it is computed.
NoBuffering: Create a partitioner that takes items from the source enumerable one at a time and does not use intermediate storage that can be accessed more efficiently by multiple threads. This option provides support for low latency (items will be processed as soon as they are available from the source) and provides partial support for dependencies between items (a thread cannot deadlock waiting for an item that the thread itself is responsible for processing).
NotBuffered: Use a merge without output buffers. As soon as result elements have been computed, make that element available to the consumer of the query.
I added a multithreading part to my code.
public class ThreadClassSeqGroups
{
public Dictionary<string, string> seqGroup;
public Dictionary<string, List<SearchAlgorithm.CandidateStr>> completeModels;
public Dictionary<string, List<SearchAlgorithm.CandidateStr>> partialModels;
private Thread nativeThread;
public ThreadClassSeqGroups(Dictionary<string, string> seqs)
{
seqGroup = seqs;
completeModels = new Dictionary<string, List<SearchAlgorithm.CandidateStr>>();
partialModels = new Dictionary<string, List<SearchAlgorithm.CandidateStr>>();
}
public void Run(DescrStrDetail dsd, DescrStrDetail.SortUnit primarySeedSu,
List<ushort> secondarySeedOrder, double partialCutoff)
{
nativeThread = new Thread(() => this._run(dsd, primarySeedSu, secondarySeedOrder, partialCutoff));
nativeThread.Priority = ThreadPriority.Highest;
nativeThread.Start();
}
public void _run(DescrStrDetail dsd, DescrStrDetail.SortUnit primarySeedSu,
List<ushort> secondarySeedOrder, double partialCutoff)
{
int groupSize = this.seqGroup.Count;
int seqCount = 0;
foreach (KeyValuePair<string, string> p in seqGroup)
{
Console.WriteLine("ThreadID {0} (priority:{1}):\t#{2}/{3} SeqName: {4}",
nativeThread.ManagedThreadId, nativeThread.Priority.ToString(), ++seqCount, groupSize, p.Key);
List<SearchAlgorithm.CandidateStr> tmpCompleteModels, tmpPartialModels;
SearchAlgorithm.SearchInBothDirections(
p.Value.ToUpper().Replace('T', 'U'), dsd, primarySeedSu, secondarySeedOrder, partialCutoff,
out tmpCompleteModels, out tmpPartialModels);
completeModels.Add(p.Key, tmpCompleteModels);
partialModels.Add(p.Key, tmpPartialModels);
}
}
public void Join()
{
nativeThread.Join();
}
}
class Program
{
public static int _paramSeqGroupSize = 2000;
static void Main(Dictionary<string, string> rawSeqs)
{
// Split the whole rawSeqs (Dict<name, seq>) into several groups
Dictionary<string, string>[] rawSeqGroups = SplitSeqFasta(rawSeqs, _paramSeqGroupSize);
// Create a thread for each seqGroup and run
var threadSeqGroups = new MultiThreading.ThreadClassSeqGroups[rawSeqGroups.Length];
for (int i = 0; i < rawSeqGroups.Length; i++)
{
threadSeqGroups[i] = new MultiThreading.ThreadClassSeqGroups(rawSeqGroups[i]);
//threadSeqGroups[i].SetPriority();
threadSeqGroups[i].Run(dsd, primarySeedSu, secondarySeedOrder, _paramPartialCutoff);
}
// Merge results from threads after the thread finish
var allCompleteModels = new Dictionary<string, List<SearchAlgorithm.CandidateStr>>();
var allPartialModels = new Dictionary<string, List<SearchAlgorithm.CandidateStr>>();
foreach (MultiThreading.ThreadClassSeqGroups t in threadSeqGroups)
{
t.Join();
foreach (string name in t.completeModels.Keys)
{
allCompleteModels.Add(name, t.completeModels[name]);
}
foreach (string name in t.partialModels.Keys)
{
allPartialModels.Add(name, t.partialModels[name]);
}
}
}
}
However, the speed with multiple threads is much slower than with a single thread, and the CPU load is generally below 10%.
For example:
The input file contains 2500 strings.
With _paramSeqGroupSize = 3000, the main thread + 1 calculation thread take 200 sec.
With _paramSeqGroupSize = 400, the main thread + 7 calculation threads take much longer (I killed the run after more than 10 minutes).
Is there any problem with my implementation? How to speed it up?
Thanks.
It seems to me that you are trying to process a file in parallel with multiple threads. This is a bad idea, assuming you have a single mechanical disk.
Basically, the head of the disk needs to seek the next reading location for each read request. This is a costly operation and since multiple threads issue read commands it means the head gets bounced around as each thread gets its turn to run. This will drastically reduce performance compared to the case where a single thread is doing the reading.
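If that's the case, a common fix is to keep a single reader thread and parallelize only the CPU-bound work. A hedged sketch (inputPath and ProcessSequence are illustrative stand-ins, not names from your code):
var lines = new BlockingCollection<string>(boundedCapacity: 1000);
// One thread reads the file sequentially, so the disk head never thrashes.
var reader = Task.Run(() =>
{
    foreach (var line in File.ReadLines(inputPath))
        lines.Add(line);
    lines.CompleteAdding();
});
// The CPU-bound work is spread over the cores.
// (Parallel.ForEach's default partitioner buffers; for long streams consider
// a consuming partitioner like the one shown in an earlier answer.)
Parallel.ForEach(lines.GetConsumingEnumerable(),
    new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount },
    line => ProcessSequence(line));
reader.Wait();   // propagate any reader exceptions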
What was the code prior to multithreading? It's hard to tell what this code is doing, and much of the "working" code seems to be hidden in your search algorithm. However, some thoughts:
You mention an "input file", but this is not clearly shown in code - if your file access is being threaded, this will not increase performance as the file access will be the bottleneck.
Creating more threads than you have CPU cores will ultimately reduce performance (unless each thread is blocked waiting on different resources). In your case I would suggest that 8 total threads is too many.
It seems that a lot of data (memory) access might be done through your class DescrStrDetail, which is passed via the variable dsd from your Main method to every child thread. However, the declaration of this variable is missing, so its usage/implementation is unknown. If this variable has locks that prevent multiple threads from accessing it at the same time, then your multiple threads will potentially be locking each other out of this data, further slowing performance.
When threads are run they are given time on a specific processor. If there are more threads than processors, the system context-switches between threads to give all active threads some time to process. Context switching is really expensive. If you have more threads than processors, most of the CPU time can be taken up by context switching, making a single-threaded solution look faster than a multithreaded one.
Your example starts an indeterminate number of threads. If SplitSeqFasta returns more entries than you have cores, you will create more threads than cores and introduce a lot of context switching.
I suggest you throttle the number of threads manually, or use something like the Task Parallel Library and the Parallel class to have it throttled automatically for you.
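For example, a throttled version could look roughly like this (ProcessGroup is a hypothetical stand-in for the per-group work your _run method currently does):
Parallel.ForEach(
    rawSeqGroups,
    new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount },
    group => ProcessGroup(group));   // ProcessGroup: stand-in for _run's work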
Current implementation: Waits until parallelCount values are collected, uses ThreadPool to process the values, waits until all threads complete, re-collect another set of values and so on...
Code:
private static int parallelCount = 5;
private int taskIndex;
private object[] paramObjects = new object[parallelCount];
private ManualResetEvent resetEvent = new ManualResetEvent(false);
// Each ThreadPool thread should access only one item of the array,
// release object when done, to be used by another thread
private object[] reusableObjects = new object[parallelCount];
private void MultiThreadedGenerate(object paramObject)
{
paramObjects[taskIndex] = paramObject;
taskIndex++;
if (taskIndex == parallelCount)
{
MultiThreadedGenerate();
// Reset
taskIndex = 0;
}
}
/*
* Called when 'paramObjects' array gets filled
*/
private void MultiThreadedGenerate()
{
int remainingToGenerate = paramObjects.Length;
resetEvent.Reset();
for (int i = 0; i < paramObjects.Length; i++)
{
ThreadPool.QueueUserWorkItem(delegate(object obj)
{
try
{
int currentIndex = (int) obj;
Generate(currentIndex, paramObjects[currentIndex], reusableObjects[currentIndex]);
}
finally
{
if (Interlocked.Decrement(ref remainingToGenerate) == 0)
{
resetEvent.Set();
}
}
}, i);
}
resetEvent.WaitOne();
}
I've seen significant performance improvements with this approach, however there are a number of issues to consider:
[1] Collecting values in paramObjects and synchronization using resetEvent can be avoided, as there is no dependency between the threads (or between the current set of values and the next set). I'm only doing this to manage access to reusableObjects (when a set of paramObjects is done processing, I know that all objects in reusableObjects are free, so taskIndex is reset and each new task of the next set of values will have its own unique reusableObj to work with).
[2] There is no real connection between the size of reusableObjects and the number of threads the ThreadPool uses. I might initialize reusableObjects to hold 10 objects, but if, due to some limitation, the ThreadPool can run only 3 threads for my MultiThreadedGenerate() method, then I'm wasting memory.
So, by getting rid of paramObjects, how can the above code be refined in such a way that as soon as one thread completes its job, it returns the taskIndex (or the reusableObj) it used and no longer needs, so that it becomes available for the next value? Also, the code should create a reusableObject and add it to some collection only when there is demand for it. Is using a Queue here a good idea?
Thank you.
There's really no reason to do your own manual threading and task management any more. You could restructure this to a more loosely-coupled model using Task Parallel Library (and possibly System.Collections.Concurrent for result collation).
Performance could be further improved if you don't need to wait for a full complement of work before handing off each Task for processing.
TPL came along in .NET 4.0 but was back-ported to .NET 3.5. Download here.
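As a rough sketch of that loosely-coupled restructuring (ReusableObject and Process are hypothetical stand-ins for your reusable objects and the Generate work):
// Pool of reusable objects; entries are created only on demand.
private readonly ConcurrentBag<ReusableObject> pool = new ConcurrentBag<ReusableObject>();
private readonly List<Task> tasks = new List<Task>();
private void QueueGenerate(object paramObject)
{
    tasks.Add(Task.Factory.StartNew(() =>
    {
        ReusableObject reusable;
        if (!pool.TryTake(out reusable))
            reusable = new ReusableObject();   // create only when there is demand
        try
        {
            Process(paramObject, reusable);    // stand-in for the Generate(...) work
        }
        finally
        {
            pool.Add(reusable);                // return the object as soon as this task is done
        }
    }));
}
// After all values have been queued:
// Task.WaitAll(tasks.ToArray());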