I use the TPL heavily and have a large dataflow pipeline structure.
As part of the pipeline network I want to write some data to Azure Blob Storage. We have a lot of data, so we have 4 storage accounts and want to distribute the data evenly between them.
I'd like to keep using the dataflow pipeline pattern, so I want to implement a source block that, when linked to several target blocks, sends messages to them round-robin. BufferBlock is not good enough because it sends each message to the first block that accepts it, and assuming all the target blocks have a large bounded capacity, all the messages will go to the first target block. BroadcastBlock is no good either because I don't want duplicates.
Any recommendations? Implementing the ISourceBlock interface with round-robin behavior doesn't look simple, so I wondered whether there are simpler solutions out there, or any extensions to the TPL that I'm not familiar with?
Are you aware that you can link blocks with a predicate? Here is a very simple (and not well tested) sample solution:
// Counts accepted messages; consumer i takes a message only on its turn.
// (BufferBlock offers messages one at a time, so no locking is needed.)
var counter = 0;
bool Predicate(int consumerIndex)
{
    if (counter % 4 != consumerIndex) return false;
    counter++;
    return true;
}

var buffer = new BufferBlock<int>();
var consumer1 = new ActionBlock<int>(i => Console.WriteLine($"First: {i}"));
var consumer2 = new ActionBlock<int>(i => Console.WriteLine($"Second: {i}"));
var consumer3 = new ActionBlock<int>(i => Console.WriteLine($"Third: {i}"));
var consumer4 = new ActionBlock<int>(i => Console.WriteLine($"Fourth: {i}"));

buffer.LinkTo(consumer1, i => Predicate(0));
buffer.LinkTo(consumer2, i => Predicate(1));
buffer.LinkTo(consumer3, i => Predicate(2));
buffer.LinkTo(consumer4, i => Predicate(3));
buffer.LinkTo(DataflowBlock.NullTarget<int>()); // fallback so unmatched messages don't pile up

for (var i = 0; i < 10; ++i)
{
    buffer.Post(i);
}

buffer.Complete();
buffer.Completion.Wait();
One of the outputs:
Third: 2
First: 0
Fourth: 3
Second: 1
Second: 5
Second: 9
Third: 6
Fourth: 7
First: 4
First: 8
What is going on here is that you maintain a count of accepted messages, and a consumer's predicate only matches when the count says it is that consumer's turn, incrementing the count on each match. Note that you should still link the block once without any predicate (the NullTarget link above) to avoid messages piling up in the buffer; it's also a good idea to test the round robin with a block that monitors for lost messages.
Is it possible to get a TransformManyBlock to send intermediate results to the next step as they are created, instead of waiting for the entire IEnumerable<T> to be filled?
All testing I've done shows that TransformManyBlock only sends results to the next block when it is finished; the next block then reads those items one at a time.
It seems like basic functionality but I can't find any examples of this anywhere.
The use case is processing chunks of a file as they are read. In my case I need a certain number of lines before I can process anything, so a direct stream won't work.
The kludge I've come up with is to create two pipelines:
a "processing" dataflow network that processes the chunks of data as they become available
a "producer" dataflow network that ends where the file is broken into chunks, which are then posted to the start of the "processing" network that actually transforms the data.
The "producer" network needs to be seeded with the starting point of the "processing" network.
Not a good long-term solution, since additional processing options will be needed and it's not flexible.
Is it possible to have any dataflow block type send multiple intermediate results, as they are created, for a single input? Any pointers to working code?
You probably need to create your IEnumerables using an iterator. This way an item will be propagated downstream after every yield return statement. The only problem is that yielding from lambda functions is not supported in C#, so you'll have to use a local function instead. Example:
var block = new TransformManyBlock<string, string>(filePath => ReadLines(filePath));

IEnumerable<string> ReadLines(string filePath)
{
    string[] lines = File.ReadAllLines(filePath);
    foreach (var line in lines)
    {
        yield return line; // Immediately offered to any linked block
    }
}
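To see the streaming behavior end to end, here is a minimal sketch (the printer block and the sample file path are illustrative additions, not part of the original answer):

// Hypothetical consumer, used only to observe that lines arrive one at a time.
var printer = new ActionBlock<string>(line => Console.WriteLine($"Received: {line}"));
block.LinkTo(printer, new DataflowLinkOptions { PropagateCompletion = true });

block.Post("input.txt"); // assumed sample file path
block.Complete();
printer.Completion.Wait();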
In a LINQ query, I used .AsParallel() as follows:
var completeReservationItems = from rBase in reservation.AsParallel()
                               join rRel in relationship.AsParallel() on rBase.GroupCode equals rRel.SourceGroupCode
                               join rTarget in reservation.AsParallel() on rRel.TargetCode equals rTarget.GroupCode
                               where rRel.ProgramCode == programCode && rBase.StartDate <= rTarget.StartDate && rBase.EndDate >= rTarget.EndDate
                               select new
                               {
                                   // Initialize (anonymous type) based on the query
                               };
Then I created two separate Tasks and ran them in parallel, passing the same lists to both methods (params below is a placeholder for the actual arguments):
Task getS1Status = Task.Factory.StartNew(() =>
{
    RunLinqQuery(params);
});

Task getS2Status = Task.Factory.StartNew(() =>
{
    RunLinqQuery(params);
});

Task.WaitAll(getS1Status, getS2Status);
I captured the timings and was surprised by the results:
Above scenario: 6 sec (6000 ms)
Same code, running sequentially instead of 2 Tasks: 50 ms
Same code, but without .AsParallel() in the LINQ: 50 ms
I wanted to understand why this is taking so long in the above scenario.
Posting this as an answer only because I have some code to show.
Firstly, I don't know how many threads will be created with AsParallel(). The documentation doesn't say anything about it: https://msdn.microsoft.com/en-us/library/dd413237(v=vs.110).aspx
Imagine the following code:
void RunMe()
{
    foreach (var threadId in Enumerable.Range(0, 100)
                                       .AsParallel()
                                       .Select(x => Thread.CurrentThread.ManagedThreadId)
                                       .Distinct())
        Console.WriteLine(threadId);
}
How many thread IDs will we see? For me, each run shows a different number of threads; example output:
30 // only one thread!
Next time
27 // several threads
13
38
10
43
30
I think the number of threads depends on the current scheduler. We can always cap the number of threads by calling the WithDegreeOfParallelism method (https://msdn.microsoft.com/en-us/library/dd383719(v=vs.110).aspx), for example:
void RunMe()
{
    foreach (var threadId in Enumerable.Range(0, 100)
                                       .AsParallel()
                                       .WithDegreeOfParallelism(2)
                                       .Select(x => Thread.CurrentThread.ManagedThreadId)
                                       .Distinct())
        Console.WriteLine(threadId);
}
Now the output will contain at most 2 thread IDs:
7
40
Why is this important? As I said, the number of threads can directly influence performance.
But that is not the only problem. In your first scenario you are creating new tasks (which run on the thread pool and can add overhead), and then you call Task.WaitAll. Take a look at its source code: https://referencesource.microsoft.com/#mscorlib/system/threading/Tasks/Task.cs,72b6b3fa5eb35695 . I'm sure the loop over the tasks there adds overhead, and if AsParallel grabs too many threads inside the first task, the second task may be delayed before it can even start. Moreover, this only CAN happen; if you ran your first scenario 1000 times, you would probably get very different results.
So my last point is that you are trying to measure parallel code, and that is very hard to do right. I don't recommend using parallelism everywhere you can, because it can actually degrade performance if you don't know exactly what you are doing.
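To illustrate the measurement point, here is a minimal benchmarking sketch (the Measure helper and its usage are assumptions for illustration, not code from the question): warm up once so JIT compilation and thread-pool ramp-up don't pollute the numbers, then average over many iterations.

void Measure(string label, Action action, int iterations = 100)
{
    action(); // warm-up: JIT compilation, thread-pool ramp-up

    var sw = System.Diagnostics.Stopwatch.StartNew();
    for (int i = 0; i < iterations; i++)
        action();
    sw.Stop();

    Console.WriteLine($"{label}: {sw.Elapsed.TotalMilliseconds / iterations:F2} ms per run");
}

// Hypothetical usage against the question's scenarios:
// Measure("Two tasks", () => Task.WaitAll(
//     Task.Run(() => RunLinqQuery(params)),
//     Task.Run(() => RunLinqQuery(params))));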
We have an application in which we have a materialized array of items that we are going to process through a Reactive pipeline. It looks a little like this:
EventLoopScheduler eventLoop = new EventLoopScheduler();
IScheduler concurrency = new TaskPoolScheduler(
    new TaskFactory(
        new LimitedConcurrencyLevelTaskScheduler(threadCount)));

IEnumerable<int> numbers = Enumerable.Range(1, itemCount);

// 1. transform on single thread
IConnectableObservable<byte[]> source =
    numbers.Select(Transform).ToObservable(eventLoop).Publish();
// 2. naive parallelization, restricts parallelization to Work
// only; chunk up sequence into smaller sequences and process
// in parallel, merging results
IObservable<int> final = source.
    Buffer(10).
    Select(batch =>
        batch.
            ToObservable(concurrency).
            Buffer(10).
            Select(concurrentBatch =>
                concurrentBatch.
                    Select(Work).
                    ToArray().
                    ToObservable(eventLoop)).
            Merge()).
    Merge();
final.Subscribe();
source.Connect();
Await(final).Wait();
If you are really curious to play with this, the stand-in methods look like:
private async static Task Await(IObservable<int> final)
{
    await final.LastOrDefaultAsync();
}

private static byte[] Transform(int number)
{
    if (number == itemCount)
    {
        Console.WriteLine("numbers exhausted.");
    }

    // bloat is a preallocated ~1MB source array copied into each result
    byte[] buffer = new byte[1000000];
    Buffer.BlockCopy(bloat, 0, buffer, 0, bloat.Length);
    return buffer;
}

private static int Work(byte[] buffer)
{
    Console.WriteLine("t {0}.", Thread.CurrentThread.ManagedThreadId);
    Thread.Sleep(50);
    return 1;
}
A little explanation. Range(1, itemCount) simulates raw inputs, materialized from a data-source. Transform simulates an enrichment process each input must go through, and results in a larger memory footprint. Work is a "lengthy" process which operates on the transformed input.
Ideally, we want to minimize the number of transformed inputs held concurrently by the system, while maximizing throughput by parallelizing Work. The number of transformed inputs in memory should be batch size (10 above) times concurrent work threads (threadCount).
So for 5 threads, we should retain 50 Transform items at any given time; and if, as here, the transform is a 1MB byte buffer, then we would expect memory consumption to be at about 50MB throughout the run.
What I find is quite different: Rx eagerly consumes all the numbers and Transforms them up front (as evidenced by the "numbers exhausted." message), resulting in a massive memory spike at the start (~1 GB for an itemCount of 1000).
My basic question is: is there a way to achieve what I need (i.e. minimized consumption, throttled by multi-threaded batching)?
UPDATE: sorry for the reversal, James; at first I did not think paulpdaniels and Enigmativity's composition of Work(Transform) applied (this has to do with the nature of our actual implementation, which is more complex than the simple scenario provided above). However, after some further experimentation, I may be able to apply the same principles, i.e. defer Transform until the batch executes.
You have made a couple of mistakes with your code that throw off all of your conclusions.
First up, you've done this:
IEnumerable<int> numbers = Enumerable.Range(1, itemCount);
You've used Enumerable.Range, which means that when you call numbers.Select(Transform) you are going to burn through all of the numbers as fast as a single thread can go. Rx hasn't even had a chance to do any work, because up to this point your pipeline is entirely enumerable.
The next issue is in your subscriptions:
final.Subscribe();
source.Connect();
Await(final).Wait();
Because you call both final.Subscribe() and Await(final).Wait(), you are creating two separate subscriptions to the final observable.
Since source.Connect() happens in between, the second subscription may miss values.
So, let's try to remove all of the cruft that's going on here and see if we can work things out.
If you go down to this:
IObservable<int> final =
    Observable
        .Range(1, itemCount)
        .Select(n => Transform(n))
        .Select(bs => Work(bs));
Things work well. The numbers get exhausted right at the end, and processing 20 items on my machine takes about 1 second.
But this is processing everything in sequence. And the Work step provides back-pressure on Transform to slow down the speed at which it consumes the numbers.
Let's add concurrency.
IObservable<int> final =
    Observable
        .Range(1, itemCount)
        .Select(n => Transform(n))
        .SelectMany(bs => Observable.Start(() => Work(bs)));
This processes 20 items in 0.284 seconds, and the numbers exhaust themselves after 5 items are processed. There is no longer any back-pressure on the numbers. Basically the scheduler is handing all of the work to the Observable.Start so it is ready for the next number immediately.
Let's reduce the concurrency.
IObservable<int> final =
    Observable
        .Range(1, itemCount)
        .Select(n => Transform(n))
        .SelectMany(bs => Observable.Start(() => Work(bs), concurrency));
Now the 20 items get processed in 0.5 seconds. Only two get processed before the numbers are exhausted. This makes sense as we've limited concurrency to two threads. But still there's no back pressure on the consumption of the numbers so they get chewed up pretty quickly.
Having said all of this, I tried to construct a query with the appropriate back pressure, but I couldn't find a way. The crux comes down to the fact that Transform(...) performs far faster than Work(...) so it completes far more quickly.
So then the obvious move for me was this:
IObservable<int> final =
    Observable
        .Range(1, itemCount)
        .SelectMany(n => Observable.Start(() => Work(Transform(n)), concurrency));
This doesn't complete the numbers until the end, and it limits processing to two threads. It appears to do the right thing for what you want, except that I've had to do Work(Transform(...)) together.
The very fact that you want to limit the amount of work you are doing suggests you should be pulling data, not having it pushed at you. I would forget using Rx in this scenario since, fundamentally, what you have described is not a reactive application. Also, Rx is best suited to processing items serially; it works with sequential event streams.
Why not just keep your data source enumerable and use PLINQ, Parallel.ForEach, or Dataflow? All of those sound better suited to your problem.
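For instance, a pull-based PLINQ version of the questioner's pipeline might look like the sketch below (reusing the Transform, Work, threadCount and itemCount stand-ins from the question; this is a shape suggestion, not tested against the original workload):

// Pull-based: PLINQ workers request the next number only when they are free,
// so only roughly threadCount transformed buffers are alive at any moment.
var results = Enumerable.Range(1, itemCount)
    .AsParallel()
    .WithDegreeOfParallelism(threadCount)
    .Select(n => Work(Transform(n)))
    .ToList();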
As @JamesWorld said, it may very well be that you want to use PLINQ to perform this task; it really depends on whether you are actually reacting to data in your real scenario or just iterating through it.
If you choose to go the Reactive route, you can use Merge to control the level of parallelization:
var source = numbers
    .Select(n => Observable.Defer(() =>
        Observable.Start(() => Work(Transform(n)), concurrency)))
    // Maximum concurrency
    .Merge(10)
    // Schedule all the output back onto the event loop scheduler
    .ObserveOn(eventLoop);
The above code will consume all the numbers first (sorry, no way to avoid that); however, by wrapping the processing in a Defer and following it up with a Merge that limits parallelization, only x items can be in flight at a time. Start() takes a scheduler as its second argument, which it uses to execute the provided method. Finally, since you are basically just pushing the values of Transform into Work, I composed them inside the Start method.
As a side note, you can await an Observable; it is equivalent to the code you have, i.e.:
await source; //== await source.LastAsync();
I need to process lines from a database (possibly millions) in parallel in C#. The processing is quite quick (50 to 150 ms per line), but I cannot know this speed before runtime, as it depends on hardware and network.
The ThreadPool, or the newer Task Parallel Library, seems to fit my needs: I am new to threading and want the most efficient way to process the data.
However, these methods do not provide a way to control the execution speed of my tasks (lines per minute): I want to be able to set a maximum speed limit for the processing, or to run it at full speed.
Please note that setting the number of threads of the ThreadPool/TaskFactory does not provide sufficient accuracy for my needs, as I would like to be able to set a speed limit below the 'one thread' speed.
Using a custom scheduler for the TPL seems to be a way to do that, but I did not find a way to implement one.
Furthermore, I'm worried about the efficiency cost of such a setup.
Could you suggest a way to achieve this?
Thanks in advance for your answers.
The TPL provides a convenient programming abstraction on top of the thread pool; I would always pick the TPL when it is an option.
If you wish to throttle the total processing speed, though, there is nothing built in that supports that.
You can, however, measure the total processing speed as you proceed through the data and regulate it by introducing (non-spinning) delays in each thread. The size of the delay can be adjusted dynamically in your code, based on the observed processing speed.
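A minimal sketch of that idea (everything here is illustrative: LoadLinesFromDatabase, ProcessLine, and the target rate are hypothetical stand-ins, not a tested implementation):

string[] lines = LoadLinesFromDatabase(); // hypothetical data source
double maxLinesPerMinute = 5000;          // hypothetical speed limit
long processed = 0;
var stopwatch = System.Diagnostics.Stopwatch.StartNew();

Parallel.ForEach(
    lines,
    new ParallelOptions { MaxDegreeOfParallelism = 4 },
    line =>
    {
        ProcessLine(line); // stand-in for the real per-line work

        long done = System.Threading.Interlocked.Increment(ref processed);

        // If we are ahead of the target rate, sleep off the difference
        // (a non-spinning delay, as described above).
        double targetElapsedMs = done / maxLinesPerMinute * 60000;
        double aheadMs = targetElapsedMs - stopwatch.Elapsed.TotalMilliseconds;
        if (aheadMs > 0)
            Thread.Sleep(TimeSpan.FromMilliseconds(aheadMs));
    });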
I don't see the advantage of limiting speed, but I suggest you look into limiting the maximum degree of parallelism of the operation instead. That can be done via MaxDegreeOfParallelism in the ParallelOptions passed to Parallel.ForEach as the code works over the disparate lines of data. That way you can control the number of slots, for lack of a better term, which can be expanded or reduced depending on the criteria you are working under.
Here is an example using a ConcurrentBag to process lines of disparate data with 2 parallel tasks:
var myLines = new List<string> { "Alpha", "Beta", "Gamma", "Omega" };
var stringResult = new ConcurrentBag<string>();

ParallelOptions parallelOptions = new ParallelOptions();
parallelOptions.MaxDegreeOfParallelism = 2;

Parallel.ForEach(myLines, parallelOptions, line =>
{
    if (line.Contains("e"))
        stringResult.Add(line);
});

Console.WriteLine(string.Join(" | ", stringResult));
// Outputs Beta | Omega (ConcurrentBag does not guarantee order)
Note that ParallelOptions also has a TaskScheduler property with which you can further refine the processing. Finally, for more control, maybe you want to cancel the processing when a specific threshold is reached? If so, look into the CancellationToken property to exit the process early.
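For example, a sketch of early cancellation, reusing myLines and stringResult from the example above (the threshold of 2 results is an arbitrary assumption):

var cts = new CancellationTokenSource();
var cancellableOptions = new ParallelOptions
{
    MaxDegreeOfParallelism = 2,
    CancellationToken = cts.Token
};

try
{
    Parallel.ForEach(myLines, cancellableOptions, line =>
    {
        if (line.Contains("e"))
            stringResult.Add(line);

        // Stop scheduling further iterations once we have enough results.
        if (stringResult.Count >= 2)
            cts.Cancel();
    });
}
catch (OperationCanceledException)
{
    Console.WriteLine("Processing cancelled after reaching the threshold.");
}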
Batch process:
1. read text from file or SQL
2. parse the text into words
3. load the words into SQL
Today: .NET 4.0
Step 1 is very fast.
Steps 2 and 3 take about the same time (avg 0.1 second) for a file of the same size.
On step 3, I insert using a BackgroundWorker and wait for the last insert to complete.
Everything else runs on the main thread.
On a big load this is done several million times.
Step 3 needs to be serial, and in the same order as step 1, to keep the SQL table's PK index from fragmenting.
I tried step 3 in parallel, and the fragmenting index killed it.
The data is fed sorted by the PK.
Other indexes are dropped at the start of the load, then rebuilt at the end.
Where this process is not effective is when the size of the text changes, and from file to file the size changes drastically.
What I would like is to queue steps 1 and 2 so that step 3 is kept as busy as possible.
Step 3 must dequeue the files in the order they were enqueued in step 1 (even if it has to wait).
I need a maximum queue size for memory management (say 4-10).
I would like step 2 to run in parallel, with up to 4 concurrent workers.
We are moving to .NET 4.5.
I'm asking for general guidance on how to implement this.
I am learning that this is a producer-consumer pattern; if it is not, please let me know so I can change the title.
I think TPL Dataflow would be a good way to do this:
For step 2, use a TransformBlock with MaxDegreeOfParallelism set to 4 and BoundedCapacity also set to 4, so that its queue is empty while it's working. It will produce the items in the same order as they came in; you don't have to do anything special for that.
For step 3, use an ActionBlock with BoundedCapacity set to your limit. Then link the two together and start sending items to the TransformBlock, ideally using something like await stepTwoBlock.SendAsync(…) to asynchronously wait when the queue is full.
In code, it would look something like:
async Task ProcessData()
{
    var stepTwoBlock = new TransformBlock<OriginalText, ParsedText>(
        text => Parse(text),
        new ExecutionDataflowBlockOptions
        {
            MaxDegreeOfParallelism = 4,
            BoundedCapacity = 4
        });
    var stepThreeBlock = new ActionBlock<ParsedText>(
        text => LoadIntoDatabase(text),
        new ExecutionDataflowBlockOptions { BoundedCapacity = 10 });
    stepTwoBlock.LinkTo(
        stepThreeBlock, new DataflowLinkOptions { PropagateCompletion = true });

    // this is step one:
    foreach (var id in IdsToProcess)
    {
        OriginalText text = ReadText(id);
        await stepTwoBlock.SendAsync(text);
    }

    stepTwoBlock.Complete();
    await stepThreeBlock.Completion;
}