I'm writing a batching pipeline that processes X outstanding operations every Y seconds. It feels like System.Reactive would be a good fit for this, but I'm not able to get the subscriber to execute in parallel. My code looks like this:
var subject = new Subject<int>();
var concurrentCount = 0;
using var reader = subject
.Buffer(TimeSpan.FromSeconds(1), 100)
.Subscribe(list =>
{
var c = Interlocked.Increment(ref concurrentCount);
if (c > 1) Console.WriteLine("Executing {0} simultaneous batches", c); // This never gets printed, because Subscribe is only ever called on a single thread.
Interlocked.Decrement(ref concurrentCount);
});
Parallel.For(0, 1_000_000, i =>
{
subject.OnNext(i);
});
subject.OnCompleted();
Is there an elegant way to read from this buffered Subject, in a concurrent manner?
The Rx subscription code is always synchronous¹. What you need to do is to remove the processing code from the Subscribe delegate, and make it a side-effect of the observable sequence. Here is how it can be done:
Subject<int> subject = new();
int concurrentCount = 0;
Task processor = subject
.Buffer(TimeSpan.FromSeconds(1), 100)
.Select(list => Observable.Defer(() => Observable.Start(() =>
{
int c = Interlocked.Increment(ref concurrentCount);
if (c > 1) Console.WriteLine($"Executing {c} simultaneous batches");
Interlocked.Decrement(ref concurrentCount);
})))
.Merge(maxConcurrent: 2)
.DefaultIfEmpty() // Prevents exception in corner case (empty source)
.ToTask(); // or RunAsync (either one starts the processor)
for (int i = 0; i < 1_000_000; i++)
{
subject.OnNext(i);
}
subject.OnCompleted();
processor.Wait();
The Select+Observable.Defer+Observable.Start combination converts the source sequence to an IObservable<IObservable<Unit>>. It's a nested sequence, with each inner sequence representing the processing of one list. When the delegate of the Observable.Start completes, the inner sequence emits a Unit value and then completes. The wrapping Defer operator ensures that the inner sequences are "cold", so that they are not started before they are subscribed. Then follows the Merge operator, which unwraps the outer sequence to a flat IObservable<Unit> sequence. The maxConcurrent parameter configures how many of the inner sequences will be subscribed concurrently. Every time an inner sequence is subscribed by the Merge operator, the corresponding Observable.Start delegate starts running on a ThreadPool thread.
If you set the maxConcurrent too high, the ThreadPool may run out of workers (in other words it may become saturated), and
the concurrency of your code will then become dependent on the ThreadPool availability. If you wish, you can increase the number of workers that the ThreadPool creates instantly on demand, by using the ThreadPool.SetMinThreads method. But if your workload is CPU-bound, and you increase the worker threads above the Environment.ProcessorCount value, then most probably your CPU will be saturated instead.
If your workload is asynchronous, you can replace the Observable.Defer+Observable.Start combo with the Observable.FromAsync operator, as shown here.
¹ An unpublished library exists, the AsyncRx.NET, that plays with the idea of asynchronous subscriptions. It is based on the new interfaces IAsyncObservable<T> and IAsyncObserver<T>.
You say this:
// This never gets printed, because Subscribe is only ever called on a single thread.
It's just not true. The reason nothing gets printed is because the code in the Subscribe happens in a locked manner - only one thread at a time executes in a Subscribe so you are incrementing the value and then decrementing it almost immediately. And since it starts at zero it never has a chance to rise above 1.
Now that's just because of the Rx contract. Only one thread in subscribe at once.
We can fix that.
Try this code:
using var reader = subject
.Buffer(TimeSpan.FromSeconds(1), 100)
.SelectMany(list =>
Observable
.Start(() =>
{
var c = Interlocked.Increment(ref concurrentCount);
Console.WriteLine("Starting {0} simultaneous batches", c);
})
.Finally(() =>
{
var c = Interlocked.Decrement(ref concurrentCount);
Console.WriteLine("Ending {0} simultaneous batches", c);
}))
.Subscribe();
Now when I run it (with less than the 1_000_000 iterations that you set) I get output like this:
Starting 1 simultaneous batches
Starting 4 simultaneous batches
Ending 3 simultaneous batches
Ending 2 simultaneous batches
Starting 3 simultaneous batches
Starting 3 simultaneous batches
Ending 1 simultaneous batches
Ending 2 simultaneous batches
Starting 4 simultaneous batches
Starting 5 simultaneous batches
Ending 3 simultaneous batches
Starting 2 simultaneous batches
Starting 2 simultaneous batches
Ending 2 simultaneous batches
Starting 3 simultaneous batches
Ending 0 simultaneous batches
Ending 4 simultaneous batches
Ending 1 simultaneous batches
Starting 1 simultaneous batches
Starting 1 simultaneous batches
Ending 0 simultaneous batches
Ending 0 simultaneous batches
Related
I can already see it's not by the incorrect increments, but there's just one small piece of the puzzle I can't quite seem to catch.
We have the following code:
internal class StupidObject
{
static public SemaphoreSlim semaphore = new SemaphoreSlim(0, 100);
private int counter;
public bool MethodCall() => counter++ == 0;
public int GetCounter() => counter;
}
And the following test code to try and see if it's an atomic operation:
var sharedObj = new StupidObject();
var resultTasks = new Task[100];
for (int i = 0; i < 100; i++)
{
resultTasks[i] = Task.Run(async () =>
{
await StupidObject.semaphore.WaitAsync();
if (sharedObj.MethodCall())
{
Console.WriteLine("True");
};
});
}
Console.WriteLine("Done");
Console.ReadLine();
StupidObject.semaphore.Release(100);
Console.ReadLine();
Console.WriteLine(sharedObj.GetCounter());
Console.ReadLine();
I expect to see multiple True's written to the console, but I ever see a single one.
Why is that? By my understanding, a ++ operation reads the value, increments the read value, and then stores that value to the variable.
Those are 3 operations. If we had a race condition, where thread A did the following:
Reads value to be 0.
Increments read value by 1.
And another thread B did the same things, but beat thread A to the third operation as following:
Writes read value to variable.
When A finishes writing the incremented read value, it should print back 0, same with thread B after it has done its write operation.
Am I missing something at the design aspect of things, or is my test not good enough to make this exact situation come to fruition?
Example without the Task Parallel Library (still yields a single True to the console):
var sharedObj = new StupidObject();
var resultTasks = new Thread[10000];
for (int i = 0; i < 10000; i++)
{
resultTasks[i] = new Thread(() =>
{
StupidObject.semaphore.Wait();
if (sharedObj.MethodCall())
{
Console.WriteLine("True");
};
});
resultTasks[i].IsBackground = false;
resultTasks[i].Start();
}
Console.WriteLine("Done");
Console.ReadLine();
StupidObject.semaphore.Release(10000);
What Liam said about Console.WriteLine is possible, but also there's another thing.
Starting Tasks doesn't equal starting threads, and even starting threads doesn't guarantee that all threads will begin immediatelly. Starting 100 short tasks probably won't even fill .Net's thread pool significantly, because those tasks end quickly and thread pool's manager probably won't start more than 3-5 threads. That's not the "immediate" and "parallel" you'd like to see when you want to start parallel 100 increments to race with each other, right? Remember that Tasks are queued first, then assigned to threads.
Note that the StupidObject's counter starts with zero and that's the ONLY MOMENT EVER that the value is zero. If ANY thread wins the race and successfully writes an update to that integer, you'll get FALSE in all future tasks, because it's already 1.
And if there are many tasks on the thread pool's queue, something first has to notice that fact. At program's start, thread pool lacks threads. They are not started in dozens right at program start. They are started on demand. Most probably you fill up the queue with 100 tasks, threadpool's thread is created, picks first task, bumps counter to 1, then maybe thread pool starts new threads to consume tasks faster.
To get a bit better image what's happening, instead of printing out 'true', collect values observed by return counter++: let each task run, finish, store its value in Task's .Result, then run threads/tasks, then wait for all of then to stop, then collect .Results and write a histogram of those values. Even if you don't see 5 zeros, maybe you will see 3 ones, 7 twos, 2 threes and so on.
In a LINQ Query, I have used .AsParallel as follows:
var completeReservationItems = from rBase in reservation.AsParallel()
join rRel in relationship.AsParallel() on rBase.GroupCode equals rRel.SourceGroupCode
join rTarget in reservation.AsParallel() on rRel.TargetCode equals rTarget.GroupCode
where rRel.ProgramCode == programCode && rBase.StartDate <= rTarget.StartDate && rBase.EndDate >= rTarget.EndDate
select new Object
{
//Initialize based on the query
};
Then, I have created two separate Tasks and was running them in parallel, passing the same Lists to both the methods as follows:
Task getS1Status = Task.Factory.StartNew(
() =>
{
RunLinqQuery(params);
});
Task getS2Status = Task.Factory.StartNew(
() =>
{
RunLinqQuery(params);
});
Task.WaitAll(getS1Status, getS2Status);
I was capturing the timings and was surprised to see that the timings were as follows:
Above scenario: 6 sec (6000 ms)
Same code, running sequentially instead of 2 Tasks: 50 ms
Same code, but without .AsParallel() in the LINQ: 50 ms
I wanted to understand why this is taking so long in the above scenario.
Posting this as answer only because I have some code to show.
Firstly, I dont know how many threads will be created with AsParallel(). Documentation dont say anything about it https://msdn.microsoft.com/en-us/library/dd413237(v=vs.110).aspx
Imagine following code
void RunMe()
{
foreach (var threadId in Enumerable.Range(0, 100)
.AsParallel()
.Select(x => Thread.CurrentThread.ManagedThreadId)
.Distinct())
Console.WriteLine(threadId);
}
How much thread's ids we will see? For me each time will see different number of threads, example output:
30 // only one thread!
Next time
27 // several threads
13
38
10
43
30
I think, number of threads depends of current scheduler. We can always define maximum number of threads by calling WithDegreeOfParallelism (https://msdn.microsoft.com/en-us/library/dd383719(v=vs.110).aspx) method, example
void RunMe()
{
foreach (var threadId in Enumerable.Range(0, 100)
.AsParallel()
.WithDegreeOfParallelism(2)
.Select(x => Thread.CurrentThread.ManagedThreadId)
.Distinct())
Console.WriteLine(threadId);
}
Now, output will contains maximum 2 threads.
7
40
Why this important? As I said, number of threads can directly influence on performance.
But, this is not all problems. In your 1 scenario, you are creating new tasks (which will perform inside thread pool and can add additional overhead), and then, you are calling Task.WaitAll. Take a look on source code for it https://referencesource.microsoft.com/#mscorlib/system/threading/Tasks/Task.cs,72b6b3fa5eb35695 , Im sure that those for loop by task will add additional overhead, and, in situation when AsParallel will take too much threads inside first task, next task can start continiously. Moreover, this CAN be happen, so, if you will run your 1 scenario 1000 times, probably, you will get very different results.
So, my last argument that you try to measure parallel code, but it is very hard to do it right. Im not recommend to use parallel stuff as much as you can, because it can raise performance degradation, if you dont know exactly, what are you doing.
We have an application, wherein we have a materialized array of items which we are going to process through a Reactive pipeline. It looks a little like this
EventLoopScheduler eventLoop = new EventLoopScheduler();
IScheduler concurrency = new TaskPoolScheduler(
new TaskFactory(
new LimitedConcurrencyLevelTaskScheduler(threadCount)));
IEnumerable<int> numbers = Enumerable.Range(1, itemCount);
// 1. transform on single thread
IConnectableObservable<byte[]> source =
numbers.Select(Transform).ToObservable(eventLoop).Publish();
// 2. naive parallelization, restricts parallelization to Work
// only; chunk up sequence into smaller sequences and process
// in parallel, merging results
IObservable<int> final = source.
Buffer(10).
Select(
batch =>
batch.
ToObservable(concurrency).
Buffer(10).
Select(
concurrentBatch =>
concurrentBatch.
Select(Work).
ToArray().
ToObservable(eventLoop)).
Merge()).
Merge();
final.Subscribe();
source.Connect();
Await(final).Wait();
If you are really curious to play with this, the stand-in methods look like
private async static Task Await(IObservable<int> final)
{
await final.LastOrDefaultAsync();
}
private static byte[] Transform(int number)
{
if (number == itemCount)
{
Console.WriteLine("numbers exhausted.");
}
byte[] buffer = new byte[1000000];
Buffer.BlockCopy(bloat, 0, buffer, 0, bloat.Length);
return buffer;
}
private static int Work(byte[] buffer)
{
Console.WriteLine("t {0}.", Thread.CurrentThread.ManagedThreadId);
Thread.Sleep(50);
return 1;
}
A little explanation. Range(1, itemCount) simulates raw inputs, materialized from a data-source. Transform simulates an enrichment process each input must go through, and results in a larger memory footprint. Work is a "lengthy" process which operates on the transformed input.
Ideally, we want to minimize the number of transformed inputs held concurrently by the system, while maximizing throughput by parallelizing Work. The number of transformed inputs in memory should be batch size (10 above) times concurrent work threads (threadCount).
So for 5 threads, we should retain 50 Transform items at any given time; and if, as here, the transform is a 1MB byte buffer, then we would expect memory consumption to be at about 50MB throughout the run.
What I find is quite different. Namely that Reactive is eagerly consuming all numbers, and Transform them up front (as evidenced by numbers exhausted. message), resulting in a massive memory spike up front (#1GB for 1000 itemCount).
My basic question is: Is there a way to achieve what I need (ie minimized consumption, throttled by multi-threaded batching)?
UPDATE: sorry for reversal James; at first, i did not think paulpdaniels and Enigmativity's composition of Work(Transform) applied (this has to do with the nature of our actual implementation, which is more complex than the simple scenario provided above), however, after some further experimentation, i may be able to apply the same principles: ie defer Transform until batch executes.
You have made a couple of mistakes with your code that throws off all of your conclusions.
First up, you've done this:
IEnumerable<int> numbers = Enumerable.Range(1, itemCount);
You've used Enumerable.Range which means that when you call numbers.Select(Transform) you are going to burn through all of the numbers as fast as a single thread can take it. Rx hasn't even had a chance to do any work because up till this point your pipeline is entirely enumerable.
The next issue is in your subscriptions:
final.Subscribe();
source.Connect();
Await(final).Wait();
Because you call final.Subscribe() & Await(final).Wait(); you are creating two separate subscriptions to the final observable.
Since there is a source.Connect() in the middle the second subscription may be missing out on values.
So, let's try to remove all of the cruft that's going on here and see if we can work things out.
If you go down to this:
IObservable<int> final =
Observable
.Range(1, itemCount)
.Select(n => Transform(n))
.Select(bs => Work(bs));
Things work well. The numbers get exhausted right at the end, and processing 20 items on my machine takes about 1 second.
But this is processing everything in sequence. And the Work step provides back-pressure on Transform to slow down the speed at which it consumes the numbers.
Let's add concurrency.
IObservable<int> final =
Observable
.Range(1, itemCount)
.Select(n => Transform(n))
.SelectMany(bs => Observable.Start(() => Work(bs)));
This processes 20 items in 0.284 seconds, and the numbers exhaust themselves after 5 items are processed. There is no longer any back-pressure on the numbers. Basically the scheduler is handing all of the work to the Observable.Start so it is ready for the next number immediately.
Let's reduce the concurrency.
IObservable<int> final =
Observable
.Range(1, itemCount)
.Select(n => Transform(n))
.SelectMany(bs => Observable.Start(() => Work(bs), concurrency));
Now the 20 items get processed in 0.5 seconds. Only two get processed before the numbers are exhausted. This makes sense as we've limited concurrency to two threads. But still there's no back pressure on the consumption of the numbers so they get chewed up pretty quickly.
Having said all of this, I tried to construct a query with the appropriate back pressure, but I couldn't find a way. The crux comes down to the fact that Transform(...) performs far faster than Work(...) so it completes far more quickly.
So then the obvious move for me was this:
IObservable<int> final =
Observable
.Range(1, itemCount)
.SelectMany(n => Observable.Start(() => Work(Transform(n)), concurrency));
This doesn't complete the numbers until the end, and it limits processing to two threads. It appears to do the right thing for what you want, except that I've had to do Work(Transform(...)) together.
The very fact that you want to limit the amount of work you are doing suggests you should be pulling data, not having it pushed at you. I would forget using Rx in this scenario, as fundamentally, what you have described is not a reactive application. Also, Rx is best suited processing items serially; it uses sequential event streams.
Why not just keep your data source enumerable, and use PLinq, Parallel.ForEach or DataFlow? All of those sound better suited for your problem.
As #JamesWorld said it may very well be that you want to use PLinq to perform this task, it really depends on if you are actually reacting to data in your real scenario or just iterating through it.
If you choose to go the Reactive route you can use Merge to control the level of parallelization occurring:
var source = numbers
.Select(n =>
Observable.Defer(() => Observable.Start(() => Work(Transform(n)), concurrency)))
//Maximum concurrency
.Merge(10)
//Schedule all the output back onto the event loop scheduler
.ObserveOn(eventLoop);
The above code will consume all the numbers first (sorry no way to avoid that), however, by wrapping the processing in a Defer and following it up with a Merge that limits parallelization, only x number of items can be in flight at a time. Start() takes a scheduler as the second argument which it uses to execute to the provided method. Finally, Since you are basically just pushing the values of Transform into Work I composed them within the Start method.
As a side note, you can await an Observable and it will be equivalent to the code you have, i.e:
await source; //== await source.LastAsync();
I use 2 Parallel.ForEach nested loops to retrieve information quickly from a url. This is the code:
while (searches.Count > 0)
{
Parallel.ForEach(searches, (search, loopState) =>
{
Parallel.ForEach(search.items, item =>
{
RetrieveInfo(item);
}
);
}
);
}
The outer ForEach has a list of, for example 10, whilst the inner ForEach has a list of 5. This means that I'm going to query the url 50 times, however I query it 5 times simultaneously (inner ForEach).
I need to add a delay for the inner loop so that after it queries the url, it waits for x seconds - the time taken for the inner loop to complete the 5 requests.
Using Thread.Sleep is not a good idea because it will block the complete thread and possibly the other parallel tasks.
Is there an alternative that might work?
To my understanding, you have 50 tasks and you wish to process 5 of them at a time.
If so, you should look into ParallelOptions.MaxDegreeOfParallelism to process 50 tasks with a maximum degree of parallelism at 5. When one task stops, another task is permitted to start.
If you wish to have tasks processed in chunks of five, followed by another chunk of five (as in, you wish to process chunks in serial), then you would want code similar to
for(...)
{
Parallel.ForEach(
[paralleloptions,]
set of 5, action
)
}
Batch
read text from file or SQL
parse the text into words
load the words into SQL
Today
.NET 4.0
Step 1 is very fast.
Steps 2 and 3 are about the same length (avg 0.1 second) for the same size file.
On step 3 insert using BackGroundWorker and wait for last to complete.
Everything else is on the main thread.
On a big load will do this several million times.
Need step 3 to be serial and in the same order as 1.
This is to keep the SQL table PK index from fracturing.
Tried step 3 in parallel and fracturing the index killed it.
This data is fed sorted by the PK.
Other indexes are dropped at the start of the load then rebuilt at the end of the load.
Where this process is not effective is when the size of text changes.
And the size of the text from file to file does change drastically.
What I would like is to queue 1 and 2 so 3 is kept as busy as possible.
Need step 3 to dequeue the files in order they were enqueued in 1 (even if it waits).
Need a maximum queue size for memory management (like 4-10).
Would like to have step 2 parallel with up to 4 concurrent.
Moving to .NET 4.5.
Asking for general guidance on how to implement this?
I am learning that this is a producer consumer pattern.
If this is not a producer consumer pattern please let me know so I can change the title.
I think TPL Dataflow would be a good way to do this:
For step 2, you would use a TransformBlock with MaxDegreeOfParallelism set to 4 and BoundedCapacity also set to 4, so that its queues are empty when working. It will produce the items in the same order as they came in, you don't have to do anything special for that. For step 3, use an ActionBlock, with BoundedCapacity set to your limit. Then link the two together and start sending items to the TransformBlock, ideally using something like await stepTwoBlock.SendAsync(…), to asynchronously wait if the queue is full.
In code, it would look something like:
async Task ProcessData()
{
var stepTwoBlock = new TransformBlock<OriginalText, ParsedText>(
text => Parse(text),
new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = 4,
BoundedCapacity = 4
});
var stepThreeBlock = new ActionBlock<ParsedText>(
text => LoadIntoDatabase(text),
new ExecutionDataflowBlockOptions { BoundedCapacity = 10 });
stepTwoBlock.LinkTo(
stepThreeBlock, new DataflowLinkOptions { PropagateCompletion = true });
// this is step one:
foreach (var id in IdsToProcess)
{
OriginalText text = ReadText(id);
await stepTwoBlock.SendAsync(text);
}
stepTwoBlock.Complete();
await stepThreeBlock.Completion;
}