How to limit consuming a sequence with Reactive? - C#

We have an application wherein we have a materialized array of items which we are going to process through a Reactive pipeline. It looks a little like this:
EventLoopScheduler eventLoop = new EventLoopScheduler();
IScheduler concurrency = new TaskPoolScheduler(
    new TaskFactory(
        new LimitedConcurrencyLevelTaskScheduler(threadCount)));

IEnumerable<int> numbers = Enumerable.Range(1, itemCount);

// 1. transform on single thread
IConnectableObservable<byte[]> source =
    numbers.Select(Transform).ToObservable(eventLoop).Publish();

// 2. naive parallelization, restricts parallelization to Work only;
// chunk up sequence into smaller sequences and process in parallel,
// merging results
IObservable<int> final = source.
    Buffer(10).
    Select(batch =>
        batch.
            ToObservable(concurrency).
            Buffer(10).
            Select(concurrentBatch =>
                concurrentBatch.
                    Select(Work).
                    ToArray().
                    ToObservable(eventLoop)).
            Merge()).
    Merge();

final.Subscribe();
source.Connect();
Await(final).Wait();
If you are really curious to play with this, the stand-in methods look like:
private static async Task Await(IObservable<int> final)
{
    await final.LastOrDefaultAsync();
}

private static byte[] Transform(int number)
{
    if (number == itemCount)
    {
        Console.WriteLine("numbers exhausted.");
    }
    byte[] buffer = new byte[1000000];
    Buffer.BlockCopy(bloat, 0, buffer, 0, bloat.Length);
    return buffer;
}

private static int Work(byte[] buffer)
{
    Console.WriteLine("t {0}.", Thread.CurrentThread.ManagedThreadId);
    Thread.Sleep(50);
    return 1;
}
A little explanation. Range(1, itemCount) simulates raw inputs, materialized from a data-source. Transform simulates an enrichment process each input must go through, and results in a larger memory footprint. Work is a "lengthy" process which operates on the transformed input.
Ideally, we want to minimize the number of transformed inputs held concurrently by the system, while maximizing throughput by parallelizing Work. The number of transformed inputs in memory should be batch size (10 above) times concurrent work threads (threadCount).
So for 5 threads, we should retain 50 Transform items at any given time; and if, as here, the transform is a 1MB byte buffer, then we would expect memory consumption to be at about 50MB throughout the run.
What I find is quite different: namely, that Rx eagerly consumes all the numbers and Transforms them up front (as evidenced by the "numbers exhausted." message), resulting in a massive memory spike at the start (~1GB for an itemCount of 1000).
My basic question is: Is there a way to achieve what I need (i.e. minimized consumption, throttled by multi-threaded batching)?
UPDATE: Sorry for the reversal, James; at first I did not think paulpdaniels' and Enigmativity's composition of Work(Transform) applied (this has to do with the nature of our actual implementation, which is more complex than the simple scenario provided above). However, after some further experimentation, I may be able to apply the same principle: i.e. defer Transform until the batch executes.

You have made a couple of mistakes with your code that throw off all of your conclusions.
First up, you've done this:
IEnumerable<int> numbers = Enumerable.Range(1, itemCount);
You've used Enumerable.Range, which means that when you call numbers.Select(Transform) you are going to burn through all of the numbers as fast as a single thread can take it. Rx hasn't even had a chance to do any work because up to this point your pipeline is entirely enumerable.
The next issue is in your subscriptions:
final.Subscribe();
source.Connect();
Await(final).Wait();
Because you call both final.Subscribe() and Await(final).Wait(), you are creating two separate subscriptions to the final observable.
Since there is a source.Connect() in the middle the second subscription may be missing out on values.
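A sketch of one way to avoid the double subscription, assuming the rest of your code stays the same: convert a single subscription to a task and wait on that.

// requires using System.Reactive.Threading.Tasks;
var done = final.LastOrDefaultAsync().ToTask(); // the one and only subscription
source.Connect();
done.Wait();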
So, let's try to remove all of the cruft that's going on here and see if we can work things out.
If you go down to this:
IObservable<int> final =
    Observable
        .Range(1, itemCount)
        .Select(n => Transform(n))
        .Select(bs => Work(bs));
Things work well. The numbers get exhausted right at the end, and processing 20 items on my machine takes about 1 second.
But this is processing everything in sequence. And the Work step provides back-pressure on Transform to slow down the speed at which it consumes the numbers.
Let's add concurrency.
IObservable<int> final =
    Observable
        .Range(1, itemCount)
        .Select(n => Transform(n))
        .SelectMany(bs => Observable.Start(() => Work(bs)));
This processes 20 items in 0.284 seconds, and the numbers exhaust themselves after 5 items are processed. There is no longer any back-pressure on the numbers. Basically the scheduler is handing all of the work to the Observable.Start so it is ready for the next number immediately.
Let's reduce the concurrency.
IObservable<int> final =
    Observable
        .Range(1, itemCount)
        .Select(n => Transform(n))
        .SelectMany(bs => Observable.Start(() => Work(bs), concurrency));
Now the 20 items get processed in 0.5 seconds. Only two get processed before the numbers are exhausted. This makes sense as we've limited concurrency to two threads. But still there's no back pressure on the consumption of the numbers so they get chewed up pretty quickly.
Having said all of this, I tried to construct a query with the appropriate back pressure, but I couldn't find a way. The crux comes down to the fact that Transform(...) performs far faster than Work(...) so it completes far more quickly.
So then the obvious move for me was this:
IObservable<int> final =
    Observable
        .Range(1, itemCount)
        .SelectMany(n => Observable.Start(() => Work(Transform(n)), concurrency));
This doesn't complete the numbers until the end, and it limits processing to two threads. It appears to do the right thing for what you want, except that I've had to do Work(Transform(...)) together.

The very fact that you want to limit the amount of work you are doing suggests you should be pulling data, not having it pushed at you. I would forget using Rx in this scenario, as fundamentally what you have described is not a reactive application. Also, Rx is best suited to processing items serially; it deals in sequential event streams.
Why not just keep your data source enumerable, and use PLINQ, Parallel.ForEach or TPL Dataflow? All of those sound better suited to your problem.
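For instance, here is a minimal PLINQ sketch of the pipeline in the question (using the same Transform, Work, itemCount and threadCount stand-ins). Because PLINQ pulls from the enumerable, only a bounded number of transformed buffers exist at any one time:

int[] results = Enumerable.Range(1, itemCount)
    .AsParallel()
    .WithDegreeOfParallelism(threadCount) // cap the worker threads
    .Select(n => Work(Transform(n)))      // Transform is deferred until a worker pulls n
    .ToArray();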

As @JamesWorld said, it may very well be that you want to use PLINQ to perform this task; it really depends on whether you are actually reacting to data in your real scenario or just iterating through it.
If you choose to go the Reactive route you can use Merge to control the level of parallelization occurring:
var source = numbers
    .Select(n =>
        Observable.Defer(() => Observable.Start(() => Work(Transform(n)), concurrency)))
    // Maximum concurrency
    .Merge(10)
    // Schedule all the output back onto the event loop scheduler
    .ObserveOn(eventLoop);
The above code will consume all the numbers first (sorry, no way to avoid that); however, by wrapping the processing in a Defer and following it up with a Merge that limits parallelization, only x number of items can be in flight at a time. Start() takes a scheduler as its second argument, which it uses to execute the provided method. Finally, since you are basically just pushing the values of Transform into Work, I composed them within the Start method.
As a side note, you can await an Observable and it will be equivalent to the code you have, i.e.:
await source; //== await source.LastAsync();

Related

Generate events with dynamically changeable time interval

I have a stream of numeric values that arrive at a fast rate (sub-millisecond), and I want to display their "instant value" on screen. For usability reasons I should downsample that stream, updating the displayed value at a configurable time interval. That interval would be set by user preference, via dragging a slider.
So what I want to do is to store the last value of the source stream in a variable, and have an auto-retriggering timer that updates the displayed value with that variable's value.
I am thinking of using Rx, something like this:
Observable.FromEventPattern<double>(_source, "NewValue")
    .Sample(TimeSpan.FromMilliseconds(100))
    .Subscribe(ep => instantVariable = ep.EventArgs);
The problem is that I cannot, as far as I know, dynamically change the interval.
I can imagine there are ways to do it using timers, but I would prefer to use Rx.
Assuming you can model the sample-size changes as an observable, you can do this:
IObservable<int> sampleSizeObservable;

var result = sampleSizeObservable
    .Select(i => Observable
        .FromEventPattern<double>(_source, "NewValue")
        .Sample(TimeSpan.FromMilliseconds(i)))
    .Switch();
Switch basically does what you want, but via Rx. It doesn't "change" the interval: Observables are (generally) supposed to be immutable. Rather, whenever the sample size changes, it creates a new sampling observable, subscribes to the new observable, drops the subscription to the old one, and melds those two subscriptions together so it looks seamless to a client subscriber.
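For completeness, here is a sketch of how sampleSizeObservable might be fed from the slider mentioned in the question (the _slider object and its ValueChanged event are assumptions, not part of the question):

IObservable<int> sampleSizeObservable =
    Observable.FromEventPattern<int>(_slider, "ValueChanged")
        .Select(ep => ep.EventArgs)
        .StartWith(100); // assumed initial sampling interval in ms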
Here is a custom Sample operator with an interval that can be changed at any time before or during the lifetime of a subscription.
/// <summary>Samples the source observable sequence at a dynamic interval
/// controlled by a delegate.</summary>
public static IObservable<T> Sample<T>(this IObservable<T> source,
    out Action<TimeSpan> setInterval)
{
    var intervalController = new ReplaySubject<TimeSpan>(1);
    setInterval = interval => intervalController.OnNext(interval);
    return source.Publish(shared => intervalController
        .Select(timeSpan => timeSpan == Timeout.InfiniteTimeSpan ?
            Observable.Empty<long>() : Observable.Interval(timeSpan))
        .Switch()
        .WithLatestFrom(shared, (_, x) => x)
        .TakeUntil(shared.LastOrDefaultAsync()));
}
The out Action<TimeSpan> setInterval parameter is the mechanism that controls the interval of the sampling. It can be invoked with any non-negative TimeSpan argument, or with the special value Timeout.InfiniteTimeSpan that has the effect of suspending the sampling.
This operator differs from the built-in Sample in the case where the source sequence produces values slower than the desired sampling interval. The built-in Sample adjusts the sampling to the tempo of the source, never emitting the same value twice. By contrast this operator maintains its own tempo, making it possible to emit the same value more than once. In case this is undesirable, you can attach the DistinctUntilChanged operator after the Sample.
Usage example:
var subscription = Observable.FromEventPattern<double>(_source, "NewValue")
    .Sample(out var setSampleInterval)
    .Subscribe(ep => instantVariable = ep.EventArgs);

setSampleInterval(TimeSpan.FromMilliseconds(100)); // Initial sampling interval
//...
setSampleInterval(TimeSpan.FromMilliseconds(500)); // Slow down
//...
setSampleInterval(Timeout.InfiniteTimeSpan); // Suspend
Try this one and let me know if it works:
Observable
    .FromEventPattern<double>(_source, "NewValue")
    .Window(() => Observable.Timer(TimeSpan.FromMilliseconds(100)))
    .SelectMany(x => x.LastAsync());
If the data comes in at sub-millisecond intervals, handling every item would require real-time programming. Garbage collection in the .NET Framework - like most other things in C# - is a far step away from that. You can maybe get close in some areas, but you can never remotely guarantee that the program will be able to keep up with that data intake.
Aside from that, what you want sounds like rate-limiting code: code that will not run more often than a given interval. I wrote this example for a multithreading demo, but it should get you started on the idea:
int interval = 20;
DateTime dueTime = DateTime.Now.AddMilliseconds(interval);

while (true)
{
    if (DateTime.Now >= dueTime)
    {
        // insert code here
        // update the next dueTime
        dueTime = DateTime.Now.AddMilliseconds(interval);
    }
    else
    {
        // just yield so we don't tax the CPU
        Thread.Sleep(1);
    }
}
Note that DateTime actually has limited precision, often not going lower than 5-20 ms. The Stopwatch is a lot more precise. But honestly, anything beyond 60 updates per second (a 17 ms interval) will probably not be human readable.
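For reference, here is a sketch of the same rate limiter built on Stopwatch instead of DateTime.Now:

var sw = System.Diagnostics.Stopwatch.StartNew();
long intervalMs = 20;
long dueMs = intervalMs;

while (true)
{
    if (sw.ElapsedMilliseconds >= dueMs)
    {
        // insert code here
        dueMs = sw.ElapsedMilliseconds + intervalMs;
    }
    else
    {
        // yield to avoid taxing the CPU
        Thread.Sleep(1);
    }
}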
Another issue is that writing to the GUI is costly. You will never notice it if you only write once per user-triggered event, but once you send updates from a loop (including one running in another thread) you can quickly run into issues. In my first multithreading test I did so much complicated progress reporting that I ended up plainly overloading the GUI thread with changes.

TPL Block with round robin link?

I am using TPL a lot and have large data flow pipeline structure.
As part of the pipeline network I want to write some data to azure blob storage. We have a lot of data therefore we have 4 storage accounts and we want to distribute the data evenly between them.
I wanted to continue using the dataflow pipeline pattern, so I want to implement a SourceBlock such that, if I link it to several target blocks, it will send messages to them round-robin. BufferBlock is not good enough because it sends each message to the first block that accepts it, and assuming all the target blocks have a large bounded capacity, all the messages will go to the first target block. BroadcastBlock is no good either because I don't want duplicates.
Any recommendations? Implementing the ISourceBlock interface with the round-robin behavior seems not so simple, and I wondered if there are simpler solutions out there, or any extensions to TPL that I am not familiar with?
Are you aware of the possibility of linking blocks with a predicate? Here is a very simple and not well tested solution as a sample:
var buffer = new BufferBlock<int>();
var consumer1 = new ActionBlock<int>(i => Console.WriteLine($"First: {i}"));
var consumer2 = new ActionBlock<int>(i => Console.WriteLine($"Second: {i}"));
var consumer3 = new ActionBlock<int>(i => Console.WriteLine($"Third: {i}"));
var consumer4 = new ActionBlock<int>(i => Console.WriteLine($"Forth: {i}"));

buffer.LinkTo(consumer1, i => Predicate(0));
buffer.LinkTo(consumer2, i => Predicate(1));
buffer.LinkTo(consumer3, i => Predicate(2));
buffer.LinkTo(consumer4, i => Predicate(3));
buffer.LinkTo(DataflowBlock.NullTarget<int>());

for (var i = 0; i < 10; ++i)
{
    buffer.Post(i);
}
buffer.Completion.Wait();
One of the outputs:
Third: 2
First: 0
Forth: 3
Second: 1
Second: 5
Second: 9
Third: 6
Forth: 7
First: 4
First: 8
What is going on here is that the Predicate helper maintains a counter of operations; if the current counter value matches a consumer, the message goes to that consumer and the counter is incremented. Note that you should still link the block without any predicate at least once (the NullTarget link above) to avoid memory issues from unmatched messages; it's also a good idea to test the round-robin with a block that monitors lost messages.
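The Predicate helper itself is not shown in the answer; a minimal sketch consistent with the description above might look like this (assuming the BufferBlock offers messages to its targets one at a time, so the shared counter is not contended):

private static int _current;

private static bool Predicate(int consumerIndex)
{
    // accept the message only when it is this consumer's turn
    if (_current % 4 == consumerIndex)
    {
        _current++;
        return true;
    }
    return false;
}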

Parallel.For() with Interlocked.CompareExchange(): poorer performance and slightly different results to serial version

I experimented with calculating the mean of a list using Parallel.For(). I decided against it as it is about four times slower than a simple serial version. Yet I am intrigued by the fact that it does not yield exactly the same result as the serial one and I thought it would be instructive to learn why.
My code is:
public static double Mean(this IList<double> list)
{
    double sum = 0.0;
    Parallel.For(0, list.Count, i =>
    {
        double initialSum;
        double incrementedSum;
        SpinWait spinWait = new SpinWait();
        // Try incrementing the sum until the loop finds the initial sum
        // unchanged, so that it can safely replace it with the incremented one.
        while (true)
        {
            initialSum = sum;
            incrementedSum = initialSum + list[i];
            if (initialSum == Interlocked.CompareExchange(ref sum, incrementedSum, initialSum)) break;
            spinWait.SpinOnce();
        }
    });
    return sum / list.Count;
}
When I run the code on a random sequence of 2000000 points, I get results that differ in the last 2 digits from the serial mean.
I searched Stack Overflow and found this: VB.NET running sum in nested loop inside Parallel.for Synclock loses information. My case, however, is different from the one described there. There, a thread-local variable temp is the cause of inaccuracy, but I use a single sum that is updated (I hope) according to the textbook Interlocked.CompareExchange() pattern. The question is of course moot because of the poor performance (which surprises me, but I am aware of the overhead), yet I am curious whether there is something to be learnt from this case.
Your thoughts are appreciated.
Using double is the underlying problem; you can convince yourself that the synchronization is not the cause by using long instead. The results you got are in fact correct, but that never makes a programmer happy.
You discovered that floating point math is commutative but not associative. Or in other words, a + b == b + a but a + b + c != a + c + b. Implicit in your code is that the order in which the numbers are added is quite random.
This C++ question talks about it as well.
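A quick demonstration of the non-associativity (the magnitudes are chosen so that adding 1.0 to 1e16 falls below the precision of a double):

double a = 1e16, b = -1e16, c = 1.0;
Console.WriteLine((a + b) + c); // prints 1
Console.WriteLine(a + (b + c)); // prints 0, because b + c rounds back to -1e16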
The accuracy issue is very well addressed in the other answers so I won't repeat it here, other than to say never trust the low bits of your floating point values. Instead I'll try to explain the performance hit you're seeing and how to avoid it.
Since you haven't shown your sequential code, I'll assume the absolute simplest case:
double sum = list.Sum();
This is a very simple operation that should work about as fast as it is possible to go on one CPU core. With a very large list it seems like it should be possible to leverage multiple cores to sum the list. And, as it turns out, you can:
double sum = list.AsParallel().Sum();
A few runs of this on my laptop (i3 with 2 cores/4 logical procs) yields a speedup of about 2.6 times over multiple runs against 2 million random numbers (same list, multiple runs).
Your code however is much, much slower than the simple case above. Instead of simply breaking the list into blocks that are summed independently and then summing the results you are introducing all sorts of blocking and waiting in order to have all of the threads update a single running sum.
Those extra waits, the much more complex code that supports them, creating objects and adding more work for the garbage collector all contribute to a much slower result. Not only are you wasting a whole lot of time on each item in the list but you are essentially forcing the program to do a sequential operation by making it wait for the other threads to leave the sum variable alone long enough for you to update it.
Assuming that the operation you are actually performing is more complex than a simple Sum() can handle, you may find that the Aggregate() method is more useful to you than Parallel.For.
There are several overloads of the Aggregate extension, including one that is effectively a Map pattern implementation, with similarities to how big data systems like MapReduce work. Documentation is here.
This version of Aggregate uses an accumulator seed (the starting value for each thread) and three functions:
updateAccumulatorFunc is called for each item in the sequence and returns an updated accumulator value
combineAccumulatorsFunc is used to combine the accumulators from each partition (thread) in your parallel enumerable
resultSelector selects the final output value from the accumulated result.
A parallel sum using this method looks something like this:
double sum = list.AsParallel().Aggregate(
    // seed value for accumulators
    (double)0,
    // add val to accumulator
    (acc, val) => acc + val,
    // add accumulators
    (acc1, acc2) => acc1 + acc2,
    // just return the final accumulator
    acc => acc
);
For simple aggregations that works fine. For a more complex aggregate that uses an accumulator that is non-trivial there is a variant that accepts a function that creates accumulators for the initial state. This is useful for example in an Average implementation:
public class avg_acc
{
    public int count;
    public double sum;
}

public double ParallelAverage(IEnumerable<double> list)
{
    double avg = list.AsParallel().Aggregate(
        // accumulator factory method, called once per thread:
        () => new avg_acc { count = 0, sum = 0 },
        // update count and sum
        (acc, val) => { acc.count++; acc.sum += val; return acc; },
        // combine accumulators
        (ac1, ac2) => new avg_acc { count = ac1.count + ac2.count, sum = ac1.sum + ac2.sum },
        // calculate average
        acc => acc.sum / acc.count
    );
    return avg;
}
While not as fast as the standard Average extension (~1.5 times faster than sequential, 1.6 times slower than parallel) this shows how you can do quite complex operations in parallel without having to lock outputs or wait on other threads to stop messing with them, and how to use a complex accumulator to hold intermediate results.

Parallelizing producer and consumer with internal state

I'd like to know if the following approach is a good way to implement a producer and consumer pattern in C# .NET 4.6.1
Description of what I want to do:
I want to read files, perform calculation on the data within and save the result. Each file has an origin (a device e.g. data logger) and depending on that origin different calculations as well as output formats should be used. The file contains different values, e.g. temperature readings of several sensors. It is important that the calculations have a state. For instance this could be the last value of the previous calculation, e.g. if I want to sum all values of one origin.
I want to parallelize the processing per origin. All files from one origin need to be processed sequentially (or more specific chronologically) and cannot be parallelized.
I think the TPL Dataflow might be an appropriate solution for this.
This is the process I came up with:
The reading would be done by a TransformBlock. Next I would create instances of the classes performing operations on the data, one for each origin. They get initialized with the necessary parameters, so that they know how to process files for their origin.
Then I would create a TransformBlock for each created object (so basically for each origin). Each TransformBlock would execute a function of the corresponding object. The TransformBlock reading the files would be linked to a BufferBlock, which is linked to each TransformBlock for the per-origin processing. The linking would be conditional, so that each per-origin processing TransformBlock only receives the data meant for it. The outputs of the processing blocks would be linked to an ActionBlock that writes the output files.
The maxDegreeOfParallelism is set to 1 for every Block.
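In code, a minimal sketch of that wiring might look like this (FileData, Result, ReadFile, WriteOutput and processorsByOrigin are illustrative stand-ins, not part of the actual implementation):

var options = new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 1 };

var read = new TransformBlock<string, FileData>(path => ReadFile(path), options);
var buffer = new BufferBlock<FileData>();
var write = new ActionBlock<Result>(r => WriteOutput(r), options);

read.LinkTo(buffer, new DataflowLinkOptions { PropagateCompletion = true });

foreach (var processor in processorsByOrigin.Values)
{
    // one stateful TransformBlock per origin, processing sequentially
    var block = new TransformBlock<FileData, Result>(d => processor.Process(d), options);
    buffer.LinkTo(block, d => d.Origin == processor.Origin); // conditional link
    block.LinkTo(write);
}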
Is that a viable solution? I thought about implementing this with Tasks and the BlockingCollection, but it seems this would be the easier approach.
Additional Information:
The files processed may be too large in size or number to be loaded at once.
Reading and writing should happen concurrently with the processing. As I/O takes time, and because data needs to be collected after processing to form an output file, buffering is essential.
Since the origins are independent and the items for each origin are fully dependent, this problem has an easy solution:
var origins = (from f in files
               group f by f.origin into g
               orderby g.Count() descending
               select g);

var results =
    Partitioner.Create(origins) // disable chunking
        .AsParallel()
        .AsOrdered() // try to process the biggest groups first
        .Select(originGroup =>
        {
            foreach (var x in originGroup.OrderBy(...)) Process(x);
            return someResult;
        })
        .ToList();
Process each origin sequentially and origins in parallel.
If you have a need to limit IO in some way, you can throw in a SemaphoreSlim to guard the IO paths.
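For example, a sketch of such a gate (the limit of 4 is illustrative):

var ioGate = new SemaphoreSlim(4); // at most 4 concurrent IO operations

void GuardedIo(Action io)
{
    ioGate.Wait();
    try { io(); }
    finally { ioGate.Release(); }
}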

Advisable to use rx distinct in long running process?

I am using the Rx Distinct operator to filter an external data stream based on a certain key within a long-running process.
Will this cause a memory leak, assuming a lot of different keys will be received? How does the Rx Distinct operator keep track of previously received keys?
Should I use GroupByUntil with a duration selector instead?
Observable.Distinct uses a HashSet internally. Memory usage will be roughly proportional to the number of distinct keys encountered (AFAIK about 30*n bytes).
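Conceptually, the shape of Distinct is something like the following sketch (not the actual Rx source), which shows why memory grows with the number of distinct keys:

static IObservable<T> DistinctSketch<T, TKey>(
    IObservable<T> source, Func<T, TKey> keySelector)
{
    return Observable.Create<T>(observer =>
    {
        // the set of seen keys grows for the lifetime of the subscription
        var seen = new HashSet<TKey>();
        return source.Subscribe(
            x => { if (seen.Add(keySelector(x))) observer.OnNext(x); },
            observer.OnError,
            observer.OnCompleted);
    });
}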
GroupByUntil does something really different from Distinct.
GroupByUntil (well) groups, whereas Distinct filters the elements of a stream.
Not sure about the intended use, but if you just want to filter out consecutive identical elements you need Observable.DistinctUntilChanged, which has a memory footprint independent of the number of keys.
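For example (assuming an Id key, as in a keyed stream):

// only consecutive duplicates are dropped, so the operator only has to
// remember the previous key
var filtered = source.DistinctUntilChanged(x => x.Id);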
This may be a controversial tactic, but if you were worried about distinct keys accumulating, and if there was a point in time where this could safely be reset, you could introduce a reset policy using Observable.Switch. For example, we have a scenario where the "state of the world" is reset on a daily basis, so we could reset the distinct observable daily.
Observable.Create<MyPoco>(
    observer =>
    {
        var distinctPocos = new BehaviorSubject<IObservable<MyPoco>>(pocos.Distinct(x => x.Id));
        var timerSubscription =
            Observable.Timer(
                new DateTimeOffset(DateTime.UtcNow.Date.AddDays(1)),
                TimeSpan.FromDays(1),
                schedulerService.Default)
            .Subscribe(t =>
            {
                Log.Info("Daily reset - resetting distinct subscription.");
                distinctPocos.OnNext(pocos.Distinct(x => x.Id));
            });
        var pocoSubscription = distinctPocos.Switch().Subscribe(observer);
        return new CompositeDisposable(timerSubscription, pocoSubscription);
    });
However, I do tend to agree with James World's comment above regarding testing with a memory profiler to check that memory is indeed an issue before introducing potentially unnecessary complexity. If you're accumulating 32-bit ints as the key, you'd have many millions of unique items before running into memory issues on most platforms. E.g. 262144 32-bit int keys will take up one megabyte. It may be that you reset the process long before this time, depending on your scenario.
