I have a stream of numeric values that arrive at a fast rate (sub-millisecond), and I want to display their "instant value" on screen. For usability reasons I should downsample that stream, updating the displayed value at a configurable time interval. The user would set that interval by dragging a slider.
So what I want to do is to store the last value of the source stream in a variable, and have an auto-retriggering timer that updates the displayed value with that variable's value.
I am thinking about using Rx, something like this:
Observable.FromEventPattern<double>(_source, "NewValue")
.Sample(TimeSpan.FromMilliseconds(100))
.Subscribe(ep => instantVariable = ep.EventArgs);
The problem is that I cannot, as far as I know, dynamically change the interval.
I can imagine there are ways to do it using timers, but I would prefer to use RX.
Assuming you can model the sample-size changes as an observable, you can do this:
IObservable<int> sampleSizeObservable;
var result = sampleSizeObservable
.Select(i => Observable.FromEventPattern<double>(_source, "NewValue")
.Sample(TimeSpan.FromMilliseconds(i))
)
.Switch();
Switch basically does what you want, but via Rx. It doesn't "change" the interval: Observables are (generally) supposed to be immutable. Rather whenever the sample size changes, it creates a new sampling observable, subscribes to the new observable, drops the subscription to the old one, and melds those two subscriptions together so it looks seamless to a client subscriber.
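To make the slider part concrete, here is a minimal sketch of one way to feed interval changes into that query. It assumes a BehaviorSubject<TimeSpan> that the slider's change handler pushes new values into; the names DynamicSampling, SampleSwitching and Demo, and the Observable.Interval stand-in for the real event source, are made up for illustration.

```csharp
using System;
using System.Reactive.Linq;
using System.Reactive.Subjects;

public static class DynamicSampling
{
    // Samples `source` at whatever interval `intervals` produced most recently.
    // Every new interval drops the old sampling subscription and starts a new one.
    public static IObservable<T> SampleSwitching<T>(
        IObservable<T> source, IObservable<TimeSpan> intervals)
    {
        return intervals
            .Select(interval => source.Sample(interval))
            .Switch();
    }

    // Hypothetical wiring: the subject plays the role of the slider.
    public static IDisposable Demo(BehaviorSubject<TimeSpan> slider)
    {
        // Stand-in for the fast source from the question.
        IObservable<long> fastSource = Observable.Interval(TimeSpan.FromMilliseconds(1));
        return SampleSwitching(fastSource, slider)
            .Subscribe(value => Console.WriteLine(value));
    }
}
```

A BehaviorSubject replays its current value to new subscribers, so sampling starts with the initial interval right away; calling slider.OnNext(TimeSpan.FromMilliseconds(500)) later switches to the slower rate. Note that each switch resubscribes to the source, which is harmless for an event-based source but may restart a cold one.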
Here is a custom Sample operator with an interval that can be changed at any time before or during the lifetime of a subscription.
/// <summary>Samples the source observable sequence at a dynamic interval
/// controlled by a delegate.</summary>
public static IObservable<T> Sample<T>(this IObservable<T> source,
    out Action<TimeSpan> setInterval)
{
    var intervalController = new ReplaySubject<TimeSpan>(1);
    setInterval = interval => intervalController.OnNext(interval);
    return source.Publish(shared => intervalController
        .Select(timeSpan => timeSpan == Timeout.InfiniteTimeSpan ?
            Observable.Empty<long>() : Observable.Interval(timeSpan))
        .Switch()
        .WithLatestFrom(shared, (_, x) => x)
        .TakeUntil(shared.LastOrDefaultAsync()));
}
The out Action<TimeSpan> setInterval parameter is the mechanism that controls the interval of the sampling. It can be invoked with any non-negative TimeSpan argument, or with the special value Timeout.InfiniteTimeSpan that has the effect of suspending the sampling.
This operator differs from the built-in Sample when the source sequence produces values more slowly than the desired sampling interval. The built-in Sample adjusts the sampling to the tempo of the source, never emitting the same value twice. This operator, on the contrary, maintains its own tempo, so it can emit the same value more than once. If that is undesirable, you can attach the DistinctUntilChanged operator after the Sample.
Usage example:
var subscription = Observable.FromEventPattern<double>(_source, "NewValue")
.Sample(out var setSampleInterval)
.Subscribe(ep => instantVariable = ep.EventArgs);
setSampleInterval(TimeSpan.FromMilliseconds(100)); // Initial sampling interval
//...
setSampleInterval(TimeSpan.FromMilliseconds(500)); // Slow down
//...
setSampleInterval(Timeout.InfiniteTimeSpan); // Suspend
Try this one and let me know if it works:
Observable
.FromEventPattern<double>(_source, "NewValue")
.Window(() => Observable.Timer(TimeSpan.FromMilliseconds(100)))
.SelectMany(x => x.LastAsync());
If the data comes in at sub-millisecond intervals, what you would need to handle it is real-time programming. Garbage collection in the .NET Framework, like most other things in C#, is a long way from that. You can maybe get close in some areas, but you can never guarantee that the program will be able to keep up with that data intake.
Aside from that, what you want sounds like rate-limiting code: code that will not run more often than once per interval. I wrote this example code for a multithreading example, but it should get you started on the idea:
int interval = 20;
DateTime dueTime = DateTime.Now.AddMilliseconds(interval);
while(true){
    if(DateTime.Now >= dueTime){
        //insert code here
        //Update next dueTime
        dueTime = DateTime.Now.AddMilliseconds(interval);
    }
    else{
        //Just yield to not tax out the CPU
        Thread.Sleep(1);
    }
}
Note that DateTime actually has limited precision, often not going lower than 5-20 ms. The Stopwatch is a lot more precise. But honestly, anything beyond 60 updates per second (17 ms interval) will probably not be human readable.
Another issue is that writing to the GUI is costly. You will never notice if you only write once per user-triggered event. But once you send updates from a loop (including one running in another thread) you can quickly run into issues. In my first test with multithreading I did so much and such complicated progress reporting that I ended up plain overloading the GUI thread with stuff to change.
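For what it's worth, the same rate-limiting loop can be sketched with Stopwatch instead of DateTime. This is only an illustration under my own assumptions: the class and method names are made up, and the loop is bounded by a duration so that it terminates, which the original loop does not do.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class RateLimitedLoop
{
    // Calls `update` at most once per `interval` until `duration` has elapsed.
    // Returns how many times `update` was called.
    public static int Run(TimeSpan interval, TimeSpan duration, Action update)
    {
        var total = Stopwatch.StartNew();      // bounds the whole run
        var sinceLast = Stopwatch.StartNew();  // time since the last update
        int calls = 0;
        while (total.Elapsed < duration)
        {
            if (sinceLast.Elapsed >= interval)
            {
                update();
                calls++;
                sinceLast.Restart();           // start timing the next interval
            }
            else
            {
                Thread.Sleep(1);               // yield so the CPU is not pegged
            }
        }
        return calls;
    }
}
```

Thread.Sleep(1) can still oversleep by several milliseconds depending on the system timer, so the interval is a lower bound on the spacing between updates rather than an exact period.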
We have an application, wherein we have a materialized array of items which we are going to process through a Reactive pipeline. It looks a little like this
EventLoopScheduler eventLoop = new EventLoopScheduler();
IScheduler concurrency = new TaskPoolScheduler(
new TaskFactory(
new LimitedConcurrencyLevelTaskScheduler(threadCount)));
IEnumerable<int> numbers = Enumerable.Range(1, itemCount);
// 1. transform on single thread
IConnectableObservable<byte[]> source =
numbers.Select(Transform).ToObservable(eventLoop).Publish();
// 2. naive parallelization, restricts parallelization to Work
// only; chunk up sequence into smaller sequences and process
// in parallel, merging results
IObservable<int> final = source.
    Buffer(10).
    Select(
        batch =>
            batch.
            ToObservable(concurrency).
            Buffer(10).
            Select(
                concurrentBatch =>
                    concurrentBatch.
                    Select(Work).
                    ToArray().
                    ToObservable(eventLoop)).
            Merge()).
    Merge();
final.Subscribe();
source.Connect();
Await(final).Wait();
If you are really curious to play with this, the stand-in methods look like
private async static Task Await(IObservable<int> final)
{
await final.LastOrDefaultAsync();
}
private static byte[] Transform(int number)
{
if (number == itemCount)
{
Console.WriteLine("numbers exhausted.");
}
byte[] buffer = new byte[1000000];
Buffer.BlockCopy(bloat, 0, buffer, 0, bloat.Length);
return buffer;
}
private static int Work(byte[] buffer)
{
Console.WriteLine("t {0}.", Thread.CurrentThread.ManagedThreadId);
Thread.Sleep(50);
return 1;
}
A little explanation. Range(1, itemCount) simulates raw inputs, materialized from a data-source. Transform simulates an enrichment process each input must go through, and results in a larger memory footprint. Work is a "lengthy" process which operates on the transformed input.
Ideally, we want to minimize the number of transformed inputs held concurrently by the system, while maximizing throughput by parallelizing Work. The number of transformed inputs in memory should be batch size (10 above) times concurrent work threads (threadCount).
So for 5 threads, we should retain 50 Transform items at any given time; and if, as here, the transform is a 1MB byte buffer, then we would expect memory consumption to be at about 50MB throughout the run.
What I find is quite different: Reactive eagerly consumes all numbers and Transforms them up front (as evidenced by the "numbers exhausted." message), resulting in a massive memory spike at the start (about 1 GB for an itemCount of 1000).
My basic question is: Is there a way to achieve what I need (ie minimized consumption, throttled by multi-threaded batching)?
UPDATE: sorry for the reversal, James. At first I did not think paulpdaniels' and Enigmativity's composition of Work(Transform) applied (this has to do with the nature of our actual implementation, which is more complex than the simple scenario provided above). However, after some further experimentation, I may be able to apply the same principle, i.e. defer Transform until the batch executes.
You have made a couple of mistakes in your code that throw off all of your conclusions.
First up, you've done this:
IEnumerable<int> numbers = Enumerable.Range(1, itemCount);
You've used Enumerable.Range which means that when you call numbers.Select(Transform) you are going to burn through all of the numbers as fast as a single thread can take it. Rx hasn't even had a chance to do any work because up till this point your pipeline is entirely enumerable.
The next issue is in your subscriptions:
final.Subscribe();
source.Connect();
Await(final).Wait();
Because you call final.Subscribe() & Await(final).Wait(); you are creating two separate subscriptions to the final observable.
Since there is a source.Connect() in the middle the second subscription may be missing out on values.
So, let's try to remove all of the cruft that's going on here and see if we can work things out.
If you go down to this:
IObservable<int> final =
Observable
.Range(1, itemCount)
.Select(n => Transform(n))
.Select(bs => Work(bs));
Things work well. The numbers get exhausted right at the end, and processing 20 items on my machine takes about 1 second.
But this is processing everything in sequence. And the Work step provides back-pressure on Transform to slow down the speed at which it consumes the numbers.
Let's add concurrency.
IObservable<int> final =
Observable
.Range(1, itemCount)
.Select(n => Transform(n))
.SelectMany(bs => Observable.Start(() => Work(bs)));
This processes 20 items in 0.284 seconds, and the numbers exhaust themselves after 5 items are processed. There is no longer any back-pressure on the numbers. Basically the scheduler is handing all of the work to the Observable.Start so it is ready for the next number immediately.
Let's reduce the concurrency.
IObservable<int> final =
Observable
.Range(1, itemCount)
.Select(n => Transform(n))
.SelectMany(bs => Observable.Start(() => Work(bs), concurrency));
Now the 20 items get processed in 0.5 seconds. Only two get processed before the numbers are exhausted. This makes sense as we've limited concurrency to two threads. But still there's no back pressure on the consumption of the numbers so they get chewed up pretty quickly.
Having said all of this, I tried to construct a query with the appropriate back pressure, but I couldn't find a way. The crux comes down to the fact that Transform(...) performs far faster than Work(...) so it completes far more quickly.
So then the obvious move for me was this:
IObservable<int> final =
Observable
.Range(1, itemCount)
.SelectMany(n => Observable.Start(() => Work(Transform(n)), concurrency));
This doesn't complete the numbers until the end, and it limits processing to two threads. It appears to do the right thing for what you want, except that I've had to do Work(Transform(...)) together.
The very fact that you want to limit the amount of work you are doing suggests you should be pulling data, not having it pushed at you. I would forget using Rx in this scenario, as fundamentally, what you have described is not a reactive application. Also, Rx is best suited to processing items serially; it uses sequential event streams.
Why not just keep your data source enumerable, and use PLinq, Parallel.ForEach or DataFlow? All of those sound better suited for your problem.
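As a rough illustration of the pull-based alternative, here is a minimal PLINQ sketch. Transform and Work are simplified stand-ins for the methods in the question (smaller buffer, shorter sleep), and the class name PullBasedPipeline is made up. Because the query is lazy, each worker should only transform an item when it is ready to work on it, so roughly threadCount buffers should be live at a time.

```csharp
using System;
using System.Linq;
using System.Threading;

public static class PullBasedPipeline
{
    // Stand-in for the enrichment step: allocates a buffer per input.
    static byte[] Transform(int number)
    {
        return new byte[1000];
    }

    // Stand-in for the lengthy step.
    static int Work(byte[] buffer)
    {
        Thread.Sleep(5);
        return 1;
    }

    // Processes itemCount inputs on at most threadCount workers
    // and returns the sum of the Work results.
    public static int Run(int itemCount, int threadCount)
    {
        return Enumerable.Range(1, itemCount)
            .AsParallel()
            .WithDegreeOfParallelism(threadCount)
            .Select(n => Work(Transform(n)))
            .Sum();
    }
}
```

Calling PullBasedPipeline.Run(20, 2) processes all 20 items with two workers. Actual memory use also depends on when the garbage collector reclaims finished buffers.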
As @JamesWorld said, it may very well be that you want to use PLinq to perform this task; it really depends on whether you are actually reacting to data in your real scenario or just iterating through it.
If you choose to go the Reactive route you can use Merge to control the level of parallelization occurring:
var source = numbers
.Select(n =>
Observable.Defer(() => Observable.Start(() => Work(Transform(n)), concurrency)))
//Maximum concurrency
.Merge(10)
//Schedule all the output back onto the event loop scheduler
.ObserveOn(eventLoop);
The above code will consume all the numbers first (sorry, no way to avoid that). However, by wrapping the processing in a Defer and following it up with a Merge that limits parallelization, only x items can be in flight at a time. Start() takes a scheduler as its second argument, which it uses to execute the provided method. Finally, since you are basically just pushing the values of Transform into Work, I composed them within the Start method.
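If you want to convince yourself that Merge really caps the number of items in flight, a small sketch like the following can help. It is not the code from the question: the work is a Thread.Sleep stand-in, it runs on the default scheduler, and the class name BoundedMerge is made up. It records the highest number of work items observed running at once.

```csharp
using System;
using System.Linq;
using System.Reactive.Linq;
using System.Threading;

public static class BoundedMerge
{
    // Runs a stand-in work item for each input with at most `maxConcurrent`
    // in flight and returns the peak concurrency that was observed.
    // itemCount must be at least 1, because Wait() throws on an empty sequence.
    public static int PeakConcurrency(int itemCount, int maxConcurrent)
    {
        int inFlight = 0;
        int peak = 0;

        Enumerable.Range(1, itemCount)
            .Select(n => Observable.Defer(() => Observable.Start(() =>
            {
                int now = Interlocked.Increment(ref inFlight);

                // Raise the recorded peak if this is a new maximum.
                int seen;
                do { seen = Volatile.Read(ref peak); }
                while (now > seen &&
                       Interlocked.CompareExchange(ref peak, now, seen) != seen);

                Thread.Sleep(20); // stand-in for Work(Transform(n))
                Interlocked.Decrement(ref inFlight);
                return n;
            })))
            .Merge(maxConcurrent)
            .Wait(); // block until every item has been processed

        return peak;
    }
}
```

With Defer, nothing starts until Merge subscribes to an inner observable, and Merge only subscribes to the next one when a running one completes, so the returned peak should never exceed maxConcurrent.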
As a side note, you can await an Observable and it will be equivalent to the code you have, i.e:
await source; //== await source.LastAsync();
For testing reasons I want to be able to adjust what time Quartz.Net currently thinks it is so I do not necessarily have to wait hours, days, or weeks in order to check that my code is working.
For this purpose I created the following simple function (it is in F# but could easily be done in C# or another language):
let SimulateTime = fun () ->
currentTime <- DateTimeOffset.UtcNow
timeDifferenceInSeconds <- (currentTime - lastCheckedTime).TotalSeconds
simulatedTime <- simulatedTime.AddSeconds((timeDifferenceInSeconds *scaleTimeBy))
lastCheckedTime <- currentTime
simulatedTime
Where currentTime, lastCheckedTime, and simulatedTime would all be of type DateTimeOffset and both timeDifferenceInSeconds and scaleTimeBy are of type float.
I then change SystemTime.Now and SystemTime.UtcNow to use the above function as follows :
SystemTime.Now <-
Func<DateTimeOffset>(
fun () -> SimulateTime())
SystemTime.UtcNow <-
Func<DateTimeOffset>(
fun () -> SimulateTime())
This approach was shown by Mark Seemann in a previous question of mine, which you can find here.
Now this mostly works, except it seems like the longer function causes it to be off by a decently wide margin. What I mean by this is that all of my triggers will misfire. For example, if I have a trigger set to occur every hour and set scaleTimeBy to 60.0 so that every second that passes counts as a minute, it will never actually trigger on time. If I have a misfire policy, the trigger can then go off, but the time it lists for when it activated will be as late as the half-hour mark (so it fires a full 30 seconds later than it should have in this example).
However I can do this :
Console.WriteLine(SimulateTime())
Thread.Sleep(TimeSpan.FromSeconds(60.0))
Console.WriteLine(SimulateTime())
And the difference between the two times output to the screen in this example will be exactly an hour so the call doesn't seem like it should be adding as much of a time difference than it does.
Anyone have any advice on how to fix this issue or a better way of handling this problem?
Edit :
So the C# version of the SimulateTime function would be something like this :
public DateTimeOffset SimulateTime() {
    currentTime = DateTimeOffset.UtcNow;
    double timeDifference = (currentTime - lastCheckedTime).TotalSeconds;
    simulatedTime = simulatedTime.AddSeconds(timeDifference * scaleTimeBy);
    lastCheckedTime = currentTime;
    return simulatedTime;
}
If that helps anyone with solving this problem.
So the issue is misfires caused by the fact that Quartz.net will idle and wait when it thinks it doesn't have any triggers occurring any time soon, to avoid making too many calls. By default it waits about 30 seconds, give or take, if it doesn't have any triggers occurring in that time span. The idleWaitTime variable is a TimeSpan set in the QuartzSchedulerThread. When checking for triggers that might occur soon, it also uses the BatchTimeWindow from QuartzSchedulerResources.
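To put numbers on that, here is a small sketch of the arithmetic. It assumes the scheduler thread sleeps for the full idle wait in real time while the simulated clock runs scaleTimeBy times faster; the class and method names are made up for illustration.

```csharp
using System;

public static class SimulatedDelay
{
    // Worst-case lateness, measured in simulated time, when the scheduler
    // thread sleeps for `realIdleWait` of real time before looking again.
    public static TimeSpan WorstCaseLateness(TimeSpan realIdleWait, double scaleTimeBy)
    {
        return TimeSpan.FromSeconds(realIdleWait.TotalSeconds * scaleTimeBy);
    }

    // Idle wait that keeps the simulated lateness the same as at normal speed.
    public static TimeSpan ScaledIdleWait(TimeSpan defaultIdleWait, double scaleTimeBy)
    {
        return TimeSpan.FromSeconds(defaultIdleWait.TotalSeconds / scaleTimeBy);
    }
}
```

With the default 30 second idle wait and scaleTimeBy set to 60.0, WorstCaseLateness gives 30 simulated minutes, which matches the half-hour misfires described in the question. ScaledIdleWait gives 0.5 seconds of real time, which brings the worst case back down to 30 simulated seconds.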
Both idleWaitTime and BatchTimeWindow can be set in configuration/properties files where they'd be called "org.quartz.scheduler.idleWaitTime" and "org.quartz.scheduler.batchTriggerAcquisitionFireAheadTimeWindow."
Based on the name BatchTimeWindow, I thought it was just a bit of look-ahead for grabbing upcoming triggers (which I would like: if I'm speeding things up, I'd want a small idleWaitTime, but I would also want the scheduler to look further ahead for triggers, because the few seconds you're waiting are actually minutes, so a trigger will fire sooner than the scheduler thinks). However, the description of "org.quartz.scheduler.batchTriggerAcquisitionFireAheadTimeWindow" on pages going over configuration properties implies that it can cause things to fire early and be less accurate. So to start, here is the code for just modifying idleWaitTime:
let threadpool = Quartz.Simpl.SimpleThreadPool()
let jobstore = Quartz.Simpl.RAMJobStore()
let idleWaitTime = TimeSpan.FromSeconds(30.0/scaleTimeBy)
let dbfailureretryinterval = TimeSpan.FromSeconds(15.0) // TimeSpan(int64 15000) would be 15000 ticks (1.5 ms), not 15 s
Quartz.Impl.DirectSchedulerFactory.Instance.CreateScheduler("TestScheduler","TestInstance",threadpool,jobstore,idleWaitTime,dbfailureretryinterval)
let scheduler = Quartz.Impl.DirectSchedulerFactory.Instance.GetScheduler("TestScheduler")
You can create a scheduler that has the idleWaitTime you want by using the DirectSchedulerFactory, which could probably use better documentation. It also takes a bunch of arguments that you may or may not want to modify, depending on what you are working on. For the thread pool I just use Quartz.net's default SimpleThreadPool, because I do not care about messing with the threading at this time and would not want to explain how to do so unless that was the whole point of the question. Information on job stores is available here. I am using RAMJobStore because it is simpler than AdoJobStore, but it shouldn't matter for this example. The dbfailureretryinterval is another value that I don't care about for this example, so I just looked up what it is set to by default. Its value should matter the least here because I am not connecting to a database. For idleWaitTime, I might want to do more tests to figure out a good value, but I chose to scale its default value of 30 seconds by scaleTimeBy, since that is what I'm using to scale how fast time goes by. So if the program simulates time going by at a much faster rate, the scheduler should only remain idle for correspondingly smaller periods of time. One important thing to note is that when you create a scheduler this way, it is not returned, so you need to make a separate call to get the scheduler you just created. I have no idea why it works this way; my guess is that it is better if you are creating several schedulers and not necessarily using all of them.
Now, after all that, you are likely to still get a bit of a misfire rate. While the scheduler now idles for much smaller units of time (only a few simulated seconds, so potentially an acceptable margin depending on your use case), it still only checks at that point whether it has a trigger coming up in the next few fractions of a second.
So let's see if adding time to BatchTimeWindow helps matters:
let threadpool = Quartz.Simpl.SimpleThreadPool()
let threadexecutor = Quartz.Impl.DefaultThreadExecutor()
let jobstore = Quartz.Simpl.RAMJobStore()
let schedulepluginmap = System.Collections.Generic.Dictionary<String,Quartz.Spi.ISchedulerPlugin>()
let idleWaitTime = TimeSpan.FromSeconds(30.0/scaleTimeBy)
let maxBatchSize = 1
let batchTimeWindow = TimeSpan.FromSeconds(scaleTimeBy)
let scheduleexporter = Quartz.Simpl.RemotingSchedulerExporter()
Quartz.Impl.DirectSchedulerFactory.Instance.CreateScheduler("TestScheduler","TestInstance",threadpool,threadexecutor,jobstore,schedulepluginmap,idleWaitTime,maxBatchSize,batchTimeWindow,scheduleexporter)
let scheduler = Quartz.Impl.DirectSchedulerFactory.Instance.GetScheduler("TestScheduler")
Now this has even more parameters that I don't really care about for the purposes of this example, and I won't even bother going over them, because adjusting batchTimeWindow actually makes things worse: it gets you back to misfiring by 30 minutes. So no, batchTimeWindow is not useful here, even though it looks like it might be. Only modify idleWaitTime.
Ideally, for this use I would want a small wait time and a larger look-ahead time, but that option does not seem to be available.
Let me explain my situation.
I have a pattern of one producer and 1 to N consumers. I'm using blocking collections and everything is working well. While doing some tests I noticed this strange behavior:
I was testing how long my manipulation of the data took in my consumers.
Below you'll find the code, stripped of my manipulation, that produces the strange behavior.
I have 4 consumers for 1 producer.
For most of the data, the Console doesn't print anything, because ts=0 (it's under a tick), but at random intervals (every 1 to 5 seconds) it prints something like this (not in this exact order, but of the same kind):
10000
20001
10000
30002
10000
40003
10000
10000
It is of the order of 10,000 ticks, so around 1 ms. It is always a number of the form (N)000(N-1).
Note that the BlockingCollection I consume is filled in response to network events that occur at completely random times. Nothing regular there.
The timing is almost perfect, always a multiple of 10,000 ticks.
What could be behind this? Thanks!
while(IsAlive)
{
    DataToFieldMapping item;
    try
    {
        _CollectionToConsume.TryTake(out item, -1);
    }
    catch
    {
        item = null;
    }
    if (item != null)
    {
        long ts = (DateTime.Now.Ticks - item.TimeStamp.Ticks);
        if(ts>10)
            Console.WriteLine(ts);
    }
}
What's going on here is that DateTime.Now has a fairly limited precision. It's not giving you the time to the nearest tick. It is only updated every 10,000 ticks or so, which is why you generally see multiples of 10k ticks in your prints.
If you really want to get a better feel for the duration of those events, use the Stopwatch class, which has a much higher precision. That said, Stopwatch is simply a diagnostic tool (hence why it's in the Diagnostics namespace). You should only be using it to help you diagnose what's going on, and shouldn't be using it in production code.
On a side note, there really isn't any need to use a timer here at all. It appears that you're creating several consumers that are polling the BlockingCollection for new content. There is no reason to do this. They can simply block until the collection has items. (Hence the name, BlockingCollection.)
The easiest way is for the consumers to simply do this:
foreach(var item in _CollectionToConsume.GetConsumingEnumerable())
ProcessItem(item);
Then just run that code in a background thread.
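Put together, that could look something like the sketch below. It is only an illustration: the class name, the int items and the counter standing in for ProcessItem are assumptions of mine, and the producer here is a simple loop rather than network events.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class BlockingConsumers
{
    // Starts `consumerCount` background consumers, feeds them `itemCount`
    // items, and returns how many items were processed in total.
    public static int ConsumeAll(int itemCount, int consumerCount)
    {
        using (var queue = new BlockingCollection<int>())
        {
            int processed = 0;

            var consumers = Enumerable.Range(0, consumerCount)
                .Select(_ => Task.Factory.StartNew(() =>
                {
                    // Blocks while the collection is empty and ends once
                    // CompleteAdding has been called and the queue is drained.
                    foreach (var item in queue.GetConsumingEnumerable())
                        Interlocked.Increment(ref processed); // stand-in for ProcessItem(item)
                }, TaskCreationOptions.LongRunning))
                .ToArray();

            for (int i = 0; i < itemCount; i++)
                queue.Add(i);           // the producer side

            queue.CompleteAdding();     // lets the consumers' loops finish
            Task.WaitAll(consumers);
            return processed;
        }
    }
}
```

Each item is handed to exactly one consumer, so ConsumeAll(1000, 4) returns 1000 without any polling or timeouts.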
If you write the following and run it, you'll see that the ticks do not advance one by one but in relatively large chunks, because the actual resolution of DateTime.Now is much coarser than a single tick.
for(int i =0; i< 100; i++)
{
Console.WriteLine(DateTime.Now.Ticks);
}
Use the Stopwatch class to measure performance, as it uses a high-resolution timer that is much better suited to the purpose.
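For the latency measurement in the question, one option is to stamp each item with Stopwatch.GetTimestamp() instead of DateTime.Now and convert the difference afterwards. A minimal sketch, with a made-up helper class name:

```csharp
using System;
using System.Diagnostics;

public static class LatencyClock
{
    // Raw high-resolution timestamp; only differences between two values are meaningful.
    public static long Now()
    {
        return Stopwatch.GetTimestamp();
    }

    // Converts the difference between two raw timestamps into a TimeSpan.
    // Stopwatch.Frequency is the number of raw timestamp units per second.
    public static TimeSpan Elapsed(long start, long end)
    {
        double ticks = (end - start) * (double)TimeSpan.TicksPerSecond / Stopwatch.Frequency;
        return new TimeSpan((long)Math.Round(ticks));
    }
}
```

The producer would store LatencyClock.Now() on the item, and the consumer would call LatencyClock.Elapsed(item's stored value, LatencyClock.Now()). These timestamps are only comparable within the same machine and process run, so they are not a replacement for wall-clock time.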
I have a multi-threaded application, and in a certain section of code I use a Stopwatch to measure the time of an operation:
MatchCollection matches = regex.Matches(text); //lazy evaluation
Int32 matchCount;
//inside this bracket program should not context switch
{
//start timer
MyStopwatch matchDuration = MyStopwatch.StartNew();
//actually evaluate regex
matchCount = matches.Count;
//adds the time regex took to a list
durations.AddDuration(matchDuration.Stop());
}
Now, the problem is if the program switches control to another thread somewhere else while the stopwatch is started, then the timed duration will be wrong. The other thread could have done any amount of work before the context switches back to this section.
Note that I am not asking about locking, these are all local variables so there is no need for that. I just want the timed section to execute continuously.
edit: another solution could be to subtract the context-switched time to get the actual time spent doing work in the timed section. I don't know if that's possible.
You can't do that. Otherwise it would be very easy for any application to get complete control over the CPU timeslices assigned to it.
You can, however, give your process a high priority to reduce the probability of a context-switch.
Here is another thought:
Assuming that you don't measure the execution time of a regular expression just once but multiple times, you should not see the average execution time as an absolute value but as a relative value compared to the average execution times of other regular expressions.
With this thinking you can compare the average execution times of different regular expressions without knowing the times lost to context switches. The time lost to context switches would be about the same in every average, assuming the environment is relatively stable with regards to CPU utilization.
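A minimal sketch of that idea, assuming you are free to repeat each measurement many times. The class and method names are made up, and a plain Stopwatch is used rather than the MyStopwatch wrapper from the question:

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class RegexTiming
{
    // Returns the average time needed to fully evaluate `pattern` against
    // `text` over `runs` repetitions; `matchCount` receives the match count.
    public static TimeSpan AverageMatchTime(
        string pattern, string text, int runs, out int matchCount)
    {
        var regex = new Regex(pattern);
        long totalTicks = 0;
        int count = 0;

        for (int i = 0; i < runs; i++)
        {
            var stopwatch = Stopwatch.StartNew();
            count = regex.Matches(text).Count; // Count forces the lazy evaluation
            stopwatch.Stop();
            totalTicks += stopwatch.Elapsed.Ticks;
        }

        matchCount = count;
        return TimeSpan.FromTicks(totalTicks / runs);
    }
}
```

Running this for two patterns against the same text under similar load and comparing the two averages gives the relative cost. Any single run may still include a context switch, but over enough runs that overhead should affect both averages about equally.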
I don't think you can do that.
A "best effort", for me, would be to put your method in a separate thread, and use
Thread.CurrentThread.Priority = ThreadPriority.Highest;
to avoid as much as possible context switching.
If I may ask, why do you need such a precise measurement, and why can't you extract the function, and benchmark it in its own program if that's the point ?
Edit : Depending on the use case it may be useful to use
Process.GetCurrentProcess().ProcessorAffinity = new IntPtr(2); // Bitmask: 2 pins the process to the second core; use whatever core you want to stick to
to avoid switching between cores.
I'm programming a Netduino board using the .NET Micro Framework 4.1 and want to get a higher time resolution than milliseconds. This is because I'm attempting to dim an LED by blinking it really fast.
The issue is that the sample code uses Thread.Sleep(..) which takes a number of milliseconds.
Sample code from http://netduino.com/projects/ showing the issue in question:
OutputPort ledOnboard = new OutputPort(Pins.ONBOARD_LED, false);
while (true)
{
ledOnboard.Write(true);
Thread.Sleep(1); // << PROBLEM: Can only get as low as 1 millisecond
}
Even if there's another way to accomplish dimming by not using a greater time resolution, I'm game.
This doesn't answer your question about getting a better time resolution, but it does solve your problem with changing the brightness on an LED. You should be using the PWM module for the Netduino.
Netduino Basics: Using Pulse Width Modulation (PWM) is a great article on how to use it.
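To show what PWM means in terms of timing, here is a small sketch of the duty-cycle arithmetic only. It deliberately does not call any Netduino API; the class and method names are made up, and on the board the PWM hardware would do the actual switching.

```csharp
using System;

public static class DutyCycle
{
    // Returns how long the LED is on, in microseconds, within one PWM period
    // for a brightness given as a percentage from 0 to 100.
    public static uint OnTimeMicroseconds(uint periodMicroseconds, uint brightnessPercent)
    {
        if (brightnessPercent > 100)
            throw new ArgumentOutOfRangeException("brightnessPercent");

        // Widen to ulong so the multiplication cannot overflow.
        return (uint)((ulong)periodMicroseconds * brightnessPercent / 100);
    }
}
```

For a 20,000 microsecond period (50 Hz) at 25 percent brightness, the LED is on for 5,000 microseconds and off for the remaining 15,000 in every period. That kind of timing is out of reach for Thread.Sleep but routine for a hardware PWM channel.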
I have had a similar problem in the past and used the following method to time in the microsecond range. The first line determines how many ticks are in a millisecond (it's been a while since I used this, but since TimeSpan.TicksPerMillisecond is 10,000, one tick should be 100 nanoseconds). The second line gets the amount of time the system has been on (in ticks). I hope this helps.
public const Int64 ticks_per_millisecond = System.TimeSpan.TicksPerMillisecond;
public static long GetCurrentTimeInTicks()
{
return Microsoft.SPOT.Hardware.Utility.GetMachineTime().Ticks;
}
You can use a timer to raise an event instead of using sleep.
The Interval property on a timer is a double so you can have less than a millisecond on it.
http://msdn.microsoft.com/en-us/library/0tcs6ww8(v=VS.90).aspx
In his comment to Seidleroni's answer BrainSlugs83 suggests "sit in a busy loop and wait for the desired number of ticks to elapse. See the function I added in the edit". But I cannot see the function added to the edit. I assume it would be something like this:
using System;
using Microsoft.SPOT.Hardware;
private static long _TicksPerMicroSecond = TimeSpan.TicksPerMillisecond/1000;
private void Wait(long microseconds)
{
    var then = Utility.GetMachineTime().Ticks;
    var ticksToWait = microseconds * _TicksPerMicroSecond;
    // Busy-wait until the requested number of ticks has elapsed
    while (true)
    {
        var now = Utility.GetMachineTime().Ticks;
        if ((now - then) > ticksToWait) break;
    }
}
A point that you might not be thinking about is that your code is relying on the .NET System namespace, which is based on the real time clock in your PC. Notice that the answers rely on the timer in the device.
Moving forward, I would suggest that you take a moment to qualify the source of the information you are using in your code: is it .NET proper (which is fundamentally based on your PC), or the device the code is running on (which will have a namespace other than System, for example)?
PWM is a good way to control DC current artificially (by varying the pulse width), but varying the PWM frequency will still be a function of time at the end of the day.
Rather than using delays like Sleep, you might want to spawn a thread and have it manage the brightness. Using Sleep is still basically a straight-line procedural method, and your code will only be able to do this one thing if you use a single thread.