Take(1) from a Select() on a hot observable - c#

What I'm trying to do is have a hot observable, and then derive another observable from it through Select.
Next I want to use await Take(1) to get a single value from the derived observable and then subsequently subscribe to it.
int i = 1;
var o1 = Observable
    .Interval(TimeSpan.FromSeconds(1))
    .Select(x => { i++; return i; })
    .Publish()
    .RefCount()
    .Do(x => Console.WriteLine($"emit {x}"));
var o2 = o1.Select(x => x + 5);
await o2.Take(1);
Console.ReadLine();
using (o2.Subscribe(x =>
{
    Console.WriteLine($"output {x}");
}))
{
    Console.WriteLine("subscribed");
    Console.ReadLine();
}
Console.ReadLine();
However, what I'm seeing is that after the await Take(1), the observable no longer "works" (no "emit x" is being printed anymore).
Why is this?
EDIT
Interestingly, if I add a Task.Delay it works:
var o2 = o1.Select(x => x + 5);
await o2.Take(1);
await Task.Delay(1);
Console.ReadLine();

The combination .Publish().RefCount() can be awkward to work with. Once the subscriber count drops to zero, there are cases where the query cannot be resubscribed to.
In this case, however, there seems to be a race condition that I have yet to fully figure out.
Here's how to make your code work:
await o2.Take(1).ObserveOn(Scheduler.Default);
Adding ObserveOn makes it behave the way you expect.
Your await Task.Delay(1); had the same effect, but why it works still confuses me.
Finally, setting aside the kludge, the only reason it appears to work is that you're using the external state int i = 1;. You can remove .Publish().RefCount() and it'll still work as expected. You should avoid this kind of external state, and if you do use it, use Interlocked.Increment(ref i) instead of i++.
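To illustrate the Interlocked.Increment point, here is a minimal sketch that uses only the base class library. CounterDemo, CountSafely and CountUnsafely are hypothetical names made up for this example, and Parallel.For merely stands in for "several threads touching the same variable"; it is not what Rx does internally.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CounterDemo
{
    // Increments a shared counter from many threads with an atomic operation.
    public static int CountSafely(int iterations)
    {
        int counter = 0;
        Parallel.For(0, iterations, _ => Interlocked.Increment(ref counter));
        return counter;
    }

    // Same loop with a plain ++, which is a non-atomic read-modify-write.
    public static int CountUnsafely(int iterations)
    {
        int counter = 0;
        Parallel.For(0, iterations, _ => counter++);
        return counter;
    }

    public static void Main()
    {
        // The Interlocked version always reaches the full count.
        Console.WriteLine($"Interlocked: {CountSafely(100000)}");
        // The plain version can lose updates, so the result may be lower.
        Console.WriteLine($"Plain ++:    {CountUnsafely(100000)}");
    }
}
```

Better still is to keep the state inside the query so there is nothing to share in the first place.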
Based on our discussions, here's an alternative way to get what you need:
var subject = new ReplaySubject<long>(1);
var source = Observable.Interval(TimeSpan.FromSeconds(1.0));
var subscription = source.Subscribe(subject);
Thread.Sleep(TimeSpan.FromSeconds(4.5));
var z = await subject.Take(1);
Console.WriteLine($"0:{z}z");
Thread.Sleep(TimeSpan.FromSeconds(2.5));
subject.Take(5).Subscribe(x => Console.WriteLine($"1:{x}x"));
Thread.Sleep(TimeSpan.FromSeconds(2.5));
subject.Take(5).Subscribe(x => Console.WriteLine($"2:{x}x"));

Related

How can I assign the value of a property, within each task of a Task.WhenAll?

How can I set a Task.WhenAll() result to a value within a Task.WhenAll() routine? For example, the following code will get all authorized users for all domain/group combos in appRoleMaps:
var results = await Task.WhenAll(appRoleMaps.Select(
    x => GetAuthorizedUsers(x.DomainName, x.ADGroupName)));
But what if I want to set the authorized users results of each iteration as the iteration item Authorized property value? For example, something like the following code (although the following code does not work):
var results = await Task.WhenAll(appRoleMaps.Select(
    x => x.AuthorizedUsers = GetAuthorizedUsers(x.DomainName, x.ADGroupName)));
Is there a streamlined way to do this? Reason being, the GetAuthorizedUsers result set does not include domain/group info, so I can't just do a simple foreach/where at the end to easily join by this info.
You could create an async lambda to pass into your Select. This would cause each result to be assigned to the AuthorizedUsers property on the associated instance. The outer Task.WhenAll is only required to know when all elements have been processed.
await Task.WhenAll(appRoleMaps.Select(async x =>
    x.AuthorizedUsers = await GetAuthorizedUsers(x.DomainName, x.ADGroupName)));
Save the tasks and query them after waiting:
var tasks = appRoleMaps.Select(x => GetAuthorizedUsers(x.DomainName, x.ADGroupName)).ToList();
await Task.WhenAll(tasks);
var results = tasks.Select(t => t.Result).ToList();
This is cleaner than relying on side effects. Squirreling the value away into some property and later extracting it obfuscates the meaning of the code and is more work to code.
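If the concern is losing the domain/group information, one side-effect-free option is to return each input together with its result. This is only a sketch: AppRoleMap and the body of GetAuthorizedUsers below are hypothetical stand-ins for the asker's types, and the tuple syntax assumes C# 7 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public sealed class AppRoleMap
{
    public string DomainName { get; set; } = "";
    public string ADGroupName { get; set; } = "";
}

public static class AuthorizedUsersDemo
{
    // Hypothetical stand-in for the real directory lookup.
    public static async Task<List<string>> GetAuthorizedUsers(string domain, string group)
    {
        await Task.Delay(10);
        return new List<string> { $"{domain}\\{group}-user" };
    }

    // Pairs every map with its own result, so nothing has to be joined afterwards.
    public static Task<(AppRoleMap Map, List<string> Users)[]> LoadAsync(IEnumerable<AppRoleMap> maps)
    {
        var tasks = maps.Select(async m =>
            (Map: m, Users: await GetAuthorizedUsers(m.DomainName, m.ADGroupName)));
        return Task.WhenAll(tasks);
    }

    public static async Task Main()
    {
        var maps = new[]
        {
            new AppRoleMap { DomainName = "CORP", ADGroupName = "Admins" },
            new AppRoleMap { DomainName = "CORP", ADGroupName = "Readers" },
        };
        foreach (var (map, users) in await LoadAsync(maps))
            Console.WriteLine($"{map.DomainName}/{map.ADGroupName}: {string.Join(", ", users)}");
    }
}
```

Task.WhenAll returns the results in the same order as the input tasks, so the pairs line up with appRoleMaps either way; the tuple just makes the association explicit.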

Reactive Extensions SelectMany with large objects

I have this little piece of code that simulates a flow that uses large objects (that huge byte[]). For each item in the sequence, an async method is invoked to get some result. The problem? As it is, it throws OutOfMemoryException.
Code compatible with LINQPad (C# Program):
void Main()
{
    var selectMany = Enumerable.Range(1, 100)
        .Select(i => new LargeObject(i))
        .ToObservable()
        .SelectMany(o => Observable.FromAsync(() => DoSomethingAsync(o)));
    selectMany
        .Subscribe(r => Console.WriteLine(r));
}
private static async Task<int> DoSomethingAsync(LargeObject lo)
{
    await Task.Delay(10000);
    return lo.Id;
}
internal class LargeObject
{
    public int Id { get; }
    public LargeObject(int id)
    {
        this.Id = id;
    }
    public byte[] Data { get; } = new byte[10000000];
}
It seems that it creates all the objects at the same time. How can I do it the right way?
The underlying idea is to invoke DoSomethingAsync in order to get some result for each object, so that's why I use SelectMany. To simplify, I just have introduced a Task.Delay, but in real life it is a service that can process some items concurrently, so I want to introduce some concurrency mechanism to get advantage of it.
Please, notice that, theoretically, processing a little number of items at time shouldn't fill the memory. In fact, we only need each "large object" to get the results of the DoSomethingAsync method. After that point, the large object isn't used anymore.
I feel like I'm repeating myself. As with your last question and my last answer, what you need to do is limit the number of bigObjects™ that are created concurrently.
To do so, you need to combine object creation and processing and put them on the same thread pool. The problem is that we use async methods so that threads can do other things while the async method runs. Since your slow network call is async, your (fast) object creation code keeps creating large objects far faster than they are processed.
Instead, we can let Rx keep count of the number of concurrently running observables by combining the object creation with the async call and using .Merge(maxConcurrent) to limit concurrency.
As a bonus, we can also set a minimum time for each query to execute: just Zip it with something that takes a minimum delay.
static void Main()
{
    var selectMany = Enumerable.Range(1, 100)
        .ToObservable()
        .Select(i => Observable.Defer(() => Observable.Return(new LargeObject(i)))
            .SelectMany(o => Observable.FromAsync(() => DoSomethingAsync(o)))
            .Zip(Observable.Timer(TimeSpan.FromMilliseconds(400)), (el, _) => el)
        ).Merge(4);
    selectMany
        .Subscribe(r => Console.WriteLine(r));
    Console.ReadLine();
}
private static async Task<int> DoSomethingAsync(LargeObject lo)
{
    await Task.Delay(10000);
    return lo.Id;
}
internal class LargeObject
{
    public int Id { get; }
    public LargeObject(int id)
    {
        this.Id = id;
        Console.WriteLine(id + "!");
    }
    public byte[] Data { get; } = new byte[10000000];
}
It seems that it creates all the objects at the same time.
Yes, because you are creating them all at once.
If I simplify your code I can show you why:
void Main()
{
    var selectMany =
        Enumerable
            .Range(1, 5)
            .Do(x => Console.WriteLine($"{x}!"))
            .ToObservable()
            .SelectMany(i => Observable.FromAsync(() => DoSomethingAsync(i)));
    selectMany
        .Subscribe(r => Console.WriteLine(r));
}
private static async Task<int> DoSomethingAsync(int i)
{
    await Task.Delay(1);
    return i;
}
Running this produces:
1!
2!
3!
4!
5!
4
3
5
2
1
Because of the Observable.FromAsync you are allowing the source to run to completion before any of the results return. In other words you are quickly building all of the large objects, but slowly processing them.
You should allow Rx to run synchronously, but on the default scheduler so that your main thread is not blocked. The code will then run without any memory issues and your program will remain responsive on the main thread.
Here's the code for this:
var selectMany =
    Observable
        .Range(1, 100, Scheduler.Default)
        .Select(i => new LargeObject(i))
        .Select(o => DoSomethingAsync(o))
        .Select(t => t.Result);
(I've effectively replaced Enumerable.Range(1, 100).ToObservable() with Observable.Range(1, 100) as that will also help with some issues.)
I've tried testing other options, but so far anything that allows DoSomethingAsync to run asynchronously runs into the out of memory error.
ConcatMap supports this out of the box. That operator is not available in .NET, but you can get the same behavior with the Concat operator, which defers subscribing to each inner source until the previous one completes.
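A rough sketch of the Concat approach, assuming the System.Reactive NuGet package is referenced. ConcatDemo and Build are hypothetical names, and the LargeObject here is deliberately small; the point is only that each object is created when Concat subscribes to its inner source, not up front.

```csharp
using System;
using System.Reactive.Linq;
using System.Threading.Tasks;

public static class ConcatDemo
{
    // Hypothetical stand-in for the large object in the question (kept small here).
    private sealed class LargeObject
    {
        public int Id { get; }
        public byte[] Data { get; } = new byte[1024];
        public LargeObject(int id)
        {
            Id = id;
            Console.WriteLine($"{id}!");
        }
    }

    private static async Task<int> DoSomethingAsync(LargeObject lo)
    {
        await Task.Delay(100);
        return lo.Id;
    }

    public static IObservable<int> Build(int count) =>
        Observable.Range(1, count)
            // FromAsync does not run its delegate until it is subscribed to,
            // so the object is only created when Concat reaches this inner source.
            .Select(i => Observable.FromAsync(() => DoSomethingAsync(new LargeObject(i))))
            .Concat();

    public static async Task Main()
    {
        var results = await Build(3).ToList();
        Console.WriteLine(string.Join(", ", results));
    }
}
```

Concat amounts to a concurrency of one. If the service can handle several items at once, replacing .Concat() with .Merge(n) keeps the same lazy creation while allowing up to n inner sources to run together.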
You can introduce a time interval delay this way:
var source = Enumerable.Range(1, 100)
    .ToObservable()
    .Zip(Observable.Interval(TimeSpan.FromSeconds(1)), (i, ts) => i)
    .Select(i => new LargeObject(i))
    .SelectMany(o => Observable.FromAsync(() => DoSomethingAsync(o)));
So instead of pulling all 100 integers at once, immediately converting them to the LargeObject then calling DoSomethingAsync on all 100, it drips the integers out one-by-one spaced out one second each.
This is what a TPL+Rx solution would look like. Needless to say it is less elegant than Rx alone, or TPL alone. However, I don't think this problem is well suited for Rx:
void Main()
{
    var source = Observable.Range(1, 100);
    const int MaxParallelism = 5;
    var transformBlock = new TransformBlock<int, int>(
        async i => await DoSomethingAsync(new LargeObject(i)),
        new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = MaxParallelism });
    source.Subscribe(transformBlock.AsObserver());
    var selectMany = transformBlock.AsObservable();
    selectMany
        .Subscribe(r => Console.WriteLine(r));
}

Counting Non-Faulted Tasks causes re-execution of each task

I am saving a bunch of items to my database using async saves:
var tasks = items.Select(item =>
{
    var clone = item.MakeCopy();
    clone.Id = Guid.NewGuid();
    return dbAccess.SaveAsync(clone);
});
await Task.WhenAll(tasks);
I need to verify how many times SaveAsync was successful (it throws an exception if something goes wrong). I am using the IsFaulted flag to examine the tasks:
var successCount = tasks.Count(t => !t.IsFaulted);
The items collection consists of 3 elements, so SaveAsync should have been called three times, but it is executed 6 times. Upon closer examination I noticed that counting non-faulted tasks with tasks.Count(...) causes each of the tasks to re-run.
I suspect it has something to do with deferred LINQ execution but I am not sure why exactly and how to fix this.
Any suggestion why I observe this behavior and what would be the optimal pattern to avoid this artifact?
It happens because your Select query is enumerated multiple times: once by Task.WhenAll and again by Count. Select is lazy, so each enumeration runs the lambda again and calls SaveAsync again.
To fix it, force a single enumeration by calling ToList(). Both Task.WhenAll and Count then operate on the same list of tasks.
var tasks = items.Select(item =>
{
    var clone = item.MakeCopy();
    clone.Id = Guid.NewGuid();
    return dbAccess.SaveAsync(clone);
})
.ToList();
For more detail, see these answers:
https://stackoverflow.com/a/8240935/3872935
https://stackoverflow.com/a/20129161/3872935
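The effect is easy to reproduce without a database. In this sketch, DeferredDemo and CountCalls are hypothetical names, and Task.FromResult stands in for dbAccess.SaveAsync; the counter records how often the Select lambda actually runs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class DeferredDemo
{
    // Returns how many times the Select lambda ran for a three-element source.
    public static int CountCalls(bool materialize)
    {
        int calls = 0;
        IEnumerable<Task<int>> tasks = new[] { 1, 2, 3 }.Select(x =>
        {
            calls++;                       // stands in for dbAccess.SaveAsync(clone)
            return Task.FromResult(x);
        });
        if (materialize)
            tasks = tasks.ToList();        // runs the lambda once per item, right here

        Task.WhenAll(tasks).Wait();        // enumerates the sequence
        _ = tasks.Count(t => !t.IsFaulted); // enumerates it again unless it is a list
        return calls;
    }

    public static void Main()
    {
        Console.WriteLine(CountCalls(materialize: false)); // prints 6
        Console.WriteLine(CountCalls(materialize: true));  // prints 3
    }
}
```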

What is the functional way to properly set a dependent predicate for Observable sequence without side effect?

I have three observables: oGotFocusOrDocumentSaved, oGotFocus and oLostFocus. I would like oGotFocusOrDocumentSaved to push values only when _active is true. My implementation below works as needed, but it introduces a side effect on _active. Is there any way to remove the side effect but still get the same functionality?
class TestClass
{
    private bool _active = true;
    public TestClass(..)
    {
        ...
        var oLostFocus = Observable
            .FromEventPattern<EventArgs>(_view, "LostFocus")
            .Throttle(TimeSpan.FromMilliseconds(500));
        var oGotFocus = Observable
            .FromEventPattern<EventArgs>(_view, "GotFocus")
            .Throttle(TimeSpan.FromMilliseconds(500));
        var oGotFocusOrDocumentSaved = oDocumentSaved // some other observable
            .Merge<CustomEvtArgs>(oGotFocus)
            .Where(_ => _active)
            .Publish();
        var lostFocusDisposable = oLostFocus.Subscribe(_ => _active = false);
        var gotFocusDisposable = oGotFocus.Subscribe(_ => _active = true);
        // use case
        oGotFocusOrDocumentSaved.Subscribe(x => DoSomethingWith(x));
        ...
    }
    ...
}
It sounds like you really want an oDocumentSavedWhenHasFocus rather than an oGotFocusOrDocumentSaved observable.
So try using the .Switch() operator, like this:
var oDocumentSavedWhenHasFocus =
    oGotFocus
        .Select(x => oDocumentSaved.TakeUntil(oLostFocus))
        .Switch();
How this works should be fairly clear once you know how .Switch() behaves: each GotFocus starts a new subscription to oDocumentSaved that lasts until the next LostFocus, and Switch forwards values only from the most recent of those subscriptions.
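Here is a small sketch of that Switch query driven by subjects instead of UI events, assuming the System.Reactive NuGet package. SwitchDemo and the three subjects are hypothetical stand-ins for the view's GotFocus, LostFocus and document-saved sources; because subjects push synchronously, the result is deterministic.

```csharp
using System;
using System.Collections.Generic;
using System.Reactive;
using System.Reactive.Linq;
using System.Reactive.Subjects;

public static class SwitchDemo
{
    public static List<string> Run()
    {
        var gotFocus = new Subject<Unit>();
        var lostFocus = new Subject<Unit>();
        var saved = new Subject<string>();
        var seen = new List<string>();

        var savedWhileFocused = gotFocus
            .Select(_ => saved.TakeUntil(lostFocus))
            .Switch();

        using (savedWhileFocused.Subscribe(x => seen.Add(x)))
        {
            saved.OnNext("a");              // ignored: focus has not been gained yet
            gotFocus.OnNext(Unit.Default);
            saved.OnNext("b");              // delivered
            lostFocus.OnNext(Unit.Default);
            saved.OnNext("c");              // ignored: focus was lost
            gotFocus.OnNext(Unit.Default);
            saved.OnNext("d");              // delivered
        }
        return seen;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run())); // prints "b, d"
    }
}
```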
A combination of SelectMany and TakeUntil should get you where you need to be.
var oGotFocusOrDocumentSaved =
    from g in oGotFocus
    from d in oDocumentSaved
        .Merge<CustomEvtArgs>(oGotFocus)
        .TakeUntil(oLostFocus)
    select d;
It seems that you want to be notified when the document is saved, but only if the document currently has focus. Correct? (And you also want to be notified when the document gets focus, but that can easily be merged in later.)
Think in terms of windows instead of point events; i.e., join by coincidence.
Your requirement can be represented as a Join query whereby document saves are joined to focus windows, thus yielding notifications only when both overlap; i.e., when both are "active".
var oGotFocusOrDocumentSaved =
    (from saved in oDocumentSaved
     join focused in oGotFocus
     on Observable.Empty<CustomEvtArgs>() // oDocumentSaved has no duration
     equals oLostFocus // oGotFocus duration lasts until oLostFocus
     select saved)
    .Merge(oGotFocus);

Parallel Linq - return first result that comes back

I'm using PLINQ to run a function that tests serial ports to determine if they're a GPS device.
Some serial ports immediately are found to be a valid GPS. In this case, I want the first one to complete the test to be the one returned. I don't want to wait for the rest of the results.
Can I do this with PLINQ, or do I have to schedule a batch of tasks and wait for one to return?
PLINQ is probably not going to suffice here. While you can use .First, in .NET 4, this will cause it to run sequentially, which defeats the purpose. (Note that this will be improved in .NET 4.5.)
The TPL, however, is most likely the right answer here. You can create a Task<Location> for each serial port, and then use Task.WaitAny to wait on the first successful operation.
This provides a simple way to schedule a bunch of "tasks" and then just use the first result.
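A minimal sketch of the Task.WaitAny approach, using only the base class library. FirstGpsDemo, Probe and FirstMatch are hypothetical names, and Thread.Sleep stands in for the real serial-port test; a probe returns the port name when it finds a GPS and null otherwise.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class FirstGpsDemo
{
    // Hypothetical probe: returns the port name if it is a GPS, otherwise null.
    public static string Probe(string port, int delayMs, bool isGps)
    {
        Thread.Sleep(delayMs);
        return isGps ? port : null;
    }

    // Waits for tasks one at a time and returns the first successful, non-null result.
    public static string FirstMatch(IEnumerable<Task<string>> probes)
    {
        var pending = probes.ToList();
        while (pending.Count > 0)
        {
            int idx = Task.WaitAny(pending.ToArray());
            var done = pending[idx];
            pending.RemoveAt(idx);
            if (done.Status == TaskStatus.RanToCompletion && done.Result != null)
                return done.Result;
        }
        return null; // no port turned out to be a GPS
    }

    public static void Main()
    {
        var probes = new[]
        {
            Task.Run(() => Probe("COM1", 300, false)),
            Task.Run(() => Probe("COM2", 50, false)),
            Task.Run(() => Probe("COM3", 100, true)),
            Task.Run(() => Probe("COM4", 500, true)),
        };
        // With the delays above, COM3 is the first GPS to answer.
        Console.WriteLine(FirstMatch(probes));
    }
}
```

The loop is needed because the first task to finish is not necessarily a match; tasks that fail or return null are simply dropped and the wait continues. The remaining probes keep running in the background, so real code would probably also pass a CancellationToken to stop them.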
I have been thinking about this on and off for the past couple days and I can't find a built in PLINQ way to do this in C# 4.0. The accepted answer to this question of using FirstOrDefault does not return a value until the full PLINQ query is complete and still returns the (ordered) first result. The following extreme example shows the behavior:
var cts = new CancellationTokenSource();
var rnd = new ThreadLocal<Random>(() => new Random());
var q = Enumerable.Range(0, 11).Select(x => x).AsParallel()
    .WithCancellation(cts.Token)
    .WithMergeOptions(ParallelMergeOptions.NotBuffered)
    .WithDegreeOfParallelism(10)
    .AsUnordered()
    .Where(i => i % 2 == 0)
    .Select(i =>
    {
        if (i == 0)
            Thread.Sleep(3000);
        else
            Thread.Sleep(rnd.Value.Next(50, 100));
        return string.Format("dat {0}", i).Dump();
    });
cts.CancelAfter(5000);
// waits until all results are in, then returns first
q.FirstOrDefault().Dump("result");
I don't see a built-in way to immediately get the first available result, but I was able to come up with two workarounds.
The first creates Tasks to do the work and returns the Task, resulting in a quickly completed PLINQ query. The resulting tasks can be passed to WaitAny to get the first result as soon as it is available:
var cts = new CancellationTokenSource();
var rnd = new ThreadLocal<Random>(() => new Random());
var q = Enumerable.Range(0, 11).Select(x => x).AsParallel()
    .WithCancellation(cts.Token)
    .WithMergeOptions(ParallelMergeOptions.NotBuffered)
    .WithDegreeOfParallelism(10)
    .AsUnordered()
    .Where(i => i % 2 == 0)
    .Select(i =>
    {
        return Task.Factory.StartNew(() =>
        {
            if (i == 0)
                Thread.Sleep(3000);
            else
                Thread.Sleep(rnd.Value.Next(50, 100));
            return string.Format("dat {0}", i).Dump();
        });
    });
cts.CancelAfter(5000);
// returns as soon as the tasks are created
var ts = q.ToArray();
// wait till the first task finishes
var idx = Task.WaitAny(ts);
ts[idx].Result.Dump("res");
This is probably a terrible way to do it. Since the actual work of the PLINQ query is just a very fast Task.Factory.StartNew, it's pointless to use PLINQ at all. A simple .Select( i => Task.Factory.StartNew( ... on the IEnumerable is cleaner and probably faster.
The second workaround uses a queue (BlockingCollection) and just inserts results into this queue once they are computed:
var cts = new CancellationTokenSource();
var rnd = new ThreadLocal<Random>(() => new Random());
var q = Enumerable.Range(0, 11).Select(x => x).AsParallel()
    .WithCancellation(cts.Token)
    .WithMergeOptions(ParallelMergeOptions.NotBuffered)
    .WithDegreeOfParallelism(10)
    .AsUnordered()
    .Where(i => i % 2 == 0)
    .Select(i =>
    {
        if (i == 0)
            Thread.Sleep(3000);
        else
            Thread.Sleep(rnd.Value.Next(50, 100));
        return string.Format("dat {0}", i).Dump();
    });
cts.CancelAfter(5000);
var qu = new BlockingCollection<string>();
// ForAll blocks until PLINQ query is complete
Task.Factory.StartNew(() => q.ForAll(x => qu.Add(x)));
// get first result asap
qu.Take().Dump("result");
With this method, the work is done using PLINQ, and the BlockingCollection's Take() returns the first result as soon as the PLINQ query inserts it.
While this produces the desired result, I am not sure it has any advantage over just using the simpler Tasks + WaitAny.
Upon further review, you can apparently just use FirstOrDefault to solve this. PLINQ will not preserve ordering by default, and with an unbuffered query, will return immediately.
http://msdn.microsoft.com/en-us/library/dd460677.aspx
To accomplish this entirely with PLINQ in .NET 4.0:
SerialPorts.                      // Your IEnumerable of serial ports
    AsParallel().AsUnordered().   // Run as an unordered parallel query
    Where(IsGps).                 // Matching the predicate IsGps (Func<SerialPort, bool>)
    Take(1).                      // Taking the first match
    FirstOrDefault();             // And unwrap it from the IEnumerable (or null if none are found)
The key is to not use an ordered evaluation like First or FirstOrDefault until you have specified that you only care to find one.
