.NET Rx: in-order batch-processing of messages - C#

I am attempting to implement an asynchronous workflow using Rx and I seem to be doing it completely wrong.
What I would like to do is this:
1. Start from an undefined asynchronous stream of unparsed message strings (i.e. an IObservable<string>).
2. Parse the message strings asynchronously, but preserve their order (IObservable<Message>).
3. Batch up parsed Messages in groups of 100 or so (IObservable<IEnumerable<Message>>).
4. Send each batch, when complete, to the UI thread to be processed. Batches must arrive in the same order they were started.
I can't seem to get the order-preservation, and also Rx doesn't appear to be doing things asynchronously when I expected them to.
I made an attempt at order preservation by using an IEnumerable instead of an IObservable, and then calling the .AsParallel().AsOrdered() operators on it. Here is the code. See notes below for the issues I'm having:
private IObservable<IEnumerable<Message>> messageSource;
public IObservable<IEnumerable<Message>> MessageSource { get { return messageSource; } }
/// <summary>
/// Sub-classes of MessageProviderBase provide this IEnumerable to
/// generate unparsed message strings synchronously
/// </summary>
protected abstract IEnumerable<string> UnparsedMessages { get; }
public MessageProviderBase()
{
// individual parsed messages as a PLINQ query
var parsedMessages = from unparsedMessage in UnparsedMessages.AsParallel().AsOrdered()
select ParseMessage(unparsedMessage);
// convert the above PLINQ query to an observable, buffering up to 100 messages at a time
var batchedMessages
= parsedMessages.ToObservable().BufferWithTimeOrCount(TimeSpan.FromMilliseconds(200), 100);
// ISSUE #1:
// batchedMessages seems to call OnNext before all of the messages in its buffer are parsed.
// If you convert the IObservable<Message> it generates to an enumerable, it blocks
// when you try to enumerate it.
// Convert each batch to an IEnumerable
// ISSUE #2: Even if the following Rx query were to run asynchronously (it doesn't now, see the above comment),
// it could still deliver messages out of order. Only, instead of delivering individual
// messages out of order, the message batches themselves could arrive out of order.
messageSource = from messageBatch in batchedMessages
select messageBatch.ToEnumerable().ToList();
}

My answer below is somewhat based on Enigmativity's code, but fixes a number of race conditions related to completion and also adds support for cancellation and custom schedulers (which would make unit testing it significantly easier).
public static IObservable<U> Fork<T, U>(this IObservable<T> source,
Func<T, U> selector)
{
return source.Fork<T, U>(selector, Scheduler.TaskPool);
}
public static IObservable<U> Fork<T, U>(this IObservable<T> source,
Func<T, U> selector, IScheduler scheduler)
{
return Observable.CreateWithDisposable<U>(observer =>
{
var runningTasks = new CompositeDisposable();
var lockGate = new object();
var queue = new Queue<ForkTask<U>>();
var completing = false;
var subscription = new MutableDisposable();
Action<Exception> onError = ex =>
{
lock(lockGate)
{
queue.Clear();
observer.OnError(ex);
}
};
Action dequeue = () =>
{
lock (lockGate)
{
var error = false;
while (queue.Count > 0 && queue.Peek().Completed)
{
var task = queue.Dequeue();
observer.OnNext(task.Value);
}
if (completing && queue.Count == 0)
{
observer.OnCompleted();
}
}
};
Action onCompleted = () =>
{
lock (lockGate)
{
completing = true;
dequeue();
}
};
Action<T> enqueue = t =>
{
var cancellation = new MutableDisposable();
var task = new ForkTask<U>();
lock(lockGate)
{
runningTasks.Add(cancellation);
queue.Enqueue(task);
}
cancellation.Disposable = scheduler.Schedule(() =>
{
try
{
task.Value = selector(t);
lock(lockGate)
{
task.Completed = true;
runningTasks.Remove(cancellation);
dequeue();
}
}
catch(Exception ex)
{
onError(ex);
}
});
};
return new CompositeDisposable(runningTasks,
source.AsObservable().Subscribe(
t => { enqueue(t); },
x => { onError(x); },
() => { onCompleted(); }
));
});
}
private class ForkTask<T>
{
public T Value = default(T);
public bool Completed = false;
}
Here is a sample that randomizes the task execution time to test it:
AutoResetEvent are = new AutoResetEvent(false);
Random rand = new Random();
Observable.Range(0, 5)
.Fork(i =>
{
int delay = rand.Next(50, 500);
Thread.Sleep(delay);
return i + 1;
})
.Subscribe(
i => Console.WriteLine(i),
() => are.Set()
);
are.WaitOne();
Console.ReadLine();

Given you have:
IObservable<string> UnparsedMessages = ...;
Func<string, Message> ParseMessage = ...;
Then you could use a SelectAsync extension method like so:
IObservable<Message> ParsedMessages = UnparsedMessages.SelectAsync(ParseMessage);
The SelectAsync extension method processes each unparsed message asynchronously and ensures that the results come back in the order they arrived.
Let me know if this does what you need.
Here's the code:
public static IObservable<U> SelectAsync<T, U>(this IObservable<T> source,
Func<T, U> selector)
{
var subject = new Subject<U>();
var queue = new Queue<System.Threading.Tasks.Task<U>>();
var completing = false;
var subscription = (IDisposable)null;
Action<Exception> onError = ex =>
{
queue.Clear();
subject.OnError(ex);
subscription.Dispose();
};
Action dequeue = () =>
{
lock (queue)
{
var error = false;
while (queue.Count > 0 && queue.Peek().IsCompleted)
{
var task = queue.Dequeue();
if (task.Exception != null)
{
error = true;
onError(task.Exception);
break;
}
else
{
subject.OnNext(task.Result);
}
}
if (!error && completing && queue.Count == 0)
{
subject.OnCompleted();
subscription.Dispose();
}
}
};
Action<T> enqueue = t =>
{
if (!completing)
{
var task = new System.Threading.Tasks.Task<U>(() => selector(t));
queue.Enqueue(task);
task.ContinueWith(tu => dequeue());
task.Start();
}
};
subscription = source.Subscribe(
t => { lock(queue) enqueue(t); },
x => { lock(queue) onError(x); },
() => { lock(queue) completing = true; });
return subject.AsObservable();
}
I ended up needing to revisit this for work and wrote a more robust version of this code (based also on Richard's answer.)
The key advantage to this code is the absence of any explicit queue. I'm purely using task continuations to put the results back in order. Works like a treat!
public static IObservable<U> ForkSelect<T, U>(this IObservable<T> source, Func<T, U> selector)
{
return source.ForkSelect<T, U>(t => Task<U>.Factory.StartNew(() => selector(t)));
}
public static IObservable<U> ForkSelect<T, U>(this IObservable<T> source, Func<T, Task<U>> selector)
{
if (source == null) throw new ArgumentNullException("source");
if (selector == null) throw new ArgumentNullException("selector");
return Observable.CreateWithDisposable<U>(observer =>
{
var gate = new object();
var onNextTask = Task.Factory.StartNew(() => { });
var sourceCompleted = false;
var taskErrored = false;
Action<Exception> onError = ex =>
{
sourceCompleted = true;
onNextTask = onNextTask.ContinueWith(t => observer.OnError(ex));
};
Action onCompleted = () =>
{
sourceCompleted = true;
onNextTask = onNextTask.ContinueWith(t => observer.OnCompleted());
};
Action<T> onNext = t =>
{
var task = selector(t);
onNextTask = Task.Factory.ContinueWhenAll(new[] { onNextTask, task }, ts =>
{
if (!taskErrored)
{
if (task.IsFaulted)
{
taskErrored = true;
observer.OnError(task.Exception);
}
else
{
observer.OnNext(task.Result);
}
}
});
};
var subscription = source
.AsObservable()
.Subscribe(
t => { if (!sourceCompleted) lock (gate) onNext(t); },
ex => { if (!sourceCompleted) lock (gate) onError(ex); },
() => { if (!sourceCompleted) lock (gate) onCompleted(); });
var @return = new CompositeDisposable(subscription);
return @return;
});
}
And the SelectMany overloads to allow LINQ to be used are:
public static IObservable<U> SelectMany<T, U>(this IObservable<T> source, Func<T, Task<U>> selector)
{
return source.ForkSelect<T, U>(selector);
}
public static IObservable<V> SelectMany<T, U, V>(this IObservable<T> source, Func<T, Task<U>> taskSelector, Func<T, U, V> resultSelector)
{
if (source == null) throw new ArgumentNullException("source");
if (taskSelector == null) throw new ArgumentNullException("taskSelector");
if (resultSelector == null) throw new ArgumentNullException("resultSelector");
return source.Zip(source.ForkSelect<T, U>(taskSelector), (t, u) => resultSelector(t, u));
}
So these methods can now be used like this:
var observableOfU = observableOfT.ForkSelect(funcOfT2U);
Or:
var observableOfU = observableOfT.ForkSelect(funcOfT2TaskOfU);
Or:
var observableOfU =
from t in observableOfT
from u in funcOfT2TaskOfU(t)
select u;
Enjoy!
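The continuation-chaining idea can also be seen in isolation, without Rx. The following is only a minimal sketch using plain TPL calls; OrderedForkDemo and Run are hypothetical names, not part of the code above, and error handling and cancellation are deliberately left out. Each work item runs in parallel, but its result is appended by a continuation that also waits for the previous append, so the output keeps the input order.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class OrderedForkDemo
{
    // Hypothetical helper: runs selector for every input in parallel,
    // but collects the results in input order.
    public static List<int> Run(IEnumerable<int> inputs, Func<int, int> selector)
    {
        var results = new List<int>();
        Task tail = Task.CompletedTask; // chain of "emit" steps, one per input

        foreach (var input in inputs)
        {
            var captured = input;
            Task<int> work = Task.Run(() => selector(captured));
            Task previous = tail;

            // The emit step for this item runs only after the previous emit
            // step AND this item's work have both finished.
            tail = Task.Factory.ContinueWhenAll(
                new Task[] { previous, work },
                _ => results.Add(work.Result));
        }

        tail.Wait(); // the last emit step finishing means all earlier ones did too
        return results;
    }
}
```

For example, OrderedForkDemo.Run(new[] { 1, 2, 3, 4, 5 }, x => { Thread.Sleep((5 - x) * 20); return x * 10; }) returns 10, 20, 30, 40, 50 even though the later items finish their work first.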

Related

Is it better to block on an event in CPU-bound multithreaded method than making it async?

I have a method that will spawn lots of CPU-bound workers with Task.Run(). Each worker may in turn spawn more workers, but I'm guaranteed that eventually, all workers will stop executing. My first thought was writing my method like this:
public Result OrchestrateWorkers(WorkItem[] workitems)
{
this.countdown = new CountdownEvent(0);
this.results = new ConcurrentQueue<WorkerResult>();
foreach (var workItem in workitems)
{
SpawnWorker(workItem);
}
this.countdown.Wait(); // until all spawned workers have completed.
return ComputeTotalResult(this.results);
}
The public SpawnWorker method is used to start a worker, and to keep track of when they complete by enqueueing the worker's result and decrementing the countdown.
public void SpawnWorker(WorkItem workItem)
{
this.countdown.AddCount();
Task.Run(() => {
// Worker is passed an instance of this class
// so it can call SpawnWorker if it needs to.
var worker = new Worker(workItem, this);
var result = worker.DoWork();
this.results.Enqueue(result);
countdown.Signal();
});
}
Each worker can call SpawnWorker as much as they like, but they're guaranteed to terminate at some point.
In this design, the thread that calls OrchestrateWorkers will block until all the workers have completed. My thinking is that it's a shame that there's a blocked thread; it would be nice if it could be doing work as well.
Would it be better to rearchitect the solution to something like this?
public Task<Result> OrchestrateWorkersAsync(WorkItem[] workitems)
{
if (this.tcs is not null) throw new InvalidOperationException("Already running!");
this.tcs = new TaskCompletionSource<Result>();
this.countdown = 0; // just a normal integer.
this.results = new ConcurrentQueue<WorkerResult>();
foreach (var workItem in workitems)
{
SpawnWorker(workItem);
}
return tcs.Task;
}
public void SpawnWorker(WorkItem workItem)
{
Interlocked.Increment(ref this.countdown);
Task.Run(() => {
var worker = new Worker(workItem, this);
var result = worker.DoWork();
this.results.Enqueue(result);
if (Interlocked.Decrement(ref countdown) == 0)
{
this.tcs.SetResult(this.ComputeTotalResult(this.results));
}
});
}
EDIT: I've added a more fully fleshed-out sample below. It should be compilable and runnable. I'm seeing a ~10% performance improvement on my 8-core system, but I want to make sure this is the "canonical" way to orchestrate a swarm of spawning tasks.
using System.Collections.Concurrent;
using System.Diagnostics;
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using System.Linq;
public class Program
{
const int ITERATIONS = 2500000;
const int WORKERS = 200;
public static async Task Main()
{
var o = new Orchestrator<int, int>();
var oo = new OrchestratorAsync<int, int>();
var array = Enumerable.Range(0, WORKERS);
var result = Time(() => o.OrchestrateWorkers(array, DoWork));
Console.Error.WriteLine("Sync spawned {0} workers", result.Count());
var resultAsync = await TimeAsync(() => oo.OrchestrateWorkersAsync(array, DoWorkAsync));
Console.Error.WriteLine("Async spawned {0} workers", resultAsync.Count());
}
static async Task<T> TimeAsync<T>(Func<Task<T>> work)
{
var sw = new Stopwatch();
sw.Start();
var result = await work();
sw.Stop();
Console.WriteLine("Total async time: {0}", sw.ElapsedMilliseconds);
return result;
}
static T Time<T>(Func<T> work)
{
var sw = new Stopwatch();
sw.Start();
var result = work();
sw.Stop();
Console.WriteLine("Total time: {0}", sw.ElapsedMilliseconds);
return result;
}
static int DoWork(int x, Orchestrator<int, int> arg2)
{
var rnd = new Random();
int n = 0;
for (int i = 0; i < ITERATIONS; ++i)
{
n += rnd.Next();
}
if (x >= 0)
{
arg2.SpawnWorker(-1, DoWork);
arg2.SpawnWorker(-1, DoWork);
}
return n;
}
static int DoWorkAsync(int x, OrchestratorAsync<int, int> arg2)
{
var rnd = new Random();
int n = 0;
for (int i = 0; i < ITERATIONS; ++i)
{
n += rnd.Next();
}
if (x >= 0)
{
arg2.SpawnWorker(-1, DoWorkAsync);
arg2.SpawnWorker(-1, DoWorkAsync);
}
return n;
}
public class Orchestrator<TWorkItem, TResult>
{
private ConcurrentQueue<TResult> results;
private CountdownEvent countdownEvent;
public Orchestrator()
{
this.results = new();
this.countdownEvent = new(1);
}
public IEnumerable<TResult> OrchestrateWorkers(
IEnumerable<TWorkItem> workItems,
Func<TWorkItem, Orchestrator<TWorkItem, TResult>, TResult> worker)
{
foreach (var workItem in workItems)
{
SpawnWorker(workItem, worker);
}
countdownEvent.Signal();
countdownEvent.Wait();
return results;
}
public void SpawnWorker(
TWorkItem workItem,
Func<TWorkItem, Orchestrator<TWorkItem, TResult>, TResult> worker)
{
this.countdownEvent.AddCount(1);
Task.Run(() =>
{
var result = worker(workItem, this);
this.results.Enqueue(result);
countdownEvent.Signal();
});
}
}
public class OrchestratorAsync<TWorkItem, TResult>
{
private ConcurrentQueue<TResult> results;
private volatile int countdown;
private TaskCompletionSource<IEnumerable<TResult>> tcs;
public OrchestratorAsync()
{
this.results = new();
this.countdown = 0;
this.tcs = new TaskCompletionSource<IEnumerable<TResult>>();
}
public Task<IEnumerable<TResult>> OrchestrateWorkersAsync(
IEnumerable<TWorkItem> workItems,
Func<TWorkItem, OrchestratorAsync<TWorkItem, TResult>, TResult> worker)
{
this.countdown = 0; // just a normal integer.
foreach (var workItem in workItems)
{
SpawnWorker(workItem, worker);
}
return tcs.Task;
}
public void SpawnWorker(TWorkItem workItem,
Func<TWorkItem, OrchestratorAsync<TWorkItem, TResult>, TResult> worker)
{
Interlocked.Increment(ref this.countdown);
Task.Run(() =>
{
var result = worker(workItem, this);
this.results.Enqueue(result);
if (Interlocked.Decrement(ref countdown) == 0)
{
this.tcs.SetResult(this.results);
}
});
}
}
}
There's one big problem with the code as-written: the tasks fired off by Task.Run are discarded. This means there's no way to detect if anything goes wrong (i.e., an exception). It also means that there's not an easy way to aggregate results during execution, which is a common requirement; this lack of natural result handling is making the code collect results "out of band" in a separate collection.
These are signs that the code's structure needs adjusting. This is truly parallel code (i.e., not asynchronous), so parallel patterns are appropriate. You don't know up front how many tasks you need, so basic Data/Task Parallelism (such as a Parallel or PLINQ approach) won't suffice. At this point, you need Dynamic Task Parallelism, which is the most complex kind of parallelism. The TPL does support it, but your code has to use the lower-level APIs to get it done.
Since you have dynamically-added work and since your structure is generally tree-shaped (each work can add other work), you can introduce an artificial root and then use child tasks. This will give you two - and possibly three - benefits:
1. All exceptions are no longer ignored. Child task exceptions are propagated up to their parents, all the way to the root.
2. You know when all the tasks are complete. Since parent tasks only complete when all their children complete, there's no need for a countdown event or any other orchestrating synchronization primitive; your code just has to wait on the root task, and all the work is done when that task completes.
3. If it is possible/desirable to reduce results as you go (a common requirement), then the child tasks can return the results and you will end up with the already-reduced results as the result of your root task.
Example code (ignoring (3) since it's not clear whether results can be reduced):
public class OrchestratorParentChild<TWorkItem, TResult>
{
private readonly ConcurrentQueue<TResult> results = new();
public IEnumerable<TResult> OrchestrateWorkers(
IEnumerable<TWorkItem> workItems,
Func<TWorkItem, OrchestratorParentChild<TWorkItem, TResult>, TResult> worker)
{
var rootTask = Task.Factory.StartNew(
() =>
{
foreach (var workItem in workItems)
SpawnWorker(workItem, worker);
},
default,
TaskCreationOptions.None,
TaskScheduler.Default);
rootTask.Wait();
return results;
}
public void SpawnWorker(
TWorkItem workItem,
Func<TWorkItem, OrchestratorParentChild<TWorkItem, TResult>, TResult> worker)
{
_ = Task.Factory.StartNew(
() => results.Enqueue(worker(workItem, this)),
default,
TaskCreationOptions.AttachedToParent,
TaskScheduler.Default);
}
}
Note that an "orchestrator" isn't normally used. Code using the Dynamic Task Parallelism pattern usually just calls StartNew directly instead of calling some orchestrator "spawn work" method.
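To illustrate calling StartNew directly, here is a minimal sketch, assuming a made-up workload: it counts the nodes of an imaginary binary tree by starting one attached child task per node. AttachedChildDemo and CountNodes are hypothetical names used only for this illustration. The root is started with TaskCreationOptions.None because Task.Run applies DenyChildAttach, which would stop children from attaching.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class AttachedChildDemo
{
    // Hypothetical workload: visit every node of a binary tree of the given
    // depth, one attached child task per node, and count the visits.
    public static int CountNodes(int depth)
    {
        int count = 0;

        void Visit(int remaining)
        {
            Interlocked.Increment(ref count);
            if (remaining == 0) return;

            for (int i = 0; i < 2; i++)
            {
                // Attaches to the task that is currently executing Visit.
                Task.Factory.StartNew(
                    () => Visit(remaining - 1),
                    CancellationToken.None,
                    TaskCreationOptions.AttachedToParent,
                    TaskScheduler.Default);
            }
        }

        // Root task: must allow children to attach, so no Task.Run here.
        var root = Task.Factory.StartNew(
            () => Visit(depth),
            CancellationToken.None,
            TaskCreationOptions.None,
            TaskScheduler.Default);

        // Completes only after every attached descendant has completed;
        // exceptions from descendants would surface here as AggregateException.
        root.Wait();
        return count;
    }
}
```

With a depth of 3 the tree has 1 + 2 + 4 + 8 = 15 nodes, so CountNodes(3) returns 15 without any countdown event or explicit bookkeeping of outstanding tasks.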
In case you're wondering how this may look with results, here's one possibility:
public class OrchestratorParentChild<TWorkItem, TResult>
{
public TResult OrchestrateWorkers(
IEnumerable<TWorkItem> workItems,
Func<TWorkItem, OrchestratorParentChild<TWorkItem, TResult>, Func<IEnumerable<TResult>, TResult>, TResult> worker,
Func<IEnumerable<TResult>, TResult> resultReducer)
{
var rootTask = Task.Factory.StartNew(
() =>
{
var childTasks = workItems.Select(x => SpawnWorker(x, worker, resultReducer)).ToArray();
Task.WaitAll(childTasks);
return resultReducer(childTasks.Select(x => x.Result));
},
default,
TaskCreationOptions.None,
TaskScheduler.Default);
return rootTask.Result;
}
public Task<TResult> SpawnWorker(
TWorkItem workItem,
Func<TWorkItem, OrchestratorParentChild<TWorkItem, TResult>, Func<IEnumerable<TResult>, TResult>, TResult> worker,
Func<IEnumerable<TResult>, TResult> resultReducer)
{
return Task.Factory.StartNew(
() => worker(workItem, this, resultReducer),
default,
TaskCreationOptions.AttachedToParent,
TaskScheduler.Default);
}
}
As a final note, I rarely plug my book on this site, but you may find it helpful. A copy of "Parallel Programming with Microsoft® .NET: Design Patterns for Decomposition and Coordination on Multicore Architectures" is also worth having if you can find it; it's a bit out of date in some places but still good overall if you want to do TPL programming.

How can I route Observable values to different Subscribers?

This is all just pseudo code...
Ok here is my scenario, I have an incoming data stream that gets parsed into packets.
I have an IObservable<Packets> Packets
Each packet has a Packet ID, i.e. 1, 2, 3, 4
I want to create observables that only receive a specific ID.
so I do:
Packets.Where(p=>p.Id == 1)
for example... that gives me an IObservable<Packets> that only gives me packets of Id 1.
I may have several of these:
Packets.Where(p=>p.Id == 2)
Packets.Where(p=>p.Id == 3)
Packets.Where(p=>p.Id == 4)
Packets.Where(p=>p.Id == 5)
This essentially works, but the more Ids I want to select the more processing is required, i.e. the p=>p.Id will be run for every single Id, even after a destination Observable has been found.
How can I do the routing so that it is more efficient, something analogous:
Dictionary listeners;
listeners.GetValue(packet.Id).OnDataReceived(packet)
so that as soon as an id is picked up by one of my IObservables, then none of the others get to see it?
Updates
Added an extension based on Lee Campbell's groupby suggestion:
public static class IObservableExtensions
{
class RouteTable<TKey, TSource>
{
public static readonly ConditionalWeakTable<IObservable<TSource>, IObservable<IGroupedObservable<TKey, TSource>>> s_routes = new ConditionalWeakTable<IObservable<TSource>, IObservable<IGroupedObservable<TKey, TSource>>>();
}
public static IObservable<TSource> Route<TKey, TSource>(this IObservable<TSource> source, Func<TSource, TKey> selector, TKey id)
{
var grouped = RouteTable<TKey, TSource>.s_routes.GetValue(source, s => s.GroupBy(p => selector(p)).Replay().RefCount());
return grouped.Where(e => e.Key.Equals(id)).SelectMany(e => e);
}
}
It would be used like this:
Subject<Packet> packetSubject = new Subject<Packet>();
var packets = packetSubject.AsObservable();
packets.Route((p) => p.Id, 5).Subscribe((p) =>
{
Console.WriteLine("5");
});
packets.Route((p) => p.Id, 4).Subscribe((p) =>
{
Console.WriteLine("4");
});
packets.Route((p) => p.Id, 3).Subscribe((p) =>
{
Console.WriteLine("3");
});
packetSubject.OnNext(new Packet() { Id = 1 });
packetSubject.OnNext(new Packet() { Id = 2 });
packetSubject.OnNext(new Packet() { Id = 3 });
packetSubject.OnNext(new Packet() { Id = 4 });
packetSubject.OnNext(new Packet() { Id = 5 });
packetSubject.OnNext(new Packet() { Id = 4 });
packetSubject.OnNext(new Packet() { Id = 3 });
output is:
3, 4, 5, 4, 3
It only checks the Id for every group when it sees a new packet id.
Here's an operator that I wrote quite some time ago, but I think it does what you're after. I still think that a simple .Where is probably better - even with multiple subscribers.
Nevertheless, I wanted a .ToLookup for observables that operates like the same operator for enumerables.
It isn't memory efficient, but it implements IDisposable so that it can be cleaned up afterwards. It also isn't thread-safe so a little hardening might be required.
Here it is:
public static class ObservableEx
{
public static IObservableLookup<K, V> ToLookup<T, K, V>(this IObservable<T> source, Func<T, K> keySelector, Func<T, V> valueSelector, IScheduler scheduler)
{
return new ObservableLookup<T, K, V>(source, keySelector, valueSelector, scheduler);
}
internal class ObservableLookup<T, K, V> : IDisposable, IObservableLookup<K, V>
{
private IDisposable _subscription = null;
private readonly Dictionary<K, ReplaySubject<V>> _lookups = new Dictionary<K, ReplaySubject<V>>();
internal ObservableLookup(IObservable<T> source, Func<T, K> keySelector, Func<T, V> valueSelector, IScheduler scheduler)
{
_subscription = source.ObserveOn(scheduler).Subscribe(
t => this.GetReplaySubject(keySelector(t)).OnNext(valueSelector(t)),
ex => _lookups.Values.ForEach(rs => rs.OnError(ex)),
() => _lookups.Values.ForEach(rs => rs.OnCompleted()));
}
public void Dispose()
{
if (_subscription != null)
{
_subscription.Dispose();
_subscription = null;
_lookups.Values.ForEach(rs => rs.Dispose());
_lookups.Clear();
}
}
private ReplaySubject<V> GetReplaySubject(K key)
{
if (!_lookups.ContainsKey(key))
{
_lookups.Add(key, new ReplaySubject<V>());
}
return _lookups[key];
}
public IObservable<V> this[K key]
{
get
{
if (_subscription == null) throw new ObjectDisposedException("ObservableLookup");
return this.GetReplaySubject(key).AsObservable();
}
}
}
}
public interface IObservableLookup<K, V> : IDisposable
{
IObservable<V> this[K key] { get; }
}
You would use it like this:
IObservable<Packets> Packets = ...
IObservableLookup<int, Packets> lookup = Packets.ToLookup(p => p.Id, p => p, Scheduler.Default);
lookup[1].Subscribe(p => { });
lookup[2].Subscribe(p => { });
// etc
The nice thing with this is that you can subscribe to values by key before a value with that key has been produced by the source observable.
Don't forget to call lookup.Dispose() when done to clean up the resources.
I would suggest looking at GroupBy and then checking if there is a performance pay off. I assume there is, but is it significant?
Packets.GroupBy(p=>p.Id)
Example code with tests on how to use GroupBy as a type of router
var scheduler = new TestScheduler();
var source = scheduler.CreateColdObservable(
ReactiveTest.OnNext(100, 1),
ReactiveTest.OnNext(200, 2),
ReactiveTest.OnNext(300, 3),
ReactiveTest.OnNext(400, 4),
ReactiveTest.OnNext(500, 5),
ReactiveTest.OnNext(600, 6),
ReactiveTest.OnNext(700, 7),
ReactiveTest.OnNext(800, 8),
ReactiveTest.OnNext(900, 9),
ReactiveTest.OnNext(1000, 10),
ReactiveTest.OnNext(1100, 11)
);
var router = source.GroupBy(i=>i%4)
.Publish()
.RefCount();
var zerosObserver = scheduler.CreateObserver<int>();
router.Where(grp=>grp.Key == 0)
.Take(1)
.SelectMany(grp=>grp)
.Subscribe(zerosObserver);
var onesObserver = scheduler.CreateObserver<int>();
router.Where(grp => grp.Key == 1)
.Take(1)
.SelectMany(grp => grp)
.Subscribe(onesObserver);
var twosObserver = scheduler.CreateObserver<int>();
router.Where(grp => grp.Key == 2)
.Take(1)
.SelectMany(grp => grp)
.Subscribe(twosObserver);
var threesObserver = scheduler.CreateObserver<int>();
router.Where(grp => grp.Key == 3)
.Take(1)
.SelectMany(grp => grp)
.Subscribe(threesObserver);
scheduler.Start();
ReactiveAssert.AreElementsEqual(new[] { ReactiveTest.OnNext(400, 4), ReactiveTest.OnNext(800, 8)}, zerosObserver.Messages);
ReactiveAssert.AreElementsEqual(new[] { ReactiveTest.OnNext(100, 1), ReactiveTest.OnNext(500, 5), ReactiveTest.OnNext(900, 9)}, onesObserver.Messages);
ReactiveAssert.AreElementsEqual(new[] { ReactiveTest.OnNext(200, 2), ReactiveTest.OnNext(600, 6), ReactiveTest.OnNext(1000, 10) }, twosObserver.Messages);
ReactiveAssert.AreElementsEqual(new[] { ReactiveTest.OnNext(300, 3), ReactiveTest.OnNext(700, 7), ReactiveTest.OnNext(1100, 11)}, threesObserver.Messages);
You can use GroupBy to split the data. I would suggest you set up all subscriptions first and then activate your source. Doing so would result in one huge nested GroupBy query, but it is also possible to multi-cast your groups and subscribe to them individually. I wrote a small helper utility to do so below.
Because you still might want to add new routes after the source has been activated (done through Connect), we use Replay to replay the groups. Replay is also a multi-cast operator, so we won't need Publish to multi-cast.
public sealed class RouteData<TKey, TSource>
{
private IConnectableObservable<IGroupedObservable<TKey, TSource>> myRoutes;
public RouteData(IObservable<TSource> source, Func<TSource, TKey> keySelector)
{
this.myRoutes = source.GroupBy(keySelector).Replay();
}
public IDisposable Connect()
{
return this.myRoutes.Connect();
}
public IObservable<TSource> Get(TKey id)
{
return myRoutes.FirstAsync(e => e.Key.Equals(id)).Merge();
}
}
public static class myExtension
{
public static RouteData<TKey, TSource> RouteData<TKey, TSource>(this IObservable<TSource> source, Func<TSource, TKey> keySelector)
{
return new RouteData<TKey, TSource>(source, keySelector);
}
}
Example usage:
public class myPackage
{
public int Id;
public myPackage(int id)
{
this.Id = id;
}
}
class program
{
static void Main()
{
var source = new[] { 0, 1, 2, 3, 4, 5, 4, 3 }.ToObservable().Select(i => new myPackage(i));
var routes = source.RouteData(e => e.Id);
var subscription = new CompositeDisposable(
routes.Get(5).Subscribe(Console.WriteLine),
routes.Get(4).Subscribe(Console.WriteLine),
routes.Get(3).Subscribe(Console.WriteLine),
routes.Connect());
Console.ReadLine();
}
}
You may want to consider writing a custom IObserver that does your bidding. I've included an example below.
void Main()
{
var source = Observable.Range(1, 10);
var switcher = new Switch<int, int>(i => i % 3);
switcher[0] = Observer.Create<int>(val => Console.WriteLine($"{val} Divisible by three"));
source.Subscribe(switcher);
}
class Switch<TKey,TValue> : IObserver<TValue>
{
private readonly IDictionary<TKey, IObserver<TValue>> cases;
private readonly Func<TValue,TKey> idExtractor;
public IObserver<TValue> this[TKey decision]
{
get
{
return cases[decision];
}
set
{
cases[decision] = value;
}
}
public Switch(Func<TValue,TKey> idExtractor)
{
this.cases = new Dictionary<TKey, IObserver<TValue>>();
this.idExtractor = idExtractor;
}
public void OnNext(TValue next)
{
IObserver<TValue> nextCase;
if (cases.TryGetValue(idExtractor(next), out nextCase))
{
nextCase.OnNext(next);
}
}
public void OnError(Exception e)
{
foreach (var successor in cases.Values)
{
successor.OnError(e);
}
}
public void OnCompleted()
{
foreach (var successor in cases.Values)
{
successor.OnCompleted();
}
}
}
You would obviously need to implement idExtractor to extract the ids from your packet.

Read and Write as Parallel Tasks

I am reading and writing in two parallel tasks, as shown below:
Task[] tasks = new Task[2];
var entityCollection = new BlockingCollection<Dictionary<String, object>>();
tasks[0] = Task.Factory.StartNew(() => ReadData(entityCollection), TaskCreationOptions.LongRunning);
tasks[1] = Task.Factory.StartNew(() => WriteJsontoFile(JSONFileName, entityCollection), TaskCreationOptions.LongRunning);
Task.WaitAll(tasks);
Read Task:
private void ReadData(BlockingCollection<Dictionary<String, object>> collection)
{
do
{
//continuously data is being read in to entities, this part is working fine and then adding it to collection of BlockingCollection type to be consumed in Write task
entitites.ToList().ForEach(e => collection.Add(e));
} while (true);
collection.CompleteAdding();
}
Write Task:
private void WriteJsontoFile(String JsonFileName, BlockingCollection<Dictionary<String, object>> source)
{
using (StreamWriter sw = new StreamWriter(JsonFileName, true))
{
Parallel.ForEach(source.GetConsumingPartitioner(), (line) => ser.Serialize(sw, line));
}
}
GetConsumingPartitioner() related code:
public static class BlockingCollection
{
public static Partitioner<T> GetConsumingPartitioner<T>(
this BlockingCollection<T> collection)
{
return new BlockingCollectionPartitioner<T>(collection);
}
}
class BlockingCollectionPartitioner<T> : Partitioner<T>
{
private BlockingCollection<T> _collection;
internal BlockingCollectionPartitioner(BlockingCollection<T> collection)
{
if (collection == null)
throw new ArgumentNullException("collection");
_collection = collection;
}
public override bool SupportsDynamicPartitions
{
get { return true; }
}
public override IList<IEnumerator<T>> GetPartitions(int partitionCount)
{
if (partitionCount < 1)
throw new ArgumentOutOfRangeException("partitionCount");
var dynamicPartitioner = GetDynamicPartitions();
return Enumerable.Range(0, partitionCount).Select(_ =>
dynamicPartitioner.GetEnumerator()).ToArray();
}
public override IEnumerable<T> GetDynamicPartitions()
{
return _collection.GetConsumingEnumerable();
}
}
I am getting the exception below inside the Write task:
Count cannot be less than zero.\r\nParameter name: count
That is not the standard way of consuming a BlockingCollection. The BlockingCollection Class documentation consumes it like this:
// Consume the BlockingCollection
while (true) Console.WriteLine(bc.Take());
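For comparison, here is a minimal sketch of the conventional producer/consumer shape, assuming a single consumer that drains the collection with GetConsumingEnumerable. ProducerConsumerDemo and Run are hypothetical names and integers stand in for the entities; whether this removes the exception in the question is not something the thread establishes. A single consumer does at least avoid several threads writing to one StreamWriter at once, which is not safe for instance members.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ProducerConsumerDemo
{
    // Hypothetical stand-in for the read/write pair: one task adds items,
    // one task consumes them in order until adding is marked complete.
    public static List<int> Run(int itemCount)
    {
        var consumed = new List<int>();

        using (var collection = new BlockingCollection<int>(boundedCapacity: 16))
        {
            var producer = Task.Factory.StartNew(() =>
            {
                try
                {
                    for (int i = 0; i < itemCount; i++) collection.Add(i);
                }
                finally
                {
                    // Without this the consumer's loop would never end.
                    collection.CompleteAdding();
                }
            }, TaskCreationOptions.LongRunning);

            var consumer = Task.Factory.StartNew(() =>
            {
                // Blocks while the collection is empty; exits once it is
                // both empty and marked as complete for adding.
                foreach (var item in collection.GetConsumingEnumerable())
                {
                    consumed.Add(item); // a file write would go here instead
                }
            }, TaskCreationOptions.LongRunning);

            Task.WaitAll(producer, consumer);
        }

        return consumed;
    }
}
```

The finally block matters: in the question's ReadData the CompleteAdding call sits after a while (true) loop and can never be reached.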

LINQ query to perform a projection, skipping or wrapping exceptions where source throws on IEnumerable.GetNext()

I'd like a general solution, but as an example, assume I have an IEnumerable<string>, where some elements can be parsed as integers and some cannot.
var strings = new string[] { "1", "2", "notint", "3" };
Obviously if I did Select(s => int.Parse(s)) it would throw an exception when enumerated.
In this case I could do .All(s => int.TryParse(s, out temp)) first; however, I want a general solution where I don't have to enumerate the IEnumerable twice.
Ideally I'd like to be able to do the following, which calls my magic exception-skipping method:
// e.g. parsing strings
var strings = new string[] { "1", "2", "notint", "3" };
var numbers = strings.Select(s => int.Parse(s)).SkipExceptions();
// e.g. encountering null object
var objects = new object[] { new object(), new object(), null, new object() };
var objecttostrings = objects.Select(o => o.ToString()).SkipExceptions();
// e.g. calling a method that could throw
var myClassInstances = new MyClass[] { new MyClass(), new MyClass(CauseMethodToThrow:true) };
var myClassResultOfMethod = myClassInstances.Select(mci => mci.MethodThatCouldThrow()).SkipExceptions();
How can I write the SkipExceptions() extension method?
Some great answers for a SelectSkipExceptions() method; however, I wonder if a SkipExceptions() method could be created, along the same lines as AsParallel().
How about this (you might want to give this special Select Extension a better name)
public static IEnumerable<TOutput> SelectIgnoringExceptions<TInput, TOutput>(
this IEnumerable<TInput> values, Func<TInput, TOutput> selector)
{
foreach (var item in values)
{
TOutput output = default(TOutput);
try
{
output = selector(item);
}
catch
{
continue;
}
yield return output;
}
}
Edit5
Added a using statement, thanks for the suggestion in comments
public static IEnumerable<T> SkipExceptions<T>(
this IEnumerable<T> values)
{
using(var enumerator = values.GetEnumerator())
{
bool next = true;
while (next)
{
try
{
next = enumerator.MoveNext();
}
catch
{
continue;
}
if(next) yield return enumerator.Current;
}
}
}
However, this relies on the incoming IEnumerable still being lazily evaluated. If a preceding call has already materialized it as a list (and therefore already thrown the exception), this will not work, e.g. if you call it like this: Select(..).ToList().SkipExceptions()
Create a TryParseInt method that returns a Nullable<int>:
int? TryParseInt(string s)
{
int i;
if (int.TryParse(s, out i))
return i;
return null;
}
And use it in your query like that:
var numbers = strings.Select(s => TryParseInt(s))
.Where(i => i.HasValue)
.Select(i => i.Value);
See also this article by Bill Wagner, which presents a very similar case.
Now, I don't think you can write something like a generic SkipExceptions method, because you would catch the exception too late, and it would end the Select loop... But you could probably write a SelectSkipExceptions method:
public static IEnumerable<TResult> SelectSkipExceptions<TSource, TResult>(
this IEnumerable<TSource> source,
Func<TSource, TResult> selector)
{
if (source == null)
throw new ArgumentNullException("source");
if (selector == null)
throw new ArgumentNullException("selector");
return source.SelectSkipExceptionsIterator(selector);
}
private static IEnumerable<TResult> SelectSkipExceptionsIterator<TSource, TResult>(
this IEnumerable<TSource> source,
Func<TSource, TResult> selector)
{
foreach(var item in source)
{
TResult value = default(TResult);
try
{
value = selector(item);
}
catch
{
continue;
}
yield return value;
}
}
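As a rough usage sketch of the extension above, applied to the strings from the question. The class names SkipExtensions and Demo are made up for illustration, and the argument checks are left out for brevity:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SkipExtensions
{
    // Same idea as SelectSkipExceptions above: items whose selector throws are dropped.
    public static IEnumerable<TResult> SelectSkipExceptions<TSource, TResult>(
        this IEnumerable<TSource> source, Func<TSource, TResult> selector)
    {
        foreach (var item in source)
        {
            TResult value = default(TResult);
            try
            {
                value = selector(item);
            }
            catch
            {
                continue; // skip this item and move on to the next one
            }
            yield return value;
        }
    }
}

public static class Demo
{
    // Hypothetical helper that runs the example from the question.
    public static List<int> Run()
    {
        var strings = new[] { "1", "2", "notint", "3" };
        return strings.SelectSkipExceptions(s => int.Parse(s)).ToList();
    }
}
```

Demo.Run() should give 1, 2, 3: int.Parse throws a FormatException for "notint", so that item is dropped and the source is still enumerated only once.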
Even the accepted answer may not be "general" enough. What if some day you find that you need to know what exceptions occurred?
The following extension
static class EnumeratorHelper {
//Don't forget that GetEnumerator() call can throw exceptions as well.
//Since it is not easy to wrap this within a using + try catch block with yield,
//I have to create a helper function for the using block.
private static IEnumerable<T> RunEnumerator<T>(Func<IEnumerator<T>> generator,
Func<Exception, bool> onException)
{
using (var enumerator = generator())
{
if (enumerator == null)
yield break;
for (; ; )
{
//You don't know how to create a value of T,
//and you don't know whether it can be null,
//but you can always have a T[] with null value.
T[] value = null;
try
{
if (enumerator.MoveNext())
value = new T[] { enumerator.Current };
}
catch (Exception e)
{
if (onException(e))
continue;
}
if (value != null)
yield return value[0];
else
yield break;
}
}
}
public static IEnumerable<T> WithExceptionHandler<T>(this IEnumerable<T> orig,
Func<Exception, bool> onException)
{
return RunEnumerator(() =>
{
try
{
return orig.GetEnumerator();
}
catch (Exception e)
{
onException(e);
return null;
}
}, onException);
}
}
will help. Now you can add SkipExceptions:
public static IEnumerable<T> SkipExceptions<T>(this IEnumerable<T> orig){
return orig.WithExceptionHandler(e => true);
}
By using a different onException callback, you can do different things:
Break the iteration but ignore the exception: e => false
Try to continue iteration: e => true
Log the exception, etc
Here's a small complete program to demonstrate an answer inspired by the maybe monad. You might want to change the name of the 'Maybe' class, as it is inspired by rather than actually being a 'Maybe' as defined in other languages.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace TestMaybe
{
class Program
{
static void Main(string[] args)
{
var strings = new string[] { "1", "2", "notint", "3" };
var ints = strings.Select(s => new Maybe<string, int>(s, str => int.Parse(str))).Where(m => !m.nothing).Select(m => m.value);
foreach (var i in ints)
{
Console.WriteLine(i);
}
Console.ReadLine();
}
}
public class Maybe<T1, T2>
{
public readonly bool nothing;
public readonly T2 value;
public Maybe(T1 input, Func<T1, T2> map)
{
try
{
value = map(input);
}
catch (Exception)
{
nothing = true;
}
}
}
}
Edit: depending on the needs of your code, you might also want nothing set to true if the result of map(input) is null.
This is the same answer as Thomas's, but with a lambda & LINQ expression. +1 for Thomas.
Func<string, int?> tryParse = s =>
{
int? r = null;
int i;
if (int.TryParse(s, out i))
{
r = i;
}
return r;
};
var ints =
from s in strings
let i = tryParse(s)
where i != null
select i.Value;
You could just chain the Where and Select method together.
var numbers = strings.Where(s =>
{
int i;
return int.TryParse(s, out i);
}).Select(int.Parse);
The use of the Where method effectively removes the need for you to write your own SkipExceptions method, because this is basically what you are doing.
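Note that the chain above parses every valid string twice, once in Where and once in Select. A possible single-pass variant is sketched below; it assumes C# 7 or later for the out var syntax, and ParseDemo and ParseValid are illustrative names only:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ParseDemo
{
    // Parses each string once; unparsable strings become null and are filtered out.
    public static List<int> ParseValid(IEnumerable<string> strings)
    {
        return strings
            .Select(s => int.TryParse(s, out var i) ? (int?)i : null)
            .Where(i => i.HasValue)
            .Select(i => i.Value)
            .ToList();
    }
}
```

This is essentially the TryParseInt approach from the earlier answer, written inline.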

Is it possible to handle exceptions within LINQ queries?

Example:
myEnumerable.Select(a => ThisMethodMayThrowExceptions(a));
How can I make it work even if it throws exceptions? Something like a try/catch block with a default value in case an exception is thrown...
myEnumerable.Select(a =>
{
try
{
return ThisMethodMayThrowExceptions(a);
}
catch(Exception)
{
return defaultValue;
}
});
But actually, it has some smell.
About the lambda syntax:
x => x.something
is kind of a shortcut and could be written as
(x) => { return x.something; }
Call a projection which has that try/catch:
myEnumerable.Select(a => TryThisMethod(a));
...
public static Bar TryThisMethod(Foo a)
{
try
{
return ThisMethodMayThrowExceptions(a);
}
catch(BarNotFoundException)
{
return Bar.Default;
}
}
Admittedly I'd rarely want to use this technique. It feels like an abuse of exceptions in general, but sometimes there are APIs which leave you no choice.
(I'd almost certainly put it in a separate method rather than putting it "inline" as a lambda expression though.)
I have come up with a small extension for when I quickly want to try/catch every iteration of an IEnumerable<T>.
Usage
public void Test()
{
List<string> completedProcesses = initialEnumerable
.SelectTry(x => RiskyOperation(x))
.OnCaughtException(exception => { _logger.Error(exception); return null; })
.Where(x => x != null) // filter the ones which failed
.ToList();
}
The extension
public static class OnCaughtExceptionExtension
{
public static IEnumerable<SelectTryResult<TSource, TResult>> SelectTry<TSource, TResult>(this IEnumerable<TSource> enumerable, Func<TSource, TResult> selector)
{
foreach (TSource element in enumerable)
{
SelectTryResult<TSource, TResult> returnedValue;
try
{
returnedValue = new SelectTryResult<TSource, TResult>(element, selector(element), null);
}
catch (Exception ex)
{
returnedValue = new SelectTryResult<TSource, TResult>(element, default(TResult), ex);
}
yield return returnedValue;
}
}
public static IEnumerable<TResult> OnCaughtException<TSource, TResult>(this IEnumerable<SelectTryResult<TSource, TResult>> enumerable, Func<Exception, TResult> exceptionHandler)
{
return enumerable.Select(x => x.CaughtException == null ? x.Result : exceptionHandler(x.CaughtException));
}
public static IEnumerable<TResult> OnCaughtException<TSource, TResult>(this IEnumerable<SelectTryResult<TSource, TResult>> enumerable, Func<TSource, Exception, TResult> exceptionHandler)
{
return enumerable.Select(x => x.CaughtException == null ? x.Result : exceptionHandler(x.Source, x.CaughtException));
}
public class SelectTryResult<TSource,TResult>
{
internal SelectTryResult(TSource source, TResult result, Exception exception)
{
Source = source;
Result = result;
CaughtException = exception;
}
public TSource Source { get; private set; }
public TResult Result { get; private set; }
public Exception CaughtException { get; private set; }
}
}
We could eventually go a bit further by having a SkipOnException extension, accepting optionally an exception handler for example.
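One way such a SkipOnException extension might look. This is only a sketch: the method name, the optional handler parameter and its signature are assumptions, and SelectTry and SelectTryResult are re-declared in a trimmed form so that the block compiles on its own (C# 6 or later):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class SelectTryResult<TSource, TResult>
{
    public SelectTryResult(TSource source, TResult result, Exception exception)
    {
        Source = source;
        Result = result;
        CaughtException = exception;
    }
    public TSource Source { get; }
    public TResult Result { get; }
    public Exception CaughtException { get; }
}

public static class SkipOnExceptionExtension
{
    // Trimmed copy of SelectTry from the answer above.
    public static IEnumerable<SelectTryResult<TSource, TResult>> SelectTry<TSource, TResult>(
        this IEnumerable<TSource> source, Func<TSource, TResult> selector)
    {
        foreach (var element in source)
        {
            SelectTryResult<TSource, TResult> result;
            try
            {
                result = new SelectTryResult<TSource, TResult>(element, selector(element), null);
            }
            catch (Exception ex)
            {
                result = new SelectTryResult<TSource, TResult>(element, default(TResult), ex);
            }
            yield return result;
        }
    }

    // Hypothetical: yields only successful results; the optional handler
    // is told about each failed source item and its exception.
    public static IEnumerable<TResult> SkipOnException<TSource, TResult>(
        this IEnumerable<SelectTryResult<TSource, TResult>> results,
        Action<TSource, Exception> handler = null)
    {
        foreach (var r in results)
        {
            if (r.CaughtException == null)
                yield return r.Result;
            else
                handler?.Invoke(r.Source, r.CaughtException);
        }
    }
}

public static class SkipOnExceptionDemo
{
    // Returns the parsed values and collects the inputs that failed.
    public static List<int> Run(List<string> failed)
    {
        return new[] { "1", "x", "3" }
            .SelectTry(s => int.Parse(s))
            .SkipOnException((src, ex) => failed.Add(src))
            .ToList();
    }
}
```

Compared with returning null from OnCaughtException and filtering afterwards, this also works when TResult is a value type, since failed items are never yielded at all.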
In case you need Expression instead of lambda function (e.g. when selecting from IQueryable), you can use something like this:
public static class ExpressionHelper
{
public static Expression<Func<TSource, TResult>> TryDefaultExpression<TSource, TResult>(Expression<Func<TSource, TResult>> success, TResult defaultValue)
{
var body = Expression.TryCatch(success.Body, Expression.Catch(Expression.Parameter(typeof(Exception)), Expression.Constant(defaultValue, typeof (TResult))));
var lambda = Expression.Lambda<Func<TSource, TResult>>(body, success.Parameters);
return lambda;
}
}
Usage:
[Test]
public void Test()
{
var strings = new object [] {"1", "2", "woot", "3", Guid.NewGuid()}.AsQueryable();
var ints = strings.Select(ExpressionHelper.TryDefaultExpression<object, int>(x => Convert.ToInt32(x), 0));
Assert.IsTrue(ints.SequenceEqual(new[] {1, 2, 0, 3, 0}));
}
A variation of Stefan's solution for comprehension syntax:
from a in myEnumerable
select (new Func<myType>(() => {
try
{
return ThisMethodMayThrowExceptions(a);
}
catch(Exception)
{
return defaultValue;
}
}))();
It "smells" too, but this approach can still sometimes be used for running code with side effects inside an expression.
When dealing with LINQ you'll commonly find scenarios where your expression could produce undesired side effects. As Jon said, the best way to combat these sorts of problems is to have utility methods your LINQ expression can use that will handle them gracefully and in a fashion that won't blow up your code. For example, I have a method I've had to use from time to time which wraps a TryParse to tell me if something is a number. There are many other examples of course.
One of the limitations of the expression syntax is that there are a lot of things it can't do gracefully, or even at all, without breaking execution out of the expression temporarily to handle a given scenario. Parsing a subset of items in an XML file is a wonderful example. Try parsing a complex parent collection with child subsets from an XML file within a single expression and you'll soon find yourself writing several expression pieces that all come together to form the entire operation.
/// <summary>
/// Catch the exception and then omit the value if exception thrown.
/// </summary>
public static IEnumerable<T> Catch<T>(this IEnumerable<T> source, Action<Exception> action = null)
{
return Catch<T, Exception>(source, action);
}
/// <summary>
/// Catch the exception and then omit the value if exception thrown.
/// </summary>
public static IEnumerable<T> Catch<T, TException>(this IEnumerable<T> source, Action<TException> action = null) where TException : Exception
{
using var enumerator = source.GetEnumerator();
while(true)
{
T item;
try
{
if (!enumerator.MoveNext())
break;
item = enumerator.Current;
}
catch (TException e)
{
action?.Invoke(e);
continue;
}
yield return item;
}
}
/// <summary>
/// Catch the exception and then return the default value.
/// </summary>
public static IEnumerable<T> Catch<T>(this IEnumerable<T> source, Func<Exception, T> defaultValue)
{
return Catch<T, Exception>(source, defaultValue);
}
/// <summary>
/// Catch the exception and then return the default value.
/// </summary>
public static IEnumerable<T> Catch<T, TException>(this IEnumerable<T> source, Func<TException, T> defaultValue) where TException : Exception
{
using var enumerator = source.GetEnumerator();
while(true)
{
T item;
try
{
if (!enumerator.MoveNext())
break;
item = enumerator.Current;
}
catch (TException e)
{
item = defaultValue(e);
}
yield return item;
}
}
Usage:
myEnumerable.Select(a => ThisMethodMayThrowExceptions(a)).Catch(e => Console.WriteLine(e.Message));
myEnumerable.Select(a => ThisMethodMayThrowExceptions(a)).Catch(e => default);
myEnumerable.Select(a => ThisMethodMayThrowExceptions(a)).Catch();
myEnumerable.Select(a => ThisMethodMayThrowExceptions(a)).Catch((InvalidOperationException e) => Console.WriteLine(e.Message));
myEnumerable.Select(a => ThisMethodMayThrowExceptions(a)).Catch((InvalidOperationException e) => default);
I've created a small library for this purpose. It supports exception handling for the Select, SelectMany and Where operators.
Usage example:
var target = source.AsCatchable() // move source to catchable context
.Select(v => int.Parse(v)) // can throw an exception
.Catch((Exception e) => { /* some action */ }, () => -1)
.Select(v => v * 2)
.ToArray();
which is equivalent to
var target = source
.Select(v =>
{
try
{
return int.Parse(v);
}
catch (Exception)
{
return -1; // some default behaviour
}
})
.Select(v => v * 2)
.ToArray();
It's also possible to handle several types of exceptions:
var collection = Enumerable.Range(0, 5)
.AsCatchable()
.Select(v =>
{
if (v == 2) throw new ArgumentException("2");
if (v == 3) throw new InvalidOperationException("3");
return v.ToString();
})
.Catch((ArgumentException e) => { /* */ }, v => "ArgumentException")
.Catch((InvalidOperationException e) => { /* */ }, v => "InvalidOperationException")
.Catch((Exception e) => { /* */ })
.ToList();
Wrap ThisMethodMayThrowExceptions with a new function:
myEnumerable.Select(a => ThisMethodMayThrowExceptions(a)); //old
myEnumerable.Select(a => tryCall(ThisMethodMayThrowExceptions,a)); //new
It's a generic function with a try/catch inside:
T2 tryCall<T1, T2>(Func<T1, T2> fn, T1 input, T2 exceptionValue = default)
{
try
{
return fn(input);
}
catch
{
return exceptionValue;
}
}
var numbers = new [] {"1", "a"};
numbers.Select(n => tryCall(double.Parse, n)); //1, 0
numbers.Select(n => tryCall(double.Parse, n, double.NaN)); //1, NaN
I believe this is the correct answer since it allows you to handle the troublesome item and it is filtered from the end result.
public static class IEnumerableExtensions {
public static IEnumerable<TResult> SelectWithExceptionHandler<T, TResult>(this IEnumerable<T> enumerable, Func<T, TResult> func, Action<T, Exception> handler)
=> enumerable
.Select(x => {
try {
return (true, func(x));
} catch (Exception error) {
handler(x, error);
}
return default;
})
.Where(x => x.Item1)
.Select(x => x.Item2);
}
