Cancel a task in reactivex.net - c#

Assume I have existing code like:
public IEnumerable<DataType> GetAllData(string[] ids)
{
    foreach (var id in ids)
    {
        // this is a time-consuming operation, like a query against a database
        var data = this.repo.Find(id);
        yield return data;
    }
}
I tried to apply Rx to the front-end code:
var observable = GetAllData(new[] { "1", "2", "3" }).ToObservable();
var subs = observable
.SubscribeOn(Scheduler.Default)
.Subscribe(
data => Console.WriteLine(data.Id),
() => Console.WriteLine("All Data Fetched Completed"));
And it's working properly.
But once I've bound a subscription to the IObservable, is there any way to stop it from continuing to fetch data half-way? Disposing the subscription won't stop the enumeration.

Well, a simple approach is:
var cts = new CancellationTokenSource();
var observable = GetAllData(new[] { "1", "2", "3" }).ToObservable().TakeWhile(x => !cts.IsCancellationRequested);
var subs = observable
.SubscribeOn(Scheduler.Default)
.Subscribe(
data => Console.WriteLine(data.Id),
() => Console.WriteLine("All Data Fetched Completed"));
//...
cts.Cancel();
https://stackoverflow.com/a/31529841/2130786
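Alternatively, a sketch based on the ToObservable overload that accepts a scheduler (worth verifying against your Rx version): when the enumeration is driven by a scheduler, each MoveNext happens in a scheduled step, so disposing the subscription cancels the next step and the iterator stops early without a separate token.

```csharp
using System;
using System.Collections.Generic;
using System.Reactive.Concurrency;
using System.Reactive.Linq;
using System.Threading;

class Program
{
    // Stand-in for the slow repository-backed iterator from the question.
    static IEnumerable<int> GetAllData(string[] ids)
    {
        foreach (var id in ids)
        {
            Thread.Sleep(100); // simulates the time-consuming query
            yield return int.Parse(id);
        }
    }

    static void Main()
    {
        // Enumeration is driven by the scheduler: one MoveNext per scheduled call.
        var subscription = GetAllData(new[] { "1", "2", "3" })
            .ToObservable(Scheduler.Default)
            .Subscribe(x => Console.WriteLine(x));

        // Disposing cancels the next scheduled step, so the remaining
        // items are never pulled from the iterator.
        subscription.Dispose();
    }
}
```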

Related

SynchronousBuffer Extension Method for IObservable<T>

I need a Buffer method that doesn't buffer on time or on a certain condition.
It should behave similar to this snapshot method:
Taking a snapshot of ReplaySubject<T> buffer
However, it should not take a single snapshot; it should buffer when synchronous changes occur and provide them as IObservable<IList<T>>.
I think there should be a solution almost as simple as this Snapshot method, but I can't get my head around how to really solve it. (Note: the Snapshot method also works well for queries over multiple subjects.)
Here is a test method:
[TestMethod]
public async Task SyncBufferTest()
{
var i1 = new BehaviorSubject<int>(1);
var i2 = new BehaviorSubject<int>(4);
var sum = i1.CombineLatest(i2, (i1Value, i2Value) => i1Value + i2Value);
var listAsync = sum.SynchronousBuffer().Select(buf => buf.Last()).ToList().RunAsync(new CancellationToken());
Action syncChange1 = () =>
{
i1.OnNext(2);
i2.OnNext(5);
i1.OnNext(7);
};
Action syncChange2 = () =>
{
i1.OnNext(1);
i2.OnNext(1);
};
Action syncChange3 = () =>
{
i1.OnNext(3);
i1.OnCompleted();
i2.OnCompleted();
};
Task.Run(syncChange1)
.ContinueWith(t => syncChange2())
.ContinueWith(t => syncChange3());
var list = await listAsync;
CollectionAssert.AreEqual(new List<int> { 5, 12, 2, 4 }, list.ToList());
}
Background:
I am working on an architecture concept with a reactive data layer as the base of the application. The whole data layer consists of Subjects (a "talking" data layer). In a single transaction, several of these Subjects are changed. A higher layer of my application has many Observables that are queries over several of these Subjects, so I need this SynchronousBuffer to handle synchronous changes to all of the subjects without the queries being notified multiple times.
If you're looking for a reactive solution, it's always easier if you model your inputs as observables. In this case:
var i1 = new BehaviorSubject<int>(1);
var i2 = new BehaviorSubject<int>(4);
var sum = i1.CombineLatest(i2, (i1Value, i2Value) => i1Value + i2Value);
Action syncChange1 = () =>
{
i1.OnNext(2);
i2.OnNext(5);
i1.OnNext(7);
};
Action syncChange2 = () =>
{
i1.OnNext(1);
i2.OnNext(1);
};
Action syncChange3 = () =>
{
i1.OnNext(3);
i1.OnCompleted();
i2.OnCompleted();
};
IObservable<Action> actions = new Action[] { syncChange1, syncChange2, syncChange3 }.ToObservable();
This is the same setup as in the question; we're just structuring our Actions as an observable series of changes. Now, magic can happen:
var openWindow = new Subject<int>();
var closeWindow = new Subject<int>();
var gatedActions = actions
.Select((a, i) => new Action(() => {
openWindow.OnNext(i);
a();
closeWindow.OnNext(i);
}));
Now we have windows defined, which can easily be passed into .Buffer() or .Window().
// alternative to window. Not used.
var buffer = sum.Buffer(openWindow, i => closeWindow.Where(cwi => cwi == i));
var listAsync = sum
.Window(openWindow, i => closeWindow.Where(cwi => cwi == i))
.SelectMany(w => w.TakeLast(1))
.ToList()
.RunAsync(new CancellationToken());
gatedActions.Subscribe(a => a(), () => { openWindow.OnCompleted(); closeWindow.OnCompleted(); });
var list = await listAsync; //output is {12, 2, 4}. The starting 5 can be worked in with a .Merge() or something.
Another approach is to try to define a time window within which you consider changes to be synchronous:
var synchronousWindow = TimeSpan.FromMilliseconds(100);
var actions = new Action[] { syncChange1, syncChange2, syncChange3 };
IObservable<Unit> allChanges = Observable.Merge(
    i1.Select(_ => Unit.Default),
    i2.Select(_ => Unit.Default)
);
Once we have a time window, we can apply the same windowing/buffering techniques as in the other answer.
var buffer = sum.Buffer(allChanges.Throttle(synchronousWindow)); // alternative to Window if you like
IList<int> list = null;
var subscription = sum
    .Window(allChanges.Throttle(synchronousWindow))
    .SelectMany(w => w.TakeLast(1))
    .ToList()
    .Subscribe(l => { list = l; });
foreach (var a in actions)
{
    a();
    await Task.Delay(synchronousWindow);
}
CollectionAssert.AreEqual(new List<int> { 12, 2, 4 }, list.ToList()); // again, skipping 5

How to aggregate millions of rows using EF Core

I'm trying to aggregate approximately two million rows by user.
One user has several Transactions; each Transaction has a Platform and a TransactionType. I aggregate the Platform and TransactionType columns as JSON and save them as a single row.
But my code is slow.
How can I improve the performance?
public static void AggregateTransactions()
{
using (var db = new ApplicationDbContext())
{
db.ChangeTracker.AutoDetectChangesEnabled = false;
//Get a list of users who have transactions
var users = db.Transactions
.Select(x => x.User)
.Distinct();
foreach (var user in users.ToList())
{
//Get all transactions for a particular user
var _transactions = db.Transactions
.Include(x => x.Platform)
.Include(x => x.TransactionType)
.Where(x => x.User == user)
.ToList();
//Aggregate Platforms from all transactions for user
Dictionary<string, int> platforms = new Dictionary<string, int>();
foreach (var item in _transactions.Select(x => x.Platform).GroupBy(x => x.Name).ToList())
{
platforms.Add(item.Key, item.Count());
};
//Aggregate TransactionTypes from all transactions for user
Dictionary<string, int> transactionTypes = new Dictionary<string, int>();
foreach (var item in _transactions.Select(x => x.TransactionType).GroupBy(x => x.Name).ToList())
{
transactionTypes.Add(item.Key, item.Count());
};
db.Add<TransactionByDay>(new TransactionByDay
{
User = user,
Platforms = platforms, //The dictionary list is represented as json in table
TransactionTypes = transactionTypes //The dictionary list is represented as json in table
});
db.SaveChanges();
}
}
}
Update
So a basic view of the data would look like the following:
Transactions data:
Id: b11c6b67-6c74-4bbe-f712-08d609af20cf,
UserId: 1,
PlatformId: 3,
TransactionTypeId: 1
Id: 4782803f-2f6b-4d99-f717-08d609af20cf,
UserId: 1,
PlatformId: 3,
TransactionTypeId: 4
Aggregate data as TransactionByDay:
Id: 9df41ef2-2fc8-441b-4a2f-08d609e21559,
UserId: 1,
Platforms: {"p3":2},
TransactionTypes: {"t1":1,"t4":1}
So in this case, two transactions are aggregated into one. You can see that the platforms and transaction types are aggregated as JSON.
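For reference, the aggregation shown above can be reproduced with plain LINQ over in-memory records (the anonymous-type shape here is just an illustration, not the poster's actual schema):

```csharp
using System.Linq;

var transactions = new[]
{
    new { UserId = 1, Platform = "p3", Type = "t1" },
    new { UserId = 1, Platform = "p3", Type = "t4" },
};

var perUser = transactions
    .GroupBy(t => t.UserId)
    .Select(g => new
    {
        UserId = g.Key,
        // serializes to {"p3":2}
        Platforms = g.GroupBy(t => t.Platform).ToDictionary(p => p.Key, p => p.Count()),
        // serializes to {"t1":1,"t4":1}
        TransactionTypes = g.GroupBy(t => t.Type).ToDictionary(p => p.Key, p => p.Count()),
    })
    .ToList();
```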
You probably should not be calling db.SaveChanges() within the loop. Moving it outside the loop, so the changes are persisted once, may help.
Having said this, when dealing with large volumes of data where performance is key, I've found that ADO.NET is often a better choice. This does not mean you have to stop using Entity Framework, but perhaps for this method you could use ADO.NET. If you go down this path you could either:
Create a stored procedure to return the data you need to work on, populate a DataTable, manipulate the data, and then persist everything in bulk using SqlBulkCopy.
Use a stored procedure to completely perform this operation. This avoids the need to shuttle the data to your application and the entire processing can happen within the database itself.
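A minimal sketch of the SqlBulkCopy route. The table and column names, and the connectionString variable, are assumptions for illustration, not taken from the question:

```csharp
using System.Data;
using Microsoft.Data.SqlClient; // or System.Data.SqlClient

// connectionString is assumed to be defined elsewhere.
var table = new DataTable();
table.Columns.Add("UserId", typeof(int));
table.Columns.Add("Platforms", typeof(string));        // JSON column
table.Columns.Add("TransactionTypes", typeof(string)); // JSON column
table.Rows.Add(1, "{\"p3\":2}", "{\"t1\":1,\"t4\":1}");

using (var connection = new SqlConnection(connectionString))
{
    connection.Open();
    using (var bulk = new SqlBulkCopy(connection) { DestinationTableName = "TransactionByDay" })
    {
        // One bulk insert instead of row-by-row SaveChanges calls.
        bulk.WriteToServer(table);
    }
}
```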
LINQ to EF is not built for speed (LINQ to SQL is easier and faster IMHO, or you could run direct SQL commands through EF). Anyway, I don't know how this would do speed-wise:
using (var db = new MyContext(connectionstring))
{
var tbd = (from t in db.Transactions
group t by t.User
into g
let platforms = g.GroupBy(tt => tt.Platform.Name)
let trantypes = g.GroupBy(tt => tt.TransactionType.Name)
select new {
User = g.Key,
Platforms = platforms,
TransactionTypes = trantypes
}).ToList()
.Select(u => new TransactionByDay {
User=u.User,
Platforms=u.Platforms.ToDictionary(tt => tt.Key, tt => tt.Count()),
TransactionTypes = u.TransactionTypes.ToDictionary(tt => tt.Key, tt => tt.Count())
});
//...
}
The idea is to do fewer queries and Includes by fetching as much of the needed data as possible up front. There is no need to Include the Platform and TransactionType with every transaction when you can query them once into a Dictionary and look the data up. Furthermore, we can do the processing in Parallel and then save all the data at once.
public static void AggregateTransactions()
{
using (var db = new ApplicationDbContext())
{
db.ChangeTracker.AutoDetectChangesEnabled = false;
//Get a list of users who have transactions
var transactionsByUser = db.Transactions
    .GroupBy(x => x.User) // Not sure if EF Core supports this kind of grouping
    .ToList();
var platforms = db.Platforms.ToDictionary(ks => ks.PlatformId);
var transactionTypes = db.TransactionTypes.ToDictionary(ks => ks.TransactionTypeId);
var bag = new ConcurrentBag<TransactionByDay>();
Parallel.ForEach(transactionsByUser, userTransactions =>
{
    // Aggregate Platforms from all transactions for the user
    Dictionary<string, int> userPlatforms = new Dictionary<string, int>();
    foreach (var item in userTransactions.Select(x => platforms[x.PlatformId]).GroupBy(x => x.Name))
    {
        userPlatforms.Add(item.Key, item.Count());
    }
    // Aggregate TransactionTypes from all transactions for the user
    Dictionary<string, int> userTransactionTypes = new Dictionary<string, int>();
    foreach (var item in userTransactions.Select(x => transactionTypes[x.TransactionTypeId]).GroupBy(x => x.Name))
    {
        userTransactionTypes.Add(item.Key, item.Count());
    }
    bag.Add(new TransactionByDay
    {
        User = userTransactions.Key,
        Platforms = userPlatforms,              // The dictionary is represented as JSON in the table
        TransactionTypes = userTransactionTypes // The dictionary is represented as JSON in the table
    });
});
//Parallel.ForEach blocks until all iterations have completed, so it is safe to save here; convert it back to a regular foreach loop if you see no benefit from parallelism.
db.AddRange(bag);
db.SaveChanges();
}
}
Variation #2
public static void AggregateTransactions()
{
using (var db = new ApplicationDbContext())
{
db.ChangeTracker.AutoDetectChangesEnabled = false;
//Get a list of users who have transactions
var users = db.Transactions
.Select(x => x.User)
.Distinct().ToList();
var platforms = db.Platforms.ToDictionary(ks => ks.PlatformId);
var Transactiontypes = db.TransactionTypes.ToDictionary(ks => ks.TransactionTypeId);
var bag = new ConcurrentBag<TransactionByDay>();
Parallel.ForEach(users, user =>
{
//NOTE: DbContext is not thread-safe; in practice, use a separate context per iteration (or load the transactions up front).
var _transactions = db.Transactions
.Where(x => x.User == user)
.ToList();
//Aggregate Platforms from all transactions for user
Dictionary<string, int> userPlatforms = new Dictionary<string, int>();
Dictionary<string, int> userTransactions = new Dictionary<string, int>();
foreach(var transaction in _transactions)
{
if(platforms.TryGetValue(transaction.PlatformId, out var platform))
{
if(userPlatforms.TryGetValue(platform.Name, out var tmp))
{
userPlatforms[platform.Name] = tmp + 1;
}
else
{
userPlatforms.Add(platform.Name, 1);
}
}
if(Transactiontypes.TryGetValue(transaction.TransactionTypeId, out var type))
{
if(userTransactions.TryGetValue(type.Name, out var tmp))
{
userTransactions[type.Name] = tmp + 1;
}
else
{
userTransactions.Add(type.Name, 1);
}
}
}
bag.Add(new TransactionByDay
{
User = user,
Platforms = userPlatforms, //The dictionary list is represented as json in table
TransactionTypes = userTransactions //The dictionary list is represented as json in table
});
});
db.AddRange(bag);
db.SaveChanges();
}
}

How to ensure all tasks have been executed and query the final results

I'm using Tasks to perform a computation-intensive operation in a PerformCalculation method. The main parent task uses a TaskFactory to create three child tasks and starts execution. Each child task returns
a List<Dictionary<int, double>>
List<double> paramList = new List<double>{ 2, 2.5, 3};
CancellationTokenSource cts = new CancellationTokenSource();
Task parent = new Task(() =>
{
var tf = new TaskFactory<List<Dictionary<int, double>>>(cts.Token, TaskCreationOptions.AttachedToParent,
TaskContinuationOptions.ExecuteSynchronously, TaskScheduler.Default);
var childTasks = new[] {
tf.StartNew(() => PerformCalculation(cts.Token, paramList[0],Task.CurrentId)),
tf.StartNew(() => PerformCalculation(cts.Token, paramList[1],Task.CurrentId)),
tf.StartNew(() => PerformCalculation(cts.Token, paramList[2],Task.CurrentId)) //3rd entry
};
Upon successful execution, each child task's result is a List<Dictionary<int, double>>.
Now my requirement is to write a lambda expression that queries the results from all of the child tasks once they have finished executing and populates them into another list,
i.e. a list of lists (List<List<Dictionary<int, double>>>)
// When all children are done, get the value returned from the
// non-faulting/canceled tasks.
tf.ContinueWhenAll(childTasks, completedTasks =>
    completedTasks
        .Where(t => t.Status == TaskStatus.RanToCompletion) /* ??? Need HELP HERE ??? */,
    CancellationToken.None,
    TaskContinuationOptions.ExecuteSynchronously);
});
I'll sidestep the question a bit, because all of these APIs are essentially obsolete (or at least not suited to "mainstream" scenarios). This is actually fairly easy:
var childTasks = new[] {
Task.Run(() => PerformCalculation(cts.Token, paramList[0],Task.CurrentId)),
Task.Run(() => PerformCalculation(cts.Token, paramList[1],Task.CurrentId)),
Task.Run(() => PerformCalculation(cts.Token, paramList[2],Task.CurrentId)),
};
var results = Task.WhenAll(childTasks).Result;
TaskFactory is a fairly esoteric type that I have never seen used in the wild. Attached child tasks should be avoided since they add non-obvious dependencies. None of these are mistakes but they are smells.
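For completeness, here is roughly how the "collect results from the non-faulted tasks" part of the original question looks with the modern API. Compute is a hypothetical stand-in for PerformCalculation, and the empty catch exists only so faulted tasks don't abort the collection step:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

var paramList = new List<double> { 2, 2.5, 3 };

// Compute : double -> List<Dictionary<int, double>> (hypothetical)
var tasks = paramList
    .Select(p => Task.Run(() => Compute(p)))
    .ToArray();

try { await Task.WhenAll(tasks); }
catch { /* individual task states are inspected below */ }

// List<List<Dictionary<int, double>>> from the tasks that ran to completion.
var results = tasks
    .Where(t => t.Status == TaskStatus.RanToCompletion)
    .Select(t => t.Result)
    .ToList();
```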

Observable sequence that polls repository until a valid value is returned

I have to poll a database until it contains valid data.
To do this, I have a repository that should be queried every n seconds in order to get my own entity, called DestinationResponse.
class DestinationResponse
{
    public bool HasDestination { get; set; }
    public Destination Destination { get; set; }
}
When the DestinationResponse has its HasDestination property set to true, the Destination is returned.
So my observable sequence should consume the responses, waiting for one with HasDestination set to true. When that happens, it returns the Destination and the sequence completes. It will push at most one element!
My current approach is this:
var pollingPeriod = TimeSpan.FromSeconds(n);
var scheduler = new EventLoopScheduler(ts => new Thread(ts) {Name = "DestinationPoller"});
var observable = Observable.Interval(pollingPeriod, scheduler)
.SelectMany(_ => destinationRepository.GetDestination().ToObservable())
.TakeWhile(response => !response.HasDestination)
.TakeLast(1)
.Select(response => response.Destination);
I know it's wrong, mainly because the Interval call will keep firing even if the last call to GetDestination hasn't finished.
NOTE:
repository.GetDestination() returns a Task<DestinationResponse> and it actually queries the database.
Merging the answer from Database polling with Reactive Extensions with your example code, I think this gives you what you want.
var pollingPeriod = TimeSpan.FromSeconds(n);
var scheduler = new EventLoopScheduler(ts => new Thread(ts) {Name = "DestinationPoller"});
var query = Observable.Timer(pollingPeriod, scheduler)
.SelectMany(_ => destinationRepository.GetDestination().ToObservable())
.TakeWhile(response => response.HasDestination)
.Retry() //Loop on errors
.Repeat() //Loop on success
.Select(response => response.Destination)
.Take(1);
I think this may be the query I want. What do you think?
private IObservable<Destination> CreateOrderDestinationObservable(string boxId, int orderId)
{
var pollingPeriod = TimeSpan.FromSeconds(DestinationPollingDelay);
var scheduler = new EventLoopScheduler(ts => new Thread(ts) {Name = "DestinationPoller"});
var observable = Observable.Timer(pollingPeriod, scheduler)
.SelectMany(_ => externalBridgeRepository.GetDestination(boxId, orderId).ToObservable())
.Where(response => response.HasDestination)
.Retry()
.Repeat()
.Take(1)
.Select(response => response.Destination);
return observable;
}
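Since GetDestination returns a Task, the same pattern can also be written without SelectMany by combining Observable.FromAsync with DelaySubscription. This is a sketch under the assumption that those operators behave as documented: FromAsync defers the query until subscription, Repeat resubscribes only after the previous attempt completes, so polls never overlap.

```csharp
using System;
using System.Reactive.Linq;
using System.Threading.Tasks;

IObservable<Destination> PollForDestination(
    Func<Task<DestinationResponse>> getDestination, TimeSpan period) =>
    Observable.FromAsync(getDestination) // defers the query until subscription
        .DelaySubscription(period)       // waits before each poll
        .Where(r => r.HasDestination)    // empty completion => Repeat polls again
        .Repeat()
        .Select(r => r.Destination)
        .Take(1);                        // first valid destination ends the sequence
```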

Using rx to subscribe to event and perform logging after time interval

I have a simple use case where:
Receive a notification of events
Perform some action on the event
Print the content after x interval
How can I do the above steps in a single Rx pipeline?
Something like below:
void Main()
{
var observable = Observable.Interval(TimeSpan.FromSeconds(1));
// Receive event and call Foo()
observable.Subscribe(x=>Foo());
// After 1 minute, I want to print the result of count
// How do I do this using above observable?
}
int count = 0;
void Foo()
{
Console.Write(".");
count ++;
}
I think this does what you want:
var observable =
Observable
.Interval(TimeSpan.FromSeconds(1))
.Do(x => Foo())
.Window(() => Observable.Timer(TimeSpan.FromMinutes(1.0)));
var subscription =
observable
.Subscribe(xs => Console.WriteLine(count));
However, it's a bad idea to mix state with observables. If you had two subscriptions you'd increment count twice as fast. It's better to encapsulate your state within the observable so that each subscription would get a new instance of count.
Try this instead:
var observable =
Observable
.Defer(() =>
{
var count = 0;
return
Observable
.Interval(TimeSpan.FromSeconds(1))
.Select(x =>
{
Console.Write(".");
return ++count;
});
})
.Window(() => Observable.Timer(TimeSpan.FromMinutes(1.0)))
.SelectMany(xs => xs.LastAsync());
var subscription =
observable
.Subscribe(x => Console.WriteLine(x));
I get this kind of output:
...........................................................59
............................................................119
............................................................179
............................................................239
Remembering that it starts with 0 then this is timing pretty well.
After seeing paulpdaniels' answer, I realized that I could replace my Window/SelectMany/LastAsync with the simpler Sample operator.
Also, if we don't really need the side-effect of incrementing a counter then this whole observable shrinks down to this:
var observable =
Observable
.Interval(TimeSpan.FromSeconds(1.0))
.Do(x => Console.Write("."))
.Sample(TimeSpan.FromMinutes(1.0));
observable.Subscribe(x => Console.WriteLine(x));
Much simpler!
I would use Select + Sample:
var observable = Observable.Interval(TimeSpan.FromSeconds(1))
.Select((x, i) => {
Foo();
return i;
})
.Do(_ => Console.Write("."))
.Sample(TimeSpan.FromMinutes(1));
observable.Subscribe(x => Console.WriteLine(x));
Select has an overload that exposes the index of the current value; by returning that and then sampling at one-minute intervals, you get the last value emitted during each interval.
