Reactive Extensions buffering subscriptions

I am fairly new to Rx and am having trouble finding a solution to my problem. I am using Rx to commence a download through a client library. Currently it looks like:
private void DownloadStuff(string descriptor, Action<Stuff> stuffAction)
{
this.stuffDownloader.GetStuffObservable(descriptor).Subscribe(x => stuffAction(x));
}
Where stuffDownloader is a wrapper around download logic defined in the client library. But I encountered a problem where I call DownloadStuff too much, causing many downloads, and overwhelming the system. Now what I would like to do is
private void DownloadStuff(string descriptor, Action<Stuff> stuffAction)
{
this.stuffDownloader.GetStuffObservable(descriptor)
.SlowSubscribe(TimeSpan.FromMilliseconds(50))
.Subscribe(x => stuffAction(x));
}
Where SlowSubscribe is some combination of Rx actions to only subscribe on some interval.
Normally I would just put these DownloadStuff calls on a queue and pull them off on an interval, but I've been trying to do more through Rx lately. Three solutions occur to me:
1. This functionality exists and can be done all on the subscription side.
2. This is possible but the infrastructure of the downloader is incorrect and should change (i.e. stuffDownloader needs to behave differently).
3. This shouldn't be done with Rx, do it another way.
It occurs to me #2 is possible by passing an IObservable of descriptors to the client library and somehow slowing how the descriptors get onto the Observable.

You could in theory use Rx to treat your requests as events. This way you could leverage the serializing nature of Rx to queue up downloads.
I would think that your network layer (or stuffDownloader) would do this for you, but if you want to join me for a hack....this is what I have come up with (Yeehaw!!)
1.
Don't pass an Action, use Rx!! You are basically losing the error handling here and setting yourself up for weird unhandled exceptions.
private void DownloadStuff(string descriptor, Action<Stuff> stuffAction)
becomes
private IObservable<Stuff> DownloadStuff(string descriptor)
2.
Now we just have one method calling another. Seems pointless. Throw away the abstraction.
3.
Fix the underlying. To me the stuffDownloader is not doing its job. Update the interface to take an IScheduler. Now you can pass it a dedicated EventLoopScheduler to enforce the serialization of the work
public IObservable<Stuff> GetStuffObservable(string descriptor, IScheduler scheduler)
4.
Fix the implementation?
As you want to serialize your requests (hmmmm....) you can just make the call synchronous.
private Stuff GetSync(string description)
{
var request = (HttpWebRequest)WebRequest.Create("http://se300328:90/");
var response =request.GetResponse();
var stuff = MapToStuff(response);
return stuff;
}
Now you just call that in your other method
public IObservable<Stuff> GetStuffObservable(string descriptor, IScheduler scheduler)
{
return Observable.Create<Stuff>(o =>
{
try
{
// Blocking call; it runs on whatever scheduler SubscribeOn is given below.
var stuff = GetSync(descriptor);
o.OnNext(stuff);
o.OnCompleted();
}
catch (Exception ex)
{
o.OnError(ex);
}
return Disposable.Empty; // If you want to be sync, you can't cancel!
})
.SubscribeOn(scheduler);
}
However, having done all of this, I am sure this is not what you really want. I would expect that there is a problem somewhere else in the system.
Another alternative is to consider using the Merge operator and its max concurrent feature.
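As a rough sketch of that last suggestion, and assuming the System.Reactive package is referenced, Merge with a maxConcurrent argument could be used like this. ThrottledDownloads, FakeDownload and DownloadAll are hypothetical names, and FakeDownload only simulates what stuffDownloader.GetStuffObservable might do; the important assumption is that the source observable is cold, so no work starts until Merge subscribes to it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reactive.Linq;
using System.Threading.Tasks;

public static class ThrottledDownloads
{
    private static readonly object Gate = new object();
    private static int active;

    // Highest number of simulated downloads that were in flight at once.
    public static int Peak { get; private set; }

    // Hypothetical stand-in for stuffDownloader.GetStuffObservable(descriptor).
    // Observable.FromAsync is cold: the work starts only on subscription.
    public static IObservable<string> FakeDownload(string descriptor)
    {
        return Observable.FromAsync(async () =>
        {
            lock (Gate) { active++; Peak = Math.Max(Peak, active); }
            await Task.Delay(50); // simulated network latency
            lock (Gate) { active--; }
            return "stuff:" + descriptor;
        });
    }

    // Merge(maxConcurrent) subscribes to at most maxConcurrent inner
    // observables at a time and holds the rest until a slot frees up.
    public static IObservable<string> DownloadAll(
        IEnumerable<string> descriptors, int maxConcurrent)
    {
        return descriptors.Select(d => FakeDownload(d)).Merge(maxConcurrent);
    }
}
```

A caller would subscribe once to DownloadAll(descriptors, 2) and handle errors in OnError, rather than calling DownloadStuff once per descriptor. This limits concurrency rather than spacing subscriptions out by a fixed interval, so it is a different trade-off from the SlowSubscribe idea in the question.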

Using event handlers inside an akka.net Actor safely

I'm trying to build a file download actor, using Akka.net. It should send messages on download completion but also report download progress.
In .NET there are classes supporting asynchronous operations using more than one event. For example WebClient.DownloadFileAsync has two events: DownloadProgressChanged and DownloadFileCompleted.
Preferably, one would use the task based async version and use the .PipeTo extension method. But, I can't see how that would work with an async method exposing two events. As is the case with WebClient.DownloadFileAsync. Even with WebClient.DownloadFileTaskAsync you still need to handle DownloadProgressChanged using an event handler.
The only way I found to use this was to hook up two event handlers upon creation of my actor. Then in the handlers, I send messages to Self and the Sender. For this, I must refer to some private fields of the actor from inside the event handlers. This feels wrong to me, but I cannot see another way out.
Is there a safer way to use multiple event handlers in an Actor?
Currently, my solution looks like this (_client is a WebClient instance created in the constructor of the actor):
public void HandleStartDownload(StartDownload message)
{
_self = Self;
_downloadRequestor = Sender;
_uri = message.Uri;
_guid = message.Guid;
_tempPath = Path.GetTempFileName();
_client.DownloadFileAsync(_uri, _tempPath);
}
private void Client_DownloadFileCompleted(object sender, System.ComponentModel.AsyncCompletedEventArgs e)
{
var completedMessage = new DownloadCompletedInternal(_guid, _tempPath);
_downloadRequestor.Tell(completedMessage);
_self.Tell(completedMessage);
}
private void Client_DownloadProgressChanged(object sender, DownloadProgressChangedEventArgs e)
{
var progressedMessage = new DownloadProgressed(_guid, e.ProgressPercentage);
_downloadRequestor.Tell(progressedMessage);
_self.Tell(progressedMessage);
}
So when the download starts, some fields are set. Additionally, I make sure I Become a state where further StartDownload messages are stashed, until the DownloadCompleted message is received by Self:
public void Ready()
{
Receive<StartDownload>(message => {
HandleStartDownload(message);
Become(Downloading);
});
}
public void Downloading()
{
Receive<StartDownload>(message => {
Stash.Stash();
});
Receive<DownloadCompleted>(message => {
Become(Ready);
Stash.UnstashAll();
});
}
For reference, here's the entire Actor, but I think the important stuff is in this post directly: https://gist.github.com/AaronLenoir/4ce5480ecea580d5d283c5d08e8e71b5

I must refer to some private fields of the actor from inside the event
handlers. This feels wrong to me, but I cannot see another way out.
Is there a safer way to use multiple event handlers in an Actor?
There's nothing inherently wrong with an actor having internal state, and members that are part of that state raising events which are handled within the actor. No more wrong than this would be if taking an OO approach.
The only real concern is if that internal state gets mixed between multiple file download requests, but I think your current code is sound.
A possibly more palatable approach may be to look at the FileDownloadActor as a single use actor, fire it up, download the file, tell the result to the sender and then kill the actor. Starting up actors is a cheap operation, and this completely sidesteps the possibility of sharing the internal state between multiple download requests.
Unless of course you specifically need to queue downloads to run sequentially as your current code does - but the queue could be managed by another actor altogether and still treat the download actors as temporary.

I don't know if that is your case, but I see people treating Actors as micro services when they are simply objects. Remember Actors have internal state.
Now think about scalability, you can't scale messages to one Actor in a distributed Actor System. The messages you're sending to one Actor will be executed in the node executing that Actor.
If you want to execute download operations in parallel (for example), you do as Patrick said and create one Actor per download operation and that Actor can be executed in any available node.

What does the FabricNotReadableException mean? And how should we respond to it?

We are using the following method in a Stateful Service on Service Fabric. The service has partitions. Sometimes we get a FabricNotReadableException from this piece of code.
public async Task HandleEvent(EventHandlerMessage message)
{
var queue = await StateManager.GetOrAddAsync<IReliableQueue<EventHandlerMessage>>(EventHandlerServiceConstants.EventHandlerQueueName);
using(ITransaction tx = StateManager.CreateTransaction())
{
await queue.EnqueueAsync(tx, message);
await tx.CommitAsync();
}
}
Does that mean that the partition is down and is being moved? Or that we hit a secondary partition? Because there is also a FabricNotPrimaryException that is being raised in some cases.
I have seen the MSDN link (https://msdn.microsoft.com/en-us/library/azure/system.fabric.fabricnotreadableexception.aspx). But what does
Represents an exception that is thrown when a partition cannot accept reads.
mean? What happened that a partition cannot accept a read?

Under the covers Service Fabric has several states that can impact whether a given replica can safely serve reads and writes. They are:
Granted (you can think of this as normal operation)
Not Primary
No Write Quorum (again mainly impacting writes)
Reconfiguration Pending
FabricNotPrimaryException which you mention can be thrown whenever a write is attempted on a replica which is not currently the Primary, and maps to the NotPrimary state.
FabricNotReadableException maps to the other states (you don't really need to worry or differentiate between them), and can happen in a variety of cases. One example is if the replica you are trying to perform the read on is a "Standby" replica (a replica which was down and which has been recovered, but there are already enough active replicas in the replica set). Another example is if the replica is a Primary but is being closed (say due to an upgrade or because it reported fault), or if it is currently undergoing a reconfiguration (say for example that another replica is being added). All of these conditions will result in the replica not being able to satisfy writes for a small amount of time due to certain safety checks and atomic changes that Service Fabric needs to handle under the hood.
You can consider FabricNotReadableException retriable. If you see it, just try the call again and eventually it will resolve into either NotPrimary or Granted. If you get FabricNotPrimaryException, generally this should be thrown back to the client (or the client in some way notified) so that it can re-resolve in order to find the current Primary (the default communication stacks that Service Fabric ships take care of watching for non-retriable exceptions and re-resolving on your behalf).
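To illustrate the "just try the call again" advice, here is a minimal retry helper that uses only the base class library. TransientRetry and RunAsync are hypothetical names, not part of Service Fabric, and the bounded attempt count and fixed delay are assumptions made for the sketch. In a real service the predicate would presumably be something like ex => ex is FabricNotReadableException, and the operation would create a fresh transaction on every attempt.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TransientRetry
{
    // Runs the operation, retrying while isTransient(ex) is true and
    // attempts remain. The last failure, or any non-transient one,
    // propagates to the caller unchanged.
    public static async Task<T> RunAsync<T>(
        Func<Task<T>> operation,
        Func<Exception, bool> isTransient,
        int maxAttempts,
        TimeSpan delay,
        CancellationToken cancellationToken = default(CancellationToken))
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await operation();
            }
            catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
            {
                // Give the replica a moment to finish reconfiguring.
                await Task.Delay(delay, cancellationToken);
            }
        }
    }
}
```

FabricNotPrimaryException would deliberately not match the predicate, since the answer above says it should go back to the client so that it can re-resolve the Primary.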
There are two current known issues with FabricNotReadableException.
FabricNotReadableException should have two variants. The first should be explicitly retriable (FabricTransientNotReadableException) and the second should be FabricNotReadableException. The first version (Transient) is the most common and is probably what you are running into, certainly what you would run into in the majority of cases. The second (non-transient) would be returned in the case where you end up talking to a Standby replica. Talking to a standby won't happen with the out of the box transports and retry logic, but if you have your own it is possible to run into it.
The other issue is that today the FabricNotReadableException should be deriving from FabricTransientException, making it easier to determine what the correct behavior is.

Posted as an answer (to asnider's comment - Mar 16 at 17:42) because it was too long for comments! :)
I am also stuck in this catch 22. My svc starts and immediately receives messages. I want to encapsulate the service startup in OpenAsync and set up some ReliableDictionary values, then start receiving messages. However, at this point the Fabric is not Readable and I need to split this "startup" between OpenAsync and RunAsync :(
RunAsync in my service and OpenAsync in my client also seem to have different Cancellation tokens, so I need to work around how to deal with this too. It just all feels a bit messy. I have a number of ideas on how to tidy this up in my code but has anyone come up with an elegant solution?
It would be nice if ICommunicationClient had a RunAsync interface that was called when the Fabric becomes ready/readable and cancelled when the Fabric shuts down the replica - this would seriously simplify my life. :)

I was running into the same problem. My listener was starting up before the main thread of the service. I queued the list of listeners needing to be started, and then activated them all early on in the main thread. As a result, all messages coming in were able to be handled and placed into the appropriate reliable storage. My simple solution (this is a service bus listener):
public Task<string> OpenAsync (CancellationToken cancellationToken)
{
string uri;
Start ();
uri = "<your endpoint here>";
return Task.FromResult (uri);
}
public static object lockOperations = new object ();
public static bool operationsStarted = false;
public static List<ClientAuthorizationBusCommunicationListener> pendingStarts = new List<ClientAuthorizationBusCommunicationListener> ();
public static void StartOperations ()
{
lock (lockOperations)
{
if (!operationsStarted)
{
foreach (ClientAuthorizationBusCommunicationListener listener in pendingStarts)
{
listener.DoStart ();
}
operationsStarted = true;
}
}
}
private static void QueueStart (ClientAuthorizationBusCommunicationListener listener)
{
lock (lockOperations)
{
if (operationsStarted)
{
listener.DoStart ();
}
else
{
pendingStarts.Add (listener);
}
}
}
private void Start ()
{
QueueStart (this);
}
private void DoStart ()
{
ServiceBus.WatchStatusChanges (HandleStatusMessage,
this.clientId,
out this.subscription);
}
========================
In the main thread, you call the function to start listener operations:
protected override async Task RunAsync (CancellationToken cancellationToken)
{
ClientAuthorizationBusCommunicationListener.StartOperations ();
...
This problem likely manifested itself here as the bus in question already had messages and started firing the second the listener was created. Trying to access anything in state manager was throwing the exception you were asking about.

That async-ing feeling - httpclient and mvc thread blocking

Dilemma, dilemma...
I've been working up a solution to a problem that uses async calls to the HttpClient library (GetAsync=>ConfigureAwait(false) etc). In a console app, my dll is very responsive and the mixture of using the async await calls and the Parallel.ForEach(=>) really makes me glow.
Now for the issue. After moving from this test harness to the target app, things have become problematic. I'm using asp.net mvc 4 and have hit a few issues. The main issue really is that calling my process on a controller action actually blocks the main thread until the async actions are complete. I've tried using an async controller pattern, I've tried using Task.Factory, I've tried using new Threads. You name it, I've tried all the flavours - and then some!
Now, I appreciate that the nature of http is not designed to facilitate long processes like this and there are a number of articles here on SO that say don't do it. However, there are mitigating reasons why i NEED to use this approach. The main reason that I need to run this in mvc is due to the fact that I actually update the live data cache (on the mvc app) in realtime via raising an event in my dll's code. This means that fragments of the 50-60 data feeds can be pushed out live before the entire async action is complete. Therefore, client apps can receive partial updates within seconds of the async action being instigated. If I were to delegate the process out to a console app that ran the entire process in the background, I'd no longer be able to harness those fragment partial updates and this is the raison d'etre behind the entire choice of this architecture.
Can anyone shed light on a solution that would allow me to mitigate the blocking of the thread, whilst at the same time allowing each async fragment to be consumed by my object model and fed out to the client apps (I'm using SignalR to make these client updates)? A kind of nirvana would be a scenario where an out-of-process cache object could be shared between numerous processes - the cache update could then be triggered and consumed by my mvc process (aka - http://devproconnections.com/aspnet-mvc/out-process-caching-aspnet). And so back to reality...
I have also considered using a secondary webservice to achieve this, but would welcome other options before once again over engineering my solution (there are already many moving parts and a multitude of async Actions going on).
Sorry not to have added any code, I'm hoping for practical philosophy/insights, rather than code help on this, tho would of course welcome coded examples that illustrate a solution to my problem.
I'll update the question as we move in time, as my thinking process is still maturing on this.
[edit] - for the sake of clarity, the snippet below is my brothers grimm code collision (extracted from a larger body of work):
Parallel.ForEach(scrapeDataBases, new ParallelOptions()
{
MaxDegreeOfParallelism = Environment.ProcessorCount * 15
},
async dataBase =>
{
await dataBase.ScrapeUrlAsync().ConfigureAwait(false);
await UpdateData(dataType, (DataCheckerScrape)dataBase);
});

async and Parallel.ForEach do not mix naturally, so I'm not sure what your console solution looks like. Furthermore, Parallel should almost never be used on ASP.NET at all.
It sounds like what you would want is to just use Task.WhenAll.
On a side note, I think your reasoning around background processing on ASP.NET is incorrect. It is perfectly possible to have a separate process that updates the clients via SignalR.
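A minimal sketch of the Task.WhenAll suggestion, using only the base class library. ScrapeRunner, ScrapeAsync and the onFragment callback are hypothetical stand-ins for the question's ScrapeUrlAsync/UpdateData pair; the callback is where a partial update could be pushed out as soon as a single feed finishes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ScrapeRunner
{
    // Hypothetical stand-in for dataBase.ScrapeUrlAsync() followed by UpdateData(...).
    public static async Task<string> ScrapeAsync(string feed, Action<string> onFragment)
    {
        await Task.Delay(10).ConfigureAwait(false); // simulated HTTP call
        var result = "scraped:" + feed;
        onFragment(result); // publish this fragment without waiting for the others
        return result;
    }

    // Starts one task per feed, then waits for all of them without blocking
    // a thread. Results come back in the same order as the input feeds.
    public static Task<string[]> ScrapeAllAsync(
        IEnumerable<string> feeds, Action<string> onFragment)
    {
        var tasks = feeds.Select(feed => ScrapeAsync(feed, onFragment)).ToList();
        return Task.WhenAll(tasks);
    }
}
```

Unlike Parallel.ForEach with an async lambda, which returns as soon as each lambda reaches its first await, the task returned here completes only when every scrape has finished, so an async controller action can simply await it. The callback may run on several threads at once, so it needs to be thread-safe.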

Since your question is pretty high level without a lot of code, you could try Reactive Extensions.
Something like
private IEnumerable<Task<Scraper>> ScrappedUrls()
{
// Return the 50 to 60 tasks, one for each website, here.
// I assume they all return the same type.
// return .ScrapeUrlAsync().ConfigureAwait(false);
throw new NotImplementedException();
}
public async Task<IEnumerable<ScrapeOdds>> GetOdds()
{
var results = new Collection<ScrapeOdds>();
var urlRequest = ScrappedUrls();
var observerableUrls = urlRequest.Select(u => u.ToObservable()).Merge();
var publisher = observerableUrls.Publish();
var hubContext = GlobalHost.ConnectionManager.GetHubContext<OddsHub>();
publisher.Subscribe(scraper =>
{
// Whatever you do to convert to the result set
var scrapedOdds = scraper.GetOdds();
results.Add(scrapedOdds);
// update anything else you want when it arrives.
// Update SignalR here
hubContext.Clients.All.UpdatedOdds(scrapedOdds);
});
// Will fire off subscriptions and not continue until they are done.
await publisher;
return results;
}
The merge option will process the results as they come in. You can then update the signalR hubs plus whatever else you need to update as they come in. The controller action will have to wait for them all to come in. That's why there is an await on the publisher.
I don't really know if HttpClient is going to like having 50 - 60 web calls all at once or not. If it doesn't, you can just turn the IEnumerable into an array and break it down into smaller chunks. There should also be some error checking in there. With Rx you can also tell it to SubscribeOn and ObserveOn different threads, but I think with everything being pretty much async that wouldn't be necessary.

Attempt at an Asynchronous method is failing

I have an MVC3/.NET 4 application which uses Entity Framework (4.3.1 Code First)
I have wrapped EF into a Repository/UnitOfWork pattern as described here…
http://www.asp.net/mvc/tutorials/getting-started-with-ef-using-mvc/implementing-the-repository-and-unit-of-work-patterns-in-an-asp-net-mvc-application
Typically, as it explains in the article, when I require the creation of a new record I’ve been doing this…
public ActionResult Create(Course course)
{
unitOfWork.CourseRepository.Add(course);
unitOfWork.Save();
return RedirectToAction("Index");
}
However, when more than simply saving a record to a database is required I wrap the logic into what I’ve called an IService. For example…
private ICourseService courseService;
public ActionResult Create(Course course)
{
courseService.ProcessNewCourse(course);
return RedirectToAction("Index");
}
In one of my services I have something like the following…
public void ProcessNewCourse(Course course)
{
// Save the course to the database…
unitOfWork.CourseRepository.Add(course);
unitOfWork.Save();
// Generate a PDF and email some people about the new course being created, which requires more use of the unitOfWork…
var someInformation = unitOfWork.AnotherRepository.GetStuff();
var myPdfCreator = new PdfCreator();
IEnumerable<People> people = unitOfWork.PeopleRepository.GetAllThatWantNotifiying(course);
foreach(var person in people)
{
var message = "Hi " + person.FullName;
var attachment = myPdfCreator.CreatePdf();
// etc...
smtpClient.Send();
}
}
The above isn’t the actual code (my app has nothing to do with courses, I’m using view models, and I have separated the PDF creation and email message out into other classes) but the gist of what is going on is as above!
My problem is that the generation of the PDF and emailing it out is taking some time. The user just needs to know that the record has been saved to the database so I thought I would put the code below the unitOfWork.Save(); into an asynchronous method. The user can then be redirected and the server can happily take its time processing the emails, and attachments and whatever else I require it to do post save.
This is where I’m struggling.
I’ve tried a few things, the current being the following in ICourseService…
public class CourseService : ICourseService
{
private delegate void NotifyDelegate(Course course);
private NotifyDelegate notifyDelegate;
public CourseService()
{
notifyDelegate = new NotifyDelegate(this.Notify);
}
public void ProcessNewCourse(Course course)
{
// Save the course to the database…
unitOfWork.CourseRepository.Add(course);
unitOfWork.Save();
notifyDelegate.BeginInvoke(course, null, null); // no callback, no state object
}
private void Notify(Course course)
{
// All the stuff under unitOfWork.Save(); moved here.
}
}
My Questions/Problems
I’m randomly getting the error: "There is already an open DataReader associated with this Command which must be closed first." in the Notify() method.
Is it something to do with the fact that I’m trying to share the unitOfWork and therefore a dbContext across threads?
If so, can someone be kind enough to explain why this is a problem?
Should I be giving a new instance of unitOfWork to the Notify method?
Am I using the right patterns/classes to invoke the method asynchronously? Or should I be using something along the lines of....
new System.Threading.Tasks.Task(() => { Notify(course); }).Start();
I must say I've become very confused with the terms asynchronous, parallel, and concurrent!!
Any links to articles (c# async for idiots) would be appreciated!!
Many thanks.
UPDATE:
A little more digging got me to this SO page: https://stackoverflow.com/a/5491978/192999 which says...
"Be aware though that EF contexts are not thread safe, i.e. you cannot use the same context in more than one thread."
...so am I trying to achieve the impossible? Does this mean I should be creating a new IUnitOfWork instance for my new thread?

You could create a polling background thread that does the lengthy operation separately from your main flow. This thread could scan the database for new items (or items marked to process). This solution is pretty simple and ensures that jobs get done even if your application crashes (they will be picked up when the polling thread is started again).
You could also use a Synchronised Queue if it's not terrible if the request is 'lost', in the case your application crashes after the doc is requested and before it's generated/sent.
One thing is almost sure - as rikitikitik said - you will need to use a new unit of work, which means a separate transaction.
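A rough sketch of the "new unit of work" point, using only the base class library. IUnitOfWork here is a hypothetical minimal shape (the tutorial's real class has repositories on it), InMemoryUnitOfWork exists only so the sketch runs without Entity Framework, and BackgroundNotifier is an invented name. The idea is that the background work receives a factory and creates, uses and disposes its own unit of work, so no context is shared between the request thread and the worker thread.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical minimal shape of the unit of work from the question.
public interface IUnitOfWork : IDisposable
{
    void Save();
}

// In-memory stand-in so the sketch runs without Entity Framework.
public sealed class InMemoryUnitOfWork : IUnitOfWork
{
    public bool Saved { get; private set; }
    public bool Disposed { get; private set; }
    public void Save() { Saved = true; }
    public void Dispose() { Disposed = true; }
}

public sealed class BackgroundNotifier
{
    private readonly Func<IUnitOfWork> createUnitOfWork;

    public BackgroundNotifier(Func<IUnitOfWork> createUnitOfWork)
    {
        this.createUnitOfWork = createUnitOfWork;
    }

    // Takes the course id rather than the entity, because the entity is
    // still attached to the context that belongs to the web request.
    public Task Notify(int courseId, Action<IUnitOfWork, int> work)
    {
        return Task.Factory.StartNew(() =>
        {
            // This unit of work lives and dies on the background thread.
            using (var unitOfWork = createUnitOfWork())
            {
                work(unitOfWork, courseId);
                unitOfWork.Save();
            }
        });
    }
}
```

Work started this way is lost if the application pool recycles before it finishes, which is the trade-off the polling suggestion above avoids.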
You could also look at the question "Best threading queue example / best practice".

Simulating push technology by rebuilding the AsynchResult Object - is it even possible?

Recently, I successfully created a long-polling service using HttpAsyncHandlers. During the development it occurred to me that I “might” be able to re-use the AsyncResult object many times without long-polling repeatedly. If possible, I could then “simulate” push-technology by re-building or re-using the AsyncResult somehow (treating the first request as though it were a subscription-request).
Of course, the first call works great, but subsequent calls keep giving me “Object not set to an instance of an object”. I am “guessing” it is because certain objects are static, and therefore, once "completed" cannot be reused or retrieved (any insight there would be AWESOME!).
So the question is…
Is it possible to build dynamically a new callback from the old callback?
The initial "subscription" process goes like this:
public IAsyncResult BeginProcessRequest(HttpContext context, AsyncCallback cb, object extraData)
{
Guid id = new Guid(context.Request["Key"]);
AsyncResult request = new AsyncResult(cb, context, id);
Service.Singleton.Subscribe(request);
return request;
}
Here is an example of what the service does:
private void MainLoop()
{
while (true)
{
if (_subscribers.Count == 0)
{
if (_messages.Count == max)
_messages.Clear();
}
else
{
if (_messages.Count > 0)
{
Message message = _messages.Dequeue();
foreach (AsyncResult request in _subscribers.ToArray())
{
if (request.ProcessRequest(message)) // no trailing semicolon, so Remove runs only on success
_subscribers.Remove(request);
}
}
}
Thread.Sleep(500);
}
}
Here is an example of what the AsyncResult.ProcessRequest() call does:
public bool ProcessRequest(Message message)
{
try
{
this.Response = DoSomethingUseful(message);
this.Response.SessionValid = true;
}
catch (Exception ex)
{
this.Response = new Response();
this.Response.SessionValid = false;
}
this.IsCompleted = true;
_asyncCallback(this);
return this.IsCompleted;
}
SO...WOULD SOMETHING LIKE THIS BE POSSIBLE?
I literally tried this and it didn't work...but is SOMETHING "like" it possible?
AsyncResult newRequest = new AsyncResult(request.cb, request.context, request.id);
if(request.ProcessRequest(message))
{
_subscribers.Remove(request);
Subscribers.Add(newRequest);
}

IAsyncResult implementations must satisfy certain invariants, one of which is that it can only be completed once. You don't identify the AsyncResult you're using, but if it's Richter's famous version, then it would uphold that invariant.
If you don't want to go through the trouble of implementing the event-based asynchronous pattern, then the best option is Microsoft Rx, which is a true push-based system.
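To show what "push-based" means in contrast to a complete-once IAsyncResult, here is a small sketch built only on the IObservable<T> and IObserver<T> interfaces from the base class library. MessageHub and CollectingObserver are hypothetical names; with the Rx library itself, Subject<T> would play the role of MessageHub. A subscriber registers once and then receives every later message until it disposes its subscription.

```csharp
using System;
using System.Collections.Generic;

public sealed class MessageHub : IObservable<string>
{
    private readonly object gate = new object();
    private readonly List<IObserver<string>> observers = new List<IObserver<string>>();

    // Registers an observer; disposing the returned handle unsubscribes it.
    public IDisposable Subscribe(IObserver<string> observer)
    {
        lock (gate) { observers.Add(observer); }
        return new Subscription(this, observer);
    }

    // Pushes one message to every current subscriber. Nobody is removed
    // afterwards, so the same subscription serves the next message too.
    public void Publish(string message)
    {
        IObserver<string>[] snapshot;
        lock (gate) { snapshot = observers.ToArray(); }
        foreach (var observer in snapshot) observer.OnNext(message);
    }

    private sealed class Subscription : IDisposable
    {
        private readonly MessageHub hub;
        private readonly IObserver<string> observer;

        public Subscription(MessageHub hub, IObserver<string> observer)
        {
            this.hub = hub;
            this.observer = observer;
        }

        public void Dispose()
        {
            lock (hub.gate) { hub.observers.Remove(observer); }
        }
    }
}

// Simple observer that records what it was sent.
public sealed class CollectingObserver : IObserver<string>
{
    public readonly List<string> Received = new List<string>();
    public void OnNext(string value) { Received.Add(value); }
    public void OnError(Exception error) { }
    public void OnCompleted() { }
}
```

This only models the server-side subscription. Getting several responses back to a browser over one HTTP request is a separate problem that this sketch does not solve.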

Let me first preface by saying I am completely unfamiliar with IHttpAsyncHandler interface and usage.
That being said, in general when using an asynchronous programming model, each AsyncResult represents a specific asynchronous method call and should not be reused. Seems like you are looking more for a RegisterEvent(callback) method than a BeginProcessing(callback method) - so even if you were able to get this to work, the design does not hold by asynchronous programming best practices (IMHO).
I assume that since you are using http which is request/response based, it seems unlikely that you will be able to push multiple responses for one request and even if you were able to somehow hack this up, your client will eventually get a timeout due to its unanswered request which would be problematic for what you are going for.
I know in Remoting you can register for remote events and WCF supports duplex contracts which can enable "push technology" if this is an option for you.
Good luck.
