I have the following actor, where I am trying to restart and resend the failing message back to the actor:
public class BuildActor : ReceivePersistentActor
{
    public override string PersistenceId => "asdad3333";

    private readonly IActorRef _nextActorRef;

    public BuildActor(IActorRef nextActorRef)
    {
        _nextActorRef = nextActorRef;
        Command<Workload>(x => Build(x));
        RecoverAny(workload =>
        {
            Console.WriteLine("Recovering");
        });
    }

    public void Build(Workload Workload)
    {
        var context = Context;
        var self = Self;
        Persist(Workload, async x =>
        {
            // after this line executes, the application
            // goes into break mode and does not execute
            // PreStart or Recover
            var workload = await BuildTask(Workload);
            _nextActorRef.Tell(workload);
            context.Stop(self);
        });
    }

    private Task<Workload> BuildTask(Workload Workload)
    {
        // works as expected if this method is made synchronous
        return Task.Run(() =>
        {
            // simulate an exception
            if (Workload.ShowException)
            {
                throw new Exception();
            }
            return Workload;
        });
    }

    protected override void PreRestart(Exception reason, object message)
    {
        if (message is Workload workload)
        {
            Console.WriteLine("Prestart");
            workload.ShowException = false;
            Self.Tell(message);
        }
    }
}
Inside the success handler of Persist I am trying to simulate an exception being thrown, but on exception the application goes into break mode and the PreRestart hook is not invoked. However, if I make the BuildTask method synchronous by removing Task.Run, then on exception both PreRestart and Recover<T> are invoked.
I would really appreciate it if someone could point me to the recommended pattern for this and tell me where I am going wrong.
Most probably, Akka.Persistence is not a good fit for your problem here.
Akka.Persistence uses event sourcing principles for storing an actor's state. A few key points are important in this context:
What you're sending to the actor is a command. It describes a job you want to be done. Executing that command may involve some actual processing and may eventually lead to persisting the actor's linear state-change history in the form of events.
In Akka.NET, the Persist method is used only to store events. Events describe facts, things that have already happened: because of that, they cannot be denied and they cannot fail; yet failing is exactly what the code in your Persist callback can do (see the sketch after these points).
When an actor restarts at any point in time, it will always try to recreate its own state by replaying all events persisted up to the last known point in time. For this reason the Recover method should focus only on replaying the actor's state (it can be called multiple times over the same event) and must never produce side effects (an example of a side effect is sending an email). Any exception thrown there means that the actor's state is irrecoverably corrupted and the actor will be killed.
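To illustrate that split, here is a minimal, hypothetical sketch: persist only a fact inside Persist, run the fallible work outside it, and turn the task's outcome back into ordinary messages with PipeTo. The BuildAccepted, BuildSucceeded and BuildFailed types are assumptions, not part of the original code:

public class BuildActor : ReceivePersistentActor
{
    public override string PersistenceId => "build-actor-1";

    public BuildActor(IActorRef nextActorRef)
    {
        Command<Workload>(w =>
        {
            // Persist only the fact that work was accepted; this callback must not fail.
            Persist(new BuildAccepted(w), e =>
            {
                // Run the fallible task outside Persist; its result or failure
                // comes back to the actor as a plain message.
                BuildTask(e.Workload).PipeTo(Self,
                    success: r => new BuildSucceeded(r),
                    failure: ex => new BuildFailed(e.Workload, ex));
            });
        });
        Command<BuildSucceeded>(m => nextActorRef.Tell(m.Workload));
        Command<BuildFailed>(m => { /* decide: retry m.Workload, or escalate */ });
        Recover<BuildAccepted>(e => { /* rebuild state only, no side effects */ });
    }

    private Task<Workload> BuildTask(Workload w) => Task.Run(() => w);
}

// Hypothetical immutable messages/events:
public sealed class BuildAccepted  { public readonly Workload Workload; public BuildAccepted(Workload w) { Workload = w; } }
public sealed class BuildSucceeded { public readonly Workload Workload; public BuildSucceeded(Workload w) { Workload = w; } }
public sealed class BuildFailed    { public readonly Workload Workload; public readonly Exception Cause; public BuildFailed(Workload w, Exception c) { Workload = w; Cause = c; } }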
If you want to resend the message to your actor, you could:
Put a reliable message queue (e.g. RabbitMQ or Azure Service Bus) or log (Kafka or Event Hub) in front of your actor processing pipeline. This is actually the most reasonable scenario in many cases.
Use at-least-once delivery semantics from Akka.Persistence (a sketch follows this list), but IMHO only if for some reason you cannot use the first solution.
The most simplistic and unreliable option (since messages reside only in memory and are never persisted) is the dead letter queue. Every unhandled message is sent there; you can subscribe to it and filter the incoming data to detect which messages should be sent again to their recipients.
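For the second option, an at-least-once delivery actor could look roughly like this (a sketch; the WorkloadQueued, WorkloadEnvelope and WorkloadConfirmed types are assumptions, and the destination must reply with WorkloadConfirmed carrying the delivery id):

public class ReliableForwarder : AtLeastOnceDeliveryActor
{
    public override string PersistenceId => "reliable-forwarder";
    private readonly ActorPath _destination;

    public ReliableForwarder(ActorPath destination)
    {
        _destination = destination;
    }

    protected override bool ReceiveCommand(object message)
    {
        switch (message)
        {
            case Workload w:
                // Persist intent first; Deliver keeps redelivering the envelope
                // until ConfirmDelivery is called with the same delivery id.
                Persist(new WorkloadQueued(w), e =>
                    Deliver(_destination, id => new WorkloadEnvelope(id, e.Workload)));
                return true;
            case WorkloadConfirmed c:
                Persist(c, e => ConfirmDelivery(e.DeliveryId));
                return true;
            default:
                return false;
        }
    }

    protected override bool ReceiveRecover(object message)
    {
        switch (message)
        {
            case WorkloadQueued q:
                Deliver(_destination, id => new WorkloadEnvelope(id, q.Workload));
                return true;
            case WorkloadConfirmed c:
                ConfirmDelivery(c.DeliveryId);
                return true;
            default:
                return false;
        }
    }
}

// Hypothetical immutable messages:
public sealed class WorkloadQueued    { public readonly Workload Workload; public WorkloadQueued(Workload w) { Workload = w; } }
public sealed class WorkloadEnvelope  { public readonly long DeliveryId; public readonly Workload Workload; public WorkloadEnvelope(long id, Workload w) { DeliveryId = id; Workload = w; } }
public sealed class WorkloadConfirmed { public readonly long DeliveryId; public WorkloadConfirmed(long id) { DeliveryId = id; } }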
Related
I need to do some fairly granular analysis of Windows events on some servers, and forward them to a syslog server. I've created a .NET Service which is working quite well, but there are some aspects to this that I do not understand.
Here's the Program.cs, which is pretty much out-of-the-box, but I've added some configuration stuff to it:
using IHost host = Host.CreateDefaultBuilder(args)
    .UseWindowsService(options =>
    {
        options.ServiceName = "LEULogSenderSVC";
    })
    .ConfigureServices((hostContext, services) =>
    {
        IConfiguration configuration = hostContext.Configuration;
        LEULogConfig options = configuration.GetSection("LEULogConfig").Get<LEULogConfig>();
        services.AddSingleton(options);
        services.AddHostedService<LogMonitorSvc>();
    })
    .Build();

await host.RunAsync();
Here is the LogMonitorSvc.cs (edited for brevity):
public sealed class LogMonitorSvc : BackgroundService
{
    private readonly ILogger<LogMonitorSvc> _logger;
    private static LEULogConfig _options;

    private static MessageFiltering systemLogRules { get; set; }
    private static MessageFiltering applicationLogRules { get; set; }

    private static void OnApplicationEntryWritten(object source, EntryWrittenEventArgs e)
    {
        // Process Application log entry, optionally send to syslog...
    }

    private static void OnSystemEntryWritten(object source, EntryWrittenEventArgs e)
    {
        // Process System log entry, optionally send to syslog...
    }

    public LogMonitorSvc(ILogger<LogMonitorSvc> logger, LEULogConfig options)
    {
        _logger = logger;
        _options = options;

        EventLog systemLog = new EventLog("System", ".");
        EventLog applicationLog = new EventLog("Application", ".");

        systemLogRules = MessageFiltering.DeSerialize(options.SystemLogRulesFilePath);
        systemLog.EntryWritten += new EntryWrittenEventHandler(OnSystemEntryWritten);
        systemLog.EnableRaisingEvents = true;

        applicationLogRules = MessageFiltering.DeSerialize(options.ApplicationLogRulesFilePath);
        applicationLog.EntryWritten += new EntryWrittenEventHandler(OnApplicationEntryWritten);
        applicationLog.EnableRaisingEvents = true;
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            //_logger.LogWarning("Worker running at: {time}", DateTimeOffset.Now);
            await Task.Delay(10000, stoppingToken);
        }
    }
}
Every example I've found seems to assume that there's something going on periodically (almost always a timer-based event) that causes the 'Task' to run. I don't need that: I simply register two event handlers, and the service should just chill (which it does) until one of those events occurs, at which point the appropriate On...EntryWritten handler runs. Again, this is basically working, but it feels quite kludgy.
So, my questions are as follows:
Do I need the "await host.RunAsync()" line at the end of program.cs? I can't figure out how to get rid of it, because the service just dies if it's not there.
My ExecuteAsync code simply drops in for a visit every 10 seconds and does nothing. Is there something else I can put in there that essentially says "Wait indefinitely without pinning my CPU"?
What's the correct way to set up error handling in this situation? If something goes haywire during initialization (e.g. file not found), I'd like to prevent the service from starting, but if I throw an error in the constructor, it seems to proceed as if nothing happened.
Is there a better way to approach this? I wonder what happens if a burst of events happen all at once - will the various events be handled in their own thread, or do they get queued up, etc.?
Thanks in advance for any advice...
Do I need the "await host.RunAsync()" line at the end of program.cs? I can't figure out how to get rid of it, because the service just dies if it's not there.
Yes. The host is what creates and starts the background services. You still need to run the host.
My ExecuteAsync code simply drops in for a visit every 10 seconds and does nothing. Is there something else I can put in there that essentially says "Wait indefinitely without pinning my CPU"?
You can feel free to ignore ExecuteAsync if that works for you:
protected override Task ExecuteAsync(CancellationToken stoppingToken) => Task.CompletedTask;
This is one way of thinking of event-based background services: the constructor starts it and Dispose stops it, and ExecuteAsync is ignored.
An alternative perspective is to have a minimal constructor (generally considered good design), and have ExecuteAsync be its "main loop". I.e., it starts when ExecuteAsync starts, and it cleans up before exiting ExecuteAsync. In this case, the "infinite" delay is a normal way to do nothing until shutdown is requested (via the CancellationToken).
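In that second style, the service from the question might look roughly like this (a sketch; it assumes the EventLog instances are promoted from constructor locals to fields):

protected override async Task ExecuteAsync(CancellationToken stoppingToken)
{
    // Start listening here rather than in the constructor.
    _systemLog.EnableRaisingEvents = true;
    _applicationLog.EnableRaisingEvents = true;
    try
    {
        // Wait indefinitely without polling; throws when shutdown is requested.
        await Task.Delay(Timeout.Infinite, stoppingToken);
    }
    catch (OperationCanceledException)
    {
        // Normal shutdown path.
    }
    finally
    {
        _systemLog.EnableRaisingEvents = false;
        _applicationLog.EnableRaisingEvents = false;
    }
}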
What's the correct way to set up error handling in this situation? If something goes haywire during initialization (e.g. file not found), I'd like to prevent the service from starting, but if I throw an error in the constructor, it seems to proceed as if nothing happened.
Are you sure? Throwing an exception from a constructor should prevent the host from even getting its list of hosted services.
Is there a better way to approach this? I wonder what happens if a burst of events happen all at once - will the various events be handled in their own thread, or do they get queued up, etc.?
That is entirely dependent on the implementation of EventLog. I'm fairly sure that each event will come in on a thread pool thread.
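If ordering matters or the handlers touch shared state, one option (a sketch using System.Threading.Channels; the field name is an assumption) is to have the handlers only enqueue entries and let ExecuteAsync process them one at a time:

private readonly Channel<EventLogEntry> _entries = Channel.CreateUnbounded<EventLogEntry>();

private void OnSystemEntryWritten(object source, EntryWrittenEventArgs e)
{
    // Handlers may run on pool threads; just hand the entry off to the channel.
    _entries.Writer.TryWrite(e.Entry);
}

protected override async Task ExecuteAsync(CancellationToken stoppingToken)
{
    // Single reader: entries are processed sequentially, in arrival order.
    await foreach (EventLogEntry entry in _entries.Reader.ReadAllAsync(stoppingToken))
    {
        // filter and forward to syslog here
    }
}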
I have a number of web posts inside my application that need to send text data to a server, but other than awaiting completion of the post they shouldn't hold up the methods they are called from (these are large data posts that would slow down logic that shouldn't be slowed).
Currently I'm discarding the task, as that appeared to be the correct method, but on the server end the logs indicate the connection seems to be closed before the data is successfully sent, meaning I'm losing most of the data in transit.
private void DoSomethingandPost()
{
    BeforeMethod();
    PushWebDataAsync(TheData1);
    PushWebDataAsync(TheData2);
    AfterMethod();
}

public static async void PushWebDataAsync(string Data)
{
    // ...makes changes to the data...
    try
    {
        _ = pushDataAync(Data);
    }
    catch (Exception e)
    {
        _ = pushDataAync(Data);
    }
}

public System.Threading.Tasks.Task<System.Xml.XmlNode> pushDataAync(string Data)
{
    return base.Channel.pushDataAync(Data);
}
My gut feeling is that if AfterMethod returns before the data has completed sending, the connection to the server is cut and so the data isn't fully transmitted.
What I'm really trying to achieve is that DoSomethingandPost() completes and exits, but the two async posts continue on their own until complete and then exit.
If AfterMethod must run after the two PushWebDataAsync calls, then make the latter return a Task, make the enclosing method async, and await the push methods. DoSomethingandPost will return at the first await statement and do the rest of the work at some later time. If you want to run the pushes concurrently, then do:
var task1 = PushWebDataAsync(TheData1);
var task2 = PushWebDataAsync(TheData2);
await Task.WhenAll(task1, task2);
...
It is good practice to avoid async void, since it makes it impossible for the caller to know whether the call succeeded. If you know this will never be needed, as in the event handler for a button, then it is good practice to handle inside the method any exception that may be thrown.
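Put together, the reshaped methods from the question might look like this (a sketch; static is dropped from PushWebDataAsync so it can call the instance proxy method, and the single retry from the original is kept):

private async Task DoSomethingandPostAsync()
{
    BeforeMethod();
    // Start both posts, then wait for both to finish before continuing.
    var task1 = PushWebDataAsync(TheData1);
    var task2 = PushWebDataAsync(TheData2);
    await Task.WhenAll(task1, task2);
    AfterMethod();
}

private async Task PushWebDataAsync(string data)
{
    // ...makes changes to the data...
    try
    {
        await pushDataAync(data);
    }
    catch (Exception)
    {
        // One retry, as in the original; because we await, a failure here surfaces.
        await pushDataAync(data);
    }
}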
I'm trying to build a file download actor, using Akka.net. It should send messages on download completion but also report download progress.
In .NET there are classes supporting asynchronous operations using more than one event. For example WebClient.DownloadFileAsync has two events: DownloadProgressChanged and DownloadFileCompleted.
Preferably, one would use the task-based async version together with the .PipeTo extension method. But I can't see how that would work with an async method exposing two events, as is the case with WebClient.DownloadFileAsync. Even with WebClient.DownloadFileTaskAsync, you still need to handle DownloadProgressChanged using an event handler.
The only way I found to use this was to hook up two event handlers upon creation of my actor. Then, in the handlers, I send messages to Self and the Sender. For this, I must refer to some private fields of the actor from inside the event handlers. This feels wrong to me, but I cannot see another way out.
Is there a safer way to use multiple event handlers in an Actor?
Currently, my solution looks like this (_client is a WebClient instance created in the constructor of the actor):
public void HandleStartDownload(StartDownload message)
{
    _self = Self;
    _downloadRequestor = Sender;
    _uri = message.Uri;
    _guid = message.Guid;
    _tempPath = Path.GetTempFileName();
    _client.DownloadFileAsync(_uri, _tempPath);
}

private void Client_DownloadFileCompleted(object sender, System.ComponentModel.AsyncCompletedEventArgs e)
{
    var completedMessage = new DownloadCompletedInternal(_guid, _tempPath);
    _downloadRequestor.Tell(completedMessage);
    _self.Tell(completedMessage);
}

private void Client_DownloadProgressChanged(object sender, DownloadProgressChangedEventArgs e)
{
    var progressedMessage = new DownloadProgressed(_guid, e.ProgressPercentage);
    _downloadRequestor.Tell(progressedMessage);
    _self.Tell(progressedMessage);
}
So when the download starts, some fields are set. Additionally, I make sure I Become a state where further StartDownload messages are stashed, until the DownloadCompleted message is received by Self:
public void Ready()
{
    Receive<StartDownload>(message =>
    {
        HandleStartDownload(message);
        Become(Downloading);
    });
}

public void Downloading()
{
    Receive<StartDownload>(message =>
    {
        Stash.Stash();
    });
    Receive<DownloadCompleted>(message =>
    {
        Become(Ready);
        Stash.UnstashAll();
    });
}
For reference, here's the entire Actor, but I think the important stuff is in this post directly: https://gist.github.com/AaronLenoir/4ce5480ecea580d5d283c5d08e8e71b5
I must refer to some private fields of the actor from inside the event handlers. This feels wrong to me, but I cannot see another way out. Is there a safer way to use multiple event handlers in an Actor?
There's nothing inherently wrong with an actor having internal state, or with members that are part of that state raising events which are handled within the actor. It's no more wrong than it would be in a plain OO approach.
The only real concern is if that internal state gets mixed between multiple file download requests, but I think your current code is sound.
A possibly more palatable approach may be to look at the FileDownloadActor as a single-use actor: fire it up, download the file, tell the result to the sender, and then kill the actor. Starting up actors is a cheap operation, and this completely sidesteps the possibility of sharing internal state between multiple download requests.
Unless of course you specifically need to queue downloads to run sequentially, as your current code does; but the queue could be managed by another actor altogether, still treating the download actors as temporary. A sketch of this single-use shape follows.
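Here is a rough sketch of that shape (hypothetical actor names; the message types are reused from the question): a parent spawns one short-lived child per download, and the child stops itself when the download completes:

public class DownloadManagerActor : ReceiveActor
{
    public DownloadManagerActor()
    {
        Receive<StartDownload>(msg =>
        {
            // One child per request; no state is shared between downloads.
            var worker = Context.ActorOf(Props.Create(() => new SingleDownloadActor()));
            worker.Forward(msg);
        });
    }
}

public class SingleDownloadActor : ReceiveActor
{
    private readonly WebClient _client = new WebClient();

    public SingleDownloadActor()
    {
        Receive<StartDownload>(msg =>
        {
            // Capture actor references; the event handlers run on other threads.
            var requestor = Sender;
            var self = Self;
            var tempPath = Path.GetTempFileName();
            _client.DownloadProgressChanged += (s, e) =>
                requestor.Tell(new DownloadProgressed(msg.Guid, e.ProgressPercentage));
            _client.DownloadFileCompleted += (s, e) =>
            {
                requestor.Tell(new DownloadCompletedInternal(msg.Guid, tempPath));
                self.Tell(PoisonPill.Instance); // single use: stop after completion
            };
            _client.DownloadFileAsync(msg.Uri, tempPath);
        });
    }

    protected override void PostStop() => _client.Dispose();
}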
I don't know if that is your case, but I see people treating actors as micro-services when they are simply objects. Remember that actors have internal state.
Now think about scalability: you can't scale messages to one actor in a distributed actor system. The messages you're sending to one actor will be executed on the node running that actor.
If you want to execute download operations in parallel (for example), you do as Patrick said and create one actor per download operation; that actor can then be executed on any available node.
We are using the following method in a stateful service on Service Fabric. The service has partitions. Sometimes we get a FabricNotReadableException from this piece of code:
public async Task HandleEvent(EventHandlerMessage message)
{
    var queue = await StateManager.GetOrAddAsync<IReliableQueue<EventHandlerMessage>>(EventHandlerServiceConstants.EventHandlerQueueName);
    using (ITransaction tx = StateManager.CreateTransaction())
    {
        await queue.EnqueueAsync(tx, message);
        await tx.CommitAsync();
    }
}
Does that mean that the partition is down and is being moved? Or that we hit a secondary replica? Because there is also a FabricNotPrimaryException that is raised in some cases.
I have seen the MSDN link (https://msdn.microsoft.com/en-us/library/azure/system.fabric.fabricnotreadableexception.aspx). But what does
Represents an exception that is thrown when a partition cannot accept reads.
mean? What happened that a partition cannot accept a read?
Under the covers Service Fabric has several states that can impact whether a given replica can safely serve reads and writes. They are:
Granted (you can think of this as normal operation)
Not Primary
No Write Quorum (again mainly impacting writes)
Reconfiguration Pending
FabricNotPrimaryException, which you mention, can be thrown whenever a write is attempted on a replica which is not currently the Primary; it maps to the Not Primary state.
FabricNotReadableException maps to the other states (you don't really need to worry or differentiate between them), and can happen in a variety of cases. One example is if the replica you are trying to perform the read on is a "Standby" replica (a replica which was down and which has been recovered, but there are already enough active replicas in the replica set). Another example is if the replica is a Primary but is being closed (say due to an upgrade or because it reported fault), or if it is currently undergoing a reconfiguration (say for example that another replica is being added). All of these conditions will result in the replica not being able to satisfy writes for a small amount of time due to certain safety checks and atomic changes that Service Fabric needs to handle under the hood.
You can consider FabricNotReadableException retriable. If you see it, just try the call again and eventually it will resolve into either NotPrimary or Granted. If you get FabricNotPrimaryException, generally this should be thrown back to the client (or the client in some way notified) so that it re-resolves in order to find the current Primary (the default communication stacks that Service Fabric ships with take care of watching for non-retriable exceptions and re-resolving on your behalf).
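In code, that policy could look something like this (a sketch, not an official helper; the retry bound and delay are arbitrary):

public async Task HandleEventWithRetry(EventHandlerMessage message)
{
    for (var attempt = 1; ; attempt++)
    {
        try
        {
            await HandleEvent(message); // the method from the question
            return;
        }
        catch (FabricNotReadableException) when (attempt < 5)
        {
            // Transient: the replica can't serve the call right now; back off and retry.
            await Task.Delay(TimeSpan.FromMilliseconds(100 * attempt));
        }
        // FabricNotPrimaryException is deliberately not caught: let it reach the
        // client so it can re-resolve and find the current Primary.
    }
}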
There are two current known issues with FabricNotReadableException.
FabricNotReadableException should have two variants. The first should be explicitly retriable (FabricTransientNotReadableException) and the second should be FabricNotReadableException. The first version (Transient) is the most common and is probably what you are running into, certainly what you would run into in the majority of cases. The second (non-transient) would be returned in the case where you end up talking to a Standby replica. Talking to a standby won't happen with the out of the box transports and retry logic, but if you have your own it is possible to run into it.
The other issue is that FabricNotReadableException should derive from FabricTransientException (today it does not), which would make it easier to determine what the correct behavior is.
Posted as an answer (to asnider's comment - Mar 16 at 17:42) because it was too long for comments! :)
I am also stuck in this catch-22. My service starts and immediately receives messages. I want to encapsulate the service startup in OpenAsync, set up some ReliableDictionary values, and then start receiving messages. However, at this point the fabric is not readable and I need to split this "startup" between OpenAsync and RunAsync :(
RunAsync in my service and OpenAsync in my client also seem to have different cancellation tokens, so I need to work out how to deal with this too. It all feels a bit messy. I have a number of ideas on how to tidy this up in my code, but has anyone come up with an elegant solution?
It would be nice if ICommunicationClient had a RunAsync interface that was called when the Fabric becomes ready/readable and cancelled when the Fabric shuts down the replica - this would seriously simplify my life. :)
I was running into the same problem. My listener was starting up before the main thread of the service. I queued the list of listeners needing to be started and then activated them all early in the main thread. As a result, all incoming messages could be handled and placed into the appropriate reliable storage. My simple solution (this is a service bus listener):
public Task<string> OpenAsync (CancellationToken cancellationToken)
{
    string uri;
    Start ();
    uri = "<your endpoint here>";
    return Task.FromResult (uri);
}

public static object lockOperations = new object ();
public static bool operationsStarted = false;
public static List<ClientAuthorizationBusCommunicationListener> pendingStarts = new List<ClientAuthorizationBusCommunicationListener> ();

public static void StartOperations ()
{
    lock (lockOperations)
    {
        if (!operationsStarted)
        {
            foreach (ClientAuthorizationBusCommunicationListener listener in pendingStarts)
            {
                listener.DoStart ();
            }
            operationsStarted = true;
        }
    }
}

private static void QueueStart (ClientAuthorizationBusCommunicationListener listener)
{
    lock (lockOperations)
    {
        if (operationsStarted)
        {
            listener.DoStart ();
        }
        else
        {
            pendingStarts.Add (listener);
        }
    }
}

private void Start ()
{
    QueueStart (this);
}

private void DoStart ()
{
    ServiceBus.WatchStatusChanges (HandleStatusMessage,
                                   this.clientId,
                                   out this.subscription);
}
========================
In the main thread, you call the function to start listener operations:
protected override async Task RunAsync (CancellationToken cancellationToken)
{
    ClientAuthorizationBusCommunicationListener.StartOperations ();
    ...
This problem likely manifested itself here because the bus in question already had messages and started firing the second the listener was created. Trying to access anything in the state manager was throwing the exception you were asking about.
I'm using EventStore, and things appear to be working: the event is stored and dispatched by my in-memory broker, and the event is processed by my read model. But the "dispatched" flag in the EventStore Commits table is not getting set for some reason, so each time I restart my app it replays all of the events. I'm using SQL Server 2012 as the store. No errors are occurring. Any idea why this flag would not be getting set?
My code:
private static IStoreEvents BuildEventStore(IBroker broker)
{
    return Wireup.Init()
        .UsingSqlPersistence(Constants.EventStoreConnectionStringName)
        .InitializeStorageEngine()
        .UsingJsonSerialization()
        .Compress()
        .UsingAsynchronousDispatchScheduler()
        .DispatchTo(new DelegateMessageDispatcher(c => DispatchCommit(broker, c)))
        .Build();
}
private static void DispatchCommit(IBroker broker, Commit commit)
{
    foreach (var @event in commit.Events)
    {
        if (!(@event.Body is IDomainEvent))
        {
            throw new InvalidOperationException("An event was published that is not an IDomainEvent: " + @event.Body.GetType());
        }
        broker.Publish((IDomainEvent)@event.Body);
    }
}
UsingAsynchronousDispatchScheduler wires in a dispatcher that does its own thing outside the mainline of your processing. For example, if it has issues writing, the command-processing thread is not going to hear about it.
One way of making life simpler (or at least different) is to change it to synchronous dispatch, so your command-processing chain gets to hear about the exceptions (remember, though, that a dispatch exception doesn't roll back the fact that the command got processed and the event has now happened).
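If your EventStore version exposes a synchronous scheduler wireup (UsingSynchronousDispatchScheduler existed in some releases; check yours), the change is a one-liner in the wireup:

private static IStoreEvents BuildEventStore(IBroker broker)
{
    return Wireup.Init()
        .UsingSqlPersistence(Constants.EventStoreConnectionStringName)
        .InitializeStorageEngine()
        .UsingJsonSerialization()
        .Compress()
        // Synchronous dispatch: exceptions surface on the committing thread.
        .UsingSynchronousDispatchScheduler()
        .DispatchTo(new DelegateMessageDispatcher(c => DispatchCommit(broker, c)))
        .Build();
}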
But really you need to read the source to see what the processor wired in by UsingAsynchronousDispatchScheduler does with respect to reporting problems, so that your monitoring can pick up the issue.