I have recently taken over a legacy Windows service, and it has been writing the following event to the System event log:
Event ID: 7034 Description: The MyService service terminated unexpectedly. It has done this X time(s).
I was looking over the source code and found the following code pattern in the service class library:
(It has been simplified to protect the innocent.)
public static void StartService()
{
//do some stuff...
ManageCycle();
}
public static void ManageCycle()
{
//do some stuff
ManageCycle();
}
What is this coding pattern called, and could it cause the Windows service to shut down (i.e. a memory leak)?
This looks like the stack overflow exception pattern. Eran is correct. Use a while loop:
private static volatile bool isRunning; // stop flag, checked by the loop on every iteration
public static void StartService()
{
    //do some stuff...
    isRunning = true;
    ManageCycle();
}
public static void ManageCycle()
{
    while (isRunning)
    {
        //do some stuff and wrap in exception handling
    }
}
public static void StopService()
{
    isRunning = false;
}
It is going to throw a StackOverflowException (HA HA :) ) because of the endless recursive calls.
Take a look at this example - you should choose the technique that fits your architecture.
That's a recursive call that will ultimately blow the stack.
The best answer for this kind of situation:
Don't use recursive algorithms unless your problem has a recursive structure. For example, if you're analyzing a file system and want to scan a specific directory, you'd want to do something like:
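// uses System.IO
void ScanDirectory(DirectoryInfo directory)
{
    // Handle the files in this directory
    foreach (FileInfo file in directory.GetFiles()) { /* handle file */ }
    // Recurse into each subdirectory: the recursion mirrors the tree structure
    foreach (DirectoryInfo subdirectory in directory.GetDirectories())
        ScanDirectory(subdirectory);
}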
This makes sense because it's much simpler than doing it iteratively. Otherwise, when you're just repeating an action over and over again, making it recursive is completely unnecessary: it makes the code less efficient and will eventually overflow the stack.
This is a recursive call with apparently no exit criteria. Eventually it will run out of stack since a call to ManageCycle never returns.
In addition, the StartService method will never return; it ought to spin up at least one foreground thread and then return.
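A minimal sketch of what that could look like, combining the while loop suggested above with a dedicated foreground thread. CycleService, the 100 ms pause and the Cycles counter are hypothetical; a real ServiceBase subclass is assumed to call StartService from OnStart and StopService from OnStop.

```csharp
using System;
using System.Threading;

public static class CycleService
{
    private static volatile bool isRunning;
    private static Thread worker;
    public static int Cycles;   // illustration only: counts completed iterations

    // Called from OnStart: start the loop on its own foreground thread and return at once.
    public static void StartService()
    {
        isRunning = true;
        worker = new Thread(ManageCycle) { IsBackground = false, Name = "ManageCycle" };
        worker.Start();
    }

    // The loop that the original code expressed as unbounded recursion.
    private static void ManageCycle()
    {
        while (isRunning)
        {
            try
            {
                // do some stuff
                Interlocked.Increment(ref Cycles);
            }
            catch (Exception ex)
            {
                Console.Error.WriteLine(ex);   // log and keep looping
            }
            Thread.Sleep(100);   // hypothetical pause between cycles
        }
    }

    // Called from OnStop: ask the loop to finish and wait for it.
    public static void StopService()
    {
        isRunning = false;
        worker?.Join();
    }
}
```

Because StartService returns immediately, the service control manager sees OnStart complete, while the loop keeps running on its own thread and the stack never grows.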
Recursion: it looks like it's calling itself. I'm surprised there isn't a stack overflow exception. Perhaps the service's recovery settings on the machine running it are configured to restart the service on failure.
It's recursive, all right. It will keep calling itself repeatedly (a bad thing), and that will result in a stack overflow.
What does the "//do some stuff" part do? Maybe there is a good reason for it to call itself, but without a way to get out of the loop (recursion), the application will exit.
We are using the following method in a stateful service on Service Fabric. The service has partitions. Sometimes we get a FabricNotReadableException from this piece of code.
public async Task HandleEvent(EventHandlerMessage message)
{
var queue = await StateManager.GetOrAddAsync<IReliableQueue<EventHandlerMessage>>(EventHandlerServiceConstants.EventHandlerQueueName);
using(ITransaction tx = StateManager.CreateTransaction())
{
await queue.EnqueueAsync(tx, message);
await tx.CommitAsync();
}
}
Does that mean that the partition is down and is being moved? Or that we hit a secondary partition? There is also a FabricNotPrimaryException being raised in some cases.
I have seen the MSDN link (https://msdn.microsoft.com/en-us/library/azure/system.fabric.fabricnotreadableexception.aspx). But what does "Represents an exception that is thrown when a partition cannot accept reads." mean? What happened that a partition cannot accept a read?
Under the covers Service Fabric has several states that can impact whether a given replica can safely serve reads and writes. They are:
Granted (you can think of this as normal operation)
Not Primary
No Write Quorum (again mainly impacting writes)
Reconfiguration Pending
FabricNotPrimaryException which you mention can be thrown whenever a write is attempted on a replica which is not currently the Primary, and maps to the NotPrimary state.
FabricNotReadableException maps to the other states (you don't really need to worry or differentiate between them), and can happen in a variety of cases. One example is if the replica you are trying to perform the read on is a "Standby" replica (a replica which was down and which has been recovered, but there are already enough active replicas in the replica set). Another example is if the replica is a Primary but is being closed (say due to an upgrade or because it reported fault), or if it is currently undergoing a reconfiguration (say for example that another replica is being added). All of these conditions will result in the replica not being able to satisfy writes for a small amount of time due to certain safety checks and atomic changes that Service Fabric needs to handle under the hood.
You can consider FabricNotReadableException retriable. If you see it, just try the call again and eventually it will resolve into either NotPrimary or Granted. If you get FabricNotPrimary exception, generally this should be thrown back to the client (or the client in some way notified) that it needs to re-resolve in order to find the current Primary (the default communication stacks that Service Fabric ships take care of watching for non-retriable exceptions and re-resolving on your behalf).
There are two current known issues with FabricNotReadableException.
FabricNotReadableException should have two variants. The first should be explicitly retriable (FabricTransientNotReadableException) and the second should be FabricNotReadableException. The first version (Transient) is the most common and is probably what you are running into, certainly what you would run into in the majority of cases. The second (non-transient) would be returned in the case where you end up talking to a Standby replica. Talking to a standby won't happen with the out of the box transports and retry logic, but if you have your own it is possible to run into it.
The other issue is that today the FabricNotReadableException should be deriving from FabricTransientException, making it easier to determine what the correct behavior is.
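A minimal sketch of the "just try the call again" advice, assuming the retry lives in the service's own code. TransientRetry, RunAsync, maxAttempts and the linear back-off are hypothetical choices, not a Service Fabric API. In a real service you would likely pass something like `() => HandleEvent(message)` as the operation and `ex => ex is FabricNotReadableException || ex is FabricTransientException` as the predicate; retrying the whole method means each attempt gets a fresh transaction. FabricNotPrimaryException is deliberately not treated as transient, so it propagates and the client can re-resolve.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TransientRetry
{
    // Runs `operation`, retrying while isTransient(ex) is true, up to maxAttempts in total.
    // Non-transient exceptions, and the failure of the last attempt, propagate to the caller.
    public static async Task RunAsync(
        Func<Task> operation,
        Func<Exception, bool> isTransient,
        int maxAttempts,
        TimeSpan delay,
        CancellationToken cancellationToken = default)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                await operation().ConfigureAwait(false);
                return;
            }
            catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
            {
                // Wait a little longer after each failed attempt before trying again.
                await Task.Delay(TimeSpan.FromTicks(delay.Ticks * attempt), cancellationToken)
                          .ConfigureAwait(false);
            }
        }
    }
}
```

The exception filter (`when`) keeps non-transient errors from ever entering the catch block, so their original stack trace is preserved.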
Posted as an answer (to asnider's comment - Mar 16 at 17:42) because it was too long for comments! :)
I am also stuck in this Catch-22. My service starts and immediately receives messages. I want to encapsulate the service startup in OpenAsync and set up some ReliableDictionary values, then start receiving messages. However, at this point the fabric is not readable, and I need to split this "startup" between OpenAsync and RunAsync :(
RunAsync in my service and OpenAsync in my client also seem to have different cancellation tokens, so I need to work out how to deal with this too. It all just feels a bit messy. I have a number of ideas on how to tidy this up in my code, but has anyone come up with an elegant solution?
It would be nice if ICommunicationClient had a RunAsync interface that was called when the Fabric becomes ready/readable and cancelled when the Fabric shuts down the replica - this would seriously simplify my life. :)
I was running into the same problem. My listener was starting up before the main thread of the service. I queued the list of listeners needing to be started, and then activated them all early on in the main thread. As a result, all messages coming in were able to be handled and placed into the appropriate reliable storage. My simple solution (this is a service bus listener):
public Task<string> OpenAsync (CancellationToken cancellationToken)
{
string uri;
Start ();
uri = "<your endpoint here>";
return Task.FromResult (uri);
}
public static object lockOperations = new object ();
public static bool operationsStarted = false;
public static List<ClientAuthorizationBusCommunicationListener> pendingStarts = new List<ClientAuthorizationBusCommunicationListener> ();
public static void StartOperations ()
{
lock (lockOperations)
{
if (!operationsStarted)
{
foreach (ClientAuthorizationBusCommunicationListener listener in pendingStarts)
{
listener.DoStart ();
}
operationsStarted = true;
}
}
}
private static void QueueStart (ClientAuthorizationBusCommunicationListener listener)
{
lock (lockOperations)
{
if (operationsStarted)
{
listener.DoStart ();
}
else
{
pendingStarts.Add (listener);
}
}
}
private void Start ()
{
QueueStart (this);
}
private void DoStart ()
{
ServiceBus.WatchStatusChanges (HandleStatusMessage,
this.clientId,
out this.subscription);
}
========================
In the main thread, you call the function to start listener operations:
protected override async Task RunAsync (CancellationToken cancellationToken)
{
ClientAuthorizationBusCommunicationListener.StartOperations ();
...
This problem likely manifested itself here as the bus in question already had messages and started firing the second the listener was created. Trying to access anything in state manager was throwing the exception you were asking about.
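The queue-until-started idea can also be expressed as a one-shot gate. This is a different technique from the pendingStarts list above, sketched under assumptions: ReadyGate and WhenReady are hypothetical names, the listener's message handler is assumed to begin with `await gate.WhenReady;`, and RunAsync is assumed to call `gate.Open()` as its first step. Unlike the static list, the gate is an instance, so each replica hosted in the process can own its own.

```csharp
using System.Threading.Tasks;

public sealed class ReadyGate
{
    // RunContinuationsAsynchronously keeps waiting handlers from running inline inside Open().
    private readonly TaskCompletionSource<bool> ready =
        new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);

    // Handlers await this before touching reliable collections.
    public Task WhenReady => ready.Task;

    // Called from RunAsync. TrySetResult makes repeated calls harmless.
    public void Open() => ready.TrySetResult(true);
}
```

The sketch never resets: if the replica loses primary status and is later promoted again, a new gate per RunAsync call would be needed.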
I am trying to write a profiler that logs all .NET method calls in a process. The goal is to make it highly performant and keep, say, the last 5-10 minutes in memory (a fixed buffer that cyclically overwrites old entries) until the user triggers writing that information to disk. The intended use is tracking down rarely reproducing performance issues.
I started off with the SimpleCLRProfiler project from https://github.com/appneta/SimpleCLRProfiler. The profiler uses the ICorProfilerCallback2 callback interface of .NET profiling. I got it to compile and work in my environment (Win 8.1, .NET 4.5, VS2012). However, I noticed that Leave calls are sometimes missing for methods whose Enter calls were logged. Here is an example of a Console.WriteLine call (I reduced the DbgView output to the minimum necessary to understand it):
Line 1481: Entering System.Console.WriteLine
Line 1483: Entering SyncTextWriter.WriteLine
Line 1485: Entering System.IO.TextWriter.WriteLine
Line 1537: Leaving SyncTextWriter.WriteLine
Two Entering calls don't have corresponding Leaving calls. The profiled .Net code looks like this:
Console.WriteLine("Hello, Simple Profiler!");
The relevant SimpleCLRProfiler methods are:
HRESULT CSimpleProfiler::registerGlobalCallbacks()
{
    // Each hook is cast to its own function-pointer type: Enter, Leave and Tailcall.
    HRESULT hr = profilerInfo3->SetEnterLeaveFunctionHooks3WithInfo(
        (FunctionEnter3WithInfo*)MethodEntered3,
        (FunctionLeave3WithInfo*)MethodLeft3,
        (FunctionTailcall3WithInfo*)MethodTailcall3);
    if (FAILED(hr))
        Trace_f(L"Failed to register global callbacks (%s)", _com_error(hr).ErrorMessage());
    return hr; // report the failure instead of always returning S_OK
}
void CSimpleProfiler::OnEnterWithInfo(FunctionID functionId, COR_PRF_ELT_INFO eltInfo)
{
MethodInfo info;
HRESULT hr = info.Create(profilerInfo3, functionId);
if (FAILED(hr))
{
    Trace_f(L"Enter() failed to create MethodInfo object (%s)", _com_error(hr).ErrorMessage());
    return; // info is not valid, so don't print its names
}
Trace_f(L"[%p] [%d] Entering %s.%s", functionId, GetCurrentThreadId(), info.className.c_str(), info.methodName.c_str());
}
void CSimpleProfiler::OnLeaveWithInfo(FunctionID functionId, COR_PRF_ELT_INFO eltInfo)
{
MethodInfo info;
HRESULT hr = info.Create(profilerInfo3, functionId);
if (FAILED(hr))
{
    Trace_f(L"Leave() failed to create MethodInfo object (%s)", _com_error(hr).ErrorMessage());
    return; // info is not valid, so don't print its names
}
Trace_f(L"[%p] [%d] Leaving %s.%s", functionId, GetCurrentThreadId(), info.className.c_str(), info.methodName.c_str());
}
Does anybody have an idea why the .NET profiler would not issue Leave callbacks for every method that returns? By the way, I checked that OnLeaveWithInfo does not exit unexpectedly (due to an exception or the like) before tracing. It doesn't.
Thanks, Christoph
Since stakx does not seem to be coming back to my question to post an official answer (and get the credit), I will do it for him:
As stakx had hinted at, I didn't log tail calls. In fact, I wasn't even aware of the concept so I had completely ignored that hook method (it was wired up but empty). I found a good explanation of tail calls here: David Broman's CLR Profiling API Blog: Enter, Leave, Tailcall Hooks Part 2: Tall tales of tail calls.
I quote from the link above:
Tail calling is a compiler optimization that saves execution of instructions and saves reads and writes of stack memory. When the last thing a function does is call another function (and other conditions are favorable), the compiler may consider implementing that call as a tail call, instead of a regular call.
Consider this code:
static public void Main() {
Helper();
}
static public void Helper() {
One();
Three();
}
static public void Three() {
...
}
When method Three is called without tail call optimization, the stack looks like this:
Three
Helper
Main
With tail call optimization, the stack looks like this:
Three
Main
So before Three is called, the optimization has already popped Helper off the stack. As a result, there is one fewer frame on the stack (less memory usage), and some instructions and memory writes are saved.
I've just "earned" the privilege to maintain a legacy library coded in C# at my current work.
This dll:
Exposes methods for a big legacy system made with Uniface, which has no choice but to call COM objects.
Serves as a link between this legacy system, and another system's API.
Uses WinForms for its UI in some cases.
More visually, as I understand the components:
*[Big legacy system in Uniface]* ==[COM]==> [C# Library] ==[Managed API]==> *[Big EDM Management System]*
The question is: One of the methods in this C# Library takes too long to run and I "should" make it asynchronous!
I'm used to C#, but not to COM at all. I've already done concurrent programming, but COM seems to add a lot of complexity to it and all my trials so far end in either:
A crash with no error message at all
My Dll only partially working (displaying only part of its UI, and then closing), and still not giving me any error at all
I'm out of ideas and resources about how to handle threads within a COM dll, and I would appreciate any hint or help.
So far, the main part of the code I've changed to make my method asynchronous:
// my public method called by the external system
public int ComparedSearch(string application, out string errMsg) {
errMsg = "";
try {
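// AsyncComparedSearch returns int, so the delegate type is Func<string, int>, not Action<string>
Func<string, int> asyncOp = AsyncComparedSearch;
asyncOp.BeginInvoke(application, null, null);
} catch (Exception ex) {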
// ...
}
return 0;
}
private int AsyncComparedSearch(string application) {
// my actual method doing the work, that was the called method before
}
Any hint or useful resource would be appreciated.
Thank you.
UPDATE 1:
Following the answers and clues below (especially about SynchronizationContext, and with the help of this example), I was able to refactor my code and make it work, but only when it is called from another Windows application in C#, not through COM.
The legacy system encounters a quite obscure error when I call the function and doesn't give any details about the crash.
UPDATE 2:
Latest updates in my trials: I managed to make the multithreading work when the calls are made from a test project, and not from the Uniface system.
After multiple trials, we tend to think that our legacy system doesn't support multithreading well in its current configuration. But that's not the point of the question any more :)
Here is an excerpt of the code that seems to work:
string application;
SynchronizationContext context;
// my public method called by the external system
public int ComparedSearch(string application, out string errMsg) {
this.application = application;
context = SynchronizationContext.Current; // captured on the calling (UI) thread; null if it has none
Thread t = new Thread(new ThreadStart(AsyncComparedSearch));
t.Start();
errMsg = "";
return 0;
}
private void AsyncComparedSearch() {
// ANY WORK THAT HAS NOTHING TO DO WITH UI
context.Send(new SendOrPostCallback(
delegate(object state)
{
// METHODS THAT MANAGE UI SOMEHOW
}
), null);
}
We are now considering solutions other than modifying this COM assembly, such as encapsulating the library in a Windows service and creating an interface between the system and the service. That should be more sustainable.
It is hard to tell without knowing more details, but there are few issues here.
You execute the delegate on another thread via BeginInvoke, but you don't wait for it. Your try/catch block won't catch anything, because execution has already left it while the asynchronous call is still running. Instead, you should put the try/catch block inside AsyncComparedSearch.
As you don't wait for the asynchronous method to finish (via EndInvoke or a callback), I am not sure how you handle the results of the COM call. I guess you update the GUI from within AsyncComparedSearch. If so, that is wrong: it runs on another thread, and you should never update the GUI from anywhere but the GUI thread, as it will most likely result in a crash or other unexpected behavior. Therefore, you need to marshal the GUI update work to the GUI thread. In WinForms you can use Control.BeginInvoke (don't confuse it with Delegate.BeginInvoke) or some other mechanism (e.g. SynchronizationContext) to run the code on the GUI thread. I use something similar to this:
// Extension methods must live in a static class.
public static class FormExtensions
{
    public static void ExecuteOnUiThread(this Form form, Action action)
    {
        if (form.InvokeRequired) { // we are not on the UI thread
            // Invoke blocks until the action has run; use BeginInvoke to queue it and return
            form.Invoke(action);
        }
        else { // we are on the UI thread, so just execute the action
            action();
        }
    }
}
then I call it like this from any thread:
theForm.ExecuteOnUiThread( () => theForm.SomeMethodWhichUpdatesControls() );
Besides, read this answer for some caveats.
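Putting both points together, here is a hedged sketch that catches exceptions inside the background work and marshals the result through a SynchronizationContext. It uses Task.Run in place of Delegate.BeginInvoke (which is not supported on .NET Core). ComparedSearchRunner, DoComparedSearch and showResult are hypothetical stand-ins; the caller is assumed to pass SynchronizationContext.Current captured on the UI thread, and when it is null the callback simply runs on the worker thread.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ComparedSearchRunner
{
    // Hypothetical stand-in for the real, slow search.
    private static int DoComparedSearch(string application)
    {
        if (string.IsNullOrEmpty(application))
            throw new ArgumentException("application must not be empty", nameof(application));
        return application.Length;
    }

    // Starts the search on a thread-pool thread and returns immediately.
    public static Task<int> StartComparedSearch(string application, SynchronizationContext ui, Action<string> showResult)
    {
        return Task.Run(() =>
        {
            int result;
            string message;
            try
            {
                // The try/catch lives inside the background work, where the exception is thrown.
                result = DoComparedSearch(application);
                message = "found " + result;
            }
            catch (Exception ex)
            {
                result = -1;
                message = "error: " + ex.Message;
            }

            // Marshal the UI update back to the UI thread when a context was supplied.
            if (ui != null)
                ui.Send(_ => showResult(message), null);
            else
                showResult(message);
            return result;
        });
    }
}
```

Do not block the UI thread on the returned task (for example with .Result): Send would wait for a UI thread that is itself waiting, and the two would deadlock.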
I'm playing a little bit with some C# Winforms/WPF code and just stumbled upon something strange.
Let's say I have a code like this:
public partial class MainWindow : Window
{
public MainWindow()
{
InitializeComponent();
DoSomething();
// something more if everything worked okay
}
}
What puzzles me is that I cannot simply close the application from the method DoSomething before the constructor finishes its job. If anything during the execution of DoSomething fails, I need to close the application immediately; however, it just keeps running, executes the part // something more... and THEN closes, but that's way too late for me.
I have to put the code for closing the form inside the constructor itself, followed by return;, and then it works, but I don't really find that an acceptable solution. I'm trying to move such validation logic from the constructor into my methods.
I've tried things like:
public void DoSomething()
{
Close();
}
and
public void DoSomething()
{
Application.Current.Shutdown();
}
But it doesn't seem to work. Yes, both snippets do close the application, but only after the constructor has fully finished.
Why would I need such a thing? Because at startup I need to check various things, like availability of the connection and hardware, user validation, etc., and if anything fails, there's no point in executing more code.
I tried the same principle with Winforms and WPF (hence the tags) — works the same way.
Can anybody provide an explanation or a solution?
Just try using Environment.Exit(-1) in your situation and all will be good.
ADDED: This is the best reference I can find for you.
Difference between Application.Exit vs Application.Shutdown vs Environment.Exit
Application.Exit() is for exiting a Windows Forms application gracefully. Basically, it stops the message pump, closes all windows and lands you back in the Main() method just after the call to Application.Run(). However, sometimes it doesn't appear to work; this is usually because other foreground threads (apart from the UI thread) are still running and keep the process from ending.
Application.Shutdown() is (broadly) the equivalent of Application.Exit() in a WPF application. However, you have a bit more control, as you can set ShutdownMode so that the application shuts down when the main window closes, when the last window closes, or only when this method is called.
Environment.Exit() kills all running threads and the process itself stone dead. It should only be used in WinForms or WPF as a last resort, when the more graceful methods are not working for some reason. It can also be used to make an abrupt exit from a console application.
Another Reference: How to properly exit a C# application?
You can always ignore your fellow developers and just use Environment.FailFast()
But really, don't. If you have critical things to do, such as verifying that the serial port is connected to the nuclear power plant, just do them beforehand. There's no rule forcing you to call Application.Run(...) as soon as Main() is called.
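A minimal sketch of "do it beforehand": run the checks from Main and only start the UI when they pass, so there is no window to close on failure. Startup.Run, the check names and the startUi callback are hypothetical; in a WinForms app Main might be `return Startup.Run(checks, () => Application.Run(new MainForm()));`.

```csharp
using System;
using System.Collections.Generic;

public static class Startup
{
    // Runs every check in order; the UI is only started when all of them pass.
    // Returns a process exit code: 0 on success, 1 if a check failed.
    public static int Run(IEnumerable<(string Name, Func<bool> Check)> checks, Action startUi)
    {
        foreach (var (name, check) in checks)
        {
            if (!check())
            {
                // No window exists yet, so there is nothing to close.
                Console.Error.WriteLine($"Startup check failed: {name}");
                return 1;
            }
        }
        startUi();   // e.g. Application.Run(new MainForm()) in WinForms
        return 0;
    }
}
```

Keeping the checks outside the window's constructor also makes them testable without creating any UI.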
There have already been posted viable solutions for your problem.
Just to answer your follow-up question: the reason why methods like Close() and Shutdown() do not immediately exit your application is that both just push messages into the application's message queue. They are only processed after MainWindow's constructor has finished and execution returns to the message loop, possibly even after some other pending messages in the queue have been handled.
By contrast, methods like Environment.Exit() or Environment.FailFast() are hard-core OS-level calls that terminate the process more or less immediately.
A workaround would be to throw an exception and handle it, either in an application-level unhandled-exception handler or directly where the window is constructed.
Define an Exception class:
public class InitializationException : Exception
{
public InitializationException()
{}
public InitializationException(string msg)
: base(msg)
{}
public InitializationException(string msg, Exception inner)
: base(msg, inner)
{}
}
and change your code like this:
public partial class MainWindow : Window
{
public MainWindow()
{
InitializeComponent();
try
{
DoSomething();
// maybe something more if everything went ok
}
catch( InitializationException ex )
{
// log the exception
Close();
}
}
public void DoSomething()
{
if (notSomethingOK)
throw new InitializationException( "Something is not OK and the application must shut down." );
}
}
This is a clean and maintainable solution.
System.Windows.Forms.Application.Exit();
Conceptually, such things should not be done in class constructors. A constructor is meant to initialize an instance to its starting state, not to do actual work where things may happen (exceptions, message boxes, etc.).
Don't forget that you can just return; from the constructor if you need to break off its execution. This is a better tactic (most of the time you don't want to shut the application down on error without displaying any message).
There are "window shown", "visibility changed", "loaded" and many other events in WinForms/WPF that you can override or subscribe to with an event handler. Initialize your form/app there.
They're normal methods so all works as expected. You can try throwing exceptions that your application entry point (Main function) will just catch and ignore.
For WinForms, check Application.SetUnhandledExceptionMode: https://msdn.microsoft.com/en-us/library/system.windows.forms.application.setunhandledexceptionmode(v=vs.110).aspx.
I have a Windows service written in C#. It has a timer inside, which fires some functions on a regular basis. Here is the skeleton of my service:
public partial class ArchiveService : ServiceBase
{
Timer tickTack;
int interval = 10;
...
protected override void OnStart(string[] args)
{
tickTack = new Timer(1000 * interval);
tickTack.Elapsed += new ElapsedEventHandler(tickTack_Elapsed);
tickTack.Start();
}
protected override void OnStop()
{
tickTack.Stop();
}
private void tickTack_Elapsed(object sender, ElapsedEventArgs e)
{
...
}
}
It works for some time (about 10-15 days), then it stops. The service still shows as running, but it does not do anything. I added some logging, and the problem seems to be the timer, because after the interval it does not call the tickTack_Elapsed function.
I was thinking about rewriting it without a timer, using an endless loop that pauses processing for the amount of time I configure. This is also not an elegant solution, and I think it could have side effects regarding memory.
The Timer comes from the System.Timers namespace, and the environment is Windows 2003. I used this approach in two different services on different servers, and both exhibit this behavior (which is why I thought it is somehow connected to my code or to the framework itself).
Has anybody experienced this behavior? What could be wrong?
Edit:
I edited both services. One got thorough try-catch coverage and more logging. The other now recreates its timer on a regular basis. Neither has stopped since then, so if this situation holds for another week, I will close this question. Thank you, everyone, so far.
Edit:
I close this question because nothing happened. I mean I made some changes, but those changes are not really relevant in this matter and both services are running without any problem since then. Please mark it as "Closed for not relevant anymore".
Unhandled exceptions in timers are swallowed, and they silently kill the timer.
Wrap the body of your timer code in a try-catch block.
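A minimal sketch of that advice, assuming System.Timers.Timer as in the question. GuardedTimerService, the work delegate and the Failures counter are hypothetical; the Interlocked flag additionally skips a tick while the previous one is still running, which is one possible way to handle reentrancy.

```csharp
using System;
using System.Threading;
using System.Timers;
using Timer = System.Timers.Timer;   // both namespaces define Timer

public sealed class GuardedTimerService : IDisposable
{
    private readonly Timer tickTack;
    private readonly Action work;   // hypothetical: the real per-tick work
    private int running;            // 0 = idle, 1 = a tick is in progress
    public int Failures;            // illustration only

    public GuardedTimerService(double intervalMs, Action work)
    {
        this.work = work;
        tickTack = new Timer(intervalMs) { AutoReset = true };
        tickTack.Elapsed += OnElapsed;
    }

    public void Start() => tickTack.Start();
    public void Stop() => tickTack.Stop();

    private void OnElapsed(object sender, ElapsedEventArgs e)
    {
        // Skip this tick if the previous one is still running.
        if (Interlocked.CompareExchange(ref running, 1, 0) != 0) return;
        try
        {
            work();
        }
        catch (Exception ex)
        {
            // Log here instead of letting the exception escape the Elapsed handler.
            Interlocked.Increment(ref Failures);
            Console.Error.WriteLine($"Tick failed: {ex}");
        }
        finally
        {
            Interlocked.Exchange(ref running, 0);
        }
    }

    public void Dispose() => tickTack.Dispose();
}
```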
I have seen this before with both timers and looped services. Usually an exception is thrown that stops the timer or looping thread, and the exception handling does not restart it.
To your other points...
I don't think there is anything "elegant" about the timer. For me, it's more straightforward to see a looping operation in code than timer methods. But elegance is subjective.
Memory issue? Not if you write it properly. Maybe a processor burden if your Thread.Sleep() isn't set right.
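A minimal sketch of such a timer-free loop, assuming it runs on its own thread inside the service. PollingWorker and its work delegate are hypothetical. Instead of Thread.Sleep it waits on a ManualResetEventSlim with a timeout, so the loop neither spins nor allocates per iteration, and Stop() returns promptly instead of waiting out the interval.

```csharp
using System;
using System.Threading;

public sealed class PollingWorker : IDisposable
{
    private readonly ManualResetEventSlim stopSignal = new ManualResetEventSlim(false);
    private readonly TimeSpan interval;
    private readonly Action work;   // hypothetical per-iteration work
    private Thread thread;

    public PollingWorker(TimeSpan interval, Action work)
    {
        this.interval = interval;
        this.work = work;
    }

    public void Start()
    {
        thread = new Thread(Loop) { IsBackground = false };
        thread.Start();
    }

    private void Loop()
    {
        do
        {
            try { work(); }
            catch (Exception ex) { Console.Error.WriteLine(ex); }   // one bad iteration must not end the loop
        }
        while (!stopSignal.Wait(interval));   // returns true as soon as Stop() sets the signal
    }

    public void Stop()
    {
        stopSignal.Set();
        thread?.Join();
    }

    public void Dispose()
    {
        Stop();
        stopSignal.Dispose();
    }
}
```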
http://support.microsoft.com/kb/842793
This is a known bug that has resurfaced in the Framework more than once.
The best known work-around: don't use timers. I've rendered this bug ineffective by doing a silly "while (true)" loop.
Your mileage may vary, so verify with your combination of OS/Framework bits.
As many respondents have pointed out, exceptions are swallowed by the timer. In my Windows services I use System.Threading.Timer. It has a Change(...) method that lets you start and stop the timer. A possible source of exceptions is a reentrancy problem, when tickTack_Elapsed runs longer than the timer period. Usually I write the timer loop like this:
void TimeLoop(object arg)
{
stopTimer();
//Do some stuff
startTimer();
}
You could also lock(...) your main loop to protect against reentrancy.
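A minimal sketch of the stopTimer/startTimer loop with System.Threading.Timer, assuming the timer is used as a one-shot that re-arms itself only after each tick finishes, so ticks can never overlap. OneShotLoop and the work delegate are hypothetical; the lock keeps Stop() and Dispose() from racing with the re-arm in the callback.

```csharp
using System;
using System.Threading;

public sealed class OneShotLoop : IDisposable
{
    private readonly Timer timer;
    private readonly TimeSpan period;
    private readonly Action work;   // hypothetical per-tick work
    private readonly object gate = new object();
    private bool stopped;

    public OneShotLoop(TimeSpan period, Action work)
    {
        this.period = period;
        this.work = work;
        // Created disarmed: no due time and no period until Start() is called.
        timer = new Timer(TimeLoop, null, Timeout.Infinite, Timeout.Infinite);
    }

    // "startTimer": fire once after `period`, with no automatic repeat.
    public void Start() => timer.Change(period, Timeout.InfiniteTimeSpan);

    // "stopTimer": disarm; a tick that is already running still finishes.
    public void Stop()
    {
        lock (gate) { stopped = true; timer.Change(Timeout.Infinite, Timeout.Infinite); }
    }

    private void TimeLoop(object state)
    {
        try
        {
            work();
        }
        catch (Exception ex)
        {
            Console.Error.WriteLine(ex);   // log; don't let one failure end the loop
        }
        finally
        {
            // Re-arm only after this tick has finished, unless we were stopped meanwhile.
            lock (gate) { if (!stopped) timer.Change(period, Timeout.InfiniteTimeSpan); }
        }
    }

    public void Dispose()
    {
        lock (gate) { stopped = true; timer.Dispose(); }
    }
}
```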
Interesting issue. If it is truly just time related (i.e. not an exception), then I wonder if you can simply periodically recycle the timer - i.e.
private void tickTack_Elapsed(object sender, ElapsedEventArgs e)
{
CheckForRecycle();
// ... actual code
}
private void CheckForRecycle()
{
lock(someLock) {
if(++tickCount > MAX_TICKS) {
tickCount = 0;
tickTack.Stop();
tickTack.Dispose(); // release the old timer before replacing it
// re-create timer
tickTack = new Timer(...);
tickTack.Elapsed += ...
tickTack.Start();
}
}
}
You could probably merge chunks of this with the OnStart / OnStop etc to reduce duplication.
Have you checked the error logs? Maybe you run out of timers somehow. Maybe you can create just one timer when you initialize the ArchiveService and skip the OnStart stuff.
I have made exactly the same as you in a few projects but have not had the problem.
Do you have code in tickTack_Elapsed that could be causing this? Like a loop that never ends, an error that stops the timer, starting threads and waiting for them to finish, and so on?