I'm encountering an issue where a service is exiting on errors that should never propagate up.
I built a microservice manager in .NET Framework, as the local environment doesn't support .NET Core and some of its native microservice abilities.
Built in VS2019 targeting .NET 4.5.2 (I know, but this is the world we live in)
The microservice manager is built and installed as a Windows service. The entry point looks like this (the #if/#else was for testing locally; it works as intended when registered as a Windows service):
Program.cs (Entry point)
` static class Program
{
/// <summary>
/// The main entry point for the application.
/// </summary>
static void Main()
{
#if DEBUG
Scheduler myScheduler = new Scheduler();
myScheduler.OnDebug();
System.Threading.Thread.Sleep(System.Threading.Timeout.Infinite);
#else
ServiceBase[] ServicesToRun;
ServicesToRun = new ServiceBase[]
{
new Scheduler()
};
ServiceBase.Run(ServicesToRun);
#endif
}
}`
Scheduler.cs
//(confidential code hidden)
`private static readonly Configuration config = Newtonsoft.Json.JsonConvert.DeserializeObject<Configuration>(
File.ReadAllText(configFilePath)
);
public Scheduler()
{
//InitializeComponent(); //windows service, doesnt need UI components initialized
}
public void OnDebug()
{
OnStart(null); //triggers when developing locally
}
protected override async void OnStart(string[] args)
{
try
{
logger.Log($@"Service manager starting...");
logger.Log($@"Finding external services... {config.services.Count} services found.");
foreach (var service in config.services)
{
try
{
if (service.disabled)
{
logger.Log(
$@"Skipping {service.name}: disabled=true in Data Transport Service's appSettings.json file");
continue;
}
logger.Queue($@"Starting: {service.name}...");
string serviceLocation = service.useRelativePath
? Path.Combine(assemblyLocation, service.path)
: service.path;
var svc = Assembly.LoadFrom(serviceLocation);
var assemblyType = svc.GetType($@"{svc.GetName().Name}.Program");
var methodInfo = assemblyType.GetMethod("Main");
var instanceObject = Activator.CreateInstance(assemblyType, new object[0]);
methodInfo.Invoke(instanceObject, new object[0]);
logger.Queue(" Running").Send("");
}
catch (TargetInvocationException ex)
{
logger.Queue(" Failed").Send("");
logger.Log("an error occurred", LOG.LEVEL.CRITICAL, ex);
}
catch (Exception ex)
{
logger.Queue(" Failed").Send("");
logger.Log("an error occurred", LOG.LEVEL.CRITICAL, ex);
}
}
logger.Log("Finished loading services.");
}
catch (Exception ex)
{
logger.Log($@"Critical error encountered", LOG.LEVEL.CRITICAL, ex);
}
}
Microservice:
public [Confidential]()
{
if (currentProfile == null)
{
var errMsg =
$@"Service not loaded, Profile not found, check appSettings.currentProfile: '{config.currentProfile}'";
logger.Log(errMsg,severity: LOG.LEVEL.CRITICAL);
throw new SettingsPropertyNotFoundException(errMsg);
}
if (currentProfile.disabled)
{
var errMsg = $@"Service not loaded: {config.serviceName}, Service's appSettings.currentProfile.disabled=true";
logger.Log(errMsg,LOG.LEVEL.WARN);
throw new ArgumentException(errMsg);
}
logger.Log($@"Loading: '{config.serviceName}' with following configuration:{Environment.NewLine}{JsonConvert.SerializeObject(currentProfile,Formatting.Indented)}");
logger.Queue($@"Encrypting config file passwords...");
bool updateConfig = false;
foreach (var kafkaSource in config.dataTargets)
{
if (!kafkaSource.password.IsEncrypted())
{
updateConfig = true;
logger.Queue($@"%tabEncrypting: {kafkaSource.name}");
kafkaSource.password = kafkaSource.password.Encrypt();
}
else
{
logger.Queue($@"%tabAlready encrypted: {kafkaSource.name}");
}
}
logger.Send(Environment.NewLine);
if (updateConfig)
{
File.WriteAllText(
configFilePath,
Newtonsoft.Json.JsonConvert.SerializeObject(config));
}
var _source = config.dataSources.FirstOrDefault(x=>x.name==currentProfile.dataSource);
var _target = config.dataTargets.FirstOrDefault(x => x.name == currentProfile.dataTarget);
source = new Connectors.Sql(logger,
_source?.name,
_source?.connectionString,
_source.pollingInterval,
_source.maxRowsPerSelect,
_source.maxRowsPerUpdate);
target = new Connectors.KafkaProducer(logger)
{
bootstrapServers = _target?.bootstrapServers,
name = _target?.name,
password = _target?.password.Decrypt(),
sslCaLocation = Path.Combine(assemblyLocation,_target?.sslCaLocation),
topic = _target?.topic,
username = _target?.username
};
Start();
}
public void Start()
{
Timer timer = new Timer();
try
{
logger.Log($@"SQL polling interval: {source.pollingInterval} seconds");
timer.Interval = source.pollingInterval * 1000;
timer.Elapsed += new ElapsedEventHandler(this.OnTimer);
timer.Start();
if (currentProfile.executeOnStartup)
Run();
}
catch (Exception ex)
{
var sb = new StringBuilder();
sb.AppendLine($@"Critical error encountered loading external service: {config.serviceName}.");
if (!timer.Enabled)
sb.AppendLine($@"service unloaded - Schedule not started!");
else
sb.AppendLine($@"service appears to be loaded and running on schedule.");
logger.Log(sb.ToString(), LOG.LEVEL.CRITICAL, ex);
}
}
public void OnTimer(object sender, ElapsedEventArgs e)
{
try
{
Run();
}
catch (Exception ex)
{
logger.Log($@"Critical error during scheduled run on service: {config.serviceName}.", LOG.LEVEL.CRITICAL, ex);
}
}
public async void Run()
{
//Get new alarm events from SQL source
logger.Queue("Looking for new alarms...");
var rows = await GetNewEvents();`
The exception occurred in the GetNewEvents method, which attempted to open a SqlConnection to a SQL Server that was unavailable due to network issues. That method intentionally throws an exception, which should propagate up to OnTimer, where it gets caught and logged while the timer keeps running. During development and testing, I simulated this type of error with invalid credentials, a bad connection string, etc., and it worked as expected: it logged the error and kept running. For some reason, that error is no longer caught in OnTimer. It propagates up to where it should be caught by Start (but isn't). After that it should be caught by the parent service manager, which is entirely wrapped in a try/catch with no throws, and above that (because there could be multiple microservices managed by that service) the entry point to the service manager is also wrapped in a try/catch with no throws, all for isolation from microservice errors. Yet now the error from a VERY downstream application is propagating all the way up.
Typically, this code runs 24/7 with no issues; the microservice it loads from the config file launches and runs fine. The entry into that specific microservice starts with a try {...} catch (Exception ex) {...} block.
The concept is to have a microservice manager that can launch a number of microservices without having to install all of them as Windows services, and to have some level of configuration driven by a config file that dictates how the main service runs.
The microservice represented here opens a SQL connection, reads data, performs business logic, and publishes results to Kafka. It does this on a polling interval dictated by the config file contained in the microservice. As stated above, it has run for months without issue.
Recently, I noticed the main microservice manager service was not running on the Windows server. I investigated the server Application Log and found a "Runtime Error" that essentially stated the microservice failed while attempting to connect to SQL (network issue) and caused the entire microservice manager to exit. To my understanding, the way I'm launching the microservice should isolate it from the main service manager app. Additionally, the main service manager app is wrapped in a very generic try/catch block. The entry point to the microservice itself is wrapped in a try/catch, and almost every component in the microservice is wrapped in try/catch per business need. The scenario that faulted (can't connect to SQL) intentionally throws an error for logging purposes, but should be caught by the immediate parent try/catch, which does not propagate or re-throw; it only logs the error to a txt file and the Windows server app log.
How is it that this exception is bubbling up through isolation points and causing the main service to fault and exit? I tested this exact scenario (being unable to connect to SQL) extensively during development and prior to release, and it generated the correct log entry and tried again on the next polling cycle as expected.
I haven't tried any other approaches yet, as I feel they would be band-aid fixes at best while I don't understand why the original design is suddenly failing. The server hasn't changed: no patching, security updates, etc.
From the server Application Log:
Application: DataTransportService.exe
Framework Version: v4.0.30319
Description: The process was terminated due to an unhandled exception.
Exception Info: System.Exception
at Connectors.SqlHelper.DbHelper+d__13`1[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]].MoveNext()
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(System.Threading.Tasks.Task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task)
at IntelligentAlarms.IntelligentAlarm+d__14.MoveNext()
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(System.Threading.Tasks.Task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task)
at System.Runtime.CompilerServices.TaskAwaiter.ValidateEnd(System.Threading.Tasks.Task)
at IntelligentAlarms.IntelligentAlarm+d__12.MoveNext()
at System.Runtime.CompilerServices.AsyncMethodBuilderCore+<>c.b__6_1(System.Object)
at System.Threading.QueueUserWorkItemCallback.WaitCallback_Context(System.Object)
at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
at System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()
at System.Threading.ThreadPoolWorkQueue.Dispatch()
at System.Threading._ThreadPoolWaitCallback.PerformWaitCallback()
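One possible reading of this trace, offered as a sketch rather than a confirmed diagnosis: Run is declared as async void and OnTimer calls Run() without awaiting it. An exception thrown inside an async void method is not delivered to the caller's try/catch; it is re-raised on the SynchronizationContext that was current when the method started, or on the thread pool if there was none, which is consistent with the AsyncMethodBuilderCore and ThreadPool frames at the bottom of the trace. The example below shows one way to keep such an exception catchable by returning a Task and awaiting it inside the try block. PollingWorker, RunAsync and the simulated failure are hypothetical stand-ins, not the actual service code.

```csharp
using System;
using System.Threading.Tasks;
using System.Timers;

// Hypothetical stand-in for the microservice; all names are illustrative only.
class PollingWorker
{
    private readonly Timer _timer = new Timer(1000); // interval in milliseconds

    public void Start()
    {
        // An async lambda attached to an event is still "async void", so the
        // try/catch must live inside it and wrap the awaited call.
        _timer.Elapsed += async (sender, e) =>
        {
            try
            {
                // Awaiting the Task re-throws its exception here, inside the try.
                await RunAsync();
            }
            catch (Exception ex)
            {
                Console.WriteLine("Scheduled run failed: " + ex.Message);
            }
        };
        _timer.Start();
    }

    public void Stop()
    {
        _timer.Stop();
    }

    // Returns Task instead of void, so a failure is stored on the Task
    // rather than being raised on a thread-pool thread.
    private async Task RunAsync()
    {
        await Task.Delay(10); // stands in for the real GetNewEvents() call
        throw new Exception("SQL server unavailable (simulated)");
    }
}

static class Demo
{
    static void Main()
    {
        var worker = new PollingWorker();
        worker.Start();
        // Let the timer fire a few times; the process keeps running after each failure.
        System.Threading.Thread.Sleep(3500);
        worker.Stop();
    }
}
```

If RunAsync were declared async void instead, the catch block in the handler would never see the exception and the process would terminate, much like the log above.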
Related
How can I catch the exception that occurs when starting a Windows service? I am unable to get the exception in the code below, even though I am throwing an exception in the OnStart() method of the service.
public class InterOpIntegrationWinService : ServiceBase
{
protected override void OnStart(string[] args)
{
throw new InvalidOperationException(message);
}
}
Calling thread code
try
{
using (ServiceController controller = new ServiceController())
{
controller.ServiceName = objServiceConfig.ServiceName;
controller.Start();
System.Windows.Forms.Application.DoEvents();
//controller.WaitForStatus(ServiceControllerStatus.Running, new TimeSpan(0, 0, 15));
//controller.WaitForStatus(ServiceControllerStatus.Running);
//if (!string.IsNullOrEmpty(LogUtilities.ServiceOnStartException))
//{
// MessageBox.Show("Error with starting service : " + LogUtilities.ServiceOnStartException);
// LogUtilities.ServiceOnStartException = string.Empty;
//}
}
}
catch (System.InvalidOperationException InvOpExcep)
{
DisplayError(InvOpExcep.Message);
LogUtilities.DisplayMessage("Failed to start service. " + LogUtilities.ServiceOnStartException, InvOpExcep);
LogUtilities.ServiceOnStartException = string.Empty;
}
catch (Exception ex)
{
DisplayError(ex.Message);
LogUtilities.DisplayMessage("Failed to start service. " + LogUtilities.ServiceOnStartException, ex);
LogUtilities.ServiceOnStartException = string.Empty;
}
I check the application license in the OnStart() method and throw a licensing error if the check fails. I want this to be shared with my calling thread so I can show the message in a dialog box. Any ideas on how to do this if I cannot handle the exceptions in my calling process?
Separate your service into (at least) two components - a component that deals with IPC in some form (e.g. Remoting, WCF endpoint, REST service, etc) and (one or more) components that do its actual job.
If the licensing check fails, don't start the other components, but do still start the component that offers IPC. After starting your service (which should now always at least start), your forms-based application can connect to the service and (through whatever means you want) determine that the service is currently refusing to provide any functionality due to a failed licensing check.
I have an instance of the following code that executes correctly in Debug or as a standalone Windows application:
TcpListener tcpListener = new TcpListener(IPAddress.Any, 4554);
tcpListener.Start();
while (true)
{
try
{
using (Socket socket = tcpListener.AcceptSocket())
{
// Code here is reached in Debug or as a Console Application
// but not as a Windows Service
}
}
catch (SocketException se)
{
// This is never reached
}
catch (Exception ex)
{
// This is never reached
}
finally
{
// This is never reached in the Windows Service
}
}
However, when I install it as a Windows Service, it crashes on tcpListener.AcceptSocket(), and logs the following to the Event Viewer:
An unhandled exception ('System.Net.Sockets.SocketException') occurred in MyService.exe [768]. Just-In-Time debugging this exception failed with the following error: The operation attempted is not supported.
Even trying to catch the exception I am unable to log anything more. Stepping through code in Debug accomplishes nothing because the code successfully blocks the application and waits for a client connection.
Is there a way to implement this for a Windows Service?
usr's advice (and this answer) led me to a bug in the code. The ServiceBase class contained the following:
protected override void OnStart(string[] args)
{
_worker = new Thread(ExecuteService);
_worker.Start();
}
private void ExecuteService()
{
for (;;)
{
if (_stop.WaitOne(1000))
{
new TcpServer().StartTcpServer();
return;
}
}
}
The correct implementation was to remove the for loop, which was re-instantiating the listener. Here is the final code:
protected override void OnStart(string[] args)
{
_worker = new Thread(ExecuteService);
_worker.Start();
}
private static void ExecuteService()
{
new TcpServer().StartTcpServer();
}
I have a Windows service which acts as sync software. I want to add unhandled exception logging to my service, so I modified my Program.cs like this:
static class Program
{
/// <summary>
/// The main entry point for the application.
/// </summary>
[SecurityPermission(SecurityAction.Demand, Flags = SecurityPermissionFlag.ControlAppDomain)]
static void Main()
{
// Register Unhandled Exception Handler
AppDomain.CurrentDomain.UnhandledException +=
new UnhandledExceptionEventHandler(UnhandledExceptionHandler);
// Run Service
ServiceBase[] ServicesToRun;
ServicesToRun = new ServiceBase[]
{
new Service()
};
ServiceBase.Run(ServicesToRun);
}
static void UnhandledExceptionHandler(object sender, UnhandledExceptionEventArgs args)
{
// Get Exception
Exception ex = (Exception)args.ExceptionObject;
// Generate Error
string ErrorMessage = String.Format(
"Error: {0}\r\n" +
"Runtime Terminating: {1}\r\n----- ----- ----- ----- ----- -----\r\n\r\n" +
"{2}\r\n\r\n####################################\r\n",
ex.Message,
args.IsTerminating,
ex.StackTrace.Trim());
// Write Error To File
try
{
using (StreamWriter sw = File.AppendText("UnhandledExceptions.log"))
sw.WriteLine(ErrorMessage);
}
catch { }
}
}
Then in my Service.cs file, in the OnStart method, I added a throw new Exception("test"); to see if unhandled exceptions are being logged to file as expected.
When I start my service, it stops immediately as expected; however it doesn't seem to be logging the exception to the specified file.
Any idea what I am doing wrong here? Thanks in advance for any help.
Before you ask, my service runs as Local Service and the directory where my service .exe runs from (c:\mysync) already has Local Service added in the security tab with full read/write access.
OnStart is called by the ServiceBase class inside a try-catch block. If an exception happens at this stage, ServiceBase catches it, sets the status to 1 as a result, and does not throw it any further:
string[] args = (string[]) state;
try
{
this.OnStart(args);
.....
}
catch (Exception ex)
{
this.WriteEventLogEntry(Res.GetString("StartFailed", new object[1]
{
(object) ((object) ex).ToString()
}), EventLogEntryType.Error);
this.status.currentState = 1;
}
As a result you can find a record in the event log, but you can't catch it as an unhandled domain exception, because no such exception exists.
using (StreamWriter sw = File.AppendText("UnhandledExceptions.log"))
It is forever a really bad idea to not use full path names for files (like c:\foo\bar.log). Especially in a service, you have very little control over the default directory for your service. Because it is started by the service control manager, not by the user from the command prompt or a desktop shortcut.
So high odds that you are just looking at the wrong file. The real one probably ended up being written to c:\windows\system32 (or syswow64). The operating system directories are normally write protected but that doesn't work for a service, they run with a highly privileged account so can litter the hard drive anywhere.
Always use full path names. Using the EventLog instead is highly recommended.
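As a minimal sketch of the full-path advice, assuming the log should sit next to the service executable: AppDomain.CurrentDomain.BaseDirectory gives the folder the .exe was loaded from, regardless of the process working directory. The UnhandledLog class and its Append method are hypothetical helpers, not part of the original code.

```csharp
using System;
using System.IO;

// Hypothetical helper; the class and method names are illustrative only.
static class UnhandledLog
{
    // BaseDirectory is the folder the executable was loaded from, which for a
    // service is usually not the same as the current working directory.
    public static readonly string LogPath = Path.Combine(
        AppDomain.CurrentDomain.BaseDirectory, "UnhandledExceptions.log");

    public static void Append(string message)
    {
        // Creates the file if it does not exist, then appends one line.
        File.AppendAllText(LogPath, message + Environment.NewLine);
    }
}

static class Demo
{
    static void Main()
    {
        UnhandledLog.Append("test entry");
        // The resolved path is absolute, so there is no doubt about where to look.
        Console.WriteLine(UnhandledLog.LogPath);
    }
}
```

The service account still needs write permission on that folder, so this complements the event log suggestion rather than replacing it.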
I have a Windows Service application which is performing some calls to SQL Server. I have a particular unit of work to do which involves saving one row to the Message table and updating multiple rows in the Buffer table.
I have wrapped these two SQL statements into a TransactionScope to ensure that they either both get committed, or neither get committed.
The high level code looks like this:
public static void Save(Message message)
{
using (var transactionScope = new TransactionScope())
{
MessageData.Save(message.TransactionType,
message.Version,
message.CaseNumber,
message.RouteCode,
message.BufferSetIdentifier,
message.InternalPatientNumber,
message.DistrictNumber,
message.Data,
message.DateAssembled,
(byte)MessageState.Inserted);
BufferLogic.FlagSetAsAssembled(message.BufferSetIdentifier);
transactionScope.Complete();
}
}
This has all worked perfectly on my development machine with a local SQL Server installation.
On deploying the Windows Service to a server (but connecting back to my local machine's SQL Server) I am intermittently getting this error message:
System.ArgumentNullException: Value cannot be null.
at System.Threading.Monitor.Enter(Object obj)
at System.Data.ProviderBase.DbConnectionPool.TransactedConnectionPool.TransactionEnded(Transaction transaction, DbConnectionInternal transactedObject)
at System.Data.SqlClient.SqlDelegatedTransaction.SinglePhaseCommit(SinglePhaseEnlistment enlistment)
at System.Transactions.TransactionStateDelegatedCommitting.EnterState(InternalTransaction tx)
at System.Transactions.CommittableTransaction.Commit()
at System.Transactions.TransactionScope.InternalDispose()
at System.Transactions.TransactionScope.Dispose()
at OpenLink.Logic.MessageLogic.Save(Message message) in E:\DevTFS\P0628Temp\OpenLink\OpenLink.Logic\MessageLogic.cs:line 30
at OpenLinkMessageAssembler.OpenLinkMessageAssemblerService.RunService() in E:\DevTFS\P0628Temp\OpenLink\OpenLinkMessageAssembler\OpenLinkMessageAssemblerService.cs:line 99
I believe the line of code being referred to by the exception is where the using block is closed, thus calling the Dispose() method of the TransactionScope. I'm at a bit of a loss here, as the exception seems to be thrown by the internal workings of the TransactionScope class.
One thing that may be significant is that when installing on the server, I had to enable some of the settings for the Distributed Transaction Coordinator to allow network access. This got me thinking that when it's all on my local machine, DTC is probably not used.
Could DTC be part of the cause of this exception?
I also considered whether it was to do with connection pools being maxed out, but would expect a more useful exception than what I'm getting. I kept running the query in this question to check the connection pool size, and it never exceeded four.
My ultimate question is, why is this error intermittently occurring? How can I diagnose what's causing it?
Edit: Threading
@Joe suggested this could be a threading issue. I've therefore included the skeleton code of my Windows Service below to see if it is problematic.
Note that the EventLogger class writes only to the Windows event log and does not connect to SQL Server.
partial class OpenLinkMessageAssemblerService : ServiceBase
{
private volatile bool _isStopping;
private readonly ManualResetEvent _stoppedEvent;
private readonly int _stopTimeout = Convert.ToInt32(ConfigurationManager.AppSettings["ServiceOnStopTimeout"]);
Thread _workerThread;
public OpenLinkMessageAssemblerService()
{
InitializeComponent();
_isStopping = false;
_stoppedEvent = new ManualResetEvent(false);
ServiceName = "OpenLinkMessageAssembler";
}
protected override void OnStart(string[] args)
{
try
{
_workerThread = new Thread(RunService) { IsBackground = true };
_workerThread.Start();
}
catch (Exception exception)
{
EventLogger.LogError(ServiceName, exception.ToString());
throw;
}
}
protected override void OnStop()
{
// Set the global flag so it can be picked up by the worker thread
_isStopping = true;
// Allow worker thread to exit cleanly until timeout occurs
if (!_stoppedEvent.WaitOne(_stopTimeout))
{
_workerThread.Abort();
}
}
private void RunService()
{
// Check global flag which indicates whether service has been told to stop
while (!_isStopping)
{
try
{
var buffersToAssemble = BufferLogic.GetNextSetForAssembly();
if (!buffersToAssemble.Any())
{
Thread.Sleep(30000);
continue;
}
... // Some validation code removed here for clarity
string assembledMessage = string.Empty;
buffersToAssemble.ForEach(b => assembledMessage += b.Data);
var messageParser = new MessageParser(assembledMessage);
var message = messageParser.Parse();
MessageLogic.Save(message); // <-- This calls the method which results in the exception
}
catch (Exception exception)
{
EventLogger.LogError(ServiceName, exception.ToString());
throw;
}
}
_stoppedEvent.Set();
}
}
Check that you have set up DTC on your web server and on your separate DB server, if you have them separate.
http://itknowledgeexchange.techtarget.com/sql-server/how-to-configure-dtc-on-windows-2008/
For logging, I would suggest putting a try/catch inside the transaction scope. However, if you are logging to the database, you will need to make use of the transaction scope's Suppress option:
using(TransactionScope scope4 = new
TransactionScope(TransactionScopeOption.Suppress))
{
...
}
I worked around this by stopping the transaction from being escalated to DTC. By using SQL 2008 instead of SQL 2005, the transaction does not get escalated, and all is fine.
You do not mention your .NET version, but according to
http://support.microsoft.com/kb/960754, there is an issue with version 2.0.50727.4016 of System.Data.dll.
If your server has this older version, I would try to get the updated one from Microsoft.
I have a Windows service that listens to an MSMQ queue. In the OnStart method I have this:
protected override void OnStart(string[] args)
{
try
{
_queue = new MessageQueue(_qPath); // this part works, as I had logging before and after this call
// Add MSMQ event
_queue.ReceiveCompleted += new ReceiveCompletedEventHandler(queue_ReceiveCompleted); // this part works, as I had logging before and after this call
_queue.BeginReceive(); // This is where it is failing - I get a null reference exception
}
catch(Exception ex)
{
EventLogger.LogEvent(EventSource, EventLogType, "OnStart" + _lineFeed +
ex.InnerException.ToString() + _lineFeed + ex.Message.ToString());
}
}
where
private MessageQueue _queue = null;
This works on my machine but when deployed to a windows 2003 server and running as Network service account, it fails
Exception recvd:
Service cannot be started. System.NullReferenceException: Object reference not set to an instance of an object.
at MYService.Service.OnStart(String[] args)
at System.ServiceProcess.ServiceBase.ServiceQueuedMainCallback(Object state)
Solved:
It turned out that I had to explicitly add the Network Service account to the queue I had set up, under its Security tab.
You're seeing that particular exception because you're calling ex.InnerException.ToString(). The InnerException property is not always populated (in fact, it frequently isn't, nor should it be).
Your root problem is likely that the Network Service account doesn't have permissions to access the queue (in this case, read from it).
Here's some code that will help you get the actual error in your event log:
catch(Exception ex)
{
Exception e = ex;
StringBuilder message = new StringBuilder();
while(e != null)
{
if(message.Length > 0) message.AppendLine("\nInnerException:");
message.AppendLine(e.ToString());
e = e.InnerException;
}
EventLogger.LogEvent(EventSource, EventLogType, "OnStart" + _lineFeed +
message.ToString());
}