In a C# ASP.NET 3.5 web application running on Windows Server 2003, I occasionally get the following error:
"Object reference not set to an instance of an object.: at System.Messaging.Interop.MessagePropertyVariants.Unlock()
at System.Messaging.Message.Unlock()
at System.Messaging.MessageQueue.ReceiveCurrent(TimeSpan timeout, Int32 action, CursorHandle cursor, MessagePropertyFilter filter, MessageQueueTransaction internalTransaction, MessageQueueTransactionType transactionType)
at System.Messaging.MessageEnumerator.get_Current()
at System.Messaging.MessageQueue.GetAllMessages()".
The line of code that throws this error is:
Message[] msgs = Global.getOutputQueue(mode).GetAllMessages();
where Global.getOutputQueue(mode) returns the MessageQueue I want to read messages from.
Update:
Global.getPool(mode).WaitOne();
commonClass.log(-1, "Acquired pool: " + mode, "Report ID: " + unique_report_id);
/* ... some code ... */
lock(getLock(mode))
{
bool yet_to_get = true;
int num_retry = 0;
do
{
try
{
msgs = Global.getOutputQueue(mode).GetAllMessages();
yet_to_get = false;
}
catch
{
Global.setOutputQueue(mode);
msgs = Global.getOutputQueue(mode).GetAllMessages();
yet_to_get = false;
}
++num_retry;
}
while (yet_to_get && num_retry < 2);
}
/* ... some code ... */
finally
{
commonClass.log(-1, "Released pool: " + mode, "Report ID: " + unique_report_id);
Global.getPool(mode).Release();
}
Your description and this thread suggest a timing issue. I would create the MessageQueue object infrequently (maybe only once) and have Global.getOutputQueue(mode) return a cached instance; that seems likely to get around the problem.
EDIT: Further details suggest you have the opposite problem. I suggest encapsulating access to the message queue, catching this exception and recreating the queue if that exception occurs. So, replace the call to Global.getOutputQueue(mode).GetAllMessages() with something like this:
public Message[] getAllOutputQueueMessages()
{
try
{
return queue_.GetAllMessages();
}
catch (Exception)
{
queue_ = OpenQueue();
return queue_.GetAllMessages();
}
}
You'll notice I did not preserve your mode functionality, but you get the idea. Of course, you have to duplicate this pattern for other calls you make to the queue, but only for the ones you make (not the whole queue interface).
This is an old thread, but Google brought me here, so I shall add my findings.
I agree with user tallseth that this is a timing issue.
After the message queue is created, it is not instantly available.
try
{
return _queue.GetAllMessages().Length;
}
catch (Exception)
{
System.Threading.Thread.Sleep(4000);
return _queue.GetAllMessages().Length;
}
Try adding a pause if you catch an exception when accessing a queue that you know has been created.
On a related note:
_logQueuePath = logQueuePath.StartsWith(@".\") ? logQueuePath : @".\" + logQueuePath;
_queue = new MessageQueue(_logQueuePath);
MessageQueue.Create(_logQueuePath);
bool exists = MessageQueue.Exists(_logQueuePath);
Running the MessageQueue.Exists(string nameofQ) method immediately after creating the queue will return false, so be careful when calling code such as:
public void CreateQueue()
{
if (!MessageQueue.Exists(_logQueuePath))
{
MessageQueue.Create(_logQueuePath);
}
}
This is likely to throw an exception stating that the queue you are trying to create already exists.
Edit: (Sorry, I don't have the relevant link for this new info.)
I read that a newly created MessageQueue will return false on MessageQueue.Exists(QueuePath) until it has received at least one message.
Keeping this and the earlier points I mentioned in mind has gotten my code running reliably.
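The pause-and-retry idea above can be pulled into a small helper so it does not have to be repeated around every queue call. This is only a sketch: RetryHelper and ExecuteWithRetry are hypothetical names (they are not part of System.Messaging), and the attempt count and delay are assumptions you would tune for your own environment.

```csharp
using System;
using System.Threading;

public static class RetryHelper
{
    // Runs 'action'. If it throws, waits 'delay' and tries again,
    // up to 'maxAttempts' attempts in total. The last failure is rethrown.
    public static T ExecuteWithRetry<T>(Func<T> action, int maxAttempts, TimeSpan delay)
    {
        if (action == null) throw new ArgumentNullException("action");
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException("maxAttempts");

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return action();
            }
            catch (Exception)
            {
                // Out of attempts: let the caller see the original exception.
                if (attempt >= maxAttempts) throw;

                // Otherwise give the queue time to become available.
                Thread.Sleep(delay);
            }
        }
    }
}
```

With the code from this answer, the call would look something like RetryHelper.ExecuteWithRetry(() => _queue.GetAllMessages().Length, 2, TimeSpan.FromSeconds(4)).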
I've got a UWP (C#) app that's running in production on a remote machine (under Windows 10), but it periodically crashes.
My client says, somewhat arbitrarily, every 9 hours or so.
I have several .wer files from the previous crashes but did not have a minidump; the paths referenced in the Event Viewer entry for the crash are blank other than the WER files.
See edits below for how a minidump was obtained and findings.
The exception is an access violation (0xc0000005) at exception offset 0x0004df23 in ntdll.dll
I have the full source for the application and can run it in debug for long periods without the crash.
If I use DLL Export Viewer and load the exact version of ntdll.dll (copied from the remote machine) then I can see that at relative address 0x0004dc60 is EtwNotificationRegister and at 0x0004e260 is LdrGetDllPath.
Does this mean that my crash is occurring within EtwNotificationRegister (which in turn is invoked by something within our code, though that is very difficult to trace without a stack or minidump)?
I am not sure whether the layout of a DLL allows the address I have to be placed like that.
Edit 2 as per @Raymond: No. There are almost certainly other non-exported functions between EtwNotificationRegister and LdrGetDllPath. On build 17763.475, offset 4df23 is RtlpWaitOnCriticalSection, so you are probably using an uninitialized critical section or an already-deleted critical section.
Is there any way I can extract more detail about this crash? I have remote access to the computer running the app, but the crash does not appear to be triggered by a particular event (e.g. we can't hit a button and cause the crash).
Using a minidump now
I am also running the program in local debug.
I have a remote debugger attached to the remote process but can't seem to break or inspect threads; I'm not sure why. I just redeployed with symbols and the debugger attaches with no problem, but it skips all breakpoints :(
Our own (rather naive) local log file, originally intended only for local debugging, is written with StreamWriter.WriteLine immediately followed by StreamWriter.Flush (wrapped in a try/catch since that's not thread safe). On the remote machine it just ends at a normal event; there is nothing following this normal event.
We catch App_UnhandledException and write to this log, so I'd have expected a stack here.
In "Unexplained crashes related to ntdll.dll" it is suggested that a crash from ntdll.dll is a canary in a coal mine.
Edit 1: I have configured an auto crash dump as per https://www.meziantou.net/2018/06/04/tip-automatically-create-a-crash-dump-file-on-error so if I can get it to crash again maybe I'll get a dump file next time?
Here is the detail from the WER
Version=1
EventType=MoAppCrash
EventTime=132017523132123596
ReportType=2
Consent=1
UploadTime=132017523137590717
ReportStatus=268435456
ReportIdentifier=8d467f04-4bdd-4f9e-bf26-b42d143ece1a
IntegratorReportIdentifier=b60f9ca0-4126-4262-a886-98d3844892d3
Wow64Host=34404
NsAppName=praid:App
OriginalFilename=XXXXXX.YYYYYY.exe
AppSessionGuid=00001514-0001-0004-9fe2-6df11905d501
TargetAppId=U:XXXXXX.YYYYYY_1.0.201.0_x64__b0abmt6f49vqj!App
TargetAppVer=1.0.201.0_x64_!2018//01//24:08:17:16!1194d!XXXXXX.YYYYYY.exe
BootId=4294967295
TargetAsId=1298
UserImpactVector=271582000
IsFatal=1
EtwNonCollectReason=4
Response.BucketId=2ee79f27e2e81a541d6200d746866340
Response.BucketTable=5
Response.LegacyBucketId=2117255699418735424
Response.type=4
Sig[0].Name=Package Full Name
Sig[0].Value=XXXXXX.YYYYYY_1.0.201.0_x64__b0abmt6f49vqj
Sig[1].Name=Application Name
Sig[1].Value=praid:App
Sig[2].Name=Application Version
Sig[2].Value=1.0.0.0
Sig[3].Name=Application Timestamp
Sig[3].Value=5a68410c
Sig[4].Name=Fault Module Name
Sig[4].Value=ntdll.dll
Sig[5].Name=Fault Module Version
Sig[5].Value=10.0.17763.475
Sig[6].Name=Fault Module Timestamp
Sig[6].Value=3230aa04
Sig[7].Name=Exception Code
Sig[7].Value=c0000005
Sig[8].Name=Exception Offset
Sig[8].Value=000000000004df23
DynamicSig[1].Name=OS Version
DynamicSig[1].Value=10.0.17763.2.0.0.256.48
DynamicSig[2].Name=Locale ID
DynamicSig[2].Value=5129
DynamicSig[22].Name=Additional Information 1
DynamicSig[22].Value=95b1
DynamicSig[23].Name=Additional Information 2
DynamicSig[23].Value=95b15a88b673e33a5f48839974790b1c
DynamicSig[24].Name=Additional Information 3
DynamicSig[24].Value=283d
DynamicSig[25].Name=Additional Information 4
DynamicSig[25].Value=283dea7b6b6112710c1e3f76ed84d993
Edit 3: screenshot of minidump from a crash last night. In the event log, the WER crash looks the same so this appears to be the same issue. I will see if I can load symbols etc.
Edit 4: Attempting to debug managed. Threads view shows a thread as the exception point but no call stack info.
Edit 5: Debugging native from the minidump. Looks like we have a winner.
@Raymond was correct: it was RtlpWaitOnCriticalSection invoked from BluetoothLEAdvertisementWatcher::AdvertisementReceivedCallbackWorker.
Native call stack as text:
Not Flagged > 8748 0 Worker Thread Win64
Thread Windows.Devices.Bluetooth.dll!(void)
ntdll.dll!RtlpWaitOnCriticalSection()
ntdll.dll!RtlpEnterCriticalSectionContended()
ntdll.dll!RtlEnterCriticalSection()
Windows.Devices.Bluetooth.dll!(void)()
Windows.Devices.Bluetooth.dll!wil::ResultFromException<(void)
()
Windows.Devices.Bluetooth.dll!Windows::Devices::Bluetooth::Advertisement::BluetoothLEAdvertisementWatcher::AdvertisementReceivedCallbackWorker(void)
Windows.Devices.Bluetooth.dll!Windows::Devices::Bluetooth::Advertisement::BluetoothLEAdvertisementWatcher::AdvertisementReceivedThreadpoolWorkCallbackStatic(struct _TP_CALLBACK_INSTANCE *,void *,struct _TP_WORK *)
ntdll.dll!TppWorkpExecuteCallback()
ntdll.dll!TppWorkerThread()
kernel32.dll!BaseThreadInitThunk()
ntdll.dll!RtlUserThreadStart()
Edit 6: OK, so now what do I do? How can I resolve this problem? My understanding of the stack is that an exception was thrown inside the callback. Is that correct?
So could I put a managed try/catch in the BLE advertisement callback handler, and would that fix it (or at least catch the exception for further debugging)?
Edit 7: code...
Here is the code we use to instantiate the wrapper and subscribe to events.
The BluetoothLEAdvertisementWatcherWrapper is a delegating class: it just wraps the underlying BluetoothLEAdvertisementWatcher so it can implement an interface, passing all events through and exposing properties. We do this so that we can have a different version that creates virtual events for testing.
bluetoothAdvertisementWatcher = new BluetoothLEAdvertisementWatcherWrapper();
bluetoothAdvertisementWatcher.SignalStrengthFilter.SamplingInterval = TimeSpan.Zero;
bluetoothAdvertisementWatcher.ScanningMode = BluetoothLEScanningMode.Active;
bluetoothAdvertisementWatcher.Received += Watcher_Received;
bluetoothAdvertisementWatcher.Stopped += Watcher_Stopped;
bluetoothAdvertisementWatcher.Start();
Here is the code for the wrapper; just to show it's not doing anything complex:
public class BluetoothLEAdvertisementWatcherWrapper : IBluetoothAdvertismentWatcher, IDisposable
{
private BluetoothLEAdvertisementWatcher bluetoothWatcher;
public BluetoothLEAdvertisementWatcherWrapper()
{
bluetoothWatcher = new BluetoothLEAdvertisementWatcher();
}
public BluetoothSignalStrengthFilter SignalStrengthFilter => bluetoothWatcher.SignalStrengthFilter;
public BluetoothLEScanningMode ScanningMode
{
get
{
return bluetoothWatcher.ScanningMode;
}
set
{
bluetoothWatcher.ScanningMode = value;
}
}
public event TypedEventHandler<BluetoothLEAdvertisementWatcher, BluetoothLEAdvertisementReceivedEventArgs> Received
{
add
{
bluetoothWatcher.Received += value;
}
remove
{
bluetoothWatcher.Received -= value;
}
}
public event TypedEventHandler<BluetoothLEAdvertisementWatcher, BluetoothLEAdvertisementWatcherStoppedEventArgs> Stopped
{
add
{
bluetoothWatcher.Stopped += value;
}
remove
{
bluetoothWatcher.Stopped -= value;
}
}
public BluetoothLEAdvertisementWatcherStatus Status => bluetoothWatcher.Status;
public Action<IPacketFrame, short> YieldAdvertisingPacket { get => throw new NotImplementedException(); set => throw new NotImplementedException(); }
public void Start()
{
bluetoothWatcher.Start();
}
public void Stop()
{
bluetoothWatcher.Stop();
}
public void Dispose()
{
if (bluetoothWatcher != null)
{
if (bluetoothWatcher.Status == BluetoothLEAdvertisementWatcherStatus.Started)
{
bluetoothWatcher.Stop();
}
bluetoothWatcher = null;
}
}
}
And here is the code for the Watcher_Received event handler:
private void Watcher_Received(BluetoothLEAdvertisementWatcher sender, BluetoothLEAdvertisementReceivedEventArgs args)
{
try
{
//we won't queue packets until registered
if (!ApplicationContext.Current.Details.ReceiverId.HasValue)
return;
IPacketFrame frame;
PacketFrameParseResult result = ParseFrame(args, out frame);
if (result == PacketFrameParseResult.Success)
{
ApplicationContext.Current.Details.BluetoothPacketCount++;
}
short rssi = args.RawSignalStrengthInDBm;
string message = FormatPacketForDisplay(args, args.AdvertisementType, rssi, frame, result);
if (BluetoothPacketReceived != null)
{
BluetoothPacketReceived.Invoke(this, new BluetoothPacketReceivedEventArgs(message, result, frame, rssi));
}
}
catch (Exception ex)
{
if (ex.InnerException is Exceptions.PacketFrameParseException && (ex.InnerException as Exceptions.PacketFrameParseException).Result == PacketFrameParseResult.InvalidData)
{
// noop
}
else
{
Logger.Log(LogLevel.Warning, "BLE listener caught bluetooth packet error: {0}", ex);
if (BluetoothPacketError != null)
{
BluetoothPacketError.Invoke(this, new BluetoothPacketErrorEventArgs(ex));
}
}
}
}
You can see here that the entire managed callback is wrapped in a try/catch and doesn't rethrow, so I'm not sure there's anything further I can do to prevent the native exception from bringing the application down.
Current thinking, based on RtlpEnterCriticalSectionContended appearing in the stack: is this a parallel event handler, where the native side raises the handler for a new event in the same thread while the handler for a previous event is still executing?
If so, is this a race condition on the critical section that causes the crash?
Edit 8: To test this theory, I replaced the contents of Received with a read plus a push to a concurrent queue, allowing the managed code to exit the event handler as quickly as possible.
I then added a separate thread that reads from the concurrent queue to perform my application-side processing.
Initially I thought this had resolved the issue, as the application ran (actively listening) for approximately 15 hours; however, it crashed again this morning with the same symptoms.
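For readers who want to see the shape of that hand-off, here is a minimal sketch of a handler-side enqueue with a single background consumer. PacketPump is a hypothetical name, and it uses BlockingCollection rather than the bare ConcurrentQueue in our code, so treat it as an illustration of the pattern and not as our implementation. As noted above, this change did not stop the crash.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public sealed class PacketPump<T> : IDisposable
{
    private readonly BlockingCollection<T> queue =
        new BlockingCollection<T>(new ConcurrentQueue<T>());
    private readonly Task worker;

    public PacketPump(Action<T> process)
    {
        if (process == null) throw new ArgumentNullException("process");

        // A single long-running consumer drains the queue in arrival order.
        worker = Task.Factory.StartNew(() =>
        {
            foreach (T item in queue.GetConsumingEnumerable())
            {
                try
                {
                    process(item);
                }
                catch (Exception)
                {
                    // Keep the consumer alive; real code would log here.
                }
            }
        }, TaskCreationOptions.LongRunning);
    }

    // Called from the event handler: copy what you need out of the
    // event args, enqueue it, and return immediately.
    public void Enqueue(T item)
    {
        queue.Add(item);
    }

    // Stops accepting new items and waits for queued items to be processed.
    public void Dispose()
    {
        queue.CompleteAdding();
        worker.Wait();
        queue.Dispose();
    }
}
```

The event handler then only builds a small data object and calls Enqueue, so it holds the native callback for as short a time as possible.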
Edit 8: Following suggestions in the comments, we tried to ensure that we didn't dispose/GC the watcher after a stop prior to the receive completing.
We did this by using a TaskCompletionSource to function as a promise, subscribing to the Stopped event so we could await on the completion source task which would only have a result set when the Stopped event had fired.
We also used a lock (Monitor.Enter) in both StopAsync and Received to ensure that both could not be running in parallel.
This appeared to reduce the speed at which the system could process events which would make sense if the BLE packets were arriving in parallel.
Updated code as follows:
if ((DateTime.Now - this.LastStartedTimestamp).TotalSeconds > 60)
{
if (this.LastStopReason != BluetoothWatcherStopReason.DeviceCharacteristicWorker)
{
Logger.Log(LogLevel.Debug, "Stopping bluetooth watcher...");
// restart watcher every 10 mins
await this.StopAsync(BluetoothWatcherStopReason.AutomaticRestart);
//start again if automatic restart
Logger.Log(LogLevel.Debug, "Starting bluetooth watcher...");
this.Start(this.testMode);
Logger.Log(LogLevel.Debug, "Started bluetooth watcher");
this.LastStartedTimestamp = DateTime.Now;
}
}
private void Watcher_Stopped(BluetoothLEAdvertisementWatcher sender, BluetoothLEAdvertisementWatcherStoppedEventArgs args)
{
string error = args.Error.ToString();
Logger.Log(LogLevel.Warning, string.Format("BLE listening stopped because {0}...", error));
LastError = args.Error;
if (BluetoothWatcherStopped != null)
{
BluetoothWatcherStopped.Invoke(sender, args);
}
}
public class ReceivedBluetoothAdvertismentPacketItem
{
public DateTime Timestamp { get; set; }
public BluetoothLEAdvertisementType Type { get; set; }
public byte[] Buffer { get; set; }
public short Rssi { get; set; }
}
ConcurrentQueue<ReceivedBluetoothAdvertismentPacketItem> BluetoothPacketsReceivedQueue = new ConcurrentQueue<ReceivedBluetoothAdvertismentPacketItem>();
private void Watcher_Received(BluetoothLEAdvertisementWatcher sender, BluetoothLEAdvertisementReceivedEventArgs args)
{
bool lockWasTaken = false;
try
{
//this prevents stop until we're exiting Received
Monitor.Enter(BluetoothWatcherEventSynchronisation, ref lockWasTaken);
if (!lockWasTaken)
{
return;
}
//we won't queue packets until registered
if (!ApplicationContext.Current.ReceiverDetails.ReceiverId.HasValue)
return;
BluetoothLEAdvertisementType type = args.AdvertisementType;
byte[] buffer = GetManufacturerData(args.Advertisement);
short rssi = args.RawSignalStrengthInDBm;
BluetoothPacketsReceivedQueue.Enqueue(new ReceivedBluetoothAdvertismentPacketItem
{
Timestamp = DateTime.UtcNow,
Type = type,
Rssi = rssi,
Buffer = buffer
});
ApplicationContext.Current.ReceiverDetails.UnprocessedQueueLength = BluetoothPacketsReceivedQueue.Count;
}
catch (Exception ex)
{
Logger.Log(LogLevel.Warning, "BLE listener caught bluetooth packet error: {0}", ex);
if (BluetoothPacketError != null)
{
BluetoothPacketError.Invoke(this, new BluetoothPacketErrorEventArgs(ex));
}
}
finally
{
if (lockWasTaken)
{
Monitor.Exit(BluetoothWatcherEventSynchronisation);
}
}
}
public BluetoothWatcherStopReason LastStopReason { get; private set; } = BluetoothWatcherStopReason.Unknown;
private object BluetoothWatcherEventSynchronisation = new object();
public Task<BluetoothWatcherStopReason> StopAsync(BluetoothWatcherStopReason reason)
{
var promise = new TaskCompletionSource<BluetoothWatcherStopReason>();
if (bluetoothAdvertisementWatcher != null)
{
LastStopReason = reason;
UpdateBluetoothStatusInReceiverModel(BluetoothLEAdvertisementWatcherStatus.Stopped); //actually stopping but we lie
bool lockWasTaken = false;
try
{
Monitor.Enter(BluetoothWatcherEventSynchronisation, ref lockWasTaken);
{
bluetoothAdvertisementWatcher.Received -= Watcher_Received;
bluetoothAdvertisementWatcher.Stopped += (sender, args) =>
{
// clean up
if (bluetoothAdvertisementWatcher != null)
{
bluetoothAdvertisementWatcher.Stopped -= Watcher_Stopped;
bluetoothAdvertisementWatcher = null;
}
//notify continuation
promise.SetResult(reason);
};
bluetoothAdvertisementWatcher.Stop();
}
}
finally
{
if (lockWasTaken)
{
Monitor.Exit(BluetoothWatcherEventSynchronisation);
}
}
}
base.Stop();
return promise.Task;
}
Following these changes, the same crash is still occurring in the Windows.Devices.Bluetooth native assembly (as per above).
Edit 9: I've removed the automatic periodic start/stop and now the app has been stable for > 36 hours without a crash. So something inside this flow is causing the crashes. We originally added that to work around an issue with the advertisement watcher just stopping after a while, so we'd like to keep it if we can fix it.
The if statement if ((DateTime.Now - this.LastStartedTimestamp).TotalSeconds > 60) (and its block) is currently commented out.
I have opened a bug for windows universal here: https://wpdev.uservoice.com/forums/110705-universal-windows-platform/suggestions/37623343-bluetoothleadvertismentwatcher-advertismentreceiv
Overview of Problem:
I need to connect to an IRC Server. Once connected, the program will send a message to the channel, and a response will occur over multiple lines back. I need to read these lines and store in a variable for later use. A special character at the end of the message (]) will define the end of the message over multiple lines. Once we have received this character, the IRC session should disconnect and processing should continue.
Situation:
I am using the Smartirc4net library. Calling irc.Disconnect() takes about 40 seconds to disconnect the session. Once we've received the ] character, the session should be disconnected, Listen() should not be blocking, and the rest of the program should continue to run.
Research:
I have found this: smartirc4net listens forever, can't exit thread, and I think it might be the same issue, however, I am unsure of what I need to do to resolve the problem.
Code:
public class IrcCommunicator
{
public IrcClient irc = new IrcClient();
string data;
public string Data { get { return data; } }
// this method we will use to analyse queries (also known as private messages)
public void OnQueryMessage(object sender, IrcEventArgs e)
{
data += e.Data.Message;
if (e.Data.Message.Contains("]"))
{
irc.Disconnect(); //THIS TAKES 40 SECONDS!!!
}
}
public void RunCommand()
{
irc.OnQueryMessage += new IrcEventHandler(OnQueryMessage);
string[] serverlist;
serverlist = new string[] { "127.0.0.1" };
int port = 6667;
string channel = "#test";
try
{
irc.Connect(serverlist, port);
}
catch (ConnectionException e)
{
// something went wrong, the reason will be shown
System.Console.WriteLine("couldn't connect! Reason: " + e.Message);
}
try
{
// here we logon and register our nickname and so on
irc.Login("test", "test");
// join the channel
irc.RfcJoin(channel);
irc.SendMessage(SendType.Message, "test", "!query");
// here we tell the IRC API to go into a receive mode, all events
// will be triggered by _this_ thread (main thread in this case)
// Listen() blocks by default; you can also use ListenOnce() if you
// need a call that does one IRC operation and then returns, in which
// case you need your own loop
irc.Listen();
// when Listen() returns our IRC session is over, to be sure we call
// disconnect manually
irc.Disconnect();
}
catch (Exception e)
{
// this should not happen but just in case we handle it nicely
System.Console.WriteLine("Error occurred! Message: " + e.Message);
System.Console.WriteLine("Exception: " + e.StackTrace);
}
}
}
IrcCommunicator bot = new IrcCommunicator();
bot.RunCommand();
ViewBag.IRC = bot.Data;
As you can see, once this
Thank you for your time to look at this code and read my problem description. If you have any thoughts, or other suggestions, please let me know.
Mike
I was able to disconnect straight away by calling RfcQuit() within OnQueryMessage(), before irc.Disconnect().
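A rough sketch of how the "collect lines until ]" logic could be separated from the IRC calls is shown below. ResponseCollector is a hypothetical helper, not part of SmartIrc4net; the quit-then-disconnect step is passed in as a delegate so the class itself has no dependency on the library.

```csharp
using System;
using System.Text;

public sealed class ResponseCollector
{
    private readonly StringBuilder buffer = new StringBuilder();
    private readonly char terminator;
    private readonly Action onComplete;

    // 'onComplete' runs once, when the terminator is first seen.
    // In the question's code it could be:
    //   () => { irc.RfcQuit(); irc.Disconnect(); }
    public ResponseCollector(char terminator, Action onComplete)
    {
        if (onComplete == null) throw new ArgumentNullException("onComplete");
        this.terminator = terminator;
        this.onComplete = onComplete;
    }

    public bool IsComplete { get; private set; }

    public string Data
    {
        get { return buffer.ToString(); }
    }

    // Appends one received line. Returns true once the terminator has
    // been seen; lines arriving after that point are ignored.
    public bool Append(string line)
    {
        if (IsComplete) return true;

        buffer.Append(line);
        if (line != null && line.IndexOf(terminator) >= 0)
        {
            IsComplete = true;
            onComplete();
        }
        return IsComplete;
    }
}
```

OnQueryMessage would then just call collector.Append(e.Data.Message), and the caller reads collector.Data after Listen() returns.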
I am using IPC over local TCP to communicate from a client to a server thread. The connection itself doesn't seem to be throwing any errors, but every time I try to make one of the associated calls, I get this message:
System.Runtime.Remoting.RemotingException: Could not connect to an IPC Port: The System cannot Find the file specified
What I am attempting to figure out is WHY. Because this WAS working correctly, until I transitioned the projects in question (yes, both) from .NET 3.5 to .NET 4.0.
Listen Code
private void ThreadListen()
{
_listenerThread = new Thread(Listen) {Name = "Listener Thread", Priority = ThreadPriority.AboveNormal};
_listenerThread.Start();
}
private void Listen()
{
_listener = new Listener(this);
LifetimeServices.LeaseTime = TimeSpan.FromDays(365);
IDictionary props = new Hashtable();
props["port"] = 63726;
props["name"] = "AGENT";
TcpChannel channel = new TcpChannel(props, null, null);
ChannelServices.RegisterChannel(channel, false);
RemotingServices.Marshal(_listener, "Agent");
Logger.WriteLog(new LogMessage(MethodBase.GetCurrentMethod().Name, "Now Listening for commands..."));
LogEvent("Now Listening for commands...");
}
Selected Client Code
private void InitializeAgent()
{
try
{
_agentController =
(IAgent)RemotingServices.Connect(typeof(IAgent), IPC_URL);
//Note: IPC_URL was originally "ipc://AGENT/AGENT"
// It has been changed to read "tcp://localhost:63726/Agent"
SetAgentPid();
}
catch (Exception ex)
{
HandleError("Unable to initialize the connected agent.", 3850244, ex);
}
}
//This is the method that throws the error
public override void LoadTimer()
{
// first check to see if we have already set the agent process id and set it if not
if (_agentPid < 0)
{
SetAgentPid();
}
try
{
TryStart();
var tries = 0;
while (tries < RUNCHECK_TRYCOUNT)
{
try
{
_agentController.ReloadSettings();//<---Error occurs here
return;
} catch (RemotingException)
{
Thread.Sleep(2000);
tries++;
if (tries == RUNCHECK_TRYCOUNT)
throw;
}
}
}
catch (Exception ex)
{
HandleError("Unable to reload the timer for the connected agent.", 3850243, ex);
}
}
If you need to see something I haven't shown, please ask, I'm pretty much flying blind here.
Edit: I think the issue is the IPC_URL String. It is currently set to "ipc://AGENT/AGENT". The thing is, I have no idea where that came from, why it worked before, or what might be stopping it from working now.
Update
I was able to get the IPC Calls working correctly by changing the IPC_URL String, but I still lack understanding of why what I did worked. Or rather, why the original code stopped working and I needed to change it in the first place.
The string I am using now is "tcp://localhost:63726/Agent"
Can anyone tell me, not why the new string works (I know that), but why the original string worked before and why updating the project target to .NET 4.0 broke it?
I have two self-hosted services running on the same network. The first is sampling an Excel sheet (or other sources, but for the moment this is the one I'm using to test) and sending updates to a subscribed client.
The second connects as a client to instances of the first service, optionally evaluates some formula on these inputs, and then broadcasts the originals or the results as updates to a subscribed client in the same manner as the first. All of this is happening over a TCP binding.
My problem occurs when the second service attempts to subscribe to two of the first service's feeds at once, as it would do if a new calculation is using two or more for the first time. I keep getting TimeoutExceptions, which appear to occur when the second feed is subscribed to. I put a breakpoint in the called method on the first server and, stepping through it, it is able to fully complete and return true back up the call stack, which indicates that the problem might be some annoying intricacy of WCF.
The first service is running on port 8081 and this is the method that gets called:
public virtual bool Subscribe(int fid)
{
try
{
if (fid > -1 && _fieldNames.LeftContains(fid))
{
String sessionID = OperationContext.Current.SessionId;
Action<Object, IUpdate> toSub = MakeSend(OperationContext.Current.GetCallbackChannel<ISubClient>(), sessionID);//Make a callback to the client's callback method to send the updates
if (!_callbackList.ContainsKey(fid))
_callbackList.Add(fid, new Dictionary<String, Action<Object, IUpdate>>());
_callbackList[fid][sessionID] = toSub;//add the callback method to the list of callback methods to call when this feed is updated
String field = GetItem(fid);//get the current stored value of that field
CheckChanged(fid, field);//add or update field, usually returns a bool if the value has changed but also updates the last value reference, used here to ensure there is a value to send
FireOne(toSub, this, MakeUpdate(fid, field));//sends an update so the subscribing service will have a first value
return true;
}
return false;
}
catch (Exception e)
{
Log(e);//report any errors before returning a failure
return false;
}
}
The second service is running on port 8082 and is failing in this method:
public int AddCalculation(string name, string input)
{
try
{
Calculation calc;
try
{
calc = new Calculation(_fieldNames, input, name);//Perform slow creation before locking - better wasted one thread than several blocked ones
}
catch (FormatException e)
{
throw Fault.MakeCalculationFault(e.Message);
}
lock (_calculations)
{
int id = nextID();
foreach (int fid in calc.Dependencies)
{
if (!_calculations.ContainsKey(fid))
{
lock (_fieldTracker)
{
DataRow row = _fieldTracker.Rows.Find(fid);
int uses = (int)(row[Uses]) + 1;//update uses of that feed
try
{
if (uses == 1){//if this is the first use of this field
SubServiceClient service = _services[(int)row[ServiceID]];//get the stored connection (as client) to that service
service.Subscribe((int)row[ServiceField]);//Failing here, but only on second call and not if subscribed to each separately
}
}
catch (TimeoutException e)
{
Log(e);
throw Fault.MakeOperationFault(FaultType.NoItemFound, "Service could not be found");//can't be caught, if this timed out then outer connection timed out
}
_fieldTracker.Rows.Find(fid)[Uses] = uses;
}
}
}
return id;
}
}
catch (FormatException f)
{
Log(f.Message);
throw Fault.MakeOperationFault(FaultType.InvalidInput, f.Message);
}
}
The ports these are on could change but are never shared. The tcp binding used is set up in code with these settings:
_tcpbinding = new NetTcpBinding();
_tcpbinding.PortSharingEnabled = false;
_tcpbinding.Security.Mode = SecurityMode.None;
This is in a common library to ensure they both have the same set up, which is also a reason why it is declared in code.
I have already tried altering the Service Throttling Behavior for more concurrent calls but that didn't work. It's commented out for now since it didn't work but for reference here's what I tried:
ServiceThrottlingBehavior stb = new ServiceThrottlingBehavior
{
MaxConcurrentCalls = 400,
MaxConcurrentSessions = 400,
MaxConcurrentInstances = 400
};
host.Description.Behaviors.RemoveAll<ServiceThrottlingBehavior>();
host.Description.Behaviors.Add(stb);
Has anyone had similar issues of methods working correctly but still timing out when sending back to the caller?
This was a difficult problem and from everything I could tell, it is an intricacy of WCF. It cannot handle one connection being reused very quickly in a loop.
It seems to lock up the socket connection, though trying to add GC.Collect() didn't free up whatever resources it was contesting.
In the end the only way I found to work was to create another connection to the same endpoint for each concurrent request and perform them on separate threads. Might not be the cleanest way but it was all that worked.
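As a rough illustration of that workaround, the sketch below opens one client per feed and subscribes on separate tasks. ISubscriptionClient and ParallelSubscriber are hypothetical names standing in for the generated SubServiceClient proxy and my own code, and createClient is assumed to open a new connection to the same endpoint on every call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Stand-in for the generated WCF client proxy (an assumption).
public interface ISubscriptionClient
{
    bool Subscribe(int fieldId);
}

public static class ParallelSubscriber
{
    // Opens a fresh client for each distinct field id and subscribes on
    // its own task. Returns the clients whose Subscribe call succeeded,
    // keyed by field id, so the caller can keep them open for callbacks.
    public static IDictionary<int, ISubscriptionClient> SubscribeAll(
        IEnumerable<int> fieldIds, Func<ISubscriptionClient> createClient)
    {
        Task<KeyValuePair<int, ISubscriptionClient>>[] tasks = fieldIds
            .Distinct()
            .Select(id => Task.Factory.StartNew(() =>
            {
                ISubscriptionClient client = createClient();
                bool ok = client.Subscribe(id);
                // Real code should also close a client whose Subscribe failed.
                return new KeyValuePair<int, ISubscriptionClient>(id, ok ? client : null);
            }))
            .ToArray();

        // Throws AggregateException if any subscription attempt threw.
        Task.WaitAll(tasks);

        return tasks
            .Select(t => t.Result)
            .Where(pair => pair.Value != null)
            .ToDictionary(pair => pair.Key, pair => pair.Value);
    }
}
```

The returned clients are deliberately left open, because in a duplex setup the updates arrive over the same session that made the Subscribe call.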
Something that might come in handy: I used the Service Trace Viewer to monitor the WCF calls and try to track down the problem. I found out how to use it from this article: http://www.codeproject.com/Articles/17258/Debugging-WCF-Apps
I have this in a class called "MessageQueueReceive".
public MessageQueueTransaction BlockingReceive(out Message message)
{
MessageQueueTransaction tran = null;
message = null;
tran = new MessageQueueTransaction();
tran.Begin();
try
{
message = Queue.Receive(new TimeSpan(0, 0, 5), tran);
}
catch (MessageQueueException ex)
{
// If the exception was a timeout, then just continue
// otherwise re-raise it.
if (ex.MessageQueueErrorCode != MessageQueueErrorCode.IOTimeout)
throw; // rethrow without resetting the stack trace
}
return tran;
}
Then my processing loop has this:-
while (!Abort)
{
try
{
tran = this.Queue.BlockingReceive(out msg);
if (msg != null)
{
// Process message here
if (tran != null)
tran.Commit();
}
}
catch (Exception ex)
{
if (tran != null)
tran.Abort();
}
}
The control panel tool shows that the message queues I'm using are transactional. Journal queue is not enabled.
This code creates the queue:-
private static MessageQueue CreateMessageQueue(string queueName, bool transactional = false)
{
MessageQueue messageQueue = MessageQueue.Create(queueName, transactional);
messageQueue.SetPermissions("Administrators", MessageQueueAccessRights.FullControl,
AccessControlEntryType.Allow);
return messageQueue;
}
The transactional parameter is set as "true" when this is called.
What I find is that when an exception occurs during the processing of the message, tran.Abort is called but at that point I'd expect the message to be returned to the queue. However, this is not happening and the messages are lost.
Am I missing something obvious? Can anyone see what I'm doing wrong?
Thanks for all the comments. I did re-organise my code as Russell McClure suggested, and I tried to create simple test cases but could not reproduce the problem.
In the end, the problem was not at all where I was looking (how often does that happen?).
In my pipeline, I had a duplicate message checker. The "messages" my system deals with are from remote devices on a WAN, and occasionally messages on the wire are duplicated.
When a message was pulled from the MSMQ, it would pass via the duplicate checker to the database writer. If the database writer failed, the duplicate checker did not remove the hash from its table. When the process tried to loop again, it would get the same message from the queue again because the MSMQ transaction had been rolled back when the database writer failed. However, on the second attempt, the duplicate checker would spot that it had seen the message before, and swallow it silently.
The fix was to make the duplicate checker spot the exception coming from the next link in the chain, and roll back anything it had done too.
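In outline, the fix looks something like the sketch below. DuplicateChecker and Forward are hypothetical names (my real pipeline is more involved), and the hash is assumed to be a string computed elsewhere; the point is only that the hash is forgotten again if the next stage throws.

```csharp
using System;
using System.Collections.Generic;

public sealed class DuplicateChecker
{
    private readonly HashSet<string> seen = new HashSet<string>();
    private readonly object sync = new object();

    // Returns true if the message was passed on to 'next', or false if
    // it was swallowed as a duplicate. If 'next' throws, the hash is
    // removed again so that a redelivered message is not mistaken for a
    // duplicate, and the exception is rethrown so the caller can abort
    // its own transaction.
    public bool Forward(string messageHash, Action next)
    {
        lock (sync)
        {
            if (!seen.Add(messageHash)) return false;
        }

        try
        {
            next();
            return true;
        }
        catch
        {
            lock (sync)
            {
                seen.Remove(messageHash);
            }
            throw;
        }
    }
}
```

Here next stands for the database writer; when it fails, the MSMQ transaction is aborted by the outer loop and the retried message passes the checker again.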
Your queue needs to be created as a transactional queue to get what you want.
EDIT:
Well, if your queue is transactional then that points to the fact that you are mishandling your transaction, although I can't see specifically how it is happening. I would change your BlockingReceive method to return the message. I would move the creation of the MessageQueueTransaction to the outer method. Your code will be much more maintainable if you have the Begin, Commit and Abort method calls in the same method.