Service occasionally hangs when stopping: suspended threads - c#

I wrote a Windows service in C# targeting .NET 4.0 which will on the odd occasion hang completely when I attempt to stop the service. I've noticed from looking at a dump file that a number of my threads are suspended, though I don't suspend them myself in my code.
The environment is Windows Server 2008R2 64bit, though I've observed the same hang on Windows 7 64bit. .NET 4.0 is the latest version installed.
There's a lot of code, so I'm just posting some hopefully relevant snippets; I can post more if required.
Basic design:
Main() starts a new thread to handle logging to a file (the code for which is in a separate dll), then starts the service.
public static void Main(string[] args)
{
...
else if (Args.RunService)
{
Logger.Options.LogToFile = true;
MSPO.Logging.Logger.Start();
RunService();
MSPO.Logging.Logger.Stop();
}
...
}
private static void RunService()
{
service = new ProcessThrottlerService();
System.ServiceProcess.ServiceBase.Run(service);
}
That thread stays there, until ServiceBase.Run returns.
OnStart() in the service creates a new thread and starts it.
protected override void OnStart(string[] args)
{
serviceThread = new MainServiceThread();
serviceThread.StartThread();
base.OnStart(args);
}
I create a ManualResetEventSlim which is used as the stop signal for the rest of the program. OnStop() sets the event.
protected override void OnStop()
{
if (serviceThread != null)
{
serviceThread.StopThread(); // Event is signalled in there
serviceThread.WaitForThreadToReturn(); // This calls thread.Join() on the MainServiceThread thread
}
base.OnStop();
}
The "MainServiceThread" creates the event, kicks off a new thread again, and just waits on the event.
private void StartHandlerAndWaitForServiceStop()
{
processHandler.Start(serviceStopEvent);
serviceStopEvent.Wait();
processHandler.Stop();
}
The processHandler thread subscribes to this WMI query:
watcher = new ManagementEventWatcher(new ManagementScope("root\\CIMV2"),
new WqlEventQuery("SELECT * FROM Win32_ProcessStartTrace"));
watcher.EventArrived += HandleNewProcessCreated;
If the new process name is of interest, I create a new "throttler" thread which effectively just suspends the process, sleeps, resumes the process, and sleeps again, on a loop:
while (true)
{
ntresult = Ntdll.NtResumeProcess(processHandle);
if (ntresult != Ntdll.NTSTATUS.STATUS_SUCCESS)
{
if (ntresult != Ntdll.NTSTATUS.STATUS_PROCESS_IS_TERMINATING)
LogSuspendResumeFailure("resume", ntresult);
break;
}
Thread.Sleep(resumeTime);
ntresult = Ntdll.NtSuspendProcess(processHandle);
if (ntresult != Ntdll.NTSTATUS.STATUS_SUCCESS)
{
if (ntresult != Ntdll.NTSTATUS.STATUS_PROCESS_IS_TERMINATING)
LogSuspendResumeFailure("suspend", ntresult);
break;
}
Thread.Sleep(suspendTime);
if (++loop >= loopsBeforeCheckingStopEvent)
{
if (stopEvent.IsSet) break;
loop = 0;
}
}
If the service receives a stop command, it will set the ManualResetEventSlim event. Any threads "throttling" processes will see it within 1 second and break out of the loop/return. The process handler thread will wait on all of those threads to return, and then return as well. At that point the StartHandlerAndWaitForServiceStop() method posted above will return, and the other threads that have been waiting here and there return.
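The stop check described above can also serve as the sleep itself. The following is a minimal sketch, not the service's actual code: the `ThrottleLoopSketch` class is hypothetical, the `resume` and `suspend` delegates stand in for the NtResumeProcess/NtSuspendProcess calls, and the final resume on shutdown is an assumption about the desired behaviour. `ManualResetEventSlim.Wait(TimeSpan)` returns true as soon as the event is set, so the loop reacts to a stop request without a loop counter.

```csharp
using System;
using System.Threading;

public static class ThrottleLoopSketch
{
    // Alternates resume/suspend until stopEvent is set.
    // Returns the number of complete resume+suspend cycles.
    public static int Run(
        ManualResetEventSlim stopEvent,
        TimeSpan resumeTime,
        TimeSpan suspendTime,
        Action resume,   // hypothetical stand-in for NtResumeProcess
        Action suspend)  // hypothetical stand-in for NtSuspendProcess
    {
        int cycles = 0;
        while (true)
        {
            resume();
            // Wait doubles as an interruptible sleep: true means "stop requested".
            if (stopEvent.Wait(resumeTime)) break;

            suspend();
            if (stopEvent.Wait(suspendTime))
            {
                // Assumption: the target should not be left suspended on shutdown.
                resume();
                break;
            }
            cycles++;
        }
        return cycles;
    }
}
```

With this shape a stop request is noticed within one sleep interval at most, rather than after a fixed number of loop iterations.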
The vast majority of the times I've stopped the service, it stops without any problems. This is regardless of whether I've got 0 or 500 throttler threads running, and regardless of whether any have ever been created while the service was running.
However now and again when I try to stop it (through services.msc), it will hang. Yesterday I managed to create a full dump of the process while it was in this state. I created the dump with Process Explorer.
The dump file shows that a number of my threads are suspended:
0:010> ~
0 Id: 1840.c34 Suspend: 0 Teb: 000007ff`fffdd000 Unfrozen
1 Id: 1840.548 Suspend: 0 Teb: 000007ff`fffdb000 Unfrozen
2 Id: 1840.9c0 Suspend: 0 Teb: 000007ff`fffd9000 Unfrozen
3 Id: 1840.1da8 Suspend: 0 Teb: 000007ff`fffd7000 Unfrozen
4 Id: 1840.b08 Suspend: 3 Teb: 000007ff`fffd5000 Unfrozen
5 Id: 1840.1b5c Suspend: 0 Teb: 000007ff`ffef6000 Unfrozen
6 Id: 1840.af0 Suspend: 2 Teb: 000007ff`ffef2000 Unfrozen
7 Id: 1840.c60 Suspend: 0 Teb: 000007ff`ffef0000 Unfrozen
8 Id: 1840.1d94 Suspend: 4 Teb: 000007ff`ffeee000 Unfrozen
9 Id: 1840.1cd8 Suspend: 4 Teb: 000007ff`ffeec000 Unfrozen
. 10 Id: 1840.1c64 Suspend: 0 Teb: 000007ff`ffefa000 Unfrozen
11 Id: 1840.1dc8 Suspend: 0 Teb: 000007ff`fffd3000 Unfrozen
12 Id: 1840.8f4 Suspend: 0 Teb: 000007ff`ffefe000 Unfrozen
This ties up with what I was seeing in Process Explorer - of the two processes I was "throttling", one was permanently suspended, the other was permanently resumed. So those throttler threads were effectively suspended, as they were no longer doing their work. It should be impossible for them to stop without being suspended, as I have error handling wrapped around it and any exception would cause those threads to log info and return. Plus their call stacks showed no errors. They weren't sleeping permanently due to some error, because the sleep times were 22 and 78 milliseconds for each of the two sleeps, and it was working fine before I tried to stop the service.
So I'm trying to understand how those threads could have become suspended. My only suspicion is the GC, because it suspends threads while reclaiming/compacting memory.
I've pasted the content of !eestack and ~*kb here: http://pastebin.com/rfQK0Ak8
I should mention I didn't have symbols, as I'd already rebuilt the application a number of times by the time I created the dump. However as it's .NET I guess it's less of an issue?
From eestack, these are what I believe are "my" threads:
Thread 0: Main service thread, it's still in the ServiceBase.Run method.
Thread 4: That is my logger thread. That thread will spend most of its life waiting on a blocking queue.
Thread 6: My MainServiceThread thread, which is just waiting on the event to be set.
Threads 8 & 9: Both are "throttler" threads, executing the loop I posted above.
Thread 10: This thread appears to be executing the OnStop() method, so is handling the service stop command.
That's it, and threads 4, 6, 8, and 9 are suspended according to the dump file. So all "my" threads are suspended, apart from the main thread and the thread handling the OnStop() method.
Now I don't know much about the GC and debugging .NET stuff, but thread 10 looks dodgy to me. Excerpt from call stack:
Thread 10
Current frame: ntdll!NtWaitForMultipleObjects+0xa
Child-SP RetAddr Caller, Callee
000000001a83d670 000007fefdd41420 KERNELBASE!WaitForMultipleObjectsEx+0xe8, calling ntdll!NtWaitForMultipleObjects
000000001a83d6a0 000007fef4dc3d7c clr!CExecutionEngine::ClrVirtualAlloc+0x3c, calling kernel32!VirtualAllocStub
000000001a83d700 000007fefdd419bc KERNELBASE!WaitForMultipleObjectsEx+0x224, calling ntdll!RtlActivateActivationContextUnsafeFast
000000001a83d710 000007fef4e9d3aa clr!WKS::gc_heap::grow_heap_segment+0xca, calling clr!StressLog::LogOn
000000001a83d730 000007fef4e9cc98 clr!WKS::gc_heap::adjust_limit_clr+0xec, calling clr!memset
000000001a83d740 000007fef4df398d clr!COMNumber::FormatInt32+0x8d, calling clr!LazyMachStateCaptureState
000000001a83d750 000007fef4df398d clr!COMNumber::FormatInt32+0x8d, calling clr!LazyMachStateCaptureState
000000001a83d770 00000000778a16d3 kernel32!WaitForMultipleObjectsExImplementation+0xb3, calling kernel32!WaitForMultipleObjectsEx
000000001a83d7d0 000007fef4e9ce73 clr!WKS::gc_heap::allocate_small+0x158, calling clr!WKS::gc_heap::a_fit_segment_end_p
000000001a83d800 000007fef4f8f8e1 clr!WaitForMultipleObjectsEx_SO_TOLERANT+0x91, calling kernel32!WaitForMultipleObjectsExImplementation
000000001a83d830 000007fef4dfb798 clr!Thread::GetApartment+0x34, calling clr!GetThread
000000001a83d860 000007fef4f8f6ed clr!Thread::GetFinalApartment+0x1a, calling clr!Thread::GetApartment
000000001a83d890 000007fef4f8f6ba clr!Thread::DoAppropriateAptStateWait+0x56, calling clr!WaitForMultipleObjectsEx_SO_TOLERANT
000000001a83d8d0 000007fef4f8f545 clr!Thread::DoAppropriateWaitWorker+0x1b1, calling clr!Thread::DoAppropriateAptStateWait
000000001a83d990 000007fef4ecf167 clr!ObjectNative::Pulse+0x147, calling clr!HelperMethodFrameRestoreState
000000001a83d9d0 000007fef4f8f63b clr!Thread::DoAppropriateWait+0x73, calling clr!Thread::DoAppropriateWaitWorker
000000001a83da50 000007fef4f0ff6a clr!Thread::JoinEx+0xa6, calling clr!Thread::DoAppropriateWait
000000001a83dac0 000007fef4defd90 clr!GCHolderBase<0,0,0,0>::EnterInternal+0x3c, calling clr!Thread::EnablePreemptiveGC
000000001a83daf0 000007fef4f1039a clr!ThreadNative::DoJoin+0xd8, calling clr!Thread::JoinEx
000000001a83db20 000007fef45f86f3 (MethodDesc 000007fef3cbe8d8 +0x1a3 System.Threading.SemaphoreSlim.Release(Int32)), calling 000007fef4dc31b0 (stub for System.Threading.Monitor.Exit(System.Object))
000000001a83db60 000007fef4dfb2a6 clr!FrameWithCookie<HelperMethodFrame_1OBJ>::FrameWithCookie<HelperMethodFrame_1OBJ>+0x36, calling clr!GetThread
000000001a83db90 000007fef4f1024d clr!ThreadNative::Join+0xfd, calling clr!ThreadNative::DoJoin
000000001a83dc40 000007ff001723f5 (MethodDesc 000007ff001612c0 +0x85 MSPO.Logging.MessageQueue.EnqueueMessage(System.String)), calling (MethodDesc 000007fef30fde88 +0 System.Collections.Concurrent.BlockingCollection`1[[System.__Canon, mscorlib]].TryAddWithNoTimeValidation(System.__Canon, Int32, System.Threading.CancellationToken))
000000001a83dcf0 000007ff001720e9 (MethodDesc 000007ff00044bb0 +0xc9 ProcessThrottler.Logging.Logger.Log(LogLevel, System.String)), calling (MethodDesc 000007ff00161178 +0 MSPO.Logging.MessageFormatter.QueueFormattedOutput(System.String, System.String))
000000001a83dd10 000007fef4f101aa clr!ThreadNative::Join+0x5a, calling clr!LazyMachStateCaptureState
000000001a83dd30 000007ff0018000b (MethodDesc 000007ff00163e10 +0x3b ProcessThrottler.Service.MainServiceThread.WaitForThreadToReturn()), calling 000007fef4f10150 (stub for System.Threading.Thread.JoinInternal())
000000001a83dd60 000007ff0017ff44 (MethodDesc 000007ff00049f30 +0xc4 ProcessThrottler.Service.ProcessThrottlerService.OnStop()), calling 000007ff0004d278 (stub for ProcessThrottler.Service.MainServiceThread.WaitForThreadToReturn())
000000001a83dda0 000007fef63fcefb (MethodDesc 000007fef63d65e0 +0xbb System.ServiceProcess.ServiceBase.DeferredStop())
I could post more code showing what each of my functions is doing, but I really don't think this is a deadlock in my code, as the threads would not become suspended in that case. So I'm looking at the above call stack and seeing it's doing some GC stuff after I tell it to log a string to a queue. But none of that GC stuff looks dodgy, at least not compared to what I'm seeing in http://blogs.msdn.com/b/tess/archive/2008/02/11/hang-caused-by-gc-xml-deadlock.aspx. I have a config file to tell it to use gcServer, but I'm almost certain it's not using that setting, because in my earlier testing GCSettings.IsServerGC always returned false.
So... does anyone have any suggestions as to why my threads are suspended?
This is my OpenProcess method BTW which gets the handle to the process to be suspended/resumed, in response to Hans's comment:
private void GetProcessHandle(CurrentProcessDetails process)
{
IntPtr handle = Kernel32.OpenProcess(
process.Settings.RequiredProcessAccessRights,
false,
(uint)process.ID
);
if (handle == IntPtr.Zero)
throw new Win32ExceptionWrapper(
string.Format("Failed to open process {0} {1}",
process.Settings.ProcessNameWithExt, process.IDString));
process.Handle = handle;
}

I've discovered the cause. It has nothing to do with my code. It's a bug in Process Explorer.
My program is written to target .NET 4.0. If I use Process Explorer to view any of my threads' call stacks, Process Explorer suspends the thread and doesn't resume it. What it should do is suspend the thread while it gets the call stack, and then resume it immediately. But it's not resuming the threads - not my managed threads, anyway.
I can replicate it with this very simple code:
using System;
namespace Test
{
class Program
{
static void Main(string[] args)
{
for (int i = 0; i < int.MaxValue; i++)
{
Console.WriteLine(i.ToString());
}
}
}
}
If I compile that to target .NET 4.0 or higher, run it, and use Process Explorer to open the thread running the loop, the thread will become suspended. The resume button will become available, and I can click it to resume the thread. Opening the thread multiple times results in it being suspended multiple times; I confirmed this by using Windbg to view the suspend count of the thread.
If I compile it to target versions below 4.0 (tried 2.0 and 3.5), threads I open in Process Explorer do not remain suspended.
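To check for this condition without attaching a debugger, a process can list which of its own OS threads are currently reported as suspended. This is only a rough sketch under assumptions: it runs on Windows, the `SuspendedThreadCheck` class name is made up for illustration, and it relies on `System.Diagnostics.ProcessThread.WaitReason` reporting `ThreadWaitReason.Suspended`. It shows a snapshot of thread state, not the suspend count that WinDbg displays.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class SuspendedThreadCheck
{
    // Returns the OS thread ids that the snapshot reports as waiting
    // with wait reason "Suspended".
    public static List<int> FindSuspendedThreadIds(Process process)
    {
        var ids = new List<int>();
        foreach (ProcessThread thread in process.Threads)
        {
            try
            {
                // WaitReason is only valid while the thread is in the Wait state.
                if (thread.ThreadState == System.Diagnostics.ThreadState.Wait &&
                    thread.WaitReason == ThreadWaitReason.Suspended)
                {
                    ids.Add(thread.Id);
                }
            }
            catch (InvalidOperationException)
            {
                // The thread exited or left the Wait state; skip it.
            }
        }
        return ids;
    }
}
```

Logging the result of `SuspendedThreadCheck.FindSuspendedThreadIds(Process.GetCurrentProcess())` from a watchdog thread would at least show whether an external tool has left threads suspended.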

Related

Thread.Join() hangs intermittently after last line is passed in thread

I have console .NET application that is intended for execution of test automation.
Application invokes a separate thread from the main thread and in that new thread executes automated script - as follows:
void runScriptSeparateThread(TestScript script)
{
// do some stuff
Thread runScriptThread = new Thread(() => executeScript(script));
runScriptThread.SetApartmentState(ApartmentState.STA);
runScriptThread.Start();
if (runScriptThread.Join(timeout) == false)
{
runScriptThread.Abort();
File.AppendAllText(@"C:\log.txt", "Error: timeout ");
}
else
{
File.AppendAllText(@"C:\log.txt", "Message outer");
}
// do some other stuff
}
void executeScript(TestScript script)
{
// run test script using reflection calls to external assemblies
// includes invocation of new threads which will live after this thread finishes
// can potentially include any calls - according to needs of test automation
File.AppendAllText(@"C:\log.txt", "Message inner");
}
Problem is: Sometimes, after executeScript() reaches its final line in its thread, .Join() in the main thread keeps waiting until the timeout. That is, the text "Message inner" is present in the "C:\log.txt" file, but the text "Message outer" is missing.
NB: The behavior described above reproduces intermittently when new threads with STA apartment state are spawned at the beginning of executeScript(). The new threads monitor UI controls with Ranorex tools, which make Win32 API calls behind the scenes that I am unfamiliar with. References to all of the new threads are passed to the main thread, and those threads are supposed to live on after the executeScript() thread exits.
Method executeScript makes calls with reflection according to automated script - and can potentially do any calls which can be implemented with .NET on a system.
My question is: Is it possible that starting new threads blocks the thread running executeScript(), even after the method reaches its last line? Can it be that the STA apartment state of the thread, together with some Win32 calls that cause message pumping, is the reason .Join() hangs after the thread's function has passed all its lines?
Note: The .Join() hang happens very rarely and was reproduced only on lab machines. I did not manage to reproduce it on my local machine, even after running the automation hundreds of times overnight.
Workaround found: So far I have ended up with the following workaround, which uses ManualResetEventSlim to wait for completion of the thread:
private ManualResetEventSlim executionControl = new ManualResetEventSlim();
private void runScriptSeparateThread(TestScript script)
{
this.executionControl.Reset();
Thread runScriptThread = new Thread(() => executeScript(script));
runScriptThread.SetApartmentState(ApartmentState.STA);
runScriptThread.Start();
if (this.executionControl.Wait(timeout))
{
runScriptThread.Abort();
File.AppendAllText(@"C:\log.txt", "Message outer");
}
else
{
File.AppendAllText(@"C:\log.txt", "Error: timeout ");
}
}
void executeScript(TestScript script)
{
// execute test automation
File.AppendAllText(@"C:\log.txt", "Message inner");
this.executionControl.Set();
}
Posted the same question on MSDN forum.

ManualResetEventSlim: Calling .Set() followed immediately by .Reset() doesn't release *any* waiting threads

(Note: This also happens with ManualResetEvent, not just with ManualResetEventSlim.)
I tried the code below in both release and debug mode.
I'm running it as a 32-bit build using .Net 4 on Windows 7 64-bit running on a quad core processor.
I compiled it from Visual Studio 2012 (so .Net 4.5 is installed).
The output when I run it on my system is:
Waiting for 20 threads to start
Thread 1 started.
Thread 2 started.
Thread 3 started.
Thread 4 started.
Thread 0 started.
Thread 7 started.
Thread 6 started.
Thread 5 started.
Thread 8 started.
Thread 9 started.
Thread 10 started.
Thread 11 started.
Thread 12 started.
Thread 13 started.
Thread 14 started.
Thread 15 started.
Thread 16 started.
Thread 17 started.
Thread 18 started.
Thread 19 started.
Threads all started. Setting signal now.
0/20 threads received the signal.
So setting and then immediately resetting the event did not release a single thread. If you uncomment the Thread.Sleep(), then they are all released.
This seems somewhat unexpected.
Does anyone have an explanation?
using System;
using System.Threading;
using System.Threading.Tasks;
namespace Demo
{
public static class Program
{
private static void Main(string[] args)
{
_startCounter = new CountdownEvent(NUM_THREADS); // Used to count #started threads.
for (int i = 0; i < NUM_THREADS; ++i)
{
int id = i;
Task.Factory.StartNew(() => test(id));
}
Console.WriteLine("Waiting for " + NUM_THREADS + " threads to start");
_startCounter.Wait(); // Wait for all the threads to have called _startCounter.Signal()
Thread.Sleep(100); // Just a little extra delay. Not really needed.
Console.WriteLine("Threads all started. Setting signal now.");
_signal.Set();
// Thread.Sleep(50); // With no sleep at all, NO threads receive the signal.
_signal.Reset();
Thread.Sleep(1000);
Console.WriteLine("\n{0}/{1} threads received the signal.\n\n", _signalledCount, NUM_THREADS);
Console.WriteLine("Press any key to exit.");
Console.ReadKey();
}
private static void test(int id)
{
Console.WriteLine("Thread " + id + " started.");
_startCounter.Signal();
_signal.Wait();
Interlocked.Increment(ref _signalledCount);
Console.WriteLine("Task " + id + " received the signal.");
}
private const int NUM_THREADS = 20;
private static readonly ManualResetEventSlim _signal = new ManualResetEventSlim();
private static CountdownEvent _startCounter;
private static int _signalledCount;
}
}
Note: This question poses a similar problem, but it doesn't seem to have an answer (other than confirming that yes, this can happen).
Issue with ManualResetEvent not releasing all waiting threads consistently
[EDIT]
As Ian Griffiths points out below, the answer is that the underlying Windows API that is used is not designed to support this.
It's unfortunate that the Microsoft documentation for ManualResetEventSlim.Set() states wrongly that it
Sets the state of the event to signaled, which allows one or more
threads waiting on the event to proceed.
Clearly "one or more" should be "zero or more".
Resetting a ManualResetEvent is not like calling Monitor.Pulse - it makes no guarantee that it will release any particular number of threads. On the contrary, the documentation (for the underlying Win32 synchronization primitive) is pretty clear that you can't know what will happen:
Any number of waiting threads, or threads that subsequently begin wait operations for the specified event object, can be released while the object's state is signaled
The key phrase here is "any number" which includes zero.
Win32 does provide a PulseEvent but as it says "This function is unreliable and should not be used." The remarks in its documentation at http://msdn.microsoft.com/en-us/library/windows/desktop/ms684914(v=vs.85).aspx provide some insight into why pulse-style semantics cannot reliably be achieved with an event object. (Basically, the kernel sometimes takes threads that are waiting for an event off its wait list temporarily, so it's always possible that a thread will miss a 'pulse' on an event. That's true whether you use PulseEvent or you try to do it yourself by setting and resetting the event.)
The intended semantics of ManualResetEvent is that it acts as a gate. The gate is open when you set it, and is closed when you reset it. If you open a gate and then quickly close it before anyone had a chance to get through the gate, you shouldn't be surprised if everyone is still on the wrong side of the gate. Only those who were alert enough to get through the gate while you held it open will get through. That's how it's meant to work, so that's why you're seeing what you see.
In particular the semantics of Set are very much not "open gate, and ensure all waiting threads are through the gate". (And if it were to mean that, it's not obvious what the kernel should do with multi-object waits.) So this is not a "problem" in the sense that the event isn't meant to be used the way you're trying to use it, so it's functioning correctly. But it is a problem in the sense that you won't be able to use this to get the effect you're looking for. (It's a useful primitive, it's just not useful for what you're trying to do. I tend to use ManualResetEvent exclusively for gates that are initially closed, and which get opened exactly once, and never get closed again.)
So you probably need to consider some of the other synchronization primitives.
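One such alternative, for the case where every thread that is already waiting must be released by a single "pulse", is a monitor with a generation counter. The sketch below is illustrative only: `PulseGate` and `PulseGateDemo` are hypothetical names, and the optional `onArmed` callback exists just so the demo can tell when a waiter is committed. Because the waiter records the generation and starts waiting under the same lock that `Pulse` takes, a pulse cannot slip past a thread that has already armed itself.

```csharp
using System;
using System.Threading;

public sealed class PulseGate
{
    private readonly object _lock = new object();
    private long _generation;

    // Blocks until the next Pulse(). onArmed (optional) runs while the lock is
    // held; once it has run, a later Pulse() is guaranteed to release the caller.
    public void WaitForNextPulse(Action onArmed = null)
    {
        lock (_lock)
        {
            long seen = _generation;
            if (onArmed != null) onArmed();
            while (_generation == seen)
                Monitor.Wait(_lock); // releases the lock while waiting
        }
    }

    // Releases every thread currently blocked in WaitForNextPulse().
    public void Pulse()
    {
        lock (_lock)
        {
            _generation++;
            Monitor.PulseAll(_lock);
        }
    }
}

public static class PulseGateDemo
{
    // Starts threadCount waiters, pulses once, and returns how many were released.
    public static int Run(int threadCount)
    {
        var gate = new PulseGate();
        var armed = new CountdownEvent(threadCount);
        int released = 0;
        var threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++)
        {
            threads[i] = new Thread(() =>
            {
                gate.WaitForNextPulse(() => armed.Signal());
                Interlocked.Increment(ref released);
            });
            threads[i].Start();
        }
        armed.Wait();   // every waiter has armed itself under the lock
        gate.Pulse();   // can only run once all armed waiters are in Monitor.Wait
        foreach (var t in threads) t.Join();
        return released;
    }
}
```

The design choice here is to move the "have I been signalled?" state into a counter protected by the lock, instead of relying on how long an event happens to stay signalled.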

Thread.Join appears to return false incorrectly

I am using Thread.Join(int millisecondsTimeout) to terminate a number of AppDomains.
Frequently, I get an error message stating that the AppDomain did not terminate within 5 seconds. Whilst stepping through the debugger I see that the AppDomain.Unload() call terminates easily within 5 seconds, but Thread.Join returns false.
Where am I going wrong?
var thread = new Thread(
() =>
{
try
{
AppDomain.Unload(someAppDomain);
}
catch (ArgumentNullException)
{
}
catch (CannotUnloadAppDomainException exception)
{
// Some error message
}
});
thread.Start();
const int numSecondsWait = 5;
if (!thread.Join(1000 * numSecondsWait))
{
// Some error message about it not exiting in 5 seconds
}
Edit 1
Worth adding what each of the AppDomains do. Each AppDomain has at least one Timer. The code roughly looks as follows, (keep in mind I've collapsed loads of classes into one here for readability).
static void Main(string[] args)
{
_exceptionThrown = new EventWaitHandle(false, EventResetMode.AutoReset);
_timer = new Timer(TickAction, null, 0, interval);
try
{
_exceptionThrown.WaitOne();
}
finally
{
_timer.Dispose(_timerWaitHandle);
WaitHandle.WaitAll(_timerWaitHandle);
}
}
In effect I know that the "Main" thread will throw a ThreadAbortException, jump into the finally statement and ensure the Timer queue is fully drained before exiting.
All of the Timers though log when they are inside the tick method. So I can be near certain that there is nothing on the timer queue, and the _timer.Dispose(_timerWaitHandle) returns immediately.
Regardless of whether it does or not, at least one of the three AppDomains I am Unloading will not complete it within 5 seconds.
If you want to be sure that the appdomains always unload within 5 seconds, you can try to measure it.
For example using something like this:
var stopwatch = System.Diagnostics.Stopwatch.StartNew();
AppDomain.Unload(someAppDomain);
long elapsedMillis = stopwatch.ElapsedMilliseconds;
System.Diagnostics.Trace.WriteLine("Unload duration: " + elapsedMillis + " ms");
The Output window of Visual Studio (or the DebugView tool from Sysinternals) should show it.
The reason for this is well documented in the MSDN Library article for Unload():
In the .NET Framework version 2.0 there is a thread dedicated to unloading application domains. This improves reliability, especially when the .NET Framework is hosted. When a thread calls Unload, the target domain is marked for unloading. The dedicated thread attempts to unload the domain, and all threads in the domain are aborted. If a thread does not abort, for example because it is executing unmanaged code, or because it is executing a finally block, then after a period of time a CannotUnloadAppDomainException is thrown in the thread that originally called Unload. If the thread that could not be aborted eventually ends, the target domain is not unloaded. Thus, in the .NET Framework version 2.0 domain is not guaranteed to unload, because it might not be possible to terminate executing threads.
The threads in domain are terminated using the Abort method, which throws a ThreadAbortException in the thread. Although the thread should terminate promptly, it can continue executing for an unpredictable amount of time in a finally clause.
So you'll need to find out why your program has a thread running inside that appdomain and why it refuses to abort. This is common, for example, when the thread is buried inside unmanaged code. Use Debug + Windows + Threads to see them.

Thread.Start is not returning in some sparse cases in my c# application

I have written a TCP server application in C#. The application listens for inbound connections
using the TcpListener.AcceptTcpClient() method in the main listener thread.
When a connection is received, TcpListener.AcceptTcpClient() unblocks and returns a TcpClient object.
A new thread is then created and started to read and write data on the new connection.
The new thread is started by the following code.
while(true)
{
TcpClient client = serverListener.AcceptTcpClient();
if (client.Connected)
{
Thread t = new Thread(delegate() { readWriteData(client); });
t.IsBackground = true;
t.Start(); /// Problem happens here. The thread gets stuck here and doesn't move further
}
}
The application runs fine, but sometimes on Windows 7 machines it suddenly stops listening for TCP connections.
Analysis of the application's thread stacks in this state (Microsoft stack explorer was used to view the stacks of all threads of the application) shows that the main listener thread is stuck on the following line of the code section shown above:
t.Start(); /// Problem happens here. The thread gets stuck here and doesn't move further
I did a lot of research and couldn't find out why this is happening. The behavior is observed only on Windows 7 systems.
Can anybody please help me solve this issue?
As suggested by Rob,
I am posting here stack trace shown by windbg (sos)
0547eae0 7282e006 mscorwks!Thread::StartThread+0xc3, calling mscorwks!_EH_epilog3
0547eb00 727ac825 mscorwks!__SwitchToThread+0xd, calling mscorwks!__DangerousSwitchToThread
0547eb10 728b9c6f mscorwks!ThreadNative::StartInner+0x1ba, calling mscorwks!__SwitchToThread
0547eb58 727e4b04 mscorwks!SafeHandle::DisposeNative+0x3a, calling mscorwks!LazyMachStateCaptureState
0547ebc8 728b9d80 mscorwks!ThreadNative::Start+0xa6, calling mscorwks!ThreadNative::StartInner
0547ec18 728b9d01 mscorwks!ThreadNative::Start+0x1f, calling mscorwks!LazyMachStateCaptureState
0547ec74 71de6afc (MethodDesc 0x71c13048 +0x8c System.Threading.Thread.Start()), calling mscorwks!ThreadNative::Start
0547ec8c 030e2a46 (MethodDesc 0x30da408 +0x25e WindowsService.Server.startListener()), calling (MethodDesc 0x71c13048 +0 System.Threading.Thread.Start())
I still have not found the root cause of the problem described above. However, to prevent my application from failing in this situation, I have implemented the following workaround.
The modified code is below.
count = 0;
while(true)
{
TcpClient client = serverListener.AcceptTcpClient();
if (client.Connected)
{
Thread t = new Thread(delegate() { readWriteData(client); });
t.IsBackground = true;
++count;
t.Start(); /// Problem happens here. The thread gets stuck here and doesn't move further
++count;
}
}
In another thread I check whether the value of count has stayed unchanged for 5 seconds and is an odd number, which means the listener thread is stuck in t.Start(). In that case, I terminate the current listener thread and start a new one.
I think I have figured out the issue.
I was closing an open handle in another thread by mistake: the same handle was being closed twice in one thread, using the native close method via P/Invoke. It may be that after the first close, the same handle value was reassigned somewhere internally in the process. The second close then actually closed that open handle, which led to unexplained instability in the process.
After removing that second close, the issue didn't appear again.
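One way to make this kind of double close harder to write is to wrap the native handle in a SafeHandle, so that the handle is released exactly once no matter how many times Dispose() is called. This is a minimal sketch under assumptions, not the poster's code: it is Windows-only, `SafeNativeHandle`, `NativeMethods` and `SafeHandleDemo` are hypothetical names, and the current process is opened purely to obtain a real handle for the demonstration.

```csharp
using System;
using System.ComponentModel;
using System.Diagnostics;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

// Owns a kernel handle and closes it exactly once.
public sealed class SafeNativeHandle : SafeHandleZeroOrMinusOneIsInvalid
{
    // Parameterless constructor so the P/Invoke marshaller can create instances.
    public SafeNativeHandle() : base(true) { }

    [DllImport("kernel32.dll", SetLastError = true)]
    private static extern bool CloseHandle(IntPtr handle);

    // Called by the runtime at most once, and only for a valid handle.
    protected override bool ReleaseHandle()
    {
        return CloseHandle(handle);
    }
}

public static class NativeMethods
{
    public const uint PROCESS_QUERY_LIMITED_INFORMATION = 0x1000;

    // Returning the SafeHandle type means no raw IntPtr ever escapes.
    [DllImport("kernel32.dll", SetLastError = true)]
    public static extern SafeNativeHandle OpenProcess(
        uint desiredAccess, bool inheritHandle, uint processId);
}

public static class SafeHandleDemo
{
    // Opens a handle to the current process, disposes it twice,
    // and reports whether the handle ended up closed.
    public static bool OpenAndCloseTwice()
    {
        SafeNativeHandle handle = NativeMethods.OpenProcess(
            NativeMethods.PROCESS_QUERY_LIMITED_INFORMATION,
            false,
            (uint)Process.GetCurrentProcess().Id);
        if (handle.IsInvalid)
            throw new Win32Exception(Marshal.GetLastWin32Error());

        handle.Dispose();
        handle.Dispose(); // second call is a no-op; CloseHandle is not called again
        return handle.IsClosed;
    }
}
```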
Not sure if that can solve your issue:
// Start ThreadProc. Note that on a uniprocessor, the new
// thread does not get any processor time until the main thread
// is preempted or yields. Uncomment the Thread.Sleep that
// follows t.Start() to see the difference.
t.Start();
//Thread.Sleep(0);
for (int i = 0; i < 4; i++) {
Console.WriteLine("Main thread: Do some work.");
Thread.Sleep(0);
}
Source: http://msdn.microsoft.com/en-us/library/system.threading.thread.aspx
Another way to work around your issue might be to use TcpListener asynchronously:
http://msdn.microsoft.com/en-us/library/system.net.sockets.tcplistener.beginaccepttcpclient.aspx
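A rough sketch of what that could look like with BeginAcceptTcpClient/EndAcceptTcpClient follows. The `AsyncAcceptSketch` class and the `handleClient` callback are hypothetical, and the error handling is deliberately simplistic: it assumes that any failure in EndAcceptTcpClient means the listener is shutting down, which a real server would want to examine more carefully.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

public static class AsyncAcceptSketch
{
    // Queues one asynchronous accept; the callback re-arms the next accept
    // before handing the connection to handleClient.
    public static void StartAccepting(TcpListener listener, Action<TcpClient> handleClient)
    {
        listener.BeginAcceptTcpClient(asyncResult =>
        {
            TcpClient client;
            try
            {
                client = listener.EndAcceptTcpClient(asyncResult);
            }
            catch (ObjectDisposedException) { return; } // listener was stopped
            catch (SocketException) { return; }         // accept failed; this sketch just stops

            try
            {
                StartAccepting(listener, handleClient); // keep listening
            }
            catch (InvalidOperationException)
            {
                // Listener was stopped in the meantime
                // (ObjectDisposedException derives from InvalidOperationException).
            }

            handleClient(client);
        }, null);
    }
}
```

Usage would be along the lines of `listener.Start(); AsyncAcceptSketch.StartAccepting(listener, client => readWriteData(client));`, with the listener thread no longer creating a thread per connection itself. Here `handleClient` runs on a thread-pool callback thread, so long-running per-connection work may still deserve its own thread or task.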

How to manage all running threads in a C# console application?

I am having a problem managing parallel threads in a console application.
I am running 10 threads in parallel, and each thread does a specific task.
When a task completes, its thread ends and I immediately start a new thread instance. I want 10 threads running in parallel at all times, so whenever one thread ends, a new one should be created.
How can I keep 10 threads running in a C# console application?
At the end of each thread put a lock on some shared object (lock (obj) {}).
Then remove the current thread from a collection of threads you have.
If the collection.Count is less than 10 create a new one and put inside the collection.
Release the lock.
private List<Thread> threads = new List<Thread>();
private void ThreadFunction() {
// do something
// here before the lock
lock (threads) {
threads.Remove(Thread.CurrentThread);
if (threads.Count < 10) {
Thread t = new Thread(ThreadFunction);
threads.Add(t);
t.Start();
}
}
}
Be sure to catch all exceptions inside the thread, or your code will fail when a thread exception happens. That is, make sure that the lock part of the code is always called (except on a ThreadAbortException, but that will not matter).
But as stated I think you should use a ThreadPool for such a task...
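As a hedged illustration of the thread-pool route, the sketch below uses Parallel.ForEach with ParallelOptions.MaxDegreeOfParallelism to cap concurrency at 10. The `BoundedParallelSketch` class and the `work` delegate are hypothetical, and this approach guarantees at most 10 work items running at once rather than exactly 10 dedicated threads.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class BoundedParallelSketch
{
    // Runs work for every item with at most maxParallel items in flight.
    // Returns the peak number of items observed running at the same time.
    public static int Run(IEnumerable<int> items, int maxParallel, Action<int> work)
    {
        object sync = new object();
        int running = 0;
        int peak = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        Parallel.ForEach(items, options, item =>
        {
            lock (sync)
            {
                running++;
                if (running > peak) peak = running;
            }
            try
            {
                work(item); // the actual task for this item
            }
            finally
            {
                lock (sync) { running--; }
            }
        });
        return peak;
    }
}
```

As soon as one item finishes, the scheduler picks up the next one, which gives the "replace a finished thread immediately" behaviour without managing a thread list by hand.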
The book on threads in .Net is: http://www.albahari.com/threading/
This alone will probably answer any questions you have.
Depending on what you are using these threads for (I am guessing that you may be talking about running transactions in the background) you may want to use BackgroundWorker.
http://msdn.microsoft.com/en-us/library/system.componentmodel.backgroundworker.aspx
BackgroundWorker lets you deal with Begin/End/Progress Events only, making debugging much less error prone.
