I have a long running task that needs to only have one instance running at a time. I chose Azure Durable Entities based on the documentation that these are designed for this sort of situation, but it seems that past a certain threshold the entity will re-run the task after execution has completed. It does the job of single threading the task beautifully, but never seems to recognize that it has completed.
Here is an example with the Sleep call representing the long running task execution:
[FunctionName("LongRunningTask")]
public static void LongRunningTask([EntityTrigger] IDurableEntityContext context, ILogger log)
{
var sleepTime = context.GetInput<TimeSpan>();
var state = context.GetState<LongRunningTaskState>() ?? new LongRunningTaskState();
log.LogInformation($"Waiting for {sleepTime}... State when started: {JsonConvert.SerializeObject(state)}");
System.Threading.Thread.Sleep(sleepTime);
state.RunCount++;
context.SetState(state);
var updatedState = context.GetState<LongRunningTaskState>() ?? new LongRunningTaskState();
log.LogInformation($"Finished waiting {sleepTime}. New state is {JsonConvert.SerializeObject(updatedState)}");
context.Return(state);
}
[FunctionName("StartLongRunningTask")]
public static async Task<IActionResult> StartLongRunningTask(
[HttpTrigger(AuthorizationLevel.Anonymous, "get", "post", Route = "StartLongRunningTask/{seconds:int}")] HttpRequestMessage req,
int seconds,
[DurableClient] IDurableEntityClient starter,
ILogger log)
{
var delay = TimeSpan.FromSeconds(seconds);
var entityId = new EntityId(nameof(LongRunningTask), "Singleton");
await starter.SignalEntityAsync(entityId, "Download", operationInput: delay);
return new OkObjectResult($"Scheduled task for {delay}");
}
If I tell it to wait 30 seconds, it behaves as expected:
[2023-02-03T23:02:22.452Z] Executing 'StartLongRunningTask' (Reason='This function was programmatically called via the host APIs.', Id=542d8dad-58a2-47a1-94ac-5f5489a56d67)
[2023-02-03T23:02:22.514Z] #longrunningtask#Singleton: Function 'longrunningtask (Entity)' scheduled. Reason: EntitySignal:Download. IsReplay: False. State: Scheduled. HubName: TestHubName. AppName: . SlotName: . ExtensionVersion: 2.9.0. SequenceNumber: 2.
[2023-02-03T23:02:22.529Z] Executed 'StartLongRunningTask' (Succeeded, Id=542d8dad-58a2-47a1-94ac-5f5489a56d67, Duration=98ms)
[2023-02-03T23:02:22.660Z] Executing 'LongRunningTask' (Reason='(null)', Id=11e224a1-acc1-404e-b29f-7dd062e16e40)
[2023-02-03T23:02:22.666Z] #longrunningtask#Singleton: Function 'longrunningtask (Entity)' started. IsReplay: False. Input: (216 bytes). State: Started. HubName: TestHubName. AppName: . SlotName: . ExtensionVersion: 2.9.0. SequenceNumber: 3. TaskEventId: -1
[2023-02-03T23:02:22.676Z] Waiting for 00:00:30... State when started: {"runCount":1}
[2023-02-03T23:02:52.773Z] Finished waiting 00:00:30. New state is {"runCount":2}
[2023-02-03T23:02:52.777Z] #longrunningtask#Singleton: Function 'longrunningtask (Entity)' completed 'Download' operation 7470263c-de86-4243-84e5-eec14d489af0 in 30103.6145ms. IsReplay: False. Input: (216 bytes). Output: (56 bytes). HubName: TestHubName. AppName: . SlotName: . ExtensionVersion: 2.9.0. SequenceNumber: 4.
[2023-02-03T23:02:52.809Z] #longrunningtask#Singleton: Function 'longrunningtask (Entity)' completed. ContinuedAsNew: True. IsReplay: False. Output: (56 bytes). State: Completed. HubName: TestHubName. AppName: . SlotName: . ExtensionVersion: 2.9.0. SequenceNumber: 5. TaskEventId: -1
[2023-02-03T23:02:52.815Z] Executed 'LongRunningTask' (Succeeded, Id=11e224a1-acc1-404e-b29f-7dd062e16e40, Duration=30157ms)
But if I tell it to wait 10 minutes, it enters an endless loop where it doesn't recognize the successful completion of the previous iteration. Note the state at the beginning of each attempt stays at "runCount": 1:
[2023-02-03T22:29:41.355Z] Executing 'StartLongRunningTask' (Reason='This function was programmatically called via the host APIs.', Id=bd751b6f-71f8-4c1d-97ee-24b3027a6d85)
[2023-02-03T22:29:41.427Z] #longrunningtask#Singleton: Function 'longrunningtask (Entity)' scheduled. Reason: EntitySignal:Download. IsReplay: False. State: Scheduled. HubName: TestHubName. AppName: . SlotName: . ExtensionVersion: 2.9.0. SequenceNumber: 2.
[2023-02-03T22:29:41.453Z] Executed 'StartLongRunningTask' (Succeeded, Id=bd751b6f-71f8-4c1d-97ee-24b3027a6d85, Duration=124ms)
[2023-02-03T22:29:41.474Z] Executing 'LongRunningTask' (Reason='(null)', Id=bdbb65aa-ac65-4252-bab8-028974a81148)
[2023-02-03T22:29:41.481Z] #longrunningtask#Singleton: Function 'longrunningtask (Entity)' started. IsReplay: False. Input: (216 bytes). State: Started. HubName: TestHubName. AppName: . SlotName: . ExtensionVersion: 2.9.0. SequenceNumber: 3. TaskEventId: -1
[2023-02-03T22:29:41.500Z] Waiting for 00:10:00... State when started: {"runCount":1}
[2023-02-03T22:39:41.586Z] Finished waiting 00:10:00. New state is {"runCount":2}
[2023-02-03T22:39:41.592Z] #longrunningtask#Singleton: Function 'longrunningtask (Entity)' completed 'Download' operation 8a07b5fb-fa93-4ff5-9278-2b33eb125ea1 in 600097.4966ms. IsReplay: False. Input: (216 bytes). Output: (56 bytes). HubName: TestHubName. AppName: . SlotName: . ExtensionVersion: 2.9.0. SequenceNumber: 4.
[2023-02-03T22:39:41.616Z] #longrunningtask#Singleton: Function 'longrunningtask (Entity)' completed. ContinuedAsNew: True. IsReplay: False. Output: (56 bytes). State: Completed. HubName: TestHubName. AppName: . SlotName: . ExtensionVersion: 2.9.0. SequenceNumber: 5. TaskEventId: -1
[2023-02-03T22:39:41.619Z] Executed 'LongRunningTask' (Succeeded, Id=bdbb65aa-ac65-4252-bab8-028974a81148, Duration=600148ms)
[2023-02-03T22:39:41.653Z] Executing 'LongRunningTask' (Reason='(null)', Id=18e7056a-22e2-496f-8486-a7704791713e)
[2023-02-03T22:39:41.654Z] #longrunningtask#Singleton: Function 'longrunningtask (Entity)' started. IsReplay: False. Input: (216 bytes). State: Started. HubName: TestHubName. AppName: . SlotName: . ExtensionVersion: 2.9.0. SequenceNumber: 6. TaskEventId: -1
[2023-02-03T22:39:41.656Z] Waiting for 00:10:00... State when started: {"runCount":1}
[2023-02-03T22:49:42.466Z] Finished waiting 00:10:00. New state is {"runCount":2}
[2023-02-03T22:49:42.470Z] #longrunningtask#Singleton: Function 'longrunningtask (Entity)' completed 'Download' operation 4e08a7d2-a0bf-4d68-bc2a-eff989ef3007 in 600814.92ms. IsReplay: False. Input: (216 bytes). Output: (56 bytes). HubName: TestHubName. AppName: . SlotName: . ExtensionVersion: 2.9.0. SequenceNumber: 7.
[2023-02-03T22:49:42.473Z] #longrunningtask#Singleton: Function 'longrunningtask (Entity)' completed. ContinuedAsNew: True. IsReplay: False. Output: (56 bytes). State: Completed. HubName: TestHubName. AppName: . SlotName: . ExtensionVersion: 2.9.0. SequenceNumber: 8. TaskEventId: -1
[2023-02-03T22:49:42.475Z] Executed 'LongRunningTask' (Succeeded, Id=18e7056a-22e2-496f-8486-a7704791713e, Duration=600822ms)
[2023-02-03T22:49:42.502Z] Executing 'LongRunningTask' (Reason='(null)', Id=d57f3470-af6f-464a-9b14-8091e70946c2)
[2023-02-03T22:49:42.503Z] #longrunningtask#Singleton: Function 'longrunningtask (Entity)' started. IsReplay: False. Input: (216 bytes). State: Started. HubName: TestHubName. AppName: . SlotName: . ExtensionVersion: 2.9.0. SequenceNumber: 9. TaskEventId: -1
[2023-02-03T22:49:42.505Z] Waiting for 00:10:00... State when started: {"runCount":1}
[2023-02-03T22:59:42.567Z] Finished waiting 00:10:00. New state is {"runCount":2}
[2023-02-03T22:59:42.570Z] #longrunningtask#Singleton: Function 'longrunningtask (Entity)' completed 'Download' operation 8a07b5fb-fa93-4ff5-9278-2b33eb125ea1 in 600065.2645ms. IsReplay: False. Input: (216 bytes). Output: (56 bytes). HubName: TestHubName. AppName: . SlotName: . ExtensionVersion: 2.9.0. SequenceNumber: 10.
[2023-02-03T22:59:42.571Z] #longrunningtask#Singleton: Function 'longrunningtask (Entity)' completed. ContinuedAsNew: True. IsReplay: False. Output: (56 bytes). State: Completed. HubName: TestHubName. AppName: . SlotName: . ExtensionVersion: 2.9.0. SequenceNumber: 11. TaskEventId: -1
[2023-02-03T22:59:42.572Z] Executed 'LongRunningTask' (Succeeded, Id=d57f3470-af6f-464a-9b14-8091e70946c2, Duration=600070ms)
[2023-02-03T22:59:42.594Z] Executing 'LongRunningTask' (Reason='(null)', Id=94e639a1-c460-494e-b48c-3215253acf6f)
[2023-02-03T22:59:42.595Z] #longrunningtask#Singleton: Function 'longrunningtask (Entity)' started. IsReplay: False. Input: (216 bytes). State: Started. HubName: TestHubName. AppName: . SlotName: . ExtensionVersion: 2.9.0. SequenceNumber: 12. TaskEventId: -1
[2023-02-03T22:59:42.596Z] Waiting for 00:10:00... State when started: {"runCount":1}
I have the function timeout set to 4 hours in my host.json file and a premium app service plan that allows for longer running functions.
The code sample below
using System;
using System.Threading;
namespace TimerApp
{
class Program
{
static void Main(string[] args)
{
Console.WriteLine("***** Timer Application *****\n");
Console.WriteLine("In the thread #{0}", Thread.CurrentThread.ManagedThreadId);
// Create the delegate for the Timer type.
TimerCallback timerCB = new TimerCallback(ShowTime);
// Establish timer settings.
Timer t = new Timer(
timerCB, // The TimerCallback delegate object.
"Hello from Main()", // Any info to pass into the called method (null for no info).
0, // Amount of time to wait before starting (in milliseconds).
1000); // Interval of time between calls (in milliseconds).
Console.WriteLine("Hit key to terminate...");
Console.ReadLine();
}
// Method to show current time...
public static void ShowTime(object state)
{
Console.WriteLine("From the thread #{0}, it is background?{1}: time is {2}, param is {3}",
Thread.CurrentThread.ManagedThreadId,
Thread.CurrentThread.IsBackground,
DateTime.Now.ToLongTimeString(),
state.ToString());
}
}
}
produces the following output
***** Timer Application *****
In the thread #1
Hit key to terminate...
From the thread #4, it is background?True: time is 10:37:54 PM, param is Hello from Main()
From the thread #4, it is background?True: time is 10:37:55 PM, param is Hello from Main()
From the thread #5, it is background?True: time is 10:37:56 PM, param is Hello from Main()
From the thread #4, it is background?True: time is 10:37:57 PM, param is Hello from Main()
From the thread #5, it is background?True: time is 10:37:58 PM, param is Hello from Main()
From the thread #4, it is background?True: time is 10:37:59 PM, param is Hello from Main()
From the thread #5, it is background?True: time is 10:38:00 PM, param is Hello from Main()
...
Press any key to continue . . .
Does the System.Threading.Timer make callbacks using several threads at a time?
It makes use of the thread pool, using the first thread that it finds available at each time interval. The timer simply triggers the firing of these threads.
void Main()
{
System.Threading.Timer timer = new Timer((x) =>
{
Console.WriteLine($"{DateTime.Now.TimeOfDay} - Is Thread Pool Thread: {Thread.CurrentThread.IsThreadPoolThread} - Managed Thread Id: {Thread.CurrentThread.ManagedThreadId}");
Thread.Sleep(5000);
}, null, 1000, 1000);
Console.ReadLine();
}
Output
07:19:44.2628607 - Is Thread Pool Thread: True - Managed Thread Id: 10
07:19:45.2639080 - Is Thread Pool Thread: True - Managed Thread Id: 13
07:19:46.2644998 - Is Thread Pool Thread: True - Managed Thread Id: 9
07:19:47.2649563 - Is Thread Pool Thread: True - Managed Thread Id: 8
07:19:48.2660500 - Is Thread Pool Thread: True - Managed Thread Id: 12
07:19:49.2664012 - Is Thread Pool Thread: True - Managed Thread Id: 14
07:19:50.2669635 - Is Thread Pool Thread: True - Managed Thread Id: 15
07:19:51.2679269 - Is Thread Pool Thread: True - Managed Thread Id: 10
07:19:52.2684307 - Is Thread Pool Thread: True - Managed Thread Id: 9
07:19:53.2693090 - Is Thread Pool Thread: True - Managed Thread Id: 13
07:19:54.2839838 - Is Thread Pool Thread: True - Managed Thread Id: 8
07:19:55.2844800 - Is Thread Pool Thread: True - Managed Thread Id: 12
07:19:56.2854568 - Is Thread Pool Thread: True - Managed Thread Id: 15
In the code above, each callback sleeps for 5 seconds, so after printing to the console the thread stays busy for a further 5 seconds before it finishes and returns to the thread pool.
The timer carries on firing every second regardless; it does not wait for the callback it triggered to complete, which is why several callbacks run at once on different threads.
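If overlapping callbacks are not wanted, one common workaround is to make the timer one-shot and re-arm it at the end of each callback. The following is only a minimal sketch of that idea, not something taken from the code above: the class name `NonOverlappingTimerDemo`, the method `Run`, and its parameters are made up for illustration. It returns the highest number of callbacks that were ever running at the same time.

```csharp
using System;
using System.Threading;

// Hypothetical helper for illustration only.
public static class NonOverlappingTimerDemo
{
    // Runs `callbacks` timer callbacks, each doing `workMs` of simulated work,
    // and returns the peak number of callbacks observed running concurrently.
    public static int Run(int callbacks, int workMs, int intervalMs)
    {
        var sync = new object();
        var done = new ManualResetEventSlim(false);
        int running = 0, maxRunning = 0, completed = 0;
        Timer timer = null;

        // Created disarmed (due time and period both Timeout.Infinite),
        // so the callback cannot run before `timer` is assigned.
        timer = new Timer(_ =>
        {
            lock (sync) { running++; maxRunning = Math.Max(maxRunning, running); }

            Thread.Sleep(workMs); // stand-in for work that outlasts the interval

            bool finished;
            lock (sync) { running--; finished = ++completed >= callbacks; }

            if (finished)
                done.Set();
            else
                timer.Change(intervalMs, Timeout.Infinite); // re-arm only after the work is done
        }, null, Timeout.Infinite, Timeout.Infinite);

        timer.Change(0, Timeout.Infinite); // fire the first one-shot tick
        done.Wait();
        timer.Dispose();
        return maxRunning;
    }
}
```

Because the next tick is scheduled only once the previous callback has finished, the interval becomes "time between the end of one callback and the start of the next" rather than a fixed period.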
This is my first time attempting to use unit tests, so I'm going to go through my steps:
I right-clicked the solution containing the project I want to test in Visual Studio and clicked 'New Project', then selected Test > Test Project.
This gave me another project under my solution with a Test.cs file. I added the following to it:
namespace TestProject1
{
[TestClass]
public class MainTest
{
//Project1.MainWindow mw = new Project1.MainWindow(); //not used in test yet
[TestMethod]
public void MakeDoubleDate_Test()
{
string validMpacString = "1998,265/302010"; //When converted, should be 36060.430902777778
string nonValidString1 = "nope,700/807060";
string nonValidString2 = "thisDoesn't work";
double validDouble = Project1.ConversionsUnit.MakeDoubleDate(validMpacString);
double nonValidDouble1 = Project1.ConversionsUnit.MakeDoubleDate(nonValidString1);
double nonValidDouble2 = Project1.ConversionsUnit.MakeDoubleDate(nonValidString2);
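// MSTest expects (expected, actual, delta); doubles need a tolerance rather than exact equality.
Assert.AreEqual(36060.430902777778, validDouble, 1e-9);
double oneSecond = 1.0 / 86400; // OLE Automation dates are measured in days
Assert.AreEqual(DateTime.Now.ToOADate(), nonValidDouble1, oneSecond);
Assert.AreEqual(DateTime.Now.ToOADate(), nonValidDouble2, oneSecond);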
}
}
}
My original project is called Project1. In my test project, I added a reference to Project1.
Now my test shows up in the Test View, but when I try to run it, it stays stuck on "Pending" forever. I tried another person's project with tests and it did the same thing. I'm not sure what I need to do, and I haven't had any luck searching Google.
Edit: Here's some Debug output when I try running it:
The thread 'ExecutionUtilities.InvokeWithTimeout helper thread 'Microsoft.VisualStudio.TestTools.TestTypes.Unit.UnitTestAdapter.AbortTestRun'' (0x4748) has exited with code 0 (0x0).
The thread 'Agent: adapter run thread for test 'MakeDoubleDate_Test' with id '1bc08c40-ee7f-46e5-8689-8237cd3ffe4b'' (0x2820) has exited with code 0 (0x0).
The thread 'Agent: state execution thread for test 'MakeDoubleDate_Test' with id '1bc08c40-ee7f-46e5-8689-8237cd3ffe4b'' (0x1848) has exited with code 0 (0x0).
The thread 'Agent: test queue thread' (0x3ecc) has exited with code 0 (0x0).
W, 18160, 8, 2016/03/29, 12:52:54.995, USERNAME\QTAgent32.exe, AgentObject.AgentStateWaiting: Proceeding to clean up data collectors since connection to controller is lost
The thread 'Agent: heartbeat thread' (0x4560) has exited with code 0 (0x0).
The thread '<No Name>' (0x2284) has exited with code 0 (0x0).
The thread '<No Name>' (0x4484) has exited with code 0 (0x0).
The thread '<No Name>' (0x43f4) has exited with code 0 (0x0).
The thread '<No Name>' (0x3a50) has exited with code 0 (0x0).
The thread '<No Name>' (0x4424) has exited with code 0 (0x0).
Which continues until I exit out of visual studio.
Edit: Saw something about visual studio 2010 having problems w/o service pack 1. Turns out I don't have it. Updating now, hopefully it works.
Updating my visual studio 2010 to service pack 1 solved the issue and my tests run correctly now.
Got the idea to do so from this link: VS2010 Unit test “Pending” and the test cannot be completed
I wrote a Windows service in C# targeting .NET 4.0 which will on the odd occasion hang completely when I attempt to stop the service. I've noticed from looking at a dump file that a number of my threads are suspended, though I don't suspend them myself in my code.
The environment is Windows Server 2008R2 64bit, though I've observed the same hang on Windows 7 64bit. .NET 4.0 is the latest version installed.
There's a lot of code so I'm just posting some hopefully relevant snippets, I can post more if required.
Basic design:
Main() starts a new thread to handle logging to a file (the code for which is in a separate dll), then starts the service.
public static void Main(string[] args)
{
...
else if (Args.RunService)
{
Logger.Options.LogToFile = true;
MSPO.Logging.Logger.Start();
RunService();
MSPO.Logging.Logger.Stop();
}
...
}
private static void RunService()
{
service = new ProcessThrottlerService();
System.ServiceProcess.ServiceBase.Run(service);
}
That thread stays there, until ServiceBase.Run returns.
OnStart() in the service creates a new thread and starts it.
protected override void OnStart(string[] args)
{
serviceThread = new MainServiceThread();
serviceThread.StartThread();
base.OnStart(args);
}
I create a ManualResetEventSlim which is used as the stop signal for the rest of the program. OnStop() sets the event.
protected override void OnStop()
{
if (serviceThread != null)
{
serviceThread.StopThread(); // Event is signalled in there
serviceThread.WaitForThreadToReturn(); // This calls thread.Join() on the MainServiceThread thread
}
base.OnStop();
}
The "MainServiceThread" creates the event, kicks off a new thread again, and just waits on the event.
private void StartHandlerAndWaitForServiceStop()
{
processHandler.Start(serviceStopEvent);
serviceStopEvent.Wait();
processHandler.Stop();
}
The processHandler thread subscribes to this WMI query:
watcher = new ManagementEventWatcher(new ManagementScope("root\\CIMV2"),
new WqlEventQuery("SELECT * FROM Win32_ProcessStartTrace"));
watcher.EventArrived += HandleNewProcessCreated;
If the new process name is of interest, I create a new "throttler" thread which effectively just suspends the process, sleeps, resumes the process, and sleeps again, on a loop:
while (true)
{
ntresult = Ntdll.NtResumeProcess(processHandle);
if (ntresult != Ntdll.NTSTATUS.STATUS_SUCCESS)
{
if (ntresult != Ntdll.NTSTATUS.STATUS_PROCESS_IS_TERMINATING)
LogSuspendResumeFailure("resume", ntresult);
break;
}
Thread.Sleep(resumeTime);
ntresult = Ntdll.NtSuspendProcess(processHandle);
if (ntresult != Ntdll.NTSTATUS.STATUS_SUCCESS)
{
if (ntresult != Ntdll.NTSTATUS.STATUS_PROCESS_IS_TERMINATING)
LogSuspendResumeFailure("suspend", ntresult);
break;
}
Thread.Sleep(suspendTime);
if (++loop >= loopsBeforeCheckingStopEvent)
{
if (stopEvent.IsSet) break;
loop = 0;
}
}
If the service receives a stop command, it will set the ManualResetEventSlim event. Any threads "throttling" processes will see it within 1 second and break out of the loop/return. The process handler thread will wait on all of those threads to return, and then return as well. At that point the StartHandlerAndWaitForServiceStop() method posted above will return, and the other threads that have been waiting here and there return.
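The stop-signal pattern described here can be sketched in isolation. This is a minimal illustration and not the service's actual code: `StoppableWorkerDemo`, `StartWorker`, and `onTick` are hypothetical names, and the sketch swaps in a different technique from the loop above by waiting on the event itself with a timeout instead of calling Thread.Sleep and polling IsSet, so a stop request is seen as soon as it is signalled.

```csharp
using System;
using System.Threading;

// Hypothetical helper for illustration only.
public static class StoppableWorkerDemo
{
    // Starts a thread that calls onTick roughly every intervalMs
    // until stopEvent is set, then returns.
    public static Thread StartWorker(ManualResetEventSlim stopEvent, int intervalMs, Action onTick)
    {
        var thread = new Thread(() =>
        {
            // Wait(int) doubles as the sleep: it returns false on timeout
            // and true as soon as the event has been set.
            while (!stopEvent.Wait(intervalMs))
            {
                onTick();
            }
        });
        thread.IsBackground = true; // do not keep the process alive on exit
        thread.Start();
        return thread;
    }
}
```

A caller would set the event and then Join the returned thread, ideally with a timeout so that a stuck worker cannot block the stop request indefinitely.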
The vast majority of the times I've stopped the service, it stops without any problems. This is regardless of whether I've got 0 or 500 throttler threads running, and regardless of whether any have ever been created while the service was running.
However now and again when I try to stop it (through services.msc), it will hang. Yesterday I managed to create a full dump of the process while it was in this state. I created the dump with Process Explorer.
The dump file shows that a number of my threads are suspended:
0:010> ~
0 Id: 1840.c34 Suspend: 0 Teb: 000007ff`fffdd000 Unfrozen
1 Id: 1840.548 Suspend: 0 Teb: 000007ff`fffdb000 Unfrozen
2 Id: 1840.9c0 Suspend: 0 Teb: 000007ff`fffd9000 Unfrozen
3 Id: 1840.1da8 Suspend: 0 Teb: 000007ff`fffd7000 Unfrozen
4 Id: 1840.b08 Suspend: 3 Teb: 000007ff`fffd5000 Unfrozen
5 Id: 1840.1b5c Suspend: 0 Teb: 000007ff`ffef6000 Unfrozen
6 Id: 1840.af0 Suspend: 2 Teb: 000007ff`ffef2000 Unfrozen
7 Id: 1840.c60 Suspend: 0 Teb: 000007ff`ffef0000 Unfrozen
8 Id: 1840.1d94 Suspend: 4 Teb: 000007ff`ffeee000 Unfrozen
9 Id: 1840.1cd8 Suspend: 4 Teb: 000007ff`ffeec000 Unfrozen
. 10 Id: 1840.1c64 Suspend: 0 Teb: 000007ff`ffefa000 Unfrozen
11 Id: 1840.1dc8 Suspend: 0 Teb: 000007ff`fffd3000 Unfrozen
12 Id: 1840.8f4 Suspend: 0 Teb: 000007ff`ffefe000 Unfrozen
This ties up with what I was seeing in Process Explorer - of the two processes I was "throttling", one was permanently suspended, the other was permanently resumed. So those throttler threads were effectively suspended, as they were no longer doing their work. It should be impossible for them to stop without being suspended, as I have error handling wrapped around it and any exception would cause those threads to log info and return. Plus their call stacks showed no errors. They weren't sleeping permanently due to some error, because the sleep times were 22 and 78 milliseconds for each of the two sleeps, and it was working fine before I tried to stop the service.
So I'm trying to understand how those threads could have become suspended. My only suspicion is the GC, because it suspends threads while reclaiming/compacting memory.
I've pasted the content of !eestack and ~*kb here: http://pastebin.com/rfQK0Ak8
I should mention I didn't have symbols, as I'd already rebuilt the application a number of times by the time I created the dump. However as it's .NET I guess it's less of an issue?
From eestack, these are what I believe are "my" threads:
Thread 0: Main service thread, it's still in the ServiceBase.Run method.
Thread 4: That is my logger thread. That thread will spend most of its life waiting on a blocking queue.
Thread 6: My MainServiceThread thread, which is just waiting on the event to be set.
Threads 8 & 9: Both are "throttler" threads, executing the loop I posted above.
Thread 10: This thread appears to be executing the OnStop() method, so is handling the service stop command.
That's it, and threads 4, 6, 8, and 9 are suspended according to the dump file. So all "my" threads are suspended, apart from the main thread and the thread handling the OnStop() method.
Now I don't know much about the GC and debugging .NET stuff, but thread 10 looks dodgy to me. Excerpt from call stack:
Thread 10
Current frame: ntdll!NtWaitForMultipleObjects+0xa
Child-SP RetAddr Caller, Callee
000000001a83d670 000007fefdd41420 KERNELBASE!WaitForMultipleObjectsEx+0xe8, calling ntdll!NtWaitForMultipleObjects
000000001a83d6a0 000007fef4dc3d7c clr!CExecutionEngine::ClrVirtualAlloc+0x3c, calling kernel32!VirtualAllocStub
000000001a83d700 000007fefdd419bc KERNELBASE!WaitForMultipleObjectsEx+0x224, calling ntdll!RtlActivateActivationContextUnsafeFast
000000001a83d710 000007fef4e9d3aa clr!WKS::gc_heap::grow_heap_segment+0xca, calling clr!StressLog::LogOn
000000001a83d730 000007fef4e9cc98 clr!WKS::gc_heap::adjust_limit_clr+0xec, calling clr!memset
000000001a83d740 000007fef4df398d clr!COMNumber::FormatInt32+0x8d, calling clr!LazyMachStateCaptureState
000000001a83d750 000007fef4df398d clr!COMNumber::FormatInt32+0x8d, calling clr!LazyMachStateCaptureState
000000001a83d770 00000000778a16d3 kernel32!WaitForMultipleObjectsExImplementation+0xb3, calling kernel32!WaitForMultipleObjectsEx
000000001a83d7d0 000007fef4e9ce73 clr!WKS::gc_heap::allocate_small+0x158, calling clr!WKS::gc_heap::a_fit_segment_end_p
000000001a83d800 000007fef4f8f8e1 clr!WaitForMultipleObjectsEx_SO_TOLERANT+0x91, calling kernel32!WaitForMultipleObjectsExImplementation
000000001a83d830 000007fef4dfb798 clr!Thread::GetApartment+0x34, calling clr!GetThread
000000001a83d860 000007fef4f8f6ed clr!Thread::GetFinalApartment+0x1a, calling clr!Thread::GetApartment
000000001a83d890 000007fef4f8f6ba clr!Thread::DoAppropriateAptStateWait+0x56, calling clr!WaitForMultipleObjectsEx_SO_TOLERANT
000000001a83d8d0 000007fef4f8f545 clr!Thread::DoAppropriateWaitWorker+0x1b1, calling clr!Thread::DoAppropriateAptStateWait
000000001a83d990 000007fef4ecf167 clr!ObjectNative::Pulse+0x147, calling clr!HelperMethodFrameRestoreState
000000001a83d9d0 000007fef4f8f63b clr!Thread::DoAppropriateWait+0x73, calling clr!Thread::DoAppropriateWaitWorker
000000001a83da50 000007fef4f0ff6a clr!Thread::JoinEx+0xa6, calling clr!Thread::DoAppropriateWait
000000001a83dac0 000007fef4defd90 clr!GCHolderBase<0,0,0,0>::EnterInternal+0x3c, calling clr!Thread::EnablePreemptiveGC
000000001a83daf0 000007fef4f1039a clr!ThreadNative::DoJoin+0xd8, calling clr!Thread::JoinEx
000000001a83db20 000007fef45f86f3 (MethodDesc 000007fef3cbe8d8 +0x1a3 System.Threading.SemaphoreSlim.Release(Int32)), calling 000007fef4dc31b0 (stub for System.Threading.Monitor.Exit(System.Object))
000000001a83db60 000007fef4dfb2a6 clr!FrameWithCookie<HelperMethodFrame_1OBJ>::FrameWithCookie<HelperMethodFrame_1OBJ>+0x36, calling clr!GetThread
000000001a83db90 000007fef4f1024d clr!ThreadNative::Join+0xfd, calling clr!ThreadNative::DoJoin
000000001a83dc40 000007ff001723f5 (MethodDesc 000007ff001612c0 +0x85 MSPO.Logging.MessageQueue.EnqueueMessage(System.String)), calling (MethodDesc 000007fef30fde88 +0 System.Collections.Concurrent.BlockingCollection`1[[System.__Canon, mscorlib]].TryAddWithNoTimeValidation(System.__Canon, Int32, System.Threading.CancellationToken))
000000001a83dcf0 000007ff001720e9 (MethodDesc 000007ff00044bb0 +0xc9 ProcessThrottler.Logging.Logger.Log(LogLevel, System.String)), calling (MethodDesc 000007ff00161178 +0 MSPO.Logging.MessageFormatter.QueueFormattedOutput(System.String, System.String))
000000001a83dd10 000007fef4f101aa clr!ThreadNative::Join+0x5a, calling clr!LazyMachStateCaptureState
000000001a83dd30 000007ff0018000b (MethodDesc 000007ff00163e10 +0x3b ProcessThrottler.Service.MainServiceThread.WaitForThreadToReturn()), calling 000007fef4f10150 (stub for System.Threading.Thread.JoinInternal())
000000001a83dd60 000007ff0017ff44 (MethodDesc 000007ff00049f30 +0xc4 ProcessThrottler.Service.ProcessThrottlerService.OnStop()), calling 000007ff0004d278 (stub for ProcessThrottler.Service.MainServiceThread.WaitForThreadToReturn())
000000001a83dda0 000007fef63fcefb (MethodDesc 000007fef63d65e0 +0xbb System.ServiceProcess.ServiceBase.DeferredStop())
I could post more code showing what each of my functions is doing, but I really don't think this is a deadlock in my code, as the threads would not become suspended in that case. So I'm looking at the above call stack and seeing it's doing some GC stuff after I tell it to log a string to a queue. But none of that GC stuff looks dodgy, at least not compared to what I'm seeing in http://blogs.msdn.com/b/tess/archive/2008/02/11/hang-caused-by-gc-xml-deadlock.aspx. I have a config file to tell it to use gcServer, but I'm almost certain it's not using that setting because in my earlier testing GCSettings.IsServerGC always returned false.
So... does anyone have any suggestions as to why my threads are suspended?
This is my OpenProcess method BTW which gets the handle to the process to be suspended/resumed, in response to Hans's comment:
private void GetProcessHandle(CurrentProcessDetails process)
{
IntPtr handle = Kernel32.OpenProcess(
process.Settings.RequiredProcessAccessRights,
false,
(uint)process.ID
);
if (handle == IntPtr.Zero)
throw new Win32ExceptionWrapper(
string.Format("Failed to open process {0} {1}",
process.Settings.ProcessNameWithExt, process.IDString));
process.Handle = handle;
}
I've discovered the cause. It has nothing to do with my code. It's a bug in Process Explorer.
My program is written to target .NET 4.0. If I use Process Explorer to view any of my threads' call stacks, Process Explorer suspends the thread and doesn't resume it. What it should do is suspend the thread while it gets the call stack, and then resume it immediately. But it's not resuming the threads - not my managed threads, anyway.
I can replicate it with this very simple code:
using System;
namespace Test
{
class Program
{
static void Main(string[] args)
{
for (int i = 0; i < int.MaxValue; i++)
{
Console.WriteLine(i.ToString());
}
}
}
}
If I compile that to target .NET 4.0 or higher, run it, and use Process Explorer to open the thread running the loop, the thread will become suspended. The resume button will become available, and I can click it to resume the thread. Opening the thread multiple times results in it being suspended multiple times; I confirmed this by using Windbg to view the suspend count of the thread.
If I compile it to target versions below 4.0 (tried 2.0 and 3.5), threads I open in Process Explorer do not remain suspended.
I have the following Scenario.
I take 50 jobs from the database into a blocking collection.
Each job is potentially long-running, so I want to run each one on a separate thread. (I know it may be better to run them with Task.WhenAll and let the TPL figure it out, but I want to control how many run simultaneously.)
Say I want to run 5 of them simultaneously (configurable)
I create 5 tasks (TPL), one for each job and run them in parallel.
What I want to do is to pick up the next Job in the blocking collection as soon as one of the jobs from step 4 is complete and keep going until all 50 are done.
I am thinking of creating a Static blockingCollection and a TaskCompletionSource which will be invoked when a job is complete and then it can call the consumer again to pick one job at a time from the queue. I would also like to call async/await on each job - but that's on top of this - not sure if that has an impact on the approach.
Is this the right way to accomplish what I'm trying to do?
Similar to this link, but the catch is that I want to process the next job as soon as one of the first N items is done, not after all N are done.
Update :
Ok, I have this code snippet doing exactly what I want, if someone wants to use it later. As you can see below, 5 threads are created and each thread starts the next job when it is done with current. Only 5 threads are active at any given time. I understand this may not work 100% like this always, and will have performance issues of context switching if used with one cpu/core.
var block = new ActionBlock<Job>(
job => Handler.HandleJob(job),
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 5 });
foreach (Job j in GetJobs())
block.SendAsync(j);
Job 2 started on thread :13. wait time:3600000ms. Time:8/29/2014 3:14:43 PM
Job 4 started on thread :14. wait time:15000ms. Time:8/29/2014 3:14:43 PM
Job 0 started on thread :7. wait time:600000ms. Time:8/29/2014 3:14:43 PM
Job 1 started on thread :12. wait time:900000ms. Time:8/29/2014 3:14:43 PM
Job 3 started on thread :11. wait time:120000ms. Time:8/29/2014 3:14:43 PM
job 4 finished on thread :14. 8/29/2014 3:14:58 PM
Job 5 started on thread :14. wait time:1800000ms. Time:8/29/2014 3:14:58 PM
job 3 finished on thread :11. 8/29/2014 3:16:43 PM
Job 6 started on thread :11. wait time:1200000ms. Time:8/29/2014 3:16:43 PM
job 0 finished on thread :7. 8/29/2014 3:24:43 PM
Job 7 started on thread :7. wait time:30000ms. Time:8/29/2014 3:24:43 PM
job 7 finished on thread :7. 8/29/2014 3:25:13 PM
Job 8 started on thread :7. wait time:100000ms. Time:8/29/2014 3:25:13 PM
job 8 finished on thread :7. 8/29/2014 3:26:53 PM
Job 9 started on thread :7. wait time:900000ms. Time:8/29/2014 3:26:53 PM
job 1 finished on thread :12. 8/29/2014 3:29:43 PM
Job 10 started on thread :12. wait time:300000ms. Time:8/29/2014 3:29:43 PM
job 10 finished on thread :12. 8/29/2014 3:34:43 PM
Job 11 started on thread :12. wait time:600000ms. Time:8/29/2014 3:34:43 PM
job 6 finished on thread :11. 8/29/2014 3:36:43 PM
Job 12 started on thread :11. wait time:300000ms. Time:8/29/2014 3:36:43 PM
job 12 finished on thread :11. 8/29/2014 3:41:43 PM
Job 13 started on thread :11. wait time:100000ms. Time:8/29/2014 3:41:43 PM
job 9 finished on thread :7. 8/29/2014 3:41:53 PM
Job 14 started on thread :7. wait time:300000ms. Time:8/29/2014 3:41:53 PM
job 13 finished on thread :11. 8/29/2014 3:43:23 PM
job 11 finished on thread :12. 8/29/2014 3:44:43 PM
job 5 finished on thread :14. 8/29/2014 3:44:58 PM
job 14 finished on thread :7. 8/29/2014 3:46:53 PM
job 2 finished on thread :13. 8/29/2014 4:14:43 PM
You can easily achieve what you need using TPL Dataflow.
What you can do is use BufferBlock<T>, which is a buffer for storing your data, and link it together with an ActionBlock<T> which will consume those requests as they come in from the BufferBlock<T>.
Now, the beauty here is that you can specify how many requests you want the ActionBlock<T> to handle concurrently using the ExecutionDataflowBlockOptions class.
Here's a simplified console version, which processes a bunch of numbers as they're coming in, prints their name and Thread.ManagedThreadID:
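using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks.Dataflow;

internal static class Program
{
    private static void Main(string[] args)
    {
        // Buffer that holds incoming items until the consumer takes them.
        var bufferBlock = new BufferBlock<int>();
        // Consumer that handles at most 5 items concurrently.
        var actionBlock =
            new ActionBlock<int>(i => Console.WriteLine("Reading number {0} in thread {1}",
                    i, Thread.CurrentThread.ManagedThreadId),
                new ExecutionDataflowBlockOptions
                    {MaxDegreeOfParallelism = 5});
        // Forward everything that lands in the buffer to the consumer.
        bufferBlock.LinkTo(actionBlock);
        Produce(bufferBlock);
        // Keep the process alive while the blocks work in the background.
        Console.ReadKey();
    }
    private static void Produce(BufferBlock<int> bufferBlock)
    {
        foreach (var num in Enumerable.Range(0, 500))
        {
            bufferBlock.Post(num);
        }
    }
}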
You can also post items asynchronously if needed, using the awaitable SendAsync extension method (await bufferBlock.SendAsync(num)).
That way, you let TPL Dataflow handle all the throttling for you instead of doing it manually.
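As a rough illustration of the asynchronous posting mentioned above, here is a minimal sketch that sends items with SendAsync into an ActionBlock that has a bounded input queue. The class name SendAsyncSketch, the BoundedCapacity value of 10, and the Task.Delay stand-in for real work are assumptions made for the example and are not part of the original answer.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class SendAsyncSketch
{
    // Sends `count` numbers through a bounded block and returns how many were processed.
    public static async Task<int> RunAsync(int count)
    {
        var processed = new ConcurrentBag<int>();

        var actionBlock = new ActionBlock<int>(
            async i =>
            {
                await Task.Delay(1); // stand-in for real work (assumption)
                processed.Add(i);
            },
            new ExecutionDataflowBlockOptions
            {
                MaxDegreeOfParallelism = 5,
                BoundedCapacity = 10 // assumed limit; when full, SendAsync waits
            });

        foreach (var num in Enumerable.Range(0, count))
        {
            // Completes once the block has accepted the item, without blocking a thread.
            await actionBlock.SendAsync(num);
        }

        actionBlock.Complete();       // signal that no more items will arrive
        await actionBlock.Completion; // wait until every accepted item is processed
        return processed.Count;
    }

    public static async Task Main()
    {
        Console.WriteLine("Processed {0} items", await RunAsync(100));
    }
}
```

With a bounded capacity the producer slows down to the consumer's pace, whereas Post on a full block would simply return false and drop the item.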
You can use BlockingCollection and it will work just fine, but it predates async-await, so it blocks threads synchronously, which tends to scale worse than an asynchronous approach.
You're better off using the async-ready TPL Dataflow, as Yuval Itzchakov suggested. All you need is an ActionBlock that processes items concurrently with a MaxDegreeOfParallelism of 5, and you post your work to it synchronously (block.Post(item)) or asynchronously (await block.SendAsync(item)):
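using System;
using System.Threading.Tasks.Dataflow;

internal static class Program
{
    // Job stands for your own type exposing an async ProcessAsync() method.
    private static void Main()
    {
        // Runs at most 5 jobs at any one time; further jobs wait in the block's queue.
        var block = new ActionBlock<Job>(
            async job => await job.ProcessAsync(),
            new ExecutionDataflowBlockOptions {MaxDegreeOfParallelism = 5});
        for (var i = 0; i < 50; i++)
        {
            block.Post(new Job());
        }
        // Keep the process alive while the jobs run in the background.
        Console.ReadKey();
    }
}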
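If you would rather wait for the work to finish than for a key press, a block can be completed and awaited. The sketch below assumes a hypothetical Job class (the answer does not show one) whose ProcessAsync just delays and records how many jobs ran at the same time; the class names, the 20 ms delay, and the peak counter are all illustrative.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical Job type, invented for this sketch.
public class Job
{
    private static int _running;
    private static int _peak;

    // Highest number of jobs observed running simultaneously.
    public static int Peak { get { return Volatile.Read(ref _peak); } }

    public async Task ProcessAsync()
    {
        var now = Interlocked.Increment(ref _running);
        // Raise the recorded peak with a compare-and-swap loop.
        int seen;
        while (now > (seen = Volatile.Read(ref _peak)))
        {
            Interlocked.CompareExchange(ref _peak, now, seen);
        }
        await Task.Delay(20); // stand-in for real work (assumption)
        Interlocked.Decrement(ref _running);
    }
}

public static class CompletionSketch
{
    // Processes `jobCount` jobs, at most 5 at a time, and returns the observed peak.
    public static async Task<int> RunAsync(int jobCount)
    {
        var block = new ActionBlock<Job>(
            async job => await job.ProcessAsync(),
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 5 });

        for (var i = 0; i < jobCount; i++)
        {
            block.Post(new Job());
        }

        block.Complete();       // no more jobs will be posted
        await block.Completion; // finishes when every queued job has run
        return Job.Peak;
    }

    public static async Task Main()
    {
        Console.WriteLine("Peak concurrency: {0}", await RunAsync(50));
    }
}
```

The reported peak should never exceed the configured MaxDegreeOfParallelism, which is a quick way to check that the throttling behaves as expected.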
You could do this with a SemaphoreSlim like in this answer, or using ForEachAsync like in this answer.
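For reference, the SemaphoreSlim approach can be sketched roughly as follows. This is not the code from the linked answer; the helper name RunThrottledAsync and its signature are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class SemaphoreThrottleSketch
{
    // Runs `work` for every item, allowing at most `maxConcurrency` calls in flight.
    public static async Task RunThrottledAsync<T>(
        IEnumerable<T> items, int maxConcurrency, Func<T, Task> work)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = items.Select(async item =>
            {
                await gate.WaitAsync(); // wait asynchronously for a free slot
                try
                {
                    await work(item);
                }
                finally
                {
                    gate.Release(); // free the slot even if the work throws
                }
            }).ToList(); // materialize so every task is started exactly once

            await Task.WhenAll(tasks);
        }
    }

    public static async Task Main()
    {
        await RunThrottledAsync(Enumerable.Range(0, 20), 5, async i =>
        {
            await Task.Delay(10); // stand-in for real work (assumption)
            Console.WriteLine("Finished item {0}", i);
        });
    }
}
```

Unlike the Dataflow version, this starts a task per item up front and only gates the work itself, so it suits a known, finite set of items better than an open-ended stream.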
I'm running a .NET application (.NET 4.5) on Mono on Debian/Raspbian (on a Raspberry Pi). Very often, say 9 out of 10 runs, I see the following after a while:
_wapi_handle_ref: Attempting to ref unused handle 0x770
_wapi_handle_unref_full: Attempting to unref unused handle 0x770
Of course the "0x770" is always different.
The application then runs fine for a short time, but eventually fails: it either crashes hard or just stops making progress (it looks like a deadlock or livelock).
Is there any guidance on how to pinpoint the problem in the .NET code that causes this, and on how to help Mono resolve it?
Mono version info:
Mono JIT compiler version 3.2.3 (Debian 3.2.3+dfsg-5+rpi1)
Copyright (C) 2002-2012 Novell, Inc, Xamarin Inc and Contributors. www.mono-project.com
TLS: __thread
SIGSEGV: normal
Notifications: epoll
Architecture: armel,vfp+hard
Disabled: none
Misc: softdebug
LLVM: supported, not enabled.
GC: sgen
On 3.2.7 (built from current sources) the app fails even harder:
Stacktrace:
at <unknown> <0xffffffff>
at (wrapper managed-to-native) System.Buffer.BlockCopyInternal (System.Array,int,System.Array,int,int) <0xffffffff>
at System.IO.FileStream.ReadSegment (byte[],int,int) <0x0006f>
at System.IO.FileStream.ReadInternal (byte[],int,int) <0x00233>
at (wrapper runtime-invoke) <Module>.runtime_invoke_int__this___object_int_int (object,intptr,intptr,intptr) <0xffffffff>
Native stacktrace:
Debug info from gdb:
Mono support loaded.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
[New Thread 0xb519a430 (LWP 1456)]
[New Thread 0xb52ba430 (LWP 32577)]
[New Thread 0xb52da430 (LWP 32576)]
[New Thread 0xb5538430 (LWP 32574)]
[New Thread 0xb5b7b430 (LWP 32573)]
0xb6f05494 in pthread_cond_wait@@GLIBC_2.4 () from /lib/arm-linux-gnueabihf/libpthread.so.0
Id Target Id Frame
6 Thread 0xb5b7b430 (LWP 32573) "mono" 0xb6f07700 in sem_wait@@GLIBC_2.4 () from /lib/arm-linux-gnueabihf/libpthread.so.0
5 Thread 0xb5538430 (LWP 32574) "mono" 0xb6f09250 in nanosleep () from /lib/arm-linux-gnueabihf/libpthread.so.0
4 Thread 0xb52da430 (LWP 32576) "mono" 0xb6e6de84 in epoll_wait () from /lib/arm-linux-gnueabihf/libc.so.6
3 Thread 0xb52ba430 (LWP 32577) "mono" 0xb6f07954 in sem_timedwait () from /lib/arm-linux-gnueabihf/libpthread.so.0
2 Thread 0xb519a430 (LWP 1456) "mono" 0xb6f09a3c in waitpid () from /lib/arm-linux-gnueabihf/libpthread.so.0
* 1 Thread 0xb6fd9000 (LWP 32571) "mono" 0xb6f05494 in pthread_cond_wait@@GLIBC_2.4 () from /lib/arm-linux-gnueabihf/libpthread.so.0
Thread 6 (Thread 0xb5b7b430 (LWP 32573)):
#0 0xb6f07700 in sem_wait@@GLIBC_2.4 () from /lib/arm-linux-gnueabihf/libpthread.so.0
#1 0x001fb618 in mono_sem_wait (sem=0x2eff34, alertable=1) at mono-semaphore.c:119
#2 0x0017a52c in finalizer_thread (unused=<optimized out>) at gc.c:1073
#3 0x0015f3ec in start_wrapper_internal (data=0x974850) at threads.c:609
#4 start_wrapper (data=0x974850) at threads.c:654
#5 0x001f1718 in thread_start_routine (args=0x92f628) at wthreads.c:294
#6 0x001ff824 in inner_start_thread (arg=<optimized out>) at mono-threads-posix.c:49
#7 0xb6f00bfc in start_thread () from /lib/arm-linux-gnueabihf/libpthread.so.0
#8 0xb6e6d758 in ?? () from /lib/arm-linux-gnueabihf/libc.so.6
#9 0xb6e6d758 in ?? () from /lib/arm-linux-gnueabihf/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Thread 5 (Thread 0xb5538430 (LWP 32574)):
#0 0xb6f09250 in nanosleep () from /lib/arm-linux-gnueabihf/libpthread.so.0
#1 0xb6f08044 in __pthread_enable_asynccancel () from /lib/arm-linux-gnueabihf/libpthread.so.0
#2 0x001f08f8 in SleepEx (ms=<optimized out>, alertable=162) at wthreads.c:842
#3 0x00160e80 in monitor_thread (unused=<optimized out>) at threadpool.c:779
#4 0x0015f3ec in start_wrapper_internal (data=0xa0e400) at threads.c:609
#5 start_wrapper (data=0xa0e400) at threads.c:654
#6 0x001f1718 in thread_start_routine (args=0x92f7d8) at wthreads.c:294
#7 0x001ff824 in inner_start_thread (arg=<optimized out>) at mono-threads-posix.c:49
#8 0xb6f00bfc in start_thread () from /lib/arm-linux-gnueabihf/libpthread.so.0
#9 0xb6e6d758 in ?? () from /lib/arm-linux-gnueabihf/libc.so.6
#10 0xb6e6d758 in ?? () from /lib/arm-linux-gnueabihf/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Thread 4 (Thread 0xb52da430 (LWP 32576)):
#0 0xb6e6de84 in epoll_wait () from /lib/arm-linux-gnueabihf/libc.so.6
#1 0x00161804 in tp_epoll_wait (p=0x2efd7c) at ../../mono/metadata/tpool-epoll.c:118
#2 0x0015f3ec in start_wrapper_internal (data=0xbead88) at threads.c:609
#3 start_wrapper (data=0xbead88) at threads.c:654
#4 0x001f1718 in thread_start_routine (args=0x92fb38) at wthreads.c:294
#5 0x001ff824 in inner_start_thread (arg=<optimized out>) at mono-threads-posix.c:49
#6 0xb6f00bfc in start_thread () from /lib/arm-linux-gnueabihf/libpthread.so.0
#7 0xb6e6d758 in ?? () from /lib/arm-linux-gnueabihf/libc.so.6
#8 0xb6e6d758 in ?? () from /lib/arm-linux-gnueabihf/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Thread 3 (Thread 0xb52ba430 (LWP 32577)):
#0 0xb6f07954 in sem_timedwait () from /lib/arm-linux-gnueabihf/libpthread.so.0
#1 0x001fb6f8 in mono_sem_timedwait (sem=0x2efcfc, timeout_ms=<optimized out>, alertable=1) at mono-semaphore.c:82
#2 0x00163844 in async_invoke_thread (data=0xb6e16e00) at threadpool.c:1565
#3 0x0015f3ec in start_wrapper_internal (data=0xbea9c8) at threads.c:609
#4 start_wrapper (data=0xbea9c8) at threads.c:654
#5 0x001f1718 in thread_start_routine (args=0x92fbc8) at wthreads.c:294
#6 0x001ff824 in inner_start_thread (arg=<optimized out>) at mono-threads-posix.c:49
#7 0xb6f00bfc in start_thread () from /lib/arm-linux-gnueabihf/libpthread.so.0
#8 0xb6e6d758 in ?? () from /lib/arm-linux-gnueabihf/libc.so.6
#9 0xb6e6d758 in ?? () from /lib/arm-linux-gnueabihf/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Thread 2 (Thread 0xb519a430 (LWP 1456)):
#0 0xb6f09a3c in waitpid () from /lib/arm-linux-gnueabihf/libpthread.so.0
#1 0x000b1284 in mono_handle_native_sigsegv (signal=<optimized out>, ctx=<optimized out>) at mini-exceptions.c:2299
#2 0x000277e4 in mono_sigsegv_signal_handler (_dummy=11, info=0xb5199548, context=0xb51995c8) at mini.c:6777
#3 <signal handler called>
#4 mono_array_get_byte_length (array=0xb4e54010) at icall.c:6121
#5 ves_icall_System_Buffer_BlockCopyInternal (src=0xb3512010, src_offset=<optimized out>, dest=<optimized out>, dest_offset=<optimized out>, count=4096) at icall.c:6192
#6 0xb6817a18 in ?? ()
Cannot access memory at address 0xff8
=================================================================
Got a SIGSEGV while executing native code. This usually indicates
a fatal error in the mono runtime or one of the native libraries
used by your application.
=================================================================