ICorProfilerCallback2: CLR profiler does not log all Leave calls - c#

I am trying to write a profiler that logs all .Net method calls in a process. The goal is to make it highly performant and keep let's say the last 5-10 minutes in memory (fixed buffer, cyclically overwrite old info) until the user triggers that info to be written to disk. Intended use is to track down rarely reproducing performance issues.
I started off with the SimpleCLRProfiler project from https://github.com/appneta/SimpleCLRProfiler. The profiler makes use of the ICorProfilerCallback2 callback interface of .Net profiling. I got it to compile and work in my environment (Win 8.1, .Net 4.5, VS2012). However, I noticed that sometimes Leave calls are missing for which Enter calls were logged. Example of a Console.WriteLine call (I reduced the output of DbgView to what is minimally necessary to understand):
Line 1481: Entering System.Console.WriteLine
Line 1483: Entering SyncTextWriter.WriteLine
Line 1485: Entering System.IO.TextWriter.WriteLine
Line 1537: Leaving SyncTextWriter.WriteLine
Two Entering calls don't have corresponding Leaving calls. The profiled .Net code looks like this:
Console.WriteLine("Hello, Simple Profiler!");
The relevant SimpleCLRProfiler methods are:
HRESULT CSimpleProfiler::registerGlobalCallbacks()
{
    // Each hook has its own typedef in corprof.h; cast to the matching one.
    HRESULT hr = profilerInfo3->SetEnterLeaveFunctionHooks3WithInfo(
        (FunctionEnter3WithInfo*)MethodEntered3,
        (FunctionLeave3WithInfo*)MethodLeft3,
        (FunctionTailcall3WithInfo*)MethodTailcall3);
    if (FAILED(hr))
        Trace_f(L"Failed to register global callbacks (%s)", _com_error(hr).ErrorMessage());
    return hr; // propagate the failure instead of always returning S_OK
}
void CSimpleProfiler::OnEnterWithInfo(FunctionID functionId, COR_PRF_ELT_INFO eltInfo)
{
MethodInfo info;
HRESULT hr = info.Create(profilerInfo3, functionId);
if (FAILED(hr))
Trace_f(L"Enter() failed to create MethodInfo object (%s)", _com_error(hr).ErrorMessage());
Trace_f(L"[%p] [%d] Entering %s.%s", functionId, GetCurrentThreadId(), info.className.c_str(), info.methodName.c_str());
}
void CSimpleProfiler::OnLeaveWithInfo(FunctionID functionId, COR_PRF_ELT_INFO eltInfo)
{
MethodInfo info;
HRESULT hr = info.Create(profilerInfo3, functionId);
if (FAILED(hr))
Trace_f(L"Leave() failed to create MethodInfo object (%s)", _com_error(hr).ErrorMessage());
Trace_f(L"[%p] [%d] Leaving %s.%s", functionId, GetCurrentThreadId(), info.className.c_str(), info.methodName.c_str());
}
Does anybody have an idea, why the .Net Profiler would not perform Leave calls for all leaving methods? By the way, I checked that the OnLeaveMethod does not unexpectedly exit before any trace due to an exception or so. It doesn't.
Thanks, Christoph

Since stakx does not seem to be coming back to my question to provide an official answer (and get the credit), I will do it for him:
As stakx had hinted at, I didn't log tail calls. In fact, I wasn't even aware of the concept so I had completely ignored that hook method (it was wired up but empty). I found a good explanation of tail calls here: David Broman's CLR Profiling API Blog: Enter, Leave, Tailcall Hooks Part 2: Tall tales of tail calls.
I quote from the link above:
Tail calling is a compiler optimization that saves execution of instructions and saves reads and writes of stack memory. When the last thing a function does is call another function (and other conditions are favorable), the compiler may consider implementing that call as a tail call, instead of a regular call.
Consider this code:
static public void Main() {
Helper();
}
static public void Helper() {
One();
Three();
}
static public void Three() {
...
}
When method Three is called, without tail call optimization, the stack will look like this.
Three
Helper
Main
With tail call optimization, the stack looks like this:
Three
Main
So before calling Three, due to the optimization, method Helper was already popped off the stack. As a result, there is one less method on the stack (less memory usage), and some instructions and memory writes were saved.
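For a profiler, the practical consequence is that the Tailcall hook has to be treated as a second way of leaving a method. Below is a minimal sketch of that idea, not the SimpleCLRProfiler code: `ShadowStack`, `FunctionId` and `ReplayTailcallExample` are hypothetical names, and the function ids are made up. It assumes the Tailcall hook reports the method that is about to make the tail call (Helper in the example above).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the CLR's FunctionID (the real type is in corprof.h).
using FunctionId = std::uintptr_t;

// Shadow stack that mirrors the managed frames of one thread as hooks fire.
class ShadowStack {
public:
    // Enter hook: a new frame is pushed.
    void OnEnter(FunctionId id) {
        frames_.push_back(id);
        log_.push_back("Enter " + std::to_string(id));
    }

    // Leave hook: the frame returns normally.
    void OnLeave(FunctionId id) { Pop(id, "Leave "); }

    // Tailcall hook: the frame is discarded before its callee is entered,
    // so no Leave will arrive for it later.
    void OnTailcall(FunctionId id) { Pop(id, "Tailcall "); }

    std::size_t Depth() const { return frames_.size(); }
    const std::vector<std::string>& Log() const { return log_; }

private:
    void Pop(FunctionId id, const char* what) {
        // Only pop when the top frame matches the reported function.
        if (!frames_.empty() && frames_.back() == id) frames_.pop_back();
        log_.push_back(what + std::to_string(id));
    }

    std::vector<FunctionId> frames_;
    std::vector<std::string> log_;
};

// Replays the Main/Helper/One/Three example with Helper tail-calling Three.
inline ShadowStack ReplayTailcallExample() {
    const FunctionId kMain = 1, kHelper = 2, kOne = 3, kThree = 4;  // made-up ids
    ShadowStack s;
    s.OnEnter(kMain);
    s.OnEnter(kHelper);
    s.OnEnter(kOne);
    s.OnLeave(kOne);
    s.OnTailcall(kHelper);  // Helper is gone before Three starts
    s.OnEnter(kThree);      // the stack is now Main, Three
    s.OnLeave(kThree);
    s.OnLeave(kMain);
    return s;
}
```

In a real profiler there would be one such stack per thread (for example a `thread_local` instance), since the hooks fire on whichever thread runs the managed code. If the Tailcall hook is left empty, Helper never gets popped and looks like a method with a missing Leave.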

Related

Threads increase abnormally in linux service

I have a service that runs in linux under SystemD but gets compiled and debugged in VS22 under Windows.
The service is mainly a proxy to a MariaDB10 database shaped as a BackgroundWorker serving clients via SignalR.
If I run it in release mode on Windows, the number of logical threads stays at a reasonable value (approximately 20-25). See pic below.
Under Linux, after a few minutes (I cannot give more insight, unfortunately; I still have to figure out what is changing), the number of threads starts increasing steadily every second.
See pic here, already at more than 100 and still counting:
Reading "current logical threads increasing / thread stack is leaking", I got confirmation that the CLR allows new threads if the others are not completing, but there is no change in the code when moving from Windows to Linux.
This is the HostBuilder with the call to SystemD
 public static IHostBuilder CreateWebHostBuilder(string[] args)
        {
            string curDir = MondayConfiguration.DefineCurrentDir();
            IConfigurationRoot config = new ConfigurationBuilder()
                // .SetBasePath(Directory.GetCurrentDirectory())
                .SetBasePath(curDir)
                .AddJsonFile("servicelocationoptions.json", optional: false, reloadOnChange: true)
#if DEBUG
                   .AddJsonFile("appSettings.Debug.json")
#else
                   .AddJsonFile("appSettings.json")
#endif
                   .Build();
            return Host.CreateDefaultBuilder(args)
                .UseContentRoot(curDir)
                .ConfigureAppConfiguration((_, configuration) =>
                {
                    configuration
                    .AddIniFile("appSettings.ini", optional: true, reloadOnChange: true)
#if DEBUG
                   .AddJsonFile("appSettings.Debug.json")
#else
                   .AddJsonFile("appSettings.json")
#endif
                    .AddJsonFile("servicelocationoptions.json", optional: false, reloadOnChange: true);
                })
                .UseSerilog((_, services, configuration) => configuration
                    .ReadFrom.Configuration(config, sectionName: "AppLog")// (context.Configuration)
                    .ReadFrom.Services(services)
                    .Enrich.FromLogContext()
                    .WriteTo.Console())
                // .UseSerilog(MondayConfiguration.Logger)
                .ConfigureServices((hostContext, services) =>
                {
                    services
                    .Configure<ServiceLocationOptions>(hostContext.Configuration.GetSection(key: nameof(ServiceLocationOptions)))
                    .Configure<HostOptions>(opts => opts.ShutdownTimeout = TimeSpan.FromSeconds(30));
                })
                .ConfigureWebHostDefaults(webBuilder =>
                {
                    webBuilder.UseStartup<Startup>();
                    ServiceLocationOptions locationOptions = config.GetSection(nameof(ServiceLocationOptions)).Get<ServiceLocationOptions>();
                    string url = locationOptions.HttpBase + "*:" + locationOptions.Port;
                    webBuilder.UseUrls(url);
                })
                .UseSystemd();
        }
In the meantime I am trying to trace all the Monitor.Enter() calls that I use to serialize the API endpoints that touch the state of the service and its inner structures, but on Windows everything seems OK.
I am starting to wonder whether the issue is in the call to SystemD. I would like to know what is really involved in a call to UseSystemd(), but there is not much documentation around.
I only found https://devblogs.microsoft.com/dotnet/net-core-and-systemd/ by Glenn Condron and a few quick notes on MSDN.
EDIT 1: To debug further I created a class to scan the threadpool using ClrMd.
My main service has a heartbeat (oddly, it is called Ping) as follows (note the added call to ProcessTracker.Scan()):
private async Task Ping()
{
await _containerServer.SyslogQueue.Writer.WriteAsync((
LogLevel.Information,
$"Monday Service active at: {DateTime.UtcNow.ToLocalTime()}"));
string processMessage = ProcessTracker.Scan();
await _containerServer.SyslogQueue.Writer.WriteAsync((LogLevel.Information, processMessage));
_logger.DebugInfo()
.Information("Monday Service active at: {Now}", DateTime.UtcNow.ToLocalTime());
}
where ProcessTracker is constructed like this:
public static class ProcessTracker
{
static ProcessTracker()
{
}
public static string Scan()
{
// see https://stackoverflow.com/questions/31633541/clrmd-throws-exception-when-creating-runtime/31745689#31745689
StringBuilder sb = new();
string answer = $"Active Threads{Environment.NewLine}";
// Create the data target. This tells us the versions of CLR loaded in the target process.
int countThread = 0;
var pid = Process.GetCurrentProcess().Id;
using (var dataTarget = DataTarget.AttachToProcess(pid, 5000, AttachFlag.Passive))
{
// Note I just take the first version of CLR in the process. You can loop over
// every loaded CLR to handle the SxS case where both desktop CLR and .Net Core
// are loaded in the process.
ClrInfo version = dataTarget.ClrVersions[0];
var runtime = version.CreateRuntime();
// Walk each thread in the process.
foreach (ClrThread thread in runtime.Threads)
{
try
{
sb = new();
// The ClrRuntime.Threads will also report threads which have recently
// died, but their underlying data structures have not yet been cleaned
// up. This can potentially be useful in debugging (!threads displays
// this information with XXX displayed for their OS thread id). You
// cannot walk the stack of these threads though, so we skip them here.
if (!thread.IsAlive)
continue;
sb.Append($"Thread {thread.OSThreadId:X}:");
countThread++;
// Each thread tracks a "last thrown exception". This is the exception
// object which !threads prints. If that exception object is present, we
// will display some basic exception data here. Note that you can get
// the stack trace of the exception with ClrHeapException.StackTrace (we
// don't do that here).
ClrException? currException = thread.CurrentException;
if (currException is ClrException ex)
sb.AppendLine($"Exception: {ex.Address:X} ({ex.Type.Name}), HRESULT={ex.HResult:X}");
// Walk the stack of the thread and print output similar to !ClrStack.
sb.AppendLine(" ------> Managed Call stack:");
var collection = thread.EnumerateStackTrace().ToList();
foreach (ClrStackFrame frame in collection)
{
// Note that CLRStackFrame currently only has three pieces of data:
// stack pointer, instruction pointer, and frame name (which comes
// from ToString). Future versions of this API will allow you to get
// the type/function/module of the method (instead of just the
// name). This is not yet implemented.
sb.AppendLine($" {frame}");
}
}
catch
{
//skip to the next
}
finally
{
answer += sb.ToString();
}
}
}
answer += $"{Environment.NewLine} Total thread listed: {countThread}";
return answer;
}
}
All fine: on Windows it prints a lot of nice information in a kind of textual tree view.
The problem is that somewhere it requires Kernel32.dll, which is not available on Linux. Can someone give hints on this? The service is published natively without .NET infrastructure, in release mode, arch linux64, single file.
thanks a lot
Alex
I found a way to get what I needed from a simple debug session, without all of this logging.
I was not aware that I could also attach remotely to a process running under systemd.
I just followed https://learn.microsoft.com/en-us/visualstudio/debugger/remote-debugging-dotnet-core-linux-with-ssh?view=vs-2022 for a quick step-by-step guide.
The only prerequisites are that the service is built in debug mode and that the .NET runtime is installed on the host, but that's really all.
Sorry for not having known this earlier.
Alex

Why does putting DoEvents in a loop cause a StackOverflow exception?

I was getting a weird error in a legacy application (not written by myself), where I was getting a StackOverflow exception when I changed the date on a calendar.
A simplified version is below. This is the code-behind of a Windows Form containing two controls: a Label called label2 and a MonthCalendar called monthCalendar1.
I think the idea here was to create a typewriter effect. I am on XP, my colleague on Windows 7 is able to run this ok:
private void monthCalendar1_DateChanged(object sender, DateRangeEventArgs e)
{
const string sTextDisplay = "Press Generate button to build *** Reports ... ";
for (var i = 0; i < 45; i++)
{
label2.Text = Mid(sTextDisplay, 1, i);
System.Threading.Thread.Sleep(50);
//Error on this line
//An unhandled exception of type 'System.StackOverflowException' occurred in System.Windows.Forms.dll
Application.DoEvents();
}
}
public static string Mid(string s, int a, int b)
{
var temp = s.Substring(a - 1, b);
return temp;
}
I can't see the stack trace, all I see is:
{Cannot evaluate expression because the current thread is in a stack overflow state.}
Also, I'm interested in the comments asking why I haven't checked the stack trace of my StackOverflow exception, as it looks like this isn't possible without third party tools at least.
What could be causing this? Thanks
Remember, programs are stack-based. As your program runs, every function call places a new entry on the stack. Every time a function completes, you pop from the stack to see where to go back to, so you can continue the prior method. When a function completes and the stack is empty, the program ends.
It's important to remember the program stack is generous, but finite. You can only put so many function calls on the stack before it runs out of space. This is what happens when we say the stack overflows.
DoEvents() is just another function call. You might put it in a long-running task to allow your program to handle messages from the operating system about user activity: things like clicks, keystrokes, etc. It also allows your program to handle other messages from the operating system, for example when the program needs to redraw its windows.
Normally, there will only be one or two (or even zero) messages waiting for a DoEvents() call. Your program handles these, the DoEvents() call is popped off the stack, and the original code continues. Sometimes, there may be many messages waiting. If any of those messages also results in code running that again calls to DoEvents(), we are now another level deep in the call stack. And if that code in turn finds a message waiting which causes DoEvents() to run, we'll be yet another level deep. Maybe you can see where this is going.
DoEvents(), used in conjunction with the MouseMove event, is a common source of problems like this. MouseMove events can pile up on you very quickly. This can also happen with KeyPress events, when you have a key that is held down.
Normally, I wouldn't expect a Calendar DateChanged event to have this kind of problem, but if you have DoEvents() somewhere else, or drive another event (perhaps on your label) that in turn updates your calendar, you can easily create a cycle that will force your program to spiral into a stack overflow situation.
What you want to do instead is explore the BackgroundWorker component, or the newer Task and async patterns.
You may also want to read my write-up on DoEvents() for this question:
How to use DoEvents() without being "evil"?
Normally you have a message pump pretty close to the top of the stack. Adding lots of messages never results in a "deep" stack, as they are all processed by that top-level pump. Using DoEvents creates a new message pump at a point deeper in the stack. If one of the messages that you are pumping also calls DoEvents, you now have a message pump even deeper in the stack. If that message pump has another message that calls DoEvents ... and you get the idea.
The only way for the stack to clear up again is for the message queue to be empty, at which point you start calling back up the stack until you get to the top level message pump.
The problem here is that your code doesn't make it easy. It calls DoEvents a lot in a loop, so it needs to have an idle queue for quite some time to actually get back out of that loop. On top of that, if you happen to have an "active" application that's sending lots of messages to the message queue, possibly lots of monthCalendar1_DateChanged events, or even other events using DoEvents in a loop, or just other events to keep the queue from being empty, it's not particularly hard to believe that your stack would get deep enough to result in an SOE.
The ideal solution of course is to not use DoEvents. Write asynchronous code instead, so that your stack depth never exceeds a constant value.
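The nesting described above can be imitated with a toy message queue. This is only a sketch of the mechanism in portable C++, not WinForms code: `ToyPump` and `MaxPumpDepth` are made-up names, and the nested `Pump()` call inside a handler stands in for Application.DoEvents().

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Toy message queue; none of these names are WinForms APIs.
struct ToyPump {
    std::deque<std::function<void(ToyPump&)>> queue;
    std::size_t depth = 0;     // pump loops currently active on the call stack
    std::size_t maxDepth = 0;  // deepest nesting observed

    // Drains the queue. Called once from the top level, or re-entrantly
    // from inside a handler to mimic a DoEvents-style nested pump.
    void Pump() {
        ++depth;
        maxDepth = std::max(maxDepth, depth);
        while (!queue.empty()) {
            auto handler = std::move(queue.front());
            queue.pop_front();
            handler(*this);
        }
        --depth;
    }
};

// Queues `count` handlers; when `reentrant` is true each handler pumps again.
inline std::size_t MaxPumpDepth(std::size_t count, bool reentrant) {
    ToyPump pump;
    for (std::size_t i = 0; i < count; ++i) {
        pump.queue.push_back([reentrant](ToyPump& p) {
            if (reentrant) p.Pump();  // nested pump, one level deeper each time
        });
    }
    pump.Pump();
    return pump.maxDepth;
}
```

With 100 queued messages, `MaxPumpDepth(100, false)` stays at 1 because everything is handled by the single top-level pump, while `MaxPumpDepth(100, true)` reaches 101: every pending message that pumps again adds another level, and nothing unwinds until the queue is empty.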
DoEvents shouldn't be used in any case, and you don't need Substring to achieve a typewriter effect.
Here is the best way I know at the moment:
using System.Threading;
private string text = "this is my test string";
private void button1_Click(object sender, EventArgs e)
{
new Thread(loop).Start();
}
private void loop()
{
for (int i = 0; i < text.Length; i++)
{
AddChar(text[i]);
Thread.Sleep(50);
}
}
private void AddChar(char c)
{
if (label1.InvokeRequired)
Invoke((MethodInvoker)delegate { AddChar(c); });
else
label1.Text += c;
}

Wrong exit code of a Windows process (C#/C++/etc.)?

Our C# application exits with code 0, even though it is explicitly returning -1 in the code:
internal class Program
{
public static int Main()
{
....
return -1;
}
}
The same happened if void Main was used:
internal class Program
{
public static void Main()
{
....
Environment.Exit(-1);
}
}
As other questions on SO suggested, it could have been an unhandled CLR/C++/native exception in some other thread.
However, I added a graceful shutdown of all managed/native threads right before the last one exits, but the behavior stayed the same.
What could be the reason?
Turns out this happened because we used job objects to make sure that all child processes exit when the current process exits, using this code in C++ (we actually P/Invoked it from C#):
HANDLE h = ::CreateJobObject(NULL, NULL);
JOBOBJECT_EXTENDED_LIMIT_INFORMATION info;
::ZeroMemory(&info, sizeof(info));
info.BasicLimitInformation.LimitFlags = JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE;
::SetInformationJobObject(h, JobObjectExtendedLimitInformation, &info, sizeof(info));
::AssignProcessToJobObject(h, ::GetCurrentProcess());
...
::CloseHandle(h);
return -1;
This code adds the current process and all its child processes to a job object, which will be closed when the current process exits.
BUT it has a side effect: when CloseHandle was invoked, it killed the current process without ever reaching the line return -1. And since the JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE flag automatically kills all processes, there is no way to set an exit code for them, so the OS exited the process with exit code 0.
In C# we followed the standard guidelines for cleaning up resources and used a SafeHandle-derived class to make sure that CloseHandle is invoked, and exactly the same thing happened: before the CLR actually exited, it invoked ::CloseHandle for all SafeHandles, ignoring the exit code set by both the return value and Environment.Exit.
However, what's even more interesting is that if the explicit (or not so explicit) call to CloseHandle is removed in both C# and C++, the OS will still close all the handles at process exit, after the CLR/CRT has exited, and the actual exit code will be returned. So sometimes it is good not to clean up resources :-) or, in other words, until a native ::ExitProcess is invoked, you can't guarantee that the exit code will be intact.
So to fix this particular issue I could either call AssignProcessToJobObject whenever a child process is started, or remove the explicit (or not so explicit) call to CloseHandle. I chose the first approach.

Make my COM assembly call asynchronous

I've just "earned" the privilege to maintain a legacy library coded in C# at my current work.
This dll:
Exposes methods for a big legacy system made with Uniface, that has no choice but calling COM objects.
Serves as a link between this legacy system, and another system's API.
Uses WinForm for its UI in some cases.
More visually, as I understand the components :
*[Big legacy system in Uniface]* ==[COM]==> [C# Library] ==[Managed API]==> *[Big EDM Management System]*
The question is: One of the methods in this C# Library takes too long to run and I "should" make it asynchronous!
I'm used to C#, but not to COM at all. I've already done concurrent programming, but COM seems to add a lot of complexity to it and all my trials so far end in either:
A crash with no error message at all
My Dll only partially working (displaying only part of its UI, and then closing), and still not giving me any error at all
I'm out of ideas and resources about how to handle threads within a COM dll, and I would appreciate any hint or help.
So far, the biggest part of the code I've changed to make my method asynchronous :
// my public method called by the external system
public int ComparedSearch(string application, out string errMsg) {
    errMsg = "";
    try {
        // Func, not Action: AsyncComparedSearch returns int
        Func<string, int> asyncOp = AsyncComparedSearch;
        asyncOp.BeginInvoke(application, null, null);
    } catch (Exception ex) {
        // ...
    }
    return 0;
}
private int AsyncComparedSearch(string application) {
    // my actual method doing the work, that was the called method before
}
Any hint or useful resource would be appreciated.
Thank you.
UPDATE 1:
Following the answers and clues below (especially about the SynchronizationContext, and with the help of this example), I was able to refactor my code and make it work, but only when called from another Windows application in C#, and not through COM.
The legacy system encounters a quite obscure error when I call the function and doesn't give any details about the crash.
UPDATE 2:
Latest updates in my trials: I managed to make the multithreading work when the calls are made from a test project, and not from the Uniface system.
After multiple trials, we tend to think that our legacy system doesn't support multithreading well in its current config. But that's not the point of the question any more :)
Here is an excerpt of the code that seems to work:
string application;
SynchronizationContext context;
// my public method called by the external system
public int ComparedSearch(string application, out string errMsg) {
    this.application = application;
    context = WindowsFormsSynchronizationContext.Current;
    Thread t = new Thread(new ThreadStart(AsyncComparedSearchAndShowDocs));
    t.Start();
    errMsg = "";
    return 0;
}
private void AsyncComparedSearchAndShowDocs() {
    // ANY WORK THAT HAS NOTHING TO DO WITH UI
    context.Send(new SendOrPostCallback(
        delegate(object state)
        {
            // METHODS THAT MANAGE UI SOMEHOW
        }
    ), null);
}
We are now considering solutions other than modifying this COM assembly, like encapsulating this library in a Windows Service and creating an interface between the system and the service. It should be more sustainable.
It is hard to tell without knowing more details, but there are a few issues here.
You execute the delegate on another thread via BeginInvoke but you don't wait for it. Your try/catch block won't catch anything, because execution has already left it while the asynchronous call is still running. Instead, you should put the try/catch block inside AsyncComparedSearch.
As you don't wait for the asynchronous method to finish (via EndInvoke or a callback), I am not sure how you handle the results of the COM call. I guess that you update the GUI from within AsyncComparedSearch. If so, that is wrong: it is running on another thread, and you should never update the GUI from anywhere but the GUI thread; doing so will most likely result in a crash or other unexpected behavior. Therefore, you need to marshal the GUI update work to the GUI thread. In WinForms you need to use Control.BeginInvoke (don't confuse it with Delegate.BeginInvoke) or some other way (e.g. SynchronizationContext) to do that. I use something similar to this:
// Extension method: must be declared in a static class.
public static void ExecuteOnUiThread(this Form form, Action action)
{
    if (form.InvokeRequired) { // we are not on UI thread
        // Invoke or BeginInvoke, depending on what you need
        form.Invoke(action);
    }
    else { // we are on UI thread so just execute the action
        action();
    }
}
then I call it like this from any thread:
theForm.ExecuteOnUiThread( () => theForm.SomeMethodWhichUpdatesControls() );
Besides, read this answer for some caveats.

Windows Service and C# design pattern question

I have recently taken over a legacy Windows service, and it has been writing the following event to the system event log:
Event ID: 7034 Description: The MyService service terminated unexpectedly. It has done this X time(s).
I was looking over source code and found the following code pattern in the service class library:
(It has been simplified to protect the innocent..)
public static void StartService()
{
//do some stuff...
ManageCycle();
}
public static void ManageCycle()
{
//do some stuff
ManageCycle();
}
What is this coding pattern called, and could it possibly cause the Windows service to shut down (e.g. through a memory leak)?
This looks like the stack overflow exception pattern. Eran is correct. Use a while loop:
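// Flag checked by the loop; volatile so a stop request from another thread is seen.
private static volatile bool isRunning;
public static void StartService()
{
    //do some stuff...
    isRunning = true;
    ManageCycle();
}
public static void ManageCycle()
{
    while (isRunning)
    {
        //do some stuff and wrap in exception handling
    }
}
public static void StopService()
{
    isRunning = false;
}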
It is supposed to throw a StackOverflowException (HA HA :) ), because of the endless recursive calls.
Take a look at this example - you should choose the technique that fits your architecture.
That's a recursive call that will ultimately blow the stack.
The best answer for this kind of situation:
Don't use Recursive Algorithms unless your algorithm has a recursive structure. For example, if you're analyzing a file system, and want to scan a specific Directory, you'd want to do something like:
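void ScanDirectory(DirectoryInfo directory)
{
    foreach (FileInfo file in directory.GetFiles())
    {
        // Handle file
    }
    // Recurse only where the data itself is recursive: subdirectories
    foreach (DirectoryInfo subdirectory in directory.GetDirectories())
        ScanDirectory(subdirectory);
}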
This makes sense because it's much easier than doing it iteratively. But otherwise, when you're just repeating an action over and over again, making it a recursion is completely unnecessary and will cause code inefficiency and eventually stack overflows.
This is a recursive call with apparently no exit criteria. Eventually it will run out of stack since a call to ManageCycle never returns.
In addition, the StartService method will never return; it ought to spin up at least one foreground thread and then return.
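To make the difference concrete, here is a small sketch that counts live call frames for both shapes of ManageCycle. It is written in portable C++ rather than C#, and it is not the service's code: `DepthProbe`, `FrameGuard` and the cycle limit are additions for illustration, the limit being there only so that the recursive version terminates.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Tracks how many instrumented frames are live and the deepest point reached.
struct DepthProbe {
    std::size_t current = 0;
    std::size_t deepest = 0;
};

// RAII guard: counts a frame on entry and removes it when the function returns.
class FrameGuard {
public:
    explicit FrameGuard(DepthProbe& probe) : probe_(probe) {
        ++probe_.current;
        probe_.deepest = std::max(probe_.deepest, probe_.current);
    }
    ~FrameGuard() { --probe_.current; }

private:
    DepthProbe& probe_;
};

// Recursive shape from the question, plus a cycle limit so the demo ends.
inline void ManageCycleRecursive(DepthProbe& probe, std::size_t cyclesLeft) {
    FrameGuard frame(probe);
    // ... one cycle of work would go here ...
    if (cyclesLeft <= 1) return;
    ManageCycleRecursive(probe, cyclesLeft - 1);  // one more frame per cycle
}

// Loop shape: the same number of cycles inside a single frame.
inline void ManageCycleLoop(DepthProbe& probe, std::size_t cycles) {
    FrameGuard frame(probe);
    for (std::size_t i = 0; i < cycles; ++i) {
        // ... one cycle of work would go here ...
    }
}

inline std::size_t DeepestRecursive(std::size_t cycles) {
    DepthProbe probe;
    ManageCycleRecursive(probe, cycles);
    return probe.deepest;
}

inline std::size_t DeepestLoop(std::size_t cycles) {
    DepthProbe probe;
    ManageCycleLoop(probe, cycles);
    return probe.deepest;
}
```

For 1000 cycles, `DeepestRecursive(1000)` reports 1000 live frames while `DeepestLoop(1000)` reports 1. Without the cycle limit the recursive depth keeps growing until the stack is exhausted, which is what kills the service.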
Recursion: it looks like it's recursively calling itself. I'm surprised there isn't a stack overflow exception. Perhaps the service properties on the machine running this are configured to restart the service on failure.
It's recursive alright. It will keep calling itself repeatedly (a bad thing) and that will result in a stack overflow.
What does the "//do some stuff" do? Maybe there is a good reason that it calls itself, but without a way to get out of the (recursive) loop, the application will exit.
