C# Performance benchmark

To keep track of performance in our software we measure the duration of the calls we are interested in. For example:
using (var performanceTrack = new PerformanceTracker("pt-1"))
{
    // do some stuff
    CallAnotherMethod();

    using (var anotherPerformanceTrack = new PerformanceTracker("pt-1a"))
    {
        // do stuff
        // .. do something
    }

    using (var anotherPerformanceTrackb = new PerformanceTracker("pt-1b"))
    {
        // do stuff
        // .. do something
    }

    // do more stuff
}
This will result in something like:
pt-1 [----------------------------] 28ms
  [--] 2ms from another method
  pt-1a [-----------] 11ms
  pt-1b [-------------] 13ms
In the constructor of PerformanceTracker I start a Stopwatch (as far as I know the most reliable way to measure a duration). In the Dispose method I stop the stopwatch and save the result to Application Insights.
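A minimal sketch of such a tracker (the class name comes from the question; the Console.WriteLine stands in for the Application Insights call):

using System;
using System.Diagnostics;

public sealed class PerformanceTracker : IDisposable
{
    private readonly string _name;
    private readonly Stopwatch _stopwatch;

    public PerformanceTracker(string name)
    {
        _name = name;
        _stopwatch = Stopwatch.StartNew(); // start timing in the constructor
    }

    public void Dispose()
    {
        _stopwatch.Stop(); // stop timing in Dispose and report the result
        Console.WriteLine(_name + ": " + _stopwatch.ElapsedMilliseconds + "ms");
    }
}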
I have noticed a lot of fluctuation between the results. To reduce it I've already done the following:
Run a release build, outside of Visual Studio.
Make a warm-up call first, which is not included in the statistics.
Call the garbage collector before every call (75 calls in total), as in the sketch below.
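The forced collection before each timed call might look like this (a sketch using the standard GC calls):

// Force a full, blocking collection and let finalizers run, so garbage
// from the previous iteration isn't collected in the middle of a measurement.
GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect(); // collect anything the finalizers freed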
After this the fluctuation is less, but the results are still not very consistent. For example, I have run my test set twice; here are the results in milliseconds:
Avg: 782.946666666667 vs 981.68
Min: 489 vs 513
Max: 2600 vs 4875
StdDev: 305.854933523003 vs 652.343471128764
Sample size: 75 vs 75
Why is the performance measurement with the stopwatch still giving a lot of variation in the results? I found on SO (https://stackoverflow.com/a/16157458/1408786) that I should maybe add the following to my code:
//prevent the JIT Compiler from optimizing Fkt calls away
long seed = Environment.TickCount;
//use the second Core/Processor for the test
Process.GetCurrentProcess().ProcessorAffinity = new IntPtr(2);
//prevent "Normal" Processes from interrupting Threads
Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.High;
//prevent "Normal" Threads from interrupting this thread
Thread.CurrentThread.Priority = ThreadPriority.Highest;
But the problem is, we have a lot of async code. How can I get a reliable performance track in the code? My aim is to detect performance degradation, for example when a method is 10ms slower after a check-in than before...

Related

Why is my application becoming less responsive over time?

I'm debugging a C# application that becomes almost unresponsive after a few days. The application calculates memory/CPU usage every second and displays it in the footer of the main UI.
The unresponsiveness is caused by the time it takes to fetch the RawValue of a PerformanceCounter ("Working Set - Private"). After a couple of days, it takes almost a second to fetch the RawValue, freezing the main UI thread. If I restart my computer, everything is fast again for a few days until it slowly becomes less responsive. If I recompile the application without the PerformanceCounter code (it's open source), it runs normally immediately.
To rule out that it's the application, here's some sample code that does the exact same thing:
static void Main(string[] args)
{
    using (var memoryWorkingSetCounter = new PerformanceCounter("Process", "Working Set - Private", Process.GetCurrentProcess().ProcessName, true))
    {
        while (!Console.KeyAvailable)
        {
            var memoryWorkingSetSw = new Stopwatch();
            memoryWorkingSetSw.Start();
            var memoryWorkingSetValue = memoryWorkingSetCounter.RawValue;
            memoryWorkingSetSw.Stop();
            Console.WriteLine(memoryWorkingSetValue.ToString());
            Console.WriteLine(memoryWorkingSetSw.Elapsed.ToString());
            Thread.Sleep(TimeSpan.FromSeconds(1));
        }
    }
    Console.Read();
}
I left this running for ~2.5 days and graphed the elapsed time in milliseconds (graph not included here).
What could cause a performance counter in Windows to become slow over time? Could another app be not cleaning it up? Is there a way to debug which apps are also looking at this performance counter? I'm on Windows 10.
Why do you declare the Stopwatch inside a loop?
Remove any new allocations from inside loops where possible.
When you allocate inside a loop, memory grows over time and you are relying on the garbage collector to clean it up.
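Rewritten along the lines of that comment, the loop from the question would hoist the Stopwatch and reuse it via Restart() (a sketch; memoryWorkingSetCounter is the counter from the question's code):

var memoryWorkingSetSw = new Stopwatch(); // one instance, reused every iteration
while (!Console.KeyAvailable)
{
    memoryWorkingSetSw.Restart();
    var memoryWorkingSetValue = memoryWorkingSetCounter.RawValue;
    memoryWorkingSetSw.Stop();
    Console.WriteLine(memoryWorkingSetValue);
    Console.WriteLine(memoryWorkingSetSw.Elapsed);
    Thread.Sleep(TimeSpan.FromSeconds(1));
}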

Memory leak in MediaPlayer

Can someone please explain why the following program runs out of memory?
class Program
{
    private static void ThreadRoutine()
    {
        System.Windows.Media.MediaPlayer player = new System.Windows.Media.MediaPlayer();
    }

    static void Main(string[] args)
    {
        Thread aThread;
        int iteration = 1;
        while (true)
        {
            aThread = new Thread(ThreadRoutine);
            aThread.Start();
            aThread.Join();
            Console.WriteLine("Iteration: " + iteration++);
        }
    }
}
To be fair, the specific exception I get is a System.ComponentModel.Win32Exception, "Not enough storage is available to process this command." The exception happens on the attempt to create a new MediaPlayer.
MediaPlayer does not implement the IDisposable interface so I'm not sure if there is other cleanup necessary. I certainly didn't find any in the MediaPlayer documentation.
I had an application long ago which rendered data to an image and placed the result in a PictureBox, like the pipeline of a graphics engine.
The thing it had in common with your case was a lack of memory; it went like this:
66%
67%
68%
69%
70%
71%
72%
73%
74%
70% --> Auto Garbage Collecting
71%
72%
73%
74%
75%
76%
77%
78%
79%
80%
81%
77% --> Auto Garbage Collecting
78%
79%
.
.
.
There was a lot of automatic garbage collecting at high memory usage, around 90% to 97%,
but it seemed that wasn't enough, and at some point around 97-98% the application crashed with a memory issue.
So I came up with the idea of calling the garbage collector every several frames:
GC.Collect();
GC.Collect itself is too heavy when there are too many objects in memory, but when you don't leave too many objects behind, it works smoothly.
There is also a lot to be said about finalizing objects, but I'm not sure it works the way people expect, as it usually ends in x = null, which means we break the link to that object, but that doesn't mean the object isn't still somewhere out in the galaxy. (For example, put a message in an object's destructor and you will see that after you drop the object it doesn't get destroyed immediately; it lives on until you close your application, call GC.Collect, run low on memory, or perhaps a random automatic collection runs.) Similarly, the using statement itself calls the Dispose method, and some Dispose implementations just set fields to null, so I'm not sure I should recommend writing a manual finalize method. There is also the Marshal class, which I don't know much about; I have seen some programmers use it to release objects (for example, Marshal.ReleaseComObject for COM objects), and it may really release the object from memory.
If you don't find anything more useful, my first option would be calling GC.Collect.
You can also set its parameters so that it collects less than it would normally, for example:
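A sketch of those overloads (generation number plus GCCollectionMode, both standard GC.Collect parameters):

// Collect only generation 0 (short-lived objects) instead of the whole heap.
GC.Collect(0);

// Or let the runtime decide whether a collection is worthwhile right now.
GC.Collect(2, GCCollectionMode.Optimized);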
https://msdn.microsoft.com/en-us/library/xe0c2357%28v=vs.110%29.aspx
Release resources in .Net C#
I also didn't read this fully, but it seems he has a thread and memory issue:
Freeing resources when thread is not alive
My guess is that the Thread objects created aren't garbage collected in time.
As you are stating in your comment that you are out of luck with respect to finding a solution that doesn't involve a memory leak, I'm posting this alternative as an answer even though it doesn't attempt to answer your original question.
If you use the ThreadPool instead, it (a) runs much faster and (b) does not crash.
class Program
{
    private static void ThreadRoutine(object state)
    {
        var player = new MediaPlayer();
        var iteration = (int)state;
        if (iteration % 1000 == 0)
        {
            Console.WriteLine("Executed: " + state);
        }
    }

    static void Main(string[] args)
    {
        for (int i = 0; i < 10000000; i++)
        {
            if (i % 1000 == 0)
            {
                Console.WriteLine("Queued: " + i);
            }
            ThreadPool.QueueUserWorkItem(ThreadRoutine, i);
        }
    }
}
On my machine, I have no problem queuing 10 million work items to the thread pool in a few seconds.
Queued: 9988000
Queued: 9989000
Queued: 9990000
Queued: 9991000
Executed: 9989000
Executed: 9990000
Executed: 9991000
Executed: 9988000
Queued: 9992000
Executed: 9992000
Queued: 9993000
Queued: 9994000
Queued: 9995000
Executed: 9994000
Executed: 9993000
Queued: 9996000
Executed: 9996000
Executed: 9995000
Queued: 9997000
Executed: 9997000
Queued: 9998000
Executed: 9998000
Queued: 9999000
Executed: 9999000
Press any key to continue . . .

Why is Parallel.ForEach much faster than AsParallel().ForAll() even though MSDN suggests otherwise?

I've been doing some investigation to see how we can create a multithreaded application that runs through a tree.
To find how this can be implemented in the best way I've created a test application that runs through my C:\ disk and opens all directories.
class Program
{
    static void Main(string[] args)
    {
        //var startDirectory = @"C:\The folder\RecursiveFolder";
        var startDirectory = @"C:\";

        var w = Stopwatch.StartNew();
        ThisIsARecursiveFunction(startDirectory);
        Console.WriteLine("Elapsed seconds: " + w.Elapsed.TotalSeconds);
        Console.ReadKey();
    }

    public static void ThisIsARecursiveFunction(String currentDirectory)
    {
        var lastBit = Path.GetFileName(currentDirectory);
        var depth = currentDirectory.Count(t => t == '\\');
        //Console.WriteLine(depth + ": " + currentDirectory);

        try
        {
            var children = Directory.GetDirectories(currentDirectory);

            //Edit this mode to switch what way of parallelization it should use
            int mode = 3;

            switch (mode)
            {
                case 1:
                    foreach (var child in children)
                    {
                        ThisIsARecursiveFunction(child);
                    }
                    break;
                case 2:
                    children.AsParallel().ForAll(t =>
                    {
                        ThisIsARecursiveFunction(t);
                    });
                    break;
                case 3:
                    Parallel.ForEach(children, t =>
                    {
                        ThisIsARecursiveFunction(t);
                    });
                    break;
                default:
                    break;
            }
        }
        catch (Exception eee)
        {
            //Exception might occur for directories that can't be accessed.
        }
    }
}
What I have encountered however is that when running this in mode 3 (Parallel.ForEach) the code completes in around 2.5 seconds (yes I have an SSD ;) ). Running the code without parallelization it completes in around 8 seconds. And running the code in mode 2 (AsParallel().ForAll()) it takes a near infinite amount of time.
When checking in process explorer I also encounter a few strange facts:
Mode1 (No Parallelization):
Cpu: ~25%
Threads: 3
Time to complete: ~8 seconds
Mode2 (AsParallel().ForAll()):
Cpu: ~0%
Threads: increasing by one per second (I find this strange, since it seems to be waiting on the other threads to complete, or on a roughly one-second timeout.)
Time to complete: 1 second per node, so about 3 days???
Mode3 (Parallel.ForEach()):
Cpu: 100%
Threads: At most 29-30
Time to complete: ~2.5 seconds
What I find especially strange is that Parallel.ForEach seems to ignore any parent threads/tasks that are still running while AsParallel().ForAll() seems to wait for the previous Task to either complete (which won't soon since all parent Tasks are still waiting on their child tasks to complete).
Also what I read on MSDN was: "Prefer ForAll to ForEach When It Is Possible"
Source: http://msdn.microsoft.com/en-us/library/dd997403(v=vs.110).aspx
Does anyone have a clue why this could be?
Edit 1:
As requested by Matthew Watson I've first loaded the tree in memory before looping through it. Now the loading of the tree is done sequentially.
The results however are the same. Unparallelized and Parallel.ForEach now complete the whole tree in about 0.05 seconds while AsParallel().ForAll still only goes around 1 step per second.
Code:
class Program
{
    private static DirWithSubDirs RootDir;

    static void Main(string[] args)
    {
        //var startDirectory = @"C:\The folder\RecursiveFolder";
        var startDirectory = @"C:\";

        Console.WriteLine("Loading file system into memory...");
        RootDir = new DirWithSubDirs(startDirectory);
        Console.WriteLine("Done");

        var w = Stopwatch.StartNew();
        ThisIsARecursiveFunctionInMemory(RootDir);
        Console.WriteLine("Elapsed seconds: " + w.Elapsed.TotalSeconds);
        Console.ReadKey();
    }

    public static void ThisIsARecursiveFunctionInMemory(DirWithSubDirs currentDirectory)
    {
        var depth = currentDirectory.Path.Count(t => t == '\\');
        Console.WriteLine(depth + ": " + currentDirectory.Path);

        var children = currentDirectory.SubDirs;

        //Edit this mode to switch what way of parallelization it should use
        int mode = 2;

        switch (mode)
        {
            case 1:
                foreach (var child in children)
                {
                    ThisIsARecursiveFunctionInMemory(child);
                }
                break;
            case 2:
                children.AsParallel().ForAll(t =>
                {
                    ThisIsARecursiveFunctionInMemory(t);
                });
                break;
            case 3:
                Parallel.ForEach(children, t =>
                {
                    ThisIsARecursiveFunctionInMemory(t);
                });
                break;
            default:
                break;
        }
    }
}

class DirWithSubDirs
{
    public List<DirWithSubDirs> SubDirs = new List<DirWithSubDirs>();
    public String Path { get; private set; }

    public DirWithSubDirs(String path)
    {
        this.Path = path;
        try
        {
            SubDirs = Directory.GetDirectories(path).Select(t => new DirWithSubDirs(t)).ToList();
        }
        catch (Exception eee)
        {
            //Ignore directories that can't be accessed
        }
    }
}
Edit 2:
After reading the update on Matthew's comment I've tried to add the following code to the program:
ThreadPool.SetMinThreads(4000, 16);
ThreadPool.SetMaxThreads(4000, 16);
This however does not change how AsParallel performs. Still, the first 8 steps are executed in an instant before slowing down to 1 step per second.
(Extra note: I'm currently ignoring the exceptions that occur when I can't access a directory, via the try-catch block around Directory.GetDirectories().)
Edit 3:
Also, what I'm mainly interested in is the difference between Parallel.ForEach and AsParallel.ForAll, because to me it's just strange that for some reason the second one creates one thread for every recursion it does, while the first one handles everything in around 30 threads max. (And also why MSDN suggests using AsParallel even though it creates so many threads with a ~1 second timeout.)
Edit 4:
Another strange thing I found out:
When I try to set the MinThreads on the Thread pool above 1023 it seems to ignore the value and scale back to around 8 or 16:
ThreadPool.SetMinThreads(1023, 16);
Still when I use 1023 it does the first 1023 elements very fast followed by going back to the slow pace I've been experiencing all the time.
Note: also, literally more than 1000 threads are now created (compared to 30 for the whole Parallel.ForEach run).
Does this mean Parallel.ForEach is just way smarter in handling tasks?
Some more info: this code prints 8 - 8 twice when you set the value above 1023. (When you set the values to 1023 or lower, it prints the correct values.)
int threadsMin;
int completionMin;
ThreadPool.GetMinThreads(out threadsMin, out completionMin);
Console.WriteLine("Cur min threads: " + threadsMin + " and the other thing: " + completionMin);
ThreadPool.SetMinThreads(1023, 16);
ThreadPool.SetMaxThreads(1023, 16);
ThreadPool.GetMinThreads(out threadsMin, out completionMin);
Console.WriteLine("Now min threads: " + threadsMin + " and the other thing: " + completionMin);
Edit 5:
As of Dean's request I've created another case to manually create tasks:
case 4:
    var taskList = new List<Task>();
    foreach (var todo in children)
    {
        var itemTodo = todo;
        taskList.Add(Task.Run(() => ThisIsARecursiveFunctionInMemory(itemTodo)));
    }
    Task.WaitAll(taskList.ToArray());
    break;
This is also as fast as the Parallel.ForEach() loop. So we still don't have the answer to why AsParallel().ForAll() is so much slower.
This problem is pretty debuggable, an uncommon luxury when you have problems with threads. Your basic tool here is the Debug > Windows > Threads debugger window. It shows you the active threads and gives you a peek at their stack traces. You'll easily see that, once it gets slow, you have dozens of threads active that are all stuck. Their stack traces all look the same:
mscorlib.dll!System.Threading.Monitor.Wait(object obj, int millisecondsTimeout, bool exitContext) + 0x16 bytes
mscorlib.dll!System.Threading.Monitor.Wait(object obj, int millisecondsTimeout) + 0x7 bytes
mscorlib.dll!System.Threading.ManualResetEventSlim.Wait(int millisecondsTimeout, System.Threading.CancellationToken cancellationToken) + 0x182 bytes
mscorlib.dll!System.Threading.Tasks.Task.SpinThenBlockingWait(int millisecondsTimeout, System.Threading.CancellationToken cancellationToken) + 0x93 bytes
mscorlib.dll!System.Threading.Tasks.Task.InternalRunSynchronously(System.Threading.Tasks.TaskScheduler scheduler, bool waitForCompletion) + 0xba bytes
mscorlib.dll!System.Threading.Tasks.Task.RunSynchronously(System.Threading.Tasks.TaskScheduler scheduler) + 0x13 bytes
System.Core.dll!System.Linq.Parallel.SpoolingTask.SpoolForAll<ConsoleApplication1.DirWithSubDirs,int>(System.Linq.Parallel.QueryTaskGroupState groupState, System.Linq.Parallel.PartitionedStream<ConsoleApplication1.DirWithSubDirs,int> partitions, System.Threading.Tasks.TaskScheduler taskScheduler) Line 172 C#
// etc..
Whenever you see something like this, you should immediately think fire-hose problem. Probably the third-most common bug with threads, after races and deadlocks.
Which you can reason out now that you know the cause: the problem with the code is that every thread that completes adds N more threads, where N is the average number of sub-directories in a directory. In effect, the number of threads grows exponentially, and that's always bad. It will only stay in control if N = 1, which of course never happens on a typical disk.
Do beware that, like almost any threading problem, this misbehavior tends not to reproduce consistently. The SSD in your machine tends to hide it. So does the RAM: the program might well complete quickly and trouble-free the second time you run it, since you'll then read from the file system cache instead of the disk, which is very fast. Tinkering with ThreadPool.SetMinThreads() hides it as well, but it cannot fix it. It never fixes any problem, it only hides it. Because no matter what happens, the exponential growth will always overwhelm the set minimum number of threads. You can only hope that the program finishes iterating the drive before that happens. An idle hope for a user with a big drive.
The difference between ParallelEnumerable.ForAll() and Parallel.ForEach() is now perhaps also easily explained. You can tell from the stack trace that ForAll() does something naughty: the RunSynchronously() method blocks until all the threads are completed. Blocking is something threadpool threads should not do; it gums up the thread pool and won't allow it to schedule the processor for another job. And it has the effect you observed: the thread pool is quickly overwhelmed with threads that are waiting on the N other threads to complete. Which isn't happening; they are waiting in the pool and are not getting scheduled because there are already so many of them active.
This is a deadlock scenario, a pretty common one, but the threadpool manager has a workaround for it. It watches the active threadpool threads and steps in when they don't complete in a timely manner. It then allows an extra thread to start, one more than the minimum set by SetMinThreads(), but not more than the maximum set by SetMaxThreads(); having too many active threadpool threads is risky and likely to trigger OOM. This does solve the deadlock; it gets one of the ForAll() calls to complete. But this happens at a very slow rate: the threadpool only does this twice a second. You'll run out of patience before it catches up.
Parallel.ForEach() doesn't have this problem; it doesn't block, so it doesn't gum up the pool.
That seems to be the solution, but do keep in mind that your program is still fire-hosing the memory of your machine, adding ever more waiting threadpool threads to the pool. This can crash your program as well; it just isn't as likely, because you have a lot of memory and the threadpool doesn't use much of it to keep track of a request. Some programmers manage to accomplish that as well, though.
The solution is a very simple one: just don't use threading. It is harmful; there is no concurrency when you have only one disk, and the disk does not like being commandeered by multiple threads. Especially on a spindle drive; head seeks are very, very slow. SSDs do it a lot better, but it still takes an easy 50 microseconds, overhead that you just don't want or need. The ideal number of threads to access a disk that you can't otherwise expect to be cached well is always one.
The first thing to note is that you are trying to parallelise an IO-bound operation, which will distort the timings significantly.
The second thing to note is the nature of the parallelised tasks: You are recursively descending a directory tree. If you create multiple threads to do this, each thread is likely to be accessing a different part of the disk simultaneously - which will cause the disk read head to be jumping all over the place and slowing things down considerably.
Try changing your test to create an in-memory tree, and access that with multiple threads instead. Then you will be able to compare the timings properly without the results being distorted beyond all usefulness.
Additionally, you may be creating a great number of threads, and they will (by default) be threadpool threads. Having a great number of threads will actually slow things down when they exceed the number of processor cores.
Also note that when you exceed the thread pool minimum threads (defined by ThreadPool.GetMinThreads()), a delay is introduced by the thread pool manager between each new threadpool thread creation. (I think this is around 0.5s per new thread).
Also, if the number of threads exceeds the value returned by ThreadPool.GetMaxThreads(), the creating thread will block until one of the other threads has exited. I think this is likely to be happening.
You can test this hypothesis by calling ThreadPool.SetMaxThreads() and ThreadPool.SetMinThreads() to increase these values, and see if it makes any difference.
(Finally, note that if you are really trying to recursively descend from C:\, you will almost certainly get an IO exception when it reaches a protected OS folder.)
NOTE: Set the max/min threadpool threads like this:
ThreadPool.SetMinThreads(4000, 16);
ThreadPool.SetMaxThreads(4000, 16);
Follow Up
I have tried your test code with the threadpool thread counts set as described above, with the following results (not run on the whole of my C:\ drive, but on a smaller subset):
Mode 1 took 6.5 seconds.
Mode 2 took 15.7 seconds.
Mode 3 took 16.4 seconds.
This is in line with my expectations; adding a load of threading to do this actually makes it slower than single-threaded, and the two parallel approaches take roughly the same time.
In case anyone else wants to investigate this, here's some deterministic test code (the OP's code is not reproducible because we don't know his directory structure).
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

namespace Demo
{
    internal class Program
    {
        private static DirWithSubDirs RootDir;

        private static void Main()
        {
            Console.WriteLine("Loading file system into memory...");
            RootDir = new DirWithSubDirs("Root", 4, 4);
            Console.WriteLine("Done");

            //ThreadPool.SetMinThreads(4000, 16);
            //ThreadPool.SetMaxThreads(4000, 16);

            var w = Stopwatch.StartNew();
            ThisIsARecursiveFunctionInMemory(RootDir);
            Console.WriteLine("Elapsed seconds: " + w.Elapsed.TotalSeconds);
            Console.ReadKey();
        }

        public static void ThisIsARecursiveFunctionInMemory(DirWithSubDirs currentDirectory)
        {
            var depth = currentDirectory.Path.Count(t => t == '\\');
            Console.WriteLine(depth + ": " + currentDirectory.Path);

            var children = currentDirectory.SubDirs;

            //Edit this mode to switch what way of parallelization it should use
            int mode = 3;

            switch (mode)
            {
                case 1:
                    foreach (var child in children)
                    {
                        ThisIsARecursiveFunctionInMemory(child);
                    }
                    break;
                case 2:
                    children.AsParallel().ForAll(t =>
                    {
                        ThisIsARecursiveFunctionInMemory(t);
                    });
                    break;
                case 3:
                    Parallel.ForEach(children, t =>
                    {
                        ThisIsARecursiveFunctionInMemory(t);
                    });
                    break;
                default:
                    break;
            }
        }
    }

    internal class DirWithSubDirs
    {
        public List<DirWithSubDirs> SubDirs = new List<DirWithSubDirs>();
        public String Path { get; private set; }

        public DirWithSubDirs(String path, int width, int depth)
        {
            this.Path = path;
            if (depth > 0)
                for (int i = 0; i < width; ++i)
                    SubDirs.Add(new DirWithSubDirs(path + "\\" + i, width, depth - 1));
        }
    }
}
The Parallel.For and .ForEach methods are implemented internally as roughly equivalent to running iterations in Tasks, e.g. a loop like:
Parallel.For(0, N, i =>
{
    DoWork(i);
});

is equivalent to:

var tasks = new List<Task>(N);
for (int i = 0; i < N; i++)
{
    tasks.Add(Task.Factory.StartNew(state => DoWork((int)state), i));
}
Task.WaitAll(tasks.ToArray());
And from the perspective of every iteration potentially running in parallel with every other iteration, this is an OK mental model, but it does not happen in reality. Parallel, in fact, does not necessarily use one Task per iteration, as that is significantly more overhead than necessary. Parallel.ForEach tries to use the minimum number of tasks necessary to complete the loop as fast as possible. It spins up tasks as threads become available to process those tasks, and each of those tasks participates in a management scheme (I think it's called chunking): a task asks for multiple iterations to be done, gets them, processes that work, and then goes back for more. The chunk sizes vary based on the number of tasks participating, the load on the machine, etc.
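To make the chunking idea concrete, here is a small sketch (not the OP's code) using the standard Partitioner.Create range overload, which hands each worker a block of iterations at a time instead of single items:

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class ChunkingDemo
{
    static void DoWork(int i) { /* placeholder for the loop body */ }

    static void Main()
    {
        const int N = 1000;
        // Partition [0, N) into ranges of 100; each task grabs a whole
        // range at a time, which is far cheaper than one task per iteration.
        Parallel.ForEach(Partitioner.Create(0, N, 100), range =>
        {
            for (int i = range.Item1; i < range.Item2; i++)
            {
                DoWork(i);
            }
        });
    }
}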
PLINQ’s .AsParallel() has a different implementation, but it ‘can’ still similarly fetch multiple iterations into a temporary store, do the calculations in a thread (but not as a task), and put the query results into a small buffer. (You get something based on ParallelQuery, and then further .Whatever() functions bind to an alternative set of extension methods that provide parallel implementations).
So now that we have a small idea of how these two mechanisms work, I will try to provide an answer to your original question:
So why is .AsParallel() slower than Parallel.ForEach? The reason stems from the following. Tasks (or their equivalent implementation here) do NOT block on I/O-like calls. They 'await' and free up the CPU to do something else. But (quoting the C# in a Nutshell book): "PLINQ cannot perform I/O-bound work without blocking threads". The calls are synchronous. They were written with the intention that you increase the degree of parallelism if (and ONLY if) you are doing such things as downloading web pages per task, things that do not hog CPU time.
And the reason why your function calls are exactly analogous to I/O-bound calls is this: one of your threads (call it T) blocks and does nothing until all of its child threads have finished, which can be a slow process here. T itself is not CPU-intensive while it waits for the children to unblock; it is doing nothing but waiting. Hence it is identical to a typical I/O-bound function call.
Based on the accepted answer to How exactly does AsParallel work?
.AsParallel.ForAll() casts back to IEnumerable before calling .ForAll()
so it creates 1 new thread + N recursive calls (each of which generates a new thread).

How to stop constant memory increase every time segment of code is run?

I have a program that runs some methods every 5 seconds in the background. However, every 5 seconds its physical memory usage jumps by 16-20 KB. By commenting out segments of code, I've narrowed the issue down to the segment below. What am I missing here to correctly release the allocated memory?
Loop segment from main method:
while (true)
{
    listMessages = FetchAllMessages();
    //Commented out other segments. Not causing memory increase
    System.Threading.Thread.Sleep(5000);
}
Method called:
public static List<Message> FetchAllMessages()
{
    try
    {
        using (Pop3Client client = new Pop3Client())
        {
            client.Connect("pop.gmail.com", 995, true);
            client.Authenticate("removed", "removed");
            int messageCount = client.GetMessageCount();
            List<Message> allMessages = new List<Message>(messageCount);
            for (int i = messageCount; i > 0; i--)
            {
                if (verifiedEmail.Contains(client.GetMessage(i).Headers.From.Address) || verifiedSms.Contains(client.GetMessage(i).Headers.From.Address))
                {
                    string tempMessage = client.GetMessage(i).ToMailMessage().Body.ToLower();
                    if (tempMessage.Contains("cmd") && tempMessage.Contains("fin"))
                    {
                        allMessages.Add(client.GetMessage(i));
                    }
                }
                client.DeleteMessage(i);
            }
            client.Disconnect();
            return allMessages;
        }
    }
    catch (Exception ex)
    {
        return null;
    }
}
One of the things that could be causing a steady increase in memory usage is that you're calling GetMessage so many times. Depending on how your POP client is written, that could be allocating a new buffer every time so that it can download the message from the POP server. That memory will of course be collected eventually, but you're exercising the garbage collector needlessly. And you're also being highly inefficient.
You should consider changing your code to something like this:
for (int i = messageCount; i > 0; i--)
{
    var msg = client.GetMessage(i);
    if (verifiedEmail.Contains(msg.Headers.From.Address)
        || verifiedSms.Contains(msg.Headers.From.Address))
    {
        string tempMessage = msg.ToMailMessage().Body.ToLower();
        if (tempMessage.Contains("cmd") && tempMessage.Contains("fin"))
        {
            allMessages.Add(msg);
        }
    }
    client.DeleteMessage(i);
}
So instead of calling client.GetMessage(i) four times, you call it only once.
It also makes the code easier to read.
That said, I think it's likely that your "memory leak" is just the GC taking its own sweet time in collecting memory.
One other thing. You have a sleep loop:
while (true)
{
    listMessages = FetchAllMessages();
    Thread.Sleep(5000);
}
You're tying up a thread that spends most of its time doing nothing. You'd be better off creating a timer with a 5 second interval, like this:
System.Threading.Timer MailTimer; // declare at class scope
// Do this in your initialization
MailTimer = new Timer(MessageFetcher, null, 5000, -1);
And your MessageFetcher method is:
void MessageFetcher(object state)
{
    listMessages = FetchAllMessages();
    // do that other stuff that you didn't show

    // reset the timer so that it fires 5 seconds from now
    MailTimer.Change(5000, -1);
}
The initialization creates a one-shot timer that expires in five seconds and calls MessageFetcher. When MessageFetcher is done, it sets a timer so that mail will be checked in another five seconds. You want to do it this way rather than setting a periodic interval, because you don't want the timer to call MessageFetcher again if the previous tick isn't done processing.
The MessageFetcher method is executed on a pool thread. Using the timer prevents you from having to keep a thread around all the time, occupying memory while it's doing essentially nothing.
As Paddy said, eventually garbage collection will get around to disposing the objects and releasing memory, but you CAN manually force it, although it's generally better to allow it to happen automatically.
But to test that garbage collection will reduce the memory, exit the while loop after several calls and call GC.Collect(); the memory should go down. (See the sketch below.)
Calling GC.Collect() is expensive, and that's why you're better off letting the runtime choose the optimal time to run garbage collection automatically.
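A sketch of that test, reusing FetchAllMessages from the question (GC.GetTotalMemory reports the size of the managed heap):

// Run a bounded number of iterations, then force a collection and compare.
List<Message> listMessages = null;
for (int i = 0; i < 10; i++)
{
    listMessages = FetchAllMessages();
    System.Threading.Thread.Sleep(5000);
}
Console.WriteLine("Before collect: " + GC.GetTotalMemory(false) + " bytes");
GC.Collect();
GC.WaitForPendingFinalizers();
Console.WriteLine("After collect: " + GC.GetTotalMemory(true) + " bytes");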
This is a great answer to a similar question regarding your concerns: C# Garbage collection
Who knows? It's not deterministic. Think of it like this: on a system
with infinite memory, the garbage collector doesn't have to do
anything. And you might think that's a bad example, but that's what
the garbage collector is simulating for you: a system with infinite
memory. Because on a system with sufficiently more memory available
than required by your program, the garbage collector never has to run.
Consequently, your program can not make any assumptions about when
memory will (if ever) be collected.
So, the answer to your question is: we don't know.
I would recommend setting up a memory profiler to log the memory consumption of your application and run it for a while to test. You should see that the Garbage Collector will keep things under control automatically without you having to change anything with your code.

Performance Counter - System.InvalidOperationException: Category does not exist

I have the following class that returns the number of current requests per second for IIS. I call RefreshCounters every minute in order to keep the requests-per-second value fresh (because it is an average, and if I keep the counter too long, old values influence the result too much). When I need to display the current RequestsPerSecond, I call that property.
public class Counters
{
    private static PerformanceCounter pcReqsPerSec;
    private const string counterKey = "Requests_Sec";

    public static object RequestsPerSecond
    {
        get
        {
            lock (counterKey)
            {
                if (pcReqsPerSec != null)
                    return pcReqsPerSec.NextValue().ToString("N2"); // EXCEPTION
                else
                    return "0";
            }
        }
    }

    internal static string RefreshCounters()
    {
        lock (counterKey)
        {
            try
            {
                if (pcReqsPerSec != null)
                {
                    pcReqsPerSec.Dispose();
                    pcReqsPerSec = null;
                }
                pcReqsPerSec = new PerformanceCounter("W3SVC_W3WP", "Requests / Sec", "_Total", true);
                pcReqsPerSec.NextValue();
                PerformanceCounter.CloseSharedResources();
                return null;
            }
            catch (Exception ex)
            {
                return ex.ToString();
            }
        }
    }
}
The problem is that following Exception is sometimes thrown:
System.InvalidOperationException: Category does not exist.
at System.Diagnostics.PerformanceCounterLib.GetCategorySample(String machine, String category)
at System.Diagnostics.PerformanceCounter.NextSample()
at System.Diagnostics.PerformanceCounter.NextValue()
at BidBop.Admin.PerfCounter.Counters.get_RequestsPerSecond(), at the line pcReqsPerSec.NextValue().ToString("N2");
Am I not closing previous instances of PerformanceCounter properly? What am I doing wrong so that I end up with that exception sometimes?
EDIT:
And just for the record, I am hosting this class in IIS website (that is, of course, hosted in App Pool which has administrative privileges) and invoking methods from ASMX service. Site that uses Counter values (displays them) calls RefreshCounters every 1 minute and RequestsPerSecond every 5 seconds; RequestPerSecond are cached between calls.
I am calling RefreshCounters every 1 minute because values tend to become "stale" - too influenced by older values (that were actual 1 minute ago, for example).
Antenka has led you in a good direction here. You should not be disposing and re-creating the performance counter on every update/request for a value. There is a cost to instantiating performance counters, and the first read can be inaccurate, as indicated in the quote below. Also, your lock() { ... } statements are very broad (they cover a lot of statements) and will be slow. It's better to keep your locks as small as possible. I'm giving Antenka a vote up for the quality reference and good advice!
However, I think I can provide a better answer for you. I have a fair bit of experience with monitoring server performance and understand exactly what you need. One problem your code doesn't take into account is that whatever code is displaying your performance counter (.aspx, .asmx, console app, winform app, etc.) could be requesting this statistic at any rate; it could be requested once every 10 seconds, maybe 5 times per second, you don't know and shouldn't care. So you need to separate the PerformanceCounter collection code that does the monitoring from the code that actually reports the current Requests / Sec value. And for performance reasons, I'm also going to show you how to set up the performance counter on first request and then keep it going until nobody has made any requests for 5 seconds; after that we close/dispose the PerformanceCounter properly.
public class RequestsPerSecondCollector
{
    #region General Declaration
    //Static Stuff for the polling timer
    private static System.Threading.Timer pollingTimer;
    private static int stateCounter = 0;
    private static int lockTimerCounter = 0;

    //Instance Stuff for our performance counter
    private static System.Diagnostics.PerformanceCounter pcReqsPerSec;
    private readonly static object threadLock = new object();
    private static decimal CurrentRequestsPerSecondValue;
    private static int LastRequestTicks;
    #endregion

    #region Singleton Implementation
    /// <summary>
    /// Static members are 'eagerly initialized', that is,
    /// immediately when class is loaded for the first time.
    /// .NET guarantees thread safety for static initialization.
    /// </summary>
    private static readonly RequestsPerSecondCollector _instance = new RequestsPerSecondCollector();
    #endregion

    #region Constructor/Finalizer
    /// <summary>
    /// Private constructor for static singleton instance construction, you won't be able to instantiate this class outside of itself.
    /// </summary>
    private RequestsPerSecondCollector()
    {
        LastRequestTicks = System.Environment.TickCount;

        // Start things up by making the first request.
        GetRequestsPerSecond();
    }
    #endregion

    #region Getter for current requests per second measure
    public static decimal GetRequestsPerSecond()
    {
        if (pollingTimer == null)
        {
            Console.WriteLine("Starting Poll Timer");

            // Let's check the performance counter every 1 second, and don't do the first time until after 1 second.
            pollingTimer = new System.Threading.Timer(OnTimerCallback, null, 1000, 1000);

            // The first read from a performance counter is notoriously inaccurate, so
            OnTimerCallback(null);
        }

        LastRequestTicks = System.Environment.TickCount;
        lock (threadLock)
        {
            return CurrentRequestsPerSecondValue;
        }
    }
    #endregion

    #region Polling Timer
    static void OnTimerCallback(object state)
    {
        if (System.Threading.Interlocked.CompareExchange(ref lockTimerCounter, 1, 0) == 0)
        {
            if (pcReqsPerSec == null)
                pcReqsPerSec = new System.Diagnostics.PerformanceCounter("W3SVC_W3WP", "Requests / Sec", "_Total", true);

            if (pcReqsPerSec != null)
            {
                try
                {
                    lock (threadLock)
                    {
                        CurrentRequestsPerSecondValue = Convert.ToDecimal(pcReqsPerSec.NextValue().ToString("N2"));
                    }
                }
                catch (Exception)
                {
                    // We had problem, just get rid of the performance counter and we'll rebuild it next revision
                    if (pcReqsPerSec != null)
                    {
                        pcReqsPerSec.Close();
                        pcReqsPerSec.Dispose();
                        pcReqsPerSec = null;
                    }
                }
            }

            stateCounter++;

            //Check every 5 seconds or so if anybody is still monitoring the server PerformanceCounter, if not shut down our PerformanceCounter
            if (stateCounter % 5 == 0)
            {
                if (System.Environment.TickCount - LastRequestTicks > 5000)
                {
                    Console.WriteLine("Stopping Poll Timer");

                    pollingTimer.Dispose();
                    pollingTimer = null;

                    if (pcReqsPerSec != null)
                    {
                        pcReqsPerSec.Close();
                        pcReqsPerSec.Dispose();
                        pcReqsPerSec = null;
                    }
                }
            }

            System.Threading.Interlocked.Add(ref lockTimerCounter, -1);
        }
    }
    #endregion
}
OK, now for some explanation.
First, you'll notice this class is designed to be a static singleton. You can't load multiple copies of it; it has a private constructor and an eagerly initialized internal instance of itself. This makes sure you don't accidentally create multiple copies of the same PerformanceCounter.
Next, you'll notice in the private constructor (this will only run once, when the class is first accessed) we create both the PerformanceCounter and a timer which will be used to poll the PerformanceCounter.
The timer's callback method will create the PerformanceCounter if needed and get its next value when available. Also, every 5 iterations we're going to see how long it's been since your last request for the PerformanceCounter's value. If it's been more than 5 seconds, we'll shut down the polling timer as it's unneeded at the moment. We can always start it up again later if we need it again.
Now we have a static method called GetRequestsPerSecond() for you to call, which will return the current value of the RequestsPerSecond PerformanceCounter.
The benefits of this implementation are that you only create the performance counter once and then keep using it until you are finished with it. It's easy to use because you simply call RequestsPerSecondCollector.GetRequestsPerSecond() from wherever you need it (.aspx, .asmx, console app, winforms app, etc). There will always be only one PerformanceCounter, and it will always be polled exactly once per second regardless of how quickly you call RequestsPerSecondCollector.GetRequestsPerSecond(). It will also automatically close and dispose of the PerformanceCounter if you haven't requested its value in more than 5 seconds. Of course you can adjust both the timer interval and the timeout milliseconds to suit your needs; you could poll faster and time out in, say, 60 seconds instead of 5. I chose 5 seconds as it proves that it works very quickly while debugging in Visual Studio. Once you test it and know it works, you might want a longer timeout.
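For example, the consuming code only needs the one static call (a usage sketch):

// Wherever the value is needed (.aspx code-behind, .asmx method, etc.):
decimal requestsPerSecond = RequestsPerSecondCollector.GetRequestsPerSecond();
Console.WriteLine("Current requests/sec: " + requestsPerSecond);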
Hopefully this helps you not only better use PerformanceCounters, but also feel safe to reuse this class which is separate from whatever you want to display the statistics in. Reusable code is always a plus!
EDIT: As a follow-up question, what if you want to perform some cleanup or babysitting task every 60 seconds while this performance counter is running? Well, we already have the timer running every 1 second and a variable tracking our loop iterations called stateCounter, which is incremented on each timer callback. So you could add in some code like this:
// Every 60 seconds I want to close/dispose my PerformanceCounter
if (stateCounter % 60 == 0)
{
    if (pcReqsPerSec != null)
    {
        pcReqsPerSec.Close();
        pcReqsPerSec.Dispose();
        pcReqsPerSec = null;
    }
}
I should point out that the performance counter in this example should not "go stale". I believe "Requests / Sec" is an average, not a moving-average statistic. But this sample just illustrates a way you could do any type of cleanup or "babysitting" of your PerformanceCounter on a regular time interval. In this case we are closing and disposing the performance counter, which will cause it to be recreated on the next timer callback. You could modify this for your use case and according to the specific PerformanceCounter you are using. Most people reading this question/answer should not need to do this. Check the documentation for your desired PerformanceCounter to see if it is a continuous count, an average, a moving average, etc., and adjust your implementation appropriately.
I don't know if this applies to you, but I've read the article on the PerformanceCounter.NextValue method, and there was a comment:
// If the category does not exist, create the category and exit.
// Performance counters should not be created and immediately used.
// There is a latency time to enable the counters, they should be created
// prior to executing the application that uses the counters.
// Execute this sample a second time to use the category.
So I have a question which may lead to the answer: isn't the call to the RequestsPerSecond method happening too early?
Also, I would suggest you try checking whether the category exists and log the info somewhere, so we can analyze it and determine under which conditions and how often it happens.
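A sketch of that guard, using the standard PerformanceCounterCategory.Exists API (the Console.WriteLine is placeholder logging; pcReqsPerSec is the field from the question):

// Before (re)creating the counter, check that the category is registered;
// if it isn't, log it and skip this refresh instead of letting NextValue() throw.
if (!PerformanceCounterCategory.Exists("W3SVC_W3WP"))
{
    Console.WriteLine(DateTime.Now + ": category W3SVC_W3WP does not exist");
    return "Category W3SVC_W3WP does not exist";
}
pcReqsPerSec = new PerformanceCounter("W3SVC_W3WP", "Requests / Sec", "_Total", true);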
I just solved this type of error or exception with:
Using,
new PerformanceCounter("Processor Information", "% Processor Time", "_Total");
Instead of,
new PerformanceCounter("Processor", "% Processor Time", "_Total");
I had an issue retrieving requests per second on IIS using code similar to the following
var pc = new PerformanceCounter();
pc.CategoryName = @"W3SVC_W3WP";
pc.InstanceName = @"_Total";
pc.CounterName = @"Requests / Sec";
Console.WriteLine(pc.NextValue());
This would sometimes throw InvalidOperationException, and I was able to reproduce the exception by restarting IIS. If I run with a non-warmed-up IIS, e.g. after a laptop reboot or an IIS restart, then I get this exception. Hit the website first, make any HTTP request beforehand, wait a second or two, and I don't get the exception. This smells like the performance counters being cached, and when idle they get dumped and take a while to re-cache (or something similar).
Update 1: Initially, when I manually browsed to the website and warmed it up, it solved the problem. I've tried programmatically warming up the server with new WebClient().DownloadString() plus Thread.Sleep() of up to 3000ms, and this has not worked. So my result of manually warming up the server might somehow be a false positive. I'm leaving my answer here because it might be the cause (i.e. manual warming up), and maybe someone else can elaborate further.
Update 2: Ah, OK, here are some unit tests that summarise some learning from further experimenting I did yesterday. (There's not a lot on Google on this subject, btw.)
As far as I can reason, the following statements might be true (and I submit the unit tests underneath as evidence). I may have misinterpreted the results, so please double-check ;-D
1. Creating a performance counter and calling NextValue before the category exists, e.g. querying an IIS counter while IIS is cold and no process is running, will throw an InvalidOperationException "category does not exist". (I assume this is true for all counters, not just IIS.)
2. From within a Visual Studio unit test, once your counter throws an exception, if you subsequently warm up the server after the first exception and create a new PerformanceCounter and query again, it will still throw an exception! (This one was a surprise; I assume this is because of some singleton action. My apologies, I have not had enough time to decompile the sources to investigate further before posting this reply.)
3. In 2 above, if you mark the unit test with [STAThread], then I was able to create a new PerformanceCounter after one has failed. (This might have something to do with performance counters possibly being singletons? Needs further testing.)
4. No pause was required for me before creating the counter and using it, despite some warnings in the MSDN sample code documentation, other than the time it takes to create the performance counter itself before calling NextValue(). In my case, the way to warm up the counter and bring the "category" into existence was to fire one shot across the bow of IIS, i.e. make a single GET request, and voilà, I no longer get "InvalidOperationException". This seems to be a reliable fix for me, for now, at least when querying IIS performance counters.
[Test, Ignore("Run manually AFTER restarting IIS with 'iisreset' at cmd prompt.")]
public void CreatingPerformanceCounterBeforeWarmingUpServerThrowsException()
{
    Console.WriteLine("Given a webserver that is cold");
    Console.WriteLine("When I create a performance counter and read next value");
    using (var pc1 = new PerformanceCounter())
    {
        pc1.CategoryName = @"W3SVC_W3WP";
        pc1.InstanceName = @"_Total";
        pc1.CounterName = @"Requests / Sec";
        Action action1 = () => pc1.NextValue();
        Console.WriteLine("Then InvalidOperationException will be thrown");
        action1.ShouldThrow<InvalidOperationException>();
    }
}

[Test, Ignore("Run manually AFTER restarting IIS with 'iisreset' at cmd prompt.")]
public void CreatingPerformanceCounterAfterWarmingUpServerDoesNotThrowException()
{
    Console.WriteLine("Given a webserver that has been warmed up");
    using (var client = new WebClient())
    {
        client.DownloadString("http://localhost:8082/small1.json");
    }
    Console.WriteLine("When I create a performance counter and read next value");
    using (var pc2 = new PerformanceCounter())
    {
        pc2.CategoryName = @"W3SVC_W3WP";
        pc2.InstanceName = @"_Total";
        pc2.CounterName = @"Requests / Sec";
        float? result = null;
        Action action2 = () => result = pc2.NextValue();
        Console.WriteLine("Then InvalidOperationException will not be thrown");
        action2.ShouldNotThrow();
        Console.WriteLine("And the counter value will be returned");
        result.HasValue.Should().BeTrue();
    }
}
Just out of curiosity, what do you have set for properties in Visual Studio? In VS, go to Project Properties, Build, Platform target and change it to AnyCPU. I have seen it before where performance counters aren't always retrieved when it is set to x86, and changing it to AnyCPU could fix it.
