Please ignore code readability in this question.
In terms of performance, should the following code be written like this:
int maxResults = criteria.MaxResults;
if (maxResults > 0)
{
while (accounts.Count > maxResults)
accounts.RemoveAt(maxResults);
}
or like this:
if (criteria.MaxResults > 0)
{
while (accounts.Count > criteria.MaxResults)
accounts.RemoveAt(criteria.MaxResults);
}
?
Edit: criteria is a class, and MaxResults is a simple integer property (i.e., public int MaxResults { get { return _maxResults; } }).
Does the C# compiler treat MaxResults as a black box and evaluate it every time? Or is it smart enough to figure out that I've got 3 calls to the same property with no modification of that property between the calls? What if MaxResults was a field?
One of the laws of optimization is precalculation, so I instinctively wrote this code like the first listing, but I'm curious if this kind of thing is being done for me automatically (again, ignore code readability).
(Note: I'm not interested in hearing the 'micro-optimization' argument, which may be valid in the specific case I've posted. I'd just like some theory behind what's going on or not going on.)
First off, the only way to actually answer performance questions is to actually try it both ways and test the results in realistic conditions.
That said, the other answers which say that "the compiler" does not do this optimization because the property might have side effects are both right and wrong. The problem with the question (aside from the fundamental problem that it simply cannot be answered without actually trying it and measuring the result) is that "the compiler" is actually two compilers: the C# compiler, which compiles to MSIL, and the JIT compiler, which compiles IL to machine code.
The C# compiler never ever does this sort of optimization; as noted, doing so would require that the compiler peer into the code being called and verify that the result it computes does not change over the lifetime of the callee's code. The C# compiler does not do so.
The JIT compiler might. No reason why it couldn't. It has all the code sitting right there. It is completely free to inline the property getter, and if the jitter determines that the inlined property getter returns a value that can be cached in a register and re-used, then it is free to do so. (If you don't want it to do so because the value could be modified on another thread then you already have a race condition bug; fix the bug before you worry about performance.)
Whether the jitter actually does inline the property fetch and then enregister the value, I have no idea. I know practically nothing about the jitter. But it is allowed to do so if it sees fit. If you are curious about whether it does so or not, you can either (1) ask someone who is on the team that wrote the jitter, or (2) examine the jitted code in the debugger.
And finally, let me take this opportunity to note that computing results once, storing the result and re-using it is not always an optimization. This is a surprisingly complicated question. There are all kinds of things to optimize for:
execution time
executable code size -- this has a major effect on execution time because big code takes longer to load, increases the working set size, puts pressure on processor caches, RAM and the page file. Small slow code is often in the long run faster than big fast code in important metrics like startup time and cache locality.
register allocation -- this also has a major effect on execution time, particularly in architectures like x86 which have a small number of available registers. Enregistering a value for fast re-use can mean that there are fewer registers available for other operations that need optimization; perhaps optimizing those operations instead would be a net win.
and so on. It gets real complicated real fast.
In short, you cannot possibly know whether writing the code to cache the result rather than recomputing it is actually (1) faster, or (2) better performing. Better performance does not always mean making execution of a particular routine faster. Better performance is about figuring out what resources are important to the user -- execution time, memory, working set, startup time, and so on -- and optimizing for those things. You cannot do that without (1) talking to your customers to find out what they care about, and (2) actually measuring to see if your changes are having a measurable effect in the desired direction.
If MaxResults is a property then no, it will not optimize it, because the getter may have complex logic, say:
private int _maxResults;
public int MaxResults {
get { return _maxResults++; }
set { _maxResults = value; }
}
See how the behavior would change if the compiler cached the result of that getter for you?
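To make that concrete, here's a small, contrived sketch along those lines: because every read of this getter has a side effect, three reads yield three different values, so hoisting the call into a cached local would silently change the program's behavior.

```csharp
using System;

class Criteria
{
    private int _maxResults;
    // Contrived side-effecting getter: every read increments the backing field.
    public int MaxResults => _maxResults++;
}

class Program
{
    static void Main()
    {
        var criteria = new Criteria();
        // Three reads return three different values.
        Console.WriteLine(criteria.MaxResults); // 0
        Console.WriteLine(criteria.MaxResults); // 1
        Console.WriteLine(criteria.MaxResults); // 2
        // Caching the first read into a local would have printed 0 three times,
        // so a compiler cannot safely hoist the call without proving purity.
    }
}
```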
If there's no logic in the getter, either version you wrote is fine; it's a very minute difference, and it's all about how readable it is TO YOU (or your team) -- you're the one looking at it.
Your two code samples are only guaranteed to have the same result in single-threaded environments (which .NET isn't), and only if MaxResults is a field (not a property). The compiler can't assume, unless you use the synchronization features, that criteria.MaxResults won't change during the course of your loop. If it's a property, it can't assume that using the property doesn't have side effects.
Eric Lippert points out quite correctly that it depends a lot on what you mean by "the compiler". The C# -> IL compiler? Or the IL -> machine code (JIT) compiler? And he's right to point out that the JIT may well be able to optimize the property getter, since it has all of the information (whereas the C# -> IL compiler doesn't, necessarily). It won't change the situation with multiple threads, but it's a good point nonetheless.
It will be called and evaluated every time. The compiler has no way of determining if a method (or getter) is deterministic and pure (no side effects).
Note that actual evaluation of the property may be inlined by the JIT compiler, making it effectively as fast as a simple field.
It's good practice to make property evaluation an inexpensive operation. If you do some heavy calculation in the getter, consider caching the result manually, or changing it to a method.
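A minimal manual-caching sketch (the Report class and its sorting step are made up purely for illustration): the heavy work runs at most once, and every later read of the property returns the memoized result.

```csharp
using System;

// Hypothetical example: caching an expensive computation behind a cheap property.
class Report
{
    private readonly int[] _data;
    private int[] _sorted;            // computed at most once, then reused

    public Report(int[] data) => _data = data;

    public int[] Sorted
    {
        get
        {
            if (_sorted == null)
            {
                _sorted = (int[])_data.Clone();
                Array.Sort(_sorted);  // the "expensive" part runs only once
            }
            return _sorted;
        }
    }
}

class Program
{
    static void Main()
    {
        var report = new Report(new[] { 3, 1, 2 });
        Console.WriteLine(string.Join(",", report.Sorted)); // 1,2,3
        // A second read returns the same cached array; no re-sort happens.
        Console.WriteLine(ReferenceEquals(report.Sorted, report.Sorted)); // True
    }
}
```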
why not test it?
just set up 2 console apps, make them loop 10 million times and compare the results... remember to run them as properly released apps that have been installed properly, or else you cannot guarantee that you are not just running the MSIL.
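That experiment can be sketched in a single console app with both loops (a Release build run outside the debugger is essential, and printing the accumulated sum keeps the loops from being optimized away; the Criteria class here is a stand-in, not the asker's actual type):

```csharp
using System;
using System.Diagnostics;

class Criteria
{
    public int MaxResults { get; set; } = 100;
}

class Program
{
    const int Iterations = 10_000_000;

    static void Main()
    {
        var criteria = new Criteria();
        long sum = 0;

        // Version 1: cache the property into a local once, before the loop.
        var sw = Stopwatch.StartNew();
        int max = criteria.MaxResults;
        for (int i = 0; i < Iterations; i++)
            sum += max;
        sw.Stop();
        Console.WriteLine($"cached local      : {sw.ElapsedMilliseconds} ms");

        // Version 2: read the property on every iteration.
        sw.Restart();
        for (int i = 0; i < Iterations; i++)
            sum += criteria.MaxResults;
        sw.Stop();
        Console.WriteLine($"property each time: {sw.ElapsedMilliseconds} ms");

        Console.WriteLine(sum); // keep sum observable so the loops aren't eliminated
    }
}
```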
Really, you are probably going to get about 5 answers saying "you shouldn't worry about optimisation". They clearly do not write routines that need to be as fast as possible before being readable (e.g. games).
If this piece of code is part of a loop that is executed billions of times, then this optimisation could be worthwhile. For instance, MaxResults could be an overridden virtual property, in which case you may also need to consider the cost of virtual method calls.
Really, the ONLY way to answer any of these questions is to figure out whether this is a piece of code that will benefit from optimisation. Then you need to know what kinds of things are increasing the time to execute. We mere mortals cannot really do this a priori, so we have to simply try 2-3 different versions of the code and test them.
If criteria is a class type, I doubt it would be optimized, because another thread could always change that value in the meantime. For structs I'm not sure, but my gut feeling is that it won't be optimized, but I think it wouldn't make much difference in performance in that case anyhow.
I am very confused by what I am seeing in my program.
Let's say we have a list of two large objects (loaded from 2 external files).
Then I iterate over each object and for each one I call a method, which performs a bunch of processing.
Just to illustrate:
foreach (var object in objects)
{
object.DoSomething();
}
In the first case, objects contains 2 items. It completes very fast; I track the progress of each object individually, and the processing for each one is very fast.
Then I run the program again, this time adding some more input files, so instead of 2, I'd have let's say 6 objects.
So the code runs again, and the 2 objects from before are still there, along with some more, but for some odd reason, now each processing (each call to object.DoSomething()) takes much longer than before.
Let's say in scenario 1, with 2 objects, objectA.DoSomething() takes 1 minute to complete.
In scenario 2, with 6 objects, the same objectA.DoSomething() as in scenario 1 now takes 5 minutes to complete.
The more objects I have in my list, the longer each processing for each individual object takes.
How is that possible? How can the performance of an individual processing for a specific, independent object, be affected so much by objects in the memory? How can, in scenario 1 and 2 above, the exact same processing on the exact same data take a significantly different amount of time to complete?
Also, please note that processing is slower from the start; it does not start fast on the first object and then slow down progressively, it's just consistently slowed down in proportion to the number of objects to process. I have some multi-threading in there, and I can see the rate at which threads complete drops dramatically when I start adding more objects. The multi-threading happens inside DoSomething() and it will not return until all threads have completed. However, I don't think this issue is related to multi-threading. Actually, I added the multi-threading because of the slowness.
Also please note that initially I was merging all input files into one huge object and one single call to DoSomething(), and I broke it down thinking it would help performance.
Is this a "normal" behavior and if so, what are the ways around this? I can think of other ways to process the data, but I still don't get this behavior and there has to be something I can do to get the intended result here.
Edit 1:
Each object in the "objects" list above also contains a list (queue) of smaller objects, around 5000 of those each. I am starting to believe my issue might be that, and that I should use structs or something similar instead of having so many nested objects. Would that explain the type of behavior I am describing above?
As stated in the comments, my question was too abstract for any precise answer to be given. I mostly wanted some pointers and to know if somehow I might have hit some internal limit.
It turned out I was overlooking a separate mechanism I have for logging results internally and producing reports. I built that part of the system really quickly and it was ridiculously inefficient and growing way too fast. Limiting the size of the internal structures, limiting the amount of retrievals from big collections and breaking down the processing in smaller chunks did the trick.
Just to illustrate, something that was taking over 6 hours is now taking 1 minute. Shame on me. Cleaner solution would be to use a database, but at least it seems I will be getting away with this one for now.
Is there an easy way of JIT-ing C# code up front, rather than waiting for the first time the code is invoked? I have read about NGEN but I don't think that's going to help me.
My application waits and responds to a specific external event that comes from a UDP port, and none of the critical-path code is (a) run before the event arrives, or (b) ever run again, so the cost of JIT is high in this scenario. Checking with ANTS profiler the overhead for JIT is around 40-50%, sometimes it's as high as 90%. My application is highly latency sensitive, and every millisecond counts.
My initial thought is that I could add a bool parameter to every critical path method, and call those methods before the event occurs, in order to initiate the JIT compile. However, is there a prettier and less hacky way?
Many thanks
I'd say use NGEN and if it doesn't work, you likely have deeper problems.
But, to answer your question, this article on how to pre-jit uses System.Runtime.CompilerServices.RuntimeHelpers.PrepareMethod to force a JIT. It includes sample code to use reflection to get the method handles.
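A minimal sketch of that approach, using RuntimeHelpers.PrepareMethod over reflection to force JIT compilation before the first real call (the CriticalPathHandler type in the usage comment is hypothetical):

```csharp
using System;
using System.Reflection;
using System.Runtime.CompilerServices;

static class PreJit
{
    // Force the JIT to compile every concrete method declared on a type up
    // front, so the first real call doesn't pay the compilation cost.
    public static void PrepareAllMethods(Type type)
    {
        const BindingFlags All = BindingFlags.Public | BindingFlags.NonPublic |
                                 BindingFlags.Instance | BindingFlags.Static |
                                 BindingFlags.DeclaredOnly;
        foreach (MethodInfo method in type.GetMethods(All))
        {
            // Skip methods the JIT cannot compile without more information.
            if (method.IsAbstract || method.ContainsGenericParameters)
                continue;
            RuntimeHelpers.PrepareMethod(method.MethodHandle);
        }
    }
}

// Usage, e.g. at startup before the UDP event can possibly arrive:
// PreJit.PrepareAllMethods(typeof(CriticalPathHandler));  // hypothetical type
```

Note that generic methods need the PrepareMethod overload that takes an explicit instantiation; the sketch above simply skips them.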
What happens the second time the event arrives? Is it faster then, or just as slow? If it's still slow, then JIT is not the problem, because the code gets JITed only once, the first time it is run.
NGEN would provide the answer for you. My suggestion is to take the bare minimum of the code you need, the critical path if you will, and put it in dummy/sandbox project. Start profiling/NGenning this code and seeing the performance.
If this bare-minimum code performs poorly on multiple calls even after being NGEN'ed, then pre-compiling isn't going to help you. It's something else in the code that is causing the performance bottlenecks.
What is better/preferred - Creating a method of 2 lines which accepts a web control as a parameter, operates on it and is called from 3-4 places within the same code file or writing those 2 lines at the 3-4 places and not creating the method?
P.S. The control I am referring here is a textbox.
All it is passing is a reference. There will be no significant cost to this whatsoever. If the method is small and linear, the JIT may even choose to inline it - but ultimately, this is not going to make any difference.
Stick with the method approach - then you only have one place to maintain.
For maintainability it is better to break out the lines to a method.
Performance wise you will not notice any difference at all.
Unless you're working on code that needs to run in a nuclear power plant or a NASA rover on Mars, you're always better off writing code that is easier for YOU to maintain! And that means refactoring your code so you never repeat yourself.
Theoretically, of course, it's faster to have the instructions inline rather than call a method, but in practice the maintenance benefits far outweigh that cost.
The DRY (Don't Repeat Yourself) principle says that you create the method and call it the 3-4 times. The performance hit is so minimal that it really isn't worth thinking about unless the consequence of those billionths of a second outweigh the additional overhead of maintaining the code 3-4 times over.
In short, unless you can really really really justify the additional maintenance over grabbing every last processor cycle* overhead then create the method.
(*) Given this is a textbox and therefore most likely a business application your biggest performance worries more likely are databases and webservices.
Careful where the method that modifies the control resides. If the method is part of another class, then passing controls to it will break the encapsulation of the class that owns the control.
I've been profiling a method using the stopwatch class, which is sub-millisecond accurate. The method runs thousands of times, on multiple threads.
I've discovered that most calls (90%+) take 0.1ms, which is acceptable. Occasionally, however, I find that the method takes several orders of magnitude longer, so that the average time for the call is actually more like 3-4ms.
What could be causing this?
The method itself is run from a delegate, and is essentially an event handler.
There are not many possible execution paths, and I've not yet discovered a path that would be conspicuously complicated.
I'm suspecting garbage collection, but I don't know how to detect whether it has occurred.
Finally, I am also considering whether the logging method itself is causing the problem. (The logger is basically a call to a static class + event listener that writes to the console.)
Just because Stopwatch has a high accuracy doesn't mean that other things can't get in the way - like the OS interrupting that thread to do something else. Garbage collection is another possibility. Writing to the console could easily cause delays like that.
Are you actually interested in individual call times, or is it overall performance which is important? It's generally more useful to run a method thousands of times and look at the total time - that's much more indicative of overall performance than individual calls which can be affected by any number of things on the computer.
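One way to look at both views at once is to record every call's duration and then compare the median against the tail; a median far below the mean points at occasional outliers (GC pauses, context switches) rather than a uniformly slow method. A rough sketch, where DoWork is a stand-in for the real event handler:

```csharp
using System;
using System.Diagnostics;
using System.Linq;

class Program
{
    static void Main()
    {
        const int runs = 10_000;
        var ticks = new long[runs];
        var sw = new Stopwatch();

        for (int i = 0; i < runs; i++)
        {
            sw.Restart();
            DoWork();                 // the method under test (stub here)
            sw.Stop();
            ticks[i] = sw.ElapsedTicks;
        }

        Array.Sort(ticks);
        Console.WriteLine($"total : {ticks.Sum()} ticks");
        Console.WriteLine($"median: {ticks[runs / 2]} ticks");
        Console.WriteLine($"p99   : {ticks[(int)(runs * 0.99)]} ticks");
    }

    static void DoWork() { /* stand-in for the real event handler */ }
}
```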
As I commented, you really should at least describe what your method does, if you're not willing to post some code (which would be best).
That said, one way you can tell if garbage collection has occurred (from Windows):
Run perfmon (Start->Run->perfmon)
Right-click on the graph; select "Add Counters..."
Under "Performance object", select ".NET CLR Memory"
From there you can select # Gen 0, 1, and 2 collections and click "Add"
Now on the graph you will see a graph of all .NET CLR garbage collections
Just keep this graph open while you run your application
EDIT: If you want to know if a collection occurred during a specific execution, why not do this?
int initialGen0Collections = GC.CollectionCount(0);
int initialGen1Collections = GC.CollectionCount(1);
int initialGen2Collections = GC.CollectionCount(2);
// run your method
if (GC.CollectionCount(0) > initialGen0Collections)
    Console.WriteLine("gen 0 collection occurred");
if (GC.CollectionCount(1) > initialGen1Collections)
    Console.WriteLine("gen 1 collection occurred");
if (GC.CollectionCount(2) > initialGen2Collections)
    Console.WriteLine("gen 2 collection occurred");
SECOND EDIT: A couple of points on how to reduce garbage collections within your method:
You mentioned in a comment that your method adds the object passed in to "a big collection." Depending on the type you use for said big collection, it may be possible to reduce garbage collections. For instance, if you use a List<T>, then there are two possibilities:
a. If you know in advance how many objects you'll be processing, you should set the list's capacity upon construction:
List<T> bigCollection = new List<T>(numObjects);
b. If you don't know how many objects you'll be processing, consider using something like a LinkedList<T> instead of a List<T>. The reason for this is that a List<T> automatically resizes itself whenever a new item is added beyond its current capacity; this results in a leftover array that (eventually) will need to be garbage collected. A LinkedList<T> does not use an array internally (it uses LinkedListNode<T> objects), so it will not result in this garbage collection.
If you are creating objects within your method (i.e., somewhere in your method you have one or more lines like Thing myThing = new Thing();), consider using a resource pool to eliminate the need for constantly constructing objects and thereby allocating more heap memory. If you need to know more about resource pooling, check out the Wikipedia article on Object Pools and the MSDN documentation on the ConcurrentBag<T> class, which includes a sample implementation of an ObjectPool<T>.
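A bare-bones pool along those lines, using ConcurrentBag&lt;T&gt; as the backing store (modeled loosely on the MSDN ObjectPool&lt;T&gt; sample, not copied from it):

```csharp
using System;
using System.Collections.Concurrent;

// Minimal thread-safe object pool: reuse instances instead of
// allocating a new one per call, reducing GC pressure.
class ObjectPool<T>
{
    private readonly ConcurrentBag<T> _items = new ConcurrentBag<T>();
    private readonly Func<T> _factory;

    public ObjectPool(Func<T> factory) => _factory = factory;

    // Hand out a pooled instance if one is available, else create a new one.
    public T Get() => _items.TryTake(out T item) ? item : _factory();

    public void Return(T item) => _items.Add(item);
}

class Program
{
    static void Main()
    {
        var pool = new ObjectPool<byte[]>(() => new byte[4096]);
        byte[] buffer = pool.Get();   // allocated once by the factory...
        pool.Return(buffer);
        byte[] again = pool.Get();    // ...and reused, no new allocation
        Console.WriteLine(ReferenceEquals(buffer, again)); // True
    }
}
```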
That can depend on many things, and you really have to figure out which one you are dealing with.
I'm not terribly familiar with what triggers garbage collection and what thread it runs on, but that sounds like a possibility.
My first thought around this is with paging. If this is the first time the method runs and the application needs to page in some code to run the method, it would be waiting on that. Or, it could be the data that you're using within the method that triggered a cache miss and now you have to wait for that.
Maybe you're doing an allocation and the allocator did some extra reshuffling in order to get you the allocation you requested.
Not sure how thread time is calculated with Stopwatch, but a context switch might be what you're seeing.
Or...it could be something completely different...
Basically, it could be one of several things and you really have to look at the code itself to see what is causing your occasional slow-down.
It could well be GC. If you use a profiler application such as Redgate's ANTS profiler you can profile % time in GC along side your application's performance to see what's going on.
In addition, you can use the CLRProfiler...
https://github.com/MicrosoftArchive/clrprofiler
Finally, Windows Performance Monitor will show the % time in GC for a given running application too.
These tools will help you get a holistic view of what's going on in your app as well as the OS in general.
I'm sure you know this stuff already, but microbenchmarking such as this is sometimes useful for determining how fast one line of code might be compared to another that you might write; still, you generally want to profile your application under typical load too.
Knowing that a given line of code is 10 times faster than another is useful, but if that line of code is easier to read and not part of a tight loop then the 10x performance hit might not be a problem.
What you need is a performance profiler to tell you exactly what causes the slow-down -- the ANTS profiler, for example.
Without knowing what your operation is doing, it sounds like it could be garbage collection. However, that might not be the only reason. If you are reading from or writing to the disk, it is possible your application might have to wait while something else is using the disk.
Timing issues may also occur in a multi-threaded application if another thread that runs only 10% of the time occasionally takes processor time. This is why a profiler would help.
If you're only running the code "thousands" of times on a pretty quick function, the occasional longer time could easily be due to transient events on the system (maybe Windows decided it was time to cache something).
That being said, I would suggest the following:
Run the function many many more times, and take an average.
In the code that uses the function, determine if the function in question actually is a bottleneck. Use a profiler for this.
It can be dependent on your OS, environment, page reads, CPU ticks per second and so on.
The most realistic way is to run an execution path several thousand times and take the average.
However, if that logging class is only called occasionally and it logs to disk, that is quite likely to be a slow-down factor if it has to seek on the drive first.
A read of http://en.wikipedia.org/wiki/Profiling_%28computer_programming%29 may give you an insight into more techniques for determining slowdowns in your applications; if you're doing C# work, http://en.wikipedia.org/wiki/Visual_Studio_Team_System_Profiler is worth a look.
Hope that helps!