Performance- and design-wise, what are the pros and cons of using a sealed class with events versus an abstract class with a virtual function?
There will only be one listener to the events...
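For concreteness, here is a minimal sketch of the two designs being compared (type and member names are made up):

    using System;

    // Option 1: a sealed class raising an event; the single listener subscribes.
    public sealed class SealedProducer
    {
        public event EventHandler ItemProcessed;

        public void Process()
        {
            // ... do the work ...
            EventHandler handler = ItemProcessed;
            if (handler != null)
                handler(this, EventArgs.Empty);   // notify the one listener
        }
    }

    // Option 2: an abstract base class with a virtual hook; the "listener"
    // is the derived class overriding the method.
    public abstract class ProducerBase
    {
        public void Process()
        {
            // ... do the work ...
            OnItemProcessed();                    // virtual dispatch instead of an event
        }

        protected virtual void OnItemProcessed() { }
    }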
You shouldn't worry too much about the performance of abstract classes, inheritance, and event subscription. These are language constructs designed to make development and maintenance simpler; raw performance was never their primary design goal.
There are better things to worry about when it comes to performance. A few things come to mind:
Boxing & unboxing - Try to avoid boxing and unboxing objects if you're doing a lot of repetitive or iterative work; see the sketch after this list.
Reference types vs. value types - Objects created as "structs" are stored by value: the entire value of the object is copied when it is passed around. This can be more expensive for large structs, but a struct's lifetime is more deterministic, since it usually exists only within a certain scope. Objects created as "classes" are stored by reference. When passing a by-reference object around, you only pass the reference, which means less memory to copy. The downside is that, because it is allocated on the heap, its lifetime in memory is less deterministic.
Subscribing/unsubscribing to events - This is not so much a performance issue as a general development mistake. An object will not be GC'ed while it is still subscribed to an event on a live publisher. If you leave subscriptions open, your object can remain in memory indefinitely, causing a memory leak. Microsoft has good documentation on the WeakEvent pattern to help work around this problem.
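To make the boxing point concrete, here is a small illustration (hypothetical; it contrasts the non-generic ArrayList with the generic List<int>):

    using System;
    using System.Collections;
    using System.Collections.Generic;

    class BoxingDemo
    {
        static void Main()
        {
            // ArrayList stores object, so every int is boxed on Add
            // and unboxed on read.
            ArrayList boxed = new ArrayList();
            for (int i = 0; i < 1000000; i++)
                boxed.Add(i);               // one boxing allocation per item
            int first = (int)boxed[0];      // unboxing on every read
            Console.WriteLine(first);

            // List<int> stores the ints directly; no boxing at all.
            List<int> unboxed = new List<int>();
            for (int i = 0; i < 1000000; i++)
                unboxed.Add(i);
        }
    }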
You should also read Microsoft's MSDN documentation on Performance. It's a pretty good reference for understanding the real performance killers in .NET. Sealed & abstract classes and event handlers are usually not a performance concern.
Generally, code structure is more important to worry about. Think about how you're working with your data and what patterns you use that could be heavy to execute.
They don't exactly look like similar alternatives... If the question is whether virtual methods are faster than raising an event, then the answer is yes, but only slightly.
I'm converting a C# project to C++ and have a question about deleting objects after use. In C# the GC of course takes care of deleting objects, but in C++ it has to be done explicitly using the delete keyword.
My question is: is it OK to just follow each object's usage throughout a method and then delete it as soon as it goes out of scope (i.e., method end/re-assignment)?
I know, though, that the GC waits for a certain amount of garbage (~1 MB) before collecting; does it do this because there is an overhead to freeing memory?
As this is a game I am creating, there will potentially be lots of objects being created and deleted every second, so would it be better to keep track of pointers that go out of scope, and once that total reaches 1 MB, delete them all?
(as a side note: later when the game is optimised, objects will be loaded once at startup so there is not much to delete during gameplay)
Your problem is that you are using pointers in C++.
This is a fundamental problem that you must fix; then all your problems go away. As chance would have it, I got so fed up with this general trend that I created a set of presentation slides on this issue (CC BY, so feel free to use them).
Have a look at the slides. While they are certainly not entirely serious, the fundamental message is still true: Don’t use pointers. But more accurately, the message should read: Don’t use delete.
In your particular situation you might find yourself with a lot of short-lived small objects. This is indeed a situation which a modern GC handles quite well, and which reference-counting smart pointers (shared_ptr) handle less efficiently. If (and only if!) this becomes a performance problem, consider switching to a small-object allocator library.
You should be using RAII as much as possible in C++, so that you never have to explicitly delete anything.
Once you use RAII, through smart pointers and your own resource-managing classes, every dynamic allocation you make will exist only as long as something still refers to it; you do not have to manage any resources explicitly.
Memory management in C# and C++ is completely different. You shouldn't try to mimic the behavior of .NET's GC in C++. In .NET allocating memory is super fast (basically moving a pointer) whereas freeing it is the heavy task. In C++ allocating memory isn't that lightweight for several reasons, mainly because a large enough chunk of memory has to be found. When memory chunks of different sizes are allocated and freed many times during the execution of the program the heap can get fragmented, containing many small "holes" of free memory. In .NET this won't happen because the GC will compact the heap. Freeing memory in C++ is quite fast, though.
Best practices in .NET don't necessarily work in C++. For example, pooling and reusing objects in .NET isn't recommended most of the time, because the objects get promoted to higher generations by the GC. The GC works best for short lived objects. On the other hand, pooling objects in C++ can be very useful to avoid heap fragmentation. Also, allocating a larger chunk of memory and using placement new can work great for many smaller objects that need to be allocated and freed frequently, as it can occur in games. Read up on general memory management techniques in C++ such as RAII or placement new.
Also, I'd recommend getting the books "Effective C++" and "More effective C++".
Well, the simplest solution might be to just use garbage collection in C++. The Boehm collector works well, for example. Still, there are pros and cons (but porting code originally written in C# would be a likely candidate for a case where the pros largely outweigh the cons).
Otherwise, if you convert the code to idiomatic C++, there shouldn't be that many dynamically allocated objects to worry about. Unlike C#, C++ has value semantics by default, and most of your short-lived objects should simply be local variables, possibly copied if they are returned, but not allocated dynamically. In C++, dynamic allocation is normally only used for entity objects, whose lifetime depends on external events; e.g. a Monster is created at some random time, with a probability depending on the game state, and is deleted at some later time, in reaction to events which change the game state. In this case, you delete the object when the monster ceases to be part of the game. In C#, you probably have a dispose function, or something similar, for such objects, since they typically have concrete actions which must be carried out when they cease to exist—things like deregistering as an Observer, if that's one of the patterns you're using. In C++, this sort of thing is typically handled by the destructor, and instead of calling dispose, you simply delete the object.
Substituting a shared_ptr for every reference you use in C# would get you the closest approximation with probably the least effort when converting the code.
However, you specifically mention following an object's use through a method and deleting it at the end. A better approach is not to new up the object at all, but simply to instantiate it inline, on the stack. In fact, with the new move semantics being introduced, this even becomes an efficient way to deal with returned objects, so there is no need to use pointers in almost any scenario.
There are a lot more things to take into consideration when deallocating objects than just calling delete when they go out of scope. You have to make sure that you call delete exactly once, and only after all pointers to the object have gone out of scope. The garbage collector in .NET handles all of that for you.
The construct in C++ that corresponds most closely is tr1::shared_ptr<>, which keeps a reference count to the object and deallocates it when the count drops to zero. A first approach to getting things running would be to turn all C# references into C++ tr1::shared_ptr<>. Then you can go into the places that are performance bottlenecks (only after you've verified with a profiler that they are actual bottlenecks) and change to more efficient memory handling.
Garbage collection in C++ has been discussed a lot on SO. Try reading through this:
Garbage Collection in C++
The following article discusses an alternative heap structure that takes into consideration that most servers are virtualized and therefore most memory is paged to disk.
http://queue.acm.org/detail.cfm?id=1814327
Can (or should) a .NET developer implement a B-Heap data structure so that parent-child relationships are maintained within the same Virtual Memory Page? How or where would this be implemented?
Clarification
In other words, is this type of data structure needed within .NET as a primitive type? Granted, it would have to be implemented either natively in the CLR or via P/Invoke.
When a server administrator deploys my .NET app within a virtual machine, does this binary heap optimization make sense? If so, when does it make sense? (number of objects, etc)
To at least a certain extent, BCL collections do seem to take paging concerns into account. They also take CPU cache concerns into account (which overlaps in some regard, as locality of memory can affect both, though in different ways).
Consider that Queue<T> uses arrays for internal storage. In purely random-access terms (that is to say, where there is never any cost for paging or CPU cache flushing) this is a poor choice; the queue will almost always be added to at one point and removed from at another, and hence an internal implementation as a singly linked list would win in almost every way (for that matter, in terms of iterating through the queue - which it also supports - a linked list shouldn't do much worse than an array in a pure-random-access situation). Where the array-based implementation fares better than a singly linked list is precisely when paging and CPU cache are considered. That MS went for a solution that is worse in the pure-random-access situation but better in the real-world case where paging matters suggests that they are paying attention to the effects of paging.
Of course, from the outside that isn't obvious - and shouldn't be. From the outside we want something that works like a queue; making the inside efficient is a different concern.
These concerns are also met in other ways. The way the GC works, for example, minimises the amount of paging necessary as its moving objects not only makes for less fragmentation, but also makes for fewer page faults. Other collections are also implemented in ways to make paging less frequent than the most immediate solution would suggest.
That's just a few things that stand out to me from things I have looked at. I'd bet good money such concerns are also considered in many other places in the .NET team's work. Likewise with other frameworks. Consider that the one big performance concern Cliff Click mentions repeatedly in terms of his Java lock-free hashtable (I really must finish checking my C# implementation), apart from those of lock-free concurrency (the whole point of the exercise), is cache lines; and it's also the one other performance concern he doesn't dismiss!
Consider also, that most uses of most collections are going to fit in one page anyway!
If you are implementing your own collections, or putting a standard collection into particularly heavy use, then these are things you need to think about (sometimes "nah, not an issue" is enough thinking, sometimes it isn't) but that doesn't mean they aren't already thought about in terms of what we get from the BCL.
If you have an especially special-case scenario and algorithm, then you might benefit from that kind of optimization.
But generally speaking, when reimplementing core parts of the CLR framework (on top of the CLR, I might add, i.e. in managed code), your chances of doing it more efficiently than the CLR team did are incredibly slim. So I wouldn't recommend it unless you have already profiled the heck out of your current implementation and have positively identified issues related to locality of data in memory. And even then, you will get more bang for your buck by tweaking your algorithm to work better with the CLR memory-management scheme than by trying to bypass or work around it.
I recently came across a piece of Java code with WeakReferences - I had never seen them deployed although I'd come across them when they were introduced. Is this something that should be routinely used or only when one runs into memory problems? If the latter, can they be easily retrofitted or does the code need serious refactoring? Can the average Java (or C#) programmer generally ignore them?
EDIT: Can any damage be done by over-enthusiastic use of WRs?
Weak references are all about garbage collection. A standard object will not "disappear" until all references to it are severed, this means all the references your various objects have to it have to be removed before garbage collection will consider it garbage.
With a weak reference just because your object is referenced by other objects doesn't necessarily mean it's not garbage. It can still get picked up by GC and get removed from memory.
An example: if I have a bunch of Foo objects in my application, I might want to use a Set to keep a central record of all the Foos I have around. But when other parts of my application remove a Foo object by deleting all references to it, I don't want the remaining reference my Set holds to keep that object from being garbage collected! Really, I just want it to disappear from my set. This is where you'd use something like a weak set (Java has a WeakHashMap) instead, which holds weak references to its members instead of "strong" ones.
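A C# analogue, as a sketch (assuming .NET 4's ConditionalWeakTable; Foo and FooInfo are made-up types):

    using System.Runtime.CompilerServices;

    class Foo { }
    class FooInfo { public string Note; }

    class FooRegistry
    {
        // Entries disappear automatically once the Foo key is no longer
        // referenced anywhere else; the table never keeps a Foo alive.
        private readonly ConditionalWeakTable<Foo, FooInfo> table =
            new ConditionalWeakTable<Foo, FooInfo>();

        public void Register(Foo foo, string note)
        {
            table.Add(foo, new FooInfo { Note = note });
        }

        public bool TryGetInfo(Foo foo, out FooInfo info)
        {
            return table.TryGetValue(foo, out info);
        }
    }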
If your objects aren't being garbage collected when you want them to be, then you've made an error in your bookkeeping; something is still holding a reference that you forgot to remove. Using weak references can ease the pain of such bookkeeping, since you don't have to worry about them keeping an object "alive" and un-garbage-collected; but you don't have to use them.
You use them whenever you want to have a reference to an object without keeping the object alive yourself. This is true for many caching-like features, but weak references also play an important role in event handling, where a subscriber should not be kept alive merely by being subscribed to an event.
A small example: a timer event which refreshes some data. Any number of objects can subscribe to the timer in order to get notified, but the bare fact that they subscribed to the timer should not keep them alive. So the timer should hold weak references to the objects; a sketch follows.
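A minimal hand-rolled version of that idea (all names are hypothetical, not a standard API):

    using System;
    using System.Collections.Generic;

    interface IRefreshable { void Refresh(); }

    class RefreshTimer
    {
        // Only weak references: subscribing does not keep a listener alive.
        private readonly List<WeakReference> subscribers = new List<WeakReference>();

        public void Subscribe(IRefreshable listener)
        {
            subscribers.Add(new WeakReference(listener));
        }

        // Called on each timer tick: notify live subscribers and prune
        // the ones whose target has been garbage collected.
        public void Tick()
        {
            subscribers.RemoveAll(wr =>
            {
                IRefreshable target = wr.Target as IRefreshable;
                if (target == null) return true;   // collected: drop the entry
                target.Refresh();
                return false;
            });
        }
    }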
Can any damage be done by over-enthusiastic use of WRs?
Yes, it can.
One concern is that weak references make your code more complicated and potentially error-prone. Any code that uses a weak reference needs to deal with the possibility that the reference has been broken, each time it uses it. If you over-use weak references you end up writing lots of extra code. (You can mitigate this by hiding each weak reference behind a method that takes care of the checking and re-creates the discarded object on demand; see the sketch after these points. But it may not be as simple as that; e.g. if the re-creation process involves network access, you need to cope with the possibility of re-creation failure.)
A second concern is that there are runtime overheads with using weak references. The obvious costs are those of creating weak references and calling get on them. A less obvious cost is that significant extra work needs to be done each time the GC runs.
A final concern is that if you use a weak reference for something that your application is highly likely to need in the future, you may incur the cost of repeatedly recreating it. If this cost is high (in terms of CPU time, IO bandwidth, network traffic, whatever), your application may perform badly as a result. You may be better off giving the JVM more memory and not using weak references at all.
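To illustrate the mitigation from the first point, and the re-creation cost the last point warns about, here is a sketch (the types are made up):

    using System;

    class ExpensiveData { /* ... */ }

    class SoftCache
    {
        private WeakReference cached;   // the target may vanish at any time

        public ExpensiveData Get()
        {
            // Every access must handle the "reference has been broken" case.
            ExpensiveData data =
                cached == null ? null : cached.Target as ExpensiveData;
            if (data == null)
            {
                data = Recreate();              // may be slow, or fail entirely
                cached = new WeakReference(data);
            }
            return data;
        }

        private ExpensiveData Recreate() { return new ExpensiveData(); }
    }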
Of course, this does not mean you should avoid using weak references entirely. Just that you need to think carefully. And you should probably first run a memory profiler on your application to figure out where your memory-usage problems stem from.
A good question to ask when considering use of a WeakReference is how one would feel if the weak reference were invalidated the instant no strong references existed to the object. If that would make the WeakReference less useful, then a WeakReference is probably not the best thing to use. If that would be an improvement over the non-deterministic invalidation that comes from garbage-collection, then a WeakReference is probably the right choice.
I'm learning C#. From what I know, you have to set things up correctly for the garbage collector to actually delete everything as it should. I'm looking for wisdom learned over the years from you, the intelligent.
I'm coming from a C++ background and am VERY used to code-smells and development patterns. I want to learn what code-smells are like in C#. Give me advice!
What are the best ways to get things deleted?
How can you figure out when you have "memory leaks"?
Edit: I am trying to develop a punch-list of "stuff to always do for memory management"
Thanks so much.
In C#, the .NET Framework uses managed memory, and everything (except explicitly allocated unmanaged resources) is garbage collected.
It is safe to assume that managed types are always garbage collected. That includes arrays, classes and structures. Feel free to do int[] stuff = new int[32]; and forget about it.
If you open a file, database connection, or any other unmanaged resource in a class, implement the IDisposable interface and in your Dispose method de-allocate the unmanaged resource.
Any class which implements IDisposable should be explicitly closed, or used in a (I think cool) using block, like:
    using (StreamReader reader = new StreamReader("myfile.txt"))
    {
        // your code here
    }
Here .NET will dispose of reader when it leaves the { } scope.
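And if you write your own resource-owning class, a minimal sketch of the standard dispose pattern looks like this (the handle and the native call are illustrative only):

    using System;

    class ResourceHolder : IDisposable
    {
        private IntPtr handle;     // stands in for some unmanaged resource
        private bool disposed;

        public void Dispose()
        {
            Dispose(true);
            GC.SuppressFinalize(this);   // the finalizer is no longer needed
        }

        protected virtual void Dispose(bool disposing)
        {
            if (disposed) return;
            if (disposing)
            {
                // also dispose any managed IDisposable members here
            }
            // release the unmanaged resource here, e.g. a native CloseHandle call
            handle = IntPtr.Zero;
            disposed = true;
        }

        // Safety net in case Dispose is never called.
        ~ResourceHolder() { Dispose(false); }
    }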
The first thing with GC is that it is non-deterministic; if you want a resource cleaned up promptly, implement IDisposable and use using; that doesn't collect the managed memory, but can help a lot with unmanaged resources and onward chains.
In particular, things to watch out for:
lots of pinning (places a lot of restrictions on what the GC can do)
lots of finalizers (you don't usually need them; slows down GC)
static events - an easy way to keep a lot of large object graphs alive ;-p (see the sketch after this list)
events on an inexpensive long-lived object that can see an expensive object which should have been cleaned up
"captured variables" accidentally keeping graphs alive
For investigating memory leaks... "SOS" is one of the easiest routes; you can use SOS to find all instances of a type, and what can see it, etc.
In general, the less you worry about memory allocation in C#, the better off you are. I would leave it to a profiler to tell me when I'm having issues with collection.
You can't create memory leaks in C# in the same way as you do in C++. The garbage collector will always "have your back". What you can do is create objects and hold references to them even though you never use them. That's a code smell to look out for.
Other than that:
Have some notion of how frequently collection will occur (for performance reasons)
Don't hold references to objects longer than you need
Dispose of objects that implement IDisposable as soon as you're done with them (use the using syntax)
Properly implement the IDisposable interface
The main sources of memory leaks I can think of are:
Keeping references to objects you don't need any more (usually in some sort of collection). Here you need to remember that everything you add to a collection you hold a reference to will stay in memory.
Having delegates registered with an event (often described as a circular reference). Even though you no longer explicitly reference an object, it can't get garbage collected while one of its methods is registered as a delegate with a live event source. In these cases you need to remember to remove the delegate before discarding the reference; see the sketch after this list.
Interoperating with native code and failing to free it. Even if you use managed wrappers that implement finalizers, the CLR often doesn't clean them up fast enough, because it doesn't understand their real memory footprint. You should use the using (IDisposable) { } pattern.
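A sketch of the event case from the list above (hypothetical types): detach the handler before dropping the object.

    using System;

    class Publisher
    {
        public event EventHandler Updated;
    }

    class Listener : IDisposable
    {
        private readonly Publisher publisher;

        public Listener(Publisher p)
        {
            publisher = p;
            // The event's delegate list now references this Listener,
            // so the publisher keeps it alive.
            publisher.Updated += OnUpdated;
        }

        private void OnUpdated(object sender, EventArgs e) { /* ... */ }

        public void Dispose()
        {
            // Remove the delegate before discarding the reference.
            publisher.Updated -= OnUpdated;
        }
    }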
One other thing to consider for memory management is whether you are implementing any Observer patterns and not removing the event references correctly.
For instance:
Object A watches Object B.
Object B is disposed, but if the event subscription between A and B is not removed properly, the GC will not properly collect the objects involved: because the event handler is still attached, the GC doesn't see them as unused.
If you have a small set of objects you're working with, this may be irrelevant. However, if you're working with thousands of objects, it can cause a gradual increase in memory over the life of the application.
There are some great memory-management tools for monitoring what's going on in your application's heap. I found great benefit in using .NET Memory Profiler.
HTH
I recommend using .NET Memory Profiler
.NET Memory Profiler is a powerful tool for finding memory leaks and optimizing the memory usage in programs written in C#, VB.NET or any other .NET Language.
.NET Memory Profiler will help you to:
View real-time memory and resource information
Easily identify memory leaks by collecting and comparing snapshots of .NET memory
Find instances that are not properly disposed
Get detailed information about unmanaged resource usage
Optimize memory usage
Investigate memory problems in production code
Perform automated memory testing
Retrieve information about native memory
Take a look at their video tutorials:
http://memprofiler.com/tutorials/
Others have already mentioned the importance of IDisposable, and some of the things to watch out for in your code.
I wanted to suggest some additional resources; I found the following invaluable when learning the details of .NET GC and how to trouble-shoot memory issues in .NET applications.
CLR via C# by Jeffrey Richter is an excellent book. Worth the purchase price just for the chapter on GC and memory.
This blog (by a Microsoft "ASP.NET Escalation Engineer") is often my go-to source for tips and tricks for using WinDbg, SOS, and for spotting certain types of memory leaks. Tess even designed .NET debugging demos/labs which will walk you through common memory issues and how to recognize and solve them.
Debugging Tools for Windows (WinDbg, SOS, etc)
You can use tools like CLR Profiler; it takes some time to learn how to use it correctly, but after all it is free. (It helped me several times to find my memory leaks.)
The best way to ensure that objects get deleted, or in .NET lingo, garbage-collected, is to ensure that no rooted references to them remain (references that can be traced through objects back to a thread's call stack or a static field).
The GC cannot, and will not, collect an object if there are any rooted references to it, no matter whether it implements IDisposable or not.
Circular references impose no penalty or possibility of memory leaks, since the GC marks which objects it has visited in the object graph. In the case of delegates or event handlers, it is common to forget to remove the handler from an event, so that the object containing the target method can't be collected while the event source is rooted.
What are the best ways to get things deleted?
NOTE: the following works only for types containing unmanaged resources. It doesn't help with purely managed types.
Probably the best method is to implement and follow the IDisposable pattern, and call the Dispose method on all objects implementing it.
The using statement is your best friend. Loosely put, it will call Dispose for you on objects implementing IDisposable.
I really love WeakReferences. But I wish there was a way to tell the CLR (say, on a scale of 1 to 5) how weak you consider the reference to be. That would be brilliant.
Java has SoftReference, WeakReference and, I believe, a third type called a "phantom reference". That's three levels right there for which the GC has a different behaviour algorithm when deciding if an object gets the chop.
I am thinking of subclassing .NET's WeakReference (luckily, and slightly bizarrely, it isn't sealed) to make a pseudo-SoftReference based on an expiration timer or something; a rough sketch follows.
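A rough sketch of the idea (entirely hypothetical and untested):

    using System;

    class TimedSoftReference : WeakReference
    {
        // Held purely to keep the target alive while it is "fresh".
        private object strongRef;
        private DateTime lastTouch;
        private readonly TimeSpan lifetime;

        public TimedSoftReference(object target, TimeSpan lifetime)
            : base(target)
        {
            this.lifetime = lifetime;
            strongRef = target;
            lastTouch = DateTime.UtcNow;
        }

        public override object Target
        {
            get
            {
                object t = base.Target;
                if (t != null)
                {
                    strongRef = t;            // re-pin on every access
                    lastTouch = DateTime.UtcNow;
                }
                return t;
            }
        }

        // Call periodically (e.g. from a timer) to release stale targets
        // so the GC can reclaim them.
        public void Expire()
        {
            if (DateTime.UtcNow - lastTouch > lifetime)
                strongRef = null;
        }
    }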
I believe the fundamental reason that .NET does not have soft references is because it can rely on an operating system with virtual memory. A Java process must specify its maximum OS memory (e.g. with -Xmx128M), and it never takes more OS memory than that. Whereas a .NET process keeps taking the OS memory it needs, which the OS supplies with disk-backed virtual memory when RAM runs out. If .NET allowed soft references, then the .NET runtime would not know when to release them unless it either peeked deep into the OS to see whether its memory is actually paged to disk (a nasty OS/CLR dependency), or it required a maximum process memory footprint to be specified (e.g. an equivalent of -Xmx). I guess that Microsoft does not want to add -Xmx to .NET because they think the OS should decide how much RAM each process gets (by choosing which virtual memory pages to hold in RAM or on disk), and not the process itself.
Java SoftReferences are used in the creation of memory sensitive caches (they serve no other purpose).
As of .NET 4, .NET has the class System.Runtime.Caching.MemoryCache, which will probably meet any such needs; a usage sketch follows.
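A small usage sketch (the key name and expiration policy are made up):

    using System;
    using System.Runtime.Caching;

    class CacheDemo
    {
        static void Main()
        {
            MemoryCache cache = MemoryCache.Default;

            // Entries are evicted under memory pressure or when the
            // policy expires, much like a soft reference would allow.
            cache.Set("expensiveThing", BuildExpensiveThing(),
                      new CacheItemPolicy { SlidingExpiration = TimeSpan.FromMinutes(10) });

            // May be null if the entry was evicted in the meantime.
            object thing = cache.Get("expensiveThing") ?? BuildExpensiveThing();
            Console.WriteLine(thing);
        }

        static object BuildExpensiveThing() { return new object(); }
    }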
Having a WeakReference with varying levels of weakness (priority) sounds nice, but it also might make the GC's job harder, not easier. (I've no idea about the GC internals, but) I would assume there would be some sort of additional access statistics kept for WeakReference objects so that the GC can clean them up efficiently (e.g. it might get rid of the least-used items first).
More than likely the added complexity wouldn't make anything any more efficient because the most efficient way is to get rid of infrequently used WeakReferences first. If you could assign a priority, how would you do it? This smells like a premature optimization: the programmer doesn't really know most of the time and is guessing; the result is a slower GC collection cycle that is probably reclaiming the wrong objects.
It raises the question, though: if you care about the WeakReference.Target object being reclaimed, is WeakReference really the right tool?
It's like a cache. You shove stuff into the cache and ask the cache to make it stale after x minutes, but most caches never guarantee to keep it around at all. It just guarantees that if it does, it will expire it according to the policy requested.
My guess as to why this isn't there already would be simplicity. Most people, I think, would call it a virtue that there is only one type of reference, not four.
Maybe the ASP.NET Cache class (System.Web.Caching.Cache) can help achieve what you want? It automatically removes objects when memory gets low:
ASP.NET Caching Overview
Here's an article that shows how to use the Cache class in a windows forms application.
quoted from: Equivalent to SoftReference in .net?
Don't forget that you also have your standard references (the ones that you use on a daily basis). This gives you one more level.
WeakReferences should be used when you don't really care if the object goes away, while SoftReferences really only should be used when you would use a normal reference, but you would rather have your object cleared than run out of memory. I'm not sure of the specifics, but I suspect that the GC normally traces through SoftReferences but not WeakReferences when determining which objects are live, and when running low on memory it will also skip the SoftReferences.
My guess is that the .NET designers felt that the distinction was confusing to most people, and/or that SoftReferences would add more complexity than they wanted, and so decided to leave them out.
As a side note, AFAIK PhantomReferences are mostly designed for internal use by the virtual machine and are not intended for actual client use.
Maybe there should be a property where you can specify the minimum generation the object must reach before it is eligible for collection. If you specify 1, it is the weakest possible reference; but if you specify 3, the object would need to survive at least 3 prior collections before it can be considered for collection itself.
I thought the track-resurrection flag was no good for this, because by that time the object has already been finalized? I may be wrong though...
(PS: I am the OP, just signed up. PITA that it doesn't inherit your history from "unregistered" accounts.)
Looking for the 'trackResurrection' option passed to the constructor perhaps?
The GC class also offers some assistance.
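For reference, a short sketch of that constructor (the second argument distinguishes a short weak reference from a long one that tracks resurrection):

    using System;

    class WeakRefDemo
    {
        static void Main()
        {
            object target = new object();

            // Short weak reference: Target becomes null once the object
            // has been collected.
            WeakReference shortRef = new WeakReference(target);

            // Long weak reference: tracks the object through finalization
            // ("resurrection"), so Target can still be retrieved until the
            // memory is actually reclaimed.
            WeakReference longRef = new WeakReference(target, trackResurrection: true);
        }
    }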
I don't know why .NET does not have SoftReferences.
BUT in Java, SoftReferences are IMHO overused. The reason is that, at least in an application server, you would want to be able to influence per application how long your SoftReferences live. That's currently not possible in Java.