C# memory management: unsafe keyword and pointers - c#

What are the consequences (positive/negative) of using the unsafe keyword in C# to use pointers? For example: what becomes of garbage collection, what are the performance gains/losses, how does performance compare to other languages' manual memory management, what are the dangers, in which situations is it really justifiable to use this language feature, does it take longer to compile...?

As already mentioned by Conrad, there are some situations where unsafe access to memory in C# is useful. There are not as many of them, but there are some:
Manipulating bitmaps is almost the canonical example: you sometimes need additional performance that you can only get by using unsafe.
Interoperability with older APIs (such as WinAPI or native C/C++ DLLs) is another area where unsafe can be quite useful - for example, you may want to call a function that takes or returns an unmanaged pointer.
On the other hand, you can write most of these things using the Marshal class, which hides many of the unsafe operations inside method calls. This will be a bit slower, but it is an option if you want to avoid unsafe (or if you're using VB.NET, which doesn't have unsafe). For instance, see the Marshal sketch below.
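As a minimal sketch of that Marshal-based route (assuming some native API has already handed you an IntPtr named nativeBuffer and an int length; both names are made up here), you can copy the data into a managed array without any unsafe block:
// requires: using System.Runtime.InteropServices;
// nativeBuffer (IntPtr) and length (int) are assumed to come from a native call
byte[] managed = new byte[length];
Marshal.Copy(nativeBuffer, managed, 0, length);   // unmanaged -> managed copy, no pointers needed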
Positive consequences:
So, the main positive consequences of the existence of unsafe in C# are that you can write some code more easily (interoperability) and you can write some code more efficiently (manipulating bitmaps, or perhaps some heavy numeric calculations using arrays - although I'm not so sure about the second one).
Negative consequences: Of course, there is some price that you have to pay for using unsafe:
Non-verifiable code: C# code written using the unsafe features becomes non-verifiable, which means the runtime can no longer guarantee that your code won't compromise it. This isn't a big problem in a full-trust scenario (e.g. an unrestricted desktop app) - you just don't get all the nice .NET CLR guarantees. However, you cannot run the application in a restricted environment such as public web hosting, Silverlight, or partial trust (e.g. an application running from the network).
The garbage collector also needs to be careful when you use unsafe. The GC is normally allowed to relocate objects on the managed heap (to keep memory compacted). When you take a pointer to some object, you need to use the fixed keyword to tell the GC that it cannot move the object until you're done, which can affect the performance of garbage collection, depending on the exact scenario. A minimal sketch follows below.
My guess is that if C# didn't have to interoperate with older code, it probably wouldn't support unsafe at all (and research projects like Singularity, which attempt to build a more verifiable operating system on managed languages, definitely disallow unsafe code). However, in the real world, unsafe is useful in some (rare) cases.
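Here is what that fixed usage looks like in miniature (the Point class is just a made-up example, and the code needs to be compiled with /unsafe):
class Point { public int X, Y; }

static unsafe void Touch(Point p)
{
    fixed (int* px = &p.X)   // p cannot be relocated by the GC inside this block
    {
        *px = 42;
    }                        // pin released; the GC may move p again
}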

I can give you a situation where it was worth using:
I have to generate a bitmap pixel by pixel. System.Drawing.Bitmap.SetPixel() is way too slow. So I build my own managed array of bitmap data, and use unsafe to get the IntPtr for the Bitmap(Int32, Int32, Int32, PixelFormat, IntPtr) constructor.
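Roughly, and only as a sketch (the fill logic and dimensions are placeholders, and in real code the pixel buffer has to stay pinned for as long as the Bitmap uses it), that looks like:
// requires: using System; using System.Drawing; using System.Drawing.Imaging; compile with /unsafe
static unsafe Bitmap Build(int width, int height)
{
    int[] pixels = new int[width * height];          // one 32-bit ARGB value per pixel
    for (int i = 0; i < pixels.Length; i++)
        pixels[i] = unchecked((int)0xFF0000FF);      // e.g. opaque blue

    fixed (int* p = pixels)
    {
        // stride = bytes per scan line (4 bytes per 32bpp pixel)
        return new Bitmap(width, height, width * 4,
                          PixelFormat.Format32bppArgb, (IntPtr)p);
    }
}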

To quote Professional C# 2008:
"The two main reasons for using
pointers are:
Backward compability - Despite all of the facilities provided by the
.NET-runtime it is still possible to
call native Windows API functions, and
for some operations this may be the
only way to accompling your task.
These API functions are generally
written in C and often require
pointers as parameters. However, in
many cases it is possible to write the
DllImport declaration in a way that
avoids use of pointers; for example,
by using the System.IntPtr class.
Performance - On those occasions where speed is of the utmost
importance, pointer can provide a
route to optimized perfomance. If you
know what you are doing, you can
ensure that data is accessed or
manipulated in the most efficient way.
However, be aware that, more often
than not, there are other areas of
your code where you can make necessary
performance improvemens without
resoirting to pointers. Try using a
code profiler to look for bottlenecks
in your code - one comes with Visual
Studio 2008."
Also, if you use pointers your code will require a higher level of trust to execute, and if the user does not grant that trust, your code will not run.
And to wrap it up with a last quote:
"We strongly advice against using
pointers unnecessarily because it will
not only be harder to write and debug,
but it will also fail the memory
type-safety checks imposed by the
CLR."

Garbage collection is inefficient with long-lived objects. .NET's garbage collector works best when most objects are released rather quickly, while a few objects "live forever." The problem is that longer-living objects are only released during full garbage collections, which incur a significant performance penalty. In essence, long-living objects quickly move into generation 2.
(For more information, you might want to read up on .Net's generational garbage collector: http://msdn.microsoft.com/en-us/library/ms973837.aspx)
In situations where objects, or memory use in general, will be long-lived, manual memory management yields better performance because the memory can be released to the system without requiring a full garbage collection.
Implementing some kind of a memory management system based around a single large byte array, structs, and lots of pointer arithmetic, could theoretically increase performance in situations where data will be stored in RAM for a long time.
Unfortunately, I'm not aware of a good way to do manual memory management in .Net for objects that are going to be long-lived. This basically means that applications that have long-lived data in RAM will periodically become unresponsive when they run a full garbage collection of all of the memory.
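A rough sketch of the "single large byte array" idea from the paragraph above might look like this (the Record layout is purely illustrative):
// requires compiling with /unsafe
struct Record { public int Id; public double Value; }

static unsafe double ReadValue(byte[] buffer, int index)
{
    fixed (byte* basePtr = buffer)          // the GC only ever sees one long-lived array
    {
        Record* records = (Record*)basePtr; // treat the raw bytes as an array of Records
        return records[index].Value;
    }
}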

Related

Does C# allow pointers? [duplicate]

I know C# gives the programmer the ability to access and use pointers in an unsafe context. But when is this needed?
Under what circumstances does using pointers become inevitable?
Is it only for performance reasons?
Also, why does C# expose this functionality through an unsafe context and remove all of the managed advantages from it? Is it possible to use pointers without losing any advantages of the managed environment, theoretically?
When is this needed? Under what circumstances does using pointers become inevitable?
When the net cost of a managed, safe solution is unacceptable but the net cost of an unsafe solution is acceptable. You can determine the net cost or net benefit by subtracting the total benefits from the total costs. The benefits of an unsafe solution are things like "no time wasted on unnecessary runtime checks to ensure correctness"; the costs are (1) having to write code that is safe even with the managed safety system turned off, and (2) having to deal with potentially making the garbage collector less efficient, because it cannot move around memory that has an unmanaged pointer into it.
Or, if you are the person writing the marshalling layer.
Is it only for performance reasons?
It seems perverse to use pointers in a managed language for reasons other than performance.
You can use the methods in the Marshal class to deal with interoperating with unmanaged code in the vast majority of cases. (There might be a few cases in which it is difficult or impossible to use the marshalling gear to solve an interop problem, but I don't know of any.)
Of course, as I said, if you are the person writing the Marshal class then obviously you don't get to use the marshalling layer to solve your problem. In that case you'd need to implement it using pointers.
Why does C# expose this functionality through an unsafe context, and remove all of the managed advantages from it?
Those managed advantages come with performance costs. For example, every time you ask an array for its tenth element, the runtime needs to do a check to see if there is a tenth element, and throw an exception if there isn't. With pointers that runtime cost is eliminated.
The corresponding developer cost is that if you do it wrong then you get to deal with memory corruption bugs that format your hard disk and crash your process an hour later, rather than dealing with a nice clean exception at the point of the error.
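To make the bounds-check point concrete, here is a minimal (and deliberately simplistic) sketch; whether it actually beats a safe loop depends on the JIT, which can often hoist the bounds check in a plain for loop anyway:
static unsafe long SumUnchecked(int[] data)
{
    long sum = 0;
    fixed (int* p = data)              // pin the array so the GC cannot move it
    {
        for (int i = 0; i < data.Length; i++)
            sum += p[i];               // raw pointer access: no per-element bounds check
    }
    return sum;
}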
Is it possible to use pointers without losing any advantages of managed environment, theoretically?
By "advantages" I assume you mean advantages like garbage collection, type safety and referential integrity. Thus your question is essentially "is it in theory possible to turn off the safety system but still get the benefits of the safety system being turned on?" No, clearly it is not. If you turn off that safety system because you don't like how expensive it is then you don't get the benefits of it being on!
Pointers are an inherent contradiction to the managed, garbage-collected, environment.
Once you start messing with raw pointers, the GC has no clue what's going on.
Specifically, it cannot tell whether objects are reachable, since it doesn't know where your pointers are.
It also cannot move objects around in memory, since that would break your pointers.
All of this would be solved by GC-tracked pointers; that's what references are.
You should only use pointers in messy advanced interop scenarios or for highly sophisticated optimization.
If you have to ask, you probably shouldn't.
The GC can move references around; using unsafe keeps an object outside of the GC's control, and avoids this. "Fixed" pins an object, but lets the GC manage the memory.
By definition, if you have a pointer to the address of an object, and the GC moves it, your pointer is no longer valid.
As to why you need pointers: Primary reason is to work with unmanaged DLLs, e.g. those written in C++
Also note, when you pin variables and use pointers, you're more susceptible to heap fragmentation.
Edit
You've touched on the core issue of managed vs. unmanaged code... how does the memory get released?
You can mix code for performance as you describe, you just can't cross managed/unmanaged boundaries with pointers (i.e. you can't use pointers outside of the 'unsafe' context).
As for how they get cleaned... You have to manage your own memory; objects that your pointers point to were created/allocated (usually within the C++ DLL) using (hopefully) CoTaskMemAlloc(), and you have to release that memory in the same manner, calling CoTaskMemFree(), or you'll have a memory leak. Note that only memory allocated with CoTaskMemAlloc() can be freed with CoTaskMemFree().
The other alternative is to expose a method from your native C++ dll that takes a pointer and frees it... this lets the DLL decide how to free the memory, which works best if it used some other method to allocate memory. Most native dlls you work with are third-party dlls that you can't modify, and they don't usually have (that I've seen) such functions to call.
An example of freeing memory, taken from here:
// "test" is assumed here to be a P/Invoked native function that returns a string
// allocated with CoTaskMemAlloc, e.g.:
// [DllImport("mylib.dll")] static extern IntPtr test(string[] values);
string[] array = new string[2];
array[0] = "hello";
array[1] = "world";
IntPtr ptr = test(array);
string result = Marshal.PtrToStringAuto(ptr);
Marshal.FreeCoTaskMem(ptr);   // free with the method that matches the CoTaskMemAlloc allocation
System.Console.WriteLine(result);
Some more reading material:
C# deallocate memory referenced by IntPtr
The second answer down explains the different allocation/deallocation methods
How to free IntPtr in C#?
Reinforces the need to deallocate in the same manner the memory was allocated
http://msdn.microsoft.com/en-us/library/aa366533%28VS.85%29.aspx
Official MSDN documentation on the various ways to allocate and deallocate memory.
In short... you need to know how the memory was allocated in order to free it.
Edit
If I understand your question correctly, the short answer is yes, you can hand the data off to unmanaged pointers, work with it in an unsafe context, and have the data available once you exit the unsafe context.
The key is that you have to pin the managed object you're referencing with a fixed block. This prevents the memory you're referencing from being moved by the GC while in the unsafe block. There are a number of subtleties involved here, e.g. you can't reassign a pointer initialized in a fixed block... you should read up on unsafe and fixed statements if you're really set on managing your own code.
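A small sketch of that pattern, where NativeFill and native.dll are hypothetical stand-ins for whatever unmanaged code you are calling:
// requires: using System.Runtime.InteropServices; compile with /unsafe
[DllImport("native.dll")]
static extern unsafe void NativeFill(byte* buffer, int length);

static unsafe void Fill(byte[] data)
{
    fixed (byte* p = data)       // data is pinned for the duration of the block
    {
        NativeFill(p, data.Length);
    }                            // pin released; data is an ordinary managed array again
}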
All that said, the benefits of managing your own objects and using pointers in the manner you describe may not buy you as much of a performance increase as you might think. Reasons why not:
C# is very optimized and very fast
Your pointer code is still generated as IL, which has to be jitted (at which point further optimizations come into play)
You're not turning the garbage collector off... you're just keeping the objects you're working with out of the GC's purview. So whenever a collection is triggered, the GC still interrupts your code and does its work for all the other objects in your managed code.
HTH,
James
The most common reasons to use pointers explicitly in C#:
doing low-level work (like string manipulation) that is very performance sensitive,
interfacing with unmanaged APIs.
The reason why the pointer syntax familiar from C/C++ was (mostly) left out of everyday C# (according to my knowledge and viewpoint - Jon Skeet would answer better B-)) is that it turned out to be superfluous in most situations.
From the language design perspective, once you manage memory by a garbage collector you have to introduce severe constraints on what is and what is not possible to do with pointers. For example, using a pointer to point into the middle of an object can cause severe problems to the GC. Hence, once the restrictions are in place, you can just omit the extra syntax and end up with “automatic” references.
Also, the ultra-benevolent approach found in C/C++ is a common source of errors. For most situations, where micro-performance doesn't matter at all, it is better to offer tighter rules and constrain the developer in exchange for fewer bugs that would otherwise be very hard to discover. Thus, for common business applications the so-called "managed" environments like .NET and Java are better suited than languages designed to work close to the bare metal.
Say you want to communicate between two applications using IPC (shared memory); then you can marshal the data to memory and pass this data pointer to the other application via Windows messaging or something similar. In the receiving application you can fetch the data back.
It is also useful when transferring data from .NET to legacy VB6 apps: you marshal the data to memory, pass the pointer to the VB6 app using a Windows message, and use VB6's CopyMemory() to fetch the data from the managed memory space into the VB6 app's unmanaged memory space. A sketch of the marshalling step follows below.
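The "marshal the data to memory" step could look roughly like this (Payload is a hypothetical blittable struct; how the pointer value is delivered to the other application is out of scope here):
// requires: using System; using System.Runtime.InteropServices;
[StructLayout(LayoutKind.Sequential)]
struct Payload { public int Id; public double Value; }

static IntPtr CopyToUnmanaged(Payload p)
{
    IntPtr mem = Marshal.AllocHGlobal(Marshal.SizeOf(typeof(Payload)));
    Marshal.StructureToPtr(p, mem, false);   // copy the struct into unmanaged memory
    return mem;                              // the receiver reads from here; free with Marshal.FreeHGlobal
}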

Real buffer usage guidelines

I've started work on a typical application that makes massive use of buffers. I was surprised that I couldn't find a good, clear guide on this topic.
I have a couple of questions.
1) When should I prefer a buffer in unmanaged heap memory over managed memory?
I know that object allocation is faster on .NET than on the unmanaged heap and object destruction is much more expensive on .NET because of GC overhead, so I think it will be a little faster to use unmanaged memory. When should I use fixed{} and when Marshal.AllocHGlobal()?
2) As I understand it, it is more effective to use a weak reference for both managed and unmanaged buffers in .NET if the buffer can possibly be reused after some time (based on user actions), isn't it?
Trying to manually manage your memory allocations using native "buffers" is going to be difficult at best with .NET. You can't allocate managed types into the unmanaged buffers, so they'll only be usable for structured data, in which case, there's little advantage over a simple managed array (which will stay in memory contiguously, etc).
In general, it's typically a better approach to try to manage how you're allocating and letting go of objects, and try to manually reuse them as appropriate (if, and only if, memory pressure is an issue for you).
As for some of your specific points:
I know that object allocation is faster on .NET than on the unmanaged heap and object destruction is much more expensive on .NET because of GC overhead, so I think it will be a little faster to use unmanaged memory.
I think your assumptions here are a bit flawed. Object allocation, at the point of allocation, is typically faster in .NET, as the CLR will potentially have preallocated memory it can already use. Object "destruction" is also faster on .NET, though there is a deferred cost due to GC that can be a bit higher (though it's not always). There are a lot of factors here, mainly focused around object lifecycles - if you allow your objects to be promoted to Gen1 or especially Gen2, then things can potentially get difficult to track and measure, as GC compaction costs can be higher.
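You can observe the promotion being described here with GC.GetGeneration (output values are typical, not guaranteed):
// requires: using System;
object survivor = new object();
Console.WriteLine(GC.GetGeneration(survivor));   // 0 right after allocation
GC.Collect();
Console.WriteLine(GC.GetGeneration(survivor));   // usually 1 after surviving one collection
GC.Collect();
Console.WriteLine(GC.GetGeneration(survivor));   // usually 2 after surviving another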
When should i use fixed{} and when Marshal.AllocHGlobal()?
In general, you would (very) rarely use either in C#. You're typically better off leaving your memory unpinned, and allowing the GC to do its work properly, which in turn tends to lead to better GC heuristics overall.
2) As I understand it, it is more effective to use a weak reference for both managed and unmanaged buffers in .NET if the buffer can possibly be reused after some time (based on user actions), isn't it?
Not necessarily. Reusing objects and keeping them alive longer than necessary has some serious drawbacks, as well. This will probably guarantee that the memory will get promoted into Gen2, which will potentially make life worse, not better.
Typically, my advice would be to trust the system, but measure as you go. If, and only if, you find a real problem, there are almost always ways to address those specific issues (without resorting to unmanaged or manually managing memory buffers). Working with raw memory should be an absolute last resort when dealing with a managed code base.

C# Garbage Collection -> to C++ delete

I'm converting a C# project to C++ and have a question about deleting objects after use. In C# the GC of course takes care of deleting objects, but in C++ it has to be done explicitly using the delete keyword.
My question is: is it OK to just follow each object's usage throughout a method and then delete it as soon as it goes out of scope (i.e. method end/re-assignment)?
I know, though, that the GC waits for a certain amount of garbage (~1MB) before collecting; does it do this because there is an overhead when using delete?
As this is a game I am creating, there will potentially be lots of objects being created and deleted every second, so would it be better to keep track of pointers that go out of scope, and once that size reaches 1MB, delete the pointers?
(as a side note: later when the game is optimised, objects will be loaded once at startup so there is not much to delete during gameplay)
Your problem is that you are using pointers in C++.
This is a fundamental problem that you must fix, then all your problems go away. As chance would have it, I got so fed up with this general trend that I created a set of presentation slides on this issue. – (CC BY, so feel free to use them).
Have a look at the slides. While they are certainly not entirely serious, the fundamental message is still true: Don’t use pointers. But more accurately, the message should read: Don’t use delete.
In your particular situation you might find yourself with a lot of long-lived small objects. This is indeed a situation which a modern GC handles quite well, and which reference-counting smart pointers (shared_ptr) handle less efficiently. If (and only if!) this becomes a performance problem, consider switching to a small object allocator library.
You should be using RAII as much as possible in C++ so you do not have to explicitly delete anything.
Once you use RAII through smart pointers and your own resource-managing classes, every dynamic allocation you make lives only as long as there can be references to it; you do not have to manage any resources explicitly.
Memory management in C# and C++ is completely different. You shouldn't try to mimic the behavior of .NET's GC in C++. In .NET allocating memory is super fast (basically moving a pointer) whereas freeing it is the heavy task. In C++ allocating memory isn't that lightweight for several reasons, mainly because a large enough chunk of memory has to be found. When memory chunks of different sizes are allocated and freed many times during the execution of the program the heap can get fragmented, containing many small "holes" of free memory. In .NET this won't happen because the GC will compact the heap. Freeing memory in C++ is quite fast, though.
Best practices in .NET don't necessarily work in C++. For example, pooling and reusing objects in .NET isn't recommended most of the time, because the objects get promoted to higher generations by the GC. The GC works best for short lived objects. On the other hand, pooling objects in C++ can be very useful to avoid heap fragmentation. Also, allocating a larger chunk of memory and using placement new can work great for many smaller objects that need to be allocated and freed frequently, as it can occur in games. Read up on general memory management techniques in C++ such as RAII or placement new.
Also, I'd recommend getting the books "Effective C++" and "More effective C++".
Well, the simplest solution might be to just use garbage collection in C++. The Boehm collector works well, for example. Still, there are pros and cons (but porting code originally written in C# would be a likely candidate for a case where the pros largely outweigh the cons).
Otherwise, if you convert the code to idiomatic C++, there shouldn't be that many dynamically allocated objects to worry about. Unlike C#, C++ has value semantics by default, and most of your short-lived objects should be simply local variables, possibly copied if they are returned, but not allocated dynamically. In C++, dynamic allocation is normally only used for entity objects, whose lifetime depends on external events; e.g. a Monster is created at some random time, with a probability depending on the game state, and is deleted at some later time, in reaction to events which change the game state. In this case, you delete the object when the monster ceases to be part of the game.
In C#, you probably have a dispose function, or something similar, for such objects, since they typically have concrete actions which must be carried out when they cease to exist - things like deregistering as an Observer, if that's one of the patterns you're using. In C++, this sort of thing is typically handled by the destructor, and instead of calling dispose, you delete the object.
Substituting a shared_ptr for every instance where you use a reference in C# would get you the closest approximation at probably the lowest effort when converting the code.
However, you specifically mention following an object's use through a method and deleting it at the end - a better approach is not to new up the object at all but simply instantiate it inline/on the stack. In fact, if you take this approach even for returned objects, with the new copy semantics being introduced this becomes an efficient way to deal with returned objects as well - so there is almost no scenario where you need to use pointers.
There are a lot more things to take into considerations when deallocating objects than just calling delete whenever it goes out of scope. You have to make sure that you only call delete once and only call it once all pointers to that object have gone out of scope. The garbage collector in .NET handles all of that for you.
The construct in C++ that corresponds most closely to this is tr1::shared_ptr<>, which keeps a reference count to the object and deallocates it when the count drops to zero. A first approach to get things running would be to turn all C# references into C++ tr1::shared_ptr<>. Then you can go into those places where there is a performance bottleneck (only after you've verified with a profiler that it is an actual bottleneck) and change to more efficient memory handling.
Garbage collection for C++ has been discussed a lot on SO; try reading through this:
Garbage Collection in C++

Is there any performance hit or thread context switch on using unsafe code?

If I want to use a bit of unsafe code inside a very time-sensitive app - will there be any delay in 'switching' to unsafe code, or a thread context switch? C# .NET 4
In principle: no. The whole point is that you bypass some of the managed runtime checks and restrictions.
That said, it is theoretically possible that the JIT engine can apply fewer optimizations in rare circumstances, due to the fact that fewer assumptions can be made about the code in the unsafe block. Edit: actually, the point about pinning heap memory made by Matthew is a prime example that lies in this direction. The JITter and GC engine are more restricted and can make fewer assumptions.
Also, unsafe code requires running with certain permissions, so it might not be appropriate for all deployment targets.
The time taken to obtain fixed memory locations and to convert indexes to pointers may have negative impacts depending on what you are trying to do. The only real way to know is to try it both safe and unsafe and see which is faster. (My experience has been that the safe version is typically faster... I was very surprised.)
There usually is a penalty from marshaling the parameters and results.

Pinning pointer arrays in memory

I'm currently working on a ray-tracer in C# as a hobby project. I'm trying to achieve a decent rendering speed by implementing some tricks from a c++ implementation and have run into a spot of trouble.
The objects in the scenes which the ray-tracer renders are stored in a KdTree structure and the tree's nodes are, in turn, stored in an array. The optimization I'm having problems with is while trying to fit as many tree nodes as possible into a cache line. One means of doing this is for nodes to contain a pointer to the left child node only. It is then implicit that the right child follows directly after the left one in the array.
The nodes are structs and during tree construction they are successfully put into the array by a static memory manager class. When I begin to traverse the tree it, at first, seems to work just fine. Then at a point early in the rendering (about the same place each time), the left child pointer of the root node suddenly points to null. I have come to the conclusion that the garbage collector has moved the structs, as the array lies on the heap.
I've tried several things to pin the addresses in memory, but none of them seems to last for the entire application lifetime as I need. The 'fixed' keyword only seems to help during single method calls, and declaring 'fixed' arrays can only be done with simple types, which a node isn't. Is there a good way to do this, or am I just too far down the path of stuff C# wasn't meant for?
Btw, changing to c++, while perhaps the better choice for a high performance program, is not an option.
Firstly, if you're using C# normally, you can't suddenly get a null reference due to the garbage collector moving stuff, because the garbage collector also updates all references, so you don't need to worry about it moving stuff around.
You can pin things in memory but this may cause more problems than it solves. For one thing, it prevents the garbage collector from compacting memory properly, and may impact performance in that way.
One thing I would say from your post is that using structs may not help performance as you hope. C# fails to inline any method calls involving structs, and even though they've fixed this in their latest runtime beta, structs frequently don't perform that well.
Personally, I would say C++ tricks like this don't generally tend to carry over too well into C#. You may have to learn to let go a bit; there can be other more subtle ways to improve performance ;)
What is your static memory manager actually doing? Unless it is doing something unsafe (P/Invoke, unsafe code), the behaviour you are seeing is a bug in your program, and not due to the behaviour of the CLR.
Secondly, what do you mean by 'pointer', with respect to links between structures? Do you literally mean an unsafe KdTree* pointer? Don't do that. Instead, use an index into the array. Since I expect that all nodes for a single tree are stored in the same array, you won't need a separate reference to the array. Just a single index will do.
Finally, if you really really must use KdTree* pointers, then your static memory manager should allocate a large block using e.g. Marshal.AllocHGlobal or another unmanaged memory source; it should both treat this large block as a KdTree array (i.e. index a KdTree* C-style) and it should suballocate nodes from this array, by bumping a "free" pointer.
If you ever have to resize this array, then you'll need to update all the pointers, of course.
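A sketch of that suballocation approach, with a made-up KdNode layout standing in for the real node struct:
// requires: using System; using System.Runtime.InteropServices; compile with /unsafe
struct KdNode { public int LeftChild; public float SplitValue; }

unsafe class NodeArena
{
    KdNode* _base;
    int _next;
    readonly int _capacity;

    public NodeArena(int capacity)
    {
        _capacity = capacity;
        _base = (KdNode*)Marshal.AllocHGlobal(capacity * sizeof(KdNode));
    }

    public KdNode* Allocate()          // bump the "free" pointer
    {
        if (_next == _capacity) throw new OutOfMemoryException();
        return _base + _next++;
    }

    public void Release()
    {
        Marshal.FreeHGlobal((IntPtr)_base);
        _base = null;
    }
}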
The basic lesson here is that unsafe pointers and managed memory do not mix outside of 'fixed' blocks, which of course have stack frame affinity (i.e. when the function returns, the pinned behaviour goes away). There is a way to pin arbitrary objects, like your array, using GCHandle.Alloc(yourArray, GCHandleType.Pinned), but you almost certainly don't want to go down that route.
You will get more sensible answers if you describe in more detail what you are doing.
If you really want to do this, you can use the GCHandle.Alloc method to specify that a pointer should be pinned without being automatically released at the end of the scope like the fixed statement.
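For example (nodes standing in for your existing node array), something along these lines pins it beyond any single method call:
// requires: using System; using System.Runtime.InteropServices;
GCHandle handle = GCHandle.Alloc(nodes, GCHandleType.Pinned);
IntPtr addr = handle.AddrOfPinnedObject();   // stable address of the array's data
// ... use addr for as long as you need ...
handle.Free();   // unpin when done, otherwise the heap can never compact around it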
But, as other people have been saying, doing this is putting undue pressure on the garbage collector. What about just creating a struct that holds onto a pair of your nodes and then managing an array of NodePairs rather than an array of nodes?
If you really do want to have completely unmanaged access to a chunk of memory, you would probably be better off allocating the memory directly from the unmanaged heap rather than permanently pinning a part of the managed heap (this prevents the heap from being able to properly compact itself). One quick and simple way to do this would be to use Marshal.AllocHGlobal method.
Is it really prohibitive to store the pair of array reference and index?
What is your static memory manager actually doing? Unless it is doing something unsafe (P/Invoke, unsafe code), the behaviour you are seeing is a bug in your program, and not due to the behaviour of the CLR.
I was in fact speaking about unsafe pointers. What I wanted was something like Marshal.AllocHGlobal, though with a lifetime exceeding a single method call. On reflection it seems that just using an index is the right solution as I might have gotten too caught up in mimicking the c++ code.
One thing I would say from your post is that using structs may not help performance as you hope. C# fails to inline any method calls involving structs, and even though they've fixed this in their latest run-time beta, structs frequently don't perform that well.
I looked into this a bit and I see it has been fixed in .NET 3.5 SP1; I assume that's what you were referring to as the run-time beta. In fact, I now understand that this change accounted for a doubling of my rendering speed. Now structs are aggressively inlined, improving their performance greatly on x86 systems (x64 already had better struct performance).
