I'm attempting to marshal a forest of objects from C# .NET to native C++. That is: I have a graph of hundreds of millions of objects (if not more) that I wish to use in native C++. Think of it as a normal 'leaf'/'node' construction with pointers between leaves and nodes. I control both the C++ and the C# code, so I can make adjustments to the code.
The inner loop of the software is going to be implemented in native C++ for performance reasons. I basically want to tell the GC to stop for a while (to ensure objects aren't moved), then do the fancy C++ routine, and then continue the GC once it's done.
There are also things that I don't want to do:
Make my own mark & sweep algorithm to pin all objects in the graph. Not only would that be very time-consuming, it would also cost a lot of memory, because I would then have to keep track of all those GCHandle objects myself.
Use native allocation methods like malloc. I've had a native C++ application in the past, and it suffered greatly from memory fragmentation, a problem that .NET solves 'automatically' just fine... not to mention the benefit of GC.
So, any suggestions on how to do this?
I will look at using managed C++.
Maybe accessing the .NET objects from managed C++ will be fast enough.
Otherwise, use managed C++ to "walk" the .NET objects and create native C++ objects, then delete them all once done.
Or create a class factory in managed C++, callable from C#, that creates the native C++ objects; once again, delete them all once done.
Or do as Marc Gravell suggests and manually allocate a buffer of unmanaged memory, dealing with structs inside that space, perhaps using a code generator driven by attributes on your C# classes.
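As a rough sketch of that last option (the type names, field layout, and index-based linking are all illustrative, not from the original suggestion), the C# side might look something like this; compile with /unsafe:
using System;
using System.Runtime.InteropServices;

// Illustrative node layout; a real code generator would emit this from attributes on the C# classes.
[StructLayout(LayoutKind.Sequential)]
struct NativeNode
{
    public int Value;
    public int LeftIndex;   // indices instead of raw pointers, so the buffer is position-independent
    public int RightIndex;
}

static class UnmanagedForest
{
    public static unsafe IntPtr Build(int count)
    {
        // One flat unmanaged allocation: the GC never sees it, so it is never moved or collected,
        // and native C++ can read it directly through the returned pointer.
        IntPtr buffer = Marshal.AllocHGlobal(count * sizeof(NativeNode));
        var nodes = (NativeNode*)buffer;
        for (int i = 0; i < count; i++)
        {
            nodes[i].Value = i;
            nodes[i].LeftIndex = 2 * i + 1 < count ? 2 * i + 1 : -1;
            nodes[i].RightIndex = 2 * i + 2 < count ? 2 * i + 2 : -1;
        }
        return buffer; // free with Marshal.FreeHGlobal(buffer) when the native code is done with it
    }
}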
Related
I know C# gives the programmer the ability to access and use pointers in an unsafe context. But when is this needed?
Under what circumstances does using pointers become inevitable?
Is it only for performance reasons?
Also, why does C# expose this functionality through an unsafe context and remove all of the managed advantages from it? Is it possible to use pointers without losing any of the advantages of a managed environment, theoretically?
When is this needed? Under what circumstances does using pointers become inevitable?
When the net cost of a managed, safe solution is unacceptable but the net cost of an unsafe solution is acceptable. You can determine the net cost or net benefit by subtracting the total benefits from the total costs. The benefits of an unsafe solution are things like "no time wasted on unnecessary runtime checks to ensure correctness"; the costs are (1) having to write code that is safe even with the managed safety system turned off, and (2) having to deal with potentially making the garbage collector less efficient, because it cannot move around memory that has an unmanaged pointer into it.
Or, if you are the person writing the marshalling layer.
Is it only for performance reasons?
It seems perverse to use pointers in a managed language for reasons other than performance.
You can use the methods in the Marshal class to deal with interoperating with unmanaged code in the vast majority of cases. (There might be a few cases in which it is difficult or impossible to use the marshalling gear to solve an interop problem, but I don't know of any.)
Of course, as I said, if you are the person writing the Marshal class then obviously you don't get to use the marshalling layer to solve your problem. In that case you'd need to implement it using pointers.
Why does C# expose this functionality through an unsafe context, and remove all of the managed advantages from it?
Those managed advantages come with performance costs. For example, every time you ask an array for its tenth element, the runtime needs to do a check to see if there is a tenth element, and throw an exception if there isn't. With pointers that runtime cost is eliminated.
The corresponding developer cost is that if you do it wrong, you get to deal with memory corruption bugs that format your hard disk and crash your process an hour later, rather than dealing with a nice clean exception at the point of the error.
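To make that trade-off concrete, here is a minimal sketch (not from the original answer) of the same loop written both ways; the indexed version pays for a range check on each access unless the JIT can prove it away, while the fixed/pointer version skips the checks and, with them, the safety net:
static int SumChecked(int[] data)
{
    int sum = 0;
    for (int i = 0; i < data.Length; i++)
        sum += data[i];      // bounds-checked access (the JIT can often, but not always, elide the check)
    return sum;
}

static unsafe int SumUnchecked(int[] data)
{
    int sum = 0;
    fixed (int* p = data)    // pin the array so the GC cannot move it while we hold a raw pointer
    {
        for (int i = 0; i < data.Length; i++)
            sum += p[i];     // raw pointer read: no range check, and no safety net if 'i' is wrong
    }
    return sum;
}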
Is it possible to use pointers without losing any advantages of a managed environment, theoretically?
By "advantages" I assume you mean advantages like garbage collection, type safety and referential integrity. Thus your question is essentially "is it in theory possible to turn off the safety system but still get the benefits of the safety system being turned on?" No, clearly it is not. If you turn off that safety system because you don't like how expensive it is then you don't get the benefits of it being on!
Pointers are an inherent contradiction to the managed, garbage-collected, environment.
Once you start messing with raw pointers, the GC has no clue what's going on.
Specifically, it cannot tell whether objects are reachable, since it doesn't know where your pointers are.
It also cannot move objects around in memory, since that would break your pointers.
All of this would be solved by GC-tracked pointers; that's what references are.
You should only use pointers in messy advanced interop scenarios or for highly sophisticated optimization.
If you have to ask, you probably shouldn't.
The GC can move objects around; keeping memory outside of the GC's control (unmanaged allocations used from unsafe code) avoids this. fixed pins an object in place, but still lets the GC manage the memory.
By definition, if you have a pointer to the address of an object, and the GC moves it, your pointer is no longer valid.
As to why you need pointers: the primary reason is to work with unmanaged DLLs, e.g. those written in C++.
Also note, when you pin variables and use pointers, you're more susceptible to heap fragmentation.
Edit
You've touched on the core issue of managed vs. unmanaged code... how does the memory get released?
You can mix code for performance as you describe, you just can't cross managed/unmanaged boundaries with pointers (i.e. you can't use pointers outside of the 'unsafe' context).
As for how they get cleaned... You have to manage your own memory; objects that your pointers point to were created/allocated (usually within the C++ DLL) using (hopefully) CoTaskMemAlloc(), and you have to release that memory in the same manner, calling CoTaskMemFree(), or you'll have a memory leak. Note that only memory allocated with CoTaskMemAlloc() can be freed with CoTaskMemFree().
The other alternative is to expose a method from your native C++ DLL that takes a pointer and frees it... this lets the DLL decide how to free the memory, which works best if it used some other method to allocate the memory. Most native DLLs you work with are third-party DLLs that you can't modify, and they don't usually have (that I've seen) such functions to call.
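If you do control the native DLL, the C# side of that approach is just a second p/invoke declaration. A hedged sketch, with made-up export names:
// Hypothetical exports - the real names and signatures depend on your DLL.
[DllImport("mynative.dll")]
static extern IntPtr CreateReport();           // native side allocates with whatever allocator it prefers

[DllImport("mynative.dll")]
static extern void FreeReport(IntPtr report);  // and frees with the matching deallocator

IntPtr report = CreateReport();
try
{
    // read the contents out of 'report' with the Marshal class here
}
finally
{
    // let the DLL free its own memory; don't guess with Marshal.FreeHGlobal/FreeCoTaskMem
    FreeReport(report);
}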
An example of freeing memory, taken from here:
// p/invoke declaration for the sample's native function (the DLL name here is illustrative);
// the native side allocates the returned string with CoTaskMemAlloc.
[DllImport("mynative.dll")]
static extern IntPtr test(string[] array);

string[] array = new string[2];
array[0] = "hello";
array[1] = "world";
IntPtr ptr = test(array);
string result = Marshal.PtrToStringAuto(ptr);
Marshal.FreeCoTaskMem(ptr);        // freed with the allocator that matches CoTaskMemAlloc
System.Console.WriteLine(result);
Some more reading material:
C# deallocate memory referenced by IntPtr
The second answer down explains the different allocation/deallocation methods
How to free IntPtr in C#?
Reinforces the need to deallocate in the same manner the memory was allocated
http://msdn.microsoft.com/en-us/library/aa366533%28VS.85%29.aspx
Official MSDN documentation on the various ways to allocate and deallocate memory.
In short... you need to know how the memory was allocated in order to free it.
Edit
If I understand your question correctly, the short answer is yes, you can hand the data off to unmanaged pointers, work with it in an unsafe context, and have the data available once you exit the unsafe context.
The key is that you have to pin the managed object you're referencing with a fixed block. This prevents the memory you're referencing from being moved by the GC while you're in the unsafe block. There are a number of subtleties involved here, e.g. you can't reassign a pointer initialized in a fixed block... you should read up on the unsafe and fixed statements if you're really set on managing your own memory.
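A minimal sketch of that pattern (the struct and the processing are illustrative): the fixed block pins the array for its duration, you work on it through a raw pointer, and the updated data is still sitting in the managed array once the block ends:
struct Particle { public float X, Y, Vx, Vy; }

static unsafe void Integrate(Particle[] particles, float dt)
{
    fixed (Particle* p = particles)   // pinned: the GC will not move 'particles' inside this block
    {
        for (int i = 0; i < particles.Length; i++)
        {
            p[i].X += p[i].Vx * dt;   // raw pointer writes go straight into the managed array
            p[i].Y += p[i].Vy * dt;
        }
    }   // pin released here; the array keeps the updated values and is GC-managed again as before
}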
All that said, the benefits of managing your own objects and using pointers in the manner you describe may not buy you as much of a performance increase as you might think. Reasons why not:
C# is very optimized and very fast
Your pointer code is still generated as IL, which has to be jitted (at which point further optimizations come into play)
You're not turning the Garbage Collector off... you're just keeping the objects you're working with out of the GC's purview. So every 100ms or so, the GC still interrupts your code and executes its functions for all the other variables in your managed code.
HTH,
James
The most common reasons to use pointers explicitly in C#:
doing low-level work (like string manipulation) that is very performance sensitive,
interfacing with unmanaged APIs.
The reason why most of the pointer syntax of C/C++ was dropped in C# (according to my knowledge and viewpoint — Jon Skeet would answer better B-)) is that it turned out to be superfluous in most situations.
From the language design perspective, once you manage memory with a garbage collector you have to introduce severe constraints on what is and what is not possible to do with pointers. For example, using a pointer to point into the middle of an object can cause severe problems for the GC. Hence, once the restrictions are in place, you can just omit the extra syntax and end up with "automatic" references.
Also, the ultra-benevolent approach found in C/C++ is a common source of errors. For most situations, where micro-performance doesn't matter at all, it is better to offer tighter rules and constrain the developer in exchange for fewer bugs that would be very hard to discover. Thus, for common business applications the so-called "managed" environments like .NET and Java are better suited than languages that presume to work against the bare metal.
Say you want to communicate between two applications using IPC (shared memory): you can marshal the data into that memory and pass the data pointer to the other application via Windows messaging or something similar, and the receiving application can fetch the data back.
This is also useful for transferring data from .NET to legacy VB6 apps: you marshal the data to memory, pass the pointer to the VB6 app using window messages, and use VB6's CopyMemory() to fetch the data from the managed memory space into the VB6 app's unmanaged memory space.
I'm converting a C# project to C++ and have a question about deleting objects after use. In C# the GC of course takes care of deleting objects, but in C++ it has to be done explicitly using the delete keyword.
My question is, is it ok to just follow each object's usage throughout a method and then delete it as soon as it goes out of scope (i.e. method end/re-assignment)?
I know though that the GC waits for a certain size of garbage (~1MB) before deleting; does it do this because there is an overhead when using delete?
As this is a game I am creating, there will potentially be lots of objects being created and deleted every second, so would it be better to keep track of pointers that go out of scope, and once that size reaches 1MB, delete the pointers?
(as a side note: later when the game is optimised, objects will be loaded once at startup so there is not much to delete during gameplay)
Your problem is that you are using pointers in C++.
This is a fundamental problem that you must fix, then all your problems go away. As chance would have it, I got so fed up with this general trend that I created a set of presentation slides on this issue. – (CC BY, so feel free to use them).
Have a look at the slides. While they are certainly not entirely serious, the fundamental message is still true: Don’t use pointers. But more accurately, the message should read: Don’t use delete.
In your particular situation you might find yourself with a lot of long-lived small objects. This is indeed a situation which a modern GC handles quite well, and which reference-counting smart pointers (shared_ptr) handle less efficiently. If (and only if!) this becomes a performance problem, consider switching to a small object allocator library.
You should be using RAII as much as possible in C++ so that you never have to explicitly delete anything.
Once you use RAII through smart pointers and your own resource-managing classes, every dynamic allocation you make will exist only as long as there are references to it; you do not have to manage any resources explicitly.
Memory management in C# and C++ is completely different. You shouldn't try to mimic the behavior of .NET's GC in C++. In .NET allocating memory is super fast (basically moving a pointer) whereas freeing it is the heavy task. In C++ allocating memory isn't that lightweight for several reasons, mainly because a large enough chunk of memory has to be found. When memory chunks of different sizes are allocated and freed many times during the execution of the program the heap can get fragmented, containing many small "holes" of free memory. In .NET this won't happen because the GC will compact the heap. Freeing memory in C++ is quite fast, though.
Best practices in .NET don't necessarily work in C++. For example, pooling and reusing objects in .NET isn't recommended most of the time, because the objects get promoted to higher generations by the GC. The GC works best for short lived objects. On the other hand, pooling objects in C++ can be very useful to avoid heap fragmentation. Also, allocating a larger chunk of memory and using placement new can work great for many smaller objects that need to be allocated and freed frequently, as it can occur in games. Read up on general memory management techniques in C++ such as RAII or placement new.
Also, I'd recommend getting the books "Effective C++" and "More Effective C++".
Well, the simplest solution might be to just use garbage collection in C++. The Boehm collector works well, for example. Still, there are pros and cons (but porting code originally written in C# would be a likely candidate for a case where the pros largely outweigh the cons).
Otherwise, if you convert the code to idiomatic C++, there shouldn't be that many dynamically allocated objects to worry about. Unlike C#, C++ has value semantics by default, and most of your short-lived objects should simply be local variables, possibly copied if they are returned, but not allocated dynamically. In C++, dynamic allocation is normally only used for entity objects, whose lifetime depends on external events; e.g. a Monster is created at some random time, with a probability depending on the game state, and is deleted at some later time, in reaction to events which change the game state. In this case, you delete the object when the monster ceases to be part of the game. In C#, you probably have a dispose function, or something similar, for such objects, since they typically have concrete actions which must be carried out when they cease to exist—things like deregistering as an Observer, if that's one of the patterns you're using. In C++, this sort of thing is typically handled by the destructor, and instead of calling dispose, you delete the object.
Substituting a shared_ptr for every place you use a reference in C# would get you the closest approximation, at probably the lowest effort, when converting the code.
However, you specifically mention following an object's use through a method and deleting it at the end - a better approach is not to new up the object at all, but simply to instantiate it inline/on the stack. In fact, with the move semantics being introduced, this even becomes an efficient way to deal with returned objects, so there is no need to use pointers in almost any scenario.
There are a lot more things to take into consideration when deallocating objects than just calling delete whenever something goes out of scope. You have to make sure that you only call delete once, and only once all pointers to that object have gone out of scope. The garbage collector in .NET handles all of that for you.
The construct in C++ that corresponds most closely is tr1::shared_ptr<>, which keeps a reference count for the object and deallocates it when the count drops to zero. A first approach to get things running would be to turn all C# references into C++ tr1::shared_ptr<>. Then you can go into the places that are performance bottlenecks (only after you've verified with a profiler that they actually are bottlenecks) and change to more efficient memory handling.
Garbage collection for C++ has been discussed a lot on SO.
Try reading through this:
Garbage Collection in C++
I have a project in which I'll have to process 100s if not 1000s of messages a second, and process/plot this data on graphs accordingly. (The user will search for a set of data in which the graph will be plotted in real time, not literally having to plot 1000s of values on a graph.)
I'm having trouble understanding how to use DLLs to do the bulk of the message processing in C++ and then hand the information to a C# interface. Can someone dumb it down for me here?
Also, as speed will be a priority, I was wondering whether accessing across two different layers of code will have more of a performance hit than programming the project in its entirety in C# or, of course, C++. However, I've read bad things about programming a GUI in C++; on that note, this application must also look modern, clean, professional, etc. So I was thinking C# would be the way forward (perhaps XAML, WPF).
Thanks for your time.
The simplest way to interop between a C/C++ DLL and a .NET Assembly is through p/invoke. On the C/C++ side, create a DLL as you would any other. On the C# side you create a p/invoke declaration. For example, say your DLL is mydll.dll and it exports a method void Foo():
// requires: using System.Runtime.InteropServices;
[DllImport("mydll.dll")]
extern static void Foo();
That's it. You simply call Foo like any other static class method. The hard part is getting data marshalled and that is a complicated subject. If you are writing the DLL you can probably go out of your way to make the export functions easily marshalled. For more on the topic of p/invoke marshalling see here: http://msdn.microsoft.com/en-us/magazine/cc164123.aspx.
You will take a performance hit when using p/invoke. Every time a managed application makes an unmanaged method call, it takes a hit crossing the managed/unmanaged boundary and then back again. When you marshal data, a lot of copying goes on. The copying can be reduced if necessary by using 'unsafe' C# code (using pointers to access unmanaged memory directly).
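For a rough illustration of where that cost goes (the export name and signature below are assumptions, not from the answer): a blittable array such as double[] is simply pinned for the duration of the call and passed in place, while non-blittable data (strings, classes, arrays with reference fields) is copied and converted on every crossing.
// Hypothetical native export: void process_samples(double* samples, int count);
[DllImport("mydll.dll", CallingConvention = CallingConvention.Cdecl)]
static extern void process_samples(double[] samples, int count);

double[] samples = GetSamples();           // GetSamples() is a placeholder for your own data source
process_samples(samples, samples.Length);  // double[] is blittable: pinned for the call and passed
                                           // in place, so no copy is made

// A string[] parameter, by contrast, is converted element by element on every call,
// which is where the marshalling cost shows up.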
What you should be aware of is that all .NET applications are chock full of p/invoke calls. No .NET application can avoid making Operating System calls and every OS call has to cross into the unmanaged world of the OS. WinForms and even WPF GUI applications make that journey many hundreds, even thousands of times a second.
If it were my task, I would first do it 100% in C#. I would then profile it and tweak performance as necessary.
If speed is your priority, C++ might be the better choice. Try to make some estimates of how hard the computation really is (1,000 messages per second can be trivial to handle in C# if the per-message calculation is easy, or too much for even the best-optimized program if it is not). C++ might have some more advantages (regarding performance) over C# if your algorithms are complex, involve many classes, etc.
You might want to take a look at this question for a performance comparison.
Separating back-end and front-end is a good idea. Whether you get a performance penalty from having one in C++ and the other in C# depends on how much data conversion is actually necessary.
I don't think programming a GUI in C++ is a pain in general. MFC might be painful; Qt is not (IMHO).
Maybe this gives you some points to start with!
Another possible way to go: sounds like this task is a prime target for parallelization. Build your app in such a way that it can split its workload on several CPU cores or even different machines. Then you can solve your performance problems (if there will be any) by throwing hardware at them.
If you have C/C++ source, consider linking it into a C++/CLI .NET assembly. This kind of project allows you to mix in unmanaged code and put managed interfaces on it. The result is a simple .NET assembly which is trivial to use in C# or VB.NET projects.
There is built-in marshaling of simple types, so that you can call functions from the managed C++ side into the unmanaged side.
The only thing you need to be aware of is that when you marshal a delegate into a function pointer, it doesn't hold a reference, so if you need the C++ to hold managed callbacks, you need to arrange for a reference to be held. Other than that, most of the built-in conversions work as expected. Visual Studio will even let you debug across the boundary (turn on unmanaged debugging).
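A hedged sketch of that callback pitfall (the native export is made up): keep a reference to the delegate alive for as long as the native side might call back, otherwise the GC may collect it even though native code still holds the function pointer.
class NativeSession
{
    delegate void ProgressCallback(int percent);

    [DllImport("mynative.dll")]
    static extern void RegisterProgressCallback(ProgressCallback callback);  // hypothetical export

    // Stored in a field so the delegate outlives the registration call. A local variable would be
    // eligible for collection as soon as the constructor returns, even though the native side
    // still holds the function pointer it was marshalled to.
    private readonly ProgressCallback _callback;

    public NativeSession()
    {
        _callback = percent => System.Console.WriteLine("native reports " + percent + "%");
        RegisterProgressCallback(_callback);
    }
}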
If you have a .lib, you can use it in a C++/CLI project as long as it's linked to the C-Runtime dynamically.
You should really prototype this in C# before you start screwing around with marshalling and unmarshalling data into unsafe structures so that you can invoke functions in a C++ DLL. C# is very often faster than you think it'll be. Prototyping is cheap.
I'm writing some native C++ code which needs to be called from C# (and I can't replace the C++ native code with C# code).
I found memory corruption while allocating/deallocating some memory in the native C++ code using malloc/free. Then I used LocalAlloc/LocalFree and HeapAlloc/HeapFree and had the same problems.
The allocations/deallocations seem to be correct, and they happen in a separate thread, created by the native code.
I was wondering which allocation strategy is best to use in a native C++ library called by C#.
EDIT: found the problem. It wasn't in the allocation/deallocation code, but in some memory being written to after it had been deallocated.
As long as the C# side of the code uses the compiler's /unsafe switch and the fixed keyword to hold the buffer of data, I think you should be ok.
As to the question of your memory allocation, it may not be the C++ memory allocation code that is causing the problem; it could be the way the C++ code is interacting with the driver... maybe use the VirtualAlloc/VirtualFree pair as per the MSDN docs.
Edit: when you try to allocate the buffer to hold the data from the C++ side after interacting with the driver, possibly a race condition or interrupt latency is causing the memory corruption... just a thought.
Hope this helps,
Best regards,
Tom.
Your question is missing essential details, it isn't at all clear whether the memory allocated by the C++ code needs to be released on the C# side. That's normally done automatically with, say, the P/Invoke marshaller or the COM interop layer in the CLR. Or can be done manually by declaring a method argument as IntPtr, then use of the Marshal class.
If it is done automatically you must use the COM memory allocator, CoTaskMemAlloc(). If you marshal yourself you could also use GlobalAlloc(), release on the C# side with Marshal.FreeHGlobal(). There isn't any advantage to using GlobalAlloc(), you might as well use CoTaskMemAlloc() and release with Marshal.FreeCoTaskMem().
But you should have noticed this yourself. Allocating with malloc() or HeapAlloc() on the C++ side causes leaks instead of corruption if the managed code releases the memory. Vista and Win7 have a much stricter heap manager, it terminates the program if it notices a bad release.
It sounds to me that you have simple heap corruption in your C++ code. That is the most common scourge of unmanaged C++ programming, over-running the end of a buffer, writing to memory that's been freed, bad pointer values. The way to get rid of bugs like these is a careful code review and use of a debug allocator, such as the one provided by <crtdbg.h>. Good luck with it.
The Windows Driver Development Kit recommends against the use of C++ for drivers.
Also, the best strategy is to have the driver manage its own memory. When the C# code needs to see the data, pass the driver a marshalled buffer and have the driver fill it.
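A sketch of that caller-allocated-buffer pattern on the C# side (the export name and signature are assumptions, not from the answer):
// Hypothetical export: int read_samples(unsigned char* buffer, int capacity); returns bytes written.
[DllImport("mydriverlib.dll")]
static extern int read_samples(byte[] buffer, int capacity);

byte[] buffer = new byte[64 * 1024];                 // buffer allocated and owned by the C# side
int written = read_samples(buffer, buffer.Length);   // the native/driver side only fills it
// Nothing to free afterwards: no memory was allocated in native code.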
I have an unmanaged C++ exe that I could either call from inside my C# code directly (I have the C++ source, so I could build it as a lib) or run by spawning a process and grabbing the data from its OutputStream. What are the advantages/disadvantages of the options?
Since you have the source code of the C++ library, you can use C++/CLI to compile it into a mixed-mode DLL that is easy to use from the C# application.
The benefit of this is maximum flexibility in data flow (input to and output from that C++ module).
Running the C++ code out of process has one benefit, though: if your C++ code is not very robust, it keeps your main C# process stable rather than letting it be crashed by the C++ code.
The big downside to scraping the OutputStream is the lack of data typing. I'd much rather do the work of exporting a few functions and reusing an existing library; but, that's really just a preference.
Another disadvantage of spawning a process is that on Windows, spawning a process is a very expensive (slow) operation. If you intend to call the C++ code quite often, this is worth considering.
An advantage can be that you're automatically better isolated from crashes in the C++ program.
Drop-in replacement of the C++ executable can be an advantage as well.
Furthermore, writing interop code can be a big hassle in C#. If it's a complicated interface and you decide to do interop, have a look at C++/CLI for the interop layer.
You're far better off taking a subset of the functions of the C++ executable and building it into a library. You'll keep type safety and you'll be able to better leverage exception handling (not to mention finer-grained control over how you manage the calls into the functions in the library).
If you go with grabbing the data from the OutputStream of the executable, you're going to have no visibility into the processes of the executable, no real exception handling, and you're going to lose any type information you may have had.
The main disadvantage to being in process would be making sure you handle the managed/native interactions correctly.
1) The C++ code will probably depend on deterministic destruction for cleanup/resource freeing, etc. I say "probably" because this is common and good practice in C++.
In the managed code, this means you have to be careful to dispose of your C++/CLI wrapper objects properly. If your code is used once, a using clause in C# will do this for you. If the object needs to live for a while as a member, you'll find that the Dispose call needs to be chained the whole way through your application.
2) Another issue depends on how memory-hungry your application is. The managed garbage collector can be lazy: it is guaranteed to kick in if a managed allocation needs more space than is available, but the unmanaged allocator is not connected to it in any way. Therefore you need to manually inform the managed allocator that you will be making unmanaged allocations and that it should keep that space available. This is done using the GC.AddMemoryPressure method.
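A small sketch tying the two points together (the class and sizes are illustrative): the wrapper releases its unmanaged allocation deterministically through Dispose, and reports the unmanaged bytes to the GC with AddMemoryPressure/RemoveMemoryPressure so collections get scheduled sensibly.
class UnmanagedBuffer : IDisposable
{
    private IntPtr _memory;
    private readonly long _bytes;

    public UnmanagedBuffer(long bytes)
    {
        _bytes = bytes;
        _memory = Marshal.AllocHGlobal((IntPtr)bytes);
        GC.AddMemoryPressure(bytes);          // tell the GC this much unmanaged memory is in play
    }

    public void Dispose()
    {
        if (_memory != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(_memory);
            GC.RemoveMemoryPressure(_bytes);  // balance the earlier AddMemoryPressure
            _memory = IntPtr.Zero;
        }
    }
}

// Used once, a using clause handles the disposal; held as a member, Dispose has to be chained.
using (var buffer = new UnmanagedBuffer(16 * 1024 * 1024))
{
    // call into the native code with the buffer here
}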
The main disadvantages to being out of process are:
1) Speed.
2) Code overhead to manage the communication.
3) Code overhead to watch for one or the other process dying when it is not expected to.