As far as I know (correct me if I am wrong please), managed languages (or at least C#) is not going to make any segfault (at least when no Unsafe or directly dealing with unmanaged memory). This opposite to unmanaged language (or at least C++) where you can get segfault by just taking a look to cat near you for a second while coding.
The question: How managed language ensure this? were their runtime library built and tested so carefully. Or they have some way to catch these segfault and deal with it in a way or another?
The motivation behind this question: I have C# application that calls a native C++ library (both were built by me). When my C++ DLL makes segfault, the whole application goes down (some services go down) which is not a good thing at all. I know that when getting segfault, this means something was done wrongly and need to be corrected. However, at least I want some mechanism to solve this problem when the buggy (may cause segfault) C++ DLL is working on the customer machine.
They don't allow you to manually deallocate memory.
They don't enable you to read/write from/to arbitrary memory addresses (C++ also doesn't allow this, but the language syntax makes it possible).
(as a special form of the above) They check every array access whether it is within the bounds of the array
To the best of my knowledge, they don't have undefined bahavior (except of courese, when calling unsafe code)
I want some mechanism to solve this problem when the buggy (may cause segfault) C++ DLL is working on the customer machine.
The problem is that even if you could allow your program to continue (I don't know if Windows/c# offer any mechanism to do this), it might no longer be in a valid state, so depending on what the error is and to what kind of ressources you program has access to, this might actually result in worse errors than just a program crash, including the destruction of userdata.
Related
I have a C# application that does a few bits and pieces, but the main task it performs is done by a Delphi DLL which it calls.
This Delphi DLL is a total memory hog, which needs to cache a lot of DB-information locally for speed. I'm happy that it's not leaky, as FastMM4 isn't reporting any memory leaks when the code is run within Delphi.
I am starting to run into problems, however, when the control is returned to C#. The C# code attempts to do some calculations on the results of the Delphi app (all results marshalled via a DB). These calculations usually involve a million or so doubles so not extreme memory usage, however the code keeps returning me out of memory exceptions.
I assume that FastMM4 in the Delphi code still hasn't returned the freed memory to Windows (and hence available to the C# code), so the process is still using it's maximum 32-bit memory allocation and C# can't obtain more when it needs to.
So, how do I get the memory used (and freed) by Delphi usable again by the C# code? I thought we may want to do one of the following:
Force an unload of the Delphi DLL from the C# side (my colleague doesn't think this will work, as he thinks it'll just unload the code rather than the memory used on the heap) - probably LoadLibrary/FreeLibrary?
Make a call at the end of the Delphi DLL to release the memory back to Windows (I tried SetWorkingProcessSetSize before, but didn't seem to do anything, should I use a different call?)
Wrap the Delphi DLL in a C# DLL and call it in a different AppDomain (I don't like this from a style perspective as we're creating wrappers just to hold wrappers.
Anything else I've missed?
Force an unload of the Delphi DLL from the C# side (my colleague doesn't think this will work, as he thinks it'll just unload the code rather than the memory used on the heap) - probably LoadLibrary/FreeLibrary?
This will just work. When the DLL unloads, FastMM will finalize and return the memory that it reserved and committed.
One thing I would do is make a call to GC.Collect before calling your library. .NET knows what to do when more managed memory is requested than can fit and calls the collector automatically, however it has no clue what you're doing in native code so there will be a lot of memory allocated needlessly.
I would also move from a 32 bit architecture. It's not that you ran out of memory, it's that you ran out of consecutive memory large enough to fit whatever you're trying to do in Delphi. A larger virtual address space will fix that issue for you, and there hasn't been a processor made in the past 6 years that didn't wasn't 64 bit capable. It's time to take those shy steps into the bright future ahead of us.
In C# (or maybe in .NET in general) individual assemblies cannot be unloaded from memory.
Unloading can only occur at the AppDomain level.
I am wondering what are there reasons behind this design? Other languages support this feature (C++ i think)
Here is an MSDN blog post listing some reasons why not. The main issue is:
First off, you are running that code in the app domain (duh!). That means there are potentially call sites and call stacks with addresses in them that are expecting to keep working. Have you ever gotten an access violation where your EIP points to 0x???????? That is an example where someone freed up a DLL, the pages got unmapped by the memory system, and then you tried to branch to it. This typically happens in COM when you have a ref counting error and you make an interface method call. We cannot afford to be as lose with managed code. We must guarantee we know all of the code you are executing and that it is type safe and verifiable. That means explicit tracking of anything that could be using that code, including GC objects and COM interop wrappers. This tracking is handled today around an app domain boundary. Tracking it at the assembly level becomes quite expensive.
I'll summarise this in higher-level language:
Basically, things that go wrong if you simply delete executable code go wrong on the unmanaged level. You would have compiled code that points to other compiled code that is no longer there, so your code would jump into an area that is invalid, and possibly contains arbitrary data.
This is unacceptable in managed code, because things are meant to be safe and have some guarantees around them. One of these guarantees is that your code can't execute arbitrary sections of memory.
To handle this issue properly you'd have to track many more things more closely, and this would be a large overhead. The alternative is to only track these things at appdomain boundaries, which is what is done.
I have a project in which I'll have to process 100s if not 1000s of messages a second, and process/plot this data on graphs accordingly. (The user will search for a set of data in which the graph will be plotted in real time, not literally having to plot 1000s of values on a graph.)
I'm having trouble understanding using DLLs for having the bulk of the message processing in C++, but then handing the information into a C# interface. Can someone dumb it down for me here?
Also, as speed will be a priority, I was wondering if accessing across 2 different layers of code will have more of a performance hit than programming the project in its entirety in C#, or of course, C++. However, I've read bad things about programming a GUI in C++; in regards to which, this application must also look modern, clean, professional etc. So I was thinking C# would be the way forward (perhaps XAML, WPF).
Thanks for your time.
The simplest way to interop between a C/C++ DLL and a .NET Assembly is through p/invoke. On the C/C++ side, create a DLL as you would any other. On the C# side you create a p/invoke declaration. For example, say your DLL is mydll.dll and it exports a method void Foo():
[DllImport("mydll.dll")]
extern static void Foo();
That's it. You simply call Foo like any other static class method. The hard part is getting data marshalled and that is a complicated subject. If you are writing the DLL you can probably go out of your way to make the export functions easily marshalled. For more on the topic of p/invoke marshalling see here: http://msdn.microsoft.com/en-us/magazine/cc164123.aspx.
You will take a performance hit when using p/invoke. Every time a managed application makes an unmanaged method call, it takes a hit crossing the managed/unmanaged boundary and then back again. When you marshal data, a lot of copying goes on. The copying can be reduced if necessary by using 'unsafe' C# code (using pointers to access unmanaged memory directly).
What you should be aware of is that all .NET applications are chock full of p/invoke calls. No .NET application can avoid making Operating System calls and every OS call has to cross into the unmanaged world of the OS. WinForms and even WPF GUI applications make that journey many hundreds, even thousands of times a second.
If it were my task, I would first do it 100% in C#. I would then profile it and tweak performance as necessary.
If speed is your priority, C++ might be the better choice. Try to make some estimations about how hard the calculation really is (1000 messages can be trivial to handle in C# if the calculation per message is easy, and they can be too hard for even the best optimized program). C++ might have some more advantages (regarding performance) over C# if your algorithms are complex, involving different classes, etc.
You might want to take a look at this question for a performance comparison.
Separating back-end and front-end is a good idea. Whether you get a performance penalty from having one in C++ and the other in C# depends on how much data conversion is actually necessary.
I don't think programming the GUI is a pain in general. MFC might be painful, Qt is not (IMHO).
Maybe this gives you some points to start with!
Another possible way to go: sounds like this task is a prime target for parallelization. Build your app in such a way that it can split its workload on several CPU cores or even different machines. Then you can solve your performance problems (if there will be any) by throwing hardware at them.
If you have C/C++ source, consider linking it into C++/CLI .NET Assembly. This kind of project allows you to mix unmanaged code and put managed interfaces on it. The result is a simple .NET assembly which is trivial to use in C# or VB.NET projects.
There is built-in marshaling of simple types, so that you can call functions from the managed C++ side into the unmanaged side.
The only thing you need to be aware of is that when you marshal a delegate into a function pointer, it doesn't hold a reference, so if you need the C++ to hold managed callbacks, you need to arrange for a reference to be held. Other than that, most of the built-in conversions work as expected. Visual Studio will even let you debug across the boundary (turn on unmanaged debugging).
If you have a .lib, you can use it in a C++/CLI project as long as it's linked to the C-Runtime dynamically.
You should really prototype this in C# before you start screwing around with marshalling and unmarshalling data into unsafe structures so that you can invoke functions in a C++ DLL. C# is very often faster than you think it'll be. Prototyping is cheap.
I've got a p-invoke call to an unmanaged DLL that was failing in my WPF app but not in a simple, starter WPF app. I tried to figure out what the problem was but eventually came to the conclusion that if I assign too much memory before making the call, the call fails. I had two separate blocks of code, both of which would succeed on their own, but that would cause failure if both were run. (They had nothing to do with what the p-invoke call is trying to do).
What kind of issues in the unmanaged library would cause such an issue? I thought that the managed and unmanaged heaps were supposed to be automatically separated.
The crash as far as I can tell is happening in a dynamically loaded secondary DLL from the one p-invoked into. Could that have something to do with it?
Unmanaged code is prone to corrupt the heap. The side effects of that corruption are very unpredictable, it depends on what happens afterwards with that corrupted memory. It is not uncommon that nothing bad happens if the corruption is not in a crucial location. Changing the memory allocation pattern of your program can change that outcome.
All you really know right now is that the unmanaged code can't be trusted. Doing something about it is invariably hard, especially from a managed host program. You won't get anywhere until you start writing unit tests for that unmanaged code, using unmanaged code to exercise it, and find a reproducible bomb that you could tackle with an unmanaged debugger.
A shot in the dark given there is not much info to work with.
Is it possible that the unmanaged DLL needs to be loaded at a specific base address and when you allocate too much memory or other assemblies are loaded, the DLL is not able to load at the correct address.
http://msdn.microsoft.com/en-us/library/w368ysh2.aspx
I have an unmanaged C++ exe that I could call from inside my C# code directly (have the C++ code that I could make a lib) or via spawning a process and grabbing the data from the OutputStream. What are the advantages/disadvantages of the options?
Since you have source code of the C++ library, you can use C++/CLI to compile it into a mixed mode dll so it is easy to be used by the C# application.
The benefit of this will be most flexible on data flow (input or output to that C++ module).
While running the C++ code out of process has one benefit. If your C++ code is not very robust, this can make your main C# process stable so as not to be crashed by the C++ code.
The big downside to scraping the OutputStream is the lack of data typing. I'd much rather do the work of exporting a few functions and reusing an existing library; but, that's really just a preference.
Another disadvantage of spawning a process is that on windows spwaning a process is a very expensive (slow) operation. If you intend to call the c++ code quite often this is worth considering.
An advantage can be that you're automatically more isolated to crashes in the c++ program.
Drop in replacement of the c++ executable can be an advantage as well.
Furthermore writing interop code can be big hassle in c#. If it's a complicated interace and you decide to do interop, have a look at c++/cli for the interop layer.
You're far better off taking a subset of the functions of the C++ executable and building it into a library. You'll keep type safety and you'll be able to better leverage Exception Handling (not to mention finer grain control of how you manage the calls into the functions in the library).
If you go with grabbing the data from the OutputStream of the executable, you're going to have no visibility into the processes of the executable, no real exception handling, and you're going to lose any type information you may have had.
The main disadvantage to being in process would be making sure you handle the managed/native interactions correctly.
1)
The c++ code will probably depend on deterministic destruction for cleanup/resource freeing etc. I say probably because this is common and good practice in c++.
In the managed code this means you have to be careful to dispose of your c++ cli wrapper code properly. If your code is used once, a using clause in c# will do this for you. If the object needs to live a while as a member you'll find that the dispose will need to be chained the whole way through your application.
2)
Another issue depends on how memory hungry your application is. The managed garbage collector can be lazy. It is guaranteed to kick in if a managed allocation needs more space than is available. However the unmanaged allocator is not connected in anyway. Therefore you need to manaully inform the managed allocator that you will be making unmanaged allocations and that it should keep that space available. This is done using the AddMemoryPressure method.
The main disadvantages to being out of process are:
1) Speed.
2) Code overhead to manage the communication.
3) Code overhead to watch for one or other process dying when it is not expected to.