I have a project in which I'll have to process 100s if not 1000s of messages a second, and process/plot this data on graphs accordingly. (The user will search for a set of data in which the graph will be plotted in real time, not literally having to plot 1000s of values on a graph.)
I'm having trouble understanding using DLLs for having the bulk of the message processing in C++, but then handing the information into a C# interface. Can someone dumb it down for me here?
Also, as speed will be a priority, I was wondering if accessing across 2 different layers of code will have more of a performance hit than programming the project in its entirety in C#, or of course, C++. However, I've read bad things about programming a GUI in C++; in regards to which, this application must also look modern, clean, professional etc. So I was thinking C# would be the way forward (perhaps XAML, WPF).
Thanks for your time.
The simplest way to interop between a C/C++ DLL and a .NET Assembly is through p/invoke. On the C/C++ side, create a DLL as you would any other. On the C# side you create a p/invoke declaration. For example, say your DLL is mydll.dll and it exports a method void Foo():
[DllImport("mydll.dll")]
extern static void Foo();
That's it. You simply call Foo like any other static class method. The hard part is getting data marshalled and that is a complicated subject. If you are writing the DLL you can probably go out of your way to make the export functions easily marshalled. For more on the topic of p/invoke marshalling see here: http://msdn.microsoft.com/en-us/magazine/cc164123.aspx.
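To make the native side of such a declaration concrete, here is a minimal C++ sketch of an export that should be easy to marshal. The names `MYDLL_API`, `Message` and `SumValues` are made up for illustration and do not come from the question; the point is only to stick to C linkage, fixed-size primitive fields and plain pointers.

```cpp
#include <cstdint>

// Export macro (hypothetical name): dllexport on Windows, default visibility elsewhere.
#if defined(_WIN32)
#  define MYDLL_API extern "C" __declspec(dllexport)
#else
#  define MYDLL_API extern "C" __attribute__((visibility("default")))
#endif

// Hypothetical message layout: only fixed-size primitive fields, so a C# struct
// with sequential layout can mirror it without custom marshalling.
struct Message {
    std::int32_t id;
    double value;
};

// Sums the values of `count` messages into `*sum`.
// Returns 0 on success and -1 on invalid arguments (no exceptions cross the boundary).
MYDLL_API std::int32_t SumValues(const Message* messages, std::int32_t count, double* sum)
{
    if (messages == nullptr || sum == nullptr || count < 0) {
        return -1;
    }
    double total = 0.0;
    for (std::int32_t i = 0; i < count; ++i) {
        total += messages[i].value;
    }
    *sum = total;
    return 0;
}
```

Taking a whole array per call, rather than one message per call, also keeps the number of managed/unmanaged transitions low. Since this sketch uses the compiler's default calling convention, a 32-bit Windows build would probably need `CallingConvention.Cdecl` on the matching `[DllImport]` declaration.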
You will take a performance hit when using p/invoke. Every time a managed application makes an unmanaged method call, it takes a hit crossing the managed/unmanaged boundary and then back again. When you marshal data, a lot of copying goes on. The copying can be reduced if necessary by using 'unsafe' C# code (using pointers to access unmanaged memory directly).
What you should be aware of is that all .NET applications are chock full of p/invoke calls. No .NET application can avoid making Operating System calls and every OS call has to cross into the unmanaged world of the OS. WinForms and even WPF GUI applications make that journey many hundreds, even thousands of times a second.
If it were my task, I would first do it 100% in C#. I would then profile it and tweak performance as necessary.
If speed is your priority, C++ might be the better choice. Try to make some estimations about how hard the calculation really is (1000 messages can be trivial to handle in C# if the calculation per message is easy, and they can be too hard for even the best optimized program). C++ might have some more advantages (regarding performance) over C# if your algorithms are complex, involving different classes, etc.
You might want to take a look at this question for a performance comparison.
Separating back-end and front-end is a good idea. Whether you get a performance penalty from having one in C++ and the other in C# depends on how much data conversion is actually necessary.
I don't think programming the GUI is a pain in general. MFC might be painful, Qt is not (IMHO).
Maybe this gives you some points to start with!
Another possible way to go: sounds like this task is a prime target for parallelization. Build your app in such a way that it can split its workload on several CPU cores or even different machines. Then you can solve your performance problems (if there will be any) by throwing hardware at them.
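As a rough illustration of splitting the workload across cores, the sketch below divides a batch of values into chunks and processes each chunk with `std::async`. The per-message work (`process_chunk`, here just a sum of squares) and the function names are placeholders, not anything from the question.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <vector>

// Placeholder for the real per-message work: sum of squares over [begin, end).
double process_chunk(const double* begin, const double* end) {
    double acc = 0.0;
    for (const double* p = begin; p != end; ++p) {
        acc += (*p) * (*p);
    }
    return acc;
}

// Splits `values` into at most `workers` contiguous chunks, processes each chunk
// on its own asynchronous task, and combines the partial results.
double process_parallel(const std::vector<double>& values, unsigned workers) {
    if (workers == 0) {
        workers = 1;
    }
    const std::size_t chunk = (values.size() + workers - 1) / workers;  // ceiling division
    std::vector<std::future<double>> parts;
    for (std::size_t start = 0; start < values.size(); start += chunk) {
        const std::size_t stop = std::min(start + chunk, values.size());
        parts.push_back(std::async(std::launch::async, process_chunk,
                                   values.data() + start, values.data() + stop));
    }
    double total = 0.0;
    for (auto& part : parts) {
        total += part.get();  // blocks until that chunk is finished
    }
    return total;
}
```

A reasonable starting value for `workers` is `std::thread::hardware_concurrency()`. Whether this pays off depends on how expensive the work per message is compared to the cost of starting a task.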
If you have the C/C++ source, consider compiling it into a C++/CLI .NET assembly. This kind of project lets you mix managed and unmanaged code and put managed interfaces on top of the unmanaged code. The result is an ordinary .NET assembly that is trivial to use from C# or VB.NET projects.
There is built-in marshaling of simple types, so that you can call functions from the managed C++ side into the unmanaged side.
The only thing you need to be aware of is that when you marshal a delegate into a function pointer, it doesn't hold a reference, so if you need the C++ to hold managed callbacks, you need to arrange for a reference to be held. Other than that, most of the built-in conversions work as expected. Visual Studio will even let you debug across the boundary (turn on unmanaged debugging).
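The native half of such a callback arrangement might look like the sketch below. The function names are hypothetical; what it shows is that the native code stores nothing but a raw function pointer, which is exactly why something on the managed side has to keep the delegate alive for as long as the callback stays registered.

```cpp
#include <cstdint>

extern "C" {

// Function-pointer type the native side expects; a marshalled delegate arrives as one of these.
typedef void (*MessageCallback)(std::int32_t id, double value);

// Only a raw pointer is stored. It does not keep any managed object alive.
static MessageCallback g_callback = nullptr;

void RegisterCallback(MessageCallback callback) { g_callback = callback; }

// Call this before the managed side releases its delegate.
void UnregisterCallback() { g_callback = nullptr; }

// Invokes the registered callback; returns 1 if one was called, 0 otherwise.
std::int32_t PublishMessage(std::int32_t id, double value) {
    if (g_callback == nullptr) {
        return 0;
    }
    g_callback(id, value);
    return 1;
}

}  // extern "C"
```

On the managed side, keeping the delegate in a field of a long-lived object and calling `UnregisterCallback` before letting it go is one way to hold the required reference.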
If you have a .lib, you can use it in a C++/CLI project as long as it's linked to the C-Runtime dynamically.
You should really prototype this in C# before you start screwing around with marshalling and unmarshalling data into unsafe structures so that you can invoke functions in a C++ DLL. C# is very often faster than you think it'll be. Prototyping is cheap.
As far as I know (please correct me if I am wrong), managed languages (or at least C#) will not produce a segfault, at least as long as no unsafe code is used and unmanaged memory is not accessed directly. This is the opposite of unmanaged languages (or at least C++), where you can get a segfault just by glancing at the cat next to you for a second while coding.
The question: how do managed languages ensure this? Were their runtime libraries built and tested that carefully, or do they have some way to catch these segfaults and deal with them one way or another?
The motivation behind this question: I have a C# application that calls a native C++ library (both built by me). When my C++ DLL segfaults, the whole application goes down (some services go down), which is not good at all. I know that a segfault means something was done wrong and needs to be corrected. However, I at least want some mechanism to handle this problem when the buggy (possibly segfaulting) C++ DLL is running on the customer's machine.
They don't allow you to manually deallocate memory.
They don't enable you to read/write from/to arbitrary memory addresses (C++ also doesn't allow this, but the language syntax makes it possible).
(As a special form of the above) They check that every array access is within the bounds of the array.
To the best of my knowledge, they don't have undefined behavior (except, of course, when calling unsafe code).
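For comparison, C++ only performs the bounds check from the list above when you ask for it. The sketch below (the helper name `checked_get` is made up) uses `std::vector::at`, which throws `std::out_of_range`, whereas plain `values[index]` with a bad index is undefined behavior and may or may not crash.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Copies values[index] into `out` and returns true, or returns false when the
// index is out of range. at() performs the bounds check that operator[] skips.
bool checked_get(const std::vector<int>& values, std::size_t index, int& out) {
    try {
        out = values.at(index);
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```

A managed runtime effectively does the `at()`-style check on every array access and turns a violation into an exception instead of a corrupted process.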
I want some mechanism to solve this problem when the buggy (may cause segfault) C++ DLL is working on the customer machine.
The problem is that even if you could allow your program to continue (I don't know whether Windows or C# offers any mechanism to do this), it might no longer be in a valid state. Depending on what the error is and what kind of resources your program has access to, continuing might actually result in worse errors than a program crash, including the destruction of user data.
I'm attempting to marshal a forest of objects from C# .NET to native C++. That is: I have a graph of hundreds of millions of objects (if not more), that I wish to use in native C++. See it as a normal 'leaf'/'node' construction with pointers between leafs and nodes. I control both the C++ and the C# code, so I can make adjustments to the code.
The inner loop of the software is going to be implemented in native C++ for performance reasons. I basically want to tell the GC to stop for a while (to ensure objects aren't moved), then do the fancy C++ routine, and then continue the GC once it's done.
There are also things that I don't want to do:
Make my own mark & sweep algorithm to pin all objects in the graph. Not only will that be very time consuming, it'll also cost a lot of memory because I then have to keep track of all these GCHandle objects by myself.
Use native allocation methods like malloc. I've had a native C++ application in the past, and it suffered greatly from memory fragmentation, that .NET 'automatically' solves just fine... not to mention the benefit of GC.
So, any suggestions on how to do this?
I would look at using managed C++.
Maybe accessing the .NET objects from managed C++ will be fast enough.
Otherwise, use managed C++ to "walk" the .NET objects and create native C++ objects, then delete them all once done.
Or create a factory class in managed C++ that C# can call to create the C++ objects, and once again delete them all once done.
Or do as Marc Gravell says and manually allocate a buffer of unmanaged memory and deal with structs inside that space, maybe using a code generator driven from attributes on your C# classes.
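The last option (structs inside one buffer) could look roughly like this on the native side. The layout is purely an assumption for illustration: nodes refer to their children by index into a flat array instead of by pointer, so the whole graph is a single block that needs no per-object pinning.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical node layout: children are indices into the same array, -1 means "no child".
struct FlatNode {
    std::int32_t left;
    std::int32_t right;
    double value;
};

// Sums the values in the subtree rooted at `root`.
// Assumes the indices form a tree (no cycles); out-of-range indices are ignored.
double sum_tree(const FlatNode* nodes, std::int32_t count, std::int32_t root) {
    double total = 0.0;
    std::vector<std::int32_t> pending;  // explicit stack, so deep trees cannot overflow the call stack
    if (root >= 0 && root < count) {
        pending.push_back(root);
    }
    while (!pending.empty()) {
        const std::int32_t i = pending.back();
        pending.pop_back();
        total += nodes[i].value;
        if (nodes[i].left >= 0 && nodes[i].left < count) {
            pending.push_back(nodes[i].left);
        }
        if (nodes[i].right >= 0 && nodes[i].right < count) {
            pending.push_back(nodes[i].right);
        }
    }
    return total;
}
```

Because the nodes contain only fixed-size fields and no pointers, the same buffer can be filled from C# and read from C++ without translating addresses.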
I have some native applications that we have written in C++ and compiled to executables (.exe etc.).
We need to run these in a service from C#, and I am wondering whether it makes any difference to create a wrapper project that makes them easy to call from C# (either a C++/CLI project or just P/Invoke), or to simply start a process that calls the .exe file as we would from the command line.
Of course it's easier to consume from C# if it's just a matter of importing a namespace and calling a C# function that takes care of things. But I could just as easily create a function that starts a process for the command-line exe and gets the result that way.
Is there any performance difference between the two methods? That will most likely be the key factor, as we can easily implement it either way.
Also using a c++/CLI wrapper makes transfer of variables easier.
Of course it's easier to consume from C# if it's just a matter of importing a namespace and calling a C# function that takes care of things.
It depends on the nature of what your app is going to do with that C++ code.
For sure it will be easier to debug.
Performance should be better with a wrapper: with an EXE, you need to start up the process on every call, which can take a non-negligible amount of time. With a wrapper, the bottleneck is instead the transfer of data between managed and unmanaged code, which, by the way, should cost less than launching the executable.
All this depends on concrete application lifecycle and actually can be measured only by you, in your concrete context.
My choice would be:
If calls are infrequent, keep it as a separate executable.
If calls are infrequent but the amount of data passed to and from each call is large, consider a wrapper. Even here an EXE may still be a valid choice if your data is, or can be, a file, so that you just pass file paths to the executable.
If calls are frequent, use a wrapper.
To repeat, these are just considerations that may or may not bring practical benefit in your concrete case.
There is a performance difference. However, as with all performance tweaking, the key is profiling. Do it whichever way is easier, and check whether it's okay.
The main performance difference between the P/Invoke and C++/CLI way is marshalling, which is automatic in P/Invoke (with some customization).
I have an unmanaged C++ exe that I could call from inside my C# code directly (have the C++ code that I could make a lib) or via spawning a process and grabbing the data from the OutputStream. What are the advantages/disadvantages of the options?
Since you have source code of the C++ library, you can use C++/CLI to compile it into a mixed mode dll so it is easy to be used by the C# application.
The benefit is maximum flexibility in the data flow (input to or output from that C++ module).
Running the C++ code out of process has one benefit, though: if your C++ code is not very robust, it keeps your main C# process stable, since a crash in the C++ code cannot bring it down.
The big downside to scraping the OutputStream is the lack of data typing. I'd much rather do the work of exporting a few functions and reusing an existing library; but, that's really just a preference.
Another disadvantage of spawning a process is that on Windows, spawning a process is a very expensive (slow) operation. If you intend to call the C++ code often, this is worth considering.
An advantage can be that you're automatically more isolated from crashes in the C++ program.
Drop in replacement of the c++ executable can be an advantage as well.
Furthermore, writing interop code in C# can be a big hassle. If it's a complicated interface and you decide to do interop, have a look at C++/CLI for the interop layer.
You're far better off taking a subset of the functions of the C++ executable and building it into a library. You'll keep type safety and you'll be able to better leverage Exception Handling (not to mention finer grain control of how you manage the calls into the functions in the library).
If you go with grabbing the data from the OutputStream of the executable, you're going to have no visibility into the processes of the executable, no real exception handling, and you're going to lose any type information you may have had.
The main disadvantage to being in process would be making sure you handle the managed/native interactions correctly.
1)
The c++ code will probably depend on deterministic destruction for cleanup/resource freeing etc. I say probably because this is common and good practice in c++.
In managed code this means you have to be careful to dispose of your C++/CLI wrapper properly. If the object is used once, a using statement in C# will do this for you. If the object needs to live for a while as a member, you'll find that Dispose needs to be chained all the way through your application.
2)
Another issue depends on how memory-hungry your application is. The managed garbage collector can be lazy. It is guaranteed to kick in if a managed allocation needs more space than is available. However, the unmanaged allocator is not connected to it in any way. Therefore you need to manually inform the garbage collector that you are making unmanaged allocations, so that it takes them into account. This is done using the GC.AddMemoryPressure method.
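The deterministic destruction mentioned under point 1 is the usual C++ RAII pattern, sketched below. The class `NativeBuffer` and its `live_count` counter are invented for the example; the counter exists only to make the cleanup observable.

```cpp
#include <cstddef>

// Hypothetical native resource that relies on its destructor for cleanup.
class NativeBuffer {
public:
    explicit NativeBuffer(std::size_t size)
        : data_(new double[size]()), size_(size) {
        ++live_count();
    }

    // Runs at a well-defined point: when the object goes out of scope or is deleted.
    ~NativeBuffer() {
        delete[] data_;
        --live_count();
    }

    // Non-copyable, so the buffer is never freed twice.
    NativeBuffer(const NativeBuffer&) = delete;
    NativeBuffer& operator=(const NativeBuffer&) = delete;

    std::size_t size() const { return size_; }

    // Number of buffers currently alive (for demonstration only).
    static int& live_count() {
        static int count = 0;
        return count;
    }

private:
    double* data_;
    std::size_t size_;
};
```

A C++/CLI wrapper that owns such an object would typically delete it in its own destructor, which C# sees as `Dispose`. If nobody calls `Dispose`, the native memory stays allocated until a finalizer runs, if one exists at all.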
The main disadvantages to being out of process are:
1) Speed.
2) Code overhead to manage the communication.
3) Code overhead to watch for one or other process dying when it is not expected to.
I want to create a simple http proxy server that does some very basic processing on the http headers (i.e. if header x == y, do z). The server may need to support hundreds of users. I can write the server in C# (pretty easy) or c++ (much harder). However, would a C# version have as good of performance as a C++ version? If not, would the difference in performance be big enough that it would not make sense to write it in C#?
You can use unsafe C# code and pointers in critical bottleneck points to make it run faster. Those behave much like C++ code and I believe it executes as fast.
But most of the time, C# is already JIT-compiled to very fast code. As everyone else has said, I don't believe there will be much of a difference.
But one thing you might want to consider is: Managed code (C#) string operations are rather slow compared to using pointers effectively in C++. There are more optimization tricks with C++ pointers than with CLR strings.
I think I have done some benchmarks before, but can't remember where I've put them.
Why do you expect a much higher performance from the C++ application?
There is no inherent slowdown added by a C# application when you are doing it right. (not too many dropped references, frequent object creation/dropping per call, etc.)
The only time a C++ application really outperforms an equivalent C# application is when you can do (very) low level operations. E.g. casting raw memory pointers, inline assembler, etc.
The C++ compiler may be better at creating fast code, but mostly this is wasted in most applications. If you do really have a part of your application that must be blindingly fast, try writing a C call for that hot spot.
Only if most of the system behaves too slowly should you consider writing it in C/C++. But there are many pitfalls that may kill your performance in your C++ code.
(TLDR: A C++ expert may create 'faster' code than a C# expert, but a mediocre C++ programmer may create slower code than a mediocre C# one.)
I would expect the C# version to be nearly as fast as the C++ one but with smaller memory footprint.
In some cases managed code is actually a LOT faster and uses less memory compared to non optimized C++. C++ code can be faster if written by expert, but it rarely justifies the effort.
As a side note, I recall a performance "competition" in the blogosphere between Rico Mariani (C#) and Raymond Chen (C++) to write programs that do exactly the same thing. Raymond Chen, who is considered one of the best programmers in the world (Joel), succeeded in writing faster C++ only after a long struggle, rewriting most of the code.
The proxy server you describe would deal mostly with string data, and I think it's reasonable to implement in C#. In your example,
if header x == y, do z
the slowest part might actually be doing whatever 'z' is and you'll have to do that work regardless of the language.
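For a sense of how little work the "header x == y" check itself is, here is a sketch of it in C++. The function names are invented, and the parsing is deliberately simplified (a single "name: value" line, ASCII only), so treat it as an illustration rather than a complete HTTP header parser.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Case-insensitive ASCII comparison; HTTP header field names are case-insensitive.
bool equals_ignore_case(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) {
        return false;
    }
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (std::tolower(static_cast<unsigned char>(a[i])) !=
            std::tolower(static_cast<unsigned char>(b[i]))) {
            return false;
        }
    }
    return true;
}

// Returns true when `line` has the form "name: value", the name matches
// case-insensitively, and the value (with surrounding whitespace trimmed) matches exactly.
bool header_matches(const std::string& line, const std::string& name, const std::string& value) {
    const std::size_t colon = line.find(':');
    if (colon == std::string::npos) {
        return false;
    }
    if (!equals_ignore_case(line.substr(0, colon), name)) {
        return false;
    }
    const char* const whitespace = " \t\r\n";
    const std::size_t first = line.find_first_not_of(whitespace, colon + 1);
    if (first == std::string::npos) {
        return value.empty();  // header present but with an empty value
    }
    const std::size_t last = line.find_last_not_of(whitespace);
    return line.compare(first, last - first + 1, value) == 0;
}
```

The equivalent C# is a couple of string calls, and either version costs well under a microsecond per header on typical hardware, which supports the point that 'z' is likely to dominate.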
In my experience, the design and implementation have much more to do with performance than does the choice of language/framework (however, the usual caveats apply: e.g., don't write a device driver in C# or Java).
I wouldn't think twice about writing the type of program you describe in a managed language (be it Java, C#, etc). These days, the performance gains you get from using a lower-level language (in terms of closeness to hardware) are often easily offset by the runtime abilities of a managed environment. Of course this is coming from a C#/Python developer, so I'm not exactly unbiased...
If you need a fast and reliable proxy server, it might make sense to try some of those that already exist. But if you have custom features that are required, then you may have to build your own. You may want to collect some more information on the expected load: hundreds of users might be a few requests a minute or a hundred requests a second.
Assuming you need to serve under or around 200 qps on a single machine, C# should easily meet your needs -- even languages known for being slow (e.g. Ruby) can easily pump out a few hundred requests a second.
Aside from performance, there are other reasons to choose C#, e.g. it's much easier to write buffer overflows in C++ than C#.
Is your http server going to run on a dedicated machine? If yes, I would say go with C# if it is easier for you. If you need to run other applications on the same machine, you'll need to take into account the memory footprint of your application and the fact that GC will run at "random" times.