C# Performance - should I write the computation-heavy methods in C++?

I am building a prototype for a quantitative library that does some signal analysis using image processing techniques. I built the initial prototype entirely in C#, but the performance is not as good as expected. Most of the computation is done through heavy matrix calculations, and these are taking up most of the time.
I am wondering if it is worth it to write a C++/CLI interface to unmanaged C++ code. Has anyone ever gone through this? Other suggestions for optimizing C# performance are welcome.

There was a time when it would definitely have been better to write in C/C++, but the C# optimizer and JIT are so good now that, for pure math, there's probably no difference.
The difference comes when you have to deal with memory and possibly arrays. Even so, I'd still work with C# (or F#) and then optimize hotspots. The JIT is really good at optimizing away small, short-lived objects.
With arrays, you have to worry about C# doing bounds-checks on each access. Read this:
Link
Test it yourself -- I've been finding C# to be comparable -- sometimes faster.
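To make the bounds-check point concrete, here is a minimal sketch (method names are illustrative, not from the linked article): the JIT removes the per-access check when it can prove the index stays within the array's own Length, so the first loop typically runs check-free while the second may not.

// Bound is data.Length itself, so the JIT can prove i is in range
// and elide the per-access bounds check.
static double SumFast(double[] data)
{
    double sum = 0;
    for (int i = 0; i < data.Length; i++)
        sum += data[i];
    return sum;
}

// Bound is an unrelated variable; the JIT cannot prove count <= data.Length,
// so the per-access check may remain.
static double SumPartial(double[] data, int count)
{
    double sum = 0;
    for (int i = 0; i < count; i++)
        sum += data[i];
    return sum;
}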

It's hard to give a definitive answer here, but if performance is an issue I'd find a time-tested library with the performance you need and wrap it.
Something simple like multiplication or division is not much different between C++ and C# - the C++ compiler has an optimizer, and the CLR runtime has the on-demand JITer that does optimizations. So in theory, the C++ would outperform C# only on the first call.
However, theory and practice are not the same. With more complicated algorithms you also run into the differences between memory managers, and the maturity of the optimization techniques. If you want anecdotal evidence, you can find some math-heavy comparisons here.
Personally, I find doing the heavy computations in a native library and using C++/CLI to call it gives a good boost when the computations are the biggest bottleneck. As always, make sure that's the case before doing any optimization.

Matrix math is best done in native code in my opinion. Even the C++ libraries typically allow binding to a lower-level implementation like LAPACK.
There is a C# LAPACK port here (also C# BLAS on the same site) which you could try but I'd be surprised if this is faster than native code.
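If you do go native for the matrix kernels, a straight P/Invoke binding to a BLAS implementation is a low-effort option. A minimal sketch, assuming an OpenBLAS-style CBLAS export in a DLL named libopenblas.dll (the DLL name is an assumption; the cblas_dgemm signature and the 101/111 constants are the standard CBLAS ones):

using System.Runtime.InteropServices;

static class Blas
{
    // Computes C = alpha*A*B + beta*C for row-major double matrices.
    // 101 = CblasRowMajor, 111 = CblasNoTrans.
    [DllImport("libopenblas.dll")]  // assumed binary name
    public static extern void cblas_dgemm(
        int order, int transA, int transB,
        int m, int n, int k,
        double alpha, double[] a, int lda,
        double[] b, int ldb,
        double beta, double[] c, int ldc);
}

// Usage for two n x n matrices stored as flat row-major arrays:
// Blas.cblas_dgemm(101, 111, 111, n, n, n, 1.0, a, n, b, n, 0.0, c, n);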

I've done a lot of image processing work in C# and, yes, I usually do use native code for heavy-duty code where performance matters, but I used plain P/Invokes rather than the C++/CLI interface. A lot of the time this is not needed, though.
There are quite a few good .NET profilers. The Red Gate one is my personal favorite. It might help you to visualize where the bottlenecks are.

The only reasonable language benchmark out there: http://shootout.alioth.debian.org/
See for yourself.

Performance for mathematical computation is pretty poor in C#. I was gobsmacked to find how slow mathematical calculations are in C#. Just write a loop in C# and C++ with a few multiplications, Sin, Cos, ... calls and the difference is immense.
I do not know Managed C++, but implementing it all in unmanaged C++ and, I would imagine, exposing granular interfaces through P/Invoke should have little performance hit.
That is what I have done for heavy real-time image processing.

I built the initial prototype entirely in C#, but the performance is not as good as expected.
then you have two options: build another prototype in C++ and see how that compares, or optimize your C# code.
No matter what language you write in, your code won't be fast until you've profiled and optimized, then profiled and optimized again. That is especially true of C++. If you write the fastest possible implementation in C# and compare it to the fastest possible implementation in C++, the C++ version will most likely be faster. But it will come at a cost in development time. It's not trivial to write efficient C++ code. If you are new to the language, you will most likely write very inefficient code, especially if you are coming from C# or Java, where things are done differently and have different costs.
If you just write a working implementation, without worrying too much about performance, then I'm guessing that the C# version will probably be faster.
But it really depends on what kind of performance you're after (and, not least, how expensive the operations you need to perform are). There's an overhead associated with the transition from managed to native code, so it is not worth it for short operations that are executed often.
Number-crunching code in C++ can be as fast as code written in Fortran (give or take a few percent), but to achieve that, you need to use a lot of advanced techniques (expression templates and lots of metaprogramming) or some fairly complex libraries which implement it for you.
Is that worth it? Or can C# be made fast enough for your needs?

You should write computation-heavy programs in C++; you cannot get anywhere near C++ performance by optimizing C#. The overhead of calling wrappers is negligible, assuming the computation takes considerable time. I have written code in both C++ and C# and have never seen .NET Framework code come close to C++. There are a few instances where C# ran better, but that was because of a lack of appropriate libraries or bad coding in C++. If you can write code equally well in C# and C++, I would write the performance code in C++ and everything else in C#.
If x is the world's best C++ programmer and y is the best C# programmer, then most of the time x can write faster code than y. However, y will finish the coding faster than x most of the time.

Related

Compile C#, so that it runs with the speed of C++

Alright, so I wanted to ask if it's actually possible to make a translator from C# to C++,
so that code written in C# would be able to run as fast as code written in C++.
Is it actually possible to do? I'm not asking how hard it is going to be.
What makes you think that translating your C# code to C++ would magically make it faster?
Languages don't have a speed. Assuming that C# code is slower (I'll get back to that), it is because of what that code does (including the implicit requirements placed by C#, such as bounds checking on arrays), and not because of the language it is written in.
If you converted your C# code to C++, it would still need to do bounds checking on arrays, because the original source code expected this to happen, so it would have to do just as much work.
Moreover, C# often isn't slower than C++. There are plenty of benchmarks floating around on the internet, generally showing that for the most part, C# is as fast as (or faster than) C++. Only when you spend a lot of time optimizing your code, does C++ become faster.
If you want faster code, you need to write code that requires less work to execute, not try to change the source language. That's just cargo-cult programming at its worst. You once saw some efficient code, and that was written in C++, so now you try to make things C++, in the hope of attracting any efficiency that might be passing by.
It just doesn't work that way.
Although you could translate C# code to C++, there would be the issue that C# depends on the .NET Framework libraries, which are not native, so a simple translation would not be enough.
Update
Also C# code depends on the runtime to do things such as memory management i.e. Garbage Collection. If you translated the C# code to C++, where would the memory management code be? Parsing and translating is not going to fix issues like that.
The Mono project has invested quite a lot of energy in turning LLVM into a native machine code compiler for the C# runtime, although there are some problems with specific language constructs like shared generics, etc. Check it out and take it for a spin.
You can use NGen to compile IL to native code
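For reference, a minimal NGen invocation looks like this (run from an elevated Visual Studio/SDK command prompt; MyApp.exe is a placeholder name):

ngen install MyApp.exe

This precompiles the assembly's IL to native code and caches the image, so the JIT cost is paid once at install time instead of at startup.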
Performance related tweaks:
Platform independent
use a profiler to spot the bottlenecks
prevent unnecessary garbage (spot it using the generation #0 collect count and the Large Object Heap)
prevent unnecessary copying (use struct wisely)
prevent unwarranted generics (code sharing has unexpected performance side effects)
prefer old-fashioned loops over enumerator blocks when performance is an issue (see the sketch at the end of this answer)
when using LINQ, watch closely where you maintain/break deferred evaluation; both can be enormous boosts to performance
use Reflection.Emit/expression trees to precompile dynamic logic that is a performance bottleneck
Mono
use Mono --gc=sgen --optimize=inline,... (the SGEN garbage collector can make orders of magnitude difference). See also man mono for a lot of tuning/optimization options
use MONO_GENERIC_SHARING=none to disable sharing of generics (making particular tasks a lot quicker especially when supporting both valuetypes and reftypes) (not recommended for regular production use)
use the -optimize+ compile flag (optimizing the CLR code independently from what the JITter may do with that)
Less mainstream:
use the LLVM backend:
This allows Mono to benefit from all of the compiler optimizations done in LLVM. For example the SciMark score goes from 482 to 610.
use mkbundle to create a statically linked NATIVE binary image (already fully JITted, i.e. AOT (ahead-of-time compiled))
MS .NET
Most of the above have direct Microsoft counterparts (NGen, /optimize, etc.)
Of course MS doesn't have a switchable/tunable garbage collector, and I don't think a fully compiled native binary can be achieved the way it can with Mono.
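To illustrate the loops-versus-enumerator-blocks item above, here is a minimal sketch (illustrative only; measure on your own workload before committing to either style):

using System.Collections.Generic;
using System.Linq;

static class Sums
{
    // Enumerator/LINQ version: allocates an enumerator and pays
    // interface-dispatch cost on every element.
    public static double SumLinq(List<double> xs)
    {
        return xs.Sum();
    }

    // Old-fashioned indexed loop: no enumerator allocation, and the
    // JIT can often keep the accumulator in a register.
    public static double SumLoop(List<double> xs)
    {
        double sum = 0;
        for (int i = 0; i < xs.Count; i++)
            sum += xs[i];
        return sum;
    }
}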
As always the answer to making code run faster is:
Find the bottleneck and optimize that
Most of the time the bottleneck is either:
time spent in a critical loop
Review your algorithm and data structures; do not change the language. Changing the language will give you a 10% speedup, a better algorithm can give you a 1000x speedup.
If you're stuck on the best algorithm, you can always ask a specific, short and detailed question on SO.
time waiting for resources from a slow source
Reduce the amount of stuff you're requesting from the source
instead of:
SELECT * FROM bigtable
do
SELECT TOP 10 * FROM bigtable ORDER BY xxx
The latter will return instantly and you cannot show a million records in a meaningful way anyhow.
Or you can have the server at the other end reduce the data so that it doesn't take 100 years to cross the network.
Alternatively, you can execute the slow data-fetch routine in a separate thread, so the rest of your program can do meaningful work instead of waiting.
Time spent because you are overflowing memory with gigabytes of data
Use a different algorithm that works on a smaller dataset at a time.
Try to optimize cache usage.
The answer to efficient coding is to measure where your CPU time goes.
Use a profiler.
see: http://csharp-source.net/open-source/profilers
And optimize those parts that eat more than 50% of your CPU time.
Do this for a number of iterations, and soon your 10 hour running time will be down to a manageable 3 minutes, instead of the 9.5 hours that you will get from switching to this or that better language.

C++ backend with C# frontend?

I have a project in which I'll have to process 100s if not 1000s of messages a second, and process/plot this data on graphs accordingly. (The user will search for a set of data in which the graph will be plotted in real time, not literally having to plot 1000s of values on a graph.)
I'm having trouble understanding using DLLs for having the bulk of the message processing in C++, but then handing the information into a C# interface. Can someone dumb it down for me here?
Also, as speed will be a priority, I was wondering if accessing across 2 different layers of code will have more of a performance hit than programming the project in its entirety in C#, or of course, C++. However, I've read bad things about programming a GUI in C++; in regards to which, this application must also look modern, clean, professional etc. So I was thinking C# would be the way forward (perhaps XAML, WPF).
Thanks for your time.
The simplest way to interop between a C/C++ DLL and a .NET Assembly is through p/invoke. On the C/C++ side, create a DLL as you would any other. On the C# side you create a p/invoke declaration. For example, say your DLL is mydll.dll and it exports a method void Foo():
[DllImport("mydll.dll")]
extern static void Foo();
That's it. You simply call Foo like any other static class method. The hard part is getting data marshalled and that is a complicated subject. If you are writing the DLL you can probably go out of your way to make the export functions easily marshalled. For more on the topic of p/invoke marshalling see here: http://msdn.microsoft.com/en-us/magazine/cc164123.aspx.
You will take a performance hit when using p/invoke. Every time a managed application makes an unmanaged method call, it takes a hit crossing the managed/unmanaged boundary and then back again. When you marshal data, a lot of copying goes on. The copying can be reduced if necessary by using 'unsafe' C# code (using pointers to access unmanaged memory directly).
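As an illustration of the copying point: arrays of blittable types (double, int, etc.) are not copied at all; the marshaller pins the managed array and passes the native side a direct pointer. A sketch with a hypothetical export ProcessBuffer in mydll.dll:

// Hypothetical native export: void ProcessBuffer(double* buffer, int length);
[DllImport("mydll.dll")]
static extern void ProcessBuffer(double[] buffer, int length);

static void Run()
{
    // The double[] is pinned for the duration of the call and passed
    // by pointer - no copy in either direction.
    double[] samples = new double[1 << 20];
    ProcessBuffer(samples, samples.Length);
}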
What you should be aware of is that all .NET applications are chock full of p/invoke calls. No .NET application can avoid making Operating System calls and every OS call has to cross into the unmanaged world of the OS. WinForms and even WPF GUI applications make that journey many hundreds, even thousands of times a second.
If it were my task, I would first do it 100% in C#. I would then profile it and tweak performance as necessary.
If speed is your priority, C++ might be the better choice. Try to make some estimations about how hard the calculation really is (1000 messages can be trivial to handle in C# if the calculation per message is easy, and they can be too hard for even the best optimized program). C++ might have some more advantages (regarding performance) over C# if your algorithms are complex, involving different classes, etc.
You might want to take a look at this question for a performance comparison.
Separating back-end and front-end is a good idea. Whether you get a performance penalty from having one in C++ and the other in C# depends on how much data conversion is actually necessary.
I don't think programming the GUI is a pain in general. MFC might be painful, Qt is not (IMHO).
Maybe this gives you some points to start with!
Another possible way to go: sounds like this task is a prime target for parallelization. Build your app in such a way that it can split its workload on several CPU cores or even different machines. Then you can solve your performance problems (if there will be any) by throwing hardware at them.
If you have C/C++ source, consider linking it into a C++/CLI .NET assembly. This kind of project allows you to mix in unmanaged code and put managed interfaces on it. The result is a simple .NET assembly which is trivial to use in C# or VB.NET projects.
There is built-in marshaling of simple types, so that you can call functions from the managed C++ side into the unmanaged side.
The only thing you need to be aware of is that when you marshal a delegate into a function pointer, it doesn't hold a reference, so if you need the C++ to hold managed callbacks, you need to arrange for a reference to be held. Other than that, most of the built-in conversions work as expected. Visual Studio will even let you debug across the boundary (turn on unmanaged debugging).
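A minimal sketch of the delegate-lifetime point (the native export name is a placeholder): hold the delegate in a field for as long as the native side may invoke it, otherwise the GC is free to collect it even while the function pointer is still in use.

using System;
using System.Runtime.InteropServices;

delegate void ProgressCallback(int percent);

class Engine
{
    // Field reference prevents the GC from collecting the delegate
    // while native code still holds the function pointer.
    static ProgressCallback _keepAlive;

    [DllImport("engine.dll")]  // hypothetical native export
    static extern void RegisterProgressCallback(ProgressCallback cb);

    public static void Start()
    {
        _keepAlive = p => Console.WriteLine("{0}%", p);
        RegisterProgressCallback(_keepAlive);
    }
}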
If you have a .lib, you can use it in a C++/CLI project as long as it's linked to the C-Runtime dynamically.
You should really prototype this in C# before you start screwing around with marshalling and unmarshalling data into unsafe structures so that you can invoke functions in a C++ DLL. C# is very often faster than you think it'll be. Prototyping is cheap.

Suitability of C# for clustered calculation-heavy apps?

I'm preparing to write a photonic simulation package that will run on a 128-node Linux and Windows cluster, with a Windows-based client for designing jobs (CAD-like) and submitting them to the cluster.
Most of this is well-trod ground, but I'm curious how C# stacks up to C++ in terms of real number-crunching ability. I'm very comfortable with both languages, but I find the superior object model and framework support of C# with .NET or Mono incredibly enticing. However, I can't, with this application, sacrifice too much in processing power for the sake of developer preference.
Does anyone have any experience in this area? Are there any hard benchmarks available? I'd assume that the final machine code would be optimized using the same techniques whether it comes from a C# or C++ source, especially since that typically takes place at the pcode/IL level.
The optimisation techniques employed by C# and native C++ are vastly different. C# compilers emit IL, which is only marginally optimised and then JIT'ed to binary code when it is about to execute for the first time. Most of the optimisation work happens inside the JIT compiler.
This has pros and cons. JIT has time budgets, which limits how much effort it can expend on optimisation. But it also has intimate knowledge of the hardware it is actually running on, so it can (in theory) make transparent use of newer CPU opcodes and detailed knowledge of performance data such as a pipeline hazards database.
In practice, I don't know how significant the latter is. I do know that at least Mono will parallelise some loops automatically if it finds itself running on a CPU with SSE (SSE2, perhaps?), which may be a big deal for your scenario.
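On newer runtimes you do not have to rely on the JIT spotting the pattern: System.Numerics.Vector&lt;T&gt; lets you opt in to SSE/AVX explicitly. A hedged sketch, since that API postdates the setup described in the question:

using System.Numerics;

static void Scale(float[] data, float factor)
{
    int width = Vector<float>.Count;  // lanes per register, hardware dependent
    int i = 0;
    for (; i <= data.Length - width; i += width)
    {
        var v = new Vector<float>(data, i);  // load one SIMD-width chunk
        (v * factor).CopyTo(data, i);
    }
    for (; i < data.Length; i++)  // scalar tail
        data[i] *= factor;
}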
I did a quick search and found this:
http://www.drdobbs.com/184401976
Edit: Bear in mind (on reading the article) that this was done 5 years ago so performance is likely to be better all round!
I have found these performance comparisons:
http://reverseblade.blogspot.com/2009/02/c-versus-c-versus-java-performance.html
http://systematicgaming.wordpress.com/2009/01/03/performance-c-vs-c/
http://journal.stuffwithstuff.com/2009/01/03/debunking-c-vs-c-performance/
http://www.csharphelp.com/2007/01/managed-c-versus-unmanaged-c/
And here is even a case study:
http://www.itu.dk/~sestoft/papers/numericperformance.pdf
Hope it helps.

How close can I get C# to the performance of C++ for small intensive tasks?

I was thinking about the speed difference of C++ to C# being mostly about C# compiling to byte-code that is taken in by the JIT compiler (is that correct?) and all the checks C# does.
I notice that it is possible to turn a lot of these checks off, both in the compile options and possibly through using the unsafe keyword, as unsafe code is not verifiable by the common language runtime.
Therefore if you were to write a simple console application in both languages, that flipped an imaginary coin an infinite number of times and displayed the results to the screen every 10,000 or so iterations, how much speed difference would there be? I chose this because it's a very simple program.
I'd like to test this but I don't know C++ or have the tools to compile it. This is my C# version though:
static void Main(string[] args)
{
    unsafe
    {
        Random rnd = new Random();
        int heads = 0, tails = 0;
        while (true)
        {
            if (rnd.NextDouble() > 0.5)
                heads++;
            else
                tails++;
            if ((heads + tails) % 1000000 == 0)
                Console.WriteLine("Heads: {0} Tails: {1}", heads, tails);
        }
    }
}
Is the difference enough to warrant deliberately compiling sections of code "unsafe" or into DLLs that do not have some of the compile options like overflow checking enabled? Or does it go the other way, where it would be beneficial to compile sections in C++? I'm sure interop speed comes into play too then.
To avoid subjectivity, I reiterate the specific parts of this question as:
Does C# have a performance boost from using unsafe code?
Do the compile options such as disabling overflow checking boost performance, and do they affect unsafe code?
Would the program above be faster in C++ or negligibly different?
Is it worth compiling long intensive number-crunching tasks in a language such as C++ or using /unsafe for a bonus? Less subjectively, could I complete an intensive operation faster by doing this?
The example given is flawed because it does not show real-life usage of both programming languages. Using simple data types to measure the speed of a language will not reveal anything interesting. Instead, I suggest you create a template class in C++ and compare it with what is possible with C# generics. In the end, objects will bring some important results and you will see that C++ is faster than C#. Not to mention that you are comparing a lower-level programming language with C#.
Does C# have a performance boost from using unsafe code?
Yes, it will give a boost, but it is not advisable to write all your code in unsafe blocks. Here is why: code written in an unsafe context cannot be verified to be safe, so it will be executed only when the code is fully trusted. In other words, unsafe code cannot be executed in an untrusted environment. For example, you cannot run unsafe code directly from the Internet. http://msdn.microsoft.com/en-us/library/aa288474(VS.71).aspx
Would the program above be faster in C++ or negligibly different?
Yes, the program would be slightly faster in C++. C++ is a lower-level programming language, and it gets even faster if you start using the algorithm library (random_shuffle comes to mind).
Is it worth compiling long intensive number-crunching tasks in a language such as C++ or using /unsafe for a bonus? Less subjectively, could I complete an intensive operation faster by doing this?
It depends on the project...
Up to more than 100% speedup - it depends a lot on the task, simply said.
More than 100% - yes, because the just-in-time compiler knows your processor, and I doubt you actually optimize for your hardware platform ;)
Lack of SSE is a problem if you do matrix operations.
For some things with tons of arrays (image manipulation), the array bounds checks kill you, but pointers (i.e. unsafe code) work, as they bypass the checks.
Regarding things like overflow checking - be careful. In C++ you have the same possibility, and if you need overflow checking, the performance issue is not there anyway ;)
I personally would not bother with C++ in most cases. Partially yes, especially when you can benefit from SSE.
So, in the end, a lot depends on the NATURE of your calculations.
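A sketch of the pointer trick mentioned above for image-style workloads (compile with /unsafe; purely illustrative):

// Adds delta to every pixel of a grayscale buffer through a pinned
// pointer, bypassing the per-element array bounds checks.
static unsafe void Brighten(byte[] pixels, int delta)
{
    fixed (byte* p = pixels)
    {
        byte* cur = p;
        byte* end = p + pixels.Length;
        while (cur < end)
        {
            int v = *cur + delta;
            *cur = (byte)(v > 255 ? 255 : v < 0 ? 0 : v);
            cur++;
        }
    }
}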

C# Performance For Proxy Server (vs C++)

I want to create a simple http proxy server that does some very basic processing on the http headers (i.e. if header x == y, do z). The server may need to support hundreds of users. I can write the server in C# (pretty easy) or c++ (much harder). However, would a C# version have as good of performance as a C++ version? If not, would the difference in performance be big enough that it would not make sense to write it in C#?
You can use unsafe C# code and pointers in critical bottleneck points to make it run faster. That code behaves much like C++ code, and I believe it executes about as fast.
But most of the time, C# is JIT-ted to be uber-fast already; as everyone has said, I don't believe there will be much difference.
But one thing you might want to consider is: Managed code (C#) string operations are rather slow compared to using pointers effectively in C++. There are more optimization tricks with C++ pointers than with CLR strings.
I think I have done some benchmarks before, but can't remember where I've put them.
Why do you expect a much higher performance from the C++ application?
There is no inherent slowdown added by a C# application when you are doing it right (not holding on to too many dead references, not creating and dropping objects on every call, etc.).
The only time a C++ application really outperforms an equivalent C# application is when you can do (very) low level operations. E.g. casting raw memory pointers, inline assembler, etc.
The C++ compiler may be better at creating fast code, but mostly this is wasted in typical applications. If you really do have a part of your application that must be blindingly fast, try writing a C call for that hot spot.
Only if most of the system behaves too slowly should you consider writing it in C/C++. But there are many pitfalls that may kill your performance in your C++ code.
(TL;DR: A C++ expert may create 'faster' code than a C# expert, but a mediocre C++ programmer may create slower code than a mediocre C# one.)
I would expect the C# version to be nearly as fast as the C++ one but with smaller memory footprint.
In some cases managed code is actually a LOT faster and uses less memory compared to non-optimized C++. C++ code can be faster if written by an expert, but it rarely justifies the effort.
As a side note, I recall a performance "competition" in the blogosphere between Rico Mariani (C#) and Raymond Chen (C++), writing a program that does exactly the same thing. Raymond Chen, who is considered one of the best programmers in the world (Joel), succeeded in writing faster C++ only after a long struggle, rewriting most of the code.
The proxy server you describe would deal mostly with string data, and I think it's reasonable to implement it in C#. In your example,
if header x == y, do z
the slowest part might actually be doing whatever 'z' is and you'll have to do that work regardless of the language.
In my experience, the design and implementation has much more to do with performance than do the choice of language/framework (however, the usual caveats apply: eg, don't write a device driver in C# or java).
I wouldn't think twice about writing the type of program you describe in a managed language (be it Java, C#, etc). These days, the performance gains you get from using a lower-level language (in terms of closeness to hardware) are often easily offset by the runtime abilities of a managed environment. Of course this is coming from a C#/python developer so I'm not exactly unbiased...
If you need a fast and reliable proxy server, it might make sense to try some of those that already exist. But if you have custom features that are required, then you may have to build your own. You may want to collect some more information on the expected load: hundreds of users might be a few requests a minute or a hundred requests a second.
Assuming you need to serve under or around 200 qps on a single machine, C# should easily meet your needs -- even languages known for being slow (e.g. Ruby) can easily pump out a few hundred requests a second.
Aside from performance, there are other reasons to choose C#; for example, it's much easier to write buffer overflows in C++ than in C#.
Is your http server going to run on a dedicated machine? If yes, I would say go with C# if it is easier for you. If you need to run other applications on the same machine, you'll need to take into account the memory footprint of your application and the fact that GC will run at "random" times.
