I'm preparing to write a photonic simulation package that will run on a 128-node Linux and Windows cluster, with a Windows-based client for designing jobs (CAD-like) and submitting them to the cluster.
Most of this is well-trod ground, but I'm curious how C# stacks up to C++ in terms of real number-crunching ability. I'm very comfortable with both languages, but I find the superior object model and framework support of C# with .NET or Mono incredibly enticing. However, I can't, with this application, sacrifice too much in processing power for the sake of developer preference.
Does anyone have any experience in this area? Are there any hard benchmarks available? I'd assume that the final machine code would be optimized using the same techniques whether it comes from a C# or C++ source, especially since that typically takes place at the pcode/IL level.
The optimisation techniques employed by C# and native C++ are vastly different. C# compilers emit IL, which is only marginally optimised and then JIT'ed to binary code when it is about to execute for the first time. Most of the optimisation work happens inside the JIT compiler.
This has pros and cons. JIT has time budgets, which limits how much effort it can expend on optimisation. But it also has intimate knowledge of the hardware it is actually running on, so it can (in theory) make transparent use of newer CPU opcodes and detailed knowledge of performance data such as a pipeline hazards database.
In practice, I don't know how significant the latter is. I do know that at least Mono will parallelise some loops automatically if it finds itself running on a CPU with SSE (SSE2, perhaps?), which may be a big deal for your scenario.
I did a quick search and found this:
http://www.drdobbs.com/184401976
Edit: Bear in mind (on reading the article) that this was done 5 years ago so performance is likely to be better all round!
I have found these performance comparisons:
http://reverseblade.blogspot.com/2009/02/c-versus-c-versus-java-performance.html
http://systematicgaming.wordpress.com/2009/01/03/performance-c-vs-c/
http://journal.stuffwithstuff.com/2009/01/03/debunking-c-vs-c-performance/
http://www.csharphelp.com/2007/01/managed-c-versus-unmanaged-c/
And here is even a case study:
http://www.itu.dk/~sestoft/papers/numericperformance.pdf
Hope it helps.
Related
As we know, .NET uses the CLI and a JIT to execute programs, but these two stages may result in lower speed and performance compared with C++, which compiles all code in a single stage. I want to know how .NET's languages overcome this disadvantage and deal with it.
Having worked on both C++ compilers and now having spent the past few years working on the .Net JIT, I think there are a few things worth considering:
As many others have pointed out, the JIT is running in process with your app, and it tries to carefully balance quick JIT times versus the quality of jitted code. The more elaborate optimizations seen in C++ often come with very high compile time price tags, and there are some pretty sharp knees in the compile-time-vs-code-quality graph.
Prejitting seemingly can change this equation somewhat as the jit runs beforehand and could take more time, but prejitting's ability to enlarge optimization scope is quite limited (for instance we try and avoid introducing fragile cross-assembly dependencies, and so for example won't inline across assembly boundaries). So prejitted code tends to run somewhat more slowly than jitted code, and mainly helps application startup times.
.Net's default execution model precludes many interprocedural optimizations, because of dynamic class loading, reflection, and the ability of a profiler to update method bodies in a running process. We think, by and large, that the productivity and app architecture gains from these features are worth the trouble. But for cases where these features are not needed we are looking for ways to ensure that if your app doesn't need it, your app won't pay for it.
For example we have some "pure" AOT work going on over in CoreRT but as a consequence reflection is limited.
.Net Core 2.1 includes a preview of Tiered jitting, which will allow us to ease some of the constraints on jit time -- we'll be able to invest more time jitting methods that we know are frequently executed. So I would expect to see more sophisticated optimizations get added to the JIT over time.
.Net Core 2.1 also includes a preview of Hardware Intrinsics so you can take full advantage of the rich instruction sets available on modern hardware.
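For illustration, here is a minimal sketch of what that looks like, assuming .NET Core 3.0 or later (where System.Runtime.Intrinsics shipped); everything other than the framework APIs is a made-up name:

using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

static class VectorAddSketch
{
    // Adds two float arrays four lanes at a time with SSE,
    // falling back to scalar code when SSE is unavailable.
    public static unsafe void Add(float[] a, float[] b, float[] dst)
    {
        int i = 0;
        if (Sse.IsSupported)
        {
            fixed (float* pa = a, pb = b, pd = dst)
            {
                for (; i <= a.Length - 4; i += 4)
                    Sse.Store(pd + i, Sse.Add(Sse.LoadVector128(pa + i),
                                              Sse.LoadVector128(pb + i)));
            }
        }
        for (; i < a.Length; i++)   // scalar tail (and non-SSE fallback)
            dst[i] = a[i] + b[i];
    }
}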
.Net's JIT does not yet get much benefit from profile feedback. This is something we are actively working on changing, though it will take time, and will likely be tied into tiering.
The .Net execution model fundamentally alters the way one needs to think about certain compiler optimizations. For instance, from the compiler's standpoint, many operations -- including low level things like field access -- can raise semantically meaningful exceptions (in C++ only calls/throws can cause exceptions). And .Net's GC is precise and relocating which imposes constraints on optimizations in other ways.
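As a tiny illustration of the exception point, consider this sketch:

static int FirstPlusLast(int[] a)
{
    // Both element accesses can throw (NullReferenceException if a is null,
    // IndexOutOfRangeException if a is empty). Those exceptions are part of
    // the program's observable semantics, so the JIT cannot freely reorder
    // or eliminate the accesses the way a C++ optimizer could.
    return a[0] + a[a.Length - 1];
}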
Alright, so I wanted to ask if it's actually possible to make a parser from C# to C++,
so that code written in C# would be able to run as fast as code written in C++.
Is it actually possible to do? I'm not asking how hard it is going to be.
What makes you think that translating your C# code to C++ would magically make it faster?
Languages don't have a speed. Assuming that C# code is slower (I'll get back to that), it is because of what that code does (including the implicit requirements placed by C#, such as bounds checking on arrays), and not because of the language it is written in.
If you converted your C# code to C++, it would still need to do bounds checking on arrays, because the original source code expected this to happen, so it would have to do just as much work.
Moreover, C# often isn't slower than C++. There are plenty of benchmarks floating around on the internet, generally showing that for the most part, C# is as fast as (or faster than) C++. Only when you spend a lot of time optimizing your code, does C++ become faster.
If you want faster code, you need to write code that requires less work to execute, not try to change the source language. That's just cargo-cult programming at its worst. You once saw some efficient code, and that was written in C++, so now you try to make things C++, in the hope of attracting any efficiency that might be passing by.
It just doesn't work that way.
Although you could translate C# code to C++, there would be the issue that C# depends on the .NET framework libraries, which are not native, so a straight source-to-source translation would not be enough.
Update
Also C# code depends on the runtime to do things such as memory management i.e. Garbage Collection. If you translated the C# code to C++, where would the memory management code be? Parsing and translating is not going to fix issues like that.
The Mono project has invested quite a lot of energy in turning LLVM into a native machine code compiler for the C# runtime, although there are some problems with specific language constructs like shared generics etc.. Check it out and take it for a spin.
You can use NGen to compile IL to native code.
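For example, from an elevated command prompt (MyApp.exe is a placeholder name):

ngen install MyApp.exe

This compiles the assembly (and its dependencies) to native images in the native image cache, so the JIT step is skipped at load time.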
Performance related tweaks:
Platform independent
use a profiler to spot the bottlenecks;
prevent unnecessary garbage (spot it using the generation #0 collection count and the Large Object Heap)
prevent unnecessary copying (use struct wisely)
prevent unwarranted generics (code-sharing has unexpected performance side effects)
prefer old-fashioned loops over enumerator blocks when performance is an issue
when using LINQ, watch closely where you maintain/break deferred evaluation; both can be enormous boosts to performance (see the sketch after this list)
use reflection.emit/Expression Trees to precompile certain dynamic logic that is performance bottleneck
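Here is the kind of thing the LINQ point is about; a minimal sketch ('items' and 'Slow' are placeholders):

using System.Linq;

var query = items.Where(x => Slow(x));  // deferred: no filtering happens yet

var list = query.ToList();              // materialize once...
var a = list.Count;                     // ...then reuse cheaply
var b = list.Count;

// versus accidentally re-running the expensive filter on every enumeration:
var c = query.Count();                  // runs Slow() over all items
var d = query.Count();                  // runs Slow() over all items again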
Mono
use Mono --gc=sgen --optimize=inline,... (the SGEN garbage collector can make orders of magnitude difference). See also man mono for a lot of tuning/optimization options
use MONO_GENERIC_SHARING=none to disable sharing of generics (making particular tasks a lot quicker especially when supporting both valuetypes and reftypes) (not recommended for regular production use)
use the -optimize+ compile flag (optimizing the CLR code independently from what the JITter may do with that)
Less mainstream:
use the LLVM backend:
This allows Mono to benefit from all of the compiler optimizations done in LLVM. For example the SciMark score goes from 482 to 610.
use mkbundle to create a statically linked native binary image (already fully JITted, i.e. ahead-of-time (AOT) compiled); see the example below
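For example (hello.exe is a placeholder):

mkbundle --static -o hello hello.exe --deps

This produces a self-contained native executable that bundles the assembly, its dependencies, and the Mono runtime.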
MS .NET
Most of the above have direct Microsoft counterparts (NGen, /optimize+, etc.)
Of course MS doesn't have a switchable/tunable garbage collector, and I don't think a fully compiled native binary can be achieved like it can with Mono.
As always the answer to making code run faster is:
Find the bottleneck and optimize that
Most of the time the bottleneck is either:
time spent in a critical loop
Review your algorithm and data structures; do not change the language. The latter will give a 10% speedup, the former can give you a 1000x speedup.
If you're stuck on the best algorithm, you can always ask a specific, short and detailed question on SO.
time spent waiting for resources from a slow source
Reduce the amount of stuff you're requesting from the source
instead of:
SELECT * FROM bigtable
do
SELECT TOP 10 * FROM bigtable ORDER BY xxx
The latter will return instantly and you cannot show a million records in a meaningful way anyhow.
Or you can have the server at the other end reduce the data so that it doesn't take 100 years to cross the network.
Alternatively, you can execute the slow data-fetch routine in a separate thread, so the rest of your program can do meaningful stuff instead of waiting.
Time spent because you are overflowing memory with gigabytes of data
Use a different algorithm that works on a smaller dataset at a time.
Try to optimize cache usage.
The answer to efficient coding is to measure where your CPU time goes
Use a profiler.
see: http://csharp-source.net/open-source/profilers
And optimize those parts that eat more than 50% of your CPU time.
Do this for a number of iterations, and soon your 10 hour running time will be down to a manageable 3 minutes, instead of the 9.5 hours that you will get from switching to this or that better language.
I am building a prototype for a quantitative library that does some signal analysis using image processing techniques. I built the initial prototype entirely in C#, but the performance is not as good as expected. Most of the computation is done through heavy matrix calculations, and these are taking up most of the time.
I am wondering if it is worth it to write a C++/CLI interface to unmanaged C++ code. Has anyone ever gone through this? Other suggestions for optimizing C# performance are welcome.
There was a time where it would definitely be better to write in C/C++, but the C# optimizer and JIT is so good now, that for pure math, there's probably no difference.
The difference comes when you have to deal with memory and possibly arrays. Even so, I'd still work with C# (or F#) and then optimize hotspots. The JIT is really good at optimizing away small, short-lived objects.
With arrays, you have to worry about C# doing bounds-checks on each access. Read this:
Link
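One mitigation worth knowing: the JIT can remove the bounds check in the common loop shape where the limit is the array's own Length. A minimal sketch:

static long SumChecked(int[] data)
{
    long total = 0;
    // The JIT recognizes this pattern and elides the per-element bounds
    // check, because i is provably within [0, data.Length) on every access.
    for (int i = 0; i < data.Length; i++)
        total += data[i];
    return total;
}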
Test it yourself -- I've been finding C# to be comparable -- sometimes faster.
It's hard to give a definitive answer here, but if performance is an issue I'd find a time-tested library with the performance you need and wrap it.
Something simple like multiplication or division is not much different between C++ and C# - the C++ compiler has an optimizer, and the CLR runtime has the on-demand JITter that does optimizations. So in theory, the C++ would outperform the C# only on the first call.
However, theory and practice are not the same. With more complicated algorithms you also run into the differences between memory managers, and the maturity of the optimization techniques. If you want anecdotal evidence, you can find some math-heavy comparisons here.
Personally, I find doing the heavy computations in a native library and using c++/CLI to call it gives a good boost when the computations are the biggest bottleneck. As always, make sure that's the case before doing any optimization.
Matrix math is best done in native code in my opinion. Even the C++ libraries typically allow binding to a lower-level implementation like LAPACK.
There is a C# LAPACK port here (also C# BLAS on the same site) which you could try but I'd be surprised if this is faster than native code.
I've done a lot of image processing work in C# and, yes, I usually do use native code for heavy-duty code where performance matters, but I used just P/Invokes and not the C++/CLI interface. A lot of the time this is not needed, though.
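The P/Invoke side of that is small; a sketch (the "imagelib" DLL name and its convolve export are hypothetical):

using System.Runtime.InteropServices;

static class NativeFilters
{
    // Hypothetical native export:
    //   void convolve(float* img, int w, int h, const float* kernel, int k);
    [DllImport("imagelib", CallingConvention = CallingConvention.Cdecl,
               EntryPoint = "convolve")]
    public static extern void Convolve(float[] image, int width, int height,
                                       float[] kernel, int kernelSize);
}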
There's quite a few good .NET profilers. The Red Gate one is my personal favorite. It might help you to visualize where the bottlenecks are.
The only reasonable language benchmark out there: http://shootout.alioth.debian.org/
See for yourself.
Performance for mathematical computation is pretty poor in C#. I was gobsmacked to find how slow mathematical calculations are in C#. Just write a loop in C# and C++ doing a few multiplications, Sin, Cos, ... and the difference is immense.
I do not know Managed C++, but implementing it all in unmanaged C++ and, I would imagine, exposing granular interfaces through P/Invoke should have little performance hit.
That is what I have done for heavy real-time image processing.
I built the initial prototype entirely in C#, but the performance is not as good as expected.
then you have two options:
Build another prototype in C++, and see how that compares, or optimize your C# code. No matter what language you write in, your code won't be fast until you've profiled and optimized and profiled and optimized it. That is especially true in C++. If you write the fastest possible implementation in C# and compare it to the fastest possible implementation in C++, then the C++ version will most likely be faster. But it will come at a cost in terms of development time. It's not trivial to write efficient C++ code. If you are new to the language then you will most likely write very inefficient code, especially if you are coming from C# or Java, where things are done differently and have different costs.
If you just write a working implementation, without worrying too much about performance, then I'm guessing that the C# version will probably be faster.
But it really depends on what kind of performance you're after (and not least, how expensive the operations you need to perform are). There's an overhead associated with the transition from managed to native code, so it is not worth it for short operations that are executed often.
Number-crunching code in C++ can be as fast as code written in Fortran (give or take a few percent), but to achieve that, you need to use a lot of advanced techniques (expression templates and lots of metaprogramming) or some fairly complex libraries which implement it for you.
Is that worth it? Or can C# be made fast enough for your needs?
You should write computationally heavy programs in C++; you cannot get anywhere near C++ performance by optimizing C#. The overhead of calling wrappers is negligible, assuming the computation takes considerable time. I have done coding in both C++ and C# and have never seen any occasion where .NET framework code came out comparable to C++. There are a few instances where C# runs better, but it was better because of a lack of appropriate libraries or bad coding in C++. If you can write code equally well in C# and C++, I would write the performance code in C++ and everything else in C#.
If x is the world's best C++ programmer and y is the best C# programmer, then most of the time x can write faster code than y. However, y can finish the coding faster than x most of the time.
I have no knowledge of GPU programming concepts and APIs. I have a few questions:
Is it possible to write a piece of managed C# code and compile/translate it to some kind of module, which can be executed on the GPU? Or am I doomed to have two implementations, one for managed on the CPU and one for the GPU (I understand that there will be restrictions on what can be executed on the GPU)?
Does there exist a decent and mature API to program independently against various GPU hardware vendors (i.e. a common API)?
Are there any best practices if one wants to develop applications that run on a CPU, written in managed language, and also provide speed optimizations if suitable GPU hardware is present?
I would also be glad for links to any kind of documentation with appropriate learning resources.
Best,
Jozef
1) No - not for the general case of C# - obviously anything can be created for some subset of the language
2) Yes - HLSL using DirectX or OpenGL
3) Not generally possible - CPU and GPU coding are fundamentally different
Basically you can't think of CPU and GPU coding as being comparable. A GPU is a highly specialised parallel processing tool - for lots of parallel simple calculations.
Trying to write a general program on a GPU with lots of branches etc. just won't be efficient - maybe not even possible.
Their memory access architectures are totally different.
You should write for the CPU but farm out appropriate parallel computations to the GPU.
1) No, not for the general case of C#, but a small subset, yes. Either through a runtime (check Tidepowerd GPU.NET) or via language support (LINQ or Code Quotations).
2) Yes, DirectCompute (DX11 Compute Shaders) and OpenCL are both vendor independent, mature APIs and you can find .NET binding for them.
3) No, as James said, they are different beasts. GPUs are high-latency processors optimized for high-throughput data-parallel applications, whereas CPUs are low-latency processors optimized for sequential general-purpose applications.
The only research project I know that tries to address this issue is the SPAP language.
My advice, don't try to find the perfect universal API/runtime because there's none. Pick an existing technology (DirectCompute or OpenCL) and see how you can leverage it for your business.
Useful links for starting:
Microsoft DirectCompute SDK (DirectCompute is part of the DirectX SDK)
NVIDIA Compute SDK (ton of samples, CUDA, DirectCompute and OpenCL ones)
AMD Stream SDK (mostly OpenCL samples)
1) Not that I know of, but there might be a library for C# that can help you.
2) OpenCL. It's GPU-independent and can even run on CPUs.
3) OpenCL will help you with that, you can compile for CPU too with OpenCL, though I'm not sure how great of code it makes for the CPU. I've really fallen in love with OpenCL lately, it works really really well.
There's also brahma. It supposedly captures expressions and compiles them for the GPU. I haven't tried myself.
And, Microsoft has a research prototype called accelerator, which is similar in goal but syntactically different.
Have you looked at Alea GPU? Their libraries, while not completely free, have a fair license. There is great documentation and an impressive-looking tool-chain.
For Java, see the Aparapi project (https://github.com/aparapi/aparapi). This allows a subset of Java to be run on any GPU which supports OpenCL. The bytecode of Kernel classes is cross-compiled at runtime to OpenCL code. There are severe restrictions on the java code which can be cross-compiled - basically no Objects can be used as fields, locals or method args.
However a hefty advantage is that the kernels can be executed in either Java or OpenCL (with automatic fallback to Java ThreadPool execution in the event of unavailability of an appropriate GPU/APU device). This sounds like the closest thing to what you are seeking in part 3 of your question (though of course the managed language is not C#).
I'm not aware of anything similar in C#.
I want to create a simple HTTP proxy server that does some very basic processing on the HTTP headers (i.e. if header x == y, do z). The server may need to support hundreds of users. I can write the server in C# (pretty easy) or C++ (much harder). However, would a C# version have performance as good as a C++ version? If not, would the difference in performance be big enough that it would not make sense to write it in C#?
You can use unsafe C# code and pointers in critical bottleneck points to make it run faster. Those behave much like C++ code and I believe it executes as fast.
But most of the time, C# is JIT-ted to uber-fast already, I don't believe there will be much differences as with what everyone has said.
But one thing you might want to consider is: Managed code (C#) string operations are rather slow compared to using pointers effectively in C++. There are more optimization tricks with C++ pointers than with CLR strings.
I think I have done some benchmarks before, but can't remember where I've put them.
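For reference, the unsafe/pointer pattern being described looks like this; a minimal sketch (compile with /unsafe):

static unsafe long SumUnsafe(int[] data)
{
    long total = 0;
    fixed (int* p = data)             // pin the array for the duration
    {
        int* cur = p, end = p + data.Length;
        while (cur < end)
            total += *cur++;          // raw pointer walk, no bounds checks
    }
    return total;
}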
Why do you expect a much higher performance from the C++ application?
There is no inherent slowdown added by a C# application when you are doing it right (not too many dropped references, no frequent object creation/dropping per call, etc.).
The only time a C++ application really outperforms an equivalent C# application is when you can do (very) low level operations. E.g. casting raw memory pointers, inline assembler, etc.
The C++ compiler may be better at creating fast code, but mostly this is wasted in most applications. If you do really have a part of your application that must be blindingly fast, try writing a C call for that hot spot.
Only if most of the system behaves too slowly should you consider writing it in C/C++. But there are many pitfalls that may kill your performance in your C++ code.
(TL;DR: A C++ expert may create 'faster' code than a C# expert, but a mediocre C++ programmer may create slower code than a mediocre C# one.)
I would expect the C# version to be nearly as fast as the C++ one but with smaller memory footprint.
In some cases managed code is actually a LOT faster and uses less memory compared to non optimized C++. C++ code can be faster if written by expert, but it rarely justifies the effort.
As a side note, I can recall a performance "competition" in the blogosphere between Michael Kaplan (C#) and Raymond Chen (C++) to write a program that does exactly the same thing. Raymond Chen, who is considered one of the best programmers in the world (Joel), succeeded in writing faster C++ only after a long struggle, rewriting most of the code.
The proxy server you describe would deal mostly with string data and I think it's reasonable to implement in C#. In your example,
if header x == y, do z
the slowest part might actually be doing whatever 'z' is, and you'll have to do that work regardless of the language.
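Just to make the header test concrete, here is a minimal sketch using HttpListener (the header name "X-Example" and the value are made up, and the actual forwarding is omitted):

using System.Net;

class HeaderCheckSketch
{
    static void Main()
    {
        var listener = new HttpListener();
        listener.Prefixes.Add("http://localhost:8080/");
        listener.Start();
        while (true)
        {
            HttpListenerContext ctx = listener.GetContext();
            // "if header x == y, do z" from the question:
            if (ctx.Request.Headers["X-Example"] == "y")
            {
                // do z, e.g. tag or rewrite the request before forwarding it
            }
            ctx.Response.StatusCode = 200;
            ctx.Response.Close();
        }
    }
}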
In my experience, the design and implementation have much more to do with performance than does the choice of language/framework (however, the usual caveats apply: e.g., don't write a device driver in C# or Java).
I wouldn't think twice about writing the type of program you describe in a managed language (be it Java, C#, etc). These days, the performance gains you get from using a lower level language (in terms of closeness to hardware) is often easily offset by the runtime abilities of a managed environment. Of course this is coming from a C#/python developer so I'm not exactly unbiased...
If you need a fast and reliable proxy server, it might make sense to try some of those that already exist. But if you have custom features that are required, then you may have to build your own. You may want to collect some more information on the expected load: hundreds of users might be a few requests a minute or a hundred requests a second.
Assuming you need to serve under or around 200 qps on a single machine, C# should easily meet your needs -- even languages known for being slow (e.g. Ruby) can easily pump out a few hundred requests a second.
Aside from performance, there are other reasons to choose C#, e.g. it's much easier to write buffer overflows in C++ than C#.
Is your http server going to run on a dedicated machine? If yes, I would say go with C# if it is easier for you. If you need to run other applications on the same machine, you'll need to take into account the memory footprint of your application and the fact that GC will run at "random" times.