Run C# code on GPU

I have no knowledge of GPU programming concepts and APIs. I have a few questions:
Is it possible to write a piece of managed C# code and compile/translate it to some kind of module, which can be executed on the GPU? Or am I doomed to have two implementations, one for managed on the CPU and one for the GPU (I understand that there will be restrictions on what can be executed on the GPU)?
Does there exist a decent and mature API to program independently against various GPU hardware vendors (i.e. a common API)?
Are there any best practices if one wants to develop applications that run on a CPU, written in managed language, and also provide speed optimizations if suitable GPU hardware is present?
I would also be glad for links to documentation or other appropriate learning resources.
Best,
Jozef

1) No - not for the general case of C# - though obviously something can be built for some subset of the language
2) Yes - HLSL with DirectX, or GLSL with OpenGL
3) Not generally possible - CPU and GPU coding are fundamentally different
Basically you can't think of CPU and GPU coding as comparable. A GPU is a highly specialised parallel processing tool - for lots of simple parallel calculations.
Trying to write a general program with lots of branches etc. on a GPU just won't be efficient - maybe not even possible.
Their memory access architectures are totally different.
You should write for the CPU but farm out appropriate parallel computations to the GPU.

1) No, not for the general case of C#, but a small subset, yes. Either through a runtime (check Tidepowerd GPU.NET) or via language support (LINQ or Code Quotations).
2) Yes, DirectCompute (DX11 Compute Shaders) and OpenCL are both vendor independent, mature APIs and you can find .NET binding for them.
3) No, as James said, they are different beasts. GPUs are high-latency processors optimized for high-throughput data-parallel applications, whereas CPUs are low-latency processors optimized for sequential general-purpose applications.
The only research project I know that tries to address this issue is the SPAP language.
My advice: don't try to find the perfect universal API/runtime, because there is none. Pick an existing technology (DirectCompute or OpenCL) and see how you can leverage it for your business (see the sketch after the links below).
Useful links for starting:
Microsoft DirectCompute SDK (DirectCompute is part of the DirectX SDK)
NVIDIA Compute SDK (tons of samples: CUDA, DirectCompute and OpenCL ones)
AMD Stream SDK (mostly OpenCL samples)
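To make the "pick one and leverage it" advice concrete, here is a minimal sketch of calling an OpenCL kernel from C#, assuming the open-source Cloo binding (the type and method names below are Cloo's; verify them against the version you install):

    // Squares an array on the GPU via OpenCL, using the Cloo .NET binding.
    using System;
    using Cloo;

    class SquareOnGpu
    {
        const string KernelSource = @"
            kernel void square(global float* v)
            {
                int i = get_global_id(0);
                v[i] = v[i] * v[i];
            }";

        static void Main()
        {
            float[] data = { 1f, 2f, 3f, 4f };

            // Create a GPU context on the first available platform.
            ComputePlatform platform = ComputePlatform.Platforms[0];
            var context = new ComputeContext(ComputeDeviceTypes.Gpu,
                new ComputeContextPropertyList(platform), null, IntPtr.Zero);

            // Compile the OpenCL C source at runtime and fetch the kernel.
            var program = new ComputeProgram(context, KernelSource);
            program.Build(null, null, null, IntPtr.Zero);
            ComputeKernel kernel = program.CreateKernel("square");

            // Copy the input to device memory and bind it to the kernel.
            var buffer = new ComputeBuffer<float>(context,
                ComputeMemoryFlags.ReadWrite | ComputeMemoryFlags.CopyHostPointer, data);
            kernel.SetMemoryArgument(0, buffer);

            // Launch one work-item per element, then read the results back.
            var queue = new ComputeCommandQueue(context, context.Devices[0],
                ComputeCommandQueueFlags.None);
            queue.Execute(kernel, null, new long[] { data.Length }, null, null);
            queue.ReadFromBuffer(buffer, ref data, true, null);

            Console.WriteLine(string.Join(", ", data)); // 1, 4, 9, 16
        }
    }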

1) Not that I know of, but there might be a library for C# that can help you.
2) OpenCL. It's vendor-independent and can even run on CPUs.
3) OpenCL will help you with that; you can compile for the CPU too with OpenCL, though I'm not sure how good the code it generates for the CPU is. I've really fallen in love with OpenCL lately; it works really well.

There's also Brahma. It supposedly captures expressions and compiles them for the GPU. I haven't tried it myself.
And Microsoft has a research prototype called Accelerator, which is similar in goal but syntactically different.

Have you looked at Alea GPU? Their libraries, while not completely free, have a fair license. There is great documentation and an impressive-looking tool chain.

For Java, see the Aparapi project (https://github.com/aparapi/aparapi). This allows a subset of Java to be run on any GPU which supports OpenCL. The bytecode of Kernel classes is cross-compiled at runtime to OpenCL code. There are severe restrictions on the Java code which can be cross-compiled - basically no objects can be used as fields, locals or method args.
However, a hefty advantage is that the kernels can be executed in either Java or OpenCL (with automatic fallback to Java ThreadPool execution if no appropriate GPU/APU device is available). This sounds like the closest thing to what you are seeking in part 3 of your question (though of course the managed language is not C#).
I'm not aware of anything similar in C#.

Related

Can C# .NET be used for hard real-time?

Given that the familiar form of .NET runs on Windows, which is not a real-time O/S, and Mono runs on Linux (whose standard kernel is also not a real-time O/S).
Given also that any memory allocation scheme offering garbage collection (as in "managed" .NET), and indeed any heap memory scheme, will introduce non-deterministic, potentially non-trivial delays into an application's execution behavior.
Is there any combination of alternate host O/S and coding paradigm in which one can leverage all of the power and conveniences of C# .NET while implementing a solution which can execute designated portions of code within tightly specified time constraints? e.g. start a C# method every 10ms to a tolerance of less than 1ms, with completion time determined only by the work performed in the method itself?
Obviously, the application would have to be carefully written; time-critical code would have to avoid memory allocations; the application would have to have completed all its memory allocation etc. work and have no other threads active once the hard real-time loop is started. Also, the host O/S would have to support real-time scheduling.
Is this possible within the .NET / MONO framework, or is it precluded by the design of the .NET runtime, framework, and O/Ss on which it (or compatible equivalent) is supported?
For example: is it possible to do reliable fine-grained (~1ms) machine control purely in C# with something like NETduino, or do they have limits or require alternate strategies for such applications?
Short Answer: No.
Longer answer: The closest you can get is running the .NET Micro Framework directly on hardware, but the TinyCLR still doesn't give you deterministic timings. Microsoft has Windows CE/Windows Embedded Compact as its real-time offering, but even that is only real-time for slower tasks (I believe somewhere in the range of 50 microseconds or more - not sure if that qualifies as hard real-time).
I do not know whether it is technically possible to create a real-time C# implementation, but no one has done it, and even .NET Native isn't made for that.
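One way to see the non-determinism for yourself is a jitter probe. This plain .NET sketch (no special APIs; the loop count is illustrative) runs the question's nominal 10 ms loop and reports the worst overshoot, which on a stock desktop OS will typically blow the 1 ms tolerance now and then:

    using System;
    using System.Diagnostics;
    using System.Threading;

    class JitterProbe
    {
        static void Main()
        {
            const double periodMs = 10.0;  // the question's nominal period
            var sw = Stopwatch.StartNew();
            double next = periodMs, worstMs = 0;

            for (int i = 0; i < 1000; i++)
            {
                // Sleep until just before the deadline, then spin the rest.
                double remaining = next - sw.Elapsed.TotalMilliseconds;
                if (remaining > 2) Thread.Sleep((int)(remaining - 2));
                while (sw.Elapsed.TotalMilliseconds < next) { /* busy-wait */ }

                // Record how far past the deadline we actually ran.
                worstMs = Math.Max(worstMs, sw.Elapsed.TotalMilliseconds - next);
                next += periodMs;
            }
            Console.WriteLine($"worst overshoot: {worstMs:F3} ms");
        }
    }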
Can C# be used for hard real-time? Yes
When we talk about real-time it's most often (if not always) about robotics and IoT. And for that we almost always go with one of these options (forget Windows CE and Windows 10 IoT):
Microcontrollers (example: Arduino, RPi Pico, NodeMCU)
Linux based SBCs (example: Raspberry Pi, BeagleBone, Rock Pi)
Microcontrollers are by nature real-time. Basically the device will just run a loop forever (there are interrupts and multi-threading on some chips though). Top languages in this category are C/C++ and MicroPython. But C# can also be used:
Wilderness Labs (Netduino and Meadow F7)
.NET nanoFramework (several boards)
The second option (Linux based SBCs) is a bit more tricky. The OS has complete control over the hardware and it has a scheduler. That way many processes can be run on just one CPU. The OS itself has a lot of housekeeping as well.
Linux has a set of scheduling APIs that can be used to ask the OS to favor our process over others. The OS will do its best to comply, but there are no guarantees. This is usually called soft real-time. In .NET you can use Process.PriorityClass to change your process's nice value (see the snippet just below). Depending on how busy the OS is and the amount of resources available (CPUs and memory), you might get satisfying results.
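Raising the priority is one line of stock .NET; as noted above, this is best-effort only:

    using System.Diagnostics;
    using System.Threading;

    // Ask the scheduler to favour this process (maps roughly onto the
    // nice value on Linux). Best-effort: no deadline is guaranteed.
    Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.High;

    // Individual threads can be boosted as well.
    Thread.CurrentThread.Priority = ThreadPriority.Highest;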
Other than that, Linux also provides hard real-time capabilities with the PREEMPT_RT patch, and there is also a feature that lets you isolate a CPU core for your selected processes. But to my knowledge .NET does not have any API to use these capabilities (P/Invoke may work; see the sketch below).
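As for the P/Invoke route, here is a sketch of requesting the Linux SCHED_FIFO policy through libc. The constant and struct layout are taken from the Linux headers, and the call needs root or the CAP_SYS_NICE capability, so treat it as an assumption to verify on your target system:

    using System;
    using System.Runtime.InteropServices;

    static class RealtimeScheduling
    {
        [StructLayout(LayoutKind.Sequential)]
        struct SchedParam { public int sched_priority; }

        const int SCHED_FIFO = 1; // from <sched.h> on Linux

        [DllImport("libc", SetLastError = true)]
        static extern int sched_setscheduler(int pid, int policy, ref SchedParam param);

        // Request a real-time FIFO priority for the current process (pid 0).
        // Returns true on success; requires root or CAP_SYS_NICE.
        public static bool TryEnableFifo(int priority = 50)
        {
            var p = new SchedParam { sched_priority = priority };
            return sched_setscheduler(0, SCHED_FIFO, ref p) == 0;
        }
    }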

Can I utilise cores in a GPU from C# WITHOUT changing my code?

I realise there are several questions on this subject but I believe my angle is unique.
I have a mature C# app that I use for scientific number crunching. In the code I start 24 C# threads on my 24-hyperthread workstation (I have 2 CPUs, each with 6 cores / 12 threads). I run Windows 7 and it handles it brilliantly - I am able to use my full processing power to get my work done.
I see that some GPUs advertise "448 cores". If I bought one of these would my c# app be able to utilise them? I mean without rewriting my code in any major way. Would the threads I start get taken up by the GPU cores instead of the CPU HyperThreads as is the case now?
FOLLOW ON QUESTION
Hi, I appreciate the answers I am getting - even if negative.
Is there any other hardware I should be thinking about (not too expensive) that would give me a large number of Cores, but would be able to run my c# code without a rewrite?
You'd really need to rewrite your code to make use of a GPU. These links might be useful:
CUDA .NET - CUDA functionality through .NET apps.
CUDA Sharp - C# wrapper for nVidia Toolkit
These are based on the nVidia CUDA system so you'd need an nVidia card for this of course.
Heh... no. No way no how. Those "cores" aren't the same. To take advantage of any GPU computing, you need to write your computations in a very specific way. Try OpenCL maybe. But the answer to your question is no.
As for your edit, the only hardware that helps with few changes (depending on how you've currently structured the code) is more CPU. If you're not making general-purpose software, you could probably run 48 non-HT individual cores. Maybe that's not the bottleneck, though. You could also increase your RAM to make everything generally faster, up to a point.
No. .Net threads will not automatically take advantage of GPU cores for processing. They are very different from normal processor cores. You would need to alter your program to take advantage of GPU processing.

Is there any advantage to using C++ instead of C# when using Direct3D?

Is there any advantage to using C++ instead of C# when using Direct3D? The tutorials I've found for C++ and DirectX all use Direct3D (which to my knowledge is managed). Similarly, all of the C# tutorials I've found are for Direct3D.
Is Direct3D managed?
Is there any difference between using D3D in either of the two languages?
DirectX is entirely native. Any impression you may have that it's managed is completely and utterly wrong. There are managed wrappers that will allow you to use DirectX from managed code. In addition, DirectX is designed to be accessed from C++ or C, or similar languages. If you look at the SlimDX project, they encountered numerous issues, especially with resource collection, because C# doesn't genuinely support automatic collection of non-memory resources, and the using statement doesn't cut the mustard. In addition, game programming can be very CPU-intensive, and often the performance cost of a managed language is untenable, and virtually all existing supporting libraries are for C or C++.
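To make the resource-collection point concrete: the garbage collector knows nothing about GPU memory, so every wrapper object needs an explicit Dispose. The Texture type below is hypothetical, but the pattern is what wrappers like SlimDX force on you:

    // Hypothetical managed wrapper around a GPU resource.
    sealed class Texture : System.IDisposable
    {
        public void Dispose() { /* release the GPU memory now */ }
    }

    class Renderer
    {
        void Render()
        {
            // Deterministic cleanup needs an explicit using (or Dispose)
            // for every resource; miss one and the video memory lingers
            // until a finalizer runs at some unpredictable time.
            using (var diffuse = new Texture())
            using (var normal = new Texture())
            {
                // ... draw calls ...
            }
        }
    }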
If you want to make a small game, or something like that, there's nothing at all stopping you from using managed code. However, I know of no commercial games that actually take this route.
The point of Direct3D is to move rendering off the CPU and onto the GPU. If there were to be a significant performance difference it would be for that code that runs on the CPU. Therefore I don't see that there should be any significant performance difference between native and managed code for the part of your code that interfaces with Direct3D.
Direct3D itself is not managed code.
It depends on what you're doing exactly. As David Heffernan mentioned, one of the objectives of Direct3D is to move as much processing as possible to the GPU. With the advent of vertex shaders, pixel shaders, and much more, we're closer to that reality than ever.
Of course, given infinite time and resources, you can usually create more efficient algorithms in C++ than in C#. This will affect performance at the CPU level. Today, processing that is not graphics related is still mostly done on the CPU. There are things like CUDA, OpenCL, and even future versions of DirectX which will open up possibilities of moving any parallel-friendly algorithm to the GPU as well. But the adoption rate of those technologies (and of the video cards that support them) isn't exactly mainstream just yet.
So what types of CPU-intensive algorithms should you consider C++ for?
Artificial Intelligence
Particle engines / n-body simulations
Fast Fourier transform
Those are just the first things I can think of. At least the first two are very common in games today. AI is often done in a compromised fashion in games to run as quickly as possible, simply because it can be so processor intensive. And then particle engines are everywhere.

Suitability of C# for clustered calculation-heavy apps?

I'm preparing to write a photonic simulation package that will run on a 128-node Linux and Windows cluster, with a Windows-based client for designing jobs (CAD-like) and submitting them to the cluster.
Most of this is well-trod ground, but I'm curious how C# stacks up to C++ in terms of real number-crunching ability. I'm very comfortable with both languages, but I find the superior object model and framework support of C# with .NET or Mono incredibly enticing. However, I can't, with this application, sacrifice too much in processing power for the sake of developer preference.
Does anyone have any experience in this area? Are there any hard benchmarks available? I'd assume that the final machine code would be optimized using the same techniques whether it comes from a C# or C++ source, especially since that typically takes place at the pcode/IL level.
The optimisation techniques employed by C# and native C++ are vastly different. C# compilers emit IL, which is only marginally optimised and then JIT'ed to binary code when it is about to execute for the first time. Most of the optimisation work happens inside the JIT compiler.
This has pros and cons. JIT has time budgets, which limits how much effort it can expend on optimisation. But it also has intimate knowledge of the hardware it is actually running on, so it can (in theory) make transparent use of newer CPU opcodes and detailed knowledge of performance data such as a pipeline hazards database.
In practice, I don't know how significant the latter is. I do know that at least Mono will parallelise some loops automatically if it finds itself running on a CPU with SSE (SSE2, perhaps?), which may be a big deal for your scenario.
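That kind of hardware-aware optimisation has since been exposed directly to C#: System.Numerics.Vector<T> maps onto SSE/AVX registers where the JIT supports them. A minimal sketch (the method name and the scalar tail loop are just illustrative):

    using System.Numerics;

    static class Simd
    {
        // Adds b into a using hardware SIMD lanes where available.
        public static void AddInPlace(float[] a, float[] b)
        {
            int width = Vector<float>.Count; // e.g. 8 floats with AVX
            int i = 0;
            for (; i <= a.Length - width; i += width)
            {
                (new Vector<float>(a, i) + new Vector<float>(b, i)).CopyTo(a, i);
            }
            for (; i < a.Length; i++)        // scalar tail
                a[i] += b[i];
        }
    }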
I did a quick search and found this:
http://www.drdobbs.com/184401976
Edit: Bear in mind (on reading the article) that this was done 5 years ago so performance is likely to be better all round!
I have found these performance comparisons:
http://reverseblade.blogspot.com/2009/02/c-versus-c-versus-java-performance.html
http://systematicgaming.wordpress.com/2009/01/03/performance-c-vs-c/
http://journal.stuffwithstuff.com/2009/01/03/debunking-c-vs-c-performance/
http://www.csharphelp.com/2007/01/managed-c-versus-unmanaged-c/
And here is even a case study:
http://www.itu.dk/~sestoft/papers/numericperformance.pdf
Hope it helps.

C# Performance For Proxy Server (vs C++)

I want to create a simple HTTP proxy server that does some very basic processing on the HTTP headers (i.e. if header x == y, do z). The server may need to support hundreds of users. I can write the server in C# (pretty easy) or C++ (much harder). However, would a C# version have performance as good as a C++ version? If not, would the difference in performance be big enough that it would not make sense to write it in C#?
You can use unsafe C# code and pointers in critical bottleneck points to make it run faster. Those behave much like C++ code and I believe it executes as fast.
But most of the time, C# is JIT-ted to be uber-fast already; as everyone has said, I don't believe there will be much difference.
But one thing you might want to consider is: Managed code (C#) string operations are rather slow compared to using pointers effectively in C++. There are more optimization tricks with C++ pointers than with CLR strings.
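As a sketch of that pointer point in C# itself (illustrative; must be compiled with the /unsafe switch): scanning a raw request buffer through a fixed pointer avoids allocating any intermediate strings:

    static class HeaderScan
    {
        // Counts CRLF line breaks in a raw request buffer without
        // allocating strings; compile with /unsafe.
        public static unsafe int CountHeaderLines(byte[] buffer, int length)
        {
            int lines = 0;
            fixed (byte* p = buffer)
            {
                for (int i = 1; i < length; i++)
                    if (p[i - 1] == (byte)'\r' && p[i] == (byte)'\n')
                        lines++;
            }
            return lines;
        }
    }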
I think I have done some benchmarks before, but can't remember where I've put them.
Why do you expect a much higher performance from the C++ application?
There is no inherent slowdown added by a C# application when you are doing it right (not too many dropped references, no frequent object creation/destruction per call, etc.).
The only time a C++ application really outperforms an equivalent C# application is when you can do (very) low level operations. E.g. casting raw memory pointers, inline assembler, etc.
The C++ compiler may be better at creating fast code, but mostly this is wasted in most applications. If you really do have a part of your application that must be blindingly fast, try writing a native C routine for that hot spot.
Only if most of the system behaves too slowly should you consider writing it in C/C++. But there are many pitfalls that may kill your performance in your C++ code.
(TL;DR: a C++ expert may create faster code than a C# expert, but a mediocre C++ programmer may create slower code than a mediocre C# one.)
I would expect the C# version to be nearly as fast as the C++ one but with smaller memory footprint.
In some cases managed code is actually a LOT faster and uses less memory compared to non-optimized C++. C++ code can be faster if written by an expert, but it rarely justifies the effort.
As a side note, I can recall a performance "competition" in the blogosphere between Michael Kaplan (C#) and Raymond Chen (C++) to write a program that does exactly the same thing. Raymond Chen, who is considered one of the best programmers in the world (per Joel), succeeded in writing faster C++ only after a long struggle, rewriting most of the code.
The proxy server you describe would deal mostly with string data, and I think it's reasonable to implement in C#. In your example,
if header x == y, do z
the slowest part might actually be doing whatever 'z' is and you'll have to do that work regardless of the language.
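For scale, the header check itself is trivial in C#; in this sketch the header name, expected value, and RewriteRoute are all placeholders for the question's x, y, and z:

    using System.Collections.Generic;

    static class HeaderRules
    {
        // "if header x == y, do z" - the cheap part of the proxy.
        public static void Apply(IDictionary<string, string> headers)
        {
            if (headers.TryGetValue("X-Custom-Header", out var value) // header "x"
                && value == "expected")                               // value "y"
            {
                RewriteRoute();                                       // action "z"
            }
        }

        static void RewriteRoute() { /* placeholder for "z" */ }
    }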
In my experience, the design and implementation have much more to do with performance than the choice of language/framework does (however, the usual caveats apply: e.g., don't write a device driver in C# or Java).
I wouldn't think twice about writing the type of program you describe in a managed language (be it Java, C#, etc.). These days, the performance gains you get from using a lower-level language (in terms of closeness to hardware) are often easily offset by the runtime abilities of a managed environment. Of course, this is coming from a C#/Python developer, so I'm not exactly unbiased...
If you need a fast and reliable proxy server, it might make sense to try some of those that already exist. But if you have custom features that are required, then you may have to build your own. You may want to collect some more information on the expected load: hundreds of users might be a few requests a minute or a hundred requests a second.
Assuming you need to serve under or around 200 qps on a single machine, C# should easily meet your needs -- even languages known for being slow (e.g. Ruby) can easily pump out a few hundred requests a second.
Aside from performance, there are other reasons to choose C#; e.g., it's much easier to introduce buffer overflows in C++ than in C#.
Is your http server going to run on a dedicated machine? If yes, I would say go with C# if it is easier for you. If you need to run other applications on the same machine, you'll need to take into account the memory footprint of your application and the fact that GC will run at "random" times.
