My main question is this: for graphic processing (e.g. moving images, collisions, scaling etc.), is there something I can do to get the best performance out of the processing? Under what circumstances is the GPU used instead of the CPU, and can I selectively use one over the other? Should I? Also, what Graphic API should I use if I wanted to get the best performance with graphic processing. Or should I even use an API at all?
I feel as though I'm asking too many questions in one question, or that I'm not wording it correctly, but I don't see how I could do otherwise. If any clarification is needed, just ask. I've searched many places, but this may be a duplicate, so my apologies if it is.
I'm not sure what you are creating, but these are my musings on the subject:
A good option is XNA. It is basically a wrapper on top of DirectX, made for utilizing the GPU for graphics-intensive tasks, but it hides some of the dirty implementation details of using DirectX directly. It's your best option for games, graphical simulations, etc.
Another option is WPF, which also uses DirectX underneath, but is more geared towards creating user interfaces than games and similar tasks; it is, however, extremely flexible.
WinForms/GDI+ is your worst option, since it does not use hardware acceleration at all, and will only use the CPU.
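If XNA does end up being the choice, the starting point is tiny. Here's a minimal sketch of an XNA 4.0 game loop (the class name is mine, purely for illustration), just to show where the CPU-side work (movement, collisions) and GPU-side drawing go:

```csharp
using Microsoft.Xna.Framework;
using Microsoft.Xna.Framework.Graphics;

// Minimal XNA 4.0 skeleton: Update/Draw run in a GPU-backed loop for you.
public class MyGame : Game
{
    GraphicsDeviceManager graphics;
    SpriteBatch spriteBatch;

    public MyGame()
    {
        graphics = new GraphicsDeviceManager(this);
        Content.RootDirectory = "Content";
    }

    protected override void LoadContent()
    {
        spriteBatch = new SpriteBatch(GraphicsDevice);
    }

    protected override void Update(GameTime gameTime)
    {
        // Move sprites, test collisions, scale objects, etc. here (CPU side).
        base.Update(gameTime);
    }

    protected override void Draw(GameTime gameTime)
    {
        GraphicsDevice.Clear(Color.CornflowerBlue);
        // spriteBatch.Begin(); ... spriteBatch.End();  // actual rendering (GPU side)
        base.Draw(gameTime);
    }

    static void Main()
    {
        using (var game = new MyGame())
            game.Run();
    }
}
```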
I have a multithreaded program which consist of a C# interop layer over C++ code.
I am setting thread affinity (like in this post) and it works on part of my code; however, on the second part it doesn't work. Can the Intel Compiler / IPP / MKL libs / inline assembly interfere with external affinity settings?
UPDATE:
I can't post code as it is a whole environment with many, many DLLs. I set the environment variables OMP_NUM_THREADS=1, MKL_NUM_THREADS=1 and IPP_NUM_THREADS=1. When it runs in a single thread it runs OK, but when I use a number of C# threads and set affinity per thread (on a quad-core machine), initialization goes fine on separate cores, but during processing all threads start using the same core. Hope I am clear enough.
Thanks.
We've had this exact problem; we'd set our thread affinity to what we wanted, and the IPP/MKL functions would blow that away! The answer to your question is 'yes'.
Auto Parallelism
The issue is that, by default, the Intel libraries like to automatically execute multi-threaded versions of the routines. So a single FFT gets computed by a number of threads set up by the library specifically for this purpose.
Intel's intent is that the programmer could get on with the job of writing a single threaded application, and the library would allow that single thread to benefit from a multicore processor by creating a number of threads for the maths work. A noble intent (your source code then need know nothing about the runtime hardware to extract the best achievable performance - handy sometimes), but a right bloody nuisance when one is doing one's own threading for one's own reasons.
Controlling the Library's Behaviour
Take a look at these Intel docs, section Support Functions / Threading Support Functions. You can either programmatically control the library's threading tendencies, or there are some environment variables you can set (like MKL_NUM_THREADS) before your program runs. Setting the number of threads was (as far as I recall) enough to stop the library doing its own thing.
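For illustration, here's what that can look like from the C# side. mkl_set_num_threads and ippSetNumThreads are documented entry points in MKL and IPP, but the DLL names below are guesses that depend entirely on your MKL/IPP version and how it is linked, so treat this purely as a sketch and check it against your installation:

```csharp
using System;
using System.Runtime.InteropServices;

static class IntelThreadingControl
{
    // DLL names vary by MKL/IPP version and linkage; treat these as placeholders.
    [DllImport("mkl_rt.dll", CallingConvention = CallingConvention.Cdecl)]
    static extern void mkl_set_num_threads(int nt);

    [DllImport("ippcore.dll", CallingConvention = CallingConvention.Cdecl)]
    static extern int ippSetNumThreads(int nt);

    public static void ForceSingleThreaded()
    {
        // Environment variables only help if they are set before the libraries
        // initialise, so do this as early as possible (or set them outside the process).
        Environment.SetEnvironmentVariable("OMP_NUM_THREADS", "1");
        Environment.SetEnvironmentVariable("MKL_NUM_THREADS", "1");

        mkl_set_num_threads(1);   // MKL: stop the library spawning its own workers
        ippSetNumThreads(1);      // IPP: likewise
    }
}
```

Call ForceSingleThreaded() once at startup, before any of your own threads touch the maths routines, and your own affinity settings should then stick.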
Philosophical Essay Inspired By Answering Your Question (best ignored)
More or less everything Intel is doing in CPU design and software (e.g. IPP/MKL) is aimed at making it unnecessary for the programmer to Worry About Threads. You want good math performance? Use MKL. You want that for loop to go fast? Turn on Auto Parallelisation in ICC. You want to make the best use of cache? That's what Hyperthreading is for.
It's not a bad approach, and personally speaking I think that they've done a pretty good job. AMD too. Their architectures are quite good at delivering good real world performance improvements to the "Average Programmer" for the minimal investment in learning, re-writing and code development.
Irritation
However, the thing that irritates me a little bit (though I don't want to appear ungrateful!) is that whilst this approach works for the majority of programmers out there (which is where the profitable market is), it just throws more obstacles in the way of those programmers who want to spin their own parallelism. I can't blame Intel for that of course, they've done exactly the right thing; they're a market led company, they need to make things that will sell.
By offering these easy features, the situation of there being too many under-skilled and under-trained programmers becomes more entrenched. If all programmers can get good performance without having to learn what auto parallelism is actually doing, then we'll never move on. The pool of really good programmers who actually know that stuff will remain really small.
Problem
I see this as a problem (though only a small one, I'll explain later). Computing needs to become more efficient for both economic and environmental reasons. Intel's approach allows for increased performance, and better silicon manufacturing techniques produce lower power consumption, but I always feel like it's not quite as efficient as it could be.
Example
Take the Cell processor at the heart of the PS3. It's something that I like to witter on about endlessly! However, IBM developed that with a completely different philosophy to Intel. They gave you no cache (just some fast static RAM instead to use as you saw fit), the architecture was pretty much pure NUMA, you had to do all your own parallelisation, etc. etc. The result was that if you really knew what you were doing you could get about 250 GFLOPS out of the thing (I think the non-PS3 variants went to 320 GFLOPS), for 80 watts, all the way back in 2005.
It's taken Intel chips about another 6 or 7 years for a single device to get to that level of performance. That's a lot of Moore's law growth. If the Cell were manufactured on Intel's latest silicon fab and given as many transistors as Intel put into their big Xeons, it would still blow everything else away.
No Market
However, apart from the PS3, Cell was a non-starter as a market proposition. IBM decided that it would never be a big enough seller to be worth their while. There just weren't enough programmers out there who could really use it, and indulging the few of us who could made no commercial sense, which wouldn't have pleased the shareholders.
Small Problem, Bigger Problem
I said earlier that this was only a small problem. Well, most of the world's computing isn't about high maths performance; it's become Facebook, Twitter, etc. That sort is all about I/O performance, and for that you don't need high maths performance. So in that sense, the dependence on Intel Doing Everything For You so that the average programmer can get good maths performance matters very little. There's just not enough maths being done to warrant a change in design philosophy.
In fact, I strongly suspect that the world will ultimately decide that you don't need a large chip at all; an ARM should do just fine. If that does come to pass then the market for Intel's very large chips with very good general-purpose maths compute performance will vanish. Effectively, those of us who want good maths performance are being heavily subsidised by those who want to fill enormous data centres with Intel-based hardware and put Intel PCs on every desktop.
We're simply lucky that Intel apparently has a desire to make sure that every big CPU they build is good at maths regardless of whether most of their users actually use that maths performance. I'm sure that desire has its foundations in marketing prowess and wanting the bragging rights, but those are not hard, commercially tangible artifacts that bring shareholder value.
So if those data centre guys decide that, actually, they'd rather save electricity and fill their data centres with ARMs, where does that leave Intel? ARMs are fine devices for the purpose for which they're intended, but they're not at the top of my list of Supercomputer chips. So where does that leave us?
Trend
My take on the current market trend is that 'Workstations' (PCs as we call them now) are going to start costing lots and lots of money, just like they did in the 1980s / early 90s.
I think that better supercomputers will become unaffordable because no one can spare the $10 billion it would take to do the next big chip. If people stop having PCs there won't be a mass market for large all-out GPUs, so we won't even be able to use those instead. They're an exclusive thing, but supercomputers do play a vital role in our world and we do need them to get better. So who is going to pay for that? Not me, that's for sure.
Oops, that went on for quite a while...
I've been sorting through DirectX, DirectShow, etc. and can't figure out which .NET C# library would be the best.
I'm making an art installation that will feature full screen video. I'd like the user to be able to pan and zoom in and out on the video as it's playing, ideally with no skipping or hiccups. Is there a Microsoft technology that stands out as an obvious choice for this?
Don't forget to consider WPF. It is a lot faster to get started with than Direct3D. It also has an infrastructure designed for glitch-free animations, independent of delays from garbage collection and UI thread activity.
In very complex GUIs, WPF can come with some hidden costs, which annoys people and leads them to claim WPF is slow.
But I am confident it will work fine in the scenario you describe.
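For what it's worth, here's a rough sketch of the idea in code-behind (the window class, the file path and the zoom factor are all mine, purely for illustration): a MediaElement plays the video, and a ScaleTransform/TranslateTransform pair gives you zoom and pan. Whether it stays perfectly hiccup-free with your particular video is something you'd have to test:

```csharp
using System;
using System.Windows;
using System.Windows.Controls;
using System.Windows.Input;
using System.Windows.Media;

public class VideoWindow : Window
{
    readonly ScaleTransform zoom = new ScaleTransform(1, 1);
    readonly TranslateTransform pan = new TranslateTransform();

    public VideoWindow()
    {
        WindowStyle = WindowStyle.None;
        WindowState = WindowState.Maximized;

        var video = new MediaElement
        {
            Source = new Uri(@"C:\path\to\video.wmv"),   // placeholder path
            LoadedBehavior = MediaState.Play,
            Stretch = Stretch.Uniform,
            RenderTransform = new TransformGroup { Children = { zoom, pan } },
            RenderTransformOrigin = new Point(0.5, 0.5)
        };
        Content = video;

        // Zoom in/out around the centre on the mouse wheel.
        MouseWheel += (s, e) =>
        {
            double factor = e.Delta > 0 ? 1.1 : 1 / 1.1;
            zoom.ScaleX *= factor;
            zoom.ScaleY *= factor;
        };
        // Panning would adjust pan.X / pan.Y from mouse-move deltas in the same way.
    }

    [STAThread]
    static void Main()
    {
        new Application().Run(new VideoWindow());
    }
}
```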
IMHO, use XNA. It has much deeper support than the old Managed DirectX. This guy answered your question for you: http://www.david-amador.com/2009/10/xna-camera-2d-with-zoom-and-rotation/
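The gist of that camera approach, in case the link ever dies, is just to build a view matrix from position, rotation and zoom and hand it to SpriteBatch.Begin. A rough sketch (XNA 4.0; the class name is mine):

```csharp
using Microsoft.Xna.Framework;
using Microsoft.Xna.Framework.Graphics;

// 2D camera: pan with Position, zoom with Zoom, then hand the matrix to SpriteBatch.
public class Camera2D
{
    public Vector2 Position;        // world-space point the camera looks at
    public float Zoom = 1f;
    public float Rotation = 0f;

    public Matrix GetViewMatrix(Viewport viewport)
    {
        return Matrix.CreateTranslation(new Vector3(-Position, 0f)) *
               Matrix.CreateRotationZ(Rotation) *
               Matrix.CreateScale(Zoom, Zoom, 1f) *
               Matrix.CreateTranslation(viewport.Width * 0.5f, viewport.Height * 0.5f, 0f);
    }
}

// Usage inside Draw():
// spriteBatch.Begin(SpriteSortMode.Deferred, BlendState.AlphaBlend,
//                   null, null, null, null, camera.GetViewMatrix(GraphicsDevice.Viewport));
// ...draw the video frame / sprites in world coordinates...
// spriteBatch.End();
```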
I am looking for a way to approximate a volume of fluid moving over a heightmap. The easiest solution I can think of is to approximate it as a large number of non-drawn spheres, of small diameter (<0.1m). I would then place a visible plane representing the surface of the water on "top" of the spheres, at the locations they came to rest. To my knowledge, no managed physics engines contain a built in fluid simulator, hence the question.
Implementation would consist of using a physics engine such as JigLibX, which is capable of simulating the motion of the spheres. To determine the height of the planes, I was thinking of averaging the maximum height of each sphere that is on the top layer of a grouping.
I don't expect performance to be great, but would it be achievable in real time? If not, could I use this simulation to pre-bake lines of flow?
I hope this makes sense, I really want opinions/suggestions as to whether this is feasible, or if there is a better way of approaching this.
Thanks for any help, Venatu
(If it's relevant, my target platform is XNA 4.0, using C#. Windows only at this point in time, so PhysX/Havok are possibilities for the simulation, but I would prefer a managed solution.)
I haven't seen realistic fluid dynamics in real time without using something like PhysX as of yet - probably because the calculations needed are so complicated! The problem with your approach as I see it would come with the resting contact of all those spheres as they settled down, which takes up a lot of processing power. Lots of resting contact points are notorious for eating into performance very quickly, even on the most powerful of desktops.
If you are going down this route then I'd recommend modelling the fluid as an elastic but solid body using spring based physics, where the force applied to one part of the water would use springs to propagate out to the rest. This gives you the option of setting a breaking point for the springs and separating the body into two or more bodies when that happens (and the reverse for coming back together.) This can give you the foundation for things like spray. It's also a more versatile approach in terms of performance, because you can choose the number of particles and springs you use to approximate your model.
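To make that concrete, here's a hedged sketch of the core of the spring idea in C#/XNA terms (the class names, constants and break rule are mine, purely to illustrate Hooke's law plus damping between neighbouring particles):

```csharp
using Microsoft.Xna.Framework;

public class Particle
{
    public Vector3 Position, Velocity;
    public float Mass = 1f;
}

// One spring joining two fluid "particles": restoring force plus damping.
public class Spring
{
    public Particle A, B;
    public float RestLength;
    public float Stiffness = 50f;     // tune for how "stiff" the water feels
    public float Damping = 0.5f;
    public float BreakFactor = 2f;    // beyond this, split the body (spray, separation)

    // Returns false when the spring breaks, so the caller can split the body.
    public bool Apply(float dt)
    {
        Vector3 delta = B.Position - A.Position;
        float length = delta.Length();
        if (length < 1e-5f)
            return true;                          // particles coincide; nothing to do
        if (length > BreakFactor * RestLength)
            return false;                         // spring breaks -> bodies separate

        Vector3 dir = delta / length;
        float stretch = length - RestLength;
        Vector3 relVel = B.Velocity - A.Velocity;

        // Force along the spring axis: Hooke's law term + damping term.
        Vector3 force = dir * (stretch * Stiffness + Vector3.Dot(relVel, dir) * Damping);

        A.Velocity += force * (dt / A.Mass);
        B.Velocity -= force * (dt / B.Mass);
        return true;
    }
}
```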
It's a big and complicated topic, but I hope that provided at least some insight!
The most popular method to simulate fluids in real-time is Smoothed-particle hydrodynamics.
Several useful links:
http://en.wikipedia.org/wiki/Smoothed-particle_hydrodynamics
http://http.developer.nvidia.com/GPUGems/gpugems_ch38.html
http://www.plunk.org/~trina/thesis/html/thesis_toc.html
In addition to the simulation itself, you will also need some specialized broad-phase collision detection algorithms such as sweep-and-prune or hashing cells.
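As an illustration of the hashing-cells idea (the key-packing scheme and the choice of cell size are my own, not from any particular engine), the point is just to bucket particles by grid cell so each one only gets tested against its own and neighbouring cells instead of against every other particle:

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Xna.Framework;

// Broad phase: hash each particle into a coarse grid cell (cell size ~ interaction radius).
public class SpatialHash
{
    readonly float cellSize;
    readonly Dictionary<long, List<int>> cells = new Dictionary<long, List<int>>();

    public SpatialHash(float cellSize) { this.cellSize = cellSize; }

    static long Key(int x, int y, int z)
    {
        // Pack three 21-bit cell coordinates into one 64-bit key.
        return ((long)(x & 0x1FFFFF) << 42) | ((long)(y & 0x1FFFFF) << 21) | (long)(z & 0x1FFFFF);
    }

    public void Clear() { cells.Clear(); }

    public void Insert(int particleIndex, Vector3 position)
    {
        long key = Key((int)Math.Floor(position.X / cellSize),
                       (int)Math.Floor(position.Y / cellSize),
                       (int)Math.Floor(position.Z / cellSize));
        List<int> bucket;
        if (!cells.TryGetValue(key, out bucket))
            cells[key] = bucket = new List<int>();
        bucket.Add(particleIndex);
    }

    // The narrow phase then iterates each bucket (and its 26 neighbours) instead of all pairs.
}
```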
And you're right, there are no complete 2D solutions for fluid dynamics.
Is there any advantage to using C++ instead of C# when using Direct3D? The tutorials I've found for C++ and DirectX all use Direct3D (which to my knowledge is managed). Similarly, all of the C# tutorials I've found are for Direct3D.
Is Direct3D managed?
Is there any difference between using D3D in either of the two languages?
DirectX is entirely native. Any impression you may have that it's managed is completely and utterly wrong. There are managed wrappers that will allow you to use DirectX from managed code. In addition, DirectX is programmed to be accessed from C++ or C, or similar languages. If you look at the SlimDX project, they encountered numerous issues, especially due to resource collection, because C# doesn't genuinely support non-memory resources being automatically collected and using doesn't cut the mustard. In addition, game programming can be very CPU-intensive, and often, the additional performance lost by using a managed language is untenable, and virtually all existing supporting libraries are for C or C++.
If you want to make a small game, or something like that, there's nothing at all stopping you from using managed code. However, I know of no commercial games that actually take this route.
The point of Direct3D is to move rendering off the CPU and onto the GPU. If there were to be a significant performance difference it would be for that code that runs on the CPU. Therefore I don't see that there should be any significant performance difference between native and managed code for the part of your code that interfaces with Direct3D.
Direct3D itself is not managed code.
It depends on what you're doing exactly. As David Heffernan mentioned, one of the objectives of Direct3D is to move as much processing as possible to the GPU. With the advent of vertex shaders, pixel shaders, and much more, we're closer to that reality than ever.
Of course, given infinite time and resources, you can usually create more efficient algorithms in C++ than C#. This will affect performance at the CPU level. Today, processing that is not graphics related is still mostly done on the CPU. There are things like CUDA, OpenCL, and even future versions of DirectX which will open up possibilities of moving any parallel-friendly algorithm to the GPU as well. But the adoption rate of those technologies (and the video cards that support them) isn't exactly mainstream just yet.
So what types of CPU-intensive algorithms should you consider C++ for?
Artificial Intelligence
Particle engines / n-body simulations
Fast Fourier transform
Those are just the first things I can think of. At least the first two are very common in games today. AI is often done in a compromised fashion in games to run as quickly as possible, simply because it can be so processor intensive. And then particle engines are everywhere.
(Edit: to clarify, my main goal is concurrency, but not necessarily for multi-core machines)
I'm fairly new to all concepts on concurrency, but I figured out I needed to have parallel drawing routines, for a number of reasons:
I wanted to draw different portions of a graphic separately (background refreshed less often than foreground, kept on a buffer).
I wanted control over priority (more priority to UI responsiveness than to drawing a complex graph).
I wanted to have per-frame drawing calculations multithreaded.
I wanted to offer cancelling for complex on-buffer drawing routines.
However, being such a beginner, my code soon looked like a mess and refactoring or bug-fixing became so awkward that I decided I need to play more with it before doing anything serious.
So, I'd like to know how to make clean, easy-to-maintain .NET multithreaded code that makes sense when I look at it after waking up the next day. The biggest issue I had was structuring the application so all parts talk to each other in a smart (as opposed to awkward and hacky) way.
Any suggestion is welcome, but I have a preference for sources that I can digest in my free time (e.g., not a 500+ pages treatise on concurrency) and for C#/VB.NET, up to the latest version (since I see there have been advances). Basically I want something straight to the point so I can get started by playing with the concepts on my toy projects.
but I figured out I needed to have parallel drawing routines
Three words: NOT UNDER WINDOWS.
Simple as that. Standard Windows drawing is single-threaded by definition, for compatibility reasons. Any UI control (let's stick to the .NET world) shall ONLY be manipulated from its creating thread (so in reality it is more brutal than single-threaded - it is ONE SPECIFIC THREAD ONLY).
You can do the precalculation separately, but the real drawing has to be done from that one thread.
UNLESS you allocate a bitmap, have your own drawing there, and then turn that over to the UI thread for painting onto the window.
This has nothing to do with the whole Task Parallel Library etc. (which I downvoted) but goes back down to a very old requirement that is kept around for simplicity reasons AND compatibility. This is the reason any UI thread is to be marked as a single-threaded apartment.
Also note that multi-threaded drawing, if you implement it yourself, has serious implications. Which one wins optically (stays in the foreground)? This is not really determinable when drawing from multiple threads. You are free to try it, though.
In this case:
Having your own buffer and synchronization is a must. Stay away from any Windows-level graphics library (WPF or WinForms) except for the last step (drawing your bitmap).
DirectX 11 supposedly has some support for multi-threaded calls, but I am unsure how far that goes.
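A rough WinForms-flavoured sketch of that "render to your own bitmap, then hand it to the UI thread" pattern (the class name, sizes and timing are mine; the double-buffering details are simplified):

```csharp
using System;
using System.Drawing;
using System.Threading;
using System.Windows.Forms;

// Worker thread renders into its OWN bitmap; only the UI thread ever paints it.
public class RenderForm : Form
{
    Bitmap latestFrame;

    public RenderForm()
    {
        DoubleBuffered = true;
    }

    protected override void OnLoad(EventArgs e)
    {
        base.OnLoad(e);
        new Thread(RenderLoop) { IsBackground = true }.Start();
    }

    void RenderLoop()
    {
        while (true)
        {
            var frame = new Bitmap(800, 600);
            using (var g = Graphics.FromImage(frame))
            {
                g.Clear(Color.Black);
                // ...expensive drawing/precalculation here, off the UI thread...
            }

            // Hand the finished bitmap to the ONE thread that is allowed to paint.
            BeginInvoke(new Action(() =>
            {
                var old = latestFrame;
                latestFrame = frame;
                if (old != null) old.Dispose();
                Invalidate();
            }));

            Thread.Sleep(16);   // roughly one new frame every 16 ms
        }
    }

    protected override void OnPaint(PaintEventArgs e)
    {
        if (latestFrame != null)
            e.Graphics.DrawImage(latestFrame, Point.Empty);   // UI thread only
    }

    [STAThread]
    static void Main() { Application.Run(new RenderForm()); }
}
```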
The Task Parallel Library is definitely the place to look for simplifying your code. I've personally written a (semi-long) introduction to Parallelism with .NET 4 that covers quite a few concepts that would be useful.
Be aware, however, that you probably will want to consider leaving your drawing single threaded. You should try to keep the computation multithreaded, and the actual drawing operations done on the GUI thread.
Most drawing APIs require all actual drawing calls to happen on the same synchronization context.
That being said, using the new collection classes like ConcurrentQueue simplify this type of code. Try to think in terms of lots of threads (producers) adding "drawing operations" to a shared, concurrent queue - and one thread (the consumer) grabbing the operations and performing them.
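As a hedged sketch of that shape (queueing Action&lt;Graphics&gt; delegates is just my illustration of a "drawing operation"; it is not the only way to represent one):

```csharp
using System;
using System.Collections.Concurrent;
using System.Drawing;
using System.Threading.Tasks;

public class DrawingQueue
{
    readonly ConcurrentQueue<Action<Graphics>> pending = new ConcurrentQueue<Action<Graphics>>();

    // Producers: any worker thread can compute an operation and enqueue it.
    public void Enqueue(Action<Graphics> drawOperation)
    {
        pending.Enqueue(drawOperation);
    }

    // Consumer: called from the GUI thread (e.g. inside OnPaint); drains the queue
    // and performs the actual drawing on the one thread that owns the surface.
    public void DrainTo(Graphics g)
    {
        Action<Graphics> op;
        while (pending.TryDequeue(out op))
            op(g);
    }
}

// Example producer, run from anywhere:
// Task.Factory.StartNew(() =>
// {
//     Rectangle result = ExpensiveLayoutCalculation();        // hypothetical worker step
//     queue.Enqueue(g => g.DrawRectangle(Pens.Red, result));  // cheap, deferred drawing
// });
```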
This gives you a reasonably scalable, but fairly simple design on which you can build.