XNA, Vector math and the GPU

XNA, Vector math and the GPU - c#

I am looking into making a game for Windows Phone and Windows 8 RT. The first iteration of the game will use XNA for the UI.
But since I plan to have other iterations that may not use XNA, I am writing my core game logic in a Portable Class Library.
I have gotten to the part where I am calculating vector math (sprite locations) in the core game logic.
As I was figuring this out, I had a co-worker tell me that I should make sure that I am doing these calculations on the GPU (not the CPU).
So, here is the question, if I use XNA vector libraries to do my vector calculations, are they automatically done on the GPU?
Side Question: If not, should they be done on the GPU? Or is it OK for me to do them in my Portable Class Library and have the CPU run them?
Note: If I need to have XNA do them so that I can use the GPU then it is not hard to inject that functionality into my core logic from XNA. I just want to know if it is something I should really be doing.
Note II: My game is a 2D game. It will be calculating movement of bad guys and projectiles along a vector. (Meaning this is not a huge 3D Game.)

I think your co-worker is mistaken. Here are just two of the reasons that doing this kind of calculation on the GPU doesn't make sense:
The #1 reason, by a very large margin, is that it's not cheap to get data onto the GPU in the first place. And then it's extremely expensive to get data back from the GPU.
The #2 reason is that the GPU is good for doing parallel calculations - that is - it does the same operation on a large amount of data. The kind of vector operations you will be doing are many different operations, on a small-to-medium amount of data.
So you'd get a huge win if - say - you were doing a particle system on the GPU. It's a large amount of homogeneous data, you perform the same operation on each particle, and all the data can live on the GPU.
Even XNA's built-in SpriteBatch does most of its per-sprite work on the CPU (everything but the final, overall matrix transformation). While it could do per-sprite transforms on the GPU (and I think it used to in XNA 3), it doesn't. This allows it to reduce the amount of data it needs to send the GPU (a performance win), and makes it more flexible - as it leaves the vertex shader free for your own use.
These are great reasons to use the CPU. I'd say if it's good enough for the XNA team - it's good enough for you :)
Now, what I think your co-worker may have meant - rather than the GPU - was to do the vector maths using SIMD instructions (on the CPU).
These give you a performance win. For example - adding a vector usually requires you to add the X component, and then the Y component. Using SIMD allows the CPU to add both components at the same time.
Sadly Microsoft's .NET runtime does not currently make (this kind of) use of SIMD instructions. It's supported in Mono, though.

So, here is the question, if I use XNA vector libraries to do my vector calculations, are they automatically done on the GPU?
Looking inside the Vector class in XNA using ILSpy reveals that the XNA Vector libraries do not use the graphics card for vector math.

Related

Unity C# Voxel finite water optimization

I got a (basic) voxel engine running and a water system that looks (and I assume basically works) like this: https://www.youtube.com/watch?v=Q_TdeGIOOts (not my game).
The water values are stored in a 3d Array of floats, and every 0.05s it calculates water flow by checking the voxel below and adjacent (y-1, x-1, x+1, z-1, z+1) and adds the value.
This system works fine (70+ fps) for small amounts of water, but when I start calculating water on 8+ chunks, it gets too much.
(I disabled all rendering or mesh creation to check if that is the bottleneck, it isnt. Its purely the flow calculations).
I am not a very experienced programmer so I wouldnt know where to start optimizing, apart from making the calculations happen in a coroutine as I already did.
In this post: https://gamedev.stackexchange.com/questions/55414/how-to-define-areas-filled-with-water (near the bottom) Boreal suggests running it in a compute shader. Is this the way to go for me? And how would I go about such a thing?
Any help is much appreciated.

If you're really calculating a voxel based simulation, you will be expanding the number of calculations geometrically as your size increases, so you will quickly run out of processing power on larger volumes.
A compute shader is great for doing massively parallel calculations quickly, although it's a very different programming paradigm that takes some getting used to. A compute shader will look at the contents of a buffer (ie, a 'texture' for us civilians) and do things to it very quickly -- in your case the buffer will probably be a buffer/texture whose pixel values represent water cells. If you want to do something really simple like increment them up or down the compute shader uses the parallel processing power of the GPU to do it really fast.
The hard part is that GPUs are optimized for parallel processing. This means that you can't write code like "texelA.value += texelB.value" - without extra work on your part, each fragment of the buffer is processed with zero knowledge of what happens in the other fragments. To reference other texels you need to read the texture again somehow - some techniques read one texture multiple times with offsets (this GL example does this to implement blurs, others do it by repeatedly processing a texture, putting the result into a temporary texture and then reprocessing that.
At the 10,000 foot level: yes, a compute shader is a good tool for this kind of problem since it involves tons of self-similar calculation. But, it won't be easy to do off the bat. If you have not done conventional shader programming before, You may want to look at that first to get used to the way GPUs work. Even really basic tools (if-then-else or loops) have very different performance implications and uses in GPU programming and it takes some time to get your head around the differences. As of this writing (1/10/13) it looks like Nvidia and Udacity are offering an intro to compute shader course which might be a good way to get up to speed.
FWIW you also need pretty modern hardware for compute shaders, which may limit your audience.

XNA SpriteSortMode Speed and Shaders (4.0)

Evening/afternoon;
I'm getting to grips with writing a little custom 2D XNA engine for my own personal use. When deciding on how to draw the 2D sprites, I'm stuck in a quandry:
Firstly, I'd like, at some point, to implement some custom shader effects. Every tutorial I read on the internet said that I therefore am forced to use SpriteSortMode.Immediate, except one, which said that in XNA 4.0 that is no longer necessary.
Furthermore, I am unsure about which SpriteSortMode is fastest for my approach, regardless of shading effects. Ordering/layering of different sprites is definitely a necessity (i.e. to have a HUD in front of the game sprites, and the game sprites in front of a backdrop etc.); but would it be faster to implement a custom sorted list and just call the Draw()s in order, or use the BackToFront / FrontToBack options?
Thank you in advance.

Starting with XNA 4.0 you can use custom shader effects with any sprite sort mode. The Immediate sprite sort mode in XNA 3.1 was pretty broken. (see http://blogs.msdn.com/b/shawnhar/archive/2010/04/05/spritesortmode-immediate-in-xna-game-studio-4-0.aspx)
Concerning sorting I would say to sort them back to front for transparent sprites and front to back for opaque ones. See: http://msdn.microsoft.com/en-us/library/microsoft.xna.framework.graphics.spritesortmode.aspx
There is one extra interesting mode there (for a 2D engine at least), which is the Texture sorting mode where it sorts the draw calls by which texture is needed and thus reduces state change. That could be a big performance win for the main game sprites.
And I wouldn't worry too much about performance until your profiler says otherwise. SpriteBatch does quite a lot of batching (yes, really) and that will be the biggest performance improvement because it minimizes the number of state changes.
The only other way I can think of to improve performance is to use instancing, but I think that with XNA it might be a bad idea (ideally you'll want SM3.0 hardware and a custom vertex shader at least; I'm not sure how that plays with the SpriteBatch class).

Simulating fluid flow over a heightmap

I am looking for a way to approximate a volume of fluid moving over a heightmap. The easiest solution I can think of is to approximate it as a large number of non-drawn spheres, of small diameter (<0.1m). I would then place a visible plane representing the surface of the water on "top" of the spheres, at the locations they came to rest. To my knowledge, no managed physics engines contain a built in fluid simulator, hence the question.
Implementation would consist of using a physics engine such as JigLibX, which is capable of simulating the motion of the spheres. To determine the height of the planes, I was thinking of averaging the maximum height of each sphere that is on the top layer of a grouping.
I dont expect performance to be great, but would it be approachable for real time? If not, could I use this simulation to pre-bake lines of flow?
I hope this makes sense, I really want opinions/suggestions as to whether this is feasible, or if there is a better way of approaching this.
Thanks for any help, Venatu
(If its relevant, my target platform is XNA 4.0, using C#. Windows only at this point in time, so PhysX/Havok are possibilities for the simulation, but I would prefer a managed solution)

I haven't seen realistic fluid dynamics in real time without using something like PhysX as of yet - probably because the calculations needed are so complicated! The problem with your approach as I see it would come with the resting contact of all those spheres as they settled down, which takes up a lot of processing power. Lots of resting contact points are notorious for eating into performance very quickly, even on the most powerful of desktops.
If you are going down this route then I'd recommend modelling the fluid as an elastic but solid body using spring based physics, where the force applied to one part of the water would use springs to propagate out to the rest. This gives you the option of setting a breaking point for the springs and separating the body into two or more bodies when that happens (and the reverse for coming back together.) This can give you the foundation for things like spray. It's also a more versatile approach in terms of performance, because you can choose the number of particles and springs you use to approximate your model.
It's a big and complicated topic, but I hope that provided at least some insight!

The most popular method to simulate fluids in real-time is Smoothed-particle hydrodynamics.
Several useful links:
http://en.wikipedia.org/wiki/Smoothed-particle_hydrodynamics
http://http.developer.nvidia.com/GPUGems/gpugems_ch38.html
http://www.plunk.org/~trina/thesis/html/thesis_toc.html
In addition to simulation itself you will also need some specialized broad-phase collision detection algorithms such as sweep-and-prune or hashing cells.
And you're right, there is no completed 2d solutions for the fluid dynamics.

What technologies to use for a particle system with enormous calculation demand?

I have a particle system with X particles.
Each particle tests for collision with other particles. This gives X*X = X^2 collision tests per frame. For 60f/s, this corresponds to 60*X^2 collision detection per second.
What is the best technological approach for these intensive calculations? Should I use F#, C, C++ or C#, or something else?
The following are constraints
The code is written in C# with the latest XNA
Multi-threaded may be considered
No special algorithm that tests the collision with the nearest neighbors or that reduces the problem
The last constraint may be strange, so let me explain.
Regardless constraint 3, given a problem with enormous computational requirement what would be the best approach to solve the problem.
An algorithm reduces the problem; still the same algorithm may behave different depending on technology. Consider pros and cons of CLR vs native C.

The simple answer is "measure it". But take a look at this graph (that I borrowed from this question - which is worth your reading).
C++ is maybe 10% faster than MS's C# implementation (for this particular calculation) and faster still against Mono's C# implementation. But in real world terms, C++ is not all that much faster than C#.
If you're doing hard-core number crunching, you will want to use the SIMD/SSE unit of your CPU. This is something that C# does not normally support - but Mono is adding support for through Mono.Simd. You can see from the graph that using the SIMD unit gives a significant performance boost to both languages.
(It's worth noting that while C++ is still "faster" than C#, the choice of language has only a small effect on performance, compared to the choice of what hardware to use. As was mentioned in the comments - your choice of algorithm will have by far the greatest effect.)
And finally, as Jerry Coffin mentioned in his answer, that you could also do the processing on the GPU. I imagine that it would be even faster than SIMD (but as I said - measure it). Using the GPU has the added beneift of leaving the CPU free to do other tasks. The downside is that your end-users will need a reasonable GPU.

You should probably consider doing this on the GPU using something like CUDA, OpenCL, or a DirectX compute shader.

Sweep and prune is a broad phase collision detection algorithm that may be well suited to this task. If you can make use of temporal coherency, that being from frame to frame the location differences are generally small a reduction in processing may be obtained. A good book on the subject is "real time collision detection".

For a simple speed up you could sort by one axis first and loop through checking for a collision in that axis before doing a full check... For each particle you only need to look further in the array until you find one that doesn't collide in that axis then you can move to the next one.

Vector Transformation with Matrix

I'm working on a code to do a software skinner (bone/skin animation), and I'm at the "optimization" phase (the skinner works pretty well and skin a 4900 triangles mesh with 22 bones in 1.09 ms on a Core Duo 2 Ghz (notebook)). What I need to know is :
1) Can someone show me the way (maybe with pseudocode) to transform a float3 (array of 3 float) (representing a coordinate) against a float4x3 matrix?
2) Can someone show me the way (maybe with pseudocode) to transform a float3 (array of 3 float) (representing a normal) against a float3x3 matrix?
I ask this as i know that in the skinning process you can avoid to use part of the matrix without getting any change in the animation process. (So to recover some elaboration time)
Thanks!

Optimizing vector/matrix operations via mathematical reduction is possible, but tricky. You can find some information on the topic here, here, and here.
Now, this may not be quite what you're looking for, but...
You can use the machine GPU (graphics card processor) to vastly increase the computation performance of vector/matrix operations. Many operations can be increased by several orders of magnitude by taking advantage of SIMD processing available on the GPU.
There are two reasonably good libraries available for C# developers for GPGPU programming:
Microsoft's Accelerator Library, documentation available here.
Brahma - an open source GPU library for C# developers that leverages LINQ.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.