XNA SpriteSortMode Speed and Shaders (4.0)

XNA SpriteSortMode Speed and Shaders (4.0) - c#

Evening/afternoon;
I'm getting to grips with writing a little custom 2D XNA engine for my own personal use. When deciding on how to draw the 2D sprites, I'm stuck in a quandry:
Firstly, I'd like, at some point, to implement some custom shader effects. Every tutorial I read on the internet said that I therefore am forced to use SpriteSortMode.Immediate, except one, which said that in XNA 4.0 that is no longer necessary.
Furthermore, I am unsure about which SpriteSortMode is fastest for my approach, regardless of shading effects. Ordering/layering of different sprites is definitely a necessity (i.e. to have a HUD in front of the game sprites, and the game sprites in front of a backdrop etc.); but would it be faster to implement a custom sorted list and just call the Draw()s in order, or use the BackToFront / FrontToBack options?
Thank you in advance.

Starting with XNA 4.0 you can use custom shader effects with any sprite sort mode. The Immediate sprite sort mode in XNA 3.1 was pretty broken. (see http://blogs.msdn.com/b/shawnhar/archive/2010/04/05/spritesortmode-immediate-in-xna-game-studio-4-0.aspx)
Concerning sorting I would say to sort them back to front for transparent sprites and front to back for opaque ones. See: http://msdn.microsoft.com/en-us/library/microsoft.xna.framework.graphics.spritesortmode.aspx
There is one extra interesting mode there (for a 2D engine at least), which is the Texture sorting mode where it sorts the draw calls by which texture is needed and thus reduces state change. That could be a big performance win for the main game sprites.
And I wouldn't worry too much about performance until your profiler says otherwise. SpriteBatch does quite a lot of batching (yes, really) and that will be the biggest performance improvement because it minimizes the number of state changes.
The only other way I can think of to improve performance is to use instancing, but I think that with XNA it might be a bad idea (ideally you'll want SM3.0 hardware and a custom vertex shader at least; I'm not sure how that plays with the SpriteBatch class).

Related

Does drawing outside of screen bounds affect performance

In my 2D game I have large map, and I scroll around it (like in Age of Empires).
On draw I draw all the elements (which are textures/Images). Some of them are on the screen, but most of them are not (because you see about 10% of the map).
Does XNA know not to draw them by checking that the destination rectangle won't fall on the screen?
Or should I manually check it and avoid drawing them at all?

A sometimes overlooked concept of CPU performance while using SpriteBatch is how you go about batching your sprites. And using drawable game component doesn't lend itself easily to efficiently batching sprites.
Basically, The GPU drawing the sprites is not slow & the CPU organizing all the sprites into a batch is not slow. The slow part is when the CPU has to communicate to the GPU what it needs to draw (sending it the batch info). This CPU to GPU communication does not happen when you call spriteBatch.Draw(). It happens when you call spriteBatch.End(). So the low hanging fruit to efficiency is calling spriteBatch.End() less often. (of course this means calling Begin() less often too). Also, use spriteSortMode.Immediate very sparingly because it immediately causes the CPU to send each sprites info to the GPU (slow)).
So if you call Begin() & End() in each game component class, and have many components, you are costing yourself a lot of time unnecessarily and you will probably save more time coming up with a better batching scheme than worrying about offscreen sprites.
Aside: The GPU automatically ignores offscreen sprites from its pixel shader anyway. So culling offscreen sprites on the CPU won't save GPU time.
Reference here.

You must manually account for this, and it will largely effect the performance of the game as it grows, this is otherwise known as culling. Culling is not done just because drawing stuff off screen reduces performance, it is because calling Draw that many extra times is slow. Anything you don't need to update that is out of the viewport should be excluded too. You can see more about how you can do this and how SpriteBatch handles this here.

XNA, Vector math and the GPU

I am looking into making a game for Windows Phone and Windows 8 RT. The first iteration of the game will use XNA for the UI.
But since I plan to have other iterations that may not use XNA, I am writing my core game logic in a Portable Class Library.
I have gotten to the part where I am calculating vector math (sprite locations) in the core game logic.
As I was figuring this out, I had a co-worker tell me that I should make sure that I am doing these calculations on the GPU (not the CPU).
So, here is the question, if I use XNA vector libraries to do my vector calculations, are they automatically done on the GPU?
Side Question: If not, should they be done on the GPU? Or is it OK for me to do them in my Portable Class Library and have the CPU run them?
Note: If I need to have XNA do them so that I can use the GPU then it is not hard to inject that functionality into my core logic from XNA. I just want to know if it is something I should really be doing.
Note II: My game is a 2D game. It will be calculating movement of bad guys and projectiles along a vector. (Meaning this is not a huge 3D Game.)

I think your co-worker is mistaken. Here are just two of the reasons that doing this kind of calculation on the GPU doesn't make sense:
The #1 reason, by a very large margin, is that it's not cheap to get data onto the GPU in the first place. And then it's extremely expensive to get data back from the GPU.
The #2 reason is that the GPU is good for doing parallel calculations - that is - it does the same operation on a large amount of data. The kind of vector operations you will be doing are many different operations, on a small-to-medium amount of data.
So you'd get a huge win if - say - you were doing a particle system on the GPU. It's a large amount of homogeneous data, you perform the same operation on each particle, and all the data can live on the GPU.
Even XNA's built-in SpriteBatch does most of its per-sprite work on the CPU (everything but the final, overall matrix transformation). While it could do per-sprite transforms on the GPU (and I think it used to in XNA 3), it doesn't. This allows it to reduce the amount of data it needs to send the GPU (a performance win), and makes it more flexible - as it leaves the vertex shader free for your own use.
These are great reasons to use the CPU. I'd say if it's good enough for the XNA team - it's good enough for you :)
Now, what I think your co-worker may have meant - rather than the GPU - was to do the vector maths using SIMD instructions (on the CPU).
These give you a performance win. For example - adding a vector usually requires you to add the X component, and then the Y component. Using SIMD allows the CPU to add both components at the same time.
Sadly Microsoft's .NET runtime does not currently make (this kind of) use of SIMD instructions. It's supported in Mono, though.

So, here is the question, if I use XNA vector libraries to do my vector calculations, are they automatically done on the GPU?
Looking inside the Vector class in XNA using ILSpy reveals that the XNA Vector libraries do not use the graphics card for vector math.

Simulating fluid flow over a heightmap

I am looking for a way to approximate a volume of fluid moving over a heightmap. The easiest solution I can think of is to approximate it as a large number of non-drawn spheres, of small diameter (<0.1m). I would then place a visible plane representing the surface of the water on "top" of the spheres, at the locations they came to rest. To my knowledge, no managed physics engines contain a built in fluid simulator, hence the question.
Implementation would consist of using a physics engine such as JigLibX, which is capable of simulating the motion of the spheres. To determine the height of the planes, I was thinking of averaging the maximum height of each sphere that is on the top layer of a grouping.
I dont expect performance to be great, but would it be approachable for real time? If not, could I use this simulation to pre-bake lines of flow?
I hope this makes sense, I really want opinions/suggestions as to whether this is feasible, or if there is a better way of approaching this.
Thanks for any help, Venatu
(If its relevant, my target platform is XNA 4.0, using C#. Windows only at this point in time, so PhysX/Havok are possibilities for the simulation, but I would prefer a managed solution)

I haven't seen realistic fluid dynamics in real time without using something like PhysX as of yet - probably because the calculations needed are so complicated! The problem with your approach as I see it would come with the resting contact of all those spheres as they settled down, which takes up a lot of processing power. Lots of resting contact points are notorious for eating into performance very quickly, even on the most powerful of desktops.
If you are going down this route then I'd recommend modelling the fluid as an elastic but solid body using spring based physics, where the force applied to one part of the water would use springs to propagate out to the rest. This gives you the option of setting a breaking point for the springs and separating the body into two or more bodies when that happens (and the reverse for coming back together.) This can give you the foundation for things like spray. It's also a more versatile approach in terms of performance, because you can choose the number of particles and springs you use to approximate your model.
It's a big and complicated topic, but I hope that provided at least some insight!

The most popular method to simulate fluids in real-time is Smoothed-particle hydrodynamics.
Several useful links:
http://en.wikipedia.org/wiki/Smoothed-particle_hydrodynamics
http://http.developer.nvidia.com/GPUGems/gpugems_ch38.html
http://www.plunk.org/~trina/thesis/html/thesis_toc.html
In addition to simulation itself you will also need some specialized broad-phase collision detection algorithms such as sweep-and-prune or hashing cells.
And you're right, there is no completed 2d solutions for the fluid dynamics.

Optimization of a GC language, any ideas?

I'm a pretty big newbie when it comes to optimization. In the current game I'm working on I've managed to optimize a function and shave about 0.5% of its CPU load and that's about as 'awesome' as I've been.
My situation is as follows: I've developed a physics heavy game in MonoTouch using an XNA wrapper library called ExEn and try as I might I've found it very hard to get the game to reach a playable framerate on an iPhone4 (don't even want to think about iPhone3GS at this point).
The performance degradation is almost certainly in the physics calculations, if I turn physics off the framerate jumps up sharply, if I disable everything, rendering, input, audio and just leave physics on performance hovers around 15fps during physics intensive situations.
I used Instruments to profile the performance and this is what I got: http://i.imgur.com/FX25h.png The functions which drain the most performance are either from the physics engine (Farseer) or the ExEn XNA wrapper functions they call (notably Vector2.Max, Vector2.Min).
I looked into those functions and I know wherever it can Farseer is passing values by reference into those functions rather than by value so that's that covered (and it's literally the only way I can think of. The functions are very simple themselves basically amounting to such operations as
return new Vector2(Max(v1.x, v2.x), Max(v1.y, v2.y))
Basically I feel like I'm stuck and in my limited capacity and understanding of code optimizations I'm not sure what my options are or if I even have any options (maybe I should just curl into a fetal position and cry?). With LLVM turned on and built in release I'm getting maybe 15fps at best. I did manage to bring the game up to 30fps by lowering the physics precision but this makes many levels simply unplayable as bodies intersect one another and collapse in on themselves.
So my question is, is this a lost cause or is there anything I can do to beef up performance?

First of all, love your game on Windows Phone 7!
Secondly, I don't see anything out of the ordinary in your profiler output. I did a quick and dirty performance analysis of the Farseer engine once (running in .net) and came up with similar results. It almost looks like you have a slowdown that is proportional across the board and may be due to mono itself.

I suppose you follow the performance hints in http://farseerphysics.codeplex.com/documentation already :-)
The most important thing seems to be
to reduce complexity for the
collision detection calculations,
i.e. not the visual but the colliding
shapes. In Unijty3D they are called
colliders and you can attach a simple
cube as a collider to a complex human
body. I don't know anything about
Fareer but they probably have similar
concept (is it called body?).
If possible, try to replace your main
character or other complex objects by
easy cubes and check if fps raises.
Compiler switches sometimes leverage performance. Be really sure that there are no debug settings activated (I got up to 30 times slower code in a C++ library project). Ensure that armv7 optimisation is turned on and -O3 or -Os
Watch out for logging statements as they are extremely expensive on iPhone
[Update:]
Try to decrease the number of actively calculated AABBs just to find out which part of the physics engine causes the trouble. If it's the pure number follow FFox' advice.
What is about other platforms? Where did you perform the testing during the development phase, on simulator? Which one? Any chance to get it running on Android or Android simulator or Windows Phone? This would give you a hint if it is an iPhone specific problem.
Ah, I just saw that ExEn still is in pre-release state and the final will be launched on July 21th as OS. IMO this changes the situation: If your App is running fine on some other comparable platform, then just wait for the release and give it a new try. Chances are pretty well that there is still debugging code in the pre-release you are working on.

Collisions in a real world application

Here's my problem. I'm creating a game and I'm wondering about how to do the collisions. I have several case to analyze and to find the best solution for.
I'll say it beforehand, I'm not using any third party physics library, but I'm gonna do it in house. (as this is an educational project, I don't have schedules and I want to learn)
I have 2 types of mesh for which I have to make the collisions for :
1) Static Meshes (that move around the screen, but does not have ANY animation)
2) Skinned/Boned Meshes (animated)
Actually I have this solution (quite hacky :|)
First of all I have a test against some bounding volume that enclose the full mesh (capsule in my case), after :
1) For the static meshes I divide them manually in blocks (on the modeler) and for each of these blocks i use a sphere/AABB test. (works fine, but its a little messy to slice every mesh :P) (i tried an automatic system to divide the mesh through planes, but it gives bad results :()
2) For the animated mesh ATM i'm dividing the mesh at runtime into x blocks (where x is the number of bones). Each block contain the vertex for which that bone is the major influencer. (Sometimes works, sometimes gives really bad results. :|)
Please note that the divide of the mesh is done at loading time and not each time (otherwise it would run like a slideshow :D)
And here's the question :
What is the most sane idea to use for those 2 case?
Any material for me to study these methods? (with some sourcecode and explanations would be even better (language is not important, when i understand the algorithm, the implementation is easy))
Can you argument why that solution is better than others?
I heard a lot of talk about kd-tree, octree, etc..while I understand their structure I miss their utility in a collision detection scenario.
Thanks a lot for the answers!!!
EDIT : Trying to find a K-Dop example with some explanation on the net. Still haven't found anything. :( Any clues?
I'm interested on HOW the K-Dop can be efficiently tested with other type of bounding volumes etc...but the documentation on the net seems highly lacking. :(

Prior to doing complex collision detection you should perform basic detection.
Using spheres or rectangles as bounding volumes is your best bet. Then if this detects a collision, move onto your more complex methods.
What I'm getting at is simple is often better, and quicker. Wrapping bounding volumes and splitting meshes up is costly, not to mention complex. You seem to be on the right track though.
As with game programming there are multiple ways of collision detection. My advice would be start simple. Take a cube and perfect your routines on that, then in theory you should be able to use any other model. As for examples I'd check gamedev.net as they have some nice articles. Much or my home made collision detection is a combination of many methods, so I can't really recommended the definitive resource.

The most common approaches used in many current AAA games is "k-DOP" simplified collision for StaticMeshes, and a simplified physical body representation for the SkeletalMeshes.
If you google for "kDOP collision" or "discrete orientation polytopes" you should find enough references. This is basicly a bounding volume defined of several planes that are moved from outside towards the mesh, until a triangle collision occurs. The "k" in kDOP defines how many of these planes are used, and depending on your geometry and your "k" you can get really good approximations.
For SkeletalMeshes the most common technique is to define simple geometry that is attached to specific bones. This geometry might be a box or a sphere. This collision-model than can be used for quite accurate collision detection of animated meshes.
If you need per-triangle collision, the "Separating Axis Theorem" is the google-search term of your choice. This is usefull for specific cases, but 75% of your collision-detection needs should be covered with the above mentioned methods.
Keep in mind, that you most probably will need a higher level of early collision rejection than a bounding-volume. As soon as you have a lot of objects in the world, you will need to use a "spatial partitioning" to reject groups of objects from further testing as early as possible.

The answering question comes down to how precise do you need?
Clearly, sphere bounding boxes are the most trivial. On the other side of the scale, you have a full triangle mesh-mesh collision detection, which has to happen each time an object moves.
Game development physics engine rely on the art of the approximation(I lurked in GameDev.net's math and physics forums years ago).
My opinion is that you will need some sort of bounding ellipsoid associated with each object. An object can be a general multimesh object, a mesh, or a submesh mesh. This should provide a 'decent' amount of approximation.

Pick up Christer Ericson's book, Real-Time Collision Detection. He discusses these very issues in great detail.
When reading articles, remember that in a real-world game application you will be working under hard limits of memory and time - you get 16.6ms per frame and that's it! So be cautious of any article or paper that doesn't seriously discuss the memory and CPU footprint of its algorithm.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.