I'm aware that the following is a vague question, but I'm hitting performance problems that I did not anticipate in XNA.
I have a low poly model (It has 18 faces and 14 vertices) that I'm trying to draw to the screen a (high!) number of times. I get over 60 FPS (on a decent machine) until I draw this model 5000+ times. Am I asking too much here? I'd very much like to double or triple that number (10-15k) at least.
My code for actually drawing the models is given below. I have tried to eliminate as much computation from the draw cycle as possible, is there more I can squeeze from it, or better alternatives all together?
Note: tile.Offset is computed once during initialisation, not every cycle.
foreach (var tile in Tiles)
{
var myModel = tile.Model;
Matrix[] transforms = new Matrix[myModel.Bones.Count];
myModel.CopyAbsoluteBoneTransformsTo(transforms);
foreach (ModelMesh mesh in myModel.Meshes)
{
foreach (BasicEffect effect in mesh.Effects)
{
// effect.EnableDefaultLighting();
effect.World = transforms[mesh.ParentBone.Index]
* Matrix.CreateTranslation(tile.Offset);
effect.View = CameraManager.ViewMatrix;
effect.Projection = CameraManager.ProjectionMatrix;
}
mesh.Draw();
}
}
You're quite clearly hitting the batch limit. See this presentation and this answer and this answer for details. Put simply: there is a limit to how many draw calls you can submit to the GPU each second.
The batch limit is a CPU-based limit, so you'll probably see that your CPU gets pegged once you get to your 5000+ models. Worse still, when your game is doing other calculations, it will reduce the CPU time available to submit those batches.
(And it's important to note that, conversely, you are almost certainly not hitting GPU limits. No need to worry about mesh complexity yet.)
There are a number of ways to reduce your batch count. Frustrum culling is one. Probably the best one to persue in your case is Geometry Instancing, this lets you draw multiple models in a single batch. Here is an XNA sample that does this.
Better still, if it's static geometry, can you simply bake it all into one or a few big meshes?
As with any performance problem there are limits where a particular approach works. You need to measure and see where problems are. The best option is to use profiler but even basic measurements like looking at CPU load may show what bottlencks you have.
As a first investiagtion step I'd recommend to remove all computations (like matrix multiplications) and see you get improvments - this would mean that CPU is still doing more work than GPU.
Make sure you are not doing measurements on debug build - it could make application significantly slower if it is CPU bound.
Side note: GPU works the best when you send large operations relatively infrequently. Your code does more or less opposite - send huge number of very small drawing requests. You should be able to batch your primitives and get better performance. There are samples around how to render large number of simple objects (including ones in DirectX SDK), searching for "gpu rendering crowds" can give you starting point.
Related
My lack of in depth understanding of the fundamentals has taken a toll on these types of problem solving challenges.
The HackerRank matrix rotation problem is a very fun one to solve. I recommend people who are trying to enrich their coding skills to use hackerrank (https://www.hackerrank.com/challenges/matrix-rotation-algo)
The problem summary is that you are given an R x C matrix of integers where the minimum of R and C must be even. You have to rotate the matrix anti-clockwise x number of times. Rotation applies to the elements of the matrix, not the matrix dimension in case it is not clear.
So I solved this problem with two algorithms. They are both very similar in that you can imagine the matrix like layers of onions where you loop through each layer, and rotate the elements in that layer. The number of rotations is simply x % (count of elements in that layer) so if you are given x=1,000,000 it doesn't make sense to repeat full rotations.
The first one, which is the fastest is:
https://codetidy.com/8002/
The second one, does not loop through the number of rotations but instead does some heavy logic and math to figure out where to move an element to.
https://codetidy.com/8001/
So when I was writing the second one, I assumed that it would be crazy faster, because you don't iterate through maximum number of rotations in each layer. However, it ended up faring slower.
I don't quite understand why. I logged the number of iterations in a console and the first one does 50x more iterations, but is faster.
Number of iterations is not everything. Here are a few general things that might affect the performance.
One important thing to keep in mind with arrays and matrices are cache hits. If your operations generate lots of cache hits they will seem orders of magnitude faster. To get cache hits you usually need to go in the memory order. For an array that is sequentially forward. For a matrix it means incrementing the lowest index first. To get misses you need to jump around in increments larger than the size of the cacheline (CPU dependent). Fun experiment: benchmark for (i...) for (j...) ++m[i][j] and for (i...) for (j..) ++m[j][i] to see the difference.
In your case I would guess that the faster approach has very linear access on the horizontal parts at least.
Then there's branch prediction. Modern CPUs pipeline the instructions to make better use of the existing hardware. Branches (IFs) break the pipeline since you don't know which path to take (that instruction is still executing). As an optimization the compiler/CPU pick one and start processing and if the condition result is the other way it will throw everything away and restart processing. Checking something that usually gives the same result (like i<n) will be faster than something that's harder to predict.
These are some lowlevel reasons why the simpler approach might seem faster. Add some higher level reasons (like compiler not optimizing the code the way you expect) and you get results like this.
An important note: The complexity reflects the asymptotically behavior. Yes, the second approach will be faster for a sufficiently large matrix, and it's very likely that the sizes used for this problem are not sufficiently large.
I am working on writing an application that contains line plots of large datasets.
My current strategy is to load up my data for each channel into 1D vertex buffers.
I then use a vertex shader when drawing to assemble my buffers into vertices (so I can reuse one of my buffers for multiple sets of data)
This is working pretty well, and I can draw a few hundred million data-points, without slowing down too much.
To stretch things a bit further I would like to reduce the number of points that actually get drawn, though simple reduction (I.e. draw every n points) as there is not much point plotting 1000 points that are all represented by a single pixel)
One way I can think of doing this is to use a geometry shader and only emit every N points but I am not sure if this is the best plan of attack.
Would this be the recommended way of doing this?
You can do this much simpler by adjusting the stride of all vertex attributes to N times the normal one.
I'm looking for a data structure similar to T[,,] (a 3D array) except I don't know the dimensions beforehand (nor have a reasonable upper bound) that will expand outwards as time goes on. I'd like to use negative indexes as well.
The only thing that comes to mind is a dictionary, with some kind of Point3 struct as the key. Are there any other alternatives? I'd like to lookups to be as fast as possible. The data will always be clustered around 0,0,0. It can expand in any direction, but there will never be any "gaps" between points.
I think I'm going to go ahead and just use a Dictionary<Point3, T> for now, and see how that performs. If it's a performance issue I'll try building a wrapper around T[,,] such that I can use negative indexes.
Obviously you'll need to store this in a data structure resembling a sparse array, because you have no idea how large your data-set is going to be. So a Dictionary seems reasonable.
I'm going a little crazy here, but I think your indices should be in Spherical Coordinates. It makes sense to me as your data grows outwards. It will also make finding elements at a specific range from (0, 0, 0) extremely easy.
If you may need range queries, KD-Trees comes to mind. They are tree-like structures, at each level seperate the universe into two along one axis. They offer O(logN) lookup time (for constant number of dimensions) which may or may not be fast enough, but they also provide O(logN + S) time for range queries where S is the size of the items found, which usually is very good. They can handle dynamic data (insertions and deletions along with lookups) but the tree may become unbalanced as a result. Also you can do Nearest Neighboor search from a given point, (i.e. get the 10 nearest objects to point (7,8,9)) Wikipedia, as always, is a good starting point: http://en.wikipedia.org/wiki/Kd-tree
If there are huge numbers of things in the world, of if the world is very dynamic (things move, be created/destroyed all the time), kd-trees may not be good enough. If most of the time you will only ask "give me the thing at (7,8,9)" you can either use a hash as you mentioned in your question or something like a List<List<List<T>>>. I'd just implement whichever is easier within an interface and worry about the performance later.
I am kind of assuming you need the dynamic aspect because array could be huge. In that case, what you could try is to allocate your array as a set of 3d 'tiles'. At the top level you have a 3d data structure that stores pointers to your tiles. You expand and allocate this as you go along.
Each individual tile could contain, say, 32x32x32 voxels. Or whatever amount suits your goals.
Looking up you tile is done by dividing your coordinate index by 32 (by bitshifting, of course), and the index in the tile is calculated by masking away the upper bits.
A lookup like this is fairly cheap, possible on par with a .net Dictionary, but it will use less memory, which is good for performance too.
The resulting array will be chunky, though: the array boundaries are multiples of your tile size.
Array access is a very fast linear lookup - if speed of lookup is your priority, then it's probably the way to go, depending on how often you will need to modify your arrays.
If your goal is to maintain the chunks around a player, you might want to arrange a "current world" array structure around the player, such that it is a 3-dimensional array with the centre chunk at 9,9,9 with a size of [20,20,20]. Each time the player leaves the centre chunk for another, you re-index the array to drop the old chunks and move on.
Ultimately, you're asking for options on how to optimize your game engine, but it's nearly impossible to say which is going to be correct for you. Even though games tend to be more optimized for performance than other applications, don't get lured into micro-optimizations; optimize for readability first and then optimize for performance when you find it necessary.
If you suspect this particular data-structure is going to be a bottleneck in your engine, put some performance tracing in so it's easy to be sure one way or the other once you've got the engine running.
I am working on a project where the game world is irregularly shaped (Think of the shape of a lake). this shape has a grid with coordinates placed over it. The game world is only on the inside of the shape. (Once again, think Lake)
How can I efficiently represent the game world? I know that many worlds are basically square, and work well in a 2 or 3 dimension array. I feel like if I use an array that is square, then I am basically wasting space, and increasing the amount of time that I need to iterate through the array. However, I am not sure how a jagged array would work here either.
Example shape of gameworld
X
XX
XX X XX
XXX XXX
XXXXXXX
XXXXXXXX
XXXXX XX
XX X
X
Edit:
The game world will most likely need each valid location stepped through. So I would a method that makes it easy to do so.
There's computational overhead and complexity associated with sparse representations, so unless the bounding area is much larger than your actual world, it's probably most efficient to simply accept the 'wasted' space. You're essentially trading off additional memory usage for faster access to world contents. More importantly, the 'wasted-space' implementation is easier to understand and maintain, which is always preferable until the point where a more complex implementation is required. If you don't have good evidence that it's required, then it's much better to keep it simple.
You could use a quadtree to minimize the amount of wasted space in your representation. Quad trees are good for partitioning 2-dimensional space with varying granularity - in your case, the finest granularity is a game square. If you had a whole 20x20 area without any game squares, the quad tree representation would allow you to use only one node to represent that whole area, instead of 400 as in the array representation.
Use whatever structure you've come up with---you can always change it later. If you're comfortable with using an array, use it. Stop worrying about the data structure you're going to use and start coding.
As you code, build abstractions away from this underlying array, like wrapping it in a semantic model; then, if you realize (through profiling) that it's waste of space or slow for the operations you need, you can swap it out without causing problems. Don't try to optimize until you know what you need.
Use a data structure like a list or map, and only insert the valid game world coordinates. That way the only thing you are saving are valid locations, and you don't waste memory saving the non-game world locations since you can deduce those from lack of presence in your data structure.
The easiest thing is to just use the array, and just mark the non-gamespace positions with some special marker. A jagged array might work too, but I don't use those much.
You could present the world as an (undirected) graph of land (or water) patches. Each patch then has a regular form and the world is the combination of these patches. Every patch is a node in the graph and has has graph edges to all its neighbours.
That is probably also the most natural representation of any general world (but it might not be the most efficient one). From an efficiency point of view, it will probably beat an array or list for a highly irregular map but not for one that fits well into a rectangle (or other regular shape) with few deviations.
An example of a highly irregular map:
x
x x
x x x
x x
x xxx
x
x
x
x
There’s virtually no way this can be efficiently fitted (both in space ratio and access time) into a regular shape. The following, on the other hand, fits very well into a regular shape by applying basic geometric transformations (it’s a parallelogram with small bits missing):
xxxxxx x
xxxxxxxxx
xxxxxxxxx
xx xxxx
One other option that could allow you to still access game world locations in O(1) time and not waste too much space would be a hashtable, where the keys would be the coordinates.
Another way would be to store an edge list - a line vector along each straight edge. Easy to check for inclusion this way and a quad tree or even a simple location hash on each vertice can speed lookup of info. We did this with a height component per edge to model the walls of a baseball stadium and it worked beautifully.
There is a big issue that nobody here addressed: the huge difference between storing it on disk and storing it in memory.
Assuming you are talking about a game world as you said, this means it's going to be very large. You're not going to store the whole thing in memory in once, but instead you will store the immediate vicinity in memory and update it as the player walks around.
This vicinity area should be as simple, easy and quick to access as possible. It should definitely be an array (or a set of arrays which are swapped out as the player moves). It will be referenced often and by many subsystems of your game engine: graphics and physics will handle loading the models, drawing them, keeping the player on top of the terrain, collisions, etc.; sound will need to know what ground type the player is currently standing on, to play the appropriate footstep sound; and so on. Rather than broadcast and duplicate this data among all the subsystems, if you just keep it in global arrays they can access it at will and at 100% speed and efficiency. This can really simplify things (but be aware of the consequences of global variables!).
However, on disk you definitely want to compress it. Some of the given answers provide good suggestions; you can serialize a data structure such as a hash table, or a list of only filled-in locations. You could certainly store an octree as well. In any case, you don't want to store blank locations on disk; according to your statistic, that would mean 66% of the space is wasted. Sure there is a time to forget about optimization and make it Just Work, but you don't want to distribute a 66%-empty file to end users. Also keep in mind that disks are not perfect random-access machines (except for SSDs); mechanical hard drives should still be around another several years at least, and they work best sequentially. See if you can organize your data structure so that the read operations are sequential, as you stream more vicinity terrain while the player moves, and you'll probably find it to be a noticeable difference. Don't take my word for it though, I haven't actually tested this sort of thing, it just makes sense right?
I'm writing a game in which a character moves around on a randomly generated map in real time (as it's revealed.) This leads me an interesting data structures problem. The map is generated as it comes into view, in a circle around the character (probably 20-60 tiles) so where there is data, it is very dense, and all in a grid. Where there's not data, though, there could be huge, ungenerated spaces. The character could walk in a huge circle, for example, creating a ring of tiles around vast empty space.
A simple matrix would create massive amounts of unneeded overhead, and waste a lot of space. Typical BSPs, though, seem like they would cause a huge performance drop because of the dense, grid-like nature of the data.
What do you suggest? Matrixes - quadtrees - some hybrid of the two?
I am thinking of implementing something similar in my game that I am working on. I'm going to create a custom class that can be accessed like a 2D array ex. map[x][y] but the underlying datatype would be closer to a hashtable. Something like data[x.Value.ToString() + "," + y.Value.ToString()]
My game is fairly basic as my tiles will only ever be walkable, deadly, or unwalkable.
I'm interested in a more elegant solution though :D
I've been tackling this for the past month, and have come up with what I believe is a fairly good solution. It's not as fast as a pure matrix, but has the benefit of being infinitely extensible (within the limits of an int.)
Basically, it's a binary space partition which builds upwards (instead of downwards, like a quadtree.) If you write to a point outside of the currently allocated matrix space, it generates a larger node and expands. If a majority of a node's children nodes are allocated matrices, it will aggregate them into itself and remove their references. This means that the more well-defined boundaries you use, the better performance you get, as the more this structure acts like a matrix.
I've posted my code here, and will try and write some sort of demo in the future, and move to a better hosting site.
Don't hesitate to let me know what you think, or if you have any questions about it.