I'm writing a game in which a character moves around on a randomly generated map in real time (as it's revealed). This leads me to an interesting data-structures problem. The map is generated as it comes into view, in a circle around the character (probably 20-60 tiles), so where there is data, it is very dense and all in a grid. Where there's no data, though, there could be huge, ungenerated spaces. The character could walk in a huge circle, for example, creating a ring of tiles around vast empty space.
A simple matrix would create massive amounts of unneeded overhead, and waste a lot of space. Typical BSPs, though, seem like they would cause a huge performance drop because of the dense, grid-like nature of the data.
What do you suggest? Matrices - quadtrees - some hybrid of the two?
I am thinking of implementing something similar in the game I am working on. I'm going to create a custom class that can be accessed like a 2D array, e.g. map[x][y], but the underlying data type would be closer to a hashtable. Something like data[x.Value.ToString() + "," + y.Value.ToString()]
My game is fairly basic as my tiles will only ever be walkable, deadly, or unwalkable.
I'm interested in a more elegant solution though :D
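For what it's worth, here's a minimal sketch of that idea (my own names, and using a value-tuple key rather than string concatenation, which avoids building a string on every lookup):

using System.Collections.Generic;

enum Tile { Unwalkable, Walkable, Deadly }

class SparseMap
{
    // Ungenerated tiles simply aren't in the dictionary.
    readonly Dictionary<(int x, int y), Tile> data = new Dictionary<(int x, int y), Tile>();

    public Tile this[int x, int y]
    {
        get => data.TryGetValue((x, y), out var t) ? t : Tile.Unwalkable; // default for ungenerated space
        set => data[(x, y)] = value;
    }
}

Usage would be something like map[-12, 40] = Tile.Walkable; negative coordinates come for free, and memory is only spent on tiles that actually exist.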
I've been tackling this for the past month, and have come up with what I believe is a fairly good solution. It's not as fast as a pure matrix, but it has the benefit of being infinitely extensible (within the limits of an int).
Basically, it's a binary space partition which builds upwards (instead of downwards, like a quadtree). If you write to a point outside of the currently allocated matrix space, it generates a larger node and expands. If a majority of a node's child nodes are allocated matrices, it will aggregate them into itself and remove their references. This means that the more well-defined your boundaries are, the better the performance you get, as the structure then behaves more like a matrix.
I've posted my code here, and will try and write some sort of demo in the future, and move to a better hosting site.
Don't hesitate to let me know what you think, or if you have any questions about it.
My lack of in-depth understanding of the fundamentals has taken a toll on these types of problem-solving challenges.
The HackerRank matrix rotation problem is a very fun one to solve. I recommend that people who are trying to enrich their coding skills use HackerRank (https://www.hackerrank.com/challenges/matrix-rotation-algo).
The problem summary is that you are given an R x C matrix of integers, where the minimum of R and C must be even. You have to rotate the matrix anti-clockwise x times. Rotation applies to the elements of the matrix, not the matrix dimensions, in case that is not clear.
So I solved this problem with two algorithms. They are both very similar in that you can imagine the matrix as the layers of an onion, where you loop through each layer and rotate the elements in that layer. The number of rotations is simply x % (count of elements in that layer), so if you are given x = 1,000,000 it doesn't make sense to repeat full rotations.
The first one, which is the faster of the two, is:
https://codetidy.com/8002/
The second one does not loop through the number of rotations but instead does some heavy logic and math to figure out where to move each element to.
https://codetidy.com/8001/
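For reference, here is a rough sketch (my own code, not either of the linked solutions) of the per-layer idea described above: collect one ring in anti-clockwise order, reduce x with the modulo, and write the values back shifted.

using System.Collections.Generic;

static void RotateLayer(int[,] m, int layer, int x)
{
    int rows = m.GetLength(0), cols = m.GetLength(1);
    int top = layer, left = layer, bottom = rows - 1 - layer, right = cols - 1 - layer;

    // Collect the layer's cells in anti-clockwise order (the problem guarantees
    // min(R, C) is even, so every layer is a proper ring).
    var cells = new List<(int r, int c)>();
    for (int r = top; r <= bottom; r++) cells.Add((r, left));        // down the left edge
    for (int c = left + 1; c <= right; c++) cells.Add((bottom, c));  // along the bottom
    for (int r = bottom - 1; r >= top; r--) cells.Add((r, right));   // up the right edge
    for (int c = right - 1; c > left; c--) cells.Add((top, c));      // back along the top

    int len = cells.Count;
    int shift = x % len;                                             // full rotations are no-ops
    var values = new int[len];
    for (int i = 0; i < len; i++) values[i] = m[cells[i].r, cells[i].c];
    for (int i = 0; i < len; i++)
    {
        var (r, c) = cells[(i + shift) % len];                       // each value moves forward by shift
        m[r, c] = values[i];
    }
}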
So when I was writing the second one, I assumed that it would be much faster, because you don't iterate through the maximum number of rotations in each layer. However, it ended up being slower.
I don't quite understand why. I logged the number of iterations to the console, and the first one does 50x more iterations, but is faster.
Number of iterations is not everything. Here are a few general things that might affect the performance.
One important thing to keep in mind with arrays and matrices is cache hits. If your operations generate lots of cache hits, they will seem orders of magnitude faster. To get cache hits you usually need to go in memory order: for an array that means iterating sequentially forward; for a matrix it means incrementing the lowest (innermost) index first. To get misses you need to jump around in increments larger than the size of the cache line (CPU dependent). Fun experiment: benchmark for (i...) for (j...) ++m[i][j] against for (i...) for (j...) ++m[j][i] to see the difference.
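If you want to try that experiment in C#, a rough benchmark sketch (names are mine) could look like this; on most machines the first loop is noticeably faster simply because it walks memory sequentially:

using System;
using System.Diagnostics;

class CacheOrderDemo
{
    static void Main()
    {
        const int n = 4096;
        var m = new int[n, n];

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < n; i++)        // row-major: cache friendly
            for (int j = 0; j < n; j++)
                ++m[i, j];
        Console.WriteLine($"m[i, j]: {sw.ElapsedMilliseconds} ms");

        sw.Restart();
        for (int i = 0; i < n; i++)        // column-major: strided, cache hostile
            for (int j = 0; j < n; j++)
                ++m[j, i];
        Console.WriteLine($"m[j, i]: {sw.ElapsedMilliseconds} ms");
    }
}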
In your case I would guess that the faster approach has very linear access on the horizontal parts at least.
Then there's branch prediction. Modern CPUs pipeline instructions to make better use of the existing hardware. Branches (ifs) break the pipeline, since you don't know which path to take while the condition is still being evaluated. As an optimization, the compiler/CPU picks one path and starts processing it; if the condition turns out the other way, everything is thrown away and processing restarts. Checking something that usually gives the same result (like i < n) will be faster than something that's harder to predict.
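A classic way to see this effect (a sketch of my own, unrelated to the matrix code) is to run the same predicate-heavy loop over random data and then over the same data sorted; the sorted run is usually much faster because the branch becomes predictable:

using System;
using System.Diagnostics;

class BranchDemo
{
    static long SumBig(int[] data)
    {
        long sum = 0;
        for (int i = 0; i < data.Length; i++)
            if (data[i] >= 128)            // hard to predict on random data
                sum += data[i];
        return sum;
    }

    static void Main()
    {
        var rng = new Random(42);
        var data = new int[1 << 22];
        for (int i = 0; i < data.Length; i++) data[i] = rng.Next(256);

        var sw = Stopwatch.StartNew();
        SumBig(data);
        Console.WriteLine($"unsorted: {sw.ElapsedMilliseconds} ms");

        Array.Sort(data);                  // same elements, predictable branch
        sw.Restart();
        SumBig(data);
        Console.WriteLine($"sorted:   {sw.ElapsedMilliseconds} ms");
    }
}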
These are some low-level reasons why the simpler approach might seem faster. Add some higher-level reasons (like the compiler not optimizing the code the way you expect) and you get results like this.
An important note: complexity reflects asymptotic behavior. Yes, the second approach will be faster for a sufficiently large matrix, and it's very likely that the sizes used for this problem are not sufficiently large.
I was going to write a game (it's called "Qwirkle", if you've ever heard of it) in which a 2-dimensional game field stores the positions of stones which the players have placed on it. The first player puts a stone anywhere and other players can connect to it from any side (left, right, top, or bottom). The game field itself is not restricted to a fixed size, which would ruin the game idea. However, the number of stones is limited to a value the players can define at the start.
Because of the game logic I need to for-loop through the stones with an index. However, since the players can add stones from any side, I'd need a list which is expandable in either direction (i.e. toward both negative and positive indices).
Performance is not unimportant since I need to check several stones in one turn.
The best thing would be to be able to access a stone like _stones[-3,5] for the one at position -3, 5, of course.
I thought a stack which can be pushed and popped from any side (like PushBack / PushFront) would be useful for this, but I'm not quite sure how to realize that in C#.
Are there pre-implemented lists / stacks like the one I'm thinking of, or is my approach completely weird?
The data structure you want is an immutable quadtree. If the board is mostly empty then using an immutable quadtree enables you to represent boards that are essentially unlimited in size; a one-trillion-by-one-trillion cell board takes only a few bytes more memory than a 32-by-32 cell board. Immutable quadtrees can easily be indexed in the manner you describe, and computing a new quadtree given an old quadtree and an edit is straightforward.
I've written immutable quadtree algorithms several times over the years and I have been meaning for a long time to do a series of blog articles on them, but I never have. When I do I'll come back and update this answer.
In the meantime, this Dr. Dobb's article on Gosper's algorithm is the one I used to learn how immutable quadtrees work.
http://www.drdobbs.com/jvm/an-algorithm-for-compressing-space-and-t/184406478
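To make the idea a bit more concrete, here is a minimal sketch of my own (not Gosper's algorithm and not production code): a fixed-depth immutable quadtree over a 2^depth x 2^depth board of booleans, where setting a cell builds O(depth) new nodes and shares everything else with the previous version. It assumes board coordinates have already been offset into the range [0, 2^depth).

sealed class QuadTree
{
    readonly QuadTree nw, ne, sw, se;   // null child = entirely empty subtree
    readonly bool leafValue;            // used only when this node is a leaf

    QuadTree(bool value) { leafValue = value; }
    QuadTree(QuadTree nw, QuadTree ne, QuadTree sw, QuadTree se)
    { this.nw = nw; this.ne = ne; this.sw = sw; this.se = se; }

    // depth = number of levels below this node; the node covers a (1 << depth)-sized square.
    public static bool Get(QuadTree node, int x, int y, int depth)
    {
        if (node == null) return false;                 // empty region
        if (depth == 0) return node.leafValue;
        int half = 1 << (depth - 1);
        QuadTree child = x < half
            ? (y < half ? node.sw : node.nw)
            : (y < half ? node.se : node.ne);
        return Get(child, x % half, y % half, depth - 1);
    }

    // Returns a new tree; the old one is untouched and shares all unchanged subtrees.
    public static QuadTree Set(QuadTree node, int x, int y, int depth, bool value)
    {
        if (depth == 0) return new QuadTree(value);
        int half = 1 << (depth - 1);
        QuadTree nw = node?.nw, ne = node?.ne, sw = node?.sw, se = node?.se;
        QuadTree updated = Set(
            x < half ? (y < half ? sw : nw) : (y < half ? se : ne),
            x % half, y % half, depth - 1, value);
        if (x < half) { if (y < half) sw = updated; else nw = updated; }
        else          { if (y < half) se = updated; else ne = updated; }
        return new QuadTree(nw, ne, sw, se);
    }
}

An almost-empty board is just a handful of nodes, and each edit allocates only one small node per level; switching the coordinates (and half) to long would allow the trillion-by-trillion boards mentioned above.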
What you want is a double-ended queue (known as a deque, pronounced "deck"). There is no implementation in the .NET BCL (unfortunately), but there are third-party implementations (see Google).
A Dictionary could be an option: instead of thinking about the indices of a list, you could think of the integer keys of a Dictionary, no matter what the dimension is.
Dictionary<int,Dictionary<int,Stone>> stones = new Dictionary<int,Dictionary<int,Stone>>();
// do some initialisation for the base field size ...
// access it this way
Stone s = stones[-1][-5];
The only issue is when you want to work across the 2nd dimension, which can get resource-consuming (iterating over all 1st-dimension entries).
I think you will learn more from implementing a custom data structure containing, say, two ArrayLists: one going forward and one going backward. Of course, internally they will both grow the same way, but your data structure could map external index -n to position n - 1 of the backward ArrayList, so that you don't get an index of -0 for the first element in the backward ArrayList.
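A minimal sketch of that two-list idea, assuming C# with List<T> instead of ArrayList (all names are mine): index >= 0 reads from the forward list, and a negative index -n maps to slot n - 1 of the backward list.

using System.Collections.Generic;

class TwoWayList<T>
{
    readonly List<T> forward = new List<T>();   // external indices 0, 1, 2, ...
    readonly List<T> backward = new List<T>();  // external indices -1, -2, -3, ...

    public T this[int index]
    {
        get { return index >= 0 ? forward[index] : backward[-index - 1]; }
        set { if (index >= 0) forward[index] = value; else backward[-index - 1] = value; }
    }

    public void PushBack(T item) { forward.Add(item); }    // grows toward +infinity
    public void PushFront(T item) { backward.Add(item); }  // grows toward -infinity

    public int MinIndex { get { return -backward.Count; } }
    public int MaxIndex { get { return forward.Count - 1; } }
}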
Maybe I've misunderstood the problem, but that's how I would go about implementing a two-way expandable data structure.
Good luck!
You could do something like:
_stones[-(i),i] and/or _stones[i,-(i)]
Just a suggestion.
I'm looking for a data structure similar to T[,,] (a 3D array), except that I don't know the dimensions beforehand (nor have a reasonable upper bound), and it will need to expand outwards as time goes on. I'd like to use negative indexes as well.
The only thing that comes to mind is a dictionary, with some kind of Point3 struct as the key. Are there any other alternatives? I'd like lookups to be as fast as possible. The data will always be clustered around 0,0,0. It can expand in any direction, but there will never be any "gaps" between points.
I think I'm going to go ahead and just use a Dictionary<Point3, T> for now, and see how that performs. If it's a performance issue I'll try building a wrapper around T[,,] such that I can use negative indexes.
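For what it's worth, a rough sketch of the kind of Point3 key I mean (the hash mix is just one reasonable option, not a recommendation from anyone here):

using System;
using System.Collections.Generic;

struct Point3 : IEquatable<Point3>
{
    public readonly int X, Y, Z;
    public Point3(int x, int y, int z) { X = x; Y = y; Z = z; }

    public bool Equals(Point3 other) { return X == other.X && Y == other.Y && Z == other.Z; }
    public override bool Equals(object obj) { return obj is Point3 && Equals((Point3)obj); }
    public override int GetHashCode()
    {
        unchecked { return ((X * 397) ^ Y) * 397 ^ Z; } // simple prime-multiplier mix
    }
}

// var world = new Dictionary<Point3, T>();
// world[new Point3(-1, 2, 0)] = item;
// T value; world.TryGetValue(new Point3(0, 0, 0), out value);

Implementing IEquatable<Point3> and overriding GetHashCode matters here: the default ValueType implementations box the key and may hash poorly for clustered coordinates.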
Obviously you'll need to store this in a data structure resembling a sparse array, because you have no idea how large your data-set is going to be. So a Dictionary seems reasonable.
I'm going a little crazy here, but I think your indices should be in Spherical Coordinates. It makes sense to me as your data grows outwards. It will also make finding elements at a specific range from (0, 0, 0) extremely easy.
If you may need range queries, k-d trees come to mind. They are tree structures that at each level separate the universe into two along one axis. They offer O(log N) lookup time (for a constant number of dimensions), which may or may not be fast enough, but they also provide O(log N + S) time for range queries, where S is the number of items found, which is usually very good. They can handle dynamic data (insertions and deletions along with lookups), but the tree may become unbalanced as a result. You can also do nearest-neighbour searches from a given point (i.e. get the 10 nearest objects to point (7,8,9)). Wikipedia, as always, is a good starting point: http://en.wikipedia.org/wiki/Kd-tree
If there are huge numbers of things in the world, or if the world is very dynamic (things move, get created/destroyed all the time), k-d trees may not be good enough. If most of the time you will only ask "give me the thing at (7,8,9)", you can either use a hash as you mentioned in your question or something like a List<List<List<T>>>. I'd just implement whichever is easier behind an interface and worry about the performance later.
I am kind of assuming you need the dynamic aspect because the array could be huge. In that case, what you could try is to allocate your array as a set of 3D 'tiles'. At the top level you have a 3D data structure that stores pointers to your tiles. You expand and allocate this as you go along.
Each individual tile could contain, say, 32x32x32 voxels. Or whatever amount suits your goals.
Looking up your tile is done by dividing your coordinate index by 32 (by bitshifting, of course), and the index within the tile is calculated by masking away the upper bits.
A lookup like this is fairly cheap, possibly on par with a .NET Dictionary, but it will use less memory, which is good for performance too.
The resulting array will be chunky, though: the array boundaries are multiples of your tile size.
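A rough sketch of that layout (my own names; the top level here is a dictionary of chunks rather than a preallocated 3D table, which also keeps it sparse):

using System.Collections.Generic;

class ChunkedWorld<T>
{
    const int ChunkBits = 5;                    // 2^5 = 32 voxels per axis
    const int ChunkMask = (1 << ChunkBits) - 1;

    // Top level: chunk coordinates -> dense 32x32x32 tile.
    readonly Dictionary<(int cx, int cy, int cz), T[,,]> chunks =
        new Dictionary<(int cx, int cy, int cz), T[,,]>();

    public T this[int x, int y, int z]
    {
        get
        {
            var key = (x >> ChunkBits, y >> ChunkBits, z >> ChunkBits);  // divide by 32
            T[,,] tile;
            return chunks.TryGetValue(key, out tile)
                ? tile[x & ChunkMask, y & ChunkMask, z & ChunkMask]      // mask away the upper bits
                : default(T);
        }
        set
        {
            var key = (x >> ChunkBits, y >> ChunkBits, z >> ChunkBits);
            T[,,] tile;
            if (!chunks.TryGetValue(key, out tile))
                chunks[key] = tile = new T[1 << ChunkBits, 1 << ChunkBits, 1 << ChunkBits];
            tile[x & ChunkMask, y & ChunkMask, z & ChunkMask] = value;
        }
    }
}

Because >> is an arithmetic shift and & keeps the low bits, this also works for negative coordinates without any special casing.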
Array access is a very fast linear lookup - if speed of lookup is your priority, then it's probably the way to go, depending on how often you will need to modify your arrays.
If your goal is to maintain the chunks around a player, you might want to arrange a "current world" array structure around the player, such that it is a 3-dimensional array with the centre chunk at 9,9,9 with a size of [20,20,20]. Each time the player leaves the centre chunk for another, you re-index the array to drop the old chunks and move on.
Ultimately, you're asking for options on how to optimize your game engine, but it's nearly impossible to say which is going to be correct for you. Even though games tend to be more optimized for performance than other applications, don't get lured into micro-optimizations; optimize for readability first and then optimize for performance when you find it necessary.
If you suspect this particular data-structure is going to be a bottleneck in your engine, put some performance tracing in so it's easy to be sure one way or the other once you've got the engine running.
I have a large 2D object collection, only lines for now.
I need an algorithm suggestion for how to create the fastest spatial index over this collection, so that I can collect all objects that are inside some bounds.
Once built, the index will not be updated.
Object distribution in this database is not spatially uniform.
Algorithm implementation in C#.
Update: The current usage is for the road graph of a country, so the lines are small, from one crossroad to another, with greater density in populated areas. I think this gives a good picture of the data.
Obviously there are many indexing methods to achieve this, but I require the fastest one.
You can use a segment tree if you want to store 2-D lines and your queries are 2-D range queries.
The algorithmic complexity of a query is O( log^2 N ).
Check out quadtrees, and DotSpatial for spatial type handling, including a quadtree implementation.
You can also try an R-tree. There's a C# implementation available at http://sourceforge.net/projects/cspatialindexrt/.
R-trees should offer performance comparable to a segment tree, and the above implementation should be standalone and fairly independent of a lot of extra code references, but I haven't tested it.
There is no silver bullet on this. It depends on the type of data (i.e., only points, only lines, triangles, meshes, any combination of them, etc.) and the type of query (point inside polygon, line intersection, nearest neighbors, any geometry inside a circle or box, etc).
Each data structure is designed for a specific type of query and data. If you want to use a single data structure for all types of queries and all types of data, you have to trade off space, time, or both. You can get reasonably fast, but you won't be optimal in general.
In my experience, for a data structure general enough to cope with most geometric objects and able to handle several types of queries, I would recommend the AABB tree:
https://doc.cgal.org/latest/AABB_tree/index.html
I am working on a project where the game world is irregularly shaped (think of the shape of a lake). This shape has a grid with coordinates placed over it. The game world exists only on the inside of the shape. (Once again, think lake.)
How can I efficiently represent the game world? I know that many worlds are basically square, and work well in a 2 or 3 dimension array. I feel like if I use an array that is square, then I am basically wasting space, and increasing the amount of time that I need to iterate through the array. However, I am not sure how a jagged array would work here either.
Example shape of gameworld
X
XX
XX X XX
XXX XXX
XXXXXXX
XXXXXXXX
XXXXX XX
XX X
X
Edit:
The game world will most likely need each valid location stepped through, so I would like a method that makes that easy to do.
There's computational overhead and complexity associated with sparse representations, so unless the bounding area is much larger than your actual world, it's probably most efficient to simply accept the 'wasted' space. You're essentially trading off additional memory usage for faster access to world contents. More importantly, the 'wasted-space' implementation is easier to understand and maintain, which is always preferable until the point where a more complex implementation is required. If you don't have good evidence that it's required, then it's much better to keep it simple.
You could use a quadtree to minimize the amount of wasted space in your representation. Quad trees are good for partitioning 2-dimensional space with varying granularity - in your case, the finest granularity is a game square. If you had a whole 20x20 area without any game squares, the quad tree representation would allow you to use only one node to represent that whole area, instead of 400 as in the array representation.
Use whatever structure you've come up with---you can always change it later. If you're comfortable with using an array, use it. Stop worrying about the data structure you're going to use and start coding.
As you code, build abstractions away from this underlying array, like wrapping it in a semantic model; then, if you realize (through profiling) that it's waste of space or slow for the operations you need, you can swap it out without causing problems. Don't try to optimize until you know what you need.
Use a data structure like a list or map, and only insert the valid game-world coordinates. That way the only things you are storing are valid locations, and you don't waste memory on the non-game-world locations, since you can deduce those from their absence in your data structure.
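A tiny sketch of that (my names, assuming C# 7 tuple support); inside whatever method builds the world, something like:

using System.Collections.Generic;

var world = new HashSet<(int x, int y)>();
world.Add((3, 5));                          // mark a cell as part of the lake
bool inWorld = world.Contains((4, 5));      // false: never added
foreach (var (x, y) in world)
{
    // step through every valid location here
}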
The easiest thing is to just use the array, and just mark the non-gamespace positions with some special marker. A jagged array might work too, but I don't use those much.
You could represent the world as an (undirected) graph of land (or water) patches. Each patch then has a regular form, and the world is the combination of these patches. Every patch is a node in the graph and has graph edges to all its neighbours.
That is probably also the most natural representation of any general world (but it might not be the most efficient one). From an efficiency point of view, it will probably beat an array or list for a highly irregular map but not for one that fits well into a rectangle (or other regular shape) with few deviations.
An example of a highly irregular map:
x
x x
x x x
x x
x xxx
x
x
x
x
There’s virtually no way this can be efficiently fitted (both in space ratio and access time) into a regular shape. The following, on the other hand, fits very well into a regular shape by applying basic geometric transformations (it’s a parallelogram with small bits missing):
xxxxxx x
xxxxxxxxx
xxxxxxxxx
xx xxxx
One other option that could allow you to still access game world locations in O(1) time and not waste too much space would be a hashtable, where the keys would be the coordinates.
Another way would be to store an edge list - a line vector along each straight edge. It's easy to check for inclusion this way, and a quadtree or even a simple location hash on each vertex can speed up lookup of info. We did this with a height component per edge to model the walls of a baseball stadium and it worked beautifully.
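If it helps, here is a sketch of an inclusion test against such an edge list using the standard even-odd (ray-casting) rule; the polygon is just the ordered list of vertices, and the names are mine:

static bool Inside((double x, double y)[] polygon, double px, double py)
{
    bool inside = false;
    for (int i = 0, j = polygon.Length - 1; i < polygon.Length; j = i++)
    {
        var a = polygon[i];
        var b = polygon[j];
        // Does edge a-b cross the horizontal ray going right from (px, py)?
        if ((a.y > py) != (b.y > py) &&
            px < (b.x - a.x) * (py - a.y) / (b.y - a.y) + a.x)
            inside = !inside;
    }
    return inside;
}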
There is a big issue that nobody here addressed: the huge difference between storing it on disk and storing it in memory.
Assuming you are talking about a game world, as you said, it's going to be very large. You're not going to store the whole thing in memory at once; instead you will store the immediate vicinity in memory and update it as the player walks around.
This vicinity area should be as simple, easy and quick to access as possible. It should definitely be an array (or a set of arrays which are swapped out as the player moves). It will be referenced often and by many subsystems of your game engine: graphics and physics will handle loading the models, drawing them, keeping the player on top of the terrain, collisions, etc.; sound will need to know what ground type the player is currently standing on, to play the appropriate footstep sound; and so on. Rather than broadcast and duplicate this data among all the subsystems, if you just keep it in global arrays they can access it at will and at 100% speed and efficiency. This can really simplify things (but be aware of the consequences of global variables!).
However, on disk you definitely want to compress it. Some of the given answers provide good suggestions; you can serialize a data structure such as a hash table, or a list of only the filled-in locations. You could certainly store an octree as well. In any case, you don't want to store blank locations on disk; according to your statistic, that would mean 66% of the space is wasted. Sure, there is a time to forget about optimization and make it Just Work, but you don't want to distribute a 66%-empty file to end users. Also keep in mind that disks are not perfect random-access machines (except for SSDs); mechanical hard drives should still be around for at least another several years, and they work best sequentially. See if you can organize your data structure so that the read operations are sequential as you stream in more vicinity terrain while the player moves, and you'll probably find it makes a noticeable difference. Don't take my word for it, though; I haven't actually tested this sort of thing, it just makes sense, right?