I'm working on image-related algorithms and was wondering what some good choices would be for unpredictable 2D traversal of structures like this:
Start from a given pixel.
Traverse to a neighbor (up, left, down, right, or diagonal; always 1 step away, no jumps).
Keep making more of those traversals, each next direction depending on some output (with a similar rate of each direction over time).
For simple sequential traversal, arrays are just fine and very fast, and for truly random access not much can be done. But I'm wondering if something could be done here: if the image is simply stored in an array, going "up" can land pretty far away in memory on large images (5+ megapixels). Can you think of any good structures for this? I'm more than happy to trade memory for speed here, but the only structure I can think of is an array of items that each store a pointer to their 8 neighbors, and that sounds like it would be slow to create in the first place. Since I'm only making 1 to 3 passes over these images, it may end up being slower than the naive array implementation.
Any suggestions are welcome.
Space-filling curves are what you are looking for. They provide cache-friendly data layout in memory for neighbor-related operations.
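A rough sketch of one such curve (Z-order / Morton order) in C#, assuming a square power-of-two image; the Pixel type, width and the row-major source array are placeholders, not something from your code:

static class ZOrder
{
    // Spread the lower 16 bits of n so there is a zero bit between each bit.
    static uint Part1By1(uint n)
    {
        n &= 0x0000FFFF;
        n = (n | (n << 8)) & 0x00FF00FF;
        n = (n | (n << 4)) & 0x0F0F0F0F;
        n = (n | (n << 2)) & 0x33333333;
        n = (n | (n << 1)) & 0x55555555;
        return n;
    }

    // Interleave the bits of x and y: neighboring (x, y) pairs land close
    // together in the resulting index far more often than in row-major order.
    public static uint Index(uint x, uint y)
    {
        return Part1By1(x) | (Part1By1(y) << 1);
    }
}

// Lay the pixels out once in Morton order, then do the walk in that array:
// mortonPixels[ZOrder.Index(x, y)] = rowMajorPixels[y * width + x];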
However, be wary of premature optimization. Before doing any kind of optimization, make sure you have a working algorithm. Then you can start optimizing, taking benchmarks and comparing them with the non-optimized baseline.
Related
Currently I am using XNA Game Studio 4.0 with C# in Visual Studio 2010. I want a versatile way of handling triangles. I am using a preset array of VertexPositionColor items passed to the GraphicsDevice.DrawUserPrimitives() method, which only accepts arrays. Because arrays are fixed in size, and I wanted a very large space to arbitrarily add new triangles to, my original idea was to make a large array, specifically
VertexPositionColor[] vertices = new VertexPositionColor[int.MaxValue];
but that ran my application out of memory. So what I'm wondering is how to approach this memory/performance issue best.
Is there an easy way to increase the amount of memory allocated to the stack whenever my program runs?
Would it be beneficial to store the array on the heap instead? And would I have to build my own allocator if I wanted to do that?
Or is my best approach simply to use a LinkedList and deal with the extra processing required to copy it to an array every frame?
I hit this building my voxel engine code.
Consider the problem I had:
Given an unknown volume size that would clearly be bigger than the amount of memory the computer had how do I manage that volume of data?
My solution was to use sparse chunking. For example:
In my case, instead of using an array, I used a dictionary.
This way I could look up values based on a key that was, say, the hash code of a voxel's position, and the value was the voxel itself.
This meant that the voxels were fast to pull out, and the runtime's hash table did the organising into an indexed set for me.
It also meant that when pulling data back out, I could default to Voxel.Empty for voxels that hadn't yet been assigned.
In your case you might not need a default value, but using a dictionary might prove more helpful than an array.
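A rough sketch of what I mean, with VoxelPos and Voxel as stand-in types rather than anything from a particular engine:

using System.Collections.Generic;

struct VoxelPos
{
    // Default struct equality/hashing is good enough for a sketch; override
    // Equals and GetHashCode for production-quality lookup speed.
    public readonly int X, Y, Z;
    public VoxelPos(int x, int y, int z) { X = x; Y = y; Z = z; }
}

class Voxel
{
    public static readonly Voxel Empty = new Voxel();
    public byte Material;
}

class SparseVolume
{
    private readonly Dictionary<VoxelPos, Voxel> voxels = new Dictionary<VoxelPos, Voxel>();

    public Voxel this[VoxelPos pos]
    {
        // Unassigned positions fall back to Voxel.Empty instead of the
        // whole volume being pre-allocated.
        get { Voxel v; return voxels.TryGetValue(pos, out v) ? v : Voxel.Empty; }
        set { voxels[pos] = value; }
    }
}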
The upshot: arrays are a tad faster for some things, but when you consider all of your usage scenarios for the data, you may find that overall the gains of using a dictionary are worth a slight allocation cost.
In testing I found that if I was prepared to go from something like 100 ms per thousand allocations to, say, 120 ms per thousand, I could then retrieve the data 100% faster for most of the queries I was performing on the set.
Reason for my suggestion here:
It looks like you don't know the size of your data set, and an array only makes sense if you do know the size; otherwise you tie up needlessly pre-allocated chunks of RAM just to make your code ready for any eventuality you want to throw at it.
Hope this helps.
You could try List<T> and its associated ToArray() method. It's supported by the XNA framework too (MSDN).
List<T> is the successor to ArrayList; it provides more features and is strongly typed (a good comparison).
Regarding performance, List<T>.ToArray is an O(n) operation. I'd also suggest breaking your lengthy array into portions that you can name with a key (some sort of unique identifier for a region, and so on), storing the relevant vertices in a List, and using a dictionary like Dictionary<Key, List<T>>, which can reduce the work involved. You can also process the required models with a priority-based approach, which would give a performance gain over processing the complete array at once.
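A rough sketch of that region idea; the int region key is just a hypothetical grid-cell id, and VertexPositionColor is the XNA type you are already using:

using System.Collections.Generic;
using Microsoft.Xna.Framework.Graphics;

class RegionBatches
{
    private readonly Dictionary<int, List<VertexPositionColor>> regions =
        new Dictionary<int, List<VertexPositionColor>>();

    public void Add(int regionKey, VertexPositionColor vertex)
    {
        List<VertexPositionColor> list;
        if (!regions.TryGetValue(regionKey, out list))
        {
            list = new List<VertexPositionColor>();
            regions[regionKey] = list;
        }
        list.Add(vertex);
    }

    // Only the regions you actually draw this frame get copied to arrays
    // for GraphicsDevice.DrawUserPrimitives.
    public VertexPositionColor[] GetArray(int regionKey)
    {
        List<VertexPositionColor> list;
        return regions.TryGetValue(regionKey, out list)
            ? list.ToArray()
            : new VertexPositionColor[0];
    }
}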
I am developing a game in C#; I'm rather new at it. I would like to know whether this approach would adversely affect performance:
Instantiate all textures in four categories, i.e. 4 different arrays.
This is to keep the different groups of textures apart from each other (for example, MonsterA needs 3 textures, and those are all in the same array).
Have objects with generic Lists that point at the texture(s) they need.
Since the textures are in the same array, this should help with caching etc., I think.
As far as I know, the List would only hold references, so it's the references that have locality, not so much the actual textures. I am using SFML.Net, but this should apply to, say, listing pictures of some sort, or listing any objects you want to have locality.
The question, then: will doing this adversely affect performance, will it work as I expect, or will it not matter at all? And why?
If you are very serious about this, try all the approaches and measure/compare. Don't forget to set your goals first; otherwise you'll be trying to save time/memory where it doesn't cause a problem for your case. Note that you need to measure the complete sequence you're worried about, not just the "load textures" part.
It is very unlikely that performance will be impacted by the way you arrange the metadata portion of the textures (everything but the image bytes) - the amount of memory used by the images themselves will be much bigger than any lists/dictionaries you reference the textures from.
The main optimizations with textures are:
not loading them at all until they are (or might be) needed - see the lazy cache sketch just after this list;
somehow making them smaller (multiple detail levels, compression, ...);
sometimes the number of textures matters, and image strips / sprite sheets can be used to combine multiple images into one.
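For the first point, a minimal sketch of lazy loading with a cache, assuming SFML.Net's Texture(string) constructor; the paths and naming are up to you:

using System.Collections.Generic;
using SFML.Graphics;

class TextureCache
{
    private readonly Dictionary<string, Texture> loaded = new Dictionary<string, Texture>();

    public Texture Get(string path)
    {
        Texture tex;
        if (!loaded.TryGetValue(path, out tex))
        {
            // Load lazily on first use and keep it around for later requests.
            tex = new Texture(path);
            loaded[path] = tex;
        }
        return tex;
    }
}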
But for most projects, doing nothing special is a good start - a finished game/program that is somewhat slower than you'd like is much better than one that is 1/3 complete but has very fast texture loading (or whatever else you decided to over-optimize).
I got a (basic) voxel engine running and a water system that looks (and I assume basically works) like this: https://www.youtube.com/watch?v=Q_TdeGIOOts (not my game).
The water values are stored in a 3D array of floats, and every 0.05 s it calculates water flow by checking the voxel below and the adjacent ones (y-1, x-1, x+1, z-1, z+1) and adding the value.
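(For reference, the flow pass has roughly this shape - a simplified, double-buffered sketch rather than my exact code, and the spread fractions here are made up:)

using System;

static class WaterFlow
{
    // Keep half the water in place and push the rest down and to the four
    // side neighbours; the fractions sum to 1 so water is conserved.
    public static void FlowStep(float[,,] water, float[,,] next)
    {
        Array.Clear(next, 0, next.Length);
        int sx = water.GetLength(0), sy = water.GetLength(1), sz = water.GetLength(2);
        for (int x = 1; x < sx - 1; x++)
        for (int y = 1; y < sy - 1; y++)
        for (int z = 1; z < sz - 1; z++)
        {
            float w = water[x, y, z];
            if (w <= 0f) continue;
            next[x, y, z]     += w * 0.50f;
            next[x, y - 1, z] += w * 0.30f;
            next[x - 1, y, z] += w * 0.05f;
            next[x + 1, y, z] += w * 0.05f;
            next[x, y, z - 1] += w * 0.05f;
            next[x, y, z + 1] += w * 0.05f;
        }
    }
}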
This system works fine (70+ fps) for small amounts of water, but when I start calculating water on 8+ chunks, it becomes too much.
(I disabled all rendering and mesh creation to check whether that was the bottleneck; it isn't. It's purely the flow calculations.)
I am not a very experienced programmer, so I wouldn't know where to start optimizing, apart from making the calculations happen in a coroutine, as I already did.
In this post: https://gamedev.stackexchange.com/questions/55414/how-to-define-areas-filled-with-water (near the bottom) Boreal suggests running it in a compute shader. Is this the way to go for me? And how would I go about such a thing?
Any help is much appreciated.
If you're really calculating a voxel-based simulation, the number of calculations grows geometrically as your size increases, so you will quickly run out of processing power on larger volumes.
A compute shader is great for doing massively parallel calculations quickly, although it's a very different programming paradigm that takes some getting used to. A compute shader looks at the contents of a buffer (i.e., a 'texture' for us civilians) and does things to it very quickly - in your case the buffer will probably be a texture whose pixel values represent water cells. If you want to do something really simple, like increment them up or down, the compute shader uses the parallel processing power of the GPU to do it really fast.
The hard part is that GPUs are optimized for parallel processing. This means you can't write code like "texelA.value += texelB.value" - without extra work on your part, each fragment of the buffer is processed with zero knowledge of what happens in the other fragments. To reference other texels you need to read the texture again somehow - some techniques read one texture multiple times with offsets (this GL example does that to implement blurs); others work by repeatedly processing a texture, putting the result into a temporary texture, and then reprocessing that.
At the 10,000-foot level: yes, a compute shader is a good tool for this kind of problem, since it involves tons of self-similar calculation. But it won't be easy to do off the bat. If you have not done conventional shader programming before, you may want to look at that first to get used to the way GPUs work. Even really basic tools (if-then-else or loops) have very different performance implications and uses in GPU programming, and it takes some time to get your head around the differences. As of this writing (1/10/13) it looks like Nvidia and Udacity are offering an intro compute shader course, which might be a good way to get up to speed.
FWIW you also need pretty modern hardware for compute shaders, which may limit your audience.
I have a case here on which I would like some opinions from the experts :)
Situation:
I have a data structure with `Int32` and `Double` values, with a total size of 108 bytes.
I have to process a large series of this data structure. It's something like this (conceptual; I will use a for loop instead):
double result = 0;
foreach (Item item in series)
{
    result += /* some calculation based on item */;
}
I expect the size of the series to be about 10 MB.
To be useful, the whole series must be processed. It's all or nothing.
The series data will never change.
My requirements:
Memory consumption is not an issue. I think that nowadays, if the user doesn't have a few dozen MB free on his machine, he probably has a deeper problem.
Speed is a concern. I want the iteration to be as fast as possible.
No unmanaged code, or interop, or even unsafe.
What I would like to know
Should I implement the item data structure as a value type or a reference type? From what I know, value types are cheaper, but I imagine that on each iteration a copy will be made of each item if I use a value type. Is this copy faster than a heap access?
Is there any real problem if I implement the accessors as auto-implemented properties? I believe this will increase the footprint, but also that the getter will be inlined anyway. Can I safely assume this?
I'm seriously considering creating a very large static readonly array of the series directly in code (it's rather easy to do this from the data source). This would give me a 10 MB assembly. Any reason why I should avoid this?
Hope someone can give me a good opinion on this.
Thanks
Should I implement the item data structure as a value type or a reference type? From what I know, value types are cheaper, but I imagine that on each iteration a copy will be made of each item if I use a value type. Is this copy faster than a heap access?
Code it both ways and profile it aggressively on real-world input. Then you'll know exactly which one is faster.
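A minimal sketch of what that can look like, with ComputeStructSeries and ComputeClassSeries standing in for your two candidate implementations of the loop:

using System;
using System.Diagnostics;

static class SeriesBenchmark
{
    // Runs the supplied computation once to warm up the JIT, then times it.
    public static void Measure(string label, Func<double> run)
    {
        run();

        var sw = Stopwatch.StartNew();
        double result = run();
        sw.Stop();

        Console.WriteLine("{0}: {1} ms (result {2})", label, sw.ElapsedMilliseconds, result);
    }
}

// Usage:
// SeriesBenchmark.Measure("struct series", ComputeStructSeries);
// SeriesBenchmark.Measure("class series", ComputeClassSeries);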
Is there any real problem if I implement the accessors as auto-implemented properties?
Real problem? No.
I believe this will increase the footprint, but also that the getter will be inlined anyway. Can I safely assume this?
You can only safely assume things guaranteed by the spec. It's not guaranteed by the spec.
I'm seriously considering creating a very large static readonly array of the series directly in code (it's rather easy to do this from the data source). This would give me a 10 MB assembly. Any reason why I should avoid this?
I think you're probably worrying about this too much.
I'm sorry if my answer seems dismissive. You're asking random people on the Internet to speculate about which of two things is faster. We can guess, and we might be right, but you could just code it both ways in the blink of an eye and know exactly which is faster. So just do it.
However, I always code for correctness, readability and maintainability at first. I establish reasonable performance requirements up front, and I see if my implementation meets them. If it does, I move on. If I need more performance from my application, I profile it to find the bottlenecks and then I start worrying.
You're asking about a trivial computation that takes ~10,000,000 bytes / 108 bytes ≈ 100,000 iterations. Is this even a bottleneck in your application? Seriously, you are overthinking this. Just code it and move on.
That's 100,000 loops, which in CPU time is sod all. Stop overthinking it and just write the code. You're making a mountain out of a molehill.
Speed is relative. How do you load your data, and how much data is inside your process elsewhere? Loading the data will be the slowest part of your app if you do not need complex parsing logic to create your structs.
I think you are asking this because you have a struct 108 bytes in size that you perform calculations on, and you wonder why your app is slow. Please note that structs are passed by value, which means that if you pass the struct to one or more methods during your calculations, or fetch it from a List, you create a copy of the struct every time. This is indeed very costly.
Change your struct to a class and expose only getters, so you are sure to have a read-only object. That should fix your perf issues.
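A minimal sketch of what I mean; the field names are made up:

class Item
{
    public int Id { get; private set; }
    public double Value { get; private set; }

    public Item(int id, double value)
    {
        Id = id;
        Value = value;
    }
}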
A good practice is to separate data from code, so regarding your "big array embedded in the code" question, I say don't do that.
Use LINQ for calculations on the entire series; the speed is good.
Use a Node class for each point if you want more functionality.
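A minimal sketch of the LINQ approach; the Item struct and the per-item formula below are placeholders for your own:

using System.Collections.Generic;
using System.Linq;

struct Item { public double Value; }   // stand-in for the 108-byte structure

static class SeriesTotals
{
    public static double Total(IEnumerable<Item> series)
    {
        // The lambda stands in for "some calculation based on item".
        return series.Sum(item => item.Value * 2.0);
    }
}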
I used to work with large series of data like this. They were points to plot on a graph, originally taken every ms or less, so the datasets were huge. Users wanted to apply different formulas to these series and have the results displayed. Your problem looks similar to me.
To improve speed we stored different zoom levels of the points in a database: say every ms, then aggregated per minute, per hour, per day, etc. (whatever users needed). When users zoomed in or out we would load the new values from the database instead of performing the calculations right then. We would also cache the values so users didn't have to hit the database all the time.
Also, when users applied formulas to the series (as in your case), there was less data to process.
I've taken on quite a daunting challenge for myself. In my XNA game, I want to implement Blargg's NTSC filter. This is a C library that transforms a bitmap to make it look like it was output on a CRT TV with the NTSC standard. It's quite accurate, really.
The first thing I tried, a while back, was to use the C library itself by calling it as a DLL. Here I had two problems: 1. I couldn't get some of the data to copy correctly, so the image was messed up; but more importantly, 2. it was extremely slow. It required getting the XNA Texture2D bitmap data, passing it through the filter, and then setting the data on the texture again. The framerate was ruined, so I couldn't go down this route.
Now I'm trying to translate the filter into a pixel shader. The problem here (if you're adventurous enough to look at the code - I'm using the SNES version because it's the simplest) is that it handles very large arrays and relies on interesting pointer operations. I've done a lot of work rewriting the algorithm to work independently per pixel, as a pixel shader requires. But I don't know if this will ever work, so I've come here to ask whether finishing it is even possible.
There's a precalculated array involved containing 1,048,576 integers. Is this alone beyond any limits for a pixel shader? It only needs to be set once, not once per frame.
Even if that's OK, I know that HLSL cannot index arrays by a variable; it has to unroll the access into a million if statements to reach the correct array element. Will this kill the performance and make it a fruitless endeavor again? There are multiple array accesses per pixel.
Is there any chance that my original plan to use the library as is could work? I just need it to be fast.
I've never written a shader before. Is there anything else I should be aware of?
Edit: an addendum to #2. I just read somewhere that not only can HLSL not index arrays by a variable, but even to unroll it, the index has to be computable at compile time. Is this true, or does the "unrolling" work around it? If it's true, I think I'm screwed. Any way around that? My algorithm is basically a glorified version of "the input pixel is this color, so look up my output pixel values in this giant array."
From my limited understanding of shader languages, your problem can be solved fairly easily by using a texture instead of an array.
Pregenerate it on the CPU and save it as a texture - 1024x1024 in your case.
Use standard texture access functions as if the texture were the array, possibly with nearest-neighbor sampling to prevent blending of individual texels.
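A rough sketch of building that lookup texture in XNA; table is assumed to be your precalculated 1,048,576-entry array laid out row by row:

using Microsoft.Xna.Framework;
using Microsoft.Xna.Framework.Graphics;

static class NtscLookup
{
    public static Texture2D Build(GraphicsDevice device, int[] table)
    {
        var texture = new Texture2D(device, 1024, 1024, false, SurfaceFormat.Color);
        var texels = new Color[1024 * 1024];
        for (int i = 0; i < table.Length; i++)
        {
            int v = table[i];
            // Split each 32-bit entry across the four color channels.
            texels[i] = new Color(v & 0xFF, (v >> 8) & 0xFF, (v >> 16) & 0xFF, (v >> 24) & 0xFF);
        }
        texture.SetData(texels);
        return texture;
    }
}

Sample it with point filtering (e.g. SamplerState.PointClamp) so neighboring entries never get blended, and reassemble the 32-bit value from the four channels in the shader.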
I don't think this is possible if you want speed.