I've taken on quite a daunting challenge for myself. In my XNA game, I want to implement Blargg's NTSC filter. This is a C library that transforms a bitmap to make it look like it was output on a CRT TV with the NTSC standard. It's quite accurate, really.
The first thing I tried, a while back, was to just use the C library itself by calling it as a DLL. Here I had two problems: first, I couldn't get some of the data to copy correctly, so the image was messed up; but more importantly, it was extremely slow. It required getting the XNA Texture2D bitmap data, passing it through the filter, and then setting the data back on the texture. The framerate was ruined, so I couldn't go down this route.
Now I'm trying to translate the filter into a pixel shader. The problem here (if you're adventurous enough to look at the code - I'm using the SNES one because it's the simplest) is that it handles very large arrays and relies on interesting pointer operations. I've done a lot of work rewriting the algorithm to work independently per pixel, as a pixel shader requires. But I don't know if this will ever work. I've come to you to see if finishing this is even possible.
There's a precalculated array involved containing 1,048,576 integers. Is this alone beyond any limits for the pixel shader? It only needs to be set once, not once per frame.
Even if that's OK, I know that HLSL cannot index arrays by a variable. It has to unroll the access into a million if statements to get the correct array element. Will this kill the performance and make it a fruitless endeavor again? There are multiple array accesses per pixel.
Is there any chance that my original plan to use the library as is could work? I just need it to be fast.
I've never written a shader before. Is there anything else I should be aware of?
Edit: Addendum to #2. I just read somewhere that not only can HLSL not access arrays by variable, but even to unroll it, the index has to be calculable at compile time. Is this true, or does the "unrolling" solve this? If it's true, I think I'm screwed. Is there any way around that? My algorithm is basically a glorified version of "the input pixel is this color, so look up my output pixel values in this giant array."
From my limited understanding of shader languages, your problem can easily be solved by using a texture instead of an array.
Pregenerate it on the CPU and save it as a texture (1024x1024 in your case).
Then use standard texture access functions as if the texture were the array, possibly with nearest-neighbor sampling to prevent blending of individual texels.
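For the XNA side, a minimal sketch of that idea might look like the following (inside your Game's LoadContent, say). The effect parameter name "LookupTable", the sampler slot, and the BuildNtscLookup helper are placeholders for whatever your port actually uses.

```csharp
// Pack the precomputed lookup table into a 1024x1024 texture once at load
// time, then bind it to the pixel shader. This upload happens once, not
// once per frame.
uint[] lookupData = BuildNtscLookup();   // hypothetical: the 1,048,576-entry table

Texture2D lookupTexture = new Texture2D(
    GraphicsDevice, 1024, 1024, false, SurfaceFormat.Color);
lookupTexture.SetData(lookupData);       // 4 bytes per entry matches SurfaceFormat.Color

// Hand the texture to the effect and use point (nearest-neighbor) filtering
// so texels are never blended. The sampler index depends on which register
// your effect binds the lookup texture to.
ntscEffect.Parameters["LookupTable"].SetValue(lookupTexture);
GraphicsDevice.SamplerStates[1] = SamplerState.PointClamp;
```

Inside the HLSL, a "variable index" i then becomes a texture coordinate, roughly float2((i % 1024) + 0.5, (i / 1024) + 0.5) / 1024.0 sampled with tex2D, so no actual array indexing (and no unrolling) is needed.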
I don't think this is possible if you want speed.
I'm working on image-related algorithms and was wondering what some good choices would be for unpredictable 2D traversal of structures, as follows:
Start from a set pixel
Traverse to a neighbor (up, left, down, right, or diagonal; always 1 away, no jumps).
Keep doing more of those traversals, moving to the next pixel with the direction at each step depending on some output (with a similar rate for each direction over time).
For simple traversal, arrays are just fine and very fast when reading sequentially; for random access not much can be done, but I'm wondering if something could be done here. If the image is simply stored in an array, going "up" can actually be pretty far away in memory on large images (5+ megapixels). Can you think of any good structures for this? I'm more than happy to trade memory for speed here, but I can't think of any structure that would help, short of actually making an array of items that each store a pointer to their 8 neighbors. That sounds like it would be slow to create in the first place, and since I'm not making more than 1 to 3 passes over these images, it may end up being slower than the naive array implementation.
Any suggestions are welcome.
Space-filling curves are what you are looking for. They provide a cache-friendly data layout in memory for neighbor-related operations.
However, be aware of premature optimization. Before doing any kind of optimizations, make sure you have a working algorithm. Then you can start optimizing it, taking benchmarks and comparing them with the base non-optimized version.
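To make that concrete, here is a minimal sketch (in C#, since no language was specified) of Morton/Z-order indexing, one common space-filling curve. It only changes how the pixel index is computed, not how the data is stored.

```csharp
// Morton (Z-order) indexing: pixels whose (x, y) coordinates are close tend
// to land close together in the 1D array, which is kinder to the cache than
// row-major order for the "step to a neighbor" access pattern described above.
static class ZOrder
{
    // Spread the low 16 bits of n so there is a zero bit between each bit.
    static uint Part1By1(uint n)
    {
        n &= 0x0000FFFF;
        n = (n | (n << 8)) & 0x00FF00FF;
        n = (n | (n << 4)) & 0x0F0F0F0F;
        n = (n | (n << 2)) & 0x33333333;
        n = (n | (n << 1)) & 0x55555555;
        return n;
    }

    // Interleave the bits of x and y: ... y1 x1 y0 x0
    public static uint Index(uint x, uint y)
    {
        return (Part1By1(y) << 1) | Part1By1(x);
    }
}
```

Storing pixel (x, y) at data[ZOrder.Index(x, y)] instead of data[y * width + x] means a step to any of the 8 neighbors usually flips only a few low bits of the index, so neighbors tend to share cache lines. Whether that actually beats the naive layout over just 1-3 passes is exactly the kind of thing to benchmark, as noted above.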
I got a (basic) voxel engine running and a water system that looks (and I assume basically works) like this: https://www.youtube.com/watch?v=Q_TdeGIOOts (not my game).
The water values are stored in a 3D array of floats, and every 0.05 s it calculates water flow by checking the voxels below and adjacent (y-1, x-1, x+1, z-1, z+1) and adding the value.
This system works fine (70+ fps) for small amounts of water, but when I start calculating water on 8+ chunks, it gets too much.
(I disabled all rendering and mesh creation to check whether that was the bottleneck; it isn't. It's purely the flow calculations.)
I am not a very experienced programmer, so I wouldn't know where to start optimizing, apart from running the calculations in a coroutine, which I already do.
In this post: https://gamedev.stackexchange.com/questions/55414/how-to-define-areas-filled-with-water (near the bottom) Boreal suggests running it in a compute shader. Is this the way to go for me? And how would I go about such a thing?
Any help is much appreciated.
If you're really calculating a voxel-based simulation, the number of calculations grows with the cube of the volume's side length, so you will quickly run out of processing power on larger volumes.
A compute shader is great for doing massively parallel calculations quickly, although it's a very different programming paradigm that takes some getting used to. A compute shader looks at the contents of a buffer (i.e., a 'texture' for us civilians) and does things to it very quickly -- in your case the buffer will probably be a buffer/texture whose values represent water cells. If you want to do something really simple like increment them up or down, the compute shader uses the parallel processing power of the GPU to do it really fast.
The hard part is that GPUs are optimized for parallel processing. This means that, without extra work on your part, each fragment of the buffer is processed with zero knowledge of what happens in the other fragments, so you can't simply write code like "texelA.value += texelB.value". To reference other texels you need to read the texture again somehow: some techniques read one texture multiple times with offsets (this GL example does this to implement blurs); others repeatedly process a texture, putting the result into a temporary texture, and then reprocessing that.
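Assuming you're in Unity (the mention of coroutines suggests it), a hedged sketch of just the C# dispatch side might look like this. The shader asset, the kernel name "FlowStep", and the buffer names are all placeholders; the HLSL kernel itself would read each cell's neighbors (y-1, x±1, z±1) from one buffer and write the new amount into the other.

```csharp
using UnityEngine;

// Sketch: drive a compute-shader water step from C#. The actual flow rule
// lives in the (not shown) "FlowStep" kernel of a compute shader asset.
public class WaterCompute : MonoBehaviour
{
    public ComputeShader waterShader;        // assumed asset containing "FlowStep"
    ComputeBuffer current, next;             // one float of water per cell
    int kernel;
    const int SizeX = 64, SizeY = 64, SizeZ = 64;

    void Start()
    {
        int count = SizeX * SizeY * SizeZ;
        current = new ComputeBuffer(count, sizeof(float));
        next    = new ComputeBuffer(count, sizeof(float));
        current.SetData(new float[count]);   // upload initial water levels
        kernel = waterShader.FindKernel("FlowStep");
    }

    void Step()
    {
        // Two buffers so every cell reads a consistent "previous" state
        // while writing the new one.
        waterShader.SetBuffer(kernel, "Current", current);
        waterShader.SetBuffer(kernel, "Next", next);
        // Thread-group counts assume the kernel declares [numthreads(8,8,8)].
        waterShader.Dispatch(kernel, SizeX / 8, SizeY / 8, SizeZ / 8);

        var tmp = current; current = next; next = tmp;   // ping-pong the buffers
    }

    void OnDestroy()
    {
        current.Release();
        next.Release();
    }
}
```

The win is that all cells update in parallel on the GPU; the cost is that the data now lives in GPU memory, and reading it back every frame (ComputeBuffer.GetData) would throw much of that win away, so ideally the rendering side consumes the buffer directly.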
At the 10,000-foot level: yes, a compute shader is a good tool for this kind of problem, since it involves tons of self-similar calculation. But it won't be easy to do off the bat. If you have not done conventional shader programming before, you may want to look at that first to get used to the way GPUs work. Even really basic tools (if-then-else or loops) have very different performance implications and uses in GPU programming, and it takes some time to get your head around the differences. As of this writing (1/10/13) it looks like Nvidia and Udacity are offering an intro compute-shader course, which might be a good way to get up to speed.
FWIW you also need pretty modern hardware for compute shaders, which may limit your audience.
I'm working on a "falling sand" style of game.
I've tried many ways of drawing the sand to the screen, however, each way seems to produce some problem in one form or another.
List of things I've worked through:
1. Drawing each pixel individually, one at a time, from a pixel-sized texture. Problem: it slowed down after about 100,000 pixels were changing per update.
2. Drawing each pixel to one big Texture2D, drawing the Texture2D, then clearing the data. Problems: using texture.SetPixel() is very slow, and even with disposing the old texture, it caused a small memory leak (about 30 KB per second, which added up quickly), even after calling Dispose on the object. I simply could not figure out how to stop it. Overall, however, this has been the best method (so far). If there is a way to stop that leak, I'd like to hear it.
3. Using LockBits from Bitmap. This worked wonderfully from the bitmap's perspective, but unfortunately I still had to convert the bitmap back to a Texture2D, which caused the frame rate to drop to less than one. So this has the potential to work very well, if I can find a way to draw the bitmap in XNA without converting it (or something).
4. Setting each pixel into a Texture2D with SetPixel, replacing the 'old' positions of pixels with transparent pixels and then setting the new positions with the proper color. This doubled the number of pixel sets necessary to finish the job, and was much, much slower than method 2.
So, my question is: any better ideas? Or ideas on how to fix approaches 2 or 3?
My immediate thought is that you are stalling the GPU pipeline. The GPU can have a pipeline that lags several frames behind the commands that you are issuing.
So if you issue a command to set data on a texture, and the GPU is currently using that texture to render an old frame, it must finish all of its rendering before it can accept the new texture data. So it waits, killing your performance.
The workaround for this might be to use several textures in a double- (or even triple- or quad-) buffer arrangement. Don't attempt to write to a texture that you have just used for rendering.
Also - you can write to textures from a thread other than your rendering thread. This might come in handy, particularly for clearing textures.
As you seem to have discovered, it's actually quicker to SetData in large chunks rather than issue many small SetData calls. The ideal size for a "chunk" differs between GPUs, but it is a fair bit bigger than a single pixel.
Also, creating a texture is much slower than reusing one in raw performance terms (if you ignore the pipeline effect I just described), so reuse that texture.
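Putting those pieces together, here is a rough XNA sketch of the double-buffer-plus-bulk-SetData approach; the fields are assumed to live in your Game class and the names are placeholders.

```csharp
// Two reused textures in a simple double buffer: each frame, upload the whole
// pixel array into the texture that was NOT just drawn, then draw it. This
// avoids stalling on a texture the GPU may still be reading, and it uploads
// the data in one big SetData call instead of many small ones.
Texture2D[] sandTextures;   // created once, e.g. in LoadContent
Color[] pixelData;          // width * height, updated by the sand simulation
int writeIndex;

void LoadSandTextures(int width, int height)
{
    sandTextures = new[]
    {
        new Texture2D(GraphicsDevice, width, height),
        new Texture2D(GraphicsDevice, width, height)
    };
    pixelData = new Color[width * height];
}

void DrawSand(SpriteBatch spriteBatch)
{
    Texture2D target = sandTextures[writeIndex];
    target.SetData(pixelData);          // one bulk upload, no per-pixel calls

    spriteBatch.Begin();
    spriteBatch.Draw(target, Vector2.Zero, Color.White);
    spriteBatch.End();

    writeIndex = 1 - writeIndex;        // write to the other texture next frame
}
```

Because the textures are created once and reused, there is also nothing to create or Dispose per frame, which should help with the leak described in approach 2.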
It's worth mentioning that a "pixel sprite" requires sending maybe 30 times as much data per pixel to the GPU as a texture does.
See also this answer, which has a few more details and some in-depth links if you want to go deeper.
I'm getting images from a C328R camera attached to a small Arduino robot. I want the robot to drive towards orange ping-pong balls and pick them up. I'm using the C# code supplied by funkotron76 at http://www.codeproject.com/KB/recipes/C328R.aspx.
Is there a library I can use to do this, or do I need to iterate over every pixel in the image looking for orange? If so, what kind of tolerance would I need to compensate for various lighting conditions?
I could probably test to figure out these numbers, but I'm hoping someone out there knows the answers.
Vision can be surprisingly difficult, especially as you try to tolerate varying conditions. A few good things to research include Blob Finding (searching for contiguous pixels matching certain criteria, usually a brightness threshold), Image Segmentation (can you have multiple balls in an image?) and general theory on Hue (most vision algorithms work with grayscale or binary images, so you'll first need to transform the image in a way that highlights orangeness as the criterion for selection).
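As a starting point for the hue idea, here's a hedged C# sketch; the numeric thresholds are rough guesses that would need tuning for your camera and lighting conditions.

```csharp
using System;

// Classify a pixel as "orange-ish" by converting RGB to hue/saturation and
// thresholding. The thresholds below are placeholders to tune experimentally.
static class BallFinder
{
    public static bool IsOrange(byte r, byte g, byte b)
    {
        float rf = r / 255f, gf = g / 255f, bf = b / 255f;
        float max = Math.Max(rf, Math.Max(gf, bf));
        float min = Math.Min(rf, Math.Min(gf, bf));
        float delta = max - min;
        if (delta < 0.001f)
            return false;               // gray pixel, hue is undefined

        // Hue in degrees: 0 = red, 60 = yellow, 120 = green, 240 = blue.
        float hue;
        if (max == rf)      hue = 60f * (((gf - bf) / delta) % 6f);
        else if (max == gf) hue = 60f * (((bf - rf) / delta) + 2f);
        else                hue = 60f * (((rf - gf) / delta) + 4f);
        if (hue < 0f) hue += 360f;

        float saturation = delta / max;

        // Orange sits between red and yellow; also require reasonable
        // saturation and brightness to reject shadows and washed-out areas.
        return hue > 10f && hue < 45f && saturation > 0.4f && max > 0.3f;
    }
}
```

Counting connected runs of pixels that pass this test, and taking the centroid of the largest cluster, is a bare-bones version of the blob finding mentioned above.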
Since you are presumably tracking these objects in real time as you move toward them, you might also be interested in learning about tracking models, such as the Kalman filter. It's overkill for what you're doing, but it's interesting and the basic ideas are helpful. Since you presumably know that the object should not be moving very quickly, you can use that fact to filter out false positives that could otherwise lead you to move away from the object. You can put together a simpler version of this kind of filtering by simply ignoring frames that have moved too far from the previous accepted frame (with a few boundary conditions to avoid getting stuck ignoring the object.)
As far as I know, certain mathematical functions like FFTs and perlin noise, etc. can be much faster when done on the GPU as a pixel shader. My question is, if I wanted to exploit this to calculate results and stream to bitmaps, could I do it without needing to actually display it in Silverlight or something?
More specifically, I was thinking of using this for large terrain generation involving lots of perlin and other noises, and post-processing like high passes and deriving normals from heightmaps, etc, etc.
The short answer is yes. The longer answer is that you can set (for example) a texture as the render target, which deposits your results there.
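In XNA-style C#, that looks roughly like the sketch below; "noiseEffect" and DrawFullScreenQuad are placeholders for your own shader and quad-drawing code, and the same render-to-texture idea exists under other names in most frameworks.

```csharp
// Render a full-screen quad through the noise/terrain shader into a
// RenderTarget2D instead of the back buffer, then read the pixels back.
RenderTarget2D target = new RenderTarget2D(GraphicsDevice, 1024, 1024);

GraphicsDevice.SetRenderTarget(target);    // shader output now goes to the texture
DrawFullScreenQuad(noiseEffect);           // hypothetical helper applying your effect
GraphicsDevice.SetRenderTarget(null);      // restore the back buffer

// Pull the generated data back to the CPU, e.g. to build a heightmap bitmap.
Color[] pixels = new Color[target.Width * target.Height];
target.GetData(pixels);
```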
Unless you're really set on using a shader to do the calculation, you might want to consider using something that's actually designed for this kind of job such as Cuda or OpenCL.
Hmm, it's a good question.
Anything that can be displayed can be rendered using an instance of WriteableBitmap and its Render method. You can access the output using the Pixels property, an int array of pixel values.
However, assuming GPU acceleration is turned on and the content is appropriately marked to make use of the GPU, I don't know whether such a render will actually make use of the GPU when going to a WriteableBitmap instead of the display.
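If you do go the WriteableBitmap route, a minimal Silverlight-flavored sketch might look like this, with "shadedElement" standing in for whatever UIElement hosts your pixel-shader Effect:

```csharp
// Render an element (with your shader applied as its Effect) into a
// WriteableBitmap, then read the raw pixels.
WriteableBitmap bitmap = new WriteableBitmap(1024, 1024);
bitmap.Render(shadedElement, new MatrixTransform());  // identity transform
bitmap.Invalidate();                                  // commit the render to the pixel buffer

// Pixels is an int[]; each entry is a premultiplied ARGB value.
int[] pixels = bitmap.Pixels;
byte alpha = (byte)((pixels[0] >> 24) & 0xFF);
```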