I can easily think of a number of situations where it would be useful to change a single pixel in a Texture2D, especially given the performance hit and inconvenience of constantly calling GetData<>() and SetData<>() every frame, or of drawing to a RenderTarget2D.
Is there any real reason not to expose setter methods for single pixels? If not, is there a way to modify a single pixel without using the methods above?
Texture data is almost always copied to video memory (VRAM) by the graphics driver when the texture is initialized, for performance reasons. This makes texture fetches by the shaders running on the GPU significantly faster; you would definitely not be happy if every texture cache miss had to fetch the missing data over the PCIe bus!
However, as you've noticed, this makes it difficult and/or slow for the CPU to read or modify the data. Not only is the PCIe bus relatively slow, but VRAM is generally not directly addressable by the CPU; data must usually be transferred using special low-level DMA commands. This is exactly why you see a performance hit when using XNA's GetData<>() and SetData<>(): it's not the function call overhead that's killing you, it's the fact that they have to copy data back and forth to VRAM behind your back.
If you want to modify data in VRAM, the low-level rendering API (e.g. OpenGL or Direct3D 11) gives you three options:
Temporarily "map" the pixel data before your changes (which involves copying it back to main memory) and "unmap" it when your edits are complete (to commit the changes back to VRAM). This is probably what GetData<>() and SetData<>() are doing internally.
Use a function like OpenGL's glTexSubImage2D(), which essentially skips the "map" step and copies the new pixel data directly back to VRAM, overwriting the previous contents.
Instruct the GPU to make the modifications on your behalf, by running a shader that writes to the texture as a render target.
XNA is built on top of Direct3D, so it has to work within these limitations as well. So, no raw pixel data for you!
(As an aside, everything above is also true for GPU buffer data.)
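If you are stuck with SetData<>(), one small mitigation is to restrict the update to the region you actually changed, so the driver copies as little data as possible. A rough sketch, assuming XNA 4.0 and an existing Texture2D called texture (the coordinates are placeholder values):

    // Update only a 1x1 sub-rectangle of mip level 0; still a CPU -> VRAM
    // transfer, but a tiny one rather than the whole texture.
    int x = 10, y = 20;                  // the pixel to change (placeholders)
    Color[] onePixel = { Color.Red };
    texture.SetData<Color>(0, new Rectangle(x, y, 1, 1), onePixel, 0, 1);

This doesn't avoid the transfer or the potential pipeline stall, but it keeps the amount of data crossing the bus to a minimum.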
I am using OpenGL (via OpenTK) to perform spatial queries on lots of point cloud data on the GPU. Each frame of data is around 200k points. This works well for low numbers of queries (under 10 queries at 60 fps) but does not scale as more are performed per data frame (100 queries runs at about 6 fps).
I would have expected modern GPUs to be able to chew through 20 million points (200k * 100 queries) from 100 draw calls without breaking a sweat, especially since each glDrawArrays uses the same VBO.
A 'spatial query' consists of setting some uniforms and issuing a glDrawArrays call. The geometry shader then chooses to emit or not emit a vertex based on the result of the query. I have tried with and without branching and it makes no difference. The VBO uses separated attributes: one is STATIC_DRAW and the other is DYNAMIC_DRAW (updated before each frame of spatial queries). Transform feedback then collects the data.
Profiling shows that glGetQueryObject is by far the slowest call (probably blocking; 5600 inclusive samples compared to 127 for glDrawArrays), but I'm not sure how to improve this. I tried making lots of little result buffers in GPU memory and binding a new transform feedback buffer for each query, but this had no effect - perhaps due to running on a single core? The other option would be to read the video memory from the previous query from another thread, but this throws an Access Violation and I'm unsure whether the gains would be significant.
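For reference, here is roughly what a single query looks like, as a minimal sketch using OpenTK's GL bindings (buffer, shader, and transform feedback setup omitted; exact enum names vary slightly between OpenTK versions):

    // One spatial query: draw the point cloud with the query shader and count
    // how many vertices the geometry shader emitted into transform feedback.
    int query;
    GL.GenQueries(1, out query);

    GL.BeginQuery(QueryTarget.TransformFeedbackPrimitivesWritten, query);
    GL.BeginTransformFeedback(TransformFeedbackPrimitiveType.Points);
    GL.DrawArrays(PrimitiveType.Points, 0, pointCount);   // ~200k points
    GL.EndTransformFeedback();
    GL.EndQuery(QueryTarget.TransformFeedbackPrimitivesWritten);

    // This is the call that blocks until the GPU has finished the query:
    int primitivesWritten;
    GL.GetQueryObject(query, GetQueryObjectParam.QueryResult, out primitivesWritten);

    // A non-blocking alternative is to poll availability and collect later:
    // int available;
    // GL.GetQueryObject(query, GetQueryObjectParam.QueryResultAvailable, out available);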
Any thoughts on how to improve performance? Am I missing something obvious like a debug mode that needs switching off?
I'm working on a "falling sand" style of game.
I've tried many ways of drawing the sand to the screen; however, each way seems to produce some problem in one form or another.
List of things I've worked through:
1. Drawing each pixel individually, one at a time, from a pixel-sized texture. Problem: it slowed down after about 100,000 pixels were changing per update.
2. Drawing each pixel to one big Texture2D, drawing the Texture2D, then clearing the data. Problems: using texture.SetPixel() is very slow, and even with disposing of the old texture it would cause a small memory leak (about 30 KB per second, which added up quickly), even after calling Dispose on the object. I simply could not figure out how to stop it. Overall, however, this has been the best method (so far). If there is a way to stop that leak, I'd like to hear it.
3. Using LockBits on a Bitmap. This worked wonderfully from the bitmap's perspective, but unfortunately I still had to convert the bitmap back to a Texture2D, which would cause the frame rate to drop to less than one. So this has the potential to work very well, if I can find a way to draw the bitmap in XNA without converting it (or something).
4. Setting each pixel into a Texture2D with SetPixel, replacing the 'old' positions of pixels with transparent pixels, then setting the new positions with the proper color. This doubled the number of pixel sets necessary to finish the job, and was much, much slower than number 2.
So, my question is, any better ideas? Or ideas on how to fix styles 2 or 3?
My immediate thought is that you are stalling the GPU pipeline. The GPU can have a pipeline that lags several frames behind the commands that you are issuing.
So if you issue a command to set data on a texture, and the GPU is currently using that texture to render an old frame, it must finish all of its rendering before it can accept the new texture data. So it waits, killing your performance.
The workaround for this might be to use several textures in a double- (or even triple- or quad-) buffer arrangement. Don't attempt to write to a texture that you have just used for rendering.
Also - you can write to textures from a thread other than your rendering thread. This might come in handy, particularly for clearing textures.
As you seem to have discovered, it's actually quicker to SetData in large chunks rather than issue many small SetData calls. The ideal size for a "chunk" differs between GPUs - but it is a fair bit bigger than a single pixel.
Also, creating a texture is much slower than reusing one, in raw performance terms (if you ignore the pipeline effect I just described); so reuse that texture.
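Putting those last points together, here is a rough sketch of the pattern I mean, assuming XNA 4.0 (the dimensions and the simulation step are placeholders): two reused textures written in alternation, with a single SetData call per update for the whole pixel array.

    using Microsoft.Xna.Framework;
    using Microsoft.Xna.Framework.Graphics;

    public class SandGame : Game
    {
        const int Width = 640, Height = 480;          // placeholder dimensions

        GraphicsDeviceManager graphics;
        SpriteBatch spriteBatch;
        Texture2D[] sandTextures = new Texture2D[2];  // double buffer: never write the texture just drawn
        Color[] sandPixels;
        int current;

        public SandGame()
        {
            graphics = new GraphicsDeviceManager(this);
        }

        protected override void LoadContent()
        {
            spriteBatch = new SpriteBatch(GraphicsDevice);
            sandTextures[0] = new Texture2D(GraphicsDevice, Width, Height);  // created once, reused
            sandTextures[1] = new Texture2D(GraphicsDevice, Width, Height);
            sandPixels = new Color[Width * Height];
        }

        protected override void Update(GameTime gameTime)
        {
            current = (current + 1) % 2;
            // ... run the sand simulation and fill sandPixels here ...
            sandTextures[current].SetData(sandPixels);  // one large transfer instead of many tiny ones
            base.Update(gameTime);
        }

        protected override void Draw(GameTime gameTime)
        {
            GraphicsDevice.Clear(Color.Black);
            spriteBatch.Begin();
            spriteBatch.Draw(sandTextures[current], Vector2.Zero, Color.White);
            spriteBatch.End();
            base.Draw(gameTime);
        }
    }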
It's worth mentioning that a "pixel sprite" requires sending maybe 30 times as much data per pixel to the GPU as a texture does.
See also this answer, which has a few more details and some in-depth links if you want to go deeper.
I've taken on quite a daunting challenge for myself. In my XNA game, I want to implement Blargg's NTSC filter. This is a C library that transforms a bitmap to make it look like it was output on a CRT TV with the NTSC standard. It's quite accurate, really.
The first thing I tried, a while back, was to just use the C library itself by calling it as a DLL. Here I had two problems: 1. I couldn't get some of the data to copy correctly, so the image was messed up, but more importantly, 2. it was extremely slow. It required getting the XNA Texture2D bitmap data, passing it through the filter, and then setting the data back on the texture. The framerate was ruined, so I couldn't go down this route.
Now I'm trying to translate the filter into a pixel shader. The problem here (if you're adventurous enough to look at the code - I'm using the SNES one because it's simplest) is that it handles very large arrays and relies on interesting pointer operations. I've done a lot of work rewriting the algorithm to work independently per pixel, as a pixel shader will require. But I don't know if this will ever work. I've come to you to see if finishing this is even possible.
1. There's a precalculated array involved containing 1,048,576 integers. Is this alone beyond any limits for the pixel shader? It only needs to be set once, not once per frame.
2. Even if that's OK, I know that HLSL cannot index arrays with a variable. It has to unroll the access into a million if statements to reach the correct array element. Will this kill the performance and make it a fruitless endeavor again? There are multiple array accesses per pixel.
3. Is there any chance that my original plan to use the library as-is could work? I just need it to be fast.
4. I've never written a shader before. Is there anything else I should be aware of?
Edit: Addendum to #2. I just read somewhere that not only can HLSL not index arrays with a variable, but even to unroll the access, the index has to be calculable at compile time. Is this true, or does the "unrolling" solve this? If it's true, I think I'm screwed. Any way around that? My algorithm is basically a glorified version of "the input pixel is this color, so look up my output pixel values in this giant array."
From my limited understanding of shader languages, your problem can easily be solved by using a texture instead of an array.
Pregenerate it on the CPU and then save it as a texture - 1024x1024 in your case.
Use standard texture access functions as if the texture were the array, possibly with nearest-neighbor sampling to limit blending of individual pixels.
I don't think this is possible if you want speed.
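To make the lookup-texture suggestion above concrete, here is a rough sketch of the pregeneration step, assuming XNA 4.0; BuildLookupTable and the effect parameter name LookupTexture are placeholders for your own code:

    // Pack the 1,048,576 precomputed integers into a 1024x1024 texture once, at load time.
    // SetData<int> works here because an int and a Color texel are both 4 bytes.
    int[] lutData = BuildLookupTable();   // placeholder: your precalculated array
    Texture2D lut = new Texture2D(GraphicsDevice, 1024, 1024, false, SurfaceFormat.Color);
    lut.SetData(lutData);

    // The pixel shader then samples this texture (with point filtering)
    // instead of indexing a giant array.
    effect.Parameters["LookupTexture"].SetValue(lut);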
Hello, I have a heavy graphics application where I have to redraw the graphics every 2-10 seconds; this interval varies depending on the source application, which sends data to my application via UDP.
Some of the graphics are static (they never change), some are semi-dynamic (they are updated occasionally but normally remain unchanged), and all the other graphics are dynamic; there are approximately 8,000 dynamic objects.
I am working in C# and have learned about the two techniques given in the title. Which one will be more efficient in this case? Help required.
Thanks in advance.
How large are your objects?
One probably can't predict what's more efficient here; it depends on everything: the type of the objects, the size of the objects, the complexity of converting the data to visible graphics, and most of all the speed of your network connection, which will limit your application.
In the end you probably want to try both and measure their performance. Even then you might want to implement it as a setting so the user can flip between the two.
As far as I know, certain mathematical functions such as FFTs, Perlin noise, etc. can be much faster when done on the GPU as a pixel shader. My question is: if I wanted to exploit this to calculate results and stream them to bitmaps, could I do it without needing to actually display it in Silverlight or something?
More specifically, I was thinking of using this for large terrain generation involving lots of Perlin and other noise, and post-processing like high passes and deriving normals from heightmaps, etc.
The short answer is yes. The longer answer is that you can set (for example) a texture as the render target, which deposits your results there.
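As a rough illustration of that idea, assuming an XNA-style GraphicsDevice (as in desktop XNA or Silverlight 5's 3D API); the noise effect and the full-screen quad drawing are placeholders:

    // Run a pixel shader over an offscreen render target, then read the results back.
    RenderTarget2D target = new RenderTarget2D(GraphicsDevice, 1024, 1024,
                                               false, SurfaceFormat.Color, DepthFormat.None);

    GraphicsDevice.SetRenderTarget(target);
    // ... draw a full-screen quad with the noise/post-processing effect applied ...
    GraphicsDevice.SetRenderTarget(null);

    // Copy the results back to the CPU; this readback is the slow part.
    Color[] results = new Color[1024 * 1024];
    target.GetData(results);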
Unless you're really set on using a shader to do the calculation, you might want to consider using something that's actually designed for this kind of job, such as CUDA or OpenCL.
Hmm, it's a good question.
Anything that can be displayed can be rendered using an instance of WriteableBitmap and its Render method. You can access the output using the Pixels array property (one packed int per pixel).
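A minimal sketch of that approach, assuming Silverlight's WriteableBitmap (element stands for whatever UIElement hosts the shader-processed content):

    // Render a UIElement offscreen and read the resulting pixels.
    WriteableBitmap bmp = new WriteableBitmap(1024, 1024);
    bmp.Render(element, null);   // null = no extra transform
    bmp.Invalidate();            // force the render to complete

    int firstPixel = bmp.Pixels[0];   // premultiplied ARGB packed into an int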
However, whether such a render will actually make use of the GPU when going to a WriteableBitmap instead of the display (assuming GPU acceleration is turned on and the content is appropriately marked to make use of the GPU), I don't know.