As far as I know, certain mathematical functions such as FFTs and Perlin noise can be much faster when computed on the GPU as a pixel shader. My question is: if I wanted to exploit this to calculate results and stream them to bitmaps, could I do it without actually having to display anything in Silverlight or something?
More specifically, I was thinking of using this for large terrain generation involving lots of Perlin and other noise, plus post-processing such as high-pass filtering and deriving normals from heightmaps.
The short answer is yes. The longer answer is that you can set (for example) a texture as the render target, which deposits your results there.
Unless you're really set on using a shader to do the calculation, you might want to consider using something that's actually designed for this kind of job, such as CUDA or OpenCL.
Hmm, it's a good question.
Anything that can be displayed can be rendered using an instance of WriteableBitmap and its Render method. You can access the output through the Pixels property (an int array of premultiplied ARGB pixel values).
However, even assuming GPU acceleration is turned on and the content is appropriately marked to make use of the GPU, I don't know whether such a render will actually use the GPU when it goes to a WriteableBitmap instead of the display.
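The plumbing itself is short, though. A rough sketch (the element and the bitmap size here are just placeholders):

    // Silverlight: render any UIElement (e.g. one with a pixel shader Effect applied)
    // into a WriteableBitmap and read the result back on the CPU.
    var bitmap = new WriteableBitmap(256, 256);
    bitmap.Render(someElement, null);   // 'someElement' is whatever you want rasterized
    bitmap.Invalidate();                // forces the render to actually happen
    int[] pixels = bitmap.Pixels;       // premultiplied ARGB, one int per pixel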
I got a (basic) voxel engine running and a water system that looks (and I assume basically works) like this: https://www.youtube.com/watch?v=Q_TdeGIOOts (not my game).
The water values are stored in a 3D array of floats, and every 0.05 s it calculates water flow by checking the voxel below and the adjacent voxels (y-1, x-1, x+1, z-1, z+1) and adding value to them.
This system works fine (70+ fps) for small amounts of water, but when I start calculating water on 8+ chunks it gets to be too much.
(I disabled all rendering and mesh creation to check whether that was the bottleneck; it isn't. It's purely the flow calculations.)
I am not a very experienced programmer, so I wouldn't know where to start optimizing, apart from making the calculations happen in a coroutine as I already did.
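For reference, the flow pass is roughly this shape (simplified, not my exact code; edge cells skipped for brevity):

    // Every tick, each cell pushes water to the cell below and then to its
    // four side neighbours (exact amounts omitted here).
    void FlowStep(float[,,] water, int sx, int sy, int sz)
    {
        for (int x = 1; x < sx - 1; x++)
            for (int y = 1; y < sy - 1; y++)
                for (int z = 1; z < sz - 1; z++)
                {
                    float amount = water[x, y, z];
                    if (amount <= 0f) continue;

                    float toBelow = Mathf.Min(amount, 1f - water[x, y - 1, z]);
                    if (toBelow > 0f)
                    {
                        water[x, y - 1, z] += toBelow;
                        water[x, y, z] -= toBelow;
                    }
                    // ...same idea for (x - 1), (x + 1), (z - 1) and (z + 1)...
                }
    }

So the cost scales with the total number of cells in the active chunks, whether or not they hold water.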
In this post: https://gamedev.stackexchange.com/questions/55414/how-to-define-areas-filled-with-water (near the bottom) Boreal suggests running it in a compute shader. Is this the way to go for me? And how would I go about such a thing?
Any help is much appreciated.
If you're really calculating a voxel-based simulation, the number of calculations grows with the volume (cubically in the linear size), so you will quickly run out of processing power on larger volumes.
A compute shader is great for doing massively parallel calculations quickly, although it's a very different programming paradigm that takes some getting used to. A compute shader looks at the contents of a buffer (i.e., a 'texture' for us civilians) and does things to it very quickly; in your case the buffer will probably be a buffer/texture whose pixel values represent water cells. If you want to do something really simple, like incrementing them up or down, the compute shader uses the parallel processing power of the GPU to do it really fast.
The hard part is that GPUs are optimized for independent, parallel processing. This means you can't just write code like "texelA.value += texelB.value": without extra work on your part, each fragment of the buffer is processed with zero knowledge of what happens in the other fragments. To reference other texels you need to read the texture again somehow. Some techniques read one texture multiple times with offsets (this GL example does this to implement blurs); others do it by repeatedly processing a texture, writing the result into a temporary texture, and then reprocessing that.
At the 10,000-foot level: yes, a compute shader is a good tool for this kind of problem, since it involves tons of self-similar calculation. But it won't be easy to do off the bat. If you have not done conventional shader programming before, you may want to look at that first to get used to the way GPUs work. Even really basic tools (if-then-else or loops) have very different performance implications and uses in GPU programming, and it takes some time to get your head around the differences. As of this writing (1/10/13) it looks like Nvidia and Udacity are offering an introductory compute shader course, which might be a good way to get up to speed.
FWIW you also need pretty modern hardware for compute shaders, which may limit your audience.
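To give a flavour of the C# side in Unity (a sketch only: the kernel name, buffer name, chunk size and thread-group size are placeholders, not working water code):

    using UnityEngine;

    public class WaterFlowDispatcher : MonoBehaviour
    {
        public ComputeShader flowShader;   // hypothetical .compute asset containing a "FlowStep" kernel
        ComputeBuffer waterBuffer;
        int kernel;
        const int Size = 32;               // assumed chunk dimension

        void Start()
        {
            kernel = flowShader.FindKernel("FlowStep");
            // One float per cell, flattened as x + y * Size + z * Size * Size.
            waterBuffer = new ComputeBuffer(Size * Size * Size, sizeof(float));
            waterBuffer.SetData(new float[Size * Size * Size]);
            flowShader.SetBuffer(kernel, "Water", waterBuffer);
            flowShader.SetInt("Size", Size);
        }

        void FixedUpdate()   // or whatever 0.05 s tick you already use
        {
            // One GPU thread per cell; assumes the kernel declares [numthreads(8,8,8)].
            flowShader.Dispatch(kernel, Size / 8, Size / 8, Size / 8);
            // Call waterBuffer.GetData(...) only when the CPU actually needs the values
            // back (e.g. for mesh generation) -- readbacks are the expensive part.
        }

        void OnDestroy()
        {
            if (waterBuffer != null) waterBuffer.Release();
        }
    }

The actual flow rule lives in the .compute kernel; the point is that a single Dispatch call replaces the whole triple loop on the CPU.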
I'm working on a "falling sand" style of game.
I've tried many ways of drawing the sand to the screen; however, each way seems to produce some problem in one form or another.
List of things I've worked through:
1. Drawing each pixel individually, one at a time, from a pixel-sized texture. Problem: it slowed down once about 100,000 pixels were changing per update.
2. Drawing each pixel into one big Texture2D, drawing that texture, then clearing the data. Problems: using texture.SetPixel() is very slow, and even with disposing of the old texture it caused a small memory leak (about 30 KB per second, which added up quickly), even after calling Dispose on the object. I simply could not figure out how to stop it. Overall, however, this has been the best method so far. If there is a way to stop that leak, I'd like to hear it.
3. Using LockBits on a Bitmap. This worked wonderfully from the bitmap's perspective, but unfortunately I still had to convert the bitmap back to a Texture2D, which dropped the frame rate to less than one. So this has the potential to work very well, if I can find a way to draw the bitmap in XNA without converting it (or something).
4. Setting each pixel into a Texture2D with SetPixel, replacing the 'old' position of each pixel with a transparent pixel and then setting the new position to the proper color. This doubled the number of pixel sets necessary to finish the job, and was much, much slower than method 2.
So, my question is: any better ideas? Or ideas on how to fix methods 2 or 3?
My immediate thought is that you are stalling the GPU pipeline. The GPU can have a pipeline that lags several frames behind the commands that you are issuing.
So if you issue a command to set data on a texture, and the GPU is currently using that texture to render an old frame, it must finish all of its rendering before it can accept the new texture data. So it waits, killing your performance.
The workaround for this might be to use several textures in a double- (or even triple- or quad-) buffer arrangement. Don't attempt to write to a texture that you have just used for rendering.
Also - you can write to textures from a thread other than your rendering thread. This might come in handy, particularly for clearing textures.
As you seem to have discovered, it's actually quicker to SetData in large chunks, rather than issue many, small SetData calls. Determining the ideal size for a "chunk" differs between GPUs - but it is a fair bit bigger than a single pixel.
Also, creating a texture is much slower than reusing one, in raw performance terms (if you ignore the pipeline effect I just described); so reuse that texture.
It's worth mentioning that a "pixel sprite" requires sending maybe 30 times as much data per pixel to the GPU as a texture does.
See also this answer, which has a few more details and some in-depth links if you want to go deeper.
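Putting the double-buffering and the "one big SetData" advice together, the XNA side might look something like this (a sketch, with names made up for illustration):

    // Keep two (or more) textures and alternate between them, uploading the whole
    // pixel array in one SetData call per frame instead of per-pixel writes.
    class SandSurface
    {
        Texture2D[] buffers;
        Color[] pixels;          // updated by the simulation each frame
        int current;

        public void Load(GraphicsDevice device, int width, int height)
        {
            buffers = new Texture2D[2];
            for (int i = 0; i < buffers.Length; i++)
                buffers[i] = new Texture2D(device, width, height, false, SurfaceFormat.Color);
            pixels = new Color[width * height];
        }

        public void Draw(SpriteBatch spriteBatch)   // assumes Begin/End is handled by the caller
        {
            current = (current + 1) % buffers.Length;   // never touch the texture the GPU just used
            buffers[current].SetData(pixels);
            spriteBatch.Draw(buffers[current], Vector2.Zero, Color.White);
        }
    }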
I can easily think of a number of situations where it would be useful to change a single pixel in a Texture2D, especially because of the performance hit and inconvenience you get when constantly doing GetData<>(); SetData<>(); every frame or drawing to a RenderTarget2D.
Is there any real reason not to expose setter methods for single pixels? If not, is there a way to modify a single pixel without using the methods above?
Texture data is almost always copied to video memory (VRAM) by the graphics driver when the texture is initialized, for performance reasons. This makes texture fetches by the shaders running on the GPU significantly faster; you would definitely not be happy if every texture cache miss had to fetch the missing data over the PCIe bus!
However, as you've noticed, this makes it difficult and/or slow for the CPU to read or modify the data. Not only is the PCIe bus relatively slow, but VRAM is generally not directly addressable by the CPU; data must usually be transferred using special low-level DMA commands. This is exactly why you see a performance hit when using XNA's GetData<>() and SetData<>(): it's not the function call overhead that's killing you, it's the fact that they have to copy data back and forth to VRAM behind your back.
If you want to modify data in VRAM, the low-level rendering API (e.g. OpenGL or Direct3D 11) gives you three options:
Temporarily "map" the pixel data before your changes (which involves copying it back to main memory) and "unmap" it when your edits are complete (to commit the changes back to VRAM). This is probably what GetData<>() and SetData<>() are doing internally.
Use a function like OpenGL's glTexSubImage2D(), which essentially skips the "map" step and copies the new pixel data directly back to VRAM, overwriting the previous contents.
Instruct the GPU to make the modifications on your behalf, by running a shader that writes to the texture as a render target.
XNA is built on top of Direct3D, so it has to work within these limitations as well. So, no raw pixel data for you!
(As an aside, everything above is also true for GPU buffer data.)
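For what it's worth, XNA does at least let you narrow the copy to a sub-rectangle, in the spirit of the second option above. A sketch, where 'texture' is assumed to be a Color-format Texture2D that isn't currently set on the device:

    // Update a small region of an existing texture instead of round-tripping all of it.
    Rectangle region = new Rectangle(10, 10, 4, 4);
    Color[] patch = new Color[region.Width * region.Height];
    for (int i = 0; i < patch.Length; i++)
        patch[i] = Color.Red;

    // Mip level 0, explicit rectangle: only these 16 texels cross the bus.
    texture.SetData(0, region, patch, 0, patch.Length);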
I've taken on quite a daunting challenge for myself. In my XNA game, I want to implement Blargg's NTSC filter. This is a C library that transforms a bitmap to make it look like it was output on a CRT TV with the NTSC standard. It's quite accurate, really.
The first thing I tried, a while back, was to just use the C library itself by calling it as a DLL. Here I had two problems: 1. I couldn't get some of the data to copy correctly, so the image was messed up; but more importantly, 2. it was extremely slow. It required getting the XNA Texture2D bitmap data, passing it through the filter, and then setting the data back on the texture. The framerate was ruined, so I couldn't go down this route.
Now I'm trying to translate the filter into a pixel shader. The problem here (if you're adventurous enough to look at the code; I'm using the SNES one because it's simplest) is that it handles very large arrays and relies on interesting pointer operations. I've done a lot of work rewriting the algorithm to work independently per pixel, as a pixel shader requires. But I don't know if this will ever work. I've come to you to see whether finishing this is even possible.
There's a precalculated array involved containing 1,048,576 integers. Is this alone beyond any limits for the pixel shader? It only needs to be set once, not once per frame.
Even if that's OK, I know that HLSL cannot index arrays by a variable. It has to unroll the access into a million if statements to get the correct array element. Will this kill the performance and make it a fruitless endeavor again? There are multiple array accesses per pixel.
Is there any chance that my original plan to use the library as is could work? I just need it to be fast.
I've never written a shader before. Is there anything else I should be aware of?
Edit: Addendum to #2. I just read somewhere that not only can HLSL not access arrays by a variable, but even to unroll it, the index has to be calculable at compile time. Is this true, or does the "unrolling" solve this? If it's true, I think I'm screwed. Is there any way around that? My algorithm is basically a glorified version of "the input pixel is this color, so look up my output pixel values in this giant array."
From my limited understanding of shader languages, your problem can easily be solved by using a texture instead of an array.
Pregenerate it on the CPU and then save it as a texture; 1024x1024 in your case.
Use the standard texture access functions as if the texture were the array, possibly with nearest-neighbour (point) sampling to prevent blending between individual texels.
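Something like this on the XNA side would do for building the lookup texture once at load time (a sketch; 'table' stands in for your precalculated int array, and how you split the bits depends on what the shader needs):

    // Pack the 1,048,576-entry table into a 1024x1024 Color texture.
    Texture2D BuildLookupTexture(GraphicsDevice device, int[] table)
    {
        var texels = new Color[1024 * 1024];
        for (int i = 0; i < texels.Length; i++)
        {
            int v = table[i];
            // Spread each 32-bit entry across the four 8-bit channels; the pixel
            // shader reads them back with a point-sampled texture lookup.
            texels[i] = new Color(v & 0xFF, (v >> 8) & 0xFF, (v >> 16) & 0xFF, (v >> 24) & 0xFF);
        }
        var lookup = new Texture2D(device, 1024, 1024, false, SurfaceFormat.Color);
        lookup.SetData(texels);
        return lookup;
    }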
I don't think this is possible if you want speed.
I am writing a C# control that wraps DirectX 9 and provides a simplified interface to perform 2D pixel level drawing. .NET requires that I wrap this code in an unsafe code block and compile with the allow unsafe code option.
I'm locking the entire surface which then returns a pointer to the locked area of memory. I can then write pixel data directly using "simple" pointer arithmetic. I have performance tested this and found a substantial speed improvement over other "safe" methods I know of.
Is this the fastest way to manipulate individual pixels in a C# .NET application? Is there a better, safer way? If there was an equally fast approach that does not require pointer manipulation it would be my preference to use that.
(I know this is 2008 and we should all be using Direct3D, OpenGL, etc.; however, this control is to be used exclusively for 2D pixel rendering and simply does not require 3D rendering.)
Using unsafe pointers is the fastest way to do direct memory manipulation in C# (definitely faster than using the Marshal wrapper functions).
Just out of curiosity, what sort of 2D drawing operations are you trying to perform?
I ask because locking a DirectX surface to do pixel level manipulations will defeat most of the hardware acceleration benefits that you would hope to gain from using DirectX. Also, the DirectX device will fail to initialize when used over terminal services (remote desktop), so the control will be unusable in that scenario (this may not matter to you).
DirectX will be a big win when drawing large triangles and transforming images (texture mapped onto a quad), but it won't really perform that great with single pixel manipulation.
Staying in .NET land, one alternative is to keep around a Bitmap object to act as your surface, using LockBits and directly accessing the pixels through the unsafe pointer in the returned BitmapData object.
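A minimal sketch of that approach (it needs the /unsafe compiler option, as with your current code; the fill operation here is just for illustration):

    using System.Drawing;
    using System.Drawing.Imaging;

    // Lock the bitmap, write 32-bit ARGB pixels through the raw pointer, unlock.
    static void FillRow(Bitmap surface, int y, int argb)
    {
        var rect = new Rectangle(0, 0, surface.Width, surface.Height);
        BitmapData data = surface.LockBits(rect, ImageLockMode.WriteOnly, PixelFormat.Format32bppArgb);
        try
        {
            unsafe
            {
                int* row = (int*)((byte*)data.Scan0.ToPointer() + y * data.Stride);
                for (int x = 0; x < surface.Width; x++)
                    row[x] = argb;   // one pixel per 32-bit write
            }
        }
        finally
        {
            surface.UnlockBits(data);
        }
    }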
Yes, that is probably the fastest way.
A few years ago I had to compare two 1024x1024 images at the pixel level; the get-pixel methods took 2 minutes, and the unsafe scan took 0.01 seconds.
I have also used unsafe code to speed up things of that nature. The performance improvements are dramatic, to say the least. The point here is that unsafe code bypasses safety checks, such as array bounds checking, that you might not need as long as you know what you're doing.
Also, check out DirectDraw. It is the 2D graphics component of DirectX. It is really fast.
I recently was tasked with creating a simple histogram control for one of our thin client apps (C#). The images that I was analyzing were about 1200x1200 and I had to go the same route. I could make the thing draw itself once with no problem, but the control needed to be re-sizable. I tried to avoid it, but I had to get at the raw memory itself.
I'm not saying it is impossible using the standard .NET classes, but I couldn't get it to work in the end.