I got a (basic) voxel engine running and a water system that looks (and I assume basically works) like this: https://www.youtube.com/watch?v=Q_TdeGIOOts (not my game).
The water values are stored in a 3d Array of floats, and every 0.05s it calculates water flow by checking the voxel below and adjacent (y-1, x-1, x+1, z-1, z+1) and adds the value.
This system works fine (70+ fps) for small amounts of water, but when I start calculating water on 8+ chunks, it gets too much.
(I disabled all rendering and mesh creation to check whether that is the bottleneck; it isn't. It's purely the flow calculations.)
I am not a very experienced programmer, so I wouldn't know where to start optimizing, apart from making the calculations happen in a coroutine, as I already did.
In this post: https://gamedev.stackexchange.com/questions/55414/how-to-define-areas-filled-with-water (near the bottom) Boreal suggests running it in a compute shader. Is this the way to go for me? And how would I go about such a thing?
Any help is much appreciated.
If you're really calculating a voxel based simulation, the number of calculations grows with the cube of the volume's side length (doubling the size in every direction means eight times as many cells), so you will quickly run out of processing power on larger volumes.
A compute shader is great for doing massively parallel calculations quickly, although it's a very different programming paradigm that takes some getting used to. A compute shader will look at the contents of a buffer (i.e., a 'texture' for us civilians) and do things to it very quickly -- in your case the buffer will probably be a buffer/texture whose pixel values represent water cells. If you want to do something really simple like increment them up or down, the compute shader uses the parallel processing power of the GPU to do it really fast.
The hard part is that GPUs are optimized for parallel processing. This means that you can't write code like "texelA.value += texelB.value" - without extra work on your part, each fragment of the buffer is processed with zero knowledge of what happens in the other fragments. To reference other texels you need to read the texture again somehow. Some techniques read one texture multiple times with offsets (this GL example does this to implement blurs); others repeatedly process a texture, putting the result into a temporary texture and then reprocessing that.
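The "read from one buffer, write into a temporary one, then swap" idea can be tried out on the CPU before touching any shader code. Below is a minimal sketch in Python, assuming a single 1D row of water cells that only levels out sideways; the function name step and the rate parameter are made up for illustration and are not taken from the engine in the question. The point is that every output cell is computed only from the read buffer, which is the property a GPU needs.

```python
def step(src, rate=0.25):
    """One simulation tick. Reads only from `src`, writes only to a new list.

    Each output cell depends on the *old* values of itself and its two
    neighbours, so all cells could be computed in parallel in any order.
    `rate` must stay below 0.5 for this simple scheme to remain stable.
    """
    n = len(src)
    dst = [0.0] * n
    for i in range(n):
        # At the edges, pretend the missing neighbour has the same level,
        # which means no water flows across the boundary.
        left = src[i - 1] if i > 0 else src[i]
        right = src[i + 1] if i < n - 1 else src[i]
        dst[i] = src[i] + rate * (left - src[i]) + rate * (right - src[i])
    return dst


if __name__ == "__main__":
    water = [4.0, 0.0, 0.0, 0.0]
    for _ in range(3):
        water = step(water)  # the returned list becomes the next read buffer
        print(water)
```

A 3D version would do the same thing with more neighbours and a rule that favours the cell below; in a compute shader the two lists would become two buffers or textures that are swapped every tick.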
At the 10,000 foot level: yes, a compute shader is a good tool for this kind of problem since it involves tons of self-similar calculation. But it won't be easy to do off the bat. If you have not done conventional shader programming before, you may want to look at that first to get used to the way GPUs work. Even really basic tools (if-then-else or loops) have very different performance implications and uses in GPU programming, and it takes some time to get your head around the differences. As of this writing (1/10/13) it looks like Nvidia and Udacity are offering an intro to compute shader course which might be a good way to get up to speed.
FWIW you also need pretty modern hardware for compute shaders, which may limit your audience.
I'm working on image related algorithms and was wondering what some good choices would be for unpredictable 2D traversal of structures as such:
Start from a set pixel
Traverse to a neighbor (up left down right or diagonal, always 1 away, no jumps).
Keep doing more of those traversals to the next pixel with the next direction depending on some output (similar rate of each direction over time).
For simple traversal, arrays are just fine and very fast when reading sequentially. For random access not much can be done, but I'm wondering if something could be done here: if the image is simply stored in an array, going "up" could actually be pretty far away in memory on large images (5+ megapixels). Can you think of any good structures for this? I'm more than happy to trade memory for speed here, but I can't think of any structure that could help short of actually making an array of items that each store a pointer to their 8 neighbors. That sounds like it would be slow to create to begin with, and I'm not making more than 1 to 3 passes on those images, so it may end up being slower than the naive array implementation.
Any suggestions are welcome.
Space-filling curves are what you are looking for. They provide cache-friendly data layout in memory for neighbor-related operations.
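One of the simplest space-filling curves is the Z-order (Morton) curve, which interleaves the bits of x and y so that pixels close together in 2D tend to land close together in memory. Here is a minimal sketch in Python, assuming coordinates fit in 16 bits; the function names are my own, and whether it actually beats a plain row-major array for this access pattern is something only a benchmark can tell.

```python
def part1by1(v):
    """Spread the low 16 bits of v apart, leaving a zero bit between each."""
    v &= 0x0000FFFF
    v = (v | (v << 8)) & 0x00FF00FF
    v = (v | (v << 4)) & 0x0F0F0F0F
    v = (v | (v << 2)) & 0x33333333
    v = (v | (v << 1)) & 0x55555555
    return v


def morton_index(x, y):
    """Array index of pixel (x, y) in Z-order: x bits in even positions, y bits in odd."""
    return part1by1(x) | (part1by1(y) << 1)


if __name__ == "__main__":
    # The four pixels of each 2x2 block end up next to each other in memory.
    for y in range(2):
        for x in range(2):
            print((x, y), morton_index(x, y))
```

Note that indexing this way means the backing array has to cover a square whose side is a power of two, so a 5 megapixel image would need some padding. A Hilbert curve has better locality but is more expensive to index.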
However, be aware of premature optimization. Before doing any kind of optimizations, make sure you have a working algorithm. Then you can start optimizing it, taking benchmarks and comparing them with the base non-optimized version.
I am looking into making a game for Windows Phone and Windows 8 RT. The first iteration of the game will use XNA for the UI.
But since I plan to have other iterations that may not use XNA, I am writing my core game logic in a Portable Class Library.
I have gotten to the part where I am calculating vector math (sprite locations) in the core game logic.
As I was figuring this out, I had a co-worker tell me that I should make sure that I am doing these calculations on the GPU (not the CPU).
So, here is the question, if I use XNA vector libraries to do my vector calculations, are they automatically done on the GPU?
Side Question: If not, should they be done on the GPU? Or is it OK for me to do them in my Portable Class Library and have the CPU run them?
Note: If I need to have XNA do them so that I can use the GPU then it is not hard to inject that functionality into my core logic from XNA. I just want to know if it is something I should really be doing.
Note II: My game is a 2D game. It will be calculating movement of bad guys and projectiles along a vector. (Meaning this is not a huge 3D Game.)
I think your co-worker is mistaken. Here are just two of the reasons that doing this kind of calculation on the GPU doesn't make sense:
The #1 reason, by a very large margin, is that it's not cheap to get data onto the GPU in the first place. And then it's extremely expensive to get data back from the GPU.
The #2 reason is that the GPU is good for doing parallel calculations - that is - it does the same operation on a large amount of data. The kind of vector operations you will be doing are many different operations, on a small-to-medium amount of data.
So you'd get a huge win if - say - you were doing a particle system on the GPU. It's a large amount of homogeneous data, you perform the same operation on each particle, and all the data can live on the GPU.
Even XNA's built-in SpriteBatch does most of its per-sprite work on the CPU (everything but the final, overall matrix transformation). While it could do per-sprite transforms on the GPU (and I think it used to in XNA 3), it doesn't. This allows it to reduce the amount of data it needs to send the GPU (a performance win), and makes it more flexible - as it leaves the vertex shader free for your own use.
These are great reasons to use the CPU. I'd say if it's good enough for the XNA team - it's good enough for you :)
Now, what I think your co-worker may have meant - rather than the GPU - was to do the vector maths using SIMD instructions (on the CPU).
These give you a performance win. For example - adding a vector usually requires you to add the X component, and then the Y component. Using SIMD allows the CPU to add both components at the same time.
Sadly Microsoft's .NET runtime does not currently make (this kind of) use of SIMD instructions. It's supported in Mono, though.
So, here is the question, if I use XNA vector libraries to do my vector calculations, are they automatically done on the GPU?
Looking inside the Vector class in XNA using ILSpy reveals that the XNA Vector libraries do not use the graphics card for vector math.
I am looking for a way to approximate a volume of fluid moving over a heightmap. The easiest solution I can think of is to approximate it as a large number of non-drawn spheres, of small diameter (<0.1m). I would then place a visible plane representing the surface of the water on "top" of the spheres, at the locations they came to rest. To my knowledge, no managed physics engines contain a built in fluid simulator, hence the question.
Implementation would consist of using a physics engine such as JigLibX, which is capable of simulating the motion of the spheres. To determine the height of the planes, I was thinking of averaging the maximum height of each sphere that is on the top layer of a grouping.
I don't expect performance to be great, but would it be approachable for real time? If not, could I use this simulation to pre-bake lines of flow?
I hope this makes sense, I really want opinions/suggestions as to whether this is feasible, or if there is a better way of approaching this.
Thanks for any help, Venatu
(If it's relevant, my target platform is XNA 4.0, using C#. Windows only at this point in time, so PhysX/Havok are possibilities for the simulation, but I would prefer a managed solution.)
I haven't seen realistic fluid dynamics in real time without using something like PhysX as of yet - probably because the calculations needed are so complicated! The problem with your approach as I see it would come with the resting contact of all those spheres as they settled down, which takes up a lot of processing power. Lots of resting contact points are notorious for eating into performance very quickly, even on the most powerful of desktops.
If you are going down this route then I'd recommend modelling the fluid as an elastic but solid body using spring based physics, where the force applied to one part of the water would use springs to propagate out to the rest. This gives you the option of setting a breaking point for the springs and separating the body into two or more bodies when that happens (and the reverse for coming back together.) This can give you the foundation for things like spray. It's also a more versatile approach in terms of performance, because you can choose the number of particles and springs you use to approximate your model.
It's a big and complicated topic, but I hope that provided at least some insight!
The most popular method to simulate fluids in real-time is Smoothed-particle hydrodynamics.
Several useful links:
http://en.wikipedia.org/wiki/Smoothed-particle_hydrodynamics
http://http.developer.nvidia.com/GPUGems/gpugems_ch38.html
http://www.plunk.org/~trina/thesis/html/thesis_toc.html
In addition to simulation itself you will also need some specialized broad-phase collision detection algorithms such as sweep-and-prune or hashing cells.
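To give an idea of what "hashing cells" means in practice: particles are bucketed into a uniform grid, and a neighbour query only has to look at the particle's own cell and the cells around it instead of at every other particle. A minimal sketch in Python follows, in 2D for brevity and assuming the search radius is no larger than the cell size; build_grid and neighbors are hypothetical names, not part of any physics library.

```python
from collections import defaultdict


def build_grid(points, cell_size):
    """Bucket point indices by the integer grid cell that contains them."""
    grid = defaultdict(list)
    for i, (x, y) in enumerate(points):
        grid[(int(x // cell_size), int(y // cell_size))].append(i)
    return grid


def neighbors(points, grid, i, radius, cell_size):
    """Indices of points within `radius` of points[i].

    Only the 3x3 block of cells around the point is scanned, which is
    enough as long as radius <= cell_size.
    """
    px, py = points[i]
    cx, cy = int(px // cell_size), int(py // cell_size)
    found = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for j in grid.get((cx + dx, cy + dy), ()):
                if j == i:
                    continue
                qx, qy = points[j]
                if (qx - px) ** 2 + (qy - py) ** 2 <= radius ** 2:
                    found.append(j)
    return found


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.5, 0.0), (3.0, 3.0), (-0.4, 0.1)]
    grid = build_grid(pts, cell_size=1.0)
    print(sorted(neighbors(pts, grid, 0, radius=1.0, cell_size=1.0)))
```

For SPH the cell size is usually picked to match the smoothing radius, and the grid is rebuilt every step since the particles move.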
And you're right, there are no complete 2d solutions for fluid dynamics.
I've taken on quite a daunting challenge for myself. In my XNA game, I want to implement Blargg's NTSC filter. This is a C library that transforms a bitmap to make it look like it was output on a CRT TV with the NTSC standard. It's quite accurate, really.
The first thing I tried, a while back, was to just use the C library itself by calling it as a dll. Here I had two problems, 1. I couldn't get some of the data to copy correctly so the image was messed up, but more importantly, 2. it was extremely slow. It required getting the XNA Texture2D bitmap data, passing it through the filter, and then setting the data again to the texture. The framerate was ruined, so I couldn't go down this route.
Now I'm trying to translate the filter into a pixel shader. The problem here (if you're adventurous to look at the code - I'm using the SNES one because it's simplest) is that it handles very large arrays, and relies on interesting pointer operations. I've done a lot of work rewriting the algorithm to work independently per pixel, as a pixel shader will require. But I don't know if this will ever work. I've come to you to see if finishing this is even possible.
There's a precalculated array involved containing 1,048,576 integers. Is this alone beyond any limits for the pixel shader? It only needs to be set once, not once per frame.
Even if that's ok, I know that HLSL cannot index arrays by a variable. It has to unroll it into a million if statements to get the correct array element. Will this kill the performance and make it a fruitless endeavor again? There are multiple array accesses per pixel.
Is there any chance that my original plan to use the library as is could work? I just need it to be fast.
I've never written a shader before. Is there anything else I should be aware of?
edit: Addendum to #2. I just read somewhere that not only can hlsl not access arrays by variable, but even to unroll it, the index has to be calculable at compile time. Is this true, or does the "unrolling" solve this? If it's true I think I'm screwed. Any way around that? My algorithm is basically a glorified version of "the input pixel is this color, so look up my output pixel values in this giant array."
From my limited understanding of shader languages, your problem can easily be solved by using a texture instead of an array.
Pregenerate it on the CPU and then save it as a texture: 1024x1024 in your case, since 1024 * 1024 = 1,048,576.
Use the standard texture access functions as if the texture were the array, possibly with nearest-neighbor (point) sampling to avoid blending neighboring entries together.
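The only real work is turning a flat array index into a texture coordinate. Here is a minimal sketch of that arithmetic, written in Python just to check the numbers (in the shader it would be the same few operations in HLSL). It assumes a 1024x1024 texture with the table stored row by row; index_to_uv and uv_to_index are names I made up, and how each integer gets packed into the texel's channels depends on the texture format and is not shown.

```python
WIDTH = 1024
HEIGHT = 1024


def index_to_uv(i):
    """Map a flat table index to the (u, v) of that texel's centre."""
    x = i % WIDTH
    y = i // WIDTH
    # The +0.5 aims at the middle of the texel so that point sampling
    # cannot slip over to a neighbouring entry.
    return ((x + 0.5) / WIDTH, (y + 0.5) / HEIGHT)


def uv_to_index(u, v):
    """What nearest-neighbour sampling at (u, v) would resolve to."""
    x = min(int(u * WIDTH), WIDTH - 1)
    y = min(int(v * HEIGHT), HEIGHT - 1)
    return y * WIDTH + x


if __name__ == "__main__":
    for i in (0, 1023, 1024, 1048575):
        print(i, index_to_uv(i))
```

Since the lookup is just a texture read, the index does not have to be known at compile time, which sidesteps the array indexing restriction.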
I don't think this is possible if you want speed.
Given two byte arrays of data captured from a microphone, how can I determine which one has more spikes in noise? I would assume there is an algorithm I can apply to the data, but I have no idea where to start.
Getting down to it, I need to be able to determine when a baby is crying vs ambient noise in the room.
If it helps, I am using the Microsoft.Xna.Framework.Audio.Microphone class to capture the sound.
You can convert each sample (normalised to the range -1.0 to 1.0) into a decibel rating by applying the formula
dB = 20 * log-base-10 (absolute value of sample-value)
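As a rough sketch of that conversion in Python: the logarithm is undefined for zero and for negative samples, so this version takes the magnitude and clamps silence to a floor value. The function names and the -120 dB floor are arbitrary choices of mine, and the byte conversion assumes the microphone delivers 16-bit signed little-endian PCM, which is worth double checking for your setup.

```python
import math
import struct


def pcm16_to_floats(data):
    """Convert raw bytes (assumed 16-bit signed little-endian PCM) to floats in [-1.0, 1.0)."""
    return [s / 32768.0 for (s,) in struct.iter_unpack("<h", data)]


def sample_to_db(sample, floor_db=-120.0):
    """Level of one normalised sample in dB relative to full scale."""
    magnitude = abs(sample)
    if magnitude == 0.0:
        return floor_db  # log10(0) is undefined, so report the floor instead
    return max(floor_db, 20.0 * math.log10(magnitude))


if __name__ == "__main__":
    for s in pcm16_to_floats(b"\x00\x40\x00\xc0\x00\x00"):
        print(s, sample_to_db(s))
```

A full-scale sample comes out at 0 dB and everything quieter is negative, so a threshold would be something like "louder than -20 dB", tuned by ear.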
To be honest, so long as you don't mind the occasional false positive, and your microphone is set up OK, you should have no problem telling the difference between a baby crying and ambient background noise, without going through the hassle of doing an FFT.
I'd recommend you having a look at the source code for a noise gate, which does pretty much what you are after, with configurable attack times & thresholds.
First use a Fast Fourier Transform to transform the signal into the frequency domain.
Then check if the signal in the typical "cry-frequencies" is significantly higher than the other amplitudes.
The preprocessor of the speex codec supports noise vs signal detection, but I don't know if you can get it to work with XNA.
Or if you really want some kind of loudness calculate the sum of squares of the amplitudes from the frequencies you're interested in (for example 50-20000Hz) and if the average of that over the last 30 seconds is significantly higher than the average over the last 10 minutes or exceeds a certain absolute threshold sound the alarm.
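A minimal sketch of the short-average versus long-average idea in Python. To keep it short it uses the mean square of the raw samples in each block rather than summing only selected frequency bands, so it is a simplification of what is described above; the class name, window lengths and ratio are all placeholders to be tuned.

```python
from collections import deque


class LoudnessAlarm:
    """Compare recent average energy against a longer-term average."""

    def __init__(self, short_len, long_len, ratio=4.0, absolute=None):
        self.short = deque(maxlen=short_len)  # e.g. blocks covering 30 seconds
        self.long = deque(maxlen=long_len)    # e.g. blocks covering 10 minutes
        self.ratio = ratio                    # how much louder counts as "significantly"
        self.absolute = absolute              # optional absolute energy threshold

    def update(self, block):
        """Feed one block of samples in [-1, 1]; return True if the alarm should sound."""
        energy = sum(s * s for s in block) / len(block)
        self.short.append(energy)
        self.long.append(energy)
        short_avg = sum(self.short) / len(self.short)
        long_avg = sum(self.long) / len(self.long)
        if self.absolute is not None and short_avg > self.absolute:
            return True
        return long_avg > 0.0 and short_avg > self.ratio * long_avg


if __name__ == "__main__":
    alarm = LoudnessAlarm(short_len=3, long_len=60)
    for _ in range(60):
        alarm.update([0.01] * 100)       # quiet room
    print(alarm.update([0.5] * 100))     # sudden loud block
```

To restrict it to particular frequencies you would run an FFT on each block first and sum the squared magnitudes of only the bins you care about.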
Louder at what point? The signal's average amplitude will tell you which one is louder on average, but that is kind of a dumb, brute force way to go about it. It may work for you in practice though.
Getting down to it, I need to be able to determine when a baby is crying vs ambient noise in the room.
Ok, so, I'm just throwing out ideas here; I am by no means an expert on audio processing.
If you know your input, i.e., a baby crying (relatively loud with a high pitch) versus ambient noise (relatively quiet), you should be able to analyze the signal in terms of pitch (frequency) and amplitude (loudness). Of course, if during the recording someone drops some pots and pans onto the kitchen floor, that will be tough to discern.
As a first pass I would simply traverse the signal, maintaining a standard deviation of pitch and amplitude throughout, and then set a flag when those deviations jump beyond some threshold that you will have to define. When they come back down you may be able to safely assume that you captured the baby's cry.
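Something along these lines, sketched in Python for amplitude only (pitch would need a frequency estimate first). It keeps a running mean and standard deviation using Welford's method and flags values that sit more than k standard deviations away from the mean. The names, the value of k and the warm-up length are assumptions to experiment with, not recommendations.

```python
import math


class RunningStats:
    """Running mean and sample standard deviation (Welford's method)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared differences from the current mean

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self):
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def flag_outliers(values, k=3.0, warmup=5):
    """Return one flag per value: True where it jumps beyond k standard deviations."""
    stats = RunningStats()
    flags = []
    for x in values:
        spread = stats.std()
        is_out = stats.n >= warmup and spread > 0.0 and abs(x - stats.mean) > k * spread
        flags.append(is_out)
        if not is_out:
            stats.add(x)  # only normal values update the baseline
    return flags


if __name__ == "__main__":
    levels = [0.10, 0.12, 0.11, 0.09, 0.10, 0.11, 0.90, 0.10]
    print(flag_outliers(levels))
```

Flagged values are left out of the statistics here so that a long cry does not slowly become the new "normal"; whether that is the right call depends on the room.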
Again, just throwing you an idea here. You will have to see how it works in practice with actual data.
I agree with @Ed Swangren: it will take a lot of playing with samples of data from a lot of sources. To me, it sounds like the trick will be to limit or hopefully eliminate false positives. My experience with babies is that their crying is much louder than the environment. So keep track of the average measurements (freq/amp/??) of the normal environment and then classify how well the changes match the characteristics of a crying baby. Those characteristics change from kid to kid, so you'll probably want a system that 'learns'. Best of luck.
update: you might find this library useful http://naudio.codeplex.com/