I'm trying to draw a few thousand particles using instancing. It's working and it's fast, but I have one bottleneck that slows the whole program down.
My Particle class is similar to this:
public class Particle
{
public Vector2 Position;
//More data not used for drawing
//....
}
Now, in my Draw() method I have something like this:
Vector2[] instanceData = new Vector2[numParticles];
public void Draw()
{
for(int i = 0; i < numParticles; ++i)
instanceData[i] = Particles[i].Position; //THAT'S the slow part
instanceBuffer.SetData(instanceData);
//Now draw VertexBuffer using instancing
//...
}
I have tried using Parallel.For, but it doesn't speed things up enough with around 8000 particles. I also looked at the particle system example from MSDN, but its Particle struct contains only the data needed for drawing, and the positions are calculated in the shader. I, however, need additional data for several algorithms.
I can't think of a class design that would save me from copying the particle positions into the array every frame.
Since this problem ultimately arose from the data structures being used, let me present you with a common alternative to the linked list for scenarios such as this one.
Linked lists are generally not a good idea for storing particles for two reasons: one, you can't randomly access them efficiently, as you discovered here; and two, linked lists have poor locality of reference. Given the performance requirements of particle systems, the latter point can be killer.
A standard list has much better locality of reference, but as you've discovered, adding and removing items can be slow, and this is something you do commonly in particle engines.
Can we improve on that?
Let's start with something even more basic than a list: a simple array. For simplicity's sake, let's hard-cap the number of particles in your engine (we'll address this later).
private const Int32 ParticleCount = 8000;
private readonly Particle[] particles = new Particle[ParticleCount];
private Int32 activeParticles = 0;
Assuming you have room, you can always add a particle to the end of the array in constant time:
particles[activeParticles++] = newParticleData;
But removing a particle is O(n), because all of the particles after it need to be shifted down:
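var indexOfRemovedParticle = 12;
// Arrays have no RemoveAt, so shift every later active particle down one slot
Array.Copy(particles, indexOfRemovedParticle + 1, particles, indexOfRemovedParticle, activeParticles - indexOfRemovedParticle - 1);
activeParticles--;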
What else can we do in constant time? Well, we can move particles around:
particles[n] = particles[m];
Can we use this to improve our performance?
Yes! Change the remove operation to a move operation, and what was O(n) becomes O(1):
var indexOfRemovedParticle = 12;
var temp = particles[indexOfRemovedParticle];
particles[indexOfRemovedParticle] = particles[activeParticles - 1];
particles[activeParticles - 1] = temp;
activeParticles--;
We partition our array: all of the particles at the beginning are active, and all of the particles at the end are inactive. So to remove a particle, all we have to do is swap it with the last active particle, then decrement the number of active particles.
(Note that you need the index within the array of the particle to remove. If you have to go searching for this, you end up reverting to O(n) time; however, since the usual workflow for particles is "loop through the whole list, update each particle, and if it's dead, remove it from the list," you often get the index of dead particles for "free" anyway.)
Now, this all assumes a fixed number of particles, but if you need more flexibility you can solve this problem the same way the List<T> class does: whenever you run out of room, just allocate a bigger array and copy everything into it.
This data structure provides quick inserts and removals, quick random access, and good locality of reference. The latter can be improved further by making your Particle class into a structure, so that all of your particle data will be stored contiguously in memory.
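Putting the pieces above together, here is a minimal sketch of such a pool, assuming Particle is turned into a struct. The names ParticlePool, Life and Update are hypothetical, and System.Numerics.Vector2 stands in for whatever Vector2 type your framework provides. Because the particles are value types here, removal only needs to overwrite the dead slot with the last active particle rather than do a full swap.

```csharp
using System;
using System.Numerics;

// Hypothetical particle data; a struct keeps the array contents contiguous.
public struct Particle
{
    public Vector2 Position;
    public float Life; // seconds remaining (illustrative field)
}

public sealed class ParticlePool
{
    private Particle[] particles;

    public int ActiveCount { get; private set; }

    public ParticlePool(int initialCapacity = 8000)
    {
        particles = new Particle[Math.Max(1, initialCapacity)];
    }

    // Returns a reference so callers can modify a particle in place.
    public ref Particle this[int index] => ref particles[index];

    // Amortized O(1): doubles the array when it is full, as List<T> does.
    public void Add(Particle p)
    {
        if (ActiveCount == particles.Length)
            Array.Resize(ref particles, particles.Length * 2);
        particles[ActiveCount++] = p;
    }

    // O(1): overwrite the removed slot with the last active particle.
    public void RemoveAt(int index)
    {
        if ((uint)index >= (uint)ActiveCount)
            throw new ArgumentOutOfRangeException(nameof(index));
        particles[index] = particles[--ActiveCount];
    }

    // Walks backwards so a particle moved into slot i has already been updated.
    public void Update(float dt)
    {
        for (int i = ActiveCount - 1; i >= 0; i--)
        {
            particles[i].Life -= dt;
            if (particles[i].Life <= 0f)
                RemoveAt(i);
        }
    }
}
```

Note that removal changes the order of the particles, which is usually acceptable for particle systems but matters if you rely on draw order.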
I'll do my best to explain the problem, which first requires an explanation of the project. I apologize ahead of time if it's a bit scattered. I have ADHD, which has made communicating on this platform massively difficult, so please bear with me. This is necessary to mention because Stack Overflow has consistently suppressed my questions for being scattered and I'm tired of this intolerance.
The project generates a 3D Worley noise map using my own unoptimized algorithm, which, at the moment, requires iterating through every index of the 3D array and calculating every element's value on the CPU, one at a time. Also, this will be in 3D world space.
Additionally, I needed to write my own class for the randomized points or "nodes", because this program iteratively moves these nodes in pseudorandom directions, and each node is associated with a procedure for calculating map index values, which is assigned via an integer between 1 and 6. After each iteration, the map is regenerated. Without the nodes, this can't work.
Code for the Nodes on repl.it
Obviously, this is extremely slow, and I've come to the conclusion that I need to implement multithreading and compute shaders. Still, I'm faced with a massive problem: I've no idea how to use HLSL or compute shaders, and I cannot, for the life of me, find any resources on HLSL for C#/Java/Python programmers that would help me wrap my head around anything. ANY resources explaining HLSL on a basic level would be enormously helpful.
Now, for the specific problem of this question: I have no idea how to start. I have one vague idea of an approach that is derived from my ignorance about multithreading. I could use 32 individual 32x32 RWStructuredTexture2D<float> arrays that I stack after calling my shader to create a 3D texture; however, to do this, I need to be able to pass my nodes to the shader, and every use of compute shaders I've seen only has one parameter, uint3 id : SV_DispatchThreadID, which makes no sense to me. I've briefly considered making a struct for the nodes in my shader, but I still have no idea how to get that information to my shader.
For the actual question: how do I throw nodes at this and then get 32 32x32 float arrays out of it?
Here's some pseudocode for the in-betweens.
//somehow set this up to have 32 different threads
//make an hlsl equivalent of a float[,]
#params NodeSet nodes and z coordinate
#return float[32,32]
//NodeSet is just a collection of Nodes that has some convenience methods.
float[,] CSMain(#params) {
for(int x = 0; x < 32; x++)
for(int y = 0; y < 32; y++)
//set value of element
return floatArr;
}
Second question: should I even be using Compute Shaders for this problem?
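You should use 32 x 32 x 32 threads in total and write to a 1D buffer of length 32 x 32 x 32, indexed by the id of the thread. A single thread group is limited to 1024 threads, so declare 8 x 8 x 8 threads per group and dispatch 4 x 4 x 4 groups. The nodes reach the shader through a buffer that you bind from C# before dispatching.
Pseudocode:
#pragma kernel PseudoCode
// Bound from C# before the dispatch; Node, GetInfluence and NormalizeOutput are placeholders
StructuredBuffer<Node> nodeData;
RWStructuredBuffer<float> outputBuffer;
int nodeCount;
// 8*8*8 = 512 threads per group; dispatch 4x4x4 groups to cover 32x32x32
[numthreads(8, 8, 8)]
void PseudoCode (uint3 id : SV_DispatchThreadID) {
float3 pos = id; // or id - float3(15.5,15.5,15.5) to center, etc.
uint outputIndex = id.x*32*32 + id.y*32 + id.z;
float value = 0; // accumulate locally so the result does not depend on old buffer contents
for (int i = 0; i < nodeCount; i++)
value += GetInfluence(nodeData[i], pos);
outputBuffer[outputIndex] = NormalizeOutput(value);
}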
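When reading the flat buffer back on the C# side, the same index arithmetic has to be mirrored. The following is a rough CPU-side sketch of that mapping, with a simple nearest-node distance standing in for the per-voxel calculation. WorleyReference and its members are hypothetical names, and System.Numerics.Vector3 is used in place of the engine's own vector type. A reference like this can also be handy for checking the shader output against known values.

```csharp
using System;
using System.Numerics;

public static class WorleyReference
{
    public const int Size = 32;

    // Same layout as the kernel: x is the slowest axis, z the fastest.
    public static int Flatten(int x, int y, int z) => (x * Size + y) * Size + z;

    // Inverse mapping, for reading one voxel back out of the flat buffer.
    public static (int x, int y, int z) Unflatten(int index) =>
        (index / (Size * Size), index / Size % Size, index % Size);

    // Stand-in calculation: distance from each voxel to its nearest node.
    public static float[] Generate(Vector3[] nodes)
    {
        var output = new float[Size * Size * Size];
        for (int x = 0; x < Size; x++)
        for (int y = 0; y < Size; y++)
        for (int z = 0; z < Size; z++)
        {
            var pos = new Vector3(x, y, z);
            float nearest = float.MaxValue;
            foreach (var node in nodes)
                nearest = Math.Min(nearest, Vector3.Distance(node, pos));
            output[Flatten(x, y, z)] = nearest;
        }
        return output;
    }
}
```

To get the 32 slices of 32x32 asked about in the question, fix x and read the 1024 consecutive values starting at Flatten(x, 0, 0).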
For a school assignment we need to implement a 7-Riffle Algorithm Method in C# which shuffles the faces of a Rubik's Cube. Unfortunately, there are not enough resources on the web that show how it should be coded. I already use a Stopwatch to measure the elapsed ticks for different Rubik's Cube sizes.
This code works for the shuffling bit, but the time it takes doesn't seem to make sense, as it is faster than Fisher-Yates.
Random rand = new Random();
for (int i = rubikCubeArray.Length - 1; i > 7; i--)
{
int n = rand.Next(i + 1);
int temp = rubikCubeArray[i];
rubikCubeArray[i] = rubikCubeArray[n];
rubikCubeArray[n] = temp;
}
Any help please?
A common starting seed is a good idea (as jdweng pointed out)
I just needed to repair that typo, as rookies might not know that "seen" should be "seed". With a common seed, both compared algorithms run under the same conditions.
Nested for loops
I'm not familiar with the 7-Riffle Shuffle algorithm, but a backtracking solver should have nested for loops. Right now you have a single loop that runs 9 times (why?).
If you have a cube shuffled by N=7 turns, then you need 7 nested for loops, each iterating over all 3*3*2=18 possible turns. If N changes, you need dynamically nested for loops; for more info see:
dynamically nested for loop nested_for
or maskable nested for loops up to God's number (20).
On each iteration, each for loop turns the cube state of the enclosing loop by the selected move, and the innermost loop should also detect the solved case and break when it is found.
So something like this (solver):
cube0=shuffled_cube(); // the cube to solve (placeholder name)
for (i1=0; i1<18; i1++) { cube1=turn_cube(cube0,i1);
for (i2=0; i2<18; i2++) { cube2=turn_cube(cube1,i2);
...
for (i7=0; i7<18; i7++) { cube7=turn_cube(cube6,i7);
if (cube7 == solved_cube()) return { i1,i2,...,i7 }; // solution found
}...}}
return false; // unsolved
where turn_cube(cube a,int turn) returns cube a turned by turn, and turn selects which slice is turned in which direction (one of the 18 possible turns) ...
Also, this might interest you:
Quaternion rotation do not works as excepted
[Edit1] shuffler
As I mentioned, I am not familiar with the 7-riffle shuffle algorithm, so if you just want a cube that is 7 turns away from the solved state, then you're almost right. You should have a single for loop as you do now, but inside it you need to make valid random moves, something like this:
cube=solved_cube();
for (i=0; i<7; i++)
cube=turn_cube(cube,Random(18));
Now the real problem is to code the turn_cube function. To help with that we would need to know much more about how you represent your Rubik cube internally.
So is it a 1D, 2D or 3D array? What is the topology? What are the element values (a hex color maybe, or just 0..5, or some enum or transform matrix)?
In the link above is an example of my solver, with source code for the function void RubiCube::cube_rotate(int axis,int slice,double ang), which is more or less what turn_cube should do. There are 18 possible turns:
3 axes (axis)
each axis has 3 slices (slice)
and we can turn CW or CCW by 90 deg (ang; beware that my ang is in radians and allows arbitrary angles, as I animate the turns)
so you need to map those 18 cases into int turn = <0,17>, which will be processed in turn_cube and applied to your cube...
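One possible way to do that mapping is sketched below. The CubeTurns class, the ordering of the 18 cases and the +1/-1 direction convention are all assumptions made for illustration. How the decoded values are applied depends entirely on your cube representation.

```csharp
using System;

public static class CubeTurns
{
    public const int TurnCount = 18; // 3 axes * 3 slices * 2 directions

    // Splits a turn index 0..17 into axis (0..2), slice (0..2) and direction (+1 = CW, -1 = CCW).
    public static (int axis, int slice, int direction) Decode(int turn)
    {
        if (turn < 0 || turn >= TurnCount)
            throw new ArgumentOutOfRangeException(nameof(turn));
        int axis = turn / 6;
        int slice = turn / 2 % 3;
        int direction = turn % 2 == 0 ? 1 : -1;
        return (axis, slice, direction);
    }

    // Inverse of Decode.
    public static int Encode(int axis, int slice, int direction) =>
        axis * 6 + slice * 2 + (direction > 0 ? 0 : 1);

    // Picks the turn indices for a shuffle of the given length.
    public static int[] RandomTurns(int count, Random rng)
    {
        var turns = new int[count];
        for (int i = 0; i < count; i++)
            turns[i] = rng.Next(TurnCount);
        return turns;
    }
}
```

Passing a Random created with a fixed seed to RandomTurns gives the same sequence on every run, which keeps timing comparisons between shuffling algorithms fair.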
I am visualizing a point cloud in Unity. My C# script reads the RGB data of a .png file and draws a particle at the corresponding position( x=r, y=g, z=b).
Pictures of course have multiple pixels of the same color. Currently these are all drawn, but I want to avoid that and increase the size of the corresponding particle instead.
I already tried checking with Array.IndexOf() for existing particles and increasing their size when found.
The problem with this solution is that it is very slow. The possible amount of different particles is 256*256*256 and when I tried it with only 50*50 particles it took over a minute to compute. An example with 4*4 worked well but this is far from what I need.
I have already thought about keeping a list of existing particles. Maybe searching the list is faster, but then I would also have to convert the list to an array.
Another idea is to store a counter value in an int[256,256,256] and then iterate through it to create the particles. But this would also be a huge overhead.
Any ideas for a better approach are very welcome.
Edit: Creating all the particles, including the unnecessary ones, is very fast, taking just 1-2 seconds to compute a million particles. For this visual and rendering improvement, I hope that I don't need to increase the computation time by a factor of 10 or more.
I'm not too sure about the size of your dataset, but having tested this quick bit of code with 1 million items, it seems pretty quick (<0.5s to generate the data and the weighting).
Assuming your RBG list looks like the following:
var rbgList = new List<RBG>();
then a quick bit of LINQ to group by unique RGB combinations, along with the number of occurrences of each, would look like this:
var grouping=
rbgList.GroupBy(val =>
new {val.R, val.B, val.G}, (key, group)=>
new {RBG= new RBG(key.R, key.B, key.G), Count = group.Count()})
.Select(g=>g)
.OrderBy(g=>g.Count);
Then you can iterate through 'grouping', getting the RBG value and the Count, which lets you locate the x/y/z coords you need and then scale the point size based on the count.
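If you would rather avoid LINQ, the same counting can be sketched with a Dictionary keyed by the packed color, which touches each pixel once and only stores colors that actually occur. ColorCounter and its members are hypothetical names, and the pixel source is assumed to be any sequence of byte triples.

```csharp
using System;
using System.Collections.Generic;

public static class ColorCounter
{
    // Packs a color into a single int of the form 0xRRGGBB.
    public static int Pack(byte r, byte g, byte b) => (r << 16) | (g << 8) | b;

    // Recovers the channels, which double as the x/y/z position of the particle.
    public static (byte r, byte g, byte b) Unpack(int key) =>
        ((byte)(key >> 16), (byte)(key >> 8), (byte)key);

    // Counts how often each distinct color occurs.
    public static Dictionary<int, int> CountColors(IEnumerable<(byte r, byte g, byte b)> pixels)
    {
        var counts = new Dictionary<int, int>();
        foreach (var (r, g, b) in pixels)
        {
            int key = Pack(r, g, b);
            counts.TryGetValue(key, out int seen); // seen stays 0 for a new color
            counts[key] = seen + 1;
        }
        return counts;
    }
}
```

Each dictionary entry then becomes one particle: Unpack gives its position and the count drives its size.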
Hope that helps
I have read some tutorials on perlin noise (for example this one) for my terrain generation, and I wanted to make sure that I understood it and can correctly implement it.
I start with 1 Dimension:
amplitude = persistence^i
// persistence can have any value but mostly it is 1 or lower. It changes the amplitude of the graphs with higher frequency since:
frequency = 2^i
// the base of 2 means that each graph represents one octave, which is not strictly necessary, but the example happens to do it like this
'i' is the octave we are looking at.
Here is my attempt:
private float[] generateGraph()
{
float[] graph = new float[output.Width];
for (int i = 0; i < output.Width; i += 1/frequency)
{
graph[i] = random.Next((int)(1000000000000*persistence))/1000000000000f;
}
return graph;
}
I imagined the array as a graph, where the index is X and the value is Y. I search for a value for every multiple of texture.Width/frequency until the end of the array.
I have some random values I am using for now, which I have to connect with either Linear Interpolation/Cosine Interpolation or Cubic Interpolation.
Which one should I use? Which is the most performant when I want to use the noise for terrain generation in 2D?
I would like to put the graphs in a 2D array after this and then check each value: if it's higher than 0.5, it should get some material or texture.
In this situation, how should I do it? Am I totally on the wrong track?
edit1: Before I put the graph in a 2D array, I would like to generate perhaps 5 other graphs with a higher 'i' and blend them (which shouldn't be too hard).
edit2: this implementation is nice and 'easy'.
Define "too much performance"; any kind of interpolation should be fine for 2D data. If you are really worried about performance, you might try implementing Simplex Noise, but that is much harder to understand and implement, and it only really pays off in 3D and higher. In 2D the two are roughly comparable.
Also, Perlin noise is usually implemented as a function of n parameters, where n is the number of dimensions, and the function has an internal array of random values that is accessed based on the integer parts of the parameters. You should try studying the original source code.
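For reference, here is a small sketch of the two cheapest interpolation options mentioned in the question, plus a helper that fills in the samples between evenly spaced random values. The Interpolation class and the Resample helper are hypothetical names, and the layout (one control value every step samples) is an assumption about how the graph is built.

```csharp
using System;

public static class Interpolation
{
    // Straight line between a and b; t runs from 0 to 1.
    public static float Linear(float a, float b, float t) => a + (b - a) * t;

    // Eases in and out using half a cosine wave; smoother than linear, slightly costlier.
    public static float Cosine(float a, float b, float t)
    {
        float s = (float)((1.0 - Math.Cos(t * Math.PI)) * 0.5);
        return a + (b - a) * s;
    }

    // Expands control values spaced 'step' samples apart into a full graph.
    // Requires at least two control values and step >= 1.
    public static float[] Resample(float[] controls, int step, Func<float, float, float, float> interpolate)
    {
        if (controls.Length < 2) throw new ArgumentException("need at least two control values");
        if (step < 1) throw new ArgumentOutOfRangeException(nameof(step));
        var graph = new float[(controls.Length - 1) * step + 1];
        for (int i = 0; i < graph.Length; i++)
        {
            int k = Math.Min(i / step, controls.Length - 2); // index of the left control value
            float t = (i - k * step) / (float)step;          // position between left and right
            graph[i] = interpolate(controls[k], controls[k + 1], t);
        }
        return graph;
    }
}
```

Switching between the two is then just a matter of passing Interpolation.Linear or Interpolation.Cosine to Resample.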
I have designed multiplayer games before, but now I want to create an MMORPG architecture as a learning exercise and a challenge. I want to go as far as simulating hundreds (or a couple of thousand) of concurrent players on a single server.
So far so good except that right now I am facing a problem to figure out a good way to update all game objects on the Server as often and as fast as possible.
By Game Objects I mean all Players, Mobs, Bullets.
The problem is that all Players, Mobs and Bullets are stored in collections in server-side memory for faster processing, and iterating through all of them to check for collisions, update health, update movement, etc. is taking way too long.
Let's say I have 1000 players and there are 10000 mobs in the whole world, and every player and creature is responsible for creating 5 other game objects (no more, no less), such as bullets.
That would give (1000 + 10000) * 5 = 55000 game objects in a collection.
Iterating through all objects to update them takes forever (a couple of minutes) on a dual-core HT i5 with 4 GB of RAM. This is wrong.
When iterating, the code looks like this (pseudo):
for(int i = 0; i < gameobjects.Count; i++) {
for(int j = 0; j < gameobjects.Count; j++) {
// logic to verify if gameobjects[i] is in range of
// gameobjects[j]
}
}
As an optimization, I am thinking about dividing my game objects into different zones and collections, but that wouldn't fix the problem that I need to update all objects several times per second.
How should I proceed to update all game objects on the server side? I searched extensively for relevant game design patterns, but with no results so far. :(
Thanks in advance!
I would change the design completely and implement an event-based design. This has many advantages; the obvious one is that you only need to update the objects that are actually being interacted with, since the majority of the game objects in an MMO are always idle or not seen at all.
There is no reason to calculate objects that are not visible on any player's screen. That would be insane and would require a server farm that you most likely cannot afford. Instead, you can try to predict movement, or store a list of all objects that are currently not being interacted with and update these less frequently.
If a player can't see an object, you can teleport the unit instead of having it travel smoothly. Essentially, you move the unit over large distances within the confined area in which it is allowed to move, making it look as if the object had been moving freely even while it was not visible to the players. Usually this would be triggered as an event when a player enters or leaves a zone.
You can achieve this by simply calculating the time since the last update and predicting how far the object would have traveled, as if it had been visible to the player. This is especially useful for objects or NPCs that have a set route, as it makes the calculations much simpler.
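For an NPC on a fixed looping route, that prediction can be sketched as follows. LazyPatrol and PositionAt are hypothetical names, System.Numerics.Vector2 stands in for the server's own vector type, and constant speed along a closed loop of waypoints is an assumption made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;

public static class LazyPatrol
{
    // Where an NPC that started at route[0] would be after elapsedSeconds,
    // walking the closed route at constant speed. Expects non-negative inputs.
    public static Vector2 PositionAt(IReadOnlyList<Vector2> route, float speed, float elapsedSeconds)
    {
        if (route.Count == 0) throw new ArgumentException("route is empty");

        // Total length of the loop, including the segment back to the start.
        float total = 0f;
        for (int i = 0; i < route.Count; i++)
            total += Vector2.Distance(route[i], route[(i + 1) % route.Count]);
        if (total <= 0f) return route[0];

        // Whole laps do not change the position, so only the remainder matters.
        float remaining = speed * elapsedSeconds % total;
        for (int i = 0; i < route.Count; i++)
        {
            Vector2 a = route[i], b = route[(i + 1) % route.Count];
            float segment = Vector2.Distance(a, b);
            if (segment > 0f && remaining <= segment)
                return Vector2.Lerp(a, b, remaining / segment);
            remaining -= segment;
        }
        return route[0];
    }
}
```

The server then only needs to store the timestamp of the last update per idle NPC and call this once when a player comes into range.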
Your code is running that slowly not just because it is checking all N objects, but because it is checking all possible interactions between objects, and that takes N^2 calculations = 3 025 000 000 in your sample.
One way to reduce this number of checks would be to put the objects in your game world into a grid, so that objects that are not in the same or adjacent cells cannot interact with each other.
Also, your current code checks each interaction twice. You can easily fix this by starting the inner loop from i + 1, which also skips checking an object against itself:
for(int i = 0; i < gameobjects.Count; i++)
for(int j = i + 1; j < gameobjects.Count; j++)
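A rough sketch of the grid idea follows, assuming 2D positions and a single interaction range shared by all objects. BroadPhase and FindPairs are hypothetical names, and System.Numerics.Vector2 stands in for your own position type. The cell size is set equal to the range, so any two objects within range of each other are guaranteed to sit in the same or in adjacent cells.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;

public static class BroadPhase
{
    // Returns every index pair (i < j) whose positions are within 'range' of each other,
    // looking only at the 3x3 block of cells around each object.
    public static List<(int, int)> FindPairs(IReadOnlyList<Vector2> positions, float range)
    {
        // Bucket object indices by grid cell.
        var grid = new Dictionary<(int, int), List<int>>();
        for (int i = 0; i < positions.Count; i++)
        {
            var cell = CellOf(positions[i], range);
            if (!grid.TryGetValue(cell, out var bucket))
                grid[cell] = bucket = new List<int>();
            bucket.Add(i);
        }

        var pairs = new List<(int, int)>();
        float rangeSquared = range * range;
        for (int i = 0; i < positions.Count; i++)
        {
            var (cx, cy) = CellOf(positions[i], range);
            for (int dx = -1; dx <= 1; dx++)
            for (int dy = -1; dy <= 1; dy++)
            {
                if (!grid.TryGetValue((cx + dx, cy + dy), out var others)) continue;
                foreach (int j in others)
                    // j > i reports each pair once and skips self-comparison.
                    if (j > i && Vector2.DistanceSquared(positions[i], positions[j]) <= rangeSquared)
                        pairs.Add((i, j));
            }
        }
        return pairs;
    }

    private static (int, int) CellOf(Vector2 p, float cellSize) =>
        ((int)Math.Floor(p.X / cellSize), (int)Math.Floor(p.Y / cellSize));
}
```

With objects spread across a large world, the cost becomes roughly proportional to the number of objects times the number of neighbors in nearby cells, rather than N^2.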
Looping over 55,000 objects shouldn't be too slow. Evidently, you are doing too much work too often on those objects, and probably doing work that doesn't always need to be done.
For example, if there are no players around a mob, should it really be calculated?
(If a tree falls in a forest and there's nobody around, does it really make a sound?)
Also, a lot of objects might not need to be updated on every loop. Players, for instance, could be left to the client to calculate and only be "verified" once every 1-2 seconds. Handing all of the players' collision checks over to the client would make your server workload much easier to handle. The same goes for players' bullets or raycasts. In return, it also makes the game much more fluid for the players.
Do mobs following a path need to be tested for collision, or can the path's nodes be enough?
Testing every object against every other object is terrible. Do all mobs have to be tested against all other mobs, or do only specific types or factions need to be tested? Can you split your world into smaller zones, each of which only tests the mobs within it against other objects in the same zone?
There's a huge amount of work in an MMO server's code to make it run properly. The optimizations are sometimes insane, but that's fine as long as it works.