3D Buffers in HLSL? - c#

I wanna send a series of integers to HLSL in the form of a 3D array using unity. I've been trying to do this for a couple of days now, but without any gain. I tried to pack the buffers into each other (StructuredBuffer<StructuredBuffer<StructuredBuffer<int>>>), but it simply won't work. And I need to make this thing resizable, so I can't use arrays in structs. What should I do?
EDIT: To clarify what I am trying to do here: this is a medical program. When you have your body scanned, some files are generated. Those files are called DICOM files (.dcm) and are given to a doctor. The doctor should open the program, select all of the DICOM files and load them. Each DICOM file contains an image. However, those images are not like the images used in daily life. They are grayscale, and each pixel has a value that ranges from about -1000 to a couple of thousand, so each pixel is saved as 2 bytes (an Int16). I need to generate a 3D model of the body that got scanned, so I'm using the Marching Cubes algorithm to generate it (have a look at Polygonising a Scalar Field). The problem is that I used to loop over each pixel in about 360 images of 512*512 pixels, which took too much time. When I used the CPU, I read the pixel data from each file whenever I needed it. Now I'm trying to make this process run at runtime on the GPU. Since the GPU can't read the data from disk itself, I need to send all of the pixel data to it before processing: 360*512*512*4 bytes, in the form of a 3D array of ints. I'm also planning to keep the data there to avoid retransferring that huge amount of memory. What should I do? Refer to this link to know more about what I'm doing

From what I've understood, I would suggest to try the following:
Flatten your data (nested buffers are not what you want on your gpu)
Split your data across multiple ComputeBuffers if necessary (when I played around with them on a Nvidia Titan X I could store approximately 1GB of data per buffer. I was rendering a 3D point cloud with 1.5GB of data or something, the 360MBytes of data you mentioned should not be a problem then)
If you need multiple buffers: let them overlap as needed for your marching cubes algorithm
Do all of your calculations in a ComputeShader (I think this requires DX11; if you have multiple buffers, run it multiple times and accumulate your results) and then use the results in a standard shader which you call from the OnPostRender function (use Graphics.DrawProcedural inside to just draw points or build a mesh on the gpu)
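The "flatten your data" step can be sketched in plain C# as follows. This is a minimal sketch, not the answerer's code: the class and method names are hypothetical, and it assumes every slice has already been decoded into a short[] of width * height samples.

```csharp
using System;

// Hypothetical helper (not from the answer): stores a width x height x depth
// volume in one flat int array, slice after slice.
public static class VolumeFlattening
{
    // Maps the voxel coordinate (x, y, z) to its position in the flat array.
    public static int FlatIndex(int x, int y, int z, int width, int height)
    {
        return x + width * (y + height * z);
    }

    // Copies a stack of slices (each holding width * height samples) into one array.
    public static int[] Flatten(short[][] slices, int width, int height)
    {
        int sliceSize = width * height;
        var flat = new int[slices.Length * sliceSize];
        for (int z = 0; z < slices.Length; z++)
        {
            if (slices[z].Length != sliceSize)
                throw new ArgumentException("slice " + z + " has the wrong size");
            for (int i = 0; i < sliceSize; i++)
                flat[z * sliceSize + i] = slices[z][i];
        }
        return flat;
    }
}
```

The flat array can then go into a single ComputeBuffer with a stride of sizeof(int) via SetData, and the shader can read voxel (x, y, z) with the same x + width * (y + height * z) formula.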
Edit (Might be interesting to you)
If you want to append data to a gpu buffer (because you don't know the exact size or you can't write it to the gpu at once), you can use AppendBuffers and a ComputeShader.
C# Script Fragments:
struct DataStruct
{
...
}
DataStruct[] yourData;
yourData = loadStuff();
ComputeBuffer tmpBuffer = new ComputeBuffer(512, Marshal.SizeOf(typeof(DataStruct)));
ComputeBuffer gpuData = new ComputeBuffer(MAX_SIZE, Marshal.SizeOf(typeof(DataStruct)), ComputeBufferType.Append);
for (int i = 0; i < yourData.Length / 512; i++) {
// write data subset to temporary buffer on gpu
tmpBuffer.SetData(yourData.Skip(i*512).Take(512).ToArray()); // Use fancy Linq stuff to select the i-th batch of 512 items (needs using System.Linq)
// set up and run compute shader for appending data to "gpuData" buffer
AppendComputeShader.SetBuffer(0, "inBuffer", tmpBuffer);
AppendComputeShader.SetBuffer(0, "appendBuffer", gpuData);
AppendComputeShader.Dispatch(0, 512/8, 1, 1); // 8 = gpu work group size -> use 512/8 work groups
}
ComputeShader:
struct DataStruct // replicate struct in shader
{
...
}
#pragma kernel append
StructuredBuffer<DataStruct> inBuffer;
AppendStructuredBuffer<DataStruct> appendBuffer;
[numthreads(8,1,1)]
void append(int id: SV_DispatchThreadID) {
appendBuffer.Append(inBuffer[id]);
}
Note:
AppendComputeShader has to be assigned via the Inspector
512 is an arbitrary batch size, there is an upper limit of how much data you can append to a gpu buffer at once, but I think that depends on the hardware (for me it seemed to be 65536 * 4 Bytes)
you have to provide a maximum size for gpu buffers (on the Titan X it seems to be ~1GB)

In Unity we currently have the MaterialPropertyBlock that allows SetMatrixArray and SetVectorArray, and to make this even sweeter, we can set globally using the Shader static helpers SetGlobalVectorArray and SetGlobalMatrixArray. I believe that these will help you out.
In case you prefer the old way, please look at this quite nice article showing how to pass arrays of vectors.

Related

C# Performance optimization: Avoid drawing points at the same location, increase size instead

I am visualizing a point cloud in Unity. My C# script reads the RGB data of a .png file and draws a particle at the corresponding position( x=r, y=g, z=b).
Pictures of course have multiple pixels of the same color. Currently they are all still drawn, but I want to avoid that and increase the size of the corresponding particle instead.
I already tried checking with Array.IndexOf() for existing particles and increasing their size when found.
The problem with this solution is that it is very slow. The possible amount of different particles is 256*256*256 and when I tried it with only 50*50 particles it took over a minute to compute. An example with 4*4 worked well but this is far from what I need.
I already did think about making a list with existing particles. Maybe the list search is faster, but then I also would have to transform the list to an array.
Another idea is to just store a counter value in an int[256,256,256] and then iterating through it to create particles. But this would also be a huge overhead.
Any ideas for a better approach are very welcome.
Edit: Creating all the particles including the unnecessary ones is very fast, taking just 1-2 seconds to compute a million particles. For this visual and rendering improvement I hope that I don't need to increase the computation time by a factor of 10 or more.
I'm not too sure on the size of your dataset, but having tested this quick bit of code with 1million items, it seems pretty quick (<0.5s to generate data and generate weighting).
Assuming your RBG list looks like the following:
var rbgList = new List<RBG>();
then a quick bit of LINQ to group by unique RGB combinations, with the number of occurrences of each, would be like:
var grouping=
rbgList.GroupBy(val =>
new {val.R, val.B, val.G}, (key, group)=>
new {RBG= new RBG(key.R, key.B, key.G), Count = group.Count()})
.Select(g=>g)
.OrderBy(g=>g.Count);
Then you can iterate through 'grouping', getting the RBG value and the Count, which lets you locate the x/y/z coords you need and then scale the point size based on the count.
Hope that helps
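An alternative to the LINQ grouping, close to the counter idea from the question but without allocating a full 256*256*256 array, is a dictionary keyed on the packed colour. This is only a sketch under assumptions: the names are hypothetical and the pixels are assumed to be available as (R, G, B) byte tuples.

```csharp
using System.Collections.Generic;

public static class ColorCounting
{
    // Packs an RGB triple into a single int key (0xRRGGBB).
    public static int Pack(byte r, byte g, byte b)
    {
        return (r << 16) | (g << 8) | b;
    }

    // One pass over the pixels: each distinct colour maps to how often it occurs.
    public static Dictionary<int, int> CountColors(IEnumerable<(byte R, byte G, byte B)> pixels)
    {
        var counts = new Dictionary<int, int>();
        foreach (var p in pixels)
        {
            int key = Pack(p.R, p.G, p.B);
            counts.TryGetValue(key, out int current); // current stays 0 for a new colour
            counts[key] = current + 1;
        }
        return counts;
    }
}
```

Each dictionary entry then becomes one particle: the key unpacks to the position (x=r, y=g, z=b) and the count drives the particle size.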

Uniform buffer size on Nvidia GPUs

I use C# with OpenTK to access OpenGL API. My project uses tessellation to render a heightmap. My tessellation control shader splits a square into a grid of 64 squares and my tessellation evaluation shader adds vertical offsets to those points. Vertical offsets are stored in a uniform float buffer like this:
uniform float HeightmapBuffer[65 * 65];
Everything works fine, when I run the project on my laptop with AMD Radeon 8250 GPU. The problems start when I try to run it on Nvidia graphic cards. I tried an older GT 430 and a brand new GTX 1060, but results are same:
Tessellation evaluation info
----------------------------
0(13) : error C5041: cannot locate suitable resource to bind variable "HeightmapBuffer". Possibly large array.
As I researched this problem, I found GL_MAX_UNIFORM_BLOCK_SIZE variable which returns ~500MB on the AMD and 65.54 kB on both Nvidia chips. It's a little strange, since my array actually uses only 16.9 kB, so I am not even sure if the "BLOCK SIZE" actually limits the size of one variable. Maybe it limits the size of all uniforms passed to one shader? Even so, I can't believe that my program would use 65 kB.
Note that I also tried to go the 'common' way by using a texture, but I think there were problems with interpolation, so when placing two adjacent heightmaps together, the borders didn't match. With a uniform buffer array on the other side, things work perfectly.
So what is the actual meaning of GL_MAX_UNIFORM_BLOCK_SIZE? Why is this value on Nvidia GPUs so low? Is there any other way to pass a large array to my shader?
Quoting the question: "As I researched this problem, I found GL_MAX_UNIFORM_BLOCK_SIZE variable which returns ~500MB on the AMD and 65.54 kB on both Nvidia chips."
GL_MAX_UNIFORM_BLOCK_SIZE is the wrong limit. That applies only to Uniform Buffer Objects.
You just declare an array
uniform float HeightmapBuffer[65 * 65];
outside of a uniform block. Since you seem to use this in a tessellation evaluation shader, the relevant limit is MAX_TESS_EVALUATION_UNIFORM_COMPONENTS (there is a separate such limit for each programmable stage). This component limit counts just the number of float components, so a vec4 will consume 4 components, a float just one.
In your particular case, the latest GL spec at the time of this writing, GL 4.6 core profile (https://www.khronos.org/registry/OpenGL/specs/gl/glspec46.core.pdf), guarantees only a minimum value of 1024 for that limit (= 4 KiB), and you are way beyond it.
It is actually a very bad idea to use plain uniforms for such amounts of data. You should consider using UBOs, Texture Buffer Objects, Shader Storage Buffer Objects or even plain textures to store your array. UBOs would probably be the most natural choice in your scenario.
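One thing to check if the array moves into a uniform block: under the std140 layout each element of a float array is padded to 16 bytes, so the size inside the block is larger than the 16.9 kB counted in the question. A small sketch of the arithmetic follows. The class and method names are made up for illustration, std140 is an assumption about the layout used, and 65536 bytes is taken as the limit the question reports for the Nvidia cards.

```csharp
public static class UniformSizes
{
    // Bytes needed by `float name[count]` inside a std140 uniform block:
    // every array element is padded to the size of a vec4 (16 bytes).
    public static int Std140FloatArrayBytes(int count)
    {
        return count * 16;
    }

    // Bytes needed when the same floats are packed four per vec4 element.
    public static int Std140PackedVec4Bytes(int count)
    {
        return ((count + 3) / 4) * 16;
    }
}
```

For 65 * 65 = 4225 floats the first method gives 67600 bytes, which is above 65536, while the packed variant gives 16912 bytes. So with a UBO it may be necessary to declare the data as a vec4 array and index it as element i / 4, component i % 4.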

nAudio odd buffer values from playing files

The float array buffers I'm getting from nAudio seem really odd, when I replay it sounds perfect but graphing the buffer showed a picture that looked mostly like noise. It took me a while but I think I've made some headway but I'm a little stuck.
The float array that comes out has a block align of 8, so 4 floats per sample (I'm recording at 16bit so one float should easily hold this). However there are 2 and often 3 (for load) floats provided per sample. I ended up graphing it - Charts of Data. The top picture is the closest I can get to reconstructing the wave, the bottom is the wave as recorded and the middle is a chart of the raw data.
It seems to me that each float is simply holding a byte value but I'm very confused as to the first value which appears to be some kind of scaling factor.
Before I go into too much detail on what I've found I might just leave it at that with the hope Mark will know exactly how/why I am seeing this.
My current best attempt to decode this data is to convert the numbers to bytes then left shift them together which provides the top chart of the attached. I'm fairly sure that there is more to it however.
OK so after a bit more tweaking I figured out that the float array was in fact an array of bytes from floats. Not sure if that makes sense, each "float" in the 4 floats per sample was raw bits that made up floats.
In the end this made it incredibly easy to process the buffer into an array of floats as follows;
_samplesToProcess = floatsIn.Length / WaveFormat.BlockAlign * WaveFormat.Channels;
if (_rawFloatsOut.Length < _samplesToProcess)
_rawFloatsOut = new float[_samplesToProcess];
Buffer.BlockCopy(floatsIn, 0, _rawFloatsOut, 0, floatsIn.Length);
BufferProcessor(_rawFloatsOut);
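The underlying idea, raw bytes that are really the bit patterns of 32-bit floats, can be shown in isolation. This is a minimal sketch with hypothetical names, assuming the input is a byte[] holding IEEE 754 float samples in the machine's native byte order. It does not use any NAudio types.

```csharp
using System;

public static class SampleDecoding
{
    // Reinterprets the first byteCount raw bytes as 32-bit floats (4 bytes per float).
    public static float[] BytesToFloats(byte[] raw, int byteCount)
    {
        var samples = new float[byteCount / sizeof(float)];
        // Buffer.BlockCopy counts in bytes, not in array elements.
        Buffer.BlockCopy(raw, 0, samples, 0, samples.Length * sizeof(float));
        return samples;
    }
}
```

Note that the count argument of Buffer.BlockCopy is a number of bytes for both the source and the destination, so the destination array needs only a quarter as many elements as the byte array.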

Downsampling the Kinect 2's Color Camera input

I'm using the Kinect 2 for Windows and the C# version of the SDK. If needed writing a separate C++ lib or using C#'s unsafe regions for better performance is definitely an option
I'm trying to downsample the input of the Kinect's Color Camera as 1920x1080 pixels @ 30 fps is a bit much. But I cannot find a built-in function to reduce the resolution (very odd, am I missing something?)
My next idea was to store the data in a large byte[] and then selectively sample from that byte[] directly into another byte[] to reduce the amount of data.
int ratio = full.Length / small.Length;
int bpp = (int)frameDescription.BytesPerPixel;
for (int i = 0; i < small.Length; i += bpp)
{
Array.Copy(full, i * ratio, small, i, bpp);
}
However, this method gives me a very funny result. The image has the correct width and height but the image is repeated along the horizontal axis multiple times. (Twice if I use half the original resolution, thrice if I use a third, etc...).
How can I correctly downsample (subsample is actually a better description) the video?
My final solution was letting the encoder (x264VFW in my case) do the downsampling. The real bottleneck turned out to be the copying of the array, which was solved by giving the encoder a pointer to where the array was in managed memory (using a GCHandle).
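For reference, the repeated image in the question is probably a consequence of stepping through the flat buffer with a single ratio: at half resolution the ratio is 4, so every source row still contributes pixels and each output row ends up holding two of them side by side. A row-aware subsampling loop could look like the sketch below. The names are hypothetical, and it assumes a tightly packed buffer and the same integer factor in both directions.

```csharp
using System;

public static class Subsampling
{
    // Keeps every factor-th pixel of every factor-th row.
    public static byte[] Subsample(byte[] full, int width, int height, int bpp, int factor)
    {
        int smallWidth = width / factor;
        int smallHeight = height / factor;
        var small = new byte[smallWidth * smallHeight * bpp];
        for (int y = 0; y < smallHeight; y++)
        {
            for (int x = 0; x < smallWidth; x++)
            {
                int src = ((y * factor) * width + x * factor) * bpp; // pixel in the source image
                int dst = (y * smallWidth + x) * bpp;                // pixel in the reduced image
                Array.Copy(full, src, small, dst, bpp);
            }
        }
        return small;
    }
}
```

This drops pixels without filtering, so it is subsampling in the sense of the question rather than a smoothed downscale.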

Verify image sequence

Problem
Problem shaping
Image sequence position and size are fixed and known beforehand (it's not scaled). It will be quite short, maximum of 20 frames and in a closed loop. I want to verify (event driven by button click), that I have seen it before.
Lets say I have some image sequence, like:
http://img514.imageshack.us/img514/5440/60372aeba8595eda.gif
If seen, I want to see the ID associated with it; if not, it will be analyzed and added as a new instance of an image sequence that has been seen. I have thought about this for quite a while, and I admit this might be a hard problem. I seem to be having a hard time putting this all together, can someone assist (in C#)?
Limitations and uses
I am not trying to recreate copyright detection system, like content id system Youtube has implemented (Margaret Gould Stewart at TED ( link )). The image sequence can be thought about like a (.gif) file, but it is not and there is no direct way to get binary. Similar method could be used, to avoid duplicates in "image sharing database", but it is not what I am trying to do.
My effort
Gaussian blur
Mathematica function to generate Gaussian blur kernels:
getKernel[L_] := Transpose[{L}].{L}/(Total[Total[Transpose[{L}].{L}]])
getVKernel[L_] := L/Total[L]
It turns out that it is much more efficient to use 2 passes of a vector kernel than one matrix kernel. They are based on the rows of Pascal's triangle that have an odd number of entries:
{1d/4, 1d/2, 1d/4}
{1d/16, 1d/4, 3d/8, 1d/4, 1d/16}
{1d/64, 3d/32, 15d/64, 5d/16, 15d/64, 3d/32, 1d/64}
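The same kernels can be generated in C#. A minimal sketch with a hypothetical helper name: it returns row n of Pascal's triangle divided by 2^n, so n = 2, 4 and 6 reproduce the three kernels listed above.

```csharp
using System;

public static class BinomialKernels
{
    // Returns row n of Pascal's triangle divided by 2^n, so the weights sum to 1.
    public static double[] Kernel(int n)
    {
        var kernel = new double[n + 1];
        double coefficient = 1.0;      // binomial coefficient C(n, 0)
        double norm = Math.Pow(2, n);  // sum of row n
        for (int k = 0; k <= n; k++)
        {
            kernel[k] = coefficient / norm;
            coefficient = coefficient * (n - k) / (k + 1); // advance to C(n, k + 1)
        }
        return kernel;
    }
}
```

Applying such a kernel once along the rows and once along the columns gives the same result as the 2D kernel built from its outer product, which is why the two vector passes are cheaper.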
Data input, hashing, grayscaleing and lightboxing
Example of source bits, that might be useful:
Lightbox around the known rectangle: FrameX
Using MD5CryptoServiceProvider to get md5 hash of the content inside known rectangle atm.
Using ColorMatrix to grayscale image
Source example
Source example (GUI; code):
Get current content inside defined rectangle.
private Bitmap getContentBitmap() {
Rectangle r = f.r;
Bitmap hc = new Bitmap(r.Width, r.Height);
using (Graphics gf = Graphics.FromImage(hc)) {
gf.CopyFromScreen(r.Left, r.Top, 0, 0, //
new Size(r.Width, r.Height), CopyPixelOperation.SourceCopy);
}
return hc;
}
Get md5 hash of bitmap.
private byte[] getBitmapHash(Bitmap hc) {
return md5.ComputeHash(c.ConvertTo(hc, typeof(byte[])) as byte[]);
}
Get grayscale of the image.
public static Bitmap getGrayscale(Bitmap hc){
Bitmap result = new Bitmap(hc.Width, hc.Height);
ColorMatrix colorMatrix = new ColorMatrix(new float[][]{ // a ColorMatrix is 5x5: five rows of five values
new float[]{0.5f,0.5f,0.5f,0,0}, new float[]{0.5f,0.5f,0.5f,0,0},
new float[]{0.5f,0.5f,0.5f,0,0}, new float[]{0,0,0,1,0},
new float[]{0,0,0,0,1}});
using (Graphics g = Graphics.FromImage(result)) {
ImageAttributes attributes = new ImageAttributes();
attributes.SetColorMatrix(colorMatrix);
g.DrawImage(hc, new Rectangle(0, 0, hc.Width, hc.Height),
0, 0, hc.Width, hc.Height, GraphicsUnit.Pixel, attributes);
}
return result;
}
I think you have a few issues with this:
Not all image sequences [videos] are equal [but many are similar]
Where is your data coming from?
How will you represent the data related to your viewings?
Size of the data
Issue #1:
Many images can differ slightly because of compression, watermarking, missing frames, and added clips. I would suggest sampling the video. For example, you may want to consider sub-sampling small sections of the images in the video. Additionally, to avoid noisy images and issues with lossy compression algorithms, you may want to consider grayscaling the sampled frames and applying a Gaussian blur. [Gaussian because it's "more natural" (short answer)] Once you have enough sub-samples to have good confidence of similarity to the video, store them in a database. With the samples you can hash them, or store them to compute a % similarity later.
Issue #2
Your datasource is going to influence the tool kits, and libraries that you use.
I would suggest keeping this simple [keep it with gifs and create a custom viewer, don't try to write a browser plugin while developing your logic]
Issue #3
Using something like Postgres [if there are a lot of large objects] or SQLite is highly suggested for indexing, storing, and recalling past metadata.
Issue #4
The size of the data will have a huge determination on recall, sampling, querying the database, etc.
Overall advice: Don't bite off more than you can handle at this stage. Start small and then grow.
Also take a look at Computer Vision algorithms for more help on the object representation/recall.
The question itself is certainly very interesting and challenging, however there are many practical issues as stated by @monksy.
The opportunist pragmatic in me would take a step back, look at the big picture and see if there is another way to solve the problem. For example, if you are building some kind of "image sharing community" and want to avoid duplicates in the database, you could do a simple md5 on the file (animated gifs on the web are usually always the same, it's rare that people modify them).
Another example: if you are analyzing scientific samples (like meteo sequences) it may be easier to directly embed some kind of hash in every file when generating them.
This depends on whether you only want to know whether you've seen an absolutely identical movie again, or you also want to identify movies that are very similar but have been changed a bit (made lighter, have a watermark added, compression changed, etc.)
In the first case, just take any type of hash of the file and use that (because the file will be identical on the binary level).
In the second case (which I think is what you want) you have an interesting image processing problem on your hands. You could find yourself at the front-lines of image processing science with this if you'd want. If that is the case I suggest you start reading about SURF and OpenCV, and continue on from that.
If you want to match very similar, but not identical videos, and don't want to go the ultra-robust scientific route then I'd suggest the following process:
1. Do the gaussian blur you already do.
2. Divide each image into a few equally sized rectangles (you'd have to test for the best number, but I'd suggest you start with 9).
3. For each rectangle in each frame compute the full-colour histogram, then find the most occurring colour in that rectangle. This gives you 9*20 = 180 numbers. This is the "fingerprint" of this movie.
4. Find the most similar fingerprint in your database; if it is similar enough you already know about it, otherwise you don't.
Step 4 is a bit vague because I'm not really into this field. You are currently using an MD5 hash as a sort of fingerprint, but this is unsuitable in this case because slight differences in the input of a good cryptographic hashing function produce very large differences in the hash. This will mean that two very similar frames will have a totally different MD5 hash, so from the hash you'd never know they were similar.
As long as speed of database lookups is not an issue I'd just go for the sum of square differences as a measure of fingerprint similarity, and set a threshold on that to identify equal movies. However, this is not very fast for huge datasets, and in those cases you'd probably need to transform your fingerprint to something that will allow you to find similar fingerprints faster. One thing you could do here is start by selecting all known movies with very similar average colour for the entire video, then from that select the movies that have very similar average colour in each frame, and in the ones that remain at that point do the full rectangle-by-rectangle fingerprint match. But I'm sure there are even faster options for matching 180 numbers.
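The brute-force version of that matching step could be sketched like this. It is an illustration only: the names are hypothetical, fingerprints are assumed to be equal-length arrays of numbers (for example the 180 values described above), and the threshold would have to be tuned on real data.

```csharp
using System;
using System.Collections.Generic;

public static class FingerprintMatching
{
    // Sum of squared differences between two fingerprints of equal length.
    public static double SumOfSquaredDifferences(double[] a, double[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("fingerprints differ in length");
        double sum = 0;
        for (int i = 0; i < a.Length; i++)
        {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Returns the ID of the closest known fingerprint,
    // or null if none is within the threshold.
    public static int? FindMatch(IDictionary<int, double[]> known, double[] candidate, double threshold)
    {
        int? bestId = null;
        double best = threshold;
        foreach (var entry in known)
        {
            double distance = SumOfSquaredDifferences(entry.Value, candidate);
            if (distance <= best)
            {
                best = distance;
                bestId = entry.Key;
            }
        }
        return bestId;
    }
}
```

A null result means the sequence has not been seen before, so the caller would store the candidate under a new ID.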
Perhaps you can find a way to get a binary copy of the image data of each frame in a variable. Hash that data (md5?) and store each of the hashes. Then you can see if you've ever seen that hash before. If you haven't, it's a new frame.
