How can I efficiently calculate the sum of all pixels in an image using an HLSL pixel shader? I'm interested in Pixel Shader 2.0, which I could invoke as a WPF shader effect.
There is a much simpler solution that doesn't use shaders: load the image as a texture, create a mipmap chain and read back the value of the last mipmap (1x1 pixel). This trick is used extensively in games to calculate, for example, the average brightness of a scene (in order to apply HDR tonemapping). It's a great trick if you value simplicity and efficiency over accuracy.
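For illustration, here is a minimal sketch of that readback using OpenTK on desktop OpenGL rather than a WPF effect (an assumption on my part; textureId and size stand for whatever you uploaded). The 1x1 mip level holds the average color, so multiplying by the pixel count approximates the sum:

// Hedged sketch: assumes `textureId` is a square power-of-two RGBA texture
// of side `size` that has already been uploaded to the GPU.
using System;
using OpenTK.Graphics.OpenGL;

static float[] AverageColor(int textureId, int size)
{
    GL.BindTexture(TextureTarget.Texture2D, textureId);
    GL.GenerateMipmap(GenerateMipmapTarget.Texture2D); // build the chain down to 1x1

    int lastLevel = (int)Math.Log(size, 2);            // the 1x1 level of a size x size texture

    float[] rgba = new float[4];
    GL.GetTexImage(TextureTarget.Texture2D, lastLevel,
                   PixelFormat.Rgba, PixelType.Float, rgba);
    return rgba; // average color; multiply by size*size to approximate the sum
}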
Assuming that you want to sum the r/g/b channel values in the image (and assuming that you can destroy the original pixels), here is my partial solution:
Every 10th pixel computes the average color of its 10 neighboring pixels in the shader pass and writes this averaged color back to the image.
Then, on the CPU side, sum Pix_Value * 10 over every 10th pixel.
In this case we get roughly a 10x speed-up compared to summing pixel values entirely on the CPU.
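A sketch of the CPU half of this scheme (the names are hypothetical, and it assumes the shader pass has already replaced every 10th pixel of a single-channel image with the average of its 10-pixel block):

// Hypothetical CPU-side pass: sums every 10th pixel times the block size.
// Assumes the shader already wrote the block averages into those pixels.
static long SumReducedImage(byte[] pixels, int width, int height, int stride)
{
    long sum = 0;
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x += 10)      // every 10th pixel
            sum += pixels[y * stride + x] * 10L; // average * block size
    return sum; // approximate; exact only when width is a multiple of 10
}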
Effects in WPF are much better suited to producing a visual result. It sounds like you want to use them for a different kind of calculation; to do that you would need to treat the rendered image as data, which requires RenderTargetBitmap, and that readback is done in software anyway.
You might want to look at these projects designed for GPGPU.
Accelerator
Brahma
OpenTK - I don't know the state of the OpenCL bindings in OpenTK. Other technologies such as DirectCompute and CUDA might be worth a look too.
I have implemented background-removal functionality (aka green-screen implementation) using Kinect in my Windows RT application. The pixel noise (jitter) is very high at the foot area as well as on the hair of the acquired user, so how do I reduce this pixel noise?
There are a few techniques you could apply to reduce noise:
cv::bilateralFilter, most intensive, but with the right number of iterations will smooth out the image.
cv::morphologyEx, morphological closing will remove small gaps (of a few pixels) in the image, if the structuring element (cross, circle etc.) is the right kind and size.
cv::inpaint, will close bigger gaps and fill out the image where data is unavailable. I suggest trying bilateral filtering (smoothing) after this step.
cv::findContours, filtering contours with an area smaller than a threshold could be used to remove big gaps in the image.
1 & 3 are mostly for salt and pepper noise and 2 & 3 are most appropriate in removing missing data.
Scaling down the depth data and scaling it back up to size (with good interpolation) also has the effect of smoothing out the image whilst preserving edges.
Using the K2, you might also find that mapping from color to depth coordinate space, or vice versa, gives you better results.
Lastly, I would suggest you look at some techniques used by traditional green screening and VR/AR, such as colouring the outermost edges of the foreground with a light or dark outline to get a 'clean' look.
I'm writing a very simple 3D engine in C# and GDI+, just to render some models (I think DirectX or OpenGL would be like using a shovel to eat soup). So far I have successfully implemented drawing the wireframe of my model, but the next step is of course faces. And there is my problem: for now I just project my 3D points to 2D points and then, for each face, draw them using a simple
g.DrawPolygon(Pens.Red, projected_points);
and for the wireframe that's OK.
Is it possible to calculate the overlapping parts of the polygons and then draw FilledPolygons? Or is it a better idea to draw pixel by pixel, and only set a new pixel when it is nearer than what my z-buffer holds for that pixel?
If the first option is possible, which one is faster (to implement and to compute)?
Is it possible to calculate the overlapping parts of the polygons and then draw FilledPolygons? Or is it a better idea to draw pixel by pixel, and only set a new pixel when it is nearer than what my z-buffer holds for that pixel?
If the first option is possible, which one is faster (to implement and to compute)?
Yes, it is possible. You can test every polygon against every other polygon in your list. The complexity depends on the type of the polygons (of course, it's easiest with triangles), but the performance may drop drastically with a high polygon count. And even if you find the overlapping areas, you will need to interpolate colors, or texture coordinates (if you plan to use such). Also, I'm not sure about the API you use for drawing, but GDI doesn't support filling a polygon with interpolated colors.
I have heard that this was the approach used in 3D graphics before the Z-buffer was invented. :)
I once tried to build a similar project and used a Z-buffer plus my own routine to fill triangles with interpolated colors (which reads and writes the Z-buffer). I drew directly into a GDI bitmap's pixel data buffer, and after all polygons had been rendered, I bitblt'ed the result to the screen.
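To illustrate the second option, a minimal sketch of the per-pixel depth test (assuming you rasterize the triangles yourself and interpolate a depth value z for every covered pixel; smaller z means closer here):

// Minimal z-buffer sketch; the rasterizer that produces (x, y, z, argb)
// per covered pixel is assumed to exist elsewhere.
class ZBufferedSurface
{
    readonly int width;
    readonly float[] zBuffer;     // depth per pixel, initialized to "far"
    public readonly int[] Pixels; // ARGB data, bitblt to the screen when done

    public ZBufferedSurface(int width, int height)
    {
        this.width = width;
        zBuffer = new float[width * height];
        Pixels = new int[width * height];
        for (int i = 0; i < zBuffer.Length; i++) zBuffer[i] = float.MaxValue;
    }

    public void PlotPixel(int x, int y, float z, int argb)
    {
        int i = y * width + x;
        if (z < zBuffer[i])       // only draw if closer than what is stored
        {
            zBuffer[i] = z;
            Pixels[i] = argb;
        }
    }
}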
I am writing a 2D game in openGL and I ran into some performance problems while rendering a couple of textures covering the whole window.
What I actually do is create a texture with the size of the screen, render my scene onto that texture using an FBO, and then render the texture a couple of times with different offsets to get a kind of "shadow" going. But when I do that I get a massive performance drop while using my integrated video card.
So all in all I render 7 quads covering the whole screen (a background image, 5 "shadow images" with a black "tint", and the same texture with its true colors). I am using RGBA textures which are 1024x1024 in size, fitted into a 900x700 window. I get 200 FPS when I am not rendering the textures and 34 FPS when I do (in both scenarios I still create the texture and render the scene onto it). I find this quite odd because I am essentially only rendering 7 quads. A strange thing is also that a CPU profiler doesn't suggest this is the bottleneck (I know that OpenGL uses a pipeline architecture and this can happen, but most of the time it doesn't).
When I use my external video card I get a consistent 200 FPS in the tests above. But when I disable both the scene rendering onto the texture and the texture rendering onto the screen, I get ~1000 FPS. This happens only with my external video card - when I disable the FBO on the integrated one I still get the same 200 FPS. This really confuses me.
Can anyone explain what's going on and if the above numbers sound right?
Integrated video card - Intel HD Graphics 4000
External video card - NVIDIA GeForce GTX 660M
P.S. I am writing my game in C# - so I use OpenTK if that is of any help.
Edit:
First of all thanks for all of the responses - they were all very helpful in a way, but unfortunately I think there is just a little bit more to it than just "simplify/optimize your code". Let me share some of my rendering code:
//fields defined when the program is initialized
Rectangle viewport;
//Texture with the size of the viewport
Texture fboTexture;
FBO fbo;

//called every frame
public void Render()
{
    //Attach the texture to the fbo
    GL.BindFramebuffer(FramebufferTarget.Framebuffer, fbo.handle);
    GL.FramebufferTexture2D(FramebufferTarget.Framebuffer,
        FramebufferAttachment.ColorAttachment0,
        TextureTarget.Texture2D, fboTexture.TextureID, level: 0);

    //Begin rendering in Ortho 2D space
    GL.MatrixMode(MatrixMode.Projection);
    GL.PushMatrix();
    GL.LoadIdentity();
    GL.Ortho(viewport.Left, viewport.Right, viewport.Top, viewport.Bottom, -1.0, 1.0);

    GL.MatrixMode(MatrixMode.Modelview);
    GL.PushMatrix();
    GL.LoadIdentity();

    GL.PushAttrib(AttribMask.ViewportBit);
    GL.Viewport(viewport);

    //Render the scene - this is really simple, I render some quads using shaders
    RenderScene();

    //Back to Perspective
    GL.PopAttrib(); // pop viewport
    GL.MatrixMode(MatrixMode.Projection);
    GL.PopMatrix();
    GL.MatrixMode(MatrixMode.Modelview);
    GL.PopMatrix();

    //Detach the texture
    GL.FramebufferTexture2D(FramebufferTarget.Framebuffer,
        FramebufferAttachment.ColorAttachment0,
        TextureTarget.Texture2D, 0, level: 0);

    //Unbind the fbo
    GL.BindFramebuffer(FramebufferTarget.Framebuffer, 0);

    GL.PushMatrix();
    GL.Color4(Color.Black.WithAlpha(128)); //Sets the color to (0,0,0,128) in RGBA format
    for (int i = 0; i < 5; i++)
    {
        GL.Translate(-1, -1, 0);
        //Simple Draw method which binds the texture and draws a quad at (0;0)
        //with its size
        fboTexture.Draw();
    }
    GL.PopMatrix();

    GL.Color4(Color.White);
    fboTexture.Draw();
}
So I don't think there is actually anything wrong with the fbo and rendering onto the texture, because this is not what causes the program to slow down on either of my cards. Previously I was initializing the fbo every frame, and that might have been the reason my Nvidia card slowed down, but now that I pre-initialize everything I get the same FPS both with and without the fbo.
I think the problem is not with the textures in general because if I disable textures and just render the untextured quads I get the same result. And still I think that my integrated card should run faster than 40 FPS when rendering only 7 quads on the screen, even if they cover the whole of it.
Can you give me some tips on how I can actually profile this and post back the results? That would be really useful.
Edit 2:
OK, I experimented a bit and managed to get much better performance. First I tried rendering the final quads with a shader - as I expected, this didn't have any impact on performance.
Then I tried to run a profiler. But as far as I know SlimTune is just a CPU profiler, and it didn't give me the results I wanted. Then I tried gDEBugger. It has Visual Studio integration, which I later found out does not support .NET projects. I tried running the external version but it didn't seem to work (though maybe I just haven't played with it enough).
The thing that really did the trick: rather than rendering the 7 quads directly to the screen, I first render them onto a texture, again using an fbo, and then render that final texture once onto the screen. This got my FPS from 40 to 120. Again, this seems curious to say the least. Why is rendering to a texture so much faster than rendering directly to the screen? Nevertheless, thanks for the help everyone - it seems that I have fixed my problem. I would really appreciate it if someone could come up with a reasonable explanation of the situation.
Obviously this is a guess since I haven't seen or profiled your code, but I would guess that your integrated card is simply struggling with the post-processing (drawing the texture several times to achieve your "shadow" effect).
I don't know your level of familiarity with these concepts, so sorry if I'm a bit verbose here.
About Post-Processing
Post-processing is the process of taking your completed scene, rendered to a texture, and applying effects to the image before displaying it on the screen. Typical uses of post-processing include:
Bloom - Simulate brightness more naturally by "bleeding" bright pixels into neighboring darker ones.
High Dynamic Range rendering - Bloom's big brother. The scene is rendered to a floating-point texture, allowing greater color ranges (as opposed to the usual 0 for black and 1 for full brightness). The final colors displayed on the screen are calculated using the average luminance of all the pixels on the screen. The effect of all of this is that the camera acts somewhat like the human eye - in a dark room, a bright light (say, through a window) looks extremely bright, but once you get outside, the camera adjusts and light only looks that bright if you stare directly at the sun.
Cel-shading - Colors are modified to give a cartoon-like appearance.
Motion blur
Depth of field - The in-game camera approximates a real one (or your eyes), where only objects at a certain distance are in-focus and the rest are blurry.
Deferred shading - A fairly advanced application of post-processing where lighting is calculated after the scene has been rendered. This costs a lot of video RAM (it usually uses several fullscreen textures) but allows a large number of lights to be added to the scene quickly.
In short, you can use post-processing for a lot of neat tricks. Unfortunately...
Post Processing Has a Cost
The cool thing about post-processing is that its cost is independent of the scene's geometric complexity - it will take the same amount of time whether you drew a million triangles or whether you drew a dozen. That's also its drawback, however. Even though you're only rendering a quad over and over to do post-processing, there is a cost for rendering each pixel. If you were to use a larger texture, the cost would be larger.
A dedicated graphics card obviously has far more computing resources to apply post-processing, whereas an integrated card usually has far fewer resources available. It is for this reason that "low" graphics settings on video games often disable many post-processing effects. This wouldn't show up as a bottleneck on a CPU profiler because the delay happens on the graphics card: the CPU is waiting for the graphics card to finish before continuing your program (or, more accurately, the CPU is running another program while it waits for the graphics card to finish).
How Can You Speed Things Up?
Use fewer passes. If you halve the passes, you halve the time it takes to do post-processing. To that end,
Use shaders. Since I didn't see you mention them anywhere, I'm not sure if you're using shaders for your post-processing. Shaders essentially allow you to write a function in a C-like language (since you're in OpenGL, you can use either GLSL or Cg) which is run on every rendered pixel of an object. They can take any parameters you like, and are extremely useful for post-processing. You set the quad to be drawn using your shader, and then you can insert whatever algorithm you'd like to be run on every pixel of your scene.
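For reference, a minimal OpenTK sketch of the shader plumbing for such a pass (the GLSL body is just a placeholder tint; the real post-processing algorithm goes in its place):

// Compiles a fragment shader and links it into a program; call
// GL.UseProgram(program) before drawing the fullscreen quad.
using OpenTK.Graphics.OpenGL;

static int CreatePostProcessProgram()
{
    const string fragmentSrc = @"
        uniform sampler2D scene;   // the FBO texture
        void main()
        {
            vec4 c = texture2D(scene, gl_TexCoord[0].st);
            gl_FragColor = vec4(c.rgb * 0.5, c.a);  // placeholder: darken
        }";

    int shader = GL.CreateShader(ShaderType.FragmentShader);
    GL.ShaderSource(shader, fragmentSrc);
    GL.CompileShader(shader);

    int program = GL.CreateProgram();
    GL.AttachShader(program, shader);
    GL.LinkProgram(program);
    return program;
}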
Seeing some code would be nice. If the only difference between the two runs is using the external GPU or not, the difference could be in memory management (i.e. how and when you're creating the FBO, etc.), since streaming data to the GPU can be slow. Try moving anything that creates any sort of OpenGL buffer, or sends any sort of data to one, into initialization. I can't really give more detailed advice without seeing exactly what you're doing.
It isn't just about the number of quads you render; I believe in your case it has more to do with the amount of triangle filling (fill rate) your video card has to do.
As was mentioned, the common way to do fullscreen post-processing is with shaders. If you want better performance on your integrated card and can't use shaders, then you should simplify your rendering routine.
Make sure you really need alpha blending. On some cards/drivers rendering textures with alpha channel can significantly reduce performance.
A somewhat low-quality way to reduce the amount of fullscreen filling would be to first perform all of your shadow draws on another, smaller texture (say, 256x256 instead of 1024x1024). Then you would draw a single quad with that compound shadow texture onto your buffer. This way, instead of seven 1024x1024 quads you would only need six 256x256 quads and one 1024x1024 quad, but you lose resolution.
Another technique, and I'm not sure it can be applied in your case, is to pre-render your complex background so you'll have to do less drawing in your rendering loop.
I've recently started hacking on my Kinect and I want to remove the depth shadow. The shadow is caused by the IR emitter being positioned slightly to the side of the camera, so any close object will get a big shadow and distant object less or no shadow.
The shadow length is related to the distance between the closest and the farthest spot on each side of the shadow.
My goal is to be able to map the color image correctly onto the depth data. This doesn't work without processing the shadow, as an example picture showed (image not included here).
Does the depth shadow always come out black?
If so, you could use a simple method like a temporal median to estimate the background of the image (more info here: http://www.roborealm.com/help/Temporal_Median.php) and then, whenever a pixel is black, set it to the background value at that pixel location.
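A minimal sketch of that fill step, assuming `frame` and `background` are same-sized ARGB pixel arrays and `background` is maintained by the temporal median linked above:

// Replaces black (shadow) pixels with the temporal-median background value.
static void FillShadowFromBackground(int[] frame, int[] background)
{
    for (int i = 0; i < frame.Length; i++)
        if ((frame[i] & 0x00FFFFFF) == 0)  // pixel is black (alpha ignored)
            frame[i] = background[i];
}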
I did some preliminary work on this problem a few weeks ago. My code works directly on a WriteableBitmap rather than the depth data, but if you're only doing image processing, it should work. The algorithm isn't perfect and would benefit from some more tweaking. If you update the code at all, let me know; I'd be very interested to see what you're doing!
The source code is posted on my blog:
http://richardpianka.com/2011/02/trackingni-depth-correction/
I don't know how it is with C#, but OpenNI in C++ has a function called xnSetViewPoint(); the only problem is that you lose the top 20 or so rows of image data due to the transformation.
The offset exists because two different sensors are used, placed close to each other but not at exactly the same position.
Kinect method - MapDepthFrameToColorFrame
Get the [x,y] positions in the depth frame, and use that method to map them into the color frame and fill in the corresponding color values.
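A hedged sketch against the Kinect for Windows SDK v1 (assuming `sensor` is a started KinectSensor and the formats match your enabled streams; adjust the resolutions to yours):

// Maps every depth pixel to its [x,y] position in the color frame.
using Microsoft.Kinect;

static ColorImagePoint[] MapDepthToColor(KinectSensor sensor,
                                         DepthImagePixel[] depthPixels)
{
    var colorPoints = new ColorImagePoint[depthPixels.Length];
    sensor.CoordinateMapper.MapDepthFrameToColorFrame(
        DepthImageFormat.Resolution640x480Fps30, depthPixels,
        ColorImageFormat.RgbResolution640x480Fps30, colorPoints);
    // colorPoints[i] is the color-space position for depth pixel i.
    return colorPoints;
}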
I'm sorry to say, but that shadow is caused by your body blocking the infrared dots from hitting that spot of the room, so it creates a black spot. There's nothing you can do but change the base background to a color other than black so it won't be a noticeable shadow.
The color camera and the Kinect depth camera don't have the same dimensions, and the infrared dots don't originate from the depth camera itself: they come from an IR projector a few centimeters to the side of it (that displacement is what's used to calculate depth).
However, the solution seems easy here: your shadow data is on the left side,
so you need to extend the last known color data from before it went black,
and to fit it better, translate the color camera data to the right.
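A sketch of that fill (assuming the pixels are an ARGB array and the shadow reads as black; whether you carry the color from the left or the right depends on which side of the object the shadow falls):

// Scans each row and replaces black (shadow) pixels with the last
// non-black color seen before them.
static void ExtendLastKnownColor(int[] pixels, int width, int height)
{
    for (int y = 0; y < height; y++)
    {
        int last = 0;                            // last known color in the row
        for (int x = 0; x < width; x++)
        {
            int i = y * width + x;
            if ((pixels[i] & 0x00FFFFFF) == 0)   // black: inside the shadow
                pixels[i] = last;
            else
                last = pixels[i];
        }
    }
}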
I need to write a program that uses matrix multiplication to rotate an image (a simple square) around its center by whatever number of degrees I need. Any help on this would be greatly appreciated. I have almost no clue what I'm doing because I have not taken so much as a glance at calculus.
Take a look at http://www.aforgenet.com/framework/. This is a complete image processing framework in C# that I'm using on a project. I just checked their help and they have a function that does what you want -
// create filter - rotate for 30 degrees keeping original image size
RotateBicubic filter = new RotateBicubic( 30, true );
// apply the filter
Bitmap newImage = filter.Apply( image );
It is an LGPL library, so as long as you link against their binaries, licensing shouldn't be an issue. There are also other libraries out there.
If you do decide to write it yourself, be careful about speed, as C# is not great at number crunching; but there are ways to work around it.
Here's a good code project article discussing just what you're wanting:
http://www.codeproject.com/KB/GDI-plus/matrix_transformation.aspx
Rotating a digital image in the plane boils down to a lot of 2x2 matrix multiplications. There's no calculus involved here! You don't need an entire image processing framework to rotate a square image - unless this is really performance sensitive in terms of image quality and speed.
Go and read the first half of Wikipedia's article on the rotation matrix and that should get you off to a good start.
In a nutshell, establish your origin (perhaps the center of the image if that's where you want to rotate around), then compute the coordinate of a pixel you'd like to rotate in pixel space and multiply it by your rotation matrix (see the article). Once you've done the multiply, you'll have the new coordinates of the pixel in pixel space. Write out that pixel in another image buffer and you'll be off and rotating. Repeat. Note that once you know your angle of rotation, you only need to compute your rotation matrix once! A rough sketch follows.
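Here is that loop as a rough sketch, with one small twist: it iterates over destination pixels and applies the inverse rotation to find each source pixel, which avoids the holes that forward mapping leaves (nearest-neighbor sampling, no filtering):

// Rotates `src`, a square ARGB pixel array of side n, by `degrees`
// around the image center, writing into a fresh buffer.
using System;

static int[] Rotate(int[] src, int n, double degrees)
{
    double t = degrees * Math.PI / 180.0;
    double cos = Math.Cos(t), sin = Math.Sin(t); // rotation matrix entries
    double c = (n - 1) / 2.0;                    // center of rotation
    int[] dst = new int[n * n];

    for (int y = 0; y < n; y++)
        for (int x = 0; x < n; x++)
        {
            // inverse rotation: where in the source does this pixel come from?
            double dx = x - c, dy = y - c;
            int sx = (int)Math.Round( cos * dx + sin * dy + c);
            int sy = (int)Math.Round(-sin * dx + cos * dy + c);
            if (sx >= 0 && sx < n && sy >= 0 && sy < n)
                dst[y * n + x] = src[sy * n + sx];
        }
    return dst;
}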
Have fun,
Paul