Unsafe C# and pointers for 2D rendering, good or bad? - c#

I am writing a C# control that wraps DirectX 9 and provides a simplified interface to perform 2D pixel level drawing. .NET requires that I wrap this code in an unsafe code block and compile with the allow unsafe code option.
I'm locking the entire surface which then returns a pointer to the locked area of memory. I can then write pixel data directly using "simple" pointer arithmetic. I have performance tested this and found a substantial speed improvement over other "safe" methods I know of.
Is this the fastest way to manipulate individual pixels in a C# .NET application? Is there a better, safer way? If there was an equally fast approach that does not require pointer manipulation it would be my preference to use that.
(I know this is 2008 and we should all be using DirectX 3D, OpenGL, etc., however this control is to be used exclusively for 2D pixel rendering and simply does not require 3D rendering.)

Using unsafe pointers is the fastest way to do direct memory manipulation in C# (definitely faster than using the Marshal wrapper functions).
Just out of curiosity, what sort of 2D drawing operations are you trying to perform?
I ask because locking a DirectX surface to do pixel level manipulations will defeat most of the hardware acceleration benefits that you would hope to gain from using DirectX. Also, the DirectX device will fail to initialize when used over terminal services (remote desktop), so the control will be unusable in that scenario (this may not matter to you).
DirectX will be a big win when drawing large triangles and transforming images (texture mapped onto a quad), but it won't really perform that great with single pixel manipulation.
Staying in .NET land, one alternative is to keep around a Bitmap object to act as your surface, using LockBits and directly accessing the pixels through the unsafe pointer in the returned BitmapData object.

Yes, that is probably the fastest way.
A few years ago I had to compare two 1024x1024 images at the pixel level; the get-pixel methods took 2 minutes, and the unsafe scan took 0.01 seconds.

I have also used unsafe to speed up things of that nature. The performance improvements are dramatic, to say the least. The point here is that unsafe turns off a bunch of things that you might not need as long as you know what you're doing.
Also, check out DirectDraw. It is the 2D graphics component of DirectX. It is really fast.

I recently was tasked with creating a simple histogram control for one of our thin client apps (C#). The images that I was analyzing were about 1200x1200 and I had to go the same route. I could make the thing draw itself once with no problem, but the control needed to be re-sizable. I tried to avoid it, but I had to get at the raw memory itself.
I'm not saying it is impossible using the standard .NET classes, but I couldn't get it too work in the end.

Related

C++ AMP calculations and WPF rendering graphics card dual use performance

Situation:
In an application that has both the need for calculation as well as rendering images (image preprocessing and then displaying) I want to use both AMP and WPF (with AMP doing some filters on the images and WPF not doing much more than displaying scaled/rotated images and some simple overlays, both running at roughly 30fps, new images will continuously stream in).
Question:
Is there any way to find out how the 2 will influence each other?
I am wondering on whether I will see the hopefully nice speed-up I will see in an isolated AMP only environment in the actual application later on as well.
Additional Info:
I will be able and am going to measure the AMP performance separately, since it is low level and new functionality that I am going to set up in a separate project anyway. The WPF rendering part already exists though in a complex application, so it would be difficult to isolate that.
I am not planning on doing the filters etc for rendering only since the results will be needed in intermediate levels as well (other algorithms, e. g. edge detection, saving, ...).
There are a couple of things you should consider here;
Is there any way to find out how the 2 will influence each other?
Directly no, but indirectly yes. Both WPF and AMP are making use of the GPU for rendering. If the AMP portion of your application uses too much of the GPU's resources it will interfere with your frame rate. The Cartoonizer case study from the C++ AMP book uses MFC and C++ AMP to do exactly the way you describe. On slower hardware with high image processing loads you can see the application's responsiveness suffer. However in almost all cases cartoonizing images on the GPU is much faster and can achieve video frame rates.
I am wondering on whether I will see the hopefully nice speed-up
With any GPU application the key to seeing performance improvements is that the speedup from running compute on the GPU, rather than the CPU, must make up for the additional overhead of copying data to and from the GPU.
In this case there is additional overhead as you must also marshal data from the native (C++ AMP) to managed (WPF) environments. You need to take care to do this efficiently by ensuring that your data types are blitable and do not require explicit marshaling. I implemented an N-body modeling application that used WPF and native code.
Ideally you would want to render the results of the GPU calculation without moving it through the CPU. This is possible but not if you explicitly involve WPF. The N-body example achieves this by embedding a DirectX render area directly into the WPF and then renders data directly from the AMP arrays. This was largely because the WPF viewport3D really didn't meet my performance needs. For rendering images WPF may be fine.
Unless things have changed with VS 2013 you definitely want your C++ AMP code in a separate DLL as there are some limitations when using native code in a C++/CLI project.
As #stijn suggests I would build a small prototype to make sure that the gains you get by moving some of the compute to the GPU are not lost due to the overhead of moving data both to and from the GPU but also into WPF.

Fast copying of GUI in C#?

Right now I'm copying window graphics from one window to another via BitBlt from WinApi. I wonder if there is any other fast / faster way to do the same in C#.
Keyword is performance here. If I should stay with WinApi I would hold HDC in memory for quick drawing and if .NET Framework has other possibilities I would probably hold Graphics objects. Right now I'm something to slow when I have to copy ~ 1920x1080 windows.
So how can I boost performance of gui copying in C# ?
I just want to know if I can get better than this. Explicit hardware acceleration (OpenGL, DirectX) is out of my interest here. I decided to stay pure .NET + WinApi.
// example to copy desktop to window
Graphics g = Graphics.FromHwnd(Handle);
IntPtr dc = g.GetHdc();
IntPtr dc0 = Windows.GetWindowDC(Windows.GetDesktopWindow());
Windows.BitBlt(dc, 0, 0, Width, Height, dc0, 0, 0, Windows.SRCCOPY);
// clean up of DCs and Graphics left out
Hans's questions:
How slow is it?
Too slow. Feels (very) stiff.
How fast does it need to be?
The software mustn't feel slow.
Why is it important?
User friendliness of software.
What does the hardware look like?
Any random PC.
Why can't you solve it with better hardware?
It's just software that runs on Windows machines. You won't goy and buy a new PC for a random software that runs slow on your old ?
Get a better video card!
The whole point of the GDI, whether you access it from native code or via .Net, is that it abstracts the details of the graphics subsystem. The downside being that the low level operations such as blitting are in the hands of the graphic driver writers. You can safely assume that these are as optimised as it's possible to get (after all, a video card manufacturer wants to make their card look the best).
The overhead of using the .Net wrappers instead of using the native calls directly pale into insignificance compared to the time spent doing the operation itself, so you're not going to gain much really.
Your code is doing a non-scaling, no blending copy which is possibly the fastest way to copy images.
Of course, you should be profiling the code to see what effect any changes you do to the code is having.
The question then is why are you copying such large images from one window to another? Do you control the contents of both windows?
Update
If you control both windows, why not draw to a single surface and then blit that to both windows? This should replace the often costly operation of reading data from the video card (i.e. blitting from one window to another) with two write operations. It's just a thought and might not work but you'd need timing data to see if it makes any difference.

Is there any advantage to using C++ instead of C# when using Direct3D?

Is there any advantage to using C++ instead of C# when using Direct3D? The tutorials I've found for C++ and DirectX all use Direct3D (which to my knowledge is managed). Similarly, all of the C# tutorials I've found are for Direct3D.
Is Direct3D managed?
Is there any difference between using D3D in either of the two languages?
DirectX is entirely native. Any impression you may have that it's managed is completely and utterly wrong. There are managed wrappers that will allow you to use DirectX from managed code. In addition, DirectX is programmed to be accessed from C++ or C, or similar languages. If you look at the SlimDX project, they encountered numerous issues, especially due to resource collection, because C# doesn't genuinely support non-memory resources being automatically collected and using doesn't cut the mustard. In addition, game programming can be very CPU-intensive, and often, the additional performance lost by using a managed language is untenable, and virtually all existing supporting libraries are for C or C++.
If you want to make a small game, or something like that, there's nothing at all stopping you from using managed code. However, I know of no commercial games that actually take this route.
The point of Direct3D is to move rendering off the CPU and onto the GPU. If there were to be a significant performance difference it would be for that code that runs on the CPU. Therefore I don't see that there should be any significant performance difference between native and managed code for the part of your code that interfaces with Direct3D.
Direct3D itself is not managed code.
It depends on what you're doing exactly. As David Heffernan mentioned, one of the objectives of Direct3D is to move as much processing as possible to the GPU. With the advent of vertex shaders, pixel shaders, and much more, we're closer to that reality than ever.
Of course given infinite time and resources, you can usually create more efficient algorithms in C++ than C#. This will affect performance at the CPU level. Today, processing that is not graphics related is still mostly done on the CPU. There are things like CUDA, OpenCL, and even future versions of DirectX which will open up possibilities of moving any parallel-friendly algorithm to the GPU as well. But the adoption rate of those technologies (and the video cards that support it) isn't exactly mainstream just yet.
So what types of CPU-intensive algorithms should you consider C++ for?
Artificial Intelligence
Particle engines / n-body simulations
Fast Fourier transform
Those are just the first things I can think of. At least the first two are very common in games today. AI is often done in a compromised fashion in games to run as quickly as possible, simply because it can be so processor intensive. And then particle engines are everywhere.

Translating C to C# and HLSL: will this be possible?

I've taken on quite a daunting challenge for myself. In my XNA game, I want to implement Blargg's NTSC filter. This is a C library that transforms a bitmap to make it look like it was output on a CRT TV with the NTSC standard. It's quite accurate, really.
The first thing I tried, a while back, was to just use the C library itself by calling it as a dll. Here I had two problems, 1. I couldn't get some of the data to copy correctly so the image was messed up, but more importantly, 2. it was extremely slow. It required getting the XNA Texture2D bitmap data, passing it through the filter, and then setting the data again to the texture. The framerate was ruined, so I couldn't go down this route.
Now I'm trying to translate the filter into a pixel shader. The problem here (if you're adventurous to look at the code - I'm using the SNES one because it's simplest) is that it handles very large arrays, and relies on interesting pointer operations. I've done a lot of work rewriting the algorithm to work independently per pixel, as a pixel shader will require. But I don't know if this will ever work. I've come to you to see if finishing this is even possible.
There's precalculated array involved containing 1,048,576 integers. Is this alone beyond any limits for the pixel shader? It only needs to be set once, not once per frame.
Even if that's ok, I know that HLSL cannot index arrays by a variable. It has to unroll it into a million if statements to get the correct array element. Will this kill the performance and make it a fruitless endeavor again? There are multiple array accesses per pixel.
Is there any chance that my original plan to use the library as is could work? I just need it to be fast.
I've never written a shader before. Is there anything else I should be aware of?
edit: Addendum to #2. I just read somewhere that not only can hlsl not access arrays by variable, but even to unroll it, the index has to be calculable at compile time. Is this true, or does the "unrolling" solve this? If it's true I think I'm screwed. Any way around that? My algorithm is basically a glorified version of "the input pixel is this color, so look up my output pixel values in this giant array."
From my limited understanding of Shader languages, your problem can easily be solved by using texture instead of array.
Pregenerate it on CPU and then save as texture. 1024x1024 in your case.
Use standard texture access functions as if texture was the array. Posibly using nearest-neighbor to limit blendinding of individual pixels.
I dont think this is possible if you want speed.

XOR Drawing in C#

I am a trying to learn C# .Net.
I had written small (hobby) Analog Clock application in VB sometime ago(edit: VB6, to be precise), and I thought I will rewrite in C#.NET, as part of my learning process.
In the VB application, I drew the hands of the clock in XOR Drawmode, so that I have to move the second hand, I just had to redraw it in the previous position and then draw the current position - I need not refresh the whole Form. All I did was
Me.DrawMode = vbNotXorPen
and then
Me.Line...
on a VB Form
In C# I don't find an equivalent Xor Pen Draw mode.
I found
ControlPaint.DrawReversibleLine
somewhere on the net, but I am not sure whether ControlPaint is meant for such purposes (and I don't understand based on what co-ordinate system ControlPaint is drawing)
Is there an equivalent to XOR drawing in C#.NET?
Or is there a better way to do what I am doing (with the best performance)
(Both VB and C# are my hobbies. So feel free to correct me wherever I am wrong)
.NET/GDI+ does not support XOR drawing. You'll have to workaround it by using p/invoke calls of several native functions.
See the link below for more information
http://www.vbaccelerator.com/home/net/code/libraries/Graphics/ZoomIn/article.asp
IMHO, until and unless you are targeting some really slow computers, you don't need to optimize performance by using XOR technique.
Since you'd be drawing the second hand only once in a second, a complete redraw of the clock would be much better. Also, the second hand will "look" good if drawn directly, and use smoothing mode set to Anti alias for a more cleaner look.
To optimize performance, you can create a bmp for clock every one minute and then draw the second hand upon it.

Categories