C++ AMP calculations and WPF rendering graphics card dual use performance - c#

Situation:
In an application that needs both computation and image display (image preprocessing, then rendering), I want to use both AMP and WPF: AMP will run some filters on the images, while WPF will do little more than display scaled/rotated images and some simple overlays. Both will run at roughly 30 fps, and new images will continuously stream in.
Question:
Is there any way to find out how the two will influence each other?
I am wondering whether the nice speed-up I hope to see in an isolated, AMP-only environment will also show up in the actual application later on.
Additional Info:
I can, and will, measure the AMP performance separately, since it is low-level, new functionality that I am going to set up in a separate project anyway. The WPF rendering part already exists in a complex application, though, so it would be difficult to isolate.
I am not planning to do the filters etc. for rendering only, since the results will also be needed at intermediate stages (other algorithms, e.g. edge detection, saving, ...).

There are a couple of things you should consider here:
Is there any way to find out how the two will influence each other?
Directly no, but indirectly yes. Both WPF and AMP make use of the GPU for rendering. If the AMP portion of your application uses too much of the GPU's resources, it will interfere with your frame rate. The Cartoonizer case study from the C++ AMP book uses MFC and C++ AMP to do exactly what you describe. On slower hardware with high image-processing loads you can see the application's responsiveness suffer. However, in almost all cases cartoonizing images on the GPU is much faster and can achieve video frame rates.
I am wondering on whether I will see the hopefully nice speed-up
With any GPU application the key to seeing performance improvements is that the speedup from running compute on the GPU, rather than the CPU, must make up for the additional overhead of copying data to and from the GPU.
In this case there is additional overhead, as you must also marshal data from the native (C++ AMP) environment to the managed (WPF) one. You need to take care to do this efficiently by ensuring that your data types are blittable and do not require explicit marshaling. I implemented an N-body modeling application that used WPF and native code.
Ideally you would want to render the results of the GPU calculation without moving them through the CPU. This is possible, but not if you explicitly involve WPF. The N-body example achieves this by embedding a DirectX render area directly into the WPF window and then rendering data directly from the AMP arrays. I did this largely because WPF's Viewport3D really didn't meet my performance needs. For rendering images, WPF may be fine.
Unless things have changed with VS 2013 you definitely want your C++ AMP code in a separate DLL as there are some limitations when using native code in a C++/CLI project.
As @stijn suggests, I would build a small prototype to make sure that the gains you get by moving some of the compute to the GPU are not lost to the overhead of moving data to and from the GPU, and then into WPF.

Related

IIS & service-based generation of web images - GDI, GDI+ or Direct2D?

It is May 2017. My boss has asked me to produce some code to make some custom web images on our website based on text that the user enters into their browser.
The Server environment is Windows 2012 running IIS, and I am familiar with C#. From what I read I should be able to use GDI+ to create images, draw smooth text into them etc.
However, one of my colleagues suggested GDI+ may not work on Windows Server, and that GDI+ is based on older GDI which is 32-bit and will therefore be scrapped one day soon, and that I should use DirectX instead. I feel that to introduce another layer would make matters more complex to write & support.
There are a lot of web pages discussing these subjects as well as performance of each but it feels inconclusive so I ask for experience from the SO community.
So, question: Will GDI+ work on Windows Server?
EDIT: Thanks for the responses. I see from them that I was a tad vague on a couple of points. Specifically, we are intending the rendering-to-image process to be a queue-based process, with a service running the GDI+ graphics code. I have just read this from 2013, which suggests that GDI+ should not be run within a service, and that Direct2D is the MS-preferred way to go.
EDIT 2: Further research has found this page. It says the options are GDI, GDI+ or Direct2D. I copy the key paras here, though the entire page is a quick read so view at source for context if you can.
Options for Available APIs
There are three options for server-side rendering: GDI, GDI+ and Direct2D. Like GDI and GDI+, Direct2D is a native 2D rendering API that gives applications more control over the use of graphics devices. In addition, Direct2D uniquely supports a single-threaded and a multithreaded factory. The following sections compare each API in terms of drawing qualities and multithreaded server-side rendering.
GDI
Unlike Direct2D and GDI+, GDI does not support high-quality drawing features. For instance, GDI does not support antialiasing for creating smooth lines and has only limited support for transparency. Based on the graphics performance test results on Windows 7 and Windows Server 2008 R2, Direct2D scales more efficiently than GDI, despite the redesign of locks in GDI. For more information about these test results, see Engineering Windows 7 Graphics Performance. In addition, applications using GDI are limited to 10240 GDI handles per process and 65536 GDI handles per session. The reason is that internally Windows uses a 16-bit WORD to store the index of handles for each session.
GDI+
While GDI+ supports antialiasing and alpha blending for high-quality drawing, the main problem with GDI+ for server scenarios is that it does not support running in Session 0. Since Session 0 only supports non-interactive functionality, functions that directly or indirectly interact with display devices will therefore receive errors. Specific examples of functions include not only those dealing with display devices, but also those indirectly dealing with device drivers. Similar to GDI, GDI+ is limited by its locking mechanism. The locking mechanisms in GDI+ are the same in Windows 7 and Windows Server 2008 R2 as in previous versions.
Direct2D
Direct2D is a hardware-accelerated, immediate-mode, 2-D graphics API that provides high performance and high-quality rendering. It offers a single-threaded and a multithreaded factory and the linear scaling of coarse-grained software rendering. To do this, Direct2D defines a root factory interface. As a rule, an object created on a factory can only be used with other objects created from the same factory. The caller can request either a single-threaded or a multithreaded factory when it is created. If a single-threaded factory is requested, then no locking of threads is performed. If the caller requests a multithreaded factory, then a factory-wide thread lock is acquired whenever a call is made into Direct2D. In addition, the locking of threads in Direct2D is more granular than in GDI and GDI+, so that increasing the number of threads has minimal impact on performance.
After some discussion of threading and some sample code, it concludes...
Conclusion
As seen from the above, using Direct2D for server-side rendering is simple and straightforward. In addition, it provides high quality and highly parallelizable rendering that can run in low-privilege environments of the server.
Whilst I interpret the slant of the piece as being pro-Direct2D, the points on locking and session-0 for GDI+ are concerning. Arguably, since we propose a queue-based process, the locking issue is less severe, but if there are immediate and practical restrictions to what a service can do with GDI+ then it looks like Direct2D is the only viable route for my project.
Did I interpret this correctly or has the SO community more recent & relevant experience?
EDIT: With the initial batch of responses slowing down and no sign of a definitive answer, I add this edit. The team here has selected SharpDX as a wrapping library for MS DirectWrite, which is itself part of the DirectX family of APIs. We are not 100% certain that SharpDX will be required, and we will be comparing it to a DirectWrite-only implementation as we go along, looking out for the benefit or hindrance the extra layer represents. We believe at this point in time that this follows the direction MS were trying to suggest in the article sampled above, and that we will be free of GDI/GDI+ shortcomings in a service environment and able to benefit from performance and feature gains in DirectWrite. We shall see.
EDIT: Having delved into SharpDX we are making progress, and something mentioned by Mgetz about 'WARP' now makes sense. Direct3D is the underpinning tech we access via the SharpDX API. As with all low-level graphics work, we request a device context (aka DC), then a drawing surface, then we draw. The device context part is where WARP comes in. A DC usually fronts a hardware device - but in my project I am targeting a service on a server where it is unlikely that there will be a graphics processor, and maybe not even a video card. If it is a virtual server then the video processor may be shared, etc. So I don't want to be tied to a 'physical' hardware device. Enter WARP (a good time to view the link for full context), which is an entirely software realisation of a DC - no hardware dependency. Sweet. Here is an extract from the linked page:
Enabling Rendering When Direct3D 10 Hardware is Not Available
WARP allows fast rendering in a variety of situations where hardware implementations are unavailable, including:
When the user does not have any Direct3D-capable hardware
When an application runs as a service or in a server environment
When a video card is not installed
When a video driver is not available, or is not working correctly
When a video card is out of memory, hangs, or would take too many system resources to initialize
In your case, I would probably try to go with SkiaSharp (https://github.com/mono/SkiaSharp) to abstract a bit from the platform/API details.

GDI/GDI+ in WPF applications: performance and good practices

There are some GDI objects for doing some work with images in WPF, but these objects easily generate memory leaks and other errors (e.g. MILERR_WIN32ERROR).
What would be high-level alternatives to do the same work without using GDI?
Would GDI be bad for performance in a WPF application, given that WPF uses DirectX underneath?
What would be high-level alternatives to do the same work without using GDI?
It really depends, but ideally you'd do the work using WPF's API instead.
Would GDI be bad for performance in a WPF application, given that WPF uses DirectX underneath?
There's always going to be extra conversion between WPF's image formats and System.Drawing, as WPF doesn't use GDI. Mapping back and forth adds some overhead.

Displaying bitmaps rapidly

I have a WPF application where I need to add a feature that will display a series of full-screen bitmaps very fast. In most cases it will be only two images, essentially toggling between the two. The rate they are displayed at should be constant, around 10-20 ms per image. I tried doing this directly in WPF using a timer, but the display rate appeared to vary quite a bit. I also tried SharpGL (a .NET wrapper around OpenGL), but it was very slow with large images (I may not have been doing it the best way). I will have all the bitmaps upfront, before compile time, so the format could be changed as long as the pixels are not altered.
What would be the best way to do this?
I'm already behind schedule so I don't have time to learn lots of APIs or experiment with lots of options.
"I tried doing this directly in WPF using a timer, but the display rate appeared to vary quite a bit."
Instead of using a Timer, use Thread.Sleep(20), as it won't hog as many system resources. This should give you an immediate improvement.
It also sounds as though there will be user interaction with the application while the images are rapidly toggling; in this case, put the code that toggles the images in a background thread. Remember, though, that the UI is not thread-safe.
These are just quick wins, but you might need to use DirectX for hardware acceleration to get around the HAL:
The Windows' Hardware Abstraction Layer (HAL) is implemented in Hal.dll. The HAL implements a number of functions that are implemented in different ways by different hardware platforms, which in this context refers mostly to the chipset. Other components in the operating system can then call these functions in the same way on all platforms, without regard for the actual implementation.

High performance graphics using the WPF Visual layer

I am creating a WPF mapping program which will potentially load and draw hundreds of files to the screen at any one time, and a user may want to zoom and pan this display. Some of these file types may contain thousands of points, which would most likely be connected as some kind of path. Other supported formats will include TIFF files.
Is it better for performance to have a single DrawingVisual to which all data is drawn, or should I be creating a new DrawingVisual for each file loaded?
If anyone can offer any advice on this it would be much appreciated.
You will find lots of related questions on Stack Overflow; however, not all of them mention that one of the most high-performance ways to draw large amounts of data to the screen is to use the WriteableBitmap API. I suggest taking a look at the WriteableBitmapEx open-source project on CodePlex. Disclosure: I have contributed to it once, but it is not my library.
Having experimented with DrawingVisual, StreamGeometry, OnRender and Canvas, all of these fall over once you have to draw 1,000 or more "objects" to the screen. There are techniques that deal with virtualization of a canvas (there's a million-item demo with a virtualized canvas), but even this is limited to roughly 1,000 visible items at one time before slowdown. WriteableBitmap allows you to access a bitmap directly and draw on it (oldskool style), meaning you can draw tens of thousands of objects at speed. You are free to implement your own optimisations (multi-threading, level of detail), but note you don't get many frills with that API. You literally are doing the work yourself.
There is one caveat, though. While WPF uses the CPU for tessellation and the GPU for rendering, WriteableBitmap uses the CPU for everything. Therefore the fill rate (the number of pixels rendered per frame) becomes the bottleneck, depending on your CPU power.
Failing that if you really need high-performance rendering, I'd suggest taking a look at SharpDX (Managed DirectX) and the interop with WPF. This will give you the highest performance as it will directly use the GPU.
Using many small DrawingVisuals with few details rendered per visual gave better performance in my experience than fewer DrawingVisuals with more details rendered per visual. I also found that deleting all of the visuals and rendering new visuals was faster than reusing existing visuals when a redraw was required. Breaking each map into a number of visuals may help performance.
As with anything performance related, conducting timing tests with your own scenarios is the best way to be sure.

Is there any advantage to using C++ instead of C# when using Direct3D?

Is there any advantage to using C++ instead of C# when using Direct3D? The tutorials I've found for C++ and DirectX all use Direct3D (which to my knowledge is managed). Similarly, all of the C# tutorials I've found are for Direct3D.
Is Direct3D managed?
Is there any difference between using D3D in either of the two languages?
DirectX is entirely native. Any impression you may have that it's managed is completely and utterly wrong. There are managed wrappers that will allow you to use DirectX from managed code. In addition, DirectX is designed to be accessed from C++ or C, or similar languages. If you look at the SlimDX project, they encountered numerous issues, especially around resource collection, because C# doesn't genuinely support automatic collection of non-memory resources, and using doesn't cut the mustard. In addition, game programming can be very CPU-intensive, and often the additional performance lost by using a managed language is untenable; virtually all existing supporting libraries are for C or C++.
If you want to make a small game, or something like that, there's nothing at all stopping you from using managed code. However, I know of no commercial games that actually take this route.
The point of Direct3D is to move rendering off the CPU and onto the GPU. If there were to be a significant performance difference it would be for that code that runs on the CPU. Therefore I don't see that there should be any significant performance difference between native and managed code for the part of your code that interfaces with Direct3D.
Direct3D itself is not managed code.
It depends on what you're doing exactly. As David Heffernan mentioned, one of the objectives of Direct3D is to move as much processing as possible to the GPU. With the advent of vertex shaders, pixel shaders, and much more, we're closer to that reality than ever.
Of course given infinite time and resources, you can usually create more efficient algorithms in C++ than C#. This will affect performance at the CPU level. Today, processing that is not graphics related is still mostly done on the CPU. There are things like CUDA, OpenCL, and even future versions of DirectX which will open up possibilities of moving any parallel-friendly algorithm to the GPU as well. But the adoption rate of those technologies (and the video cards that support it) isn't exactly mainstream just yet.
So what types of CPU-intensive algorithms should you consider C++ for?
Artificial Intelligence
Particle engines / n-body simulations
Fast Fourier transform
Those are just the first things I can think of. At least the first two are very common in games today. AI is often done in a compromised fashion in games to run as quickly as possible, simply because it can be so processor intensive. And then particle engines are everywhere.
