I'm working on a new OpenGL application, and I'm aware that display lists will be deprecated in OpenGL 3.1 (along with many other useful features, which seems kind of silly to me) and replaced with vertex buffer objects. I successfully drew a triangle using VBOs on an NVIDIA card, but the example failed to run on the Intel chip in my netbook because it does not support glGenBuffers. It seems there is a crucial flaw here in OpenGL: a compatibility break between newer and older GPUs/GMAs. As a small business, compatibility with as many systems as possible is necessary for my game, but I also don't want my program to fail on newer graphics cards because it depends on display lists, which have been removed from the OpenGL 4.1 specification. Which would give me the widest support across graphics cards, old and new: display lists or vertex buffer objects?
If your application will run on GMA, you necessarily have a low poly count, so the inefficiency of having display lists emulated in the drivers of newer video cards will not be a problem; they have bandwidth to spare.
If you're still concerned about efficiency, make sure to use glVertexPointer/glDrawArrays to maximize batch size. This can be combined with display lists, and it reduces the number of separate operations in the list, which makes the emulation less problematic.
Worst case, if some platform really doesn't support display lists, you can replace glCallList with a function call.
Your Intel card does support VBOs but only through the ARB interfaces. Try glGenBuffersARB (along with converting all your other VBO code to use the ARB versions). It will work on nVidia and Intel GMA.
Alternatively, you can query the OpenGL version supported by the system and use display lists or VBOs accordingly, instead of relying on one method to be compatible with every OpenGL version. This way you can be certain that you are using the most efficient drawing method that the system actually supports.
Also, such an implementation can be easily extended to support future drawing methods in, say, OpenGL 4.4 or OpenGL 5.0.
However, this can result in multiple pieces of code that do functionally the same thing (tell the GPU what to draw), which increases code size and complexity. A sketch of such a capability check is shown below.
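To make the idea concrete, here is a minimal sketch of the capability check. It assumes the OpenTK C# bindings (my assumption; the question doesn't say which language or bindings are in use), and it only decides which path to take; the actual drawing code for each path is omitted.

    // Sketch only: decide between VBOs and display lists at startup,
    // assuming an OpenTK (C#) GL context is already current.
    using System;
    using OpenTK.Graphics.OpenGL;

    static class RenderPathSelector
    {
        public enum Path { VertexBufferObjects, DisplayLists }

        public static Path Choose()
        {
            // GL_VERSION looks like "1.4.0 - Build 7.14.10.4926" on older Intel GMA,
            // or "3.3.0 NVIDIA 275.33" on newer cards.
            string version = GL.GetString(StringName.Version);
            string extensions = GL.GetString(StringName.Extensions) ?? "";

            int major = int.Parse(version.Split('.')[0]);

            // Core VBOs exist since OpenGL 1.5; older chips may still expose the ARB extension.
            if (major >= 2 || version.StartsWith("1.5") ||
                extensions.Contains("GL_ARB_vertex_buffer_object"))
            {
                return Path.VertexBufferObjects;
            }
            return Path.DisplayLists;
        }
    }

The result can be cached once at startup and used to pick the code path for all subsequent drawing.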
It is May 2017. My boss has asked me to produce some code to make some custom web images on our website based on text that the user enters into their browser.
The server environment is Windows Server 2012 running IIS, and I am familiar with C#. From what I have read, I should be able to use GDI+ to create images, draw smooth text into them, and so on.
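For reference, this is roughly the GDI+ (System.Drawing) code I had in mind; it is a minimal sketch rather than production code, and the font, sizes and output path are placeholders:

    using System.Drawing;
    using System.Drawing.Imaging;
    using System.Drawing.Text;

    // Minimal GDI+ sketch: render user-supplied text into a PNG.
    static void RenderTextToPng(string text, string outputPath)
    {
        using (var bmp = new Bitmap(600, 100))
        using (var g = Graphics.FromImage(bmp))
        using (var font = new Font("Arial", 24))
        {
            g.Clear(Color.White);
            g.TextRenderingHint = TextRenderingHint.AntiAliasGridFit;
            g.DrawString(text, font, Brushes.Black, new PointF(10, 30));
            bmp.Save(outputPath, ImageFormat.Png);
        }
    }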
However, one of my colleagues suggested that GDI+ may not work on Windows Server, that GDI+ is based on the older GDI (which is 32-bit and will therefore be scrapped one day soon), and that I should use DirectX instead. I feel that introducing another layer would make matters more complex to write and support.
There are a lot of web pages discussing these subjects, as well as the performance of each, but it all feels inconclusive, so I'm asking for experience from the SO community.
So, the question: will GDI+ work on Windows Server?
EDIT: Thanks for the responses. I see from them that I was a tad vague on a couple of points. Specifically, we intend the render-to-image process to be a queue-based process, with a service running the GDI+ graphics code. I have just read this from 2013, which suggests that GDI+ should not be run within a service and that Direct2D is Microsoft's preferred way to go.
EDIT 2: Further research has found this page. It says the options are GDI, GDI+ or Direct2D. I copy the key paragraphs here, though the entire page is a quick read, so view it at the source for context if you can.
Options for Available APIs
There are three options for server-side rendering: GDI, GDI+ and Direct2D. Like GDI and GDI+, Direct2D is a native 2D rendering API that gives applications more control over the use of graphics devices. In addition, Direct2D uniquely supports a single-threaded and a multithreaded factory. The following sections compare each API in terms of drawing qualities and multithreaded server-side rendering.
GDI
Unlike Direct2D and GDI+, GDI does not support high-quality drawing features. For instance, GDI does not support antialiasing for creating smooth lines and has only limited support for transparency. Based on the graphics performance test results on Windows 7 and Windows Server 2008 R2, Direct2D scales more efficiently than GDI, despite the redesign of locks in GDI. For more information about these test results, see Engineering Windows 7 Graphics Performance. In addition, applications using GDI are limited to 10240 GDI handles per process and 65536 GDI handles per session. The reason is that internally Windows uses a 16-bit WORD to store the index of handles for each session.
GDI+
While GDI+ supports antialiasing and alpha blending for high-quality drawing, the main problem with GDI+ for server scenarios is that it does not support running in Session 0. Since Session 0 only supports non-interactive functionality, functions that directly or indirectly interact with display devices will therefore receive errors. Specific examples of functions include not only those dealing with display devices, but also those indirectly dealing with device drivers. Similar to GDI, GDI+ is limited by its locking mechanism. The locking mechanisms in GDI+ are the same in Windows 7 and Windows Server 2008 R2 as in previous versions.
Direct2D
Direct2D is a hardware-accelerated, immediate-mode, 2-D graphics API that provides high performance and high-quality rendering. It offers a single-threaded and a multithreaded factory and the linear scaling of coarse-grained software rendering. To do this, Direct2D defines a root factory interface. As a rule, an object created on a factory can only be used with other objects created from the same factory. The caller can request either a single-threaded or a multithreaded factory when it is created. If a single-threaded factory is requested, then no locking of threads is performed. If the caller requests a multithreaded factory, then a factory-wide thread lock is acquired whenever a call is made into Direct2D. In addition, the locking of threads in Direct2D is more granular than in GDI and GDI+, so that the increase of the number of threads has minimal impact on performance.
After some discussion of threading and some sample code, it concludes...
Conclusion
As seen from the above, using Direct2D for server-side rendering is simple and straightforward. In addition, it provides high quality and highly parallelizable rendering that can run in low-privilege environments of the server.
Whilst I interpret the slant of the piece as being pro-Direct2D, the points on locking and Session 0 for GDI+ are concerning. Arguably, since we propose a queue-based process, the locking issue is less severe, but if there are immediate and practical restrictions on what a service can do with GDI+, then it looks like Direct2D is the only viable route for my project.
Did I interpret this correctly, or does the SO community have more recent and relevant experience?
EDIT: With the initial batch of responses slowing up and no sign of a definitive answer, I add this edit. The team here has selected SharpDX as a wrapping library for MS DirectWrite, which is itself part of the DirectX family of APIs. We are not 100% certain that SharpDX will be required, and we will be comparing it against a plain DirectWrite implementation as we go along, watching for the benefit or hindrance the extra layer represents. We believe at this point in time that this follows the direction MS were trying to suggest in the article sampled above, and that we will be free of GDI/GDI+ shortcomings in a service environment and able to benefit from the performance and feature gains in DirectWrite. We shall see.
EDIT: Having delved into SharpDX we are making progress, and something mentioned by Mgetz about 'WARP' now makes sense. Direct3D is the underpinning tech we access via the SharpDX API. As with all low-level graphics work, we request a device context (aka DC), then a drawing surface, then we draw. The device context part is where WARP comes in. A DC usually fronts a hardware device, but in my project I am targeting a service on a server where it is unlikely that there will be a graphics processor, and maybe not even a video card. If it is a virtual server, then the video processor may be shared, and so on. So I don't want to be tied to a 'physical' hardware device. Enter WARP (a good time to view the link for full context), which is an entirely software realisation of a DC, with no hardware dependency. Sweet. A small sketch of how we request a WARP device follows the quoted list below. Here is an extract from the linked page:
Enabling Rendering When Direct3D 10 Hardware is Not Available
WARP allows fast rendering in a variety of situations where hardware implementations are unavailable, including:
When the user does not have any Direct3D-capable hardware
When an application runs as a service or in a server environment
When a video card is not installed
When a video driver is not available, or is not working correctly
When a video card is out of memory, hangs, or would take too many system resources to initialize
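To make the WARP point concrete, this is roughly how we ask SharpDX for a software (WARP) device instead of a hardware one; treat it as a sketch from our early experiments rather than a recommendation, and the factory class name is ours:

    using SharpDX.Direct3D;
    using SharpDX.Direct3D11;

    static class WarpDeviceFactory
    {
        // Request a WARP (software) Direct3D 11 device, so no physical GPU
        // or video driver is needed on the server.
        public static Device Create()
        {
            return new Device(DriverType.Warp, DeviceCreationFlags.BgraSupport);
        }

        // From here we would query the DXGI device and hand it to
        // Direct2D/DirectWrite for the actual text rendering (omitted).
    }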
In your case, I would probably try to go with SkiaSharp (https://github.com/mono/SkiaSharp) to abstract a bit from the platform/API details.
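For instance, a minimal SkiaSharp sketch of the text-to-image job could look like the following; the font size, colours and output path are placeholders, and I'm using the classic SKPaint-based text API:

    using System.IO;
    using SkiaSharp;

    // Minimal SkiaSharp sketch: render text into a PNG without touching GDI+.
    static void RenderTextToPng(string text, string outputPath)
    {
        var info = new SKImageInfo(600, 100);
        using (var surface = SKSurface.Create(info))
        {
            var canvas = surface.Canvas;
            canvas.Clear(SKColors.White);

            using (var paint = new SKPaint { Color = SKColors.Black, TextSize = 24, IsAntialias = true })
            {
                canvas.DrawText(text, 10, 60, paint);
            }

            using (var image = surface.Snapshot())
            using (var data = image.Encode(SKEncodedImageFormat.Png, 100))
            using (var stream = File.OpenWrite(outputPath))
            {
                data.SaveTo(stream);
            }
        }
    }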
Situation:
In an application that needs both calculation and the rendering of images (image preprocessing and then displaying), I want to use both AMP and WPF: AMP doing some filters on the images, and WPF doing not much more than displaying scaled/rotated images and some simple overlays, both running at roughly 30 fps, with new images continuously streaming in.
Question:
Is there any way to find out how the 2 will influence each other?
I am wondering whether the hopefully nice speed-up I will see in an isolated AMP-only environment will also show up in the actual application later on.
Additional Info:
I am able to, and am going to, measure the AMP performance separately, since it is low-level, new functionality that I am going to set up in a separate project anyway. The WPF rendering part already exists in a complex application, though, so it would be difficult to isolate.
I am not planning on doing the filters etc. for rendering only, since the results will be needed at intermediate levels as well (other algorithms, e.g. edge detection, saving, ...).
There are a couple of things you should consider here:
Is there any way to find out how the 2 will influence each other?
Directly no, but indirectly yes. Both WPF and AMP make use of the GPU. If the AMP portion of your application uses too much of the GPU's resources, it will interfere with your frame rate. The Cartoonizer case study from the C++ AMP book uses MFC and C++ AMP to do exactly what you describe. On slower hardware, with high image-processing loads, you can see the application's responsiveness suffer. However, in almost all cases cartoonizing images on the GPU is much faster and can achieve video frame rates.
I am wondering on whether I will see the hopefully nice speed-up
With any GPU application the key to seeing performance improvements is that the speedup from running compute on the GPU, rather than the CPU, must make up for the additional overhead of copying data to and from the GPU.
In this case there is additional overhead, as you must also marshal data between the native (C++ AMP) and managed (WPF) environments. You need to take care to do this efficiently by ensuring that your data types are blittable and do not require explicit marshaling. I implemented an N-body modeling application that used WPF and native code.
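For example, a blittable struct passed across a P/Invoke boundary to a native C++ AMP DLL needs no per-element marshaling; the marshaler can pin the array and pass a pointer straight through. The names below (Pixel, AmpFilters.dll, ApplyFilter) are hypothetical:

    using System.Runtime.InteropServices;

    // Blittable: only primitive value-type fields with sequential layout,
    // so an array of these can be pinned and passed to native code without copying.
    [StructLayout(LayoutKind.Sequential)]
    public struct Pixel
    {
        public float R, G, B, A;
    }

    public static class NativeFilters
    {
        // Hypothetical export from the native C++ AMP DLL.
        [DllImport("AmpFilters.dll", CallingConvention = CallingConvention.Cdecl)]
        public static extern void ApplyFilter(Pixel[] pixels, int count);
    }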
Ideally you would want to render the results of the GPU calculation without moving them through the CPU. This is possible, but not if you explicitly involve WPF. The N-body example achieves this by embedding a DirectX render area directly into the WPF window and then rendering data directly from the AMP arrays. This was largely because WPF's Viewport3D really didn't meet my performance needs. For rendering images, WPF may be fine.
Unless things have changed with VS 2013, you definitely want your C++ AMP code in a separate DLL, as there are some limitations when using native code in a C++/CLI project.
As @stijn suggests, I would build a small prototype to make sure that the gains you get by moving some of the compute to the GPU are not lost due to the overhead of moving data to and from the GPU and then into WPF.
I am looking into making a game for Windows Phone and Windows 8 RT. The first iteration of the game will use XNA for the UI.
But since I plan to have other iterations that may not use XNA, I am writing my core game logic in a Portable Class Library.
I have gotten to the part where I am calculating vector math (sprite locations) in the core game logic.
As I was figuring this out, I had a co-worker tell me that I should make sure that I am doing these calculations on the GPU (not the CPU).
So, here is the question, if I use XNA vector libraries to do my vector calculations, are they automatically done on the GPU?
Side Question: If not, should they be done on the GPU? Or is it OK for me to do them in my Portable Class Library and have the CPU run them?
Note: If I need to have XNA do them so that I can use the GPU then it is not hard to inject that functionality into my core logic from XNA. I just want to know if it is something I should really be doing.
Note II: My game is a 2D game. It will be calculating movement of bad guys and projectiles along a vector. (Meaning this is not a huge 3D Game.)
I think your co-worker is mistaken. Here are just two of the reasons that doing this kind of calculation on the GPU doesn't make sense:
The #1 reason, by a very large margin, is that it's not cheap to get data onto the GPU in the first place. And then it's extremely expensive to get data back from the GPU.
The #2 reason is that the GPU is good for doing parallel calculations - that is - it does the same operation on a large amount of data. The kind of vector operations you will be doing are many different operations, on a small-to-medium amount of data.
So you'd get a huge win if - say - you were doing a particle system on the GPU. It's a large amount of homogeneous data, you perform the same operation on each particle, and all the data can live on the GPU.
Even XNA's built-in SpriteBatch does most of its per-sprite work on the CPU (everything but the final, overall matrix transformation). While it could do per-sprite transforms on the GPU (and I think it used to in XNA 3), it doesn't. This allows it to reduce the amount of data it needs to send the GPU (a performance win), and makes it more flexible - as it leaves the vertex shader free for your own use.
These are great reasons to use the CPU. I'd say if it's good enough for the XNA team - it's good enough for you :)
Now, what I think your co-worker may have meant - rather than the GPU - was to do the vector maths using SIMD instructions (on the CPU).
These give you a performance win. For example - adding a vector usually requires you to add the X component, and then the Y component. Using SIMD allows the CPU to add both components at the same time.
Sadly Microsoft's .NET runtime does not currently make (this kind of) use of SIMD instructions. It's supported in Mono, though.
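For illustration, Mono's SIMD support looks roughly like the sketch below. Mono.Simd maps the Vector4f operators onto SSE instructions where the hardware supports them; treat the exact API as an approximation from memory rather than a definitive reference.

    using Mono.Simd;

    static class SimdExample
    {
        // One SSE add handles all four components at once,
        // instead of four separate scalar additions.
        public static Vector4f Add(Vector4f a, Vector4f b)
        {
            return a + b;   // compiled to a single packed add (addps) under Mono on SSE-capable CPUs
        }
    }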
So, here is the question, if I use XNA vector libraries to do my vector calculations, are they automatically done on the GPU?
Looking inside the Vector class in XNA using ILSpy reveals that the XNA Vector libraries do not use the graphics card for vector math.
I am building an app for Windows Mobile 6.5 and I was wondering if there is any way to hardware accelerate various calculations. I would like to have the GPU do some of the work for the app, instead of relying on the CPU to do everything.
I would like to use C#, but if that is not possible, then C++ is just fine.
Thanks for any guidance!
EDIT-
An example of the types of calculations I want to offload to the GPU would be things like calculating the locations of 25-100 different rectangles so they can be placed on the screen. This is just a simple example, but I've currently been doing these kinds of calculations on a separate thread, so I figured (since they are geometry calculations) they would be a prime candidate for the GPU.
To fully answer your question I would need more details about what calculations you are trying to perform, but the short answer is no: the GPUs in Windows Mobile devices, and the SDK Microsoft exposes, are not suitable for GPGPU (general-purpose computation on graphics hardware).
GPGPU really only became practical when GPUs started providing programmable vertex and pixel shaders with DirectX 9 (and limited support in 8). The GPUs used in Windows Mobile 6.5 devices are much more similar to those from around DirectX 8, and do not have programmable vertex and pixel shaders:
http://msdn.microsoft.com/en-us/library/aa920048.aspx
Even on modern desktop graphics cards with GPGPU libraries such as CUDA, getting performance increases when offloading calculations to the GPU is not a trivial task. The calculations must be inherently suited to GPUs (i.e. able to run massively in parallel, with enough computation performed on each piece of memory to offset the cost of transferring it to the GPU and back).
That does not mean it is impossible to speed up calculations with the GPU on Windows Mobile 6.5, however. There is a small set of problems that can be mapped to a fixed-function pipeline without shaders. If you can figure out how to solve your problem by rendering polygons and reading back the resulting image, then you can use the GPU to do it, but it is unlikely that the calculations you need to do would be suitable, or that it would be worth the effort of attempting.
Is there any advantage to using C++ instead of C# when using Direct3D? The tutorials I've found for C++ and DirectX all use Direct3D (which to my knowledge is managed). Similarly, all of the C# tutorials I've found are for Direct3D.
Is Direct3D managed?
Is there any difference between using D3D in either of the two languages?
DirectX is entirely native. Any impression you may have that it's managed is completely and utterly wrong. There are managed wrappers that will allow you to use DirectX from managed code. In addition, DirectX is designed to be accessed from C++ or C, or similar languages. If you look at the SlimDX project, they encountered numerous issues, especially around resource cleanup, because C# doesn't genuinely support automatic collection of non-memory resources, and using doesn't cut the mustard. In addition, game programming can be very CPU-intensive, and often the additional performance lost by using a managed language is untenable; virtually all existing supporting libraries are for C or C++.
If you want to make a small game, or something like that, there's nothing at all stopping you from using managed code. However, I know of no commercial games that actually take this route.
The point of Direct3D is to move rendering off the CPU and onto the GPU. If there were to be a significant performance difference it would be for that code that runs on the CPU. Therefore I don't see that there should be any significant performance difference between native and managed code for the part of your code that interfaces with Direct3D.
Direct3D itself is not managed code.
It depends on what you're doing exactly. As David Heffernan mentioned, one of the objectives of Direct3D is to move as much processing as possible to the GPU. With the advent of vertex shaders, pixel shaders, and much more, we're closer to that reality than ever.
Of course, given infinite time and resources, you can usually create more efficient algorithms in C++ than in C#. This will affect performance at the CPU level. Today, processing that is not graphics-related is still mostly done on the CPU. There are things like CUDA, OpenCL, and even future versions of DirectX which will open up the possibility of moving any parallel-friendly algorithm to the GPU as well. But the adoption rate of those technologies (and of the video cards that support them) isn't exactly mainstream just yet.
So what types of CPU-intensive algorithms should you consider C++ for?
Artificial Intelligence
Particle engines / n-body simulations
Fast Fourier transform
Those are just the first things I can think of. At least the first two are very common in games today. AI is often done in a compromised fashion in games to run as quickly as possible, simply because it can be so processor intensive. And then particle engines are everywhere.