I'm deciding whether or not to use VSync for a new game that I've been developing using OpenGL. The goal is to provide the user with the best gaming experience and a good balance between performance and quality. This game is designed to run on both older (Intel/netbook) computers and newer (NVIDIA/i7 desktop) computers. I am aware that the purpose of VSync is to prevent tearing; however, I have never been able to reproduce this tearing issue even with VSync turned off. To provide the best experience, should VSync be turned on or off?
There are many things that can be said about this issue. Let me count the ways:
If you don't see tearing, you're likely vsync'ed, even if you think you're not. Reasons may vary, but swapping buffers in the middle of a refresh is very noticeable if any movement is happening. (One reason might be that you're not configured to flip buffers, so something has to copy your framebuffer.)
Vsync on has noticeable artifacts too. Typically it produces frames that are displayed for a variable amount of time, tied more to the display's refresh rate than to your rendering rate. This can create micro-stuttering and is very hard to control, since when you generate a frame you don't know which tick it will be displayed at, so you can't compensate for motion artifacts. This is why some people try to lock their rendering speed to the refresh rate: always render at 60 fps (or 30 fps), with a fixed time update of time += 16.7 ms. John Carmack has been asking for a mode that does "vsync at 60 Hz, but don't sync if I missed the deadline", presumably for this reason.
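For concreteness, a minimal, engine-agnostic sketch of that fixed-step approach (UpdateSimulation, RenderFrame and Running are placeholders, not any particular engine's API):

using System.Diagnostics;

class FixedStepLoop
{
    const double Step = 1.0 / 60.0;              // ~16.7 ms per simulation tick

    public void Run()
    {
        var clock = Stopwatch.StartNew();
        double previous = clock.Elapsed.TotalSeconds;
        double accumulator = 0.0;

        while (Running)
        {
            double now = clock.Elapsed.TotalSeconds;
            accumulator += now - previous;
            previous = now;

            // Advance the simulation in fixed 16.7 ms steps, regardless of
            // when the rendered frames actually reach the screen.
            while (accumulator >= Step)
            {
                UpdateSimulation(Step);
                accumulator -= Step;
            }

            RenderFrame();                        // draw + swap buffers
        }
    }

    bool Running => true;                         // placeholder
    void UpdateSimulation(double dt) { /* game logic */ }
    void RenderFrame() { /* draw + SwapBuffers */ }
}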
Vsync on saves power (important on netbooks).
Vsync off reduces input latency (when your input happens 3 frames before display, it can start to matter). You can try to compensate for some of that, but ultimately it is hard to inject input updates at the very last minute.
So the bottom-line answer is that there is no perfect answer; it really depends on what matters more for your game. Things you need to look at: which framerate you want to achieve, how much input latency matters, how much movement there is in the game, and whether power is going to be a concern.
For the best user experience, place an option in your menu to turn VSync on/off. This way the user can decide. You might not be able to reproduce tearing, but it's definitely a real issue on some systems.
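A hedged sketch of wiring that option up, assuming OpenTK as the C#/OpenGL binding (other bindings expose the same idea as a "swap interval" of 0 or 1):

using OpenTK;

static class VideoSettings
{
    public static void ApplyVSync(GameWindow window, bool enabled)
    {
        // VSyncMode.On waits for the vertical blank before swapping buffers;
        // VSyncMode.Off swaps immediately (tearing possible, lower latency).
        window.VSync = enabled ? VSyncMode.On : VSyncMode.Off;
    }
}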
As for what the default setting should be, I'm not sure; I think it's your choice. I prefer to have vertical sync off by default, because enabling it reduces performance a bit and most people don't notice the tearing or don't care about it, but there are good reasons to enable it by default, too.
I believe the default for VSync should be on (based on all my gaming, not programming, experience :) ). Just because you don't see it doesn't mean you shouldn't follow good practice:
maybe you can't stare as closely at the screen as some other gamer
maybe when you see a little tear you are not as annoyed as some gamers
maybe your specific screen hides tear better than other screens would
And as schanaader mentioned, games typically leave vsync as a configuration option somewhere in the video settings menu. The default should still be "on" for those who don't know what vsync means; if the user is knowledgeable enough, they have the option of tweaking it to see what the difference is.
Since a decent frame rate is so important in VR apps, I was wondering whether you can predict frames dropping. If so, before the issue actually occurs, can you deactivate some scripts or other features, except for updating the camera transform and rendering the environment? That way, if performance drops (i.e. frames drop), no nausea will be experienced.
Predicting the future is not very likely, so you're going to have to adapt on the fly when you see performance drop. Either that, or you could imagine creating a test environment the user could run, where you try to figure out the capabilities of the user's hardware setup and tweak settings accordingly for future runs of the actual app (i.e. "the test environment ran below the desired 120fps at medium settings, so default to low from now on").
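The question doesn't say which engine you're on, but if it happens to be Unity (the mention of scripts and the camera transform suggests it), a rough sketch of that "adapt on the fly" idea might look like this; the 120 Hz target and 10% tolerance are purely illustrative:

using UnityEngine;

public class AdaptiveQuality : MonoBehaviour
{
    const float TargetFrameTime = 1f / 120f;       // e.g. a 120 Hz headset
    float smoothed = TargetFrameTime;

    void Update()
    {
        // Exponential moving average of the raw frame time.
        smoothed = Mathf.Lerp(smoothed, Time.unscaledDeltaTime, 0.05f);

        if (smoothed > TargetFrameTime * 1.1f && QualitySettings.GetQualityLevel() > 0)
        {
            QualitySettings.DecreaseLevel();       // drop one predefined quality tier
            smoothed = TargetFrameTime;            // avoid dropping several tiers at once
        }
    }
}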
I don't know exactly what platform you are on, but in case you're on the Oculus ecosystem, you might be able to get some help.
By default on Oculus devices you're supported by what they refer to as "Asynchronous TimeWarp". This in essence decouples the headset's transform update and rendering from the framerate of your application. If no up-to-date frame is available, the latest frame will be transformed based on the latest head tracking information, reducing how noticeable such hiccups are. You will still want to avoid this having to kick in as much as possible though.
Additionally, Oculus supports "Fixed Foveated Rendering" on their mobile platforms where depending on your GPU utilization the device can render at lower resolutions at the edges of your view. In practice I've found this to be surprisingly effective, even though (as the name implies) it's fixed at the center of the view and does not include any eye tracking. But as with the previous method, not needing it is always better.
I'm unfortunately less familiar with options on other devices, but I'm sure others will pitch in if those exist.
I know that a certain delay between the "play command" and the actual start of playback of a sound is inevitable.
However, for my current project I must be able to start sound playback at a certain moment in time. This moment is known, so the solution to the problem is either to reduce the delay as much as possible or to somehow predict the latency and start the sound somewhat earlier (depending on the predicted latency).
I describe the problem in detail here:
https://naudio.codeplex.com/discussions/662236
My current solution is to use NAudio to play a sound and simultaneously observe the sound output volume. This way I can measure the latency and use it to time the "play command" for the following sounds.
This way I get decent results (about 30 ms deviation from the intended play time), but I wanted to ask if you guys have better suggestions.
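For illustration, the compensation boils down to something like the following sketch (NAudio; the 20 ms buffer request, the file name and the measuredLatency value are placeholders):

using System;
using System.Threading;
using NAudio.CoreAudioApi;
using NAudio.Wave;

static class ScheduledPlayback
{
    public static void PlayAt(DateTime targetUtc, string file, TimeSpan measuredLatency)
    {
        var output = new WasapiOut(AudioClientShareMode.Shared, 20);  // ask for a 20 ms buffer
        var reader = new AudioFileReader(file);
        output.Init(reader);

        // Start early by the latency measured via the volume-observation trick.
        TimeSpan wait = (targetUtc - measuredLatency) - DateTime.UtcNow;
        if (wait > TimeSpan.Zero)
            Thread.Sleep(wait);  // a real implementation would spin-wait the last couple of ms

        output.Play();
    }
}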
Best regards and many thanks
I have a multithreaded program which consists of a C# interop layer over C++ code.
I am setting thread affinity (like in this post) and it works on part of my code; however, on the second part it doesn't work. Can the Intel compiler / IPP / MKL libs / inline assembly interfere with externally set affinity?
UPDATE:
I can't post code, as it is a whole environment with many, many DLLs. I set the environment variables OMP_NUM_THREADS=1, MKL_NUM_THREADS=1, IPP_NUM_THREADS=1. When it runs in a single thread it runs OK, but when I use a number of C# threads and set affinity per thread (on a quad-core machine), initialization goes fine on separate cores, yet during processing all threads start using the same core. I hope I am clear enough.
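For completeness, the per-thread affinity part boils down to something like this (a sketch; the core mask is illustrative):

using System;
using System.Runtime.InteropServices;
using System.Threading;

static class Affinity
{
    [DllImport("kernel32.dll")]
    static extern IntPtr GetCurrentThread();

    [DllImport("kernel32.dll")]
    static extern UIntPtr SetThreadAffinityMask(IntPtr hThread, UIntPtr mask);

    public static void PinCurrentThreadToCore(int core)
    {
        // Keep the managed thread on its current OS thread, then pin that
        // OS thread to a single core.
        Thread.BeginThreadAffinity();
        SetThreadAffinityMask(GetCurrentThread(), (UIntPtr)(1u << core));
    }
}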
Thanks.
We've had this exact problem; we'd set our thread affinity to what we wanted, and the IPP/MKL functions would blow that away! The answer to your question is 'yes'.
Auto Parallelism
The issue is that, by default, the Intel libraries like to execute multi-threaded versions of the routines automatically. So a single FFT gets computed by a number of threads set up by the library specifically for this purpose.
Intel's intent is that the programmer could get on with the job of writing a single threaded application, and the library would allow that single thread to benefit from a multicore processor by creating a number of threads for the maths work. A noble intent (your source code then need know nothing about the runtime hardware to extract the best achievable performance - handy sometimes), but a right bloody nuisance when one is doing one's own threading for one's own reasons.
Controlling the Library's Behaviour
Take a look at these Intel docs, section Support Functions / Threading Support Functions. You can either programmatically control the library's threading tendencies, or there are some environment variables you can set (like MKL_NUM_THREADS) before your program runs. Setting the number of threads was (as far as I recall) enough to stop the library doing its own thing.
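Since the question is asked from the C# side of an interop layer, here is a hedged sketch of doing that from managed code; the DLL name ("mkl_rt") and the exact variable names are assumptions that vary between MKL/IPP versions:

using System;
using System.Runtime.InteropServices;

static class MathLibThreading
{
    [DllImport("mkl_rt", CallingConvention = CallingConvention.Cdecl)]
    static extern void mkl_set_num_threads(int nt);

    public static void ForceSingleThreaded()
    {
        // Environment variables only help if they are in place before the
        // libraries spin up their worker pools.
        Environment.SetEnvironmentVariable("OMP_NUM_THREADS", "1");
        Environment.SetEnvironmentVariable("MKL_NUM_THREADS", "1");

        // The programmatic route, once MKL has been loaded:
        mkl_set_num_threads(1);
    }
}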
Philosophical Essay Inspired By Answering Your Question (best ignored)
More or less everything Intel is doing in CPU design and software (e.g. IPP/MKL) is aimed at making it unnecessary for the programmer to Worry About Threads. You want good math performance? Use MKL. You want that for loop to go fast? Turn on Auto Parallelisation in ICC. You want to make the best use of cache? That's what Hyperthreading is for.
It's not a bad approach, and personally speaking I think that they've done a pretty good job. AMD too. Their architectures are quite good at delivering good real world performance improvements to the "Average Programmer" for the minimal investment in learning, re-writing and code development.
Irritation
However, the thing that irritates me a little bit (though I don't want to appear ungrateful!) is that whilst this approach works for the majority of programmers out there (which is where the profitable market is), it just throws more obstacles in the way of those programmers who want to spin their own parallelism. I can't blame Intel for that of course, they've done exactly the right thing; they're a market led company, they need to make things that will sell.
By offering these easy features, the situation of there being too many under-skilled and under-trained programmers becomes more entrenched. If all programmers can get good performance without having to learn what auto parallelism is actually doing, then we'll never move on. The pool of really good programmers who actually know that stuff will remain really small.
Problem
I see this as a problem (though only a small one, as I'll explain later). Computing needs to become more efficient for both economic and environmental reasons. Intel's approach allows for increased performance, and better silicon manufacturing techniques produce lower power consumption, but I always feel it's not quite as efficient as it could be.
Example
Take the Cell processor at the heart of the PS3. It's something I like to witter on about endlessly! However, IBM developed it with a completely different philosophy to Intel's. They gave you no cache (just some fast static RAM to use as you saw fit), the architecture was pretty much pure NUMA, you had to do all your own parallelisation, etc. The result was that if you really knew what you were doing you could get about 250 GFLOPS out of the thing (I think the non-PS3 variants went to 320 GFLOPS), for 80 watts, all the way back in 2005.
It's taken Intel chips about another 6 or 7 years for a single device to get to that level of performance. That's a lot of Moore's law growth. If the Cell were manufactured on Intel's latest silicon fab and given as many transistors as Intel puts into its big Xeons, it would still blow everything else away.
No Market
However, apart from the PS3, the Cell was a non-starter as a market proposition. IBM decided it would never be a big enough seller to be worth their while. There just weren't enough programmers out there who could really use it, and indulging the few of us who could would make no commercial sense and wouldn't please the shareholders.
Small Problem, Bigger Problem
I said earlier that this was only a small problem. Well, most of the world's computing isn't about high maths performance; it's become Facebook, Twitter, etc. That sort of thing is all about I/O performance, for which you don't need high maths performance. So in that sense, the dependence on Intel Doing Everything For You so that the average programmer can get good maths performance matters very little. There's just not enough maths being done to warrant a change in design philosophy.
In fact, I strongly suspect that the world will ultimately decide that you don't need a large chip at all; an ARM should do just fine. If that comes to pass, then the market for Intel's very large chips with very good general-purpose maths compute performance will vanish. Effectively, those of us who want good maths performance are being heavily subsidised by those who want to fill enormous data centres with Intel-based hardware and put Intel PCs on every desktop.
We're simply lucky that Intel apparently has a desire to make sure that every big CPU they build is good at maths regardless of whether most of their users actually use that maths performance. I'm sure that desire has its foundations in marketing prowess and wanting the bragging rights, but those are not hard, commercially tangible artifacts that bring shareholder value.
So if those data centre guys decide that, actually, they'd rather save electricity and fill their data centres with ARMs, where does that leave Intel? ARMs are fine devices for the purpose for which they're intended, but they're not at the top of my list of Supercomputer chips. So where does that leave us?
Trend
My take on the current market trend is that 'Workstations' (PCs as we call them now) are going to start costing lots and lots of money, just like they did in the 1980s / early 90s.
I think better supercomputers will become unaffordable because no one can spare the $10 billion it would take to do the next big chip. If people stop having PCs, there won't be a mass market for large all-out GPUs, so we won't even be able to use those instead. They're an exclusive thing, but supercomputers do play a vital role in our world and we do need them to get better. So who is going to pay for that? Not me, that's for sure.
Oops, that went on for quite a while...
I'm a pretty big newbie when it comes to optimization. In the current game I'm working on, I've managed to optimize a function and shave about 0.5% off its CPU load, and that's about as 'awesome' as I've been.
My situation is as follows: I've developed a physics-heavy game in MonoTouch using an XNA wrapper library called ExEn, and try as I might I've found it very hard to get the game to reach a playable framerate on an iPhone 4 (I don't even want to think about the iPhone 3GS at this point).
The performance degradation is almost certainly in the physics calculations: if I turn physics off, the framerate jumps up sharply; if I disable everything else (rendering, input, audio) and just leave physics on, performance hovers around 15 fps during physics-intensive situations.
I used Instruments to profile the performance, and this is what I got: http://i.imgur.com/FX25h.png. The functions that drain the most performance are either from the physics engine (Farseer) or the ExEn XNA wrapper functions they call (notably Vector2.Max and Vector2.Min).
I looked into those functions, and I know that wherever it can, Farseer passes values by reference into those functions rather than by value, so that's covered (and it's literally the only optimization I can think of). The functions themselves are very simple, basically amounting to operations such as
return new Vector2(Max(v1.x, v2.x), Max(v1.y, v2.y))
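(For context, the by-reference overloads that XNA exposes, and which Farseer uses where it can, avoid copying the structs on every call. A small illustration, assuming ExEn mirrors XNA's Vector2 overloads:)

using Microsoft.Xna.Framework;

static class MaxExample
{
    static Vector2 Demo(Vector2 v1, Vector2 v2)
    {
        // By-reference form: no struct copies in or out of the call.
        Vector2 byRef;
        Vector2.Max(ref v1, ref v2, out byRef);

        // By-value form (the one showing up in the profile): copies v1, v2
        // and the returned Vector2 on every call.
        Vector2 byValue = Vector2.Max(v1, v2);

        return byRef + byValue;
    }
}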
Basically I feel like I'm stuck, and with my limited capacity and understanding of code optimization I'm not sure what my options are, or if I even have any (maybe I should just curl into a fetal position and cry?). With LLVM turned on and the game built in release mode, I'm getting maybe 15 fps at best. I did manage to bring the game up to 30 fps by lowering the physics precision, but this makes many levels simply unplayable as bodies intersect one another and collapse in on themselves.
So my question is, is this a lost cause or is there anything I can do to beef up performance?
First of all, love your game on Windows Phone 7!
Secondly, I don't see anything out of the ordinary in your profiler output. I did a quick and dirty performance analysis of the Farseer engine once (running on .NET) and came up with similar results. It almost looks like you have a slowdown that is proportional across the board and may be due to Mono itself.
I suppose you follow the performance hints in http://farseerphysics.codeplex.com/documentation already :-)
The most important thing seems to be to reduce the complexity of the collision detection calculations, i.e. not the visual shapes but the colliding shapes. In Unity3D these are called colliders, and you can attach a simple cube as a collider to a complex human body. I don't know anything about Farseer, but it probably has a similar concept (is it called a body?).

If possible, try to replace your main character or other complex objects with simple cubes and check whether the fps rises.
Compiler switches sometimes make a big difference to performance. Be really sure that no debug settings are activated (I once got code that was up to 30 times slower in a C++ library project). Ensure that armv7 optimisation is turned on and that -O3 or -Os is used.

Watch out for logging statements, as they are extremely expensive on the iPhone.
[Update:]
Try to decrease the number of actively calculated AABBs just to find out which part of the physics engine causes the trouble. If it's the sheer number, follow FFox's advice.
What about other platforms? Where did you perform the testing during the development phase, on the simulator? Which one? Any chance of getting it running on Android, an Android emulator, or Windows Phone? That would give you a hint as to whether it is an iPhone-specific problem.
Ah, I just saw that ExEn is still in a pre-release state and the final version will be launched on July 21st as open source. IMO this changes the situation: if your app is running fine on some other comparable platform, just wait for the release and give it a new try. Chances are pretty good that there is still debugging code in the pre-release you are working on.
My main question is this: for graphics processing (e.g. moving images, collisions, scaling, etc.), is there something I can do to get the best performance out of the processing? Under what circumstances is the GPU used instead of the CPU, and can I selectively use one over the other? Should I? Also, what graphics API should I use if I want the best performance for graphics processing? Or should I even use an API at all?
I feel as though I'm asking too many questions in one question, or that I'm not wording it correctly, but I don't see how I could do otherwise. If any clarification is needed, just ask. I've searched many places, but this may be a duplicate, so my apologies if it is.
I'm not sure what you are creating, but these are my musings on the subject:
A good option is XNA. It is basically a wrapper on top of DirectX, made for using the GPU for graphics-intensive tasks while hiding some of the dirty work of using DirectX directly. It's your best option for games, graphical simulations, etc.
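For a feel of what that looks like, here is a minimal sketch of the standard XNA 4.0 Game template (the class and field names are just the template defaults):

using Microsoft.Xna.Framework;
using Microsoft.Xna.Framework.Graphics;

public class Game1 : Game
{
    GraphicsDeviceManager graphics;
    SpriteBatch spriteBatch;

    public Game1()
    {
        graphics = new GraphicsDeviceManager(this);
        Content.RootDirectory = "Content";
    }

    protected override void LoadContent()
    {
        spriteBatch = new SpriteBatch(GraphicsDevice);
    }

    protected override void Draw(GameTime gameTime)
    {
        GraphicsDevice.Clear(Color.CornflowerBlue);   // hardware-accelerated clear on the GPU
        // spriteBatch.Begin(); ... draw textures on the GPU ... spriteBatch.End();
        base.Draw(gameTime);
    }
}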
Another option is WPF, which also uses DirectX underneath but is geared more towards creating user interfaces than games and similar tasks; it is, however, extremely flexible.
WinForms/GDI+ is your worst option, since it does not use hardware acceleration at all, and will only use the CPU.