I'm a pretty big newbie when it comes to optimization. In the current game I'm working on I've managed to optimize a function and shave about 0.5% of its CPU load and that's about as 'awesome' as I've been.
My situation is as follows: I've developed a physics heavy game in MonoTouch using an XNA wrapper library called ExEn and try as I might I've found it very hard to get the game to reach a playable framerate on an iPhone4 (don't even want to think about iPhone3GS at this point).
The performance degradation is almost certainly in the physics calculations, if I turn physics off the framerate jumps up sharply, if I disable everything, rendering, input, audio and just leave physics on performance hovers around 15fps during physics intensive situations.
I used Instruments to profile the performance and this is what I got: http://i.imgur.com/FX25h.png The functions which drain the most performance are either from the physics engine (Farseer) or the ExEn XNA wrapper functions they call (notably Vector2.Max, Vector2.Min).
I looked into those functions and I know wherever it can Farseer is passing values by reference into those functions rather than by value so that's that covered (and it's literally the only way I can think of. The functions are very simple themselves basically amounting to such operations as
return new Vector2(Max(v1.x, v2.x), Max(v1.y, v2.y))
Basically I feel like I'm stuck and in my limited capacity and understanding of code optimizations I'm not sure what my options are or if I even have any options (maybe I should just curl into a fetal position and cry?). With LLVM turned on and built in release I'm getting maybe 15fps at best. I did manage to bring the game up to 30fps by lowering the physics precision but this makes many levels simply unplayable as bodies intersect one another and collapse in on themselves.
So my question is, is this a lost cause or is there anything I can do to beef up performance?
First of all, love your game on Windows Phone 7!
Secondly, I don't see anything out of the ordinary in your profiler output. I did a quick and dirty performance analysis of the Farseer engine once (running in .net) and came up with similar results. It almost looks like you have a slowdown that is proportional across the board and may be due to mono itself.
I suppose you follow the performance hints in http://farseerphysics.codeplex.com/documentation already :-)
The most important thing seems to be
to reduce complexity for the
collision detection calculations,
i.e. not the visual but the colliding
shapes. In Unijty3D they are called
colliders and you can attach a simple
cube as a collider to a complex human
body. I don't know anything about
Fareer but they probably have similar
concept (is it called body?).
If possible, try to replace your main
character or other complex objects by
easy cubes and check if fps raises.
Compiler switches sometimes leverage performance. Be really sure that there are no debug settings activated (I got up to 30 times slower code in a C++ library project). Ensure that armv7 optimisation is turned on and -O3 or -Os
Watch out for logging statements as they are extremely expensive on iPhone
[Update:]
Try to decrease the number of actively calculated AABBs just to find out which part of the physics engine causes the trouble. If it's the pure number follow FFox' advice.
What is about other platforms? Where did you perform the testing during the development phase, on simulator? Which one? Any chance to get it running on Android or Android simulator or Windows Phone? This would give you a hint if it is an iPhone specific problem.
Ah, I just saw that ExEn still is in pre-release state and the final will be launched on July 21th as OS. IMO this changes the situation: If your App is running fine on some other comparable platform, then just wait for the release and give it a new try. Chances are pretty well that there is still debugging code in the pre-release you are working on.
Related
Since a decent frame rate is so important in VR apps, I was just wondering if you can predict frames dropping? If so, before this issue actually occurs, can you deactivate some scripts or other features except the camera transform's updating and rendering of the environment? So, if performance drops (i.e. frames drop) no nausea will be experienced.
Predicting the future is not very likely, so you're going to have to adapt on the fly when you see performance drop. Either that or you could imagine creating a test environment a user could run where you try and figure out the capabilities of the user's hardware setup and tweak settings accordingly for future actual app runs. (i.e. "the test environment ran below the desired 120fps at medium settings, so default to low from now on")
I don't know what platform you are on exactly, but just in case you're on the Oculus ecosystem you might be able to get some help however.
By default on Oculus devices you're supported by what they refer to as "Asynchronous TimeWarp". This in essence decouples the headset's transform update and rendering from the framerate of your application. If no up-to-date frame is available, the latest frame will be transformed based on the latest head tracking information, reducing how noticeable such hiccups are. You will still want to avoid this having to kick in as much as possible though.
Additionally, Oculus supports "Fixed Foveated Rendering" on their mobile platforms where depending on your GPU utilization the device can render at lower resolutions at the edges of your view. In practice I've found this to be surprisingly effective, even though (as the name implies) it's fixed at the center of the view and does not include any eye tracking. But as with the previous method, not needing it is always better.
I'm unfortunately less familiar with options on other devices, but I'm sure others will pitch in if those exist.
I've made a simple 2D game in Unity using the application Tiled for the tilemap, but when I run it, every now and then the screen will jump like it has a quick fps drop.
I'm not sure what would be hurting the performance of the game, because it is not performing any demanding tasks. My only guess is that it could be the large tilemap I'm using, but I feel this is not the problem because the game was still jerky when the map was not large. Furthermore, I tried scaling the map to an even larger size but it didn't make the performance any worse than it already was.
Does anyone know what could be causing the performance issue here? Thanks.
Periodic frame-rate drops could be caused by a number of things - fortunately Unity gives you a very good tool to track this down in the form of the Profiler (Window > Profiler). It's recommended that you build the game (check Development Build and Autoconnect Profiler) rather than testing in the editor, as often the editor creates a lot of overhead that shows up in the Profiler and may lead you astray.
Play the game and see if there are any spikes when you notice the fps drop. I would not be surprised if you have Garbage Collection (GC) causing your periodic fps drops. GC is triggered periodically in the background to clean up memory that was allocated but is no longer needed. You can get a good idea of how much memory is being allocated by selecting CPU Usage and looking at the GC Alloc column. This can add up quickly! Optimising a game to avoid GC spikes is a whole topic unto itself.
But, of course, you problem might have nothing to do with GC spikes. Hopefully the Profiler will tell you which stones you should be turning over.
I have a multithreaded program which consist of a C# interop layer over C++ code.
I am setting threads affinity (like in this post) and it works on part of my code, however on second part it doesn't work. Can Intel Compiler / IPP / MKL libs / inline assembly interfere with external affinity setting?
UPDATE:
I can't post code as it is whole environment with many many dlls. I set environment values: OMP_NUM_THREADS=1 MKL_NUM_THREADS=1 IPP_NUM_THREADS=1. When it runs in single thread, it runs ok, but when I use number of C# threads and set affinity per thread (on a quad core machine), the initialization is going fine on separate cores, but during processing all threads start using the same core. Hope I am clear enough.
Thanks.
We've had this exact problem; we'd set our thread affinity to what we wanted, and the IPP/MKL functions would blow that away! The answer to your question is 'yes'.
Auto Parallelism
The issue is that, by default, the Intel libraries like to automatically execute multi-threaded versions of the routines. So, a single FFT gets computed by a number of threads setup by the library specifically for this purpose.
Intel's intent is that the programmer could get on with the job of writing a single threaded application, and the library would allow that single thread to benefit from a multicore processor by creating a number of threads for the maths work. A noble intent (your source code then need know nothing about the runtime hardware to extract the best achievable performance - handy sometimes), but a right bloody nuisance when one is doing one's own threading for one's own reasons.
Controlling the Library's Behaviour
Take a look at these Intel docs, section Support Functions / Threading Support Functions. You can either programmatically control the library's threading tendancies, or there's some environment variables you can set (like MKL_NUM_THREADS) before your program runs. Setting the number of threads was (as far as I recall) enough to stop the library doing its own thing.
Philosophical Essay Inspired By Answering Your Question (best ignored)
More or less everything Intel is doing in CPU design and software (e.g. IPP/MKL) is aimed at making it unnecessary for the programmer to Worry About Threads. You want good math performance? Use MKL. You want that for loop to go fast? Turn on Auto Parallelisation in ICC. You want to make the best use of cache? That's what Hyperthreading is for.
It's not a bad approach, and personally speaking I think that they've done a pretty good job. AMD too. Their architectures are quite good at delivering good real world performance improvements to the "Average Programmer" for the minimal investment in learning, re-writing and code development.
Irritation
However, the thing that irritates me a little bit (though I don't want to appear ungrateful!) is that whilst this approach works for the majority of programmers out there (which is where the profitable market is), it just throws more obstacles in the way of those programmers who want to spin their own parallelism. I can't blame Intel for that of course, they've done exactly the right thing; they're a market led company, they need to make things that will sell.
By offering these easy features the situation of there being too many under skilled and under trained programmers becomes more entrenched. If all programmers can get good performance without having to learn what auto parallelism is actually doing, then we'll never move on. The pool of really good programmers who actually know that stuff will remain really small.
Problem
I see this as a problem (though only a small one, I'll explain later). Computing needs to become more efficient for both economic and environmental reasons. Intel's approach allows for increased performance, and better silicon manufacturing techniques produces lower power consumption, but I always feel like it's not quite as efficient as it could be.
Example
Take the Cell processor at the heart of the PS3. It's something that I like to witter on about endlessly! However, IBM developed that with a completely different philosophy to Intel. They gave you no cache (just some fast static RAM instead to use as you saw fit), the architecture was pretty much pure NUMA, you had to do all your own parallelisation, etc. etc. The result was that if you really knew what you were doing you could get about 250GFLOPS out of the thing (I think the none-PS3 variants went to 320GLOPS), for 80Watts, all the way back in 2005.
It's taken Intel chips about another 6 or 7 years or so for a single device to get to that level of performance. That's a lot of Moores law growth. If the Cell got manufactured on Intel's latest silicon fab and was given as many transistors as Intel put into their big Xeons, it would still blow everything else away.
No Market
However, apart from PS3, Cell was a none-starter market proposition. IBM decided that it would never be a big enough seller to be worth their while. There just wasn't enough programmers out there who could really use it, and to indulge the few of us who could makes no commercial sense, which wouldn't please the shareholders.
Small Problem, Bigger Problem
I said earlier that this was only a small problem. Well, most of the world's computing isn't about high maths performance, it's become Facebook, Twitter, etc. That sort is all about I/O performance, and for that you don't need high maths performance. So in that sense the dependence on Intel Doing Everything For You so that the average programmer to get good math performance matters very little. There's just not enough maths being done to warrant a change in design philosophy.
In fact, I strongly suspect that the world will ultimately decide that you don't need a large chip at all, an ARM should do just fine. If that does come to pass then the market for Intel's very large chips with very good general purpose maths compute performance will vanish. Effectively those of use who want good maths performance are being heavily subsidised by those who want to fill enourmous data centres with Intel based hardware and put Intel PCs on every desktop.
We're simply lucky that Intel apparently has a desire to make sure that every big CPU they build is good at maths regardless of whether most of their users actually use that maths performance. I'm sure that desire has its foundations in marketing prowess and wanting the bragging rights, but those are not hard, commercially tangible artifacts that bring shareholder value.
So if those data centre guys decide that, actually, they'd rather save electricity and fill their data centres with ARMs, where does that leave Intel? ARMs are fine devices for the purpose for which they're intended, but they're not at the top of my list of Supercomputer chips. So where does that leave us?
Trend
My take on the current market trend is that 'Workstations' (PCs as we call them now) are going to start costing lots and lots of money, just like they did in the 1980s / early 90s.
I think that better supercomputers will become unaffordable because no one can spare the $10billions it would take to do the next big chip. If people stop having PCs there won't be a mass market for large all-out GPUs, so we won't even be able to use those instead. They're an exclusive thing, but super computers do play a vital role in our world and we do need them to get better. So who is going to pay for that? Not me, that's for sure.
Oops, that went on for quite a while...
I've been working on a physics based game using MonoTouch for iPhone and XNA for Windows Phone 7. The game runs great on Windows Phone 7 but on iPhone I'm finding there to be a bit of lag in CPU bound operations.
The reason I suspect it's the CPU operations which are causing the slow down is because if I disable physics the game runs at a solid 60fps, it's only when I enable it that it chugs and it chugs even more so when lots of stuff is happening on screen. I'm using the Farseer Physics engine which was written for XNA but runs fine on iOS through MonoTouch.
The difference in performance between wp7 and iPhone is quite substantial which leads me ot believe there may be something going on that's hurting performance which I'm not seeing.
So I was just wanting to know if anyone here has had similar performance issues with monotoucha nd how they got past them? I have a few ideas involving multithreading but I feel that the iPhone (iPhone4 in particular) should be able to handle Angry Birds-esque physics processing, considering Angry Birds is an iPhone game.
The first step is to determine the source of the slowdown, follow the profiling instructions here:
http://monotouch.net/Documentation/Profiling
A simple way to get more performance is to use the LLVM code generation option. The builds take longer, but they produce better code.
Is there any advantage to using C++ instead of C# when using Direct3D? The tutorials I've found for C++ and DirectX all use Direct3D (which to my knowledge is managed). Similarly, all of the C# tutorials I've found are for Direct3D.
Is Direct3D managed?
Is there any difference between using D3D in either of the two languages?
DirectX is entirely native. Any impression you may have that it's managed is completely and utterly wrong. There are managed wrappers that will allow you to use DirectX from managed code. In addition, DirectX is programmed to be accessed from C++ or C, or similar languages. If you look at the SlimDX project, they encountered numerous issues, especially due to resource collection, because C# doesn't genuinely support non-memory resources being automatically collected and using doesn't cut the mustard. In addition, game programming can be very CPU-intensive, and often, the additional performance lost by using a managed language is untenable, and virtually all existing supporting libraries are for C or C++.
If you want to make a small game, or something like that, there's nothing at all stopping you from using managed code. However, I know of no commercial games that actually take this route.
The point of Direct3D is to move rendering off the CPU and onto the GPU. If there were to be a significant performance difference it would be for that code that runs on the CPU. Therefore I don't see that there should be any significant performance difference between native and managed code for the part of your code that interfaces with Direct3D.
Direct3D itself is not managed code.
It depends on what you're doing exactly. As David Heffernan mentioned, one of the objectives of Direct3D is to move as much processing as possible to the GPU. With the advent of vertex shaders, pixel shaders, and much more, we're closer to that reality than ever.
Of course given infinite time and resources, you can usually create more efficient algorithms in C++ than C#. This will affect performance at the CPU level. Today, processing that is not graphics related is still mostly done on the CPU. There are things like CUDA, OpenCL, and even future versions of DirectX which will open up possibilities of moving any parallel-friendly algorithm to the GPU as well. But the adoption rate of those technologies (and the video cards that support it) isn't exactly mainstream just yet.
So what types of CPU-intensive algorithms should you consider C++ for?
Artificial Intelligence
Particle engines / n-body simulations
Fast Fourier transform
Those are just the first things I can think of. At least the first two are very common in games today. AI is often done in a compromised fashion in games to run as quickly as possible, simply because it can be so processor intensive. And then particle engines are everywhere.