I am trying to make a Unity game with their new MLAPI (Mid-level networking API). I've followed this tutorial exactly without changing anything, and the game is running fine on my local (Linux) PC.
I changed the connection IP, copied the build over to a cloud server I rent (DigitalOcean, Ubuntu 20.04), and launched it with the -mlapi server flag and the -batchmode -nographics options, but I still suspect it is trying to emulate graphics on the CPU.
The 100% CPU problem seems to be documented, and the suggested solution is to add the line Application.targetFrameRate = 30;. I tried the following (targetFrameRate is ignored unless vSync is disabled):
switch (mlapiValue)
{
    case "server":
        netManager.StartServer();
        // https://docs-multiplayer.unity3d.com/docs/troubleshooting/troubleshooting
        QualitySettings.vSyncCount = 0;   // targetFrameRate is ignored while vSync is on
        Application.targetFrameRate = 1;  // deliberately very low on the headless server
        break;
    case "host":
        netManager.StartHost();
        break;
    case "client":
        netManager.StartClient();
        break;
}
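(For reference, mlapiValue is just whatever follows the -mlapi flag. A rough sketch of how it could be read, assuming a simple scan of Environment.GetCommandLineArgs(); the tutorial may parse this differently:)

// Rough sketch only - not necessarily the tutorial's exact parsing code.
// Returns the value that follows the -mlapi flag, e.g. "server" for "-mlapi server".
private static string GetMlapiValue()
{
    string[] args = System.Environment.GetCommandLineArgs();
    for (int i = 0; i < args.Length - 1; i++)
    {
        if (args[i] == "-mlapi")
            return args[i + 1].ToLowerInvariant();
    }
    return null; // flag not present
}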
However, when I move the client, I still get 100% CPU (with the added bonus that each action is executed instantly on the server(!?) but about 1 second later on each client).
What is even going on here? Someone online suggested it might be related to socket polling. When I start 2 instances, one of them gets killed (out of CPU); a single server, though, still seems pretty responsive.
This issue only seems to occur on Linux machines with exactly 1 core. I tried reducing the fps to 1 on a single-core machine, which did bring CPU below 5%, but only if I start a single server (e.g. on port 7778). Once I start a second server on port 7779, we are back at 100% CPU. In summary:
Setup                          | Outcome
1 core, 60 fps, 1 instance     | 100% CPU (bad)
1 core, 1 fps, 1 instance      | <5% CPU (OK)
1 core, 1 fps, 2 instances     | 100% CPU (bad)
2 cores, 60 fps, 10+ instances | <10% CPU (OK)
So my recommendation is to just rent a cloud instance with 2+ vCPUs.
Related
Disclaimer: Since I am unsure of what the issue really is, I have posted it on Stack Overflow instead of Server Fault.
Description
I am the owner of an application that is approaching launch, so I built a tool that does regression tests and stress testing (load testing). When we test with 2-3 clients we see no impact, but as soon as we reach 8-10 clients we see a huge impact on service delays and handle time from our API.
TPS = Tests Per Second initiated by my tool (clients/threads initiated every second)
Here is the output from my stress test tool:
TPS # 5 - Avg. handle time: 1302 ms
TPS # 10 - Avg. handle time: 5641 ms
TPS # 30 - Avg. handle time: 13549 ms
TPS # 50 - Avg. handle time: 6136 ms
TPS # 100 - Avg. handle time: 24854 ms
Notes:
A. There is usually no real pattern across the TPS levels, except that everything takes a long time above 10+ TPS. As you can see, 50 TPS is much faster than 30 TPS.
B. It also looks like requests are being queued, with some time passing between completions; see this screenshot: https://gyazo.com/1431b5113ac216983a6ca6e1f1bd75ad
C. The only thing that improved the numbers was removing the external (API) calls from the code and running the test again. Yet when testing that external API in isolation, end to end, I see no delay (5000+ TPS with a 120 ms average), while we can't handle more than 10 TPS without delay through our own service (a timing sketch follows after these notes).
D. No hardware graph ever gets over 5% (memory or CPU)
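For reference, here is a minimal sketch (placeholder URL and names, not our real integration) of how the external call could be timed from inside the service, to compare against the isolated E2E numbers:

using System;
using System.Diagnostics;
using System.Net.Http;
using System.Threading.Tasks;

public static class ExternalCallTimer
{
    // Shared HttpClient, reused across requests.
    private static readonly HttpClient Client = new HttpClient();

    // Times one call to the external API and logs the elapsed milliseconds,
    // so the in-service latency can be compared with the isolated E2E test.
    public static async Task<string> CallExternalAsync()
    {
        var sw = Stopwatch.StartNew();
        HttpResponseMessage response = await Client.GetAsync("https://external.example/api"); // placeholder URL
        string body = await response.Content.ReadAsStringAsync();
        sw.Stop();
        Console.WriteLine($"External call took {sw.ElapsedMilliseconds} ms");
        return body;
    }
}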
Do you have any suggestions for where I should look next? I am open to questions.
Technology:
C# (.NET Core) IIS WebService
What I have done/tested (with no impact):
End-to-end stress-testing the external applications we use; here we can easily handle 5000+ TPS with no real delay.
Implemented LoadBalancers (NGINX) on all layers (front-end, service, integration)
Used MongoDB Optimizer to attempt to speed up data flow.
We have upgraded our DB from 16 GB -> 64 GB and 2 CPU -> 8 CPU
Looked through the IIS setup to see if anything looks odd.
Deactivated the antivirus, to see if external HTTPS calls are impacted.
I reduced my problem to this example:
using System;
using System.Threading.Tasks;

class Program
{
    [STAThread]
    static void Main()
    {
        var t = Task.Run(async delegate
        {
            await Task.Delay(TimeSpan.FromSeconds(5));
            return "delayed action";
        });
        t.Wait(); // block until the delayed task completes
        Console.WriteLine("Task t Status: {0}, Result: {1}", t.Status, t.Result);
    }
}
While this behaves normally on my host PC, it exits before printing "delayed action" when run on VMware Workstation Player 15 with a fresh Windows 10 installation. There are no errors. If I put another Console.WriteLine at the beginning, it does show up in cmd.
I assigned 4 cores and 6 GB of memory to the VM; CPU virtualization features are turned off and 3D acceleration is turned on. Am I missing some dependencies, or does the VM need a different configuration?
My goal was to issue a series of SendInput calls spread out over time. I even tried a 3rd-party "clicker" tool that has a delay option, and it shows the same issue: I had to set it to 30 ms to get clicks 500 ms apart, as if the bulk of the clicks never registered. Doing the same with my code did not work in the VM but works fine on the host PC.
I unfortunately can't help you with fixing the VMware end, aside from two offhand ideas:
If there is any setting that passes threads directly through to the host CPU, you could try turning it off. It would be bad for performance, but it might avoid issues with the thread manager and the OS running on slightly different clocks.
I do have a different approach you could try, one that relies a whole lot less on Thread/Async and similar delay systems, and might therefore be slightly more robust against whatever is causing this. That difference is purely accidental - it started its life as an example of a very basic rate-limiting system, meant to be run in a separate thread:
int interval = 20; // milliseconds between actions
DateTime dueTime = DateTime.Now.AddMilliseconds(interval);

while (true)
{
    if (DateTime.Now >= dueTime)
    {
        //insert code here

        //Update the next dueTime
        dueTime = DateTime.Now.AddMilliseconds(interval);
    }
    else
    {
        //Just yield to not tax out the CPU
        Thread.Sleep(1);
    }
}
This one only uses the thread mechanics for the "low CPU load when idle" part. Everything else is based on the hopefully more stable DateTime system.
Do note that short intervals are poorly supported; even 20 ms might already be beyond the limit. DateTime.Now is nowhere near as accurate as the type is precise, so this really only works for double- or triple-digit intervals. Delay and Sleep, by contrast, do actually support millisecond precision.
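If DateTime.Now turns out to be too coarse, here is a Stopwatch-based variant of the same idea (my own addition, not the original example; same structure, just a higher-resolution clock):

using System.Diagnostics;
using System.Threading;

class StopwatchRateLimiter
{
    static void Main()
    {
        int intervalMs = 20;                     // still best kept at double-digit values or more
        Stopwatch clock = Stopwatch.StartNew();  // much finer resolution than DateTime.Now
        long dueMs = clock.ElapsedMilliseconds + intervalMs;

        while (true)
        {
            if (clock.ElapsedMilliseconds >= dueMs)
            {
                // insert code here (e.g. one SendInput call)

                // schedule the next due time
                dueMs = clock.ElapsedMilliseconds + intervalMs;
            }
            else
            {
                // just yield so the loop does not tax the CPU
                Thread.Sleep(1);
            }
        }
    }
}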
I'm stuck figuring out what the next troubleshooting step to take or research should be.
I've got one web application running on one IIS 8 web server. Lately we have been struggling with the performance of the web application. This morning it 'crashed' again; by this I mean that it didn't respond to any new requests.
Because Perfmon is your friend, I fired off the following counters:
Processor Time (%)
Requests Executing
Requests/Sec
See the image I've included.
What I find interesting is that "Requests Executing" only increased at this point. And... the CPU was not running at 100% to work through all these requests.
As I'm the main developer of this web application, I already optimized many high-CPU webpages.
With this SO thread I'm hoping to find the bottleneck, perhaps with tips for extra logging, application code analysis, etc. I can provide more information about the setup or application when needed.
Hope you can help. Many thanks in advance.
Some specifications:
Windows Server 2012R2
IIS8
Intel Xeon quad-core CPU
Total of 4 GB memory
ASP.NET 4.0 web application
User specs:
Google Analytics Real Time visitors between 100-500
Normal requests/sec average of 30
'Peak' requests/sec of 200 / 300
EDIT:
I ran a Perfmon Data Collector Set for about an hour, exactly at the moment when the website crashed, and used the PAL tool to analyze it. Its only findings were many warnings about available memory being below 5%.
This memory shortage is clearly an issue we will be resolving soon.
Another thing I noticed is that the list of "current requests" in IIS 8 was enormous; it contained hundreds of entries. I would expect these requests to hit a timeout and return a Request Timed Out to the users.
Here is a screenshot of the current, healthy situation:
At the time of the crash, this list was seemingly endless.
And another thing I just noticed: one request took more than 10 minutes(!) to deliver a byte[] to a user. Its 'state' was SendResponse. I presume this user was on a low-bandwidth device. As long as this user is downloading, one worker process is tied up. How should we prepare for these long-pending requests?
Have a look at PAL at https://pal.codeplex.com/ - it will guide you in collecting a number of stats and then analyse them for you.
Along with CPU, other bottlenecks are memory, network, and disk IO. PAL should help you collect the appropriate stats - it has several pre-defined sets, e.g. for Web Server, DB Server, etc.
I have a simple console application that uses ZeroMQ to send and receive messages. In the receive portion, I have the following message pump code:
ZMQ.Context _context = new ZMQ.Context(1);
ZMQ.PollItem[] pollItems = new ZMQ.PollItem[0];

while (!_finished)
{
    // Poll the current receivers (or just idle briefly if there are none yet)
    if (pollItems.Length > 0)
        _context.Poll(pollItems, pollTimeout);
    else
        Thread.Sleep(1);

    // Rebuild the poll set if receivers were added or removed
    if (_receiversChanged)
        UpdatePollItems(ref pollItems);
}
(The idea is that I can add and remove items from the poller at run-time, as I need to add receivers. UpdatePollItems simply creates a new array whenever the set of receivers changes.)
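(For illustration only - a hypothetical sketch of what UpdatePollItems might look like with the clrzmq 2.x poll API; the actual implementation, the _receivers collection, and the handler name are assumptions on my part:)

// Hypothetical sketch - rebuilds the PollItem array from the current receiver
// sockets. _receivers, _receiversChanged and OnReceiveReady are assumed names.
private void UpdatePollItems(ref ZMQ.PollItem[] pollItems)
{
    var items = new System.Collections.Generic.List<ZMQ.PollItem>();
    foreach (ZMQ.Socket socket in _receivers)
    {
        ZMQ.PollItem item = socket.CreatePollItem(ZMQ.IOMultiPlex.POLLIN);
        item.PollInHandler += OnReceiveReady; // fires when the socket has data to read
        items.Add(item);
    }
    pollItems = items.ToArray();
    _receiversChanged = false;
}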
I have tried pollTimeout values of 50ms and 500ms but the app (which is sitting on its main thread on Console.ReadKey) still uses 100% of one core, even when no messages are being sent. I ran the app under the profiler and confirmed that it is ZMQ.Context.Poller that is chewing all the CPU.
Have others seen similar behaviour? I am using the latest ZeroMQ C# binding (clrzmq-x64.2.2.3 from NuGet).
Yes, there is a bug in the driver; I hit it as well. Looking at the code, it is possible that the .NET 4 version fares better, but you have to recompile it. I will check whether the code I rewrote could be reintegrated as a pull request.
I'm going to guess that when you say you are setting the poll timeout to 500 ms, you are setting the variable pollTimeout to 500, which would be incorrect. For a timeout of 500 ms, pollTimeout should be set to 500000: if you call context.Poll(..., 500), the 500 is interpreted as 500 µs and internally rounded down to 0 ms.
I verified on my own system that passing 500 to Poll causes CPU utilization between 90 and 100%. Setting the value to anything over 1000 reduces CPU usage considerably, and at 500000 (500 ms) it should be negligible.
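In code, reusing the variable names from the question, that would be:

// Context.Poll in this binding takes the timeout in microseconds, not milliseconds
long pollTimeout = 500 * 1000;         // 500 ms expressed as microseconds
_context.Poll(pollItems, pollTimeout);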
Either way, please update your code sample to show the initialization of the variable pollTimeout. If I am completely off base, then at least it will prevent other would-be answerers from going down this path.
I have a system that is processing messages from an MSMQ queue (using multiple processes to do the processing).
Processing each message involves reading some rows and making 3 updates and 1 insert. I'm processing around 60 messages per second with this system.
What puzzles me is that when, for whatever reason, the queue has a buildup of messages and the system is working as fast as it can to process them, the CPU usage cycles between peaks of 95%-100% and valleys of 45%-50%. This behavior holds even when I add more processes to do the processing.
Is it expected for SQL Server to show this behavior when the workload consists of row inserts and updates (I'm thinking cache flushing, etc.)? Or maybe it has something to do with the host (this is running on Hyper-V, instead of on real hardware)?
Here's a link to a perfmon run: http://dl.dropbox.com/u/2200654/sql_perf_000001.rar
This is an example of how to start a logman counter trace for some relevant counters, with a 10 s collection interval, for a default SQL Server instance:
@echo off
del %temp%\sql_perf*.blg
logman delete sql_perf
logman create counter sql_perf -f bin -si 10 -o %temp%\sql_perf.blg -c "\Processor(_Total)\*" "\PhysicalDisk(*)\*" "\Process(*)\*" "\SQLServer:Access Methods\*" "\SQLServer:Databases(*)\*" "\SQLServer:Memory Manager\*" "\SQLServer:SQL Statistics\*" "\SQLServer:Wait Statistics\*" "\SQLServer:Transactions\*"
logman start sql_perf
Named instances change the counter category name from "SQLServer:..." to "MSSQL$<instancename>:...".
The uploaded performance counters are missing the PhysicalDisk counters, so the IO analysis is incomplete, but one thing that stands out is that the % CPU usage is an exact match for the SQL Server Transactions/sec and Batches/sec:
Note how the red line (% CPU) exactly follows the shape of the green line (Transactions/sec) and also the blue line (Batches/sec). This indicates that the spikes are entirely driven by application behavior. Simply put, SQL Server spikes in CPU (and also in IO reads and writes, the purple line) because your application spikes its requests every 3 minutes or so.
Check the rest - could it be that you hit a checkpoint, so processing slows down until the dirty pages are written to the database?
I've found the reason for the strange behavior: lack of memory. When I added the 'Memory: Page Reads/sec' counter, the valleys coincided with periods of heavy page faulting.
Thanks for your answers (now the question seems silly, I'll ask for more memory :).