Something very weird started happening on my Azure website today. I have not released any new code lately.
Roughly every 3rd to 5th page I request on the site just hangs and eventually returns an Error 524 (served by CloudFlare), which says my server or database may be overloaded. Yet Azure shows everything running fine at less than 10% consumption.
The weird thing is that if I cancel a request that is hanging (clicking the X where the refresh button normally is in Google Chrome) and then retry the page, it returns everything immediately.
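One way to turn "every 3rd to 5th page hangs" into numbers, and to compare your location with a user somewhere else, is to request the same page repeatedly and record which attempts fail or stall. The Python below is a minimal sketch under that assumption; `probe`, `hang_rate`, the attempt count and the thresholds are hypothetical choices, not anything Azure or CloudFlare provides. Note that urllib's `timeout` bounds each blocking socket operation, not the whole request.

```python
import time
from dataclasses import dataclass
from typing import Callable, List
from urllib.request import urlopen


@dataclass
class Sample:
    attempt: int
    seconds: float
    ok: bool
    error: str = ""


def default_fetch(url: str, timeout: float) -> None:
    # Read the full body so a response that stalls mid-stream is also measured.
    # `timeout` applies to each blocking socket operation, not the total time.
    with urlopen(url, timeout=timeout) as resp:
        resp.read()


def probe(url: str, attempts: int = 20, timeout: float = 15.0,
          fetch: Callable[[str, float], None] = default_fetch,
          clock: Callable[[], float] = time.monotonic) -> List[Sample]:
    """Request `url` repeatedly and record the duration and outcome of each try."""
    samples: List[Sample] = []
    for i in range(1, attempts + 1):
        start = clock()
        try:
            fetch(url, timeout)
            samples.append(Sample(i, clock() - start, True))
        except Exception as exc:  # socket timeouts, HTTP errors such as 524, resets
            samples.append(Sample(i, clock() - start, False, type(exc).__name__))
    return samples


def hang_rate(samples: List[Sample], slow_threshold: float) -> float:
    """Fraction of attempts that failed or took at least `slow_threshold` seconds."""
    bad = [s for s in samples if not s.ok or s.seconds >= slow_threshold]
    return len(bad) / len(samples) if samples else 0.0
```

For example, `hang_rate(probe("http://www.wrestlestat.com/"), slow_threshold=10.0)` gives the share of attempts that errored or took 10 seconds or more; running the same call from two machines shows whether the problem follows your network path.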
When I first noticed this morning, I thought it might be Azure, so I checked the Azure status page, and sure enough, my region was having latency problems.
According to Azure, those problems have since been resolved, yet I'm still seeing the same symptoms and errors.
My error table only shows the database calls that time out while this is occurring, with no pattern I can recognize.
I've checked App Insights as well, comparing the pages before and after, and nothing jumps out at me. Another user in the eastern United States says he hasn't noticed anything and everything is working great.
I don't know what else to be checking here. This is consistently-inconsistent!
Here's the website to try:
www.wrestlestat.com
This is really frustrating because both the Azure and CloudFlare status pages say everything is working normally, yet it is still happening for me: one in every 3 to 5 pages hangs.
EDIT/UPDATE: Two days after the incident started, it has magically fixed itself. I'm not aware of anything being done, and I released NO code changes to attempt a fix. I'm baffled.
We have an ASP.NET API (.NET Framework 4.7.2) that has been working fine for the past several months. We installed it at a new customer and have run into a strange issue that I can't seem to solve.
For some reason, during normal API calls, we receive a 500.0 response that takes 17 seconds to complete. After this happens, every request receives a 401.0 response that takes 5 seconds to complete, even though the authorization has not changed. Everything that was previously working no longer works. The only fix is to recycle the application pool, which clears the issue and restores service.
No error is ever recorded. The event logs show nothing, the IIS logs only show the 500 and the subsequent 401.0 responses, and the application logs don't report any kind of exception. It just shuts down.
Our API uses anonymous authentication. Windows authentication was turned on, so we disabled it. We read elsewhere that this error can sometimes be caused by anonymous authentication being set to a specific user, so we changed that to the application pool identity. This customer was running a newer version of our API, so we rolled it back to the version used by a different client that was not having these issues, but the error persists.
I'm completely out of ideas. We can't figure out what is going on.
This API is running on Windows Server 2016 with IIS 10 and the latest updates. It has been rebooted several times as well.
I've been trying to diagnose an issue pertaining to thousands of hung/stuck EndRequest requests in IIS. This is becoming a large problem for us as we're hitting the concurrent connection cap after about a week or two and have to recycle the whole application pool to clear the request list.
Because this is a live application, I have limited troubleshooting options, so anything that would halt or bring down the application pool I am not allowed to do.
IIS Information
Concurrent connection cap is set to its maximum of 65535.
Configuration debug in the web config is set to false, and we have a timeout set at 110 seconds.
Windows Server 2012 R2 Version 6.2 (Build 9200)
IIS Version 8.5.9600.16384
The long-running requests have 0 data transfer, checked with Wireshark.
I'm pretty much at a loss as to why these aren't timing out. I've set all the appropriate settings I could find on MSDN and other sources. We have a very hard time replicating this in our development environment, so it's been blind testing for the most part. I've found articles on hangs in other request states, but I cannot find anything on why a request in the EndRequest state would not time out.
Advanced Settings Page:
https://postimg.org/image/gxec32kmt/
Application Pool Requests Page:
https://postimg.org/image/qupcw57o5/
Web Config:
https://postimg.org/image/5xt4rh1xh/
Update 1
I did some digging into our fallback that is supposed to close connections after an hour of no usage. We currently seem to have 10,153 sessions still active with a last active time of 3 days ago. I've stepped through this function quite a bit, and it seems to be working as intended: it goes through the list of sessions and calls WebSocketHandler.Close() on any session that has been inactive for over an hour. However, some sessions seem to refuse to close after the method is called. We have logging in place to tell us if any exceptions are thrown during the run, but it appears to be running as expected.
This was my mistake. I was running against an old pull of the session data. A current pull shows no sessions running longer than their specified time, which means WebSocketHandler.Close() was called on them and they were removed from our in-memory list.
Update 2
Output of netstat -s on Pastebin: https://pastebin.com/embed_js/qBbZ4gJ1
Update 3
Correction to Update 1: can a call to close a connection fail? If so, we're accidentally orphaning our reference to that connection on the server. Even then, I would still expect the IIS timeout to kick in, so there must be some catch to how it collects requests.
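On the question of whether a close can fail and orphan the connection: one defensive pattern is to remove an idle session from the active list and, if closing it raises, keep it in a separate collection for a retry or a forced abort instead of losing track of it. The sketch below shows that pattern in Python rather than C#; `Session`, `sweep_idle` and the one-hour `max_idle` are hypothetical stand-ins, with `close` playing the role of `WebSocketHandler.Close()`. It does not model what IIS itself does with the underlying request.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Dict

log = logging.getLogger("session_sweep")


@dataclass
class Session:
    session_id: str
    last_active: float          # seconds, measured on the same clock as `now`
    close: Callable[[], None]   # hypothetical stand-in for WebSocketHandler.Close()


def sweep_idle(active: Dict[str, Session], now: float,
               max_idle: float = 3600.0) -> Dict[str, Session]:
    """Remove sessions idle longer than `max_idle` from `active`.

    Returns the sessions whose close() raised, so the caller still holds a
    reference to them and can retry or abort them instead of orphaning them.
    """
    failed: Dict[str, Session] = {}
    # Iterate over a snapshot so entries can be deleted from `active` safely.
    for sid, session in list(active.items()):
        if now - session.last_active <= max_idle:
            continue
        del active[sid]
        try:
            session.close()
        except Exception:
            log.exception("close() failed for session %s", sid)
            failed[sid] = session
    return failed
```

Logging the size of the returned dictionary on every run would show directly whether closes are failing, rather than inferring it from the session count.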
I have an ASP.NET Web Forms application on Azure, but I keep getting the following error on some pages:
502 - Web server received an invalid response while acting as a gateway or proxy server.
I have already read many topics about the 502 error on Azure, but I still don't understand what the problem is in my particular situation.
The error occurs only on some pages of the application, and I can always reproduce the following pattern:
Open SITENAME.azurewebsites.net -> error occurs
Open SITENAME.azurewebsites.net/Site1.aspx -> error does not occur
Open SITENAME.azurewebsites.net/Site2.aspx -> error does not occur
Refresh SITENAME.azurewebsites.net/Site2.aspx -> error occurs and won't go away until I call Site1.aspx again
The only thing I found in the log is that the application loads System.Windows.Forms.dll and there is an AccessViolationException. I have no idea why this DLL is loaded at all, because no project in Visual Studio references it, but that's another story :).
What I don't understand, though, is why the 502 error does not always occur on the same page.
Maybe someone can give me a hint what might be the problem or how i could find it...
Thanks for your help!
EDIT:
What I forgot to mention: in some threads about this error I read that it occurs after about 3 minutes. In my case, it nearly always takes about 25 seconds for the error message to show up.
That points to an application issue. You are getting a 502 because the worker process is crashing, which leaves the front end with a request that has no response, so it returns a 502 to say exactly that. Look for eventlog.xml under the LogFiles folder for your website. Alternatively, you can try remote debugging from Visual Studio to your website.
System.Windows.Forms.dll contains a lot of UI code that will most probably not work in the Azure Websites sandbox. It is probably loaded because you are using something from that assembly, or something that depends on it. It doesn't have to be listed as a reference in Visual Studio to be loaded, since it's part of the standard .NET Framework.
I would suggest looking into remote debugging and figuring out at what point this is getting loaded and why.
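To scan eventlog.xml for the crash instead of reading it by hand, something like the following Python sketch may help. It assumes the file uses the Windows event XML layout (`Event/System/Provider`, `EventID`, `TimeCreated/@SystemTime`, `EventData/Data`); verify that against your own file. `read_events` and `crash_candidates` are hypothetical helpers, and the default keywords are simply the two names mentioned in the question.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, List

Event = Dict[str, str]


def read_events(xml_text: str) -> List[Event]:
    """Flatten each <Event> element into a dict of commonly useful fields."""
    root = ET.fromstring(xml_text)
    # Strip any XML namespace so the same paths work with or without xmlns.
    for el in root.iter():
        if isinstance(el.tag, str) and "}" in el.tag:
            el.tag = el.tag.split("}", 1)[1]
    events: List[Event] = []
    for ev in root.iter("Event"):
        provider = ev.find("System/Provider")
        created = ev.find("System/TimeCreated")
        data = " | ".join((d.text or "").strip() for d in ev.findall("EventData/Data"))
        events.append({
            "provider": provider.get("Name", "") if provider is not None else "",
            "event_id": ev.findtext("System/EventID", default=""),
            "level": ev.findtext("System/Level", default=""),
            "time": created.get("SystemTime", "") if created is not None else "",
            "data": data,
        })
    return events


def crash_candidates(events: List[Event],
                     keywords: Iterable[str] = ("AccessViolationException",
                                                "System.Windows.Forms")) -> List[Event]:
    """Keep only events whose data text mentions one of the keywords."""
    keys = tuple(keywords)
    return [e for e in events if any(k in e["data"] for k in keys)]
```

Matching events with timestamps close to a 502 would support the theory that the worker process crashed.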
In my case, I got 502 errors because the site was restarted by the Azure auto-heal system. It turns out I had run some tests with auto-heal a few days earlier, but since I disabled it in the end, I didn't think it could be causing my 502 errors.
This is where I discovered that the Azure interface for changing auto-heal settings (mywebsite.scm.azurewebsites.net/Support -> mitigate) only affects the production slot. However, when you swap your deployment slots, the settings are swapped too. There is apparently no way to change the staging slot settings directly; you have to swap, change the settings, and swap again.
So I ended up with auto-heal enabled on my staging slot and disabled on my production slot (and of course, at the time, I thought it was disabled on both). I then "randomly" hit 502 errors on either staging or production, depending on how many times I had swapped them. What's weird is that although the application seems to restart (or at least fails to respond to a few requests), I don't see the corresponding events in my log file, as if Application_Start weren't running after an app pool recycle triggered by auto-heal.
It took me a whole day to figure out what was happening; I hope this answer helps someone in the same situation.
I got the error for a while after fiddling with connection strings, went away and came back in an hour, and the issue had disappeared and the site worked normally again. A highly technical answer for you.
I have a strange issue. A client added some data to our database through the website (ASP.NET MVC 4) at 7:30am and e-mailed me right away to let me know the data wasn't appearing. I got the e-mail at 8:30am and checked whether the data was there. It was. Our site logs showed it being added at 8:00am, half an hour after the user added it. The timestamp in the log is calculated in the server code (just a DateTime.Now call), not in the database, which tells me the server code ran at 8:00 and not at 7:30 when the request was initiated.
All the code is currently running synchronously, not async (I know, not great but haven't had time to fix it).
My question: is there any way IIS could appear to execute a call but actually run it later?
Thanks,
Jason
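One way to test whether the request reached IIS at 7:30 while the code ran at 8:00 is to compare the IIS log entry for that request with the time the application logged. The Python sketch below assumes the IIS W3C `date` and `time` fields are in UTC and mark when the request finished (so subtracting `time-taken` approximates when it arrived), and that the server's `DateTime.Now` uses a fixed UTC offset you supply. `iis_request_start`, `lag_seconds` and the offset are hypothetical, and a fixed offset ignores daylight-saving changes.

```python
from datetime import datetime, timedelta, timezone


def iis_request_start(date: str, time_: str, time_taken_ms: int) -> datetime:
    """Approximate arrival time (UTC) from an IIS W3C log entry.

    Assumes the logged date/time is in UTC and records request completion.
    """
    finished = datetime.strptime(f"{date} {time_}", "%Y-%m-%d %H:%M:%S")
    finished = finished.replace(tzinfo=timezone.utc)
    return finished - timedelta(milliseconds=time_taken_ms)


def lag_seconds(iis_start_utc: datetime, app_local: datetime,
                utc_offset_hours: float) -> float:
    """Seconds between IIS receiving the request and the app's DateTime.Now value.

    `app_local` is the naive local timestamp the application logged; the
    fixed `utc_offset_hours` is an assumption and ignores daylight saving.
    """
    tz = timezone(timedelta(hours=utc_offset_hours))
    return (app_local.replace(tzinfo=tz) - iis_start_utc).total_seconds()
```

A lag near 0 would suggest the request only reached IIS around 8:00; a lag near 1800 seconds would mean IIS received it at 7:30 and the code really did run later.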
Now that I am playing with NHibernate, I am getting a lot more YSODs as I learn it, and I sometimes get this error after a YSOD:
This webpage is not available
The webpage at http://localhost:49497/ might be temporarily down or it may have moved permanently to a new web address.
Error 139 (net::ERR_TEMPORARILY_THROTTLED): Unknown error.
Is there any way to disable this? I have to wait a few minutes every time, which is a big productivity killer.
I'm assuming you're using Google Chrome, because searching for the error message shows that it's a "feature" of Chrome: if a page returns a 500 error too many times, Chrome will "throttle" access to it, apparently as some sort of anti-DDoS mechanism.
I've never seen it personally, but the commenters on the thread I linked said it looked like it was coming from the server.