I have an MVC application with an API in it, running on IIS 6.0 (7.0 on the production servers). For the API, I use an IHttpHandler implementation in an API.ashx file.
I have many different API calls being made to my API.ashx file, but I'll describe one that has no DB calls, so it's definitely NOT a database issue.
At the very beginning of the ProcessRequest method I've added a System.Diagnostics.Stopwatch to track performance, and I stop it on the method's last line.
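Roughly this kind of setup (a simplified sketch; the real handler of course does more work):

public void ProcessRequest(HttpContext context)
{
    var sw = System.Diagnostics.Stopwatch.StartNew();

    // ... the actual API work for this call (no DB access) ...

    sw.Stop();
    Log(sw.ElapsedMilliseconds); // Log is my own helper; this consistently shows ~5 ms
}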
The output of my stopwatch is always stable (±2 ms) and shows 5 ms(!!!) on average.
But on my site I see a completely unstable and varying Time to First Byte. It may start at 15 ms and grow up to 1 SECOND, averaging around 300 ms, yet in the logs I still have my stable 5 ms from the stopwatch.
This happens on every server I use, even locally (so this is not a network-related problem) and on production. BTW, all static resources are loaded really fast (<10 ms).
Can anyone suggest a solution to this?
This sounds like a difficult one to diagnose without a little more detail. Could you edit your question and add a waterfall chart showing the slow API call in question? A really good tool to produce waterfall charts is http://webpagetest.org
I also recommend reading this article about diagnosing slow TTFBs.
http://www.websiteoptimization.com/speed/tweak/time-to-first-byte/
It goes into great detail about some of the reasons behind a slow response.
Here are some server performance issues that may be slowing down your server.
Memory leaks
Too many processes / connections
External resource delays
Inefficient SQL Queries
Slow database calls
Insufficient server resources
Overloaded Shared Servers
Inconsistent website response times
Hope that helps!
In my client-server architecture I have a few API functions whose usage needs to be limited.
The server is written in C# (.NET) and runs on IIS.
Until now I didn't need to perform any synchronization. The code was written in such a way that even if a client sent the same request multiple times (e.g. a create request), one call would end with success and all the others with an error (because of the server code + DB structure).
What is the best way to enforce such limitations? For example, I want no more than 1 call of the API method foo() per user per minute.
I thought about some SynchronizationTable which would have just one column, unique_text, and before computing the foo() call I'd write something like foo{userId}{date}{HH:mm} to this table. If the write ends with success, I know there wasn't a foo call from that user in the current minute.
I think there is a much better way, probably in server code, without using the DB for that. Of course, there could be thousands of users calling foo.
To clarify what I need: I think it could be some lightweight DictionaryMutex.
For example:
private static DictionaryMutex FooLock = new DictionaryMutex();

FooLock.Lock(User.GUID);
try
{
    ...
}
finally
{
    FooLock.Unlock(User.GUID);
}
EDIT:
A solution in which one user cannot call foo twice at the same time would also be sufficient for me. By "at the same time" I mean that the server starts handling the second call before returning the result of the first call.
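To illustrate, here is roughly what I imagine such a DictionaryMutex could look like (just a sketch using ConcurrentDictionary and SemaphoreSlim; all the names are made up):

// Requires System, System.Collections.Concurrent and System.Threading.
public static class PerUserLock
{
    private static readonly ConcurrentDictionary<Guid, SemaphoreSlim> Locks =
        new ConcurrentDictionary<Guid, SemaphoreSlim>();

    // Blocks if another request from the same user is already inside foo().
    public static IDisposable Acquire(Guid userId)
    {
        var gate = Locks.GetOrAdd(userId, _ => new SemaphoreSlim(1, 1));
        gate.Wait();
        return new Releaser(gate);
    }

    private sealed class Releaser : IDisposable
    {
        private readonly SemaphoreSlim gate;
        public Releaser(SemaphoreSlim gate) { this.gate = gate; }
        public void Dispose() { gate.Release(); }
    }
}

// Usage in the API method:
using (PerUserLock.Acquire(User.GUID))
{
    // handle foo() for this user
}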
Note that keeping this state in memory in an IIS worker process opens up the possibility of losing all of this data at any instant. Worker processes can restart for any number of reasons.
Also, you probably want to have two web servers for high availability. Keeping the state inside of worker processes makes the application no longer clustering-ready. This is often a no-go.
Web apps really should be stateless. Many reasons for that. If you can help it, don't manage your own data structures like suggested in the question and comments.
Depending on how big the call volume is, I'd consider these options:
SQL Server. Your queries are extremely simple and easy to optimize for. Expect thousands of such queries per second per CPU core. This can bear a lot of load. You can use SQL Server Express for free.
A specialized store like Redis. Stack Overflow is using Redis as a persistent, clustering-enabled cache. A good idea.
A distributed cache, like Microsoft Velocity. Or others.
This storage problem is rather easy because it fits a key/value store model well. And the data is near worthless, so you don't even need to back it up.
I think you're overestimating how costly this rate limitation will be. Your web-service is probably doing a lot more costly things than a single UPDATE by primary key to a simple table.
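To make that concrete, here is a sketch of what the per-minute check could look like with a table whose primary key is (UserId, Action, WindowStart) — the table and the helper are made up for illustration, not taken from the question:

// Requires System and System.Data.SqlClient.
static bool TryAcquireSlot(string connectionString, Guid userId, string action)
{
    // One row per user/action/minute; the composite primary key does the real work.
    var window = DateTime.UtcNow.ToString("yyyy-MM-dd HH:mm");
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(
        "INSERT INTO dbo.RateLimit (UserId, Action, WindowStart) VALUES (@u, @a, @w)", conn))
    {
        cmd.Parameters.AddWithValue("@u", userId);
        cmd.Parameters.AddWithValue("@a", action);
        cmd.Parameters.AddWithValue("@w", window);
        conn.Open();
        try
        {
            cmd.ExecuteNonQuery();
            return true;                    // first call in this minute
        }
        catch (SqlException ex)
        {
            if (ex.Number != 2627) throw;   // 2627 = primary key violation
            return false;                   // foo() was already called this minute
        }
    }
}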
Here is my problem:
I have just been brought onto a massive asp.net C# project and I've been charged with fixing some performance issues (not my area of expertise). More specifically after 5 - 7 redirects/ajax calls the web server stops responding and the whole page (and eventually the browser) freezes.
I don't think this is a coding issue as I've set up break points in a few pages (Page_Load method) and after the 5 requests it does not even reach the break points.
I don't believe this is related to this issue, as I've increased the browser's maximum-connections-per-server parameter and I got the same behavior. Furthermore, after these 5 requests in one browser (IE), the application stops working in FF as well.
This is not a resource issue as the w3wp.exe process never exceeds 500MB memory.
One thing I've noticed when using Fiddler and other tools to monitor the requests is that the server takes a very long time when loading image files (png, jpg). I don't know if this is relevant.
I've enabled failed request tracing on the server and the only thing I've noticed is that some requests fail with a 401 error even though I've set Anonymous Authentication to Enabled.
Here is the exact message
MODULE_SET_RESPONSE_ERROR_STATUS
ModuleName ManagedPipelineHandler
Notification 128
HttpStatus 401
HttpReason Unauthorized
HttpSubStatus 0
ErrorCode 0
ConfigExceptionInfo
Notification EXECUTE_REQUEST_HANDLER
ErrorCode The operation completed successfully. (0x0)
This message is sometimes thrown with ModuleName: ScriptModule
I have already wasted 2 days on this thing and I'm running out of ideas so any suggestions would be appreciated.
Like any large generic problem, your best bet in diagnosing the issue is to figure out how to break the issue down into smaller parts, how to hypothesize the causes, and how to validate or invalidate your hypotheses. My first inclination would be to hypothesize that the server-side processes in this particular application are taking a long time, causing your client requests to block, making the whole thing seem frozen.
From there, I would attempt to replicate the long-running server-side processes by creating isolated client-side tests. If the URLs are HTTP GETs, I would test the same URLs individually. If they are HTTP POSTs, I'd create an isolated test form, if feasible, to see what happens with each request. If a long-running server-side process is found, then you have a starting point.
If there are no long-running server-side processes, then it may be JavaScript / client-side coding issues that need to be looked into. But definitely, when you're working on a large, unfamiliar project, your best bet is to figure out how to break the issue down into smaller components that can then be tested.
I solved the issue finally. Here is what I did:
Experimented with IIS settings and App_Pool recycling and noticed that there is nothing wrong with the way it handles requests that actually reach it.
I focused on the Http.sys module and noticed that in the log files there were a lot of Timer_ConnectionIdle and Client_Reset errors.
After some more experimentation and a lot of Google searches, I accidentally found this answer and it solved my issue. As the answer suggests the problem was caused by the AVG antivirus installed and incorrectly configured on the server.
Thanks for all the help and suggestions.
If it's ajax calls that are causing your browser to freeze, make sure they are not synchronous (blocking) ajax calls.
Just appending to Shan's answer, which is a good one.
First off, there is obviously a code issue as this is by no means 'normal' behavior for IIS.
That said, you must isolate it as Shan indicated. For example, given that the server itself no longer accepts connections, we can pretty well eliminate javascript as the source of the problem and relegate it to being just a symptom.
Typically when a worker process spins into space like this it is due to either an infinite loop or an issue where multiple threads are trying to lock the same resource. I bet if you let it run long enough IIS itself will timeout, kill and restart the process.
With that in mind you want to look for any type of multithreaded garbage (which I highly recommend you don't do in a web server) or for anything that indicates a tight infinite loop. A loop is going to become apparent if you execute the requests individually. A multi-threaded issue will only show up if you happen to get a collision.
Run various performance counters on the web server. Also, once it locks up, let it sit that way for a while. Once IIS performs its own reset of the worker process, go look for indicators in the event log.
We ran into strange sql / linq behaviour today:
We used to use a web application to perform some intensive database actions on our system. Recently we moved to a winforms interface for various reasons.
We found out that performance has seriously decreased: an action that used to take about 15 minutes now takes as long as a whole hour. The strange thing is that it's the exact same method being called. The method performs quite a bit of reading and writing using linq2sql, and profiling on the client machine showed that the problematic section is the SQL action itself, in LINQ's "Save" method.
The only difference between the cases is that on one case the method is called from a web application's code behind (MVC in this case), and on the other from a windows form.
The one idea I could come up with is that SQL performance has something to do with the identity of the user accessing the db, but I could not find any support for that assumption.
Any ideas?
Did you run both tests from the same machine? If not, hardware differences could be the issue... or the network... one machine could be in a higher-speed section of your network, like the same VLAN as the SQL server. Try running the client code on the same server the web app was running on.
Also, if your app is updating progress in a synchronous manner, the app could be waiting a long time for the display to update... as opposed to working with a stream à la Response.Write.
If you are actually outputting progress as you go you should make sure that the progress updates are events and that the display of those happens on another thread so that the processing isn't waiting on display. Actually you probably should put the processing on its own thread... and just have an event handler take care of the updates... that is a whole different discussion. The point is that your app could be waiting to update the display of progress.
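As a rough sketch of that idea in WinForms (the batch loop and progressLabel are illustrative): the heavy work runs on a BackgroundWorker thread, and only the ProgressChanged handler, which is raised on the UI thread, touches the display.

// Requires System.ComponentModel; progressLabel is a Label on the form.
var worker = new BackgroundWorker { WorkerReportsProgress = true };

worker.DoWork += (s, e) =>
{
    int totalBatches = 100; // illustrative
    for (int batch = 0; batch < totalBatches; batch++)
    {
        // ... the LINQ to SQL work for this batch ...
        worker.ReportProgress(batch); // fire-and-forget; processing never waits on the UI
    }
};

// Raised on the UI thread, so it is safe to update controls here.
worker.ProgressChanged += (s, e) => progressLabel.Text = "Batch " + e.ProgressPercentage;

worker.RunWorkerAsync();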
It's a very old issue but I happened to run into the question just now. So, for whom it may concern nowadays, the solution (and, before it, the problem) was frustratingly silly: Linq2SQL was configured on the dev machines to constantly write a log to the console.
This was causing a huge delay due to the simple act of outputting a large amount of text to the console. On the web server the log was not being written, and therefore there was no performance drawback. There was colossal face-palming once we figured this one out. Thanks to the helpers; I hope this answer will help someone solve it faster next time.
Unattended logging. That was the problem.
How much traffic is heavy traffic? What are the best resources for learning about heavy-traffic web site development? What are the approaches?
There are a lot of principles that apply to any web site, regardless of the underlying stack:
use HTTP caching facilities. For one, there is the user agent cache. Second, the entire web backbone is full of proxies that can cache your requests, so use this to full advantage. A request that never even lands on your server adds 0 to your load; you can't optimize better than that :)
corollary to the point above: use CDNs (Content Delivery Networks, like CloudFront) for your static content. CSS, JPG, JS, static HTML and many more pages can be served from a CDN, thus saving the web server from an HTTP request.
second corollary to the first point: add expiration caching hints to your dynamic content. Even a short cache lifetime like 10 seconds will save a lot of hits that will instead be served by all the proxies sitting between the client and the server (there is a small sketch of this after the list below).
Minimize the number of HTTP requests. It seems basic, but it is probably the most overlooked optimization available. In fact, Yahoo best practices put this as the topmost optimization; see Best Practices for Speeding Up Your Web Site. Here is their best practices list:
Minimize HTTP Requests
Use a Content Delivery Network
Add an Expires or a Cache-Control Header
Gzip Components
... (the list is quite long actually, just read the link above)
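Coming back to the expiration hints on dynamic content mentioned above, on an ASP.NET MVC action even a short public cache can be declared like this (the duration and the action itself are illustrative):

// Requires System.Web.Mvc and System.Web.UI.
[OutputCache(Duration = 10, Location = OutputCacheLocation.Any)]
public ActionResult Latest()
{
    // A 10-second lifetime lets browsers and intermediate proxies absorb
    // repeat hits instead of your server.
    return View(GetLatestItems()); // GetLatestItems is a placeholder
}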
Now after you have eliminated as much as possible of the superfluous hits, you're still left with optimizing whatever requests actually hit your server. Once your ASP code starts to run, everything will pale in comparison with the database requests:
reduce the number of DB calls per page. The best optimization possible is, obviously, not to make the request to the DB at all to start with. Some say 4 reads and 1 write per page are the most a high-load server should handle, others say one DB call per page, still others say 10 calls per page is OK. The point is that fewer is always better than more, and writes are significantly more costly than reads. Review your UI design; perhaps that hit count in the corner of the page that nobody sees doesn't need to be that accurate...
Make sure every single DB request you send to the SQL server is optimized. Look at each and every query plan, make sure you have proper covering indexes in place, make sure you don't do any table scans, review your clustered index design strategy, review all your IO load, storage design, etc. etc. Really, there is no shortcut you can take here; you have to analyze and optimize the heck out of your database, as it will be your choking point.
eliminate contention. Don't have readers wait for writers. For your stack, SNAPSHOT ISOLATION is a must.
cache results. And usually this is where the cookie crumbles. Designing a good cache is actually quite hard to pull off. I would recommend you watch the Facebook SOCC keynote: Building Facebook: Performance at Massive Scale. Somewhere around slide 47 they show what a typical internal Facebook API looks like:
cache_get(
    $ids,
    'cache_function',
    $cache_params,
    'db_function',
    $db_params);
Everything is requested from a cache, and if not found, requested from their MySQL back end. You probably won't start with 60000 servers though :)
On the SQL Server stack the best caching strategy is one based on Query Notifications. You can almost mix it with LINQ...
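A rough sketch of that flavour of caching with SqlDependency (the connection string, table and cache helpers are placeholders; Service Broker has to be enabled on the database for Query Notifications to work):

// Requires System.Data and System.Data.SqlClient.
// Call once at application start-up, e.g. in Application_Start:
SqlDependency.Start(connectionString);

DataTable LoadAndCacheProducts()
{
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(
        // Notification-safe query: explicit column list, two-part table name.
        "SELECT ProductId, Name, Price FROM dbo.Product", conn))
    {
        var dependency = new SqlDependency(cmd);
        dependency.OnChange += (s, e) => InvalidateCacheEntry("products"); // placeholder helper

        conn.Open();
        var table = new DataTable();
        table.Load(cmd.ExecuteReader());
        AddToCache("products", table); // placeholder helper
        return table;
    }
}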
I will define heavy traffic as traffic which triggers resource-intensive work. Meaning, if one web request triggers multiple SQL calls, or they all calculate pi with a lot of decimals, then it is heavy.
If you are returning static HTML, then bandwidth is more likely to be your issue than what a good server today can handle (more or less).
The principles are the same whether you use MVC or not when it comes to optimizing for speed.
Having a decoupled architecture makes it easier to scale by adding more servers, etc.
Use a repository pattern for data retrieval (it makes adding a cache easier; see the sketch after this list).
Cache data which is expensive to query.
Data to be written could be written through a cache, so that the client doesn't have to wait for the actual database commit.
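A minimal sketch of the repository-plus-cache idea (the interface, the Product type and the 60-second lifetime are all made up for illustration):

// Requires System, System.Collections.Generic and System.Runtime.Caching.
public class Product { /* ... */ }

public interface IProductRepository
{
    IList<Product> GetTopSellers();
}

public class CachedProductRepository : IProductRepository
{
    private readonly IProductRepository inner;   // the real DB-backed repository
    private readonly MemoryCache cache = MemoryCache.Default;

    public CachedProductRepository(IProductRepository inner) { this.inner = inner; }

    public IList<Product> GetTopSellers()
    {
        var cached = (IList<Product>)cache.Get("top-sellers");
        if (cached != null) return cached;

        var fresh = inner.GetTopSellers(); // the expensive query runs at most once per minute
        cache.Set("top-sellers", fresh, DateTimeOffset.UtcNow.AddSeconds(60));
        return fresh;
    }
}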
There are probably more ground rules as well. Maybe you can say something about the architecture of your application, and how much load you need to plan for?
MSDN has some resources on this. This particular article is out of date, but is a start.
I would suggest also not limiting yourself to reading about the MVC stack: many principles are cross-platform.
We have a BizTalk server (a virtual one (1!)...) at our company, and an SQL server where the data is kept.
Now we have a lot of data traffic. I'm talking about hundreds of thousands. So I'm actually not even sure if one server is really safe, but our company is not that easy to convince.
Recently we have had a lot of problems.
Allow me to describe the situation in detail, so I'm not missing anything:
Our server has 5 applications:
One with 3 orchestrations, 12 send ports, 16 receive locations.
One with 4 orchestrations, 32 send ports, 20 receive locations.
One with 4 orchestrations, 24 send ports, 20 receive locations.
One with 47 (yes 47) orchestrations, 37 send ports, 6 receive locations.
One common application with a couple of resources.
Our problems have occurred since we deployed the application with the 47 orchestrations.
A lot of these orchestrations use assign shapes which use C# code to do the mapping. This is because we use HL7 extensions and this is kind of special, so by using C# code & XPath it was a lot easier to do the mapping, because a lot of these schemas look alike. The C# code reads in XmlNodes received through XPath and returns XmlNodes, which are then assigned back to BizTalk messages. I'm not sure if this could be the cause, but I thought I'd mention it.
The send and receive ports have a lot of different types: File, MQSeries, SQL, MLLP, FTP.
Each of these types has its own host instance, to balance out the load.
Our orchestrations use the BiztalkApplication host.
On this server a couple of scripts are also running, mostly FTP upload scripts and also a zipper script, which zips files every half hour into a daily zip and deletes the zip files after a month. We use this zip script on our backup files (we back up a lot; backups are also on our server). We did this because the server had problems sending files to a location where there were a lot (A LOT) of files, so after the files were reduced to zips it went better.
Now, the problems we are having recently boil down to two major ones:
Our most important problem is the following. We kept a receive location with a lot of messages on a queue for testing. After we start this receive location, which uses the 47 orchestrations, the number of running service instances starts to skyrocket. OK, this is pretty normal; let's say it reaches about 10000, and then we stop the receive location to see how BizTalk handles these 10000 instances. Normally they would go down pretty fast, and sometimes they do, but after a while it starts to "throttle", meaning they just stop being processed and the number of service instances stays the same. For example, in 30 seconds it goes down from 10000 to 4000, then it stays at 4000 and decreases very, very slowly, like 30 instances in 5 minutes or something. This means that all the service instances of the other applications are also stuck here, and they are also not processed.
We noticed that after restarting our host instances the instance count went down quickly again. So we tried selectively restarting different host instances to locate the problem. We noticed that eventually restarting the file send/receive host instance would do the trick. So we thought file sends were the problem, considering that we make a lot of backups. So we replaced the file-type backups with MQSeries backups. The same problem occurred, and funnily enough, restarting the file send/receive host still fixes the problem.
No errors can be found in the event viewer either.
A second problem we're having is that sometimes, at around 6 am, all or some of the host instances are stopped.
In the event viewer we noticed the following errors (these are more than one):
The receive location "MdnBericht SQL" with URL "SQL://ZNACDBPEG/mdnd0001/" is shutting down. Details:"The error threshold has been exceeded. The receive location is shutting down.".
The Messaging Engine failed to add a receive location "M2m Othello Export Start Bestand" with URL "\m2mservices\Othello_import$\DataFilter Start*.xml" to the adapter "FILE". Reason: "The FILE adapter cannot access the folder \m2mservices\Othello_import$\DataFilter Start.
Verify this folder exists.
Error: Logon failure: unknown user name or bad password.
".
The FILE adapter cannot access the folder \m2mservices\Othello_import$\DataFilter Start.
Verify this folder exists.
Error: Logon failure: unknown user name or bad password.
An attempt to connect to "BizTalkMsgBoxDb" SQL Server database on server "ZNACDBBTS" failed.
Error: "Login failed for user ''. The user is not associated with a trusted SQL Server connection."
It would seem that there's a login failure at this time, and that because of it other services are also experiencing problems, and eventually they are shut down.
The thing is, our user is an admin, and it's impossible that its password is wrong "sometimes". We have considered that the problem could be due to an infrastructure issue, but that's not really our department.
I know it's a long post, but we're not sure anymore what to do. Would adding another server and balancing the load solve our problems? Is there a way to measure our load balance and know where to start splitting? What are normal load numbers, etc.?
I appreciate any answers because these issues are getting worse and we're also on a deadline.
Thanks a lot for replies!
Your immediate problem is the BizTalk throttling feature. It's supposed to help BizTalk survive temporary overload conditions. One of its many problems is that you can see the throttling kick in only in the performance monitor and not in the event log.
What you should do:
Separate the new application into a different host from the rest of the applications. Throttling is done at the host level, so the problematic application won't affect the rest of the applications.
Read about how to disable throttling in the link above.
What we have done is implement an external throttling service that feeds the BizTalk receive location in small, digestible packets. It's ugly, but the problem is ugly.
Update to comment: You have enough host instances, so ignore that advice. You may reorder the applications between the instances, but there are no clear guidelines for doing that, so it's just shuffling and guessing.
About the safety of disabling throttling: this feature doesn't make much sense in many scenarios. You have to study it. Check which of the throttling parameters you are hitting (this can be seen in the performance monitor) and decide how to change the thresholds.
How many host instances do you have?
From the line:
The send and receive ports have a lot of different types: File, MQSeries, SQL, MLLP, FTP. Each of these types have a different host instances, to balance out the load. Our orchestrations use the BiztalkApplication host
It sounds like you have a lot - I recently did an audit of a system where BizTalk was self-throttling and the issue was in part due to too many host instances. Each host instance places its own load upon the BizTalk MessageBox, as well as chewing up a minimum of 200 MB of memory.
Reading your comment, you have 20 - this is too many and would be a big part of your problems.
A good starting host setup would be:
A dedicated tracking host
One host that contains all receive handlers for adapters
One host that contains all orchestrations
One host that contains all send handlers for adapters
One host for adapters that need to be clustered (like FTP and MSMQ)
You can then also consider things like introducing "real time" hosts and batched hosts, so you can tune the real time hosts for low latency.
You can also have hosts for specific applications if they are known to be unstable, but in general this should not be done.
I run a BizTalk system that has similar problems and can empathize with what you are seeing. I don't know if it's the same issue, but I thought I'd share my experience in case it helps.
In the same manner, restarting the send/receive host seems to fix the problem. In my case I found a direct correlation to memory usage by the host processes. I used performance counters to see when a given host was throttled for memory. By creating extra hosts and moving orchestrations and ports between them, I was able to narrow down which business sets were causing the problem. Basically, in my case restarting the hosts was the equivalent of the ultimate "garbage collection" to free up memory. This was of course only until enough instances came through to gobble it up again.
I'm afraid I have not solved the issue yet, but a few things I found to alleviate the issue:
Raise the memory threshold for a given process so that throttling does not occur, or occurs later.
Each host instance, while informative, does have an overhead that is added. Try combining hosts that are not your problem children together to reduce the memory foot print.
Throw hardware at the problem; RAM is cheap.
I measure the following every few minutes in perfmon so I can diagnose where the problem is:
BizTalk:MessageAgent(*)\Process memory usage (MB)
BizTalk:MessageAgent(*)\Process memory usage threshold
Memory\Available MBytes
A few other things to take a look at. Make sure any custom pipelines use good BizTalk memory practices (i.e. no XML DOM manipulation hiding somewhere, etc). Also theoretically reducing the number of threads for a given host should lower the amount of memory it can seize at one time. I did not seem to have much luck with this one. Maybe the BizTalk throttling overrode it as others have mentioned, I don't know. Also, on a final note, if you dump the perfmon results to a csv, with Excel you can make some pretty memory usage graphs. These might be useful for talking to management about buying more hardware. That's assuming your issue fits this scenario as well.
We fixed the problem, temporarily at least, thanks to a combination of all your answers.
We set the process memory usage throttling parameters of some hosts higher.
We balanced the host instances better after I analyzed the memory usage of all hosts, thanks to performance counters and also with the use of a tool called MsgBoxViewer.
And now we're trying to get more physical memory & hopefully also an extra server or a 64-bit server.
Thanks for all replies!
We recently installed a 64-bit server in cluster with our older server. Thanks to this we can balance the memory even better which solved a lot of problems.
Although the 64-bit server didn't give us much improvement (except for a bit more memory), since it can't use 64 bits for IBM MQ, MLLP, HL7 pipelines, etc...
The other answers are helpful for run-time performance tuning, but I would recommend a design change as well.
You say that you do a lot of message manipulation in the orchestration in the message assignment shapes.
I would recommend moving that code to dedicated transforms. They are much more lightweight and can be executed faster. You can combine custom XSLT and C# in these maps to do the hard work. Orchestrations cost more in development, design and testing, and a whole lot more in run-time performance.
You can then use transforms for message transformation, and leave the orchestrating (what is left of it after moving the message assignment code) to the orchestrations.
The added benefit of using transforms over orchestrations is that they are much more testable.