I recently monitored my sql database activity I found about 400 processes in activity monitoring.Later I figured that the problem is with my connection string object which would not be cleared physically even though I completely closed and disposed it, so once I suspend my IIS all the processes from activity monitoring would disappear.
after a little searching I found that I can clean all of my connections from application pool so that all the useless processes from SMSS would be killed but
I'm really concerned about it's impact on webserver. It's true that this approach would clear useless tasks from SMSS but for every request a new connection should really be created is it worth it???
considering my application is kind of enterprise app which is supposed to handle to many requests, I'm so afraid of making IIS server down by using this approach.
Do notice that my connection string value is not completely fixed for all the requeests, I made it variable by changing only "Application Name" section of it in every request according to the request parameters for the purpose of getting requestors information in sql activity monitoring and sql profiler.
is it worth to do so considering my business scope or it's better I fix the connection string value in other word is performance lag on this approach is so severe that I have to change my logging strategy or it's just a little slower???
Do notice that my connection string value is not completely fixed for all the requeests, I made it variable by changing only "Application Name" section of it in every request according to the request parameters for the purpose of getting requestors information in sql activity monitoring and sql profiler.
This is really bad because it kills pooling. You might as well disable pooling but that comes with a heavy performance penalty (which you are paying right now already).
Don't do that. Obtain monitoring information in a different way.
Besides that, neither SQL Server nor .NET have a problem with 400 connections. That's unusually high but will not cause problems.
If you run multiple instances of the app (e.g. for HA) this will multiply. The limit is 30k. I'm not aware of any reasons why this would cause a slowdown for the app, but it might cause problems for your monitoring tools.
Related
We are running a Http Api and want to be able to set a limit to the number of requests a user can do per time unit. When this limit has been reached, we don't want the users to receive errors, such as Http 429. Instead we want to increase the response times. This has the result that the users can continue to work, but slower, and can then choose to upgrade or not upgrade its paying plan. This solution can quite easily be implemented using Thread.sleep (or something similar) for x number of seconds, on all requests of a user that has passed its limit.
We think that in worst case there might be a problem with the number of possible connections for a single server, since as long as we keep delaying the response, we keep a connection open, and therefore limiting the number of possible other connections.
All requests to the Api is running asynchronously. The Server itself is built to be scalable and is running behind a load balancer. We can start up additional servers if necessary.
When searching for this type of throttling, we find very few examples of this way of limiting the users, and the examples we found seemed not concerned at all about connections running out. So we wonder is this not a problem?
Are there any downsides to this that we are missing, or is this a feasible solution? How many connections can we have open simultaneously without starting to get problems? Can our vision be solved in another way, that is without giving errors to the user?
Thread.Sleep() is pretty much the worst possible thing you can do on a web server. It doesn't matter that you are running things asynchronously because that only applies to I/O bound operations and then frees the thread to do more work.
By using a Sleep() command, you will effectively be taking that thread out of commission for the time it sleeps.
ASP.Net App Pools have a limited number of threads available to them, and therefore in the worst case scenario, you will max out the total number of connections to your server at 40-50 (whatever the default is), if all of them are sleeping at once.
Secondly
This opens up a major attack vector in terms of DOS. If I am an attacker, I could easily take out your entire server by spinning up 100 or 1000 connections, all using the same API key. Using this approach, the server will dutifully start putting all the threads to sleep and then it's game over.
UPDATE
So you could use Task.Delay() in order to insert an arbitrary amount of latency in the response. Under the hood it uses a Timer which is much lighter weight than using a thread.
await Task.Delay(numberOfMilliseconds);
However...
This only takes care of one side of the equation. You still have an open connection to your server for the duration of the delay. Because this is a limited resource it still leaves you vulnerable to a DOS attack that wouldn't have normally existed.
This may be an acceptable risk for you, but you should at least be aware of the possibility.
Why not simply add a "Please Wait..." on the client to artificially look like it's processing? Adding artificial delays on server costs you, it leaves connections as well as threads tied up unnecessarily.
I'm developing an asp.net MVC web application and the client has request that we try our best to make it as resilient as possible to Denial of Service attacks. They are worried that the site may receive malicious high volume requests with the intention to slow/take down the site.
I have discussed this with the product owner as really being out of the remit for the actual web application. I believe it falls to the responsibility of the hosting/network team to monitor traffic and respond to malicious requests.
However they are adamant that the application should have some precautions built into it. They do not want to implement CAPTCHA though.
It has been suggested that we restrict the number of requests that can be made for a session within a given time frame. I was thinking of doing something like this
Best way to implement request throttling in ASP.NET MVC? But using the session id not the client IP as this would cause problems for users coming from behind a corporate firewall - their IP would all be the same.
They have also suggested adding the ability to turn off certain areas of the site - suggesting that an admin user could turn off database intensive areas..... However this would be controlled through the UI and surely if it was under DOS attack an admin user would not be able to get to it anyway.
My question is, is it really worth doing this? Surely a real DOS attack would be much more advanced?
Do you have any other suggestions?
A Denial of Service attack can be pretty much anything that would affect the stability of your service for other people. In this case you're talking about a network DoS and as already stated, this generally wouldn't happen at your application level.
Ideally, this kind of attack would be mitigated at the network level. There are dedicated firewalls that are built for this such as the Cisco ASA 5500 series which works it's way up from basic protection through to high throughput mitigation. They're pretty smart boxes and I can vouch for their effectiveness at blocking these type of attacks, so long as the correct model for the throughput you're getting is being used.
Of course, if it's not possible to have access to a hardware firewall that does this for you, there are some stopgap measures you can put in place to assist with defence from these types of attacks. Please note that none of these are going to be even half as effective as a dedicated firewall would be.
One such example would be the IIS Module Dynamic IP Restrictions which allows you to define a limit of maximum concurrent requests. However, in practice this has a downside in that it may start blocking legitimate requests from browsers that have a high concurrent request throughput for downloading scripts and images etc.
Finally, something you could do that is really crude, but also really effective, is something like what I had written previously. Basically, it was a small tool that monitors log files for duplicate requests from the same IP. So let's say 10 requests to /Home over 2 seconds from 1.2.3.4. If this was detected, a firewall rule (in Windows Advanced Firewall, added using the shell commands) would be added to block requests from this IP, the rule could then be removed 30 minutes later or so.
Like I say, it's very crude, but if you have to do it at the server level, you don't really have many sensible options since it's not where it should be done. You are exactly correct in that the responsibility somewhat lies with the hosting provider.
Finally, you're right about the CAPTCHA, too. If anything, it could assist with a DoS by performing image generation (which could be resource intensive) over and over again, thus starving your resources even more. The time that a CAPTCHA would be effective though, would be if your site were to be spammed by automated registration bots, but I'm sure you knew that already.
If you really want to do something at application level just to please the powers that be, implementing something IP-based request restriction in your app is doable, albeit 90% ineffective (since you will still have to process the request).
You could implement the solution in the cloud and scale servers if you absolutely had to stay up, but it could get expensive...
Another idea would be to log the ip addresses of registered users. In the event of a DOS restrict all traffic to requests from 'good' users.
Preventing a true DoS attack on the application-level is not really doable, as the requests will most probably kill your webserver before it kills your application due to the fact that your application is associated with an application pool which again has a maximum of concurrent requests defined by the server technology you are using.
This interesting article
http://www.asp.net/web-forms/tutorials/aspnet-45/using-asynchronous-methods-in-aspnet-45
states that windows 7, windows Vista and Windows 8 have a maximum of 10 concurrent requests. It goes further with the statement that "You will need a Windows server operating system to see the benefits of asynchronous methods under high load".
You can increase the HTTP.sys queue limit of the application pool that is associated with your application in order to increase the amount of requests that will be queued (for later computation when threads are ready), which will prevent the HTTP Protocol Stack (HTTP.sys)
from returning Http error 503 when the limit is exceeded and no worker-process is available to handle further requests.
You mention that the customer requires you to "try [your] best to make it as resilient as possible to Denial of Service attacks".
My suggestion might not be an applicable measure in your situation, but you could look into implementing the Task-based Asynchronous Pattern (TAP) mentioned in the article in order to accommodate the customers requirement.
This pattern will release threads while long-lasting operations are performed and making the threads available for further requests (thus keeping your HTTP.sys queue lower) - while also giving your application the benefit of increased overall performance when multiple requests to third-party services or multiple intensive IO computations are performed.
This measure will NOT make your application resilient to DoS attacks, but it will make your application as responsible as possible on the hardware that it is served on.
I am experiencing the exact same issue as a user reports on eggheadcafe, but don't know what steps to take after reading the following answer.:
Two problems you should chase down:
1. Why is the website leaking resources to the finalizers. That is
bad
2. What is Oracle code waiting on -- work with Oracle's support on it
This is the issue:
I have an intermittent problem with a
web site hosted on IIS6 (w2k3 sp2).
I appears to occur randomly to users
when they click on a hyperlink within
a page. The request is sent to the
web server but a response is never
returned. If the user tries to
navigate to another hyperlink they are
not able to (i.e. the web site appears
to hang for that user). Other users
of the website at the time are not
affected by this hang and if the user
with the problem opens a new http
session (closing IE and opening the
web site again) they no longer
experience the hang.
I've placed a debugger (IISState) on
the w3wp process with the following
output. Entries with "Thread is
waiting for a lock to be released.
Looking for lock owner." look like
they might be causing the issue. Can
anyone tell what lock the process is
waiting on?
Thanks
http://www.eggheadcafe.com/software/aspnet/33799697/session-hangs.aspx
In my case my .Net C# MVC application runs against a MySQL database for data and a MS SQL database for .Net membership.
I hope someone with more knowledge of IIS can help resolve this problem.
It sounds like you have a race condition in your database calls resulting in a deadlock at the database level. You may want to look at the settings you have in your application pool for database connections. Likely you will need to put some checks in somewhere or redefine procedures in order to reduce the likelihood of the race:
http://msdn.microsoft.com/en-us/library/ms178104.aspx
I would explain the experienced hang due to session serialization. Not the part about saving/loading it from some source, but that ASP.NET does not allow the same session to execute two parallel pages simultaneously, unless they execute with a readonly-session. The later is done either in the page directive, or in web.config, by setting EnableSessionState="ReadOnly".
Your problem still exists, this wont change that the first thread hangs. I would verify that your database connections are disposed correctly. However, you never mention any Oracle database in your question (only Mysql and SQL Server). Why are you using the Oracle drivers at all? (This seems like a valid place to start debugging.)
However, as stated by David Wang in his answer in your linked question, part two of your problem is a lock that's never released. You'll need support from Oracle (or their source code) to debug this further.
IIS hang is not something surprising. IISState is out of date, and you may use Debug Diag,
http://support.microsoft.com/kb/919791 (if CPU usage is high)
http://support.microsoft.com/kb/919792 (otherwise)
The hang dumps should tell you what is the root cause.
Microsoft support can help analyze the dumps, if you are not familiar with the tricks. http://support.microsoft.com
Currently there is discussion as to what are the pros and cons of having a single sql connection architecture.
To elaborate what we are discussing is, at application creation open a sql connection and at application close or error closing the sql connection. And not creating another connection at all, but using just that one to talk with the DB.
We are wondering what the community thinks.
Close the connection as soon as you do not longer need it for an undefined amount of time. By doing so, the connection returns to the connection-pool (if connection pooling is enabled), and can be (re)used by someone else.
(Connections are expensive resources, and are sometimes limited).
If you keep hold on a connection for the entire lifetime of an application, and you have multiple users for that application (thus multiple instances of the app, and multiple connections), and if your DB server is limited to have only x number of concurrent connections, then you could have a problem ....
See also best practices for ado.net
Follow this simple rule... Open connection as late as possible and close it as soon as possible.
I think it's a bad idea, for several reasons.
If you have 10,000 users using your application, that's 10,000 connections open constantly
If you have to restart your Sql Server, all those 10,000 connections are invalidated and your application will suddenly - assuming you've included reconnect logic - be making 10000 near-simultaneous re-connect requests.
To expand on point 1, you should close connections as soon as you can because otherwise you're using up a finite resource for, potentially, an inifinite period of time. If you had Sql Server configured to allow a maximum of 10,001 simultaneous connections, then you can only have 10,001 users running your application at any one time. If you open/close connections on demand then your application will scale much further as the likelihood of all the active users making use of the database simultaneously is, realistically, low.
Under the covers, ADO.NET uses connection pooling to manage the connections to the database. I would suggest leaving it up to the connection pool to take care of your connection needs. Keeping a connection open for the duration of your application is a bad idea.
I use a helpdesk system called Richmond Systems that uses one connection for the life of the application, and as a laptop user, it is a royal pain in the behind. Even when I carry my laptop around open, the jumps between the wireless access points are enough to drop the DB conenction. The software then complains about the DB conenction, gets into an error state and won't close. It has to be killed manually from Task Manager.
In short, DON'T HOLD OPEN A DATABASE CONNECTION FOR LONGER THAN NECESSARY.
But on the flip side, I'd be cautious about opening and closing connections too often. This is a lot cheaper with connection pooling than without, but even with pooling, the pool manager may decide to grow or shrink the pool, turning it back into an expensive operation.
My general rule is to open a connection when the user initiates some action, do the work, then close the connection before waiting for the next user input. For any given "Update" button click or whatever, I'll generally have only one connection. But you definately do not want to keep connections open while waiting for user input if you can at all help it for all the reasons others have mentioned. You could literally wait for days before the user presses another key or touches another button -- what if he leaves his computer on and goes on vacation? Tying up a resource for unpredictable amounts of time like that is bad news. In most cases, the elapsed time waiting for user input will far exceed the time doing actual work.
we have a biztalk server (a virtual one (1!)...) at our company, and an sql server where the data is being kept.
Now we have a lot of data traffic. I'm talking about hundred of thousands. So I'm actually not even sure if one server is pretty safe, but our company is not that easy to convince.
Now recently we have a lot of problems.
Allow me to situate in detail, so I'm not missing anything:
Our server has 5 applications:
One with 3 orchestrations, 12 send ports, 16 receive locations.
One with 4 orchestrations, 32 send ports, 20 receive locations.
One with 4 orchestrations, 24 send ports, 20 receive locations.
One with 47 (yes 47) orchestrations, 37 send ports, 6 receive locations.
One with common application with a couple of resources.
Our problems have occured since we deployed the applications with the 47 orchestrations.
A lot of these orchestrations use assign shapes which use c# code to do the mapping. This is because we use HL7 extensions and this is kind of special, so by using c# code & xpath it was a lot easier to do the mapping because a lot of these schema's look alike. The c# reads in XmlNodes received through xpath, and returns XmlNode which are then assigned again to biztalk messages. I'm not sure if this could be the cause, but I thought I'd mention it.
The send and receive ports have a lot of different types: File, MQSeries, SQL, MLLP, FTP.
Each of these types have a different host instances, to balance out the load.
Our orchestrations use the BiztalkApplication host.
On this server also a couple of scripts are running, mostly ftp upload scripts & also a zipper script, which zips files every half an hour in a daily zip and deletes the zip files after a month. We use this zipscript on our backup files (we backup a lot, backups are also on our server), we did this because the server had problems with sending files to a location where there were a lot (A LOT) of files, so after the files were reduced to zips it went better.
Now the problems we are having recently are mainly two major problems:
Our most important problem is the following. We kept a receive location with a lot of messages on a queue for testing. After we start this receive location which uses the 47 orchestrations, the running service instances start to sky rock. Ok, this is pretty normal. Let's say about 10000, and then we stop the receive location to see how biztalk handles these 10000 instances. Normally they would go down pretty fast, and it does sometimes, but after a while it starts to "throttle", meaning they just stop being processed and the service instances stay at the same number, for example in 30 seconds it goes down from 10000 to 4000 and then it stays at 4000 and it lowers very very very slowly, like 30 in 5minutes or something. So this means, that all the other service instances of the other applications are also stuck in here, and they are also not processed.
We noticed that after restarting our host instances the instance number went down fast again. So we tried to selectively restart different host instances to locate the problem. We noticed that eventually restarting the file send/receive host instance would do the trick. So we thought file sends would be the problem. Concidering that we make a lot of backups. So we replaced the file type backups with mqseries backups. The same problem occured, and funny thing, restarting the file send/receive host still fixes the problem.
No errors can be found in the event viewer either.
A second problem we're having is. That sometimes at arround 6 am, all or a part of the host instances are being stopped.
In the event viewer we noticed the following errors (these are more than one):
The receive location "MdnBericht SQL" with URL "SQL://ZNACDBPEG/mdnd0001/" is shutting down. Details:"The error threshold has been exceeded. The receive location is shutting down.".
The Messaging Engine failed to add a receive location "M2m Othello Export Start Bestand" with URL "\m2mservices\Othello_import$\DataFilter Start*.xml" to the adapter "FILE". Reason: "The FILE adapter cannot access the folder \m2mservices\Othello_import$\DataFilter Start.
Verify this folder exists.
Error: Logon failure: unknown user name or bad password.
".
The FILE adapter cannot access the folder \m2mservices\Othello_import$\DataFilter Start.
Verify this folder exists.
Error: Logon failure: unknown user name or bad password.
An attempt to connect to "BizTalkMsgBoxDb" SQL Server database on server "ZNACDBBTS" failed.
Error: "Login failed for user ''. The user is not associated with a trusted SQL Server connection."
It woould seem that there's a login failure at this time and that because of it other services are also experiencing problems, and eventually they are shut down.
The thing is, our user is admin, and it's impossible that it's password is wrong "sometimes". We have concidering that the problem could be due to an infrastructure problem, but that's not really are department.
I know it's a long post, but we're not sure anymore what to do. Would adding another server and balancing the load solve our problems? Is there a way to meassure our balance and know where to start splitting? What are normal numbers of load etc?
I appreciate any answers because these issues are getting worse and we're also on a deadline.
Thanks a lot for replies!
Your immediate problem is BizTalk throttling feature. It's supposed to help BizTalk survive temporary overload conditions. One of its many problems is that you can see the throttling kick-in only in the performance monitor and not in the event log.
What you should do:
Separate the new application to a different host than the rest of the applications. Throttling is done in the host level. So the problematic application wont affect the rest of the applications.
Read about how to disable throttling in the link above.
What we have done is implementing an external throttling service. That feed the BizTalk receive location in small digestible packets. Its ugly, but the problem is ugly.
Update to comment: You have enough host instances. So Ignore that advice. You may reorder the applications between the instances. But there are no clear guidelines to do that. So its just shuffling and guessing.
About the safeness of disabling throttling. This feature doesn't make much sense in many scenarios. You have to study it. Check which of the throttling parameters you are hitting (this can be seen in the performance monitor) and decide how to change the thresholds.
How many host instances do you have?
From the line:
The send and receive ports have a lot
of different types: File, MQSeries,
SQL, MLLP, FTP. Each of these types
have a different host instances, to
balance out the load. Our
orchestrations use the
BiztalkApplication host
It sounds like you have a lot - I recently did an audit of a system where BizTalk was self throttling and the issue was in part due to too many host instances. Each host instance places its own load upon the BizTalk messagebox, as well as chewing up a minimum of 200mb memory.
Reading your comment, you have 20 - this is too many and would be a big part of your problems.
A good starting host setup would be:
A dedicated tracking host
One host that contains all receive handlers for adapters
One host that contains all orchestrations
One host that contains all send handlers for adapters
One host for adapters that need to be clustered (like FTP and MSMQ)
You can then also consider things like introducing "real time" hosts and batched hosts, so you can tune the real time hosts for low latency.
You can also have hosts for specific applications if there are known to be unstable, but in general this should not be done.
I run a BizTalk system that has similar problems and can empathize with what you are seeing. I don't know if it's the same, but I thought I'd share my experience in case.
In the same manner restarting the send/receive seems to fix the problem. In my case I found a direct correlation to memory usage by the host processes. I used performance counters to see when a given host was throttled for memory. By creating extra hosts, and moving orchestrations and ports between them I was able to narrow down which business sets were causing the problem. Basically in my case restarting the hosts was the equivalent to the ultimate "garbage collection" to free up memory. This was of course until enough instances came through to gobble it up again.
I'm afraid I have not solved the issue yet, but a few things I found to alleviate the issue:
Raise the memory to a given process so that throttling does not occur or occurs later
Each host instance, while informative, does have an overhead that is added. Try combining hosts that are not your problem children together to reduce the memory foot print.
Throw hardware at the problem, ram is cheap
I measure the following every few minutes in perfmon so I can diagnose where the problem is:
BizTalk:MessageAgent(*)\Process memory usage (MB)
BizTalk:MessageAgent(*)\Process memory usage threshold
Memory\Available MBytes
A few other things to take a look at. Make sure any custom pipelines use good BizTalk memory practices (i.e. no XML DOM manipulation hiding somewhere, etc). Also theoretically reducing the number of threads for a given host should lower the amount of memory it can seize at one time. I did not seem to have much luck with this one. Maybe the BizTalk throttling overrode it as others have mentioned, I don't know. Also, on a final note, if you dump the perfmon results to a csv, with Excel you can make some pretty memory usage graphs. These might be useful for talking to management about buying more hardware. That's assuming your issue fits this scenario as well.
We fixed the problem temporarily due to a combination of all ur answers.
We set the process memory usage throttling parameters of some hosts higher.
We divided the balance of the host instances better after I analyzed all the memory usage of all hosts, thanks to performance counters and also with the use of a tool called MsgBoxViewer.
And now we're trying to get more physical memory & hopefully also an extra server or a 64bit server.
Thanks for all replies!
We recently installed a 64-bit server in cluster with our older server. Thanks to this we can balance the memory even better which solved a lot of problems.
Although the 64-bit didn't give us much improvements (except for a bit more memory) since it can't use 64-bits on IBM MQ's, MLLP's, HL7 pipelines etc...
The other answers are helpful for run-time performance tuning, but i would recommend a design change as well.
You say that you do a lot of message manipulation in the orchestration in the message assignment shapes.
I would recommend moving that code to dedicated transforms. They are much more light weight, and can be executed faster. You can combine custom xslt and c# in these maps to do the hard work. Orchestrations cost more in development, design and testing, and a whole lot more in run-time performance.
You can then use transforms for message transformation, and leave the orchestrating (what is left of it after moving the message assignment code) to the orchestrations.
The added benefit of using transforms over orchestrations is that they are much more testable.