I work on a very large, high-traffic ecommerce website. We're currently migrating the site from ColdFusion to .NET, and we've recently run into an issue during this conversion that I was hoping for a little help with. The site is currently about 1/3 .NET and 2/3 ColdFusion.
The problem is that since we released our latest project, which converts the My Account section, everything is fine for a while, but anywhere from 3 to 24 hours later the website just crashes. To get it back up, we need to restart IIS and sometimes ColdFusion. By "crashes" I mean it just hangs, sits there and spins forever.
We have really good server monitoring, but when we look at the services' memory nothing looks unusual except for the number of connections to SQL. Shortly before each crash, the number of SQL connections shoots up from around 24 to around 100 and just sits there; the site stays down until we restart services.
We currently use SQL Server 2005, Entity Framework as our data access method, and IIS 7.5. Our web server is virtual but our database server is physical.
We've had multiple people on our team go through all of the code in this new project to confirm that no connections were being left open, since based on the connection counts that's how it seems. We couldn't find any connections left open, not one.
This is an example of our current Entity Framework data access:
/// <summary>
/// Get Products by their Primary Category ID. Default Category ID is 0: Top Level Categories.
/// </summary>
/// <param name="languageCode">Two-character language code of the Categories being searched. Defined in dbo.Languages, LanguageCode field.</param>
/// <param name="primaryCategoryId">int - Primary Category ID</param>
/// <returns>List&lt;Product&gt;</returns>
public List<Products.Product> GetProducts(string languageCode, int primaryCategoryId = 0)
{
    CatalogEntity context = null;
    EntityConnection conn = null;
    try
    {
        conn = this.GetConnection();
        context = new CatalogEntity(conn);
        List<I_Products> Products = context.GetProductsByPrimaryCatId(primaryCategoryId, languageCode).Distinct().ToList();
        return Products.Select(Product => new Products.Product(Product)).Distinct().ToList();
    }
    catch (System.Exception ex)
    {
        string message = "Error occurred while calling GetProducts.";
        throw new Exception.CatalogDataException(message, CodeLibrary.Core.Helpers.ProcessHelper.GetProcessName(this), ex);
    }
    finally
    {
        if (conn != null && conn.State == ConnectionState.Open) conn.Close();
        if (context != null) context.Dispose();
        if (conn != null) conn.Dispose(); // null check added: GetConnection() may have thrown before conn was assigned
    }
}
Again, this is just one example of one of our data access methods in C#. Do you see any issues with this? We use this format across the board; we've confirmed this.
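For what it's worth, the same method can be collapsed into using blocks, which guarantee disposal even if an exception is thrown partway through. This is only a sketch, assuming the same CatalogEntity, GetConnection, and I_Products members as above:

public List<Products.Product> GetProducts(string languageCode, int primaryCategoryId = 0)
{
    try
    {
        // using disposes both objects even on exceptions; Dispose() closes
        // the underlying connection if it is still open.
        using (EntityConnection conn = this.GetConnection())
        using (CatalogEntity context = new CatalogEntity(conn))
        {
            List<I_Products> products = context.GetProductsByPrimaryCatId(primaryCategoryId, languageCode).Distinct().ToList();
            return products.Select(p => new Products.Product(p)).Distinct().ToList();
        }
    }
    catch (System.Exception ex)
    {
        string message = "Error occurred while calling GetProducts.";
        throw new Exception.CatalogDataException(message, CodeLibrary.Core.Helpers.ProcessHelper.GetProcessName(this), ex);
    }
}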
With the new .NET project we use the .NET membership provider. We use a CLR routine to hash users' passwords so that we can use the same hash method from CF. Not sure if this is the issue, but I thought it was worth mentioning.
Any ideas?
There is a list of possibilities here. For example, when a call to SQL Server fails to return data to CF, CF can hang onto the thread; it becomes a sort of "phantom thread." CF then creates new connections to the DB server and adds them to the connection pool, resulting in the many extra connections you are seeing. Each one is counted against the "simultaneous requests" setting in the CF admin, so when enough of them are hanging, your requests queue and your server locks up even though it doesn't appear anything is going on. You can see this behavior by enabling metrics, by using the server monitor (if on the Enterprise version), or by using FusionReactor (an excellent and inexpensive 3rd-party introspection monitor for your CF/Java server).
Of course that is what is happening. You have to find out why it's happening. Among the possibilities are:
Networking - sometimes auto-sync on ports to your switches can interrupt connections and result in hanging "phantom" threads. See this post on hanging JRun and networking.
Database locking - this can produce issues like this and may be occurring even if you think you are not seeing it; it's sometimes tricky to catch. One particular locking issue that can be troublesome is "max degree of parallelism", which can result in fairly idle-looking DB connections that are nonetheless hanging.
You will probably need to get a bit more information on the CF side of things to know exactly what is going on here.
Follow-up: I'm providing some possibilities from the CF side even though your question was asked from the .NET side. I'm assuming CF could be in play since restarting CF sometimes fixes the issue.
This is not really a problem per se, but I have a requirement at work whereby I need to write a Windows application in C# that will monitor all our internal and external systems. Some of these systems are websites and some are Windows applications that constantly poll data to and from the database, including SOAP API calls. My application needs to monitor these systems and notify the relevant users whenever downtime occurs, and report how long each system has been offline.
I have done the database design using SQL Server as a DBMS but I'm stuck in terms of implementation. What approach can I use to achieve this? TCP/IP?
This application should run every x seconds.
I have created a few flags inside an enum that will be used to constantly check whether the application state is OK, in an erroneous state, or should warn the user. In addition, I have created a constructor that initializes all the components of the service monitor through a DLL.
Something like so:
[Flags]
public enum ClientApplicationState
{
    ERROR = 0,
    WARNING = 1,
    OK = 2
}

/// <summary>
/// Constructor which sets up and initializes all the components for the Service Monitoring DLL
/// </summary>
/// <param name="applicationName">The name of the application</param>
/// <param name="port">The port on which to listen</param>
/// <param name="timerPeriod">Optional time in milliseconds to override the default update frequency (default is 1000)</param>
public ServiceMonitor(string applicationName, short port, int timerPeriod = 1000)
{
    _messages = new List<string>();
    _state = ClientApplicationState.OK;
    //TODO: Throw port exception
    try
    {
        _monClient = new MonitoringClient(applicationName, ClientApplicationState.OK, "Starting Up",
            Process.GetCurrentProcess().ProcessName, port);
    }
    catch (Exception e)
    {
        // TODO: swallowing the startup exception hides failures; log or rethrow here
    }
    InitTimer(timerPeriod);
    _updateTimer.Enabled = true;
}
The websites are hosted and live, and the Windows applications are running on our Windows Server 2012 R2 server.
How can I approach this?
Your questions are far too general to answer. But I think you need to really look at the "things" you intend to monitor. And then you need to understand services (if that is the approach you want to take - which seems appropriate).
A service is started and runs continuously (logically). Your idea of running "every x seconds" is a bit misleading. You may want your service to "check" every x seconds, but don't confuse that idea with the application executing. And, of course, you should be able to configure that frequency - another feature you should plan for.
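To make that distinction concrete, here is a minimal sketch of a Windows service whose check fires on a configurable interval; the service runs continuously, and only the check is periodic. The OnCheck body and the "CheckIntervalSeconds" appSettings key are hypothetical placeholders:

using System;
using System.Configuration;
using System.ServiceProcess;
using System.Timers;

public class MonitorService : ServiceBase
{
    private Timer _checkTimer;

    protected override void OnStart(string[] args)
    {
        // The check frequency is configuration, not code: read it at startup.
        double seconds = double.Parse(ConfigurationManager.AppSettings["CheckIntervalSeconds"] ?? "30");
        _checkTimer = new Timer(seconds * 1000);
        _checkTimer.Elapsed += OnCheck;
        _checkTimer.Start();
    }

    protected override void OnStop()
    {
        _checkTimer.Stop();
        _checkTimer.Dispose();
    }

    private void OnCheck(object sender, ElapsedEventArgs e)
    {
        // Hypothetical: ping each monitored website/application here and
        // notify users when something has been down longer than a threshold.
    }
}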
To be honest, I don't really see the purpose of a database yet. And based on what I read, I suggest you stop thinking about the implementation and start thinking about this service as a whole. Where you store data is irrelevant at this point in time. You need to decide what the service does (and "monitor for downtime" is a very premature requirement), for what "things" it does this service, and how it will respond. What happens when your service stops? Do you monitor the monitor? Who will install/configure it? Will it provide any information while running? And so on.
And before you go further, it seems like you need to write what is a rather complex service/system. Something MS (and others) have already done - like the MS Operations Manager https://technet.microsoft.com/library/hh230741.aspx. If you're going to reinvent the wheel (at least partially), then it may help you to look at the documentation to see what it does, how it is configured, the concepts behind it, etc.
I have a console batch application which includes a process that uses SqlDataAdapter.Fill(DataTable) to perform a simple SELECT on a table.
private DataTable getMyTable(string conStr)
{
    DataTable tb = new DataTable();
    StringBuilder bSql = new StringBuilder();
    bSql.AppendLine("SELECT * FROM MyDB.dbo.MyTable");
    bSql.AppendLine("WHERE LEN(IdString) > 0");
    try
    {
        string connStr = ConfigurationManager.ConnectionStrings[conStr].ConnectionString;
        using (SqlConnection conn = new SqlConnection(connStr))
        {
            conn.Open();
            using (SqlDataAdapter adpt = new SqlDataAdapter(bSql.ToString(), conn))
            {
                adpt.Fill(tb);
            }
        }
        return tb;
    }
    catch (SqlException)
    {
        throw; // "throw;" rather than "throw sx;" preserves the original stack trace
    }
    catch (Exception)
    {
        throw;
    }
}
This method is executed synchronously, and ran successfully in several test environments over many months of testing -- both when started from the command line and when started under control of an AutoSys job.
When moved into production, however, the process hung -- at the Fill method, as near as we can tell. Worse, instead of timing out, it apparently started spawning new request threads, and after a couple of hours had consumed more than 5 GB of memory on the application server. This affected other active applications, making me very unpopular. There was no exception thrown.
The Connection String is about as plain-vanilla as they come.
"data source=SERVER\INSTANCE;initial catalog=MyDB;integrated security=True;"
Apologies if I use the wrong terms regarding what the SQL DBA reported below, but when we had a trace put on the SQL Server, it showed the Application ID (under which the AutoSys job was running) being accepted as a valid login. The server then appeared to process the SELECT query. However, it never returned a response. Instead, it went into an "awaiting command" status. The request thread appeared to remain open for a few minutes, then disappeared.
The DBA said there was no sign of a deadlock, but that he would need to monitor in real time to determine whether there was blocking.
This only occurs in the production environment; in test environments, the SQL Servers always responded in under a second.
The AutoSys Application ID is not a new one -- it's been used for several years with other SQL Servers and had no issues. The DBA even ran the SELECT query manually on the production SQL server logged in as that ID, and it responded normally.
We've been unable to reproduce the problem in any non-production environment, and hesitate to run it in production without a server admin standing by to kill the process. Our security requirements limit my access to view server logs and processes, and I usually have to engage another specialist to look at them for me.
We need to solve this problem sooner or later. The amount of data we're looking at is currently only a few rows, but will increase over the next few months. From what's happening, my best guess is that it involves communication and/or security between the application server and the SQL server.
Any additional ideas or items to investigate are welcome. Thanks everyone.
This may be tied to permissions. SQL Server does some odd things instead of giving a proper error message sometimes.
My suggestion, and this might improve performance anyway, is to write a stored procedure on the server side that executes the select, and call the stored procedure. This way, the DBA can ensure you have proper access to the stored procedure without allowing direct access to the table if for some reason that's being blocked, plus you should see a slight performance boost.
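A minimal sketch of that approach, with dbo.GetMyTableRows as a hypothetical procedure name wrapping the same SELECT (requires System.Data and System.Data.SqlClient):

// Hypothetical stored procedure on the server side:
//   CREATE PROCEDURE dbo.GetMyTableRows AS
//       SELECT * FROM dbo.MyTable WHERE LEN(IdString) > 0;
private DataTable getMyTableViaProc(string connStr)
{
    DataTable tb = new DataTable();
    using (SqlConnection conn = new SqlConnection(connStr))
    using (SqlCommand cmd = new SqlCommand("dbo.GetMyTableRows", conn))
    {
        cmd.CommandType = CommandType.StoredProcedure; // call the proc rather than inline SQL
        using (SqlDataAdapter adpt = new SqlDataAdapter(cmd))
        {
            adpt.Fill(tb); // Fill opens and closes the connection as needed
        }
    }
    return tb;
}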
Though it may be caused by some strange permissions/ADO.NET issue as mentioned by @user1895086, I'd nonetheless recommend rechecking a few things one more time:
Ensure that the queries run manually by the DBA and those executed by your app are exactly the same - either hardcode the query or at least log it just before running. It is better to be safe than sorry.
Try to select only a few rows - it is always a good idea not to select the entire table if you can avoid it, and in your case a SELECT TOP 1 (or 100) query may not exhibit such problems. Perhaps there is just much more data than you think and ADO.NET dutifully tries to load all those rows. Or perhaps not.
Try a SqlDataReader to be sure that SqlDataAdapter is not causing the issue - yes, SqlDataAdapter uses a DataReader internally, but this would at least exclude the adapter's additional operations from the list of suspects (see the sketch after this list).
Try to get your hands on a dump of those 5 GB of memory - analyzing memory dumps is not a trivial task, but it won't be too difficult to understand what is eating those hefty chunks of memory. I somehow doubt that ADO.NET would just spawn a lot of additional objects for no reason.
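For the SqlDataReader suggestion, a minimal sketch of the same query from the question (requires System.Data and System.Data.SqlClient):

private DataTable getMyTableViaReader(string connStr)
{
    DataTable tb = new DataTable();
    using (SqlConnection conn = new SqlConnection(connStr))
    using (SqlCommand cmd = new SqlCommand(
        "SELECT * FROM MyDB.dbo.MyTable WHERE LEN(IdString) > 0", conn))
    {
        conn.Open();
        using (SqlDataReader reader = cmd.ExecuteReader())
        {
            tb.Load(reader); // streams rows without the adapter's extra machinery
        }
    }
    return tb;
}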
I recently monitored my SQL database activity and found about 400 processes in Activity Monitor. I later figured out that the problem is with my connection string object: the underlying connections are not physically cleared even though I completely close and dispose the object, so once I suspend my IIS, all the processes disappear from Activity Monitor.
After a little searching I found that I can clear all of my connections from the connection pool, so that all the useless processes visible in SSMS would be killed, but
I'm really concerned about the impact on the web server. It's true that this approach would clear the useless tasks from SSMS, but then a new connection would really have to be created for every request. Is it worth it?
Considering my application is an enterprise app which is supposed to handle many requests, I'm afraid of bringing the IIS server down by using this approach.
Do notice that my connection string value is not completely fixed across requests: I vary only the "Application Name" section of it on every request, according to the request parameters, so that I can see the requestor's information in SQL Activity Monitor and SQL Profiler.
Is it worth doing this given my business scope, or would it be better to fix the connection string value? In other words, is the performance lag of this approach so severe that I have to change my logging strategy, or is it just a little slower?
Do notice that my connection string value is not completely fixed across requests: I vary only the "Application Name" section of it on every request, according to the request parameters, so that I can see the requestor's information in SQL Activity Monitor and SQL Profiler.
This is really bad because it kills pooling. You might as well disable pooling, but that comes with a heavy performance penalty (which you are effectively paying already).
Don't do that. Obtain monitoring information in a different way.
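One alternative, sketched under the assumption that you are on SQL Server 2016 or later (older versions have SET CONTEXT_INFO instead): keep the connection string constant so the pool is shared, and tag each request inside the session. fixedConnStr and requestInfo are hypothetical names, and the pool's connection reset should clear the tag between uses:

using (SqlConnection conn = new SqlConnection(fixedConnStr)) // fixedConnStr never varies
{
    conn.Open();
    using (SqlCommand cmd = new SqlCommand(
        "EXEC sp_set_session_context @key = N'RequestInfo', @value = @info;", conn))
    {
        cmd.Parameters.AddWithValue("@info", requestInfo); // hypothetical per-request tag
        cmd.ExecuteNonQuery();
    }
    // ...run the request's real queries on the same connection...
    // On the server, SELECT SESSION_CONTEXT(N'RequestInfo') recovers the tag.
}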
Besides that, neither SQL Server nor .NET has a problem with 400 connections. That's unusually high, but it will not cause problems.
If you run multiple instances of the app (e.g. for HA), this will multiply. The limit is around 30k. I'm not aware of any reason why this would cause a slowdown for the app, but it might cause problems for your monitoring tools.
I am experiencing the exact same issue as a user reported on eggheadcafe, but I don't know what steps to take after reading the following answer:
Two problems you should chase down:
1. Why is the website leaking resources to the finalizers? That is bad.
2. What is the Oracle code waiting on? Work with Oracle's support on it.
This is the issue:
I have an intermittent problem with a web site hosted on IIS6 (W2K3 SP2). It appears to occur randomly to users when they click on a hyperlink within a page. The request is sent to the web server but a response is never returned. If the user tries to navigate to another hyperlink they are not able to (i.e. the web site appears to hang for that user). Other users of the website at the time are not affected by this hang, and if the user with the problem opens a new HTTP session (closing IE and opening the web site again) they no longer experience the hang.
I've placed a debugger (IISState) on the w3wp process with the following output. Entries with "Thread is waiting for a lock to be released. Looking for lock owner." look like they might be causing the issue. Can anyone tell what lock the process is waiting on?
Thanks
http://www.eggheadcafe.com/software/aspnet/33799697/session-hangs.aspx
In my case my .NET C# MVC application runs against a MySQL database for data and an MS SQL database for .NET membership.
I hope someone with more knowledge of IIS can help resolve this problem.
It sounds like you have a race condition in your database calls resulting in a deadlock at the database level. You may want to look at the settings you have in your application pool for database connections. Likely you will need to put some checks in somewhere or redefine procedures in order to reduce the likelihood of the race:
http://msdn.microsoft.com/en-us/library/ms178104.aspx
I would explain the experienced hang by session serialization. Not the part about saving/loading it from some source, but the fact that ASP.NET does not allow the same session to execute two pages in parallel, unless they execute with a read-only session. The latter is done either in the page directive or in web.config, by setting EnableSessionState="ReadOnly".
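For reference, the two places that setting can live (a sketch; only apply it to pages that never write to the session):

<%@ Page Language="C#" EnableSessionState="ReadOnly" %>

or, site-wide in web.config:

<configuration>
  <system.web>
    <pages enableSessionState="ReadOnly" />
  </system.web>
</configuration>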
Your problem still exists, though; this won't change the fact that the first thread hangs. I would verify that your database connections are disposed correctly. However, you never mention any Oracle database in your question (only MySQL and SQL Server). Why are you using the Oracle drivers at all? (This seems like a valid place to start debugging.)
However, as stated by David Wang in his answer in your linked question, part two of your problem is a lock that's never released. You'll need support from Oracle (or their source code) to debug this further.
An IIS hang is not something surprising. IISState is out of date; you may use Debug Diag instead:
http://support.microsoft.com/kb/919791 (if CPU usage is high)
http://support.microsoft.com/kb/919792 (otherwise)
The hang dumps should tell you what is the root cause.
Microsoft support can help analyze the dumps, if you are not familiar with the tricks. http://support.microsoft.com
We currently have a little situation on our hands - it seems that someone, somewhere forgot to close a connection in code. The result is that the pool of connections is exhausted relatively quickly. As a temporary patch we added Max Pool Size = 500; to our connection string on the web service, and we recycle the pool when all connections are spent, until we figure this out.
So far we have done this:
SELECT SPId
FROM MASTER..SysProcesses
WHERE DBId = DB_ID('MyDb') and last_batch < DATEADD(MINUTE, -15, GETDATE())
to get the SPIDs that haven't been used for 15 minutes. We're now trying to get the query that was last executed on a given SPID with:
DBCC INPUTBUFFER(61)
but the queries displayed vary, meaning either something at the base level of connection manipulation is broken, or our deduction is erroneous...
Is there an error in our thinking here? Do DBCC INPUTBUFFER and sysprocesses give the results we're expecting, or is there some side-effect catch (for example, do pooled connections influence the output)?
(please, stick to what we could find out using SQL since the guys that did the code are many and not all present right now)
I would expect that there is a myriad of different queries 'remembered' by the input buffer - depending on the timing of your failure and the variety of queries you run, it seems unlikely that you'd see consistent queries this way. Recall that the connections will eventually be closed, but only when they're GC'd and finalized.
As Mitch suggests, you need to scour your source for connection opens and ensure they're localized and wrapped in a using(). Also look for possibly long-lived objects that might be holding on to connections. In an early version of our catalog, ASP page objects held connections that weren't managed properly.
To narrow it down, can you monitor connection-counts (perfmon) as you focus on specific portions of your app? Does it happen more in CRUD areas vs. reporting or other queries? That might help narrow down the source-scour you need to do.
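If perfmon access is awkward, the same ADO.NET pool counters can be read in code. A minimal sketch, assuming the default provider counters are enabled; instance names vary per process, so they are enumerated rather than guessed:

using System;
using System.Diagnostics;

class PoolCounterDump
{
    static void Main()
    {
        // Default ADO.NET counter category for System.Data.SqlClient.
        var category = new PerformanceCounterCategory(".NET Data Provider for SqlServer");
        foreach (string instance in category.GetInstanceNames())
        {
            using (var counter = new PerformanceCounter(
                category.CategoryName, "NumberOfPooledConnections", instance, true))
            {
                Console.WriteLine("{0}: {1} pooled connections", instance, counter.NextValue());
            }
        }
    }
}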
Are you able to change the connection strings to contain information about where and why the connection was created in the Application Name field?