Linq to SQL connections - c#

I'm using Linq to SQL for a fairly complicated site, and since go-live we've had a number of database timeouts. The first thing I noticed was that there is a fairly large number of connections to the database.
Coming from an ADO.net background we used to code it so that any site would only use one or two pooled connections, and this resulted in acceptable performance even with a fair few concurrent users.
So my question is: was this old way of doing it flawed, OR is there a way to do the same in LINQ? It seems our performance issues are being caused by the sheer number of connections to the DB, but if this were a common problem I'd have expected it to be mentioned in all the LINQ tutorials.
Any suggestions?

I'm guessing that you are keeping DataContexts around and not calling Dispose on them when done (or leaving them around, at least).
Rather, you should initialize your DataContext, perform your operation, and then dispose of it when done. You shouldn't hold a reference to it between operations.
Preferably, you would use the using statement to handle the call to Dispose.
Regarding connection pooling, the SqlClient pools connections by default, so unless you explicitly turn it off, you should be taking advantage of it already. Of course, if you aren't releasing connections you are using, then pooling will only take you so far.
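To make the create-use-dispose pattern concrete, here's a minimal, self-contained sketch. A stand-in class is used instead of a real DataContext (which would need a database), but the lifecycle is the same: one context per operation, disposed deterministically by using.

```csharp
using System;

// Stand-in for a DataContext: any IDisposable resource works for the demo.
class FakeDataContext : IDisposable
{
    public static int LiveInstances;            // "connections" currently held
    public FakeDataContext()  { LiveInstances++; }
    public void Dispose()     { LiveInstances--; }

    public int RunQuery() { return 42; }        // pretend database operation
}

class Program
{
    static void Main()
    {
        // One context per operation: create, use, dispose.
        using (var dc = new FakeDataContext())
        {
            Console.WriteLine(dc.RunQuery());
        } // Dispose runs here, even if RunQuery had thrown

        Console.WriteLine(FakeDataContext.LiveInstances); // 0 - nothing leaked
    }
}
```

The key point is the closing brace of the using block: the underlying connection is released there, immediately, rather than whenever the GC eventually notices the object.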


Desktop C# SQL Server (LocalDB) database access patterns

I'm coming from a Native C++ / PHP / MySQL / SQLite background.
Spending this weekend learning C# / WinForms / SQL Server / ASP.NET. And it all seems to work differently. Especially considering I no longer know exactly what happens under the hood, where I can optimize things and so on.
Needing to work with SQL Server (LocalDB), I think I noticed a weird database access pattern in most of the online examples and video tutorials I've read (I got 2 books from Amazon but they arrive next week, so currently, to my shame, I'm learning the basics online).
Every time they access the Database in those examples, they open and close a SqlConnection for each query.
using (var sql = new SqlConnection(connectionString))
{
    sql.Open();
    // do SQL stuff here
}
For a C++ guy, this is making me very nervous:
What's the overhead of open/close connections all the time when I need to do a query?
Why not open an object and reuse it when required?
Can anyone tell me if this is a performance-friendly DB access pattern in desktop C#, or should I go with Plan B? The end result will be a C# Windows service featuring an IOCP server (which I've figured out already) that should deal with up to 1,000 connections. It won't be very data intensive, but even with 100 clients, SQL open/close overhead, if any, can add up quickly.
I also noticed MultipleActiveResultSets=True; that should make this especially friendly for multiple-reads. So, I would imagine a single connection for the entire application's read-access & short-write with MARS should do the trick?! And dedicated connections for larger INSERT/UPDATE.
Plan B: I've initially thought about creating a connection pool for short reading / writing operations. And another one for longer read/write operations. And looping through it myself... Or maybe one connection per client but I'm not sure that won't be quite abusive.
Actually, there is very little performance issue here, and the small amount of overhead is made up for by a huge increase in maintainability.
First, SqlConnection uses ADO.NET connection pooling by default. So connections are not actually opened and closed to the server. Instead, internally ADO.NET has a pool of connections that it reuses when appropriate, grouping them by ConnectionString. It's very good at managing these resources, so long as you are good about cleaning up your objects when you are done with them.
This is part of what makes this work. By closing the connection, you are telling the connection pool that the connection can be reused by a different SqlConnection, so in effect, what you view as a performance problem is actually a performance optimization.
Coming from native programming, the first thing you have to learn about writing code in a managed world is that you HAVE to release your resources, because otherwise the garbage collector won't be able to efficiently clean them up. I know your first impulse is to try and manage the lifetimes yourself, but you should only do this when it is absolutely necessary.
The second thing to learn is to stop "getting nervous" about things you view as potential performance issues. Only optimize when you KNOW them to be a problem (i.e., you have used a profiler and found that the normal way isn't as efficient as you would like it to be).
As always, read the documentation:
http://msdn.microsoft.com/en-us/library/8xx3tyca(v=vs.110).aspx
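To see why per-query Open/Close is cheap, here's a toy model of what the pool does (this is an illustration, not the real ADO.NET implementation): "closing" a pooled connection just returns it to the pool, so the expensive server handshake is only paid when the pool is cold.

```csharp
using System;
using System.Collections.Generic;

// Toy illustration of ADO.NET's internal behavior: Close() returns the
// connection to a pool instead of tearing down the physical connection.
class PooledConnection
{
    public static int PhysicalOpens;           // real server handshakes paid
    static readonly Stack<PooledConnection> Pool = new Stack<PooledConnection>();

    public static PooledConnection Open()
    {
        if (Pool.Count > 0) return Pool.Pop(); // reuse: no new handshake
        PhysicalOpens++;                       // only on a cold pool
        return new PooledConnection();
    }

    public void Close() => Pool.Push(this);    // return to pool, stay open
}

class Program
{
    static void Main()
    {
        // 100 "open/close" cycles, but only one physical connection is made.
        for (int i = 0; i < 100; i++)
        {
            var conn = PooledConnection.Open();
            // ... run one query ...
            conn.Close();
        }
        Console.WriteLine(PooledConnection.PhysicalOpens); // 1
    }
}
```

This is why the using-per-query pattern is the recommended one: you get short, deterministic checkout times from the pool, and the pool absorbs the cost of the physical connections.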

SQL Performance, .Net Optimizations vs Best Practices

I need confirmation/explanation from you pros/gurus on the following, because my team is telling me "it doesn't matter" and it's frustrating me :)
Background: We have a SQL Server 2008 that is being used by our main MVC3 / .Net4 web app. We have about 200+ concurrent users at any given point. The server is being hit EXTREMELY hard (locks, timeouts, overall slowness) and I'm trying to apply things I learned throughout my career and at my last MS certification class. They are things we've all been drilled on ("close SQL connections STAT") and I'm trying to explain to my team that these "little things", though none of them alone makes a difference, add up in the end.
I need to know if the following do have a performance impact or if it's just 'best practice'
1. Using "USING" keyword.
Most of their code is like this:
public string SomeMethod(string x, string y) {
    SomethingDataContext dc = new SomethingDataContext();
    var result = dc.StoredProcedure(x, y);
    return result;
}
While I'm trying to tell them that USING closes/frees up resources faster:
using (SomethingDataContext dc = new SomethingDataContext()) {
    var result = dc.StoredProcedure(x, y);
    return result;
}
Their argument is that the GC does a good enough job cleaning up after the code is done executing, so USING doesn't have a huge impact. True or false and why?
2. Connection Pools
I always heard setting up connection pools can significantly speed up any website (at least .Net w/ MSSQL).
I recommended we add the following to our connectionstrings in the web.config:
..."Pooling=True;Min Pool Size=3;Max Pool Size=100;Connection Timeout=10;"...
Their argument is that .Net/MSSQL already sets up the connection pools behind the scenes and it's not necessary to put this in our web.config. True or false? And why does every other site say pooling should be added for optimal performance if it's already set up?
3. Minimize # of calls to DB
The Role/Membership provider that comes with the default .NET MVC project is nice - it's handy and does most of the legwork for you. But these guys are making serious use of UsersInRoles() and use it freely like a global variable (it hits the DB every time this method is called).
I created a "user object" that loads all the roles upfront on every pageload (along with some other user stuff, such as GUIDs, etc) and then query this object for if the user has the Role.
Other parts of the website have FOR statements that loop over 200 times and do 20-30 sql queries on every pass = over 4,000 database calls. It somehow does this in a matter of seconds, but what I want to do is consolidate the 20-30 DB calls into one, so that it makes ONE call 200 times (each loop).
But because SQL Profiler says the query took "0 seconds", their argument is that it's so fast and small that the server can handle this high number of DB queries.
My thinking is "yeah, these queries are running fast, but they're killing the overall SQL server's performance."
Could this be a contributing factor? Am I worrying about nothing, or is this a (significant) contributing factor to the server's overall performance issues?
4. Other code optimizations
The first one that comes to mind is using StringBuilder vs a simple string variable. I understand why I should use StringBuilder (especially in loops), but they say it doesn't matter - even if they need to write 10k+ lines, their argument is that the performance gain doesn't matter.
So all-in-all, are all the things we learn and have drilled into us ("minimize scope!") just 'best practice' with no real performance gain or do they all contribute to a REAL/measurable performance loss?
EDIT***
Thanks guys for all your answers! I have a new (5th) question based on your answers:
They in fact do not use "USING", so what does that mean in practice? If connection pooling happens automatically, are they tying up connections from the pool until the GC comes around? Is it possible that each open connection to the SQL server adds a little more burden to it and slows it down?
Based on your suggestions, I plan on doing some serious benchmarking/logging of connection times, because I suspect that a) the server is slow, b) they aren't closing connections, and c) since Profiler says each query ran in 0 seconds, the slowness might be coming from the connections.
I really appreciate your help guys. Thanks again!
Branch the code, make your changes & benchmark+profile it against the current codebase. Then you'll have some proof to back up your claims.
As for your questions, here goes:
You should always manually dispose of classes which implement IDisposable; the GC won't actually call Dispose. However, if the class also implements a finalizer, the GC will call the finalizer, though in most implementations finalizers only clean up unmanaged resources.
It's true that the .NET framework already does connection pooling; I'm not sure what the defaults are, but the connection string values would just be there to allow you to alter them.
The execution time of the SQL statement is only part of the story. In SQL Profiler, all you see is how long the database engine took to execute the query; what you're missing is the time it takes the web server to connect to, and receive the results from, the database server. So while each query may be quick, you can save a lot of IO and network latency by batching queries.
This one is a good one to do some profiling on, to prove the extra memory used by concatenation over StringBuilder.
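For the batching point (question 3), the idea can be sketched as building one parameterized IN-list query covering all the ids, instead of issuing one query per loop iteration. The table and column names here are hypothetical, purely for illustration:

```csharp
using System;
using System.Linq;

class Program
{
    // Build one parameterized query covering all ids, instead of one
    // round trip per id. The @idN placeholders would each be bound to a
    // SqlParameter in real code.
    public static string BuildBatchedQuery(int idCount)
    {
        var paramNames = Enumerable.Range(0, idCount).Select(i => "@id" + i);
        return "SELECT * FROM Orders WHERE CustomerId IN ("
               + string.Join(", ", paramNames) + ")";
    }

    static void Main()
    {
        // One round trip instead of 200: the network latency saved dwarfs
        // the per-query execution time the profiler reports as "0 seconds".
        Console.WriteLine(BuildBatchedQuery(3));
        // SELECT * FROM Orders WHERE CustomerId IN (@id0, @id1, @id2)
    }
}
```

In real code you'd likely use a table-valued parameter or a JOIN for large id sets, but even this simple IN-list turns 200 round trips into one.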
Oye. For sure, you can't let the GC close your database connections for you. GC might not happen for a LONG time... sometimes hours later. It doesn't happen right away as soon as a variable goes out of scope. Most people use the IDisposable using() { } syntax, which is great, but at the very least something, somewhere, needs to be calling connection.Close().
Objects that implement IDisposable and hold onto unmanaged resources also implement a finalizer that will ensure Dispose is called during GC. The problem is when it is called: the GC can take a long time to get around to it, and you might need those resources back before then. using calls Dispose as soon as you are done.
You can modify the pooling parameters in the web.config, but it's on by default now, so if you leave the default parameters you aren't gaining anything.
You not only have to think about how long the query takes to execute, but also the connection time between the application server and the database; even if it's on the same computer, it adds overhead.
StringBuilder won't affect performance in most web applications; it only matters if you are concatenating to the same string many times. But I think it's a good idea to use it, since it's easier to read.
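The concatenation point is easy to demonstrate: each += allocates a new string and copies everything so far (quadratic in the loop count), while StringBuilder appends into a growable buffer. A rough micro-benchmark sketch:

```csharp
using System;
using System.Diagnostics;
using System.Text;

class Program
{
    public static string Concat(int n)
    {
        string s = "";
        for (int i = 0; i < n; i++) s += "x";       // O(n^2): copies whole string each pass
        return s;
    }

    public static string Build(int n)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < n; i++) sb.Append("x"); // amortized O(n)
        return sb.ToString();
    }

    static void Main()
    {
        const int N = 10000;

        var sw = Stopwatch.StartNew();
        Concat(N);
        long concatMs = sw.ElapsedMilliseconds;

        sw.Restart();
        Build(N);
        Console.WriteLine($"concat: {concatMs} ms, StringBuilder: {sw.ElapsedMilliseconds} ms");
        // The results are identical; only the cost differs, and mostly in loops.
    }
}
```

Exact timings depend on the machine, which is really the point of the whole thread: measure, don't argue.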
I think that you have two separate issues here.
Performance of your code
Performance of the SQL Server database
SQL Server
Do you have any monitoring in place for SQL Server? Do you know specifically what queries are being run that cause the deadlocks?
I would read this article on deadlocks and consider installing the brilliant Who is active to find out what is really going on in your SQL Server. You might also consider installing sp_Blitz by Brent Ozar. This should give you an excellent idea of what is going on in your database and give you the tools to fix that problem first.
Other code issues
I can't really comment on the other code issues off the top of my head. So I would look at SQL server first.
Remember
Monitor
Identify Problems
Profile
Fix
Go to 1
Well, I'm not a guru, but I do have a suggestion: if they say you're wrong, tell them, "Prove it! Write me a test! Show me that 4000 calls are just as fast as 200 calls and have the same impact on the server!"
Ditto the other things. If you're not in a position to make them prove you right, prove them wrong, with clear, well-documented tests that show that what you're saying is right.
If they're not open even to hard evidence, gathered from their own server, with code they can look at and inspect, then you may be wasting your time on that team.
At the risk of just repeating what others here have said, here's my 2c on the matter
Firstly, you should pick your battles carefully...I wouldn't go to war with your colleagues on all 4 points because as soon as you fail to prove one of them, it's over, and from their perspective they're right and you're wrong.
Also bear in mind that no-one likes to be told their beautiful code is an ugly baby, so I assume you'll be diplomatic - don't say "this is slow", say "I found a way to make this even faster"... (of course your team could be perfectly reasonable, so I'm basing that on my own experience as well :) So you need to pick one of the 4 areas above to tackle first.
My money is on #3.
1, 2 and 4 can make a difference, but in my own experience, not that much - but what you described in #3 sounds like death by a thousand papercuts for the poor old server! The queries probably execute fast because they're parameterised and therefore cached, but bear in mind that "0 seconds" in the profiler could be 900 milliseconds, if you see what I mean. Add that up over many queries and things start getting slow. This could also be a primary source of the locks, because if each of these nested queries is hitting the same table over and over, no matter how fast it runs, with the number of users you mentioned it's certain you will have contention.
Grab the SQL and run it in SSMS, but include Client Statistics so you can see not only the execution time but also the amount of data being sent back to the client; that will give you a clearer picture of what sort of overhead is involved.
Really the only way you can prove any of this is to setup a test and measure as others have mentioned, but also be certain to also run some profiling on the server as well - locks, IO queues, etc, so that you can show that not only is your way faster, but that it places less load on the server.
To touch on your 5th question - I'm not sure, but I would guess that any SqlConnection that's not auto-disposed (via using) is counted as still "active" and is not available from the pool any more. That being said - the connection overhead is pretty low on the server unless the connection is actually doing anything - but you can again prove this by using the SQL Performance counters.
Best of luck with it, can't wait to find out how you get on.
I recently was dealing with a bug in the interaction between our web application and our email provider. When an email was sent, a protocol error occurred. But not right away.
I was able to determine that the error only occurred when the SmtpClient instance was closed, which was occurring when the SmtpClient was disposed, which was only happening during garbage collection.
And I noticed that this often took two minutes after the "Send" button was clicked...
Needless to say, the code now properly implements using blocks for both the SmtpClient and MailMessage instances.
Just a word to the wise...
1 has been addressed well above (I agree with it disposing nicely, however, and have found it to be a good practice).
2 is a bit of a hold-over from previous versions of ODBC wherein SQL Server connections were configured independently with regards to pooling. It used to be non-default; now it's default.
As to 3 and 4, 4 isn't going to affect your SQL Server's performance - StringBuilder might help speed the process within the UI, certainly, which may have the effect of closing off your SQL resources faster, but they won't reduce the load on the SQL Server.
3 sounds like the most logical place to concentrate, to me. I try to close off my database connections as quickly as possible, and to make the fewest calls possible. If you're using LINQ, pull everything into an IQueryable or something (list, array, whatever) so that you can manipulate it & build whatever UI structures you need, while releasing the connection prior to any of that hokum.
All of that said, it sounds like you need to spend some more quality time with the profiler. Rather than looking at the amount of time each execution took, look at the processor & memory usage. Just because they're fast doesn't mean they're not "hungry" executions.
The using clause is just syntactic sugar; you are essentially doing
try
{
    resource.DoStuff();
}
finally
{
    resource.Dispose();
}
Dispose is probably going to get called anyway when the object is garbage collected, but only if the framework programmers did a good job of implementing the disposable pattern. So the arguments against your colleagues here are:
i) if we get into the habit of utilizing using, we make sure to free unmanaged resources, because not all framework programmers are smart enough to implement the disposable pattern correctly.
ii) yes, the GC will eventually clean that object up, but it may take a while, depending on how old the object is; gen 2 collections in particular happen comparatively rarely.
So in short:
1. See above.
2. Yes, pooling is on by default, with a max pool size of 100.
3. You are correct; this is definitely the best area to push on for improvements.
4. Premature optimization is the root of all evil. Get #1 and #3 in first. Use SQL Profiler and DB-specific methods (add indexes, defragment them, monitor deadlocks, etc.).
5. Yes, it could be. The best way is to measure it: look at the perf counter SQLServer: General Statistics – User Connections; here is an article describing how to do it.
Always measure your improvements, don't change the code without evidence!
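To make the disposable-pattern discussion above concrete, here's a minimal sketch of the canonical pattern (the class and names are illustrative, not from any real codebase). The finalizer is the safety net the GC runs eventually; Dispose is the deterministic path that using takes, and it suppresses the finalizer:

```csharp
using System;

// Minimal sketch of the canonical dispose pattern.
class Resource : IDisposable
{
    public bool Disposed { get; private set; }

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);   // deterministic path: finalizer no longer needed
    }

    protected virtual void Dispose(bool disposing)
    {
        if (Disposed) return;
        if (disposing)
        {
            // release managed resources here (e.g. close a connection)
        }
        // release unmanaged resources here
        Disposed = true;
    }

    ~Resource() { Dispose(false); }  // safety net: runs whenever the GC gets to it
}

class Program
{
    static void Main()
    {
        var r = new Resource();
        using (r) { /* work */ }
        Console.WriteLine(r.Disposed); // True - freed immediately, not "eventually"
    }
}
```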

Is MSDTC a big resource drain?

So, Is MSDTC a big resource drain on a system (server & application)?
In the past, I have written several large scale web applications that rely on MSDTC for database transactions. I have never investigated how much of a drain that would be on the server.
Currently, I am working with CSLA framework and CSLA doesn't rely on MSDTC for transactions. It basically reuses the same connection object and executes all database commands within a TransactionScope object.
I guess I am looking for some arguments either way as to using MSDTC or not.
MSDTC is used for distributed transactions. To simplify: using a TransactionScope can implicitly use MSDTC if the transaction needs to be distributed, i.e. if the TransactionScope surrounds a piece of code that involves more than one resource. This is called escalation, and most of the time it happens automatically.
So yes, it takes some resources, but if you do need ACID transactions across multiple systems ("resource managers", like SQL Server, Oracle, or MSMQ for example) on a Windows OS, you don't have much choice but to use MSDTC.
One thing that can be done for performance when configuring MSDTC is to ensure there is only one coordinator for a pool of distributed resources, avoiding MSDTC-to-MSDTC communication. Configuration is usually the biggest problem you'll face with MSDTC. Example here: http://yrushka.com/index.php/security/configure-msdtc-for-distributed-transactions/
Compared to your application, probably not. I use it on my current project and I have never noticed it affecting CPU resources. What you do have to be careful about is latency: if there are multiple servers involved in your transaction, that will be a much bigger problem than CPU.
Another way to look at it: it's not going to be CPU bound; execution will be based on IO. Of course this assumes you don't do a lot of computation in your transaction, but that's not DTC's fault now, is it?
I have used MSDTC for transactions enrolling multiple partners (one or more DB servers and one or more servers using MSMQ). The drain of using MSDTC in terms of performance, vs. using transactions in general, isn't a big deal. We were processing more than 40 million messages a day through MSMQ, and every single one had a DB action as well (though some were just cached reads, not many though).
The biggest problem is that MSDTC is a huge pain in the butt when you are crossing zones (e.g. DMZ to intranet). Getting things to enroll when they are in different zones is possible but takes tweaking. DTC also has a large number of configuration options if you are interested.

Caching Entity Framework DbContexts per request

I have several classes based on System.Data.Entity.DbContext. They get used several times per request in disparate ends of the web application - is it expensive to instantiate them?
I was caching a copy of them in HttpContext.Current.Items because it didn't feel right to have several copies of them per request, but I have now found out that it doesn't get automatically disposed from the HttpContext at the end of the request. Before I set out writing the code to dispose it (in Application_EndRequest), I thought I'd readdress the situation as there really is no point caching them if I should just instantiate them where I need them and dispose them there and then.
Questions similar to this have been asked around the internet, but I can't seem to find one that answers my question exactly. Sorry if I'm repeating someone though.
Update
I've found out that disposing of the contexts probably doesn't matter in this blog post, but I'm still interested to hear whether they are expensive to instantiate in the first place. Basically, is there lots of EF magic going on there behind the scenes that I want to avoid doing too often?
Best bet would be to use an IoC container to manage lifecycles here -- they are very, very good at it and this is quite a common scenario. Has the added advantage of making dynamic invocation easy -- meaning requests for your stylesheet won't create a DB context because it is hardcoded in BeginRequest().
I'm answering my own question for completeness.
This answer provides more information about this issue.
In summary, it isn't that expensive to instantiate the DbContext, so don't worry.
Furthermore, you don't really need to worry about disposing the data contexts either. You might notice ScottGu doesn't in his samples (he usually has the context as a private field on the controller). This answer has some good information from the Linq to SQL team about disposing data contexts, and this blog post also expands on the subject.
Use HttpContext.Items and dispose your context manually in EndRequest - you can even create a custom HTTP module for that. That is the correct handling. Disposing the context will also release references to all tracked entities and allow the GC to collect them.
You can use multiple contexts per request if you really need them, but in most scenarios one is enough. If your server processing is one logical operation, you should use one context for the whole unit of work. This is especially important if you make changes within a transaction, because with multiple contexts your transaction will be promoted to a distributed one, which has a negative performance impact.
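A rough sketch of that per-request pattern, with a plain dictionary standing in for HttpContext.Items (in real code the dictionary would be HttpContext.Current.Items and EndRequest would be wired up in a custom IHttpModule; the class names here are illustrative):

```csharp
using System;
using System.Collections.Generic;

// Stand-in for a DbContext.
class MyContext : IDisposable
{
    public bool Disposed;
    public void Dispose() { Disposed = true; }
}

static class RequestScope
{
    // Stand-in for HttpContext.Current.Items (per-request storage).
    static readonly Dictionary<string, object> Items = new Dictionary<string, object>();

    public static MyContext GetContext()
    {
        if (!Items.TryGetValue("db", out var ctx))
        {
            ctx = new MyContext();        // created lazily, once per request
            Items["db"] = ctx;
        }
        return (MyContext)ctx;
    }

    public static void EndRequest()
    {
        if (Items.TryGetValue("db", out var ctx))
        {
            ((IDisposable)ctx).Dispose(); // the step ASP.NET won't do for you
            Items.Remove("db");
        }
    }
}

class Program
{
    static void Main()
    {
        var a = RequestScope.GetContext();
        var b = RequestScope.GetContext();
        Console.WriteLine(ReferenceEquals(a, b)); // True - one context per request
        RequestScope.EndRequest();
        Console.WriteLine(a.Disposed);            // True - released at end of request
    }
}
```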
We have a web project using a similar pattern to the one you've described (albeit with multiple and independent L2S contexts instead of EF). Although the context is not disposed at the end of the request, we found that because HttpContext.Current becomes unreferenced, the GC collects the context in due course, causing the dispose behind the scenes. We confirmed this using a memory analyser. Although the context persisted a bit longer than it should, it was acceptable for us.
Since noticing the behaviour we have tried a couple of alternatives, including disposing the contexts in EndRequest, and forcing a GC collect in EndRequest (that one wasn't my idea, and it was quickly rescinded).
We're now investigating the possibility of implementing a Unit of Work pattern that encompasses our collection of contexts during a request. There are some great articles about it if you google, but for us, alas, the time it would take to implement outweighs the potential benefit.
On the side, I'm now investigating the complexity of moving to a combined SOA/Unit of Work approach, but again, it's one of those things hindsight slaps you with after having built up an enterprise sized application without the knowledge.
I'm keen to hear other peoples views on the subject.

Efficiency of transaction in code vs. DB

What is more efficient -- having a IDbTransaction in the .net code or handling it in the database? Why?
What are the possible scenarios in which either should be used?
When it comes to connection-based transactions (IDbTransaction), the overall performance should be pretty similar - but by handling it in the .NET code you make it possible to conveniently span multiple database operations on the same connection. If you are doing transaction management inside TSQL you should really limit it to the single TSQL query. There may well be an extra round-trip for the begin/end, but that isn't likely to hurt you.
It is pretty rare (these days) that I'd manually write TSQL-based transactions - maybe if I was writing something called directly by the server via an agent (rather than from my own application code).
The bigger difference is between IDbTransaction and TransactionScope see Transactions in .net for more, but the short version is that TransactionScope is slightly slower (depending on the scenario), but can span multiple connections / databases (or other resources).
