Is MSDTC a big resource drain? - c#

So, is MSDTC a big resource drain on a system (server & application)?
In the past, I have written several large-scale web applications that rely on MSDTC for database transactions. I have never investigated how much of a drain that puts on the server.
Currently, I am working with the CSLA framework, which doesn't rely on MSDTC for transactions. It basically reuses the same connection object and executes all database commands within a TransactionScope object.
I guess I am looking for some arguments either way on using MSDTC or not.

MSDTC is used for distributed transactions. To simplify: using a TransactionScope can implicitly use MSDTC if the transaction needs to be distributed, i.e. if the TransactionScope surrounds a piece of code that involves more than one resource. This is called escalation, and most of the time it happens automatically.
So yes, it takes some resources, but if you do need ACID transactions across multiple systems ("resource managers" such as SQL Server, Oracle, or MSMQ) on a Windows OS, you don't have much choice but to use MSDTC.
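As a minimal sketch of escalation (the connection strings, table names, and method here are invented for illustration, not taken from the question): a single TransactionScope wrapping connections to two different SQL Server instances gets promoted to a distributed transaction the moment the second connection opens.

using System.Data.SqlClient;
using System.Transactions;

public static class EscalationSketch
{
    // Sketch only: ordersDb and auditDb are placeholder connection strings
    // pointing at two different servers, so the second Open() escalates
    // the ambient transaction to MSDTC.
    public static void ShipAndAudit(string ordersDb, string auditDb)
    {
        using (var scope = new TransactionScope())
        {
            using (var conn1 = new SqlConnection(ordersDb))
            {
                conn1.Open(); // still a lightweight (LTM) transaction
                using (var cmd = new SqlCommand("UPDATE Orders SET Shipped = 1 WHERE Id = 42", conn1))
                    cmd.ExecuteNonQuery();
            }
            using (var conn2 = new SqlConnection(auditDb))
            {
                conn2.Open(); // second resource manager: escalation to MSDTC happens here
                using (var cmd = new SqlCommand("INSERT INTO AuditLog (OrderId) VALUES (42)", conn2))
                    cmd.ExecuteNonQuery();
            }
            scope.Complete(); // both commits are coordinated by the DTC
        }
    }
}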
One thing that can be done for performance when configuring MSDTC is to ensure there is only one coordinator for a pool of distributed resources, which avoids MSDTC-to-MSDTC communication. Configuration is usually the biggest problem you'll face with MSDTC. Example here: http://yrushka.com/index.php/security/configure-msdtc-for-distributed-transactions/

Compared to your application, probably not. I use it on my current project and I have never noticed it affecting CPU resources. What you do have to be careful about is latency: if there are multiple servers involved in your transaction, that will be a much bigger problem than CPU.
Another way to look at it: it's not going to be CPU bound; execution will be bound by I/O. Of course this assumes you don't do a lot of computation inside your transaction, but that's not DTC's fault now, is it?

I have used MSDTC for transactions enlisting multiple partners (one or more DB servers and one or more servers using MSMQ). The drain of using MSDTC, in terms of performance versus using transactions in general, isn't a big deal. We were processing more than 40 million messages a day through MSMQ, and every single one had a DB action as well (though some were just cached reads; not many, though).
The biggest problem is that MSDTC is a huge pain in the butt when you are crossing zones (e.g. DMZ to intranet). Getting things to enlist when they are in different zones is possible but takes tweaking. DTC also has a large number of configuration options if you are interested.

Related

Entity Framework with elastic pool. How to manage my SaaS client database?

I am currently looking to build a SaaS in ASP.NET hosted on the Azure cloud.
I am looking for advice on how to best build my database and the Entity Framework model that goes with it. Once a customer registers on the web app, the app needs to create a separate database for each customer on my Azure SQL server.
I have started looking into the option of elastic pooling, but it has left me quite confused. To tell you a bit about my database: it has one "meta" database for all general settings, and then each customer has a database with his portfolio.
Example:
Database [Settings] with tables (Currency, Stocks, Bonds)
Database [Customer1] with tables SomeFinanceProduct (Currency and Stock as foreign keys) and SomeOtherFinanceProduct (Currency and Bond as foreign keys)
Database [Customer2] with the same tables
Database [Customer3] etc.
I would appreciate some help from more experienced developers; many thanks, this is an important issue for me. I have also found this post from 2015 where they said that the solution would be released soon, but I have not found anything since.
I can't speak to the Entity Framework part of your question too much, however, I can speak to the elastic pool side of things.
It's important to note that an Azure elastic pool is just a billing and resource allocation construct. As far as your application or its code is concerned, there is no difference if you use an elastic pool. You still have a database, it lives on a server, and that server (and indirectly, but more specifically in the case of Azure, that database) has resource constraints.
In Azure, you traditionally create a database and choose a service or pricing tier. You pay X dollars in exchange for Y resources (CPU, memory, storage size, connection counts, etc) for that database. You repeat this for every single database you create. Over time, databases grow in size or usage and they become more demanding, so you must change the service tiers of each database individually as this happens. As you have more and more databases, this becomes tedious and cost ineffective.
With elastic pools, you can take any number of individual databases and drop the individual service/pricing plans and instead buy a big bucket of resources [i.e. the elastic pool] and give those resources to all of the databases. The theory is that collectively you need fewer resources with this approach, and this allows you to save money. It also makes better use of the resources you are buying.
The reason you need fewer resources is that databases generally experience peak demand at different times. When you buy resources individually, you have to over-buy on every single database to handle the peaks (which means you have a lot of wasted resources just sitting there unused). In an elastic pool, since all the databases are in the pool together, you only buy enough extra resources to cover however many peaks you typically have going on at the same time; now you have fewer resources sitting idle wasting money.
As I mentioned, the other benefit of using elastic pools is that you can make better use of the resources you have. Consider a database which has very low demands placed upon it; you'd naturally purchase a small (and thus cheap) plan for it. Then consider a database which has high demands placed upon it; you'll likely buy a plan with much greater resources. Now, occasionally the low-use database gets some big hits. With the small plan, the resources aren't enough and performance degrades terribly. Meanwhile the other database has tons of resources, much of which go unused. Wouldn't it be nice if the small database experiencing the unusual peak could borrow some of those resources for a few minutes? That's exactly what elastic pools do! Elastic pools have lots of built-in scalability wins for your applications.
The last important thing to note is that elastic pools cost more per unit of resource than regular databases. This means there is a break-even point, and it's more expensive to use elastic pools until you have enough databases to make it worthwhile. For my needs I've found 10-15 databases to be a fairly good break-even point. Once you have enough, create the pool. Then, as you add more databases to the pool later on, the "per database" cost keeps going down.
--
So to get back to your question, elastic pools will not specifically affect your ability to use Entity Framework for your project. Regardless of whether you choose to pool your databases or not, you'll have to get your code to talk to the appropriate customer specific database based on who is logged in.
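To sketch that last point (every name below is hypothetical, not taken from your schema): resolve the customer's connection string first, then hand it to an EF context built for that customer's database.

using System.Data.Entity; // EF6
using System.Linq;

public class FinanceProduct
{
    public int Id { get; set; }
    public string Name { get; set; }
}

// Hypothetical per-tenant context: the connection string passed in decides
// which customer database (pooled or not) the context talks to.
public class PortfolioContext : DbContext
{
    public PortfolioContext(string connectionString) : base(connectionString) { }
    public DbSet<FinanceProduct> FinanceProducts { get; set; }
}

public static class TenantDatabases
{
    // Hypothetical lookup; in practice you might read this from the shared [Settings] database.
    public static string GetConnectionString(string tenantId) =>
        $"Server=tcp:yourserver.database.windows.net;Database=Customer_{tenantId};User ID=...;Password=...;";

    public static int CountProducts(string tenantId)
    {
        // One context per request, always against the logged-in customer's database.
        using (var db = new PortfolioContext(GetConnectionString(tenantId)))
        {
            return db.FinanceProducts.Count();
        }
    }
}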
You want an elastic pool with a shard per tenant.
The linked article describes the tools available for managing and querying sharded databases in a multi-tenant scenario; follow the links in its first paragraph for details on each.

Desktop C# SQL Server (LocalDB) database access patterns

I'm coming from a Native C++ / PHP / MySQL / SQLite background.
I'm spending this weekend learning C# / WinForms / SQL Server / ASP.NET, and it all seems to work differently, especially since I no longer know exactly what happens under the hood, where I can optimize things, and so on.
Needing to work with SQL Server (LocalDB), I think I noticed a weird database access pattern in most of the online examples and video tutorials I've read (I got 2 books from Amazon, but they arrive next week, so currently, to my shame, I'm learning the basics online).
Every time these examples access the database, they open and close a SqlConnection for each query:
using (var sql = new SqlConnection(connectionString)) // note: a connection string is required here
{
    sql.Open();
    // do SQL stuff here
} // Dispose() closes the connection, returning it to the pool
For a C++ guy, this is making me very nervous:
What's the overhead of open/close connections all the time when I need to do a query?
Why not open an object and reuse it when required?
Can anyone tell me whether this is a performance-friendly DB access pattern in desktop C#, or whether I should go with Plan B? The end result will be a C# Windows service featuring an IOCP server (which I figured out already) that should deal with up to 1,000 connections. It won't be very data intensive, but even with 100 clients, SQL open/close overhead, if any, can add up quickly.
I also noticed MultipleActiveResultSets=True;, which should make this especially friendly for multiple reads. So I would imagine a single connection with MARS for the entire application's read access and short writes should do the trick, with dedicated connections for larger INSERTs/UPDATEs?
Plan B: I've initially thought about creating a connection pool for short reading / writing operations. And another one for longer read/write operations. And looping through it myself... Or maybe one connection per client but I'm not sure that won't be quite abusive.
Actually, there is very little performance issue here, and the small amount of overhead is made up for by a huge increase in maintainability.
First, SqlConnection uses ADO.NET connection pooling by default. So connections are not actually opened and closed to the server. Instead, internally ADO.NET has a pool of connections that it reuses when appropriate, grouping them by ConnectionString. It's very good at managing these resources, so long as you are good about cleaning up your objects when you are done with them.
This is part of what makes this work. By closing the connection, you are telling the connection pool that the connection can be reused by a different SqlConnection, so in effect, what you view as a performance problem is actually a performance optimization.
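To make that concrete, here's a minimal sketch (the Users table, database name, and connection string are stand-ins): each call opens "late" and closes "early", and the pool absorbs the cost.

using System;
using System.Data.SqlClient;

public static class UserQueries
{
    // Open "late", close "early": Open() is usually served from the pool
    // rather than performing a fresh TCP handshake and login.
    public static int GetUserCount(string connectionString)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("SELECT COUNT(*) FROM Users", conn))
        {
            conn.Open();
            return (int)cmd.ExecuteScalar();
            // Dispose() returns the physical connection to the pool
            // instead of actually closing it.
        }
    }

    public static void Main()
    {
        Console.WriteLine(GetUserCount(
            @"Server=(localdb)\MSSQLLocalDB;Database=MyAppDb;Integrated Security=true"));
    }
}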
Coming from native programming, the first thing you have to learn about writing code in a managed world is that you HAVE to release your resources, because otherwise the garbage collector won't be able to efficiently clean them up. I know your first impulse is to try and manage the lifetimes yourself, but you should only do this when it is absolutely necessary.
The second thing to learn is to stop "getting nervous" about things you view as potential performance issues. Only optimize when you KNOW them to be a problem (i.e., you have used a profiler and found that the normal way isn't as efficient as you would like it to be).
As always, read the documentation:
http://msdn.microsoft.com/en-us/library/8xx3tyca(v=vs.110).aspx

Can interprocess communication be as fast as in-process events (using wcf and c#)

I have an application that performs analysis on incoming event flow (CEP engine).
This flow can come from different sources (database, network, etc...).
For efficient decoupling, I want this service to expose a named pipe using wcf, and allow a different application to read the data from the source and feed it into the service.
So, one process is in charge of getting and handling the incoming data while the other for analyzing it, connecting the two using wcf with named pipes binding. They both will be deployed on the same machine.
Question is, will I notice lower throughput using WCF in the middle than if I had simply coupled the two services into a single process and used regular events?
No. In modern mainstream operating systems, IPC will never be, and can never be, as fast as in-process eventing. The reason for this is the overhead of the context switching associated with activating different processes. Even on a multi-core system where distinct processes run on distinct cores (so there is no cost associated with activating one process versus another; both are always active), communication across processes still requires crossing security boundaries, hitting the network stack (even when using pipes), and so on. Where a local function call will be on the order of thousands of CPU cycles to invoke, an IPC will be millions.
So IPC will be slower than in-process communication. Whether that actually matters in your case is a different question. For example, suppose you have an operation that requires a Monte Carlo simulation that runs for 2 hours. In that case it really doesn't matter whether it takes 1 ms or 1000 ms to invoke the operation.
Usually, performance of the communication is not what you want to optimize for. Even if performance is important, focusing on one small aspect of performance - let's say, whether to use IPC or local function calls - is probably the wrong way to go about things.
I assumed "CEP" referred to "complex event processing" which implies high throughput, high volume processing. So I understand that performance is important to you.
But for true scalability and reliability, you cannot simply optimize in-process eventing; you will need to rely on multiple computers and scale out. This will imply some degree of IPC one way or the other. It's obviously important to be efficient at the smaller scale (events), but your overall top-end performance will be largely bounded by the architecture you choose for scale-out.
WCF is nice because of the flexibility it allows in moving building blocks from the local machine to a remote machine, and because of the Channel stack, you can add communication services in a modular way.
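For what it's worth, a minimal sketch of the named-pipe setup being discussed (the IEventSink contract, its members, and the addresses are invented for illustration):

using System;
using System.ServiceModel;

// Hypothetical contract for pushing events from the reader process
// into the analysis (CEP) process.
[ServiceContract]
public interface IEventSink
{
    [OperationContract(IsOneWay = true)]
    void Push(string payload);
}

public class EventSink : IEventSink
{
    public void Push(string payload)
    {
        // hand the event off to the CEP engine here
    }
}

public static class SinkHost
{
    public static void Main()
    {
        // Named pipes only work between processes on the same machine,
        // which matches the deployment described in the question.
        using (var host = new ServiceHost(typeof(EventSink), new Uri("net.pipe://localhost")))
        {
            host.AddServiceEndpoint(typeof(IEventSink), new NetNamedPipeBinding(), "events");
            host.Open();
            Console.WriteLine("Event sink listening on net.pipe://localhost/events");
            Console.ReadLine(); // keep the host alive
        }
    }
}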
Whether this is important to you, is up to you to decide.

Efficiency of transaction in code vs. DB

What is more efficient: having an IDbTransaction in the .NET code, or handling the transaction in the database? Why?
What are the possible scenarios in which either should be used?
When it comes to connection-based transactions (IDbTransaction), the overall performance should be pretty similar, but by handling it in the .NET code you make it possible to conveniently span multiple database operations on the same connection. If you are doing transaction management inside T-SQL, you should really limit it to a single T-SQL query. There may well be an extra round trip for the begin/end, but that isn't likely to hurt you.
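A minimal sketch of the connection-based flavor (the table, columns, and method are invented): two statements on one connection that commit or roll back together, via SqlTransaction (which implements IDbTransaction).

using System.Data.SqlClient;

public static class Transfers
{
    // Both UPDATEs run on the same connection and the same transaction:
    // either both commit or both roll back.
    public static void TransferFunds(string connectionString, int fromId, int toId, decimal amount)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();
            using (SqlTransaction tx = conn.BeginTransaction()) // an IDbTransaction
            {
                try
                {
                    using (var debit = new SqlCommand(
                        "UPDATE Accounts SET Balance = Balance - @amt WHERE Id = @id", conn, tx))
                    {
                        debit.Parameters.AddWithValue("@amt", amount);
                        debit.Parameters.AddWithValue("@id", fromId);
                        debit.ExecuteNonQuery();
                    }
                    using (var credit = new SqlCommand(
                        "UPDATE Accounts SET Balance = Balance + @amt WHERE Id = @id", conn, tx))
                    {
                        credit.Parameters.AddWithValue("@amt", amount);
                        credit.Parameters.AddWithValue("@id", toId);
                        credit.ExecuteNonQuery();
                    }
                    tx.Commit();
                }
                catch
                {
                    tx.Rollback();
                    throw;
                }
            }
        }
    }
}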
It is pretty rare (these days) that I'd manually write TSQL-based transactions - maybe if I was writing something called directly by the server via an agent (rather than from my own application code).
The bigger difference is between IDbTransaction and TransactionScope (see "Transactions in .NET" for more), but the short version is that TransactionScope is slightly slower (depending on the scenario) yet can span multiple connections/databases (or other resources).

Linq to SQL connections

I'm using Linq to SQL for a fairly complicated site, and after go-live we've had a number of database timeouts. The first thing I noticed was a fairly large number of connections to the database.
Coming from an ADO.NET background, we used to code it so that any site would only use one or two pooled connections, and this resulted in acceptable performance even with a fair few concurrent users.
So my question is: was this old way of doing it flawed, or is there a way to do it in LINQ? It seems our performance issues are being caused by so many connections to the DB, but if this were a real issue I'd have thought it would be mentioned in all the LINQ tutorials.
Any suggestions?
I'm guessing that you are keeping DataContexts around and not calling Dispose on them when done (or at least leaving them around).
Rather, you should initialize your DataContext, perform your operation, and then dispose of it when done. You shouldn't hold a reference to it between operations.
Preferably, you would use the using statement for handling the call to IDisposable.
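Something like this per unit of work (MyDataContext and Customer are stand-ins for your own designer-generated types):

using System.Linq;

public class CustomerRepository
{
    // One short-lived DataContext per operation; Dispose() promptly
    // releases the underlying connection back to the ADO.NET pool.
    public Customer GetCustomer(int id)
    {
        using (var db = new MyDataContext())
        {
            return db.Customers.SingleOrDefault(c => c.Id == id);
        }
    }
}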
Regarding connection pooling, the SqlClient pools connections by default, so unless you explicitly turn it off, you should be taking advantage of it already. Of course, if you aren't releasing connections you are using, then pooling will only take you so far.
