Enterprise scale BO Layer to SQL Server MAX Connections - c#

We have an application with approximately 60,000 client machines accessing it. Previously we had a distributed model, but we are moving to SaaS by creating a BO layer and having calls come up into it over the WAN. We use LINQ to Entities to access the database from the BO layer. Our multi-tenant model is federated, so 'enterprises' comprising multiple stores live on distinct SQL Server instances (each server usually hosts about 200 'enterprises').
Each BO server is dual processor, 8 core with HT (32 logicals). IIS is set up to have 32 max worker processes.
The BO layer is working pretty well: each call pulls the connection string associated with that enterprise and then talks to the correct database. The problem is that, with only 1/4 of our clients online and about 15 BO servers, I have noticed 3000+ open connections to each database server, and the number is growing.
Any idea why it is growing like this? What should I set, and where, to make it re-use connections (connection pooling appears to be on) so that it stops flooding each DB server like this? Any other suggestions?

It could be purely an architecture issue.
How many database servers do you have in total? And is the problem that the workload is heavy on certain database servers but not on others?
If that's the case, then reconsidering how enterprises are partitioned across database servers will probably help, as would further partitioning the data on the heavily loaded database servers. Another technique is to vertically partition an enterprise's tables into different databases, provided no joins are needed across the vertically partitioned tables.
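One possible explanation for the growth, offered as an assumption since the thread does not confirm it: ADO.NET keeps a separate pool per process and per distinct connection string, so every IIS worker process on every BO server can end up holding its own pool for every enterprise it has served. The sketch below is plain Python arithmetic (not ADO.NET code) using the figures from the question; the one-open-connection-per-pool figure is purely illustrative.

```python
# Back-of-the-envelope estimate of connection pools per database server.
# Figures come from the question; the model (one pool per worker process
# per enterprise connection string) is an assumption about the setup.

def pool_estimate(bo_servers, worker_processes, enterprises_per_db_server,
                  connections_per_pool):
    """Return (number of pools, total connections) for one database server."""
    pools = bo_servers * worker_processes * enterprises_per_db_server
    return pools, pools * connections_per_pool


if __name__ == "__main__":
    # Every pool keeps a single connection open (illustrative).
    pools, connections = pool_estimate(15, 32, 200, connections_per_pool=1)
    print(f"distinct pools that can exist: {pools}")
    print(f"connections if each pool keeps one open: {connections}")

    # Same fleet with a single worker process per BO server.
    pools_one_wp, _ = pool_estimate(15, 1, 200, connections_per_pool=1)
    print(f"pools with one worker process per server: {pools_one_wp}")
```

Under these assumptions, running fewer worker processes per server, or routing each enterprise to a subset of the BO servers, shrinks the number of pools, and a lower `Max Pool Size` in the connection string caps the size of each one.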

Related

Which transaction manager will be used in WCF?

I am reading about transactions in WCF services and am seeking some clarification. I am not sure which transaction manager WCF will use in the following scenarios:
If the WCF service performs an insert into a table in one SQL Server database and a delete from a table in another SQL Server database (on the same or a different server).
If the same WCF service performs an insert into a table in a SQL Server database and a delete from a table in an Oracle database.
If a WCF service calls 2 different WCF services that perform operations on the same SQL Server database.
Please help me understand these situations.
I think you're giving WCF more credit than it's due. WCF can do some amazing stuff, but there's nothing magical about it. It provides a set of interfaces for web services and allows you to provide an intermediary access layer for your data.
So let's tackle your scenarios:
If the WCF service performs an insert into a table in one SQL Server database and a delete from a table in another SQL Server database (on the same or a different server).
We've got two RDBMS in use here, so you're going to have two transaction managers. The first transaction manager is in the RDBMS for the insert, and the second transaction manager is for the delete.
If the same WCF service performs an insert into a table in a SQL Server database and a delete from a table in an Oracle database.
Again, we've got two RDBMS in use here, so you're going to have two transaction managers. The first transaction manager is in the RDBMS for the insert, and the second transaction manager is for the delete.
Note that we don't need to care about which type of RDBMS it is, we just track the number that are involved.
If a WCF service calls 2 different WCF services that perform operations on the same SQL Server database.
This one is a little trickier because we don't know what the 2 WCF services are doing, and there is some unadvisable voodoo magic that could be done to coordinate transactions across the 2 services. I'm going to assume you're smarter than that and didn't mean that case.
So in this case, we have 1 RDBMS performing 2 separate transactions. We'll have 1 transaction manager from the 1 RDBMS, but the operations will complete under different transactions.
To wrap that up - to know how many transaction managers are involved, you need to look at the number of RDBMS that are being used. And to know how many transactions will be required, you need to look at the number of operations performed.
Notice that the use of WCF has no bearing on your concern about the managers. WCF just happens to be a tool that provides an additional way of accessing the data through a service. WCF is cool, but it's not magic.
Additional note
You asked in a comment:
my concern is that in all of this condition which transaction manager it will use a) The LTM b) The KTM c) The DTC?
And for the MS SQL Server transactions, it will either be the LTM or the DTC that handles the transaction. Per this MSDN Blog entry, it's not necessarily something you need to worry about until performance becomes a significant issue. And you should avoid premature optimization in favor of getting things working first.
And based upon this description of the KTM, it's very unclear how you think you'd be using the KTM in any of the cases you asked about.
The Kernel Transaction Manager (KTM) enables the development of applications that use transactions. The transaction engine itself is within the kernel, but transactions can be developed for kernel- or user-mode transactions, and within a single host or among distributed hosts.
Also note that Oracle DB has a separate transaction manager for its RDBMS that is different than the MS SQL Server transaction manager(s).
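As a rough mental model for the LTM-versus-DTC question (a simplification, not the actual System.Transactions algorithm): a transaction usually stays with the lightweight manager while it touches a single durable resource inside one process, and is escalated to the distributed coordinator once a second durable resource enlists or the transaction flows to another process. The Python sketch below encodes only that rule of thumb. The function name and parameters are hypothetical, and real behaviour depends on the data provider and server versions.

```python
def likely_transaction_manager(durable_resources, flows_across_processes=False):
    """Rule-of-thumb guess at who coordinates one shared transaction.

    durable_resources: number of distinct databases enlisted in the
        same transaction (hypothetical input for illustration).
    flows_across_processes: True if the transaction is propagated to
        another service or process.
    """
    if durable_resources <= 1 and not flows_across_processes:
        return "LTM"
    return "DTC"


if __name__ == "__main__":
    # Scenario 1: two SQL Server databases inside one transaction.
    print(likely_transaction_manager(2))
    # Scenario 2: SQL Server plus Oracle inside one transaction.
    print(likely_transaction_manager(2))
    # Scenario 3: one database, but the transaction flows to other services.
    print(likely_transaction_manager(1, flows_across_processes=True))
    # Baseline: a single database touched inside one service.
    print(likely_transaction_manager(1))
```

If the operations in a scenario are not wrapped in one shared transaction, each database simply runs its own local transaction, as described above.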

Multi-tenancy: Individual database per tenant

We are developing a multi-tenant application. Architecturally, we have designed a shared middle tier for business logic and one database per tenant for data persistence. That said, the business tier will establish a set of connections (a connection pool) with the database server per tenant, which means the application maintains a separate connection pool for each tenant. If we expect around 5000 tenants, this solution requires high resource utilization (connections between the app server and the database server per tenant), which leads to performance issues.
We resolved that by keeping a common connection pool. To maintain a single connection pool across different databases, we created a new database called ‘App-master’. We now always connect to the ‘App-master’ database first and then change the database to the tenant-specific database. That solved our connection pool issue.
This solution works perfectly well with an on-premises database server, but it does not work with Azure SQL, as it does not support changing the database.
I would appreciate suggestions on how to maintain the connection pool, or a better approach / best practice for dealing with such a multi-tenant scenario.
I have seen this problem before with multi-tenancy schemes that use separate databases. There are two overlapping problems: the number of web servers per tenant, and the total number of tenants. The first is the bigger issue: if you are caching database connections via ADO.NET connection pooling, then the likelihood of any specific customer request arriving at a web server that already has an open connection to their database is inversely proportional to the number of web servers you have. The more you scale out, the more any given customer will notice a per-call (not initial login) delay as the web server makes the initial connection to the database on their behalf. Each call made to a non-sticky, highly scaled web server tier will be decreasingly likely to find an existing open database connection that can be reused.
The second problem is just one of having so many connections in your pool, and the likelihood of this creating memory pressure or poor performance.
You can "solve" the first problem by establishing a limited number of database application servers (simple WCF endpoints) which carry out database communications on behalf of your web server. Each WCF database application server serves a known pool of customer connections (Eastern Region go to Server A, Western Region go to Server B) which means a very high likelihood of a connection pool hit for any given request. This also allows you to scale access to the database separately to access to HTML rendering web servers (the database is your most critical performance bottleneck so this might not be a bad thing).
A second solution is to use content specific routing via a NLB router. These route traffic based on content and allow you to segment your web server tier by customer grouping (Western Region, Eastern Region etc) and each set of web servers therefore has a much smaller number of active connections with a corresponding increase in the likelihood of getting an open and unused connection.
Both these problems are issues with caching generally: the more you scale out as a completely "unsticky" architecture, the less likely it is that any call will hit cached data, whether that is a cached database connection or read-cached data. Managing user connections to maximize the likelihood of a cache hit helps maintain high performance.
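The grouping idea behind both suggestions (send each customer to a known subset of servers so that their connection pool is likely to be warm) can be sketched as a stable tenant-to-pod mapping. This is a minimal Python illustration, not the configuration of any particular router; the pod names and the hash-based assignment are assumptions, whereas the answer itself groups customers by region.

```python
import hashlib


def pod_for_tenant(tenant_id, pods):
    """Map a tenant to one pod using a stable hash of its id."""
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(pods)
    return pods[index]


if __name__ == "__main__":
    pods = ["pod-a", "pod-b", "pod-c"]  # hypothetical pod names
    assignment = {}
    for n in range(5000):
        tenant = f"tenant-{n}"
        assignment.setdefault(pod_for_tenant(tenant, pods), []).append(tenant)
    # Each pod only ever opens pools for its own share of the tenants.
    for pod in pods:
        print(pod, len(assignment.get(pod, [])))
```

Plain modulo hashing reshuffles most tenants when the number of pods changes, so an explicit lookup table (such as the region-to-server mapping described above) may be easier to operate.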
Another method of restricting the number of connection pools per app server is to use Application Request Routing (ARR) to divide up your tenants and assign them to subsets of the web tier. This lends itself to a more scalable "pod" architecture where a "pod" is a small collection of web/app servers coupled to a subset of the databases. A good article on this approach is here:
http://azure.microsoft.com/blog/2013/10/31/application-request-routing-in-csf/
If you are building a multi-tenant DB application on Azure, you should also check out the new Elastic Scale client libraries, which simplify data-dependent routing and facilitate cross-shard queries and management operations. http://azure.microsoft.com/en-us/documentation/articles/sql-database-elastic-scale-documentation-map/

Socket programming to control the number of clients

I am developing a C# database application. I use SQL Server 2005 as the back end and C# .NET 2010 as the front end.
My application is installed on each client machine. When the database is updated, all clients of my system are notified by the SQL Server Event Dependency technique.
But now I want to control the number of clients connected to the server; that is, I want to give access to only 3 clients. For that, I want to add some client/server code to my application using socket programming.
Please guide me on this issue.
From the SqlDependency Class on MSDN:
SqlDependency was designed to be used in ASP.NET or middle-tier services where there is a relatively small number of servers having dependencies active against the database. It was not designed for use in client applications, where hundreds or thousands of client computers would have SqlDependency objects set up for a single database server. If you are developing an application where you need reliable sub-second notifications when data changes, review the sections Planning an Efficient Query Notifications Strategy and Alternatives to Query Notifications in the Planning for Notifications topic in SQL Server Books Online.
In your particular scenario, I think it would be a good idea to have a middle-layer server that manages the client machines and uses SqlDependency to be notified of changes in the database. It would then push notifications to batches of n client machines, following the logic you expect.
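The core of such a middle layer is a registry that admits at most a fixed number of clients and fans each change out to them. The sketch below shows only that bookkeeping in Python; the class and method names are hypothetical, and the socket transport and the database notification hook are left out.

```python
import threading


class NotificationHub:
    """Tracks connected clients and refuses connections beyond a cap."""

    def __init__(self, max_clients=3):
        self._max_clients = max_clients
        self._clients = {}  # client id -> callback
        self._lock = threading.Lock()

    def connect(self, client_id, on_change):
        """Register a client; return False if the cap is already reached."""
        with self._lock:
            if client_id in self._clients:
                return True
            if len(self._clients) >= self._max_clients:
                return False
            self._clients[client_id] = on_change
            return True

    def disconnect(self, client_id):
        """Free a slot so that another client can connect."""
        with self._lock:
            self._clients.pop(client_id, None)

    def publish(self, message):
        """Forward one change to every client; return how many were notified."""
        with self._lock:
            callbacks = list(self._clients.values())
        for callback in callbacks:
            callback(message)
        return len(callbacks)


if __name__ == "__main__":
    hub = NotificationHub(max_clients=3)
    for name in ["pc-1", "pc-2", "pc-3", "pc-4"]:
        accepted = hub.connect(name, lambda msg, n=name: print(n, "got", msg))
        print(name, "accepted" if accepted else "rejected")
    hub.publish("orders table changed")
```

Only the middle layer would hold a dependency against the database, which keeps the number of active dependencies small, as the MSDN guidance quoted above recommends.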

how to scale MVC 4 WebApi and Entity Framework

I have a web app built with MVC 4, with WebApi on the back end, which uses Entity Framework.
I have used StructureMap to inject Entity Framework into WebApi, and to inject the WebApi client into the MVC 4 app.
The application is running fine, but soon I will need to scale.
The MVC 4 app sits on one server, WebApi is on another server, and there is a database server.
How can I scale WebApi horizontally? If I add WebApi servers and database servers, is there a configuration for Entity Framework that will take multiple connection strings and do round-robin querying? Is sharding available for EF?
How about HttpClient? How about failover, such that the client takes multiple IPs and, if one fails, requests go to another server?
How can I scale them?
Typically one adds additional web servers and then uses a load balancer to distribute incoming requests among them. There are a few considerations here.
If the web server persists data across requests (via ASP.NET session), you will need to create a separate state server that all the web servers can share, or use a load balancer that is state aware.
If the performance issue stems from database IO problems (missing table indexes, index fragmentation, requests pulling huge result sets, less than optimal disk/hardware configs, etc.), then adding more web servers will not address the problem. The first step is to monitor and profile your database and make sure it is performing well.
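To make the load-balancing step concrete, here is a minimal Python sketch of round-robin distribution that skips servers marked as down. It only illustrates the behaviour; in practice this job is normally done by a dedicated load balancer rather than application code, and the class name and server labels are hypothetical.

```python
import itertools


class RoundRobinBalancer:
    """Hands out servers in rotation, skipping any marked as down."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = list(servers)
        self._healthy = set(self._servers)
        self._cycle = itertools.cycle(self._servers)

    def mark_down(self, server):
        self._healthy.discard(server)

    def mark_up(self, server):
        if server in self._servers:
            self._healthy.add(server)

    def next_server(self):
        """Return the next healthy server, or raise if none is left."""
        for _ in range(len(self._servers)):
            candidate = next(self._cycle)
            if candidate in self._healthy:
                return candidate
        raise RuntimeError("no healthy servers available")


if __name__ == "__main__":
    balancer = RoundRobinBalancer(["api-1", "api-2", "api-3"])
    print([balancer.next_server() for _ in range(4)])
    balancer.mark_down("api-2")
    print([balancer.next_server() for _ in range(4)])
```

Rotation like this only works cleanly when any server can handle any request, which is why the session-state consideration above matters.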

WCF - solution architecture

I am working on a project in which a WCF service will be consumed by iOS apps. The number of hits expected on the web server at any given point in time is around 900-1000. Every request may take 1-2 seconds to complete. The same number of requests is expected every second, 24/7.
This is my plan:
1. Write a WCF RESTful service (the instance context mode will be PerCall).
2. Requests and responses will be in JSON.
3. Some information needs to be persisted on the server. This information is actually received from another remote system and is shared among all the requests. Since using a database may not be a good idea (response time is very important; 2 seconds is the max the customer can wait), would it be good to keep it in server memory (say a static Dictionary; assume this dictionary will be a collection of 150000 objects, each consisting of 5-7 strings and their keys)? I know this is volatile!
4. Each request will spawn a new thread (by using Threading.Timers) to do some cleanup. This thread will do some database reads/writes as well.
5. If a load balancer is introduced sometime later, the in-memory objects cannot be shared with requests routed through another node. Any ideas?
I hope you gurus can help with comments/suggestions on the entire architecture, WCF throttling, object state persistence, etc. Please provide some pointers on the required hardware as well. We plan to use Windows Server 2008 Enterprise Edition, IIS and SQL Server 2008 Standard Edition.
Adding more to #3:
As I said, the service gets some information from a remote system. A client of the remote system will be installed on the web server where the WCF service is hosted, and the service references one of this client's DLLs to get the information in the form of a hashtable (the method returns a hashtable; there will be around 150000 objects in this collection). Would you suggest writing this information to the database and having the iOS requests (arriving every second) retrieve it from the database directly? Would that perform better than consuming it directly from the hashtable if it is made static?
Since you are using Windows Server 2008 I would definitely use the Windows Server App Fabric Cache to store your state:
http://msdn.microsoft.com/en-us/library/ff383813.aspx
It is free to use, well supported and integrated, and is (more or less) API compatible with the Windows Azure App Fabric Cache if you ever shift your service to Azure. In our company (disclaimer: not my team) we used to use MemCache but changed to the App Fabric Cache and don't regret it.
Let me offer some comments/suggestions based on my experience serving a similar number of requests with WCF, on 3.5 back in the day.
I don't agree with #3. Using a database here is the right thing to do. To address response time, implement caching, and possibly a cache dependency, in order to keep the data synchronized across all instances (assuming that you are load balanced; also see the App Fabric suggestion above/below). In real-world scenarios data changes often, and you must minimize the impact.
We used Barracuda hardware and software to handle scalability as far as I can tell.
Consider indexing keys/values with Lucene if applicable. Lucene delivers extremely good read/write performance. Do not use it to store your entire data set; read up on it first. It is a life saver if used correctly. Note that it can be complicated to implement in a load-balanced environment.
Basically, caching might be the only necessary change to your architecture.
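The caching approach recommended here is commonly implemented as cache-aside with an expiry: look in the cache first, fall back to the database on a miss, and keep the result for a limited time. The Python sketch below shows that pattern in isolation. The `loader` callable stands in for the database read, the 30-second lifetime is arbitrary, and a distributed cache such as the one suggested above would replace the in-process dictionary in a load-balanced setup.

```python
import time


class TtlCache:
    """Cache-aside helper: serve from memory until an entry expires."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader        # called on a miss, e.g. a database read
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._entries = {}           # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = self._loader(key)
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key):
        """Drop an entry so that the next read goes back to the source."""
        self._entries.pop(key, None)


if __name__ == "__main__":
    def load_from_database(key):
        # Placeholder for the real query.
        print("loading", key)
        return {"id": key}

    cache = TtlCache(load_from_database, ttl_seconds=30)
    cache.get("device-42")  # miss: calls the loader
    cache.get("device-42")  # hit: served from memory
```

This version is not thread-safe; a service handling concurrent requests would need a lock around `get` or a concurrent cache implementation.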
