Which data access technology is better for DocumentDB - C#

I'm creating a website content management system which stores a whole bunch of website articles and lets users modify these articles through the system. I'm a typical SQL Server developer, but I'm thinking this system could be done in DocumentDB. We are using C# plus Web API to do the reads and writes. I'm testing different data access technologies to see which one performs better. I have been trying LINQ, LINQ lambda, SQL and stored procedures. The thing is, all these query methods seem to run at around 600 ms to 700 ms when I test via Postman. For example, one of my tests is a simple GET, http://localhost:xxxxxx/multilanguage/resources/1, which takes 600 ms+. That is only a 1 KB document and there are only 5 documents stored in my collection so far. So I guess what I want to ask is: is there a quicker way to query DocumentDB than this? The reason I ask is that I did something similar in SQL Server before (not querying documents, but relational tables). A much more complex query in a stored procedure over multiple joined tables only takes around 300 ms. So I guess there should be a quicker way to do this. Thanks for any suggestions!

Most probably, if you change the implementation to a stub you will get the same performance, since what you are actually measuring is the connection time between your server and your client (Postman).

There are a couple of things you can do, but do keep in mind that DocumentDB and other NoSQL solutions behave very differently from standard SQL Server. For example, the more nodes and RAM available to DocumentDB, the better it will perform overall. The development instance of DocumentDB on Azure is understandably going to use fewer resources than a production instance. Since Azure takes care of scaling, one way to think about it is that the more data you have, the better it will perform.
That said, something you are probably not used to is sharing your connection object across your whole application. That avoids the start-up penalty every time you want to get your data. Summarizing the performance tips (a sketch combining several of them follows the list):
Use TCP connection instead of HTTPS when you can
Use await client.OpenAsync() to avoid pausing on start up latency for the first request
Connect to the DocumentDB in the same region (keep in mind if you host across regions)
Use a singleton to access DocumentDB (it's threadsafe)
Cache your SelfLinks for quick access
Tune your page sizes so that you get only the data you intend to use
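As a minimal sketch of tips 1, 2 and 4 combined (the endpoint and key are placeholders; the types come from the Microsoft.Azure.Documents.Client SDK):

using System;
using System.Threading.Tasks;
using Microsoft.Azure.Documents.Client;

public static class DocumentDbClientFactory
{
    // One DocumentClient for the whole application; the client is thread-safe.
    private static readonly DocumentClient client = new DocumentClient(
        new Uri("https://your-account.documents.azure.com:443/"), // placeholder
        "your-auth-key",                                          // placeholder
        new ConnectionPolicy
        {
            ConnectionMode = ConnectionMode.Direct, // TCP instead of the HTTPS gateway
            ConnectionProtocol = Protocol.Tcp
        });

    public static DocumentClient Client { get { return client; } }

    // Call once at startup so the first real request doesn't pay the warm-up cost.
    public static Task WarmUpAsync()
    {
        return client.OpenAsync();
    }
}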
The more advanced performance tips cover index policies, etc. DocumentDB and other NoSQL databases behave differently from SQL databases, which also means your assumptions about how the APIs work are probably wrong. Make sure you are testing similar concepts. The SQL Server connection object needs you to create/dispose of objects for each transaction so it can return those connections to a connection pool. Treating DocumentDB the same way is going to cause the same kind of performance problems as not using a connection pool would.

Related

Why using multiple database in same instance a bad idea in Redis?

I am new to Redis, therefore I don't know much about its more complex technicalities. But let me put my scenario here: I am running two websites from the same server and I wanted Redis to work on both. On searching, I found that I can do this by assigning a different index to a different db on the same server instance, like below:
//In my first website (development)
IDatabase dbOfDev = _conn.GetDatabase(0);
//In my second website (production)
IDatabase dbOfProd = _conn.GetDatabase(1);
This was ideal for me, since I could cache both my databases in the same instance. But then I bumped into the "What's the Point of Multiple Redis Databases?" and "How do I change between redis database?" links, which say that use of multiple databases in the same server instance is discouraged and deprecated. Though these links do try to explain the reason behind it, being a beginner, I am still not able to understand the deep technical aspects.
Can anyone explain in simpler terms why using multiple Redis dbs in the same server instance is discouraged? Also, in simpler terms, how can I manage caching for both my websites on the same server without the above approach?
how can I manage caching for both my websites on the same server without the above approach?
You can use a different key tag for each website. Say you name the two websites A and B. For all keys of website A, give each key a prefix (key tag) of A:. On the other hand, give each key of website B another prefix, B:. In this way, you have a unique key namespace for each website.
SET A:key1 val1
SET A:key2 val2
LPUSH B:key1 1
SADD B:key2 val
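In C# with StackExchange.Redis (which the IDatabase snippet in the question suggests you are using), the same commands look like this; the connection string and key names are illustrative:

using StackExchange.Redis;

ConnectionMultiplexer conn = ConnectionMultiplexer.Connect("localhost");
IDatabase db = conn.GetDatabase(); // one default database for both sites

// Website A's keys live under the "A:" prefix, website B's under "B:".
db.StringSet("A:key1", "val1");
db.StringSet("A:key2", "val2");
db.ListLeftPush("B:key1", 1);
db.SetAdd("B:key2", "val");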
Also check this answer for more solutions.
Can anyone explain in simpler terms why using multiple Redis dbs in the same server instance is discouraged?
AFAIK, the multiple databases feature is NOT discouraged or deprecated. It's a way to isolate key namespaces for different applications. However, the author of Redis considers it a design mistake, calling Redis multiple databases his "worst decision in Redis design at all, since it makes Redis internals more complex."
Redis is single-threaded, so compared to multiple databases, multiple Redis instances can take advantage of multiple cores. If you have multiple databases in one Redis instance, you can still only use one core. Also, a Redis instance itself has a small memory footprint, so you don't need to worry about multiple Redis instances costing you too much.
Redis is very fast, and normally the bottleneck is network bandwidth, NOT CPU. So normally you CANNOT gain much by using multiple Redis instances. However, if one of your applications needs to run some slow commands on Redis, and you don't want them to block other applications, you can have a separate Redis instance for the slow application and another Redis instance for the fast ones.
Also note that Redis Cluster doesn't support multiple databases.
Personally, I like the multiple database feature. Normally, if I run a Redis instance (not Redis Cluster), I'll put my data into some database other than the default one, i.e. database 0, to avoid accidentally logging into Redis and doing something horrible to the default database. It's also very easy to implement a double buffer with multiple databases: e.g., write data to a new database and, when it's done, use the SWAPDB command to swap the old DB and the new DB atomically and efficiently.
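A sketch of that double-buffer idea in C#; since I'm not certain which StackExchange.Redis versions expose a dedicated SWAPDB helper, this goes through the generic Execute method (SWAPDB itself needs Redis 4.0+), and the database indexes and key are made up:

using StackExchange.Redis;

ConnectionMultiplexer conn = ConnectionMultiplexer.Connect("localhost");

// Populate the staging database (index 2) with the new data set.
IDatabase staging = conn.GetDatabase(2);
staging.StringSet("config:featureFlags", "v2");

// Once staging is fully populated, atomically swap it with the live database (index 1).
conn.GetDatabase().Execute("SWAPDB", 1, 2);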
It is not discouraged. If you are building a multi-tenant application supporting multiple websites, it does make sense. And if one of the websites needs to scale more rapidly, you can set up a different instance (or cluster) for that one alone, and migration is much simpler.

How to model data using MongoDB

We have a relatively large-scale application that uses a relational DB (MSSQL).
After a lot of reading I've decided that I want to examine using MongoDB rather than MSSQL, mainly because of performance and scale issues.
I've read and studied Mongo and couldn't figure out the answers to the following questions:
Should we do it? Bear in mind we have the time to invest; the only question is "is it good for us?"
How to model our data?
My problem with mongo is that we have a lot of one to many relations in our DB.
After reading this great post (and the second part as well), I've realized a good practice will be to divide the decision into 3 scenarios:
1 to few
1 to many
1 to squillions.
In our db, most of the time we use one-to-many, but the problem is that most of the time it's the same "one".
For example, we have users and transactions tables.
Each user can perform a transaction, so basically what I should do is model the user as follows:
{
"name": "John",
...,
"Transactions" : [ObjectId("..."), ObjectId("..."),...]
}
So far so good. The problem is that we have a lot more than just transactions; for example, we could have posts, requests and many more features like transactions, and then my users collection becomes huge (more than 25 "columns"). And when I want to retrieve a data set I have to do several queries, unlike in MSSQL where I just use a JOIN statement.
Another issue is that I'll have to save a lot of extra data. For example, for each transaction I have to save the terminal ID, and in the report I'll have to show the terminal name. In that case (as far as I understand) I have two choices: one is to do two queries, and the other is to save the terminal name as well. In a relational DB this is a simple join.
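To make the second choice concrete, here is a hedged sketch of a denormalized transaction document as a C# class; the class and field names are invented for illustration:

// Each transaction stores the terminal's id (the "foreign key") plus a
// denormalized copy of its name, so the report needs no second query.
// Trade-off: renaming a terminal means updating every transaction that
// embeds the old name (or treating the name as a historical snapshot).
public class Transaction
{
    public string Id { get; set; }
    public decimal Amount { get; set; }
    public string TerminalId { get; set; }
    public string TerminalName { get; set; } // denormalized for reporting
}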
So maybe for schemas like ours, Mongo (or any other document-based DB) is not the best choice?
I know those are a newbie questions :)
We use C# for our server side (ASP.NET Web API).
Thanks in advance!
You can face some serious issues when modeling your data with approaches 2 and 3:
With one-to-many you may face data inconsistency and/or eventual consistency. Here, you store inside a document an index (an array of references) to external documents. So, for your example, adding a new transaction needs two requests: create the transaction and add its reference to the user (update the document). MongoDB has ACID transactions only at the document level, so your application can, for some reason, create a transaction but fail to add its reference to the user: app failures, network problems, bugs and so on. Of course, you can simulate a db transaction in the app with a try/catch block, cleaning data up when an error occurs. That will help, but not fully, because the app can crash between the two requests.
So, if your app is under high load, after some time you can have a number of "dead" transactions which are not linked to any user. That may not be a big problem if your app doesn't query transactions directly (only via users); you will merely have useless data in the db. Otherwise you will have data inconsistency.
To fix that you need a background job which does the proper cleanup. So, for some period of time your data can be inconsistent: eventual consistency. For some applications that's OK, for others it isn't.
You can face the same problem when deleting transactions.
I agree that a document with 25 arrays of references ("columns") doesn't look very good. Working with such objects manually will be harder (testing, manual data fixes and so on).
One-to-squillions doesn't have this effect, but you need indexes to query efficiently. For a large, sharded db you can see bad performance.
In general, I'd say document dbs are pretty good if your app works mostly with one document (aggregate), doesn't have a lot of references to other docs, and you don't need transactions across docs. Denormalization can also be a source of inconsistency.
Key-value data is very easy to scale. Document dbs are one step closer to a key-value data store. Column-oriented dbs are even closer to key-value, so they can be scaled even better.
I also recommend considering the following measures to improve your SQL Server db's performance:
Caching: perhaps you can cache some of your app's aggregates instead of assembling them (with joins) in the SQL db all the time. For instance, Stack Overflow uses a SQL Server db plus Redis for caching aggregates (questions with their answers, comments and so on).
Tune query performance with indexes, db structure, denormalization and so on.
If your db is hosted on an on-premises SQL Server, then additional memory, SSD disks, table partitioning, data compression and replication can help. As a rule, SQL Server gives good performance with these approaches for dbs up to 1 TB.
CQRS approach.
Consider storing your app's data in different databases. Every type of db has its own strengths and weaknesses: a document DB is good for storing aggregates, a SQL db for relational data, and so on. Complex apps, as a rule, use a few db types.

Reduce SQL Server overhead caching query results

I have an application that does heavy processing based on some files.
In the process I have to query some tables in SQL Server, and this is killing the DB and the application performance (other applications use the same tables).
After optimizing queries and code I'm getting better results, but not enough. After some research I reached a solution: caching some query results. My idea is to cache the rows of one specific table (identified as the overhead) that the file being processed needs.
I was thinking of using AppFabric Caching (I'm on the MS stack). I made some tests and it has large memory usage for small objects (the caching service uses ~350 MB of RAM without any objects). But I need to run some queries against these cached rows (like searching by last name, SSN, birthdate, etc.).
My second option is MongoDB as a cache store. I've researched this, and most people recommend memcached or Redis, but I'm using Windows servers and those are not officially supported on them.
Is using Mongo as a cache store a good approach in this case? Or is AppFabric Caching + tag search better?
It is hard to tell what is better because we don't know enough about your bottlenecks. A lot depends on the nature of the data you're discussing. If the data is very static, is not requested constantly, and compiling the data set is time-consuming, a good solution might be a materialized view. If the data is frequently requested, then you're better off caching it on some server (e.g. AppFabric).
There are many techniques and possibilities. But you really need to think about the network traffic, demand, size, etc., and it is hard to answer this here without knowing all the details.
It looks like you are on the right track, but maybe all you need is just a parameterized query. Hard to tell. But I would add a materialized view to the roster you just posted. Maybe all you need is to build this view from all the data you need and just access its contents.
My question to you would be: what are your long-term goals or estimates for your application? If this is the highest load you are going to experience, then tuning the DB or using materialized views would be an answer. But the long-term solution to this is distributed caching, and you are already thinking along those lines. Your data requirement is what we'd call "reference data" or "lookup data", and once you are executing multiple lookups with limited DB resources there will be performance issues and your DB will become a performance bottleneck.
So the solution, that you are already thinking of, is caching this "reference" data in a cache without the need to go to the database, while, at the same time, keeping cache synchronized with the Database.
I wouldn't be too sure about AppFabric, as it will have the same support issues you mention. What is your budget like? Could you think about spending on a caching solution like NCache?

How can I cache private data in a webfarm for ASP MVC

I am making a member-based web app in ASP MVC3 and I am trying to plan ahead. At first our user base will not be huge, but as with any software the potential for a sudden volume spike is always a possibility.
Thinking ahead to this scenario, I know that the database is the bottleneck area on most web apps. We are using MSSQL 2008 R2. We will have dedicated servers with several client databases; each client has their own database, so if one server begins to bottleneck we can scale vertically, or move some of the databases to a new server and begin filling it up.
To access the databases we primarily use LINQ to SQL, and we are currently refactoring some of our code to make use of the IQueryable mechanisms to do lazy loading of content. But each page contains quite a bit of content from various parts of the database.
We also have a few large databases that are used for widgets in the program that rarely change but have millions of rows. The goal with those is to somehow sync them to the primary source and distribute them across several machines and then load balance those servers.
With this layout should I even worry about caching, or will the built-in caching mechanisms in MSSQL be sufficient?
If so, where should I begin? I have looked briefly at AppFabric, but it looks as though it is for Azure only?
Resources:
How to cache data in a MVC application
http://stephenwalther.com/blog/archive/2008/08/28/asp-net-mvc-tip-39-use-the-velocity-distributed-cache.aspx
http://stephenwalther.com/blog/archive/2008/08/29/asp-net-mvc-tip-40-don-t-cache-pages-that-require-authentication.aspx
Lazy loading is a performance killer. It's better to load the entire object graph with one join than to lazy load other properties. This is especially the case with a list of objects: if you iterate, you'll end up lazy loading for each item in the list. Furthermore, every call to the db has overhead. Fewer calls = better performance.
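For example, with LINQ to SQL (which the question mentions) you can force the join up front with DataLoadOptions; a minimal sketch, assuming a hypothetical Order/Customer model and DataContext:

using System;
using System.Data.Linq;
using System.Linq;

using (var ctx = new ShopDataContext()) // hypothetical LINQ to SQL context
{
    var options = new DataLoadOptions();
    options.LoadWith<Order>(o => o.Customer); // eager-load Customer with each Order
    ctx.LoadOptions = options;

    // One round trip with a join, instead of 1 + N lazy loads inside the loop.
    foreach (var order in ctx.Orders.ToList())
    {
        Console.WriteLine(order.Customer.Name);
    }
}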
SO was a top 1000 website before it needed two database servers. I think you'll be ok.
If your revenue model says "each client will have its own database" then your scaling issues should be really easy to solve. It sounds like you already have a plan to scale up with more servers as your client base increases. What's the problem?
Caching on the web tier is usually the first scaling fix you'll have to worry about. You probably don't need to do a fresh db call with each page request.
Overall this sounds like a lot of premature optimization. Your traffic hasn't reached a point where you need to be worried about scaling. Make these kinds of decisions at the last possible moment.
The database cache is different from most caches: it can of course load used data into memory and re-use query plans, but that isn't really a cache as such.
AppFabric is definitely not just Azure; after all, if it was, you wouldn't be able to install it (and use it) locally :) but in truth there is little between AppFabric, Redis and memcached (the latter lacks persistence, of course).
But I think you should initially look at the inbuilt ASP.NET caching: both data caching via HttpContext.Cache, and caching of entire responses (or, in MVC 3, partials). Obviously you should have a broad idea of what data is used heavily by lots of requests and is safe to re-use: cache that!
Just make sure you treat all cached data as immutable (if you need to update the cache, re-add a modified value; don't modify the existing objects). The reason: it won't work the same if you start needing distributed caching, as that uses serialization, and any changes you make won't be seen by the next request.
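A minimal sketch of that HttpContext.Cache pattern (the cache key, the Widget type and the db call are placeholders), treating the cached list as read-only:

using System;
using System.Collections.Generic;
using System.Web;
using System.Web.Caching;

public static IList<Widget> GetWidgets()
{
    var cache = HttpContext.Current.Cache;
    var widgets = cache["widgets"] as IList<Widget>;
    if (widgets == null)
    {
        widgets = LoadWidgetsFromDatabase(); // placeholder for the real query
        cache.Insert("widgets", widgets, null,
                     DateTime.UtcNow.AddMinutes(10), // absolute expiry
                     Cache.NoSlidingExpiration);
    }
    // Never mutate the returned list; build a new one and re-Insert to update.
    return widgets;
}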

What are the pros and cons to keeping SQL in Stored Procs versus Code [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
What are the advantages/disadvantages of keeping SQL in your C# source code or in Stored Procs? I've been discussing this with a friend on an open source project that we're working on (C# ASP.NET Forum). At the moment, most of the database access is done by building the SQL inline in C# and calling to the SQL Server DB. So I'm trying to establish which, for this particular project, would be best.
So far I have:
Advantages for in Code:
Easier to maintain - don't need to run a SQL script to update queries
Easier to port to another DB - no procs to port
Advantages for Stored Procs:
Performance
Security
I am not a fan of stored procedures
Stored Procedures are MORE maintainable because:
You don't have to recompile your C# app whenever you want to change some SQL
You'll end up recompiling it anyway when datatypes change, or you want to return an extra column, or whatever. The number of times you can 'transparently' change the SQL out from underneath your app is pretty small on the whole
You end up reusing SQL code.
Programming languages, C# included, have this amazing thing, called a function. It means you can invoke the same block of code from multiple places! Amazing! You can then put the re-usable SQL code inside one of these, or if you want to get really high tech, you can use a library which does it for you. I believe they're called Object Relational Mappers, and are pretty common these days.
Code repetition is the worst thing you can do when you're trying to build a maintainable application!
Agreed, which is why storedprocs are a bad thing. It's much easier to refactor and decompose (break into smaller parts) code into functions than SQL into... blocks of SQL?
You have 4 webservers and a bunch of windows apps which use the same SQL code. Now you realize there is a small problem with the SQL code. So do you rather... change the proc in 1 place, or push the code to all the webservers and reinstall all the desktop apps (ClickOnce might help) on all the windows boxes?
Why are your windows apps connecting directly to a central database? That seems like a HUGE security hole right there, and a bottleneck as it rules out server-side caching. Shouldn't they be connecting via a web service or similar to your web servers?
So, push 1 new sproc, or 4 new webservers?
In this case it is easier to push one new sproc, but in my experience, 95% of 'pushed changes' affect the code and not the database. If you're pushing 20 things to the webservers that month, and 1 to the database, you hardly lose much if you instead push 21 things to the webservers, and zero to the database.
More easily code reviewed.
Can you explain how? I don't get this. Particularly seeing as the sprocs probably aren't in source control, and therefore can't be accessed via web-based SCM browsers and so on.
More cons:
Storedprocs live in the database, which appears to the outside world as a black box. Simple things like wanting to put them in source control becomes a nightmare.
There's also the issue of sheer effort. It might make sense to break everything down into a million tiers if you're trying to justify to your CEO why it just cost them 7 million dollars to build some forums, but otherwise creating a storedproc for every little thing is just extra donkeywork for no benefit.
This is being discussed on a few other threads here currently. I'm a consistent proponent of stored procedures, although some good arguments for Linq to Sql are being presented.
Embedding queries in your code couples you tightly to your data model. Stored procedures are a good form of contractual programming, meaning that a DBA has the freedom to alter the data model and the code in the procedure, so long as the contract represented by the stored procedure's inputs and outputs is maintained.
Tuning production databases can be extremely difficult when the queries are buried in the code and not in one central, easy to manage location.
[Edit] Here is another current discussion
In my opinion you can't vote for yes or no on this question. It totally depends on the design of your application.
I totally vote against the use of SPs in a 3-tier environment, where you have an application server in front. In this kind of environment, your application server is there to run your business logic. If you additionally use SPs, you start distributing your implementation of business logic all over your system, and it will become very unclear who is responsible for what. Eventually you will end up with an application server that basically does nothing but the following:
(Pseudocode)
Function createOrder(Order yourOrder)
Begin
Call SP_createOrder(yourOrder)
End
So in the end you have your middle tier running on this very cool 4-server cluster, each machine equipped with 16 CPUs, and it will actually do nothing at all! What a waste!
If you have a fat GUI client that directly connects to your DB, or maybe even more applications, it's a different story. In that situation SPs can serve as a sort of pseudo middle tier that decouples your application from the data model and offers controllable access.
Advantages for in Code:
Easier to maintain - don't need to run a SQL script to update queries
Easier to port to another DB - no procs to port
Actually, I think you have that backwards. IMHO, SQL in code is a pain to maintain because:
you end up repeating yourself in related code blocks
SQL isn't supported as a language in many IDEs, so you have just a series of strings, with no error checking, performing tasks for you
changes in a data type, table name or constraint are far more common than swapping out an entire database for a new one
your level of difficulty increases as your query grows in complexity
and testing an inline query requires building the project
Think of Stored Procs as methods you call from the database object - they are much easier to reuse, there is only one place to edit and in the event that you do change DB providers, the changes happen in your Stored Procs and not in your code.
That said, the performance gains of stored procs are minimal, as Stu said before me, and you can't put a breakpoint in a stored procedure (yet).
CON
I find that doing lots of processing inside stored procedures makes your DB server a single point of inflexibility when it comes to scaling.
However, doing all that crunching in your program, as opposed to on the SQL server, might allow you to scale more if you have multiple servers running your code. Of course, this does not apply to stored procs that only do a normal fetch or update, but to ones that perform more processing, like looping over datasets.
PROS
Performance for what it may be worth (avoids query parsing by DB driver / plan recreation etc)
Data manipulation is not embedded in the C/C++/C# code which means I have less low level code to look through. SQL is less verbose and easier to look through when listed separately.
Due to the separation, folks are able to find and reuse SQL code much more easily.
It's easier to change things when the schema changes: you just have to give the same output to the code and it will work just fine.
Easier to port to a different database.
I can list individual permissions on my stored procedures and control access at that level too.
I can profile my data query/ persistence code separate from my data transformation code.
I can implement changeable conditions in my stored procedure and it would be easy to customize at a customer site.
It becomes easier to use some automated tools to convert my schema and statements together rather than when it is embedded inside my code where I would have to hunt them down.
Ensuring best practices for data access is easier when you have all your data access code inside a single file - I can check for queries that access the non performant table or that which uses a higher level of serialization or select *'s in the code etc.
It becomes easier to find schema changes / data manipulation logic changes when all of it is listed in one file.
It becomes easier to do search and replace edits on SQL when they are in the same place e.g. change / add transaction isolation statements for all stored procs.
I and the DBA guy find that having a separate SQL file is easier / convenient when the DBA has to review my SQL stuff.
Lastly, you don't have to worry about SQL injection attacks because some lazy member of your team did not use parameterized queries when using embedded SQL.
The performance advantage for stored procedures is often negligible.
More advantages for stored procedures:
Prevent reverse engineering (if created With Encryption, of course)
Better centralization of database access
Ability to change data model transparently (without having to deploy new clients); especially handy if multiple programs access the same data model
I fall on the code side. We build a data access layer that's used by all the apps (both web and client), so it's DRY from that perspective. It simplifies database deployment because we just have to make sure the table schemas are correct. And it simplifies code maintenance because we don't have to look in both the source code and the database.
I don't have much problem with the tight coupling with the data model because I don't see where it's possible to really break that coupling. An application and its data are inherently coupled.
Stored procedures.
If an error slips in or the logic changes a bit, you do not have to recompile the project. Plus, it allows access from different sources, not just the one place you coded the query in your project.
I don't think it is harder to maintain stored procedures; you should not code them directly in the database but in separate files first. Then you can just run them on whatever DB you need to set up.
Advantages for Stored procedures:
More easily code reviewed.
Less coupled, therefore more easily tested.
More easily tuned.
Performance is generally better, from the point of view of network traffic - if you have a cursor, or similar, then there aren't multiple trips to the database
You can protect access to the data more easily, remove direct access to the tables, enforce security through the procs - this also allows you to find relatively quickly any code that updates a table.
If there are other services involved (such as Reporting services), you may find it easier to store all of your logic in a stored procedure, rather than in code, and having to duplicate it
Disadvantages:
Harder to manage for the developers: version control of the scripts: does everyone have their own database, is the version control system integrated with the database and IDE?
In some circumstances, dynamically created sql in code can have better performance than a stored proc. If you have created a stored proc (let's say sp_customersearch) that gets extremely complicated with dozens of parameters because it must be very flexible, you can probably generate a much simpler sql statement in code at runtime.
One could argue that this simply moves some processing from SQL to the web server, but in general that would be a good thing.
The other great thing about this technique is that if you're looking in SQL Profiler you can see the query you generated and debug it much more easily than seeing a stored proc call with 20 parameters come in.
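A hedged sketch of that technique: emit only the WHERE clauses for the filters the caller actually supplied, while keeping everything parameterized (the table and column names are invented):

using System.Data.SqlClient;
using System.Text;

static SqlCommand BuildCustomerSearch(SqlConnection conn, string lastName, string city)
{
    var sql = new StringBuilder("SELECT * FROM Customers WHERE 1 = 1");
    var cmd = new SqlCommand { Connection = conn };

    if (!string.IsNullOrEmpty(lastName))
    {
        sql.Append(" AND LastName = @lastName");
        cmd.Parameters.AddWithValue("@lastName", lastName);
    }
    if (!string.IsNullOrEmpty(city))
    {
        sql.Append(" AND City = @city");
        cmd.Parameters.AddWithValue("@city", city);
    }

    cmd.CommandText = sql.ToString();
    return cmd; // a two-filter query instead of a 20-parameter catch-all proc
}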
I like stored procs; I don't know how many times I was able to make a change to an application via a stored procedure without causing any downtime to the application.
I'm a big fan of Transact-SQL, and tuning large queries has proven very useful for me. I haven't written any inline SQL in about 6 years!
You list 2 pro-points for sprocs:
Performance: not really. In SQL Server 2000 or greater the query plan optimisations are pretty good, and cached. I'm sure that Oracle etc. do similar things. I don't think there's a case for sprocs on performance grounds any more.
Security? Why would sprocs be more secure? Unless you have a pretty unsecured database anyway all the access is going to be from your DBAs or via your application. Always parametrise all queries - never inline something from user input and you'll be fine.
That's best practice for performance anyway.
Linq is definitely the way I'd go on a new project right now. See this similar post.
#Keith
Security? Why would sprocs be more secure?
As suggested by Komradekatz, you can disallow access to tables (for the username/password combo that connects to the DB) and allow SP access only. That way if someone gets the username and password to your database they can execute SP's but can't access the tables or any other part of the DB.
(Of course executing sprocs may give them all the data they need but that would depend on the sprocs that were available. Giving them access to the tables gives them access to everything.)
Think of it this way
You have 4 webservers and a bunch of windows apps which use the same SQL code
Now you realize there is a small problem with the SQL code
so do you rather...
change the proc in 1 place
or
push the code to all the webservers, reinstall all the desktop apps(clickonce might help) on all the windows boxes
I prefer stored procs
It is also easier to do performance testing against a proc: put it in Query Analyzer,
SET STATISTICS IO/TIME ON,
SET SHOWPLAN_TEXT ON, and voila:
no need to run Profiler to see exactly what is being called
just my 2 cents
I prefer keeping them in code (using an ORM, not inline or ad-hoc) so they're covered by source control without having to deal with saving out .sql files.
Also, stored procedures aren't inherently more secure. You can write a bad query with a sproc just as easily as inline. Parameterized inline queries can be just as secure as a sproc.
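For instance, a minimal parameterized inline query (the names, connectionString and authorId variable are illustrative); the user input is bound as a parameter, never concatenated:

using System.Data.SqlClient;

using (var conn = new SqlConnection(connectionString)) // connectionString assumed
using (var cmd = new SqlCommand(
    "SELECT Id, Title FROM Posts WHERE AuthorId = @authorId", conn))
{
    cmd.Parameters.AddWithValue("@authorId", authorId);
    conn.Open();
    using (var reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            // ... read columns from the row
        }
    }
}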
Use your app code for what it does best: handling logic.
Use your database for what it does best: storing data.
You can debug stored procedures, but you will find it easier to debug and maintain logic in code.
Usually you will end up recompiling your code every time you change the database model anyway.
Also, stored procedures with optional search parameters are very inefficient because you have to specify all the possible parameters in advance, and complex searches are sometimes not possible because you can't predict how many times a parameter will be repeated in the search.
When it comes to security, stored procedures are much more secure. Some have argued that all access will be through the application anyway. The thing that many people forget is that most security breaches come from inside a company. Think about how many developers know the "hidden" username and password for your application.
Also, as MatthieuF pointed out, performance can be much improved due to fewer round trips between the application (whether it's on a desktop or web server) and the database server.
In my experience the abstraction of the data model through stored procedures also vastly improves maintainability. As someone who has had to maintain many databases in the past, it's such a relief when confronted with a required model change to be able to simply change a stored procedure or two and have the change be completely transparent to ALL outside applications. Many times your application isn't the only one pointed at a database - there are other applications, reporting solutions, etc. so tracking down all of those affected points can be a hassle with open access to the tables.
I'll also put checks in the plus column for putting the SQL programming in the hands of those who specialize in it, and for SPs making it much easier to isolate and test/optimize code.
The one downside that I see is that many languages don't allow the passing of table parameters, so passing an unknown number of data values can be annoying, and some languages still can't handle retrieving multiple resultsets from a single stored procedure (although the latter doesn't make SPs any worse than inline SQL in that respect).
One of the suggestions from a Microsoft TechEd session on security which I attended was to make all calls through stored procs and deny access directly to the tables. This approach was billed as providing additional security. I'm not sure if it's worth it just for security, but if you're already using stored procs, it couldn't hurt.
Definitely easier to maintain if you put it in a stored procedure. If there's difficult logic involved that will potentially change in the future it is definitely a good idea to put it in the database when you have multiple clients connecting. For example I'm working on an application right now that has an end user web interface and an administrative desktop application, both of which share a database (obviously) and I'm trying to keep as much logic on the database as possible. This is a perfect example of the DRY principle.
I'm firmly on the side of stored procs, assuming you don't cheat and use dynamic SQL in the stored proc. First, using stored procs allows the dba to set permissions at the stored proc level and not the table level. This is critical not only for combating SQL injection attacks but for preventing insiders from directly accessing the database and changing things. This is a way to help prevent fraud. No database that contains personal information (SSNs, credit card numbers, etc.) or that in any way creates financial transactions should ever be accessed except through stored procedures. If you use any other method you are leaving your database wide open for individuals in the company to create fake financial transactions or steal data that can be used for identity theft.
Stored procs are also far easier to maintain and performance-tune than SQL sent from the app. They also let the dba see what impact a database structural change will have on the way the data is accessed. I've never met a good dba who would allow dynamic access to the database.
We use stored procedures with Oracle DB's where I work now. We also use Subversion. All the stored procedures are created as .pkb & .pks files and saved in Subversion. I've done in-line SQL before and it is a pain! I much prefer the way we do it here. Creating and testing new stored procedures is much easier than doing it in your code.
Theresa
Smaller logs
Another minor pro for stored procedures that has not been mentioned: when it comes to SQL traffic, sp-based data access generates much less traffic. This becomes important when you monitor traffic for analysis and profiling; the logs will be much smaller and more readable.
I'm not a big fan of stored procedures, but I use them in one condition:
When the query is pretty huge, it's better to store it in the database as a stored procedure instead of sending it from the code. That way, instead of sending a huge amount of SQL text from the application server to the database, only the "EXEC SPNAME" command will be sent.
This matters most when the database server and the web server are not on the same network (for example, internet communication). And even if that's not the case, too much stress means a lot of wasted bandwidth.
But man, they're so terrible to manage. I avoid them as much as I can.
A SQL stored proc doesn't increase the performance of the query
Well obviously using stored procedures has several advantages over constructing SQL in code.
Your code implementation and SQL become independent of each other.
Code is easier to read.
Write once use many times.
Modify once
No need to give the programmer internal details about the database, etc., etc.
Stored Procedures are MORE maintainable because:
You don't have to recompile your C# app whenever you want to change some SQL
You end up reusing SQL code.
Code repetition is the worst thing you can do when you're trying to build a maintainable application!
What happens when you find a logic error that needs to be corrected in multiple places? You're more apt to forget to change that last spot where you copied and pasted your code.
In my opinion, the performance & security gains are an added plus. You can still write insecure/inefficient SQL stored procedures.
Easier to port to another DB - no procs to port
It's not very hard to script out all your stored procedures for creation in another DB. In fact - it's easier than exporting your tables because there are no primary/foreign keys to worry about.
#Terrapin - sprocs are just as vulnerable to injection attacks. As I said:
Always parametrise all queries - never inline something from user input and you'll be fine.
That goes for sprocs and dynamic Sql.
I'm not sure not recompiling your app is an advantage. I mean, you will have run your unit tests against that code (both application and DB) before going live again anyway.
#Guy - yes you're right, sprocs do let you control application users so that they can only perform the sproc, not the underlying action.
My question would be: if all the access is through your app, using connections and users with limited rights to update/insert etc., does this extra level add security or just extra administration?
My opinion is very much the latter. If they've compromised your application to the point where they can re-write it they have plenty of other attacks they can use.
SQL injection can still be performed against those sprocs if they dynamically inline code, so the golden rule still applies: all user input must always be parametrised.
Something that I haven't seen mentioned thus far: the people who know the database best aren't always the people that write the application code. Stored procedures give the database folks a way to interface with programmers that don't really want to learn that much about SQL. Large--and especially legacy--databases aren't the easiest things to completely understand, so programmers might just prefer a simple interface that gives them what they need: let the DBAs figure out how to join the 17 tables to make that happen.
That being said, the languages used to write stored procedures (PL/SQL being a notorious example) are pretty brutal. They typically don't offer any of the niceties you'd see in today's popular imperative, OOP, or functional languages. Think COBOL.
So, stick to stored procedures that merely abstract away the relational details rather than those that contain business logic.
I generally write OO code. I suspect that most of you probably do, too. In that context, it seems obvious to me that all of the business logic - including SQL queries - belongs in the class definitions. Splitting up the logic such that part of it resides in the object model and part is in the database is no better than putting business logic into the user interface.
Much has been said in earlier answers about the security benefits of stored procs. These fall into two broad categories:
1) Restricting direct access to the data. This definitely is important in some cases and, when you encounter one, then stored procs are pretty much your only option. In my experience, such cases are the exception rather than the rule, however.
2) SQL injection/parametrized queries. This objection is a red herring. Inline SQL - even dynamically-generated inline SQL - can be just as fully parametrized as any stored proc and it can be done just as easily in any modern language worth its salt. There is no advantage either way here. ("Lazy developers might not bother with using parameters" is not a valid objection. If you have developers on your team who prefer to just concatenate user data into their SQL instead of using parameters, you first try to educate them, then you fire them if that doesn't work, just like you would with developers who have any other bad, demonstrably detrimental habit.)
I am a huge supporter of code over sprocs. The number one reason is keeping the code tightly coupled; a close second is the ease of source control, without a lot of custom utilities to pull the SQL in.
In our DAL if we have very complex SQL statements, we generally include them as resource files and update them as needed (this could be a separate assembly as well, and swapped out per db, etc...).
This keeps our code and our SQL calls in the same version control, without "forgetting" to run some external application for updates.
