Ensure "Reasonable" queries only - c#

In our organization we have the need to let employees filter data in our web application by supplying WHERE clauses. It's worked great for a long time, but we occasionally run into users providing queries that require full table scans on large tables or inefficient joins, etc.
Some clown might write something like:
select * from big_table where
Name in (select name from some_table where name like '%search everything%')
or name in ('a', 'b', 'c')
or price < 20
or price > 40
or exists (select 1 from some_other_table where col1 + col2 + col3 = 4)
or exists (select 1 from table_a, table+b)
Obviously, this is not a great way to query these tables with computed values, non-indexed columns, lots of OR's and an unrestricted join on table_a and table_b.
But for a user, this may make total sense.
So what's the best way, if any, to allow internal users to supply a query to the database while ensuring that it won't lock a dozen tables and hang the webserver for 5 minutes?
I'm guessing that's a programmatic way in c#/sql-server to get the execution plan for a query before it runs. And if so, what factors contribute to cost? Estimated I/O cost? Estimated CPU cost? What would be reasonable limits at which to tell the user that his query's no good?
EDIT: We're a market research company. We have thousands of surveys, each with their own data. We have dozens of researchers that want to slice that data in arbitrary ways. We have tools to let them construct "valid" filters using a GUI, but some "power users" want to supply their own queries. I realize this isn't standard or best practice, but how else can I let dozens of users query tables for the rows they want using arbitrarily complex conditions and ever-changing conditions?

The premise of your question states:
In our organization we have the need to let employees filter date in our web application by supplying WHERE clauses.
I find this premise to be flawed on its face. I can't imagine a situation where I would allow users to do this. In addition to the problems you have already identified, you are opening yourself up to SQL Injection attacks.
I would highly recommend reassessing your requirements to see if you can't build a safer, more focused way of allowing your users to search.
However, if your users really are sophisticated (and trusted!) enough to be supplying WHERE clauses directly, they need to be educated on what they can and can't submit as a filter.

You can try using the following:
SET SHOWPLAN_ALL ON
GO
SET FMTONLY ON
GO
<<< Your SQL code here >>>
GO
SET FMTONLY OFF
GO
SET SHOWPLAN_ALL OFF
GO
Then you can parse through what you've got. As to where to draw the line on various things, that's going to take some experience. There are some things to watch for, but nothing that is cut and dried. It's often more of an art to examine the query plans than a science.
As others have pointed out though, I think that your problem goes deeper than the technology implications. The fact that you let unqualified people access your database in such a way is the underlying problem. From past experience, I often see this in companies where they are too lazy or too inexperienced to properly capture their application's requirements. I'm not saying that this is necessarily the case with your corporate environment, but that's what I've seen.

In addition of trying to control what the users enter (which is a loosing battle, there will always be a new hire that will come up with an immaginative query), I'd look into Resource Governor, see Managing SQL Server Workloads with Resource Governor. You put the ad-hoc queries into a separate pool and cap the allocated resources. This way you can mitigate the problem by limiting the amount of damage a bad query can do to other tasks.
And you should also consider giving access to the data by other means, like Power Pivot and let users massage their data as hard as they want on their own Excel. Business power users love that, and the impact on the transaciton processign server is minimal.

Instead of allowing employees to directly write (append to) queries, and then trying to calculate the query cost before running it, why not create some kind of Advanced Search or filter feature that is NOT writing SQL you cannot control?

In very large Enterprise originations on internal application this is a common practice. Often during your design phase you will limit the criteria or put sensible limits on data ranges, but once the business gets hold of the app there will be calls from the business unit management to remove the restrictions. In my origination this is a management problem not an engineering issue.
What we did was profile all of the criteria and found the largest offenders, both users and what types of queries caused the most problems and put limitations on some of the queries. Also some very expensive queries that were used on a regular basis were added to the app and the app cached the results and ran the queries when load was low. We also created caned optimized queries for standard users and gave only specified users the ability to search for anything. Just a couple of ideas.

You could make a data model for your database and allow users to use SQL Reporting Services' Report Builder. Its GUI-based and doesn't require writing WHERE clauses, so there should be a limit to how much damage they can do.
Or you could warehouse a copy of the db for the purpose of user queries, update the db every hour or so, and let them go to town... :)

I have worked a few places where this also came up. What we ended up doing was NOT allowing users unconstrained access, and promising to have IT do their best to provide queries when needed. The issue was that the database is fairly complicated, and even if users could write grammatically and syntactically correct SQL, they don't necessarily understand the relationships between the tables. In other words, even if they could write their own SQL they would get the wrong answers. We convinced the users that the risk of making the wrong decision based on a flawed or incomplete understanding of the 200 tables in the database was too high. Better to get the right answer after a day than the wrong one instantly.
The other part of this is what does IT do when user A writes a query and gets 1 answer, then user B writes what he thinks is the same query and gets a different answer? Is it IT's job to find the differences? To fix both pieces of SQL? etc. The bottom line is that I would not allow them access. I would load the system with predefined queries, as others have mentioned, and try to train mgmt why that is the only way it will work in the long run.

If you have so much data and you want to provide to your customers the ability to analyse and view the information as they want to, I strongly recommand to thing about OLAP technologies.

I guess you've never heard of SQL Injection attacks? What if the user enters A DROP DATABASE command after the WHERE clause?

This is the reason that direct SELECT permission is almost never given to users in the vast majority of applications.
A far better approach would be to engineer your application around use cases so that you are able to cover a reasonable percentage of requirements with specifically designed filters/aggregation/layout options.
There are a myriad of ways to do this so some analysis of your specific problem domain will definitely be required together with research into viable methods.
Whilst direct SQL access is the most flexible for your users, long executing queries are likely to be just the start of your headaches. SQL injection is a big concern here, whether it's source is malicious or simply misguided.

(Chad mentioned this in a comment, but I think it deserves to be an answer.)
Maybe you should copy data that needs to be queried ad-hoc into a separate database, to isolate any problems from the majority of users.

Related

Most effective way of storing and managing moderate number of users

In a current project of mine I need to manage and store a moderate number (from 10-100 to 5000+) of users (ID, username, and some other data).
This means I have to be able to find users quickly at runtime, and I have to be able to save and restore the database to continue statistics after a restart of the program. I will also need to register every connect/disconnect/login/logout of a user for the statistics. (And some other data as well, but you get the idea).
In the past, I saved settings and other stuff in encoded textfiles, or serialized the needed objects and wrote them down. But these methods require me to rewrite the whole database on each change, and that's increasingly slowing it down (especially with a growing number of users/entries), isn't it?
Now the question is: What is the best way to do this kind of thing in C#?
Unfortunately, I don't have any experience in SQL or other query languages (except for a bit of LINQ), but that's not posing any problem for me, as I have the time and motivation to learn one (or more if required) for this task.
Most effective is highly subjective based on who you ask even if narrowing down this question to specific needs. If you are storing non-relational data Mongo or some other NoSQL type of database such as Raven DB would be effective. If your data has a relational shape then an RDBMS such as MySQL, SQL Server, or Oracle would be effective. Relational databases are ideal if you are going to have heavy reporting requirements as this allows non-developers more ease of access in writing simple SQL queries against it. But also keeping in mind performance with disk cache persistence that databases provide. Commonly accessed data is stored in memory to save the round trips to the disk (with hybrid drives I suppose accessing some files directly accomplishes the same thing however SSD's are still not as fast as RAM access). So you really need to ask yourself some questions to identify the best solution for you; What is the shape of your data (flat, relational, etc), do you have reporting requirements where less technical team members need to be able to query the data repository, and what are your performance metrics?

Do SQL Queries Need to be that Complicated?

An open-ended question which may not have a "right" answer, but expert input on this would be appreciated.
Do SQL Queries Need to be that Complicated?
From a Web Dev point of view, as C#/.Net progresses, it seems that there are plenty of easy ways (LINQ, Generics) to do a lot of the things that some people tend to do in their SQL queries (sorting, ordering, merging, etc). That being said, since SQL tends to be the processing "bottleneck" for a lot of apps, a lot of the logic for SQL queries is being moved to the business layer.
As this trend continues, I'm seeing less of a need for large SQL queries.
What do you all think? Are you still writing large SQL queries? If so, is it because you need to or because you are more comfortable doing so than working in the business layer?
What's a "large" query?
The "bottleneck" encountered IME is typically because the tables were modeled poorly, compounded by someone constructing SQL queries that has little to no experience with SQL (the most common issue being thinking SQL is procedural when it's actually SET based). Lack of indexing is the next most common issue.
ORM has evolved to support native queries -- clear recognition that ORM simplifies database interaction, but can't perform as well as proper SQL query development.
Keeping the persistence handling in the business layer is justified by desiring database independence (at the risk of performance). Otherwise, it's a waste of money and resources to ignore what the database can handle in far larger loads, in a central location (that can be clustered).
It depends entirely on the processing. If you're trying to do lots of crazy stuff in your SQL which does things like pivoting or text processing, or whatever, and it turns out to be faster to avoid doing it in SQL and process it outside the database server instead, then yes, you were probably using SQL wrong, and the code belongs in the business layer or on the client.
In contrast, SQL excels at set operations, and that's what it should primarily be used for. I've seen an awful lot of applications slowed down because business logic or display code was grabbing a million rows of resultset from the database, bringing them back one at a time, and then throwing 990,000 of them away by doing what's effectively a set operation (JOIN, whatever) outside the database, instead of selecting the 10,000 interesting results using a query on the server and then processing the results of that.
So. It depends on what you mean by "large SQL queries". I feel from the way you're asking the question that what you mean is "overly-complex, non-set-based translations of business/presentation logic into SQL queries that should never have been written in the first place."
in many data-in/data-out cases, no.
in some cases, yes.
If all you need to work with is a simple navigation hierarchy (mainly focusing on parent, sibling, child, etc), then LINQ and it's friends are excellent choices - they reduce the pain (and effort and risk) from the majority of queries. But there are a number of scenarios where it doesn't work so well:
large-scale set-based operations: I can do a wide-ranging query in TSQL without the need to drag that data over the network in one large query, and then (even worse) update each record individually (since in many cases the ORM tools will choose individual UPDATE/INSERT/DELETE operations etc). Not only is this slow, it increases the chances of data drift. So to counter that you might add a transaction - but a long-lived transaction (while you suck a glut of data over the network) is bad
simply: there are a lot of queries where hand-tuning it achieves things that the ORMs simply can't; I had a scenario recently where a relatively basic LINQ query was performing badly. I hand tuned it (using some ROW_NUMBER() etc) and the IO stats went down to only 5% of what they were with the generated query.
there are some queries that are exceptionally difficult to express in some query syntax options, and even if you do - would lead to bad queries. Yet which can be expressed very elegantly in TSQL: example: Linq to Sql: select query with a custom order by
This is a subjective question.
IMO, SQL (or whatever query language you use to access the db) should be as complicated as necessary to solve performance problems.
There are two competing interests:
Performance: This means, load the least amount of data you need in the smallest number of queries.
Maintainability: Load as much as possible (lets say, as it makes sense) with the simplest, most reusable kind of query and do everything else in memory.
So you always need to find your way between performance and maintainability. This is actually nothing special - that's what you do when programming all the time.
Newer ways of doing db queries don't change a lot in this situation. Even if you use NHibernate's HQL, you consider performance and maintainability. You already went a step to maintainability, but you may fall back to SQL to tune some queries.
For me, the deciding factor between writing a giant sql query or a bunch of simple queries and then do everything in the code is usually performance. The latter is preferred but if it goes way too slow, I'll do the former (Sql is optimized for data processing after all).
The reason because I prefer the latter is, that in general my team is more comfortable with code then sql queries. I like sql a lot but if a giant sql query means that I'm only one who can debug/understand it in a reasonable amount of time, that's not a good thing. Another reason is also that with a giant query, you will usually program some business logic in it. If I have a business layer, I prefer too have as much of my business logic there as possible.
Off course, you could decide to stuff all your business logic in stored procedures. Your program is then nothing more then a GUI interface to the API of your database. It depends on the requirements of your project and if your team can handle this.
That said, you give Linq as an alternative technology. I have noticed in my team that thanks to my experience with SQL, I'm very comfortable with Linq while my colleagues are not. The problem on a deeper level is procedural vs set based thinking. Linq is comparable to sql. If you are not comfortable with SQL, chances are you won't be with Linq.

Join queries and when it's too much

I find that I am using a lot of join queries, especially to get statistics about user operations from my database. Queries like this are not uncommon:
from io in db._Owners where io.tenantId == tenantId
join i in db._Instances on io.instanceId equals i.instanceId
join m in db._Machines on i.machineId equals m.machineId
select ...
My app is still not active, so I have no way of judging if these queries will be computationally prohibitive in real-life. My query:
Is there a limit to when doing too many 'joins' is too much, and can that be described without getting real-life operating stats?
What are my alternatives? For example, is it better to just create additional tables to hold statistics that are I update as I go, rather than pulling together different table sources each time I want statistics?
If you do not have performance information then do not optimize.
Premature optimization is the root of all evil.
1) I don't think you'll ever reach the "limit".
2) This is called denomalization, premature denormalization is just wasted effort if you don't know if a problem exists.
I'd say your query looks pretty normal.
1) Is there a limit to when doing too many 'joins' is too much
No, the number of joins isn't an issue so much as the structure of the data within each table, presence and use of indexes and what needs to be done to get data out.
Normalized data is commonly a primary goal in relational DB design. You typically consider denormalization as a means of optimizing queries only as necessary because of the added effort required to maintain data consistency.
If you're really concerned, post your data model ERD (database tables & how they relate) and the database you are using for the project (because not all databases are the same).
Unless you have very high traffic and indexes are properly set, etc., you shouldn't have problems.
For reporting/analysis, some places will create a data warehouse which in its most basic form is a [partially] denormalized copy of your main database. They are easier to report on since one table usually contain most, if not all, the data needed in a report. They can also be much faster to read from since you don't have to join so much. However, they'll require more disk space (duplicated data). If writes are allowed, they'll be slower (have to update all the duplicated data) and you'll have the problem of keeping that duplicated data consistent.
In other words, unless you're only doing reporting (or read-only access), keep the joins.

Why do I need Stored Procedures when I have LINQ to SQL

My understanding of Linq to Sql is it will take my Linq statement and convert it into an equivalent SQL statement.
So
var products = from p in db.Products
where p.Category.CategoryName == "Beverages"
select p
Just turns into
Select * from Products where CategoryName = 'Beverages'
If that's the case, I don't see how stored procedures are useful anymore.
Sprocs are another tool in the box. You might use your fancy automatically-adjusting wrench for 90% of your tasks, but you can't use that shiny thing on stripped nuts. For that a good ol' monkey wrench is your best friend. Unless you break the bolt, in which case you're stuck with assembly.
if that's all you ever did in sql, you didn't need sprocs before!
Security.
I've seen several "security best practice" guidelines which recommend you do all your data access via SP's, and you only grant privileges to execute those SP's.
If a client simply cannot do select or delete on any database tables, the risk may be lower should that client be hacked.
I've never personally worked on a project which worked this way, it always seemed like a giant pain in the backside.
Ah, the subject of many a debate.
Many would argue these days that technologies such as LINQ-to-SQL generate such good SQL these days that the performance advantages are marginal. Personally, I prefer SQL experts tuning SQL performance, not general coders, so I tend to disagree.
However, my main preference for stored procedures has less to do with performance and more to do with security and configuration management.
Much of my architectural work is on service-oriented solutions and by treating the database as a service, it is significantly aided by the use of stored procedures.
Principally, limiting access to the database through stored procedures creates a well-defined interface, limiting the attack surface area and increasing testability. Allowing applications direct access to the underlying data greatly increases the attack surface area, reducing security, and makes impact analysis extremely difficult.
Stored Procedures and Linq to Sql solve different problems.
Linq to Sql is particular to Microsoft SQL Server.
I tend to prefer using stored procedures for several reasons:
it makes the security configuration easier (as mentioned by other posters).
It provides a clearly defined interface for DB access (although responsibility for this could be shifted into other areas, such as a DAL written in C#
I find that the Query Optimizer, in Oracle at least, is able to make more intelligent decisions the more information you give it. This really requires testing with both methods though for your specific scenarios though.
Depending on the developers available, you may have some very good SQL coders who will be better at producing efficient queries if they use sprocs.
The downside is that it can be a pain to keep the code that invokes the sprocs in sync with the database if things are evolving rapidly. The points about producing efficient queries could count as premature optimization. At the end of the day, there is no substitute for benchmarking performance under realistic conditions.
I can think of several good reasons for stored procedures:
When working with bigger tables, it can be hard to generate an efficient query using LINQ to SQL.
A DBA can analyze and troubleshout stored procedures. But think of what happens when two complicated LINQ operations from different front-ends clash.
Stored procedures can enforce data integrity. Deny write access on tables, and allow changes only through stored procedure.
Updating stored procedures is as easy as running ALTER PROCEDURE on a server. If a deployment takes months, and a script minutes, you'll be more flexible with stored procedures.
For a small application that's maintained by one person, stored procedures are probably overkill.
There are significant associated performance improvements on the SQL Server side of things if you use stored procedures in appropriate circumstances.
Stored procedure support for LINQ to SQL was included partly for compatibility with existing systems. This allows developers to migrate from a sproc-based system to a fully LINQ-based system over time, sproc by sproc, rather than forcing developers to make a rush to convert an entire system all at once.
Personally, I don't care for LINQ. I like a separation of the data manipulation stuff and the code stuff. Additionally, the anonymous types that are generated from a LINQ statement cannot be passed-off to other layers of an n-tier application, so either the type needs to be concretely defined, or the LINQ call needs to be made in the UI. Gack!
Additionally, there are the security concerns (whatever user the LINQ code is calling into MS SQL Server under needs to have unfettered access to the data, so if that username/password are compromised, so is the data).
And lastly, LINQ to SQL only works for MS SQL Server (as it comes from MS).
Sprocs have their uses, just like using LINQ does. IMO if an operation is performed multiple times in multiple places then it's a good candidate for "refactoring" into a Stored Proc, as opposed to a LINQ statement that is repeated in different places.
Also, and this is probably blasphemy to a lot of people here, sometimes you should put some logic into the database and then a sproc comes in handy. It's a rare occurrence but sometimes the nature of business rules demands it.
Stored Procedures are useful in many cases, but in General if you are using an ORM you should let the ORM generate the SQL for you. Why should we have to maintain at a minimum of four stored procedures (insert update delete and a single select) for each table.
With that said as people pointed out there are security benefits to using stored procedures. You won't have to grant users read/write to the tables, which is a good protection against SQL Injection.
Stored Procedures are also useful when the logic used to retrieve data is fairly complex. You typicaly see this more in Reporting Scenario's and in which case your probally not using Linq2Sql or some other ORM.
In my opinion if your not generating your SQL but essentially hardcoding it within an app tier, then that should be refactored into stored procedures, and yes there are always exceptions to any rules but in general.
One use of a stored procedure in Linq2Sql might be if you have multiple servers, and are linking to them, you could use a stored procedure to expose data from that other server and manipulate it. This would hide the multiple servers from your application.
Some things can't be done without stored procedures. For instance, at my previous job, there was a stored procedure that return the current value from a row, and incremented it in the same atomic operation such that no two processes every got the same value. I don't remember why this was done instead of using auto-increment, but there was a reason for it.
Reason : Large amounts of data to move from one table to another.
Let's say that once in a while you have to archive items from one table to another or do similar things. With LINQ that would mean to retrieve let's say one million rows from table A into the DBMS client and then insert them into table B.
With a stored procedure things work nice, in sets.
Lots of people have been getting by just fine without them for some time now. If you can do your work securely and efficiently without them, don't feel guilty about going with pure L2S. We're glad to be rid of them # my shop.
You certainly don't "need" stored procedures. But they can come in handy if your domain model requires a complex aggregate Entity and you don't have the luxury/flexibility to modify your database tables to fit your domain model. In this case using Linq-to-SQL or another ORM might result in a very poorly performing set of database calls to construct your Entity. A stored proc can come to the rescue here.
Of course, I would advocate using a methodology or process like TDD/BDD that provides you the flexibility to modify your database tables as needed without much pain. That's always the easier, more maintainable path in my opinion.
Simple example:
select * from Products where GetCategoryType(CategoryName)=1
GetCategoryType can run really fast, because it runs on the DB server.
There's no Linq to SQL substitute for that as far as I know.
I'm coming rather late to this thread. But depending on who you talk to, Linq to SQL is either dead, very dead, or at best a zombie.
In addition, no single tool suits every situation - you need to choose the right tool for the specific job in hand:
Stored procs enable you to enforce complex business rules across multiple client applications.
Stored procs can give you a great security layer.
Stored procs can give you a great abstraction layer.
Stored procs can give you better caching in some circumstances.

What are the pros and cons to keeping SQL in Stored Procs versus Code [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
What are the advantages/disadvantages of keeping SQL in your C# source code or in Stored Procs? I've been discussing this with a friend on an open source project that we're working on (C# ASP.NET Forum). At the moment, most of the database access is done by building the SQL inline in C# and calling to the SQL Server DB. So I'm trying to establish which, for this particular project, would be best.
So far I have:
Advantages for in Code:
Easier to maintain - don't need to run a SQL script to update queries
Easier to port to another DB - no procs to port
Advantages for Stored Procs:
Performance
Security
I am not a fan of stored procedures
Stored Procedures are MORE maintainable because:
* You don't have to recompile your C# app whenever you want to change some SQL
You'll end up recompiling it anyway when datatypes change, or you want to return an extra column, or whatever. The number of times you can 'transparently' change the SQL out from underneath your app is pretty small on the whole
You end up reusing SQL code.
Programming languages, C# included, have this amazing thing, called a function. It means you can invoke the same block of code from multiple places! Amazing! You can then put the re-usable SQL code inside one of these, or if you want to get really high tech, you can use a library which does it for you. I believe they're called Object Relational Mappers, and are pretty common these days.
Code repetition is the worst thing you can do when you're trying to build a maintainable application!
Agreed, which is why storedprocs are a bad thing. It's much easier to refactor and decompose (break into smaller parts) code into functions than SQL into... blocks of SQL?
You have 4 webservers and a bunch of windows apps which use the same SQL code Now you realized there is a small problem with the SQl code so do you rather...... change the proc in 1 place or push the code to all the webservers, reinstall all the desktop apps(clickonce might help) on all the windows boxes
Why are your windows apps connecting directly to a central database? That seems like a HUGE security hole right there, and bottleneck as it rules out server-side caching. Shouldn't they be connecting via a web service or similar to your web servers?
So, push 1 new sproc, or 4 new webservers?
In this case it is easier to push one new sproc, but in my experience, 95% of 'pushed changes' affect the code and not the database. If you're pushing 20 things to the webservers that month, and 1 to the database, you hardly lose much if you instead push 21 things to the webservers, and zero to the database.
More easily code reviewed.
Can you explain how? I don't get this. Particularly seeing as the sprocs probably aren't in source control, and therefore can't be accessed via web-based SCM browsers and so on.
More cons:
Storedprocs live in the database, which appears to the outside world as a black box. Simple things like wanting to put them in source control becomes a nightmare.
There's also the issue of sheer effort. It might make sense to break everything down into a million tiers if you're trying to justify to your CEO why it just cost them 7 million dollars to build some forums, but otherwise creating a storedproc for every little thing is just extra donkeywork for no benefit.
This is being discussed on a few other threads here currently. I'm a consistent proponent of stored procedures, although some good arguments for Linq to Sql are being presented.
Embedding queries in your code couples you tightly to your data model. Stored procedures are a good form of contractual programming, meaning that a DBA has the freedom to alter the data model and the code in the procedure, so long as the contract represented by the stored procedure's inputs and outputs is maintained.
Tuning production databases can be extremely difficult when the queries are buried in the code and not in one central, easy to manage location.
[Edit] Here is another current discussion
In my opinion you can't vote for yes or no on this question. It totally depends on the design of your application.
I totally vote against the use of SPs in an 3-tier environment, where you have an application server in front. In this kind of environment your application server is there to run your business logic. If you additionally use SPs you start distributing your implementation of business logic all over your system and it will become very unclear who is responsible for what. Eventually you will end up with an application server that will basically do nothing but the following:
(Pseudocode)
Function createOrder(Order yourOrder)
Begin
Call SP_createOrder(yourOrder)
End
So in the end you have your middle tier running on this very cool 4 Server cluster each of them equipped with 16 CPUs and it will actually do nothing at all! What a waste!
If you have a fat gui client that directly connects to your DB or maybe even more applications it's a different story. In this situation SPs can serve as some sort of pseudo middle tier that decouples your application from the data model and offers a controllable access.
Advantages for in Code:
Easier to maintain - don't need to run a SQL script to update queries
Easier to port to another DB - no procs to port
Actually, I think you have that backwards. IMHO, SQL in code is pain to maintain because:
you end up repeating yourself in related code blocks
SQL isn't supported as a language in many IDE's so you have just a series of un-error checked strings performing tasks for you
changes in a data type, table name or constraint are far more prevalent than swapping out an entire databases for a new one
your level of difficulty increases as your query grows in complexity
and testing an inline query requires building the project
Think of Stored Procs as methods you call from the database object - they are much easier to reuse, there is only one place to edit and in the event that you do change DB providers, the changes happen in your Stored Procs and not in your code.
That said, the performance gains of stored procs is minimal as Stu said before me and you can't put a break point in a stored procedure (yet).
CON
I find that doing lots of processing inside stored procedures would make your DB server a single point of inflexibility, when it comes to scaling your act.
However doing all that crunching in your program as opposed to the sql-server, might allow you to scale more if you have multiple servers that runs your code. Of-course this does not apply to stored procs that only does the normal fetch or update but to ones that perform more processing like looping over datasets.
PROS
Performance for what it may be worth (avoids query parsing by DB driver / plan recreation etc)
Data manipulation is not embedded in the C/C++/C# code which means I have less low level code to look through. SQL is less verbose and easier to look through when listed separately.
Due to the separation folks are able to find and reuse SQL code much easier.
Its easier to change things when schema changes - you just have to give the same output to the code and it will work just fine
Easier to port to a different database.
I can list individual permissions on my stored procedures and control access at that level too.
I can profile my data query/ persistence code separate from my data transformation code.
I can implement changeable conditions in my stored procedure and it would be easy to customize at a customer site.
It becomes easier to use some automated tools to convert my schema and statements together rather than when it is embedded inside my code where I would have to hunt them down.
Ensuring best practices for data access is easier when you have all your data access code inside a single file - I can check for queries that access the non performant table or that which uses a higher level of serialization or select *'s in the code etc.
It becomes easier to find schema changes / data manipulation logic changes when all of it is listed in one file.
It becomes easier to do search and replace edits on SQL when they are in the same place e.g. change / add transaction isolation statements for all stored procs.
I and the DBA guy find that having a separate SQL file is easier / convenient when the DBA has to review my SQL stuff.
Lastly you don't have to worry about SQL injection attacks because some lazy member of your team did not use parametrized queries when using embedded sqls.
The performance advantage for stored procedures is often negligable.
More advantages for stored procedures:
Prevent reverse engineering (if created With Encryption, of course)
Better centralization of database access
Ability to change data model transparently (without having to deploy new clients); especially handy if multiple programs access the same data model
I fall on the code side. We build data access layer that's used by all all the apps (both web and client), so it's DRY from that perspective. It simplifies the database deployment because we just have to make sure the table schema's are correct. It simplifies code maintenance because we don't have to look at source code and the database.
I don't have much problem with the tight coupling with the data model because I don't see where it's possible to really break that coupling. An application and its data are inherently coupled.
Stored procedures.
If an error slips or the logic changes a bit, you do not have to recompile the project. Plus, it allows access from different sources, not just the one place you coded the query in your project.
I don't think it is harder to maintain stored procedures, you should not code them directly in the database but in separate files first, then you can just run them on whatever DB you need to set-up.
Advantages for Stored procedures:
More easily code reviewed.
Less coupled, therefore more easily tested.
More easily tuned.
Performance is generally better, from the point of view of network traffic - if you have a cursor, or similar, then there aren't multiple trips to the database
You can protect access to the data more easily, remove direct access to the tables, enforce security through the procs - this also allows you to find relatively quickly any code that updates a table.
If there are other services involved (such as Reporting services), you may find it easier to store all of your logic in a stored procedure, rather than in code, and having to duplicate it
Disadvantages:
Harder to manage for the developers: version control of the scripts: does everyone have their own database, is the version control system integrated with the database and IDE?
In some circumstances, dynamically created sql in code can have better performance than a stored proc. If you have created a stored proc (let's say sp_customersearch) that gets extremely complicated with dozens of parameters because it must be very flexible, you can probably generate a much simpler sql statement in code at runtime.
One could argue that this simply moves some processing from SQL to the web server, but in general that would be a good thing.
The other great thing about this technique is that if you're looking in SQL profiler you can see the query you generated and debug it much easier than seeing a stored proc call with 20 parameters come in.
I like stored procs, dont know how many times I was able to make a change to an application using a stored procedure which didn't produce any downtime to the application.
Big fan of Transact SQL, tuning large queries have proven to be very useful for me. Haven't wrote any inline SQL in about 6 years!
You list 2 pro-points for sprocs:
Performance - not really. In Sql 2000 or greater the query plan optimisations are pretty good, and cached. I'm sure that Oracle etc do similar things. I don't think there's a case for sprocs for performance any more.
Security? Why would sprocs be more secure? Unless you have a pretty unsecured database anyway all the access is going to be from your DBAs or via your application. Always parametrise all queries - never inline something from user input and you'll be fine.
That's best practice for performance anyway.
Linq is definitely the way I'd go on a new project right now. See this similar post.
#Keith
Security? Why would sprocs be more secure?
As suggested by Komradekatz, you can disallow access to tables (for the username/password combo that connects to the DB) and allow SP access only. That way if someone gets the username and password to your database they can execute SP's but can't access the tables or any other part of the DB.
(Of course executing sprocs may give them all the data they need but that would depend on the sprocs that were available. Giving them access to the tables gives them access to everything.)
Think of it this way
You have 4 webservers and a bunch of windows apps which use the same SQL code
Now you realized there is a small problem with the SQl code
so do you rather......
change the proc in 1 place
or
push the code to all the webservers, reinstall all the desktop apps(clickonce might help) on all the windows boxes
I prefer stored procs
It is also easier to do performance testing against a proc, put it in query analyzer
set statistics io/time on
set showplan_text on and voila
no need to run profiler to see exactly what is being called
just my 2 cents
I prefer keeping in them in code (using an ORM, not inline or ad-hoc) so they're covered by source control without having to deal with saving out .sql files.
Also, stored procedures aren't inherently more secure. You can write a bad query with a sproc just as easily as inline. Parameterized inline queries can be just as secure as a sproc.
Use your app code as what it does best: handle logic.
User your database for what it does best: store data.
You can debug stored procedures but you will find easier to debug and maintaing logic in code.
Usually you will end recompiling your code every time you change the database model.
Also stored procedures with optional search parameters are very inneficient because you have to specify in advance all the possible parameters and complex searches are sometimes not possible because you cant predict how many times a parameter is going to be repeated in the seach.
When it comes to security, stored procedures are much more secure. Some have argued that all access will be through the application anyway. The thing that many people are forgetting is that most security breaches come from inside a company. Think about how many developers know the "hidden" user name and password for your application?
Also, as MatthieuF pointed out, performance can be much improved due to fewer round trips between the application (whether it's on a desktop or web server) and the database server.
In my experience the abstraction of the data model through stored procedures also vastly improves maintainability. As someone who has had to maintain many databases in the past, it's such a relief when confronted with a required model change to be able to simply change a stored procedure or two and have the change be completely transparent to ALL outside applications. Many times your application isn't the only one pointed at a database - there are other applications, reporting solutions, etc. so tracking down all of those affected points can be a hassle with open access to the tables.
I'll also put checks in the plus column for putting the SQL programming in the hands of those who specialize in it, and for SPs making it much easier to isolate and test/optimize code.
The one downside that I see is that many languages don't allow the passing of table parameters, so passing an unknown number data values can be annoying, and some languages still can't handle retrieving multiple resultsets from a single stored procedure (although the latter doesn't make SPs any worse than inline SQL in that respect).
One of the suggestions from a Microsoft TechEd sessions on security which I attended, to make all calls through stored procs and deny access directly to the tables. This approach was billed as providing additional security. I'm not sure if it's worth it just for security, but if you're already using stored procs, it couldn't hurt.
Definitely easier to maintain if you put it in a stored procedure. If there's difficult logic involved that will potentially change in the future it is definitely a good idea to put it in the database when you have multiple clients connecting. For example I'm working on an application right now that has an end user web interface and an administrative desktop application, both of which share a database (obviously) and I'm trying to keep as much logic on the database as possible. This is a perfect example of the DRY principle.
I'm firmly on the side of stored procs assuming you don't cheat and use dynamic SQL in the stored proc. First, using stored procs allows the dba to set permissions at the stored proc level and not the table level. This is critical not only to combating SQL injection attacts but towards preventing insiders from directly accessing the database and changing things. This is a way to help prevent fraud. No database that contains personal information (SSNs, Credit card numbers, etc) or that in anyway creates financial transactions should ever be accessed except through strored procedures. If you use any other method you are leaving your database wide open for individuals in the company to create fake financial transactions or steal data that can be used for identity theft.
Stored procs are also far easier to maintain and performance tune than SQL sent from the app. They also allow the dba a way to see what the impact of a database structural change will have on the way the data is accessed. I've never met a good dba who would allow dynamic access to the database.
We use stored procedures with Oracle DB's where I work now. We also use Subversion. All the stored procedures are created as .pkb & .pks files and saved in Subversion. I've done in-line SQL before and it is a pain! I much prefer the way we do it here. Creating and testing new stored procedures is much easier than doing it in your code.
Theresa
Smaller logs
Another minor pro for stored procedures that has not been mentioned: when it comes to SQL traffic, sp-based data access generates much less traffic. This becomes important when you monitor traffic for analysis and profiling - the logs will be much smaller and readable.
I'm not a big fan of stored procedures, but I use them in one condition:
When the query is pretty huge, it's better to store it in the database as a stored procedure instead of sending it from the code. That way, instead of sending huge ammounts of string characters from the application server to the database, only the "EXEC SPNAME" command will be sent.
This is overkill when the database server and the web server are not on the same network (For example, internet communication). And even if that's not the case, too much stress means a lot of wasted bandwith.
But man, they're so terrible to manage. I avoid them as much as I can.
A SQL stored proc doesn't increase the performance of the query
Well obviously using stored procedures has several advantages over constructing SQL in code.
Your code implementation and SQL become independent of each other.
Code is easier to read.
Write once use many times.
Modify once
No need to give internal details to the programmer about the database. etc , etc.
Stored Procedures are MORE maintainable because:
You don't have to recompile your C# app whenever you want to change some SQL
You end up reusing SQL code.
Code repetition is the worst thing you can do when you're trying to build a maintainable application!
What happens when you find a logic error that needs to be corrected in multiple places? You're more apt to forget to change that last spot where you copy & pasted your code.
In my opinion, the performance & security gains are an added plus. You can still write insecure/inefficient SQL stored procedures.
Easier to port to another DB - no procs to port
It's not very hard to script out all your stored procedures for creation in another DB. In fact - it's easier than exporting your tables because there are no primary/foreign keys to worry about.
#Terrapin - sprocs are just as vulnerable to injection attacks. As I said:
Always parametrise all queries - never inline something from user input and you'll be fine.
That goes for sprocs and dynamic Sql.
I'm not sure not recompiling your app is an advantage. I mean, you have run your unit tests against that code (both application and DB) before going live again anyway.
#Guy - yes you're right, sprocs do let you control application users so that they can only perform the sproc, not the underlying action.
My question would be: if all the access it through your app, using connections and users with limited rights to update/insert etc, does this extra level add security or extra administration?
My opinion is very much the latter. If they've compromised your application to the point where they can re-write it they have plenty of other attacks they can use.
Sql injections can still be performed against those sprocs if they dynamically inline code, so the golden rule still applies, all user input must always be parametrised.
Something that I haven't seen mentioned thus far: the people who know the database best aren't always the people that write the application code. Stored procedures give the database folks a way to interface with programmers that don't really want to learn that much about SQL. Large--and especially legacy--databases aren't the easiest things to completely understand, so programmers might just prefer a simple interface that gives them what they need: let the DBAs figure out how to join the 17 tables to make that happen.
That being said, the languages used to write stored procedures (PL/SQL being a notorious example) are pretty brutal. They typically don't offer any of the niceties you'd see in today's popular imperative, OOP, or functional languages. Think COBOL.
So, stick to stored procedures that merely abstract away the relational details rather than those that contain business logic.
I generally write OO code. I suspect that most of you probably do, too. In that context, it seems obvious to me that all of the business logic - including SQL queries - belongs in the class definitions. Splitting up the logic such that part of it resides in the object model and part is in the database is no better than putting business logic into the user interface.
Much has been said in earlier answers about the security benefits of stored procs. These fall into two broad categories:
1) Restricting direct access to the data. This definitely is important in some cases and, when you encounter one, then stored procs are pretty much your only option. In my experience, such cases are the exception rather than the rule, however.
2) SQL injection/parametrized queries. This objection is a red herring. Inline SQL - even dynamically-generated inline SQL - can be just as fully parametrized as any stored proc and it can be done just as easily in any modern language worth its salt. There is no advantage either way here. ("Lazy developers might not bother with using parameters" is not a valid objection. If you have developers on your team who prefer to just concatenate user data into their SQL instead of using parameters, you first try to educate them, then you fire them if that doesn't work, just like you would with developers who have any other bad, demonstrably detrimental habit.)
I am a huge supporter of code over SPROC's. The number one reasons is keeping the code tightly coupled, then a close second is the ease of source control without a lot of custom utilities to pull it in.
In our DAL if we have very complex SQL statements, we generally include them as resource files and update them as needed (this could be a separate assembly as well, and swapped out per db, etc...).
This keeps our code and our sql calls stored in the same version control, without "forgetting" to run some external applications for updating.

Categories