Do SQL Queries Need to be that Complicated? - c#

An open-ended question which may not have a "right" answer, but expert input on this would be appreciated.
Do SQL Queries Need to be that Complicated?
From a Web Dev point of view, as C#/.Net progresses, it seems that there are plenty of easy ways (LINQ, Generics) to do a lot of the things that some people tend to do in their SQL queries (sorting, ordering, merging, etc). That being said, since SQL tends to be the processing "bottleneck" for a lot of apps, a lot of the logic for SQL queries is being moved to the business layer.
As this trend continues, I'm seeing less of a need for large SQL queries.
What do you all think? Are you still writing large SQL queries? If so, is it because you need to or because you are more comfortable doing so than working in the business layer?

What's a "large" query?
The "bottleneck" encountered IME is typically because the tables were modeled poorly, compounded by the queries being written by someone with little to no SQL experience (the most common issue being thinking SQL is procedural when it's actually set-based). Lack of indexing is the next most common issue.
ORMs have evolved to support native queries -- a clear recognition that an ORM simplifies database interaction but can't perform as well as properly developed SQL queries.
Keeping the persistence handling in the business layer is justified if you want database independence (at the risk of performance). Otherwise, it's a waste of money and resources to ignore that the database can handle far larger loads, in a central location (one that can be clustered).

It depends entirely on the processing. If you're trying to do lots of crazy stuff in your SQL which does things like pivoting or text processing, or whatever, and it turns out to be faster to avoid doing it in SQL and process it outside the database server instead, then yes, you were probably using SQL wrong, and the code belongs in the business layer or on the client.
In contrast, SQL excels at set operations, and that's what it should primarily be used for. I've seen an awful lot of applications slowed down because business logic or display code was grabbing a million rows of resultset from the database, bringing them back one at a time, and then throwing 990,000 of them away by doing what's effectively a set operation (JOIN, whatever) outside the database, instead of selecting the 10,000 interesting results using a query on the server and then processing the results of that.
So. It depends on what you mean by "large SQL queries". I feel from the way you're asking the question that what you mean is "overly-complex, non-set-based translations of business/presentation logic into SQL queries that should never have been written in the first place."
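The "throw away 990,000 rows on the client" anti-pattern is easy to demonstrate. Here is a minimal, runnable sketch -- in Python with sqlite3 purely so it is self-contained (the thread's examples are C#/SQL Server), with an invented orders table:

```python
import sqlite3

# In-memory database with a hypothetical orders table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, total REAL)")
conn.executemany("INSERT INTO orders (region, total) VALUES (?, ?)",
                 [("EU", 10.0), ("US", 25.0), ("EU", 40.0), ("APAC", 5.0)])

# Anti-pattern: pull every row to the client, then filter in application code.
all_rows = conn.execute("SELECT region, total FROM orders").fetchall()
eu_client_side = [r for r in all_rows if r[0] == "EU"]

# Set-based: let the database do the filtering and aggregation,
# returning only the one number the application actually needs.
eu_total = conn.execute(
    "SELECT SUM(total) FROM orders WHERE region = ?", ("EU",)
).fetchone()[0]

print(len(eu_client_side), eu_total)  # 2 50.0
```

With four rows the difference is invisible; with a million rows, the first approach ships the whole table over the network while the second ships one value.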

In many data-in/data-out cases, no.
In some cases, yes.
If all you need to work with is a simple navigation hierarchy (mainly parent, sibling, child, etc.), then LINQ and its friends are excellent choices - they reduce the pain (and effort and risk) of the majority of queries. But there are a number of scenarios where they don't work so well:
large-scale set-based operations: in TSQL I can run a wide-ranging query without dragging that data over the network in one large query and then (even worse) updating each record individually (in many cases ORM tools will issue individual UPDATE/INSERT/DELETE operations). Not only is this slow, it increases the chances of data drift. To counter that you might add a transaction - but a long-lived transaction (held open while you suck a glut of data over the network) is bad
simply: there are a lot of queries where hand-tuning achieves things that ORMs simply can't. I had a scenario recently where a relatively basic LINQ query was performing badly; I hand-tuned it (using ROW_NUMBER() etc.) and the IO stats went down to only 5% of what they were with the generated query.
there are some queries that are exceptionally difficult to express in some query syntaxes, and even if you manage it, the result is a bad query - yet they can be expressed very elegantly in TSQL. Example: Linq to Sql: select query with a custom order by
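The first bullet -- per-record ORM updates versus one set-based statement -- can be sketched as follows. This uses Python with sqlite3 only to keep it self-contained and runnable; the products table and the doubling of prices are invented for illustration:

```python
import sqlite3

def fresh_db():
    # Hypothetical products table, purely for illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, price REAL)")
    conn.executemany("INSERT INTO products (price) VALUES (?)",
                     [(10.0,), (20.0,), (30.0,)])
    return conn

# ORM-style: drag every row to the client, then issue one UPDATE per record.
# Against a real server each statement is a separate network round trip.
a = fresh_db()
for pid, price in a.execute("SELECT id, price FROM products").fetchall():
    a.execute("UPDATE products SET price = ? WHERE id = ?", (price * 2, pid))

# Set-based: one statement; the data never leaves the database.
b = fresh_db()
b.execute("UPDATE products SET price = price * 2")

rows_a = [p for (p,) in a.execute("SELECT price FROM products ORDER BY id")]
rows_b = [p for (p,) in b.execute("SELECT price FROM products ORDER BY id")]
print(rows_a == rows_b, rows_a)  # True [20.0, 40.0, 60.0]
```

Both routes produce identical data; the difference is the N+1 round trips and the window for data drift that the row-by-row version opens up.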

This is a subjective question.
IMO, SQL (or whatever query language you use to access the db) should be as complicated as necessary to solve performance problems.
There are two competing interests:
Performance: This means, load the least amount of data you need in the smallest number of queries.
Maintainability: Load as much as possible (let's say, as far as it makes sense) with the simplest, most reusable kind of query and do everything else in memory.
So you always need to find your way between performance and maintainability. This is actually nothing special - that's what you do when programming all the time.
Newer ways of doing db queries don't change a lot in this situation. Even if you use NHibernate's HQL, you consider performance and maintainability. You already went a step to maintainability, but you may fall back to SQL to tune some queries.

For me, the deciding factor between writing a giant SQL query and writing a bunch of simple queries and doing everything in code is usually performance. The latter is preferred, but if it goes way too slow, I'll do the former (SQL is optimized for data processing, after all).
The reason I prefer the latter is that, in general, my team is more comfortable with code than with SQL queries. I like SQL a lot, but if a giant SQL query means I'm the only one who can debug or understand it in a reasonable amount of time, that's not a good thing. Another reason is that a giant query usually ends up containing some business logic. If I have a business layer, I prefer to have as much of my business logic there as possible.
Of course, you could decide to stuff all your business logic in stored procedures. Your program is then nothing more than a GUI over the API of your database. It depends on the requirements of your project and whether your team can handle that.
That said, you give LINQ as an alternative technology. I have noticed in my team that, thanks to my experience with SQL, I'm very comfortable with LINQ while my colleagues are not. The deeper problem is procedural vs. set-based thinking. LINQ is comparable to SQL: if you are not comfortable with SQL, chances are you won't be with LINQ either.

Related

Entity Framework 4 vs Native Ado.net

I was wondering how Entity Framework 4 compares to native ADO.NET and stored procedures.
What would I be missing if I used plain ADO.NET?
Is it worth leaving EF4?
In a nutshell, EF is an object-relational mapper (ORM), and ADO.Net is raw power. An ORM allows you to trade some runtime performance for ease of maintenance. You gain the ability to write code in a more declarative manner, expressing what you want out of the database instead of exactly how to go about getting it. As a result, changes to the database structure can be accounted for in the mappings rather than in every single part of your application that needed to touch the particular table that changed.
What you would be missing if you use ADO.Net is developer productivity. Describing each database operation in detail to ADO.Net is time consuming, error-prone, and not much fun.
I don't think I would ever want to "leave" an ORM and go back to raw ADO.Net except in situations in which extreme performance is required, such as importing large amounts of data, in which case you might be better off writing an SSIS package anyway.
EF is not suited for "crunching" large amounts of data: statistical or financial data with lots of abstract entities, for example. Otherwise it's fine. And unless you're suffering from performance issues, it's fine too. Also, nothing stops you from using both approaches at the same time.
EF feels more natural, but if you are a hardcore SQL user, it might feel weak and odd at first. But I like doing everything on the C# side: fewer maintenance issues, fewer headaches, fewer magic strings.
Anyway, as for performance: unless you are doing mass inserts or updates, you won't see any difference.
If you use plain ADO.Net, without some kind of OR/M wrapped around it, you would still be working with records, not classes with behaviours and methods on them. You would need an additional business layer tied to the records.

Which one can have better performance - LINQ to EF or NHibernate?

I want to start working on a big project. I have researched performance issues with LINQ to EF and NHibernate, and I want to use one of them as the ORM in my project. Now my question is: which of these two ORMs will give me better performance? I will use SQL Server 2008 as the database and C# as the programming language.
Neither one will have "better performance."
When analyzing performance, you need to look at the limiting factor. The limiting factor in this case will not be the ORM you choose, but rather how you use that tool, how you write your queries, and how you optimize the database backend.
Therefore, the "fastest" ORM will be the one which you can use correctly, coupled with the database server you best understand.
The ORM itself does have a certain amount of overhead, so the "fastest", in terms of sheer performance, is to use none at all. However, this favors the computer's time over your development time, which is typically not a good trade-off. ORMs can save large amounts of development time while imposing only a small overhead when used correctly.
Typically when people experience performance problems when using an ORM it is because they are using the ORM incorrectly, rather than because they picked the "wrong" ORM.
We're currently using Fluent NHibernate on one of our projects (with web services, so that adds additional lag) and, as far as I can see, data access is pretty much instantaneous (from a human perspective).
Maybe someone can provide answer with concrete numbers though.
Since these two ORMs are somewhat different, it'd be better to decide on which one to use with regard to your specific needs, rather than performance (which, like I said, shouldn't be a big deal).
Here's a nice benchmark. As you can see results depend on whether you are doing SELECT, UPDATE, DELETE.

What is the overhead associated with .NET stored procedures executing on Sql Server?

Certainly there is some context switching, marshaling, serializing, etc. that takes place when you choose to write a stored procedure in .NET vs. T-SQL.
Are there any hard facts about this overhead as it relates to making a decision about using .NET vs T/SQL for stored procedures?
What factors do you use in order to make the decision?
For me, developing in C# is 1000% faster than developing in T/SQL due to my learning curve, but when does this gain not offset the performance hit (if there even is one)?
I'd say it depends.
Some things you will find are better with CLR procedures:
Using native .NET objects -- file system, network, etc.
Using features not offered by T-SQL, or not as good as .NET's, e.g. regular expressions
For others you will find T-SQL procedures are better, either in terms of ease of use or raw execution speed:
Set-based logic (where abc in def, or abc not in ghi)
CRUD operations
The overhead of switching, marshaling, etc. is irrelevant -- unless your data is not already in the database.
That said, I tend to write almost no stored procedures and instead do as much as possible with T-SQL -- wrapped in an ORM.
Really, that is the answer here. Look into Entity Framework, NHibernate, Linq To SQL, etc. Get out of writing stored procs (I can hear the downvotes now) and do it with C#.
EDIT: Added more info
Stored procs are hard to test (inherently data-bound), they offer no speed benefit over dynamic SQL, and they really are not any more secure. That said, for things like sorting and filtering, a relational database is more efficient than C# will be.
But if you have to use stored procs (and there are times), use T-SQL as much as possible. If you have a substantial amount of logic to perform that is not set-based, then break out C#.
The one time I did this was to add some extra date processing to my database. The company I was at used a 5-4-4 week-to-month fiscal calendar. That was almost impossible to do in SQL, so I used C# for it.
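The week-to-month mapping of a 5-4-4 fiscal calendar is exactly the kind of branchy, non-set-based logic that is painful in SQL but trivial in procedural code. A hypothetical sketch (in Python rather than C#, to keep it self-contained; the company's exact rules aren't given, so this assumes weeks group 5-4-4 within each 13-week quarter):

```python
def fiscal_period(week):
    # Map a week number (1-52) to a fiscal month under a 5-4-4 calendar:
    # each 13-week quarter is split into "months" of 5, 4 and 4 weeks.
    # (Assumed rules -- the original calendar's details are not in the thread.)
    quarter, w = divmod(week - 1, 13)
    month_in_quarter = 0 if w < 5 else (1 if w < 9 else 2)
    return quarter * 3 + month_in_quarter + 1

print([fiscal_period(w) for w in (1, 5, 6, 10, 14, 52)])  # [1, 1, 2, 3, 4, 12]
```

Three lines of procedural code; expressing the same conditional arithmetic as a pure set-based query is possible but far less readable, which is the point the answer is making.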
It's sort of a religious question, but there can be substantial differences depending on what you're doing, your schema, and how many records you're doing it with. Testing is really the best way to find out for your circumstances.
In general, stored procedures will scale better; but that may not be true in your case, or it may not be an important consideration for you.

Why do I need Stored Procedures when I have LINQ to SQL

My understanding of Linq to Sql is it will take my Linq statement and convert it into an equivalent SQL statement.
So
var products = from p in db.Products
where p.Category.CategoryName == "Beverages"
select p
Just turns into
Select * from Products where CategoryName = 'Beverages'
If that's the case, I don't see how stored procedures are useful anymore.
Sprocs are another tool in the box. You might use your fancy automatically-adjusting wrench for 90% of your tasks, but you can't use that shiny thing on stripped nuts. For that a good ol' monkey wrench is your best friend. Unless you break the bolt, in which case you're stuck with assembly.
If that's all you ever did in SQL, you didn't need sprocs before!
Security.
I've seen several "security best practice" guidelines which recommend you do all your data access via SP's, and you only grant privileges to execute those SP's.
If a client simply cannot do select or delete on any database tables, the risk may be lower should that client be hacked.
I've never personally worked on a project which worked this way, it always seemed like a giant pain in the backside.
Ah, the subject of many a debate.
Many would argue that technologies such as LINQ-to-SQL generate such good SQL these days that the performance advantages are marginal. Personally, I prefer SQL experts tuning SQL performance, not general coders, so I tend to disagree.
However, my main preference for stored procedures has less to do with performance and more to do with security and configuration management.
Much of my architectural work is on service-oriented solutions and by treating the database as a service, it is significantly aided by the use of stored procedures.
Principally, limiting access to the database through stored procedures creates a well-defined interface, limiting the attack surface area and increasing testability. Allowing applications direct access to the underlying data greatly increases the attack surface area, reducing security, and makes impact analysis extremely difficult.
Stored Procedures and Linq to Sql solve different problems.
Linq to Sql is particular to Microsoft SQL Server.
I tend to prefer using stored procedures for several reasons:
it makes the security configuration easier (as mentioned by other posters).
It provides a clearly defined interface for DB access (although responsibility for this could be shifted into other areas, such as a DAL written in C#).
I find that the query optimizer, in Oracle at least, is able to make more intelligent decisions the more information you give it. This really requires testing both methods for your specific scenarios, though.
Depending on the developers available, you may have some very good SQL coders who will be better at producing efficient queries if they use sprocs.
The downside is that it can be a pain to keep the code that invokes the sprocs in sync with the database if things are evolving rapidly. The points about producing efficient queries could count as premature optimization. At the end of the day, there is no substitute for benchmarking performance under realistic conditions.
I can think of several good reasons for stored procedures:
When working with bigger tables, it can be hard to generate an efficient query using LINQ to SQL.
A DBA can analyze and troubleshoot stored procedures. But think of what happens when two complicated LINQ operations from different front-ends clash.
Stored procedures can enforce data integrity: deny write access on the tables and allow changes only through the stored procedures.
Updating a stored procedure is as easy as running ALTER PROCEDURE on the server. If a deployment takes months and a script takes minutes, you'll be more flexible with stored procedures.
For a small application that's maintained by one person, stored procedures are probably overkill.
There are significant associated performance improvements on the SQL Server side of things if you use stored procedures in appropriate circumstances.
Stored procedure support for LINQ to SQL was included partly for compatibility with existing systems. This allows developers to migrate from a sproc-based system to a fully LINQ-based system over time, sproc by sproc, rather than forcing developers to make a rush to convert an entire system all at once.
Personally, I don't care for LINQ. I like a separation between the data manipulation stuff and the code stuff. Additionally, the anonymous types generated from a LINQ statement cannot be passed off to other layers of an n-tier application, so either the type needs to be concretely defined or the LINQ call needs to be made in the UI. Gack!
Additionally, there are the security concerns (whatever user the LINQ code connects to MS SQL Server as needs unfettered access to the data, so if that username/password is compromised, so is the data).
And lastly, LINQ to SQL only works for MS SQL Server (as it comes from MS).
Sprocs have their uses, just like using LINQ does. IMO if an operation is performed multiple times in multiple places then it's a good candidate for "refactoring" into a Stored Proc, as opposed to a LINQ statement that is repeated in different places.
Also, and this is probably blasphemy to a lot of people here, sometimes you should put some logic into the database and then a sproc comes in handy. It's a rare occurrence but sometimes the nature of business rules demands it.
Stored procedures are useful in many cases, but in general, if you are using an ORM you should let the ORM generate the SQL for you. Why should we have to maintain a minimum of four stored procedures (insert, update, delete, and a single select) for each table?
With that said, as people have pointed out, there are security benefits to using stored procedures. You won't have to grant users read/write access to the tables, which is good protection against SQL injection.
Stored procedures are also useful when the logic used to retrieve data is fairly complex. You typically see this more in reporting scenarios, in which case you're probably not using Linq2Sql or some other ORM anyway.
In my opinion, if you're not generating your SQL but essentially hardcoding it within an app tier, it should be refactored into stored procedures -- though, as always, there are exceptions to any rule.
One use of a stored procedure with Linq2Sql might be when you have multiple linked servers: you could use a stored procedure to expose data from the other server and manipulate it, hiding the multiple servers from your application.
Some things can't be done without stored procedures. For instance, at my previous job there was a stored procedure that returned the current value from a row and incremented it in the same atomic operation, such that no two processes ever got the same value. I don't remember why this was done instead of using auto-increment, but there was a reason for it.
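The atomic read-and-increment trick described above can be sketched with a transaction. This uses Python/sqlite3 as a stand-in for the stored procedure, and the table and column names are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counter (id INTEGER PRIMARY KEY, value INTEGER)")
conn.execute("INSERT INTO counter (id, value) VALUES (1, 0)")

def next_value(conn):
    # Increment and read back inside one transaction, so two callers can
    # never observe the same value -- the behaviour the stored procedure
    # in the answer provided.
    with conn:  # opens a transaction, commits on exit
        conn.execute("UPDATE counter SET value = value + 1 WHERE id = 1")
        return conn.execute("SELECT value FROM counter WHERE id = 1").fetchone()[0]

v1, v2 = next_value(conn), next_value(conn)
print(v1, v2)  # 1 2
```

Doing the read and the increment as two separate client-side statements outside a transaction would allow two processes to read the same value before either writes; keeping both steps server-side in one unit is what makes it safe.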
Reason : Large amounts of data to move from one table to another.
Let's say that once in a while you have to archive items from one table to another, or do similar things. With LINQ that would mean retrieving, say, a million rows from table A into the DBMS client and then inserting them into table B.
With a stored procedure, things work nicely, in sets.
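The set-based archive amounts to an INSERT ... SELECT plus a DELETE running entirely inside the database. A sketch (Python/sqlite3 used only as a harness; the tables and the cutoff are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, created_year INTEGER);
    CREATE TABLE items_archive (id INTEGER PRIMARY KEY, created_year INTEGER);
    INSERT INTO items (created_year) VALUES (2008), (2009), (2010), (2010);
""")

# Copy then delete entirely inside the database, in one transaction:
# no rows ever travel to the client, however many millions there are.
with conn:
    conn.execute("INSERT INTO items_archive "
                 "SELECT * FROM items WHERE created_year < 2010")
    conn.execute("DELETE FROM items WHERE created_year < 2010")

live = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
archived = conn.execute("SELECT COUNT(*) FROM items_archive").fetchone()[0]
print(live, archived)  # 2 2
```

An ORM doing the same job would materialize every archived row in application memory first, which is the cost the answer is warning about.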
Lots of people have been getting by just fine without them for some time now. If you can do your work securely and efficiently without them, don't feel guilty about going with pure L2S. We're glad to be rid of them at my shop.
You certainly don't "need" stored procedures. But they can come in handy if your domain model requires a complex aggregate Entity and you don't have the luxury/flexibility to modify your database tables to fit your domain model. In this case using Linq-to-SQL or another ORM might result in a very poorly performing set of database calls to construct your Entity. A stored proc can come to the rescue here.
Of course, I would advocate using a methodology or process like TDD/BDD that provides you the flexibility to modify your database tables as needed without much pain. That's always the easier, more maintainable path in my opinion.
Simple example:
select * from Products where GetCategoryType(CategoryName)=1
GetCategoryType can run really fast, because it runs on the DB server.
There's no Linq to SQL substitute for that as far as I know.
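The idea -- the function executes inside the database engine, so rows are filtered before any of them reach the application -- can be imitated with SQLite's user-defined functions via Python's sqlite3. The GetCategoryType logic below is a made-up stand-in, since the original function's body isn't shown:

```python
import sqlite3

def get_category_type(name):
    # Hypothetical stand-in for the GetCategoryType() DB function in the answer.
    return 1 if name == "Beverages" else 0

conn = sqlite3.connect(":memory:")
# Register the function with the engine itself, so it runs inside the
# database during the scan, before any rows are returned to the app.
conn.create_function("GetCategoryType", 1, get_category_type)
conn.execute("CREATE TABLE Products (name TEXT, CategoryName TEXT)")
conn.executemany("INSERT INTO Products VALUES (?, ?)",
                 [("Chai", "Beverages"), ("Tofu", "Produce")])

rows = conn.execute(
    "SELECT name FROM Products WHERE GetCategoryType(CategoryName) = 1"
).fetchall()
print(rows)  # [('Chai',)]
```

On SQL Server the equivalent would be a T-SQL or CLR scalar function; the point is the same either way: only the matching rows cross the wire.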
I'm coming rather late to this thread. But depending on who you talk to, Linq to SQL is either dead, very dead, or at best a zombie.
In addition, no single tool suits every situation - you need to choose the right tool for the specific job in hand:
Stored procs enable you to enforce complex business rules across multiple client applications.
Stored procs can give you a great security layer.
Stored procs can give you a great abstraction layer.
Stored procs can give you better caching in some circumstances.

Is there a performance issue when using inline SQL statements rather then using a DAL design?

I made a fairly large social-network-type website and used nothing but inline SQL statements to access the database (I was new to the language, so back off!)
Are there any performance issues when doing it this way as opposed to using a massive XSD DataSet file to handle all the queries? Or is this just bad design?
Thanks!
When you reach real DB performance issues it won't really matter (performance wise) whether you're using stored procedures or direct SQL statements.
Your best bet in that situation is to avoid DB in the first place. In other words, it would be better to plan and architect a good caching mechanism because that will make all the difference when it really comes to serious traffic.
Stored procedures or inline code... again, performance-wise (I'm not talking about maintainability, security, ...) it simply doesn't matter that much anymore.
I think the maintenance issue/cost will have much more impact -- it will be much greater than the performance impact (if there's any performance impact at all).
If it is SQL Server, changing your inline SQL statements to parameterised calls to sp_executesql should yield a very significant performance improvement - and would probably be easier to refactor than moving to, say, stored procedures instead of inline code.
IME, ultimately, stored procedures that return multiple recordsets (i.e. that do several pieces of work/logic, rather than just replacing single inline queries) yield more performance improvement.
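The plan-reuse point can be illustrated: with concatenated SQL, every distinct value produces distinct statement text, while a parameterised call keeps one statement text the engine can cache and reuse. A sketch in Python/sqlite3 (sp_executesql itself is SQL Server-specific, and the users table here is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

name = "alice"

# Inline, concatenated SQL: every distinct value yields distinct statement
# text (defeating plan caching) and invites SQL injection.
unsafe = conn.execute("SELECT id FROM users WHERE name = '" + name + "'").fetchall()

# Parameterised: one statement text no matter the value, so the engine can
# reuse the compiled plan -- the effect sp_executesql gives you on SQL Server.
safe = conn.execute("SELECT id FROM users WHERE name = ?", (name,)).fetchall()

print(unsafe == safe, safe)  # True [(1,)]
```

Both return the same rows; the parameterised form is what allows the server to compile the statement once and reuse the plan across calls.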
Strictly speaking the SQL performance would be better if you use inline sql statements rather than an XSD Dataset. This is assuming that at a minimum you use parameterized queries (although SQL Server 2005 even optimizes for non-parameterized queries).
The statement that inline sql cannot be optimized by the database engine isn't true. Stored procedures are optimized identically to inline sql with SQL Server.
But whether it makes sense from a design perspective is another story entirely. And whether you are over-optimizing too early is another question.
Stay away from generated typed datasets. The performance isn't that good. If you need the type of functionality, look at LINQ; otherwise just grab the Enterprise Library and go direct.
With large amounts of data, a DB is better in terms of maintenance and performance. Having stored procedures, for example, makes it possible to delegate the work and query optimization to a professional DBA, and no recompiles are needed. With XSD/XML data files, you have to regenerate your code for each change.
Also look at concurrency: more users, more web requests, more concurrency... you can end up with locks on the data file. You could implement advanced caching mechanisms to gain performance, but that's also extra maintenance and complexity.