I have an application that uses DataTables to perform grouping, filtering and aggregation of data. I want to replace the DataTables with my own data structures so we don't carry any unnecessary overhead from using them. So my question is: can LINQ be used to perform the grouping, filtering and aggregation of my data, and if so, is the performance comparable to DataTables, or should I just hunker down and write my own algorithms to do it?
Unless you go for simple classes (POCO etc), your own implementation is likely to have nearly as much overhead as DataTable. Personally, I'd look more at using tools like LINQ-to-SQL, Entity Framework, etc. Then you can use either LINQ-to-Objects against local data, or the provider-specific implementation for complex database queries without pulling all the data to the client.
LINQ-to-Objects can do all the things you mention, but it involves having all the data in memory. If you have non-trivial data, a database is recommended. SQL Server Express Edition would be a good starting point if you look at LINQ-to-SQL or Entity Framework.
Edited re comment:
Regular TSQL commands are fine and dandy, but you ask about the difference... the biggest being that LINQ-to-SQL will provide the entire DAL for you, which is a huge time saver, as well as making it possible to get a lot more compile-time safety. But it also allows you to use the same approach to look at your local objects and your database - for example, the following is valid C# 3.0 (except for [someDataSource], see below):
var qry = from row in [someDataSource]
          group row by row.Category into grp
          select new { Category = grp.Key, Count = grp.Count(),
                       TotalValue = grp.Sum(x => x.Value) };

foreach (var x in qry) {
    Console.WriteLine("{0}, {1}, {2}", x.Category, x.Count, x.TotalValue);
}
If [someDataSource] is local data, such as a List<T>, this will execute locally; but if this is from your LINQ-to-SQL data-context, it can build the appropriate TSQL at the database server. This makes it possible to use a single query mechanism in your code (within the bounds of LOLA, of course).
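To make the local case concrete, here is a minimal sketch, assuming a hypothetical Order class with Category and Value properties; point the same query at a LINQ-to-SQL table instead of the List<Order> and nothing else needs to change:

using System;
using System.Collections.Generic;
using System.Linq;

class Order
{
    public string Category { get; set; }
    public decimal Value { get; set; }
}

class Program
{
    static void Main()
    {
        // Local, in-memory data source; LINQ-to-Objects executes the query here.
        var someDataSource = new List<Order>
        {
            new Order { Category = "Books", Value = 12.5m },
            new Order { Category = "Books", Value = 7.0m },
            new Order { Category = "Food",  Value = 3.2m }
        };

        var qry = from row in someDataSource
                  group row by row.Category into grp
                  select new { Category = grp.Key, Count = grp.Count(),
                               TotalValue = grp.Sum(x => x.Value) };

        foreach (var x in qry)
            Console.WriteLine("{0}, {1}, {2}", x.Category, x.Count, x.TotalValue);
    }
}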
You'd be better off letting your database handle grouping, filtering and aggregation. DataTables are actually relatively good at this sort of thing (their bad reputation seems to come primarily from inappropriate usage), but not as good as an actual database. Moreover, without a lot of work on your part, I would put my money on the DataTable's having better performance than your homegrown data structure.
Why not use a local database like SQL Server CE or Firebird Embedded? (or even MS Access! :)) Store the data in the local database, do the processing using simple SQL queries and pull the data back. Much simpler and likely less overhead, plus you don't have to write all the logic for grouping/aggregates etc., as the database systems already have that logic built in, debugged and working.
Yes, you can use LINQ to do all those things using your custom objects.
And I've noticed a lot of people suggest that you do this type of stuff in the database... but you never indicated where the data was coming from.
If the data is coming from the database then at the very least the filtering should probably happen there, unless you are doing something specialized (e.g. working from a cached set of data). And even then, if you are working with a significant amount of cached data, you might do well to put that data into an embedded database like SQLite, as someone else has already mentioned.
I'm currently working on a WPF application which was built using Entity Framework (database first) to access data in a SQL Server database.
In the past, the database was on an internal server and I did not notice any problem regarding the performance of the application, even though the database is very badly implemented (only tables, no views, no indexes or stored procedures). I'm the one who created it, but it was my first job and I was not very good with databases, so I felt like Entity Framework was the best approach to focus mainly on code.
However, the database is now on another server which is waaay slower. As you can guess, the application now has big performance issues (more than 10 seconds to load a dozen rows, the same to insert new rows, ...).
Should I stay with Entity Framework but try to improve performance by altering the database, adding views and stored procedures?
Should I get rid of Entity Framework and use only "basic" code (and improve the database at the same time)?
Is there a simple ORM I could use instead of EF?
Time is not an issue here, I can use all the time I want to improve the application but I can't seem to make a decision about the best way to make my application evolved.
The database is quite simple (around 10 tables); the only thing that could complicate things is that I store files in there. So I'm not sure I can really use whatever I want. And I don't know if it's important, but I need to display quite a few calculated fields. Any advice?
Feel free to ask any relevant questions.
For performance profiling, the first place I recommend looking is an SQL profiler. This can capture the exact SQL statements that EF is running, and help identify possible performance culprits. I cover a few of these here. The Schema issues are probably the most relevant place to start. The title targets MVC, but most of the items relate to WPF and any application.
A good, simple profiler that I use for SQL Server is ExpressProfiler. (https://github.com/OleksiiKovalov/expressprofiler)
With the move to a new server, and it now sending the data over the wire rather than pulling from a local database, the performance issues you're noticing will most likely fall under the category of "loading too much, too often". Now you won't only be waiting for the database to load the data, but also for it to package it up and send it over the wire. Also, does the new database hold the same data volume and serve only a single client, or is it now serving multiple clients? Another catch for developers is "works on my machine", where local testing databases are smaller and don't deal with concurrent queries from other clients (where locks and such can impact performance).
From here, run a copy of the application with an isolated database server (no other clients hitting it to reduce "noise") with the profiler running against it. The things to look out for:
Lazy Loading - These are cases where you have queries to load data, but then see lots (dozens to hundreds) of additional queries being spun off. Your code may say "run this query and populate this data", which you expect should be 1 SQL query, but by touching lazy-loaded properties this can spin off a great many other queries.
The solution to lazy loading: If you need the extra data, eager load it with .Include(). If you only need some of the data, look into using .Select() to select view models / DTOs of just the data you need rather than relying on complete entities. This will eliminate lazy load scenarios, but may require some significant changes to your code to work with view models/DTOs. Tools like Automapper can help greatly here. Read up on .ProjectTo() to see how Automapper can work with IQueryable to eliminate lazy load hits.
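As a rough illustration of both options (a sketch only; the Order entity, its Customer and Lines navigation properties, and OrderSummaryDto are hypothetical names standing in for your own types):

using System.Data.Entity;   // EF6: provides the Include() extension method
using System.Linq;

// Eager loading: one query that pulls the related Customer rows up front,
// so touching order.Customer later cannot trigger a lazy load.
var orders = context.Orders
    .Include(o => o.Customer)
    .Where(o => o.IsOpen)
    .ToList();

// Projection: only select the columns the screen needs; no full entities are
// materialized, so there is nothing left to lazy load.
var summaries = context.Orders
    .Where(o => o.IsOpen)
    .Select(o => new OrderSummaryDto
    {
        Id = o.Id,
        CustomerName = o.Customer.Name,
        Total = o.Lines.Sum(l => l.Price * l.Quantity)
    })
    .ToList();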
Reading too much - Loading entities can be expensive, especially if you don't need all of that data. Culprits for performance include excessive use of .ToList() which will materialize entire entity sets where a subset of data is needed, or a simple exists check or count would suffice. For example, I've seen code that does stuff like this:
var data = context.MyObjects.SingleOrDefault(x => x.IsActive && x.Id == someId);
return (data != null);
This should be:
var isData = context.MyObjects.Where(x => x.IsActive && x.Id == someId).Any();
return isData;
The difference between the two is that in the first example, EF will effectively do a SELECT * operation, so in the case where data is present it will return back all columns into an entity, only to later check if the entity was present. The second statement will run a faster query to simply return back whether a row exists or not.
var myDtos = context.MyObjects.Where(x => x.IsActive && x.ParentId == parentId)
    .ToList()
    .Select(x => new ObjectDto
    {
        Id = x.Id,
        Name = x.FirstName + " " + x.LastName,
        Balance = calculateBalance(x.OrderItems.ToList()),
        Children = x.Children.ToList()
            .Select(c => new ChildDto
            {
                Id = c.Id,
                Name = c.Name
            }).ToList()
    }).ToList();
Statements like this can go on and get rather complex, but the real problem is the .ToList() before the .Select(). Often these creep in because devs try to do something that EF doesn't understand, like call a method (e.g. calculateBalance()), and it "works" by first calling .ToList(). The problem here is that you are materializing the entire entity at that point and switching to Linq2Objects. This means that any "touches" on related data, such as .Children, will now trigger lazy loads, and further .ToList() calls can saturate even more data into memory which might otherwise be reduced in a query. The culprit to look out for is .ToList() calls; try removing them. Select simpler values before calling .ToList(), and then feed that data into view models where the view models can calculate the resulting data.
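A sketch of what that reshaping can look like, following the advice above (assuming the balance is something the database can compute, here shown as a sum over a hypothetical Amount column on OrderItems; if calculateBalance() really must run in C#, select the raw values it needs instead and call it after the .ToList()):

var myDtos = context.MyObjects
    .Where(x => x.IsActive && x.ParentId == parentId)
    .Select(x => new
    {
        x.Id,
        x.FirstName,
        x.LastName,
        // Computed by the database instead of by calculateBalance() in memory;
        // the nullable cast avoids a materialization error when there are no items.
        Balance = x.OrderItems.Sum(i => (decimal?)i.Amount) ?? 0m,
        Children = x.Children.Select(c => new { c.Id, c.Name })
    })
    .ToList()   // only the selected columns cross the wire
    .Select(x => new ObjectDto
    {
        Id = x.Id,
        Name = x.FirstName + " " + x.LastName,
        Balance = x.Balance,
        Children = x.Children.Select(c => new ChildDto { Id = c.Id, Name = c.Name }).ToList()
    })
    .ToList();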
The worst culprit like this I've seen was due to a developer wanting to use a function in a Where clause:
var data = context.MyObjects.ToList().Where(x => calculateBalance(x) > 0).ToList();
That first ToList() statement will attempt to saturate the whole table into entities in memory. A big performance impact, beyond just the time/memory/bandwidth needed to load all of this data, is simply the number of locks the database must take to reliably read/write data. The fewer rows you "touch", and the shorter the time you touch them for, the nicer your queries will play with concurrent operations from multiple clients. These problems magnify greatly as systems transition to being used by more users.
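Where the calculation can be expressed in a way EF understands, the filter can stay on the server; a sketch, again assuming the balance is just an aggregate over a hypothetical OrderItems.Amount column:

// The Where clause is translated to SQL, so only matching rows come back;
// no table-wide ToList() and no in-memory calculateBalance() calls.
var data = context.MyObjects
    .Where(x => x.OrderItems.Sum(i => (decimal?)i.Amount) > 0)
    .ToList();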
Provided you've eliminated extra lazy loads and unnecessary queries, the next thing to look at is query performance. For operations that seem slow, copy the SQL statement out of the profiler and run that in the database while reviewing the execution plan. This can provide hints about indexes you can add to speed up queries. Again, using .Select() can greatly increase query performance by using indexes more efficiently and reducing the amount of data the server needs to pull back.
For file storage: Are these stored as columns in a relevant table, or in a separate table that is linked to the relevant record? What I mean by this, is if you have an Invoice record, and also have a copy of an invoice file saved in the database, is it:
Invoices
    InvoiceId
    InvoiceNumber
    ...
    InvoiceFileData

or

Invoices
    InvoiceId
    InvoiceNumber
    ...

InvoiceFile
    InvoiceId
    InvoiceFileData
It is a better structure to keep large, seldom used data in separate tables rather than combined with commonly used data. This keeps queries to load entities small and fast, where that expensive data can be pulled up on-demand when needed.
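A sketch of what the split can look like on the EF side (hypothetical entity and property names; with the file data behind its own navigation property, everyday invoice queries never pull the blob unless explicitly asked to):

public class Invoice
{
    public int InvoiceId { get; set; }
    public string InvoiceNumber { get; set; }
    // ... other commonly used columns ...

    // Navigation to the rarely needed blob; not loaded unless requested.
    public virtual InvoiceFile File { get; set; }
}

public class InvoiceFile
{
    public int InvoiceId { get; set; }        // shared primary key with Invoice
    public byte[] InvoiceFileData { get; set; }
}

// Usage: load the blob only when it is actually needed, e.g.
//   var file = context.InvoiceFiles.SingleOrDefault(f => f.InvoiceId == invoiceId);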
If you are using GUIDs for keys (as opposed to ints/longs), are you leveraging newsequentialid()? (assuming SQL Server) Keys set to use newid(), or Guid.NewGuid() in code, will lead to index fragmentation and poor performance. If you populate the IDs via database defaults, switch them over to use newsequentialid() to help reduce the fragmentation. If you populate IDs via code, have a look at writing a Guid generator that mimics newsequentialid() (SQL Server) or a pattern suited to your database. SQL Server and Oracle store/index GUID values differently, so having the "static-like" part of the UUID bytes in the higher-order vs. lower-order bytes of the data will aid indexing performance. Also consider index maintenance and other database maintenance jobs to help keep the database server running efficiently.
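If IDs are generated in code, a client-side sequential GUID generator might look roughly like this (a sketch only, assuming SQL Server's uniqueidentifier ordering, which treats the last six bytes as the most significant; verify the byte ordering against your own database before relying on it):

using System;

public static class SequentialGuid
{
    // "COMB"-style GUIDs: random bytes plus a timestamp placed in the bytes
    // SQL Server sorts first, so new keys append to the index instead of
    // fragmenting it.
    public static Guid Next()
    {
        byte[] guidBytes = Guid.NewGuid().ToByteArray();

        long millis = DateTime.UtcNow.Ticks / TimeSpan.TicksPerMillisecond;
        byte[] stampBytes = BitConverter.GetBytes(millis);
        if (BitConverter.IsLittleEndian)
            Array.Reverse(stampBytes);   // most significant byte first

        // Overwrite the last 6 bytes of the GUID with the low 6 bytes of the timestamp.
        Array.Copy(stampBytes, 2, guidBytes, 10, 6);
        return new Guid(guidBytes);
    }
}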
When it comes to index tuning, database server reports are your friends. After you've eliminated most, or at least some, serious performance offenders from your code, the next thing is to look at real-world use of your system. The best way to learn where to target your code/index investigations is the list of most-used and problem queries that the database server identifies. Where these are EF queries, you can usually reverse-engineer which EF query is responsible based on the tables being hit. Grab these queries and feed them through the execution plan to see if there is an index that might help matters. Indexing is something that developers either forget or get prematurely concerned about. Too many indexes can be just as bad as too few. I find it's best to monitor real-world usage before deciding on what indexes to add.
This should hopefully give you a start on things to look for and kick the speed of that system up a notch. :)
First you need to run a performance profiler and find out where the bottleneck is; it could be the database, the Entity Framework configuration, the Entity Framework queries and so on.
In my experience, Entity Framework is a good option for this kind of application, but you need to understand how it works.
Also, what Entity Framework version are you using? The latest version is 6.2 and has some performance improvements that older ones do not have, so if you are using an old one I suggest updating it.
Based on the comments I am going to hazard a guess that it is mostly a bandwidth issue.
You had an application that was working fine when it was co-located, perhaps a single switch, gigabit ethernet and 200m of cabling.
Now that application is trying to send or retrieve data to/from a remote server, probably over the public internet through an unknown number of internal proxies in contention with who knows what other traffic, and it doesn't perform as well.
You also mention that you store files in the database, and your schema has fields like Attachment.data and Doc.file_content. This suggests that you could be trying to transmit large quantities (perhaps megabytes) of data for a simple query and that is where you are falling down.
Some general pointers:
Add indexes anywhere you are joining tables or on values you commonly query on.
Be aware of the difference between lazy & eager loading in Entity Framework. There is no right or wrong answer, but you should know which approach you are using and why.
Split any file content into its own table, with the same primary key as the main table, or play with different EF classes to make sure you only retrieve files when you need to use them.
I might be way off here, and this question is probably bordering on subjective, but here goes anyway.
Currently I use IList<T> to cache information from the database in memory so I can use LINQ to query information from them. I have an ORM'ish layer I've written with the help of some questions here on SO, to easily query the information I need from the DB. For example:
IList<Customer> customers = DB.GetDataTable("Select * FROM Customers").ToList<Customer>();
Its been working fine. I also have extension methods to do CRUD updates on single items within these lists:
DB.Update<Customer>(customers[0]);
Again working quite well.
Now in the GUI layer of my app, specifically when binding DataGridViews for the user to edit the data, I find myself bypassing this DAL layer and directly using TableAdapters within the forms, which kind of breaks the layered architecture and smells a bit to me. I've also found that by using TableAdapters here and ILists there, there are differing standards followed throughout my code, which I would like to consolidate into one.
Ideally, I would like to be able to bind to these lists and then have the DAL update the list's 'dirty' data for me. To me, this process would involve the following:
Traversing the list for any 'dirty' items
For each of these, see if there is already an item with the PK in the DB
If (2), then update, else insert
Finally, perform a Delete FROM * WHERE ID NOT IN('all ids in list') query
I'm not entirely sure how this is handled in a TableAdapter, but I can see the performance of this method dropping significantly and quite quickly with an increasing number of items in the list.
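Roughly, the commit I have in mind would look something like this (just a sketch; IsDirty, GetById and the other DB helpers are stand-ins, not part of my current DAL):

public void Commit(IList<Customer> customers)
{
    foreach (var customer in customers.Where(c => c.IsDirty))
    {
        // If the PK already exists in the DB, update; otherwise insert.
        if (DB.GetById<Customer>(customer.Id) != null)
            DB.Update<Customer>(customer);
        else
            DB.Insert<Customer>(customer);
    }

    // Finally, delete anything in the table that is no longer in the list
    // (assumes numeric IDs; parameterize otherwise).
    var ids = string.Join(",", customers.Select(c => c.Id));
    DB.Execute("DELETE FROM Customers WHERE ID NOT IN (" + ids + ")");
}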
So my question is this:
Is there an easier way of committing a List<T> to a database? Note the word commit, as it may be an insert, update or delete.
Should I maybe convert to DataTable? e.g. here
I'm sure some of the more advanced ORMs will perform this type of thing, however is there any mini-ORM (e.g. Dapper/PetaPoco/Simple.Data etc.) that can do this for me? I want to keep it simple (as is with my current DAL) and flexible (I don't mind writing the SQL if it gets me exactly what I need).
Currently I use IList to cache information from the database in memory so I can use LINQ to query information from them.
Linq also has a flavour called Linq-to-DataSets, so this is not a compelling reason.
Better decide what you really want/need:
a full ORM like Entity Framework
use DataSets with DataAdapters
use basic ADO.NET (DataReader and List<>) and implement your own change-tracking.
You can mix them to some extent but like you noted it's better to pick one.
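If you do go the basic ADO.NET route, the change tracking can be as simple as a dirty flag per item; a minimal sketch (TrackedItem and its states are hypothetical names, not from any library):

public enum TrackedState { Unchanged, Added, Modified, Deleted }

public class TrackedItem<T>
{
    public T Value { get; }
    public TrackedState State { get; private set; }

    public TrackedItem(T value, TrackedState state = TrackedState.Unchanged)
    {
        Value = value;
        State = state;
    }

    public void MarkModified()
    {
        if (State == TrackedState.Unchanged)
            State = TrackedState.Modified;
    }

    public void MarkDeleted() => State = TrackedState.Deleted;
}

// On commit: walk the list, INSERT the Added items, UPDATE the Modified ones,
// DELETE the Deleted ones, then reset everything to Unchanged.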
I have a problem which I cannot seem to get around no matter how hard I try.
This company works in market analysis, and has pretty large tables (300K - 1M rows) and MANY columns (think 250-300) which we do some calculations on.
I'll try to get straight to the problem:
The problem is the filtering of the data. All databases I've tried so far are way too slow to select data and return it.
At the moment I am storing the entire table in memory and filtering using dynamic LINQ.
However, while this is quite fast (about 100 ms to filter 250 000 rows) I need better results than this...
Is there any way I can change something in my code (not the data model) which could speed the filtering up?
I have tried using:
DataTable.Select, which is slow.
Dynamic LINQ, which is better, but still too slow.
Normal LINQ (just for testing purposes), which is almost good enough.
Fetching from MySQL and doing the processing later on, which is badass slow.
At the beginning of this project we thought that some high-performance database would be able to handle this, but I tried:
H2 (IKVM)
HSQLDB (compiled ODBC-driver)
CubeSQL
MySQL
SQL
SQLite
...
And they are all very slow to interface with from .NET and get results from.
I have also tried splitting the data into chunks and combining them later in runtime to make the total amount of data which needs filtering smaller.
Is there any way in this universe I can make this faster?
Thanks in advance!
UPDATE
I just want to add that I have not created this database in question.
To add some figures, if I do a simple select of 2 fields in the database query window (SQLyog) like this (visit_munic_name is indexed):
SELECT key1, key2 FROM table1 WHERE filter1 = filterValue1
It takes 125 milliseconds on 225639 rows.
Why is it so slow? I have tested 2 different boxes.
Of course they must change something, obviously?
You do not explain what exactly you want to do, or why filtering a lot of rows is important. Why should it matter how fast you can filter 1M rows to get an aggregate if your database can precalculate that aggregate for you? In any case it seems you are using the wrong tools for the job.
On one hand, 1M rows is a small number of rows for most databases. As long as you have the proper indexes, querying shouldn't be a big problem. I suspect that either you do not have indexes on your query columns or you want to perform ad-hoc queries on non-indexed columns.
Furthermore, it doesn't matter which database you use if your data schema is wrong for the job. Analytical applications typically use star schemas to allow much faster queries for a lot more data than you describe.
All databases used for analysis purposes use special data structures which require that you transform your data to a form they like.
For typical relational databases you have to create star schemas that are combined with cubes to precalculate aggregates.
Column databases store data in a columnar format usually combined with compression to achieve fast analytical queries, but they require that you learn to query them in their own language, which may be very different than the SQL language most people are accustomed to.
On the other hand, the way you query (LINQ or DataTable.Select or whatever) has minimal effect on performance. Picking the proper data structure is much more important.
For instance, using a Dictionary<> is much faster than using any of the techniques you mentioned. A dictionary essentially checks for single values in memory. Executing DataTable.Select without indexes, or using LINQ to DataSets or to Objects, is essentially the same as scanning all entries of an array or a List<> for a specific value, because that is what all these methods do - scan an entire list sequentially.
The various LINQ providers do not do the job of a database. They do not optimize your queries. They just execute what you tell them to execute. Even doing a binary search on a sorted list is faster than using the generic LINQ providers.
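A small illustration of the difference (hypothetical data; the dictionary lookup is a single hash probe, the LINQ query is a full scan):

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class Program
{
    static void Main()
    {
        var rows = Enumerable.Range(0, 1_000_000)
                             .Select(i => new { Id = i, Value = i * 2 })
                             .ToList();

        // Index the rows once by Id.
        var byId = rows.ToDictionary(r => r.Id);

        var sw = Stopwatch.StartNew();
        var viaScan = rows.FirstOrDefault(r => r.Id == 987_654);   // O(n): scans the list
        Console.WriteLine("LINQ scan:  {0} ticks", sw.ElapsedTicks);

        sw.Restart();
        var viaLookup = byId[987_654];                              // O(1): hash lookup
        Console.WriteLine("Dictionary: {0} ticks", sw.ElapsedTicks);
    }
}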
There are various things you can try, depending on what you need to do:
If you are looking for a quick way to slice and dice your data, use an existing product like PowerPivot functionality of Excel 2010. PowerPivot loads and compresses MANY millions of rows in an in-memory columnar format and allows you to query your data just as you would with a Pivot table, and even define joins with other in memory sources.
If you want a more repeatable process you can either create the appropriate star schemas in a relational database or use a columnar database. In either case you will have to write the scripts to load your data in the proper structures.
If you are creating your own application, you really need to investigate the various algorithms and structures used by other, similar in-memory tools.
I was reading this SO question but still I am not clear about one specific thing.
If I use NHibernate, why do I need LINQ?
The question in my mind became more aggravated when I learned that NHibernate also includes LINQ support.
LINQ to NHibernate?
WTF!
LINQ is a query language. It allows you to express queries in a way that is not tied in to your persistence layer.
You may be thinking about the LINQ 2 SQL ORM.
Using LINQ in the names of both causes unfortunate confusion like yours.
NHibernate, EF and LINQ2XML are all LINQ providers - they all allow you to query a data source using the LINQ syntax.
Well, you don't need Linq, you can always do without it, but you might want it.
Linq provides a way to express operations that behave on sets of data that can be queried and where we can then perform other operations based on the state of that data. It's deliberately written so as to be as agnostic as possible whether that data is in-memory collections, XML, database, etc. Ultimately it's always operating on some sort of in-memory object, with some means of converting between in-memory and the ultimate source, though some bindings go further than others in pushing some of the operations down to a different layer. E.g. calling .Count() can end up looking at a Count property, spinning through a collection and keeping a tally, sending a Count(*) query to a database or maybe something else.
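For instance, a small illustration of how the same Count() call resolves differently depending on the source (the db.Users set here is a hypothetical LINQ-to-SQL/EF table):

using System.Collections.Generic;
using System.Linq;

var local = new List<int> { 1, 2, 3 };
int a = local.Count();               // LINQ-to-Objects shortcut: reads the list's Count property

IEnumerable<int> filtered = local.Where(x => x > 1);
int b = filtered.Count();            // no shortcut available: enumerates the sequence and tallies

// Against an IQueryable source the call is translated rather than executed locally:
// int c = db.Users.Count();         // becomes something like SELECT COUNT(*) FROM Users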
ORMs provide a way to have in-memory objects and database rows reflect each other, with changes to one being reflected by changes to the other.
That fits nicely into the "some means of converting" bit above. Hence Linq2SQL, EF and Linq2NHibernate all fulfil both the ORM role and the Linq provider role.
Considering that Linq can work on collections you'd have to be pretty perverse to create an ORM that couldn't support Linq at all (you'd have to design your collections to not implement IEnumerable<T> and hence not work with foreach). More directly supporting it though means you can offer better support. At the very least it should make for more efficient queries. For example if an ORM gave us a means to get a Users object that reflected all rows in a users table, then we would always be able to do:
int uID = (from u in Users where u.Username == "Alice" select u.ID).FirstOrDefault();
Without direct support for Linq by making Users implement IQueryable<User>, then this would become:
SELECT * FROM Users
Followed by:
while(dataReader.Read())
    yield return ConstructUser(dataReader);
Followed by:
foreach(var user in Users)
    if(user.Username == "Alice")
        return user.ID;
return 0;
Actually, it'd be just slightly worse than that. With direct support the SQL query produced would be:
SELECT TOP 1 id FROM Users WHERE username = 'Alice'
Then the C# becomes equivalent to
return dataReader.Read() ? dataReader.GetInt32(0) : 0;
It should be pretty clear how the greater built-in Linq support of a Linq provider should lead to better operation.
Linq is an in-language feature of C# and VB.NET and can also be used by any .NET language, though not always with that same in-language syntax. As such, every .NET developer should know it, and every C# and VB.NET developer should particularly know it (or they don't know C# or VB.NET), and that's the group NHibernate is designed to be used by, so they can depend on not needing to explain a whole bunch of operations by just implementing them the Linq way. Not supporting it in a .NET library that represents queryable data should be considered a lack of completeness at best; the whole point of an ORM is to make manipulating a database as close as possible to non-DB-related operations in the programming language in use. In .NET that means Linq support.
First of all, LINQ alone is not an ORM. It is a DSL to query objects irrespective of the source they came from.
So it makes perfect sense that you can use LINQ with NHibernate too.
I believe you have confused LINQ to SQL with plain LINQ.
Common sense?
There is a difference between an ORM like NHibernate and a compiler-integrated way to express queries, which is useful in many more scenarios.
Or: usage of LINQ (not LINQ to SQL etc. - the language, which is what you are talking about, though I am not sure you meant what you said) means you don't have to deal with NHibernate's special query syntax.
Or: anyone NOT using LINQ - regardless of NHibernate or not - disqualifies themselves without a good explanation.
You don't need it, but you might find it useful. Bear in mind that Linq, as others have said, is not the same thing as Linq to SQL. Where I work, we write our own SQL queries to retrieve data, but it's quite common for us to use Linq to manipulate that data in order to serve a specific need. For instance, you might have a data access method that allows you to retrieve all dogs owned by Dave:
new DogOwnerDal().GetForOwner(id);
If you're only interested in Dave's dachshunds for one specific need, and performance isn't that much of an issue, you can use Linq to filter the response for all of Dave's dogs down to the specific data that you need:
new DogOwnerDal().GetForOwner(id).Where(d => d.Breed == DogBreeds.Dachshund);
If performance was crucial, you might want to write a specific data access method to retrieve dogs by owner and breed, but in many cases the effort expended to create the new data access method doesn't increase efficiency enough to be worth doing.
In your example, you might want to use NHibernate to retrieve a lump of data, and then use Linq to break that data into lots of individual subsets for some form of processing. It might well be cheaper to get the data once and use Linq to split it up, instead of repeatedly interrogating the database for different mixtures of the same data.
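A sketch of that pattern (the session query and the Dog properties are hypothetical; the point is one database round trip, then in-memory LINQ for the subsets):

// One trip to the database via NHibernate's LINQ provider (hypothetical query).
var dogs = session.Query<Dog>()
                  .Where(d => d.OwnerId == ownerId)
                  .ToList();

// Then slice the in-memory list however the processing needs it.
var byBreed  = dogs.GroupBy(d => d.Breed);
var puppies  = dogs.Where(d => d.AgeInMonths < 12).ToList();
var heaviest = dogs.OrderByDescending(d => d.WeightKg).Take(5).ToList();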
My understanding of Linq to Sql is it will take my Linq statement and convert it into an equivalent SQL statement.
So
var products = from p in db.Products
               where p.Category.CategoryName == "Beverages"
               select p;
Just turns into
Select * from Products where CategoryName = 'Beverages'
If that's the case, I don't see how stored procedures are useful anymore.
Sprocs are another tool in the box. You might use your fancy automatically-adjusting wrench for 90% of your tasks, but you can't use that shiny thing on stripped nuts. For that a good ol' monkey wrench is your best friend. Unless you break the bolt, in which case you're stuck with assembly.
If that's all you ever did in SQL, you didn't need sprocs before!
Security.
I've seen several "security best practice" guidelines which recommend you do all your data access via SPs, and you only grant privileges to execute those SPs.
If a client simply cannot do select or delete on any database tables, the risk may be lower should that client be hacked.
I've never personally worked on a project which worked this way, it always seemed like a giant pain in the backside.
Ah, the subject of many a debate.
Many would argue that technologies such as LINQ-to-SQL generate such good SQL these days that the performance advantages are marginal. Personally, I prefer SQL experts tuning SQL performance, not general coders, so I tend to disagree.
However, my main preference for stored procedures has less to do with performance and more to do with security and configuration management.
Much of my architectural work is on service-oriented solutions and by treating the database as a service, it is significantly aided by the use of stored procedures.
Principally, limiting access to the database through stored procedures creates a well-defined interface, limiting the attack surface area and increasing testability. Allowing applications direct access to the underlying data greatly increases the attack surface area, reducing security, and makes impact analysis extremely difficult.
Stored Procedures and Linq to Sql solve different problems.
Linq to Sql is particular to Microsoft SQL Server.
I tend to prefer using stored procedures for several reasons:
it makes the security configuration easier (as mentioned by other posters).
It provides a clearly defined interface for DB access (although responsibility for this could be shifted into other areas, such as a DAL written in C#).
I find that the Query Optimizer, in Oracle at least, is able to make more intelligent decisions the more information you give it. This really requires testing with both methods for your specific scenarios, though.
Depending on the developers available, you may have some very good SQL coders who will be better at producing efficient queries if they use sprocs.
The downside is that it can be a pain to keep the code that invokes the sprocs in sync with the database if things are evolving rapidly. The points about producing efficient queries could count as premature optimization. At the end of the day, there is no substitute for benchmarking performance under realistic conditions.
I can think of several good reasons for stored procedures:
When working with bigger tables, it can be hard to generate an efficient query using LINQ to SQL.
A DBA can analyze and troubleshoot stored procedures. But think of what happens when two complicated LINQ operations from different front-ends clash.
Stored procedures can enforce data integrity. Deny write access on tables, and allow changes only through stored procedures.
Updating stored procedures is as easy as running ALTER PROCEDURE on a server. If a deployment takes months, and a script minutes, you'll be more flexible with stored procedures.
For a small application that's maintained by one person, stored procedures are probably overkill.
There are significant associated performance improvements on the SQL Server side of things if you use stored procedures in appropriate circumstances.
Stored procedure support for LINQ to SQL was included partly for compatibility with existing systems. This allows developers to migrate from a sproc-based system to a fully LINQ-based system over time, sproc by sproc, rather than forcing developers to make a rush to convert an entire system all at once.
Personally, I don't care for LINQ. I like a separation of the data manipulation stuff and the code stuff. Additionally, the anonymous types that are generated from a LINQ statement cannot be passed off to other layers of an n-tier application, so either the type needs to be concretely defined, or the LINQ call needs to be made in the UI. Gack!
Additionally, there are the security concerns (whatever user the LINQ code is calling into MS SQL Server under needs to have unfettered access to the data, so if that username/password are compromised, so is the data).
And lastly, LINQ to SQL only works for MS SQL Server (as it comes from MS).
Sprocs have their uses, just like using LINQ does. IMO if an operation is performed multiple times in multiple places then it's a good candidate for "refactoring" into a Stored Proc, as opposed to a LINQ statement that is repeated in different places.
Also, and this is probably blasphemy to a lot of people here, sometimes you should put some logic into the database and then a sproc comes in handy. It's a rare occurrence but sometimes the nature of business rules demands it.
Stored Procedures are useful in many cases, but in general, if you are using an ORM, you should let the ORM generate the SQL for you. Why should we have to maintain a minimum of four stored procedures (insert, update, delete and a single select) for each table?
With that said as people pointed out there are security benefits to using stored procedures. You won't have to grant users read/write to the tables, which is a good protection against SQL Injection.
Stored Procedures are also useful when the logic used to retrieve data is fairly complex. You typically see this more in reporting scenarios, in which case you're probably not using Linq2Sql or some other ORM.
In my opinion, if you're not generating your SQL but essentially hardcoding it within an app tier, then it should be refactored into stored procedures; yes, there are always exceptions to any rule, but that holds in general.
One use of a stored procedure in Linq2Sql might be if you have multiple servers and are linking to them; you could use a stored procedure to expose data from the other server and manipulate it. This would hide the multiple servers from your application.
Some things can't be done without stored procedures. For instance, at my previous job, there was a stored procedure that returned the current value from a row and incremented it in the same atomic operation, such that no two processes ever got the same value. I don't remember why this was done instead of using auto-increment, but there was a reason for it.
Reason: large amounts of data to move from one table to another.
Let's say that once in a while you have to archive items from one table to another or do similar things. With LINQ, that would mean retrieving, let's say, one million rows from table A into the DBMS client and then inserting them into table B.
With a stored procedure, things work nicely, in sets.
Lots of people have been getting by just fine without them for some time now. If you can do your work securely and efficiently without them, don't feel guilty about going with pure L2S. We're glad to be rid of them at my shop.
You certainly don't "need" stored procedures. But they can come in handy if your domain model requires a complex aggregate Entity and you don't have the luxury/flexibility to modify your database tables to fit your domain model. In this case using Linq-to-SQL or another ORM might result in a very poorly performing set of database calls to construct your Entity. A stored proc can come to the rescue here.
Of course, I would advocate using a methodology or process like TDD/BDD that provides you the flexibility to modify your database tables as needed without much pain. That's always the easier, more maintainable path in my opinion.
Simple example:
select * from Products where GetCategoryType(CategoryName)=1
GetCategoryType can run really fast, because it runs on the DB server.
There's no Linq to SQL substitute for that as far as I know.
I'm coming rather late to this thread. But depending on who you talk to, Linq to SQL is either dead, very dead, or at best a zombie.
In addition, no single tool suits every situation - you need to choose the right tool for the specific job in hand:
Stored procs enable you to enforce complex business rules across multiple client applications.
Stored procs can give you a great security layer.
Stored procs can give you a great abstraction layer.
Stored procs can give you better caching in some circumstances.