I was looking through the sample LINQ queries provided with LINQPad taken from the C# 4.0 in a Nutshell book, and ran across something I have never used in LINQ to SQL... Compiled Queries.
Here is the exact example:
// LINQ to SQL lets you precompile queries so that you pay the cost of translating
// the query from LINQ into SQL only once. In LINQPad the typed DataContext is
// called TypeDataContext, so we proceed as follows:
var cc = CompiledQuery.Compile ((TypedDataContext dc, decimal minPrice) =>
    from c in Customers
    where c.Purchases.Any (p => p.Price > minPrice)
    select c
);
cc (this, 100).Dump ("Customers who spend more than $100");
cc (this, 1000).Dump ("Customers who spend more than $1000");
What does precompiling a LINQ to SQL query like this actually buy me? Would I get a performance boost from a query slightly more complex than this one? Is this even used in actual practice?
In a nutshell, precompiled queries buy you a performance gain when you need to run a single query multiple times.
Here's some information on LINQ To SQL performance.
I've read in several places that compiling your LINQ will help, but I have never heard anyone say how drastic the speed improvement can be. For example, in one of my favorite books (LINQ in Action) by Fabrice Marguerie and others, he quotes on page 296 a blog post by Rico Mariani titled DLinq (Linq to SQL) Performance (Part 1) as saying that using a compiled query offers nearly twice the performance of a non-compiled query, and goes on to say that it brings the performance to within 93% of using a raw data reader. Well, suffice it to say I never ran the test myself. I could have lived with twice, but not 37 times.
So, from this, it seems that you should always compile your LINQ to SQL queries. Well, that's not quite true. What I'm recommending is that if you have a reason to execute the same query over and over, you should strongly consider compiling. If, for example, you are just making a LINQ to SQL call once, there is no benefit, because you have to compile it anyway. Call it ten times? Well, you will have to decide for yourself.
The way I use compiled queries is in a static way: I statically declare the compiled query, so the query tree structure has to be parsed only once, and you basically have a prepared statement that just needs some extra parameters.
This is mainly used on websites, so the query has to be compiled only once, ever. The performance gain depends of course on the complexity of your query.
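A minimal sketch of that static pattern, assuming a typed DataContext named NorthwindDataContext with Customers and Purchases like the LINQPad sample above (all names are placeholders for your own model):
// Compiled once per AppDomain because the field is static; every call after the
// first reuses the already-translated SQL and only the parameter values change.
static readonly Func<NorthwindDataContext, decimal, IQueryable<Customer>> CustomersOver =
    CompiledQuery.Compile((NorthwindDataContext dc, decimal minPrice) =>
        from c in dc.Customers
        where c.Purchases.Any(p => p.Price > minPrice)
        select c);

// Usage: pass a live DataContext plus the parameter values.
// var bigSpenders = CustomersOver(db, 1000m).ToList();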
We use this at our company so that queries that are run often don't need to be compiled on every run. You don't have to get overly complex with LINQ to SQL before this makes a difference, but it will depend on the traffic and load on your servers.
From this article in Rico Mariani's Performance Tidbits:
Q4: What are the downsides to precompiled queries?
A: There is no penalty to precompiling (see Quiz #13). The only way you might lose performance is if you precompile a zillion queries and then hardly use them at all -- you'd be wasting a lot of memory for no good reason.
But measure :)
Related
I have a performance-measurement issue: an EF query run through the web application is far slower than the Profiler-generated T-SQL run directly in a SQL query window.
Following is my EF query that executes through the web application:
IEnumerable<application> _entityList = context.applications
.Include(context.indb_generalInfo.EntitySet.Name)
.Include(context.setup_budget.EntitySet.Name)
.Include(context.setup_committee.EntitySet.Name)
.Include(context.setup_fund.EntitySet.Name)
.Include(context.setup_appStatus.EntitySet.Name)
.Include(context.appSancAdvices.EntitySet.Name)
.Where(e => e.indb_generalInfo != null);
if (isIFL != null)
_entityList = _entityList.Where(e => e.app_isIFL == isIFL);
int _entityCount = _entityList.Count(); // hits the database server at this line
Tracing the above EF query in SQL Profiler reveals that it took around 221,095 ms to execute (the applications table has 30,000+ records, indb_generalInfo 11,000+ and appSancAdvices 30,000+).
However, when I copy the T-SQL from Profiler and run it directly in a query window, it takes only around 4,000 ms.
Why is it so?
The venom in this query is in its first words: IEnumerable<application>. If you replace that with var (i.e. IQueryable<application>), the query will be translated into SQL up to and including the final Count(). This takes considerably less time, because the amount of data transported is reduced to almost nothing.
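A sketch of that change, reusing the entity and property names from the question (so they are assumptions about that model); the Includes are also dropped, since only a count is needed:
// IQueryable keeps the whole chain composable, so the Where clauses and the
// final Count() are translated into a single SELECT COUNT(*) on the server.
IQueryable<application> _entityList = context.applications
    .Where(e => e.indb_generalInfo != null);

if (isIFL != null)
    _entityList = _entityList.Where(e => e.app_isIFL == isIFL);

int _entityCount = _entityList.Count(); // only the count crosses the wire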
Further, as bobek already mentioned, you don't need the Includes as you're only counting context.applications items.
Apart from that, you will always notice some overhead from using an ORM like Entity Framework.
That's because EF first needs to translate your code into T-SQL, which is costly as well. Look at this link: http://peterkellner.net/2009/05/06/linq-to-sql-slow-performance-compilequery-critical/ It shows how to compile your LINQ and should help with speed. Also, do you really need that many tables for this query? Maybe you can think of a way to filter it and only pull out what you need.
EF definitely has a cost in terms of performance. But it also provides the flexibility to use stored procedures for complex T-SQL, although in my opinion that should be your last resort.
In case you're interested in performance and EF:
http://msdn.microsoft.com/en-us/data/hh949853.aspx
However...
Tracing the EF query in SQL Profiler reveals that it took around 221,095 ms to execute.
then...
copy the T-SQL from Profiler and run it directly in a query window
Where the SQL came from is irrelevant.
Query Q1 took x milliseconds, based on SQL Profiler info.
The exact same query Q1' takes less, again based on SQL Profiler. That means the source of the SQL isn't the issue; it implies environmental factors are involved.
The most obvious explanation: SQL Server has buffered many data pages and can serve the second, identical request much faster.
I have a problem. My LINQ to SQL queries are pushing data to the database at ~1000 rows per second. But this is much too slow for me. The objects are not complicated. CPU usage is <10% and bandwidth is not the bottleneck either.
The 10% is on the client; on the server it is 0% or at most 1%, so the server is generally not working at all, not traversing indexes, etc.
Why is 1000/s slow? I need something around 20,000/s to 200,000/s to solve my problem; otherwise I will receive more data than I can process.
I don't use a transaction myself, but LINQ does: when I post, for example, a million new objects to the DataContext and run SubmitChanges(), the inserts happen inside LINQ's internal transaction.
I don't use parallel LINQ and I don't have many selects; in this scenario I'm mostly inserting objects, and I want to use all the resources I have, not only 5% of the CPU and 10 KB/s of network!
when I post, for example, a million new objects
Forget it. Linq2sql is not intended for such large batch updates/inserts.
The problem is that Linq2sql will execute a separate insert (or update) statement for each insert (update). This kind of behaviour is not suitable with such large numbers.
For inserts you should look into SqlBulkCopy, because it is a lot faster (really, orders of magnitude faster).
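A minimal SqlBulkCopy sketch, assuming a DataTable whose columns match the target table; the table name and connection string are placeholders:
using System.Data;
using System.Data.SqlClient;

static void BulkInsert(string connectionString, DataTable rows)
{
    using (var bulk = new SqlBulkCopy(connectionString))
    {
        bulk.DestinationTableName = "dbo.MyObjects"; // hypothetical target table
        bulk.BatchSize = 10000;                      // send rows in batches
        bulk.WriteToServer(rows);                    // one bulk operation, not row-by-row INSERTs
    }
}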
Some performance optimization can be achieved with LINQ to SQL by, first of all, using precompiled queries. A large part of the cost is compiling the actual query.
http://www.albahari.com/nutshell/speedinguplinqtosql.aspx
http://msdn.microsoft.com/en-us/library/bb399335.aspx
Also, you can disable object tracking, which may give you milliseconds of improvement. This is done on the DataContext right after you instantiate it.
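For read-only work the tracking switch looks roughly like this (MyDataContext is a placeholder for your generated typed DataContext):
using (var db = new MyDataContext(connectionString))
{
    // Must be set before the first query; the context then skips change tracking.
    db.ObjectTrackingEnabled = false;
    var customers = db.Customers.Where(c => c.City == "London").ToList();
}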
I also encountered this problem before. The solution I used is Entity Framework. There is a tutorial here. One traditional way is to use LINQ to Entities, which has similar syntax and seamless integration with C# objects; in my experience this gave roughly a 10x speed-up. But a more efficient way (by an order of magnitude) is to write the SQL statement yourself and then use the ExecuteStoreQuery function to fetch the results. It requires you to write SQL rather than LINQ statements, but the returned results can still be read from C# easily.
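A hedged sketch of the ExecuteStoreQuery route on an EF ObjectContext; the SQL, entity type and parameter are illustrative only:
// The raw SQL is sent as-is; EF only materializes the rows into Person objects.
var people = context.ExecuteStoreQuery<Person>(
    "SELECT Id, FirstName, LastName FROM People WHERE LastName = {0}",
    lastName).ToList();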
I use a stored procedure to fetch data and modifications from the database.
If I use LINQ for the same job, will there be any performance issue? I mean, will I get better performance or not?
I find the LINQ syntax a bit complicated, so is there any tool that can help me generate LINQ queries?
thanks
There is a slight performance penalty to using Linq2Sql versus calling stored procedures; for one, a stored procedure's plan can be compiled and cached so subsequent calls are quicker, and you have more control and options in T-SQL land than when Linq2Sql generates the statements for you.
However, unless your site is getting heavy traffic, there's no reason not to use it if you are looking for an easy way to get your site set up.
In general, LINQ will not beat the performance of a stored procedure. After all, Linq2Sql generates SQL and that will never be better than a well-written stored procedure.
Linq2sql, if not used properly, can harm performance. That is often caused by developers who do not understand the deferred-loading principle, and that can lead to a LOT of queries.
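A sketch of that classic pitfall and one LINQ to SQL counter-measure, with placeholder entity names:
// Deferred loading: one query for the orders, plus one extra query per order
// the moment order.Customer is touched (the classic "N+1" pattern).
foreach (var order in db.Orders)
    Console.WriteLine(order.Customer.Name);

// Counter-measure: eager-load the association on a fresh DataContext, so a
// single joined query is issued instead. LoadOptions must be set before the
// context executes its first query.
var options = new System.Data.Linq.DataLoadOptions();
options.LoadWith<Order>(o => o.Customer);
db2.LoadOptions = options;
foreach (var order in db2.Orders)
    Console.WriteLine(order.Customer.Name); // Customer already loaded, no extra query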
Where it WILL create performance issues is with bulk operations. This is an area where you do not want to use Linq2Sql but you need SqlBulkCopy or stored procedures.
Nevertheless, I use it a lot and am happy in general.
LinqPad is a free program and can be a way to learn LINQ. It comes loaded with 500 examples.
An open-ended question which may not have a "right" answer, but expert input on this would be appreciated.
Do SQL Queries Need to be that Complicated?
From a Web Dev point of view, as C#/.Net progresses, it seems that there are plenty of easy ways (LINQ, Generics) to do a lot of the things that some people tend to do in their SQL queries (sorting, ordering, merging, etc). That being said, since SQL tends to be the processing "bottleneck" for a lot of apps, a lot of the logic for SQL queries is being moved to the business layer.
As this trend continues, I'm seeing less of a need for large SQL queries.
What do you all think? Are you still writing large SQL queries? If so, is it because you need to or because you are more comfortable doing so than working in the business layer?
What's a "large" query?
The "bottleneck" encountered IME is typically because the tables were modeled poorly, compounded by someone constructing SQL queries that has little to no experience with SQL (the most common issue being thinking SQL is procedural when it's actually SET based). Lack of indexing is the next most common issue.
ORM has evolved to support native queries -- clear recognition that ORM simplifies database interaction, but can't perform as well as proper SQL query development.
Keeping the persistence handling in the business layer is justified by desiring database independence (at the risk of performance). Otherwise, it's a waste of money and resources to ignore what the database can handle in far larger loads, in a central location (that can be clustered).
It depends entirely on the processing. If you're trying to do lots of crazy stuff in your SQL which does things like pivoting or text processing, or whatever, and it turns out to be faster to avoid doing it in SQL and process it outside the database server instead, then yes, you were probably using SQL wrong, and the code belongs in the business layer or on the client.
In contrast, SQL excels at set operations, and that's what it should primarily be used for. I've seen an awful lot of applications slowed down because business logic or display code was grabbing a million rows of resultset from the database, bringing them back one at a time, and then throwing 990,000 of them away by doing what's effectively a set operation (JOIN, whatever) outside the database, instead of selecting the 10,000 interesting results using a query on the server and then processing the results of that.
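A small illustration of the difference, with placeholder entity names; the first form drags every row across before filtering, the second lets the server do the set operation:
// Client-side: AsEnumerable() ends SQL translation, so all rows are fetched
// and most of them are thrown away in memory.
var slow = db.Orders.AsEnumerable()
                    .Where(o => o.Total > 1000)
                    .ToList();

// Server-side: the Where is translated to SQL; only matching rows come back.
var fast = db.Orders.Where(o => o.Total > 1000)
                    .ToList();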
So. It depends on what you mean by "large SQL queries". I feel from the way you're asking the question that what you mean is "overly-complex, non-set-based translations of business/presentation logic into SQL queries that should never have been written in the first place."
In many data-in/data-out cases, no.
In some cases, yes.
If all you need to work with is a simple navigation hierarchy (mainly focusing on parent, sibling, child, etc.), then LINQ and its friends are excellent choices - they reduce the pain (and effort and risk) for the majority of queries. But there are a number of scenarios where it doesn't work so well:
large-scale set-based operations: I can do a wide-ranging update at the database in T-SQL, without the need to drag all that data over the network in one large query and then (even worse) update each record individually (since in many cases the ORM tools will issue individual UPDATE/INSERT/DELETE operations, etc.). Not only is this slow, it increases the chances of data drift. To counter that you might add a transaction, but a long-lived transaction (held while you pull a glut of data over the network) is bad
simply: there are a lot of queries where hand-tuning achieves things that the ORMs simply can't. I had a scenario recently where a relatively basic LINQ query was performing badly; I hand-tuned it (using some ROW_NUMBER() etc.) and the IO stats went down to only 5% of what they were with the generated query.
there are some queries that are exceptionally difficult to express in some query-syntax options, and which, even if you manage it, would lead to bad queries. Yet they can be expressed very elegantly in T-SQL: for example, Linq to Sql: select query with a custom order by
This is a subjective question.
IMO, SQL (or whatever query language you use to access the db) should be as complicated as necessary to solve performance problems.
There are two competing interests:
Performance: This means, load the least amount of data you need in the smallest number of queries.
Maintainability: Load as much as possible (let's say, as far as it makes sense) with the simplest, most reusable kind of query, and do everything else in memory.
So you always need to find your way between performance and maintainability. This is actually nothing special - that's what you do when programming all the time.
Newer ways of doing db queries don't change a lot in this situation. Even if you use NHibernate's HQL, you consider performance and maintainability. You already went a step to maintainability, but you may fall back to SQL to tune some queries.
For me, the deciding factor between writing a giant SQL query or a bunch of simple queries and then doing everything in code is usually performance. The latter is preferred, but if it goes way too slow I'll do the former (SQL is optimized for data processing, after all).
The reason I prefer the latter is that, in general, my team is more comfortable with code than with SQL queries. I like SQL a lot, but if a giant SQL query means I'm the only one who can debug or understand it in a reasonable amount of time, that's not a good thing. Another reason is that with a giant query you will usually end up programming some business logic in it. If I have a business layer, I prefer to have as much of my business logic there as possible.
Of course, you could decide to put all your business logic in stored procedures. Your program is then nothing more than a GUI on top of the API of your database. It depends on the requirements of your project and whether your team can handle it.
That said, you give Linq as an alternative technology. I have noticed in my team that, thanks to my experience with SQL, I'm very comfortable with Linq while my colleagues are not. The problem on a deeper level is procedural vs. set-based thinking. Linq is comparable to SQL in this respect: if you are not comfortable with SQL, chances are you won't be comfortable with Linq either.
I have a question for anyone who has experience with i4o or PLINQ. I have a big object collection (about 400K items) that I need to query. The logic is very simple and straightforward. For example, given a collection of Person objects, I need to find the persons that match on firstName, lastName, date of birth, or the first initial of firstName/lastName, etc. It is just a time-consuming process using LINQ to Objects.
I am wondering if i4o (http://www.codeplex.com/i4o)
or PLINQ can help improve the query performance. Which one is better? And is there any other approach out there?
Thanks!
With 400k objects, I wonder whether a database (either in-process or out-of-process) wouldn't be a more appropriate answer. This then abstracts the index creation process. In particular, any database will support multiple different indexes over different column(s), making the queries cited all very supportable without having to code specifically for each (just let the query optimizer worry about it).
Working with it in-memory may be valid, but you might (with vanilla .NET) have to do a lot more manual index management. By the sounds of it, i4o would certainly be worth investigating, but I don't have any existing comparison data.
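A sketch of what that "manual index management" could look like in vanilla .NET, assuming a Person type with the fields mentioned in the question:
// Build a one-off "index" over the 400K items: one pass to group by last name,
// after which each lookup is a hash probe instead of a full scan.
ILookup<string, Person> byLastName = people.ToLookup(p => p.LastName);

// Later queries hit the index instead of scanning the whole collection.
var smiths = byLastName["Smith"].Where(p => p.FirstName == "John").ToList();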
i4o: is meant to speed up querying with LINQ by using indexes, as in the old relational-database days.
PLINQ: is meant to use extra CPU cores to process the query in parallel.
If performance is your target, then depending on your hardware I say go with i4o; it will make a hell of an improvement.
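For comparison, the PLINQ route is a one-line change on an in-memory collection (Person and the list are placeholders):
// AsParallel() partitions the 400K-item list across the available cores;
// matching results are collected back into a single list.
var matches = people.AsParallel()
                    .Where(p => p.LastName == lastName && p.DateOfBirth == dob)
                    .ToList();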
I haven't used i4o but I have used PLINQ.
Without knowing the specifics of the query you're trying to improve, it's hard to say which (if any) will help.
PLINQ allows for multiprocessing of queries, where applicable. There are times, however, when parallel processing won't help.
i4o looks like it helps with indexing, which will speed up some calls, but not others.
Bottom line is, it depends on the query being run.