I am seeing a performance discrepancy between an EF query run through the web application and the Profiler-generated T-SQL run directly in a SQL query window.
Following is my EF query that executes through the web application:
IEnumerable<application> _entityList = context.applications
.Include(context.indb_generalInfo.EntitySet.Name)
.Include(context.setup_budget.EntitySet.Name)
.Include(context.setup_committee.EntitySet.Name)
.Include(context.setup_fund.EntitySet.Name)
.Include(context.setup_appStatus.EntitySet.Name)
.Include(context.appSancAdvices.EntitySet.Name)
.Where(e => e.indb_generalInfo != null);
if (isIFL != null)
_entityList = _entityList.Where(e => e.app_isIFL == isIFL);
int _entityCount = _entityList.Count(); // hits the database server at this line
Tracing the above EF query in SQL Profiler reveals that it took around 221,095 ms to execute. (The applications table has 30,000+ records, indb_generalInfo has 11,000+ and appSancAdvices has 30,000+.)
However, when I copy the T-SQL from Profiler and run it directly in a query window, it takes only around 4,000 ms.
Why is it so?
The venom in this query is in the first words: IEnumerable<application>. If you replace that with var (i.e. an IQueryable<application>), the query will be translated into SQL up to and including the final Count(). This will take considerably less time, because the amount of transported data is reduced to almost nothing.
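As a minimal sketch of that change (reusing the property names from the question), the count then runs as a single server-side query:

IQueryable<application> _entityList = context.applications
    .Where(e => e.indb_generalInfo != null);    // no Includes needed just to count

if (isIFL != null)
    _entityList = _entityList.Where(e => e.app_isIFL == isIFL);

int _entityCount = _entityList.Count();         // translated to a single SELECT COUNT(*)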
Further, as bobek already mentioned, you don't need the Includes as you're only counting context.applications items.
Apart from that, you will always notice some overhead when using an ORM like Entity Framework.
That's because EF needs to translate your code into T-SQL first, which is costly as well. Look at this link: http://peterkellner.net/2009/05/06/linq-to-sql-slow-performance-compilequery-critical/ It shows how to compile your LINQ and should help with speed. Also, do you really need that many tables for this query? Maybe you can find a way to filter and pull out only what you need.
EF definitely has a cost in terms of performance, but it also provides the flexibility to use stored procedures for complex T-SQL. In my opinion, though, that should be your last resort.
In case you are interested in performance and EF in general:
http://msdn.microsoft.com/en-us/data/hh949853.aspx
However...
"EF Query in SQL Profiler it reveals that it took around 221'095 ms to execute."
then...
"copy the T-SQL from Profiler and run it directly from Query window"
Where the SQL came from is irrelevant.
Query Q1 took x milliseconds, based on SQL Profiler info.
The exact same query Q1' takes less, again based on SQL Profiler. That means the source of the SQL isn't the issue; it implies environmental factors are involved.
The most obvious explanation: SQL Server has buffered many data pages and can serve the second, identical request much faster.
Related
I'm currently working on a WPF application which was built using Entity Framework (database first) to access data in a SQL Server database.
In the past, the database was on an internal server and I did not notice any problem regarding the performance of the application, even though the database is very badly implemented (only tables, no views, no indexes or stored procedures). I'm the one who created it, but it was my first job and I was not very good with databases, so I felt like Entity Framework was the best approach to focus mainly on code.
However, the database is now on another server which is waaay slower. As you can guess, the application now has big performance issues (more than 10 seconds to load a dozen rows, the same to insert new rows, ...).
Should I stay with Entity Framework but try to improve performance by altering the database, adding views and stored procedures?
Should I get rid of Entity Framework and use only "basic" code (and improve the database at the same time)?
Is there a simple ORM I could use instead of EF?
Time is not an issue here; I can use all the time I want to improve the application, but I can't seem to make a decision about the best way to make my application evolve.
The database is quite simple (around 10 tables); the only thing that could complicate things is that I store files in it. So I'm not sure I can really use whatever I want. And I don't know if it's important, but I need to display quite a few calculated fields. Any advice?
Feel free to ask any relevant questions.
For performance profiling, the first place I recommend looking is an SQL profiler. This can capture the exact SQL statements that EF is running, and help identify possible performance culprits. I cover a few of these here. The Schema issues are probably the most relevant place to start. The title targets MVC, but most of the items relate to WPF and any application.
A good, simple profiler that I use for SQL Server is ExpressProfiler. (https://github.com/OleksiiKovalov/expressprofiler)
With the move to a new server, and it now sending the data over the wire rather than pulling from a local database, the performance issues you're noticing will most likely fall under the category of "loading too much, too often". Now you won't only be waiting for the database to load the data, but also for it to package it up and send it over the wire. Also, does the new database represent the same data volume and serve only a single client, or does it now serve multiple clients? Another catch for developers is "works on my machine", where local testing databases are smaller and not dealing with concurrent queries against the server (where locks and such can impact performance).
From here, run a copy of the application with an isolated database server (no other clients hitting it to reduce "noise") with the profiler running against it. The things to look out for:
Lazy Loading - These are cases where you have a query to load data, but then see lots (dozens to hundreds) of additional queries being spun off. Your code may say "run this query and populate this data", which you expect should be one SQL query, but by touching lazy-loaded properties it can spin off a great many other queries.
The solution to lazy loading: If you need the extra data, eager load it with .Include(). If you only need some of the data, look into using .Select() to select view models / DTO of the data you need rather than relying on complete entities. This will eliminate lazy load scenarios, but may require some significant changes to your code to work with view models/dtos. Tools like Automapper can help greatly here. Read up on .ProjectTo() to see how Automapper can work with IQueryable to eliminate lazy load hits.
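As a rough sketch of that idea (the DTO shape and property names here are invented for illustration), a projection keeps everything in one translated query and never touches lazy-loaded navigation properties:

var orderSummaries = context.Orders
    .Where(o => o.IsActive)
    .Select(o => new OrderSummaryDto        // hypothetical DTO with only the fields the view needs
    {
        Id = o.Id,
        CustomerName = o.Customer.Name,     // navigation used inside the query, no lazy load
        ItemCount = o.OrderItems.Count()
    })
    .ToList();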
Reading too much - Loading entities can be expensive, especially if you don't need all of that data. Culprits for performance include excessive use of .ToList() which will materialize entire entity sets where a subset of data is needed, or a simple exists check or count would suffice. For example, I've seen code that does stuff like this:
var data = context.MyObjects.SingleOrDefault(x => x.IsActive && x.Id == someId);
return (data != null);
This should be:
var isData = context.MyObjects.Where(x => x.IsActive && x.Id == someId).Any();
return isData;
The difference between the two is that in the first example, EF will effectively do a SELECT * operation, so in the case where data is present it will return back all columns into an entity, only to later check if the entity was present. The second statement will run a faster query to simply return back whether a row exists or not.
Another pattern to watch for is materializing entities with .ToList() before projecting them, for example:
var myDtos = context.MyObjects.Where(x => x.IsActive && x.ParentId == parentId)
    .ToList()
    .Select(x => new ObjectDto
    {
        Id = x.Id,
        Name = x.FirstName + " " + x.LastName,
        Balance = calculateBalance(x.OrderItems.ToList()),
        Children = x.Children.ToList()
            .Select(c => new ChildDto
            {
                Id = c.Id,
                Name = c.Name
            }).ToList()
    }).ToList();
Statements like this can go on and get rather complex, but the real problem is the .ToList() before the .Select(). Often these creep in because devs try to do something that EF doesn't understand, like calling a method (i.e. calculateBalance()), and it "works" by first calling .ToList(). The problem is that you are materializing the entire entity at that point and switching to LINQ to Objects. This means that any "touches" on related data, such as .Children, will now trigger lazy loads, and further .ToList() calls can saturate more data into memory which might otherwise be reduced in the query. The culprit to look out for is .ToList() calls; try removing them. Select simpler values before calling .ToList(), then feed that data into view models where the view models can calculate the resulting data.
The worst culprit like this I've seen was due to a developer wanting to use a function in a Where clause:
var data = context.MyObjects.ToList().Where(x => calculateBalance(x) > 0).ToList();
That first ToList() call will attempt to saturate the whole table into entities in memory. A big performance impact beyond just the time/memory/bandwidth needed to load all of this data is the number of locks the database must take to reliably read/write data. The fewer rows you "touch", and the shorter the time you touch them for, the better your queries will play with concurrent operations from multiple clients. These problems magnify greatly as systems transition to being used by more users.
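One way around that pattern, sketched here under the assumption that the balance can be computed from a few order-item columns (Amount is a made-up column name), is to project just those values first and finish the calculation in memory on a much smaller result set:

var candidates = context.MyObjects
    .Where(x => x.IsActive)
    .Select(x => new
    {
        x.Id,
        ItemAmounts = x.OrderItems.Select(i => i.Amount)   // hypothetical column
    })
    .ToList();                                             // only Ids and amounts cross the wire

var positiveBalances = candidates
    .Where(x => x.ItemAmounts.Sum() > 0)                   // the "calculateBalance" logic, inlined
    .ToList();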
Provided you've eliminated extra lazy loads and unnecessary queries, the next thing to look at is query performance. For operations that seem slow, copy the SQL statement out of the profiler and run that in the database while reviewing the execution plan. This can provide hints about indexes you can add to speed up queries. Again, using .Select() can greatly increase query performance by using indexes more efficiently and reducing the amount of data the server needs to pull back.
For file storage: Are these stored as columns in a relevant table, or in a separate table that is linked to the relevant record? What I mean by this, is if you have an Invoice record, and also have a copy of an invoice file saved in the database, is it:
Invoices
    InvoiceId
    InvoiceNumber
    ...
    InvoiceFileData

or

Invoices
    InvoiceId
    InvoiceNumber
    ...

InvoiceFile
    InvoiceId
    InvoiceFileData
It is a better structure to keep large, seldom used data in separate tables rather than combined with commonly used data. This keeps queries to load entities small and fast, where that expensive data can be pulled up on-demand when needed.
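As an illustrative sketch of the second layout mapped to EF entities (class and property names are assumed), the large blob lives on a separate, lazily loaded entity, so loading an Invoice never drags the file bytes along unless they are asked for:

public class Invoice
{
    public int InvoiceId { get; set; }
    public string InvoiceNumber { get; set; }
    public virtual InvoiceFile File { get; set; }   // loaded on demand, or via explicit Include
}

public class InvoiceFile
{
    public int InvoiceId { get; set; }              // shared primary key with Invoices
    public byte[] InvoiceFileData { get; set; }
}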
If you are using GUIDs for keys (as opposed to ints/longs), are you leveraging newsequentialid()? (assuming SQL Server) Keys set to use newid(), or Guid.NewGuid() in code, will lead to index fragmentation and poor performance. If you populate the IDs via database defaults, switch them over to use newsequentialid() to help reduce the fragmentation. If you populate IDs via code, have a look at writing a GUID generator that mimics newsequentialid() (SQL Server) or a pattern suited to your database. SQL Server and Oracle store/index GUID values differently, so having the "static-like" part of the UUID bytes in the higher-order vs. lower-order bytes will aid indexing performance. Also consider index maintenance and other database maintenance jobs to help keep the database server running efficiently.
When it comes to index tuning, database server reports are your friends. After you've eliminated most, or at least some, serious performance offenders from your code, the next thing to look at is real-world use of your system. The best way to learn where to target your code and index investigations is the most-used and problem queries that the database server identifies. Where these are EF queries, you can usually reverse-engineer which EF query is responsible based on the tables being hit. Grab these queries and feed them through the execution plan to see if there is an index that might help matters. Indexing is something that developers either forget about or get prematurely concerned about: too many indexes can be just as bad as too few. I find it's best to monitor real-world usage before deciding on what indexes to add.
This should hopefully give you a start on things to look for and kick the speed of that system up a notch. :)
First you need to run a performance profiler and find out what the bottleneck is; it could be the database, the Entity Framework configuration, the Entity Framework queries and so on.
In my experience, Entity Framework is a good option for this kind of application, but you need to understand how it works.
Also, what version of Entity Framework are you using? The latest version is 6.2 and has some performance improvements that older ones do not have, so if you are using an old one I suggest you update it.
Based on the comments I am going to hazard a guess that it is mostly a bandwidth issue.
You had an application that was working fine when it was co-located, perhaps a single switch, gigabit ethernet and 200m of cabling.
Now that application is trying to send or retrieve data to/from a remote server, probably over the public internet through an unknown number of internal proxies in contention with who knows what other traffic, and it doesn't perform as well.
You also mention that you store files in the database, and your schema has fields like Attachment.data and Doc.file_content. This suggests that you could be trying to transmit large quantities (perhaps megabytes) of data for a simple query and that is where you are falling down.
Some general pointers:
Add indexes anywhere you are joining tables or on values you commonly query on.
Be aware of the difference between Lazy & Eager loading in Entity Framework. There is no right or wrong answer, but you should know which approach you are using and why.
Split any file content into its own table, with the same primary key as the main table, or play with different EF classes to make sure you only retrieve files when you need to use them.
I am querying for values from a database in AWS Sydney (I am in New Zealand). Using a Stopwatch I measured the query time; it is wildly inconsistent, sometimes in the tens of milliseconds and sometimes in the hundreds of milliseconds, for the exact same query. I have no idea why.
var device = db.things.AsQueryable().FirstOrDefault(p => p.ThingName == model.thingName);
The things table only has 5 entries. I have tried it without the AsQueryable() and it seems to make no difference. I am using Visual Studio 2013 and Entity Framework version 6.1.1.
EDIT:
Because this is for a business, I cannot put a lot of code up. Another timing example is that it went from 34 ms to 400 ms.
thanks
This can be related to cold vs. warm query execution.
The very first time any query is made against a given model, the Entity Framework does a lot of work behind the scenes to load and validate the model. We frequently refer to this first query as a "cold" query. Further queries against an already loaded model are known as "warm" queries, and are much faster.
You can find more information about this in the following article:
https://msdn.microsoft.com/en-us/library/hh949853(v=vs.113).aspx
One way to make sure this is the problem is to write a stored procedure and get the data through it (still using Entity Framework) to see if the problem is in the connection or in the query (Entity Framework) itself.
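Separately, if the cold first query turns out to be the cause, one common mitigation (a sketch assuming an EF6 DbContext named MyContext) is to pay that one-time cost at application start rather than on the first real request:

using (var ctx = new MyContext())
{
    // Forces EF to build and validate the model now, so the first
    // user-facing query doesn't absorb that one-time "cold" cost.
    ctx.Database.Initialize(force: false);
}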
Came across code where .Where(o => o.x == y) was changed to .Where(o => o.x.Equals(y)). I knew that == was parsed out by EF's SQL generator to execute on the server, but wasn't sure about .Equals(). Clearly this change was done as a matter of habit, perhaps someone coming out of a C++ background and not thinking about the fact that == would have been parsed as an expression, not executed as a function, and would be converted to SQL. This change compiles and runs but I was wondering if it's because EF is treating it as a Func instead of as an expression, and as such perhaps executing a generalized query and moving the filter to client-side, or something similarly ridiculous.
Linq-To-Entities, as of version 6 of EF, doesn't perform any kind of filtering on the client. If you try to execute any unsupported function (meaning one that can't be translated by the DB provider) on an EF IQueryable, it'll throw an exception.
So the answer is: no, it's not executing it locally.
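For illustration (EF6 is assumed, and CalculateAge is a made-up local method that EF cannot translate), a query like this fails when it is enumerated instead of quietly filtering on the client:

var adults = context.People
    .Where(p => CalculateAge(p.BirthDate) >= 18)   // throws NotSupportedException at query time
    .ToList();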
PS: I've read somewhere that this feature is a planned addition to EF7, but this is unconfirmed and just speculation
Update: link to source here: http://blogs.msdn.com/b/adonet/archive/2014/10/27/ef7-v1-or-v7.aspx
Quoting the relevant part in case the link goes dead:
An example of this is how queries are processed. In EF6.x the entire LINQ query was translated into a single SQL query that was executed in the database. This meant your query could only contain things that EF knew how to translate to SQL and you would often get complex SQL that did not perform well.
In EF7 we are adopting a model where the provider gets to select which bits of the query to execute in the database, and how they are executed. This means that query now supports evaluating parts of the query on the client rather than database. It also means the providers can make use of queries with multiple results sets etc., rather than creating one single SELECT with everything in it.
I ran SQL Profiler. It generated "[table].[x] = 'y'" as was originally intended with '=='.
I was looking through the sample LINQ queries provided with LINQPad taken from the C# 4.0 in a Nutshell book, and ran across something I have never used in LINQ to SQL... Compiled Queries.
Here is the exact example:
// LINQ to SQL lets you precompile queries so that you pay the cost of translating
// the query from LINQ into SQL only once. In LINQPad the typed DataContext is
// called TypeDataContext, so we proceed as follows:
var cc = CompiledQuery.Compile ((TypedDataContext dc, decimal minPrice) =>
from c in Customers
where c.Purchases.Any (p => p.Price > minPrice)
select c
);
cc (this, 100).Dump ("Customers who spend more than $100");
cc (this, 1000).Dump ("Customers who spend more than $1000");
What does precompiling a LINQ to SQL query like this actually buy me? Would I get a performance boost from a query slightly more complex than this one? Is this even used in actual practice?
In a nutshell, precompiled queries buy you a performance gain when you need to run a single query multiple times.
Here's some information on LINQ To SQL performance.
I've read in several places that compiling your LINQ will help, but I have never heard anyone say how drastic the speed improvement can be. For example, in one of my favorite books (LINQ in Action) by Fabrice Marguerie and others, he quotes on page 296 a blog post by Rico Mariani titled DLINQ (Linq to SQL Performance, Part 1) as saying that a compiled query offers nearly twice the performance of a non-compiled query, and goes on to say that it brings the performance to within 93% of using a raw data reader. Well, suffice it to say I never ran the test myself. I could have lived with twice, but not 37 times.

So, from this, it seems that you should always compile your LINQ to SQL queries. Well, that's not quite true. What I'm recommending is that if you have a reason to execute the same query over and over, you should strongly consider compiling. If, for example, you are just making a LINQ to SQL call once, there is no benefit because you have to compile it anyway. Call it ten times? Well, you will have to decide for yourself.
The way I use compiled queries is in a static way: I statically declare the compiled query, so the query tree structure has to be parsed only once, and you basically have a prepared statement that just needs some extra parameters.
This is mainly used on websites, so the query has to be compiled only once, ever. The performance gain depends of course on the complexity of your query.
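As an illustrative sketch (the data context and entity names are assumed), a statically declared LINQ to SQL compiled query looks something like this:

// Requires System.Data.Linq (LINQ to SQL). The query is compiled once
// per AppDomain; each call just supplies the DataContext and parameters.
static readonly Func<MyDataContext, decimal, IQueryable<Customer>> CustomersSpendingOver =
    CompiledQuery.Compile((MyDataContext dc, decimal minPrice) =>
        from c in dc.Customers
        where c.Purchases.Any(p => p.Price > minPrice)
        select c);

// Usage:
// var bigSpenders = CustomersSpendingOver(dataContext, 100m).ToList();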
We use this at our company, so that queries that are run often do not need to be compiled for each run. You don't have to get overly complex with LINQ to SQL before this makes a difference, but it will depend on the traffic and load on the servers.
From this article from Rico Mariani's Performance Tidbits
Q4: What are the downsides to precompiled queries?

A: There is no penalty to precompiling (see Quiz #13). The only way you might lose performance is if you precompile a zillion queries and then hardly use them at all -- you'd be wasting a lot of memory for no good reason.

But measure :)
I'm making an application that will analyze real-time data that has been stored to a SQL CE database. When I test the application as it is built now, with LINQ to SQL, I get slow results and I need to rethink how to do this.
To save me some time, can I trust that L2S is just as fast as the "old" SqlCe methods were? I like L2S and would prefer to stay with it; if your experience says it's as fast as any other db connection, I can rest assured that I wouldn't increase performance by rewriting the L2S to old SQL statements.
The bottlenecks when using SQL CE don't stem from the SQL generated by LINQ to SQL. Remember, CE is an in-process db and therefore has its limitations. For example, LEFT OUTER JOINs are a disaster regardless of what you use to query it. Inserts and updates aren't bad, but then again, if you'll be doing a high volume of either of those, you'll suffer some serious performance issues. My point is, the slowness isn't because of LINQ to SQL. I've benchmarked it in the past (don't know if I still have that code) and from what I remember, LINQ to SQL wasn't slower than querying it directly with ADO.NET. The performance issues are due to the constraints of CE itself.
If you are using SQL CE, this video from last year's PDC is very informative. The ideas we have about how to optimize queries for full-blown SQL Server do not always apply, and can sometimes hurt performance on SQL CE.
I would recommend you watch it, as the presenter explains the differences and does benchmarks to show the results. Here you can find a link to his blog.