I was looking for some tips to improve my Entity Framework query performance and came across this useful article.
The author of the article mentions the following:
09 Avoid using Views
Views degrade the LINQ query performance costly. These are slow in performance and impact the performance greatly. So avoid using views in LINQ to Entities.
I am only familiar with the meaning of "view" in the context of databases, so I don't understand this statement: which views does he mean?
It depends, though rarely to a significant degree.
Let's say we've a view like:
CREATE VIEW TestView
AS
Select A.x, B.y, B.z
FROM A JOIN B on A.id = B.id
And we create an entity mapping for this.
Let's also assume that B.id is bound so that it is non-nullable and has a foreign key relationship with A.id - that is, whenever there's a B row, there is always at least one corresponding A.
Now, we could do something like from t in context.TestView where t.x == 3 instead of from a in context.A join b in context.B on a.id equals b.id where a.x == 3 select new {a.x, b.y, b.z}.
We can expect the former to be converted to SQL marginally faster, because it's a marginally simpler query (from both the Linq and SQL perspective).
We can expect the latter to be converted from an SQL query to a SQLServer (or whatever) internal query marginally faster.
We can expect that internal query to be pretty much identical, unless something went a bit strange. As such, we'd expect the performance at that point to be identical.
In all, there isn't very much to choose between them. If I had to bet on one, I'd bet on the one using the view being slightly faster, especially on the first call, but I wouldn't bet a lot on it.
Now let's consider (from t in context.TestView select t.z).Distinct(). vs (from b in context.B select b.z).Distinct().
Both of these should turn into a pretty simple SELECT DISTINCT z FROM ....
Both of these should turn into a table scan or index scan only of table B.
The first might not (a flaw in the query plan), but that would be surprising. (A quick check on a similar view does find SQL Server ignoring the irrelevant table.)
The first could take slightly longer to produce a query plan for, since the fact that the join on A.id is irrelevant would have to be deduced. But database servers are good at that sort of thing; it's a set of computer science problems that has had decades of work done on it.
If I had to bet on one, I'd bet on the view making things very slightly slower, though I'd bet more on the difference being so slight that it disappears. An actual test with these two sorts of query found the two to be within the same margin of difference (i.e. the ranges of times for the two overlapped).
The effect in this case on the production of the SQL from the LINQ query will be nil (they're effectively the same at that point, but with different names).
Let's consider what happens if we had a trigger on that view, so that inserting or deleting carried out the equivalent inserts or deletes. Here we gain slightly from using one SQL query rather than two (or more), and it's easier to ensure it happens in a single transaction. So a slight gain for views in this case.
Now, let's consider a much more complicated view:
CREATE VIEW Complicated
AS
Select A.x, B.x as y, C.z, COALESCE(D.f, D.g, E.h) as foo
FROM
A JOIN B on A.r = B.f + 2
JOIN C on COALESCE(A.g, B.x) = C.x
JOIN D on D.flag | C.flagMask <> 0
WHERE EXISTS (SELECT null from G where G.x + G.y = A.bar AND G.deleted = 0)
AND A.deleted = 0 AND B.deleted = 0
We could do all of this at the LINQ level. If we did, query production would probably be a bit expensive, though that is rarely the most expensive part of the overall cost of a LINQ query, and compiled queries may balance it out.
I'd lean toward the view being the more efficient approach, though I'd profile if that was my only reason for using the view.
Now let's consider:
CREATE VIEW AllAncestry
AS
WITH recurseAncetry (ancestorID, descendantID)
AS
(
SELECT parentID, childID
FROM Parentage
WHERE parentID IS NOT NULL
UNION ALL
SELECT ancestorID, childID
FROM recurseAncetry
INNER JOIN Parentage ON parentID = descendantID
)
SELECT DISTINCT (cast(ancestorID as bigint) * 0x100000000 + descendantID) as id, ancestorID, descendantID
FROM recurseAncetry
Conceptually, this view does a large number of selects; doing a select, and then recursively doing a select based on the result of that select and so on until it has all the possible results.
In actual execution, this is converted into two table scans and a lazy spool.
The LINQ-based equivalent would be much heavier; really you'd be better off either calling into the equivalent raw SQL, or loading the table into memory and then producing the full graph in C# (but note that this is wasteful for queries based on it that don't need everything).
In all, using a view here is going to be a big saving.
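For comparison, here is a rough sketch of the "load the table into memory and produce the full graph in C#" alternative mentioned above. The Ancestry class and the tuple shape are hypothetical names chosen for illustration; it assumes the Parentage rows have already been loaded as (parentId, childId) pairs with null parents filtered out, as the view's WHERE clause does.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class Ancestry
{
    // Hypothetical helper: expands (parentId, childId) rows into every
    // distinct (ancestorId, descendantId) pair, like the AllAncestry view.
    public static HashSet<(int AncestorId, int DescendantId)> AllPairs(
        IEnumerable<(int ParentId, int ChildId)> parentage)
    {
        // Index the children of each parent so each step is a cheap lookup.
        var childrenOf = parentage.ToLookup(p => p.ParentId, p => p.ChildId);

        var result = new HashSet<(int AncestorId, int DescendantId)>();
        foreach (int ancestor in childrenOf.Select(g => g.Key))
        {
            // Walk down from this ancestor, one generation at a time.
            var pending = new Stack<int>(childrenOf[ancestor]);
            while (pending.Count > 0)
            {
                int descendant = pending.Pop();
                // Add returns false for a pair already seen, which also
                // stops the walk if the data happens to contain a cycle.
                if (result.Add((ancestor, descendant)))
                {
                    foreach (int next in childrenOf[descendant])
                        pending.Push(next);
                }
            }
        }
        return result;
    }
}
```

Note that this always builds the whole graph, which is exactly the waste the answer warns about when a query only needs part of it.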
In summary, using views generally has a negligible performance impact, and that impact can go either way. Using views with triggers can give a slight performance win and make it easier to ensure data integrity, by forcing it to happen in a single transaction. Using views with a CTE can be a big performance win.
Non-performance reasons for using or avoiding views, though, are:
The use of views hides the relationship between the entities related to that view and the entities related to the underlying tables from your code. This is bad as your model is now incomplete in this regard.
If the views are used in other applications apart from yours, you will be more consistent with those other applications, take advantage of already tried-and-tested code, and automatically deal with changes to the view's implementation.
That's some pretty serious micro-optimisation in that article.
I wouldn't take it as gospel personally, having worked with EF quite a bit.
Sure, those things can matter, but generally speaking, EF is pretty quick.
If you've got a complicated view and you're then performing further LINQ on that view, then sure, it could cause some slow performance, though I wouldn't bet on it.
The article doesn't even have any benchmarks!
If performance is a serious issue for your program, narrow down which queries are slow and post them here, see if the SO community can help optimise the query for you. Much better solution than all the micro-optimisation if you ask me.
Related
I've inherited a codebase and I'm having a weird issue with Entity Framework Core v3.1.19.
Entity Framework is generating the following query (as found in SQL Server Profiler) and it's taking nearly 30 seconds to run, whereas running the same SQL (again taken from Profiler) takes 1 second in SSMS. This is one example, but the entire site runs extremely slowly when getting data from the database.
exec sp_executesql N'SELECT [t].[Id], [t].[AccrualLink], [t].[BidId], [t].[BidId1], [t].[Cancelled], [t].[ClientId], [t].[CreatedUtc], [t].[CreatorUserId], [t].[Date], [t].[DeletedUtc], [t].[DeleterUserId], [t].[EmergencyContact], [t].[EmergencyName], [t].[EmergencyPhone], [t].[EndDate], [t].[FinalizerId], [t].[Guid], [t].[Invoiced], [t].[IsDeleted], [t].[Notes], [t].[OfficeId], [t].[PONumber], [t].[PlannerId], [t].[PortAgencyAgentEmail], [t].[PortAgencyAgentName], [t].[PortAgencyAgentPhone], [t].[PortAgencyId], [t].[PortAgentId], [t].[PortId], [t].[PortType], [t].[PositionNote], [t].[ProposalLink], [t].[ServiceId], [t].[ShipId], [t].[ShorexAssistantEmail], [t].[ShorexAssistantName], [t].[ShorexAssistantPhone], [t].[ShorexManagerEmail], [t].[ShorexManagerName], [t].[ShorexManagerPhone], [t].[ShuttleBus], [t].[ShuttleBusEmail], [t].[ShuttleBusName], [t].[ShuttleBusPhone], [t].[ShuttleBusServiceProvided], [t].[TouristInformationBus], [t].[TouristInformationEmail], [t].[TouristInformationName], [t].[TouristInformationPhone], [t].[TouristInformationServiceProvided], [t].[UpdatedUtc], [t].[UpdaterUserId], [t].[Water], [t].[WaterDetails], [t0].[Id], [t0].[CreatedUtc], [t0].[CreatorUserId], [t0].[DeletedUtc], [t0].[DeleterUserId], [t0].[Guid], [t0].[IsDeleted], [t0].[LanguageId], [t0].[Logo], [t0].[Name], [t0].[Notes], [t0].[OldId], [t0].[PaymentTerms], [t0].[Pricing], [t0].[Services], [t0].[Status], [t0].[UpdatedUtc], [t0].[UpdaterUserId], [t1].[Id], [t1].[CreatedUtc], [t1].[CreatorUserId], [t1].[DeletedUtc], [t1].[DeleterUserId], [t1].[Guid], [t1].[IsDeleted], [t1].[Name], [t1].[OldId], [t1].[UpdatedUtc], [t1].[UpdaterUserId], [s].[Id], [s].[CreatedUtc], [s].[CreatorUserId], [s].[DeletedUtc], [s].[DeleterUserId], [s].[Guid], [s].[IsDeleted], [s].[Name], [s].[Pax], [s].[UpdatedUtc], [s].[UpdaterUserId]
FROM (
SELECT [o].[Id], [o].[AccrualLink], [o].[BidId], [o].[BidId1], [o].[Cancelled], [o].[ClientId], [o].[CreatedUtc], [o].[CreatorUserId], [o].[Date], [o].[DeletedUtc], [o].[DeleterUserId], [o].[EmergencyContact], [o].[EmergencyName], [o].[EmergencyPhone], [o].[EndDate], [o].[FinalizerId], [o].[Guid], [o].[Invoiced], [o].[IsDeleted], [o].[Notes], [o].[OfficeId], [o].[PONumber], [o].[PlannerId], [o].[PortAgencyAgentEmail], [o].[PortAgencyAgentName], [o].[PortAgencyAgentPhone], [o].[PortAgencyId], [o].[PortAgentId], [o].[PortId], [o].[PortType], [o].[PositionNote], [o].[ProposalLink], [o].[ServiceId], [o].[ShipId], [o].[ShorexAssistantEmail], [o].[ShorexAssistantName], [o].[ShorexAssistantPhone], [o].[ShorexManagerEmail], [o].[ShorexManagerName], [o].[ShorexManagerPhone], [o].[ShuttleBus], [o].[ShuttleBusEmail], [o].[ShuttleBusName], [o].[ShuttleBusPhone], [o].[ShuttleBusServiceProvided], [o].[TouristInformationBus], [o].[TouristInformationEmail], [o].[TouristInformationName], [o].[TouristInformationPhone], [o].[TouristInformationServiceProvided], [o].[UpdatedUtc], [o].[UpdaterUserId], [o].[Water], [o].[WaterDetails]
FROM [OpsDocuments] AS [o]
WHERE ([o].[IsDeleted] <> CAST(1 AS bit)) AND ((CASE
WHEN [o].[Cancelled] = CAST(0 AS bit) THEN CAST(1 AS bit)
ELSE CAST(0 AS bit)
END & CASE
WHEN [o].[Invoiced] = CAST(0 AS bit) THEN CAST(1 AS bit)
ELSE CAST(0 AS bit)
END) = CAST(1 AS bit))
ORDER BY [o].[Date]
OFFSET @__p_0 ROWS FETCH NEXT @__p_1 ROWS ONLY
) AS [t]
LEFT JOIN [TourClients] AS [t0] ON [t].[ClientId] = [t0].[Id]
LEFT JOIN [TourLanguages] AS [t1] ON [t0].[LanguageId] = [t1].[Id]
LEFT JOIN [Ships] AS [s] ON [t].[ShipId] = [s].[Id]
ORDER BY [t].[Date]',N'@__p_0 int,@__p_1 int',@__p_0=0,@__p_1=10
This query is returning 10 rows from a possible 55, so we're not talking big numbers or anything.
At first I thought it might be data type issues on conversion, but I checked all the data types and they are all correct, and since the issue shows up in Profiler I'm assuming this is a SQL issue rather than something specific to Entity Framework. However, I can't find any difference between the two when running in Profiler, except that the one from EF takes 30 times longer.
Hoping someone might have a suggestion of where to look.
Edit: Thanks for all the suggestions in the comments. As for the LINQ and a reproducible example, that's going to be tricky, as the code base for this project is some odd home-baked auto-generating system. You give it a ViewModel with tonnes of custom attributes and it tries to do everything for you (so many layers of abstraction), so it's difficult to find anything.
It sounds like I'm going to have to start rewriting these into more finite controllers.
EF will always take longer than raw SQL because EF has to materialize tracked entities for every entity returned in the query.
Looking at the SQL, this is an eager-loading query across 4 tables: OpsDocuments, TourClients, TourLanguages, and Ships.
Reasons this could suddenly take much longer after some seemingly unrelated changes: new relationships being lazy loaded.
An example of this would be where this data is being serialized and a new relationship has been added to one or more entities which are now being tripped by lazy load hits. (Usually evidenced by seeing extra queries coming up after this one runs before the page loads)
Other causes for this to be taking longer than it should:
The DbContext is tracking too many entities. The more entities a DbContext is tracking, the more references it has to go through when piecing together results from a Linq query. Some teams expect that EF caches instances similar to NHibernate and this would improve performance. Typically it is the opposite, the more entities it is tracking the longer it can take to get results.
Concurrent reads & locks. If tables are not efficiently indexed this can be a bit of a killer when a system is run in production compared to testing/debugging. Typically though this would affect systems that have very large row and/or user counts.
The best general advice I can offer when it comes to tackling performance issues with EF is to leverage projection as much as possible. This helps you optimize queries and identify useful indexes that reflect the highest-volume scenarios in which you are pulling data, and it avoids future pitfalls from changing relationships, which can result in select n+1 lazy-load hits creeping into systems.
For example, instead of:
var results = context.OpsDocuments
.Include(x => x.TourClient)
.ThenInclude(x => x.TourLanguage)
.Include(x => x.Ship)
.OrderBy(x => x.Date)
.ToList();
use:
var results = context.OpsDocuments
.Select(x => new TourSummaryViewModel
{
DocumentId = x.DocumentId,
ClientId = x.Client.Id,
ClientName = x.Client.Name,
Language = x.Client.Language.Name,
ShipName = x.Ship.Name,
Date = x.Date
}).OrderBy(x => x.Date)
.ToList();
... where the view model reflects just the details you need from the entity graph. This protects you from newly introduced relationships that the view/consumer doesn't need (unless you add them to the Select), and the resulting query can help identify useful indexes to boost performance if this is something that gets run a fair bit (tuning indexing based on actual DB use rather than guesswork).
I would also recommend that all queries like this implement a limit on the maximum number of rows returned (using Take), to help avoid surprises as systems age and growing row counts degrade performance over time.
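As a minimal sketch of the row limit suggested above: the Paging class, its MaxPageSize value and the Page method are hypothetical names, not part of EF. It only uses the standard OrderBy/Skip/Take operators on IQueryable, so against an EF query the cap would be applied in the generated SQL rather than in memory.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class Paging
{
    // Hypothetical hard cap; pick a value that suits the screen or API being served.
    public const int MaxPageSize = 100;

    public static List<T> Page<T, TKey>(
        IQueryable<T> source,
        Expression<Func<T, TKey>> orderBy,
        int skip,
        int take)
    {
        // Clamp the requested size so a caller can never ask for an unbounded result.
        int capped = Math.Min(Math.Max(take, 1), MaxPageSize);

        // A stable ordering is applied before Skip/Take so pages are well defined.
        return source
            .OrderBy(orderBy)
            .Skip(Math.Max(skip, 0))
            .Take(capped)
            .ToList();
    }
}
```

With the projection example above, this could be called as Paging.Page(query, x => x.Date, 0, 10), where query is the projected IQueryable before ToList().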
I know this is a very late answer, but based on a similar situation recently encountered - this looks very much like an EntityFramework LINQ-to-SQL clause in the codebase is using bitwise operators ('&', '|') instead of logical operators ('&&', '||'). That would explain the odd 'CAST 1 as bit' and '&' and '|' occurrences in the generated SQL above.
CASTs in the WHERE absolutely kill performance. In our case, a 30sec query immediately went subsecond once this was identified.*
Check for LINQ along the lines of: ".Where(x => (x.Prop1 == true & x.Prop2 == false) | (x.Prop3 == true))..." and ensure the operators are '&&' instead of '&', etc. It's easy to be already thinking ahead to SQL when writing this code, but it's still C#!
I need to be a little more specific on how CASTing in WHEREs killed performance in our case, without actually CASTing the db fields themselves. Here's an example of generated SQL, from using bitwise ops in the EF Core .Where() C#:
WHERE CASE WHEN cid = 1234 THEN CAST(1 as bit) ELSE CAST (0 as bit) END & (CASE WHEN (date1 IS NULL OR date1 IN ('2000-1-1', '1999-1-1')) THEN CAST (1 as bit) ELSE CAST(0 as bit) END | CASE WHEN (isverified IS NOT NULL AND isverified = CAST(1 as bit)) THEN CAST (1 AS bit) ELSE CAST(0 as bit) END)
This can be rewritten with logical ops as:
WHERE cid=1234 AND ((date1 IS NULL OR date1 IN ('2000-1-1', '1999-1-1')) OR (isverified IS NOT NULL AND isverified=1))
First query (EF-generated thanks to bitwise ops mistakenly in the EF code Where clause) took 30secs on our 45-million-row table. The second was <1s. The explanation for this that I can see is - and I'm open to correction - that the first query essentially generates a bitwise expression per row that must be evaluated, thus being non-sargable and requiring a table scan.
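One way to see from the C# side why the two operators can end up as different SQL is to look at the expression tree the compiler builds for each predicate, since that tree is what a LINQ provider translates. The Doc class and OperatorCheck helper below are hypothetical and only mirror the Cancelled/Invoiced filter from the question; how a given provider and version turns each node type into SQL is not shown here.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical stand-in for the entity in the question.
public class Doc
{
    public bool Cancelled { get; set; }
    public bool Invoiced { get; set; }
}

public static class OperatorCheck
{
    // Returns the node type at the root of the predicate's body.
    public static ExpressionType NodeTypeOf(Expression<Func<Doc, bool>> predicate)
        => predicate.Body.NodeType;

    // '&' on bools is recorded as a bitwise/non-short-circuiting And node.
    public static ExpressionType Bitwise()
        => NodeTypeOf(d => !d.Cancelled & !d.Invoiced);

    // '&&' is recorded as a logical AndAlso node.
    public static ExpressionType Logical()
        => NodeTypeOf(d => !d.Cancelled && !d.Invoiced);
}
```

The two predicates return the same rows when run in memory, but the provider sees two different node types, which is consistent with the CASE ... & CASE ... shape in the generated SQL above.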
The main issue here is that you have stated that this "query" is taking more than 30 seconds in EF and less than 1 second in SSMS, but what you haven't provided is the SQL that EF has compiled for execution
You're asking us to compare apples with the idea of an orange...
We really need to see the compiled SQL as a minimum but the C# / Linq code will also be helpful. It doesn't have to compile, but it will demonstrate some of the context that you are operating within.
tldr
This is less likely to be about EF itself and more about the patterns in the code you are executing and your specific query.
For such a small and simple query, lazy loading should not be used at all. Beyond that, the usual suspects that we talk about with EF performance should not be significantly measurable for this tiny dataset either. All we can say from the little information provided is that your EF query does not match your expected SQL, so we should start there and make sure your EF query compiles to a reasonable approximation of the query you are expecting.
If all else fails, simply use Raw SQL Queries and move on.
While it is true that there are some overheads inherent in using an ORM like EF, with a simple query like this we should be talking about a few milliseconds; anything more indicates that your EF LINQ query is either wrong or written very poorly.
If you are using lazy loading, then be mindful of which lines of code will cause a new query to the server instead of using the in-memory data. Lazy loading can be powerful, but there are relatively few situations where it makes sense. Using projections is a good alternative, but you should consider disabling lazy loading altogether and always using eager loading. If you are unsure, try disabling the lazy loading feature of your data context; you'll find out very quickly whether your code was depending on it, as it will likely fail at runtime.
If there is a single execution point, then you should be able to capture the raw SQL and time the round trip.
Post the code you used to time the execution, the raw SQL and the time please.
If a single execution point takes 30 seconds to load, then there might be a cold start issue; that is, some processes might be executing before your query. Without knowing more about your framework, an easy way to debug this is to initiate the database connection first with a simple call that returns the count of all the OpsDocuments records, and then execute your query.
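A small timing helper along these lines can separate the warm-up call from the real query. QueryTiming and Measure are hypothetical names; the helper only wraps System.Diagnostics.Stopwatch around whatever delegate it is given, so it makes no assumptions about the data context.

```csharp
using System;
using System.Diagnostics;

public static class QueryTiming
{
    // Runs the delegate once and reports its result together with the wall-clock time taken.
    public static (T Result, TimeSpan Elapsed) Measure<T>(Func<T> query)
    {
        var stopwatch = Stopwatch.StartNew();
        T result = query();   // for an EF query, make sure this materializes (Count, ToList, ...)
        stopwatch.Stop();
        return (result, stopwatch.Elapsed);
    }
}
```

Assuming a context variable for the DbContext and a query variable for the slow IQueryable (both placeholders here), the warm-up would be QueryTiming.Measure(() => context.OpsDocuments.Count()) followed by QueryTiming.Measure(() => query.ToList()). If most of the 30 seconds lands on the first call, the delay is in start-up rather than in the query itself.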
The other performance concerns like having too many columns or strange data type comparisons don't really apply here. You could optimise this query for sure, but with 10 rows and less than 50 columns, even a very slow PC should be able to read this result into an EF graph in a few milliseconds.
If you have already eliminated Lazy-Loading, and your captured SQL query generated by EF is lightning fast when executed in SSMS but awfully slow from your application runtime, then Locking "might" be a concern.
A simple way to verify whether locking is an issue is to query the database for the currently executing queries while your application is waiting for the response. If the wait time is truly 30 seconds, you'll have plenty of time to execute the following in SSMS while you are waiting.
As a bonus, this will prove whether the query is running at all:
Declare @Identifier Char(1) = '~'
SELECT r.session_id, r.status,
st.TEXT AS batch_text,
qp.query_plan AS 'XML Plan',
r.start_time,
r.status,
r.total_elapsed_time, r.blocking_session_id, r.wait_type, r.wait_time, r.open_transaction_count, r.open_resultset_count
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS st
CROSS APPLY sys.dm_exec_query_plan(r.plan_handle) AS qp
WHERE st.TEXT NOT LIKE 'Declare @Identifier Char(1) = ''~''%'
ORDER BY cpu_time DESC;
I have often found that if I have too many joins in a Linq query (whether using Entity Framework or NHibernate) and/or the shape of the resulting anonymous class is too complex, Linq takes a very long time to materialize the result set into objects.
This is a generic question, but here's a specific example using NHibernate:
var libraryBookIdsWithShelfAndBookTagQuery = (from shelf in session.Query<Shelf>()
join sbttref in session.Query<ShelfBookTagTypeCrossReference>() on
shelf.ShelfId equals sbttref.ShelfId
join bookTag in session.Query<BookTag>() on
sbttref.BookTagTypeId equals (byte)bookTag.BookTagType
join btbref in session.Query<BookTagBookCrossReference>() on
bookTag.BookTagId equals btbref.BookTagId
join book in session.Query<Book>() on
btbref.BookId equals book.BookId
join libraryBook in session.Query<LibraryBook>() on
book.BookId equals libraryBook.BookId
join library in session.Query<LibraryCredential>() on
libraryBook.LibraryCredentialId equals library.LibraryCredentialId
join lcsg in session
.Query<LibraryCredentialSalesforceGroupCrossReference>()
on library.LibraryCredentialId equals lcsg.LibraryCredentialId
join userGroup in session.Query<UserGroup>() on
lcsg.UserGroupOrganizationId equals userGroup.UserGroupOrganizationId
where
shelf.ShelfId == shelfId &&
userGroup.UserGroupId == userGroupId &&
!book.IsDeleted &&
book.IsDrm != null &&
book.BookFormatTypeId != null
select new
{
Book = book,
LibraryBook = libraryBook,
BookTag = bookTag
});
// add a couple of where clauses, then...
var result = libraryBookIdsWithShelfAndBookTagQuery.ToList();
I know it's not the query execution, because I put a sniffer on the database and I can see that the query is taking 0ms, yet the code is taking about a second to execute that query and bring back all of 11 records.
So yeah, this is an overly complex query, having 8 joins between 9 tables, and I could probably restructure it into several smaller queries. Or I could turn it into a stored procedure - but would that help?
What I'm trying to understand is, where is that red line crossed between a query that is performant and one that starts to struggle with materialization? What's going on under the hood? And would it help if this were a SP whose flat results I subsequently manipulate in memory into the right shape?
EDIT: in response to a request in the comments, here's the SQL emitted:
SELECT DISTINCT book4_.bookid AS BookId12_0_,
libraryboo5_.librarybookid AS LibraryB1_35_1_,
booktag2_.booktagid AS BookTagId15_2_,
book4_.title AS Title12_0_,
book4_.isbn AS ISBN12_0_,
book4_.publicationdate AS Publicat4_12_0_,
book4_.classificationtypeid AS Classifi5_12_0_,
book4_.synopsis AS Synopsis12_0_,
book4_.thumbnailurl AS Thumbnai7_12_0_,
book4_.retinathumbnailurl AS RetinaTh8_12_0_,
book4_.totalpages AS TotalPages12_0_,
book4_.lastpage AS LastPage12_0_,
book4_.lastpagelocation AS LastPag11_12_0_,
book4_.lexilerating AS LexileR12_12_0_,
book4_.lastpageposition AS LastPag13_12_0_,
book4_.hidden AS Hidden12_0_,
book4_.teacherhidden AS Teacher15_12_0_,
book4_.modifieddatetime AS Modifie16_12_0_,
book4_.isdeleted AS IsDeleted12_0_,
book4_.importedwithlexile AS Importe18_12_0_,
book4_.bookformattypeid AS BookFor19_12_0_,
book4_.isdrm AS IsDrm12_0_,
book4_.lightsailready AS LightSa21_12_0_,
libraryboo5_.bookid AS BookId35_1_,
libraryboo5_.libraryid AS LibraryId35_1_,
libraryboo5_.externalid AS ExternalId35_1_,
libraryboo5_.totalcopies AS TotalCop5_35_1_,
libraryboo5_.availablecopies AS Availabl6_35_1_,
libraryboo5_.statuschangedate AS StatusCh7_35_1_,
booktag2_.booktagtypeid AS BookTagT2_15_2_,
booktag2_.booktagvalue AS BookTagV3_15_2_
FROM shelf shelf0_,
shelfbooktagtypecrossreference shelfbookt1_,
booktag booktag2_,
booktagbookcrossreference booktagboo3_,
book book4_,
librarybook libraryboo5_,
library librarycre6_,
librarycredentialsalesforcegroupcrossreference librarycre7_,
usergroup usergroup8_
WHERE shelfbookt1_.shelfid = shelf0_.shelfid
AND booktag2_.booktagtypeid = shelfbookt1_.booktagtypeid
AND booktagboo3_.booktagid = booktag2_.booktagid
AND book4_.bookid = booktagboo3_.bookid
AND libraryboo5_.bookid = book4_.bookid
AND librarycre6_.libraryid = libraryboo5_.libraryid
AND librarycre7_.librarycredentialid = librarycre6_.libraryid
AND usergroup8_.usergrouporganizationid =
librarycre7_.usergrouporganizationid
AND shelf0_.shelfid = @p0
AND usergroup8_.usergroupid = @p1
AND NOT ( book4_.isdeleted = 1 )
AND ( book4_.isdrm IS NOT NULL )
AND ( book4_.bookformattypeid IS NOT NULL )
AND book4_.lightsailready = 1
EDIT 2: Here's the performance analysis from ANTS Performance Profiler:
It is often database "good" practice to place lots of joins or super common joins into views. ORMs don't let you ignore these facts, nor do they supplant the decades of time spent fine-tuning databases to do these kinds of things efficiently. Refactor those joins into a single view, or a couple of views if that makes more sense in the greater perspective of your application.
NHibernate should be optimizing the query down and reducing the data so that .Net only has to mess with the important parts. However, if those domain objects are just naturally large, that's still a lot of data. Also, if it's a really large result set in terms of rows returned, that's a lot of objects getting instantiated even if the DB is able to return the set quickly. Refactoring this query into a view that only returns the data you actually need would also reduce object instantiation overhead.
Another thought would be to not do a .ToList(). Return the enumerable and let your code lazily consume the data.
According to the profiling information, CreateQuery takes 45% of the total execution time, whereas, as you mentioned, the query took 0ms when executed directly. But this alone is not enough to say there is a performance problem, because:
You are running the query under the profiler, which has a significant impact on execution time.
A profiler slows down all of the code being profiled but not the SQL execution (because that happens in SQL Server), so everything else looks slower relative to the SQL statement.
So the ideal approach is to measure how long it takes to execute the entire code block, measure the time for the SQL query, and compare the two; if you do that you will probably end up with different values.
However, I'm not saying that the NH Linq to SQL implementation is optimized for any query you come up with; there are other ways in NHibernate to deal with those situations, such as the QueryOver API, Criteria queries, HQL and finally SQL.
Where is that red line crossed between a query that is performant and
one that starts to struggle with materialization. What's going on under the hood?
This is a pretty hard question, and without detailed knowledge of the NHibernate Linq to SQL provider it's hard to give an accurate answer. You can always try the different mechanisms provided and see which one is best for the given scenario.
And would it help if this were a SP whose flat results I subsequently
manipulate in memory into the right shape?
Yes, using an SP would help things work pretty fast, but it would add more maintenance problems to your code base.
You asked a generic question, so I'll give you a generic answer :)
If you query data for reading (not for update), try to use anonymous classes. The reason is that they are lighter to create and they have no navigation properties. And select only the data you need! That is a very important rule. So, try to replace your select with something like this:
select new
{
Book = new { book.Id, book.Name},
LibraryBook = new { libraryBook.Id, libraryBook.AnotherProperty},
BookTag = new { bookTag.Name}
}
Stored procedures are good when the query is complex and the LINQ provider generates inefficient code, so that you can replace it with plain SQL or a stored procedure. That is not often the case and, I think, it's not your situation.
Run your SQL query. How many rows does it return? Is it the same number as in result? Sometimes a LINQ provider generates code that selects many more rows than needed to select one entity. It happens when an entity has a one-to-many relationship with another entity being selected. For example:
class Book
{
    public int Id {get;set;}
    public string Name {get;set;}
    public ICollection<Tag> Tags {get;set;}
}
class Tag
{
    public string Name {get;set;}
    public Book Book {get;set;}
}
...
dbContext.Books.Where(o => o.Id == 1).Select(o=>new {Book = o, Tags = o.Tags}).Single();
I select only one book, with Id = 1, but the provider will generate code that returns as many rows as there are tags (Entity Framework does this).
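The row fan-out described above can be reproduced in memory with a plain join, which is roughly what a flattened one-to-many SQL result looks like before the ORM folds it back into one entity. RowFanOut and the tuple shapes are hypothetical and stand in for the Book and Tag tables.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class RowFanOut
{
    // Joins books to their tags the way a SQL join would and counts the flat rows produced.
    public static int JoinedRowCount(
        IEnumerable<(int Id, string Name)> books,
        IEnumerable<(int BookId, string Name)> tags)
    {
        return books
            .Join(tags,
                  book => book.Id,
                  tag => tag.BookId,
                  (book, tag) => (Book: book.Name, Tag: tag.Name)) // one row per book/tag pair
            .Count();
    }
}
```

A single book with three tags comes back as three rows, each repeating every book column, so a wide entity with many children can transfer far more data than the one object you asked for.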
Split a complex query into a set of simple ones and join them on the client side. Sometimes you have a complex query with many conditions and the resulting SQL becomes terrible. In that case, split your big query into simpler ones, get the results of each, and join/filter on the client side.
Finally, I advise you to use an anonymous class as the result of the select.
Don’t use Linq’s Join. Navigate!
in that post you can see:
As long as there are proper foreign key constraints in the database, the navigation properties will be created automatically. It is also possible to manually add them in the ORM designer. As with all LINQ to SQL usage I think that it is best to focus on getting the database right and have the code exactly reflect the database structure. With the relations properly specified as foreign keys the code can safely make assumptions about referential integrity between the tables.
I agree 100% with the sentiments expressed by everyone else (with regard to there being two parts to the optimisation here, and the SQL execution being a big unknown and a likely cause of poor performance).
Another part of the solution that might help you get some speed is to pre-compile your LINQ statements. I remember this being a huge optimisation on a tiny project (high traffic) I worked on ages and ages ago... seems like it would contribute to the client side slowness you're seeing. Having said all that though I've not found a need to use them since... so heed everyone else's warnings first! :)
https://msdn.microsoft.com/en-us/library/vstudio/bb896297(v=vs.100).aspx
I am new to LINQ queries. I have read about and researched the advantages of LINQ queries over SQL, but I have one basic question: why do we need to use these queries when, as I see it, their syntax is more complicated than that of traditional SQL queries?
For example, look at the example below of a simple left outer join:
var q=(from pd in dataContext.tblProducts
join od in dataContext.tblOrders on pd.ProductID equals od.ProductID into t
from rt in t.DefaultIfEmpty()
orderby pd.ProductID
select new
{
//To handle null values do type casting as int?(NULL int)
//since OrderID is defined NOT NULL in tblOrders
OrderID=(int?)rt.OrderID,
pd.ProductID,
pd.Name,
pd.UnitPrice,
//no need to check for null since it is defined NULL in database
rt.Quantity,
rt.Price,
})
.ToList();
So, the point of LINQ (Language Integrated Query) is to provide easy ways of working with enumerable collections in memory. Contrast this with SQL, which is a language for determining what the user gets from a set of data in a database.
Because of the SQL-like syntax, it's easy to confuse LINQ code with SQL, and think that they're 'alike' - they're really not. SQL gets a subset of data from a superset; LINQ is 'syntactic sugar' that hides common operations involving foreach loops.
For instance, this is a common programming pattern:
foreach(Thing thing in things)
{
if(thing.SomeProperty() == "Some Value")
return true;
}
...this is done rather easily in LINQ:
return things.Any(t => t.SomeProperty() == "Some Value");
The two snippets are functionally the same, and I'm pretty sure they even compile to roughly the same IL code. The difference is how it looks to you.
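To make the comparison concrete, here are both versions as complete methods. The AnyExamples class and its method names are made up for illustration, and the loop version includes the fall-through return false that the fragment above leaves out.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class AnyExamples
{
    // Explicit loop: stops at the first match.
    public static bool ContainsWithLoop(IEnumerable<string> things, string wanted)
    {
        foreach (string thing in things)
        {
            if (thing == wanted)
                return true;
        }
        return false; // no element matched
    }

    // LINQ version: Any also stops enumerating at the first match.
    public static bool ContainsWithLinq(IEnumerable<string> things, string wanted)
        => things.Any(t => t == wanted);
}
```

Both return the same answer for any input; which one to keep is the readability question raised above.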
You don't have to use LINQ; you can choose to use a standard foreach, and there are times, such as complex loops, where it is useful to do so. Ultimately it is a question of readability - my counter-question to you is, is the LINQ version of your foreach loop more, or less, readable than the original foreach loop?
If the answer is 'less', then I suggest converting it back to a foreach.
I'm by no means an sql or a linq expert, I use them both.
There is a trend to make LINQ into either something bad or a silver bullet, depending on which side you are on.
You need to seriously consider your project requirements in order to choose. The choice is not mutually exclusive. Take what is good from them both.
Advantages
Quick turn around for development
Queries can be built dynamically
Tables are automatically mapped to classes
Columns are automatically mapped to properties
Relationships are automatically added to classes
Lambda expressions are awesome
Data is easy to setup and use
Disadvantages
No clear outline for Tiers
No good way of view permissions
Small data sets will take longer to build the query than execute
There is an overhead for creating queries
When queries are moved from sql to application side, joins are very slow
DBML concurrency issues
Hard to understand advanced queries using Expressions
I found that programmers who are used to SQL have a hard time figuring out the tricks with LINQ, while programmers who know SQL but haven't done a ton of work with it pick up LINQ quicker.
The main issue when people start using LINQ is that they keep thinking in the SQL way: they design the SQL query first and then translate it to LINQ. You need to learn how to think in the LINQ way, and your LINQ query will become neater and simpler. For instance, in your LINQ you don't need joins. You should use Associations/Navigation Properties instead. Check this post for more details.
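To make the join-versus-navigation point concrete, here is a minimal in-memory sketch. It uses LINQ to Objects with anonymous types, so no database or mapping is involved; the names (customers, Orders, orderRows) are made up for illustration, and in a real model Orders would be an association generated from the foreign key.

```csharp
using System;
using System.Linq;

// In-memory stand-ins for two mapped entities. Orders plays the role of a
// navigation property (all names here are hypothetical).
var customers = new[]
{
    new { Name = "Alice", Orders = new[] { new { Id = 1, Total = 10m }, new { Id = 2, Total = 25m } } },
    new { Name = "Bob",   Orders = new[] { new { Id = 3, Total = 5m } } },
};

// The same order rows as a flat "table" with an explicit foreign key column.
var orderRows = new[]
{
    new { Id = 1, Customer = "Alice", Total = 10m },
    new { Id = 2, Customer = "Alice", Total = 25m },
    new { Id = 3, Customer = "Bob",   Total = 5m },
};

// SQL-style thinking: spell out the join condition in the query.
var viaJoin =
    (from c in customers
     join o in orderRows on c.Name equals o.Customer
     where o.Total > 8m
     select c.Name + ":" + o.Id).ToList();

// LINQ-style thinking: follow the relationship the model already knows about.
var viaNavigation =
    (from c in customers
     from o in c.Orders
     where o.Total > 8m
     select c.Name + ":" + o.Id).ToList();

Console.WriteLine(string.Join(", ", viaJoin));
Console.WriteLine(string.Join(", ", viaNavigation));
```

Both queries return the same rows; the second one just doesn't repeat the relationship in every query.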
I need to do a query on my database that might be something like this where there could realistically be 100 or more search terms.
public IQueryable<Address> GetAddressesWithTown(string[] towns)
{
    IQueryable<Address> addressQuery = DbContext.Addresses;
    // Where returns a new query, so the result has to be assigned back
    addressQuery = addressQuery.Where(x => towns.Any(y => x.Town == y));
    return addressQuery;
}
However, when it contains more than about 15 terms it throws an exception on execution because the SQL generated is too long.
Can this kind of query be done through Entity Framework?
What other options are there available to complete a query like this?
Sorry, are we talking about THIS EXACT SQL?
In that case it is a very simple "open your eyes thing".
There is a way (Contains) to map that array into an IN clause, which results in ONE SQL condition (town IN ('', '', '')).
Let me see whether I get this right:
addressQuery.Where( x => towns.Any( y=> x.Town == y ) );
should be
addressQuery.Where(x => towns.Contains(x.Town));
The resulting SQL will be a LOT smaller. 100 items is still taxing it. I would dare say you may have a db or app design issue here, and that requires a business-side analysis; I have not met this requirement in 20 years of working with databases.
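As a quick sanity check that the two forms select the same rows, here is a small in-memory sketch. It runs against LINQ to Objects with made-up sample data, so it says nothing about the generated SQL; it only shows that swapping Any for Contains does not change the result.

```csharp
using System;
using System.Linq;

// Hypothetical sample data standing in for the Addresses table.
var addresses = new[]
{
    new { Id = 1, Town = "Leeds" },
    new { Id = 2, Town = "York" },
    new { Id = 3, Town = "Bath" },
};
string[] towns = { "Leeds", "Bath" };

// Original form: one equality test per search term.
var withAny = addresses
    .Where(x => towns.Any(y => x.Town == y))
    .Select(x => x.Id)
    .ToList();

// Suggested form: a single membership test.
var withContains = addresses
    .Where(x => towns.Contains(x.Town))
    .Select(x => x.Id)
    .ToList();

Console.WriteLine(string.Join(", ", withAny));
Console.WriteLine(string.Join(", ", withContains));
```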
This looks like a scenario where you'd want to use the PredicateBuilder as this will help you create an Or based predicate and construct your dynamic lambda expression.
This is part of a library called LinqKit by Joseph Albahari who created LinqPad.
public IQueryable<Address> GetAddressesWithTown(string[] towns)
{
    var predicate = PredicateBuilder.False<Address>();
    foreach (string town in towns)
    {
        string temp = town;
        predicate = predicate.Or(p => p.Town.Equals(temp));
    }
    return DbContext.Addresses.Where(predicate);
}
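If you're curious what an Or-combining helper might do under the hood, the idea can be sketched with plain System.Linq.Expressions. This is not LinqKit's code, just an illustrative stand-in: the Or helper below is hypothetical, and it runs over an in-memory list of town names rather than an Address table.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

string[] towns = { "Leeds", "Bath" };
var stored = new[] { "Leeds", "York", "Bath", "Hull" }.AsQueryable();

// Start from "false" and OR one equality test per search term onto it.
Expression<Func<string, bool>> predicate = t => false;
foreach (string town in towns)
{
    string temp = town;
    predicate = Or(predicate, t => t == temp);
}

var matches = stored.Where(predicate).ToList();
Console.WriteLine(string.Join(", ", matches));

// Hypothetical helper: combines two predicates into "left || right".
// The right-hand lambda is invoked with the left-hand lambda's parameter,
// so both sides test the same element.
static Expression<Func<T, bool>> Or<T>(
    Expression<Func<T, bool>> left,
    Expression<Func<T, bool>> right)
{
    var invokedRight = Expression.Invoke(right, left.Parameters);
    return Expression.Lambda<Func<T, bool>>(
        Expression.OrElse(left.Body, invokedRight),
        left.Parameters);
}
```

Whether a given query provider can translate an expression built this way is a separate question; the sketch only shows the shape of the predicate.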
You've broadly got two options:
You can replace .Any with a .Contains alternative.
You can use plain SQL with table-valued-parameters.
Using .Contains is easier to implement and will help performance because it is translated to an inline SQL IN clause, so 100 towns shouldn't be a problem. However, it also means that the exact SQL depends on the exact number of towns: you're forcing SQL Server to recompile the query for each distinct number of towns. These recompilations can be expensive when the query is complex, and they can evict other query plans from the cache as well.
Using table-valued-parameters is the more general solution, but it's more work to implement, particularly because it means you'll need to write the SQL query yourself and cannot rely on the entity framework. (Using ObjectContext.Translate you can still unpack the query results into strongly-typed objects, despite writing sql). Unfortunately, you cannot use the entity framework yet to pass a lot of data to sql server efficiently. The entity framework doesn't support table-valued-parameters, nor temporary tables (it's a commonly requested feature, however).
A bit of TVP SQL would look like this: select ... from ... join @townTableArg townArg on townArg.town = address.town, or select ... from ... where address.town in (select town from @townTableArg).
You probably can work around the EF restriction, but it's not going to be fast and will probably be tricky. A workaround would be to insert your values into some intermediate table, then join with that - that's still 100 inserts, but those are separate statements. If a future version of EF supports batch CUD statements, this might actually work reasonably.
Almost equivalent to table-valued parameters would be to bulk-insert into a temporary table and join with that in your query. Mostly that just means your table name will start with '#' rather than '@' :-). The temp table has a little more overhead, but you can put indexes on it, and in some cases that means the subsequent query will be much faster (for really huge data quantities).
Unfortunately, using either temporary tables or bulk insert from C# is a hassle. The simplest solution here is to make a DataTable; this can be passed to either. However, DataTables are relatively slow; the overhead might be relevant once you start adding millions of rows. The fastest (general) solution is to implement a custom IDataReader; almost as fast is an IEnumerable<SqlDataRecord>.
By the way, to use a table-valued-parameter, the shape ("type") of the table parameter needs to be declared on the server; if you use a temporary table you'll need to create it too.
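For the DataTable route mentioned above, building the table is the easy part. The sketch below only constructs it in memory; the column name, parameter name and table type name are assumptions, and the part that actually talks to SQL Server is left as comments because it needs a live connection and a table type declared on the server.

```csharp
using System;
using System.Data;

string[] towns = { "Leeds", "York", "Bath" };

// One column whose name and type match the (hypothetical) server-side table type.
var townTable = new DataTable();
townTable.Columns.Add("town", typeof(string));
foreach (string town in towns)
{
    townTable.Rows.Add(town);
}

Console.WriteLine(townTable.Rows.Count);

// Passing it as a table-valued parameter would then look roughly like this
// (not executed here; "command" is an existing SqlCommand and dbo.TownList is
// a hypothetical table type that must already exist on the server):
//
//   var p = command.Parameters.AddWithValue("@townTableArg", townTable);
//   p.SqlDbType = SqlDbType.Structured;
//   p.TypeName = "dbo.TownList";
```

The same DataTable could instead be handed to SqlBulkCopy if you go the temporary-table way.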
Some pointers to get you started:
http://lennilobel.wordpress.com/2009/07/29/sql-server-2008-table-valued-parameters-and-c-custom-iterators-a-match-made-in-heaven/
SqlBulkCopy from a List<>
I'm from old school where DB had all data access encapsulated into views, procedures, etc. Now I'm forcing myself into using LINQ for most of the obvious queries.
What I'm wondering, though, is where to stop and what is practical. Today I needed to run a query like this:
SELECT D.DeviceKey, D.DeviceId, DR.DriverId, TR.TruckId, LP.Description
FROM dbo.MBLDevice D
LEFT OUTER JOIN dbo.DSPDriver DR ON D.DeviceKey = DR.DeviceKey
LEFT OUTER JOIN dbo.DSPTruck TR ON D.DeviceKey = TR.DeviceKey
LEFT OUTER JOIN
(
    SELECT LastPositions.DeviceKey, P.Description, P.Latitude, P.Longitude, P.Speed, P.DeviceTime
    FROM dbo.MBLPosition P
    INNER JOIN
    (
        SELECT D.DeviceKey, MAX(P.PositionKey) LastPositionKey
        FROM dbo.MBLPosition P
        INNER JOIN dbo.MBLDevice D ON P.DeviceKey = D.DeviceKey
        GROUP BY D.DeviceKey
    ) LastPositions ON P.PositionKey = LastPositions.LastPositionKey
) LP ON D.DeviceKey = LP.DeviceKey
WHERE D.IsActive = 1
Personally, I'm not able to write the corresponding LINQ. So I found a tool online and got back a 2-page-long LINQ query. It works properly (I can see it in the profiler), but it's not maintainable IMO. Another problem is that I'm doing a projection and getting an anonymous object back. Or I can manually create a class and project into that custom class.
At this point I wonder if it is better to create a view on SQL Server and add it to my model. It will break my "all SQL on the client side" mantra but will be easier to read and maintain. No?
I wonder where you stop with T-SQL vs LINQ ?
EDIT
Model description.
I have DSPTrucks, DSPDrivers and MBLDevices.
Device can be attached to Truck or to Driver or to both.
I also have MBLPositions which is basically pings from device (timestamp and GPS position)
What this query does: in one shot it returns all device-truck-driver information, so I know what each device is attached to, and it also gets me the last GPS position for those devices. Response may look like so:
There is some redundant stuff but it's OK. I need to get it in one query.
In general, I would also default to LINQ for most simple queries.
However, when you get to a point where the corresponding LINQ query becomes harder to write and maintain, then what's the point really? So I would simply leave that query in place. It works, after all. To make it easier to use, it's pretty straightforward to map a view or (cough) stored procedure in your EF model. Nothing wrong with that, really (IMO).
You can firstly store Linq queries in variables which may help to make it not only more readable, but also reusable.
An example maybe like the following:
var redCars = from c in cars
where c.Colour == "red"
select c;
var redSportsCars = from c in redCars
where c.Type == "Sports"
select c;
Queries are lazily executed and not composed until you compile them or iterate over them, so you'll notice in the profiler that this does produce an efficient query.
You will also benefit from defining relationships in the model and using navigation properties rather than the LINQ join syntax. This (again) will make these relationships reusable between queries, and more readable (because you don't specify the relationships in the query like in the SQL above).
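You can see the deferred execution without a database. The following is a small LINQ to Objects sketch with made-up data: the two queries are defined first, a car is added afterwards, and the added car still shows up because nothing runs until the query is enumerated. Against a queryable data context the same composition is what lets the provider build a single statement.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory data; a List so that a row can be added later.
var cars = new[]
{
    new { Colour = "red",  Type = "Sports" },
    new { Colour = "red",  Type = "Estate" },
    new { Colour = "blue", Type = "Sports" },
}.ToList();

var redCars = from c in cars
              where c.Colour == "red"
              select c;

var redSportsCars = from c in redCars
                    where c.Type == "Sports"
                    select c;

// Neither query has run yet, so this new car is still picked up below.
cars.Add(new { Colour = "red", Type = "Sports" });

// Enumeration happens here, applying both filters in one pass.
int count = redSportsCars.Count();
Console.WriteLine(count);
```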
Generally speaking your LINQ query will be shorter than the equivalent SQL, but I'd suggest trying to work it out by hand rather than using a conversion tool.
With the exception of CTEs (which I'm fairly sure you can't do in LINQ) I would write all queries in LINQ these days
I find that when using LINQ it's best to ignore whatever SQL it generates as long as it's retrieving the right thing and is performant; only when one of those doesn't hold do I actually look at what it's generating.
In terms of the sql it generates being maintainable, you shouldn't really worry about the SQL being maintainable but more the LINQ query that is generating the SQL.
In the end if the sql is not quite right I believe there are various things you can do to make LINQ generate SQL more along the lines you want..to some extent.
AFAIK there isn't any inherent problem with getting anonymous objects back; however, if you are doing it in multiple places you may want to create a class to keep things neater.