Improve LINQ query performance? - C#

public List<Agents_main_view_distinct> getActiveAgents(DateTime start, DateTime end)
{
    myactiveagents = mydb.Agents_main_view_distincts
        .Where(u => u.Status.Equals("Existing") && u.DateJoined2 >= start && u.DateJoined2 <= end)
        .OrderByDescending(ac => ac.Recno)
        .ToList();
    return myactiveagents;
}
I have a simple LINQ query that queries a view. My worry is its performance: it works well with a few hundred records, but once there are more than 2,000 records SQL Server times out.
Things I have done to try to improve the performance:
1. Wrote a query against the tables directly (no improvement).
2. Removed unnecessary columns; the view previously had 27 columns, now reduced to 20.
As a desperate measure I increased the timeout to 600, but it still timed out.
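For reference, a minimal sketch of how such a timeout is usually raised, assuming mydb is a LINQ to SQL DataContext (an EF6 DbContext exposes the equivalent setting as Database.CommandTimeout); this only masks the symptom rather than fixing the query:

// Assumption: mydb is a LINQ to SQL DataContext. CommandTimeout is in seconds
// and applies to the queries issued through this context.
mydb.CommandTimeout = 600;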
The view's SQL query:
SELECT dbo.Agents.Recno, dbo.Agents.Rec_date, dbo.Agents.AgentsId, dbo.Agents.AgentsName, dbo.Agents.Industry_status, dbo.Agents.DOB, dbo.Agents.Branch,
       dbo.Agents.MobileNumber, dbo.Agents.MaritalStatus, dbo.Agents.PIN, dbo.Agents.Gender, dbo.Agents.Email, dbo.Agents.ProvisionalLicense,
       dbo.Agents.IRALicenseNumber, dbo.Agents.PreviousCompany, dbo.Agents.YearsOfExperience, dbo.Agents.COPNumber, dbo.Agents.DateJoined AS DateJoined2,
       dbo.Agents.DateJoined, dbo.Agents.PreviousOccupation, dbo.Agents.ProffesionalQualification, dbo.Agents.EducationalQualification, dbo.Agents.Status,
       dbo.Agents.Termination_Date2 AS Termination_Date, dbo.Agents.Comments, dbo.Agents.Temination_code, dbo.Agents.Company_ID, dbo.Agents.Submit_By,
       dbo.Agents.PassportPhoto, dbo.Insurane_Companies.Company_name, DATEDIFF(year, dbo.Agents.DOB, GETDATE()) AS age,
       YEAR(dbo.Agents.DateJoined) AS YearJoined, YEAR(dbo.Agents.Termination_Date2) AS YearTermination,
       dbo.Agents.REGION, dbo.Agents.DOB AS DOB2, dbo.Insurane_Companies.Company_code
FROM dbo.Agents
     INNER JOIN dbo.Insurane_Companies ON dbo.Agents.Company_ID = dbo.Insurane_Companies.Company_id

You could try moving the Where and OrderBy clauses into the view itself, passing in parameters via a stored procedure or user-defined function if necessary.
You could also add Glimpse to your project. Amongst many other things, it lets you inspect SQL calls to see whether you have any unnecessary or time-consuming DB hits.

As James suggested, it would be best to create a stored procedure and call it via LINQ, passing any necessary parameters. That way the server handles the processing, which should be much faster since the query does not have to be translated to SQL first. Other than that, you can use SQL Server's Query Analyzer or Profiler to see where the bottleneck is.
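A rough sketch of what such a call could look like, assuming mydb is a LINQ to SQL DataContext; dbo.GetActiveAgents is a hypothetical stored procedure (not from the original post) that applies the status and date filtering and the ordering on the server:

// Sketch only: dbo.GetActiveAgents is assumed to SELECT the same columns as the view,
// filtered by status and date range and ordered by Recno descending.
// The {0}/{1} placeholders are turned into SQL parameters by ExecuteQuery.
var activeAgents = mydb
    .ExecuteQuery<Agents_main_view_distinct>(
        "EXEC dbo.GetActiveAgents {0}, {1}", start, end)
    .ToList();

With Entity Framework, the equivalent would be to map the procedure as a function import or call it through Database.SqlQuery.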

Related

Why this parameterized SQL takes forever when the same hardcoded one executes in no time

I've got a query that looks like this:
SELECT ct,
text AS ST,
kval.idkwd
FROM (SELECT ST = kv.idkwd,
Count(kv.idkwd) CT,
kv.idkwd
FROM mwf
INNER JOIN info
ON mwf.ident = info.idinfo
INNER JOIN rel
ON rel.idinfo = info.idinfo
INNER JOIN pers
ON pers.idpers = rel.idpers
LEFT JOIN kwd kv
ON kv.idkwd = info.kwsvstatus
WHERE mwf.id IN ( :mwfIds)
GROUP BY idkwd) kw
INNER JOIN kwd kval
ON kw.idkwd = kval.idkwd
ORDER BY text
From an ASP.NET application, this query is executed like this, using NHibernate:
var query = session.CreateQuery(queryString);   // queryString holds the SQL shown above
query.SetParameterList("mwfIds", mwfIds, NHibernateUtil.Guid);
return query.List();
For a reason unknown, it sometimes takes 30 seconds to run (for some given parameters). The measures are given by SQL Profiler.
I tried executing this same query with the same parameters on SSMS (copied from the SQL Profiler output), and it runs in less than 1 second.
Worse, if I change the C# code to
var query = session.CreateQuery(hardcodedQuery);
return query.List();
where hardcodedQuery is the same query I ran in SSMS (i.e. the same as above, only without any parameters set through NHibernate), it also runs in less than 1 second.
Why does the parameterized query take so much time?
As already said by Sean Lange in his comment, this behavior is very likely caused by parameter sniffing.
In my experience, it has always been solved by fixing the indexes. (Do not add indexes too hastily, though; having too many indexes can cause other performance issues, such as bad index choices by the query optimizer leading to tempdb spills, for example.)
Parameter sniffing does not occur only with stored procedures. It also occurs, for example, with SQL queries executed through sp_executesql or EXEC(). It may even occur with auto-parameterized scalar values found in queries.
Parameter sniffing is an optimization fallback used by SQL Server when indexes are missing. It shapes the query plan generated for the first execution of a query around that execution's specific parameter values, and the plan then gets cached in the query plan cache. All subsequent calls to the same query with similar connection properties will reuse that plan, whatever their parameter values are.
If the values of the first call corresponded to a corner case with a highly selective filter on one table, but later calls are not filtered as selectively, the cached plan makes those later calls perform badly.
SSMS rarely has the same connection options as your application, so it does not reuse the plan cached for the application. If you are lacking indexes, another plan gets generated, tailored to the parameter values you are testing. So SSMS appears to perform better... but no, it is just using a plan tailored to the specific parameter values you are testing.
A more detailed and precise explanation can be found in the blog post Slow in the Application, Fast in SSMS? Understanding Performance Mysteries.
Do not be put off by its plain appearance; that blog is a great resource in my opinion. Nor should you be deterred by the How SQL Server Compiles a Stored Procedure heading, as the author writes in the second sentence following it:
If your application does not use stored procedures, but submits SQL statements directly, most of what I say in this chapter is still applicable.
This blog post will also give you guidance on how to resolve such issues.
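If parameter sniffing does turn out to be the culprit and indexing alone does not solve it, one common mitigation discussed in that post is to ask SQL Server to compile a fresh plan for the actual parameter values on every execution via OPTION (RECOMPILE). A minimal sketch with plain ADO.NET, using illustrative table and column names rather than the full query from the question:

// Sketch only: table/column names are illustrative (requires System.Data and System.Data.SqlClient).
// OPTION (RECOMPILE) trades a little compilation time on each call for a plan
// built for these parameter values instead of one sniffed from an earlier call.
using (var connection = new SqlConnection(connectionString))
using (var command = new SqlCommand(
    "SELECT idkwd, text FROM kwd WHERE idkwd = @id OPTION (RECOMPILE)",
    connection))
{
    command.Parameters.Add("@id", SqlDbType.UniqueIdentifier).Value = keywordId;
    connection.Open();
    using (var reader = command.ExecuteReader())
    {
        while (reader.Read())
        {
            Console.WriteLine(reader.GetString(1));
        }
    }
}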
This might be because of out-of-date statistics. You could try an inner hash join instead of a plain inner join; it possibly makes a difference.
Alternatively, you can update statistics regularly (or use auto-update statistics) if that is practical. Updating statistics may take a long time if your table is huge, though.

Efficiently paging large data sets with LINQ

When looking into the best ways to implement paging in C# (using LINQ), most suggestions are something along these lines:
// Execute the query
var query = db.Entity.Where(e => e.Something == something);
// Get the total num records
var total = query.Count();
// Page the results
var paged = query.Skip((pageNum - 1) * pageSize).Take(pageSize);
This seems to be the commonly suggested strategy (simplified).
For me, my main purpose in paging is for efficiency. If my table contains 1.2 million records where Something == something, I don't want to retrieve all of them at the same time. Instead, I want to page the data, grabbing as few records as possible. But with this method, it seems that this is a moot point.
If I understand it correctly, the first statement is still retrieving the 1.2 million records, then it is being paged as necessary.
Does paging in this way actually improve performance? If the 1.2 million records are going to be retrieved every time, what's the point (besides the obvious UI benefits)?
Am I misunderstanding this? Any .NET gurus out there that can give me a lesson on LINQ, paging, and performance (when dealing with large data sets)?
The first statement does not execute the actual SQL query; it only builds the part of the query you intend to run.
It is when you call query.Count() that the first query is executed:
SELECT COUNT(*) FROM Table WHERE Something = something
query.Skip().Take() won't execute the query either; it is only when you try to enumerate the results (doing a foreach over paged or calling .ToList() on it) that the appropriate SQL statement is executed, retrieving only the rows for the page (using ROW_NUMBER).
If you watch this in SQL Profiler you will see that exactly two queries are executed, and at no point does it try to retrieve the full table.
Be careful when using the debugger, because if you step past the first statement and inspect the contents of query, the SQL query will be executed. Maybe that is the source of your misunderstanding.
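Putting that together, a minimal sketch of the whole pattern (Entity, Something, pageNum and pageSize as in the question; the OrderBy and its key column are assumptions added here, since paging needs a stable ordering):

// No SQL is sent while the query is being composed.
var query = db.Entity
    .Where(e => e.Something == something)
    .OrderBy(e => e.Id);                      // assumed key column, for a stable page order

var total = query.Count();                    // first round trip: SELECT COUNT(*) ...

var paged = query
    .Skip((pageNum - 1) * pageSize)
    .Take(pageSize)
    .ToList();                                // second round trip: only pageSize rows are returned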
// Execute the query
var query = db.Entity.Where(e => e.Something == something);
Nothing is sent to the database by this first statement.
// Get the total num records
var total = query.Count();
This Count query will be translated to SQL, and it will make one call to the database.
That call does not fetch all the records, because the generated SQL is something like this:
SELECT COUNT(*) FROM Entity WHERE Something LIKE 'something'
The last query does not fetch all the records either. It is translated into SQL as well, and the paging runs in the database.
Maybe you'll find this question useful: efficient way to implement paging
I believe Entity Framework might structure the SQL query with the appropriate conditions based on the LINQ statements (e.g. using ROW_NUMBER() OVER ...).
I could be wrong on that, however. I'd run SQL Profiler and see what the generated query looks like.

Complex Linq-To-Entities query with deferred execution: prevent OrderBy being used as a subquery/projection

I built a dynamic LINQ-to-Entities query to support optional search parameters. It was quite a bit of work to get this producing performant SQL and I am NEARLY there, but I have stumbled over a big issue with OrderBy: it gets translated into a kind of projection/subquery that wraps the actual query, producing extremely slow SQL. I can't find a way to get this right. Maybe someone can help me out :)
I'll spare you the complete query for now, as it is long and complex; instead I have translated it into a simple sample for better understanding:
I'm doing something like this:
// Start with the base query
var query = from a in db.Articles
            where a.UserId == 1
            select a;
// Apply some optional conditions
if (tagParam != null)
    query = query.Where(a => a.Tag == tagParam);
if (authorParam != null)
    query = query.Where(a => a.Author == authorParam);
// ... and so on ...
// I only want the 50 most recent articles, so I finally apply OrderBy and Take
query = query.OrderByDescending(a => a.Published);
query = query.Take(50);
The resulting SQL strangely translates the OrderBy into a containing outer query:
select top 50 Id, Published, Title, Content
from (select Id, Published, Title, Content
      from Articles
      where UserId = 1
      and Author = @paramAuthor) as t
order by Published desc
Note that the TOP 50 also got moved to the outer query. If I only use Take(50), the TOP 50 statement is correctly applied to the inner query above (the outer query wouldn't even exist). Only when I use OrderBy does LINQ-to-Entities use this container-query approach.
This causes a very bad execution plan, where the inner query fetches all articles matching the parameters from disk and passes them to the outer query, and only there are OrderBy and TOP processed. In my case that can be hundreds of thousands of rows. I already tried moving the ORDER BY manually into the inner statement and executing that; it produces much better results, as the existing indexes allow SQL Server to easily find the top 50 rows in the right order without reading all rows from disk.
Is there any way I can get EF to put the ORDER BY clause in the inner query? Or any other trick to get this working right?
Any help would be greatly appreciated :)
Edit: As additional information, some tests with less complex queries showed that the optimizer normally handles such subquery scenarios well. In my scenario, unfortunately, the optimizer fails at this and moves hundreds of thousands of rows through the query plan. Moving the OrderBy to the inner query solves it, and the optimizer gets it right.
Edit 2: After a couple more hours of testing, it seems the bad execution plan is a SQL Server issue that is not caused by the generated container query. While moving the ORDER BY and TOP clauses into the inner query did fix the issue initially, I can no longer reproduce that; SQL Server has started using the bad execution plan here as well (while the data in the DB remained unchanged). Moving the ORDER BY clause may have caused SQL Server to take other statistics into account, but it seems the improvement was not due to the cleaner query design. However, I still want to know why EF uses a container query here and whether I can influence this behavior. Even if it does not improve performance, debugging would at least be easier if the generated EF queries were more straightforward and less convoluted.

EF 4.0 make batch query to get count of resultset but return only top 5 records

Is there a way to get the count of a result set but return only the top 5 records, while making just one DB hit instead of 2 (one for the count and a second for the data)?
There is not a particularly good way to do this in Entity Framework, at least as of v4. @Tobias writes a single LINQ query, but his suspicions are correct: you'll see multiple queries roll by in SQL Profiler.
Ignoring EF for a minute, this is a relatively complicated problem for SQL Server. Well, it's complicated once your data size gets large or your query gets complicated. You can get a flavor of what's involved here.
With that said, I wouldn't worry about it being 2 queries just yet. Don't optimize until you know it is an actual performance problem. You'll likely end up working around EF, maybe using the EF extensions and creating a stored proc that can take advantage of windowed functions and CTEs. Or maybe it will just return two result sets in a single procedure.
This little query should do the trick (I'm not sure whether it really is just one physical query, and it could be that the grouping is done in code rather than in the DB), but it's definitely more convenient:
var obj = (from x in entities.SomeTable
           let item = new { N = 1, x }
           group item by item.N into g
           select new { Count = g.Count(), First = g.Take(5) }).FirstOrDefault();
Nonetheless, just doing this as two queries will definitely be much faster (especially if you define them in one stored procedure, as proposed here).
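For comparison, a minimal sketch of that plain two-query approach (SomeTable is borrowed from the snippet above; the Where filter is purely illustrative and not from the question):

// Two round trips, but each translates to simple, efficient SQL.
var query = entities.SomeTable.Where(x => x.IsActive);   // illustrative filter

var count = query.Count();                               // SELECT COUNT(*) ...
var top5  = query.Take(5).ToList();                      // SELECT TOP (5) ...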

Selecting first 100 records using Linq

How can I return the first 100 records using LINQ?
I have a table with 40 million records.
This code works, but it's slow, because it will return all the values before filtering:
var values = (from e in dataContext.table_sample
where e.x == 1
select e)
.Take(100);
Is there a way to return them already filtered, like the T-SQL TOP clause?
No, that doesn't return all the values before filtering. The Take(100) will end up being part of the SQL sent up - quite possibly using TOP.
Of course, it makes more sense to do that when you've specified an orderby clause.
LINQ doesn't execute the query when it reaches the end of your query expression. It only sends up any SQL when either you call an aggregation operator (e.g. Count or Any) or you start iterating through the results. Even calling Take doesn't actually execute the query - you might want to put more filtering on it afterwards, for instance, which could end up being part of the query.
When you start iterating over the results (typically with foreach) - that's when the SQL will actually be sent to the database.
(I think your where clause is a bit broken, by the way. If you've got problems with your real code it would help to see code as close to reality as possible.)
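A small sketch of the shape being described, with an explicit ordering added so that "first 100" is well-defined (table and column names follow the question; e.Id is an assumed key column):

// Still no SQL sent at this point; the query is only being composed.
var values = (from e in dataContext.table_sample
              where e.x == 1
              orderby e.Id          // assumed key column; makes the TOP 100 deterministic
              select e)
             .Take(100);

// A single SELECT TOP (100) ... statement is sent when the results are enumerated.
foreach (var value in values)
{
    // process value
}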
I don't think you are right about it returning all records before taking the top 100. I think LINQ decides what the SQL string is going to be at the time the query is executed (a.k.a. lazy loading), and your database server will optimize it.
Have you compared a standard SQL query with your LINQ query? Which one is faster, and how significant is the difference?
I agree with the comments above that your LINQ query is generally correct, but:
- your 'where' clause should probably be x == 1, not x = 1 (comparison instead of assignment)
- 'select e' will return all columns, while you probably need only some of them - be more precise with the select clause (list only the required columns); 'select *' is a waste of resources
- make sure your database is well indexed, and try to make use of the indexed data
Anyway, a table with 40 million records is quite huge - do you need all that data all the time? Maybe some kind of partitioning could reduce it to the most commonly used records.
I agree with Jon Skeet, but just wanted to add:
The generated SQL will use TOP to implement Take().
If you're able to run SQL-Profiler and step through your code in debug mode, you will be able to see exactly what SQL is generated and when it gets executed. If you find the time to do this, you will learn a lot about what happens underneath.
There is also a DataContext.Log property that you can assign a TextWriter to view the SQL generated, for example:
dbContext.Log = Console.Out;
Another option is to experiment with LINQPad. LINQPad allows you to connect to your data source and easily try different LINQ expressions. In the results panel, you can switch to see the SQL generated for the LINQ expression.
I'm going to go out on a limb and guess that you don't have an index on the column used in your where clause. If that's the case then it's undoubtedly doing a table scan when the query is materialized and that's why it's taking so long.
