Linq. Help me tune this! - c#

I have a linq query that is causing some timeout issues. Basically, I have a query that is returning the top 100 results from a table that has approximately 500,000 records.
Here is the query:
using (var dc = CreateContext())
{
var accounts = string.IsNullOrEmpty(searchText)
? dc.Genealogy_Accounts
.Where(a => a.Genealogy_AccountClass.Searchable)
.OrderByDescending(a => a.ID)
.Take(100)
: dc.Genealogy_Accounts
.Where(a => (a.Code.StartsWith(searchText)
|| a.Name.StartsWith(searchText))
&& a.Genealogy_AccountClass.Searchable)
.OrderBy(a => a.Code)
.Take(100);
return accounts.Select(a =>
}
}
Oddly enough it is the first linq query that is causing the timeout. I thought that by doing a 'Take' we wouldn't need to scan all 500k of records. However, that must be what is happening. I'm guessing that the join to find what is 'searchable' is causing the issue. I'm not able to denormalize the tables... so I'm wondering if there is a way to rewrite the linq query to get it to return quicker... or if I should just write this query as a Stored Procedure (and if so, what might it look like). Thanks.

Well to start with, I'd find out what query is being generated (in LINQ to SQL you'd set the Log on the data context) and then profile it in SQL Server Management Studio. Play with it there until you've found something that is fast enough (either by changing the query or adding indexes) and if you've had to change the query, work out how to represent that in LINQ.
I suspect the problem is that you're combining OrderBy and Take - which means it potentially needs to find out all the results in order to work out which the top 100 would look like. Is Code indexed? If not, try indexing that - it may help by allowing the server to consider records in the order in which they'd be returned, so it can stop after it's found 100 records. You should look at indexes for the other columns too.

The Take(100) translates to "Select Top 100" etc. This would help if your problem was an otherwise huge result set, where there are a lot of columns returned. I bet though that your problem is a table scan resulting from the query. In this case, .Take(100) might not help much at all.
So, the likely culprit is the same as if you were doing SQL using ADO.NET: How are your Indxes? Are the fields being searched fields for which you don't have good indexes? This would cause a drastic decrease in performance compared to queries that do utilize good indexes. Add an index that includes Code and Name and see what happens. Not using an index for Code is guaranteed to hose you, because of the Order By. Also, what field links Genealogy_Accounts and Genealogy_AccountClass? A lack of index on either table could hose things. (I would guess an index including Searchable is unlikely to help.)
Use SQL Profiler to see the actual query being run (though you can do this in VS too), and to see how bad it really is on the server.
The problem might be LINQ doing something stupid generating the query, but this is probably not the case. We're finding LINQ-to-SQL often makes better queries than we do. Even if it looks goofy, it's usually very efficient. You can put the SQL in Query Analyzer, and check out the query plan. Then rewrite the SQL to be more human-simple and see if it improve things -- I bet it won't. I think you'll still see a table scan, indicating something is wrong with your index.

Related

EF LINQ ToList is very slow

I am using ASP NET MVC 4.5 and EF6, code first migrations.
I have this code, which takes about 6 seconds.
var filtered = _repository.Requests.Where(r => some conditions); // this is fast, conditions match only 8 items
var list = filtered.ToList(); // this takes 6 seconds, has 8 items inside
I thought that this is because of relations, it must build them inside memory, but that is not the case, because even when I return 0 fields, it is still as slow.
var filtered = _repository.Requests.Where(r => some conditions).Select(e => new {}); // this is fast, conditions match only 8 items
var list = filtered.ToList(); // this takes still around 5-6 seconds, has 8 items inside
Now the Requests table is quite complex, lots of relations and has ~16k items. On the other hand, the filtered list should only contain proxies to 8 items.
Why is ToList() method so slow? I actually think the problem is not in ToList() method, but probably EF issue, or bad design problem.
Anyone has had experience with anything like this?
EDIT:
These are the conditions:
_repository.Requests.Where(r => ids.Any(a => a == r.Student.Id) && r.StartDate <= cycle.EndDate && r.EndDate >= cycle.StartDate)
So basically, I can checking if Student id is in my id list and checking if dates match.
Your filtered variable contains a query which is a question, and it doesn't contain the answer. If you request the answer by calling .ToList(), that is when the query is executed. And that is the reason why it is slow, because only when you call .ToList() is the query executed by your database.
It is called Deferred execution. A google might give you some more information about it.
If you show some of your conditions, we might be able to say why it is slow.
In addition to Maarten's answer I think the problem is about two different situation
some condition is complex and results in complex and heavy joins or query in your database
some condition is filtering on a column which does not have an index and this cause the full table scan and make your query slow.
I suggest start monitoring the query generated by Entity Framework, it's very simple, you just need to set Log function of your context and see the results,
using (var context = new MyContext())
{
context.Database.Log = Console.Write;
// Your code here...
}
if you see something strange in generated query try to make it better by breaking it in parts, some times Entity Framework generated queries are not so good.
if the query is okay then the problem lies in your database (assuming no network problem).
run your query with an SQL profiler and check what's wrong.
UPDATE
I suggest you to:
add index for StartDate and EndDate Column in your table (one for each, not one for both)
ToList executes the query against DB, while first line is not.
Can you show some conditions code here?
To increase the performance you need to optimize query/create indexes on the DB tables.
Your first line of code only returns an IQueryable. This is a representation of a query that you want to run not the result of the query. The query itself is only runs on the databse when you call .ToList() on your IQueryable, because its the first point that you have actually asked for data.
Your adjustment to add the .Select only adds to the existing IQueryable query definition. It doesnt change what conditions have to execute. You have essentially changed the following, where you get back 8 records:
select * from Requests where [some conditions];
to something like:
select '' from Requests where [some conditions];
You will still have to perform the full query with the conditions giving you 8 records, but for each one, you only asked for an empty string, so you get back 8 empty strings.
The long and the short of this is that any performance problem you are having is coming from your "some conditions". Without seeing them, its is difficult to know. But I have seen people in the past add .Where clauses inside a loop, before calling .ToList() and inadvertently creating a massively complicated query.
Jaanus. The most likely reason of this issue is complecity of generated SQL query by entity framework. I guess that your filter condition contains some check of other tables.
Try to check generated query by "SQL Server Profiler". And then copy this query to "Management Studio" and check "Estimated execution plan". As a rule "Management Studio" generatd index recomendation for your query try to follow these recomendations.

Complex Linq-To-Entities query with deferred execution: prevent OrderBy being used as a subquery/projection

I built a dynamic LINQ-to-Entities query to support optional search parameters. It was quite a bit of work to get this producing performant SQL and I am NEARLY there, but I stumble across a big issue with OrderBy which gets translated into kind of a projection / subquery containing the actual query, causing extremely inperformant SQL. I can't find a solution to get this right. Maybe someone can help me out :)
I spare you the complete query for now as it is long and complex, I translate it into a simple sample for better understanding:
I'm doing something like this:
// Start with the base query
var query = from a in db.Articles
where a.UserId = 1;
// Apply some optional conditions
if (tagParam != null)
query = query.Where(a => a.Tag = tagParam);
if (authorParam != null)
query = query.Where(a => a.Author = authorParam);
// ... and so on ...
// I only want the 50 most recent articles, so I finally want to apply Take and OrderBy
query = query.OrderByDescending(a => a.Published);
query = query.Take(50);
The resulting SQL strangely translates the OrderBy in an container query:
select top 50 Id, Published, Title, Content
from (select Id, Published, Title Content
from Articles
where UserId = 1
and Author = #paramAuthor)
order by Published desc
Note that also the Top 50 got moved to the outer query. In case I would only use Take(50), the top 50 sql statement would correctly be applied to the inner query above (the outer query wouldn't even exist). Only when I use OrderBy, Linq-To-Entities uses this container query approach.
This causes a very bad execution plan where the inner query takes all articles that apply to the parameters from Disk and pass them to the outer query - and only there, OrderBy and Top is processed. In my case, this can be hundred thousands of lines. I already tried to move the order by manually into the inner statement and execute this - this produces much better results as the existing indexes allow the SQL Server to easily find the top 50 rows in right order without reading all rows from disk.
Is there any way I can get EF to append the order by clause to the inner query? Or any other trick to get this working right?
Any help would be greatly appreciated :)
Edit: As an additional information, some tests with less complex queries showed that the Optimizer normally handles such subquery scenarios well. In my scenario, the Optimizer fails on this unfortunately and moves hundrets of thousands of rows through the query plan. But moving the OrderBy to the inner query solves it and the Optimizer does it right.
Edit 2: After couple of hours of more testing it seems the issue with the wrong execution plan is a SQL Server issue that is not caused by the created container query. While the move of the order by and top clause into the inside query did fix the issue initially, I can't reproduce this now anymore, SQL Server started using the bad execution plan now also here (while the data in the DB remained unchanged). The move of the order by clause might caused SQL Server to take other statistics into account but it seems it was not due to the better/more clean query design. However, I still want to know why EF uses a container query here and if I can influence this behavior. If it will not improve performance, at least it would make debugging easier if the generated EF queries are more straightforward and not that convoluted.

Does the order of the include and the where matter in a LINQ query?

I have the following:
var objectives = _objectivesRepository
.GetAll()
.Where(o => o.ExamId == examId || examId == 0)
.Include(o => o.ObjectiveDetails)
.ToList();
In a previous post one of the users said that it was important to put the where before the include in a LINQ query.
Can someone let me know if this is correct? Does order matter? How about if there are many where and includes ?
In Entity Framework yes it does matter, but only in certain scenarios. When using groupings or projections, it will fail to include the requested data.
See this blog post on the subject.
The actual answer, is that usually, the order does not matter significantly. Following your example statement, I would describe the logical translational steps to a relational query:
Get all objects, with all their properties (in relational algebra they are considered attributes)
Restrict the retrieved rows based on your condition ((relational algebra projection operation)
Restrict the attributes of the retrieved rows which are eagerly loaded (relational algebra selection operation)
In your specific query, the steps 2 and 3 are interchangeable without altering the final outcome. As stated here, this is the default case. Nevertheless, even if the final outcome would not change, the performance could be significantly be affected. This is the reason for which the modern databases have query optimizers which create an execution plan to optimize the specific query.
Nevertheless, this is not always the case. So, I suppose that you could always find a case where the above do not apply. Regarding performance, no assumptions are safe. You should always measure things. You could always use the SQL Server profiler to see the translation of your linq to entities query to the final SQL query. Then you could use the SQL server tools (like the query analyzer) to see the execution plan of the final SQL query.
Hope I helped!

Querying with many (~100) search terms with Entity Framework

I need to do a query on my database that might be something like this where there could realistically be 100 or more search terms.
public IQueryable<Address> GetAddressesWithTown(string[] towns)
{
IQueryable<Address> addressQuery = DbContext.Addresses;
addressQuery.Where( x => towns.Any( y=> x.Town == y ) );
return addressQuery;
}
However when it contains more than about 15 terms it throws and exception on execution because the SQL generated is too long.
Can this kind of query be done through Entity Framework?
What other options are there available to complete a query like this?
Sorry, are we talking about THIS EXACT SQL?
In that case it is a very simple "open your eyes thing".
There is a way (contains) to map that string into an IN Clause, that results in ONE sql condition (town in ('','',''))
Let me see whether I get this right:
addressQuery.Where( x => towns.Any( y=> x.Town == y ) );
should be
addressQuery.Where ( x => towns.Contains (x.Town)
The resulting SQL will be a LOT smaller. 100 items is still taxing it - I would dare saying you may have a db or app design issue here and that requires a business side analysis, I have not me this requirement in 20 years I work with databases.
This looks like a scenario where you'd want to use the PredicateBuilder as this will help you create an Or based predicate and construct your dynamic lambda expression.
This is part of a library called LinqKit by Joseph Albahari who created LinqPad.
public IQueryable<Address> GetAddressesWithTown(string[] towns)
{
var predicate = PredicateBuilder.False<Address>();
foreach (string town in towns)
{
string temp = town;
predicate = predicate.Or (p => p.Town.Equals(temp));
}
return DbContext.Addresses.Where (predicate);
}
You've broadly got two options:
You can replace .Any with a .Contains alternative.
You can use plain SQL with table-valued-parameters.
Using .Contains is easier to implement and will help performance because it translated to an inline sql IN clause; so 100 towns shouldn't be a problem. However, it also means that the exact sql depends on the exact number of towns: you're forcing sql-server to recompile the query for each number of towns. These recompilations can be expensive when the query is complex; and they can evict other query plans from the cache as well.
Using table-valued-parameters is the more general solution, but it's more work to implement, particularly because it means you'll need to write the SQL query yourself and cannot rely on the entity framework. (Using ObjectContext.Translate you can still unpack the query results into strongly-typed objects, despite writing sql). Unfortunately, you cannot use the entity framework yet to pass a lot of data to sql server efficiently. The entity framework doesn't support table-valued-parameters, nor temporary tables (it's a commonly requested feature, however).
A bit of TVP sql would look like this select ... from ... join #townTableArg townArg on townArg.town = address.town or select ... from ... where address.town in (select town from #townTableArg).
You probably can work around the EF restriction, but it's not going to be fast and will probably be tricky. A workaround would be to insert your values into some intermediate table, then join with that - that's still 100 inserts, but those are separate statements. If a future version of EF supports batch CUD statements, this might actually work reasonably.
Almost equivalent to table-valued paramters would be to bulk-insert into a temporary table and join with that in your query. Mostly that just means you're table name will start with '#' rather than '#' :-). The temp table has a little more overhead, but you can put indexes on it and in some cases that means the subsequent query will be much faster (for really huge data-quantities).
Unfortunately, using either temporary tables or bulk insert from C# is a hassle. The simplest solution here is to make a DataTable; this can be passed to either. However, datatables are relatively slow; the over might be relevant once you start adding millions of rows. The fastest (general) solution is to implement a custom IDataReader, almost as fast is an IEnumerable<SqlDataRecord>.
By the way, to use a table-valued-parameter, the shape ("type") of the table parameter needs to be declared on the server; if you use a temporary table you'll need to create it too.
Some pointers to get you started:
http://lennilobel.wordpress.com/2009/07/29/sql-server-2008-table-valued-parameters-and-c-custom-iterators-a-match-made-in-heaven/
SqlBulkCopy from a List<>

Selecting first 100 records using Linq

How can I return first 100 records using Linq?
I have a table with 40million records.
This code works, but it's slow, because will return all values before filter:
var values = (from e in dataContext.table_sample
where e.x == 1
select e)
.Take(100);
Is there a way to return filtered? Like T-SQL TOP clause?
No, that doesn't return all the values before filtering. The Take(100) will end up being part of the SQL sent up - quite possibly using TOP.
Of course, it makes more sense to do that when you've specified an orderby clause.
LINQ doesn't execute the query when it reaches the end of your query expression. It only sends up any SQL when either you call an aggregation operator (e.g. Count or Any) or you start iterating through the results. Even calling Take doesn't actually execute the query - you might want to put more filtering on it afterwards, for instance, which could end up being part of the query.
When you start iterating over the results (typically with foreach) - that's when the SQL will actually be sent to the database.
(I think your where clause is a bit broken, by the way. If you've got problems with your real code it would help to see code as close to reality as possible.)
I don't think you are right about it returning all records before taking the top 100. I think Linq decides what the SQL string is going to be at the time the query is executed (aka Lazy Loading), and your database server will optimize it out.
Have you compared standard SQL query with your linq query? Which one is faster and how significant is the difference?
I do agree with above comments that your linq query is generally correct, but...
in your 'where' clause should probably be x==1 not x=1 (comparison instead of assignment)
'select e' will return all columns where you probably need only some of them - be more precise with select clause (type only required columns); 'select *' is a vaste of resources
make sure your database is well indexed and try to make use of indexed data
Anyway, 40milions records database is quite huge - do you need all that data all the time? Maybe some kind of partitioning can reduce it to the most commonly used records.
I agree with Jon Skeet, but just wanted to add:
The generated SQL will use TOP to implement Take().
If you're able to run SQL-Profiler and step through your code in debug mode, you will be able to see exactly what SQL is generated and when it gets executed. If you find the time to do this, you will learn a lot about what happens underneath.
There is also a DataContext.Log property that you can assign a TextWriter to view the SQL generated, for example:
dbContext.Log = Console.Out;
Another option is to experiment with LINQPad. LINQPad allows you to connect to your datasource and easily try different LINQ expressions. In the results panel, you can switch to see the SQL generated the LINQ expression.
I'm going to go out on a limb and guess that you don't have an index on the column used in your where clause. If that's the case then it's undoubtedly doing a table scan when the query is materialized and that's why it's taking so long.

Categories