Linq to Entities : Count is very slow - c#

I need to have parent and parent.child.count()....in the query.. when i do this it is taking 20 seconds....its not a huge database...Any ideas for optimization...
var plist = context.persons
.Select(p => new
{
p.fullName,
c.personID,
p.Status,
p.Birthdate,
p.Accounts.Count
}).ToList();

Here is a great article on using count() when you really meant to use any()
http://blogs.teamb.com/craigstuntz/2010/04/21/38598/

Do you need to use .count or could you use .any?
http://msdn.microsoft.com/en-us/library/bb534972.aspx

Since this is entity framework, open up the sql profiler and take a look at what sql queries are being sent to the database. It sounds like you may see that a single query is sent to fetch the group identifiers, and then another set of queries (one for each group) might be fetching the count. If that's happening, you'll have to post the linq query for someone to resolve the issue.

Based on the code you sent, it doesn't look like things should be taking that long. I have a few suggestions:
Use LinqPad to do this query. It will let you see the SQL that gets generated. Then run that SQL code in SQL Server Management Studio, and tell it to include the actual execution plan. This will help you learn whether there's a particular point in the query that's taking a lot of time. For example, if you don't have an index on the Account table's PersonId reference, this query will take a lot longer.
Look at how you're using this data. It's very rare that you really need to have all the people in your entire system in memory at the same time. In fact, I suspect that simply getting all this person data out of the database is probably taking a lot more time than the Count() is.
Are you displaying this data? If so, wouldn't it be better to "page" the results, only showing maybe ten entries at a time? You can use the .Take(int) method before calling .ToList() to get only as many entries as you need.
If you're processing and aggregating this data for the sake of site metrics, it's probably better to set up your query to return the end result before it gets evaluated.
If you can describe how this data is being used, or provide a screenshot of the SQL's execution, we can provide more feedback.

I solved a similar problem using the GroupBy method.
IEnumerable> accounts = Accounts.GroupBy(x => x.personID);
accounts.Count() will return the number of accounts that belong to the person.
accounts.Key will return the personID of the group.

I had a somehow similar problem, I tried these and worked out better :
child.count(x=> x.paretnID == inputParentID)
child.where(x=> x.parentID == inputParentID)
my original code which took around 15-20 seconds on each iteration was:
return (isEdit) ? db.ChasisBuys.Single(x => x.ChasisBuyID == long.Parse(Request.QueryString["chbid"])).Chasises.Count(y => y.Bikes.Count > 0 && y.ColorID == buyItems[(int)index].ColorID && y.ChasisTypeID == buyItems[(int)index].ChasisTypeID).ToString() : "-";
new code which runs good is :
**return (isEdit) ? db.Chasises.Where(x => x.ChasisBuyID == long.Parse(Request.QueryString["chbid"])).Count(y => y.Bikes.Count > 0 && y.ColorID == buyItems[(int)index].ColorID && y.ChasisTypeID == buyItems[(int)index].ChasisTypeID).ToString() : "-";**
Database has around 1000 records in chasises , about 5 in chasisBuys and about 20 in Bikes.
my opinion is that Linq to SQL queries does not do preevaluations such in logical statements which for instance if you write "return a && b && c;" if statement a is false other statements are not evaluated and I was expecting such thing in linq to sql but it's not the case.

Related

Simulate Entity Framework's .Last() when using SQL Server

SQL server is able to translate EF .First() using its function TOP(1). But when using Entity Framework's .Last() function, it throws an exception. SQL server does not recognize such functions, for obvious reasons.
I used to work it around by sorting descending and taking the first corresponding line :
var v = db.Table.OrderByDescending(t => t.ID).FirstOrDefault(t => t.ClientNumber == ClientNumberDetected);
This does it with a single query, but sorting the whole table (million rows) before querying...
Do I have good reasons to think there will be speed issues if I abuse of this technique ?
I thought of something similar... but it requires two query :
int maxID_of_Client = db.Where(t => t.ClientNumber == ClientNumberDetected).Max(t => t.ID);
var v = db.First(t => t.ID == maxID_of_Client);
It's consisting of retrieving the max ID of the client, then use this ID to retrieve the last line of the client.
It doesn't seems faster to query two times...
There must be a way to optimize this and use a single query without sorting millions of datas.
Unless there is something I don't understand, I'm probably not the first to think about this problem and I want to solve it for good !
Thanks in advance.
The assumption driving this question is that result sets with no ordering clause come back from your DB in any predictable order at all.
In reality, result sets that come back from SQL have no implicit ordering and none should be assumed.
Therefore, the result of
db.Table.FirstOrDefault(t => t.ClientNumber == ClientNumberDetected)
is actually indeterminate.
Whether you're taking first or last, without ordering it's all meaningless anyway.
Now, what goes to SQL where you add an ordering clause to your LINQ? It will be something similar to...
SELECT TOP(1) something FROM somewhere WHERE foo=bar ORDER BY somevalue
or, in the the descending/last case
SELECT TOP(1) something FROM somewhere WHERE foo=bar ORDER BY somevalue DESC
From SQL's POV, there's no significant difference here and your DB will be optimized for this sort of query. The index can be scanned in either direction, and the cost of each query above is the same.
TL;DR :
db.Table.OrderByDescending(t => t.ID)
.FirstOrDefault(t => t.ClientNumber == ClientNumberDetected)
is just fine.

EF LINQ ToList is very slow

I am using ASP NET MVC 4.5 and EF6, code first migrations.
I have this code, which takes about 6 seconds.
var filtered = _repository.Requests.Where(r => some conditions); // this is fast, conditions match only 8 items
var list = filtered.ToList(); // this takes 6 seconds, has 8 items inside
I thought that this is because of relations, it must build them inside memory, but that is not the case, because even when I return 0 fields, it is still as slow.
var filtered = _repository.Requests.Where(r => some conditions).Select(e => new {}); // this is fast, conditions match only 8 items
var list = filtered.ToList(); // this takes still around 5-6 seconds, has 8 items inside
Now the Requests table is quite complex, lots of relations and has ~16k items. On the other hand, the filtered list should only contain proxies to 8 items.
Why is ToList() method so slow? I actually think the problem is not in ToList() method, but probably EF issue, or bad design problem.
Anyone has had experience with anything like this?
EDIT:
These are the conditions:
_repository.Requests.Where(r => ids.Any(a => a == r.Student.Id) && r.StartDate <= cycle.EndDate && r.EndDate >= cycle.StartDate)
So basically, I can checking if Student id is in my id list and checking if dates match.
Your filtered variable contains a query which is a question, and it doesn't contain the answer. If you request the answer by calling .ToList(), that is when the query is executed. And that is the reason why it is slow, because only when you call .ToList() is the query executed by your database.
It is called Deferred execution. A google might give you some more information about it.
If you show some of your conditions, we might be able to say why it is slow.
In addition to Maarten's answer I think the problem is about two different situation
some condition is complex and results in complex and heavy joins or query in your database
some condition is filtering on a column which does not have an index and this cause the full table scan and make your query slow.
I suggest start monitoring the query generated by Entity Framework, it's very simple, you just need to set Log function of your context and see the results,
using (var context = new MyContext())
{
context.Database.Log = Console.Write;
// Your code here...
}
if you see something strange in generated query try to make it better by breaking it in parts, some times Entity Framework generated queries are not so good.
if the query is okay then the problem lies in your database (assuming no network problem).
run your query with an SQL profiler and check what's wrong.
UPDATE
I suggest you to:
add index for StartDate and EndDate Column in your table (one for each, not one for both)
ToList executes the query against DB, while first line is not.
Can you show some conditions code here?
To increase the performance you need to optimize query/create indexes on the DB tables.
Your first line of code only returns an IQueryable. This is a representation of a query that you want to run not the result of the query. The query itself is only runs on the databse when you call .ToList() on your IQueryable, because its the first point that you have actually asked for data.
Your adjustment to add the .Select only adds to the existing IQueryable query definition. It doesnt change what conditions have to execute. You have essentially changed the following, where you get back 8 records:
select * from Requests where [some conditions];
to something like:
select '' from Requests where [some conditions];
You will still have to perform the full query with the conditions giving you 8 records, but for each one, you only asked for an empty string, so you get back 8 empty strings.
The long and the short of this is that any performance problem you are having is coming from your "some conditions". Without seeing them, its is difficult to know. But I have seen people in the past add .Where clauses inside a loop, before calling .ToList() and inadvertently creating a massively complicated query.
Jaanus. The most likely reason of this issue is complecity of generated SQL query by entity framework. I guess that your filter condition contains some check of other tables.
Try to check generated query by "SQL Server Profiler". And then copy this query to "Management Studio" and check "Estimated execution plan". As a rule "Management Studio" generatd index recomendation for your query try to follow these recomendations.

Entity Framework + LINQ + "Contains" == Super Slow?

Trying to refactor some code that has gotten really slow recently and I came across a code block that is taking 5+ seconds to execute.
The code consists of 2 statements:
IEnumerable<int> StudentIds = _entities.Filters
.Where(x => x.TeacherId == Profile.TeacherId.Value && x.StudentId != null)
.Select(x => x.StudentId)
.Distinct<int>();
and
_entities.StudentClassrooms
.Include("ClassroomTerm.Classroom.School.District")
.Include("ClassroomTerm.Teacher.Profile")
.Include("Student")
.Where(x => StudentIds.Contains(x.StudentId)
&& x.ClassroomTerm.IsActive
&& x.ClassroomTerm.Classroom.IsActive
&& x.ClassroomTerm.Classroom.School.IsActive
&& x.ClassroomTerm.Classroom.School.District.IsActive).AsQueryable<StudentClassroom>();
So it's a bit messy but first I get a Distinct list of Id's from one Table (Filters), then I query another Table using it.
These are relatively small tables, but it's still 5+ seconds of query time.
I put this in LINQPad and it showed that it was doing the bottom query first then running 1000 "distinct" queries afterwards.
On a whim I changed the "StudentIds" code by just adding .ToArray() at the end. This improved the speed 1000x ... it now takes like 100ms to complete the same query.
What's the deal? What am I doing wrong?
This is one of the pitfalls of deferred execution in Linq: In your first approach StudentIds is really an IQueryable, not an in-memory collection. That means using it in the second query will run the query again on the database - each and every time.
Forcing execution of the first query by using ToArray() makes StudentIds an in-memory collection and the Contains part in your second query will run over this collection that contains a fixed sequence of items - This gets mapped to something equivalent to a SQL where StudentId in (1,2,3,4) query.
This query will of course, be much much faster since you determined this sequence once up-front, and not every time the Where clause is executed. Your second query without using ToArray() (I would think) would be mapped to a SQL query with an where exists (...) sub-query that gets evaluated for each row.
ToArray() Materializes the initial query to the server memory.
My guess would be the query provider is not able to parse the expression StudentIds.Contains(x.StudentId). Hence it probably thinks that the studentIds is an array already loaded to memory. So it's probably querying the database over and over again during the parsing phase. The only way to know for sure is to setup the profiler.
If you need to do this on the db server, use a join, instead of "contains". If you need to use contains to do what looks like a join problem, you are likely to be missing a surrogate primary key or a foreign key somewhere.
You could also declare studentIds as IQueryable instead of IEnumerable. This might give the query provider the hint it needs to interpret the studentIds as expression aka. data not already loaded to memory. I somehow doubt this but worth a try.
If all else fails, use ToArray(). This will load the initial studentIds to memory.

Linq. Help me tune this!

I have a linq query that is causing some timeout issues. Basically, I have a query that is returning the top 100 results from a table that has approximately 500,000 records.
Here is the query:
using (var dc = CreateContext())
{
var accounts = string.IsNullOrEmpty(searchText)
? dc.Genealogy_Accounts
.Where(a => a.Genealogy_AccountClass.Searchable)
.OrderByDescending(a => a.ID)
.Take(100)
: dc.Genealogy_Accounts
.Where(a => (a.Code.StartsWith(searchText)
|| a.Name.StartsWith(searchText))
&& a.Genealogy_AccountClass.Searchable)
.OrderBy(a => a.Code)
.Take(100);
return accounts.Select(a =>
}
}
Oddly enough it is the first linq query that is causing the timeout. I thought that by doing a 'Take' we wouldn't need to scan all 500k of records. However, that must be what is happening. I'm guessing that the join to find what is 'searchable' is causing the issue. I'm not able to denormalize the tables... so I'm wondering if there is a way to rewrite the linq query to get it to return quicker... or if I should just write this query as a Stored Procedure (and if so, what might it look like). Thanks.
Well to start with, I'd find out what query is being generated (in LINQ to SQL you'd set the Log on the data context) and then profile it in SQL Server Management Studio. Play with it there until you've found something that is fast enough (either by changing the query or adding indexes) and if you've had to change the query, work out how to represent that in LINQ.
I suspect the problem is that you're combining OrderBy and Take - which means it potentially needs to find out all the results in order to work out which the top 100 would look like. Is Code indexed? If not, try indexing that - it may help by allowing the server to consider records in the order in which they'd be returned, so it can stop after it's found 100 records. You should look at indexes for the other columns too.
The Take(100) translates to "Select Top 100" etc. This would help if your problem was an otherwise huge result set, where there are a lot of columns returned. I bet though that your problem is a table scan resulting from the query. In this case, .Take(100) might not help much at all.
So, the likely culprit is the same as if you were doing SQL using ADO.NET: How are your Indxes? Are the fields being searched fields for which you don't have good indexes? This would cause a drastic decrease in performance compared to queries that do utilize good indexes. Add an index that includes Code and Name and see what happens. Not using an index for Code is guaranteed to hose you, because of the Order By. Also, what field links Genealogy_Accounts and Genealogy_AccountClass? A lack of index on either table could hose things. (I would guess an index including Searchable is unlikely to help.)
Use SQL Profiler to see the actual query being run (though you can do this in VS too), and to see how bad it really is on the server.
The problem might be LINQ doing something stupid generating the query, but this is probably not the case. We're finding LINQ-to-SQL often makes better queries than we do. Even if it looks goofy, it's usually very efficient. You can put the SQL in Query Analyzer, and check out the query plan. Then rewrite the SQL to be more human-simple and see if it improve things -- I bet it won't. I think you'll still see a table scan, indicating something is wrong with your index.

Why is the SQL produced by LINQ-to-Entities so inefficient?

The following (cut down) code excerpt is a Linq-To-Entities query that results in SQL (via ToTraceString) that is much slower than a hand crafted query. Am I doing anything stupid, or is Linq-to-Entities just bad at optimizing queries?
I have a ToList() at the end of the query as I need to execute it before using it to build an XML data structure (which was a whole other pain).
var result = (from mainEntity in entities.Main
where (mainEntity.Date >= today) && (mainEntity.Date <= tomorrow) && (!mainEntity.IsEnabled)
select new
{
Id = mainEntity.Id,
Sub =
from subEntity in mainEntity.Sub
select
{
Id = subEntity.Id,
FirstResults =
from firstResultEntity in subEntity.FirstResult
select new
{
Value = firstResultEntity.Value,
},
SecondResults =
from secondResultEntity in subEntity.SecondResult
select
{
Value = secondResultEntity.Value,
},
SubSub =
from subSubEntity in entities.SubSub
where (subEntity.Id == subSubEntity.MainId) && (subEntity.Id == subSubEntity.SubId)
select
new
{
Name = (from name in entities.Name
where subSubEntity.NameId == name.Id
select name.Name).FirstOrDefault()
}
}
}).ToList();
While working on this, I've also has some real problems with Dates. When I just tried to include returned dates in my data structure, I got internal error "1005".
Just as a general observation and not based on any practical experience with Linq-To-Entities (yet): having four nested subqueries inside a single query doesn't look like it's awfully efficient and speedy to begin with.
I think your very broad statement about the (lack of) quality of the SQL generated by Linq-to-Entities is not warranted - and you don't really back it up by much evidence, either.
Several well respected folks including Rico Mariani (MS Performance guru) and Julie Lerman (author of "Programming EF") have been showing in various tests that in general and overall, the Linq-to-SQL and Linq-to-Entities "engines" aren't really all that bad - they achieve overall at least 80-95% of the possible peak performance. Not every .NET app dev can achieve this :-)
Is there any way for you to rewrite that query or change the way you retrieve the bits and pieces that make up its contents?
Marc
Have you tried not materializing the result immediately by calling .ToList()? I'm not sure it will make a difference, but you might see improved performance if you iterate over the result instead of calling .ToList() ...
foreach( var r in result )
{
// build your XML
}
Also, you could try breaking up the one huge query into separate queries and then iterating over the results. Sending everything in one big gulp might be the issue.

Categories