SQL slowness under ADO Entity Framework - c#

I need to create a csv file that will contain all current Subscribers, plus a series of strings that came from the database.
to fetch all Subscribers I'm doing:
public IQueryable<Subscribers> ListAllSubscribersByCalendarId(Decimal cid)
{
    return db.Subscribers.Where(x => x.calendar_id.Equals(cid));
}
Pretty simple.
The problem is that I already have more than 5000 subscribers and this takes forever (literally)!
Even showing only the last 30 records takes a very long time. My query is:
public IQueryable<Subscribers> ListLast30SubscribersByCalendarId(Decimal cid)
{
    return db.Subscribers
        .Where(x => x.calendar_id.Equals(cid))
        .Take(30)
        .OrderByDescending(x => x.created_date);
}
What can I do to speed up this process?

Your expression is one that needs to be checked on the client and cannot be converted to a SQL statement. So if you run SQL Profiler, I bet you will see records trickling to the client one by one.
Change the condition to x.calendar_id == cid

Two possible options:
create a nonclustered index on the calendar_id column,
or do the ordering in the application instead of in SQL:
public IQueryable<Subscribers> ListLast30SubscribersByCalendarId(Decimal cid)
{
    return db.Subscribers
        .Where(x => x.calendar_id.Equals(cid))
        .Take(30);
}
var ordered = ListLast30SubscribersByCalendarId(1).ToList().OrderByDescending(x => x.created_date);

Aliostad's answer should help a lot. Also ensure that the calendar_id field is indexed correctly.
First, I'm just checking that you want to take the first 30 however the database determines which the first 30 are, and then sort those 30 by created date? If you want to grab the first 30 based on their creation date, then you'll want to include the created_date in your index appropriately so it can be used in the query.
Finally, this answer assumes you are using a tool that correctly maps these LINQ operators to an appropriate SQL query, and isn't just doing this entire operation on the entire record set in memory.
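To make the point about ordering concrete, here is a minimal in-memory sketch (LINQ to Objects via AsQueryable, not Entity Framework). The Subscriber class, the SubscriberQueries helper and the sample data are hypothetical stand-ins; only calendar_id and created_date come from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Subscribers entity; only the two columns used here.
public class Subscriber
{
    public decimal calendar_id { get; set; }
    public DateTime created_date { get; set; }
}

public static class SubscriberQueries
{
    // Sort first, then take: returns the n most recently created subscribers.
    public static IQueryable<Subscriber> LastN(IQueryable<Subscriber> source, decimal cid, int n)
    {
        return source
            .Where(x => x.calendar_id == cid)
            .OrderByDescending(x => x.created_date)
            .Take(n);
    }

    // Take first, then sort: picks n rows in whatever order the source yields them
    // and only orders those n rows afterwards.
    public static IQueryable<Subscriber> ArbitraryNSorted(IQueryable<Subscriber> source, decimal cid, int n)
    {
        return source
            .Where(x => x.calendar_id == cid)
            .Take(n)
            .OrderByDescending(x => x.created_date);
    }

    // In-memory sample data: ten subscribers created on consecutive days.
    public static IQueryable<Subscriber> Sample()
    {
        return Enumerable.Range(1, 10)
            .Select(day => new Subscriber { calendar_id = 1m, created_date = new DateTime(2020, 1, day) })
            .ToList()
            .AsQueryable();
    }
}
```

With this sample, LastN(Sample(), 1m, 3) starts at 10 January, while ArbitraryNSorted(Sample(), 1m, 3) starts at 3 January, because it only sorts the first three rows it happened to receive.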

This should really happen in a stored procedure. Let the database server do what it's good at, like running queries crazy fast.

Related

SQL Server 2008 Skip Take with long parameter

This question is very similar to this question (I also need to implement pagination with SQL Server 2008 and Entity Framework):
Offset/Fetch based paging (Implementation) in EntityFramework (Using LINQ) for SQL Server 2008
However, the problem is that my DB has more than 10 billion rows. So basically Skip does not work; I need Skip/Take methods that accept a "long" as parameter. Is there any possible solution with LINQ and EF? Thanks
Well, if it is necessary at all to read as much data as possible (leaving aside the case from the comments, where you have to skip billions of rows to take only 20), there are a few options.
You could use a stored procedure that takes the amounts to skip and to take.
Another approach would be to execute plain SQL through your dbContext's Database property.
If you want to use LINQ and you have incrementing IDs, you could give the following a try, but it is a bit more expensive and it will not cover a long value for the amount of data to retrieve:
On the first execution you have to determine the ID at position int.MaxValue-1. After that point you can reuse the logic for the rest of the paging (maybe depending on a flag).
Use the ID in the where clause (as the lower bound) and add a Take with the needed amount. From the last record of the result, save the ID for the next paging call, where you will use it instead of the initially determined ID.
A possible approach (not tested, and it will not cover all your cases without modification) could be:
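private long _lastId;
private bool isFirstPage = true; // assumed flag; the original snippet did not declare it
public IEnumerable<Students> GetPage(int toTake)
{
    // Order by Id so that "the last record" of a page is well defined.
    List<Students> result = isFirstPage
        ? _context.Students
            .OrderBy(s => s.Id)
            .Take(toTake)
            .ToList()
        : _context.Students
            .Where(s => s.Id > _lastId)
            .OrderBy(s => s.Id)
            .Take(toTake)
            .ToList();
    isFirstPage = false;
    // Remember where this page ended; keep the old value if the page was empty.
    _lastId = result.LastOrDefault()?.Id ?? _lastId;
    return result;
}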
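The same paging idea can be tried out without a database. The following is only a sketch under assumptions: Student and KeysetPager are hypothetical names, the data source is any IQueryable (for example a list wrapped with AsQueryable), and IDs are assumed to be positive and increasing so that 0 can mean "nothing read yet".

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity with a 64-bit, ever-increasing key.
public class Student
{
    public long Id { get; set; }
}

public class KeysetPager
{
    private readonly IQueryable<Student> _students;
    private long _lastId; // 0 means "no page read yet"; assumes real IDs are greater than 0

    public KeysetPager(IQueryable<Student> students)
    {
        _students = students;
    }

    // Returns the next page of rows whose Id is greater than the last Id seen so far.
    public List<Student> NextPage(int toTake)
    {
        List<Student> page = _students
            .Where(s => s.Id > _lastId)
            .OrderBy(s => s.Id)
            .Take(toTake)
            .ToList();

        // Only advance the bookmark when something was actually returned.
        if (page.Count > 0)
        {
            _lastId = page[page.Count - 1].Id;
        }
        return page;
    }
}
```

Because the bookmark is a long ID rather than a row offset, the int limit of Skip never comes into play; the page size passed to Take is still an int.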

Entity Framework LINQ for finding sub items from LastOrDefault parent

I have few related objects and relation is like
public class Project
{
public List<ProjectEdition> editions;
}
public class ProjectEdition
{
public List<EditionItem> items;
}
public class EditionItem
{
}
I wanted to fetch the EditionItems from Last entries of ProjectEditions only for each Project
Example
Project#1 -> Edition#1 [contains few edition items ] , Edition#2 [contains few edition items]
Project#2 -> Edition#1 ,Edition#2 and Edition#3
My required output contains the EditionItems from Edition#2 of Project#1 and Edition#3 of Project#2 only, i.e. the EditionItems from the latest (last) edition of each Project.
To get this I tried this query:
List<EditionItem> master_list = context.Projects.Select(x => x.ProjectEditions.LastOrDefault())
.SelectMany(x => x.EditionItems).ToList();
But it returns an error at the LastOrDefault() call:
An exception of type 'System.NotSupportedException' occurred in EntityFramework.SqlServer.dll but was not handled in user code
Additional information: LINQ to Entities does not recognize the method '---------.Models.ProjectEdition LastOrDefault[ProjectEdition](System.Collections.Generic.IEnumerable`1
So how can I filter for the last edition of each project and then get the list of EditionItems from it in a single LINQ call?
Granit got the answer right, so I won't repeat his code. I would like to add the reasons for this behaviour.
Entity Framework is magic (sometimes too much magic), but it still has to translate your LINQ queries into SQL, and it is limited by what your underlying database can do (SQL Server in this case).
When you call context.Projects.FirstOrDefault(), it is translated into something like Select TOP 1 * from Projects. Note the TOP 1 part: this is the SQL Server operator that limits the number of rows returned, and it is part of query optimisation in SQL Server. SQL Server does not have any operator that will give you LAST 1, because it would need to run the query, return all the results, take the last one and dump the rest. That is not very efficient; think of a table with a couple of million (or billion) records.
So you need to apply whatever sort order is required to your query and limit the number of rows you return. If you need the last record from the query, apply the reverse sort order. You do need to sort, because SQL Server does not guarantee the order of the records returned if no Order By is applied to the query; this is due to the way the data is stored internally.
When you write LINQ queries with EF, I recommend keeping an eye on the SQL your queries generate. Sometimes you'll see how complex it comes out and you can easily simplify the query. And sometimes, with lazy loading enabled, you introduce an N+1 problem with a stroke of a key (literally). I use ExpressProfiler to watch the generated SQL; LinqPad can also show you the SQL queries, and there are other tools.
You cannot use the method LastOrDefault() or Last(), as discussed here.
Instead, you can use OrderByDescending() in conjunction with FirstOrDefault(), but first you need a property in your ProjectEdition by which you want to order the entities. E.g. if ProjectEdition has a property Id (which there is a good chance it does), you can use the following LINQ query:
List<EditionItem> master_list = context.Projects.Select(
x => x.ProjectEditions
.OrderByDescending(pe => pe.Id)
.FirstOrDefault())
.SelectMany(x => x.EditionItems).ToList();
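For anyone who wants to try the OrderByDescending() plus FirstOrDefault() shape without a database, here is a self-contained in-memory sketch. The class shapes, the Id and Name properties and the EditionQueries helper are assumptions for illustration, not the asker's actual model; the null filter is there because a project without editions yields null from FirstOrDefault() when run in memory.

```csharp
using System.Collections.Generic;
using System.Linq;

public class EditionItem
{
    public string Name { get; set; }
}

public class ProjectEdition
{
    public int Id { get; set; } // assumed ordering key
    public List<EditionItem> EditionItems { get; set; } = new List<EditionItem>();
}

public class Project
{
    public List<ProjectEdition> ProjectEditions { get; set; } = new List<ProjectEdition>();
}

public static class EditionQueries
{
    // Items of the newest edition (highest Id) of every project.
    public static List<EditionItem> LatestEditionItems(IQueryable<Project> projects)
    {
        return projects
            .Select(p => p.ProjectEditions
                .OrderByDescending(pe => pe.Id)
                .FirstOrDefault())
            .Where(pe => pe != null) // skip projects that have no editions
            .SelectMany(pe => pe.EditionItems)
            .ToList();
    }
}
```

Passing a list of projects wrapped with AsQueryable() returns only the items of each project's highest-Id edition, in project order.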
List<EditionItem> master_list = context.Projects
.Select(p => p.editions.LastOrDefault())
.SelectMany(pe => pe.items).ToList();
If LastOrDefault is not supported, you can try using OrderByDescending:
List<EditionItem> master_list = context.Projects
.Select(p => p.editions.OrderByDescending(e => e.somefield).FirstOrDefault())
.SelectMany(pe => pe.items).ToList();
from p in context.project
from e in p.projectEdition.LastOrDefault()
select new EditionItem
{
item1 = e.item1
}
Please try this

Least expensive operation to check for new records in table with EF

I have PostgreSQL running together with EF7.
My table structure:
Id (bigint)
Content (text)
CreatedAt (timestamp without time zone NOT NULL)
I have infinite scroll in my frontend app and query results with .Skip(n) & .Take(m).
I.e.
.Skip(0), .Take(10);
.Skip(10), .Take(10);
.Skip(20), .Take(10);
<...>
Now, while scrolling, if there're newer records, I have to know how many and add them to .Skip(n) function. I don't need to display them, just need to add them to consideration while skipping.
Currently I'm checking for them like so, but this seems like it will get quite expensive once the table exceeds 50k-100k records:
_myRepository.GetAll().Where(x => x.CreatedAt > newestActivityDate).Count();
GetAll():
public IQueryable<Activity> GetAll()
{
return _context.Activities.OrderByDescending(x => x.CreatedAt);
}
What would be the best (most performant) operation to check whether there're new records & how many? Checking by date and only then doing count if there're any new records?
EDIT:
Added GetAll() description for more clarity.
I don't know what your .GetAll() method does, so that really is the crux of your problem.
LINQ has what is called deferred execution. What that means is that the query will not get executed until the results are acted upon. So in the case of when you're building a SQL query, nothing will get sent to the database until you finish with it.
So if your .GetAll() method queries with a .ToList() in there, then it will pull everything from the database immediately before adding the rest of your filtering. In that case, yes, it will be very expensive.
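Deferred execution is easy to observe in plain LINQ to Objects. The sketch below is not EF code: DeferredExecutionDemo and its counter are hypothetical, and the counter only exists to show when the filter really runs.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Counts how often the filter actually runs, to make execution visible.
    public static int PredicateCalls;

    public static IEnumerable<int> BuildQuery(IEnumerable<int> source)
    {
        // This only describes the filter; nothing is evaluated here.
        return source.Where(x =>
        {
            PredicateCalls++;
            return x > 2;
        });
    }
}
```

After calling BuildQuery(new[] { 1, 2, 3, 4 }), PredicateCalls is still 0. Only when the result is enumerated, for example with ToList(), does the filter run once per element. With an IQueryable against a database, that same moment is when the SQL is sent.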
However, you can get around that by just asking for the count. You already have most of that set up.
If you add a new method in your repository like this:
public int GetNewRecordsCount(DateTime newestActivityDate)
{
    return _context.Widgets.Count(x => x.CreatedAt > newestActivityDate);
}
Then this will create a different SQL query that is just something similar to this:
SELECT COUNT(*)
FROM Widgets
WHERE CreatedAt > newestActivityDate;
That is a very quick operation where the database will do all of the filtering for you. The only thing returned to your code is just a single row, single column result of the total count.
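As a rough sketch of how that count could feed back into the paging from the question: the Activity class, the ActivityPaging helper and the parameter names below are assumptions for illustration, and the code runs against any IQueryable, including an in-memory list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Activity entity from the question.
public class Activity
{
    public long Id { get; set; }
    public DateTime CreatedAt { get; set; }
}

public static class ActivityPaging
{
    // Number of rows created after the newest row the client has already seen.
    public static int CountNewerThan(IQueryable<Activity> activities, DateTime newestActivityDate)
    {
        return activities.Count(x => x.CreatedAt > newestActivityDate);
    }

    // Next page of older rows; the skip offset is shifted by the number of newer rows.
    public static List<Activity> NextPage(
        IQueryable<Activity> activities,
        DateTime newestActivityDate,
        int alreadyLoaded,
        int pageSize)
    {
        int newer = CountNewerThan(activities, newestActivityDate);
        return activities
            .OrderByDescending(x => x.CreatedAt)
            .Skip(alreadyLoaded + newer)
            .Take(pageSize)
            .ToList();
    }
}
```

An index on CreatedAt would likely help both the count and the ordered paging, since both filter or sort on that column.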

EF LINQ ToList is very slow

I am using ASP NET MVC 4.5 and EF6, code first migrations.
I have this code, which takes about 6 seconds.
var filtered = _repository.Requests.Where(r => some conditions); // this is fast, conditions match only 8 items
var list = filtered.ToList(); // this takes 6 seconds, has 8 items inside
I thought that this is because of relations, it must build them inside memory, but that is not the case, because even when I return 0 fields, it is still as slow.
var filtered = _repository.Requests.Where(r => some conditions).Select(e => new {}); // this is fast, conditions match only 8 items
var list = filtered.ToList(); // this takes still around 5-6 seconds, has 8 items inside
Now the Requests table is quite complex, lots of relations and has ~16k items. On the other hand, the filtered list should only contain proxies to 8 items.
Why is ToList() method so slow? I actually think the problem is not in ToList() method, but probably EF issue, or bad design problem.
Anyone has had experience with anything like this?
EDIT:
These are the conditions:
_repository.Requests.Where(r => ids.Any(a => a == r.Student.Id) && r.StartDate <= cycle.EndDate && r.EndDate >= cycle.StartDate)
So basically, I am checking whether the Student id is in my id list and whether the dates match.
Your filtered variable contains a query which is a question, and it doesn't contain the answer. If you request the answer by calling .ToList(), that is when the query is executed. And that is the reason why it is slow, because only when you call .ToList() is the query executed by your database.
It is called Deferred execution. A google might give you some more information about it.
If you show some of your conditions, we might be able to say why it is slow.
In addition to Maarten's answer, I think the problem comes down to one of two situations:
some condition is complex and results in complex, heavy joins or queries in your database
some condition filters on a column which does not have an index, which causes a full table scan and makes your query slow.
I suggest you start monitoring the queries generated by Entity Framework. It's very simple: you just need to set the Log property of your context and look at the output:
using (var context = new MyContext())
{
context.Database.Log = Console.Write;
// Your code here...
}
If you see something strange in the generated query, try to improve it by breaking it into parts; sometimes the queries Entity Framework generates are not so good.
If the query is okay, then the problem lies in your database (assuming no network problem).
Run your query with a SQL profiler and check what's wrong.
UPDATE
I suggest that you:
add an index for the StartDate and EndDate columns in your table (one for each, not one for both)
ToList executes the query against DB, while first line is not.
Can you show some conditions code here?
To increase the performance you need to optimize query/create indexes on the DB tables.
Your first line of code only returns an IQueryable. This is a representation of a query that you want to run, not the result of the query. The query itself only runs on the database when you call .ToList() on your IQueryable, because that is the first point at which you have actually asked for data.
Your adjustment to add the .Select only adds to the existing IQueryable query definition. It doesn't change which conditions have to execute. You have essentially changed the following, where you get back 8 records:
select * from Requests where [some conditions];
to something like:
select '' from Requests where [some conditions];
You will still have to perform the full query with the conditions giving you 8 records, but for each one, you only asked for an empty string, so you get back 8 empty strings.
The long and the short of this is that any performance problem you are having is coming from your "some conditions". Without seeing them, it is difficult to know. But I have seen people in the past add .Where clauses inside a loop before calling .ToList(), inadvertently creating a massively complicated query.
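With the conditions from the question's edit, one variation that may be worth comparing in the logged SQL is ids.Contains(...) in place of ids.Any(a => a == ...). Contains over a local list of primitive values is the form EF is generally expected to turn into a SQL IN clause; whether it actually helps here is an assumption to verify, not a guarantee. The sketch below uses a hypothetical, flattened Request class (StudentId instead of Student.Id) and runs in memory.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, flattened stand-in for the Request entity.
public class Request
{
    public int StudentId { get; set; }
    public DateTime StartDate { get; set; }
    public DateTime EndDate { get; set; }
}

public static class RequestFilters
{
    // Same filter as in the question, written with ids.Contains(...)
    // instead of ids.Any(a => a == ...).
    public static IQueryable<Request> Overlapping(
        IQueryable<Request> requests,
        List<int> ids,
        DateTime cycleStart,
        DateTime cycleEnd)
    {
        return requests.Where(r =>
            ids.Contains(r.StudentId) &&
            r.StartDate <= cycleEnd &&
            r.EndDate >= cycleStart);
    }
}
```

Both forms return the same rows; the difference, if any, shows up only in the SQL that is generated and its execution plan.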
Jaanus, the most likely reason for this issue is the complexity of the SQL query generated by Entity Framework. I guess that your filter condition contains some checks against other tables.
Try to check the generated query with "SQL Server Profiler". Then copy this query to "Management Studio" and check the "Estimated execution plan". As a rule, "Management Studio" generates index recommendations for your query; try to follow these recommendations.

Linq to Entities : Count is very slow

I need to have the parent and parent.child.Count() in the query. When I do this it takes 20 seconds, and it's not a huge database. Any ideas for optimization?
var plist = context.persons
    .Select(p => new
    {
        p.fullName,
        p.personID,
        p.Status,
        p.Birthdate,
        p.Accounts.Count
    }).ToList();
Here is a great article on using count() when you really meant to use any()
http://blogs.teamb.com/craigstuntz/2010/04/21/38598/
Do you need to use .count or could you use .any?
http://msdn.microsoft.com/en-us/library/bb534972.aspx
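The difference between the two is easy to see on a lazy in-memory sequence. This is a LINQ to Objects sketch, not a statement about the SQL that EF generates; AnyVersusCount and its counter are hypothetical helpers.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class AnyVersusCount
{
    // How many elements the source has handed out so far.
    public static int Yielded;

    // Lazily produces 1..n and counts every element it yields.
    public static IEnumerable<int> Numbers(int n)
    {
        for (int i = 1; i <= n; i++)
        {
            Yielded++;
            yield return i;
        }
    }

    public static bool HasAnyWithAny(int n)
    {
        return Numbers(n).Any(); // stops after the first element
    }

    public static bool HasAnyWithCount(int n)
    {
        return Numbers(n).Count() > 0; // walks the whole sequence
    }
}
```

If only "is there at least one?" matters, Any() can stop early. If the actual number is needed, as in the question, Count is the right call and the cost has to be addressed elsewhere.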
Since this is entity framework, open up the sql profiler and take a look at what sql queries are being sent to the database. It sounds like you may see that a single query is sent to fetch the group identifiers, and then another set of queries (one for each group) might be fetching the count. If that's happening, you'll have to post the linq query for someone to resolve the issue.
Based on the code you sent, it doesn't look like things should be taking that long. I have a few suggestions:
Use LinqPad to do this query. It will let you see the SQL that gets generated. Then run that SQL code in SQL Server Management Studio, and tell it to include the actual execution plan. This will help you learn whether there's a particular point in the query that's taking a lot of time. For example, if you don't have an index on the Account table's PersonId reference, this query will take a lot longer.
Look at how you're using this data. It's very rare that you really need to have all the people in your entire system in memory at the same time. In fact, I suspect that simply getting all this person data out of the database is probably taking a lot more time than the Count() is.
Are you displaying this data? If so, wouldn't it be better to "page" the results, only showing maybe ten entries at a time? You can use the .Take(int) method before calling .ToList() to get only as many entries as you need.
If you're processing and aggregating this data for the sake of site metrics, it's probably better to set up your query to return the end result before it gets evaluated.
If you can describe how this data is being used, or provide a screenshot of the SQL's execution, we can provide more feedback.
I solved a similar problem using the GroupBy method.
var accounts = Accounts.GroupBy(x => x.personID);
Each group in accounts holds the accounts of one person:
group.Count() returns the number of accounts that belong to the person.
group.Key returns the personID of the group.
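A minimal sketch of the GroupBy idea, assuming a hypothetical Account class with an int personID and an AccountCounts helper that are not part of the original model:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Accounts entity.
public class Account
{
    public int personID { get; set; }
}

public static class AccountCounts
{
    // One grouped pass over the accounts: personID -> number of accounts.
    public static Dictionary<int, int> CountPerPerson(IQueryable<Account> accounts)
    {
        return accounts
            .GroupBy(a => a.personID)
            .Select(g => new { PersonId = g.Key, Count = g.Count() })
            .ToDictionary(x => x.PersonId, x => x.Count);
    }
}
```

Persons without any account do not appear in the dictionary, so a lookup for them should fall back to 0.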
I had a somewhat similar problem. I tried these and they worked out better:
child.Count(x => x.parentID == inputParentID)
child.Where(x => x.parentID == inputParentID)
My original code, which took around 15-20 seconds on each iteration, was:
return (isEdit) ? db.ChasisBuys.Single(x => x.ChasisBuyID == long.Parse(Request.QueryString["chbid"])).Chasises.Count(y => y.Bikes.Count > 0 && y.ColorID == buyItems[(int)index].ColorID && y.ChasisTypeID == buyItems[(int)index].ChasisTypeID).ToString() : "-";
The new code, which runs well, is:
return (isEdit) ? db.Chasises.Where(x => x.ChasisBuyID == long.Parse(Request.QueryString["chbid"])).Count(y => y.Bikes.Count > 0 && y.ColorID == buyItems[(int)index].ColorID && y.ChasisTypeID == buyItems[(int)index].ChasisTypeID).ToString() : "-";
Database has around 1000 records in chasises , about 5 in chasisBuys and about 20 in Bikes.
My guess is that LINQ to SQL queries are not short-circuited the way logical expressions are. For instance, if you write "return a && b && c;" and a is false, the other operands are not evaluated. I was expecting something similar from LINQ to SQL, but that is not the case.
