Can I combine the following two LINQ queries into a single one, to speed things up?
The first one searches and performs the paging:
Products.Data = db.Products.Where(x => x.ProductCode.Contains(search) ||
x.Name.Contains(search) ||
x.Description.Contains(search) ||
x.DescriptionExtra.Contains(search) ||
SqlFunctions.StringConvert(x.Price).Contains(search) ||
SqlFunctions.StringConvert(x.PriceOffer).Contains(search) ||
SqlFunctions.StringConvert(x.FinalPrice).Contains(search) ||
SqlFunctions.StringConvert(x.FinalPriceOffer).Contains(search))
.OrderBy(p => p.ProductID)
.Skip(PageSize * (page - 1))
.Take(PageSize).ToList();
while the second one counts the total number of filtered results:
int count = db.Products.Where(x => x.ProductCode.Contains(search) ||
x.Name.Contains(search) ||
x.Description.Contains(search) ||
x.DescriptionExtra.Contains(search) ||
SqlFunctions.StringConvert(x.Price).Contains(search) ||
SqlFunctions.StringConvert(x.PriceOffer).Contains(search) ||
SqlFunctions.StringConvert(x.FinalPrice).Contains(search) ||
SqlFunctions.StringConvert(x.FinalPriceOffer).Contains(search))
.Count();
Get rid of your ridiculously inefficient conversions.
SqlFunctions.StringConvert(x.Price).Contains(search) ||
No index use possible, full table scan, plus a conversion - that is as bad as it gets.
And make sure you have all the relevant indexes in place.
Nothing else you can do.
I do not think you can combine them directly (though you can at least write the filter once and share it - see the sketch below). This is one problem with paging: you need to know the total count of the result anyway. A further problem with dynamic paging is that one page can be inconsistent with another, because they come from different points in time; you can easily miss an item completely because of this. If that could be a problem, I would avoid dynamic paging. You can fill the IDs of the whole result into a temporary table on the server and do the paging from there, or you can return all IDs from the full-text search and query the rest of the data on demand.
There are some more optimizations: you can start returning results only when the search string is at least 3 characters long, or you can build a special table with count estimates for this purpose. You can also decide to return only the first ten pages and save server storage for the IDs (or client bandwidth for the IDs).
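Even though the two database calls cannot be merged into one, the filter only needs to be written once: build the filtered IQueryable first and reuse it for both the count and the page. A minimal sketch, using the names from the question (the StringConvert calls are left out, in line with the advice above):
// Nothing runs against the database yet - this is just the shared filter.
var filtered = db.Products.Where(x =>
    x.ProductCode.Contains(search) ||
    x.Name.Contains(search) ||
    x.Description.Contains(search) ||
    x.DescriptionExtra.Contains(search));

// Still two round trips, but the predicate is defined in a single place.
int count = filtered.Count();

Products.Data = filtered
    .OrderBy(p => p.ProductID)
    .Skip(PageSize * (page - 1))
    .Take(PageSize)
    .ToList();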
I am sad to see "stop using Contains" answers without an alternative. Searching in the middle of words is often a must. The fact is, SQL Server is terribly slow at text processing, and searching is no exception. AFAIK even full-text indexes would not help you much with in-the-middle substring searching.
For the presented query on 10k records I would expect about 40 ms per query to get the counts or all results (on my desktop). You can create a persisted computed column on this table with all the texts concatenated and all the numbers converted, and query only that column. It would speed things up significantly (under 10 ms per query on my desktop).
[computedCol] AS (((((((((([text1]+' ')+[text2])+' ')+[text3])+' ')+CONVERT([nvarchar](max),[d1]))+' ')+CONVERT([nvarchar](max),[d2]))+' ')+CONVERT([nvarchar](max),[d3])) PERSISTED
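If that computed column is then mapped on the entity - here as a hypothetical SearchText string property, a name that is not in the original - the whole filter collapses to a single Contains over one column:
// SearchText is a hypothetical read-only property mapped to [computedCol].
var filtered = db.Products.Where(x => x.SearchText.Contains(search));

int count = filtered.Count();
var pageData = filtered
    .OrderBy(p => p.ProductID)
    .Skip(PageSize * (page - 1))
    .Take(PageSize)
    .ToList();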
Stop using the Contains function, because it is a very slow thing (if you can).
Make sure that your queries can make use of indexes in the DB.
If you MUST have Contains, then take a look at the full-text search capabilities of SQL Server; but you might need to change this to plain SQL, or customise how your LINQ is translated to SQL, to make use of full-text indexes.
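In EF6 one way to drop down to full-text search is a raw query through Database.SqlQuery - a sketch only, assuming a full-text index already exists on the listed columns:
// Requires System.Data.SqlClient for SqlParameter; assumes a full-text index
// covering Name and Description on the Products table.
var results = db.Database.SqlQuery<Product>(
        "SELECT * FROM Products WHERE CONTAINS((Name, Description), @term)",
        new SqlParameter("@term", search))
    .ToList();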
Related
If I do a query like the one below, where I'm searching for the same ID on two different columns, should I have an index like this, or should I create 2 separate indexes, one for each column?
modelBuilder.Entity<Transfer>()
.HasIndex(p => new { p.SenderId, p.ReceiverId });
Query:
var transfersCount = await _dbContext.Transfers
.Where(p => p.ReceiverId == user.Id || p.SenderId == user.Id)
.CountAsync();
What if I have a query like the one below - would I need a multi-column index on all 4 columns?
var transfersCount = await _dbContext.Transfers
.Where(p => (p.SenderId == user.Id || p.ReceiverId == user.Id) &&
(!transferParams.Status.HasValue || p.TransferStatus == (TransferStatus)transferParams.Status) &&
(!transferParams.Type.HasValue || p.TransferType == (TransferType)transferParams.Type))
.CountAsync();
I recommend two single-column indices.
The two single-column indices will perform better in this query because both columns would be in a fully ordered index. By contrast, in a multi-column index, only the first column is fully ordered in the index.
If you were using an AND condition for the sender and receiver, then you would benefit from a multi-column index. The multi-column index is ideal for situations where multiple columns have conditional statements that must all be evaluated to build the result set (e.g., WHERE receiver = 1 AND sender = 2). In an OR condition, a multi-column index would be leveraged as though it were a single-column index only for the first column; the second column would be unindexed.
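A sketch of that recommendation in the same fluent configuration style as the question:
// Two single-column indexes, one per side of the OR condition.
modelBuilder.Entity<Transfer>()
    .HasIndex(p => p.SenderId);

modelBuilder.Entity<Transfer>()
    .HasIndex(p => p.ReceiverId);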
The full intricacies of index design would take far more than an SO answer to explain; there are whole books about it, and it makes up a reasonable proportion of a database administrator's job.
Indexes have a cost to maintain, so you generally strive to have the fewest possible that offer you the most flexibility for what you want to do. Generally an index has some columns that define its key and a reference to the rows in the table that have those keys. When using an index, the database engine can quickly look up the key and discover which rows it needs to read from; it then looks up those rows as a secondary operation.
Indexes can also store table data that isn't part of the lookup key, so you might find yourself creating indexes that also carry other columns from the row, so that by the time the database has found the key it's looking for in the index it also has access to the row data the query wants and doesn't then need to launch a second lookup operation to find the row. If a query wants too many rows from a table, the database might decide to skip using the index entirely; there's some threshold beyond which it's faster to just read all the rows directly from the table and filter them than to suffer the indirection of using the index to find which rows need to be read.
The columns that an index covers can serve more than one query, and order is important. If you always query a person by name, and sometimes also by name and age, but you never query by age alone, it is better to index (name, age) than (age, name). An index on (name, age) can serve a query for just WHERE name = ..., and also WHERE name = ... AND age = .... If you use an OR keyword in a where clause, you can consider that a separate query entirely that would need its own index; indeed, the database might decide to run "name or age" as two parallel queries and combine the results to remove duplicates. If your app's needs later change so that instead of querying a mix of (name) and (name and age) it now frequently queries (name), (name and age), (name or age), (age), and (age and height), then it might make sense to have two indexes: (name, age) plus (age, height). The database can use part or all of both of these to serve the common queries. Remember that using part of an index only works from left to right: an index on (name, age) wouldn't typically serve a query for age alone.
If you're using SQL Server and SSMS, you might find that showing the query plan also reveals a missing-index recommendation; it's worth considering carefully whether such an index actually needs to be added. Apps deployed to Microsoft Azure also get common queries that suffer from a lack of an index flagged automatically, and that can be the impetus to look at the query being run and see how existing or new indexes might be extended or rearranged to cover it. As first noted, this isn't really something a single SO answer of a few lines can prep you for with an "always do this and it will be fine": companies operating at large scale hire people whose sole mission is to make sure the database runs well, and they usually grumble a lot about the devs - and more so about things like Entity Framework, because an EF LINQ query is a layer disconnected from the actual SQL being run and may not be the most optimal approach to getting the data. All of these things you have to contend with.
In this particular case it seems like an index on SenderId+TransferStatus+TransferType and another on ReceiverId+TransferStatus+TransferType could help the two queries shown (see the sketch below), but I wouldn't go as far as to say "definitely do that" without taking a holistic view of everything this table contains, how many different values there are in those columns, and what it's used for in the context of the app. If Sender/Receiver are unique, there may be no point in adding more columns to the index as keys. If TransferStatus and TransferType vary such that some combination of them helps uniquely identify a particular row out of hundreds, then it may make sense - but then, if this query only runs once a day compared to another that runs 10 times a second... There are too many variables and unknowns to provide a concrete answer to the question as presented; don't optimize prematurely - indexing columns just because they're used in some where clause somewhere would be premature.
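For reference, the two composite indexes mentioned above would look something like this in EF Core fluent configuration - a sketch of the idea, not a recommendation to apply blindly:
modelBuilder.Entity<Transfer>()
    .HasIndex(p => new { p.SenderId, p.TransferStatus, p.TransferType });

modelBuilder.Entity<Transfer>()
    .HasIndex(p => new { p.ReceiverId, p.TransferStatus, p.TransferType });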
This question is very similar to this one (I also need to implement pagination with SQL Server 2008 and Entity Framework):
Offset/Fetch based paging (Implementation) in EntityFramework (Using LINQ) for SQL Server 2008
However, the problem is that my DB has more than 10 billion rows, so Skip basically does not work; I need Skip/Take methods that accept a "long" parameter. Is there any possible solution with LINQ and EF? Thanks.
Well, if it is necessary at all - as in the case mentioned in the comments, where you have to skip billions of rows only to take 20 - you want to read as little data as possible.
You could use a stored procedure that will get the amount to skip and take.
Another approach would be to execute plain SQL via your dbContext's Database property.
If you want to use LINQ and you have incrementing IDs, you could give the following a try - but it is a bit more expensive, and it still will not let you retrieve a long-sized amount of data in one Take:
On the first execution you have to determine the ID at position int.MaxValue - 1. After that point you can re-use the logic for the rest of the paging (maybe depending on a flag).
Use the ID in the where clause (as the lower bound) and add a Take with the needed amount. From the result's last record you save the ID for the next paging call, where you use it instead of the first determined ID.
A possible (untested) approach could look like the following; it will not cover all your cases without modifications:
private long _lastId;
private bool _isFirstPage = true;   // the "flag" mentioned above

public IEnumerable<Students> GetPage(int toTake)
{
    // Keyset ("seek") paging: remember the last Id that was returned and
    // continue after it, so no Skip with a huge offset is needed.
    List<Students> result = _isFirstPage
        ? _context.Students
            .OrderBy(s => s.Id)
            .Take(toTake)
            .ToList()
        : _context.Students
            .Where(s => s.Id > _lastId)
            .OrderBy(s => s.Id)
            .Take(toTake)
            .ToList();

    _isFirstPage = false;
    _lastId = result.LastOrDefault()?.Id ?? 0;
    return result;
}
I'm trying to refactor some code that has gotten really slow recently, and I came across a code block that is taking 5+ seconds to execute.
The code consists of 2 statements:
IEnumerable<int> StudentIds = _entities.Filters
.Where(x => x.TeacherId == Profile.TeacherId.Value && x.StudentId != null)
.Select(x => x.StudentId)
.Distinct<int>();
and
_entities.StudentClassrooms
.Include("ClassroomTerm.Classroom.School.District")
.Include("ClassroomTerm.Teacher.Profile")
.Include("Student")
.Where(x => StudentIds.Contains(x.StudentId)
&& x.ClassroomTerm.IsActive
&& x.ClassroomTerm.Classroom.IsActive
&& x.ClassroomTerm.Classroom.School.IsActive
&& x.ClassroomTerm.Classroom.School.District.IsActive).AsQueryable<StudentClassroom>();
So it's a bit messy, but first I get a distinct list of IDs from one table (Filters), then I query another table using it.
These are relatively small tables, but it's still 5+ seconds of query time.
I put this in LINQPad and it showed that it was doing the bottom query first then running 1000 "distinct" queries afterwards.
On a whim I changed the "StudentIds" code by just adding .ToArray() at the end. This improved the speed 1000x ... it now takes like 100ms to complete the same query.
What's the deal? What am I doing wrong?
This is one of the pitfalls of deferred execution in LINQ: in your first approach, StudentIds is really an IQueryable, not an in-memory collection. That means using it in the second query will run the first query against the database again - each and every time.
Forcing execution of the first query with ToArray() makes StudentIds an in-memory collection, and the Contains part in your second query will run over this fixed sequence of items - this gets mapped to something equivalent to a SQL WHERE StudentId IN (1, 2, 3, 4) query.
This query will, of course, be much, much faster, since you determined this sequence once up front rather than every time the Where clause is executed. Your second query without ToArray() would (I think) be mapped to a SQL query with a WHERE EXISTS (...) sub-query that gets evaluated for each row.
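A sketch of the ToArray() version described above, keeping the names from the question:
// Materialize the IDs once; Contains over the in-memory array then translates
// to WHERE StudentId IN (...).
var StudentIds = _entities.Filters
    .Where(x => x.TeacherId == Profile.TeacherId.Value && x.StudentId != null)
    .Select(x => x.StudentId)
    .Distinct()
    .ToArray();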
ToArray() materializes the initial query into memory on your application server.
My guess would be that the query provider is not able to parse the expression StudentIds.Contains(x.StudentId). Hence it probably thinks that StudentIds is an array already loaded into memory, so it's probably querying the database over and over again during the parsing phase. The only way to know for sure is to set up the profiler.
If you need to do this on the DB server, use a join instead of Contains (a sketch follows below). If you need Contains to do what looks like a join problem, you are likely missing a surrogate primary key or a foreign key somewhere.
You could also declare StudentIds as IQueryable instead of IEnumerable. This might give the query provider the hint it needs to interpret StudentIds as an expression, i.e. data not already loaded into memory. I somewhat doubt it, but it's worth a try.
If all else fails, use ToArray(). This will load the initial StudentIds into memory.
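A rough sketch of the join approach from the first option, reusing the tables and filters from the question (Includes omitted; it assumes Filters.StudentId is nullable, hence the .Value):
// Let SQL Server correlate Filters and StudentClassrooms itself instead of
// shipping an ID list from the application.
var studentClassrooms = _entities.StudentClassrooms
    .Where(x => x.ClassroomTerm.IsActive
             && x.ClassroomTerm.Classroom.IsActive
             && x.ClassroomTerm.Classroom.School.IsActive
             && x.ClassroomTerm.Classroom.School.District.IsActive)
    .Join(_entities.Filters.Where(f => f.TeacherId == Profile.TeacherId.Value
                                    && f.StudentId != null),
          sc => sc.StudentId,
          f => f.StudentId.Value,
          (sc, f) => sc)
    .Distinct();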
I need to have the parent and parent.child.Count() in the query. When I do this it takes 20 seconds, and it's not a huge database. Any ideas for optimization?
var plist = context.persons
.Select(p => new
{
p.fullName,
p.personID,
p.Status,
p.Birthdate,
p.Accounts.Count
}).ToList();
Here is a great article on using Count() when you really meant to use Any():
http://blogs.teamb.com/craigstuntz/2010/04/21/38598/
Do you need to use .Count(), or could you use .Any()?
http://msdn.microsoft.com/en-us/library/bb534972.aspx
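A sketch of what that suggestion would mean for the projection in the question, assuming you only need to know whether accounts exist rather than how many there are:
var plist = context.persons
    .Select(p => new
    {
        p.fullName,
        p.personID,
        p.Status,
        p.Birthdate,
        HasAccounts = p.Accounts.Any()   // translates to EXISTS(...) instead of COUNT(*)
    }).ToList();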
Since this is Entity Framework, open up the SQL profiler and take a look at what SQL queries are being sent to the database. It sounds like you may see that a single query is sent to fetch the group identifiers, and then another set of queries (one for each group) fetches the counts. If that's happening, you'll have to post the LINQ query for someone to resolve the issue.
Based on the code you sent, it doesn't look like things should be taking that long. I have a few suggestions:
Use LinqPad to do this query. It will let you see the SQL that gets generated. Then run that SQL code in SQL Server Management Studio, and tell it to include the actual execution plan. This will help you learn whether there's a particular point in the query that's taking a lot of time. For example, if you don't have an index on the Account table's PersonId reference, this query will take a lot longer.
Look at how you're using this data. It's very rare that you really need to have all the people in your entire system in memory at the same time. In fact, I suspect that simply getting all this person data out of the database is probably taking a lot more time than the Count() is.
Are you displaying this data? If so, wouldn't it be better to "page" the results, only showing maybe ten entries at a time? You can use the .Take(int) method before calling .ToList() to get only as many entries as you need (a sketch follows below).
If you're processing and aggregating this data for the sake of site metrics, it's probably better to set up your query to return the end result before it gets evaluated.
If you can describe how this data is being used, or provide a screenshot of the SQL's execution, we can provide more feedback.
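A sketch that combines those suggestions - project only what you need and page on the server - using the entities from the question; pageIndex and pageSize are assumed here:
// Only one page of lightweight rows crosses the wire; the count stays in SQL.
var page = context.persons
    .OrderBy(p => p.personID)
    .Select(p => new
    {
        p.fullName,
        p.personID,
        AccountCount = p.Accounts.Count
    })
    .Skip(pageIndex * pageSize)
    .Take(pageSize)
    .ToList();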
I solved a similar problem using the GroupBy method.
IEnumerable<IGrouping<int, Account>> accounts = Accounts.GroupBy(x => x.personID);
For each group in accounts, Count() returns the number of accounts that belong to that person, and Key returns the personID of the group.
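Applied to the question, that idea would look roughly like this (a sketch; it assumes an Accounts set is exposed on the context):
// One grouped query: each result row carries the personID and its account count.
var accountCounts = context.Accounts
    .GroupBy(a => a.personID)
    .Select(g => new { personID = g.Key, AccountCount = g.Count() })
    .ToList();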
I had a somewhat similar problem. I tried the following, and it worked out better:
child.Count(x => x.parentID == inputParentID)
child.Where(x => x.parentID == inputParentID)
My original code, which took around 15-20 seconds on each iteration, was:
return (isEdit) ? db.ChasisBuys.Single(x => x.ChasisBuyID == long.Parse(Request.QueryString["chbid"])).Chasises.Count(y => y.Bikes.Count > 0 && y.ColorID == buyItems[(int)index].ColorID && y.ChasisTypeID == buyItems[(int)index].ChasisTypeID).ToString() : "-";
The new code, which runs well, is:
return (isEdit) ? db.Chasises.Where(x => x.ChasisBuyID == long.Parse(Request.QueryString["chbid"])).Count(y => y.Bikes.Count > 0 && y.ColorID == buyItems[(int)index].ColorID && y.ChasisTypeID == buyItems[(int)index].ChasisTypeID).ToString() : "-";
The database has around 1000 records in Chasises, about 5 in ChasisBuys and about 20 in Bikes.
My opinion is that LINQ to SQL queries do not do the kind of short-circuit evaluation you get in logical statements - for instance, if you write "return a && b && c;" and a is false, the other expressions are not evaluated. I was expecting the same behaviour from LINQ to SQL, but that is not the case.
I have a linq query that is causing some timeout issues. Basically, I have a query that is returning the top 100 results from a table that has approximately 500,000 records.
Here is the query:
using (var dc = CreateContext())
{
var accounts = string.IsNullOrEmpty(searchText)
? dc.Genealogy_Accounts
.Where(a => a.Genealogy_AccountClass.Searchable)
.OrderByDescending(a => a.ID)
.Take(100)
: dc.Genealogy_Accounts
.Where(a => (a.Code.StartsWith(searchText)
|| a.Name.StartsWith(searchText))
&& a.Genealogy_AccountClass.Searchable)
.OrderBy(a => a.Code)
.Take(100);
return accounts.Select(a =>
}
}
Oddly enough, it is the first LINQ query that is causing the timeout. I thought that by doing a Take we wouldn't need to scan all 500k records; however, that must be what is happening. I'm guessing that the join to find what is 'searchable' is causing the issue. I'm not able to denormalize the tables, so I'm wondering if there is a way to rewrite the LINQ query to get it to return quicker, or if I should just write this query as a stored procedure (and if so, what it might look like). Thanks.
Well to start with, I'd find out what query is being generated (in LINQ to SQL you'd set the Log on the data context) and then profile it in SQL Server Management Studio. Play with it there until you've found something that is fast enough (either by changing the query or adding indexes) and if you've had to change the query, work out how to represent that in LINQ.
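For reference, capturing the generated SQL in LINQ to SQL is a one-liner on the data context (a sketch; any TextWriter will do):
using (var dc = CreateContext())
{
    dc.Log = Console.Out;   // DataContext.Log writes the generated SQL to this TextWriter
    // ... run the query here, then paste the SQL into SSMS to inspect the plan
}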
I suspect the problem is that you're combining OrderBy and Take - which means it potentially needs to find all the results in order to work out what the top 100 would look like. Is Code indexed? If not, try indexing it - it may help by allowing the server to consider records in the order in which they'd be returned, so it can stop after it's found 100 records. You should look at indexes for the other columns too.
The Take(100) translates to "Select Top 100" etc. This would help if your problem was an otherwise huge result set, where there are a lot of columns returned. I bet though that your problem is a table scan resulting from the query. In this case, .Take(100) might not help much at all.
So the likely culprit is the same as if you were doing SQL using ADO.NET: how are your indexes? Are the fields being searched fields for which you don't have good indexes? That would cause a drastic decrease in performance compared to queries that can use good indexes. Add an index that includes Code and Name and see what happens. Not having an index on Code is guaranteed to hose you, because of the OrderBy. Also, what field links Genealogy_Accounts and Genealogy_AccountClass? A lack of an index on either table could hose things. (I would guess an index including Searchable is unlikely to help.)
Use SQL Profiler to see the actual query being run (though you can do this in VS too), and to see how bad it really is on the server.
The problem might be LINQ doing something stupid generating the query, but this is probably not the case. We're finding LINQ-to-SQL often makes better queries than we do. Even if it looks goofy, it's usually very efficient. You can put the SQL in Query Analyzer and check out the query plan. Then rewrite the SQL to be more human-simple and see if it improves things - I bet it won't; I think you'll still see a table scan, indicating something is wrong with your index.