Understanding PagedList MVC and Entity Framework (Slow Queries) - C#

I have a table in a MySQL database with a few thousand records. On my webpage, I display 10 records at a time using PagedList. I generally query the data like this:
using (var ctx = new mydbEntities())
{
    var customers = ctx.customers.OrderBy(o => o.ID).ToPagedList(1, 10);
}
If I were to look at the SQL that this generates, am I correct in saying that ToPagedList will only select 10 rows from the database, rather than returning everything and then taking ten from the result?
On some occasions, I use raw SQL because the query is quite complex and is built up as a string depending on certain conditions. A simplified example:
using (var ctx = new mydbEntities())
{
    List<MySqlParameter> parameters = new List<MySqlParameter>();
    string strQry = "select * from customer;";
    var customers = ctx.Database.SqlQuery<customer>(strQry, parameters.ToArray()).OrderByDescending(o => o.ID).ToPagedList(1, 10);
}
I guess this will return all records before applying paging?
Is there an efficient way to apply paging using PagedList to the latter example?
Thanks guys.

I think that, to better answer your question, it would help to look
at the generated SQL for both queries.
From what I have seen in the PagedList repositories:
dncuug/X.PagedList
troygoode/PagedList
it will make 2 calls to the database:
one to take the count of the provided query
one to run the actual query with the correct Skip and Take.
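Based on that source, the two calls for the first example can be sketched like this (illustrative only; ToPagedList does all of this for you internally):

```csharp
int pageNumber = 1, pageSize = 10;

using (var ctx = new mydbEntities())
{
    var query = ctx.customers.OrderBy(o => o.ID);

    // Call 1: translated to SELECT COUNT(*) ...
    int totalCount = query.Count();

    // Call 2: translated to a paged SELECT (LIMIT/OFFSET on MySQL),
    // so only pageSize rows cross the wire.
    var pageItems = query.Skip((pageNumber - 1) * pageSize)
                         .Take(pageSize)
                         .ToList();
}
```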
Now, for your second query: if you want better performance, it may be more effective to write the paging into the raw SQL itself, as pointed out by @Ivan Stoev.
Either way, both approaches will hit the database twice when you use the PagedList library.
Be aware, though, that when using .Database.SqlQuery<customer> the results you get are not cached inside Entity Framework and will not be change-tracked, even if they are valid entity objects.
For more info about that check below:
Database.SqlQuery Method
DbRawSqlQuery Class
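As a side note, if you do want tracked entities from raw SQL, EF 6 also exposes SqlQuery on the DbSet itself; unlike Database.SqlQuery<T>, those results are tracked by the context (a short sketch using the question's names):

```csharp
using (var ctx = new mydbEntities())
{
    // DbSet<T>.SqlQuery: results ARE change-tracked by the context.
    var tracked = ctx.customers.SqlQuery("select * from customer").ToList();
}
```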

var customers = ctx.customers.OrderBy(o => o.ID).ToPagedList(1,10);
With this query, LINQ to Entities translates it into something like select * from customers order by ID OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY (or LIMIT 10 OFFSET 0 on MySQL), which is what you want.
var customers = ctx.Database.SqlQuery<customer>(strQry, parameters.ToArray()).OrderByDescending(o => o.ID).ToPagedList(1,10);
With this query, the strQry query is executed by the database server in full and all rows are retrieved. The desired OrderBy and pagination are then performed in memory.
There are two options here:
Use LINQ to Entities queries and let ToPagedList operate on the
IQueryable form
Or modify the strQry query to perform the ORDER BY and pagination itself, with something like
select * from table order by cl1 OFFSET x ROWS FETCH NEXT y ROWS ONLY
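For the raw-SQL case in the question, one way to keep the paging in the database and still feed PagedList is to run your own count query plus a LIMIT/OFFSET query, then wrap the page in StaticPagedList. This is a sketch, not tested: it assumes MySQL syntax and that the connector accepts @p0-style positional parameters through Database.SqlQuery:

```csharp
using (var ctx = new mydbEntities())
{
    int pageNumber = 1, pageSize = 10;

    // Share the WHERE clause you built up between both queries.
    int totalCount = ctx.Database
        .SqlQuery<int>("select count(*) from customer;")
        .Single();

    // MySQL paging: LIMIT <page size> OFFSET <rows to skip>.
    var pageItems = ctx.Database
        .SqlQuery<customer>(
            "select * from customer order by ID desc limit @p0 offset @p1;",
            pageSize, (pageNumber - 1) * pageSize)
        .ToList();

    // StaticPagedList wraps an already-paged list plus the known total,
    // so the pager UI still renders correctly.
    var customers = new StaticPagedList<customer>(
        pageItems, pageNumber, pageSize, totalCount);
}
```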

Related

Filter IQueryable by values from another database using EF

I have two databases. I need to fetch some data (property values) from a first database and then filter out the records from a second database by those fetched values.
Now I am afraid that this operation may fail if the first database query will return too many records.
I captured the generated SQL query and checked that it does not contain the filtering by values from the first database. So it seems LINQ to Objects (instead of LINQ to SQL) is applied when there are too many records. So I decided to use a HashSet instead of a List to filter the values.
I had the following code:
IQueryable<Obj> query = ...
List<int> listOfSomeValues = ...
query = query.Where(obj => listOfSomeValues.Contains(obj.Prop));
Since lookup in a set should be faster than in a list, I changed my code to:
IQueryable<Obj> query = ...
List<int> listOfSomeValues = ...
var set = new HashSet<int>(listOfSomeValues);
query = query.Where(obj => set.Contains(obj.Prop));
Is there anything else I could do to improve the IQueryable filtering by values from another database using EF? Or is my only option to resort to writing stored procedures, making the database do all the hard work?
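One workaround worth considering (not from the question, just a sketch): if the provider refuses to translate Contains because the value list is too large, you can run the filter in batches so each generated IN (...) clause stays small enough to translate. The batch size and names below are illustrative:

```csharp
// Hypothetical sketch: assumes Contains over a small List<int>
// does translate to SQL for your provider.
const int batchSize = 1000;
var results = new List<Obj>();

for (int i = 0; i < listOfSomeValues.Count; i += batchSize)
{
    var batch = listOfSomeValues.Skip(i).Take(batchSize).ToList();

    // Each iteration translates to ... WHERE Prop IN (@p0, @p1, ...)
    results.AddRange(query.Where(obj => batch.Contains(obj.Prop)).ToList());
}
```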

Efficiently paging large data sets with LINQ

When looking into the best ways to implement paging in C# (using LINQ), most suggestions are something along these lines:
// Execute the query
var query = db.Entity.Where(e => e.Something == something);
// Get the total num records
var total = query.Count();
// Page the results
var paged = query.Skip((pageNum - 1) * pageSize).Take(pageSize);
This seems to be the commonly suggested strategy (simplified).
For me, my main purpose in paging is for efficiency. If my table contains 1.2 million records where Something == something, I don't want to retrieve all of them at the same time. Instead, I want to page the data, grabbing as few records as possible. But with this method, it seems that this is a moot point.
If I understand it correctly, the first statement still retrieves all 1.2 million records, which are then paged as necessary.
Does paging in this way actually improve performance? If the 1.2 million records are going to be retrieved every time, what's the point (besides the obvious UI benefits)?
Am I misunderstanding this? Any .NET gurus out there that can give me a lesson on LINQ, paging, and performance (when dealing with large data sets)?
The first statement does not execute the actual SQL query, it only builds part of the query you intend to run.
It is when you call query.Count() that the first query will be executed:
SELECT COUNT(*) FROM Table WHERE Something = something
query.Skip().Take() won't execute the query either; it is only when you enumerate the results (doing a foreach over paged or calling .ToList() on it) that the appropriate SQL statement is executed, retrieving only the rows for the page (using ROW_NUMBER).
If you watch this in SQL Profiler, you will see that exactly two queries are executed, and at no point does it try to retrieve the full table.
Be careful when using the debugger, because if you step past the first statement and try to look at the contents of query, that will execute the SQL query. Maybe that is the source of your misunderstanding.
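Putting the whole pattern in one place (a sketch; pageNum, pageSize, and the Id ordering column are illustrative, and note that EF needs an explicit OrderBy before Skip/Take can be translated):

```csharp
var query = db.Entity.Where(e => e.Something == something);

// Executes now: SELECT COUNT(*) FROM Entity WHERE ...
var total = query.Count();

// No SQL yet - this only extends the expression tree.
var paged = query.OrderBy(e => e.Id)
                 .Skip((pageNum - 1) * pageSize)
                 .Take(pageSize);

// SQL executes here, retrieving only pageSize rows.
var results = paged.ToList();
```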
// Execute the query
var query = db.Entity.Where(e => e.Something == something);
For your information, no database call is made by the first statement.
// Get the total num records
var total = query.Count();
This Count query will be translated to SQL and will make one call to the database.
The call does not fetch all records, because the generated SQL is something like this:
SELECT COUNT(*) FROM Entity where Something LIKE 'something'
The last query doesn't fetch all the records either. It is translated into SQL, and the paging runs in the database.
Maybe you'll find this question useful: efficient way to implement paging
I believe Entity Framework might structure the SQL query with the appropriate conditions based on the LINQ statements (e.g. using ROW_NUMBER() OVER ...).
I could be wrong on that, however. I'd run SQL profiler and see what the generated query looks like.

LINQ Select Statement making many SQL Calls

I am trying to run a Select on an IQueryable linked to a database query. It's working correctly, but for each property being selected, it runs a separate query.
My code looks something like this:
IQueryable<MyDataSource> data = [Some Complicated Query I've been Building Up];
var results = data.Select(d => new
{
    A = d.A,
    B = d.B,
    C = d.C
}).Take(100).ToArray();
Now, this is taking ages, even though the actual query isn't taking that long.
When I ran a SQL profiler on it, I found that it runs a separate SQL select for each property I'm selecting, for each entity being returned (so in the above example, around 300 different queries, plus the initial query that performs the filtering).
I'm quite sure I'm doing something wrong here; what is it? I'm expecting it to run a single large query which selects the right columns from the data source (you know, SELECT TOP 100 d.A, d.B, d.C FROM [bla bla]), not all this mess.
You can influence the lazy/eager loading using DataLoadOptions.
This will force eager loading of B:
DataLoadOptions options = new DataLoadOptions();
options.LoadWith<A>(a => a.B);
dc.LoadOptions = options;
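Note that DataLoadOptions is a LINQ to SQL feature. If the context here is actually Entity Framework, the rough equivalent (a sketch, assuming a B navigation property and EF 4.1+) is eager loading with Include, so enumeration doesn't trigger one query per row:

```csharp
using System.Data.Entity; // lambda overload of Include

var results = context.MyDataSources
    .Include(d => d.B)
    .Take(100)
    .ToList();
```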

Select rows from one DataTable not in another

I'm trying to get a list of rows in DataTableA where the value in Column1 is not in Column1 of DataTableB.
I'm using the following LINQ query:
//Not in Database
var query = from i in dtImport.AsEnumerable()
            where !dtProducts.AsEnumerable().Any(p => p[colP] == i[colI])
            select i;
Such that I want a list of products in the import table that aren't already in the products table.
The debugger seems to skip past this line quickly, but then when I call anything relating to the query, such as int rows = query.Count<DataRow>(); or DataTable dtResult = query.CopyToDataTable();, it takes so long that I just stop the program.
So, What am I doing wrong?
LINQ uses deferred execution: the query is executed when it is used, not when it is declared.
For better performance, you can use a HashSet like the following:
var set = new HashSet<int>(dtProducts.AsEnumerable().Select(p => p.Field<int>(colP)));
var result = dtImport.AsEnumerable().Where(i => !set.Contains(i.Field<int>(colI))).ToList();
The slowdown is expected: the query is not evaluated until you enumerate the results, so the debugger skips this line pretty quickly. All it does is prepare to query the data source; the querying happens when the results are enumerated.
As far as I can tell without profiling your code, the issue is probably related to a big out-of-db select that happens when you convert dtProducts and dtImport to IEnumerable: essentially, you bring the data from both tables into memory before doing your select. If your tables are of considerable size, this is probably where most of the time goes. But again, the only sure way to tell is profiling.
Your query is slow, because it has to enumerate the products for each record in dtImport. Put the products into a dictionary first, to speed up your query.
var prod = dtProducts.AsEnumerable().ToDictionary(p => p[colP]);
var query = from imp in dtImport.AsEnumerable()
            where !prod.ContainsKey(imp[colI])
            select imp;

.Skip().Take() on Entity Framework Navigation Properties is executing SELECT * on my SQL Server

I have a method on my generated partial class like this:
var pChildren = this.Children
.Skip(skipRelated)
.Take(takeRelated)
.ToList();
When I look at my SQL Server, I can see that the generated code is doing a SELECT *.* FROM Children. This code is taken directly from my class; I have verified that my Skip/Take come BEFORE my .ToList.
If I remove the .ToList, that line is fast (and no SQL is sent to my DB), but the moment I try to foreach over the results, I get the same SQL sent to my DB: SELECT *.* FROM Children.
Is there something special I need to do when using .Skip and .Take on the navigation properties of my entities?
update
I'll try to get the actual SQL generated; I'm not currently set up for that. I found the first one because it shows up in SSMS's "recently expensive queries" list.
Running this:
var pChildren = this.Children
//.Skip(skipRelated)
//.Take(takeRelated)
.ToList();
returns ~4,000,000 rows and takes ~25 seconds.
Running this:
var pChildren = this.Children
//.Skip(skipRelated)
.Take(takeRelated)
.ToList();
returns ~4,000,000 rows and takes ~25 seconds.
As I said, I'll grab the SQL generated for these and pose them up as well.
The problem is that when you query a child collection like that, you are performing a LINQ-to-Objects query. EF will load the whole collection and then run the query in memory.
If you are using EF 4 you can query like this
var pChildren = this.Children.CreateSourceQuery()
.OrderBy(/* */).Skip(skipRelated).Take(takeRelated);
In EF 4.1
var pChildren = context.Entry(this)
    .Collection(e => e.Children)
    .Query()
    .OrderBy(/* */).Skip(skipRelated).Take(takeRelated)
    .ToList();
Does it help if you call Skip on the result of Take? i.e.
table.Take(takeCount+skipCount).Skip(skipCount).ToList()
Also, see
TOP/LIMIT Support for LINQ?
