Select rows from one DataTable not in another - c#

I'm trying to get a list of rows in DataTableA where the value in Column1 is not in Column1 of DataTableB.
I'm using the following LINQ query:
//Not in Database
var query = from i in dtImport.AsEnumerable()
where !dtProducts.AsEnumerable().Any(p => p[colP] == i[colI])
select i;
The goal is a list of products in the import table that aren't already in the products table.
The debugger steps past that line quickly, but when I call anything that uses the query, such as int rows = query.Count<DataRow>(); or DataTable dtResult = query.CopyToDataTable();, it takes so long that I just stop the program.
So, what am I doing wrong?

LINQ uses deferred execution: the query is executed when it is used, not when it is declared.
For better performance you can use a HashSet, like the following (assuming the key columns hold int values):
var set = new HashSet<int>(dtProducts.AsEnumerable().Select(p => p.Field<int>(colP)));
var result = dtImport.AsEnumerable().Where(i => !set.Contains(i.Field<int>(colI))).ToList();

The slowdown is expected: the query is not evaluated until you enumerate the results, so the debugger steps past this line quickly. All that line does is prepare the query; the actual querying happens when the results are enumerated.
As far as I can tell without profiling your code, the issue is probably related to a big out-of-db select that happens when you convert dtProducts and dtImport to IEnumerable: essentially, you bring the data from both tables into memory before doing your select. If your tables are of considerable size, this is probably where most of the time goes. But again, the only sure way to tell is profiling.
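The deferred execution described above can be observed directly with LINQ to Objects. The following is a minimal sketch, not the asker's code: the class name DeferredExecutionDemo and the counter PredicateCalls are made up for illustration. It counts how often the Where predicate runs before and after the query is enumerated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Counts how many times the Where predicate has actually run.
    public static int PredicateCalls;

    public static IEnumerable<int> BuildQuery(IEnumerable<int> source)
    {
        // Only describes the filter; no element is inspected yet.
        return source.Where(n =>
        {
            PredicateCalls++;
            return n % 2 == 0;
        });
    }

    public static void Main()
    {
        PredicateCalls = 0;
        var query = BuildQuery(Enumerable.Range(1, 10));
        Console.WriteLine(PredicateCalls);   // 0

        int evens = query.Count();           // enumeration happens here
        Console.WriteLine(evens);            // 5
        Console.WriteLine(PredicateCalls);   // 10
    }
}
```

Declaring the query costs nothing; all of the work shows up at Count(), which matches what the debugger shows in the question.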

Your query is slow because it has to enumerate the products for each record in dtImport, which means roughly (rows in dtImport) × (rows in dtProducts) comparisons. Put the products into a dictionary first to speed up your query.
var prod = dtProducts.AsEnumerable().ToDictionary(p => p[colP]);
var query = from imp in dtImport.AsEnumerable()
where !prod.ContainsKey(imp[colI])
select imp;
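For reference, here is a self-contained sketch of the set-based approach suggested in the answers. It assumes that both key columns hold non-null int values; the names NotInExample, RowsNotIn and MakeTable, and the column names, are hypothetical and not from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class NotInExample
{
    // Returns the rows of 'import' whose key is absent from 'products'.
    // Assumes both key columns hold non-null int values.
    public static List<DataRow> RowsNotIn(DataTable import, string colI,
                                          DataTable products, string colP)
    {
        // One pass over products builds the lookup set.
        var known = new HashSet<int>(
            products.AsEnumerable().Select(p => p.Field<int>(colP)));

        // One pass over import; each Contains is a hash lookup.
        return import.AsEnumerable()
                     .Where(i => !known.Contains(i.Field<int>(colI)))
                     .ToList();
    }

    // Builds a one-column table of ints (helper for the demo only).
    public static DataTable MakeTable(string column, params int[] values)
    {
        var table = new DataTable();
        table.Columns.Add(column, typeof(int));
        foreach (var v in values) table.Rows.Add(v);
        return table;
    }

    public static void Main()
    {
        var dtProducts = MakeTable("ProductId", 1, 2, 3);
        var dtImport = MakeTable("ImportId", 2, 3, 4, 5);

        foreach (var row in RowsNotIn(dtImport, "ImportId", dtProducts, "ProductId"))
            Console.WriteLine(row.Field<int>("ImportId"));   // 4, then 5
    }
}
```

Typed access through Field<int> also sidesteps a separate pitfall: the DataRow indexer returns object, so an expression such as p[colP] == i[colI] compares references rather than values.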

Related

Understanding PagedList mvc and Entity Framework (Slow Queries)

I have a table in a MySQL database with a few thousand records. On my webpage, I display 10 records at a time using PagedList. I generally call the data like this.
using (var ctx = new mydbEntities())
{
var customers = ctx.customers.OrderBy(o => o.ID).ToPagedList(1,10);
}
If I was to look at the SQL that this generates, am I correct in saying that ToPagedList will only select 10 rows from the Database rather than return everything before taking ten from the result?
On some occasions, I use raw SQL as the query is quite complex and it is built up as a string depending on certain conditions. A simplified example.
using (var ctx = new mydbEntities())
{
List<MySqlParameter> parameters = new List<MySqlParameter>();
string strQry = "select * from visitor;";
var customers = ctx.Database.SqlQuery<customer>(strQry, parameters.ToArray()).OrderByDescending(o => o.ID).ToPagedList(1,10);
}
I guess this will return all records before applying paging?
Is there an efficient way to apply paging using PagedList to the latter example?
Thanks guys.
To better help with your question, it would be useful to look at the generated SQL for both queries.
From what I have seen in the PagedList repositories:
dncuug/X.PagedList
troygoode.PagedList
It will make two calls to the database:
first, a count of the query provided;
then, the actual query with the correct Take and Skip.
For your second query, if you want better performance, it may be more effective to do the paging inside the raw SQL query itself, as pointed out by @Ivan Stoev.
Either way, both queries will be executed twice when you are using the PagedList library.
Be aware, though, that the results you get from .Database.SqlQuery<customer> are not cached inside Entity Framework and will not be tracked by the context, even if they are valid entity objects.
For more info about that check below:
Database.SqlQuery Method
DbRawSqlQuery Class
var customers = ctx.customers.OrderBy(o => o.ID).ToPagedList(1,10);
With this query, LINQ to Entities translates the query into something like select * from customers order by ID OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY (page 1 skips no rows), which is what you want.
var customers = ctx.Database.SqlQuery<customer>(strQry, parameters.ToArray()).OrderByDescending(o => o.ID).ToPagedList(1,10);
With this query, strQry is executed by the database as written and all of its rows are retrieved, so the OrderByDescending and the pagination are performed in memory.
There are two options here:
use LINQ to Entities queries and call ToPagedList on the IQueryable form,
or modify strQry to perform the ordering and pagination itself, using something like
select * from table order by cl1 OFFSET x ROWS FETCH NEXT y ROWS ONLY
(that is SQL Server syntax; MySQL uses LIMIT y OFFSET x instead).

Which is faster: LINQ to SQL or a SQL query

I have a list of objects like this:
List<Product> _products;
Then I get a productId as input and search the list like this:
var target = _products.Where(o => o.productid == input).FirstOrDefault();
My questions are:
If this list has 100 products (productId from 1 to 100) and the input is productId = 100, does that mean this method must loop 100 times (if I ORDER BY productId ASC in the query)?
Which is faster: this method, or a query on the database with a where clause like WHERE productId = @param?
Thank you.
No. If the database has an index with key productId, it finds the correct row in O(log n) operations.
Just implement both methods and measure the time (hint: use the Stopwatch class).
Edit
To get the full performance you should not create an intermediate (unsorted) List<T> but put all your logic in a LINQ query which operates on the SQL Server.
This might be helpful to get your answer:
https://www.linqpad.net/WhyLINQBeatsSQL.aspx
If you execute that Where on a List<Product>, then:
you got all 100 rows from the database
and then looped through all products in memory until you found the one that matches or until you went through the entire list and found nothing.
If, on the other hand, you used an IQueryable<Product> that was connected to the database table, then:
You wouldn't have read anything from the database yet
When you apply the Where, you still wouldn't read anything
When you apply the FirstOrDefault a sql query is constructed to find just the one row you need. Given correct indexes on the table, this would be quite fast.
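The in-memory half of this can be checked with a small experiment. The sketch below is illustrative only: the Product class is reduced to a single hypothetical ProductId property, and ComparisonsToFind is a made-up helper that counts how many elements FirstOrDefault inspects before it stops.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Product
{
    public int ProductId { get; set; }
}

public static class LinearSearchDemo
{
    // Returns how many elements FirstOrDefault inspects before it stops.
    public static int ComparisonsToFind(List<Product> products, int input)
    {
        int comparisons = 0;
        products.FirstOrDefault(p =>
        {
            comparisons++;
            return p.ProductId == input;
        });
        return comparisons;
    }

    public static void Main()
    {
        var products = Enumerable.Range(1, 100)
                                 .Select(id => new Product { ProductId = id })
                                 .ToList();

        Console.WriteLine(ComparisonsToFind(products, 1));    // 1
        Console.WriteLine(ComparisonsToFind(products, 100));  // 100

        // A dictionary keyed by id plays the role of an index in memory.
        var byId = products.ToDictionary(p => p.ProductId);
        Console.WriteLine(byId.ContainsKey(100));             // True
    }
}
```

So on a List<Product> the scan stops at the first match but is linear in the worst case. If the list has to stay in memory, a dictionary keyed by the id avoids the scan, much as an index does on the database side.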

Efficiently paging large data sets with LINQ

When looking into the best ways to implement paging in C# (using LINQ), most suggestions are something along these lines:
// Execute the query
var query = db.Entity.Where(e => e.Something == something);
// Get the total num records
var total = query.Count();
// Page the results
var paged = query.Skip((pageNum - 1) * pageSize).Take(pageSize);
This seems to be the commonly suggested strategy (simplified).
For me, my main purpose in paging is for efficiency. If my table contains 1.2 million records where Something == something, I don't want to retrieve all of them at the same time. Instead, I want to page the data, grabbing as few records as possible. But with this method, it seems that this is a moot point.
If I understand it correctly, the first statement is still retrieving the 1.2 million records, then it is being paged as necessary.
Does paging in this way actually improve performance? If the 1.2 million records are going to be retrieved every time, what's the point (besides the obvious UI benefits)?
Am I misunderstanding this? Any .NET gurus out there that can give me a lesson on LINQ, paging, and performance (when dealing with large data sets)?
The first statement does not execute the actual SQL query; it only builds part of the query you intend to run.
It is when you call query.Count() that the first SQL statement is executed:
SELECT COUNT(*) FROM Table WHERE Something = something
Calling query.Skip().Take() won't execute the query either. It is only when you enumerate the results (with a foreach over paged, or by calling .ToList() on it) that it executes the appropriate SQL statement, retrieving only the rows for the page (using ROW_NUMBER).
If you watch this in SQL Profiler, you will see that exactly two queries are executed and that at no point does it try to retrieve the full table.
Be careful when you are using the debugger: if you step past the first statement and inspect the contents of query, that will execute the SQL query. Maybe that is the source of your misunderstanding.
// Execute the query
var query = db.Entity.Where(e => e.Something == something);
Nothing is sent to the database when the first statement runs.
// Get the total num records
var total = query.Count();
This count query will be translated to SQL, and it will make a call to the database.
This call will not get all records, because the generated SQL is something like this:
SELECT COUNT(*) FROM Entity WHERE Something = @something
The last query doesn't get all the records either. It will be translated into SQL, and the paging will run in the database.
Maybe you'll find this question useful: efficient way to implement paging
I believe Entity Framework might structure the SQL query with the appropriate conditions based on the LINQ statements (e.g. using ROW_NUMBER() OVER ...).
I could be wrong on that, however. I'd run SQL profiler and see what the generated query looks like.
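The count-then-page pattern from the question can be wrapped in a small helper. This is a minimal sketch: GetPage is a hypothetical name, and the in-memory AsQueryable() source is only a stand-in for a database table. Against a real provider the same two calls would become two SQL statements, as the answers describe.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PagingDemo
{
    // Returns the total row count and the rows of one 1-based page.
    public static (int Total, List<T> Items) GetPage<T>(
        IQueryable<T> query, int pageNum, int pageSize)
    {
        int total = query.Count();                    // first execution
        List<T> items = query
            .Skip((pageNum - 1) * pageSize)
            .Take(pageSize)
            .ToList();                                // second execution
        return (total, items);
    }

    public static void Main()
    {
        // In-memory stand-in for a database table (assumption for the demo).
        IQueryable<int> table = Enumerable.Range(1, 95).AsQueryable();

        var (total, items) = GetPage(table.Where(n => n > 5).OrderBy(n => n), 2, 10);
        Console.WriteLine(total);                     // 90
        Console.WriteLine(string.Join(",", items));   // 16,17,18,19,20,21,22,23,24,25
    }
}
```

Keeping the parameter typed as IQueryable<T> rather than IEnumerable<T> matters here: it is what lets a database provider translate Count, Skip and Take into SQL instead of running them in memory.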

LINQ Select Statement making many SQL Calls

I am trying to run a Select on an IQueryable linked to a database query. It's working correctly, but it's running a separate query for each of the properties being selected.
My code looks something like this:
IQueryable<MyDataSource> data = [Some Complicated Query I've been Building Up];
var results = data.Select(d => new
{
A = d.A,
B = d.B,
C = d.C
}).Take(100).ToArray();
Now, this is taking ages, even though the actual query isn't taking that long.
When I ran a SQL profiler on it, I found that it's running a separate SQL select for each property I'm selecting, for each entity I'm returning (so in the above example around 300 different queries, as well as the initial query that performs the filtering).
I'm quite sure I'm doing something wrong here; what is it? I'm expecting it to run a single large query that selects the right columns from the data source (something like Select top 100 d.A, d.B, d.C from [bla bla]), not all this mess.
You can influence lazy versus eager loading using DataLoadOptions.
This will force eager loading for B (set the options on the data context before running the query):
DataLoadOptions options = new DataLoadOptions();
options.LoadWith<MyDataSource>(d => d.B);
dc.LoadOptions = options;

Are different IQueryable objects combined?

I have a little program that needs to do some calculations on a data range. The range may contain about half a million records. I just looked at my DB and saw that a group by was executed.
I thought that the query was executed on the first line, and that afterwards I was just working with data in RAM. But now I think that the query builder combines the expressions.
var Test = db.Test.Where(x => x > DateTime.Now.AddDays(-7));
var Test2 = (from p in Test
group p by p.CustomerId into g
select new { UniqueCount = g.Count() } );
In my real-world app I have more subqueries that are based on the range selected by the first query. I think I just added a big overhead by letting the DB run several different selects.
For now I basically just call .ToList() after the first expression.
So my question is: am I right that the query builder combines different IQueryables when it builds the expression tree?
Yes, you are correct. LINQ queries are evaluated lazily, at the moment you enumerate them (via .ToList(), for example). At that point, Entity Framework will look at the whole query and build a SQL statement to represent it.
In this particular case, it's probably wiser to not evaluate the first query, because the SQL database is optimized for performing set-based operations like grouping and counting. Rather than forcing the database to send all the Test objects across the wire, deserializing the results into in-memory objects, and then performing the grouping and counting locally, you will likely see better performance by having the SQL database just return the resulting Counts.
