Which is faster: LINQ to SQL or a SQL query? - C#

I have a list of objects like this
List<Product> _products;
Then I get a productId as input and search the list like this
var target = _products.Where(o => o.productId == input).FirstOrDefault();
My question is:
If this list has 100 products (productId from 1 to 100) and the input
I get is productId = 100, does that mean this method must loop 100
times (if I ORDER BY productId ASC in the query)?
How does that compare with querying the database with a WHERE clause like
WHERE productId = @param?
Thank you.

No. If there is an index keyed on productId, the database finds the correct row in O(log n) operations.
Just implement both methods and measure the time (hint: use the Stopwatch class).
Edit
To get full performance, you should not create an intermediate (unsorted) List<T>; instead, put all your logic into a LINQ query that operates on the SQL Server.
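For instance, a minimal timing sketch (the db context and its Products table are hypothetical; the list and the input are assumed to exist already):
// Hypothetical sketch: comparing the in-memory search with a database query.
var sw = System.Diagnostics.Stopwatch.StartNew();
var fromList = _products.Where(o => o.productId == input).FirstOrDefault();
sw.Stop();
Console.WriteLine("In-memory: {0} ticks", sw.ElapsedTicks);

sw.Restart();
var fromDb = db.Products.Where(o => o.productId == input).FirstOrDefault();
sw.Stop();
Console.WriteLine("Database:  {0} ticks", sw.ElapsedTicks);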

This might be helpful in answering your question:
https://www.linqpad.net/WhyLINQBeatsSQL.aspx

If you execute that Where on a List<Product>, then:
you get all 100 rows from the database,
and then loop through the products in memory until you find one that matches, or until you have gone through the entire list and found nothing.
If, on the other hand, you used an IQueryable<Product> that was connected to the database table, then:
You wouldn't have read anything from the database yet
When you apply the Where, you still wouldn't read anything
When you apply the FirstOrDefault, a SQL query is constructed to fetch just the one row you need. Given correct indexes on the table, this would be quite fast.
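A short sketch of the two paths, assuming a data context named db:
// Path 1 - List<Product>: the whole table is materialized, then filtered in C#.
List<Product> all = db.Products.ToList(); // reads every row from the database
var target1 = all.Where(o => o.productId == input).FirstOrDefault(); // linear scan in memory

// Path 2 - IQueryable<Product>: nothing executes until FirstOrDefault,
// which sends roughly SELECT TOP (1) ... WHERE productId = @input.
IQueryable<Product> query = db.Products.Where(o => o.productId == input);
var target2 = query.FirstOrDefault(); // one-row query on the server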

Related

SQL Server 2008 Skip Take with long parameter

This question is very similar to this one (I also need to implement pagination with SQL Server 2008 and Entity Framework):
Offset/Fetch based paging (Implementation) in EntityFramework (Using LINQ) for SQL Server 2008
However, the problem is that my DB has more than 10 billion rows, so Skip basically does not work; I need Skip/Take methods that accept a long parameter. Is there any possible solution with LINQ and EF? Thanks
Well, if it is necessary at all (as in the case from the comments, where you have to skip billions of rows to take only 20), the goal is to read as little data as possible.
You could use a stored procedure that is given the amounts to skip and take.
Another approach would be to execute plain SQL via your dbContext's Database property.
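A sketch of that idea (the Students table and its Id/Name columns are hypothetical; ROW_NUMBER() is used because SQL Server 2008 has no OFFSET/FETCH, and it produces a bigint, so a long skip count works here where LINQ's Skip cannot accept one):
// Hypothetical sketch: raw SQL paging with a long skip count via EF's
// Database.SqlQuery. The values passed become parameters @p0 and @p1.
long skip = 5000000000; // beyond int.MaxValue, which Skip() cannot accept
int take = 20;
var page = _context.Database.SqlQuery<Students>(
    @"SELECT Id, Name FROM (
          SELECT Id, Name, ROW_NUMBER() OVER (ORDER BY Id) AS RowNum
          FROM Students
      ) AS numbered
      WHERE RowNum > @p0 AND RowNum <= @p0 + @p1",
    skip, take).ToList();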
If you want to use LINQ and you have auto-incrementing IDs, you could give the following a try, though it is a bit more involved, and Take itself still only accepts an int:
On the first execution you have to determine the ID of record number int.MaxValue - 1 (the last row reachable with an int-based Skip). From that point on you can reuse the logic below for the rest of the paging (switching on a flag, perhaps).
Use that ID as a lower bound in the where clause and add a Take with the needed amount. From the result's last record, save the ID for the next paging call, where it replaces the initially determined ID.
A possible approach in code (not tested, and it will not cover all your cases without modification) could be:
private long _lastId;
private bool _isFirstPage = true;

public IEnumerable<Students> GetPage(int toTake)
{
    // Keyset paging: order by Id and filter on the last seen Id instead
    // of using Skip, so the database never counts past skipped rows.
    List<Students> result = _isFirstPage
        ? _context.Students
            .OrderBy(s => s.Id)
            .Take(toTake)
            .ToList()
        : _context.Students
            .Where(s => s.Id > _lastId)
            .OrderBy(s => s.Id)
            .Take(toTake)
            .ToList();

    _isFirstPage = false;
    _lastId = result.LastOrDefault()?.Id ?? 0;
    return result;
}

Efficiently paging large data sets with LINQ

When looking into the best ways to implement paging in C# (using LINQ), most suggestions are something along these lines:
// Execute the query
var query = db.Entity.Where(e => e.Something == something);
// Get the total num records
var total = query.Count();
// Page the results
var paged = query.Skip((pageNum - 1) * pageSize).Take(pageSize);
This seems to be the commonly suggested strategy (simplified).
For me, my main purpose in paging is for efficiency. If my table contains 1.2 million records where Something == something, I don't want to retrieve all of them at the same time. Instead, I want to page the data, grabbing as few records as possible. But with this method, it seems that this is a moot point.
If I understand it correctly, the first statement is still retrieving the 1.2 million records, then it is being paged as necessary.
Does paging in this way actually improve performance? If the 1.2 million records are going to be retrieved every time, what's the point (besides the obvious UI benefits)?
Am I misunderstanding this? Any .NET gurus out there that can give me a lesson on LINQ, paging, and performance (when dealing with large data sets)?
The first statement does not execute the actual SQL query; it only builds the part of the query you intend to run.
It is when you call query.Count() that the first query will be executed:
SELECT COUNT(*) FROM Table WHERE Something = something
query.Skip().Take() won't execute the query either; it is only when you enumerate the results (doing a foreach over paged or calling .ToList() on it) that the appropriate SQL statement executes, retrieving only the rows for the page (using ROW_NUMBER).
If you watch this in SQL Profiler, you will see that exactly two queries are executed, and at no point will it try to retrieve the full table.
Be careful when using the debugger: if you step past the first statement and inspect the contents of query, that will execute the SQL query. Maybe that is the source of your misunderstanding.
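To summarize where the database work actually happens:
var query = db.Entity.Where(e => e.Something == something); // builds the query, no SQL yet
var total = query.Count(); // executes SELECT COUNT(*) ... now
var paged = query.Skip((pageNum - 1) * pageSize)
                 .Take(pageSize); // still no SQL
var rows = paged.ToList(); // executes the paged SELECT now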
// Execute the query
var query = db.Entity.Where(e => e.Something == something);
To be clear, nothing has been executed after this first statement.
// Get the total num records
var total = query.Count();
This count query will be translated to SQL and will make a call to the database.
The call will not fetch all the records, because the generated SQL is something like this:
SELECT COUNT(*) FROM Entity where Something LIKE 'something'
The last query doesn't fetch all the records either: it is translated into SQL, and the paging runs in the database.
Maybe you'll find this question useful: efficient way to implement paging
I believe Entity Framework might structure the SQL query with the appropriate conditions based on the LINQ statements (e.g. using ROW_NUMBER() OVER ...).
I could be wrong on that, however. I'd run SQL profiler and see what the generated query looks like.
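As a rough illustration only (not EF's literal output, which differs in aliases and varies by version, and assuming the rows are ordered by a hypothetical Id column, since Skip/Take require an ordering), the paged query on SQL Server 2008 has this general shape:
-- Approximate shape of the generated paging SQL
SELECT TOP (@pageSize) *
FROM (
    SELECT t.*, ROW_NUMBER() OVER (ORDER BY t.Id) AS RowNum
    FROM Entity AS t
    WHERE t.Something = @something
) AS numbered
WHERE numbered.RowNum > (@pageNum - 1) * @pageSize
ORDER BY numbered.RowNum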

Limit Number of Results being returned in a List from Linq

I'm using Linq/EF4.1 to pull some results from a database and would like to limit the results to the (X) most recent results. Where X is a number set by the user.
Is there a way to do this?
I'm currently passing them back as a List, if that helps with limiting the result set. While I could limit this by looping until I hit X, I'd just as soon not pass the extra data around.
Just in case it is relevant...
C# MVC3 project running from a SQL Server database.
Use the Take function:
int numberOfRecords = 10; // read from user
var recentItems = listOfItems.OrderByDescending(x => x.CreatedDate).Take(numberOfRecords);
This assumes listOfItems is a List of your entity objects and CreatedDate is a field holding the creation date (used here in the OrderByDescending to get the most recent items first).
The Take() function returns a specified number of contiguous elements from the start of a sequence.
http://msdn.microsoft.com/en-us/library/bb503062.aspx
results = results.OrderByDescending(x=>x.Date).Take(10);
The OrderByDescending(...) will sort items by your date/time property (or whatever logic you want to use to get the most recent) and Take(...) will limit the result to the first x items (the first being the most recent, thanks to the ordering).
Edit: To return some rows not starting at the first row, use Skip():
results = results.OrderByDescending(x=>x.Date).Skip(50).Take(10);
Use Take() before converting to a List. That way EF can optimize the query it creates and return only the data you need.
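A small sketch of the difference, assuming an EF context with a hypothetical Items set:
// Take() on the IQueryable translates to SELECT TOP (n) ...;
// only numberOfRecords rows are returned by the database.
var recent = context.Items
    .OrderByDescending(x => x.CreatedDate)
    .Take(numberOfRecords)
    .ToList(); // the query executes here

// Calling ToList() first would instead pull the entire table into
// memory before Take() trims it.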

Select rows from one DataTable not in another

I'm trying to get a list of rows in DataTableA where the value in Column 1 is not in Column1 of DataTableB.
I'm using the following LINQ query:
//Not in Database
var query = from i in dtImport.AsEnumerable()
where !dtProducts.AsEnumerable().Any(p => p[colP] == i[colI])
select i;
Such that I want a list of products in the import table that aren't already in the products table.
The debugger skips past that line quickly, but when I call anything relating to the query, such as int rows = query.Count<DataRow>(); or DataTable dtResult = query.CopyToDataTable();, it takes so long that I just stop the program.
So, What am I doing wrong?
LINQ uses deferred execution: the query is executed when it is used, not when it is declared.
For better performance you can use a HashSet, like the following (assuming the key column holds ints):
// Build the set of existing product keys once, then each membership
// test is O(1) instead of a scan over dtProducts per import row.
var set = new HashSet<int>(dtProducts.AsEnumerable().Select(p => p.Field<int>(colP)));
var result = dtImport.AsEnumerable().Where(i => !set.Contains(i.Field<int>(colI))).ToList();
The slowdown is expected: the query does not get evaluated until you enumerate the results, so you skip past this line in the debugger pretty quickly; all it does is prepare to query the data source. The querying is done when the results are enumerated.
As far as I can tell without profiling your code, the issue is probably related to a big out-of-db select that happens when you convert dtProducts and dtImport to IEnumerable: essentially, you bring the data from both tables into memory before doing your select. If your tables are of considerable size, this is probably where most of the time goes. But again, the only sure way to tell is profiling.
Your query is slow, because it has to enumerate the products for each record in dtImport. Put the products into a dictionary first, to speed up your query.
var prod = dtProducts.AsEnumerable().ToDictionary(p => p[colP]);
var query = from imp in dtImport.AsEnumerable()
            where !prod.ContainsKey(imp[colI])
            select imp;

Collecting metadata into table

I have tabular data that passes through a C# program, on which I need to collect some metadata before finishing. The metadata is always counts based on fields of the data, and I need them all grouped by one field in the data. Periodically, I need to add new counts to this collection of metadata.
I've been researching it for a little while, and I think what makes sense is to rework my program to store the data as a DataTable, then run LINQ queries on the table. The problem I'm having is being able to put the different counts into one table-like structure and then write that out.
I might run a query like this:
var query01 =
from record in records.AsEnumerable()
group record by record.Field<String>("Association Key") into associationsGroup
select new { AssociationKey = associationsGroup.Key, Count = associationsGroup.Count<DataRow>() };
That gets a count of all of the records grouped by the field Association Key. I'm going to want another count, grouped in the same way:
var query02 =
from record in records.AsEnumerable()
where record.Field<String>("Number 9") == "yes"
group record by record.Field<String>("Association Key") into associationsGroup
select new { AssociationKey = associationsGroup.Key, Number9Count = associationsGroup.Count<DataRow>() };
And so on.
I thought about trying to Union-chain the queries, but I was having trouble getting them to union since I'm projecting into anonymous types, and I couldn't figure out how to restructure the queries to make a union work.
So, how can I collect my metadata into one table-like structure?
It's not going to union because you have two different anonymous types. Add both Count and Number9Count to both anonymous types (anonymous types only unify when the property names, types, and order all match) and try the union again.
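A sketch of that fix, padding each projection with the other's count so both sides share one anonymous type:
var query01 =
    from record in records.AsEnumerable()
    group record by record.Field<String>("Association Key") into g
    select new { AssociationKey = g.Key, Count = g.Count(), Number9Count = 0 };

var query02 =
    from record in records.AsEnumerable()
    where record.Field<String>("Number 9") == "yes"
    group record by record.Field<String>("Association Key") into g
    select new { AssociationKey = g.Key, Count = 0, Number9Count = g.Count() };

var combined = query01.Union(query02); // compiles: identical anonymous types
Note that you would still get up to two rows per key, one from each query; a final group-and-sum would merge them into one row per key.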
I ended up solving the problem by creating a class that holds the set of records I need as a DataTable. A user can add queries through a method that takes a Func<DataRow, bool> argument. The method constructs the query, supplying that argument as the where clause and keeping the same grouping and properties in the resulting anonymous-typed objects.
When retrieving the results, the class iterates over each query stored and enters the results into a new DataTable.
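A rough sketch of that design (all names here are hypothetical, reconstructed from the description above):
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

class MetadataCollector
{
    private readonly DataTable _records;
    private readonly Dictionary<string, Func<DataRow, bool>> _counts
        = new Dictionary<string, Func<DataRow, bool>>();

    public MetadataCollector(DataTable records)
    {
        _records = records;
    }

    // Register a named count; the predicate becomes the query's where clause.
    public void AddCount(string name, Func<DataRow, bool> predicate)
    {
        _counts[name] = predicate;
    }

    // Run every stored query, always grouping by "Association Key",
    // and pour the results into one table-like structure.
    public DataTable GetResults()
    {
        var result = new DataTable();
        result.Columns.Add("AssociationKey", typeof(string));
        result.Columns.Add("CountName", typeof(string));
        result.Columns.Add("Count", typeof(int));

        foreach (var pair in _counts)
        {
            var counts =
                from record in _records.AsEnumerable()
                where pair.Value(record)
                group record by record.Field<string>("Association Key") into g
                select new { AssociationKey = g.Key, Count = g.Count() };

            foreach (var row in counts)
                result.Rows.Add(row.AssociationKey, pair.Key, row.Count);
        }
        return result;
    }
}
Usage would look something like: collector.AddCount("Number9Count", r => r.Field<string>("Number 9") == "yes");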
