How does Dapper.NET work internally with .Count() and SingleOrDefault()? - c#

I am new to Dapper, though I am aware of ORMs and DALs and have previously implemented a DAL with NHibernate.
Example query:
string sql = "SELECT * FROM MyTable";
public int GetCount()
{
var result = Connection.Query<MyTablePoco>(sql).Count();
return result;
}
Will Dapper convert this query (internally) to SELECT COUNT(*) FROM MyTable looking at .Count() at the end?
Similarly, will it convert to SELECT TOP 1 * FROM MyTable in case of SingleOrDefault()?
I come from the NHibernate world, where the query is generated accordingly. I am not sure about Dapper, though. As I am working with MS Access, I do not see a way to check the generated query.

No, Dapper will not adjust your query. The immediate way to tell is simply: does the method return IEnumerable<T> or IQueryable<T>? If it is the former, then it can only use local in-memory mechanisms.
Specifically, by default, Query will actually return a fully populated List<>. LINQ's Count() method recognises that and just accesses the .Count property of the list. So all the data is fetched from the database.
If you want to ask the database for the count, ask the database for the count.
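For example, with Dapper you could ask for the count directly (a minimal sketch; ExecuteScalar<T> is a Dapper extension method, and the SQL reuses the table from the question):

int count = Connection.ExecuteScalar<int>("SELECT COUNT(*) FROM MyTable");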
As for mechanisms to view what is actually sent to the database: we use mini-profiler for this. It works great.
Note: when you are querying exactly one row, QueryFirstOrDefault (and the other variants you would expect) exist and have internal optimizations (including hints to ADO.NET, although not all providers can act on them) to do things as efficiently as possible, but they do not adjust your query. In some cases the provider itself (not Dapper) can help, but ultimately: if you only want the first row, ask the database for the first row (TOP or similar).
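For the single-row case, a hedged sketch of the same idea (QueryFirstOrDefault is Dapper's own method; TOP works for both SQL Server and Access):

var row = Connection.QueryFirstOrDefault<MyTablePoco>("SELECT TOP 1 * FROM MyTable");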

Related

Efficiently paging large data sets with LINQ

When looking into the best ways to implement paging in C# (using LINQ), most suggestions are something along these lines:
// Execute the query
var query = db.Entity.Where(e => e.Something == something);
// Get the total num records
var total = query.Count();
// Page the results
var paged = query.Skip((pageNum - 1) * pageSize).Take(pageSize);
This seems to be the commonly suggested strategy (simplified).
For me, my main purpose in paging is for efficiency. If my table contains 1.2 million records where Something == something, I don't want to retrieve all of them at the same time. Instead, I want to page the data, grabbing as few records as possible. But with this method, it seems that this is a moot point.
If I understand it correctly, the first statement is still retrieving the 1.2 million records, then it is being paged as necessary.
Does paging in this way actually improve performance? If the 1.2 million records are going to be retrieved every time, what's the point (besides the obvious UI benefits)?
Am I misunderstanding this? Any .NET gurus out there that can give me a lesson on LINQ, paging, and performance (when dealing with large data sets)?
The first statement does not execute the actual SQL query; it only builds part of the query you intend to run.
It is when you call query.Count() that the first query will be executed:
SELECT COUNT(*) FROM Table WHERE Something = something
query.Skip().Take() won't execute the query either; it is only when you try to enumerate the results (doing a foreach over paged or calling .ToList() on it) that the appropriate SQL statement is executed, retrieving only the rows for the page (using ROW_NUMBER).
If you watch this in SQL Profiler, you will see that exactly two queries are executed and that at no point does it try to retrieve the full table.
Be careful when you are using the debugger: if you step past the first statement and try to look at the contents of query, that will execute the SQL query. Maybe that is the source of your misunderstanding.
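To make the two round-trips concrete, here is a hedged sketch of the full pattern (names taken from the question; the OrderBy column is an assumption, since Skip/Take in Entity Framework requires an ordering):

// Build the query; no SQL is sent yet.
var query = db.Entity.Where(e => e.Something == something);

// First round-trip: SELECT COUNT(*) ... WHERE ...
var total = query.Count();

// Second round-trip: only this page's rows come back (via ROW_NUMBER/TOP).
var pageItems = query.OrderBy(e => e.Id)   // hypothetical key column
                     .Skip((pageNum - 1) * pageSize)
                     .Take(pageSize)
                     .ToList();             // the paging SQL executes here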
// Execute the query
var query = db.Entity.Where(e => e.Something == something);
For your information, nothing is executed against the database after this first statement.
// Get the total num records
var total = query.Count();
This count query will be translated to SQL, and it will make a call to the database.
This call will not get all records, because the generated SQL is something like this:
SELECT COUNT(*) FROM Entity where Something LIKE 'something'
The last query doesn't get all the records either. It will be translated into SQL, and the paging will run in the database.
Maybe you'll find this question useful: efficient way to implement paging
I believe Entity Framework might structure the SQL query with the appropriate conditions based on the LINQ statements (e.g. using ROW_NUMBER() OVER ...).
I could be wrong on that, however. I'd run SQL profiler and see what the generated query looks like.

Querying with many (~100) search terms with Entity Framework

I need to do a query on my database that might be something like this where there could realistically be 100 or more search terms.
public IQueryable<Address> GetAddressesWithTown(string[] towns)
{
IQueryable<Address> addressQuery = DbContext.Addresses;
addressQuery.Where( x => towns.Any( y=> x.Town == y ) );
return addressQuery;
}
However, when it contains more than about 15 terms, it throws an exception on execution because the SQL generated is too long.
Can this kind of query be done through Entity Framework?
What other options are there available to complete a query like this?
Sorry, are we talking about THIS EXACT SQL?
In that case it is a very simple "open your eyes thing".
There is a way (Contains) to map that array into an IN clause, which results in ONE SQL condition (town IN ('', '', '')).
Let me see whether I get this right:
addressQuery.Where( x => towns.Any( y=> x.Town == y ) );
should be
addressQuery.Where(x => towns.Contains(x.Town));
The resulting SQL will be a LOT smaller. 100 items is still taxing it - I would dare say you may have a db or app design issue here that requires business-side analysis; I have not met this requirement in the 20 years I have worked with databases.
This looks like a scenario where you'd want to use the PredicateBuilder as this will help you create an Or based predicate and construct your dynamic lambda expression.
This is part of a library called LinqKit by Joseph Albahari who created LinqPad.
public IQueryable<Address> GetAddressesWithTown(string[] towns)
{
var predicate = PredicateBuilder.False<Address>();
foreach (string town in towns)
{
string temp = town;
predicate = predicate.Or (p => p.Town.Equals(temp));
}
return DbContext.Addresses.Where (predicate);
}
You've broadly got two options:
You can replace .Any with a .Contains alternative.
You can use plain SQL with table-valued-parameters.
Using .Contains is easier to implement and will help performance because it translates to an inline SQL IN clause; so 100 towns shouldn't be a problem. However, it also means that the exact SQL depends on the exact number of towns: you're forcing SQL Server to recompile the query for each number of towns. These recompilations can be expensive when the query is complex, and they can evict other query plans from the cache as well.
Using table-valued-parameters is the more general solution, but it's more work to implement, particularly because it means you'll need to write the SQL query yourself and cannot rely on the entity framework. (Using ObjectContext.Translate you can still unpack the query results into strongly-typed objects, despite writing sql). Unfortunately, you cannot use the entity framework yet to pass a lot of data to sql server efficiently. The entity framework doesn't support table-valued-parameters, nor temporary tables (it's a commonly requested feature, however).
A bit of TVP SQL would look like this: select ... from ... join @townTableArg townArg on townArg.town = address.town or select ... from ... where address.town in (select town from @townTableArg).
You probably can work around the EF restriction, but it's not going to be fast and will probably be tricky. A workaround would be to insert your values into some intermediate table, then join with that - that's still 100 inserts, but those are separate statements. If a future version of EF supports batch CUD statements, this might actually work reasonably.
Almost equivalent to table-valued parameters would be to bulk-insert into a temporary table and join with that in your query. Mostly that just means your table name will start with '#' rather than '@' :-). The temp table has a little more overhead, but you can put indexes on it, and in some cases that means the subsequent query will be much faster (for really huge data quantities).
Unfortunately, using either temporary tables or bulk insert from C# is a hassle. The simplest solution here is to make a DataTable; this can be passed to either. However, DataTables are relatively slow; the overhead might be relevant once you start adding millions of rows. The fastest (general) solution is to implement a custom IDataReader; almost as fast is an IEnumerable<SqlDataRecord>.
By the way, to use a table-valued-parameter, the shape ("type") of the table parameter needs to be declared on the server; if you use a temporary table you'll need to create it too.
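For illustration, a minimal sketch of passing the towns as a table-valued parameter via plain ADO.NET (the type name dbo.TownList, the column Town, the Addresses table, and the open sqlConnection are all assumptions; the table type must already exist on the server):

// Assumed to exist on the server:
//   CREATE TYPE dbo.TownList AS TABLE (Town nvarchar(100));
// Requires System.Data and System.Data.SqlClient.
var townTable = new DataTable();
townTable.Columns.Add("Town", typeof(string));
foreach (var town in towns)
    townTable.Rows.Add(town);

using (var cmd = new SqlCommand(
    "SELECT a.* FROM Addresses a JOIN @towns t ON t.Town = a.Town", sqlConnection))
{
    var p = cmd.Parameters.AddWithValue("@towns", townTable);
    p.SqlDbType = SqlDbType.Structured;
    p.TypeName = "dbo.TownList";
    using (var reader = cmd.ExecuteReader())
    {
        // Map the rows to Address objects here (or hand the reader
        // to ObjectContext.Translate, as mentioned above).
    }
}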
Some pointers to get you started:
http://lennilobel.wordpress.com/2009/07/29/sql-server-2008-table-valued-parameters-and-c-custom-iterators-a-match-made-in-heaven/
SqlBulkCopy from a List<>

lightswitch LINQ PreprocessQuery

I use the PreprocessQuery method to extend a query in lightswitch.
Something like this:
query = (from item in query
         where validIDs.Contains(item.tableIDs.myID) &&
               elementCount[item.ID] <= maxEleCount
         select item);
Where validIDs is a HashSet<int> and elementCount is a Dictionary<int, int>.
The first where condition works fine, but the second one (elementCount[item.ID] <= maxEleCount) is not working.
What I want to do is filter a table by some IDs (validIDs) and also check that, in another table, the number of entries for each of these IDs does not exceed a limit.
Any ideas?
EDIT
I found a solution. Instead of a Dictionary, I used a HashSet for the second where clause as well. It seems it is not possible to do the Dictionary lookup inside the LINQ statement, for some reason(?).
First, although this is a bit pedantic, what you're doing in a PreProcessQuery method is "restricting" the records returned by the query, not "extending" the query.
What you put in a LINQ query has to be able to be processed by the Entity Framework data provider (in the case of LightSwitch, the SQL Server data provider).
Sometimes you'll find that while your LINQ query compiles, it fails at runtime. This is because the data provider is unable to express it to the data store (again in this case SQL Server).
You're normally restricted to "primitive" values, so if you hadn't said that using a Dictionary actually worked, I would have said that it wouldn't.
Any time you have a static (as in non-changing) value, I'd suggest that you create a variable outside of your LINQ query, then use the variable in the LINQ query. By doing this, you're simply passing a value, the data provider doesn't have to try to figure out how to pass it to the data store.
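As a hedged illustration of that advice (the GetMaxElementCount helper and the HashSet are assumptions, not the asker's actual code):

// Compute plain values outside the query so only simple values reach the provider.
int maxEleCount = GetMaxElementCount();      // hypothetical helper, evaluated before the query
var allowedIds = new HashSet<int>(validIDs); // a plain collection; Contains translates to an IN clause

query = query.Where(item => allowedIds.Contains(item.ID));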
Reading your code again, this might not be what you're doing, but hopefully this explanation will still be helpful.

Selecting first 100 records using Linq

How can I return first 100 records using Linq?
I have a table with 40million records.
This code works, but it's slow, because it will return all values before filtering:
var values = (from e in dataContext.table_sample
where e.x == 1
select e)
.Take(100);
Is there a way to return the data already filtered, like the T-SQL TOP clause?
No, that doesn't return all the values before filtering. The Take(100) will end up being part of the SQL sent up - quite possibly using TOP.
Of course, it makes more sense to do that when you've specified an orderby clause.
LINQ doesn't execute the query when it reaches the end of your query expression. It only sends up any SQL when either you call an aggregation operator (e.g. Count or Any) or you start iterating through the results. Even calling Take doesn't actually execute the query - you might want to put more filtering on it afterwards, for instance, which could end up being part of the query.
When you start iterating over the results (typically with foreach) - that's when the SQL will actually be sent to the database.
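As a hedged sketch of what that looks like with the question's query (the orderby column is an assumption; the SQL, including TOP 100, is only sent when ToList() runs):

var values = (from e in dataContext.table_sample
              where e.x == 1
              orderby e.id          // hypothetical column; TOP without an ordering is non-deterministic
              select e)
             .Take(100)
             .ToList();             // the query executes here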
(I think your where clause is a bit broken, by the way. If you've got problems with your real code it would help to see code as close to reality as possible.)
I don't think you are right about it returning all records before taking the top 100. I think Linq decides what the SQL string is going to be at the time the query is executed (aka Lazy Loading), and your database server will optimize it out.
Have you compared standard SQL query with your linq query? Which one is faster and how significant is the difference?
I do agree with above comments that your linq query is generally correct, but...
your 'where' clause should probably be x == 1, not x = 1 (comparison instead of assignment)
'select e' will return all columns, where you probably need only some of them - be more precise with the select clause (list only the required columns); 'select *' is a waste of resources
make sure your database is well indexed and try to make use of indexed data
Anyway, a 40-million-record database is quite huge - do you need all that data all the time? Maybe some kind of partitioning can reduce it to the most commonly used records.
I agree with Jon Skeet, but just wanted to add:
The generated SQL will use TOP to implement Take().
If you're able to run SQL-Profiler and step through your code in debug mode, you will be able to see exactly what SQL is generated and when it gets executed. If you find the time to do this, you will learn a lot about what happens underneath.
There is also a DataContext.Log property that you can assign a TextWriter to view the SQL generated, for example:
dbContext.Log = Console.Out;
Another option is to experiment with LINQPad. LINQPad allows you to connect to your data source and easily try different LINQ expressions. In the results panel, you can switch to see the SQL generated by the LINQ expression.
I'm going to go out on a limb and guess that you don't have an index on the column used in your where clause. If that's the case then it's undoubtedly doing a table scan when the query is materialized and that's why it's taking so long.

How to get linq to produce exactly the sql I want?

It is second nature for me to whip up some elaborate SQL set-processing code to solve various domain model questions. However, the trend is not to touch SQL anymore. Is there some pattern reference or conversion tool out there that helps convert the various SQL patterns to LINQ syntax?
I would look up ways to code things like the following (this has a subquery):
SELECT * FROM orders X WHERE
(SELECT COUNT(*) FROM orders Y
WHERE Y.totalOrder > X.totalOrder) < 6
(Grab the top five highest total orders with side effects)
Alternatively, how do you know LINQ executes as a single statement without using a debugger? I know you need to follow the enumeration, but I would assume you can just look up the patterns somewhere.
This is from the MSDN site; it is their example of doing a SQL difference. I wouldn't have thought this uses set processing on the server (I would expect it to pull both sets locally and then take the difference, which would be very inefficient), but I am probably wrong, and this could be one of the patterns on that reference.
SQL difference example:
var differenceQuery =
(from cust in db.Customers
select cust.Country)
.Except
(from emp in db.Employees
select emp.Country);
Thanks
-- Update:
-- Microsoft's 101 LINQ Samples in C# is a closer means of constructing LINQ in a pattern to produce the SQL you want. I will post more as I find them. I am really looking for a methodology (patterns or a conversion tool) to convert SQL to LINQ.
-- Update (sql from Microsoft's difference pattern in Linq):
SELECT DISTINCT [t0].[field] AS [Field_Name]
FROM [left_table] AS [t0]
WHERE NOT (EXISTS(
SELECT NULL AS [EMPTY]
FROM [right_table] AS [t1]
WHERE [t0].[field] = [t1].[field]
))
That's what we wanted, not what I expected. So, that's one pattern to memorize.
If you have hand-written SQL, you can use ExecuteQuery, specifying the type of "row" class as a generic type argument:
var myList = DataContext.ExecuteQuery<MyRow>(
"select * from myview");
The "row" class exposes the columns as public properties. For example:
public class MyRow {
public int Id { get; set; }
public string Name { get; set; }
....
}
You can decorate the columns with more information:
public class MyRow {
....
[Column(Storage="NameColumn", DbType="VarChar(50)")]
public string Name { get; set; }
....
}
In my experience, LINQ to SQL doesn't generate very good SQL code, and the code it does generate breaks down for large databases. What LINQ to SQL does very well is expose stored procedures to your client. For example:
var result = DataContext.MyProcedure(a,b,c);
This allows you to store SQL in the database, while having the benefits of an easy to use, automatically generated .NET wrapper.
To see the exact SQL that's being used, you can use the SQL Server Profiler tool:
http://msdn.microsoft.com/en-us/library/ms187929.aspx
The Linq-to-Sql Debug Visualizer:
http://weblogs.asp.net/scottgu/archive/2007/07/31/linq-to-sql-debug-visualizer.aspx
Or you can write custom code to log the queries:
http://goneale.wordpress.com/2008/12/31/log-linq-2-sql-query-execution-to-consoledebug-window/
This is why LINQPad was created in the first place. :) It allows you to easily see what the output is, what the results of the query would be, etc. Best of all, it's free. Maybe it's not the answer to your question, but I am sure it could help you.
If you know exactly the sql you want, then you should use ExecuteQuery.
I can imagine a few ways to translate the query you've shown, but you may be concerned that "Except" might not be translated.
Test it. If it works the way you want, then great; otherwise:
Rewrite it with items you know will translate, for example:
db.Customers.Where(c => !db.Employees.Any(e => c.Country == e.Country) );
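The correlated-count query from the question can be approached the same way; a sketch (assuming an Orders table with a totalOrder column, as in the question) using constructs LINQ to SQL is known to translate into a correlated subquery:

var topOrders = db.Orders
    .Where(x => db.Orders.Count(y => y.totalOrder > x.totalOrder) < 6);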
If you are concerned about the TSQL generated, then I would suggest formalising the queries into stored procedures or UDFs, and accessing them via the data-context. The UDF approach has slightly better metadata and composability (compared to a stored procedure) - for example, you can add additional Where/Skip/Take etc. to a UDF query and have it run at the database (but last time I checked, only LINQ-to-SQL (not Entity Framework) supported UDF usage).
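For example, a hedged sketch of composing over a mapped table-valued UDF (fn_TopOrders and its columns are hypothetical; the extra operators are still translated and run at the database):

var page = db.fn_TopOrders(6)                      // hypothetical composable UDF on the DataContext
            .Where(o => o.Region == "EU")
            .OrderByDescending(o => o.totalOrder)
            .Take(5)
            .ToList();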
You can also use ExecuteQuery, but there are advantages of letting the database own the fixed queries.
Re finding what TSQL executed... with LINQ-to-SQL you can assign any TextWriter (for example, Console.Out) to DataContext.Log.
I believe the best way is to use stored procedures. In this case you have full control over the SQL.
