How to optimize this linq query? (with OrderBy and ThenBy)

How to optimize this linq query? (with OrderBy and ThenBy) - c#

I have currently in a table about 90k rows. And it's will grow up about 1kk ~ 5kk before i execute a clean up and put all rows in a "historical table". So, when i run this following query (MyEntities is a ObjectSet):
MyEntities.Skip(amount * page).Take(amount).ToList();
This query takes about 1.2s... but when i run this following query with OrderBy and ThenBy:
MyEntities.OrderBy(b => b.Day).ThenBy(b => b.InitialHour).Skip(amount * page).Take(amount).ToList();
This query takes about 5.7s. There is a way to optimize the second query?

A few suggestions:
Check that it really is happening in the database (instead of fetching all entities, then sorting)
Make sure that both Day and InitialHour are indexed.
Check the generated SQL isn't doing anything crazy (check the query plan)
EDIT: Okay, so it looks like MyEntities is actually declared as IEnumerable<MyEntity>, which means everything will be done in-process... all your LINQ calls will be via Enumerable.Select etc, rather than Queryable.Select etc. Just change the declared type of MyEntities to IQueryable<MyEntity> and watch it fly...

For reading data from your DB, it's usually a good idea to create custom SQL Views, say one View per grid and one View per Form that you want to populate.
In this example, you would create a view that does the sorting for you, then map that View to an Entity in Entity Framework, then query that Entity using LINQ.
This is nice, clean, readable, maintainable and as optimal as you can make it.
Good luck!

Related

How Dapper.NET works internally with .Count() and SingleOrDefault()?

I am new to Dapper though I am aware about ORMs and DAL and have implemented DAL with NHibernate earlier.
Example Query: -
string sql = "SELECT * FROM MyTable";
public int GetCount()
{
var result = Connection.Query<MyTablePoco>(sql).Count();
return result;
}
Will Dapper convert this query (internally) to SELECT COUNT(*) FROM MyTable looking at .Count() at the end?
Similarly, will it convert to SELECT TOP 1 * FROM MyTable in case of SingleOrDefault()?
I came from NHibernate world where it generates query accordingly. I am not sure about Dapper though. As I am working with MS Access, I do not see a way to check the query generated.

No, dapper will not adjust your query. The immediate way to tell this is simply: does the method return IEnumerable... vs IQueryable...? If it is the first, then it can only use local in-memory mechanisms.
Specifically, by default, Query will actually return a fully populated List<>. LINQ's Count() method recognises that and just accesses the .Count property of the list. So all the data is fetched from the database.
If you want to ask the database for the count, ask the database for the count.
As for mechanisms to view what is actually sent to the database: we use mini-profiler for this. It works great.
Note: when you are querying exactly one row: QueryFirstOrDefault (and the other variants you would expect) exist and have optimizations internally (including hints to ADO.NET, although not all providers can act on those things) to do things as efficiently as possible, but it does not adjust your query. In some cases the provider itself (not dapper) can help, but ultimately: if you only want the first row, ask the database for the first row (TOP or similar).

Using Linq to return the count of items from a query along with its resultset

I am using C# MVC4 with Linq.
I have used dependency injection for my project which resulted in me having a separate Model's project along with a separate Repository project (and one for testing ect). All this no problem.
I moved my queries out of the controllers (old style) and into the repository (new DI style), and injected them. It works fine.
I have a standard linq query (pick any example, they are basic enough), which returns a set of items from the database as normal. No problems here either.
My problem is, that I want to implement paging, and I taught it would be simple enough to. Here is my steps:
Take in the results of the linq query from the repository (injected into the controller) store it in a var. It looks something like:
var results = _someInjectedCode.GetListById(SomeId);
Before, I was able to do something simple like:
results.Count()
results.Skip(SomeNum).Take(SomeOtherNum)
But now that I want paging, I need to do my Skip Take something like this:
var results = from xyz in _someInjectedCode.GetListById(SomeId).SomeId).Skip(SomeNum).Take(SomeOtherNum)
select new[] {a,id, a.fName, a.lName .....}
The problem with this is that I no longer have access to the total count of items before the list was shortened to the Pre Skip...Take state unless I do two queries which means hitting the DB twice.
What is the best way to resolve this issue.

I just do it like this:
var result = (from n in mycollection
where n.someprop == "some value"
select n).ToList();
var count = result.Count;
There are probably other ways, but this is the simplest that I know of.

Thinking about it from a SQL point of view, I can't think of a way in a single normal query to retrieve both the total count and a subset of the data, so I don't think you will be able to do it in LINQ either.
To avoid creating two separate commands, only thing I can think of is a stored proc that returns two tables (one with just the count, the other with your subset of results). It would still execute two queries, but in a single connection. You'd lose your LINQ though. So if you want to keep your LINQ query, you might be stuck with making two separate calls.
The other way is to retrieve the entire unpaged resultset into memory, then run your Take and Skip against the array, but this is pretty wasteful and probably worse than two calls.

You can either add additional parameters to your repository interface/class which will provide paging parameters and return count alongside your result or modify your interfaces to return IQueryable and then apply count and then skip/take before query is compiled and sent for execution.

Querying with many (~100) search terms with Entity Framework

I need to do a query on my database that might be something like this where there could realistically be 100 or more search terms.
public IQueryable<Address> GetAddressesWithTown(string[] towns)
{
IQueryable<Address> addressQuery = DbContext.Addresses;
addressQuery.Where( x => towns.Any( y=> x.Town == y ) );
return addressQuery;
}
However when it contains more than about 15 terms it throws and exception on execution because the SQL generated is too long.
Can this kind of query be done through Entity Framework?
What other options are there available to complete a query like this?

Sorry, are we talking about THIS EXACT SQL?
In that case it is a very simple "open your eyes thing".
There is a way (contains) to map that string into an IN Clause, that results in ONE sql condition (town in ('','',''))
Let me see whether I get this right:
addressQuery.Where( x => towns.Any( y=> x.Town == y ) );
should be
addressQuery.Where ( x => towns.Contains (x.Town)
The resulting SQL will be a LOT smaller. 100 items is still taxing it - I would dare saying you may have a db or app design issue here and that requires a business side analysis, I have not me this requirement in 20 years I work with databases.

This looks like a scenario where you'd want to use the PredicateBuilder as this will help you create an Or based predicate and construct your dynamic lambda expression.
This is part of a library called LinqKit by Joseph Albahari who created LinqPad.
public IQueryable<Address> GetAddressesWithTown(string[] towns)
{
var predicate = PredicateBuilder.False<Address>();
foreach (string town in towns)
{
string temp = town;
predicate = predicate.Or (p => p.Town.Equals(temp));
}
return DbContext.Addresses.Where (predicate);
}

You've broadly got two options:
You can replace .Any with a .Contains alternative.
You can use plain SQL with table-valued-parameters.
Using .Contains is easier to implement and will help performance because it translated to an inline sql IN clause; so 100 towns shouldn't be a problem. However, it also means that the exact sql depends on the exact number of towns: you're forcing sql-server to recompile the query for each number of towns. These recompilations can be expensive when the query is complex; and they can evict other query plans from the cache as well.
Using table-valued-parameters is the more general solution, but it's more work to implement, particularly because it means you'll need to write the SQL query yourself and cannot rely on the entity framework. (Using ObjectContext.Translate you can still unpack the query results into strongly-typed objects, despite writing sql). Unfortunately, you cannot use the entity framework yet to pass a lot of data to sql server efficiently. The entity framework doesn't support table-valued-parameters, nor temporary tables (it's a commonly requested feature, however).
A bit of TVP sql would look like this select ... from ... join #townTableArg townArg on townArg.town = address.town or select ... from ... where address.town in (select town from #townTableArg).
You probably can work around the EF restriction, but it's not going to be fast and will probably be tricky. A workaround would be to insert your values into some intermediate table, then join with that - that's still 100 inserts, but those are separate statements. If a future version of EF supports batch CUD statements, this might actually work reasonably.
Almost equivalent to table-valued paramters would be to bulk-insert into a temporary table and join with that in your query. Mostly that just means you're table name will start with '#' rather than '#' :-). The temp table has a little more overhead, but you can put indexes on it and in some cases that means the subsequent query will be much faster (for really huge data-quantities).
Unfortunately, using either temporary tables or bulk insert from C# is a hassle. The simplest solution here is to make a DataTable; this can be passed to either. However, datatables are relatively slow; the over might be relevant once you start adding millions of rows. The fastest (general) solution is to implement a custom IDataReader, almost as fast is an IEnumerable<SqlDataRecord>.
By the way, to use a table-valued-parameter, the shape ("type") of the table parameter needs to be declared on the server; if you use a temporary table you'll need to create it too.
Some pointers to get you started:
http://lennilobel.wordpress.com/2009/07/29/sql-server-2008-table-valued-parameters-and-c-custom-iterators-a-match-made-in-heaven/
SqlBulkCopy from a List<>

Query Execution of SP while using it in LINQ to SQL

I have a query that returns a resultset. And I want to apply filter and sorting on the resultset.
Can someone help me understand if I use the query in LINQ (I'm using EF 4.0), will I be able to get deferred executuin so that when i apply filter/sort in entity model, the execution happens only one time (deffered)
Thanks in advance!
Regards,
Bhavik

If the query takes no parameters, then yes, as you could make a view that calls that sproc, expose the view in your model, then query it.
If it takes parameters, then if you need the sort/filter done server-side, then I think you'd have to add a wrapper sproc (or modify the existing one) to pass in the sort and filter to perform (basically, do it manually, but at least server-side).
Alternatively, you could write the sql to do it server side (sproc results into temp table, then select from that temp table and apply filtering, still manually) and then ExecuteStoreQuery

No, you can't defer the execution of a linq filtering to the stored proc in sql. The stored proc will be executed first, a resultset will be returned, you can then cast it to a list of your object types, once done that you can filter using Linq.
You can easily cast the resultset to a list of your objects using context.Translate<>
Have a look to these links :
enter link description here
List item
Of course the query (in your code) will not be evaluated until you cast it to a list, so you can concatenate all the filtering you want to your resultset and then call the ToList() to get the results.

Using EF4 how can I track the # of times a record is part of a Skip().Take() result set

So using EF4, I'm running a search query that ends with this common function:
query = query.Skip(5).Take(10);
Those records in the database have a column called ImpressionCount, which I intend to use to count the number of times that each record displayed on a page of search results.
What's the most efficient way to do this? Off the top of my head, I'm just going to look at the result set, get a list of ID's and then hit the database again using ADO.NET to do something like:
UPDATE TableName SET ImpressionCount = ImpressionCount + 1 WHERE Id IN (1,2,3,4,5,6,7,8,9,10)
Seems simple enough, just wondering if there's a more .NET 4 / Linq-ish way to do this that I'm not thinking of. One that doesn't involve another hit to the database would be nice too. :)
EDIT: So I'm leaning towards IAbstract's response as the answer since it doesn't appear there's a "built in" way to do this. I didn't think there was but it never hurts to ask. However, the only other question I think I want to throw out there is: is it possible to write a SQL trigger that could only operate on this particular query? I don't want ImpressionCount to update on EVERY select statement for the record (for example, when someone goes to view the detail page, that's not an impression -- if an admin edits the record in the back end, that's not an impression)...possible using LINQ or no?
SQL Server would somehow need to be able to identify that the query was generated by that Linq command, not sure if that's possible or not. This site is expecting relatively heavy traffic so I'm just trying to optimize where possible, but if it's overkill, I might just go ahead and hit the database again one time for each page of results.

In my opinion, it is better to go ahead and run with the SQL command as you have it. Just because we have LINQ does not mean it is always the best choice. Your statement gives you a one-call process to update impression counts and should be fairly quick.

You can technically use a .Select to modify each element in a returned result, but it's not idiomatic C#/linq, so you're better off using a foreach loop.
Example:
var query = query.ToList().Select(x => { x.ImpressionCount++; return x; });
As IAbstract said, be careful of performance issues. Using the example above, or a foreach will execute 10 updates. Your one update statement is better.
I know Linq2NHibernate has this same issue - trying to stick with Linq just isn't any good for updates (that's why it's called "language integrated query");
Edit:
Actually there's probably no reason why EF4 or NHibernate couldn't parse the select expression, realize it's an update, and translate it into an update statement an execute it, but certainly neither framework will do that. If that were something that could happen, you'd want a new .Update() extension method for IQueryable<T> to explicitly state that you're modifying data. Using .Select() for it is a dirty hack.
... which means there's no reason you couldn't write your own .Update(x => x.ImpressionCount++) extension method for IQueryable<T> that output the SQL you want and call ExecuteStoreCommand, but it would be a lot of work.

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.