I need to build a query that provides paged results. Part of the filtering occurs in the database, and part of it occurs on objects that are in memory.
Below is a simplified sample of what I could do: run a LINQ query against the database, filter it further with custom code, and then use Skip/Take for paging. But this would be very inefficient, because it has to load every item that matches the first part of the query.
Things.Where(e => e.field1 == 1 && e.field2 > 1).ToList()
      .Where(e => Helper.MyFilter(e, param1, param2))
      .Skip(m * pageSize).Take(pageSize);
The MyFilter function uses additional data that is not in the database, and it takes extra parameters (param1 and param2 in the sample above).
Is there a preferred way to handle this situation without fully loading the initial result into memory?
Yes: query and page at the database level. Whatever logic is in Helper.MyFilter needs to be expressed in the SQL query.
The other option, which is more intrusive to your code base, is to save a view model as well as the domain entity whenever the entity changes. Part of the view model would contain the result of Helper.MyFilter(e), so you can query for it quickly and efficiently.
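A minimal sketch of that second option, assuming the filter outcome can be precomputed at save time (IsEligible is a hypothetical persisted column standing in for Helper.MyFilter's result; this only works when param1/param2 are known when the entity is saved):

// On save: persist the filter result alongside the entity
thing.IsEligible = Helper.MyFilter(thing, param1, param2);
db.SaveChanges();

// On read: the whole filter now translates to SQL, so paging stays in the database
var page = db.Things
             .Where(e => e.field1 == 1 && e.field2 > 1 && e.IsEligible)
             .OrderBy(e => e.Id)          // Skip/Take need a stable ordering
             .Skip(m * pageSize)
             .Take(pageSize)
             .ToList();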
To support Jason's answer above: Entity Framework supports .Skip().Take(), so send it all down to the database level and convert your Where into something EF can consume.
If your Where helper is complicated, use Albahari's PredicateBuilder:
http://www.albahari.com/nutshell/predicatebuilder.aspx
or the slightly easier to use Universal PredicateBuilder, which is based on it:
http://petemontgomery.wordpress.com/2011/02/10/a-universal-predicatebuilder/
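For illustration, here is a minimal sketch in the style of the linked PredicateBuilder article (Product, Description and keywords are hypothetical):

// Compose an OR of keyword filters into a single expression tree
// that LINQ to SQL / EF can translate into one WHERE clause.
var predicate = PredicateBuilder.False<Product>();
foreach (string keyword in keywords)
{
    string temp = keyword;   // capture a fresh variable per iteration
    predicate = predicate.Or(p => p.Description.Contains(temp));
}
var results = dataContext.Products.Where(predicate);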
.ToList()
This converts your query into an in-memory object (a list), which causes the query to execute; only then is the paging applied, to the data in memory.
You can put it all in one Where clause:
Things.Where(e => e.field1 == 1 && e.field2 > 1
             && Helper.MyFilter(e)).Skip(m * pageSize).Take(pageSize);
and then .ToList().
That way you give LINQ to SQL a chance to generate a query and fetch only the data you want.
Or is there a particular reason why you want to do just that - convert to an in-memory object and then filter? I don't see the point: you should be able to filter out the results you don't want in the LINQ to SQL query before you actually execute it against the database.
EDIT
As I can see from the discussion, you have several options.
If you have a lot of data and do more reads than writes, it might be wise to save the result of Helper.MyFilter into the database on insert, if that's possible. That way you improve SELECT performance: you no longer pull all the data from the database, and the SELECT itself returns already-filtered data.
Or you can take another approach: put the Helper class in a separate assembly and reference that assembly from SQL Server (SQLCLR). This lets you keep the paging logic in the database while still reusing your code.
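A rough sketch of what that SQLCLR route could look like (the signature and parameters are hypothetical; the real function would mirror Helper.MyFilter's logic):

using Microsoft.SqlServer.Server;
using System.Data.SqlTypes;

public static class SqlHelpers
{
    // Deployed to SQL Server via CREATE ASSEMBLY / CREATE FUNCTION, so the
    // WHERE clause - and therefore the paging - can run inside the database.
    [SqlFunction(IsDeterministic = true, IsPrecise = true)]
    public static SqlBoolean MyFilter(SqlInt32 field1, SqlInt32 field2)
    {
        // hypothetical stand-in for Helper.MyFilter's in-memory logic
        return field1.Value == 1 && field2.Value > 1;
    }
}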
Related
We need to pass OData V4 query (search, order-by) clauses directly to the database.
Here is the case:
There are joins among tables, and we invoke (inline) table-valued functions using SQL to get the desired records.
OData where clauses need to be applied to the result set; then we apply pagination (Skip, Take) and Order By.
We started with Dapper; however, Dapper supports only IEnumerable, so it brings all the records back from the database and only then does the OData pagination (ODataQueryOptions.ApplyTo) get applied, spoiling the performance gain :-(
[ODataRoute("LAOData")]
[HttpGet]
public IQueryable<LAC> GetLAOData(ODataQueryOptions<LAC> queryOptions)
{
using (IDbConnection connection = new SqlConnection(RoutingConstants.CascadeConnectionString))
{
var sql = "<giant sql query";
IQueryable<LAC> iqLac = null;
IEnumerable<LAC> sqlRes = connection.Query<LAC>(sql, commandTimeout: 300);
**IQueryable<LAC> iq = sqlRes.AsQueryable();
iqLac = queryOptions.ApplyTo(iq) as IQueryable<LAC>;
return iqLac;**
}
}
Most of the examples we see of stored procedure and view support apparently return List:
https://hackernoon.com/execute-a-stored-procedure-that-gets-data-from-multiple-tables-in-ef-core-1638a7f010c
Can we configure EF Core 2.2 to return IQueryable so that OData can filter further and yield only the desired count, say 10?
Well, yes and no. You can certainly return an IQueryable, and it seems you're already doing so. And you can certainly query further via LINQ on that IQueryable, in memory.
I think what you're really asking is whether you can query further at the database level, such that only the ultimate result set you're after is returned from the database. The answer to that is a hard no. The stored procedure must be evaluated first, and once that has happened, all the results have been returned from the database. You can filter further in memory, but by then it's too late for the database to help.
That said, you should understand that OData is fundamentally incompatible with the idea of using something like a stored procedure. The entire point is to describe the query via URL parameters - the entire query. You could use a view instead, but stored procedures should not be used alongside OData.
EF cannot return an IQueryable from a stored procedure because the database engine itself provides no mechanism for selectively querying or manipulating execution of the script. You can't, for instance, do the following in SQL:
SELECT Field1, Field2
EXEC dbo.SearchForData_SP()
WHERE Field2 IS NOT NULL
ORDER BY Field3
The stored procedure is a black box to the engine, and because of this there are certain types of expressions and operations you can use in SPs that you cannot use in normal set-based SQL queries or expressions; for instance, you can execute other stored procedures. SPs must be executed in their entirety before you can process the results.
If the database engine itself cannot do anything to optimise the execution of stored procedures, it's going to be hard for your ORM framework to do so.
This is why most documentation and examples around executing SPs via EF return a List, as that makes it clear that the entire contents of the list are in memory; casting that List to IQueryable with .AsQueryable() doesn't change the fact that the data is held within that List object.
There are joins among tables, and we invoke (inline) table-valued functions using SQL to get the desired records.
What you are describing here is similar to what OData and EF try to offer you: mechanisms for composing complex queries. To take full advantage of OData and EF, you should consider replicating or replacing your TVFs with LINQ statements. EF is RDBMS-agnostic, so it tries to use and enforce generic standards that can be applied to many database engines, not just SQL Server. When it comes to CTEs, TVFs and SPs, the implementation and syntax in each database engine become a lot more specific, even to specific versions in some cases. Rather than trying to be everything to everyone, the EF team has to enforce some limits so they can maintain the quality of the services they offer us.
There is a happy medium, however, where you can leverage the power of both engines:
Design your SPs so that the filtering variables are passed through as parameters, and restrict dependence on stored procedures to scenarios where the structure of the output is as efficient as you normally need. You can then expose the SP as an Action endpoint in OData, and the caller can pass the parameter values directly through to the SP.
You can still wrap the response in an IQueryable<T> and decorate this action with the EnableQuery attribute; this will perform in-memory $select, $expand and simple $filter operations, but the service will still load the entire recordset into memory before constructing the response payload. This mechanism can still reduce bandwidth between the server and the client, just not between the database and the service layer.
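A rough sketch of that pattern with EF Core 2.2's FromSql (the action name, DbSet and the @region parameter are hypothetical):

[HttpPost]
[EnableQuery]
public IQueryable<LAC> SearchLAOData(ODataActionParameters parameters)
{
    var region = (string)parameters["region"];   // hypothetical SP parameter

    // The SP still executes in full here; only afterwards do $select/$filter
    // from EnableQuery apply, in memory, to shape the response payload.
    var rows = _context.LACs
        .FromSql("EXEC dbo.SearchLAC_SP @region = {0}", region)
        .ToList();

    return rows.AsQueryable();
}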
Make different versions of your SP if you need different result structures for different use cases.
Use TVFs or views only when the query is too complex to express easily in LINQ, or when you need Table Hints, CTEs, recursive CTEs or window functions that cannot easily be replicated in LINQ.
In many cases where (non-recursive) CTEs are used, the expression can be easier to construct in LINQ.
To squeeze the most performance from indexes you can use Table Hints in SQL; because we don't have tight control over how our LINQ expressions are composed into SQL, it can take a lot of work to construct some queries in a way that the database can optimise. In many scenarios, as with CTEs above, going through the process of rewriting your query in LINQ can help avoid situations where you would traditionally have used table hints.
There are limits: when you want or need to take control using specialised SQL Server concepts that EF doesn't support, you are making a conscious decision to have one and not the other.
I do not agree that OData and stored procedures are fundamentally incompatible; there are many use cases where the two get along really well, though you have to find the balance. If you feel the need to pass query options such as $select, $expand, $filter, $top, $skip... through to your stored procedure, either change your implementation so it is constructed purely in LINQ (so no SP), or change the client implementation so that you pass formal parameters that can be handled directly in the SP.
I have a grid view with complex filtering options; the query is built with an IQueryable on my DbContext. This works fine: the table can contain a lot of data, but the result is filtered with sane paging options.
I now have a requirement to implement an export feature that must work with all the filtering options available for the grid.
Let's say the table might contain a million rows.
I think I will run into performance issues by executing this with EF.
I could also create a stored procedure, but that would be quite complex, and the logic would essentially duplicate the C# code.
Would it be a good idea to instead reuse the existing logic that builds the IQueryable and generate the query string using
((System.Data.Objects.ObjectQuery)myIqueryable).ToTraceString()
and then run the generated query with ExecuteStoreQuery (or any other way to run SQL directly)?
I tested this and it seems to work fine, but I am not sure how the performance compares to a stored procedure, or whether I will run into issues I did not think of.
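For reference, a minimal sketch of the approach described above (ObjectContext API; ExportRow is a hypothetical result type, and note that ToTraceString does not embed parameter values, so they must be forwarded by hand):

// using System.Linq; using System.Data.SqlClient;
// Capture the SQL that EF would execute for the filtered grid query
var objectQuery = (System.Data.Objects.ObjectQuery)myIqueryable;
string sql = objectQuery.ToTraceString();

// Re-run it directly, forwarding the captured parameters
var parameters = objectQuery.Parameters
    .Select(p => new SqlParameter(p.Name, p.Value))
    .ToArray();
var rows = context.ExecuteStoreQuery<ExportRow>(sql, parameters).ToList();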
I have some legacy code that uses Entity Framework.
When I debug the code I can see that the EF DbContext contains the whole table. It was passed by OData to the frontend, and then Angular processed it.
So I tried to find out: is it possible to get only a single record with EF?
Everywhere I see the SingleOrDefault method, or other IQueryable methods, but as I understood it, these operate on collections.
Microsoft says: Sometimes the value of default(TSource) is not the default value that you want to use if the collection contains no elements.
Does that mean EF always gets all the data from the table for me to use later?
Or is there a way to force the inner query to fetch one, and only one, row?
We are using PostgreSQL.
With Entity Framework you can use LINQ to run queries and get single records or limited sets. However, in your .NET project the controller should be parsing the OData query parameters and filtering the dataset before returning results to the client application. Please check your controller code against this tutorial to see if you might be missing something.
If you are somehow bypassing the built-in OData framework, what might help is understanding which queries execute immediately versus which ones are deferred. See this list to understand exactly which operations will force a trip to the database, and try to hold off on anything with immediate execution until as late as possible.
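For example, the split looks roughly like this (Orders and CustomerId are hypothetical):

// Deferred: building the query causes no database round-trip yet
var query = context.Orders.Where(o => o.CustomerId == customerId);

// Immediate: each of these executes the query right now
var list  = query.ToList();
var first = query.FirstOrDefault();
var count = query.Count();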
No, EF will not SELECT the entire table into memory if you use it correctly. By correctly, I mean:
context.Table.First();
This will translate into a SQL query that returns only one row, which is then mapped to an object and returned to the calling code. That's because the above code uses LINQ-to-Entities. If you did something like this instead:
context.Table.ToList().First();
Then the entire table is selected to make ToList work, and LINQ-to-Objects handles the First. So as long as you do your queries with lazy enumeration (not realizing the result ahead of time), you'll be fine.
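The same point applies to single-record lookups; a small sketch (Users and Id are hypothetical; the SQL shown is roughly what a PostgreSQL provider emits):

// Translated to SQL: only one row crosses the wire.
// Roughly: SELECT ... FROM "Users" WHERE "Id" = @id LIMIT 2
// (LIMIT 2 lets SingleOrDefault detect duplicates)
var user = context.Users.SingleOrDefault(u => u.Id == id);

// LINQ-to-Objects: AsEnumerable() pulls the whole table first - avoid this
var bad = context.Users.AsEnumerable().SingleOrDefault(u => u.Id == id);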
This is more a technical question (about what happens behind the scenes of EF), for my own better understanding of Include.
Does it make the query faster to Include another table when using a Select statement at the end?
ctx.tableOne.Include("tableTwo")
   .Where(t1 => t1.Value1 == "SomeValueFor")
   .Select(res => new {
       res.Value1,
       // explicit names: res.Value1 and res.tableTwo.Value1 would otherwise clash
       TwoValue1 = res.tableTwo.Value1,
       TwoValue2 = res.tableTwo.Value2,
       TwoValue3 = res.tableTwo.Value3,
       TwoValue4 = res.tableTwo.Value4
   });
Might it depend on the number of values included from the other table?
In the example above, 4 out of 5 selected values come from the included table. I wonder if that has any performance impact, good or bad.
So my question is: what is EF doing behind the scenes, and is there a preferred way to use Include when I already know which values I will select?
In your case it doesn't matter whether you use Include(<relation-property-name>) or not, because you don't materialize the values before the Select(<mapping-expression>). If you use SQL Server Profiler (or another profiler) you can see that EF generates exactly the same query in both cases.
The reason is that the data is not materialized in memory before the Select - you are working on an IQueryable, which means EF will generate a single SQL query at the end (when you call First(), Single(), FirstOrDefault(), SingleOrDefault(), ToList(), or use the collection in a foreach statement). If you call ToList() before the Select(), it will materialize the entities from the database into memory, and that is where Include() comes in handy, to avoid making N+1 queries when accessing navigation properties into other tables.
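A sketch of the two situations described, using the question's hypothetical model:

// IQueryable all the way down: Include is redundant; one SQL query with a join
var projected = ctx.tableOne
    .Where(t1 => t1.Value1 == "SomeValueFor")
    .Select(res => new { res.Value1, TwoValue1 = res.tableTwo.Value1 });

// Materializing first: now Include matters; without it, each access to
// e.tableTwo can trigger an extra lazy-loading query - the N+1 problem
var entities = ctx.tableOne.Include("tableTwo")
    .Where(t1 => t1.Value1 == "SomeValueFor")
    .ToList();                                   // entities loaded here
var shaped = entities.Select(e => new { e.Value1, TwoValue1 = e.tableTwo.Value1 });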
It is about how you want EF to load your data. If you want a table's data to be pre-populated, use Include. It is handier when the Included table is going to be used frequently, though it will be a little slower, as EF has to load all the relevant data beforehand. Read about the difference between lazy and eager loading: using Include gives you eager loading, where the data is pre-populated, while with lazy loading EF sends a call to the secondary table only when the projection takes place.
I agree with @Karamfilov's general discussion, but in your example the query might not be the most performant. Performance can be affected by many factors, such as the indexes present on the table, but you must always help EF with the SQL generation. The Include method can produce SQL that selects all columns of the table, so you should always check the generated SQL and verify whether you can obtain a better query using a Join.
This article explains the techniques that can be used and what impact they have on performance: https://msdn.microsoft.com/it-it/library/bb896272(v=vs.110).aspx
What is the benefit of writing a custom LINQ provider over writing a simple class which implements IEnumerable?
For example, this question shows Linq2Excel:
var book = new ExcelQueryFactory(@"C:\Users.xls");
var administrators = from x in book.Worksheet<User>()
where x.Role == "Administrator"
select x;
But what is the benefit over the "naive" implementation as IEnumerable?
A LINQ provider's purpose is basically to "translate" LINQ expression trees (which are built behind the scenes of a query) into the native query language of the data source. In cases where the data is already in memory, you don't need a LINQ provider; LINQ to Objects is fine. However, if you're using LINQ to talk to an external data store like a DBMS or a cloud service, it's absolutely essential.
The basic premise of any querying structure is that the data source's engine should do as much of the work as possible, returning only the data the client needs. This is because the data source is assumed to know best how to manage the data it stores, and because network transport of data is relatively expensive time-wise and should therefore be minimized. In reality, that second part is "return only the data asked for by the client"; the server can't read your program's mind and know what it really needs; it can only give what it's asked for.
Here's where an intelligent LINQ provider absolutely blows away a "naive" implementation. Using the IQueryable side of LINQ, which generates expression trees, a provider can translate the expression tree into, say, a SQL statement that the DBMS uses to return exactly the records the LINQ statement asks for. A naive implementation would have to retrieve ALL the records using some broad SQL statement in order to hand the client a list of in-memory objects, and then all the filtering, grouping, sorting, etc. would be done by the client.
For example, let's say you were using Linq to get a record from a table in the DB by its primary key. A Linq provider could translate dataSource.Query<MyObject>().Where(x=>x.Id == 1234).FirstOrDefault() into "SELECT TOP 1 * from MyObjectTable WHERE Id = 1234". That returns zero or one records. A "naive" implementation would probably send the server the query "SELECT * FROM MyObjectTable", then use the IEnumerable side of Linq (which works on in-memory classes) to do the filtering. In a statement you expect to produce 0-1 results out of a table with 10 million records, which of these do you think would do the job faster (or even work at all, without running out of memory)?
You don't need to write a LINQ provider if you only want to use the LINQ-to-Objects (i.e. foreach-like) functionality for your purpose, which mostly works against in-memory lists.
You do need to write a LINQ provider if you want to analyse the expression tree of a query in order to translate it into something else, such as SQL. The ExcelQueryFactory you mentioned seems to work with an OLEDB connection, for example. That probably means it doesn't need to load the whole Excel file into memory when querying its data.
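The difference is visible in plain C#: a delegate is opaque executable code, while an expression tree is data that a provider can walk:

using System;
using System.Linq.Expressions;

class Demo
{
    static void Main()
    {
        Func<int, bool> compiled = x => x > 5;          // can only be invoked
        Expression<Func<int, bool>> tree = x => x > 5;  // can be analysed

        var body = (BinaryExpression)tree.Body;
        Console.WriteLine(body.NodeType);   // GreaterThan
        Console.WriteLine(body.Left);       // x
        Console.WriteLine(body.Right);      // 5
        // A provider walks nodes like these and emits, say, "WHERE x > 5".
    }
}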
In general: performance. If you have some kind of index, you can run a query much faster than is possible on a simple IEnumerable<T>.
LINQ-to-SQL is a good example of that. Here you transform the LINQ statement into another form, understood by the SQL server. The server then does the filtering, ordering, etc. using its indexes, and doesn't need to send the whole table to the client for filtering with LINQ-to-Objects.
But there are simpler cases where it can be useful too:
If you have a tree index over the property Time, then a range query like .Where(x => x.Time >= now && x.Time <= tomorrow) can be optimized a lot and doesn't need to iterate over every item in the enumerable.
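A sketch of that idea over a plain sorted array (items is assumed pre-sorted by Time with distinct timestamps; Array.BinarySearch returns the bitwise complement of the insertion point on a miss):

// Two binary searches bound the range - no full scan of the enumerable
DateTime[] keys = items.Select(i => i.Time).ToArray();   // sorted by Time

int lo = Array.BinarySearch(keys, now);
if (lo < 0) lo = ~lo;              // first index with Time >= now

int hi = Array.BinarySearch(keys, tomorrow);
if (hi < 0) hi = ~hi; else hi++;   // first index with Time > tomorrow

var range = items.Skip(lo).Take(hi - lo);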
LINQ provides deferred execution wherever possible, to improve performance.
IEnumerable<> and IQueryable<> lead to quite different implementations: IQueryable builds an expression tree dynamically and translates it into a native query, which indeed gives better performance than IEnumerable.
http://msdn.microsoft.com/en-us/vcsharp/ff963710.aspx
If you are not sure of the type, you can use the var keyword and the compiler will infer the most suitable type.