Using a delegate CompiledQuery in a join? - c#

I have a question related to this previous question of mine. In an existing bit of LINQ which involves a number of joins, I'm trying to take each separate method comprising the join and convert it to a CompiledQuery.
First, the normal LINQ method:
private IQueryable<Widget> GetWidgetQuery()
{
return db.Widgets.Where(u => (!u.SomeField.HasValue || !u.SomeField.Value));
}
And here, a delegate (field) definition for a CompiledQuery along these lines:
private static readonly Func<DBDataContext, IQueryable<Widget>> GetWidgetQuery =
CompiledQuery.Compile((DBDataContext db) =>
db.Widgets.Where(u => (!u.SomeField.HasValue || !u.SomeField.Value)));
If I hover over the normal LINQ statement for the method GetWidgetQuery(), I see that it's a method as below:
(method) IQueryable<Widget> GetWidgetQuery()
However, the compiled query delegate (field) differs as follows:
(field) Func<DBDataContext, IQueryable<Widget>> GetWidgetQuery
Upon executing the latter as part of the LINQ statement, the syntax differs as follows. First, the normal LINQ's participation in the join:
var myquery =
from wxr in GetWidgetXRQuery()
join w in GetWidgetQuery() on wxr.WidgetID equals w.ID
select new DTO.WidgetList
{
...
}
And here, the invocation of the CompiledQuery in the form of the delegate:
var myquery =
from wxr in GetWidgetXRQuery()
join w in GetWidgetQuery.Invoke(myContext) on wxr.WidgetID equals w.ID
select new DTO.WidgetList
{
...
}
The former returns the expected result set; the latter, when I attempt myquery.ToList(), yields a stackoverflow exception, in part related to this limitation of .NET 3.5, I think.
Can someone please help me understand how the compiled statement existing as a field (or I guess I should say a delegate) rather than a method is killing my query? In short I know what I'm doing is wrong, but I'm not sure I understand what I misunderstand.

I tried doing roughly the same thing you're doing on EF 4, and everything seems to work fine. So it's either an EF 3.5 issue, or it has something to do with your implementation of GetWidgetXRQuery, or some combination of the two.
But the real point I'd like to make is that, as Roy Goode stated in an answer to your previous question, you lose all the advantages of a precompiled query once you extend that query in any way. By trying to perform a Join on your query, you are converting it to just a plain old query. So you might as well just use the non-compiled version which appears to work for you.
Update
Realized you were talking about LINQ to SQL. This sort of query does appear to have support in Entity Framework, but not LINQ to SQL. In .NET 4, I'm getting the following error:
An IQueryable that returns a self-referencing Constant expression is not supported.
That doesn't mean much to me, but I'm guessing that it has something to do with the way the compiled query is represented internally. I still get the same error if I evaluate the query into a variable and use that variable in the query later, so it clearly has nothing to do with the difference between a delegate and a function. I still maintain that a compiled query is not appropriate to use here. Either you need to create one big compiled query to represent the whole query you want to perform, or you need to use regular queries if you want to piece them together this way.

I just came across this same error while doing db integration testing, and to jump straight to the point without trying to explain my specific issue. Linq to Sql will create the sql query internally when using IQueryable and the moment you execute a method on that IQueryable, i.e. ToList() it executes that query on the database. So in my case I am joining to a method that returns IQueryable but is mocked to return a result, it is trying to compile that to a sql query but the IQueryable I created does not have an internal SQL query

Related

Using IQueryable with record type - InvalidOperationException

I am getting an InvalidOperationException when using a projection into a record type (.Net 7). The query works fine if the where clause is chained. It also works if where clause is not chained and the projection is not into a record (class, anonymous type both ok).
public record TestDto(
int CellLocationId,
string AreaCode
);
This works - Chained:
var query = _context.myTable.Where(x => x.AreaCode == areaCode).Select(s => new TestDto(s.CellLocationId, s.AreaCode));
This fails - Unchained with record:
var query = _context.myTable.Select(s => new TestDto(s.CellLocationId, s.AreaCode));
query = query.Where(x => x.AreaCode == areaCode);
This works - Unchained with anonymous type:
var query = _context.myTable.Select(s => new {s.CellLocationId, s.AreaCode});
query = query.Where(x => x.AreaCode == areaCode);
When projecting into a record the sql shown in the error appears as:
System.InvalidOperationException: The LINQ expression 'DbSet().Where(v => new TestDto(v.CellLocationId, v.AreaCode).AreaCode == __areaCode_0)' could not be translated.
Why can I not use a record when my IQueryable is using an unchained where clause?
15/11/2022 Update.
Github issue
Possible workarounds - project into class, anonymous type or as per Bagus Tesa answer (and my preferred option) project into record type at end of statement.
About IEnumerable and IQueryable
IQueryable extends IEnumerable
IEnumerable most of the time works in memory, with some exceptions
IQueryable will be translated into SQL Query
Both IQueryable and IEnumerable execution are delayed until they are materialized (e.g. using ToList, AsEnumerable on IQueryable, etc.)
The API is designed this way for convenience in working with database queries. It allows (somewhat) interchangeable linq chains between the two. However, there are some caveats that need to be kept in mind.
IQueryable can only infer whats within its Expression. You can't for example do something like this:
public int GetAge(DateTime date)
{
...
}
var firstUserAge = _context.users.First().Select(q => GetDate(q.DoY));
It won't work, see QA. Linq is simply unable to translate the GetDate method into SQL Query.
The Problem
Let's take a look with the third case (anonymous class).
var query = _context.myTable
.Select(s => new {s.CellLocationId, s.AreaCode});
query = query.Where(x => x.AreaCode == areaCode);
Up to _context.MyTable the only known type is simply DbSet<myTable>. However, as it reach the Select(s => ...) it became aware of the anonymous type. Linq the consider that query should be an IQueryable<..__AnonymousType..>. Given the complete information is given to it. Linq can translate it into (roughly) sql query shown below and the result snuggly fits the anonymous type.
select s.cellLocationId, s.AreaCode
from myTable as s
where s.AreaCode = ....
Now, let's see what happens in the first case which surprisingly works.
var query = _context.myTable
.Select(s => new TestDto(s.CellLocationId, s.AreaCode))
.Where(x => x.AreaCode == areaCode);
Hang on a second! Did not I previously mentioned that you can't call a method on your program's side from within an IQueryable?
Yes, your code above won't work in the past, see discussion. You can find numerous examples of similar problem. You can also see examples on EF Core 7 documentation that they used anonymous types all the time. The microsoft's official tutorial on DTO also assigns properties manually one-by-one without using constructor.
However, EF Core 7 is pretty new. It is said to have better performance compared to earlier EF Core. The changes may involve reworks on how it materialize IQueryable. There are plenty of issues need to be sorted out.
You should run query intercepts and see the queries being feed into the database to see the difference between the two cases above. I'd bet that the first case actually fetch everything in myTable.
What We Should Do
We can simply map into DTO at a later part of the code (at the very end of the chain).
var query = _context.MyTable;
if(someFlag)
{
query = query.Where(s => s.AreaCode = areaCode);
}
//... a bunch more filters
return query.AsEnumerable()
.Select(s => new TestDto(s.CellLocationId, s.AreaCode));
This way, you can avoid the bug altogether while keeping most of the heavy-lifting on the database.
You may complain "I will have duplicate code for computing AreaCode." You have to remember, DTO in the first place does not contain business logic.
Regarding what Bagus Tesa wrote, the most important parts there are:
However, EF Core 7 is pretty new. It is said to have better performance compared to earlier EF Core. The changes may involve reworks on how it materialize IQueryable. There are plenty of issues need to be sorted out.
and
We can simply map into DTO at a later part of the code (at the very end of the chain).
I do not like how that answer is written, and I started writing my own, and then, frankly, I ended up with my own wall of text, and I don't like it just the same as their answer. Oh well. Maybe someone finds it useful. I mark it with community-wiki, because I don't really want to compete. It's more like, IDK, addendum?
Regarding the latter point, there have been some issues with early-selects since the dawn of EF. Select-as-custom-class is a special construct designed to allow your query to return your custom entity types (instead of anon bags of fields), especially in cases where you Join several tables, where the result cannot be simply the context.table you start the query from.
Thanks to how IQueryable is designed, you can moreless put such a .Select anywhere in your query, but there's a catch: once IQueryable is now <of-you-custom-type>, your custom properties are exposed instead of original tables columns, and it gets increasingly harder to guarantee that your code doesn't try to map/execute something untranslatable.
For example, imagine:
// case1:
var result = context.myTable
.Where(x => x.AreaCode == areaCode)
.Select<MyClass>(...)
.ToList();
// case2:
var result = context.myTable
.Select<MyClass>(...)
.Where(x => x.AreaCode == areaCode)
.ToList();
In the first case, it's obvious. AreaCode is a column from MyTable, defined by whatever type of MyTable.
In second case, it's AreaCode but from MyClass, and since someone wanted to introduce a new type, we can assume it's not mapped in EF mappings to the same table as MyTable. So if EF hanles that, it may have to so some guessing what MyClass.AreaCode really is.
Furthermore, what if MyClass.AreaCode was:
private string theCode = "5";
public string AreaCode {
set { theCode = value; }
get { return theCode + "1"; }
}
In that case, case1 would still function properly. And case2 can never function properly. Not even mentioning, custom class constructors, we write .Select(.. => new MyClas(...)) and it's very tempting to hide some custom logic into the ctor, even if it's as simple as if/throw/ ArgumentNullException.
If EF handles such cases it in ANY way (like: assume props/ctors are transparent, ignoring logic inside), results will be unintuitive to the programmer: MyClass used in Query will behave differently than MyClass used in the runtime. EF shouldn't handle it, it should throw, but determining that a getter or constructor is pass-through without custom logic isn't that easy.
That's why anon-types new {foo=, bar=} are always preferred (2) as the intermediate types - they are 100% compiler-generated, the compiler knows everything about them, the compiler knows all fields/properties are plain and transparent, and that's why it's not a huge problem if you put .Select<anon> even early in the LINQ query. (1)
But let's get back to your code.
In that record-related code samples you provided:
// first:
var query = _context.myTable
.Select(s => new TestDto(s.CellLocationId, s.AreaCode))
.Where(x => x.AreaCode == areaCode);
// second:
var query = _context.myTable
.Select(s => new TestDto(s.CellLocationId, s.AreaCode));
query = query.Where(x => x.AreaCode == areaCode);
These pieces of code, if there is really nothing else between context:get_myTable and IQueryable<record>:Where, something that you'd for some reason omit, then these two pieces of code are essentially the same. Or should be.
When we write return " mom ".ToUpper().Trim(); we don't expect it to have different results than if we write var x = " mom ".ToUpper(); x = x.Trim(); return x;, right? That's all complete basics of operation/result/variable semantics.
The only difference here is splitting such unmaterialized IQueryable 'result' of a .Select() to a variable. LINQ call chain is exactly the same. Names are exactly the same. Difference is purely syntactical. So if you see that one of them works, and the other fails, this means there's a bug in the compiler, and you should:
file a bug issue at roslyn github
get away from whatever this bug can be related to as far as possible (meaning not use records yet, or at least not early in the query)
or, at least test the hell out of it, so you know how to use it safely even bugged, and get as much of those tests automated
There's really not much else to be advised here.
(1) actually sometimes it is problematic. Every ".Select" in the chain changes the "shape of the query", as they call it sometimes in MSDN articles, and it usually breaks any .Include setups that were done on the original shape. But YMMV there.
(2) anon-types are perfect here from the LINQ point of view, but are often a nightmare for the developer who wants to do anything a bit more reusable or structured. You cannot return them easily from your methods, since the type is anonymous. And so on. This severely limits reuse, and I can easily imagine, that's why RecordTypes are introduced - to provide a plain type with guaranteed mechanics, a type that can be named and easily referred to.

Why is this caught in running time?

what is the problem with the following code?
var results = db.Departments.Select(x => customfunction(x.DepartmentID));
results.toList();
Why this is this not caught in compile time but gives an exception in runtime?
There are many different LINQ providers that allow you to use LINQ against a variety of data sources (LINQ to Entities, LINQ to Objects, LINQ to XML among others).
Although LINQ to Entities does not know how to invoke your custom method (at least, not the providers for common databases), some LINQ providers might well understand how to execute myfunction. The compiler is not integrated with all of the many LINQ providers, so the information as to whether your custom method can be included is only available at runtime.
In fact, LINQ to Objects can execute it
var results = db.Departments
.AsEnumerable()
.Select(x => myfunction(x.DepartmentID));
The conversion of
db.Departments.Select(x => myfunction(x.DepartmentID));
to sql does not happen until very late in the process at runtime.
So at compilation time Entity Framework does not know that myFunction is not something it is not going to know how to convert to sql yet.
of course you could do
db.Departments.Select(x => (x * 2));
And EF will happily be able to translate to sql but if you try to use a custom property on Departments object that doesn't map to a field in the db or use a custom method that that the Linq data provider does not know how to translate to sql you will get the same error.
The idea is simple: .Select(x => myfunction(x.DepartmentID)) is never executed in C#. Instead, Entity Framework parses this expression and translates it to SQL.
For example, if you call .Count() in C# EF will translate this to SELECT COUNT(...)
This is all very powerful, except for a few cases such as yours.
EF analyzes your code through Expression Trees.
If you call myFunction() in an expression tree, it is impossible to "expand" that call to get the body of your function. This is a limitation of how C# is implemented. That's why EF is unable to generate sensible SQL, because it has no clue what your method does.
However, there is a reasonable alternative: LinqKit
LinqKit will allow you to write "functions" that are reusable in your expressions.
Example:
static string[] QueryCustomers (Expression<Func<Purchase, bool>> purchaseCriteria)
{
var data = new MyDataContext();
var query = from c in data.Customers.AsExpandable()
where c.Purchases.Any (purchaseCriteria.Compile())
select c.Name;
return query.ToArray();
}

Linq2Entities CompiledQuery for query that uses joins

I have a query that is not performing too well, e.g. the generated SQL code is sub-optimal.
The original statement looked something like this (simplified):
ctx.Table1.Where(t => ...)
.OrderBy(t => ....)
.Select(t => new {Table1 = t,
SomeProperty = t.Table2.SomeProperty,
SomeProperty2 = t.Table2.SomeProperty2,
AnotherProperty = t.Table3.AnotherProperty,
...
}
I looked in SQL Profiler and found that the generated SQL would join the same table multiple times and the statement would take around 1 second to execute.
I then rewrote the statement to something along these lines:
from t in ctx.Table1
join t2 in ctx.Table2 on t.key equals t2.key into lt2
from t2 in lt2.DefaultIfEmpty()
join t3 in ctx.Table3 on t.key equals t3.key into lt3
from t3 in lt3.DefaultIfEmpty()
where t ...
orderby t...
select new {Table1 = t, .... }
This generated a much nicer statement, that when grabbed from SQL profiler and executed in Management studio is double as fast as the statement generated by the code in the previous example.
However when running the code from the second example, the time taken for EF to generate the expression far superseeds the time gained from the query optimization.
So how do I go about writing statement number two as a CompiledQuery. I basically don't know how to return an anonymous type from a CompiledQuery.
A workaround I found for using CompiledQueries is:
Add a private InitQueryX() method before each QueryX() method that uses LINQ to Entity.
Use attributes and reflection call all the InitQueryX() methods from an Init() method.
Call Init() method once on app start.
This forces compilation of queries at start, yet enables writing queries in a more flexible manner then CompiledQueries do.
The InitQueryX() should use multiple dummy inputs so that it covers all the paths within QueryX() method (sort of like unit tests code coverage).
When possible, the InitQueryX()'s inputs should be mocks that result in 0 rows in database, so that the Init() method will take less time to run.
You can use Tuple class if your return object will have 8 or less properties. If you have more properties and don't want to declare a class for those properties you can use dynamic as the return type.

mixing c# functions with Linq-To-SQL conditions

I am having trouble mixing c# functions with conditions in Linq-To-SQL
suppose i have a database table "things" and a local c# Function : bool isGood(thing, params)
I want to use that function to select rows from the table.
var bad = dataContext.Things.Where(t=>t.type=mytype && !isGood(t,myparams))
dataContect.Things.deleteAllOnSubmit(bad);
or
if (dataContext.Things.Any(t=>t.type=mytype && isGood(t,myparams)))
{
return false;
}
Of course this does not work, Linq has no way of translating my function into a SQL statement. So this will produce:
NotSupportedException: Method 'Boolean isGood(thing,params)' has no supported translation to SQL.
What is the best way to redesign this so that it will work?
I can split the statements and convert to list like this:
List<Things> mythings dataContext.Things.Where(t=>t.type=mytype).toList()
if (mythings.Any(t=>isGood(t,myparams)))
{
return false;
}
this works, but seems inefficient, since the whole list has to be generated in every case.
And I don't think I can do a deleteAllOnSubmit with the result
I could do a foreach over mythings instead of calling toList(), that also works. Seems inelegant though.
What other options do I have, and what would be the recommended approach here?
edit:
calling asEnumerable() seems to be another option, and seems better than toList() at least. I.e.
dataContext.Things.Where(t=>t.type=mytype).asEnumerable().Any(t=>isGood(t,myparams))
Pulling the whole list back from the database to run a local c# function on it might seem inefficient, but that what you'd have to do if your isGood() function is local.
If you can translate your isGood() function into Linq, you could apply it before the toList() call, so it would get translated into SQL and the whole list wouldn't be retrieved.

including "offline" code in compiled querys

What happens behind the curtains when I include a function into my compiled query, like I do with DataConvert.ToThema() here to convert a table object into my custom business object:
public static class Queries
{
public static Func<MyDataContext, string, Thema> GetThemaByTitle
{
get
{
var func = CompiledQuery.Compile(
(MyDataContext db, string title) =>
(from th in elan.tbl_Thema
where th.Titel == title
select DataConvert.ToThema(th)).Single()
);
return func;
}
}
}
public static class DataConvert
{
public static Thema ToThema(tbl_Thema tblThema)
{
Thema thema = new Thema();
thema.ID = tblThema.ThemaID;
thema.Titel = tblThema.Titel;
// and some other stuff
return thema;
}
}
and call it like this
Thema th = Queries.GetThemaByTitle.Invoke(db, "someTitle");
Apparently the function is not translated in to SQL or something (how could it), but it also does not hold when I set a breakpoint there in VS2010.
It works without problems, but I don't understand how or why. What exactly happens there?
Your DataConvert.ToThema() static method is simply creating an instance of a type which has a default constructor, and setting various properties, is that correct? If so, it's not terribly different from:
(from th in elan.tbl_Thema
where th.Titel == title
select new Thema{ID=th.ThemaID, Titel=th.Titel, etc...}
).Single());
When you call Queries.GetThemaByTitle, a query is being compiled. (The way you are calling this, by the way, may or may not actually be giving you any benefits from pre-compiling). That 'Query' is actually a code expression tree, only part of which is intended to generate SQL code that is sent to the database.
Other parts of it will generate IL code which is grabbing what is returned from the database and putting it into some form for your consumption. LINQ (EF or L2S) is smart enough to be able to take your static method call and generate the IL from it to do what you want - and maybe it's doing so with an internal delegate or some such. But ultimately, it doesn't need to be (much) different from what would be generated from I substituted above.
But note that this happens regardless what the type is that you get back; somewhere, IL code is being generated that puts DB values into a CLR object. That is the other part of those expression trees.
If you want a more detailed look at those expression trees and what they involved, I'd have to dig for ya, but I'm not sure from your question if that's what you are looking for.
Let me start by pointing out, that whether you compile your query or not does not matter. You would observe the very same results even if you did not pre-compile.
Technically, as Andrew has pointed out, making this work is not that complicated. When your LINQ expression is evaluated an expression tree is constructed internally. Your function appears as a node in this expression tree. No magic here. You'll be able to write this expression both in L2S and L2E and it will compile and run fine. That is until you try to actually execute the actual SQL query against the database. This is where difference begins. L2S seems to happily execute this task, whereas L2E fails with NotSupportedException, and reporting that it does not know how to convert ToThema into store query.
So what's happening inside? In L2S, as Andrew has explained, the query compiler understands that your function can be run separately from the store query has been executed. So it emits calls to your function into the object reading pipeline (where data read from SQL is transformed to the objects that are returned as the result of your call).
Once thing Andrew was not quite right, is that it matters what's inside your static method. I don't think it does.
If you put a break point in the debugger to your function, you will see that it's called once per returned row. In the stack trace you will see "Lightweight Function", which, in reality, means that the method was emitted at run time. So this is how it works for Linq to Sql.
Linq to Entity team seemed to go different route. I do not know, what was the reasoning, why they decided to ban all InvocationExpressions from L2E queries. Perhaps these were performance reason, or may be the fact that they need to support all kind of providers, not SQL Server only, so that data readers might behave differently. Or they simply thought that most people wouldn't realize that some of those are executed per returned row and preferred to keep this option closed.
Just my thoughts. If anyone has any more insight, please chime in!

Categories