IQueryable OrderBy with Func<TModel, TValue> : what's happening

IQueryable OrderBy with Func<TModel, TValue> : what's happening - c#

With the code given from this question
OrderBy is not translated into SQL when passing a selector function
Func<Table1, string> f = x => x.Name;
var t = db.Table1.OrderBy(f).ToList();
The translated SQL is:
SELECT
[Extent1].[ID] AS [ID],
[Extent1].[Name] AS [Name]
FROM [dbo].[Table1] AS [Extent1]
OK.
I can understand that the code compiles : IQueryable inherits from IEnumerable, which have an OrderBy method taking a Func<TModel, TValue> as parameter.
I can understand that the ORDER BY clause is not generated in SQL, as we didn't pass an Expression<Func<TModel, TValue>> as the OrderBy parameter (the one for IQueryable)
But what happens behind the scene ? What happens to the "wrong" OrderBy method ? Nothing ? I can't see how and why... Any light in my night ?

Because f is a delegate rather than an expression, the compiler picks the IEnumerable OrderBy extension method instead of the IQueryable one.
This then means that all the results are fetched from the database, because the ordering is then done in memory as if it were Linq to Objects. That is, in-memory, the ordering can only be done by fetching all the records.
Of course, in reality this still doesn't actually happen until you start enumerating the result - which in your case you do straight away because you eager-load the result with your call to ToList().
Update in response to your comment
It seems that your question is as much about the IQueryable/IEnumerable duality being 'dangerous' from the point of view of introducing ambiguity. It really isn't:
t.OrderBy(r => r.Field)
C# sees the lambda as an Expression<> first and foremost so if t is an IQueryable then the IQueryable extension method is selected. It's the same as a variable of string being passed to an overloaded method with a string and object overload - the string version will be used because it's the best representation.
As Jeppe has pointed out, it's actually because the immediate interface is used, before inherited interfaces
t.AsEnumerable().OrderBy(r => r.Field)
C# can't see an IQueryable any more, so treats the lambda as a Func<A, B>, because that's it's next-best representation. (The equivalent of only an object method being available in my string/object analogy before.
And then finally your example:
Func<t, string> f = r => r.Field;
t.OrderBy(f);
There is no possible way that a developer writing this code can expect this to be treated as an expression for a lower-level component to translate to SQL, unless the developer fundamentally doesn't understand the difference between a delegate and an expression. If that's the case, then a little bit of reading up solves the problem.
I don't think it's unreasonable to require a developer to do a little bit of reading before they embark on using a new technology; especially when, in MSDN's defence, this particular subject is covered so well.
I realise now that by adding this edit I've now nullified the comment by #IanNewson below - but I hope it provides a compelling argument that makes sense :)

But what happens behind the scene?
Assuming db.Table1 returns an Table<Table1>, the compiler will:
Check whether Table<T> has an OrderBy method - nope
Check whether any of the base classes or interfaces it implements has an OrderBy method - nope
Start looking at extension methods
It will find both Queryable.OrderBy and Enumerable.OrderBy as extension methods which match the target type, but the Queryable.OrderBy method isn't applicable, so it uses Enumerable.OrderBy instead.
So you can think of it as if the compiler has rewritten your code into:
List<Table1> t = Enumerable.ToList(Enumerable.OrderBy(db.Table1, f));
Now at execution time, Enumerable.OrderBy will iterate over its source (db.Table1) and perform the appropriate ordering based on the key extraction function. (Strictly speaking, it will immediately return an IEnumerable<T> which will iterate over the source when it's asked for the first result.)

The queryable returns all records (hence no WHERE clause in the SQL statement), and then the Func is applied to the objects in the client's memory, through Enumerable.OrderBy. More specifically, the OrderBy call resolves to Enumerable.OrderBy because the parameter is a Func. You can therefore rewrite the statement using static method call syntax, to make it a bit clearer what's going on:
Func<Table1, string> f = x => x.Name;
var t = Enumerable.OrderBy(db.Table1, f).ToList();
The end result is that the sort specified by OrderBy is done by the client process rather than by the database server.

This answer is a kind of comment to Andras Zoltan's answer (but this is too long to fit in the comment format).
Zoltan's answer is interesting and mostly correct, except the phrase C# sees the lambda as an Expression<> first and foremost [...].
C# sees a lambda (and any anonymous function) as equally "close" to a delegate and the Expression<> (expression tree) of that same delegate. According to the C# specification, neither is a "better conversion target".
So consider this code:
class C
{
public void Overloaded(Expression<Func<int, int>> e)
{
Console.WriteLine("expression tree");
}
public void Overloaded(Func<int, int> d)
{
Console.WriteLine("delegate");
}
}
Then:
var c = new C();
c.Overloaded(i => i + 1); // will not compile! "The call is ambiguous ..."
So the reason why it works with IQueryable<> is something else. The method defined by the direct interface type is preferred over the method defined in the base interface.
To illustrate, change the above code to this:
interface IBase
{
void Overloaded(Expression<Func<int, int>> e);
}
interface IDerived : IBase
{
void Overloaded(Func<int, int> d);
}
class C : IDerived
{
public void Overloaded(Expression<Func<int, int>> e)
{
Console.WriteLine("expression tree");
}
public void Overloaded(Func<int, int> d)
{
Console.WriteLine("delegate");
}
}
Then:
IDerived x = new C();
x.Overloaded(i => i + 1); // compiles! At runtime, writes "delegate" to the console
As you see, the member defined in IDerived is chosen, not the one defined in IBase. Note that I reversed the situation (compared to IQueryable<>) so in my example the delegate overload is defined in the most derived interface and is therefore preferred over the expression tree overload.
Note: In the IQueryable<> case the OrderBy methods in question are not ordinary instance methods. Instead, one is an extension method to the derived interface, and the other is an extension method to the base interface. But the explanation is similar.

Related

Do interfaces not care about the name of the method when using Lambda expressions?

I guess I am a little confused by a couple of lines of code I have seen.
The first one:
IEnumerable<SignalViewModel> SelectionQuery =
from SignalViewModel svm in AvailableSignalsListView.SelectedItems
orderby AvailableSignalsListView.Items.IndexOf(svm)
select svm;
_GraphViewerViewModel.SelectAvailableSignals(SelectionQuery);
makes sense to me because the orderby defined here:
http://msdn.microsoft.com/en-us/library/bb534966%28v=vs.110%29.aspx
takes a source and a key selector. I am guessing that the query portion
orderby AvailableSignalsListView.Items.IndexOf(svm) creates a lambda expression and calls
Enumerable.OrderBy<TSource, TKey> Method (IEnumerable<TSource>, Func<TSource, TKey>)
with the IEnumerable<TSource> being AvailableSignalsListView.SelectedItems
and the Func<TSource, TKey> being a lambda expression created from the query statement along the lines of
svm => AvailableSignalsListView.Items.IndexOf(svm);
This one makes sense to me though i'm uncertain if my understanding of the details is correct.
The second one,
List<SignalViewModel> mySelectedItems = new List<SignalViewModel>();
//sort version
foreach (SignalViewModel svm in AvailableSignalsListView.SelectedItems)
{
mySelectedItems.Add(svm);
}
mySelectedItems.Sort((x,y) => AvailableSignalsListView.Items.IndexOf(x).CompareTo(AvailableSignalsListView.Items.IndexOf(y)));
doesn't make sense because sort takes an icomparer. I can accept that an icomparer only needs a function as defined by its interface (that takes two items and returns an int). However, how does the computer know what function name to give the delegate behind the scenes? or do interfaces not actually care about the names of functions?

There are several overloads to List.Sort. One that takes an IComparer interface, and one that takes a Comparison delegate. This is using the latter.
If you had a method that only accepted an interface you couldn't pass in a delegate; that wouldn't work.

When is ObjectQuery really an IOrderedQueryable?

Applied to entity framework, the extension methods Select() and OrderBy() both return an ObjectQuery, which is defined as:
public class ObjectQuery<T> : ObjectQuery, IOrderedQueryable<T>,
IQueryable<T>, <... more interfaces>
The return type of Select() is IQueryable<T> and that of OrderBy is IOrderedQueryable<T>. So you could say that both return the same type but in a different wrapper. Luckily so, because now we can apply ThenBy after OrderBy was called.
Now my problem.
Let's say I have this:
var query = context.Plots.Where(p => p.TrialId == 21);
This gives me an IQueryable<Plot>, which is an ObjectQuery<Plot>. But it is also an IOrderedQueryable:
var b = query is IOrderedQueryable<Plot>; // True!
But still:
var query2 = query.ThenBy(p => p.Number); // Does not compile.
// 'IQueryable<Plot>' does not contain a definition for 'ThenBy'
// and no extension method 'ThenBy' ....
When I do:
var query2 = ((IOrderedQueryable<Plot>)query).ThenBy(p => p.Number);
It compiles, but gives a runtime exception:
Expression of type 'IQueryable`1[Plot]' cannot be used for parameter of type 'IOrderedQueryable`1[Plot]' of method 'IOrderedQueryable`1[Plot] ThenBy[Plot,Nullable`1](IOrderedQueryable`1[Plot], Expressions.Expression`1[System.Func`2[Plot,System.Nullable`1[System.Int32]]])'
The cast is carried out (I checked), but the parameter of ThenBy is still seen as IQueryable (which puzzles me a bit).
Now suppose some method returns an ObjectQuery<Plot> to me as IQueryable<Plot> (like Select()). What if I want to know whether it is safe to call ThenBy on the returned object. How can I figure it out if the ObjectQuery is "real" or a "fake" IOrderedQueryable without catching exeptions?

Expression Trees are genuinely good fun! (or perhaps I'm a little bit of a freak) and will likely become useful in many a developer's future if Project Roslyn is anything to go by! =)
In your case, simple inherit from MSDN's ExpressionVisitor, and override the VisitMethodCall method in an inheriting class with something to compare m.MethodInfo with SortBy (i.e. if you're not too fussy simply check the name, if you want to be fussy use reflection to grab the actual SortBy MethodInfo to compare with.
Let me know if/what you need examples of, but honestly, after copy/pasting the ExpressionVisitor you'll probably need no more than 10 lines of non-expression-tree code ;-)
Hope that helps

Although Expression Trees are good fun, wouldn't in this case the simple solution be to use OrderBy rather than ThenBy?
OrderBy is an extension on IQueryable and returns an IOrderedQueryable.
ThenBy is an extension on IOrderedQueryable and returns an IOrderedQueryable.
So if you have a IQueryable (as in your case above, where query is an IQueryable) and you want to apply an initial ordering to it, use OrderBy. ThenBy is only intended to apply additional ordering to an already ordered query.
If you have a LINQ result of some kind, but you aren't sure if it is an IQueryable or an IOrderedQueryable and want to apply additional filtering to it, you could make two methods like:
static IOrderedQueryable<T, TKey> ApplyAdditionalOrdering<T, TKey>(this IOrderedQueryable<T, TKey> source, Expression<Func<T, TFilter>> orderBy)
{
return source.ThenBy(orderBy);
}
And
static IOrderedQueryable<T, TKey> ApplyAdditionalOrdering<T, TKey>(this IQueryable<T> source, Expression<Func<T, TFilter>> orderBy)
{
return source.OrderBy(orderBy);
}
The compiler will figure out the correct one to call based on the compile-time type of your query object.

Significance of AsEnumerable?

var query =
from dt1 in dtStudent.AsEnumerable()
join dt2 in dtMarks.AsEnumerable()
on dt1.Field<int>("StudentID")
equals dt2.Field<int>("StudentID")
select new StudentMark
{
StudentName = dt1.Field<string>("StudentName"),
Mark = dt2.Field<int>("Mark")
};
In the above coding, what is the significance of AsEnumerable? if the AsEnumerable doesn't exist in .NET Framework, then what would be the approach of developers to perform the above the task?

Assuming I'm interpreting it correctly, it's calling DataTableExtensions.AsEnumerable(). Without that (or something similar), you can't use LINQ to Objects as DataTable doesn't implement IEnumerable<T>, only IEnumerable.
Note that an alternative would be to use Cast<DataRow>, but that would be subtly different as that would use the DataTable's GetEnumerator directly within Cast, whereas I believe EnumerableRowCollection<TRow> does slightly more funky things with the data table. It's unlikely to show up any real changes, except possibly a slight performance difference.

The .AsEnumerable() extension is just short-hand for casting something that implements IEnumerable<T> to be IEnumerable<T>
So, if xs is int[], you can call xs.AsEnumerable() instead of (xs as IEnumerable<int>). It uses type inference to avoid needing to explicitly keying the type of xs.
Here's the code extracted by Reflector.NET:
public static IEnumerable<TSource> AsEnumerable<TSource>(
this IEnumerable<TSource> source)
{
return source;
}
But in this case I think I have to agree with Jon. It's probably from the System.Data.DataSetExtensions assembly.

an I prevent a specific type using generic restrictions

I have an overload method - the first implementation always returns a single object, the second implementation always returns an enumeration.
I'd like to make the methods generic and overloaded, and restrict the compiler from attempting to bind to the non-enumeration method when the generic type is enumerable...
class Cache
{
T GetOrAdd<T> (string cachekey, Func<T> fnGetItem)
where T : {is not IEnumerable}
{
}
T[] GetOrAdd<T> (string cachekey, Func<IEnumerable<T>> fnGetItem)
{
}
}
To be used with...
{
// The compile should choose the 1st overload
var customer = Cache.GetOrAdd("FirstCustomer", () => context.Customers.First());
// The compile should choose the 2nd overload
var customers = Cache.GetOrAdd("AllCustomers", () => context.Customers.ToArray());
}
Is this just plain bad code-smell that I'm infringing on here, or is it possible to disambiguate the above methods so that the compiler will always get the calling code right?
Up votes for anyone who can produce any answer other than "rename one of the methods".

Rename one of the methods. You'll notice that List<T> has an Add and and AddRange method; follow that pattern. Doing something to an item and doing something to a sequence of items are logically different tasks, so make the methods have different names.

This is a difficult use case to support because of how the C# compiler performs overload resolution and how it decides which method to bind to.
The first issue is that constraints are not part of the signature of a method and won't be considered for overload resolution.
The second problem you've got to overcome is that the compiler chooses the best match from the available signatures - which, when dealing with generics, generally means that SomeMethod<T>(T) will be considered a better match than SomeMethod<T>( IEnumerable<T> ) ... particularly when you've got parameters like T[] or List<T>.
But more fundamentally, you have to consider whether operating on a single value vs. a collection of values is really the same operation. If they are logically different, then you probably want to use different names just for clarity. Perhaps there are some use cases where you could argue that the semantic differences between single objects and collections of objects are not meaningful ... but in that case, why implement two different methods at all? It's unclear that method overloading is the best way to express the differences. Let's look at an example that lends to the confusion:
Cache.GetOrAdd("abc", () => context.Customers.Frobble() );
First, note that in the example above we are choosing to ignore the return parameter. Second, notice that we call some method Frobble() on the Customers collection. Now can you tell me which overload of GetOrAdd() will be called? Clearly without knowing the type that Frobble() returns it's not possible. Personally I believe that code whose semantics can't be readily inferred from the syntax should be avoided when possible. If we choose better names, this issue is alleviated:
Cache.Add( "abc", () => context.Customers.Frobble() );
Cache.AddRange( "xyz", () => context.Customers.Frobble() );
Ultimately, there are only three options to disambiguate the methods in your example:
Change the name of one of the methods.
Cast to IEnumerable<T> wherever you call the second overload.
Change the signature of one of the methods in a way that the compiler can differentiate.
Option 1 is self-evident, so I'll say no more about it.
Options 2 is also easy to understand:
var customers = Cache.GetOrAdd("All",
() => (IEnumerable<Customer>)context.Customers.ToArray());
Option 3 is more complicated. Let's look at ways we can be achieve it.
On approach is by changing the signature of the Func<> delegate, for instance:
T GetOrAdd<T> (string cachekey, Func<object,T> fnGetItem)
T[] GetOrAdd<T> (string cachekey, Func<IEnumerable<T>> fnGetItem)
// now we can do:
var customer = Cache.GetOrAdd("First", _ => context.Customers.First());
var customers = Cache.GetOrAdd("All", () => context.Customers.ToArray());
Personally, I find this option terribly ugly, unintuitive, and confusing. Introducing an unused parameter is terrible ... but, sadly it will work.
An alternative way of changing the signature (which is somewhat less terrible) is to make the return value an out parameter:
void GetOrAdd<T> (string cachekey, Func<object,T> fnGetItem, out T);
void GetOrAdd<T> (string cachekey, Func<IEnumerable<T>> fnGetItem, out T[])
// now we can write:
Customer customer;
Cache.GetOrAdd("First", _ => context.Customers.First(), out customer);
Customer[] customers;
var customers = Cache.GetOrAdd("All",
() => context.Customers.ToArray(), out customers);
But is this really better? It prevents us from using these methods as parameters of other method calls. It also makes the code less clear and less understandable, IMO.
A final alternative I'll present is to add another generic parameter to the methods which identifies the type of the return value:
T GetOrAdd<T> (string cachekey, Func<T> fnGetItem);
R[] GetOrAdd<T,R> (string cachekey, Func<IEnumerable<T>> fnGetItem);
// now we can do:
var customer = Cache.GetOrAdd("First", _ => context.Customers.First());
var customers = Cache.GetOrAdd<Customer,Customer>("All", () => context.Customers.ToArray());
So can use hints to help the compiler to choose an overload for us ... sure. But look at all of the extra work we have to do as the developer to get there (not to mention the introduced ugliness and opportunity for mistakes). Is it really worth the effort? Particularly when an easy and reliable technique (naming the methods differently) already exists to help us?

Use only one method and have it detect the IEnumerable<T> case dynamically rather than attempting the impossible via generic constraints. It would be "code smell" to have to deal with two different cache methods depending on if the object to store/retrieve is something enumerable or not. Also, just because it implements IEnumerable<T> does not mean it is necessarily a collection.

constraints don't support exclusion, which may seem frustrating at first, but is consistent and makes sense (consider, for example, that interfaces don't dictate what implementations can't do).
That being said, you could play around with the constraints of your IEnumerable overload...maybe change your method to have two generic typings <X, T> with a constraint like "where X : IEnumerable<T>" ?
ETA the following code sample:
void T[] GetOrAdd<X,T> (string cachekey, Func<X> fnGetItem)
where X : IEnumerable<T>
{
}

C# 3.0 Func/OrderBy type inference

So odd situation that I ran into today with OrderBy:
Func<SomeClass, int> orderByNumber =
currentClass =>
currentClass.SomeNumber;
Then:
someCollection.OrderBy(orderByNumber);
This is fine, but I was going to create a method instead because it might be usable somewhere else other than an orderBy.
private int ReturnNumber(SomeClass currentClass)
{
return currentClass.SomeNumber;
}
Now when I try to plug that into the OrderBy:
someCollection.OrderBy(ReturnNumber);
It can't infer the type like it can if I use a Func. Seems like to me they should be the same since the method itself is "strongly typed" like the Func.
Side Note: I realize I can do this:
Func<SomeClass, int> orderByNumber = ReturnNumber;

This could also be related to "return-type type inference" not working on Method Groups.
Essentially, in cases (like Where's predicate) where the generic parameters are only in input positions, method group conversion works fine. But in cases where the generic parameter is a return type (like Select or OrderBy projections), the compiler won't infer the appropriate delegate conversion.

ReturnNumber is not a method - instead, it represents a method group containing all methods with the name ReturnNumber but with potentially different arity-and-type signatures. There are some technical issues with figuring out which method in that method group you actually want in a very generic and works-every-time way. Obviously, the compiler could figure it out some, even most, of the time, but a decision was made that putting an algorithm into the compiler which would work only half the time was a bad idea.
The following works, however:
someCollection.OrderBy(new Func<SomeClass, int>(ReturnNumber))

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.