Calling a method inside a Linq query - c#

I want to insert into my table a column named 'S' that will get some string value based on a value it gets from a table column.
For example: for each ID (a.z) I want to gets it's string value stored in another table. The string value is returned from another method that gets it through a Linq query.
Is it possible to call a method from Linq?
Should I do everything in the same query?
This is the structure of the information I need to get:
a.z is the ID in the first square in table #1, from this ID I get another id in table #2, and from that I can get my string value that I need to display under column 'S'.
var q = (from a in v.A join b in v.B
on a.i equals b.j
where a.k == "aaa" && a.h == 0
select new {T = a.i, S = someMethod(a.z).ToString()})
return q;
The line S = someMethod(a.z).ToString() causing the following error:
Unable to cast object of type 'System.Data.Linq.SqlClient.SqlColumn'
to type 'System.Data.Linq.SqlClient.SqlMethodCall'.

You have to execute your method call in Linq-to-Objects context, because on the database side that method call will not make sense - you can do this using AsEnumerable() - basically the rest of the query will then be evaluated as an in memory collection using Linq-to-Objects and you can use method calls as expected:
var q = (from a in v.A join b in v.B
on a.i equals b.j
where a.k == "aaa" && a.h == 0
select new {T = a.i, Z = a.z })
.AsEnumerable()
.Select(x => new { T = x.T, S = someMethod(x.Z).ToString() })

You'll want to split it up into two statements. Return the results from the query (which is what will hit the database), and then enumerate the results a second time in a separate step to transform the translation into the new object list. This second "query" won't hit the database, so you'll be able to use the someMethod() inside it.
Linq-to-Entities is a bit of a strange thing, because it makes the transition to querying the database from C# extremely seamless: but you always have to remind yourself, "This C# is going to get translated into some SQL." And as a result, you have to ask yourself, "Can all this C# actually get executed as SQL?" If it can't - if you're calling someMethod() inside it - your query is going to have problems. And the usual solution is to split it up.
(The other answer from #BrokenGlass, using .AsEnumerable(), is basically another way to do just that.)

That is an old question, but I see nobody mention one "hack", that allows to call methods during select without reiterating. Idea is to use constructor and in constructor you can call whatever you wish (at least it works fine in LINQ with NHibernate, not sure about LINQ2SQL or EF, but I guess it should be the same).
Below I have source code for benchmark program, it looks like reiterating approach in my case is about twice slower than constructor approach and I guess there's no wonder - my business logic was minimal, so things like iteration and memory allocation matters.
Also I wished there was better way to say, that this or that should not be tried to execute on database,
// Here are the results of selecting sum of 1 million ints on my machine:
// Name Iterations Percent
// reiterate 294 53.3575317604356%
// constructor 551 100%
public class A
{
public A()
{
}
public A(int b, int c)
{
Result = Sum(b, c);
}
public int Result { get; set; }
public static int Sum(int source1, int source2)
{
return source1 + source2;
}
}
class Program
{
static void Main(string[] args)
{
var range = Enumerable.Range(1, 1000000).ToList();
BenchmarkIt.Benchmark.This("reiterate", () =>
{
var tst = range
.Select(x => new { b = x, c = x })
.AsEnumerable()
.Select(x => new A
{
Result = A.Sum(x.b, x.c)
})
.ToList();
})
.Against.This("constructor", () =>
{
var tst = range
.Select(x => new A(x, x))
.ToList();
})
.For(60)
.Seconds()
.PrintComparison();
Console.ReadKey();
}
}

Related

C# LINQ executing the same work over and over

Came across some legacy code where the logic attempts to prevent un-necessary multiple calls to an expensive query GetStudentsOnCourse(), but fails due to a misunderstanding of deferred execution.
var students = studentsToRemoveRecords.Select(x => x.CourseId)
.Distinct()
.SelectMany(c => studentRepository.GetStudentsOnCourse(c.Value));
var studentsToRemove = new List<Student>();
foreach (var record in studentsToRemoveRecords)
{
studentsToRemove.Add(
students.Single(s => s.Id == record.StudentId));
}
Here, if there are 2 records for the same course in studentsToRemoveRecords, the query GetStudentsOnCourse() will needlessly be called twice (with the same course id) instead of once.
You can solve this by converting students to a list beforehand and forcing it to memory (preventing the execution from being deferred). Or by simply rewriting the logic into something a bit simpler.
But I then realised I actually struggle to put into words exactly why GetStudentsOnCourse() is called twice in the scenario above... is it that LINQ is repeating the same work everytime studentsToRemoveRecords is iterated over, even though the resulting input values are identical each time?
is it that LINQ is repeating the same work everytime studentsToRemoveRecords is iterated over, even though the resulting input values are identical each time?
Yes, that's the nature of LINQ. Some Visual Studio Extensions, like ReSharper, give you warnings when you create code that might lead to multiple iterations of a LINQ Query.
If you want to avoid it, do this:
var students = studentsToRemoveRecords.Select(x => x.CourseId)
.Distinct()
.SelectMany(c => studentRepository.GetStudentsOnCourse(c.Value))
.ToList();
With ToList() the Query is executed immediately and the resulting entities are stored in a List<T>. Now you can iterate several times over students without having performance issues.
Edit to include comments:
Here is a link to some good documentation about it (thank you Sergio): LINQ Documentation
And some thoughts about your question how to handle this in a large code base:
Well, there are reasons for both scenarios - direct execution and storing the result into a new list, and deferred execution.
If you are familiar with SQL databases, you can think of a LINQ Query like a View or a Stored Procedure. You define what filtering/altering you want to execute on a base table to get the resulting entities. And each time you query that View/execute that Stored Procedure, it runs based on the current data in the base table.
Same for LINQ. Your Query (without ToList()) was just like the definition of the View. And each time you iterate over it, that definition gets executed based on the current Entities in studentsToRemoveRecords at that moment.
And maybe that's your intetion. Maybe you know that this base list is altering and you want to execute your query several times, expecting different results. Then do it without ToList().
But when you want to execute your query only once and then expect an immutable result list over which you can iterate multiple times, do it with ToList().
So both Scenarios are valid. And when you iterate only once, both scenarios are equal (disclaimer: when you iterate directly after defining the query). Maybe that's why you saw it so many times like this. It depends what you want.
Unclear exactly how your classes are done, BUT:
public class Student
{
public int Id { get; set; }
}
public class StudentCourse
{
public int StudentId { get; set; }
public int? CourseId { get; set; }
}
public class StudentRepository
{
public StudentCourse[] StudentCourses = new[]
{
new StudentCourse { CourseId = 1, StudentId = 100 },
new StudentCourse { CourseId = 2, StudentId = 200 },
new StudentCourse { CourseId = 3, StudentId = 300 },
new StudentCourse { CourseId = 4, StudentId = 400 },
};
public Student[] GetStudentsOnCourse(int courseId)
{
Console.WriteLine($"{nameof(GetStudentsOnCourse)}({courseId})");
return StudentCourses.Where(x => x.CourseId == courseId).Select(x => new Student { Id = x.StudentId }).ToArray();
}
}
and then
static void Main(string[] args)
{
var studentRepository = new StudentRepository();
var studentsToRemoveRecords = studentRepository.StudentCourses.ToArray();
var students = studentsToRemoveRecords.Select(x => x.CourseId)
.Distinct()
.SelectMany(c => studentRepository.GetStudentsOnCourse(c.Value));
//.ToArray();
var studentsToRemove = new List<Student>();
foreach (var record in studentsToRemoveRecords)
{
studentsToRemove.Add(
students.Single(s => s.Id == record.StudentId));
}
}
the method is called 16 times, with .ToArray() it is called 4 times. Note that .Single() will parse the full students collection to check that there is a single student with the "right" Id. Compare it with First() that will break after finding one record with the right Id (10 total calls of the method). As I've said in my comment, the method is called studentsToRemoveRecords.Count() * studentsToRemoveRecords.Distinct().Count(), so something like x ^ 2. Doing a .ToArray() "memoizes" the result of the GetStudentsOnCourse.
Just out of curiosity, you can add this class to your code:
public static class Tools
{
public static IEnumerable<T> DebugEnumeration<T>(this IEnumerable<T> enu)
{
Console.WriteLine("Begin Enumeration");
foreach (var res in enu)
{
yield return res;
}
}
}
and then do:
.SelectMany(c => studentRepository.GetStudentsOnCourse(c.Value))
.DebugEnumeration();
This will show you when the SelectMany is enumerated.

Why is Entity Framework having performance issues when calculating a sum

I am using Entity Framework in a C# application and I am using lazy loading. I am experiencing performance issues when calculating the sum of a property in a collection of elements. Let me illustrate it with a simplified version of my code:
public decimal GetPortfolioValue(Guid portfolioId) {
var portfolio = DbContext.Portfolios.FirstOrDefault( x => x.Id.Equals( portfolioId ) );
if (portfolio == null) return 0m;
return portfolio.Items
.Where( i =>
i.Status == ItemStatus.Listed
&&
_activateStatuses.Contains( i.Category.Status )
)
.Sum( i => i.Amount );
}
So I want to fetch the value for all my items that have a certain status of which their parent has a specific status as well.
When logging the queries generated by EF I see it is first fetching my Portfolio (which is fine). Then it does a query to load all Item entities that are part of this portfolio. And then it starts fetching ALL Category entities for each Item one by one. So if I have a portfolio that contains 100 items (each with a category), it literally does 100 SELECT ... FROM categories WHERE id = ... queries.
So it seems like it's just fetching all info, storing it in its memory and then calculating the sum. Why does it not do a simple join between my tables and calculate it like that?
Instead of doing 102 queries to calculate the sum of 100 items I would expect something along the lines of:
SELECT
i.id, i.amount
FROM
items i
INNER JOIN categories c ON c.id = i.category_id
WHERE
i.portfolio_id = #portfolioId
AND
i.status = 'listed'
AND
c.status IN ('active', 'pending', ...);
on which it could then calculate the sum (if it is not able to use the SUM directly in the query).
What is the problem and how can I improve the performance other than writing a pure ADO query instead of using Entity Framework?
To be complete, here are my EF entities:
public class ItemConfiguration : EntityTypeConfiguration<Item> {
ToTable("items");
...
HasRequired(p => p.Portfolio);
}
public class CategoryConfiguration : EntityTypeConfiguration<Category> {
ToTable("categories");
...
HasMany(c => c.Products).WithRequired(p => p.Category);
}
EDIT based on comments:
I didn't think it was important but the _activeStatuses is a list of enums.
private CategoryStatus[] _activeStatuses = new[] { CategoryStatus.Active, ... };
But probably more important is that I left out that the status in the database is a string ("active", "pending", ...) but I map them to an enum used in the application. And that is probably why EF cannot evaluate it? The actual code is:
... && _activateStatuses.Contains(CategoryStatusMapper.MapToEnum(i.Category.Status)) ...
EDIT2
Indeed the mapping is a big part of the problem but the query itself seems to be the biggest issue. Why is the performance difference so big between these two queries?
// Slow query
var portfolio = DbContext.Portfolios.FirstOrDefault(p => p.Id.Equals(portfolioId));
var value = portfolio.Items.Where(i => i.Status == ItemStatusConstants.Listed &&
_activeStatuses.Contains(i.Category.Status))
.Select(i => i.Amount).Sum();
// Fast query
var value = DbContext.Portfolios.Where(p => p.Id.Equals(portfolioId))
.SelectMany(p => p.Items.Where(i =>
i.Status == ItemStatusConstants.Listed &&
_activeStatuses.Contains(i.Category.Status)))
.Select(i => i.Amount).Sum();
The first query does a LOT of small SQL queries whereas the second one just combines everything into one bigger query. I'd expect even the first query to run one query to get the portfolio value.
Calling portfolio.Items this will lazy load the collection in Items and then execute the subsequent calls including the Where and Sum expressions. See also Loading Related Entities article.
You need to execute the call directly on the DbContext the Sum expression can be evaluated database server side.
var portfolio = DbContext.Portfolios
.Where(x => x.Id.Equals(portfolioId))
.SelectMany(x => x.Items.Where(i => i.Status == ItemStatus.Listed && _activateStatuses.Contains( i.Category.Status )).Select(i => i.Amount))
.Sum();
You also have to use the appropriate type for _activateStatuses instance as the contained values must match the type persisted in the database. If the database persists string values then you need to pass a list of string values.
var _activateStatuses = new string[] {"Active", "etc"};
You could use a Linq expression to convert enums to their string representative.
Notes
I would recommend you turn off lazy loading on your DbContext type. As soon as you do that you will start to catch issues like this at run time via Exceptions and can then write more performant code.
I did not include error checking for if no portfolio was found but you could extend this code accordingly.
Yep CategoryStatusMapper.MapToEnum cannot be converted to SQL, forcing it to run the Where in .Net. Rather than mapping the status to the enum, _activeStatuses should contain the list of integer values from the enum so the mapping is not required.
private int[] _activeStatuses = new[] { (int)CategoryStatus.Active, ... };
So that the contains becomes
... && _activateStatuses.Contains(i.Category.Status) ...
and can all be converted to SQL
UPDATE
Given that i.Category.Status is a string in the database, then
private string[] _activeStatuses = new[] { CategoryStatus.Active.ToString(), ... };

How to pass variables to Expression Func Projection before compile

I am trying to write Expression Functions for my projections. I found a good article about that link. But I couldn't figure out how can I pass variables to these functions.
How can I write a projection function for this one
int i = 3;
var query = await _db.Users
.FilterByName(name)
.Select(item => new SearchResultItemViewModel
{
Id = item.Id,
Article = item.FirstName + i.ToString()
});
}))
This one is working. In select SQL string has only Id and Firstname but I can't pass any variable.
var query = await _db.Users
.FilterByName(name)
.Select(item => SearchResultItemViewModel.Projection)
public static Expression<Func<ApplicationUser, SearchResultItemViewModel>> Projection
{
get
{
return item => new SearchResultItemViewModel
{
Id = item.Id,
Article = item.FirstName
};
}
}
This one is working only if you call compile and invoke. SQL string has all rows. Leading to bad performance
var query = await _db.Users
.FilterByName(name)
.Select(item => SearchResultItemViewModel.Projection.Compile().Invoke(item,i))
public static Expression<Func<ApplicationUser,int, SearchResultItemViewModel>> Projection
{
get
{
return( item,i) => new SearchResultItemViewModel
{
Id = item.Id,
Article = item.FirstName + i.ToString()
};
}
}
I don't use EF so this might vary in your specific use case, but this seems to be a question about LINQ Expressions more than anything else.
The first big problem is that you are trying to use an Expression<Func<ApplicationUser, int, SearchResultItemViewModel>> where you really meant Expression<Func<ApplicationUser, SearchResultItemViewModel>> and that's not going to do what you want. Instead of binding to a variable you're invoking the indexed variant of Select. So instead of getting the same value of i for all rows you get the index of the row.
When you create an expression that references a variable, one of two things happens. For local variables (and parameters) the value is copied to an anonymous class instance which is bound to the expression, so you can't change it afterwards. For other variables the expression contains a reference to the variable itself, as well as the containing object for non-static variables.
Which means that you could in principle use a static variable to alter the parameter and never have to recreate the projection expression. There are certainly times when this is useful.
On the other hand, your code above is creating a new instance each time you access the Projection property. So why not just change it to a function and generate the expression you need when you need it?
public static Expression<Func<ApplicationUser, SearchResultItemViewModel>> Projection(int parm)
=> item => new SearchResultItemViewModel
{
Id = item.Id,
Article = item.FirstName + parm.ToString()
};
Each time you invoke the method you'll get back an expression that uses the specified value.
Or you could use an expression visitor to take a template expression and modify the constants in it to whatever you need at the time. Fun, but a bit beyond scope here I think.

Linq with boolean function to relational db in Entity Framework

Probably a few things wrong with my code here but I'm mostly having a problem with the syntax. Entry is a model for use in Entries and contains a TimeStamp for each entry. Member is a model for people who are assigned entries and contains an fk for Entry. I want to sort my list of members based off of how many entries the member has within a given period (arbitrarily chose 30 days).
A. I'm not sure that the function I created works correctly, but this is aside from the main point because I haven't really dug into it yet.
B. I cannot figure out the syntax of the Linq statement or if it's even possible.
Function:
private bool TimeCompare(DateTime TimeStamp)
{
DateTime bound = DateTime.Today.AddDays(-30);
if (bound <= TimeStamp)
{
return true;
}
return false;
}
Member list:
public PartialViewResult List()
{
var query = repository.Members.OrderByDescending(p => p.Entry.Count).Where(TimeCompare(p => p.Entry.Select(e => e.TimeStamp));
//return PartialView(repository.Members);
return PartialView(query);
}
the var query is my problem here and I can't seem to find a way to incorporate a boolean function into a .where statement in a linq.
EDIT
To summarize I am simply trying to query all entries timestamped within the past 30 days.
I also have to emphasize the relational/fk part as that appears to be forcing the Timestamp to be IEnumerable of System.Datetime instead of simple System.Datetime.
This errors with "Cannot implicitly convert timestamp to bool" on the E.TimeStamp:
var query = repository.Members.Where(p => p.Entry.First(e => e.TimeStamp) <= past30).OrderByDescending(p => p.Entry.Count);
This errors with Operator '<=' cannot be applied to operands of type 'System.Collections.Generic.IEnumerable' and 'System.DateTime'
var query = repository.Members.Where(p => p.Entry.Select(e => e.TimeStamp) <= past30).OrderByDescending(p => p.Entry.Count);
EDIT2
Syntactically correct but not semantically:
var query = repository.Members.Where(p => p.Entry.Select(e => e.TimeStamp).FirstOrDefault() <= timeComparison).OrderByDescending(p => p.Entry.Count);
The desired result is to pull all members and then sort by the number of entries they have, this pulls members with entries and then orders by the number of entries they have. Essentially the .where should somehow be nested inside of the .count.
EDIT3
Syntactically correct but results in a runtime error (Exception Details: System.ArgumentException: DbSortClause expressions must have a type that is order comparable.
Parameter name: key):
var query = repository.Members.OrderByDescending(p => p.Entry.Where(e => e.TimeStamp <= timeComparison));
EDIT4
Closer (as this line compiles) but it doesn't seem to be having any effect on the object. Regardless of how many entries I add for a user it doesn't change the sort order as desired (or at all).
var timeComparison = DateTime.Today.AddDays(-30).Day;
var query = repository.Members.OrderByDescending(p => p.Entry.Select(e => e.TimeStamp.Day <= timeComparison).FirstOrDefault());
A bit of research dictates that Linq to Entities (IE: This section)
...var query = repository.Members.OrderByDescending(...
tends to really not like it if you use your own functions, since it will try to map to a SQL variant.
Try something along the lines of this, and see if it helps:
var query = repository.Members.AsEnumerable().Where(TimeCompare(p => p.Entry.Select(e => e.TimeStamp).OrderByDescending(p => p.Entry.Count));
Edit: I should just read what you are trying to do. You want it to grab only the ones within the last X number of days, correct? I believe the following should work, but I would need to test when I get to my home computer...
public PartialViewResult List()
{
var timeComparison = DateTime.Today.AddDays(-30);
var query = repository.Members.Where(p => p.Entry.Select(e => e.TimeStamp).FirstOrDefault() <= timeComparison).OrderByDescending(p => p.Entry.Count));
//return PartialView(repository.Members);
return PartialView(query);
}
Edit2: This may be a lack of understanding from your code, but is e the same type as p? If so, you should be able to just reference the timestamp like so:
public PartialViewResult List()
{
var timeComparison = DateTime.Today.AddDays(-30);
var query = repository.Members.Where(p => p.TimeStamp <= timeComparison).OrderByDescending(p => p.Entry.Count));
//return PartialView(repository.Members);
return PartialView(query);
}
Edit3: In Edit3, I see what you are trying to do now (I believe). You're close, but OrderByDescending would need to go on the end. Try this:
var query = repository.Members
.Select(p => p.Entry.Where(e => e.TimeStamp <= timeComparison))
.OrderByDescending(p => p.Entry.Count);
Thanks for all the help Dylan but here is the final answer:
public PartialViewResult List()
{
var timeComparison = DateTime.Today.AddDays(-30).Day;
var query = repository.Members
.OrderBy(m => m.Entry.Where(e => e.TimeStamp.Day <= timeComparison).Count());
return PartialView(query);
}

Why does this LINQ query compile?

After reading "Odd query expressions" by Jon Skeet, I tried the code below.
I expected the LINQ query at the end to translate to int query = proxy.Where(x => x).Select(x => x); which does not compile because Where returns an int. The code compiled and prints "Where(x => x)" to the screen and query is set to 2. Select is never called, but it needs to be there for the code to compile. What is happening?
using System;
using System.Linq.Expressions;
public class LinqProxy
{
public Func<Expression<Func<string,string>>,int> Select { get; set; }
public Func<Expression<Func<string,string>>,int> Where { get; set; }
}
class Test
{
static void Main()
{
LinqProxy proxy = new LinqProxy();
proxy.Select = exp =>
{
Console.WriteLine("Select({0})", exp);
return 1;
};
proxy.Where = exp =>
{
Console.WriteLine("Where({0})", exp);
return 2;
};
int query = from x in proxy
where x
select x;
}
}
It's because your "select x" is effectively a no-op - the compiler doesn't bother putting the Select(x => x) call at the end. It would if you removed the where clause though. Your current query is known as a degenerate query expression. See section 7.16.2.3 of the C# 4 spec for more details. In particular:
A degenerate query expression is one that trivially selects the elements of the source. A later phase of the translation removes degenerate queries introduced by other translation steps by replacing them with their source. It is important however to ensure that the result of a query expression is never the source object itself, as that would reveal the type and identity of the source to the client of the query. Therefore this step protects degenerate queries written directly in source code by explicitly calling Select on the source. It is then up to the implementers of Select and other query operators to ensure that these methods never return the source object itself.
So, three translations (regardless of data source)
// Query // Translation
from x in proxy proxy.Where(x => x)
where x
select x
from x in proxy proxy.Select(x => x)
select x
from x in proxy proxy.Where(x => x)
where x .Select(x => x * 2)
select x * 2
It compiles because the LINQ query syntax is a lexical substitution. The compiler turns
int query = from x in proxy
where x
select x;
into
int query = proxy.Where(x => x); // note it optimises the select away
and only then does it check whether the methods Where and Select actually exist on the type of proxy. Accordingly, in the specific example you gave, Select does not actually need to exist for this to compile.
If you had something like this:
int query = from x in proxy
select x.ToString();
then it would get changed into:
int query = proxy.Select(x => x.ToString());
and the Select method would be called.

Categories