Linq to objects - boondoggle? - c#

I thought that LINQ to Objects was a great way to get deferred execution over joined data. In reality, it turns out to be a bad way to do things...
Here is what we had, with a few, a few hundred, 3K and 3.5K records in a, b, c and d respectively:
IEnumerable<MyModel> data =
(from a in AList
from b in BList.Where(r => r.AId == a.Id)
from c in CList.Where(r => r.BId == b.Id)
from d in DList.Where(r => r.SomeId == myId && r.Some2Id == c.Some2Id)
// . . . . . .
Wasn't LINQ supposed to be great at doing this?
In reality, the following runs much faster, 60 times faster in fact:
var dTemp = DList.Where(r => r.SomeId == myId).ToList();
var cTemp = CList.Where(c => dTemp.Any(d => d.Some2Id == c.Some2Id)).ToList();
IEnumerable<MyModel> data =
(from a in AList
from b in BList.Where(r => r.AId == a.Id)
from c in cTemp.Where(r => r.BId == b.Id)
// . . . . . .
And then I came across this article
Q: Is there a way to improve this query while keeping it a single LINQ statement?
Or does this mean that joins in LINQ to Objects need to be avoided and replaced by a sequence of separate calls when performance is at stake?

Let's analyze the differences.
First query: you are performing a filter on BList, a filter on CList and two filters on DList, all in a deferred-execution manner. You then use a kind of join.
Second query: you perform a static filter on DList and evaluate it, then another static filter on CList based on that DList result and evaluate it, and then a deferred filter over AList, BList and the already reduced cTemp.
The second query is faster because:
DList is not being looked at for useless values (due to previous filters)
CList only contains useful values due to previous filters
Anyway, both queries are wrong. Multiple from clauses are basically a cross join, as explained here. As @Reddog commented, the best way is to actually use join:
var data = from a in AList
join b in BList on a.Id equals b.AId
join c in CList on b.Id equals c.BId
join d in DList on c.Some2Id equals d.Some2Id
where d.SomeId == myId
select new { a, b, c, d }; // a query must end in select; project whatever MyModel needs
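To make the difference concrete, here is a minimal sketch of both shapes over in-memory lists. The record types and sample rows are hypothetical (only the key fields from the question are modelled), and records need C# 9 or later. In LINQ to Objects, join builds a hash lookup over the inner sequence once, whereas each nested Where rescans its whole list for every outer row.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class JoinSketch
{
    // Hypothetical record shapes; only the key fields from the question are modelled.
    public record A(int Id);
    public record B(int Id, int AId);
    public record C(int Id, int BId, int Some2Id);
    public record D(int SomeId, int Some2Id);

    public static readonly List<A> AList = new() { new A(1), new A(2) };
    public static readonly List<B> BList = new() { new B(10, 1), new B(20, 2) };
    public static readonly List<C> CList = new() { new C(100, 10, 7), new C(200, 20, 8) };
    public static readonly List<D> DList = new() { new D(5, 7), new D(6, 8) };

    // Nested form: every inner Where rescans its list for each outer row.
    public static List<(int AId, int CId)> Nested(int myId) =>
        (from a in AList
         from b in BList.Where(r => r.AId == a.Id)
         from c in CList.Where(r => r.BId == b.Id)
         from d in DList.Where(r => r.SomeId == myId && r.Some2Id == c.Some2Id)
         select (AId: a.Id, CId: c.Id)).ToList();

    // Join form: each inner sequence is hashed once and probed by key.
    public static List<(int AId, int CId)> Joined(int myId) =>
        (from a in AList
         join b in BList on a.Id equals b.AId
         join c in CList on b.Id equals c.BId
         join d in DList.Where(r => r.SomeId == myId) on c.Some2Id equals d.Some2Id
         select (AId: a.Id, CId: c.Id)).ToList();
}
```

Both methods return the same rows for the sample data; only the amount of scanning differs, which is where the gap grows as the lists get larger.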

Related

LINQ performance question when joining IEnumerable with IQueryable

I had some serious speed issues with the LINQ in this code (variable names have been changed)
var A = _service.GetA(param1, param2); // Returns Enumerable results
var results = (from b in _B.All() // _B.All() returns IQueryable
join c in _C.All() on b.Id equals c.Id // _C.All() returns IQueryable
join a in A on c.Id equals a.Id // outer key goes on the left of equals, the joined sequence's key on the right
where b.someId == id && a.boolVariable // A bool value
select new
{
...
}).ToList();
This LINQ query took over 10 seconds to execute even though the number of rows in the B and C tables was less than 100k.
I looked into this and by trial and error I managed to get the LINQ execution time to 200ms by changing the code to this:
var A = _service.GetA(param1, param2).Where(a => a.boolVariable); // Returns Enumerable results
var results = (from b in _B.All() // _B.All() returns IQueryable
join c in _C.All() on b.Id equals c.Id // _C.All() returns IQueryable
join a in A on c.Id equals a.Id // outer key goes on the left of equals, the joined sequence's key on the right
where b.someId == id
select new
{
...
}).ToList();
So my question is, why does this simple change have such drastic effects on the LINQ performance? The only change is that I filter the Enumerable list beforehand, the A enumerable has about 30 items before filtering and 15 after filtering.
In your first scenario, the join runs against every record in A, which takes a long time, and only afterwards are the rows filtered by a.boolVariable.
In your second scenario, A has already been reduced to a smaller subset before the join, so the join naturally has less work to do.
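A minimal in-memory sketch of the two shapes, assuming hypothetical ItemA and ItemC records in place of the real entities. It only shows that filtering before the join returns the same rows as filtering after it; it does not reproduce the timings from the question, which depend on how the query provider handles a local sequence joined to IQueryable sources.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PreFilterSketch
{
    // Hypothetical stand-ins for the entities in the question.
    public record ItemA(int Id, bool BoolVariable);
    public record ItemC(int Id);

    // Shape of the slow version: join everything, then filter on the flag.
    public static List<int> FilterAfterJoin(IEnumerable<ItemC> cs, IEnumerable<ItemA> aItems) =>
        (from c in cs
         join a in aItems on c.Id equals a.Id
         where a.BoolVariable
         select c.Id).ToList();

    // Shape of the fast version: shrink the local sequence first, then join.
    public static List<int> FilterBeforeJoin(IEnumerable<ItemC> cs, IEnumerable<ItemA> aItems)
    {
        var filtered = aItems.Where(a => a.BoolVariable).ToList();
        return (from c in cs
                join a in filtered on c.Id equals a.Id
                select c.Id).ToList();
    }
}
```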

Query returns all results quickly, but the call to Json then times out because it loads all the unnecessary properties

I have the following query that gets a list of schools based on the criteria provided. Note: This database is very, very large with 10,000+ records. The end result is a list of 188 schools, which is exactly as we need.
return (from s in Context.Schools
join d in Context.Districts on s.DistrictID equals d.DistrictID
join r in Context.Rosters on s.SchoolID equals r.SchoolID
join te in Context.TestEvents on r.TestEventID equals te.TestEventID
join ta in Context.TestAdministrations on te.TestAdministrationID equals ta.TestAdministrationID
join sr in Context.ScoreResults on r.RosterID equals sr.RosterID into exists
from any in exists.DefaultIfEmpty()
where d.DistrictID == DistrictID
&& ta.SchoolYearID == SchoolYearID.Value
select s)
.Distinct()
.OrderBy(x => x.Name)
.ToList();
The problem is that when we call return Json(Schools, JsonRequestBehavior.AllowGet); to send our schools back to the client, the operation times out. When stepping through the code, it appears that for some reason the DbContext is trying to pull in ALL of the properties for this result set, including the ones we don't need. I already have everything I need from the database in this Schools object. Why does it go back and start creating all the associated objects? Is there a way to stop this?
This is an MVC application using EF 5 Code First.
Instead of selecting the whole entity, select a projection of only what you need:
var results = from s in Context.Schools
...
select new MyClassContainingOnlyAFewProperties {
Prop1 = s.Prop1,
Prop2 = s.Prop2,
//etc.
};
return results;
See also: What does Query Projection mean in Entity Framework?
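A minimal sketch of the projection idea over plain objects. SchoolID and Name come from the question; the School class, its Rosters property and the SchoolSummary type are hypothetical stand-ins. Because the result contains only flat values, a serializer has no navigation properties to walk.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ProjectionSketch
{
    // Hypothetical entity with a navigation-style property a serializer would otherwise walk.
    public class School
    {
        public int SchoolID { get; set; }
        public string Name { get; set; } = "";
        public List<object> Rosters { get; set; } = new();
    }

    // Hypothetical flat type holding only what the client needs.
    public record SchoolSummary(int SchoolID, string Name);

    // Project to the flat type, then order by name as in the question.
    public static List<SchoolSummary> Summarize(IEnumerable<School> schools) =>
        schools.Select(s => new SchoolSummary(s.SchoolID, s.Name))
               .OrderBy(x => x.Name)
               .ToList();
}
```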

Modularize (refactor) Linq queries

I have a few Linq queries. Semantically, they are
a join b join c join d where filter1(a) && filter2(c) && filter3(d)
a join b join c where filter1(a) && filter2(c)
a join b join c join e where filter1(a) && filter2(c) && filter4(e)
...
I want to be able to factor out the shared part:
a join b join c where filter1(a) && filter2(c)
and dynamically append join d and filter3(d)
Is there a way to do this? I am already using the Predicate Builder to dynamically build conditionals (filters).
EDIT: I am using Linq-to-SQL.
EDIT: The base query looks like:
from a in As.AsExpandable()
join b in Bs on a.Id equals b.PId
join c in Cs on b.Id equals c.PId
where filter1(a) && filter2(b) && filter3(c)
select new A { ... }
filters are predicates in Predicate Builder. The type of the query is IQueryable<A>.
Next, I'd like to join this with d
from a in BaseQuery()
join d in D on a.Id equals d.PId
Currently join d .. causes a compilation error:
The type of one of the expressions in the join clause is incorrect. Type inference failed in the call to Join
Your example is a bit vague, but it is easy to create a method that returns an IQueryable<T> and reuse that method, if that’s what you mean. Here is an example:
// Reusable method
public IQueryable<SomeObject> GetSomeObjectsByFilter(Context context)
{
return
from c in context.SomeObjects
where c.B.A.Amount < 1000
where c.Roles.Any(r => r.Name == "Admin") // Any takes a predicate; Contains does not
select c;
}
You can reuse this method in other places like this:
var q =
from c in GetSomeObjectsByFilter(context)
where !c.D.Any(d => d.Items.Any(i => i.Value > 100))
select c;
Because of the way IQueryable works, only the final query (the collection that you start iterating) triggers a call to the database. This allows you to build a highly maintainable system by reusing business logic that is effectively executed inside the database, without any loss of performance.
I do this all the time and it improves the maintainability of my code big time. It works no matter which O/RM tool you use, because for IQueryable<T> composition there is no difference between writing the query in one piece and splitting it across different methods.
Note that you sometimes need some smart transformations to get the duplicated parts into a single method. Things that might help are returning grouped sets, or returning a set of a different type than the one you think you need. This sounds a bit vague, but just post a question here at SO when you have problems splitting up a method. There are enough people here who can help you with that.
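As a rough sketch of the composition described above, the following uses AsQueryable() over arrays in place of a real data context. Parent, Child and Row are hypothetical types. The join is appended to the reusable base query afterwards. One common cause of the "type inference failed in the call to Join" error is key expressions of different types on the two sides of equals (for example int against int?); whether that applies to the asker's schema is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ComposeSketch
{
    // Hypothetical types; PId mirrors the foreign key name used in the question.
    public record Parent(int Id, decimal Amount);
    public record Child(int PId, string Note);
    public record Row(int ParentId, string Note);

    static readonly IQueryable<Parent> Parents =
        new[] { new Parent(1, 500m), new Parent(2, 5000m) }.AsQueryable();
    static readonly IQueryable<Child> Children =
        new[] { new Child(1, "x"), new Child(2, "y") }.AsQueryable();

    // Shared part: returns IQueryable so callers can keep composing.
    public static IQueryable<Parent> BaseQuery() =>
        Parents.Where(p => p.Amount < 1000m);

    // A join appended to the shared part; both keys are int, so inference succeeds.
    public static IQueryable<Row> WithChildren() =>
        from p in BaseQuery()
        join c in Children on p.Id equals c.PId
        select new Row(p.Id, c.Note);
}
```

Nothing is evaluated until the result is enumerated, for example with ComposeSketch.WithChildren().ToList().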
I can answer half your question easily. LINQ makes it simple to append .Where clauses to an existing query.
Example:
var x = db.Table1.Where(w => w.field1 == nTestValue);
x = x.Where(w => w.field2 == nTestValue2);
I believe you can do the joins as well but I have to go find an example in some old code. I'll look if nobody else jumps in with it soon.

has-relation in linq-statement?

I have three tables: Question, Discipline and QuestionHasDiscipline. QuestionHasDiscipline holds the relation between Question and Discipline. They all have a unique id column to identify them.
I am trying to write a linq-statement that returns all the questions that have a certain discipline.
What I have begun doing is this:
var questions = (from q in context.Questions
where (from d in context.QuestionHasDiscipline
where d.QuestionId == q.QuestionId
) ...
But it obviously is horribly wrong. I've tried different approaches but now I turn to the greater minds.. Any suggestions?
You can use .Any() with a predicate.
from q in context.Questions
where context.QuestionHasDiscipline.Any(d => d.QuestionId == q.QuestionId)
select q;
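To restrict the result to one particular discipline, the predicate passed to Any can also test the discipline key. A minimal in-memory sketch, assuming the link table exposes a DisciplineId column (that name is a guess, as are the record shapes):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DisciplineSketch
{
    // Hypothetical shapes; DisciplineId is an assumed column name.
    public record Question(int QuestionId, string Text);
    public record QuestionHasDiscipline(int QuestionId, int DisciplineId);

    // Keep a question when at least one link row ties it to the requested discipline.
    public static List<Question> ForDiscipline(
        IEnumerable<Question> questions,
        IEnumerable<QuestionHasDiscipline> links,
        int disciplineId) =>
        questions
            .Where(q => links.Any(d => d.QuestionId == q.QuestionId
                                    && d.DisciplineId == disciplineId))
            .ToList();
}
```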

How to Join 2 Generic IEnumerators

I'm wondering if it's possible to join IEnumerables together.
Basically I have a bunch of users and need to get their content from the database so I can search and page through it.
I'm using LINQ to SQL; my code at the moment is:
public IEnumerable<content> allcontent;
//Get users friends
IEnumerable<relationship> friends = from f in db.relationships
where f.userId == int.Parse(userId)
select f;
IEnumerable<relationship> friendData = friends.ToList();
foreach (relationship r in friendData)
{
IEnumerable<content> content = from c in db.contents
where c.userId == r.userId
orderby c.contentDate descending
select c;
// This is where I need to merge everything together
}
I hope that makes some sense!
Matt
If I understand correctly what you are trying to do, why don't you try doing:
var result = from r in db.relationships
from c in db.contents
where r.userId == int.Parse(userId)
where c.userId == r.userId
orderby c.contentDate descending
select new {
Relationship = r,
Content = c
};
This will give you an IEnumerable<T> where T is an anonymous type that has fields Relationship and Content.
If you know your users will have fewer than 2100 friends, you could easily send the keys from the data you already loaded back to the database:
List<int> friendIds = friendData
.Select(r => r.userId)
.Distinct()
.ToList();
List<content> result = db.contents
.Where(c => friendIds.Contains(c.userId))
.ToList();
What happens here is that LINQ translates each Id into a parameter and then builds an IN clause to do the filtering. 2100 is the maximum number of parameters that SQL Server will accept... if you have more than 2100 friends, you'll have to break the ID list up and combine (Concat) the result lists.
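Breaking the ID list up could look roughly like this. QueryInBatches is a hypothetical helper, not part of LINQ to SQL; the caller supplies the per-batch query as a delegate and picks a batch size that leaves headroom under the 2100-parameter limit (2000 in the comment below is an arbitrary choice).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BatchSketch
{
    // Runs queryBatch once per slice of ids and appends the results in order.
    public static List<T> QueryInBatches<T>(
        IReadOnlyList<int> ids,
        int batchSize,
        Func<IReadOnlyList<int>, IEnumerable<T>> queryBatch)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var results = new List<T>();
        for (int i = 0; i < ids.Count; i += batchSize)
        {
            // Each slice becomes one IN (...) clause when queryBatch hits the database.
            var batch = ids.Skip(i).Take(batchSize).ToList();
            results.AddRange(queryBatch(batch));
        }
        return results;
    }
}

// Assumed usage against the question's context (not runnable without it):
// var result = BatchSketch.QueryInBatches(friendIds, 2000,
//     batch => db.contents.Where(c => batch.Contains(c.userId)));
```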
Or, if you want a more literal answer to your question - Concat is a method that combines 2 IEnumerables together by creating a new IEnumerable which returns the items from the first and then the items from the second.
IEnumerable<content> results = Enumerable.Empty<content>();
foreach (relationship r in friendData)
{
IEnumerable<content> content = GetData(r);
results = results.Concat(content);
}
If you're doing an INNER join, look at the .Intersect() extension method.
Which things are you merging?
There are two main options you could use: .SelectMany(...) or .Concat(...)
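A small sketch of the two options side by side, using an in-memory dictionary as a stand-in for the per-friend database query. GetData and the sample data are hypothetical. Both produce the same flattened sequence; SelectMany simply avoids the explicit loop.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MergeSketch
{
    // Hypothetical stand-in for "content belonging to one user".
    static readonly Dictionary<int, string[]> ContentByUser = new()
    {
        [1] = new[] { "a1", "a2" },
        [2] = new[] { "b1" },
    };

    public static IEnumerable<string> GetData(int userId) =>
        ContentByUser.TryGetValue(userId, out var items) ? items : Array.Empty<string>();

    // Option 1: start empty and append each friend's sequence with Concat.
    public static List<string> ViaConcat(IEnumerable<int> friendIds)
    {
        IEnumerable<string> results = Enumerable.Empty<string>();
        foreach (var id in friendIds)
        {
            results = results.Concat(GetData(id));
        }
        return results.ToList();
    }

    // Option 2: let SelectMany flatten one sequence per friend.
    public static List<string> ViaSelectMany(IEnumerable<int> friendIds) =>
        friendIds.SelectMany(id => GetData(id)).ToList();
}
```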
