Linq multiple where queries - c#

I have an issue building a fairly hefty linq query. Basically I have a situation whereby I need to execute a subquery in a loop to filter down the number of matches that are returned from the database. Example code is in this loop below:
foreach (Guid parent in parentAttributes)
{
var subQuery = from sc in db.tSearchIndexes
join a in db.tAttributes on sc.AttributeGUID equals a.GUID
join pc in db.tPeopleIndexes on a.GUID equals pc.AttributeGUID
where a.RelatedGUID == parent && userId == pc.CPSGUID
select sc.CPSGUID;
query = query.Where(x => subQuery.Contains(x.Id));
}
When I subsequently call the ToList() on the query variable it appears that only a single one of the subqueries has been performed and I'm left with a bucketful of data I don't require. However this approach works:
IList<Guid> temp = query.Select(x => x.Id).ToList();
foreach (Guid parent in parentAttributes)
{
var subQuery = from sc in db.tSearchIndexes
join a in db.tAttributes on sc.AttributeGUID equals a.GUID
join pc in db.tPeopleIndexes on a.GUID equals pc.AttributeGUID
where a.RelatedGUID == parent && userId == pc.CPSGUID
select sc.CPSGUID;
temp = temp.Intersect(subQuery).ToList();
}
query = query.Where(x => temp.Contains(x.Id));
Unfortunately this approach is nasty, as it results in multiple queries to the remote database, whereas the initial approach, if I could get it working, would result in only a single hit. Any ideas?

I think you are hitting a special case of capturing the loop variable in the lambda expression used to filter, also known as an "access to modified closure" error.
Try this:
foreach (Guid parentLoop in parentAttributes)
{
var parent = parentLoop;
var subQuery = from sc in db.tSearchIndexes
join a in db.tAttributes on sc.AttributeGUID equals a.GUID
join pc in db.tPeopleIndexes on a.GUID equals pc.AttributeGUID
where a.RelatedGUID == parent && userId == pc.CPSGUID
select sc.CPSGUID;
query = query.Where(x => subQuery.Contains(x.Id));
}
The problem is capturing the parent variable in the closure (which is what the LINQ syntax is converted to). This causes all the subqueries to be run with the same parent id.
What happens is that the compiler generates a class to hold the delegate and the local variables the delegate accesses. The compiler reuses the same instance of that class for every iteration of the loop; therefore, once the query executes, all of the Where clauses execute with the same parent Guid, namely the last one assigned in the loop.
Declaring the parent inside the loop scope causes the compiler to essentially make a copy of the variable, with the correct value, to be captured.
This can be a bit hard to grasp at first, so if this is the first time it has hit you, I'd recommend these two articles for background and a thorough explanation:
Eric Lippert: Closing over the loop variable considered harmful and part two.
Jon Skeet: Closures
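To see the capture behaviour in isolation, here is a minimal in-memory sketch. It is not the original database query, and the names parents, sharedResults and copiedResults are made up for illustration. It uses a for loop on purpose: since C# 5, foreach gives each iteration a fresh variable, so the foreach form of the bug shows up only with older compilers, while a for loop variable is still shared across iterations.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int[] parents = { 10, 20, 30 };

// Every lambda captures the single loop variable 'i', not the value it had
// when the lambda was created.
var shared = new List<Func<int>>();
for (int i = 0; i < parents.Length; i++)
{
    shared.Add(() => i);
}

// Copying into a variable declared inside the loop body gives each lambda
// its own variable, which is the fix suggested above.
var copied = new List<Func<int>>();
for (int i = 0; i < parents.Length; i++)
{
    int parent = parents[i];
    copied.Add(() => parent);
}

// The lambdas run only now, after both loops have finished.
List<int> sharedResults = shared.Select(f => f()).ToList();  // 3, 3, 3
List<int> copiedResults = copied.Select(f => f()).ToList();  // 10, 20, 30

Console.WriteLine(string.Join(", ", sharedResults));
Console.WriteLine(string.Join(", ", copiedResults));
```

A deferred LINQ query behaves like these lambdas: the filter reads the captured variable when the query finally runs, not when Where is called.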

Maybe this way?
var subQuery = from sc in db.tSearchIndexes
join a in db.tAttributes on sc.AttributeGUID equals a.GUID
join pc in db.tPeopleIndexes on a.GUID equals pc.AttributeGUID
where parentAttributes.Contains(a.RelatedGUID) && userId == pc.CPSGUID
select sc.CPSGUID;
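One thing to check before adopting this: Contains matches rows related to any of the parents, while the loop in the question chains one filter per parent and therefore keeps only ids related to every parent. The sketch below shows the difference on made-up in-memory data (the index array and the Person/Parent property names are assumptions, not the real schema).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for the index tables: (person, parent attribute) pairs.
var index = new[]
{
    new { Person = "ann", Parent = 1 },
    new { Person = "ann", Parent = 2 },
    new { Person = "bob", Parent = 1 },
};
int[] parentAttributes = { 1, 2 };

// Contains: people linked to at least one of the parents.
List<string> matchesAny = index
    .Where(row => parentAttributes.Contains(row.Parent))
    .Select(row => row.Person)
    .Distinct()
    .OrderBy(p => p)
    .ToList();

// One filter per parent (what the loop builds): people linked to every parent.
List<string> matchesAll = index
    .Select(row => row.Person)
    .Distinct()
    .Where(person => parentAttributes.All(
        parent => index.Any(row => row.Person == person && row.Parent == parent)))
    .OrderBy(p => p)
    .ToList();

Console.WriteLine(string.Join(", ", matchesAny));  // ann, bob
Console.WriteLine(string.Join(", ", matchesAll));  // ann
```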

Related

C# Linq Group by Object

I have an issue of using group by in LINQ to SQL statement.
The code I have is:
var combinedItems = (from article in articles
join author in authors
on article.AuthorId equals author.Id into tempAuthors
from tempAuthor in tempAuthors.DefaultIfEmpty()
select new { article , author = tempAuthor});
var groups1 = (from combinedItem in combinedItems
group combinedItem by combinedItem.article into g
select g.Key).ToList();
var groups2 = (from combinedItem in combinedItems
group combinedItem by combinedItem.article.Id into g
select g.Key).ToList();
I tried to group in two different ways. The first way, I group by an object and the second way I just group by a field in one of the objects.
When I run groups1, I get an error saying the query needs to be evaluated on the client side, whereas groups2 works fine. What could be wrong? If I want to group by object, is there any way to do it?
In case you want to group by object, as you've not overridden Equals and GetHashCode() in your Article class or implemented IEqualityComparer<Article> you're just getting the default comparison, which checks if the references are equal. So what you need is something like this:
class GroupItemComparer : IEqualityComparer<Article>
{
public bool Equals(Article x, Article y)
{
return x.Id == y.Id &&
x.Name == y.Name;
}
public int GetHashCode(Article obj)
{
return obj.Id.GetHashCode() ^
obj.Name.GetHashCode();
}
}
And then you need to change your query to lambda expression:
var groups1 = combinedItems.GroupBy(c => c.article , new GroupItemComparer())
.Select(c => c.Key).ToList();
If you get an exception about translating your method to SQL, you can call AsEnumerable or ToList before GroupBy. With these methods, once the data is loaded, any further operation is performed using LINQ to Objects on the data already in memory.
As others have pointed out, the GroupBy is using reference equality by default, and you could get around it by specifying one or more properties to group by. But why is that an error?
The whole point of the query provider is to translate your LINQ query into SQL. Since object reference equality on the client can't be easily translated to SQL, the translator doesn't support it and gives you an error.
When you provide one or more properties to group by, the provider can translate that to SQL (e.g. GROUP BY article.Id), and thus the second method works without error.
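If the goal is one result per article, a possible sketch is to group by the key properties (as an anonymous type) and rebuild what you need from each group. The data and property names below (ArticleId, Title, Author) are invented, and the sample runs in memory with LINQ to Objects; whether a given provider translates the same shape to SQL is something to verify against your own setup.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flattened article/author rows, standing in for combinedItems.
var combinedItems = new[]
{
    new { ArticleId = 1, Title = "Intro", Author = "Kim" },
    new { ArticleId = 1, Title = "Intro", Author = "Lee" },
    new { ArticleId = 2, Title = "Joins", Author = "Kim" },
};

// Group by the properties that identify an article instead of the object itself.
var groups = combinedItems
    .GroupBy(item => new { item.ArticleId, item.Title })
    .Select(g => new
    {
        g.Key.ArticleId,
        g.Key.Title,
        Authors = g.Select(x => x.Author).ToList(),
    })
    .ToList();

foreach (var g in groups)
{
    Console.WriteLine($"{g.ArticleId} {g.Title}: {string.Join(", ", g.Authors)}");
}
```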

Is it safe to join a table twice in the same query?

I need to write some linq (linq-to-sql) for a search page that allows the user to search for cars and optionally include search criteria for the car's parts. The two tables are CAR and CAR_PARTS. Here is what I have so far:
var query = db.CAR;
//if the user provides a car name to search by, filter on car name (this works)
if(model.CarName != "")
{
query = from c in query
where c.Name == model.CarName
select c;
}
//if the user provides a car part name to filter on, join the CAR_PART table
if(model.CarPartName != "")
{
query = from c in query
join parts in db.CAR_PARTS on c.ID equals parts.CarID
where parts.PartName == model.CarPartName
select c;
}
//if the user provides a car part code to filter on, join the CAR_PART table
if(model.CarPartCode != "")
{
query = from c in query
join parts in db.CAR_PARTS on c.ID equals parts.CarID
where parts.PartCode == model.CarPartCode
select c;
}
If the user decides they want to search on both CarPartName and CarPartCode, this logic would result in the CAR_PART table being joined twice. This feels wrong to me, but is this the correct way to handle this?
How would you write this?
It's legal to do so, but whether it makes sense, depends on your datamodel and your desired outcome.
Generally your code does the following if partname and partcode are defined
Join the cars table with the parts table with partname as join condition
Join the result of the first join again with the parts table with partcode as join condition.
Thus, this is equal to a join with join condition car.partname = part.name and car.partcode = part.code. I don't know, whether this is your desired behaviour or not.
There are some cases to distinguish
Joining with AND condition
CASE 1.1: name and code of a part are keys in the parts table
In this case name and code are each unique in the parts table, thus for each name there is exactly one code. The double join is not necessary, and may even lead to wrong results, because
if the selected name and code identify the same part, the first join will already get the desired results
if name and code identify different parts, your result will be empty because the condition cannot be fulfilled.
In that situation I would suggest writing it as follows
if (!string.IsNullOrEmpty(model.CarPartName)){
// your join on partname
} else if (!string.IsNullOrEmpty(model.CarPartCode)) {
// your join on partcode
}
CASE 1.2: name and code of a part are NOT keys in the parts table
In this case, neither name nor code may be unique, and for one name there may be different codes and vice versa. Here the double join is necessary and will only return cars that match both name and code. Note that the two joins match independently, so the name and the code may be matched by different parts of the same car.
Joining with OR condition
If, on the other hand, you want your join condition to be like car.partname = part.name or car.partcode = part.code, you have to consider the following cases
CASE 2.1 name and code are keys
Here applies the same as above in case 1.1
CASE 2.2 name and code are NOT keys
Here you can't use the stepwise approach, because the result of the first join will only contain cars, where the name matches. There may be parts where only the code condition matches, but they can never be included in the final result, if they are not contained in the result of the first match. So in this case, you will have to define your query something like this
if (!string.IsNullOrEmpty(model.CarPartName) && !string.IsNullOrEmpty(model.CarPartCode)) {
query = from c in query
join parts in db.CAR_PARTS on c.ID equals parts.CarID
where parts.PartName == model.CarPartName || parts.PartCode == model.CarPartCode
select c;
} else if (!string.IsNullOrEmpty(model.CarPartName)) {
query = from c in query
join parts in db.CAR_PARTS on c.ID equals parts.CarID
where parts.PartName == model.CarPartName
select c;
} else if (!string.IsNullOrEmpty(model.CarPartCode)) {
query = from c in query
join parts in db.CAR_PARTS on c.ID equals parts.CarID
where parts.PartCode == model.CarPartCode
select c;
}
What is actually wrong there is that, with proper relations defined, you don't need the join at all. Taking the behavior of LINQ to SQL into account, you can write it as:
var query = db.CAR
.Where( c =>
( string.IsNullOrEmpty(model.CarName)
|| c.Name == model.CarName ) &&
( string.IsNullOrEmpty(model.CarPartName)
|| c.Parts.Any( p => p.PartName == model.CarPartName )) &&
( string.IsNullOrEmpty(model.CarPartCode)
|| c.Parts.Any( p => p.PartCode == model.CarPartCode )));
Yours would work provided query is an IQueryable (db.CAR.AsQueryable()). The two LINQ approaches are similar but not the same. Depending on what you really need, yours might be the correct one or the wrong one. Yours would produce two inner joins, while this one simply creates two EXISTS checks. Assume you have:
Car, Id:5, Name: Volvo
And parts like:
CarID:5, PartName:HeadLights, PartCode:1 ... other details
CarID:5, PartName:HeadLights, PartCode:2 ... other details
CarID:5, PartName:HeadLights, PartCode:3 ... other details
If the user searches with model.CarName = "Volvo" and model.CarPartName = "HeadLights", you would get back the same Volvo 3 times. With the second approach, you get back a single Volvo.
HTH
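The Volvo example can be reproduced in memory. This is only a sketch with LINQ to Objects and invented anonymous types (Id, Name, CarId, PartName, PartCode), but the row counts mirror what an inner join versus an EXISTS-style check returns.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data matching the example above.
var cars = new[] { new { Id = 5, Name = "Volvo" } };
var parts = new[]
{
    new { CarId = 5, PartName = "HeadLights", PartCode = 1 },
    new { CarId = 5, PartName = "HeadLights", PartCode = 2 },
    new { CarId = 5, PartName = "HeadLights", PartCode = 3 },
};

// Inner join: one result row per matching part, so the car repeats.
List<string> viaJoin = (from c in cars
                        join p in parts on c.Id equals p.CarId
                        where p.PartName == "HeadLights"
                        select c.Name).ToList();

// Any: the car is tested once, so it appears at most once.
List<string> viaAny = cars
    .Where(c => parts.Any(p => p.CarId == c.Id && p.PartName == "HeadLights"))
    .Select(c => c.Name)
    .ToList();

Console.WriteLine(viaJoin.Count);  // 3
Console.WriteLine(viaAny.Count);   // 1
```

If the join form is preferred, adding Distinct() to the final query is one way to collapse the duplicates.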
I feel more comfortable with fluent syntax, but I'm sure something similar to the following will work for you. I would check the fields in your model as part of a Select statement and then conditionally join using one field or the other. If neither are set, leave it null.
var query = db.CAR.AsQueryable(); // IQueryable, so the Where result below can be assigned back
if (!string.IsNullOrWhiteSpace(model.CarName))
{
query = query.Where(car => car.Name == model.CarName);
}
var items = query.Select(car => new
{
Car = car, // maybe better to split this up into different fields, but I don't know what the car object looks like
// I assume your Car entity model has a navigation property to parts:
CarPart = !string.IsNullOrWhiteSpace(model.CarPartName)
? car.Parts.FirstOrDefault(part => part.PartName == model.CarPartName)
: !string.IsNullOrWhiteSpace(model.CarPartCode)
? car.Parts.FirstOrDefault(part => part.PartCode == model.CarPartCode)
: null
})
.ToList();
This does mean that the Code will be ignored if the Name is filled in. Reverse it if it needs to be the other way around. Or if you want to use both fields, you can put the string null checks in the Where clause.

Query returns all results quickly, but the call to Json then times out due to loading all unnecessary properties

I have the following query that gets a list of schools based on the criteria provided. Note: This database is very, very large with 10,000+ records. The end result is a list of 188 schools, which is exactly as we need.
return (from s in Context.Schools
join d in Context.Districts on s.DistrictID equals d.DistrictID
join r in Context.Rosters on s.SchoolID equals r.SchoolID
join te in Context.TestEvents on r.TestEventID equals te.TestEventID
join ta in Context.TestAdministrations on te.TestAdministrationID equals ta.TestAdministrationID
join sr in Context.ScoreResults on r.RosterID equals sr.RosterID into exists
from any in exists.DefaultIfEmpty()
where d.DistrictID == DistrictID
&& ta.SchoolYearID == SchoolYearID.Value
select s)
.Distinct()
.OrderBy(x => x.Name)
.ToList();
The problem is that when we call return Json(Schools, JsonRequestBehavior.AllowGet); to send our schools back to the client, the operation times out. When stepping through the code, it appears that for some reason the DbContext is trying to pull in ALL of the properties for this result set, including the ones we don't need. I already have everything I need from the database in this Schools object. Why does it go back and start creating all the associated objects? Is there a way to stop this?
This is an MVC application using EF 5 Code First.
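The JSON serializer walks every property of each School, including lazily loaded navigation properties, and each of those most likely triggers another trip to the database. Instead of selecting the whole entity, select a projection of only what you need: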
var results = from s in Context.Schools
...
select new MyClassContainingOnlyAFewProperties {
Prop1 = s.Prop1,
Prop2 = s.Prop2,
//etc.
};
return results;
See also: What does Query Projection mean in Entity Framework?

one-to-many projected LINQ query executes repeatedly

I am projecting LINQ to SQL results to strongly typed classes: Parent and Child. The performance difference between these two queries is large:
Slow Query - logging from the DataContext shows that a separate call to the db is being made for each parent
var q = from p in parenttable
select new Parent()
{
id = p.id,
Children = (from c in childtable
where c.parentid == p.id
select c).ToList()
};
return q.ToList(); //SLOW
Fast Query - logging from the DataContext shows a single db hit query that returns all required data
var q = from p in parenttable
select new Parent()
{
id = p.id,
Children = from c in childtable
where c.parentid == p.id
select c
};
return q.ToList(); //FAST
I want to force LINQ to use the single-query style of the second example, but populate the Parent classes with their Children objects directly. Otherwise, the Children property is an IQueryable<Child> that has to be queried to expose the Child objects.
The referenced questions do not appear to address my situation. Using db.LoadOptions does not work. Perhaps it requires the type to be a TEntity registered with the DataContext.
DataLoadOptions options = new DataLoadOptions();
options.LoadWith<Parent>(p => p.Children);
db.LoadOptions = options;
Please note: Parent and Child are simple types, not Table<TEntity> types, and there is no contextual relationship between Parent and Child. The subqueries are ad hoc.
The crux of the issue: in the second LINQ example I use IQueryable statements and do not call the ToList() function, and for some reason LINQ knows how to generate one single query that can retrieve all the required data. How do I populate my ad-hoc projection with the actual data, as is accomplished in the first query? Also, if anyone could help me better phrase my question, I would appreciate it.
It's important to remember that LINQ queries rely on deferred execution. In your second query you aren't actually fetching any information about the children. You've created the queries, but you haven't actually executed them to get the results of those queries. If you were to iterate the list, and then iterate the Children collection of each item, you'd see it taking as much time as the first query.
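Deferred execution is easy to observe in memory. The following sketch uses a counter (evaluations, a name chosen for illustration) to show that defining a query does no work and that the work happens only when the query is enumerated. It uses LINQ to Objects, not LINQ to SQL, but the principle is the same.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int evaluations = 0;
int[] source = { 1, 2, 3 };

// Defining the query runs nothing yet; the lambda is only stored.
IEnumerable<int> query = source.Select(x =>
{
    evaluations++;
    return x * 2;
});
int afterDefinition = evaluations;   // still 0

// ToList enumerates the query, which is when the lambda actually runs.
List<int> results = query.ToList();
int afterToList = evaluations;       // 3

Console.WriteLine($"{afterDefinition} {afterToList}");
```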
Your query is also inherently very inefficient. You're using a nested query in order to represent a Join relationship. If you use a Join instead the query will be able to be optimized appropriately by both the query provider as well as the database to execute much more quickly. You may also need to adjust the indexes on your database to improve performance. Here is how the join might look:
var q = from p in parenttable
join child in childtable
on p.id equals child.parentid into children
select new Parent()
{
id = p.id,
Children = children.ToList(),
};
return q.ToList(); //SLOW
The fastest way I found to accomplish this is to do a query that returns all the results then group all the results. Make sure you do a .ToList() on the first query, so that the second query doesn't do many calls.
Here r should have what you want to accomplish with only a single db query.
var q = (from p in parenttable
join c in childtable on p.id equals c.parentid
select c).ToList();
var r = q.GroupBy(x => x.parentid).Select(x => new { id = x.Key, Children=x });
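As a sketch of the same idea with the children materialized into lists, assuming the flat rows have already been fetched by one query (the childRows array and its Id/ParentId properties are stand-ins invented for this example):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for the result of the single joined query after ToList().
var childRows = new[]
{
    new { Id = 1, ParentId = 10 },
    new { Id = 2, ParentId = 10 },
    new { Id = 3, ParentId = 20 },
};

// Grouping happens in memory, so no further database calls are involved.
var parents = childRows
    .GroupBy(c => c.ParentId)
    .Select(g => new { Id = g.Key, Children = g.ToList() })
    .ToList();

foreach (var p in parents)
{
    Console.WriteLine($"{p.Id}: {p.Children.Count} children");
}
```

Because the rows come from an inner join, parents without any children do not appear in the result. If they are needed, they would have to be fetched separately or through a left join.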
You must set correct options for your data load.
options.LoadWith<Document>(d => d.Metadata);
Look at this
P.S. Include is for LINQ to Entities only.
The second query is fast precisely because Children is not being populated.
And the first one is slow just because Children is being populated.
Choose the one that fits your needs best, you simply can't have their features together!
EDIT:
As @Servy says:
In your second query you aren't actually fetching any information about the children. You've created the queries, but you haven't actually executed them to get the results of those queries. If you were to iterate the list, and then iterate the Children collection of each item you'd see it taking as much time as the first query.

Modularize (refactor) Linq queries

I have a few Linq queries. Semantically, they are
a join b join c join d where filter1(a) && filter2(c) && filter3(d)
a join b join c where filter1(a) && filter2(c)
a join b join c join e where filter1(a) && filter2(c) && filter4(e)
...
I want to be able to factor out the shared part:
a join b join c where filter1(a) && filter2(c)
and dynamically append join d and filter3(d)
Is there a way to do this? I am already using the Predicate Builder to dynamically build conditionals (filters).
EDIT: I am using Linq-to-SQL.
EDIT: The base query looks like:
from a in As.AsExpandable()
join b in Bs on a.Id equals b.PId
join c in Cs on b.Id equals c.PId
where filter1(a) && filter2(b) && filter3(c)
select new A { ... }
The filters are predicates built with Predicate Builder. The type of the query is IQueryable<A>.
Next, I'd like to join this with d
from a in BaseQuery()
join d in D on a.Id equals d.PId
Currently join d .. causes a compilation error:
The type of one of the expressions in the join clause is incorrect. Type inference failed in the call to Join
Your example is a bit vague, but it is easy to create a method that returns an IQueryable<T> and reuse that method, if that’s what you mean. Here is an example:
// Reusable method
public IQueryable<SomeObject> GetSomeObjectsByFilter(Context context)
{
return
from c in context.SomeObjects
where c.B.A.Amount < 1000
where c.Roles.Any(r => r.Name == "Admin") // Any takes a predicate; Contains does not
select c;
}
You can reuse this method in other places like this:
var q =
from c in GetSomeObjectsByFilter(context)
where !c.D.Any(d => d.Items.Any(i => i.Value > 100))
select c;
Because of the way IQueryable works, only the final query (the collection that you start iterating) will trigger a call to the database. This allows you to build a highly maintainable system by reusing business logic that gets executed inside the database, without any loss of performance.
I do this all the time and it improves the maintainability of my code big time. It works no matter which O/RM tool you run, because in IQueryable<T> composition there is no difference between writing the query in one piece and splitting it out into different methods.
Note that you do sometimes need some smart transformations to get the duplicate parts into a single method. Things that might help are returning grouped sets, and returning a set of a different type than what you think you need. This sounds a bit vague, but just post a question here at SO when you have problems splitting up a method. There are enough people here that can help you with that.
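Here is a small runnable sketch of that composition, using AsQueryable over in-memory arrays in place of a real O/RM. SmallAmounts, source and ds are hypothetical names. It appends both a further filter and a join to the shared base query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shared base query: returns IQueryable so callers can keep composing.
IQueryable<int> SmallAmounts(IQueryable<int> amounts) => amounts.Where(a => a < 1000);

IQueryable<int> source = new[] { 50, 150, 999, 1000, 5000 }.AsQueryable();
var ds = new[] { new { PId = 150, Label = "d1" } }.AsQueryable();

// Append another filter to the base query; nothing runs until ToList.
List<int> refined = SmallAmounts(source)
    .Where(a => a > 100)
    .OrderBy(a => a)
    .ToList();                               // 150, 999

// Append a join to the same base query. Both join keys are plain int here.
List<string> joined = (from a in SmallAmounts(source)
                       join d in ds on a equals d.PId
                       select d.Label).ToList();  // d1

Console.WriteLine(string.Join(", ", refined));
Console.WriteLine(string.Join(", ", joined));
```

One common cause of the "type inference failed in the call to Join" error is that the two key expressions have different types (for example int on one side and int? on the other), so that is worth checking when a join is added to a reused query.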
I can answer half your question easily. Linq makes it simple to append .where clauses to an existing query.
Example:
var x = db.Table1.Where(w => w.field1 == nTestValue);
x = x.Where(w => w.field2 == nTestValue2);
I believe you can do the joins as well but I have to go find an example in some old code. I'll look if nobody else jumps in with it soon.
