LINQ query which retrieves parent/child of children with a collection - c#

I have a self-referencing Category class. I would like to retrieve parent categories and all their corresponding children, but only when a category has at least one child category with at least one activity (ICollection<Activity>) in its collection.
The same goes for children of children: these should only be returned if they have child categories with at least one activity.
If there are no child categories with at least one activity, the parent or child Category should not be returned.
The query should return the parent Category as an actual Category object and not just the CategoryId. Is this possible?
public class Category
{
public int CategoryId { get; set; }
public string Name { get; set; }
public int? ParentId { get; set; }
public virtual Category Parent { get; set; }
public virtual ICollection<Category> Children { get; set; }
public virtual ICollection<Activity> Activities { get; set; }
}
UPDATE 1
The query which partially works:
var categories = _db.Categories
.Where(x => x.Parent != null && x.Activities.Count > 0)
.GroupBy(x => x.ParentId)
.Select(g => new { Parent = g.Key, Children = g.ToList() }).ToList();

Let's start a bit smaller, since the query you are looking to create is somewhat complex, and build it from the bottom up. First, you want to eliminate categories that do not have any child categories with at least one activity. Let's write a predicate that returns true for categories that should be included and false for those that should be excluded, at a single level. LINQ's Any and Where take a Func<Category, bool> rather than a Predicate<Category>, so the predicates are declared as Func. We will do this in two stages. First, a predicate that returns true for categories that have activities:
Func<Category, bool> hasActivities = cat => cat.Activities.Any();
Second, a predicate that returns true for categories with child categories that have activities:
Func<Category, bool> hasChildWithActivities =
    parentCat => parentCat.Children.Any(hasActivities);
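Now let's create the filter that prunes a given Category's descendants. To do this, we will create a Func that takes a parent Category, performs the logic and returns the updated Category. The delegate is declared before it is assigned so that the lambda can refer to itself, and ToList() materializes the result so that it can be assigned to the ICollection<Category> property:
Func<Category, Category> getFilteredCategory = null;
getFilteredCategory = parentCat =>
{
    parentCat.Children = parentCat.Children
        .Where(hasChildWithActivities)
        .Select(getFilteredCategory)
        .ToList();
    return parentCat;
};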
Note that this is equivalent to the anonymous-method form (again declared first so that it can call itself):
Func<Category, Category> getFilteredCategory = null;
getFilteredCategory = delegate(Category parentCat)
{
    parentCat.Children = parentCat.Children
        .Where(hasChildWithActivities)
        .Select(getFilteredCategory)
        .ToList();
    return parentCat;
};
In your question, you mentioned that you wanted to filter parents as well. You can apply the same logic to the parents by traversing up to the top level and running this filter from there, or by writing a separate query with joins or more complex select statements. IMHO, the latter would likely be messy and I would advise against it. If you need to apply the logic to parents as well, first traverse up the tree. Either way, this should give you a good start.
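As a rough, self-contained illustration of the recursive filtering idea, here is a sketch that works on categories already loaded into memory. It is not the predicate from the answer: it assumes the intended rule is "keep a child when it, or any of its descendants, has at least one activity", and it returns a pruned copy instead of modifying the loaded objects. The Activity class shape and the names CategoryFilter, HasActivityInSubtree and Prune are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal shapes, for illustration only.
public class Activity { public int ActivityId { get; set; } }

public class Category
{
    public int CategoryId { get; set; }
    public string Name { get; set; }
    public List<Category> Children { get; set; } = new List<Category>();
    public List<Activity> Activities { get; set; } = new List<Activity>();
}

public static class CategoryFilter
{
    // True when the category itself or any descendant has at least one activity.
    public static bool HasActivityInSubtree(Category cat) =>
        cat.Activities.Any() || cat.Children.Any(c => HasActivityInSubtree(c));

    // Returns a pruned copy of the tree: a child is kept only when its
    // subtree contains at least one activity. The input is not modified.
    public static Category Prune(Category cat) => new Category
    {
        CategoryId = cat.CategoryId,
        Name = cat.Name,
        Activities = cat.Activities,
        Children = cat.Children
            .Where(c => HasActivityInSubtree(c))
            .Select(c => Prune(c))
            .ToList()
    };
}
```

Calling CategoryFilter.Prune(root) on each top-level category (those with ParentId == null) would give the filtered trees. Working on a copy avoids a change tracker interpreting the pruned Children collections as deletions, which is a concern only if the objects are tracked entities.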
Let me know if you have any questions. Good luck and happy coding! :)

Related

How to retrieve hierarchical data in inorder traversal?

I have an element that represents a node in a tree structure.
public class Element
{
public int Id { get; set; }
...
public Element Left { get; set; }
public Element Right { get; set; }
}
I keep all of these elements in one table, with foreign keys pointing to the child elements.
If I try to get this tree back by using eager loading, I get the tree in postorder traversal:
public string GetExpression(int rootId)
{
var root = _context.Set<Element>()
.Include(r => r.Left)
.Include(r => r.Right)
.ToList();
}
Is there a way using queries to get the elements in inorder traversal? Or do I have to do this by myself recursively?
The order in which records are returned by a query is not defined unless the query has an ORDER BY clause. So it is just luck that the example you gave returns them in post-order.
I would suggest that you simply define in-order and post-order traversal methods and invoke them after you have loaded the entire set. You can omit the two .Include statements since EF will patch up the navigation relations during the load.
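A minimal sketch of such an in-order traversal method, assuming the elements are already loaded and their Left/Right references are populated. The Element class is reduced to the members shown in the question, and the name ElementTraversal is hypothetical.

```csharp
using System.Collections.Generic;

public class Element
{
    public int Id { get; set; }
    public Element Left { get; set; }
    public Element Right { get; set; }
}

public static class ElementTraversal
{
    // In-order: the whole left subtree, then the node itself, then the right subtree.
    public static IEnumerable<Element> InOrder(Element node)
    {
        if (node == null) yield break;
        foreach (var e in InOrder(node.Left)) yield return e;
        yield return node;
        foreach (var e in InOrder(node.Right)) yield return e;
    }
}
```

Nested iterators like this are fine for small trees; for very deep trees an explicit stack would avoid the per-level iterator overhead.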

GetAllWithChildren() performance issue

I used SQLite-Net Extensions in the following code to retrieve 1000 rows with their child relationships from a SQLite database:
var list =
SQLiteNetExtensions.Extensions.ReadOperations.GetAllWithChildren<DataModel>(connection);
The problem is that performance is poor, because GetAllWithChildren() returns a List rather than an Enumerable. Is there any way to load the records into an Enumerable using SQLite-Net Extensions?
For now I use the Table() method from SQLite-Net, which loads the fetched rows into an Enumerable, but I don't want to use it because it does not understand the relationships and does not load the child entities at all.
GetAllWithChildren suffers from the N+1 problem, and in your specific scenario this performs especially badly. It's not clear from your question what you're trying to do, but you could try these solutions:
Use the filter parameter in GetAllWithChildren:
Instead of loading all the objects into memory and then filtering, you can use the filter parameter, which internally performs a Table<T>().Where(filter) query that SQLite-Net converts to a SELECT-WHERE clause, so it's very efficient:
var list = connection.GetAllWithChildren<DataModel>(d => d.Name == "Jason");
Perform the query and then load the relationships
If you look at the GetAllWithChildren code you'll see that it just performs the query and then loads the existing relationships. You can do that yourself to avoid automatically loading unwanted relationships:
// Load elements from database
var list = connection.Table<DataModel>().Where(d => d.Name == "Jason").ToList();
// Iterate elements and load relationships (one extra query per element)
foreach (DataModel element in list) {
    connection.GetChildren(element, recursive: false);
}
Load relationships manually
To work around the N+1 problem completely, you can fetch the relationships manually using a Contains filter on the foreign keys. This depends heavily on your entity model, but would look like this:
// Load elements from database
var list = connection.Table<DataModel>().Where(d => d.Name == "Jason").ToList();
// Get list of dependency IDs
var dependencyIds = list.Select(d => d.DependencyId).ToList();
// Load all dependencies from database in a single query
var dependencies = connection.Table<Dependency>().Where(d => dependencyIds.Contains(d.Id)).ToList();
// Assign relationships back to the elements
foreach (DataModel element in list) {
element.Dependency = dependencies.FirstOrDefault(d => d.Id == element.DependencyId);
}
This solution solves the N+1 problem, because it performs only two database queries.
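The assignment step above calls FirstOrDefault once per element, which scans the dependency list every time. As a sketch of an alternative for larger result sets, the dependencies can be indexed by Id first. This covers only the in-memory part; the two lists are assumed to come from the two queries shown above. The DataModel and Dependency shapes and the name RelationshipLoader are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal shapes, for illustration only.
public class Dependency
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class DataModel
{
    public int Id { get; set; }
    public int DependencyId { get; set; }
    public Dependency Dependency { get; set; }
}

public static class RelationshipLoader
{
    // Attaches each element's Dependency with one dictionary lookup per
    // element instead of a linear scan of the dependency list.
    public static void Attach(IEnumerable<DataModel> elements, IEnumerable<Dependency> dependencies)
    {
        var byId = dependencies.ToDictionary(d => d.Id);
        foreach (var element in elements)
        {
            // TryGetValue leaves 'dependency' null when there is no match.
            byId.TryGetValue(element.DependencyId, out var dependency);
            element.Dependency = dependency;
        }
    }
}
```

ToDictionary throws if two dependencies share an Id, which should not happen when Id is the primary key.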
Another method to load relationships manually
Imagine we have these classes:
public class Parent
{
[PrimaryKey, AutoIncrement] public int Id { get; set; }
public string Name { get; set; }
public List<Child> children { get; set; }
public override bool Equals(object obj)
{
    // Compare against Parent itself; a cast to an unrelated type would throw.
    var other = obj as Parent;
    return other != null && Id.Equals(other.Id);
}
public override int GetHashCode()
{
return Id.GetHashCode();
}
}
and
public class Child
{
[PrimaryKey, AutoIncrement] public int Id { get; set; }
public string Name { get; set; }
public int ParentId { get; set; }
}
Note that these classes have a one-to-many relationship. An inner join between them would then be:
var parents = databaseSync.Table<Parent>().ToList();
var children = databaseSync.Table<Child>().ToList();
List<Parent> parentsWithChildren = parents.GroupJoin(children, parent => parent.Id, child => child.ParentId,
(parent, children1) =>
{
parent.children = children1.ToList();
return parent;
}).Where(parent => parent.children.Any()).ToList();

Select Subset in Linq

I am trying to write a linq query which will exclude any records that have a child record with a certain integer ID.
The class I am querying against looks like:
public class Group {
    public ICollection<Item> Items { get; set; } // This is the child collection
}
public class Item {
public int Id { get; set; }
}
My repository query method is:
public ICollection<Group> Get(int itemId) {
return from c in Set.... // Set is an EF collection of all Groups
}
I want to return all Groups that do not have an Item in their Items collection with the Id equal to the itemId passed to the method.
Not sure how to write this most efficiently in Linq.
This will work (I'm using method syntax, as I prefer it over query syntax for anything other than joins):
var result = db.Groups.Where(g => !g.Items.Any(i => i.Id == itemId)).ToList();
This selects all groups which don't contain an item with an Id equal to itemId. By the way, I notice you have Set in your code. Does this mean you already fetched all the groups beforehand (so that you are filtering in memory)? The easiest way is to work with your DbContext and access your tables from there.
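To see how the negated Any behaves, here is a small in-memory sketch using LINQ to Objects. The Name property and the GroupQueries class are hypothetical additions for illustration; against a DbContext the same lambda would be translated to SQL rather than run in memory.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Item
{
    public int Id { get; set; }
}

public class Group
{
    public string Name { get; set; } // hypothetical, only to tell groups apart
    public List<Item> Items { get; set; } = new List<Item>();
}

public static class GroupQueries
{
    // Keeps the groups in which no item has the given Id.
    // A group with an empty Items collection is kept as well.
    public static List<Group> WithoutItem(IEnumerable<Group> groups, int itemId) =>
        groups.Where(g => !g.Items.Any(i => i.Id == itemId)).ToList();
}
```

The condition could equally be written as g.Items.All(i => i.Id != itemId); both forms keep groups that have no items at all.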

Linq EF Split Parent into multiple Parents

Using Entity Framework to query a database with a Parent table and Child table with a 1-n relationship:
public class Parent {
public int id { get; set; }
public IList<Child> Children { get; set; }
}
public class Child {
public int id { get; set; }
}
Using EF, here's a quick sample query:
var parents = context.Parents;
Which returns:
parent id = 1, children = { (id = 1), (id = 2), (id = 3) }
What we need is for this to flatten into a 1-1 relationship, but as a list of parents with a single child each:
parent id = 1, children = { (id = 1) }
parent id = 1, children = { (id = 2) }
parent id = 1, children = { (id = 3) }
We're using an OData service layer which hits EF. So performance is an issue -- don't want it to perform a ToList() or iterate the entire result for example.
We've tried several different things, and the closest we can get is creating an anonymous type like such:
var results = from p in context.Parents
from c in p.Children
select new { Parent = p, Child = c }
But this isn't really what we're looking for. It creates an anonymous type of parent and child, not parent with child. So we can't return an IEnumerable<Parent> any longer, but rather an IEnumerable<anonymous>. The anonymous type isn't working with our OData service layer.
Also tried with SelectMany and got 3 results, but all of Children which again isn't quite what we need:
context.Parents.SelectMany(p => p.Children)
Is what we're trying to do possible? With the sample data provided, we'd want 3 rows returned -- representing a List each with a single Child. When normally it returns 1 Parent with 3 Children, we want the Parent returned 3 times with a single child each.
Your requirements don't make much sense to me: the idea behind EF and LINQ is precisely to avoid the kind of repeated rows that a SQL join returns. But you know your requirements better than we do, and we don't see the whole picture, so I will try to answer your question, hoping I understood it correctly.
If like you said, your problem is that IEnumerable<anonymous> doesn't work with your OData service layer, then create a class for the relationship:
public class ParentChild {
public Parent Parent { get; set; }
public Child Child { get; set; }
}
And then you can use it in your LINQ query:
var results = from p in context.Parents
              from c in p.Children
              select new ParentChild { Parent = p, Child = c };
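For comparison, the shape the question literally asks for (the same parent repeated once per child) can be sketched in memory with SelectMany. This is a LINQ to Objects illustration only: as far as I know, EF6 refuses to construct a mapped entity type inside a LINQ to Entities query, so it is not a drop-in replacement for the ParentChild projection above. The name ParentSplitter is hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Child
{
    public int id { get; set; }
}

public class Parent
{
    public int id { get; set; }
    public IList<Child> Children { get; set; } = new List<Child>();
}

public static class ParentSplitter
{
    // Yields one Parent copy per child; each copy carries exactly that child.
    // Evaluation is deferred: nothing runs until the result is enumerated.
    public static IEnumerable<Parent> SplitByChild(IEnumerable<Parent> parents) =>
        parents.SelectMany(p => p.Children.Select(c => new Parent
        {
            id = p.id,
            Children = new List<Child> { c }
        }));
}
```

A parent without children produces no rows here, which matches the behaviour of the nested from clauses in the query above.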

Avoiding duplicates in hierarchical parent-child relational collection

I am looking to write a LINQ statement for a simple scenario involving collections. I am trying to avoid duplicate items in a collection based on a parent-child relationship. The data structure and sample code are below:
public class Catalog
{
public int CatalogId { get; set; }
public int ParentCatalogId { get; set; }
public string CatalogName { get; set; }
}
public class Model
{
public int CatalogId { get; set; }
public string ItemName { get; set; }
...
}
List<Catalog> Catalogs: contains the complete list of parent-child relations, to any depth, for all the catalogs; the root one has ParentCatalogId = null.
List<Model> CollectionA: contains all the items of the child as well as its parent catalogs for a specific catalogId (up to its root).
I need to create a CollectionB from CollectionA that contains the items of the provided catalogId together with the items of all its parents, such that if an item is present in a child catalog, the same item in a parent catalog is ignored. That way there won't be any duplicate items when the same item is available in both child and parent.
In terms of code I am trying to achieve something like this
while (catalogId != null)
{
    CollectionB.AddRange(
        CollectionA.Where(x => x.CatalogId == catalogId &&
            !CollectionB.Select(y => y.ItemName).Contains(x.ItemName)));
    // Starting from child to parent and ignoring items that are already in CollectionB
    catalogId = Catalogs.
        Where(x => x.CatalogId == catalogId).
        Select(x => x.ParentCatalogId).
        FirstOrDefault();
}
I know that the Contains clause in the statement above will not work; I only included it to explain what I am trying to do. I could do this with a foreach loop, but I would like to use LINQ, and I am looking for the correct LINQ statement. The sample data is given below; I would really appreciate some help.
Catalog
ID  ParentId  CatalogName
1   null      CatalogA
2   1         Catalogb
3   1         CatalogC
4   2         CatalogD
5   4         CatalogE
CollectionA
CatalogId  ItemName
5          ItemA
5          ItemB
4          ItemA
4          ItemC
2          ItemA
2          ItemC
1          ItemD
Expected output
CollectionB
CatalogId  ItemName
5          ItemA
5          ItemB
4          ItemC
1          ItemD
LINQ is not designed to traverse hierarchical data structures, as has already been discussed in:
Walking a hierarchy table with Linq
Recursive Hierarchy - Recursive Query using Linq
But if you can get the hierarchy of catalogs from child to root, then the problem can be solved with a join and a distinct-by-property step (see LINQ's Distinct() on a particular property):
var modelsForE = (from catalog in flattenedHierarchyOfCatalogE
join model in models
on catalog.CatalogId equals model.CatalogId
select model).
GroupBy(model => model.ItemName).
Select(modelGroup => modelGroup.First()).
Distinct();
Or even better - adapt Jon Skeet's answer for distinct.
It solves the duplicates problem but leaves us with another question : How to get flattenedHierarchyOfCatalogE?
PURE LINQ SOLUTION:
It is not an easy task, but it is not exactly impossible with pure LINQ. Adapting How to search Hierarchical Data with Linq we get:
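public static class LinqExtensions
{
    // Yields the source first, then everything reachable through the selector,
    // so the starting (child) catalog precedes its ancestors. The GroupBy/First
    // query keeps the first model per item name, so this order makes the
    // child catalog's item win over the same item in a parent catalog.
    public static IEnumerable<T> Flatten<T>(this T source, Func<T, IEnumerable<T>> selector)
    {
        return new[] { source }
            .Concat(selector(source).SelectMany(c => Flatten(c, selector)));
    }
}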
//...
var catalogs = new Catalog[]
{
new Catalog(1, 0, "CatalogA"),
new Catalog(2, 1, "Catalogb"),
new Catalog(3, 1, "CatalogC"),
new Catalog(4, 2, "CatalogD"),
new Catalog(5, 4, "CatalogE")
};
var models = new Model[]
{
new Model(5, "ItemA"),
new Model(5, "ItemB"),
new Model(4, "ItemA"),
new Model(4, "ItemC"),
new Model(2, "ItemA"),
new Model(2, "ItemC"),
new Model(1, "ItemD")
};
var catalogE = catalogs.SingleOrDefault(catalog => catalog.CatalogName == "CatalogE");
var flattenedHierarchyOfCatalogE = catalogE.Flatten((source) =>
catalogs.Where(catalog =>
catalog.CatalogId == source.ParentCatalogId));
And then feed flattenedHierarchyOfCatalogE into the query from the beginning of this answer.
WARNING: I have added constructors to your classes, so the previous snippet may fail to compile in your project:
public Catalog(Int32 catalogId, Int32 parentCatalogId, String catalogName)
{
this.CatalogId = catalogId;
this.ParentCatalogId = parentCatalogId;
this.CatalogName = catalogName;
} //...
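Putting the pieces together, here is a self-contained sketch that produces the expected output from the sample data. It swaps the recursive Flatten for a plain iterative walk from the child up to the root, and it assumes ParentCatalogId is nullable, ids are unique, and the parent links contain no cycles. The names CatalogItems, SelfAndAncestors and Resolve are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Catalog
{
    public int CatalogId { get; set; }
    public int? ParentCatalogId { get; set; } // assumption: null marks the root
    public string CatalogName { get; set; }
}

public class Model
{
    public int CatalogId { get; set; }
    public string ItemName { get; set; }
}

public static class CatalogItems
{
    // Walks from the given catalog up to the root, child first.
    // Assumes the parent links contain no cycles.
    public static IEnumerable<Catalog> SelfAndAncestors(
        IReadOnlyDictionary<int, Catalog> byId, int catalogId)
    {
        int? current = catalogId;
        while (current.HasValue && byId.TryGetValue(current.Value, out var catalog))
        {
            yield return catalog;
            current = catalog.ParentCatalogId;
        }
    }

    // For each item name, keeps the entry from the nearest catalog in the chain.
    public static List<Model> Resolve(
        IEnumerable<Catalog> catalogs, IEnumerable<Model> models, int catalogId)
    {
        var byId = catalogs.ToDictionary(c => c.CatalogId);
        var itemsByCatalog = models.ToLookup(m => m.CatalogId);
        return SelfAndAncestors(byId, catalogId)
            .SelectMany(c => itemsByCatalog[c.CatalogId])
            .GroupBy(m => m.ItemName)   // groups appear in first-seen order
            .Select(g => g.First())     // first-seen = nearest catalog
            .ToList();
    }
}
```

With the sample data, CatalogItems.Resolve(catalogs, models, 5) walks catalogs 5, 4, 2, 1 and returns (5, ItemA), (5, ItemB), (4, ItemC), (1, ItemD). The dictionary and lookup are built once, so each step up the chain is a constant-time lookup rather than a scan of the whole catalog list.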
SOMETHING TO CONSIDER
There is nothing wrong with the previous solution (well, personally I might have used something that leans less heavily on LINQ, like Recursive Hierarchy - Recursive Query using Linq), but whichever solution you choose, you may run into one problem: it works, but it doesn't use any optimized data structures; it is just direct search and selection. If your catalogs grow and the queries execute more often, performance may become a problem.
But even if performance is not a problem, the ease of use of your classes is. Ids and foreign keys are good for relational databases but very unwieldy in OO systems. You may want to consider object-relational mapping for your classes (or creating wrappers (mirrors) for them) that would look something like:
public class Catalog
{
public Catalog Parent { get; set; }
public IEnumerable<Catalog> Children { get; set; }
public string CatalogName { get; set; }
}
public class Model
{
public Catalog Catalog { get; set; }
public string ItemName { get; set; }
}
Such classes are far more self-contained and much easier to use, and their hierarchies are easier to traverse. I don't know whether your system is database-driven or not, but you can nonetheless take a look at some object-relational mapping examples and technologies.
P.S.: LINQ is not the only tool in the .NET arsenal. No doubt it is a very useful tool, applicable in a multitude of situations, but not in every one. If a tool cannot help you solve a problem, it should be either modified or put aside for a moment.
You are most likely looking for SelectMany() extension. A short example of how it can be used to select all the children for comparison (to avoid duplicates) is below:
var col = new[] {
new { name = "joe", children = new [] {
new { name = "billy", age=1 },
new { name = "sally", age=4 }
}},
new { name = "bob", children = new [] {
new { name = "megan", age=10 },
new { name = "molly", age=7 }
}}
};
// Dump() is a LINQPad extension method; outside LINQPad, enumerate the result instead.
col.SelectMany(c => c.children).Dump("kids");
For more information, there are a few questions on Stack Overflow about this extension, and of course you can read the MSDN documentation.
