I am looking to write a LINQ statement for a simple scenario involving collections. I am trying to avoid duplicate items in a collection based on a parent-child relationship. The data structure and sample code are below.
public class Catalog
{
public int CatalogId { get; set; }
public int? ParentCatalogId { get; set; } // nullable: the root catalog has no parent
public string CatalogName { get; set; }
}
public class Model
{
public int CatalogId { get; set; }
public string ItemName { get; set; }
...
}
List<Catalog> Catalogs : Contains the complete list of parent-child relations, to any depth, for all the catalogs; the root one has ParentCatalogId = null.
List<Model> CollectionA : Contains all the items of the child as well as the parent catalogs for a specific catalogId (up to its root).
I need to create a CollectionB from CollectionA that contains the items of the provided catalogId plus the items of all its parents, such that if an item is present in a child catalog, the same item in a parent catalog is ignored. That way there won't be any duplicate items if the same item is available in a child as well as a parent.
In terms of code, I am trying to achieve something like this:
while (catalogId!= null)
{
CollectionB.AddRange(
CollectionA.Where(x => x.CatalogId == catalogId &&
!CollectionB.Select(y => y.ItemName).Contains(x.ItemName)));
// Starting from child to parent and ignoring items that are already in CollectionB
catalogId = Catalogs.
Where(x => x.CatalogId == catalogId).
Select(x => x.ParentCatalogId).
FirstOrDefault();
}
I know that the Contains clause in the LINQ statement above will not work; I just put it there to explain what I am trying to do. I can do this with a foreach loop, but I want to use LINQ. I am looking for the correct LINQ statement to do this. The sample data is given below, and I would really appreciate some help.
Catalog
ID   ParentId   CatalogName
1    null       CatalogA
2    1          Catalogb
3    1          CatalogC
4    2          CatalogD
5    4          CatalogE
CollectionA
CatalogId   ItemName
5           ItemA
5           ItemB
4           ItemA
4           ItemC
2           ItemA
2           ItemC
1           ItemD
Expected output
CollectionB
5           ItemA
5           ItemB
4           ItemC
1           ItemD
LINQ is not designed to traverse hierarchical data structures, as has already been discussed in:
Walking a hierarchy table with Linq
Recursive Hierarchy - Recursive Query using Linq
But if you can get the hierarchy of catalogs from child to root, the problem can be solved with a join followed by a distinct-by-property step (see LINQ's Distinct() on a particular property):
// Assumes flattenedHierarchyOfCatalogE is ordered from child to root,
// so First() in each group picks the item from the nearest catalog.
var modelsForE = (from catalog in flattenedHierarchyOfCatalogE
join model in models
on catalog.CatalogId equals model.CatalogId
select model).
GroupBy(model => model.ItemName).
Select(modelGroup => modelGroup.First());
Or even better - adapt Jon Skeet's answer for distinct.
It solves the duplicates problem but leaves us with another question : How to get flattenedHierarchyOfCatalogE?
PURE LINQ SOLUTION:
It is not an easy task, but it is not impossible with pure LINQ. Adapting How to search Hierarchical Data with Linq, we get:
public static class LinqExtensions
{
public static IEnumerable<T> Flatten<T>(this T source, Func<T, IEnumerable<T>> selector)
{
// Yield the source first and its ancestors after it, so the result runs from child to root.
return new[] { source }
.Concat(selector(source).SelectMany(c => Flatten(c, selector)));
}
}
//...
var catalogs = new Catalog[]
{
new Catalog(1, 0, "CatalogA"),
new Catalog(2, 1, "Catalogb"),
new Catalog(3, 1, "CatalogC"),
new Catalog(4, 2, "CatalogD"),
new Catalog(5, 4, "CatalogE")
};
var models = new Model[]
{
new Model(5, "ItemA"),
new Model(5, "ItemB"),
new Model(4, "ItemA"),
new Model(4, "ItemC"),
new Model(2, "ItemA"),
new Model(2, "ItemC"),
new Model(1, "ItemD")
};
var catalogE = catalogs.SingleOrDefault(catalog => catalog.CatalogName == "CatalogE");
var flattenedHierarchyOfCatalogE = catalogE.Flatten((source) =>
catalogs.Where(catalog =>
catalog.CatalogId == source.ParentCatalogId));
And then feed flattenedHierarchyOfCatalogE into the query from the beginning of this answer.
WARNING: I have added constructors to your classes, so the previous snippet will not compile in your project unless you add them as well:
public Catalog(Int32 catalogId, Int32 parentCatalogId, String catalogName)
{
this.CatalogId = catalogId;
this.ParentCatalogId = parentCatalogId;
this.CatalogName = catalogName;
} //...
SOMETHING TO CONSIDER
There is nothing wrong with the previous solution (well, personally I might have used something that leans less heavily on LINQ, like Recursive Hierarchy - Recursive Query using Linq), but whichever solution you choose, you may hit one problem: it works, but it doesn't use any optimized data structures; it is just a direct search and selection. If your catalogs grow and the queries run more often, performance may become a problem.
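As a rough illustration of what "optimized data structures" could mean here, the sketch below indexes the catalogs in a Dictionary and the items in a Lookup, then walks the parent chain once. It is only a sketch under assumptions: ParentCatalogId is nullable with null marking the root, the hierarchy contains no cycles, and the names CatalogResolver, SelfAndAncestorIds and ResolveItems are hypothetical, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Catalog
{
    public int CatalogId { get; set; }
    public int? ParentCatalogId { get; set; }   // assumption: null marks the root
    public string CatalogName { get; set; }
}

public class Model
{
    public int CatalogId { get; set; }
    public string ItemName { get; set; }
}

// Hypothetical helper, for illustration only.
public static class CatalogResolver
{
    // Yields the ids from the given catalog up to the root, child first.
    // Assumes the parent chain has no cycles.
    private static IEnumerable<int> SelfAndAncestorIds(
        IReadOnlyDictionary<int, Catalog> byId, int catalogId)
    {
        int? current = catalogId;
        while (current.HasValue && byId.TryGetValue(current.Value, out var catalog))
        {
            yield return catalog.CatalogId;
            current = catalog.ParentCatalogId;
        }
    }

    public static List<Model> ResolveItems(
        IEnumerable<Catalog> catalogs, IEnumerable<Model> models, int catalogId)
    {
        var byId = catalogs.ToDictionary(c => c.CatalogId);   // O(1) parent lookup
        var byCatalog = models.ToLookup(m => m.CatalogId);    // items per catalog

        return SelfAndAncestorIds(byId, catalogId)
            .SelectMany(id => byCatalog[id])   // child items come first
            .GroupBy(m => m.ItemName)
            .Select(g => g.First())            // keep the nearest catalog's item
            .ToList();
    }
}
```

Called as CatalogResolver.ResolveItems(catalogs, models, 5) on the sample data from the question, this should return the four rows listed under "Expected output". If the two indexes are built once and reused, each query only touches the catalogs on the path to the root.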
But even if performance is not a problem, the ease of use of your classes is. Ids and foreign keys are good for relational databases but very unwieldy in OO systems. You may want to consider an object-relational mapping for your classes (or creating wrappers (mirrors) for them) that would look something like:
public class Catalog
{
public Catalog Parent { get; set; }
public IEnumerable<Catalog> Children { get; set; }
public string CatalogName { get; set; }
}
public class Model
{
public Catalog Catalog { get; set; }
public string ItemName { get; set; }
}
Such classes are far more self-contained and much easier to use, and their hierarchies are much easier to traverse. I don't know whether your system is database-driven or not, but you can nonetheless take a look at some object-relational mapping examples and technologies.
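To show why the object-based shape is easier to traverse, here is a minimal sketch built on the Parent reference from the classes above. The SelfAndAncestors method and the CatalogItems helper are hypothetical additions for illustration, and the code assumes the Parent chain ends in null at the root.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Catalog
{
    public Catalog Parent { get; set; }
    public string CatalogName { get; set; }

    // Hypothetical helper: walks from this catalog up to the root via Parent.
    public IEnumerable<Catalog> SelfAndAncestors()
    {
        for (var current = this; current != null; current = current.Parent)
            yield return current;
    }
}

public class Model
{
    public Catalog Catalog { get; set; }
    public string ItemName { get; set; }
}

// Hypothetical helper, for illustration only.
public static class CatalogItems
{
    public static List<Model> For(Catalog catalog, IEnumerable<Model> models)
    {
        var byCatalog = models.ToLookup(m => m.Catalog);   // keyed by object reference
        return catalog.SelfAndAncestors()
            .SelectMany(c => byCatalog[c])
            .GroupBy(m => m.ItemName)
            .Select(g => g.First())   // the nearest catalog wins
            .ToList();
    }
}
```

No id lookups are needed: the hierarchy walk is a plain loop over references.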
P.S.: LINQ is not a universal tool in the .NET arsenal. No doubt it is a very useful tool, applicable in a multitude of situations, but not in every possible one. And if a tool cannot help you solve a problem, it should be either modified or put aside for a moment.
You are most likely looking for the SelectMany() extension method. A short example of how it can be used to select all the children for comparison (to avoid duplicates) is below:
var col = new[] {
new { name = "joe", children = new [] {
new { name = "billy", age=1 },
new { name = "sally", age=4 }
}},
new { name = "bob", children = new [] {
new { name = "megan", age=10 },
new { name = "molly", age=7 }
}}
};
// Dump() is a LINQPad extension method; outside LINQPad, iterate over the result instead.
col.SelectMany(c => c.children).Dump("kids");
For more information, there are a few questions on Stack Overflow about this extension method, and of course you can read the MSDN documentation.
Related
I am rather new to programming, < 2 years. I am trying to take a flat table that is currently a stored procedure in MS-SQL and turn it into a complex data structure. What I'm trying to accomplish is returning all the changes for the various release versions of a project.
These are the model classes I currently have:
public class ReleaseNote
{
public string ReleaseVersion { get; set; }
public DateTime ReleaseDate { get; set; }
public List<ReleaseNoteItems> ReleaseNoteItems { get; set; }
}
public class ReleaseNoteItems
{
public string ChangeType { get; set; }
public List<string> Changes { get; set; }
}
And this is the business logic in the DAL class I have:
public IEnumerable<ReleaseNote> GetAllReleaseNotes()
{
string cmdText = ConfigurationManager.AppSettings["ReleaseNotesAll"];
Func<DataTable, List<ReleaseNote>> transform = releaseNoteTransform;
return getRecords<ReleaseNote>(cmdText, transform);
}
public List<ReleaseNote> releaseNoteTransform(DataTable data)
{
//DISTINCT LIST OF ALL VERSIONS (PARENT RECORDS)
var versions = data.AsEnumerable().Select(row => new ReleaseNote
{
ReleaseVersion = row["ReleaseVersion"].ToString(),
ReleaseDate = DateTime.Parse(row["ReleaseDate"].ToString())
}).Distinct().ToList();
//ENUMERATE VERSIONS AND BUILD OUT RELEASENOTEITEMS
versions.ForEach(version =>
{
//GET LIST OF ROWS THAT BELONG TO THIS VERSION NUMBER
var rows = data.AsEnumerable().Where(row => row["ReleaseVersion"].ToString() == version.ReleaseVersion).ToList();
//GET DISTINCT LIST OF CHANGE TYPES IN THIS VERSION
var changeTypes = rows.Select(row => row["ChangeType"].ToString()).Distinct().ToList();
//INSTANTIATE LIST FOR RELEASENOTE ITEMS
version.ReleaseNoteItems = new List<ReleaseNoteItems>();
//ENUMERATE CHANGE TYPES AND CREATE THEM
changeTypes.ForEach(changeType =>
{
//FILTER FOR CHANGES FOR THIS SPECIFIC CHANGE TYPE AND PROJECT TO LIST<STRING>
var changes = rows.Where(row => row["ChangeType"].ToString() == changeType)
.Select(row => row["ReleaseNote"].ToString()).ToList();
//CREATE THE ITEM AND POPULATE IT
var releaseNoteDetail = new ReleaseNoteItems();
releaseNoteDetail.ChangeType = changeType;
releaseNoteDetail.Changes = changes;
version.ReleaseNoteItems.Add(releaseNoteDetail);
});
});
return versions;
}
I'm presently using Postman to return a JSON object and the issue I'm presently having is that it is not returning unique objects or release versions, it is still giving me duplicates.
These are some links I've looked at. None I've found provide solutions for the specific implementation I'm using. I've tried different implementations, but it seems they fall outside the framework of what I'm trying to accomplish.
Please let me know if you need more information. I'm trying to follow the question protocol, but I'm sure there is something I've left out.
Thanks in advance!
Nice & universal way to convert List of items to Tree
Is there a way to easily convert a flat DataTable to a nested .NET object?
Recursive method turning flat structure to recursive
Sounds like your data has duplicates. A given ReleaseVersion may have more than one record. When you take DISTINCT in your example, you are enforcing uniqueness over {ReleaseVersion, ReleaseDate}, which apparently is not good enough.
If you want to have rows that are unique with respect to ReleaseVersion, you need to figure out how to populate ReleaseDate when there is more than one possible value. I would suggest that it should be populated with the latest release date associated with that version. You can enforce that logic with LINQ GroupBy and Max, like this:
var uniqueRows = dt.AsEnumerable()
.GroupBy(row => row["ReleaseVersion"])
.Select (group => new ReleaseNote
{
ReleaseVersion = group.Key as string,
ReleaseDate = group.Max(row => (DateTime)row["ReleaseDate"])
}
);
This LINQ will create one row per release version. The release date will be populated with the latest (max) release date, given the release version.
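One more thing that may be worth checking, offered as an assumption about the question's code rather than something this answer states: the ReleaseNote class shown in the question does not override Equals and GetHashCode, and Distinct() with no comparer then compares object references. If that is the case, two ReleaseNote instances with identical values still count as different. The sketch below uses in-memory objects and a hypothetical DistinctDemo class to contrast that with the GroupBy approach.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class ReleaseNote
{
    public string ReleaseVersion { get; set; }
    public DateTime ReleaseDate { get; set; }
}

// Hypothetical demo class, for illustration only.
public static class DistinctDemo
{
    public static List<ReleaseNote> Sample() => new List<ReleaseNote>
    {
        new ReleaseNote { ReleaseVersion = "1.0", ReleaseDate = new DateTime(2020, 1, 1) },
        new ReleaseNote { ReleaseVersion = "1.0", ReleaseDate = new DateTime(2020, 1, 1) },
        new ReleaseNote { ReleaseVersion = "1.1", ReleaseDate = new DateTime(2020, 2, 1) }
    };

    // Distinct() falls back to reference equality for a class
    // that does not override Equals/GetHashCode.
    public static int CountWithDistinct(IEnumerable<ReleaseNote> notes) =>
        notes.Distinct().Count();

    // Grouping on the version string compares values, so one note per version remains.
    public static List<ReleaseNote> OnePerVersion(IEnumerable<ReleaseNote> notes) =>
        notes.GroupBy(n => n.ReleaseVersion)
             .Select(g => new ReleaseNote
             {
                 ReleaseVersion = g.Key,
                 ReleaseDate = g.Max(n => n.ReleaseDate)
             })
             .ToList();
}
```

With the sample above, CountWithDistinct still sees three notes, while OnePerVersion returns two.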
Using Entity Framework to query a database with a Parent table and Child table with a 1-n relationship:
public class Parent {
public int id { get; set; }
public IList<Child> Children { get; set; }
}
public class Child {
public int id { get; set; }
}
Using EF, here's a quick sample query:
var parents = context.Parents;
Which returns:
parent id = 1, children = { (id = 1), (id = 2), (id = 3) }
What we need is for this to flatten into a 1-1 relationship, but as a list of parents with a single child each:
parent id = 1, children = { (id = 1) }
parent id = 1, children = { (id = 2) }
parent id = 1, children = { (id = 3) }
We're using an OData service layer which hits EF. So performance is an issue -- don't want it to perform a ToList() or iterate the entire result for example.
We've tried several different things, and the closest we can get is creating an anonymous type like such:
var results = from p in context.Parents
from c in p.Children
select new { Parent = p, Child = c }
But this isn't really what we're looking for. It creates an anonymous type of parent and child, not parent with child. So we can't return an IEnumerable<Parent> any longer, but rather an IEnumerable<anonymous>. The anonymous type isn't working with our OData service layer.
Also tried with SelectMany and got 3 results, but all of Children which again isn't quite what we need:
context.Parents.SelectMany(p => p.Children)
Is what we're trying to do possible? With the sample data provided, we'd want 3 rows returned -- representing a List each with a single Child. When normally it returns 1 Parent with 3 Children, we want the Parent returned 3 times with a single child each.
Your requirements don't make much sense to me: EF and LINQ are designed to avoid the kind of repeated parent data that a SQL join returns. But you know your requirements better and we don't see the whole picture, so I will try to answer your question, hoping I understood it correctly.
If like you said, your problem is that IEnumerable<anonymous> doesn't work with your OData service layer, then create a class for the relationship:
public class ParentChild {
public Parent Parent { get; set; }
public Child Child { get; set; }
}
And then you can use it in your LINQ query:
var results = from p in context.Parents
from c in p.Children
select new ParentChild { Parent = p, Child = c };
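For reference, the same flattening can be written in method syntax with the SelectMany overload that takes a result selector. The sketch below runs against in-memory lists only; the Flattening class is a hypothetical name, and whether the equivalent IQueryable query is accepted by a given OData service layer is not something this example verifies.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Child
{
    public int id { get; set; }
}

public class Parent
{
    public int id { get; set; }
    public IList<Child> Children { get; set; }
}

public class ParentChild
{
    public Parent Parent { get; set; }
    public Child Child { get; set; }
}

// Hypothetical helper, for illustration only.
public static class Flattening
{
    // Produces one ParentChild row per (parent, child) pair.
    public static IEnumerable<ParentChild> Flatten(IEnumerable<Parent> parents) =>
        parents.SelectMany(
            p => p.Children,
            (p, c) => new ParentChild { Parent = p, Child = c });
}
```

For one parent with three children this yields three rows, each carrying the same Parent and a different Child.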
I am trying to work out how to use the .NET EntityFramework to generate both readable and natural code and efficient SQL query statements when fetching related entities. For example, given the following code-first definition
public class WidgetContext : DbContext
{
public DbSet<Widget> Widgets { get; set; }
public DbSet<Gizmo> Gizmos { get; set; }
}
public class Widget
{
public virtual int Id { get; set; }
[Index]
[MaxLength(512)]
public virtual string Name { get; set; }
public virtual ICollection<Gizmo> Gizmos { get; set; }
}
public class Gizmo
{
public virtual long Id { get; set; }
[Index]
[MaxLength(512)]
public virtual string Name { get; set; }
public virtual Widget Widget { get; set; }
public virtual int WidgetId { get; set; }
}
I want to be able to write code like
using (var wc = new WidgetContext())
{
var widget = wc.Widgets.First(x => x.Id == 123);
var gizmo = widget.Gizmos.First(x => x.Name == "gizmo 99");
}
and see a SQL query created along the lines of
SELECT TOP (1) * from Gizmos WHERE WidgetId = 123 AND Name = 'gizmo 99'
So that the work of picking the right Gizmo is performed by the database. This is important because in my use case each Widget could have thousands of related Gizmos and in a particular request I only need to retrieve one at a time. Unfortunately the code above causes the EntityFramework to create SQL like this instead
SELECT * from Gizmos WHERE WidgetId = 123
The match on Gizmo.Name is then being performed in memory by scanning the complete set of related Gizmo entities.
After a good deal of experimentation, I have found ways of creating the efficient SQL use I am looking for in the entity framework, but only by using ugly code which is much less natural to write. The example below illustrates this.
using System.Data.Entity;
using System.Data.Entity.Core.Objects.DataClasses;
using System.Linq;
static void Main(string[] args)
{
Database.SetInitializer(new DropCreateDatabaseAlways<WidgetContext>());
using (var wc = new WidgetContext())
{
var widget = new Widget() { Name = "my widget"};
wc.Widgets.Add(widget);
wc.SaveChanges();
}
using (var wc = new WidgetContext())
{
var widget = wc.Widgets.First();
for (int i = 0; i < 1000; i++)
widget.Gizmos.Add(new Gizmo() { Name = string.Format("gizmo {0}", i) });
wc.SaveChanges();
}
using (var wc = new WidgetContext())
{
wc.Database.Log = Console.WriteLine;
var widget = wc.Widgets.First();
Console.WriteLine("=====> Query 1");
// queries all gizmos associated with the widget and then runs the 'First' query in memory. Nice code, ugly database usage
var g1 = widget.Gizmos.First(x => x.Name == "gizmo 99");
Console.WriteLine("=====> Query 2");
// queries on the DB with two terms in the WHERE clause - only pulls one record, good SQL, ugly code
var g2 = ((EntityCollection<Gizmo>) widget.Gizmos).CreateSourceQuery().First(x => x.Name == "gizmo 99");
Console.WriteLine("=====> Query 3");
// queries on the DB with two terms in the WHERE clause - only pulls one record, good SQL, ugly code
var g3 = wc.Gizmos.First(x => x.Name == "gizmo 99" && x.WidgetId == widget.Id);
Console.WriteLine("=====> Query 4");
// queries on the DB with two terms in the WHERE clause - only pulls one record, also good SQL, ugly code
var g4 = wc.Entry(widget).Collection(x => x.Gizmos).Query().First(x => x.Name == "gizmo 99");
}
Console.ReadLine();
}
Query 1 demonstrates the 'fetch everything and filter' approach that is generated by the natural usage of the entity objects.
Queries 2,3 and 4 above all generate what I would consider to be an efficient SQL query - one that returns a single row and has two terms in the WHERE clause, but they all involve very stilted C# code.
Does anyone have a solution that will allow natural C# code to be written and generate efficient SQL utilization in this case?
I should note that I have tried replacing ICollection with EntityCollection in my Widget object to allow the cast to be removed from the Query 2 code above. Unfortunately this leads to an EntityException telling me that
The object could not be added to the EntityCollection or
EntityReference. An object that is attached to an ObjectContext cannot
be added to an EntityCollection or EntityReference that is not
associated with a source object.
when I try to retrieve any related objects.
Any suggestions appreciated.
Ok, further digging has let me get as close as I think is possible to where I want to be (which, to reiterate, is code that looks OO but generates efficient DB usage patterns).
It turns out that Query2 above (casting the related collection to an EntityCollection) actually isn't a good solution, since although it generates the desired query type against the database, the mere act of fetching the Gizmos collection from the widget is enough to make the entity framework go off to the database and fetch all related Gizmos - i.e. performing the query that I am trying to avoid.
However, it's possible to get the EntityCollection for a relationship without calling the getter of the collection property, as described here http://blogs.msdn.com/b/alexj/archive/2009/06/08/tip-24-how-to-get-the-objectcontext-from-an-entity.aspx. This approach sidesteps the entity framework fetching related entities when you access the Gizmos collection property.
So, an additional read-only property on the Widget can be added like this
public IQueryable<Gizmo> GizmosQuery
{
get
{
var relationshipManager = ((IEntityWithRelationships)this).RelationshipManager;
return (IQueryable<Gizmo>) relationshipManager.GetAllRelatedEnds().First( x => x is EntityCollection<Gizmo>).CreateSourceQuery();
}
}
and then the calling code can look like this
var g1 = widget.GizmosQuery.First(x => x.Name == "gizmo 99");
This approach generates SQL that efficiently fetches only a single row from the database, but depends on the following conditions holding true:
1 - Only one relationship from the source to the target type. Having multiple relationships linking a Widget to Gizmos would mean a more complicated predicate would be needed in the .First() call in GizmosQuery.
2 - Proxy creation is enabled for the DbContext and the Widget class is eligible for proxy generation (https://msdn.microsoft.com/en-us/library/vstudio/dd468057%28v=vs.100%29.aspx)
3 - The GizmosQuery property must not be called on objects that are newly created using new Widget() since these will not be proxies and will not implement IEntityWithRelationships. New objects that are valid proxies can be created using wc.Widgets.Create() instead if necessary.
I have a self referencing Category class from which I would like to retrieve parent categories and all corresponding children if it has at least one child category and has at least 1 or more activities (ICollection<Activity>) in the collection.
This would also go for children of children as these should only be returned if there are children categories with at least 1 or more activities.
If there are no child categories with at least 1 or more activities the parent or child Category should not be returned.
The query should return the parent Category as an actual Category object and not just the CategoryId. Is this possible?
public class Category
{
public int CategoryId { get; set; }
public string Name { get; set; }
public int? ParentId { get; set; }
public virtual Category Parent { get; set; }
public virtual ICollection<Category> Children { get; set; }
public virtual ICollection<Activity> Activities { get; set; }
}
UPDATE 1
The query which partially works:
var categories = _db.Categories
.Where(x => x.Parent != null && x.Activities.Count > 0)
.GroupBy(x => x.ParentId)
.Select(g => new { Parent = g.Key, Children = g.ToList() }).ToList();
Let's start a bit smaller, since the query you are looking to create is somewhat complex. We will build it from the bottom up. First off, you want to eliminate categories that do not have any child categories with at least one activity. Let's make a predicate that returns true for categories that should be included and false for those that should be excluded, at a single level. We will do this in two stages. First, a predicate that returns true for categories that have activities (declared as Func<Category, bool> rather than Predicate<Category>, because the LINQ Any and Where operators expect a Func):
Func<Category, bool> hasActivities = cat => cat.Activities.Any();
Second, a predicate that returns true for categories with child categories that have activities:
Func<Category, bool> hasChildWithActivities =
parentCat => parentCat.Children.Any(hasActivities);
Now let's create the filter that prunes a given Category's descendants. To do this, we will create a Func that takes a parent Category, performs the logic and returns the updated Category. The variable is declared before the lambda is assigned so that the lambda can call itself:
Func<Category, Category> getFilteredCategory = null;
getFilteredCategory =
parentCat =>
{
// Note: this overwrites Children on the object that was passed in.
parentCat.Children = parentCat.Children
.Where(hasChildWithActivities)
.Select(getFilteredCategory)
.ToList();
return parentCat;
};
Note that this is equivalent to:
Func<Category, Category> getFilteredCategory = null;
getFilteredCategory = delegate(Category parentCat)
{
parentCat.Children = parentCat.Children
.Where(hasChildWithActivities)
.Select(getFilteredCategory)
.ToList();
return parentCat;
};
In your OP, you mentioned that you wanted to filter parents as well. You can use this same logic on the parents by traversing up to the top level and running this query, or by creating a separate query with "joins" or more complex "select" statements. IMHO, the latter would likely be messy and I would advise against it. If you need to apply the logic to parents as well, then first traverse up the tree. Either way, this should give you a good start.
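If mutating the loaded entities is a concern, a variant that builds a pruned copy of the tree is sketched below. It rests on one reading of the requirement, which is an assumption on my part: a category is kept when it has at least one activity itself or at least one kept descendant. The CategoryFilter class and its Prune method are hypothetical names, and the sketch works on in-memory objects only.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Activity
{
    public int ActivityId { get; set; }
}

public class Category
{
    public int CategoryId { get; set; }
    public string Name { get; set; }
    public ICollection<Category> Children { get; set; } = new List<Category>();
    public ICollection<Activity> Activities { get; set; } = new List<Activity>();
}

// Hypothetical helper, for illustration only.
public static class CategoryFilter
{
    // Returns a pruned copy of the subtree, or null when neither the category
    // nor any of its descendants has an activity. The input is left untouched.
    public static Category Prune(Category category)
    {
        var keptChildren = category.Children
            .Select(child => Prune(child))
            .Where(child => child != null)
            .ToList();

        if (!category.Activities.Any() && keptChildren.Count == 0)
            return null;

        return new Category
        {
            CategoryId = category.CategoryId,
            Name = category.Name,
            Activities = category.Activities,
            Children = keptChildren
        };
    }
}
```

Calling Prune on each top-level category and discarding the null results gives the filtered forest; a different keep rule only requires changing the condition before the return.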
Let me know if you have any questions. Good luck and happy coding! :)
Here's my problem: I have a class that has two list properties of the same class type (but with different restrictions on how they are filled), let's say:
public class Team
{
[Key]
public int IDTeam { get; set; }
public string TeamName { get; set; }
public List<Programmer> Members { get; set; }
public List<Programmer> Leaders { get; set; }
public void LoadLists(MyProjectDBContext db)
{
this.Members = db.Programmers.Where(p => p.IDTeam == this.IDTeam
&& (p.Experience == "" || p.Experience == null)).ToList();
this.Leaders = db.Programmers.Where(p => p.IDTeam == this.IDTeam
&& (p.Experience != null && p.Experience != "")).ToList();
}
}
public class Programmer
{
[Key]
public int IDProgrammer { get; set; }
[ForeignKey("Team")]
public int IDTeam { get; set; }
public virtual Team Team { get; set; }
public string Name { get; set; }
public string Experience { get; set; }
}
At some point, I need to take a list of Teams, with its members and leaders, and for this I would assume something like:
return db.Teams
.Include(m => m.Members.Where(p => p.Experience == "" || p.Experience == null))
.Include(l => l.Leaders.Where(p => p.Experience != null && p.Experience != ""))
.OrderBy(t => t.TeamName)
.ToList();
And, of course, that assumption is wrong (it's not working at all).
Any ideas on how to achieve that?
EDIT: To clarify a bit more, the 2 list properties of the team class should be filled according to:
1 - Members attribute - Should include all related programmers with no experience (programmer.Experience == null or "");
2 - Leaders attribute - Should include all related programmers with any experience (programmer.Experience != null and != "");
EDIT 2: Here's the MyProjectDbContext declaration:
public class MyProjectDBContext : DbContext
{
public DbSet<Team> Teams { get; set; }
public DbSet<Programmer> Programmers { get; set; }
}
You are talking about Entity Framework (LINQ to Entities), right? If so, Include() is a LINQ to Entities method that includes a related collection or reference in the result set. I think you should place the Where() outside of the Include().
On this topic you'll find some examples on how to use the Include() method.
So I suggest to add the Include()'s first to include the relations "Members" and "Leaders" and then apply your Where-Statement (can be done with one Where()).
return db.Teams
.Include("Members")
.Include("Leaders")
.Where(t => t.Members.Any(m => string.IsNullOrWhiteSpace(m.Experience)) ... )
What is unclear to me is your Where criteria and your use case in general, as you are talking about getting a list of Teams with Leaders and Members. The example above will return a list of Teams that match the Where() statement. You can loop through it, and within that loop you can list each team's members and leaders, if that is the use case.
An alternative is something like this:
return db.Programmers
.Where(m => string.IsNullOrWhiteSpace(m.Experience))
.GroupBy(m => m.Team)
This gets you a list of members with no experience, grouped by Team. You can loop over the groups (Teams) and, within each, over its members. Because GroupBy yields one group per Team, each team already appears only once.
Hope this helps. If you need some more detailed code samples it would help to understand your requirements better. So maybe you can say a few more words on what you expect from the query.
Update:
Just read your edits, which sound interesting. I don't think you can do this all in one LINQ to Entities statement. Personally I would do it in the getters of the Members and Leaders properties, each running its own query (as read-only properties). To get good performance on large amounts of data I would even do it with SQL views in the DB itself. But this depends a little on the context in which "Members" and "Leaders" are used (how frequently, etc.).
Update 2:
To get a table of teams with sublists for members and leaders in a single query, I would query "Programmers" and group them, nested, by Team and Experience. The result is then a list of groups (= Teams) containing groups (experienced / not experienced) with Programmers in them. The final table can then be built with three nested foreach statements. See here for some grouping examples (see the example "GroupBy - Nested").
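A minimal sketch of that nested grouping, run against an in-memory list, could look like the following. The TeamRoster and Rosters names are hypothetical, the Programmer class is trimmed to the fields the grouping needs, and with Entity Framework the programmers would presumably have to be materialized first (for example with ToList()) before this code runs.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Programmer
{
    public int IDProgrammer { get; set; }
    public int IDTeam { get; set; }
    public string Name { get; set; }
    public string Experience { get; set; }
}

// Hypothetical result type, for illustration only.
public class TeamRoster
{
    public int IDTeam { get; set; }
    public List<Programmer> Members { get; set; }
    public List<Programmer> Leaders { get; set; }
}

public static class Rosters
{
    public static List<TeamRoster> Build(IEnumerable<Programmer> programmers) =>
        programmers
            .GroupBy(p => p.IDTeam)                       // outer grouping: by team
            .Select(team =>
            {
                // Inner grouping: true = has experience (leader), false = none (member).
                var byExperience = team.ToLookup(p => !string.IsNullOrEmpty(p.Experience));
                return new TeamRoster
                {
                    IDTeam = team.Key,
                    Members = byExperience[false].ToList(),
                    Leaders = byExperience[true].ToList()
                };
            })
            .OrderBy(r => r.IDTeam)
            .ToList();
}
```

The programmers are read once, and both lists for every team are filled from that single pass.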
Whenever you fetch entities, they will be stored in the context -- regardless of the form they are "selected" in. That means you can fetch the teams along with all the necessary related entities into an anonymous type, like this:
var teams =
(from team in db.Teams
select new {
team,
relatedProgrammers = team.Programmers.Where(
[query that gets all leaders OR members])
}).ToList().Select(x => x.team);
It looks like we're throwing away the relatedProgrammers field here, but those Programmer entities are still in memory. So, when you execute this:
foreach (var team in teams) team.LoadLists(db);
...it will populate the lists from the programmers that were already fetched, without querying the database again (assuming db is the same context instance as above).
Note: I haven't tested this myself. It's based on a similar technique shown in this answer.
EDIT - Actually, it looks like your "leaders" and "members" cover all programmers associated with a team, so you should be able to just do Teams.Include(t => t.Programmers) and then LoadLists.