Partially Populate Child Collection with NHibernate

I've been struggling with this for a while, and can't seem to figure it out...
I've got a BlogPost class, which has a collection of Comments, and each of the comments has a DatePosted field.
What I need to do is query for a BlogPost and return it with a partially loaded Comments collection, say all comments posted on the 1 Aug 2009.
I've got this query:
BlogPost post = session.CreateCriteria<BlogPost>()
.Add(Restrictions.Eq("Id", 1))
.CreateAlias("Comments", "c")
.Add(Restrictions.Eq("c.DatePosted", new DateTime(2009, 8, 1)))
.UniqueResult<BlogPost>();
When I run this query and check out the generated sql, it first runs a query against the BlogPost table, joining to the Comment table with the correct date restriction in, then runs a second query just on the Comment table that returns everything.
Result is the Comments collection of the BlogPost class totally filled up!
What am I doing wrong?
I've got code samples if anyone needs more info...!

There is a result transformer for this, see the documentation.
Quote:
Note that the kittens collections held
by the Cat instances returned by the
previous two queries are not
pre-filtered by the criteria! If you
wish to retrieve just the kittens that
match the criteria, you must use
SetResultTransformer(CriteriaUtil.AliasToEntityMap).
IList cats =
sess.CreateCriteria(typeof(Cat))
.CreateCriteria("Kittens", "kt")
.Add( Expression.Eq("Name", "F%") )
.SetResultTransformer(CriteriaUtil.AliasToEntityMap)
.List();
You could also use filters that get activated using session.EnableFilter(name).
There is a similar question here.

You're not really doing anything wrong - hibernate just doesn't work that way.
If you navigate from the BlogPost to the Comments, Hibernate will populate the comments based on the association mapping that you've specified, not the query you used to retrieve the BlogPost. Presumably your mapping is just doing a join on a key column. You can use a filter to get the effect you're looking for. But I think that will still fetch all the comments and then do a post-filter.
More simply, just query for what you want:
// Root the criteria at Comment so the result is a list of comments, not posts.
IList<Comment> comments = session.CreateCriteria<Comment>()
.Add(Restrictions.Eq("BlogPostId", 1))
.Add(Restrictions.Eq("DatePosted", new DateTime(2009, 8, 1)))
.List<Comment>();
This will in fact return only the comments from the specified date.
If it makes you feel better, you can then set them like this:
post.Comments = comments; // having already retrieved the post elsewhere
I was also surprised by this behaviour when I first encountered it. It seems like a bug, but I've been told it's by design.

Thanks for the response. I guess I kind of understand why it's by design, but I would have thought there would be a built-in way to enable this. Your solution works, but feels like a bit of a hack!
My problem is that the child collection is HUGE if not filtered (the example I gave of posts and comments was to protect the names of the innocent!) and there is no way I can be pulling all the data back every time.
I've run SQL Profiler on this and it's still pulling all the data back.
When I run the following code, the first query does what you'd expect: just the one post comes back. But as soon as the second query is executed, two queries go to the database: the first retrieves the filtered comments (bingo!), and the second populates the post.Comments property with all the comments, which is just what I'm trying to avoid!
var post = session.CreateCriteria<BlogPost>()
.Add(Restrictions.Eq("Id", 1))
.UniqueResult<BlogPost>();
var comments = session.CreateCriteria<Comment>()
.Add(Restrictions.Eq("BlogPostId", 1))
.Add(Restrictions.Eq("DatePosted", new DateTime(2009, 8, 1)))
.List<Comment>();
post.Comments = comments;
This is very strange. It's not like I'm enumerating over the post.Comments list, so why is it populating it?! Here are my classes and maps:
public class BlogPostMap : ClassMap<BlogPost>
{
public BlogPostMap()
{
Id(b => b.Id);
Map(b => b.Title);
Map(b => b.Body);
HasMany(b => b.Comments).KeyColumnNames.Add("BlogPostId");
}
}
public class CommentMap : ClassMap<Comment>
{
public CommentMap()
{
Id(c => c.Id);
Map(c => c.BlogPostId);
Map(c => c.Text);
Map(c => c.DatePosted);
}
}
public class BlogPost
{
public virtual int Id { get; set; }
public virtual string Title { get; set; }
public virtual string Body { get; set; }
public virtual IList<Comment> Comments { get; set; }
}
public class Comment
{
public virtual int Id { get; set; }
public virtual int BlogPostId { get; set; }
public virtual string Text { get; set; }
public virtual DateTime DatePosted { get; set; }
}
any ideas?

I agree it feels like a hack to manually populate the collection.
You can use a custom loader instead. Something like this:
<query name="loadComments">
<return alias="comments" class="Comment"/>
<load-collection alias="comments" role="Post.comments"/>
from Comments c where c.Id = ? and c.DatePosted = SYSDATE
</query>
Also, you can use sql-query if you want more control.
I've occasionally stooped to writing custom loaders when I couldn't get hibernate to generate the query I wanted. Anyway, don't know why I didn't think of that in the first place.

Make the Comments Collection lazy, so that hibernate doesn't fetch it when you're getting the BlogPost. Then use a filter on Comments collection.
comments = session.CreateFilter(blogPost.Comments, ... ).List();

Related

Class with multiple List Properties of same type but with different restrictions

Here's my problem: I have a class that have 2 list properties of the same class type (but with some different restriction as on how to be filled), let's say:
public class Team
{
[Key]
public int IDTeam { get; set; }
public string TeamName { get; set; }
public List<Programmer> Members { get; set; }
public List<Programmer> Leaders { get; set; }
public void LoadLists(MyProjectDBContext db)
{
// Members: programmers on this team with no experience recorded
this.Members = db.Programmers.Where(p => p.IDTeam == this.IDTeam
&& (p.Experience == "" || p.Experience == null)).ToList();
// Leaders: programmers on this team with any experience recorded
this.Leaders = db.Programmers.Where(p => p.IDTeam == this.IDTeam
&& (p.Experience != null && p.Experience != "")).ToList();
}
}
public class Programmer
{
[Key]
public int IDProgrammer { get; set; }
[ForeignKey("Team")]
public int IDTeam { get; set; }
public virtual Team Team { get; set; }
public string Name { get; set; }
public string Experience { get; set; }
}
At some point, I need to take a list of Teams, with its members and leaders, and for this I would assume something like:
return db.Teams
.Include(m => m.Members.Where(p => p.Experience == "" || p.Experience == null))
.Include(l => l.Leaders.Where(p => p.Experience != null && p.Experience != ""))
.OrderBy(t => t.TeamName)
.ToList();
And, of course, in this case I would be assuming it wrong (cause it's not working at all).
Any ideas on how to achieve that?
EDIT: To clarify a bit more, the 2 list properties of the team class should be filled according to:
1 - Members attribute - Should include all related programmers with no experience (programmer.Experience == null or "");
2 - Leaders attribute - Should include all related programmers with any experience (programmer.Experience != null and != "");
EDIT 2: Here's the MyProjectDbContext declaration:
public class MyProjectDBContext : DbContext
{
public DbSet<Team> Teams { get; set; }
public DbSet<Programmer> Programmers { get; set; }
}

You are talking about Entity Framework (LINQ to Entities), right? If so, Include() is a LINQ to Entities method that includes a sub-relation in the result set. I think you should place the Where() outside of the Include().
On this topic you'll find some examples on how to use the Include() method.
So I suggest adding the Include() calls first to include the relations "Members" and "Leaders", and then applying your Where() (one Where() is enough).
return db.Teams
.Include("Members")
.Include("Leaders")
.Where(t => t.Members.Any(m => string.IsNullOrWhiteSpace(m.Experience)) ... )
What is unclear to me is your Where() criteria and your use case overall, as you are talking about getting a list of Teams with Leaders and Members. My example above will return a list of Teams that match the Where() statement. You can loop through it, and within that loop you can list each team's members and leaders, if that is the use case.
An alternative is something like this:
return db.Programmers
.Where(m => string.IsNullOrWhiteSpace(m.Experience))
.GroupBy(m => m.Team)
This gets you a list of programmers with no experience, grouped by Team. You can loop over the groups (Teams) and, within each group, over its members. Because the grouping key is the team, each team appears only once in the result.
Hope this helps. If you need some more detailed code samples it would help to understand your requirements better. So maybe you can say a few more words on what you expect from the query.
Update:
Just read your edits, which sound interesting. I don't think you can do this all in one LINQ to Entities statement. Personally I would do that in the getters of the Members and Leaders properties, each of which runs its own query (as a read-only property). For performance with large amounts of data I would even do it with SQL views in the DB itself. But this depends a little on the context in which "Members" and "Leaders" are used (how frequently, etc.).
Update 2:
To get a table of teams with sublists for members and leaders using a single query, I would query "Programmers" and group them, nested, by Team and Experience. The result is then a list of groups (= Teams) containing groups (experienced / non-experienced) with Programmers in them. The final table can then be built with three nested foreach statements. See here for some grouping examples (see the example "GroupBy - Nested").

Whenever you fetch entities, they will be stored in the context -- regardless of the form they are "selected" in. That means you can fetch the teams along with all the necessary related entities into an anonymous type, like this:
var teams =
(from team in db.Teams
select new {
team,
relatedProgrammers = team.Programmers.Where(
[query that gets all leaders OR members])
}).ToList().Select(x => x.team);
It looks like we're throwing away the relatedProgrammers field here, but those Programmer entities are still in memory. So, when you execute this:
foreach (var team in teams) team.LoadLists(db);
...it will populate the lists from the programmers that were already fetched, without querying the database again (assuming db is the same context instance as above).
Note: I haven't tested this myself. It's based on a similar technique shown in this answer.
EDIT - Actually, it looks like your "leaders" and "members" cover all programmers associated with a team, so you should be able to just do Teams.Include(t => t.Programmers) and then LoadLists.
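As a rough illustration of the "load everything for the team, then split" idea from the EDIT above, here is a minimal sketch that works on programmers already held in memory. The TeamPartition class and its Split method are hypothetical names made up for this example, and the Programmer class is a cut-down stand-in for the entity in the question; no database or DbContext is involved.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Cut-down stand-in for the Programmer entity in the question.
public class Programmer
{
    public int IDTeam { get; set; }
    public string Name { get; set; }
    public string Experience { get; set; }
}

// Hypothetical helper, not part of Entity Framework.
public static class TeamPartition
{
    // Splits already-loaded programmers of one team into members
    // (null or empty Experience) and leaders (any non-empty Experience).
    public static (List<Programmer> Members, List<Programmer> Leaders) Split(
        IEnumerable<Programmer> loaded, int teamId)
    {
        // ToLookup enumerates the source once; the key is "has experience".
        var byExperience = loaded
            .Where(p => p.IDTeam == teamId)
            .ToLookup(p => !string.IsNullOrEmpty(p.Experience));

        // A lookup returns an empty sequence for a key that has no entries.
        return (byExperience[false].ToList(), byExperience[true].ToList());
    }
}
```

Something like this could replace the two separate Where() calls inside LoadLists once the team's programmers are already in the context, assuming the members/leaders rule stays as described in the question.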

Best way to query using EF

Using LINQ, I am having trouble querying my DbContext in an efficient way.
The database contains over 700,000 entities, each of which has a date, a name and other information.
In my code, I have a new list of objects (which can potentially have 100,000 elements) coming in, and I would like to query my database and work out which items are new entities and which are existing entities that need to be updated.
I would like to do it in a very efficient way (with a single query if possible).
This is my code :
public class MyDbContext : DbContext
{
public DbSet<MyEntity> MyEntities { get; set; }
}
public class MyEntity
{
[Key]
public Guid Id { get; set; }
public DateTime Date { get; set; }
public string Name { get; set; }
public double Amount { get; set; }
public string Description { get; set; }
}
public class IncomingInfo
{
public DateTime Date { get; set; }
public string Name { get; set; }
public double Amount { get; set; }
}
public class Modifier
{
public void AddOrUpdate(IList<IncomingInfo> info)
{
using (var context = new MyDbContext())
{
//Find the new information
//to add as new entities
IEnumerable<MyEntity> EntitiesToAdd = ??
//Find the information
//to update in existing entities
IEnumerable<MyEntity> EntitiesToUpdate = ??
}
}
}
Can someone help me constructing my query?
Thank you very much.
Edit :
Sorry, I forgot to explain how I decide that two entities are equal.
They are equal if the Date and the Name properties are identical.
I first tried to build a predicate using LinqKit PredicateBuilder without much success (I hit a "parameter too large" error and had to make multiple queries, which took time).
So far the most successful way I have found is to implement a LEFT OUTER join and join the incoming list to the DbSet,
which I implemented this way:
var values = info.GroupJoin(context.MyEntities,
inf => inf.Name + inf.Date.ToString(),
ent => ent.Name + ent.Date.ToString(),
(inf, ents) => new { Info = inf, Entities = ents })
.SelectMany(i => i.Entities.DefaultIfEmpty(),
(i, ent) => new { i.Info.Name, i.Info.Amount, i.Info.Date, ToBeAdded = ent == null });
IEnumerable<MyEntity> EntitiesToAdd = values.Where(i => i.ToBeAdded)
.Select(i => new MyEntity
{
Id = Guid.NewGuid(),
Amount = i.Amount,
Date = i.Date,
Name = i.Name,
Description = null
}).ToList();
My test contains 700,000 entities in database. The incoming info list contains 70,000 items; where 50,000 are existing entities and 20,000 are new entities.
This query takes around 15 seconds to execute which does not seem right to me.
Hopefully this is enough to ask for help. Can someone help me on this?
Thank you very much.

I read the pastebin response from @Leniency and it covers some of the same stuff I was going to say, like querying a date range and comparing on there. The problem with that method though is that (depending on how those dates are set) it might return all 700K+ records in the database, which would give you the absolute worst performance.
My suggestion is that you analyze your network topology to see how expensive your calls to the database really are. I'm assuming this is running on a (web) server which is receiving these IncomingInfo objects from clients. If this server is closely connected to your database server (or on the same machine) then you might be better off not optimizing your calls to the database.
Also, if you have control over the behavior of the clients, you might want to force them to send only like 25 to 100 records with each request. This would make it so that you could deal with them in much more manageable chunks. The client might have to send 100 or more requests to the server (which you could do async so that they get sent ~5 at a time, depending on expected load profiles), but at least it wouldn't be sitting there for 5+ minutes waiting to get a response back from the server for a single request.
BTW, the GroupJoin call that you said took 15 seconds probably has to download all 700K records before doing the join. You see, joins can't be done on objects that don't exist on the same machine: either all the IncomingInfo objects (or at least the Name+Date.ToString() concatenations) have to be sent to the database, or all the records have to be requested from the database before any join can be done. You would have to look at the SQL that is being sent to the database to tell which method is being used. But you might well find that querying the database for matches one at a time is faster than the join in this case.
Hope that helps! ;)
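For the in-memory half of the problem, a hash-set lookup keyed on (Name, Date) is one way to separate new items from existing ones without concatenating strings. This is only a sketch under assumptions: the IncomingPartition class and its Split method are hypothetical names, and it assumes the existing (Name, Date) pairs have already been fetched from the database by some projection query, which is not shown here and would still have its own cost.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same shape as the IncomingInfo class in the question.
public class IncomingInfo
{
    public DateTime Date { get; set; }
    public string Name { get; set; }
    public double Amount { get; set; }
}

// Hypothetical helper; it only does the in-memory comparison.
public static class IncomingPartition
{
    // existingKeys is assumed to hold the (Name, Date) pairs already stored
    // in the database, fetched beforehand by the caller.
    public static (List<IncomingInfo> ToAdd, List<IncomingInfo> ToUpdate) Split(
        IEnumerable<IncomingInfo> incoming,
        IEnumerable<(string Name, DateTime Date)> existingKeys)
    {
        // Value tuples compare component-wise, so two keys are equal
        // exactly when both Name and Date are equal.
        var known = new HashSet<(string, DateTime)>(existingKeys);

        var toAdd = new List<IncomingInfo>();
        var toUpdate = new List<IncomingInfo>();
        foreach (var info in incoming)
        {
            if (known.Contains((info.Name, info.Date)))
                toUpdate.Add(info);   // key already present: update candidate
            else
                toAdd.Add(info);      // key not present: new entity
        }
        return (toAdd, toUpdate);
    }
}
```

Each lookup is a hash probe, so the comparison itself is roughly linear in the size of the incoming list; whether this beats the join overall depends mostly on how expensive it is to fetch the existing keys.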

Does AsQueryable() on ICollection really makes lazy execution?

I am using Entity Framework CodeFirst where I have used Parent Child relations using ICollection as
public class Person
{
public string UserName { get; set; }
public ICollection<Blog> Blogs { get; set;}
}
public class Blog
{
public int id { get; set; }
public string Subject { get; set; }
public string Body { get; set; }
}
Ok, so far everything is working ok, but my concern is, whenever I want to get the Blogs of a person, I get it as
var thePerson = _context.Persons.Where(x => x.UserName == "xxx").SingleOrDefault();
var theBlogs = thePerson.Blogs.OrderBy(b => b.id).Take(5);
Now, I understand that when the second line is executed, all Blogs for that person are loaded into memory, and the sorting and selecting are then done in memory. That is not ideal for a Person who has a large number of blogs. I want to make the Blogs child collection an IQueryable so that the sorting and selecting are done in the SQL database before anything is pulled into memory.
I know I could declare the Blogs as IQueryable in my context so that I could directly query as
var theBlogs = _context.Blogs.Where(.....)
but that is not feasible for me due to design choice, I want to avoid any circular reference as much as possible due to serialization problem. So, I did not make any reference of the parent entity in my child.
I found that, i can call AsQueryable() method on the blogs as
var theBlogs = thePerson.Blogs.AsQueryable().OrderBy(b => b.id).Take(5);
That looks like magic to me and seems too good to be true. So my question: does AsQueryable() really turn the ICollection into an IQueryable and make the whole query run in SQL Server (lazy loading), OR is it just a cast, where the Blogs are loaded into memory as before and only the interface changes from ICollection to IQueryable?

So actually it appears that writing your navigation property as IQueryable<T> is not possible.
What you could do is adding a navigation property to Blog:
public class Blog
{
public int id { get; set; }
public string Subject { get; set; }
public string Body { get; set; }
public virtual Person Owner { get; set; }
}
From that, you can query as follows so it won't load everything into memory:
var thePerson = _context.Persons.Where(x => x.UserName == "xxx").SingleOrDefault();
var results = _context.Blogs.Where(z => z.Owner.UserName == thePerson.UserName).OrderBy(z => z.id).Take(5);
I suggest you to try LINQPad to see how LINQ is translated into SQL, and what is actually requested from the DB.

A better approach is described in Ladislav's answer. In your case:
var theBlogs = _context.Entry(thePerson)
.Collection(x => x.Blogs)
.Query()
.OrderBy(x => x.id)
.Take(5);
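To see what AsQueryable() does on a collection that is not backed by a database, the following sketch counts how many items are read from an in-memory source when OrderBy and Take are applied through AsQueryable(). The AsQueryableDemo class and the cut-down Blog class are hypothetical and exist only for this example; nothing here touches Entity Framework.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Cut-down stand-in for the Blog entity in the question.
public class Blog
{
    public int Id { get; set; }
    public string Subject { get; set; }
}

// Hypothetical demo class, not part of Entity Framework.
public static class AsQueryableDemo
{
    // Runs source.AsQueryable().OrderBy(...).Take(take) over an in-memory
    // source of `total` items and reports how many items were read and
    // which query provider executed the query.
    public static (int ItemsRead, string ProviderType) Run(int total, int take)
    {
        int itemsRead = 0;

        // Iterator that counts every item handed out to the consumer.
        IEnumerable<Blog> Source()
        {
            for (int i = 0; i < total; i++)
            {
                itemsRead++;
                yield return new Blog { Id = total - i, Subject = "post " + i };
            }
        }

        IQueryable<Blog> query = Source().AsQueryable()
            .OrderBy(b => b.Id)
            .Take(take);

        query.ToList(); // force execution

        return (itemsRead, query.Provider.GetType().Name);
    }
}
```

Calling AsQueryableDemo.Run(1000, 5) reads all 1000 items, and the provider is the in-memory EnumerableQuery type from System.Linq rather than a database provider. That suggests AsQueryable() on an ordinary collection only changes the interface; the sorting still happens in memory over whatever the collection already contains, which is consistent with the Query() approach above being the one that reaches the database.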

Nhibernate, painfully slow query, am I doing it wrong?

I have some major performance issues when running a specific NHibernate query.
I have two tables, A and B, where A has ~4,000 rows and B has ~50,000 rows. The relation between A and B is one-to-many.
The query needs to load all entities in A and then force-load all entities in B, because I want to aggregate over the entities in B.
I'm using Fluent NHibernate and have configured it to allow lazy loading. This works great for all other queries except this one, where I have to load ~50,000 entities; this number will likely grow by 50k a month. The query currently takes more than a minute to run (probably even longer).
Obvious optimizations that I've already done: only one session factory is created, and lazy loading is not turned off.
So my question is this: will NHibernate be too slow in this respect (that is, should I build my DAL with regular SQL queries rather than NHibernate?), or is there a way to improve the performance? This is a reporting application, so there won't be many concurrent users, but I would still like this query to take less than 5-10 seconds.
EDIT
Adding code:
public class ChatSessions
{
public virtual int Id { get; set; }
public virtual IList<ChatComments> Comments { get; set; }
public ChatSessions()
{
Comments = new List<ChatComments>();
}
}
public ChatCommentsMapping()
{
Id(x => x.Id);
References(x => x.ChatSession);
}
public class ChatComments
{
public virtual int Id { get; set; }
public virtual ChatSessions ChatSession{ get; set; }
public virtual string Comment { get; set; }
public virtual DateTime TimeStamp { get; set; }
public virtual int CommentType { get; set; }
public virtual bool Deleted { get; set; }
public virtual string ChatAlias { get; set; }
}
public ChatSessionsMapping()
{
Id(x => x.Id);
References(x => x.ChatRoom)
.Not.LazyLoad();
HasMany(x => x.Comments)
.Table("chatcomments");
}
Then In my repo I use this query:
public IList<ChatComments> GetChatCommentsBySession(int chatsessionid)
{
using(var session = _factory.OpenSession())
{
var chatsession = session.Get<ChatSessions>(chatsessionid);
NHibernateUtil.Initialize(chatsession.Comments);
return chatsession.Comments;
}
}
And that method gets called once for every Chatsession.
The query that I aggregate with then looks something like this:
foreach (var hour in groupedByHour){
var datetime = hour.Sessions.First().StartTimeStamp;
var dp = new DataPoint<DateTime, double>
{
YValue = hour.Sessions.Select(x =>
_chatCommentsRepo.GetChatCommentsBySession(x.Id).Count)
.Aggregate((counter,item) => counter += item),
XValue = new DateTime(datetime.Year, datetime.Month, datetime.Day, datetime.Hour, 0, 0)
};
datacollection.Add(dp);
}

Selecting 50,000 rows of any size is never going to be quick, but consider using a subselect fetching strategy; it should work a lot better in your scenario. Also, make sure you have an index on the foreign key in your database.
There's an example of what could be happening at the NHProf site
EDIT: I'd thoroughly recommend NHProf if you're doing any work with NHibernate - it's a quick way to get to WIN.

I posted a comment then re-read your question and suspect that you are probably utilizing NHibernate in a manner for which it's not ideal. You say you're pulling the table B rows to aggregate over them. Are you doing this using LINQ or something on the collections after you've pulled the individual records via NH?
If so, you might want to consider utilizing NH's capability to create projections that will perform the aggregates for you. In this way, NH will generate the SQL to do the aggregations, which in most cases is going to be much faster than doing 4000 retrievals of related items then performing aggregates in code.
This SO question might get you started: What's the best way to get aggregate results from NHibernate?
UPDATE
Yeah, looking at your code you're disabling lazy-loading, which is firing off a separate query for each of your chat items in order to pull the comments. It's taking forever because you're essentially doing 8000 separate queries.
It appears that you're trying to return a count of comments by hour. You can either do some manual SQL to split your comment timestamp by grouping by a DATEPART SQL expression or incorporate the datepart eval in your criteria, like this SO question: How to use DatePart in an NHibernate Criteria Query.
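For reference, the shape of the "count of comments per hour" aggregation can be sketched on in-memory data as follows. This is not the NHibernate projection itself: the HourlyCounts class and CountByHour method are hypothetical names, and in the real application the same grouping would ideally be pushed into the SQL query so that only one row per hour comes back from the database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper showing the aggregation shape only.
public static class HourlyCounts
{
    // Groups timestamps by the hour they fall in and counts each group.
    // The key truncates minutes and seconds, like the XValue in the question.
    public static SortedDictionary<DateTime, int> CountByHour(
        IEnumerable<DateTime> timestamps)
    {
        var result = new SortedDictionary<DateTime, int>();
        var groups = timestamps.GroupBy(
            t => new DateTime(t.Year, t.Month, t.Day, t.Hour, 0, 0));

        foreach (var hour in groups)
        {
            result[hour.Key] = hour.Count();
        }
        return result;
    }
}
```

The point of the comparison is that the loop in the question issues one repository call per chat session to obtain a single number, whereas a grouped count needs only the timestamps (or, done in SQL, only the final counts).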

How do I get NHibernate to do a join?

I've used Fluent NHibernate to hook up a store and employee class where Stores can have many employees as follows:
public class Store
{
public virtual IList<Employee> Employees { get; set; }
//other store properties
}
public class Employee
{
public virtual Store Store { get; set; }
public virtual bool? SomeStatus1 { get; set; }
}
I need to get all stores that have employees that do not have SomeStatus1 set to true.
My feeble attempt here has failed:
Session.CreateCriteria(typeof(Store))
.Add(Restrictions.Not(Restrictions.Eq("Employees.SomeStatus1", true)))
.List<Store>();
Any idea how I go about doing that?
The reason my attempt has failed is that the Employees list doesn't have a SomeStatus1 property... which is fairly obvious.
What I don't know is how to get NHibernate to only get stores which have employees in the state I'm looking for...
I think what I want is for NHibernate to do a join... but I don't know how to ask it to do that...

You join by creating a sub-criteria:
var criteria = Session.CreateCriteria(typeof(Store));
var join = criteria.CreateCriteria("Employees");
join.Add(Restrictions.Not(Restrictions.Eq("SomeStatus1", true)));
return criteria.List<Store>();
Untested (obviously), hope it works, but you get the idea. That's how I do it with N:1, but you have 1:N.
EDIT: Ok, I did a bit of research after posting. It seems the code I did should work, but will cause loading of the employees collection. The same basic code is found on ayende's blog. There is a sample there which does the same thing without causing the collection to be reloaded. Hope that helps.

Try:
Session.CreateCriteria(typeof(Store))
.CreateAlias("Employees", "e")
.Add(Restrictions.Not(Restrictions.Eq("e.SomeStatus1", true)))
.List<Store>();

I would suggest you use the Linq to NHibernate API instead of the Criteria API. With it, your query would be as follows:
var query = Session.Linq<Store>()
.Where(store => store.Employees.Any(e => e.SomeStatus1 != true));
var result = query.ToList();
More help here.
