NHibernate, painfully slow query, am I doing it wrong? - c#

I have some major performance issues when running a specific NHibernate query.
I have two tables, A and B, where A has ~4,000 rows and B has ~50,000 rows. The relation between A and B is one-to-many.
The query needs to load all entities in A and then force-load all entities in B, because I want to aggregate over the entities in B.
I'm using Fluent NHibernate and have configured it to allow lazy loading. This works great for all other queries except this one, where I have to load ~50,000 entities, and that number will likely grow by 50k a month. The query takes over a minute to run now (probably even longer).
Obvious optimizations that I've already done: only create one session factory; lazy loading is not turned off.
So my question is this: will NHibernate be too slow in this respect (that is, should I build my DAL with plain SQL queries rather than NHibernate), or is there a way to improve the performance? This is a reporting application, so there won't be many concurrent users, but I would still like this query to take less than 5-10 seconds.
EDIT
Adding code:
public class ChatSessions
{
    public virtual int Id { get; set; }
    public virtual IList<ChatComments> Comments { get; set; }

    public ChatSessions()
    {
        Comments = new List<ChatComments>();
    }
}
public ChatCommentsMapping()
{
    Id(x => x.Id);
    References(x => x.ChatSession);
}
public class ChatComments
{
    public virtual int Id { get; set; }
    public virtual ChatSessions ChatSession { get; set; }
    public virtual string Comment { get; set; }
    public virtual DateTime TimeStamp { get; set; }
    public virtual int CommentType { get; set; }
    public virtual bool Deleted { get; set; }
    public virtual string ChatAlias { get; set; }
}
public ChatSessionsMapping()
{
    Id(x => x.Id);
    References(x => x.ChatRoom)
        .Not.LazyLoad();
    HasMany(x => x.Comments)
        .Table("chatcomments");
}
Then in my repository I use this query:
public IList<ChatComments> GetChatCommentsBySession(int chatsessionid)
{
    using (var session = _factory.OpenSession())
    {
        var chatsession = session.Get<ChatSessions>(chatsessionid);
        NHibernateUtil.Initialize(chatsession.Comments);
        return chatsession.Comments;
    }
}
That method gets called once for every ChatSession.
The aggregation code then looks something like this:
foreach (var hour in groupedByHour)
{
    var datetime = hour.Sessions.First().StartTimeStamp;
    var dp = new DataPoint<DateTime, double>
    {
        YValue = hour.Sessions.Select(x =>
                _chatCommentsRepo.GetChatCommentsBySession(x.Id).Count)
            .Aggregate((counter, item) => counter += item),
        XValue = new DateTime(datetime.Year, datetime.Month, datetime.Day, datetime.Hour, 0, 0)
    };
    datacollection.Add(dp);
}

Selecting 50,000 rows of any size is never going to be quick, but consider using a subselect fetching strategy - it should work a lot better in your scenario. Also, make sure you have an index on the foreign key column in your database.
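For example, with Fluent NHibernate the fetch strategy can be set on the collection mapping from the question; a minimal sketch (verify the exact method names against your Fluent NHibernate version):

HasMany(x => x.Comments)
    .Table("chatcomments")
    .Fetch.Subselect();   // load the comments for all sessions in one extra SELECT instead of one query per session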
There's an example of what could be happening at the NHProf site
EDIT: I'd thoroughly recommend NHProf if you're doing any work with NHibernate - it's a quick way to get to WIN.

I posted a comment then re-read your question and suspect that you are probably utilizing NHibernate in a manner for which it's not ideal. You say you're pulling the table B rows to aggregate over them. Are you doing this using LINQ or something on the collections after you've pulled the individual records via NH?
If so, you might want to consider utilizing NH's capability to create projections that will perform the aggregates for you. In this way, NH will generate the SQL to do the aggregations, which in most cases is going to be much faster than doing 4000 retrievals of related items then performing aggregates in code.
This SO question might get you started: What's the best way to get aggregate results from NHibernate?
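As a rough, untested sketch of the idea (reusing the entity names from the question), a grouped criteria projection can return the comment count per session in a single round trip:

// Requires: using NHibernate.Criterion;
var countsPerSession = session.CreateCriteria<ChatComments>("c")
    .CreateAlias("c.ChatSession", "s")
    .SetProjection(Projections.ProjectionList()
        .Add(Projections.GroupProperty("s.Id"), "SessionId")
        .Add(Projections.RowCount(), "CommentCount"))
    .List<object[]>();   // each row is [sessionId, count]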
UPDATE
Yeah, looking at your code, you're defeating lazy loading by force-initializing the comments for each chat session individually, which fires off a separate query for each one. It's taking forever because you're essentially doing 8,000 separate queries.
It appears that you're trying to return a count of comments by hour. You can either write some manual SQL that groups on a DATEPART expression of the comment timestamp, or incorporate the DATEPART evaluation into your criteria, like this SO question: How to use DatePart in an NHibernate Criteria Query.
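A rough sketch of the criteria variant, assuming SQL Server and the column names implied by the question's mapping (untested):

// Requires: using NHibernate.Criterion; using NHibernate.Type;
var commentsPerHour = session.CreateCriteria<ChatComments>()
    .SetProjection(Projections.ProjectionList()
        .Add(Projections.SqlGroupProjection(
            "DATEPART(hour, TimeStamp) as CommentHour",   // select fragment
            "DATEPART(hour, TimeStamp)",                  // group-by fragment
            new[] { "CommentHour" },
            new IType[] { NHibernateUtil.Int32 }))
        .Add(Projections.RowCount(), "CommentCount"))
    .List<object[]>();   // each row is [hour, count]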

Related

EF Core: How to best get average value in a model of a related model

I've got a Blazor Server App using the Entity Framework (EF Core).
I use a code first approach with two models, Entry and Target.
Each entry has a target. So a target can have more than one entry pointing to it.
The model for the Target looks like this:
public class Target
{
    public string TargetId { get; set; }

    [Required]
    public string Name { get; set; }

    [InverseProperty("Target")]
    public List<Entry> Entries { get; set; }

    [NotMapped]
    public double AverageEntryRating => Entries != null ? Entries.Where(e => e.Rating > 0).Select(e => e.Rating).Average() : 0;
}
An entry can have a rating; the Entry model looks like this:
public class Entry
{
    public string EntryId { get; set; }
    public int Rating { get; set; }

    [Required]
    public string TargetId { get; set; }

    [ForeignKey("TargetId")]
    public Target Target { get; set; }
}
As you can see in my Target model, I would like to know for each Target what its average rating is, based on the average of all entries that point to the Target - that's why there is this [NotMapped] property in the target:
public double AverageEntryRating => Entries != null ? Entries.Where(e => e.Rating > 0).Select(e => e.Rating).Average() : 0;
But this does (of course) not always work, as the Entries of the target are not guaranteed to be loaded at the time the property is accessed.
I tried to solve it differently, for example with a method in my TargetService where I can pass in a targetId and it gives me the result:
public double GetTargetMedianEntryRating(string targetId)
{
    var median = _context.Entries
        .Where(e => e.TargetId == targetId && e.Rating > 0)
        .Select(e => e.Rating)
        .DefaultIfEmpty()
        .Average();
    return median;
}
But when I list my targets in a table and want to display this value in a cell (passing in the current targetId of the foreach loop), I get a concurrency exception because the database context is used from multiple threads (I guess one from looping through the rows/targets and another from getting the average value)... so this leads me into new trouble.
Personally I would prefer to work with the AverageEntryRating property on the Target model, as it seems natural to me and it would also be convenient to access the value just like this.
But how would I make sure that the entries are loaded when I access this property? Or is this not a good approach, because it would mean loading the Entries for all targets anyway, which would degrade performance? If so, what would be a good way to get the average/median value?
There are a couple of options I can think of, and which one to choose depends on your situation. There might be more alternatives, but hopefully these give you some options you hadn't considered.
Have a BaseQuery extension method that always include all Entries
You could make sure of doing .Include(x => x.Entries) whenever you are querying for Target. You can even create an extension method on the database context called something like TargetBaseQuery() that includes all necessary relationships whenever you use it. Then you can be sure that the Entries list of each Target is loaded when you access the property AverageEntryRating, for example like the sketch below.
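A minimal sketch of that extension method (the context and DbSet names are placeholders, not taken from your code):

public static class TargetQueryExtensions
{
    // Base query that always loads the related Entries,
    // so AverageEntryRating can be evaluated safely afterwards.
    public static IQueryable<Target> TargetBaseQuery(this ApplicationDbContext context)
        => context.Targets.Include(t => t.Entries);
}

// Usage:
// var targets = await _context.TargetBaseQuery().ToListAsync();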
The downside will be a performance hit, since every time you load a Target you will need to load all its entries... and that's for every Target you query.
However, if you need to get it working fast, this would be probably the easiest. The pragmatic approach would be to do this, measure the performance hit, and if it is too slow then try something else, instead of doing premature optimization. The risk of course would be that it might work fast now, but it might scale badly in the future. So it's up to you to decide.
Another thing to consider would be to not Include the Entries every single time, but only in those places where you know you need the average. It might however become a maintainability issue.
Have a different model and service method to calculate the TargetStats
You could create another model class that holds the related data of a Target but is not persisted in the database. For example:
public class TargetStats
{
    public Target Target { get; set; }
    public double AverageEntryRating { get; set; }
}
Then in your service you could have a method roughly like this (I haven't tested it, so it might not work as is, but you get the idea):
public List<TargetStats> GetTargetStats()
{
    var targetStats = _context.Target
        .Include(x => x.Entries)
        .Select(x => new TargetStats
        {
            Target = x,
            AverageEntryRating = x.Entries.Where(e => e.Rating > 0).Select(e => e.Rating).Average(),
        })
        .ToList();
    return targetStats;
}
The advantage of this is that you don't degrade the performance of all Target-related queries, only of those that require the average rating.
But this query in particular might still be slow. To tweak it further, you could write raw SQL instead of LINQ, or query a database view.
Store and update the Target's average rating as a column
Probably the best you could do to keep the code clean and have good performance while reading, is to store the average as a column in the Target table. This will move the performance cost of the calculation to the saving/updating of a Target or its related Entries, but the readings will be super fast since the data is already available. If the readings happen way more often than the updates, then it's probably worth doing it.
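A rough, untested sketch of that option (the recalculation method and DbSet names are made up for illustration):

public class Target
{
    public string TargetId { get; set; }
    [Required]
    public string Name { get; set; }
    [InverseProperty("Target")]
    public List<Entry> Entries { get; set; }

    // Now persisted instead of [NotMapped]; kept up to date whenever entries change.
    public double AverageEntryRating { get; set; }
}

// In the service, call this after adding/updating/deleting an entry for the target:
public async Task RecalculateAverageRatingAsync(string targetId)
{
    var target = await _context.Targets.FindAsync(targetId);
    var ratings = _context.Entries.Where(e => e.TargetId == targetId && e.Rating > 0);

    target.AverageEntryRating = await ratings.AnyAsync()
        ? await ratings.AverageAsync(e => e.Rating)
        : 0;

    await _context.SaveChangesAsync();
}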
You could take a look at the EF Core docs on performance, since they talk a little about the different performance tuning alternatives.

EF tables with Many to many relationship, Will this consume memory?

I have 2 tables that store family members. When I use Include to retrieve the family members, the generated T-SQL is what I expected, but when I inspect the result in VS (see the image below), it looks like it never ends.
My questions:
Is this normal?
Should I avoid Include when the relationship becomes complex?
If it is normal, will this consume a lot of memory?
POCO
public class Cust_ProfileTbl
{
    [Key]
    public long bintAccountNo { get; set; }
    public string nvarCardName { get; set; }
    public string varEmail { get; set; }
    public virtual ICollection<Cust_ProfileFamilyTbl> profileFamilyParents { get; set; }
    public virtual ICollection<Cust_ProfileFamilyTbl> profileFamilyChildren { get; set; }
}

public class Cust_ProfileFamilyTbl
{
    [Key]
    public int intProfileFamily { get; set; }
    public long bintAccountNo { get; set; }
    public long bintAccountNoMember { get; set; }
    public virtual Cust_ProfileTbl custProfileParent { get; set; }
    public virtual Cust_ProfileTbl custProfileChild { get; set; }
}
LINQ
var rs = from family in context.member.Include("profileFamilyParents.custProfileChild")
         select family;
rs = rs.Where(x => x.bintAccountNo.Equals(1));
var result = rs.ToList();
In OnModelCreating:
modelBuilder.Entity<Cust_ProfileFamilyTbl>()
    .HasRequired(m => m.custProfileParent)
    .WithMany(t => t.profileFamilyParents)
    .HasForeignKey(m => m.bintAccountNo)
    .WillCascadeOnDelete(false);

modelBuilder.Entity<Cust_ProfileFamilyTbl>()
    .HasRequired(m => m.custProfileChild)
    .WithMany(t => t.profileFamilyChildren)
    .HasForeignKey(m => m.bintAccountNoMember)
    .WillCascadeOnDelete(false);
When people use an ORM like EF in their application, many times the application design gets driven by this ORM and the entities defined in its model. When the app is a simple "CRUD" application, that's not a problem, but an advantage, because you spare a lot of time.
However when things start to get more complicated, an "ORM guided design" becomes a problem. This looks to be the case.
There are at least two problems, recovered from the comments:
the data retrieved from the DB is more than needed
in this case, because of some particular relationships between entities, there is a circular reference, which creates an endless loop and a stack overflow when trying to show the model in the view
When this kind of situation shows up, the most advisable thing is to break the tight tie between the ORM and the rest of the app, which can be done by defining a new class and projecting the data into it. Let's give it the generic name ProfileDto.
public class ProfileDto { ... }
DTO is a generic name for this kind of class: Data Transfer Object. When they have a specific purpose they can get other names, like "view model" when they're used as the model sent to an MVC view.
And then, what you need to do is to project the result of the query into the DTO:
var model = theQuery.Select(i => new ProfileDto { a = i.a, b = i.b...}).ToList();
With a good design of the DTO you'll only retrieve the needed data from the DB, and you'll avoid the loop problem (by not including the navigation property that creates the loop). For example:
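As an illustration (the DTO and its properties are invented; the query reuses the DbSet and navigation names from the question and is untested):

public class ProfileDto
{
    public long AccountNo { get; set; }
    public string CardName { get; set; }
    public IEnumerable<string> FamilyMemberNames { get; set; }
}

var model = context.member
    .Where(x => x.bintAccountNo == 1)
    .Select(x => new ProfileDto
    {
        AccountNo = x.bintAccountNo,
        CardName = x.nvarCardName,
        // Only the data the view needs; no circular navigation properties.
        FamilyMemberNames = x.profileFamilyParents
            .Select(f => f.custProfileChild.nvarCardName)
    })
    .ToList();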
NOTE: people often use mappers like AutoMapper or ValueInjecter to make the mapping, or part of it, automatic.
Code standardization is a very good idea until it becomes a source of problems. The main purpose of writing code is implementing the business logic. If code standardization, technology, or whatever, makes it harder to implement business logic, instead of contributing to the solution, they become a problem, so you need to avoid them.
The mapping you created is normal, but whether to use Include depends on how the data is used.
For example, if you want to cache the data in memory then you may use Include; whereas if you are only showing properties of the Cust_ProfileTbl class in a grid, and the details of Cust_ProfileFamilyTbl are shown only on click, then you might not want to use Include. But be careful if you are using AutoMapper or something similar, because when it tries to map the related properties it will query the database.
It will consume memory when you execute ToList(), because you are loading the query result into a List. If you want to compose further queries over the result, keep it as an IQueryable instead; and if you only want to iterate over it, you don't need to load it into a List at all.

Best way to query using EF

Using LINQ, I am having trouble querying my DbContext in an efficient way.
The database contains over 700,000 entities, which have a date and a name and other information.
In my code, I have a new list of objects coming in (which can potentially have 100,000 elements), and I would like to query my database and work out which items are new entities and which are existing entities that need to be updated.
I would like to do it in a very efficient way (with a single query if possible).
This is my code :
public class MyDbContext : DbContext
{
    public DbSet<MyEntity> MyEntities { get; set; }
}

public class MyEntity
{
    [Key]
    public Guid Id { get; set; }
    public DateTime Date { get; set; }
    public string Name { get; set; }
    public double Amount { get; set; }
    public string Description { get; set; }
}

public class IncomingInfo
{
    public DateTime Date { get; set; }
    public string Name { get; set; }
    public double Amount { get; set; }
}

public class Modifier
{
    public void AddOrUpdate(IList<IncomingInfo> info)
    {
        using (var context = new MyDbContext())
        {
            //Find the new information
            //to add as new entities
            IEnumerable<MyEntity> EntitiesToAdd = ??

            //Find the information
            //to update in existing entities
            IEnumerable<MyEntity> EntitiesToUpdate = ??
        }
    }
}
Can someone help me constructing my query?
Thank you very much.
Edit :
Sorry, I forgot to explain how I consider two entities equal.
They are equal if the Date and the Name properties are identical.
I first tried to build a predicate using the LinqKit PredicateBuilder without much success (I hit an error about too many parameters and had to make multiple queries, which took time).
So far the most successful way I found was to implement a LEFT OUTER JOIN between the incoming list and the DbSet, which I implemented this way:
var values = info.GroupJoin(context.MyEntities,
        inf => inf.Name + inf.Date.ToString(),
        ent => ent.Name + ent.Date.ToString(),
        (inf, ents) => new { Info = inf, Entities = ents })
    .SelectMany(i => i.Entities.DefaultIfEmpty(),
        (i, ent) => new { i.Info.Name, i.Info.Amount, i.Info.Date, ToBeAdded = ent == null ? true : false });

IEnumerable<MyEntity> EntitiesToAdd = values.Where(i => i.ToBeAdded)
    .Select(i => new MyEntity
    {
        Id = Guid.NewGuid(),
        Amount = i.Amount,
        Date = i.Date,
        Name = i.Name,
        Description = null
    }).ToList();
My test contains 700,000 entities in database. The incoming info list contains 70,000 items; where 50,000 are existing entities and 20,000 are new entities.
This query takes around 15 seconds to execute, which does not seem right to me.
Hopefully this is enough to ask for help. Can someone help me with this?
Thank you very much.
I read the pastebin response from @Leniency and it covers some of the same stuff I was going to say, like querying a date range and comparing there. The problem with that method, though, is that (depending on how those dates are set) it might return all 700K+ records in the database, which would give you the absolute worst performance.
My suggestion is that you analyze your network topology to see how expensive your calls to the database really are. I'm assuming this is running on a (web) server which is receiving these IncomingInfo objects from clients. If this server is closely connected to your database server (or on the same machine) then you might be better off not optimizing your calls to the database.
Also, if you have control over the behavior of the clients, you might want to force them to send only like 25 to 100 records with each request. This would make it so that you could deal with them in much more manageable chunks. The client might have to send 100 or more requests to the server (which you could do async so that they get sent ~5 at a time, depending on expected load profiles), but at least it wouldn't be sitting there for 5+ minutes waiting to get a response back from the server for a single request.
BTW, the GroupJoin call that you said took 15 seconds probably is having to download all 700K records before doing the join. You see, joins can't be done on objects that don't exist on the same machine, it either has to send all the IncomingInfo objects (or at least the Name+Date.ToString() concatenations) to the database, or it has to request all the records from the database before any joins can be done. You would probably have to look at the SQL that is being sent to the database in order to tell which method is being used. But you would probably find that querying the database for matches one at a time would probably be faster than the join in this case.
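If you go the chunked route, a rough, untested sketch of the idea (the batch size is arbitrary, and it assumes Name+Date is unique in the table) could look like this:

const int batchSize = 500;
var entitiesToAdd = new List<MyEntity>();
var entitiesToUpdate = new List<MyEntity>();

foreach (var batch in info
    .Select((item, index) => new { item, index })
    .GroupBy(x => x.index / batchSize, x => x.item))
{
    var names = batch.Select(b => b.Name).Distinct().ToList();
    var minDate = batch.Min(b => b.Date);
    var maxDate = batch.Max(b => b.Date);

    // One round trip per batch; Contains becomes an IN clause and the dates narrow the scan.
    var candidates = context.MyEntities
        .Where(e => names.Contains(e.Name) && e.Date >= minDate && e.Date <= maxDate)
        .ToList();

    var byKey = candidates.ToDictionary(e => new { e.Name, e.Date });

    foreach (var item in batch)
    {
        MyEntity existing;
        if (byKey.TryGetValue(new { item.Name, item.Date }, out existing))
        {
            existing.Amount = item.Amount;      // existing entity: update in place
            entitiesToUpdate.Add(existing);
        }
        else
        {
            entitiesToAdd.Add(new MyEntity      // new entity
            {
                Id = Guid.NewGuid(),
                Name = item.Name,
                Date = item.Date,
                Amount = item.Amount
            });
        }
    }
}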
Hope that helps! ;)

Why is EF4 Code First so slow when storing objects?

I'm currently doing some research on using db4o as the storage for my web application. I'm quite happy with how easily db4o works. So when I read about the Code First approach I kinda liked it, because the way of working with EF4 Code First is quite similar to working with db4o: create your domain objects (POCOs), throw them at the store, and never look back.
But when I did a performance comparison, EF 4 was horribly slow. And I couldn't figure out why.
I use the following entities :
public class Recipe
{
    private List<RecipePreparation> _RecipePreparations;

    public int ID { get; set; }
    public String Name { get; set; }
    public String Description { get; set; }
    public List<String> Tags { get; set; }

    public ICollection<RecipePreparation> Preparations
    {
        get { return _RecipePreparations.AsReadOnly(); }
    }

    public void AddPreparation(RecipePreparation preparation)
    {
        this._RecipePreparations.Add(preparation);
    }
}

public class RecipePreparation
{
    public String Name { get; set; }
    public String Description { get; set; }
    public int Rating { get; set; }
    public List<String> Steps { get; set; }
    public List<String> Tags { get; set; }
    public int ID { get; set; }
}
To test the performance I new up a recipe and add 50,000 RecipePreparations. Then I store the object in db4o like so:
IObjectContainer db = Db4oEmbedded.OpenFile(Db4oEmbedded.NewConfiguration(), @"RecipeDB.db4o");
db.Store(recipe1);
db.Close();
This takes around 13,000 ms.
I store the stuff with EF4 in SQL Server 2008 (Express, locally) like this :
cookRecipes.Recipes.Add(recipe1);
cookRecipes.SaveChanges();
And that takes around 200,000 ms.
Now how on earth is db4o 15(!!!) times faster than EF4/SQL? Am I missing a secret turbo button for EF4? I even think that db4o could be made faster, since I don't initialize the database file; I just let it grow dynamically.
Did you call SaveChanges() inside the loop? No wonder it's slow! Try doing this:
foreach (var recipe in The500000Recipes)
{
    cookRecipes.Recipes.Add(recipe);
}
cookRecipes.SaveChanges();
EF expects you to make all the changes you want, and then call SaveChanges once. That way, it can optimize database communication and sql to perform the changes between opening state and saving state, ignoring all changes that you have undone. (For example, adding 50 000 records, then removing half of them, then hitting SaveChanges will only add 25 000 records to the database. Ever.)
Perhaps you can disable change tracking while adding new objects; this would really improve performance.
context.Configuration.AutoDetectChangesEnabled = false;
see also for more info: http://coding.abel.nu/2012/03/ef-code-first-change-tracking/
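Putting that together with the single SaveChanges advice above, a hedged sketch (DbContext API from EF 4.1 or later assumed, reusing the names from this thread):

cookRecipes.Configuration.AutoDetectChangesEnabled = false;
try
{
    foreach (var recipe in The500000Recipes)
    {
        cookRecipes.Recipes.Add(recipe);
    }
    cookRecipes.SaveChanges();   // change detection and SQL happen once, not per Add
}
finally
{
    cookRecipes.Configuration.AutoDetectChangesEnabled = true;
}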
The EF excels at many things, but bulk loading is not one of them. If you want high-performance bulk loading, doing it directly through the DB server will be faster than any ORM. If your app's sole performance constraint is bulk loading, then you probably shouldn't use the EF.
Just to add on to the other answers: db4o typically runs in-process, while EF abstracts an out-of-process (SQL) database. However, db4o is essentially single-threaded. So while it might be faster for this one example with one request, SQL will handle concurrency (multiple queries, multiple users) much better than a default db4o database setup.

Partially Populate Child Collection with NHibernate

I've been struggling with this for a while, and can't seem to figure it out...
I've got a BlogPost class, which has a collection of Comments, and each of the comments has a DatePosted field.
What I need to do is query for a BlogPost and return it with a partially loaded Comments collection, say all comments posted on the 1 Aug 2009.
I've got this query:
BlogPost post = session.CreateCriteria<BlogPost>()
    .Add(Restrictions.Eq("Id", 1))
    .CreateAlias("Comments", "c")
    .Add(Restrictions.Eq("c.DatePosted", new DateTime(2009, 8, 1)))
    .UniqueResult<BlogPost>();
When I run this query and check out the generated sql, it first runs a query against the BlogPost table, joining to the Comment table with the correct date restriction in, then runs a second query just on the Comment table that returns everything.
Result is the Comments collection of the BlogPost class totally filled up!
What am I doing wrong?
I've got code samples if anyone needs more info...!
There is a result transformer for this, see the documentation.
Quote:
Note that the kittens collections held by the Cat instances returned by the previous two queries are not pre-filtered by the criteria! If you wish to retrieve just the kittens that match the criteria, you must use SetResultTransformer(CriteriaUtil.AliasToEntityMap).
IList cats = sess.CreateCriteria(typeof(Cat))
    .CreateCriteria("Kittens", "kt")
    .Add(Expression.Eq("Name", "F%"))
    .SetResultTransformer(CriteriaUtil.AliasToEntityMap)
    .List();
You could also use filters that get activated using session.EnableFilter(name).
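A rough sketch of the filter route with Fluent NHibernate (the filter name and condition are illustrative, ApplyFilter support depends on your Fluent NHibernate version, untested):

// Filter definition
public class PostedOnFilter : FilterDefinition
{
    public PostedOnFilter()
    {
        WithName("PostedOn").AddParameter("postedOn", NHibernateUtil.DateTime);
    }
}

// In BlogPostMap:
//   HasMany(b => b.Comments).ApplyFilter<PostedOnFilter>("DatePosted = :postedOn");

// At query time:
session.EnableFilter("PostedOn").SetParameter("postedOn", new DateTime(2009, 8, 1));
var post = session.Get<BlogPost>(1);   // the Comments collection now loads pre-filtered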
There is a similar question here.
You're not really doing anything wrong - hibernate just doesn't work that way.
If you navigate from the BlogPost to the Comments, Hibernate will populate the comments based on the association mapping that you've specified, not the query you used to retrieve the BlogPost. Presumably your mapping is just doing a join on a key column. You can use a filter to get the effect you're looking for. But I think that will still fetch all the comments and then do a post-filter.
More simply, just query for what you want:
IList<Comment> comments = session.CreateCriteria<Comment>()
    .Add(Restrictions.Eq("BlogPostId", 1))
    .Add(Restrictions.Eq("DatePosted", new DateTime(2009, 8, 1)))
    .List<Comment>();
This will in fact return only the comments from the specified date.
If it makes you feel better, you can then set them like this (having already retrieved the post elsewhere):
post.Comments = comments;
I was also surprised by this behaviour when I first encountered it. It seems like a bug, but I've been told it's by design.
Thanks for the response. I guess I kinda understand why it's by design, but I would have thought there would be a built-in method to enable this. Your solution works, but it feels like a bit of a hack!
My problem is that the child collection is HUGE if not filtered (the example I gave of posts and comments was to protect the names of the innocent!) and there is no way I can be pulling all the data back every time.
I've run SQL Profiler on this and it's still pulling all the data back.
When I run the following code, the first query does what you expect: just the one post comes back. But as soon as the second query is executed, two queries go to the database: the first to retrieve the filtered comments (bingo!), and then a second to populate the post.Comments property with all the comments, which is just what I'm trying to avoid!
var post = session.CreateCriteria<BlogPost>()
    .Add(Restrictions.Eq("Id", 1))
    .UniqueResult<BlogPost>();

var comments = session.CreateCriteria<Comment>()
    .Add(Restrictions.Eq("BlogPostId", 1))
    .Add(Restrictions.Eq("DatePosted", new DateTime(2009, 8, 1)))
    .List<Comment>();

post.Comments = comments;
This is very strange; it's not like I'm enumerating over the post.Comments list, so why is it populating it?! Here are my classes and maps:
public class BlogPostMap : ClassMap<BlogPost>
{
    public BlogPostMap()
    {
        Id(b => b.Id);
        Map(b => b.Title);
        Map(b => b.Body);
        HasMany(b => b.Comments).KeyColumnNames.Add("BlogPostId");
    }
}

public class CommentMap : ClassMap<Comment>
{
    public CommentMap()
    {
        Id(c => c.Id);
        Map(c => c.BlogPostId);
        Map(c => c.Text);
        Map(c => c.DatePosted);
    }
}

public class BlogPost
{
    public virtual int Id { get; set; }
    public virtual string Title { get; set; }
    public virtual string Body { get; set; }
    public virtual IList<Comment> Comments { get; set; }
}

public class Comment
{
    public virtual int Id { get; set; }
    public virtual int BlogPostId { get; set; }
    public virtual string Text { get; set; }
    public virtual DateTime DatePosted { get; set; }
}
Any ideas?
I agree it feels like a hack to manually populate the collection.
You can use a custom loader instead. Something like this:
<query name="loadComments">
    <return alias="comments" class="Comment"/>
    <load-collection alias="comments" role="Post.comments"/>
    from Comments c where c.Id = ? and c.DatePosted = SYSDATE
</query>
Also, you can use sql-query if you want more control.
I've occasionally stooped to writing custom loaders when I couldn't get hibernate to generate the query I wanted. Anyway, don't know why I didn't think of that in the first place.
Make the Comments collection lazy, so that NHibernate doesn't fetch it when you're getting the BlogPost. Then use a filter on the Comments collection.
comments = session.CreateFilter(blogPost.Comments, ... ).List();
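Filled in, that might look something like this (untested; the collection filter runs its own query against the loaded post's collection, so the Comments property itself is never fully initialized):

var post = session.Get<BlogPost>(1);

var comments = session
    .CreateFilter(post.Comments, "where this.DatePosted = :date")
    .SetDateTime("date", new DateTime(2009, 8, 1))
    .List<Comment>();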
