Some time ago I created a system in which users can define categories with custom fields for certain objects. Each object then has FieldValues based on its category. Classes below:
public class DbCategory
{
    public int Id { get; set; }
    [Required]
    public string Name { get; set; }
    [Required]
    public TextDbField MainField { get; set; }
    public List<DbField> Fields { get; set; }
}

public class DbObject
{
    public int Id { get; set; }
    public byte[] Bytes { get; set; }
    [Required]
    public DbCategory Category { get; set; }
    public TextDbFieldValue MainFieldValue { get; set; }
    public List<DbFieldValue> FieldsValues { get; set; }
}
public abstract class DbField
{
    public int Id { get; set; }
    [Required]
    public string Name { get; set; }
    [Required]
    public bool Required { get; set; }
}

public class IntegerDbField : DbField
{
    public int? Minimum { get; set; }
    public int? Maximum { get; set; }
}

public class FloatDbField : DbField
{
    public double? Minimum { get; set; }
    public double? Maximum { get; set; }
}

//... few other types

public abstract class DbFieldValue
{
    [Key]
    public int Id { get; set; }
    [Required]
    public DbField Field { get; set; }
    [JsonIgnore]
    public abstract string Value { get; set; }
}
public class IntDbFieldValue : DbFieldValue
{
    public int? IntValue { get; set; }

    public override string Value
    {
        get { return IntValue?.ToString(); }
        set
        {
            if (value == null) IntValue = null;
            else IntValue = int.Parse(value);
        }
    }
}
// ... and other FieldValue types
On my dev machine (i5, 16 GB RAM and an SSD drive), with a database (in SQL Express) of 4 categories, each having 5-6 fields, and 10k records, the first query takes about 15 s. This first query is:
var result = db.Objects
    .Include(s => s.Category)
    .Include(s => s.Category.MainField)
    .Include(s => s.MainFieldValue.Field)
    .Include(s => s.FieldsValues.Select(f => f.Field))
    .Where(predicate ?? AlwaysTrue)
    .ToArray();
I do that to load everything into memory. Then I work on the cached list and just write changes back to the database. I do it this way because the user can perform searches with a filter on each FieldValue, and querying the database each time proved much too slow - this part, however, works pretty well.
The problem occurs later. Some clients defined 6 categories with 20+ fields each and store 70k+ records; startup sometimes takes more than 15 minutes. After that, there is no difference in speed between 5k and 50k records.
Every technique I've found for improving EF Code First startup time deals mostly with caching view generation, NGen-ing EF and so on, but in this case the startup time grows as more records are added, not more entity types.
I realise that this is caused by the complexity of the schema, but is there some way to speed it up? Fortunately this is a Windows Service, so once it has started it runs for weeks, but still.
Should I drop EF for the first load and do it in pure SQL? Should I do it in batches? Should I switch from EF to NHibernate? Or something else? On virtualized servers, during execution of this line, the program maxes out the CPU (my application, not SQL Server).
I've tried loading the objects only and then loading their properties later. This was a bit faster (but not noticeably) on small databases, and even slower on bigger ones. Any help appreciated, even if the answer is "suck it up and wait".
I managed to reduce the total start time caused by EF threefold with these tricks:
Update the framework to 6.2 and enable model caching:
public class CachingContextConfiguration : DbConfiguration
{
    public CachingContextConfiguration()
    {
        SetModelStore(new DefaultDbModelStore(Directory.GetCurrentDirectory()));
    }
}
Call ctx.Database.Initialize() explicitly from a new thread, as early as possible. This still takes 3-4 seconds, but since it happens alongside other work, it helps a lot.
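For example, a minimal sketch of that early warm-up (assuming a DbContext called MyContext and using the thread pool via Task.Run):
// somewhere at the very start of service startup
Task.Run(() =>
{
    using (var ctx = new MyContext())
    {
        // builds the EF model and runs initializers up front,
        // so the first real query doesn't pay this cost
        ctx.Database.Initialize(force: false);
    }
});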
Load entities into EF cache in reasonable order.
Previously, I just wrote Include after Include, which translates into multiple joins. I found a rule of thumb in some blog posts that EF performs rather well with up to two chained Includes, but each additional one slows everything down massively. I also found a blog post showing EF's caching behaviour: once a given entity has been loaded with Include or Load, it is automatically put into the proper property (the blog's author is wrong about the union of objects). So I did this:
using (var db = new MyContext())
{
    db.Fields.Load();
    db.Categories.Include(c => c.MainField).Include(x => x.Fields).Load();
    db.FieldValues.Load();

    return db.Objects.Include(x => x.MainFieldValue.Field).ToArray();
}
This fetches the data 6 times faster than the Includes from the question. I think that once the entities have been loaded, the EF engine does not call the database for related objects; it just gets them from its cache.
I also added this in my context constructor:
Configuration.LazyLoadingEnabled = false;
Configuration.ProxyCreationEnabled = false;
The effects of that are barely noticeable, but they may play a bigger role on a huge data set.
I've also watched this presentation on EF Core by Rowan Miller, and I will be switching to it in the next release - in some cases it's 5-6 times faster than EF6.
Hope this helps someone
I am having some issues figuring out the correct way to model a many-to-many relationship in my Realm, namely around the fact that Realm objects are always live.
The model in question revolves around two objects: Event and Inventory. An Event can have multiple inventory items assigned to it (think chairs, plates, forks, etc.), and an inventory item can be assigned to multiple events. When we assign an item to an event we define how many of that item we want to assign. This is where the problem arises: since Realm objects are always live and the object types are the same, whatever data the Event holds will affect my inventory row as well.
The big picture is that I want to show how many items are assigned to each upcoming event when I go into my Inventory detail view. For example, I may have 50 chairs in total and have assigned 40 to an event tomorrow, which means I cannot assign another 20 if someone tries to schedule an event that day as well.
My Realm objects look as follows:
public class Event : RealmObject
{
    [PrimaryKey]
    public string EventId { get; set; }
    [Indexed]
    public string VenueId { get; set; }
    [Indexed]
    public string Name { get; set; }
    public DateTimeOffset DateOfEventUTC { get; set; }
    public IList<Inventory> Items { get; }
}

public class Inventory : RealmObject
{
    [PrimaryKey]
    public string InventoryId { get; set; }
    [Indexed]
    public string VenueId { get; set; }
    public Category Category { get; set; }
    public int Count { get; set; }
    public string Name { get; set; }
    public string Description { get; set; }
    [Backlink(nameof(Event.Items))]
    public IQueryable<Event> Events { get; }
}
I then try to do what I want (namely show how many of the item are assigned to that event) in my VM like so:
var item = unitOfWork.InventoryRepository.GetById(inventoryId);
var nextMonth = DateTime.UtcNow.AddMonths(1);

AssignedEvents = item.Events
    .Where(x => x.DateOfEventUTC >= DateTime.UtcNow && x.DateOfEventUTC <= nextMonth)
    .ToList()
    .Select(x => new AssignedEventModel
    {
        DateOfEventUTC = x.DateOfEventUTC.DateTime,
        Name = x.Name,
        AssignedItems = x.Items.First(z => z.InventoryId == inventoryId).Count
    })
    .ToList();
Unfortunately, this is where the problem arises. I tried applying the [Ignored] attribute, as recommended in the Realm docs, so that the item would no longer be persisted, but that did not solve my issue. I am still new to Realm and much more familiar with SQL than NoSQL.
I struggle to see how this could work in SQL either, but I'm not an expert there, so I may be missing some details that would allow it to work the way you structured it.
Coming back to our case: the problem has little to do with Realm being live, but more to do with the way you structured your domain models.
If you use the same "Inventory" model to do two things:
- keep track of the total amount of each item
- keep track of the amount of each inventory item used in a specific event
then you'll have problems with what Count really represents.
Creating a third model would solve all your problems.
Inventory => for the overall amount of an item
Event => representing the event and all its data
EventInventory => representing the amount of an item used in that event
Not having much information about your project and your other models (AssignedEventModel etc.), I can suggest something along these lines:
class Event : RealmObject
{
    [PrimaryKey]
    public string EventId { get; set; }
    // ... other fields you need ...
    public DateTimeOffset DateOfEventUTC { get; set; }
    [Backlink(nameof(EventInventory.Event))]
    public IQueryable<EventInventory> Items { get; }
}

class EventInventory : RealmObject
{
    public Inventory Inventory { get; set; }
    public int Count { get; set; }
    public Event Event { get; set; }
}

class Inventory : RealmObject
{
    [PrimaryKey]
    public string InventoryId { get; set; }
    // ... other fields you need ...
    public int TotalCount { get; set; }
    [Backlink(nameof(EventInventory.Inventory))]
    public IQueryable<EventInventory> EventInventories { get; }
}
Then, in your Inventory's VM:
var inventory = unitOfWork.InventoryRepository.GetById(inventoryId);
var inUseCount = inventory.EventInventories
    .Where(x => /*...*/)
    .Sum(x => x.Count);

// the count you data-bind and show in the Inventory's view
remainingCount = inventory.TotalCount - inUseCount;
So basically, you can now calculate how much of a certain inventory item is still available in a certain time frame. With these models you should also be able to build your AssignedEventModel if you need to, as in the sketch below.
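For example, a hedged sketch of building the AssignedEvents list from these models (AssignedEventModel's shape and the Event's Name property are assumed from the question's code; the filter on the linked Event is applied in memory after materializing):
var inventory = unitOfWork.InventoryRepository.GetById(inventoryId);
var nextMonth = DateTimeOffset.UtcNow.AddMonths(1);

AssignedEvents = inventory.EventInventories
    .ToList() // materialize, then filter on the linked Event in memory
    .Where(ei => ei.Event.DateOfEventUTC >= DateTimeOffset.UtcNow
              && ei.Event.DateOfEventUTC <= nextMonth)
    .Select(ei => new AssignedEventModel
    {
        DateOfEventUTC = ei.Event.DateOfEventUTC.DateTime,
        Name = ei.Event.Name,   // assumes Event keeps its Name property
        AssignedItems = ei.Count
    })
    .ToList();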
I hope this helps.
On a side note, I noticed that you are using the unit-of-work and repository patterns (or so it seems). Although it may look like a great idea, this is generally discouraged when working with Realm, simply because you are going to miss out on some of Realm's more powerful features.
You can read more about this here in the "Repository" section of the answer.
I have a website that uses EF Core 3.1 to access its data. The primary table it uses is [Story]. Each user can store some metadata about each story in [StoryUserMapping]. What I would like is for EF to automatically load the metadata (if it exists) for a story when I read in that Story object.
Classes:
public class Story
{
    [Key]
    public int StoryId { get; set; }
    public long Words { get; set; }
    ...
}

public class StoryUserMapping
{
    public string UserId { get; set; }
    public int StoryId { get; set; }
    public bool ToRead { get; set; }
    public bool Read { get; set; }
    public bool WontRead { get; set; }
    public bool NotInterested { get; set; }
    public byte Rating { get; set; }
}

public class User
{
    [Key]
    public string UserId { get; set; }
    ...
}
StoryUserMapping has a composite key ([UserId], [StoryId]).
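(Since EF Core requires composite keys to be configured via the fluent API, the configuration is presumably something along these lines:)
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    // composite key on the mapping table
    modelBuilder.Entity<StoryUserMapping>()
        .HasKey(m => new { m.UserId, m.StoryId });
}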
What I would like to see is:
public class Story
{
    [Key]
    public int StoryId { get; set; }
    public bool ToRead { get; set; }        // from the user mapping table for the currently logged-in user
    public bool Read { get; set; }          // from the user mapping table for the currently logged-in user
    public bool WontRead { get; set; }      // from the user mapping table for the currently logged-in user
    public bool NotInterested { get; set; } // from the user mapping table for the currently logged-in user
    public byte Rating { get; set; }        // from the user mapping table for the currently logged-in user
    ...
}
Is there a way to do this in EF Core? My current system is to load the StoryUserMapping object as a property of the Story object and then have non-mapped property accessors on the Story object that read into the StoryUserMapping object if it exists (roughly as in the sketch below). It feels like something EF probably handles more elegantly.
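Roughly speaking, the current workaround is a sketch like this (property and accessor names are illustrative):
public class Story
{
    [Key]
    public int StoryId { get; set; }
    public long Words { get; set; }

    // the mapping for the current user, loaded separately
    [NotMapped]
    public StoryUserMapping CurrentUserMapping { get; set; }

    // non-mapped accessors that read through to the mapping if it exists
    [NotMapped]
    public bool ToRead => CurrentUserMapping?.ToRead ?? false;

    [NotMapped]
    public byte Rating => CurrentUserMapping?.Rating ?? 0;
}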
Use Cases
Setup: I have 1 million stories and 1,000 users. In the worst-case scenario I have a StoryUserMapping for each pair: 1 billion records.
Use case 1: I want to see all of the stories that I (logged in user) have marked as "to read" with more than 100,000 words
Use case 2: I want to see all stories where I have NOT marked them NotInterested or WontRead
I am not concerned with querying multiple StoryUserMappings per story; e.g. I will not be asking "what stories have been marked as read by more than n users?". I would rather not rule that out in case it changes in the future, but if I need to, that would be fine.
Create yourself an aggregate view model object that you can use to display the data in your view, similar to what you've ended up with under the Story entity at the moment:
public class UserStoryViewModel
{
    public int StoryId { get; set; }
    public bool ToRead { get; set; }
    public bool Read { get; set; }
    public bool WontRead { get; set; }
    public bool NotInterested { get; set; }
    public byte Rating { get; set; }
    ...
}
This view model is concerned only with aggregating the data to display in the view. This way, you don't need to skew your existing entities to fit how you would display the data elsewhere.
Your database entity models should be as close to "dumb" objects as possible (apart from navigation properties) - they look very sensible as they are at the moment.
In this case, remove the unnecessary [NotMapped] properties from your existing Story that you'd added previously.
In your controller/service, you can then query your data as per your use cases you mentioned. Once you've got the results of the query, you can then map your result(s) to your aggregate view model to use in the view.
Here's an example for the use case of getting all Storys for the current user:
public class UserStoryService
{
    private readonly YourDbContext _dbContext;

    public UserStoryService(YourDbContext dbContext)
    {
        _dbContext = dbContext;
    }

    public Task<IEnumerable<UserStoryViewModel>> GetAllForUser(string currentUserId)
    {
        // at this point you're not executing any queries, you're just creating a query to execute later
        var allUserStoriesForUser = _dbContext.StoryUserMappings
            .Where(mapping => mapping.UserId == currentUserId)
            .Select(mapping => new
            {
                story = _dbContext.Stories.Single(story => story.StoryId == mapping.StoryId),
                mapping
            })
            .Select(x => new UserStoryViewModel
            {
                // use the projected properties from above to map to your UserStoryViewModel aggregate
                ...
            });

        // calling .ToList()/.ToListAsync() will then execute the query and return the results
        return allUserStoriesForUser.ToListAsync();
    }
}
You can then create a similar method to get only the current user's Storys that aren't marked NotInterested or WontRead.
It's virtually the same as before, but with the filter in the Where to ensure you don't retrieve the ones that are NotInterested or WontRead:
public Task<IEnumerable<UserStoryViewModel>> GetForUserThatMightRead(string currentUserId)
{
    var storiesUserMightRead = _dbContext.StoryUserMappings
        .Where(mapping => mapping.UserId == currentUserId && !mapping.NotInterested && !mapping.WontRead)
        .Select(mapping => new
        {
            story = _dbContext.Stories.Single(story => story.StoryId == mapping.StoryId),
            mapping
        })
        .Select(x => new UserStoryViewModel
        {
            // use the projected properties from above to map to your UserStoryViewModel aggregate
            ...
        });

    return storiesUserMightRead.ToListAsync();
}
Then all you need to do is update your view's @model to use your new aggregate UserStoryViewModel instead of your entity.
It's always good practice to keep a good level of separation between "domain" or database code/entities and what is used in your view.
I would recommend having a good read up on this and keeping at it, so you get into the right habits and way of thinking as you go forward.
NOTE:
Whilst the above suggestions should work absolutely fine (I haven't tested locally, so you may need to improvise/fix, but you get the general gist) - I would also recommend a couple of other things to supplement the approach above.
I would look at introducing a navigation property on the StoryUserMapping entity (unless you already have one; I can't tell from your question's code). This eliminates the step above where we .Select into an anonymous object and add to the query to fetch the Storys from the database by the mapping's StoryId; you can reference the story belonging to the mapping simply through the navigation property.
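A hedged sketch of that change (assuming the navigation property is not already configured):
public class StoryUserMapping
{
    public string UserId { get; set; }
    public int StoryId { get; set; }
    public bool ToRead { get; set; }
    public bool Read { get; set; }
    public bool WontRead { get; set; }
    public bool NotInterested { get; set; }
    public byte Rating { get; set; }

    // navigation to the parent Story; EF Core picks it up by convention via StoryId
    public Story Story { get; set; }
}

// the query then becomes a straight traversal, with no anonymous projection step
var storiesUserMightRead = _dbContext.StoryUserMappings
    .Where(m => m.UserId == currentUserId && !m.NotInterested && !m.WontRead)
    .Select(m => new UserStoryViewModel
    {
        StoryId = m.Story.StoryId,
        ToRead = m.ToRead,
        Read = m.Read,
        WontRead = m.WontRead,
        NotInterested = m.NotInterested,
        Rating = m.Rating
    });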
You should also be able to use some kind of mapping library rather than mapping each individual property yourself for every call. Something like AutoMapper will do the trick (I'm sure other mappers are available). You can set up the mappings to do all the heavy lifting between your database entities and view models, and there's a nifty .ProjectTo<T>() that will project your query results to the desired type using the mappings you've specified.
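A hedged sketch of what that could look like with AutoMapper (assuming an IMapper has been injected into the service as _mapper):
using AutoMapper;
using AutoMapper.QueryableExtensions;

// the mapping profile: properties with matching names are mapped automatically
public class UserStoryProfile : Profile
{
    public UserStoryProfile()
    {
        CreateMap<StoryUserMapping, UserStoryViewModel>();
    }
}

// in the service, ProjectTo translates the mapping into the SQL projection
public Task<List<UserStoryViewModel>> GetAllForUser(string currentUserId)
{
    return _dbContext.StoryUserMappings
        .Where(m => m.UserId == currentUserId)
        .ProjectTo<UserStoryViewModel>(_mapper.ConfigurationProvider)
        .ToListAsync();
}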
I am running into a very odd performance issue with EF6. I upload ~38K records on my first pass. Then, on my second round, I query the table with a conditional LINQ statement, and that line of code takes about 4 minutes to run. These are my entities:
[Table("RAW_ADWORDS")]
public class AdWord
{
[Key]
public int ID { get; set; }
public bool Processed { get; set; }
public string Client { get; set; }
public long ClientID { get; set; }
public bool Active { get; set; }
public bool ProcessedAllFile { get; set; }
public DateTime LastTimeRun{ get; set; }
public DateTime? LastDateTimeProcessed { get; set; }
public virtual List<AdWordCampaign> Campaigns { get; set; }
}
[Table("foobar")]
public class AdWordCampaign
{
[Key]
public int ID { get; set; }
public string Campaign { get; set; }
public long CampaignID { get; set; }
public string Day { get; set; }
public long Clicks { get; set; }
public string CampaignStatus { get; set; }
public long Cost { get; set; }
public long Impressions { get; set; }
public double CTR { get; set; }
public long AvgCPC { get; set; }
public double AvgPosition { get; set; }
public DateTime DownloadDate { get; set; }
}
}
First I run this:
AdWord objAdWord = adwordsContext.AdWords.Where(c => c.ClientID == iCampaignID).FirstOrDefault();
Then
AdWordCampaign objAdWordCampaign = objAdWord.Campaigns.Where(c => c.CampaignID == iElementCampaignID && c.Day == sElementDate).FirstOrDefault();
The line above seems to load ALL the records before it applies the filter. It still takes 4 minutes even if I add a Take(5) to the query.
I hope this info will be useful.
Try adding indexes to the fields of your table that you use in the WHERE clause of your LINQ query (a sketch of how to declare one follows below).
You can also create extra views, add them to the EF model and query those with LINQ. That will reduce the time as well.
If you always expect a single record, use SingleOrDefault.
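In EF 6.1+ Code First, an index can be declared with the [Index] attribute (a minimal sketch; a migration then creates it on the table):
using System.ComponentModel.DataAnnotations.Schema; // IndexAttribute lives here in EF 6.1+

public class AdWordCampaign
{
    [Key]
    public int ID { get; set; }

    [Index] // creates a non-clustered index on CampaignID
    public long CampaignID { get; set; }

    // ... remaining properties as above ...
}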
Try:
objAdWord.Campaigns.FirstOrDefault(c => c.CampaignID == iElementCampaignID && c.Day == sElementDate)
.Where is an O(n) operation; I'm not sure whether a Where followed by FirstOrDefault gets optimized, but if it doesn't, you're wasting a lot of time with the separate Where. To improve performance further, make sure CampaignID is indexed.
You need to watch the queries that are generated & executed on the server and make sure they're optimized.
If you're using MS SQL Server, you want to run the SQL Server Profiler tool. Put breakpoints in your code before you call the method that executes the query. Clear the profiler's display, then execute the method. You can capture the SQL from there, then put it into SSMS and view the plan. If the query doesn't use indexes, you need to add indexes that it will use the next time it runs.
I've only ever used Database First, not Code First, so I don't know how you tell Entity Framework to create indexes in the Code First scenario, sorry. But you still need to optimize all of your queries.
I've seen this before with EF when referencing linked objects through a "primary object" - i.e. when you do
AdWordCampaign objAdWordCampaign = objAdWord.Campaigns.Where(...).FirstOrDefault();
Quite simply, it iterates all the records one by one - hence the slow query.
If you change to the following, you should get an almost instant response:
AdWord objAdWord = adwordsContext.AdWords.Where(c => c.ClientID == iCampaignID).FirstOrDefault();

AdWordCampaign objAdWordCampaign = <adwordsContext>.Campaigns
    .Where(c => <c.AdwordId = objAdWord.Id> && c.CampaignID == iElementCampaignID && c.Day == sElementDate)
    .FirstOrDefault();
I've put the changes in angle brackets; from a glance at your model I'm not sure which property on AdWordCampaign holds the Id of the AdWord for the relationship, but I'm sure you get the idea - go directly to the Campaigns table via the context, using the AdWord as an additional Where condition, rather than going through the AdWord's Campaigns collection.
I'm having a bit of a performance problem with an EF query.
We basically have this:
public class Article
{
    public int ID { get; set; }
    public virtual List<Visit> Visits { get; set; }
}

public class Visit
{
    public int? ArticleID { get; set; }
    public DateTime Date { get; set; }
}
Now, I would like to do:
Article a = ...;
vm.Count = a.Visits.Count;
The problem is that, from what I can gather, this first causes the entire list to be fetched and only then counts it. Doing this in a loop creates a performance problem.
I assumed it was due to the object being "too concrete", so I've tried to move the Visits.Count call as far back in the repository as I can (so that we're essentially working directly with the DbContext). That didn't help.
Any suggestions?
Assuming your data context has a Visits property:
public class MyDbContext : DbContext
{
    public IDbSet<Article> Articles { get; set; }
    public IDbSet<Visit> Visits { get; set; }
}
you could do that:
using (var ctx = new MyDbContext())
{
    var count = ctx.Visits.Where(x => x.ArticleID == 123).Count();
}
Also, if the Visits collection is not always required when dealing with an article, you could declare it as IEnumerable<T>:
public class Article
{
    public int ID { get; set; }
    public virtual IEnumerable<Visit> Visits { get; set; }
}
and then rely on the lazy loading.
I think the performance issue might be in the lazy loading (but I'd need to see more code to be sure).
Try an Include(a => a.Visits) at the moment you retrieve the articles from the DbContext, as in the sketch below.
For more information on EF performance: http://www.asp.net/web-forms/tutorials/continuing-with-ef/maximizing-performance-with-the-entity-framework-in-an-asp-net-web-application
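A minimal sketch of that, assuming the MyDbContext from the answer above:
using System.Data.Entity; // for the lambda Include extension

using (var ctx = new MyDbContext())
{
    // eager-load the visits together with the articles in one query
    var articles = ctx.Articles
        .Include(a => a.Visits)
        .ToList();
}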
In the end I did it another way.
I found that this was hit over and over in different ways, and due to the way the rest of the domain model is set up, I made a bit of a hack:
In my VisitRepository I created a new function, GetArticleIDsWithVisit(), which makes a direct SQL call via db.SqlQuery and returns a Dictionary. The dictionary is cached and used everywhere visit counts are needed.
Not very pretty, but I have wrapped it inside the repository, so I think it's OK.
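A hedged sketch of what that repository helper might look like (the SQL text, table name and helper DTO are assumptions):
public class VisitRepository
{
    private readonly MyDbContext db;

    public VisitRepository(MyDbContext db)
    {
        this.db = db;
    }

    // returns ArticleID -> visit count, computed entirely on the database side
    public Dictionary<int, int> GetArticleIDsWithVisit()
    {
        return db.Database
            .SqlQuery<ArticleVisitCount>(
                "SELECT ArticleID, COUNT(*) AS Count FROM Visits " +
                "WHERE ArticleID IS NOT NULL GROUP BY ArticleID")
            .ToDictionary(x => x.ArticleID.Value, x => x.Count);
    }

    private class ArticleVisitCount
    {
        public int? ArticleID { get; set; }
        public int Count { get; set; }
    }
}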
"Disabled lazy properies fetching for fully_qualified_type_name beacuse it does not support lazy at the entity level".
This warning was reported by NH Profiler, and as a result I'm experiencing the dreaded SELECT N+1 side effect. So if 2200 Subgroup entities are returned, an additional query is executed to retrieve each InvoicePreference entity (2201 queries in total). Something about that relationship seems to be causing the issue.
Here are the entities in question and their respective mappings.
Entity 1
public class Subgroup : Entity
{
    public virtual string GroupNumber { get; set; }
    public virtual string RUSNumber { get; set; }
    public virtual string REANumber { get; set; }
    public virtual string CustomerType { get; set; }
    public virtual string Name { get; set; }
    public virtual IList<IndividualEmployment> Employees { get; set; }
    public virtual IList<BenefitsAdministrator> Administrators { get; set; }
    public virtual InvoicePreference InvoicePreference { get; set; }
}
Entity 2
public class InvoicePreference : IEntity
{
    public virtual Guid Id { get; set; }
    public virtual Guid SubgroupId { get; set; }
    public virtual bool PaperlessNotifications { get; set; }
}
Mapping 1
public static AutoPersistenceModel ConfigureSubGroup(this AutoPersistenceModel autoPersistenceModel)
{
    return autoPersistenceModel.Override<Subgroup>(map =>
    {
        map.Table("SubgroupV");
        map.Id(s => s.Id).Column(SubGroupPrimaryKeyColumn);
        map.Map(s => s.CustomerType, "BAS_Customer_Type");
        map.Map(s => s.RUSNumber, "BAS_RUS_Number");
        map.Map(s => s.GroupNumber, "BAS_Group_Number");
        map.Map(s => s.REANumber, "BAS_REA_Number");
        map.HasMany(s => s.Administrators).KeyColumn(SubGroupPrimaryKeyColumn);
        map.HasMany(s => s.Employees).KeyColumn(SubGroupPrimaryKeyColumn);
        map.HasOne(s => s.InvoicePreference).PropertyRef(i => i.SubgroupId);
    });
}
Mapping 2
public static AutoPersistenceModel ConfigureInvoicePreference(this AutoPersistenceModel autoPersistenceModel)
{
    return autoPersistenceModel.Override<InvoicePreference>(map =>
    {
        map.Table("SubgroupInvoicePreference");
        map.Schema(RetirementStatementsSchemaName);
    });
}
InvoicePreference is referenced as a HasOne. Since it is lazy-loaded by default, NHibernate will create a proxy to populate the InvoicePreference property, and to do that it needs the identity of the InvoicePreference, which is not present on the Subgroup. Therefore it has to query for it using the property in the PropertyRef.
To remedy that, use .Not.LazyLoad() and/or .Fetch.Join() on the mapping, as in the sketch below.
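Something along these lines in the Subgroup override (only the HasOne line changes):
map.HasOne(s => s.InvoicePreference)
   .PropertyRef(i => i.SubgroupId)
   .Not.LazyLoad()  // don't create a proxy for the reference
   .Fetch.Join();   // fetch it with a join in the same query as the Subgroup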
I guess there is some reason why NH disabled lazy loading "at the entity level", which I understand as not creating proxies. There may be several reasons for that. Did you get another warning before this one? I don't really understand why it disabled "lazy properties fetching", which would mean that some properties are lazy loaded; that is a feature that has to be used explicitly in the mapping, but I can't see anything like that in your mapping definitions.
To overcome the N+1, you may use Fetch.Join. I had bad experiences with that, because the queries get really large; in a complex model you could hit database server limits (like the maximum number of columns in a query). It is usually better to use a batch size, which reduces the number of queries considerably - a sketch follows below. Take a look at my answer to "NHibernate lazy loading of reference entity".
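As a sketch of the batch-size mechanism on the collection mappings in Fluent NHibernate (the value 25 is arbitrary, and the property-ref'd HasOne may still need the join fetch shown earlier):
map.HasMany(s => s.Administrators)
   .KeyColumn(SubGroupPrimaryKeyColumn)
   .BatchSize(25);   // load administrators for up to 25 subgroups per query

map.HasMany(s => s.Employees)
   .KeyColumn(SubGroupPrimaryKeyColumn)
   .BatchSize(25);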