Caching Entity Framework DbContexts per request - c#

I have several classes based on System.Data.Entity.DbContext. They get used several times per request in disparate ends of the web application - is it expensive to instantiate them?
I was caching a copy of them in HttpContext.Current.Items because it didn't feel right to have several copies of them per request, but I have now found out that they don't get automatically disposed from the HttpContext at the end of the request. Before I set out writing the code to dispose them (in Application_EndRequest), I thought I'd readdress the situation, as there is really no point caching them if I should just instantiate them where I need them and dispose of them there and then.
Questions similar to this have been asked around the internet, but I can't seem to find one that answers my question exactly. Sorry if I'm repeating someone though.
Update
I've found out from this blog post that disposing of the contexts probably doesn't matter, but I'm still interested to hear whether they are expensive to instantiate in the first place. Basically, is there a lot of EF magic going on behind the scenes that I want to avoid doing too often?

Best bet would be to use an IoC container to manage lifecycles here -- they are very, very good at it and this is quite a common scenario. It has the added advantage of making dynamic invocation easy -- meaning a request for your stylesheet won't create a DB context just because creation is hardcoded in BeginRequest().
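For example, with a container such as Castle Windsor, a per-web-request registration might look roughly like this (a sketch; MyDbContext is a placeholder for your own context type, and Windsor's PerWebRequestLifestyleModule must be registered in web.config for this lifestyle to work):
using Castle.MicroKernel.Registration;
using Castle.Windsor;

var container = new WindsorContainer();

// One MyDbContext per web request; Windsor disposes it when the request ends.
container.Register(
    Component.For<MyDbContext>()
             .LifestylePerWebRequest());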

I'm answering my own question for completeness.
This answer provides more information about this issue.
In summary, it isn't that expensive to instantiate the DbContext, so don't worry.
Furthermore, you don't really need to worry about disposing the data contexts either. You might notice ScottGu doesn't in his samples (he usually has the context as a private field on the controller). This answer has some good information from the Linq to SQL team about disposing data contexts, and this blog post also expands on the subject.
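As a rough illustration of that pattern (a sketch assuming ASP.NET MVC and a hypothetical BlogContext), the context lives as a private field and is disposed along with the controller:
using System.Linq;
using System.Web.Mvc;

public class BlogsController : Controller
{
    private readonly BlogContext db = new BlogContext(); // hypothetical context type

    public ActionResult Index()
    {
        return View(db.Blogs.ToList());
    }

    // MVC calls Dispose on the controller at the end of the request,
    // so the context gets cleaned up here.
    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            db.Dispose();
        }
        base.Dispose(disposing);
    }
}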

Use HttpContext.Items and dispose your context manually in EndRequest - you can even create a custom HTTP module for that. That is the correct handling. Disposing the context will also release references to all tracked entities and allow the GC to collect them.
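A minimal sketch of such a module (assuming the context is stored in HttpContext.Items under a hypothetical key and that your context type is MyDbContext):
using System.Web;

public class DbContextDisposalModule : IHttpModule
{
    private const string ContextKey = "RequestDbContext"; // placeholder key

    public void Init(HttpApplication application)
    {
        application.EndRequest += (sender, e) =>
        {
            // Dispose the per-request context, if one was created.
            var context = HttpContext.Current.Items[ContextKey] as MyDbContext;
            if (context != null)
            {
                context.Dispose();
                HttpContext.Current.Items.Remove(ContextKey);
            }
        };
    }

    public void Dispose() { }
}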
You can use multiple contexts per request if you really need them, but in most scenarios one is enough. If your server processing is one logical operation, you should use one context for the whole unit of work. This is especially important if you make changes in a transaction, because with multiple contexts your transaction will be promoted to a distributed transaction, which has a negative performance impact.

We have a web project using a similar pattern to the one you've described (albeit with multiple and independent L2S contexts instead of EF). Although the context is not disposed at the end of the request, we have found that because HttpContext.Current becomes unreferenced, the GC collects the context in due course, causing the cleanup behind the scenes. We confirmed this using a memory analyser. Although the context was persisting a bit longer than it should, it was acceptable for us.
Since noticing the behaviour we have tried a couple of alternatives, including disposing the contexts in EndRequest, and forcing a GC collect in EndRequest (that one wasn't my idea and was quickly retracted).
We're now investigating the possibility of implementing a Unit of Work pattern that encompasses our collection of contexts during a request. There are some great articles about it if you google it, but for us, alas, the time it would take to implement outweighs the potential benefit.
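For what it's worth, a bare-bones sketch of the idea (hypothetical context names; for L2S contexts the commit would call SubmitChanges instead of SaveChanges):
using System;

// A request-scoped unit of work that owns the data contexts used during the request
// and disposes them together at the end.
public class UnitOfWork : IDisposable
{
    public OrdersContext Orders { get; private set; }
    public BillingContext Billing { get; private set; }

    public UnitOfWork()
    {
        Orders = new OrdersContext();
        Billing = new BillingContext();
    }

    public void Commit()
    {
        Orders.SaveChanges();
        Billing.SaveChanges();
    }

    public void Dispose()
    {
        Orders.Dispose();
        Billing.Dispose();
    }
}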
On the side, I'm now investigating the complexity of moving to a combined SOA/Unit of Work approach, but again, it's one of those things hindsight slaps you with after having built up an enterprise sized application without the knowledge.
I'm keen to hear other peoples views on the subject.

Related

Is there an actual risk of memory leak when applying migrations in EF?

According to MSDN, one may utilize the fact that IServiceScope implements IDisposable, like this:
using IServiceScope scope = app.Services.CreateScope();
scope.ServiceProvider.GetRequiredService<Context>()
.Database.Migrate();
However, I usually call the migration as follows and have never experienced any issues.
app.Services.CreateScope()
.ServiceProvider.GetRequiredService<Context>()
.Database.Migrate();
Is there a good reason in practice to worry? I figure the migration is carried out only once, at startup, and in most cases doesn't even carry out any changes, so it occupies the requested instance of the scope for a minimal amount of time. It's not asynchronous either, which to me further lowers the likelihood of it running amok.
Please note that I'm not advocating omitting the using in that case. I understand the importance of following best practices. But I'm also curious how important it is pragmatically.

EF Core memory usage and QueryTrackingBehavior.NoTracking

I have an ASP.NET Core 3 website that is frequently running out of memory on Azure.
One of the heavy-lifting (but frequently used) functions is to generate reports. So I thought I'd use one such report as a test case to see what's going on.
I took a memory snapshot after the application loads, and then after 9 subsequent requests for one of the reports.
Looking at the diagnostics, lots of memory is consumed by EF change tracking objects.
I've found that if I use options.UseQueryTrackingBehavior(QueryTrackingBehavior.NoTracking); in startup, then the snapshots for the same activity look dramatically better.
This is a massive improvement - adding 2 MB for every request is not viable. Is this normal? I would have thought that even with change tracking on, the GC wouldn't let it get this bad. Or could there be something in my report code that is making it hold onto references - I read that static variables in a class can lead to the GC not freeing up those instances; is that a possibility? I'm not sure if switching off some default functionality is just a band-aid for something else I'm doing fundamentally wrong (I'm pretty sure I'm disposing everything with using statements, etc.).
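For reference, this is roughly how that option is applied in ConfigureServices (a sketch; ReportsContext, UseSqlServer and the "Reports" connection string are placeholder choices):
// Make NoTracking the default query behaviour for this context.
services.AddDbContext<ReportsContext>(options =>
    options.UseSqlServer(Configuration.GetConnectionString("Reports"))
           .UseQueryTrackingBehavior(QueryTrackingBehavior.NoTracking));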
I would say that such an outcome is expected when switching all EF queries to NoTracking, especially in reporting scenarios where you are most likely reading, and therefore tracking, tons of objects in memory.
In the official docs you can find detailed information about this topic. There you can also see a benchmark comparing the performance of two queries, one that uses the change tracker and one that doesn't, using a small data set (10 blogs with 20 posts each). Despite the tiny amount of data, the results are similar to yours: almost a 40% increase in performance and a similar decrease in allocated memory.
Therefore, with regard to "I'm not sure if switching off some default functionality is just a band-aid to something else I'm doing fundamentally wrong", I would definitely say it is not a band-aid solution at all to do it just for the reporting functionality. In these read-only scenarios where you need a performance boost, using no-tracking queries is actually recommended.
However, the one thing I would be careful about is that you probably don't want to switch the tracking behaviour off for ALL queries in your application. If you do, and you rely on the change tracker to perform updates of entities somewhere else in the application, those updates will stop working.
For example:
var blog = context.Blogs
    .Where(blog => blog.Id == blogId)
    .SingleOrDefault();
blog.Name = "Another Name";
context.SaveChanges(); // If the default query behaviour is 'NoTracking', the blog's name won't be updated, since the entity isn't in the ChangeTracker.
What I would do instead is keep the default behaviour as tracking, but change all queries that are used only in the reports to be non-tracking. To achieve this, add .AsNoTracking() to all the reporting EF queries.
For example:
var blogs = context.Blogs
    .AsNoTracking()
    .ToList();
This way, you will considerably boost the performance of your read-only queries without affecting the rest of the application's behaviour.

Stateless Singletons and Concurrency

I have a question about stateless singletons. I also have a question about singletons with state.
Stateless singleton services are a good way to help with scalability. The programmer who architected the project I maintain basically said there would be no concurrency issues because "it is just code" (the singleton class, that is), meaning the class has no class-level variables - it is just methods.
This is where my knowledge of C# gets a little hazy. Is there any possible issue where 2 users, via separate web requests, hit the stateless singleton at the same time? Could they end up in the same method at the same time? Is that even possible? If so, does that mean they'd be using the same local variables in that method? Sounds like a big mess, so I'm assuming it just can't happen. I'm assuming that somehow method calls are never polluted by other users.
I've asked many colleagues about this and no-one knows the answer. So it is a tricky issue.
My question about singletons generally is whether there is any problem with 2 or more concurrent users reading a public property of a Singleton. I'm only interested in reads. Is there a possibility of some kind of concurrency exception where a property is not inside a lock block? Or are concurrent, simultaneous reads safe? I don't really want to use the lock keyword, as that is a performance hit that I don't need.
Thanks
Singleton is an anti-pattern. A stateless singleton is even worse: if something does not hold state, there is not even the faintest reason to make it a singleton.
A stateless singleton is a pure static function written by someone who enjoyed adding a pattern without thinking about what it would achieve. Had he done so, he would have noticed that it achieves nothing.
If you see a stateless singleton, you can safely remove every bit of code that makes it a singleton. Add static to the class definition. Done. Way better than before.
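As a rough sketch (a hypothetical PriceCalculator with no fields), removing the singleton ceremony leaves just a static class:
// Before: public static PriceCalculator Instance { get; } = new PriceCalculator();
//         private PriceCalculator() { }
// After: delete that ceremony and mark the class static.
public static class PriceCalculator
{
    // Local variables inside methods live on each caller's stack,
    // so concurrent requests never share them.
    public static decimal ApplyVat(decimal net) => net * 1.20m;
}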
I think you are pretty confused about multi-threading, singleton or not. I suggest you read a good book or tutorial on this, because it's way out of scope for a simple answer here. If you have shared resources (a simple example: a variable that is not a local), then you need to take special care in multi-threaded environments.
If you are reading more often than writing, using a ReaderWriterLock instead of a simple lock might be beneficial. See here.
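A minimal sketch of that idea with the newer ReaderWriterLockSlim (a hypothetical class, not taken from the question), allowing many concurrent readers while a writer gets exclusive access:
using System.Threading;

public class SharedSetting
{
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();
    private string _value = "default";

    public string Read()
    {
        _lock.EnterReadLock();
        try { return _value; }
        finally { _lock.ExitReadLock(); }
    }

    public void Write(string newValue)
    {
        _lock.EnterWriteLock();
        try { _value = newValue; }
        finally { _lock.ExitWriteLock(); }
    }
}
Note that plain reads of a reference or a 32-bit value are already atomic in .NET; the lock matters when readers must see a consistent combination of several pieces of state.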

C# best practice for performance: object creation vs reuse

I am currently working on performance tuning of an existing C# website. There is a class, say MyUtil.cs, which is used extensively across all web pages. On some pages around 10-12 instances of MyUtil are created. I ran the Redgate performance profiler, and according to it the object creation is a costly operation.
Please note that each instance sets specific properties and performs a specific operation, so I cannot reuse the object as it is - I would have to reset all the member variables.
I want to optimize this code. I have thought about the following options. Kindly help me evaluate which is the better approach here:
(1) Create a "Reset" method in the MyUtil class which will reset all the member variables (there are 167 of those :(..) so that I can reuse one single object in a page class.
(2) Continue with the multiple object creation (I do have a Dispose() method in MyUtil).
(3) I thought of object pooling, but again I will have to reset the members. I think it's better to pool objects at page level and release them, instead of keeping them alive at the project level.
Any reply on this will be appreciated. Thanks in advance..!
Every app has multiple opportunities for speedup of different sizes, like kinds of food on your plate.
Creating and initializing objects can typically be one of these, and can typically be a large one.
Whenever I see that object creation/initialization is taking a large fraction of time, I recycle used objects.
It's a basic technique, and it can make a big difference.
But I only do it if I know that it will save a healthy fraction of time.
Don't just do it on general principles.
I would recommend that you always create new objects instead of resetting them, for the following reasons:
The GC is smart enough to classify objects and assign them a generation, depending on how the objects are used in your code; its profiling is based on the execution pattern of the code as well as its architecture.
You will get the optimum result from the hardware if you use the GC and let it manage the process, as it also decides the garbage collection threshold based on the hardware configuration and the available system resources.
Apart from that, your code will be much simpler and more manageable. (This is not a direct benefit, but it should still carry at least some weight.)
Creating an object pool at page level is also not a good idea, because to re-use an object you have to do two things - fetch it from the pool and reset its properties - which means you also have to manage the pool, an additional burden.
Creating a single instance and re-using it by resetting its properties might also not be a good idea, because when you need more than one instance of the object at a time it will not work.
So the conclusion is: keep creating objects in the page and let the garbage collector do its job.

Caching a LINQ to SQL DataContext

We're in the process of doing some performance optimization for a multi-tenant web application. Currently, a LINQ to SQL DataContext is created at the beginning of each web request. The context has a lifetime of the web request and is injected into the constructor of any objects that need it using Castle Windsor.
We had the thought of caching the context (and a set of objects attached to it) in the session cache for up to a few minutes to optimize the setup costs for follow-on web requests. Is this a good/bad idea? What issues need to be considered?
A bad idea IMO. The biggest problem would be concurrency. Thanks to connection pooling, the setup costs aren't that high as long as you use the data-context as a pipe for data, not as the data bucket itself.
Cache the data; throw away the data-context.
Attempting to hold onto the data-context also doesn't scale out to multiple servers, nor does it support any cache implementation other than in-process.
Have you measured the setup costs so that you know whether this is worth considering? I really don't believe that is your bottleneck.
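A rough sketch of "cache the data; throw away the data-context" (hypothetical types and key names, using the built-in ASP.NET cache):
using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using System.Web.Caching;

public class ProductCache
{
    // Cache the materialized data per tenant; create a fresh DataContext per call.
    public List<Product> GetProducts(int tenantId)
    {
        string cacheKey = "products:" + tenantId; // placeholder key scheme

        var cached = HttpRuntime.Cache[cacheKey] as List<Product>;
        if (cached != null)
        {
            return cached;
        }

        // Create the context, materialize the data, and let the context go.
        using (var db = new ProductsDataContext())
        {
            var products = db.Products.Where(p => p.TenantId == tenantId).ToList();
            HttpRuntime.Cache.Insert(cacheKey, products, null,
                DateTime.UtcNow.AddMinutes(2), Cache.NoSlidingExpiration);
            return products;
        }
    }
}
The cached list is per-process; as noted above, anything fancier (multiple servers) needs an out-of-process cache instead.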
