Trigger cache clearing on table CRUD operations in LINQ to SQL - C#

I have a method that gets all the records from a particular database table, then stores them in the cache. The next time the method is called, it first checks the cache to see if it can simply return a cached version, provided that cache object hasn't expired.
Question: how do I trigger a method every time dataContext.SubmitChanges() is called? For example, if I get all the books from the Book table and store them in Cache["AllBooks"], I want this cache object to be cleared on any CRUD operation related to the Book table.
What I'm currently doing:
var b = dataContext.Books.Where(x => x.BookId == 4).SingleOrDefault();
b.Title = "new title for book w/ id of 4";
dataContext.SubmitChanges();
ClearBookCache();
later...
private void ClearBookCache()
{
    CustomCachingSystem.Clear("AllBooks");
}
What I want: ClearBookCache() to be triggered automatically by any CRUD operation on the Book table, instead of me having to remember to call it every time I perform one.
Note: I wouldn't want ClearBookCache() to be called when I perform a CRUD operation on a table that's unrelated to the Book table.
I hope this makes sense!

You could use the SqlDependency class. Basically, it lets you detect changes to the queried data (if you use Microsoft SQL Server 2005+).
You may want to look into the Service Broker technology (also SQL Server 2005+).
Detecting changes directly in your application probably won't be enough: you won't catch changes performed outside of your application.
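For reference, a minimal sketch of the SqlDependency route (the connection string, table and column names are assumptions; it needs System.Data.SqlClient, Service Broker enabled on the database, and a notification-friendly query: two-part table name, explicit column list, no SELECT *):

// Call once at application startup.
SqlDependency.Start(connectionString);

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("SELECT BookId, Title FROM dbo.Book", conn))
{
    var dependency = new SqlDependency(cmd);
    dependency.OnChange += (sender, e) => ClearBookCache();

    conn.Open();
    // The command must actually execute for the notification to be registered.
    using (var reader = cmd.ExecuteReader()) { }
}

Note that a notification fires only once; after each change you have to re-run the query and subscribe again.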

Take a look at DataContext.GetChangeSet(): you could inherit from DataContext and override the SubmitChanges methods to clear the relevant caches based on the ChangeSet contents, then call base.SubmitChanges(...).
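A minimal sketch of that approach, assuming the generated context is called MyDataContext and maps a Book table (generated LINQ to SQL contexts are partial classes, so the override can live in your own partial part):

using System.Data.Linq;
using System.Linq;

public partial class MyDataContext
{
    // The parameterless SubmitChanges() delegates to this overload,
    // so overriding it covers both call forms.
    public override void SubmitChanges(ConflictMode failureMode)
    {
        ChangeSet changeSet = GetChangeSet();

        // Inserts, Updates and Deletes are IList<object>; check whether
        // any pending change involves a Book entity.
        bool booksTouched = changeSet.Inserts
            .Concat(changeSet.Updates)
            .Concat(changeSet.Deletes)
            .OfType<Book>()
            .Any();

        base.SubmitChanges(failureMode);

        if (booksTouched)
            ClearBookCache();
    }
}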

Related

SingleOrDefault and FirstOrDefault returning cached data

Some previous code I had written used the Find() method to retrieve single entities by their primary key:
return myContext.Products.Find(id);
This worked great because I had this code tucked into a generic class, and each entity had a different field name as its primary key.
But I had to replace the code because I noticed that it was returning cached data, and I need it to return data from the database each call. Microsoft's documentation confirmed this is the behavior of Find().
So I changed my code to use SingleOrDefault or FirstOrDefault. I haven't found anything in documentation that states these methods return cached data.
Now I am executing these steps:
1. Save an entity via EF.
2. Execute an UPDATE statement in SSMS to update the recently saved record's Description field.
3. Retrieve the entity into a new entity variable using SingleOrDefault or FirstOrDefault.
The entities being returned still have the old value in the Description field.
I have run a SQL trace, and verified that the data is being queried during step 3. This baffles me - if EF is making a round trip to the database, why is it returning cached data?
I've searched online, and most answers apply to the Find() method. Furthermore, they suggest some solutions that are merely workarounds (dispose the DbContext and instantiate a new one) or solutions that won't work for me (use the AsNoTracking() method).
How can I retrieve my entities from the database and bypass the EF cache?
The behaviour you're seeing is described in Microsoft's How Queries Work article under point 3:
For each item in the result set:
a. If this is a tracking query, EF checks if the data represents an entity already in the change tracker for the context instance. If so, the existing entity is returned.
It's described a little better in this blog post:
It turns out that Entity Framework uses the Identity Map pattern. This means that once an entity with a given key is loaded in the context’s cache, it is never loaded again for as long as that context exists. So when we hit the database a second time to get the customers, it retrieved the updated 851 record from the database, but because customer 851 was already loaded in the context, it ignored the newer record from the database (more details).
All of this is saying that if you make a query, it checks the primary key first to see if it already has it in the cache. If so, it uses what's in the cache.
How do you avoid it? First, make sure you're not keeping your DbContext object alive too long. DbContext objects are designed to be used for one unit of work; bad things happen if you keep one around too long, like excessive memory consumption.
Do you need to retrieve data to display to the user? Create a DbContext to get the data and discard that DbContext.
Do you need to update a record? Create a new DbContext, update the record and discard that DbContext.
This is why, when you use EF Core with dependency injection in ASP.NET Core, it is created with a scoped lifetime, so any DbContext object only lives for the life of one HTTP request.
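For illustration, that registration typically looks like this (AppDbContext and the connection string name are assumptions; AddDbContext registers the context with a scoped lifetime by default):

// In Program.cs / Startup.ConfigureServices:
services.AddDbContext<AppDbContext>(options =>
    options.UseSqlServer(configuration.GetConnectionString("Default")));
// Scoped lifetime: one AppDbContext instance per HTTP request.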
In the rare case you really do need fresh data for a record you already have an object for, you can use EntityEntry.Reload()/EntityEntry.ReloadAsync() like this:
myContext.Entry(myProduct).Reload();
That doesn't help you if you only know the ID though.
If you really really need to reload an entity that you only have the ID for, you could do something weird like this:
private Product GetProductById(int id)
{
    // check if it's in the cache already
    var cachedEntity = myContext.ChangeTracker.Entries<Product>()
        .FirstOrDefault(p => p.Entity.Id == id);

    if (cachedEntity == null)
    {
        // not in cache - get it from the database
        return myContext.Products.Find(id);
    }
    else
    {
        // we already have it - reload it
        cachedEntity.Reload();
        return cachedEntity.Entity;
    }
}
But again, this should only be used in limited cases, once you've addressed any long-lived DbContext objects, because unwanted caching isn't their only downside.
OK, I had the same problem and finally found the answer.
You're doing everything right; that's just how EF works.
You can use .AsNoTracking() for your purposes (note that Find() isn't available on a no-tracking query, so use a predicate instead):
return myContext.Products.AsNoTracking().FirstOrDefault(p => p.Id == id);
Make sure you added using Microsoft.EntityFrameworkCore; at the top.
It works like magic.

How to update a whole entity without specifying every one of its members?

My .net web service reads an entity from the DB and sends it to a client application.
The client application modifies some fields in the entity and then submits the entity back to the server to be updated in the DB.
The surefire but laborious way to do this goes something like:
public void Update(MyEntity updatedEntity)
{
    using (var context = new MyDataContext())
    {
        var existingEntity = context.MyEntities.Single(e => e.Id == updatedEntity.Id);
        existingEntity.FirstName = updatedEntity.FirstName;
        existingEntity.MiddleName = updatedEntity.MiddleName;
        existingEntity.LastName = updatedEntity.LastName;
        // Rinse, repeat for all members of MyEntity...
        context.SubmitChanges();
    }
}
I don't want to go down this path because it forces me to specify each and every member property of MyEntity. This will likely break if MyEntity's structure is changed.
How can I take the incoming updatedEntity and introduce it to LINQ to SQL whole for update?
I've tried achieving this with the DataContext's Attach() method and entered a world of pain.
Is Attach() the right way to do it? Can someone point me to a working example of how to do this?
Attach is indeed one way to do it.
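For what it's worth, a minimal sketch of the Attach route, assuming MyEntity maps a timestamp/rowversion column (LINQ to SQL needs a version member, or UpdateCheck.Never on the other members, to accept an entity attached as modified):

public void Update(MyEntity updatedEntity)
{
    using (var context = new MyDataContext())
    {
        // Attach the detached entity and mark it as modified;
        // no per-property copying required.
        context.MyEntities.Attach(updatedEntity, true);
        context.SubmitChanges();
    }
}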
That said...
The surefire but laborious way to do this goes something like
The right way if you ask me.
This will likely break if MyEntity's structure is changed
I personally would expect to modify my Update business method when the database schema changes:
if it's an internal change that doesn't affect the business, then there is simply no reason to modify the code that calls your business method. Let your business method be in charge of the internal stuff.
if it's a change that requires you to modify your consumers, then so be it; the calling code had to be updated anyway (at least to populate, for instance, the new properties you added to the entity).
Basically, my opinion on this subject is that you shouldn't try to pass entities to your business layer. I explained why I think that in a previous answer.

Change tracking for detached entities, or session per client?

I am developing a server with persistent client connections (not request based). Since I keep each connected client's state in memory, it would be strange to load entities from the database every time I need to access that client's data.
So I have detached entities, and when I need to make changes I don't apply them directly; instead I pass the changes and the detached entity as a request to a GameDb class. It performs the changes on this entity and then loads the same entity from the db to perform the same changes again on the session-owned copy, so NH can track them.
I could use Merge, but it's much slower because NH has to load all of the entity's data (including lazy collections, which may be unmodified) to check each property for changes. In my case, performance is critical.
An example:
// A method on the GameDb class:
public void UpdateTradeOperation(UserOperation operation, int incomeQuantity, decimal price)
{
    if (operation == null) throw new ArgumentNullException("operation");
    if (operation.Id == 0) throw new ArgumentException("operation is not persisted");

    // apply the changes to the detached entity
    _operationLogic.UpdateTradeOperation(operation, incomeQuantity, price);
    try
    {
        _factory.Execute(
            s =>
            {
                // load the session-owned copy and apply the same changes again
                var op = s.Get<UserOperation>(operation.Id);
                _operationLogic.UpdateTradeOperation(op, incomeQuantity, price);
                if (op.User.BalanceFrozen != operation.User.BalanceFrozen)
                    throw new Exception("Inconsistent balance");
            }); // commits the transaction if no exceptions are thrown
    }
    catch (Exception e)
    {
        throw new UserStateCorruptedException(operation.User, null, e);
    }
}
This approach adds complexity, as I need to apply each change twice and then check that the resulting states are equal. It would be easier if I could use an NH session to monitor entity changes, but keeping an NH session open for a long time isn't recommended, and I could end up with thousands of such long-lived open sessions.
It also forces me to separate my entities from common logic. The problem is that the GameDb class doesn't know from which context it's called, so it can't request additional data for its operation (e.g. current prices, a client socket's inactivity timer, or many other things), nor can it conditionally (at its own discretion) send data to the client. Of course I could pass a bunch of delegates into the GameDb method, but that doesn't seem like a good solution.
Can I use Session.Lock to attach my unchanged detached entities, so I don't have to perform the changes twice? What LockMode should I use?
Is there a better approach? If I keep one open session per client but commit or roll back transactions quickly, will that still hold a lot of connections open? Will the session keep entity state after a transaction completes?
What kind of concurrency issues can I experience with long-lived per-client sessions:
if I operate on each user's entities only from its own thread fiber (or lock)?
if I request another user's profile read-only from the "wrong" session (from that session's thread)?
I think what you need to do is use a second-level cache, and store the Ids of the entities per connected client instead of keeping the entities in memory.
When a client connects, you can fetch the entities by the Ids you stored. Subsequent requests won't even hit the database, as the entities will come from the second-level cache, and you don't need to worry about change tracking.
http://ayende.com/blog/3976/nhibernate-2nd-level-cache
I tried using Session.Lock(entity, LockMode.None) to reattach detached entities to a new session, and it works. It adds the object to the session as clean and unchanged. I can then modify it, and it is stored to the database on the next transaction commit.
This is better than Merge because NHibernate does not need to inspect all the properties to find out what has changed.
One caveat: if I change at least one property, it updates the whole object (all the properties, though not collections or entity links if they are untouched). Setting DynamicUpdate = true in the entity's mapping makes it update only the changed properties.
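For reference, a sketch of that setting in NHibernate's mapping-by-code (the XML equivalent is dynamic-update="true" on the class element; the mapped properties here are illustrative):

public class UserOperationMap : ClassMapping<UserOperation>
{
    public UserOperationMap()
    {
        DynamicUpdate(true); // UPDATE statements include only the changed columns
        Id(x => x.Id, m => m.Generator(Generators.Identity));
        Property(x => x.Quantity);
        Property(x => x.Price);
    }
}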
If I change any property of the detached entity outside of its current session, the next call to Session.Lock throws an exception (in particular, if I change a collection's contents, the exception says "reassociated object has dirty collection"). I make those changes outside of the session because I don't need to save them (some stuff with references).
Strangely, it works perfectly when I call Lock twice:
try
{
    s.Lock(DbEntity, LockMode.None); // throws
}
catch
{
    s.Lock(DbEntity, LockMode.None); // everything ok
}
Also, for collections: before I arrived at the solution above, I cast them to IPersistentCollection and used ClearDirty().
What about concurrency? My code ensures that each thread fiber updates only its own user, and nobody except this fiber has write access to the entity.
So the pattern is:
I open a session, get an entity, and store it somewhere in memory.
When I need to read its properties, I can do so at any time, very quickly.
When I want to modify it, I open a new session and call Lock() on the entity. After applying the changes, I commit the transaction and close the session.
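A minimal sketch of that pattern, assuming an _sessionFactory field (the entity type and the change delegate are illustrative):

public void Modify(UserOperation detached, Action<UserOperation> applyChanges)
{
    using (var session = _sessionFactory.OpenSession())
    using (var tx = session.BeginTransaction())
    {
        // Reattach the clean detached entity; NHibernate now tracks changes to it.
        session.Lock(detached, LockMode.None);
        applyChanges(detached);
        tx.Commit(); // with DynamicUpdate, only the changed columns are written
    }
}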

How to cache database data into memory for use by MVC application?

I have a somewhat complex permission system that uses six database tables in total, and in order to speed it up I would like to cache these tables in memory instead of hitting the database on every page load.
However, I'll need to update this cache when a new user is added or a permission is changed. I'm not sure how to set up this in-memory cache, or how to update it safely so that nothing breaks when it's read at the same time it's being updated.
Does anyone have an example of how to do something like this, or can someone point me in the right direction for research?
Without knowing more about the structure of the application, there are lots of possible options. One such option might be to abstract the data access behind a repository interface and handle in-memory caching within that repository. Something as simple as a private IEnumerable<T> on the repository object.
So, for example, say you have a User object which contains information about the user (name, permissions, etc.). You'd have a UserRepository with some basic fetch/save methods on it. Inside that repository, you could maintain a private static HashSet<User> which holds User objects which have already been retrieved from the database.
When you fetch a User from the repository, it first checks the HashSet for an object to return; if it doesn't find one, it gets the User from the database, adds it to the HashSet, and then returns it. When you save a User, it updates both the HashSet and the database.
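Here's a minimal sketch of that idea, using a Dictionary keyed by id instead of a HashSet for fast lookup (User and the data-access calls are assumptions, and locking for thread safety is omitted for brevity):

using System.Collections.Generic;

public class UserRepository
{
    // shared across repository instances, as described above
    private static readonly Dictionary<int, User> _cache = new Dictionary<int, User>();

    public User GetById(int id)
    {
        User user;
        if (_cache.TryGetValue(id, out user))
            return user; // cache hit

        user = LoadUserFromDatabase(id); // hypothetical data-access call
        _cache[id] = user;
        return user;
    }

    public void Save(User user)
    {
        SaveUserToDatabase(user); // hypothetical data-access call
        _cache[user.Id] = user;   // keep the cache in sync
    }
}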
Again, without knowing the specifics of the codebase and overall design, it's hard to give a more specific answer. This should be a generic enough solution to work in any application, though.
I would cache items as you use them: in your data layer, when fetching data, first check whether it is available in the cache; otherwise, go to the database and cache the result afterwards.
public AccessModel GetAccess(string accessCode)
{
    // check the cache first (the cache API here is illustrative)
    var cached = cache.Get<AccessModel>(accessCode);
    if (cached != null)
        return cached;

    // cache miss: load from the database and cache the result
    var access = GetFromDatabase(accessCode);
    cache.Set(accessCode, access);
    return access;
}
Then I would think about a cache invalidation strategy. You can go one of two ways:
Set the data to expire after, say, one hour, so you hit the database at most once an hour.
Or invalidate the cache whenever you update the data. That is certainly the best option, but it is a bit more complex.
Hope it helps.
Note: you can use either the ASP.NET Cache or another solution like memcached, depending on your infrastructure.
Is it hitting the database every page load that's the problem or is it joining six tables that's the problem?
If it's just that the join is slow, why not create a database table that summarizes the data in a way that is much easier and faster to query?
This way, you just have to update your summary table each time you add a user or update a permission. If you group all of this into a single transaction, you shouldn't have issues with out-of-sync data.
You can take advantage of ASP.NET caching and the SqlCacheDependency class. There is an article on MSDN.
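For illustration, usage looks roughly like this; it assumes SQL cache notifications have been enabled for the database and table (via aspnet_regsql.exe or SqlCacheDependencyAdmin), and that web.config has a <sqlCacheDependency> database entry - the "PermissionsDb" and "Permissions" names here are assumptions:

// Cache the table's data until SQL Server reports a change to it.
var dependency = new SqlCacheDependency("PermissionsDb", "Permissions");
HttpContext.Current.Cache.Insert("AllPermissions", permissions, dependency);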
You can use the Cache object built into ASP.NET. Here is an article that explains how.
I suggest caching such data in the Application state object. For thread-safe usage, consider using the lock statement. Your code would look something like this:
public void ClearTableCache(string tableName)
{
    var application = System.Web.HttpContext.Current.Application;
    // lock on the app-wide Application object (HttpContext.Current is per-request,
    // so locking on it would not synchronize across requests)
    lock (application)
    {
        application[tableName] = null;
    }
}

public SomeDataType GetTableData(string tableName)
{
    var application = System.Web.HttpContext.Current.Application;
    lock (application)
    {
        if (application[tableName] == null)
        {
            // get the data from the DB, then put it into application state
            var dataFromDb = LoadTableFromDb(tableName); // hypothetical DB loader
            application[tableName] = dataFromDb;
            return dataFromDb;
        }
        return (SomeDataType)application[tableName];
    }
}

Reconstituting domain objects from database: identity problem

We are using LINQ to SQL to read and write our domain objects to a SQL Server database.
We are exposing a number of services (via WCF) to perform various operations. Conceptually, the implementation of these operations consists of three steps: reconstitute the necessary domain objects from the database; execute the operation on the domain objects; persist the (now changed) domain objects back to the database.
The problem is that sometimes there are two or more instances of the same entity object, which can lead to inconsistencies when saving the objects back to the db. A little made-up example:
public void Move(string sourceLocationId, string destinationLocationId, string itemId);
which is supposed to move the item with the given id from the source to the destination location (actual services are more complicated, often involving many locations, items, etc.). Now, the source and destination location ids could be the same - a naive implementation would simply reconstitute two instances of the entity object, which would lead to problems.
This issue is currently "solved" by checking for it manually: we reconstitute the first location, check whether the id of the second differs from it, and only if so reconstitute the second, and so on. This is obviously difficult and error-prone.
Anyway, I was actually surprised that there does not seem to be a "standard" solution for this in domain-driven design. In particular, repositories and factories do not seem to solve this problem (unless they maintain their own cache, which then needs to be kept up to date, etc.).
My idea would be to create a DomainContext object per operation, which tracks and caches the domain objects used in that particular method. Instead of reconstituting and saving individual domain objects, such an object would be reconstituted and saved as a whole (possibly using repositories), and it could act as a cache for the domain objects used in that particular operation.
Anyway, it seems that this is a common problem, so how is this usually dealt with? What do you think of the idea above?
The DataContext in LINQ to SQL supports the Identity Map concept out of the box and caches the objects you retrieve. You will only get different instances if you are not using the same DataContext for each GetById() operation.
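For illustration, a quick way to see the identity map at work (the context and entity names are assumptions):

using (var ctx = new MyDataContext())
{
    var a = ctx.Locations.Single(l => l.LocationID == 5);
    var b = ctx.Locations.Single(l => l.LocationID == 5);
    // Both queries run against the same context, so both return the same instance:
    bool sameInstance = ReferenceEquals(a, b); // true
}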
LINQ to SQL objects aren't really valid outside the lifetime of their DataContext. You may find Rick Strahl's Linq to SQL DataContext Lifetime Management a good background read.
Also, the ORM is not responsible for domain logic. It's not going to disallow your example Move operation; it's up to the domain to decide what that means. Does it ignore it, or is it an error? That's your domain logic, and it needs to be implemented at the service boundary you are creating.
However, LINQ to SQL does know when an object changes, and from what I've seen, it won't record a change if you re-assign the same value: e.g. if Item.LocationID is 12, setting LocationID to 12 again won't trigger an update when SubmitChanges() is called.
Based on the example given, I'd be tempted to return early without ever loading an object if the source and destination are the same.
public void Move(string sourceLocationId, string destinationLocationId, string itemId)
{
    if (sourceLocationId == destinationLocationId)
        return;

    using (DataContext ctx = new DataContext())
    {
        Item item = ctx.Items.First(o => o.ItemID == itemId);
        Location destination =
            ctx.Locations.First(o => o.LocationID == destinationLocationId);
        item.Location = destination;
        ctx.SubmitChanges();
    }
}
Another small point, which may or may not apply: you should make your interfaces as chunky as possible. E.g. if you will typically perform 10 move operations at once, it's better to call one service method that performs all 10 operations than to call one method per operation. (ref: chunky vs chatty)
Many ORMs use two concepts that, if I understand you correctly, address your issue. The first, and most relevant, is the Context: it is responsible for ensuring that only one object represents an entity (a database table row, in the simple case), no matter how many times or in how many ways it is requested from the database. The second is the Unit of Work, which ensures that updates to the database for a group of entities either all succeed or all fail.
Both of these are implemented by the ORM I'm most familiar with (LLBLGen Pro), however I believe NHibernate and others also implement these concepts.
