Reconstituting domain objects from the database: identity problem - C#

We are using Linq to SQL to read and write our domain objects to a SQL Server database.
We are exposing a number of services (via WCF) to perform various operations. Conceptually, the implementation of these operations consists of three steps: reconstitute the necessary domain objects from the database; execute the operation on the domain objects; persist the (now changed) domain objects back to the database.
The problem is that sometimes there are two or more instances of the same entity object, which can lead to inconsistencies when saving the objects back to the database. A little made-up example:
public void Move(string sourceLocationId, string destinationLocationId, string itemId);
which is supposed to move the item with the given id from the source to the destination location (the actual services are more complicated, often involving many locations, items, etc.). Now, it could be that the source and destination location ids are the same - a naive implementation would simply reconstitute two instances of the entity object, which would lead to problems.
This issue is currently "solved" by checking for it manually, i.e. we reconstitute the first location, check whether the id of the second differs from it, and only if so reconstitute the second, and so on. This is obviously difficult and error-prone.
Anyway, I was actually surprised that there does not seem to be a "standard" solution for this in domain-driven design. In particular, repositories and factories do not seem to solve this problem (unless they maintain their own cache, which then needs to be kept up to date, etc.).
My idea would be to have a DomainContext object per operation, which tracks and caches the domain objects used in that particular operation. Instead of reconstituting and saving individual domain objects, such a context would be reconstituted and saved as a whole (possibly using repositories), and it could act as a cache for the domain objects used in that particular operation.
Anyway, it seems that this is a common problem, so how is this usually dealt with? What do you think of the idea above?

The DataContext in Linq-To-Sql supports the Identity Map concept out of the box and should be caching the objects you retrieve. The objects will only be different if you are not using the same DataContext for each GetById() operation.
Linq to Sql objects aren't really valid outside of the lifetime of the DataContext. You may find Rick Strahl's Linq to SQL DataContext Lifetime Management a good background read.
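For what it's worth, a quick sketch showing the identity map at work (reusing the question's Location example; id stands for any existing key):

using (DataContext ctx = new DataContext())
{
    Location a = ctx.Locations.First(o => o.LocationID == id);
    Location b = ctx.Locations.First(o => o.LocationID == id);
    bool sameInstance = ReferenceEquals(a, b); // true: one DataContext, one instance per key
}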
Also, the ORM is not responsible for domain logic. It's not going to disallow your example Move operation; it's up to the domain to decide what that means. Does it ignore it, or is it an error? It's your domain logic, and it needs to be implemented at the service boundary you are creating.
However, Linq-To-Sql does know when an object changes, and from what I've seen, it won't record a change if you re-assign the same value. E.g. if Item.LocationID is 12, setting the LocationID to 12 again won't trigger an update when SubmitChanges() is called.
Based on the example given, I'd be tempted to return early without ever loading an object if the source and destination are the same.
public void Move(string sourceLocationId, string destinationLocationId, string itemId)
{
    if (sourceLocationId == destinationLocationId)
        return;

    using (DataContext ctx = new DataContext())
    {
        Item item = ctx.Items.First(o => o.ItemID == itemId);
        Location destination =
            ctx.Locations.First(o => o.LocationID == destinationLocationId);
        item.Location = destination;
        ctx.SubmitChanges();
    }
}
Another small point, which may or may not be applicable: you should make your interfaces as chunky as possible. E.g. if you're typically going to perform 10 move operations at once, it's better to call one service method that performs all 10 operations than to make one call per operation (see the sketch below). Ref: chunky vs chatty.
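For illustration, a hypothetical chunky variant of the same service (MoveRequest is an illustrative DTO, not from the question):

public class MoveRequest
{
    public string SourceLocationId { get; set; }
    public string DestinationLocationId { get; set; }
    public string ItemId { get; set; }
}

public void MoveBatch(IEnumerable<MoveRequest> moves)
{
    using (DataContext ctx = new DataContext())
    {
        foreach (MoveRequest move in moves)
        {
            if (move.SourceLocationId == move.DestinationLocationId)
                continue; // nothing to do for this entry

            Item item = ctx.Items.First(o => o.ItemID == move.ItemId);
            item.Location = ctx.Locations.First(o => o.LocationID == move.DestinationLocationId);
        }

        ctx.SubmitChanges(); // one round-trip commits all the moves together
    }
}

One service call now performs any number of moves, and the DataContext's identity map guarantees that repeated lookups of the same location return the same instance.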

Many ORMs use two concepts that, if I understand you correctly, address your issue. The first and most relevant is the Context; this is responsible for ensuring that only one object represents an entity (a database table row, in the simple case) no matter how many times or in how many ways it's requested from the database. The second is the Unit of Work; this ensures that updates to the database for a group of entities either all succeed or all fail.
Both of these are implemented by the ORM I'm most familiar with (LLBLGen Pro); however, I believe NHibernate and others also implement these concepts.


How to prevent EF from retrieving certain objects

In my application, all objects in the context have a property called ObsoleteFlag, which basically indicates whether the object should still be used on the front-end. It's a sort of "soft-delete" flag that avoids actually having to delete the data.
Now I want to prevent EF from returning any object whose ObsoleteFlag is set to true (1).
If, for example, I retrieve object X, the navigational list property Y contains all the related objects of type Y, no matter what their ObsoleteFlag is set to.
Is there some general way of preventing EF from doing this? I don't want to check the ObsoleteFlag property everywhere I access the context, and for every navigational property that may be loaded too.
Thanks.
Two different approaches:
In your repository layer, have a GetAllWhatever() method that returns IQueryable<Whatever> and uses Where(x => !x.Obsolete), and use it whenever you retrieve objects of this type (sketched below).
Create a view, Create View ActiveWhatever As Select * From Whatever Where Obsolete = 0, and bind to that rather than to the table.
The first is essentially checking the flag every time, but doing so in one place, so you don't have to keep thinking about it.
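A sketch of the first approach (assuming a DbContext-style context exposing a Whatevers set; names mirror the answer):

public IQueryable<Whatever> GetAllWhatever()
{
    // The obsolete filter lives in exactly one place; callers can still
    // compose further Where/OrderBy clauses on top of the returned query.
    return context.Whatevers.Where(x => !x.Obsolete);
}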
The second is much the same, but the work is pushed to the database instead of the .NET code. If you are going to modify entities or add new ones, you will have to make it a modifiable view; just how that is done depends on the database in question (e.g. you can do it with triggers in SQL Server, and with triggers or rules in PostgreSQL).
The second approach can also include a rule or trigger for DELETE that sets your obsolete property instead of deleting, so that a normal delete as far as Entity Framework is concerned becomes one of your soft-deletes as far as the database is concerned.
I'd go for that approach unless you have a reason to object to a view existing just to support the application's implementation (that is, you're heavily invested in the database being "pure", concerned with the data rather than its use). But then, if it's handy for one application, it's likely handy for more, given the very meaning of this "obsolete".

How to design a system around object state without duplicating mechanisms in code and back-end?

The system I am working with at the moment uses C# and Oracle, but the problem I am having is system-agnostic (it could happen in a system with Java and MySQL or any other front-end and back-end combination):
I have a TransactionDetail object that can have nine statuses:
Open,
Complete,
Cancelled,
No Quote,
Quoted,
Instructed,
Declined,
Refunded,
Removed
In my experience, when you have to deal with statuses in front-end code, you should do everything you can to avoid the object's status having a setter. Status is an inherent quality and has to be determined at the moment it is needed - in other words, status should always be determined by a method or a get-only property, never set.
So statuses are retrieved with mechanisms like this (this is only a fragment of the code, but it should give you an indication of how it works):
public TransactionStatus GetTransactionStatus()
{
    if (db.DeclinedTransactions.Any(o => o.TransactionId == this.TransactionId))
        return TransactionStatus.Declined;

    // ... further checks determine the remaining statuses
    throw new NotImplementedException();
}
MI is asking for these transaction statuses in a SQL view that would also contain all the data related to a transaction.
If an object's status can be determined from the object's own data alone, computed columns can solve this problem in the database. But what about objects like TransactionDetail that span multiple tables? There is no computed-column mechanism that allows 'peeking' into other tables.
The only solution I can think of is adding a SQL function that determines the state and then creating a SQL view that combines the function with the data from the table. What I don't like about this approach is that it duplicates logic in the code and in the database.
How should one design a system around object state that requires information from more than one table to determine, in a way that does not require duplicating mechanisms in the code and the back-end?
If this were a project I was working on, I would not be looking to create a View to calculate this data.
I would be looking at my application business logic.
Whilst a fully normalised database makes perfect sense to the DBAs, there are cases where application performance and scalability can benefit greatly from a little de-normalization.
If you have a reliable framework of business logic (i.e. well defined business objects, good encapsulation, reliable unit tests) then I would personally be looking to add this to the business objects.
This then allows you to define your Status behaviour in code and update an explicit Status. For example, if a change made to a business object puts it into a different TransactionStatus, you can explicitly make that change to the status on the business object and persist the entire change to your database.
The usual response to this kind of design suggestion is that you then take on the burden of keeping the two things in sync (the explicit status vs. the state of the object). The answer to that is making sure there is only one piece of logic that carries out these changes, and that your business logic is water-tight, as described before.
An example:
An Invoice contains one or more InvoiceItems.
Each InvoiceItem has a value.
An Invoice, when displayed, needs an invoice total.
The usual way this is done is to use SUM() to calculate the invoice total "on the fly" in the database to populate an Invoice.Total value.
But if my business logic is well defined - perhaps I add an InvoiceItem to an Invoice object in code, and the Add logic also takes the value from the InvoiceItem and adds it to an Invoice.Total value - then when I commit the changes, I can also commit that Invoice.Total value.
When I want to display the total, I have a single value, rather than having to aggregate in the database.
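A minimal sketch of that idea (class and member names are illustrative):

public class Invoice
{
    private readonly List<InvoiceItem> _items = new List<InvoiceItem>();

    public decimal Total { get; private set; }

    public void Add(InvoiceItem item)
    {
        _items.Add(item);
        Total += item.Value; // the denormalized total is maintained in one place
    }
}

Because Add is the only way an item gets into an invoice, Total can never drift out of sync with the items - which is exactly the water-tightness argument above.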

Reducing Repositories to Aggregate Roots

I currently have a repository for just about every table in the database and would like to further align myself with DDD by reducing them to aggregate roots only.
Let’s assume that I have the following tables, User and Phone. Each user might have one or more phones. Without the notion of aggregate root I might do something like this:
// Assuming I have the userId in session, for example, and I want to update a phone number
List<Phone> phones = PhoneRepository.GetPhoneNumberByUserId(userId);
phones[0].Number = "911";
PhoneRepository.Update(phones[0]);
The concept of aggregate roots is easier to understand on paper than in practice. I will never have phone numbers that do not belong to a User, so would it make sense to do away with the PhoneRepository and incorporate phone related methods into the UserRepository? Assuming the answer is yes, I’m going to rewrite the prior code sample.
Am I allowed to have a method on the UserRepository that returns phone numbers? Or should it always return a reference to a User, and then traverse the relationship through the User to get to the phone numbers:
List<Phone> phones = UserRepository.GetPhoneNumbers(userId);
// Or
User user = UserRepository.GetUserWithPhoneNumbers(userId); //this method will join to Phone
Regardless of which way I acquire the phones, assuming I modified one of them, how do I go about updating them? My limited understanding is that objects under the root should be updated through the root, which steers me towards choice #1 below. Although this works perfectly well with Entity Framework, it seems extremely un-descriptive: reading the code, I have no idea what I'm actually updating, even though Entity Framework is keeping tabs on changed objects within the graph.
UserRepository.Update(user);
// Or
UserRepository.UpdatePhone(phone);
Lastly, assume I have several lookup tables that are not really tied to anything, such as CountryCodes, ColorsCodes, SomethingElseCodes. I might use them to populate drop-downs or for whatever other reason. Are these standalone repositories? Can they be combined into some sort of logical grouping/repository such as CodesRepository? Or is that against best practices?
You are allowed to have any method you want in your repository :) In both of the cases you mention, it makes sense to return the user with the phone list populated. Normally the user object would not be fully populated with all its sub-information (say, all addresses and phone numbers), and we may have different methods for getting the user object populated with different kinds of information, loading the rest lazily only when needed.
User GetUserDetailsWithPhones()
{
    // Populate User along with Phones
}
For updating, in this case the user is being updated, not the phone number itself. The storage model may store the phones in a different table, and that may make you think that just the phones are being updated, but that is not the case if you think from a DDD perspective. As far as readability is concerned, while the line
UserRepository.Update(user)
alone doesn't convey what is being updated, the code above it makes that clear. It would also most likely be part of a front-end method call whose name signifies what is being updated.
For the lookup tables, and actually even otherwise, it is useful to have a GenericRepository and use that. Custom repositories can inherit from the GenericRepository.
public class UserRepository : GenericRepository<User>
{
    public IEnumerable<User> GetUsersByCustomCriteria()
    {
        // Query users by whatever criteria the use case needs
        throw new NotImplementedException();
    }

    public User GetUserDetailsWithPhones()
    {
        // Populate User along with Phones
        throw new NotImplementedException();
    }

    public User GetUserDetailsWithAllSubInfo()
    {
        // Populate User along with all sub-information, e.g. phones, addresses etc.
        throw new NotImplementedException();
    }
}
Search for "generic repository Entity Framework" and you will find many nice implementations. Use one of those or write your own.
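For illustration, a minimal sketch of such a generic repository (assuming EF's DbContext; a real implementation would add Update, Delete, and so on):

public class GenericRepository<T> where T : class
{
    protected readonly DbContext Context;

    public GenericRepository(DbContext context)
    {
        Context = context;
    }

    public T GetById(params object[] keys)
    {
        return Context.Set<T>().Find(keys);
    }

    public IQueryable<T> GetAll()
    {
        return Context.Set<T>();
    }

    public void Add(T entity)
    {
        Context.Set<T>().Add(entity);
    }
}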
Your example of the aggregate-root repository is perfectly fine, i.e. any entity that cannot reasonably exist without depending on another (in your case, Phone) shouldn't have its own repository. Without this consideration you can quickly find yourself with an explosion of repositories in a 1:1 mapping to database tables.
You should look at using the Unit of Work pattern for data changes rather than the repositories themselves, as I think they're causing you some confusion about intent when it comes to persisting changes back to the database. In an EF solution, the Unit of Work is essentially an interface wrapper around your EF context, as sketched below.
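For example, a minimal sketch of such a wrapper (interface and class names are illustrative):

public interface IUnitOfWork : IDisposable
{
    void Commit();
}

public class EfUnitOfWork : IUnitOfWork
{
    private readonly DbContext _context;

    public EfUnitOfWork(DbContext context)
    {
        _context = context;
    }

    // The EF context already tracks changes; Commit just flushes them.
    public void Commit()
    {
        _context.SaveChanges();
    }

    public void Dispose()
    {
        _context.Dispose();
    }
}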
With regard to your repository for lookup data, we simply create a ReferenceDataRepository that becomes responsible for data that doesn't specifically belong to a domain entity (countries, colours, etc.).
If a phone makes no sense without a user, it's an entity (if you care about its identity) or a value object, and it should always be modified through the user and retrieved/updated together.
Think about aggregate roots as context definers - they draw local contexts but are themselves in the global context (your application).
If you follow domain-driven design, repositories are supposed to be 1:1 with aggregate roots.
No excuses.
I bet these are the problems you are facing:
Technical difficulties - the object-relational impedance mismatch. You are struggling to persist whole object graphs with ease, and Entity Framework rather fails to help.
A data-centric (as opposed to behavior-centric) domain model. Because of that, you lose knowledge about the object hierarchy (the previously mentioned contexts), and magically everything becomes an aggregate root.
I'm not sure how to fix the first problem, but I've noticed that fixing the second one fixes the first well enough. To understand what I mean by behavior-centric, give this paper a try.
P.S. Reducing a repository to an aggregate root makes no sense.
P.P.S. Avoid "CodeRepositories". That leads to data-centric and thus procedural code.
P.P.P.S. Avoid the Unit of Work pattern. Aggregate roots should define transaction boundaries.
This is an old question, but I thought it worth posting a simple solution.
The EF context already gives you both a Unit of Work (it tracks changes) and repositories (in-memory references to what's in the DB). Further abstraction is not mandatory.
Remove the DbSet<Phone> from your context class, as Phone is not an aggregate root.
Use the 'Phones' navigation property on User instead.
static void UpdateNumber(int userId, string oldNumber, string newNumber)
{
    using (MyContext uow = new MyContext()) // Unit of Work
    {
        DbSet<User> repo = uow.Users; // Repository
        User user = repo.Find(userId);
        Phone oldPhone = user.Phones.SingleOrDefault(x => x.Number.Trim() == oldNumber);
        if (oldPhone != null)
            oldPhone.Number = newNumber;
        uow.SaveChanges();
    }
}
If a Phone entity only makes sense together with the aggregate root User, then I would also think it makes sense for the operation of adding a new Phone record to be the responsibility of the User domain object, through a specific method (DDD behavior). That can make perfect sense for several reasons. The immediate one is that we should check that the User object exists, since the Phone entity depends on its existence, and perhaps keep a transaction lock on it while doing further validation checks, to ensure no other process deletes the root aggregate before we are done validating the operation. In other cases, with other kinds of root aggregates, you might want to aggregate or calculate some value and persist it in column properties of the root aggregate for more efficient processing by other operations later on. Note that although I suggest the User domain object have a method that adds the Phone, this doesn't mean it should know about the existence of the database or EF; one of the great features of EF and Hibernate is that they can transparently track changes made to entity classes, including new related entities added through navigation collection properties.
Also, if you want methods that retrieve all phones regardless of the users owning them, you can still do it through the User repository: you only need one method that returns all users as IQueryable<User>; then you can map them to get all user phones and run a refined query on that. So you don't even need a PhoneRepository in this case. Besides, I would rather use a class with extension methods for IQueryable that I can use anywhere, not just from a repository class, if I wanted to abstract queries behind methods (a sketch follows).
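For instance, a hypothetical extension method along those lines (UserQueries and AllPhones are illustrative names):

public static class UserQueries
{
    // Composable over any IQueryable<User>, whether it comes from a
    // repository or straight from the context.
    public static IQueryable<Phone> AllPhones(this IQueryable<User> users)
    {
        return users.SelectMany(u => u.Phones);
    }
}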
One caveat: to be able to delete Phone entities using only the domain object, and not a Phone repository, you need to make sure the UserId is part of the Phone primary key; in other words, the primary key of a Phone record is a composite key made up of UserId and some other property (I suggest an auto-generated identity) in the Phone entity. This makes sense intuitively, as the Phone record is "owned" by the User record, and its removal from the User navigation collection equals its complete removal from the database.
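A sketch of that key mapping with EF code-first (assuming a hypothetical auto-generated PhoneId column alongside UserId):

protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
    // Composite key: a Phone is identified by its owning User plus its own id.
    modelBuilder.Entity<Phone>()
        .HasKey(p => new { p.UserId, p.PhoneId });

    // PhoneId remains database-generated within that composite key.
    modelBuilder.Entity<Phone>()
        .Property(p => p.PhoneId)
        .HasDatabaseGeneratedOption(DatabaseGeneratedOption.Identity);
}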

Linq-To-SQL Table Class Design

I'm using Linq-To-SQL for a project with around 75 tables. We have to keep a cache of entire tables that we pull down, because the entities are all interrelated and pulling them on demand takes far too long. So, to track all of the entities from all of these tables, we have a single class responsible for maintaining in-memory table references. This Cache object has a property for each of the 75 table references, and each reference caches its table on demand. For example:
private EntityTableReference _reference;

public EntityTableReference EntityTableReference
{
    get
    {
        // Caches all entities from the table on first access
        return _reference ?? (_reference = new EntityTableReference(this));
    }
}
Now, I've seen a lot of guides saying that this really goes against the principles of OO. The Cache object doesn't do anything; it just provides a common object to pass around, so that we can pass a single reference to the Cache object in our function calls rather than a reference to every table the function needs to access. This has been working really well for us, and I don't see any downsides in terms of maintainability, readability, speed, etc.
Are there any criticisms against this sort of design decision? Is this a case where breaking the rules is OK because we've evaluated the advantages and disadvantages, or am I missing something here and digging myself into a hole?
One concern I can see is support for concurrency. If a lot of processes/threads are accessing this object, the read/write operations might end up becoming a bottleneck.
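Related to that: the null-coalescing lazy initialization shown above is not thread-safe; two threads can race and construct two EntityTableReference instances. A sketch of a thread-safe variant using Lazy<T> (available from .NET 4):

private readonly Lazy<EntityTableReference> _reference;

public Cache()
{
    // Lazy<T> is thread-safe by default: exactly one thread runs the factory,
    // and every caller observes the same cached instance.
    _reference = new Lazy<EntityTableReference>(() => new EntityTableReference(this));
}

public EntityTableReference EntityTableReference
{
    get { return _reference.Value; }
}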

Linq-To-Sql with WCF, Models, and POCO ViewModels Disconnected "DataContext" Timestamp/Rowversion

I have a Linq-To-Sql based repository class which I have been successfully using. I am adding some functionality to the solution, which will provide WCF based access to the database.
I have not exposed the generated Linq classes as DataContracts; instead, I've created my own "ViewModel" POCO for each entity I am going to return.
My question is: in order to do updates and take advantage of some of the Linq-To-Sql features like cyclic references from within my service, do I need to add a RowVersion/Timestamp field to each table in my database so I can use code like dc.Table.Attach(myDisconnectedObject)? The alternative seems ugly:
var updateModel = dc.Table.SingleOrDefault(t => t.ID == myDisconnectedObject.ID);
updateModel.PropertyA = myDisconnectedObject.PropertyA;
updateModel.PropertyB = myDisconnectedObject.PropertyB;
updateModel.PropertyC = myDisconnectedObject.PropertyC;
// and so on and so forth
dc.SubmitChanges();
I guess a RowVersion/Timestamp column on each table might be the best and least intrusive option - you basically check just that one value, and you're sure whether or not your data has been modified in the meantime. All other columns can be set to Update Check=Never. This takes care of the possible concurrency issues when updating your database from "returning" objects.
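With that in place, the disconnected update becomes roughly this (a sketch; MyDataContext stands in for your generated context):

using (MyDataContext dc = new MyDataContext())
{
    // With a RowVersion/timestamp column (and Update Check=Never elsewhere),
    // Linq to SQL derives the optimistic-concurrency check from that one value.
    dc.Table.Attach(myDisconnectedObject, true); // true = treat as modified
    dc.SubmitChanges();
}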
However, the other thing you should definitely check out is AutoMapper - it's a great little component that eases those left-right-assignment orgies you have to go through when using ViewModels / Data Transfer Objects, making the mapping between two object types a snap. It's well tested, widely used, and very stable - a winner!
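For example (a sketch using AutoMapper's MapperConfiguration API; Item/ItemViewModel are illustrative types):

// Configure once at startup; same-named properties map by convention.
var config = new MapperConfiguration(cfg => cfg.CreateMap<Item, ItemViewModel>());
var mapper = config.CreateMapper();

// Each left-right assignment block then collapses to a single call.
ItemViewModel vm = mapper.Map<ItemViewModel>(item);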
