Change tracking entities from multiple sources in Domain Driven Design

I am currently developing a rather big web application and am using domain driven design.
I have currently run into some trouble with tracking changes to my Product entity. The thing is, products are constructed partly from data in SQL Azure, partly from data in Azure Table Storage. If certain properties are changed, I will need to persist to both, other changes only to one.
As a result I cannot use NHibernate or Entity Framework for tracking changes. For instance, the Price argument of the
public void AddPrice(Price price)
method on the Product entity must be persisted to SQL Azure; calculations on a range of prices then take place, and the result is saved to Azure Table Storage.
How would you solve this?
Thoughts:
1) I thought about implementing my own change tracker based on Castle.DynamicProxy, but that seems rather tedious.
2) Implement events internally in the domain entities. This is not a good thing.

Scattering one entity across several persistent stores might not be a good idea. To be more precise, it might mean that it's not one and the same entity and could be split up in smaller, more accurately designed parts instead.
calculations on a range of prices will take place
Are you sure these calculations affect the Product entity and should be handled by the same NHibernate/EF session used in the Product repository? Since they have to be stored elsewhere, don't they make up a first-class notion in the ubiquitous language, resulting in a separate entity with persistence logic of its own?
See http://ayende.com/blog/153699/ask-ayende-repository-for-abstracting-multiple-data-sources

What do ORMs do? They take a copy of the data that's used to restore your object into its current state, just before they hand you a reference to the object. When behavior has been applied to the object and you're asking to persist it, the ORM will compare its copy of the data to the data currently inside the object and flush changes accordingly. Why not do the same? The only difference is that not all detected changes will be flushed to the same datastore.
HTH.
BTW, any concurrency going on here?
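The snapshot comparison described above can be sketched roughly as follows. This is only an illustration, not what any ORM actually does internally: the Store enum, the SnapshotTracker class and the property-to-store routing table are all hypothetical names, and the entity state is simplified to a dictionary of property values.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical store names; SQL Azure and Azure Table Storage would sit behind these.
public enum Store { Sql, Table }

public sealed class SnapshotTracker
{
    // Assumed mapping from property name to the store that owns it.
    private readonly IDictionary<string, Store> _routing;
    private Dictionary<string, object> _snapshot = new Dictionary<string, object>();

    public SnapshotTracker(IDictionary<string, Store> routing)
    {
        _routing = routing;
    }

    // Copy the state right after the entity has been loaded.
    public void TakeSnapshot(IDictionary<string, object> state)
    {
        _snapshot = new Dictionary<string, object>(state);
    }

    // Compare the current state with the snapshot and group
    // the names of changed properties by the store they belong to.
    public Dictionary<Store, List<string>> DetectChanges(IDictionary<string, object> current)
    {
        var result = new Dictionary<Store, List<string>>();
        foreach (var pair in current)
        {
            object old;
            bool existed = _snapshot.TryGetValue(pair.Key, out old);
            if (existed && Equals(old, pair.Value)) continue; // unchanged

            Store store;
            if (!_routing.TryGetValue(pair.Key, out store)) continue; // unmapped: ignored in this sketch

            List<string> names;
            if (!result.TryGetValue(store, out names))
            {
                names = new List<string>();
                result[store] = names;
            }
            names.Add(pair.Key);
        }
        return result;
    }
}
```

Each repository would then flush only the list for its own store. Note that the comparison relies on value equality, so mutable collections (a list of prices, say) would have to be copied into the snapshot rather than shared by reference.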

Related

Implementing object change tracking in an N-Tier WCF MVC application

Most of the examples I've seen online show object change tracking in a WinForms/WPF context. Or, if it's on the web, connected objects are used, so the changes made to each object can be tracked.
In my scenario, the objects are disconnected once they leave the data layer (mapped into business objects in WCF, and mapped into DTOs in the MVC application).
When the users make changes to the object in MVC (e.g., changing one field), how do I send that change from the View all the way down to the DB?
I would like to have an audit table that saves the changes made to a particular object. What I would like to save is the before and after values of an object, only for the properties that were modified.
I can think of a few ways to do this
1) Implement an IsDirty flag for each property for all Models in the MVC layer(or in the javascript?). Propagate that information all the way back down to the service layer, and finally the data layer.
2) Having this change tracking mechanism within the service layer would be great, but how would I then keep track of the "original" values after the modified values have been passed back from MVC?
3) Database triggers? But I'm not sure how to get started. Is this even possible?
Are there any known object change tracking implementations out there for an n-tier mvc-wcf solution?
Example of the audit table:
Audit table
Id Object Property OldValue NewValue
--------------------------------------------------------------------------------------
1 Customer Name Bob Joe
2 Customer Age 21 22

Possible solutions to this problem will depend in large part on what changes you allow in the database while the user is editing the data.
In other words, once it "leaves" the database, is it locked exclusively for the user or can other users or processes update it in the meantime?
For example, if the user can get the data and sit on it for a couple of hours or days, but the database continues to allow updates to the data, then you really want to track the changes the user has made to the version currently in the database, not the changes that the user made to the data they are viewing.
The way that we handle this scenario is to start a transaction, read the entire existing object, and then use reflection to compare the old and new values, logging the changes into an audit log. This gets a little complex when dealing with nested records, but is well worth the time spent to implement.
If, on the other hand, no other users or processes are allowed to alter the data, then you have a couple of different options that vary in complexity, data storage, and impact to existing data structures.
For example, you could modify each property in each of your classes to record when it has changed and keep a running tally of these changes in the class (obviously a base class implementation helps substantially here).
However, depending on the point at which you capture the user's changes (every time they update the field in the form, for example), this could generate a substantial amount of non-useful log information because you probably only want to know what changed from the database perspective, not from the UI perspective.
You could also deep clone the object and pass that around the layers. Then, when it is time to determine what has changed, you can again use reflection. However, depending on the size of your business objects, this approach can impose a hefty performance penalty since a complete copy has to be moved over the wire and retained with the original record.
You could also implement the same approach as the "updates allowed while editing" approach. This, in my mind, is the cleanest solution because the original data doesn't have to travel with the edited data, there is no possibility of tampering with the original data and it supports numerous clients without having to support the change tracking in the UI level.
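A minimal sketch of the reflection-based comparison mentioned in this answer, producing rows shaped like the audit table in the question. The AuditEntry, AuditDiff and Customer types are hypothetical, and nested records are deliberately not handled.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical business object, used only to exercise the sketch.
public sealed class Customer
{
    public string Name { get; set; }
    public int Age { get; set; }
}

// One row of the audit table from the question.
public sealed class AuditEntry
{
    public string ObjectName { get; set; }
    public string Property { get; set; }
    public string OldValue { get; set; }
    public string NewValue { get; set; }
}

public static class AuditDiff
{
    // Compare every public readable instance property of two objects
    // of the same type and return one entry per differing value.
    public static List<AuditEntry> Compare<T>(T original, T modified)
    {
        var entries = new List<AuditEntry>();
        PropertyInfo[] props = typeof(T).GetProperties(BindingFlags.Public | BindingFlags.Instance);
        foreach (PropertyInfo prop in props)
        {
            // Skip write-only properties and indexers.
            if (!prop.CanRead || prop.GetIndexParameters().Length > 0) continue;

            object oldValue = prop.GetValue(original, null);
            object newValue = prop.GetValue(modified, null);
            if (Equals(oldValue, newValue)) continue;

            entries.Add(new AuditEntry
            {
                ObjectName = typeof(T).Name,
                Property = prop.Name,
                OldValue = Convert.ToString(oldValue),
                NewValue = Convert.ToString(newValue)
            });
        }
        return entries;
    }
}
```

Here "original" would be the object freshly read from the database inside the transaction, and "modified" the object that came back from the UI, so the original values never have to travel through the tiers.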

There are two parts to your question:
How to do it in MVC:
The usual way: you send the changes back to the server, a controller handles them, etc. etc..
There is nothing unusual in your use case that mandates a change in the way MVC usually works.
For your use case it is better to encode the changes as individual change operations, not as a modified object where you need to use reflection to find out what changes, if any, the user made.
How to do it on the database:
This is probably your intended question:
First of all, stay away from ORM frameworks; life is too complex as it is.
On the last step of the save operation you should have the following information:
The objects and fields that need to change and their new values.
You need to keep track of the following information:
What the last change was to the object you intend to modify in the database.
This can be obtained from the Audit table and needs to be saved in a Session (or Session like object).
Then you need to do the following in a transaction:
Obtain the last change to the object(s) being modified from the database.
If the objects have changed, abort and inform the user of the collision.
If not, obtain the current values of the fields being changed.
Save the new values.
Update the Audit table.
I would use a stored procedure for this to make the process less chatty, and for greater separation of concerns between the database code and the application code.
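The transactional steps listed above can be sketched in memory as follows. This is not the stored procedure the answer recommends: a lock stands in for the database transaction, the store holds a single object, and the AuditedStore class and its members are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public sealed class AuditedStore
{
    private readonly object _gate = new object(); // stands in for a DB transaction
    private readonly Dictionary<string, string> _fields = new Dictionary<string, string>();
    private readonly List<string> _audit = new List<string>();

    // Id of the last change; the caller keeps this in its session when it reads the object.
    public int LastChangeId
    {
        get { lock (_gate) return _audit.Count; }
    }

    // Returns false when the object changed since the caller last read it.
    public bool TrySave(int lastSeenChangeId, IDictionary<string, string> newValues)
    {
        lock (_gate)
        {
            // 1. Obtain the last change and abort on a collision.
            if (_audit.Count != lastSeenChangeId) return false;

            foreach (var pair in newValues)
            {
                // 2. Obtain the current value of the field being changed.
                string oldValue;
                _fields.TryGetValue(pair.Key, out oldValue);

                // 3. Save the new value.
                _fields[pair.Key] = pair.Value;

                // 4. Update the audit trail.
                _audit.Add(pair.Key + ": " + oldValue + " -> " + pair.Value);
            }
            return true;
        }
    }
}
```

A caller that read the object when LastChangeId was 0 can save once; a second caller still holding 0 gets false back and has to reload before retrying.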

How to design a Data Access Layer for a database table that may change in the future?

Introduction:
I'm refactoring (pretty much rewriting) a legacy application in my current internship. The part that this question will be concerned about is the database it uses and the way they retrieve data from it.
The database structure is:
There's a table that has the main records. Let's say each record is a measurement. It has some info about the measured material and different measurement information.
There's a table view they use that has the same information columns, plus some extra columns that contain data calculated from the given measurements. It also filters out some of the data from the table.
So let's say we have the main table with columns:
Measurement ID
Measurement A
Measurement B
The view has something like this:
Measurement ID
Measurement A
Measurement B
Some extra data (for example Measurement A * Measurement B)
The guy who is leading the development only knows some SQL, so he likes adding new columns that are calculated from columns in the main table for experimenting. And this is definitely a need at the moment.
Requirements are:
Different types of databases should be supported (like SQL Server, Oracle, and probably some others).
The frontend should be able to show the view, which means even though some main columns will always stay the same, there may be some new columns including newly calculated values.
My question is:
What kind of system should I use to accommodate the needs of this application? I wanted to use Entity Framework, but the fact that the view may have new columns in the future is I think a problem. As far as I understand, I should map my classes to the database before compiling.
The other thing that I'm considering is maybe using Entity Framework to get data from the main table and do the calculations and the filtering that is currently done in the table view directly in the frontend, and skip the view altogether. Which sounds fine, though I don't know if they will allow me to do that.
What would you do in my case? Please take into account that I have virtually no experience with databases and ORMs.

You are correct in that using Entity Framework will be a problem if the underlying DB schema is always changing. It will require you to update the EF model on your end every time to grab those new columns.
Ideally, all of your database access is hidden behind the interface to your DAL, so that your application doesn't need to know about which ORM is being used -- if any -- or which database it's connecting to.
I hate to say it, but given your requirements, an ORM might not make sense. You might want to go with something more generic without any strong-typing. You could just simply always return a DataTable to your application layer, and it could loop through the columns and values to display whatever is returned. If there are fields you know will never change, you could create a manual mapping for those fields only into your application object(s).
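As a rough illustration of the untyped approach, the sketch below walks whatever columns a DataTable happens to contain, so a new calculated column in the view shows up without recompiling. DataTableRenderer and ToLines are hypothetical names; a real front end would bind the table to a grid rather than build strings.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class DataTableRenderer
{
    // Returns one header line followed by one line per row,
    // discovering the columns at run time.
    public static List<string> ToLines(DataTable table)
    {
        var lines = new List<string>();
        List<string> names = table.Columns
            .Cast<DataColumn>()
            .Select(c => c.ColumnName)
            .ToList();

        lines.Add(string.Join(" | ", names));
        foreach (DataRow row in table.Rows)
        {
            lines.Add(string.Join(" | ", names.Select(n => Convert.ToString(row[n]))));
        }
        return lines;
    }
}
```

The columns that are known never to change could still be mapped by hand into a typed object, as the answer suggests, while everything else is displayed generically.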

You may want to have a look at NoSQL systems, which are a lot more flexible about the schema, or at a document database like RavenDB. All these systems allow the schema to change dynamically. You need to weigh the pros and cons to see if one can meet your requirements.
(This answer is a bit off topic, as it's about replacing the SQL server and not really about creating a DAL, but other answers cover the subject well and I would like to propose another way that may help.)

If your schema is unstable, then using Entity Framework as a beginner is going to be a headache. The assumption is that you can just refresh the design canvas periodically to let the tool handle database table changes. You can try that for a time to see when it becomes too much of a pain, but without any prior experience using ORMs or Entity Framework it may not be worth the effort.
I would probably use something like Rob Conery's Massive ORM (https://github.com/robconery/massive). It gives you more flexibility with the underlying database schema and is a very small library. I remember it being ~300 lines of code and very easy to use. It uses C# dynamics so you'll have to be using >= C# 4.0 and be comfortable with that one concept but IMO it's worth it for the low-overhead. A full-fledged ORM like Entity Framework or NHibernate is going to cost a lot of learning cycles.
You could, of course, just stick to ADO.NET DataTables. They're a bit ugly and verbose, but they'll do the job.

You can use Entity Framework - Database First if the DB is changing. Of course, you will have to regenerate your classes when you want to be able to access new columns, when the DB schema changes.
If you need to accommodate different database servers, then you should look into implementing a repository pattern and abstracting all your data access that way.

Your comment
it involves write operations to the main table but the main table never changes
confirms what I was hoping for. It means you can use Entity Framework as the core of your application and a different route to display data.
Suppose that for display (of the view) you use a classic DataTable (because all common grids support them, contrary to displaying dynamic objects). I don't know how create/update/delete will be done, but saving changes will at some point involve mapping a DataRow to a MainEntity object. You can write one method for that like
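MainEntity DataRowToEntity(DataRow row)
{
    var entity = new MainEntity();
    // The DataRow indexer returns object, so a cast to PropertyA's type is needed here.
    entity.PropertyA = row["PropertyA"];
    ....
    return entity;
}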
The MainEntity can be attached to a context, its status changed to Modified, and saved.

DataSet or Entity Data Model

Please excuse the noob question as I am new to integrating data with my applications. I've tried to find answers on the net, but not there yet.
I have an application I'm developing in C# on VS2010 which requires data in/out from a database. I am trying to figure out if it's a DataSet or an Entity Data Model I need to use when setting up a data source. My understanding was that it was the EDM which allowed me to treat tables/fields in a database as objects, but somehow it looks like I can do that with a DataSet too.
Some sources explain that a DataSet makes a cached copy of the Database which can then be manipulated.
Essentially my question is which should I use and what are the (dis)advantages of one over the other.

You have several options open to you when it comes to storing and retrieving data to/from a database:
At the very simplest level, use ADO.NET to open a connection to the DB, create a command and execute it. If you expect results back (i.e. SELECT ...) then you could call the command's ExecuteReader(...). Working in this manner results in very quick execution and the minimum of overhead, but you have to do more of the heavy lifting. If your app is simple, this is probably a good way to go. If your app is, or is likely to be more complex, you may want to consider other options...
ADO.NET DataSets are a reasonable DB IO mechanism, particularly for reading data from a DB. However, they can be a little cumbersome when trying to update the DB.
You could use an Object-Relational Mapper (ORM) like nHibernate or Entity Framework, but, frankly, that often results in your learning curve increasing dramatically while you figure out how to plug together the moving parts and make them work well together.
You might also consider a new variant of Entity Framework called Code First (CF): This allows you to pretty much design your code and CF will generate your EDM and handle the majority of the DB operations required for you to build your system. Scott Hanselman wrote up a nice intro into EF CF.
Having used practically every DB API and ORM on Windows over the last 20+ years, I am delighted with how CF is shaping up! EF 4.3, which shipped just a couple of weeks ago, includes some key improvements to CF, including migrations, which allow you to handle changes to your DB schema as it evolves. I've built 3-4 systems using EF CF over the last couple of months and am very happy - it's my favorite relational database IO mechanism at present.
If you want to really get into EF CF, I strongly recommend Julia Lerman's book EF CF - it's a short, nicely written, very useful guide that should take you no more than a day or two to work through the main sections of.
Hope this helps.

If you add a LocalDB data source to your project (because you want a small local database file) then when the Data Source Configuration Wizard pops up, it explicitly asks you whether you want to use a Dataset or Entity Data Model database model. Is this the situation you were facing? That was the problem I had that brought me to this entry.
There is no question that for an enterprise class application, or a website, you would want to investigate ADO.NET or an ORM, but it doesn't help answer this question, which has to do with what are the differences between choosing Dataset vs Entity Data Model in the wizard.
Essentially, Entity Data Model is the more recent technology. If you are unfamiliar with Dataset, then this is probably not the time to start using it.

If you're asking what are the pros and cons for ADO.NET (DataSet) vs EntityFramework (Entity Data Model) then there is a discussion that may help at ADO.NET Entity Framework or ADO.NET
EF will get you up and running pretty quickly, but in my (very limited) experience it's been a pain to maintain.
What is it that has determined that these are your only two options? There are far more available to you including many ORMs.

If your application supports a business application, then queries get complex pretty soon. In such a scenario, stored procedures save a lot of time, are much easier to maintain, and work better with ADO.NET. In almost all scenarios, I would suggest using stored procedures and ADO.NET. Move as much of the business rules and logic to stored procedures as you can...much easier to maintain this way.
Use Datasets (datatables) only to retrieve and read data. Any data that needs to be saved to database should be directly manipulated in the database ... no point doing it in dataset and then saving the same. In a multi-user environment it is almost always better to save the changes to database as soon as the user has clicked "save".
You may (should) use business objects within the application for business-logic processes.
Let us take a simple example of where you are saving a Contact (name, phone, email, address etc) and then retrieving a list of contacts added today...I would suggest you do it as follows:
1) Adding the contact - Client (web or otherwise) collects data --> data is saved in a Contact business object --> validate Contact object --> Call repository layer to save Contact object (adding a repository layer is useful but not-necessary to keep the data layer abstract from the client) --> Repository calls the data layer to save the contact object (here a simple ADO.NET call, using Command object, can be made to call the stored procedure to save the contact in database). No dataset was used in this use case.
2) Retrieving the list of contacts -- Client calls the repository layer to get the list of contacts --> repository layer calls the data layer to retrieve the data --> here the list of data is retrieved as a dataset (datatable) --> return the datatable to the client and let the client read the data directly from the datatable while rendering it. Even a single contact can be retrieved as a dataset.
P.S: ORM is almost always an overkill. It is almost always used because certain developers like to keep everything object-oriented...so an extra layer gets added even though it does nothing useful (IMHO).

But what if you have business logic (stored procedures) which can be used in many different applications?
So it depends: either you build your application for different users with different backend storage, or you build many applications for users who don't change their backend storage very often.
It is very important to keep database integrity and rules independent of the application (in-house or outsourced).

Abstracting away database specific id:s with the repository pattern?

I'm learning DDD (domain driven design) and the repository pattern (in C#). I would like to be able to use the repository pattern to persist an entity and not care which database is actually used (Oracle, MySQL, MongoDB, RavenDB, etc.). I am, however, not sure how to handle the database-specific ids that most (all?) databases use. RavenDB, for example, requires that each entity it stores has an id property of type string. Others may require an id property of type int. Since this is handled differently by different databases, I cannot make the database id a part of the entity class. But it would have to exist at some point, at least when I store the actual entity. My question is: what is the best practice regarding this?
The idea I am currently pursuing is to implement, for each database I want to support, database-specific "value objects" for each business object type. These value objects would then have the database-specific id property, and I would map between the two upon reads and writes. Does this seem like a good idea?

This is the classic case of leaking abstractions. You can't possibly abstract away the type of database under a repository interface unless you want to lose all the good things that come with each database. The requirements on ID type (string, Guid or whatever) are only the very tip of a huge iceberg, with the majority of its mass under the muddy waters.
Think about transaction handling, concurrency and other stuff. I understand your point about persistence ignorance. It's a good thing for sure to not depend on specific database technology in the domain model. But you also can't get rid of any dependency on any persistence technology.
It's relatively easy to make your domain model work well with any RDBMS. Most of them have standardized data types. Using an ORM like NHibernate will help you a lot. It's much harder to achieve the same among NoSQL databases because they tend to differ a lot (which is actually very good).
So my advice would be to do some research on the set of possible persistence technologies you will have to deal with, and then choose an appropriate level of abstraction for the persistence subsystem.
If this won't work for you, think about Event Sourcing. The event store is one of the least demanding persistence techniques. Using a library such as Jonathan Oliver's EventStore will allow you to use virtually any storage technology, including the file system.

I would go ahead and create an int Id field in the entity and then convert that to a string in the repository where the Id must be a string. I think the effort to abstract your persistence is very worthwhile and actually eases maintenance.

You are doing the right thing! Abstract yourself away from the constraints of the database's primary key types!
Don't try to translate types, just use a different field.
Specifically: Do not try to use the database's primary key, except in your data access logic. If you need a friendly ID for an object, just create an additional field, of whatever type you like, and require your database to store that. Only in your data access layer would you need to find & update the DB record(s) based on your object's friendly ID. Easy.
Then your constraint on which databases can persist your objects changes from 'must be able to have a primary key of type xxxx' to simply 'must be able to store type xxxx'. I think you'll then find you can use any database in the world. Happy coding! DDD is the best!
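A small sketch of the "friendly ID as an ordinary field" idea described in this answer. All type names are hypothetical, the Guid identity type is an assumption, and an in-memory list stands in for a real database whose native primary key (an int here) never leaves the data access code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Domain entity: carries only its own identity, no database key.
public sealed class Product
{
    public Guid Id { get; private set; }
    public string Name { get; set; }

    public Product(Guid id, string name)
    {
        Id = id;
        Name = name;
    }
}

public interface IProductRepository
{
    void Save(Product product);
    Product Find(Guid id);
}

// Stand-in for a store with its own int primary key.
public sealed class IntKeyedProductRepository : IProductRepository
{
    private sealed class Row
    {
        public int PrimaryKey;   // database-specific, private to this class
        public Guid DomainId;    // the friendly ID, stored like any other field
        public string Name;
    }

    private readonly List<Row> _rows = new List<Row>();
    private int _nextKey = 1;

    public void Save(Product product)
    {
        Row row = _rows.FirstOrDefault(r => r.DomainId == product.Id);
        if (row == null)
        {
            // Insert: the store assigns its own key.
            row = new Row { PrimaryKey = _nextKey++, DomainId = product.Id };
            _rows.Add(row);
        }
        row.Name = product.Name;
    }

    public Product Find(Guid id)
    {
        Row row = _rows.FirstOrDefault(r => r.DomainId == id);
        return row == null ? null : new Product(row.DomainId, row.Name);
    }
}
```

A repository for a document store with string keys would implement the same interface and keep its string id equally private, so the entity class stays identical across databases.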

You can potentially have the ids in the entity but not expose them as part of the entity's public interface. This is possible with NHibernate because it allows you to map a table column to a private field.
So you can potentially have something like
class Customer {
    private readonly Int32? _relationalId;
    // String is a reference type and can already be null, so no '?' is needed.
    private readonly String _documentId;
    ...
}
This is not ideal because your persistence logic 'bleeds' into the business logic, but given the requirements it is probably easier and more robust than maintaining a mapping between an entity and its id somewhere outside the entity. I would also highly recommend that you evaluate a "database agnostic" approach, which would be more realistic if you only want to support relational databases. In that case you can at least reuse an ORM like NHibernate for your repository implementation, and most relational databases support the same id types. In your scenario you need not only an ORM but also something like an "Object-Document-Mapper". I can see that you will have to write tons and tons of infrastructure code. I highly recommend that you reevaluate your requirements and choose between relational and document databases. Read this: Pros/cons of document-based databases vs. relational databases

Strategies for replacing legacy data layer with Entity framework and POCO classes

We are using .net C# 4.0, VS 2010, EF 4.1 and legacy code in this project we are working on.
I'm working on a WinForms project where I have decided to start using Entity Framework 4.1 for accessing an MS SQL database. The code base is quite old and we have an existing data layer that uses data adapters. These data adapters are used all over the place (in web apps and WinForms apps). My plan is to replace the old db access code with EF over time and get rid of the tight coupling between the UI layers and the data layer.
So my idea is to more or less combine EF with the legacy data access layer and slowly replace the legacy data layer with a more modern take on things using EF. So for now we need to use both EF and the legacy db access code.
What I have done so far is to add a project containing the edmx file and context. The edmx is generated using the database-first approach. I have also added another project that contains the POCO classes (by using the ADO.NET POCO Entity Generator). I have more or less followed Julia Lerman's approach in her book "Programming Entity Framework" on how to split the model and the generated POCO classes. The database model has been set for years and it's not an option to change the tables and the relationships, triggers, stored procedures, etc., so I'm basically stuck with the db model as it is.
I have read about the repository pattern and unit of work and I kind of like the patterns, but I struggle to implement them when I have both EF and the legacy db access code to deal with, especially when I don't have the time to replace all of the legacy db access code with a pure EF implementation. In a perfect world I would start all over again with a fresh take on the data model, but that is not an option here.
Are the repository and unit of work patterns the way to go here? In order to use the POCO classes in my business layer, I sometimes need to use both EF and the legacy db code to populate my POCO classes. In other words, I can sometimes use EF to retrieve part of the data I need and then use the old db access layer to retrieve the rest of the data, and then map the data to my POCO classes. When I want to update some data I need to pick data from the POCO classes and use the legacy data access code to store the data in the database. So I need to map the data retrieved from the legacy data access layer to my POCO classes when I want to display the data in the UI, and vice versa when I want to save data to the database.
To complicate things we store some data in tables that we don't know the name of before runtime (Please don't ask me why:-) ). So in the old db access layer, we had to create sql statements on the fly where we inserted the table and column names based on information from other tables.
I also find that the relationships between the POCO classes are somewhat too database-centric. In other words, I feel that I need a more simplified domain model to work with. Perhaps I should create a domain model that fits the bill and then use the POCO classes as "DAOs" to populate the domain model classes?
How would you implement this using the Repository pattern and Unit of Work pattern? (if that is the way to go)

Alarm bells are ringing for me! We tried to do something similar a while ago (only with NHibernate, not EF4). We had several problems running ADO.NET alongside an ORM - database concurrency being a big one.
The database model has been set for years and it's not an option to change the table and the relationships, triggers, stored procedures, etc, so I'm basically stuck with the db model as it is.
Yep. Same thing! The problem was that our stored procs contained a lot of business logic and weren't simple CRUD procs so keeping the ORM updated with the various updates performed by a stored procedure was not easy at all - Single Responsibility Principle - not a good one to break!
My plan is to replace the old db access code with EF over time and get rid of the tight coupling between UI layers and data layer.
Maybe you could decouple without the need for an ORM - how about putting a service/facade layer in front of your UI layer to coordinate all interactions with the underlying domain and hide it from the UI?
If your database is 'king' and your app is highly data driven I think you will always be fighting an uphill battle implementing the patterns you mention.
Embrace ado.net for this project - use EF4 and DDD patterns on your next green field proj :)

EDMX + POCO class generator results in EFv4 code, not EFv4.1 code, but you don't have to bother with these details. EFv4.1 just offers a different API which does exactly the same thing (and it is only a wrapper around the EFv4 API).
Depending on how you use datasets, you can run into some very hard problems. Datasets are a representation of the change set pattern. They know what changes were made to the data and they are able to store just these changes. EF entities know this only if they are attached to the context which loaded them from the database. Once you work with detached entities you must make a big effort to tell EF what has changed - especially when modifying relations (detached entities are a common scenario in web applications and web services). For those purposes EF offers another template called Self-tracking entities, but they have other problems and limitations (for example, lazy loading is missing, you cannot apply changes when an entity with the same key is attached to the context, etc.).
EF also doesn't support several features used in datasets - for example unique keys and batch updates. It's funny that newer MS APIs usually solve some pains of previous APIs but at the same time provide far fewer features than the previous APIs, which introduces new pains.
Another problem can be performance - EF is slower than direct data access with datasets and has higher memory consumption (and yes, there are some memory leaks reported).
You can forget about using EF for accessing tables which you don't know at design time. EF doesn't allow any dynamic behavior. Table names and the type of database server are fixed in the mapping. Another problem can be the way you use triggers - ORM tools don't like triggers, and EF has limited features when working with database-computed values (a value can be filled either in the database or in the application, not both).
Filling POCOs from EF + datasets does not sound like something that will be possible when using only EF. EF has some allowed mapping patterns, but the possibilities for mapping several tables to a single POCO class are extremely limited and constrained (if you want these tables to be editable). If you just mean loading one entity from EF and another entity from a data adapter and making a reference between them, you should be OK - in this scenario a repository sounds like a reasonable pattern because the purpose of the repository is exactly this: to load or persist data. Unit of work can also be useful because you will most probably want to reuse a single database connection between EF and the data adapters to avoid a distributed transaction when saving changes. The UoW will be the place responsible for handling this connection.
EF mapping is tied to the database design - you can introduce some object-oriented modifications, but EF is still closely dependent on the database. If you want to use a more advanced domain model you will probably need separate domain classes filled from EF and datasets. Again, it will be the responsibility of the repository to hide these details.

From what we have implemented so far, I have learned the following things.
POCO and Self Tracking objects are difficult to deal with: if you do not have a clear understanding of what goes on inside, you will see a number of unexpected behaviors in code that worked well in your previous project.
Changing patterns is not easy. So far we had been managing simple CRUD without the unit of work and identity map patterns. Now a lot of the legacy code that we wrote in the past does not take these new patterns into account, and its logic will not work correctly.
In our previous code, we were simply using transactions and single insert/update/delete statements that were sent directly to the database, assuming transactions on the server side would take care of all operations.
In such conditions, we were dealing directly with IDs all the time; newly generated IDs were immediately available after a single insert statement. This is not the case with EF.
In EF, we are not dealing with IDs, we are dealing with navigation properties, which is a huge change from earlier ADO.NET programming methods.
From our experience, simply replacing the earlier data access code with EF results in chaos. But EF + RIA Services offer you a completely new solution where you will probably get everything you need, and your UI will bind to it very easily. So if you are thinking about a complete rewrite using UI + RIA Services + EF, then it is worth it, because a lot of the dependency on query management goes away automatically. You will be focusing only on business logic. But this is a big decision, and the number of man-hours required for a complete rewrite or for just swapping in EF is almost the same.
So we went the UI + RIA Services + EF way, and we started replacing one module at a time. Mostly, EF will easily co-exist with your existing infrastructure, so there is no harm.
