Implementing object change tracking in an N-Tier WCF MVC application - c#

Most of the examples I've seen online show object change tracking in a WinForms/WPF context. Or, if it's on the web, connected objects are used, so the changes made to each object can be tracked.
In my scenario, the objects are disconnected once they leave the data layer (mapped into business objects in WCF, and then into DTOs in the MVC application).
When the users make changes to the object on MVC (e.g., changing 1 field property), how do I send that change from the View, all the way down to the DB?
I would like to have an audit table that saves the changes made to a particular object. What I would like to save are the before and after values of an object, only for the properties that were modified.
I can think of a few ways to do this:
1) Implement an IsDirty flag for each property for all models in the MVC layer (or in the JavaScript?). Propagate that information all the way back down to the service layer, and finally the data layer.
2) Having this change tracking mechanism within the service layer would be great, but how would I then keep track of the "original" values after the modified values have been passed back from MVC?
3) Database triggers? But I'm not sure how to get started. Is this even possible?
Are there any known object change tracking implementations out there for an n-tier mvc-wcf solution?
Example of the audit table:
Audit table
Id   Object     Property   OldValue   NewValue
-----------------------------------------------
1    Customer   Name       Bob        Joe
2    Customer   Age        21         22

Possible solutions to this problem will depend in large part on what changes you allow in the database while the user is editing the data.
In other words, once it "leaves" the database, is it locked exclusively for the user, or can other users or processes update it in the meantime?
For example, if the user can get the data and sit on it for a couple of hours or days, but the database continues to allow updates to the data, then you really want to track the changes the user has made to the version currently in the database, not the changes that the user made to the data they are viewing.
The way that we handle this scenario is to start a transaction, read the entire existing object, and then use reflection to compare the old and new values, logging the changes into an audit log. This gets a little complex when dealing with nested records, but is well worth the time spent to implement.
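A minimal sketch of that reflection comparison, assuming a simple AuditEntry shape like the table above (the type and property names here are illustrative, not a definitive implementation):

using System;
using System.Collections.Generic;

public class AuditEntry
{
    public string ObjectName { get; set; }
    public string Property { get; set; }
    public string OldValue { get; set; }
    public string NewValue { get; set; }
}

public static class AuditHelper
{
    // Compares the freshly re-read database copy against the incoming edited
    // copy and returns one audit entry per changed scalar property.
    public static List<AuditEntry> Diff<T>(T original, T modified)
    {
        var changes = new List<AuditEntry>();
        foreach (var prop in typeof(T).GetProperties())
        {
            if (!prop.CanRead) continue;

            object oldValue = prop.GetValue(original, null);
            object newValue = prop.GetValue(modified, null);

            if (!Equals(oldValue, newValue))
            {
                changes.Add(new AuditEntry
                {
                    ObjectName = typeof(T).Name,
                    Property = prop.Name,
                    OldValue = oldValue?.ToString(),
                    NewValue = newValue?.ToString()
                });
            }
        }
        return changes;
    }
}

Nested records would need a recursive walk over the reference-typed properties, which is where most of the extra effort goes.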
If, on the other hand, no other users or processes are allowed to alter the data, then you have a couple of different options that vary in complexity, data storage, and impact to existing data structures.
For example, you could modify each property in each of your classes to record when it has changed and keep a running tally of these changes in the class (obviously a base class implementation helps substantially here).
However, depending on the point at which you capture the user's changes (every time they update the field in the form, for example), this could generate a substantial amount of non-useful log information because you probably only want to know what changed from the database perspective, not from the UI perspective.
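Here is a rough sketch of such a base class, assuming you only need a running before/after tally per property (names like TrackedObject are illustrative):

using System.Collections.Generic;
using System.Runtime.CompilerServices;

public abstract class TrackedObject
{
    private readonly Dictionary<string, (object Old, object New)> _changes
        = new Dictionary<string, (object Old, object New)>();

    public bool IsDirty => _changes.Count > 0;
    public IReadOnlyDictionary<string, (object Old, object New)> Changes => _changes;

    protected void SetProperty<T>(ref T field, T value, [CallerMemberName] string name = null)
    {
        if (EqualityComparer<T>.Default.Equals(field, value)) return;

        // Keep the very first "old" value so repeated UI edits collapse into
        // a single before/after pair per property.
        object original = _changes.TryGetValue(name, out var existing) ? existing.Old : field;
        _changes[name] = (original, value);
        field = value;
    }
}

public class Customer : TrackedObject
{
    private string _name;
    public string Name { get => _name; set => SetProperty(ref _name, value); }
}

Keeping only the first old value is one way to limit the "noise" problem mentioned above, since intermediate UI edits are folded away.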
You could also deep clone the object and pass that around the layers. Then, when it is time to determine what has changed, you can again use reflection. However, depending on the size of your business objects, this approach can impose a hefty performance penalty since a complete copy has to be moved over the wire and retained with the original record.
You could also implement the same approach as the "updates allowed while editing" approach. This, in my mind, is the cleanest solution because the original data doesn't have to travel with the edited data, there is no possibility of tampering with the original data and it supports numerous clients without having to support the change tracking in the UI level.

There are two parts to your question:
How to do it in MVC:
The usual way: you send the changes back to the server, a controller handles them, etc.
There is nothing unusual in your use case that mandates a change in the way MVC usually works.
For your scenario, it is better for the changes to be encoded as individual change operations, not as a modified object where you need to use reflection to find out what changes, if any, the user made.
How to do it on the database:
This is probably your intended question:
First of all, stay away from ORM frameworks; life is too complex as it is.
On the last step of the save operation you should have the following information:
The objects and fields that need to change and their new values.
You need to keep track of the following information:
What the last change was to the object you intend to modify in the database.
This can be obtained from the Audit table and needs to be saved in a Session (or Session like object).
Then you need to do the following in a transaction:
Obtain the last change to the object(s) being modified from the database.
If the objects have changed, abort and inform the user of the collision.
If not, obtain the current values of the fields being changed.
Save the new values.
Update the Audit table.
I would use a stored procedure for this to make the process less chatty, and for greater separation of concerns between the database code and the application code.
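For completeness, a sketch of that final save step in plain ADO.NET (the same logic could live in the stored procedure); the table, column and variable names are assumptions for illustration only:

using System;
using System.Data.SqlClient;

public static class CustomerSaver
{
    // auditIdFromSession is the id of the last audit row that was current
    // when the user started editing (kept in Session, as described above).
    public static void SaveNameChange(string connectionString, int customerId,
                                      string newName, int? auditIdFromSession)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();
            using (var tx = conn.BeginTransaction())
            {
                // 1. Obtain the last change to the object from the Audit table.
                var checkCmd = new SqlCommand(
                    "SELECT MAX(Id) FROM Audit WHERE Object = 'Customer' AND ObjectId = @id",
                    conn, tx);
                checkCmd.Parameters.AddWithValue("@id", customerId);
                var latestAuditId = checkCmd.ExecuteScalar() as int?;

                // 2. If the object has changed since the user loaded it, abort.
                if (latestAuditId != auditIdFromSession)
                    throw new InvalidOperationException("The record was changed by someone else.");

                // 3. Obtain the current value of the field being changed.
                var readCmd = new SqlCommand(
                    "SELECT Name FROM Customer WHERE Id = @id", conn, tx);
                readCmd.Parameters.AddWithValue("@id", customerId);
                var oldName = (string)readCmd.ExecuteScalar();

                // 4. Save the new value.
                var updateCmd = new SqlCommand(
                    "UPDATE Customer SET Name = @name WHERE Id = @id", conn, tx);
                updateCmd.Parameters.AddWithValue("@name", newName);
                updateCmd.Parameters.AddWithValue("@id", customerId);
                updateCmd.ExecuteNonQuery();

                // 5. Update the Audit table.
                var auditCmd = new SqlCommand(
                    "INSERT INTO Audit (Object, ObjectId, Property, OldValue, NewValue) " +
                    "VALUES ('Customer', @id, 'Name', @old, @new)", conn, tx);
                auditCmd.Parameters.AddWithValue("@id", customerId);
                auditCmd.Parameters.AddWithValue("@old", oldName);
                auditCmd.Parameters.AddWithValue("@new", newName);
                auditCmd.ExecuteNonQuery();

                tx.Commit();
            }
        }
    }
}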

Related

What is the best way to ensure there is only a single ORM model instance tied to a row in a database?

The Problem
We have an app that stores hierarchical data in a database. We have defined a POCO object which represents a row of data.
The problem is we need certain properties to be dependent on the item's children and others on their ancestors. As an example, if a ((great)grand)child has incomplete state, then implicitly all of its parents are also incomplete. Similarly, if a parent has a status of disabled, then all children should be implicitly disabled as well.
On the database side of things, everything works thanks to triggers. However, the issue we're having is syncing those changes to any in-memory ORM objects that may have been affected.
That's why we're thinking that, to do all of this, we need to ensure there is only ever one model instance in memory for any specific row in the database. That's the crux of the entire problem.
We're currently doing that with triggers in the DB, and one giant hash-set of weak references to the objects keyed on the database's ID for the in-memory ORM objects, but we're not sure that's the proper way to go.
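In case it helps to see the shape of it, here is a stripped-down sketch of such an identity map (the names and key type are illustrative, not our actual code); a real version would also need locking around the dictionary:

using System;
using System.Collections.Generic;

public class IdentityMap<T> where T : class
{
    private readonly Dictionary<long, WeakReference<T>> _map
        = new Dictionary<long, WeakReference<T>>();

    // Returns the single in-memory instance for a given row id, calling the
    // supplied loader only when no live instance exists.
    public T GetOrLoad(long id, Func<long, T> loadFromDb)
    {
        if (_map.TryGetValue(id, out var weak) && weak.TryGetTarget(out var existing))
            return existing;

        var loaded = loadFromDb(id);
        _map[id] = new WeakReference<T>(loaded);
        return loaded;
    }

    // Called periodically to drop entries whose targets have been collected.
    public void Prune()
    {
        var dead = new List<long>();
        foreach (var pair in _map)
            if (!pair.Value.TryGetTarget(out _))
                dead.Add(pair.Key);
        foreach (var id in dead)
            _map.Remove(id);
    }
}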
Initial Design
Our 'rookie' design started by loading all objects from the database, which quickly blew out the memory, not to mention took a lot of time loading data that may never actually be displayed in the UI, as the user may never navigate to it.
Attempt 2
Our next attempt expanded on the former by dynamically loading only the levels needed for actual display in the UI, which greatly sped up loading, but now doesn't allow the state of the hierarchy to be polled without several calls to the database.
Attempt 2B
Similar to above, but we added persistent 'implicit status' fields which were updated via triggers in the database. That way, if a parent was disabled, a trigger updated all children accordingly. Then the model objects simply refreshed themselves with the latest values from the database. This has the downside of putting some business logic in the model layer and some in the database triggers, as well as requiring both database writes and reads for every operation.
Fully Dynamic
This time we tried to make our models 'dumb' and removed our business layer completely from the code, moving that logic entirely to the database. That way there was only single-ownership of the business rules. Plus, this guaranteed bad data couldn't be inserted into the database in the first place. However, here too we needed to constantly poll the database for the 'current' values, meaning some logic did have to be built in to know which objects needed to be refreshed.
Fully Dynamic with Metadata
Similar to above, but all write calls to the database returned an update token that told the models if they had to refresh any loaded parents or children.
I'm hoping to get some feedback from the SO community on how to solve this issue.

Loading / lazy loading of related entities

Scenario:
I have a (major) design problem. I have DTO classes to fill data from DB and use in the UI. The scenario I have is that:
I have a HouseObject which has TenantObjects (one to many), with each tenant having AccountObjects (one to many again), and so on (example scenario only).
Problem:
Now my issue is: while retrieving data from the DB for a HouseObject, should I get the list of all TenantObjects and, in turn, the list of all AccountObjects, and so on? Because of the one-to-many relationships, for one HouseObject we are potentially retrieving a huge amount of data for Tenants, Accounts, and so on.
Should we retrieve just the HouseObject and fire off individual dependent queries per dependency, or should we get all the data at once in a single call and bind it on screen? Which is the desired solution?
Please advise.
If you're looking for performance, and I think that is what you're looking for, you have to think in broad terms, not just lazy/not lazy. You have to consider how much data you have, how often it gets updated, where it is stored, where the application runs, how it is utilized, etc.
I see a few scenarios:
1. Lazy load of small chunks. I like this one because it is useful when your data is modified often. You have fresh data all the time.
2. Caching in the application layer. There are two sub-scenarios here:
   a. Caching ready models. You prepare your models, fully initialized.
   b. Caching separate segments of data (houses, tenants, accounts) and querying them in memory.
3. Prepare your denormalized, joined data and store it in a materialized (Oracle) or indexed (SQL Server) view. You will still have a trip to the DB, but it will be more efficient than joining the data or making multiple calls each time.
4. A combination of #1 and #2.b, which I like the most (see the sketch below). It requires more coding but gets you the best results in performance and concurrency. Even if your data is mutating, you can have a mechanism to dump the cache. And if your Account changes, you don't need to dump the Tenant.
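A rough sketch of that combination, lazy loading per house and caching each loaded segment (the interface and model names are assumptions for illustration):

using System.Collections.Generic;

public class Tenant
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public interface ITenantStore
{
    IList<Tenant> LoadTenantsForHouse(int houseId);   // one small, targeted query
}

public class TenantCache
{
    private readonly ITenantStore _store;
    private readonly Dictionary<int, IList<Tenant>> _byHouse
        = new Dictionary<int, IList<Tenant>>();

    public TenantCache(ITenantStore store) => _store = store;

    public IList<Tenant> GetTenants(int houseId)
    {
        if (!_byHouse.TryGetValue(houseId, out var tenants))
        {
            tenants = _store.LoadTenantsForHouse(houseId);
            _byHouse[houseId] = tenants;
        }
        return tenants;
    }

    // If an Account changes, this cache is untouched; if a Tenant changes,
    // only that house's segment is dumped.
    public void Invalidate(int houseId) => _byHouse.Remove(houseId);
}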
And one more thing: if you need to update your data, remember to use different models. Your display models should be view models only, while you should have a separate model for your save. For example, your save model may have a field updatedBy while your view model doesn't.
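Something as simple as this illustrates the split (the field names are just examples):

public class HouseViewModel
{
    public int Id { get; set; }
    public string Address { get; set; }
    public int TenantCount { get; set; }   // derived, display only
}

public class HouseSaveModel
{
    public int Id { get; set; }
    public string Address { get; set; }
    public string UpdatedBy { get; set; }  // exists only on the save path
}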
You really don't have a "major" problem. It is a normal, daily problem for developers. You need to think of all aspects of your system.

Why would I use Entity Framework in a mobile situation?

I want to save edited values from a WPF mobile app, via a Web API, as the user tabs out of each field. So on the LostFocus event.
When using EF then the whole entity graph is posted (put) to the Web API each time a field is updated. Even if I just make a DTO for the basic fields on the form, I would still be posting unnecessary data each time.
I was thinking of forgetting about EF in the Web API and simply posting the entity ID, field name and new value. Then in the controller, create my own SQL update statement and use good old ADO.Net to update the database.
This sounds like going back to the noughties or even the nineties, but is there any reason why I should not do that?
I have read this post which makes me lean towards my proposed solution.
Thanks for any comments or advice.
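To make the idea concrete, the controller side of what I'm proposing might look roughly like this (the names are illustrative; the whitelist is there because a column name can't be passed as a SQL parameter):

using System;
using System.Collections.Generic;
using System.Data.SqlClient;

public class FieldUpdate
{
    public int Id { get; set; }
    public string Field { get; set; }
    public string Value { get; set; }
}

public class CustomerFieldUpdater
{
    private static readonly HashSet<string> AllowedColumns =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "Name", "Email", "Phone" };

    private readonly string _connectionString;
    public CustomerFieldUpdater(string connectionString) => _connectionString = connectionString;

    public void Apply(FieldUpdate update)
    {
        // reject anything that isn't a known, updatable column
        if (!AllowedColumns.Contains(update.Field))
            throw new ArgumentException("Unknown field: " + update.Field);

        using (var conn = new SqlConnection(_connectionString))
        using (var cmd = new SqlCommand(
            $"UPDATE Customer SET [{update.Field}] = @value WHERE Id = @id", conn))
        {
            cmd.Parameters.AddWithValue("@value", update.Value);
            cmd.Parameters.AddWithValue("@id", update.Id);
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}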
Sounds like you are trying to move away from having a RESTful Web API and towards something a little more RPC-ish. Which is fine, as long as you are happy that the extra hassle of implementing this is worth it in terms of bandwidth saved.
In terms of tech level, you're not regressing by doing what you proposed; I use EF every day, but I still need to use plain old ADO.NET now and then, and there is a reason why it's still well supported in the CLR. So there is no reason not to, as long as you are comfortable with writing SQL, etc.
However, I'd advise against your current proposal for a couple of reasons:
Bandwidth isn't necessarily all that precious
Even for mobile devices, sending 20 or 30 fields back at a time probably isn't a lot of data. Of course, only you can know for your specific scenario if that's too much, but considering the widespread availability of 3G and 4G networks, I wouldn't see this as a concern unless those fields contain huge amounts of data - of course, it's your use case so you know best :)
Concurrency
Unless the form is actually a representation of several discrete objects which can be updated independently, then by sending back individual changes every time you update a field, you run the risk of ending up with invalid state on the device.
Consider for example if User A and User B are both looking at the same object on their devices. This object has 3 fields A, B, C thus:
A - "FOO"
B - "42"
C - "12345"
Now suppose User A changes field "A" to "BAR" and tabs out of the field, and then User B changes field "C" to "67890" and tabs.
Your back-end now has this state for the object:
A - "BAR"
B - "42"
C - "67890"
However, User A and User B now both have an incorrect state for the Object!
It gets worse if you also have a facility to re-send the entire object from either client because if User A re-sends the entire form (for whatever reason) User B's changes will be lost without any warning!
Typically this is why the RESTful mechanism of exchanging full state works so well; you send the entire object back to the server, and get to decide based on that full state, if it should override the latest version, or return an error, or return some state that prompts the user to manually merge changes, etc.
In other words, it allows you to handle conflicts meaningfully. Entity Framework, for example, will give you concurrency checking for free just by including a specially typed column; you can handle a concurrency exception to decide what to do.
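For reference, this is roughly all it takes in Entity Framework (EF 6-style attributes; the entity itself is just an example):

using System.ComponentModel.DataAnnotations;
using System.Data.Entity;
using System.Data.Entity.Infrastructure;

public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }

    [Timestamp]
    public byte[] RowVersion { get; set; }   // the "specially typed" concurrency column
}

public class ShopContext : DbContext
{
    public DbSet<Customer> Customers { get; set; }
}

public static class CustomerService
{
    public static void Rename(int id, string newName)
    {
        using (var db = new ShopContext())
        {
            var customer = db.Customers.Find(id);
            customer.Name = newName;
            try
            {
                db.SaveChanges();   // UPDATE ... WHERE Id = @id AND RowVersion = @original
            }
            catch (DbUpdateConcurrencyException)
            {
                // someone else saved first: reload, warn the user, merge, etc.
            }
        }
    }
}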
Now, if it's the case that the form is comprised of several distinct entities that can be independently updated, you have more of a task-based scenario so you can model your solution accordingly - by all means send a single Model to the client representing all the properties of all of the individual entities on the form, but have separate POST back models, and a handler for each.
For example, if the form shows Customer Master data and their corresponding Address record, you can send the client a single model to populate the form, but only send the Customer Master model when a Customer Master field changes, and only the Address model when an address field changes, etc. This way you can have your cake and eat it because you have a smaller POST payload and you can manage concurrency.
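The shape of that split might look something like this (Web API 2-style controller; the model and action names are assumptions):

using System.Web.Http;

public class CustomerMasterPostModel
{
    public int CustomerId { get; set; }
    public string Name { get; set; }
    public string Email { get; set; }
}

public class AddressPostModel
{
    public int AddressId { get; set; }
    public string Street { get; set; }
    public string City { get; set; }
}

public class CustomerController : ApiController
{
    [HttpPut]
    public IHttpActionResult UpdateMaster(CustomerMasterPostModel model)
    {
        // persist (and concurrency-check) only the customer master fields
        return Ok();
    }

    [HttpPut]
    public IHttpActionResult UpdateAddress(AddressPostModel model)
    {
        // persist (and concurrency-check) only the address fields
        return Ok();
    }
}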

Implement list of objects to be deleted in database

I have a form with a few tabs, and in each tab a grid control. When the user selects a row to be deleted, I want to remove it from the grid, and if the object exists in the database, remove it too - but not permanently, only if and when the user clicks Save on the form.
For now, if the object doesn't exist in the DB, I remove it from the list, and if the object exists in the DB, I delete it from the DB and remove it from the list. But if the user clicks the Cancel button, he expects the row(s) not to be deleted from the database.
I have two possible solutions in mind: 1) remove the object from the list, and if the object exists in the DB, add it to a list of objects to be deleted; 2) implement another list whose getter returns only objects with state != ToBeDeleted (performance?).
Note: I'm not using an ORM tool; I'm working with my own ADO.NET-based data access framework.
I think the case you are describing pretty much asks for a transaction.
ADO.NET handles them easily, provided you are using a reasonable database engine (so no SQL Server CE, for example :))
See, for example, the TransactionScope class. You construct such an object before interacting with the database, and the changes will be committed if and only if you call Complete(). If you just leave it alone or Dispose() it, the transaction will be cancelled and all changes on the DB will be rolled back, i.e. reverted.
So, in your case, you could open the transaction in the Form's constructor or OnLoaded(), call Complete() at "save", and Dispose() at any other window closing.
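A sketch of that pattern (the same shape applies whether the scope is opened at form load or, as discussed below, only around the Save; ApplyPendingChanges is a placeholder):

using System.Transactions;

public class SaveHandler
{
    public void Save()
    {
        using (var scope = new TransactionScope())
        {
            // any SqlConnection opened inside the scope enlists in the
            // ambient transaction automatically
            ApplyPendingChanges();

            scope.Complete();   // commit: reached only if nothing threw
        }
        // if Complete() was never called, Dispose() rolls everything back
    }

    private void ApplyPendingChanges()
    {
        // placeholder for the actual ADO.NET commands
    }
}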
While this is the normal way of handling such things for small systems, especially single-user ones, be careful: if your system has to handle many concurrent users, you may not be able to use it this way. The transaction blocks rows and tables until it is completed or cancelled, and therefore "other users" may see large delays.
So, how many users do you have to support and how often they will try to edit the same things?
-- edit: (10 users)
With that many users, you will want to avoid long-running transactions. Opening a transaction at form load will be unacceptable and will lock many users out until the one current user closes the window. But using a transaction at Save() that pushes all the changes in one batch is OK.
Of course, if you can eliminate transactions altogether - that's great! But it is a very hard thing to do if you also need to preserve data integrity. To eliminate the need for transactions, you almost always have to redesign both the data structure on the DB side and the way you obtain and work with the data. If you want to redesign both, then I'd really recommend first trying to redesign around some existing data-access framework, as even basic ADO.NET has really nice features for online editing of SqlClient-compliant databases.
So, assuming you don't want to rewrite/rethink most of your code, you just need to buffer the data and also, delay all of the actual operations on the database.
You may want to do it in a "simple" form: when you display your form, instead of binding your Form directly to the database-driven datasources - download all required data to some BindingList<>s, DataTables, etc - whatever container you like. And bind your form to them instead. Probably you have something like that already set up. But, the important thing is that all those datacontainers must be offline or at least readonly+delayloaded.
Next, you've got to intercept all operations that the user performs on the UI. Surely you have it done already, as I'm assuming the application works:) As your Forms are bound to that offline cached items, your application should perform the operation on that cached data, and don't touch the database at all. But there's more: along with performing them on cached data, you should record what happens to which table.
Then, when the user finally stops playing around and presses CANCEL :) - you just trash everything and close the form. The database is not changed.
On Save, you open a fresh transaction, iterate over the list of changes and effectively replay your recorded changes on the database, then commit the transaction.
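A bare-bones sketch of that record-replay idea, with each UI action recorded as a pending operation and replayed inside one short transaction at Save (the names are illustrative):

using System;
using System.Collections.Generic;
using System.Data.SqlClient;
using System.Transactions;

public class ChangeRecorder
{
    private readonly List<Action<SqlConnection>> _pending = new List<Action<SqlConnection>>();

    // Called when the user deletes a row in the grid; the table name comes
    // from the application, never from user input.
    public void RecordDelete(string table, int id)
    {
        _pending.Add(conn =>
        {
            using (var cmd = new SqlCommand($"DELETE FROM [{table}] WHERE Id = @id", conn))
            {
                cmd.Parameters.AddWithValue("@id", id);
                cmd.ExecuteNonQuery();
            }
        });
    }

    public void Clear() => _pending.Clear();   // Cancel: throw everything away

    public void Replay(string connectionString)  // Save: one short transaction
    {
        using (var scope = new TransactionScope())
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();
            foreach (var op in _pending)
                op(conn);
            scope.Complete();
        }
        _pending.Clear();
    }
}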
Please note two things, though: the database could have changed between the time the user cached the data and the time he pressed Save. You have to detect this and abort, or resolve the conflicts. You should do that inside that transaction, either during or before executing the recorded changes. You may detect it by simply comparing the online data with the offline cached data (the unchanged original values, not those modified by the user), or you may use some other mechanism like optimistic locking and just compare the version tags on the rows.
If you don't like record-replay, you may implement a "DIFF"ing utility that takes the modified offline data and compares it in a generic way with the current-online tables. This is somewhat harder, but has a bonus: with such utility, you can initially doubly-cache the data: one copy for offline reference (just stored and never touched by the user) and one copy for offline editing (all those bound to the Forms). Now, upon Save you open transaction and diff the reference data against the online database. If there are any difference - you've just detected a collision. Solve/merge/abort/etc. If no differences, then you diff the modified data against online-data, and apply all differences found to the database and commit transaction.
Either of those methods has its pros and cons: aside from the difficulty of implementation, there are memory issues with caching, latency issues if you copy overly large tables, etc.
But - once solved, it would work pretty nice.
And as you finish, you can go and boast that you have just implemented a smaller sister of DataSet+DataTable. I'm not joking, and I'm not laughing at you. I'm just trying to show you why everyone is telling you to revise your DAO layer and try understanding and using the hard work that was already done for you by the platform designers/developers :)
Anyway, I've said you can avoid the clashes and transactions altogether if you rethink your data structure. For example: why do you DELETE the rows at all? I know there's a nifty DELETE statement in SQL, but do you really need to delete that row? Can't you just add a 'bool isDeleted' column, and when the user deletes a row from the grid, just set that cell to true and make the application filter out any isDeleted=true rows and not show them? And not include them in views and aggregations? Bonus: sys/db admins now have a magic tool: undelete.
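A tiny sketch of what that looks like with plain SqlClient (the table and column names are illustrative):

using System.Data.SqlClient;

public static class SoftDelete
{
    public static void MarkDeleted(SqlConnection conn, int id)
    {
        using (var cmd = new SqlCommand(
            "UPDATE Product SET IsDeleted = 1 WHERE Id = @id", conn))
        {
            cmd.Parameters.AddWithValue("@id", id);
            cmd.ExecuteNonQuery();
        }
    }

    public static SqlCommand SelectVisible(SqlConnection conn)
    {
        // every list/aggregate query simply adds the same filter
        return new SqlCommand("SELECT Id, Name FROM Product WHERE IsDeleted = 0", conn);
    }
}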
Let's take it further: do you need to UPDATE the rows? Maybe you can just APPEND some information that from (this date) the row should have a new price? Of course, the structure must be greatly altered: entities don't have properties, but rather logs of timestamped property changes (or the rows must have version numbers and be duplicated), queries must be run against only the newest-version data, etc. Pros: the database is now append-only. Transactions, if needed at all, are hyper-short. Cons: SELECT queries are complicated and may be slow, especially when joining many tables.
Pro/Con: and your db actually starts looking very meta- instead of data-base...
Con: it is a really hard task to "upgrade" an existing application to such a DB structure. Writing a new app from scratch and importing data from the old system may be a few times faster.
Now, to summarise:
I do not recommend any of the ways described.
First, I recommend you take some ORM framework like NHibernate, Entity Framework, or XPO from DevExpress, or whatever else. Any of them will save you lots of time. The three I list here even have optimistic-locking collision detection built in. Why use a self-written SQL framework when such tools exist?
If not, then next I recommend using the existing tools found in the framework. You use SqlClient, so why don't you use DataSet and DataTable? They are provided along with SqlClient and have many useful mechanisms built in, which you would otherwise spend weeks implementing and testing all by yourself. Learn to use DataSets, their collision detection, and their merging algorithms, and use them. You will lose a bit of time experimenting and learning, but you will save huge amounts of time by not reinventing the wheel.
If you really want to do it manually, start with data-caching and record-replay. It is easy to comprehend, it is quite easy to introduce anywhere where you currently use plain SQL queries, and will quickly introduce you to all kinds of cache-syncing and version-checking problems, and you will soon learn in details why all those strange mechanisms in the above-mentioned frameworks were implemented, how they work and what pros/cons they have.
And about the doubly-cached diffing approach: it may look more tempting than writing the record-replay, but please use it only if you know very well how to detect/solve/merge collisions. Have at least one record-replay approach implemented before you try it.
And of course you may use long-lasting transactions. They are dead easy to introduce, and they "just irritate" the users... well, or even make the system unusable when >90% of the users constantly collide and hit the locks. No, that was a joke. Don't use long-lasting transactions. They are OK for 1-4 users, or for very sparse databases.

Change tracking entities from multiple sources in Domain Driven Design

I am currently in the process of developing a rather big web application and am using domain-driven design.
I have currently run into some trouble with tracking changes to my Product entity. The thing is, products are constructed partly from data in SQL Azure, partly from data in Azure Table Storage. If certain properties are changed, I will need to persist to both, other changes only to one.
As a result, I cannot use NHibernate or Entity Framework for tracking changes. For instance, the Price argument of the
public void AddPrice(Price price)
method on the Product entity must be persisted to SQL Azure, calculations on a range of prices will take place and the result will be saved to Azure Table Storage.
How would you solve this?
Thoughts:
1) I thought about implementing my own change tracker based on Castle.DynamicProxy, but that seems rather tedious.
2) Implement events internally in the domain entities. This is not a good thing.
Scattering one entity across several persistent stores might not be a good idea. To be more precise, it might mean that it's not one and the same entity and could be split up in smaller, more accurately designed parts instead.
calculations on a range of prices will take place
Are you sure these calculations affect the Product entity and should be handled by the same NHibernate/EF session used in the Product repository? Since they have to be stored elsewhere, don't they make up a first-class notion in the ubiquitous language, resulting in a separate entity with persistence logic of its own?
See http://ayende.com/blog/153699/ask-ayende-repository-for-abstracting-multiple-data-sources
What do ORMs do? They take a copy of the data that's used to restore your object into its current state, just before they hand you a reference to the object. When behavior has been applied to the object and you're asking to persist it, the ORM will compare its copy of the data to the data currently inside the object and flush changes accordingly. Why not do the same? The only difference is that not all detected changes will be flushed to the same datastore.
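A rough sketch of that "copy on load, compare on save" idea, with each detected change routed to the store responsible for it (all names here are illustrative, not the poster's actual types):

using System.Collections.Generic;
using System.Linq;

public interface ISqlAzureWriter { void Write(string property, object value); }
public interface ITableStorageWriter { void Write(string property, object value); }

public class ProductSnapshotTracker
{
    // Properties whose changes must go to SQL Azure; everything else is
    // assumed to belong to Table Storage in this sketch.
    private static readonly HashSet<string> SqlAzureProperties =
        new HashSet<string> { "Name", "Price" };

    private readonly Dictionary<string, object> _snapshot;

    public ProductSnapshotTracker(Dictionary<string, object> loadedValues)
    {
        // copy of the data used to restore the object, taken before the
        // caller gets a reference to it
        _snapshot = new Dictionary<string, object>(loadedValues);
    }

    public void Save(Dictionary<string, object> currentValues,
                     ISqlAzureWriter sqlWriter, ITableStorageWriter tableWriter)
    {
        var changed = currentValues
            .Where(kv => !_snapshot.TryGetValue(kv.Key, out var previous) || !Equals(previous, kv.Value))
            .ToList();

        foreach (var kv in changed)
        {
            if (SqlAzureProperties.Contains(kv.Key))
                sqlWriter.Write(kv.Key, kv.Value);      // e.g. the Price itself
            else
                tableWriter.Write(kv.Key, kv.Value);    // e.g. the derived calculations
        }
    }
}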
HTH.
BTW, any concurrency going on here?
