Synchronizing Entire Databases using Microsoft Sync Framework 2.1


I need the ability to sync multiple remote databases, upload and download, with my main database.
However, I need to sync the entire database, and the database schema will be updated constantly. I didn't see any way to grab the entire database schema without adding each individual table to the SyncScope.
This is problematic because that scope will always be changing. I solved the initial problem by removing the existing scope and adding a new one, but I still cannot find a simple solution that doesn't involve querying system tables, parsing the results, and passing them (for 150+ tables) back to my SyncScope.
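Enumerating the tables is less work than it sounds. Below is a minimal C# sketch of the filtering step only. It assumes the names come from a query such as `SELECT TABLE_SCHEMA, TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_TYPE = 'BASE TABLE'`, and it assumes Sync Framework's default metadata naming (per-table `_tracking` tables plus `scope_info`, `scope_config`, `schema_info`, `scope_templates` and `scope_parameters`); check those names against your own provisioned database. `ScopeTableSelector` is a hypothetical helper. With the SQL Server provider, each returned name would then typically be passed to `SqlSyncDescriptionBuilder.GetDescriptionForTable` and added to the scope's `DbSyncScopeDescription.Tables` before provisioning with `SqlSyncScopeProvisioning`; those calls are left out so the sketch runs without the Sync Framework assemblies.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: picks the user tables that should go into a rebuilt scope.
public static class ScopeTableSelector
{
    // ASSUMED default Sync Framework metadata table names; adjust if you
    // provisioned with an object prefix or schema.
    private static readonly HashSet<string> MetadataTables =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        { "scope_info", "scope_config", "schema_info", "scope_templates", "scope_parameters" };

    // rows: (schema, table) pairs read from INFORMATION_SCHEMA.TABLES.
    // Returns schema-qualified names, sorted, with sync metadata tables removed.
    public static List<string> SelectTables(IEnumerable<(string Schema, string Table)> rows)
    {
        return rows
            .Where(r => !MetadataTables.Contains(r.Table))
            .Where(r => !r.Table.EndsWith("_tracking", StringComparison.OrdinalIgnoreCase))
            .Select(r => $"{r.Schema}.{r.Table}")
            .OrderBy(name => name, StringComparer.OrdinalIgnoreCase)
            .ToList();
    }
}
```

Because the list is rebuilt from the catalog each time, tables added by a schema change are picked up when the scope is dropped and re-provisioned. A user table whose name happens to end in `_tracking` would be skipped too, so the suffix rule may need tightening.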
The reasons I originally looked at Sync Framework are:
I need to be able to control the direction of the sync (upload/download) when a sync is triggered programmatically from C# on a button click.
I need the ability to enable that button based on the unit's network connectivity.
There are additional tasks that need to be done on a sync download, such as changing the connection strings of the mobile units and storing information about their connection and unit in the database.
There are additional tasks that need to run on a sync upload, such as verifying data against customer business rules through my OR/M, archiving the data to network storage, restarting the application, and changing connection strings again.
Eventually, I need partial data sets, chosen by the customer at run time, at the object level, in an OR/M framework. These objects may map to one or more tables that I won't know about at design time, or that may not even exist at design time.
Does anyone know if another framework encompasses all my requirements, or if there is a simpler way to do this in the sync framework?

For this task, especially with a changing schema, you could consider Merge Replication instead of the Sync framework.

Related

Desktop application which can work offline when no connectivity with SQL Server

I am designing a WPF desktop application and using Entity Framework Code First to create and use a SQL Server database. My database will be hosted on one server machine and will be running 24/7.
I want to provide a feature where you can modify data offline (when you have no connectivity with the SQL Server DB) and save it somehow. Whenever the application finds a connection to SQL Server again, all changes should be moved to the SQL Server DB.
Is there any way to achieve this by using Entity Framework ?
I want to emphasize that I am using Entity Framework. Is this type of functionality already implemented by EF, or do I have to do it manually, e.g. write the changes to the file system and merge them into the DB later?

You could figure out the specific exceptions that are generated when the SQL Server connection is lost, and embed your calls in try-catch blocks. If the server is offline, then in your catch block, pass the entity to a method that serializes the entity to JSON and saves it to the hard drive in a special directory or something. On your next successful query, check that directory to see if there are any saved entities that need to be saved.
Be specific with your catches - you don't want unrelated exceptions to trigger this code.
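A minimal sketch of that park-and-replay idea, assuming the entity type is JSON-serializable with `System.Text.Json` and that the caller supplies the real save call plus a predicate that matches only connectivity failures (for example `SqlException`s carrying the error numbers you observe when the server is unreachable). `OfflineQueue<T>` and its members are hypothetical names.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text.Json;

// Hypothetical helper: tries the real save and, only if the failure is a
// connectivity problem, parks the entity as a JSON file to be replayed later.
public class OfflineQueue<T>
{
    private readonly string _dir;
    private readonly Action<T> _save;
    private readonly Func<Exception, bool> _isConnectivityFailure;

    public OfflineQueue(string dir, Action<T> save, Func<Exception, bool> isConnectivityFailure)
    {
        _dir = dir;
        _save = save;
        _isConnectivityFailure = isConnectivityFailure;
        Directory.CreateDirectory(dir);
    }

    // Returns true if the entity reached the server, false if it was parked on disk.
    // Exceptions that are not connectivity failures propagate unchanged.
    public bool SaveOrPark(T entity)
    {
        try
        {
            _save(entity);
            return true;
        }
        catch (Exception ex) when (_isConnectivityFailure(ex))
        {
            // Tick-based prefix keeps files roughly in the order they were parked.
            string name = $"{DateTime.UtcNow.Ticks:D19}-{Guid.NewGuid():N}.json";
            File.WriteAllText(Path.Combine(_dir, name), JsonSerializer.Serialize(entity));
            return false;
        }
    }

    // Call after the next successful query. Replays parked entities oldest first,
    // deletes each file only after its save succeeded, and stops at the first
    // connectivity failure so the remaining files stay queued.
    public int ReplayPending()
    {
        int replayed = 0;
        foreach (string path in Directory.GetFiles(_dir, "*.json").OrderBy(p => p, StringComparer.Ordinal))
        {
            T entity = JsonSerializer.Deserialize<T>(File.ReadAllText(path))!;
            try
            {
                _save(entity);
            }
            catch (Exception ex) when (_isConnectivityFailure(ex))
            {
                break;
            }
            File.Delete(path);
            replayed++;
        }
        return replayed;
    }
}
```

The `when` filter is what keeps unrelated exceptions from being swallowed. The sketch does nothing about edits other users made to the same rows in the meantime.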
Some things to keep in mind - what if somebody else changed the data in the meantime? Are you intending to overwrite those changes? How did you get the data which needs to be saved in the first place if you are offline?

As long as you have all the data loaded into the DbContext/ObjectContext, you're free to amend it any way you want. The connection is only really needed when SaveChanges() is invoked.
However, if you're going to load everything into the context, you seem to be reimplementing DataSet functionality, which additionally allows XML serialization/deserialization of the changes, so the changes can even be saved between sessions.
Not as trendy as EF, though :)
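For reference, the DataSet feature mentioned above looks roughly like this: a DiffGram keeps both the current and the original value of each changed row, so pending edits survive a round trip through a file. This is a minimal sketch; the table and column names are made up, and a `MemoryStream` stands in for the file you would write between sessions.

```csharp
using System;
using System.Data;
using System.IO;

public static class DiffGramDemo
{
    // One-table DataSet whose single row was edited after AcceptChanges(),
    // i.e. a pending, unsaved modification (RowState = Modified).
    public static DataSet BuildEditedDataSet()
    {
        var ds = new DataSet("Local");
        DataTable items = ds.Tables.Add("Items");
        DataColumn id = items.Columns.Add("Id", typeof(int));
        items.Columns.Add("Name", typeof(string));
        items.PrimaryKey = new[] { id };
        items.Rows.Add(1, "original");
        ds.AcceptChanges();               // state as loaded from the server
        items.Rows[0]["Name"] = "edited"; // offline edit
        return ds;
    }

    // Writes current and original values as a DiffGram, then reads them back into
    // an empty DataSet with the same schema (a DiffGram needs the schema up front).
    public static DataSet RoundTrip(DataSet source)
    {
        using var buffer = new MemoryStream();
        source.WriteXml(buffer, XmlWriteMode.DiffGram);
        buffer.Position = 0;
        DataSet restored = source.Clone(); // copies schema only, no rows
        restored.ReadXml(buffer, XmlReadMode.DiffGram);
        return restored;
    }
}
```

After reloading, the row is still marked as modified with its original value intact, which is what a data adapter needs to push the change later.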

While I have never tried this with SQL-based data, I have done it in the past with filesystem-based data, and it's a major can of worms.
First, you have to have some means of indicating what data needs to be stored locally so that it will be available when you're offline. This will need to be updated either all the time or before you head out, and that can involve a lot of data transfer.
Second, once you're back online there's a lot of conflict resolution that must be done. If there's a realistic chance that someone else might have changed the data while you were out you need some way of detecting the conflict and prompting the user as to what to do in that situation. This almost certainly requires a system that keeps a detailed edit trail on every unit of data that could reasonably be updated.
In my situation I was very fortunate in that it was virtually certain that if the remote user edited file [x], overwriting the system copy was the right thing to do. Remote users would only carry the files that pertained to their projects, so conflicts should never happen. Thus the writeback was simply based on timestamps, nothing more. Data that people in the field would not normally need to modify was handled by not even looking at it: modified files were simply copied from the system to the laptop.
This leaves the middle step: saving the pending writes. I disagree with Elemental Pete's answer in this regard. Simply serializing them and saving the result does not work, because what happens when you read that data back in again? You see the old copy, not the changed copy!
My approach to this was a local store of all relevant data that was accessed exactly like the main system data was, all reads and writes worked normally.
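A minimal sketch of such a local store, assuming string keys and values for brevity (`LocalStore` and its members are hypothetical). Reads go through the same object as writes, so a pending edit is what you see when you read the item back, and the pending set doubles as the list of things to push once you are online again.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical local store: reads and writes work the same whether online or not.
public class LocalStore
{
    private readonly Dictionary<string, string> _synced = new();  // last copy pulled from the server
    private readonly Dictionary<string, string> _pending = new(); // writes not yet pushed

    public void LoadFromServer(IDictionary<string, string> snapshot)
    {
        foreach (var kv in snapshot) _synced[kv.Key] = kv.Value;
    }

    // Reads see the newest local value: a pending write first, then the synced copy.
    public string? Get(string key) =>
        _pending.TryGetValue(key, out var p) ? p :
        _synced.TryGetValue(key, out var s) ? s : null;

    public void Put(string key, string value) => _pending[key] = value;

    public int PendingCount => _pending.Count;

    // When back online: push each pending write, then fold it into the synced copy.
    // If a push throws, nothing is cleared, so the whole set is retried next time.
    public void Flush(Action<string, string> pushToServer)
    {
        foreach (var kv in _pending)
        {
            pushToServer(kv.Key, kv.Value);
            _synced[kv.Key] = kv.Value;
        }
        _pending.Clear();
    }
}
```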
Something a lot fancier might be needed if you have data that needs transactions involved.
Note that we also hit a nasty human problem: the update process took several minutes (note: this was more than 10 years ago) just analyzing what needed to be done, not counting any actual copy time. As a result, people bypassed it when they thought they could. Sometimes they thought wrong, oops!

Best approach to incrementally update application data

I have been working on an application for a couple of years that I update using a back-end database. The whole key is that everything is cached on the client, so that it never requires a network connection to operate, but when it does have a connection it will always pick up the latest updates. Every application update is shipped with the latest version of the database, and I want it to download only the minimum amount of data when the database has been updated.
I currently use a table with a timestamp to check for updates. It looks something like this:
ID - Name - Description - Severity - LastUpdated
0 - test.exe - KnownVirus - Critical - 2009-09-11 13:38
1 - test2.exe - Firewall - None - 2009-09-12 14:38
This approach was fine for what I previously needed, but I am looking to extend more functions of the application to use this kind of dynamic approach. All the data is currently stored as XML, but I do not want to store complete XML files in the database; I want to transmit only changed data.
So how would you go about a fairly simple approach to storing dynamic content (text/xml/json/xaml) in a database, and have the client download only new updates? I was thinking of having logic that can handle XML inserted directly:
ID - Data - Revision
15 - XXX - 15
XXX would be something like <Content><File>Test.dll</File><Description>New DLL to load.</Description></Content> and would be inserted into the cache, but this would obviously be complicated, as I would need to load them in sequence.
Another approach that has been mentioned is to base it on something similar to source control, storing the version in the root of the file and calculating the delta to figure out the minimal amount of data that needs to be sent to the client.
Does anyone have suggestions on how to approach this with no risk of data corruption? I would also like to extend it with features that allow me to revert bad revisions and replace them with new working ones.

It really depends on the tools you are using and the architecture you already have. Is there already a server with some logic and a data access layer?
Dynamic approaches might get complicated, slow and limit the number of solutions. Why do you need a dynamic structure? Would it be feasible to just add data by using a name-value pair approach in a relational database? Static and uniform data structures are much easier to handle.
Before going into detail, you should consider the different scenarios.
Items can be added
Items can be changed
Items can be removed (I assume)
Adding is not a big problem. The client needs to remember the last revision number it got from the server, and you write a query that gets everything since then.
Changing is basically the same. You need to take care of how items are identified: you need an unchangeable surrogate key, which seems to be the ID you already have. (GUIDs may be useful here.)
Removing is tricky. You need to either flag items as deleted instead of actually removing them, or have a list of removed IDs with the revision number when they had been removed.
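A minimal in-memory sketch of those three cases on the server side, assuming a single global revision counter and tombstones for deletes (`Item` and `ChangeLog` are hypothetical names). In a relational database this would be a `Revision` column plus an `IsDeleted` flag, queried with `WHERE Revision > @clientRevision`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Every add/change/delete gets a new, higher revision; deletes stay as tombstones.
public record Item(Guid Id, string Data, long Revision, bool IsDeleted);

public class ChangeLog
{
    private readonly Dictionary<Guid, Item> _items = new();
    public long CurrentRevision { get; private set; }

    // Add or change: the item is stamped with the next revision.
    public void Upsert(Guid id, string data) =>
        _items[id] = new Item(id, data, ++CurrentRevision, false);

    // Remove: keep a tombstone so clients that already have the item learn about it.
    public void Delete(Guid id)
    {
        if (_items.TryGetValue(id, out var existing))
            _items[id] = existing with { Revision = ++CurrentRevision, IsDeleted = true };
    }

    // Everything the client has not seen yet: adds, changes and tombstones, in order.
    public List<Item> ChangesSince(long clientRevision) =>
        _items.Values.Where(i => i.Revision > clientRevision)
                     .OrderBy(i => i.Revision)
                     .ToList();
}
```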
Storing the data in the client: Consider using a relational database like SQLite in the client. (It doesn't need installation, it is just storing in a file. Firefox for instance stores quite a lot in SQLite databases.) When using the same in the server, you can probably reuse some code. It is also transaction based, which helps to keep it consistent (rollback in case of error during synchronization).
XML - if you really need it - can be stored just as a string in the database.
When using an abstraction layer or ORM that supports SQLite (e.g. NHibernate), you may also reuse some code even when the server uses a different database. Note that the learning curve for such an ORM might be rather steep. If you don't know anything like this, it could be too much.
You don't need to force reuse of code in the client and server.
Synchronization itself shouldn't be very complicated. You have a revision number in the client and a last revision in the server. You get all new / changed and deleted items since then in the client and apply it to the local store. Update the local revision number. Commit. Done.
I would never update only a part of a revision, because then you can't really know what changed since the last synchronization. Because you do differential updates, it is essential to have a well defined state of the client.
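A sketch of the client side of that loop, assuming an in-memory dictionary as the local store so it runs anywhere. Copying the dictionary and swapping it in at the end plays the role of the SQLite transaction: either the whole batch and the new revision number land, or nothing does. `Change` and `ClientCache` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public record Change(string Id, string? Data, bool IsDeleted);

public class ClientCache
{
    public Dictionary<string, string> Items { get; private set; } = new();
    public long Revision { get; private set; }

    // Applies one complete revision batch. If any change fails, the exception
    // propagates and neither Items nor Revision is modified ("rollback").
    public void Apply(IReadOnlyList<Change> batch, long serverRevision)
    {
        var working = new Dictionary<string, string>(Items); // "begin transaction"
        foreach (var c in batch)
        {
            if (c.IsDeleted) working.Remove(c.Id);
            else working[c.Id] = c.Data ?? throw new InvalidOperationException($"No data for {c.Id}");
        }
        Items = working;           // "commit"
        Revision = serverRevision; // advanced only after the whole batch applied
    }
}
```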

I would go with a solution using Sync Framework.
Quote from Microsoft:
Microsoft Sync Framework is a comprehensive synchronization platform enabling collaboration and offline for applications, services and devices. Developers can build synchronization ecosystems that integrate any application, any data from any store using any protocol over any network. Sync Framework features technologies and tools that enable roaming, sharing, and taking data offline.
A key aspect of Sync Framework is the ability to create custom providers. Providers enable any data sources to participate in the Sync Framework synchronization process, allowing peer-to-peer synchronization to occur.

I have just built an application pretty much exactly as you described. I built it on top of the Microsoft Sync Framework that DjSol mentioned.
I use a C# front-end application with a SqlCe database, and a SQL Server 2005 server at the other end.
The following articles were extremely useful for me:
Tutorial: Synchronizing SQL Server and SQL Server Compact
Walkthrough: Creating a Sync service
Step by step N-tier configuration of Sync services for ADO.NET 2.0
How to Sync schema changed database using sync framework?

You don't say what your back-end database is, but if it's SQL Server you can use SqlCE (SQL Server Compact Edition) as the client DB and then use RDA merge replication to update the client DB as desired. This will handle all your requirements for sure; there is no need to reinvent the wheel for such a common requirement.

Implement list of objects to be deleted in database

I have a form with a few tabs, and in each tab a grid control. When the user selects a row to be deleted, I want to remove it from the grid and, if the object exists in the database, remove it there too, but not permanently: only if and when the user clicks Save on the form.
For now, if the object doesn't exist in the db I remove it from the list, and if it exists in the db I delete it from the db and remove it from the list. But if the user clicks the Cancel button, he expects the row(s) not to be deleted from the database.
I have two possible solutions in mind: 1) remove the object from the list and, if it exists in the db, add it to a list of objects to be deleted; 2) implement another list whose getter returns only objects with state != ToBeDeleted (performance?).
Note: I'm not using an ORM tool; I'm working with my own ADO.NET-based data access framework.
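A minimal sketch of option 1, assuming each row knows whether it was ever saved (`IsNew`). `Row`, `EditableList` and the `deleteFromDb` callback are hypothetical stand-ins for your own ADO.NET layer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Row
{
    public int Id { get; init; }
    public bool IsNew { get; init; } // true if the row was never written to the database
}

public class EditableList
{
    private readonly List<Row> _pendingDeletes = new();
    public List<Row> Visible { get; } = new(); // what the grid is bound to

    public EditableList(IEnumerable<Row> rows) => Visible.AddRange(rows);

    public void Remove(Row row)
    {
        Visible.Remove(row);
        if (!row.IsNew) _pendingDeletes.Add(row); // only persisted rows need a DELETE later
    }

    // deleteFromDb is your own ADO.NET call, ideally run in one short transaction.
    public void Save(Action<IReadOnlyList<Row>> deleteFromDb)
    {
        deleteFromDb(_pendingDeletes.ToList());
        _pendingDeletes.Clear();
    }

    // Nothing was deleted yet, so Cancel only has to show the rows again.
    public void Cancel()
    {
        Visible.AddRange(_pendingDeletes);
        _pendingDeletes.Clear();
    }
}
```

Compared with option 2, the grid's list never contains hidden rows, so no filtering getter is needed.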

I think the case you are describing pretty much calls for a transaction.
ADO.NET handles them easily, provided you are using a reasonable database engine (so: not SqlServerCE, for example :))
See, for example, the TransactionScope class. You construct such an object before interacting with the database, and the changes will be committed if and only if you call Complete(). If you just leave it alone, or if you Dispose() it without calling Complete(), the transaction will be cancelled and all changes to the DB will be rolled back, i.e. reverted.
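To see the `Complete()`/`Dispose()` rule without a database, this sketch enlists a hypothetical in-memory resource (`FakeTable`) in the ambient transaction through `Transaction.EnlistVolatile`; a real `SqlConnection` opened inside the scope would enlist automatically instead. Only work done in a scope that called `Complete()` ends up committed.

```csharp
using System;
using System.Collections.Generic;
using System.Transactions;

// Hypothetical in-memory "table" that takes part in the ambient transaction.
public class FakeTable : IEnlistmentNotification
{
    public List<string> Committed { get; } = new();
    private readonly List<string> _staged = new();

    public void Insert(string value)
    {
        Transaction current = Transaction.Current
            ?? throw new InvalidOperationException("No ambient transaction");
        // Enlist once per transaction, on the first staged change.
        if (_staged.Count == 0) current.EnlistVolatile(this, EnlistmentOptions.None);
        _staged.Add(value);
    }

    public void Prepare(PreparingEnlistment preparingEnlistment) => preparingEnlistment.Prepared();

    // Called when the scope was completed and disposed: make staged changes visible.
    public void Commit(Enlistment enlistment)
    {
        Committed.AddRange(_staged);
        _staged.Clear();
        enlistment.Done();
    }

    // Called when the scope was disposed without Complete(): throw staged changes away.
    public void Rollback(Enlistment enlistment)
    {
        _staged.Clear();
        enlistment.Done();
    }

    public void InDoubt(Enlistment enlistment)
    {
        _staged.Clear();
        enlistment.Done();
    }
}
```

With several concurrent users, the same scope would be created inside the Save handler rather than in the form's constructor, so locks are held only while the batch is written.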
So, in your case, you may open the transaction in the form's constructor or onLoaded(), call Complete() on Save, and Dispose() it whenever the window is closed any other way.
While this is the normal way of handling such things in small systems, especially single-user ones, be careful: if your system has to handle many concurrent users, you may not be able to use it this way. The transaction locks rows and tables until it is completed or cancelled, and therefore other users may see long delays.
So, how many users do you have to support and how often they will try to edit the same things?
-- edit: (10 users)
With that many users, you will want to avoid long-running transactions. Opening a transaction at form load will be unacceptable, and will lock many users out until that one user closes the window. But using transactions at Save() that push all the changes in one batch is OK.
Of course, if you can eliminate transactions altogether, that's great! But it is a very hard thing to do if you also need to preserve data integrity. To eliminate the need for transactions, you almost always have to redesign both the data structure on the DB side and the way you obtain and work with the data. If you want to redesign both, then I'd really recommend first trying to redesign it around some existing data-access framework, as even basic ADO.NET has really nice features for editing data held in SqlClient-compatible databases.
So, assuming you don't want to rewrite/rethink most of your code, you just need to buffer the data and also delay all of the actual operations on the database.
You may want to do it in a "simple" form: when you display your form, instead of binding it directly to database-driven data sources, download all the required data into some BindingList<>s, DataTables, etc. (whatever container you like) and bind your form to them instead. You probably have something like that set up already. The important thing is that all those data containers must be offline, or at least read-only and lazily loaded.
Next, you have to intercept all operations that the user performs in the UI. Surely you have this done already, as I'm assuming the application works :) As your forms are bound to the offline cached items, your application should perform the operations on that cached data and not touch the database at all. But there's more: along with performing them on the cached data, you should record what happens to which table.
Then, when the user finally stops playing around and presses CANCEL :), you just trash everything and close the form. The database is not changed.
On Save, you open a fresh transaction, iterate over the list of changes, effectively replaying your recorded changes on the database, and then commit the transaction.
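A minimal sketch of the record/replay bookkeeping, assuming each change can be described by a kind, a table name, an integer key and a value. `ChangeRecorder` and `RecordedChange` are hypothetical names, and the `replay` callback stands in for your own data access code, run inside the one short transaction opened on Save.

```csharp
using System;
using System.Collections.Generic;

public enum ChangeKind { Insert, Update, Delete }

public record RecordedChange(ChangeKind Kind, string Table, int Key, string? Value);

public class ChangeRecorder
{
    private readonly List<RecordedChange> _log = new();
    public IReadOnlyList<RecordedChange> Log => _log;

    // Called alongside every edit made to the cached, form-bound data.
    public void Record(ChangeKind kind, string table, int key, string? value = null) =>
        _log.Add(new RecordedChange(kind, table, key, value));

    // Cancel: the database was never touched, so dropping the log is enough.
    public void Cancel() => _log.Clear();

    // Save: replay every change in the order it happened, then start a fresh log.
    public void Save(Action<RecordedChange> replay)
    {
        foreach (var change in _log) replay(change);
        _log.Clear();
    }
}
```

Replaying in recorded order matters: an update followed by a delete of the same row must reach the database in that order.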
Please note one more thing, though: the database could have changed between the time the user cached the data and the time he pressed Save. You have to detect this and abort, or resolve the conflicts. You should do that inside that transaction, either before or while executing the recorded changes. You may detect it by simply comparing the online data with the offline cached data (the unchanged original values, not those modified by the user), or you may use another mechanism such as optimistic locking and just compare the version tags on the rows.
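A sketch of the version-tag variant, assuming every row carries a version number that increases on each write. In SQL Server a `rowversion` column is a common choice, combined with `UPDATE ... WHERE Id = @id AND RowVer = @originalRowVer` and a check on the number of affected rows. The in-memory table and the `OptimisticUpdate` helper are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public class VersionedRow
{
    public string Value { get; set; } = "";
    public long Version { get; set; }
}

public static class OptimisticUpdate
{
    // Applies the update only if the row still has the version the user originally read.
    // Returns false (a conflict to abort or resolve) if someone else changed it meanwhile.
    public static bool TryUpdate(Dictionary<int, VersionedRow> table, int id,
                                 long originalVersion, string newValue)
    {
        if (!table.TryGetValue(id, out var row) || row.Version != originalVersion)
            return false;
        row.Value = newValue;
        row.Version++; // every successful write invalidates older copies
        return true;
    }
}
```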
If you don't like record-replay, you may implement a diffing utility that takes the modified offline data and compares it generically with the current online tables. This is somewhat harder, but has a bonus: with such a utility, you can initially double-cache the data, keeping one copy for offline reference (just stored and never touched by the user) and one copy for offline editing (the one bound to the forms). Now, upon Save you open a transaction and diff the reference data against the online database. If there are any differences, you've just detected a collision: solve/merge/abort/etc. If there are no differences, you diff the modified data against the online data, apply all differences found to the database, and commit the transaction.
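A minimal sketch of the two diffs, assuming rows are modeled as `id -> value` dictionaries: `reference` is the untouched copy, `edited` is the copy bound to the forms, and `online` is what the database holds at Save time. `SnapshotDiff` is a hypothetical helper. When no collisions are found, `online` equals `reference`, so diffing `edited` against either gives the same result.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SnapshotDiff
{
    // Ids whose online state differs from the reference copy (changed, added or
    // deleted by someone else since the data was cached): these are collisions.
    public static List<int> Collisions(IReadOnlyDictionary<int, string> reference,
                                       IReadOnlyDictionary<int, string> online) =>
        reference.Keys.Union(online.Keys)
            .Where(id => !(reference.TryGetValue(id, out var r) &&
                           online.TryGetValue(id, out var o) && r == o))
            .OrderBy(id => id)
            .ToList();

    // What to apply to the database: rows to insert/update and ids to delete.
    public static (Dictionary<int, string> Upserts, List<int> Deletes) Changes(
        IReadOnlyDictionary<int, string> online,
        IReadOnlyDictionary<int, string> edited)
    {
        var upserts = edited
            .Where(kv => !online.TryGetValue(kv.Key, out var old) || old != kv.Value)
            .ToDictionary(kv => kv.Key, kv => kv.Value);
        var deletes = online.Keys.Where(id => !edited.ContainsKey(id)).OrderBy(id => id).ToList();
        return (upserts, deletes);
    }
}
```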
Each of those methods has its pros and cons: aside from the difficulty of implementation, there are memory issues with caching, latency issues if you dare to copy very large tables, etc.
But once solved, it would work pretty nicely.
And when you finish, you can go and boast that you have just implemented a little sister of DataSet+DataTable. I'm not joking, and I'm not laughing at you. I'm just trying to show you why everyone is telling you to revise your DAO layer and try to understand and use the hard work that was already done for you by the platform designers/developers :)
Anyway, I said you can avoid clashes and transactions altogether if you rethink your data structure. For example: why do you DELETE rows at all? I know there's a nifty DELETE statement in SQL, but do you really need to delete that row? Can't you just add a 'bool isDeleted' column, and when the user deletes the row from the grid, set that cell to true and make the application filter out any isDeleted=true rows, so they are not shown and not included in views and aggregations? Bonus: sys/DB admins now have a magic tool: undelete.
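A small sketch of the `isDeleted` idea, assuming an in-memory list stands in for the table; in SQL the same filter would be `WHERE IsDeleted = 0` in every view and aggregate. `Product` and `SoftDelete` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Product
{
    public int Id { get; init; }
    public decimal Price { get; init; }
    public bool IsDeleted { get; set; }
}

public static class SoftDelete
{
    // Every query, view and aggregate goes through this filter.
    public static IEnumerable<Product> Active(this IEnumerable<Product> rows) =>
        rows.Where(p => !p.IsDeleted);

    // "Deleting" only flips the flag, so the row can be restored later.
    public static void Delete(Product p) => p.IsDeleted = true;
    public static void Undelete(Product p) => p.IsDeleted = false;
}
```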
Let's take it further: do you need to UPDATE rows? Maybe you can just APPEND information saying that from (this date) that row should have a new price? Of course, the structure must be greatly altered: entities don't have properties but logs of timestamped property changes (or the rows must have version numbers and be duplicated), queries must be run against only the newest versions of the data, etc. Pros: the database is now append-only. Transactions, if needed at all, are very short. Cons: SELECT queries are complicated and may be slow, especially when joining many tables.
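A sketch of the append-only variant for a single property, assuming each change is stored as a timestamped row and the current value is the latest row effective at a given date. `PriceChange` and `PriceHistory` are hypothetical names. The "newest version" lookup is the part that gets expensive once many tables are joined this way.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public record PriceChange(int ProductId, DateTime EffectiveFrom, decimal Price);

public class PriceHistory
{
    private readonly List<PriceChange> _log = new(); // only ever appended to

    // Instead of UPDATE: record that from this date the product has a new price.
    public void Append(int productId, DateTime from, decimal price) =>
        _log.Add(new PriceChange(productId, from, price));

    // "Newest version" query: the latest entry effective at asOf, or null if none.
    public decimal? PriceAt(int productId, DateTime asOf) =>
        _log.Where(c => c.ProductId == productId && c.EffectiveFrom <= asOf)
            .OrderByDescending(c => c.EffectiveFrom)
            .Select(c => (decimal?)c.Price)
            .FirstOrDefault();
}
```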
Pro/Con: your db actually starts looking very meta- instead of data-base...
Con: it is a really hard task to "upgrade" an existing application to such a db structure. Writing a new app from scratch and importing the data from the old system may be a few times faster.
Now, to summarise:
I do not recommend any of the ways described.
First, I recommend you take some ORM framework like NHibernate, Entity Framework, XPO from DevExpress, or whatever else. Any of them will save you lots of time. The three I list here even have optimistic-locking collision detection built in. Why use a self-written SQL framework when such tools exist?
If not, then next I recommend using the existing tools found in the framework. You use SqlClient, so why don't you use DataSets and DataTables? They are provided along with SqlClient and have many useful mechanisms built in, which you would otherwise spend weeks implementing and testing all by yourself. Learn to use DataSets, their collision detection and their merging algorithms, and use them. You will lose a bit of time experimenting and learning, but you will save huge amounts of time by not reinventing the wheel.
If you really want to do it manually, start with data caching and record-replay. It is easy to comprehend, it is quite easy to introduce anywhere you currently use plain SQL queries, and it will quickly introduce you to all kinds of cache-syncing and version-checking problems; you will soon learn in detail why all those strange mechanisms in the above-mentioned frameworks were implemented, how they work, and what pros/cons they have.
And about the double-cached diffing approach: it will be more tempting to write than record-replay, but please use it only if you know very well how to detect/solve/merge collisions. Have at least one record-replay approach implemented before you try it.
...and of course you may use long-lasting transactions. Dumb-easy to introduce, and they "just irritate" the users. Well, or even make the system unusable when >90% of the users constantly collide and hit the locks, heh. No, that was a joke. Don't use long-lasting transactions. They are OK for 1-4 users, or for very sparse databases.

Is there a way to manually update the tracking tables to jumpstart a sync?

I am using the Microsoft Sync Framework "collaboration" providers. Both ends of the sync will use SQL Express to begin with. When provisioned, the database contains a "_tracking" table for each "real" table in the database. My database is fairly large, and I don't want to transfer the entire thing via MSF on the first sync. Is there a way to use some other method to "jumpstart" the sync when both sides are known to contain the same data? In my testing, when both databases contain identical content, it looks like it downloads the entire scope, churns through the entire batch of "changes", and then uploads the entire scope back to the server, which then churns through the entire dataset again. Is there any way to update the _tracking tables (hopefully only on one side) to let the system know that the database contents are the same?
More information (edit):
From examining the contents of the tracking tables after doing an initial sync, it looks like the scope_update_peer_timestamp and local_create_peer_timestamp fields in every _tracking table need to be updated on both sides. In addition, the update_scope_local_id, scope_update_peer_key, and last_change_datetime need to be set on one of the two sides.
The last_change_datetime field is a datetime and is fairly self-explanatory.
The two _timestamp fields seem to use @@DBTS and are thus bigints that contain the equivalent of an editable timestamp column.
That still leaves a bunch of unknowns:
Does MSF track which peer the content of the timestamp columns come from?
Which peer (local or remote) drives the contents of the _timestamp fields?
What logic drives the contents of the update_scope_local_id and scope_update_peer_key fields?
More information on the environment (edit):
I have SQL Express/Std on both sides. The server side will eventually contain information for several clients (using multi-tenancy), so backups will not be useful since the server will contain information for multiple clients.

How are you initializing your databases? Are you provisioning databases that already contain the same set of data?
The best way to initialize other replicas is to use the GenerateSnapshot method on the SqlCeSyncProvider, which creates an SDF file to initialize other replicas, or to back up the database (for a non-SDF SQL Server/Express database), restore it, and run PerformPostRestoreFixup before doing a sync.

Is it possible to define a custom tracking mechanism with the Microsoft sync framework?

I am currently evaluating the Microsoft sync framework as a possible solution to sync data between two SQL databases. The examples I have seen so far rely on "tracking tables" containing the information used to track changes to be synced, with triggers on the main tables to keep them up to date.
My database already contains lots of this information (for an existing feature of the software), so it would be good to make use of that instead of having to migrate it all to the new tracking tables. I also don't like the idea of doubling up each table into a data table and a tracking table, and adding three triggers to each table; that sounds like it is likely to be a performance issue?
Is there any way of customizing the tracking mechanism used by the sync framework (i.e. the way in which changes are tracked)?

Yes, it is entirely possible to write your own logic to track changes and use it. For example, one of the DB sync providers I have used requires that you define a SelectIncrementalInsertsCommand. Which table(s) that data comes from and how you filter out the latest records is immaterial; you just need to define a query or a stored procedure that gives you this data. The same applies to all the other incremental commands (the ones that deal with change tracking).
Along with that, you need an anchor value to define when the last sync happened. I don't think there is any point in avoiding this one, since it is used exclusively for synchronization and your existing tracking tables will not contain a replacement for it.
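A sketch of what such incremental queries compute, assuming the existing schema already records a creation version and a last-update version per row (hypothetical `CreatedVersion`/`UpdatedVersion` columns) and that the anchor is the highest version the client has already received. In the ADO.NET sync provider model, the corresponding SQL would typically filter on the `@sync_last_received_anchor` and `@sync_new_received_anchor` parameters; the in-memory version only shows the window logic.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row shape: tracking data the schema already has, no _tracking table.
public record TrackedRow(int Id, long CreatedVersion, long UpdatedVersion);

public static class AnchorWindow
{
    // Incremental inserts: rows created in (lastAnchor, newAnchor].
    public static List<int> Inserts(IEnumerable<TrackedRow> rows, long lastAnchor, long newAnchor) =>
        rows.Where(r => r.CreatedVersion > lastAnchor && r.CreatedVersion <= newAnchor)
            .Select(r => r.Id)
            .ToList();

    // Incremental updates: rows that already existed at lastAnchor but changed in the window.
    public static List<int> Updates(IEnumerable<TrackedRow> rows, long lastAnchor, long newAnchor) =>
        rows.Where(r => r.CreatedVersion <= lastAnchor &&
                        r.UpdatedVersion > lastAnchor && r.UpdatedVersion <= newAnchor)
            .Select(r => r.Id)
            .ToList();
}
```

Capturing `newAnchor` once at the start of the sync means rows changed while the sync runs fall into the next window instead of being missed.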
