How to sync two databases for disconnected systems from different companies - c#

Are there standard messaging protocols / APIs available to keep databases in sync? Or, alternatively, APIs for creating and parsing messages?
Our company is working with another company to provide two different software packages to two different kinds of users. The data sits in two separate databases but parts of it have to remain in sync.
Their system is pretty much a black box to us. And vice versa.
So what would be required is to track updates, turn them into messages, send them to a web service, map them back to the destination database fields, and commit them.
The database schemas do not match.
I am aware that we are going to have to roll most of this ourselves, but some ideas around messaging or techniques would be good.

One solution: SQL Server Integration Services (SSIS). It was introduced with SQL Server 2005, and it is exactly what you need. Its predecessor in SQL Server 2000 was DTS (Data Transformation Services). It was created to import/export/transform data from one point to another, and it is really easy to use from SQL Server 2005 onwards (DTS was quite horrible).
So basically, you will have to write packages that import data from their database, then transform and filter it exactly how you need it before inserting it into your database. And vice versa.
Regarding the black-box problem, you should generate a diagram of each database's relational design to make the mapping easier.
EDIT
Just in case you need to install it: I remember bugs in the SQL Server 2005 installer that caused SSIS not to be installed at all. I had to satisfy all the warnings in the installer's system-requirements step to obtain it.

You have two problems:
track the changes that have to be synced
apply the changes to the peer
There is a solution that addresses both issues at once, and I'm sure you are aware of it: replication. Merge replication would allow both sites to update the data and would also provide merge conflict resolution. But replication only works when the table schemas are similar, and it puts a big constraint on development because schema changes have to be carefully coordinated between the sites. In practice, when the sites are operated by independent companies, it is quite difficult to maintain over the long term.
If you want to roll your own, the change-tracking part has built-in support in SQL Server:
Change Tracking
Change Data Capture
Both can be used in a sync solution as a means to detect what changed.
Applying the changes can be handled by a web service, but there is also a built-in solution in SQL Server that allows for far higher scalability and throughput: Service Broker. Relying on a message-defined API for sync allows the two sites to evolve at their own pace and change their schemas almost at will, as long as the communication API (the message protocol) remains unchanged.
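For example, with Change Tracking enabled (SQL Server 2008 onwards), the detection side boils down to one query against CHANGETABLE. A minimal C# sketch, assuming a hypothetical Customers table with Change Tracking already turned on:

    using System;
    using System.Data.SqlClient;

    class ChangeReader
    {
        // @lastVersion is the CHANGE_TRACKING_CURRENT_VERSION() value saved
        // after the previous successful sync; table/column names are made up.
        public static void PrintChangesSince(string connString, long lastVersion)
        {
            using (var conn = new SqlConnection(connString))
            using (var cmd = new SqlCommand(
                @"SELECT ct.SYS_CHANGE_OPERATION, ct.CustomerId
                  FROM CHANGETABLE(CHANGES dbo.Customers, @lastVersion) AS ct",
                conn))
            {
                cmd.Parameters.AddWithValue("@lastVersion", lastVersion);
                conn.Open();
                using (var reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        // 'I' = insert, 'U' = update, 'D' = delete
                        Console.WriteLine("{0}: customer {1}",
                            reader.GetString(0), reader.GetInt32(1));
                    }
                }
            }
        }
    }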

The answers provided give me some good ideas, but I think we are going to end up doing something a bit different.
We are using MSMQ, and defining a standard messaging system which we will roll ourselves.
As to how we will know what has changed, I am not sure at the moment.
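For reference, a minimal sketch of what sending one of those hand-rolled messages over MSMQ could look like; the queue path and message shape are assumptions, not a finished protocol:

    using System.Messaging; // add a reference to System.Messaging.dll

    // Hypothetical message shape; the real contract is whatever both
    // companies agree the protocol should carry.
    public class FieldChangeMessage
    {
        public string Entity;   // logical entity name, not a physical table
        public string Key;
        public string Field;
        public string NewValue;
    }

    public static class SyncSender
    {
        const string QueuePath = @".\Private$\DbSyncOutbound"; // assumed queue

        public static void Send(FieldChangeMessage change)
        {
            if (!MessageQueue.Exists(QueuePath))
                MessageQueue.Create(QueuePath, true); // true = transactional

            using (var queue = new MessageQueue(QueuePath))
            using (var tx = new MessageQueueTransaction())
            {
                queue.Formatter =
                    new XmlMessageFormatter(new[] { typeof(FieldChangeMessage) });
                tx.Begin();
                queue.Send(change, tx); // serialized as XML on the wire
                tx.Commit();
            }
        }
    }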

Related

Best approach to incrementally update application data

I have been working on an application for a couple of years that I update using a back-end database. The whole key is that everything is cached on the client, so that it never requires a network connection to operate, but when it does have a connection it will always pick up the latest updates. Every application update is shipped with the latest version of the database, and I wanted it to download only the minimum amount of data when the database has been updated.
I currently use a table with a timestamp to check for updates. It looks something like this.
ID - Name - Description - Severity - LastUpdated
0 - test.exe - KnownVirus - Critical - 2009-09-11 13:38
1 - test2.exe - Firewall - None - 2009-09-12 14:38
This approach was fine for what I previously needed, but I am looking to expand more functions of the application to use this type of dynamic approach. All the data is currently stored as XML, but I do not want to store complete XML files in the database; I only want to transmit changed data.
So how would you go about allowing a fairly simple approach to storing dynamic content (text/XML/JSON/XAML) in a database, and have the client only download new updates? I was thinking of having logic that can handle XML inserted directly:
ID - Data - Revision
15 - XXX - 15
XXX would be something like <Content><File>Test.dll</File><Description>New DLL to load.</Description></Content> and would be inserted into the cache, but this would obviously be complicated as I would need to load them in sequence.
Another approach that has been mentioned was to base it on something similar to Source Control, storing the version in the root of the file and calculating the delta to figure out the minimal amount of data that need to be sent to the client.
Anyone got any suggestions on how to approach this with no risk of data corruption? I would also like to expand with features that allow me to revert possibly bad revisions and replace them with new, working ones.
It really depends on the tools you are using and the architecture you already have. Is there already a server with some logic and a data access layer?
Dynamic approaches might get complicated, slow and limit the number of solutions. Why do you need a dynamic structure? Would it be feasible to just add data by using a name-value pair approach in a relational database? Static and uniform data structures are much easier to handle.
Before going into detail, you should consider the different scenarios.
Items can be added
Items can be changed
Items can be removed (I assume)
Adding is not a big problem. The client needs to remember the last revision number it got from the server, and you write a query which gets everything since then.
Changing is basically the same. You should take care with identification of the items. You need an unchangeable surrogate key, which seems to be the ID you already have. (Guids may be useful here.)
Removing is tricky. You need to either flag items as deleted instead of actually removing them, or keep a list of removed IDs along with the revision number at which they were removed.
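To make those three cases concrete, here is a sketch of the delta queries the client could issue; the tombstone table is a hypothetical name:

    // Assumed schema: Items(Id, Data, Revision) plus a hypothetical tombstone
    // table RemovedItems(Id, RemovedAtRevision) for the removal case.
    static class DeltaQueries
    {
        // Covers both added and changed items.
        public const string NewOrChanged =
            @"SELECT Id, Data, Revision
              FROM Items
              WHERE Revision > @lastRevision";

        // IDs the client should purge from its local cache.
        public const string Removed =
            @"SELECT Id
              FROM RemovedItems
              WHERE RemovedAtRevision > @lastRevision";
    }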
Storing the data in the client: consider using a relational database like SQLite in the client. (It doesn't need installation; it just stores everything in a file. Firefox, for instance, stores quite a lot in SQLite databases.) When using the same database on the server, you can probably reuse some code. It is also transaction based, which helps to keep it consistent (rollback in case of an error during synchronization).
XML - if you really need it - can be stored just as a string in the database.
When using an abstraction layer or ORM that supports SQLite (e.g. NHibernate), you may also reuse some code even when the server uses another database. Note that the learning curve for such an ORM might be rather steep. If you don't know anything like this, it could be too much.
You don't need to force reuse of code in the client and server.
Synchronization itself shouldn't be very complicated. You have a revision number in the client and the latest revision on the server. You fetch all new, changed, and deleted items since the client's revision and apply them to the local store. Update the local revision number. Commit. Done.
I would never apply only part of a revision, because then you can't really know what changed since the last synchronization. Because you do differential updates, it is essential to have a well-defined state on the client.
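A sketch of the apply step using the System.Data.SQLite provider; the table names and the ServerChange shape are illustrative:

    using System.Collections.Generic;
    using System.Data.SQLite; // System.Data.SQLite ADO.NET provider

    public class ServerChange { public string Id; public string Data; public long Revision; }

    public static class ClientSync
    {
        // Applies one batch of server changes atomically; if anything fails,
        // the client database rolls back to the last known-good revision.
        public static void ApplyChanges(SQLiteConnection conn,
                                        IEnumerable<ServerChange> changes,
                                        long newRevision)
        {
            using (SQLiteTransaction tx = conn.BeginTransaction())
            {
                foreach (ServerChange change in changes)
                {
                    using (var cmd = new SQLiteCommand(
                        "INSERT OR REPLACE INTO Items (Id, Data, Revision) " +
                        "VALUES (@id, @data, @rev)", conn, tx))
                    {
                        cmd.Parameters.AddWithValue("@id", change.Id);
                        cmd.Parameters.AddWithValue("@data", change.Data);
                        cmd.Parameters.AddWithValue("@rev", change.Revision);
                        cmd.ExecuteNonQuery();
                    }
                }

                // Record the new high-water mark inside the same transaction,
                // so the data and the revision number can never drift apart.
                using (var cmd = new SQLiteCommand(
                    "UPDATE SyncState SET LastRevision = @rev", conn, tx))
                {
                    cmd.Parameters.AddWithValue("@rev", newRevision);
                    cmd.ExecuteNonQuery();
                }

                tx.Commit();
            }
        }
    }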
I would go with a solution using Sync Framework.
Quote from Microsoft:
Microsoft Sync Framework is a comprehensive synchronization platform enabling collaboration and offline access for applications, services and devices. Developers can build synchronization ecosystems that integrate any application, any data from any store using any protocol over any network. Sync Framework features technologies and tools that enable roaming, sharing, and taking data offline.
A key aspect of Sync Framework is the ability to create custom providers. Providers enable any data sources to participate in the Sync Framework synchronization process, allowing peer-to-peer synchronization to occur.
I have just built an application pretty much exactly as you described. I built it on top of the Microsoft Sync Framework that DjSol mentioned.
I use a C# front-end application with a SqlCe database, and SQL Server 2005 at the other end.
The following articles were extremely useful for me:
Tutorial: Synchronizing SQL Server and SQL Server Compact
Walkthrough: Creating a Sync service
Step by step N-tier configuration of Sync services for ADO.NET 2.0
How to Sync schema changed database using sync framework?
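For a flavor of what the code ends up looking like once the providers are configured (the walkthroughs above cover that part), kicking off a sync is only a few lines. A sketch; LocalDataCacheSyncAgent stands in for the agent class that the Visual Studio Local Database Cache designer generates for you:

    using System;
    using Microsoft.Synchronization.Data; // Sync Services for ADO.NET

    class SyncRunner
    {
        public static void RunSync()
        {
            // Hypothetical designer-generated agent wiring a SqlCe client
            // provider to the server-side (or WCF proxy) provider.
            var agent = new LocalDataCacheSyncAgent();
            SyncStatistics stats = agent.Synchronize();
            Console.WriteLine("Sync completed at {0}", stats.SyncCompleteTime);
        }
    }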
You don't say what your back-end database is, but if it's SQL Server you can use SqlCE (SQL Server Compact Edition) as the client DB and then use RDA (Remote Data Access) or merge replication to update the client DB as desired. This will handle all your requirements for sure; there is no need to reinvent the wheel for such a common requirement.
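For illustration, a minimal RDA pull looks roughly like this; the Server Agent URL, connection strings, and table are placeholders, and it assumes the SQL Server Compact Server Agent is already configured in IIS:

    using System.Data.SqlServerCe;

    class RdaExample
    {
        public static void PullCustomers()
        {
            var rda = new SqlCeRemoteDataAccess();
            rda.LocalConnectionString = @"Data Source=|DataDirectory|\client.sdf";
            rda.InternetUrl = "http://server/sync/sqlcesa35.dll"; // placeholder

            // Pull the table with tracking on, so local changes can be
            // pushed back to the server later.
            rda.Pull("Customers",
                     "SELECT CustomerId, Name FROM Customers",
                     "Data Source=SERVER;Initial Catalog=OurDb;Integrated Security=SSPI",
                     RdaTrackOption.TrackingOn);

            rda.Dispose();
        }
    }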

Architecture Question - One Central Database and Many Different Programs Accessing It

I am designing a program that will build and maintain a database, and act as a central server. This is the 'first stage' of a grander plan. Coming later will be 3-5 remote programs built around the information put into this database.
The requirements are:
The remote programs must be able to access the information in the database.
The remote programs must be able to set alerts when information in the database changes.
The remote programs must be able to request the central server to go out and fetch new / different data.
So, the question is this: how do I expose this data and events to the outside world? My two choices are:
Have them communicate directly with my 'server' application. This seems easier to:
do event notifications (although I suppose I'm probably missing something in SQL).
It also seems like this is more 'upgradeable' - that is, I don't need to worry about a database update crashing all my remote programs because something changed. I can account for this and transform the data to a version the child program will understand.
Just go ahead and let them connect directly to the database.
The nice thing about this is that it's a solved problem. I can use LINQ to SQL. The only thing the main server application needs to do is let the remote programs know where the database is.
I'm unsure how to trigger / relay 'events' for field changes in a database across different programs that may or may not be on the same computer.
Forgive my ignorance on this question. I feel woefully unprepared to ask it, but I'm having a hard time figuring out where to get started with this. It is my first real DB project :-/
Thanks!
If the other programs are going to need to know about updates to the database, then the best solution is to manage all db updates through your server application so it can alert clients of the changes. Otherwise it will be tough for the clients to be aware of changes to the db. This also has the advantage of hiding the implementation details of your storage solution from the clients, so you are free to change databases, etc...
My suggestion would be to go with option 1. Build out a web service that can provide the information they all need. This will be the most flexible and allow you to reduce duplicate backend code that would happen with direct communication with the database.
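As a sketch of what that web-service layer could look like in WCF (the names are illustrative, not a prescription):

    using System.Runtime.Serialization;
    using System.ServiceModel;

    [DataContract]
    public class DataItem
    {
        [DataMember] public int Id;
        [DataMember] public string Payload;
    }

    // A deliberately narrow contract: clients never see the schema, only this
    // API, so the database can change underneath without breaking them.
    [ServiceContract]
    public interface ICentralDataService
    {
        [OperationContract]
        DataItem GetItem(int id);

        [OperationContract]
        int[] GetIdsChangedSince(long revision); // simple polling-based "events"
    }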
I would recommend looking at some data source design patterns first. These types of patterns will help you come up with solutions for how to manage the state of your data. Otherwise, I think I would require some more information about your requirements for the clients to make any further useful suggestions.
I recommend you learn about SQL Server and/or databases first. You don't appear to realize that most of what you want from your "central server" can all be done by SQL Server itself.
A central database is the simplest option and the cheapest to both build and maintain.
There are however a few scenarios where a central database could cause problems:
High load on one of the systems: a high load on one of the systems could reduce performance on the other systems. For example, someone running an internal report could stop you taking orders on your eCommerce site.
With several systems writing to the same database there is a greater chance of locking.
With several systems dependent on the same database schema, how do you upgrade? All systems at the same time?
If you need to take down the database all systems stop.

Tools for Building an OCA (Occasionally Connected Application)

I will be building an in-house, Occasionally Connected App (OCA). What technologies would you suggest I employ?
Here are my parameters:
.NET shop (3.5 SP1)
C# for the code-behind (WinForms, WPF, Silverlight)
SQL Server Backend (2005 or possibly 2008 pending approval)
Solo Developer
Solo SQL Administrator
Low Tech end users
Low bandwidth to 5 Branch offices
This is a LOB app but not a POS.
Majority of users have laptops that they take to members' homes
The Data for this App is stored in 5 separate Databases, though in one SQL instance.
I am looking for specific recommendations on which path to choose. Merge Replication or Sync Framework database synchronization providers? SQL Express or SQL CE at the Subscriber? Can I use LINQ to SQL for the DAL?
Is a Silverlight 'Offline/Out of Browser' app (example here) feasible?
This is my first LARGE business application so any experienced comments are welcome.
As requested, here is some additional info on the type of data. My users are Nurses and Social Workers who go to members' homes and create "Plans" or "Health Assessment Reviews" for them. These are things like a medication list, a list of their current "Providers", steps to achieve members' goals, or a list of their current/past diagnoses. Things like that.
Also the typical member's name, address, phone number, etc. Mostly this is a data storage and retrieval app that facilitates reporting. Very little "processing" takes place, and Nurses and Social Workers work in teams that are assigned members, so I usually have very little crossover or potential for data conflicts. Nurses and SWs are also responsible for different areas of the MCP (Member Centered Plan).
Additional question: is Sync Framework really only a viable option if I can use SQL 2008? It seems that way due to the change tracking, etc. Thoughts?
Once you solve the problem of change detection and data movement, everything else is trivial. In other words, technologies like WPF, Silverlight, Forms and even WCF are orthogonal to your main problem, and your choice should be based on your personal preferences and experience. The real hard nut to crack is working disconnected and synchronizing changes. Which leaves two out-of-the-box avenues: Sync Framework or replication.
I would say, for your scenario, definitely Sync Framework. Merge replication, like all forms of replication, is designed for systems that are connected continuously with intermittent disconnects. And most critically, replication can only work over static names. Laptops connecting from various hot-spots and ISPs have a nasty habit of changing fully qualified names with each connection. Replication can overcome this only if a VPN of sorts is used, and VPNs are usually a major support issue. Replication is just not designed for the high mobility of OCA systems.
Sync Framework will pretty much force you to a SQL 2008 back end because of its need for Change Data Capture or Change Tracking, both being SQL 2008-only features.
You will still have plenty of hard problems to solve ahead (authentication, versioning and upgrade, data conflict resolution policies, securing data on the client against accidental media loss, etc.).
Personally, I would say:
.NET 3.5
WCF Data Services (for communication between the client app and your data)
SQL Server 2k5/2k8 (whichever you can use)
Silverlight w/ Out of Browser Functionality
VistaDB (to store data locally on the client until you can push to the server)
Use a uniqueidentifier for the key if you are creating records while offline and not connected, updating the database when you do connect.
This is going to be way easier than using an auto-increment key.
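A sketch of the idea; the entity is made up for illustration:

    using System;

    public class OfflineRecord
    {
        // Generated on the client, so two disconnected laptops can both
        // create records with no risk of colliding primary keys at sync time.
        public Guid Id = Guid.NewGuid();
        public string Notes;
    }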
Having worked on an occasionally connected application, I'd encourage you to look into SQL Server CE for the client machines, with Sync Services to handle the connections. Here is a good tutorial.
You could create this stuff from the ground up, it seems.
However, this seems an awful lot like a CRM application, and it wouldn't surprise me if you could find an enterprise software package to do this without starting from scratch and instead modify one of the configurations to meet your business rules.
In a previous life, I was a configuration developer for this thing called Siebel that might be close to what you're looking for. They even have a built-in synchronization tool called Siebel Remote.
It might be a cheaper route to go than rolling your own from scratch.
I wrote an order taking program for wine sales reps. Here is the video. The client software is installed using click-once. That also installs SQL Server Express and loads the database. I used the Microsoft Sync Framework to sync the local database with the one on the server (see the last section of the video.)
With how powerful clients are now, I don't see any reason not to use SQL Server Express; it is free with a limit of 4 GB.
SQL CE had too many limitations - no stored procs being a major one.
You will need to use GUIDs everywhere as the primary key - see the new NewSequentialID().
I love click-once, it is a big time saver.
I'm looking forward to Silverlight, but just haven't had time to look into it. Not sure whether I would do it with Silverlight if I were starting now.
Having said all this, this is not a project for anyone inexperienced. So I would also get some very experienced help.

Any ORMs that work with MS-Access (for prototyping)?

I'm in the early stages of a project, and it's not clear yet whether we'll need a "real" database (i.e. SQL Server et al). So I've been doing some prototyping using MS-Access, which is working fine so far. (Developing in C#/VS2008/.NET 3.5/MS-Access 2000.)
However, the object-relational impedance mismatch is already becoming annoying, and will only get worse as the project evolves.
I have not been able to find an ORM that will work with MS-Access. Any suggestions?
Edit - Follow Up
We ended up using Fluent NHibernate, mainly because it Automaps our object model to a relational database, which has been a huge win for us. Most of the FNH code samples we found used SQLite, and this worked so well that we intend to use it for our production database. (The app is a desktop scientific data collection and analysis package).
MS Access files can be set up as an ODBC source on Windows machines, and almost any ORM will allow you to use ODBC. Here is a quick tutorial on how to set that up; it's outlined for Win2k, but the process is the same for XP and later. You also need to have MDAC installed on your box.
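Once the driver is set up, connecting from C# is a DSN-less one-liner; the file path and table name below are just examples:

    using System;
    using System.Data.Odbc;

    class AccessOdbcDemo
    {
        static void Main()
        {
            const string connString =
                @"Driver={Microsoft Access Driver (*.mdb)};Dbq=C:\data\prototype.mdb;";

            using (var conn = new OdbcConnection(connString))
            using (var cmd = new OdbcCommand("SELECT COUNT(*) FROM Samples", conn))
            {
                conn.Open();
                Console.WriteLine("Row count: {0}", cmd.ExecuteScalar());
            }
        }
    }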
NHibernate seems to have native support for MS Access as well; see here. I've never used it, though. It also has an ODBC driver. Many others support ODBC as well.
And again, as others are saying: MS Access does not scale... period. Installing a real database server is fairly easy, so I'd recommend SQL Server Express as others have, or even MySQL or PostgreSQL, whichever is easier to set up.
If this is an application that you intend to deploy to clients, with each client having their own unique database, I would recommend another solution entirely, SQLite. SQLite gives you database power on an app by app basis. If you have a central database server, one of the previously mentioned solutions would be best.
There's only one scenario when choosing the Access Database Engine is a good choice: when building a self-contained Access application using Access Forms (though choosing to use Access in the first place is a questionable choice ;)
The database engine that VS2008 plays nicest with is SQL Server and you will have no problem finding an ORM that plays nice with SQL Server.
Can't give you an answer to your question, but instead of Access you might want to consider one of the following options:
SQL Server Express: is free and compatible with the full SQL Server
SQL Server Compact: also free, does not require any deployment/installation, does not support all features (e.g. no stored procedures).
At this stage, if you are unsure whether you need a "real" database or not, I'd skip MS Access and go straight to sql server express. It's free and still allows you to do everything you need to.
Plus, if you later decide you need to scale up, then you can without any pain.
I recommend using something like Microsoft SQL Server or PostgreSQL for prototyping. If you don't want to learn specific SQL syntax and install special tools for designing the database schema, you can use an ORM that automatically generates the database schema from your persistent class declarations. Anyway, this approach is very effective for prototyping.
LLBLGen works with Access
Access is just a bad, bad idea. I believe MS only includes Access in Office to keep legacy users happy.
Even if you find an ORM that will work with an Access database, with few exceptions you're locking yourself into a niche tool that likely will not work out-of-the box with a real database engine. If you decide to switch to a real database engine later on, you'll not only have to deal with migrating the database, but switching to a different ORM.
See this comparison between SQL Server Express and SQL Server Compact. The comparison document also mentions some problems with other data stores, including Access.
If you are REALLY concerned about being able to install SQL Server Express, consider SQL Server Compact:
it can be linked into your redistributable app. No need to install a service (which may require admin rights during install of your application); everything is taken care of when you install your app. This makes the most sense if you need the data to reside on the user's machine instead of a server, and is most analogous to using Access.
It's less powerful than Express (doesn't support views, triggers, stored procedures, which I consider a requirement)
Can be scaled up to Express or other SQL Server versions very easily
Suitable for small-footprint installs like tablets, mobile devices, etc.
Always keep scalability in mind when designing any application. You don't want to wind up having to write a PHP->C++ compiler if/when your app becomes successful just because you picked the wrong tool up front.
While we're at it:
The big issue with Access (or, in this case, the Jet engine, which is the part you'd really be using when integrating an Access database with a .NET app) is that there is no "server" that handles database requests. The engine, hosted in your app, must read and write directly to a file on disk that contains the database. Whenever this happens, the file must be locked to prevent concurrent writes. Dirty reads become more common as the number of users grows, as does the potential for database corruption.
Imagine having every customer at a large restaurant trying to simultaneously enter the kitchen to write down their orders or retrieve their food. Chaos would result. There'd be a lot of broken dishes, the kitchen would be a mess, you'd be lucky to get what you ordered in any sort of edible condition. With one customer, this probably works fine. With 5, eh, maybe. With 20,50,1000? Not so much.
So, the restaurant industry introduced waiters and managers that buffer IO to the kitchen. The database server application does something roughly analogous to this by restricting access to the files on disk. Everyone gets what they want, faster and in a much more reliable way, and the data store is protected.

Sometimes Connected CRUD application DAL

I am working on a Sometimes Connected CRUD application that will be primarily used by teams (2-4) of Social Workers and Nurses to track patient information in the form of a plan. The application is a reworking of an ASP.NET app that was created before my time. There are approx. 200 tables across 4 databases. The web app version relied heavily on stored procedures, but since this version is a WinForms app that will be pointing to a local db, I see no reason to continue with them. Also of note: I had planned to use merge replication to handle the syncing portion, and there seem to be some issues with those two together.
I am trying to understand what approach to use for the DAL. I originally had planned to use LINQ to SQL, but I have read tidbits stating that it doesn't work in a sometimes-connected setting. I have therefore been trying to read about and experiment with numerous solutions: SubSonic, NHibernate, Entity Framework. This is a relatively simple application, and due to a "looming" version 3 redesign, this effort can be borderline "throwaway." The emphasis here is on getting a desktop version up and running ASAP.
What I am asking here is for anyone with experience using any of these technologies (or one I didn't list) to lend me your hard-earned wisdom. What, in your opinion, is the best approach for me to pursue? Any other insights on creating this kind of app? I am really struggling with the DAL portion of this program.
Thank you!
If the stored procedures do what you want them to, I would have to say I'm dubious that you will get benefits by throwing them away and reimplementing them. Moreover, it shouldn't matter if you use stored procedures or LINQ to SQL style data access when it comes time to replicate your data back to the master database, so worrying about which DAL you use seems to be a red herring.
The tricky part about sometimes connected applications is coming up with a good conflict resolution system. My suggestions:
Always use RowGuids as your primary keys to tables. Merge replication works best if you always have new records uniquely keyed.
Realize that merge replication can only do so much: it is great for bringing new data in disparate systems together. It can even figure out one-sided updates. It can't magically determine that your new record and my new record are actually the same, nor can it really deal with changes on both sides without human intervention or priority rules.
Because of this, you will need "matching" rules to resolve records that are claiming to be new, but actually aren't. Note that this is a fuzzy step: rarely can you rely on a unique key to actually be entered exactly the same on both sides and without error. This means giving weighted matches where many of your indicators are the same or similar.
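A toy example of what a weighted match could look like; the fields, weights, and threshold are made up and would need tuning against your real data quality:

    using System;

    public class PatientStub
    {
        public string LastName;
        public DateTime DateOfBirth;
        public string PostalCode;
    }

    public static class Matcher
    {
        // Returns a score in [0, 1]; a threshold (say 0.8) decides whether
        // two "new" records are really the same person and should be merged.
        public static double MatchScore(PatientStub a, PatientStub b)
        {
            double score = 0.0;
            if (string.Equals(a.LastName, b.LastName, StringComparison.OrdinalIgnoreCase))
                score += 0.4;
            if (a.DateOfBirth.Date == b.DateOfBirth.Date)
                score += 0.4;
            if (string.Equals(a.PostalCode, b.PostalCode, StringComparison.OrdinalIgnoreCase))
                score += 0.2;
            return score;
        }
    }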
The user interface for resolving conflicts and matching up "new" records with the original needs to be easy to operate. I use something that looks similar to the classic three way merge that many source control systems use: Record A, Record B, Merged Record. They can default the Merged Record to A or B by clicking a header button, and can select each field by clicking against them as well. Finally, Merged Records fields are open for edit, because sometimes you need to take parts of the address (say) from A and B.
None of this should affect your data access layer in the slightest: this is all either lower level (merge replication, provided by the database itself) or higher level (conflict resolution, provided by your business rules for resolution) than your DAL.
If you can install a db system locally, go for something you feel familiar with. The greatest problem, I think, will be the syncing and merging part. You must think through several possibilities, e.g.: you changed something that someone else deleted on the server. Who decides?
Never used the Sync Framework myself, just read an article. But this may give you a solid foundation to build on. Whichever way you go with data access, though, the solution for the business logic will probably have a much wider impact...
There is a sample app called IssueVision that Microsoft put out back in 2004.
http://windowsclient.net/downloads/folders/starterkits/entry1268.aspx
Found the link in an old thread on joelonsoftware.com. http://discuss.joelonsoftware.com/default.asp?joel.3.25830.10
Other ideas...
What about mobile broadband? A couple of 3G cellular cards would work tomorrow, and your app would need no changes, apart from large pages/graphics.
An Excel spreadsheet used in the field, with DTS or SSIS to import the data into the application, while a "better" solution is created.
Good luck!
If by SPs you mean stored procedures... I'm not sure I understand your reasoning for trying to move away from them, considering that they're fast, proven, and already written for you (i.e. tested).
Surely, if you're making an app that will mimic the original, there are definite merits to keeping as much of the original (working) codebase as possible - not least of which is speed.
I'd try installing a local copy of the db, and then pushing all affected records since the last connected period to the master db when it does get connected.
