SQL Server: Setting up a cloud database for syncing to a local SQLite database

amazon-rds, data-synchronization, sql-server, sql-server-2014, sqlite

Background

I have a SQL Server database hosted on AWS RDS, and there are web applications and web APIs that talk to the database. The database is multi-tenant, and we are currently using SQL Server 2014, although we can upgrade if required.

A third party developed a local client application on our behalf, which has its own SQLite database. The application is developed in Xamarin, so it runs on Windows, iOS and Android. The local SQLite database must be kept in sync with the cloud database.

Syncing data up to the cloud database is not a problem, but syncing data down is causing us issues. Currently, we sync data down to the local database by asking the web API, every minute, to return all changes that have occurred since a particular date. Every table in the cloud database has DateCreated, DateModified and DateDeleted columns, and these columns are queried to see what data has changed since the client last synced. The local application records the last successful sync date for each table.
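To make the current sync-down mechanism concrete, here is a minimal sketch of the per-table query it implies, using Python's built-in sqlite3 module as a stand-in for the cloud database. The table name TableA, the TenantId and Name columns, and the ISO-8601 text timestamps are assumptions for illustration; the real query would run in SQL Server behind the web API.

```python
import sqlite3

# Hypothetical cloud table with the DateCreated/DateModified/DateDeleted
# columns described above. Timestamps are ISO-8601 text, so plain string
# comparison orders them correctly.
SCHEMA = """
CREATE TABLE TableA (
    Id           INTEGER PRIMARY KEY,
    TenantId     INTEGER NOT NULL,
    Name         TEXT,
    DateCreated  TEXT NOT NULL,
    DateModified TEXT,
    DateDeleted  TEXT
);
"""


def make_sample_db():
    """Build an in-memory database with a few sample rows for two tenants."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO TableA VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, 10, "a", "2024-01-01T00:00:00", None, None),
            (2, 10, "b", "2024-01-01T00:00:00", "2024-01-02T09:00:00", None),
            (3, 10, "c", "2024-01-01T00:00:00", None, "2024-01-02T10:00:00"),
            (4, 20, "d", "2024-01-03T00:00:00", None, None),
        ],
    )
    return conn


def changes_since(conn, tenant_id, last_sync):
    """Return (upserts, deleted_ids) for one tenant's rows touched after last_sync."""
    rows = conn.execute(
        """
        SELECT Id, Name, DateDeleted
        FROM TableA
        WHERE TenantId = ?
          AND (DateCreated > ? OR DateModified > ? OR DateDeleted > ?)
        ORDER BY Id
        """,
        (tenant_id, last_sync, last_sync, last_sync),
    ).fetchall()
    upserts = [(row_id, name) for row_id, name, deleted in rows if deleted is None]
    deleted_ids = [row_id for row_id, _, deleted in rows if deleted is not None]
    return upserts, deleted_ids


if __name__ == "__main__":
    print(changes_since(make_sample_db(), 10, "2024-01-02T00:00:00"))
    # ([(2, 'b')], [3])
```

Every client runs one query like this per table every minute, even when nothing has changed, and an OR across three date columns tends to be hard to serve from a single index; that is one plausible reason the load grows with the number of clients times the number of tables.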

Problem

This approach worked when there were few local clients and few tables to sync, but as our client base has grown, it doesn't seem scalable. We are running into performance issues on our cloud database, and the sync-down tasks are often cancelled due to timeouts or take a very long time to run. Our customers are complaining about how long it takes for changes they make in the cloud to sync down to the local application.

Potential Solution

Having researched various methods of tracking changes in SQL Server, I believe that the built-in Change Tracking feature is a better approach than querying the DateCreated, DateModified and DateDeleted columns. What I am not sure about is how best to set it up.
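For comparison, SQL Server's Change Tracking replaces date comparisons with a database-wide version number: you enable it with ALTER DATABASE ... SET CHANGE_TRACKING = ON and ALTER TABLE ... ENABLE CHANGE_TRACKING, read changes with CHANGETABLE(CHANGES dbo.TableA, @last_sync_version), store CHANGE_TRACKING_CURRENT_VERSION() as the new sync point, and fall back to a full re-sync when the stored version is older than CHANGE_TRACKING_MIN_VALID_VERSION(OBJECT_ID('dbo.TableA')). The sketch below only simulates that protocol in SQLite with triggers so it can run anywhere. The ct_version and ct_changes tables are hypothetical, and unlike the real feature this simulation bumps the version per row rather than per committed transaction.

```python
import sqlite3

# SQLite simulation of Change Tracking's version protocol (not the real feature).
SETUP = """
CREATE TABLE TableA (Id INTEGER PRIMARY KEY, Name TEXT);

-- Hypothetical single-row table: current version and oldest valid sync point.
CREATE TABLE ct_version (v INTEGER NOT NULL, min_valid INTEGER NOT NULL);
INSERT INTO ct_version VALUES (0, 0);

-- One row per changed key, holding only its latest change.
CREATE TABLE ct_changes (
    Id        INTEGER PRIMARY KEY,
    Operation TEXT NOT NULL,      -- 'I', 'U' or 'D'
    Version   INTEGER NOT NULL
);

CREATE TRIGGER TableA_ins AFTER INSERT ON TableA BEGIN
    UPDATE ct_version SET v = v + 1;
    INSERT OR REPLACE INTO ct_changes VALUES (NEW.Id, 'I', (SELECT v FROM ct_version));
END;
CREATE TRIGGER TableA_upd AFTER UPDATE ON TableA BEGIN
    UPDATE ct_version SET v = v + 1;
    INSERT OR REPLACE INTO ct_changes VALUES (NEW.Id, 'U', (SELECT v FROM ct_version));
END;
CREATE TRIGGER TableA_del AFTER DELETE ON TableA BEGIN
    UPDATE ct_version SET v = v + 1;
    INSERT OR REPLACE INTO ct_changes VALUES (OLD.Id, 'D', (SELECT v FROM ct_version));
END;
"""


def new_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    return conn


def get_changes(conn, last_version):
    """Return (current_version, [(Id, Operation, Name)]) for changes after last_version.

    Deleted rows come back with Name = None, mirroring a LEFT JOIN from
    CHANGETABLE to the base table. Raises LookupError when history the
    client needs has been purged, i.e. it must do a full re-sync.
    """
    current, min_valid = conn.execute("SELECT v, min_valid FROM ct_version").fetchone()
    if last_version < min_valid:
        raise LookupError("change history purged; full re-sync required")
    rows = conn.execute(
        """
        SELECT c.Id, c.Operation, t.Name
        FROM ct_changes AS c
        LEFT JOIN TableA AS t ON t.Id = c.Id
        WHERE c.Version > ?
        ORDER BY c.Version
        """,
        (last_version,),
    ).fetchall()
    return current, rows


def cleanup(conn, purge_up_to):
    """Drop history at or below purge_up_to, like retention-based auto cleanup."""
    conn.execute("DELETE FROM ct_changes WHERE Version <= ?", (purge_up_to,))
    conn.execute("UPDATE ct_version SET min_valid = ?", (purge_up_to,))
```

The client stores the returned version after applying the changes and sends it on its next request. On a real server, the Change Tracking documentation recommends reading the current version and the changes under snapshot isolation so that the two are consistent with each other.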

Things to consider:

Not all columns on the cloud database tables need to sync to the local database. For example, TableA on the cloud database has 20 columns, but its corresponding client TableA may have only 5
Not all data relating to a tenant needs to sync to their local database; for example, if a record is marked as "inactive" for that tenant, it should never be synced locally
A table on the local database may contain data from two or more tables on the cloud database
Not all tenants have the local application yet but they will eventually (this may take a year or more to roll out)
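One way to express the column, row and multi-table constraints above in a single place is a projection view per local table, which the sync-down query then reads instead of the raw tables. The sketch below uses SQLite via Python for runnability; the Customer and CustomerAddress tables, their columns and the IsActive flag are hypothetical stand-ins for the real cloud schema.

```python
import sqlite3

SETUP = """
-- Hypothetical cloud tables (the real ones would have many more columns).
CREATE TABLE Customer (
    Id INTEGER PRIMARY KEY, TenantId INTEGER NOT NULL, Name TEXT,
    Email TEXT, InternalNotes TEXT, IsActive INTEGER NOT NULL
);
CREATE TABLE CustomerAddress (CustomerId INTEGER PRIMARY KEY, City TEXT);

-- Shape of the local table: a column subset (no Email/InternalNotes),
-- active rows only, and data merged from two cloud tables.
CREATE VIEW SyncCustomer AS
SELECT c.Id, c.TenantId, c.Name, a.City
FROM Customer AS c
LEFT JOIN CustomerAddress AS a ON a.CustomerId = c.Id
WHERE c.IsActive = 1;
"""


def new_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    return conn


def rows_for_tenant(conn, tenant_id):
    """Rows shaped like the local table, for one tenant."""
    return conn.execute(
        "SELECT Id, Name, City FROM SyncCustomer WHERE TenantId = ? ORDER BY Id",
        (tenant_id,),
    ).fetchall()


if __name__ == "__main__":
    conn = new_db()
    conn.executemany(
        "INSERT INTO Customer VALUES (?, ?, ?, ?, ?, ?)",
        [(1, 10, "Ann", "ann@example.com", "vip", 1),
         (2, 10, "Bob", "bob@example.com", "", 0),
         (3, 20, "Cy", "cy@example.com", "", 1)],
    )
    conn.execute("INSERT INTO CustomerAddress VALUES (1, 'Leeds')")
    print(rows_for_tenant(conn, 10))  # [(1, 'Ann', 'Leeds')]
```

This is the query-time approach: the filtering and joining run on every sync request. The plan below moves the same filtering to write time instead.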

What I am thinking of doing is as follows:

Create a separate database in AWS RDS that exactly matches the local database
Enable change tracking on this database rather than on the main database
Use triggers on the main database to keep the new database in sync with it
Query the change tracking tables on the new database and return the changes to the local application
Create a new table to track if data has changed or not for each tenant and table – this way we won't need to query the change tracking tables each minute only to find that nothing has changed
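The last step of this plan, a per-tenant, per-table record of whether anything has changed, could look roughly like the sketch below (SQLite via Python again; all table and function names are hypothetical). One assumption worth flagging: it stores an ever-increasing change counter rather than a true/false flag, because with several devices per tenant there is no single moment at which a boolean flag could safely be reset. Each client instead remembers the counter value it last saw.

```python
import sqlite3

SETUP = """
-- Hypothetical synced table in the second (sync) database.
CREATE TABLE TableA (Id INTEGER PRIMARY KEY, TenantId INTEGER NOT NULL, Name TEXT);

-- One row per (tenant, table); ChangeCount only ever increases.
CREATE TABLE SyncStatus (
    TenantId    INTEGER NOT NULL,
    TableName   TEXT NOT NULL,
    ChangeCount INTEGER NOT NULL,
    PRIMARY KEY (TenantId, TableName)
);

CREATE TRIGGER TableA_ins AFTER INSERT ON TableA BEGIN
    INSERT OR IGNORE INTO SyncStatus VALUES (NEW.TenantId, 'TableA', 0);
    UPDATE SyncStatus SET ChangeCount = ChangeCount + 1
    WHERE TenantId = NEW.TenantId AND TableName = 'TableA';
END;
-- Assumes a row never moves to a different tenant.
CREATE TRIGGER TableA_upd AFTER UPDATE ON TableA BEGIN
    INSERT OR IGNORE INTO SyncStatus VALUES (NEW.TenantId, 'TableA', 0);
    UPDATE SyncStatus SET ChangeCount = ChangeCount + 1
    WHERE TenantId = NEW.TenantId AND TableName = 'TableA';
END;
CREATE TRIGGER TableA_del AFTER DELETE ON TableA BEGIN
    INSERT OR IGNORE INTO SyncStatus VALUES (OLD.TenantId, 'TableA', 0);
    UPDATE SyncStatus SET ChangeCount = ChangeCount + 1
    WHERE TenantId = OLD.TenantId AND TableName = 'TableA';
END;
"""


def new_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    return conn


def has_changes(conn, tenant_id, table_name, last_seen):
    """Cheap primary-key lookup returning (changed_since_last_seen, current_count)."""
    row = conn.execute(
        "SELECT ChangeCount FROM SyncStatus WHERE TenantId = ? AND TableName = ?",
        (tenant_id, table_name),
    ).fetchone()
    current = row[0] if row else 0
    return current != last_seen, current
```

A client would call this first and query the change tracking tables only when it returns True. It should store the count it read *before* fetching the changes: anything that happens during the fetch then bumps the counter again and is picked up on the next run rather than lost.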

The reason for the second database is to reduce the strain on the main database when clients sync data down. Keeping its schema identical to the local database also reduces the complexity of the queries that run when a client requests changes. For example, if a record is marked as "inactive" for the tenant in the main database but that record has been changed, I don't want to have to filter it out when the client requests to sync the data down. I would prefer those records to be filtered out already, so that they never exist in the second database at all. Hope that makes sense!
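The write-time filtering described here could be done by the triggers from the plan above. The sketch below keeps a projected, active-only copy of a hypothetical Customer table in a hypothetical SyncCustomer table. To stay runnable it uses one SQLite database for both sides and SQLite's row-level NEW/OLD triggers; in SQL Server the triggers would be statement-level, read the inserted and deleted pseudo-tables, and (assuming both databases live on the same instance) write to the second database using three-part names.

```python
import sqlite3

SETUP = """
-- Hypothetical main-database table.
CREATE TABLE Customer (
    Id INTEGER PRIMARY KEY, TenantId INTEGER NOT NULL, Name TEXT,
    Email TEXT, InternalNotes TEXT, IsActive INTEGER NOT NULL
);
-- Hypothetical mirror, shaped exactly like the local table.
CREATE TABLE SyncCustomer (Id INTEGER PRIMARY KEY, TenantId INTEGER NOT NULL, Name TEXT);

CREATE TRIGGER Customer_ins AFTER INSERT ON Customer WHEN NEW.IsActive = 1 BEGIN
    INSERT INTO SyncCustomer VALUES (NEW.Id, NEW.TenantId, NEW.Name);
END;

-- Fires only when a mirrored column or the filter column is updated, so edits
-- to Email or InternalNotes never touch the mirror. Assumes Id never changes.
CREATE TRIGGER Customer_upd AFTER UPDATE OF TenantId, Name, IsActive ON Customer BEGIN
    -- Active and already mirrored: refresh the mirrored columns.
    UPDATE SyncCustomer SET TenantId = NEW.TenantId, Name = NEW.Name
    WHERE Id = NEW.Id AND NEW.IsActive = 1;
    -- Became active: add it.
    INSERT INTO SyncCustomer
    SELECT NEW.Id, NEW.TenantId, NEW.Name
    WHERE NEW.IsActive = 1
      AND NOT EXISTS (SELECT 1 FROM SyncCustomer WHERE Id = NEW.Id);
    -- Became inactive: remove it, so the client later sees a delete.
    DELETE FROM SyncCustomer WHERE Id = NEW.Id AND NEW.IsActive = 0;
END;

CREATE TRIGGER Customer_del AFTER DELETE ON Customer BEGIN
    DELETE FROM SyncCustomer WHERE Id = OLD.Id;
END;
"""


def new_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    return conn


def mirror(conn):
    """Current contents of the filtered copy."""
    return conn.execute("SELECT Id, TenantId, Name FROM SyncCustomer ORDER BY Id").fetchall()
```

The cost of this design is that every write to the main database now also writes to the second database inside the same transaction, so the triggers add latency to the web applications' writes; that trade-off is worth measuring before committing to it.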

I would very much value your feedback on this approach, and please feel free to suggest better ways of doing it. If something is not clear, please let me know and I'll update the question!

Best Answer

While SQL Server's change-tracking features (Change Tracking and Change Data Capture, or CDC) are useful, they can also add noticeable performance overhead, depending on how much data you have and how frequently it changes.

You might want to consider a third-party tool to manage synchronization between SQL Server and SQLite, which can help minimize the performance and maintenance overhead of a homebrew solution. For example, SQLiteSync by Amplifier is supposed to work well for this kind of use case (note that I haven't used it myself, though).

https://ampliapps.com/sqlite-sync/

Related Solutions

Clarification on Cloud Computing Database

Nope, you cannot work offline and then sync your changes. But this is a nice idea for a product! :)

SQL Server: Syncing two databases in SQL Server

Well, I might not have fully understood the question, but I'll try to answer it.

You said you need a high-performance solution that runs often (at least every 2 minutes), and a good approach that is fast and doesn't cause locking. But you don't want a black-box system.

Instead of a black-box system, which is used in millions of installations with good results, you are trying to reinvent the wheel and build your own solution? Hm, that sounds a bit odd.

Here are my suggestions.

Replication, even though you said you won't use it. It's the easiest and best solution for this: replication is easy to set up, it replicates quickly, and you don't have to reinvent the wheel. If you're just worried about locking, you can enable READ_COMMITTED_SNAPSHOT on the database, so that the default READ COMMITTED isolation level uses row versioning instead of shared locks. This uses part of your tempdb (for the version store), but readers no longer block writers, so your tables stay readable and writable while replication works in the background.

See the example below:

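-- Lets sessions opt in to SNAPSHOT isolation explicitly
ALTER DATABASE yourDatabase SET ALLOW_SNAPSHOT_ISOLATION ON;
-- Makes READ COMMITTED use row versions; must be the only connection in the database while it runs
ALTER DATABASE yourDatabase SET READ_COMMITTED_SNAPSHOT ON;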

CDC (Change Data Capture) can also be a solution, but with it you need to build nearly everything yourself, and in my experience CDC can be fragile in some circumstances. CDC captures all changes to a watched table (you need to enable each watched table manually). You then get the row values before and/or after each INSERT, UPDATE or DELETE, and CDC retains that information for a period of time that you can configure. The approach would be to enable CDC on the tables you need to watch and manually replicate those changes to the other database. By the way, CDC shares the log reader mechanism of SQL Server transactional replication under the hood. ;-)

Warning: CDC does not adapt to DDL changes. If you add a new column to a watched table, the existing capture instance keeps watching the table but does not capture the new column; if you drop a captured column, CDC records NULL for it from then on. After DDL changes to a watched table, you need to create a new capture instance for it (in effect, reinitialize CDC for that table).
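To illustrate the "capture, then replicate manually" approach and the warning above, here is a trigger-based imitation in SQLite via Python. It is not CDC: real CDC reads the transaction log asynchronously and is enabled with sys.sp_cdc_enable_db and sys.sp_cdc_enable_table. The sketch only borrows CDC's operation codes (1 = delete, 2 = insert, 3 = update before image, 4 = update after image); the Item and cdc_Item tables and the replicate() helper are hypothetical. Because the triggers list their columns explicitly, they would, much like a CDC capture instance, silently ignore a column added later.

```python
import sqlite3

SOURCE_SETUP = """
CREATE TABLE Item (Id INTEGER PRIMARY KEY, Name TEXT, Qty INTEGER);
-- Hypothetical change table; op codes follow CDC's __$operation convention:
-- 1 = delete, 2 = insert, 3 = update (before image), 4 = update (after image)
CREATE TABLE cdc_Item (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,
    op INTEGER NOT NULL, Id INTEGER, Name TEXT, Qty INTEGER
);
CREATE TRIGGER Item_ins AFTER INSERT ON Item BEGIN
    INSERT INTO cdc_Item (op, Id, Name, Qty) VALUES (2, NEW.Id, NEW.Name, NEW.Qty);
END;
CREATE TRIGGER Item_upd AFTER UPDATE ON Item BEGIN
    INSERT INTO cdc_Item (op, Id, Name, Qty) VALUES (3, OLD.Id, OLD.Name, OLD.Qty);
    INSERT INTO cdc_Item (op, Id, Name, Qty) VALUES (4, NEW.Id, NEW.Name, NEW.Qty);
END;
CREATE TRIGGER Item_del AFTER DELETE ON Item BEGIN
    INSERT INTO cdc_Item (op, Id, Name, Qty) VALUES (1, OLD.Id, OLD.Name, OLD.Qty);
END;
"""


def make_pair():
    """Two separate in-memory databases: a watched source and a copy."""
    src = sqlite3.connect(":memory:")
    src.executescript(SOURCE_SETUP)
    dst = sqlite3.connect(":memory:")
    dst.execute("CREATE TABLE Item (Id INTEGER PRIMARY KEY, Name TEXT, Qty INTEGER)")
    return src, dst


def replicate(src, dst, last_seq):
    """Apply captured changes with seq > last_seq to dst; return the new last_seq."""
    old_id = None
    for seq, op, id_, name, qty in src.execute(
        "SELECT seq, op, Id, Name, Qty FROM cdc_Item WHERE seq > ? ORDER BY seq",
        (last_seq,),
    ):
        if op == 2:
            dst.execute("INSERT INTO Item VALUES (?, ?, ?)", (id_, name, qty))
        elif op == 3:
            old_id = id_  # before image: remember which row the update targets
        elif op == 4:
            dst.execute(
                "UPDATE Item SET Id = ?, Name = ?, Qty = ? WHERE Id = ?",
                (id_, name, qty, old_id),
            )
        elif op == 1:
            dst.execute("DELETE FROM Item WHERE Id = ?", (id_,))
        last_seq = seq
    dst.commit()
    return last_seq
```

The caller persists the returned sequence number (on a real server, an LSN) and passes it back on the next run, which is the bookkeeping you would have to build yourself on top of CDC.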

The approach you described above is something like capturing a workload with SQL Server Profiler and replaying it against another database for benchmarking. It could work, but it has too many side effects for my taste. What happens if you capture a procedure call on a client and then run the same command against your principal database while it is out of sync? The procedure may run successfully, but it may delete, update or insert rows that were not present on the client. And how do you handle multiple clients with one principal? I think this is too tricky; in the worst case, you could compromise data integrity.

Another option is an application-based or trigger-based approach, depending on how many tables you want to sync. You can write all changes to a separate staging table and run a SQL Server Agent job every x minutes to sync the rows in the staging table with your master. But this may be too heavy if you try to sync, say, 150 tables; you would have a lot of overhead.
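The staging-table variant could be sketched as follows, again with SQLite via Python standing in for SQL Server. Triggers append every change to a hypothetical Staging table, and the hypothetical run_sync_job() function plays the role of the SQL Server Agent job: it applies a batch to the master in order, commits, and only then deletes the processed staging rows. Because the upserts and deletes are idempotent, a crash between the two commits just replays the batch on the next run.

```python
import sqlite3

CLIENT_SETUP = """
CREATE TABLE Orders (Id INTEGER PRIMARY KEY, Status TEXT);
-- Hypothetical staging table filled by triggers; seq preserves change order.
CREATE TABLE Staging (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,
    Id INTEGER NOT NULL, Status TEXT, IsDelete INTEGER NOT NULL
);
CREATE TRIGGER Orders_ins AFTER INSERT ON Orders BEGIN
    INSERT INTO Staging (Id, Status, IsDelete) VALUES (NEW.Id, NEW.Status, 0);
END;
-- Assumes Id is never updated.
CREATE TRIGGER Orders_upd AFTER UPDATE ON Orders BEGIN
    INSERT INTO Staging (Id, Status, IsDelete) VALUES (NEW.Id, NEW.Status, 0);
END;
CREATE TRIGGER Orders_del AFTER DELETE ON Orders BEGIN
    INSERT INTO Staging (Id, Status, IsDelete) VALUES (OLD.Id, NULL, 1);
END;
"""


def make_pair():
    client = sqlite3.connect(":memory:")
    client.executescript(CLIENT_SETUP)
    master = sqlite3.connect(":memory:")
    master.execute("CREATE TABLE Orders (Id INTEGER PRIMARY KEY, Status TEXT)")
    return client, master


def run_sync_job(client, master, batch_size=100):
    """One run of the periodic job; returns how many staged changes it applied."""
    rows = client.execute(
        "SELECT seq, Id, Status, IsDelete FROM Staging ORDER BY seq LIMIT ?",
        (batch_size,),
    ).fetchall()
    for _, row_id, status, is_delete in rows:
        if is_delete:
            master.execute("DELETE FROM Orders WHERE Id = ?", (row_id,))
        else:
            # Upsert: insert the row, or overwrite it if it already exists.
            master.execute("INSERT OR REPLACE INTO Orders VALUES (?, ?)", (row_id, status))
    master.commit()
    if rows:
        # Remove staged rows only after the master has committed them.
        client.execute("DELETE FROM Staging WHERE seq <= ?", (rows[-1][0],))
        client.commit()
    return len(rows)
```

Every synced table needs its own triggers and staging table in this scheme, which is where the overhead for something like 150 tables comes from.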

Well, those are my two cents. Hopefully this gives you a good overview, and maybe you'll find a solution that works for you.
