Sql-server – Setting up cloud database for syncing to local SQLite database

amazon-rdsdata synchronizationsql serversql server 2014sqlite

Background

I have a SQL Server database hosted on AWS RDS and there are web applications and WEB APIs that talk to the database. The database is a multi-tenant database and we are currently using SQL Server 2014 although we can upgrade if required.

A third-party developed a local client application on our behalf which has it's own SQLite database. This application is developed in Xamarin so it runs on Windows, iOS and Android. The local SQLite database must be kept in sync with the cloud database. Syncing data up to the cloud database is not a problem, but syncing data down is causing us issues. Currently we sync data down to the local database by asking the WEB API, every minute, to return all changes that have occurred since a particular date. The cloud database has a DateCreated, DateModified and DateDeleted column in every table and these columns are queried to see what data has changed since the last time the client synced data. The local application records the last successful sync date for each table.

Problem

This approach worked when there were few local clients and few tables to sync but as our client base has grown this approach doesn't seem scalable. We are running into performance issues on our cloud database and a lot of the time the sync-down tasks are cancelled due to timeouts or take ages to run. Our customers are complaining about the time it takes for changes they make on the cloud to sync down to the local application.

Potential Solution

Having researched various methods of tracking changes on SQL Server I believe that using the built-in Change Tracking feature is a better approach than using the DateCreated, DateModified and DateDeleted columns for tracking changes. What I am not sure about is how best to set this up.

Things to consider:

  • Not all columns on the cloud database tables need to sync to the local database – For example, TableA on the cloud database has 20 columns but its corresponding client TableA may only have 5
  • Not all data relating to a tenant needs to sync to their local database – for example if a record is marked as "inactive" for that tenant it should never be synced locally
  • A table on the local database may contain data from two or more tables on the cloud database
  • Not all tenants have the local application yet but they will eventually (this may take a year or more to roll out)

What I am thinking of doing is as follows:

  • Create a separate database in AWS RDS that exactly matches the local database
  • Enable change tracking on this database rather than on the main database
  • Use triggers to keep the main database in sync with the new database
  • Query the change tracking tables on the new database and return the changes to the local application
  • Create a new table to track if data has changed or not for each tenant and table – this way we won't need to query the change tracking tables each minute only to find that nothing has changed

The reason for the second database is to reduce the strain on the main database when clients are trying to sync data down and also keeping the schemas in sync reduces the complexity on the queries when a client requests to sync changes. For example, if a record is marked as "inactive" for the tenant in the main database, but that record has been changed, I don't want to have to filter this record out when the client requests to sync the data down. I would prefer to already have those records filtered out so that they would never exist in the second database at all. Hope that makes sense!

I would very much value your feedback on this approach and please feel free to suggest better ways of doing it. If there is something that is not clear please let me know and I'll update the question!

Best Answer

While CDC is one useful methodology in SQL Server, it can also be a little performance heavy too depending on how much data and how frequently it changes.

You might want to consider a 3rd party tool to manage your synchronization between SQL Server and SQLite, which can help minimize the performance and maintenance overhead of a homebrew solution. For example SQLiteSync by Amplifier is supposed to work well for this kind of use case (note I haven't used it myself though).

https://ampliapps.com/sqlite-sync/