SQL Server – Automate refresh of one table only from one environment to another

sql-server, t-sql, transactional-replication

I'm looking for a best practice/option to refresh a single table from the Prod to the Staging environment.

Technical Details:

  1. The table in the source is loaded daily with new data.
  2. The table in the destination should contain only the new data after each refresh.

Some of the options I'm thinking:

  1. Replication (Transactional)
  2. SSIS
  3. TSQL Script – SQL Agent Job

My first preference would be to go with option 3. Will this be a good option? If yes, can someone guide me to a script?

Best Answer

It's a question of preference. Below are some advantages and disadvantages I found while working with each of them.

  1. Replication (Transactional) - works excellently as long as you don't need to touch the table. It's real time, so you will make your developers/testers really happy with "fresh", almost real-time data. It becomes tricky if you need to add or remove a column, or if your table structure changes, because in almost all cases you need to reinitialize the replication from scratch. For just one table of a few MB this is fast, but it becomes an issue from a few GB onward because it takes time. If you set up the replication with your own account, make sure it will not expire or get disabled, because that will break the replication; it's best to use a service account. Another thing to consider is networking and server resources: transactions replicated in real time will consume some network bandwidth and some server memory and I/O, but unless your table gets a few hundred rows (or more) modified every second, I don't see that as a real problem. You have a nice tutorial on how to set this up here.
  2. SSIS - you will not have (almost) real-time data in your staging environment, so if any of your developers or testers want to compare data between prod and stage, the refresh time needs to be considered. If you change the table structure, you need to modify the package (always make sure to also refresh the metadata inside the package). My experience is that keeping an SSIS package up to date takes more work than recreating replication, but this is subjective to personal preference and requirements. Also, if you set up your package as a 'truncate & load', you will delete and rewrite the same data every day whether or not your source table was updated. If you don't want that, you need to add some logic to your package, and that can get complicated and lead to data loss or duplicates if you don't know your data very well. Another thing to consider is access rights: to deploy the package you need SQL-Admin-like rights, and to run it from a local folder you need RDP rights to that machine plus write permissions to copy your package to a local drive on the server. You still need a SQL Agent job to automate it (I assume you don't want to run this manually). Keep in mind that you need Visual Studio with the BI add-on to create the package. You can find documentation about setting up the SSIS package here and a nice tutorial here.
  3. TSQL Script - SQL Agent Job - This is also my personal preference, as it is the easiest to set up and maintain, especially if you have no experience with SSIS and/or replication. Just like with SSIS, you will not have (almost) real-time data. You have much more control, and it's easier to adapt if your table structure changes. Depending on what kind of changes can happen to your source table, it can get tricky to synchronize your data, but you always have the option of truncating the table and inserting everything from production every time (highly NOT recommended for big tables; even if your table is small today, consider how big it will be in 3-5 years!). When I set up jobs like this, I usually take my time to understand the data and all the changes that might happen, and I use a MERGE statement as much as possible. If you have an insert-date column in your prod table, you can build your logic around that. When you set up the SQL Agent job, make sure you change the owner to a service account; otherwise the job will fail if your account gets disabled.
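For option 1, the publication side can also be scripted instead of clicking through the wizard. This is a minimal sketch, assuming the distributor is already configured; all names (ProdDb, DailyTablePub, DailyTable, StagingServer, StagingDb) are placeholders to adjust to your environment:

```sql
-- Enable the production database for publishing (assumes distribution is set up)
EXEC sp_replicationdboption
    @dbname = N'ProdDb', @optname = N'publish', @value = N'true';

-- Create a transactional publication
EXEC sp_addpublication
    @publication = N'DailyTablePub', @status = N'active';

-- Add the single table as an article
EXEC sp_addarticle
    @publication = N'DailyTablePub', @article = N'DailyTable',
    @source_owner = N'dbo', @source_object = N'DailyTable';

-- Push-subscribe the staging server/database
EXEC sp_addsubscription
    @publication = N'DailyTablePub', @subscriber = N'StagingServer',
    @destination_db = N'StagingDb', @subscription_type = N'Push';
```

You would still need to create the snapshot and distribution agent jobs (the wizard does this for you), which is why the linked tutorial is worth following end to end.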
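For option 3, the job step could be built around a MERGE roughly like the one below. This is only a sketch with hypothetical names (linked server PRODSRV, database ProdDb, table dbo.DailyTable with an Id key, two data columns, and an InsertDate column); replace the keys and column list with your actual table definition:

```sql
-- Sync staging dbo.DailyTable from production over a linked server.
-- Hypothetical names: PRODSRV, ProdDb, DailyTable, Id, Col1, Col2, InsertDate.
MERGE INTO dbo.DailyTable AS tgt
USING (
    SELECT Id, Col1, Col2, InsertDate
    FROM [PRODSRV].[ProdDb].dbo.DailyTable
) AS src
    ON tgt.Id = src.Id
-- Update rows that exist on both sides but changed in prod
WHEN MATCHED AND (tgt.Col1 <> src.Col1 OR tgt.Col2 <> src.Col2)
    THEN UPDATE SET tgt.Col1 = src.Col1,
                    tgt.Col2 = src.Col2,
                    tgt.InsertDate = src.InsertDate
-- Insert rows that are new in prod
WHEN NOT MATCHED BY TARGET
    THEN INSERT (Id, Col1, Col2, InsertDate)
         VALUES (src.Id, src.Col1, src.Col2, src.InsertDate)
-- Remove rows that were deleted in prod (drop this clause if prod never deletes)
WHEN NOT MATCHED BY SOURCE
    THEN DELETE;
```

If you have that insert-date column, you can filter the USING query on it (e.g. only rows from the last few days) so the job doesn't scan the whole production table on every run.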

For options 2 and 3, consider doing some index maintenance right after you insert/modify the data to keep performance high.