How to Set Up a Local Database Development Process for a Small Web Team

best-practices, development

Background

I am working on creating a new development process for a small web team of about 4 programmers and 4 designers, with the obvious potential to grow the team in the future.
Our product is a central application that powers client websites that we also design and host.

Previously, we all worked via FTP on a dev server with a single dev database. It "worked"* for a while, but we are moving in a new direction, so it is time to mature our process.

We use Percona Server 5.5, but the process should be database-agnostic, with the goal of keeping costs low.

Goals:

I am looking into creating a continuous integration (CI) process for database development with the following in mind:

  1. Developers have local copy of data to run development code against
  2. Able to rollback database structure to a previous changeset
  3. Able to separate new feature schema changes vs schema design fix changes
  4. Able to modify database structure locally for testing

Initial Concept

I have sketched out a process below using SVN and LiquiBase, though it fails to address #4 entirely.

  • create a 'development' branch from trunk
  • central 'development' db server runs off the 'development' branch
  • local developers are set up as replication slaves of the central development db server (provides #1 above)
  • liquibase changesets are committed regularly to the development branch; a post-commit hook applies them to the central development database, and the changes trickle down to the local machines replicating from it (liquibase provides #2 above; see the sketch after this list)
  • when features or schema fixes are ready for QA, the DBA (me) will merge the appropriate changes from the development branch into trunk. This merge generates a SQL script to apply to a staging database server.
  • Staging server should reflect TRUNK, which should have the same structure as Production, plus changes that are in QA
  • after executing the SQL script on the staging server, do some QA on the changes.
  • If all looks good, TAG the structure. This generates the .sql script to be run manually by the DBA in production (during off-peak hours if required)
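For illustration, the tag and rollback steps might look roughly like this with the Liquibase CLI. This is a sketch only: the database name, credentials and version tags are placeholders, each command is an independent example rather than a pipeline, and driver/classpath options are omitted.

    #!/bin/sh
    # Liquibase CLI sketch; appdb, dev/dev and the v1.x tags are placeholders.
    LB="liquibase --changeLogFile=changelog.xml \
        --url=jdbc:mysql://localhost/appdb --username=dev --password=dev"

    $LB update                        # apply pending changesets locally
    $LB tag v1.4                      # TAG the structure after QA sign-off
    $LB updateSQL > release-v1.4.sql  # emit the .sql for a manual production run
    $LB rollback v1.3                 # roll the structure back to a tag (goal #2)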

This process requires that all developers run off the same 'development' branch, meaning there is only one version of the database schema at any given time (not sure that I want this).

It also means that schema changes cannot be tested locally first and could affect other developers if not done right. In our environment, developers might add new tables but rarely modify existing structure; as the DBA, design fixes are done by me. The inability to test fixes locally is my biggest hangup with the process.

How can the above process be tweaked to allow local development while still maintaining a relatively up-to-date copy of the data (as provided by replication in my proposed process)? I do not even require the data to be current to within the last week.


* By 'worked', I mean it sufficed but was a PITA.

Best Answer

Managing environments

I think you definitely don't want to be forced into a single database version. You have enough developers that you will inevitably have multiple development work streams, as well as requirements to apply patches to the current production environment independently of those work streams.

You can use Liquibase or a manual process to produce the patch scripts that upgrade between versions. I suggest starting out with a manual process and using a schema comparison tool for QA on the patches. Clean, automated, transparent synchronisation of a nontrivially complex database is a bit utopian.

Your central data model can be kept in whatever system takes your fancy; I've used everything from tedious enterprise repository tools to plain CREATE TABLE scripts. CREATE TABLE scripts play nicely with ordinary source control tools such as Subversion, and not all repository tools do a good job of versioning.

Whatever you use as your master data model repository, you need a fairly clean mechanism for deploying an environment from that model. It should be structured so that rollouts to an environment are easy. You also need a mechanism to patch from one released version to the next.

What I've done

I've done the following in the past when I was managing development environments. It's not particularly high-tech, but it's amenable to version control and automated builds, which makes it easy to roll out an environment at a specific version, and maintaining a large number of environments is quite practical.

Maintain a central repository: This could be a set of database creation scripts held in a version control system, or a repository model in a data modelling tool. Take your pick. The model should have a build mechanism that allows an environment to be rolled out from the scripts without much manual intervention.

If you have a lot of reference data, you will need a load mechanism for it. Depending on how you want to do it, you could keep this in a database or in a set of files. The advantage of files is that they can be versioned and labelled in the same version control system as your code base. A bunch of CSV files and bulk loader scripts in a source control repository can do this easily enough.
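Such a loader can be little more than a loop over the versioned CSV files. A minimal sketch, assuming one CSV per reference table in a refdata/ directory (the database name is illustrative):

    #!/bin/sh
    # Load each versioned CSV into the reference table of the same name.
    for f in refdata/*.csv; do
        table=$(basename "$f" .csv)
        mysql --local-infile=1 appdb -e \
          "LOAD DATA LOCAL INFILE '$f' INTO TABLE $table
           FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
           LINES TERMINATED BY '\n' IGNORE 1 LINES;"
    done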

One option for deploying development environments is to take backups of the production database, patched to the appropriate version, and make them available for devs to restore into a development environment.
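In MySQL terms that can be as simple as publishing versioned dump files, sketched below (hostnames, paths and the version naming are made up):

    # On the source server: publish a release image once.
    mysqldump --single-transaction --routines appdb \
        | gzip > /shared/db-images/appdb-v1.4.sql.gz

    # On a developer machine: restore it locally.
    gunzip -c /shared/db-images/appdb-v1.4.sql.gz | mysql appdb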

Make it easy to roll out: Like any CI build process, the database should be deployable from a single script. Set it up so that database connections can be parameterised, or so that the script is location-independent and can simply be run over whatever connection it is given.
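Something along these lines usually suffices. A sketch, assuming a schema/ directory of create scripts named so they sort in dependency order:

    #!/bin/sh
    # Build a database at the current repository version.
    # Usage: ./build_db.sh <host> <database>
    HOST=${1:?host}; DB=${2:?database}

    mysql -h "$HOST" -e "CREATE DATABASE IF NOT EXISTS $DB"
    for script in schema/*.sql; do    # named so they sort in run order
        mysql -h "$HOST" "$DB" < "$script"
    done
    # ...reference data load would follow here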

Patch scripts: You will need roll forward and probably roll back scripts from each released version.
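For example, each release might carry a forward patch and its inverse, applied the same way (file names and host are illustrative):

    # patches/1.4_to_1.5.sql rolls forward; patches/1.5_to_1.4.sql rolls back
    mysql -h staginghost appdb < patches/1.4_to_1.5.sql   # upgrade
    mysql -h staginghost appdb < patches/1.5_to_1.4.sql   # back out if needed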

Build test environments from the repository model: This ensures that development on environments that are out of sync with the repository gets caught in testing.

Test the deployment process: Automated patching scripts, however they are created, should be testable. Schema comparison tools are quite good for this, even if you don't use them to generate the patch scripts.

  • Create a reference environment with the repository model build you tested against

  • Create a smoke test environment with either a backup of your production release or a build based on the current released version.

  • Run the patch script against the smoke test environment.

  • Use the schema comparison tool to compare the smoke test environment with the reference environment. The patch script should result in the two databases being identical, so you can investigate any differences.
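On MySQL a serviceable structure comparison can be had from mysqldump alone. A sketch, with placeholder hostnames (you may need to filter out AUTO_INCREMENT counters before diffing):

    #!/bin/sh
    # Dump structure only from both environments and diff the results.
    # Empty output means the patch reproduced the reference schema exactly.
    mysqldump --no-data --skip-comments -h refhost   appdb > reference.sql
    mysqldump --no-data --skip-comments -h smokehost appdb > smoketest.sql
    diff reference.sql smoketest.sql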

What I like about this process

This is a bit heavyweight, and was designed for deploying into fairly bureaucratic and opaque production environments. However, it has the following strengths:

  • Developers can tinker where they need to.

  • Multiple branches and streams of development can be accommodated.

  • The deployment scripts themselves can be tested in a repeatable manner. This is very helpful to shut down deployment bunfights, as the repeatability can be demonstrated.

  • The schema comparison tools provide QA on the deployment itself.

  • Testing is always done against what is expected to be released, and this will catch issues arising from environments being out of sync.

  • Deployment based on repository models and patch scripts means that uncontrolled rubbish doesn't get accidentally migrated from development environments into production.

  • A lot of the process can be automated, although it is often desirable to prepare and test the deployment patch scripts manually.

  • Environments are cheap and easy to deploy without having to jump through hoops.

  • Developers are forced to make a system that is amenable to a simple build and deployment process.

  • Developers are forced to learn basic database administration tasks, but test and production environments are insulated from noob mistakes.

How it addresses your requirements

  1. Developers have local copy of data to run development code against

    The deployment scripts or DB images mean that they can set up an environment from any version that's available.

  2. Able to rollback database structure to a previous changeset

    Again, sorted by the deployment scripts. Whether through DDL or test database backup images created through a controlled process, developers can bring up an environment at any specific version you have.

  3. Able to separate new feature schema changes vs schema design fix changes

    Patches to a common version can be maintained in a separate branch in the SVN tree. If database backups are used as the reference environments, they need to be stored somewhere with the same folder structure as the branching of the source control projects.

  4. Able to modify database structure locally for testing

    The simple deployment process lets devs tinker, easily restore a local environment to a known state, or bring up a reference environment to compare against and make change sets from.