PostgreSQL – Advice on scripting a "whitewashed" and downsized development DB from the main DB

Tags: postgresql, scripting, sqlite

Currently we (an active open-source project) have a large PostgreSQL production database which contains all the site records. We can't just allow every developer or contributor access to the database (it contains emails, passwords, phone numbers, etc.), but we do need to give developers a reasonably up-to-date SQLite database. We currently do the following:

  1. dump the database
  2. analyze and transform it with custom Python scripts: truncate tables to make the SQLite export more bearable, and strip sensitive data such as passwords (a sketch of this scrubbing step follows the list)
  3. export this to SQLite
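For step 2, the scrubbing boils down to statements along these lines, run against a restored throwaway copy of the dump. The `users` and `event_log` tables and their columns here are hypothetical, purely to illustrate the idea:

    -- Run against a restored scratch copy of the dump, never production.
    -- Overwrite sensitive columns with harmless, deterministic values.
    UPDATE users
    SET email    = 'user' || id || '@example.com',
        password = md5(id::text),  -- placeholder, not a real credential
        phone    = NULL;

    -- Downsize oversized tables before the SQLite export,
    -- e.g. keep only recent rows.
    DELETE FROM event_log
    WHERE created_at < now() - interval '30 days';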

This whole pipeline is very slow and error-prone.

Is there a recommended or best-practice way to do this, both the whitewashing and the table truncation, without breaking the foreign-key relationships between tables?

As I analyze the problem domain, I see the main problem is preserving the relational mapping between tables. I can't just pull the first thousand records from every table (with LIMIT), since row 900 in one table, for example, may hold a foreign key that points at row 1001 of another table. I guess the changing and sanitizing of data can be done with views, replacing certain columns with calculated ones (based on the original columns); a cron job could then just dump the views. A sketch of both ideas follows.
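Here is a minimal sketch of both, assuming a hypothetical schema in which orders.user_id references users(id) (all names are illustrative, not from the question):

    -- Sample the parent table first, then pull only child rows whose
    -- foreign keys point at sampled parents, so no reference dangles.
    CREATE TEMPORARY TABLE users_sample AS
    SELECT * FROM users ORDER BY id LIMIT 1000;

    CREATE TEMPORARY TABLE orders_sample AS
    SELECT o.*
    FROM orders AS o
    JOIN users_sample AS u ON o.user_id = u.id;

    -- Sanitize via a view: keep the key column intact so joins still
    -- work, and derive replacement values from the row id. The view is
    -- temporary because it reads a temporary table.
    CREATE TEMPORARY VIEW users_sanitized AS
    SELECT id,
           'user' || id || '@example.com' AS email,
           md5(id::text)                  AS password,
           NULL::text                     AS phone
    FROM users_sample;

One caveat for the cron-job idea: pg_dump exports only a view's definition, not its rows, so getting the actual data out for the SQLite import would need something like COPY (SELECT * FROM users_sanitized) TO STDOUT WITH CSV, or psql's \copy equivalent.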

I'd be glad for any help or references.

Best Answer

What you're referring to is usually called Test Data Management. There are several commercial solutions, e.g. IBM Optim Test Data Management (disclosure: I work for IBM, but don't speak for them). I'm not aware of any free products that can perform that function, so I'm afraid you'll have to stick to your Python scripts unless you have deep pockets.

On a side note, I would suggest using the same database platform, i.e. PostgreSQL, for both production and development -- it will probably save you some troubleshooting time later.