PostgreSQL Recovery – Restore Database to New Version from Files

linuxpostgresqlrecovery

Scenario in short:

  • a development laptop broke
  • old HDD is still readable
  • new laptop has PostgreSQL 9.4 instead of 9.1
  • both laptops use a flavor of Ubuntu Linux

Question 1: Is my understanding correct that in order to restore the old data, I would need a server with the same major+minor version as the one that wrote the old data directory? Since no 9.1 packages are available for my distribution, I would have to compile a v9.1.x server from source, copy over the old data, start the server and perform a normal pg_dump, which could then be restored to the new cluster?

Question 2: There were around 10-15 databases on the old laptop, but since it was a development machine, in theory all of the data should be replacable, apart from some local experiments. I'm thinking of just scrapping the old data, but I can't remember with 100% certainty what those databases were. Is there a way to extract some basic information (such as database names, maybe even sizes or timestamps) from the old data directory without running a 9.1 server?

Best Answer

Looks like you've figured out question 1 for yourself already (short answer: yes, use the latest 9.1.x release, and make sure the compile-time options are the same between the version on the old and new machine to be sure the data directory, and the machines should ideally be as similar as possible in order to be binary-compatible, e.g. both x86-64, similar glibc versions, etc.).

But about question 2:

Is there a way to extract some basic information (such as database names, maybe even sizes or timestamps)

You can look under the "base" subdirectory of the data directory, and you should see something like this:

/datadir/base $ du -sh *
6.5M    1
6.1M    12292
6.5M    12297
 39M    3237152
6.3M    371336
 10M    4049006
 41M    4481732
6.7M    4691247
 10M    4927721
7.2M    4927722
7.4M    58068
  0B    pgsql_tmp

Each of those directories with an integer as the directory name represents a database inside my PostgreSQL cluster. The integers (OIDs) in the directory name match the oid you would see from a query like:

 SELECT oid, datname FROM pg_database;

if you had the server running. I don't know of a trivial way to determine the database name from those OIDs without having the server running, but at least you know how many databases there are and how big they should be. And you should be able to figure out creation time too from checking stat. In my case, "1" was for "template1", "12292" was "template0", and the rest were various other databases.