PostgreSQL – How to Perform Partial Dump of Large Objects (pg_largeobject Table)

dump, pg-dump, postgresql, postgresql-9.1, scripting

Context: Our web app is a collaboration platform where you can share files, appointments, surveys and so on. An instance of this web app is divided into communities, each independent of the others but sharing the same PostgreSQL database.

Target: What I'm trying to do is write a shell script to make regular backups of each community separately. This means I need to select only part of the data inside the various PostgreSQL tables and export/dump the community's data.

Obstacle:
My big problem is with the pg_largeobject table, used to store the BLOBs of files. This table uses OIDs as its primary key (something I don't really like, by the way). The files table (containing file metadata) has a reference to the pg_largeobject OID.

Solutions considered:

    • Select all the OIDs from the files table WHERE community_id = something.
    • Afterwards, I could export the data stream from pg_largeobject using the lo_export function, passing each OID selected in the previous step (a rough sketch follows below):

      int lo_export(PGconn *conn, Oid lobjId, const char *filename);
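
Roughly, I imagine this first step as a shell loop like the following sketch (the database name, community id and the files.file_oid column are placeholders, since I haven't shown the real schema):

    #!/bin/bash
    DB=mydb
    COMMUNITY_ID=42
    OUT_DIR=/backup/community_$COMMUNITY_ID
    mkdir -p "$OUT_DIR"

    # -A -t: unaligned, tuples-only output, one OID per line
    psql -At -d "$DB" \
         -c "SELECT file_oid FROM files WHERE community_id = $COMMUNITY_ID" |
    while read -r oid; do
        # \lo_export writes the large object's data stream to a file on the client machine
        psql -d "$DB" -c "\lo_export $oid $OUT_DIR/$oid"
    done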

The problem with this approach is that when I try to restore the pg_largeobject records, I need to keep the same OIDs, or the relation will be broken. I could use the lo_import_with_oid function, which lets me specify the desired OID, but in most cases that OID will already be taken by the original data, so the import will end up with a new OID that no longer matches the relation with the rest of the database.
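
For reference, the server-side flavour of lo_import also accepts an explicit OID (it reads the file from the server's filesystem and needs superuser rights); the path and OID here are made up:

    -- server-side counterpart of lo_import_with_oid: imports the file under the given OID
    SELECT lo_import('/backup/community_42/16405', 16405);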

  1. I still don't know how I could achieve this, but I'll try to explain:
    • during the export I could select and save all the PKs of the files table records of a given community (let's call them filesPKs);
    • I would then perform a lo_export of pg_largeobject;
    • during the lo_import I could ignore the fact that new OIDs are being generated, but somehow save the list of these newly inserted OIDs;
    • as a final step I could update all the rows (using the filesPKs reference) in the files table so that they point to the newly imported pg_largeobject records.

This second procedure seems very difficult to me, because I still don't see how I could update the files rows so that they point to the correct large objects.
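
For what it's worth, the final update would only be straightforward if I somehow had a mapping from each files PK to the OID generated by the import, something along these lines (oid_mapping is purely hypothetical; I don't see how to build it yet):

    -- hypothetical: oid_mapping(file_id, new_oid), filled in while importing
    UPDATE files f
       SET file_oid = m.new_oid
      FROM oid_mapping m
     WHERE f.file_id = m.file_id;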

  • Another option, which fancifully would be the best one, would be to alter the pg_largeobject table to add a column with a (static) reference to the files table. But after a test, it seems to be impossible:

    testdb=# alter table pg_largeobject add column prova text;
    ERROR:  permission denied: "pg_largeobject" is a system catalog

Questions:

  • Has anyone else faced this issue?

  • Am I just missing something, or taking the wrong approach?

  • Is there another way to export/dump pg_largeobject records, exporting all the fields and not only the data stream?

Best Answer

I managed to resolve this issue as follows (second considered solution, revisited):

All inside a while loop, for every file:

  • select file_id, file_version and oid, and put them in three variables;
  • perform a lo_export of pg_largeobject, which requires the oid and a path + filename. The filename is the concatenation of the file_id and file_version;
  • during lo_import a new OID is generated, but using the filename defined above I can perform the last step:
  • update all the rows in the files table so that they point to the newly imported pg_largeobject records and keep the relation consistent (a rough sketch of the whole loop follows below).
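
A minimal sketch of how that loop might look as a shell script, assuming the files table has numeric file_id and file_version columns plus file_oid and community_id (the real names and types may differ), and that neither value contains an underscore:

    #!/bin/bash
    DB=mydb
    COMMUNITY_ID=42
    DIR=/backup/community_$COMMUNITY_ID
    mkdir -p "$DIR"

    # Export: one file per large object, named <file_id>_<file_version>
    psql -At -d "$DB" \
         -c "SELECT file_id, file_version, file_oid FROM files WHERE community_id = $COMMUNITY_ID" |
    while IFS='|' read -r file_id file_version oid; do
        psql -d "$DB" -c "\lo_export $oid $DIR/${file_id}_${file_version}"
    done

    # Import (into the target database): \lo_import creates the large object
    # under a new OID and stores that OID in :LASTOID, which the UPDATE then
    # uses to repair the reference in the files table.
    for f in "$DIR"/*; do
        base=$(basename "$f")
        file_id=${base%_*}
        file_version=${base##*_}
        psql -d "$DB" <<SQL
    \lo_import '$f'
    UPDATE files SET file_oid = :LASTOID
     WHERE file_id = $file_id AND file_version = $file_version;
    SQL
    done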