MySQL to PostgreSQL, migrate files directly (longblob > pg_largeobject)

blob, mysql, postgresql, python

I'm working on a Python (2.6.6) script and I'm evaluating which approach to take to migrate files from MySQL to PostgreSQL. I've read the following posts:

But in both cases large object interfaces are used and the files are dumped from MySQL into the OS file system first, which is not what I want.

I'm looking for a way to import the MySQL longblobs directly into PostgreSQL, without having to write them to the OS filesystem first. I believe this approach will improve the overall migration performance, since fewer 'steps' are required.

My code structure also looks ready to accommodate this kind of change, because I already iterate over each row of the tables that contain file data in order to convert some fields and apply other transformations.

example:

This is the MySQL source table:

describe source_table;
+-------------+---------------------+------+-----+---------+-------+
| Field       | Type                | Null | Key | Default | Extra |
+-------------+---------------------+------+-----+---------+-------+
| RESOURCE_ID | bigint(20) unsigned | NO   | PRI | 0       |       |
| PAGE        | int(10) unsigned    | NO   | PRI | 0       |       |
| VERSION     | int(10) unsigned    | NO   | PRI | 0       |       |
| CONTENT     | longblob            | NO   |     | NULL    |       |
| CURRENT     | tinyint(1) unsigned | NO   |     | 0       |       |
+-------------+---------------------+------+-----+---------+-------+

This is the PostgreSQL destination table:

\d+ destination_table;
                     Table "public.destination_table"
   Column    |   Type   |     Modifiers      | Storage  | Description 
-------------+----------+--------------------+----------+-------------
 resource_id | bigint   | not null default 0 | plain    | 
 version     | smallint | not null default 0 | plain    | 
 page        | smallint | not null default 0 | plain    | 
 file_oid    | oid      |                    | plain    | 

This is my pseudo-code (pseudo because the large object import part does not work; it's how I imagine it could work):

import sys
import psycopg2

# cnx_msql and cnx_psql are existing open MySQL / PostgreSQL connections
def migrate_files(function_args):

  for key, values in function_args.items():
    cur_msql = cnx_msql.cursor(dictionary=True)
    cur_psql = cnx_psql.cursor()

  ### SELECT DATA
    cur_msql.execute(values[0])

  ### CONVERT DATA
    # initialize the temporary row container (list)
    eachrow = []

    for row in cur_msql:

      # Here I do some other field conversions
      # (those lines are removed because they are not relevant to this problem)

      # here I append the modified rows into the tmp container (list)
      eachrow.append(cur_psql.mogrify(values[2] , row))

    dataset = ','.join(eachrow)

  ### INSERT PSQL
    if dataset:    # check that the dataset is not empty
      try:
        cur_psql.execute( values[1] + dataset )
      except psycopg2.Error as e:
        print "Cannot execute the query on " + values[1] + dataset, e.pgerror
        cnx_psql.rollback()
        sys.exit( "Rollback! And leaving early this lucky script, find out what is wrong" )
    else:
      print "The dataset for " + key + " is empty, skipping..."

    # Cursors close
    cur_msql.close()
    cur_psql.close()

This is how I call my Python function, passing the queries as input in a Python dictionary:

function_args={'destination_table':[
  "SELECT resource_id, page, version, content FROM source_table",
  "INSERT INTO destination_table (resource_id, version, page, file_oid) VALUES",
  "(%(resource_id)s, %(version)s, %(page)s, lo_import('content'))"
]}

migrate_files(function_args)

recap:

As you can see, what I'm trying to achieve is to import the stream of data (content) I get from the MySQL table directly into the pg_largeobject catalog using lo_import.

Additionally, since I've read the following here:

The return value is the OID that was assigned to the new large object

I'm also trying to write the OID that was assigned to the inserted large object directly into the file_oid field of the destination table.

Are these two operations possible?

Best Answer

lo_import takes a file path as argument and imports the file contents. As you want to avoid the round-trip to the file system, that function won't do.
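
For context, this is roughly the per-row filesystem round-trip that lo_import implies, i.e. exactly what you want to avoid (the path below is purely illustrative, and the file would have to be readable by the database server process):

# Dump the blob to a file the PostgreSQL server can read ...
with open('/tmp/content.bin', 'wb') as f:
    f.write(row['content'])

# ... then let the server-side lo_import() read it back in.
cur_psql.execute(
    "INSERT INTO destination_table (resource_id, version, page, file_oid)"
    " VALUES (%s, %s, %s, lo_import(%s))",
    (row['resource_id'], row['version'], row['page'], '/tmp/content.bin'))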

I haven't used Python/psycopg2 personally, but from a glance at the docs it appears the class you're looking for is lobject. It provides the usual read/write/close methods wrapping the libpq API for large objects.

When creating a new large object, the OID that Postgres assigned to it seems to be available through lobject.oid.
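
A minimal sketch of how that could slot into your existing loop, assuming your cnx_msql / cnx_psql connections and the tables above; the lobject calls are psycopg2's documented API, everything else is illustrative:

import psycopg2

cur_msql = cnx_msql.cursor(dictionary=True)
cur_psql = cnx_psql.cursor()

cur_msql.execute("SELECT resource_id, page, version, content FROM source_table")

for row in cur_msql:
    # Create a new large object and write the blob bytes straight into it,
    # with no intermediate file on the OS filesystem.
    lobj = cnx_psql.lobject(0, 'wb')   # oid=0 lets the server assign a new OID
    lobj.write(row['content'])
    lobj.close()

    # lobj.oid holds the OID PostgreSQL assigned; store it in file_oid.
    cur_psql.execute(
        "INSERT INTO destination_table (resource_id, version, page, file_oid)"
        " VALUES (%s, %s, %s, %s)",
        (row['resource_id'], row['version'], row['page'], lobj.oid))

cnx_psql.commit()
cur_msql.close()
cur_psql.close()

Note that large object handles are only valid inside the transaction that created them, so keep autocommit off (psycopg2's default) and commit once at the end, which persists both the large objects and the INSERTs.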