PostgreSQL – Best way to transfer millions of rows from one database to another

migration, postgresql

There are two databases, DatabaseA and DatabaseB, with the same schema. Both contain millions of rows.

The plan is to consolidate them into a single database (DatabaseCommon) with multiple schemas (schemaA and schemaB).

What is the best way to transfer data, indexes, etc. from the existing databases to DatabaseCommon?

Is there a Postgres script that can handle this use-case?

We can have downtime. There will be around 20 databases. Users are restricted to their own schema.

Best Answer

First rename the schemas in the current databases to have the names you want to end up with:

alter schema public rename to schema_a;

Do the same for the other databases. Whether you need to change your apps, SQL functions, etc. depends on the coding standards you used in the first place. This can probably be done with the system online, if your apps can cope and you juggle search_path appropriately.
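As a rough sketch, the rename can be scripted across the source databases, with search_path pinned per database so unqualified names keep resolving; the database and schema names below are placeholders for your own naming scheme:

# Placeholder names; adjust to your databases and naming convention.
for db in database_a database_b; do
    suffix=${db#database_}
    psql -d "$db" -c "alter schema public rename to schema_${suffix};"
    psql -d "$db" -c "alter database ${db} set search_path = schema_${suffix};"
done

Setting search_path at the database level is optional at this stage; once everything lands in DatabaseCommon you would typically set it per role instead.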

Then lock people out of each database in turn, use pg_dump to dump out the schema and data, and load it into the successor database.

pg_dump database_a -n schema_a | psql -1 database_common
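The -1 flag makes psql load each dump in a single transaction, so a failed load rolls back cleanly. If you want to be certain nobody writes during the dump, one option is to revoke connect rights and terminate existing sessions first. The database name below is a placeholder; run this as a superuser from a separate maintenance session, and note it assumes connect privileges come via PUBLIC rather than per-role grants:

revoke connect on database database_a from public;

select pg_terminate_backend(pid)
from pg_stat_activity
where datname = 'database_a' and pid <> pg_backend_pid();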

If you have extensions in special schemas, elaborate systems of roles and grants, and the like, then you might need to do some extra work to carry those over cleanly. There is no substitute for having a test/QA database and using it for testing and practicing.
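For example, if the source databases live on a different cluster than DatabaseCommon, roles can be carried over with pg_dumpall --globals-only, and the per-schema isolation mentioned in the question can be sketched with grants along these lines (the host, role, and schema names are placeholders):

# Only needed when the source cluster differs from the target cluster.
pg_dumpall --globals-only -h old_host | psql -d database_common

# Sketch of per-schema isolation in the combined database; placeholder names.
psql -d database_common -c "revoke all on schema schema_a from public;"
psql -d database_common -c "grant usage on schema schema_a to user_a;"
psql -d database_common -c "alter role user_a set search_path = schema_a;"

Exactly which grants you need depends on who owns the objects after the restore, which is one more reason to rehearse the whole procedure on a test database first.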