Postgresql – How to adopt the PostgreSQL naming convention in legacy database

identifierpostgresql

I realize I'll need to update the queries, but if I have a database that has camelCase or spaces in it, you have to use double quotes on the identifiers (viz. schema, table, column). How do I migrate away from taking into account that I can't have capital letters, nor spaces in my identifiers. I need to normalize all of them to snake_case.

Best Answer

Moving to PostgreSQL convention

In PostgreSQL by convention and with good reason, we neither use spaces nor capital letters in our identifiers (columns, tables, schemas, etc). The use of _ is purely style though. In this example, we

Transliterate all ' ' to '_'
Migrate from camelCase to snake_case

This code changes nothing, but outputs the respective ALTER commands to update the schema (including the schema, table, and column names)

-- Columns first.
SELECT FORMAT(
  'ALTER TABLE %I.%I.%I RENAME COLUMN %I TO %I;',
  table_catalog,
  table_schema,
  table_name,
  column_name,
  lower(
    -- replace all spaces with _, xX and Xx becomes x_x
    regexp_replace(
      -- First, replace spaces with an _
      replace(column_name, ' ', '_'),
      '([[:lower:]])([[:upper:]])',
      '\1_\2',
      'g'
    )
  )
)
FROM information_schema.columns
WHERE column_name ~ ' |[[:lower:]][[:upper:]]'

-- Tables
UNION ALL
  SELECT FORMAT (
    'ALTER TABLE %I.%I.%I RENAME TO %I;', 
    table_catalog,
    table_schema,
    table_name,
    lower(
      regexp_replace(
        replace(table_name, ' ', '_'),
        '([[:lower:]])([[:upper:]])',
        '\1_\2',
        'g'
      )
    )
  )
  FROM information_schema.tables
  WHERE table_name ~ ' |[[:lower:]][[:upper:]]'

-- Schemas
UNION ALL
  SELECT FORMAT (
    'ALTER SCHEMA %I RENAME TO %I;',
    schema_name,
    lower(
      regexp_replace(
        replace(schema_name, ' ', '_'),
        '([[:lower:]])([[:upper:]])',
        '\1_\2',
        'g'
      )
    )
  )
  FROM information_schema.schemata
  WHERE schema_name ~ ' |[[:lower:]][[:upper:]]';

From there you can edit the commands to be run, delete the ones you don't want, or simply run \gexec if you're using psql and they will execute.

Testing

If you wish you can test the above

CREATE SCHEMA "myFoo bar";
CREATE TABLE "myFoo bar"."foo bar myBaz" ( "my foo" int, "myBar" int, "MyBaz" text, "myFooBarBaz" int, "MyFoo Bar Baz" int, "myTestQ" uuid );

                                            format                                             
-----------------------------------------------------------------------------------------------
 ALTER TABLE test.my_foo_bar.foo_bar_my_baz RENAME COLUMN "MyFoo Bar Baz" TO my_foo_bar_baz;
 ALTER TABLE test."myFoo bar"."foo bar myBaz" RENAME COLUMN "my foo" TO my_foo;
 ALTER TABLE test."myFoo bar"."foo bar myBaz" RENAME COLUMN "myBar" TO my_bar;
 ALTER TABLE test."myFoo bar"."foo bar myBaz" RENAME COLUMN "MyBaz" TO my_baz;
 ALTER TABLE test."myFoo bar"."foo bar myBaz" RENAME COLUMN "myFooBarBaz" TO my_foo_bar_baz;
 ALTER TABLE test."myFoo bar"."foo bar myBaz" RENAME COLUMN "MyFoo Bar Baz" TO my_foo_bar_baz;
 ALTER TABLE test."myFoo bar"."foo bar myBaz" RENAME COLUMN "myTestQ" TO my_test_q;
 ALTER TABLE test."myFoo bar"."foo bar myBaz" RENAME TO foo_bar_my_baz;
 ALTER SCHEMA "myFoo bar" RENAME TO my_foo_bar;
(9 rows)

Note because there is a collision in the above to my_foo_bar_baz, you'll see

ERROR: column "my_foo_bar_baz" of relation "foo bar myBaz" already exists

That just confirms nothing catastrophic happens if you happen to run into that. highly unlikely

Simple with `hstore`

If you have the additional module hstore installed (instructions in link below), there is a surprisingly simple way to replace the value(s) of individual field(s) without knowing anything about other columns:

Basic example: duplicate the row with id = 2 but replace 2 with 3:

INSERT INTO people
SELECT (p #= hstore('id', '3')).* FROM people p WHERE id = 2;

Details:

Assuming (since it's not defined in the question) that people.id is a serial column with an attached sequence, you'll want the next value from the sequence. We can determine the sequence name with pg_get_serial_sequence(). Details:

PostgreSQL SELECT primary key as "serial" or "bigserial"

Or you can just hard-code the sequence name if it's not going to change.
We would have this query:

~~INSERT INTO people SELECT (p #= hstore('id', nextval(pg_get_serial_sequence('people', 'id'))::text)).* FROM people p WHERE id = 2;~~

Which works, but suffers from a weakness in the Postgres query planner: The expression is evaluated separately for every single column in the row, wasting sequence numbers and performance. To avoid this, move the expression into a subqery and decompose the row once only:

INSERT INTO people
SELECT (p1).*
FROM  (
   SELECT p #= hstore('id', nextval(pg_get_serial_sequence('people', 'id'))::text) AS p1
   FROM   people p WHERE id = 2
   ) sub;

Probably fastest for a single (or few) row(s) at once.

json / jsonb

If you don't have hstore installed and can't install additional modules, you can do a similar trick with json_populate_record() or jsonb_populate_record(), but that capability is undocumented and may be unreliable.

How to set value of composite variable field using dynamic SQL

Transient temporary table

Another simple solution would be to use a transient temporary like this:

BEGIN;
CREATE TEMP TABLE people_tmp ON COMMIT DROP AS
SELECT * FROM people WHERE id = 2;
UPDATE people_tmp SET id = nextval(pg_get_serial_sequence('people', 'id'));
INSERT INTO people TABLE people_tmp;
COMMIT;

I added ON COMMIT DROP to drop the table automatically at the end of the transaction. Consequently, I also wrapped the operation into a transaction of its own. Neither is strictly necessary.

This offers a wide range of additional options - you can do anything with the row before inserting, but it's going to be a bit slower due to the overhead of creating and dropping a temp table.

This solution works for a single row or for any number of rows at once. Each row gets a new default value from the sequence automatically.

Using the short (SQL standard) notation TABLE people.

Dynamic SQL

For many rows at once, dynamic SQL is going to be fastest. Concatenate the columns from the system table pg_attribute or from the information schema and execute it dynamically in a DO statement or write a function for repeated use:

CREATE OR REPLACE FUNCTION f_row_copy(_tbl regclass, _id int, OUT row_ct int) AS
$func$
BEGIN
   EXECUTE (
      SELECT format('INSERT INTO %1$s(%2$s) SELECT %2$s FROM %1$s WHERE id = $1',
                    _tbl, string_agg(quote_ident(attname), ', '))
      FROM   pg_attribute
      WHERE  attrelid = _tbl
      AND    NOT attisdropped  -- no dropped (dead) columns
      AND    attnum > 0        -- no system columns
      AND    attname <> 'id'   -- exclude id column
      )
   USING _id;

   GET DIAGNOSTICS row_ct = ROW_COUNT;  -- directly assign OUT parameter
END
$func$  LANGUAGE plpgsql;

Call:

SELECT f_row_copy('people', 9);

Works for any table with an integer column named id. You could easily make the column name dynamic, too ...

Maybe not your first choice since you wanted to stay away from stored procedures, but then again, it's not a "stored procedure" anyway ...

Advanced solution

A serial column is a special case. If you want to fill more or all columns with their respective default values, it gets more sophisticated. Consider this related answer:

Generate DEFAULT values in a CTE UPSERT using PostgreSQL 9.3

Best Answer

Moving to PostgreSQL convention

Testing

Related Solutions

Postgresql – Migrating database live while being severely handicapped

Postgresql – Duplicate row with Primary Key in PostgreSQL

Simple with hstore

json / jsonb

Transient temporary table

Dynamic SQL

Advanced solution

Related Question

Simple with `hstore`