I need to import data from an old database to a new one, with slightly different structure. For example, in the old database, there is a table recording employees and their supervisors:
CREATE TABLE employee (ident TEXT PRIMARY KEY, name TEXT, supervisor_name TEXT)
Now, the new database is as follows:
CREATE TABLE person (id BIGSERIAL PRIMARY KEY, name TEXT, old_ident TEXT);
CREATE TABLE team (id BIGSERIAL PRIMARY KEY);
CREATE TABLE teammember (person_id BIGINT, team_id BIGINT, role CHAR(1));
That is, instead of plain table of employees with their supervisors' names, the new (more generic) database enables to create teams of people. The employees are members with role 'e'
, supervisors with role 's'
.
The question is how to easily migrate the data from employee
to the new structure, one team per employee-supervisor pair. For example, employees
employee: ('abc01', 'John', 'Dave'), ('abc02', 'Kyle', 'Emily')
are to be migrated as
person: (1, 'John', 'abc01'), (2, 'Dave', NULL), (3, 'Kyle', 'abc02'), (4, 'Emily', NULL)
team: (1), (2)
teammember: (1, 1, 'e'), (2, 1, 's'), (3, 2, 'e'), (4, 2, 's')
I would consider using a data-modifying CTE, inserting the employees and supervisors first, then teams among them. However, CTE may only return data from the inserted table row. Thus, I am not able to match who was the supervisor of who.
The only solution I can see is using plpgsql
, which would simply iterate over the data, hold the inserted team IDs in a temporary variable, and then insert the appropriate teammember
rows. But I am curious whether there are simpler or more elegant solutions.
There will be roughly several hundreds to several thousands of employees. Although it is generally a good practice, in my case, I wouldn't like to generate the new IDs based on the old ones, as the old IDs are strings like *.GM2
. I store them to the old_ident
column for reference.
Best Answer
You have all the information you need to populate the new database from the old one with 4 insert statements:
You may have to adjust to taste. I'm assuming employee.ident can be mapped onto person.id, and that your DBMS allows assigning values to columns with auto-generated values. Except for that, it's just basic SQL, nothing fancy and, of course, no loops.
Additional commentary:
SERIAL
(with its 2 billion possibilities) should be plenty, no need for aBIGSERIAL
.CHECK
orFOREIGN KEY
constraint for teammember.role? Perhaps the question simplified these details away.