PostgreSQL 9.2 – Using CTE INSERT to Provide Unique ID Values

postgresql, postgresql-9.2

I am writing a job to transform data from an old design into a new design. In this process, I need to take the id from an insert into a separate table and use that in an insert to the target table, as such:

CREATE TABLE t1 (
  t1_id BIGSERIAL PRIMARY KEY, -- must be unique to be a foreign key target
  col1 VARCHAR
);
CREATE TABLE t2 (
  t2_id BIGSERIAL,
  col2 VARCHAR, -- renamed from col1 to avoid confusion
  t1_id BIGINT REFERENCES t1 (t1_id)
);

I have the SQL defined that matches the following form:

WITH ins AS (
  INSERT INTO t1 (t1_id) VALUES (DEFAULT) RETURNING t1_id
) INSERT INTO t2
  (col2, t1_id)
SELECT
  a.val1, (SELECT * FROM ins)
FROM t3 a;

I wanted this to run SELECT * FROM ins once for every row of the outer SELECT, but instead it runs only once and uses that single value for all rows. How can I restructure my SQL to get the desired behavior?

edit4

t1 ends up looking like:

1,<NULL>
(1 row)

t2 ends up looking like:

10,'a',1
11,'b',1 -- problem with id from t1 being 1
12,'c',1 -- problem with id from t1 being 1
.
.

What I want t1 to look like:

1,<NULL>
2,<NULL>
3,<NULL>
.
.

What I want t2 to look like:

10,'a',1
11,'b',2 -- id from t1 of 2
12,'c',3 -- id from t1 of 3
.
.

edit
To address what a_horse_with_no_name said, I also tried this (with the same result):

WITH ins AS (
  INSERT INTO t1 (t1_id) VALUES (DEFAULT) RETURNING t1_id
) INSERT INTO t2
  (col2, t1_id)
SELECT
  a.val1, b.t1_id
FROM t3 a
JOIN ins b ON TRUE;

edit2
I just tried directly referencing the appropriate SEQUENCE in my query, and that DOES work – but I don't like that solution very much at all (mostly because I don't like hard-coding object names.) If there is ANY solution other than directly referencing the name of the SEQUENCE I would appreciate it. 🙂

edit3
I suppose another solution would be to use a PROCEDURE to do the INSERT instead of a CTE, but I'd still appreciate options/suggestions.

Best Answer

I don't understand why you need 2 tables if they have only a 1-1 relationship. But here it is (pk is the primary key of t3):

WITH ins AS (
  INSERT INTO t1 (col1) 
    SELECT NULL FROM t3 
  RETURNING t1_id
) 
, r AS
( SELECT t1_id, ROW_NUMBER() OVER () AS rn
  FROM ins
) 
, t AS
( SELECT *, ROW_NUMBER() OVER () AS rn
  FROM t3
) 
INSERT INTO t2
  (col2, t1_id)
SELECT
  t.val1, r.t1_id
FROM t 
  JOIN r USING (rn) ;

If your t3 is the result of a SELECT instead of a preexisting table, you can put that SELECT in its own CTE so that you don't have to repeat the t3 query:

WITH t3 AS (
  SELECT ...
), ins AS (
  INSERT INTO t1 (col1)
    SELECT NULL FROM t3
  RETURNING t1_id
), r AS (
  SELECT t1_id, ROW_NUMBER() OVER () AS rn
  FROM ins
), t AS (
  SELECT *, ROW_NUMBER() OVER () AS rn
  FROM t3
) INSERT INTO t2
  (col2, t1_id)
SELECT
  t.val1, r.t1_id
FROM t 
  JOIN r USING (rn);
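
For comparison, the sequence-based approach mentioned in the question can be written without hard-coding the sequence name, by letting pg_get_serial_sequence look it up from the table and column names. This is only a minimal sketch, assuming t1.t1_id is a serial column with a unique or primary key constraint, that val1 is the t3 column to copy, and that the target column in t2 is col2 as in the table definition:

```sql
-- Sketch only: assumes t1.t1_id is a BIGSERIAL column, so that
-- pg_get_serial_sequence('t1', 't1_id') can find the owning sequence.
WITH src AS (
  -- Draw one new id per source row; the CTE is computed once,
  -- so both INSERTs below see the same (val1, t1_id) pairs.
  SELECT a.val1,
         nextval(pg_get_serial_sequence('t1', 't1_id')) AS t1_id
  FROM t3 a
), ins AS (
  -- Create the parent rows with the pre-allocated ids.
  INSERT INTO t1 (t1_id)
  SELECT t1_id FROM src
)
-- Create the child rows, each pointing at its own parent row.
INSERT INTO t2 (col2, t1_id)
SELECT val1, t1_id FROM src;
```

The table and column names are still written out, but the sequence name itself is not. Unlike the ROW_NUMBER() version, each t3 row is paired with its id before either INSERT runs, so no join on row numbers is needed.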

Related Solutions

PostgreSQL – UTF8 to Latin1 Conversion

It looks like whatever client you are using is confused about the text encoding; it's sending utf-8 bytes as if they were latin-1, probably.

Check:

SHOW client_encoding;
SHOW server_encoding;
the output of the locale command in your terminal, if using psql

Your update is substituting the octal bytes \303\244 which are the utf-8 encoding for "ä" (U+00E4). You're not substituting latin-1 encoded data where you think you are.

Observe:

regress=> SELECT convert_from(BYTEA 'huvudv\303\244rke', 'latin-1');
 convert_from 
--------------
 huvudvÃ¤rke
(1 row)

regress=> SELECT convert_from(BYTEA 'huvudv\303\244rke', 'utf-8');
 convert_from 
--------------
 huvudvärke
(1 row)

Not only that, but your replace could only have matched in the first place if the replace target was the utf-8 encoded byte sequence for Ã¤ interpreted as latin-1, i.e. \303\203\302\244.

It's hard to be more specific without details about the Pg version, the client being used, etc, but the root cause is clearly your client doing something totally borked with encodings on I/O.

Your original text is totally mangled; it's not valid in UTF-8 or latin-1. It looks like someone's taken some UTF-8 data, decoded it as latin-1, and then encoded it as utf-8 again.

Yep, sure enough:

regress=> SELECT convert(convert_to('huvudvärke', 'utf-8'), 'latin-1', 'utf-8');
          convert          
---------------------------
 huvudv\303\203\302\244rke
(1 row)

There's your explanation.

I'd say you probably have a bunch of mis-encoded data in the DB already, and you noticed this one because it got mangled twice. You're probably doing something like routinely jamming utf-8 bytes into latin-1 encoded fields, but you usually get away with it because you decode them as utf-8 again - and this time you did something different.

If you want to get from your mangled text to the original, you simply have to reverse the incorrect encoding process. Decode the utf-8 and output latin-1, then reinterpret the latin-1 as utf-8 and decode again, e.g.:

regress=> SELECT convert_from(convert(BYTEA 'huvudv\303\203\302\244rke', 'utf-8', 'latin-1'), 'utf-8');
 convert_from 
--------------
 huvudvärke
(1 row)
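
The same round trip can be tried against stored rows before changing anything. This is a minimal sketch, assuming a UTF8 database and a hypothetical table symptoms with a text column name (neither appears in the original question), run from a client whose client_encoding is set correctly:

```sql
-- Hypothetical table and column names; assumes the database encoding is UTF8.
-- Preview only: show each suspect value next to its repaired form.
SELECT name AS mangled,
       convert_from(
         convert(convert_to(name, 'utf-8'), 'utf-8', 'latin-1'),
         'utf-8'
       ) AS repaired
FROM symptoms
WHERE name LIKE '%Ã%';  -- rough filter for double-encoded text
```

The filter is only a heuristic. The query raises an error for any matched value that contains characters outside latin-1 or that does not decode to valid utf-8, which is itself a sign that the row was not mangled in this particular way.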

Postgresql – insert inserted id to another table

You can use the RETURNING clause, which returns the value of the newly created serial column. If you wrap your INSERT in a WITH expression, you can then access the returned id and insert it into the second table.

WITH getval(id) as
    (INSERT INTO table_a (some_col) VALUES (some_val) RETURNING id) 
INSERT into table_b (id) SELECT id from getval;

where id is the value of the serial resulting from the insert into table a. More in the docs, though it is a somewhat hidden feature.

Note that you can also use RETURNING * to return the entire inserted row, not just the serial column, should you need it.
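
As a rough illustration of the RETURNING * variant, the sketch below copies two columns of the new row into the second table. It assumes table_b also has a some_col column, which is a hypothetical addition, and uses a literal in place of some_val:

```sql
-- Sketch only: table_b.some_col is an assumed column for illustration.
WITH getval AS (
    INSERT INTO table_a (some_col)
    VALUES ('some_val')
    RETURNING *            -- every column of the inserted row
)
INSERT INTO table_b (id, some_col)
SELECT id, some_col
FROM getval;
```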
