Postgresql – Query between tables to delete similar records from another table

deletepostgresqlsubstring

I am writing an SQL query on two tables products01 and products02 in a PostgreSQL database.

For the entire set of records in products02 take the substring (first 10 characters of volumeId and use those substrings to find all the corresponding records in products01. There can be more than one hit in products01, for instance finding older legacy record of the same item, just a different edition. The eleventh character of volumeId makes it unique.

For instance, in products01 I can have volumeId's like:

record45671
record45672
record45673

And in products02 I would be using substring(volumeId for 10) or 'record456' to find those similar records in products01.

Also I'd like to display the results so I can verify my query is accurate.

Here is my initial attempt:

    SELECT substring(volumeId, for 10) FROM temporary.products01 t01
    WHERE EXISTS (
        SELECT volumeId FROM temporary.products02 t02
        WHERE t01.volumeId LIKE t02.volumeId);

Best Answer

To find all rows from table products01 where the 10 leading characters of volumeId match any products02.volumeId:

SELECT t1.*
FROM   products01 t1
JOIN   products02 t2 ON left(t1.volumeId, 10) = t2.volumeId;

To delete:

DELETE FROM products01 t1
USING  products02 t2
WHERE  left(t1.volumeId, 10) = t2.volumeId;

You can use this equivalent syntax to get both commands more in line:

SELECT DISTINCT t1.* FROM products01 t1, -- see below
-- DELETE FROM products01 t1 USING  
products02 t2
WHERE  left(t1.volumeId, 10) = t2.volumeId;

Only add DISTINCT if t2.volumeId is not fully deterministic (t1.volumeId can match multiple t2.volumeId).

Now you can either select the whole statement to select or starting with DELETE to delete. See:

Related Solutions

Postgresql – Delete data from multiple related tables

You can use Jailer tool, it will find all rows of child tables that reference to master table.

Example: I have 4 tables : employee, employee_detail, phone_address, relationship. I want to delete employee which name="JOHN". With Jailer, it will find rows in employee_detail & relationship which FK to "JOHN" (by id). And because of phone_address reference to employee_detail, so Jailer will find rows in phone_address.

tb_employee ----(1,n)----tb_employee_detail----(1,n)----tb_phone_address 
            ----(1,n)----tb_relationship

PostgreSQL – SELECT Multiple Values from One Table Matching Another Table

Actually, since

there can be n number of values for a single record of table first.

the number of columns to return is not exactly arbitrary. There is a maximum of columns, and this has a clean solution - unless you have more columns than Postgres allows for a table:

250 - 1600 depending on column types

BTW, typically, you would also have a third table listing all possible values of value, the whole of it implementing a classical many-to-many relationship.

How to implement a many-to-many relationship in PostgreSQL?

You can use CASE statements, or more elegantly, the crosstab() function of the additional module tablefunc. If you are unfamiliar with it, read basic instructions here first:

PostgreSQL Crosstab Query

Columns to the right of actual values are filled with NULL. Assuming a maximum of 5 possible values and building on this setup:

CREATE TABLE first  (id int, name text);
CREATE TABLE second (id int, value "char", fid int);  

INSERT INTO first  VALUES
  (1,'Ahmad')
, (2,'Sami')
, (3,'Khan')
, (4,'Nobody');  -- Added to demonstrate difference
INSERT INTO second VALUES
  (1,'a',1)
, (2,'b',1)
, (3,'c',2)
, (4,'d',1)
, (5,'e',2)
, (6,'f',3);

Either use crosstab(text) (1 parameter form) and join to table first another time:

SELECT id, f.name, value1, value2, value3, value4, value5
FROM   crosstab(
        'SELECT f.id, 1 AS dummy_category, s.value
         FROM   first  f
         JOIN   second s ON s.fid = f.id
         ORDER  BY f.id, s.value'
       ) ct (id int
           , value1 "char", value2 "char", value3 "char", value4 "char", value5 "char")
JOIN first f USING (id);

Or use crosstab(text, text) (2 parameter form) and generate dummy categories for your values:

SELECT *
FROM   crosstab(
        'SELECT f.id, f.name
              , row_number() OVER (PARTITION BY f.id ORDER BY s.value) AS dummy_category
              , s.value
         FROM   first  f
         JOIN   second s ON s.fid = f.id
         ORDER  BY f.id, s.value'
      , ('SELECT generate_series(1,5)')
       ) ct (id int, name text
           , value1 "char", value2 "char", value3 "char", value4 "char", value5 "char");

Result is the same either way:

 id | name  | value1 | value2 | value3 | value4 | value5
----+-------+--------+--------+--------+--------+--------
  1 | Ahmad | a      | b      | d      |        |
  2 | Sami  | c      | e      |        |        |
  3 | Khan  | f      |        |        |        |

If you want to include all rows from table first, make it a LEFT [OUTER] JOIN:

         ...
         LEFT JOIN second s ON s.fid = f.id
         ...

Then we get one additional result row for the above example:

  4 | Nobody |        |        |        |        |

Best Answer

Related Solutions

Postgresql – Delete data from multiple related tables

PostgreSQL – SELECT Multiple Values from One Table Matching Another Table

Related Question