PostgreSQL – Selecting parent and child data

postgresql

I have an application front end which speaks to a PostgreSQL database. I'm trying to find the most efficient way to extract a structure similar to the below.

CREATE TABLE people (
  person_id SERIAL PRIMARY KEY,
  fname     VARCHAR
);

CREATE TABLE items (
  person_id INTEGER REFERENCES people (person_id),
  item_id   SERIAL PRIMARY KEY,
  title     VARCHAR
);

INSERT INTO people (fname) VALUES
  ('Bob'),
  ('Jim'),
  ('Geoff');

INSERT INTO items (person_id, title) VALUES
  (1, 'Cat'),
  (1, 'Dog'),
  (1, 'Monkey'),
  (2, 'Elephant');

My current approach would be to list all of the people, then (within the application) iterate over each one and run a SELECT for its items. Something similar to this pseudocode:

people = db.Query("SELECT * FROM people");
for person in people
   items = db.Query("SELECT * FROM items WHERE person_id = ?", person.person_id)

There are a number of reasons why I don't like this.

It requires a minimum of 1 + N(people) queries to extract a simple data structure. For a page containing 1000 people, this would produce a ton of network traffic.
The application has to manually iterate over the list at least twice more after the database returns it: once to build the nested structure and a second time to render it.
It seems like something a database should be able to do.

Another alternative would be to perform a JOIN, but this would require extra processing in the application to remove the duplicated parent columns.
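
For reference, the extra processing a JOIN needs can be a single grouping pass. This is a minimal sketch in Python, assuming the rows come from something like SELECT p.person_id, p.fname, i.title FROM people p LEFT JOIN items i USING (person_id) ORDER BY p.person_id; the function name group_join_rows and the hard-coded rows are illustrative, not part of any driver API.

```python
def group_join_rows(rows):
    """Collapse flat (person_id, fname, title) JOIN rows into nested dicts."""
    people = {}  # regular dicts keep insertion order, so row order is preserved
    for person_id, fname, title in rows:
        person = people.setdefault(
            person_id, {"person_id": person_id, "fname": fname, "items": []}
        )
        # A LEFT JOIN yields a NULL (None) title for people without items.
        if title is not None:
            person["items"].append(title)
    return list(people.values())


# Hypothetical rows, matching the sample data in the question.
rows = [
    (1, "Bob", "Cat"),
    (1, "Bob", "Dog"),
    (1, "Bob", "Monkey"),
    (2, "Jim", "Elephant"),
    (3, "Geoff", None),
]

people = group_join_rows(rows)
for person in people:
    print(person["fname"], person["items"])
```

The parent columns are still sent once per child row, so this trades the N extra queries for some redundant data on the wire.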

The resulting structure should be relatively simple:

peopleItems{ 
   array people{ 
             array items{}
         }
}

However, I can't for the life of me work out the best approach.

Best Answer

This is just an idea.

test=# CREATE INDEX items_idx ON items (person_id);
CREATE INDEX

test=# SELECT person_id, fname, 
    (SELECT array_to_string(ARRAY(SELECT title FROM items WHERE items.person_id = people.person_id ), ',')) AS titles
 FROM people;
 person_id | fname |     titles     
-----------+-------+----------------
         1 | Bob   | Cat,Dog,Monkey
         2 | Jim   | Elephant
         3 | Geoff | 
(3 rows)

I don't know what application language you use (PHP, Ruby, or others), but all languages can easily parse this result.
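
As an illustration of that parsing step, here is a minimal sketch in Python. It assumes the driver returns each row as a (person_id, fname, titles) tuple and that no title contains a comma (otherwise a different delimiter or an array column would be needed). The function name parse_people is hypothetical, and the rows are copied from the output above.

```python
def parse_people(rows):
    """Turn (person_id, fname, titles) rows into a list of nested dicts."""
    people = []
    for person_id, fname, titles in rows:
        # "".split(",") would give [""], so treat an empty or NULL
        # titles value as "no items" explicitly.
        items = titles.split(",") if titles else []
        people.append({"person_id": person_id, "fname": fname, "items": items})
    return people


# Rows as shown in the psql output; Geoff has no items.
rows = [
    (1, "Bob", "Cat,Dog,Monkey"),
    (2, "Jim", "Elephant"),
    (3, "Geoff", ""),
]

people = parse_people(rows)
for person in people:
    print(person["fname"], person["items"])
```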

I generated dummy data and ran EXPLAIN. The result is shown below:

test=# EXPLAIN 
SELECT person_id,
fname, 
(SELECT array_to_string(ARRAY(SELECT title FROM items WHERE items.person_id = people.person_id ), ',')) AS titles
 FROM people;
                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Seq Scan on people  (cost=0.00..984256.31 rows=9990 width=8)
   SubPlan 2
     ->  Result  (cost=98.50..98.51 rows=1 width=0)
           InitPlan 1 (returns $1)
             ->  Bitmap Heap Scan on items  (cost=19.81..98.50 rows=455 width=16)
                   Recheck Cond: (person_id = people.person_id)
                   ->  Bitmap Index Scan on items_idx  (cost=0.00..19.70 rows=455 width=0)
                         Index Cond: (person_id = people.person_id)
(8 rows)

The plan uses a Bitmap Index Scan on items_idx for each row of people, so each subquery avoids a sequential scan of items.

Don't forget to create an index on person_id in the items table.

Related Solutions

Database Design – Modelling Constraints on Subset Aggregates

Demo

CREATE TABLE journal_line(amount int); -- simplistic table for demo

CREATE OR REPLACE FUNCTION trg_insaft_check_balance()
    RETURNS trigger AS
$func$
BEGIN
   IF sum(amount) <> 0
      FROM journal_line 
      WHERE xmin::text::bigint = txid_current()  -- consider link above
         THEN
      RAISE EXCEPTION 'Entries not balanced!';
   END IF;

   RETURN NULL;  -- RETURN value of AFTER trigger is ignored anyway
END;
$func$ LANGUAGE plpgsql;

CREATE CONSTRAINT TRIGGER insaft_check_balance
    AFTER INSERT ON journal_line
    DEFERRABLE INITIALLY DEFERRED
    FOR EACH ROW
    EXECUTE PROCEDURE trg_insaft_check_balance();

Deferred, so it is only checked at the end of the transaction.

Tests

INSERT INTO journal_line(amount) VALUES (1), (-1);

Works.

INSERT INTO journal_line(amount) VALUES (1);

Fails:

ERROR: Entries not balanced!

BEGIN;
INSERT INTO journal_line(amount) VALUES (7), (-5);
-- do other stuff
SELECT * FROM journal_line;
INSERT INTO journal_line(amount) VALUES (-2);
-- INSERT INTO journal_line(amount) VALUES (-1); -- make it fail
COMMIT;

Works. :)

If you need to enforce your constraint before the end of the transaction, you can do so at any point in the transaction, even at the start:

SET CONSTRAINTS insaft_check_balance IMMEDIATE;

Faster with plain trigger

If you operate with multi-row INSERT, it is more efficient to fire the trigger once per statement, which is not possible with constraint triggers:

Constraint triggers can only be specified FOR EACH ROW.

Use a plain trigger instead and fire FOR EACH STATEMENT to ...

lose the option of SET CONSTRAINTS.
gain performance.

DELETE possible

In reply to your comment: If DELETE is possible, you might add a similar trigger that does a whole-table balance check after a DELETE has happened. This would be much more expensive, but it won't matter much as long as deletes rarely happen.

PostgreSQL – Table for optional parent/child relationship

The solution you outlined is one valid option - assuming that an item can only belong to a single person at any given time.

To enforce the stated requirement on the two FK columns in PostgreSQL:

either a Parent or Child must exists

... you can add a simple CHECK constraint:

CHECK (a IS NOT NULL OR b IS NOT NULL)

This demands at least one non-null column, but still allows both parent_id and child_id to be set (a and b stand for those two columns). If you want to disallow that, too, make it:

CHECK (a IS NOT NULL AND b IS NULL OR b IS NOT NULL AND a IS NULL)
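
To see which combinations the stricter constraint accepts, here is a small runnable sketch in Python. It uses SQLite through the standard sqlite3 module purely as a stand-in so that no server is needed; the CHECK expression itself is written the same way in PostgreSQL. The table name link and the helper try_insert are hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for PostgreSQL (assumption:
# only the CHECK expression matters here, and it is portable).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE link (
        parent_id INTEGER,
        child_id  INTEGER,
        CHECK ((parent_id IS NOT NULL AND child_id IS NULL)
            OR (child_id IS NOT NULL AND parent_id IS NULL))
    )
    """
)


def try_insert(parent_id, child_id):
    """Return True if the row is accepted, False if the CHECK rejects it."""
    try:
        conn.execute(
            "INSERT INTO link (parent_id, child_id) VALUES (?, ?)",
            (parent_id, child_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


results = {
    "parent only": try_insert(1, None),
    "child only": try_insert(None, 2),
    "both": try_insert(1, 2),
    "neither": try_insert(None, None),
}
print(results)
```

Only the rows with exactly one of the two columns set are accepted; with the first, weaker CHECK the "both" row would be accepted as well.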
