PostgreSQL – Conditional Selection in FROM Clause

caseconditionpostgresql

I have two tables:

parent with columns identifier (pkey) and period;
child with columns identifier (pkey), parent_identifier and period.

There are additional columns, but I did not list them here as they are not really relevant.

I want to select the periods as following:

If the parent matches some predicate, then pick parent.period.
If not, then pick child.period from each child.

I have the following query:

select   q.period
from     parent,
lateral  (
           select  parent.period
           where   <some complex predicate>
           union all
           select  child.period
           from    child
           where   child.parent_identifier = parent.identifier and
                   not(<the same complex predicate as above>)
         ) as q(period);

While this query works and returns the expected periods, I am wondering if this query could be rewritten in such way that <some complex predicate> does not have to be evaluated twice, to improve performance (albeit by a small factor) and general query structure to represent the if ... else ... from above. Ideally, I want to loop once through each parent/child row and evaluate the predicate just once.

I tried to introduce a case when <some complex predicate> then ... else ... end clause, but I fail to get it right. I am not sure if that is even possible in the FROM clause. Is this possible?

I also had select case when <predicate> then parent.period else unnest(array(select child.period from child where ...)) from parent in mind (though I did not try it, so it might be disallowed in a case when ... as well), but I am not sure if first creating and then unnesting an array is really optimal. Also, I am not really fond of set-returning functions, such as unnest, in the selection.

Best Answer

The following queries evaluate the expensive predicate just once for each parent row. To achieve this, the predicate is evaluated in a separate subquery. The first version uses a lateral join, the second an inner join. This meets the requirements from the question.

select   q2.period
from     parent,
lateral  (
           select <some complex predicate>
         ) as q1(predicate),
lateral  (
           select  parent.period
           where   q1.predicate
           union all
           select  child.period
           from    child
           where   child.parent_identifier = parent.identifier and
                   not(q1.predicate)
         ) as q2(period);

select     q2.period
from       parent,
inner join (
             select  parent.identifier, <some complex predicate>
             from    parent
           ) as q1(identifier, predicate)
on         parent.identifier = q1.identifier,
lateral    (
             select  parent.period
             where   q1.predicate
             union all
             select  child.period
             from    child
             where   child.parent_identifier = parent.identifier and
                     not(q1.predicate)
           ) as q2(period);

Related Solutions

Postgresql – Altering a parent table in Postgresql 8.4 breaks child table defaults

Your problem is that when you add a new column to the_person, its child, the_person_two will have this field appended at the end of columns list (4th position), so after has_default column. See:

db=> \d temp_person
  Column   |       Type        |                            Modifiers                            
-----------+-------------------+-----------------------------------------------------------------
 person_id | integer           | not null default nextval('temp_person_person_id_seq'::regclass)
 name      | character varying | 
 foo       | text              | 

db=> \d temp_person_two 
   Column    |         Type         |                            Modifiers                            
-------------+----------------------+-----------------------------------------------------------------
 person_id   | integer              | not null default nextval('temp_person_person_id_seq'::regclass)
 name        | character varying    | 
 has_default | character varying(4) | not null default 'en'::character varying
 foo         | text                 |

So, when you execute this:

INSERT INTO temp_person_two VALUES ( NEW.* );

PostgreSQL will actually understand that you want to insert on the first three columns of temp_person_two (as NEW.* will expand to three values), generating something similar to this:

INSERT INTO temp_person_two(person_id,name,has_default)
VALUES ( NEW.person_id, NEW.name, NEW.foo );

So, temp_person_two.has_default will get the value of NEW.foo, which is NULL in your case.

The solution is to simply expand the column names:

INSERT INTO temp_person_two(person_id,name,foo)
VALUES ( NEW.person_id, NEW.name, NEW.foo );

or, you could also use this:

INSERT INTO temp_person_two(person_id,name,foo)
VALUES ( NEW.* );

But this is weak, as any changes on column positions may break your statements, so I'd recommend the first one.

EDIT:

So the conclusion and the lesson learned here is:

Always explicitly type the names of the columns and the values when issuing an INSERT command, in fact, when issuing any SQL command at all... =D

This will save you a lot of time solving problems like that in future.

Postgresql – postgres fetch record from partitions without the check constraints

A few details are still not clear, but let's see, what we can do now. First, having about 100M rows and 231 partitions sounds not that good. The resulting tables will be too small, in turn their number too high - I cannot tell the threshold, but at some point the query planning migt get too expensive. I think it is quite possible that yearly partitions would be enough. Alternatively, if you really want to fetch a whole month at a time, create monthly partitions.

Now to the actual problem.

It is not quite clear to me why you have rows in the parent table. The usual way of partitioning is that the parent is empty, and every row is redirected to one of the children.

At the same time, is you have an index on posted_at of the parent table (as you have it on the children), finding rows in the parent based on the timestamp is easy.

On the other hand, while I'm not sure which column shared_parent_id refers to, you can define an index on it, too - looking rows up based on this will be easy, too.

The only thing still has to be added is tell your query to look for parents in the parent table only. Let's have a look at a possible query:

WITH child_messages AS (
    SELECT shared_parent_id, {other interesting columns}
      FROM messages
     WHERE posted_at {matches your needs}
)
SELECT *
  FROM child_messages

UNION ALL

SELECT shared_parent_id, {other interesting columns}
  FROM ONLY messages -- this way it does not go to the children
 WHERE {unclear column} IN (SELECT shared_parent_id FROM child_messages);

The WITH query may pick up rows from the parent, too - this you may or may not want, adjust the query accordingly.

Furthermore, the performance might not be ideal, in this case there is room for tweaking the query (eg. a JOIN instead of the IN(), pushing the query in the WITH clause into a (sub)query and so on).

And a final notice: varchar(255) is usually a sign of a value of unknown-before length - if you really want to constrain it, you may want to choose a meaningful limit. Otherwise, an unlimited varchar (or text) has a slight performance advantage in PostgreSQL over the limited ones. Furthermore, from your example it seems that shared_parent_id is a number (integer) - use the best fitting type.

Best Answer

Related Solutions

Postgresql – Altering a parent table in Postgresql 8.4 breaks child table defaults

Postgresql – postgres fetch record from partitions without the check constraints

Related Question