Postgresql – Why the WHERE clause is not pushed down in the view’s query

optimizationperformancepostgresqlquery-performanceview

With Postgres 9.4, I'm doing the following query quite often:

SELECT DISTINCT ON(recipient) * FROM messages
LEFT JOIN identities ON messages.recipient = identities.name
WHERE timestamp BETWEEN timeA AND timeB
ORDER BY recipient, timestamp DESC;

So I decided to create a view:

CREATE VIEW myView AS SELECT DISTINCT ON(recipient) * FROM messages
LEFT JOIN identities ON messages.recipient = identities.name
ORDER BY recipient, timestamp DESC;

I just realized if I query my view like SELECT * FROM myView WHERE timestamp BETWEEN timeA AND timeB I get a significantly worse performance.

Doing EXPLAIN ANALYZE on both queries, I found out the reason is that the database in the second case brings up all the records, does the left join and then applies the WHERE clause. In other words, the WHERE clause is not pushed down into the view's query. I also tried to remove the ORDER BY from the view, but still the database performs the LEFT JOIN on full data rather on the filtered set.

What is the reason of this behavior? Is there a way I can get a comparable performance when using view?

Best Answer

You can create a function like this;

CREATE OR REPLACE FUNCTION public.get_messages_by_timestamp (
  time_a timestamp,
  time_b timestamp
)
RETURNS TABLE (
  recipient varchar,
  "timestamp" timestamp
) AS
$$
BEGIN
  RETURN QUERY
    SELECT DISTINCT ON (m.recipient) 
        m.recipient,
        m."timestamp"
      FROM messages m
      LEFT JOIN identities i ON m.recipient = i.name
      WHERE 
        m."timestamp" BETWEEN time_a AND time_b
      ORDER BY 
        m.recipient,
        m."timestamp" DESC;
END;
$$
LANGUAGE 'plpgsql';

Then you can use the function like a table

  SELECT *
         FROM get_messages_by_timestamp('2015-01-01', '2015-01-02')