Why Fast PostgreSQL Queries Appear in Slow Query Logs

hibernate, postgresql

On investigating an entry in our Postgres log, I noticed a large discrepancy between the reported duration and the speed of the query.

2015-07-28 17:27:26 UTC [dms_segment_3] LOG:  duration: 2924.763 ms  bind <unnamed>: SELECT  prospect.id AS prospect_id,  0,  am.id, am.customer_id, 0 FROM xxxxx.audience_member am LEFT OUTER JOIN xxxxx.campaign campaign     ON campaign.id = $1 LEFT OUTER JOIN xxxxx.end_user prospect     ON campaign.id=prospect.campaign_id        AND prospect.email_id=am.customer_id        AND prospect.end_user_type != 1 WHERE am.audience_id = $2  ORDER BY am.id limit $3
2015-07-28 17:27:26 UTC [dms_segment_3] DETAIL:  parameters: $1 = '4000013578869', $2 = '4000013744916', $3 = '500'

Running EXPLAIN ANALYZE on the same query showed a much faster execution:

explain analyze
SELECT  prospect.id AS prospect_id
,  0,  am.id, am.customer_id, 0
FROM xxxxx.audience_member am
LEFT OUTER JOIN xxxxx.campaign campaign
     ON campaign.id = 4000013578869 
LEFT OUTER JOIN xxxxx.end_user prospect
     ON campaign.id = prospect.campaign_id
    AND prospect.email_id = am.customer_id
    AND prospect.end_user_type != 1
WHERE am.audience_id = 4000013744916
ORDER BY am.id
limit 500;
                                                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=326.61..326.67 rows=26 width=24) (actual time=0.500..0.517 rows=4 loops=1)
   ->  Sort  (cost=326.61..326.67 rows=26 width=24) (actual time=0.491..0.498 rows=4 loops=1)
         Sort Key: am.id
         Sort Method: quicksort  Memory: 25kB
         ->  Nested Loop Left Join  (cost=0.00..326.00 rows=26 width=24) (actual time=0.260..0.402 rows=4 loops=1)
               Join Filter: (campaign.id = prospect.campaign_id)
               ->  Nested Loop Left Join  (cost=0.00..107.25 rows=26 width=24) (actual time=0.237..0.346 rows=4 loops=1)
                     ->  Index Scan using idx_audience_member_audience_id on audience_member am  (cost=0.00..99.62 rows=26 width=16) (actual time=0.062..0.071 rows=4 loops=1)
                           Index Cond: (audience_id = 4000013744916::bigint)
                     ->  Materialize  (cost=0.00..7.30 rows=1 width=8) (actual time=0.042..0.060 rows=1 loops=4)
                           ->  Seq Scan on campaign  (cost=0.00..7.30 rows=1 width=8) (actual time=0.154..0.219 rows=1 loops=1)
                                 Filter: (id = 4000013578869::bigint)
                                 Rows Removed by Filter: 23
               ->  Index Scan using idx_enduser_emailaddress on end_user prospect  (cost=0.00..8.40 rows=1 width=24) (actual time=0.006..0.006 rows=0 loops=4)
                     Index Cond: (email_id = am.customer_id)
                     Filter: ((end_user_type <> 1) AND (campaign_id = 4000013578869::bigint))
 Total runtime: 0.701 ms
(17 rows)

Is there something I'm missing? What could account for the 2+ second gap between the log and what appears to be the "real" query duration?

We're using Postgres 9.2. In production we use a mixture of Hibernate queries and native SQL within a Tomcat 6 app.

UPDATE:

jpmc26's eagle eye spotted that the parameters are passed as strings in the log but not in my EXPLAIN, so I ran it again with quoted parameters:

explain analyze
SELECT  prospect.id AS prospect_id
,  0,  am.id, am.customer_id, 0
FROM xxxxx.audience_member am
LEFT OUTER JOIN xxxxx.campaign campaign
     ON campaign.id = '4000013578869'
LEFT OUTER JOIN xxxxx.end_user prospect
     ON campaign.id = prospect.campaign_id
    AND prospect.email_id = am.customer_id
    AND prospect.end_user_type != 1
WHERE am.audience_id = '4000013744916'
ORDER BY am.id
limit '500';
                                                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=326.62..326.68 rows=26 width=24) (actual time=0.168..0.186 rows=4 loops=1)
   ->  Sort  (cost=326.62..326.68 rows=26 width=24) (actual time=0.164..0.171 rows=4 loops=1)
         Sort Key: am.id
         Sort Method: quicksort  Memory: 25kB
         ->  Nested Loop Left Join  (cost=0.00..326.01 rows=26 width=24) (actual time=0.065..0.145 rows=4 loops=1)
               Join Filter: (campaign.id = prospect.campaign_id)
               ->  Nested Loop Left Join  (cost=0.00..107.25 rows=26 width=24) (actual time=0.051..0.102 rows=4 loops=1)
                     ->  Index Scan using idx_audience_member_audience_id on audience_member am  (cost=0.00..99.62 rows=26 width=16) (actual time=0.017..0.025 rows=4 loops=1)
                           Index Cond: (audience_id = 4000013744916::bigint)
                     ->  Materialize  (cost=0.00..7.30 rows=1 width=8) (actual time=0.009..0.013 rows=1 loops=4)
                           ->  Seq Scan on campaign  (cost=0.00..7.30 rows=1 width=8) (actual time=0.023..0.031 rows=1 loops=1)
                                 Filter: (id = 4000013578869::bigint)
                                 Rows Removed by Filter: 23
               ->  Index Scan using idx_enduser_emailaddress on end_user prospect  (cost=0.00..8.40 rows=1 width=24) (actual time=0.005..0.005 rows=0 loops=4)
                     Index Cond: (email_id = am.customer_id)
                     Filter: ((end_user_type <> 1) AND (campaign_id = 4000013578869::bigint))
 Total runtime: 0.259 ms
(17 rows)

This time the benefit of having the data already cached is evident, but otherwise nothing has changed.

Best Answer

What you see in the logs is a prepared statement execution. Using prepared statements is the usual way to interact with the database from an application layer.

For example, using Hibernate, one could write something like this (hopefully the snippet below is valid; I did not test it):

String sql = "SELECT first_name, last_name FROM customer WHERE email = :customer_email";
Query query = session.createSQLQuery(sql);
query.setParameter("customer_email", customer.getEmail());

Among other benefits, this is a good way to avoid SQL injection - unlike building the full query text, parameter values included, by string concatenation.

One can do this from a PostgreSQL client, too:

PREPARE fetch_customer (text) AS
    SELECT first_name, last_name FROM customer WHERE email = $1;

[EXPLAIN ANALYZE] EXECUTE fetch_customer ('john@gmail.com');

On the other hand - and this is what you are experiencing - a prepared query can result in inferior performance. A prepared statement cannot take advantage of knowing the values that will be passed to it, simply because they are unknown at preparation time. It therefore has to choose a query plan that is generic enough to produce any possible result in acceptable time.
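In 9.2 this can be observed directly from psql: a freshly prepared statement is planned with the actual parameter values for its first five executions, after which the server may switch to a cached generic plan if its estimated cost looks comparable. A minimal sketch, reusing the audience_member table from the question (the statement name is made up):

```sql
PREPARE by_audience (bigint) AS
    SELECT id, customer_id
    FROM xxxxx.audience_member
    WHERE audience_id = $1;

-- The first five executions get custom plans built for the actual value:
EXPLAIN ANALYZE EXECUTE by_audience (4000013744916);

-- From the sixth execution on, a cached generic plan may be used instead;
-- if the duration in your log jumps at that point, this is the culprit:
EXPLAIN ANALYZE EXECUTE by_audience (4000013744916);

DEALLOCATE by_audience;
```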

For example, imagine you had a partial index like

CREATE INDEX ON xxxxx.campaign (id) WHERE id > 4000000000000;

(which does not make much sense in real life, but never mind).

When the planner knows it has 4000013578869 as the campaign id, it can opt for this index, possibly reducing the number of data pages to fetch significantly. A prepared statement's generic plan, however, cannot, because for some (most?) of the possible parameter values the partial index would not contain the matching rows.
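A quick way to see the difference is to compare the plan for a literal value with the plan for a prepared statement (a sketch, assuming the hypothetical partial index above exists):

```sql
-- With a literal, the planner can prove the partial index's predicate
-- holds and is free to use it:
EXPLAIN SELECT * FROM xxxxx.campaign WHERE id = 4000013578869;

-- A prepared statement's generic plan must work for *any* $1, including
-- values the partial index does not cover, so once the server settles
-- on a generic plan the index is out of reach:
PREPARE by_id (bigint) AS
    SELECT * FROM xxxxx.campaign WHERE id = $1;
EXPLAIN EXECUTE by_id (4000013578869);
DEALLOCATE by_id;
```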