PostgreSQL – Eliminating Duplicates and Optimizing Performance for Big Count and Small LIMIT

optimizationperformancepostgresqlpostgresql-performance

I have a big query in my Postgres 9.1 database:

SELECT *,
       count(*) OVER () AS full_count
FROM (
  SELECT DISTINCT ON(message.messagetime,message.messageid) message.messagetime,
         message.messageid,
         message.clientid,
         message.message_type,
         message.contenturl,
         message.nativecontenturl,
         message.receiverid,
         message.status,
         message.nativeisa,
         message.nativegs,
         message.isa,
         message.gs,
         message.originalfilename,
         message.duplicatekeyfield,
         message.duplicatenativeid,
         extractkeyfield.keyfield,
         sender.description AS sendername,
         receiver.description AS receivername
  FROM message
    LEFT JOIN extractkeyfield ON message.messageid = extractkeyfield.messageid
    LEFT JOIN clientprofile sender ON message.clientid = sender.clientid
    LEFT JOIN clientprofile receiver ON message.receiverid = receiver.clientid
  WHERE message.messagetime BETWEEN '01-01-2015   03:40:50'::timestamp AND '04-01-2016   03:35:09'::timestamp
) message
ORDER BY message.messagetime DESC limit 100 offset 0

Locally my query does not take too long but when I am connected to the server it can take up to 2 minutes to return the data. I'm not sure if it's because too much data is being transferred or a bad/unoptimized query. I am using all of the information returned from the query. My query is returns 1,249,333 rows.

I have tried to create an index and tried answers from multiple posts but nothing seemed to improve or help:

Any tips or help to decrease my query time would be much appreciated!

EXPLAIN output

See: http://explain.depesz.com/s/ui1

 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1748742.62..1748742.87 rows=100 width=350) (actual time=129247.099..129247.385 rows=100 loops=1)
   ->  Sort  (cost=1748742.62..1751833.33 rows=1236283 width=350) (actual time=129247.094..129247.195 rows=100 loops=1)
         Sort Key: public.message.messagetime
         Sort Method: top-N heapsort  Memory: 76kB
         ->  WindowAgg  (cost=1660621.55..1701492.77 rows=1236283 width=350) (actual time=121565.119..126665.924 rows=1249333 loops=1)
               ->  Unique  (cost=1660621.55..1673676.41 rows=1236283 width=350) (actual time=109482.221..117931.767 rows=1249333 loops=1)
                     ->  Sort  (cost=1660621.55..1664973.17 rows=1740647 width=350) (actual time=109482.216..113785.356 rows=2397272 loops=1)
                           Sort Key: public.message.messagetime, public.message.messageid
                           Sort Method: external merge  Disk: 866352kB
                           ->  Hash Right Join  (cost=201396.04..361691.11 rows=1740647 width=350) (actual time=12600.048..30933.450 rows=2397272 loops=1)
                                 Hash Cond: ((extractkeyfield.messageid)::text = (public.message.messageid)::text)
                                 ->  Seq Scan on extractkeyfield  (cost=0.00..47477.74 rows=1765974 width=45) (actual time=0.023..4422.157 rows=1765974 loops=1)
                                 ->  Hash  (cost=130405.50..130405.50 rows=1236283 width=343) (actual time=12584.450..12584.450 rows=1249333 loops=1)
                                       Buckets: 1024  Batches: 512  Memory Usage: 992kB
                                       ->  Hash Left Join  (cost=200.65..130405.50 rows=1236283 width=343) (actual time=3.609..8758.093 rows=1249333 loops=1)
                                             Hash Cond: (public.message.receiverid = receiver.clientid)
                                             ->  Hash Left Join  (cost=100.33..113306.28 rows=1236283 width=325) (actual time=1.874..5929.557 rows=1249333 loops=1)
                                                   Hash Cond: (public.message.clientid = sender.clientid)
                                                   ->  Seq Scan on message  (cost=0.00..96207.07 rows=1236283 width=307) (actual time=0.051..2623.433 rows=1249333 loops=1)
                                                         Filter: ((messagetime >= '2015-01-01 03:40:50'::timestamp without time zone) AND (messagetime <= '2016-04-01 03:35:09'::timestamp without time zone))
                                                   ->  Hash  (cost=90.70..90.70 rows=770 width=26) (actual time=1.809..1.809 rows=770 loops=1)
                                                         Buckets: 1024  Batches: 1  Memory Usage: 47kB
                                                         ->  Seq Scan on clientprofile sender  (cost=0.00..90.70 rows=770 width=26) (actual time=0.004..0.913 rows=770 loops=1)
                                             ->  Hash  (cost=90.70..90.70 rows=770 width=26) (actual time=1.723..1.723 rows=770 loops=1)
                                                   Buckets: 1024  Batches: 1  Memory Usage: 47kB
                                                   ->  Seq Scan on clientprofile receiver  (cost=0.00..90.70 rows=770 width=26) (actual time=0.003..0.842 rows=770 loops=1)
 Total runtime: 129462.319 ms

Table definitions

Table message

      Column       |           Type           | Modifiers | Storage  | Description
-------------------+--------------------------+-----------+----------+-------------
 messageid         | character varying(255)   | not null  | extended |
 clientid          | bigint                   |           | plain    |
 parentid          | character varying(100)   |           | extended |
 receiptid         | character varying(100)   |           | extended |
 message_type      | character varying(50)    |           | extended |
 billable          | boolean                  |           | plain    |
 contenturl        | character varying(255)   |           | extended |
 messagetime       | timestamp with time zone |           | plain    |
 originalfilename  | character varying(255)   |           | extended |
 receiverid        | bigint                   |           | plain    |
 originalclientid  | bigint                   |           | plain    |
 status            | character varying(100)   |           | extended |
 nativemessageid   | character varying(100)   |           | extended |
 nativecontenturl  | character varying(255)   |           | extended |
 refmessageid      | character varying(100)   |           | extended |
 msgtotcount       | bigint                   |           | plain    |
 msgindex          | bigint                   |           | plain    |
 messagesize       | bigint                   |           | plain    |
 isconsolidated    | boolean                  |           | plain    |
 refisa            | character varying(15)    |           | extended |
 refgs             | character varying(15)    |           | extended |
 refst             | character varying(15)    |           | extended |
 isa               | character varying(15)    |           | extended |
 gs                | character varying(15)    |           | extended |
 st                | character varying(15)    |           | extended |
 nativeisa         | character varying(15)    |           | extended |
 nativegs          | character varying(15)    |           | extended |
 nativest          | character varying(15)    |           | extended |
 last_modified     | timestamp with time zone |           | plain    |
 ricbatchid        | character varying(25)    |           | extended |
 reconciled        | boolean                  |           | plain    |
 duplicatekeyfield | boolean                  |           | plain    |
 duplicatenativeid | boolean                  |           | plain    |
 msgmode           | character varying(25)    |           | extended |
 isnotrans         | boolean                  |           | plain    |
 parentmsgid       | character varying(256)   |           | extended |
 originatingserver | character varying(8)     |           | extended |
 statusdetail      | character varying(100)   |           | extended |

Indexes:
    "message_pkey" PRIMARY KEY, btree (messageid)
    "message_clientid_receiverid" btree (clientid, receiverid)
    "message_clientid_receiverid_status_messagetype" btree (clientid, receiverid, status, message_type)
    "message_gs" btree (gs)
    "message_isa" btree (isa)
    "message_last_modified" btree (last_modified)
    "message_messagetime_clientid_receiverid_status_messagetype" btree (messagetime, clientid, receiverid, status, message_type)
    "message_nativecontenturl" btree (nativecontenturl)
    "message_nativegs" btree (nativegs)
    "message_nativeisa" btree (nativeisa)
    "message_nativemessageid" btree (nativemessageid)
    "message_nativest" btree (nativest)
    "message_receiverid" btree (receiverid)
    "message_refgs" btree (refgs)
Foreign-key constraints:
    "message_clientid" FOREIGN KEY (clientid) REFERENCES clientprofile(clientid) ON DELETE CASCADE
    "message_receiverid" FOREIGN KEY (receiverid) REFERENCES clientprofile(clientid) ON DELETE CASCADE
    "msgstatus" FOREIGN KEY (status) REFERENCES msgstatuslist(msgstatus)
Has OIDs: no

Table clientprofile

                  Column                  |           Type           |       Modifiers        | Storage  | Description
------------------------------------------+--------------------------+------------------------+----------+-------------
 clientid                                 | bigint                   | not null               | plain    |
 description                              | character varying(255)   |                        | extended |
 role                                     | character varying(255)   |                        | extended |
 filetype                                 | character varying(255)   |                        | extended |
 erroremail                               | character varying(255)   |                        | extended |
 messageenvelope                          | character varying(255)   |                        | extended |
 preprequired                             | boolean                  |                        | plain    |
 preprocessordocument                     | character varying(255)   |                        | extended |
 transporttype                            | character varying(255)   |                        | extended |
 transporturl                             | character varying(255)   |                        | extended |
 notransactiontype                        | character varying(255)   |                        | extended |
 usageindicator                           | character(1)             |                        | extended |
 requiresreceiptaggregation               | boolean                  |                        | plain    |
 requirestransactionaggregation           | boolean                  |                        | plain    |
 requiresvalidation                       | boolean                  |                        | plain    |
 last_modified                            | timestamp with time zone |                        | plain    |
 clientemail                              | character varying(500)   |                        | extended |
 numberofhours                            | bigint                   |                        | plain    |
 wantnotification                         | boolean                  |                        | plain    |
 requiresserverasgnctrlnums               | boolean                  | default false          | plain    |
 destextractor                            | character varying(256)   |                        | extended |
 notranslationpreprocessor                | character varying(255)   |                        | extended |
 alsoincrementserverctrlnumbersonresubmit | boolean                  | not null default false | plain    |
 includeoriginaltransactioninformation    | boolean                  |                        | plain    |
 credithold                               | boolean                  | not null default false | plain    |
 creditholdreason                         | text                     |                        | extended |
Indexes:
    "clientprofile_pkey" PRIMARY KEY, btree (clientid)
Referenced by:
    TABLE "activecustomermap" CONSTRAINT "activecustomermap_clientid" FOREIGN KEY (clientid) REFERENCES clientprofile(clientid) ON DELETE CASCADE
    TABLE "aggmappingid" CONSTRAINT "aggmappingid_clientid" FOREIGN KEY (clientid) REFERENCES clientprofile(clientid) ON DELETE CASCADE
    TABLE "controlnumbers" CONSTRAINT "controlnumbers_receiverid" FOREIGN KEY (receiverid) REFERENCES clientprofile(clientid) ON DELETE CASCADE
    TABLE "controlnumbers" CONSTRAINT "controlnumbers_senderid" FOREIGN KEY (senderid) REFERENCES clientprofile(clientid) ON DELETE CASCADE
    TABLE "customerinfo" CONSTRAINT "customerinfo_clientid" FOREIGN KEY (clientid) REFERENCES clientprofile(clientid) ON DELETE CASCADE
    TABLE "customermap" CONSTRAINT "customermap_clientid" FOREIGN KEY (clientid) REFERENCES clientprofile(clientid) ON DELETE CASCADE
    TABLE "documentinfo" CONSTRAINT "documentinfo_clientid" FOREIGN KEY (clientid) REFERENCES clientprofile(clientid) ON DELETE CASCADE
    TABLE "documentinfo" CONSTRAINT "documentinfo_tpid" FOREIGN KEY (tpid) REFERENCES clientprofile(clientid) ON DELETE CASCADE
    TABLE "message" CONSTRAINT "message_clientid" FOREIGN KEY (clientid) REFERENCES clientprofile(clientid) ON DELETE CASCADE
    TABLE "message" CONSTRAINT "message_receiverid" FOREIGN KEY (receiverid) REFERENCES clientprofile(clientid) ON DELETE CASCADE
    TABLE "notransextractor" CONSTRAINT "notransextractor_receiverid" FOREIGN KEY (receiverid) REFERENCES clientprofile(clientid) ON DELETE CASCADE
    TABLE "notransextractor" CONSTRAINT "notransextractor_senderid" FOREIGN KEY (senderid) REFERENCES clientprofile(clientid) ON DELETE CASCADE
    TABLE "notransinfo" CONSTRAINT "notransinfo_receiverid" FOREIGN KEY (receiverid) REFERENCES clientprofile(clientid) ON DELETE CASCADE
    TABLE "notransinfo" CONSTRAINT "notransinfo_senderid" FOREIGN KEY (senderid) REFERENCES clientprofile(clientid) ON DELETE CASCADE
    TABLE "senderisamap" CONSTRAINT "senderisamap_parentid" FOREIGN KEY (parentid) REFERENCES clientprofile(clientid) ON DELETE CASCADE
    TABLE "senderisamap" CONSTRAINT "senderisamap_senderid" FOREIGN KEY (senderid) REFERENCES clientprofile(clientid) ON DELETE CASCADE
    TABLE "tptxnhelper" CONSTRAINT "tptxnhelper_receiverid" FOREIGN KEY (receiverid) REFERENCES clientprofile(clientid) ON DELETE CASCADE
    TABLE "tptxnhelper" CONSTRAINT "tptxnhelper_senderid" FOREIGN KEY (senderid) REFERENCES clientprofile(clientid) ON DELETE CASCADE
    TABLE "translationmapping" CONSTRAINT "translationmapping_clientid" FOREIGN KEY (clientid) REFERENCES clientprofile(clientid) ON DELETE CASCADE
    TABLE "translationmapping" CONSTRAINT "translationmapping_tpid" FOREIGN KEY (tpid) REFERENCES clientprofile(clientid) ON DELETE CASCADE
    TABLE "validation" CONSTRAINT "validation_clientid" FOREIGN KEY (clientid) REFERENCES clientprofile(clientid) ON DELETE CASCADE
    TABLE "xmcuser" CONSTRAINT "xmcuser_clientid" FOREIGN KEY (clientid) REFERENCES clientprofile(clientid) ON DELETE CASCADE
Triggers:
    customerinfo_client_add AFTER INSERT ON clientprofile FOR EACH ROW EXECUTE PROCEDURE set_customerinfo_clientid()

Table extractkeyfield

    Column     |           Type           | Modifiers | Storage  | Description
---------------+--------------------------+-----------+----------+-------------
 primaryid     | character varying(255)   |           | extended |
 keyfieldtype  | character varying(23)    |           | extended |
 keyfield      | character varying(255)   |           | extended |
 messageid     | character varying(255)   |           | extended |
 keyfieldtime  | timestamp with time zone |           | plain    |
 last_modified | timestamp with time zone |           | plain    |
Indexes:
    "extractkeyfield_keyfield" btree (keyfield)
    "extractkeyfield_last_modified" btree (last_modified)
    "extractkeyfield_messageid" btree (messageid)
    "extractkeyfield_primaryid" btree (primaryid)
Has OIDs: no

Best Answer

Query

Your question updates revealed that your query can be improved radically:

SELECT m.*
     , (SELECT keyfield  -- correlated subquery
        FROM   extractkeyfield 
        WHERE  messageid = m.messageid
        LIMIT  1)    AS keyfield
     , s.description AS sendername
     , r.description AS receivername
FROM  (
   SELECT messagetime    -- no DISTINCT necessary
        , messageid
        , clientid
        , message_type
        , contenturl
        , nativecontenturl
        , receiverid
        , status
        , nativeisa
        , nativegs
        , isa
        , gs
        , originalfilename
        , duplicatekeyfield
        , duplicatenativeid
        , count(*) OVER () AS full_count
   FROM   message
   WHERE  messagetime >= timestamp '2015-01-01 03:40:50'  -- incl. lower bound
   AND    messagetime <  timestamp '2016-04-01 03:35:10'  -- excl. upper bound
   ORDER  BY messagetime DESC NULLS LAST
   LIMIT  100            -- count and limit before joining more tables
   ) m
LEFT   JOIN clientprofile s ON s.clientid = m.clientid  -- join later
LEFT   JOIN clientprofile r ON r.clientid = m.receiverid;

Indexes

The perfect indexes for this query would be:

CREATE INDEX message_messagetime_idx ON message (messagetime DESC NULLS LAST);

Usefulness depends on the percentage of rows selected. (You still did not provide cardinalities.) For a small percentage, the index is more likely to be useful.

Your existing index message_messagetime_clientid_receiverid_status_messagetype works as well, less efficiently since it carries some more columns of "dead freight" for the use case.

Joins to clientprofile are covered by its PK. To optimize read performance you might create a multicolumn index on (clientid, description) to allow index-only scans. Since the rows are wide, this should pay.

Finally, extractkeyfield could profit from another multicolumn index on (messageid, keyfield). Again, only useful if you get index-only scans out of it. The rows are not as wide, the benefit is smaller than for clientprofile.

If you get all of this right, the query should be faster by orders of magnitude.

Major points

You still get an arbitrary pick from multiple related extractkeyfield.keyfield. That's according to your definition. Typically, a deterministic pick is more useful.

  • Since messageid is the PK of table message DISTINCT ON (message.messagetime, message.messageid) never makes sense. DISTINCT ON (message.messageid) would achieve the same, cheaper. Since you need the total count before LIMIT, a DISTINCT step makes the query much more expensive since all involved rows have to processed (not just counted). Luckily, DISTINCT can be avoided completely.

  • The correlated subquery with LIMIT 1 prevents duplicates a priori. So you don't need the expensive DISTINCT ON for the big table.

    (SELECT keyfield
     FROM   extractkeyfield 
     WHERE  messageid = m.messageid
     LIMIT  1)
    

    In Postgres 9.3+ you can also use a LATERAL join. Detailed explanation:

  • Select, count and limit rows from your main table before you join to additional tables. That's what happens in my subquery m. Else you do a lot of additional work just to throw it away later.

  • You shouldn't need more work_mem with the fixed query. That was a side effect of the needless DISTINCT step for > 1M rows. Check for lines mentioning "Disk" in connection with "sort" in the EXPLAIN output of the new query.

  • I modified your predicate for the time range:

    WHERE  messagetime >= timestamp '2015-01-01 03:40:50'  -- incl. lower bound
    AND    messagetime <  timestamp '2016-04-01 03:35:10'  -- excl. upper bound
    

    Consider the explanation:

  • Your column messagetime can be NULL. Careful with descending sort order! I made that:

    ORDER  BY messagetime DESC NULLS LAST
    

    Details:

Aside: your table definitions can probably be improved. Among other things, varchar(255) is an unlikely candidate for a PK. And I suspect some of your many indexes might not be used. But that's beyond the scope of this question.