Postgresql – How to optimize multiple ORDER BYs

optimizationorder-bypostgresql

I'm getting sluggish performance on a table with ~300,000 rows and B-Trees on each column.

This is for a dynamic pagination page where the query is constructed on demand, and the application caches the primary key in the query's specified order.

For this query

explain analyze 
 SELECT supplier_management.buyer_purchase_order_id 
     FROM supplier_management 
     ORDER BY item_description DESC,
      item_number DESC,
      order_type ASC,
      possession_date DESC,
      shipment_type DESC,
      store_type DESC

I get these results:

Sort (cost=51026.98..51750.35 rows=289348 width=72) (actual time=8229.280..12349.596 rows=289348 loops=1)

Sort Key: item_description, item_number, order_type, possession_date, shipment_type, store_type

Sort Method: external merge Disk: 24744kB

-> Seq Scan on supplier_management (cost=0.00..10876.48 rows=289348 width=72) (actual time=0.015..187.426 rows=289348 loops=1)

Total runtime: 12407.064 ms

How can the performance of a multi-column sort be improved? Or should I just do it in C++?

Table structure

buyer_purchase_order_id bigint
supplier_number bigint
supplier_name   character varying
purchase_order_number   bigint
store_number    integer
item_number bigint
item_description    character varying
project_type    character varying
order_date  integer
requested_arrival_date  integer
department  character varying
store_type  character varying
shipment_type   character varying
order_type  character varying
quantity_ordered    integer
quantity_allocation integer
quantity_staged integer
quantity_shipped    integer
quantity_received   integer
in_stock_date   integer
in_stock_date_visible_on    integer
show_red    boolean
requested_arrival_date_plus_four_business_days  integer
supplier_status character varying
notes_comments  character varying
requested_arrival_date_color    character varying
grand_opening_date  integer
possession_date integer
consolidator_name   character varying
real_requested_arrival_date_plus_four_business_days integer

No character varying exceeds the length limit for indexing; however, there is some duplication. Would it be more efficient to put the actual values in another table, normalize, and join on the related integer columns?

Best Answer

A few things you can do:

Use enums or lookups keyed by integer values, or a simple "char" field, instead of varchar sort keys where possible. I'd use an enum because you can control the sort order easily.

The only serious downside with an enum is that you can't currently drop values from an enum type. You can add them (including inserting them in the middle of the sort order) but not remove them. If that's a problem, you'll want to use a lookup table, or just a field declared "char" that has single character codes.

Also, if you don't need proper language collation, specify COLLATE "C" for character fields, e.g.

CREATE INDEX itemdesc_c ON supplier_management (item_description ASC COLLATE "C");

and then:

ORDER BY ...
    itemdesc COLLATE "C",
    ...

Important things to note:

  • Pg can combine indexes for predicates (WHERE clauses etc) but not sorting. You can't use a bitmap index scan for a sort. So it can use at most one of the candidate indexes, then it has to sort the rows within each group.

  • Low-selectivity indexes are a waste of time. If the values aren't widely distributed, don't index the column.

  • Pg's doing an on-disk sort. Throw more memory at the problem - try SET work_mem = '20MB' to start with. But see my comments below re thrashing with high max_connections. Use a connection pool.

  • Use a connection pool.

  • Indexes have a cost - they slow down insert/update/delete and increase vacuum work. So if the index isn't being used lots, get rid of it.

  • pg_catalog.pg_stat_user_indexes will help you tell which indexes are used.

  • pg_stat_statements (in contrib) and pg_stat_plans (the latter is an external module) are very useful for capturing data about query patterns, slow queries, etc.

  • Learn to love the auto_explain module.


Also, if you always do this sort, creating a composite index to match it will help.

CREATE INDEX bigindex ON supplier_management (
      item_description DESC,
      item_number DESC,
      order_type ASC,
      possession_date DESC,
      shipment_type DESC,
      store_type DESC
);

... but be aware that it's only useful for this particular sort, and it'll be a big index so it's only worth having if you do this a lot. In fact, you might as well add supplier_management.buyer_purchase_order_id too, so it can do an index-only scan:

CREATE INDEX bigindex ON supplier_management (
      item_description DESC,
      item_number DESC,
      order_type ASC,
      possession_date DESC,
      shipment_type DESC,
      store_type DESC,
      buyer_purchase_order_id
);