PostgreSQL – Optimize simple query using ORDER BY date and text

index, performance, postgresql, query-performance

I have a query against a table in Postgres that orders by a date field and a varchar field holding a document number. The table has 1,000,000 records.

The data types of the table are:

fcv_id = serial
fcv_fecha_comprobante = timestamp without time zone
fcv_numero_comprobante = varchar(60)

The query is:

SELECT fcv_id, fcv_fecha_comprobante FROM factura_venta
ORDER BY fcv_fecha_comprobante, fcv_numero_comprobante;

This query takes about 5 seconds, but if I take out the ORDER BY, it takes only 0.499 seconds.

The problem is that I need to run this query in the shortest time possible,
so I searched Google for what I could do and created a composite index with the following statements:

CREATE INDEX factura_venta_orden ON factura_venta
USING btree (fcv_fecha_comprobante ASC NULLS LAST
           , fcv_numero_comprobante ASC NULLS LAST);
ALTER TABLE factura_venta CLUSTER ON factura_venta_orden;

But the query takes the same time, or even longer.

I'm using Postgres 9.0.13. Here is the EXPLAIN ANALYZE output with 73,436 rows:

Sort  (cost=11714.03..11897.62 rows=73436 width=27) (actual time=1260.759..1579.853 rows=73436 loops=1)
  Sort Key: fcv_fecha_comprobante, fcv_numero_comprobante
  Sort Method:  external merge  Disk: 2928kB
  ->  Seq Scan on factura_venta  (cost=0.00..4018.36 rows=73436 width=27) (actual time=0.363..162.558 rows=73436 loops=1)
Total runtime: 1694.882 ms

Postgres is running on a Phenom II 1055T (3 cores) with 8 GB RAM and a 500 GB disk.

How can I optimize this query?

Best Answer

According to your comment on a deleted post, you load all rows into a Java module to conduct the search there. But searching is better done in the database itself - that's what a database is good at. Only return the rows you actually need.
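A minimal sketch of what that could look like, with a hypothetical date-range filter and LIMIT just to illustrate pushing the search into the database (adapt the predicate to your actual search):

SELECT fcv_id, fcv_fecha_comprobante
FROM   factura_venta
WHERE  fcv_fecha_comprobante >= '2013-01-01'
AND    fcv_fecha_comprobante <  '2013-02-01'
ORDER  BY fcv_fecha_comprobante, fcv_numero_comprobante
LIMIT  100;

A query like this can use the composite index for both the filter and the sort, and only touches the rows it actually returns.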

If you really need all rows, there are many small things you can do to make this faster. 1M rows will never be very fast, though.

Postgres 9.2 or later

You can make the index covering by appending fcv_id:

CREATE INDEX factura_venta_orden
ON factura_venta (fcv_fecha_comprobante, fcv_numero_comprobante, fcv_id);

This way, provided the table isn't updated too much, Postgres can retrieve results with an index-only scan.

The additional column comes last since it does not contribute to the sort order.
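To check whether you actually get an index-only scan, make sure the visibility map is current and look at the plan. A quick test, assuming you can vacuum the table:

VACUUM ANALYZE factura_venta;

EXPLAIN ANALYZE
SELECT fcv_id, fcv_fecha_comprobante
FROM   factura_venta
ORDER  BY fcv_fecha_comprobante, fcv_numero_comprobante;

The plan should show an Index Only Scan using factura_venta_orden instead of a sequential scan followed by a sort.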

In Postgres 11 or later you could make that:

CREATE INDEX factura_venta_orden
ON factura_venta (fcv_fecha_comprobante, fcv_numero_comprobante) INCLUDE (fcv_id);

CLUSTER / pg_repack

I see you already found CLUSTER. Be aware that this is a one-time operation: it should help your cause, but it needs to be re-run after enough writes have disturbed the physical order.
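Since you already ran ALTER TABLE ... CLUSTER ON, the table remembers the index, and re-clustering later is a single statement (note that it takes an exclusive lock on the table while it runs):

CLUSTER factura_venta;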

There is also the community tool pg_repack as a replacement for VACUUM FULL / CLUSTER.
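A sketch of the shell invocation, assuming pg_repack is installed and your database is named mydb (a placeholder name):

pg_repack --table=factura_venta --order-by='fcv_fecha_comprobante, fcv_numero_comprobante' mydb

Unlike CLUSTER, pg_repack rewrites the table without holding an exclusive lock for the whole run.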

work_mem

This line in your EXPLAIN output:

Sort Method:  external merge  Disk: 2928kB

tells us that the sort spills to disk, which is expensive. You can probably improve performance by tuning the corresponding setting, work_mem:

work_mem (integer)

Specifies the amount of memory to be used by internal sort operations and hash tables before writing to temporary disk files. ...

Setting this too high may have adverse effects. Read the manual carefully. Consider increasing the setting only for the transaction with the big query:

BEGIN;
SET LOCAL work_mem  = '50MB';
SELECT ...;
COMMIT;

50 MB is an estimate based on your EXPLAIN ANALYZE output for 73k rows. Test with 1M rows to find the actual amount you need.
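One way to calibrate, assuming you can run the full query against the live data: raise work_mem for the session and check whether the sort method in the plan switches from an external merge on disk to an in-memory quicksort:

SET work_mem = '50MB';

EXPLAIN ANALYZE
SELECT fcv_id, fcv_fecha_comprobante
FROM   factura_venta
ORDER  BY fcv_fecha_comprobante, fcv_numero_comprobante;

-- look for: Sort Method: quicksort  Memory: ... kB

Increase the value until the sort stays in memory, then use SET LOCAL with that value in the transaction, as shown above.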