I have a query on a Postgres table with an ORDER BY on a date field and a number field; the table has 1,000,000 records.
The data types of the table are:
fcv_id = serial
fcv_fecha_comprobante = timestamp without time zone
fcv_numero_comprobante = varchar(60)
The query is:
SELECT fcv_id, fcv_fecha_comprobante FROM factura_venta
ORDER BY fcv_fecha_comprobante, fcv_numero_comprobante
This query takes about 5 seconds, but if I take out the ORDER BY the query takes only 0.499 seconds.
The problem is that I need to run this query in the shortest time possible,
so I searched on Google for what I could do and created a composite index with the following query:
CREATE INDEX factura_venta_orden ON factura_venta
USING btree (fcv_fecha_comprobante ASC NULLS LAST
, fcv_numero_comprobante ASC NULLS LAST);
ALTER TABLE factura_venta CLUSTER ON factura_venta_orden;
But the query is taking the same time or even more.
I'm using Postgres 9.0.13. Here is the EXPLAIN ANALYZE output with 73436 rows:
Sort (cost=11714.03..11897.62 rows=73436 width=27) (actual time=1260.759..1579.853 rows=73436 loops=1)
Sort Key: fcv_fecha_comprobante, fcv_numero_comprobante
Sort Method: external merge Disk: 2928kB
-> Seq Scan on factura_venta (cost=0.00..4018.36 rows=73436 width=27) (actual time=0.363..162.558 rows=73436 loops=1)
Total runtime: 1694.882 ms
Postgres is running on a Phenom II 1055T (3 cores) with 8 GB RAM and a 500 GB disk.
How can I optimize this query?
Best Answer
According to your comment on a deleted post, you load all rows into a Java module to conduct a search there. But searching is better done in the database itself - that's what a database is good at. Only return the rows you actually need.
If you really need all rows, there are many little things that can make this faster. Sorting 1M rows will never be very fast, though.
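For instance, if the application only needs a subset of rows at a time, a filtered and limited query avoids shipping the whole table to the client. This is only a sketch - the date predicate and page size are placeholders I made up, not values from the question:

```sql
-- Hypothetical example: return only the rows the application needs.
-- The date bound and LIMIT are illustrative placeholders.
SELECT fcv_id, fcv_fecha_comprobante
FROM   factura_venta
WHERE  fcv_fecha_comprobante >= '2013-01-01'
ORDER  BY fcv_fecha_comprobante, fcv_numero_comprobante
LIMIT  50;
```

With a matching index on the sort columns, a query like this can stop after the first 50 index entries instead of sorting all 1M rows.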
Postgres 9.2 or later
You can make the index covering by appending fcv_id as the last column. This way, provided the table isn't updated too much, Postgres can retrieve results with an index-only scan. The additional column comes last since it does not contribute to the sort order.
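A sketch of such a covering index, using the same columns as your factura_venta_orden index with fcv_id appended (the index name here is my own placeholder):

```sql
-- Postgres 9.2+: fcv_id appended so the query can be answered
-- from the index alone (index-only scan).
CREATE INDEX factura_venta_orden_covering ON factura_venta
(fcv_fecha_comprobante, fcv_numero_comprobante, fcv_id);
```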
In Postgres 11 or later you could make that a covering index with the INCLUDE clause instead:
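A sketch with INCLUDE, which keeps fcv_id out of the index's search tree as a payload column (again, the index name is a placeholder of mine):

```sql
-- Postgres 11+: fcv_id stored as payload only, not as a key column.
CREATE INDEX factura_venta_orden_incl ON factura_venta
(fcv_fecha_comprobante, fcv_numero_comprobante) INCLUDE (fcv_id);
```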
CLUSTER / pg_repack
I see you already found CLUSTER. Are you aware that this is a one-time operation that should help your cause, but needs to be re-run after enough updates? There is also the community tool pg_repack as a replacement for VACUUM FULL / CLUSTER.
work_mem
This line in your EXPLAIN output:
Sort Method: external merge Disk: 2928kB
tells us that sorting is not done in RAM, which is expensive. You can probably improve performance by tuning the work_mem setting. Setting it too high may have adverse effects; read the manual carefully. Consider increasing the setting only for the transaction with the big query:
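A minimal sketch of raising work_mem for a single transaction with SET LOCAL, so the setting reverts automatically at COMMIT and does not affect other sessions:

```sql
BEGIN;
SET LOCAL work_mem = '50MB';  -- applies to this transaction only
SELECT fcv_id, fcv_fecha_comprobante
FROM   factura_venta
ORDER  BY fcv_fecha_comprobante, fcv_numero_comprobante;
COMMIT;
```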
50 MB is an estimate based on your EXPLAIN ANALYZE output for 73k rows. Test with 1M rows to determine the actual amount you need.