According to your comment on a deleted post, you load all rows into a Java module and search there. But searching is better done in the database itself - that's what a database is good at. Only return the rows you actually need.
If you really need all rows, there are many small things that can make this faster. 1M rows will never be very fast, though.
Postgres 9.2 or later
You can make the index covering by appending fcv_id:
CREATE INDEX factura_venta_orden
ON factura_venta (fcv_fecha_comprobante, fcv_numero_comprobante, fcv_id);
This way, provided the table isn't updated too much, Postgres can retrieve results with an index-only scan.
The additional column comes last since it does not contribute to the sort order.
In Postgres 11 or later you can use the INCLUDE clause for this instead:
CREATE INDEX factura_venta_orden
ON factura_venta (fcv_fecha_comprobante, fcv_numero_comprobante) INCLUDE (fcv_id);
CLUSTER / pg_repack
I see you already found CLUSTER. You are aware that this is a one-time operation that should help your cause, but needs to be re-run after enough updates?
There is also the community tool pg_repack as a replacement for VACUUM FULL / CLUSTER.
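As a sketch, reusing the index name from the example above (the database name is a placeholder):

```sql
-- One-time physical rewrite of the table in index order.
-- Note: CLUSTER takes an ACCESS EXCLUSIVE lock for the duration.
CLUSTER factura_venta USING factura_venta_orden;
ANALYZE factura_venta;

-- Alternative without the long exclusive lock, using pg_repack's
-- --order-by option (run from the shell, not psql):
--   pg_repack -d mydb -t factura_venta -o fcv_fecha_comprobante,fcv_numero_comprobante
```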
work_mem
This line in your EXPLAIN output:
Sort Method: external merge Disk: 2928kB
tells us that sorting is not done in RAM, which is expensive. You could probably improve performance by tuning the setting work_mem accordingly:
work_mem (integer)
Specifies the amount of memory to be used by internal sort operations and hash tables before writing to temporary disk files. ...
Setting this too high may have adverse effects. Read the manual carefully. Consider increasing the setting only for the transaction with the big query:
BEGIN;
SET LOCAL work_mem = '50MB';
SELECT ...;
COMMIT;
50 MB is an estimate based on your EXPLAIN ANALYZE output for 73k rows. Test with 1M rows to determine the actual amount you need.
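To verify that the bigger work_mem actually moves the sort into RAM, run the query under EXPLAIN ANALYZE in the same transaction and check the Sort Method line - it should report an in-memory sort (quicksort) instead of an external merge:

```sql
BEGIN;
SET LOCAL work_mem = '50MB';
EXPLAIN (ANALYZE, BUFFERS)
SELECT ...;   -- your original query here
ROLLBACK;     -- EXPLAIN ANALYZE executed the query; nothing to keep
```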
The idle in transaction state shown in the pg_stat_activity output generally means that something started a transaction with BEGIN, did some work, and then never issued the COMMIT to finish the transaction. This leaves the ACCESS SHARE locks acquired by the SELECT statements hanging around, which blocks the ALTER TABLE statement you're trying to run from acquiring the ACCESS EXCLUSIVE lock it needs to change the column's type (a change that rewrites every row in the table).
They could only be considered "ghost queries" in the sense that they are hanging around waiting for a COMMIT or ROLLBACK that will probably never come. But they are problematic nonetheless, as you figured out.
There might be a bug somewhere in your application that has left all these connections open and waiting for a commit. I'd definitely check that out, and see if perhaps a code block doesn't have a terminating COMMIT.
More commonly (in my experience, YMMV of course), people start database sessions with a BEGIN, do some work, and then go to lunch without ever closing their session.
It's best to fix the problem at the source, but if you absolutely, positively need to do something right now, pg_cancel_backend and pg_terminate_backend are not bad ways to go.
Commenting on your answer, I'd specifically select for the state idle in transaction rather than LIKE '%idle%', because those are the ones causing your problems. Backends in the plain idle state are just waiting around patiently for new client commands and are not inside an open transaction.
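A minimal sketch of that, assuming Postgres 9.2 or later (where pg_stat_activity has the state column and the process ID column is named pid):

```sql
-- List sessions stuck in an open transaction, oldest first
SELECT pid, usename, xact_start, state_change, query
FROM   pg_stat_activity
WHERE  state = 'idle in transaction'
ORDER  BY xact_start;

-- Last resort: terminate one of them (sends SIGTERM)
SELECT pg_terminate_backend(12345);  -- replace 12345 with a pid from above
```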
You can look at the Explicit Locking documentation for more information about what blocks what. The description of idle in transaction is in the Standard Unix Tools section of the Monitoring chapter, for reference.
pg_cancel_backend and pg_terminate_backend issue SIGINT and SIGTERM signals, respectively, and the effects of those signals (and warnings about other signals) are documented in the Server Shutdown section.
Hope that helps explain what you saw and sheds some light on your mystery. =)
Best Answer
How to do it efficiently and how to do it without blocking everyone else for the duration are diametrically opposed goals. The efficient way is to schedule a downtime window in which to do it.
If that can't be done, then:
1. Add a new column of the desired type, with a NULL default value.
2. Change all the code that modifies the old column or inserts new rows so that it correctly populates the new column as well (but make sure it doesn't depend on the new value already being correct - it just makes it correct).
3. Create an index to efficiently find rows where the new column is still NULL.
4. Write a script that updates rows with a NULL value in the new column, setting it based on the value of the old column. Do this in chunks as large as feasible, given that the first row in the chunk will be locked for the duration of the processing of the chunk.
5. Once all chunks are done and verified, change all the code to rely on the new column rather than the old column.
6. Change all the code to stop trying to maintain the old column.
7. Drop the old column.
8. Drop the index that supports finding still-NULL values of the new column (unless it is still useful).
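The steps above might look like this, assuming for illustration a table t with primary key id and an integer column old_col being migrated to a bigint new_col (these names are placeholders, not from your schema):

```sql
-- Step 1: adding a nullable column is instant - no table rewrite
ALTER TABLE t ADD COLUMN new_col bigint;

-- Step 3: partial index to find rows not yet migrated
CREATE INDEX t_new_col_null ON t (id) WHERE new_col IS NULL;

-- Step 4: backfill in chunks; repeat until it reports UPDATE 0.
-- Step 2 (application code populating new_col on write) is assumed done.
UPDATE t
SET    new_col = old_col
WHERE  id IN (
   SELECT id
   FROM   t
   WHERE  new_col IS NULL
   LIMIT  10000
);
```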