For example, if you have 100,000 rows in a table that gets, say, 75%
read operations, would it be wise to add a filtered index that only
covers 500 rows? Or 100 rows? What if the filter covers 85,000 of
those rows?
That would depend on the queries against the table. If query patterns are such that they target a subset of the 500 or 100 rows that your filtered index covers, perfect. On the flip side, it's unlikely that the optimiser is going to choose a filtered index that includes 85% unless it is a covering index for a particular query.
I haven't tested this but I would expect filtered index utilisation to exhibit the same tipping point behaviour as non-filtered indexes.
Normally we'd expect that when postgres was restarted, the crash recovery process would have removed files related to a rollback'ed index from the data directory.
Let's assume that it didn't work, or at least that it has to be checked manually.
The list of files that should be in the datadir can be established with a query like this:
select pg_relation_filenode(oid)
from pg_class
where relkind in ('i','r','t','S','m')
and reltablespace=0
order by 1;
reltablespace=0
is for the default tablespace. If the problematic index was created in a non-default tablespace, this 0
must be replaced by its OID in pg_tablespace
.
i,r,t,S,m in relkind
correspond respectively to indexes, tables, toast space, sequences, materialized views. All these objects have their data in files whose names match pg_relation_filenode(oid)
.
On disk, the data files are below $PGDATA/base/oid/
where oid
is the oid
of the database obtained by select oid,datname from pg_database
.
If we're not talking about the default tablespace, base
is replaced by PG_version_somelabel
instead.
List and sort the files matching relfilenodes in that directory:
ls | grep -E '^[0-9]+$' | sort -n > /tmp/list-of-relations.txt
(that actually keeps only the first segment for relations that are larger than 1Gb. If there are lingering segments not attached to anything they should be considered separately)
and diff that file with the result of the query above.
If there are lingering data files that do not correspond to any object that the db knows about, they should appear in that diff.
Best Answer
Free space within the table can be used to speed up certain kinds of updates on tables, do inserts faster by avoiding the need to extend the table, and make index growth more efficient.
If your table only changes infrequently and you don't mind reindexing it (or dropping & re-creating the index) each time, that's totally fine. A fillfactor of 100 will save a little bit of I/O when scanning the index, and the spare space is no use to you unless you do inserts/updates on the indexed table.