This is possible for PostgreSQL 8.4 or later. To create a multi-column GIN index, you need to install the additional module btree_gin. Simple types like integer or date are usually better off with a (default) B-tree index, which is why btree_gin is not part of a standard PostgreSQL installation. But for a case like this a multi-column GIN index is fastest, so you need the additional operator classes for the plain types.
In PostgreSQL 9.1 or later you can install it with CREATE EXTENSION:
CREATE EXTENSION btree_gin;
In older versions, run this as a privileged system user (like postgres):
psql -d dbname -f SHAREDIR/contrib/btree_gin.sql
For a PostgreSQL 8.4 installation on Debian, this might be:
psql -d mydb -f /usr/share/postgresql/8.4/contrib/btree_gin.sql
Then, given this table:
CREATE TEMP TABLE tbl (age int, names text[], thedate date);
... you can create this multi-column GIN index:
CREATE INDEX tbl_gin_idx ON tbl USING GIN (names, age, thedate);
... which can be used by a query like:
SELECT * FROM tbl
WHERE age = 26
AND '{json}'::text[] <@ names
ORDER BY thedate;
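To verify that the planner actually picks the GIN index, you can inspect the plan. The exact plan depends on table size and current statistics, so this is just how one would check, not a guaranteed result (the BUFFERS option requires 9.0 or later; use plain EXPLAIN ANALYZE in 8.4):

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM tbl
WHERE  age = 26
AND    '{json}'::text[] <@ names
ORDER  BY thedate;
```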
Note that a GIN index carries a non-trivial cost for write operations. But that's hardly worth mentioning compared to what a SELECT on 10 million rows without an index would do to you.
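If that write overhead ever becomes a concern, one knob worth knowing about is GIN's fastupdate storage parameter (available since 8.4, on by default), which buffers new entries in a pending list to make writes cheaper at the cost of some search overhead. A sketch, assuming the index name from above:

```sql
-- fastupdate = on (default): writes go to a pending list first, searches
-- must scan that list too. Turning it off makes every write pay the full
-- GIN maintenance cost immediately, keeping searches lean:
ALTER INDEX tbl_gin_idx SET (fastupdate = off);
```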
You have a lot of indexes, and my first impulse would be to check whether all of them are actually used and remove the unused ones.
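One common way to spot candidates is the statistics collector. A sketch, assuming statistics have been gathered over a representative workload:

```sql
-- Indexes never used since statistics were last reset.
-- Caution: idx_scan = 0 only means "not scanned on this node since the
-- last stats reset"; check replicas and reset times before dropping.
SELECT s.schemaname
     , s.relname       AS tablename
     , s.indexrelname  AS indexname
     , pg_size_pretty(pg_relation_size(s.indexrelid)) AS index_size
FROM   pg_stat_user_indexes s
JOIN   pg_index i ON i.indexrelid = s.indexrelid
WHERE  s.idx_scan = 0
AND    NOT i.indisunique   -- unique indexes enforce constraints; keep them
ORDER  BY pg_relation_size(s.indexrelid) DESC;
```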
The optimal index for your query is not there yet, though: a multicolumn index on (scat_group_code, revenue). Since this is a materialized view, it's reasonable to assume a read-only situation, which is perfect for index-only scans.
Also, since the table (many columns, some potentially big text columns) is much wider than the index (two columns, fewer rows for the partial variant), an index-only scan is particularly efficient here. Size matters.
If this particular query is very common / important, I would go one step further and make it a partial, multicolumn index:
CREATE INDEX ON v_premises_filter (scat_group_code, revenue)
WHERE area IS NOT NULL
AND revenue IS NOT NULL;
I added the second condition revenue IS NOT NULL because min() and max() ignore NULL values. I tested with Postgres 9.6. In older versions you may have to add the logically redundant predicate AND revenue IS NOT NULL to your query to make the query planner understand that the index is applicable.
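The original query is not shown here, but given the min()/max() aggregates it presumably resembles something like the following (the filter value 'S' and the aggregate shape are hypothetical; only the extra predicate in the last line is the point):

```sql
SELECT min(revenue) AS min_revenue
     , max(revenue) AS max_revenue
FROM   v_premises_filter
WHERE  scat_group_code = 'S'   -- hypothetical value for illustration
AND    area    IS NOT NULL
AND    revenue IS NOT NULL;    -- logically redundant, helps older planners
```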
The index your query used in your tests (v_premises_filter_revenue_idx) is on (revenue), which is pretty inefficient for a predicate on scat_group_code. Adding that column as the first index item makes a big difference, because Postgres can now cheaply grab only the share of the index with matching scat_group_code. Where your 2nd query had to filter 1631527 rows (Rows Removed by Filter: 1631527), it now has to filter far fewer rows with area IS NULL OR revenue IS NULL for the simple index, and none at all for the partial index.
Also, since irrelevant rows are excluded from the partial index, it is substantially smaller, which contributes to performance as well.
You should see two index-only scans and a much faster query in either case now.
Why the fluctuation?
Your table is big (1909175 rows) and data distribution is obviously irregular. The default statistics target of 100 (default_statistics_target) is probably too small to gather enough detail. Postgres decides on a bad query plan for your 2nd query based on insufficient information. It might help to increase the STATISTICS setting for scat_group_code, revenue and area.
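A sketch of raising the per-column targets, assuming the materialized view name from above; 1000 is just an example value, and statistics must be re-collected afterwards for the change to take effect:

```sql
ALTER MATERIALIZED VIEW v_premises_filter
  ALTER COLUMN scat_group_code SET STATISTICS 1000;
ALTER MATERIALIZED VIEW v_premises_filter
  ALTER COLUMN revenue SET STATISTICS 1000;
ALTER MATERIALIZED VIEW v_premises_filter
  ALTER COLUMN area SET STATISTICS 1000;

ANALYZE v_premises_filter;  -- gather new, more detailed statistics
```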
But you won't need this with the suggested new index any more, because an index-only scan on the new index is always the best plan.
Best Answer
Reindexing is not dangerous and cannot harm data consistency. However, if you have time-critical writes, you may lose data if the table is locked and the DML is aborted.

Reindexing should not take a lot of time, but it usually involves reading the whole table, sorting the index fields and writing a new index. Given the time for COUNT(*), it will likely take five minutes or more.

It is unlikely this is an indexing problem.
COUNT(*) should use a table scan, in which case no index is read. I would expect you have an I/O problem of some sort. Try using COUNT(1) or COUNT(pk_field), which may use the index.

If you are running on a Unix or Linux platform, you may want to monitor disk activity with sar. You might also have a failing disk, which can cut I/O rates dramatically. Tables with large objects can also increase I/O significantly to construct the records for COUNT(*).