Postgresql small index for row contains query

indexpostgresql

I want to store several hundreds of gigabytes of geodata in a postgresql database.
I want to query the data on position, time and/or a unique identifier for each object.

My table layout is similar to this:

CREATE TABLE objects(
id int not null,
at timestamp not null,
pos geometry(Point) not null
/* Other columns irrelevant to the question */
) PARTITION BY RANGE (at)

BRIN indexes on at and pos perfectly serve the needs, since they speed up the queries with relatively small indexes.

A B-TREE index on id quickly becomes several gigabytes, so the server memory cannot hold many of them. BRIN is not suitable, as the id's are very spread within a page, so minimum and maximum is not a useful statistic.

Is there an alternate index type which can speed up queries like SELECT ... FROM objects WHERE id = x with smaller index sizes?

The table is append-only, so only for the newest timestamps new rows will be added.

Best Answer

Since seems like each value of id has many occurences in the table. Given that, you can use a GIN index with its compressed posting lists to save space. Usually GIN indexes are used with array types or their equivalents, but you can use the btree_gin extension to get access to GIN indexes with a scalar type.

However, GIN indexes have several limitations. They don't support ORDER BY, which doesn't seem to matter to you. And they don't support index-only-scans, which may be important to you. Even if the index shrinks to fit in RAM, each tuple pointer found in the index still has to go to the table for resolution. If your Btree index doesn't fit in RAM, surely your table doesn't either. So you will still have lots of random seeking into the table.

Indeed, that was almost certainly the problem in the first place--not random seeks against the btree index, but rather against the table. When you are using the BRIN index to drive table access, you naturally are going to visit the table with high locality of reference. It is not that the BRIN index is effective, it is that the same condition that allows the BRIN to be used also makes the table access for the particular set of rows being returned also be efficient. If you were to replace the BRIN indexes with equivalent Btree indexes, you would probably find that while the storage requirement went up, the performance stays mostly the same.

Related Solutions

Sql-server – the effect of replacing indexes with filtered (non-null value) indexes

Very interesting approach. My upvote for the creativity.

Since you reclaimed the space, I assume the original indexes are no longer in place? The downsides of filtered indexes then are:

Too many of them may cause the search space of the optimiser to grow too large, leading to poor query plans as the optimiser times out
There are several situations where a filtered index will not even be considered, even though the non-filtered equivalent would be. Notably, this can happen when you get a hash join on the indexed column or if you try to ORDER BY the column (without a filter)
Query parameterisation doesn't work with filtered indexes (see: http://www.sqlservercentral.com/blogs/practicalsqldba/2013/04/08/sql-server-part-9-filtered-index-a-new-way-for-performance-improvemnt/)

In practical terms, this means that you have to be extremely careful with filtered indexes as they will often result in horrible query plans. I would not go so far as to call them useless, but I view them as an addition to traditional indexes, not as a replacement (as you are trying to do).

Sql-server – Index usage stats DMV indicates no index activity

One possibility would be that you don't have sufficient permissions to view the metadata.

Another is that sys.dm_db_index_usage_stats only reports on indexes that have been used at least once since the server was last restarted. Also

whenever a database is detached or is shut down (for example, because AUTO_CLOSE is set to ON), all rows associated with the database are removed.

AUTO_CLOSE is the default for SQL Server Express.

You need to filter sys.dm_db_index_usage_stats by database_id to determine if it has metadata loaded for a particular database also the join should include index_id

SELECT *
FROM   sys.indexes i
       LEFT JOIN sys.dm_db_index_usage_stats s
         ON i.object_id = s.object_id
            AND i.index_id = s.index_id
            AND s.database_id = DB_ID()

Best Answer

Related Solutions

Sql-server – the effect of replacing indexes with filtered (non-null value) indexes

Sql-server – Index usage stats DMV indicates no index activity

Related Question