PostgreSQL – Managing Large Btree Index with Few Rows (OpenStreetMap GIS Data)

Tags: index, postgis, postgresql, postgresql-9.1, ubuntu

I imported a subset of OpenStreetMap data into a Postgres database using the standard tool (osm2pgsql). I then rebuilt the indexes on the table using plain SQL (dropping them, then recreating them), and they end up taking a massive amount of space.

The table is 9 GB, with an integer id, 70 text columns for attributes (for example, the column access_type has 250 distinct text values and is NULL in 90% of cases), and a possibly large geometry column. If I run:

 CREATE INDEX planet_osm_polygon_accesstype_index
   ON planet_osm_polygon
   USING btree (access_type);

I end up with an index that is 1 GB in size for 140k rows, which is massive considering how little data there is. This doesn't seem to happen on other systems. I have already tried ANALYZE and then VACUUM.
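For reference, you can check the on-disk size of the table and the index from within psql using the built-in size functions (the index name below matches the one created in the question):

    -- pg_relation_size returns bytes; pg_size_pretty formats them
    SELECT pg_size_pretty(pg_relation_size('planet_osm_polygon'))                   AS table_size,
           pg_size_pretty(pg_relation_size('planet_osm_polygon_accesstype_index'))  AS index_size;

That makes it easy to compare sizes before and after rebuilding.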

I don't know much about Postgres, but any hints about how to reduce the index size would be very appreciated.

(OS is Ubuntu 12.04, PostgreSQL is version 9.1.)

Best Answer

You seem to expect that rows with NULL values are excluded from a B-tree index automatically, but that's not the case. Those are indexed as well and can be searched for. However, since:

access_type ... is null in 90% of cases

that's hardly useful in your case. A value that common hardly ever makes sense in an index to begin with, be it NULL or any other value. Exclude NULL values from the index with a partial index:

CREATE INDEX planet_osm_polygon_accesstype_index ON planet_osm_polygon (access_type)
WHERE access_type IS NOT NULL;

The resulting index should be much smaller and thus faster.

Remember that you may have to include the same WHERE condition in your queries for Postgres to realize it can use the partial index.
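For example, the planner can only use the partial index when the query's WHERE clause implies the index predicate. A condition like an equality test implies IS NOT NULL automatically (column names other than access_type are assumed here for illustration):

    -- access_type = 'private' implies access_type IS NOT NULL,
    -- so the partial index qualifies for this query:
    EXPLAIN
    SELECT osm_id
    FROM   planet_osm_polygon
    WHERE  access_type = 'private';

A query with no condition on access_type at all, by contrast, cannot use the partial index.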
