PostgreSQL – How to deal with a seldom-used big database in PostgreSQL

azure-vm, performance, postgresql, postgresql-performance

I loaded OpenStreetMap data for the whole of Europe (400 GB) into PostgreSQL (9.3). Then I installed a geocoding API (Nominatim) that queries the database with GIS queries.

My problem is:

This database is queried only a few times a day, but because Postgres loads
data into its buffers on demand, the first query on a particular GPS point
is always slow. Since I run only one query per GPS point, every query is
effectively slow (around 60 seconds versus 100 ms once the data is buffered).

What could I do here?

  • I'm using an Azure A6 instance (4 cores, 28 GB memory, HDD).
  • The server runs Ubuntu 14.04.4 LTS.
  • Swap is enabled.
  • There is an index on the geometry columns being queried.
  • shared_buffers is 10 GB
  • work_mem is 256 MB
  • maintenance_work_mem is 4 GB (see the quick check after this list)
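The bullets above map to the standard PostgreSQL parameters of the same names; a quick way to confirm what the running server actually uses is to check them from a psql session (the only assumption below is that the bullets refer to these parameters).

    -- Verify the memory settings listed above from psql.
    SHOW shared_buffers;         -- expected: 10GB
    SHOW work_mem;               -- expected: 256MB
    SHOW maintenance_work_mem;   -- expected: 4GB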

Best Answer

Depending on your usage, you could try partitioning the data to reduce the amount of data scanned (at 400 GB, even your indexes are large and heavy to read). I think that's your best option, since working with 400 GB tables is heavy even with very well-optimized queries. You could also pre-warm the cache with the index data, either by issuing some of your queries against frequent GPS points or by using something like pg_prewarm (see the sketches below).
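As a rough illustration of the pre-warming idea: pg_prewarm is a contrib extension that loads a relation into shared buffers, but note it ships with PostgreSQL 9.4 and later, so on 9.3 the practical equivalent is simply running a few representative queries at startup (for example from cron), as suggested above. The relation names placex and idx_placex_geometry below are typical of a Nominatim import but are assumptions here; substitute whatever relations your slow queries actually hit.

    -- Load the contrib extension (available in PostgreSQL 9.4+).
    CREATE EXTENSION IF NOT EXISTS pg_prewarm;

    -- Pull the main Nominatim table and its GiST geometry index into
    -- shared buffers so the first geocoding query does not pay for the
    -- disk reads. Run this after a restart or from a nightly cron job.
    SELECT pg_prewarm('placex');
    SELECT pg_prewarm('idx_placex_geometry');

For the partitioning suggestion, here is a minimal sketch using table inheritance (declarative partitioning only arrived in PostgreSQL 10, so inheritance plus CHECK constraints is the 9.3 approach). The places table, the country_code column, and the coordinates are made up for illustration and are not Nominatim's actual schema.

    -- Parent table plus per-country children; the CHECK constraints let
    -- constraint exclusion skip children that cannot match the filter.
    CREATE EXTENSION IF NOT EXISTS postgis;

    CREATE TABLE places (
        id           bigint,
        country_code char(2),
        geom         geometry(Point, 4326)
    );

    CREATE TABLE places_de (CHECK (country_code = 'de')) INHERITS (places);
    CREATE TABLE places_fr (CHECK (country_code = 'fr')) INHERITS (places);

    -- Each child gets its own, much smaller, spatial index.
    CREATE INDEX ON places_de USING gist (geom);
    CREATE INDEX ON places_fr USING gist (geom);

    -- A lookup that filters on country_code only reads one child table
    -- and its index instead of the full 400 GB data set.
    SELECT id
    FROM places
    WHERE country_code = 'de'
      AND ST_DWithin(geom, ST_SetSRID(ST_MakePoint(13.4, 52.52), 4326), 0.01);

With constraint_exclusion left at its default of partition, the planner prunes the non-matching children, so each query touches a per-country index rather than one huge one, which also makes the working set small enough to stay cached.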