PostgreSQL – How to Speed Up a Slow Query

Tags: performance, postgis, postgresql, query-performance

I'm running PostgreSQL 9.6 with PostGIS 2.3.3.

I'm trying to make this rather critical query go faster (looking for users within 1000 meters of a location), but I'm having trouble setting up the right indexes.

Could someone point me in the right direction?

The users table has 200k rows, locations has 1,200 rows, and user_push_tokens has 155k rows.

users table:

create table users
(
  id serial not null
    constraint users_pkey
      primary key,
  (an additional 20-ish columns),
  geo_point geometry(Point,4326)
);

create unique index users_id
  on users (id);

create index users_geo_point_idx
  on users using gist(geo_point);

user_push_tokens table:

create table user_push_tokens
(
  user_id integer not null
    constraint push_tokens_user_id_fkey
      references users
        on delete cascade,
  push_token varchar(255) not null,
  push_provider varchar(64) not null,
  app varchar(64) default 'app'::character varying not null,
  id integer default nextval('user_push_tokens_id_seq'::regclass) not null
    constraint user_push_tokens_pkey
      primary key,
  active boolean default true not null
);

create index trinity_unique_index
  on user_push_tokens (user_id, app, push_token);

create index user_push_tokens_token_fetch_idx
  on user_push_tokens (user_id, app, active);

create index user_id_index
  on user_push_tokens (user_id);

locations table:

create table locations
(
  id integer default nextval('locations_id_seq'::regclass) not null
    constraint locations_pkey
      primary key,
  (another 25 columns),
  geo_point geometry(Point,4326)
);

create unique index locations_id_key
  on locations (id);

create index locations_geo_point_idx
  on locations using gist(geo_point);

The query

EXPLAIN ANALYSE SELECT
    u.id AS user_id,
    upt.push_provider,
    upt.push_token
  FROM users u
    JOIN locations l ON l.id = 3896
    JOIN user_push_tokens upt
      ON upt.user_id = u.id AND upt.active = true AND upt.app = 'app'
  WHERE ST_DistanceSphere(u.geo_point, l.geo_point) <= 1000;

results in

Nested Loop  (cost=26087.06..63658.87 rows=30605 width=107) (actual time=353.304..887.371 rows=2498 loops=1)
  Join Filter: (_st_distance(geography(u.geo_point), geography(l.geo_point), '0'::double precision, false) <= '1000'::double precision)
  Rows Removed by Join Filter: 89539
  ->  Index Scan using locations_id_key on locations l  (cost=0.28..8.29 rows=1 width=32) (actual time=0.009..0.014 rows=1 loops=1)
        Index Cond: (id = 3896)
  ->  Hash Join  (cost=26086.78..39090.07 rows=91815 width=139) (actual time=352.437..657.228 rows=92037 loops=1)
        Hash Cond: (upt.user_id = u.id)
        ->  Seq Scan on user_push_tokens upt  (cost=0.00..7162.82 rows=91815 width=107) (actual time=0.032..103.512 rows=92037 loops=1)
              Filter: (active AND ((app)::text = 'app'::text))
              Rows Removed by Filter: 62437
        ->  Hash  (cost=22114.46..22114.46 rows=195546 width=36) (actual time=352.199..352.199 rows=195589 loops=1)
              Buckets: 65536  Batches: 4  Memory Usage: 3563kB
              ->  Seq Scan on users u  (cost=0.00..22114.46 rows=195546 width=36) (actual time=0.014..214.976 rows=195589 loops=1)


Thank you for your time.

Best Answer

ST_DWithin

A predicate of the form ST_DistanceSphere(x, y) < t cannot use your spatial index. ST_DWithin can, depending on the selectivity. Instead, use

ST_DWithin(u.geo_point, l.geo_point, 1000);
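
For illustration, a sketch of the full query rewritten this way (untested against your data; the cast to geography keeps the 1000 in metres, matching what ST_DistanceSphere was computing, since with plain geometry(Point,4326) ST_DWithin would interpret the third argument in degrees):

EXPLAIN ANALYSE SELECT
    u.id AS user_id,
    upt.push_provider,
    upt.push_token
  FROM users u
    JOIN locations l ON l.id = 3896
    JOIN user_push_tokens upt
      ON upt.user_id = u.id AND upt.active = true AND upt.app = 'app'
  WHERE ST_DWithin(u.geo_point::geography, l.geo_point::geography, 1000);

-- For the geography cast to be index-assisted, an expression index would be
-- needed in addition to the existing geometry index (index name is just an example):
-- create index users_geo_point_geog_idx on users using gist ((geo_point::geography));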

Also, for testing purposes, you should remove the join to user_push_tokens and add it back once you get a better plan for the simpler query.
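
A minimal version of that test query might look like this (a sketch only, keeping the same location id and distance filter as above):

EXPLAIN ANALYSE SELECT u.id AS user_id
  FROM users u
    JOIN locations l ON l.id = 3896
  WHERE ST_DWithin(u.geo_point::geography, l.geo_point::geography, 1000);

Once this produces a sensible, index-assisted plan and runs quickly, add the user_push_tokens join back.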