I'm trying to speed up and simplify a SQL query against an imported OpenStreetMap database (OSM). The database is stored in a PostgreSQL 9.2.4 server.
This OSM import features two particular tables, planet_osm_rels
and planet_osm_ways
. In the first table exists an relation for country boundaries, which I can extract by querying an hstore column, tags_hstore
. The resulting members
attribute then contains an text array giving me a bunch of information, including which ways are part of this relation. The ID of the particular ways are prefixed with w
to indicate that it is a way ID, e.g. w23412
. To get the actual nodes of the ways, I need to query the planet_osm_ways
table with the IDs I obtained, minus the w
, of course.
To summarize, I've got the following table structure:
Table "public.planet_osm_rels"
Column | Type | Modifiers
-------------+----------+-----------
id | bigint | not null
way_off | smallint |
rel_off | smallint |
parts | bigint[] |
members | text[] |
tags | text[] |
pending | boolean | not null
tags_hstore | hstore |
Indexes:
"planet_osm_rels_pkey" PRIMARY KEY, btree (id)
"planet_osm_rels_idx" btree (id) WHERE pending
"planet_osm_rels_parts" gin (parts) WITH (fastupdate=off)
"planet_osm_rels_tags_hstore_idx" gin (tags_hstore)
Table "public.planet_osm_ways"
Column | Type | Modifiers
-------------+----------+-----------
id | bigint | not null
nodes | bigint[] | not null
tags | text[] |
pending | boolean | not null
tags_hstore | hstore |
Indexes:
"planet_osm_ways_pkey" PRIMARY KEY, btree (id)
"planet_osm_ways_idx" btree (id) WHERE pending
"planet_osm_ways_nodes" gin (nodes) WITH (fastupdate=off)
I've come up with the following query:
SELECT nodes
FROM planet_osm_ways
WHERE id IN (
SELECT trim(leading 'w' from unnest)::int
FROM (
SELECT unnest(members)
FROM planet_osm_rels
WHERE (tags_hstore @> '"type"=>"boundary", "admin_level"=>"2", "name:en"=>"Germany"'))
AS unnest
WHERE unnest LIKE 'w%');
The query is, small wonder, quite slow. I know that I can (a) eliminate the members
column by providing a link table and throw in some more indexes. However, I'd also like to optimize the query itself and at least remove one of the sub-queries, as the query plan is quite complex:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=299957.16..300008.23 rows=90957940 width=8)
-> HashAggregate (cost=299957.15..299957.16 rows=1 width=32)
-> Subquery Scan on unnest (cost=0.00..299954.76 rows=956 width=32)
Filter: (unnest.unnest ~~ 'w%'::text)
-> Seq Scan on planet_osm_rels (cost=0.00..297563.51 rows=191300 width=180)
Filter: ((tags)::hstore @> '"type"=>"boundary", "name:en"=>"Germany", "admin_level"=>"2"'::hstore)
-> Index Only Scan using planet_osm_ways_pkey on planet_osm_ways (cost=0.01..51.06 rows=1 width=8)
Index Cond: (id = (ltrim(unnest.unnest, 'w'::text))::integer)
(8 rows)
And the EXPLAIN ANALYZE
:
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=299957.16..299980.93 rows=39090200 width=1147) (actual time=18680.342..36216.571 rows=1266 loops=1)
-> HashAggregate (cost=299957.15..299957.16 rows=1 width=32) (actual time=18606.686..18608.105 rows=1259 loops=1)
-> Subquery Scan on unnest (cost=0.00..299954.76 rows=956 width=32) (actual time=468.391..18606.233 rows=1259 loops=1)
Filter: (unnest.unnest ~~ 'w%'::text)
Rows Removed by Filter: 1283
-> Seq Scan on planet_osm_rels (cost=0.00..297563.51 rows=191300 width=180) (actual time=468.376..18605.288 rows=2542 loops=1)
Filter: ((tags)::hstore @> '"type"=>"boundary", "name:en"=>"Germany", "admin_level"=>"2"'::hstore)
Rows Removed by Filter: 1912651
-> Index Scan using planet_osm_line_pkey on planet_osm_line (cost=0.01..23.73 rows=3 width=1155) (actual time=13.926..13.978 rows=1 loops=1259)
Index Cond: (osm_id = (ltrim(unnest.unnest, 'w'::text))::bigint)
Total runtime: 36217.277 ms
The number of rows returned isn't quite as high as to explain the long runtime:
count
-------
1266
I cannot use SELECT unnest(members) AS unnested .... WHERE unnested LIKE 'w%'
since the "unnested" part is unknown to the WHERE clause. Is there any better way to do it?
Best Answer
IN
queries with huge sets are notoriously slow. It's often faster to use aJOIN
instead:But that's not the most important problem here. I wonder why the GIN index
planet_osm_rels_tags_hstore_idx
is not being used. Are you selecting large enough parts of the tableplanet_osm_rels
to justify a sequential scan?Oh, and
id
is typebigint
. So cast tobigint
instead ofint
for less friction.If you can extract "way IDs" and save them redundantly in a separate column
way_ids bigint[]
in your table, your query would become quite a bit simpler and faster, with one less subquery level: