I need to do A LOT (millions) of lookups and I need them be as efficient as possible.
I'm using the ip4r extension for PostgreSQL:
https://github.com/RhodiumToad/ip4r/blob/master/README.ip4r
Running PostgreSQL 9.6.1
My tables:
CREATE TABLE test_cidr_ipv6
(
id SERIAL PRIMARY KEY,
cidr ip6r NOT NULL,
domain INT NOT NULL,
start TIMESTAMP NOT NULL,
end TIMESTAMP,
data JSON NOT NULL
);
CREATE TABLE test_ipv6
(
id SERIAL PRIMARY KEY,
ip ip6 NOT NULL,
domain INT NOT NULL,
timestamp_start TIMESTAMP NOT NULL,
timestamp_end TIMESTAMP,
data JSON NOT NULL
);
Two Indices:
CREATE INDEX test_ipv6_idx ON test_ipv6 (ip);
CREATE INDEX test_cidr_ipv6_idx ON test_cidr_ipv6 USING GIST (cidr);
Current query:
(SELECT
test_ipv6.ip,
test_ipv6.data,
test_cidr_ipv6.cidr,
test_cidr_ipv6.data,
test_ipv6.start,
test_ipv6.end,
test_cidr_ipv6.start,
test_cidr_ipv6.end
FROM test_ipv6, test_cidr_ipv6
WHERE
test_ipv6.ip = ip6('2001:DB8::1')
AND ip6('2001:DB8::1') <<= test_cidr_ipv6.cidr
AND '2017-01-01 00:00:00.100000' BETWEEN test_ipv6.start AND test_ipv6.end
AND '2017-01-01 00:00:00.100000' BETWEEN test_cidr_ipv6.start AND test_cidr_ipv6.end)
UNION ALL
(SELECT ... ) # same as the query above, but with a different ip.
UNION ALL
... # repeated 1000+ times
Question #1-
Is there a better way to repeat the same query like I'm doing in a single "batch" without UNION ALL
?
Analyze on a single query:
db=# explain analyze SELECT
db-# test_ipv6.ip,
db-# test_ipv6.data,
db-# test_cidr_ipv6.cidr,
db-# test_cidr_ipv6.data,
db-# test_ipv6.start,
db-# test_ipv6.end,
db-# test_cidr_ipv6.start,
db-# test_cidr_ipv6.end
db-# FROM test_ipv6, test_cidr_ipv6
db-# WHERE
db-# test_ipv6.ip = ip6('2001:DB8::1')
db-# AND ip6('2001:DB8::1') <<= cidr
db-# AND '2017-01-01 00:00:00.100000' BETWEEN test_ipv6.start AND test_ipv6.end
db-# AND '2017-01-01 00:00:00.100000' BETWEEN test_cidr_ipv6.start AND test_cidr_ipv6.end;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.29..16.34 rows=1 width=127) (actual time=0.023..0.026 rows=2 loops=1)
-> Index Scan using test_ipv6_idx on test_ipv6 (cost=0.14..8.17 rows=1 width=71) (actual time=0.015..0.015 rows=1 loops=1)
Index Cond: (ip = '2001:DB8::1'::ip6)
Filter: (('2017-01-01 00:00:00.1'::timestamp without time zone >= start) AND ('2017-01-01 00:00:00.1'::timestamp without time zone <= end))
-> Index Scan using test_cidr_ipv6_idx on test_cidr_ipv6 (cost=0.14..8.17 rows=1 width=56) (actual time=0.005..0.008 rows=2 loops=1)
Index Cond: ('2001:DB8::1'::ip6r <<= cidr)
Filter: (('2017-01-01 00:00:00.1'::timestamp without time zone >= start) AND ('2017-01-01 00:00:00.1'::timestamp without time zone <= end))
Planning time: 0.154 ms
Execution time: 0.066 ms
(9 rows)
Question #2-
Is the "Nested Loop" concerning?
Is there a better way to structure this type of query to get the data I'm after?
When I do the "bulk" query by doing a UNION ALL
a bunch of times with this same query block, the execution time is ~4sec for 5k IPs at a time. Ideally, I would like to get 5k of IPs down to 500ms.
If anyone has any pointers for how to make this type of query more efficient I would be forever thankful.
Best Answer
UNION ALL
a bunch of times to begin with?Try something like this for your batch,
You may have better luck with a CTE or inlining something like this...