PostgreSQL IP4R Extension – Optimizing Query

postgresql

I need to do A LOT (millions) of lookups and I need them be as efficient as possible.

I'm using the ip4r extension for PostgreSQL:
https://github.com/RhodiumToad/ip4r/blob/master/README.ip4r

Running PostgreSQL 9.6.1

My tables:

CREATE TABLE test_cidr_ipv6
(
    id SERIAL PRIMARY KEY,
    cidr ip6r NOT NULL,
    domain INT NOT NULL,
    start TIMESTAMP NOT NULL,
    end TIMESTAMP,
    data JSON NOT NULL
);
CREATE TABLE test_ipv6
(
    id SERIAL PRIMARY KEY,
    ip ip6 NOT NULL,
    domain INT NOT NULL,
    timestamp_start TIMESTAMP NOT NULL,
    timestamp_end TIMESTAMP,
    data JSON NOT NULL
);

Two Indices:

CREATE INDEX test_ipv6_idx ON test_ipv6 (ip);
CREATE INDEX test_cidr_ipv6_idx ON test_cidr_ipv6 USING GIST (cidr);

Current query:

(SELECT
   test_ipv6.ip,
   test_ipv6.data,
   test_cidr_ipv6.cidr,
   test_cidr_ipv6.data,
   test_ipv6.start,
   test_ipv6.end,
   test_cidr_ipv6.start,
   test_cidr_ipv6.end
 FROM test_ipv6, test_cidr_ipv6
 WHERE
   test_ipv6.ip = ip6('2001:DB8::1')
   AND ip6('2001:DB8::1') <<= test_cidr_ipv6.cidr
   AND '2017-01-01 00:00:00.100000' BETWEEN test_ipv6.start AND test_ipv6.end
   AND '2017-01-01 00:00:00.100000' BETWEEN test_cidr_ipv6.start AND test_cidr_ipv6.end)
UNION ALL
(SELECT ... )  # same as the query above, but with a different ip. 
UNION ALL
...            # repeated 1000+ times    

Question #1-
Is there a better way to repeat the same query like I'm doing in a single "batch" without UNION ALL?

Analyze on a single query:

db=# explain analyze SELECT
db-#    test_ipv6.ip,
db-#    test_ipv6.data,
db-#    test_cidr_ipv6.cidr,
db-#    test_cidr_ipv6.data,
db-#    test_ipv6.start,
db-#    test_ipv6.end,
db-#    test_cidr_ipv6.start,
db-#    test_cidr_ipv6.end
db-#  FROM test_ipv6, test_cidr_ipv6
db-#  WHERE
db-#    test_ipv6.ip = ip6('2001:DB8::1')
db-#    AND ip6('2001:DB8::1') <<= cidr
db-#    AND '2017-01-01 00:00:00.100000' BETWEEN test_ipv6.start AND test_ipv6.end
db-#    AND '2017-01-01 00:00:00.100000' BETWEEN test_cidr_ipv6.start AND test_cidr_ipv6.end;
                                                                               QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.29..16.34 rows=1 width=127) (actual time=0.023..0.026 rows=2 loops=1)
   ->  Index Scan using test_ipv6_idx on test_ipv6  (cost=0.14..8.17 rows=1 width=71) (actual time=0.015..0.015 rows=1 loops=1)
         Index Cond: (ip = '2001:DB8::1'::ip6)
         Filter: (('2017-01-01 00:00:00.1'::timestamp without time zone >= start) AND ('2017-01-01 00:00:00.1'::timestamp without time zone <= end))
   ->  Index Scan using test_cidr_ipv6_idx on test_cidr_ipv6  (cost=0.14..8.17 rows=1 width=56) (actual time=0.005..0.008 rows=2 loops=1)
         Index Cond: ('2001:DB8::1'::ip6r <<= cidr)
         Filter: (('2017-01-01 00:00:00.1'::timestamp without time zone >= start) AND ('2017-01-01 00:00:00.1'::timestamp without time zone <= end))
 Planning time: 0.154 ms
 Execution time: 0.066 ms
(9 rows)

Question #2-
Is the "Nested Loop" concerning?
Is there a better way to structure this type of query to get the data I'm after?

When I do the "bulk" query by doing a UNION ALL a bunch of times with this same query block, the execution time is ~4sec for 5k IPs at a time. Ideally, I would like to get 5k of IPs down to 500ms.

If anyone has any pointers for how to make this type of query more efficient I would be forever thankful.

Best Answer

  1. For the Nested Loop, there is nothing wrong with that.. You have less than five rows you're returning and you do it in one loop.
  2. I'm not sure what you mean as batch. You'll have to show your other queries. Why are you running a UNION ALL a bunch of times to begin with?

Try something like this for your batch,

SELECT
  test_ipv6.ip,
  test_ipv6.data,
  test_cidr_ipv6.cidr,
  test_cidr_ipv6.data,
  test_ipv6.start,
  test_ipv6.end,
  test_cidr_ipv6.start,
  test_cidr_ipv6.end
FROM test_ipv6
INNER JOIN test_cidr_ipv6
  ON (test_ipv6.ip <<= cidr )
WHERE
  test_ipv6.ip = any(ARRAY[ip6('2001:DB8::1'),ip6('2001:DB8::1')]::ip6[])
  AND '2017-01-01 00:00:00.100000' BETWEEN test_ipv6.start AND test_ipv6.end
  AND '2017-01-01 00:00:00.100000' BETWEEN test_cidr_ipv6.start AND test_cidr_ipv6.end;

You may have better luck with a CTE or inlining something like this...

WITH t AS (
  SELECT ipv6(iptext) AS ip
  FROM (VALUES
   ('fddf:c4c1:2573::/48') ,
   ('fd8d:d482:3c08::/48') 
  ) AS v(iptext) 
)
SELECT
  test_ipv6.ip,
  test_ipv6.data,
  test_cidr_ipv6.cidr,
  test_cidr_ipv6.data,
  test_ipv6.start,
  test_ipv6.end,
  test_cidr_ipv6.start,
  test_cidr_ipv6.end
FROM test_ipv6
INNER JOIN test_cidr_ipv6
  ON (test_ipv6.ip <<= cidr)
INNER JOIN t
  USING (ip)
WHERE
  '2017-01-01 00:00:00.100000' BETWEEN test_ipv6.start AND test_ipv6.end
  AND '2017-01-01 00:00:00.100000' BETWEEN test_cidr_ipv6.start AND test_cidr_ipv6.end;