Postgresql – Defining a physical strategy for a table with ~110M rows

database-designindexpartitioningperformancepostgresql

I am tasked with designing a table in PostgreSQL (never used it) to load data from flat files. This table will contain ~110M rows. Each row will have a certain range for IP Addresses (e.g.: 0.0.1.0 – 0.0.2.240) and around 20 attributes.

The data needs to be entirely replaced on weekly basis and once a day an update will be done based on daily files.

The application will query this table for a given IP Address. Appropriate range needs to be found and attributes returned.

My plan is to partition the table based on the first octet of the IPs. This will give me 255 partitions. Indexes will be based on Start and End IP for the range.

Please let me know if there are any more helpful ideas for partitioning and indexing? Any good practices will also help.

Best Answer

ip4r Extension

My plan is to partition the table based on first octet of IP. This will give me 255 partitions. Indexes will be based on start and end ip for the range. Please let me know if there are any more helpful ideas for partitioning and indexing? Any good practices will also help.

This is an ideal case for ip4r, excerpt from the docs.

[...] the builtin types have no good support for index lookups of the form (column >>= parameter), i.e. where you have a table of IP address ranges and wish to find which ones include a given IP address. This requires an rtree or gist index to do efficiently, and also requires a way to represent IP address ranges that do not fall precisely on CIDR boundaries. [... The default types] are variable length types (to support ipv6) with non-trivial overheads, and the authors (whose applications mainly deal in large volumes of single IPv4 addresses) wanted a more lightweight representation.

CREATE TABLE ipranges (range ip4r primary key, description text not null);
CREATE INDEX ipranges_range_idx ON ipranges USING gist (range);
INSERT INTO ipranges VALUES ('10.0.0.0/8','rfc1918 block 1');
INSERT INTO ipranges VALUES ('172.16.0.0/12','rfc1918 block 2');
INSERT INTO ipranges VALUES ('192.168.0.0/16','rfc1918 block 3');
INSERT INTO ipranges VALUES ('0.0.0.0/1','classical class A space');
INSERT INTO ipranges VALUES ('10.0.1.10-10.0.1.20','my internal network');
INSERT INTO ipranges VALUES ('127.0.0.1','localhost');

SELECT *
FROM ipranges
WHERE ip <<= range
ORDER BY ip, @ range;