Postgresql – Postgres text column too large for an index – searching for IP address, so I don’t think text search will work

database-designfull-text-searchindex-tuningpostgresqlsubstring

I have a Postgres database with a column that has a large amount of text in it. There are potentially multiple IP addresses in the text, as well as a lot of other text.

I need to do a query

WHERE value LIKE '%' || ipValue || '%'

which would be like:

WHERE value LIKE '%192.168.2.1%'

The column is too big for an index:

[54000] ERROR: index row size 2872 exceeds btree version 4 maximum
2704 for index "rtfdata_value_index" Detail: Index row references
tuple (34153,51) in relation "rtfdata". Hint: Values larger than 1/3
of a buffer page cannot be indexed. Consider a function index of an
MD5 hash of the value, or use full text indexing.

An MD5 hash of the column value clearly will not work, and I don't think that a full-text index will allow me to search on the IP (since it's not a language word).

What can I do here?

My current best idea is to write a script that will create yet another table with a record_id and an ip_address column that will search each text column with a regex and pull out all the IPs and create 1 row per IP/record in this table. Then I can join this table when looking for the IP, instead of the wildcard search on the large text column.

Best Answer

You need a function which extracts the IP addresses into an array. The regexp you gave doesn't work for me, so I came up with my own crude one, you will probably want to tweak it to suit yourself:

create or replace function extract_ip(text) returns text[] immutable language SQL as $$
  select array_agg(x[1]) from regexp_matches($1,'\d+.\d+.\d+.\d+','g')f(x) 
 $$;
create index on foobar using gin (extract_ip(value));
select * from foobar where extract_ip(value) && ARRAY['192.168.2.1'];