I have a Postgres database with a column that has a large amount of text in it. There are potentially multiple IP addresses in the text, as well as a lot of other text.
I need to do a query
WHERE value LIKE '%' || ipValue || '%'
which would be like:
WHERE value LIKE '%192.168.2.1%'
The column is too big for an index:
[54000] ERROR: index row size 2872 exceeds btree version 4 maximum
2704 for index "rtfdata_value_index" Detail: Index row references
tuple (34153,51) in relation "rtfdata". Hint: Values larger than 1/3
of a buffer page cannot be indexed. Consider a function index of an
MD5 hash of the value, or use full text indexing.
An MD5 hash of the column value clearly will not work, and I don't think that a full-text index will allow me to search on the IP (since it's not a language word).
What can I do here?
My current best idea is to write a script that will create yet another table with a record_id
and an ip_address
column that will search each text column with a regex and pull out all the IPs and create 1 row per IP/record in this table. Then I can join this table when looking for the IP, instead of the wildcard search on the large text column.
Best Answer
You need a function which extracts the IP addresses into an array. The regexp you gave doesn't work for me, so I came up with my own crude one, you will probably want to tweak it to suit yourself: