Replacing Substrings in PostgreSQL Array Elements

arraypostgresqlupdate

I have a table with an array of URLs in column urls. Now there's a need to update some URLs changing the hostname. E.g.:

{https://storage.host1.com/file.jpg, https://storage.host1.com/file2.jpg}

should become:

{https://data.newhost.net/file.jpg, https://data.newhost.net/file2.jpg}

Got stuck on array elements string replacement.

Best Answer

To only replace the leading hostname - and only update rows where anything changes to begin with:

UPDATE tbl t
SET    urls = t1.urls
FROM  (
   SELECT t.id, u.*
   FROM   tbl t
   LEFT   JOIN LATERAL (
      SELECT bool_or(true) FILTER (WHERE u.url ^@ 'https://storage.host1.com/') AS hit
           , array_agg(regexp_replace(url, '^https://storage.host1.com/', 'https://data.newhost.net/')) AS urls
      FROM   unnest(t.urls) u(url)
      ) u ON true
   ) t1
WHERE  t.id = t1.id
AND    t1.hit;

fiddle

Since elements are re-aggregated right after being unnested, original order should be preserved. But see:

Pretty expensive, as all arrays are unnested, each element is checked, and then all arrays are re-aggregated. There are no clean & easy index options to identify qualifying rows cheaply, and only process those. (You might use a workaround like a trigram index on the text representation of the array ...)

If you have to do this often, consider normalizing your DB design, with a 1:n relationship to a separate urls table. Would make this operation much simpler and faster.