Postgresql – Update new column with value from old columns for all the rows

database-designmigrationpostgresql

I've got an odd db design in place that was here when I got on my new work space. We have a column named params which contains some string data which appears that it belongs in more than one column, this is the example value of column params :

"---
:body: "Hello\r\nContacting you that I'm ok with your plan.\r\nPlease send me regular updates\r\nThanks"
:emp_id: 4818
"

So I was planning to separate body and emp_id in their own columns so that this table can behave like normal one so we can query the data like normal. Having the data in this format is very hard to query, yet they have silly queries to get/match values.

Anyways I wrote some of my regexes to extract those values and save them to these colums (body) :

select a[1]
from (
    select regexp_matches(params, ':emp_id: ([0-9]+)', 'g') a
    from exchange where id = 483193
) s

for emp_id

select b[1]
from (
    select regexp_matches(params, 'body: (.+):emp', 'g') b
    from exchange where id = 483193
) s

Where id is an id of one exchange. I want to do this for all rows, I tried something but was spinning in the circles, this looked to be most promising but doesn't work :

update exchange 
set emp_id = (
  select a[1]
  from (
   select regexp_matches(parameters, ':emp_id: ([0-9]+)', 'g') a
   from exchange where id in (select id from exchange)
  ) s
) where

How can I update all rows? To fill the new columns emp_id and body from existing values in params?

Update:

I've got a massive performance issue here. My db has 500k+ records. And doing update like this :

UPDATE exchange
SET emp_id=cast(subquery.a[1] as int)
FROM (select regexp_matches(params, ':emp_id: ([0-9]+)', 'g') a
    from exchange) AS subquery

Takes really long time. Can this be optimized in some manner?

Best Answer

The solution is very simple. If you do

UPDATE exchange
SET emp_id = {some expression here on the params column}; -- no WHERE clause!

it will do exactly what you want: pick the params from the row to be updated, calculate the expression and update emp_id with its value. As there is no WHERE clause, it will go through the whole table. You don't need the ID of the row or anything else.