Postgresql – Postgres – how to randomize two columns in a table

postgresqlrandom

I have a table of User data which I would like to disguise/anonymize. Two of the columns have location data (e.g. lat and long, both BigDecimals).

I have anonymized all the names and other personalized information, and I would like to keep each lat/long pair together, however randomize them within the users, so that the locations stay accurate, but their association with a User is now random.

I assume this involves an UPDATE function, but I'm not sure how to perform it on the same table.

I am very new at postgres and would appreciate any help – thanks!

Best Answer

so that the locations stay accurate, but their association with a User is now random.

If I understand you right, it could work like this:

UPDATE tbl
SET lat = t.lat
  , lon = t.lon
FROM (
   SELECT t2.id, t1.lat, t1.lon
   FROM  (
      SELECT lat, lon, row_number() OVER () AS rn
      FROM   tbl t1
      ) t1
   JOIN  (
      SELECT id, row_number() OVER (ORDER BY random()) AS rn
      FROM   tbl t2
      ) t2 USING (rn)
   ) t
WHERE  tbl.id = t.id;

lat / lon have moved to another row, randomly. Every connection to it's original row is gone, but lat & lon stay together.