Postgresql – Postgres random text in JSONB column

postgresqlquery

Following is the query I'm using to scrub some fields in the JSONB column. I'm trying to radomize the first and last name so would like to use something like md5(random()::text) as values.

    update people set
    data = to_jsonb(data) || '{"firstName": "random_text", "lastName": "random_text"}'
    where id = 'b3c09005-7afb-4ad6-922d-76078875e59e';

I tried replacing random_text with md5(...) but I get an error

"DETAIL: Token "md5" is invalid.".

I also tried using || to concat but that didn't work either.

Best Answer

You have to use the md5 hash as a part of the string so you have to concat it

CREATE TABLE people("id" varchar(50), "data" jsonb,"firstName" text, "lastName" text)

    update people set
    data = to_jsonb(data) || ('{"firstName": "' || MD5('PostgreSQL MD5') || '", "lastName": "' || MD5('PostgreSQL MD5') || '"}')::jsonb
    where id = 'b3c09005-7afb-4ad6-922d-76078875e59e';

db<>fiddle here

still md5 is old and not very secure, so you should use a more secure version

Related Solutions

Postgresql – How to set weights on a Postgres TSVECTOR column

Okay so this was dumb of me. The weights work correctly. The issue I was having was because of the forced UTF8 encoding (I simply set "utf8" on the db, didn't rebuild. I set it back to latin1 and it works now).

Postgresql text column character Limit

You are right in thinking that items of a list should be stored as individual values of a column, i.e. in separate rows of that column, rather than as a single value (CSV string or anything like that) – at least if you expect to have queries against individual items of the list. Relational databases are designed to work that way, and storing multiple values of a column as a list (single value) is a no-no in such cases.

Regarding your data sample of fruits and vegetables, I am assuming that you are not thinking of splitting two lists of values exactly like that, storing each list's elements alongside each other on the same row without there actually being a relationship between them, just for the sake of normalisation of the way the vegetables and fruits are stored. That would be wrong.

I mean, if currently you have a row with a list of fruits and a list of vegetables such that each fruit and each vegetable are related to this particular row, i.e. like this:

FoodsID  Fruits                      Vegetables     other  columns
-------  --------------------------  -------------  -----  -------
100      Apple,Avocado,Mango,Orange  Corn,Tomatoes  ...    ...

and you want to expand each list into a row set while keeping the relationship of each fruit and each vegetable to the Foods row, you do not just add more rows and put the first item of one list together with the first item of the other list on the same row, then the second item of each list on another row and so on, like this:

FoodsID  Fruits   Vegetables  ...
-------  -------  ----------  -----
100      Apple    Corn        ...
100      Avocado  Tomatoes    ...
100      Mango                ...
100      Orange               ...

Again, that would be wrong. Fruit Apple may well be related to food item 100 as may vegetable Corn, but if these two items are not related to each other, there is no reason to store them on the same row.

Commonly, when you want to store a relationship between a subset and an item, you use a separate table. As you have two subsets of different kinds that are related to the same item, you just use two tables. So it would be:

table Foods:

FoodsID  other  columns
-------  -----  -------
100      ...    ...

table Fruits:

FoodsID  Fruit
-------  -------
100      Apple
100      Avocado
100      Mango
100      Orange

table Vegetables:

FoodsID  Vegetable
-------  ---------
100      Corn
100      Tomatoes

where FoodsID in table Foods would be the primary key of that table and same-named column in tables Fruits and Vegetable would serve as a reference (foreign key) to the corresponding row in Foods.

This way, when querying against individual items of either list as belonging to that particular row in Foods, you would probably use a join. Your typical filter conditions would be as simple as Vegetable = 'some vegetable' or Fruit IN ('some fruit', 'some other fruit') – not FindPos(Vegetable, 'some vegetable') > 0 or anything of the kind. The conditions would be simpler because each column would now contain one value per row. That will lend well to further optimisation of performance, because when you query often against a column value (without applying a function to it first, that is), you can make such queries faster by adding an index on the column – something that would be pointless for lists stored as a single value.

More can be said on this topic, which would, however, be beyond the scope of your question. I suggest you look up the terms I highlighted in bold for more information.

Best Answer

Related Solutions

Postgresql – How to set weights on a Postgres TSVECTOR column

Postgresql text column character Limit

Related Question