PostgreSQL Performance – JSONB_Set Performance in PostgreSQL

postgresqlpostgresql-performance

I searched the internet and PG documentation for hours and couldn't find any benchmark for it.

In MySQL 8.0 they specifically state that now the optimizer is smart and can do in-place modification of JSON column without updating the whole column. But I couldn't find any similar article about PG. jsonb is around for a long time already and it would be really expected that such common thing as updating a single field of large JSON would be optimized, but I couldn't find any notion about it. Moreover I found here an answer from 3 year ago (PostgreSQL update JSONB without jsonb_set) that proves that updating whole column as text is faster than using jsonb_set.

I'm trying to model a table with 2M rows, each will have 2 JSON fields with 3600 simple numeric fields (date: value). I will need to add 1 field to each JSON daily, meaning 4M json_sets.
Is it feasible, or I need to switch to some NoSQL like MongoDB for this feature?

Thanks

Best Answer

Every UPDATE in PostgreSQL creates a new version of the row. There is no in-place update. So not only would a new JSON be created, but also all other columns in the table would be copied.

Updating part of a JSON is not common in relational databases, or at least it shouldn't be. If you feel the need to do so, you have chosen the wrong data model, and you would be much better off using table columns instead of JSON attributes. If you then split the data across several tables using a process called “normalization”, an UPDATE doesn't hurt quite as much.

Related Solutions

Postgresql – Using MongoDB and PostgreSQL together

Some thoughts....

Typically one does not want to store pieces of tightly interrelated information in different systems. The chances of things getting out of sync is significant and now instead of one problem on your hands you have two. One thing you can do with Mongo though is use it to pipeline your data in or data out. My preference is to keep everything in PostgreSQL to the extent this is possible. However, I would note that doing so really requires expert knowledge of PostgreSQL programming and is not for shops unwilling to dedicate to using advanced features. I see a somewhat different set of options than you do. Since my preference is not something I see listed I will give it to you.

You can probably separate your metadata into common data, data required for classes, and document data. In this regard you would have a general catalog table with the basic common information plus one table per class. In this table you would have an hstore, json, or xml field which would store the rest of the data along with columns where you are storing data that must be constrained significantly. This would reduce what you need to put in these tables per class, but would allow you to leverage constraints however you like. The three options have different issues and are worth considering separately:

hstore is relatively limited but also used by a lot of people. It isn't extremely new but it only is a key/value store, and is incapable of nested data structures, unlike json and xml.

json is quite new and doesn't really do a lot right now. This doesn't mean you can't do a lot with it, but you aren't going to do a lot out of the box. If you do you can expect to do a significant amount of programming, probably in plv8js or, if you want to stick with older environments, plperlu or plpython. json is better supported in 9.3 though at least in current development snapshots, so when that version is released things will get better.

xml is the best supported of the three, with the most features, and the longest support history. Then again, it is XML.....

However if you do decide to go with Mongo and PostgreSQL together, note that PostgreSQL supports 2 phase commit meaning you can run the write operations, then issue PREPARE TRANSACTION and if this succeeds do your atomic writes in Mongo. If that succeeds you can then COMMIT in PostgreSQL.

PostgreSQL – Combining JSONB Query Statements

What I ended up doing using inspiration from MatheusOl answer.

INSERT INTO test (post_id, username, votes) 
VALUES (12345, 'testuser', '{"commentid" : {"vote": true }}'::jsonb) 
  ON CONFLICT ON CONSTRAINT test_pkey DO UPDATE

  SET votes = test.votes || coalesce(test.votes->'commentid', '{"commentid" : {"vote: true"}}'))
   || jsonb_set(test.votes, '{commentid,vote}', 'true'::jsonb) 

WHERE NOT test.votes @> EXCLUDED.votes
RETURNING *;

When using his answer, "commentid" would still be overwritten, but I don't really understand why.

This doesn't do things very efficiently, as it recreates the "commentid" key every time as a new object, but in my case it will have significantly more reads than writes, so it should be fine

Best Answer

Related Solutions

Postgresql – Using MongoDB and PostgreSQL together

PostgreSQL – Combining JSONB Query Statements

Related Question