PostgreSQL – JSON Field Performance with Large Documents

json, postgresql, postgresql-9.3

How well is the PostgreSQL json type optimized for large JSON documents? I am especially concerned about the cost of partial retrieval (e.g. getting the last N items of a JSON array, or looking up one particular key in a JSON object) when the JSON value is several MB in size and too large to load in full efficiently.
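For concreteness, here is roughly what I mean by partial retrieval, assuming a table records with a json column annotations (names are placeholders):

    -- If annotations is a JSON object: look up one particular key (9.3 operators):
    SELECT annotations -> 'ann_42'
    FROM   records
    WHERE  id = 1;

    -- If annotations is a JSON array: fetch the last 5 elements:
    SELECT annotations -> i.idx
    FROM   records,
           LATERAL generate_series(json_array_length(annotations) - 5,
                                   json_array_length(annotations) - 1) AS i(idx)
    WHERE  id = 1;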

Background: I am working with a dataset where each record has 10,000s of annotations. I do not need these annotations fully indexed, but I do need records to insert quickly, so I am considering storing the annotations in a JSON field instead of creating thousands of additional rows in a mapping table.
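The two designs I am weighing, roughly (a sketch with hypothetical names):

    -- Option A: annotations inline as a single json value
    CREATE TABLE records (
        id          serial PRIMARY KEY,
        annotations json
    );

    -- Option B: a mapping table, i.e. 10,000s of extra rows per record
    CREATE TABLE record_annotations (
        record_id   integer NOT NULL,
        key         text    NOT NULL,
        value       text
    );

    -- With option A, each record is a single fast INSERT:
    INSERT INTO records (annotations)
    VALUES ('{"ann_1": "a", "ann_2": "b"}');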

This relates to PostgreSQL 9.3.

Best Answer

In PostgreSQL 9.3, performance will be fairly poor for large json documents. The value is stored on disk as plain json text, so the whole document must be re-parsed every time you access a field, subscript an array, etc.
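You can see the cost directly with a synthetic document in psql (a sketch; the table name and sizes here are made up, but any multi-MB json value behaves the same way):

    -- Build one json document with 100,000 keys (several MB of text):
    CREATE TABLE json_demo AS
    SELECT ('{' || string_agg('"ann_' || i || '": "' || md5(i::text) || '"', ',')
                || '}')::json AS doc
    FROM   generate_series(1, 100000) AS i;

    \timing on
    -- Every key lookup re-reads and re-parses the entire document:
    SELECT doc -> 'ann_42' FROM json_demo;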

PostgreSQL 9.4 is likely to change this, with support for jsonb storage on disk. It'll still have to read the whole json object from disk to access any part of it, but it won't have the same kinds of parsing overheads you'll see in 9.3. It'll also gain many of the indexing features of hstore.
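To give an idea, once 9.4 is out, converting the column to jsonb and adding a GIN index should look roughly like this (a sketch against the planned 9.4 feature set, using the jsonb_path_ops operator class for containment queries):

    -- PostgreSQL 9.4+ (sketch):
    ALTER TABLE records
        ALTER COLUMN annotations TYPE jsonb USING annotations::jsonb;

    CREATE INDEX records_annotations_gin
        ON records USING gin (annotations jsonb_path_ops);

    -- Containment lookups can then use the index:
    SELECT id FROM records WHERE annotations @> '{"ann_42": "a"}';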

In 9.3, I strongly suspect that hstore, which is stored in a compact binary representation on disk and supports a variety of indexing features, is much more appropriate for your needs.
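A minimal hstore version of your schema, assuming the annotations are flat text key/value pairs (hstore stores a single level of text keys and values):

    CREATE EXTENSION IF NOT EXISTS hstore;

    CREATE TABLE records_hstore (
        id          serial PRIMARY KEY,
        annotations hstore
    );

    -- Single-key lookup against hstore's binary representation,
    -- rather than re-parsing a multi-MB text document:
    SELECT annotations -> 'ann_42' FROM records_hstore WHERE id = 1;

    -- Optional GIN index for key-existence (?) and containment (@>) queries:
    CREATE INDEX records_hstore_annotations_gin
        ON records_hstore USING gin (annotations);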