PostgreSQL – JSONB with Indexing vs Hstore

database-designpostgresql

I am trying to decide on the database design, with as few assumptions (regarding how the web app actually evolves) as possible at this stage.

As a first step, understanding that JOINS are expensive, I am considering a small number of monolithic tables as opposed to a large number of normalized smaller tables. As a second point, I am confused between using hstore vs. regular tables vs. JSONB (with GiST indexing).

AFAIK (please feel free to correct):

  1. Generally, in Postgres, hstore is known to perform better than other data types. This presentation from FOSDEM PGDAY has some interesting stats (in the second half of the slides).
    https://wiki.postgresql.org/images/b/b4/Pg-as-nosql-pgday-fosdem-2013.pdf

  2. An advantage with hstore is the fast indexing (GiN or GiST). However, with JSONB, GiN and GiST indexing can also be applied to JSON data.

  3. This blog from a professional at 2nd Quadrant says "At this point it’s probably worth replacing hstore use with jsonb in all new applications" (scroll to the end):
    http://blog.2ndquadrant.com/postgresql-anti-patterns-unnecessary-jsonhstore-dynamic-columns/

So I would like to decide on the following:

  1. For the main (structured) part of the data: should it go in a couple of
    relational tables (relatively large with many columns), or should it be a number of key-value stores using hstore?
  2. For the ad hoc (user contributed/unstructured) data, should it be in JSON or ad hoc key value stores in hstore (with the keys stored in one of the main relational tables)?

Best Answer

Relational databases are designed around joins, and optimized to do them well.

Unless you have a good reason not to use a normalized design, use a normalised design.

jsonb and things like hstore are good for when you can't use a normalized data model, such as when the data model changes rapidly and is user defined.

If you can model it relationally, model it relationally. If you can't, consider json etc. If you're choosing between json/jsonb/hstore, generally choose jsonb unless you have a reason not to.

That's what I said in my blog post, which addresses just this topic. Please read the whole post. The paragraph you quoted points out that if you're choosing a dynamic structure you should choose jsonb over hstore, but the rest of the blog post is about why you should usually prefer to model relationally if you can.

So. Model the main structured part relationally. If the tables are really wide with lots of columns, this might be a sign that further normalization is required. Do not be afraid of joins. Learn to love joins. Joining many small tables will often be faster than querying and maintaining big denormalized tables. Denormalize only if you need to for specific cases, and preferably via materialized views ... but don't do it until you know you need to and have an actual concrete problem to solve.

For user-contributed data that's freeform and unstructured, use jsonb. It should perform as well as hstore, but it's more flexible and easier to work with.

One relevant thing to understand: GiST and GIN indexes like those used on jsonb are generally much less efficient than a plain b-tree index. They're more flexible, but a b-tree index on a normal column will almost always be much, much faster.