Original question (dupes within hstore value)
I think the root cause of your problems is explained in this quote from the manual:

Each key in an hstore is unique. If you declare an hstore with duplicate keys, **only one will be stored in the hstore** and there is no guarantee as to which will be kept.

Bold emphasis mine.
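A quick demonstration of that behavior (a sketch; since there is no guarantee which duplicate is kept, the surviving value may vary):

```sql
-- Declare an hstore with the key '1614' twice;
-- only one of the two entries survives.
SELECT '"1614"=>"100", "1614"=>"50"'::hstore AS items;
-- The result contains a single "1614" key; which value it holds is not guaranteed.
```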
You are mistaken in assuming that you could have the same key twice in a single hstore value.
I can also not reproduce your count of 2. I get a count of 1 for the key 1614.
Start by not using count as a column name; it is a reserved word in standard SQL (but allowed as an identifier in Postgres).
I get (tested with Postgres 9.1.9):
WITH carts AS (
SELECT '"1614"=>{:quantity=>"100"}, "1938"=>{:quantity=>"50"}, "1614"=>{:quantity=>"50"}, "1983"=>{:quantity=>"100"}, "1322"=>{:quantity=>"10"}, "1691"=>{:quantity=>"25"}, "1614"=>{:quantity=>"77"}, "1734"=>{:quantity=>"20"}'::hstore items
)
SELECT key, count(*) AS ct FROM
(SELECT (each(items)).key FROM carts) AS stat
GROUP BY key
ORDER BY ct DESC, key;
Result:
key | ct
------+----
1322 | 1
1614 | 1
1691 | 1
1734 | 1
1938 | 1
1983 | 1
Updated question (dupes across multiple rows)
To aggregate values you need a subquery (or CTE).
Simplified test case:
CREATE TEMP TABLE carts(c_id serial, items hstore);
INSERT INTO carts(items) VALUES
('"1614"=>"100", "1938"=>"50", "1983"=>"100", "1322"=>"10", "1691"=>"25", "1734"=>"20"')
,('"1614"=>"50"');
Query:
SELECT item_ids, count(*) AS ct, sum(items) AS sum_items
FROM (
SELECT (each(items)).key AS item_ids
,(each(items)).value::int AS items -- assuming values can be cast to int
FROM carts
) sub
GROUP BY 1 -- positional reference is just a notational convenience
ORDER BY 2 DESC;
Result:
item_ids | ct | sum_items
----------+----+-----------
1614 | 2 | 150
1734 | 1 | 20
1691 | 1 | 25
1983 | 1 | 100
1938 | 1 | 50
1322 | 1 | 10
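On Postgres 9.3 or later, the same aggregation can be written with a LATERAL join, which avoids evaluating each() twice per row. A sketch against the test table above, assuming all values cast cleanly to int:

```sql
SELECT e.key            AS item_ids
     , count(*)         AS ct
     , sum(e.value::int) AS sum_items
FROM   carts c
     , LATERAL each(c.items) AS e(key, value)  -- one set-returning call per row
GROUP  BY e.key
ORDER  BY ct DESC;
```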
With lots of UPDATEs, each one, no matter how small, will cause the entire row to be rewritten into a new version, as a consequence of the MVCC mechanism.
Then the old version of the row is eventually picked up by autovacuum, once it's certain that no transaction can still need it, and its space is flagged as reusable.
The constant turnover of disk space for old and new rows leads to fragmentation, especially if the rows are large in size.
Additionally, the whole set of operations is logged into the WAL files (unless the table is unlogged).
For these reasons, high-frequency UPDATEs of large columns are a worst-case scenario for Postgres.
So, if that session data doesn't really require durable storage in the first place, a specialized mixed disk/memory key/value store like Redis is likely to perform way better.
Otherwise, this other question, "What fillfactor for caching table?", has good information and advice on how to mitigate the difficulties.
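If you do keep the data in Postgres, one such mitigation is lowering the table's fillfactor: it leaves free space on each data page, so updated row versions can often be written to the same page (enabling HOT updates). A minimal sketch, assuming a hypothetical sessions table:

```sql
-- Leave 30% of each page free for new row versions (default is 100).
ALTER TABLE sessions SET (fillfactor = 70);

-- The setting only affects newly written pages;
-- rewrite existing pages explicitly if needed:
VACUUM FULL sessions;  -- takes an exclusive lock; use with care
```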
Best Answer
The name of the additional module hstore is derived from "hash store", because it was inspired by Perl hashes. Oleg Bartunov and Teodor Sigaev - the authors of the module - say so in their presentation here: