With lots of UPDATEs, each one, no matter how small, will cause the entire row to be rewritten into a new version, as a consequence of the MVCC mechanism.
Then the old version of the row will be picked up at some point by autovacuum, once it's certain that no transaction can still need it, and its space will be flagged as reusable.
The constant turnover of disk space for old and new rows leads to fragmentation, especially if the rows are large in size.
Additionally, the whole set of operations is logged into the WAL files (unless the table is unlogged).
For these reasons, high-frequency UPDATEs of large columns are a worst-case scenario for PostgreSQL.
So, if that session data doesn't really require durable storage in the first place, a specialized mixed disk/memory key/value store like Redis is likely to perform way better.
Otherwise, this other question, What fillfactor for caching table?, has good information and advice on how to mitigate the difficulties.
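For illustration, a minimal sketch of the fillfactor mitigation discussed there, assuming a hypothetical sessions table (lowering fillfactor leaves free space in each page, so updated rows can often stay on the same page as HOT tuples):
ALTER TABLE sessions SET (fillfactor = 50); -- hypothetical table; keep ~50% of each page free for updates
VACUUM FULL sessions;                       -- optional: rewrite existing pages so the new fillfactor takes effect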
I ran a test myself and found that changing the DB structure helps the performance of the first query a lot and that of the second one just a bit.
My configuration
My PostgreSQL configuration is the default except for the working memory, which is set to yours (SET work_mem = '64MB';).
I'm working on an SSD on a laptop with PostgreSQL 9.5.1. And I'm no expert!
New schema
With this setup we put the tags in a separate table so we can easily check that they are unique.
CREATE TABLE IF NOT EXISTS tags (
id serial NOT NULL
, tag text NOT NULL UNIQUE
, PRIMARY KEY (id)
);
CREATE TABLE IF NOT EXISTS datapoints (
fk_tag integer NOT NULL
, time timestamp without time zone NOT NULL
, value double precision
, FOREIGN KEY (fk_tag)
REFERENCES tags (id)
ON UPDATE CASCADE -- customize as needed
ON DELETE RESTRICT -- customize as needed
);
Populate tables
I filled the database with random data, but 10x less than in your case, or it would have taken me more than an hour to complete the inserts. Any result I obtain from benchmarking will also be multiplied by 10 to estimate a table of your size.
Notice! I do not know how many tags you have; the number below is just hypothetical.
INSERT INTO tags (tag)
SELECT md5(random()::text) || md5(random()::text)
FROM generate_series(1, 100000); -- Hypothetical number of tags!
INSERT INTO datapoints (fk_tag, time, value)
SELECT trunc(random() * (SELECT max(id) - min(id) FROM tags) + (SELECT min(id) FROM tags) -- 99999 + 628940
, current_timestamp
, random()
FROM generate_series(1,16272771);
-- 10x less rows than you have.
-- This INSERT took ~ 7 minutes to finish.
A little bit of cleanup before we start, and we let the query planner gather some statistics about the tables.
VACUUM;
ANALYZE tags;
ANALYZE datapoints;
Comparing with your hardware
First of all I'll create a table like yours (still 10x smaller) to estimate the speed difference between my hardware and yours.
SELECT t.tag
, d.time
, d.value
INTO datapoints_joined
FROM datapoints AS d
INNER JOIN tags AS t
ON d.fk_tag = t.id;
Running the two queries as you wrote them on datapoints_joined gives 246.15766 s and 232.87162 s (already multiplied by 10), showing that my computer is slower by 2.67 and 3.30 times, so approximately 3x slower.
Any result I obtain from benchmarking will also be divided by 3 to estimate hardware of your speed. So basically it should be:
my execution time * 10/3 = your execution time
[I really do hope that all my estimates are OK...]
First query
SELECT tag FROM tags;
Since the tag column is already UNIQUE, there is no need for a DISTINCT.
Result
EXPLAIN ANALYZE SELECT tag FROM tags;
Seq Scan on tags (cost=0.00..2235.00 rows=100000 width=65) (actual time=0.011..34.362 rows=100000 loops=1)
Planning time: 0.059 ms
Execution time: 42.998 ms -- SUCH SPEED, MUCH WOW!
On your machine (multiplying the execution time by 10/3) it should take 143.33 ms = 0.14333 s, which is ~519x faster than 74371.153 ms. Awesome!
Why it is faster
In your case you were performing a DISTINCT query on the tag column, which is varchar/text. This requires checking the whole string each time.
In my case the column is declared as UNIQUE and is placed in a separate table. This allows a simple select of all the tags, which is obviously faster. The insertion of new tags, on the other hand, requires more work because of the UNIQUE check.
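To illustrate that extra work, here is a minimal sketch of how a single datapoint could be inserted with this schema; the tag name and value are made up, and it assumes PostgreSQL 9.5+ for ON CONFLICT:
WITH t AS (
    INSERT INTO tags (tag)
    VALUES ('hypothetical_tag_name')                -- made-up tag
    ON CONFLICT (tag) DO UPDATE SET tag = EXCLUDED.tag
    RETURNING id
)
INSERT INTO datapoints (fk_tag, time, value)
SELECT t.id, current_timestamp, 42.0                -- made-up value
FROM t;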
Second query
Now we perform a JOIN
to get the actual tag text instead of the foreign key, but we do it after the GROUP BY
on the integer field fk_tag to increase the speed of the joining.
SELECT t.tag
, d.minTime
, d.maxTime
FROM (
SELECT fk_tag
, min(time) AS minTime
, max(time) AS maxTime
FROM datapoints
GROUP BY fk_tag
) AS d
INNER JOIN tags AS t
ON d.fk_tag = t.id;
Hash Join (cost=391905.20..395209.83 rows=97915 width=81) (actual time=17849.685..17995.566 rows=99999 loops=1)
Hash Cond: (datapoints.fk_tag = t.id)
-> HashAggregate (cost=388420.20..389399.35 rows=97915 width=12) (actual time=17741.489..17807.022 rows=99999 loops=1)
Group Key: datapoints.fk_tag
-> Seq Scan on datapoints (cost=0.00..266375.40 rows=16272640 width=12) (actual time=0.006..2720.880 rows=16272771 loops=1)
-> Hash (cost=2235.00..2235.00 rows=100000 width=69) (actual time=108.116..108.116 rows=100000 loops=1)
Buckets: 131072 Batches: 1 Memory Usage: 11181kB
-> Seq Scan on tags t (cost=0.00..2235.00 rows=100000 width=69) (actual time=0.011..46.569 rows=100000 loops=1)
Planning time: 0.241 ms
Execution time: 18005.401 ms -- MEH, NOT SO FAST
On your machine (multiplying the execution time by 10/3) it should take 60018.00 ms = 60.018 s, which is ~1.45x faster than 86922.564 ms. Not a huge result, but still better than nothing.
A few considerations
Since we are selecting most of the table, PostgreSQL chooses a sequential scan over an index scan, and it is right to do so. This is the reason I used no indexes at all.
If you need the first query more than the second one, maybe this can be a solution for you. Please try it on your machine to be sure.
Best Answer
Depending on how many different data sets there are, one option would be to partition the tables per-dataset.
When a dataset is updated, BEGIN a new transaction, TRUNCATE the table, COPY the new data into it, and COMMIT. PostgreSQL has an optimisation where COPYing into a table that's been TRUNCATEd in the same transaction does much less I/O if you're using wal_level = minimal (the default).
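A minimal sketch of that pattern, assuming a hypothetical per-dataset table dataset_42 and a server-side CSV file (path and format are illustrative):
BEGIN;
TRUNCATE dataset_42;                                            -- hypothetical per-dataset table
COPY dataset_42 FROM '/path/to/new_data.csv' WITH (FORMAT csv); -- illustrative path
COMMIT;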
If you cannot partition and truncate (say, if you're dealing with tens or hundreds of thousands of data sets, where there'd just be too many tables), you'll instead want to crank autovacuum up to run as much as it can, make sure you have good indexes on anything you delete based on, and be prepared for somewhat ordinary performance.
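As a hedged illustration of cranking autovacuum up on a per-table basis (the table name and values are assumptions, to be tuned against your actual churn):
ALTER TABLE datasets SET (
    autovacuum_vacuum_scale_factor = 0.01  -- vacuum after ~1% dead rows instead of the 20% default
  , autovacuum_vacuum_cost_delay = 0       -- run autovacuum without throttling
);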
If you don't need crash safety (you don't mind your tables being empty after a system crash) you can also create your tables as UNLOGGED, which will save you a huge amount of I/O cost.
If you don't mind having to restore the whole setup from a backup after a system crash you can go a step further and also set fsync=off, which basically says to PostgreSQL "don't bother with crash safety, I have good backups and I don't care if my data is permanently and totally unrecoverable after a crash, and I'm happy to re-initdb before I can use my database again".
I wrote some more about this in a similar thread on Stack Overflow about optimising PostgreSQL for fast testing; that mentions host OS tuning, separating WAL onto a different disk if you're not using unlogged tables, checkpointer adjustments, etc.
There's also some info in the Pg docs for fast data loading and non-durable settings.
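A hedged sketch of the non-durable options mentioned above; the table and its columns are illustrative, and fsync is a cluster-wide postgresql.conf setting, not something set per table:
CREATE UNLOGGED TABLE dataset_cache (  -- not WAL-logged; empty after a crash
    id    integer PRIMARY KEY
  , value double precision
);
-- In postgresql.conf, only if you accept re-initdb/restore after any crash:
-- fsync = off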