Index
First and foremost, for your type of query this is the much better index:
CREATE INDEX de_tt_priceinfo_station_id_received_idx
ON public.de_tt_priceinfo (station_id, received); -- note the reversed column order
Since the combination is supposed to be unique (I assume), I suggest a UNIQUE constraint on (station_id, received) instead:
ALTER TABLE de_tt_priceinfo ADD CONSTRAINT de_tt_priceinfo_station_id_received
UNIQUE (station_id, received);
The index index_station_id is mostly superseded and can probably be dropped now. The index de_tt_priceinfo_received_station_id_idx may still have its use.
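If you decide to drop it, that's a one-liner (a sketch, using the index name from your question):
DROP INDEX index_station_id; -- superseded by the UNIQUE constraint's index on (station_id, received)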
Be sure to understand the logic behind all this.
Query
I would also consider the basic DISTINCT ON query:
SELECT DISTINCT ON (station_id)
station_id, e5, e10, diesel, received
FROM de_tt_priceinfo
WHERE received <= '2014-09-25 08:45:12'::TIMESTAMPTZ
AND station_id = ANY ('{0C91A93A-a-b-c-d, 578C44BB-a-b-c-d, 6F2F48A8-a-b-c-d
, 9982BE74-a-b-c-d, A24C612B-a-b-c-d, BEC3EF55-a-b-c-d
, F5137488-a-b-c-d}'::varchar[])
ORDER BY station_id, received DESC;
But since you seem to have a lot of rows per station, that's not going to shine. Instead:
SELECT *
FROM (
VALUES
('0C91A93A-a-b-c-d'::varchar)
, ('578C44BB-a-b-c-d')
, ('6F2F48A8-a-b-c-d')
, ('9982BE74-a-b-c-d')
, ('A24C612B-a-b-c-d')
, ('BEC3EF55-a-b-c-d')
, ('F5137488-a-b-c-d')
) s(station_id)
LEFT JOIN LATERAL (
SELECT e5, e10, diesel, received
FROM de_tt_priceinfo
WHERE station_id = s.station_id
AND received <= '2014-09-25 08:45:12'::TIMESTAMPTZ
ORDER BY received DESC
LIMIT 1 -- latest row per station at or before the given timestamp
) p ON TRUE
This one should be dynamite in combination with the above UNIQUE constraint (or an equivalent index).
Table definition
For a table with millions of rows it pays to optimize storage while that's still easily possible. It makes everything smaller and faster.
This is how I would design it:
CREATE TABLE station (
station_id serial PRIMARY KEY
, station text
, CHECK (length(station) < 61) -- ?? optional, you decide
);
CREATE TABLE priceinfo (
priceinfo_id serial PRIMARY KEY
, station_id integer NOT NULL REFERENCES station ON UPDATE CASCADE
, received timestamptz NOT NULL DEFAULT now()
, e5 integer -- price in 0.1 Cent
, e10 integer -- price in 0.1 Cent
, diesel integer -- price in 0.1 Cent
, CONSTRAINT priceinfo_station_id_received UNIQUE (station_id, received)
);
CREATE INDEX priceinfo_received_idx ON public.priceinfo (received);
The row size in priceinfo would be 60 bytes (24 bytes heap tuple header incl. null bitmap; 32 bytes data; 4 bytes item identifier), as compared to 94 bytes (24 + 66 + 4) in your original table. That's assuming a 16-character string like in your example. Everything will be ~ 36 % smaller (or more?) and considerably faster.
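If you want to sanity-check column sizes yourself, pg_column_size() is handy (my own quick check, not part of the original design):
SELECT pg_column_size(station_id) AS station_id_bytes -- 4 for integer
     , pg_column_size(received)   AS received_bytes   -- 8 for timestamptz
FROM priceinfo
LIMIT 1;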
The crucial index on (station_id, received) is down to 16 bytes of data per index tuple (integer 4 + alignment padding 4 + timestamptz 8) instead of 32 bytes or even much more (!), each plus index tuple overhead. In addition, handling integer numbers for station_id is generally faster than text with a COLLATION on top of it.
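To put numbers on it, you can compare index sizes before and after (my own check; the UNIQUE constraint's backing index carries the constraint's name):
SELECT pg_size_pretty(pg_relation_size('priceinfo_station_id_received'));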
The query would fetch station_id from the station table first, which is cheap.
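For instance, with a hypothetical station name:
SELECT station_id FROM station WHERE station = 'Some Station Name';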
Prices are stored as integer numbers signifying 0.1 Cent (4 bytes instead of 10 bytes for your original numeric(4,3)). Multiply by 0.1 to get Cent or by 0.001 to get € for display. Very simple and fast.
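For illustration, a made-up pump price of 1.339 € would be stored as 1339:
SELECT 1339 * 0.001 AS price_eur  -- 1.339
     , 1339 * 0.1   AS price_cent; -- 133.9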
UUID
The string in the error message looks considerably longer and actually like a regular UUID number:
871828b4-37e5-419c-b7a5-cdbe1e1c0148
If so, use the uuid data type, whether you adopt my design or keep your old one. At least switch to the uuid data type for a big overall gain in every aspect.
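A minimal sketch of that switch, assuming station_id is currently varchar and every value is a valid UUID string:
ALTER TABLE de_tt_priceinfo
  ALTER COLUMN station_id TYPE uuid USING station_id::uuid; -- rewrites the table and its indexes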
Best Answer
I guess you only want to test whether the restoration of the dump works and nothing more, which means you can make some unsafe configuration changes. Let's start with your settings.
These ones are a good call. These three are only important if you are using replication, and because you already set wal_level to minimal, you are not using it, so they don't matter. You have a lot of RAM which won't be used for anything else, so I would bump this one up even more.
I would leave wal_buffers at its default and bump shared_buffers instead; wal_buffers will then be calculated automatically.
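For example (illustrative values of my own; size shared_buffers to your RAM):
ALTER SYSTEM SET shared_buffers = '16GB'; -- takes effect after a server restart
ALTER SYSTEM SET wal_buffers = -1;        -- the default: 1/32 of shared_buffers, capped at 16MB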
What you should concentrate on is having no checkpoints during your restore, or as few as you can. Checkpoints are controlled by max_wal_size and checkpoint_timeout. First off, bump checkpoint_timeout to something like 20h, so that a timed checkpoint never happens while you restore.
Then you can set max_wal_size to a value as high as your disk space allows. If your restored DB is 200GB and your disk is 500GB, you should be safe setting max_wal_size to 100GB, because Postgres can keep up to two checkpoints' worth of WAL (which is 2x max_wal_size). min_wal_size won't matter that much in your case, but you can probably bump it to 10GB.
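Putting the checkpoint advice together, a sketch with the values from above (ALTER SYSTEM is my choice of mechanism; editing postgresql.conf works just as well):
ALTER SYSTEM SET checkpoint_timeout = '20h'; -- timed checkpoints effectively never fire during the restore
ALTER SYSTEM SET max_wal_size = '100GB';     -- assumes the 500GB disk / 200GB database from above
ALTER SYSTEM SET min_wal_size = '10GB';
SELECT pg_reload_conf();                     -- these three only need a reload, not a restart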
I would also recommend running pg_restore with --jobs=NUM, where NUM would probably be the number of CPU cores, but it also depends on the speed of your SSD, so you can play with this parameter.
Besides Postgres settings, I would also recommend, if possible, adding an additional SATA drive (7200 RPM will be fine) and symlinking the pg_wal directory to that SATA disk. That is the directory where Postgres saves WAL, and because WAL is written append-only, a SATA disk is fast enough for it. It will reduce the load on the SSD, but it also means you will be able to bump max_wal_size even more (depending on the size of the SATA disk).
Finally, don't forget to restore your settings to sane values after the dump is restored.