Postgresql – Postgres on Engineyard space at 100%, pg_xlog filled up with files

disk-space postgresql

I have a database that is growing rapidly. VERY rapidly. My /db directory is at 100% now, and there are a bunch of new files in pg_xlog.

My background is in Oracle and MySQL. I do not know what is going on with this database. I know that I need to hire a Postgres DBA quickly.

Is there anything in the immediate term that I can do to fix my issue? I notice that deleting data does nothing to mitigate my space issue.

Output from pg_settings is:

archive_command             /bin/true                       configuration file
archive_mode                on                              configuration file
archive_timeout             1min                            configuration file
checkpoint_segments         100                             configuration file
checkpoint_timeout          5min                            configuration file
client_encoding             UTF8                            session
DateStyle                   ISO, MDY                        configuration file
default_statistics_target   100                             configuration file
default_text_search_config  pg_catalog.english              configuration file
effective_cache_size        1358MB                          configuration file
hot_standby                 on                              configuration file
hot_standby_feedback        on                              configuration file
lc_messages                 C                               configuration file
lc_monetary                 C                               configuration file
lc_numeric                  C                               configuration file
lc_time                     C                               configuration file
listen_addresses            *                               configuration file
log_destination             csvlog                          configuration file
log_line_prefix             %m: proc=%p,user=%u,db=%d,host=%r   configuration file
log_min_duration_statement  2s                              configuration file
log_rotation_age            1d                              configuration file
log_rotation_size           100MB                           configuration file
logging_collector           on                              configuration file
maintenance_work_mem        128MB                           configuration file
max_connections             512                             configuration file
max_files_per_process       65535                           configuration file
max_stack_depth             6MB                             configuration file
max_standby_streaming_delay -1                              configuration file
max_wal_senders             5                               configuration file
port                        5432                            configuration file
search_path                 public, "$user", public         session
shared_buffers              424MB                           configuration file
temp_tablespaces                                            configuration file
wal_buffers                 8MB                             configuration file
wal_keep_segments           128                             configuration file
wal_level                   hot_standby                     configuration file
wal_writer_delay            200ms                           configuration file
work_mem                    32MB                            configuration file

Best Answer

Lower the checkpoint_segments setting in your postgresql.conf, then issue SELECT pg_reload_conf(); as the postgres superuser (in any database) to apply the change without a restart. A lower value reduces the number of WAL segments that are kept in your pg_xlog directory. If you have a lot of data churn in your db, keep checkpoint_segments at 32 or higher (see http://www.postgresql.org/docs/current/static/runtime-config-wal.html#GUC-CHECKPOINT-SEGMENTS for more details).
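To see how large pg_xlog can be expected to grow with the settings shown in the question, it may help to work out the ceiling. The sketch below assumes a server version that still has checkpoint_segments (before 9.5), the default 16 MB segment size, and the default checkpoint_completion_target of 0.5, which the question's output does not list. The two bounds are the ones described in the WAL configuration documentation for those versions. The value 32 is only the minimum suggested above, not a recommendation for this workload.

```sql
-- Current values of the settings that bound the size of pg_xlog.
SELECT name, setting
FROM   pg_settings
WHERE  name IN ('checkpoint_segments', 'wal_keep_segments',
                'checkpoint_completion_target');

-- Expected ceiling in MB with the question's settings
-- (assumed: 16 MB segments, checkpoint_completion_target = 0.5).
SELECT ((2 + 0.5) * 100 + 1) * 16 AS checkpoint_bound_mb  -- (2 + target) * checkpoint_segments + 1
     , (100 + 128 + 1) * 16       AS keep_bound_mb;       -- checkpoint_segments + wal_keep_segments + 1

-- Number of WAL segment files actually present (superuser only).
SELECT count(*) AS wal_files
FROM   pg_ls_dir('pg_xlog') AS f
WHERE  f ~ '^[0-9A-F]{24}$';

-- After editing postgresql.conf (for example checkpoint_segments = 32):
SELECT pg_reload_conf();
SHOW checkpoint_segments;
```

With the question's values the two bounds come to 4016 MB and 3664 MB. Note that wal_keep_segments = 128 on its own keeps 128 * 16 MB = 2048 MB of past segments, so lowering checkpoint_segments only removes part of the footprint.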

Some general tips:

Ensure your autovacuum settings are pretty aggressive.
Move the pg_xlog directory to a different volume than the $PGDATA directory. I have measured performance increases of 10% to 30% from that change alone.
Set up a monitoring tool to keep an eye on table/index/database/server size increases, e.g. Nagios or New Relic.
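Until a monitoring tool is in place, the same numbers can be read directly from the system catalogs. This is a minimal sketch using built-in size functions and statistics views. The LIMIT of 10 is an arbitrary choice, and what counts as "too many" dead rows depends on the workload.

```sql
-- Size of every database on the server.
SELECT datname
     , pg_size_pretty(pg_database_size(datname)) AS size
FROM   pg_database
ORDER  BY pg_database_size(datname) DESC;

-- Ten largest tables in the current database (total includes indexes and TOAST).
SELECT c.oid::regclass                               AS table_name
     , pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
     , pg_size_pretty(pg_relation_size(c.oid))       AS heap_size
FROM   pg_class c
WHERE  c.relkind = 'r'
ORDER  BY pg_total_relation_size(c.oid) DESC
LIMIT  10;

-- Tables with the most dead rows, and when autovacuum last processed them.
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM   pg_stat_user_tables
ORDER  BY n_dead_tup DESC
LIMIT  10;

-- Current autovacuum settings.
SELECT name, setting
FROM   pg_settings
WHERE  name LIKE 'autovacuum%';
```

A large n_dead_tup next to an old or empty last_autovacuum suggests autovacuum is not keeping up. This also relates to the observation in the question: deleted rows only become reusable space after a vacuum, and a plain VACUUM usually does not shrink the files on disk.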

Related Solutions

PostgreSQL – Finding Current Prices for Fuel Stations at a Specific Time

Index

First and foremost, for your type of query this is the much better index:

CREATE INDEX de_tt_priceinfo_received_station_id_idx
  ON public.de_tt_priceinfo (station_id, received);  -- note the reversed order

Since the combination is supposed to be unique (I assume), I suggest a UNIQUE constraint on (station_id, received) instead:

ALTER TABLE de_tt_priceinfo ADD CONSTRAINT de_tt_priceinfo_station_id_received
UNIQUE (station_id, received);

The index index_station_id is mostly superseded and can probably be dropped now.
The index de_tt_priceinfo_received_station_id_idx may still have its use.
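Before dropping anything, it may be worth checking what indexes exist and whether they are used. The sketch below assumes the table lives in the public schema, as in the CREATE INDEX statement above. The DROP is left commented out on purpose.

```sql
-- Indexes currently defined on the table.
SELECT indexname, indexdef
FROM   pg_indexes
WHERE  schemaname = 'public'
AND    tablename  = 'de_tt_priceinfo';

-- How often each index has been scanned since statistics were last reset,
-- and how much space it takes.
SELECT indexrelname
     , idx_scan
     , pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM   pg_stat_user_indexes
WHERE  schemaname = 'public'
AND    relname    = 'de_tt_priceinfo';

-- Only once the numbers confirm the index is redundant:
-- DROP INDEX index_station_id;
```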

Query

I would also consider the basic DISTINCT ON query:

SELECT DISTINCT ON (station_id)
       station_id, e5, e10, diesel, received
FROM   de_tt_priceinfo
WHERE  received <= '2014-09-25 08:45:12'::TIMESTAMPTZ
AND    station_id = ANY ('{0C91A93A-a-b-c-d, 578C44BB-a-b-c-d, 6F2F48A8-a-b-c-d
                         , 9982BE74-a-b-c-d, A24C612B-a-b-c-d, BEC3EF55-a-b-c-d
                         , F5137488-a-b-c-d}'::varchar[])
ORDER BY station_id, received DESC;

But since you seem to have a lot of rows per station, that's not going to shine. Instead:

SELECT *
FROM  (
   VALUES
     ('0C91A93A-a-b-c-d'::varchar)
    , ('578C44BB-a-b-c-d')
    , ('6F2F48A8-a-b-c-d')
    , ('9982BE74-a-b-c-d')
    , ('A24C612B-a-b-c-d')
    , ('BEC3EF55-a-b-c-d')
    , ('F5137488-a-b-c-d')
   ) s(station_id)
LEFT JOIN LATERAL (
    SELECT e5, e10, diesel, received
    FROM   de_tt_priceinfo
    WHERE  station_id = s.station_id
    AND    received <= '2014-09-25 08:45:12'::TIMESTAMPTZ
    ORDER  BY received DESC
    LIMIT  1
   )  p ON TRUE;

This one should be dynamite in combination with the UNIQUE constraint above (or an equivalent index).
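Whether the planner really uses the (station_id, received) index can be checked with EXPLAIN. The sketch below repeats the LATERAL query with only two of the station ids to keep it short. It needs PostgreSQL 9.3 or later for LATERAL, and ANALYZE means the query is actually executed.

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM  (
   VALUES
      ('0C91A93A-a-b-c-d'::varchar)
    , ('578C44BB-a-b-c-d')
   ) s(station_id)
LEFT JOIN LATERAL (
    SELECT e5, e10, diesel, received
    FROM   de_tt_priceinfo
    WHERE  station_id = s.station_id
    AND    received <= '2014-09-25 08:45:12'::timestamptz
    ORDER  BY received DESC
    LIMIT  1
   ) p ON TRUE;
```

Ideally the inner node of the plan is an index scan on the (station_id, received) index with a Limit on top, executed once per station. A sequential scan or a sort in that position would suggest the index is missing or not being chosen.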

Table definition

For a table with millions of rows it pays to optimize storage while that is still easy to do. It makes everything smaller and faster.

That's how I would design it:

CREATE TABLE station (
   station_id serial PRIMARY KEY
 , station    text
 , CHECK (length(station) < 61) -- ?? optional, you decide 
);

CREATE TABLE priceinfo (
   priceinfo_id serial PRIMARY KEY
 , station_id   integer NOT NULL REFERENCES station ON UPDATE CASCADE
 , received     timestamptz NOT NULL DEFAULT now()
 , e5           integer  -- price in 0.1 Cent
 , e10          integer  -- price in 0.1 Cent
 , diesel       integer  -- price in 0.1 Cent
 , CONSTRAINT priceinfo_station_id_received UNIQUE (station_id, received)
);

CREATE INDEX priceinfo_received_idx ON public.priceinfo (received);

The row size in priceinfo would be 60 bytes (24 bytes heap tuple header + null bitmap; 32 bytes data; 4 bytes item identifier), as compared to 94 bytes (24 + 66 + 4) in your original table. That's assuming a 16-character string like in your example. Everything will be about 36 % smaller (possibly more) and considerably faster.
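The figures above can be reproduced with plain arithmetic. The per-column breakdown is my reading of the table definition (four 4-byte integers after the 8-byte timestamptz, padded to a multiple of 8 bytes), not something the answer spells out.

```sql
SELECT 4 + 4 + 8 + 4 + 4 + 4                   AS column_bytes       -- id, station_id, received, e5, e10, diesel
     , ceil((4 + 4 + 8 + 4 + 4 + 4) / 8.0) * 8 AS padded_data_bytes  -- rounded up to a multiple of 8 (assumed alignment)
     , 24 + 32 + 4                             AS new_row_bytes      -- header + data + item identifier
     , 24 + 66 + 4                             AS old_row_bytes
     , round(100 * (1 - (24 + 32 + 4) / (24 + 66 + 4)::numeric), 1) AS pct_smaller;
```

This yields 28 column bytes padded to 32, rows of 60 and 94 bytes, and a reduction of 36.2 %, which matches the estimate in the text.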

The crucial index on (station_id, received) is down to 8 bytes of data per index tuple instead of 32 bytes or even much more (!), each plus overhead. In addition, comparing integer values for station_id is generally faster than comparing text, which also has to apply a collation.

Details:

Configuring PostgreSQL for read performance

A query would fetch station_id from the station table first, which is cheap.

Prices are stored as integer numbers signifying 0.1 Cent (4 bytes instead of 10 bytes for your original numeric(4,3)). Multiply by 0.1 to get Cent or by 0.001 to get € for display. Very simple and fast.
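Put together, a lookup against the redesigned tables might look like the sketch below. It assumes that station.station holds the original string identifier and that this column is indexed or the station table is small; neither is stated in the answer. The column aliases are made up for illustration.

```sql
SELECT s.station
     , p.received
     , p.e5     * 0.001 AS e5_eur       -- integer in 0.1 Cent, shown in Euro
     , p.e10    * 0.001 AS e10_eur
     , p.diesel * 0.1   AS diesel_cent  -- integer in 0.1 Cent, shown in Cent
FROM   station s
LEFT   JOIN LATERAL (
    SELECT e5, e10, diesel, received
    FROM   priceinfo
    WHERE  station_id = s.station_id
    AND    received <= '2014-09-25 08:45:12'::timestamptz
    ORDER  BY received DESC
    LIMIT  1
   ) p ON TRUE
WHERE  s.station = ANY ('{0C91A93A-a-b-c-d, 578C44BB-a-b-c-d}'::text[]);
```

The station table resolves the strings to integer ids once, and the LATERAL subquery then uses the (station_id, received) index created by the UNIQUE constraint.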

UUID

The string in the error message looks considerably longer and actually like a regular UUID number:

871828b4-37e5-419c-b7a5-cdbe1e1c0148

If so, use the uuid data type. Whether you adopt my design or keep your old one, at least switch to the uuid data type for a big overall gain in every aspect:

Would index lookup be noticeably faster with char vs varchar when all values are 36 chars
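A minimal sketch of the switch, assuming station_id in de_tt_priceinfo currently holds valid UUID strings in a varchar column. The first statement only compares the size of one value in both representations.

```sql
-- Size of the same value as uuid and as text.
SELECT pg_column_size('871828b4-37e5-419c-b7a5-cdbe1e1c0148'::uuid) AS uuid_bytes
     , pg_column_size('871828b4-37e5-419c-b7a5-cdbe1e1c0148'::text) AS text_bytes;

-- Convert the column in place. This fails if any value is not a valid UUID.
ALTER TABLE de_tt_priceinfo
   ALTER COLUMN station_id TYPE uuid USING station_id::uuid;
```

The ALTER TABLE rewrites the table and its indexes under an exclusive lock, so it needs free disk space and a maintenance window on a table with millions of rows.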
