Postgresql – Postgres caching and bytea

cachepostgresql

I have a DB which, among other things, stores images (as bytea, if that's interesting). It also stores users' metadata, activity logs, etc., but images currently take 500MB out of the total 600MB data.

I'm a complete newbie, but if I understand correctly PG has (at least) 2 pools of buffers, one for indexes and one for "heap" (data).

I imagine that my images 'pollute' the heap cache, making virtually nothing else cached (the cache obviously is not large enough to accommodate all the images).

Can I stop the image blocks only (I don't mind the image indexes being cached) from being cached?

Best Answer

I think the first question is how much RAM the system has. If you have enough that the database fits comfortably in RAM then this is not a major problem. Basically the OS disk cache is very fast, and if the db is sitting there effectively, then you don't have anything to worry about. If you are on a system with, say, 4GB RAM and not a whole lot else running, I wouldn't worry.

The problem happens when this is not true, and then the shared buffers try to hold the most frequently used information. I would expect that caching would be ok there too. What you have to remember is that the PostgreSQL cache is slower than the OS cache because it provides features aimed at keeping frequently used data in the cache.

Of course all of the above depends highly on your query patterns.

EDIT

I've found Jeremiah Peschka's terrific article about table and index storage. With information from there, I could answer (2) as well. I set up a small test, so you can check these yourself.

-- we will need two extensions
CREATE EXTENSION pg_buffercache;
CREATE EXTENSION pageinspect;


-- a very simple test table
CREATE TABLE index_cache_test (
      id serial
    , blah text
);


-- I am a bit megalomaniac here, but I will use this for other purposes as well
INSERT INTO index_cache_test
SELECT i, i::text || 'a'
FROM generate_series(1, 1000000) a(i);


-- let's create the index to be cached
CREATE INDEX idx_cache_test ON index_cache_test (id);


-- now we can have a look at what is cached
SELECT c.relname,count(*) AS buffers
FROM 
    pg_class c 
    INNER JOIN pg_buffercache b ON b.relfilenode = c.relfilenode 
    INNER JOIN pg_database d ON (b.reldatabase = d.oid AND d.datname = current_database())
GROUP BY c.relname
ORDER BY 2 DESC LIMIT 10;

             relname              | buffers
----------------------------------+---------
 index_cache_test                 |    2747
 pg_statistic_relid_att_inh_index |       4
 pg_operator_oprname_l_r_n_index  |       4
... (others are all pg_something, which are not interesting now)

-- this shows that the whole table is cached and our index is not in use yet

-- now we can check which row is where in our index
-- in the ctid column, the first number shows the page, so 
-- all rows starting with the same number are stored in the same page
SELECT * FROM bt_page_items('idx_cache_test', 1);

 itemoffset |  ctid   | itemlen | nulls | vars |          data
------------+---------+---------+-------+------+-------------------------
          1 | (1,164) |      16 | f     | f    | 6f 01 00 00 00 00 00 00
          2 | (0,1)   |      16 | f     | f    | 01 00 00 00 00 00 00 00
          3 | (0,2)   |      16 | f     | f    | 02 00 00 00 00 00 00 00
          4 | (0,3)   |      16 | f     | f    | 03 00 00 00 00 00 00 00
          5 | (0,4)   |      16 | f     | f    | 04 00 00 00 00 00 00 00
          6 | (0,5)   |      16 | f     | f    | 05 00 00 00 00 00 00 00
...
         64 | (0,63)  |      16 | f     | f    | 3f 00 00 00 00 00 00 00
         65 | (0,64)  |      16 | f     | f    | 40 00 00 00 00 00 00 00

-- with the information obtained, we can write a query which is supposed to
-- touch only a single page of the index
EXPLAIN (ANALYZE, BUFFERS) 
    SELECT id 
    FROM index_cache_test 
    WHERE id BETWEEN 10 AND 20 ORDER BY id
;

 Index Scan using idx_test_cache on index_cache_test  (cost=0.00..8.54 rows=9 width=4) (actual time=0.031..0.042 rows=11 loops=1)
   Index Cond: ((id >= 10) AND (id <= 20))
   Buffers: shared hit=4
 Total runtime: 0.094 ms
(4 rows)

-- let's have a look at the cache again (the query remains the same as above)
             relname              | buffers
----------------------------------+---------
 index_cache_test                 |    2747
 idx_test_cache                   |       4
...

-- and compare it to a bigger index scan:
EXPLAIN (ANALYZE, BUFFERS) 
SELECT id 
    FROM index_cache_test 
    WHERE id <= 20000 ORDER BY id
;


 Index Scan using idx_test_cache on index_cache_test  (cost=0.00..666.43 rows=19490 width=4) (actual time=0.072..19.921 rows=20000 loops=1)
   Index Cond: (id <= 20000)
   Buffers: shared hit=4 read=162
 Total runtime: 24.967 ms
(4 rows)

-- this already shows that something was in the cache and further pages were read from disk
-- but to be sure, a final glance at cache contents:

             relname              | buffers
----------------------------------+---------
 index_cache_test                 |    2691
 idx_test_cache                   |      58

-- note that some of the table pages are disappeared
-- but, more importantly, a bigger part of our index is now cached

All in all, this shows that indexes and tables can be cached page by page, therefore the answer for (2) is NO.

And a final one to illustrate temporary tables being non-cached here:

CREATE TEMPORARY TABLE tmp_cache_test AS 
SELECT * FROM index_cache_test ORDER BY id FETCH FIRST 20000 ROWS ONLY;

EXPLAIN (ANALYZE, BUFFERS) SELECT id FROM tmp_cache_test ORDER BY id;

-- checking the buffer cache now shows no sign of the temp table

Postgresql – Battery backed write cache and Postgres

Literature

I gathered you some links you should read:

Possible Workflow

Basically your workflow might be the following, which is extracted as a combination of all tips of the above links:

Turn on postgres setting fsync if it is not.
Set synchronous_commit to "on" (or "local") if it is set to "off".
Run diskchecker.pl just for sanity to see if you bought something good and it is configured correctly.
See what sync method postgres selected automatically with show wal_sync_method;
Run pg_test_fsync
and use it's output to
1. check if you are affected by FLUSH CACHE EXT/SYNCHRONIZE CACHE problem.
2. decide on the basis of the additional information provided here (in paragraph wal_sync_method) and here (in paragraph test_fsync and below) what value to set wal_sync_method to.
Check if your BBU and file-system prohibit partial writes of 8kb pages, if yes you may disable full_page_writes (see last paragraph here).

Remark (why this might not be enough)

Note that this seems very reliable, but quoted from one of the links above:

Critics of the battery backup approach suggest that if you run such a system under load, eventually you’ll have a failure in this relatively complicated cache method that will corrupt your database in a way that’s nearly impossible to detect or recover from.

So even having a BBU and a correctly configured server instance, this does not remove the burden to backup everything regularly, and does not remove the possibility to loose some data.

Legal Note

I am not responsible for any lost data ;) neither due to lack of advice here or a wrong tip, nor through anything else.