Postgresql – What will happen with cached index or table when it is modified, will it stay in cache
If PostgreSQL loads something (for example, an index) into its cache, what happens when the cached index gets modified? Is it removed from the cache or not?
Related Solutions
The only thing that is definitively cached for MyISAM is an index.
The key_buffer_size variable sets how large the MyISAM Key Cache will be.
There are two suggestions I can offer:
SUGGESTION #1 : Use a Dedicated Key Cache for the Table
Did you know you can create a key cache dedicated to one or more MyISAM tables? Suppose you have a table called mydb.mytable and you want to load the entire mytable.MYI file into it. Here are the steps to set this up:
STEP 01) Get the physical size of the table's .MYI file
Run this query:
SELECT index_length MYISize FROM information_schema.tables
WHERE table_schema='mydb' AND table_name='mytable';
Suppose the query returns 1234567890 for MYISize. That's 1.14978 GB = 1177.3757 MB. Round that up to 1280M (that's 1.25 GB).
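If you prefer, the lookup and the rounding can be combined into one query. This is just a sketch of my own (not from the original post), rounding the .MYI size up to the next 128 MB multiple so that 1234567890 bytes comes out as 1280 MB:

```sql
-- Compute a key cache size (in MB) for mydb.mytable's index file,
-- rounded up to the next multiple of 128 MB.
SELECT
    index_length AS MYISize,
    CEIL(index_length / (128 * 1024 * 1024)) * 128 AS keycache_size_mb
FROM information_schema.tables
WHERE table_schema = 'mydb'
  AND table_name   = 'mytable';
```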
STEP 02) Create the Buffer with that size
SET GLOBAL mykeycache.key_buffer_size = 1280 * 1024 * 1024;
STEP 03) Associate the cache with the indexes for mydb.mytable
CACHE INDEX mydb.mytable INTO mykeycache;
STEP 04) Load the Index Pages from the .MYI into the Dedicated Cache
LOAD INDEX INTO CACHE mydb.mytable;
That's all. All indexes for mydb.mytable are now in that cache only. Running these commands also removes the same index entries from the general MyISAM key cache.
You could create a startup script for mysql to preload the indexes.
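As a sketch of what such a startup script could look like (the file path and the init_file wiring are my own example, not from the original post):

```sql
-- /etc/mysql/preload-keycache.sql  (hypothetical path)
-- Run automatically at startup by pointing my.cnf at it:
--   [mysqld]
--   init_file = /etc/mysql/preload-keycache.sql
SET GLOBAL mykeycache.key_buffer_size = 1280 * 1024 * 1024;
CACHE INDEX mydb.mytable INTO mykeycache;
LOAD INDEX INTO CACHE mydb.mytable;
```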
I have written about this subject in past posts:
- Aug 28, 2012: MySQL InnoDB Index in swap
- Jun 09, 2012: Can we assign mysql Key_buffer_size value for particular table?
- Mar 20, 2012: optimal table design mysql with primay key and varchar value
- Jan 26, 2012: Less RAM than Index_length MyISAM
- Dec 30, 2011: Is splitting a 'users' table for authentication purposes a good idea?
As a side note, I would like to suggest something optional. You can speed up access to MyISAM data by changing the ROW_FORMAT of the MyISAM tables you want. For the same MyISAM table mydb.mytable, here is what you do:
ALTER TABLE mydb.mytable ROW_FORMAT=Fixed;
That's it. There is a good side and a bad side to changing ROW_FORMAT to Fixed:
- GOOD SIDE: Queries hitting mydb.mytable will run 20-30% faster without any additional enhancements.
- BAD SIDE: This bloats all VARCHAR columns and treats them as CHAR. This will bloat indexes that have VARCHAR columns as well. Table and index sizes could increase by 80-100%.
My past posts where I discussed this:
- May 10, 2011: What is the performance impact of using CHAR vs VARCHAR on a fixed-size field?
- Jan 03, 2012: Optimized my.cnf for high-end and busy server
- May 03, 2012: Which is faster, InnoDB or MyISAM?
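Before converting, you can get a rough feel for the bloat. This is a sketch of my own (not from the original post): it compares the current average row length with a crude fixed-width upper bound built from the VARCHAR definitions, padding non-character columns with an assumed 8 bytes each:

```sql
-- Rough estimate only: current average row length vs. worst-case
-- fixed row width. A large gap means ROW_FORMAT=Fixed will cost
-- a lot of space.
SELECT
    t.avg_row_length AS current_avg_row_bytes,
    SUM(COALESCE(c.character_maximum_length, 8)) AS approx_fixed_row_bytes
FROM information_schema.tables  t
JOIN information_schema.columns c
  ON  c.table_schema = t.table_schema
  AND c.table_name   = t.table_name
WHERE t.table_schema = 'mydb'
  AND t.table_name   = 'mytable'
GROUP BY t.avg_row_length;
```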
SUGGESTION #2 : Switch the table's Storage Engine to InnoDB
If you switch that table to InnoDB, you can set up the InnoDB Buffer Pool in such a way that data and indexes entering the Buffer Pool linger a lot longer. This simulates what your original question is asking about: partially invalidating a cache. That way, mysqldumps, full table scans, and similar queries will face roadblocks against cache stampedes. The options to tweak are innodb_old_blocks_pct and innodb_old_blocks_time.
According to the Configuration Options section of the MySQL Documentation for the InnoDB Buffer Pool:
Setting innodb_old_blocks_time greater than 0 prevents one-time table scans from flooding the new sublist with blocks used only for the scan. Rows in a block read in for a scan are accessed many times in rapid succession, but the block is unused after that. If innodb_old_blocks_time is set to a value greater than time to process the block, the block remains in the “old” sublist and ages to the tail of the list to be evicted quickly. This way, blocks used only for a one-time scan do not act to the detriment of heavily used blocks in the new sublist.
innodb_old_blocks_time can be set at runtime, so you can change it temporarily while performing operations such as table scans and dumps:
SET GLOBAL innodb_old_blocks_time = 1000;
... perform queries that scan tables ...
SET GLOBAL innodb_old_blocks_time = 0;
This strategy does not apply if your intent is to “warm up” the buffer pool by filling it with a table's content. For example, benchmark tests often perform a table or index scan at server startup, because that data would normally be in the buffer pool after a period of normal use. In this case, leave innodb_old_blocks_time set to 0, at least until the warmup phase is complete.
Playing a bit with pg_buffercache, I could get answers to some of your questions.
- (1) This is quite obvious, but the results for (5) also show that the answer is YES.
- (2) I have yet to set up a good example for this; for now it is more yes than no :) (See my edit below: the answer is NO.)
- (3) Since the planner is what decides whether to use an index or not, we can say YES, it decides caching (but this is more complicated).
- (4) The exact details of caching could be derived from the source code. I couldn't find much on this topic, except this one (see the author's answer, too). However, I'm pretty sure that this again is far more complicated than a simple yes or no. (Again, my edit gives some idea: since the cache size is limited, those 'sensible' indexes compete for the available space. If there are too many of them, they will kick each other out of the cache, so the answer is rather NO.)
- (5) As a simple query with pg_buffercache shows, the answer is a definitive YES. It is worth noting that temporary table data doesn't get cached here.
EDIT
I've found Jeremiah Peschka's terrific article about table and index storage. With information from there, I could answer (2) as well. I set up a small test, so you can check these yourself.
-- we will need two extensions
CREATE EXTENSION pg_buffercache;
CREATE EXTENSION pageinspect;
-- a very simple test table
CREATE TABLE index_cache_test (
id serial
, blah text
);
-- I am a bit megalomaniac here, but I will use this for other purposes as well
INSERT INTO index_cache_test
SELECT i, i::text || 'a'
FROM generate_series(1, 1000000) a(i);
-- let's create the index to be cached
CREATE INDEX idx_cache_test ON index_cache_test (id);
-- now we can have a look at what is cached
SELECT c.relname,count(*) AS buffers
FROM
pg_class c
INNER JOIN pg_buffercache b ON b.relfilenode = c.relfilenode
INNER JOIN pg_database d ON (b.reldatabase = d.oid AND d.datname = current_database())
GROUP BY c.relname
ORDER BY 2 DESC LIMIT 10;
relname | buffers
----------------------------------+---------
index_cache_test | 2747
pg_statistic_relid_att_inh_index | 4
pg_operator_oprname_l_r_n_index | 4
... (others are all pg_something, which are not interesting now)
-- this shows that the whole table is cached and our index is not in use yet
-- now we can check which row is where in our index
-- in the ctid column, the first number shows the page, so
-- all rows starting with the same number are stored in the same page
SELECT * FROM bt_page_items('idx_cache_test', 1);
itemoffset | ctid | itemlen | nulls | vars | data
------------+---------+---------+-------+------+-------------------------
1 | (1,164) | 16 | f | f | 6f 01 00 00 00 00 00 00
2 | (0,1) | 16 | f | f | 01 00 00 00 00 00 00 00
3 | (0,2) | 16 | f | f | 02 00 00 00 00 00 00 00
4 | (0,3) | 16 | f | f | 03 00 00 00 00 00 00 00
5 | (0,4) | 16 | f | f | 04 00 00 00 00 00 00 00
6 | (0,5) | 16 | f | f | 05 00 00 00 00 00 00 00
...
64 | (0,63) | 16 | f | f | 3f 00 00 00 00 00 00 00
65 | (0,64) | 16 | f | f | 40 00 00 00 00 00 00 00
-- with the information obtained, we can write a query which is supposed to
-- touch only a single page of the index
EXPLAIN (ANALYZE, BUFFERS)
SELECT id
FROM index_cache_test
WHERE id BETWEEN 10 AND 20 ORDER BY id
;
Index Scan using idx_cache_test on index_cache_test (cost=0.00..8.54 rows=9 width=4) (actual time=0.031..0.042 rows=11 loops=1)
Index Cond: ((id >= 10) AND (id <= 20))
Buffers: shared hit=4
Total runtime: 0.094 ms
(4 rows)
-- let's have a look at the cache again (the query remains the same as above)
relname | buffers
----------------------------------+---------
index_cache_test | 2747
idx_cache_test | 4
...
-- and compare it to a bigger index scan:
EXPLAIN (ANALYZE, BUFFERS)
SELECT id
FROM index_cache_test
WHERE id <= 20000 ORDER BY id
;
Index Scan using idx_cache_test on index_cache_test (cost=0.00..666.43 rows=19490 width=4) (actual time=0.072..19.921 rows=20000 loops=1)
Index Cond: (id <= 20000)
Buffers: shared hit=4 read=162
Total runtime: 24.967 ms
(4 rows)
-- this already shows that something was in the cache and further pages were read from disk
-- but to be sure, a final glance at cache contents:
relname | buffers
----------------------------------+---------
index_cache_test | 2691
idx_cache_test | 58
-- note that some of the table pages have disappeared
-- but, more importantly, a bigger part of our index is now cached
All in all, this shows that indexes and tables can be cached page by page, therefore the answer for (2) is NO.
And a final one to illustrate temporary tables being non-cached here:
CREATE TEMPORARY TABLE tmp_cache_test AS
SELECT * FROM index_cache_test ORDER BY id FETCH FIRST 20000 ROWS ONLY;
EXPLAIN (ANALYZE, BUFFERS) SELECT id FROM tmp_cache_test ORDER BY id;
-- checking the buffer cache now shows no sign of the temp table
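To verify, you can re-run the buffer cache query restricted to the temp table. This sketch assumes the same session and the relation name from above; it should return zero rows, because temporary tables live in backend-local buffers (sized by temp_buffers), not in shared_buffers:

```sql
-- Look for the temporary table in shared_buffers: no rows expected,
-- since temp table pages never enter the shared buffer cache.
SELECT c.relname, count(*) AS buffers
FROM pg_class c
INNER JOIN pg_buffercache b ON b.relfilenode = c.relfilenode
WHERE c.relname = 'tmp_cache_test'
GROUP BY c.relname;
```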
Best Answer
If by cache you mean shared_buffers: no, it won't be removed from there due to being changed. shared_buffers is the primary place where it gets changed. From there the change will eventually be pushed out to the kernel, and from there to disk.
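You can watch this with pg_buffercache's isdirty column, which flags pages that were modified in shared_buffers but not yet written out. A sketch, reusing the test table from the answer above:

```sql
-- After an UPDATE, the affected table and index pages stay in
-- shared_buffers and are marked dirty until the background writer
-- or a checkpoint flushes them to disk.
UPDATE index_cache_test SET blah = blah || 'x' WHERE id <= 100;

SELECT c.relname, count(*) AS dirty_buffers
FROM pg_class c
INNER JOIN pg_buffercache b ON b.relfilenode = c.relfilenode
WHERE b.isdirty
  AND c.relname IN ('index_cache_test', 'idx_cache_test')
GROUP BY c.relname;
```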