PostgreSQL – Pre-caching an index on a large table in PostgreSQL

cache index postgresql postgresql-12

I have a table with about 10 million rows, with a primary key and an additional index defined on it:

create table test.test_table(
    date_info date not null,
    string_data varchar(64) not null,
    data bigint,
    primary key(date_info, string_data));

create index test_table_idx
    on test.test_table(string_data);

I have a query that makes use of test_table_idx:

select distinct date_info from test.test_table where string_data = 'some_val';

The issue is that the first run of the query can take up to 20 seconds, while any subsequent run takes less than 2 seconds.

Is there a way to load the entire index into memory up front, rather than have the DB load it on first access?

Best Answer

You could use the additional module pg_prewarm. It has to be installed once per database. See:

PostgreSQL: Force data into memory
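Installing the module is a one-time step per database. A minimal sketch, assuming the contrib modules are available on the server and your role is allowed to create extensions:

```sql
-- Run once in each database where you want to prewarm relations.
CREATE EXTENSION IF NOT EXISTS pg_prewarm;

-- Confirm the extension is registered in the current database.
SELECT extname, extversion
FROM   pg_extension
WHERE  extname = 'pg_prewarm';
```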

It can "prewarm" tables as well as indexes. To do it for your index:

SELECT pg_prewarm('test.test_table_idx');

Unless you get index-only scans (which you do not with the index at hand), you might want to prewarm the table as well:

SELECT pg_prewarm('test.test_table');

There are more parameters to narrow down what and how to prewarm. See the linked answer above and the pg_prewarm documentation.
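As a rough illustration of those parameters: pg_prewarm takes the relation, then optionally a mode, a fork name, and a first and last block number. The block range below is an arbitrary example, not a recommendation, and it has to lie within the actual size of the relation.

```sql
-- Default mode 'buffer': read all blocks into PostgreSQL's shared buffers.
-- The return value is the number of blocks that were prewarmed.
SELECT pg_prewarm('test.test_table_idx', 'buffer');

-- Mode 'read': read the blocks synchronously so they end up in the
-- operating system cache, without occupying shared buffers.
SELECT pg_prewarm('test.test_table_idx', 'read');

-- Restrict prewarming to a block range of the main fork (blocks start at 0).
-- Assumption: the index has at least 1000 blocks; adjust the range otherwise.
SELECT pg_prewarm('test.test_table_idx', 'buffer', 'main', 0, 999);
```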

This is costly and the system might be better off using the cache for something else. If you know the exact query ahead of time and it's a SELECT without side effects, you might just run the query to "prewarm" relevant data pages of index and table.
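To judge whether prewarming is worth the cache it consumes, it can help to compare the size of the relations with the configured shared buffers. A small sketch using built-in size functions:

```sql
-- Compare index and table size against shared_buffers before prewarming.
SELECT pg_size_pretty(pg_relation_size('test.test_table_idx')) AS index_size,
       pg_size_pretty(pg_relation_size('test.test_table'))     AS table_size,
       current_setting('shared_buffers')                       AS shared_buffers;

-- Alternative: run the known query once. This only touches the index and
-- table pages needed for this particular value of string_data.
SELECT DISTINCT date_info
FROM   test.test_table
WHERE  string_data = 'some_val';
```

If the index and table together are much larger than shared_buffers, prewarming everything will evict other cached data, so warming only what the query needs may be the better trade-off.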

As an aside, you might be better off rearranging PK and index like this:

    ...
    primary key(string_data, date_info));
    create index test_table_idx on test.test_table(date_info);

Now, the PK index can give you index-only scans for the query at hand, because it leads with the filtered column string_data and also contains date_info. That might make a substantial difference.
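Whether the planner actually picks an index-only scan can be checked with EXPLAIN. A sketch, assuming the table has been recreated with the rearranged primary key; note that EXPLAIN with ANALYZE really executes the query:

```sql
-- Index-only scans rely on the visibility map, which VACUUM maintains.
VACUUM (ANALYZE) test.test_table;

-- Show the chosen plan together with buffer usage.
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT date_info
FROM   test.test_table
WHERE  string_data = 'some_val';
-- In the output, look for an "Index Only Scan" node on the PK index
-- and a low "Heap Fetches" count.
```

A high "Heap Fetches" count suggests that many pages are not yet marked all-visible, in which case the scan still has to visit the table.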

Related Solutions

MySQL – How to optimize this MySQL table that will need to hold 5+ million rows

Since the table is InnoDB:

- Bulk Insert Buffer (bulk_insert_buffer_size) is no good for you because it handles bulk inserts for MyISAM tables only.
- DISABLE KEYS / ENABLE KEYS only works for MyISAM. InnoDB handles secondary index processing in the system tablespace ibdata1.
  - InnoDB will throw a warning about this
  - See my post: https://stackoverflow.com/a/9525780/491757
  - See the bug report: http://bugs.mysql.com/bug.php?id=5187

You may have to resort to altering the InnoDB buffer protocol to handle only INSERTs. See: Mysql load from infile stuck waiting on hard drive

Unless you plan to store values in MLSNUMBER bigger than 2147483647, you may want to consider making MLSNUMBER an INT (4 bytes) instead of BIGINT (8 bytes) to save space on secondary index creation. If your values for MLSNUMBER are non-negative and less than 4294967296, maybe MLSNUMBER should be INT UNSIGNED.
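A sketch of how that change could look in MySQL. The table name listings is hypothetical (the original table name is not shown here), and NOT NULL is an assumption about the existing column definition:

```sql
-- Check the actual value range before shrinking the column.
SELECT MIN(MLSNUMBER) AS min_val, MAX(MLSNUMBER) AS max_val
FROM   listings;  -- hypothetical table name

-- MODIFY replaces the whole column definition, so restate nullability
-- and any default the column currently has. NOT NULL is an assumption.
ALTER TABLE listings
  MODIFY MLSNUMBER INT UNSIGNED NOT NULL;
```

On a large table this ALTER rebuilds the table, so it is best done before the bulk load rather than after.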

MySQL looking up more rows than needed (indexing issue)

Your indexes are fine for the two types of queries you mentioned.

This query will be satisfied by traversing the clustered index on the primary key...

[...] WHERE participant_id = x AND question_id = y AND given_answer_id = z;

...and this one is satisfied by the index on 'question_id':

[...] WHERE question_id = x;

The output of EXPLAIN SELECT is not telling you what you think it is telling you, because the value shown in rows is an estimate of the number of rows the server will need to consider, not the actual rows it will examine. For InnoDB these are based on index statistics.

rows

The rows column indicates the number of rows MySQL believes it must examine to execute the query.

For InnoDB tables, this number is an estimate, and may not always be exact.

Source: http://dev.mysql.com/doc/refman/5.5/en/explain-output.html#explain_rows

The optimizer gathers information about different possible query plans, and chooses the one with the lowest cost. The information shown in EXPLAIN is the information the optimizer gathered about the plan it selected.

When type is ref and key is not NULL, this means that the name listed in the key column is the name of the index that the optimizer has chosen to use to find the desired rows, so your query plan looks exactly as it should.

Note that you will sometimes see Using index in the Extra column. Many people assume this means an index is being used, or that no index is being used when it doesn't appear, but neither is correct. Using index describes a special case called a "covering index", where the query can be answered from the index alone without reading the table rows. It does not indicate whether an index is being used to locate the rows of interest.

It's possible that running ANALYZE [LOCAL] TABLE would cause the numbers in rows shown by EXPLAIN to differ, but this is a simple query and selecting this index is an obvious choice for the optimizer to make, so ANALYZE TABLE is unlikely to make any actual difference in performance.
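For reference, refreshing the statistics and re-checking the estimate could look like this in MySQL. The table name answers is hypothetical; question_id is the column from the query above, and the value 42 is a placeholder:

```sql
-- Refresh index statistics (hypothetical table name).
ANALYZE TABLE answers;

-- Re-run EXPLAIN; the "rows" column is still an estimate for InnoDB,
-- but it may change after the statistics are refreshed.
EXPLAIN
SELECT *
FROM   answers
WHERE  question_id = 42;
```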

It is possible, however, that your overall performance might see some marginal improvement with an occasional OPTIMIZE [LOCAL] TABLE, because you are not inserting rows in primary key order (as would be the case with an auto_increment primary key). On large tables this can be time-consuming because it rebuilds a new copy of the table, and again, I wouldn't expect any significant change.
