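If you create the index before loading the table, the load takes significantly longer. In the test below, loading with the index already in place took about 31.8 s, versus about 19.4 s in total for loading first and creating the index afterwards.
Index created before the load: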
create table my_table1(val integer);
create index my_index1 on my_table1(val);
insert into my_table1(val) select generate_series(1,100000) order by random();
Time: 31755.858 ms
Index created after the load:
create table my_table2(val integer);
insert into my_table2(val) select generate_series(1,100000) order by random();
Time: 15344.130 ms
create index my_index2 on my_table2(val);
Time: 4073.686 ms
If you are OK with that, with pg_restore you can:
- Load just the schema using --schema-only
- Create the index with --index (it takes the name of the index to restore)
- Load the data using --data-only
Of course "Buy more storage" may well be the best answer here...
For starters, gid should probably be a numeric type. integer should be good enough, or bigint if that key space is not big enough. This gives a much smaller footprint, faster processing than with character data, and faster and smaller indexes.
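A minimal sketch of that type change, assuming gid is currently a text or varchar column whose values consist of digits only (the question does not confirm this, so check first):

```sql
-- Assumption: gid is text / varchar and holds only digits.
-- This check should return 0 before attempting the conversion.
SELECT count(*) FROM big_tbl WHERE gid !~ '^[0-9]+$';

-- Rewrites the whole table and holds an ACCESS EXCLUSIVE lock while it runs.
-- Use bigint instead if values can exceed the integer range.
ALTER TABLE big_tbl
    ALTER COLUMN gid TYPE integer USING gid::integer;
```

If you rebuild the table anyway as described further down, it may be cheaper to do the cast in that step instead of running a separate rewrite.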
More importantly, to improve performance I suggest database normalization.
Quote:
There is a fairly regular pattern where each word appears about 1000 times.
Create a separate table for unique words:
CREATE TABLE word (
word_id serial
, word text
);
Fill it with unique instances of word in your big_tbl:
INSERT INTO word (word)
SELECT DISTINCT word
FROM big_tbl
ORDER BY word;
ORDER BY is optional and not needed for the query at hand, but it speeds up index creation and might be cheaper overall.
The table should be small in comparison: only ~ 50k rows for 50M rows in your big table.
Add indexes after filling the table:
ALTER TABLE word
ADD CONSTRAINT word_word_uni UNIQUE (word) -- essential
, ADD CONSTRAINT word_word_id_pkey PRIMARY KEY (word_id); -- expendable?
If those are read-only tables, you can do without the pk. It's not relevant to the operations at hand.
Replace your big table with a much smaller new table. You may have to lock the big table to avoid concurrent writes. Concurrent reads are not a problem.
CREATE TABLE big_tbl_new AS
SELECT b.gid -- or the suggested smaller, faster numeric replacement
, w.word_id, b.stat
FROM big_tbl b
JOIN word w USING (word)
ORDER BY word; -- sorting by word helps query at hand
ORDER BY clusters the data (once), making the query at hand faster because far fewer blocks have to be read (unless your data is mostly clustered already). The sort carries a cost, so weigh cost and benefit once more.
DROP TABLE big_tbl; -- make sure your new table has all data!
ALTER TABLE big_tbl_new RENAME TO big_tbl;
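The replacement steps above could be wrapped in a single transaction so that writers are blocked while the copy is made and nothing is dropped unless everything succeeds. This is only a sketch under the assumption that a short period of blocked writes is acceptable; the row-count check is an illustrative addition, not part of the original answer:

```sql
BEGIN;

-- SHARE mode blocks concurrent INSERT / UPDATE / DELETE but still allows reads.
LOCK TABLE big_tbl IN SHARE MODE;

CREATE TABLE big_tbl_new AS
SELECT b.gid, w.word_id, b.stat
FROM   big_tbl b
JOIN   word w USING (word)
ORDER  BY word;

-- Sanity check: the counts should match if every row has a non-null word.
-- If they differ, issue ROLLBACK instead of continuing.
SELECT (SELECT count(*) FROM big_tbl)     AS old_rows
     , (SELECT count(*) FROM big_tbl_new) AS new_rows;

DROP TABLE big_tbl;  -- needs an ACCESS EXCLUSIVE lock, so readers wait from here on
ALTER TABLE big_tbl_new RENAME TO big_tbl;

COMMIT;
```

PostgreSQL DDL is transactional, so a ROLLBACK at any point before COMMIT leaves the original table untouched.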
Recreate indexes:
ALTER TABLE big_tbl ADD CONSTRAINT big_tbl_gid_pkey PRIMARY KEY (gid); -- expendable?
CREATE INDEX big_tbl_word_id_idx ON big_tbl (word_id); -- essential
Your query looks like this now and should be faster:
SELECT b.*
FROM word w
JOIN big_tbl b USING (word_id)
WHERE w.word = 'something';
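To check whether the new indexes are actually used, you could look at the plan. Note that EXPLAIN with ANALYZE really executes the query:

```sql
-- Look for index scans on word(word) and big_tbl(word_id) in the output,
-- and compare the buffer counts with those of the original query.
EXPLAIN (ANALYZE, BUFFERS)
SELECT b.*
FROM   word w
JOIN   big_tbl b USING (word_id)
WHERE  w.word = 'something';
```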
Reorganization is meant to be a one-time operation to re-organize your data. Keep the new form and also consider keeping indexes permanently.
All of this together (including new indexes) should occupy about half of what you had before on disk, also cutting the time for creation in half (at least). Index creation should be considerably faster, the query as well. If RAM is a limiting factor, these modifications pay double.
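One way to verify the disk-space estimate is to compare total relation sizes before and after the reorganization. A small sketch using the table names from this answer:

```sql
-- pg_total_relation_size() includes indexes and TOAST data of the table.
SELECT pg_size_pretty(pg_total_relation_size('big_tbl')) AS big_tbl_total
     , pg_size_pretty(pg_total_relation_size('word'))    AS word_total;
```

Run it once against the old big_tbl and once after the swap; the sum of the two new figures is what to compare against the old total.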
If you have to write to the table as well, it becomes more expensive (but you did not mention anything about that). You'd need to adjust your logic for DELETE / UPDATE / INSERT.
Example for INSERT: fetch word_id for existing words, or insert a new row in word, returning the new word_id. Details for this:
How do I insert a row which contains a foreign key?
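The INSERT logic described above might be sketched as a single statement. This assumes PostgreSQL 9.5 or later (for ON CONFLICT), relies on the UNIQUE constraint on word.word created earlier, and uses placeholder values for gid and stat, whose real types are not given in the question:

```sql
WITH ins AS (
    -- Insert the word only if it does not exist yet.
    INSERT INTO word (word)
    VALUES ('newword')
    ON CONFLICT (word) DO NOTHING
    RETURNING word_id
)
INSERT INTO big_tbl (gid, word_id, stat)
SELECT 12345, s.word_id, 1            -- placeholder gid and stat values
FROM  (
    SELECT word_id FROM ins           -- word was newly inserted
    UNION ALL
    SELECT word_id FROM word          -- word existed before this statement
    WHERE  word = 'newword'
) s
LIMIT 1;
```

Under concurrent writes there is a small window in which another transaction inserts the same word without it being visible yet, in which case no row is inserted into big_tbl; a retry would be needed for that case.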
Best Answer
The way to go is using a GiST index. This sort of index helps with checking whether a value is contained within a range.
Because you also want to filter by person_id, you will need to install the btree_gist extension. In addition, you should convert the valid_from and valid_until columns to a single tstzrange column, which is a range column that holds timestamp with time zone range limits.
After doing that, you can create an index on person_id and the new range column, which you can call valid_range.
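The statements for this did not survive in the text, so the following is only a sketch of what the steps could look like. The table name person_validity is hypothetical, valid_from and valid_until are assumed to be timestamp with time zone, and the '[)' bounds (lower inclusive, upper exclusive) are an assumption as well:

```sql
CREATE EXTENSION IF NOT EXISTS btree_gist;

-- Hypothetical table name; replace with your own.
ALTER TABLE person_validity
    ADD COLUMN valid_range tstzrange;

-- Build the range from the two existing columns.
UPDATE person_validity
SET    valid_range = tstzrange(valid_from, valid_until, '[)');

-- btree_gist is what allows the plain person_id column in a GiST index.
CREATE INDEX person_validity_person_range_idx
    ON person_validity USING gist (person_id, valid_range);

-- Example lookup: rows for one person that are valid right now.
SELECT *
FROM   person_validity
WHERE  person_id = 42
AND    valid_range @> now();
```

Once the range column is populated and the queries use it, the old valid_from and valid_until columns can be dropped.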
Good luck!