PostgreSQL – Do I risk losing the benefits of indexing if I have an index on every column?

Tags: index, performance, postgresql, query-performance

I use PostgreSQL 9.2. I have a table with ~5 million rows and 150 columns. The table does not change at all (I replace it once a year). Users query this table with all kinds of filters on any subset of the columns, e.g.

select * from table where C > 43 and H is not null;
select * from table where A is null and F < 10 and F > 1 and X > 2;

For performance I plan to create an index on every column of the table. A gut feeling tells me to ask the experts first: is it good design, for the use case described above, to create an index on every column?

UPDATE:
I have to speculate about the real use cases; I can't measure the exact queries yet, since this is still the design phase.

The server is well equipped with RAM and SSD storage, so queries are already "fast" now, and I can feel the effect of caching when I fire similar queries in sequence.

The columns are of types double precision, integer, timestamp and geometry (the geometry column explicitly gets a GiST index).
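For reference, a minimal sketch of what those index definitions could look like, assuming a hypothetical table mytable with hypothetical columns geom and measured_at:

-- GiST index for the PostGIS geometry column (hypothetical names)
CREATE INDEX idx_mytable_geom ON mytable USING gist (geom);

-- ordinary B-tree index for a timestamp column
CREATE INDEX idx_mytable_measured_at ON mytable (measured_at);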

Queries will filter on 1 to 10 columns, usually ~6. Results will usually be <20k rows. A filter on one column will never reference another column.

Thanks for all the explanations. What I will do:
* select the quarter of the columns that I think will be used most and create indexes on those (a sketch follows below).
* wait for more testing/usage, then start measuring and analysing the actual queries and use cases.
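A minimal sketch of that first step, assuming a hypothetical table mytable and that C, F and H turn out to be among the most-used filter columns:

CREATE INDEX idx_mytable_c ON mytable (c);
CREATE INDEX idx_mytable_f ON mytable (f);
CREATE INDEX idx_mytable_h ON mytable (h);

-- refresh planner statistics after the yearly reload and index creation
ANALYZE mytable;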

Thank you

Best Answer

The answer is (almost always) no.

Each index you create means more disk space used, longer maintenance windows, and a higher cost if the table that is never supposed to change (save for being replaced once a year) actually does get changed. Indexes also increase the size of your backups and make fitting into your disaster-recovery window that much harder. And many of those indexes will probably never get used, so they're just dead weight.

Then you have to think about multi-column indexes. Your two example queries filter on two and three columns respectively. With only single-column indexes, PostgreSQL can sometimes combine several of them using bitmap index scans, but a composite index tailored to a frequent combination of filters is often considerably faster, and if no useful index exists at all the query falls back to a sequential scan of the whole table. Then there are the three-column indexes, four-column indexes, etc., and the number of possible combinations explodes.
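To make that concrete, here is a hedged sketch against a hypothetical table mytable: create a composite index for the second example query and check what the planner actually does with it.

-- composite index covering a frequent filter combination (hypothetical names)
CREATE INDEX idx_mytable_f_x ON mytable (f, x);

-- see whether the planner uses it, combines single-column indexes, or scans
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM mytable WHERE a IS NULL AND f > 1 AND f < 10 AND x > 2;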

The better bet is to determine which queries are common and start indexing for those. I'm not deeply familiar with PostgreSQL internals, but PostgreSQL does ship tooling for gathering query statistics. Once you have a fairly solid idea of which queries users are actually running, you can create indexes that help them. Even then this isn't a perfect strategy: you're (metaphorically) paving roads where people already walk, but those people have to slog through the mud until you can see where they're going and build the road to help them. It is a reactive process, but it will probably catch a fairly large percentage of common uses.
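One concrete way to gather that data on PostgreSQL is the pg_stat_statements extension (it also needs to be listed in shared_preload_libraries in postgresql.conf, followed by a server restart); a minimal sketch:

-- in postgresql.conf: shared_preload_libraries = 'pg_stat_statements'
CREATE EXTENSION pg_stat_statements;

-- after some real usage, look at the most expensive query shapes
SELECT query, calls, total_time
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;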

Because this is effectively a read-only database, however, you could fairly safely create subsets of common data that cover a large percentage of user requests (either as separate tables or as materialized views). You can also get away with more indexes than you would want on a typical OLTP table: there are still marginal costs per index, but they are much lower when you don't need to worry about inserts, updates, and deletes. And if a large share of the queries are built around a date or some other similar field, you could partition the table on that column.
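A couple of hedged sketches of those ideas, using hypothetical table and column names. Note that native materialized views only arrived in PostgreSQL 9.3, so on 9.2 a plain summary table rebuilt at the yearly reload serves the same purpose, and partitioning on 9.2 would be inheritance-based rather than declarative.

-- summary table of a commonly requested subset, rebuilt at each yearly load
CREATE TABLE mytable_recent AS
SELECT * FROM mytable WHERE measured_at >= DATE '2013-01-01';

-- partial index covering a common predicate from the first example query
CREATE INDEX idx_mytable_c_h_notnull ON mytable (c) WHERE h IS NOT NULL;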