PostgreSQL – How to Index Two Tables with One-to-Many Relation

postgresql

Take the following two tables:

  • table A has ~100k rows and 50 columns, and it is being updated only twice per month, how long this update takes is not an issue (as long as it doesn't take more than one day, which I think is unlikely :D). It will grow very slightly, not more than 10k rows per year.
  • table B has ~25k rows and 25 columns, column F is a foreign key to column C of table A. This table is being updated 100-1000 times a day, most of the time to add new rows, so it will easily grow to hundreds of thousands of rows.

Table A has a one-to-many relation to table B, both column F and column C must be not null, have their own index, and column C is also unique but it is not the primary key of table A. Every row of table B has a unique correspondence with a row in table A.

Both tables are being queried many thousands of times every day, individually or using their relation using an application that allows a few dozen different types of query.

  1. Is it a good idea to index every column of table A that is involved in a query, plus making a composite index for every type of query? If no, what are the possible drawbacks?

  2. For table B is it a good idea to make a composite index only for the most time consuming queries? Or should I make a composite index also for every column in table B that is being queried, since those queries may vary in the parameters but not in the form?

Best Answer

The thing about indexes are that they are excellent tools for speeding up queries, but they bring along overhead as well.

  1. If you choose to index every column in a table that is involved in a query, you have chosen to store considerably more data. This has an effect on the storage required and in the overhead of maintaining all those indexes. (Or maintaining all columns in a very wide index.)
  2. In past experience, having many indexes, especially several indexes with similar distribution statistics, can make it easier for the server to select a poor index and thus a poor plan.

My experience shows that indexing just about everything will hurt your performance overall, for the reasons mentioned above.

Both table A and table B should only contain the indexes that would be of best use for your queries.

Of course, your first indexing decisions may need to be refined over time as you learn more about your database's behavior. Do not be afraid to add additional indexes when needed. And do not be afraid to drop an index that is not being useful.