Postgresql – Should I create an index for non key columns

Tags: index, postgresql

I have a table in a PostgreSQL database defined as follows:

CREATE TABLE public."MATCH"(
    "ITEM_A_ID" bigint DEFAULT 0,
    "ITEM_B_ID" bigint DEFAULT 0,
    "OWNER_A_ID" bigint DEFAULT 0,
    "OWNER_B_ID" bigint DEFAULT 0,
    "OTHER_DATA" varchar(100) NOT NULL DEFAULT '',
    CONSTRAINT "MATCH_PK" PRIMARY KEY ("ITEM_A_ID","ITEM_B_ID")
);

It will contain a lot of rows. There will be a lot of queries like the following performed on this table:

SELECT * FROM "MATCH" WHERE "OWNER_A_ID" = owner_a_id;
SELECT * FROM "MATCH" WHERE "OWNER_B_ID" = owner_b_id;

I was thinking about creating indexes on OWNER_A_ID and OWNER_B_ID, since these columns are not keys. Is this a good idea, and if yes, how should I create these? Should I create one index with both columns? Should I create two indexes? Should I include other columns?

Best Answer

Choosing the right set of indexes is often difficult. In your case, creating two indexes, one per column, should be useful.
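As a minimal sketch of the two-index approach for the table in the question (the index names are made up for illustration and are not from the original post):

```sql
-- One single-column B-tree index per lookup column.
-- Index names are illustrative; pick whatever fits your naming scheme.
CREATE INDEX "MATCH_OWNER_A_IDX" ON public."MATCH" ("OWNER_A_ID");
CREATE INDEX "MATCH_OWNER_B_IDX" ON public."MATCH" ("OWNER_B_ID");
```

B-tree is the default index type, so no USING clause is needed here.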

You should only create a single index on both columns if your queries always include the first column as a condition:

SELECT * FROM "MATCH" WHERE "OWNER_A_ID" = owner_a_id [AND "OWNER_B_ID" = owner_b_id]

The whole B-tree is built on the order of the columns in the index! A multi-column index on ("OWNER_A_ID", "OWNER_B_ID") can't be fully used by the following queries:

SELECT * FROM "MATCH" WHERE "OWNER_A_ID" = owner_a_id OR "OWNER_B_ID" = owner_b_id
SELECT * FROM "MATCH" WHERE "OWNER_A_ID" = "OWNER_B_ID"
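To see the effect of column order for yourself, you could compare plans with EXPLAIN. This is only a sketch: the index name and the literal values 42 and 7 are placeholders, and the plan the planner actually picks depends on your data and statistics.

```sql
-- Hypothetical composite index; the name is illustrative.
CREATE INDEX "MATCH_OWNER_A_B_IDX"
    ON public."MATCH" ("OWNER_A_ID", "OWNER_B_ID");

-- The leading column is constrained, so the index can be used efficiently.
EXPLAIN SELECT * FROM public."MATCH" WHERE "OWNER_A_ID" = 42;
EXPLAIN SELECT * FROM public."MATCH"
WHERE "OWNER_A_ID" = 42 AND "OWNER_B_ID" = 7;

-- The leading column is not constrained: the planner cannot descend the
-- B-tree directly and will typically scan the whole index or the table.
EXPLAIN SELECT * FROM public."MATCH" WHERE "OWNER_B_ID" = 7;
```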

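If you're only using equality checks, you might consider a hash index. However, hash indexes in PostgreSQL have some disadvantages that you should check first (before version 10, for example, they are not WAL-logged, so they are not crash-safe and are not replicated).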
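For reference, a hash index is requested with USING hash. A sketch with an illustrative index name, meant as an alternative to a B-tree index on the same column rather than an addition to it:

```sql
-- Hash indexes support only the = operator.
-- The index name is illustrative.
CREATE INDEX "MATCH_OWNER_A_HASH_IDX"
    ON public."MATCH" USING hash ("OWNER_A_ID");
```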

On other DBMSs you should consider adding extra columns to the index as data. This helps when you query those specific columns rather than *, because the DBMS doesn't need to fetch the data from the table after using the index.
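PostgreSQL itself has supported this since version 11 through the INCLUDE clause, which is newer than the 9.3 release this answer was written against. A sketch, assuming version 11 or later and an illustrative index name:

```sql
-- Requires PostgreSQL 11 or later.
-- "OTHER_DATA" is stored in the index as payload, not as a search key.
CREATE INDEX "MATCH_OWNER_A_COVERING_IDX"
    ON public."MATCH" ("OWNER_A_ID") INCLUDE ("OTHER_DATA");

-- A query that touches only indexed and included columns may be answered
-- by an index-only scan (42 is a placeholder value).
SELECT "OWNER_A_ID", "OTHER_DATA"
FROM public."MATCH"
WHERE "OWNER_A_ID" = 42;
```

Whether an index-only scan is actually chosen also depends on how recently the table has been vacuumed, so check the plan with EXPLAIN.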

An important factor: indexes fragment over time (unless you never perform any insert/update/delete on the table). Check whether your DBA has maintenance operations scheduled.
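If an index does turn out to be bloated, one way to rebuild it is REINDEX. A sketch, assuming an index named "MATCH_OWNER_A_IDX" exists (the name is hypothetical):

```sql
-- Check the on-disk size of the index before and after rebuilding.
SELECT pg_size_pretty(pg_relation_size('public."MATCH_OWNER_A_IDX"'));

-- Rebuild the index. A plain REINDEX blocks writes to the table while it runs.
REINDEX INDEX public."MATCH_OWNER_A_IDX";

-- PostgreSQL 12 and later can rebuild without blocking writes:
-- REINDEX INDEX CONCURRENTLY public."MATCH_OWNER_A_IDX";
```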

Please check the documentation for additional options such as FILLFACTOR or partial indexes: http://www.postgresql.org/docs/9.3/static/sql-createindex.html
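Both options could look roughly like this. The index names, the fillfactor value of 90, and the idea that 0 (the column default) means "no owner" are all assumptions made for illustration:

```sql
-- Leave 10% free space in each leaf page to absorb later inserts/updates.
CREATE INDEX "MATCH_OWNER_A_FF_IDX"
    ON public."MATCH" ("OWNER_A_ID") WITH (fillfactor = 90);

-- Partial index: skip rows that still carry the default value 0,
-- assuming 0 means "no owner" and is never searched for.
CREATE INDEX "MATCH_OWNER_B_PARTIAL_IDX"
    ON public."MATCH" ("OWNER_B_ID") WHERE "OWNER_B_ID" <> 0;
```

The planner uses a partial index only when it can prove that the query's WHERE clause implies the index predicate.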

Related Solutions

Postgresql – postgres composite index design

I would try making two composite partial indexes like

CREATE INDEX idx1 ON tw_schedules (scenario_id, type_well_id) WHERE type_well_id IS NOT NULL;

CREATE INDEX idx2 ON tw_schedules (scenario_id, tw_import_id) WHERE tw_import_id IS NOT NULL;

But, of course, everything depends on the selectivity of these indexes.
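One rough way to gauge selectivity is to look at the planner statistics in the pg_stats view. This sketch assumes the table has been analyzed and uses the column names from the index definitions above:

```sql
-- Refresh statistics first so pg_stats reflects the current data.
ANALYZE tw_schedules;

-- n_distinct: estimated number of distinct values (a negative value is a
-- fraction of the row count); null_frac: fraction of rows that are NULL.
SELECT attname, n_distinct, null_frac
FROM pg_stats
WHERE tablename = 'tw_schedules'
  AND attname IN ('scenario_id', 'type_well_id', 'tw_import_id');
```

A high null_frac on type_well_id or tw_import_id would suggest the partial indexes stay small.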

MySQL looking up more rows than needed (indexing issue)

Your indexes are fine for the two types of queries you mentioned.

This query will be satisfied by traversing the clustered index on the primary key...

[...] WHERE participant_id = x AND question_id = y AND given_answer_id = z;

...and this one is satisfied by the index on 'question_id':

[...] WHERE question_id = x;

The output of EXPLAIN SELECT is not telling you what you think it is telling you, because the value shown in rows is an estimate of the number of rows the server will need to consider, not the actual rows it will examine. For InnoDB these are based on index statistics.

rows

The rows column indicates the number of rows MySQL believes it must examine to execute the query.

For InnoDB tables, this number is an estimate, and may not always be exact.

Source: http://dev.mysql.com/doc/refman/5.5/en/explain-output.html#explain_rows

The optimizer gathers information about different possible query plans, and chooses the one with the lowest cost. The information shown in EXPLAIN is the information the optimizer gathered about the plan it selected.

When type is ref and key is not NULL, this means that the name listed in the key column is the name of the index that the optimizer has chosen to use to find the desired rows, so your query plan looks exactly as it should.

Note, sometimes you will see Using index in the Extra column and a lot of people assume that this means an index is being used, or that no index is being used when that doesn't appear, but that's not correct, either. Using index describes a special case called a "covering index" -- it does not indicate whether an index is being used to locate the rows of interest.

It's possible that running ANALYZE [LOCAL] TABLE would cause the numbers in rows shown by EXPLAIN to differ, but this is a simple query and selecting this index is an obvious choice for the optimizer to make, so ANALYZE TABLE is unlikely to make any actual difference in performance.

It is possible, however, that your overall performance might see some marginal improvement with an occasional OPTIMIZE [LOCAL] TABLE, because you are not inserting rows in primary key order (as would be the case with an auto_increment primary key)... but on large tables this can be time-consuming because it rebuilds a new copy of the table... but, again, I wouldn't expect any significant change.
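The maintenance statements mentioned above could be run as follows. The table name answers is hypothetical, since the original table name is not shown here, and the value 1 is a placeholder:

```sql
-- Refresh index statistics, then re-check the estimate shown in "rows".
ANALYZE TABLE answers;
EXPLAIN SELECT * FROM answers WHERE question_id = 1;

-- Rebuild the table and its indexes; this can take a long time on large tables.
OPTIMIZE TABLE answers;
```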
