PostgreSQL Index – Can Multi Indexing Speed Up AND Comparisons?

index postgresql

I am using PostgreSQL and have a query of this type:

SELECT * from t
WHERE a > 30 AND b < 20

where a and b are columns of table t. I have indexed columns a and b separately. Would that speed up this query, or do I need to add a multi-column index on (a, b)?

This post mentions that a multi-column index speeds up queries of type a = 3 AND b = 4 and of type a = 3, but I am not sure whether the same applies to comparisons.

Also, the official PostgreSQL docs say that it can speed up queries of type SELECT name FROM test2 WHERE major = constant AND minor = constant, but the later text says many things which I don't really understand, hence this question.

Best Answer

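A multi-column index cannot be used efficiently for both conditions. For example, an index on (a, b) can use the condition a > 30 to limit the portion of the index that is scanned, but not b < 20: that condition can only be checked against the index entries already selected by a > 30.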

If one of the conditions alone is selective enough and the other does not reduce the result set considerably, just create an index for that condition.

If you need index support for both conditions because neither of them is selective enough on its own, create two indexes, one for each condition. Then PostgreSQL can use a bitmap index scan on both indexes and a “bitmap and” to combine the results.
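As a rough sketch of the two-index approach (the index names t_a_idx and t_b_idx are made up for illustration; t, a and b are taken from the question), one could create the indexes and then look at the plan. Whether the planner actually combines the two indexes depends on the table statistics, so the plan on real data may differ.

```sql
-- Hypothetical index names; table t and columns a, b come from the question.
CREATE INDEX t_a_idx ON t (a);
CREATE INDEX t_b_idx ON t (b);

-- Refresh planner statistics so the estimates reflect the current data.
ANALYZE t;

-- Show the chosen plan without executing the query.
EXPLAIN
SELECT * FROM t
WHERE a > 30 AND b < 20;
```

If both indexes are combined, the plan contains a BitmapAnd node over two Bitmap Index Scan nodes, underneath a Bitmap Heap Scan. If only one index (or a sequential scan) shows up, the planner estimated that to be cheaper.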

Related Solutions

Postgresql – Does the order of fields in SELECT query matter when using composite indexing

No, you can specify the 'params' (the parts of the where clause) in any order and the query optimizer will handle it. The optimizer will do the filtering in the order that it estimates is most efficient, but note that this is more complex than just choosing which order to filter: filtering might be done before or after joining for example.

You can't exactly prove this, but you can demonstrate it for a particular query by experimenting and seeing whether the plan changes. There may even be edge cases where the order does matter, but my advice would be to ignore the possibility and assume it never happens, as otherwise you will spend a lot of effort trying different permutations. It is much better to focus on the kind of tuning which you know can pay dividends (e.g. correct indexing).
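One way to run that experiment, using the table t from the question at the top as a stand-in, is to compare the plans for two permutations of the same conditions:

```sql
-- Same conditions, written in a different order.
EXPLAIN SELECT * FROM t WHERE a > 30 AND b < 20;
EXPLAIN SELECT * FROM t WHERE b < 20 AND a > 30;
```

If the two plans match, the order made no difference for this particular query. That is a demonstration for one case, not a proof for all of them.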

PostgreSQL – Optimizing Multi-Table GROUP BY Queries

I do wonder why you have report_type as an attribute of the question.
Be that as it may, your objective:

The objective of the query is to figure out per county, district and report type for a specific question how many reports we have that have answered that question.

Why would you include ~~report_name~~ in the GROUP BY step? That conflicts with your definition. I think you should remove it:

SELECT r.county, r.district, q.report_type
     , count(DISTINCT r.id) AS reports
FROM   question q 
JOIN   questionanswer qa ON qa.question_id = q.id
JOIN   report         r  ON qa.report_id = r.id
WHERE  q.name = 'touch' 
GROUP  BY 1,2,3;

Also, as long as you restrict the query to a single question, there is only one report_type in the result per definition. Including it in the result and GROUP BY clause doesn't change the numbers.

As for performance: either create a UNIQUE constraint on (question_id, report_id) (in that order!) like I suspect you should have:

ALTER TABLE questionanswer ADD CONSTRAINT qa_uni UNIQUE (question_id, report_id);

Or, barring that, at least create an index on (question_id, report_id).
Why is the order of columns in the index / constraint important? See the related question:

Is a composite index also good for queries on the first field?
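The index fallback mentioned above might be created as follows. The index name is hypothetical; the column order follows the answer's "in that order!" remark.

```sql
-- Hypothetical index name. question_id leads, as the answer asks for.
CREATE INDEX qa_question_report_idx
    ON questionanswer (question_id, report_id);
```

Unlike the UNIQUE constraint, a plain index does not stop the same report from being linked to the same question more than once.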

With the UNIQUE constraint in place, each report can appear at most once per question, so count(DISTINCT r.id) can be replaced with the cheaper:

     , count(*) AS reports
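Spelled out, the simplified query would look roughly like this. It is a sketch that assumes the UNIQUE constraint on (question_id, report_id) exists and that 'touch' identifies a single question; without both, count(*) could count a report more than once.

```sql
SELECT r.county, r.district, q.report_type
     , count(*) AS reports  -- relies on UNIQUE (question_id, report_id)
FROM   question q
JOIN   questionanswer qa ON qa.question_id = q.id
JOIN   report         r  ON qa.report_id = r.id
WHERE  q.name = 'touch'
GROUP  BY 1,2,3;
```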

As long as you have only 40 questions you don't need an index on question.name, but as long as you select questions by name, you should still have a UNIQUE constraint on that column.
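A constraint along these lines would do that; the constraint name is invented for illustration:

```sql
-- Hypothetical constraint name; enforces one row per question name.
ALTER TABLE question ADD CONSTRAINT question_name_uni UNIQUE (name);
```

The statement fails if the table already contains duplicate names, so those would have to be cleaned up first.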

The PK on report does the rest.

Related query if you really want to count distinct counties and districts per question:

SELECT q.id, q.name, q.report_type
     , count(DISTINCT r.county)      AS distinct_counties
     , count(DISTINCT r.district)    AS distinct_districts
FROM   question q 
JOIN   questionanswer qa ON qa.question_id = q.id
JOIN   report         r  ON qa.report_id = r.id
WHERE  q.name = 'touch' 
GROUP  BY 1;  -- the PK column covers the whole table
