No, you can specify the 'params' (the parts of the where
clause) in any order and the query optimizer will handle it. The optimizer will do the filtering in the order that it estimates is most efficient, but note that this is more complex than just choosing which order to filter: filtering might be done before or after joining for example.
You can't exactly prove this, but you can demonstrate it is true for a particular query by experimenting and seeing if the plan changes. It may even be true that there are edge cases where the order does matter, but my advice would be to ignore the possibility and assume it never happens as otherwise you will expend a lot of effort trying different permutations. Much better to focus on the kind of tuning which you know can pay dividends (eg correct indexing).
I do wonder, why you have the report_type
as attribute of the question?
Be that as it may, your objective:
The objective of the query is to figure out per county, district and
report type for a specific question how many reports we have that have
answered that question.
Why would you include report_name
in GROUP BY
step? That conflicts with your definition. I think you should remove that:
SELECT r.county, r.district, q.report_type
, count(DISTINCT r.id) AS reports
FROM question q
JOIN questionanswer qa ON qa.question_id = q.id
JOIN report r ON qa.report_id = r.id
WHERE q.name = 'touch'
GROUP BY 1,2,3;
Also, as long as you restrict the query to a single question, there is only one report_type
in the result per definition. Including it in the result and GROUP BY
clause doesn't change the numbers.
As for performance: either create a UNIQUE
constraint on (question_id, report_id)
(in that order!) like I suspect you should have:
ALTER TABLE questionanswer ADD CONSTRAINT qa_uni UNIQUE (question_id, report_id);
Or, barring that, at least create an index on (question_id, report_id)
.
Why is the order of columns in the index / constraint important?
With the UNIQUE
constraint in place, the query gets considerably cheaper:
, count(*) AS reports
As long as you have only 40 questions
you don't need an index on question.name
, but as long as you select questions by name
, you should still have a UNIQUE
constraint on that column.
The PK on report
does the rest.
Related query if you really want to count distinct counties and districts per question:
SELECT q.id, q.name, q.report_type
, count(DISTINCT r.county) AS distinct_counties
, count(DISTINCT r.district) AS distinct_districts
FROM question q
JOIN questionanswer qa ON qa.question_id = q.id
JOIN report r ON qa.report_id = r.id
WHERE q.name = 'touch'
GROUP BY 1; -- the PK column covers the whole table
Best Answer
A multi-column index cannot be used for both conditions. For example, an index on
(a, b)
could be used for the conditiona > 30
, but not forb < 20
.If one of the conditions alone is selective enough and the other does not reduce the result set considerably, just create an index for that condition.
If you need index support for both conditions because neither of them is selective enough on its own, create two indexes, one for each condition. Then PostgreSQL can use a bitmap index scan on both indexes and a “bitmap or” to combine the results.