Use conditional counting:
select count(case when year <= 1945 then 1 end) as pre1945,
count(case when year between 1946 and 1964 then 1 end) as period2,
count(case when year between 1965 and 1974 then 1 end) as period3,
...
from ...
where ...;
This works because count()
ignores null values and the case
statement returns a null
for values outside of the range it tests for (an else null
is implicit).
With the upcoming 9.4 version you can re-write this as
select count(*) filter (where year <= 1945) as pre1945,
count(*) filter (where year between 1946 and 1964) as period2,
count(*) filter (where year between 1965 and 1974) as period3,
...
from ...
where ...;
Query
This query should be substantially faster in any case:
SELECT parent_id, message_id, posted_at, share_count
FROM messages
WHERE feed_id = 7
AND posted_at >= '2015-01-01 4:0:0'
AND posted_at < '2015-04-28 4:0:0'
AND parent_id IS NULL -- match index condition
UNION ALL
(
SELECT DISTINCT ON(parent_id)
parent_id, message_id, posted_at, share_count
FROM messages
WHERE feed_id = 7
AND posted_at >= '2015-01-01 4:0:0'
AND posted_at < '2015-04-28 4:0:0'
AND parent_id IS NOT NULL -- match index condition
ORDER BY parent_id, posted_at DESC NULLS LAST
)
ORDER BY share_count DESC NULLS LAST, posted_at DESC NULLS LAST;
The CTE does nothing here that a plain subquery could not deliver also. And a CTE introduces an optimization barrier since it is executed separately and its result is materialized.
You have one more subquery-level than you actually need.
The expression (COALESCE(parent_id, message_id)
is not compatible with a plain index, you would need an index on that expression. But that may not be very useful either, depending on data distribution. Follow my links below for detailed information.
Splitting the simple case of parent_id IS NULL
into a separate SELECT
may or may not deliver the optimum. Especially not, if that's a rare case anyway, in which case a combined query with an index on (COALESCE(parent_id, message_id)
may perform better. Other considerations apply ...
Indices
Especially when supported with these indices:
CREATE INDEX messages_idx_null ON messages (
feed_id
, posted_at DESC NULLS LAST
, share_count DESC NULLS LAST
, parent_id, message_id
)
WHERE parent_id IS NULL;
CREATE INDEX messages_idx_notnull ON messages (
feed_id
, posted_at DESC NULLS LAST
, share_count DESC NULLS LAST
, parent_id, message_id
)
WHERE parent_id IS NOT NULL;
The two partial indices cover the whole table together and are about the same size together as a single total index.
The last two columns parent_id, message_id
only make sense if you get index-only scans out of it. Else remove them from both indices.
SQL Fiddle.
Depending on missing details, DISTINCT ON
may or may not be the best query technique for the purpose. Reade detailed explanation here:
And possibly faster alternatives here:
Best Answer
Boolean logic - or Three valued logic
x NOT IN (1, 2, NULL)
is the same asNOT (x = 1 OR x = 2 OR x = NULL)
x <> 1 AND x <> 2 AND x <> NULL
true AND true AND unknown
**unknown
**false
in this case as it will not pass theWHERE
condition **Now, this is why folk use
EXISTS
+NOT EXISTS
rather thanIN
+NOT IN
. Also see The use of NOT logic in relation to indexes for more** Note:
unknown
is the same asfalse
at the end of an expression in aWHERE
condition.While the expression is being evaluated, then it is unknown
See @kgrittn's comment below for why