PostgreSQL – Getting both Having and Not Having in the same row

aggregate-filter, optimization, postgresql

I have a query that returns the number of customers who have more than one order.
However, I would also like to get the number of customers who have exactly one order, and then divide the two: customers with one order / customers with more than one order.

The result should have 3 columns:

one_order, more_than_one_order, divided

I tried different approaches, such as duplicating the query with = 1, but that does not seem like the proper way.

Query

SELECT COUNT(*)
FROM (
select customer_id from orders GROUP BY customer_id HAVING COUNT(order_id)>1
) A

Best Answer

You can use a case expression like:

SELECT count(case when cnt = 1 then 1 end) as one_order
     , count(case when cnt > 1 then 1 end) as more_than_one_order
FROM (
    select customer_id, count(1) as cnt from orders GROUP BY customer_id
) as T

You can also use a filter clause for the count aggregates:

SELECT count(1) filter (where cnt = 1) as one_order
     , count(1) filter (where cnt > 1) as more_than_one_order
FROM ...

For the third attribute you can either reuse the aggregates:

select count(1) filter (where cnt = 1) as ...
     , count(1) filter (where cnt > 1) as ...
     , count(1) filter (where cnt = 1)::numeric / nullif(count(1) filter (where cnt > 1), 0) as ...

or add another level of nesting:

select one_order
     , more_than_one_order
     -- cast avoids integer division; nullif avoids division by zero
     , one_order::numeric / nullif(more_than_one_order, 0) as divided
from (
    SELECT count(1) filter (where cnt = 1) as one_order
         , count(1) filter (where cnt > 1) as more_than_one_order
    FROM (
        select customer_id, count(1) as cnt 
        from orders 
        GROUP BY customer_id
    ) as T1
) as T2
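
To try the nested version end to end, here is a minimal sketch with made-up sample data. The temporary table and its rows are assumptions for illustration only; the column names follow the question. The cast to numeric matters because count() returns bigint, and dividing two integers in PostgreSQL truncates the result. The nullif() guard returns NULL instead of raising an error when no customer has more than one order.

```sql
-- Hypothetical sample data (temporary table, dropped at session end):
-- customers 1, 3 and 5 have one order each; customer 2 has three; customer 4 has two.
CREATE TEMP TABLE orders (
    order_id    int PRIMARY KEY,
    customer_id int NOT NULL
);

INSERT INTO orders (order_id, customer_id) VALUES
    (1, 1), (2, 2), (3, 2), (4, 2), (5, 3), (6, 4), (7, 4), (8, 5);

SELECT one_order
     , more_than_one_order
       -- numeric cast avoids integer division; nullif avoids division by zero
     , round(one_order::numeric / nullif(more_than_one_order, 0), 2) AS divided
FROM (
    SELECT count(*) FILTER (WHERE cnt = 1) AS one_order
         , count(*) FILTER (WHERE cnt > 1) AS more_than_one_order
    FROM (
        SELECT customer_id, count(*) AS cnt
        FROM orders
        GROUP BY customer_id
    ) AS t1
) AS t2;
```

With this sample data the query returns one row: one_order = 3, more_than_one_order = 2, divided = 1.50.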

Related Solutions

MySQL – How does the MySQL Query Optimizer react to a SELECT COUNT sentence

The two queries have a very big difference:

----- query 1
SELECT COUNT(*) 
FROM customers 
WHERE ID > 10000 
  AND country = 'US' ;

----- query 2
SELECT * 
FROM customers 
WHERE ID > 10000 
  AND country = 'US' ;

The second query returns all rows that match the WHERE conditions. The first one has an aggregate function (COUNT()) in the SELECT list, so it performs an aggregation: the rows that match the conditions are collapsed into one row, and only one number is returned, the number of matching rows.

So, for the first query, there is no sensible reason to have an ORDER BY; the result is one row only. Moreover, it should produce an error, because the rows that have been collapsed into one may have different values in the country and created_at columns. Which of those values should be used for the ordering (say, in a case where you had a GROUP BY and the result set had more than one row)?

You can test at SQL-Fiddle that SQL Server produces this error when you add ORDER BY country, created_at:

Column "customers.country" is invalid in the ORDER BY clause because it is not contained in either an aggregate function or the GROUP BY clause.

An error is produced in Postgres, too.
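
As a rough sketch of the difference in PostgreSQL: the table definition below is an assumption, built only from the column names mentioned in this answer. The rejected statement is left commented out so that the script runs as a whole.

```sql
-- Hypothetical table, using the column names mentioned in the text
CREATE TEMP TABLE customers (
    id         int PRIMARY KEY,
    country    text,
    created_at timestamp
);

-- Rejected by PostgreSQL: country and created_at are neither grouped
-- nor wrapped in an aggregate, so there is no single value to sort by.
-- SELECT count(*)
-- FROM customers
-- WHERE id > 10000 AND country = 'US'
-- ORDER BY country, created_at;

-- Accepted: the ORDER BY column is also a GROUP BY column,
-- so every result row has exactly one value for it.
SELECT country, count(*) AS customer_count
FROM customers
WHERE id > 10000
GROUP BY country
ORDER BY country;
```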

But even in MySQL, which may allow such non-standard syntax (adding an ORDER BY to the first query), the optimizer is smart enough not to take it into account for the execution plan. There is nothing to order. One row will be returned anyway. You can check that by viewing the execution plans with EXPLAIN. Simple test at SQL-Fiddle: Mysql-test

Oracle (version 11g2) seems to allow such nonsense too. You can see the execution plan here: Oracle-test. Not sure how the plan should be interpreted but it seems that Oracle at least knows that it's one row only so the "sorting" operation is not costly.

MySQL – How to profile a particularly slow or inefficient query

You didn't provide CREATE TABLE scripts, so it's hard to give specific advice; I'll try to describe a general approach.

If you want to speed up this query, first make sure you have indexes on the columns that are used in join conditions, used as filters (WHERE), or listed in GROUP BY or ORDER BY. In general, a good sign that an index is missing is when EXPLAIN shows NULL in the key and key_len columns and ALL in the type column (not always, though: if you select all rows from the table, or the table is small enough, a full scan is faster than an index seek plus lookup). Then you may want to tweak some indexes to make them covering for this query.
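
Since the original query and tables are not shown, the following is only a sketch of that workflow in MySQL. The table, column, and index names are all hypothetical.

```sql
-- Hypothetical schema; the tables from the original question are not shown
CREATE TABLE order_items (
    id          INT PRIMARY KEY,
    customer_id INT NOT NULL,
    created_at  DATETIME NOT NULL,
    amount      DECIMAL(10,2) NOT NULL
);

-- Step 1: inspect the plan. Without a usable index, the type column
-- typically shows ALL and the key column shows NULL (a full table scan).
EXPLAIN
SELECT customer_id, SUM(amount) AS total
FROM order_items
WHERE created_at >= '2020-01-01'
GROUP BY customer_id;

-- Step 2: index the filter column first, then add the other columns the
-- query reads, so that the index covers the query.
CREATE INDEX idx_items_created_customer_amount
    ON order_items (created_at, customer_id, amount);

-- Step 3: run the same EXPLAIN again and compare the type, key and Extra columns.
EXPLAIN
SELECT customer_id, SUM(amount) AS total
FROM order_items
WHERE created_at >= '2020-01-01'
GROUP BY customer_id;
```

Whether the optimizer actually picks the index depends on table size and data distribution, so compare the plans on realistic data rather than on an empty table.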

Side note: the query looks to me like a BI query, which is normally not executed against an OLTP database. When dealing with large volumes of data, it makes sense to build a few cubes based on data from the operational DB and query them instead of the original data. The nature of OLTP implies a high level of normalization and optimization for INSERT/UPDATE/DELETE, not for SELECTs (those are still possible, but the queries can be very long, unclear, and quite slow).
