PostgreSQL – Resolving Inaccurate Count with Multiple Count Functions

countpostgresql

PostgreSQL 8.0.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.4.2 20041017 (Red Hat 3.4.2-6.fc3), Redshift 1.0.890

I'm relatively new to SQL and am learning a lot of about writing queries as in my current work role. I encountered this problem that puzzles me. If I do a count of subscriptions that are paid with a null churn date, I get a count of 1279. This is the correct number. But once I add the count of customers that installed shopify, this count jumps to 1835. Why would this happen? Can anybody help me out? Thanks!

select 
    count (s.account_id)
  , count (case when c.type ilike 'shopifychannel' and s.status = 'paid' 
                then c.created_at else null end) as installed_shopify
from subscriptions s
join channels c on s.account_id = c.account_id
where s.status = 'paid' 
  and s.churned_at is null

Count s.id will count the total number of customers who have s.status = 'paid'
Count case when 'shopifychannel' will count the total number of customers who have paid and installed a shopify channel. These are the outputs I am attempting to get.

Best Answer

I suppose you have a problem with your channels table that may return more than one row per s.account_id. I propose this query :

SELECT 
    COUNT (DISTINCT s.account_id),
    COUNT (DISTINCT CONCAT(s.account_id, c.type)) AS installed_shopify
FROM subscriptions s
LEFT JOIN channels c ON s.account_id = c.account_id AND c.type ILIKE 'shopifychannel'
WHERE s.status = 'paid'
  AND s.churned_at IS NULL

Again, you must be sure that there is only (zero or one) row in channels for each account_id having the condition c.type ilike 'shopifychannel'.

The LEFT JOIN assure you don't miss a row in your primary table subscriptions. The condition c.type ILIKE 'shopifychannel' is in the LEFT JOIN because if you put it in the WHERE clause, you'll also lose subscriptions lines that don't have installed Shopify.