Having issue with window function COUNT counting duplicates

redshiftwindow functions

I have a query like so:

select 
   distinct(line_items.email) as email,
   count(orders.order_id) over(partition by orders.order_id) as num_orders 
from 
   line_items 
   inner join orders on 
   line_items.order_id=orders.order_id

This is apart of a broader query where I need to be able to select from line_items vs. orders.

The problem here is that:

  • A user has multiple line items on a specific order (bought 2 different products)
  • The window function sees two rows and is still counting twice

I can't do count distinct because Redshift doesn't support count distinct as window function.

The records looks something like this:

line_item_id,order_id,email
123,1,bob@some.tld
124,1,bob@some.tld

The result of the above query looks like this:

email,num_orders
bob@some.tld,2

Is there a way I can get this to be:

email,num_orders
bob@some.tld,1

While retaining use of the window function?

Best Answer

It looks like you are trying to return the count of orders associated with an e-mail id? In which case I think you can change the partition to

count(orders.order_id) over(partition by line_items.email) as num_orders 

Rather then using a window function, you could also just do a group by

select t.email,
          count(*) as num_orders
     from (
  select line_items.email,
         orders.order_id
    from orders
    join line_items
      on orders.order_id = line_items.order_id
    group by line_items.email,
              orders.order_id
    ) t
    group by t.email