Postgresql – Select on multiple tables with group by

aggregategroup byjoin;postgresqlwindow functions

Using Postgres 9.3, I have problem with a select over few tables.
I have simcards stock and I would like to get something like this:

| network | supplier | total topups (EUR) | 1st topup qty|   
| three   | three    | 500                | 34           |  
| three   | core     | 300                | 4            |  
| o2      | o2       | 100                | 3            |  

| network | supplier | total topups (EUR) | 2nd topup qty|   
| three   | three    | 200                | 4            |  

| network | supplier | total topups (EUR) | 3rd topup qty|   
| three   | three    | 200                | 4            |  

my table schemas are:

  • simcard(id, iccid(unique), network_id, supplier_id)
  • topup(id, date, amount, simcard_id)
  • network(id, name)
  • supplier(id, name)

at the moment I am able to select all first topups, but I don't know how to display them in a format as I would like (network| supplied by| topups sum| 1st topup qty)
Then I have no idea how to select 2nd, 3rd topup for every simcard. I tried with limit 1 offset 1 or offset 2 but it didn't work.

select 
    distinct topup.iccid    
    , s.created_date
    , n.name as "network"
    , p.name as "supplier"
from
(
select 
    s.iccid as "iccid"
    , t.date

from 
    simcard_topup as t
    join simcard_simcard as s
        on t.iccid_id = s.id
--where 
--  CAST(t.date AS DATE) = date '2015-03-13' - integer '14'

group by
    s.iccid
    , t.date
order by
    t.date asc
) as topup
join simcard_simcard as s
    on topup.iccid = s.iccid
join simcard_network as n
    on s.network_id = n.id
join simcard_supplier as p
    on s.supplier_id = p.id

Best Answer

SELECT st.topup_nr, n.name AS network, p.name AS supplier
     , st.topup_sum, st.topup_qty
FROM  (
   SELECT t.topup_nr, s.network_id, s.supplier_id
        , sum(t.amount) AS topup_sum
        , count(*) AS topup_qty
   FROM  (
      SELECT simcard_id, amount
           , row_number() OVER (PARTITION BY simcard_id, ORDER BY date) AS topup_nr
      FROM   simcard_topup
      ) t
   JOIN   simcard_simcard s ON s.id = t.simcard_id  -- or so I assume
   GROUP  BY 1,2,3
   ) st
JOIN   simcard_network  n ON n.id = st.network_id
JOIN   simcard_supplier p ON p.id = st.supplier_id
ORDER  BY st.topup_nr, st.topup_qty DESC NULLS LAST
                     , st.topup_sum DESC NULLS LAST;

Notes

  • The core feature is the window function row_number() in the inner subquery to number topups per simcard. This makes aggregating for 1st, 2nd etc. topup in the outer query simple.

  • For lack of information, assuming to join simcard_simcard to simcard_topup on s.id = t.simcard_id, though you also have on t.iccid_id = s.id in your code, which seems backwards ...

  • Don't use a basic type name like date as column name.

  • Why do you have CAST(t.date AS DATE) in your code? A date should be a date, so no need to cast it to date. Also, it seems we only need the date for ranking topups.

  • Counting per (network_id, supplier_id) and sorting the result by ... topup_qty DESC, topup_sum DESC, so the combination with the most topups comes first. NULLS LAST is just to be safe against NULL values.