PostgreSQL – Grouping Rows to Create New Group ID

postgresql

I am trying to achieve the result below but have not managed it so far; any help would be greatly appreciated.
I have this data (returned by a query sorted by id, anchor, date, and time) that I wish to group by common anchor:

id anchor date        time     'group' (the value to get)
3  2      2019-01-01  07:00     1
4  2      2019-01-01  08:00     1
5  3      2019-01-01  15:00     2
7  3      2019-01-01  16:00     2
10 3      2019-01-01  17:00     2

I'm looking for a Postgres query that selects this data and assigns a 'group number' to each set of rows with a common anchor.
I then need a query that sums the anchor values of the rows in each group. The example above would become:

anchor sum group
2      4   1
3      9   2

thanks!

EDIT: McNets' solution works perfectly.
I have another case, with the data below.
The anchor repeats, but only after a change of anchor: the rows are sorted by time, and the anchor is first 2, then 3, then 2 again.
In this case I need the rows after the change (ids 11 & 12) to get a new group number.

id anchor date        time     'group' (the value to get)
3  2      2019-01-01  07:00     1
4  2      2019-01-01  08:00     1
5  3      2019-01-01  15:00     2
7  3      2019-01-01  16:00     2
10 3      2019-01-01  17:00     2
11 2      2019-01-01  18:00     3
12 2      2019-01-01  19:00     3

Best Answer

Use DENSE_RANK function:

select
    anchor, sum(anchor) as "sum", grp  as "group"
from
    (select 
         anchor, dense_rank() over (order by anchor) as grp
     from
         t) t1
group by
    anchor, grp
order by
    grp;

anchor | sum | group
-----: | --: | ----:
     2 |   4 |     1
     3 |   9 |     2

db<>fiddle here

More examples about DENSE_RANK here.
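Note that DENSE_RANK numbers the distinct anchor values, not consecutive runs of rows, so it only matches the desired 'group' while each anchor occurs in a single run. A minimal sketch of this, run through Python's sqlite3 module as a stand-in for PostgreSQL (an assumption made only to keep the example self-contained; it needs SQLite 3.25 or later for window functions), using the rows from the edited question and assuming the date and time columns are named dt and tm:

```python
import sqlite3

# Rows from the edited question: (id, anchor, dt, tm).
rows = [
    (3, 2, "2019-01-01", "07:00"),
    (4, 2, "2019-01-01", "08:00"),
    (5, 3, "2019-01-01", "15:00"),
    (7, 3, "2019-01-01", "16:00"),
    (10, 3, "2019-01-01", "17:00"),
    (11, 2, "2019-01-01", "18:00"),
    (12, 2, "2019-01-01", "19:00"),
]

# In-memory SQLite database standing in for the PostgreSQL table t.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER, anchor INTEGER, dt TEXT, tm TEXT)")
con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)

# dense_rank() ranks by anchor value only; row order in time plays no role.
ranked = con.execute(
    """
    SELECT id, anchor, dense_rank() OVER (ORDER BY anchor) AS grp
    FROM   t
    ORDER  BY id
    """
).fetchall()

for row in ranked:
    print(row)
```

Ids 11 and 12 land in group 1 again instead of a new group 3, which is why the edited case needs a different approach.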

Even though it is preferable to post a new question when there is a significant change, here is an answer to your edited question.

select anchor,
       sum(anchor) as "sum",
       grp as "group"
from (
      select anchor, dt, tm,
             sum(rst) over (order by dt, tm) as grp
      from (
            select anchor, dt, tm,
                   case when coalesce(lag(anchor) 
                                      over (order by dt, tm), 0) <> anchor 
                        then 1 end as rst
            from   t
           ) t1
     ) t2
group by anchor, grp
order by grp;

anchor | sum | group
-----: | --: | ----:
     2 |   4 |     1
     3 |   9 |     2
     2 |   4 |     3

db<>fiddle here
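To see how the nested query arrives at the group number, it can help to look at the intermediate columns. The sketch below runs the two inner levels of the query through Python's sqlite3 module (a stand-in for PostgreSQL, assumed here only so the example is self-contained; SQLite 3.25 or later is needed for window functions). The id column is added to the output purely for readability.

```python
import sqlite3

# Rows from the edited question: (id, anchor, dt, tm).
rows = [
    (3, 2, "2019-01-01", "07:00"),
    (4, 2, "2019-01-01", "08:00"),
    (5, 3, "2019-01-01", "15:00"),
    (7, 3, "2019-01-01", "16:00"),
    (10, 3, "2019-01-01", "17:00"),
    (11, 2, "2019-01-01", "18:00"),
    (12, 2, "2019-01-01", "19:00"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER, anchor INTEGER, dt TEXT, tm TEXT)")
con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)

# Step 1 (inner query): rst is 1 on the first row of every run, NULL otherwise.
# Step 2 (outer query): the running sum of rst yields one number per run.
steps = con.execute(
    """
    SELECT id, anchor, rst,
           sum(rst) OVER (ORDER BY dt, tm) AS grp
    FROM  (
           SELECT id, anchor, dt, tm,
                  CASE WHEN coalesce(lag(anchor) OVER (ORDER BY dt, tm), 0) <> anchor
                       THEN 1 END AS rst
           FROM   t
          ) t1
    ORDER  BY dt, tm
    """
).fetchall()

for row in steps:
    print(row)  # (id, anchor, rst, grp); NULL shows up as None
```

One caveat worth keeping in mind: the coalesce(..., 0) default assumes that 0 is never a real anchor value. If the very first row had anchor 0, it would not be flagged as the start of a run.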

Related Solutions

PostgreSQL Query – Sum Column Prior to Date and Show All Entries After

WITH parametros AS (
   SELECT '2015-03-06'::timestamp AS fecha_desde  -- provide parameters here
        , '2015-03-12'::timestamp AS fecha_hasta
   )
SELECT tropa, max(fecha) AS fecha, sum(cantidad) AS sum_cantidad
     , sum(sum(cantidad)) OVER (PARTITION BY tropa
                                ORDER BY max(fecha)) AS saldo
FROM   movimiento_stock, parametros p
WHERE  fecha <= p.fecha_hasta
GROUP  BY tropa, CASE WHEN fecha >= p.fecha_desde THEN id END
ORDER  BY 1, 2;

The core feature is the CASE expression in the GROUP BY clause. Read the manual about CASE.
Older rows get NULL for id (the default of a CASE expression without an ELSE branch); id is NOT NULL in the underlying table, so collisions are not possible.

This groups all rows older than fecha_desde into one group per tropa and leaves newer rows ungrouped (form individual "groups"). It's unclear how to aggregate other columns, so I only included fecha.
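The grouping trick can be tried in isolation. Below is a minimal sketch using Python's sqlite3 module as a stand-in for PostgreSQL, with made-up sample rows (the data and the text-typed dates are assumptions for illustration, not the asker's table). The running total (saldo) is left out to keep the focus on the GROUP BY expression.

```python
import sqlite3

# Hypothetical sample data: (id, tropa, fecha, cantidad).
rows = [
    (1, 1, "2015-03-01", 10),
    (2, 1, "2015-03-03", 5),
    (3, 1, "2015-03-07", -4),
    (4, 1, "2015-03-10", 2),
    (5, 2, "2015-03-02", 7),
    (6, 2, "2015-03-08", 1),
    (7, 1, "2015-03-20", 100),  # later than fecha_hasta, filtered out
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE movimiento_stock "
    "(id INTEGER NOT NULL, tropa INTEGER, fecha TEXT, cantidad INTEGER)"
)
con.executemany("INSERT INTO movimiento_stock VALUES (?, ?, ?, ?)", rows)

# Rows before fecha_desde get NULL from the CASE and collapse into one group
# per tropa; later rows keep their unique id and stay one row each.
grouped = con.execute(
    """
    SELECT tropa, max(fecha) AS fecha, sum(cantidad) AS sum_cantidad
    FROM   movimiento_stock
    WHERE  fecha <= :hasta
    GROUP  BY tropa, CASE WHEN fecha >= :desde THEN id END
    ORDER  BY 1, 2
    """,
    {"desde": "2015-03-06", "hasta": "2015-03-12"},
).fetchall()

for row in grouped:
    print(row)
```

With this data, ids 1 and 2 of tropa 1 are merged into a single row with sum 15, while every row on or after 2015-03-06 is returned individually.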

The same, wrapped into an SQL function:

CREATE OR REPLACE FUNCTION informe_tldr(fecha_desde timestamp
                                      , fecha_hasta timestamp)
  RETURNS TABLE(tropa int, fecha timestamp, sum_cantidad int, saldo int) AS
$func$
SELECT m.tropa, max(m.fecha), sum(m.cantidad)::int
     , sum(sum(m.cantidad)) OVER (PARTITION BY m.tropa
                                  ORDER BY max(m.fecha))::int AS saldo
FROM   movimiento_stock m
WHERE  m.fecha <= fecha_hasta
GROUP  BY m.tropa, CASE WHEN m.fecha >= fecha_desde THEN m.id END
ORDER  BY 1, 2;
$func$ LANGUAGE sql;

SQL Fiddle.

PostgreSQL – How to Calculate Rolling Sum, Count, and Average Over Date Interval

The query you have

You could simplify your query using a WINDOW clause, but that's just shortening the syntax, not changing the query plan.

SELECT id, trans_ref_no, amount, trans_date, entity_id
     , SUM(amount) OVER w AS trans_total
     , COUNT(*)    OVER w AS trans_count
FROM   transactiondb
WINDOW w AS (PARTITION BY entity_id, date_trunc('month',trans_date)
             ORDER BY trans_date
             ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING);

This also uses the slightly faster count(*), which is equivalent as long as id is defined NOT NULL (presumably it is).
And you don't need to ORDER BY entity_id since you already PARTITION BY entity_id.

You can simplify further, though:
Don't add ORDER BY to the window definition at all, it's not relevant to your query. Then you don't need to define a custom window frame, either:

SELECT id, trans_ref_no, amount, trans_date, entity_id
     , SUM(amount) OVER w AS trans_total
     , COUNT(*)    OVER w AS trans_count
FROM   transactiondb
WINDOW w AS (PARTITION BY entity_id, date_trunc('month',trans_date));

Simpler, faster, but still just a better version of what you have, with static months.
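The reason ORDER BY matters here is the default window frame: with ORDER BY, the frame runs from the start of the partition up to the current row (and its peers), so the aggregate becomes a running total; without it, the frame is the whole partition. A small sketch of the difference, using Python's sqlite3 module as a stand-in for PostgreSQL and a made-up three-column table (both are assumptions for illustration; SQLite 3.25 or later is required):

```python
import sqlite3

# Hypothetical sample data: (id, entity_id, amount).
rows = [(1, 1, 10), (2, 1, 20), (3, 1, 30), (4, 2, 5)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tx (id INTEGER, entity_id INTEGER, amount INTEGER)")
con.executemany("INSERT INTO tx VALUES (?, ?, ?)", rows)

# total:   no ORDER BY, the frame is the whole partition.
# running: ORDER BY present, the default frame ends at the current row.
frames = con.execute(
    """
    SELECT id,
           sum(amount) OVER (PARTITION BY entity_id)             AS total,
           sum(amount) OVER (PARTITION BY entity_id ORDER BY id) AS running
    FROM   tx
    ORDER  BY id
    """
).fetchall()

for row in frames:
    print(row)
```

Only the last row of each partition has the same value in both columns, which is why the original query had to spell out ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING once ORDER BY was added.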

The query you might want

... is not clearly defined, so I'll build on these assumptions:

Count transactions and sum amounts for every 30-day period between the first and last transaction of each entity_id. Exclude leading and trailing periods without activity, but include all possible 30-day periods within those outer bounds.

SELECT entity_id, trans_date
     , COALESCE(sum(daily_amount) OVER w, 0) AS trans_total
     , COALESCE(sum(daily_count)  OVER w, 0) AS trans_count
FROM  (
   SELECT entity_id
        , generate_series (min(trans_date)::timestamp
                         , GREATEST(min(trans_date), max(trans_date) - 29)::timestamp
                         , interval '1 day')::date AS trans_date
   FROM   transactiondb 
   GROUP  BY 1
   ) x
LEFT JOIN (
   SELECT entity_id, trans_date
        , sum(amount) AS daily_amount, count(*) AS daily_count
   FROM   transactiondb
   GROUP  BY 1, 2
   ) t USING (entity_id, trans_date)
WINDOW w AS (PARTITION BY entity_id ORDER BY trans_date
             ROWS BETWEEN CURRENT ROW AND 29 FOLLOWING);

This lists all 30-day periods for each entity_id with your aggregates and with trans_date being the first day (incl.) of the period. To get values for each individual row join to the base table once more ...

The basic difficulty is the same as discussed here:

Referencing current row in FILTER clause of window function

The frame definition of a window cannot depend on values of the current row.

And rather call generate_series() with timestamp input:

Generating time series between two dates in PostgreSQL

The query you actually want

After question update and discussion:
Accumulate rows of the same entity_id in a 30-day window starting at each actual transaction.

Since your data is distributed sparsely, it should be more efficient to run a self-join with a range condition, all the more since Postgres 9.1 does not have LATERAL joins, yet:

SELECT t0.id, t0.amount, t0.trans_date, t0.entity_id
     , sum(t1.amount) AS trans_total, count(*) AS trans_count
FROM   transactiondb t0
JOIN   transactiondb t1 USING (entity_id)
WHERE  t1.trans_date >= t0.trans_date
AND    t1.trans_date <  t0.trans_date + 30  -- exclude upper bound
-- AND    t0.entity_id = 114284  -- or pick a single entity ...
GROUP  BY t0.id  -- is PK!
ORDER  BY t0.trans_date, t0.id;

SQL Fiddle.

A rolling window could only make sense (with respect to performance) with data for most days.

This does not aggregate duplicates on (trans_date, entity_id) per day, but all rows of the same day are always included in the 30-day window.
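The behavior of the range condition, in particular the excluded upper bound, can be checked on a few rows. This is a sketch using Python's sqlite3 module as a stand-in for PostgreSQL, with made-up data; since SQLite has no date arithmetic operator, date(t0.trans_date, '+30 days') replaces the PostgreSQL expression t0.trans_date + 30.

```python
import sqlite3

# Hypothetical sample data: (id, entity_id, trans_date, amount).
rows = [
    (1, 1, "2015-01-01", 10),
    (2, 1, "2015-01-15", 20),
    (3, 1, "2015-01-31", 30),  # exactly 30 days after id 1: outside its window
    (4, 1, "2015-03-01", 40),
    (5, 2, "2015-01-10", 5),
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE transactiondb "
    "(id INTEGER PRIMARY KEY, entity_id INTEGER, trans_date TEXT, amount INTEGER)"
)
con.executemany("INSERT INTO transactiondb VALUES (?, ?, ?, ?)", rows)

# Each row t0 is joined to all rows t1 of the same entity that fall into
# the half-open interval [t0.trans_date, t0.trans_date + 30 days).
windows = con.execute(
    """
    SELECT t0.id, sum(t1.amount) AS trans_total, count(*) AS trans_count
    FROM   transactiondb t0
    JOIN   transactiondb t1 USING (entity_id)
    WHERE  t1.trans_date >= t0.trans_date
    AND    t1.trans_date <  date(t0.trans_date, '+30 days')
    GROUP  BY t0.id
    ORDER  BY t0.trans_date, t0.id
    """
).fetchall()

for row in windows:
    print(row)
```

Every row always matches itself, so trans_count is at least 1 and no outer join is needed.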

For a big table, a covering index like this could help quite a bit:

CREATE INDEX transactiondb_foo_idx
ON transactiondb (entity_id, trans_date, amount);

The last column amount is only useful if you get index-only scans out of it. Else drop it.

But it's not going to be used while you select the whole table anyway. It would support queries for a small subset.