To be specific, I have an event table that logs when a user joins or leaves a team. It looks something like the following:
-------------------------------------
| user | event | team | timestamp |
-------------------------------------
| A | joined | 1 | 2016-1-1 |
| B | joined | 1 | 2016-1-1 |
| C | left | 1 | 2016-1-1 |
| C | joined | 2 | 2016-1-1 |
| A | left | 1 | 2016-1-2 |
| A | joined | 2 | 2016-1-2 |
| B | left | 1 | 2016-1-3 |
| A | left | 2 | 2016-1-3 |
-------------------------------------
I need to restructure this in a view to look something like the following
--------------------------------------
| user | team | joined | left |
--------------------------------------
| A | 1 | 2016-1-1 | 2016-1-2 |
| A | 2 | 2016-1-2 | 2016-1-3 |
| B | 1 | 2016-1-1 | 2016-1-3 |
| C | 1 | null | 2016-1-1 |
| C | 2 | 2016-1-1 | null |
--------------------------------------
How can I achieve this?
For more detail, I'm attempting to do this in Amazon Redshift (PostgreSQL)
Best Answer
Assuming all columns
NOT NULL
. And 'left' is never earlier than the associated 'joined'.Simple case
If users can join teams only once (which would be enforced by a
UNIQUE
constraint on("user", team)
, ideally), then the solution is a simpleGROUP BY
and works in Redshift as well as in most any RDBMS:Note the
NULLS FIRST
clause. Seems like you want to sort an open start withjoined IS NULL
first. Redshift supports that, too.Apart from that it's the most basic form of a crosstab / pivot query.
Not so simple
Judging from your column names and example data it might not be so simple. If users can join teams more than once (non-overlapping), you have to do more. You wouldn't want to merge multiple team memberships into one row like in this related answer:
Instead you have to pair up adjacent 'joined' and 'left' rows somehow. There are many ways ...
Postgres 9.4+
For modern Postgres I like this best:
Using the aggregate
FILTER
clause in a window function as well as in an aggregate function. Related (with links to alternatives):This way we count how many times the same user joined the same team, so we can group adjacent rows. Works for a missing
'joined'
at the begin or a missing'left'
at the end as well.Redshift
... does not support the new
FILTER
clause. We can substitute with a plain oldCASE
:SQL Fiddle.
Aside: you shouldn't use reserved words as identifiers, even if it's permitted by Redshift (or Postgres).