PostgreSQL Crosstab – How to Add Extra Columns

pivotpostgresql

I am using PostgreSQL 9.6 and have following schema

CREATE TABLE activitylog (
    sessionid text NOT NULL,
    studentid integer NOT NULL,
    activity text NOT NULL,
    created timestamp without time zone DEFAULT now() NOT NULL
);

The schema is used to log visitors activity. Each visit is assigned a session id which is unique for the visit. Sessions ends when user logs out. Every user can log in and log out multiple times.

INSERT INTO activitylog VALUES
  ( 'abc', 1, 'logged in'   , '2017-08-12T02:39:05.753088Z' ),
  ( 'abc', 1, 'page visited', '2017-08-12T02:49:05.753088Z' ),
  ( 'abc', 1, 'logged out'  , '2017-08-12T02:59:05.753088Z' ),
  ( 'def', 1, 'logged in'   , '2017-08-12T03:09:05.753088Z' ),
  ( 'def', 1, 'page visited', '2017-08-12T03:19:05.753088Z' ),
  ( 'def', 1, 'logged out'  , '2017-08-12T03:29:05.753088Z' ),
  ( 'ghi', 2, 'logged in'   , '2017-08-12T03:39:05.753088Z' ),
  ( 'ghi', 2, 'page visited', '2017-08-12T03:59:05.753088Z' ),
  ( 'ghi', 2, 'logged out'  , '2017-08-12T03:59:05.753088Z' );

Required Result

I want to select data from the table such that I get result in following format

|  sessionid  |  studentid  |          loggedin           |          loggedout          |      interval      |
| ----------- | ----------- |       --------------        |         -----------         | ------------------ |
|     abc     |      1      | 2017-08-12T02:39:05.753088Z | 2017-08-12T02:59:05.753088Z |       20min        |
|     def     |      1      | 2017-08-12T03:09:05.753088Z | 2017-08-12T03:29:05.753088Z |       20min        |
|     ghi     |      2      | 2017-08-12T03:39:05.753088Z | 2017-08-12T03:59:05.753088Z |       20min        |

Basically,

  1. Pivot table
  2. Add studentid column
  3. Calculate interval

What I have tried so far

I have successfuly piovoted the table

SELECT * FROM crosstab(
$$
select sessionid, activity, created from activitylog
where activity = 'logged in' OR activity = 'logged out'
ORDER  BY 1, 2
$$
)
AS sessioninfo (sessionid text, loggedin timestamp, loggedout   timestamp);

Sample Result

I tried to add studentid extra column using this query but it didn't give me desired result

SELECT * FROM crosstab(
$$
select sessionid, studentid, activity, created from
activitylog
where activity = 'logged in' OR activity = 'logged out'
ORDER  BY 1, 2
$$,
$$
VALUES (1), (0)
$$
)
AS sessioninfo (sessionid text, studentid INTEGER, 
loggedin timestamp, loggedout   timestamp);

But this places null in logged in and logged out timestamps.

Sample Result

To sum it up

  1. What is the purpose of VALUES with crosstab()? To be honest I am not completely sure what purpose second parameter of crosstab() serves and why in almost all examples it is VALUES.
  2. How do I add studentid to result?
  3. How do I calculate interval between two columns that are result of crosstab()?

Best Answer

What you want is something like this,

SELECT *
FROM crosstab(
        $$
                SELECT sessionid, studentid, activity, created
                FROM activitylog
                WHERE activity IN ('logged in', 'logged out')
                ORDER BY 1, 2
        $$,
        $$
                VALUES ('logged in'), ('logged out')
        $$
) AS sessioninfo (
        sessionid text,
        studentid int,
        loggedin timestamp,
        loggedout timestamp
)
CROSS JOIN LATERAL age(loggedout, loggedin);

Output:

 sessionid | studentid |          loggedin          |         loggedout          |   age    
-----------+-----------+----------------------------+----------------------------+----------
 abc       |         1 | 2017-08-12 02:39:05.753088 | 2017-08-12 02:59:05.753088 | 00:20:00
 def       |         1 | 2017-08-12 03:09:05.753088 | 2017-08-12 03:29:05.753088 | 00:20:00
 ghi       |         2 | 2017-08-12 03:39:05.753088 | 2017-08-12 03:59:05.753088 | 00:20:00
(3 rows)

What is the purpose of VALUES with crosstab()? To be honest I am not completely sure what purpose second parameter of crosstab() serves and why in almost all examples it is VALUES.

It specifies the categories. It can also be a query but here there is no need for that and being explicit was even more terse. You're interested in two categories those are loggedin, and loggedout. Age being a function of those two categories is calculated with CROSS JOIN LATERAL.

VALUES itself is a SQL expression. We could have written SELECT * FROM VALUES () AS t(category) but that's just more work.

How do I add studentid to result?

Just like in my demonstration.

How do I calculate interval between two columns that are result of crosstab()?

We used age. However one caveat about this we should hit right now..

Your schema is specified as timestamp without time zone. That's certainly a mistake. When that crosses DST, if ever, you're going to be in a world of pain. It'll show the time elapsed in the rollover.