PostgreSQL – Form Groups of Consecutive Rows with Same Value

gaps-and-islands, group by, postgresql, postgresql-8.4, window functions

I have a situation that I think can be solved with a window function, but I'm not sure.

Imagine the following table:

CREATE TABLE tmp
  ( date timestamp,        
    id_type integer
  ) ;

INSERT INTO tmp 
    ( date, id_type )
VALUES
    ( '2017-01-10 07:19:21.0', 3 ),
    ( '2017-01-10 07:19:22.0', 3 ),
    ( '2017-01-10 07:19:23.1', 3 ),
    ( '2017-01-10 07:19:24.1', 3 ),
    ( '2017-01-10 07:19:25.0', 3 ),
    ( '2017-01-10 07:19:26.0', 5 ),
    ( '2017-01-10 07:19:27.1', 3 ),
    ( '2017-01-10 07:19:28.0', 5 ),
    ( '2017-01-10 07:19:29.0', 5 ),
    ( '2017-01-10 07:19:30.1', 3 ),
    ( '2017-01-10 07:19:31.0', 5 ),
    ( '2017-01-10 07:19:32.0', 3 ),
    ( '2017-01-10 07:19:33.1', 5 ),
    ( '2017-01-10 07:19:35.0', 5 ),
    ( '2017-01-10 07:19:36.1', 5 ),
    ( '2017-01-10 07:19:37.1', 5 )
  ;

I'd like to start a new group at each change in column id_type.
For example, the first group runs from 7:19:21 to 7:19:25, the second starts and ends at 7:19:26, and so on.
Once that works, I want to include more criteria to define groups.

At the moment, using the query below:

SELECT distinct 
    min(min(date)) over w as begin, 
    max(max(date)) over w as end,   
    id_type
from tmp
GROUP BY id_type
WINDOW w as (PARTITION BY id_type)
order by  begin;

I get the following result:

begin                   end                     id_type
2017-01-10 07:19:21.0   2017-01-10 07:19:32.0   3
2017-01-10 07:19:26.0   2017-01-10 07:19:37.1   5

While I'd like:

begin                   end                     id_type
2017-01-10 07:19:21.0   2017-01-10 07:19:25.0   3
2017-01-10 07:19:26.0   2017-01-10 07:19:26.0   5
2017-01-10 07:19:27.1   2017-01-10 07:19:27.1   3
2017-01-10 07:19:28.0   2017-01-10 07:19:29.0   5
2017-01-10 07:19:30.1   2017-01-10 07:19:30.1   3
2017-01-10 07:19:31.0   2017-01-10 07:19:31.0   5
2017-01-10 07:19:32.0   2017-01-10 07:19:32.0   3
2017-01-10 07:19:33.1   2017-01-10 07:19:37.1   5

After I solve this first step, I'll add more columns to use as rules to break groups, and these others will be nullable.

Postgres version: 8.4. (We run Postgres with PostGIS, so upgrading is not easy: PostGIS functions change names and there are other problems. Fortunately, we are already rewriting everything, and the new version will use a newer 9.x release with PostGIS 2.x.)

Best Answer

A few points first:

Don't call a non-temporary table tmp; that just gets confusing.
Don't use text for timestamps (you're doing that in your example; we can tell because the timestamp didn't get truncated and has .0).
Don't call a field that has time in it date. If it has date and time, it's a timestamp (and store it as one).

Better to use a window function:

SELECT id_type, grp, min(date), max(date)
FROM (
  SELECT date, id_type, count(is_reset) OVER (ORDER BY date) AS grp
  FROM (
    SELECT date, id_type, CASE WHEN lag(id_type) OVER (ORDER BY date) <> id_type THEN 1 END AS is_reset
    FROM tmp
  ) AS t
) AS g
GROUP BY id_type, grp
ORDER BY min(date);

Outputs

 id_type | grp |          min          |          max          
---------+-----+-----------------------+-----------------------
       3 |   0 | 2017-01-10 07:19:21.0 | 2017-01-10 07:19:25.0
       5 |   1 | 2017-01-10 07:19:26.0 | 2017-01-10 07:19:26.0
       3 |   2 | 2017-01-10 07:19:27.1 | 2017-01-10 07:19:27.1
       5 |   3 | 2017-01-10 07:19:28.0 | 2017-01-10 07:19:29.0
       3 |   4 | 2017-01-10 07:19:30.1 | 2017-01-10 07:19:30.1
       5 |   5 | 2017-01-10 07:19:31.0 | 2017-01-10 07:19:31.0
       3 |   6 | 2017-01-10 07:19:32.0 | 2017-01-10 07:19:32.0
       5 |   7 | 2017-01-10 07:19:33.1 | 2017-01-10 07:19:37.1
(8 rows)

Explanation

First we need resets. We generate them with lag():

SELECT date, id_type, CASE WHEN lag(id_type) OVER (ORDER BY date) <> id_type THEN 1 END AS is_reset
FROM tmp
ORDER BY date;

         date          | id_type | is_reset 
-----------------------+---------+----------
 2017-01-10 07:19:21.0 |       3 |         
 2017-01-10 07:19:22.0 |       3 |         
 2017-01-10 07:19:23.1 |       3 |         
 2017-01-10 07:19:24.1 |       3 |         
 2017-01-10 07:19:25.0 |       3 |         
 2017-01-10 07:19:26.0 |       5 |        1
 2017-01-10 07:19:27.1 |       3 |        1
 2017-01-10 07:19:28.0 |       5 |        1
 2017-01-10 07:19:29.0 |       5 |         
 2017-01-10 07:19:30.1 |       3 |        1
 2017-01-10 07:19:31.0 |       5 |        1
 2017-01-10 07:19:32.0 |       3 |        1
 2017-01-10 07:19:33.1 |       5 |        1
 2017-01-10 07:19:35.0 |       5 |         
 2017-01-10 07:19:36.1 |       5 |         
 2017-01-10 07:19:37.1 |       5 |         
(16 rows)
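
The question mentions that further, nullable columns will later break groups as well. Below is a minimal sketch of how the reset test might be extended, assuming a hypothetical nullable column other_col that is not part of the original table. A plain <> yields NULL when either side is NULL, so a change to or from NULL would be missed, whereas IS DISTINCT FROM treats NULL as a comparable value.

```sql
-- Sketch only: other_col is a hypothetical nullable column, added here
-- for illustration (e.g. ALTER TABLE tmp ADD COLUMN other_col integer;).
SELECT date, id_type, other_col,
       CASE
         WHEN (lag(id_type)   OVER w) IS DISTINCT FROM id_type
           OR (lag(other_col) OVER w) IS DISTINCT FROM other_col
         THEN 1
       END AS is_reset
FROM tmp
WINDOW w AS (ORDER BY date)   -- one named window shared by both lag() calls
ORDER BY date;
```

With this form the very first row of the sample data is flagged too, because its lag(id_type) is NULL and that is distinct from 3. The running count would then start at 1 rather than 0, which does not change which rows end up grouped together.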

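Then we count to get groups. count() ignores NULLs, and with an ORDER BY in the window it is a running count, so grp goes up by one at each reset row and stays constant in between.

SELECT date, id_type, count(is_reset) OVER (ORDER BY date) AS grp
FROM (
  SELECT date, id_type, CASE WHEN lag(id_type) OVER (ORDER BY date) <> id_type THEN 1 END AS is_reset
  FROM tmp
) AS t
ORDER BY date;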

         date          | id_type | grp 
-----------------------+---------+-----
 2017-01-10 07:19:21.0 |       3 |   0
 2017-01-10 07:19:22.0 |       3 |   0
 2017-01-10 07:19:23.1 |       3 |   0
 2017-01-10 07:19:24.1 |       3 |   0
 2017-01-10 07:19:25.0 |       3 |   0
 2017-01-10 07:19:26.0 |       5 |   1
 2017-01-10 07:19:27.1 |       3 |   2
 2017-01-10 07:19:28.0 |       5 |   3
 2017-01-10 07:19:29.0 |       5 |   3
 2017-01-10 07:19:30.1 |       3 |   4
 2017-01-10 07:19:31.0 |       5 |   5
 2017-01-10 07:19:32.0 |       3 |   6
 2017-01-10 07:19:33.1 |       5 |   7
 2017-01-10 07:19:35.0 |       5 |   7
 2017-01-10 07:19:36.1 |       5 |   7
 2017-01-10 07:19:37.1 |       5 |   7
(16 rows)

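Then we wrap that in a subselect, GROUP BY id_type and grp, select min and max to get each group's range, and ORDER BY the start of the range.

SELECT id_type, grp, min(date), max(date)
FROM (
  -- the grp query from the previous step goes here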
) AS g
GROUP BY id_type, grp
ORDER BY min(date);
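
As a cross-check, the same islands can also be obtained with a different, commonly used technique: the difference of two row_number() values. This is not the method used above, only an alternative sketch. It assumes the date values are unique, so that the ordering is deterministic. Within a run of consecutive rows that share an id_type, both row numbers advance together, so their difference stays constant.

```sql
-- Alternative sketch (row_number difference), not the answer's method.
-- begin_ts / end_ts are illustrative column aliases.
SELECT id_type, min(date) AS begin_ts, max(date) AS end_ts
FROM (
  SELECT date, id_type,
         row_number() OVER (ORDER BY date)                       -- position overall
       - row_number() OVER (PARTITION BY id_type ORDER BY date)  -- position within its id_type
         AS grp
  FROM tmp
) AS g
GROUP BY id_type, grp   -- grp is only meaningful together with id_type
ORDER BY min(date);
```

On the sample data this should return the same eight ranges as the query above. The lag() approach is probably easier to extend once more columns take part in defining a group, since the reset condition is written out explicitly.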

Related Solutions

Using the Correct PostgreSQL Server

Most likely, you have both PostgreSQL 8.4 and PostgreSQL 9.2 installed on this server.

Prior to pg 9.0, the RPM packages for PostgreSQL were configured such that you could only have one version installed at a time. Upgrading the RPM replaced the old version with the new.

From pg 9.0, the RPM packages have been configured to allow installation of multiple versions side by side. This is to facilitate in-place upgrade of databases (which requires both versions installed during the conversion). The packages are named postgresql92, postgresql93, etc.

Also note that it would not be possible for both versions to run simultaneously on the same port.

My guess is that your server was rebooted, and when it came up, either they were both configured to start and the 8.4 version started first, or perhaps the 9.2 version is not configured to start at boot at all.

You can confirm this with:

yum list installed "postgres*"

to see that you have both versions installed. You can check which versions are configured to start at boot with:

chkconfig --list | grep postgres

To stop the 8.4 version and start the 9.2 version, so that you can access your data:

service postgresql stop
service postgresql-9.2 start

To make sure that 9.2 starts up next time you boot, and 8.4 does not:

chkconfig postgresql-9.2 on
chkconfig postgresql off

All of the above commands must be run as root.

If you do not specifically need PostgreSQL 8.4 installed on your server, I would recommend removing it.
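
Once a server is running, it can be worth confirming from inside a session which one actually answered. A small check, run from psql or any SQL client, using only built-in commands:

```sql
-- Confirm which server this session is connected to.
SELECT version();      -- full version string of the running server
SHOW server_version;   -- just the version number
SHOW port;             -- port the server is listening on
```

If the reported version is still 8.4, the client is talking to the old server, and the service steps above are the place to look.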

PostgreSQL – How to Connect to Remote EC2 Database

You need to work your way up the networking stack to determine where the issue is.

Can you ping the destination from the source?

Can you telnet to the postgres port (5432 by default) from the source?

Can you connect to the postgres service with a management tool (pgAdmin or psql) from the source?
