Postgresql – CASE statement not returning expected results

aggregateaggregate-filtercasejoin;postgresql

First time using Postgres and building a query with a CASE statement. I am trying to end up with a collection of data whereby one of the columns in the data (status) qualifies the row based on conditions.

I have 3 tables in a Postgres DB as follows:

Table devices:

id serial_number retired last_reported_utc_at other_data
2 1234 FALSE 2020-12-01 15:34:23
5 4567 FALSE 2020-12-01 15:34:23
6 NULL FALSE NULL

Table device_measurements:

id device_id reading read_utc_at
1 5 1.36 2020-12-01 15:34:23
2 2 2.45 2020-12-12 18:21:12
3 6 1.87 2020-12-03 20:29:12

Table devices_meters:

id device_id meter_id reading activated_utc_at deactivated_utc_at
1 5 23 1.36 2018-10-01 15:34:23 NULL
2 2 17 2.45 2019-06-12 18:21:12 NULL
3 6 17 2.45 2019-06-12 18:21:12 NULL

devices_meters links the device to a meter location. The activated and deactivated dates will sequence the history of the device to meters.

The status for each device can be one of:

active, online, offline, inactive, retired 

I have been trying to generate a column status with a CASE statement to flag each row as one of the above. The query runs, but the resulting data is not as expected.

My CASE statement is:

   CASE
        // Device retired boolean column flag set to true
        WHEN devices.retired = true
            THEN 'retired'

        // Device with null serial number
        WHEN devices.retired = false
            AND devices.serial_number IS NULL
            THEN 'inactive'

        // Device has not reported back last 24 hrs
        WHEN
            devices.retired = false
            AND MAX(device_measurements.reported_utc_at) < :yesterdayFilter
            THEN 'offline'

        // Device has reported back last 24hrs and has non zero readings for data in this period
        WHEN
            devices.retired = false
            AND MAX(device_measurements.reported_utc_at) BETWEEN :yesterdayFilter AND :nowFilter
            AND SUM(device_measurements.flow) > 0
            THEN 'active'

        // Device has reported back last 24hrs and has all zero readings for data in this period
        WHEN
            devices.retired = false
            AND MAX(device_measurements.reported_utc_at) BETWEEN :yesterdayFilter AND :nowFilter
            AND SUM(device_measurements.flow) = 0
            THEN 'online'

    END AS status,

:nowFilter contains a date of now() in UTC.

:yesterdayFilter contains a date of now()->subHours(24).

The above does not fail but returns unpredictable results, such an offline device which has had a data report less than 24 hrs ago.

The whole query is:

select
    "devices".*,
    CASE
        WHEN devices.retired = true THEN 'retired'
        WHEN devices.retired = false
        AND devices.serial_number IS NULL THEN 'inactive'
        WHEN devices.retired = false
        AND MAX(device_measurements.reported_utc_at) BETWEEN :yesterdayFilter AND :nowFilter
        AND SUM(device_measurements.flow) > 0 THEN 'active'
        WHEN devices.retired = false
        AND MAX(device_measurements.reported_utc_at) BETWEEN :yesterdayFilter AND :nowFilter
        AND SUM(device_measurements.flow) = 0 THEN 'online'
        WHEN devices.retired = false
        AND MAX(device_measurements.reported_utc_at) < :yesterdayFilter THEN 'offline'
    END AS status,
    MAX(device_measurements.reported_utc_at) AS last_reported_utc_at
from
    "devices"
    left join "devices_meters" on "devices_meters"."device_id" = "devices"."id"
    left join "device_measurements" on "device_measurements"."device_id" = "devices"."id"
group by
    "devices"."id"
having
    (
        CASE
            WHEN devices.retired = true THEN 'retired'
            WHEN devices.retired = false
            AND devices.serial_number IS NULL THEN 'inactive'
            WHEN devices.retired = false
            AND MAX(device_measurements.reported_utc_at) BETWEEN :yesterdayFilter AND :nowFilter
            AND SUM(device_measurements.flow) > 0 THEN 'active'
            WHEN devices.retired = false
            AND MAX(device_measurements.reported_utc_at) BETWEEN :yesterdayFilter AND :nowFilter
            AND SUM(device_measurements.flow) = 0 THEN 'online'
            WHEN devices.retired = false
            AND MAX(device_measurements.reported_utc_at) < :yesterdayFilter THEN 'offline'
        END
    ) = :status
order by
    "devices"."installed_at" desc
limit
    10 offset 0;

Best Answer

Based on a couple of educated guesses, this might be the query you actually want:

SELECT *
FROM  (
   SELECT d.*
        , CASE
           WHEN d.retired               THEN 'retired'
           WHEN d.serial_number IS NULL THEN 'inactive'
           WHEN sum_flow > 0            THEN 'active'
           WHEN sum_flow = 0            THEN 'online'
           ELSE                              'offline'
          END AS status
        , m.last_reported_utc_at
   FROM   devices d
   LEFT   JOIN (
      SELECT device_id
           , max(reported_utc_at) AS last_reported_utc_at
           , sum(flow) FILTER (WHERE reported_utc_at BETWEEN :yesterdayFilter AND now() AT TIME ZONE 'UTC') AS sum_flow
      FROM   device_measurements
      GROUP  BY device_id
      ) m ON m.device_id = d.id
   ) sub
WHERE  status = :status
ORDER  BY installed_at DESC
LIMIT  10
OFFSET 0;

Major points

CASE statements are processed from left to right. Assuming retired is boolean NOT NULL, repeating devices.retired = false in every following WHEN clause is just noise. Consider the manual:

CASE clauses can be used wherever an expression is valid. Each condition is an expression that returns a boolean result. If the condition's result is true, the value of the CASE expression is the result that follows the condition, and the remainder of the CASE expression is not processed. If the condition's result is not true, any subsequent WHEN clauses are examined in the same manner. If no WHEN condition yields true, the value of the CASE expression is the result of the ELSE clause. If the ELSE clause is omitted and no condition is true, the result is null.

You write:

:nowFilter contains a date of now() in UTC

So it's not a variable but a constant now() AT TIME ZONE 'UTC'.
Consider operating with timestamp with timezone instead. See:

:yesterdayFilter contains a date of now()->subHours(24)

So a timestamp that is less than 24h earlier than :nowFilter?

I removed the join to devices_meters completely, as it plays no role in the query.

I moved the aggregation into subquery. That's cleaner and cheaper for the purpose. Consider:

You seem to want the sum of flow for the given time interval, not the sum total of all rows. I achieve that with an aggregate FILTER clause. See:

But you seem to want the latest reported_utc_at overall, so no filter there.

I wrapped it all in a subquery and added an outer WHERE clause to replace the much more verbose HAVING clause. ORDER BY, LIMIT, and OFFSET after that, to get 10 rows for LIMIT 10.