Postgresql – Selecting the last record that meets a condition

postgresql

I have a simplified table like this:

id |val |
---|----|
1  |6   |
2  |5   |
3  |4   |
4  |4   |
5  |8   |
6  |9   |
7  |7   |

I want to select the records in a way that where val < 5 the last record where val >= 5 is selected. So the result should look like this:

id |val |
---|----|
1  |6   |
2  |5   |
3  |5   |
4  |5   |
5  |8   |
6  |9   |
7  |7   |

I was wondering if this is possible without defining a function, so I tried:

SELECT
CASE
    WHEN val < 5 THEN LAG(id) OVER (ORDER BY id)
    WHEN val >= 5 THEN id
END id,
CASE
    WHEN val < 5 THEN LAG(val) OVER (ORDER BY id)
    WHEN val >= 5 THEN val
END val
FROM test;

but it only returns:

id |val |
---|----|
1  |6   |
2  |5   |
3  |5   |
4  |4   |
5  |8   |
6  |9   |
7  |7   |

Is is possible to write this query without defining a function? Or would it be better with a function?

EDIT: A combined solution

For the sake of completeness, although it slightly differs from the original problem description, I include that if the answers from @Andriy M and @Hogan are combined then NULL values will be preserved and only values below the threshold will be updated with the last valid value. In some cases it is desirable to keep the NULL values. Thus the query would be like this:

SELECT
    id,
    val,
CASE WHEN val < 5 THEN (SELECT sub.val
                        FROM test AS sub
                        WHERE sub.id <= main.id
                            AND sub.val >= 5
                        ORDER BY id DESC
                        LIMIT 1) 
ELSE val END AS newval
FROM test AS main;

Which returns:

id |val |newval |
---|----|-------|
1  |6   |6      |
2  |5   |5      |
3  |4   |5      |
4  |NULL|NULL   |
5  |8   |8      |
6  |9   |9      |
7  |7   |7      |

Best Answer

I think LAG() cannot be used here, because with LAG you need to be specific about how many rows back you want to go. (It is 1 by default, but you can specify 3, 10 or any other number. The point, however, is that it must be a specific number.) In your situation, you do not know if the last matching value was on the previous row or on the row before it or even earlier.

So, a different approach would be to find the ID of the last row with the matching value, then look that ID up to get the value for the final output – something like this:

SELECT
  s.id,
  t.val
FROM
  (
    SELECT
      id,
      MAX(CASE WHEN val >=5 THEN id END) OVER (ORDER BY id ASC) AS last_id
    FROM
      test
  ) AS s
  INNER JOIN test AS t ON s.last_id = t.id
ORDER BY
  s.id ASC
;

Or you could use a correlated subquery to get the last value that is more than 5 in the subset from the lowest ID to the current ID:

SELECT
  id,
  (
    SELECT
      sub.val
    FROM
      test AS sub
    WHERE
      sub.id <= main.id
      AND sub.val >= 5
    ORDER BY
      id DESC
    LIMIT
      1
  ) AS val
FROM
  test AS main
ORDER BY
  id ASC
;

This would be similar to LAG() but more flexible (and likely less efficient) – a LAG() with a tweak, if you like.