Your description results in a table definition like this:
CREATE TABLE tbl (
lap_id serial PRIMARY KEY
, lap_no int NOT NULL
, car_type enum NOT NULL
, race_id int NOT NULL -- REFERENCES ...
, UNIQUE(race_id, car_type, lap_no)
);
General solution for this class of problems
To get the longest sequence (1 result, longest of all, arbitrary pick if there are ties):
SELECT race_id, car_type, count(*) AS seq_len
FROM (
SELECT *, count(*) FILTER (WHERE step)
OVER (ORDER BY race_id, car_type, lap_no) AS grp
FROM (
SELECT *, (lag(lap_no) OVER (PARTITION BY race_id, car_type ORDER BY lap_no) + 1)
IS DISTINCT FROM lap_no AS step
FROM tbl
) x
) y
GROUP BY race_id, car_type, grp
ORDER BY seq_len DESC
LIMIT 1;
count(*) FILTER (WHERE step)
only counts TRUE
(= step to next group), which results in a new number for every new group.
Related question on SO, one answer featuring a procedural solution with plpgsql:
If the top requirement is performance, the plpgsql function is typically faster in this particular case because it can calculate the result in a single scan.
Faster for consecutive numbers
We can capitalize on the fact that consecutive lap_no
define a sequence, for a much simpler and faster version:
SELECT race_id, car_type, count(*) AS seq_len
FROM (
SELECT race_id, car_type
, row_number() OVER (PARTITION BY race_id, car_type ORDER BY lap_no) - lap_no AS grp
FROM tbl
) x
GROUP BY race_id, car_type, grp
ORDER BY seq_len DESC
LIMIT 1;
Consecutive laps end up in the same grp
. Every missing lap results in a lower grp
per partition.
This relies on (race_id, car_type, lap_no)
being UNIQUE NOT NULL
. NULL values or duplicates could break the logic.
Discussion of Jack's simpler alternative
@Jack's version effectively counts all laps (rows) where the previous lap_no
in this race_id
had the same car_type
. That's simpler and faster and correct - as long as each car_type
can only have one sequence per race_id
.
But for a task that simple the query could be simpler, yet. It would follow logically that all lap_no
per (car_type, race_id)
must be in sequence, and we could just count the laps:
SELECT race_id, car_type, count(*) AS seq_len
FROM tbl
GROUP BY race_id, car_type
ORDER BY seq_len DESC
LIMIT 1;
If, on the other hand, one car_type
can have multiple separate sequences per race_id (and the question does not specify otherwise), Jack's version will fail.
Faster for a given race / car type
In reply to the comment / clarifications in the question: restricting the query to one given (race_id, car_type)
will make it much faster, of course:
SELECT count(*) AS seq_len
FROM (
SELECT row_number() OVER (ORDER BY lap_no) - lap_no AS grp
FROM tbl
WHERE race_id = 1
AND car_type = 'red'
) x
GROUP BY grp
ORDER BY seq_len DESC
LIMIT 1;
db<>fiddle here
Old SQL Fiddle
Index
Key to top performance is a fitting index (except for the mentioned procedural solution working with a single sequential scan). A multicolumn index like this serves best:
CREATE INDEX tbl_mult_idx ON tbl (race_id, car_type, lap_no);
If your table has the UNIQUE
constraint I assumed at the top, that is implemented with just this (unique) index internally, and you do not need to create another index.
Best Answer
You can do something like this..
To query this, just search for
code = 0 AND count >= 4
.Explanation
First we generate a reset
Then we create groups,
Now you'll notice that the groups a fn(code,segment), reseting when either changes. Now we group by the
grp
, and do acount(*)
, but because you want the ids we run them througharray_agg
and store them as an array.Note, you say and the sequence of IDs is not broken. I'm working off the assumption that this is not what you meant. PostgreSQL doesn't have a concept of gapless sequences, so that's not a good idea anyway. I assume you wanted what I've done.