SQLite – Efficient Way to Get Last 10 Rows of Each Group Using CTE

performancequery-performanceselectsqlite

Consider there is a table of job runs history with the following schema:

job_runs
(
    run_id integer not null, -- identifier of the run
    job_id integer not null, -- identifier of the job
    run_number integer not null, -- job run number, run numbers increment for each job
    status text not null, -- status of the run (running, completed, killed, ...)
    primary key (run_id)
    -- ...
)

and it is required to get the last 10 runs with status != 'running' for each job (jobs differ by job_id). To do that I wrote the following query:

SELECT
    *
FROM
    job_runs AS JR1
WHERE
    JR1.run_number IN
    (
        SELECT
            JR2.run_number
        FROM
            job_runs AS JR2
        WHERE
            JR2.job_id = JR1.job_id
            AND
            JR2.status != 'running'
        ORDER BY
            JR2.run_number
        DESC
        LIMIT
            10
    )

It do what I need, but even though there is a multifield index on the job_id and run_num fields of the job_runs table the query is slow, because it scans job_runs table and for each its row runs subquery. The index helps subqueries to run fast each time, but the fact that the nester query scans entire table kills performance. So how can I tune performance of the query?

some thoughts:

  • Number of jobs (different job_ids) is small and if there were a FOR loop in SQLite it would be easy to loop over all distinct job_ids and run the subquery
    passing the job id instead of JR1.job_id then UNION all results.
  • Also I've tried to construct the query using recursive CTE, but unsuccessfully.

important:

Please don't suggest to run the loop inside the source code of my application. I need pure SQL solution.

Best Answer

Since the number of distinct job_id is small, you can try this way of writing this type of queries. I've often used in MySQL that doesn't have window functions - and you can't even have a subquery with a IN (SELECT ... LIMIT x). Try this with an index on (job_id, run_number, status):

SELECT
    jr.*
FROM
    ( SELECT DISTINCT job_id
      FROM job_runs
    ) AS jrd
  JOIN
    job_runs AS jr
    ON  jr.job_id = jrd.job_id
    AND jr.status <> 'running'
    AND jr.run_number >= COALESCE(
        (
            SELECT jri.run_number
            FROM job_runs AS jri
            WHERE jri.job_id = jrd.job_id
              AND jri.status <> 'running'
            ORDER BY jri.run_number DESC
            LIMIT 1 OFFSET 9
        ), -2147483647) 
    ;

and this (with same index):

SELECT
    jr.*
FROM
    ( SELECT DISTINCT job_id
      FROM job_runs
    ) AS jrd
  JOIN
    job_runs AS jr
    ON  jr.run_id IN
        (
            SELECT jri.run_id
            FROM job_runs AS jri
            WHERE jri.job_id = jrd.job_id
              AND jri.status <> 'running'
            ORDER BY jri.run_number DESC
            LIMIT 10
        )
    ;