There are 2 tables:
CREATE TABLE sample1 (
id int,
name varchar,
score int,
mode int
)
CREATE TABLE sample2 (
id int,
mode int
...
)
I need to select distinct name
with MIN(score)
and ROW_NUMBER()
. Must sort by ROW_NUMBER() ASC
. The mode
must exist in table sample2
for a row from sample1
, so there is INNER JOIN
.
ROW_NUMBER()
is needed for sequential LIMIT queries so I can select data in batches. I don't care about consistency of that LIMIT+ROW_NUMBER() trick.
My query:
WITH sample1 AS (
WITH sample1 AS (
SELECT a.name, MIN(a.score) s
FROM sample1 a
JOIN sample2 USING (mode)
GROUP BY a.name
)
SELECT *, ROW_NUMBER() OVER (ORDER BY s ASC) AS n
FROM sample1
) SELECT * FROM sample1 WHERE n > $1 LIMIT $2
I am concerned about efficiency of that select, in fact it's 3 level deep subquery. Perhaps there are less ugly and more efficient ways to achieve same result?
The number of rows in this table will be approximately 1,000,000 and distinct set is expected to be 100,000. Queries are frequent, actually it's a hot table for select, only score column will be updated in batches of the same size as limit select.
Best Answer
Would seem like you do not need any CTE, subquery or even
JOIN
at all. Just anEXISTS
semi-join and an additionalOFFSET
:A large
OFFSET
is always costly, as all the leading, discarded rows have to be considered in the ordering first.If it's a read-only table, you could add a
MATERIALIZED VIEW
with an index on the row number and query withSELECT * FROM mv WHERE n > $1 AND n <= $1 + $2
.Would help most for big
$1
and small$2
.