I have the following table with ~2,000,000 rows:
CREATE TABLE TableA (idA integer PRIMARY KEY, idB TEXT UNIQUE)
From time to time, I need to process every row; the order does not matter.
So I process 200 rows at a time, wrapped in a transaction:
SELECT idA, idB FROM TableA ORDER BY rowid ASC LIMIT 200
But when I checked the query with EXPLAIN QUERY PLAN,
I noticed that it was doing a table scan:
0 0 0 SCAN TABLE TableA USING INTEGER PRIMARY KEY (~1000000 rows)
I am especially curious about the ~1000000 rows
It does not matter whether I select 1 row or 1000 rows; I still get a table scan with ~1 million rows.
Is this not the most efficient way of querying the table for 'any' data?
As I said, the order does not matter, the data has to be processed.
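For context, the batch processing described above can be sketched like this in Python's sqlite3 module. This is only an illustration of the setup, not the actual code; the table is shrunk to 500 rows, the per-row work is a placeholder, and `process_batches` is a hypothetical name. It resumes each batch from the last seen idA (the rowid) rather than using OFFSET, so a batch never re-reads earlier rows:

```python
import sqlite3

def process_batches(conn, batch_size=200):
    """Process every row of TableA in batches, ordered by rowid (idA).

    Resumes from the last rowid seen instead of using OFFSET, so each
    batch starts with an index seek rather than re-scanning from row 1."""
    last = 0
    processed = 0
    while True:
        rows = conn.execute(
            "SELECT idA, idB FROM TableA WHERE idA > ? "
            "ORDER BY idA LIMIT ?",
            (last, batch_size),
        ).fetchall()
        if not rows:
            break
        with conn:  # one transaction per batch
            for idA, idB in rows:
                processed += 1  # placeholder for the real per-row work
        last = rows[-1][0]
    return processed

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableA (idA integer PRIMARY KEY, idB TEXT UNIQUE)")
conn.executemany("INSERT INTO TableA VALUES (?, ?)",
                 [(i, f"b{i}") for i in range(1, 501)])
print(process_batches(conn))  # 500
```

Because idA is declared INTEGER PRIMARY KEY, it is an alias for the rowid, so `WHERE idA > ?` seeks directly into the table's B-tree.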
Edit: If I run the query with no ORDER BY:
SELECT * FROM TableA LIMIT 200
I get a fairly similar result:
SCAN TABLE TableA USING COVERING INDEX sqlite_autoindex_TableA_1 (~1000000 rows)
Best Answer
EXPLAIN QUERY PLAN shows only an estimate, and the LIMIT clause is ignored when doing this estimation. (And this number is so misleading that recent versions of SQLite do not show it.)
A plain

SELECT * FROM TableA LIMIT x

is the fastest way to get x rows from a table. SQLite computes result rows on demand, so it will not read the entire table but will stop scanning after x rows.

Table scans are likely to be inefficient when you are searching for a single row or a few rows, or when you have a filter that greatly reduces the number of rows to be returned. But when the query is going to return all rows anyway, it is not possible to reduce the number of rows that actually need to be read from disk, so the best way to reduce disk I/O is to avoid touching anything but the table itself. (In this case, a covering index can replace the table.)
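You can confirm that the estimate ignores LIMIT with a quick sketch using Python's sqlite3 module (the exact plan text varies between SQLite versions, but the plan is the same with and without LIMIT):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableA (idA integer PRIMARY KEY, idB TEXT UNIQUE)")

def plan(sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN for sql."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Identical plans: the planner's estimate simply ignores the LIMIT clause.
print(plan("SELECT idA, idB FROM TableA"))
print(plan("SELECT idA, idB FROM TableA LIMIT 200"))
```

Both calls print the same SCAN step; only at execution time does LIMIT stop the scan after 200 rows.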