I have an MS Access query that is based on a linked ODBC table (Oracle).
The DDL for the underlying Oracle table is:
create table road_insp
(
insp_id integer,
road_id integer,
insp_date date,
length number(10,2)
);
INSERT INTO road_insp (insp_id, road_id, insp_date, length) VALUES (1, 100, to_date('1/1/2017','MM/DD/YYYY'), 20);
INSERT INTO road_insp (insp_id, road_id, insp_date, length) VALUES (2, 101, to_date('2/1/2017','MM/DD/YYYY'), 40);
INSERT INTO road_insp (insp_id, road_id, insp_date, length) VALUES (3, 101, to_date('3/1/2017','MM/DD/YYYY'), 60);
INSERT INTO road_insp (insp_id, road_id, insp_date, length) VALUES (4, 102, to_date('4/1/2018','MM/DD/YYYY'), 80);
INSERT INTO road_insp (insp_id, road_id, insp_date, length) VALUES (5, 102, to_date('5/1/2018','MM/DD/YYYY'), 100);
INSERT INTO road_insp (insp_id, road_id, insp_date, length) VALUES (6, 102, to_date('5/1/2018','MM/DD/YYYY'), 120);
select * from road_insp;

   INSP_ID    ROAD_ID INSP_DAT     LENGTH
---------- ---------- -------- ----------
         1        100 17-01-01         20
         2        101 17-02-01         40
         3        101 17-03-01         60
         4        102 18-04-01         80
         5        102 18-05-01        100
         6        102 18-05-01        120
And the local MS Access Query is:
SELECT ri.*
FROM user1_road_insp AS ri
WHERE ri.insp_id = (
select
top 1 ri2.insp_id
from
user1_road_insp ri2
where
ri2.road_id = ri.road_id
and year(ri2.insp_date) between [Enter a START year:] and [Enter an END year:]
order by
ri2.insp_date desc,
ri2.length desc,
ri2.insp_id
);
The Problem:
The performance of the query is quite poor. Unlike the table in the sample DDL, the real table has about 10,000 records. Similar queries execute instantly, but this query takes several minutes to run.
I've checked the SQLOut.txt log, and the problem seems to be that it is executing way too many individual statements:
SQLExecDirect: SELECT "INSP_ID" ,"ROAD_ID" ,"INSP_DATE" ,"LENGTH" FROM "USER1"."ROAD_INSP" "ri"
SQLExecDirect: SELECT "INSP_ID" ,"ROAD_ID" ,"INSP_DATE" ,"LENGTH" FROM "USER1"."ROAD_INSP" "ri2" WHERE ({fn year("INSP_DATE" )}BETWEEN ? AND ? )
SQLExecDirect: SELECT "INSP_ID" ,"ROAD_ID" ,"INSP_DATE" ,"LENGTH" FROM "USER1"."ROAD_INSP" "ri2" WHERE ({fn year("INSP_DATE" )}BETWEEN ? AND ? )
SQLExecDirect: SELECT "INSP_ID" ,"ROAD_ID" ,"INSP_DATE" ,"LENGTH" FROM "USER1"."ROAD_INSP" "ri2" WHERE ({fn year("INSP_DATE" )}BETWEEN ? AND ? )
SQLExecDirect: SELECT "INSP_ID" ,"ROAD_ID" ,"INSP_DATE" ,"LENGTH" FROM "USER1"."ROAD_INSP" "ri2" WHERE ({fn year("INSP_DATE" )}BETWEEN ? AND ? )
SQLExecDirect: SELECT "INSP_ID" ,"ROAD_ID" ,"INSP_DATE" ,"LENGTH" FROM "USER1"."ROAD_INSP" "ri2" WHERE ({fn year("INSP_DATE" )}BETWEEN ? AND ? )
SQLExecDirect: SELECT "INSP_ID" ,"ROAD_ID" ,"INSP_DATE" ,"LENGTH" FROM "USER1"."ROAD_INSP" "ri2" WHERE ({fn year("INSP_DATE" )}BETWEEN ? AND ? )
Question:
Unlike my other local MS Access queries, this query uses the TOP predicate. MS Access seems to be attempting to translate TOP to SQL that the ODBC driver and/or Oracle can understand. However, it seems to be doing a very poor job of it.
How can I improve the performance of this query?
Best Answer
As Mike Walsh suggested, this is really an Access issue more than an ODBC problem. An ODBC driver should accept the same SQL dialect as the database it is designed to connect to. Access is evidently not translating the TOP predicate properly, so it instead re-executes the subquery for every row of the primary query. Such pitfalls are briefly mentioned in the "Moving Query Processing to the Server" section of the Microsoft documentation you linked to.
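One standard way to move the processing to the server is a pass-through query, whose SQL text Access hands to the ODBC driver verbatim instead of translating it. The following Oracle-dialect sketch is not from the original answer; it assumes the USER1.ROAD_INSP table from the DDL and hard-codes the year range, since pass-through queries cannot use Access parameter prompts:

```sql
-- Oracle SQL for an Access pass-through query (sketch).
-- ROW_NUMBER() reproduces the TOP 1-per-road ordering; the year
-- bounds (2017, 2018) are placeholders -- pass-through queries
-- cannot prompt for parameters, so splice real values in via VBA.
SELECT insp_id, road_id, insp_date, length
FROM (
    SELECT ri.*,
           ROW_NUMBER() OVER (
               PARTITION BY ri.road_id
               ORDER BY ri.insp_date DESC, ri.length DESC, ri.insp_id
           ) AS rn
    FROM user1.road_insp ri
    WHERE EXTRACT(YEAR FROM ri.insp_date) BETWEEN 2017 AND 2018
)
WHERE rn = 1;
```

With this approach the ODBC trace should show a single SQLExecDirect call instead of one per row.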
Update
I had previously recommended using the aggregate function First(), but I now rescind that advice for this case, where the selected column differs from the multiple sort fields. At least in Access, First() does not behave properly when sorting on multiple fields. See details at the end.
Notice that the WHERE condition is applied repeatedly in every subquery. This is necessary in Access because each aggregate query must draw from the same data using the same criteria, otherwise the results could be bogus. (Access does not support temporary tables or multiple statements in a single 'query', which could otherwise be used to pre-filter and reuse the same dataset.) The nested queries are necessary to match the original, because one must apply the min/max selection to each column individually to properly mimic the sort order.
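A sketch of that nested min/max shape, using the question's table and parameter prompts (an illustration only, not tested against the original data): the innermost subquery pins the latest insp_date, the next pins the greatest length among rows on that date, and Min(insp_id) breaks any remaining ties, mirroring the original ORDER BY:

```sql
SELECT ri.*
FROM user1_road_insp AS ri
WHERE ri.insp_id = (
    SELECT Min(ri2.insp_id)
    FROM user1_road_insp AS ri2
    WHERE ri2.road_id = ri.road_id
      AND Year(ri2.insp_date) BETWEEN [Enter a START year:] AND [Enter an END year:]
      AND ri2.insp_date = (
          SELECT Max(ri3.insp_date)
          FROM user1_road_insp AS ri3
          WHERE ri3.road_id = ri.road_id
            AND Year(ri3.insp_date) BETWEEN [Enter a START year:] AND [Enter an END year:]
      )
      AND ri2.length = (
          SELECT Max(ri4.length)
          FROM user1_road_insp AS ri4
          WHERE ri4.road_id = ri.road_id
            AND ri4.insp_date = ri2.insp_date
            AND Year(ri4.insp_date) BETWEEN [Enter a START year:] AND [Enter an END year:]
      )
);
```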
Should the outermost query also have the same selection criteria re-applied? Only the original poster can know that, but they should consider that, as written, the returned set could contain rows which contradict the subquery criteria.
Problems with First()
Access treats First() like any other aggregate function, so its use requires that all other fields in the Select or Order By clauses also be aggregated or grouped. Consider that First() is only useful when an Order By clause is specified (otherwise it returns an arbitrary value). But as soon as all of the Order By fields are included in the grouping, the query produces one row per unique combination of their values, effectively nullifying the aggregate behavior of First().
If the single sorted column is the same as the one inside First(), there is no problem. Likewise, it might behave sufficiently well if the sort order matches the indexed key columns of the underlying table. Otherwise, Access does not necessarily respect the sort order of subqueries (see https://stackoverflow.com/questions/13651394/what-order-is-used-by-first-function), so it is not sufficient to sort first and pipe the result into a statement containing First(): Access is technically allowed to process the input rows in any order. Essentially, Access makes it impossible to reliably order the rows to which First() or Last() should be applied for this question.
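As a hypothetical illustration (not from the original answer): attempting a per-road "first" row with First() forces the extra sort fields into the Group By, which reintroduces one row per unique (road_id, insp_date, length) combination rather than one per road:

```sql
SELECT ri.road_id, First(ri.insp_id) AS first_insp
FROM user1_road_insp AS ri
GROUP BY ri.road_id, ri.insp_date, ri.length
ORDER BY ri.road_id, ri.insp_date DESC, ri.length DESC;
```

Against the sample data every (road_id, insp_date, length) combination is unique, so this returns all six rows: First() collapses nothing, even though the intent was one row per road.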