I have an MS Access query that is based on a linked ODBC table (Oracle).
The DDL for the underlying Oracle table is:
create table road_insp
(
insp_id integer,
road_id integer,
insp_date date,
length number(10,2)
);
INSERT INTO road_insp (insp_id, road_id, insp_date, length) VALUES (1, 100, to_date('1/1/2017','MM/DD/YYYY'), 20);
INSERT INTO road_insp (insp_id, road_id, insp_date, length) VALUES (2, 101, to_date('2/1/2017','MM/DD/YYYY'), 40);
INSERT INTO road_insp (insp_id, road_id, insp_date, length) VALUES (3, 101, to_date('3/1/2017','MM/DD/YYYY'), 60);
INSERT INTO road_insp (insp_id, road_id, insp_date, length) VALUES (4, 102, to_date('4/1/2018','MM/DD/YYYY'), 80);
INSERT INTO road_insp (insp_id, road_id, insp_date, length) VALUES (5, 102, to_date('5/1/2018','MM/DD/YYYY'), 100);
INSERT INTO road_insp (insp_id, road_id, insp_date, length) VALUES (6, 102, to_date('5/1/2018','MM/DD/YYYY'), 120);
select * from road_insp;

   INSP_ID    ROAD_ID INSP_DAT     LENGTH
---------- ---------- -------- ----------
         1        100 17-01-01         20
         2        101 17-02-01         40
         3        101 17-03-01         60
         4        102 18-04-01         80
         5        102 18-05-01        100
         6        102 18-05-01        120
And the local MS Access Query is:
SELECT ri.*
FROM user1_road_insp AS ri
WHERE ri.insp_id = (
select
top 1 ri2.insp_id
from
user1_road_insp ri2
where
ri2.road_id = ri.road_id
and year(ri2.insp_date) between [Enter a START year:] and [Enter an END year:]
order by
ri2.insp_date desc,
ri2.length desc,
ri2.insp_id
);
The Problem:
The performance of the query is quite poor. Unlike the table in the sample DDL, the real table has about 10,000 records. Similar queries execute instantly, but this query takes several minutes to run.
I've checked the SQLOut.txt log, and the problem seems to be that it is executing way too many individual statements:
SQLExecDirect: SELECT "INSP_ID" ,"ROAD_ID" ,"INSP_DATE" ,"LENGTH" FROM "USER1"."ROAD_INSP" "ri"
SQLExecDirect: SELECT "INSP_ID" ,"ROAD_ID" ,"INSP_DATE" ,"LENGTH" FROM "USER1"."ROAD_INSP" "ri2" WHERE ({fn year("INSP_DATE" )}BETWEEN ? AND ? )
SQLExecDirect: SELECT "INSP_ID" ,"ROAD_ID" ,"INSP_DATE" ,"LENGTH" FROM "USER1"."ROAD_INSP" "ri2" WHERE ({fn year("INSP_DATE" )}BETWEEN ? AND ? )
SQLExecDirect: SELECT "INSP_ID" ,"ROAD_ID" ,"INSP_DATE" ,"LENGTH" FROM "USER1"."ROAD_INSP" "ri2" WHERE ({fn year("INSP_DATE" )}BETWEEN ? AND ? )
SQLExecDirect: SELECT "INSP_ID" ,"ROAD_ID" ,"INSP_DATE" ,"LENGTH" FROM "USER1"."ROAD_INSP" "ri2" WHERE ({fn year("INSP_DATE" )}BETWEEN ? AND ? )
SQLExecDirect: SELECT "INSP_ID" ,"ROAD_ID" ,"INSP_DATE" ,"LENGTH" FROM "USER1"."ROAD_INSP" "ri2" WHERE ({fn year("INSP_DATE" )}BETWEEN ? AND ? )
SQLExecDirect: SELECT "INSP_ID" ,"ROAD_ID" ,"INSP_DATE" ,"LENGTH" FROM "USER1"."ROAD_INSP" "ri2" WHERE ({fn year("INSP_DATE" )}BETWEEN ? AND ? )
Question:
Unlike my other local MS Access queries, this query uses the TOP predicate. MS Access seems to be attempting to translate TOP to SQL that the ODBC driver and/or Oracle can understand. However, it seems to be doing a very poor job of it.
How can I improve the performance of this query?
Best Answer
As Mike Walsh suggested, this is really an Access issue more than an ODBC problem. An ODBC driver should accept the same SQL dialect as the database it is designed to connect to. Access is evidently not translating the TOP predicate properly, so it instead re-executes the subquery for every row of the primary query. Such pitfalls are briefly mentioned in the "Moving Query Processing to the Server" section of the Microsoft documentation you linked to.
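One standard way to move the processing to the server is a pass-through query, whose SQL text Access hands to the ODBC driver verbatim instead of translating it. The following Oracle-dialect sketch is not from the original answer; it assumes the USER1.ROAD_INSP table from the DDL and hard-codes the year range, since pass-through queries cannot use Access parameter prompts:

```sql
-- Oracle SQL for an Access pass-through query (sketch).
-- ROW_NUMBER() reproduces the TOP 1-per-road ordering; the year
-- bounds (2017, 2018) are placeholders -- pass-through queries
-- cannot prompt for parameters, so splice real values in via VBA.
SELECT insp_id, road_id, insp_date, length
FROM (
    SELECT ri.*,
           ROW_NUMBER() OVER (
               PARTITION BY ri.road_id
               ORDER BY ri.insp_date DESC, ri.length DESC, ri.insp_id
           ) AS rn
    FROM user1.road_insp ri
    WHERE EXTRACT(YEAR FROM ri.insp_date) BETWEEN 2017 AND 2018
)
WHERE rn = 1;
```

With this approach the ODBC trace should show a single SQLExecDirect call instead of one per row.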
Update
I had previously recommended using the aggregate function First(), but I now rescind that advice for this case, where the selected column differs from the multiple sort fields. At least in Access, First() does not behave properly when sorting on multiple fields. See details at the end.
Notice that the WHERE condition is applied repeatedly in every subquery. This is necessary in Access because each aggregate query must draw from the same data using the same criteria, otherwise the results could be bogus. (Access does not support temporary tables or multiple statements in a single 'query', which could otherwise be used to pre-filter and reuse the same dataset.) The nested queries are necessary to match the original, because one must apply the min/max selection to each column individually to properly mimic the sort order.
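A sketch of that nested min/max shape, using the question's table and parameter prompts (an illustration only, not tested against the original data): the innermost subquery pins the latest insp_date, the next pins the greatest length among rows on that date, and Min(insp_id) breaks any remaining ties, mirroring the original ORDER BY:

```sql
SELECT ri.*
FROM user1_road_insp AS ri
WHERE ri.insp_id = (
    SELECT Min(ri2.insp_id)
    FROM user1_road_insp AS ri2
    WHERE ri2.road_id = ri.road_id
      AND Year(ri2.insp_date) BETWEEN [Enter a START year:] AND [Enter an END year:]
      AND ri2.insp_date = (
          SELECT Max(ri3.insp_date)
          FROM user1_road_insp AS ri3
          WHERE ri3.road_id = ri.road_id
            AND Year(ri3.insp_date) BETWEEN [Enter a START year:] AND [Enter an END year:]
      )
      AND ri2.length = (
          SELECT Max(ri4.length)
          FROM user1_road_insp AS ri4
          WHERE ri4.road_id = ri.road_id
            AND ri4.insp_date = ri2.insp_date
            AND Year(ri4.insp_date) BETWEEN [Enter a START year:] AND [Enter an END year:]
      )
);
```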
Should the outermost query also have the same selection criteria re-applied? Only the original poster can know that, but they should consider that, as written, the returned set could contain rows which contradict the subquery criteria.
Problems with First()
Access treats First() like any other aggregate function, so its use requires that all other fields in the Select or Order By clauses also be aggregated or grouped. Consider that First() is only useful when an Order By clause is specified (otherwise it returns an arbitrary value). But as soon as all of the Order By fields are included in the grouping, the query produces one row per unique combination of their values, effectively nullifying the aggregate behavior of First().
If the single sorted column is the same as the one inside First(), there is no problem. Likewise, it might behave sufficiently well if the sort order matches the indexed key columns of the underlying table. Otherwise, Access does not necessarily respect the sort order of subqueries (see https://stackoverflow.com/questions/13651394/what-order-is-used-by-first-function), so it is not sufficient to sort first and pipe the result into a statement containing First(): Access is technically allowed to process the input rows in any order. Essentially, Access makes it impossible to reliably order the rows to which First() or Last() should be applied for this question.
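As a hypothetical illustration (not from the original answer): attempting a per-road "first" row with First() forces the extra sort fields into the Group By, which reintroduces one row per unique (road_id, insp_date, length) combination rather than one per road:

```sql
SELECT ri.road_id, First(ri.insp_id) AS first_insp
FROM user1_road_insp AS ri
GROUP BY ri.road_id, ri.insp_date, ri.length
ORDER BY ri.road_id, ri.insp_date DESC, ri.length DESC;
```

Against the sample data every (road_id, insp_date, length) combination is unique, so this returns all six rows: First() collapses nothing, even though the intent was one row per road.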