Heavy performance drop on Progress query, apparently related to a string criterion on the table

performance, profiler, progress-database

I'm running the following query against a release 11.6 Progress database:

FIND FIRST Table1 WHERE Table1.Field1 = 1
                    AND Table1.Field2 >= 2
                    AND TRUE
                    AND
  CAN-FIND(Table2 WHERE Table2.Field1 = Table1.Field3    /* join */
                    AND Table2.Field2 = 0
                    AND Table2.Field3 = "WHATEVER").

The necessary indexes exist on both tables and have been rebuilt.

The problem is the following: when the criterion Table2.Field3 = "WHATEVER" is present (even though that field is part of an index), the query takes several minutes to run; without it, the query completes in a matter of seconds.

What can explain such behaviour in a Progress 11.6 database, and what profiling options are there for monitoring what's happening? (I'm working with the AppBuilder / Procedure Editor "IDE".)

Thanks in advance

Best Answer

It isn't quite clear to me, but as I read the question you are saying that you have two distinct versions of the code: one with that criterion and one without.

I expect that the version with that criterion is slow because there is no index on Table2 in which Field1, Field2, and Field3 appear as consecutive leading components. In a case like that the query has to be resolved with a table scan.
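If you want to check what indexes actually exist on Table2 and how their components are ordered, a quick metaschema query will show you. This is just a sketch run against the database in question; the table name is taken from your example:

/* List each index on Table2 and its component fields, in order,
   so you can verify whether Field1, Field2 and Field3 appear as
   consecutive leading components of any single index.            */
for each _File no-lock where _File._File-Name = "Table2",
    each _Index no-lock where _Index._File-recid = recid( _File ),
    each _Index-Field no-lock where _Index-Field._Index-recid = recid( _Index ),
    first _Field no-lock where recid( _Field ) = _Index-Field._Field-recid
    by _Index._Index-Name by _Index-Field._Index-Seq:

  display
    _Index._Index-Name      format "x(20)"
    _Index-Field._Index-Seq
    _Field._Field-Name      format "x(20)".
end.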

To illustrate using the standard sports2000 database:

define variable dbx as int64 no-undo.

/* find this session's connection and io statistics in the VSTs */
find _myconnection no-lock.
find _connect no-lock where _connect-id = _myconn-userid + 1.
find _userio  no-lock where _userio-id  = _myconn-userid + 1.

display _connect-name _connect-type _connect-device.

/* remember the db access count before running the query */
dbx = _userio-dbaccess.

find first order no-lock
  where order.orderStatus = "shipped"
    and order.shipDate    >= 2/10/1997
//  and can-find( employee where employee.lastName  = "parker" )
    and can-find( employee where employee.firstName = "karen" )
  no-error.

/* re-read the VST and compute how many db blocks the query accessed */
find _userio no-lock where _userio-id = _myconn-userid + 1.
dbx = _userio-dbaccess - dbx.

display orderNum dbx.

When the CAN-FIND() references employee.lastName (the commented-out line above) you will see that "dbx" = 4. That is to say, your query accessed a total of 4 db blocks (2 index block reads and 2 record block reads). When you run it with employee.firstName instead, it takes 114 db accesses because, while firstName is an indexed field, it is the second component of a compound index and thus cannot be used to bracket the query. Obviously in the tiny sports2000 database this has very little impact on runtime, but the roughly 28x difference in db access operations shows the table scan in action and hints at the impact that would be felt in a real database.

In the OpenEdge 4GL (aka "ABL") queries are optimized at compile time by following a set of rules. There are many rules, but the "golden rule" is to have equality matches on the leading components of a composite index. There is a lot of information on the topic shared in the various user groups; a good repository of past presentations can be found at PUG Challenge, and the various "pick an index" talks are a good place to get started.
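As a rough illustration of that rule, assuming (hypothetically) that Table2 has an index on (Field1, Field2, Field3) and using made-up values:

/* Equality matches on all of the leading components: the compiler can
   bracket on the full index, so very few blocks need to be read.       */
find first Table2 no-lock
  where Table2.Field1 = 42
    and Table2.Field2 = 0
    and Table2.Field3 = "WHATEVER"
  no-error.

/* Equality only on a non-leading component: the index cannot be
   bracketed on Field3 alone, so the candidate rows have to be scanned. */
find first Table2 no-lock
  where Table2.Field3 = "WHATEVER"
  no-error.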

(The OpenEdge SQL engine uses a cost-based optimizer - but we know that this is a 4GL query since the FIND FIRST and CAN-FIND syntax is being used.)

You can see which indexes are being selected by compiling your code with the XREF option. If you see WHOLE-INDEX in that output, it is telling you that a table scan is required. (Not seeing it is not a sign that your query is efficient - just that you didn't trigger that particular warning.)
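For example, from the Procedure Editor (the file names here are placeholders):

/* compile a program and write the cross-reference listing to a text file */
compile mycode.p xref "mycode.xref".

Then look through the SEARCH lines in mycode.xref: each one names the table and the index chosen for a FIND, FOR EACH or CAN-FIND, and a WHOLE-INDEX tag on a SEARCH line means no usable bracket was found for that table.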

You can also profile activity at run time using the virtual system tables (VSTs) or by using a tool such as ProTop (if you go with ProTop you probably want to investigate "programmer mode"). Either approach will reveal which indexes are actually being used and how much activity there is. If you carefully control your test environment you can learn quite a lot.
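As a starting point with the VSTs, here is a minimal sketch that reads _IndexStat and resolves the index names through the metaschema. It assumes the -baseindex / -indexrangesize startup parameters cover the indexes you care about, and it shows cumulative reads since the statistics were last reset:

/* show cumulative index reads, busiest indexes first */
for each _IndexStat no-lock,
    first _Index no-lock where _Index._Idx-Num = _IndexStat._IndexStat-Id,
    first _File  no-lock where recid( _File )  = _Index._File-recid
    by _IndexStat._IndexStat-Read descending:

  display
    _File._File-Name    format "x(20)"
    _Index._Index-Name  format "x(20)"
    _IndexStat._IndexStat-Read.
end.

Run your query in a quiet test environment, note the counts before and after, and the index (or whole-index scan) doing all the work will stand out.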