Minimizing Indexed Reads with Complex Criteria

firebirdindexperformance

I'm optimizing a Firebird 2.5 database of work tickets. They're stored in a table declared as such:

CREATE TABLE TICKETS (
  TICKET_ID id PRIMARY KEY,
  JOB_ID id,
  ACTION_ID id,
  STATUS str256 DEFAULT 'Pending'
);

I generally want to find the first ticket that hasn't been processed and is in Pending status.

My processing loop would be:

  1. Retrieve 1st Ticket where Pending
  2. Do work with Ticket.
  3. Update Ticket Status => Complete
  4. Repeat.

Nothing too fancy. If I'm watching the database while this loop runs I see the number of indexed reads climbs for each iteration. The performance doesn't seem to degrade terribly that I can tell, but the machine I'm testing on is pretty quick. However, I've received reports of performance degradation over time from some of my users.

I've got an index on Status, but it still seems like it scans down the Ticket_Id column each iteration. It seems like I'm overlooking something, but I'm not sure what. Is the climbing number of indexed reads for something like this expected, or is the index misbehaving in some way?

— Edits for comments —

In Firebird you limit row retrieval like:

Select First 1
  Job_ID, Ticket_Id
From
  Tickets
Where
  Status = 'Pending'

So when I say "first", I'm just asking it for a limited record set where Status = 'Pending'.

Best Answer

The degradation over time occurs due to the increased number of items that are in the "Complete" status. Think about this for a second - you won't get any performance degradation when testing as you probably have a small number of rows with the status as "Complete". But in production, they may have millions of rows with the "Complete" status and this number will increase over time. This, essentially, makes your index on Status less and less useful over time. As such, the database probably just decides that because Status almost always has the value 'Complete', it will just scan the table instead of using the index.

In SQL Server (and maybe other RDBMS's?), this can be worked around using Filtered Indexes. In SQL Server you would add a WHERE condition onto the end of your index definition to say "apply this index only to records with a Status <> 'Complete' ". Then any query using this predicate will most likely use the index on the small amount of records not set to 'Complete'. However, based on the documentation here: http://www.firebirdsql.org/refdocs/langrefupd25-ddl-index.html, it does not look like Firebird supports filtered indexes.

A workaround is to put 'Complete' records in an ArchiveTickets table. Create a table with the exact same definition (though without any auto generated ID) as your Tickets table and maintain rows between them by pushing 'Complete' records to the ArchiveTickets table. The Index on your Tickets table will then be over a much smaller number of records and be much higher performance. This will likely mean you will need to change any reports etc that reference 'Complete' tickets to point to the Archive table or perform a UNION across both Tickets and ArchiveTickets. This will have the advantage of not only being fast, but will also mean that you can create specific indexes for the ArchiveTickets table to make it perform better for other queries (for instance: Give me the average time to completion for complete tickets) which are not needed on the Tickets table.

You should be concerned with this if your production is going to go into the thousands of rows. Performance will degrade over time and negatively impact your user experience.