The degradation over time occurs because of the growing number of rows in the "Complete" status. Think about it for a second: you won't see any degradation when testing, because you probably only have a small number of rows with the status "Complete". In production, however, there may be millions of rows with the "Complete" status, and that number will keep growing. This makes your index on Status less and less useful over time; eventually the database decides that, because Status almost always has the value 'Complete', it might as well scan the table instead of using the index.
In SQL Server (and possibly other RDBMSs), this can be worked around using filtered indexes: you add a WHERE condition to the end of the index definition to say "apply this index only to records with Status <> 'Complete'". Any query using that predicate will then most likely use the index over the small number of records not yet set to 'Complete'. However, based on the documentation here: http://www.firebirdsql.org/refdocs/langrefupd25-ddl-index.html, it does not look like Firebird supports filtered indexes.
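For illustration, here is a minimal sketch of such a filtered index in SQL Server (the table and column names are hypothetical):

-- Index only the rows that are not yet 'Complete'
CREATE NONCLUSTERED INDEX IX_Tickets_NotComplete
    ON dbo.Tickets (Status)
    WHERE Status <> 'Complete';

-- A query with the same predicate can then use this small index
SELECT * FROM dbo.Tickets WHERE Status <> 'Complete';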
A workaround is to move 'Complete' records into an ArchiveTickets table. Create a table with exactly the same definition as your Tickets table (though without any auto-generated ID) and keep the two in sync by pushing 'Complete' records into ArchiveTickets. The index on your Tickets table then covers a much smaller number of records and performs much better. This will likely mean changing any reports etc. that reference 'Complete' tickets to point to the archive table, or performing a UNION across both Tickets and ArchiveTickets. The advantage is not only speed: you can also create indexes specific to ArchiveTickets that help other queries (for instance, "give me the average time to completion for complete tickets") which are not needed on the Tickets table.
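To make the idea concrete, here is a rough sketch with hypothetical column names (adapt it to your actual Tickets definition):

-- Same definition as Tickets, minus the auto-generated ID
CREATE TABLE ArchiveTickets (
    TicketId   INTEGER NOT NULL,
    Status     VARCHAR(20) NOT NULL,
    CreatedAt  TIMESTAMP,
    ClosedAt   TIMESTAMP
);

-- Periodically push completed tickets across
INSERT INTO ArchiveTickets (TicketId, Status, CreatedAt, ClosedAt)
    SELECT TicketId, Status, CreatedAt, ClosedAt
    FROM Tickets
    WHERE Status = 'Complete';

DELETE FROM Tickets WHERE Status = 'Complete';

-- Reports that need every ticket can UNION the two tables
SELECT TicketId, Status, CreatedAt, ClosedAt FROM Tickets
UNION ALL
SELECT TicketId, Status, CreatedAt, ClosedAt FROM ArchiveTickets;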
You should be concerned about this if your production data is going to grow into the thousands of rows. Performance will degrade over time and negatively impact the user experience.
Aha! I've found the answer. Talk about an edge case.
First, I found this post from 2007, where someone says:
...the AFTER trigger for synchronizing the CTXCAT index on [column] is not firing (since my update statement does not include the indexed column).
...
Oracle, would it not be better to generate the CTXCAT trigger to examine the :old and :new values in the indexed column, rather than using a check on (if updating([column]))?
Over a year later, someone replied:
I fixed this issue by adding additional UPDATING (first_name and last_name) conditions in the DR$table_name trigger.
Here is part of the modified trigger.
if (inserting or updating('LAST_NAME_FIRST') or updating('FIRST_NAME') or updating('LAST_NAME')) then reindex := TRUE;
Hm, so the CTXCAT index uses a trigger to know when it needs to update the index for a particular entry. All I need to do is tweak the trigger and recompile it so it does what I want.
To get the content of the trigger:
SELECT text
FROM user_source
WHERE name = 'DR$NAMES_IDXTC'
AND type = 'TRIGGER'
ORDER BY line;
I copied this into Sublime, prettified it, and found this (simplified):
TRIGGER "TEST"."DR$NAMES_IDXTC" AFTER
INSERT
OR
UPDATE ON "TEST"."NAMES"
FOR EACH ROW DECLARE REINDEX boolean := FALSE;
BEGIN
IF (inserting
OR updating('COMPOUND_NAME')
OR :new."COMPOUND_NAME" <> :old."COMPOUND_NAME") THEN REINDEX := TRUE;
END IF;
...
END;
You can see that the 12c Oracle Text version's AFTER trigger does actually compare the :new and :old values of the indexed column to see if it needs to update the index, which was not the case back in 2008.
So...if I'm updating the :new value in my BEFORE trigger, that should be reflected in the AFTER trigger, and the comparison would kick off an update to the index. What gives?
Well, here are the two SQL statements I was using:
update test.names set
first_name = 'Skye',
last_name = 'Fillingim'
where ... ;
update test.names set
first_name = null,
last_name = null
where ... ;
(Slightly different than what I said in my question, I apologize.)
The effect here is that, each time I used one of these statements, either the :old.compound_name or the :new.compound_name would be null. So when we get to this condition:
OR :new."COMPOUND_NAME" <> :old."COMPOUND_NAME"
We are comparing a value against NULL; in SQL, any comparison with NULL evaluates to UNKNOWN rather than TRUE. Hence, no index update.
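This is easy to verify with a quick query (the literal here is just an example, not from the original data):

-- Any <> comparison with NULL yields UNKNOWN, so the THEN branch never fires
SELECT CASE
         WHEN 'Skye' <> NULL THEN 'comparison was TRUE'
         ELSE 'comparison was not TRUE'
       END AS result
FROM dual;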
This is actually an extreme edge case, because you have to be indirectly modifying :new.column via a trigger, and either the :new or the :old value must be null. I would never have discovered it if I hadn't used those exact SQL statements.
So, we have a slight update to the trigger:
IF (inserting
    OR updating('COMPOUND_NAME')
    OR :new."COMPOUND_NAME" <> :old."COMPOUND_NAME"
    OR (:new."COMPOUND_NAME" IS NULL AND :old."COMPOUND_NAME" IS NOT NULL)
    OR (:new."COMPOUND_NAME" IS NOT NULL AND :old."COMPOUND_NAME" IS NULL)
   ) THEN
  REINDEX := TRUE;
And then just stick CREATE OR REPLACE in front of the old trigger, recompile it, and everything works perfectly.
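Putting it together, the recompiled trigger looks roughly like this (abridged; everything other than the IF condition is the body returned by the user_source query, unchanged):

CREATE OR REPLACE TRIGGER "TEST"."DR$NAMES_IDXTC"
AFTER INSERT OR UPDATE ON "TEST"."NAMES"
FOR EACH ROW
DECLARE
  REINDEX boolean := FALSE;
BEGIN
  IF (inserting
      OR updating('COMPOUND_NAME')
      OR :new."COMPOUND_NAME" <> :old."COMPOUND_NAME"
      OR (:new."COMPOUND_NAME" IS NULL AND :old."COMPOUND_NAME" IS NOT NULL)
      OR (:new."COMPOUND_NAME" IS NOT NULL AND :old."COMPOUND_NAME" IS NULL)
     ) THEN
    REINDEX := TRUE;
  END IF;
  ...  -- rest of the original trigger body, unchanged
END;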
Best Answer
Just use
If you are generating the primary key using an Oracle sequence (PS: you should), you are guaranteed to get integers, so you don't need Oracle checking precision and so on, as it would if you specified number(7,0) to declare an integer with up to seven digits. And if you ever exceed seven digits, fixing that is not a fun day. As the links you've referenced above mention, Oracle only uses enough space to store the actual number, so there is no need to worry about how much storage is being used, and therefore you don't have to concern yourself with the indexing of the columns holding the primary and foreign keys.
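For illustration, a minimal sketch (hypothetical names) of a plain NUMBER key populated from a sequence, with no precision cap to outgrow:

CREATE SEQUENCE ticket_seq;

CREATE TABLE tickets (
    ticket_id  NUMBER PRIMARY KEY,   -- no NUMBER(7,0) cap to worry about
    subject    VARCHAR2(200)
);

INSERT INTO tickets (ticket_id, subject)
VALUES (ticket_seq.NEXTVAL, 'Example ticket');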
The only time I would specify a fixed-length integer would be if the primary key were an intelligent key, such as a USA Social Security number, but intelligent keys are a bad idea too.