SQL Server – Leveraging an index on another column with the same ordering guarantee

index, sql-server

I have a massive table whose only index is on the auto-increment (IDENTITY) column. Each row also stores its creation time in a column that defaults to the current timestamp, and that column is not indexed.

CREATE TABLE MyTable
(
    ID BIGINT NOT NULL IDENTITY(1, 1) PRIMARY KEY
    ,AuditTimestampUtc DATETIME NOT NULL DEFAULT(GETUTCDATE())
    ,...
)

If I need to query by the creation timestamp of the row, how can I do it efficiently? Adding an index is not feasible: the table is gargantuan (hundreds of millions to billions of rows), we cannot afford the downtime, and I am performing a rare debugging task in a read-only environment. The query is essentially:

SELECT [...] FROM MyTable WHERE AuditTimestampUtc BETWEEN @Start AND @End

I am trying to debug a novel issue and have not had to do this task before, so I would have difficulty making the argument for creating a new index. Unfortunately, there is quite a process for requesting and sanitizing a full database dump (especially given its size), or for cloning the database to another environment. I have an outdated dump to experiment with, but the final query will be run under supervision through a read-only account on production.

Writing a custom binary search seems like overkill, especially in an RDBMS, but alas, computers are not mind-readers, even though it is apparent to a person that the identity column can be used as a surrogate ordering* to search the table efficiently by creation time.

* Assuming nobody enables IDENTITY_INSERT and violates this temporal ordering guarantee.

P.S. For the first time ever, I do not believe the database platform is very relevant to this question (ignoring the syntactic differences in declaring the index, default constraint, etc.), but I am using SQL Server.
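For illustration, the binary search mentioned above can be written in plain T-SQL. This is a minimal sketch under stated assumptions: ID order matches AuditTimestampUtc order (the footnoted assumption), the ID primary key is the clustered index (the default for a PRIMARY KEY when no other clustered index exists), and the range is half-open [@Start, @End) rather than BETWEEN. The variable names and dates are hypothetical. Each iteration is a single seek on the primary key, so each bound costs roughly log2(MAX(ID) - MIN(ID)) seeks, about 30 to 35 for billions of IDs, and nothing beyond read access is needed.

```sql
-- Sketch: binary search over the clustered primary key (ID) to find the ID range
-- for [@Start, @End). ASSUMPTION: ID order matches AuditTimestampUtc order.
DECLARE @Start datetime = '2019-06-01', @End datetime = '2019-07-01';  -- hypothetical dates
DECLARE @Target datetime, @lo bigint, @hi bigint, @mid bigint, @midTs datetime,
        @StartId bigint, @EndId bigint, @pass int = 1;

WHILE @pass <= 2
BEGIN
    -- Pass 1 finds the first row at or after @Start; pass 2 the first row at or after @End.
    SET @Target = CASE WHEN @pass = 1 THEN @Start ELSE @End END;

    -- Half-open search space [MIN(ID), MAX(ID) + 1); MAX(ID) + 1 means "no such row".
    SELECT @lo = MIN(ID), @hi = MAX(ID) + 1 FROM dbo.MyTable;

    WHILE @lo < @hi
    BEGIN
        SET @mid = @lo + (@hi - @lo) / 2;

        -- One seek: timestamp of the first existing row with ID >= @mid
        -- (identity values can have gaps, so @mid itself may not exist).
        SELECT TOP (1) @midTs = AuditTimestampUtc
        FROM dbo.MyTable
        WHERE ID >= @mid
        ORDER BY ID;

        IF @midTs >= @Target SET @hi = @mid;
        ELSE SET @lo = @mid + 1;
    END;

    -- Every row with ID >= @lo now has AuditTimestampUtc >= @Target; rows below it do not.
    IF @pass = 1 SET @StartId = @lo; ELSE SET @EndId = @lo;
    SET @pass += 1;
END;

-- Clustered-index range seek; the timestamp predicate only filters out extras.
SELECT *
FROM dbo.MyTable
WHERE ID >= @StartId AND ID < @EndId
  AND AuditTimestampUtc >= @Start AND AuditTimestampUtc < @End;
```

If the ordering assumption is violated near either boundary, rows can be missed: the final timestamp predicate removes extra rows, but it cannot recover missing ones.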

Best Answer

You could create a temporary table containing just the two columns you're interested in, using the key column as a pointer into the "real" table. Something like:

CREATE TABLE #t
(
    ID bigint NOT NULL
    , AuditTimestampUtc datetime NOT NULL
);

Insert rows from MyTable like this:

INSERT INTO #t WITH (TABLOCKX) (ID, AuditTimestampUtc)
SELECT mt.ID
    , mt.AuditTimestampUtc
FROM dbo.MyTable mt;

Then create an index on the table like this:

CREATE CLUSTERED INDEX t_AuditTimeStampUtc
ON #t (AuditTimestampUtc);

Now, you should be able to query the "real" table, making use of the index on the #temp table, as in:

SELECT <columns from mt>
FROM dbo.MyTable mt
    INNER JOIN #t t ON mt.ID = t.ID
WHERE t.AuditTimestampUtc >= '2019-06-01 00:00:00'
    AND t.AuditTimestampUtc < '2019-07-01 00:00:00';

The query above will probably do an index seek on the clustered index t_AuditTimeStampUtc on #t, with a nested loops join into MyTable on its primary key. This may be faster than querying the original table directly, especially if you need to run several queries like this against MyTable.

Copying data from a large table might seem like a bad idea. If the original table had only these two columns, then yes, I'd agree it's a dumb thing to do. However, if MyTable has many columns, the temp table will occupy only a small fraction of the space of the main table, so scanning or seeking it is much cheaper. The initial copy still reads all of MyTable once, so the payoff grows with each additional query you run against #t.
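If the ordering assumption from the question holds, a possible variation is to use #t only to find the ID bounds of the time range and then let MyTable's clustered primary key do a single range seek, instead of one nested-loops lookup per matching row. A sketch, reusing #t from above; @FromId and @ToId are hypothetical names:

```sql
-- Sketch: use #t (clustered on AuditTimestampUtc) only to find the ID bounds,
-- then read MyTable with one range seek on its clustered primary key.
DECLARE @FromId bigint, @ToId bigint;

SELECT @FromId = MIN(t.ID), @ToId = MAX(t.ID)
FROM #t t
WHERE t.AuditTimestampUtc >= '2019-06-01 00:00:00'
  AND t.AuditTimestampUtc <  '2019-07-01 00:00:00';

SELECT mt.*
FROM dbo.MyTable mt
WHERE mt.ID BETWEEN @FromId AND @ToId
  -- Repeating the time predicate keeps the result correct even if IDs and
  -- timestamps are not perfectly ordered; it just filters out the extras.
  AND mt.AuditTimestampUtc >= '2019-06-01 00:00:00'
  AND mt.AuditTimestampUtc <  '2019-07-01 00:00:00';
```

Every qualifying row's ID lies between the minimum and maximum IDs found in #t, so this cannot miss rows that existed when #t was filled; if the ordering assumption is violated, the ID range simply covers more rows than needed.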

Related Solutions

SQL Server – the effect of replacing indexes with filtered (non-null value) indexes

Very interesting approach. My upvote for the creativity.

Since you reclaimed the space, I assume the original indexes are no longer in place? The downsides of filtered indexes then are:

Too many of them may cause the search space of the optimiser to grow too large, leading to poor query plans as the optimiser times out
There are several situations where a filtered index will not even be considered, even though the non-filtered equivalent would be. Notably, this can happen when you get a hash join on the indexed column or if you try to ORDER BY the column (without a filter)
Query parameterisation doesn't work with filtered indexes (see: http://www.sqlservercentral.com/blogs/practicalsqldba/2013/04/08/sql-server-part-9-filtered-index-a-new-way-for-performance-improvemnt/)

In practical terms, this means that you have to be extremely careful with filtered indexes as they will often result in horrible query plans. I would not go so far as to call them useless, but I view them as an addition to traditional indexes, not as a replacement (as you are trying to do).
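A minimal illustration of the ORDER BY limitation in the second point above, using a hypothetical table dbo.FilteredDemo: the filtered index holds no NULL rows, so the optimizer can only use it when the query's own predicate also excludes NULLs.

```sql
-- Hypothetical demo table; any nullable column shows the same behaviour.
CREATE TABLE dbo.FilteredDemo
(
    ID  int IDENTITY(1, 1) PRIMARY KEY,
    Col int NULL
);

-- Filtered index that stores only the non-NULL values of Col.
CREATE NONCLUSTERED INDEX IX_FilteredDemo_Col_NotNull
ON dbo.FilteredDemo (Col)
WHERE Col IS NOT NULL;

-- Eligible for the filtered index: the predicate matches the filter,
-- so every row the query needs is present in the index.
SELECT Col
FROM dbo.FilteredDemo
WHERE Col IS NOT NULL
ORDER BY Col;

-- Not eligible: NULL rows must also be returned but are not in the index,
-- so the optimizer has to read the clustered index and sort instead.
SELECT Col
FROM dbo.FilteredDemo
ORDER BY Col;
```

An unfiltered index on Col would serve both queries, which is one reason to treat filtered indexes as an addition rather than a replacement.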

Oracle Text CTXCAT Domain Index – Transactional and Before Insert/Update Trigger

Aha! I've found the answer. Talk about an edge case.

First, I found this post from 2007, where someone says:

...the AFTER trigger for synchronizing the CTXCAT index on [column] is not firing (since my update statement does not include the indexed column).

...

Oracle, would it not be better to generate the CTXCAT trigger to examine the :old and :new values in the indexed column, rather than using a check on (if updating([column]))?

Over a year later, someone replied:

I fixed this issue by adding additional UPDATING (first_name and last_name) conditions in the DR$table_name trigger.

Here is part of the modified trigger.

if (inserting or updating('LAST_NAME_FIRST') or updating('FIRST_NAME') or updating('LAST_NAME')) then reindex := TRUE;

Hm, so the CTXCAT index uses a trigger to know when it needs to update the index for a particular entry. All I need to do is tweak the trigger and recompile it so it does what I want.

To get the content of the trigger:

SELECT text
  FROM   user_source
  WHERE  name = 'DR$NAMES_IDXTC'
  AND    type = 'TRIGGER'
  ORDER  BY line;

I copied this into Sublime, prettified it, and found this (simplified):

TRIGGER "TEST"."DR$NAMES_IDXTC" AFTER
INSERT
OR
UPDATE ON "TEST"."NAMES"
FOR EACH ROW DECLARE REINDEX boolean := FALSE;

BEGIN 

IF (inserting
    OR updating('COMPOUND_NAME')
    OR :new."COMPOUND_NAME" <> :old."COMPOUND_NAME") THEN REINDEX := TRUE;
END IF;

...

END;

You can see that the 12c Oracle Text version's AFTER trigger does actually compare the :new and :old values of the indexed column to see if it needs to update--not the case back in 2008.

So, if I'm updating the :new value in my BEFORE trigger, that should be reflected in the AFTER trigger, and the comparison would kick off an update to the index. What gives?

Well, here are the two SQL statements I was using:

update test.names set 
  first_name = 'Skye', 
  last_name = 'Fillingim'
  where ... ;

update test.names set
  first_name = null,
  last_name = null
  where ... ;

(Slightly different than what I said in my question, I apologize.)

The effect here is that, each time I used one of these statements, either the :old.compound_name or :new.compound_name would be null. So when we get to this condition:

OR :new."COMPOUND_NAME" <> :old."COMPOUND_NAME"

We are doing an inequality comparison against a null, which evaluates to UNKNOWN (NULL in PL/SQL). The IF treats that as not true, so REINDEX stays FALSE and the index is not updated.

This is actually an extreme edge case, because you have to be modifying :new.column indirectly via a trigger, and either :new or :old must be null. I would never have discovered it if I hadn't used those exact SQL statements.
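The three-valued logic can be reproduced outside the trigger with a small anonymous PL/SQL block. The variable names and the value are illustrative stand-ins for :old."COMPOUND_NAME" and :new."COMPOUND_NAME", assuming a client such as SQL*Plus:

```sql
SET SERVEROUTPUT ON

DECLARE
    -- Illustrative stand-ins for :old."COMPOUND_NAME" and :new."COMPOUND_NAME".
    old_val VARCHAR2(100) := NULL;
    new_val VARCHAR2(100) := 'Skye Fillingim';
BEGIN
    IF new_val <> old_val THEN
        DBMS_OUTPUT.PUT_LINE('reindex');
    ELSE
        -- This branch runs: 'Skye Fillingim' <> NULL is NULL (unknown), not TRUE.
        DBMS_OUTPUT.PUT_LINE('no reindex');
    END IF;
END;
/
```

It prints "no reindex", even though the value clearly changed.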

So, we have a slight update to the trigger:

IF (inserting
    OR updating('COMPOUND_NAME')
    OR :new."COMPOUND_NAME" <> :old."COMPOUND_NAME"
    OR (:new."COMPOUND_NAME" IS NULL AND :old."COMPOUND_NAME" IS NOT NULL)
    OR (:new."COMPOUND_NAME" IS NOT NULL AND :old."COMPOUND_NAME" IS NULL)
    ) THEN REINDEX := TRUE;

Then just put CREATE OR REPLACE in front of the modified trigger source, run it to recompile the trigger, and everything works perfectly.
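As an alternative to reassembling the source from USER_SOURCE rows, Oracle's DBMS_METADATA.GET_DDL function returns a trigger's complete DDL as a single CLOB, which should already begin with CREATE OR REPLACE and can be edited and re-run directly. A sketch, assuming the TEST schema and trigger name from above and a client such as SQL*Plus:

```sql
-- SQL*Plus settings so the whole CLOB is displayed instead of being truncated.
SET LONG 100000
SET PAGESIZE 0

-- Full DDL of the trigger as one CLOB, instead of one row per source line.
SELECT DBMS_METADATA.GET_DDL('TRIGGER', 'DR$NAMES_IDXTC', 'TEST') AS ddl
FROM   dual;
```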
