Does a cluster index provide more benefits than pre-sorting the load file and creating non-cluster index

clustered-indexindexinformix

Informix 11.70.TC4DE:

CREATE TABLE cluster_tbl
    (
fk_id INT,
data CHAR(2048)
    );

LOAD FROM "presorted.ld" INSERT INTO cluster_tbl;

CREATE UNIQUE CLUSTER INDEX cl_idx ON cluster_tbl(fk_id);

{###}

CREATE TABLE noncluster_tbl
    (
fk_id INT,
data CHAR(2048)
    );

LOAD FROM "presorted.ld" INSERT INTO noncluster_tbl;

CREATE UNIQUE INDEX ncl_idx ON noncluster_tbl(fk_id);

{###}

UPDATE STATISTICS;

It's my understanding that a cluster index is essentially the same as pre-sorting the load file, inserting it into the table and creating the index for the foreign key column.

I created two tables, each loaded with 2 Million rows of identically data, pre-sorted by the foreign key. However, one table has a cluster index on fk_id, the other table with a non-clustered index on fk_id.

Is there any difference in the tree structure or additional overhead in maintaining or accessing data from a table with a cluster index, versus the same table/data with a non-cluster index?

Best Answer

In an Informix database, there is not a lot of difference between the table and index created by loading an empty table in pre-sorted order and creating a clustered index on the table in the same order. Thus, the net result is basically the same. Further, there are no long-term implications for either method; Informix does not maintain the clustered index in order after the initial clustering operation. The performance benefit from a clustered index with Informix is a second-order effect; there is a small benefit when you are selecting the data in the clustered order, but it is not a huge benefit.

However, there are considerable practical advantages to using CREATE [UNIQUE] CLUSTER INDEX <idxname> ON <tablename>(<columnlist>) or using ALTER INDEX <idxname> TO CLUSTER. These include the fact that you don't have to drop the table (and thereby destroy views selecting from the table, or lose permissions granted on the table — though I suppose you could do a full table delete, or even truncate the table, instead of dropping the table), and you don't have to explicitly store the data in a file, and you save the cost of transferring the data from the DBMS to the application for saving the data and the cost of transferring the data from the application back to the DBMS when you recreate (reload) the table.

So, unless there are reasons for wanting a copy of the data outside the database, there's no point in using the UNLOAD and LOAD mechanism; you are better off using a clustered index because it is easier to manage the operation that way.