DB2 LUW – How to Determine Page Size for Index

db2db2-luw

I'm in the process of migrating a set of databases from code set ISO8859-1 to UTF-8, STRING_UNITS=CODEUNITS32. I suspect that quite a few indexes will no longer fit in there current tablespace page size (SQL0614N). Is there a formula that can be used to determine the minimum page size for each index?

I've found the following article by Serge Reilau:

https://www.ibm.com/developerworks/community/blogs/SQLTips4DB2LUW/entry/getrowsize?lang=en

that I guess I can modify to determine the size of the index, but I'm not entirely sure whether the same formula holds for indexes.

Any pointers?

Best Answer

Not sure whether this is actually correct, but I'll use it as an estimate until I figure out something better:

CREATE OR REPLACE FUNCTION GetInxSize(indschema VARCHAR(128), indname VARCHAR(128))
RETURNS INTEGER
SPECIFIC GETINXSIZE
READS SQL DATA
DETERMINISTIC
NO EXTERNAL ACTION
    RETURN
        SELECT SUM(
                    CASE c.TYPENAME
                       WHEN 'SMALLINT'        THEN length
                       WHEN 'INTEGER'         THEN length
                       WHEN 'BIGINT'          THEN length
                       WHEN 'REAL'            THEN length
                       WHEN 'DOUBLE'          THEN length
                       WHEN 'DECFLOAT'        THEN length
                       WHEN 'DECIMAL'         THEN TRUNC(length / 2) + 1
                       WHEN 'CHARACTER'       THEN 4*length
                       WHEN 'VARCHAR'         THEN 4*length + 4
                       WHEN 'GRAPHIC'         THEN length * 2
                       WHEN 'VARGRAPHIC'      THEN length * 2 + 4
                       WHEN 'XML'             THEN 4*c.inline_length + 3
                       WHEN 'DATE'            THEN length
                       WHEN 'TIME'            THEN length
                       WHEN 'TIMESTAMP'       THEN length
                       ELSE raise_error('78000', 'Unknown type')
                    END +
                       CASE WHEN NULLS = 'Y' THEN 1 ELSE 0 END
                 )
        FROM SYSCAT.INDEXES i
        JOIN SYSCAT.INDEXCOLUSE ic
            ON i.indschema = ic.indschema
            AND i.indname = ic.indname
        JOIN SYSCAT.COLUMNS c
            ON i.tabschema = c.tabschema
            AND i.tabname = c.tabname
            AND ic.colname = c.colname
        WHERE i.indschema = GetInxSize.indschema
          AND i.indname = GetInxSize.indname
@

The DB:s in question don't use compression, and I also skipped types that can not be part of an index.

SELECT rtrim(indschema) || '.' || rtrim(indname)
FROM SYSCAT.INDEXES i 
JOIN SYSCAT.TABLES t 
    ON i.tabschema = t.tabschema 
    AND i.tabname = t.tabname 
JOIN SYSCAT.TABLESPACES ts 
    ON coalesce(t.index_tbspace, t.tbspace) 
WHERE t.tabschema not like 'SYS%' 
  AND getinxsize(indschema, indname) > ts.pagesize / 4;

alt:

SELECT distinct rtrim(t.tabschema) || '.' || rtrim(t.tabname)
FROM SYSCAT.INDEXES i 
JOIN SYSCAT.TABLES t 
    ON i.tabschema = t.tabschema 
    AND i.tabname = t.tabname 
JOIN SYSCAT.TABLESPACES ts 
    ON coalesce(t.index_tbspace, t.tbspace) = ts.tbspace 
WHERE t.tabschema not like 'SYS%' 
  AND getinxsize(indschema, indname) > ts.pagesize / 4;

EDIT: Example of XML column in index

A table like:

CREATE TABLE TESTING_APA.ATOM_FEED  
( FEED_XML XML 
, FEED_XML_IS_NULL SMALLINT NOT NULL 
      GENERATED ALWAYS AS (CASE WHEN FEED_XML IS NULL THEN 1 ELSE 0 END) 
) IN USERSPACE1
ORGANIZE BY ROW;

Generates two indexes (not sure why though) where type name of the column is XML:

select tabschema, tabname, indname, colnames 
from syscat.indexes 
where indname in ('SQL190304085605540','SQL190304085605550')

TESTING_APA  ATOM_FEED  SQL190304085605540  +FEED_XML                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       
TESTING_APA  ATOM_FEED  SQL190304085605550  +FEED_XML

In addition, explicit xml indexes also result in indexes in the catalog whose column type is XML.

Since the source database is not UTF-8, there can't be any JSON columns. For a UTF-8 database, it may be worthwhile to investigate whether indexes for JSON comes into play. The function GetInxSize is however specially designed to handle non-utf -> utf, so it will have to be changed for utf-8 databases.

Related Solutions

DB2: Is it good to create the LONG table space to use 32 KB data page

Check out the general answer I gave to you on your other question.

If you are storing your XML as a LOB, then I guess I'd see what is the common average size per table. If the average size is 4MB or less (which it sounds like), then you'd only need a 4K tablespace (see Space requirements for table, table 1).

If you are storing the XML as an XML data type in DB2, then you actually get some performance gains over LOB, especially if you inline the XML (and aren't doing updates to it all that often). This may be the best since you mention sizes less than 32K. You may wish to check out XML storage page for in-lining XML within the table vs outside for XML storage.

Db2ckbkp – How to determine what tablespaces are temp-tablespace

The tablespace content type (Datatype in your second example) is defined in sqlutil.h:

/* table space contents                                                       */
#define SQLB_TBS_ANY 0x0               /* All permanent data. Regular table   */
                                       /* space.                              */
#define SQLB_TBS_LARGE 0x20            /* All permanent data. Large table     */
                                       /* space.                              */
#define SQLB_TBS_LONG SQLB_TBS_LARGE   /* Kept for compatibility.             */
#define SQLB_TBS_SYSTMP 0x10           /* system temp data                    */
#define SQLB_TBS_USRTMP 0x40           /* user temp data                      */

So those with Datatype 16 (0x10) would be the system temporary tablespaces I should think.

However, I don't think using the number of non-system temporary tablespaces to determine restore parallelism makes much sense. I would suggest leaving it for DB2 to decide. If you still want to outsmart DB2, you might be better off using the number of CPU cores, I/O parallelism, or the number of containers as the basis of your adjustments.

Best Answer

Related Solutions

DB2: Is it good to create the LONG table space to use 32 KB data page

Db2ckbkp – How to determine what tablespaces are temp-tablespace

Related Question