Which parameters and metrics to look at when CDC based replication is causing high load

change-data-capture, oracle, tungsten-replicator

I'm investigating the reasons why a Tungsten Replicator extractor is causing heavy load on an Oracle 11g database. The replication is set up to use synchronous CDC and there is no unusual load while the extractor is not running, which leads me to believe that the triggers registered by the CDC package are working just fine, without causing any significant load.

When the extractor is running, the load average on the database server can spike as high as 60, though it isn't constantly at that level; the variance most likely tracks the volume of writes.

What configuration parameters and/or metrics of the database should I look into to identify the cause of this load? Tungsten seems to be simply running standard CDC procedures as outlined below.

Initial preparation steps of the extractor:

  1. Get a list of tables to extract from a table prepared by a setup script:

    SELECT * FROM TUNGSTEN_SOURCE_TABLES
    
  2. For each source table, get a list of columns to extract:

    SELECT
      UNIQUE CHANGE_SET_NAME,
      PUB.COLUMN_NAME,
      PUB_ID,
      COL.COLUMN_ID
    FROM TUNGSTEN_PUBLISHED_COLUMNS PUB, ALL_TAB_COLUMNS COL
    WHERE
      SOURCE_SCHEMA_NAME = '{SERVICE_NAME}' AND
      SOURCE_TABLE_NAME = '{TABLE_NAME}' AND
      SOURCE_SCHEMA_NAME = COL.OWNER AND
      SOURCE_TABLE_NAME = COL.TABLE_NAME AND
      PUB.COLUMN_NAME = COL.COLUMN_NAME AND
      CHANGE_SET_NAME = 'TUNGSTEN_CS_{SERVICE_NAME}'
    ORDER BY COL.COLUMN_ID
    
  3. Create a subscription with DBMS_CDC_SUBSCRIBE.CREATE_SUBSCRIPTION
  4. Create a subscriber view for each source table with DBMS_CDC_SUBSCRIBE.SUBSCRIBE
  5. Activate the subscription with DBMS_CDC_SUBSCRIBE.ACTIVATE_SUBSCRIPTION (a PL/SQL sketch of steps 3–5 follows this list)
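
For reference, steps 3–5 are plain DBMS_CDC_SUBSCRIBE calls. Below is a minimal PL/SQL sketch, not Tungsten's actual code; the subscription name, schema, table, column list, and view name are made-up placeholders that follow the naming pattern above:

    BEGIN
      -- Step 3: one subscription per change set (all names below are placeholders)
      DBMS_CDC_SUBSCRIBE.CREATE_SUBSCRIPTION(
        change_set_name   => 'TUNGSTEN_CS_MYSERVICE',
        description       => 'Tungsten extractor subscription',
        subscription_name => 'TUNGSTEN_SUB_MYSERVICE');

      -- Step 4: one subscriber view per source table, limited to the published columns
      DBMS_CDC_SUBSCRIBE.SUBSCRIBE(
        subscription_name => 'TUNGSTEN_SUB_MYSERVICE',
        source_schema     => 'MYSCHEMA',
        source_table      => 'MYTABLE',
        column_list       => 'COLUMN1, COLUMN2',
        subscriber_view   => 'VW_TUNGSTEN_MYSERVICE1');

      -- Step 5: activate once all views are registered; capture starts from here
      DBMS_CDC_SUBSCRIBE.ACTIVATE_SUBSCRIPTION(
        subscription_name => 'TUNGSTEN_SUB_MYSERVICE');
    END;
    /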

Inner loop of the extractor while it's running:

  1. Extend the subscription window with DBMS_CDC_SUBSCRIBE.EXTEND_WINDOW
  2. Select each subscription view to fetch any changes

    -- Receiving changes for the first time
    SELECT
      COLUMN1, COLUMN2, ...,
      CSCN$,
      COMMIT_TIMESTAMP$,
      OPERATION$
    FROM VW_TUNGSTEN_{SERVICE_NAME}{INDEX}
    ORDER BY CSCN$, RSID$
    
    -- Receiving changes after the first window
    SELECT
      COLUMN1, COLUMN2, ...,
      CSCN$,
      COMMIT_TIMESTAMP$,
      OPERATION$
    FROM VW_TUNGSTEN_{SERVICE_NAME}{INDEX}
    WHERE CSCN$ > :lastSCN
    ORDER BY CSCN$, RSID$
    
  3. Purge the subscription window with DBMS_CDC_SUBSCRIBE.PURGE_WINDOW (the whole loop is sketched in PL/SQL below)
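
In PL/SQL terms, one pass of that loop looks roughly like the sketch below; the subscription name is a placeholder, and step 2 is the SELECTs already shown above, executed by the extractor over JDBC:

    BEGIN
      -- Step 1: make the changes captured since the last pass visible
      -- in the subscriber views
      DBMS_CDC_SUBSCRIBE.EXTEND_WINDOW(
        subscription_name => 'TUNGSTEN_SUB_MYSERVICE');

      -- Step 2 happens outside PL/SQL: the extractor runs the SELECTs
      -- shown above against each VW_TUNGSTEN_* subscriber view.

      -- Step 3: drop the processed changes from the window so Oracle
      -- can eventually purge them from the change tables
      DBMS_CDC_SUBSCRIBE.PURGE_WINDOW(
        subscription_name => 'TUNGSTEN_SUB_MYSERVICE');
    END;
    /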

Environment:

  • Oracle Database 11g Release 11.1.0.6.0 – Production
    • According to my client, this box is 5 years old
    • open_mode: read write
    • log_mode: archivelog
    • platform_id: 10
    • platform_name: Linux IA (32-bit)
  • Tungsten Replicator 3.0.0

Best Answer

I wasn't able to tune the Oracle-side parameters because I only had limited access to the database instance, but the database load did improve after stripping down the set of tables and columns being replicated.

1. Determine which tables are getting updated the most.

Try running the extractor for a few hours and let the THL grow. Then, you can count the number of changes logged for each table using a command like:

    ./bin/thl list | gawk -f thl_summary.awk | sort -n -r

Where thl_summary.awk is:

    #!/usr/bin/gawk -f
    # Sum up per-table row counts from `bin/thl list` output.
    BEGIN {
        OFS = "\t"
    }
    # Remember which table the THL event lines that follow belong to.
    match($0, /- TABLE = (.*)/, a) {
        table = a[1]
    }
    # Every "- ROW# = ..." line is one changed row for that table.
    /- ROW# = / {
        rows[table]++
    }
    END {
        # Print "<count><TAB><table>", ready for `sort -n -r`.
        for (k in rows) {
            print rows[k], k
        }
    }

2. Replicate only those tables and columns that are necessary.

Looking at the list of tables, ask yourself if you can exclude any of them.

  • Exclude CLOB columns (a query for spotting them follows this list).
    • They aren't really supported by synchronous CDC.
    • Even if you specify only a subset of the columns in DBMS_CDC_PUBLISH.CREATE_CHANGE_TABLE, each and every change to the source table is still logged to the change table [1]. Excluding CLOB columns at least avoids the costly LOB writes and reads.
  • Exclude all tables that are not really needed on the downstream database.
    • If necessary, add a fallback table with just the subset of columns you really need, even if it means denormalizing a bit.
    • The way Tungsten configures CDC, a single update to a source row generates two rows in the corresponding change table [2]. These extra writes add up quickly on a frequently updated table.
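
A quick way to spot those LOB columns is a dictionary query along these lines. It only reuses TUNGSTEN_PUBLISHED_COLUMNS and ALL_TAB_COLUMNS from the extractor query above, but double-check the column names against your own setup:

    SELECT DISTINCT
      COL.OWNER,
      COL.TABLE_NAME,
      COL.COLUMN_NAME,
      COL.DATA_TYPE
    FROM TUNGSTEN_PUBLISHED_COLUMNS PUB, ALL_TAB_COLUMNS COL
    WHERE
      PUB.SOURCE_SCHEMA_NAME = COL.OWNER AND
      PUB.SOURCE_TABLE_NAME = COL.TABLE_NAME AND
      COL.DATA_TYPE IN ('CLOB', 'NCLOB', 'BLOB')
    ORDER BY COL.OWNER, COL.TABLE_NAME, COL.COLUMN_NAME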

Once you've determined which tables and columns to replicate, you can specify them in a file named tungsten.tables:

    aSubsetOfTheTable<TAB>column1,...,columnN
    wholeTable

Then re-run the setupCDC.sh script after setting the following in setupCDC.conf:

    specific_tables=1

[1] https://community.oracle.com/thread/2288254

[2] Updated rows will generate a record with operation$='UU' ... and all the old values, and another record with operation$='UN' or 'UL' with new values. (source)