Db2 – Isolation Level for a Data Warehouse

data-warehousedb2isolation-level

We are building out a data warehouse (and also some data marts as well) in our organization. My DBA skills have mostly been on OLTP type applications, but I am moving into supporting OLAP for our organization.

We ran into an issue recently where one of our ETL programs (in this case it happens to be a Message Driven Bean written in Java) is using the Repeatable Read (RR) isolation level. Because of this, they were running into locks. I obviously question why RR was being used. (EDIT: I originally thought RR on the DB2 side, but learned this was RR from the JDBC side, which maps to RS on the DB2 side. Since then we have also tried RC on the JDBC side, which is CS on the DB2 side. That appears to have resolved some of the locking issues.)

So I started to research what should be the suggested isolation levels for data warehousing, both from the feed/source side, as well as from the retrieval side.

I really can't find much. I have found a few articles that mentioned Uncommitted Read (UR) being good for the query/retrieval side, but nothing about the feeding side. I was thinking that Cursor Stability (CS) or at the highest Read Stability (RS) should be used. I would side with CS, but perhaps that is tainted by OLTP thinking?

In our case the main feed for our warehouse is right now done "real-time" using Java message queues. Retrieval from the warehouse will be done with ETL tool (DataStage in our case), probably kicked off as nightly jobs (some for daily reports, others for monthly, or quarterly reports). These reports will be based off of our marts. So while the warehouse might be fed "real-time", it won't be read from, or actively queried over (at least not right now).

Granted, we use DB2 9.7 FP 4 (and we are hoping to migrate to 10.1 sometime next year) and some of the specifics may differ per vendor, but what isolation levels do you warehouse DBA's recommend for feeds and reads and why?

Also, any input on isolation levels for marts would be appreciated to.

EDIT: Just so it is easier for other vendor comparisons. Here is the table of DB2 isolation level names as mapped to the JDBC isolation
levels. (This per the book Understanding DB2: Learning Visually By
Examples
and thus copyright IBM.)

+------------------------------+-----------------------+
|            JDBC              |          DB2          |
+------------------------------+-----------------------+
| TRANSACTION_READ_UNCOMMITTED | Uncommitted Read (UR) |
+------------------------------+-----------------------+
| TRANSACTION_READ_COMMITTED   | Cursor Stability (CS) |
+------------------------------+-----------------------+
| TRANSACTION_REPEATABLE_READ  | Read Stability   (RS) |
+------------------------------+-----------------------+
| TRANSACTION_SERIALIZABLE     | Repeatable Read  (RR) |
+------------------------------+-----------------------+

Best Answer

For most databases including Data Warehouses Read Committed is a sufficient isolation level. I wouldn't use Read Uncommitted due to the possibility of incorrect answers. On SQL Server we can also add Read Committed Snapshot Isolation. This versions rows and avoids queries being blocked by updates. Don't know if you have something similar on DB2.