Db2 performance issue – large number of IOPS how to determine problem

db2

DB2 database is on storage. We are using 9.7
Storage administrators reported that today database was doing 5000 IOPS (during past days average number was 300 IOPS). Those are disk operations from server to disk system.
db2data is currently on 89%.
Today we did not do any large upload.
How is this possible what happened to cause this problem?

In db2diag log only one ERROR is seen today:

2014-02-21-12.07.26.669877+060 E91992479E1392      LEVEL: Error
PID     : 22867                TID  : 1084623168   PROC : db2acd 0
INSTANCE: ctginst1             NODE : 000
FUNCTION: DB2 UDB, Health Monitor, HealthIndicator::update, probe:500
MESSAGE : ADM10500E  Health indicator "Database Automatic Storage Utilization" 
          ("db.auto_storage_util") breached the "upper" alarm threshold of "90 
          %" with value "94 %" on "database" "ctginst1.MAXDB71 ".  Calculation: 
          "((db.auto_storage_used/db.auto_storage_total)*100);" = 
          "((30921187328 / 32821358592) * 100)" = "94 %".  History (Timestamp, 
          Value, Formula): " (02/21/2014 11:57:26.209607, 89, ((29312966656 / 
          32821358592) * 100)), (02/21/2014 11:47:26.676942, 89, ((29311541248 
          / 32821358592) * 100)), (02/21/2014 11:37:26.426393, 89, 
          ((29324247040 / 32821358592) * 100)), (02/21/2014 11:27:26.984178, 90,
          ((29492969472 / 32821358592) * 100)), (02/21/2014 11:17:26.520087, 90,
          ((29499133952 / 32821358592) * 100)), (02/21/2014 11:07:26.945333, 93,
          ((30552121344 / 32821358592) * 100)), (02/21/2014 10:57:26.657101, 92,
          ((30317146112 / 32821358592) * 100)), (02/21/2014 10:47:26.264625, 92,
          ((30147538944 / 32821358592) * 100)), (02/21/2014 10:37:26.876803, 91,
          ((29968994304 / 32821358592) * 100))"

Any help is welcome. How to determine what happened? Should we restart database to solve this?

Best Answer

The message you include from your db2diag.log shows that there is a potential space problem (i.e. the file system holding your data is filling up). The monitoring data here shows that you are ranging between 89% and 94% over a 90 minute period with increases and decreases – this is probably associated with system temporary tables in the database. It may be worthwhile to increase the size of the filesystem(s) holding the data since you're getting very close to running out of space.

This space utilization, however, has nothing to do with how much I/O activity (IOPS) the database is generating.

Generally when you have an I/O problem where you see a much different usage pattern than normal. You mention that normally there the system is generating 300 IOPS, but lately it's been generating 5000 IOPS.

Assuming that the I/O activity can be traced to the file system(s) holding DB2 data, then you need to monitor the database during these periods of high I/O to find out what queries are active and are causing lots of physical I/O, and then do some investigation to figure out why. You can use the db2top utility as well as many other monitoring tools to figure this out.

At that point it's a tuning exercise. Are your bufferpools too small? Was an index dropped? Or is the cause a new query that needs to be reviewed / tuned?