DB2 Instances per AIX LPAR – Best Practices for Balance

best practicesdb2instance

I have a question that has been bugging me, but I just don't know how to validate it. We run DB2 LUW on AIX in our shop. Our AIX is currently at 7.1. Our DB2 is at 9.7 FP 4 for most installations and 10.5 FP 2 for our newest databases. We even have mixed installations (ie, 9.7 and 10.5) on the same "boxes".

Our boxes are LPARs of course, given to us by our AIX admins. The AIX admins are also our storage admins and so all disk is SAN. There is not much physical about our database servers.

What has been bugging me is what is the correct balance of DB2 instances per LPAR? I have started to see some strange behavior regarding DB2 auto-configuring itself. On one DEV LPAR we have the following configurations:

Do note that the following is true for all instances: HEALTH_MON is set to OFF, and AUTO_MAINT is set to OFF for every database. We do allow Self-Tuning Memory Manager (STMM) to be on by default with all necessary parameters set for that (ie, zeros or AUTOMATIC in the correct parameters, the default that DB2 provids…). If we build an OLTP database then we set instance parm MAX_QUERYDEGREE to 1 and leave INTRA_PARALLEL to NO and db parm DFT_DEGREE to 1. If we build an OLAP database (separate instance) we set instance parm MAX_QUERYDEGREE to ANY and INTRA_PARALLEL to YES and the db parm DFT_DEGREE to ANY. We also do not try to mix OLAP and OLTP databases in the same instance, though part of the question is whether to mix the instances on the same LPAR…..

Here is what we have on one DEV LPAR:

  1. First instance is 9.7. Contains some OLAP databases, though they are very small since they are used for testing out DDL changes.
  2. Second instance is 9.7. Contains an OLAP database "playground" for developers. (This one is supposed to be retired the first chance we get.)
  3. Third instance is 10.5. Contains the official DEV version of that playground database from instance 2.
  4. Fourth instance is 10.5. OLTP database with a home grown metadata dictionary.
  5. Fifth instance is 10.5. OLTP database that is used for lookup data.
  6. Sixth instance is 10.5. ODS database (unsure if this is OLTP or OLAP….) for developers to tie data from two transactional systems.

Now disk isn't the biggest issue as that is handled by the SAN. And even memory and CPU are virtuallized by nature of this being an LPAR. But I have seen some strange stuff with the memory.

The one OLAP database (both the playground and the official DEV) are rather large. The developers are testing ETL'ing in large volumes of data and estimating size for their official request for QAT and PROD buildouts. Ever since one of these has started to grow, I've noticed that new instance buildouts and new database buildouts now have STMM turned off by default. I can manually turn it back on, but the point being I find this disturbing. Whether or not STMM is enabled by default is decided by DB2 at creation time based on system resources. So, to me, this is saying DB2 does not feel it has adequate resources for auto management, so it is trying to cap memory in hopes of it gains at least that much. (My guess on that.)

When I discuss this with the AIX admins, they don't seem to think memory and CPU is an issue. They say we aren't really fulling utilizing what we could be, so in their mind (and I'm saying this very loosely) we could put all our dev instances on the same LPAR. Or similar for other environments. Again, that is loose, because they understand the point of separation for HA and DR scenarios.

Anyway that leaves me wondering, are their any pointers or guidelines between knowing when you can add instances to an LPAR or when you should split to other LPARs? I'm sure the answer is "it depends", but I'm looking for any approximations or guidelines? Especially if those differ along the lines of OLTP and OLAP.

Best Answer

You won't get any specific answer to this question, because as you observed, it depends.

The answer depends on your company's tolerance for having one DB2 instance adversely impact the performance of another's. You may be able to control or limit this impact if your AIX admins are willing to learn and set up WLM policies at the operating system level (not DB2 WLM).

This is really a question of capacity management. Understanding the workloads in your environment is key here, and being able to identify (or predict) when you'll run into limits will influence your decisions about how many instance(s) to put on a single server.