Db2 – Should indexes automatically be placed into their own bufferpool

Tags: best-practices, buffer-pool, db2, index, performance

We work with DB2 LUW (specifically we are currently on 9.7 FP4) running on AIX.

IBM's best practices recommend that when you build a table, you place the data, the indexes, and LOB/LONG data each in its own tablespace. (The reason is better control over disk layout, maintenance, backups, etc.)

Tablespaces have to be associated with bufferpools. Right now most of our tables and indexes fit into 4K bufferpools and tablespaces.

Normally with our scripts we set up a 4K bufferpool, BP4K. We then set up at least two 4K tablespaces (as we don't have a lot of LOBs): TS_DAT_4K for data and TS_IND_4K for indexes.
By default we have just been assigning both of these tablespaces to bufferpool BP4K.
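
For context, our setup scripts boil down to roughly the DDL below (the pool size and the sample table are illustrative placeholders, not our real definitions):

```sql
-- One shared 4K bufferpool
CREATE BUFFERPOOL BP4K SIZE 50000 PAGESIZE 4K;

-- Data and index tablespaces both point at the same pool
CREATE TABLESPACE TS_DAT_4K PAGESIZE 4K
    MANAGED BY AUTOMATIC STORAGE BUFFERPOOL BP4K;
CREATE TABLESPACE TS_IND_4K PAGESIZE 4K
    MANAGED BY AUTOMATIC STORAGE BUFFERPOOL BP4K;

-- Tables then split their data and indexes across the two tablespaces
CREATE TABLE myschema.orders (
    order_id   INTEGER NOT NULL PRIMARY KEY,
    customer   VARCHAR(50),
    order_date DATE
) IN TS_DAT_4K INDEX IN TS_IND_4K;
```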

What I have wondered: Since indexes are in their own tablespace, should they also be given their own bufferpool?

My reason for wondering this is optimization. If indexes have their own bufferpool, they are more likely to remain in memory rather than being pushed out by table records being read in. That would allow faster index scans to find the records in the tables. And since the indexes would no longer share a bufferpool with the tables, more of the table pages could stay in memory for logical retrieval instead of a physical read. So my thought is that this would reduce physical I/O and thus help the performance/maintenance of the database.
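
Concretely, the change I'm considering would look something like this (BP4K_IDX and its size are hypothetical; memory given to it would have to be taken away from BP4K):

```sql
-- Hypothetical: a separate 4K pool just for index pages
CREATE BUFFERPOOL BP4K_IDX SIZE 20000 PAGESIZE 4K;

-- Re-point the index tablespace; data pages stay in BP4K
ALTER TABLESPACE TS_IND_4K BUFFERPOOL BP4K_IDX;
```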

I also can't help wondering if this is just premature optimization, which is 99% of the time a bad thing to do, introducing extra overhead, etc., especially since we haven't identified a need for separate bufferpools based on current performance. (Of course, the application under development isn't in production yet either and still needs database tuning….)

Thoughts on this? Is this a best practice, or just premature optimization and over-thinking?

Best Answer

Let's run two hypothetical databases (H1 and H2) in parallel, with the same total amount of RAM for bufferpools (R).

  • Let H1 have a single bufferpool of size R.
  • Let H2 have two bufferpools: one of size I for index pages, the other of size D for data pages. (And D+I==R of course.)

The question is:

  • How do you determine the right values for I and D to make H2 perform better than H1?

My answer is that, in general, you can't. The database engine for H1 has more room for optimizing its cache than H2's. If there are times of day when more index pages would lead to better performance, it can discard unused data pages and keep a "mostly index pages" cache. If, at a later time, data pages become hotter, it can evict more index pages and keep a "mostly data pages" cache.

H2 can't do that. Once it has I pages' worth of index pages cached, it can't cache more even if that is what would be best right now. It's stuck with a sub-optimal use of RAM.

The only way for H2 to perform just as well as H1 is if the D/I split initially chosen is ideal and the workload is very stable. That can certainly happen, but I'd wager it is not a very common database scenario. If that's not your case, think of H1 as being the same as H2 but with dynamic partitioning of the cache between data and index pages, managed directly by the thing that knows the most about how it needs to optimize I/O (i.e. the database engine).

This is not to say that maintaining different bufferpools is never a good idea.
One scenario for isolating specific pages in a specific cache that I've encountered is a critical "report" that needed to run fast (obviously) at all times, and that happened to use a few tables that were pretty much never used elsewhere. Those pages kept getting evicted, so the "report" had runtimes that varied a lot from run to run. Moving that specific set of tables (and their indexes) to a dedicated pool removed much of the "non-deterministic" part of its performance.
But that's sub-optimal for the database as a whole, more of a kludge and closer to voodoo optimization IMO. (And that wasn't on DB2, but I believe that's irrelevant here.)
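
If you do end up needing that kind of isolation, the mechanics in DB2 9.7 are straightforward. A sketch with made-up names (BP_REPORT, TS_REPORT, MYSCHEMA.REPORT_FACTS), using the ADMIN_MOVE_TABLE procedure to relocate a table online:

```sql
-- Dedicated pool and tablespace for the report's hot tables (names are hypothetical)
CREATE BUFFERPOOL BP_REPORT SIZE 10000 PAGESIZE 4K;
CREATE TABLESPACE TS_REPORT PAGESIZE 4K
    MANAGED BY AUTOMATIC STORAGE BUFFERPOOL BP_REPORT;

-- Move one report table online; its data, indexes and LOBs all land in TS_REPORT
CALL SYSPROC.ADMIN_MOVE_TABLE(
    'MYSCHEMA', 'REPORT_FACTS',
    'TS_REPORT', 'TS_REPORT', 'TS_REPORT',
    '', '', '', '', '', 'MOVE');
```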

So my advice is: if you have X GB of RAM available for cache, use a single bufferpool, put everything in it, and let the database engine do its magic in there.
If you run into a corner case that appears to benefit from cache segregation, try it out, benchmark it, think about the overhead of maintaining the magic numbers for each cache size, and then go tune the queries, schema, or disk layout instead :)
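
And for the "benchmark it" part: on 9.7 you can watch how each pool actually behaves with the MON_GET_BUFFERPOOL table function. A rough per-pool hit-ratio query (what threshold counts as "good enough" is up to you) might look like:

```sql
-- Logical vs. physical reads per bufferpool, with an overall hit ratio
SELECT VARCHAR(bp_name, 20)                         AS bufferpool,
       pool_data_l_reads,  pool_data_p_reads,
       pool_index_l_reads, pool_index_p_reads,
       CASE WHEN pool_data_l_reads + pool_index_l_reads > 0
            THEN DECIMAL(100 * (1 - DOUBLE(pool_data_p_reads + pool_index_p_reads)
                                  / DOUBLE(pool_data_l_reads + pool_index_l_reads)), 5, 2)
       END                                          AS hit_ratio_pct
FROM TABLE(MON_GET_BUFFERPOOL(NULL, -2)) AS t;
```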