We work with DB2 LUW (specifically we are currently on 9.7 FP4) running on AIX.
IBM best practices recommend that when you build a table, you place the data, the indexes, and LOB/LONG data each in their own tablespace. (The reason is better control over disk layout, maintenance, backups, etc.)
Tablespaces have to be associated with bufferpools. Right now most of our tables and indexes fit into 4K pages, so we use 4K bufferpools and tablespaces.
Normally with our scripts we set up a 4K bufferpool, BP4K. We then set up at least two 4K tablespaces (as we don't have a lot of LOBs): TS_DAT_4K for data and TS_IND_4K for indexes.
By default we have just been assigning both of these tablespaces to bufferpool BP4K.
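In DDL terms, the setup looks roughly like this (a sketch only — the SIZE values, storage clause, and the example table are placeholders, not our actual scripts):

```sql
-- 4K bufferpool; SIZE is in pages and is a made-up number here
CREATE BUFFERPOOL BP4K SIZE 50000 PAGESIZE 4K;

-- Both tablespaces currently share the one bufferpool
CREATE TABLESPACE TS_DAT_4K
  PAGESIZE 4K
  MANAGED BY AUTOMATIC STORAGE
  BUFFERPOOL BP4K;

CREATE TABLESPACE TS_IND_4K
  PAGESIZE 4K
  MANAGED BY AUTOMATIC STORAGE
  BUFFERPOOL BP4K;

-- Data and indexes are then split at CREATE TABLE time
CREATE TABLE myschema.example_tab (id INTEGER NOT NULL)
  IN TS_DAT_4K
  INDEX IN TS_IND_4K;
```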
What I have wondered: Since indexes are in their own tablespace, should they also be given their own bufferpool?
My reason for wondering this is based on optimization. If indexes have their own bufferpool, then they are more likely to remain in memory (rather than being pushed out of memory due to table records being read in). This would allow for faster scanning of indexes to find the records in the tables. And since indexes would no longer share the same bufferpool with the tables, more of the tables can now remain in memory for logical retrieval versus a physical read. So my thought is that this would reduce physical I/O and thus help with the performance/maintenance of the database.
I also can't help but wonder if this is just pre-optimization, which is 99% of the time a bad thing to do, introducing extra overhead, etc. (especially since we haven't identified a need for separate bufferpools based on current performance — and of course the application under development isn't in production yet and still needs database tuning…).
Thoughts on this? Is it a best practice, or just pre-optimization and over-thinking?
Best Answer
Let's run two hypothetical databases (H1 and H2) in parallel, with the same total amount of RAM for bufferpools (R). H1 has a single bufferpool of size R. H2 has two bufferpools: one of size I for index pages, the other of size D for data pages (and D + I == R, of course).

The question is: can you choose I and D so that H2 performs better than H1?

My answer is that you can't, in general. The database engine for H1 has more room for optimizing its cache than H2. If there are times of day where more index pages would lead to better performance, it can discard unused data pages and have a "mostly index pages" cache. If at a later time data pages become hotter, it can evict more index pages and have a "mostly data pages" cache. H2 can't do that. Once it has I pages' worth of index pages cached, it can't cache more even if that's what would be best right now. It's stuck there with a sub-optimal use of RAM.

The only way for H2 to perform just as well as H1 is if the D/I split initially chosen is ideal, and the workload is very stable. That sure can happen, but I'd wager it is not a very usual database scenario. If that's not your case, think of H1 as being the same as H2, but with dynamic partitioning of the cache between data and index pages managed directly by the thing that knows the most about how it needs to optimize I/O (i.e. the database engine).

This is not to say that maintaining different bufferpools is never a good idea.
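As an aside, the fixed-split penalty can be illustrated with a toy LRU simulation (purely illustrative — this is not DB2's actual page-replacement algorithm, and the pool sizes and workload are made up):

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU page cache that counts misses (simulated physical reads)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()
        self.misses = 0

    def access(self, page):
        if page in self.pages:
            self.pages.move_to_end(page)        # hit: now most recently used
        else:
            self.misses += 1                    # miss: simulated physical read
            if len(self.pages) >= self.capacity:
                self.pages.popitem(last=False)  # evict least recently used
            self.pages[page] = True

# Workload: an index-heavy phase, then a data-heavy phase.
index_pages = [f"i{n}" for n in range(6)]
data_pages = [f"d{n}" for n in range(6)]
workload = index_pages * 3 + data_pages * 3

# H1: one unified pool holding 8 pages.
h1 = LRUCache(8)
for page in workload:
    h1.access(page)

# H2: the same 8 pages split 4/4 between an index pool and a data pool.
h2_index, h2_data = LRUCache(4), LRUCache(4)
for page in workload:
    (h2_index if page.startswith("i") else h2_data).access(page)

print("H1 misses:", h1.misses)                          # 12
print("H2 misses:", h2_index.misses + h2_data.misses)   # 36
```

With this (contrived) phased workload, the unified pool adapts to whichever page type is hot, while each fixed half keeps thrashing; with a perfectly stable workload and an ideal split, the two would tie.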
One scenario for isolating specific pages in a specific cache I've encountered is having a critical "report" that needed to run fast (obviously) at all times, and happened to use a few tables that were pretty much never used elsewhere. So those pages kept getting evicted, and that "report" had runtimes that varied a lot from run to run. Moving a specific set of tables (and their indexes) to a specific pool removed much of the "non-deterministic" performance part.
But that's sub-optimal for the database as a whole — more of a kludge, closer to voodoo optimization IMO. (And that wasn't on DB2, but I believe that's irrelevant here.)
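If you do end up carving out such a pool, the DDL is straightforward (a sketch — names and SIZE are placeholders):

```sql
-- Dedicated pool and tablespace for the "report" tables
CREATE BUFFERPOOL BP4K_RPT SIZE 25000 PAGESIZE 4K;

CREATE TABLESPACE TS_RPT_4K
  PAGESIZE 4K
  MANAGED BY AUTOMATIC STORAGE
  BUFFERPOOL BP4K_RPT;

-- Existing tables can then be relocated into TS_RPT_4K
-- (e.g. via SYSPROC.ADMIN_MOVE_TABLE, available in 9.7).
```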
So my advice is: if you have X GB of RAM available for cache, use a single bufferpool, put everything in it, and let the database engine do its magic in there.
If you run into a corner case that would appear to benefit from cache segregation, try it out, benchmark it, think about the overhead of having to maintain the magic numbers for each cache size, and go tune the queries, schema or disk layout instead :)
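For the benchmarking part, one way to compare pool configurations on DB2 9.7 is the MON_GET_BUFFERPOOL table function — logical reads satisfied from the pool vs. physical reads per bufferpool (a sketch; adapt the columns to taste):

```sql
-- Logical vs. physical reads per bufferpool; a high logical:physical
-- ratio means the pool is absorbing most of the I/O.
SELECT bp_name,
       pool_data_l_reads,  pool_data_p_reads,
       pool_index_l_reads, pool_index_p_reads
FROM TABLE(MON_GET_BUFFERPOOL(NULL, -2)) AS t;
```

Snapshot the counters before and after a representative workload run under each configuration, and compare the deltas rather than the raw totals.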