How to reduce the Tuple Mover’s disk I/O impact for mergeouts

vertica

I have a cluster with a constant trickle load of new data fed by a stream of large batch COPY statements; at the same time it must support interactive read queries against the most recently added data, e.g. time series for chart data.

How can I control the Tuple Mover so that it spreads its work over a longer period of time, to reduce the heavy disk I/O it periodically causes, which slows read-query performance to a crawl? The data is partitioned on a date column, so every 24 hours precisely (GMT) there is a flurry of disk I/O due to partition mini-ROS to ROS consolidation. I set ActivePartitions to 2 long ago, which reduced the problem, but there is still a significant slowdown while the mergeout is hogging the spindles.
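
For reference, I set it roughly like this (on my version the parameter is exposed as ActivePartitionCount; the name may differ on yours):

    -- limit mergeout's attention to the 2 most recently loaded partitions
    SELECT SET_CONFIG_PARAMETER('ActivePartitionCount', 2);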

It seems like the resource pool settings ought to help, but I can't find a setting that has an effect. Any suggestions?

Best Answer

What you need to do is work out your usual load size and set the size of your TM resource pool accordingly. Setting ActivePartitions to 2 will help, because the Tuple Mover stops looking for older partitions to consolidate.
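
For example, something along these lines (a sketch only: the built-in Tuple Mover pool is named tm, but the sizes below are placeholders you should derive from your own load volume):

    -- Sketch: resize the built-in TM pool; values are placeholders.
    ALTER RESOURCE POOL tm
        MEMORYSIZE '4G'        -- enough memory per thread for your typical ROS sizes
        PLANNEDCONCURRENCY 4   -- concurrent moveout/mergeout threads
        MAXCONCURRENCY 5;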

Answer these questions (the queries sketched after the list can help you check):

  • What is the size of your TM pool?
  • How do you do your trickle loads? (Do they go to WOS or ROS?)
  • How many partitions do you have in the table?
  • How many projections do you have for that table? (Do you use them all?)
  • Do you run deletes on this data? (Check for replay deletes.)
  • Does your projection use the default sort order (all columns), or specific columns?
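
To check several of these, you can poke at the system tables. A sketch, assuming a table named your_table (replace the name; exact system-table columns vary by Vertica version):

    -- Where are the trickle loads landing: WOS or ROS?
    SELECT storage_type, COUNT(*) AS containers, SUM(used_bytes) AS bytes
    FROM v_monitor.storage_containers
    WHERE projection_name ILIKE 'your_table%'
    GROUP BY storage_type;

    -- How many partitions does the table have?
    SELECT COUNT(DISTINCT partition_key)
    FROM v_monitor.partitions
    WHERE projection_name ILIKE 'your_table%';

    -- Which projections anchor on the table?
    SELECT projection_name
    FROM v_catalog.projections
    WHERE anchor_table_name = 'your_table';

    -- Any delete vectors that mergeout would have to replay?
    SELECT projection_name, SUM(deleted_row_count) AS deleted_rows
    FROM v_monitor.delete_vectors
    WHERE projection_name ILIKE 'your_table%'
    GROUP BY projection_name;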

Also, what are the values of MaxMrgOutROSSizeMB, MoveOutSizePct, and PurgeMergeoutPercent?
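
You can read them from the configuration-parameters system table, something like:

    -- Current values of the TM-related configuration parameters
    SELECT parameter_name, current_value, default_value
    FROM v_monitor.configuration_parameters
    WHERE parameter_name IN ('MaxMrgOutROSSizeMB', 'MoveOutSizePct', 'PurgeMergeoutPercent');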

Better yet, can you post the definition of your TM pool?
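
Something like this should dump it (the built-in pool is named tm):

    SELECT *
    FROM v_catalog.resource_pools
    WHERE name = 'tm';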