How to reduce the Tuple Mover’s disk I/O impact for mergeouts

vertica

I have a cluster with a constant trickle load of new data fed by a stream of large batch COPY statements and also simultaneously support interactive read-queries against the most recently added data, i.e. time series for chart data.

How can I control the Tuple Mover so that it spreads out its work over a longer period of time, to reduce the insane disk I/O it periodically causes which slows down read-query performance to horrible levels. This data is partitioned on a date column, so every 24 hours precisely (GMT) there is a flurry of disk I/O due to partition mini-ROS to ROS consolidation. I've set ActivePartitions to 2, long ago, which helped reduce the problem but it's still a significant slowdown while the mergout is hogging the spindles.

It seems like the resource pool settings ought to help, but I can't find a setting that has an effect. Any suggestions?

Best Answer

What you need to do is make sense of what is your usual load size and set the size of your TM resource pool accordingly. Changing the ActivePartitions to 2 will help you because will stop looking for older partitions to Consolidate.

Answer me this questions :

What is the size of your TM pool ?
How do you do your Trickle loads ? (they go to WOS or ROS)
How many partitions do you have in the table ?
How many projections do you have for that table ? (do you use them all)
Do you run delete on this data ?(check for replay delete)
Your projection have the default sort ? (all columns or specific columns ?)

What is the value of MaxMrgOutROSSizeMB and MoveOutSizePct,PurgeMergeoutPercent.

Better off can you post the definition of your TM ?

Raw Size

The raw size is useful for capacity planning or monitoring license utilization (it's the space the data would take up if it wasn't compressed). To get the raw data size of the entire database, you can either use the GET_COMPLIANCE_STATUS() function, or query the system table v_internal.license_audits.

The GET_COMPLIANCE_STATUS() function retrieves information about the most recent audit. An audit estimates the raw data size of the database and stores the information in v_internal.license_audits. By default, audits take place daily at 23:59 and can be configured or manually run.

Here's some example output:

dbadmin=> SELECT GET_COMPLIANCE_STATUS();
                            GET_COMPLIANCE_STATUS
---------------------------------------------------------------------------------

Raw Data Size: 4.83TB +/- 0.24TB
License Size : 30.00TB
Utilization  : 16%
Audit Time   : 2014-05-11 23:59:49.763799+00
Compliance Status : The database is in compliance with respect to raw data size.

License End Date: 10/30/2014
Days Remaining: 171.18

If you have permissions, you can directly query the license_audits table:

SELECT /*+ label(license_utilization)*/
       audit_start_timestamp, 
       database_size_bytes / ( 1024^3 ) AS database_size_gb, 
       license_size_bytes / ( 1024^3 )  AS license_size_gb, 
       usage_percent 
FROM   v_catalog.license_audits 
ORDER  BY audit_start_timestamp DESC 
LIMIT  30;

Further, if you wish to get raw data size at the schema level, you can use this (from vertica.tips):

SELECT /*+ label(estimated_raw_size)*/
       pj.anchor_table_schema, 
       pj.used_compressed_gb, 
       pj.used_compressed_gb * la.ratio AS raw_estimate_gb 
FROM   (SELECT ps.anchor_table_schema, 
               SUM(used_bytes) / ( 1024^3 ) AS used_compressed_gb 
        FROM   v_catalog.projections p 
               JOIN v_monitor.projection_storage ps 
                 ON ps.projection_id = p.projection_id 
        WHERE  p.is_super_projection = 't' 
        GROUP  BY ps.anchor_table_schema) pj 
       CROSS JOIN (SELECT (SELECT database_size_bytes 
                           FROM   v_catalog.license_audits 
                           ORDER  BY audit_start_timestamp DESC 
                           LIMIT  1) / (SELECT SUM(used_bytes) 
                                        FROM   V_MONITOR.projection_storage) AS ratio) la 
ORDER  BY pj.used_compressed_gb DESC;

Compressed Size

The compressed size is the actual size of the data on the disk. This is useful for estimating disk space usage as Vertica recommends that at least 40% of space is available at all times. You can get the compressed size from column_storage or projection_storage system tables.

Using projection_storage will also return any empty tables (from vertica.tips):

SELECT /*+ label(compressed_table_size)*/
       anchor_table_schema, 
       anchor_table_name, 
       SUM(used_bytes) / ( 1024^3 ) AS used_compressed_gb 
FROM   v_monitor.projection_storage 
GROUP  BY anchor_table_schema, 
          anchor_table_name 
ORDER  BY SUM(used_bytes) DESC;

Why does the Tuple Mover in C-Store consider only rows older than LWM

Given that the referenced paper is 10 years old, I would recommend looking at a The Vertica Analytic Database: C-Store 7 Years Later since Vertica has more automatic epoch advancement mechanisms.

For reference, the acronyms used now are:

WOS - Write Optimized Store
ROS - Read Optimized Store
AHM - Ancient History Marker (Low Water Mark)
LGE - Last Good Epoch

A quick overview of how epoch's work in Vertica:

I could not understand, when moving records from WS (write optimized storage) to RS (read optimized storage), why the tuple mover considers only records older than LWM?

Vertica will automatically advance the epoch as a background process. In the example below, once data is committed, it will belong to the current epoch.

-- Get the current epoch
dbadmin=> SELECT CURRENT_EPOCH FROM system;
 CURRENT_EPOCH
---------------
           238
(1 row)

-- Insert a row into the table without committing (WOS)
dbadmin=> INSERT INTO tbl (a) VALUES (1);
 OUTPUT
--------
      1
(1 row)

-- Get the epoch for the row
dbadmin=> SELECT a, epoch FROM tbl;
 a | epoch
---+-------
 1 |
(1 row)

-- Commit the insert
dbadmin=> COMMIT;
COMMIT

-- Get the epoch for the row
dbadmin=> SELECT a, epoch FROM tbl;
 a | epoch
---+-------
 1 |   238
(1 row)

Doesn't this mean that all the rows that were inserted in the system after LWM would only be in WS?

It does not. WOS is just a temporary storage location until the data gets moved to ROS. The epoch is just a way to manage transactions.

Best Answer

Related Solutions

Vertica – How to Get the Size of a Database

Raw Size

Compressed Size

Why does the Tuple Mover in C-Store consider only rows older than LWM