Why does the Tuple Mover in C-Store consider only rows older than LWM

columnstore vertica

In the Tuple Mover section of the C-Store paper by Michael Stonebraker (link: http://db.csail.mit.edu/projects/cstore/vldb.pdf) the following is described:

MOP (merge out process) will find all records in the chosen WS segment with an insertion time at or before the LWM (low water mark; a timestamp order/epoch value) […] The most recent insertion time of a record in RS’ becomes the segment’s new t_lastmove and is always less than or equal to the LWM. […] Hence, LWM “chases” HWM (high water mark), and the delta between them is chosen to mediate between the needs of users who want historical access and the WS space constraints.

I could not understand, when moving records from WS (write optimized storage) to RS (read optimized storage), why the tuple mover considers only records older than LWM? Doesn't this mean that all the rows that were inserted in the system after LWM would only be in WS? In a system with an early LWM, i.e., one where the LWM trails far behind the HWM so that queries over old history are supported, this may mean that most of the records would be in WS only, and we would miss out on all the optimizations provided by the read optimized storage.

Am I missing something?

Best Answer

Given that the referenced paper is 10 years old, I would recommend looking at The Vertica Analytic Database: C-Store 7 Years Later, since Vertica has more automatic epoch advancement mechanisms.

For reference, the acronyms used now are:

WOS - Write Optimized Store
ROS - Read Optimized Store
AHM - Ancient History Mark (Low Water Mark)
LGE - Last Good Epoch

A quick overview of how epochs work in Vertica:

I could not understand, when moving records from WS (write optimized storage) to RS (read optimized storage), why the tuple mover considers only records older than LWM?

Vertica will automatically advance the epoch as a background process. In the example below, once data is committed, it will belong to the current epoch.

-- Get the current epoch
dbadmin=> SELECT CURRENT_EPOCH FROM system;
 CURRENT_EPOCH
---------------
           238
(1 row)

-- Insert a row into the table without committing (WOS)
dbadmin=> INSERT INTO tbl (a) VALUES (1);
 OUTPUT
--------
      1
(1 row)

-- Get the epoch for the row
dbadmin=> SELECT a, epoch FROM tbl;
 a | epoch
---+-------
 1 |
(1 row)

-- Commit the insert
dbadmin=> COMMIT;
COMMIT

-- Get the epoch for the row
dbadmin=> SELECT a, epoch FROM tbl;
 a | epoch
---+-------
 1 |   238
(1 row)

Doesn't this mean that all the rows that were inserted in the system after LWM would only be in WS?

It does not. WOS is just a temporary storage location until the data gets moved to ROS. The epoch is just a way to manage transactions.
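To make the quoted description of the merge-out step concrete, here is a toy Python sketch. It is only an illustration under simplifying assumptions, not C-Store's or Vertica's actual implementation: the `Segment` class, the `merge_out` function and the use of plain integers as epochs are all hypothetical, and RS is modeled as a simple list rather than a newly written RS' segment. It shows that rows newer than the LWM stay in WS only until the LWM advances past them.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    # Hypothetical toy model: each record is an (insertion_epoch, value) pair.
    ws: list = field(default_factory=list)  # write-optimized store
    rs: list = field(default_factory=list)  # read-optimized store
    t_lastmove: int = 0


def merge_out(seg, lwm):
    """Move every WS record inserted at or before `lwm` into RS.

    Returns the number of records moved.
    """
    movable = [rec for rec in seg.ws if rec[0] <= lwm]
    if not movable:
        return 0
    # Records newer than the LWM stay in WS for now.
    seg.ws = [rec for rec in seg.ws if rec[0] > lwm]
    seg.rs.extend(sorted(movable))
    # The newest moved insertion time becomes t_lastmove; it can never exceed lwm.
    seg.t_lastmove = max(epoch for epoch, _ in movable)
    return len(movable)


seg = Segment(ws=[(3, "a"), (5, "b"), (8, "c"), (9, "d")])
merge_out(seg, lwm=5)  # moves the records inserted at epochs 3 and 5
merge_out(seg, lwm=8)  # the LWM has advanced, so the epoch-8 record follows
```

In this model, a record waits in WS only for as long as the LWM lags behind its insertion time, which is the "chasing" behavior the paper describes.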

Related Solutions

Why does CURRENT_LOAD_SOURCE() return “Batch No. 1” instead of the actual source file name

~~This certainly looks like a bug, whether you want to characterize it as an oversight, unfinished feature, or something else.~~

Vertica support tells me this is a known issue and a fix is being tracked as part of VER-36735. (Unfortunately, their issue tracker is not visible to the public.)

The problem appears to be related to the LOCAL clause, which is used to load files that are on the client machine as opposed to the server.

If you remove the LOCAL clause (and put the files you are loading on the server), then CURRENT_LOAD_SOURCE() will return the file name of the file being loaded, as expected.

COPY table1 (
    col1,
    col2,
    file_name AS CURRENT_LOAD_SOURCE()
)
FROM :src_file  -- no LOCAL!
REJECTED DATA :rejected_file
EXCEPTIONS :exceptions_file
SKIP 1;

If you're stuck with LOCAL, then your best bet is to simply pass in a new variable (e.g. :file_name) that you explicitly set to the name of the file you are loading. Just be sure to provide the base name of the file (as opposed to the full path), to match the intended behavior of CURRENT_LOAD_SOURCE().
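As a rough sketch of that workaround, the following Python helper derives the base name and builds a vsql command line that sets both variables. Several things here are assumptions rather than anything from the original answer: the helper name `vsql_load_args`, the script name `load.sql`, and the convention of embedding the single quotes in the variable value so that the script can write `file_name AS :file_name`. The other variables from the COPY statement (:rejected_file, :exceptions_file) would be added the same way, and paths containing single quotes are not escaped here.

```python
import os


def vsql_load_args(src_path, script="load.sql"):
    """Build a vsql argument list for loading one file with LOCAL.

    `script` is a hypothetical SQL file containing the COPY statement.
    """
    base = os.path.basename(src_path)  # base name only, not the full path
    variables = {
        "src_file": f"'{src_path}'",
        "file_name": f"'{base}'",
    }
    args = ["vsql"]
    for name, value in variables.items():
        args += ["-v", f"{name}={value}"]
    args += ["-f", script]
    return args


print(vsql_load_args("/data/in/batch_01.csv"))
```

The resulting list can be handed to `subprocess.run` as-is, which avoids a second layer of shell quoting around the embedded single quotes.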

Columnstore Index Not Filling Entire Rowgroup – Azure SQL Data Warehouse

There is an Extended Event that fires when a columnstore rowgroup is being 'cut'; I think the event is column_store_index_build_process_segment. This event will have a 'trim' reason, and you should look for two possible trim causes:

memory (unlikely in your case): the index build does not have enough memory to build a full segment.
dictionary size (the likely cause): the dictionaries used to encode the data in a column segment reach the maximum size (16 MB).

Of course, to capture this event you need to set up an XE session during the index build (the linked article shows how).

You could also look at the post-build artifacts, specifically at the dictionaries, and see if the secondary dictionaries associated with the small segment are already at full size (16 MB). This would indicate that the likely cause of the trim is a full dictionary.
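The dictionary check can be sketched as a small Python filter. This is an illustration only: the function name, the tuple layout `(column_id, dictionary_id, size_in_bytes)` and the 95% threshold are assumptions, and the sizes are presumed to have been read beforehand from whatever dictionary catalog view your platform exposes (for example an on-disk size column). The 16 MB limit is the one quoted above.

```python
MAX_DICTIONARY_BYTES = 16 * 1024 * 1024  # the 16 MB limit mentioned above


def full_dictionaries(rows, threshold=0.95):
    """Return (column_id, dictionary_id) pairs whose size is at or near the limit.

    `rows` is an iterable of (column_id, dictionary_id, size_in_bytes) tuples;
    `threshold` is an arbitrary cutoff for what counts as "near" the limit.
    """
    limit = MAX_DICTIONARY_BYTES * threshold
    return [(col, dict_id) for col, dict_id, size in rows if size >= limit]


sample = [
    (1, 0, 1_000_000),   # small dictionary, not a trim suspect
    (2, 1, 16_777_216),  # exactly at the limit
    (3, 1, 16_000_000),  # just under the limit, still flagged
]
print(full_dictionaries(sample))
```

Any column that shows up in the result for the undersized rowgroup is a candidate for the dictionary-size trim reason.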