In the Tuple Mover section of the C-Store paper by Michael Stonebraker (link: http://db.csail.mit.edu/projects/cstore/vldb.pdf) the following is described:
MOP (merge out process) will find all records in the chosen WS segment with an insertion time at or before the LWM (low water mark; a timestamp order/epoch value) […] The most recent insertion time of a record in RS’ becomes the segment’s new t_lastmove and is always less than or equal to the LWM. […] Hence, LWM “chases” HWM (high water mark), and the delta between them is chosen to mediate between the needs of users who want historical access and the WS space constraints.
I could not understand, when moving records from WS (write optimized storage) to RS (read optimized storage), why the tuple mover considers only records older than LWM? Doesn't this mean that all the rows that were inserted in the system after LWM would only be in WS? In a system, with small LWM, i.e, in a system where old historical queries are supported, this may mean that much of the records would be in WS only and we would miss out all the optimizations provided by the read optimized storage.
Am I missing something?
Best Answer
Given that the referenced paper is 10 years old, I would recommend looking at a The Vertica Analytic Database: C-Store 7 Years Later since Vertica has more automatic epoch advancement mechanisms.
For reference, the acronyms used now are:
A quick overview of how epoch's work in Vertica:
Vertica will automatically advance the epoch as a background process. In the example below, once data is committed, it will belong to the current epoch.
It does not. WOS is just a temporary storage location until the data gets moved to ROS. The epoch is just a way to manage transactions.