How bad is it to delete from a Greenplum AO table

greenplum

We have a few AO tables (which nowadays means "Append Optimized", not "Append Only"). Normally, we just insert into these tables.

As part of a significant system change, I need to do a fairly large delete from these tables. It's only about 2% of the data volume, and only over a small range of partitions. Eventually those partitions will roll off the end of our retention policy.

I can't find any information on the consequences of executing a delete on an AO table.

Best Answer

The main difference between heap table and AO tables is how the visibility data is stored. In heap tables visibility information is the header of each record, AO tables have a separate visibility map structure.

Once there have been UPDATE or DELETE operations on an AO table this visibility map needs to be processed. This will incur a very small addition load for reading the map and processing it. The processing is just a set/not-set check so it's a lot lighter than the range check for MVCC visibility on heap tables.

With a small percentage of rows being dropped the effects on scan operations will be negligible. If you VACUUM an AO table where there have been UPDATE or DELETE operations then the table will be compacted, this will create a lot of disk IO and may require quite a bit of space.