For the sake of simplicity, triggers are the way to go for implementing any kind of tracking of database changes. However, you need to be aware of what happens under the hood when you use triggers.
MySQL Stored Procedure Programming, page 256, under the heading "Trigger Overhead", says the following:
It is important to remember that, by necessity, triggers add overhead
to the DML statement to which they apply. The actual amount of overhead
will depend upon the nature of the trigger, but --- as all MySQL
triggers execute FOR EACH ROW --- the overhead can rapidly accumulate
for statements that process large numbers of rows. You should
therefore avoid placing any expensive SQL statements or procedural
code in triggers.
An expanded explanation of trigger overhead is given on pages 529-531. The concluding point from that section states the following:
The lesson here is this: since the trigger code will execute once
for every row affected by a DML statement, the trigger can easily
become the most significant factor in DML performance. Code inside the
trigger body needs to be as lightweight as possible and -- in
particular -- any SQL statements in the trigger should be supported by
indexes whenever possible.
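To make that last point concrete, here is a minimal sketch (the orders, customers, and orders_log tables and their columns are hypothetical) of a trigger whose internal SELECT is backed by an index so the per-row cost stays small:

CREATE TABLE customers  (id INT PRIMARY KEY, email VARCHAR(100), name VARCHAR(100)) ENGINE=InnoDB;
CREATE TABLE orders     (id INT PRIMARY KEY, customer_email VARCHAR(100)) ENGINE=InnoDB;
CREATE TABLE orders_log (order_id INT, customer_name VARCHAR(100)) ENGINE=InnoDB;

-- The trigger below looks up customers by email once per inserted row,
-- so that column needs an index; without it, every INSERT scans customers.
CREATE INDEX idx_customers_email ON customers (email);

DELIMITER //
CREATE TRIGGER orders_ai AFTER INSERT ON orders
FOR EACH ROW
BEGIN
    INSERT INTO orders_log (order_id, customer_name)
    SELECT NEW.id, c.name
    FROM customers c
    WHERE c.email = NEW.customer_email;  -- indexed lookup
END//
DELIMITER ;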
Not mentioned in the book is another factor when using triggers: when it comes to audit logging, be aware of what you log the data into. I say this because if you choose to log to a MyISAM table, each INSERT into that MyISAM table takes a full table lock for the duration of the INSERT. This can become a serious bottleneck in a high-traffic, high-transaction environment. Additionally, if the trigger is on an InnoDB table and you log changes to a MyISAM table from within the trigger, this will silently break ACID compliance (i.e., reduce transaction blocks to autocommit behavior), because the MyISAM writes cannot be rolled back.
When using triggers on InnoDB tables and logging changes, make sure that:
- The table you log to is also InnoDB
- You have autocommit turned off
- You set up START TRANSACTION...COMMIT/ROLLBACK blocks thoroughly
In this way, the audit logs benefit from COMMIT/ROLLBACK just as the main tables do.
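As a rough sketch of that setup (the orders table and its columns are hypothetical): the audit table is InnoDB, the trigger fills it, and the application wraps its DML in an explicit transaction so the audit rows commit or roll back together with the main data:

CREATE TABLE orders_audit (
    audit_id   BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    order_id   INT NOT NULL,
    old_status VARCHAR(20),
    new_status VARCHAR(20),
    changed_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB;   -- InnoDB, so the audit rows participate in the transaction

DELIMITER //
CREATE TRIGGER orders_au AFTER UPDATE ON orders
FOR EACH ROW
BEGIN
    INSERT INTO orders_audit (order_id, old_status, new_status)
    VALUES (OLD.id, OLD.status, NEW.status);
END//
DELIMITER ;

-- In the application:
SET autocommit = 0;
START TRANSACTION;
UPDATE orders SET status = 'shipped' WHERE id = 42;
COMMIT;   -- a ROLLBACK here would also discard the audit rows written by the trigger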
Concerning stored procedures, you would have to painstakingly call the stored procedure at every point of DML against the table being tracked. It would be easy to miss logging a change somewhere in tens of thousands of lines of application code. Placing such code in a trigger eliminates the need to hunt down all those DML statements.
CAVEAT
Depending on how complex the trigger is, it can still be a bottleneck. If you want to reduce bottlenecks in audit logging, there is something you can do. However, it will require a little infrastructure change.
Using commodity hardware, create two more DB Servers
This will serve to reduce write I/O on the main database (MD) due to audit logging. Here is how you can accomplish it:
Step 01) Turn on binary logging in the main database.
Step 02) Using an inexpensive server, set up MySQL (same version as MD) with binary logging enabled. This will be DM. Set up replication from MD to DM.
Step 03) Using a second inexpensive server, set up MySQL (same version as MD) with binary logging disabled, and configure --replicate-do-table for each audit table. This will be AU. Set up replication from DM to AU.
Step 04) mysqldump the table structures from MD and load them into DM and AU.
Step 05) Convert all audit tables in MD to use the BLACKHOLE storage engine
Step 06) Convert all tables in DM and AU to use the BLACKHOLE storage engine
Step 07) Convert all audit tables in AU to use the MyISAM storage engine
When done:
- DM will replicate from MD and record the changes in its binary log only
- With the --replicate-do-table filter on all the audit tables, AU will replicate the audit data from DM
What this does is store the audit data on a separate DB server and reduce the write I/O degradation that MD would otherwise experience.
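A rough sketch of Steps 05 through 07, assuming a hypothetical audit table mydb.orders_audit and main table mydb.orders (each ALTER is run on the server named in the comment):

-- Step 05, on MD: writes to the audit table cost almost nothing locally,
-- but they are still written to MD's binary log for replication.
ALTER TABLE mydb.orders_audit ENGINE=BLACKHOLE;

-- Step 06, on DM and on AU: pass everything through without storing data.
ALTER TABLE mydb.orders       ENGINE=BLACKHOLE;
ALTER TABLE mydb.orders_audit ENGINE=BLACKHOLE;

-- Step 07, on AU only: the audit table becomes a real table again,
-- so AU ends up holding the full audit history.
ALTER TABLE mydb.orders_audit ENGINE=MyISAM;

-- The Step 03 filter lives in AU's my.cnf, e.g.: replicate-do-table=mydb.orders_audit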
Next time, please include the EXPLAIN output rather than making us dig for it in your scripts. There's no guarantee my system is using the same plan as yours (although with your test data it is likely).
The rule system here is working properly. First, I want to include my own diagnostic queries (note that I did not run EXPLAIN ANALYSE since I was just interested in what query plan was generated):
rulestest=# explain DELETE FROM user_hits WHERE day = '2013-03-16';
                                            QUERY PLAN
------------------------------------------------------------------------------------------------------
 Delete on application_hits  (cost=0.00..3953181.85 rows=316094576 width=24)
   ->  Nested Loop  (cost=0.00..3953181.85 rows=316094576 width=24)
         ->  Seq Scan on user_hits  (cost=0.00..1887.00 rows=49763 width=10)
               Filter: (day = '2013-03-16'::date)
         ->  Materialize  (cost=0.00..128.53 rows=6352 width=22)
               ->  Nested Loop  (cost=0.00..96.78 rows=6352 width=22)
                     ->  Seq Scan on project_hits  (cost=0.00..14.93 rows=397 width=10)
                           Filter: (day = '2013-03-16'::date)
                     ->  Materialize  (cost=0.00..2.49 rows=16 width=16)
                           ->  Nested Loop  (cost=0.00..2.41 rows=16 width=16)
                                 ->  Seq Scan on application_hits  (cost=0.00..1.10 rows=4 width=10)
                                       Filter: (day = '2013-03-16'::date)
                                 ->  Materialize  (cost=0.00..1.12 rows=4 width=10)
                                       ->  Seq Scan on client_hits  (cost=0.00..1.10 rows=4 width=10)
                                             Filter: (day = '2013-03-16'::date)
 Delete on client_hits  (cost=0.00..989722.41 rows=79023644 width=18)
   ->  Nested Loop  (cost=0.00..989722.41 rows=79023644 width=18)
         ->  Seq Scan on user_hits  (cost=0.00..1887.00 rows=49763 width=10)
               Filter: (day = '2013-03-16'::date)
         ->  Materialize  (cost=0.00..43.83 rows=1588 width=16)
               ->  Nested Loop  (cost=0.00..35.89 rows=1588 width=16)
                     ->  Seq Scan on project_hits  (cost=0.00..14.93 rows=397 width=10)
                           Filter: (day = '2013-03-16'::date)
                     ->  Materialize  (cost=0.00..1.12 rows=4 width=10)
                           ->  Seq Scan on client_hits  (cost=0.00..1.10 rows=4 width=10)
                                 Filter: (day = '2013-03-16'::date)
 Delete on project_hits  (cost=0.00..248851.80 rows=19755911 width=12)
   ->  Nested Loop  (cost=0.00..248851.80 rows=19755911 width=12)
         ->  Seq Scan on user_hits  (cost=0.00..1887.00 rows=49763 width=10)
               Filter: (day = '2013-03-16'::date)
         ->  Materialize  (cost=0.00..16.91 rows=397 width=10)
               ->  Seq Scan on project_hits  (cost=0.00..14.93 rows=397 width=10)
                     Filter: (day = '2013-03-16'::date)
 Delete on user_hits  (cost=0.00..1887.00 rows=49763 width=6)
   ->  Seq Scan on user_hits  (cost=0.00..1887.00 rows=49763 width=6)
         Filter: (day = '2013-03-16'::date)
(39 rows)
rulestest=# select distinct day from application_hits;
day
------------
2013-03-15
2013-03-16
(2 rows)
rulestest=# select count(*), day from application_hits group by day;
count | day
-------+------------
4 | 2013-03-15
4 | 2013-03-16
(2 rows)
rulestest=# select count(*), day from client_hits group by day;
count | day
-------+------------
4 | 2013-03-15
4 | 2013-03-16
(2 rows)
rulestest=# select count(*), day from project_hits group by day;
count | day
-------+------------
397 | 2013-03-15
397 | 2013-03-16
(2 rows)
If your real data is anything like your test data, neither rules nor triggers will work very well. A better approach is a stored procedure to which you pass a value and which deletes everything you want.
First, let's note that indexes will get you nowhere here, because in all cases you are pulling half of each table (I did add indexes on day on all the tables to help the planner, but this made no real difference).
You need to start with what you are doing with RULEs. RULEs basically rewrite queries, and they do so in ways that are as robust as possible. Your code also doesn't match your example, though it matches your question better. You have rules on tables which cascade to rules on other tables, which in turn cascade to rules on still other tables.
Therefore, when you DELETE FROM user_hits WHERE [criteria], the rules transform this into a set of queries:
DELETE FROM application_hits
WHERE day IN (SELECT day FROM client_hits
WHERE day IN (SELECT day FROM user_hits WHERE [condition]));
DELETE FROM client_hits
WHERE day IN (SELECT day FROM user_hits WHERE [condition]);
DELETE FROM user_hits WHERE [condition];
Now, you might think we could skip the scan on client_hits in the first query, but that isn't what happens here. The problem is that you could have days in user_hits and application_hits that are not in client_hits, so you really have to scan all the tables.
Now, there is no magic bullet here. A trigger isn't going to work much better because, while it gets to avoid scanning every table, it gets fired for every row that gets deleted, so you basically end up with the same nested-loop sequential scans that are currently killing performance. It will work a bit better because it deletes rows along the way rather than rewriting the query along the way, but it isn't going to perform very well.
A much better solution is to just define a stored procedure and have the application call that. Something like:
CREATE OR REPLACE FUNCTION delete_stats_at_date(in_day date) RETURNS BOOL
LANGUAGE SQL AS
$$
DELETE FROM application_hits WHERE day = $1;
DELETE FROM project_hits WHERE day = $1;
DELETE FROM client_hits WHERE day = $1;
DELETE FROM user_hits WHERE day = $1;
SELECT TRUE;
$$;
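The application would then call it like this:

SELECT delete_stats_at_date('2013-03-16');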
On the test data this runs in 280 ms on my laptop.
One of the hard things regarding RULEs is remembering what they are and noting that the computer cannot, in fact, read your mind. This is why I would not consider them a beginner's tool.
You know how there are OLD and NEW record variables for FOR EACH ROW triggers? Transition tables are the FOR EACH STATEMENT equivalents. They're tables with the old and new tuples, so your triggers can see what changed.
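Here is a minimal sketch (hypothetical table and function names, PostgreSQL 11 or later) of a statement-level trigger that uses transition tables to audit every row touched by an UPDATE:

CREATE TABLE accounts (id int PRIMARY KEY, balance numeric);
CREATE TABLE accounts_audit (
    id          int,
    old_balance numeric,
    new_balance numeric,
    changed_at  timestamptz DEFAULT now()
);

CREATE OR REPLACE FUNCTION log_account_changes() RETURNS trigger
LANGUAGE plpgsql AS
$$
BEGIN
    -- old_rows and new_rows hold every row affected by the statement,
    -- so one trigger firing can log the whole change set.
    INSERT INTO accounts_audit (id, old_balance, new_balance)
    SELECT o.id, o.balance, n.balance
    FROM old_rows o
    JOIN new_rows n USING (id);
    RETURN NULL;
END;
$$;

CREATE TRIGGER accounts_audit_trg
AFTER UPDATE ON accounts
REFERENCING OLD TABLE AS old_rows NEW TABLE AS new_rows
FOR EACH STATEMENT
EXECUTE FUNCTION log_account_changes();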