Postgresql – Update a view with daily tables in Postgres

I have a view like this:

CREATE OR REPLACE VIEW some_table AS
select * from h20180101
union all
select * from h20180102
union all
select * from h20180103
etc...

It's a union of daily tables. The catch is that a new table is created each day. Is there a way in PL/pgSQL to automatically update the view definition every day so that it also includes the previous day's table?

Thanks!

Best Answer

If you are on PostgreSQL 10 or later, the best way to handle this is declarative partitioning, which gives you a single table instead of a view. Lower versions also support partitioning, but it is not declarative (i.e. you have to do a lot of the work yourself).
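
A minimal sketch of what declarative partitioning could look like here. The parent table name hits and the columns day and payload are assumptions for illustration; the question does not show the layout of the daily tables, and range partitioning only works if they contain a column such as a date to partition on.

```sql
-- Parent table; the name and columns are illustrative, not from the question.
CREATE TABLE hits (
    day     date NOT NULL,
    payload text
) PARTITION BY RANGE (day);

-- A brand-new daily partition covering exactly one day.
CREATE TABLE h20180104 PARTITION OF hits
    FOR VALUES FROM ('2018-01-04') TO ('2018-01-05');

-- An existing daily table with the same column layout can be attached instead,
-- provided all of its rows fall inside the stated range.
ALTER TABLE hits ATTACH PARTITION h20180101
    FOR VALUES FROM ('2018-01-01') TO ('2018-01-02');
```

Queries then go against hits directly, so there is no view definition to keep up to date; the daily job only has to create or attach the new partition.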

I can't think of a way to do this from within Postgres/PL/pgSQL, but you could do something like this:

1. Create a shell script that runs psql to output the list of tables, using something like psql -c "\dt h2018*" or the equivalent query against the system catalogs.
2. Pipe that output to awk, or your favorite scripting language, to build a new .sql file, and then run that file with psql. You could probably build the whole query in a scripting language alone, but the approach above is likely the easiest and fastest way to accomplish what you want.
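
If you would rather keep everything inside the database after all, dynamic SQL in PL/pgSQL can perform the same list-and-rebuild step. This is a sketch, not a tested solution: the function name rebuild_daily_view, the public schema, and the naming pattern (h followed by eight digits) are assumptions, and it relies on all daily tables having identical columns.

```sql
CREATE OR REPLACE FUNCTION rebuild_daily_view() RETURNS void
LANGUAGE plpgsql AS
$$
DECLARE
    union_sql text;
BEGIN
    -- Build "SELECT * FROM ... UNION ALL SELECT * FROM ..." over every
    -- table whose name looks like hYYYYMMDD (assumed naming scheme).
    SELECT string_agg(format('SELECT * FROM %I.%I', schemaname, tablename),
                      E'\nUNION ALL\n' ORDER BY tablename)
      INTO union_sql
      FROM pg_catalog.pg_tables
     WHERE schemaname = 'public'
       AND tablename ~ '^h[0-9]{8}$';

    IF union_sql IS NULL THEN
        RAISE EXCEPTION 'no daily tables found';
    END IF;

    -- Recreate the view over the current set of daily tables.
    EXECUTE 'CREATE OR REPLACE VIEW some_table AS ' || union_sql;
END;
$$;
```

The function still has to be triggered once a day from outside, for example by a cron job that runs psql -c "SELECT rebuild_daily_view();" after the new table has been created.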

Related Solutions

Postgresql – how to chain postgres RULEs

Next time, please include the EXPLAIN output rather than making us dig for it in your scripts. There's no guarantee my system is using the same plan as yours (although with your test data it is likely).

The rule system here is working properly. First, I want to include my own diagnostic queries (note that I did not run EXPLAIN ANALYSE, since I was only interested in which query plan was generated):

rulestest=# explain DELETE FROM user_hits WHERE day = '2013-03-16';
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Delete on application_hits  (cost=0.00..3953181.85 rows=316094576 width=24)
   ->  Nested Loop  (cost=0.00..3953181.85 rows=316094576 width=24)
         ->  Seq Scan on user_hits  (cost=0.00..1887.00 rows=49763 width=10)
               Filter: (day = '2013-03-16'::date)
         ->  Materialize  (cost=0.00..128.53 rows=6352 width=22)
               ->  Nested Loop  (cost=0.00..96.78 rows=6352 width=22)
                     ->  Seq Scan on project_hits  (cost=0.00..14.93 rows=397 width=10)
                           Filter: (day = '2013-03-16'::date)
                     ->  Materialize  (cost=0.00..2.49 rows=16 width=16)
                           ->  Nested Loop  (cost=0.00..2.41 rows=16 width=16)
                                 ->  Seq Scan on application_hits  (cost=0.00..1.10 rows=4 width=10)
                                       Filter: (day = '2013-03-16'::date)
                                 ->  Materialize  (cost=0.00..1.12 rows=4 width=10)
                                       ->  Seq Scan on client_hits  (cost=0.00..1.10 rows=4 width=10)
                                             Filter: (day = '2013-03-16'::date)

 Delete on client_hits  (cost=0.00..989722.41 rows=79023644 width=18)
   ->  Nested Loop  (cost=0.00..989722.41 rows=79023644 width=18)
         ->  Seq Scan on user_hits  (cost=0.00..1887.00 rows=49763 width=10)
               Filter: (day = '2013-03-16'::date)
         ->  Materialize  (cost=0.00..43.83 rows=1588 width=16)
               ->  Nested Loop  (cost=0.00..35.89 rows=1588 width=16)
                     ->  Seq Scan on project_hits  (cost=0.00..14.93 rows=397 width=10)
                           Filter: (day = '2013-03-16'::date)
                     ->  Materialize  (cost=0.00..1.12 rows=4 width=10)
                           ->  Seq Scan on client_hits  (cost=0.00..1.10 rows=4 width=10)
                                 Filter: (day = '2013-03-16'::date)

 Delete on project_hits  (cost=0.00..248851.80 rows=19755911 width=12)
   ->  Nested Loop  (cost=0.00..248851.80 rows=19755911 width=12)
         ->  Seq Scan on user_hits  (cost=0.00..1887.00 rows=49763 width=10)
               Filter: (day = '2013-03-16'::date)
         ->  Materialize  (cost=0.00..16.91 rows=397 width=10)
               ->  Seq Scan on project_hits  (cost=0.00..14.93 rows=397 width=10)
                     Filter: (day = '2013-03-16'::date)

 Delete on user_hits  (cost=0.00..1887.00 rows=49763 width=6)
   ->  Seq Scan on user_hits  (cost=0.00..1887.00 rows=49763 width=6)
         Filter: (day = '2013-03-16'::date)
(39 rows)

rulestest=# select distinct day from application_hits;
    day     
------------
 2013-03-15
 2013-03-16
(2 rows)

rulestest=# select count(*), day from application_hits group by day;
 count |    day     
-------+------------
     4 | 2013-03-15
     4 | 2013-03-16
(2 rows)

rulestest=# select count(*), day from client_hits group by day;
 count |    day     
-------+------------
     4 | 2013-03-15
     4 | 2013-03-16
(2 rows)

rulestest=# select count(*), day from project_hits group by day;
 count |    day     
-------+------------
   397 | 2013-03-15
   397 | 2013-03-16
(2 rows)

If your real data is anything like the test data, neither rules nor triggers will work very well. A better option is a stored procedure that takes a value and deletes everything you want.

First, note that indexes will get you nowhere here, because in every case you are pulling half of each table (I did add indexes on day to all tables to help the planner, but this made no real difference).

You need to start with what you are doing with RULEs. RULEs basically rewrite queries, and they do so in ways that are as robust as possible. Your code also doesn't match your example, though it matches your question better. You have rules on tables that cascade to rules on other tables, which in turn cascade to rules on yet other tables.

Therefore when you delete from user_hits where [criteria], the rules transform this into a set of queries:

DELETE FROM application_hits 
 WHERE day IN (SELECT day FROM client_hits 
               WHERE day IN (SELECT day FROM user_hits WHERE [condition]));
DELETE FROM client_hits
  WHERE day IN (SELECT day FROM user_hits WHERE [condition]);
DELETE FROM user_hits WHERE [condition];

Now, you might think we could skip the scan on client_hits in the first query, but that isn't what happens here. The problem is that you could have days in user_hits and application_hits that are not in client_hits, so you really have to scan all tables.
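
To make the rewrite concrete, a chain of rules along the following lines would produce a set of queries like the one above. The rule names, the scratch schema rules_sketch, and the hits column are hypothetical, since the original rule definitions are not shown here.

```sql
-- Scratch schema so the sketch does not touch any real tables.
CREATE SCHEMA rules_sketch;
SET search_path TO rules_sketch;

CREATE TABLE user_hits        (day date, hits integer);
CREATE TABLE client_hits      (day date, hits integer);
CREATE TABLE application_hits (day date, hits integer);

-- Hypothetical cascade: user_hits -> client_hits -> application_hits.
CREATE RULE user_hits_cascade AS
    ON DELETE TO user_hits
    DO ALSO DELETE FROM client_hits WHERE day = OLD.day;

CREATE RULE client_hits_cascade AS
    ON DELETE TO client_hits
    DO ALSO DELETE FROM application_hits WHERE day = OLD.day;

-- One DELETE goes in; the rewriter expands it into several statements,
-- and EXPLAIN should show a separate plan for each of them.
EXPLAIN DELETE FROM user_hits WHERE day = '2013-03-16';
```

Because OLD.day can only be resolved by looking at the rows the original statement would delete, each rewritten statement has to join back to the table the rule is attached to, which is where the nested loops in the plan come from.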

There is no magic bullet here. A trigger isn't going to work much better because, while it avoids scanning every table up front, it fires for every row that gets deleted, so you basically end up with the same nested-loop sequential scans that are currently killing performance. It will work a bit better because it deletes rows along the way rather than rewriting the query, but it still isn't going to perform very well.
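
For reference, the trigger variant described above would look roughly like this. The function and trigger names are made up for illustration, and the sketch assumes the user_hits and client_hits tables from the test data, each with a day column.

```sql
CREATE OR REPLACE FUNCTION cascade_delete_by_day() RETURNS trigger
LANGUAGE plpgsql AS
$$
BEGIN
    -- Runs once per deleted user_hits row, so deleting N rows issues
    -- N separate deletes against client_hits.
    DELETE FROM client_hits WHERE day = OLD.day;
    RETURN NULL;  -- the return value of an AFTER row trigger is ignored
END;
$$;

CREATE TRIGGER user_hits_cascade
    AFTER DELETE ON user_hits
    FOR EACH ROW
    EXECUTE PROCEDURE cascade_delete_by_day();
```

With tens of thousands of rows per day in user_hits, almost all of those per-row deletes find nothing left to remove, yet each one still has to look at client_hits.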

A much better solution is to just define a stored procedure and have the application call that. Something like:

CREATE OR REPLACE FUNCTION delete_stats_at_date(in_day date) RETURNS BOOL
LANGUAGE SQL AS
$$
-- $1 is in_day; remove that day's rows from every stats table.
DELETE FROM application_hits WHERE day = $1;
DELETE FROM project_hits WHERE day = $1;
DELETE FROM client_hits WHERE day = $1;
DELETE FROM user_hits WHERE day = $1;
-- A SQL function returns the result of its last statement.
SELECT TRUE;
$$;

On the test data this runs in 280 ms on my laptop.
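
Calling it from the application is then a single statement. A cautious way to try it out first, assuming the function above has been created and the date exists in the test data:

```sql
BEGIN;

-- Delete one day's worth of stats from all four tables.
SELECT delete_stats_at_date('2013-03-16');

-- Check that the day is gone from one of the tables.
SELECT count(*) FROM user_hits WHERE day = '2013-03-16';

ROLLBACK;  -- switch to COMMIT once the result looks right
```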

One of the hard things regarding RULEs is remembering what they are and noting that the computer cannot, in fact, read your mind. This is why I would not consider them a beginner's tool.

Postgresql – In Postgres, How to merge two tables

One approach could be to use a writable view to transform results in-query, e.g.:

ALTER TABLE utilities.labpreferences
    RENAME TO old_labpreferences;

CREATE TABLE utilities.labpreferences
( LIKE utilities.old_labpreferences EXCLUDING DEFAULTS )
WITHOUT OIDS;

-- An ON SELECT rule must be named "_RETURN".
CREATE OR REPLACE RULE "_RETURN" AS
  ON SELECT TO utilities.labpreferences
  DO INSTEAD
  SELECT -- fields with modifications --
    FROM utilities.old_labpreferences s
    WHERE s.id like 'My.%'
  UNION
  SELECT -- fields without modification --
    FROM utilities.old_labpreferences s
    WHERE s.id not like 'My.%'
    ;

Backing the changes out is as simple as removing the rule and new (empty) table, and renaming the old table back.

You will also need rules for update/insert/delete on this table. The PostgreSQL documentation for DO INSTEAD is pretty good on this front; see http://www.postgresql.org/docs/devel/static/rules-views.html
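
A rough sketch of what those write rules might look like. Only the id column appears in the example above; the value column and the rule names are hypothetical stand-ins for the real column list.

```sql
-- Redirect inserts on the view to the renamed base table.
CREATE RULE labpreferences_insert AS
    ON INSERT TO utilities.labpreferences
    DO INSTEAD
    INSERT INTO utilities.old_labpreferences (id, value)
    VALUES (NEW.id, NEW.value);

-- Redirect updates, matching the row by its old id.
CREATE RULE labpreferences_update AS
    ON UPDATE TO utilities.labpreferences
    DO INSTEAD
    UPDATE utilities.old_labpreferences
       SET value = NEW.value
     WHERE id = OLD.id;

-- Redirect deletes in the same way.
CREATE RULE labpreferences_delete AS
    ON DELETE TO utilities.labpreferences
    DO INSTEAD
    DELETE FROM utilities.old_labpreferences
     WHERE id = OLD.id;
```

If the SELECT rule rewrites id itself, the WHERE clauses here would need the inverse transformation so that OLD.id maps back to the stored value.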
