PostgreSQL – Automatically Update View Based on Daily Created Tables

database-design, partitioning, postgresql

Let's say I have a table that is created daily, and a view like this:

CREATE OR REPLACE VIEW someview
AS 
SELECT * FROM 20181227
UNION
SELECT * FROM 20181226
UNION
SELECT * FROM 20181225
...

Is there a way to replace that with code that automatically grabs data from all those tables, without having to add a union for each one? Right now a bash script runs daily and regenerates the view with one more union each time, but that's inelegant.

Best Answer

Look into table partitioning. Ideally, use the latest version of Postgres (currently Postgres 11), since there have been major improvements in Postgres 10 and 11.

Postgres 11 allows RANGE, LIST and HASH partitioning. One big partition per day would suggest LIST partitioning based on a date column like:

CREATE TABLE foo (
  foo_date date NOT NULL
, foo_id   bigint NOT NULL GENERATED ALWAYS AS IDENTITY
, data     text
, PRIMARY KEY (foo_date, foo_id)
)
PARTITION BY LIST (foo_date);

CREATE TABLE foo_20181226 PARTITION OF foo FOR VALUES IN ('20181226');
CREATE TABLE foo_20181227 PARTITION OF foo FOR VALUES IN ('20181227');
-- etc.

(Alternatively, you might have a timestamp or timestamptz column and use RANGE partitioning for that.)
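A minimal sketch of that RANGE alternative, assuming a hypothetical table bar with a timestamptz column bar_ts (neither name is from the question). Range bounds include the lower value and exclude the upper one, so adjacent daily partitions can share a boundary without overlapping:

```sql
-- Hypothetical table and column names, for illustration only.
CREATE TABLE bar (
  bar_ts timestamptz NOT NULL
, bar_id bigint      NOT NULL
, data   text
, PRIMARY KEY (bar_ts, bar_id)  -- must include the partition key
)
PARTITION BY RANGE (bar_ts);

-- FROM is inclusive, TO is exclusive.
-- The literals are interpreted in the session time zone at creation time.
CREATE TABLE bar_20181226 PARTITION OF bar
  FOR VALUES FROM ('2018-12-26') TO ('2018-12-27');
CREATE TABLE bar_20181227 PARTITION OF bar
  FOR VALUES FROM ('2018-12-27') TO ('2018-12-28');
```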

Then you can query the master table directly to automatically include all partitions:

SELECT * FROM foo;

Or (since your main concern seems to be short syntax):

TABLE foo;

See:

Is there a shortcut for SELECT * FROM?
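To check where rows end up, something like the following should work against the foo table defined above. It assumes the two partitions shown earlier exist; the sample values are made up. An insert for a date without a matching partition raises an error.

```sql
-- Insert through the parent; Postgres routes each row to its partition.
INSERT INTO foo (foo_date, data) VALUES
  ('2018-12-26', 'first day')
, ('2018-12-27', 'second day');

-- tableoid::regclass shows which partition physically holds each row.
SELECT tableoid::regclass AS source_partition, foo_date, data
FROM   foo
ORDER  BY foo_date;
```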

Various optimizations are possible, with constraints, indexes, column defaults etc. depending on requirement details.
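Two such options, sketched for the foo table above under the assumption of Postgres 11 or later (the index and partition names are illustrative):

```sql
-- Postgres 11+: an index created on the partitioned table is created
-- on every existing partition and on partitions added later.
CREATE INDEX foo_data_idx ON foo (data);

-- Postgres 11+: a default partition catches rows whose date has no
-- dedicated partition yet, instead of raising an error.
CREATE TABLE foo_default PARTITION OF foo DEFAULT;
```

A default partition is a trade-off: when a new partition is added, Postgres has to verify that the default partition holds no rows belonging to it.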

Be sure to read the linked chapter of the manual to understand various pros and cons. Yours should be the perfect use case (unless undisclosed requirements are in the way). In particular, you can easily and very quickly add and remove partitions with minimum interference with the rest of the table.
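As a rough sketch, the daily bash script could shrink to statements like these. The DO block is one possible way to derive tomorrow's partition name, and foo_20181225 is assumed to exist from earlier days:

```sql
-- Create tomorrow's partition with a generated name such as foo_20181228.
DO
$do$
DECLARE
   _day date := current_date + 1;
BEGIN
   EXECUTE format(
      'CREATE TABLE IF NOT EXISTS %I PARTITION OF foo FOR VALUES IN (%L)'
    , 'foo_' || to_char(_day, 'YYYYMMDD')
    , _day);
END
$do$;

-- Retire an old day: after detaching, the data lives on as a standalone table ...
ALTER TABLE foo DETACH PARTITION foo_20181225;
-- ... which can then be archived or dropped.
DROP TABLE foo_20181225;
```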

There are still limitations in the current implementation. In particular, partition pruning is great for performance, but there is room for improvement (currently in development). Even so, partitioning will probably beat your view by a wide margin for queries that don't need all tables (partitions), since the view has to consider every union'ed table every time. A potentially faster alternative for certain queries is (currently) picking the relevant tables (partitions) in a custom UNION ALL query.
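Both variants can be tried against the foo table from above. This is only a sketch; the actual plan depends on version and settings:

```sql
-- With a filter on the partition key, the plan should list only the
-- matching partition (foo_20181227), not every partition.
EXPLAIN
SELECT * FROM foo WHERE foo_date = '2018-12-27';

-- Hand-picked partitions in a custom query. UNION ALL simply appends;
-- plain UNION would add a duplicate-removal step on top.
SELECT * FROM foo_20181226
UNION ALL
SELECT * FROM foo_20181227;
```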

And don't use names consisting of only digits, like 20181225 in your example. Use legal, lower-case names starting with a letter, or you have to double-quote the identifier at all times.

The manual: "SQL identifiers and key words must begin with a letter (a-z, but also letters with diacritical marks and non-Latin letters) or an underscore (_). Subsequent characters in an identifier or key word can be letters, underscores, digits (0-9), or dollar signs ($). Note that dollar signs are not allowed in identifiers according to the letter of the SQL standard, so their use might render applications less portable. The SQL standard will not define a key word that contains digits or starts or ends with an underscore, so identifiers of this form are safe against possible conflict with future extensions of the standard."
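A short illustration of the difference; the table name t_20181225 and the column id are made up for the example:

```sql
-- Fails with a syntax error: an unquoted identifier cannot start with a digit.
-- CREATE TABLE 20181225 (id int);

-- Works, but the name has to be double-quoted in every statement.
CREATE TABLE "20181225" (id int);
SELECT * FROM "20181225";

-- Preferable: a legal, lower-case name that never needs quoting.
CREATE TABLE t_20181225 (id int);
SELECT * FROM t_20181225;
```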

Are PostgreSQL column names case-sensitive?

Related Solutions

PostgreSQL – Modify Existing search_path and Preserve Current Values

SELECT set_config('search_path', 'fred,'||current_setting('search_path'), false);

The third argument, false, says it's not a transaction-LOCAL setting, so the new value lasts for the rest of the session.
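For contrast, a sketch of the transaction-local variant, passing true as the third argument. The schema name fred is taken from the statement above:

```sql
BEGIN;
-- true = local: the change only lasts until the end of this transaction.
SELECT set_config('search_path', 'fred,' || current_setting('search_path'), true);
SHOW search_path;  -- starts with fred inside the transaction
COMMIT;

SHOW search_path;  -- back to the value from before the transaction
```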

For the bonus question, you can store the value in a custom setting:

SELECT set_config('tmp.search_path', current_setting('search_path'), false);

From version 9.2 on, you don't even have to define this setting in postgresql.conf.
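Put together, saving and later restoring the path could look like this. It is a sketch that reuses the custom setting name tmp.search_path from above:

```sql
-- Remember the current path in a custom setting.
SELECT set_config('tmp.search_path', current_setting('search_path'), false);

-- Prepend the extra schema for the rest of the session.
SELECT set_config('search_path', 'fred,' || current_setting('search_path'), false);

-- ... do the work that needs schema fred first in the path ...

-- Restore the saved value.
SELECT set_config('search_path', current_setting('tmp.search_path'), false);
```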

PostgreSQL – Maximum Partitioning Techniques

You need one partition for that many records. Not 1000. Certainly not 1000/year. This is not a problem that requires partitioning. It looks to me like you've decided on the solution before fully stating and analysing the problem.

Reading between the lines, it sounds like you're implementing a multi-tenant system and have already decided that partitioning is the way to do that. Right?

If so: wrong approach. Start with a single table. Partition if/when you need to for performance and maintenance reasons. With a DB of this scale, it is very unlikely that you will ever need to; it's tiny.

How many partitions are too many?

Because the constraint exclusion code isn't super smart, try to stick to low partition counts. I prefer tens or hundreds at most.
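For context, a minimal sketch of inheritance-based partitioning, where constraint exclusion applies. The table and column names are hypothetical. The planner checks the CHECK constraint of every child table against the query, which is why planning time grows with the partition count:

```sql
-- Hypothetical parent table with one child per tenant.
CREATE TABLE event (tenant_id int NOT NULL, payload text);
CREATE TABLE event_t1 (CHECK (tenant_id = 1)) INHERITS (event);
CREATE TABLE event_t2 (CHECK (tenant_id = 2)) INHERITS (event);

SET constraint_exclusion = partition;  -- this is the default setting

-- The plan should skip event_t2, because its CHECK constraint
-- contradicts the WHERE clause.
EXPLAIN
SELECT * FROM event WHERE tenant_id = 1;
```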

Is having small partitions bad (could have less than 150 records per partition)?

Yes, it's very wasteful in terms of planning and execution time.

possibly need to separate partitioned data legally (some cases but not all)

What's the difference between a partition and a single table with a composite key? I've never seen a legal or regulatory code that goes down to the level of actually specifying database structure, other than maybe PCI, and not in this way.

Details please.

Is this the best approach? If not, what are some other ideas?

Use one table, a composite key, and some composite indexes. If useful/necessary, use partial indexes for sub-ranges.
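A sketch of what that could look like. All table, column, and index names here are assumptions, since the question's schema is not shown:

```sql
-- One table for all tenants; the tenant is part of the composite key.
CREATE TABLE tenant_record (
  tenant_id  int    NOT NULL
, record_id  bigint NOT NULL
, created_on date   NOT NULL DEFAULT current_date
, payload    text
, PRIMARY KEY (tenant_id, record_id)
);

-- Composite index for per-tenant lookups by date.
CREATE INDEX tenant_record_tenant_created_idx
  ON tenant_record (tenant_id, created_on);

-- Partial index covering only one sub-range (here: the year 2018).
CREATE INDEX tenant_record_2018_idx
  ON tenant_record (tenant_id, created_on)
  WHERE created_on >= '2018-01-01' AND created_on < '2019-01-01';
```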
