PostgreSQL – Creating a Database View Listing Distinct Days

materialized-viewpostgresqlpythonview

I have a list of sensors of many machines, that stores on a second base (i.e., 86400 records each day for each machine). One of the many columns is the datetime of the value (that can also be stored separately as date and time).
I'm using PostgreSQL 9.5.

Frequently, I need to find out what dates have sensor's records available for each machine.

I thought about creating a database view of distinct dates for each machine, but after searching for "materialized views" I think I'm ok with updating this list once in a while each day, as data is stored daily.

Would anyone give any advice about doing it?

The table is similar to

ID | Machine_code | Sensor_1 | Sensor_2 | Sensor3 | Datetime
1  | Drimmer      | 12       |  15.8    | 13.4    | 2016-03-03 10:00:03 GMT+1

If it helps anyhow, I'm using Django Framework in front of it.

Best Answer

create materialized view foo as 
  select distinct machine_code, date(datetime) from thing;

create unique index on foo(machine_code,date);

Beware that the conversion of datetime to date will occur in the timezone of whoever last created or refreshed the MV. You might want to use this instead:

create materialized view foo as 
  select distinct machine_code, date(datetime at time zone 'GMT+1') from thing;

But that might not do what you want with daylight savings time.

Related Solutions

Postgresql – how to chain postgres RULEs

Next time, please include the EXPLAIN output rather than making us dig for it in your scripts. There's no guarantee my system is using the same plan as yours (although with your test data it is likely).

The rule system here is working properly. First, the I want to include my own diagnostic queries (note I did not run EXPLAIN ANALYSE since I was just interested in what query plan was generated):

rulestest=# explain DELETE FROM user_hits WHERE day = '2013-03-16';
                                              QUERY PLAN                        

--------------------------------------------------------------------------------
----------------------
 Delete on application_hits  (cost=0.00..3953181.85 rows=316094576 width=24)
   ->  Nested Loop  (cost=0.00..3953181.85 rows=316094576 width=24)
         ->  Seq Scan on user_hits  (cost=0.00..1887.00 rows=49763 width=10)
               Filter: (day = '2013-03-16'::date)
         ->  Materialize  (cost=0.00..128.53 rows=6352 width=22)
               ->  Nested Loop  (cost=0.00..96.78 rows=6352 width=22)
                     ->  Seq Scan on project_hits  (cost=0.00..14.93 rows=397 wi
dth=10)
                           Filter: (day = '2013-03-16'::date)
                     ->  Materialize  (cost=0.00..2.49 rows=16 width=16)
                           ->  Nested Loop  (cost=0.00..2.41 rows=16 width=16)
                                 ->  Seq Scan on application_hits  (cost=0.00..1
.10 rows=4 width=10)
                                       Filter: (day = '2013-03-16'::date)
                                 ->  Materialize  (cost=0.00..1.12 rows=4 width=
10)
                                       ->  Seq Scan on client_hits  (cost=0.00..
1.10 rows=4 width=10)
                                             Filter: (day = '2013-03-16'::date)

 Delete on client_hits  (cost=0.00..989722.41 rows=79023644 width=18)
   ->  Nested Loop  (cost=0.00..989722.41 rows=79023644 width=18)
         ->  Seq Scan on user_hits  (cost=0.00..1887.00 rows=49763 width=10)
               Filter: (day = '2013-03-16'::date)
         ->  Materialize  (cost=0.00..43.83 rows=1588 width=16)
               ->  Nested Loop  (cost=0.00..35.89 rows=1588 width=16)
                     ->  Seq Scan on project_hits  (cost=0.00..14.93 rows=397 wi
dth=10)
                           Filter: (day = '2013-03-16'::date)
                     ->  Materialize  (cost=0.00..1.12 rows=4 width=10)
                           ->  Seq Scan on client_hits  (cost=0.00..1.10 rows=4 
width=10)
                                 Filter: (day = '2013-03-16'::date)

 Delete on project_hits  (cost=0.00..248851.80 rows=19755911 width=12)
   ->  Nested Loop  (cost=0.00..248851.80 rows=19755911 width=12)
         ->  Seq Scan on user_hits  (cost=0.00..1887.00 rows=49763 width=10)
               Filter: (day = '2013-03-16'::date)
         ->  Materialize  (cost=0.00..16.91 rows=397 width=10)
               ->  Seq Scan on project_hits  (cost=0.00..14.93 rows=397 width=10
)
                     Filter: (day = '2013-03-16'::date)

 Delete on user_hits  (cost=0.00..1887.00 rows=49763 width=6)
   ->  Seq Scan on user_hits  (cost=0.00..1887.00 rows=49763 width=6)
         Filter: (day = '2013-03-16'::date)
(39 rows)

rulestest=# select distinct day from application_hits;
    day     
------------
 2013-03-15
 2013-03-16
(2 rows)

rulestest=# select count(*), day from application_hits group by day;
 count |    day     
-------+------------
     4 | 2013-03-15
     4 | 2013-03-16
(2 rows)

rulestest=# select count(*), day from client_hits group by day;
 count |    day     
-------+------------
     4 | 2013-03-15
     4 | 2013-03-16
(2 rows)

rulestest=# select count(*), day from project_hits group by day;
 count |    day     
-------+------------
   397 | 2013-03-15
   397 | 2013-03-16
(2 rows)

If your data is anything like your existing data, neither rules nor triggers will work very well. Better will be a stored procedure which you pass a value and it deletes everything you want.

First let's note that indexes here will get you nowhere because in all cases you are pulling half of the tables (I did add indexes on day on all tables to help the planner but this made no real difference).

You need to start with what you are doing with RULEs. RULEs basically rewrite queries and they do so using ways that are as robust as possible. Your code also doesn't match your example though it matches your question better. You have rules on tables which cascade to rules on other tables which cascade to rules on other tables

Therefore when you delete from user_hits where [criteria], the rules transform this into a set of queries:

DELETE FROM application_hits 
 WHERE day IN (SELECT day FROM client_hits 
               WHERE day IN (SELECT day FROM user_hits WHERE [condition]));
DELETE FROM client_hits
  WHERE day IN (SELECT day FROM user_hits WHERE [condition]);
DELETE FROM user_hits WHERE [condition];

Now, you might think we could skip the scan on client_hits in the first, but that isn't what happens here. The problem is that you could have days in user_hits and application_hits that are not in client_hits so you really have to scan all tables.

Now here there is no magic bullet. A trigger isn't going to work much better because, while it gets to avoid scanning every table, it gets fired every row that gets deleted so you basically end up with the same nested loop sequential scans that are currently killing performance. It will work a bit better because it will delete rows along the way rather than rewriting the query along the way, but it isn't going to perform very well.

A much better solution is to just define a stored procedure and have the application call that. Something like:

CREATE OR REPLACE FUNCTION delete_stats_at_date(in_day date) RETURNS BOOL 
LANGUAGE SQL AS
$$
DELETE FROM application_hits WHERE day = $1;
DELETE FROM project_hits WHERE day = $1;
DELETE FROM client_hits WHERE day  = $1;
DELETE FROM user_hits WHERE day = $1;
SELECT TRUE;
$$;

On the test data this runs in 280 ms on my laptop.

One of the hard things regarding RULEs is remembering what they are and noting that the computer cannot, in fact, read your mind. This is why I would not consider them a beginner's tool.

Sql-server – Creating Indexed View GROUP BY Epoch Date

I think there's some misunderstanding about what you're attempting to do here.

Since your current design is to return all 24 rows from the base table, presumably all the supplementary fields are returned as well (to display in a grid, or something).

In order to fully aggregate the Value column, all the supplementary columns cannot be included in the SELECT list. Alternatively, if those columns are included in the GROUP BY clause, the view would represent only a partial aggregation, as there would be one row for each unique combination of the columns in the GROUP BY column list.

The only way I see something like this being useful is if the supplementary columns aren't included in the view, and there is some other process that requires only the daily aggregated values, without the base row data. Such a view could be defined like this:

CREATE TABLE [dbo].[BaseTbl]
(
    ColRowID bigint NOT NULL,
    AggregateID int NOT NULL,
    Epoch int NOT NULL,
    CustomerID int NOT NULL,
    TypeID tinyint NOT NULL,
    ErrorID smallint NOT NULL,
    Value int NOT NULL,

    PRIMARY KEY CLUSTERED(Epoch, CustomerId)
);
GO

CREATE VIEW [dbo].[ixvw_AggTbl]
    WITH SCHEMABINDING
AS
    SELECT
        t.Epoch / 86400 AS EpochDay,
        CustomerID,
        TypeID,
        SUM(t.Value) AS TotalValue,
        COUNT_BIG(*) AS __RowCount
        FROM [dbo].[BaseTbl] t
        GROUP BY
            t.Epoch / 86400,
            CustomerID,
            TypeID;
GO

CREATE UNIQUE CLUSTERED INDEX IX_ixvw_AggTbl
    ON [dbo].[ixvw_AggTbl](EpochDay, CustomerID, TypeID);

Unfortunately, you can't go farther and convert the EpochDay column to an actual date within the indexed view because DATEADD is non-deterministic (see Aaron's comment below for why), so you'd have to convert it in the actual SELECT query against the view. But that's not too difficult.

In any event, as I said before, I'm not sure how useful this would be for your specific application.

Best Answer

Related Solutions

Postgresql – how to chain postgres RULEs

Sql-server – Creating Indexed View GROUP BY Epoch Date

Related Question