What can I expect performance-wise of such a trigger/partition scheme?
Context switches mean using triggers is always going to use a lot more CPU than a simple insert
. The script below can be used to quantify how much impact that will have - and also demonstrates auto-creating partitions using triggers and compares the performance either way.
Please note I haven't included any indexing, or any consideration of update
statements.
begin;
set role dba;
create role stack;
grant stack to dba;
create schema authorization stack;
set role stack;
--
--******** the above creates a nice clean schema as a test area
--
set client_min_messages to warning; --******** or you get a lot of "NOTICE: merging column "xyz" with inherited definition" notices
--
create table phpheap(log_id serial not null, host text not null, logdate date not null, message text not null); --******** This table is used to compare 'insert' performance with that on the partitioned version
create table php(log_id serial not null, host text not null, logdate date not null, message text not null);
--
create function php_host_insert() returns trigger language plpgsql security definer as $$
begin
set search_path to 'stack';
execute 'insert into php_'||new.host||'_'||to_char(new.logdate, 'YYYYmMM')||'(log_id, host, logdate, message) values($1, $2, $3, $4)' using new.log_id, new.host, new.logdate, new.message;
return null;
exception when undefined_table then
execute 'create table php_'||new.host||'_'||to_char(new.logdate, 'YYYYmMM')||'(log_id int not null, host text not null check(host='''||new.host||'''), logdate date not null check(to_char(logdate, ''YYYYmMM'')='''||to_char(new.logdate, 'YYYYmMM')||'''), message text not null) inherits (php_'||new.host||')';
execute 'insert into php_'||new.host||'_'||to_char(new.logdate, 'YYYYmMM')||'(log_id, host, logdate, message) values($1, $2, $3, $4)' using new.log_id, new.host, new.logdate, new.message;
return null;
end;$$;
--
create function php_insert() returns trigger language plpgsql security definer as $$
begin
set search_path to 'stack';
execute 'insert into php_'||new.host||'(log_id, host, logdate, message) values($1, $2, $3, $4)' using new.log_id, new.host, new.logdate, new.message;
return null;
exception when undefined_table then
execute 'create table php_'||new.host||'(log_id int not null, host text not null check(host='''||new.host||'''), logdate date not null, message text not null) inherits(php)';
execute 'create trigger trig_insert_php_'||new.host||' before insert on php_'||new.host||' for each row execute procedure php_host_insert()';
execute 'insert into php_'||new.host||'(log_id, host, logdate, message) values($1, $2, $3, $4)' using new.log_id, new.host, new.logdate, new.message;
return null;
end;$$;
--
create trigger trig_insert_php before insert on php for each row execute procedure php_insert();
--
\timing on
insert into phpheap(host, logdate, message) select 'host1', current_date-(generate_series(-99999, 0, 1)/1000)::integer, repeat('hello',20);
--******** output
--INSERT 0 100000
--Time: 1102.140 ms
insert into php(host, logdate, message) select 'host1', current_date-(generate_series(-99999, 0, 1)/1000)::integer, repeat('hello',20);
--******** output
--INSERT 0 0
--Time: 35615.498 ms
insert into php(host, logdate, message) select 'host1', current_date-(generate_series(-99999, 0, 1)/1000)::integer, repeat('hello',20);
--******** output
--INSERT 0 0
--Time: 34074.579 ms
\timing off
--
--******** Now we replace the trigger functions with the 'normal' kind that don't auto-create partitions
--
create or replace function php_host_insert() returns trigger language plpgsql security definer as $$
begin
set search_path to 'stack';
execute 'insert into php_'||new.host||'_'||to_char(new.logdate, 'YYYYmMM')||'(log_id, host, logdate, message) values($1, $2, $3, $4)' using new.log_id, new.host, new.logdate, new.message;
return null;
end;$$;
--
create or replace function php_insert() returns trigger language plpgsql security definer as $$
begin
set search_path to 'stack';
execute 'insert into php_'||new.host||'(log_id, host, logdate, message) values($1, $2, $3, $4)' using new.log_id, new.host, new.logdate, new.message;
return null;
end;$$;
--
\timing on
insert into php(host, logdate, message) select 'host1', current_date-(generate_series(-99999, 0, 1)/1000)::integer, repeat('hello',20);
--******** output
--INSERT 0 0
--Time: 28457.146 ms
\timing off
--
rollback;
Technically to display n-th value you use the ranking functions like RANK
, DENSE_RANK
or ROW_NUMBER
. Which one exactly depends, but what you describe matches best ROW_NUMBER
:
with cte as (
select row_number() over (order by Date) as rn, *
from table)
where rn % 1000 = 0;
But you are saying that you want o do this for performance reasons, avoid going through millions of rows. Such a query had already done the damage, it read from disk all the rows and had to sort them so the performance price was already payed.
A better alternative to selecting a random sample of data is to use the TABLESAMPLE
clause, see Limiting Result Sets by Using TABLESAMPLE:
select *
from table tablesample (100 rows);
TABLESAMPLE will be way more efficient as it actually avoids reading all the data, it only samples some of the pages in the table and returns all rows in the sampled pages.
But consider that if you use TABLESAMPLE your WHERE clause are applied after the sampling. So it may be that the sample does not contain any row for the device/sensor you're interested in. This is specially true for sensor with small data.
Best Answer
This is just a starting point and will likely need some tweaking.
Essentially this will get you the 4 closest capture dates to your specified date (2 closest after and 2 closest before).
You will need to add some logic to your outer select to pick which ones to use, but you will be doing a
DATEDIFF
on 4 fields instead of all of them.