I have a table where I periodically record the throughput of all network interfaces on a set of servers:
create table net (
ts timestamptz not null,
host text not null,
interface text not null,
recv_bytes bigint not null
);
create index on net (ts);
insert into net (ts, host, interface, recv_bytes) values
('2017-01-01 00:00:00+00', 'a', 'eth0', 500),
('2017-01-01 00:00:00+00', 'b', 'eth1', 2000),
('2017-01-01 00:00:01+00', 'b', 'eth0', 1000), -- measurements arrive with some jitter, +1s here
('2017-01-01 00:00:02+00', 'c', 'eth0', 100), -- only present in this interval
('2017-01-01 00:00:04+00', 'b', 'eth1', 1900),
('2017-01-01 00:00:05+00', 'a', 'eth0', 550),
('2017-01-01 00:00:05+00', 'b', 'eth0', 1200),
('2017-01-01 00:00:10+00', 'a', 'eth0', 600),
('2017-01-01 00:00:10+00', 'b', 'eth0', 1500),
('2017-01-01 00:00:11+00', 'b', 'eth1', 1900),
('2017-01-01 00:00:15+00', 'a', 'eth0', 600),
('2017-01-01 00:00:15+00', 'b', 'eth1', 1400),
('2017-01-01 00:00:16+00', 'b', 'eth0', 1400),
('2017-01-01 00:00:16+00', 'b', 'eth1', 1700); -- (b,eth1) appears 3 times in this interval
I would like to find the total throughput across all interfaces for a given time range, averaged over 10 second intervals. If a (host, interface) pair is missing data for a given interval, it can simply be omitted. This is what I came up with:
select ts_interval, host, sum(recv_bytes) as recv_bytes
from (
select
to_timestamp(floor(extract(epoch from ts) / 10) * 10) as ts_interval,
host,
interface,
avg(recv_bytes) as recv_bytes
from net
group by ts_interval, host, interface
) avg_net
where ts_interval >= '2017-01-01 00:00:00+00' and ts_interval < '2017-01-01 00:00:20+00'
group by ts_interval, host
order by ts_interval, host;
Result:
ts_interval | host | recv_bytes
------------------------+------+-----------------------
2017-01-01 00:00:00+00 | a | 525.0000000000000000
2017-01-01 00:00:00+00 | b | 3050.0000000000000000
2017-01-01 00:00:00+00 | c | 100.0000000000000000
2017-01-01 00:00:10+00 | a | 600.0000000000000000
2017-01-01 00:00:10+00 | b | 3116.6666666666666667
This query seems overly verbose. The essence of that I'm trying to do is to sum over one dimension and average over the other. In pseudo-SQL:
select
to_timestamp(floor(extract(epoch from ts) / 10) * 10) as ts_interval,
host,
sum(avg(recv_bytes OVER ts_interval) OVER host) as recv_bytes
from net
where ts >= '2017-01-01 00:00:00+00' and ts < '2017-01-01 00:00:20+00'
group by ts_interval, host
order by ts_interval, host;
Is there a more concise way to do this?
Best Answer
Query
You can use
generate_series()
to generate a time raster to join to:Same result. Not less verbose, but much faster if you have an index on
(ts)
.Then again, the crucial point is to have "sargable" predicates, which you also get without
generate_series()
this way:Minor point: With exclusively positive numbers,
trunc()
is equivalent tofloor()
and a bit faster.I am using ordinal numbers, since you asked for "less verbose" code. But that's not the core of the question here ...
Core question
You can run a window function over an aggregate function in one query level (though it's not typically faster than using a subquery).
But not the other way round: you cannot run an aggregate function over a window function in one query level. Just not possible. Window functions happen after aggregate functions in SQL.
Related: