I want to group records that have the same combination of class, ip_address, and hostname, and keep the highest timestamp for each day from each group.
I am not using the column name timestamp (and you shouldn't either). It's a reserved word in standard SQL and a basic type name in Postgres. I use ts instead.
The query is surprisingly simple with DISTINCT ON:
SELECT DISTINCT ON (class, ip_address, hostname, ts::date) *
FROM agent_log
WHERE ts < now() - interval '7 days'
ORDER BY class, ip_address, hostname, ts::date, ts DESC;
Detailed explanation:
Select first row in each GROUP BY group?
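To make the selection rule concrete, here is a minimal Python sketch of what the DISTINCT ON query above picks, outside the database. The dict-based row layout and the function name latest_per_day are assumptions for illustration only; this is not how Postgres evaluates the query, and the WHERE filter on ts is left out.

```python
from datetime import datetime

def latest_per_day(rows):
    """Per (class, ip_address, hostname, calendar day), keep the row with the highest ts."""
    best = {}
    for row in rows:
        # Same grouping columns as in DISTINCT ON (...), with ts reduced to its date.
        key = (row["class"], row["ip_address"], row["hostname"], row["ts"].date())
        # "ts DESC" in the ORDER BY means the latest row of each group wins.
        if key not in best or row["ts"] > best[key]["ts"]:
            best[key] = row
    # Return the winners in the same order as the leading ORDER BY columns.
    return [best[key] for key in sorted(best)]
```

Two rows for the same class, ip_address and hostname on the same day collapse to the later one; a row on another day, or for another host, forms its own group.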
Since the seconds are counted from 0 (1970-01-01 00:00:00 UTC), you should look for every multiple of 60:
SELECT * FROM mytable WHERE MOD(TimeStamp,60)=0;
or if TimeStamp is indexed, you can do
SELECT T.* FROM
(SELECT TimeStamp FROM mytable WHERE MOD(TimeStamp,60)=0) M
INNER JOIN mytable T USING (TimeStamp);
Give it a Try !!!
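As a quick sanity check of the MOD(TimeStamp,60)=0 idea, here is a small Python sketch. The function name on_the_minute and the sample values are assumptions chosen for illustration. Note that this filter only keeps rows whose timestamp lands exactly on a minute boundary, so a minute with no such row returns nothing.

```python
from datetime import datetime, timezone

def on_the_minute(unix_ts):
    """True when a Unix timestamp (seconds since 1970-01-01 00:00:00 UTC) is a multiple of 60."""
    return unix_ts % 60 == 0

# Hypothetical sample timestamps: only the first and last sit on a minute boundary.
samples = [1700000040, 1700000041, 1700000100]
kept = [ts for ts in samples if on_the_minute(ts)]

# Cross-check against the calendar view: the seconds field of each kept value is 0.
seconds_fields = [datetime.fromtimestamp(ts, tz=timezone.utc).second for ts in kept]
```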
SUGGESTION #1
You should store the timestamp of the minute and index it
ALTER TABLE mytable ADD COLUMN MinuteTimeStamp INT UNSIGNED AFTER TimeStamp;
UPDATE mytable SET MinuteTimeStamp = TimeStamp - MOD(TimeStamp,60);
ALTER TABLE mytable ADD INDEX MinuteTimeStamp_UniqueKey_ndx (MinuteTimeStamp,UniqueKey);
Then, you can do MIN aggregation on MinuteTimeStamp.
SELECT MinuteTimeStamp,MIN(UniqueKey) UniqueKey
FROM mytable GROUP BY MinuteTimeStamp;
and use it to get those records
SELECT B.* FROM
(SELECT MinuteTimeStamp,MIN(UniqueKey) UniqueKey
FROM mytable GROUP BY MinuteTimeStamp) A
INNER JOIN mytable B USING (UniqueKey);
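The aggregation and join above can be sketched in Python as follows. The dict-based rows and the function name first_row_per_minute are assumptions for illustration. As in the SQL, MIN(UniqueKey) only means "first row of the minute" if UniqueKey grows with insertion order, which is an assumption about the table.

```python
def first_row_per_minute(rows):
    """Floor each TimeStamp to its minute, keep the smallest UniqueKey per minute,
    then look the full rows up again by that key (the INNER JOIN step)."""
    min_key = {}
    for row in rows:
        # Same arithmetic as TimeStamp - MOD(TimeStamp, 60).
        minute = row["TimeStamp"] - row["TimeStamp"] % 60
        if minute not in min_key or row["UniqueKey"] < min_key[minute]:
            min_key[minute] = row["UniqueKey"]
    # Index the rows by UniqueKey so the lookup mirrors USING (UniqueKey).
    by_key = {row["UniqueKey"]: row for row in rows}
    return [by_key[min_key[minute]] for minute in sorted(min_key)]
```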
It was tactfully pointed out that triggers would degrade performance.
Perhaps doing the INSERTs like this may help instead:
INSERT INTO mytable (UniqueKey,TimeStamp,MinuteTimeStamp) VALUES
(
uniquevalue,
UNIX_TIMESTAMP(NOW()),
UNIX_TIMESTAMP(NOW() - INTERVAL SECOND(NOW()) SECOND)
);
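If the values are built in application code instead, the same pair can be derived from a single reading of the clock. This is only a sketch: the function name insert_values and the tuple layout are assumptions, and it presumes TimeStamp holds whole Unix seconds as in the statements above.

```python
from datetime import datetime, timezone

def insert_values(unique_key, now=None):
    """Return (UniqueKey, TimeStamp, MinuteTimeStamp) for one INSERT.

    The clock is read once, so both timestamps refer to the same instant.
    """
    now = now or datetime.now(timezone.utc)
    ts = int(now.timestamp())
    # Floor to the start of the minute, like TimeStamp - MOD(TimeStamp, 60).
    return (unique_key, ts, ts - ts % 60)
```

Passing a fixed now makes the function easy to check, since the third value is always a multiple of 60 and never more than 59 seconds behind the second.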
SUGGESTION #2
Since you have over 1000 columns (Ugh), perhaps a table of those minute timestamps would be better.
CREATE TABLE MinuteKeys
(
MinuteTimeStamp INT UNSIGNED NOT NULL,
UniqueKey INT UNSIGNED NOT NULL,
PRIMARY KEY (UniqueKey),
KEY MinuteTimeStamp_UniqueKey_ndx (MinuteTimeStamp,UniqueKey)
) ENGINE=MyISAM;
ALTER TABLE MinuteKeys DISABLE KEYS;
INSERT INTO MinuteKeys SELECT TimeStamp - MOD(TimeStamp,60),UniqueKey FROM mytable;
ALTER TABLE MinuteKeys ENABLE KEYS;
Then, you could use that table for the aggregation
SELECT B.* FROM
(SELECT MinuteTimeStamp,MIN(UniqueKey) UniqueKey
FROM MinuteKeys GROUP BY MinuteTimeStamp) A
INNER JOIN mytable B USING (UniqueKey);
EPILOGUE
Other suggestions are possible, but you should really consider normalizing the table.
See my post "Too many columns in MySQL" as to why.
You can use FULL JOIN like this:
Or just use CASE: