Edit: added create/insert sql at the bottom
I need to extract data from a table (in postgres), where I want to look at nearby records within a timespan
The table looks like this:
objectid | keyword | date_created | ip
----------+----------------+---------------------+-----------------
1 | guard | 2014-09-03 12:29:00 | 123.123.123.123
2 | dog | 2014-09-03 12:30:00 | 123.123.123.123
3 | police | 2014-09-03 12:40:00 | 123.123.123.123
4 | swan | 2014-09-03 12:45:00 | 123.123.123.123
5 | goose | 2014-09-03 12:46:00 | 123.123.123.123
6 | golf | 2014-09-04 03:15:00 | 123.123.123.123
7 | police | 2014-11-03 12:45:00 | 11.11.11.11
8 | police hunt | 2014-11-03 12:46:00 | 11.11.11.11
9 | police officer | 2014-11-03 12:47:00 | 11.11.11.11
10 | police dog | 2014-11-03 12:48:00 | 11.11.11.11
11 | house | 2014-11-03 12:58:00 | 11.11.11.11
12 | police hunt | 2013-02-03 03:01:00 | 11.11.11.11
13 | police | 2013-02-03 03:05:00 | 11.11.11.11
14 | police office | 2013-02-03 03:07:00 | 11.11.11.11
15 | wolf | 2013-02-03 06:28:00 | 11.11.11.11
16 | fox | 2013-02-03 06:48:00 | 11.11.11.11
(16 rows)
I would like to extract nearby records for specific keywords,
e.g. I want to extract the one nearest records for the keyword "police", if the nearest record is < 30 min from the datestamp of police and has the same ip,
so above that would return
2 | dog | 2014-09-03 12:30:00 | 123.123.123.123
3 | police | 2014-09-03 12:40:00 | 123.123.123.123
4 | swan | 2014-09-03 12:45:00 | 123.123.123.123
7 | police | 2014-11-03 12:45:00 | 11.11.11.11
8 | police hunt | 2014-11-03 12:46:00 | 11.11.11.11
12 | police hunt | 2013-02-03 03:01:00 | 11.11.11.11
13 | police | 2013-02-03 03:05:00 | 11.11.11.11
14 | police office | 2013-02-03 03:07:00 | 11.11.11.11
I am really just interested in the keywords for now, if it makes a difference, so for "police" I be happy with a list:
dog, swan, police hunt, police office
Also, all records (not just the one closest) within the 30 min timespan could be interesting, so i would like to look at that as well, e.g.
for "police" that would return
1 | guard | 2014-09-03 12:29:00 | 123.123.123.123
2 | dog | 2014-09-03 12:30:00 | 123.123.123.123
3 | police | 2014-09-03 12:40:00 | 123.123.123.123
4 | swan | 2014-09-03 12:45:00 | 123.123.123.123
5 | goose | 2014-09-03 12:46:00 | 123.123.123.123
7 | police | 2014-11-03 12:45:00 | 11.11.11.11
8 | police hunt | 2014-11-03 12:46:00 | 11.11.11.11
9 | police officer | 2014-11-03 12:47:00 | 11.11.11.11
10 | police dog | 2014-11-03 12:48:00 | 11.11.11.11
11 | house | 2014-11-03 12:58:00 | 11.11.11.11
12 | police hunt | 2013-02-03 03:01:00 | 11.11.11.11
13 | police | 2013-02-03 03:05:00 | 11.11.11.11
14 | police office | 2013-02-03 03:07:00 | 11.11.11.11
I have a huge list (10000) of keywords i want to this for, although if that is tricky in sql I could create a separate sql script for each of them.
I am really stumped how to start on this
Create/insert:
drop table tmp_data;
CREATE TABLE tmp_data (
objectid SERIAL PRIMARY KEY,
keyword varchar(80) NOT NULL,
date_created TIMESTAMP without time zone default current_timestamp,
ip varchar(80) NOT NULL
);
insert into tmp_data (keyword, date_created, ip) values ('guard','2014-09-03 12:29', '123.123.123.123');
insert into tmp_data (keyword, date_created, ip) values ('dog','2014-09-03 12:30', '123.123.123.123');
insert into tmp_data (keyword, date_created, ip) values ('police','2014-09-03 12:40', '123.123.123.123');
insert into tmp_data (keyword, date_created, ip) values ('swan','2014-09-03 12:45', '123.123.123.123');
insert into tmp_data (keyword, date_created, ip) values ('goose','2014-09-03 12:46', '123.123.123.123');
insert into tmp_data (keyword, date_created, ip) values ('golf','2014-09-04 03:15', '123.123.123.123');
insert into tmp_data (keyword, date_created, ip) values ('police','2014-11-03 12:45', '11.11.11.11');
insert into tmp_data (keyword, date_created, ip) values ('police hunt','2014-11-03 12:46', '11.11.11.11');
insert into tmp_data (keyword, date_created, ip) values ('police officer','2014-11-03 12:47', '11.11.11.11');
insert into tmp_data (keyword, date_created, ip) values ('police dog','2014-11-03 12:48', '11.11.11.11');
insert into tmp_data (keyword, date_created, ip) values ('house','2014-11-03 12:58', '11.11.11.11');
insert into tmp_data (keyword, date_created, ip) values ('police hunt','2013-02-03 3:01', '11.11.11.11');
insert into tmp_data (keyword, date_created, ip) values ('police','2013-02-03 3:05', '11.11.11.11');
insert into tmp_data (keyword, date_created, ip) values ('police office','2013-02-03 3:07', '11.11.11.11');
insert into tmp_data (keyword, date_created, ip) values ('wolf','2013-02-03 06:28', '11.11.11.11');
insert into tmp_data (keyword, date_created, ip) values ('fox','2013-02-03 06:48', '11.11.11.11');
Best Answer
For adjacent records:
For all records:
fiddle