I'm trying to determine what the optimal query might be for associating a date with a record that would have been active for that day.
Here is a setup to demonstrate what I'm trying to do (demo at SQL Fiddle):
DDL:
CREATE TABLE component_data
("id" int, "generated_date" timestamp, "removed_date" timestamp, "active" bool, "owner_rel_id" int)
;
CREATE TABLE dates
("fulldate" date, "year" int, "month" int, "day" int)
;
component_data
Sample Data:
INSERT INTO component_data
("id", "generated_date", "removed_date", "active", "owner_rel_id")
VALUES
(2507562, '2016-04-23 07:37:51', NULL, 't', 977720),
(2507563, '2016-04-23 07:37:51', NULL, 't', 977720),
(2507564, '2016-04-23 07:37:51', NULL, 't', 977720),
(2507565, '2016-04-23 07:37:51', NULL, 't', 977720),
(2507566, '2016-04-23 07:37:51', NULL, 't', 977720),
(2507567, '2016-04-23 07:37:51', NULL, 't', 977720),
(1586966, '2013-09-16 18:28:23', NULL, 't', 653531),
(1586854, '2013-09-16 17:02:43', NULL, 't', 678806),
(2363032, '2015-12-26 13:59:02', NULL, 't', 666874),
(1586929, '2013-09-16 18:04:41', NULL, 't', 678820),
(1587071, '2013-09-16 18:47:32', NULL, 't', 463631),
(1587072, '2013-09-16 18:53:10', NULL, 't', 678834),
(2363033, '2015-12-26 13:59:02', NULL, 't', 666874),
(2235362, '2015-09-07 17:30:58', NULL, 't', 882233),
(1587065, '2013-11-17 05:12:42', '2016-05-16 09:12:58-04', NULL, 678831),
(1587785, '2013-09-19 00:00:00', NULL, 't', 679404),
(2363036, '2015-12-26 14:15:59', NULL, 't', 713679),
(1581675, '2013-09-25 00:00:00', '2016-05-16 09:34:26-04', NULL, 677199),
(1210333, '2013-02-14 00:00:00', NULL, 't', 681094),
(1592753, '2013-09-24 18:04:49', NULL, 't', 680679),
(1593061, '2013-09-27 00:00:00', '2016-05-15 14:28:35-04', NULL, 680794),
(1593064, '2013-09-27 00:00:00', '2016-05-15 14:28:35-04', NULL, 680794)
;
dates
Sample Data:
INSERT INTO dates
("fulldate", "year", "month", "day")
VALUES
('2016-05-01 00:00:00', 2016, 5, 1),
('2016-05-02 00:00:00', 2016, 5, 2),
('2016-05-03 00:00:00', 2016, 5, 3),
('2016-05-04 00:00:00', 2016, 5, 4),
('2016-05-05 00:00:00', 2016, 5, 5),
('2016-05-06 00:00:00', 2016, 5, 6),
('2016-05-07 00:00:00', 2016, 5, 7),
('2016-05-08 00:00:00', 2016, 5, 8),
('2016-05-09 00:00:00', 2016, 5, 9),
('2016-05-10 00:00:00', 2016, 5, 10),
('2016-05-11 00:00:00', 2016, 5, 11),
('2016-05-12 00:00:00', 2016, 5, 12),
('2016-05-13 00:00:00', 2016, 5, 13),
('2016-05-14 00:00:00', 2016, 5, 14),
('2016-05-15 00:00:00', 2016, 5, 15),
('2016-05-16 00:00:00', 2016, 5, 16),
('2016-05-17 00:00:00', 2016, 5, 17),
('2016-05-18 00:00:00', 2016, 5, 18),
('2016-05-19 00:00:00', 2016, 5, 19),
('2016-05-20 00:00:00', 2016, 5, 20),
('2016-05-21 00:00:00', 2016, 5, 21),
('2016-05-22 00:00:00', 2016, 5, 22),
('2016-05-23 00:00:00', 2016, 5, 23),
('2016-05-24 00:00:00', 2016, 5, 24),
('2016-05-25 00:00:00', 2016, 5, 25)
;
Query:
SELECT
d.fulldate,
cd.id,
cd.owner_rel_id,
cd.generated_date,
cd.removed_date,
cd.active
FROM
component_data cd
INNER JOIN dates d ON (cd.generated_date <= d.fulldate) AND ((cd.removed_date >= d.fulldate) OR cd.active)
WHERE
d.fulldate IN ('2016-05-16', '2016-05-17')
ORDER BY
cd.id,
d.fulldate;
In the real world scenario, the component_data
table will have ~2 million rows or so, and I'll be comparing against dates back to 1JAN2013
I'm wondering if there's a more efficient way to do what my query is doing.
Best Answer
So I happened to have stumbled on a more efficient pattern for this sort of operation (based on how much faster this is in testing).
Given the above, here's what it would look like (I've excluded fields beyond what are necessary to point to a
compoent_data
record):Updated Query: