I have some instagram API data sitting across two tables. One with post_data, and one with profile_data.
The profile_data table has a timestamp of when the account was checked through the API to get the number of followers the account has at a point in time.
The post_data table has the timestamp of a post.
I need to join these two together with a best fit timestamp to calculate an engagement rate (i.e. likes on a post / followers of account).
The profile_data table has just under 13k rows of data and the
post_data table has just under 6k of rows.
There is a third relational table called brands which contains the brand featured in a post_id
My current query looks like this:
SELECT p.ent_id,
p.post_time,
p.record_updated,
p.last_checked,
p.post_id,
p.id,
p.image,
p.caption,
p.tags,
p.likes as 'likes',
p.comment_count as 'comment_count',
p.comments,
p.users_in_photo,
p.post_type,
p.link,
p.gender,
p.model,
p.content_type,
p.location,
p.campaign,
GROUP_CONCAT(b.brand SEPARATOR',') AS brands,
d.followers,
((p.likes+p.comment_count)/d.followers)*100 AS 'engagement_rate',
p.reach as 'reach',
p.impressions as 'impressions',
p.saved as 'saved',
((p.reach/d.followers)*100) as 'reach_rate'
FROM post_data p
LEFT JOIN featured_brands b ON p.post_id = b.post_id
LEFT JOIN profile_data d ON p.post_time BETWEEN SUBTIME(d.time_checked, '00:30:00')
AND ADDTIME(d.time_checked, '00:30:00')
WHERE p.post_time > '2019-01-02 00:00:00'
AND p.post_time < '2019-05-13 23:59:59'
GROUP BY p.post_id
ORDER BY likes DESC;
This query achieves the goal however for the date range above, the query takes almost 27seconds which is not acceptable. I have identified that changed the SUBTIME and ADDTIME to the equivalent of (d.time_checked - interval 30 minutes)
I can get this down to about 15 seconds but this is still too long.
I figure the BETWEEN call is my major problem but I'm unsure of how to achieve the same result more effectively.
Are there any other methods that I could achieve this result in a more efficient manner?
Edit:
Mysql version 5.5.42.
DDL:
CREATE TABLE `featured_brands` (
`post_id` varchar(255) CHARACTER SET utf8mb4 NOT NULL DEFAULT '',
`brand` varchar(255) CHARACTER SET utf8mb4 NOT NULL DEFAULT ''
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE `post_data` (
`ent_id` int(11) NOT NULL AUTO_INCREMENT,
`post_time` timestamp NULL DEFAULT NULL,
`record_updated` timestamp NULL DEFAULT NULL,
`last_checked` timestamp NULL DEFAULT NULL,
`post_id` varchar(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL DEFAULT '',
`image` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL,
`caption` text CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL,
`tags` text CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL,
`likes` int(7) NOT NULL,
`comment_count` int(7) NOT NULL,
`comments` text CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL,
`users_in_photo` text CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL,
`post_type` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL,
`link` text CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL,
`gender` varchar(1000) CHARACTER SET utf8mb4 NOT NULL DEFAULT '',
`model` varchar(1000) CHARACTER SET utf8mb4 NOT NULL DEFAULT '',
`content_type` varchar(255) CHARACTER SET utf8mb4 NOT NULL DEFAULT '',
`location` varchar(255) CHARACTER SET utf8mb4 NOT NULL DEFAULT '',
`campaign` varchar(255) CHARACTER SET utf8mb4 DEFAULT NULL,
`id` varchar(50) CHARACTER SET utf8mb4 DEFAULT NULL,
`ig_id` varchar(50) CHARACTER SET utf8mb4 DEFAULT NULL,
`engagement` int(11) DEFAULT NULL,
`reach` int(11) DEFAULT NULL,
`saved` int(11) DEFAULT NULL,
`impressions` int(11) DEFAULT NULL,
`no_advanced_data` int(11) DEFAULT NULL,
PRIMARY KEY (`ent_id`),
UNIQUE KEY `post_id` (`post_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE `profile_data` (
`record_id` int(11) NOT NULL,
`time_checked` timestamp NULL DEFAULT NULL,
`profile_id` int(7) DEFAULT NULL,
`username` varchar(255) DEFAULT NULL,
`profile_picture` varchar(255) DEFAULT NULL,
`full_name` varchar(255) DEFAULT NULL,
`bio` text,
`website` varchar(255) DEFAULT NULL,
`is_business` int(1) DEFAULT NULL,
`media_count` int(11) DEFAULT NULL,
`follows` int(11) DEFAULT NULL,
`followers` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Best Answer
Some optimizations in these areas have been made since 5.5. I recommend you upgrade.
Don't use
LEFT
unless you need the semantics of it.Avoid joining to
b
in the following manner: ChangeGROUP_CONCAT(b.brand SEPARATOR',') AS brands
toand get rid of the
LEFT JOIN ...
This may help the "explode-implode" pattern of
JOINing
, then collapsing viaGROUP BY
.You may be able to get rid of the
GROUP BY
altogether by turning the otherLEFT JOIN
into a "derived table":The derived table would be a
SELECT
thatJOIN
post_data andprofile_data
with the primary goal of finding which post_ids are interesting.Indexes...