I am currently aggregating posts from different social networks. Currently I have 50-100 million posts from Facebook, Twitter, Youtube, Instagram , Pinterest.
Consider a table posts
posts
{
id int(11),
user_id int(11),
url varchar(256),
image varchar(256),
source int(11), // Social Network Source
created bigint(20), // Publish time of the post
visible int(1) // Public or Private Posts
}
Using InnoDB
Indexes :
primary key on id
user_id,source
user_id,created
Space usage
Type Usage
Data 45,876.0 MiB
Index 4,959.0 MiB
Total 50,835.0 MiB
Using Amazon RDS with 7.5 GB RAM.
The queries I am executing are as follows
select id
from posts
where user_id={user_id}
and visible=1
order by created desc
LIMIT 20 ;
select id
from posts
where user_id={user_id}
and source={your_network}
and visible=1
order by created desc
LIMIT 20 ;
For users with > 500k posts sometimes my queries take a really long time.
I have run explain and you can see the results below
EXPLAIN SELECT id
FROM `posts`
WHERE user_id =123529745 and source=1 and visible=1
ORDER BY created DESC
LIMIT 20
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE posts ref user_id_2,user_id_4,user_id_3,created created 4 const 954174 Using where; Using index
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE posts ref user_id_2,user_id_4,user_id_3,created created 4 const 742308 Using where;
Can you please suggest me any optimizations?
Best Answer
This is your query:
You are doing only equality comparisons in the
where
and then sorting bycreated
.A single index on:
posts(user_id, source, visible, created, id)
should allow the entire query to be satisfied only by the index -- first finding the right rows, then sorting by them, and then fetching the id.