Mysql – ORDER BY optimization for a large table

MySQLoptimization

I am currently aggregating posts from different social networks. Currently I have 50-100 million posts from Facebook, Twitter, Youtube, Instagram , Pinterest.

Consider a table posts

posts
{
    id  int(11),
    user_id  int(11),
    url varchar(256), 
    image varchar(256), 
    source int(11),  // Social Network Source 
    created bigint(20), // Publish time of the post 
    visible int(1) // Public or Private Posts
}

Using InnoDB

Indexes :

primary key on id
user_id,source
user_id,created

Space usage

Type    Usage 
Data    45,876.0    MiB
Index   4,959.0 MiB
Total   50,835.0    MiB

Using Amazon RDS with 7.5 GB RAM.

The queries I am executing are as follows

select id 
from posts 
where user_id={user_id} 
  and visible=1  
order by created desc 
  LIMIT 20 ;


select id 
from posts 
where user_id={user_id} 
  and source={your_network} 
  and visible=1 
order by created desc 
  LIMIT 20 ;

For users with > 500k posts sometimes my queries take a really long time.

I have run explain and you can see the results below

EXPLAIN SELECT id
FROM  `posts` 
WHERE user_id =123529745 and source=1 and visible=1
ORDER BY created DESC 
LIMIT 20

id    select_type       table   type    possible_keys   key     key_len ref     rows    Extra 
1       SIMPLE          posts   ref     user_id_2,user_id_4,user_id_3,created   created 4       const   954174  Using where; Using index

id    select_type       table   type    possible_keys   key     key_len ref     rows    Extra 
1       SIMPLE          posts   ref     user_id_2,user_id_4,user_id_3,created   created 4       const   742308  Using where;

Can you please suggest me any optimizations?

Best Answer

This is your query:

SELECT id
FROM  `posts` 
WHERE user_id =123529745 and source=1 and visible=1
ORDER BY created DESC

You are doing only equality comparisons in the where and then sorting by created.

A single index on: posts(user_id, source, visible, created, id) should allow the entire query to be satisfied only by the index -- first finding the right rows, then sorting by them, and then fetching the id.