PostgreSQL – Finding by ID in PostgreSQL very slow on a big table

heroku, postgresql, postgresql-9.2

I have a videos table with 18M rows. When I look up a particular video by ID, it can take up to 6 seconds to complete. Sometimes it takes a few milliseconds and sometimes up to 6 seconds, but on average it is around 2 seconds.

The application is hosted on Heroku and I'm using the Crane database plan (https://addons.heroku.com/heroku-postgresql) with 410MB of RAM.

Is there any way to speed this up? I'm querying for videos 50 times per second on average, and new videos are inserted/updated at a rate of 50 per second.

explain analyze SELECT * FROM videos WHERE id = 17841464 LIMIT 1;
                                                            QUERY PLAN                                                            
----------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..6.43 rows=1 width=119) (actual time=2337.892..2337.894 rows=1 loops=1)
   ->  Index Scan using videos_pkey on videos  (cost=0.00..6.43 rows=1 width=119) (actual time=2337.888..2337.888 rows=1 loops=1)
         Index Cond: (id = 17841464)
 Total runtime: 2337.943 ms

Here is what the table looks like:

 \d+ videos;
                                                           Table "public.videos"
     Column     |            Type             |                      Modifiers                      | Storage  | Stats target | Description 
----------------+-----------------------------+-----------------------------------------------------+----------+--------------+-------------
 id             | integer                     | not null default nextval('videos_id_seq'::regclass) | plain    |              | 
 uuid           | character(11)               | not null                                            | extended |              | 
 channel_id     | integer                     | not null                                            | plain    |              | 
 category_id    | integer                     |                                                     | plain    |              | 
 title          | character varying(255)      |                                                     | extended |              | 
 published_at   | timestamp without time zone |                                                     | plain    |              | 
 view_count     | bigint                      |                                                     | plain    |              | 
 like_count     | integer                     |                                                     | plain    |              | 
 dislike_count  | integer                     |                                                     | plain    |              | 
 favorite_count | integer                     |                                                     | plain    |              | 
 comment_count  | integer                     |                                                     | plain    |              | 
 disabled       | boolean                     | default false                                       | plain    |              | 
 created_at     | timestamp without time zone |                                                     | plain    |              | 
 updated_at     | timestamp without time zone |                                                     | plain    |              | 
Indexes:
    "videos_pkey" PRIMARY KEY, btree (id)
    "videos_uuid_idx" UNIQUE, btree (uuid)
    "videos_latest_by_channel_idx" btree (channel_id, published_at DESC)
    "videos_top_by_channel_idx" btree (channel_id, view_count DESC)
Has OIDs: no

Best Answer

Honestly, I have never seen a single-value btree lookup take this long. I don't think your problem is in the query itself; I think it is elsewhere.

First, you say you have a lot of writes. That probably means you are constantly pushing pages out of the shared buffers and doing a lot of random disk I/O. I would not be surprised if much of the bottleneck is general I/O pressure combined with your lack of RAM.
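
One quick way to check that, assuming you can run ad-hoc queries against the database, is to look at the buffer cache hit ratios for the table in the standard pg_statio_user_tables view:

-- Fraction of block reads on "videos" served from shared buffers.
-- Ratios well below ~0.99 on a hot table suggest the working set
-- does not fit in RAM and reads are going to disk.
SELECT heap_blks_hit::float / NULLIF(heap_blks_hit + heap_blks_read, 0) AS heap_hit_ratio,
       idx_blks_hit::float  / NULLIF(idx_blks_hit  + idx_blks_read,  0) AS idx_hit_ratio
  FROM pg_statio_user_tables
 WHERE relname = 'videos';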

For 18 million rows, these indexes don't look very large. They may all fit in memory; you can check, as shown below.
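
The built-in size functions will tell you how the table and indexes compare against your 410MB of RAM (relation names taken from your \d+ output):

SELECT pg_size_pretty(pg_relation_size('videos'))       AS table_size,
       pg_size_pretty(pg_relation_size('videos_pkey'))  AS pkey_size,
       pg_size_pretty(pg_total_relation_size('videos')) AS table_plus_indexes;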

At any rate, the first thing I would do if I had access to the command line on the server is compare CPU I/O wait time against user and system time (on Heroku you may have to get at this some other way). Also run the query with EXPLAIN (ANALYZE, BUFFERS, VERBOSE) so that you can see much more about what is going on. These should give you an idea of whether your problem is CPU-bound work, slow RAM, or lots of disk I/O; as it stands, there is very little to go on.
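
For your query that would look something like this; the "shared read" counts in the output are blocks that had to be fetched from outside PostgreSQL's own buffers (OS cache or disk):

EXPLAIN (ANALYZE, BUFFERS, VERBOSE)
SELECT * FROM videos WHERE id = 17841464 LIMIT 1;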

Another thing you can do is this:

VACUUM ANALYZE VERBOSE;

See if that speeds things up.
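
If it does, autovacuum lagging behind your heavy update load is a likely culprit; you can keep an eye on it with the standard pg_stat_user_tables view:

-- When was "videos" last vacuumed, and how many dead rows has it accumulated?
SELECT last_vacuum, last_autovacuum, n_live_tup, n_dead_tup
  FROM pg_stat_user_tables
 WHERE relname = 'videos';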