Postgresql – Postgres DB is extremely fast with a query with a certain parameter, but extremely slow for another

postgresqlquery

I have a table called stats, where I store every action on a certain website (basically when the user clicks a link, data is saved in the db). The data is modeled with PostgreSQL as

(
id integer NOT NULL DEFAULT nextval('stats_id_seq'::regclass),
company integer NOT NULL,
layout integer NOT NULL,
browsername character varying(50) COLLATE pg_catalog."default",
lang character varying(5) COLLATE pg_catalog."default" NOT NULL,
...)

There are other fields that I don't think are relevant. Basically what I want to do is see which are the most used browsers for a certain company, by counting how many pages are visited by that browser and how many users there are for a given browser.

My query is:

select browsername, count(DISTINCT phpsessid) as visitors, count(*) as pages
from public.stats
where company=1 and createdate>'2017-01-01' and createdate<'2018-12-12' 
group by browsername order by visitors desc;

The stats table contains around 80 millions rows. Now, this query is extremely fast if I set company=1 or company=222 (it's the company ID). It takes less than half a second to fetch around 20k rows. However, it is EXTREMELY slow if I set company=13549, for instance (we're talking about literal hours here). Obviously something is wrong, either in the data modeling or in the way I query.

How come there's such a difference for different companies? The DB was not done by me, so I apologize if I left something useful out, and feel free to ask.

The indexes are:

CREATE INDEX stats_new_company_13549_index
ON public.stats USING btree
(company, createdate)
TABLESPACE pg_default    WHERE company = 13549


CREATE INDEX stats_new_company_14863_index
ON public.stats USING btree
(company, createdate)
TABLESPACE pg_default    WHERE company = 14863

CREATE INDEX stats_new_company_createdate_cet_index
ON public.stats USING btree
(company, date(timezone('CET'::text, createdate)))
TABLESPACE pg_default;

CREATE INDEX stats_new_company_createdate_index_1
ON public.stats USING btree
(company, createdate)
TABLESPACE pg_default;

Here's the plan for the fast query: https://explain.depesz.com/s/uKmJ

And here's the plan for the slow one: https://explain.depesz.com/s/wysA

Just by looking at the plans you can see it took several minutes for the second one to even query the explain.

I also noticed that after running the explain a couple of times, it got done in reasonable amounts the third time. From that moment onward, the query would also drastically reduce its execution time, from hours to a mere 4 seconds.
It's the second time it happens, and I swear I'm not crazy. If I change the company ID once again, the query takes hours again. I'm at lost here: is there some index problem?

Best Answer

The fast query retrieves and sorts 26229 rows - for that small number of rows, the sorting can be done in memory, so obviously this is going to be quick. First because retrieving the data only takes ~500ms and then the sorting is done in 50ms.

The slow query retrieves 560135 rows (20 times as many as the first query) but the time it wook - 149939ms - seems quite slow. Maybe your table (or index) is bloated - the number of blocks needed to read that number of rows is way too high I think.

You can run vacuum full analyze public.stats; and see if performance gets better after that.

Or maybe you simply have a very slow harddisk.

The sorting was done on disk, but that only added another 3 seconds to the total runtime.

The question why the same query is sometimes fast and sometimes quick (especially when run the second time) is more often that not answered with: caching effects. When you re-run the second query you will probably see a lot of the "shared read=318147" information turn to "shared hit=..." which means those blocks were already in the cache.