PostgreSQL – Select pseudorandom row from a very small partial index in Postgres

postgresql

I have a huge table, with a tiny partial index on <0.1% of the rows satisfying a certain condition (one column is null). I would like to pseudorandomly select a row that satisfies this condition. Can I use the partial index that already exists to speed this up, and if so, how?

Best Answer

Shouldn't be a problem: just put the WHERE condition of your partial index into the WHERE clause of your query.

create table foobar as
  select id,
         case when random() < 0.001 then null else random() end as nullable,
         random() as z
  from generate_series(1, 1000000) f(id);

create index on foobar (id) where nullable is null;

select * from foobar
 where nullable is null
 order by random()
 limit 1;

This does have to read all the indexed rows, so there is a limit to how fast it can be. It also has to "sort" all of those rows, but since the sort knows about the LIMIT it only keeps the top row (a top-N heapsort) rather than doing a full O(N log N) sort.
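If you want to check what this actually does, EXPLAIN should show a scan of the partial index feeding a top-N heapsort (the exact plan depends on your version and statistics):

explain (analyze, buffers)
select * from foobar
 where nullable is null
 order by random()
 limit 1;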

If you are willing to augment your table with a column of random numbers (which I had already done in my example, just for filler at the time), then you can do better.

create table foobar as
  select id,
         case when random() < 0.001 then null else random() end as nullable,
         random() as z
  from generate_series(1, 50000000) f(id);

create index on foobar (z) where nullable is null;
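If the table already exists (as it does in the question), the same setup can be retrofitted. A minimal sketch, assuming you only need z on the rows covered by the partial index:

alter table foobar add column z double precision;
-- only the rows covered by the partial index actually need a value
update foobar set z = random() where nullable is null;
create index on foobar (z) where nullable is null;

Once z is populated and indexed, the lookup can be restricted to a narrow random slice of it: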

with rand as (select random()/1.0005 as rand)
select foobar.*
  from foobar, rand
 where nullable is null
   and z between rand and rand + 0.0005
 order by random()
 limit 1;

You would have to tune the constant 0.0005 based on how many indexed rows there are. If you make it too small, there is a chance the range will contain no rows and you get no result (you could then retry); if you make it too large, you read more rows than necessary.
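If an occasional empty result is unacceptable, one option is to wrap the lookup in a function that widens the window and retries. A sketch in PL/pgSQL, using the table and columns from the example above (the function name and starting width are just placeholders to tune):

create or replace function random_null_row() returns foobar
language plpgsql as $$
declare
  width float8 := 0.0005;  -- starting window; tune to the number of indexed rows
  lo    float8;
  r     foobar;
begin
  loop
    lo := random() * (1 - width);
    select f.* into r
      from foobar f
     where f.nullable is null
       and f.z between lo and lo + width
     order by random()
     limit 1;
    if found then
      return r;
    end if;
    -- widen the window and retry; at width = 1 this degenerates to scanning
    -- all indexed rows, so it terminates as long as any qualifying row exists
    width := least(width * 10, 1.0);
  end loop;
end $$;

Calling it is then just select * from random_null_row();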

Without changing the table, I don't think you can do much better than reading all of the indexed tuples, unless you are willing to compromise on the quality of the randomness.
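For what it's worth, the sort of compromise meant here would be something like taking the first indexed row at or above a single random point, which is one index probe but favors rows that follow large gaps in z (a sketch):

with r as (select random() as lo)
select foobar.*
  from foobar, r
 where nullable is null
   and z >= r.lo
 order by z
 limit 1;  -- may return nothing if lo lands above the largest indexed z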