PostgreSQL – Full-text search over multiple CPU processes

full-text-search, performance, postgresql, postgresql-performance

Is it possible to configure Postgres to split a full-text search across multiple CPU processes so that it completes more quickly?

I'm running a full-text search over 2 million records against a GIN-indexed tsvector column, where the source text is about 10,000 characters long.
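For reference, the setup looks roughly like this; the table and column names are illustrative, not my actual schema:

    -- Hypothetical table: ~2M rows, ~10,000 characters of source text each
    CREATE TABLE documents (
        id   bigint PRIMARY KEY,
        body text,
        tsv  tsvector GENERATED ALWAYS AS (to_tsvector('english', body)) STORED
    );

    CREATE INDEX documents_tsv_idx ON documents USING gin (tsv);

    -- The kind of search I'm running
    SELECT id
    FROM documents
    WHERE tsv @@ to_tsquery('english', 'example & query');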

I have far more CPU available than the search is using, so I feel like splitting the search across 4 processes, in batches of 500k records each, would let it run concurrently and therefore complete faster.

I'd be interested to know if anyone has tried this or implemented their own equivalent programmatically in SQL, along the lines of the sketch below.
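For example, I pictured opening four connections from the application and running something like this in each, one per ID range (using the hypothetical documents table above):

    -- Batch 1 of 4; the other connections would cover 500001-1000000, and so on
    SELECT id
    FROM documents
    WHERE id BETWEEN 1 AND 500000
      AND tsv @@ to_tsquery('english', 'example & query');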

Best Answer

The default setting of "max_parallel_workers_per_gather" is 2, which won't spread work over all 4 CPUs for any one query. But that doesn't matter if you aren't getting parallel plans in the first place.
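You can check the setting, raise it, and see whether the planner produces a parallel plan at all; for example, using the hypothetical documents table from the question:

    SHOW max_parallel_workers_per_gather;      -- 2 by default

    SET max_parallel_workers_per_gather = 4;   -- allow up to 4 workers per Gather node

    -- If the plan is parallel, you'll see a Gather node with "Workers Planned"
    EXPLAIN
    SELECT id
    FROM documents
    WHERE tsv @@ to_tsquery('english', 'example & query');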

Parallel query is a relatively new feature in PostgreSQL, and it is still being improved. You should run the newest version you can to give yourself the best chance of benefiting from it.

I believe the index scan itself cannot be parallelized (in any version). The table access can be, but it often doesn't make sense to parallelize it.
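To illustrate, when the table access is parallelized you would expect a plan of roughly this shape (schematic, not real EXPLAIN output): the Bitmap Index Scan on the GIN index runs in a single process, and only the heap visits are shared among workers.

    Gather
      Workers Planned: 2
      ->  Parallel Bitmap Heap Scan on documents
            Recheck Cond: (tsv @@ to_tsquery(...))
            ->  Bitmap Index Scan on documents_tsv_idx
                  Index Cond: (tsv @@ to_tsquery(...))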

If the indexed part of the query is highly selective and returns few rows, then "parallel_setup_cost" will exceed the benefit of parallelizing the table access for just a few rows.

On the other hand, if you return a lot of rows, then "parallel_tuple_cost" (multiplied by the number of rows returned) will exceed the benefit.

If you access a lot of rows but don't return them (as with count(*) or some other aggregate, or a filter the index is unable to address), that is the optimal case for parallelization to work well.
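A minimal sketch of that optimal case, again against the hypothetical documents table: many rows are visited and counted, but only one row comes back, so "parallel_tuple_cost" stays negligible while the heap work is shared.

    SET max_parallel_workers_per_gather = 4;

    -- An aggregate over many matching rows returns a single row,
    -- making this a good candidate for a parallel plan
    SELECT count(*)
    FROM documents
    WHERE tsv @@ to_tsquery('english', 'common & term');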