PostgreSQL – Can a Single Query Use Multiple Cores?

parallelism performance postgresql query-performance

In recent versions of PostgreSQL (as of Dec 2013), can we share a query between two or more cores to get a performance boost? Or should we get faster cores?

Best Answer

No, for versions of PostgreSQL prior to v9.6. Please see the PostgreSQL FAQ: How does PostgreSQL use CPU resources?

The PostgreSQL server is process-based (not threaded). Each database session connects to a single PostgreSQL operating system (OS) process. Multiple sessions are automatically spread across all available CPUs by the OS. The OS also uses CPUs to handle disk I/O and run other non-database tasks. Client applications can use threads, each of which connects to a separate database process.

Since version 9.6, portions of some queries can be run in parallel, in separate OS processes, allowing use of multiple CPU cores. In 9.6 this is off by default (max_parallel_workers_per_gather defaults to 0); from version 10 on, parallel query is enabled by default (max_parallel_workers_per_gather defaults to 2), with additional parallelism expected in future releases.
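As a rough sketch of how to check this on your own server (big_table is a hypothetical name, substitute any sufficiently large table), you can inspect the setting, raise it for the session, and look at the plan:

```sql
-- big_table is a placeholder; it is not part of the original question.
SHOW max_parallel_workers_per_gather;   -- 0 by default in 9.6, 2 by default in 10 and later

-- Allow up to 4 workers per Gather node, for this session only.
SET max_parallel_workers_per_gather = 4;

-- A Gather (or Gather Merge) node in the output means the planner
-- chose a parallel plan; a plain Seq Scan or Aggregate means it did not.
EXPLAIN SELECT count(*) FROM big_table;
```

The planner may still pick a serial plan for small tables or for queries it cannot parallelize, so a missing Gather node does not necessarily mean anything is misconfigured.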

Related Solutions

PostgreSQL Performance – Use Nested Loop with Indices Over Hash Join

This closely related answer on SO should provide answers to your primary question:
Setting enable_seqscan = off in a single SELECT query

In similar fashion, you could disable hash joins for the current transaction:

SET LOCAL enable_hashjoin=off;

But that's not my advice. Read the answer over there.
And this one about statistics and cost settings, too.
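For completeness, here is a minimal sketch of how SET LOCAL behaves, since it only takes effect inside a transaction block. The query is a simplified stand-in that assumes the users and posts tables from the question; it is meant for experimenting with plans, not as a recommendation:

```sql
BEGIN;

-- Applies only until COMMIT or ROLLBACK; other sessions are unaffected.
SET LOCAL enable_hashjoin = off;

-- Simplified stand-in query, used only to inspect the resulting plan.
EXPLAIN
SELECT count(*)
FROM   users u
JOIN   posts p ON p.owner_user_id = u.id
WHERE  u.reputation > 100000;

COMMIT;

-- Back to the session default once the transaction has ended.
SHOW enable_hashjoin;
```

Note that enable_hashjoin = off does not forbid hash joins outright; it only makes the planner treat them as very expensive, so it will avoid them when another plan is available.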

More importantly, untangle your query first:

SELECT creation_epoch, user_screen_name, chunk
FROM  (
   SELECT id AS owner_user_id
   FROM   users
   WHERE  reputation > 100000
   ORDER  BY reputation 
   LIMIT  500
   ) u
JOIN   posts p USING (owner_user_id)
JOIN   post_tokenized t USING (id)
WHERE  type = 'tag'
AND    user_screen_name IS NOT NULL;

This should be considerably faster and also make it easier for the query planner to choose the best plan (given sane cost settings and table statistics).
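If you are unsure about the statistics and cost settings, something along these lines can serve as a starting point. The table names are taken from the query above; which settings actually need tuning depends on your hardware, so treat this as a sketch:

```sql
-- Refresh planner statistics for the tables involved in the query.
ANALYZE users;
ANALYZE posts;
ANALYZE post_tokenized;

-- Inspect the cost settings that most often influence the choice
-- between index-driven nested loops and hash joins.
SHOW random_page_cost;
SHOW effective_cache_size;
```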

PostgreSQL – Why Index Not Used When OR Condition is Applied?

I cannot really answer your question, because I really don't know why, but I've found a way to make PostgreSQL do more or less what I guess you want. I've tested your situation with a simplified simulation scenario, and using PostgreSQL 9.6.1 (latest as of today). I get the same results.

Good news is: If you can change the way you make your query, you have a couple of options which use the trigram index.

The first one consists of moving the condition on the subcontacts into a subquery. In this case, the trigram index is used for one of the two conditions (but not the other):

SELECT
    c.id, c.number
FROM  
    cases c
    JOIN case_contacts caco ON caco.case_id = c.id
    JOIN contacts con_main ON con_main.id = caco.contact_id
    LEFT JOIN 
    (
        SELECT
            * 
        FROM
            contacts  
        WHERE
            v_fullname ilike '%test%' 
    ) AS con_sub ON con_sub.id = caco.subcontact_id
WHERE  
    con_main.v_fullname ILIKE '%test%'
    or con_sub.id is not null /* if the left join gave an answer, it's got '%test%' */ ;

A few trials with simulated data (where approx. 0.1%, 2.5%, 5% or 25% of the v_fullname values contain 'test') show that the difference in execution times is minuscule. [My disk is an SSD; a spinning hard disk might behave very differently.] This should really be checked on a real system with real data, but it seems that using the trigram index or not doesn't make a big difference.

PostgreSQL is not exceptionally good at estimating how many rows a search for "like '%test%'" will return, but in my tests that did not seem to change which plan it decided to use.
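To check both points on your own data, you can compare the planner's estimate with the actual row count. This sketch assumes a GIN trigram index on contacts.v_fullname; the index name is made up, and your existing index may be defined differently (for example with GiST):

```sql
-- pg_trgm provides the trigram operator classes.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Hypothetical index definition; skip this if you already have one.
CREATE INDEX IF NOT EXISTS contacts_v_fullname_trgm_idx
    ON contacts USING gin (v_fullname gin_trgm_ops);

-- In the output, compare the estimated "rows=" of each node with the
-- "actual ... rows=" figure, and check whether the index is used
-- (Bitmap Index Scan) or the table is scanned sequentially (Seq Scan).
EXPLAIN ANALYZE
SELECT id
FROM   contacts
WHERE  v_fullname ILIKE '%test%';
```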

There is another option, which (in my limited experimentation) works a little bit faster in most cases, and a lot faster when the percentage of rows containing 'test' is low. This option uses a CTE to "prefilter" the contacts (so the trigram index is used only once instead of twice):

WITH filtered_contacts AS
(
SELECT
    *
FROM
    contacts
WHERE
    v_fullname ilike '%test%'
)
SELECT
    c.id, c.number
FROM  
    cases c
    JOIN case_contacts caco ON caco.case_id = c.id
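    /* both joins must be LEFT JOINs: an inner join on the prefiltered
       contacts would drop cases where only the subcontact matches */
    LEFT JOIN filtered_contacts con_main ON con_main.id = caco.contact_id
    LEFT JOIN filtered_contacts con_sub ON con_sub.id = caco.subcontact_id
WHERE
    con_main.id IS NOT NULL
    OR con_sub.id IS NOT NULL /* keep the row if either contact passed the filter */ ;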
