Postgresql – Joining a function and a view

execution-planperformanceplpgsqlpostgresql

I have a function get_sa001 and a view Axis_RefCustomer.

When I execute my function get_sa001 on it's own for a certain period let's say 2016 – 2017, the execution time is about ~6 seconds.

SELECT d."Selling_date",  d."Value_in_EUR", d."Value_in_currency", d."Site"
FROM report.get_sa001('2016-01-01'::date, '2017-03-31'::date, 32) AS d

When I execute a select on the view Axis_RefCustomer, it runs for around ~1 second.

Select a."Selling_currency" FROM report."Axis_RefCustomer" AS a

When I join them together, the execution time is around ~39 seconds !

SELECT d."Selling_date",
a."Selling_currency", 
d."Value_in_EUR", 
d."Value_in_currency", 
d."Site"
FROM report.get_sa001('2016-01-01'::date, '2017-03-31'::date, 32) AS d 
LEFT JOIN report."Axis_RefCustomer" 
AS a ON d."Site" = a."Site" 
AND d."Internal_reference" = a."Reference_internal" 
AND d."Customer_code" = a."Customer_code"

Is there anyway to reduce the amount of time my query takes to execute ?

What my function does : My function collects data from the dates I put in as parameters and associates them with a view and it sends me back a table of which I use on a 3rd party app.

Here's my explain analyze for 2016 – 2017 of the function on it's own :


Function Scan on get_sa001 d  (cost=0.25..10.25 rows=1000 width=104) (actual time=2522.959..2534.987 rows=53446 loops=1)
Total runtime: 2537.926 ms

Here's my explain analyze for the view on it's own :


Hash Right Join  (cost=3527.82..5840.32 rows=74513 width=4) (actual time=47.363..71.317 rows=77965 loops=1)
    Hash Cond: ((("T12_RefCustomer"."Site")::text = ("T01_References"."Site")::text) AND (("T12_RefCustomer"."Internal_reference")::text = ("T01_References"."Internal_reference")::text))
    ->  Seq Scan on "T12_RefCustomer"  (cost=0.00..348.81 rows=13181 width=29) (actual time=0.002..2.350 rows=13182 loops=1)
    ->  Hash  (cost=1973.13..1973.13 rows=74513 width=22) (actual time=46.591..46.591 rows=74513 loops=1)
            Buckets: 2048  Batches: 4  Memory Usage: 1019kB
            ->  Seq Scan on "T01_References"  (cost=0.00..1973.13 rows=74513 width=22) (actual time=0.014..18.580 rows=74513 loops=1)
    Total runtime: 72.540 ms    

Here's my explain analyze for the query using a left join :


Nested Loop Left Join  (cost=1.23..2375.17 rows=1000 width=108) (actual time=2406.131..42314.297 rows=53446 loops=1)
  ->  Function Scan on get_sa001 d  (cost=0.25..10.25 rows=1000 width=168) (actual time=2405.980..2429.250 rows=53446 loops=1)
  ->  Nested Loop  (cost=0.98..2.35 rows=1 width=28) (actual time=0.547..0.746 rows=1 loops=53446)
        Join Filter: (((d."Site")::text = ("T01_References"."Site")::text) AND ((d."Internal_reference")::text = ("T01_References"."Internal_reference")::text) AND (("T02_Customers"."Site")::text = ("T01_References"."Site")::text))
        Rows Removed by Join Filter: 126"
        ->  Nested Loop  (cost=0.57..1.42 rows=1 width=33) (actual time=0.011..0.057 rows=127 loops=53446)
              ->  Index Scan using "T02_Customers_pkey" on "T02_Customers"  (cost=0.28..0.77 rows=1 width=17) (actual time=0.004..0.004 rows=1 loops=53446)
                    Index Cond: ((d."Customer_code")::text = ("Customer_code")::text)
              ->  Index Scan using "T12_RefCustomer_pkey" on "T12_RefCustomer"  (cost=0.29..0.64 rows=1 width=29) (actual time=0.007..0.027 rows=99 loops=68927)
                    Index Cond: ((("Customer_code")::text = ("T02_Customers"."Customer_code")::text) AND (("Site")::text = ("T02_Customers"."Site")::text))
        ->  Index Scan using "T01_References_pkey" on "T01_References"  (cost=0.42..0.92 rows=1 width=22) (actual time=0.005..0.005 rows=1 loops=6795220)
              Index Cond: ((("Internal_reference")::text = ("T12_RefCustomer"."Internal_reference")::text) AND (("Site")::text = ("T12_RefCustomer"."Site")::text))
Total runtime: 42318.196 ms  

What I've done :
I've Vacuumed my tables and reindexed them but the results have not changed.

Best Answer

Have you tried using CTEs? I've found several times that reducing the size of my data sets before attempting the joins can lead to a big improvement in query times.

with
    __d as(
        select
            "Selling_date",
            "Value_in_EUR",
            "Value_in_currency",
            "Site",
            "Internal_reference",
            "Customer_code"
        from
            report.get_sa001('2016-01-01'::date, '2017-03-31'::date, 32)
    ),
    __a as(
        select
            "Selling_currency",
            "Site",
            "Reference_internal" as "Internal_reference",
            "Customer_code"
        from
            report."Axis_RefCustomer"
        where
            ("Site", "Reference_internal", "Customer_code") in(
                select
                    "Site",
                    "Internal_reference",
                    "Customer_code"
                from
                    __d
            )
    )
select
    __d."Selling_date",
    __a."Selling_currency", 
    __d."Value_in_EUR", 
    __d."Value_in_currency", 
    __d."Site"
from
    __d
    left join __a using("Site", "Internal_reference", "Customer_code")