PostgreSQL – Function which returns a table for a selected period of time

Tags: execution-plan, performance, plpgsql, postgresql

I have a PostgreSQL function on my hands.

The function works, but the results are very slow. I'm hoping someone can help me reduce the execution time.

I was thinking along the lines of creating an index, but I'm new to SQL, so any advice or help will be much appreciated!

I'm currently doing 7 sorts. There's a join with another view, and performance suffers because of it. Is there a way to reduce the number of sorts?
When I execute a request from the start of 2016 to the start of 2017, I get execution times of 40 seconds.
My EXPLAIN output contains nested loops. Is there maybe a way around them?

What my function does:
My function collects data for the date range I pass in as parameters, joins it with a view, and returns a table that I then use in a third-party app.

Here's my code:

CREATE OR REPLACE FUNCTION report.get_sa001(
    IN "date_D" timestamp without time zone, 
    IN "date_F" timestamp without time zone, 
    IN frequence integer)
RETURNS TABLE(
    "Period_date" timestamp without time zone, 
    "Site" character varying, 
    "Customer_code" character varying,
    "Internal_reference" character varying, 
    "InvoiceNumber" character varying, 
    "Value_in_currency" numeric, 
    "Value_in_EUR" numeric, 
    "Value_Budget_in_EUR" numeric, 
    "Selling_price_CUR" numeric, 
    "Selling_price_EUR" numeric, 
    "Currency_code" character varying, 
    "Selling_quantity" numeric, 
    "Variance_price_CUR" numeric, 
    "Variance_price_EUR" numeric, 
    "Variance_value_CUR" numeric, 
    "Variance_value_EUR" numeric, 
    "Selling_date" timestamp without time zone) AS
$BODY$
DECLARE p_debut timestamp without time zone;
DECLARE p_fin timestamp without time zone;

BEGIN

p_debut = dw.get_period_end("date_D", "frequence");
p_fin = dw.get_period_end("date_F", "frequence");



DROP TABLE IF EXISTS "dates_1" CASCADE;
CREATE TEMPORARY TABLE "dates_1" AS
SELECT p_debut::date + n AS "date", 
       dw.period_frequency(p_debut::date + n) AS "frequency" 
FROM generate_series(0, p_fin::date - p_debut::date) AS x(n)
WHERE (dw.period_frequency(p_debut::date + n) & frequence != 0);

DROP TABLE IF EXISTS "dates" CASCADE;
CREATE TEMPORARY TABLE "dates" AS
SELECT "date", 
       LAG("date") OVER (ORDER BY "date") AS "date_prev", 
       "frequency" 
FROM "dates_1";

DROP TABLE IF EXISTS "Sales_over_period" CASCADE;
CREATE TEMPORARY TABLE "Sales_over_period" AS
SELECT "dates"."date"::timestamp without time zone, 
       "FI-D7_Sales".* 
FROM "dates" 
LEFT JOIN dw."FI-D7_Sales"
ON "FI-D7_Sales"."Selling_date" >= COALESCE("dates"."date_prev", "date_D")
AND "FI-D7_Sales"."Selling_date" <= "dates"."date" 
WHERE "FI-D7_Sales"."Selling_date" <= "date_F"
ORDER BY "FI-D7_Sales"."Site", 
         dates."date";

DROP TABLE IF EXISTS "Ref_price" CASCADE;
CREATE TEMPORARY TABLE "Ref_price" AS        
SELECT "FI-D7_Sales"."Internal_reference",
       SUM ("FI-D7_Sales"."Value_in_currency") As "Ref_value",
       SUM ("FI-D7_Sales"."Qty") As "Ref_quantity",
       "FI-D7_Sales"."Site" 
FROM dw."FI-D7_Sales"
INNER JOIN (
    SELECT DISTINCT ON (
        "FI-D7_Sales"."Internal_reference", "FI-D7_Sales"."Site") 
    "FI-D7_Sales"."Internal_reference",
    "FI-D7_Sales"."Selling_date" As "Ref_date",
    "FI-D7_Sales"."Site"
    FROM dw."FI-D7_Sales"
    WHERE "FI-D7_Sales"."Selling_date"<= "date_D"
    ORDER BY "FI-D7_Sales"."Internal_reference", 
             "FI-D7_Sales"."Site", 
             "FI-D7_Sales"."Selling_date" DESC) FID7_MaxDate
ON "FI-D7_Sales"."Internal_reference" =FID7_MaxDate."Internal_reference" 
AND "FI-D7_Sales"."Site" =FID7_MaxDate."Site" 
AND "FI-D7_Sales"."Selling_date" =FID7_MaxDate."Ref_date"
GROUP BY "FI-D7_Sales"."Internal_reference", 
         "FI-D7_Sales"."Selling_date", 
         "FI-D7_Sales"."Site";

DROP TABLE IF EXISTS "Sales_with_currency" CASCADE;
CREATE TEMPORARY TABLE "Sales_with_currency" AS              
SELECT "Sales_over_period".*,
    Case 
       WHEN "Ref_price"."Ref_quantity" is Null Then -1
       WHEN "Ref_price"."Ref_quantity" <> 0 Then "Ref_price"."Ref_value"/"Ref_price"."Ref_quantity" 
    ELSE 0
    END AS "Ref_price",
       "C00_Sites"."Accounting_currency", 
       date_trunc('MONTH'::text, "Sales_over_period"."Selling_date") + '1 mon -1 days'::interval AS "eom_date",
       date_trunc('YEAR'::text, "Sales_over_period"."Selling_date") + '1 year -1 days'::interval AS "eoy_date"
FROM "Sales_over_period" 
JOIN dw."C00_Sites" 
ON "Sales_over_period"."Site"::text = "C00_Sites"."Site"::text
LEFT JOIN "Ref_price" 
ON "Sales_over_period"."Site"::text = "Ref_price"."Site"::text 
AND "Sales_over_period"."Internal_reference"::text = "Ref_price"."Internal_reference"::text;

DROP TABLE IF EXISTS "Sales_with_exchangerate" CASCADE;
CREATE TEMPORARY TABLE "Sales_with_exchangerate" AS 
SELECT "Sales_with_currency".*,
       COALESCE(exE."ratePerEur", exE_last."ratePerEur_Last")::numeric(15,4) AS "ratePerEur_E",
       COALESCE(exB."ratePerEur", exB_last."ratePerEur_Last")::numeric(15,4) AS "ratePerEur_B"
FROM "Sales_with_currency"
LEFT JOIN dw."FI-D0_ExchangeRates" AS exE
ON exE."rateType" ='E'
AND "Sales_with_currency"."Accounting_currency" = exE."rateCurrency" 
AND "Sales_with_currency"."eom_date" = exE."rateDate"
LEFT JOIN dw."FI-D0_ExchangeRates_LAST" AS exE_last
ON "Sales_with_currency"."Accounting_currency" = exE_last.rate_cur 
AND exE_last."rate_type" = 'E'
LEFT JOIN dw."FI-D0_ExchangeRates" AS exB
ON exB."rateType" = 'B'
AND "Sales_with_currency"."Accounting_currency" = exB."rateCurrency" 
AND "Sales_with_currency"."eoy_date" = exB."rateDate"
LEFT JOIN dw."FI-D0_ExchangeRates_LAST" AS exB_last
ON "Sales_with_currency"."Accounting_currency" = exB_last.rate_cur 
AND exB_last."rate_type" = 'B';

DROP TABLE IF EXISTS "Sales_final" CASCADE;
CREATE TEMPORARY TABLE "Sales_final" AS 
SELECT "Sales_with_exchangerate"."date",
       "Sales_with_exchangerate"."Site",
       "Sales_with_exchangerate"."Customer_code",
       "Sales_with_exchangerate"."Internal_reference",
       "Sales_with_exchangerate"."InvoiceNumber",
       "Sales_with_exchangerate"."Value_in_currency",
    CASE 
       WHEN "Sales_with_exchangerate"."Accounting_currency" = 'EUR'::bpchar 
       THEN "Sales_with_exchangerate"."Value_in_currency"::numeric(15,4)
    ELSE ("Sales_with_exchangerate"."Value_in_currency" / "Sales_with_exchangerate"."ratePerEur_E")::numeric(15,4)
    END AS "Value_in_EUR",
    CASE 
       WHEN "Sales_with_exchangerate"."Accounting_currency" = 'EUR'::bpchar 
       THEN "Sales_with_exchangerate"."Value_in_currency"::numeric(15,4)
    ELSE ("Sales_with_exchangerate"."Value_in_currency" / "Sales_with_exchangerate"."ratePerEur_B")::numeric(15,4)
    END AS "Value_Budget_in_EUR",
    CASE 
       WHEN "Sales_with_exchangerate"."Qty" = 0 Then 0::numeric(15,4)
    ELSE ("Sales_with_exchangerate"."Value_in_currency"/"Sales_with_exchangerate"."Qty")::numeric(15,4) 
    END AS "Selling_price_CUR",
    CASE 
       WHEN "Sales_with_exchangerate"."Qty" = 0 Then 0::numeric(15,4)
       WHEN "Sales_with_exchangerate"."Accounting_currency" = 'EUR'::bpchar 
       THEN ("Sales_with_exchangerate"."Value_in_currency"/"Sales_with_exchangerate"."Qty")::numeric(15,4)
    ELSE (("Sales_with_exchangerate"."Value_in_currency" / "Sales_with_exchangerate"."ratePerEur_E")/"Sales_with_exchangerate"."Qty")::numeric(15,4)
    END AS "Selling_price_EUR",
       "Sales_with_exchangerate"."Currency_code",
       "Sales_with_exchangerate"."Qty"::numeric(15,4) AS "Selling_quantity",
    CASE 
       WHEN "Sales_with_exchangerate"."Ref_price"=-1 Then -1::numeric(15,4)
    ELSE "Sales_with_exchangerate"."Ref_price"::numeric(15,4) 
    END AS "Variance_price_CUR",
    CASE 
       WHEN "Sales_with_exchangerate"."Ref_price"=-1 Then -1::numeric(15,4)
       WHEN "Sales_with_exchangerate"."Accounting_currency" = 'EUR'::bpchar 
       THEN "Sales_with_exchangerate"."Ref_price"::numeric(15,4)
    ELSE ("Sales_with_exchangerate"."Ref_price" / "Sales_with_exchangerate"."ratePerEur_E")::numeric(15,4)
    END AS "Variance_price_EUR",
    CASE 
       WHEN "Sales_with_exchangerate"."Ref_price"=-1 Then 0::numeric(15,4)
    ELSE ("Sales_with_exchangerate"."Ref_price"*"Sales_with_exchangerate"."Qty")::numeric(15,4)
    END AS "Variance_value_CUR",
    CASE 
       WHEN "Sales_with_exchangerate"."Ref_price" = -1 Then 0::numeric(15,4)
       WHEN "Sales_with_exchangerate"."Accounting_currency" = 'EUR'::bpchar 
       THEN ("Sales_with_exchangerate"."Ref_price"*"Sales_with_exchangerate"."Qty")::numeric(15,4)
    ELSE (("Sales_with_exchangerate"."Ref_price" / "Sales_with_exchangerate"."ratePerEur_E")*"Sales_with_exchangerate"."Qty")::numeric(15,4)
    END AS "Variance_value_EUR",
       "Sales_with_exchangerate"."Selling_date"  
FROM "Sales_with_exchangerate";

RETURN QUERY 

SELECT * 
FROM "Sales_final";

END
$BODY$
  LANGUAGE plpgsql VOLATILE
  COST 100
  ROWS 1000;

Here's my query and its EXPLAIN ANALYZE output:

EXPLAIN ANALYZE SELECT d."Selling_date", a."Selling_currency", d."Value_in_EUR", d."Value_in_currency", d."Site"
FROM report.get_sa001('2016-01-01'::date, '2017-03-31'::date, 32) AS d LEFT JOIN report."Axis_RefCustomer" 
AS a ON d."Site" = a."Site" AND d."Internal_reference" = a."Reference_internal" AND d."Customer_code" = a."Customer_code"

Nested Loop Left Join  (cost=1.23..2375.17 rows=1000 width=108) (actual time=2471.699..40647.542 rows=53442 loops=1)
  ->  Function Scan on get_sa001 d  (cost=0.25..10.25 rows=1000 width=168) (actual time=2470.512..2485.426 rows=53442 loops=1)
  ->  Nested Loop  (cost=0.98..2.35 rows=1 width=28) (actual time=0.524..0.714 rows=1 loops=53442)
        Join Filter: (((d."Site")::text = ("T01_References"."Site")::text) AND ((d."Internal_reference")::text = ("T01_References"."Internal_reference")::text) AND (("T02_Customers"."Site")::text = ("T01_References"."Site")::text))
        Rows Removed by Join Filter: 126
        ->  Nested Loop  (cost=0.57..1.42 rows=1 width=33) (actual time=0.011..0.053 rows=127 loops=53442)
              ->  Index Scan using "T02_Customers_pkey" on "T02_Customers"  (cost=0.28..0.77 rows=1 width=17) (actual time=0.003..0.004 rows=1 loops=53442)
                    Index Cond: ((d."Customer_code")::text = ("Customer_code")::text)
              ->  Index Scan using "T12_RefCustomer_pkey" on "T12_RefCustomer"  (cost=0.29..0.64 rows=1 width=29) (actual time=0.007..0.025 rows=99 loops=68922)
                    Index Cond: ((("Customer_code")::text = ("T02_Customers"."Customer_code")::text) AND (("Site")::text = ("T02_Customers"."Site")::text))
        ->  Index Scan using "T01_References_pkey" on "T01_References"  (cost=0.42..0.92 rows=1 width=22) (actual time=0.005..0.005 rows=1 loops=6795175)
              Index Cond: ((("Internal_reference")::text = ("T12_RefCustomer"."Internal_reference")::text) AND (("Site")::text = ("T12_RefCustomer"."Site")::text))
Total runtime: 40650.764 ms

What I have done:

I tried turning off nested loops with SET enable_nestloop = off;.
Postgres then chose a Hash Left Join, which reduced my total runtime from 40 seconds to 2.8 seconds. Is a hash join a viable solution?
Is there anything that could complement or replace the hash join to reduce my execution time even further?
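For what it's worth, while experimenting I scoped the setting to a single transaction with SET LOCAL so it doesn't leak into the rest of the session (a sketch of the experiment, not a recommendation):

```sql
BEGIN;
-- Affects only this transaction; reverts automatically at COMMIT/ROLLBACK.
SET LOCAL enable_nestloop = off;
-- ... run the EXPLAIN ANALYZE query from above here ...
COMMIT;
```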

Here's the EXPLAIN ANALYZE output after disabling nested loops:

SET enable_nestloop = off;
EXPLAIN ANALYZE SELECT d."Selling_date", a."Selling_currency", d."Value_in_EUR", d."Value_in_currency", d."Site"
FROM report.get_sa001('2016-01-01'::date, '2017-03-31'::date, 32) AS d LEFT JOIN report."Axis_RefCustomer" 
AS a ON d."Site" = a."Site" AND d."Internal_reference" = a."Reference_internal" AND d."Customer_code" = a."Customer_code"


Hash Left Join  (cost=6342.36..6382.37 rows=1000 width=108) (actual time=2774.785..2806.634 rows=53446 loops=1)
  Hash Cond: (((d."Site")::text = ("T01_References"."Site")::text) AND ((d."Internal_reference")::text = ("T01_References"."Internal_reference")::text) AND ((d."Customer_code")::text = ("T02_Customers"."Customer_code")::text))
  ->  Function Scan on get_sa001 d  (cost=0.25..10.25 rows=1000 width=168) (actual time=2597.277..2605.074 rows=53446 loops=1)
  ->  Hash  (cost=6329.04..6329.04 rows=747 width=28) (actual time=177.496..177.496 rows=13182 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 768kB
        ->  Hash Join  (cost=5275.62..6329.04 rows=747 width=28) (actual time=138.692..174.757 rows=13182 loops=1)
              Hash Cond: ((("T02_Customers"."Site")::text = ("T01_References"."Site")::text) AND (("T12_RefCustomer"."Internal_reference")::text = ("T01_References"."Internal_reference")::text))
              ->  Merge Join  (cost=1747.80..1925.62 rows=3703 width=33) (actual time=111.117..129.543 rows=13182 loops=1)
                    Merge Cond: ((("T12_RefCustomer"."Site")::text = ("T02_Customers"."Site")::text) AND (("T12_RefCustomer"."Customer_code")::text = ("T02_Customers"."Customer_code")::text))
                    ->  Sort  (cost=1250.80..1283.75 rows=13181 width=29) (actual time=84.036..87.813 rows=13182 loops=1)
                          Sort Key: "T12_RefCustomer"."Site", "T12_RefCustomer"."Customer_code"
                          Sort Method: external merge  Disk: 528kB
                          ->  Seq Scan on "T12_RefCustomer"  (cost=0.00..348.81 rows=13181 width=29) (actual time=0.004..1.998 rows=13182 loops=1)
                    ->  Sort  (cost=497.00..510.98 rows=5592 width=17) (actual time=27.076..27.486 rows=17525 loops=1)
                          Sort Key: "T02_Customers"."Site", "T02_Customers"."Customer_code"
                          Sort Method: quicksort  Memory: 607kB
                          ->  Seq Scan on "T02_Customers"  (cost=0.00..148.92 rows=5592 width=17) (actual time=0.003..0.597 rows=5593 loops=1)
              ->  Hash  (cost=1973.13..1973.13 rows=74513 width=22) (actual time=26.911..26.911 rows=74513 loops=1)
                    Buckets: 2048  Batches: 4  Memory Usage: 1019kB
                    ->  Seq Scan on "T01_References"  (cost=0.00..1973.13 rows=74513 width=22) (actual time=0.002..11.099 rows=74513 loops=1)
Total runtime: 2810.434 ms

Best Answer

Manually disabling planner methods to force a different query plan is a desperate measure of last resort. It typically hides the real problem and causes more problems later. In short: don't.

The solution to your problem is a complete rewrite of the function.

Your temp tables do not have any indexes or even table statistics. You would have to run ANALYZE on each temp table manually after creation.

You could also create indexes on them. That would most probably allow (much) better query plans, but it adds yet more overhead. Still, it's just putting lipstick on a pig.
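A minimal sketch of what that would look like inside the function, for one of the temp tables (the index columns are my assumption, based on the join conditions later in the function):

```sql
CREATE TEMPORARY TABLE "Sales_over_period" AS
SELECT ... ;  -- body as in the original function

-- Temp tables are not analyzed automatically, so gather statistics manually:
ANALYZE "Sales_over_period";

-- Optionally index the columns used in the later joins:
CREATE INDEX ON "Sales_over_period" ("Site", "Internal_reference");
```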

Instead, replace the complete function body with a single query, using CTEs or subqueries instead of temp tables. That means much less overhead, and the indexes and table statistics of the underlying tables can be used. It will be much faster.
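A skeleton of that approach, with each former temp table turned into a CTE (names taken from your function; the query bodies are elided, and switching to LANGUAGE sql is my suggestion, not a requirement):

```sql
CREATE OR REPLACE FUNCTION report.get_sa001(
    "date_D" timestamp, "date_F" timestamp, frequence integer)
RETURNS TABLE ( ... )  -- same column list as the original
AS
$BODY$
WITH dates AS (
    SELECT ...            -- former temp table "dates"
), sales_over_period AS (
    SELECT ...            -- former temp table "Sales_over_period"
), ref_price AS (
    SELECT ...            -- former temp table "Ref_price"
)
SELECT ...                -- former temp table "Sales_final"
FROM sales_over_period
LEFT JOIN ref_price USING ("Site", "Internal_reference");
$BODY$
LANGUAGE sql STABLE;
```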

I am not going to do the rewrite for you: it's too much work, and well beyond the scope of questions here.